Interview with Vianney Costemalle - First Prize Winner of the 2019 Young Statistician's Prize Competition


Read our interview with the First Prize Winner of the 2019 Young Statistician's Prize competition.

Official Statistics: Vianney Costemalle

"There is a growing demand for information and in particular for locally relevant information. Understanding the environment (what is in the neighborhood) of the population is essential for localized public policies."

By Nancy Torrieri, Interview Editor - SJIAOS

Vianney Costemalle heads the spatial analysis section of the Statistical Methods Department at the National Institute of Statistics and Economic Studies (Insee) in Paris. Earlier in his career he served as a demographic methodology specialist in the same agency's demographic department. He holds a Master of Science in Statistics from the National School of Statistics and Economic Administration (ENSAE ParisTech) and an engineering diploma from École Polytechnique, one of France's top engineering schools.

Vianney’s paper, Detecting Geographical Differencing Problems in the Context of Spatial Data Dissemination, won First Prize in the 2019 IAOS competition for young statisticians.

A major part of Vianney’s professional work, and that of his colleagues, focuses on statistical disclosure control (SDC). Disseminating very fine-grained data both on a grid of 200-meter square cells and on the administrative units representing France’s 35,000 municipalities means that the same variables are aggregated over different, overlapping areas. This could allow a household to be identified either directly, because the number of households in an aggregate is below some threshold, or indirectly, through linear combinations of published aggregates.

The two major classes of methods for avoiding the disclosure of confidential information are perturbative methods and the identification and suppression of aggregates. SDC usually relies on ad hoc measures of the risk that published information poses to a given household. Vianney adopted a simpler measure of risk: an aggregate is considered problematic when the number of households it covers is below a conventional threshold, which is a parameter of the algorithm. Vianney’s paper centers on indirect disclosure, specifically the case of geographic differencing.
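
The threshold rule can be illustrated with a minimal Python sketch. It is not taken from Vianney's paper: the function name, the cell labels, and the threshold value of 11 are all hypothetical, and treating empty areas as harmless is an assumption made here for simplicity.

```python
def risky_aggregates(household_counts, threshold):
    """Return the areas whose household count is positive but below the threshold.

    Assumption (not from the paper): an area with zero households is not
    flagged, since there is no household to identify.
    """
    return sorted(
        area for area, n in household_counts.items() if 0 < n < threshold
    )


# Hypothetical household counts for four grid cells.
counts = {"cell_1": 25, "cell_2": 4, "cell_3": 0, "cell_4": 11}
print(risky_aggregates(counts, threshold=11))  # ['cell_2']
```

Because the threshold is passed as a parameter, the same check can be rerun under a stricter or looser convention without changing the code.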

The differencing technique can be used when the disseminated data are additive, meaning that they can be added together or subtracted from each other (for example, a population’s total income is additive, whereas its median income is not). Geographic differencing consists of subtracting the additive data disseminated for an encompassed area B from the data disseminated for an encompassing area A. One can then deduce the value of the additive variable on the complement of B within A, denoted A\B and called the differenced area. If the differenced area contains fewer observations than a confidentiality threshold, privacy is jeopardized. Finding pairs of regions A and B that raise a confidentiality issue is difficult because both can be relatively complex, each composed of multiple smaller areas. Used by an ill-intentioned person, the differencing technique can lead to a breach of statistical confidentiality: as Vianney points out, it makes it possible to deduce information on areas smaller than those originally planned for dissemination. In several examples, Vianney shows how the new method he has developed for handling complex geographic differencing problems can dramatically reduce the workload of examining differencing combinations, turning an intractable problem into a much smaller and tractable one, although the method has some limitations.
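
To make the arithmetic of differencing concrete, here is a small Python sketch under stated assumptions. It is a brute-force check over explicitly listed areas, not the method developed in Vianney's paper; the block names, the figures, and the function names `publish` and `differencing_problems` are hypothetical. Each area is described by the set of elementary blocks it covers, and only the aggregates are assumed to be published.

```python
from itertools import permutations


def publish(blocks, households, income):
    """Aggregate two additive variables over the blocks of one area."""
    return {
        "households": sum(households[b] for b in blocks),
        "income": sum(income[b] for b in blocks),
    }


def differencing_problems(areas, published, threshold):
    """List pairs (A, B), with B strictly inside A, whose differenced area
    (A minus B) holds a positive number of households below the threshold."""
    problems = []
    for a, b in permutations(areas, 2):
        if areas[b] < areas[a]:  # B is a proper subset of A
            n = published[a]["households"] - published[b]["households"]
            if 0 < n < threshold:
                leaked = published[a]["income"] - published[b]["income"]
                problems.append((a, b, n, leaked))
    return problems


# Hypothetical confidential block-level data.
households = {"b1": 40, "b2": 35, "b3": 30, "b4": 2}
income = {"b1": 1200, "b2": 1000, "b3": 900, "b4": 95}

# Two published areas: a municipality and a union of grid cells inside it.
areas = {
    "municipality": {"b1", "b2", "b3", "b4"},
    "grid_union": {"b1", "b2", "b3"},
}
published = {name: publish(blocks, households, income) for name, blocks in areas.items()}

print(differencing_problems(areas, published, threshold=11))
# [('municipality', 'grid_union', 2, 95)]
```

Both published aggregates cover more than 100 households, yet their difference reveals the total income of only two households. The sketch compares every ordered pair of listed areas, which is exactly the kind of enumeration that becomes intractable once A and B may be arbitrary unions of smaller areas.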

Vianney’s paper contributes to the research surrounding SDC. He draws attention to a new method for dealing with geographical differencing problems, and provides a framework and methods for their study. His research serves to remind users of official statistics that there is a growing need to enhance the use of spatial data and the spatial dimension in the collection, processing and dissemination of statistics.

I asked Vianney how geography is serving as a catalyst for statistical research at Insee. I also asked him to describe the kind of education, resources, and courses that should be considered by anyone seeking to conduct cross-disciplinary studies that embrace a statistical framework coupled with a spatial perspective.

He replied, “A priori, geography is not at the core of the official statistician work, even though geography has always been related to censuses. Nowadays, there is a growing demand for information and in particular for locally relevant information. Understanding the environment (what is in the neighborhood) of the population is essential for localized public policies. For surveys this implies one should explore new methods such as those of small area statistics. Moreover, to have precise spatial data, it is necessary to geolocate databases (administrative ones as well as survey ones), which implies to build some geographic referential and to develop new methods. This allows one to apply spatial methods once statistical observations have been localized (for example through clustering or smoothing). Lastly, visualization plays a very important role in communicating and disseminating spatial information. The institute works to go into those issues in depth in order to provide new geostatistical information.”

“Statistics can be seen as a branch of mathematics, but applied statistics is by nature cross-disciplinary. So it is important to have some knowledge about the field of the studied object. Spatial data are by essence very voluminous as the data describe two-dimensional phenomena (three-dimensional if one adds time), which leads to the importance of knowing how to deal with big data. Mastering spatial database programming languages such as PostGIS seems useful. What’s more, there are plenty of free educational resources on the Internet to learn how to deal with geographical data, such as the Spatial Data Science book by Pebesma and Bivand or the Handbook for Spatial Analysis by Insee and Eurostat, for those using the R software. As signal processing can be useful for time series analysis, image processing knowledge is of utility when dealing with raster data, in particular satellite imagery. Environmental sciences courses can also help dealing with statistical issues concerning environment and land use topics.”

At the conclusion of my interview with Vianney, I asked him to comment on any aspect of his experience at the ISI World Statistics Congress held in August 2019. Vianney noted that “the general atmosphere was very inviting, with lots of people coming from all around the world,” and that “the reception by the Malaysian people was wonderful.” He said that he “felt very excited by all the richness of statistics, in official statistics as well as in other fields like health studies or astrostatistics,” and that he understands that “every statistician is facing the same challenges, joys and difficulties.” He added that he was especially interested in and appreciative of the talks paying attention to ethics in statistics and the responsibility of statisticians.