2020 Young Statisticians' Prize Award winner - James Bailie

Congratulations to the 2020 IAOS Young Statisticians' Prize Award winner James Bailie! 

Big Data, Differential Privacy and National Statistical Organizations

About the Author, James Bailie - By Nancy Torrieri - Interview Editor

James Bailie is a researcher in the Data Access and Confidentiality Methods Unit at the Australian Bureau of Statistics (ABS). He has a Bachelor of Science with Honours in pure mathematics from the Australian National University.

At the ABS, James has worked in a range of areas, including administrative data, data linkage and statistical disclosure control. He was recently seconded to the COVID-19 task force at the Department of Prime Minister and Cabinet, where he has seen a new side of statistics in the public service. This experience solidified his belief in the necessity of high-quality official statistics as the evidence base for making good public policy.

James has received a scholarship from Fulbright Australia to study for a Ph.D. in statistics at Harvard University, beginning in the fall of 2020.

Big Data, Differential Privacy and National Statistical Organizations

Summary of a Paper by James Bailie, Australian Bureau of Statistics - Submitted for the 2020 IAOS Young Statisticians’ Prize Competition

This paper provides an introduction to the concept of differential privacy (DP) from the perspective of a National Statistical Organization (NSO). Two competing obligations guide the work of NSOs: providing timely data to inform decision making and protecting the privacy of respondents. A suite of privacy tools and controls has helped make this possible. However, over the past twenty years or so, sophisticated attacks, which aim to determine respondents’ answers from the published aggregate totals, have become a significant and growing possibility, owing partly to the proliferation of data available online and partly to the increase in computational power available to attackers. DP has gained popularity in some circles as an effective tool for protecting against these attacks, yet has faced continued opposition from others.
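As a rough illustration of the idea (not taken from the paper), the sketch below adds Laplace noise to a single published count, which is one standard way of achieving DP for counting queries. The function names, the toy records and the choice of epsilon are all hypothetical; a real NSO release involves many related tables and far more care than this.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from a Laplace(0, scale) distribution.

    The difference of two independent exponential variables with
    mean `scale` follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_count(records, predicate, epsilon, rng=None):
    """Return a noisy count of the records that satisfy `predicate`.

    Adding or removing one respondent changes a count by at most 1
    (sensitivity 1), so Laplace noise with scale 1/epsilon gives
    epsilon-DP for this single query. Smaller epsilon means more
    noise: stronger privacy, lower utility.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    true_count = sum(1 for record in records if predicate(record))
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Hypothetical toy microdata; real survey data would be far larger.
respondents = [{"age": age} for age in (23, 35, 41, 67, 70)]

# Publish a noisy count of respondents aged 65 or over.
noisy = dp_count(respondents, lambda r: r["age"] >= 65, epsilon=1.0)
print(f"Noisy count of respondents aged 65+: {noisy:.2f}")
```

Each call returns a different value centred on the true count. That randomness is what limits an attacker's ability to infer any one respondent's answer from the published figure, and it is also the source of the utility concerns the paper raises.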

As of 2020, the use of DP is more limited than one might expect. Continued efforts will be needed to advance its role in the future. Some of the factors that might explain the challenges of using DP include:

  1. The mathematical and computational complexities inherent in DP;
  2. A lack of key knowledge about NSO priorities among DP researchers, who are predominantly computer scientists; 
  3. Limited progress toward a multidisciplinary research agenda;
  4. Concern about DP’s impact on the utility of published outputs.

NSOs can help address the challenges of using DP by:

  1. Helping develop tools to efficiently implement DP at the scale of a typical NSO;
  2. Applying DP to practical examples that effectively balance privacy against data utility and resolve the myriad real-world complexities inherent in a typical official statistics publication;
  3. Building an effective mathematical language to express privacy concepts and educating key stakeholders in this language.

The big data era is helping to create a paradigm shift in how data are generated, collected and used. NSOs have an important leading role in this process. By increasing the utility of their data and supporting products such as data visualization tools, NSOs will lead change in producing and generating statistics. By engaging with DP in practice, NSOs will encourage researchers to develop DP methodologies and software that address official statisticians’ needs. These resources will enable NSOs to continue publishing high-quality data while also safeguarding respondent privacy, in a world where these two competing obligations are becoming both increasingly important and increasingly difficult.

The full paper is available here: https://content.iospress.com/articles/statistical-journal-of-the-iaos/s…