"Each key player has expertise in different domains, and for any big change in the production system to be successful, you really do need everyone on board."
By Nancy Torrieri, Interview Editor - SJIAOS
Nancy Wang is a design analyst at Statistics New Zealand, where she works with administrative data and statistical methods. She holds a Master of Applied Statistics from Victoria University of Wellington and a Bachelor of Science from the University of Otago. Nancy has a strong background in data integration and linking methodologies, statistics and mathematics.
Nancy's paper, "Using Address Histories to Improve the Link Rates of Surveys in the Integrated Data Infrastructure (IDI)", describes how high-quality geographic information collected in surveys was used to improve the link rates of survey data in the IDI. The IDI is New Zealand's primary database for policy evaluation and academic research, and it also supports the production of statistical outputs such as the 2018 Census and child poverty statistics.
Data in the IDI are linked through a central dataset called the spine, which is built from three core datasets: births registered in New Zealand, from the Department of Internal Affairs (DIA); tax registrations, from Inland Revenue (IR); and visa applications, from the Ministry of Business, Innovation and Employment (MBIE). The spine is intended to represent New Zealand's resident population, although under- and over-coverage can occur. All other datasets are linked to the spine.
Initially, the IDI contained only administrative data and needed just one phase of linking. The methodologies developed to link these datasets relied heavily on variables such as name and date of birth. Later, to optimize the linkage of the 2013 Census to the IDI spine, geographic information was incorporated into the linking methodology. A two-phase process was established in which geographies contained in other administrative datasets already linked to the spine were attached to it.
As other datasets were added, link rates fell below previously attained levels. This was attributable to changes in the quality of names collected through surveys and to the geographic variables on the spine being time-referenced to the 2013 Census: addresses collected in surveys at other points in time were not necessarily close to the addresses held for 2013. A review of the linking methodology, consultations with the IDI research community, and the growing combined use of survey and administrative data all pointed to the need for high-quality linkage of datasets, and improving link rates in the IDI became the focus. To enable this, it was decided that the geographic information on the spine would no longer be fixed to a specific reference date; instead, the entire address history of each individual would be included on the spine.
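To illustrate why moving from a fixed reference date to a full address history can help, the sketch below compares a single survey address against one person's address history in two ways. It is a hypothetical simplification, not Stats NZ's implementation. The `AddressRecord` type, the function names and the meshblock-style codes are all invented for illustration. Real linkage also weighs names, dates of birth and other variables.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class AddressRecord:
    """One entry in a person's address history (hypothetical structure)."""
    area_code: str          # hypothetical geographic code, e.g. a meshblock
    start: date             # date the person moved to this address
    end: Optional[date]     # None means this is still the current address


def fixed_reference_match(survey_area: str, history: List[AddressRecord],
                          ref_date: date) -> bool:
    """Agree only if the survey address matches the address held on ref_date."""
    for rec in history:
        if rec.start <= ref_date and (rec.end is None or ref_date <= rec.end):
            return rec.area_code == survey_area
    return False  # no address held on the reference date


def address_history_match(survey_area: str,
                          history: List[AddressRecord]) -> bool:
    """Agree if the survey address matches any address in the history."""
    return any(rec.area_code == survey_area for rec in history)


# Usage: a person who moved after 2013 and was surveyed at the new address.
history = [
    AddressRecord("MB001", date(2010, 1, 1), date(2014, 12, 31)),
    AddressRecord("MB002", date(2015, 1, 1), None),
]
print(fixed_reference_match("MB002", history, date(2013, 3, 1)))  # False
print(address_history_match("MB002", history))                   # True
```

With a fixed 2013 reference date, the survey address disagrees with the spine even though the person genuinely lived there. Searching the whole history recovers the agreement, which is the intuition behind the higher link rates described below.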
The new address-based linking methodology was implemented in the IDI as part of the March 2019 production rebuild. As Nancy shows in a series of figures in the paper, link rates improved overall, giving researchers a more complete picture of the survey population. Nancy noted that continuing collaboration among data linkers, subject matter experts, application developers and others bodes well for the future.
To learn more about how collaboration helped Nancy and her colleagues, I asked her to describe the collaboration process, the key players, and how she envisions they will be involved in the future development of linking methodologies.
"The IDI is supported by several different teams who specialize in different domains. The integrated data team are the product owners of the IDI, and are responsible for the ongoing production and development of the IDI. They hold key relationships with data suppliers, and understand the entire IDI system. This team is supported by application developers from Digital Business Services, who develop the technical processes and systems of the IDI build; their expertise as SQL coders is critical to building the IDI. The IDI is also supported by methodologists, who are specialists in data integration methodology and linked data quality (that's my role/team!). The last key player, and arguably the most important one, is the research community. They are the people who actually use the IDI on a day-to-day basis, so they can give us feedback on what areas need improvement."
Nancy provided details on how the collaboration process worked. “The IDI production/development team went out to consult IDI researchers on what areas of the IDI could be improved. They wanted to make evidence-based decisions on where to invest development resources. When they came back with the issue of survey linkage, I had actually already developed a proof-of-concept on the address-based linkage methodology and had a good understanding of what processes needed to be changed in order to implement the design. This was because around the same time, the child poverty project team at Stats NZ required an increased link rate for the Household Economic Survey (HES) in the IDI, as they wanted to use administrative income sources for their poverty measures. I developed the bespoke solution for this, which my team and I were planning on implementing in production at some point. The timing of the two projects was perfect as now we had researcher agreement that it was a pressing issue.”
When it came to implementation, Nancy worked directly with the application developers on the technical details, as they were the experts in designing IDI system changes. "We worked together to ensure the solution put in place would enable the implementation of the methodology that was designed. We had backing from the IDI production team to do this work, which is important as they drive the development priorities."
"As is evident," Nancy continued, "each key player has expertise in different domains, and for any big change in the production system to be successful, you really do need everyone on board, which is exactly what happened. That made the collaboration really easy: we all had a clear shared goal."
Nancy contends that the future development of linking methodologies will require a good researcher feedback loop and close collaboration among production teams, researchers and methodologists. She also believes that the relationship between methodologists and data suppliers should be strengthened. She added, "The data suppliers know the data best, and in order for a good linking methodology, or new innovative methods, to be developed, methodologists need a good understanding of the underlying data."
It is clear that collaboration is critical as Stats NZ continues to develop and improve the IDI, and, as Nancy states, “unleash the power of data to change lives.”