The R Package surveysd: Estimating Standard Errors for Complex Surveys with a Rotating Panel Design
About the Authors, Johannes Gussenbauer and Gregor de Cillia, Statistics Austria - By Nancy Torrieri - Interview Editor
Johannes Gussenbauer studied mathematics (in Economics) at the Vienna University of Technology. During his studies he completed an internship at the National Bank of Austria, where he gained first-hand experience with real-life statistical data sets and applied some of the theory he had learned at university.
After receiving his master's degree, with a focus on statistics, Johannes worked in the private sector as a data scientist for about a year before applying for a position at Statistics Austria. His experience in the private sector has proved very useful so far, since many of his current projects align well with his experience as a data scientist.
Apart from the daily statistical work involving sampling, calibration and estimation, some of his current projects also involve web scraping, text classification or modelling survey responses using machine learning. Johannes noted, “these skills and focus areas provide great opportunities to extend my abilities and knowledge in new areas.”
Johannes is currently working on his Ph.D., with projects related to his work at Statistics Austria. In his free time he likes to cook, play sports (although just occasionally) and play some video games in order to relax.
Gregor de Cillia studied mathematics (in Economics) at the Vienna University of Technology.
During those studies, he pursued some side projects in the field of programming, including a nine-month internship at the Austrian Institute of Technology. The internship led him to acquire significant experience in modern software development. This proved very useful when he started working at Statistics Austria because the unit he’s working in uses and promotes learning about data science workflow methodologies.
Currently, Gregor is transitioning his projects to focus on visualizations, user interface (UI) design and front-end development. For the surveysd project, Gregor was mainly involved in front-end improvements. That required working closely with the package’s current and future users. According to Gregor, “getting into the heads of these users is important for making the package well-documented and easy to use.”
Aside from writing R packages, Gregor also likes cooking and listening to or playing music. Getting up to date with computer science practices (in R and other languages) is also something that he enjoys from time to time. “I also think that reading about other computer languages helps me improve my ability to think outside the box in professional projects.”
The R Package surveysd: Estimating Standard Errors for Complex Surveys With a Rotating Panel Design
Summary of a Paper by Johannes Gussenbauer and Gregor de Cillia, Statistics Austria - Submitted for the 2020 IAOS Young Statisticians’ Prize Competition
Johannes Gussenbauer and Gregor de Cillia are methodologists with Statistics Austria and authors of The R Package surveysd: Estimating Standard Errors for Complex Surveys With a Rotating Panel Design. This paper, submitted for consideration in the YSP competition for 2020, describes the rewards and challenges of using surveysd to produce estimates and corresponding standard errors for complex surveys. Error estimates for indicators of trends or net changes over time can be burdensome to produce, especially if the correlation between sampling waves has to be taken into account. The R package surveysd aims to support a harmonious and straightforward approach for producing estimates and corresponding standard errors for surveys with a rotating panel design. The core functionality consists of only three functions, which cover:
1. drawing bootstrap replicates according to the sampling design of the survey
2. calibrating those bootstrap replicates to known population margins
3. calculating indicators together with their error estimates
This minimalistic design supports a streamlined production process for statistical estimates and also enables people who are not familiar with the programming language R or programming in general to use this functionality for their production process.
The error estimation in the package is done through bootstrap techniques, which can incorporate the complete panel design and can yield consistent variance estimates even for non-smooth estimators like the median. The procedure can also take the panel rotation into account if each survey respondent in the data has a unique identifier that stays constant over the panel waves.
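The three steps listed above can be illustrated with a minimal sketch in plain Python. It is not the surveysd API and does not reproduce the package's methodology. It assumes a simple with-replacement bootstrap within strata and a one-dimensional ratio adjustment as stand-ins for the package's bootstrap and calibration procedures. All function names and the toy data are hypothetical.

```python
import random
import statistics

def draw_replicates(strata, weights, reps, rng):
    """Step 1 (simplified): resample units with replacement within each stratum.

    A unit's replicate weight is its design weight times the number of
    times it was drawn in that replicate.
    """
    by_stratum = {}
    for i, s in enumerate(strata):
        by_stratum.setdefault(s, []).append(i)
    replicates = []
    for _ in range(reps):
        counts = [0] * len(weights)
        for members in by_stratum.values():
            for i in rng.choices(members, k=len(members)):
                counts[i] += 1
        replicates.append([w * c for w, c in zip(weights, counts)])
    return replicates

def calibrate(rep_weights, groups, margins):
    """Step 2 (simplified): scale weights so each group total matches its margin."""
    totals = {}
    for w, g in zip(rep_weights, groups):
        totals[g] = totals.get(g, 0.0) + w
    if any(t <= 0 for t in totals.values()):
        raise ValueError("a calibration group received no weight in this replicate")
    return [w * margins[g] / totals[g] for w, g in zip(rep_weights, groups)]

def weighted_ratio(y, weights):
    """Weighted share, in percent, of units with y == 1."""
    return 100.0 * sum(w * v for w, v in zip(weights, y)) / sum(weights)

def bootstrap_standard_error(y, strata, groups, weights, margins, reps=200, seed=1):
    """Step 3: point estimate plus the standard deviation over the replicates."""
    rng = random.Random(seed)
    point = weighted_ratio(y, calibrate(weights, groups, margins))
    estimates = [
        weighted_ratio(y, calibrate(rw, groups, margins))
        for rw in draw_replicates(strata, weights, reps, rng)
    ]
    return point, statistics.stdev(estimates)

# Invented toy data: 200 units in 4 strata, a binary indicator y,
# and assumed known population totals for two groups.
data_rng = random.Random(0)
n = 200
strata = [i % 4 for i in range(n)]
groups = ["f" if data_rng.random() < 0.5 else "m" for _ in range(n)]
y = [1 if data_rng.random() < 0.2 else 0 for _ in range(n)]
weights = [50.0] * n
margins = {"f": 5100.0, "m": 4900.0}

point, se = bootstrap_standard_error(y, strata, groups, weights, margins)
print(f"estimate: {point:.2f}%, standard error: {se:.2f}")
```

The standard error is simply the standard deviation of the indicator across the calibrated replicates. This is why the same recipe also works for non-smooth estimators: `weighted_ratio` could be replaced by a weighted median without changing the other steps.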
In addition, the package supports a method for producing more efficient estimates by cumulating multiple consecutive sample waves. This can lead to a significant decrease in variance, assuming that structural patterns for the indicator in question remain fairly stable over time.
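The effect of cumulating waves can be sketched as follows. This is a hedged illustration, not the package's implementation. It assumes that each bootstrap replicate yields one estimate per wave and that replicates are shared across waves, so the correlation between waves is carried into the averaged estimate. The function name, the window of three waves, and all numbers are hypothetical.

```python
import random
import statistics

def rolling_mean_se(rep_estimates, periods, window=3):
    """Standard error of the mean over `window` consecutive periods.

    rep_estimates[p][r] is the estimate for period p in bootstrap replicate r.
    The mean is taken within each replicate first, so dependence between
    periods is reflected in the spread of the averaged replicates.
    """
    results = {}
    for start in range(len(periods) - window + 1):
        block = periods[start:start + window]
        n_rep = len(rep_estimates[block[0]])
        averaged = [
            statistics.fmean([rep_estimates[p][r] for p in block])
            for r in range(n_rep)
        ]
        # Label the result with the middle period of the window.
        results[block[window // 2]] = statistics.stdev(averaged)
    return results

# Invented replicate estimates: a component shared across waves (panel overlap)
# plus wave-specific noise around a hypothetical indicator value of 14 percent.
rng = random.Random(42)
periods = [2014, 2015, 2016, 2017]
n_rep = 500
shared = [rng.gauss(0.0, 0.5) for _ in range(n_rep)]
rep_estimates = {
    p: [14.0 + s + rng.gauss(0.0, 0.5) for s in shared] for p in periods
}

single_se = {p: statistics.stdev(rep_estimates[p]) for p in periods}
averaged_se = rolling_mean_se(rep_estimates, periods, window=3)

for p in sorted(averaged_se):
    print(p, f"single wave: {single_se[p]:.3f}", f"3-wave mean: {averaged_se[p]:.3f}")
```

The averaged estimate cannot be more variable than the most variable single wave in its window, and the gain shrinks as the correlation between waves grows. The price is that the averaged figure describes a multi-year level rather than a single year, which is why the approach depends on the indicator being fairly stable over time.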
The application of the package surveysd is showcased on a prominent sample with a rotating panel design, the EU Statistics on Income and Living Conditions (EU-SILC) data collection launched in 2004. This data collection is conducted annually in each member country with a rotating panel of four waves. Together with the harmonized survey, a set of common indicators, the so-called Laeken indicators, was adopted for the countries of the EU. Johannes and Gregor noted that “the survey design and sample size of the EU-SILC is tuned to deliver qualitatively high well-being indicators at national or NUTS1 level but usually lacks the capability to do the same for smaller subgroups of the survey population.” Using multiple consecutive sample waves can, however, greatly improve this situation and could yield high-quality well-being indicators at NUTS2 level and below. Indicators of sufficient quality can thus be delivered at a regional level, which can greatly support regional policy making.
The authors noted that step-by-step instructions for using the package are available online at https://statistikat.github.io/surveysd/. The website also includes more details on the methodology used. The package can be extended easily, and extensions are likely in future releases. Possible extensions range from improvements to the methodology to additional capabilities for analyzing the results at hand.