‘Good Data are Used Data’; the 20th SJIAOS Discussion


Questions triggered by the SJIAOS interview with Stefan Schweinfest

How to qualify the use of data?


The 20th SJIAOS discussion, ‘Good Data are Used Data’, invites readers to react to a statement made by Stefan Schweinfest in the interview published in this issue. Speaking with Pieter Everaers about the growing complexity of the world of official statistics, with new methodologies, big data and, for example, data scientists, Stefan Schweinfest states that he has changed his opinion on the role of official statisticians: from an approach focused on the quality of production and dissemination to one that gives much more importance to the use of the data, the view that a ‘good data item is a used data item’. He says that ‘… data have to be validated from two dimensions: First, they are ‘good’ items from a production side, methodologically solid, and trusted. But secondly, they are also only ‘good’ if they are used’.


The statement ‘Good Data are Used Data’ triggers several questions. The first, of course, is what is considered ‘Data’, but there is also the question of how to define ‘Used’ and ‘Good’. In the interview, Stefan Schweinfest uses a rather broad concept of users: ‘And when I say users, it is also not only official users. This is also something that has changed: the data that we are producing are not only for the government but they are also used by civil society and in the private sector.’


To make this statement more concrete, it is useful to reflect on the concepts behind ‘Data’, ‘Good’, and ‘Used’, as well as on how these concepts can be operationalized. For this discussion, it is proposed to consider ‘Data’ as covering all information, facts, numbers, etc. that are published or disseminated by official statistics organizations (regional, national, and international statistical organizations).


According to the statement, identifying ‘Good’ data requires qualifying what counts as ‘Used’. The reverse of the statement, ‘Non-used data are not Good’, is also interesting to reflect on. The essential element of both statements is the measurement of ‘use’.


Readers are invited to comment on the main question for the discussion: how to measure use?


One needs to define a measurement rule to operationalize ‘used’. Such a rule will contain elements such as: which societal groups are potential or actual users; whether there are specific groups (for example, policymakers) that we consider the main or more important users; whether we need a categorization of the magnitude of use, and whether that is nominal or a sliding scale, such as the number of downloads or citations; and whether certain other criteria apply when considering data used or not used. Or perhaps an impact measurement of the used data is needed?

In defining use, one may also consider the use of data in time series, where the immediate policy use is not that visible.


Readers are specifically invited to reflect on how to measure and qualify data that are used solely in time series.


Readers are invited to offer suggestions on how to measure use, and especially on which elements should be included. Of course, as an element of the quality of data, use is a frequently discussed topic. Readers are therefore invited not only to reflect on the measurement of use but also to refer to the literature or to analytic results.