Interviewer fabrication (“fake interviews”) is a problem in all interviewer-conducted surveys and has repeatedly come up in the Survey of Health, Ageing and Retirement in Europe (SHARE) as well. Interviewers may deviate from properly administering a survey in many ways and for many reasons; in this project we deal only with the most extreme deviation, i.e., the fabrication of entire interviews.
The main aim of this project is to implement a technical procedure to identify fakes in computer-administered survey data. Previous work often used only a few variables to identify fake interviews. We instead take a more complex approach that draws on variables from different data sources to build a comprehensive detection mechanism. We use several indicators from the CAPI data (size of social networks, avoidance of follow-up questions, number of proxy interviews, rounding in physical tests, extreme answering, straight-lining, number of “other” answers, number of missing values) as well as paradata (interview length, number of interviews per day, number of contact attempts, cooperation rates). We combine these indicators in a multivariate cluster analysis to distinguish two groups of interviewers: a falsifier group and an honest interviewer group.
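To make the idea of combining many indicators in a cluster analysis concrete, the following is a minimal sketch, not the project's actual script. It assumes that each indicator has already been aggregated per interviewer and oriented so that larger values are more suspicious (e.g., network size is negated). It also assumes plain k-means with k = 2 as a stand-in for the multivariate cluster analysis, since the algorithm is not specified here. The function names and the toy numbers are hypothetical and purely illustrative.

```python
from math import dist
from statistics import mean, pstdev


def zscore_columns(rows):
    """Standardize each indicator (column) to mean 0 and SD 1 across interviewers."""
    stats = [(mean(col), pstdev(col) or 1.0) for col in zip(*rows)]  # guard against SD = 0
    return [[(x - m) / s for x, (m, s) in zip(row, stats)] for row in rows]


def kmeans_two_groups(points, max_iter=100):
    """Plain k-means with k = 2 (Lloyd's algorithm).

    Initial centroids are the interviewers with the lowest and highest mean
    z-score, which keeps this sketch deterministic.
    """
    order = sorted(range(len(points)), key=lambda i: mean(points[i]))
    centroids = [list(points[order[0]]), list(points[order[-1]])]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each interviewer joins the nearest centroid (Euclidean distance).
        new_labels = [min((0, 1), key=lambda k: dist(p, centroids[k])) for p in points]
        if new_labels == labels:
            break  # converged: no interviewer changed group
        labels = new_labels
        # Update step: move each centroid to the mean of its members (keep it if the group is empty).
        for k in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centroids[k] = [mean(col) for col in zip(*members)]
    return labels, centroids


def flag_suspicious(indicators):
    """Return the interviewer IDs in the cluster whose centroid is more 'suspicious'.

    Assumes every indicator is oriented so that larger values are more suspicious.
    """
    ids = list(indicators)
    z = zscore_columns([indicators[i] for i in ids])
    labels, centroids = kmeans_two_groups(z)
    suspicious = max((0, 1), key=lambda k: mean(centroids[k]))
    return {i for i, lab in zip(ids, labels) if lab == suspicious}


# Hypothetical toy indicators per interviewer (made up for illustration):
# share of follow-ups avoided, share of rounded test values,
# interviews per day, negated mean social-network size.
toy = {
    "INT01": [0.05, 0.10, 2.0, -4.0],
    "INT02": [0.08, 0.12, 2.5, -3.5],
    "INT03": [0.06, 0.08, 1.8, -4.2],
    "INT04": [0.07, 0.11, 2.2, -3.8],
    "INT05": [0.40, 0.60, 6.0, -1.0],
    "INT06": [0.35, 0.55, 5.5, -1.2],
}
print(sorted(flag_suspicious(toy)))  # → ['INT05', 'INT06']
```

Standardizing first matters because the indicators are on very different scales (shares, counts per day, network sizes). Without it, the distance computation in the clustering would be dominated by whichever indicator has the largest raw spread.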
During the sixth wave of SHARE, we discovered a highly organized team of falsifiers who had faked a fairly large part of the net sample. We use these known fakes as a benchmark to check whether our script properly identifies fake interviews. Unlike most existing work, our study therefore has the advantage of being based on a large data set that includes information on actual fakes.
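A benchmark check against known fakes amounts to comparing the set of flagged cases with the set of confirmed fakes. The sketch below is hypothetical; the function name and metric labels are our own illustrative choices. It reports the share of known fakes that were flagged (hit rate), the share of honest cases that were flagged (false-alarm rate), and the share of flagged cases that are actual fakes (precision). It works the same way whether the IDs refer to interviewers or to individual interviews.

```python
def benchmark(flagged, known_fakes, all_ids):
    """Compare flagged IDs with known fakes (interviewer or interview level)."""
    flagged, known_fakes, all_ids = set(flagged), set(known_fakes), set(all_ids)
    honest = all_ids - known_fakes
    hits = flagged & known_fakes      # fakes correctly flagged
    false_alarms = flagged & honest   # honest cases wrongly flagged
    nan = float("nan")                # undefined rate when a denominator is empty
    return {
        "hit_rate": len(hits) / len(known_fakes) if known_fakes else nan,
        "false_alarm_rate": len(false_alarms) / len(honest) if honest else nan,
        "precision": len(hits) / len(flagged) if flagged else nan,
    }


# Hypothetical example: 10 interviews, 4 known fakes, 4 flagged by the detector.
ids = [f"I{k:02d}" for k in range(1, 11)]
result = benchmark(
    flagged={"I01", "I02", "I03", "I07"},
    known_fakes={"I01", "I02", "I03", "I04"},
    all_ids=ids,
)
print(result)  # hit_rate 0.75, false_alarm_rate 1/6 (about 0.167), precision 0.75
```

A high hit rate combined with a low false-alarm rate is what makes the flagged set useful as a targeted back-checking sample.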
The results show that we can identify most of the faked interviews while keeping the number of “false alarms” small. Although we usually cannot be certain whether an interview has been faked, our results can provide survey agencies with a much better-informed sample for back-checking suspicious interviewers and interviews during fieldwork of the current and future waves. In this way, we hope to substantially improve the quality of our survey data.