Attention Checks
A forthcoming article in the Journal of Survey Statistics and Methodology by Tobias Rettig and Annelies Blom raises important concerns about the still all-too-frequent practice of screening out respondents on the basis of attention checks in web surveys. Employing a commonly used form of attention check, they direct respondents, within a body of text whose length they experimentally vary, to click on a specific button. They then evaluate who fails this check, both in terms of sociodemographics and in terms of the experimentally varied length of the item.
In this German panel experiment, the researchers find that younger people, men, and people with less educational attainment are more likely to fail the attention check, as are people who are experimentally assigned a longer statement to read (see graph). Nor is this the first study to show that people who pass and fail attention checks differ significantly in politically relevant ways. Berinsky, Margolis, and Sances (2014) also find that American respondents who fail screeners differ demographically, including by race, gender, and age. There are also differences by political interest; Alvarez and Li (2021) find that respondents with validated voter turnout are more likely to pass attention checks. And Alvarez et al. (2019) find that inattentive respondents report less political participation, demonstrate less political knowledge, and express greater uncertainty about policy positions (especially lesser-known policies). Clearly, removing respondents who fail these attention checks would have significant implications for the representativeness of the resulting sample.
If screening respondents on the basis of attention checks is so bad for sample quality, why do researchers do it? Well, attentive respondents provide higher-quality data on a number of dimensions. First, while Alvarez and Li find that attentive and inattentive respondents differ in their vote history, they also report that attentive respondents are more accurate in reporting their vote history and mode, for example. Similarly, Alvarez et al. show more straightlining among people who fail attention checks, which also implies lower accuracy.
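To make the straightlining idea concrete, here is a minimal sketch of how it might be flagged in respondent-level data. The DataFrame, column names, and grid battery are hypothetical and not drawn from any of the studies cited above.

```python
# A minimal sketch, assuming a hypothetical wide-format DataFrame in which
# columns q1..q5 hold one respondent's answers to a single grid battery
# asked on the same response scale.
import pandas as pd

df = pd.DataFrame({
    "q1": [3, 5, 2, 4],
    "q2": [3, 1, 2, 4],
    "q3": [3, 4, 2, 5],
    "q4": [3, 2, 2, 4],
    "q5": [3, 5, 2, 3],
})

battery = ["q1", "q2", "q3", "q4", "q5"]

# A respondent "straightlines" when every item in the battery gets the same
# answer, i.e. there is only one unique value across the row.
df["straightlined"] = df[battery].nunique(axis=1).eq(1)
print(df)
```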
Second, as Berinsky, Margolis, and Sances find, those who pass multiple attention checks show larger effects in survey experiments, probably because these respondents actually processed the experimental stimuli. In two different ways, then, limiting analysis to attentive respondents reduces the noise in survey data. (One thing that attention checks do not do is identify entirely bogus respondents, as this Pew Research study shows.)
This poses a potential paradox for researchers: do you prefer more measurement error or more non-response error? It is worth noting that none of these authors advocate screening out inattentive respondents, despite the implications of attention for data quality. How, then, can researchers square this circle?
First, researchers can conduct attention checks but not disqualify respondents who fail them, and then evaluate the heterogeneity of responses or treatment effects with respect to performance on these checks. For example, are the message or drop-off effects of an experimental message in a survey experiment greater among people who pass an attention check, or are the estimates just less noisy? If the latter, researchers can be more confident in the finding; if the former, they should consider how heterogeneous treatment effects might exist with respect to traits that also correlate with attention. This is particularly relevant for text-to-web surveys (as compared to web surveys with panel providers, which have a different cost structure), since the cost is mostly in recruiting respondents to the web survey, so there are no additional savings from terminating respondents who miss attention checks.
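As an illustration, here is a minimal sketch of that heterogeneity analysis, assuming a hypothetical respondent-level dataset with an outcome measure, a treatment indicator, and an indicator for passing the attention check. The file name and variable names are placeholders, not taken from any particular study.

```python
# A sketch of estimating treatment-effect heterogeneity by attention,
# rather than dropping respondents who fail the check.
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data: `outcome` is the survey response of interest,
# `treated` is 1 if the respondent saw the experimental message,
# and `passed_check` is 1 if the respondent passed the attention check.
df = pd.read_csv("survey_responses.csv")

# The interaction term estimates how the treatment effect differs between
# attentive and inattentive respondents, keeping everyone in the analysis.
model = smf.ols("outcome ~ treated * passed_check", data=df).fit(cov_type="HC2")
print(model.summary())

# A large `treated:passed_check` coefficient suggests genuinely different
# effects by attention; if that term is near zero but the attentive-only
# subgroup has smaller standard errors, the estimates are simply less noisy.
```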
Second, researchers can consider messages in advance of the survey that encourage more attention. For example, Clifford and Jerit (2015) propose warnings about attention checks, though they note that such warnings may induce some socially desirable response bias. They also tested other mechanisms, like asking respondents to commit to paying attention, with smaller effects. But other research, such as this study produced by Qualtrics, suggests that such commitment messages may also be an improvement over screening on the basis of failed attention checks.
Third, there is the solution implied by the experiment in the forthcoming JSSAM article we started with: make the survey easier for respondents. Rettig and Blom find that more people fail the attention check when the item is longer. The implication here is not just for attention check questions but for all questions: when questions are less burdensome to answer, respondents will pay more attention to them.
While not directly tested in this piece, the new article also offers implications for survey length, question wording, and other aspects of survey design. If we center the respondent's experience in how we design surveys, and actually make surveys less costly to take, we can get better data from respondents in return. The solution to this supposed tradeoff between measurement error and non-response error is to make surveys easier for respondents to take, in which case it is not a paradox at all.
Finally, for those looking for a deeper dive into the existing research literature on attention checks, we highly recommend this recent review article published in Public Opinion Quarterly by Berinsky et al.