Developing survey questions: “I read for pleasure at home”

We’re currently working on updating and developing survey items for different age groups for our research into university access interventions. We’re arranging a piloting session with a local school, where we’ll test out some survey items on our target age group to see whether they interpret them in the way we intend. One of the particular questions we’ve been discussing is

“I read for pleasure at home”

with response options 1 – 5, where 1 is ‘never’ and 5 is ‘very often’ (with 2 – 4 unspecified). Will the respondents all be thinking of the same thing when they choose a particular response – or is one person’s ‘often’ equivalent to someone else’s ‘sometimes’ or ‘rarely’?

The three of us in the meeting talked about how we would answer the question.

I’d pick 4. I read fiction most weeks in normal work time, more frequently when I’m on holiday (although not ‘most days’ as I first typed!); very frequently if I’m in the middle of a novel I’m really enjoying and for sustained periods if it’s the week before my six-weekly-ish book club. But sometimes the book club will come round and I’ve barely started the novel, and I won’t have read another in that month either. I also read the news every single day, and read blog posts and academic papers, but I didn’t really count that, as I interpret ‘read for pleasure’ as sitting down and reading fiction (although if pressed I’d include e.g. popular science, biography etc. too). (There are some interesting attempts to define ‘reading for pleasure’ in this National Literacy Trust report.)

But the main reason I picked 4, which I think might be most interesting when related to question interpretation, is that I see myself as a clever, book-y sort of person – the kind of person that researchers would expect to read a lot. Instead of just answering the question in a straightforward way, I was – partially subconsciously – trying to answer what the hypothetical researchers were really trying to find out.

One of my colleagues did something similar: she reads fiction daily, but answered 3 (half-way between ‘very often’ and ‘never’) because “it isn’t very high-quality fiction”. She “enjoys some of the journal articles” she reads, but didn’t count that as ‘reading for pleasure’.

My other colleague was more straightforward: she answered 2, because although she likes reading she doesn’t get the chance to read very often. We’re both in the same book club, so I know she’s read many of the same novels as me over the past year – but I saw myself as a frequent reader and she saw herself as close to never.

Is there a term in the literature for the phenomenon of respondents over-interpreting the question? I don’t think it’s the same as social desirability bias – I’m not picking my answer because I want to look good, but because I think I’ve understood what the researchers are really getting at and I want my answer to bring them to an accurate conclusion about me overall. But of course I don’t actually know what the researchers are getting at, so my over-thinking could introduce bias.

We’ve decided to ask our piloting group the question with two different scales: the subjective ‘very often’ – ‘never’ and objective frequency measures (daily, weekly, monthly etc.). We can then check to see whether respondents interpret these consistently. We’re also going to ask specifically about this question in the focus group discussions. Finally, we’re planning to ask students what they read (e.g. fiction, non-fiction, comics, news), to help us decide whether we should ask about this in future (and whether it matters), and how they read it (e.g. books, Kindle, smartphone). Findings to follow in the coming weeks…
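One way the consistency check between the two scales might be done is sketched below. This is only a minimal illustration: the response data are invented, the function names are hypothetical, and the real analysis would depend on how the pilot data are recorded.

```python
from collections import defaultdict

# Hypothetical pilot responses: (subjective 1-5 rating, objective frequency).
# These values are invented for illustration only; they are not real data.
responses = [
    (4, "weekly"), (3, "daily"), (2, "weekly"),
    (5, "daily"), (4, "monthly"), (2, "monthly"),
]

def frequencies_by_rating(pairs):
    """Collect the set of objective frequencies reported under each rating."""
    grouped = defaultdict(set)
    for rating, frequency in pairs:
        grouped[rating].add(frequency)
    return dict(grouped)

def inconsistent_ratings(pairs):
    """Return the ratings that respondents paired with more than one frequency."""
    grouped = frequencies_by_rating(pairs)
    return sorted(r for r, freqs in grouped.items() if len(freqs) > 1)

# Ratings listed here were used to mean different things by different people.
print(inconsistent_ratings(responses))
```

With real data, a rating that appears in the output would suggest that respondents are not all thinking of the same thing when they choose it.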


Is realist review the best approach for synthesising evidence on widening participation in higher education?

I recently conducted a systematic review, looking at evidence for the effectiveness of interventions designed to widen participation (WP) in higher education. For those unfamiliar with the terminology, a ‘systematic’ review is a type of literature review, designed to ensure that research evidence is searched for and evaluated in a complete and objective way. Researchers follow a pre-defined search strategy to ensure that their personal bias can’t affect the scope of the review. Typically, this kind of review privileges evidence from randomised controlled trials and other similar experimental designs, to answer a “what works?” question.

Some of the key findings of our review were:

  • there isn’t much ‘robust’ (i.e. randomised trial or high-quality quasi-experiment) evidence about any WP interventions, and
  • among the things most likely to work are ‘black box’-type programmes – lots of different elements combined in a package, so that the trial doesn’t tell you which bits worked, only that the whole lot worked together.

There are different ways to interpret the findings. Should we conclude that we need more randomised trials of WP interventions? One of the things we suggested in the review was randomising participants to particular elements within a ‘black box’ intervention, as a step to establishing which bits work and whether some parts aren’t necessary.

That’s one way of adding to the literature. But, having worked on evaluating WP interventions for a year now, meeting lots of practitioners and seeing programmes in action in their specific contexts, I’ve started thinking about whether other types of evidence might add to the picture.

I’ve started looking further at the concept of ‘realist review’, in particular the work of Ray Pawson and colleagues. Their paper describing realist review relates to health policy interventions. The authors give seven ‘defining features’ of complex [health] service interventions that they argue make them more amenable to realist review than traditional ‘Cochrane-style’ systematic review. I’ve been thinking about whether WP interventions also have those features.

  1. The interventions are theories: they hypothesise, based on assumptions about the causes of undesirable outcomes, that delivering the intervention will bring about a change in outcomes. This applies to WP interventions, where assumptions are made about the reasons that certain groups of people are less likely to go on to higher education than others, and interventions are designed to remove barriers and change behaviour.
  2. The interventions are ‘active’: they are delivered by people, to people. Practitioners may change their delivery of the intervention based on their experiences, and participants’ life experiences will bear on whether they change their behaviour. Tick.
  3. Intervention theories have a ‘journey’ to delivery and change can happen at every step. I’m not sure that this issue looms quite as large in WP interventions as it does in healthcare, but there are some similar processes – in the UK, for example, universities’ access agreements are planned at a high level in the institution, programmes are likely to be delivered by access officers working lower down the scale, and may involve working with different academic departments where staff want to implement the programme differently to adapt to the needs and characteristics of their particular target students.
  4. Interventions are implemented in a non-linear way: the experiences of those delivering the intervention ‘on the ground’ can feed back into the design and focus of the intervention so that what was a ‘top-down’ directive can develop in a bottom-up way. Yes, practitioners have the power to change the way interventions are delivered.
  5. Interventions are ‘fragile’: they change depending on the context in which they are delivered. I’ve seen this first-hand this year: a clearly defined programme with explicit objectives can (and should) be delivered differently in different institutions, where their context (geographical, social, educational…) means that they are likely to achieve impact in different ways.
  6. Interventions are prone to be adapted for delivery, creating new interventions with varying degrees of similarity. We definitely see this in WP: many institutions deliver residential programmes for sixth form students, but their content and the other activities associated with them will differ from place to place.
  7. Interventions are ‘open systems’ that feed back on themselves: the intervention itself changes the conditions, so the intervention needs to change to remain effective. I don’t think this always applies to WP in the same way that it would in health, but there are similarities – for example, if a school-based intervention is particularly effective in raising aspirations and students start seeing their older peers moving on to HE, the focus might need to shift to more practical guidance or an earlier focus on subject choice to ensure conditions were in place to keep improving outcomes.

It seems to me that WP interventions are a good fit with the model for realist review. My current opinion is that WP interventions are social, and that the social world is inherently complex. I think that the ‘generative’ causal model, in which outcomes are caused by interactions between mechanism and context (or, more likely, multiple mechanisms interacting with multiple contextual factors), is more appropriate than a simple ‘if x then y’ experimental model. Accepting this approach would undoubtedly change the scope of a review of evidence. Further thoughts on this to follow.

Reference: Pawson, R., Greenhalgh, T., Harvey, G., & Walshe, K. (2005). Realist review – a new method of systematic review designed for complex policy interventions. Journal of Health Services Research & Policy, 10(suppl 1), 21-34.

Why Most Published Research Findings Are False (Ioannidis, 2005)

We have an ‘Academic Reading Group’ in my department and we try to discuss an academic paper each month. This month we’re discussing Why Most Published Research Findings Are False (Ioannidis, 2005).

I haven’t tried to fully explain the content of the paper here, but I wanted to present my thoughts on it. The detail of the probability modelling was a bit beyond me. I managed to get my head around the first truth table, which illustrates the possible outcomes of a research study: either your hypothesis is true (there’s a true relationship) or it isn’t, and either you find it or you don’t.


Unfortunately, the algebra after that goes beyond my understanding, so I’m taking his logic as given.

In general I think that this paper offers truths that we already know, but by quantifying factors such as the likelihood of bias and the pre-study likelihood of a hypothesis being true, it can then quantify the likelihood that a finding is true. The likelihoods illustrated in the paper are often very small – hence the conclusion that most research findings are false.

In his discussion of bias Ioannidis talks about lots of things we already believe to be true: researchers can knowingly or unknowingly fiddle results so as to increase their likelihood of a ‘positive’ finding; the tendency to publish positive findings (both in journals and in the press) increases the benefits of biased behaviour for individuals and therefore makes it more likely. Bias is really bad, invalidates your research, and if you want to do good research you must do your best to avoid bias. As I said, we already believe this to be true.

I wondered about the validity of modelling the probability of bias – is it possible to place a probability on how humans will behave in the conduct of a research trial? I concluded that it doesn’t matter if it’s empirically valid to do this for an individual trial – the aim of the paper is to model the theoretical effects of more, or less, bias. We couldn’t say that a particular trial had a ‘bias probability’ of 0.6, but we could imagine the theoretical case where the bias probability of a given trial is more or less close to 1, i.e. the trial is more or less likely to be totally biased.

The same thought process applies to the quantification of the prior probability of a true relationship. We don’t know the actual probability (if we did, we wouldn’t be doing a trial; there wouldn’t be a scientific question to answer) but we can conceptualize different points on the scale and draw conclusions accordingly.

So then I started thinking about how the probabilities relate to my field, education research. Perhaps I’m looking through rose-tinted glasses, but I’d put our prior probabilities of a true relationship pretty high – especially for large, highly funded trials. Perhaps they could be lower when governments fund trials of a pet policy that’s based on whim rather than evidence – but I’d like to think that doesn’t happen too often. Ethically, when you commission a trial you should be in a position of equipoise – genuinely unsure whether the intervention works or not – which I interpret as prior odds of 1:1 or pretty close to it.
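To get a feel for what equipoise would mean in the paper’s terms, here is a rough sketch of the positive predictive value (PPV) formula as I understand it from Ioannidis (2005), where R is the pre-study odds of a true relationship, α the type I error rate, β the type II error rate and u the proportion of analyses affected by bias. The default values of α = 0.05 and power = 0.8 are my own assumptions for illustration, and the function name is hypothetical.

```python
def ppv(prior_odds, alpha=0.05, power=0.8, bias=0.0):
    """Post-study probability that a 'positive' finding is true.

    With bias = 0 this reduces to (1 - beta) * R / (R - beta * R + alpha).
    """
    beta = 1.0 - power          # type II error rate
    r, u = prior_odds, bias     # R and u in the paper's notation
    true_positives = (1 - beta) * r + u * beta * r
    all_positives = r + alpha - beta * r + u - u * alpha + u * beta * r
    return true_positives / all_positives

# Equipoise (prior odds of 1:1), with and without an assumed level of bias.
for u in (0.0, 0.2, 0.6):
    print(f"bias={u:.1f}  PPV={ppv(1.0, bias=u):.3f}")
```

Under these assumed values, a well-powered trial started from equipoise gives a fairly high PPV when there is no bias, and the PPV falls as the assumed bias rises.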

Bias is always a concern, especially in trials where intervention developers are closely involved. But lots of the steps that Ioannidis recommends to minimize bias, such as registering trials and following reporting guidelines like CONSORT, are things that we do already and that funders like the EEF are on board with.

Ioannidis’ thoughts on “null fields” – scientific fields that are eventually superseded by new findings, perhaps due to new experimental methods, that prove the entire field to have been meaningless and chasing false hypotheses – were one of the most interesting parts of the paper. I wondered whether this kind of argument could apply to social research – surely we have some privileged understanding of how human and social causal mechanisms work which means we couldn’t make such a mistake?

But there are examples to the contrary from small ‘sub-fields’ in education research – learning styles, for example, or Brain Gym, were ‘supported’ by research until findings to the contrary were published. More broadly, might we eventually find that educational intervention doesn’t make a sustained difference at all, and attainment outcomes are determined solely by parental background, socio-economic status and related factors?

Finally, I wondered about the weight that Ioannidis gives to the ‘problem’ of single studies being treated as definitive truthful findings. Perhaps this is a bigger problem in medical and biological research than it is in social sciences. In education research at the moment there is some really strong work on meta-analysis – the popularity of the EEF Toolkit is one example. I think the worst culprit for over-valuing single studies is the media – which is a serious problem if we want to do research that is understood and used outside of the academic community. The problem of how to communicate the subtleties of interpreting research evidence in a simple and accessible way is one I will return to in later posts.

Reference: Ioannidis, J. P. (2005). Why most published research findings are false. PLoS Medicine, 2(8), e124.

Universal Free School Meals – a good policy?

This morning I tweeted my disagreement with the new free school meals policy (that school lunches are free for all children in reception, Year 1 and Year 2). I thought it unfair that school meals are now free for children whose parents can easily afford to pay – especially when harsh and disproportionately allocated budget cuts are affecting those most in need. I thought that the policy was a blatant attempt at winning votes. 

Then I read the report on the free school meals pilot. I’m still not entirely convinced by the policy and I think there are problems with the implementation, but the evidence base for its success looks better than a lot of other government policy. 

The headline finding is that in the pilot, universal free school meals improved attainment. The intervention ‘closed the gap’, with the least affluent and lowest attaining before the pilot making the biggest gains. The gains weren’t massive, but they were ‘significant’ (in the statistical sense): around two months of extra progress. That’s more beneficial than homework for primary pupils, according to the EEF Toolkit. However, the intervention is much more expensive than a lot of other options, so it isn’t clear that it offers good value for money. 

I’m not convinced that pupils will eat a more nutritious diet. In the pilot they ate fewer bags of crisps, but also fewer pieces of fruit. With academies and free schools unregulated in the quality of their food provision, there is a risk that lunches for many pupils – especially those whose parents were able to provide healthy packed lunches – could be less healthy than before. But the benefits of treating everybody the same, and ensuring that the poorest children are provided for, might outweigh this risk. 

The pilot compared universal free school meals (in two authorities) with an increased, but still means tested, entitlement to free school meals in another area. They found that where entitlement was increased, attainment didn’t increase. Many parents who were newly eligible did not take up the offer – perhaps because of stigma, or because of the requirements of the application process. The clear indication is that if you want all those who are vulnerable and would benefit from free school meals to receive them, the best way to achieve that is to make them free for everybody. 

There’s also a positive economic argument. The government are quoting savings of up to £400 for parents. For me, with one child starting reception and the other going into Year 2, we’ll be saving over £600 this year. We won’t be keeping that money back for a rainy day – we’ll be spending it, either on essential household maintenance that always seems to get put off as money runs short at the end of the month, or on days out with the kids, occasional new clothes and the odd meal out – things that we can’t always afford. Either way, the money will go straight back into the economy. I’m sure that many families around the UK, who aren’t quite poor enough to be eligible for free school meals on the previous criteria but who nonetheless struggle to get by as living costs continue to soar and salaries fail to keep pace, will be in the same position. Paying a living wage would be a better solution, but at least this policy gets a little cash back into the pockets of those who need it. Having it also go to those who don’t need it is, as the pilot showed, a necessary part of ensuring take-up across the board.

There are some flaws with the implementation of the policy. As is often the case, delivery has been rushed and funding hasn’t covered all the costs, leaving some schools struggling to deliver on time and without cutting money elsewhere. This could present an important difference between the pilot and implementation – if schools have spent so much time and money on free school meals that other areas of learning have suffered, attainment gains could be reduced. 

A stepped wedge randomised roll-out, where the policy is delivered gradually in an increasing number of randomly selected areas to allow robust comparisons to be drawn, would have offered a great opportunity to evaluate the impact. As it is, any gains can’t be measured as there will be no ‘untreated’ group to compare with. 
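As a rough illustration of what a stepped wedge roll-out involves, the sketch below randomly assigns a set of areas to the step at which they would start receiving the policy. The area names, the number of steps and the function names are all hypothetical; a real design would also need to consider stratification, cluster sizes and the analysis model.

```python
import random

def stepped_wedge_schedule(areas, n_steps, seed=0):
    """Randomly assign each area to the step at which it starts the policy."""
    rng = random.Random(seed)   # fixed seed so the allocation is reproducible
    shuffled = list(areas)
    rng.shuffle(shuffled)
    # Deal the shuffled areas out round-robin so steps are as even as possible.
    return {area: (i % n_steps) + 1 for i, area in enumerate(shuffled)}

def treated_at(schedule, period):
    """Areas already receiving the policy in a period (period 0 = baseline)."""
    return sorted(a for a, start in schedule.items() if start <= period)

areas = [f"area_{i}" for i in range(1, 9)]   # hypothetical area names
schedule = stepped_wedge_schedule(areas, n_steps=4)
for period in range(0, 5):
    print(period, treated_at(schedule, period))
```

In each period, the areas that have not yet started act as the comparison group for those that have, and by the final step every area is receiving the policy.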

Further research could be commissioned this year to help understand whether universal free school meals are having an effect, and to test hypotheses based on the pilot. The gains in attainment weren’t down to reduced absences, and parents didn’t report improved behaviour. But did universal free school meals lead to improved afternoon concentration? Did school mealtimes with everyone eating the same thing (rather than those who could only afford poor-quality packed lunches sitting on separate tables) lead to a sense of community, more shared conversation and improved classroom climate? Did the information given to children and parents as part of the pilot cause the improvement, rather than the food itself? (This is another difference which means the gains may not be replicated, as the information campaign will not be repeated for the national roll-out.) Some properly funded case studies, surveys and quantitative analysis of attainment and other school data could help answer some of these questions.

I’m still not 100% convinced that the policy is a good idea. I know that some bloggers, such as Not Very Jolley, think the policy is an unmitigated disaster and give convincing arguments for that position (so convincing that I nearly didn’t bother with this post, as the issues are covered in much more depth over there). But I think it is based on better research than a lot of government policy, and has some points to recommend it. I’ll be looking out for follow-up research.