Tuesday, 21 December 2010

Terrible 'science' from the Scottish Government

Some recent evaluations of interventions and associations closely tied to the current mental health agenda in Scotland highlight the shift away from 'evidence-based policy' towards 'policy-based evidence'.

The first was "An Evaluation of wellness planning in self-help and mutual support groups", produced by the Scottish Centre for Social Research and published in September 2010. Essentially, this was an attempt to determine the benefits of WRAPs (Wellness Recovery Action Plans), which are advocated by the Scottish Government and are tightly integrated with the Recovery model that is driving most aspects of mental health service delivery in the UK and beyond.

The remit for the evaluation was: "...to assess the relevance, impact and effectiveness of Wellness Recovery Action Planning (WRAP) as a tool for self management and wellness planning by individuals with mental health problems from pre-existing and newly-formed groups, where the possibilities for continued mutual support in the development of WRAPs could be explored."

The approach was 'mixed methods', which typically combines qualitative and quantitative methods to address different aspects of the particular research question. Whilst some quantitative information was collected, the bulk of the evaluation relied on qualitative information from interviews and focus groups.

As with most such evaluations of policy, there is a dissociation between the objectives and the methods. The purpose of the evaluation included a specific aim to determine the "effectiveness" of WRAPs, but data on whether they make a difference is notably absent. There are lots of anecdotes about how great everything is, but it's when the researchers try to get to grips with some numbers that the problems really start.

Table 2 is shown below. It compares scores on the Recovery Assessment Scale (RAS) and the Warwick-Edinburgh Mental Wellbeing Scale (WEMWBS) in people before and after their WRAP training:

[Table 2: pre- and post-WRAP training RAS and WEMWBS scores by group]
The authors report: "...that RAS scores increased in all groups, and WEMWBS scores in all but one group, after the respondents had completed their WRAP training. This suggests that both the facilitators and participants had more positive views in relation to their own sense of recovery and well-being having been trained..."

Also: "...the pre- and post- WRAP training questionnaires were not completed by the same number of people. Any differences between pre- and post- WRAP training scores might, therefore, be due to the fact that people with higher scores completed the post-WRAP training questionnaires."

Indeed. Had the authors looked at the table, they might have noticed that when the pre- and post-groups were the same size, the change is not very great. The numbers in each group are so small that any changes are likely to be non-significant (i.e. attributable to chance alone). The only groups that show improvements are those where some people did not return forms. Some groups (e.g. the Tayside Carers Groups 1 and 2) had more respondents in the post-group than existed in the pre-group, which suggests poor handling of data and/or that different people were completing forms pre- and post-training.

In the groups with missing post-group data, the lower end of the range often shifts up whilst the upper end stays put, which suggests that those who scored low at pre-testing were not included at post-testing. One can infer this because a change of 10 on the WEMWBS is improbable given the psychometric properties of the scale, although very little is actually known about its sensitivity to change. A simple simulation of this attrition effect is sketched below.
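To see how this kind of attrition can manufacture an "improvement", consider a minimal simulation. The numbers here are invented purely for illustration and have nothing to do with the study's actual data:

```python
import random

random.seed(1)

# Hypothetical WEMWBS-style scores (the scale runs 14-70) for a small group.
pre = [20, 25, 31, 38, 44, 52, 58, 63]

# Assume no true change: post-training scores are the pre scores plus a
# little random noise, clipped to the 14-70 scale.
post = [min(70, max(14, s + random.randint(-3, 3))) for s in pre]

# Attrition: suppose the three lowest scorers never return their forms.
returned = sorted(post)[3:]

print(f"pre:  mean {sum(pre) / len(pre):.1f}, range {min(pre)}-{max(pre)}")
print(f"post: mean {sum(returned) / len(returned):.1f}, "
      f"range {min(returned)}-{max(returned)}")
# The post mean and the bottom of the range both rise, purely because of
# who dropped out; nobody actually improved.
```

Exactly this signature (a raised lower bound with a static upper bound) is what several of the groups in Table 2 appear to show.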

The authors should have known that averaging very small groups is problematic: the average of two people with scores of 2 and 100 on a 100-point scale is 51. However, this score is representative of neither of them. Trying to calculate the mean for such small samples is simply daft, and betrays a degree of statistical ignorance. This doesn't stop the researchers from spinning the poor data: "However, these results do support the very positive views expressed by facilitators and group participants in the main qualitative phase of the study."
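A quick simulation makes the point about unstable small-sample means. Again, these are illustrative numbers only, assuming a uniform spread of scores on a 100-point scale:

```python
import random
import statistics

random.seed(2)

# Hypothetical population of scores on a 100-point scale.
population = list(range(1, 101))

def sample_means(n, trials=1000):
    """Means of repeated random samples of size n."""
    return [statistics.mean(random.sample(population, n))
            for _ in range(trials)]

for n in (2, 5, 30):
    means = sample_means(n)
    print(f"n={n:2d}: means range {min(means):.1f}-{max(means):.1f}, "
          f"sd {statistics.stdev(means):.1f}")
# With n=2 the 'average' can land almost anywhere on the scale; only with
# larger samples does it settle near the true population mean of 50.5.
```

With two or three respondents, the "group mean" can land almost anywhere on the scale, which is why the report's group averages carry so little information.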

The poor quality of the data analysis is further demonstrated by the facilitators' scores pre- and post-WRAP training, shown below as Table 1:

[Table 1: facilitators' RAS and WEMWBS scores before and after WRAP training]
The authors helpfully point out that scores on the WEMWBS can range from 14-70. It is therefore puzzling that the pre-WRAP WEMWBS scores are reported as ranging from 76-100 (entirely above the top of the scale), yet with an average of 49.4. That is arithmetically impossible: a mean cannot lie below the minimum of the scores it summarises, and none of the scores should exceed 70 in the first place.
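Even a trivial sanity check would have caught this before publication. Something like the following sketch simply asserts that a reported mean must lie within its own range, and that the range must fall within the scale bounds (here defaulting to the WEMWBS bounds of 14-70 quoted in the report):

```python
def check_summary(mean, low, high, scale_min=14, scale_max=70):
    """Flag internally inconsistent summary statistics."""
    problems = []
    if not (scale_min <= low <= high <= scale_max):
        problems.append(f"range {low}-{high} falls outside the "
                        f"{scale_min}-{scale_max} scale")
    if not (low <= mean <= high):
        problems.append(f"mean {mean} lies outside its own range {low}-{high}")
    return problems

# The facilitators' reported pre-WRAP WEMWBS figures:
for p in check_summary(mean=49.4, low=76, high=100):
    print("impossible:", p)
```

Run against the reported figures, it flags both problems at once.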

Unfortunately, this is not a new phenomenon in Government-commissioned evaluations of its own policy. The quantitative evaluation is handled as if the researchers didn't have a GCSE in maths, and yet all the conclusions are positive. A further example of this will follow.

More papers on the WEMWBS can be found via the links below.

TENNANT, R., FISHWICK, R., PLATT, S., JOSEPH, S. & STEWART-BROWN, S. (2006) Monitoring positive mental health in Scotland: validating the Affectometer 2 scale and developing the Warwick-Edinburgh Mental Well-Being Scale for the UK. Edinburgh, NHS Health Scotland, University of Warwick and University of Edinburgh.

TENNANT, R., HILLER, L., FISHWICK, R., PLATT, S., JOSEPH, S., WEICH, S., PARKINSON, J., SECKER, J. & STEWART-BROWN, S. (2007) The Warwick-Edinburgh Mental Well-being Scale (WEMWBS): development and UK validation. Health and Quality of Life Outcomes, 5, 63.

STEWART-BROWN, S., TENNANT, A., TENNANT, R., PLATT, S., PARKINSON, J. & WEICH, S. (2009) Internal Construct Validity of the Warwick-Edinburgh Mental Well-being Scale (WEMWBS): a Rasch analysis using data from the Scottish Health Education Population Survey. Health and Quality of Life Outcomes, 7, 15.
