Market research scales, ranks and trade-offs

Most standard market research is built on rating scales - score out of 10, agree-disagree, satisfaction, likelihood to buy. Among the research community there can be heated debate about which type of scale is best. For trade-off researchers, scales have more limited value - we'd generally prefer ranks or choices. Why is this?


Scales in market research

Rating scales are meat and drink to market researchers - please rate each of these items out of 10, how much do you agree or disagree with each of these statements, how satisfied are you, out of 10 how likely are you to recommend (the Net Promoter Score - NPS), how likely are you to buy this product at this price? Scales come in many different forms and formats, the most common being Likert scales. The benefits of scales are that they are easy to ask, provide data that can be analysed statistically, and are stable on repeated measures across a sample (though not necessarily at the individual level).

Most scales are ordinal in nature - that is, they express an order (Disagree strongly is worse than Disagree, for instance) - and they can be scored in order to calculate a mean if required. Many researchers dislike mean scores from scales as means hide the distribution and can be harder to understand. Instead you often find researchers reporting 'top-box' scores - usually the top two items. If you are reporting mean scores, it can be extremely useful to rescale the mean so it runs from 0 to 100 (for instance on a four-point scale Disagree strongly=0, Disagree=33, Agree=66, Agree strongly=100, or on a five-point scale you set the score points at 0, 25, 50, 75 and 100). If you are asking respondents to rate on a 1 to 10 or 1 to 5 scale, one minor point that sometimes gets overlooked is that the middle point for the mean is 5.5 for a 1 to 10 scale, and 3 for a 1 to 5 scale. These can feel intuitively odd.
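
As an illustration, here is a minimal sketch in Python of rescaling a labelled scale onto 0 to 100 - the labels, weights and responses are illustrative, not from any particular survey:

# Four-point agree-disagree scale mapped onto 0-100 (illustrative weights)
weights_4pt = {
    "Disagree strongly": 0,
    "Disagree": 33,
    "Agree": 66,
    "Agree strongly": 100,
}

responses = ["Agree", "Agree strongly", "Disagree", "Agree", "Agree"]
scores = [weights_4pt[r] for r in responses]
print(f"Mean on 0-100 scale: {sum(scores) / len(scores):.1f}")

# Generic rescaling of a 1-to-k numeric rating onto 0-100:
# (rating - 1) / (k - 1) * 100, so 1 maps to 0 and k maps to 100
def rescale(rating: int, k: int) -> float:
    return (rating - 1) / (k - 1) * 100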

The most common questions about scales are: how many points? Should there be a midpoint? Should the points be labelled? And can you transform old answers onto a new scale? The number of points is down to the researcher as much as anything. A researcher with a particular theoretical slant is more likely to go for a 7-point scale as it gives more data for later analysis. A ten-point 1 to 10 scale (or preferably an 11-point 0 to 10 scale - respondents scoring low prefer to be able to allocate zero points rather than one) is familiar from school, but respondents tend not to use the whole scale - there is a bias towards the top half (7, 8, 9, 10). This is one reason the Net Promoter Score derives its score from the difference between those giving 9 or 10 and those giving less than 7.
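
Based on the 9-10 versus less-than-7 definition above, a minimal Python sketch of the NPS calculation (the sample ratings are illustrative):

def net_promoter_score(ratings):
    # Promoters score 9 or 10; detractors score 6 or below.
    promoters = sum(1 for r in ratings if r >= 9)
    detractors = sum(1 for r in ratings if r <= 6)
    return 100 * (promoters - detractors) / len(ratings)

print(net_promoter_score([10, 9, 8, 7, 6, 3, 10]))  # about 14.3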

Though 7-point scales are the most 'pure' theoretically, it would be fair to say that most researchers use a 5-point scale. A seven-point scale is difficult to label so the intervals seem equal; it's much easier on a 5-point scale. Some researchers offer a 1 to 5 numerical scale. As with 1 to 10, this can be a little awkward as you're asking respondents to allocate 1 point to something they would prefer to allocate zero to. Secondly, unlike a 0 to 10 scale where intuitively 10 is best, on a 1 to 5 scale 1 might be best (rank one), and where the scale is agree-disagree it can be confusing which end of 1 to 5 means which (at least on the telephone - on screen or on paper it is easier). For this reason a 5-point scale is often best presented with labelled points (agree strongly, agree a little...; very satisfied, quite satisfied... etc). On the telephone a five-point scale can also be used as a 'roll out' scale. That is, rather than label all five points for the respondent, you split the question in two: Do you agree or disagree with this statement? Is that a lot or a little? The result is Disagree a lot, Disagree a little, Neither, Agree a little, Agree a lot. After a couple of goes respondents know the scale and automatically start to answer Disagree a little, Agree a lot, and so on.
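
A minimal sketch of how the two roll-out answers might be combined into one five-point code (the question wording and codes are illustrative):

from typing import Optional

def roll_out(direction: str, strength: Optional[str]) -> str:
    # Combine the two telephone questions into one five-point answer.
    if direction == "Neither":
        return "Neither"
    return f"{direction} {strength}"

print(roll_out("Agree", "a lot"))        # Agree a lot
print(roll_out("Disagree", "a little"))  # Disagree a little
print(roll_out("Neither", None))         # Neither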

Mid-points are the next issue. All the scales above have a mid-point, and they are balanced scales - equal numbers of positive and negative items. Some scales naturally have no mid-point (e.g. likelihood to purchase: Very likely, Quite likely, Not very likely, Not at all likely). The debate is over whether a five-point scale should really be a four-point one. On the telephone it's common to offer a four-point scale explicitly (e.g. do you agree or disagree) but allow the interviewer to code Neither, so that Neither is distinct from Don't know. On screen or on paper this is more difficult. Since respondents get annoyed if they are forced off the fence, it is often better for completion rates and accuracy to include the mid-point. It's also common to switch the direction of statements so respondents have to alternate between positives and negatives and so stay 'on task', rather than mechanically answering 5, 5, 5...
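
When statements are switched in direction like this, the negatively worded items need reverse-scoring before any averaging. A minimal sketch, with illustrative item wording, on a 1 to 5 scale:

SCALE_MAX = 5  # 1-5 agree-disagree scale

# Illustrative negatively worded statements
negatively_keyed = {"The service is slow", "Staff seem uninterested"}

def align(item: str, rating: int) -> int:
    # Flip the rating for negatively worded items: 1 <-> 5, 2 <-> 4, etc.
    return SCALE_MAX + 1 - rating if item in negatively_keyed else rating

answers = {"The service is friendly": 4, "The service is slow": 2}
aligned = {item: align(item, r) for item, r in answers.items()}
print(aligned)  # {'The service is friendly': 4, 'The service is slow': 4}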

Though theoretically scales are better balanced, in practice unbalanced scales can also be used, particularly where there is a strong natural bias towards a positive rating. Donors to a charity, for instance, tend to eulogise, pushing scores to the positive end. For tracking purposes this is difficult if everything is always rated at the top and no one rates negatively. For this reason additional superlative levels may be added to the scale, or alternative ways of asking the question used.

There can also be a tendency to overuse scales. Take "You know a lot about the brand": do you agree or disagree? If the aim is to understand level of knowledge, it might be better asked as "How much do you know about the brand?" - Never heard of it; Heard of it, but know nothing; Know a little; Know a lot. The clearer, more categorical approach has more meaning for respondents and is easier to interpret in analysis. A similar categorical approach is used in Kano analysis to determine which features are needed for a new product.

Another alternative to a straight Likert scale is a split rating: Which of these two brands is better quality? with a scale running from Brand 1 to Brand 2. A second approach is an associative battery: "Which of these brands are friendly?" with a list of brands that can be associated with the word, followed by "Which of these brands are unfriendly?". This associative approach leads to measures known as "image strength" (the total associations made for a brand) and "image character" (the direction of those associations - positive or negative).
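
A minimal sketch of these two measures from associative counts - the brands and counts are illustrative, and character is computed here as a net positive-minus-negative share, one plausible operationalisation among several:

positive = {"Brand A": 120, "Brand B": 40}   # 'friendly' associations
negative = {"Brand A": 30,  "Brand B": 5}    # 'unfriendly' associations

for brand in positive:
    strength = positive[brand] + negative[brand]   # total associations made
    character = (positive[brand] - negative[brand]) / strength  # net direction
    print(f"{brand}: strength={strength}, character={character:+.2f}")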

Online, many scales have been transformed into graphical sliders or selectors to encourage participation. Respondents are generally put off by large gridded questions, so these graphical versions are much friendlier to complete.

Scales also form the basis of a great many statistical techniques for understanding markets. They can be used as independent variables in a regression model to determine which ratings are most important in decision making (note though that correlation doesn't necessarily imply causation - there is a great deal of evidence that people change behaviour before they change attitude). Other applications include perceptual mapping, cluster analysis for segmentation, and factor analysis to distil out key drivers.
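
A minimal sketch of this kind of driver analysis in Python, regressing an overall measure (say, likelihood to recommend) on attribute ratings. The data here is simulated purely for illustration, and the caveat above still applies - a large coefficient shows association, not cause:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200
friendly = rng.integers(1, 6, n)   # illustrative 1-5 ratings
value = rng.integers(1, 6, n)
recommend = 2 + 0.8 * friendly + 0.3 * value + rng.normal(0, 1, n)

X = sm.add_constant(np.column_stack([friendly, value]))
model = sm.OLS(recommend, X).fit()
print(model.params)  # larger coefficients suggest more 'important' drivers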

With such a large number of options for scales and ratings it's not surprising they are used extensively in market research (and also in clinical trials as a measurement of outcomes, such as pain reduction). Scales are easy to create and use, have an intuitive feel, and are stable across a sample over time, so they can be used for tracking and measuring improvement.

There are, however, three major criticisms of scales. The first is that they are 'fuzzy' or unclear in meaning. What does a respondent mean if they say they agree slightly with something? Is this good or bad? Will it affect their decision making? We can assess some of this by statistically linking scale scores to behaviour, but it is difficult to relate a change in scale rating to a change in behaviour. In particular, for many respondents the personal rating is not stable - they might give a different rating to the same measure one or two weeks later (or even within the same survey). So though the rating is stable at the sample level, if individuals' views apparently switch readily, can it be used as a predictive tool?
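
The sample-stable but individually-unstable pattern is easy to see in a toy simulation - a minimal sketch with invented data, where each respondent's repeat rating jitters by a point yet the sample mean barely moves:

import numpy as np

rng = np.random.default_rng(1)
true_view = rng.integers(4, 10, 500)                      # underlying 0-10 views
wave1 = np.clip(true_view + rng.integers(-1, 2, 500), 0, 10)
wave2 = np.clip(true_view + rng.integers(-1, 2, 500), 0, 10)

print("% of individuals who changed:", np.mean(wave1 != wave2) * 100)
print("sample means:", wave1.mean(), wave2.mean())        # nearly identical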

The second criticism is that, inside the business, understanding and implementing change based on scale ratings is hard. If 20% of customers think you are unfriendly, is this good or bad? How do you change? Is it something you should change? How much effort and money should you put towards changing? And what will the return be?

And the third criticism is that the measures are 'thinking' measures - they make explicit items that people usually think about or react to implicitly. By making them explicit, the respondent takes social norms into account: what would other people think if I said I hate recycling? Some of this can be mitigated by the way the question is asked - in face-to-face surveys, for instance, with shuffle boards, where the items to be rated are written on small cards and sorted onto a board marked up with the scale categories - but there is still a tendency to force a post-rationalised answer, sometimes on very weakly held beliefs.

In general, choice researchers prefer categories to abstract scales - though naturally we use scales too. If you force someone to make a choice, for instance between two options, or by asking for a ranking or a top three, you get a better perspective on the relative values of items. For researchers from a scale-based background the power of ranks can seem difficult to grasp, and there is a tendency to try to turn the rank back into a scale ("how do I score the ranks?" is a common question). But ranks indicate preference and trade-offs. They can be interpreted using statistical tools like those available for conjoint analysis, and we can ask modelling questions such as: if X wasn't available, what would people choose next? Ranks have issues too - determining the 'step size' between items, for instance - so a number of hybrid techniques combine elements of scales and ranks to provide more information.
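
A minimal sketch of that kind of what-if question using rankings: if a brand is withdrawn, each respondent's choice falls to their next-ranked option. The rankings below are illustrative:

from collections import Counter

rankings = [
    ["Brand A", "Brand B", "Brand C"],
    ["Brand A", "Brand C", "Brand B"],
    ["Brand B", "Brand A", "Brand C"],
]

def first_choices(rankings, removed=None):
    # Each respondent picks their highest-ranked brand still available.
    return Counter(next(b for b in r if b != removed) for r in rankings)

print(first_choices(rankings))                     # current choice shares
print(first_choices(rankings, removed="Brand A"))  # shares if Brand A withdrawn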

And, as conjoint analysis shows, we would prefer categorical answers to scaled or ordinal answers, as they give a fuller description of where customers' needs lie and what they want. You don't want a 'quite good' or 'very good' camera, you want a DSLR or a Canon - a specific category, not a rating. Defining categories can be difficult, but knowing categorical preferences is much more powerful than just understanding rated preferences.


