Market research scales, ranks and trade-offs

Most standard market research relies on rating scales - score out of 10, agree-disagree, satisfaction, likelihood to buy.

Scales are not the only possible method of measurement. Using choices, ranks and trade-offs through techniques like conjoint analysis or MaxDiff can provide more actionable data for modelling and forecasting market behaviour.

Scales in market research

Rating scales are meat and drink to market researchers - "Please rate each of these items out of 10", "How much do you agree or disagree with each of these statements?", "How satisfied are you?", "Out of 10 how likely are you to recommend?" (net promoter score - NPS), "How likely are you to buy this product at this price?"

Scales come in many different forms and formats. The most common type is the Likert scale - also known as an agree-disagree scale - where agreement is measured on five or seven ordered points (hence an ordinal scale). Ratings, typically from 1 to 10, are another use of scales to indicate a level of performance.

The benefit of scales is that they are easy to ask, provide data that can be analysed numerically for statistical analysis and that are stable on repeated measures across a sample (though not necessarily at the individual level).

How many points and what format?

The most common questions about scales are: how many points? Should you have a midpoint? Should you label the points? And can you transform old answers into a new scale? Though these look like easy questions, there is a surprising amount of academic study and debate around them, and some decisions come down to researcher preference.

For more academic researchers, a 7-point scale is often used as it gives more data for later analysis. Though 7-point scales are most 'pure' theoretically, it would be fair to say that most commercial researchers would use a 5-point scale as it is easier to label the points. In customer satisfaction a ten-point scale from 1 to 10 is often used, as it is familiar from use in grading papers and tests. An 11-point scale from 0 to 10 is preferable, as people scoring low tend to prefer being able to allocate zero points rather than 1 point.

In practice, respondents don't use the whole of a 1 to 10 scale evenly. There is a bias towards using the top half of the scale (7, 8, 9, 10) when allocating scores. This bias is explicitly used as the basis of Net Promoter scores, where high scores (9, 10) are considered promoters and low scores (0-6) are considered detractors, recognising the unevenness of the scale in practical use.
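As an illustration of how the promoter and detractor bands turn into a single figure, here is a minimal Python sketch of the usual Net Promoter calculation (percentage of promoters minus percentage of detractors). The function name and the example scores are made up for illustration.

```python
def net_promoter_score(scores):
    """Return NPS (promoters % minus detractors %) for 0-10 recommendation scores."""
    if not scores:
        raise ValueError("at least one score is required")
    promoters = sum(1 for s in scores if s >= 9)   # scores of 9 and 10
    detractors = sum(1 for s in scores if s <= 6)  # scores of 0 to 6
    # Scores of 7 and 8 stay in the base but count in neither group.
    return 100.0 * (promoters - detractors) / len(scores)


# Hypothetical answers from ten respondents.
example_scores = [10, 9, 9, 8, 8, 7, 7, 6, 5, 10]
print(net_promoter_score(example_scores))
```

Because the middle scores are ignored, two samples with the same mean can give quite different Net Promoter figures.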

Numeric scales can also be used to convert verbal points (eg 'very valuable', 'somewhat valuable', 'not very valuable', 'not at all valuable') into numeric points from 1 to 4. On telephone surveys, this saves the respondent from remembering the actual names. However, the order needs to be made very clear - is 1 good or bad?

On the telephone a five-point scale can also be used as a 'roll out' scale. That is, rather than label all five points for the respondent, you split the question in two. The first part is to ask "Do you agree or disagree?" Then follow this with "Is that a lot or a little?" The combined result is "Disagree a lot, Disagree a little, Neither, Agree a little, Agree a lot". After a couple of goes respondents know the scale and automatically start to answer "Disagree a little", "Agree a lot" etc.
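For coding purposes, the two spoken answers can be combined into one five-point value. The sketch below shows one way this might be done. The numeric codes (1 for "Disagree a lot" up to 5 for "Agree a lot") and the function name are assumptions, not a fixed convention.

```python
def roll_out_code(direction, strength=None):
    """Combine the two spoken answers into one five-point code (1 to 5).

    Assumed coding: 1 = Disagree a lot, 2 = Disagree a little, 3 = Neither,
    4 = Agree a little, 5 = Agree a lot.
    """
    direction = direction.strip().lower()
    if direction == "neither":
        return 3  # no follow-up question is needed
    if direction not in ("agree", "disagree"):
        raise ValueError(f"unexpected direction: {direction!r}")
    if strength is None:
        raise ValueError("strength is required unless the answer is 'neither'")
    strength = strength.strip().lower()
    if strength not in ("a lot", "a little"):
        raise ValueError(f"unexpected strength: {strength!r}")
    if direction == "disagree":
        return 1 if strength == "a lot" else 2
    return 5 if strength == "a lot" else 4


print(roll_out_code("Agree", "a little"))
```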

For online surveys, scales may be replaced by sliders, or visual points - eg smiley to sad faces. Using images rather than words can help standardise meaning in international projects, assuming the same cultural inferences are taken from the images.

In practice, the choice of scale will depend on the subject, the presentation format and whether it will be written or spoken. Because scales are easy to create, there is a temptation to ask too many questions. Large question grids with banks of attitude statements and scales are a common reason for drop-out on online surveys.

Mid points

Scales without a mid-point (ie removing the neutral 'neither' point) force opinions in one direction or another. Ratings and likelihood scales naturally have no mid-point - they run from high to low ('Very likely', 'Somewhat likely', 'Not very likely', 'Not at all likely') - so the choice for mid-points is normally around Likert-type scales.

The reason for forcing a choice is that it can make statistical analysis easier and it reduces the problem of mid-lining - simply choosing the mid-point on all answers to complete the survey quickly.

The debate comes on whether a five-point scale should really be a four-point scale. On the telephone, it's common to offer a four-point scale explicitly (eg "Do you agree or disagree?"), but allow the interviewer to code for 'Neither', so that 'Neither' is kept distinct from 'Don't know'. On screen or on paper, forcing responses is more difficult, and often creates annoyance for respondents if they feel their answer is not represented.

Since most surveys are now self-complete online, it is often better for completion rates and accuracy to include the mid-point, or at least a "Can't Say" option.

Straightlining

A common quality problem seen on scale and grid questions is straightlining. That is, the respondent simply gives the same answer to all questions - to get through a question quickly, through boredom, inattentiveness or distraction, or just to finish the survey fast to get to a reward. Straightlining is one of the quality checks we use to assess response quality on a survey. If the overall quality of responses is low, then the questionnaire will be rejected.
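A simple straightlining check can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the threshold at which a questionnaire is rejected is left open because it is a judgment for each survey.

```python
def is_straightlined(grid_answers):
    """True if every answered item in one grid has the same rating."""
    answered = [a for a in grid_answers if a is not None]  # skip unanswered items
    return len(answered) > 1 and len(set(answered)) == 1


def straightlining_rate(respondent_grids):
    """Share of a respondent's grids that were straightlined (0.0 to 1.0)."""
    if not respondent_grids:
        return 0.0
    flagged = sum(1 for grid in respondent_grids if is_straightlined(grid))
    return flagged / len(respondent_grids)


# Hypothetical respondent who answered three grid questions.
grids = [[5, 5, 5, 5, 5], [4, 4, 4, 4], [2, 5, 3, 4]]
print(straightlining_rate(grids))
```

A single straightlined grid can be genuine opinion, so a rate across several grids, combined with other quality checks, is usually a safer basis for rejection than any one grid.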

To counter straightlining, the statements should be framed so that respondents would be expected to switch between positive and negative ratings, and so keep 'on task' rather than just run mechanically saying 5, 5, 5... So instead of just giving statements in the positive, encouraging agreement, some statements would be reversed with an expectation of disagreement ("This store is clean", "The store is poorly laid out").

This would then be combined with randomising the order in which the statements are shown, both to minimise order effects (top items getting rated higher because they are the first statements) and to ensure that the order differs for each respondent.
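The two ideas above, per-respondent randomisation and reversed statements, might look like this in a survey script. It is a sketch only: the third statement, the respondent identifier and the seeding scheme are hypothetical, and reverse-scoring assumes a numeric scale with known end points.

```python
import random

# (statement text, is_reversed). The first two come from the example above,
# the third is a hypothetical addition.
STATEMENTS = [
    ("This store is clean", False),
    ("The store is poorly laid out", True),
    ("Staff are easy to find", False),
]


def presentation_order(statements, respondent_id, seed=0):
    """Return the statements shuffled reproducibly for one respondent."""
    rng = random.Random(f"{seed}-{respondent_id}")  # separate generator per respondent
    order = list(statements)
    rng.shuffle(order)
    return order


def reverse_score(rating, scale_min=1, scale_max=5):
    """Flip a rating on a reversed statement so that high always means positive."""
    return scale_max + scale_min - rating


for text, is_reversed in presentation_order(STATEMENTS, respondent_id=42):
    print(text, "(reverse-scored)" if is_reversed else "")
```

Seeding by respondent means the order shown can be reconstructed at analysis time, which helps when checking for order effects.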

Unbalanced scales

Though theoretically scales are better balanced, in practice unbalanced scales can also be used, particularly where there is a strong natural bias towards a positive rating. If you are talking to donors to a charity, for instance, there is a tendency to eulogise, pushing scores to the positive end. For tracking purposes, this can be difficult if everything is always rated at the top and no-one rates negatively. For this reason additional superlative levels may be added to the scale, or alternative ways of asking the question used.

Mixing scale types and using choice type questions

There can be a tendency to overuse simple ordinal scales. For instance, "How much do you agree or disagree with 'I know a lot about the brand'?" forces a Likert-type answer. But the response in terms of agreement doesn't explicitly say how much someone knows about the brand.

In this case, if the aim is to understand level of knowledge, it might be better asked as a categorical question: "How much do you know about the brand?" - "Never heard of it", "Heard of it, but know nothing", "Know a little", "Know a lot". The clearer, more categorical approach will have more meaning to respondents and be easier to understand in analysis. Categorical approaches are used in Kano analysis of which features are needed for a new product and are often easier to interpret.

Another alternative to Likert scales is a choice: "Which of these two brands is better quality?" The choice can also be framed with a scale from Brand 1 to Brand 2. Choices offer powerful alternatives to standard scales and are used in techniques like conjoint analysis or MaxDiff as they are more actionable than simple scale points.

The concept of choices can extend to associative questions. "Which of these brands are ... friendly?" and a list of brands can be associated with the word, followed by "Which of these brands are ... unfriendly". This type of associative approach leads to measures known as "Image strength" (total associations made for a brand) and "Image character" (the direction of associations - positive or negative), that allow a large number of brands to be scored across a number of different characteristics quickly and easily.
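As a rough sketch of how the two measures could be computed from association counts: strength is taken here as the total number of associations (positive plus negative), and character as the net direction divided by that total, giving a value from -1 to +1. The formula for character is an assumption for illustration, as are the brand names and counts.

```python
def image_scores(positive_picks, negative_picks):
    """Compute image strength and character per brand.

    positive_picks / negative_picks: dict of brand -> number of associations.
    Character = (positive - negative) / strength is an assumed formula.
    """
    brands = set(positive_picks) | set(negative_picks)
    results = {}
    for brand in brands:
        pos = positive_picks.get(brand, 0)
        neg = negative_picks.get(brand, 0)
        strength = pos + neg  # total associations made for the brand
        character = (pos - neg) / strength if strength else 0.0
        results[brand] = {"strength": strength, "character": character}
    return results


# Hypothetical counts for "friendly" (positive) and "unfriendly" (negative).
friendly = {"Brand A": 30, "Brand B": 5}
unfriendly = {"Brand A": 10, "Brand B": 15}
print(image_scores(friendly, unfriendly))
```

In this made-up example Brand A has both the stronger image and the more positive character, while Brand B is less salient and leans negative.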

We take this approach further in our hot-cold, or thumbs-up-thumbs-down questions. In these types of questions, respondents choose which of the items to select - positively or negatively - and can give multiple ticks up or down like a scale. In choosing what to rate, no answers are forced so only genuine opinion is scored.

Analysis and reporting

Challenges moving means

Imagine there are only 8 respondents on a 1-4 scale. A 2.5 midpoint score can come from many distributions, including 4:0:0:4, 0:4:4:0, 2:2:2:2, 1:3:3:1 and 3:1:1:3. In contrast, there are only two ways of getting 3.75 - 0:1:0:7 and 0:0:2:6.

If a business wants to improve its score by around 0.1 from the 2.5 midpoint there are multiple possibilities - between four and eight respondents could move up a point (with 8 respondents, one person moving up one point adds 0.125 to the mean). But the move up from 3.75 can only be accomplished by the one or two respondents who are not already at the top. Thus, as the mean score gets higher, it gets harder to find improvements - something seen on customer satisfaction measures for high-performing businesses.
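The sidebar arithmetic can be checked by brute force. The sketch below lists every way 8 respondents can be spread across a 1-4 scale to give a chosen total score, and counts how many respondents in each distribution could still move up a point. The function names are hypothetical.

```python
from itertools import product


def distributions_with_total(n_respondents=8, points=(1, 2, 3, 4), total=20):
    """All ways to spread respondents over the scale points that sum to `total`."""
    found = []
    for counts in product(range(n_respondents + 1), repeat=len(points)):
        if sum(counts) != n_respondents:
            continue
        if sum(c * p for c, p in zip(counts, points)) == total:
            found.append(counts)
    return found


def upward_moves(counts):
    """Respondents who could still move up one point (everyone below the top)."""
    return sum(counts[:-1])


mid = distributions_with_total(total=20)   # mean 20 / 8 = 2.5
high = distributions_with_total(total=30)  # mean 30 / 8 = 3.75

for label, dists in (("mean 2.5", mid), ("mean 3.75", high)):
    moves = [upward_moves(d) for d in dists]
    print(label, len(dists), "distributions,", min(moves), "to", max(moves), "possible moves")
```

The same enumeration works for other sample sizes and scale lengths, though the number of combinations grows quickly.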

Scales form the basis of a great many statistical techniques for understanding markets. Often scale ratings are treated as numeric values and used as independent variables in a regression model to determine which ratings are most important in decision making (note though that correlation doesn't necessarily imply causation).

The use of scales as statistical parameters opens up statistical techniques such as perceptual mapping, cluster analysis for segmentation, factor analysis to distil core meanings, and regression analysis to identify key drivers.
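As a simplified illustration of key-driver screening, the sketch below ranks attribute ratings by their Pearson correlation with an overall rating. This is plain correlation, not a full regression model, and the attribute names and ratings are invented. A real analysis would typically use a statistics package and many more respondents.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of ratings."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant rating carries no information
    return cov / math.sqrt(var_x * var_y)


def rank_drivers(attribute_ratings, overall):
    """Attributes sorted by strength of association with the overall rating."""
    scores = {name: pearson(vals, overall) for name, vals in attribute_ratings.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Hypothetical ratings from five respondents.
overall = [1, 2, 3, 4, 5]
attributes = {"price": [3, 3, 2, 3, 3], "service": [1, 2, 3, 4, 5]}
print(rank_drivers(attributes, overall))
```

Correlated attributes can share the credit in misleading ways, which is one reason regression or factor analysis is normally preferred to single correlations.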

In reporting, scales are often reported as mean scores, particularly for academic reports. Mean scores implicitly assume that points on the scale are equally spaced and that respondents use the scale points in the same way. Mean scores give a single number to report, with a confidence interval (and p-values for comparisons between groups).
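A mean score and its confidence interval can be sketched as follows. The interval uses a normal approximation (z = 1.96 for roughly 95%), which is an assumption that suits reasonably large samples. For small samples a t-based interval would be wider. The example ratings are made up.

```python
import math
import statistics


def mean_with_ci(ratings, z=1.96):
    """Mean rating with an approximate confidence interval (normal approximation)."""
    n = len(ratings)
    if n < 2:
        raise ValueError("need at least two ratings to estimate spread")
    mean = statistics.mean(ratings)
    std_error = statistics.stdev(ratings) / math.sqrt(n)  # sample standard deviation
    return mean, mean - z * std_error, mean + z * std_error


# Hypothetical ratings on a 1-4 scale.
ratings = [1, 2, 3, 4, 1, 2, 3, 4]
mean, lower, upper = mean_with_ci(ratings)
print(round(mean, 2), round(lower, 2), round(upper, 2))
```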

However, for commercial research, mean scores are less commonly used. A mean score itself can be difficult to interpret directly and more difficult for non-expert readers to understand. For instance, for a four-point scale, scored as 1 to 4, the mid-point - ie balance point - is 2.5. Most people would intuitively think it should be 2.

The mean score itself is arbitrary in terms of the scale it uses. The points can be given any value - not just 1, 2, 3, 4, 5, say, but -2, -1, 0, 1, 2. This means the score can be converted to a number between 0 and 100, which is easier to understand and avoids decimals for less numerate readers.
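The conversion to a 0 to 100 score is a linear rescaling: subtract the bottom of the scale, divide by the scale's range and multiply by 100. A minimal sketch, with a hypothetical function name:

```python
def rescale_to_100(mean_score, scale_min, scale_max):
    """Linearly rescale a mean score so scale_min maps to 0 and scale_max to 100."""
    if scale_max <= scale_min:
        raise ValueError("scale_max must be greater than scale_min")
    return 100.0 * (mean_score - scale_min) / (scale_max - scale_min)


print(rescale_to_100(2.5, 1, 4))   # balance point of a 1-4 scale
print(rescale_to_100(0, -2, 2))    # balance point of a -2 to 2 scale
```

Whatever values are given to the points, the balance point of the scale lands on 50, which is closer to what most readers expect a middle score to look like.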

A second reason for avoiding showing means is that the mean itself is not evenly distributed in terms of actionable changes. What this means is that it is easier to move a mean score up by 0.1 in the middle, than to move it at the extremes (see sidebar). Consequently, researchers often prefer to report 'top-box' scores - usually the sum of the percentages of the top two items, or to show the actual percentages to show the distribution of answers.
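A top-box score and the full distribution of answers are both easy to derive from raw ratings. The sketch below assumes a numeric scale where higher is better. The function names and example ratings are hypothetical.

```python
from collections import Counter


def distribution_percentages(ratings, points):
    """Percentage of respondents choosing each scale point."""
    counts = Counter(ratings)
    n = len(ratings)
    return {p: 100.0 * counts[p] / n for p in points}


def top_box(ratings, points, boxes=2):
    """Sum of the percentages for the top `boxes` scale points."""
    top_points = sorted(points)[-boxes:]
    pct = distribution_percentages(ratings, points)
    return sum(pct[p] for p in top_points)


# Hypothetical ratings on a 1-5 scale from ten respondents.
ratings = [5, 4, 4, 3, 2, 5, 5, 1, 4, 3]
points = [1, 2, 3, 4, 5]
print(distribution_percentages(ratings, points))
print(top_box(ratings, points))
```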

Criticisms of scale use

With such a large number of options for scales and ratings it's not surprising they are used extensively in market research (and also in clinical trials as a measurement of outcomes - such as pain reduction). Scales are very easy to create and to use and seem intuitive. They are stable across a sample over time and so can be used for tracking and measuring improvement.

There are criticisms of scales and a need to understand good practice. Ideally scales should be validated to show they measure what they are supposed to measure. Otherwise there is a concern that they are 'fuzzy' or unclear in meaning. What does the respondent mean if they say they agree slightly with something? Is this good or bad? Will it affect their decision making?

The meaning or impact of a scale rating can be shown by carrying out statistical analysis to link scale scores to behaviour on aggregate using regression type techniques. And factor analysis or principal components can be used to identify meta-concepts behind combinations of scale ratings.

However, for many respondents, their personal rating is not stable - they might give a different rating to the same measure one or two weeks later (or even in the same survey). So though the rating is stable at the sample level, if individuals' views apparently switch readily, can it be used as a predictive tool?

Ratings also vary according to the scale used. For long-standing surveys with trends, switching scales from say 5 points to 7 points, or from 10 to 5, changes the ratings (and introduces a discontinuity into the trend series). Directionally, the ratings are usually the same, but the precise values shift.

In part, this may be because a rating is a forced item. Individuals are required to show an opinion on something that perhaps they do not value, or do not consider important, and they have to give their opinion within a frame created by an external researcher who may use language the individual would not normally use.

The scale therefore mixes strong opinions of those in the know, with the opinions of individuals who are simply giving an answer because that is what they have been asked to do. This 'positivist bias' (not 'positive') assumes that the questions the researcher asks are pertinent to the opinions of the respondents.

In practice, low-involvement respondents will tend to guess and give a response they think is required. This provides stability to the statistics without necessarily representing genuine opinion.

For this reason, unforced ratings where respondents can choose what to give an opinion on, or not (such as our hot-cold scales) may be a better method of understanding underlying opinion dynamics.

A second criticism of scales is that unless scales can be linked to actions or decisions, for business decision makers understanding and implementing change based on scale ratings is hard. If 20% of customers think you are unfriendly, is this good or bad? How do you change? Is it something you should change? How much effort and money should you put towards changing? And what will the return be?

And the third criticism is that the measures are rationalisations - or 'thinking' measures - they are explicit about items that people often think or react to implicitly.

By making them explicit, the respondent takes into account social norms - what would other people think if I said I hate recycling? Or expected perceptions of the interviewer. Respondents mask opinions, even to themselves, to make themselves seem more socially with-it, or to have higher social standing or because they think there is a right answer. Respondents thus can end up 'negotiating' in the answers that they give.

Some of these norms and niceties can be mitigated by the way the question is asked, to find ways to allow implicitly unpopular views to be voiced fairly - for instance attitudes to drug taking, or questions over sexuality.

Using choices over scales

In general, choice-researchers prefer categories to abstract scales - though naturally we use scales too. If you force someone to make a choice, for instance from two options, or by asking for a ranking or top three, you get a better perspective of relative values between items. For researchers from a scale-based background, the power of ranks seems difficult to understand and there is a tendency to try to turn the rank into a scale ("How do I score the ranks?" is a common question). But ranks indicate preference and trade-offs. They can be interpreted using statistical tools like those available for conjoint, and we can ask modelling questions like "If X wasn't available, what would people choose next?" There are issues with ranks too - determining the 'step-size' between items for instance - so there are a number of hybrid techniques that combine elements of scales and ranks to provide more information.
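The "what would people choose next" question can be illustrated with a very simple first-choice count on ranked data. This is not a conjoint model, just a sketch under the assumption that each respondent picks the highest-ranked item still available. The item names and rankings are made up.

```python
from collections import Counter


def first_choice_shares(rankings, unavailable=()):
    """Percentage of respondents whose top-ranked available item is each option."""
    removed = set(unavailable)
    counts = Counter()
    for ranking in rankings:
        remaining = [item for item in ranking if item not in removed]
        if remaining:
            counts[remaining[0]] += 1  # best item still on offer
    total = sum(counts.values())
    if total == 0:
        return {}
    return {item: 100.0 * n / total for item, n in counts.items()}


# Hypothetical rankings, most preferred first.
rankings = [
    ["X", "Y", "Z"],
    ["X", "Z", "Y"],
    ["Y", "X", "Z"],
    ["Z", "Y", "X"],
]
print(first_choice_shares(rankings))
print(first_choice_shares(rankings, unavailable=["X"]))
```

The second call shows where X's first-choice share goes when X is removed, something a set of independent ratings cannot tell you directly.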

And obviously from conjoint analysis, we would prefer categorical answers rather than scale or ordinal answers as they provide a fuller description of where customers' needs are and what they want. You don't want a 'quite good' or 'very good' camera, you want a DSLR or a Canon - a specific category, not a rating. Defining categories can be difficult, but knowing categorical preferences is much more powerful than just understanding rated preferences.

For help and advice on market research design and development contact info@dobney.com