Understanding market research samples and sampling methods

The principle of samples and sampling is the foundation of survey research - using a small, properly constructed sample to project the views and opinions of a large population.

Market research sampling derives from statistics and the principles of probability. Researchers will choose how they sample based on a number of factors including how easy it is to find the target population and how important it is to have a genuine random sample as opposed to a so-called convenience sample.

Market research sampling basics

Sampling, at its simplest, is a very straightforward process. Start with a list of people in the population to be sampled; select a sampling interval N (that is, a sampling fraction of 1 in N) according to the sample size required; randomise the order of the list; then pick every Nth record in the dataset and conduct a survey or questionnaire with that person.

The simple option - sampling from lists

For instance, starting with a list of 20,000 people, if the target is a sample of 100, the sampling interval is 20,000/100 = 200 (a sampling fraction of 1 in 200). Pick every 200th person and there is the sample. Crucially, each person on the list has had an equal chance of being invited to take part (EPSEM - equal probability of selection method) and so the sample is fully random.

Some projects can be as simple as this process. A company with a list of willing customers eager to take part in a survey would be a good approximation to this process. The only slight catch is to check the list is randomised fairly, and a proper 1 in N method is used (usually with a seed number to pick the first record).
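The 1 in N process described above can be sketched in a few lines of Python. This is a minimal illustration, not a production sampler: the function name `systematic_sample`, the made-up customer IDs and the seed value are all assumptions for the example.

```python
import random

def systematic_sample(records, sample_size, seed=None):
    """Draw a 1-in-N sample: shuffle the list, pick a random start, take every Nth record."""
    rng = random.Random(seed)            # seeded so the draw can be repeated and audited
    shuffled = list(records)
    rng.shuffle(shuffled)                # randomise the order of the list
    interval = len(shuffled) // sample_size   # N, e.g. 20,000 // 100 = 200
    if interval < 1:
        raise ValueError("sample_size is larger than the list")
    start = rng.randrange(interval)      # random first record within the first N
    return shuffled[start::interval][:sample_size]

# Hypothetical list of 20,000 customer IDs
customers = [f"customer_{i}" for i in range(20_000)]
sample = systematic_sample(customers, 100, seed=42)
print(len(sample))
```

Shuffling before taking every Nth record guards against any ordering in the original list (for example, oldest customers first) leaking into the sample.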

Challenges for list-based sampling

The 1 in N database method of picking a sample is not without its problems and still requires care. For instance, the customers to be contacted could be taken sequentially from the first record onwards until the sample size is met. Unfortunately, as database IDs typically run from oldest to newest, this can end up with a sample made up entirely of old customers.

Similarly, lists are not necessarily evenly spread. For B2B companies, the likelihood is that the biggest customers will also be the larger businesses. However, the full customer list will be dominated by smaller customers, because the typical profile of a B2B customer database is that 20% of the customers make up 80% of the sales.

Consequently, in a sample of 200 drawn on the 1 in N basis, we would anticipate 40 large customers and 160 small customers. But that doesn't really reflect the sales profile, so the views of the larger customers might be drowned out by the greater number of smaller customers in the analysis.

Sample stratification and planning the sampling

For this type of reason, samples often need to be stratified - that is, different groups need to be sampled separately - so a sample of large customers is drawn separately from a sample of small customers. This type of stratification can also be used to control the profile of the sample.

In a fully random sample, the randomness means that at times the sample might lead to a disproportionate number of questionnaires for one group or another. Stratifying the list and sampling within each stratum allows known profiles to be controlled for. It might mean dividing the contact list into geographic regions and then sampling within each region in order to ensure the sample matches the known profile of the database.
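As a rough sketch of stratification, the list can be split into strata and a separate random draw made within each. The function name `stratified_sample`, the made-up customer records and the target of 100 interviews per stratum are assumptions for illustration only; the right targets depend on the project.

```python
import random
from collections import defaultdict

def stratified_sample(records, stratum_of, targets, seed=None):
    """Draw a simple random sample of targets[s] records within each stratum s."""
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for record in records:
        by_stratum[stratum_of(record)].append(record)   # split the list into strata
    sample = {}
    for stratum, n in targets.items():
        pool = by_stratum[stratum]
        if n > len(pool):
            raise ValueError(f"not enough records in stratum {stratum!r}")
        sample[stratum] = rng.sample(pool, n)            # random draw within the stratum
    return sample

# Hypothetical customer list: 1 in 5 customers is 'large', the rest 'small'
customers = [{"id": i, "size": "large" if i % 5 == 0 else "small"} for i in range(1000)]
sample = stratified_sample(customers, lambda c: c["size"],
                           targets={"large": 100, "small": 100}, seed=1)
print({stratum: len(chosen) for stratum, chosen in sample.items()})
```

The same function could be used with region as the stratum and targets set in proportion to the known regional profile of the database.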

Sampling without a list

In the database case, the sample is drawn from a known list. But in most research situations there is no list to draw from - there is no known list of internet users, or mobile phone owners, or owners of a particular car, or drinkers of a particular beer.

In these cases sampling moves from the theoretical purity of a 1 in N sample to something that balances purity of design with the practicality of locating the people you wish to interview. That means finding individuals directly, or using pre-selected research panels to speed up the process of finding the right people.

Random samples (or pseudo-random samples)

For consumer markets, there is often no list to work with. For a real random sample (eg for Government statistics or measuring media use where people pay for a particular level of advertising exposure) randomness requires work.

Random digit-dialling for phone samples

The main method for random selection, until the arrival of mobile phones, was to select by telephone number. For random telephone samples the broad principle is that you set a computer to call numbers at random in order to make contact with individuals using random digit dialling.

In practice, dialling purely random numbers isn't very efficient. However, phone companies often allocate numbers in blocks and, in places like the US, databases of these blocks existed. It was then a process of randomly selecting a block and then selecting a number within the block at random. And then, if it's a household number, selecting an individual in the household at random.
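The block-then-number-then-person selection can be sketched as below. The block prefixes are made-up values, and the function names are hypothetical. The sketch also assumes every block holds the same quantity of numbers, so that choosing a block uniformly gives each number an equal chance.

```python
import random

def random_digit_dial(blocks, rng):
    """Pick a block at random, then a random number within that block."""
    block = rng.choice(blocks)
    digits = block["suffix_digits"]
    suffix = rng.randrange(10 ** digits)          # random digits within the block
    return block["prefix"] + str(suffix).zfill(digits)

def pick_household_member(members, rng):
    """Select one individual in the household at random."""
    return rng.choice(members)

# Hypothetical database of allocated number blocks (equal-sized, for illustration)
blocks = [
    {"prefix": "01632960", "suffix_digits": 3},
    {"prefix": "01632961", "suffix_digits": 3},
]
rng = random.Random(2024)
number = random_digit_dial(blocks, rng)
respondent = pick_household_member(["adult_1", "adult_2", "adult_3"], rng)
print(number, respondent)
```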

This was at least the principle before mobile phones and before homes with multiple phone lines. It is still used as a method, but once you have more than one line, or a line that might be turned on or off according to the weight of use, the quality of the randomness starts to diminish.

People with more than one line are more likely to be called. If you have a mobile phone that is on more of the time, you are more likely to be called. Mobiles also cause problems because mobile numbers are used differently from fixed lines - you have no idea where the person receiving the call might be. They could be overseas, in which case they might get charged to receive your call. They could be driving, in which case you shouldn't be interviewing them.

Face-to-face random samples

If you're still looking for a random sample and telephone is not applicable, face-to-face might be an option. It's not common in the United States because of the geography and distances involved, but it remains common in the UK for major surveys like follow ups to the government census or major health studies.

In a face-to-face random survey the first approach for the UK is to use the electoral roll. Individuals are selected at random from the electoral roll, then an interviewer seeks the individual out, going to their address physically and returning several times if they are out in order to get an interview.

Though very pure statistically, this type of survey is extremely labour intensive and so extremely expensive. For this reason alternatives were developed.

Geographic sampling or random-location sampling

The main face-to-face approach is to split the country into geographic regions, sometimes down to blocks of 10-20 houses (enumeration districts that are used to ensure full coverage for the census). Then starting with a list of houses or small geographic areas, areas would be picked at random, and an interviewer allocated to visit the households in those areas. Additional controls would be added for who was at home (eg employees were likely to be out during the day) by controlling the time of the interviews and who could be interviewed.
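The first stage of this approach, picking areas at random and then allocating the households in them to interviewers, might look like the following sketch. The area names, the 15 households per area and the function name `random_location_sample` are invented for illustration.

```python
import random

def random_location_sample(areas, n_areas, seed=None):
    """Pick n_areas areas at random and return the households to visit in each."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(areas), n_areas)   # random draw of small areas
    return {area: areas[area] for area in chosen}

# Hypothetical sampling frame: 50 small areas of 15 households each
areas = {
    f"area_{a:02d}": [f"area_{a:02d}/house_{h}" for h in range(15)]
    for a in range(50)
}
assignments = random_location_sample(areas, n_areas=5, seed=7)
for area, households in assignments.items():
    print(area, len(households))
```

Controls on interview times and on who can be interviewed would sit on top of this, at the fieldwork stage.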

This type of random-location survey is still expensive, but at least manageable in terms of allocating an interviewing team. It is still the dominant method for conducting face-to-face omnibus studies and media studies, where randomness is important in order to properly measure survey-to-survey variations.

Online research sampling

What about online research? For online research, drawing a random sample of a population relies on having a list that covers all individuals. Research panels provide very large lists, but there is an element of self-selection for panel members to join the list in the first place.

Some companies combined face-to-face or random telephone samples as a method for recruiting to online panels, so retaining some of the elements of more representative methods by including people who otherwise would not have signed up to a panel.

Alternatives include 'river sampling' or using open contact points to pull in a natural sample via adverts or pop-ups.

In practice, market research panels are now so large, and their recruitment methods so focused on ensuring population coverage, that results are representative of a significant proportion of the population and are increasingly treated as if they came from a full random sample.

Screening versus convenience samples

Unless there is a detailed list to start with or random-digit dialling can be used, fully randomised samples are expensive and difficult to obtain. For most categories - eg recent car buyers, users of fly spray, visitors to Bristol - a list simply isn't available.

For some of these types of categories you can do a 'screen'. In a screen you take a random or pseudo-random sample and then use a screener questionnaire (also known as a recruitment questionnaire) to identify the core group you wish to interview. For groups that are a small part of a larger population this can mean asking thousands of people to help to get just a few hundred responses.
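The scale of a screening exercise can be estimated with some simple arithmetic. In the sketch below, `incidence` (the share of the wider population who qualify) and `completion_rate` (the share of qualifiers who go on to complete the survey) are illustrative assumptions, as are the figures used.

```python
import math

def contacts_needed(target_completes, incidence, completion_rate=1.0):
    """Estimate how many people must be screened to reach a target number of completes."""
    if not (0 < incidence <= 1 and 0 < completion_rate <= 1):
        raise ValueError("incidence and completion_rate must be between 0 and 1")
    return math.ceil(target_completes / (incidence * completion_rate))

# Hypothetical project: 300 completes wanted from a group that is 5% of the population
print(contacts_needed(300, 0.05))
# Same project if only half of those who qualify go on to complete
print(contacts_needed(300, 0.05, completion_rate=0.5))
```

With a 5% incidence, a few hundred responses already means screening in the region of 6,000 people, and more once drop-out is allowed for.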

As this can also turn out to be expensive, researchers will use what are known as 'convenience samples'. A convenience sample means finding people who fit the criteria, but not worrying about whether the sample is genuinely random.

An example is stopping people in the street to ask them to take part in a survey (street interviewing). Here only people who are passing can be interviewed, so the sample is not genuinely random - for instance it's likely to be biased towards people not working, able-bodied people and often younger females during the day.

Similarly, an online panel is, in reality, another form of convenience sample. The people who sign up for an online panel are not necessarily representative of the full range of views in the market because you don't know if there is a bias introduced by getting people to sign up (eg if you ran a survey on privacy, you might find panel respondents less concerned about privacy than those who have not signed up to a panel).

Using customer lists

A classic convenience sample is a company's own customer lists. This introduces a natural bias towards the company and the company's products - it will not include many non-customers or people who reject the company's products. This can be acceptable within known limits, but it is something to be very careful of.

This hidden type of bias comes into a lot of database analysis and web analytics, as these internal sources can only provide information about the people who bought or who visited, not those who didn't. As a consequence, it can be very difficult to say anything about why people don't become customers or don't spend a long time on the website.

Quotas and quota-based sampling

Because of the hidden potential for bias in convenience sampling, one method of control is to set quotas to ensure that a certain number of interviews are achieved in certain categories. This might include setting quotas by age, working status or socio-economic grade. In business-to-business surveys, quotas might be set by size of business or by sector (the companies that do the most marketing are typically the least likely to take part in market research surveys, while local government is the most likely).

A quota is then used to set a target and a limit on the number of interviews to be achieved, for instance a minimum of 5 men under 25 and a maximum of 10 men aged 65+.

Quotas are often used for street interviewing or house-to-house interviews, and are common in telephone research to keep the sample balanced.

If the quotas are set very tightly it can make it very difficult to find the last few interviews, but too loose and the sample will tend towards the easy to find categories of respondents.
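Quota control during fieldwork amounts to counting interviews per category against minimum targets and maximum limits. The `QuotaTracker` class below is a hypothetical sketch using the example quotas mentioned earlier; real fieldwork systems will differ.

```python
from collections import Counter

class QuotaTracker:
    """Track interviews per quota cell against minimum targets and maximum limits."""

    def __init__(self, minimums=None, maximums=None):
        self.minimums = dict(minimums or {})
        self.maximums = dict(maximums or {})
        self.counts = Counter()

    def accept(self, cell):
        """Record an interview unless the cell has already reached its maximum."""
        limit = self.maximums.get(cell)
        if limit is not None and self.counts[cell] >= limit:
            return False                 # quota full: screen this respondent out
        self.counts[cell] += 1
        return True

    def unmet_minimums(self):
        """Return how many more interviews each cell needs to hit its minimum."""
        return {cell: need - self.counts[cell]
                for cell, need in self.minimums.items()
                if self.counts[cell] < need}

quotas = QuotaTracker(minimums={"men under 25": 5}, maximums={"men 65+": 10})
results = [quotas.accept("men 65+") for _ in range(12)]
print(results.count(True), results.count(False))   # 10 accepted, 2 turned away
print(quotas.unmet_minimums())                     # {'men under 25': 5}
```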

Non-response bias

Adding quotas and setting interview targets doesn't make the sample random, but for reasons such as cost or speed it may be considered the best available sample for the job. Obviously the researcher needs to keep an eye on potential biases, but there is one more hidden potential bias, even with random samples.

Imagine an individual has been chosen at random to take part. If that individual then declines to complete the survey, there is the potential that this introduces a non-response bias. In other words, how can you know that the people who don't take part are like the people who do take part in the survey?

In some cases simply saying a survey is being carried out on behalf of, say, Epson will mean that customers who prefer HP may be less likely to take part. For this reason deciding to reveal or not reveal the sponsor of the survey could skew the results.

In some cases for governmental surveys, the question of non-response bias has been important enough for follow-up checks on those who did not respond. Instead of completing a full questionnaire the non-responders were asked a handful of the key questions. In general these suggested that the original non-responders were similar to those who took part in the survey at the start.

Other forms of sampling

Proxy sampling

In some cases obtaining a full sample can be extremely difficult and creative ways are needed to provide an answer to the research problem. A very famous case of this was at BMRB in the 1990s, looking into the effectiveness of advertising to counter the threat of AIDS/HIV. In this case a sample of gay men was vital, but it was extremely difficult to get any form of sample by conventional means. So instead a 'proxy sample' was used.

Interviews were carried out in gay clubs and changes in opinions and behaviour monitored over time. This use of a proxy for monitoring purposes is common. Even if the sample is biased, so long as the samples are consistent it may be possible to measure changes, even if these are not directly projectable to the population in question, and therefore judge the success or otherwise of the advertising.

Snowball sampling

A second common problem is that the population to be researched may exist, but may not be easy to reach through an interviewer or a formal request to take part. An example is a survey among volleyball players we carried out for the English Volleyball Association. The group of volleyball players clearly exists, but rather than use an interviewer-led approach, a 'snowball' method was used. In other words, friends ask friends to complete the survey. Again there are obvious potential biases - the keener and more interested players are more likely to take part, as are the better-connected individuals. Snowball techniques have also been used to recruit difficult-to-reach groups like ex-teachers to help monitor campaigns to recruit people back into teaching.
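The way a snowball spreads, and why better-connected people are more likely to be reached, can be illustrated with a toy referral network. The names and the `referrals` mapping are invented, and the function is a sketch of the general idea rather than of the method used in any particular study.

```python
def snowball_sample(seeds, referrals, max_waves):
    """Recruit in waves: each recruit passes the survey on to the friends they name."""
    recruited = list(dict.fromkeys(seeds))   # keep order, drop duplicate seeds
    seen = set(recruited)
    current = list(recruited)
    for _ in range(max_waves):
        next_wave = []
        for person in current:
            for friend in referrals.get(person, []):
                if friend not in seen:       # each person is recruited only once
                    seen.add(friend)
                    next_wave.append(friend)
        if not next_wave:
            break                            # the snowball has stopped growing
        recruited.extend(next_wave)
        current = next_wave
    return recruited

# Hypothetical referral network; 'fay' plays but is connected to no one in it
referrals = {"ana": ["ben", "cat"], "ben": ["cat", "dev"], "dev": ["eli"]}
print(snowball_sample(["ana"], referrals, max_waves=2))
# ['ana', 'ben', 'cat', 'dev']
```

Anyone outside the referral chains, like the unconnected player in this example, is never invited, which is the source of the bias described above.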

AI-based sampling and synthetic data

With the arrival of large language models (LLMs) such as ChatGPT, which are commonly known as AI systems, it has become very easy to get the AI system to act in different personas and so to generate realistic-looking synthetic data for a number of market research tasks.

For some tasks, such as pre-screening ideas and looking for glitches, this can refine the offers before they are tested with real people, making the options taken into real market research more relevant. In other situations, synthetic data needs to be handled carefully. AI is good at 'acting' as an 'average' person, but for much research the questions and offers are new and AI has no data to work with. In the end, reality is key to understanding marketing effectiveness, so research should always plan to include real people.

Help for sample design to get good quality research

Sampling is core to survey research, but with so much done online it can be neglected, leading to unexpected biases in the results. In practice, sampling is often a trade-off, and an awareness of the trade-offs involved helps mitigate the potential for bias.

For help and advice on sampling and sample design contact info@dobney.com