Market Research Statistics
Most quantitative market research is delivered in the form of percentages, often pulled out of cross-tabulations so that differences between groups (on the banner or cross-break) can be compared according to the different categories of answers on the stub. Beyond these basics there are a number of techniques to analyse the data more deeply, including cluster analysis, factor analysis, regression and display techniques like perceptual maps.
The basic data from a market research survey is presented in the form of percentages. What percentage of the sample gave this response? How many of each subgroup gave another response? Are there significant differences between subgroups? To investigate these questions the researcher runs a set of tabs (cross-tabulations), which list the percentage giving each answer to a question (the stub) for the total sample and for other subgroups of interest (the break, or banner). Differences can be checked for significance using standard statistical tests, the most common of which is the Student t-test (note that just because something is significant statistically, it is not necessarily significant commercially unless you can leverage the difference). For numeric and scale questions, answers can be scored and a mean score estimated. As means can hide a lot of useful additional data, it's always worth looking at the distribution of answers. For numeric questions it is also worth checking for issues like outliers (extreme values) that might throw out a mean value.
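As a rough illustration of a single tab with a significance check, the sketch below builds column percentages for one question by one banner variable and then tests the difference between the two subgroups. The records, group names and function names are invented for the example. It uses a two-proportion z-test rather than the Student t-test mentioned above, as that is a simple test to write out by hand for percentages.

```python
import math
from collections import Counter

# Hypothetical survey records: (subgroup on the banner, answer on the stub).
responses = (
    [("male", "yes")] * 45 + [("male", "no")] * 55
    + [("female", "yes")] * 62 + [("female", "no")] * 38
)

def crosstab_percentages(records):
    """Return ({banner_group: {stub_answer: column %}}, {banner_group: base})."""
    counts = Counter(records)
    bases = Counter(group for group, _ in records)
    table = {}
    for (group, answer), n in counts.items():
        table.setdefault(group, {})[answer] = 100.0 * n / bases[group]
    return table, bases

def two_proportion_z(p1, n1, p2, n2):
    """Two-sided z-test for a difference between two independent proportions."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail area
    return z, p_value

table, bases = crosstab_percentages(responses)
z, p_value = two_proportion_z(
    table["male"]["yes"] / 100, bases["male"],
    table["female"]["yes"] / 100, bases["female"],
)

for group in table:
    print(group, f"base={bases[group]}", f"yes={table[group]['yes']:.0f}%")
print(f"z={z:.2f}, p={p_value:.3f}")
```

A small p-value here only says the gap is unlikely to be sampling noise; whether it matters commercially is a separate judgement.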
If the data represents a population - for instance the adult population of a country, or a population like a database - then the data can be weighted to adjust the sample so it represents the chosen population numerically (note weighting always reduces the effective sample size - you can't create interviews by weighting). In cases like business research you might want to weight to size of business to better reflect volume of sales, rather than to individual business names. Weighting is carried out by multiplying each respondent of a particular type by a 'weight'. For instance, if a population is 50:50 male to female and your sample is 60:40, then you would scale the male part by 50/60 and the female part by 50/40 - in other words downrate the male values and uprate the female values. Weights always bring the problem that you can make one person seemingly represent a large number of people, so it's important to keep an eye on weight sizes.
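The 60:40 example can be worked through in a few lines. The sketch below assumes a hypothetical sample of 100 interviews and uses Kish's approximation for effective sample size, (sum of weights) squared divided by the sum of squared weights, which is one common way of putting a number on the loss from weighting.

```python
# Hypothetical sample of 100 interviews: 60 men and 40 women,
# weighted to a population that is 50:50.
sample_counts = {"male": 60, "female": 40}
population_share = {"male": 0.5, "female": 0.5}

n = sum(sample_counts.values())

# Weight = target share / achieved share for each group.
weights = {
    group: population_share[group] / (count / n)
    for group, count in sample_counts.items()
}

# One weight per respondent.
respondent_weights = [
    weights[group]
    for group, count in sample_counts.items()
    for _ in range(count)
]

weighted_base = sum(respondent_weights)

# Kish's approximation: (sum of weights)^2 / sum of squared weights.
effective_n = weighted_base ** 2 / sum(w * w for w in respondent_weights)

# Worth monitoring: one very large weight means one person stands for many.
largest_weight = max(respondent_weights)

print(weights)
print(f"effective sample size: {effective_n:.1f} of {n}")
```

With these figures the weights are about 0.83 for men and 1.25 for women, and the effective sample size falls from 100 to 96.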
Regression and correlation
Many surveys have a bank of rating or scale questions to understand attitudes or to assess performance across a range of different areas. One of the basic questions is how these ratings drive opinions. For instance, in a customer satisfaction survey you might ask how overall satisfaction is related to satisfaction with specific aspects such as delivery, ordering, appearance, packaging etc. For this you can take the overall satisfaction and run regression analysis for the sample as a whole to understand the impact of the individual elements in driving overall satisfaction, and thereby obtain what is known as a 'derived importance' measure. In practice, the individual items themselves are often related to each other, which makes the regression coefficients unstable and the model confusing or difficult to use. So often people just look at correlation variable by variable.
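The variable-by-variable approach can be sketched as follows: correlate each specific rating with overall satisfaction and rank the results. The ratings are made-up 1 to 5 scores from eight respondents, and the attribute names are simply taken from the example above; a real study would use far more interviews.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

# Hypothetical 1-5 ratings from eight respondents.
overall = [5, 4, 4, 2, 3, 5, 1, 3]
ratings = {
    "delivery": [5, 4, 5, 2, 3, 4, 1, 3],
    "ordering": [4, 4, 3, 3, 3, 5, 2, 2],
    "packaging": [3, 4, 2, 3, 4, 3, 3, 2],
}

# Derived importance here = correlation of each item with overall satisfaction.
derived_importance = {
    name: pearson(scores, overall) for name, scores in ratings.items()
}
ranked = sorted(derived_importance, key=derived_importance.get, reverse=True)

for name in ranked:
    print(f"{name}: r = {derived_importance[name]:.2f}")
```

Correlations treat each item in isolation, so two items that overlap heavily will both look important; that is the price of avoiding the regression model.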
An alternative is to reduce the data prior to regression using 'factor analysis', which takes a group of related variables and identifies a smaller set of meta-variables or factors that summarise the full raw set of variables. Each factor draws on a number of underlying raw ratings or scores - so for instance you might find a meta-factor for service that combines aspects of helpfulness, checkout speed, queue length and delivery time. These are then pulled together into a single factor, and the factors can then be used as a feed either into a regression (e.g. looking at drivers of purchase or satisfaction) or, in some instances, into cluster analysis.
One of the objectives of segmentation is to see what groups exist in the market. One method is to use cluster analysis. Cluster analysis attempts to group individuals according to the similarity or difference of their answers. There are two main methods - hierarchical clustering and k-means - but there are also other clustering algorithms available. As clustering always creates groups, a mix of approaches is generally recommended to see if different algorithms discover the same groups - if they do, it increases the probability that the clusters are real groups and not just artifacts of the clustering. Clustering works best when used with some a priori feeling for what types of segments might come out, as one persistent challenge for cluster-based solutions is replicating the cluster solution in the real world. Having some ideas beforehand makes it easier to check and validate the groups. If clustering is the only method used to uncover the groups, then some form of scoring or marking will be needed to try to find the groups in subsequent follow-up work. It is common to use regression and correlation analysis to identify these groups, but CHAID might also be possible.
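To make the k-means idea concrete, here is a minimal sketch: assign each respondent to the nearest cluster centre, move each centre to the mean of its members, and repeat. The two scores per respondent are hypothetical, and seeding the centres with the first k respondents is an assumption made only so the result is repeatable; in practice you would try several random starts and compare against another method such as hierarchical clustering.

```python
import math

def kmeans(points, k, iterations=20):
    """Plain k-means: assign to nearest centroid, recompute means, repeat."""
    # Assumption: seed with the first k points so results are repeatable.
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical respondents scored on two 1-5 scales,
# e.g. (importance of price, importance of service).
points = [(1, 5), (5, 1), (2, 4), (1, 4), (5, 2), (4, 1), (2, 5), (4, 2)]

labels, centroids = kmeans(points, k=2)
print(labels)
print(centroids)
```

Note that the algorithm returns two groups here because it was asked for two; it would return three just as readily, which is why checking the solution against prior expectations matters.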
Another approach to grouping respondents or variables is to use CHAID (Chi-squared Automatic Interaction Detection) analysis. CHAID repeatedly splits a sample on the variable that shows the most statistically significant difference, creating a tree of related variables, and can be used to identify key variables or key groups - for instance variables most correlated with purchasing or groups most likely to purchase.
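The core of a single CHAID-style split can be sketched as choosing, from a set of candidate variables, the one with the most significant chi-square relationship to the outcome. The example below is a simplification under stated assumptions: the records and variable names are invented, every variable has only two categories, and the category merging and p-value adjustment that a full CHAID procedure applies are left out.

```python
import math
from collections import Counter

def chi_square_2x2(records, predictor, outcome):
    """Pearson chi-square for a two-category predictor vs a two-category outcome."""
    cells = Counter((r[predictor], r[outcome]) for r in records)
    rows = Counter(r[predictor] for r in records)
    cols = Counter(r[outcome] for r in records)
    n = len(records)
    chi2 = 0.0
    for a in rows:
        for b in cols:
            expected = rows[a] * cols[b] / n
            chi2 += (cells[(a, b)] - expected) ** 2 / expected
    # Upper-tail probability; this shortcut is valid for 1 degree of freedom only.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

def make(card, region, bought, count):
    """Build `count` identical hypothetical respondent records."""
    return [{"card": card, "region": region, "bought": bought}] * count

# Hypothetical sample of 80 respondents.
records = (
    make("yes", "north", "yes", 15) + make("yes", "south", "yes", 15)
    + make("yes", "north", "no", 5) + make("yes", "south", "no", 5)
    + make("no", "north", "yes", 6) + make("no", "south", "yes", 4)
    + make("no", "north", "no", 14) + make("no", "south", "no", 16)
)

results = {
    name: chi_square_2x2(records, name, "bought") for name in ("card", "region")
}
# The first split is made on the predictor with the smallest p-value.
best = min(results, key=lambda name: results[name][1])

for name, (chi2, p) in results.items():
    print(f"{name}: chi2={chi2:.2f}, p={p:.4f}")
print("split on:", best)
```

A tree is then grown by repeating the same selection within each resulting subgroup until no remaining variable gives a significant split.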