In cluster sampling the population is again subdivided into subgroups, but these are termed clusters instead of strata. The term cluster means a bunch of similar things. Suppose the thousand employees of the factory in our example are organised into teams, each consisting of one supervisor and nine workers, which gives one hundred teams. If we choose whole teams at random, the result is a cluster sample, because the hundred teams, not the individual employees, serve as the sampling units.
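The team example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the employees are taken to be numbered team by team (team 1 holds numbers 1 to 10, and so on), and two teams are drawn so that the sample contains twenty employees. Neither detail is given in the text above, and the names `cluster_sample`, `TEAM_SIZE` and `N_TEAMS` are hypothetical.

```python
import random

TEAM_SIZE = 10   # one supervisor and nine workers
N_TEAMS = 100    # 1000 employees / 10 per team

def cluster_sample(n_teams_to_draw, rng):
    """Draw whole teams at random; return (teams, employee numbers)."""
    # Assumption: the sampling units are teams, numbered 1..N_TEAMS.
    teams = sorted(rng.sample(range(1, N_TEAMS + 1), n_teams_to_draw))
    # Assumption: team t holds employee numbers (t-1)*10+1 .. t*10.
    employees = [
        (team - 1) * TEAM_SIZE + member
        for team in teams
        for member in range(1, TEAM_SIZE + 1)
    ]
    return teams, employees

rng = random.Random(0)  # seeded only so the sketch is repeatable
teams, employees = cluster_sample(2, rng)
print(teams, len(employees))
```

Every member of a drawn team enters the sample, which is what distinguishes this from drawing twenty employees individually.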
One popular type of cluster sample is the systematic random sample, also called the quasi-random sample. Suppose the thousand employees of the factory are listed alphabetically and numbered serially to form the sample frame. (Alphabetical order is usually a random arrangement with respect to most characteristics and so forms an unbiased sample frame.) As twenty employees are to be chosen from the thousand, it is a one-in-fifty sample, so 50 is the sampling interval or ‘skip factor’. Now choose a number from 1 to 50 at random. Suppose you get 32. This is called the sampling start. Now choose every 50th number from the sampling start. We then get the following twenty numbers for the sample: 32, 82, …, 932, 982. (In other words, if N/n = k, where N is the population size and n the sample size, choose every kth number starting from a randomly chosen number from 1 to k for a systematic sample.) In fact, the thousand employees in our example can be said to form 50 clusters of twenty on the basis of their numbers in the sample frame, and you have chosen one of the fifty at random.
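The selection rule above can be written as a short Python sketch. It assumes N is an exact multiple of n, as in the example; the function name `systematic_sample` and its parameters are illustrative only.

```python
import random

def systematic_sample(population_size, sample_size, start=None):
    """Return the serial numbers picked by systematic sampling."""
    # Sampling interval k = N / n (assumed here to divide exactly).
    interval = population_size // sample_size
    if start is None:
        # Sampling start: a random number from 1 to k inclusive.
        start = random.randint(1, interval)
    # Take every k-th number, beginning at the sampling start.
    return [start + i * interval for i in range(sample_size)]

# Reproduce the example: N = 1000, n = 20, sampling start 32.
sample = systematic_sample(1000, 20, start=32)
print(sample[:3], sample[-2:])  # [32, 82, 132] [932, 982]
```

Leaving `start` unset draws the sampling start at random, which amounts to picking one of the fifty possible clusters.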
If a proportionate representation of supervisors and workers is required here, the sample frame may be arranged with the names of the hundred supervisors first, followed by those of the nine hundred workers, each in alphabetical order. If the sampled numbers are as above, then two of the supervisors (32 and 82) and eighteen of the workers (132, 182, …, 982) would be among the twenty chosen, which matches the one-to-nine ratio in the factory. With an ordered sample frame such as this, a systematic sample will prove to be statistically more efficient than a simple random sample.
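A small Python check illustrates why the ordered frame helps. It assumes supervisors occupy serial numbers 1 to 100 and workers 101 to 1000, as described above; the helper names are hypothetical.

```python
def systematic_sample(start, interval=50, population_size=1000):
    """Every 50th serial number from the sampling start."""
    return list(range(start, population_size + 1, interval))

def count_supervisors(sample, n_supervisors=100):
    """Supervisors hold serial numbers 1..100 in the ordered frame."""
    return sum(1 for number in sample if number <= n_supervisors)

# The example with sampling start 32.
sample = systematic_sample(32)
supervisors = count_supervisors(sample)
workers = len(sample) - supervisors
print(supervisors, workers)  # 2 18

# Try every possible sampling start from 1 to 50.
counts = {count_supervisors(systematic_sample(s)) for s in range(1, 51)}
print(counts)  # {2}
```

Whatever the sampling start, exactly two supervisors and eighteen workers are chosen, whereas a simple random sample of twenty could contain more or fewer supervisors by chance.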
But there is one major problem with systematic random sampling. If the sample frame has a cyclical pattern whose period is a multiple of the sampling interval, or vice versa, the result is liable to be biased. Suppose, for a study of absenteeism, two days are chosen from the five working days of the week. If the two happen to be Monday and Friday, the average percentage of absenteeism calculated from this study will be much higher than the actual daily average for the year, because more people generally absent themselves on these two days, which adjoin the weekend holidays. Again, if the sample frame is a list of members of a social club in which each husband’s name is followed by his wife’s name, so that every odd number is a man and every even number is a woman, a systematic sample with an even-numbered series like 32, 82, 132, … will represent women only. To give another example, suppose the selection interval is 10 and the sampling start is also 10 in a marketing research survey of households. It is possible that every tenth house in a colony is a corner house, and that corner houses are inhabited by people with higher than average income because they command a higher rent. In all the above cases, the sample survey will yield biased results.
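The social club case is easy to verify with a short Python sketch. The club size of 1000 and the interval of 50 are assumptions carried over from the factory example, not figures given for the club; `sex_of` is a hypothetical helper.

```python
def sex_of(serial_number):
    """Odd serial numbers are husbands, even serial numbers are wives."""
    return "man" if serial_number % 2 == 1 else "woman"

# Assumed frame of 1000 members, interval 50, sampling start 32.
sample = list(range(32, 1001, 50))
sexes = {sex_of(number) for number in sample}
print(sexes)  # {'woman'}

# An odd sampling start with the same even interval picks men only.
odd_sample = list(range(31, 1001, 50))
odd_sexes = {sex_of(number) for number in odd_sample}
print(odd_sexes)  # {'man'}
```

Because the interval is even, every selected number has the same parity as the sampling start, so the sample can never contain both sexes.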
Another problem with systematic random sampling is what to do when the sampling interval k is a fraction. Suppose five persons are to be selected from 32 by systematic sampling. The sampling interval is 32/5 = 6.4. In such cases, select a number at random between 1 and 64 and divide it by 10 to obtain the sampling start. Suppose you get 8, so that you start with 0.8. Add 6.4 repeatedly to get the series 0.8, 7.2, 13.6, 20.0, 26.4. Rounding each to the nearest whole number gives the following numbers for the sample: 1, 7, 14, 20 and 26. The interval here is sometimes 6 and sometimes 7.
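The fractional-interval procedure can be sketched in Python as follows. Exact fractions are used instead of floating-point numbers so that values such as 20.0 are not disturbed by rounding error. The function name is hypothetical, and rounding halves upward is an assumption, since the text does not say how a value ending in .5 should be treated.

```python
from fractions import Fraction
import math

def fractional_systematic_sample(population_size, sample_size, start):
    """Systematic sample when N/n is not a whole number.

    `start` is the fractional sampling start (0.8 in the example).
    """
    interval = Fraction(population_size, sample_size)  # k = 32/5 = 6.4
    positions = [start + i * interval for i in range(sample_size)]
    # Round each position to the nearest whole number.
    # Assumption: exact halves are rounded upward.
    return [math.floor(p + Fraction(1, 2)) for p in positions]

# A random draw of 8 from 1..64, divided by 10, gives a start of 0.8.
sample = fractional_systematic_sample(32, 5, Fraction(8, 10))
print(sample)  # [1, 7, 14, 20, 26]
```

Note that a draw of 1 to 4 would give a start below 0.5, which rounds to 0 under this rule. The text does not say how to handle that case, so the sketch leaves it untreated.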