# Stratified Random Sampling in Research

In the example of choosing a simple random sample of twenty employees out of a thousand in a factory, suppose the thousand include 100 supervisors and 900 workers. A simple random sample is determined by chance alone, and it is possible that the twenty chosen are all supervisors, or that none of them is. If the researcher feels that both subgroups should be studied, it would be more sensible to take a random sample from each subgroup (stratum) after preparing separate lists for the two strata. Such sampling is called stratified random sampling. In business research, stratification may be done on different characteristics such as sex, age (e.g., young, middle-aged, old), race, religion, occupation, education, residential area (e.g., rural, urban), ownership (e.g., public sector, private sector), size of business, or income.
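The procedure of keeping separate lists and sampling from each can be sketched in Python. This is a minimal illustration, assuming employees are represented by made-up ID strings; the names `stratified_sample`, `strata` and `sizes` are hypothetical and do not come from any library.

```python
import random

def stratified_sample(strata, sizes, seed=None):
    """Draw a simple random sample of the requested size from each stratum.

    strata: dict mapping stratum name -> list of units (the separate lists)
    sizes:  dict mapping stratum name -> number of units to draw
    """
    rng = random.Random(seed)
    # Sampling is done independently, without replacement, within each stratum.
    return {name: rng.sample(units, sizes[name]) for name, units in strata.items()}

# Hypothetical factory: 100 supervisors and 900 workers (IDs are invented).
strata = {
    "supervisors": [f"S{i:03d}" for i in range(1, 101)],
    "workers": [f"W{i:03d}" for i in range(1, 901)],
}

sample = stratified_sample(strata, {"supervisors": 2, "workers": 18}, seed=42)
print({name: len(units) for name, units in sample.items()})
# {'supervisors': 2, 'workers': 18}
```

Unlike a single draw of twenty from the combined list, this draw cannot end up with all supervisors or no supervisors, because the number taken from each stratum is fixed in advance.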

The stratification may be proportionate or disproportionate. In the sampling of twenty employees from the factory described above, if 2 supervisors and 18 workers are chosen at random to form the sample of twenty, the sample is divided between supervisors and workers in the same proportion as the population, and it is a stratified proportionate sample. In this case workers and supervisors have the same sampling fraction, 1/50, which is the probability of each employee being selected into the sample. Suppose, on the other hand, that 5 supervisors and 15 workers are chosen at random from the 100 and 900 respectively. This is a stratified disproportionate sample. Here the sampling fraction differs between the two groups: 1/20 for supervisors and 1/60 for workers. In situations where you expect workers to be more homogeneous and supervisors to be more variable in the relevant characteristics, a larger than proportionate sample of supervisors would be advisable. The rule here is, “the larger the variation, the larger the sample and vice versa”.
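The arithmetic behind the two designs can be checked with a short Python sketch. The function names `proportionate_allocation` and `sampling_fractions` are hypothetical helpers written for this illustration; the stratum sizes and sample sizes are those of the factory example.

```python
from fractions import Fraction

population = {"supervisors": 100, "workers": 900}

def proportionate_allocation(population, n):
    """Split a total sample of n across strata in proportion to stratum size.

    Shares are rounded to whole units; in general the rounded shares may not
    add up to n exactly, although they do in this example.
    """
    total = sum(population.values())
    return {name: round(n * size / total) for name, size in population.items()}

def sampling_fractions(population, allocation):
    """Sampling fraction per stratum: units sampled / units in the stratum."""
    return {name: Fraction(allocation[name], population[name]) for name in population}

proportionate = proportionate_allocation(population, 20)
disproportionate = {"supervisors": 5, "workers": 15}

print(proportionate)  # {'supervisors': 2, 'workers': 18}
# Proportionate design: every stratum has the same fraction, 1/50.
print(sampling_fractions(population, proportionate))
# Disproportionate design: 1/20 for supervisors, 1/60 for workers.
print(sampling_fractions(population, disproportionate))
```

In both designs every fraction is known and greater than zero, even though the fractions are unequal in the disproportionate case.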

In a disproportionate stratified sample such as the one above, supervisors and workers do not have an equal chance of being selected. Nevertheless, each employee has a known, non-zero chance of selection, and hence stratified sampling is also a form of probability sampling.

We have already noted one reason why disproportionate stratification may be adopted. There are other reasons as well. If the cost of collecting data from one stratum (say, the rural population) is higher, its proportion in the sample may be reduced, and strata with lower data collection costs may be sampled at higher rates. Sometimes the researcher is more interested in comparing the estimates for individual strata than in aggregating them into an overall population estimate. In that case we may take an equal number of items from each stratum (e.g., 10 supervisors and 10 workers), assuming that the variability of the characteristics and the cost of collection are the same across the strata.
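One common textbook way of combining stratum size, variability and cost is the optimum allocation rule, which is not stated in the text above and is offered here only as an illustration: the sample size for a stratum is made proportional to N_h × S_h / √c_h, where N_h is the stratum size, S_h its standard deviation and c_h the cost per unit. The standard deviations and costs below are invented for the example, and `optimum_allocation` is a hypothetical helper.

```python
import math

def optimum_allocation(n, sizes, std_devs, costs):
    """Allocate n units across strata in proportion to N_h * S_h / sqrt(c_h).

    Larger and more variable strata receive more units; costlier strata fewer.
    Results are rounded to whole units and may not sum to n exactly in general.
    """
    weights = {h: sizes[h] * std_devs[h] / math.sqrt(costs[h]) for h in sizes}
    total = sum(weights.values())
    return {h: round(n * w / total) for h, w in weights.items()}

sizes = {"supervisors": 100, "workers": 900}
std_devs = {"supervisors": 3.0, "workers": 1.0}  # assumed: supervisors vary more

# Equal costs: only size and variability matter.
equal_cost = optimum_allocation(20, sizes, std_devs, {"supervisors": 1.0, "workers": 1.0})
print(equal_cost)  # {'supervisors': 5, 'workers': 15}

# Assumed: interviewing a worker costs four times as much as a supervisor.
costly_workers = optimum_allocation(20, sizes, std_devs, {"supervisors": 1.0, "workers": 4.0})
print(costly_workers)  # {'supervisors': 8, 'workers': 12}
```

Under these assumed figures, greater variability among supervisors alone moves the design from 2 and 18 to 5 and 15, and a higher cost per worker shifts it further towards supervisors.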

However, proportionate stratification is used more often than disproportionate stratification, because estimation is simpler and because it guarantees that the estimates of population parameters are no less precise than those obtained from a simple random sample of the same size. In fact, with regard to the accuracy of estimates, proportionate stratification is preferable to simple random sampling. The difficulty is that, in addition to the overall sampling frame, we require information about which item belongs to which subgroup. We may, for example, prepare a sampling frame from the voters list of a city for a study of housewives with respect to, say, their purchase of certain goods; but if we wish to stratify the sample on the basis of education, or between mothers and childless housewives, we require additional information about the educational level of the housewives and the number of children they have, in order to determine how many housewives in each stratum we wish to interview.
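The remark that estimation is simpler under proportionate stratification can be illustrated with a small Python sketch. The measured values are invented for the example, and `stratified_mean` and `plain_mean` are hypothetical helpers. The stratified estimate weights each stratum's sample mean by that stratum's share of the population.

```python
def stratified_mean(population, samples):
    """Estimate the population mean as the size-weighted mean of stratum sample means."""
    total = sum(population.values())
    return sum(
        (population[name] / total) * (sum(values) / len(values))
        for name, values in samples.items()
    )

def plain_mean(samples):
    """Unweighted mean of all sampled values, ignoring strata."""
    pooled = [v for values in samples.values() for v in values]
    return sum(pooled) / len(pooled)

population = {"supervisors": 100, "workers": 900}

# Invented measurements: 2 + 18 units (proportionate), 5 + 15 units (disproportionate).
proportionate = {"supervisors": [50, 54], "workers": [20, 22, 24] * 6}
disproportionate = {"supervisors": [50, 54, 52, 50, 54], "workers": [20, 22, 24] * 5}

# Proportionate: the plain sample mean already equals the stratified estimate.
print(round(plain_mean(proportionate), 2), round(stratified_mean(population, proportionate), 2))
# 25.0 25.0

# Disproportionate: the plain mean over-represents supervisors, so weights are needed.
print(round(plain_mean(disproportionate), 2), round(stratified_mean(population, disproportionate), 2))
# 29.5 25.0
```

With a proportionate sample no weighting step is required, whereas a disproportionate sample must be reweighted by stratum size before an overall estimate is reported.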