Sampling is the part of statistical practice concerned with selecting an unbiased or random subset of individual observations from a population, with the aim of learning something about that population, especially for the purpose of making predictions based on statistical inference. Sampling is an important aspect of data collection.

There are two basic approaches to sampling: probabilistic and non-probabilistic sampling.

A probability sampling scheme is one in which every unit in the population has a chance (greater than zero) of being selected in the sample, and this probability can be accurately determined. The combination of these traits makes it possible to produce unbiased estimates of population totals, by weighting sampled units according to their probability of selection.

Example: We want to estimate the total income of adults living in a given street. We visit each household in that street, identify all adults living there, and randomly select one adult from each household. (For example, we can allocate each person a random number, generated from a uniform distribution between 0 and 1, and select the person with the highest number in each household.) We then interview the selected person and find their income. People living on their own are certain to be selected, so we simply add their income to our estimate of the total. But a person living in a household of two adults has only a one-in-two chance of selection. To reflect this, when we come to such a household, we would count the selected person’s income twice towards the total. (In effect, the person who is selected from that household is taken as representing the person who isn’t selected.)

In the above example, not everybody has the same probability of selection; what makes it a probability sample is the fact that each person’s probability is known. When every element in the population does have the same probability of selection, this is known as an ‘equal probability of selection’ (EPS) design. Such designs are also referred to as ‘self-weighting’ because all sampled units are given the same weight.
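The weighting in the household-income example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data layout (a list of households, each a list of adult incomes), the function name `estimate_total_income`, and the income figures are all hypothetical, and `rng.choice` stands in for the "highest random number" selection described in the example, since both give each adult in a household the same chance.

```python
import random

def estimate_total_income(households, rng):
    """Estimate total adult income by interviewing one adult per household.

    `households` is a hypothetical layout: a list of lists, where each
    inner list holds the incomes of the adults in one household.
    """
    total = 0.0
    for incomes in households:
        chosen = rng.choice(incomes)   # each adult is picked with probability 1/len(incomes)
        weight = len(incomes)          # weight = inverse of the selection probability
        total += weight * chosen
    return total

# Made-up street: one single-adult, one two-adult and one three-adult household.
households = [[30000], [20000, 40000], [10000, 10000, 10000]]
estimate = estimate_total_income(households, random.Random(0))
```

With these made-up figures the true total is 120000. A single run returns either 100000 or 140000, depending on which adult is picked in the two-adult household, and the two outcomes are equally likely, so the estimate is right on average. That is the sense in which weighting by the inverse selection probability gives an unbiased estimate.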

Nonprobability sampling is any sampling method where some elements of the population have no chance of selection (these are sometimes referred to as ‘out of coverage’ or ‘undercovered’), or where the probability of selection can’t be accurately determined. It involves the selection of elements based on assumptions regarding the population of interest, which form the criteria for selection. Because the selection of elements is nonrandom, nonprobability sampling does not allow the estimation of sampling errors. These conditions limit how much information a sample can provide about the population: the relationship between sample and population is poorly known, which makes it difficult to extrapolate from the sample to the population.

Example: We visit every household in a given street, and interview the first person to answer the door. In any household with more than one occupant, this is a nonprobability sample, because some people are more likely to answer the door (e.g. an unemployed person who spends most of their time at home is more likely to answer than an employed housemate who might be at work when the interviewer calls) and it’s not practical to calculate these probabilities.

In addition, non-response effects may turn any probability design into a non-probability design if the characteristics of non-response are not well understood, since non-response effectively modifies each element’s probability of being sampled.

Let us look at the various types of sampling under each category:

**Probability Sampling**

- Simple random sampling
- Systematic sampling
- Stratified sampling
- Multistage cluster sampling

**Non-probability Sampling**

- Convenience sampling
- Quota sampling
- Judgment sampling
- Snowball sampling

## 1. Probability Sampling Methods

A sampling scheme in which every member of the population has a calculable and non-zero probability of being included in the sample is known as **probability sampling**. Units are chosen by a method of random selection consistent with these inclusion probabilities, and the same probabilities are used in forming estimates from the sample. The probability of selection need not be equal for all members of the population. If the purpose of the research is to arrive at conclusions or make predictions affecting the population as a whole, then a probabilistic sampling approach is desirable.

**1.1. Simple Random Sampling:**

A sampling process where each element in the target population has an equal chance or probability of inclusion in the sample is known as simple random sampling. For example, if a sample of 15000 names is to be drawn from the telephone directory, then each name in the directory has an equal chance of being selected. The serial numbers of the names could be randomly generated by computer or picked out of a box, and later matched with the corresponding names to complete the list. In small populations, random sampling is done without replacement so that no unit is sampled more than once.
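A minimal sketch of drawing such a sample without replacement, using Python's standard `random.sample`. The frame of 100 placeholder names and the helper name `simple_random_sample` are assumptions made for illustration; a real frame would be the actual directory listing.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n distinct units from the sampling frame, each equally likely."""
    rng = random.Random(seed)      # a seed makes the draw reproducible
    return rng.sample(frame, n)    # sampling without replacement

# Hypothetical sampling frame standing in for the directory entries.
frame = [f"name_{i}" for i in range(1, 101)]
sample = simple_random_sample(frame, 10, seed=42)
```

Because `random.sample` draws without replacement, no entry can appear in the sample twice.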

The benefits of **simple random sampling** are best realized when the target population is small and homogeneous, the sampling frame is clearly defined, and not much information is available about the population. It is free of classification error and requires minimal advance knowledge of the population. Two striking features are the elimination of human bias and the fact that selection does not depend on the availability of the element. For larger populations, however, it is seldom put into practice because it is hard to apply: every item in the population must be listed before sampling, which requires constructing a very large sampling frame and results in extensive sampling calculations and excessive costs.

**1.2. Systematic Sampling:**

Systematic sampling involves the selection of every k’th element from a sampling frame. Here ‘k’ represents the skip interval and is calculated using the following formula.

Skip interval (k) = population size / sample size

Often used as a substitute for simple random sampling, it involves the selection of units from a list using a skip interval (k) so that every k’th element on the list, following a random start between 1 and k, is included in the sample. For example, if k were 6 and the random start were 2, then the sample would consist of the 2nd, 8th, 14th, 20th, ... elements of the sampling frame.

Note that if the skip interval is not a whole number, it is rounded to the nearest whole number. This sampling method can be used in industrial operations, where equipment and machinery on the production line are checked for proper functioning against specifications. The manufacturer can select every k’th item to ensure consistent quality or to detect defects: the first item is selected at random as the starting point, and every k’th item after it is evaluated against the specifications. The method is also applicable when questioning people in a sample survey, where the interviewer may approach every 10th person entering a particular shop. In both cases, the researcher must first determine the skip interval, select the first item at random, and thereafter follow the skip interval. This method is more economical and less time-consuming than simple random sampling.
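The procedure described above (compute the skip interval, pick a random start between 1 and k, then take every k’th element) can be sketched as follows. The function name, the optional `start` argument, and the 60-element frame are assumptions made for illustration.

```python
import random

def systematic_sample(frame, sample_size, start=None, seed=None):
    """Select every k'th element after a random start between 1 and k."""
    # Skip interval, rounded to a whole number (Python's round() sends
    # exact halves to the nearest even integer).
    k = max(1, round(len(frame) / sample_size))
    if start is None:
        start = random.Random(seed).randint(1, k)   # random start in 1..k
    # Positions in the text are 1-based, so subtract 1 for Python indexing.
    return frame[start - 1::k]

frame = list(range(1, 61))                  # hypothetical frame of 60 units
print(systematic_sample(frame, 10, start=2))
# -> [2, 8, 14, 20, 26, 32, 38, 44, 50, 56]
```

When the population size is not an exact multiple of the sample size, rounding k means the sample may come out slightly larger or smaller than requested.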

**1.3. Stratified random sampling:**

Stratification is the process of grouping the members of the population into homogeneous groups (strata) before sampling. Each element in the population must be assigned to exactly one stratum. Random sampling is then applied within each stratum independently. This often improves the representativeness of the sample by reducing the sampling error.

The number of units drawn from each stratum depends on the homogeneity of its elements. A smaller sample can be drawn from a stratum whose elements are known to have similar values, whereas a proportionally larger sample should be drawn from a stratum whose values are known to differ. In the former case, the information from a small number of respondents can be generalized to the whole stratum. In the latter case, with much variability among the elements, the larger number of sampled elements keeps the sampling error to a minimum. The smaller overall error arises because every group is adequately represented when the strata are combined.
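As a rough sketch of the idea above, the snippet below draws an independent simple random sample inside each stratum, with the per-stratum sample sizes supplied by the researcher. The stratum names, the values, and the chosen sizes (2 from the homogeneous stratum, 5 from the variable one) are hypothetical; the text does not prescribe a specific allocation rule, so none is computed here.

```python
import random

def stratified_sample(strata, sizes, seed=None):
    """Draw an independent simple random sample within each stratum.

    `strata` maps a stratum name to its list of units;
    `sizes` maps the same names to the number of units to draw.
    """
    rng = random.Random(seed)
    return {name: rng.sample(units, sizes[name]) for name, units in strata.items()}

# Made-up strata: one with nearly identical values, one with widely varying values.
strata = {
    "homogeneous": [50, 50, 51, 50, 49, 50, 50, 51],
    "variable": [10, 95, 40, 70, 5, 88, 23, 61],
}
sizes = {"homogeneous": 2, "variable": 5}   # fewer draws where values barely differ
sample = stratified_sample(strata, sizes, seed=1)
```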

**1.4. Multistage cluster sampling:**

Clustering involves grouping the population into various clusters and selecting a few clusters for study. Cluster sampling is suitable for research studies that cover a large geographic area. Once the clusters are formed, the researcher can opt for one-stage, two-stage, or multistage cluster sampling. In single-stage sampling, all the elements from each selected cluster are studied, whereas in two-stage sampling, the researchers use random sampling to select a few elements from each selected cluster. Multistage sampling involves selecting a sample in two or more successive stages. Here the clusters selected in the first stage can be divided further into cluster units.

For example, consider a company that decides to interview 400 households about the likeability of its new detergent in a metropolitan city. To save time and resources, the researchers divide the city into separate blocks, say 40, with each block consisting of heterogeneous units. The researcher may opt for two-stage cluster sampling if he finds that the individual clusters differ little from one another. Similarly, multistage cluster sampling involves three or more sampling steps. Cluster sampling differs from stratified sampling in that the sampling is done on clusters rather than on elements within strata: in stratified sampling, elements are randomly selected from every stratum, whereas in cluster sampling only the selected clusters are studied.
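The detergent example can be sketched as a two-stage cluster sample. The text gives only the 40 blocks and the target of 400 households; the split into 20 blocks with 20 households each, the 100 households per block, and all names below are assumptions chosen so the numbers work out.

```python
import random

def two_stage_cluster_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Stage 1: pick clusters at random. Stage 2: pick units within each."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)                    # stage 1
    return {c: rng.sample(clusters[c], n_per_cluster) for c in chosen}   # stage 2

# Hypothetical city: 40 blocks of 100 households each.
city = {block: [f"block{block}-house{h}" for h in range(100)] for block in range(40)}
sample = two_stage_cluster_sample(city, n_clusters=20, n_per_cluster=20, seed=7)
total_households = sum(len(houses) for houses in sample.values())   # 20 * 20 = 400
```

Only the selected blocks are ever visited, which is where the savings in time and travel come from. A single-stage design would instead interview every household in each selected block.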

## 2. Non-probability Sampling Methods

Non-probability sampling involves the selection of units based on factors other than random chance. It is also known as deliberate sampling or purposive sampling. For example, a scheme whereby units are selected purposefully would yield a non-random sample. In a general sense, it is an umbrella term that includes any sample that does not conform to the requirements of probability sampling. Convenience sampling, quota sampling, judgment sampling and snowball sampling are a few examples of non-probability sampling.

**2.1. Convenience Sampling:**

The selection of units from the population based on their easy availability and accessibility to the researcher is known as convenience sampling. For example, imagine a company that surveys a sample of its employees to gauge the acceptance of a new flavor of potato chips that it plans to introduce in the market. This is a typical example of convenience sampling, as the criterion for selecting the sample is convenience and availability. Although this type of research is easy and cost-effective, the findings of the sample survey cannot be generalized to the entire population, as the sample is not representative. As there is no set criterion for selecting the sample, there is scope for the research to be influenced by the bias of the researcher. In the example above, the company surveys only its own employees to judge whether the market would accept the product.

**2.2. Quota Sampling:**

In quota sampling, the entire population is segmented into mutually exclusive groups. The number of respondents (quota) to be drawn from each category is specified in advance, and the final selection of respondents is left to the interviewer, who proceeds until the quota for each category is filled. Quota sampling finds extensive use in commercial research, where the main objective is to ensure that the sample represents, in relative proportion, the people in the various categories in the population, such as gender, age group, social class, ethnicity, and region of residence. For example, if a researcher wants to segment the entire population based on gender, then he would have two categories of respondents, that is, males and females. If he plans to collect a sample of 30, he may allot a quota of 15 for male and 15 for female respondents. The researcher will therefore stop administering the questionnaire to females after he interviews the 15th female respondent, that is, when the quota of 15 females is filled.

Quota sampling is subject to interviewer bias that may result in:

- The quota reflecting the population only in terms of superficial characteristics.
- The researcher selecting the respondents based on availability rather than on their suitability to the study.
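The quota procedure described above, where the interviewer keeps accepting respondents until each category's quota is filled, can be sketched as follows. The stream of respondents (two females for every male, in arrival order) and the function name are hypothetical.

```python
def quota_sample(respondents, quotas):
    """Accept respondents in the order met until every quota is filled.

    `respondents` yields (respondent_id, category) pairs;
    `quotas` maps each category to the number of respondents wanted.
    """
    remaining = dict(quotas)
    selected = []
    for respondent_id, category in respondents:
        if remaining.get(category, 0) > 0:      # quota for this category still open
            selected.append((respondent_id, category))
            remaining[category] -= 1
        if not any(remaining.values()):         # all quotas filled: stop interviewing
            break
    return selected

# Hypothetical stream of passers-by: every third person is male.
stream = [(i, "female" if i % 3 else "male") for i in range(1, 101)]
sample = quota_sample(stream, {"male": 15, "female": 15})
```

Note that the code simply takes whoever turns up first. No random selection is involved, which is why quota sampling is classed as a non-probability method even though the final sample matches the quotas exactly.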

**2.3. Judgment Sampling:**

The selection of a unit from the population based on the judgment of an experienced researcher is known as judgment or purposive sampling. Here, the sample units are selected based on what the researcher knows about the population’s parameters. Companies frequently select certain preferred cities when test marketing their products, because they consider the population of those cities to be representative of the total population of the country. For example, certain companies test market their new product launches in cities like Mumbai and Bangalore, because they consider the profile of these cities to be representative of the total Indian population. The same is the case with the selection of specific shopping malls that, according to the researcher’s judgment, attract a reasonable number of customers from different sections of society. Polling results predicted on television are also a result of judgment sampling. Researchers select those districts that had voting patterns close to those of the overall state or country in the previous year. The judgment of the researcher rests on the assumption that the past voting trends of the selected districts are still representative of the political behavior of the state’s population.

**2.4. Snowball Sampling:**

Sampling procedures in which additional respondents are selected through referrals from the initial respondents are known as snowball sampling. This sampling technique is used for low-incidence or rare populations. Sampling is a big problem in this case, as no defined population from which the sample can be drawn is available. The sampling process therefore depends on a chain system of referrals. Suppose SG sports Ltd., a manufacturer of sports equipment, plans to survey 100 senior squash players through its new website to get their feedback on the quality of its products.

However, keeping track of such senior squash players can be very difficult, as they may be very rare. The company therefore collects the details of the first 200 visitors to its website, to find out whether any of them is a squash player or knows one. If a visitor is a squash player, he is requested to refer the names of at least 3 other players known to him. The referred squash players are then contacted for further referrals, and this goes on until the sample size of 100 senior players is reached. Although small sample sizes and low costs are the clear advantages of snowball sampling, bias is one of its disadvantages. The people referred by those sampled in the initial stages may be similar to those initially sampled, so the sample may not represent a cross-section of the total population. It may also happen that visitors to the site or interviewees refuse to disclose the names of those whom they know.
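The chain of referrals described above can be sketched as a breadth-first walk over a referral map, stopping once the target sample size is reached. The referral data, the single-letter names, and the function name are hypothetical; a real study would collect referrals during the interviews rather than know them in advance.

```python
from collections import deque

def snowball_sample(seeds, referrals, target_size):
    """Follow referrals from the initial respondents until the target is met.

    `seeds` are the initial respondents; `referrals` maps a person to the
    people they name when asked.
    """
    sampled = []
    seen = set(seeds)            # everyone already sampled or queued
    queue = deque(seeds)
    while queue and len(sampled) < target_size:
        person = queue.popleft()
        sampled.append(person)
        for referred in referrals.get(person, []):
            if referred not in seen:     # do not contact the same person twice
                seen.add(referred)
                queue.append(referred)
    return sampled

# Made-up referral map: A names B, C and D; B names E; C names A and F.
referrals = {"A": ["B", "C", "D"], "B": ["E"], "C": ["A", "F"], "D": []}
print(snowball_sample(["A"], referrals, target_size=5))
# -> ['A', 'B', 'C', 'D', 'E']
```

Everyone in the result is reachable from the seed through referrals, which illustrates the bias noted above: people outside the seed's social circle can never enter the sample. If the referral chain dries up before the target is reached, the function returns a smaller sample.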