Data Analysis in Research Methodology

After data are obtained through questionnaires, interviews, observation, or secondary sources, they need to be edited. Blank responses, if any, have to be handled in some way, the data coded, and a categorizing scheme set up. The data then have to be keyed in and analyzed with a software program.

Editing

Data have to be edited, especially when they relate to responses to open-ended questions in interviews and questionnaires, or to unstructured observations. Information that the interviewer, observer, or researcher may have noted down in a hurry must be clearly deciphered so that it can be coded systematically in its entirety. Lack of clarity at this stage will cause confusion later. Edits should be made in a pencil or ink of a different color so that the original information remains available in case of further doubts.

Incoming mailed questionnaires have to be checked for incompleteness and inconsistencies by designated members of the research staff. Inconsistencies that can be logically corrected should be rectified at this stage.
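The checks described above can be sketched as a small editing pass. This is a minimal Python illustration rather than a prescribed procedure: it assumes each questionnaire is stored as a dict, that scale items are named `q1`, `q2`, and so on, with answers on a 1-to-5 scale, and it uses one hypothetical consistency rule (years in the current job cannot exceed total years of work experience).

```python
SCALE_MIN, SCALE_MAX = 1, 5  # assumed 5-point scale for items named q1, q2, ...


def edit_checks(response):
    """Return the problems found in one questionnaire (a dict of answers)."""
    problems = []
    # Range check: scale answers must fall within the scale's end points.
    for item, value in response.items():
        if item.startswith("q") and value is not None:
            if not SCALE_MIN <= value <= SCALE_MAX:
                problems.append(f"{item}: value {value} out of range")
    # Hypothetical consistency rule between two factual questions.
    experience = response.get("years_experience")
    tenure = response.get("years_in_job")
    if experience is not None and tenure is not None and tenure > experience:
        problems.append("years_in_job exceeds years_experience")
    return problems


returned = {"q1": 4, "q2": 7, "years_experience": 3, "years_in_job": 5}
print(edit_checks(returned))
# ['q2: value 7 out of range', 'years_in_job exceeds years_experience']
```

The function only flags problems; whether an inconsistency can be logically corrected remains a judgment for the research staff.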

Much of the editing is automatically taken care of in the case of computer-assisted telephone interviews and electronically administered questionnaires, even as the respondent is answering the question.

Handling Blank Responses

Not all respondents answer every item in the questionnaire. Answers may be left blank because the respondent did not understand the question, did not know the answer, was unwilling to answer, or was simply indifferent to the need to respond to the entire questionnaire. If a substantial number of questions (say, 25% of the items in the questionnaire) have been left unanswered, it may be a good idea to discard the questionnaire and exclude it from the data set for analysis. In that event, the final report submitted to the sponsor of the study should state how many questionnaires were returned but not used because of excessive missing data. If, however, only two or three items are left blank in a questionnaire with, say, 30 or more items, we need to decide how these blank responses are to be handled.
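A minimal Python sketch of this screening rule, assuming each questionnaire is stored as a list of answers with `None` marking a blank, and using the illustrative 25% cut-off (treated here as inclusive). The number of discarded questionnaires is the figure that would go into the report to the sponsor.

```python
MISSING_CUTOFF = 0.25  # illustrative threshold ("say 25% of the items")


def screen_questionnaires(questionnaires, cutoff=MISSING_CUTOFF):
    """Split questionnaires into those kept and those discarded for blanks."""
    kept, discarded = [], []
    for answers in questionnaires:
        blank_share = sum(a is None for a in answers) / len(answers)
        if blank_share >= cutoff:
            discarded.append(answers)
        else:
            kept.append(answers)
    return kept, discarded


# Hypothetical returns from an 8-item questionnaire (None = left blank).
returns = [
    [4, 5, 3, 4, 2, 5, 4, 3],        # complete
    [4, None, 3, None, 2, 5, 4, 3],  # 2 of 8 blank (25%): discarded
    [2, 3, None, 5, 4, 4, 3, 2],     # 1 of 8 blank (12.5%): kept
]
kept, discarded = screen_questionnaires(returns)
print(f"Returned but unused due to excessive missing data: {len(discarded)}")
```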

One way to handle a blank response to an interval-scaled item with a midpoint is to assign the midpoint of the scale as the response to that item. Another is to let the computer ignore the blank responses when the analyses are done. There are several ways of handling blank responses; the most common, however, are either to assign the scale midpoint as the value or to ignore the particular item during the analysis.
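The two common options can be compared in a few lines of Python. This sketch assumes a 1-to-5 interval scale (midpoint 3) and `None` for a blank; the item data are made up.

```python
from statistics import mean

SCALE_MIDPOINT = 3  # midpoint of an assumed 1-to-5 interval scale


def impute_midpoint(values, midpoint=SCALE_MIDPOINT):
    """Option 1: replace each blank (None) with the scale midpoint."""
    return [midpoint if v is None else v for v in values]


def ignore_blanks(values):
    """Option 2: drop blanks so the analysis uses only the answers given."""
    return [v for v in values if v is not None]


# Hypothetical answers from five respondents to one item; one left it blank.
item = [5, 4, None, 5, 2]
print(mean(impute_midpoint(item)))  # blank counted as the midpoint
print(mean(ignore_blanks(item)))    # blank left out of the calculation
```

Here the item mean is 3.8 with midpoint imputation and 4.0 when the blank is ignored: imputing the midpoint pulls item means toward the center of the scale, which matters more as blanks become more numerous.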

Coding

The next step is to code the responses. Scanner sheets can be used for collecting questionnaire data; such sheets allow the responses to be entered directly into the computer without manual keying. If, for whatever reason, this cannot be done, it is better to first transcribe the data from the questionnaires onto a coding sheet and then key them in. Compared with flipping through each questionnaire for each item, this avoids confusion, especially when there are many questions and a large number of questionnaires.

It is possible to key in the data directly from the questionnaires, but that would require flipping through many questionnaires page by page, which can lead to errors and omitted items. Transferring the data onto a coding sheet first therefore helps.
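A coding sheet is essentially a table with one row per questionnaire, filled in according to a codebook that fixes a numeric code for every possible answer. The sketch below is a hypothetical illustration with two made-up categorical questions and `None` for a blank; an answer missing from the codebook raises a `KeyError`, so transcription problems surface instead of being silently coded.

```python
# Hypothetical codebook: the numeric code to key in for each possible answer.
CODEBOOK = {
    "gender": {"Male": 1, "Female": 2},
    "education": {"High school": 1, "Bachelor's": 2, "Master's": 3, "Doctorate": 4},
}


def code_questionnaire(raw_answers):
    """Turn one questionnaire's raw answers into one coding-sheet row."""
    row = {}
    for variable, codes in CODEBOOK.items():
        answer = raw_answers.get(variable)
        # Blanks stay missing; an answer not in the codebook raises KeyError.
        row[variable] = None if answer is None else codes[answer]
    return row


coding_sheet = [
    code_questionnaire({"gender": "Female", "education": "Master's"}),
    code_questionnaire({"gender": "Male"}),  # education left blank
]
print(coding_sheet)
# [{'gender': 2, 'education': 3}, {'gender': 1, 'education': None}]
```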

Human errors can occur while coding, so at least 10% of the coded questionnaires should be checked for coding accuracy. They can be selected by systematic sampling, that is, by verifying every nth coded form. If many errors are found in the sample, all items may have to be checked.
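The systematic sampling described above can be sketched as follows. This is an illustration under stated assumptions: forms are numbered from 0, at least one form has been coded, and the interval k is chosen so that at least the target percentage is always selected, with a random starting form among the first k.

```python
import math
import random


def forms_to_verify(n_forms, percent=10, seed=None):
    """Return indices of coded forms to re-check, using systematic sampling.

    Assumes n_forms >= 1. The interval k guarantees at least `percent`%
    of the forms are chosen; the first form is drawn at random from the
    first k forms, and every k-th form after it is selected.
    """
    needed = max(1, math.ceil(n_forms * percent / 100))
    k = n_forms // needed  # sampling interval: verify every k-th form
    start = random.Random(seed).randrange(k)
    return list(range(start, n_forms, k))


to_check = forms_to_verify(95, seed=1)
print(len(to_check), to_check[:3])
```

With 95 forms the interval is 9 rather than 10, because every 10th form could yield only 9 forms (under 10%) for some starting points.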

Categorizing

At this point it is useful to set up a scheme for categorizing the variables so that the several items measuring a concept are grouped together. Responses to negatively worded questions also have to be reversed so that all answers point in the same direction.

If the questions measuring a concept are not contiguous but scattered over various parts of the questionnaire, care has to be taken to include all the items without any omission or wrong inclusion.
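A categorizing scheme can be written down explicitly so that no scattered item is omitted or wrongly included. The Python sketch below assumes a 1-to-5 scale, and the concept names, item numbers, and negatively worded items are all hypothetical. Reversing uses (min + max) minus the answer, so 1 becomes 5, 2 becomes 4, and 3 stays 3.

```python
SCALE_MIN, SCALE_MAX = 1, 5  # assumed 5-point scale

# Hypothetical categorizing scheme: items per concept (scattered across the
# questionnaire) and the negatively worded items that must be reversed.
CONCEPT_ITEMS = {
    "job_satisfaction": ["q1", "q7", "q12"],
    "commitment": ["q3", "q9"],
}
NEGATIVE_ITEMS = {"q7", "q9"}


def reverse_score(value):
    """Reverse a scale answer (1<->5, 2<->4, 3 stays 3); blanks stay None."""
    return None if value is None else SCALE_MIN + SCALE_MAX - value


def categorize(response):
    """Group one respondent's answers by concept, reversing negative items."""
    grouped = {}
    for concept, item_list in CONCEPT_ITEMS.items():
        values = []
        for item in item_list:
            value = response.get(item)  # None if the item was left blank
            if item in NEGATIVE_ITEMS:
                value = reverse_score(value)
            values.append(value)
        grouped[concept] = values
    return grouped


print(categorize({"q1": 4, "q3": 5, "q7": 2, "q9": 1, "q12": 5}))
# {'job_satisfaction': [4, 4, 5], 'commitment': [5, 5]}
```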

Entering Data

If questionnaire data are not collected on scanner answer sheets, which can be entered directly into the computer as a data file, the raw data will have to be keyed in manually. Raw data can be entered through any software program. For instance, the SPSS Data Editor looks like a spreadsheet: each row represents a case, and each column represents a variable. All missing values appear as a period (dot) in the cell. Values can be added, changed, or deleted after the data have been entered.

It is also easy to compute the new variables that have been categorized earlier, using the Compute dialog box, which opens when the Transform icon is chosen. Once the missing values, the recodes, and the computing of new variables are taken care of, the data are ready for analysis.
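Outside SPSS, the step of computing a new variable from items grouped earlier can be sketched in Python. The sketch assumes, beyond what the text states, that the new variable is the mean of a concept's items, that blanks are `None`, and that a score is computed only when a minimum number of items were answered; `min_answered` is a hypothetical parameter.

```python
from statistics import mean


def compute_scale_score(item_values, min_answered=1):
    """Average a concept's answered items into one new variable.

    Blank items (None) are skipped; if fewer than `min_answered` items were
    answered, the new variable itself is left missing (None).
    """
    answered = [v for v in item_values if v is not None]
    if len(answered) < max(1, min_answered):
        return None
    return mean(answered)


# Hypothetical job-satisfaction items (already reverse-scored) for three people.
respondents = [[4, 4, 5], [2, None, 3], [None, None, 3]]
scores = [compute_scale_score(items, min_answered=2) for items in respondents]
print(scores)
```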

Feel for the Data

We can acquire a feel for the data by checking their central tendency and dispersion. The mean, range, standard deviation, and variance will give the researcher a good idea of how the respondents have reacted to the items in the questionnaire and how good the items and measures are. If the responses to an individual item in a scale do not have a good spread (range) and show very little variability, the researcher may suspect that the question was not properly worded and that respondents did not quite understand its intent. Biases, if any, can also be detected if the respondents have tended to respond similarly to all the items, that is, stuck to only certain points on the scale. The maximum and minimum scores, mean, standard deviation, variance, and other statistics can be easily obtained, and they will indicate whether the responses range satisfactorily over the scale.
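These descriptive checks can be sketched with Python's standard `statistics` module. The items and answers are made up, and the cut-offs used to flag an item with too little spread (range below 2 or standard deviation below 0.5) are hypothetical; suitable values depend on the scale and the study.

```python
from statistics import mean, stdev, variance


def describe_item(values):
    """Central tendency and dispersion for one item, ignoring blanks (None)."""
    answered = [v for v in values if v is not None]
    return {
        "n": len(answered),
        "mean": mean(answered),
        "min": min(answered),
        "max": max(answered),
        "range": max(answered) - min(answered),
        "std_dev": stdev(answered),      # sample standard deviation
        "variance": variance(answered),  # sample variance
    }


def low_spread(stats, min_range=2, min_std_dev=0.5):
    """Flag an item whose answers barely vary (cut-offs are hypothetical)."""
    return stats["range"] < min_range or stats["std_dev"] < min_std_dev


# Hypothetical answers on a 1-to-5 scale.
items = {"q1": [1, 3, 5, 4, 2, 5], "q2": [4, 4, 4, 5, 4, None]}
for name, values in items.items():
    stats = describe_item(values)
    verdict = "low spread" if low_spread(stats) else "ok"
    print(name, round(stats["mean"], 2), stats["range"], verdict)
```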

A frequency distribution of the nominal variables of interest should be obtained. Visual displays of these distributions, such as histograms and bar charts, can be produced by programs that generate charts. In addition to the frequency distributions, means, and standard deviations, it is useful to know how the dependent and independent variables in the study are related to each other. For this purpose, an intercorrelation matrix of these variables should also be obtained.

It is always prudent to obtain (1) the frequency distributions for the demographic variables, (2) the mean, standard deviation, range, and variance of the other dependent and independent variables, and (3) an intercorrelation matrix of the variables, irrespective of whether or not the hypotheses are directly related to these analyses. These statistics give a feel for the data.

Establishing the goodness of data lends credibility to all subsequent analyses and findings. Hence, getting a feel for the data becomes the necessary first step in all data analysis. Based on this initial feel, further detailed analyses may be done to test the goodness of the data.

Testing Goodness of Data

The reliability and validity of the measures can now be tested.

Reliability: The reliability of a measure is established by testing for both consistency and stability. Consistency indicates how well the items measuring a concept hang together as a set. Cronbach’s alpha is a reliability coefficient that indicates how well the items in a set are positively correlated with one another. It is computed in terms of the average intercorrelation among the items measuring the concept. The closer Cronbach’s alpha is to 1, the higher the internal consistency reliability.

Another measure of consistency reliability, used in specific situations, is the split-half reliability coefficient. Because it reflects the correlation between two halves of a set of items, the coefficient obtained will vary depending on how the scale is split. Split-half reliability is sometimes used to test for consistency when more than one scale, dimension, or factor is assessed; the items across each of the dimensions or factors are then split based on some predetermined logic. In almost every case, however, Cronbach’s alpha is an adequate test of internal consistency reliability.

The stability of a measure can be assessed through parallel-form reliability and test-retest reliability. Parallel-form reliability is established when a high correlation is obtained between two similar forms of a measure. Test-retest reliability is established by computing the correlation between the scores on the same test administered at two different times.

Validity: Factorial validity can be established by submitting the data to factor analysis. The results of factor analysis (a multivariate technique) will confirm whether or not the theorized dimensions emerge. Measures are developed by first delineating the dimensions of a concept so as to operationalize it; factor analysis reveals whether those dimensions are indeed tapped by the items in the measure, as theorized. Criterion-related validity can be established by testing the power of the measure to differentiate individuals who are known to be different. Convergent validity is established when there is a high degree of correlation between two different sources responding to the same measure. Discriminant validity is established when two distinctly different concepts are not correlated with each other.
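One simple way to check whether a measure differentiates individuals known to be different (criterion-related validity) is to compare the group means with a t statistic. The Python sketch below computes Welch's t (which does not assume equal group variances) using only the standard library. The "sales aptitude" measure and both groups are hypothetical. A full test would also need a p-value, for example from `scipy.stats.ttest_ind(a, b, equal_var=False)`, which is not computed here.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's t statistic for the difference between two group means."""
    standard_error = sqrt(
        variance(group_a) / len(group_a) + variance(group_b) / len(group_b)
    )
    return (mean(group_a) - mean(group_b)) / standard_error


# Hypothetical scores on a new sales-aptitude measure for salespeople already
# known to be top performers and salespeople known to be low performers.
top_performers = [38, 41, 36, 40, 39, 42]
low_performers = [29, 33, 31, 28, 34, 30]
t = welch_t(top_performers, low_performers)
print(round(t, 2))
```

A large positive t here would be consistent with the measure separating the known groups; it is evidence for, not proof of, criterion-related validity.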

Hypothesis Testing

Once the data are ready for analysis (that is, out-of-range and missing responses have been cleaned up and the goodness of the measures has been established), the researcher is ready to test the hypotheses already developed for the study.

The appropriate statistical test depends on the kind of hypothesis being tested and on the nature of the data.

Interpretation of Data Analyzed

After the data have been completely analyzed, the results have to be properly interpreted. It is this interpretation of the results that is most meaningful to the organization.