Methods of Data Processing in Research

Data processing is concerned with editing, coding, classifying, tabulating, and charting or diagramming research data. The essence of data processing in research is data reduction: winnowing the relevant data from the irrelevant, establishing order from chaos, and giving shape to a mass of data. Data processing in research consists of five important steps. They are:

1. Editing of Data

Editing is the first step in data processing. It is the process of examining the data collected in questionnaires/schedules to detect errors and omissions, to see that they are corrected, and to make the schedules ready for tabulation. When the whole data collection is over, a final and thorough check is made. Mildred B. Parten, in her book, points out that the editor is responsible for seeing that the data are:

  1. As accurate as possible,
  2. Consistent with other facts secured,
  3. Uniformly entered,
  4. As complete as possible,
  5. Acceptable for tabulation and arranged to facilitate coding and tabulation.

There are different types of editing. They are:

  1. Editing for quality asks the following questions: Are the data forms complete? Are the data free of bias? Are the recordings free of errors? Are the inconsistencies in responses within limits? Is there evidence of dishonesty by enumerators or interviewers? Is there any wanton manipulation of data?
  2. Editing for tabulation makes certain accepted modifications to the data, or even rejects certain pieces of data, in order to facilitate tabulation. For instance, an extremely high or low value may be ignored or bracketed within a suitable class interval.
  3. Field editing is done by the enumerator. The schedule filled in by the enumerator or the respondent may contain abbreviations, illegible writing and the like. The enumerator rectifies these soon after the enumeration or interview, while memory is still fresh. Field editing should not extend to filling omissions with guessed data.
  4. Central editing is done by the researcher after receiving all schedules, questionnaires or forms from the enumerators or respondents. Obvious errors can be corrected. For missing data or information, the editor may substitute values by reviewing information provided by other, similarly placed respondents. A definitely inappropriate answer is removed, and “no answer” is entered when reasonable attempts to get the appropriate answer fail to produce results.
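
The editing checks described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field names, the sample schedules, the consistency rule and the `TOP_CLASS_FLOOR` threshold are all hypothetical and do not come from any particular study.

```python
# Hypothetical schedules; every field name and value here is illustrative.
schedules = [
    {"id": 1, "age": 34, "years_employed": 10, "income": 42000},
    {"id": 2, "age": 22, "years_employed": 30, "income": 38000},   # inconsistent
    {"id": 3, "age": 45, "years_employed": 20, "income": None},    # omission
    {"id": 4, "age": 51, "years_employed": 25, "income": 900000},  # extreme value
]

REQUIRED = ("age", "years_employed", "income")
TOP_CLASS_FLOOR = 100000  # assumed lower bound of an open-ended top class


def edit_schedule(record):
    """Return the editing remarks for one schedule (empty list if clean)."""
    remarks = []
    # Completeness: record an omission as "no answer" instead of guessing.
    for field in REQUIRED:
        if record.get(field) is None:
            remarks.append(f"{field}: no answer")
    # Consistency with other facts secured on the same schedule.
    age, years = record.get("age"), record.get("years_employed")
    if age is not None and years is not None and years > age:
        remarks.append("years_employed exceeds age")
    # Editing for tabulation: bracket an extreme value into a class interval.
    income = record.get("income")
    if income is not None and income >= TOP_CLASS_FLOOR:
        remarks.append(f"income bracketed into '{TOP_CLASS_FLOOR} and above'")
    return remarks


report = {r["id"]: edit_schedule(r) for r in schedules}
for schedule_id, remarks in report.items():
    print(schedule_id, remarks or ["ok"])
```

The function only reports problems and never alters the original entries, which mirrors the rule that an editor's changes should stay visible and traceable.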

Editors must keep in view the following points while performing their work:

  1. They should be familiar with instructions given to the interviewers and coders as well as with the editing instructions supplied to them for the purpose,
  2. While crossing out an original entry for one reason or another, they should just draw a single line on it so that the same may remain legible,
  3. They must make entries (if any) on the form in some distinctive color and that too in a standardized form,
  4. They should initial all answers which they change or supply,
  5. Editor’s initials and the date of editing should be placed on each completed form or schedule.

2. Coding of Data

Coding is necessary for efficient analysis; through it the many replies can be reduced to a small number of classes that contain the critical information required for analysis. Coding decisions should usually be taken at the questionnaire design stage. This makes it possible to pre-code the questionnaire choices, which in turn helps computer tabulation, since one can key-punch directly from the original questionnaires. In the case of hand coding, some standard method may be used. One such method is to code in the margin with a colored pencil. Another is to transcribe the data from the questionnaire to a coding sheet. Whatever method is adopted, one should see that coding errors are eliminated altogether or reduced to the minimum level.

Coding is the process by which responses are organized into categories and numerals or other symbols are assigned to each item according to the class in which it falls. In other words, coding involves two important operations: (a) deciding the categories to be used and (b) allocating individual answers to them. These categories should be appropriate to the research problem, exhaustive of the data, mutually exclusive and unidirectional. Since coding eliminates much of the information in the raw data, it is important that researchers design category sets carefully in order to utilize the available data more fully.

The study of the responses is the first step in coding. In the case of pre-coded questions, coding begins at the preparation of interview schedules. Secondly, a coding frame is developed by listing the possible answers to each question and assigning code numbers or symbols to each of them; these are the indicators used for coding. The coding frame is an outline of what is coded and how it is to be coded. That is, it is a set of explicit rules and conventions used to classify observations on a variable into values, which are then transformed into numbers. Thirdly, after preparing the coding frame, the gradual process of fitting the answers to the questions must begin. Lastly, transcription is undertaken, i.e., the information is transferred from the schedules to a separate sheet called a transcription sheet. A transcription sheet is a large summary sheet which contains the answers/codes of all the respondents. Transcription may not be necessary when only simple tables are required and the number of respondents is small.
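
A coding frame and a transcription sheet can be sketched as follows. The question, the answer categories, the code numbers and the respondent identifiers are all hypothetical; the residual codes `OTHER` and `NO_ANSWER` are one assumed way of keeping the categories exhaustive and mutually exclusive.

```python
# Hypothetical coding frame for a single question on job satisfaction.
CODING_FRAME = {
    "very satisfied": 1,
    "satisfied": 2,
    "dissatisfied": 3,
    "very dissatisfied": 4,
}
OTHER = 8      # assumed residual code for answers outside the frame
NO_ANSWER = 9  # assumed code for omitted answers


def code_response(answer):
    """Allocate one raw answer to exactly one code in the frame."""
    if answer is None:
        return NO_ANSWER
    return CODING_FRAME.get(answer.strip().lower(), OTHER)


# Raw answers keyed by (hypothetical) respondent identifier.
responses = {
    "R001": "Satisfied",
    "R002": " very satisfied ",
    "R003": None,
    "R004": "Dissatisfied",
    "R005": "It depends",
}

# Transcription sheet: one summary row of codes per respondent.
transcription_sheet = [(rid, code_response(ans)) for rid, ans in responses.items()]
for row in transcription_sheet:
    print(row)
```

Because every answer maps to exactly one code, the sheet can be tabulated directly without returning to the original schedules.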

3. Classification of Data

Classification or categorization is the process of grouping statistical data under various understandable homogeneous groups for the purpose of convenient interpretation. Uniformity of attributes is the basic criterion for classification, and the data are grouped according to similarity. Classification becomes necessary for meaningful presentation and analysis when there is diversity in the data collected. However, it is meaningless in respect of homogeneous data. A good classification should have the characteristics of clarity, homogeneity, equality of scale, purposefulness and accuracy.

The objectives of classification are as follows:

  1. Complex, scattered and haphazard data are organized into a concise, logical and intelligible form.
  2. The characteristics of similarity and dissimilarity are made clear.
  3. Comparative studies become possible.
  4. The significance of the data is easier to understand, and a good deal of human energy is thereby saved.
  5. The underlying unity among different items is made clear and expressed.
  6. Data are arranged so that analysis and generalization become possible.

Classification is of two types, viz., quantitative classification, which is based on variables or quantity, and qualitative classification, which is based on attributes. The former groups the variables, that is, quantifies them in cohesive groups, while the latter groups the data on the basis of attributes or qualities. Again, classification may be multiple or dichotomous. The former makes many (more than two) groups on the basis of some quality or attribute, while the latter classifies into two groups on the basis of the presence or absence of a certain quality. Grouping the workers of a factory under various income groups (class intervals) comes under multiple classification; dividing them into skilled and unskilled workers is dichotomous classification. The tabular form of such classification is known as a statistical series, which may be inclusive or exclusive.
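
The factory example can be sketched in Python as below. The worker records and the class width of 10 are hypothetical. The helper functions follow the usual textbook convention, which the text above does not spell out: in an exclusive series the upper limit of a class belongs to the next class (10-20, 20-30), while in an inclusive series both limits belong to the class (10-19, 20-29). Whole-number values are assumed.

```python
from collections import Counter


def exclusive_class(value, width=10):
    """Exclusive series: '10-20' holds 10 <= x < 20."""
    lower = (value // width) * width
    return f"{lower}-{lower + width}"


def inclusive_class(value, width=10):
    """Inclusive series: '10-19' holds 10 <= x <= 19 (whole numbers only)."""
    lower = (value // width) * width
    return f"{lower}-{lower + width - 1}"


# Hypothetical workers; income is in arbitrary whole units.
workers = [
    {"name": "A", "income": 12, "skilled": True},
    {"name": "B", "income": 20, "skilled": False},
    {"name": "C", "income": 27, "skilled": True},
    {"name": "D", "income": 35, "skilled": False},
]

# Multiple classification: more than two groups, by income class interval.
by_income = Counter(exclusive_class(w["income"]) for w in workers)

# Dichotomous classification: presence or absence of one attribute.
by_skill = Counter("skilled" if w["skilled"] else "unskilled" for w in workers)

print(dict(by_income))
print(dict(by_skill))
```

Note how the value 20 falls in the class 20-30 under the exclusive convention, not in 10-20.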

4. Tabulation of Data

Tabulation is the process of summarizing raw data and displaying it in compact form for further analysis. Therefore, preparing tables is a very important step. Tabulation may be by hand, mechanical, or electronic. The choice is made largely on the basis of the size and type of study, alternative costs, time pressures, and the availability of computers and computer programmes. If the number of questionnaires is small and their length short, hand tabulation is quite satisfactory.

Tables may be divided into: (i) Frequency tables, (ii) Response tables, (iii) Contingency tables, (iv) Uni-variate tables, (v) Bi-variate tables, (vi) Statistical tables and (vii) Time series tables.
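
As a rough illustration of two of these table types, the sketch below builds a uni-variate frequency table and a bi-variate contingency table from coded records. The records and category labels are hypothetical.

```python
from collections import Counter

# Hypothetical coded records: (group, answer) for each respondent.
records = [
    ("male", "yes"), ("female", "no"), ("female", "yes"),
    ("male", "yes"), ("female", "yes"), ("male", "no"),
]

# Frequency table: counts for a single variable (the answer).
frequency = Counter(answer for _, answer in records)

# Contingency table: counts for each combination of two variables.
contingency = Counter(records)

rows = sorted({group for group, _ in records})
cols = sorted({answer for _, answer in records})

# Print the contingency table with row totals.
print(f"{'':<8}" + "".join(f"{c:>6}" for c in cols) + f"{'Total':>7}")
for group in rows:
    counts = [contingency[(group, c)] for c in cols]
    print(f"{group:<8}" + "".join(f"{n:>6}" for n in counts) + f"{sum(counts):>7}")
```

Hand tabulation follows the same logic with tally marks; the counter simply does the tallying.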

Generally, a research table has the following parts: (a) table number, (b) title of the table, (c) caption (column heading), (d) stub (row heading), (e) body, (f) head note, (g) foot note.

As a general rule, the following steps are necessary in the preparation of a table:

  1. Title of table: The table should first be given a brief, simple and clear title which expresses the basis of classification.
  2. Columns and rows: Each table should have just an adequate number of columns and rows.
  3. Captions and stubs: The columns and rows should be given simple and clear captions and stubs.
  4. Ruling: Columns and rows should be divided by means of thin or thick rulings.
  5. Arrangement of items: Comparable figures should be arranged side by side.
  6. Deviations: These should be arranged in the column near the original data so that their presence may easily be noted.
  7. Size of columns: This should be according to the requirement.
  8. Arrangement of items: This should be according to the problem.
  9. Special emphasis: This can be done by writing important data in bold or special letters.
  10. Unit of measurement: The unit should be noted below the lines.
  11. Approximation: This should also be noted below the title.
  12. Footnotes: These may be given below the table.
  13. Total: Totals of each column and the grand total should be in one line.
  14. Source: The source of data must be given. For primary data, write "primary data".
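
Several of these rules can be illustrated with a small plain-text table renderer. This is a minimal sketch, not a standard routine: the function `render_table`, its parameters and the sample figures are all hypothetical. It places the title first, a head note for the unit of measurement, captions over the columns, stubs beside the rows, column totals and the grand total on one line, and the source at the foot.

```python
def render_table(number, title, head_note, captions, rows, source):
    """Render a plain-text table; rows maps each stub to its list of figures."""
    all_captions = captions + ["Total"]
    # Column totals and the grand total go on a single line.
    totals = [sum(column) for column in zip(*rows.values())]
    total_cells = totals + [sum(totals)]

    stub_width = max(len(s) for s in list(rows) + ["Total"]) + 2
    col_width = max(len(str(x)) for x in all_captions + total_cells) + 2

    header = " " * stub_width + "".join(c.rjust(col_width) for c in all_captions)
    rule = "-" * len(header)  # ruling between the parts of the table

    lines = [f"Table {number}: {title}", f"({head_note})", rule, header, rule]
    for stub, figures in rows.items():
        cells = figures + [sum(figures)]  # row total in the last column
        lines.append(stub.ljust(stub_width)
                     + "".join(str(c).rjust(col_width) for c in cells))
    lines.append(rule)
    lines.append("Total".ljust(stub_width)
                 + "".join(str(t).rjust(col_width) for t in total_cells))
    lines.append(rule)
    lines.append(f"Source: {source}")
    return "\n".join(lines)


table = render_table(
    number=1,
    title="Workers by skill level and shift",
    head_note="Number of workers",
    captions=["Day", "Night"],
    rows={"Skilled": [40, 25], "Unskilled": [30, 15]},
    source="Primary data",
)
print(table)
```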

It is not always necessary to present facts in tabular form if they can be presented more simply in the body of the text. Tabular presentation enables the reader to follow the data more quickly than textual presentation. A table should not merely repeat information covered in the text. Nor, of course, should the same information be presented in both tabular and graphical form. Smaller and simpler tables may be presented in the text, while large and complex tables may be placed at the end of the chapter or report.

5. Data Diagrams

Diagrams are charts and graphs used to present data. They attract the reader’s attention, help present data more effectively, and make creative presentation of data possible. Data diagrams are classified into:

  1. Charts: A chart is a diagrammatic form of data presentation. Bars, rectangles, squares and circles can be used to present data. Bar charts are uni-dimensional, while rectangles, squares and circles are two-dimensional.
  2. Graphs: A graph is a method of presenting numerical data in visual form. It shows the relationship between two variables by means of either a curve or a straight line. Graphs may be divided into two categories: (1) graphs of time series and (2) graphs of frequency distribution. In graphs of time series, one of the factors is time and the other or others are the study factors. Graphs of frequency distribution show the distribution of, for example, executives by income, age and so on.
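
A frequency distribution can be previewed as a simple text bar chart before any real graph is drawn. The sketch below is only an illustration: the ages are hypothetical, the class width of 10 is an assumption, and inclusive class labels (20-29, 30-39) are used for whole-number ages.

```python
from collections import Counter

# Hypothetical ages of ten executives.
ages = [24, 31, 37, 42, 45, 48, 52, 55, 58, 63]
WIDTH = 10  # assumed class width

# Count how many ages fall in each class, keyed by the class lower limit.
distribution = Counter((age // WIDTH) * WIDTH for age in ages)

# One bar per class: its length is the class frequency.
chart_lines = [
    f"{lower}-{lower + WIDTH - 1} | {'#' * distribution[lower]} ({distribution[lower]})"
    for lower in sorted(distribution)
]
print("\n".join(chart_lines))
```

Each bar has length only, which is why a bar chart is described as uni-dimensional.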
