Introduction to File Organization

In daily life, huge amounts of data have to be collected and processed, which makes the data difficult to handle. Files make this handling fast and easy. Files are the large data structures used in information processing. Physically, a file is a sequence of bytes stored on a storage device such as a magnetic disk, magnetic drum or magnetic tape. Logically, a file is a collection of records, and each record is made up of fields. Each field consists of a group of characters, for example the decimal digits 0 through 9 and the letters A through Z. A group of fields is combined to form a logical record, which contains all the data of interest about some entity.

Different applications require a variety of record types and file structures; one basic distinction is between fixed-length and variable-length records. A fixed-length record has all its field sizes and its number of fields fixed or known in advance, whereas in a variable-length record the number of fields is not specified in advance. Variable-length records make programming and file design more complex. One way out is to break the variable-length record into several fixed-length records and identify them as header and trailer records (whose number may vary). As an input and output medium, the file has the following advantages from different points of view:

  1. Files save data-entry time and also reduce processing time.
  2. Files are used for the storage and subsequent retrieval of data.
  3. Files are also used to correct incorrect data and to update data that is already correct.
  4. Files are used in testing and debugging of programs.
  5. Files are used for processing voluminous data.
  6. Files can be used to validate data, that is, to check its accuracy.
  7. Using files, data can be stored on many devices, such as floppy disks, hard disks, magnetic tapes and magnetic drums, which helps with permanent storage and distributed data processing.

Field: A field is a meaningful collection of related characters. It is the smallest logical data entity that is treated as a single unit in data processing.

Record: Fields are generally grouped together to form a record. A record is a collection of related fields that are treated as a single unit. A record can be classified as:

  • Logical record: A logical record is a group of related information that is uniquely identifiable and treated as a unit. The data record is usually considered to be a group item comprising several related items.
  • Physical record: A physical record is a physical unit of information whose size and record mode are convenient for a particular input or output device for the storage of data. Physical and logical records can be related to each other in many ways. A physical record may consist of a portion of a logical record, or of a mixture of several complete and partial logical records.
  • Label record: Label records are normally used for files that are stored on magnetic tape or direct access devices. The record usually contains information about the file. Card files do not contain any label records. Generally, label records are written at the beginning or at the end of a file.
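To make the idea of fields grouped into a fixed-length record concrete, here is a minimal Python sketch. The three-field layout (subscriber number, name, age) and the helper names `pack_record` and `unpack_record` are assumptions chosen for illustration, not a layout prescribed by the text.

```python
import struct

# Hypothetical layout: 4-byte unsigned number, 20-byte name, 1-byte age.
# "<" disables padding, so every record is exactly 4 + 20 + 1 = 25 bytes.
RECORD_FORMAT = "<I20sB"
RECORD_SIZE = struct.calcsize(RECORD_FORMAT)


def pack_record(number, name, age):
    """Encode three fields as one fixed-length record."""
    # The "20s" field is padded with null bytes (or truncated) to 20 bytes.
    return struct.pack(RECORD_FORMAT, number, name.encode("ascii"), age)


def unpack_record(raw):
    """Decode a fixed-length record back into its fields."""
    number, name, age = struct.unpack(RECORD_FORMAT, raw)
    return number, name.rstrip(b"\x00").decode("ascii"), age


record = pack_record(1001, "A SHARMA", 34)
print(RECORD_SIZE, len(record))
print(unpack_record(record))
```

Because every record has the same size, the position of the n-th record in a file can be computed directly, which is what makes fixed-length records simpler to program against than variable-length ones.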

File: A file is a collection of related records, which may be of the same or varying length, stored in a particular area on the disk. There are two ways to access a file: sequential access and random access. In a sequential access file, data or text is stored and read back in sequence. In a random access file, data can be stored and accessed in any order. Files can also be divided into two types: data files and program files. Data can be transferred in two ways: between the console and the program, and between the program and a disk file.
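The difference between the two access methods can be sketched in Python as follows. The example assumes fixed-length 8-byte records and uses an in-memory `io.BytesIO` object as a stand-in for a disk file; the function names are illustrative only.

```python
import io

RECORD_SIZE = 8  # assumed fixed record length in bytes


def read_sequential(f):
    """Read every record in order, starting from the beginning of the file."""
    f.seek(0)
    records = []
    while True:
        chunk = f.read(RECORD_SIZE)
        if len(chunk) < RECORD_SIZE:
            break  # end of file reached
        records.append(chunk)
    return records


def read_random(f, index):
    """Jump straight to record `index` without reading the records before it."""
    f.seek(index * RECORD_SIZE)
    return f.read(RECORD_SIZE)


# In-memory stand-in for a disk file holding three 8-byte records.
f = io.BytesIO(b"REC00001REC00002REC00003")
print(read_sequential(f))
print(read_random(f, 2))
```

Sequential access must pass over records 0 and 1 to reach record 2, whereas random access computes the byte offset and moves there in one step.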

Blocking: A block is a physical record containing a series of logical records. A physical record is a group of characters or records treated as a single entity when moved into and out of main storage. When data is stored on magnetic tape or direct access devices, the logical records are grouped into blocks. Each read or write operation transfers an entire block of data between main storage and the input/output device at one time. Each logical record within the block is then processed separately. Once the relationship between logical and physical records is established, only logical records are made available to the program.
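Blocking and the reverse step (deblocking) can be illustrated with a short Python sketch. The record length of 10 bytes and the blocking factor of 3 are assumed values, and `block_records` and `deblock` are hypothetical helper names.

```python
RECORD_SIZE = 10      # assumed logical record length in bytes
BLOCKING_FACTOR = 3   # assumed number of logical records per block


def block_records(records):
    """Group logical records into physical records (blocks)."""
    blocks = []
    for i in range(0, len(records), BLOCKING_FACTOR):
        blocks.append(b"".join(records[i:i + BLOCKING_FACTOR]))
    return blocks


def deblock(block):
    """Split one block back into its logical records."""
    return [block[i:i + RECORD_SIZE] for i in range(0, len(block), RECORD_SIZE)]


# Seven 10-byte logical records: RECORD0000 ... RECORD0006
logical = [f"RECORD{n:04d}".encode("ascii") for n in range(7)]
blocks = block_records(logical)
print(len(blocks))          # seven records at three per block need three blocks
print(deblock(blocks[0]))   # the program only ever sees logical records
```

Each block would be moved by a single read or write operation; the program then works through the logical records returned by `deblock` one at a time.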

Compaction: Compaction is a technique for reclaiming storage. It works by actually moving blocks of data from one location in memory to another so as to collect all the free blocks into one large block. The allocation problem arises when files or data are organized in the storage area; once compaction has gathered the free space into a single block, allocation becomes much simpler.
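A toy Python model of compaction is sketched below. It assumes storage is a list of equal-sized slots in which `None` marks a free slot; real systems must also update every reference to a moved block, which this sketch leaves out.

```python
def compact(storage):
    """Move used blocks to the front, in place, and return the index of the
    first free slot. `None` marks a free slot (an assumption of this model)."""
    write = 0
    for read in range(len(storage)):
        if storage[read] is not None:
            storage[write] = storage[read]  # slide the block toward the start
            write += 1
    for i in range(write, len(storage)):
        storage[i] = None                   # everything after is one free area
    return write


storage = ["A", None, "B", None, None, "C"]
first_free = compact(storage)
print(storage)
print(first_free)
```

After compaction the three free slots are contiguous, so a request for up to three slots can be satisfied, which was not possible before without splitting the data.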

Operations on files

  • Create: Essential if a system is to add files. Need not be a separate system call; it can be merged with open.
  • Delete: Essential if a system is to delete files.
  • Open: Not essential. It is an optimization in which the translation from file name to disk locations is performed only once per file rather than once per access.
  • Close: Not essential. Frees resources.
  • Read: Essential. Must specify the file name, the location within the file, the number of bytes, and a buffer into which the data is to be placed. Several of these parameters can be set by other system calls, and in many operating systems they are.
  • Write: Essential if updates are to be supported. See read for parameters.
  • Seek: Not essential (it could be folded into read/write). Specifies the offset of the next read or write access to this file.
  • Get attributes: Essential if attributes are to be used.
  • Set attributes: Essential if attributes are to be user-settable.
  • Rename: Copying a file and then deleting the original is a problem for big files; rename avoids the copy and is atomic. Link followed by delete is not atomic, so even if link is provided, rename adds functionality.
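The operations listed above map onto calls in Python's `os` module, which are thin wrappers over the operating system's own calls. The sketch below is one possible walk-through; the file names `demo.dat` and `renamed.dat` and the temporary directory are placeholders for illustration.

```python
import os
import tempfile

workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "demo.dat")
new_path = os.path.join(workdir, "renamed.dat")

# Create + Open: the O_CREAT flag merges "create" into "open".
fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o600)

os.write(fd, b"0123456789")       # Write: data is taken from the given buffer
os.lseek(fd, 4, os.SEEK_SET)      # Seek: set the offset of the next access
data = os.read(fd, 3)             # Read: up to 3 bytes from the current offset
size = os.fstat(fd).st_size       # Get attributes: file size, among others
os.close(fd)                      # Close: release the descriptor

os.utime(path, (1_000_000, 1_000_000))  # Set attributes: access/modify times
os.replace(path, new_path)        # Rename: no data is copied
os.remove(new_path)               # Delete
os.rmdir(workdir)

print(data, size)
```

Note how open and seek let read take only a descriptor and a byte count: the file name was translated once by open, and the position was set by seek. On POSIX systems `os.replace` is an atomic rename.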

File Updation: The active lifetime of the information in a file is usually short. Very soon the information stored in a file becomes out of date, and it becomes necessary to modify the file with current information. This process of modifying an old file with current information is known as file updation (file updating). The problem can be stated as follows: given an old master file and a transaction file, create an updated new master file.

The process of updating may include:

  • Insertion of new records
  • Modification of some existing records
  • Deletion of obsolete records
  • Copying of those records that are neither obsolete nor in need of modification

Example: Consider a list of magazine subscribers in which the master file contains the following information: subscriber number, name, address, age, sex, etc. The transaction file contains records having the following fields: subscriber number, transaction code, name, address, age and sex. The transaction code indicates the type of transaction. Code 1 indicates the record of a new subscriber. Code 2 indicates that the subscriber has not renewed the subscription and the record is to be deleted. Code 3 indicates a change of address for the subscriber, so the record is to be modified. Both files are stored in ascending order of their key, which is the subscriber number. The logic of the file updation is as follows:

In the beginning, one record from each file is read.

  1. If the key of the transaction record is lower than that of the master record, a new record is created in the new master file for this new subscriber. The next transaction record is then read and the procedure is repeated.
  2. If the key of the master record is lower than that of the transaction record, the master record is simply copied onto the new master file without any changes. The next master record is then read and the same procedure is repeated.
  3. If the keys of the transaction and master records are equal, the master record is deleted or updated depending on whether the transaction code is 2 or 3. In either case, the next master record and the next transaction record are read and the procedure is repeated.

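The above logic works until either file is exhausted, and either of the two may be exhausted first. In that case, the remaining records of the other file are transferred to the new master file: leftover master records are copied unchanged, and leftover transaction records, whose keys are higher than every master key and so can only belong to new subscribers, are added as new records.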
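The update logic described above can be sketched in Python as a merge of two sorted lists. This is a minimal illustration under stated assumptions: records are dictionaries held in memory rather than on disk, only the number, name and address fields are kept, there is at most one transaction per subscriber number, and an unexpected code on a matching key raises an error (the text does not say what should happen in that case). The name `update_master` and the sample data are invented for the example.

```python
NEW, DELETE, CHANGE_ADDRESS = 1, 2, 3  # transaction codes from the example


def new_record(txn):
    """Build a master record from a transaction record."""
    return {"number": txn["number"], "name": txn["name"], "address": txn["address"]}


def update_master(master, transactions):
    """Merge a sorted transaction list into a sorted master list."""
    new_master = []
    m = t = 0
    while m < len(master) and t < len(transactions):
        old, txn = master[m], transactions[t]
        if txn["number"] < old["number"]:
            new_master.append(new_record(txn))   # step 1: new subscriber
            t += 1
        elif old["number"] < txn["number"]:
            new_master.append(old)               # step 2: copy unchanged
            m += 1
        else:                                    # step 3: keys are equal
            if txn["code"] == CHANGE_ADDRESS:
                new_master.append({**old, "address": txn["address"]})
            elif txn["code"] != DELETE:          # assumption: reject other codes
                raise ValueError(f"unexpected code {txn['code']}")
            m += 1
            t += 1
    new_master.extend(master[m:])                          # leftover master records
    new_master.extend(new_record(x) for x in transactions[t:])  # leftover new subscribers
    return new_master


master = [
    {"number": 1001, "name": "A", "address": "Old Road"},
    {"number": 1002, "name": "B", "address": "High Street"},
    {"number": 1004, "name": "D", "address": "Mill Lane"},
]
transactions = [
    {"number": 1002, "code": DELETE, "name": "B", "address": "High Street"},
    {"number": 1003, "code": NEW, "name": "C", "address": "Park Avenue"},
    {"number": 1004, "code": CHANGE_ADDRESS, "name": "D", "address": "Lake View"},
    {"number": 1005, "code": NEW, "name": "E", "address": "Hill Top"},
]
result = update_master(master, transactions)
print([r["number"] for r in result])  # 1001, 1003, 1004, 1005 in that order
```

Because both inputs are sorted on the subscriber number, each file is read exactly once from start to finish, which is what makes this approach suitable for sequential media such as magnetic tape.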

  1. File Referencing: Access is made to a particular record to ascertain what it contains, e.g. reference is made to a ‘process’ file during an invoicing run. This does not involve any alteration to the record itself.
  2. File Maintenance: The process of adding, amending or deleting the standing data of a reference or master file.
  3. File Inquiry: The examination of both reference and master files to obtain the information they contain. This does not involve any alteration to what the file contains.
