Distributed Data Processing (DDP)

Distributed database system technology is the union of what appear to be two diametrically opposed approaches to data processing: database system and computer network technologies. Database systems have taken us from a paradigm of data processing in which each application defined and maintained its own data to one in which the data is defined and administered centrally. This new orientation results in data independence, whereby the application programs are immune to changes in the logical or physical organization of the data. One of the major motivations behind the use of database systems is the desire to integrate the operational data of an enterprise and to provide centralized, and thus controlled, access to that data. The technology of computer networks, on the other hand, promotes a mode of work that goes against all centralization efforts. At first glance it might be difficult to understand how these two contrasting approaches can possibly be synthesized to produce a technology that is more powerful and more promising than either one alone. The key to this understanding is the realization that the most important objective of database technology is integration, not centralization. It is important to realize that neither of these terms necessarily implies the other. It is possible to achieve integration without centralization, and that is exactly what distributed database technology attempts to achieve.

The term distributed processing has probably been one of the most used terms in computer science over the last couple of years. It has been used to refer to such diverse systems as multiprocessing systems, distributed data processing, and computer networks. Here are some of the other terms that have been used synonymously with distributed processing: distributed/multi-computers, satellite processing/satellite computers, back-end processing, dedicated/special-purpose computers, time-shared systems, and functionally modular systems.

Obviously, some degree of distributed processing goes on in any computer system, even on single-processor computers. Starting with the second-generation computers, the central processing unit (CPU) and input/output (I/O) functions have been separated and overlapped. However, it should be quite clear that what we would like to refer to as distributed processing, or distributed computing, has nothing to do with this form of distribution of functions in a single-processor computer system.

A term that has caused so much confusion is obviously quite difficult to define precisely. The working definition we use for a distributed computing system states that it is a number of autonomous processing elements that are interconnected by a computer network and that cooperate in performing their assigned tasks. The processing element referred to in this definition is a computing device that can execute a program on its own.

One fundamental question that needs to be asked is: what is being distributed? One thing that might be distributed is the processing logic. In fact, the definition of a distributed computing system given above implicitly assumes that the processing logic or processing elements are distributed. Another possible distribution is according to function. Various functions of a computer system could be delegated to various hardware components or sites. Finally, control can be distributed. The control of the execution of various tasks might be distributed instead of being performed by one computer system. From the viewpoint of distributed database systems, these modes of distribution are all necessary and important.

Distributed computing systems can be classified with respect to a number of criteria. Some of these criteria are as follows: degree of coupling, interconnection structure, interdependence of components, and synchronization between components. Degree of coupling refers to a measure that determines how closely the processing elements are connected together. This can be measured as the ratio of the amount of data exchanged to the amount of local processing performed in executing a task. If the communication is done over a computer network, there exists weak coupling among the processing elements. However, if components are shared, we talk about strong coupling. Shared components can be either primary memory or secondary storage devices. As for the interconnection structure, one can talk about those cases that have a point-to-point interconnection between processing elements, as opposed to those that use a common interconnection channel. The processing elements might depend on each other quite strongly in the execution of a task, or this interdependence might be as minimal as passing messages at the beginning of execution and reporting results at the end. Synchronization between processing elements might be maintained by synchronous or by asynchronous means. Note that some of these criteria are not entirely independent. For example, if the synchronization between processing elements is synchronous, one would expect the processing elements to be strongly interdependent and possibly to work in a strongly coupled fashion.
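The ratio used above to measure degree of coupling can be illustrated with a small sketch. The function name, the units (bytes exchanged and local operations performed), and the sample figures are assumptions made for illustration only; the text does not prescribe units or thresholds.

```python
def degree_of_coupling(data_exchanged: float, local_processing: float) -> float:
    """Ratio of data exchanged to local processing performed for one task.

    Units are an assumption of this sketch: bytes exchanged and
    local operations executed.
    """
    if local_processing <= 0:
        raise ValueError("local_processing must be positive")
    return data_exchanged / local_processing


# Hypothetical measurements for the same task on two kinds of systems.
networked = degree_of_coupling(2_000, 1_000_000)        # elements talk over a network
shared_memory = degree_of_coupling(800_000, 1_000_000)  # elements share primary memory

print(networked, shared_memory)
```

A lower ratio suggests weaker coupling and a higher ratio suggests stronger coupling; where to draw the line between the two is a judgment call that this sketch does not attempt to make.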

The fundamental reason behind distributed data processing is to be better able to solve big and complicated problems by using a variation of the well-known divide-and-conquer rule. This approach has two fundamental advantages from an economics standpoint. First, distributed computing provides an economical method of harnessing more computing power by employing multiple processing elements optimally. This requires research in distributed processing as defined earlier, as well as in parallel processing. The second economic reason is that by attacking these problems in smaller pieces, it might be possible to discipline the cost of software development. Indeed, it is well known that the cost of software has been increasing, in opposition to the cost trends of hardware.

Distributed Database System

A distributed database system (DDBS) is a collection of multiple, logically interrelated databases distributed over a computer network. A distributed database management system is the software that permits the management of the DDBS and makes the distribution transparent to the users.

A DDBS is not a collection of files that can be individually stored at each node of a computer network. To form a DDBS, files should not only be logically related, but there should also be structure among the files, and access should be via a common interface. We should note that there has been much recent activity in providing DBMS functionality over semi-structured data that are stored in files on the Internet.
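The idea of access through a common interface, with the distribution hidden from the user, can be sketched as follows. This is a minimal in-memory illustration, not how any particular distributed DBMS is built: the class names `Site` and `DistributedDatabase`, the relation `EMP`, and the site names are all hypothetical, and plain Python lists stand in for the databases stored at each node.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


class Site:
    """One node of the network, holding its local part of each relation."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.fragments: Dict[str, List[Row]] = {}

    def store(self, relation: str, rows: List[Row]) -> None:
        self.fragments.setdefault(relation, []).extend(rows)

    def scan(self, relation: str, predicate: Callable[[Row], bool]) -> List[Row]:
        return [row for row in self.fragments.get(relation, []) if predicate(row)]


class DistributedDatabase:
    """Common interface: callers name relations, never sites."""

    def __init__(self, sites: List[Site]) -> None:
        self.sites = sites

    def select(self, relation: str, predicate: Callable[[Row], bool]) -> List[Row]:
        result: List[Row] = []
        for site in self.sites:  # fan the request out to every site
            result.extend(site.scan(relation, predicate))
        return result


# Hypothetical data: one logical relation EMP stored at two sites.
paris, boston = Site("paris"), Site("boston")
paris.store("EMP", [{"name": "Ana", "salary": 50}, {"name": "Luc", "salary": 70}])
boston.store("EMP", [{"name": "Joe", "salary": 90}])

db = DistributedDatabase([paris, boston])
well_paid = db.select("EMP", lambda row: row["salary"] > 60)
print([row["name"] for row in well_paid])
```

The caller asks for rows of `EMP` without knowing which site holds them, which is the transparency the definition calls for. A real system would also need a catalog, query optimization, and transaction management, none of which are shown here.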

It is sometimes assumed that the physical distribution of data is not the most significant issue. The proponents of this view would therefore feel comfortable in labeling as a distributed database two databases that reside in the same computer system. However, the physical distribution of data is very important. It creates problems that are not encountered when the databases reside in the same computer. This brings us to another point: multiprocessor systems as DDBSs. A multiprocessor system is generally considered to be a system where two or more processors share some form of memory: either primary memory, in which case the multiprocessor is called shared memory, or secondary storage, in which case it is called shared disk.

The shared-nothing architecture is one where each processor has its own primary and secondary memories as well as peripherals, and communicates with other processors over a very high-speed interconnect. However, there are differences between the interactions in multiprocessor architectures and the rather loose interaction that is common in distributed computing environments. The fundamental difference is the mode of operation. A multiprocessor system design is rather symmetrical, consisting of a number of identical processor and memory components, and controlled by one or more copies of the same operating system, which is responsible for a strict control of the task assignment to each processor.
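The shared-nothing idea, where each processor keeps its own data and cooperates only by exchanging messages, can be sketched in a few lines. This is an illustration under loud assumptions: threads stand in for separate processors, `queue.Queue` objects stand in for the interconnect, and the names `node` and `distributed_sum` are hypothetical. Threads in one process do share memory physically; the sketch simply never uses that, so each node touches only its private copy of the data.

```python
import queue
import threading
from typing import List


def node(local_data: List[int], inbox: queue.Queue, outbox: queue.Queue) -> None:
    """A node owns local_data privately and reacts only to messages."""
    while True:
        message = inbox.get()
        if message == "stop":
            break
        if message == "sum":
            outbox.put(sum(local_data))  # reply with a partial result


def distributed_sum(partitions: List[List[int]]) -> int:
    """Ask every node for its partial sum and combine the replies."""
    outbox: queue.Queue = queue.Queue()
    inboxes: List[queue.Queue] = []
    threads: List[threading.Thread] = []
    for part in partitions:
        inbox: queue.Queue = queue.Queue()
        # list(part) gives the node its own private copy of the data.
        worker = threading.Thread(target=node, args=(list(part), inbox, outbox))
        worker.start()
        inboxes.append(inbox)
        threads.append(worker)

    for inbox in inboxes:
        inbox.put("sum")
    total = sum(outbox.get() for _ in partitions)

    for inbox in inboxes:
        inbox.put("stop")
    for worker in threads:
        worker.join()
    return total


print(distributed_sum([[1, 2, 3], [4, 5], [6]]))
```

Only messages cross node boundaries, so the same structure would carry over to nodes on separate machines if the queues were replaced by network connections.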