Distributed Data Processing (DDP)

Distributed database system technology is the union of what appear to be two diametrically opposed approaches to data processing: database system and computer network technologies. Database systems have taken us from a paradigm of data processing in which each application defined and maintained its own data to one in which the data is defined and administered centrally. This new orientation results in data independence, whereby the application programs are immune to changes in the logical or physical organization of the data. One of the major motivations behind the use of database systems is the desire to integrate the operational data of an enterprise and to provide centralized, and thus controlled, access to that data. The technology of computer networks, on the other hand, promotes a mode of work that goes against all centralization efforts. At first glance it might be difficult to understand how these two contrasting approaches can possibly be synthesized to produce a technology that is more powerful and more promising than either one alone. The key to this understanding is the realization that the most important objective of database technology is integration, not centralization. Neither of these terms necessarily implies the other. It is possible to achieve integration without centralization, and that is exactly what distributed database technology attempts to achieve.

The term distributed processing has probably been the most used term in computer science for the last couple of years. It has been used to refer to such diverse systems as multiprocessing systems, distributed data processing, and computer networks.

Data Mining Functionalities

Data mining has an important place in today’s world. It has become an important research area because a huge amount of data is available in most applications. This data must be processed in order to extract useful information and knowledge, since these are not explicit. Data mining is the process of discovering interesting knowledge from large amounts of data.

The kinds of patterns that can be discovered depend upon the data mining tasks employed. By and large, there are two types of data mining tasks: descriptive data mining tasks that describe the general properties of the existing data, and predictive data mining tasks that attempt to make predictions based on inference from the available data. The data mining functionalities and the variety of knowledge they discover are briefly presented in the following list:

  1. Characterization: Data characterization is the summarization of the general features of objects in a target class, and it produces what are called characteristic rules. The data relevant to a user-specified class are normally retrieved by a database query and run through a summarization module to extract the essence of the data at different levels of abstraction. For example, one may wish to characterize the customers of a store who regularly rent more than a given number of movies a year. With concept hierarchies on the attributes describing the target class, the attribute-oriented induction method can be used to carry out data summarization. With a data cube containing a summarization of the data, simple OLAP operations fit the purpose of data characterization.
  2. Discrimination: Data discrimination produces what are called discriminant rules and is basically the comparison of the general features of objects between two classes referred to as the target class and the contrasting class.
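The two functionalities above can be illustrated with a minimal sketch. The rental records, the `THRESHOLD` cut-off, and the `summarize` helper are all hypothetical and chosen only for illustration; a real system would retrieve the class data with a database query and might summarize it with attribute-oriented induction or OLAP operations instead of the simple aggregates used here.

```python
from collections import Counter
from statistics import mean

# Hypothetical rental records: (customer_id, age_group, rentals_per_year).
customers = [
    (1, "20-29", 45),
    (2, "20-29", 38),
    (3, "30-39", 52),
    (4, "30-39", 4),
    (5, "40-49", 7),
    (6, "20-29", 3),
]

THRESHOLD = 30  # assumed cut-off for a "regular" renter, not taken from the text


def summarize(rows):
    """Summarize a class: size, mean rentals, and age-group distribution."""
    return {
        "count": len(rows),
        "mean_rentals": mean(r[2] for r in rows),
        "age_groups": dict(Counter(r[1] for r in rows)),
    }


# Characterization: summarize the general features of the target class only.
target = [c for c in customers if c[2] > THRESHOLD]
target_summary = summarize(target)

# Discrimination: compute the same features for a contrasting class,
# so that the two summaries can be compared side by side.
contrast = [c for c in customers if c[2] <= THRESHOLD]
contrast_summary = summarize(contrast)

print("target:  ", target_summary)
print("contrast:", contrast_summary)
```

Characterization looks at one summary in isolation, while discrimination is about the differences between the two summaries, for example the age-group distribution of regular versus occasional renters.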

An Introduction to Data Mining

Data mining involves the use of sophisticated data analysis tools to discover previously unknown, valid patterns and relationships in large data sets. These tools can include statistical models, mathematical algorithms, and machine learning methods such as neural networks or decision trees. Consequently, data mining consists of more than collecting and managing data; it also includes analysis and prediction. The objective of data mining is to identify valid, novel, potentially useful, and understandable correlations and patterns in existing data. Finding useful patterns in data is known by different names (e.g., knowledge extraction, information discovery, information harvesting, data archaeology, and data pattern processing).

The term “data mining” is primarily used by statisticians, database researchers, and the business communities. The term KDD (Knowledge Discovery in Databases) refers to the overall process of discovering useful knowledge from data, where data mining is a particular step in this process. The steps in the KDD process, such as data preparation, data selection, data cleaning, and proper interpretation of the results of the data mining process, ensure that useful knowledge is derived from the data. Data mining is an extension of traditional data analysis and statistical approaches as it incorporates analytical techniques drawn from various disciplines like AI, machine learning, OLAP, data visualization, etc.
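To make the relationship between KDD and data mining concrete, the process can be sketched as a chain of small functions in which mining is only one step. Everything here is an assumption made for illustration: the purchase records, the function names, the ordering of the steps, and the choice of co-occurrence counting as the mining technique.

```python
from collections import Counter
from itertools import combinations

# Hypothetical raw purchase records; None marks a missing value.
raw = [
    {"customer": "a", "item": "milk", "qty": 2, "store": "north"},
    {"customer": "a", "item": "bread", "qty": 1, "store": "north"},
    {"customer": "b", "item": "milk", "qty": None, "store": "south"},
    {"customer": "c", "item": "milk", "qty": 1, "store": "south"},
    {"customer": "c", "item": "bread", "qty": 3, "store": "south"},
]


def select(records, fields):
    """Data selection: keep only the attributes relevant to the task."""
    return [{f: r[f] for f in fields} for r in records]


def clean(records):
    """Data cleaning: drop records that contain missing values."""
    return [r for r in records if all(v is not None for v in r.values())]


def prepare(records):
    """Data preparation: group the items into one basket per customer."""
    baskets = {}
    for r in records:
        baskets.setdefault(r["customer"], set()).add(r["item"])
    return baskets


def mine(baskets):
    """Data mining step: count how often each pair of items occurs together."""
    pairs = Counter()
    for items in baskets.values():
        pairs.update(combinations(sorted(items), 2))
    return pairs


def interpret(pairs, min_count):
    """Interpretation: keep only pairs frequent enough to be worth reporting."""
    return {p: n for p, n in pairs.items() if n >= min_count}


selected = select(raw, ["customer", "item", "qty"])
patterns = interpret(mine(prepare(clean(selected))), min_count=2)
print(patterns)
```

Only `mine` corresponds to data mining in the narrow sense; the surrounding functions stand in for the other KDD steps that make its output usable.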

Data mining covers a variety of techniques for identifying nuggets of information or decision-making knowledge in bodies of data and extracting them in such a way that they can be put to use in areas such as decision support, prediction, forecasting and estimation. The data is often voluminous but, as it stands, of low value because no direct use can be made of it; it is the hidden information in the data that is really useful.

Important features of database systems

A major feature of a database system is that it provides users with an abstract view of data; that is, the system hides certain details of how data is stored and maintained.

1) Data Abstraction

Data abstraction is the property of showing only the necessary details to a user and hiding the rest of the details from that user. Since many database system users are not computer trained, developers hide the complexity from users through several levels of abstraction, to simplify users’ interactions with the system.

  • Physical level – This level describes how data is actually stored in the database.
  • Logical level – This level describes what data are stored in the database and what relationships exist among those data. The entire database is described to a community of users. The logical level of abstraction is used by Database Administrators (DBAs), who must decide what information is to be stored in the database. One of the main reasons for using DBMSs is to have central control of both the data and the programs that access those data. A person who has such central control over the system is called a Database Administrator (DBA).
  • View level – This level describes only the part of the entire database that a particular user group is interested in and hides the rest of the database from that user group. The system may provide many views for the same database.
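The logical and view levels can be seen side by side in a minimal sketch using Python's built-in `sqlite3` module. The `employee` table, the `employee_directory` view, and the sample rows are hypothetical. The physical level does not appear in the code at all: how SQLite lays the rows out in storage is hidden from the program, which is the point of the abstraction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Logical level: what data are stored and what relationships exist among them.
conn.execute(
    "CREATE TABLE employee ("
    " id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " dept TEXT,"
    " salary REAL)"
)
conn.executemany(
    "INSERT INTO employee (name, dept, salary) VALUES (?, ?, ?)",
    [("Ann", "Sales", 5200.0), ("Bob", "IT", 6100.0)],
)

# View level: one user group sees only names and departments;
# the salary column is hidden from it.
conn.execute("CREATE VIEW employee_directory AS SELECT name, dept FROM employee")

cursor = conn.execute("SELECT * FROM employee_directory ORDER BY name")
view_columns = [d[0] for d in cursor.description]
rows = cursor.fetchall()

print(view_columns)
print(rows)
```

Several such views could be defined over the same table, one per user group, without changing the logical definition underneath.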

The inter-relationship between the three levels is as shown below:

2) Instances and Schemas

Databases change over time as information is inserted and deleted.

Introduction to database concepts

A database is a collection of related data. By data, we mean known facts that can be recorded and that have implicit meaning. For example, consider the names, telephone numbers and addresses of the people we know.

A Database Management System (DBMS) is a collection of inter-related data and a set of programs to access those data. The primary goal of a DBMS is to provide an environment that is both convenient and efficient to use in retrieving and storing database information. A DBMS is a general-purpose software system that facilitates the processes of defining, constructing, manipulating and sharing databases among various users and applications. Defining a database involves specifying the data types, structures and constraints of the data to be stored in the database. The database definition or descriptive information is also stored in the database in the form of a database catalog or dictionary; it is called metadata. Constructing a database is the process of storing the data on some storage medium that is controlled by the DBMS. Manipulating the database includes functions such as querying the database to retrieve specific data, updating the database to reflect real-world changes and generating reports from the data. Sharing a database allows multiple users and programs to access the database simultaneously. Other important functions provided by the DBMS include protecting the database and maintaining it over a long period of time. Protection includes system protection against hardware or software malfunctions and security protection against unauthorized or malicious access.
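The defining, constructing and manipulating functions described above, together with the catalog that holds the metadata, can be sketched with Python's built-in `sqlite3` module. The `contact` table and its rows are hypothetical, loosely following the names, telephone numbers and addresses example. Sharing and protection are not shown, because an in-memory database with a single connection cannot demonstrate them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Defining: specify the data types, structure and constraints.
conn.execute(
    "CREATE TABLE contact ("
    " id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " phone TEXT,"
    " address TEXT)"
)

# Constructing: store the data under the control of the DBMS.
conn.executemany(
    "INSERT INTO contact (name, phone, address) VALUES (?, ?, ?)",
    [("Ann", "555-0100", "1 Main St"), ("Bob", "555-0101", "2 Oak Ave")],
)

# Manipulating: update the data to reflect a real-world change, then query it.
conn.execute("UPDATE contact SET phone = ? WHERE name = ?", ("555-0199", "Ann"))
phone = conn.execute(
    "SELECT phone FROM contact WHERE name = ?", ("Ann",)
).fetchone()[0]

# Constraints are enforced by the DBMS, not by the application.
try:
    conn.execute("INSERT INTO contact (name) VALUES (NULL)")
    constraint_enforced = False
except sqlite3.IntegrityError:
    constraint_enforced = True

# Metadata: the definition itself is stored in the catalog (sqlite_master in SQLite).
catalog = conn.execute(
    "SELECT name, sql FROM sqlite_master WHERE type = 'table'"
).fetchall()
table_names = [name for name, _ in catalog]

print(phone, constraint_enforced, table_names)
```

The catalog query returns the table definition as ordinary rows, which shows that the descriptive information is kept in the database alongside the data it describes.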