Data Mining Functionalities

Data mining has an important place in today’s world. It has become an important research area because most applications now generate huge amounts of data. This data must be processed to extract useful information and knowledge, since neither is explicit in the raw data. Data mining is the process of discovering interesting knowledge from large amounts of data.

The kinds of patterns that can be discovered depend on the data mining tasks employed. Broadly, there are two types of data mining tasks: descriptive tasks, which describe the general properties of the existing data, and predictive tasks, which attempt to make predictions by inference from the available data. The data mining functionalities and the kinds of knowledge they discover are briefly presented in the following list:

  1. Characterization: It is the summarization of the general features of objects in a target class, and produces what are called characteristic rules. The data relevant to a user-specified class are normally retrieved by a database query and run through a summarization module to extract the essence of the data at different levels of abstraction. For example, one may wish to characterize the customers of a store who regularly rent more than a given number of movies a year. With concept hierarchies on the attributes describing the target class, the attribute-oriented induction method can be used to carry out data summarization. With a data cube containing a summarization of the data, simple OLAP operations fit the purpose of data characterization.
  2. Discrimination: Data discrimination produces what are called discriminant rules and is basically the comparison of the general features of objects between two classes referred to as the target class and the contrasting class.
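As a rough illustration of characterization and discrimination, the sketch below summarizes a target class and a contrasting class by generalizing a raw attribute through a small concept hierarchy. Everything in it is an assumption made for illustration: the customer records, the `THRESHOLD` cut-off (the text above does not give a number), and the helper names `age_band` and `characterize`. It is a simplified stand-in for attribute-oriented induction, not a full implementation of it.

```python
from collections import Counter

# Hypothetical rental records: (customer_id, age, rentals_per_year).
CUSTOMERS = [
    (1, 23, 45),
    (2, 37, 12),
    (3, 29, 60),
    (4, 52, 8),
    (5, 41, 33),
    (6, 19, 5),
]

THRESHOLD = 30  # assumed cut-off for "regular" renters; not taken from the text


def age_band(age):
    """Concept hierarchy: generalize a raw age to a coarser band."""
    if age < 30:
        return "under 30"
    if age < 50:
        return "30-49"
    return "50+"


def characterize(records):
    """Summarize a class as the share of its members in each age band."""
    counts = Counter(age_band(age) for _, age, _ in records)
    total = sum(counts.values())
    return {band: count / total for band, count in counts.items()}


# Characterization looks at the target class alone.
target = [r for r in CUSTOMERS if r[2] > THRESHOLD]
target_profile = characterize(target)

# Discrimination compares it with a contrasting class.
contrast = [r for r in CUSTOMERS if r[2] <= THRESHOLD]
contrast_profile = characterize(contrast)

for band in ("under 30", "30-49", "50+"):
    print(band, target_profile.get(band, 0.0), contrast_profile.get(band, 0.0))
```

Comparing the two profiles band by band is what turns a characteristic summary into a discriminant one: a band that is common in the target class but rare in the contrasting class is a candidate discriminant rule.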

An Introduction to Data Mining

Data mining involves the use of sophisticated data analysis tools to discover previously unknown, valid patterns and relationships in large data sets. These tools can include statistical models, mathematical algorithms, and machine learning methods such as neural networks or decision trees. Consequently, data mining consists of more than collecting and managing data; it also includes analysis and prediction. The objective of data mining is to identify valid, novel, potentially useful, and understandable correlations and patterns in existing data. Finding useful patterns in data is known by different names (e.g., knowledge extraction, information discovery, information harvesting, data archaeology, and data pattern processing).

The term “data mining” is primarily used by statisticians, database researchers, and the business communities. The term KDD (Knowledge Discovery in Databases) refers to the overall process of discovering useful knowledge from data, where data mining is a particular step in this process. The steps in the KDD process, such as data preparation, data selection, data cleaning, and proper interpretation of the results of the data mining process, ensure that useful knowledge is derived from the data. Data mining is an extension of traditional data analysis and statistical approaches as it incorporates analytical techniques drawn from various disciplines like AI, machine learning, OLAP, data visualization, etc.
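The relationship between KDD and data mining can be sketched as a small pipeline in which mining is only one stage. The transaction log, the function names, and the choice of pair counting as the mining step are all assumptions made for illustration; a real KDD process would use far richer selection, cleaning, and evaluation logic.

```python
from collections import Counter
from itertools import combinations

# Hypothetical raw transaction log; None marks a missing basket.
RAW = [
    {"store": "A", "items": ["bread", "milk"]},
    {"store": "A", "items": ["bread", "milk", "eggs"]},
    {"store": "B", "items": ["tea"]},
    {"store": "A", "items": None},
    {"store": "A", "items": ["milk", "eggs"]},
]


def select(rows, store):
    """Data selection: keep only the rows relevant to the question."""
    return [r for r in rows if r["store"] == store]


def clean(rows):
    """Data cleaning: drop rows with missing or empty baskets."""
    return [r for r in rows if r["items"]]


def prepare(rows):
    """Data preparation: turn each basket into a sorted tuple of items."""
    return [tuple(sorted(set(r["items"]))) for r in rows]


def mine(baskets):
    """Data mining step: count how often each pair of items co-occurs."""
    pairs = Counter()
    for basket in baskets:
        pairs.update(combinations(basket, 2))
    return pairs


def interpret(pairs, min_count):
    """Interpretation: keep only pairs frequent enough to act on."""
    return {pair: n for pair, n in pairs.items() if n >= min_count}


frequent_pairs = interpret(mine(prepare(clean(select(RAW, "A")))), min_count=2)
print(frequent_pairs)
```

Only `mine` corresponds to data mining in the narrow sense; the other functions are the surrounding KDD steps that make its output trustworthy.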

Data mining covers a variety of techniques for identifying nuggets of information or decision-making knowledge in bodies of data and extracting them in such a way that they can be put to use in areas such as decision support, prediction, forecasting and estimation. The data is often voluminous but, as it stands, of low value because no direct use can be made of it; it is the hidden information in the data that is really useful.…

Important features of database systems

A major feature of a database system is that it provides users with an abstract view of data; that is, the system hides certain details of how data is stored and maintained.

1) Data Abstraction

Data abstraction is the property of showing only the necessary details to a user and hiding the rest. Since many database system users are not computer trained, developers hide the complexity from users through several levels of abstraction, to simplify users’ interactions with the system.

  • Physical level – This level describes how data is actually stored in the database.
  • Logical level – This level describes what data are stored in the database and what relationships exist among those data. The entire database is described to a community of users. The logical level of abstraction is used by Database Administrators (DBAs), who must decide what information is to be stored in the database. One of the main reasons for using a DBMS is to have central control of both the data and the programs that access those data. A person who has such central control over the system is called a Database Administrator (DBA).
  • View level – This level describes only the part of the entire database that a particular user group is interested in and hides the rest of the database from that user group. The system may provide many views for the same database.
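The three levels can be illustrated with a minimal sketch using Python's built-in `sqlite3` module. The `employee` table, the `employee_directory` view, and the sample rows are hypothetical; SQLite is used here only because it ships with Python, not because the text refers to it.

```python
import sqlite3

# Physical level: the engine decides how rows are laid out in storage.
# Here the database lives in memory; the code below never sees that detail.
conn = sqlite3.connect(":memory:")

# Logical level: what data are stored and how they relate.
conn.execute(
    "CREATE TABLE employee ("
    " id INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary REAL)"
)
conn.executemany(
    "INSERT INTO employee (name, dept, salary) VALUES (?, ?, ?)",
    [("Asha", "Sales", 52000.0), ("Ben", "IT", 61000.0)],
)

# View level: one user group sees only part of the database (no salaries).
conn.execute(
    "CREATE VIEW employee_directory AS SELECT name, dept FROM employee"
)

cursor = conn.execute("SELECT * FROM employee_directory ORDER BY name")
view_columns = [col[0] for col in cursor.description]
directory = cursor.fetchall()

print(view_columns)
print(directory)
```

Several views could be defined over the same `employee` table, each exposing a different subset of columns or rows to a different user group.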

The inter-relationship between the three levels is as shown below:

2) Instances and Schemas

Databases change over time as information is inserted and deleted.…

Introduction to database concepts

A database is a collection of related data. By data, we mean known facts that can be recorded and that have implicit meaning. For example, consider the names, telephone numbers and addresses of the people we know.

A Database Management System (DBMS) is a collection of inter-related data and a set of programs to access those data. The primary goal of a DBMS is to provide an environment that is both convenient and efficient to use in retrieving and storing database information. A DBMS is a general-purpose software system that facilitates the processes of defining, constructing, manipulating and sharing databases among various users and applications. Defining a database involves specifying the data types, structures and constraints of the data to be stored in the database. The database definition, or descriptive information, is also stored in the database in the form of a database catalog or dictionary; it is called metadata. Constructing a database is the process of storing the data on some storage medium that is controlled by the DBMS. Manipulating the database includes functions such as querying the database to retrieve specific data, updating the database to reflect real-world changes and generating reports from the data. Sharing a database allows multiple users and programs to access the database simultaneously. Other important functions provided by the DBMS include protecting the database and maintaining it over a long period of time. Protection includes system protection against hardware or software malfunctions and security protection against unauthorized or malicious access.…
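The defining, constructing and manipulating functions described above, along with the catalog of metadata, can be seen in a short sketch using Python's built-in `sqlite3` module. The `contact` table and its rows are hypothetical, chosen to echo the names, telephone numbers and addresses example; `sqlite_master` is SQLite's own catalog table, and other systems expose their catalogs differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Defining: specify data types, structures and constraints.
conn.execute(
    "CREATE TABLE contact ("
    " name TEXT NOT NULL,"
    " phone TEXT UNIQUE,"
    " address TEXT)"
)

# Constructing: store the data under the control of the DBMS.
conn.executemany(
    "INSERT INTO contact VALUES (?, ?, ?)",
    [("Maya", "555-0101", "12 Elm St"), ("Omar", "555-0102", "9 Oak Ave")],
)

# Manipulating: update to reflect a real-world change, then query.
conn.execute(
    "UPDATE contact SET address = ? WHERE name = ?", ("3 Pine Rd", "Omar")
)
omar_address = conn.execute(
    "SELECT address FROM contact WHERE name = ?", ("Omar",)
).fetchone()[0]

# Metadata: the catalog lists the tables stored in the database.
catalog = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()

# Constraints are enforced by the DBMS rather than by each application.
try:
    conn.execute(
        "INSERT INTO contact VALUES (?, ?, ?)", ("Zed", "555-0101", None)
    )
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(omar_address, catalog, duplicate_rejected)
```

Sharing and protection are not shown here, since an in-memory database with a single connection has no concurrent users to coordinate.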