Data mining, or knowledge discovery, is the process of analyzing data from different perspectives and summarizing it into useful information. This information can be used to increase revenue, cut costs, or both. Data mining software is one of a number of analytical tools for analyzing data. It allows users to analyze data from many angles, categorize it, and summarize the relationships identified. Technically speaking, data mining is the process of finding correlations or patterns among dozens of fields in large relational databases. In other words, it is the process of sorting through large amounts of data and picking out the important information. It is often used by business intelligence organizations and financial analysts. It is also used in the sciences to extract information from the data sets generated by modern experimental and observational methods.
Data mining in relation to Enterprise Resource Planning (ERP) is the statistical and logical analysis of large sets of transaction data, looking for patterns that can aid decision making. Although data mining is a relatively new term, the technology is not. Companies have used powerful computers to sift through volumes of supermarket scanner data and analyze market research reports for years.
However, continuous innovations in computer processing power, disk storage, and related technologies are increasing the accuracy of analysis while driving down the cost. There are also human rights and privacy concerns with data mining, specifically regarding the source of the data analyzed. Data mining provides information that would not otherwise be available, but that information must be interpreted to be useful. When individual people are involved in data collection, many questions arise about privacy, ethics, and legality. Mining government or commercial data sets for national security or law enforcement purposes has raised privacy concerns. Data mining has also become an important part of customer relationship management.
Data mining has five major elements:
- Extract, transform, and load transaction data onto the data warehouse system.
- Store and manage the data in a multidimensional database system.
- Provide data access to business analysts and information technology professionals.
- Analyze the data with application software.
- Present the data in a useful format, such as a table or graph.
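The five elements above can be sketched end to end in a few lines. This is only a minimal illustration, assuming a handful of hypothetical point-of-sale records and using an in-memory SQLite database as a stand-in for a real data warehouse; the table name `sales` and the helper names are invented for the example.

```python
import sqlite3

# Hypothetical raw transaction records, as they might arrive from a point-of-sale system.
raw_transactions = [
    {"store": " North", "item": "Milk", "amount": "2.50"},
    {"store": "north", "item": "Bread", "amount": "1.75"},
    {"store": "South", "item": "Milk", "amount": "2.60"},
    {"store": "south ", "item": "Milk", "amount": "2.40"},
]

def transform(record):
    """Transform: tidy the text fields and convert the amount to a number."""
    return (
        record["store"].strip().lower(),
        record["item"].strip().lower(),
        float(record["amount"]),
    )

def load(rows):
    """Load and store: put the transformed rows into an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (store TEXT, item TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn

def revenue_by_store(conn):
    """Access and analyze: total revenue per store, sorted by store name."""
    cursor = conn.execute(
        "SELECT store, SUM(amount) FROM sales GROUP BY store ORDER BY store"
    )
    return cursor.fetchall()

conn = load([transform(r) for r in raw_transactions])
summary = revenue_by_store(conn)

# Present: print the result as a small table.
print(f"{'store':<8}{'revenue':>8}")
for store, total in summary:
    print(f"{store:<8}{total:>8.2f}")
```

In a production setting each stage would typically be a separate system; the sketch only shows how the stages hand data to one another.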
Steps of Data Mining
- Data Integration: Data from all the different sources is collected and integrated.
- Data Selection: The data to be mined is selected, making sure that it is useful for data mining.
- Data Cleaning: The collected data may not all be correct, so it needs to be checked before it is used, to avoid errors and uncertain results.
- Data Transformation: Even after cleaning, the data must still be transformed into a form suitable for mining so that the mining process runs without problems. Many techniques can be used for data transformation, such as smoothing, aggregation, and normalization.
- Data Mining: Once the data is ready, data mining techniques are applied to it to discover interesting patterns. Clustering and association analysis are among the many techniques used.
- Pattern Evaluation and Knowledge Presentation: The generated patterns are evaluated and presented. This step involves transformation, visualization, and the removal of redundant patterns.
- Decisions/Use of Discovered Knowledge: This step puts the acquired knowledge to use in making better decisions.
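The cleaning and transformation steps can be illustrated with a short sketch. It assumes a hypothetical list of raw readings in which missing or unparseable entries should simply be dropped, uses a moving average as one possible smoothing method, and uses min-max scaling as one possible normalization method; other choices are equally valid.

```python
def clean(values):
    """Data cleaning: keep only the entries that can be read as numbers."""
    cleaned = []
    for value in values:
        try:
            cleaned.append(float(value))
        except (TypeError, ValueError):
            # Missing (None) or unparseable entries are skipped.
            continue
    return cleaned

def smooth(values, window=3):
    """Smoothing: simple moving average over a fixed-size window."""
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]

def normalize(values):
    """Normalization: min-max scaling of the values into the range [0, 1]."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]

# Hypothetical raw input containing a missing value and a bad entry.
raw = ["10", None, "20", "n/a", "30", "40"]

cleaned = clean(raw)
smoothed = smooth(cleaned)
normalized = normalize(cleaned)

print(cleaned)
print(smoothed)
print(normalized)
```

Dropping bad entries is the simplest cleaning policy; depending on the data, filling in missing values or flagging them for review may be more appropriate.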
Data Mining Tools
Organizations that wish to use data mining tools have two options. They can purchase mining programs designed for existing software and hardware platforms, which can be integrated into new products and systems as they are brought online, or they can build their own custom mining solution. It is quite common, for example, to feed the output of a data mining exercise into another computer system, such as a neural network, to give the mined data more value. This is because the data mining tool gathers the data, while the second program (e.g., the neural network) makes decisions based on the data collected.
Different types of data mining tools are available in the marketplace, each with its own strengths and weaknesses. Internal auditors need to be aware of the different kinds of data mining tools available and recommend the purchase of a tool that matches the organization's current detective needs. This should be considered as early as possible in the project's lifecycle, perhaps even in the feasibility study.
Most data mining tools can be classified into one of three categories: dashboards, traditional data mining tools, and text-mining tools. Below is a description of each:
- Dashboards. Installed on computers to monitor information in a database, dashboards reflect data changes and updates onscreen, in the form of a table or chart, so that the user can see how the business is performing.
- Traditional Data Mining Tools. Traditional data mining programs help companies establish data patterns and trends by using a number of complex algorithms and techniques. Some of these tools are installed on the desktop to monitor the data and highlight trends, and others capture information residing outside a database.
- Text-mining Tools. The third type of data mining tool is sometimes called a text-mining tool because of its ability to mine data from different kinds of text, from Microsoft Word and Acrobat PDF documents to simple text files, for example. These tools can be used to scan content. Scanned content can be unstructured (i.e., information is scattered almost randomly across the document, including e-mails, Internet pages, audio and video data) or structured (i.e., the data's form and purpose is known, such as content found in a database). Capturing these inputs can provide organizations with a wealth of information that can be mined to discover trends, concepts, and attitudes.
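As a rough illustration of what mining unstructured text can mean in practice, the sketch below counts how often each term appears across a few documents. The documents, the stop-word list, and the function names are all hypothetical, and real text-mining tools go far beyond simple term counting.

```python
import re
from collections import Counter

# Hypothetical unstructured inputs, e.g. customer e-mails and reviews.
documents = {
    "email_1": "The delivery was late and the support was slow.",
    "email_2": "Great product, but delivery was late again.",
    "review_1": "Support was helpful; the product works well.",
}

# A tiny, illustrative stop-word list of common words to ignore.
STOP_WORDS = {"the", "was", "and", "but"}

def tokenize(text):
    """Split text into lowercase words, ignoring punctuation."""
    return re.findall(r"[a-z']+", text.lower())

def term_frequencies(docs):
    """Count every non-stop-word term across all documents."""
    counts = Counter()
    for text in docs.values():
        counts.update(word for word in tokenize(text) if word not in STOP_WORDS)
    return counts

frequencies = term_frequencies(documents)

# Terms that appear more than once hint at recurring themes.
recurring = sorted(term for term, count in frequencies.items() if count > 1)
print(recurring)
```

Here the recurring terms point to delivery timing, the product, and support as the themes customers mention most, which is the kind of trend the text describes.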
Besides these tools, other applications and programs may be used in the data mining process.
Benefits of Data Mining
Data warehouses let business executives look at the company as a whole, instead of looking at an organization in terms of the departments that it comprises. This is one reason many corporations spend so much money implementing them. Data warehouses can also handle many tasks that involve several different departments. Every organization sets up its transaction systems so that each transaction completes within a certain time frame. The biggest problem with running reports and queries against those systems is that transactions may no longer complete within that time frame, and the reports themselves are compiled late. To overcome this problem, many companies implement a data warehouse. Another benefit of a data warehouse is that it can use data models designed for queries rather than for transactions, and the results are convincing.
Data models designed for queries are especially important for producing good reports. Even though a transaction processing system does not strictly need such a model, implementing a good one can help the company. The wrong modeling method, however, can slow down transaction processing. Likewise, servers that are set up to process transactions quickly tend to slow down query processing.
Data can be queried through the data warehouse, which is one of the reasons it is so efficient. However, a large number of transactions can lead to a large transaction system, so a company must always be prepared to set up different data warehouses or, worse, different processing models. Cooperation among all the departments in a company is important for overcoming problems with the processing and transaction of data.
Business Applications of Data Mining
Data mining is used in customer relationship management (CRM), where it can contribute significantly to the bottom line. Rather than contacting customers indiscriminately through a call center or by mail, a company can contact only the customers predicted to have a high likelihood of responding to an offer. In cases where many people will take an action without an offer, uplift modeling can be used to determine which people will show the greatest increase in response if given an offer. Data clustering can also be used to automatically discover the segments or groups within a customer data set.
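The idea behind uplift can be shown with a small calculation. The sketch below is not a full uplift model: it assumes hypothetical campaign records of the form (segment, received offer, responded) and simply compares the response rate of customers who received the offer with that of customers who did not, segment by segment.

```python
from collections import defaultdict

# Hypothetical campaign records: (segment, received_offer, responded).
records = (
    [("loyal", True, True)] * 3 + [("loyal", True, False)] * 1
    + [("loyal", False, True)] * 3 + [("loyal", False, False)] * 1
    + [("occasional", True, True)] * 2 + [("occasional", True, False)] * 2
    + [("occasional", False, False)] * 4
)

def uplift_by_segment(rows):
    """Return, per segment, response rate with the offer minus rate without it."""
    # For each segment and group, track [number of responders, group size].
    counts = defaultdict(lambda: {"treated": [0, 0], "control": [0, 0]})
    for segment, offered, responded in rows:
        group = "treated" if offered else "control"
        counts[segment][group][0] += int(responded)
        counts[segment][group][1] += 1

    uplift = {}
    for segment, groups in counts.items():
        treated_hits, treated_total = groups["treated"]
        control_hits, control_total = groups["control"]
        if treated_total == 0 or control_total == 0:
            continue  # The comparison needs both groups to be present.
        uplift[segment] = treated_hits / treated_total - control_hits / control_total
    return uplift

uplift = uplift_by_segment(records)
best_segment = max(uplift, key=uplift.get)
print(uplift)
print(best_segment)
```

In this made-up data the loyal customers respond at the same rate with or without the offer, so contacting them adds nothing, while the occasional customers respond only when contacted. That difference is what uplift modeling tries to estimate.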
Data mining can identify groups that are less profitable to a company, which could lead to discrimination against certain customers. Many companies will learn which consumers bring them the most profit and will start to direct all of their efforts into making products for only that target market. This technique is very beneficial to the company because it maximizes profit by focusing efforts on a specific group, without wasting time and resources selling to a market that might not return as much value.
Businesses employing data mining can quickly see a return on investment (ROI), but they also recognize that the number of predictive models can quickly become very large. Instead of one model to predict which customers will churn, a business could build a separate model for each region and customer type.
Data mining can also help human resources departments identify the characteristics of their most successful employees. Strategic Enterprise Management applications also help a company translate corporate-level goals, such as profit and margin share targets, into operational decisions, such as production plans and workforce levels.
Recently, data mining has been widely used in areas of science and technology, including medicine, genetics, bioinformatics, and electrical power engineering.
In genetics, an important goal is to understand the mapping relationship between inter-individual variation in human DNA sequences and variability in disease susceptibility. Data mining is used to find out how changes in an individual's DNA sequence affect the risk of developing common diseases such as cancer. This is very helpful for improving the diagnosis, prevention, and treatment of these diseases. One data mining technique used for this task is known as multifactor dimensionality reduction.
Data mining techniques are used for condition monitoring of high-voltage electrical equipment in the area of electrical power engineering. The purpose of condition monitoring is to obtain valuable information on the health status of the equipment's insulation.
Data mining is also used in sports nowadays. The National Basketball Association is exploring a data mining application that can be used in conjunction with image recordings of basketball games. The Advanced Scout software analyzes the movements of players to help coaches orchestrate plays and strategies. Using the NBA universal clock, a coach can automatically bring up the video clips showing each of the jump shots.
Today, data mining applications are available on systems of all sizes, for mainframe, client/server, and PC platforms. System prices range from several thousand dollars for the smallest applications up to $1 million per terabyte for the largest. Enterprise-wide applications generally range in size from 10 gigabytes to over 11 terabytes.