Data mining is a collective term for a set of methods used to detect, within data, information that is previously unknown, non-trivial, practically useful and accessible for interpretation, and that can support decision-making in various spheres of human activity.
Data mining methods are built on classification, modelling and forecasting techniques that use decision trees, artificial neural networks, genetic algorithms, evolutionary programming, associative memory and fuzzy logic. They also often draw on probability theory and statistical analysis.
One of the most important features of data mining methods is the visual presentation of results, which allows people without specialised mathematical training to use data mining tools.
Problem Statement
Initially, the task is set as follows:
- A fairly large database exists
- Some degree of “hidden knowledge” is assumed to exist somewhere within it
- Methods must be developed for detecting knowledge buried within large volumes of raw data. In today's conditions of global competition, it is precisely the discovered patterns (knowledge) that can serve as a source of additional competitive advantage.
What does "hidden knowledge" mean?
Hidden knowledge is knowledge that is:
- Previously unknown – that is, knowledge that must be new (rather than confirming some previously received information);
- Non-trivial – i.e. knowledge that cannot be obtained by simple observation (by direct visual analysis of the data or by calculating simple statistical characteristics);
- Practically useful – knowledge that is of value to the researcher or consumer;
- Accessible for interpretation – knowledge that is easy to present in a user-friendly form and easily explained in terms of the subject area.
These requirements largely determine the essence of data mining methods, as well as the form and proportion in which data mining technology draws on database management systems, statistical analysis methods and artificial intelligence methods.
Data mining and artificial intelligence
The knowledge extracted by data mining methods is usually presented in the form of regularities (patterns) such as:
- association rules;
- decision trees;
- clusters;
- mathematical functions.
The algorithms for finding such regularities lie at the intersection of the following areas: artificial intelligence, mathematical statistics, mathematical programming, visualisation and OLAP (online analytical processing).
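To make the first kind of pattern in the list above more concrete, here is a minimal sketch of how association rules between pairs of items might be extracted from a small set of transactions. The transaction data, the function names (`support`, `association_rules`) and the threshold values are all assumptions made up for this illustration; real tools use more efficient algorithms and handle larger itemsets.

```python
from itertools import combinations

# Hypothetical toy transactions, invented purely for illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "butter"},
    {"bread", "milk", "butter"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def association_rules(transactions, min_support=0.4, min_confidence=0.7):
    """Return (antecedent, consequent, support, confidence) for item pairs.

    The thresholds are arbitrary example values, not recommendations.
    """
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(items, 2):
        pair_support = support({a, b}, transactions)
        if pair_support < min_support:
            continue  # the pair occurs too rarely to be of interest
        # Try the rule in both directions: a -> b and b -> a.
        for lhs, rhs in ((a, b), (b, a)):
            # Confidence: share of transactions with lhs that also contain rhs.
            confidence = pair_support / support({lhs}, transactions)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, pair_support, confidence))
    return rules


rules = association_rules(transactions)
for lhs, rhs, sup, conf in rules:
    print(f"{lhs} -> {rhs}: support={sup:.2f}, confidence={conf:.2f}")
```

Each rule reads as "transactions that contain the left-hand item tend to contain the right-hand item as well". Support measures how often the pair occurs at all, and confidence measures how reliable the implication is. Raising either threshold yields fewer but stronger rules.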