Data mining is a step within knowledge discovery in databases (KDD) whose goal is to find patterns in large volumes of data. It is usually regarded as a field within statistics and information systems.
Some of these patterns are:
- Data groups (cluster analysis)
- Unusual records (anomaly detection)
- Dependencies (association rule mining)
Data mining combines statistics, artificial intelligence, machine learning and database management systems.
Data Mining Algorithms
This post treats data mining algorithms as borrowed from machine learning.
You can read this post about machine learning algorithms.
Model Assessment Techniques
This section covers model assessment techniques, which are methods to assess the accuracy of a model.
Model assessment techniques featured in this post:
- Train-Test Split
- Cross validation
Train-Test Split
A train-test split, or train-validate-test split, is the simpler of the two approaches: the data is split once into training and validation sets, with a separate test set held out for final model evaluation.
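The single split described above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the function name `train_validate_test_split`, the 20% default fractions and the fixed seed are assumptions made for this example.

```python
import random


def train_validate_test_split(rows, val_fraction=0.2, test_fraction=0.2, seed=0):
    """Shuffle the rows once, then cut them into train, validation and test lists.

    The fractions and the seed are illustrative defaults, not recommendations.
    """
    if not 0 <= val_fraction + test_fraction < 1:
        raise ValueError("val_fraction + test_fraction must be in [0, 1)")

    shuffled = list(rows)                  # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # seeded shuffle keeps the split repeatable

    n_test = int(len(shuffled) * test_fraction)
    n_val = int(len(shuffled) * val_fraction)

    test = shuffled[:n_test]                      # held out for the final evaluation
    validation = shuffled[n_test:n_test + n_val]  # used while tuning the model
    train = shuffled[n_test + n_val:]             # everything else trains the model
    return train, validation, test


rows = list(range(10))
train, validation, test = train_validate_test_split(rows)
print(len(train), len(validation), len(test))
```

The three lists never overlap, so the test set stays unseen until the final evaluation.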
Cross Validation
Cross validation is a method to assess the accuracy of a model.
It partitions a dataset into multiple subsets and runs several rounds of training and validation, using a different subset as the validation set in each round.
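One common form of this idea is k-fold cross validation, sketched below in plain Python. The helper names (`k_fold_indices`, `cross_validate`, `fit_majority`, `predict_majority`), the toy data and the majority-class "model" are all assumptions made for illustration. The sketch also does not shuffle the data, which a real workflow would usually do first.

```python
from collections import Counter


def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs; each fold is the validation set exactly once."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # Spread any remainder over the first folds so fold sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size


def cross_validate(xs, ys, k, fit, predict):
    """Return the validation accuracy obtained on each of the k folds."""
    scores = []
    for train_idx, val_idx in k_fold_indices(len(xs), k):
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        hits = sum(predict(model, xs[i]) == ys[i] for i in val_idx)
        scores.append(hits / len(val_idx))
    return scores


# Hypothetical stand-in model: always predict the most frequent training label.
def fit_majority(xs, ys):
    return Counter(ys).most_common(1)[0][0]


def predict_majority(model, x):
    return model


xs = list(range(6))
ys = ["a", "a", "a", "b", "a", "a"]
scores = cross_validate(xs, ys, k=3, fit=fit_majority, predict=predict_majority)
print(scores, sum(scores) / len(scores))
```

Averaging the per-fold scores gives a single accuracy estimate that depends less on one particular split than a single train-test split does.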
List of Data Mining Tools
Data mining tools:
- Knime
- RapidMiner
- ELKI
- Teradata
Knime
Knime is free and open-source software (FOSS).
RapidMiner
RapidMiner is FOSS.
ELKI
ELKI is FOSS.
Teradata
Teradata is proprietary.
Related entries
- Knowledge discovery
- Data science and engineering