Data mining is a step within the KDD (knowledge discovery in databases) process whose goal is to find patterns in large volumes of data. It can be considered a field of statistics and information systems.
Some of these patterns are:
- Data groups (cluster analysis)
- Unusual records (anomaly detection)
- Dependencies (association rule mining)
Data mining combines statistics, artificial intelligence, machine learning and database management systems.
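As a rough illustration of the three pattern types listed above, the following Python sketch uses deliberately simple, hypothetical helpers (`kmeans_1d`, `zscore_anomalies`, `rule_support_confidence`), not the algorithms of any particular tool. It uses a one-dimensional k-means for cluster analysis, a z-score threshold (assumed here to be 2 population standard deviations) for anomaly detection, and the support and confidence of a single candidate rule for association rule mining. All data is made up for the example.

```python
from statistics import mean, pstdev

# --- Cluster analysis: minimal 1-D k-means (hypothetical helper, k >= 2 assumed) ---
def kmeans_1d(values, k=2, iterations=10):
    ordered = sorted(values)
    # Spread the initial centroids evenly across the sorted data
    centroids = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in values:
            # Assign each value to its nearest centroid
            nearest = min(range(k), key=lambda c: abs(v - centroids[c]))
            clusters[nearest].append(v)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty)
        centroids = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return centroids, clusters

# --- Anomaly detection: z-score threshold (the threshold of 2.0 is an assumption) ---
def zscore_anomalies(values, threshold=2.0):
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []  # all values identical: nothing stands out
    # Flag values more than `threshold` standard deviations away from the mean
    return [v for v in values if abs(v - mu) / sigma > threshold]

# --- Association rule mining: support and confidence of one rule A -> B ---
def rule_support_confidence(transactions, antecedent, consequent):
    # Support: fraction of transactions containing both sides of the rule
    both = sum(1 for t in transactions if antecedent <= t and consequent <= t)
    # Confidence: among transactions containing A, the fraction that also contain B
    ante = sum(1 for t in transactions if antecedent <= t)
    support = both / len(transactions)
    confidence = both / ante if ante else 0.0
    return support, confidence

centroids, clusters = kmeans_1d([1.0, 1.2, 0.8, 5.0, 5.3, 4.9])
anomalies = zscore_anomalies([10, 11, 9, 10, 12, 10, 11, 50])
baskets = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}, {"milk"}]
support, confidence = rule_support_confidence(baskets, {"bread"}, {"milk"})
print(clusters)             # [[1.0, 1.2, 0.8], [5.0, 5.3, 4.9]]
print(anomalies)            # [50]
print(support, confidence)  # 0.5 0.6666666666666666
```

Real tools use more robust methods (for example, multi-dimensional clustering or frequent-itemset search over many candidate rules); the sketch only shows what each kind of pattern looks like on toy data.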
Data Mining Algorithms
This post treats data mining algorithms as algorithms borrowed from machine learning.
You can read this post about machine learning algorithms.
Model Assessment Techniques
This section covers model assessment techniques, which are methods for assessing the accuracy of a model.
Model assessment techniques featured in this post:
- Train-Test Split
- Cross validation
Train-Test Split
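Train-test split is a simpler approach than cross validation: the dataset is split once into a training set, used to fit the model, and a test set, used to evaluate it. In the train-validate-test variant, the data is split into training and validation sets, and a separate test set is left aside for the final model evaluation.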
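A minimal sketch of a train-validate-test split in plain Python, assuming the records can be shuffled freely (no time ordering to preserve). The helper `train_validate_test_split` and its 60/20/20 default proportions are hypothetical choices for illustration; setting `validate_frac=0` reduces it to a plain train-test split.

```python
import random

def train_validate_test_split(records, validate_frac=0.2, test_frac=0.2, seed=42):
    """Shuffle once, then cut the data into three disjoint parts (hypothetical helper)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed so the split is reproducible
    n_test = int(len(shuffled) * test_frac)
    n_validate = int(len(shuffled) * validate_frac)
    test = shuffled[:n_test]                             # held out for the final evaluation
    validate = shuffled[n_test:n_test + n_validate]      # used to compare or tune models
    train = shuffled[n_test + n_validate:]               # used to fit the model
    return train, validate, test

train, validate, test = train_validate_test_split(range(100))
print(len(train), len(validate), len(test))  # 60 20 20
```

Because the split happens only once, the resulting accuracy estimate depends on which records happened to land in each part.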
Cross Validation
Cross validation is a method for assessing the accuracy of a model.
It involves partitioning a dataset into multiple subsets and iterating so that each subset serves in turn as the validation set while the remaining subsets are used for training.
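The sketch below shows k-fold cross validation, one common way of switching the validation set: each of the `k` folds (contiguous and unshuffled here, for simplicity) is used once for validation while the model is trained on the rest, and the per-fold errors are averaged. The least-squares line model, the mean absolute error metric, and all function names are assumptions made for this example.

```python
from statistics import mean

def k_fold_indices(n, k):
    """Yield (train_idx, validation_idx); each index lands in exactly one validation fold."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, validation
        start += size

def cross_validate(xs, ys, fit, predict, k=5):
    """Average the mean absolute error over k folds."""
    errors = []
    for train_idx, val_idx in k_fold_indices(len(xs), k):
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        errors.append(mean(abs(predict(model, xs[i]) - ys[i]) for i in val_idx))
    return mean(errors)

# Hypothetical model for the example: ordinary least-squares line y = slope * x + intercept
def fit_line(xs, ys):
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return slope, my - slope * mx

def predict_line(model, x):
    slope, intercept = model
    return slope * x + intercept

xs = list(range(10))
ys = [2 * x + 1 for x in xs]  # noise-free data, so the line fits every fold exactly
score = cross_validate(xs, ys, fit_line, predict_line, k=5)
print(score)                                      # 0.0
print([len(v) for _, v in k_fold_indices(7, 3)])  # [3, 2, 2]
```

With real, noisy data the averaged error would be non-zero; averaging over all folds is what makes the estimate less dependent on a single split.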
List of Data Mining Tools
Data mining tools:
- Knime
- RapidMiner
- ELKI
- Weka
- Teradata
Knime
Knime is FOSS.
RapidMiner
RapidMiner is FOSS.
ELKI
ELKI is FOSS.
Weka
You can read this post about Weka.
Teradata
Teradata is proprietary.
Social media mining
Social media mining is the application of data mining to data from social media.
Related entries
- Knowledge discovery
- Data science and engineering