This post is about data mining.
Data Mining Algorithms
Data mining algorithms find patterns in data. Many of them are also used in machine learning.
Data mining techniques or algorithms:
- Rule induction
- K-nearest neighbor
- Artificial neural networks
- Decision trees
- Logistic regression
- K-means
- K-medoids
- DBSCAN
- Gaussian mixture model (GMM)
- Naive Bayes (NBC)
- Perceptron
- Combination methods
- Random Forest
- ExtraTree
- GBM
Rule Induction
Rule induction involves extracting useful if-then rules from data based on statistical significance.
It is often used in association rule learning and classification tasks.
It is supervised when used for classification. Association rule learning is usually treated as unsupervised.
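As a rough illustration of scoring an if-then rule, the sketch below computes support and confidence for one candidate rule. Support and confidence are only one common way to score rules and stand in here for the significance criterion. The function name `rule_stats` and the toy transactions are assumptions made up for this example.

```python
def rule_stats(transactions, antecedent, consequent):
    """Return (support, confidence) of the rule: if antecedent then consequent."""
    antecedent, consequent = set(antecedent), set(consequent)
    # Count transactions containing the "if" part, and those containing both parts.
    n_antecedent = sum(1 for t in transactions if antecedent <= set(t))
    n_both = sum(1 for t in transactions if (antecedent | consequent) <= set(t))
    support = n_both / len(transactions)
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence

# Toy data (hypothetical): each transaction is a set of items.
transactions = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread", "milk"},
    {"milk"},
]
support, confidence = rule_stats(transactions, {"bread"}, {"butter"})
print(support, round(confidence, 3))  # 0.5 0.667
```

A rule induction algorithm would generate many candidate rules and keep only those whose scores pass chosen thresholds.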
K-nearest neighbor
K-nearest neighbor (KNN) is a method that classifies each record in a dataset based on a combination of the classes of the k records most similar to it in a set of historical data (where k >= 1).
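The idea can be sketched in a few lines of plain Python. This is a minimal sketch, assuming Euclidean distance as the similarity measure and a majority vote as the "combination" of classes. The name `knn_predict` and the toy data are made up for the example.

```python
from collections import Counter
import math

def knn_predict(history, query, k=3):
    """Classify `query` by majority vote among its k nearest labeled records."""
    # history: list of (features, label) pairs; features are tuples of numbers.
    by_distance = sorted(history, key=lambda rec: math.dist(rec[0], query))
    nearest_labels = [label for _, label in by_distance[:k]]
    # Majority vote; ties resolve to the label seen first among the neighbors.
    return Counter(nearest_labels).most_common(1)[0][0]

# Toy historical data (hypothetical): two well-separated classes.
history = [
    ((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((0.9, 1.1), "a"),
    ((5.0, 5.0), "b"), ((5.2, 4.9), "b"), ((4.8, 5.1), "b"),
]
print(knn_predict(history, (1.1, 1.0), k=3))  # a
print(knn_predict(history, (5.1, 5.0), k=3))  # b
```

Sorting the whole history is fine for small data. Larger datasets usually call for a spatial index instead.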
Artificial Neural Networks
You can read this post about artificial neural networks.
Decision Trees
A decision tree is a tree-like model used for classification and regression. It splits the data into subsets based on feature values, creating a tree structure that can be used to make predictions.
It is supervised.
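To show what "splitting the data on feature values" can look like, here is a sketch of choosing a single split, the building block that a full tree repeats recursively on each subset. Gini impurity is assumed as the split criterion (other criteria exist), and the names `gini` and `best_split` are made up for this example.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(rows, labels):
    """Return (feature_index, threshold) minimizing weighted Gini impurity."""
    best, best_score = None, float("inf")
    for f in range(len(rows[0])):
        for threshold in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[f] > threshold]
            if not left or not right:
                continue  # a split must put records on both sides
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (f, threshold), score
    return best

# Toy data (hypothetical): feature 0 separates the classes, feature 1 does not.
rows = [(2.0, 7.0), (3.0, 1.0), (6.0, 8.0), (7.0, 2.0)]
labels = ["low", "low", "high", "high"]
print(best_split(rows, labels))  # (0, 3.0)
```

The result reads as the rule "if feature 0 <= 3.0 then low, else high".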
Logistic Regression
Logistic regression is a supervised machine learning algorithm that accomplishes binary classification tasks by predicting the probability of an outcome, event, or observation.
It is supervised.
Logistic Regression at Wikipedia
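A minimal sketch of the probability prediction, assuming a single input feature and plain batch gradient descent on the log loss. The learning rate, epoch count, names, and toy data are all assumptions for illustration.

```python
import math

def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, epochs=2000):
    """Fit weight w and bias b for one feature by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradient of the mean log loss with respect to w and b.
        grad_w = sum((sigmoid(w * x + b) - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum((sigmoid(w * x + b) - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Toy data (hypothetical): small x values are class 0, large ones are class 1.
xs = [0.5, 1.0, 1.5, 2.0, 3.0, 3.5, 4.0, 4.5]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
w, b = fit_logistic(xs, ys)
p_low = sigmoid(w * 1.0 + b)   # predicted probability of class 1 at x = 1.0
p_high = sigmoid(w * 4.0 + b)  # predicted probability of class 1 at x = 4.0
print(p_low < 0.5 < p_high)
```

The binary decision comes from thresholding the predicted probability, commonly at 0.5.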
K-Means Clustering
K-means clustering is a clustering algorithm that partitions data into ‘k’ clusters based on feature similarity. It’s widely used in unsupervised learning for grouping similar data points together.
It is unsupervised.
K-means clustering at Wikipedia
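The partitioning can be sketched with the classic two-step loop (often called Lloyd's algorithm). This is a sketch under assumptions: Euclidean distance as the similarity measure, a fixed number of iterations, and hand-picked starting centroids. Real implementations usually pick starting centroids randomly or with a seeding scheme.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Alternate between assigning points to centroids and moving centroids."""
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else c
            for cluster, c in zip(clusters, centroids)
        ]
    return centroids, clusters

# Toy data (hypothetical): two obvious groups, k = 2.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, clusters = kmeans(points, centroids=[points[0], points[3]])
print(clusters[0])  # [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5)]
print(clusters[1])  # [(8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
```

No labels are passed in, which is what makes the method unsupervised.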
K-medoids
K-medoids
DBSCAN
Density-based Spatial Clustering of Applications with Noise (DBSCAN)
GMM
Gaussian Mixture Model (GMM)
Naive Bayes (NBC)
Naive Bayes (NBC)
Perceptron
The perceptron is a supervised learning algorithm.
The multi-layer perceptron is a variant of the perceptron.
Logistic Regression
Logistic regression is…
Combination methods
Combination methods:
- Random forest
- ExtraTree
- GBM
Random Forest
Random forest
ExtraTree
Extremely Randomized Tree (ExtraTree)
GBM
Gradient Boosting Machine (GBM)
Model Assessment Techniques
This section covers model assessment techniques, which are methods for assessing the accuracy of a model.
Model assessment techniques featured in this post:
- Train-Test Split
- Cross validation
Train-Test Split
A train-test split (or train-validate-test split) is the simplest approach: the data is split once into training and validation sets, with a separate test set held out for final model evaluation.
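A single split can be sketched as follows. The 60/20/20 proportions, the fixed seed, and the function name `train_validate_test_split` are assumptions chosen for the example, not recommendations.

```python
import random

def train_validate_test_split(records, validate_fraction=0.2, test_fraction=0.2, seed=0):
    """Shuffle once, then cut the data into training, validation, and test sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_test = int(len(shuffled) * test_fraction)
    n_validate = int(len(shuffled) * validate_fraction)
    test = shuffled[:n_test]
    validate = shuffled[n_test:n_test + n_validate]
    train = shuffled[n_test + n_validate:]
    return train, validate, test

train, validate, test = train_validate_test_split(range(10))
print(len(train), len(validate), len(test))  # 6 2 2
```

The test set is touched only once, after the model has been tuned on the validation set.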
Cross Validation
Cross validation is a method to assess the accuracy of a model.
It partitions the dataset into multiple subsets and trains the model repeatedly, each time holding out a different subset for validation.
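The rotation of the validation set can be sketched as a k-fold index generator. This sketch assumes contiguous, unshuffled folds for readability. In practice the data is usually shuffled first. The name `k_fold_indices` is made up for the example.

```python
def k_fold_indices(n_records, k):
    """Yield (train_indices, validation_indices) for each of the k folds."""
    indices = list(range(n_records))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_records // k + (1 if i < n_records % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size

for train_idx, validation_idx in k_fold_indices(6, 3):
    print(train_idx, validation_idx)
# [2, 3, 4, 5] [0, 1]
# [0, 1, 4, 5] [2, 3]
# [0, 1, 2, 3] [4, 5]
```

A model would be trained and scored once per fold, and the k scores averaged into one accuracy estimate.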
List of Data Mining Tools
Data mining tools:
- Knime
- RapidMiner
- ELKI
- Teradata
Knime
Knime is FOSS.
RapidMiner
RapidMiner is FOSS.
ELKI
ELKI is FOSS.
Teradata
Teradata is proprietary.