Data Mining

This post is about data mining.

Data Mining Algorithms

Data mining algorithms extract patterns from data. Many of them are also used in machine learning.

Data mining techniques or algorithms:

  • Rule induction
  • K-nearest neighbor
  • Artificial neural networks
  • Decision trees
  • Logistic regression
  • K-means
  • K-medoids
  • DBSCAN
  • Gaussian mixture model (GMM)
  • Naive Bayes (NBC)
  • Perceptron
  • Combination methods
    • Random Forest
    • ExtraTree
    • GBM

Rule Induction

Rule induction involves extracting useful if-then rules from data based on statistical significance.

It is often used in association rule learning and classification tasks.

When used for classification, it is supervised. Association rule learning, by contrast, works on unlabeled data.
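As a minimal sketch of how an if-then rule can be scored, the example below computes support and confidence for a single rule. These two measures are an assumption chosen for illustration (they are common in association rule learning), and the function name `rule_stats` and the toy transactions are hypothetical.

```python
def rule_stats(transactions, antecedent, consequent):
    """Return (support, confidence) for the rule antecedent -> consequent."""
    antecedent, consequent = set(antecedent), set(consequent)
    n_total = len(transactions)
    # Transactions that satisfy the "if" part of the rule.
    n_antecedent = sum(1 for t in transactions if antecedent <= set(t))
    # Transactions that satisfy both the "if" and the "then" part.
    n_both = sum(1 for t in transactions if (antecedent | consequent) <= set(t))
    support = n_both / n_total
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence


# Hypothetical toy data: each transaction is a set of purchased items.
transactions = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread"},
    {"milk"},
]
support, confidence = rule_stats(transactions, {"bread"}, {"butter"})
```

Here the rule "if bread then butter" holds in 2 of the 4 transactions (support) and in 2 of the 3 transactions that contain bread (confidence).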

K-nearest neighbor

K-nearest neighbor (KNN) is a method that classifies each record in a dataset based on a combination of the classes of the k records most similar to it in a set of historical data (where k >= 1).
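The idea can be sketched in a few lines of Python. This version assumes Euclidean distance as the similarity measure and a simple majority vote as the way to combine classes. Both are illustrative choices, and the name `knn_predict` and the sample data are hypothetical.

```python
from collections import Counter
import math


def knn_predict(history, query, k=3):
    """Classify `query` by majority vote among its k nearest labeled records."""
    # `history` is a list of (features, label) pairs.
    nearest = sorted(history, key=lambda rec: math.dist(rec[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


history = [
    ((1.0, 1.0), "a"),
    ((1.2, 0.8), "a"),
    ((0.9, 1.1), "a"),
    ((5.0, 5.0), "b"),
    ((5.2, 4.9), "b"),
]
prediction = knn_predict(history, (1.1, 1.0), k=3)
```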

Artificial Neural Networks

You can read this post about artificial neural networks.

Decision Trees

A decision tree is a tree-like model used for classification and regression. It splits the data into subsets based on feature values, creating a tree structure that can be used to make predictions.

It is supervised.
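To illustrate how a tree chooses a split, here is a rough sketch that searches for the best threshold on one numeric feature. It assumes Gini impurity as the split criterion, which is one common option among several. The function names and the data are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure subset)."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_threshold(values, labels):
    """Find the threshold on one numeric feature with the lowest weighted Gini."""
    best = (None, float("inf"))
    # The largest value is skipped so the right-hand subset is never empty.
    for threshold in sorted(set(values))[:-1]:
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (threshold, score)
    return best


values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
labels = ["no", "no", "no", "yes", "yes", "yes"]
threshold, impurity = best_threshold(values, labels)
```

A full decision tree would apply this search recursively to each resulting subset, over all features, until a stopping condition is met.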

Logistic Regression

Logistic regression is a machine learning algorithm for binary classification that predicts the probability of an outcome, event, or observation.

It is supervised.
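The following is a minimal sketch of logistic regression on a single feature. It assumes plain batch gradient descent on the log loss. The learning rate, epoch count, function names and data are illustrative choices, not a recommended configuration.

```python
import math


def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.1, epochs=2000):
    """Fit weight w and bias b for one feature by gradient descent on log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradients of the average log loss with respect to w and b.
        grad_w = sum((sigmoid(w * x + b) - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum((sigmoid(w * x + b) - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


xs = [0.5, 1.0, 1.5, 3.5, 4.0, 4.5]
ys = [0, 0, 0, 1, 1, 1]
w, b = fit_logistic(xs, ys)
p_low = sigmoid(w * 1.0 + b)   # predicted probability of class 1 at x = 1.0
p_high = sigmoid(w * 4.0 + b)  # predicted probability of class 1 at x = 4.0
```

A prediction is usually turned into a class by comparing the probability with a threshold such as 0.5.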

Logistic Regression at Wikipedia

K-Means Clustering

K-means clustering is a clustering algorithm that partitions data into ‘k’ clusters based on feature similarity. It’s widely used in unsupervised learning for grouping similar data points together.

It is unsupervised.
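As a sketch of how the partitioning can work, the example below implements Lloyd's algorithm, a common way to compute k-means. It assumes Euclidean distance, random initial centroids drawn from the data, and a fixed number of iterations. The function name and the points are hypothetical.

```python
import math
import random


def kmeans(points, k, iterations=20, seed=0):
    """Lloyd's algorithm: alternate between assigning points and moving centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        for i, cluster in enumerate(clusters):
            if cluster:  # keep the old centroid if a cluster ends up empty
                centroids[i] = tuple(sum(c) / len(cluster) for c in zip(*cluster))
    return centroids, clusters


points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, clusters = kmeans(points, k=2)
```

No labels are passed in, which is what makes the method unsupervised: the groups come only from the distances between points.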

K-means clustering at Wikipedia

K-medoids

K-medoids

DBSCAN

Density-based Spatial Clustering of Applications with Noise (DBSCAN)

GMM

Gaussian Mixture Model (GMM)

Naive Bayes (NBC)

Naive Bayes (NBC)

Perceptron

The perceptron is a supervised learning algorithm for binary classification.

The multi-layer perceptron (MLP) is a variant of the perceptron.
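A single-layer perceptron can be sketched as follows. The example assumes 0/1 labels, a step activation, and the classic perceptron update rule. The function names and the logical AND training set are illustrative.

```python
def train_perceptron(samples, epochs=20, lr=1.0):
    """Learn weights and bias with the perceptron update rule; labels are 0 or 1."""
    n_features = len(samples[0][0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        for features, label in samples:
            activation = sum(w * x for w, x in zip(weights, features)) + bias
            predicted = 1 if activation > 0 else 0
            error = label - predicted  # -1, 0, or +1
            # When error is 0 the weights and bias stay unchanged.
            weights = [w + lr * error * x for w, x in zip(weights, features)]
            bias += lr * error
    return weights, bias


def predict(weights, bias, features):
    """Apply the learned linear decision rule to one input."""
    return 1 if sum(w * x for w, x in zip(weights, features)) + bias > 0 else 0


# Logical AND is linearly separable, so a single perceptron can learn it.
and_gate = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(and_gate)
outputs = [predict(weights, bias, x) for x, _ in and_gate]
```

A single perceptron can only separate classes with a straight line (or hyperplane), which is the limitation that multi-layer variants address.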

Combination methods

Combination methods:

  • Random forest
  • ExtraTree
  • GBM

Random Forest

Random forest

ExtraTree

Extremely Randomized Tree (ExtraTree)

GBM

Gradient Boosting Machine (GBM)

Model Assessment Techniques

This section covers model assessment techniques, which are methods to assess the accuracy of a model.

Model assessment techniques featured on this post:

  • Train-Test Split
  • Cross validation

Train-Test Split

A train-test split (or train-validate-test split) is the simpler approach: the data is split once into training and validation sets, with a separate test set held out for final model evaluation.
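A three-way split might look like the sketch below. The 60/20/20 proportions, the fixed seed, and the function name `train_validate_test_split` are assumptions made for the example.

```python
import random


def train_validate_test_split(records, validate_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle the records once and cut them into three disjoint subsets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    n_validate = int(len(shuffled) * validate_frac)
    test = shuffled[:n_test]
    validate = shuffled[n_test:n_test + n_validate]
    train = shuffled[n_test + n_validate:]
    return train, validate, test


train, validate, test = train_validate_test_split(range(10))
```

The model is fitted on `train`, tuned against `validate`, and scored once on `test` at the very end.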

Cross Validation

Cross validation is a method to assess the accuracy of a model.

It involves partitioning a dataset into multiple subsets for training and validation, iteratively switching the validation set.
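The partitioning can be sketched with a small generator that produces the index sets for k-fold cross validation. It assumes contiguous, unshuffled folds for simplicity, and the name `k_fold_indices` is hypothetical.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) pairs, one per fold."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size


folds = list(k_fold_indices(n_samples=6, k=3))
```

Each sample lands in the validation set exactly once. The model is trained and scored once per fold, and the scores are typically averaged.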

List of Data Mining Tools

Data mining tools:

  • KNIME
  • RapidMiner
  • ELKI
  • Teradata

KNIME

KNIME is free and open-source software (FOSS).

RapidMiner

RapidMiner is FOSS.

ELKI

ELKI is FOSS.

Teradata

Teradata is proprietary.
