Scikit-learn

Scikit-learn is a Python library for traditional machine learning tasks such as regression, classification, and clustering.

It is FOSS under a BSD license.

scikit-learn official website

scikit-learn code repository

History

It was originally developed by David Cournapeau in 2007. It is maintained by a team of researchers at the French Institute for Research in Computer Science and Automation (Inria).

It is bundled in packages such as Mambaforge and Anaconda. It can be installed using package managers such as pip and conda.

Concepts

A feature is an input variable, usually stored as a column of the training dataset.

An estimator is any object that can learn from data. Estimators are initialized untrained and become trained once their common .fit(X, y) method has been called.

Supervised algorithms require both the X and y arguments (features and target labels), while unsupervised algorithms require only X (features).
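The difference between supervised and unsupervised fitting can be sketched as follows. This is a minimal illustration, not a recommended workflow: the toy data is made up, and LogisticRegression and KMeans are simply two convenient estimators chosen for the example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

# Toy data, invented for illustration: 6 samples with 2 features each.
X = np.array([[0.0, 0.1], [0.2, 0.0], [0.1, 0.3],
              [5.0, 5.1], [5.2, 4.9], [4.8, 5.3]])
y = np.array([0, 0, 0, 1, 1, 1])  # target labels

# Supervised estimator: fit needs features and labels.
clf = LogisticRegression()
clf.fit(X, y)

# Unsupervised estimator: fit needs only the features.
km = KMeans(n_clusters=2, n_init=10, random_state=0)
km.fit(X)

# Attributes learned during fit end with an underscore.
print(clf.classes_)
print(km.cluster_centers_.shape)
```

Before fit is called, the trailing-underscore attributes do not exist, which is one way to tell an untrained estimator from a trained one.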

Main types of estimators:

  • Transformer
  • Predictor

A transformer transforms data. Examples of transformer operations are scaling and encoding.
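As a small sketch of a scaling transformer, the example below standardizes two columns with StandardScaler. The input values are made up for illustration.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Two features on very different scales (invented values).
X = np.array([[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]])

scaler = StandardScaler()
scaler.fit(X)                    # learns the mean and standard deviation of each column
X_scaled = scaler.transform(X)   # applies the learned scaling

# fit_transform performs both steps in one call.
X_scaled_again = StandardScaler().fit_transform(X)

print(X_scaled.mean(axis=0))  # each column now has mean close to 0
print(X_scaled.std(axis=0))   # and standard deviation close to 1
```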

A predictor makes predictions for new data, through its .predict() method, after it has been trained.
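A minimal sketch of a predictor, assuming a linear relationship in made-up training data (y = 2x + 1). LinearRegression is used only as an example estimator.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Invented training data that follows y = 2x + 1 exactly.
X_train = np.array([[0.0], [1.0], [2.0], [3.0]])
y_train = np.array([1.0, 3.0, 5.0, 7.0])

model = LinearRegression()
model.fit(X_train, y_train)

# Predict targets for samples the model has not seen.
X_new = np.array([[4.0], [5.0]])
y_pred = model.predict(X_new)
print(y_pred)  # approximately [ 9. 11.]
```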

The Pipeline object chains estimators so they run in sequence: typically one or more transformers followed by a final predictor.
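A pipeline could look like the sketch below, which scales the features and then fits a classifier. The step names "scale" and "clf" and the toy data are arbitrary choices made for this example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Invented data: 6 samples, 2 features on different scales.
X = np.array([[1.0, 200.0], [2.0, 180.0], [1.5, 220.0],
              [8.0, 900.0], [9.0, 950.0], [8.5, 870.0]])
y = np.array([0, 0, 0, 1, 1, 1])

# Each step is a (name, estimator) pair; the names are arbitrary labels.
pipe = Pipeline([
    ("scale", StandardScaler()),    # transformer
    ("clf", LogisticRegression()),  # final predictor
])

# fit runs fit_transform on the scaler, then fit on the classifier.
pipe.fit(X, y)

# predict applies the fitted scaler, then the classifier's predict.
pred = pipe.predict(X)
print(pred)
```

The pipeline itself behaves like a single estimator, so it exposes the same fit and predict methods as its final step.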

The ColumnTransformer applies different transformers to different subsets of columns and concatenates their outputs into a single matrix.
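The sketch below applies one transformer per column with ColumnTransformer. To keep it self-contained it assumes a purely numeric array in which the second column holds a category code; real datasets usually arrive as a pandas DataFrame with named columns. The names "num" and "cat" are arbitrary labels.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Column 0: numeric measurement. Column 1: category code (0, 1 or 2).
X = np.array([[1.0, 0], [2.0, 1], [3.0, 2], [4.0, 1]])

ct = ColumnTransformer(
    [
        ("num", StandardScaler(), [0]),  # scale column 0
        ("cat", OneHotEncoder(), [1]),   # one-hot encode column 1
    ],
    sparse_threshold=0,  # 0 forces a dense result, for easy printing
)

X_out = ct.fit_transform(X)
print(X_out.shape)  # 1 scaled column + 3 one-hot columns
```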

A dense matrix is a matrix that contains meaningful (non-zero) data in most of its cells.

A dense matrix in Scikit-learn is stored as a NumPy ndarray.

A sparse matrix is one that contains meaningful data in only a small number of cells; the rest are zeros.

A sparse matrix in Scikit-learn is stored as a SciPy sparse matrix (from the scipy.sparse module), not as a Scikit-learn type.

Some estimators return a dense matrix while others return a sparse matrix.
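The following sketch contrasts the two return types. StandardScaler and CountVectorizer are picked here only as examples of a dense-returning and a sparse-returning transformer; the inputs are made up. Sparse results are SciPy sparse matrices, which can be checked with scipy.sparse.issparse.

```python
import numpy as np
from scipy import sparse
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.preprocessing import StandardScaler

# StandardScaler returns a dense NumPy array for dense input.
dense_out = StandardScaler().fit_transform(np.array([[1.0], [2.0], [3.0]]))

# CountVectorizer returns a SciPy sparse matrix of word counts.
docs = ["red apple", "green apple", "blue sky"]
sparse_out = CountVectorizer().fit_transform(docs)

print(type(dense_out))
print(sparse.issparse(sparse_out))

# A sparse matrix can be converted to a dense array when it is small enough.
dense_again = sparse_out.toarray()
print(dense_again.shape)  # 3 documents x 5 distinct words
```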

One-hot encoding represents each category of a categorical feature as its own binary (0/1) column.

For example, if a column contains the values red, green, and blue, a one-hot encoder creates one column per category (three in total). Each row gets a 1 in the column that matches its category and a 0 in the others.

The OneHotEncoder returns a sparse matrix by default.
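The red/green/blue example can be sketched like this. The sample values are made up, and the result is converted with toarray() only so that it can be printed.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# One categorical column with four samples.
colors = np.array([["red"], ["green"], ["blue"], ["green"]])

enc = OneHotEncoder()
encoded = enc.fit_transform(colors)  # sparse matrix by default

# Output columns follow the sorted categories: blue, green, red.
print(enc.categories_[0])

# Convert to a dense array for display.
print(encoded.toarray())
# [[0. 0. 1.]
#  [0. 1. 0.]
#  [1. 0. 0.]
#  [0. 1. 0.]]
```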

Learning

Resources:
