A Refresher: Machine Learning Basics

Here’s a post to remind me of what I’ve learned so far. I’ll keep updating this post as I move forward.

A few definitions


Labels are what we’re trying to predict with the model. For example, if we’re classifying animals or objects, the category the system predicts for each one is the label. The concept of a label stays the same across most machine learning methods.


Features are what we use as input to the model. Say we want to recommend a movie: we could give the model the genre, the characters, and how closely the viewer’s taste correlates with other people’s as features, and it would use them to predict a good recommendation.
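
To make the split between features and labels concrete for myself, here’s a minimal sketch using the movie example. The feature names (`genre`, `lead_character`, `similar_users_liked`), the values, and the `split_features_labels` helper are all made up for illustration; they don’t come from any real dataset or library.

```python
# Hypothetical toy data: each row pairs a dict of features (model input)
# with a label (what we want the model to predict).
examples = [
    ({"genre": "sci-fi", "lead_character": "robot", "similar_users_liked": 0.9}, "recommend"),
    ({"genre": "romance", "lead_character": "chef", "similar_users_liked": 0.2}, "skip"),
    ({"genre": "sci-fi", "lead_character": "pilot", "similar_users_liked": 0.7}, "recommend"),
]


def split_features_labels(rows):
    """Separate the inputs (features) from the prediction targets (labels)."""
    features = [row_features for row_features, _ in rows]
    labels = [row_label for _, row_label in rows]
    return features, labels


X, y = split_features_labels(examples)
print(X[0])  # the features of the first movie
print(y[0])  # its label
```

Most learning code I’ve seen keeps these two pieces separate in the same way: one collection of inputs and one parallel collection of targets.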

Basic Machine Learning Tools/Methods


For a classification task, a data mining procedure produces a model that, given a new individual, determines which class that individual belongs to. Classification and scoring are closely related: a model that can do one can usually be modified to do the other.
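
Here’s a small sketch of how scoring and classification relate: a scoring function returns a number between 0 and 1, and a classifier is just that score compared against a threshold. The weights, bias, feature names, and the 0.5 threshold are hand-picked assumptions for illustration, not values learned from data.

```python
import math

# Hypothetical hand-picked parameters; a real model would learn these from data.
WEIGHTS = {"similar_users_liked": 4.0, "is_favorite_genre": 1.5}
BIAS = -3.0


def score(features):
    """Return a score in (0, 1): higher means 'more likely worth recommending'."""
    z = BIAS + sum(WEIGHTS[name] * features[name] for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))  # logistic function squashes z into (0, 1)


def classify(features, threshold=0.5):
    """Turn the score into a class by comparing it against a threshold."""
    return "recommend" if score(features) >= threshold else "skip"


liked = {"similar_users_liked": 0.9, "is_favorite_genre": 1}
disliked = {"similar_users_liked": 0.1, "is_favorite_genre": 0}
print(classify(liked), classify(disliked))
```

Going the other way also works in this sketch: dropping the threshold and returning `score(...)` directly turns the classifier back into a scorer.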


Regression, also called “value estimation”, produces a model that, given an individual, estimates the value of a particular variable specific to that individual.
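
As a reminder of what estimating a value looks like in practice, here’s a minimal sketch that fits a straight line to one input variable with ordinary least squares. This is only one of many regression methods, and the data and the names `fit_line` and `predict` are invented for illustration.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y divided by variance of x gives the slope.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Estimate the numeric value for a new individual with input x."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical data: x could be movies watched last month, y hours spent watching.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
model = fit_line(xs, ys)
print(model, predict(model, 5))
```

The difference from classification is the output: a number on a continuous scale rather than one of a fixed set of classes.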

Similarity Matching

Similarity matching attempts to identify similar individuals based on data known about them. It can be used directly to find similar entities.
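
A quick sketch of finding the most similar individual, assuming each person is described by a vector of movie ratings and that cosine similarity is the measure. Cosine similarity is my choice here, not the only option, and the names and ratings are made up.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(target, candidates):
    """Return the name of the candidate whose vector is closest to target."""
    return max(candidates, key=lambda name: cosine_similarity(target, candidates[name]))


# Hypothetical ratings for the same three movies, in the same order.
ratings = {
    "ana": [5, 4, 1],
    "ben": [1, 2, 5],
    "cy": [4, 5, 2],
}
me = [5, 5, 1]
print(most_similar(me, ratings))
```

Swapping in a different measure (Euclidean distance, for instance) only means changing the function passed as the key.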


Clustering attempts to group individuals in a population together by their similarity, without being driven by any specific purpose. Clustering is useful in preliminary domain exploration to see which natural groups exist.
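
To see what grouping by similarity can look like, here’s a bare-bones k-means sketch. K-means is just one clustering method I picked for the example; the points, the starting centroids, and the helper names are all assumptions.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(coords) for coords in zip(*points))


def kmeans(points, centroids, iterations=10):
    """Alternate between assigning points to the nearest centroid and
    moving each centroid to the mean of its assigned points."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Keep the old centroid if a cluster ended up empty.
        centroids = [mean_point(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return centroids, clusters


# Hypothetical 2-D data with two visually obvious groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, clusters = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(clusters)
```

Note that no labels appear anywhere: the groups come only from how close the points are to each other.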

Association Rules

This method attempts to find associations between entities based on transactions involving them. An example co-occurrence question would be: What items are commonly purchased together?
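
The “purchased together” question can be sketched by counting how often pairs of items show up in the same transaction. The shopping baskets below are invented, and `support` and `confidence` are the usual names for these two ratios, computed here in the simplest way I know.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets, one set of items per transaction.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]

item_counts = Counter()
pair_counts = Counter()
for basket in transactions:
    item_counts.update(basket)
    # Sort so that ("bread", "butter") and ("butter", "bread") count as one pair.
    pair_counts.update(combinations(sorted(basket), 2))


def support(pair):
    """Fraction of all transactions that contain both items."""
    return pair_counts[pair] / len(transactions)


def confidence(antecedent, consequent):
    """Of the transactions containing antecedent, the fraction that also contain consequent."""
    pair = tuple(sorted((antecedent, consequent)))
    return pair_counts[pair] / item_counts[antecedent]


top_pair, top_count = pair_counts.most_common(1)[0]
print(top_pair, top_count, support(top_pair), confidence("bread", "butter"))
```

Real association rule mining adds pruning so it scales past pairs, but the counting idea is the same.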