A Refresher: Machine Learning Basics
Here’s a post to remind me of what I’ve learned so far. I’ll keep updating this post as I move forward.
A few definitions
Labels
Labels are what we’re trying to predict with the model. For example, if we’re classifying animals or objects, the label is the category the system predicts for each one. The concept of a label stays the same across most supervised machine learning methods.
Features
Features are what we use as input to the model. Say we want to predict a movie recommendation: we could give the model each movie’s genre, its characters, and how much it was liked by other people with similar taste, and let it use those features to predict a good recommendation.
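To make the split between features and labels concrete, here’s a minimal sketch. The movie records, the field names, and the helper split_features_and_labels are all made up for illustration; the only point is that the label is the one column we want to predict and everything else is a feature.

```python
# Hypothetical movie records: every key except "liked" is a feature,
# and "liked" is the label we would want a model to predict.
records = [
    {"genre": "comedy", "lead_character": "detective", "similar_users_liked": 0.9, "liked": True},
    {"genre": "horror", "lead_character": "ghost", "similar_users_liked": 0.2, "liked": False},
    {"genre": "comedy", "lead_character": "chef", "similar_users_liked": 0.7, "liked": True},
]


def split_features_and_labels(rows, label_key):
    """Separate each row into its input features and its label."""
    features = [{k: v for k, v in row.items() if k != label_key} for row in rows]
    labels = [row[label_key] for row in rows]
    return features, labels


X, y = split_features_and_labels(records, "liked")
print(X[0])  # the features of the first movie
print(y)     # one label per movie
```

Most libraries expect roughly this shape: one collection of feature rows and one parallel collection of labels.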
Basic Machine Learning Tools/Methods
Classification
For a classification task, a data mining procedure produces a model that, given a new individual, determines which class that individual belongs to. Classification and scoring are very closely related: a model that can do one can usually be modified to do the other.
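Here’s a small sketch of how scoring and classification relate, using a k-nearest-neighbours vote as the model. This is just one possible classifier, chosen because it fits in a few lines; the animal measurements, the function names, and the 0.5 threshold are assumptions of mine.

```python
import math

# Made-up training data: (height_cm, weight_kg) -> class label.
training = [
    ((20.0, 4.0), "cat"),
    ((25.0, 5.0), "cat"),
    ((23.0, 4.5), "cat"),
    ((60.0, 30.0), "dog"),
    ((55.0, 25.0), "dog"),
    ((65.0, 35.0), "dog"),
]


def score(point, positive_class="dog", k=3):
    """Score = fraction of the k nearest training points labelled positive_class."""
    nearest = sorted(training, key=lambda item: math.dist(item[0], point))[:k]
    return sum(1 for _, label in nearest if label == positive_class) / k


def classify(point, threshold=0.5):
    """Turn the score into a class by comparing it with a threshold."""
    return "dog" if score(point) >= threshold else "cat"


print(score((58.0, 28.0)), classify((58.0, 28.0)))
print(score((22.0, 4.2)), classify((22.0, 4.2)))
```

The score function is the scoring model; wrapping it in a threshold is what turns it into a classifier.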
Regression
Also called “value estimation”. A regression procedure produces a model that, given an individual, estimates the value of some particular variable specific to that individual.
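As a sketch of value estimation, here’s ordinary least squares for a single feature, written out by hand. The floor areas and prices are invented numbers (deliberately on a perfect line so the result is easy to check), and fit_line and predict are names I picked.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for one feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up data: floor area in square metres and price in thousands.
areas = [50.0, 70.0, 90.0, 110.0]
prices = [150.0, 210.0, 270.0, 330.0]

slope, intercept = fit_line(areas, prices)


def predict(area):
    """Estimate a continuous value (the price) for a new individual."""
    return slope * area + intercept


print(slope, intercept, predict(100.0))
```

Unlike a classifier, the output here is a number on a continuous scale, not a category.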
-
Classification vs Regression
Classification is used to predict discrete categories/labels, such as whether something is a shoe or a sandal, or a person or an animal. Regression, by contrast, predicts continuous values, like the stock value of a company or the estimated price of a property. The two are related but different: informally, classification predicts whether something will happen, whereas regression predicts how much something will happen.
Similarity Matching
Similarity matching attempts to identify similar individuals based on data known about them. Similarity matching can be used directly to find similar entities.
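One common way to measure “similar” is cosine similarity between feature vectors. The sketch below assumes made-up customers described by three non-zero numeric features; other similarity measures (Euclidean distance, for example) would work just as well here.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical customers: spend on [books, games, music].
customers = {
    "alice": [5, 0, 3],
    "bob": [4, 0, 4],
    "carol": [0, 5, 1],
}


def most_similar(name):
    """Return the other customer whose vector is closest in direction."""
    target = customers[name]
    others = [
        (other, cosine_similarity(target, vector))
        for other, vector in customers.items()
        if other != name
    ]
    return max(others, key=lambda pair: pair[1])[0]


print(most_similar("alice"))
```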
Clustering
Clustering attempts to group individuals in a population together by their similarity, but not driven by any specific purpose. Clustering is useful in preliminary domain exploration to see which natural groups exist.
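k-means is one widely used clustering algorithm, so here’s a bare-bones sketch of it. The points, the choice of two clusters, the fixed starting centroids, and the iteration count are all assumptions to keep the example small and deterministic; real implementations pick starting centroids more carefully and stop when nothing changes.

```python
import math


def k_means(points, centroids, iterations=10):
    """Repeat: assign each point to its nearest centroid, then re-average."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for point in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: math.dist(point, centroids[i]),
            )
            clusters[nearest].append(point)
        # New centroid = mean of its cluster; keep the old one if the cluster is empty.
        centroids = [
            tuple(sum(coords) / len(cluster) for coords in zip(*cluster))
            if cluster
            else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


# Made-up 2D points that visibly form two groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, clusters = k_means(points, centroids=[points[0], points[3]])

for centroid, cluster in zip(centroids, clusters):
    print(centroid, cluster)
```

Note that no labels appear anywhere: the groups come purely from how close the points are to each other.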
Association Rules
This method attempts to find associations between entities based on transactions involving them. An example co-occurrence question would be: What items are commonly purchased together?
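The co-occurrence question can be sketched by counting how often each pair of items shows up in the same transaction. The baskets below are invented, and this covers only the counting step: full association rule mining (Apriori, for instance) adds more on top, such as pruning and confidence measures.

```python
from collections import Counter
from itertools import combinations

# Made-up shopping baskets, one set of items per transaction.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]

# Count every unordered pair of items that appears in the same basket.
pair_counts = Counter()
for basket in transactions:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1


def support(pair):
    """Fraction of all transactions that contain both items of the pair."""
    return pair_counts[tuple(sorted(pair))] / len(transactions)


print(pair_counts.most_common(1))
print(support(("butter", "bread")))
```

Sorting each basket before pairing means ("bread", "butter") and ("butter", "bread") are counted as the same pair.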
-
Clustering vs Association Rules
While clustering looks at similarity between objects based on the objects’ attributes, association rules (also called co-occurrence grouping) consider similarity of objects based on their appearing together in transactions.