Machine Learning in 10 minutes

Published in DataDrivenInvestor · 6 min read · Apr 13, 2020

Machine learning (ML) is the study of computer algorithms that improve automatically through experience. ML is a subset of artificial intelligence and is built upon a statistical framework. The ultimate goal of machine learning is to apply computer algorithms to available data to create a mathematical model that can make future predictions or decisions without being explicitly programmed to do so.

Unsupervised Learning

This is a class of ML algorithms that lets the computer recognize patterns on its own with minimal or no human supervision. Imagine someone shows you the following picture:

Your brain recognizes two types of objects, Xs and Os, without any additional instructions. Unsupervised learning works the same way. The machine applies clustering and other methods to the data and discovers patterns on its own.
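To make the clustering idea concrete, here is a minimal sketch of k-means, one common clustering method (the article does not name a specific one, so this choice is an assumption). The points, the function names, and the choice of two clusters are made up for illustration.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two 2-D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def kmeans(points, k, iterations=20, seed=0):
    """Group 2-D points into k clusters; returns (centroids, assignments)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k of the data points
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        # Update step: move each centroid to the mean of its points.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return centroids, assignments


# Unlabeled, made-up data: one group near (1, 1), another near (8, 8).
points = [(1.0, 1.2), (0.8, 0.9), (1.3, 1.1), (8.0, 8.2), (7.9, 7.7), (8.3, 8.1)]
centroids, assignments = kmeans(points, k=2)
print(assignments)  # cluster index per point; the index values themselves are arbitrary
```

Nobody told the algorithm which point belongs to which group. It only received the number of groups to look for.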

Supervised Learning

This class of ML algorithms requires examples up front that it can learn from. This process is called training, and the machine typically needs a lot of human-created examples to recognize the rule. For instance, say you have a thousand photos of pets and you want an algorithm to tell whether a picture contains a dog or a cat. The first thing to do is label which images show dogs and which show cats. You then feed some of the pictures, along with their labels, to the algorithm so it can learn patterns and create a mathematical model that can distinguish dog pictures from cat pictures on its own. Next, you test the model on the remaining labeled images, which you set aside to check how well it learned. The first portion is called the training dataset and the second the test dataset. You could also have a validation dataset that is used to fine-tune the model before you test it.
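The split into training, validation, and test portions can be sketched in a few lines. The 70/15/15 proportions, the file names, and the `split_dataset` helper are assumptions for illustration. The article does not prescribe specific ratios.

```python
import random


def split_dataset(examples, train_frac=0.7, val_frac=0.15, seed=42):
    """Shuffle labeled examples and cut them into train / validation / test parts."""
    shuffled = list(examples)  # copy so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # everything that is left
    return train, validation, test


# Hypothetical labeled photos: (file name, label) pairs.
labeled_photos = [
    (f"pet_{i:04d}.jpg", "dog" if i % 2 == 0 else "cat") for i in range(1000)
]
train, validation, test = split_dataset(labeled_photos)
print(len(train), len(validation), len(test))
```

Shuffling before cutting matters: if the photos were sorted by label, an unshuffled split could leave one of the portions with only dogs or only cats.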

Supervised learning algorithms can be used to solve two kinds of problems: regression and classification.

Supervised Learning — Regression

These algorithms find the relationship between one or more predictor variables and one outcome variable. Because regression is a supervised learning method, it needs data to learn from, and it uses statistics to learn the patterns in that data. An example would be predicting how happy someone will be if you give him a $40 tip. You would need historical data on how happy this person was when you gave him different tips in the past.

Linear regression is the most straightforward regression algorithm. It fits the straight line that lies closest to the known values and assumes that future values will fall on that line as well. In other words, it extrapolates from current observations.

You can rarely draw a line that always gives correct predictions. In our case, many other factors could affect someone’s happiness. Therefore the accuracy of this method is unlikely to be 100%, and there will be errors.
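As a rough illustration of the tip example, the sketch below fits a line with ordinary least squares, extrapolates to a $40 tip, and measures how far the line is from the known points. The tip and happiness numbers are invented, and the 0 to 10 happiness scale is an assumption.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up history: tip in dollars and observed happiness on a 0-10 scale.
tips = [5, 10, 15, 20, 25, 30]
happiness = [2.1, 3.2, 3.8, 5.1, 5.9, 7.2]

slope, intercept = fit_line(tips, happiness)
predicted_for_40 = slope * 40 + intercept  # extrapolate beyond the observed tips

# The line does not pass exactly through the known points, so there is an error.
errors = [abs(y - (slope * x + intercept)) for x, y in zip(tips, happiness)]
mean_abs_error = sum(errors) / len(errors)

print(f"happiness = {slope:.3f} * tip + {intercept:.3f}")
print(f"prediction for a $40 tip: {predicted_for_40:.2f}")
print(f"mean absolute error on known data: {mean_abs_error:.3f}")
```

The mean absolute error is one simple way to put a number on the errors mentioned above. Other measures, such as mean squared error, are just as common.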

Supervised Learning — Classification

This is a class of supervised algorithms used to sort items into the categories where they belong. The most basic version finds a single line that separates two types of objects. This method is called linear classification.

As in the happiness example for regression algorithms, you are likely to find that no single line perfectly separates the two classes of objects:

When the algorithm learns, it tries many candidate lines to find the position and angle that give the most accurate results.
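One classic way to search for such a line is the perceptron, sketched below. It is only one of several linear classifiers, and the article does not say which one it has in mind. The six points, the labels, and the learning rate are invented for illustration.

```python
def train_perceptron(samples, labels, epochs=1000, learning_rate=0.1):
    """Learn (w1, w2, bias) so that the line w1*x + w2*y + bias = 0 splits the classes."""
    w1 = w2 = bias = 0.0
    for _ in range(epochs):
        mistakes = 0
        for (x, y), label in zip(samples, labels):  # label is +1 or -1
            prediction = 1 if w1 * x + w2 * y + bias > 0 else -1
            if prediction != label:
                # Shift and rotate the line toward the misclassified point.
                w1 += learning_rate * label * x
                w2 += learning_rate * label * y
                bias += learning_rate * label
                mistakes += 1
        if mistakes == 0:  # every training point is on the correct side
            break
    return w1, w2, bias


def classify(weights, point):
    """Return +1 or -1 depending on which side of the line the point falls."""
    w1, w2, bias = weights
    return 1 if w1 * point[0] + w2 * point[1] + bias > 0 else -1


# Made-up, linearly separable data: class -1 near the origin, class +1 farther out.
samples = [(1, 1), (2, 1), (1, 2), (5, 5), (6, 5), (5, 6)]
labels = [-1, -1, -1, 1, 1, 1]

weights = train_perceptron(samples, labels)
print([classify(weights, p) for p in samples])
```

Each update moves the current line a little, so training amounts to trying one candidate line after another until none of the training points is on the wrong side. On data that no single line can separate, the loop simply stops after `epochs` passes.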

There are more advanced non-linear classification algorithms. These algorithms can separate classes using more complex mathematical equations. Most of the currently used classification algorithms are non-linear. Here is how the non-linear Kernel Approximation algorithm could have divided our classes:

Perfect accuracy on the training dataset can be dangerous. A model that has also learned the outliers can fail to generalize to real-world cases. This problem is called overfitting.
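A small sketch can show the effect. Below, a 1-nearest-neighbor model memorizes every training point, including two mislabeled outliers, and is compared with a much simpler threshold rule. All numbers are invented, and the threshold rule is written by hand as a stand-in for a simpler model rather than learned from the data.

```python
def nearest_neighbor_predict(train_points, train_labels, x):
    """1-nearest-neighbor: copy the label of the closest training point."""
    closest = min(range(len(train_points)), key=lambda i: abs(train_points[i] - x))
    return train_labels[closest]


def accuracy(predict, points, labels):
    """Fraction of points for which predict(x) matches the label."""
    correct = sum(predict(x) == label for x, label in zip(points, labels))
    return correct / len(points)


# Made-up 1-D data. The underlying rule is "class 1 when x > 5",
# but two training labels are flipped (the outliers at x = 2.0 and x = 8.0).
train_x = [1.0, 2.0, 3.0, 4.0, 6.0, 7.0, 8.0, 9.0]
train_y = [0, 1, 0, 0, 1, 1, 0, 1]
test_x = [1.2, 2.2, 3.4, 4.5, 5.5, 6.4, 7.8, 8.4]
test_y = [0, 0, 0, 0, 1, 1, 1, 1]


def memorizer(x):
    """Flexible model that reproduces the training set exactly."""
    return nearest_neighbor_predict(train_x, train_y, x)


def simple_rule(x):
    """Hand-written stand-in for a simpler model that ignores the outliers."""
    return 1 if x > 5 else 0


train_acc_memorizer = accuracy(memorizer, train_x, train_y)
test_acc_memorizer = accuracy(memorizer, test_x, test_y)
train_acc_rule = accuracy(simple_rule, train_x, train_y)
test_acc_rule = accuracy(simple_rule, test_x, test_y)

print("memorizer   train/test:", train_acc_memorizer, test_acc_memorizer)
print("simple rule train/test:", train_acc_rule, test_acc_rule)
```

The memorizer is perfect on the data it has seen and noticeably worse on new data, while the simpler rule makes mistakes on the training outliers but holds up on the test set.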

Semi-Supervised Learning

These algorithms combine supervised and unsupervised learning methods. In semi-supervised learning, the dataset has a labeled part and an unlabeled part. With this method, you use the labeled data to label the remaining unlabeled part of your dataset. For example, you want to train a model to classify images, and you want to give your algorithm a hint about how to construct the categories. Because not every picture is labeled, you can use only a small portion of labeled images, and you still want your model to classify the unlabeled pictures as accurately as possible based on the images that are already labeled.
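One simple way to spread a few labels over unlabeled data is self-training with a nearest-neighbor rule, sketched below. This is only one of several semi-supervised techniques. The 2-D points stand in for image features and are invented, and `self_train` is a hypothetical helper name.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two 2-D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def self_train(labeled, unlabeled):
    """Pseudo-label unlabeled points one by one, closest to the labeled set first."""
    labeled = dict(labeled)  # point -> label (copy so the input is untouched)
    remaining = list(unlabeled)
    while remaining:
        # Pick the unlabeled point that sits closest to any labeled point.
        best = min(remaining, key=lambda u: min(sq_dist(u, l) for l in labeled))
        nearest = min(labeled, key=lambda l: sq_dist(best, l))
        # Give it that neighbor's label; from now on it counts as labeled too.
        labeled[best] = labeled[nearest]
        remaining.remove(best)
    return labeled


# Two hand-labeled examples and six unlabeled ones (made-up feature values).
labeled = {(1.0, 1.0): "cat", (9.0, 9.0): "dog"}
unlabeled = [(2.0, 1.5), (3.0, 2.5), (4.0, 3.5), (8.0, 8.5), (7.0, 7.5), (6.0, 6.5)]

result = self_train(labeled, unlabeled)
for point, label in sorted(result.items()):
    print(point, label)
```

Labeling the most confident points first lets the labels spread step by step from the two hand-labeled examples. A wrong early guess would spread too, which is the main risk of this approach.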

Self-Supervised Learning

Self-supervised learning (SSL) is a special case of learning where a human does not manually label the training dataset. Instead, the labels are generated automatically, using unsupervised learning techniques. There is much more unlabeled data in the world than labeled data, and this approach uses unsupervised techniques to find common patterns in the data first, which helps to build a better supervised model later on. Some people say that this is just a variation of unsupervised learning, but SSL is a two-step process, and its final goal is to build a supervised learning model.
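A minimal sketch of the two steps, under the assumption that the labels are derived from the raw data itself: step one turns an unlabeled sequence into (input, target) pairs by asking the model to predict the next value, and step two trains an ordinary supervised model on those pairs. The sequence and the helper names are invented, and real systems use far richer data and models.

```python
def make_pairs(sequence):
    """Step 1: build (input, target) pairs from raw data, with no human labeling."""
    return [(sequence[i], sequence[i + 1]) for i in range(len(sequence) - 1)]


def fit_line(pairs):
    """Step 2: ordinary least squares on the generated pairs: (slope, intercept)."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    return slope, mean_y - slope * mean_x


# Unlabeled, made-up sequence in which every value doubles the previous one.
readings = [2.0, 4.0, 8.0, 16.0, 32.0]

pairs = make_pairs(readings)          # the "labels" are just the next values
slope, intercept = fit_line(pairs)    # supervised training on generated labels
next_value = slope * readings[-1] + intercept

print(pairs)
print(f"predicted next value: {next_value:.1f}")
```

The supervised step never sees a human-made label. Everything it learns from was generated from the raw sequence.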

Reinforcement Learning

This is a machine learning technique that enables an algorithm to learn in an interactive environment by trial and error, using feedback from its own actions and experiences. With this constant feedback loop, your model can improve itself over time.
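The feedback loop can be sketched with one of the simplest reinforcement learning setups, an epsilon-greedy agent choosing between two actions. The reward probabilities, the exploration rate, and the `run_bandit` name are assumptions for illustration. Full reinforcement learning problems also involve states and delayed rewards, which this sketch leaves out.

```python
import random


def run_bandit(reward_probs, steps=2000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent: returns its estimated value for each action."""
    rng = random.Random(seed)
    n_actions = len(reward_probs)
    estimates = [0.0] * n_actions  # what the agent currently believes
    counts = [0] * n_actions       # how often each action was tried
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)  # explore: try a random action
        else:
            action = max(range(n_actions), key=lambda a: estimates[a])  # exploit
        # The environment answers with a reward of 1 or 0.
        reward = 1.0 if rng.random() < reward_probs[action] else 0.0
        counts[action] += 1
        # Update the running average for the chosen action with the new feedback.
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates


# Hidden from the agent: action 0 pays off 20% of the time, action 1 pays off 80%.
estimates = run_bandit([0.2, 0.8])
best_action = estimates.index(max(estimates))
print(estimates, best_action)
```

The agent is never told which action is better. It works that out from the rewards it collects, and the occasional random choice keeps it from settling too early on a bad first impression.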

Deep Learning

This is a brain-inspired family of algorithms that borrows information processing principles from the human brain. In deep learning, many layers of artificial neurons work together, with each layer building a different interpretation of the data it is fed. Because these layered networks are large, they need significant computing power and big data to train successfully.
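To show what "layers working together" means, here is a tiny two-layer network that computes XOR, a function that no single linear layer can represent. The weights are picked by hand for illustration and are not learned. A real deep learning model would have many more layers and would find its weights through training.

```python
def relu(values):
    """Activation function: negative values become zero."""
    return [max(0.0, v) for v in values]


def dense(inputs, weights, biases):
    """One fully connected layer: weighted sum of all inputs plus a bias per output."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


# Hand-picked (not learned) weights for a 2-input, 2-hidden-unit, 1-output network.
HIDDEN_W = [[1.0, 1.0], [1.0, 1.0]]
HIDDEN_B = [0.0, -1.0]
OUTPUT_W = [[1.0, -2.0]]
OUTPUT_B = [0.0]


def forward(x1, x2):
    """Pass the inputs through both layers and return the single output."""
    hidden = relu(dense([x1, x2], HIDDEN_W, HIDDEN_B))  # layer 1 reinterprets the inputs
    return dense(hidden, OUTPUT_W, OUTPUT_B)[0]          # layer 2 combines layer 1's outputs


outputs = [forward(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]]
print(outputs)
```

The first layer turns the raw inputs into two intermediate signals, and the second layer combines them. Neither layer could produce the XOR pattern alone.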

Hopefully, this gives you an idea of ML methods and which algorithms can be used for a given situation. If you are a product manager and want to start implementing Machine Learning for your product, check out my article about Product Management for Artificial Intelligence.

I would also recommend reading the Machine Learning is fun article if you want to dive deeper into ML methods and can dedicate a few more hours to it.

Many thanks to Yuriy Muzychuk, Andrey Boreyko, and Pavel Surmenok for your help on this article.