Random Forest Algorithm

Vineet Maheshwari
DataDrivenInvestor


Random Forest is an easy-to-use machine learning algorithm that can perform well even without hyper-parameter tuning, and it can be used for both classification and regression tasks. It is a supervised learning algorithm that builds a forest of decision trees and makes it random: the forest is an ensemble of decision trees trained with the bagging method.

It builds multiple decision trees and merges their predictions to get a more accurate and stable result, as in the sketch below. It has nearly the same hyper-parameters as a decision tree or a bagging classifier.
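
A minimal sketch of training such an ensemble with scikit-learn (the iris dataset and the parameter values here are illustrative choices, not part of this post's later implementation):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# A small example dataset, used purely for illustration
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# An ensemble of 100 decision trees, each trained on a bootstrap
# sample of the data (bagging)
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)

# The forest's prediction is the majority vote of its trees
print(clf.score(X_test, y_test))
```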

Image source: SlidePlayer

Random Forest adds randomness to the model while growing the decision trees. Instead of searching for the most important feature when splitting a node, it searches for the best feature among a random subset of features. This results in greater diversity among the trees, which generally makes the model better. The model can be made even more random by using a random threshold for each feature rather than searching for the best possible threshold.
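
In scikit-learn, that extra level of randomness is what the Extra-Trees (extremely randomized trees) estimator does; a minimal sketch, with illustrative parameter values:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import ExtraTreesClassifier

# Extra-Trees go one step further than Random Forest: in addition to
# random feature subsets, each candidate split uses a random threshold
# instead of the best one, making the trees even more diverse.
X, y = load_iris(return_X_y=True)
clf = ExtraTreesClassifier(n_estimators=100, random_state=42)
clf.fit(X, y)
```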

It is also very easy to measure the relative importance of each feature to the predictions.
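
For example, scikit-learn exposes this through the feature_importances_ attribute of a fitted forest (a minimal sketch on the iris dataset, chosen just for illustration):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

# Fit a forest, then inspect how much each feature contributed
data = load_iris()
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(data.data, data.target)

# feature_importances_ sums to 1.0; higher means more useful for splits
for name, score in zip(data.feature_names, clf.feature_importances_):
    print(f"{name}: {score:.3f}")
```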

Hyper-Parameters

  • Predictive Power: “n_estimators” defines the number of trees the algorithm builds before taking the majority vote (for classification) or averaging the predictions (for regression). A higher number of trees generally improves performance, but it also slows the model down. Another parameter is “max_features”, the maximum number of features the algorithm considers when splitting a node. The last parameter that affects predictive power is “min_samples_leaf”, the minimum number of samples required at a leaf node.
  • Model Speed: the “n_jobs” parameter tells the engine how many processors it is allowed to use. A value of 1 means it may use only one processor, while a value of -1 means there is no limit. “random_state” makes the output reproducible: given the same data and hyper-parameters, the model always produces the same results for a fixed value of random_state. “oob_score” (out-of-bag sampling) is another useful option: each tree is trained on a bootstrap sample that leaves out roughly one-third of the data, and those out-of-bag samples can then be used to evaluate the model without a separate validation set. A sketch that sets all of these parameters together follows this list.
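
Here is that sketch; the synthetic dataset and the specific values are illustrative, not recommendations:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data, purely for illustration
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

clf = RandomForestClassifier(
    n_estimators=200,      # number of trees built before voting/averaging
    max_features="sqrt",   # features considered at each split
    min_samples_leaf=2,    # minimum samples required at a leaf node
    n_jobs=-1,             # use all available processors
    random_state=0,        # makes the output reproducible
    oob_score=True,        # evaluate on out-of-bag samples
)
clf.fit(X, y)

# Out-of-bag accuracy: a nearly free validation estimate
print(clf.oob_score_)
```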

We will talk about the full implementation of Random Forest in a later post. Happy coding!
