Forecasting a Recession in the USA

Fabrizio Basso
DataDrivenInvestor


Chapter IV:

Data Scaling Strategies.

The Story so far

In the first part of this study, I analyzed why recessions matter and how they affect returns in the equity market (S&P 500). The second chapter focused on the creation of the dataset and its EDA. In the third part, I tackled dimensionality reduction by shortlisting two strategies (SelectFromModel with LogisticRegression, and PCA) to be used later on in the construction of an ML pipeline. In this chapter, I explore data scaling strategies.

Index:

4.1 Data Scaling — Overview and Goals;
4.2 Data Scaling — Example on two features;
4.3 Data Scaling — Testing the alternatives;
4.4 Conclusions.

4.1 Data Scaling — Overview and Goals

The performance of ML algorithms can be affected by the magnitude of the data they are fed. If one feature lives on a scale of magnitude far above the others, it can dominate them. Most datasets contain features that vary widely in magnitude, units, and range, and this is a problem because many machine learning algorithms use the Euclidean distance between data points in their computations. Left alone, these algorithms naturally ‘overweight’ the features with greater magnitude and neglect the smaller ones. For instance, in this dataset the ISM-PMI averages above 50 and mostly scores between 40 and 60, while all the features calculated as percentages average close to zero and seldom exceed +/-5%. The high-magnitude features would therefore weigh far more in the distance calculations than the low-magnitude ones. To suppress this effect, I need to bring all features to the same level of magnitude, which is achieved by scaling. To this end, I test the following scaling approaches provided in scikit-learn (instantiated in the sketch after this list):

  • StandardScaler;
  • QuantileTransformer-Uniform;
  • QuantileTransformer-Normal;
  • PowerTransformer;
  • MinMaxScaler; and
  • RobustScaler.
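For reference, here is how these six scalers would be instantiated in scikit-learn. This is a minimal sketch: the dictionary name and the default parameters are my assumptions, apart from the explicit output distributions of the two QuantileTransformer variants.

```python
from sklearn.preprocessing import (
    MinMaxScaler,
    PowerTransformer,
    QuantileTransformer,
    RobustScaler,
    StandardScaler,
)

# The six scaling strategies under test, keyed by the names used in this chapter.
scalers = {
    "StandardScaler": StandardScaler(),      # zero mean, unit variance
    "QuantileTransformer-Uniform": QuantileTransformer(output_distribution="uniform"),
    "QuantileTransformer-Normal": QuantileTransformer(output_distribution="normal"),
    "PowerTransformer": PowerTransformer(),  # Yeo-Johnson transform by default
    "MinMaxScaler": MinMaxScaler(),          # rescales each feature to [0, 1]
    "RobustScaler": RobustScaler(),          # centers on the median, scales by the IQR
}
```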

As a general reminder, a non-exhaustive (!) list of ML models whose performance is affected by data scaling is the following:

  • linear and logistic regression;
  • nearest neighbours;
  • neural networks;
  • support vector machines with radial basis function (RBF) kernels; and
  • principal components analysis (PCA).

In particular, since PCA is one of the dimensionality reduction tools I selected for this project, proper scaling of the data is a necessary step.

Goal: The main aim of this chapter is to understand the effect of different scaling processes on models’ generalization performance in classifying recession/expansion periods in the US economy. The idea is to shortlist a set of strategies to be nested into a pipeline later on in the analysis.

4.2 Data Scaling — Example on two features

To appreciate how scaling can improve a model’s performance, I first carry out an example by extracting only two features from the original dataset using PCA.

I implement the following steps (a code sketch reproducing them follows the list):

  • After importing all the libraries used in this chapter and loading the training dataset, I split the data into training and validation subsets (as a reminder, I already split the data into a training and a test set, where the latter will be used only at the very end to evaluate the generalization properties of the selected models). The validation set amounts to 30% of the overall training dataset.
Out.1 — Code Output
  • I create a new features’ dataframe by fitting and transforming the MinMaxScaler on the training subset. Then PCA is used to extract two new features from the unscaled dataset and two new features from the scaled one.
Out.2 — Code Output
  • A first impression of how scaling might affect the analysis comes from a simple scatter plot of the two sets of features extracted from the scaled and the un-scaled data. The two sets differ massively in terms of scale and in the relative location of the recession and expansion observations. I then take the analysis a step further by training a KNN classifier on the two sets of features.
Pic.1 — PCA with 2 features on Unscaled and Scaled Data
  • First I train a KNN classifier with 6 neighbours on the two features derived from the unscaled dataset; then the same model is fitted on the PCA features extracted from the scaled data. The results are reported in the two graphs below. As the scores show, working on the scaled data improved the validation score of the model by 2.5 percentage points, from 87.6% to 90.1%. Though less relevant, the performance on the training subset also improved slightly, by 1.7 points.
Pic.2 — KNN Performance and Graphic Representation when fitted on un-scaled data.
Pic.3 — KNN Performance and Graphic Representation when fitted on scaled data.
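Here is a minimal sketch of the steps above. The variable names X_train and y_train (the features and recession labels of the training dataset built in Chapter II), the random seed, and the stratified split are my assumptions, not necessarily the original notebook’s:

```python
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier

# X_train, y_train: assumed names for the training features and labels.
# Split the training data into train/validation subsets (30% validation).
X_tr, X_val, y_tr, y_val = train_test_split(
    X_train, y_train, test_size=0.3, random_state=42, stratify=y_train
)

# Fit the MinMaxScaler on the training subset only, then transform both subsets.
mms = MinMaxScaler().fit(X_tr)
X_tr_sc, X_val_sc = mms.transform(X_tr), mms.transform(X_val)

# Extract two PCA features from the unscaled data and two from the scaled data.
pca_raw = PCA(n_components=2).fit(X_tr)
pca_sc = PCA(n_components=2).fit(X_tr_sc)

# Fit a 6-neighbour KNN on each pair of features and compare the scores.
for name, pca, tr, val in [("unscaled", pca_raw, X_tr, X_val),
                           ("scaled", pca_sc, X_tr_sc, X_val_sc)]:
    knn = KNeighborsClassifier(n_neighbors=6).fit(pca.transform(tr), y_tr)
    print(f"{name}: train={knn.score(pca.transform(tr), y_tr):.3f}, "
          f"validation={knn.score(pca.transform(val), y_val):.3f}")
```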

4.3 Data Scaling — Testing the alternatives

The next step is to generalize this analysis by applying a wider range of scalers to 16 PCA features. I will test the scalers on several ML algorithms, to generalize the results as much as possible and avoid over-relying on a specific model.

As mentioned before, the set of scalers to be tested is the following:

  1. StandardScaler;
  2. QuantileTransformer-Uniform;
  3. QuantileTransformer-Normal;
  4. PowerTransformer;
  5. MinMaxScaler; and
  6. RobustScaler.

while the following ML algorithms are used as testers:

  • GaussianNaiveBayes;
  • LogisticRegression;
  • KNeighborsClassifier;
  • ExtraTreesClassifier;
  • SVC, kernel='rbf';
  • LinearSVC;
  • Bagging with GaussianNB; and
  • MLPClassifier.

Each of these models is first trained, and its performance tested, on the entire unscaled dataset (the “No Scaling — All Features” row in the table). Then I extract 16 features from the unscaled dataset using PCA and train and test the models on them (“No Scaling — PCA”); this row should provide an insight into the negative impact of applying PCA to un-scaled data, as well as a further term of comparison for the models in the list. Next, the training and validation datasets are scaled according to each of the six strategies enumerated above, PCA is performed on each scaled dataset, and 16 features are extracted; each model is then trained on them and tested on the validation set (all the other rows in the table). The results are reported in two tables: one with the scores on the training dataset, and the other, more importantly, with the scores on the validation data. The best scores are marked in light blue.

The code that produces the results is sketched here:
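This is a minimal reconstruction of the procedure, not the original notebook: it reuses the `scalers` dictionary from section 4.1 and the X_tr/X_val/y_tr/y_val splits assumed in section 4.2, and leaves all model hyperparameters at scikit-learn defaults apart from raised iteration limits, which is an assumption on my part.

```python
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.ensemble import BaggingClassifier, ExtraTreesClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC, LinearSVC

# The eight tester models; defaults throughout, except iteration limits
# raised to help convergence.
models = {
    "GaussianNB": GaussianNB(),
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "KNeighborsClassifier": KNeighborsClassifier(),
    "ExtraTreesClassifier": ExtraTreesClassifier(),
    "SVC-rbf": SVC(kernel="rbf"),
    "LinearSVC": LinearSVC(max_iter=10000),
    "Bagging-GaussianNB": BaggingClassifier(GaussianNB()),
    "MLPClassifier": MLPClassifier(max_iter=1000),
}

def score_models(tr, val):
    """Fit every model on tr and return (train accuracy, validation accuracy)."""
    scores = {}
    for name, model in models.items():
        model.fit(tr, y_tr)
        scores[name] = (model.score(tr, y_tr), model.score(val, y_val))
    return scores

results = {}
# Baseline 1: the entire unscaled dataset.
results["No Scaling - All Features"] = score_models(X_tr, X_val)
# Baseline 2: 16 PCA features extracted from the unscaled data.
pca = PCA(n_components=16).fit(X_tr)
results["No Scaling - PCA"] = score_models(pca.transform(X_tr), pca.transform(X_val))
# One row per scaler: fit on the training subset, then PCA down to 16 features.
for s_name, scaler in scalers.items():
    tr_sc, val_sc = scaler.fit_transform(X_tr), scaler.transform(X_val)
    pca = PCA(n_components=16).fit(tr_sc)
    results[s_name] = score_models(pca.transform(tr_sc), pca.transform(val_sc))

# Two tables: one for the training scores, one for the validation scores.
train_table = pd.DataFrame({k: {m: s[0] for m, s in v.items()}
                            for k, v in results.items()}).T
val_table = pd.DataFrame({k: {m: s[1] for m, s in v.items()}
                          for k, v in results.items()}).T
```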

Here below, the result tables:

Table.1 — Analysis Results

4.4 Conclusions

In some cases, a proper scaling strategy grants a considerable improvement in model performance on the validation data: the boost exceeds 10% for the SVC with RBF kernel and is around 7% for the KNN classifier. Relevant from a technical standpoint is that, in all but two cases, models using features extracted from unscaled data perform worse than those trained on the whole dataset. The QuantileTransformer with a uniform output distribution seems to perform particularly well, and the MinMaxScaler is the second-best performer; in all cases, the best score is achieved on scaled data. One reason for the good performance of the QuantileTransformer might be one of the hallmarks of the dataset that I highlighted during the EDA: the widespread presence of outliers. Quantile transformers are robust to outliers and are therefore a solid choice for scaling this dataset. As an alternative, the MinMaxScaler is retained.

To conclude, based on the analysis, I will consider the following two procedures to scale the dataset (a pipeline sketch follows the list):

  • QuantileTransformer with a uniform marginal distribution for the transformed data; and
  • MinMaxScaler.
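Looking ahead, the two shortlisted scalers would slot into a scikit-learn Pipeline along these lines; the PCA width and the final classifier here are placeholders of mine, to be settled during model selection in the next chapters:

```python
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, QuantileTransformer

# One candidate pipeline per shortlisted scaler; LogisticRegression is only
# a stand-in for the estimator to be chosen later in the analysis.
pipe_quantile = Pipeline([
    ("scale", QuantileTransformer(output_distribution="uniform")),
    ("pca", PCA(n_components=16)),
    ("model", LogisticRegression(max_iter=1000)),
])

pipe_minmax = Pipeline([
    ("scale", MinMaxScaler()),
    ("pca", PCA(n_components=16)),
    ("model", LogisticRegression(max_iter=1000)),
])
```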

The code and the results of this chapter are available in my GitHub repository here.

Thanks for reading!

— — — — — — — — — — — —

Preceding articles:

Chapter I: Introduction

Chapter II: Dataset, Feature Engineering and Explanatory Data Analysis

Chapter III: Dimensionality Reduction — Feature Selection and Feature Extraction.
