The Mathematical Approach to Artificial Intelligence

Temi Babs · Published in DataDrivenInvestor · Mar 8, 2022 · 4 min read


Just how much math do I have to know to be a great Software Engineer in AI? 🤔

Photo by Dan-Cristian Pădureț on Unsplash

In a previous article here, I wrote about the importance of learning the fundamentals of any subject matter or domain. I recommend you read it first (if you haven’t) so as to fully understand this article.

If you’ve already read it, you can skip this section. If you haven’t, here’s a TLDR:

  • Every major field of work is made up of layers of abstraction.
  • These layers of abstraction form the foundation upon which concepts in that field are built.
  • A lot of these fundamentals center around Philosophy, Mathematics, and Physics.
  • While you don’t need to understand the math and physics underlying your engineering field to get by, performing at the highest level of your career requires it.

In this article, I apply the same concept to Machine Learning and Artificial Intelligence. The question is: “How much math do I have to learn to be a great AI Engineer?”

The truth is that a lot of us do not need to know that much math. Entry-level roles in Data Science, Machine Learning, or Artificial Intelligence do not require a lot of mathematics. The math has already been taken care of by libraries and frameworks; the lower layers have been abstracted away so that you do not have to worry about them. At this level, most of what you have to do is understand things at a high level (surface concepts like regression, etc.) and know how to use the most common tools and software.

However, elite-level performance requires a strong background in math. The best engineers in AI are the ones who can read a research paper, understand it, and code out a prototype within a short period of time.

Obviously, some knowledge of mathematics is required just to understand those research papers, let alone implement them.

The ability to read any research work in AI, understand it, and implement it in quick prototyping or build it at scale is the holy grail of engineering in AI.

A lot of the novel methodologies used in Deep Neural Networks and Reinforcement Learning come from research work done by top researchers in academia or industry.

Tools are just tools

Most engineers learn how to use scikit-learn, TensorFlow, PyTorch, or NumPy, but these are just tools. Tools change, get upgraded, and many even get deprecated. Without a good understanding of the concepts you’re building on, your tools will be nearly useless. The deepest level of understanding in AI is the mathematical level.

Think about it: how would you build NumPy or TensorFlow without any knowledge of Linear Algebra? 🤷🏽
You need a deep understanding of matrices, matrix operations, and Calculus before you can comfortably build any of these libraries. Thankfully, you don’t have to build your own TensorFlow or NumPy (unless you’re doing something truly novel), as they’ve already been created. You can simply use them.
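To make that concrete, here is a minimal sketch of one tiny piece of what those libraries abstract away: the textbook definition of matrix multiplication written out by hand, checked against NumPy’s optimized np.matmul. The function name and the random test matrices are just illustrative.

```python
import numpy as np

def naive_matmul(A, B):
    """Textbook matrix product: C[i, j] = sum_k A[i, k] * B[k, j]."""
    n, k = A.shape
    k2, m = B.shape
    assert k == k2, "inner dimensions must match"
    C = np.zeros((n, m))
    for i in range(n):
        for j in range(m):
            for p in range(k):
                C[i, j] += A[i, p] * B[p, j]
    return C

A = np.random.rand(3, 4)
B = np.random.rand(4, 2)
# The hand-rolled version agrees with NumPy's optimized implementation.
print(np.allclose(naive_matmul(A, B), A @ B))  # True
```

The triple loop is the mathematics; np.matmul is the engineering that makes it fast. Understanding the first is what lets you reason about (and, when needed, extend) the second.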

Mathematical Concepts to learn

Here are a few topics in math and statistics that will go a long way toward advancing your work in AI (a short NumPy illustration follows the list).

  • Calculus
  • Vector Calculus
  • Derivatives
  • Linear Algebra
  • Matrix operations
  • Eigenvectors and eigenvalues
  • Probability
  • Conditional Probability
  • Stochastic Processes
  • Markov Chain Monte Carlo (MCMC)
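As a quick taste of two items from this list, here is a small NumPy sketch: it computes the eigenvalues and eigenvectors of an example matrix and checks an analytic derivative against a finite-difference approximation. The particular matrix and the sigmoid function are arbitrary choices for illustration.

```python
import numpy as np

# Eigenvectors and eigenvalues: A v = lambda v
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
eigenvalues, eigenvectors = np.linalg.eig(A)
v = eigenvectors[:, 0]
print(np.allclose(A @ v, eigenvalues[0] * v))  # True

# Derivatives: d/dx sigmoid(x) = sigmoid(x) * (1 - sigmoid(x)),
# checked against a central finite-difference approximation.
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
x, h = 0.5, 1e-6
analytic = sigmoid(x) * (1 - sigmoid(x))
numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)
print(abs(analytic - numeric) < 1e-8)  # True
```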

Example — Artificial Neural Networks

Consider the following scenario:
A scientist on your team has just discovered a new activation function, so it has not yet been built to scale. As a Machine Learning Engineer, you have been called upon to test the long-term effects of training with Stochastic Gradient Descent and this activation function on a huge real-life dataset. The activation function is not in your TensorFlow or PyTorch library, so how would you implement it? One option is to subclass a Keras Layer and override its call method (see the sketch after the list below). During implementation, you then learn the following:

  • Most neural-network computation consists of vector and matrix calculations.
  • The chain rule in Calculus is central to the backpropagation algorithm, so you have to factor it into the implementation of your activation function. How would the derivative of your activation function change?
  • Since the dataset is very large, you may have to implement some dimensionality reduction. How do you choose between PCA, t-SNE, etc.?
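Here is a minimal sketch of that workflow, assuming TensorFlow 2.x / Keras. The function new_activation below is a hypothetical placeholder for your team’s discovery (its formula here is arbitrary); the point is the pattern: wrap the function in a custom Layer subclass, drop it into a model trained with plain SGD, and let automatic differentiation apply the chain rule to obtain its derivative during backpropagation.

```python
import tensorflow as tf

def new_activation(x):
    # Placeholder for the team's new activation function (arbitrary formula).
    return x * tf.math.sigmoid(2.0 * x)

class NewActivation(tf.keras.layers.Layer):
    """Custom layer that applies the new activation element-wise."""
    def call(self, inputs):
        return new_activation(inputs)

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64),
    NewActivation(),                 # custom activation between layers
    tf.keras.layers.Dense(1),
])
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.01),  # plain SGD
              loss="mse")

# Autodiff supplies d(activation)/dx via the chain rule during backpropagation:
x = tf.constant([[1.0, -2.0]])
with tf.GradientTape() as tape:
    tape.watch(x)
    y = new_activation(x)
print(tape.gradient(y, x).numpy())
```

Calling model.fit on your dataset would then exercise SGD with the new activation end to end; the derivative question from the list above is exactly what GradientTape is computing for you.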

To start, I would recommend this course by Imperial College London

It won’t make you a pro, but it will really help your intuition and help you gain some level of mastery in AI. You will have a better understanding of all the math that goes into the development of TensorFlow and PyTorch.

Thanks for reading up to this point. If you like what you just read, kindly give a few claps. 👏🏽
You can also follow me on Twitter: @iamtemibabs

