Visualizing Training and Validation Losses in real-time using PyTorch and Bokeh

Medhy Vinceslas
DataDrivenInvestor

While training a neural network, I like to keep an eye on a few outputs: the current epoch, the training loss, and the validation loss. They give me an idea of the direction the algorithm is moving in and help answer questions like:

Should I choose a bigger or smaller learning rate?
Should I go for a decay approach?
Should I stop the training, or maybe reduce the number of epochs?

Many of these questions can be answered by a package such as early_stopping. But I find it interesting to be able to visualize these values in real time, that is, during the training process.

So here is a quick tutorial on how to do this using the wonderful deep learning framework PyTorch and the sublime Bokeh library for plotting.

Step 1: Install dependencies

bokeh==1.1.0
cycler==0.10.0
Jinja2==2.10.1
kiwisolver==1.1.0
MarkupSafe==1.1.1
matplotlib==3.0.3
numpy==1.16.3
opencv-python==4.1.0.25
packaging==19.0
pandas==0.24.2
Pillow==6.0.0
pyparsing==2.4.0
python-dateutil==2.8.0
pytz==2019.1
PyYAML==5.1
six==1.12.0
torch==1.0.1.post2
torchvision==0.2.2.post3
tornado==6.0.2
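
To see how an existing environment compares with these pins, a small check along the following lines can help. This is only a sketch: it assumes Python 3.8 or newer (for importlib.metadata), and the check_pins helper and the PINNED subset are illustrative names of mine, not part of any of the packages above.

```python
from importlib import metadata

# A subset of the pins listed above; extend it as needed.
PINNED = {
    "bokeh": "1.1.0",
    "torch": "1.0.1.post2",
    "torchvision": "0.2.2.post3",
    "tornado": "6.0.2",
}


def check_pins(pins):
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, expected in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # the package is not installed at all
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


for name, (expected, installed) in check_pins(PINNED).items():
    print(f"{name}: expected {expected}, found {installed}")
```

If nothing is printed, every listed package matches its pin.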

Step 2: Import the necessary modules

# PyTorch
import torch
# Bokeh
from bokeh.io import curdoc
from bokeh.layouts import column
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure
from functools import partial
from threading import Thread
from tornado import gen

Step 3: Prepare the plot

First, we define an object called ColumnDataSource, which holds a dictionary of the variables to include in the plot, optionally with initial values. Here I use no initial values.

source = ColumnDataSource(data={'epochs': [],
                                'trainlosses': [],
                                'vallosses': []})

Then create the window object by calling figure() and add the train and val losses as a line plot.

plot = figure()
# Bokeh 1.1.0 (pinned in step 1) accepts legend=...; newer Bokeh releases use legend_label=... instead
plot.line(x='epochs', y='trainlosses',
          color='green', alpha=0.8, legend='Train loss', line_width=2,
          source=source)
plot.line(x='epochs', y='vallosses',
          color='red', alpha=0.8, legend='Val loss', line_width=2,
          source=source)

Finally, we create the document that we will display by calling the curdoc() method. Here it is important to save a local copy of curdoc() in the doc variable so that all threads have access to the same document.

doc = curdoc()
# Add the plot to the current document
doc.add_root(plot)

Step 4: Update the plot

Here is a function that takes as input a dictionary with the same keys as the data dictionary declared in step 3. It is responsible for passing the new losses and the current epoch from the training loop defined in step 5 on to the plot.

@gen.coroutine
def update(new_data):
    source.stream(new_data)
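
To make it clearer what streaming does to the data, here is a plain-Python stand-in that mimics the append behaviour on ordinary lists. It is not Bokeh's implementation (the real stream() also pushes the change to the browser); stream_append is a hypothetical helper of mine, and its rollover argument mirrors the optional parameter of the same name on ColumnDataSource.stream().

```python
def stream_append(columns, new_data, rollover=None):
    """Append the new values to every column, like a simplified stream()."""
    # Every existing column must receive new values, so all columns
    # keep the same length.
    if set(new_data) != set(columns):
        raise ValueError("new_data must contain exactly the existing columns")
    for name, values in new_data.items():
        column = columns[name]
        column.extend(values)
        # With a rollover, keep only the most recent `rollover` entries.
        if rollover is not None and len(column) > rollover:
            del column[:len(column) - rollover]
    return columns


columns = {"epochs": [], "trainlosses": [], "vallosses": []}
stream_append(columns, {"epochs": [1], "trainlosses": [0.9], "vallosses": [1.1]})
stream_append(columns, {"epochs": [2], "trainlosses": [0.7], "vallosses": [1.0]})
print(columns)
```

Each call adds one point per line, which is why the values in new_data are wrapped in single-element lists.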

Step 5: Process data and write your training loop as usual

Here I assume that you know how to train a neural net using PyTorch, so I'll focus only on the parts of the code that matter for the plot.

def train(n_epochs):
    model = Net()

    for epoch in range(1, n_epochs+1):
        # Keep track of training and validation loss
        train_loss = 0.0
        valid_loss = 0.0

        model.train()  # back to training mode at the start of every epoch
        for data in train_loader:

            # compute your training loss as usual
            train_loss += loss.item()*images.size(0)

        model.eval()
        for data in valid_loader:

            # compute your validation loss as usual
            valid_loss += loss.item()*images.size(0)

        # calculate average losses as usual: the sums are weighted by batch
        # size, so divide by the number of samples, not the number of batches
        train_loss = train_loss/len(train_loader.sampler)
        valid_loss = valid_loss/len(valid_loader.sampler)
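
A quick note on the averaging step: the running sum multiplies each batch's mean loss by the batch size, so the per-sample average comes from dividing by the number of samples. Keep in mind that len() of a PyTorch DataLoader is the number of batches. The sketch below checks the arithmetic with made-up numbers; epoch_average is just an illustrative helper.

```python
def epoch_average(batch_mean_losses, batch_sizes):
    """Per-sample average loss from per-batch mean losses."""
    total = sum(loss * size for loss, size in zip(batch_mean_losses, batch_sizes))
    return total / sum(batch_sizes)


# Made-up values: three batches, the last one smaller than the others.
batch_mean_losses = [0.5, 0.25, 1.0]
batch_sizes = [4, 4, 2]

per_sample = epoch_average(batch_mean_losses, batch_sizes)

# Dividing the same weighted sum by the number of batches gives a value
# inflated by roughly the batch size.
weighted_sum = sum(l * n for l, n in zip(batch_mean_losses, batch_sizes))
per_batch_count = weighted_sum / len(batch_sizes)

print(per_sample, per_batch_count)
```

The shape of the curve is the same either way, but only the first value is comparable across different batch sizes.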

Up to here, nothing has changed compared to the usual training loop. The only thing to add, after the last line and still inside the epoch loop, is the following lines.
We construct the new data dictionary and then update the plot using the update function defined in step 4.

        new_data = {'epochs': [epoch],
                    'trainlosses': [train_loss],
                    'vallosses': [valid_loss]}
        doc.add_next_tick_callback(partial(update, new_data))
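
A word on partial here: the callback runs later, so it matters when the argument gets bound. partial stores the dictionary at the moment the callback is created, while a lambda would look up the variable only when it is finally called. The comparison below is a small sketch in plain Python, with a list standing in for the plot; compare_binding is an illustrative name of mine.

```python
from functools import partial


def compare_binding(n_epochs=3):
    """Return the epochs seen by callbacks built with partial and with lambda."""
    received = []

    def update(new_data):
        received.append(new_data["epochs"][0])

    partial_callbacks = []
    lambda_callbacks = []
    for epoch in range(1, n_epochs + 1):
        new_data = {"epochs": [epoch]}
        # partial binds the current dictionary right now.
        partial_callbacks.append(partial(update, new_data))
        # The lambda looks up `new_data` only when it is called.
        lambda_callbacks.append(lambda: update(new_data))

    # Run all callbacks after the loop, as a delayed scheduler might.
    for callback in partial_callbacks:
        callback()
    with_partial = list(received)

    received.clear()
    for callback in lambda_callbacks:
        callback()
    with_lambda = list(received)

    return with_partial, with_lambda


print(compare_binding())
```

With partial, every callback keeps the data of its own epoch, even if the training loop has already moved on by the time the callback runs.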

So the train() function should look like this:

def train(n_epochs):
    model = Net()

    for epoch in range(1, n_epochs+1):
        # Keep track of training and validation loss
        train_loss = 0.0
        valid_loss = 0.0

        model.train()  # back to training mode at the start of every epoch
        for data in train_loader:

            # compute your training loss as usual
            train_loss += loss.item()*images.size(0)

        model.eval()
        for data in valid_loader:

            # compute your validation loss as usual
            valid_loss += loss.item()*images.size(0)

        # calculate average losses as usual (per sample, not per batch)
        train_loss = train_loss/len(train_loader.sampler)
        valid_loss = valid_loss/len(valid_loader.sampler)

        # send the new point to the plot
        new_data = {'epochs': [epoch],
                    'trainlosses': [train_loss],
                    'vallosses': [valid_loss]}
        doc.add_next_tick_callback(partial(update, new_data))

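Finally, we finish the script by adding these two lines, which run the training in a separate thread:

# train() needs n_epochs; 20 is only an example value
thread = Thread(target=train, args=(20,))
thread.start()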
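
Why a thread plus a scheduled callback? The training loop would otherwise block the server, and as far as I understand the Bokeh documentation, add_next_tick_callback is the safe way to touch a document from another thread. The sketch below imitates that pattern with the standard library only: a queue.Queue stands in for the server's list of pending callbacks, and the fake loss values are made up. run_demo and everything inside it are illustrative names, not Bokeh APIs.

```python
import queue
from functools import partial
from threading import Thread


def run_demo(n_epochs=3):
    """Worker thread schedules updates; the main thread applies them."""
    pending = queue.Queue()  # stand-in for the server's next-tick callbacks
    plotted = {"epochs": [], "trainlosses": []}

    def update(new_data):
        # Only ever called from the main thread, like a Bokeh callback.
        for name, values in new_data.items():
            plotted[name].extend(values)

    def train(n_epochs):
        for epoch in range(1, n_epochs + 1):
            fake_loss = 1.0 / epoch  # made-up loss, no real training here
            new_data = {"epochs": [epoch], "trainlosses": [fake_loss]}
            # Schedule the update instead of touching `plotted` directly.
            pending.put(partial(update, new_data))
        pending.put(None)  # sentinel: training is finished

    # The epoch count is passed to the thread through `args`.
    worker = Thread(target=train, args=(n_epochs,))
    worker.start()

    # Main thread: run scheduled callbacks until the sentinel arrives.
    while True:
        callback = pending.get()
        if callback is None:
            break
        callback()
    worker.join()
    return plotted


print(run_demo(3))
```

The training thread never modifies the plotted data itself; it only hands over small callbacks, which is the same division of work as in the script above.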

Step 6: Display result via the Terminal

If your file is named training.py, instead of running it with the python command, launch the Bokeh server and let it execute the script by typing in the terminal:

bokeh serve --show training.py

And we can see the result in the browser.

UPDATE

A simple high-level visualization module that I called Epochsviz is now available from the repo here. With it, you can obtain the result above in three lines of code:

from Epochsviz.epochsviz import Epochsviz
eviz = Epochsviz()
# In the train function
eviz.send_data(current_epoch, current_train_loss, current_val_loss)
# After the train function
eviz.start_thread(train_function=train)

Then launch the script with

bokeh serve --show training.py

I hope you enjoyed this tutorial. I tried my best to keep the explanations simple.

Thank you!
