Spread the love

Nonlinear_Multiclass_Classification_with_PyTorch_A_Typical_Workflow

In this article, we’ll have a look at a typical workflow for a simple nonlinear multiclass classification problem. We’ll keep things simple.

There are four major steps to follow during a machine learning process:

Data preparation
Model building
Model training
Model evaluation

And now, without further ado, let’s jump right in and start at the first step.

Table of Contents

Data Preparation¶

Let’s import the stuff we need to work with:

In [1]:

import torch
from torch import nn
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

Creating Data¶

We’ll use custom code to create some nonlinear multiclass data:

In [2]:

# Let's make it 300 data points per class.
n_points_per_class = 300

# Let's make it 2 features (X labels).
n_features = 2

# Let's make it 3 classes (y labels).
n_classes = 3

# Let's add some noise.
noise = 0.2

X = []
y = []

np.random.seed(42)

for class_number in range(n_classes):
    r = np.linspace(0.0, 1, n_points_per_class)
    theta = np.linspace(class_number * 4, (class_number + 1) * 4, n_points_per_class) + np.random.randn(n_points_per_class) * noise

    x1 = (r + .2) * np.sin(theta)
    x2 = (r + .2) * np.cos(theta)

    X.append(np.c_[x1, x2])
    y.append(np.full(n_points_per_class, class_number))

X, y = np.vstack(X), np.hstack(y)

Let’s view some samples, like every 100th one:

In [3]:

X[::100], y[::100]

Out[3]:

(array([[ 0.0198359 ,  0.19901391],
        [ 0.46484235,  0.26373552],
        [ 0.33391719, -0.80217213],
        [-0.12770934, -0.15391662],
        [-0.50953752,  0.16126486],
        [ 0.47445613,  0.72792321],
        [ 0.19121938, -0.05861014],
        [ 0.10173812, -0.52467532],
        [-0.86129944, -0.1146477 ]]),
 array([0, 0, 0, 1, 1, 1, 2, 2, 2]))

Let’s check out the shapes:

In [4]:

X.shape, y.shape

Out[4]:

((900, 2), (900,))

As we can see, there are 900 records. This is what the first dimension tells us. The second dimension for X means it has 2 features (vector) whereas y has a single feature (scalar). We have 2 inputs for 1 output.

The output is one of the 3 classes (0, 1 or 2).

Data Visualization¶

Let’s first see the data in a more readable tabular form:

In [5]:

points = pd.DataFrame(X, columns=["X1", "X2"])
points['label'] = y
points.head()

Out[5]:

	X1	X2
0	0.019836	0.199014
1	-0.002903	0.203324
2	0.032173	0.204170
3	0.070981	0.197676
4	0.001426	0.213373

Let’s plot the data:

In [6]:

plt.figure(figsize=(8, 6))
plt.scatter(X[:, 0], X[:, 1], c=y, cmap='viridis', s=30, edgecolors='k', alpha=0.7)
plt.grid(True)
plt.show()

No description has been provided for this image

As you can see, this data is nonlinear, so we won’t be able to find straight lines that separate the data belonging to the three classes. This is why we’re going to use a nonlinear model.

Tensors¶

PyTorch works with tensors, so we have to turn our data into tensors:

In [7]:

X = torch.tensor(X, dtype=torch.float32)
y = torch.tensor(y, dtype=torch.long)

# Let's view some samples.
X[:5], y[:5]

Out[7]:

(tensor([[ 0.0198,  0.1990],
         [-0.0029,  0.2033],
         [ 0.0322,  0.2042],
         [ 0.0710,  0.1977],
         [ 0.0014,  0.2134]]),
 tensor([0, 0, 0, 0, 0]))

Training Set and Test Set¶

Before we build our model, we should split the data into two separate sets, a training set and a test set:

In [8]:

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

len(X_train), len(X_test), len(y_train), len(y_test)

Out[8]:

(720, 180, 720, 180)

We have 720 training samples and 180 testing samples. Now we can start building the model.

Model Building¶

We have the data in place, it’s time to build a model. Besides the model, we’ll define a loss function and optimizer.

But before that, let’s make our code device agnostic. This is not strictly necessary for such a small dataset as ours, but it’s good practice. This way, we’ll make use of GPU if it’s available, otherwise we’ll make use of CPU:

In [9]:

device = 'cuda' if torch.cuda.is_available() else 'cpu'
device

Out[9]:

'cpu'

In Google Colab, we can change the runtime to GPU in the Runtime menu under Change runtime type. We’re not going to do it here, though.

Defining the Model¶

We now want to build a model that will take our inputs and produce outputs similar to our current outputs. In other words, given the features, the model will predict the labels.

There are a couple ways of approaching this in PyTorch, but we’ll create the model as a class. If we create a model as a class, we almost always inherit from nn.Module. Then, inside the __init__ method, we create the layers of the neural network. In our case, we need both linear and nonlinear layers. We’ll use the ReLU activation function for the latter.

It’s up to us how many layers we create. It depends on how much space we want to give the model to learn. In any case, we pass two arguments to the nn.Linear method: in_features, set to the number of inputs in a particular layer, and out_features, set to the number of outputs from a particular layer. We can set these arguments to any numbers we want. We just have to follow the following rules:

In the first layer, we set in_features to the number of inputs to the model.
In the last layer, we set out_features to the number of outputs from the model.
In the hidden layers (the layers between the first and last layers), the number if in_features must be equal to the number of out_features in the preceding layer.

We also have to define a forward method, which will contain the forward pass computation of the model.

In order to inherit from nn.Module and create the layers of the neural network, we have to import nn:

In [10]:

from torch import nn

And now, let’s build the model:

In [11]:

class MulticlassClassificationModel(nn.Module):
    def __init__(self, input_features, output_features, hidden_units=8):
        super().__init__()

        # Here we create three linear layers with the ReLU activation function in-between.
        self.layer_stack = nn.Sequential(
            nn.Linear(in_features=input_features, out_features=hidden_units),
            nn.ReLU(),
            nn.Linear(in_features=hidden_units, out_features=hidden_units),
            nn.ReLU(),
            nn.Linear(in_features=hidden_units, out_features=output_features),
        )

    # Here we define the method that will compute the forward pass.
    def forward(self, x):
        return self.layer_stack(x)

Let’s create an instance of the model and send it to the target device:

In [12]:

model = MulticlassClassificationModel(input_features=n_features, output_features=n_classes).to(device)
model

Out[12]:

MulticlassClassificationModel(
  (layer_stack): Sequential(
    (0): Linear(in_features=2, out_features=8, bias=True)
    (1): ReLU()
    (2): Linear(in_features=8, out_features=8, bias=True)
    (3): ReLU()
    (4): Linear(in_features=8, out_features=3, bias=True)
  )
)

Loss Function and Optimizer¶

The loss function is used to measure how wrong your model’s predictions are compared to the truth labels (the labels in the y_test set).

The optimizer instructs your model to update its internal parameters to lower the loss.

There are a lot of loss functions in PyTorch we can choose from. For multiclass classification, a common choice is Cross Entropy Loss, which we’re going to use. To this end, we’ll use the torch.nn.CrossEntropyLoss method.

There are also a lot of optimizers. Some common ones are Adam and SGD (stochastic gradient descent). Let’s pick SGD. In order to do that, we’ll use the torch.optim.SGD method.

The SGD optimizer takes two parameters:

params – these are the model’s parameters that we want to optimize,
lr – this is the learning rate – the higher it is the faster the optimizer will update the parameters.

We must be careful with the learning rate. It should be neither too high nor too low or it will fail to work.

So, here are the loss function and the optimizer:

In [13]:

loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

Metrics¶

We’ll evaluate our model using just one metric, accuracy. It’s calculated as the total number of correct predictions divided by the total number of predictions. Let’s define a function that will calculate accuracy for us:

In [14]:

def accuracy_fn(y_true, y_pred):
    correct = torch.eq(y_true, y_pred).sum().item()
    acc = (correct / len(y_pred)) * 100
    return acc

Model Training¶

Training the model involves two loops: a training loop, where the model learns the relationships between the features and labels, and a testing loop, where the model is evaluated. Let’s see what exactly each of the loops contains.

Training Loop¶

As mentioned above, in the training loop, the model goes through the training data and learns how the features are related to the labels.

The steps inside a training loop contain:

Forward pass – the model performs the forward method on all training data,
Loss calculation – the model’s predictions are compared to the test values to see how badly the model performs,
Gradient zeroing – the optimizer’s gradients are set to zero (by default, they’re accumulated) so that they can be calculated from scratch for this step,
Backpropagation – the gradient of the loss with respect to each parameter with requires_grad set to True is calculated,
Gradient descent – the parameters are updated.

Testing Loop¶

The testing loop consists of the following steps:

Forward pass – the model performs the forward method on all testing data,
Loss calculation – the model’s predictions are compared to the test values to see how badly the model performs,
(optionally) Evaluation metrics – we can calculate metrics like accuracy, precision or recall on the test set; we’re not going to do it here.

Training the Model¶

Now, let’s implement all these steps in the training and testing loops:

In [15]:

# seed for reprodicibility of random values
torch.manual_seed(42)

# Train for 5000 epochs.
epochs = 5000

# Put data on the available device
X_train = X_train.to(device)
X_test = X_test.to(device)
y_train = y_train.to(device)
y_test = y_test.to(device)

for epoch in range(epochs):
    ### Training loop
    model.train()

    # 1. Forward pass
    y_logits = model(X_train) # raw logits
    y_pred = torch.softmax(y_logits, dim=1).argmax(dim=1) # logits -> prediction probabilities -> prediction labels

    # 2. Loss and accuracy calculation
    loss = loss_fn(y_logits, y_train)
    acc = accuracy_fn(y_true=y_train, y_pred=y_pred)

    # 3. Gradient zeroing
    optimizer.zero_grad()

    # 4. Backpropagation
    loss.backward()

    # 5. Gradient descent
    optimizer.step()

    ### Testing loop

    # Put the model in evaluation mode.
    model.eval()

    with torch.inference_mode():
        # 1. Forward pass
        test_logits = model(X_test)
        test_pred = torch.softmax(test_logits, dim=1).argmax(dim=1)

        # 2. Loss and accuracy calculation
        test_loss = loss_fn(test_logits, y_test)
        test_acc = accuracy_fn(y_true=y_test, y_pred=test_pred)

        # Print out information
        if epoch % 500 == 0:
            print(f"Epoch: {epoch} | Loss: {loss:.5f}, Accuracy: {acc:.2f}% | Test loss: {test_loss:.5f}, Test acc: {test_acc:.2f}%")

Epoch: 0 | Loss: 1.11990, Accuracy: 33.33% | Test loss: 1.10627, Test acc: 33.33%
Epoch: 500 | Loss: 0.55444, Accuracy: 73.33% | Test loss: 0.55996, Test acc: 73.89%
Epoch: 1000 | Loss: 0.14995, Accuracy: 97.78% | Test loss: 0.15024, Test acc: 98.33%
Epoch: 1500 | Loss: 0.03684, Accuracy: 99.72% | Test loss: 0.05809, Test acc: 99.44%
Epoch: 2000 | Loss: 0.01758, Accuracy: 100.00% | Test loss: 0.04567, Test acc: 99.44%
Epoch: 2500 | Loss: 0.01090, Accuracy: 100.00% | Test loss: 0.04223, Test acc: 99.44%
Epoch: 3000 | Loss: 0.00771, Accuracy: 100.00% | Test loss: 0.04099, Test acc: 99.44%
Epoch: 3500 | Loss: 0.00589, Accuracy: 100.00% | Test loss: 0.04025, Test acc: 99.44%
Epoch: 4000 | Loss: 0.00467, Accuracy: 100.00% | Test loss: 0.03964, Test acc: 99.44%
Epoch: 4500 | Loss: 0.00378, Accuracy: 100.00% | Test loss: 0.03919, Test acc: 99.44%

As you can see, the model is performing pretty well.

Model Evaluation¶

Let’s evaluate the model and see how it performs on data it never saw:

In [16]:

model.eval()

with torch.inference_mode():
    y_logits = model(X_test) # logits
    y_pred_probs = torch.softmax(y_logits, dim=1) # prediction probabilities
    y_preds = y_pred_probs.argmax(dim=1) # prediction labels

Let’s see how the test data and the predictions compare:

In [17]:

y_preds[:10], y_test[:10]

Out[17]:

(tensor([0, 2, 0, 1, 0, 2, 0, 0, 0, 0]),
 tensor([0, 2, 0, 1, 0, 2, 0, 0, 0, 0]))

It looks good. The labels are identical. But, to see it even better, let’s plot it. To do it, we’ll use a function that we are going to download from an external Github repository. This function will enable us to plot a decision boundary. It was created by Daniel Bourke and we’re going to download it from his repo, which is available at https://github.com/mrdbourke/pytorch-deep-learning/blob/main/helper_functions.py.

In [18]:

import requests
from pathlib import Path

if not Path("helper_functions.py").is_file():
  print("Downloading helper_functions.py")
  request = requests.get("https://raw.githubusercontent.com/mrdbourke/pytorch-deep-learning/main/helper_functions.py")
  with open("helper_functions.py", "wb") as f:
    f.write(request.content)

from helper_functions import plot_decision_boundary

Downloading helper_functions.py

The code snippet above downloads the helper_functions.py file. Now we can use the function we’re interested in:

In [19]:

plt.figure(figsize=(12, 6))
plt.subplot(1, 2, 1)
plt.title("Training")
plot_decision_boundary(model, X_train, y_train)
plt.subplot(1, 2, 2)
plt.title("Test")
plot_decision_boundary(model, X_test, y_test)

As we can see, the model classifies almost all of the test data correctly.

Conclusion¶

The model we created performs pretty well. It’s almost perfect.

Spread the love

Nonlinear Multiclass Classification with PyTorch – A Typical Workflow

Data Preparation¶

Creating Data¶

Data Visualization¶

Tensors¶

Training Set and Test Set¶

Model Building¶

Defining the Model¶

Loss Function and Optimizer¶

Metrics¶

Model Training¶

Training Loop¶

Testing Loop¶

Training the Model¶

Model Evaluation¶

Conclusion¶

Like this:

Leave a ReplyCancel reply

Nonlinear Multiclass Classification with PyTorch – A Typical Workflow

Data Preparation¶

Creating Data¶

Data Visualization¶

Tensors¶

Training Set and Test Set¶

Model Building¶

Defining the Model¶

Loss Function and Optimizer¶

Metrics¶

Model Training¶

Training Loop¶

Testing Loop¶

Training the Model¶

Model Evaluation¶

Conclusion¶

Share this:

Like this:

Leave a ReplyCancel reply