In this project, we’ll have a look at a typical workflow for an image classification problem with a convolutional neural network and PyTorch.
There are four major steps to follow during a machine learning process:
- Data preparation
- Model building
- Model training
- Model evaluation
And now, without further ado, let’s jump right in and start at the first step.
Data Preparation¶
Let’s import the libraries we’re going to need:
import torch
from torch import nn
import torchvision
from torchvision import datasets
from torchvision.transforms import ToTensor
from torch.utils.data import DataLoader
import matplotlib.pyplot as plt
As you can see, we’re importing torchvision
here. This is PyTorch’s computer vision library. It contains datasets, models and image transformations that we use to handle vision-related tasks.
We’ll also need torchvision.datasets
, which contains multiple datasets for image classification, object detection, and other computer vision-related problems.
Next, we’ll need torchvision.transforms
to handle image transformations, and in particular ToTensor
to transform image data into tensors.
Finally, we’ll need torch.utils.data.DataLoader
to create a Python iterable over a dataset.
The Data¶
We’re going to use the EuroSAT dataset. It’s one of many image classification datasets available in torchvision.datasets
. You can read about more built-in datasets here.
Here’s the original description of the dataset:
In this study, we address the challenge of land use and land cover classification using Sentinel-2 satellite images. The Sentinel-2 satellite images are openly and freely accessible provided in the Earth observation program Copernicus. We present a novel dataset based on Sentinel-2 satellite images covering 13 spectral bands and consisting out of 10 classes with in total 27,000 labeled and geo-referenced images.
So, let’s load the dataset:
data = torchvision.datasets.EuroSAT(
root="data",
download=True,
transform=ToTensor()
)
Here we specify the following parameters:
- root – root directory of the dataset
- download – if True, downloads the dataset from the internet (provided it’s not downloaded yet) and puts it in the root directory
- transform – a function/transform applied to each image as it’s loaded; here ToTensor() turns each image into a tensor
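If we ever needed more than a single transformation, several could be chained with torchvision.transforms.Compose. Here’s a minimal sketch (the Resize size and Normalize statistics are illustrative placeholders, not values required by EuroSAT; we’ll stick with plain ToTensor() in this project):
from torchvision import transforms

composed_transform = transforms.Compose([
    transforms.Resize((64, 64)),                 # EuroSAT images are already 64x64; shown only for illustration
    transforms.ToTensor(),                       # PIL image -> float tensor with values in [0, 1]
    transforms.Normalize(mean=[0.5, 0.5, 0.5],   # placeholder statistics, not computed from the dataset
                         std=[0.5, 0.5, 0.5]),
])

# this could then be passed as transform=composed_transform when loading the dataset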
Let’s split the data into training and test sets.
from sklearn.model_selection import train_test_split
train_data, test_data = train_test_split(data, test_size=0.2, random_state=42)
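Note that sklearn’s train_test_split effectively materializes the dataset as plain lists of (image, label) tuples. An alternative that stays within PyTorch is torch.utils.data.random_split – a minimal sketch of an equivalent 80/20 split (we’ll keep using the split above for the rest of the project):
from torch.utils.data import random_split

# fix the generator so the split is reproducible
generator = torch.Generator().manual_seed(42)
train_size = int(0.8 * len(data))
test_size = len(data) - train_size
train_subset, test_subset = random_split(data, [train_size, test_size], generator=generator)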
Viewing the Data¶
Let’s view a sample of the data.
image, label = train_data[0]
image, label
(tensor([[[0.1412, 0.1412, 0.1333, ..., 0.1922, 0.1843, 0.1882], [0.1412, 0.1412, 0.1333, ..., 0.1961, 0.1922, 0.2000], [0.1569, 0.1529, 0.1373, ..., 0.2667, 0.2314, 0.2000], ..., [0.4235, 0.4235, 0.4392, ..., 0.2392, 0.2549, 0.2824], [0.3490, 0.3412, 0.4471, ..., 0.2392, 0.2706, 0.3216], [0.3294, 0.3216, 0.4078, ..., 0.2392, 0.2902, 0.3529]], [[0.2510, 0.2510, 0.2431, ..., 0.2902, 0.2941, 0.2980], [0.2510, 0.2510, 0.2431, ..., 0.2863, 0.2902, 0.2980], [0.2745, 0.2706, 0.2549, ..., 0.3333, 0.3020, 0.2784], ..., [0.4745, 0.4745, 0.4902, ..., 0.3608, 0.3569, 0.3686], [0.3922, 0.3961, 0.5020, ..., 0.3529, 0.3490, 0.3843], [0.3725, 0.3765, 0.4627, ..., 0.3529, 0.3529, 0.4000]], [[0.2941, 0.2941, 0.2980, ..., 0.3176, 0.3098, 0.3137], [0.2941, 0.2941, 0.2902, ..., 0.3176, 0.3098, 0.3176], [0.3059, 0.3098, 0.2980, ..., 0.3725, 0.3412, 0.3137], ..., [0.4980, 0.4980, 0.5137, ..., 0.3412, 0.3451, 0.3608], [0.4078, 0.4078, 0.5059, ..., 0.3373, 0.3412, 0.3804], [0.3804, 0.3804, 0.4627, ..., 0.3373, 0.3490, 0.4000]]]), 5)
Input and Output¶
The input is the tensor the image was converted into by ToTensor. There are 3 color channels (red, green, blue), and the height and width are both 64:
image.shape
torch.Size([3, 64, 64])
The output is just a single number representing the class. From the dataset description we know there are 10 classes. Let’s enumerate them:
classes = data.classes
classes
['AnnualCrop', 'Forest', 'HerbaceousVegetation', 'Highway', 'Industrial', 'Pasture', 'PermanentCrop', 'Residential', 'River', 'SeaLake']
Data Visualization¶
Let’s visualize the first sample in the training set:
image, label = train_data[0]
plt.imshow(image.permute(1, 2, 0))
plt.title(classes[label])
plt.axis(False)
(np.float64(-0.5), np.float64(63.5), np.float64(63.5), np.float64(-0.5))
We use the image.permute
method to change the order of the dimensions. The original order is color_channel (0) – height (1) – width (2), but plt.imshow expects the color channel last, i.e. height (1) – width (2) – color_channel (0), hence permute(1, 2, 0).
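As a quick sanity check of what permute does to the tensor shape:
print(image.shape)                   # torch.Size([3, 64, 64])
print(image.permute(1, 2, 0).shape)  # torch.Size([64, 64, 3]), the layout plt.imshow expects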
Now, let’s visualize more samples:
torch.manual_seed(42)
fig = plt.figure(figsize=(14, 9))
rows, cols = 4, 5
for i in range(1, rows * cols + 1):
random_idx = torch.randint(0, len(train_data), size=[1]).item()
img, label = train_data[random_idx]
fig.add_subplot(rows, cols, i)
plt.imshow(img.permute(1, 2, 0))
plt.title(classes[label])
plt.axis(False)
The DataLoader¶
Let’s check out the number of samples in both sets:
len(train_data), len(test_data)
(21600, 5400)
It’s a lot of data. It’ll be more efficient to break the large dataset up into smaller chunks called batches and work on one batch at a time. To do that, we need a DataLoader
. This way, the model gets an opportunity to update its parameters every batch rather than once per epoch, so much more frequently. A common batch size is 32. We’ll also shuffle the training samples so that any particular ordering of the data doesn’t influence learning – we don’t want the model to learn the order. There’s no need to shuffle the test set, though, since the model doesn’t learn from it:
BATCH_SIZE = 32
train_dataloader = DataLoader(train_data, batch_size=BATCH_SIZE, shuffle=True)
test_dataloader = DataLoader(test_data, batch_size=BATCH_SIZE, shuffle=False)
print(f'Train dataloader: {len(train_dataloader)} batches of {BATCH_SIZE}')
print(f'Test dataloader: {len(test_dataloader)} batches of {BATCH_SIZE}')
Train dataloader: 675 batches of 32 Test dataloader: 169 batches of 32
Let’s check out the shape of the data in the training dataloader:
train_features_batch, train_labels_batch = next(iter(train_dataloader))
train_features_batch.shape, train_labels_batch.shape
(torch.Size([32, 3, 64, 64]), torch.Size([32]))
So, we have one more dimension: the batch size.
Device-Agnostic Code¶
Let’s set the device to GPU (cuda) if it’s available or CPU if it’s not:
device = 'cuda' if torch.cuda.is_available() else 'cpu'
device
'cuda'
Baseline Model¶
It’s always good practice to start with a baseline model that other models will be compared to. This way, we’ll be able to see how our next models perform – better or worse than the baseline.
So, the baseline model will contain just three layers: an nn.Flatten
layer and two nn.Linear
layers, so no nonlinearity:
class EuroSATModelV0(nn.Module):
def __init__(self, input_shape: int, hidden_units: int, output_shape: int):
super().__init__()
self.layer_stack = nn.Sequential(
nn.Flatten(),
nn.Linear(in_features=input_shape, out_features=hidden_units),
nn.Linear(in_features=hidden_units, out_features=output_shape)
)
def forward(self, x):
return self.layer_stack(x)
The nn.Flatten
layer collapses the color channel, height and width dimensions into a single feature dimension (3 * 64 * 64 = 12288 values per image), which is what the first nn.Linear layer expects. Let’s create an instance of the model:
model0 = EuroSATModelV0(
input_shape=3 * 64 * 64,
hidden_units=10,
output_shape=len(classes)
)
model0.to(device)
EuroSATModelV0( (layer_stack): Sequential( (0): Flatten(start_dim=1, end_dim=-1) (1): Linear(in_features=12288, out_features=10, bias=True) (2): Linear(in_features=10, out_features=10, bias=True) ) )
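To make the role of nn.Flatten concrete, here’s a quick sketch using a dummy batch with the same shape as our images:
dummy_batch = torch.rand(32, 3, 64, 64)   # a fake batch: 32 images, 3 channels, 64x64 pixels
flattened = nn.Flatten()(dummy_batch)
print(dummy_batch.shape)  # torch.Size([32, 3, 64, 64])
print(flattened.shape)    # torch.Size([32, 12288]), i.e. 3 * 64 * 64 = 12288 features per image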
Loss Function, Optimizer, Accuracy¶
Next, we need a loss function, an optimizer and some evaluation metrics. We’ll use CrossEntropyLoss
as the loss function, SGD
as the optimizer and torchmetrics.Accuracy
as the only metric. First, we must make sure torchmetrics
is installed:
!pip install torchmetrics
Requirement already satisfied: torchmetrics in /usr/local/lib/python3.11/dist-packages (1.7.1)
Now we can instantiate the three:
import torchmetrics
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(params=model0.parameters(), lr=0.02)
accuracy_fn = torchmetrics.Accuracy(task='multiclass', num_classes=len(classes)).to(device)
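A quick aside on how the accuracy metric is used: with task='multiclass' it accepts raw logits of shape (batch, num_classes) together with integer class labels and applies argmax internally. A tiny sketch with made-up values (the metric is stateful, so we reset it afterwards):
# two fake samples with 10 made-up logits each; the second prediction (argmax = 2) is wrong on purpose
dummy_logits = torch.tensor([[2.0, 0.1, -1.0, 0.0, 0.3, 0.0, 0.0, 0.0, 0.0, 0.0],
                             [0.0, 0.2,  3.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]]).to(device)
dummy_targets = torch.tensor([0, 1]).to(device)
print(accuracy_fn(dummy_logits, dummy_targets))  # one of two correct -> 0.5
accuracy_fn.reset()  # clear the accumulated state before real training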
Functionizing the Training and Test Loops¶
As we’re going to train and evaluate a couple of models, let’s make things a little easier for ourselves and turn the two loops, the training loop and the test loop, into reusable functions:
def train_step(model: nn.Module,
data_loader: DataLoader,
loss_fn: nn.Module,
optimizer: torch.optim.Optimizer,
accuracy_fn: torchmetrics.Accuracy,
device: torch.device = device):
'''
Trains the model on a data_loader.
Args:
model (nn.Module): A PyTorch model to train.
data_loader (DataLoader): A PyTorch DataLoader instance.
loss_fn (nn.Module): The loss function of the model.
optimizer (torch.optim.Optimizer): The optimizer to use to update the model's parameters.
accuracy_fn (torchmetrics.Accuracy): The accuracy function of the model.
device (torch.device, optional): The device to train on. Defaults to device.
'''
train_loss, train_acc = 0, 0
model.to(device)
model.train()
for batch, (X, y) in enumerate(data_loader):
# send data to device
X, y = X.to(device), y.to(device)
# forward pass
y_pred = model(X)
# loss
loss = loss_fn(y_pred, y)
train_loss += loss
# accuracy
acc = accuracy_fn(y_pred, y)
train_acc += acc
# zero grad
optimizer.zero_grad()
# backpropagation
loss.backward()
# optimizer step
optimizer.step()
# loss and accuracy per epoch
train_loss /= len(data_loader)
train_acc /= len(data_loader)
# info
print(f'Train loss: {train_loss:.5f} | Train acc: {train_acc:.5f}')
def test_step(model: nn.Module,
data_loader: DataLoader,
loss_fn: nn.Module,
accuracy_fn: torchmetrics.Accuracy,
device: torch.device = device):
'''
Evaluates the model on a data_loader.
Args:
model (nn.Module): A PyTorch model.
data_loader (DataLoader): A PyTorch DataLoader instance.
loss_fn (nn.Module): The loss function of the model.
accuracy_fn (torchmetrics.Accuracy): The accuracy function of the model.
device (torch.device, optional): The device.
'''
test_loss, test_acc = 0, 0
model.to(device)
model.eval()
with torch.inference_mode():
for X, y in data_loader:
# send data to device
X, y = X.to(device), y.to(device)
# forward pass
test_pred = model(X)
# loss
test_loss += loss_fn(test_pred, y)
# accuracy
test_acc += accuracy_fn(test_pred, y)
# loss and accuracy
test_loss /= len(data_loader)
test_acc /= len(data_loader)
# info
print(f'Test loss: {test_loss:.5f} | Test acc: {test_acc:.5f}')
Let’s now use the functions to train and evaluate the baseline model:
torch.manual_seed(42)
epochs = 20
for epoch in range(epochs):
print(f'Epoch: {epoch}\n---------')
train_step(model=model0, data_loader=train_dataloader, loss_fn=loss_fn, optimizer=optimizer, accuracy_fn=accuracy_fn)
test_step(model=model0, data_loader=test_dataloader, loss_fn=loss_fn, accuracy_fn=accuracy_fn)
Epoch: 0 --------- Train loss: 1.90438 | Train acc: 0.27403 Test loss: 1.73945 | Test acc: 0.37075 Epoch: 1 --------- Train loss: 1.73678 | Train acc: 0.33690 Test loss: 1.71340 | Test acc: 0.31755 Epoch: 2 --------- Train loss: 1.70461 | Train acc: 0.35361 Test loss: 1.79175 | Test acc: 0.30356 Epoch: 3 --------- Train loss: 1.68641 | Train acc: 0.36264 Test loss: 1.68053 | Test acc: 0.37512 Epoch: 4 --------- Train loss: 1.67662 | Train acc: 0.36759 Test loss: 1.64029 | Test acc: 0.38505 Epoch: 5 --------- Train loss: 1.66598 | Train acc: 0.37278 Test loss: 1.63045 | Test acc: 0.38911 Epoch: 6 --------- Train loss: 1.65819 | Train acc: 0.38083 Test loss: 1.77616 | Test acc: 0.30560 Epoch: 7 --------- Train loss: 1.64496 | Train acc: 0.38731 Test loss: 1.74361 | Test acc: 0.33876 Epoch: 8 --------- Train loss: 1.64172 | Train acc: 0.38532 Test loss: 1.63300 | Test acc: 0.37272 Epoch: 9 --------- Train loss: 1.63537 | Train acc: 0.38954 Test loss: 1.70837 | Test acc: 0.35441 Epoch: 10 --------- Train loss: 1.62831 | Train acc: 0.39056 Test loss: 1.71232 | Test acc: 0.36662 Epoch: 11 --------- Train loss: 1.62696 | Train acc: 0.39255 Test loss: 1.72715 | Test acc: 0.38098 Epoch: 12 --------- Train loss: 1.62105 | Train acc: 0.39704 Test loss: 1.71678 | Test acc: 0.38727 Epoch: 13 --------- Train loss: 1.61350 | Train acc: 0.40009 Test loss: 1.64302 | Test acc: 0.35639 Epoch: 14 --------- Train loss: 1.60781 | Train acc: 0.39856 Test loss: 1.61220 | Test acc: 0.36236 Epoch: 15 --------- Train loss: 1.60607 | Train acc: 0.40255 Test loss: 1.63067 | Test acc: 0.38609 Epoch: 16 --------- Train loss: 1.60241 | Train acc: 0.40333 Test loss: 1.69421 | Test acc: 0.38036 Epoch: 17 --------- Train loss: 1.59711 | Train acc: 0.40356 Test loss: 1.64635 | Test acc: 0.42480 Epoch: 18 --------- Train loss: 1.59209 | Train acc: 0.40657 Test loss: 1.60754 | Test acc: 0.43916 Epoch: 19 --------- Train loss: 1.59512 | Train acc: 0.40370 Test loss: 1.64686 | Test acc: 0.34159
As we can see, the loss is decreasing and the accuracy is increasing, which is good. Let’s create a function that computes these two values and returns them, so that we can use it later with the other models too:
torch.manual_seed(42)
def evaluate_model(model: nn.Module,
data_loader: DataLoader,
loss_fn: nn.Module,
accuracy_fn: torchmetrics.Accuracy,
device: torch.device):
'''
Evaluates the model on a data_loader.
Args:
model (nn.Module): A PyTorch model trained to make predictions on the data_loader.
data_loader (DataLoader): A PyTorch DataLoader instance.
loss_fn (nn.Module): The loss function of the model.
accuracy_fn (torchmetrics.Accuracy): The accuracy function of the model.
device (torch.device): The device to evaluate on.
Returns:
A dictionary with the model's name, loss and accuracy on data_loader.
'''
loss, acc = 0, 0
model.eval()
with torch.inference_mode():
for X, y in data_loader:
# send data to device
X, y = X.to(device), y.to(device)
# forward pass
y_pred = model(X)
# loss
loss += loss_fn(y_pred, y)
# accuracy
acc += accuracy_fn(y_pred, y)
# loss and accuracy in batch
loss /= len(data_loader)
acc /= len(data_loader)
return {'model': model.__class__.__name__,
'loss': loss.item(),
'accuracy': acc.item()}
And now let’s use the function to evaluate the baseline model:
model0_evaluation = evaluate_model(model=model0, data_loader=test_dataloader, loss_fn=loss_fn, accuracy_fn=accuracy_fn, device=device)
model0_evaluation
{'model': 'EuroSATModelV0', 'loss': 1.6468614339828491, 'accuracy': 0.3415926992893219}
Model with Nonlinearities¶
Let’s create another model, very much like the baseline model, but this time with some nonlinear activation functions between the layers:
class EuroSATModelV1(nn.Module):
def __init__(self, input_shape: int, hidden_units: int, output_shape: int):
super().__init__()
self.layer_stack = nn.Sequential(
nn.Flatten(),
nn.Linear(in_features=input_shape, out_features=hidden_units),
nn.ReLU(),
nn.Linear(in_features=hidden_units, out_features=output_shape),
nn.ReLU()
)
def forward(self, x):
return self.layer_stack(x)
Here’s an instance of the model:
torch.manual_seed(42)
model1 = EuroSATModelV1(
input_shape=3 * 64 * 64,
hidden_units=10,
output_shape=len(classes)
).to(device)
Let’s define the loss function, optimizer and accuracy:
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(params=model1.parameters(), lr=0.02)
accuracy_fn = torchmetrics.Accuracy(task='multiclass', num_classes=len(classes)).to(device)
Now we can train and evaluate the model:
epochs = 20
for epoch in range(epochs):
print(f'Epoch: {epoch}\n---------')
train_step(model=model1, data_loader=train_dataloader, loss_fn=loss_fn, optimizer=optimizer, accuracy_fn=accuracy_fn)
test_step(model=model1, data_loader=test_dataloader, loss_fn=loss_fn, accuracy_fn=accuracy_fn)
Epoch: 0 --------- Train loss: 2.22829 | Train acc: 0.18944 Test loss: 2.19198 | Test acc: 0.19903 Epoch: 1 --------- Train loss: 2.17576 | Train acc: 0.20185 Test loss: 2.17099 | Test acc: 0.20587 Epoch: 2 --------- Train loss: 2.15642 | Train acc: 0.21685 Test loss: 2.16047 | Test acc: 0.22608 Epoch: 3 --------- Train loss: 2.14495 | Train acc: 0.22667 Test loss: 2.15271 | Test acc: 0.22584 Epoch: 4 --------- Train loss: 2.13550 | Train acc: 0.23389 Test loss: 2.13791 | Test acc: 0.22898 Epoch: 5 --------- Train loss: 2.12349 | Train acc: 0.23991 Test loss: 2.12725 | Test acc: 0.23126 Epoch: 6 --------- Train loss: 2.11282 | Train acc: 0.23917 Test loss: 2.12444 | Test acc: 0.24932 Epoch: 7 --------- Train loss: 2.10570 | Train acc: 0.24449 Test loss: 2.12664 | Test acc: 0.22947 Epoch: 8 --------- Train loss: 2.10138 | Train acc: 0.24444 Test loss: 2.10546 | Test acc: 0.24069 Epoch: 9 --------- Train loss: 2.09728 | Train acc: 0.24704 Test loss: 2.10856 | Test acc: 0.26769 Epoch: 10 --------- Train loss: 2.09383 | Train acc: 0.24875 Test loss: 2.09566 | Test acc: 0.24760 Epoch: 11 --------- Train loss: 2.09142 | Train acc: 0.24884 Test loss: 2.11417 | Test acc: 0.24581 Epoch: 12 --------- Train loss: 2.08912 | Train acc: 0.24940 Test loss: 2.09818 | Test acc: 0.24168 Epoch: 13 --------- Train loss: 2.08804 | Train acc: 0.25069 Test loss: 2.09497 | Test acc: 0.24519 Epoch: 14 --------- Train loss: 2.08647 | Train acc: 0.25181 Test loss: 2.09831 | Test acc: 0.24199 Epoch: 15 --------- Train loss: 2.08673 | Train acc: 0.25181 Test loss: 2.09825 | Test acc: 0.25037 Epoch: 16 --------- Train loss: 2.08387 | Train acc: 0.25194 Test loss: 2.09910 | Test acc: 0.26492 Epoch: 17 --------- Train loss: 2.08352 | Train acc: 0.25236 Test loss: 2.08875 | Test acc: 0.25234 Epoch: 18 --------- Train loss: 2.08143 | Train acc: 0.25384 Test loss: 2.09729 | Test acc: 0.26418 Epoch: 19 --------- Train loss: 2.08204 | Train acc: 0.25412 Test loss: 2.08610 | Test acc: 0.25481
torch.manual_seed(42)
model1_evaluation = evaluate_model(model=model1, data_loader=test_dataloader, loss_fn=loss_fn, accuracy_fn=accuracy_fn, device=device)
model1_evaluation
{'model': 'EuroSATModelV1', 'loss': 2.086103677749634, 'accuracy': 0.2548076808452606}
Let’s compare the two models:
import pandas as pd
models_comparison = pd.DataFrame([model0_evaluation, model1_evaluation]).set_index('model')
models_comparison
loss | accuracy | |
---|---|---|
model | ||
EuroSATModelV0 | 1.646861 | 0.341593 |
EuroSATModelV1 | 2.086104 | 0.254808 |
Looks like adding the nonlinear activation functions didn’t help at all – quite the contrary. One likely culprit is the final nn.ReLU placed right after the output layer: it clamps all negative logits to zero, throwing away information just before the cross-entropy loss is computed. What about a type of model that is best suited for working with images? Our third model is going to be a convolutional neural network.
Convolutional Neural Network (CNN)¶
Convolutional neural networks are good at finding patterns in visual data. Between the input and the output they typically stack blocks of convolutional layers, activation layers and pooling layers (possibly repeated, and possibly mixed with other types of layers). Let’s create another model, this time a CNN, with two such blocks and a classifier:
class EuroSATModelV2(nn.Module):
def __init__(self, input_shape: int, hidden_units: int, output_shape: int):
super().__init__()
self.conv_block_1 = nn.Sequential(
nn.Conv2d(in_channels=input_shape, out_channels=hidden_units, kernel_size=3, stride=1, padding=1),
nn.ReLU(),
nn.Conv2d(in_channels=hidden_units, out_channels=hidden_units, kernel_size=3, stride=1, padding=1),
nn.ReLU(),
nn.MaxPool2d(kernel_size=2)
)
self.conv_block_2 = nn.Sequential(
nn.Conv2d(in_channels=hidden_units, out_channels=hidden_units, kernel_size=3, stride=1, padding=1),
nn.ReLU(),
nn.Conv2d(in_channels=hidden_units, out_channels=hidden_units, kernel_size=3, stride=1, padding=1),
nn.ReLU(),
nn.MaxPool2d(kernel_size=2)
)
self.classifier = nn.Sequential(
nn.Flatten(),
nn.Linear(in_features=hidden_units * 16 * 16, out_features=output_shape)
)
def forward(self, x: torch.Tensor):
x = self.conv_block_1(x)
x = self.conv_block_2(x)
x = self.classifier(x)
return x
As you can see, we pass the data through each of the blocks in turn. Here we have two types of layers characteristic of CNNs, the Conv2d and the MaxPool2d layers. Let’s have a look at them one by one.
Let’s start with the nn.Conv2d
layer. It takes the following parameters:
- in_channels – number of channels in the input image,
- out_channels – number of channels produced by the convolution,
- kernel_size – size of the kernel (filter) that hovers over the pixels of the image,
- stride – the step of the kernel at each move,
- padding – padding added to the four sides of the input.
The first convolutional layer is followed by a ReLU activation layer, another convolutional layer, another ReLU layer and an nn.MaxPool2d
layer. The pooling layer takes a kernel_size
parameter: it slides a window of that size over its input and keeps only the maximum value in each window, so with kernel_size=2 the height and width are halved.
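To see where the classifier’s in_features=hidden_units * 16 * 16 comes from, we can trace the shape of a single image through the blocks: the convolutions with padding=1 keep the 64x64 spatial size, and each MaxPool2d(kernel_size=2) halves it, so 64 -> 32 after the first block and 32 -> 16 after the second. A quick sketch, assuming hidden_units=32 as we’ll use below:
dummy_image = torch.rand(1, 3, 64, 64)  # one image: 3 channels, 64x64 pixels

one_block = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, stride=1, padding=1),  # padding=1 keeps the 64x64 size
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2)                             # 64x64 -> 32x32
)
print(one_block(dummy_image).shape)  # torch.Size([1, 32, 32, 32])

# after the second block the spatial size is 16x16,
# so nn.Flatten produces 32 * 16 * 16 = 8192 features per image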
Let’s instantiate the model:
torch.manual_seed(42)
model2 = EuroSATModelV2(
input_shape=3,
hidden_units=32,
output_shape=len(classes)
).to(device)
model2
EuroSATModelV2( (conv_block_1): Sequential( (0): Conv2d(3, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (1): ReLU() (2): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (3): ReLU() (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False) ) (conv_block_2): Sequential( (0): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (1): ReLU() (2): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (3): ReLU() (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False) ) (classifier): Sequential( (0): Flatten(start_dim=1, end_dim=-1) (1): Linear(in_features=8192, out_features=10, bias=True) ) )
Let’s now define the loss function, optimizer and accuracy, just as before:
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(params=model2.parameters(), lr=0.02)
accuracy_fn = torchmetrics.Accuracy(task='multiclass', num_classes=len(classes)).to(device)
Finally, let’s train and evaluate the model.
epochs = 20
for epoch in range(epochs):
print(f'Epoch: {epoch}\n---------')
# train
train_step(model=model2, data_loader=train_dataloader, loss_fn=loss_fn, optimizer=optimizer, accuracy_fn=accuracy_fn, device=device)
test_step(model=model2, data_loader=test_dataloader, loss_fn=loss_fn, accuracy_fn=accuracy_fn, device=device)
Epoch: 0 --------- Train loss: 1.92549 | Train acc: 0.26407 Test loss: 1.67585 | Test acc: 0.34005 Epoch: 1 --------- Train loss: 1.55153 | Train acc: 0.43287 Test loss: 1.33739 | Test acc: 0.50986 Epoch: 2 --------- Train loss: 1.18155 | Train acc: 0.58403 Test loss: 1.00698 | Test acc: 0.63283 Epoch: 3 --------- Train loss: 0.91457 | Train acc: 0.67819 Test loss: 0.98556 | Test acc: 0.64904 Epoch: 4 --------- Train loss: 0.76308 | Train acc: 0.73227 Test loss: 0.96170 | Test acc: 0.66389 Epoch: 5 --------- Train loss: 0.64701 | Train acc: 0.77296 Test loss: 0.70806 | Test acc: 0.75462 Epoch: 6 --------- Train loss: 0.57117 | Train acc: 0.80028 Test loss: 0.69177 | Test acc: 0.76140 Epoch: 7 --------- Train loss: 0.48993 | Train acc: 0.82963 Test loss: 0.68994 | Test acc: 0.76362 Epoch: 8 --------- Train loss: 0.44711 | Train acc: 0.84259 Test loss: 0.63598 | Test acc: 0.79093 Epoch: 9 --------- Train loss: 0.39383 | Train acc: 0.86176 Test loss: 0.62630 | Test acc: 0.79167 Epoch: 10 --------- Train loss: 0.35554 | Train acc: 0.87444 Test loss: 0.66397 | Test acc: 0.79050 Epoch: 11 --------- Train loss: 0.30153 | Train acc: 0.89472 Test loss: 0.71465 | Test acc: 0.78482 Epoch: 12 --------- Train loss: 0.27912 | Train acc: 0.90255 Test loss: 0.75525 | Test acc: 0.78310 Epoch: 13 --------- Train loss: 0.47336 | Train acc: 0.84134 Test loss: 0.71628 | Test acc: 0.76103 Epoch: 14 --------- Train loss: 0.30538 | Train acc: 0.89204 Test loss: 0.79924 | Test acc: 0.75832 Epoch: 15 --------- Train loss: 0.23845 | Train acc: 0.92190 Test loss: 0.93719 | Test acc: 0.75382 Epoch: 16 --------- Train loss: 0.20709 | Train acc: 0.92903 Test loss: 0.86071 | Test acc: 0.78741 Epoch: 17 --------- Train loss: 0.21379 | Train acc: 0.92741 Test loss: 0.95157 | Test acc: 0.76356 Epoch: 18 --------- Train loss: 0.19080 | Train acc: 0.93745 Test loss: 0.84801 | Test acc: 0.78575 Epoch: 19 --------- Train loss: 0.14242 | Train acc: 0.95324 Test loss: 0.90357 | Test acc: 0.79820
model2_evaluation = evaluate_model(model=model2, data_loader=test_dataloader, loss_fn=loss_fn, accuracy_fn=accuracy_fn, device=device)
model2_evaluation
{'model': 'EuroSATModelV2', 'loss': 0.9035666584968567, 'accuracy': 0.7982001900672913}
Model Comparison¶
Let’s compare all three models:
models_comparison = pd.DataFrame([model0_evaluation, model1_evaluation, model2_evaluation]).set_index('model')
models_comparison
loss | accuracy | |
---|---|---|
model | ||
EuroSATModelV0 | 1.646861 | 0.341593 |
EuroSATModelV1 | 2.086104 | 0.254808 |
EuroSATModelV2 | 0.903567 | 0.798200 |
Let’s also compare the models visually:
models_comparison[['loss', 'accuracy']].plot.bar(title='Loss & Accuracy')
plt.xlabel('Model')
plt.xticks(rotation=45)
(array([0, 1, 2]), [Text(0, 0, 'EuroSATModelV0'), Text(1, 0, 'EuroSATModelV1'), Text(2, 0, 'EuroSATModelV2')])
Looks like the CNN performs best. But it’s not ideal. Let’s try to improve it.
Optimizing the CNN Model¶
The first thing we could do is add some nn.Dropout
layers to the two blocks. During training, a dropout layer randomly zeroes a fraction of its inputs, which acts as a regularizer and helps prevent overfitting; during evaluation it is inactive. Let’s set the fraction to 30%:
class EuroSATModelV2a(nn.Module):
def __init__(self, input_shape: int, hidden_units: int, output_shape: int):
super().__init__()
self.conv_block_1 = nn.Sequential(
nn.Conv2d(in_channels=input_shape, out_channels=hidden_units, kernel_size=3, stride=1, padding=1),
nn.ReLU(),
nn.Conv2d(in_channels=hidden_units, out_channels=hidden_units, kernel_size=3, stride=1, padding=1),
nn.ReLU(),
nn.MaxPool2d(kernel_size=2),
nn.Dropout(p=0.3)
)
self.conv_block_2 = nn.Sequential(
nn.Conv2d(in_channels=hidden_units, out_channels=hidden_units, kernel_size=3, stride=1, padding=1),
nn.ReLU(),
nn.Conv2d(in_channels=hidden_units, out_channels=hidden_units, kernel_size=3, stride=1, padding=1),
nn.ReLU(),
nn.MaxPool2d(kernel_size=2),
nn.Dropout(p=0.3)
)
self.classifier = nn.Sequential(
nn.Flatten(),
nn.Linear(in_features=hidden_units * 16 * 16, out_features=output_shape)
)
def forward(self, x: torch.Tensor):
x = self.conv_block_1(x)
x = self.conv_block_2(x)
x = self.classifier(x)
return x
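Before instantiating the model, here’s a quick aside on how nn.Dropout behaves: in training mode it randomly zeroes each element with probability p and rescales the survivors by 1/(1-p); in evaluation mode it passes the input through unchanged. A minimal sketch:
dropout = nn.Dropout(p=0.3)
x = torch.ones(8)

dropout.train()
print(dropout(x))  # roughly 30% of the entries zeroed, the rest scaled to 1 / (1 - 0.3), i.e. about 1.43

dropout.eval()
print(dropout(x))  # all ones: dropout is inactive during evaluation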
Let’s instantiate the model:
torch.manual_seed(42)
model2a = EuroSATModelV2a(
input_shape=3,
hidden_units=32,
output_shape=len(classes)
).to(device)
model2a
EuroSATModelV2a( (conv_block_1): Sequential( (0): Conv2d(3, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (1): ReLU() (2): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (3): ReLU() (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False) (5): Dropout(p=0.3, inplace=False) ) (conv_block_2): Sequential( (0): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (1): ReLU() (2): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (3): ReLU() (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False) (5): Dropout(p=0.3, inplace=False) ) (classifier): Sequential( (0): Flatten(start_dim=1, end_dim=-1) (1): Linear(in_features=8192, out_features=10, bias=True) ) )
Let’s now define the loss function, optimizer and accuracy:
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(params=model2a.parameters(), lr=0.02)
accuracy_fn = torchmetrics.Accuracy(task='multiclass', num_classes=len(classes)).to(device)
Let’s train and evaluate the model:
epochs = 20
for epoch in range(epochs):
print(f'Epoch: {epoch}\n---------')
# train
train_step(model=model2a, data_loader=train_dataloader, loss_fn=loss_fn, optimizer=optimizer, accuracy_fn=accuracy_fn, device=device)
test_step(model=model2a, data_loader=test_dataloader, loss_fn=loss_fn, accuracy_fn=accuracy_fn, device=device)
Epoch: 0 --------- Train loss: 1.92495 | Train acc: 0.26181 Test loss: 1.68880 | Test acc: 0.33284 Epoch: 1 --------- Train loss: 1.63997 | Train acc: 0.39977 Test loss: 1.61185 | Test acc: 0.43362 Epoch: 2 --------- Train loss: 1.40260 | Train acc: 0.50014 Test loss: 1.16747 | Test acc: 0.57890 Epoch: 3 --------- Train loss: 1.11090 | Train acc: 0.60968 Test loss: 0.95288 | Test acc: 0.64540 Epoch: 4 --------- Train loss: 0.89898 | Train acc: 0.68093 Test loss: 0.91601 | Test acc: 0.65459 Epoch: 5 --------- Train loss: 0.76764 | Train acc: 0.72440 Test loss: 0.66465 | Test acc: 0.76504 Epoch: 6 --------- Train loss: 0.66887 | Train acc: 0.76380 Test loss: 0.83158 | Test acc: 0.69027 Epoch: 7 --------- Train loss: 0.60047 | Train acc: 0.78477 Test loss: 0.62674 | Test acc: 0.77620 Epoch: 8 --------- Train loss: 0.54571 | Train acc: 0.80671 Test loss: 0.54791 | Test acc: 0.80806 Epoch: 9 --------- Train loss: 0.50021 | Train acc: 0.81880 Test loss: 0.59864 | Test acc: 0.78717 Epoch: 10 --------- Train loss: 0.44885 | Train acc: 0.84056 Test loss: 0.60463 | Test acc: 0.78384 Epoch: 11 --------- Train loss: 0.41617 | Train acc: 0.84856 Test loss: 0.50600 | Test acc: 0.81934 Epoch: 12 --------- Train loss: 0.39486 | Train acc: 0.85968 Test loss: 0.52694 | Test acc: 0.81607 Epoch: 13 --------- Train loss: 0.36861 | Train acc: 0.86917 Test loss: 0.73988 | Test acc: 0.77817 Epoch: 14 --------- Train loss: 0.34378 | Train acc: 0.87907 Test loss: 0.53473 | Test acc: 0.80917 Epoch: 15 --------- Train loss: 0.32078 | Train acc: 0.88755 Test loss: 0.45843 | Test acc: 0.84449 Epoch: 16 --------- Train loss: 0.29895 | Train acc: 0.89444 Test loss: 0.52832 | Test acc: 0.82723 Epoch: 17 --------- Train loss: 0.28797 | Train acc: 0.89662 Test loss: 0.48809 | Test acc: 0.83870 Epoch: 18 --------- Train loss: 0.27637 | Train acc: 0.90218 Test loss: 0.47354 | Test acc: 0.84369 Epoch: 19 --------- Train loss: 0.25855 | Train acc: 0.91000 Test loss: 0.43474 | Test acc: 0.86021
model2a_evaluation = evaluate_model(model=model2a, data_loader=test_dataloader, loss_fn=loss_fn, accuracy_fn=accuracy_fn, device=device)
model2a_evaluation
{'model': 'EuroSATModelV2a', 'loss': 0.43474334478378296, 'accuracy': 0.8602071404457092}
Let’s now compare this model with the previous version:
models_comparison = pd.DataFrame([model2_evaluation, model2a_evaluation]).set_index('model')
models_comparison
loss | accuracy | |
---|---|---|
model | ||
EuroSATModelV2 | 0.903567 | 0.798200 |
EuroSATModelV2a | 0.434743 | 0.860207 |
models_comparison[['loss', 'accuracy']].plot.bar(title='Loss & Accuracy')
plt.xlabel('Model')
plt.xticks(rotation=45)
(array([0, 1]), [Text(0, 0, 'EuroSATModelV2'), Text(1, 0, 'EuroSATModelV2a')])
We can further optimize the model by adding another nn.Conv2d
layer to each block:
class EuroSATModelV2b(nn.Module):
def __init__(self, input_shape: int, hidden_units: int, output_shape: int):
super().__init__()
self.conv_block_1 = nn.Sequential(
nn.Conv2d(in_channels=input_shape, out_channels=hidden_units, kernel_size=3, stride=1, padding=1),
nn.ReLU(),
nn.Conv2d(in_channels=hidden_units, out_channels=hidden_units, kernel_size=3, stride=1, padding=1),
nn.ReLU(),
nn.Conv2d(in_channels=hidden_units, out_channels=hidden_units, kernel_size=3, stride=1, padding=1),
nn.ReLU(),
nn.MaxPool2d(kernel_size=2),
nn.Dropout(p=0.3)
)
self.conv_block_2 = nn.Sequential(
nn.Conv2d(in_channels=hidden_units, out_channels=hidden_units, kernel_size=3, stride=1, padding=1),
nn.ReLU(),
nn.Conv2d(in_channels=hidden_units, out_channels=hidden_units, kernel_size=3, stride=1, padding=1),
nn.ReLU(),
nn.Conv2d(in_channels=hidden_units, out_channels=hidden_units, kernel_size=3, stride=1, padding=1),
nn.ReLU(),
nn.MaxPool2d(kernel_size=2),
nn.Dropout(p=0.3)
)
self.classifier = nn.Sequential(
nn.Flatten(),
nn.Linear(in_features=hidden_units * 16 * 16, out_features=output_shape)
)
def forward(self, x: torch.Tensor):
x = self.conv_block_1(x)
x = self.conv_block_2(x)
x = self.classifier(x)
return x
Let’s instantiate the model:
torch.manual_seed(42)
model2b = EuroSATModelV2b(
input_shape=3,
hidden_units=32,
output_shape=len(classes)
).to(device)
model2b
EuroSATModelV2b( (conv_block_1): Sequential( (0): Conv2d(3, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (1): ReLU() (2): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (3): ReLU() (4): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (5): ReLU() (6): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False) (7): Dropout(p=0.3, inplace=False) ) (conv_block_2): Sequential( (0): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (1): ReLU() (2): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (3): ReLU() (4): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (5): ReLU() (6): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False) (7): Dropout(p=0.3, inplace=False) ) (classifier): Sequential( (0): Flatten(start_dim=1, end_dim=-1) (1): Linear(in_features=8192, out_features=10, bias=True) ) )
Let’s now define the loss function, optimizer and accuracy:
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(params=model2b.parameters(), lr=0.02)
accuracy_fn = torchmetrics.Accuracy(task='multiclass', num_classes=len(classes)).to(device)
Let’s train and evaluate the model:
epochs = 20
for epoch in range(epochs):
print(f'Epoch: {epoch}\n---------')
# train
train_step(model=model2b, data_loader=train_dataloader, loss_fn=loss_fn, optimizer=optimizer, accuracy_fn=accuracy_fn, device=device)
test_step(model=model2b, data_loader=test_dataloader, loss_fn=loss_fn, accuracy_fn=accuracy_fn, device=device)
Epoch: 0 --------- Train loss: 2.29624 | Train acc: 0.10782 Test loss: 2.29568 | Test acc: 0.11236 Epoch: 1 --------- Train loss: 2.07952 | Train acc: 0.20009 Test loss: 1.76738 | Test acc: 0.29543 Epoch: 2 --------- Train loss: 1.73246 | Train acc: 0.35181 Test loss: 1.62496 | Test acc: 0.39750 Epoch: 3 --------- Train loss: 1.54407 | Train acc: 0.43130 Test loss: 1.40638 | Test acc: 0.48515 Epoch: 4 --------- Train loss: 1.32119 | Train acc: 0.51361 Test loss: 1.21183 | Test acc: 0.55393 Epoch: 5 --------- Train loss: 1.06509 | Train acc: 0.62190 Test loss: 0.90633 | Test acc: 0.67098 Epoch: 6 --------- Train loss: 0.91606 | Train acc: 0.67144 Test loss: 0.83162 | Test acc: 0.70353 Epoch: 7 --------- Train loss: 0.81681 | Train acc: 0.70634 Test loss: 0.69446 | Test acc: 0.75783 Epoch: 8 --------- Train loss: 0.67483 | Train acc: 0.75968 Test loss: 0.61005 | Test acc: 0.78556 Epoch: 9 --------- Train loss: 0.61503 | Train acc: 0.78444 Test loss: 0.69197 | Test acc: 0.75727 Epoch: 10 --------- Train loss: 0.55332 | Train acc: 0.80644 Test loss: 0.56710 | Test acc: 0.80738 Epoch: 11 --------- Train loss: 0.49726 | Train acc: 0.82315 Test loss: 0.56055 | Test acc: 0.80615 Epoch: 12 --------- Train loss: 0.45686 | Train acc: 0.83574 Test loss: 0.63468 | Test acc: 0.79093 Epoch: 13 --------- Train loss: 0.41511 | Train acc: 0.85315 Test loss: 0.54203 | Test acc: 0.81879 Epoch: 14 --------- Train loss: 0.40171 | Train acc: 0.86000 Test loss: 0.52208 | Test acc: 0.82495 Epoch: 15 --------- Train loss: 0.35454 | Train acc: 0.87560 Test loss: 0.49974 | Test acc: 0.83389 Epoch: 16 --------- Train loss: 0.33627 | Train acc: 0.87856 Test loss: 0.55583 | Test acc: 0.81700 Epoch: 17 --------- Train loss: 0.31076 | Train acc: 0.89074 Test loss: 0.62120 | Test acc: 0.80769 Epoch: 18 --------- Train loss: 0.28916 | Train acc: 0.89532 Test loss: 0.53589 | Test acc: 0.83315 Epoch: 19 --------- Train loss: 0.27316 | Train acc: 0.90597 Test loss: 0.47864 | Test acc: 0.84307
model2b_evaluation = evaluate_model(model=model2b, data_loader=test_dataloader, loss_fn=loss_fn, accuracy_fn=accuracy_fn, device=device)
model2b_evaluation
{'model': 'EuroSATModelV2b', 'loss': 0.4786429703235626, 'accuracy': 0.8430720567703247}
Let’s see how the three CNN models compare:
models_comparison = pd.DataFrame([model2_evaluation, model2a_evaluation, model2b_evaluation]).set_index('model')
models_comparison
loss | accuracy | |
---|---|---|
model | ||
EuroSATModelV2 | 0.903567 | 0.798200 |
EuroSATModelV2a | 0.434743 | 0.860207 |
EuroSATModelV2b | 0.478643 | 0.843072 |
models_comparison[['loss', 'accuracy']].plot.bar(title='Loss & Accuracy')
plt.xlabel('Model')
plt.xticks(rotation=45)
(array([0, 1, 2]), [Text(0, 0, 'EuroSATModelV2'), Text(1, 0, 'EuroSATModelV2a'), Text(2, 0, 'EuroSATModelV2b')])
Let’s add one more improvement. This time, we’ll modify the classifier by adding a hidden nn.Linear
layer, an nn.ReLU
layer and a 50% nn.Dropout
layer before the final linear layer:
class EuroSATModelV2c(nn.Module):
def __init__(self, input_shape: int, hidden_units: int, output_shape: int):
super().__init__()
self.conv_block_1 = nn.Sequential(
nn.Conv2d(in_channels=input_shape, out_channels=hidden_units, kernel_size=3, stride=1, padding=1),
nn.ReLU(),
nn.Conv2d(in_channels=hidden_units, out_channels=hidden_units, kernel_size=3, stride=1, padding=1),
nn.ReLU(),
nn.Conv2d(in_channels=hidden_units, out_channels=hidden_units, kernel_size=3, stride=1, padding=1),
nn.ReLU(),
nn.MaxPool2d(kernel_size=2),
nn.Dropout(p=0.3)
)
self.conv_block_2 = nn.Sequential(
nn.Conv2d(in_channels=hidden_units, out_channels=hidden_units, kernel_size=3, stride=1, padding=1),
nn.ReLU(),
nn.Conv2d(in_channels=hidden_units, out_channels=hidden_units, kernel_size=3, stride=1, padding=1),
nn.ReLU(),
nn.Conv2d(in_channels=hidden_units, out_channels=hidden_units, kernel_size=3, stride=1, padding=1),
nn.ReLU(),
nn.MaxPool2d(kernel_size=2),
nn.Dropout(p=0.3)
)
self.classifier = nn.Sequential(
nn.Flatten(),
nn.Linear(in_features=hidden_units * 16 * 16, out_features=hidden_units * 16 * 16),
nn.ReLU(),
nn.Dropout(0.5),
nn.Linear(in_features=hidden_units * 16 * 16, out_features=output_shape)
)
def forward(self, x: torch.Tensor):
x = self.conv_block_1(x)
x = self.conv_block_2(x)
x = self.classifier(x)
return x
Let’s instantiate the model:
torch.manual_seed(42)
model2c = EuroSATModelV2c(
input_shape=3,
hidden_units=32,
output_shape=len(classes)
).to(device)
model2c
EuroSATModelV2c( (conv_block_1): Sequential( (0): Conv2d(3, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (1): ReLU() (2): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (3): ReLU() (4): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (5): ReLU() (6): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False) (7): Dropout(p=0.3, inplace=False) ) (conv_block_2): Sequential( (0): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (1): ReLU() (2): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (3): ReLU() (4): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (5): ReLU() (6): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False) (7): Dropout(p=0.3, inplace=False) ) (classifier): Sequential( (0): Flatten(start_dim=1, end_dim=-1) (1): Linear(in_features=8192, out_features=8192, bias=True) (2): ReLU() (3): Dropout(p=0.5, inplace=False) (4): Linear(in_features=8192, out_features=10, bias=True) ) )
Let’s define the loss function, optimizer and accuracy:
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(params=model2c.parameters(), lr=0.02)
accuracy_fn = torchmetrics.Accuracy(task='multiclass', num_classes=len(classes)).to(device)
Let’s train and evaluate the model:
epochs = 20
for epoch in range(epochs):
print(f'Epoch: {epoch}\n---------')
# train
train_step(model=model2c, data_loader=train_dataloader, loss_fn=loss_fn, optimizer=optimizer, accuracy_fn=accuracy_fn, device=device)
test_step(model=model2c, data_loader=test_dataloader, loss_fn=loss_fn, accuracy_fn=accuracy_fn, device=device)
Epoch: 0 --------- Train loss: 2.29606 | Train acc: 0.11208 Test loss: 2.29577 | Test acc: 0.11236 Epoch: 1 --------- Train loss: 2.29500 | Train acc: 0.11403 Test loss: 2.29531 | Test acc: 0.11144 Epoch: 2 --------- Train loss: 2.13820 | Train acc: 0.18023 Test loss: 1.79248 | Test acc: 0.30350 Epoch: 3 --------- Train loss: 1.72966 | Train acc: 0.33032 Test loss: 1.59057 | Test acc: 0.37555 Epoch: 4 --------- Train loss: 1.58228 | Train acc: 0.38926 Test loss: 1.53846 | Test acc: 0.40736 Epoch: 5 --------- Train loss: 1.36793 | Train acc: 0.49431 Test loss: 1.18499 | Test acc: 0.58752 Epoch: 6 --------- Train loss: 1.15396 | Train acc: 0.57088 Test loss: 1.03731 | Test acc: 0.62229 Epoch: 7 --------- Train loss: 0.98389 | Train acc: 0.64255 Test loss: 0.92103 | Test acc: 0.66926 Epoch: 8 --------- Train loss: 0.87593 | Train acc: 0.68157 Test loss: 0.80237 | Test acc: 0.71105 Epoch: 9 --------- Train loss: 0.79426 | Train acc: 0.71417 Test loss: 0.79490 | Test acc: 0.70963 Epoch: 10 --------- Train loss: 0.68222 | Train acc: 0.75204 Test loss: 0.66262 | Test acc: 0.75992 Epoch: 11 --------- Train loss: 0.59417 | Train acc: 0.78370 Test loss: 0.60582 | Test acc: 0.78026 Epoch: 12 --------- Train loss: 0.53144 | Train acc: 0.81037 Test loss: 0.57311 | Test acc: 0.79370 Epoch: 13 --------- Train loss: 0.46417 | Train acc: 0.83218 Test loss: 0.60087 | Test acc: 0.79635 Epoch: 14 --------- Train loss: 0.40754 | Train acc: 0.85329 Test loss: 0.60476 | Test acc: 0.79117 Epoch: 15 --------- Train loss: 0.35350 | Train acc: 0.87481 Test loss: 0.53716 | Test acc: 0.82415 Epoch: 16 --------- Train loss: 0.30994 | Train acc: 0.89287 Test loss: 0.49068 | Test acc: 0.82594 Epoch: 17 --------- Train loss: 0.27412 | Train acc: 0.90634 Test loss: 0.52953 | Test acc: 0.82427 Epoch: 18 --------- Train loss: 0.22980 | Train acc: 0.92069 Test loss: 0.55975 | Test acc: 0.81207 Epoch: 19 --------- Train loss: 0.19639 | Train acc: 0.93153 Test loss: 0.68178 | Test acc: 0.80369
model2c_evaluation = evaluate_model(model=model2c, data_loader=test_dataloader, loss_fn=loss_fn, accuracy_fn=accuracy_fn, device=device)
model2c_evaluation
{'model': 'EuroSATModelV2c', 'loss': 0.68178391456604, 'accuracy': 0.8036859631538391}
Let’s compare all the CNN models:
models_comparison = pd.DataFrame([model2_evaluation, model2a_evaluation, model2b_evaluation, model2c_evaluation]).set_index('model')
models_comparison
loss | accuracy | |
---|---|---|
model | ||
EuroSATModelV2 | 0.903567 | 0.798200 |
EuroSATModelV2a | 0.434743 | 0.860207 |
EuroSATModelV2b | 0.478643 | 0.843072 |
EuroSATModelV2c | 0.681784 | 0.803686 |
models_comparison[['loss', 'accuracy']].plot.bar(title='Loss & Accuracy')
plt.xlabel('Model')
plt.xticks(rotation=45)
(array([0, 1, 2, 3]), [Text(0, 0, 'EuroSATModelV2'), Text(1, 0, 'EuroSATModelV2a'), Text(2, 0, 'EuroSATModelV2b'), Text(3, 0, 'EuroSATModelV2c')])
We could keep optimizing the model by changing the learning rate, changing the number of epochs, changing the number of hidden units, adding more layers to the blocks and classifier, and so on. But let’s stop here.
Looks like of all four CNN models, model2a
is the best. So, let’s use it to make some predictions.
Predictions¶
Let’s use the best model to make some predictions:
def make_predictions(model: nn.Module, data: list, device: torch.device):
pred_probs = []
model.eval()
with torch.inference_mode():
for sample in data:
sample = torch.unsqueeze(sample, dim=0).to(device)
# logits
pred_logit = model(sample)
# logits -> prediction probabilities
pred_prob = torch.softmax(pred_logit.squeeze(), dim=0)
pred_probs.append(pred_prob.cpu())
return torch.stack(pred_probs)
Let’s get some random samples:
import random
random.seed(42)
test_samples = []
test_labels = []
for sample, label in random.sample(list(test_data), k=16):
test_samples.append(sample)
test_labels.append(label)
Let’s use the samples for predictions:
pred_probs = make_predictions(model=model2a, data=test_samples, device=device)
pred_probs
tensor([[5.2620e-05, 9.9973e-01, 8.0184e-05, 8.9340e-05, 4.5576e-12, 5.4281e-07, 1.4791e-08, 1.3254e-10, 1.7600e-05, 3.0821e-05], [4.2757e-05, 1.0373e-06, 1.6362e-01, 4.9032e-01, 2.6546e-05, 5.4002e-03, 3.3794e-01, 3.2598e-04, 2.3233e-03, 1.0836e-07], [4.3061e-05, 1.4522e-01, 8.5304e-01, 1.1201e-04, 1.1554e-08, 9.6508e-04, 5.5737e-04, 6.5630e-07, 6.4802e-05, 3.3018e-06], [1.6674e-04, 2.3102e-13, 1.1174e-09, 1.0854e-02, 1.7073e-08, 1.2278e-07, 5.0648e-05, 3.9191e-11, 9.8893e-01, 3.0446e-13], [1.4487e-05, 9.9870e-01, 1.9194e-04, 2.6715e-05, 4.0150e-11, 1.7323e-06, 1.4556e-08, 2.3499e-10, 1.0334e-03, 3.5787e-05], [4.9741e-15, 5.1709e-13, 1.5151e-04, 6.1187e-06, 9.9942e-01, 2.9146e-10, 4.1278e-07, 4.2279e-04, 1.4350e-09, 4.1282e-11], [6.9446e-07, 4.4732e-05, 9.9278e-01, 6.5584e-05, 1.6141e-07, 5.1006e-04, 6.4587e-03, 1.3020e-04, 1.3990e-05, 4.0530e-08], [1.0597e-02, 2.6723e-04, 6.1078e-10, 4.6423e-04, 7.6807e-12, 9.7655e-01, 6.7932e-07, 5.9719e-11, 1.1939e-02, 1.8512e-04], [1.9481e-04, 8.2576e-08, 1.2252e-01, 1.8858e-02, 3.1774e-07, 5.6317e-05, 7.5772e-04, 1.9258e-08, 8.5761e-01, 9.9261e-09], [3.4365e-02, 1.9095e-06, 6.8999e-05, 6.5470e-01, 1.2693e-08, 8.7312e-04, 1.5308e-01, 2.4636e-09, 1.5691e-01, 4.7072e-07], [9.7083e-01, 9.8780e-08, 2.8806e-03, 9.6759e-03, 1.2170e-07, 3.8617e-05, 1.6546e-02, 7.3366e-10, 3.3181e-05, 5.7991e-08], [3.1431e-02, 1.1575e-02, 2.5030e-05, 9.6184e-02, 3.7493e-08, 8.2887e-01, 8.4270e-04, 1.4772e-07, 3.1071e-02, 2.1957e-06], [3.4066e-03, 6.3285e-02, 8.0721e-05, 4.5271e-02, 6.3864e-08, 7.9327e-01, 1.3115e-02, 3.9299e-06, 8.1521e-02, 4.6054e-05], [1.2153e-11, 7.4877e-13, 1.4819e-06, 1.9659e-06, 9.9999e-01, 3.4144e-08, 7.7698e-07, 3.5343e-07, 5.2866e-07, 9.9078e-11], [6.9118e-01, 6.6048e-04, 1.8911e-02, 2.4607e-03, 2.5089e-07, 1.0543e-01, 1.5094e-01, 3.8922e-09, 2.8508e-02, 1.9163e-03], [4.7205e-05, 9.9980e-01, 2.4397e-05, 5.7754e-05, 1.5298e-11, 8.7736e-07, 6.7762e-09, 4.1086e-10, 5.2881e-05, 1.9618e-05]])
Here we have prediction probabilities, but we need prediction labels. Let’s turn the former into the latter then:
pred_labels = torch.argmax(pred_probs, dim=1)
pred_labels
tensor([1, 3, 2, 8, 1, 4, 2, 5, 8, 3, 0, 5, 5, 4, 0, 1])
Let’s check out the true test labels. The more they match the predictions, the better:
test_labels
[1, 6, 2, 8, 1, 4, 2, 5, 8, 8, 0, 5, 5, 4, 0, 1]
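Rather than eyeballing the two lists, we could also count the matches directly – a quick sketch:
correct = (pred_labels == torch.tensor(test_labels)).sum().item()
print(f'{correct} / {len(test_labels)} correct')  # 14 / 16 with the labels above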
Let’s visualize the samples and predictions and see how many of them were correct:
plt.figure(figsize=(18, 15))
for i, sample in enumerate(test_samples):
# create subplot
plt.subplot(4, 4, i + 1)
# plot target image
plt.imshow(sample.permute(1, 2, 0))
# prediction label
pred_label = classes[pred_labels[i]]
# ground truth label
gt_label = classes[test_labels[i]]
# title text
title_text = f'Pred: {pred_label} \n Truth: {gt_label}'
# title color
title_color = 'g' if pred_label == gt_label else 'r'
# plot title
plt.title(title_text, color=title_color)
plt.axis(False)
As we can see, the predictions were not ideal, but most of them were correct.