In this project, we’ll have a look at a typical workflow for an image classification problem with a convolutional neural network and PyTorch.
There are four major steps to follow during a machine learning process:
- Data preparation
- Model building
- Model training
- Model evaluation
And now, without further ado, let’s jump right in and start at the first step.
Data Preparation¶
Let’s import the libraries we’re going to need:
import torch
from torch import nn
import torchvision
from torchvision import datasets
from torchvision.transforms import ToTensor
from torch.utils.data import DataLoader
import matplotlib.pyplot as plt
As you can see, we’re importing torchvision
here. This is PyTorch’s computer vision library. It contains datasets, models and image transformations that we use to handle vision-related tasks.
We’ll also need torchvision.datasets
, which contains multiple datasets for image classification, object detection, and other computer vision-related problems.
Next, we’ll need torchvision.transforms
to handle image transformations, and in particular ToTensor
to transform image data into tensors.
Finally, we’ll need torch.utils.data.DataLoader
to create a Python iterable over a dataset.
The Data¶
We’re going to use the EuroSAT dataset. It’s one of many image classification datasets available in torchvision.datasets
. You can read about more built-in datasets here.
Here’s the original description of the dataset:
In this study, we address the challenge of land use and land cover classification using Sentinel-2 satellite images. The Sentinel-2 satellite images are openly and freely accessible provided in the Earth observation program Copernicus. We present a novel dataset based on Sentinel-2 satellite images covering 13 spectral bands and consisting out of 10 classes with in total 27,000 labeled and geo-referenced images.
So, let’s load the dataset:
data = torchvision.datasets.EuroSAT(
root="data",
download=True,
transform=ToTensor()
)
Here we specify the following parameters:
- root – root directory of the dataset
- download – if True, downloads the dataset from the internet (provided it’s not downloaded yet) and puts it in the root directory
- transform – a function/transform applied to each image as it’s loaded; here ToTensor() turns each image into a tensor
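If we ever needed more than a single transformation, several could be chained with torchvision.transforms.Compose. Here’s a minimal sketch (the Resize size and Normalize statistics are illustrative placeholders, not values required by EuroSAT; we’ll stick with plain ToTensor() in this project):
from torchvision import transforms

composed_transform = transforms.Compose([
    transforms.Resize((64, 64)),                 # EuroSAT images are already 64x64; shown only for illustration
    transforms.ToTensor(),                       # PIL image -> float tensor with values in [0, 1]
    transforms.Normalize(mean=[0.5, 0.5, 0.5],   # placeholder statistics, not computed from the dataset
                         std=[0.5, 0.5, 0.5]),
])

# this could then be passed as transform=composed_transform when loading the dataset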
Let’s split the data into training and test sets.
from sklearn.model_selection import train_test_split
train_data, test_data = train_test_split(data, test_size=0.2, random_state=42)
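Note that sklearn’s train_test_split effectively materializes the dataset as plain lists of (image, label) tuples. An alternative that stays within PyTorch is torch.utils.data.random_split – a minimal sketch of an equivalent 80/20 split (we’ll keep using the split above for the rest of the project):
from torch.utils.data import random_split

# fix the generator so the split is reproducible
generator = torch.Generator().manual_seed(42)
train_size = int(0.8 * len(data))
test_size = len(data) - train_size
train_subset, test_subset = random_split(data, [train_size, test_size], generator=generator)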
Viewing the Data¶
Let’s view a sample of the data.
image, label = train_data[0]
image, label
(tensor([[[0.1412, 0.1412, 0.1333, ..., 0.1922, 0.1843, 0.1882], [0.1412, 0.1412, 0.1333, ..., 0.1961, 0.1922, 0.2000], [0.1569, 0.1529, 0.1373, ..., 0.2667, 0.2314, 0.2000], ..., [0.4235, 0.4235, 0.4392, ..., 0.2392, 0.2549, 0.2824], [0.3490, 0.3412, 0.4471, ..., 0.2392, 0.2706, 0.3216], [0.3294, 0.3216, 0.4078, ..., 0.2392, 0.2902, 0.3529]], [[0.2510, 0.2510, 0.2431, ..., 0.2902, 0.2941, 0.2980], [0.2510, 0.2510, 0.2431, ..., 0.2863, 0.2902, 0.2980], [0.2745, 0.2706, 0.2549, ..., 0.3333, 0.3020, 0.2784], ..., [0.4745, 0.4745, 0.4902, ..., 0.3608, 0.3569, 0.3686], [0.3922, 0.3961, 0.5020, ..., 0.3529, 0.3490, 0.3843], [0.3725, 0.3765, 0.4627, ..., 0.3529, 0.3529, 0.4000]], [[0.2941, 0.2941, 0.2980, ..., 0.3176, 0.3098, 0.3137], [0.2941, 0.2941, 0.2902, ..., 0.3176, 0.3098, 0.3176], [0.3059, 0.3098, 0.2980, ..., 0.3725, 0.3412, 0.3137], ..., [0.4980, 0.4980, 0.5137, ..., 0.3412, 0.3451, 0.3608], [0.4078, 0.4078, 0.5059, ..., 0.3373, 0.3412, 0.3804], [0.3804, 0.3804, 0.4627, ..., 0.3373, 0.3490, 0.4000]]]), 5)
Input and Output¶
The input is the tensor the image was converted into by ToTensor. There are 3 color channels (red, green, blue), and the height and width are both 64:
image.shape
torch.Size([3, 64, 64])
The output is just a single number representing the class. From the dataset description we know there are 10 classes. Let’s enumerate them:
classes = data.classes
classes
['AnnualCrop', 'Forest', 'HerbaceousVegetation', 'Highway', 'Industrial', 'Pasture', 'PermanentCrop', 'Residential', 'River', 'SeaLake']
Data Visualization¶
Let’s visualize the first sample in the training set:
image, label = train_data[0]
plt.imshow(image.permute(1, 2, 0))
plt.title(classes[label])
plt.axis(False)
(np.float64(-0.5), np.float64(63.5), np.float64(63.5), np.float64(-0.5))
We use the image.permute
method to change the order of the dimensions. The original order is color_channel (0) – height (1) – width (2), but plt.imshow expects the color channel last, i.e. height (1) – width (2) – color_channel (0), hence permute(1, 2, 0).
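As a quick sanity check of what permute does to the tensor shape:
print(image.shape)                   # torch.Size([3, 64, 64])
print(image.permute(1, 2, 0).shape)  # torch.Size([64, 64, 3]), the layout plt.imshow expects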
Now, let’s visualize more samples:
torch.manual_seed(42)
fig = plt.figure(figsize=(14, 9))
rows, cols = 4, 5
for i in range(1, rows * cols + 1):
random_idx = torch.randint(0, len(train_data), size=[1]).item()
img, label = train_data[random_idx]
fig.add_subplot(rows, cols, i)
plt.imshow(img.permute(1, 2, 0))
plt.title(classes[label])
plt.axis(False)
The DataLoader¶
Let’s check out the number of samples in both sets:
len(train_data), len(test_data)
(21600, 5400)
It’s a lot of data. It’ll be more efficient to break the large dataset up into smaller chunks called batches and work on one batch at a time. To do that, we need a DataLoader
. This way, the model gets an opportunity to update its parameters every batch rather than once per epoch, so much more frequently. A common batch size is 32. We’ll also shuffle the training samples so that any particular ordering of the data doesn’t influence learning – we don’t want the model to learn the order. There’s no need to shuffle the test set, though, since the model doesn’t learn from it:
BATCH_SIZE = 32
train_dataloader = DataLoader(train_data, batch_size=BATCH_SIZE, shuffle=True)
test_dataloader = DataLoader(test_data, batch_size=BATCH_SIZE, shuffle=False)
print(f'Train dataloader: {len(train_dataloader)} batches of {BATCH_SIZE}')
print(f'Test dataloader: {len(test_dataloader)} batches of {BATCH_SIZE}')
Train dataloader: 675 batches of 32 Test dataloader: 169 batches of 32
Let’s check out the shape of the data in the training dataloader:
train_features_batch, train_labels_batch = next(iter(train_dataloader))
train_features_batch.shape, train_labels_batch.shape
(torch.Size([32, 3, 64, 64]), torch.Size([32]))
So, we have one more dimension: the batch size.
Device-Agnostic Code¶
Let’s set the device to GPU (cuda) if it’s available or CPU if it’s not:
device = 'cuda' if torch.cuda.is_available() else 'cpu'
device
'cuda'
Baseline Model¶
It’s always good practice to start with a baseline model that other models will be compared to. This way, we’ll be able to see how our next models perform – better or worse than the baseline.
So, the baseline model will contain just three layers: an nn.Flatten
layer and two nn.Linear
layers, so no nonlinearity:
class EuroSATModelV0(nn.Module):
def __init__(self, input_shape: int, hidden_units: int, output_shape: int):
super().__init__()
self.layer_stack = nn.Sequential(
nn.Flatten(),
nn.Linear(in_features=input_shape, out_features=hidden_units),
nn.Linear(in_features=hidden_units, out_features=output_shape)
)
def forward(self, x):
return self.layer_stack(x)
The nn.Flatten
layer collapses the color channel, height and width dimensions into a single feature dimension (3 * 64 * 64 = 12288 values per image), which is what the first nn.Linear layer expects. Let’s create an instance of the model:
model0 = EuroSATModelV0(
input_shape=3 * 64 * 64,
hidden_units=10,
output_shape=len(classes)
)
model0.to(device)
EuroSATModelV0( (layer_stack): Sequential( (0): Flatten(start_dim=1, end_dim=-1) (1): Linear(in_features=12288, out_features=10, bias=True) (2): Linear(in_features=10, out_features=10, bias=True) ) )
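To make the role of nn.Flatten concrete, here’s a quick sketch using a dummy batch with the same shape as our images:
dummy_batch = torch.rand(32, 3, 64, 64)   # a fake batch: 32 images, 3 channels, 64x64 pixels
flattened = nn.Flatten()(dummy_batch)
print(dummy_batch.shape)  # torch.Size([32, 3, 64, 64])
print(flattened.shape)    # torch.Size([32, 12288]), i.e. 3 * 64 * 64 = 12288 features per image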
Loss Function, Optimizer, Accuracy¶
Next, we need a loss function, an optimizer and some evaluation metrics. We’ll use CrossEntropyLoss
as the loss function, SGD
as the optimizer and torchmetrics.Accuracy
as the only metric. First, we must make sure torchmetrics
is installed:
!pip install torchmetrics
Requirement already satisfied: torchmetrics in /usr/local/lib/python3.11/dist-packages (1.7.1)
Now we can instantiate the three:
import torchmetrics
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(params=model0.parameters(), lr=0.02)
accuracy_fn = torchmetrics.Accuracy(task='multiclass', num_classes=len(classes)).to(device)
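A quick aside on how the accuracy metric is used: with task='multiclass' it accepts raw logits of shape (batch, num_classes) together with integer class labels and applies argmax internally. A tiny sketch with made-up values (the metric is stateful, so we reset it afterwards):
# two fake samples with 10 made-up logits each; the second prediction (argmax = 2) is wrong on purpose
dummy_logits = torch.tensor([[2.0, 0.1, -1.0, 0.0, 0.3, 0.0, 0.0, 0.0, 0.0, 0.0],
                             [0.0, 0.2,  3.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]]).to(device)
dummy_targets = torch.tensor([0, 1]).to(device)
print(accuracy_fn(dummy_logits, dummy_targets))  # one of two correct -> 0.5
accuracy_fn.reset()  # clear the accumulated state before real training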
Functionizing the Training and Test Loops¶
As we’re going to train and evaluate a couple of models, let’s make things a little easier for ourselves and turn the two loops, the training loop and the test loop, into reusable functions:
def train_step(model: nn.Module,
data_loader: DataLoader,
loss_fn: nn.Module,
optimizer: torch.optim.Optimizer,
accuracy_fn: torchmetrics.Accuracy,
device: torch.device = device):
'''
Trains the model on a data_loader.
Args:
model (nn.Module): A PyTorch model to train.
data_loader (DataLoader): A PyTorch DataLoader instance.
loss_fn (nn.Module): The loss function of the model.
optimizer (torch.optim.Optimizer): The optimizer to use to update the model's parameters.
accuracy_fn (torchmetrics.Accuracy): The accuracy function of the model.
device (torch.device, optional): The device to train on. Defaults to device.
'''
train_loss, train_acc = 0, 0
model.to(device)
model.train()
for batch, (X, y) in enumerate(data_loader):
# send data to device
X, y = X.to(device), y.to(device)
# forward pass
y_pred = model(X)
# loss
loss = loss_fn(y_pred, y)
train_loss += loss
# accuracy
acc = accuracy_fn(y_pred, y)
train_acc += acc
# zero grad
optimizer.zero_grad()
# backpropagation
loss.backward()
# optimizer step
optimizer.step()
# loss and accuracy per epoch
train_loss /= len(data_loader)
train_acc /= len(data_loader)
# info
print(f'Train loss: {train_loss:.5f} | Train acc: {train_acc:.5f}')
def test_step(model: nn.Module,
data_loader: DataLoader,
loss_fn: nn.Module,
accuracy_fn: torchmetrics.Accuracy,
device: torch.device = device):
'''
Evaluates the model on a data_loader.
Args:
model (nn.Module): A PyTorch model.
data_loader (DataLoader): A PyTorch DataLoader instance.
loss_fn (nn.Module): The loss function of the model.
accuracy_fn (torchmetrics.Accuracy): The accuracy function of the model.
device (torch.device, optional): The device.
'''
test_loss, test_acc = 0, 0
model.to(device)
model.eval()
with torch.inference_mode():
for X, y in data_loader:
# send data to device
X, y = X.to(device), y.to(device)
# forward pass
test_pred = model(X)
# loss
test_loss += loss_fn(test_pred, y)
# accuracy
test_acc += accuracy_fn(test_pred, y)
# loss and accuracy
test_loss /= len(data_loader)
test_acc /= len(data_loader)
# info
print(f'Test loss: {test_loss:.5f} | Test acc: {test_acc:.5f}')
Let’s now use the functions to train and evaluate the baseline model:
torch.manual_seed(42)
epochs = 20
for epoch in range(epochs):
print(f'Epoch: {epoch}\n---------')
train_step(model=model0, data_loader=train_dataloader, loss_fn=loss_fn, optimizer=optimizer, accuracy_fn=accuracy_fn)
test_step(model=model0, data_loader=test_dataloader, loss_fn=loss_fn, accuracy_fn=accuracy_fn)
Epoch: 0 --------- Train loss: 1.90438 | Train acc: 0.27403 Test loss: 1.73945 | Test acc: 0.37075 Epoch: 1 --------- Train loss: 1.73678 | Train acc: 0.33690 Test loss: 1.71340 | Test acc: 0.31755 Epoch: 2 --------- Train loss: 1.70461 | Train acc: 0.35361 Test loss: 1.79175 | Test acc: 0.30356 Epoch: 3 --------- Train loss: 1.68641 | Train acc: 0.36264 Test loss: 1.68053 | Test acc: 0.37512 Epoch: 4 --------- Train loss: 1.67662 | Train acc: 0.36759 Test loss: 1.64029 | Test acc: 0.38505 Epoch: 5 --------- Train loss: 1.66598 | Train acc: 0.37278 Test loss: 1.63045 | Test acc: 0.38911 Epoch: 6 --------- Train loss: 1.65819 | Train acc: 0.38083 Test loss: 1.77616 | Test acc: 0.30560 Epoch: 7 --------- Train loss: 1.64496 | Train acc: 0.38731 Test loss: 1.74361 | Test acc: 0.33876 Epoch: 8 --------- Train loss: 1.64172 | Train acc: 0.38532 Test loss: 1.63300 | Test acc: 0.37272 Epoch: 9 --------- Train loss: 1.63537 | Train acc: 0.38954 Test loss: 1.70837 | Test acc: 0.35441 Epoch: 10 --------- Train loss: 1.62831 | Train acc: 0.39056 Test loss: 1.71232 | Test acc: 0.36662 Epoch: 11 --------- Train loss: 1.62696 | Train acc: 0.39255 Test loss: 1.72715 | Test acc: 0.38098 Epoch: 12 --------- Train loss: 1.62105 | Train acc: 0.39704 Test loss: 1.71678 | Test acc: 0.38727 Epoch: 13 --------- Train loss: 1.61350 | Train acc: 0.40009 Test loss: 1.64302 | Test acc: 0.35639 Epoch: 14 --------- Train loss: 1.60781 | Train acc: 0.39856 Test loss: 1.61220 | Test acc: 0.36236 Epoch: 15 --------- Train loss: 1.60607 | Train acc: 0.40255 Test loss: 1.63067 | Test acc: 0.38609 Epoch: 16 --------- Train loss: 1.60241 | Train acc: 0.40333 Test loss: 1.69421 | Test acc: 0.38036 Epoch: 17 --------- Train loss: 1.59711 | Train acc: 0.40356 Test loss: 1.64635 | Test acc: 0.42480 Epoch: 18 --------- Train loss: 1.59209 | Train acc: 0.40657 Test loss: 1.60754 | Test acc: 0.43916 Epoch: 19 --------- Train loss: 1.59512 | Train acc: 0.40370 Test loss: 1.64686 | Test acc: 0.34159
As we can see, the loss is decreasing and the accuracy is increasing, which is good. Let’s create a function that computes these two values and returns them, so that we can use it later with the other models too:
torch.manual_seed(42)
def evaluate_model(model: nn.Module,
data_loader: DataLoader,
loss_fn: nn.Module,
accuracy_fn: torchmetrics.Accuracy,
device: torch.device):
'''
Evaluates the model on a data_loader.
Args:
model (nn.Module): A PyTorch model trained to make predictions on the data_loader.
data_loader (DataLoader): A PyTorch DataLoader instance.
loss_fn (nn.Module): The loss function of the model.
accuracy_fn (torchmetrics.Accuracy): The accuracy function of the model.
device (torch.device): The device to evaluate on.
Returns:
A dictionary with the model's name, loss and accuracy on data_loader.
'''
loss, acc = 0, 0
model.eval()
with torch.inference_mode():
for X, y in data_loader:
# send data to device
X, y = X.to(device), y.to(device)
# forward pass
y_pred = model(X)
# loss
loss += loss_fn(y_pred, y)
# accuracy
acc += accuracy_fn(y_pred, y)
# loss and accuracy in batch
loss /= len(data_loader)
acc /= len(data_loader)
return {'model': model.__class__.__name__,
'loss': loss.item(),
'accuracy': acc.item()}
And now let’s use the function to evaluate the baseline model:
model0_evaluation = evaluate_model(model=model0, data_loader=test_dataloader, loss_fn=loss_fn, accuracy_fn=accuracy_fn, device=device)
model0_evaluation
{'model': 'EuroSATModelV0', 'loss': 1.6468614339828491, 'accuracy': 0.3415926992893219}
Model with Nonlinearities¶
Let’s create another model, very much like the baseline model, but this time with some nonlinear activation functions between the layers:
class EuroSATModelV1(nn.Module):
def __init__(self, input_shape: int, hidden_units: int, output_shape: int):
super().__init__()
self.layer_stack = nn.Sequential(
nn.Flatten(),
nn.Linear(in_features=input_shape, out_features=hidden_units),
nn.ReLU(),
nn.Linear(in_features=hidden_units, out_features=output_shape),
nn.ReLU()
)
def forward(self, x):
return self.layer_stack(x)
Here’s an instance of the model:
torch.manual_seed(42)
model1 = EuroSATModelV1(
input_shape=3 * 64 * 64,
hidden_units=10,
output_shape=len(classes)
).to(device)
Let’s define the loss function, optimizer and accuracy:
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(params=model1.parameters(), lr=0.02)
accuracy_fn = torchmetrics.Accuracy(task='multiclass', num_classes=len(classes)).to(device)
Now we can train and evaluate the model:
epochs = 20
for epoch in range(epochs):
print(f'Epoch: {epoch}\n---------')
train_step(model=model1, data_loader=train_dataloader, loss_fn=loss_fn, optimizer=optimizer, accuracy_fn=accuracy_fn)
test_step(model=model1, data_loader=test_dataloader, loss_fn=loss_fn, accuracy_fn=accuracy_fn)
Epoch: 0 --------- Train loss: 2.22829 | Train acc: 0.18944 Test loss: 2.19198 | Test acc: 0.19903 Epoch: 1 --------- Train loss: 2.17576 | Train acc: 0.20185 Test loss: 2.17099 | Test acc: 0.20587 Epoch: 2 --------- Train loss: 2.15642 | Train acc: 0.21685 Test loss: 2.16047 | Test acc: 0.22608 Epoch: 3 --------- Train loss: 2.14495 | Train acc: 0.22667 Test loss: 2.15271 | Test acc: 0.22584 Epoch: 4 --------- Train loss: 2.13550 | Train acc: 0.23389 Test loss: 2.13791 | Test acc: 0.22898 Epoch: 5 --------- Train loss: 2.12349 | Train acc: 0.23991 Test loss: 2.12725 | Test acc: 0.23126 Epoch: 6 --------- Train loss: 2.11282 | Train acc: 0.23917 Test loss: 2.12444 | Test acc: 0.24932 Epoch: 7 --------- Train loss: 2.10570 | Train acc: 0.24449 Test loss: 2.12664 | Test acc: 0.22947 Epoch: 8 --------- Train loss: 2.10138 | Train acc: 0.24444 Test loss: 2.10546 | Test acc: 0.24069 Epoch: 9 --------- Train loss: 2.09728 | Train acc: 0.24704 Test loss: 2.10856 | Test acc: 0.26769 Epoch: 10 --------- Train loss: 2.09383 | Train acc: 0.24875 Test loss: 2.09566 | Test acc: 0.24760 Epoch: 11 --------- Train loss: 2.09142 | Train acc: 0.24884 Test loss: 2.11417 | Test acc: 0.24581 Epoch: 12 --------- Train loss: 2.08912 | Train acc: 0.24940 Test loss: 2.09818 | Test acc: 0.24168 Epoch: 13 --------- Train loss: 2.08804 | Train acc: 0.25069 Test loss: 2.09497 | Test acc: 0.24519 Epoch: 14 --------- Train loss: 2.08647 | Train acc: 0.25181 Test loss: 2.09831 | Test acc: 0.24199 Epoch: 15 --------- Train loss: 2.08673 | Train acc: 0.25181 Test loss: 2.09825 | Test acc: 0.25037 Epoch: 16 --------- Train loss: 2.08387 | Train acc: 0.25194 Test loss: 2.09910 | Test acc: 0.26492 Epoch: 17 --------- Train loss: 2.08352 | Train acc: 0.25236 Test loss: 2.08875 | Test acc: 0.25234 Epoch: 18 --------- Train loss: 2.08143 | Train acc: 0.25384 Test loss: 2.09729 | Test acc: 0.26418 Epoch: 19 --------- Train loss: 2.08204 | Train acc: 0.25412 Test loss: 2.08610 | Test acc: 0.25481
torch.manual_seed(42)
model1_evaluation = evaluate_model(model=model1, data_loader=test_dataloader, loss_fn=loss_fn, accuracy_fn=accuracy_fn, device=device)
model1_evaluation
{'model': 'EuroSATModelV1', 'loss': 2.086103677749634, 'accuracy': 0.2548076808452606}
Let’s compare the two models:
import pandas as pd
models_comparison = pd.DataFrame([model0_evaluation, model1_evaluation]).set_index('model')
models_comparison
loss | accuracy | |
---|---|---|
model | ||
EuroSATModelV0 | 1.646861 | 0.341593 |
EuroSATModelV1 | 2.086104 | 0.254808 |
Looks like adding the nonlinear activation functions didn’t help at all – quite the contrary. One likely culprit is the final nn.ReLU placed right after the output layer: it clamps all negative logits to zero, throwing away information just before the cross-entropy loss is computed. What about a type of model that is best suited for working with images? Our third model is going to be a convolutional neural network.
Convolutional Neural Network (CNN)¶
Convolutional neural networks are good at finding patterns in visual data. Between the input and the output they typically stack blocks of convolutional layers, activation layers and pooling layers (possibly repeated, and possibly mixed with other types of layers). Let’s create another model, this time a CNN, with two such blocks and a classifier:
class EuroSATModelV2(nn.Module):
def __init__(self, input_shape: int, hidden_units: int, output_shape: int):
super().__init__()
self.conv_block_1 = nn.Sequential(
nn.Conv2d(in_channels=input_shape, out_channels=hidden_units, kernel_size=3, stride=1, padding=1),
nn.ReLU(),
nn.Conv2d(in_channels=hidden_units, out_channels=hidden_units, kernel_size=3, stride=1, padding=1),
nn.ReLU(),
nn.MaxPool2d(kernel_size=2)
)
self.conv_block_2 = nn.Sequential(
nn.Conv2d(in_channels=hidden_units, out_channels=hidden_units, kernel_size=3, stride=1, padding=1),
nn.ReLU(),
nn.Conv2d(in_channels=hidden_units, out_channels=hidden_units, kernel_size=3, stride=1, padding=1),
nn.ReLU(),
nn.MaxPool2d(kernel_size=2)
)
self.classifier = nn.Sequential(
nn.Flatten(),
nn.Linear(in_features=hidden_units * 16 * 16, out_features=output_shape)
)
def forward(self, x: torch.Tensor):
x = self.conv_block_1(x)
x = self.conv_block_2(x)
x = self.classifier(x)
return x
As you can see, we pass the data through each of the blocks in turn. Here we have two types of layers characteristic of CNNs, the Conv2d and the MaxPool2d layers. Let’s have a look at them one by one.
Let’s start with the nn.Conv2d
layer. It takes the following parameters:
- in_channels – number of channels in the input image,
- out_channels – number of channels produced by the convolution,
- kernel_size – size of the kernel (filter) that hovers over the pixels of the image,
- stride – the step of the kernel at each move,
- padding – padding added to the four sides of the input.
The first convolutional layer is followed by a ReLU activation layer, another convolutional layer, another ReLU layer and an nn.MaxPool2d
layer. The pooling layer takes a kernel_size
parameter: it slides a window of that size over its input and keeps only the maximum value in each window, so with kernel_size=2 the height and width are halved.
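To see where the classifier’s in_features=hidden_units * 16 * 16 comes from, we can trace the shape of a single image through the blocks: the convolutions with padding=1 keep the 64x64 spatial size, and each MaxPool2d(kernel_size=2) halves it, so 64 -> 32 after the first block and 32 -> 16 after the second. A quick sketch, assuming hidden_units=32 as we’ll use below:
dummy_image = torch.rand(1, 3, 64, 64)  # one image: 3 channels, 64x64 pixels

one_block = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, stride=1, padding=1),  # padding=1 keeps the 64x64 size
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2)                             # 64x64 -> 32x32
)
print(one_block(dummy_image).shape)  # torch.Size([1, 32, 32, 32])

# after the second block the spatial size is 16x16,
# so nn.Flatten produces 32 * 16 * 16 = 8192 features per image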
Let’s instantiate the model:
torch.manual_seed(42)
model2 = EuroSATModelV2(
input_shape=3,
hidden_units=32,
output_shape=len(classes)
).to(device)
model2
EuroSATModelV2( (conv_block_1): Sequential( (0): Conv2d(3, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (1): ReLU() (2): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (3): ReLU() (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False) ) (conv_block_2): Sequential( (0): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (1): ReLU() (2): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (3): ReLU() (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False) ) (classifier): Sequential( (0): Flatten(start_dim=1, end_dim=-1) (1): Linear(in_features=8192, out_features=10, bias=True) ) )
Let’s now define the loss function, optimizer and accuracy, just as before:
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(params=model2.parameters(), lr=0.02)
accuracy_fn = torchmetrics.Accuracy(task='multiclass', num_classes=len(classes)).to(device)
Finally, let’s train and evaluate the model.
epochs = 20
for epoch in range(epochs):
print(f'Epoch: {epoch}\n---------')
# train
train_step(model=model2, data_loader=train_dataloader, loss_fn=loss_fn, optimizer=optimizer, accuracy_fn=accuracy_fn, device=device)
test_step(model=model2, data_loader=test_dataloader, loss_fn=loss_fn, accuracy_fn=accuracy_fn, device=device)
Epoch: 0 --------- Train loss: 1.92549 | Train acc: 0.26407 Test loss: 1.67585 | Test acc: 0.34005 Epoch: 1 --------- Train loss: 1.55153 | Train acc: 0.43287 Test loss: 1.33739 | Test acc: 0.50986 Epoch: 2 --------- Train loss: 1.18155 | Train acc: 0.58403 Test loss: 1.00698 | Test acc: 0.63283 Epoch: 3 --------- Train loss: 0.91457 | Train acc: 0.67819 Test loss: 0.98556 | Test acc: 0.64904 Epoch: 4 --------- Train loss: 0.76308 | Train acc: 0.73227 Test loss: 0.96170 | Test acc: 0.66389 Epoch: 5 --------- Train loss: 0.64701 | Train acc: 0.77296 Test loss: 0.70806 | Test acc: 0.75462 Epoch: 6 --------- Train loss: 0.57117 | Train acc: 0.80028 Test loss: 0.69177 | Test acc: 0.76140 Epoch: 7 --------- Train loss: 0.48993 | Train acc: 0.82963 Test loss: 0.68994 | Test acc: 0.76362 Epoch: 8 --------- Train loss: 0.44711 | Train acc: 0.84259 Test loss: 0.63598 | Test acc: 0.79093 Epoch: 9 --------- Train loss: 0.39383 | Train acc: 0.86176 Test loss: 0.62630 | Test acc: 0.79167 Epoch: 10 --------- Train loss: 0.35554 | Train acc: 0.87444 Test loss: 0.66397 | Test acc: 0.79050 Epoch: 11 --------- Train loss: 0.30153 | Train acc: 0.89472 Test loss: 0.71465 | Test acc: 0.78482 Epoch: 12 --------- Train loss: 0.27912 | Train acc: 0.90255 Test loss: 0.75525 | Test acc: 0.78310 Epoch: 13 --------- Train loss: 0.47336 | Train acc: 0.84134 Test loss: 0.71628 | Test acc: 0.76103 Epoch: 14 --------- Train loss: 0.30538 | Train acc: 0.89204 Test loss: 0.79924 | Test acc: 0.75832 Epoch: 15 --------- Train loss: 0.23845 | Train acc: 0.92190 Test loss: 0.93719 | Test acc: 0.75382 Epoch: 16 --------- Train loss: 0.20709 | Train acc: 0.92903 Test loss: 0.86071 | Test acc: 0.78741 Epoch: 17 --------- Train loss: 0.21379 | Train acc: 0.92741 Test loss: 0.95157 | Test acc: 0.76356 Epoch: 18 --------- Train loss: 0.19080 | Train acc: 0.93745 Test loss: 0.84801 | Test acc: 0.78575 Epoch: 19 --------- Train loss: 0.14242 | Train acc: 0.95324 Test loss: 0.90357 | Test acc: 0.79820
model2_evaluation = evaluate_model(model=model2, data_loader=test_dataloader, loss_fn=loss_fn, accuracy_fn=accuracy_fn, device=device)
model2_evaluation
{'model': 'EuroSATModelV2', 'loss': 0.9035666584968567, 'accuracy': 0.7982001900672913}
Model Comparison¶
Let’s compare all three models:
models_comparison = pd.DataFrame([model0_evaluation, model1_evaluation, model2_evaluation]).set_index('model')
models_comparison
loss | accuracy | |
---|---|---|
model | ||
EuroSATModelV0 | 1.646861 | 0.341593 |
EuroSATModelV1 | 2.086104 | 0.254808 |
EuroSATModelV2 | 0.903567 | 0.798200 |
Let’s also compare the models visually:
models_comparison[['loss', 'accuracy']].plot.bar(title='Loss & Accuracy')
plt.xlabel('Model')
plt.xticks(rotation=45)
(array([0, 1, 2]), [Text(0, 0, 'EuroSATModelV0'), Text(1, 0, 'EuroSATModelV1'), Text(2, 0, 'EuroSATModelV2')])
Looks like the CNN performs best. But it’s not ideal. Let’s try to improve it.
Optimizing the CNN Model¶
The first thing we could do is add some nn.Dropout
layers to the two blocks. During training, a dropout layer randomly zeroes a fraction of its inputs, which acts as a regularizer and helps prevent overfitting; during evaluation it is inactive. Let’s set the fraction to 30%:
class EuroSATModelV2a(nn.Module):
def __init__(self, input_shape: int, hidden_units: int, output_shape: int):
super().__init__()
self.conv_block_1 = nn.Sequential(
nn.Conv2d(in_channels=input_shape, out_channels=hidden_units, kernel_size=3, stride=1, padding=1),
nn.ReLU(),
nn.Conv2d(in_channels=hidden_units, out_channels=hidden_units, kernel_size=3, stride=1, padding=1),
nn.ReLU(),
nn.MaxPool2d(kernel_size=2),
nn.Dropout(p=0.3)
)
self.conv_block_2 = nn.Sequential(
nn.Conv2d(in_channels=hidden_units, out_channels=hidden_units, kernel_size=3, stride=1, padding=1),
nn.ReLU(),
nn.Conv2d(in_channels=hidden_units, out_channels=hidden_units, kernel_size=3, stride=1, padding=1),
nn.ReLU(),
nn.MaxPool2d(kernel_size=2),
nn.Dropout(p=0.3)
)
self.classifier = nn.Sequential(
nn.Flatten(),
nn.Linear(in_features=hidden_units * 16 * 16, out_features=output_shape)
)
def forward(self, x: torch.Tensor):
x = self.conv_block_1(x)
x = self.conv_block_2(x)
x = self.classifier(x)
return x
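Before instantiating the model, here’s a quick aside on how nn.Dropout behaves: in training mode it randomly zeroes each element with probability p and rescales the survivors by 1/(1-p); in evaluation mode it passes the input through unchanged. A minimal sketch:
dropout = nn.Dropout(p=0.3)
x = torch.ones(8)

dropout.train()
print(dropout(x))  # roughly 30% of the entries zeroed, the rest scaled to 1 / (1 - 0.3), i.e. about 1.43

dropout.eval()
print(dropout(x))  # all ones: dropout is inactive during evaluation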
Let’s instantiate the model:
torch.manual_seed(42)
model2a = EuroSATModelV2a(
input_shape=3,
hidden_units=32,
output_shape=len(classes)
).to(device)
model2a
EuroSATModelV2a( (conv_block_1): Sequential( (0): Conv2d(3, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (1): ReLU() (2): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (3): ReLU() (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False) (5): Dropout(p=0.3, inplace=False) ) (conv_block_2): Sequential( (0): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (1): ReLU() (2): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (3): ReLU() (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False) (5): Dropout(p=0.3, inplace=False) ) (classifier): Sequential( (0): Flatten(start_dim=1, end_dim=-1) (1): Linear(in_features=8192, out_features=10, bias=True) ) )
Let’s now define the loss function, optimizer and accuracy:
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(params=model2a.parameters(), lr=0.02)
accuracy_fn = torchmetrics.Accuracy(task='multiclass', num_classes=len(classes)).to(device)
Let’s train and evaluate the model:
epochs = 20
for epoch in range(epochs):
print(f'Epoch: {epoch}\n---------')
# train
train_step(model=model2a, data_loader=train_dataloader, loss_fn=loss_fn, optimizer=optimizer, accuracy_fn=accuracy_fn, device=device)
test_step(model=model2a, data_loader=test_dataloader, loss_fn=loss_fn, accuracy_fn=accuracy_fn, device=device)
Epoch: 0 --------- Train loss: 1.92495 | Train acc: 0.26181 Test loss: 1.68880 | Test acc: 0.33284 Epoch: 1 --------- Train loss: 1.63997 | Train acc: 0.39977 Test loss: 1.61185 | Test acc: 0.43362 Epoch: 2 --------- Train loss: 1.40260 | Train acc: 0.50014 Test loss: 1.16747 | Test acc: 0.57890 Epoch: 3 --------- Train loss: 1.11090 | Train acc: 0.60968 Test loss: 0.95288 | Test acc: 0.64540 Epoch: 4 --------- Train loss: 0.89898 | Train acc: 0.68093 Test loss: 0.91601 | Test acc: 0.65459 Epoch: 5 --------- Train loss: 0.76764 | Train acc: 0.72440 Test loss: 0.66465 | Test acc: 0.76504 Epoch: 6 --------- Train loss: 0.66887 | Train acc: 0.76380 Test loss: 0.83158 | Test acc: 0.69027 Epoch: 7 --------- Train loss: 0.60047 | Train acc: 0.78477 Test loss: 0.62674 | Test acc: 0.77620 Epoch: 8 --------- Train loss: 0.54571 | Train acc: 0.80671 Test loss: 0.54791 | Test acc: 0.80806 Epoch: 9 --------- Train loss: 0.50021 | Train acc: 0.81880 Test loss: 0.59864 | Test acc: 0.78717 Epoch: 10 --------- Train loss: 0.44885 | Train acc: 0.84056 Test loss: 0.60463 | Test acc: 0.78384 Epoch: 11 --------- Train loss: 0.41617 | Train acc: 0.84856 Test loss: 0.50600 | Test acc: 0.81934 Epoch: 12 --------- Train loss: 0.39486 | Train acc: 0.85968 Test loss: 0.52694 | Test acc: 0.81607 Epoch: 13 --------- Train loss: 0.36861 | Train acc: 0.86917 Test loss: 0.73988 | Test acc: 0.77817 Epoch: 14 --------- Train loss: 0.34378 | Train acc: 0.87907 Test loss: 0.53473 | Test acc: 0.80917 Epoch: 15 --------- Train loss: 0.32078 | Train acc: 0.88755 Test loss: 0.45843 | Test acc: 0.84449 Epoch: 16 --------- Train loss: 0.29895 | Train acc: 0.89444 Test loss: 0.52832 | Test acc: 0.82723 Epoch: 17 --------- Train loss: 0.28797 | Train acc: 0.89662 Test loss: 0.48809 | Test acc: 0.83870 Epoch: 18 --------- Train loss: 0.27637 | Train acc: 0.90218 Test loss: 0.47354 | Test acc: 0.84369 Epoch: 19 --------- Train loss: 0.25855 | Train acc: 0.91000 Test loss: 0.43474 | Test acc: 0.86021
model2a_evaluation = evaluate_model(model=model2a, data_loader=test_dataloader, loss_fn=loss_fn, accuracy_fn=accuracy_fn, device=device)
model2a_evaluation
{'model': 'EuroSATModelV2a', 'loss': 0.43474334478378296, 'accuracy': 0.8602071404457092}
Let’s now compare this model with the previous version:
models_comparison = pd.DataFrame([model2_evaluation, model2a_evaluation]).set_index('model')
models_comparison
loss | accuracy | |
---|---|---|
model | ||
EuroSATModelV2 | 0.903567 | 0.798200 |
EuroSATModelV2a | 0.434743 | 0.860207 |
models_comparison[['loss', 'accuracy']].plot.bar(title='Loss & Accuracy')
plt.xlabel('Model')
plt.xticks(rotation=45)
(array([0, 1]), [Text(0, 0, 'EuroSATModelV2'), Text(1, 0, 'EuroSATModelV2a')])
We can further optimize the model by adding another nn.Conv2d
layer to each block:
class EuroSATModelV2b(nn.Module):
def __init__(self, input_shape: int, hidden_units: int, output_shape: int):
super().__init__()
self.conv_block_1 = nn.Sequential(
nn.Conv2d(in_channels=input_shape, out_channels=hidden_units, kernel_size=3, stride=1, padding=1),
nn.ReLU(),
nn.Conv2d(in_channels=hidden_units, out_channels=hidden_units, kernel_size=3, stride=1, padding=1),
nn.ReLU(),
nn.Conv2d(in_channels=hidden_units, out_channels=hidden_units, kernel_size=3, stride=1, padding=1),
nn.ReLU(),
nn.MaxPool2d(kernel_size=2),
nn.Dropout(p=0.3)
)
self.conv_block_2 = nn.Sequential(
nn.Conv2d(in_channels=hidden_units, out_channels=hidden_units, kernel_size=3, stride=1, padding=1),
nn.ReLU(),
nn.Conv2d(in_channels=hidden_units, out_channels=hidden_units, kernel_size=3, stride=1, padding=1),
nn.ReLU(),
nn.Conv2d(in_channels=hidden_units, out_channels=hidden_units, kernel_size=3, stride=1, padding=1),
nn.ReLU(),
nn.MaxPool2d(kernel_size=2),
nn.Dropout(p=0.3)
)
self.classifier = nn.Sequential(
nn.Flatten(),
nn.Linear(in_features=hidden_units * 16 * 16, out_features=output_shape)
)
def forward(self, x: torch.Tensor):
x = self.conv_block_1(x)
x = self.conv_block_2(x)
x = self.classifier(x)
return x
Let’s instantiate the model:
torch.manual_seed(42)
model2b = EuroSATModelV2b(
input_shape=3,
hidden_units=32,
output_shape=len(classes)
).to(device)
model2b
EuroSATModelV2b( (conv_block_1): Sequential( (0): Conv2d(3, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (1): ReLU() (2): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (3): ReLU() (4): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (5): ReLU() (6): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False) (7): Dropout(p=0.3, inplace=False) ) (conv_block_2): Sequential( (0): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (1): ReLU() (2): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (3): ReLU() (4): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (5): ReLU() (6): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False) (7): Dropout(p=0.3, inplace=False) ) (classifier): Sequential( (0): Flatten(start_dim=1, end_dim=-1) (1): Linear(in_features=8192, out_features=10, bias=True) ) )
Let’s now define the loss function, optimizer and accuracy:
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(params=model2b.parameters(), lr=0.02)
accuracy_fn = torchmetrics.Accuracy(task='multiclass', num_classes=len(classes)).to(device)
Let’s train and evaluate the model:
epochs = 20
for epoch in range(epochs):
print(f'Epoch: {epoch}\n---------')
# train
train_step(model=model2b, data_loader=train_dataloader, loss_fn=loss_fn, optimizer=optimizer, accuracy_fn=accuracy_fn, device=device)
test_step(model=model2b, data_loader=test_dataloader, loss_fn=loss_fn, accuracy_fn=accuracy_fn, device=device)
Epoch: 0 --------- Train loss: 2.29624 | Train acc: 0.10782 Test loss: 2.29568 | Test acc: 0.11236 Epoch: 1 --------- Train loss: 2.07952 | Train acc: 0.20009 Test loss: 1.76738 | Test acc: 0.29543 Epoch: 2 --------- Train loss: 1.73246 | Train acc: 0.35181 Test loss: 1.62496 | Test acc: 0.39750 Epoch: 3 --------- Train loss: 1.54407 | Train acc: 0.43130 Test loss: 1.40638 | Test acc: 0.48515 Epoch: 4 --------- Train loss: 1.32119 | Train acc: 0.51361 Test loss: 1.21183 | Test acc: 0.55393 Epoch: 5 --------- Train loss: 1.06509 | Train acc: 0.62190 Test loss: 0.90633 | Test acc: 0.67098 Epoch: 6 --------- Train loss: 0.91606 | Train acc: 0.67144 Test loss: 0.83162 | Test acc: 0.70353 Epoch: 7 --------- Train loss: 0.81681 | Train acc: 0.70634 Test loss: 0.69446 | Test acc: 0.75783 Epoch: 8 --------- Train loss: 0.67483 | Train acc: 0.75968 Test loss: 0.61005 | Test acc: 0.78556 Epoch: 9 --------- Train loss: 0.61503 | Train acc: 0.78444 Test loss: 0.69197 | Test acc: 0.75727 Epoch: 10 --------- Train loss: 0.55332 | Train acc: 0.80644 Test loss: 0.56710 | Test acc: 0.80738 Epoch: 11 --------- Train loss: 0.49726 | Train acc: 0.82315 Test loss: 0.56055 | Test acc: 0.80615 Epoch: 12 --------- Train loss: 0.45686 | Train acc: 0.83574 Test loss: 0.63468 | Test acc: 0.79093 Epoch: 13 --------- Train loss: 0.41511 | Train acc: 0.85315 Test loss: 0.54203 | Test acc: 0.81879 Epoch: 14 --------- Train loss: 0.40171 | Train acc: 0.86000 Test loss: 0.52208 | Test acc: 0.82495 Epoch: 15 --------- Train loss: 0.35454 | Train acc: 0.87560 Test loss: 0.49974 | Test acc: 0.83389 Epoch: 16 --------- Train loss: 0.33627 | Train acc: 0.87856 Test loss: 0.55583 | Test acc: 0.81700 Epoch: 17 --------- Train loss: 0.31076 | Train acc: 0.89074 Test loss: 0.62120 | Test acc: 0.80769 Epoch: 18 --------- Train loss: 0.28916 | Train acc: 0.89532 Test loss: 0.53589 | Test acc: 0.83315 Epoch: 19 --------- Train loss: 0.27316 | Train acc: 0.90597 Test loss: 0.47864 | Test acc: 0.84307
model2b_evaluation = evaluate_model(model=model2b, data_loader=test_dataloader, loss_fn=loss_fn, accuracy_fn=accuracy_fn, device=device)
model2b_evaluation
{'model': 'EuroSATModelV2b', 'loss': 0.4786429703235626, 'accuracy': 0.8430720567703247}
Let’s see how the three CNN models compare:
models_comparison = pd.DataFrame([model2_evaluation, model2a_evaluation, model2b_evaluation]).set_index('model')
models_comparison
loss | accuracy | |
---|---|---|
model | ||
EuroSATModelV2 | 0.903567 | 0.798200 |
EuroSATModelV2a | 0.434743 | 0.860207 |
EuroSATModelV2b | 0.478643 | 0.843072 |
models_comparison[['loss', 'accuracy']].plot.bar(title='Loss & Accuracy')
plt.xlabel('Model')
plt.xticks(rotation=45)
(array([0, 1, 2]), [Text(0, 0, 'EuroSATModelV2'), Text(1, 0, 'EuroSATModelV2a'), Text(2, 0, 'EuroSATModelV2b')])
Let’s add one more improvement. This time, we’ll modify the classifier by adding a hidden nn.Linear
layer, an nn.ReLU
layer and a 50% nn.Dropout
layer before the final linear layer:
class EuroSATModelV2c(nn.Module):
def __init__(self, input_shape: int, hidden_units: int, output_shape: int):
super().__init__()
self.conv_block_1 = nn.Sequential(
nn.Conv2d(in_channels=input_shape, out_channels=hidden_units, kernel_size=3, stride=1, padding=1),
nn.ReLU(),
nn.Conv2d(in_channels=hidden_units, out_channels=hidden_units, kernel_size=3, stride=1, padding=1),
nn.ReLU(),
nn.Conv2d(in_channels=hidden_units, out_channels=hidden_units, kernel_size=3, stride=1, padding=1),
nn.ReLU(),
nn.MaxPool2d(kernel_size=2),
nn.Dropout(p=0.3)
)
self.conv_block_2 = nn.Sequential(
nn.Conv2d(in_channels=hidden_units, out_channels=hidden_units, kernel_size=3, stride=1, padding=1),
nn.ReLU(),
nn.Conv2d(in_channels=hidden_units, out_channels=hidden_units, kernel_size=3, stride=1, padding=1),
nn.ReLU(),
nn.Conv2d(in_channels=hidden_units, out_channels=hidden_units, kernel_size=3, stride=1, padding=1),
nn.ReLU(),
nn.MaxPool2d(kernel_size=2),
nn.Dropout(p=0.3)
)
self.classifier = nn.Sequential(
nn.Flatten(),
nn.Linear(in_features=hidden_units * 16 * 16, out_features=hidden_units * 16 * 16),
nn.ReLU(),
nn.Dropout(0.5),
nn.Linear(in_features=hidden_units * 16 * 16, out_features=output_shape)
)
def forward(self, x: torch.Tensor):
x = self.conv_block_1(x)
x = self.conv_block_2(x)
x = self.classifier(x)
return x
Let’s instantiate the model:
torch.manual_seed(42)
model2c = EuroSATModelV2c(
input_shape=3,
hidden_units=32,
output_shape=len(classes)
).to(device)
model2c
EuroSATModelV2c( (conv_block_1): Sequential( (0): Conv2d(3, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (1): ReLU() (2): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (3): ReLU() (4): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (5): ReLU() (6): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False) (7): Dropout(p=0.3, inplace=False) ) (conv_block_2): Sequential( (0): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (1): ReLU() (2): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (3): ReLU() (4): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (5): ReLU() (6): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False) (7): Dropout(p=0.3, inplace=False) ) (classifier): Sequential( (0): Flatten(start_dim=1, end_dim=-1) (1): Linear(in_features=8192, out_features=8192, bias=True) (2): ReLU() (3): Dropout(p=0.5, inplace=False) (4): Linear(in_features=8192, out_features=10, bias=True) ) )
Let’s define the loss function, optimizer and accuracy:
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(params=model2c.parameters(), lr=0.02)
accuracy_fn = torchmetrics.Accuracy(task='multiclass', num_classes=len(classes)).to(device)
Let’s train and evaluate the model:
epochs = 20
for epoch in range(epochs):
print(f'Epoch: {epoch}\n---------')
# train
train_step(model=model2c, data_loader=train_dataloader, loss_fn=loss_fn, optimizer=optimizer, accuracy_fn=accuracy_fn, device=device)
test_step(model=model2c, data_loader=test_dataloader, loss_fn=loss_fn, accuracy_fn=accuracy_fn, device=device)
Epoch: 0 --------- Train loss: 2.29606 | Train acc: 0.11208 Test loss: 2.29577 | Test acc: 0.11236 Epoch: 1 --------- Train loss: 2.29500 | Train acc: 0.11403 Test loss: 2.29531 | Test acc: 0.11144 Epoch: 2 --------- Train loss: 2.13820 | Train acc: 0.18023 Test loss: 1.79248 | Test acc: 0.30350 Epoch: 3 --------- Train loss: 1.72966 | Train acc: 0.33032 Test loss: 1.59057 | Test acc: 0.37555 Epoch: 4 --------- Train loss: 1.58228 | Train acc: 0.38926 Test loss: 1.53846 | Test acc: 0.40736 Epoch: 5 --------- Train loss: 1.36793 | Train acc: 0.49431 Test loss: 1.18499 | Test acc: 0.58752 Epoch: 6 --------- Train loss: 1.15396 | Train acc: 0.57088 Test loss: 1.03731 | Test acc: 0.62229 Epoch: 7 --------- Train loss: 0.98389 | Train acc: 0.64255 Test loss: 0.92103 | Test acc: 0.66926 Epoch: 8 --------- Train loss: 0.87593 | Train acc: 0.68157 Test loss: 0.80237 | Test acc: 0.71105 Epoch: 9 --------- Train loss: 0.79426 | Train acc: 0.71417 Test loss: 0.79490 | Test acc: 0.70963 Epoch: 10 --------- Train loss: 0.68222 | Train acc: 0.75204 Test loss: 0.66262 | Test acc: 0.75992 Epoch: 11 --------- Train loss: 0.59417 | Train acc: 0.78370 Test loss: 0.60582 | Test acc: 0.78026 Epoch: 12 --------- Train loss: 0.53144 | Train acc: 0.81037 Test loss: 0.57311 | Test acc: 0.79370 Epoch: 13 --------- Train loss: 0.46417 | Train acc: 0.83218 Test loss: 0.60087 | Test acc: 0.79635 Epoch: 14 --------- Train loss: 0.40754 | Train acc: 0.85329 Test loss: 0.60476 | Test acc: 0.79117 Epoch: 15 --------- Train loss: 0.35350 | Train acc: 0.87481 Test loss: 0.53716 | Test acc: 0.82415 Epoch: 16 --------- Train loss: 0.30994 | Train acc: 0.89287 Test loss: 0.49068 | Test acc: 0.82594 Epoch: 17 --------- Train loss: 0.27412 | Train acc: 0.90634 Test loss: 0.52953 | Test acc: 0.82427 Epoch: 18 --------- Train loss: 0.22980 | Train acc: 0.92069 Test loss: 0.55975 | Test acc: 0.81207 Epoch: 19 --------- Train loss: 0.19639 | Train acc: 0.93153 Test loss: 0.68178 | Test acc: 0.80369
model2c_evaluation = evaluate_model(model=model2c, data_loader=test_dataloader, loss_fn=loss_fn, accuracy_fn=accuracy_fn, device=device)
model2c_evaluation
{'model': 'EuroSATModelV2c', 'loss': 0.68178391456604, 'accuracy': 0.8036859631538391}
Let’s compare all the CNN models:
models_comparison = pd.DataFrame([model2_evaluation, model2a_evaluation, model2b_evaluation, model2c_evaluation]).set_index('model')
models_comparison
loss | accuracy | |
---|---|---|
model | ||
EuroSATModelV2 | 0.903567 | 0.798200 |
EuroSATModelV2a | 0.434743 | 0.860207 |
EuroSATModelV2b | 0.478643 | 0.843072 |
EuroSATModelV2c | 0.681784 | 0.803686 |
models_comparison[['loss', 'accuracy']].plot.bar(title='Loss & Accuracy')
plt.xlabel('Model')
plt.xticks(rotation=45)
(array([0, 1, 2, 3]), [Text(0, 0, 'EuroSATModelV2'), Text(1, 0, 'EuroSATModelV2a'), Text(2, 0, 'EuroSATModelV2b'), Text(3, 0, 'EuroSATModelV2c')])
We could keep optimizing the model by changing the learning rate, changing the number of epochs, changing the number of hidden units, adding more layers to the blocks and classifier, and so on. But let’s stop here.
Looks like of all four CNN models, model2a
is the best. So, let’s use it to make some predictions.
Predictions¶
Let’s use the best model to make some predictions:
def make_predictions(model: nn.Module, data: list, device: torch.device):
pred_probs = []
model.eval()
with torch.inference_mode():
for sample in data:
sample = torch.unsqueeze(sample, dim=0).to(device)
# logits
pred_logit = model(sample)
# logits -> prediction probabilities
pred_prob = torch.softmax(pred_logit.squeeze(), dim=0)
pred_probs.append(pred_prob.cpu())
return torch.stack(pred_probs)
Let’s get some random samples:
import random
random.seed(42)
test_samples = []
test_labels = []
for sample, label in random.sample(list(test_data), k=16):
test_samples.append(sample)
test_labels.append(label)
Let’s use the samples for predictions:
pred_probs = make_predictions(model=model2a, data=test_samples, device=device)
pred_probs
tensor([[5.2620e-05, 9.9973e-01, 8.0184e-05, 8.9340e-05, 4.5576e-12, 5.4281e-07, 1.4791e-08, 1.3254e-10, 1.7600e-05, 3.0821e-05], [4.2757e-05, 1.0373e-06, 1.6362e-01, 4.9032e-01, 2.6546e-05, 5.4002e-03, 3.3794e-01, 3.2598e-04, 2.3233e-03, 1.0836e-07], [4.3061e-05, 1.4522e-01, 8.5304e-01, 1.1201e-04, 1.1554e-08, 9.6508e-04, 5.5737e-04, 6.5630e-07, 6.4802e-05, 3.3018e-06], [1.6674e-04, 2.3102e-13, 1.1174e-09, 1.0854e-02, 1.7073e-08, 1.2278e-07, 5.0648e-05, 3.9191e-11, 9.8893e-01, 3.0446e-13], [1.4487e-05, 9.9870e-01, 1.9194e-04, 2.6715e-05, 4.0150e-11, 1.7323e-06, 1.4556e-08, 2.3499e-10, 1.0334e-03, 3.5787e-05], [4.9741e-15, 5.1709e-13, 1.5151e-04, 6.1187e-06, 9.9942e-01, 2.9146e-10, 4.1278e-07, 4.2279e-04, 1.4350e-09, 4.1282e-11], [6.9446e-07, 4.4732e-05, 9.9278e-01, 6.5584e-05, 1.6141e-07, 5.1006e-04, 6.4587e-03, 1.3020e-04, 1.3990e-05, 4.0530e-08], [1.0597e-02, 2.6723e-04, 6.1078e-10, 4.6423e-04, 7.6807e-12, 9.7655e-01, 6.7932e-07, 5.9719e-11, 1.1939e-02, 1.8512e-04], [1.9481e-04, 8.2576e-08, 1.2252e-01, 1.8858e-02, 3.1774e-07, 5.6317e-05, 7.5772e-04, 1.9258e-08, 8.5761e-01, 9.9261e-09], [3.4365e-02, 1.9095e-06, 6.8999e-05, 6.5470e-01, 1.2693e-08, 8.7312e-04, 1.5308e-01, 2.4636e-09, 1.5691e-01, 4.7072e-07], [9.7083e-01, 9.8780e-08, 2.8806e-03, 9.6759e-03, 1.2170e-07, 3.8617e-05, 1.6546e-02, 7.3366e-10, 3.3181e-05, 5.7991e-08], [3.1431e-02, 1.1575e-02, 2.5030e-05, 9.6184e-02, 3.7493e-08, 8.2887e-01, 8.4270e-04, 1.4772e-07, 3.1071e-02, 2.1957e-06], [3.4066e-03, 6.3285e-02, 8.0721e-05, 4.5271e-02, 6.3864e-08, 7.9327e-01, 1.3115e-02, 3.9299e-06, 8.1521e-02, 4.6054e-05], [1.2153e-11, 7.4877e-13, 1.4819e-06, 1.9659e-06, 9.9999e-01, 3.4144e-08, 7.7698e-07, 3.5343e-07, 5.2866e-07, 9.9078e-11], [6.9118e-01, 6.6048e-04, 1.8911e-02, 2.4607e-03, 2.5089e-07, 1.0543e-01, 1.5094e-01, 3.8922e-09, 2.8508e-02, 1.9163e-03], [4.7205e-05, 9.9980e-01, 2.4397e-05, 5.7754e-05, 1.5298e-11, 8.7736e-07, 6.7762e-09, 4.1086e-10, 5.2881e-05, 1.9618e-05]])
Here we have prediction probabilities, but we need prediction labels. Let’s turn the former into the latter then:
pred_labels = torch.argmax(pred_probs, dim=1)
pred_labels
tensor([1, 3, 2, 8, 1, 4, 2, 5, 8, 3, 0, 5, 5, 4, 0, 1])
Let’s check out the true test labels. The more they match the predictions, the better:
test_labels
[1, 6, 2, 8, 1, 4, 2, 5, 8, 8, 0, 5, 5, 4, 0, 1]
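Rather than eyeballing the two lists, we could also count the matches directly – a quick sketch:
correct = (pred_labels == torch.tensor(test_labels)).sum().item()
print(f'{correct} / {len(test_labels)} correct')  # 14 / 16 with the labels above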
Let’s visualize the samples and predictions and see how many of them were correct:
plt.figure(figsize=(18, 15))
for i, sample in enumerate(test_samples):
# create subplot
plt.subplot(4, 4, i + 1)
# plot target image
plt.imshow(sample.permute(1, 2, 0))
# prediction label
pred_label = classes[pred_labels[i]]
# ground truth label
gt_label = classes[test_labels[i]]
# title text
title_text = f'Pred: {pred_label} \n Truth: {gt_label}'
# title color
title_color = 'g' if pred_label == gt_label else 'r'
# plot title
plt.title(title_text, color=title_color)
plt.axis(False)
As we can see, the predictions were not ideal, but most of them were correct.