Training an Image Classification Network with PyTorch

2024-05-19
Author: qinxi

Introduction to Robotics lab: Training an image classifier

I. Objective

Train an image classification network with PyTorch on a CPU or GPU.

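II. Procedure

Load and preprocess the CIFAR-10 dataset, including the training/test split and the image transforms.
Define the structure of a convolutional neural network.
Define a loss function and an optimizer, and train the model on the training set.
Save the trained model.
Evaluate the model on the test set, computing both the overall accuracy and the accuracy for each class.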

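III. Experiment

1. Install PyTorch with pip

Find the install command on the PyTorch website, open a command prompt as administrator, change to the project root directory, and run the command.

Once the installation finishes, run pip list to confirm that PyTorch is installed.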
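Besides pip list, the installation can also be checked from inside Python. The following is a minimal sketch and not part of the original lab code: the helper name describe_torch_install is hypothetical, and it relies only on importlib, torch.__version__, and torch.cuda.is_available().

```python
import importlib
import importlib.util


def describe_torch_install():
    """Return a small report about the local PyTorch installation."""
    # find_spec returns None when the package cannot be found
    if importlib.util.find_spec("torch") is None:
        return {"installed": False, "version": None, "cuda": False}

    torch = importlib.import_module("torch")
    return {
        "installed": True,
        "version": torch.__version__,
        # True only when a CUDA build of PyTorch can see a usable GPU
        "cuda": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    print(describe_torch_install())
```

If the report shows that CUDA is unavailable, training still works but runs on the CPU.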

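2. Running the image classification training code from the PyTorch website raised an error

To fix this, we added a few lines near the top of the code.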
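The lines that were added are not preserved in this post. One common cause of an error at this step, assumed here rather than confirmed by the source, is that the data loaders start worker processes (num_workers=2). On platforms that start workers by spawning a fresh interpreter, such as Windows, each worker re-imports the main script, so code that creates workers has to sit behind an if __name__ == '__main__': guard. The full listing in step 4 does begin with such a guard. The sketch below illustrates the pattern with the standard multiprocessing module instead of a DataLoader; square and run are hypothetical names.

```python
import multiprocessing as mp


def square(x):
    """Work done by each worker process."""
    return x * x


def run(num_workers):
    """Square a few numbers in-process (0 workers) or in a worker pool."""
    data = [1, 2, 3, 4]
    if num_workers == 0:
        return [square(x) for x in data]
    with mp.Pool(num_workers) as pool:
        return pool.map(square, data)


if __name__ == "__main__":
    # Without this guard, a spawned worker that re-imports the script
    # would reach this call again and try to start workers of its own.
    print(run(num_workers=2))
```

Setting the number of workers to 0 avoids worker processes altogether, at the cost of loading data in the main process.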

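3. Download the dataset to local disk

Letting the code download the dataset over the network was very slow, so we downloaded it ourselves into the project root directory.

Find the download link on the official website.

Extract the downloaded archive into a new folder named data.

Then change download to False so that the dataset is read from local disk.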
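With download set to False, the loader expects the extracted files to already be in place. As a rough check, assuming the standard CIFAR-10 Python archive layout (a cifar-10-batches-py folder under the root directory), the sketch below lists any expected files that are missing. The helper name missing_cifar10_files is hypothetical.

```python
from pathlib import Path

# File names read by torchvision's CIFAR10 loader inside <root>/cifar-10-batches-py
EXPECTED_FILES = [f"data_batch_{i}" for i in range(1, 6)] + ["test_batch", "batches.meta"]


def missing_cifar10_files(root="./data"):
    """Return the expected CIFAR-10 files that are absent under root."""
    base = Path(root) / "cifar-10-batches-py"
    return [name for name in EXPECTED_FILES if not (base / name).is_file()]


if __name__ == "__main__":
    missing = missing_cifar10_files("./data")
    print("dataset ready" if not missing else f"missing files: {missing}")
```

An empty result means the folder layout matches what root='./data' in the listing expects; it does not verify file contents.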

4. Run the code

Load the dataset and normalize it so that it is ready for training the model.

# Lab: Training an image classifier
# Training an image classification network with PyTorch
# Author: 韦沁曦
# Student ID: 2022280210
# 2024.5.15


# First step:
# Load and normalize CIFAR10


if __name__ == '__main__':
    import torch
    import torchvision
    import torchvision.transforms as transforms


    transform = transforms.Compose(
        [transforms.ToTensor(),
         transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

    batch_size = 4

    # download = False: load local dataset
    trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                            download=False, transform=transform)
    trainloader = torch.utils.data.DataLoader(trainset, batch_size=batch_size,
                                              shuffle=True, num_workers=2)

    # download = False here as well, so the test set is also read from ./data
    testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                           download=False, transform=transform)
    testloader = torch.utils.data.DataLoader(testset, batch_size=batch_size,
                                             shuffle=False, num_workers=2)

    classes = ('plane', 'car', 'bird', 'cat',
               'deer', 'dog', 'frog', 'horse', 'ship', 'truck')


    import matplotlib.pyplot as plt
    import numpy as np

    # functions to show an image


    def imshow(img):
        img = img / 2 + 0.5     # unnormalize
        npimg = img.numpy()
        plt.imshow(np.transpose(npimg, (1, 2, 0)))
        plt.show()


    # get some random training images
    dataiter = iter(trainloader)
    images, labels = next(dataiter)

    # show images
    imshow(torchvision.utils.make_grid(images))
    # print labels
    print(' '.join(f'{classes[labels[j]]:5s}' for j in range(batch_size)))

The images after preprocessing are displayed:

The true class labels of the images are printed.
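The normalization in the listing and the img / 2 + 0.5 line in imshow are inverses of each other. ToTensor yields pixel values in [0, 1], and Normalize with mean 0.5 and standard deviation 0.5 maps them to [-1, 1] via (x - mean) / std. The sketch below works through the arithmetic on single pixel values; the function names are illustrative only.

```python
def normalize(x, mean=0.5, std=0.5):
    """Per-channel operation applied by transforms.Normalize: (x - mean) / std."""
    return (x - mean) / std


def unnormalize(y):
    """Inverse step used in imshow: y / 2 + 0.5."""
    return y / 2 + 0.5


if __name__ == "__main__":
    for pixel in (0.0, 0.25, 0.5, 1.0):
        y = normalize(pixel)
        print(f"{pixel:.2f} -> {y:+.2f} -> {unnormalize(y):.2f}")
```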

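Build the convolutional neural network. The forward pass applies two rounds of convolution, activation, and pooling, flattens the result into a vector, and then passes it through three fully connected layers.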

    # Second step:
    # Define a Convolutional Neural Network


    import torch.nn as nn
    import torch.nn.functional as F


    class Net(nn.Module):
        def __init__(self):
            super().__init__()
            self.conv1 = nn.Conv2d(3, 6, 5)
            self.pool = nn.MaxPool2d(2, 2)
            self.conv2 = nn.Conv2d(6, 16, 5)
            self.fc1 = nn.Linear(16 * 5 * 5, 120)
            self.fc2 = nn.Linear(120, 84)
            self.fc3 = nn.Linear(84, 10)

        def forward(self, x):
            x = self.pool(F.relu(self.conv1(x)))
            x = self.pool(F.relu(self.conv2(x)))
            x = torch.flatten(x, 1) # flatten all dimensions except batch
            x = F.relu(self.fc1(x))
            x = F.relu(self.fc2(x))
            x = self.fc3(x)
            return x


    net = Net()
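The input size of fc1, 16 * 5 * 5, follows from how each layer shrinks a 32x32 CIFAR-10 image. The sketch below traces one spatial dimension through the layers using the usual output-size formula for convolution and pooling without padding; conv_out and trace_shapes are hypothetical helper names.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output width/height of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_shapes(size=32):
    """Follow one spatial dimension of a CIFAR-10 image through Net.forward."""
    shapes = [("input", 3, size)]
    size = conv_out(size, kernel=5)            # conv1: 3 -> 6 channels, 5x5 kernel
    shapes.append(("conv1", 6, size))
    size = conv_out(size, kernel=2, stride=2)  # 2x2 max pooling
    shapes.append(("pool", 6, size))
    size = conv_out(size, kernel=5)            # conv2: 6 -> 16 channels, 5x5 kernel
    shapes.append(("conv2", 16, size))
    size = conv_out(size, kernel=2, stride=2)  # 2x2 max pooling
    shapes.append(("pool", 16, size))
    return shapes


if __name__ == "__main__":
    for name, channels, size in trace_shapes():
        print(f"{name:6s} {channels:2d} x {size} x {size}")
    _, channels, size = trace_shapes()[-1]
    print("flattened features:", channels * size * size)
```

The spatial size goes 32, 28, 14, 10, 5, so the flattened vector has 16 * 5 * 5 = 400 features.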

Define the loss function and the SGD optimizer

    # Third step:
    # Define a Loss function and optimizer


    import torch.optim as optim

    criterion = nn.CrossEntropyLoss()
    optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)
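To make the two objects above more concrete, the following plain-Python sketch computes the cross-entropy of a single sample from raw scores and applies one SGD-with-momentum update to a single scalar parameter. It is an illustration under simplifying assumptions (one sample, one parameter, no weight decay or dampening), not the library implementation; the function names are hypothetical.

```python
import math


def cross_entropy(logits, target):
    """Cross-entropy for one sample: -log(softmax(logits)[target])."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def sgd_momentum_step(param, grad, velocity, lr=0.001, momentum=0.9):
    """One update: v = momentum * v + grad, then param = param - lr * v."""
    velocity = momentum * velocity + grad
    return param - lr * velocity, velocity


if __name__ == "__main__":
    # Ten equal scores: the loss of an untrained 10-class model is about log(10)
    print(f"{cross_entropy([0.0] * 10, target=3):.3f}")

    p, v = 1.0, 0.0
    for _ in range(2):
        p, v = sgd_momentum_step(p, grad=0.5, velocity=v, lr=0.1)
    print(p, v)
```

With its default settings, nn.CrossEntropyLoss averages this per-sample value over the mini-batch.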

Train the network

    # Fourth step:
    # Train the network


    for epoch in range(2):  # loop over the dataset multiple times

        running_loss = 0.0
        for i, data in enumerate(trainloader, 0):
            # get the inputs; data is a list of [inputs, labels]
            inputs, labels = data

            # zero the parameter gradients
            optimizer.zero_grad()

            # forward + backward + optimize
            outputs = net(inputs)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()

            # print statistics
            running_loss += loss.item()
            if i % 2000 == 1999:  # print every 2000 mini-batches
                print(f'[{epoch + 1}, {i + 1:5d}] loss: {running_loss / 2000:.3f}')
                running_loss = 0.0

    print('Finished Training')

    PATH = './cifar_net.pth'
    torch.save(net.state_dict(), PATH)

Training results:

As the figure shows, the loss decreases gradually.
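Each printed loss value is the average over the previous 2000 mini-batches. With 50000 training images and a batch size of 4, the sketch below works out how many mini-batches one epoch contains and at which batch counts the training loop prints. The helper name log_points is hypothetical.

```python
def log_points(num_samples=50000, batch_size=4, log_every=2000):
    """Return (batches per epoch, batch counts at which the loop prints)."""
    # Ceiling division: the DataLoader keeps a final partial batch by default
    num_batches = -(-num_samples // batch_size)
    points = [i + 1 for i in range(num_batches) if i % log_every == log_every - 1]
    return num_batches, points


if __name__ == "__main__":
    num_batches, points = log_points()
    print(num_batches, points)
```

Under these settings an epoch has 12500 mini-batches and prints six times; the last 500 mini-batches of each epoch are accumulated but never reported, because running_loss is reset at the start of the next epoch.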

Test the network and compute its accuracy

    # Fifth step:
    # Test the network on the test data


    dataiter = iter(testloader)
    images, labels = next(dataiter)

    # print images
    imshow(torchvision.utils.make_grid(images))
    print('GroundTruth: ', ' '.join(f'{classes[labels[j]]:5s}' for j in range(4)))

    net = Net()
    net.load_state_dict(torch.load(PATH))

    outputs = net(images)

    _, predicted = torch.max(outputs, 1)

    print('Predicted: ', ' '.join(f'{classes[predicted[j]]:5s}'
                                  for j in range(4)))


    correct = 0
    total = 0
    # since we're not training, we don't need to calculate the gradients for our outputs
    with torch.no_grad():
        for data in testloader:
            images, labels = data
            # calculate outputs by running images through the network
            outputs = net(images)
            # the class with the highest energy is what we choose as prediction
            _, predicted = torch.max(outputs.data, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()

    print(f'Accuracy of the network on the 10000 test images: {100 * correct // total} %')


    # prepare to count predictions for each class
    correct_pred = {classname: 0 for classname in classes}
    total_pred = {classname: 0 for classname in classes}

    # again no gradients needed
    with torch.no_grad():
        for data in testloader:
            images, labels = data
            outputs = net(images)
            _, predictions = torch.max(outputs, 1)
            # collect the correct predictions for each class
            for label, prediction in zip(labels, predictions):
                if label == prediction:
                    correct_pred[classes[label]] += 1
                total_pred[classes[label]] += 1

    # print accuracy for each class
    for classname, correct_count in correct_pred.items():
        accuracy = 100 * float(correct_count) / total_pred[classname]
        print(f'Accuracy for class: {classname:5s} is {accuracy:.1f} %')

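The images used for prediction are displayed.

The true and predicted classes of these images are printed, followed by the accuracy of the trained network on the test set.

The accuracy for each class is then printed line by line.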
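The evaluation logic can be reproduced on a toy example without PyTorch. In the sketch below, argmax plays the role that torch.max(outputs, 1) plays for each row of scores, and the report mirrors the listing: the overall accuracy uses integer floor division, while the per-class accuracy is a float. The data and the helper names are made up for illustration.

```python
def argmax(scores):
    """Index of the largest score in one row of network outputs."""
    return max(range(len(scores)), key=scores.__getitem__)


def accuracy_report(outputs, labels, classes):
    """Return overall accuracy (integer percent) and per-class accuracy (float percent)."""
    correct = {c: 0 for c in classes}
    total = {c: 0 for c in classes}
    for scores, label in zip(outputs, labels):
        name = classes[label]
        total[name] += 1
        if argmax(scores) == label:
            correct[name] += 1
    # Floor division, as in the listing's overall accuracy line
    overall = 100 * sum(correct.values()) // sum(total.values())
    per_class = {c: 100 * correct[c] / total[c] for c in classes if total[c] > 0}
    return overall, per_class


if __name__ == "__main__":
    classes = ("cat", "dog")
    outputs = [[2.0, 1.0], [0.1, 0.9], [0.8, 0.2]]  # made-up scores for three images
    labels = [0, 1, 1]
    print(accuracy_report(outputs, labels, classes))
```

Here two of the three predictions are correct, so the overall figure is floored to 66 even though the exact value is about 66.7.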
