In recent posts, we've been exploring essential torch functionality: tensors, the sine qua non of every deep learning framework; autograd, torch's implementation of reverse-mode automatic differentiation; modules, composable building blocks of neural networks; and optimizers, the, well, optimization algorithms that torch provides.
But we haven't really had our "hello world" moment yet, at least not if by "hello world" you mean the inevitable deep learning experience of classifying pets. Cat or dog? Beagle or boxer? Chinook or Chihuahua? We'll ask a different question instead: What kind of bird?
Topics we'll address along the way:
- The core roles of torch datasets and data loaders, respectively.
- How to apply transforms, both for image preprocessing and for data augmentation.
- How to use ResNet (He et al. 2015), a pre-trained model that comes with torchvision, for transfer learning.
- How to use learning rate schedulers, in particular, the one-cycle learning rate algorithm (@abs-1708-07120).
- How to find a good initial learning rate.
For convenience, the code can be run on Google Colaboratory; no copy-pasting required.
Data loading and preprocessing
The example dataset used here is available on Kaggle.
Conveniently, it may be obtained using torchdatasets, which uses pins for authentication, retrieval, and storage. To enable pins to manage your Kaggle downloads, follow the instructions here.
Unlike the images we may be used to, this dataset is very "clean". To help with generalization, we introduce noise during training; in other words, we perform data augmentation. In torchvision, data augmentation is part of an image processing pipeline that first converts an image to a tensor, and then applies transformations such as resizing, cropping, normalization, or various forms of distortion.
Below are the transformations performed on the training set. Note how most of them are for data augmentation, while normalization is done to comply with what ResNet expects.
Image preprocessing pipeline
library(torch)
library(torchvision)
library(torchdatasets)
library(dplyr)
library(pins)
library(ggplot2)
device <- if (cuda_is_available()) torch_device("cuda:0") else "cpu"
train_transforms <- function(img) {
  img %>%
    # first convert image to tensor
    transform_to_tensor() %>%
    # then move to the GPU (if available)
    (function(x) x$to(device = device)) %>%
    # data augmentation
    transform_random_resized_crop(size = c(224, 224)) %>%
    # data augmentation
    transform_color_jitter() %>%
    # data augmentation
    transform_random_horizontal_flip() %>%
    # normalize according to what is expected by resnet
    transform_normalize(mean = c(0.485, 0.456, 0.406), std = c(0.229, 0.224, 0.225))
}
On the validation set, we don't want to introduce noise, but we still need to resize, crop, and normalize the images. The test set should be treated identically.
valid_transforms <- function(img) {
  img %>%
    transform_to_tensor() %>%
    (function(x) x$to(device = device)) %>%
    transform_resize(256) %>%
    transform_center_crop(224) %>%
    transform_normalize(mean = c(0.485, 0.456, 0.406), std = c(0.229, 0.224, 0.225))
}
test_transforms <- valid_transforms
And now, let's get the data, nicely divided into training, validation, and test sets. Additionally, we tell the corresponding R objects what transformations they are expected to apply:
train_ds <- bird_species_dataset("data", download = TRUE, transform = train_transforms)
valid_ds <- bird_species_dataset("data", split = "valid", transform = valid_transforms)
test_ds <- bird_species_dataset("data", split = "test", transform = test_transforms)
Two things to note. First, transformations are part of the dataset concept, as opposed to the data loader we'll encounter shortly. Second, let's take a look at how the images are stored on disk. The overall directory structure, starting from data, which we specified as the root directory to use, is this:
data/bird_species/train
data/bird_species/valid
data/bird_species/test
In the train, valid, and test directories, images of the different classes reside in their own folders. For example, here is the directory layout for the first three classes in the test set:
data/bird_species/test/ALBATROSS/
- data/bird_species/test/ALBATROSS/1.jpg
- data/bird_species/test/ALBATROSS/2.jpg
- data/bird_species/test/ALBATROSS/3.jpg
- data/bird_species/test/ALBATROSS/4.jpg
- data/bird_species/test/ALBATROSS/5.jpg
data/bird_species/test/'ALEXANDRINE PARAKEET'/
- data/bird_species/test/'ALEXANDRINE PARAKEET'/1.jpg
- data/bird_species/test/'ALEXANDRINE PARAKEET'/2.jpg
- data/bird_species/test/'ALEXANDRINE PARAKEET'/3.jpg
- data/bird_species/test/'ALEXANDRINE PARAKEET'/4.jpg
- data/bird_species/test/'ALEXANDRINE PARAKEET'/5.jpg
data/bird_species/test/'AMERICAN BITTERN'/
- data/bird_species/test/'AMERICAN BITTERN'/1.jpg
- data/bird_species/test/'AMERICAN BITTERN'/2.jpg
- data/bird_species/test/'AMERICAN BITTERN'/3.jpg
- data/bird_species/test/'AMERICAN BITTERN'/4.jpg
- data/bird_species/test/'AMERICAN BITTERN'/5.jpg
This is exactly the kind of layout expected by torch's image_folder_dataset(); in fact, bird_species_dataset() instantiates a subtype of this class. Had we downloaded the data manually, respecting the required directory structure, we could have created the datasets like so:
# e.g.
train_ds <- image_folder_dataset(
  file.path(data_dir, "train"),
  transform = train_transforms)
Now that we've got the data, let's see how many items there are in each set.
train_ds$.length()
valid_ds$.length()
test_ds$.length()
31316
1125
1125
That's a sizable training set indeed! You'll therefore want to run this on a GPU, or else just play around with the provided Colab notebook.
With that many samples, we're curious how many classes there are.
class_names <- test_ds$classes
length(class_names)
225
So we do have a substantial training set, but the task is formidable as well: we're going to tell apart no fewer than 225 different bird species.
Data loaders
While datasets know what to do with each single item, data loaders know how to treat them collectively. How many samples make up a batch? Do we always want to feed them in the same order, or should a different order be chosen for every epoch?
batch_size <- 64
train_dl <- dataloader(train_ds, batch_size = batch_size, shuffle = TRUE)
valid_dl <- dataloader(valid_ds, batch_size = batch_size)
test_dl <- dataloader(test_ds, batch_size = batch_size)
Data loaders, too, may be queried for their length. Here, length means: how many batches?
train_dl$.length()
valid_dl$.length()
test_dl$.length()
490
18
18
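As a quick sanity check (plain arithmetic, not part of the original recipe), these batch counts are simply the number of items divided by the batch size, rounded up:
ceiling(train_ds$.length() / batch_size)  # 490
ceiling(valid_ds$.length() / batch_size)  # 18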
Some birds
Next, let's view a few of the images. We can retrieve the first batch, images and corresponding classes, by creating an iterator from the dataloader and calling next() on it:
# for display purposes, here we are actually using a batch_size of 24
batch <- train_dl$.iter()$.next()
batch
batch is a list, its first item being the image tensors:
[1] 24 3 224 224
And its second, the classes:
[1] 24
Classes are coded as integers, to be used as indices into a vector of class names. We'll use those to label the images.
classes <- batch[[2]]
classes
torch_tensor
1
1
1
1
1
2
2
2
2
2
3
3
3
3
3
4
4
4
4
4
5
5
5
5
[ GPULongType{24} ]
The image tensors have shape batch_size x num_channels x height x width. For plotting with as.raster(), we need to reshape the images such that the channels come last. We also undo the normalization applied by the dataloader.
Here are the first twenty-four images:
library(dplyr)
images <- as_array(batch[[1]]) %>% aperm(perm = c(1, 3, 4, 2))
mean <- c(0.485, 0.456, 0.406)
std <- c(0.229, 0.224, 0.225)
images <- std * images + mean
images <- images * 255
images[images > 255] <- 255
images[images < 0] <- 0
par(mfcol = c(4,6), mar = rep(1, 4))
images %>%
  purrr::array_tree(1) %>%
  purrr::set_names(class_names[as_array(classes)]) %>%
  purrr::map(as.raster, max = 255) %>%
  purrr::iwalk(~{plot(.x); title(.y)})
Model
The backbone of our model is a pre-trained instance of ResNet.
model <- model_resnet18(pretrained = TRUE)
But we want to distinguish among our 225 bird species, while ResNet was trained on 1000 different classes. What can we do? We simply replace the output layer.
The new output layer is also the only one whose weights we are going to train. Technically, we could backpropagate through the complete model and fine-tune ResNet's weights as well, but this would slow down training considerably. In fact, the choice is not all-or-none: it is up to us how many of the original parameters to keep fixed and how many to "set free" for fine-tuning. For the task at hand, we'll be content to just train the newly added output layer: with the abundance of animals, including birds, the trained ResNet has seen, we expect it to know a lot about them!
model$parameters %>% purrr::walk(function(param) param$requires_grad_(FALSE))
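This freezing doesn't have to be all-or-nothing, though. Purely as an illustration (not something we do in this post), here is a sketch of what selectively unfreezing the last residual block could look like; it assumes the layer4 naming used by torchvision's ResNet implementation.
# hypothetical: additionally fine-tune the last residual block (layer4),
# leaving all earlier layers frozen
model$layer4$parameters %>% purrr::walk(function(param) param$requires_grad_(TRUE))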
To replace the output layer, the model is modified in place:
num_features <- model$fc$in_features
model$fc <- nn_linear(in_features = num_features, out_features = length(class_names))
Now put the modified model on the GPU (if available):
model <- model$to(device = device)
Training
For optimization, we use cross-entropy loss and stochastic gradient descent.
criterion <- nn_cross_entropy_loss()
optimizer <- optim_sgd(model$parameters, lr = 0.1, momentum = 0.9)
Finding an optimally efficient learning rate
We set the learning rate to 0.1, but that is really just a formality. As has become widely known thanks to the excellent fast.ai lectures, it makes sense to spend some time upfront determining an efficient learning rate. While torch does not, out of the box, provide a tool like fast.ai's learning rate finder, the logic is straightforward to implement. Here's how to find a good learning rate, as translated to R from Sylvain Gugger's post:
# ported from: https://sgugger.github.io/how-do-you-find-a-good-learning-rate.html
losses <- c()
log_lrs <- c()
find_lr <- function(init_value = 1e-8, final_value = 10, beta = 0.98) {
  num <- train_dl$.length()
  mult <- (final_value / init_value)^(1 / num)
  lr <- init_value
  optimizer$param_groups[[1]]$lr <- lr
  avg_loss <- 0
  best_loss <- 0
  batch_num <- 0
  coro::loop(for (b in train_dl) {
    batch_num <- batch_num + 1
    optimizer$zero_grad()
    output <- model(b[[1]]$to(device = device))
    loss <- criterion(output, b[[2]]$to(device = device))
    # compute the smoothed loss
    avg_loss <- beta * avg_loss + (1 - beta) * loss$item()
    smoothed_loss <- avg_loss / (1 - beta^batch_num)
    # stop if the loss is exploding
    if (batch_num > 1 && smoothed_loss > 4 * best_loss) break
    # record the best loss
    if (smoothed_loss < best_loss || batch_num == 1) best_loss <- smoothed_loss
    # store the values
    losses <<- c(losses, smoothed_loss)
    log_lrs <<- c(log_lrs, (log(lr, 10)))
    loss$backward()
    optimizer$step()
    # update the lr for the next step
    lr <- lr * mult
    optimizer$param_groups[[1]]$lr <- lr
  })
}
find_lr()
df <- data.frame(log_lrs = log_lrs, losses = losses)
ggplot(df, aes(log_lrs, losses)) + geom_point(size = 1) + theme_classic()
The best learning rate is not the exact one where loss is at its minimum. Instead, it should be picked somewhat earlier on the curve, while loss is still decreasing. 0.05 looks like a sensible choice.
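If you prefer a programmatic pick over eyeballing the plot, one common heuristic (an assumption on our part, not part of Gugger's recipe) is to take the learning rate at the point of steepest descent of the smoothed loss:
# index of the largest drop in smoothed loss between consecutive steps
steepest <- which.min(diff(losses))
10 ^ log_lrs[steepest]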
This value is nothing but an anchor, however. Learning rate schedulers allow the learning rate to evolve according to a proven algorithm. Among others, torch implements one-cycle learning (@abs-1708-07120), cyclical learning rates (Smith 2015), and cosine annealing with warm restarts (Loshchilov and Hutter 2016).
Here, we use lr_one_cycle(), passing in our newly found, optimally efficient, hopefully, value of 0.05 as the maximum learning rate. lr_one_cycle() will start with a low rate, then gradually ramp up until it reaches the allowed maximum. After that, the learning rate will slowly, continuously decrease until it ends up slightly below its initial value.
All this happens not once per epoch, but exactly once overall, which is why the name contains one_cycle. Here is how the evolution of the learning rate looks:
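To see the shape of that schedule for yourself, a minimal sketch along the following lines should work; it uses a throwaway optimizer so the one we actually train with stays untouched, and it assumes ten epochs, matching the training run below.
# visualize the one-cycle schedule with a dummy optimizer
sched_opt <- optim_sgd(model$parameters, lr = 0.05)
sched <- lr_one_cycle(sched_opt, max_lr = 0.05, epochs = 10, steps_per_epoch = train_dl$.length())
lrs <- c()
for (i in 1:(10 * train_dl$.length())) {
  sched$step()
  lrs <- c(lrs, sched_opt$param_groups[[1]]$lr)
}
plot(lrs, type = "l", xlab = "training step", ylab = "learning rate")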
Before we start training, let's quickly re-initialize the model, so as to start from a clean slate:
model <- model_resnet18(pretrained = TRUE)
model$parameters %>% purrr::walk(function(param) param$requires_grad_(FALSE))
num_features <- model$fc$in_features
model$fc <- nn_linear(in_features = num_features, out_features = length(class_names))
model <- model$to(device = device)
criterion <- nn_cross_entropy_loss()
optimizer <- optim_sgd(model$parameters, lr = 0.05, momentum = 0.9)
And instantiate the scheduler:
num_epochs = 10
scheduler <- optimizer %>%
lr_one_cycle(max_lr = 0.05, epochs = num_epochs, steps_per_epoch = train_dl$.length())
Training loop
Now we train for ten epochs. For every training batch, we call scheduler$step() to adjust the learning rate. Notably, this has to be done after optimizer$step().
train_batch <- function(b) {
  optimizer$zero_grad()
  output <- model(b[[1]])
  loss <- criterion(output, b[[2]]$to(device = device))
  loss$backward()
  optimizer$step()
  scheduler$step()
  loss$item()
}

valid_batch <- function(b) {
  output <- model(b[[1]])
  loss <- criterion(output, b[[2]]$to(device = device))
  loss$item()
}
for (epoch in 1:num_epochs) {
  model$train()
  train_losses <- c()
  coro::loop(for (b in train_dl) {
    loss <- train_batch(b)
    train_losses <- c(train_losses, loss)
  })
  model$eval()
  valid_losses <- c()
  coro::loop(for (b in valid_dl) {
    loss <- valid_batch(b)
    valid_losses <- c(valid_losses, loss)
  })
  cat(sprintf("\nLoss at epoch %d: training: %3f, validation: %3f\n", epoch, mean(train_losses), mean(valid_losses)))
}
Loss at epoch 1: training: 2.662901, validation: 0.790769
Loss at epoch 2: training: 1.543315, validation: 1.014409
Loss at epoch 3: training: 1.376392, validation: 0.565186
Loss at epoch 4: training: 1.127091, validation: 0.575583
Loss at epoch 5: training: 0.916446, validation: 0.281600
Loss at epoch 6: training: 0.775241, validation: 0.215212
Loss at epoch 7: training: 0.639521, validation: 0.151283
Loss at epoch 8: training: 0.538825, validation: 0.106301
Loss at epoch 9: training: 0.407440, validation: 0.083270
Loss at epoch 10: training: 0.354659, validation: 0.080389
We seem to have made good progress, but we don't yet know anything about classification accuracy in absolute terms. Let's check that out on the test set.
Test set accuracy
Finally, let's calculate accuracy on the test set.
model$eval()
test_batch <- function(b) {
  output <- model(b[[1]])
  labels <- b[[2]]$to(device = device)
  loss <- criterion(output, labels)
  test_losses <<- c(test_losses, loss$item())
  # torch_max returns a list, with position 1 containing the values
  # and position 2 containing the respective indices
  predicted <- torch_max(output$data(), dim = 2)[[2]]
  total <<- total + labels$size(1)
  # add number of correct classifications in this batch to the aggregate
  correct <<- correct + (predicted == labels)$sum()$item()
}

test_losses <- c()
total <- 0
correct <- 0

for (b in enumerate(test_dl)) {
  test_batch(b)
}
mean(test_losses)
[1] 0.03719
test_accuracy <- correct/total
test_accuracy
[1] 0.98756
An impressive result, given how many different species there are!
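To round things off, here is a minimal, hypothetical usage sketch showing how the trained network could be asked about a single image; it assumes that an individual dataset item, as returned by .getitem(), is a list with the image tensor in first position.
# classify a single image from the test set
item <- test_ds$.getitem(1)
input <- item[[1]]$unsqueeze(1)  # add a batch dimension: 1 x 3 x 224 x 224
pred <- model(input)
# torch_max returns values and indices; position 2 holds the predicted class index
idx <- torch_max(pred, dim = 2)[[2]]
class_names[as_array(idx$cpu())]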
Wrapup
Hopefully, this has been a useful introduction to classifying images with torch, as well as to its non-domain-specific architectural elements, such as datasets, data loaders, and learning rate schedulers. Future posts will explore other domains, and also move beyond "hello world" in image recognition. Thanks for reading!
He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2015. "Deep Residual Learning for Image Recognition." CoRR abs/1512.03385. http://arxiv.org/abs/1512.03385.
Loshchilov, Ilya, and Frank Hutter. 2016. "SGDR: Stochastic Gradient Descent with Warm Restarts." CoRR abs/1608.03983. http://arxiv.org/abs/1608.03983.
Smith, Leslie N. 2015. "No More Pesky Learning Rate Guessing Games." CoRR abs/1506.01186. http://arxiv.org/abs/1506.01186.