In recent posts, we've been exploring essential torch functionality: tensors, the sine qua non of every deep learning framework; autograd, torch's implementation of reverse-mode automatic differentiation; modules, composable building blocks of neural networks; and optimizers, the, well, optimization algorithms that torch provides.
But we haven't really had our "hello world" moment yet, at least not if by "hello world" you mean the inevitable deep learning experience of classifying pets. Cat or dog? Beagle or boxer? Chinook or Chihuahua? We'll ask a different question instead: What kind of bird?
Topics we'll address along the way:
- The core roles of torch datasets and data loaders, respectively.
- How to apply transforms, both for image preprocessing and for data augmentation.
- How to use ResNet (He et al. 2015), a pre-trained model that comes with torchvision, for transfer learning.
- How to use learning rate schedulers, in particular, the one-cycle learning rate algorithm (@abs-1708-07120).
- How to find a good initial learning rate.
For convenience, the code can be run on Google Colaboratory; no copy-pasting required.
Data loading and preprocessing
The example dataset used here is available on Kaggle.
Conveniently, it may be obtained using torchdatasets, which uses pins for authentication, retrieval, and storage. To enable pins to manage your Kaggle downloads, follow the instructions here.
Unlike the images we may be used to, this dataset is very "clean". To help with generalization, we introduce noise during training; in other words, we perform data augmentation. In torchvision, data augmentation is part of an image processing pipeline that first converts an image to a tensor, and then applies transformations such as resizing, cropping, normalization, or various forms of distortion.
Below are the transformations performed on the training set. Note how most of them are for data augmentation, while normalization is done to comply with what ResNet expects.
Image preprocessing pipeline
library(torch)
library(torchvision)
library(torchdatasets)
library(dplyr)
library(pins)
library(ggplot2)
device <- if (cuda_is_available()) torch_device("cuda:0") else "cpu"
train_transforms <- function(img) {
  img %>%
    # first convert image to tensor
    transform_to_tensor() %>%
    # then move to the GPU (if available)
    (function(x) x$to(device = device)) %>%
    # data augmentation
    transform_random_resized_crop(size = c(224, 224)) %>%
    # data augmentation
    transform_color_jitter() %>%
    # data augmentation
    transform_random_horizontal_flip() %>%
    # normalize according to what is expected by resnet
    transform_normalize(mean = c(0.485, 0.456, 0.406), std = c(0.229, 0.224, 0.225))
}
On the validation set, we don't want to introduce noise, but we still need to resize, crop, and normalize the images. The test set should be treated identically.
valid_transforms <- function(img) {
  img %>%
    transform_to_tensor() %>%
    (function(x) x$to(device = device)) %>%
    transform_resize(256) %>%
    transform_center_crop(224) %>%
    transform_normalize(mean = c(0.485, 0.456, 0.406), std = c(0.229, 0.224, 0.225))
}
test_transforms <- valid_transforms
And now, let's get the data, nicely divided into training, validation, and test sets. Additionally, we tell the corresponding R objects what transformations they are expected to apply:
train_ds <- bird_species_dataset("data", download = TRUE, transform = train_transforms)
valid_ds <- bird_species_dataset("data", split = "valid", transform = valid_transforms)
test_ds <- bird_species_dataset("data", split = "test", transform = test_transforms)
Two things to note. First, transformations are part of the dataset concept, as opposed to the data loader we'll encounter shortly. Second, let's take a look at how the images are stored on disk. The overall directory structure, starting from data, which we specified as the root directory to use, is this:
data/bird_species/train
data/bird_species/valid
data/bird_species/test
In the train, valid, and test directories, images of the different classes reside in their own folders. For example, here is the directory layout for the first three classes in the test set:
data/bird_species/test/ALBATROSS/
- data/bird_species/test/ALBATROSS/1.jpg
- data/bird_species/test/ALBATROSS/2.jpg
- data/bird_species/test/ALBATROSS/3.jpg
- data/bird_species/test/ALBATROSS/4.jpg
- data/bird_species/test/ALBATROSS/5.jpg
data/bird_species/test/'ALEXANDRINE PARAKEET'/
- data/bird_species/test/'ALEXANDRINE PARAKEET'/1.jpg
- data/bird_species/test/'ALEXANDRINE PARAKEET'/2.jpg
- data/bird_species/test/'ALEXANDRINE PARAKEET'/3.jpg
- data/bird_species/test/'ALEXANDRINE PARAKEET'/4.jpg
- data/bird_species/test/'ALEXANDRINE PARAKEET'/5.jpg
data/bird_species/test/'AMERICAN BITTERN'/
- data/bird_species/test/'AMERICAN BITTERN'/1.jpg
- data/bird_species/test/'AMERICAN BITTERN'/2.jpg
- data/bird_species/test/'AMERICAN BITTERN'/3.jpg
- data/bird_species/test/'AMERICAN BITTERN'/4.jpg
- data/bird_species/test/'AMERICAN BITTERN'/5.jpg
This is exactly the kind of layout expected by torch's image_folder_dataset(); in fact, bird_species_dataset() instantiates a subtype of this class. Had we downloaded the data manually, respecting the required directory structure, we could have created the datasets like so:
# e.g.
train_ds <- image_folder_dataset(
  file.path(data_dir, "train"),
  transform = train_transforms)
Now that we've got the data, let's see how many items there are in each set.
train_ds$.length()
valid_ds$.length()
test_ds$.length()
31316
1125
1125
That's a sizable training set indeed! You'll therefore want to run this on a GPU, or else just play around with the provided Colab notebook.
With that many samples, we're curious how many classes there are.
class_names <- test_ds$classes
length(class_names)
225
So we do have a substantial training set, but the task is formidable as well: we're going to tell apart no fewer than 225 different bird species.
Data loaders
While datasets know what to do with each single item, data loaders know how to treat them collectively. How many samples make up a batch? Do we always want to feed them in the same order, or should a different order be chosen for every epoch?
batch_size <- 64
train_dl <- dataloader(train_ds, batch_size = batch_size, shuffle = TRUE)
valid_dl <- dataloader(valid_ds, batch_size = batch_size)
test_dl <- dataloader(test_ds, batch_size = batch_size)
Data loaders, too, may be queried for their length. Here, length means: how many batches?
train_dl$.length()
valid_dl$.length()
test_dl$.length()
490
18
18
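As a quick sanity check (plain arithmetic, not part of the original recipe), these batch counts are simply the number of items divided by the batch size, rounded up:
ceiling(train_ds$.length() / batch_size)  # 490
ceiling(valid_ds$.length() / batch_size)  # 18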
Some birds
Next, let's view a few of the images. We can retrieve the first batch, images and corresponding classes, by creating an iterator from the dataloader and calling next() on it:
# for display purposes, here we are actually using a batch_size of 24
batch <- train_dl$.iter()$.next()
batch
batch is a list, its first item being the image tensors:
[1] 24 3 224 224
And its second, the classes:
[1] 24
Classes are coded as integers, to be used as indices into a vector of class names. We'll use those to label the images.
classes <- batch[[2]]
classes
torch_tensor
1
1
1
1
1
2
2
2
2
2
3
3
3
3
3
4
4
4
4
4
5
5
5
5
[ GPULongType{24} ]
The image tensors have shape batch_size x num_channels x height x width. For plotting with as.raster(), we need to reshape the images such that the channels come last. We also undo the normalization applied by the dataloader.
Here are the first twenty-four images:
library(dplyr)
images <- as_array(batch[[1]]) %>% aperm(perm = c(1, 3, 4, 2))
mean <- c(0.485, 0.456, 0.406)
std <- c(0.229, 0.224, 0.225)
images <- std * images + mean
images <- images * 255
images[images > 255] <- 255
images[images < 0] <- 0
par(mfcol = c(4,6), mar = rep(1, 4))
images %>%
  purrr::array_tree(1) %>%
  purrr::set_names(class_names[as_array(classes)]) %>%
  purrr::map(as.raster, max = 255) %>%
  purrr::iwalk(~{plot(.x); title(.y)})
Model
The backbone of our model is a pre-trained instance of ResNet.
model <- model_resnet18(pretrained = TRUE)
But we want to distinguish among our 225 bird species, while ResNet was trained on 1000 different classes. What can we do? We simply replace the output layer.
The new output layer is also the only one whose weights we are going to train. Technically, we could backpropagate through the complete model and fine-tune ResNet's weights as well, but this would slow down training considerably. In fact, the choice is not all-or-none: it is up to us how many of the original parameters to keep fixed and how many to "set free" for fine-tuning. For the task at hand, we'll be content to just train the newly added output layer: with the abundance of animals, including birds, the trained ResNet has seen, we expect it to know a lot about them!
model$parameters %>% purrr::walk(function(param) param$requires_grad_(FALSE))
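This freezing doesn't have to be all-or-nothing, though. Purely as an illustration (not something we do in this post), here is a sketch of what selectively unfreezing the last residual block could look like; it assumes the layer4 naming used by torchvision's ResNet implementation.
# hypothetical: additionally fine-tune the last residual block (layer4),
# leaving all earlier layers frozen
model$layer4$parameters %>% purrr::walk(function(param) param$requires_grad_(TRUE))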
To replace the output layer, the model is modified in place:
num_features <- model$fc$in_features
model$fc <- nn_linear(in_features = num_features, out_features = length(class_names))
Now put the modified model on the GPU (if available):
model <- model$to(device = device)
Training
For optimization, we use cross-entropy loss and stochastic gradient descent.
criterion <- nn_cross_entropy_loss()
optimizer <- optim_sgd(model$parameters, lr = 0.1, momentum = 0.9)
Finding an optimally efficient learning rate
We set the learning rate to 0.1, but that is really just a formality. As has become widely known thanks to the excellent fast.ai lectures, it makes sense to spend some time upfront determining an efficient learning rate. While torch does not, out of the box, provide a tool like fast.ai's learning rate finder, the logic is straightforward to implement. Here's how to find a good learning rate, as translated to R from Sylvain Gugger's post:
# ported from: https://sgugger.github.io/how-do-you-find-a-good-learning-rate.html
losses <- c()
log_lrs <- c()
find_lr <- function(init_value = 1e-8, final_value = 10, beta = 0.98) {
  num <- train_dl$.length()
  mult <- (final_value / init_value)^(1 / num)
  lr <- init_value
  optimizer$param_groups[[1]]$lr <- lr
  avg_loss <- 0
  best_loss <- 0
  batch_num <- 0
  coro::loop(for (b in train_dl) {
    batch_num <- batch_num + 1
    optimizer$zero_grad()
    output <- model(b[[1]]$to(device = device))
    loss <- criterion(output, b[[2]]$to(device = device))
    # compute the smoothed loss
    avg_loss <- beta * avg_loss + (1 - beta) * loss$item()
    smoothed_loss <- avg_loss / (1 - beta^batch_num)
    # stop if the loss is exploding
    if (batch_num > 1 && smoothed_loss > 4 * best_loss) break
    # record the best loss
    if (smoothed_loss < best_loss || batch_num == 1) best_loss <- smoothed_loss
    # store the values
    losses <<- c(losses, smoothed_loss)
    log_lrs <<- c(log_lrs, (log(lr, 10)))
    loss$backward()
    optimizer$step()
    # update the lr for the next step
    lr <- lr * mult
    optimizer$param_groups[[1]]$lr <- lr
  })
}
find_lr()
df <- data.frame(log_lrs = log_lrs, losses = losses)
ggplot(df, aes(log_lrs, losses)) + geom_point(size = 1) + theme_classic()
The best learning rate is not the exact one where loss is at its minimum. Instead, it should be picked somewhat earlier on the curve, while loss is still decreasing. 0.05 looks like a sensible choice.
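If you prefer a programmatic pick over eyeballing the plot, one common heuristic (an assumption on our part, not part of Gugger's recipe) is to take the learning rate at the point of steepest descent of the smoothed loss:
# index of the largest drop in smoothed loss between consecutive steps
steepest <- which.min(diff(losses))
10 ^ log_lrs[steepest]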
This value is nothing but an anchor, however. Learning rate schedulers allow the learning rate to evolve according to a proven algorithm. Among others, torch implements one-cycle learning (@abs-1708-07120), cyclical learning rates (Smith 2015), and cosine annealing with warm restarts (Loshchilov and Hutter 2016).
Here, we use lr_one_cycle(), passing in our newly found, optimally efficient, hopefully, value of 0.05 as the maximum learning rate. lr_one_cycle() will start with a low rate, then gradually ramp up until it reaches the allowed maximum. After that, the learning rate will slowly, continuously decrease until it ends up slightly below its initial value.
All this happens not once per epoch, but exactly once overall, which is why the name contains one_cycle. Here is how the evolution of the learning rate looks:
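To see the shape of that schedule for yourself, a minimal sketch along the following lines should work; it uses a throwaway optimizer so the one we actually train with stays untouched, and it assumes ten epochs, matching the training run below.
# visualize the one-cycle schedule with a dummy optimizer
sched_opt <- optim_sgd(model$parameters, lr = 0.05)
sched <- lr_one_cycle(sched_opt, max_lr = 0.05, epochs = 10, steps_per_epoch = train_dl$.length())
lrs <- c()
for (i in 1:(10 * train_dl$.length())) {
  sched$step()
  lrs <- c(lrs, sched_opt$param_groups[[1]]$lr)
}
plot(lrs, type = "l", xlab = "training step", ylab = "learning rate")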
Before we start training, let's quickly re-initialize the model, so as to start from a clean slate:
model <- model_resnet18(pretrained = TRUE)
model$parameters %>% purrr::walk(function(param) param$requires_grad_(FALSE))
num_features <- model$fc$in_features
model$fc <- nn_linear(in_features = num_features, out_features = length(class_names))
model <- model$to(device = device)
criterion <- nn_cross_entropy_loss()
optimizer <- optim_sgd(model$parameters, lr = 0.05, momentum = 0.9)
And instantiate the scheduler:
num_epochs = 10
scheduler <- optimizer %>%
lr_one_cycle(max_lr = 0.05, epochs = num_epochs, steps_per_epoch = train_dl$.length())
Training loop
Now we train for ten epochs. For every training batch, we call scheduler$step() to adjust the learning rate. Notably, this has to be done after optimizer$step().
train_batch <- function(b) {
  optimizer$zero_grad()
  output <- model(b[[1]])
  loss <- criterion(output, b[[2]]$to(device = device))
  loss$backward()
  optimizer$step()
  scheduler$step()
  loss$item()
}

valid_batch <- function(b) {
  output <- model(b[[1]])
  loss <- criterion(output, b[[2]]$to(device = device))
  loss$item()
}
for (epoch in 1:num_epochs) {
  model$train()
  train_losses <- c()
  coro::loop(for (b in train_dl) {
    loss <- train_batch(b)
    train_losses <- c(train_losses, loss)
  })
  model$eval()
  valid_losses <- c()
  coro::loop(for (b in valid_dl) {
    loss <- valid_batch(b)
    valid_losses <- c(valid_losses, loss)
  })
  cat(sprintf("\nLoss at epoch %d: training: %3f, validation: %3f\n", epoch, mean(train_losses), mean(valid_losses)))
}
Loss at epoch 1: training: 2.662901, validation: 0.790769
Loss at epoch 2: training: 1.543315, validation: 1.014409
Loss at epoch 3: training: 1.376392, validation: 0.565186
Loss at epoch 4: training: 1.127091, validation: 0.575583
Loss at epoch 5: training: 0.916446, validation: 0.281600
Loss at epoch 6: training: 0.775241, validation: 0.215212
Loss at epoch 7: training: 0.639521, validation: 0.151283
Loss at epoch 8: training: 0.538825, validation: 0.106301
Loss at epoch 9: training: 0.407440, validation: 0.083270
Loss at epoch 10: training: 0.354659, validation: 0.080389
We seem to have made good progress, but we don't yet know anything about classification accuracy in absolute terms. Let's check that out on the test set.
Test set accuracy
Finally, let's calculate accuracy on the test set.
model$eval()
test_batch <- function(b) {
  output <- model(b[[1]])
  labels <- b[[2]]$to(device = device)
  loss <- criterion(output, labels)
  test_losses <<- c(test_losses, loss$item())
  # torch_max returns a list, with position 1 containing the values
  # and position 2 containing the respective indices
  predicted <- torch_max(output$data(), dim = 2)[[2]]
  total <<- total + labels$size(1)
  # add number of correct classifications in this batch to the aggregate
  correct <<- correct + (predicted == labels)$sum()$item()
}

test_losses <- c()
total <- 0
correct <- 0

for (b in enumerate(test_dl)) {
  test_batch(b)
}
mean(test_losses)
[1] 0.03719
test_accuracy <- correct/total
test_accuracy
[1] 0.98756
An impressive result, given how many different species there are!
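To round things off, here is a minimal, hypothetical usage sketch showing how the trained network could be asked about a single image; it assumes that an individual dataset item, as returned by .getitem(), is a list with the image tensor in first position.
# classify a single image from the test set
item <- test_ds$.getitem(1)
input <- item[[1]]$unsqueeze(1)  # add a batch dimension: 1 x 3 x 224 x 224
pred <- model(input)
# torch_max returns values and indices; position 2 holds the predicted class index
idx <- torch_max(pred, dim = 2)[[2]]
class_names[as_array(idx$cpu())]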
Wrapup
Hopefully, this has been a useful introduction to classifying images with torch, as well as to its non-domain-specific architectural elements, such as datasets, data loaders, and learning rate schedulers. Future posts will explore other domains, and also move beyond "hello world" in image recognition. Thanks for reading!
He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2015. "Deep Residual Learning for Image Recognition." CoRR abs/1512.03385. http://arxiv.org/abs/1512.03385.
Loshchilov, Ilya, and Frank Hutter. 2016. "SGDR: Stochastic Gradient Descent with Warm Restarts." CoRR abs/1608.03983. http://arxiv.org/abs/1608.03983.
Smith, Leslie N. 2015. "No More Pesky Learning Rate Guessing Games." CoRR abs/1506.01186. http://arxiv.org/abs/1506.01186.