Before we begin, apologies to our Spanish-speaking readers: the package's name had to be chosen between two Spanish words, and in the end, it all came down to a coin toss.
As of this writing, we are very pleased with the rapid adoption torch has seen so far, both in the direct use of its core functionality and in the packages built on top of it.
In applied scenarios, however (training and validating at precise steps, computing metrics and acting on them, changing hyperparameters dynamically during the process), torch code can involve a non-negligible amount of boilerplate. First of all, there is the main loop over epochs, and inside it, the loops over training and validation batches. In addition, steps such as switching the model between training and evaluation modes, zeroing out and computing gradients, and applying the optimizer updates have to be performed in the correct order. Finally, care must be taken that, at any moment, tensors are located on the expected device.
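To make that boilerplate concrete, here is a schematic sketch of such a manual loop. Everything in it (model, optimizer, criterion, n_epochs, and the dataloaders train_dl and valid_dl) is a placeholder standing in for objects defined elsewhere; the point is the sequence of steps, not a runnable program.

device <- if (cuda_is_available()) "cuda" else "cpu"
model$to(device = device)

for (epoch in 1:n_epochs) {

  model$train()                                # switch to training mode
  coro::loop(for (b in train_dl) {
    optimizer$zero_grad()                      # zero out gradients
    output <- model(b$x$to(device = device))   # forward pass
    loss <- criterion(output, b$y$to(device = device))
    loss$backward()                            # compute gradients
    optimizer$step()                           # update the weights
  })

  model$eval()                                 # switch to evaluation mode
  with_no_grad({
    coro::loop(for (b in valid_dl) {
      output <- model(b$x$to(device = device))
      loss <- criterion(output, b$y$to(device = device))
      # ... accumulate validation metrics here ...
    })
  })
}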
Wouldn't it be dreamy if, as the “Head First…” series popular in the early 2000s used to ask, there were a way to eliminate these manual steps while keeping the flexibility? With luz, there is.
In this post, our focus is on two things: first, the streamlined workflow itself; and second, the general mechanisms that make customization possible. For more detailed examples of the latter, plus concrete coding instructions, we'll link to the (already extensive) documentation.
Train, validate, then test: a basic deep learning workflow with luz
To demonstrate the essential workflow, we use a dataset that is readily available and unobtrusive in terms of preprocessing: the dogs vs. cats collection that comes with torchdatasets. torchvision will be needed for image transformations; apart from those two packages, all we need are torch and luz.
Data
The dataset is downloaded from Kaggle; you'll need to edit the path below to reflect the location of your Kaggle token.
dir <- "~/Downloads/dogs-vs-cats"
ds <- torchdatasets::dogs_vs_cats_dataset(
dir,
token = "~/.kaggle/kaggle.json",
transform = . %>%
torchvision::transform_to_tensor() %>%
torchvision::transform_resize(size = c(224, 224)) %>%
torchvision::transform_normalize(rep(0.5, 3), rep(0.5, 3)),
target_transform = function(x) as.double(x) - 1
)
Conveniently, we can use dataset_subset() to partition the data into training, validation, and test sets.
train_ids <- sample(1:length(ds), size = 0.6 * length(ds))
valid_ids <- sample(setdiff(1:length(ds), train_ids), size = 0.2 * length(ds))
test_ids <- setdiff(1:length(ds), union(train_ids, valid_ids))
train_ds <- dataset_subset(ds, indices = train_ids)
valid_ds <- dataset_subset(ds, indices = valid_ids)
test_ds <- dataset_subset(ds, indices = test_ids)
Next, we instantiate a dataloader for each of the three subsets.
train_dl <- dataloader(train_ds, batch_size = 64, shuffle = TRUE, num_workers = 4)
valid_dl <- dataloader(valid_ds, batch_size = 64, num_workers = 4)
test_dl <- dataloader(test_ds, batch_size = 64, num_workers = 4)
And that's it for the data; nothing in the workflow has changed so far. Neither is there a difference in how the model is defined.
Model
To speed up training, we build on a pre-trained AlexNet (Krizhevsky 2014).
net <- torch::nn_module(
initialize = function(output_size) {
self$model <- model_alexnet(pretrained = TRUE)
for (par in self$parameters) {
par$requires_grad_(FALSE)
}
self$model$classifier <- nn_sequential(
nn_dropout(0.5),
nn_linear(9216, 512),
nn_relu(),
nn_linear(512, 256),
nn_relu(),
nn_linear(256, output_size)
)
},
forward = function(x) {
self$model(x)[, 1]
}
)
Looking closely, you'll see that all we've done so far is define the model. Unlike in a pure-torch workflow, we neither instantiate it nor move it to an eventual GPU.
Expanding on the latter: all device handling is managed by luz. It checks whether a CUDA-capable GPU is available and, if one is found, makes sure that both model weights and data tensors are moved there transparently whenever necessary. The same goes for the opposite direction: predictions computed on the test set, for example, are silently transferred to the CPU, ready for the user to manipulate them further in R. But as far as predictions go, we're not quite there yet: first comes model training, and this is where luz makes a difference you notice right away.
Training
Below, you see four calls to luz: two of them required in every setting, and two that are case-dependent. The ones that are always needed are setup() and fit():
- In setup(), you tell luz what the loss should be and which optimizer to use. Optionally, beyond the loss itself (the default metric, in the sense that it informs weight updating), you can have luz compute additional metrics. Here, for example, we request classification accuracy. (For a human watching a progress bar, a two-class accuracy of 0.91 says a lot more than a cross-entropy loss of 1.26.)
- In fit(), you pass references to the training and validation dataloaders. While a default exists for the number of epochs to train for, you'll normally want to pass a custom value for this parameter, too.
The case-dependent calls here are set_hparams() and set_opt_hparams(). Here,

- set_hparams() appears because, in the model definition, we had initialize() take a parameter, output_size. Any arguments expected by initialize() need to be passed this way.
- set_opt_hparams() is there because we want to use a non-default learning rate with optim_adam(). Were we content with the default, no such call would be in order.
fitted <- net %>%
setup(
loss = nn_bce_with_logits_loss(),
optimizer = optim_adam,
metrics = list(
luz_metric_binary_accuracy_with_logits()
)
) %>%
set_hparams(output_size = 1) %>%
set_opt_hparams(lr = 0.01) %>%
fit(train_dl, epochs = 3, valid_data = valid_dl)
The output is as follows:
Epoch 1/3
Train metrics: Loss: 0.8692 - Acc: 0.9093
Valid metrics: Loss: 0.1816 - Acc: 0.9336
Epoch 2/3
Train metrics: Loss: 0.1366 - Acc: 0.9468
Valid metrics: Loss: 0.1306 - Acc: 0.9458
Epoch 3/3
Train metrics: Loss: 0.1225 - Acc: 0.9507
Valid metrics: Loss: 0.1339 - Acc: 0.947
Training done, we can ask luz to save the trained model:
luz_save(fitted, "dogs-and-cats.pt")
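Should we need it later, say in a fresh R session, the fitted model can be restored with luz_load(), the counterpart to luz_save():

fitted <- luz_load("dogs-and-cats.pt")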
Test set prediction
And finally, predict() obtains predictions on the data pointed to by a passed-in dataloader, here the test set. It expects a fitted model as its first argument.
preds <- predict(fitted, test_dl)
probs <- torch_sigmoid(preds)
print(probs, n = 5)
torch_tensor
1.2959e-01
1.3032e-03
6.1966e-05
5.9575e-01
4.5577e-03
... (the output was truncated (use n=-1 to disable))
( CPUFloatType{5000} )
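probs now holds one probability per test image. As a quick sketch of how one might turn these into hard class predictions in R (thresholding at 0.5; the mapping of 0/1 back to cat and dog follows the label encoding established by target_transform above):

probs_r <- as_array(probs)                # copy tensor contents into an R vector
pred_class <- as.integer(probs_r > 0.5)   # threshold at 0.5
table(pred_class)                         # distribution of predicted classes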
And that's it for the complete workflow. Should you have prior experience with Keras, this should feel pretty familiar. The same can be said of the versatile, yet standardized, customization technique implemented in luz.
How to do (almost) anything at any time
Like Keras, luz has the concept of callbacks that can “hook into” the training process and execute arbitrary R code. Specifically, code can be scheduled to run at any of the following points in time (a minimal example follows the list):
- when the overall training process starts or ends (on_fit_begin() / on_fit_end());
- when an epoch of training plus validation starts or ends (on_epoch_begin() / on_epoch_end());
- when, during an epoch, the training (validation, respectively) half starts or ends (on_train_begin() / on_train_end(); on_valid_begin() / on_valid_end());
- when, during training (validation, respectively), a new batch is about to be, or has been, processed (on_train_batch_begin() / on_train_batch_end(); on_valid_batch_begin() / on_valid_batch_end());
- and even at specific landmarks inside the “innermost” training/validation logic, such as “after loss computation”, “after backward()”, or “after step()”.
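Here, then, is a minimal sketch of such a custom callback, following the luz_callback() pattern documented in luz. The name print_callback and its message argument are made up for illustration; inside callback methods, luz's context object ctx gives access to the current training state.

print_callback <- luz_callback(
  name = "print_callback",
  initialize = function(message) {
    self$message <- message
  },
  on_epoch_end = function() {
    # ctx, provided by luz, holds the current training state
    cat("Epoch ", ctx$epoch, ": ", self$message, "\n", sep = "")
  }
)

An instance, created as print_callback(message = "done!"), would then be passed to fit() in the callbacks list, just like the built-in callbacks shown below.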
While this technique lets you implement any logic you wish, luz already comes equipped with a very useful set of callbacks.

For example:
- luz_callback_model_checkpoint() periodically saves model weights.
- luz_callback_lr_scheduler() allows activating one of torch's learning rate schedulers. Different schedulers exist, each following their own logic in how they dynamically adjust the learning rate.
- luz_callback_early_stopping() terminates training once model performance stops improving.
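To give an idea of the second item (a sketch only; the choice of torch's lr_step() scheduler as well as the step_size and gamma values are arbitrary), such a callback could be constructed like so:

luz_callback_lr_scheduler(
  lr_step,        # torch's step-wise learning rate scheduler
  step_size = 1,  # decay every epoch
  gamma = 0.1     # multiply the learning rate by 0.1 each time
)

The resulting object would go into the callbacks list passed to fit(), just like the checkpointing and early-stopping callbacks used next.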
Callbacks are passed to fit() in a list. Here, we adapt the above example, making sure that (1) model weights are saved after each epoch and (2) training terminates if validation loss does not improve for two epochs in a row.
fitted <- net %>%
setup(
loss = nn_bce_with_logits_loss(),
optimizer = optim_adam,
metrics = list(
luz_metric_binary_accuracy_with_logits()
)
) %>%
set_hparams(output_size = 1) %>%
set_opt_hparams(lr = 0.01) %>%
fit(train_dl,
epochs = 10,
valid_data = valid_dl,
callbacks = list(luz_callback_model_checkpoint(path = "./models"),
luz_callback_early_stopping(patience = 2)))
What about other types of flexibility requirements, such as the scenario of multiple, interacting models, each equipped with their own loss functions and optimizers? In such cases, the code will get a bit longer than what we've seen here, but luz can still help considerably with streamlining the workflow.
To conclude: with luz, you lose nothing of the flexibility that comes with torch, while gaining a lot in code simplicity, modularity, and maintainability. We'd be happy to hear you'll give it a try!
Thanks for reading!
Photo by JD Rincs on Unsplash
Krizhevsky, Alex. 2014. “One Weird Trick for Parallelizing Convolutional Neural Networks.” CoRR abs/1404.5997. http://arxiv.org/abs/1404.5997.