Before we begin, apologies to our Spanish-speaking readers: the package's name had to be chosen between two Spanish words, and in the end, it all came down to a coin toss.
As of this writing, we are very pleased with the rapid adoption torch has seen so far, both in the direct use of its core functionality and in the packages built on top of it.
In applied scenarios, however (training and validating at precise steps, computing metrics and acting on them, changing hyperparameters dynamically during the process), torch code can involve a non-negligible amount of boilerplate. First of all, there is the main loop over epochs, and inside it, the loops over training and validation batches. In addition, steps such as switching the model between training and evaluation modes, zeroing out and computing gradients, and applying the optimizer updates have to be performed in the correct order. Finally, care must be taken that, at any moment, tensors are located on the expected device.
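To make that boilerplate concrete, here is a schematic sketch of such a manual loop. Everything in it (model, optimizer, criterion, n_epochs, and the dataloaders train_dl and valid_dl) is a placeholder standing in for objects defined elsewhere; the point is the sequence of steps, not a runnable program.

device <- if (cuda_is_available()) "cuda" else "cpu"
model$to(device = device)

for (epoch in 1:n_epochs) {

  model$train()                                # switch to training mode
  coro::loop(for (b in train_dl) {
    optimizer$zero_grad()                      # zero out gradients
    output <- model(b$x$to(device = device))   # forward pass
    loss <- criterion(output, b$y$to(device = device))
    loss$backward()                            # compute gradients
    optimizer$step()                           # update the weights
  })

  model$eval()                                 # switch to evaluation mode
  with_no_grad({
    coro::loop(for (b in valid_dl) {
      output <- model(b$x$to(device = device))
      loss <- criterion(output, b$y$to(device = device))
      # ... accumulate validation metrics here ...
    })
  })
}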
Wouldn't it be dreamy if, as the “Head First…” series popular in the early 2000s used to ask, there were a way to eliminate these manual steps while keeping the flexibility? With luz, there is.
In this post, our focus is on two things: first, the streamlined workflow itself; and second, the general mechanisms that make customization possible. For more detailed examples of the latter, plus concrete coding instructions, we'll link to the (already extensive) documentation.
Train, validate, then test: a basic deep learning workflow with luz
To demonstrate the essential workflow, we use a dataset that is readily available and unobtrusive in terms of preprocessing: the dogs vs. cats collection that comes with torchdatasets. torchvision will be needed for image transformations; apart from those two packages, all we need are torch and luz.
Data
The dataset is downloaded from Kaggle; you'll need to edit the path below to reflect the location of your Kaggle token.
dir <- "~/Downloads/dogs-vs-cats"
ds <- torchdatasets::dogs_vs_cats_dataset(
dir,
token = "~/.kaggle/kaggle.json",
transform = . %>%
torchvision::transform_to_tensor() %>%
torchvision::transform_resize(size = c(224, 224)) %>%
torchvision::transform_normalize(rep(0.5, 3), rep(0.5, 3)),
target_transform = function(x) as.double(x) - 1
)
Conveniently, we can use dataset_subset() to partition the data into training, validation, and test sets.
train_ids <- sample(1:length(ds), size = 0.6 * length(ds))
valid_ids <- sample(setdiff(1:length(ds), train_ids), size = 0.2 * length(ds))
test_ids <- setdiff(1:length(ds), union(train_ids, valid_ids))
train_ds <- dataset_subset(ds, indices = train_ids)
valid_ds <- dataset_subset(ds, indices = valid_ids)
test_ds <- dataset_subset(ds, indices = test_ids)
Next, we instantiate a dataloader for each of the three subsets.
train_dl <- dataloader(train_ds, batch_size = 64, shuffle = TRUE, num_workers = 4)
valid_dl <- dataloader(valid_ds, batch_size = 64, num_workers = 4)
test_dl <- dataloader(test_ds, batch_size = 64, num_workers = 4)
And that's it for the data; nothing in the workflow has changed so far. Neither is there a difference in how the model is defined.
Model
To speed up training, we build on a pre-trained AlexNet (Krizhevsky 2014).
net <- torch::nn_module(
initialize = function(output_size) {
self$model <- model_alexnet(pretrained = TRUE)
for (par in self$parameters) {
par$requires_grad_(FALSE)
}
self$model$classifier <- nn_sequential(
nn_dropout(0.5),
nn_linear(9216, 512),
nn_relu(),
nn_linear(512, 256),
nn_relu(),
nn_linear(256, output_size)
)
},
forward = function(x) {
self$model(x)[, 1]
}
)
Looking closely, you'll see that all we've done so far is define the model. Unlike in a pure-torch workflow, we neither instantiate it nor move it to an eventual GPU.
Expanding on the latter: all device handling is managed by luz. It checks whether a CUDA-capable GPU is available and, if one is found, makes sure that both model weights and data tensors are moved there transparently whenever necessary. The same goes for the opposite direction: predictions computed on the test set, for example, are silently transferred to the CPU, ready for the user to manipulate them further in R. But as far as predictions go, we're not quite there yet: first comes model training, and this is where luz makes a difference you notice right away.
Training
Below, you see four calls to luz: two of them required in every setting, and two that are case-dependent. The ones that are always needed are setup() and fit():
- In setup(), you tell luz what the loss should be and which optimizer to use. Optionally, beyond the loss itself (the default metric, in the sense that it informs weight updating), you can have luz compute additional metrics. Here, for example, we request classification accuracy. (For a human watching a progress bar, a two-class accuracy of 0.91 says a lot more than a cross-entropy loss of 1.26.)
- In fit(), you pass references to the training and validation dataloaders. While a default exists for the number of epochs to train for, you'll normally want to pass a custom value for this parameter, too.
The case-dependent calls here are set_hparams() and set_opt_hparams(). Here,

- set_hparams() appears because, in the model definition, we had initialize() take a parameter, output_size. Any arguments expected by initialize() need to be passed this way.
- set_opt_hparams() is there because we want to use a non-default learning rate with optim_adam(). Were we content with the default, no such call would be in order.
fitted <- net %>%
setup(
loss = nn_bce_with_logits_loss(),
optimizer = optim_adam,
metrics = list(
luz_metric_binary_accuracy_with_logits()
)
) %>%
set_hparams(output_size = 1) %>%
set_opt_hparams(lr = 0.01) %>%
fit(train_dl, epochs = 3, valid_data = valid_dl)
The output is as follows:
Epoch 1/3
Train metrics: Loss: 0.8692 - Acc: 0.9093
Valid metrics: Loss: 0.1816 - Acc: 0.9336
Epoch 2/3
Train metrics: Loss: 0.1366 - Acc: 0.9468
Valid metrics: Loss: 0.1306 - Acc: 0.9458
Epoch 3/3
Train metrics: Loss: 0.1225 - Acc: 0.9507
Valid metrics: Loss: 0.1339 - Acc: 0.947
Training done, we can ask luz to save the trained model:
luz_save(fitted, "dogs-and-cats.pt")
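Should we need it later, say in a fresh R session, the fitted model can be restored with luz_load(), the counterpart to luz_save():

fitted <- luz_load("dogs-and-cats.pt")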
Test set prediction
And finally, predict() obtains predictions on the data pointed to by a passed-in dataloader, here the test set. It expects a fitted model as its first argument.
preds <- predict(fitted, test_dl)
probs <- torch_sigmoid(preds)
print(probs, n = 5)
torch_tensor
1.2959e-01
1.3032e-03
6.1966e-05
5.9575e-01
4.5577e-03
... (the output was truncated (use n=-1 to disable))
( CPUFloatType{5000} )
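probs now holds one probability per test image. As a quick sketch of how one might turn these into hard class predictions in R (thresholding at 0.5; the mapping of 0/1 back to cat and dog follows the label encoding established by target_transform above):

probs_r <- as_array(probs)                # copy tensor contents into an R vector
pred_class <- as.integer(probs_r > 0.5)   # threshold at 0.5
table(pred_class)                         # distribution of predicted classes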
And that's it for the complete workflow. Should you have prior experience with Keras, this should feel pretty familiar. The same can be said of the versatile, yet standardized, customization technique implemented in luz.
How to do (almost) anything at any time
Like Keras, luz has the concept of callbacks that can “hook into” the training process and execute arbitrary R code. Specifically, code can be scheduled to run at any of the following points in time (a minimal example follows the list):
- when the overall training process starts or ends (on_fit_begin() / on_fit_end());
- when an epoch of training plus validation starts or ends (on_epoch_begin() / on_epoch_end());
- when, during an epoch, the training (validation, respectively) half starts or ends (on_train_begin() / on_train_end(); on_valid_begin() / on_valid_end());
- when, during training (validation, respectively), a new batch is about to be, or has been, processed (on_train_batch_begin() / on_train_batch_end(); on_valid_batch_begin() / on_valid_batch_end());
- and even at specific landmarks inside the “innermost” training/validation logic, such as “after loss computation”, “after backward()”, or “after step()”.
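Here, then, is a minimal sketch of such a custom callback, following the luz_callback() pattern documented in luz. The name print_callback and its message argument are made up for illustration; inside callback methods, luz's context object ctx gives access to the current training state.

print_callback <- luz_callback(
  name = "print_callback",
  initialize = function(message) {
    self$message <- message
  },
  on_epoch_end = function() {
    # ctx, provided by luz, holds the current training state
    cat("Epoch ", ctx$epoch, ": ", self$message, "\n", sep = "")
  }
)

An instance, created as print_callback(message = "done!"), would then be passed to fit() in the callbacks list, just like the built-in callbacks shown below.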
While this technique lets you implement any logic you wish, luz already comes equipped with a very useful set of callbacks.

For example:
- luz_callback_model_checkpoint() periodically saves model weights.
- luz_callback_lr_scheduler() allows activating one of torch's learning rate schedulers. Different schedulers exist, each following their own logic in how they dynamically adjust the learning rate.
- luz_callback_early_stopping() terminates training once model performance stops improving.
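To give an idea of the second item (a sketch only; the choice of torch's lr_step() scheduler as well as the step_size and gamma values are arbitrary), such a callback could be constructed like so:

luz_callback_lr_scheduler(
  lr_step,        # torch's step-wise learning rate scheduler
  step_size = 1,  # decay every epoch
  gamma = 0.1     # multiply the learning rate by 0.1 each time
)

The resulting object would go into the callbacks list passed to fit(), just like the checkpointing and early-stopping callbacks used next.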
Callbacks are passed to fit() in a list. Here, we adapt the above example, making sure that (1) model weights are saved after each epoch and (2) training terminates if validation loss does not improve for two epochs in a row.
fitted <- net %>%
setup(
loss = nn_bce_with_logits_loss(),
optimizer = optim_adam,
metrics = list(
luz_metric_binary_accuracy_with_logits()
)
) %>%
set_hparams(output_size = 1) %>%
set_opt_hparams(lr = 0.01) %>%
fit(train_dl,
epochs = 10,
valid_data = valid_dl,
callbacks = list(luz_callback_model_checkpoint(path = "./models"),
luz_callback_early_stopping(patience = 2)))
What about other types of flexibility requirements, such as the scenario of multiple, interacting models, each equipped with their own loss functions and optimizers? In such cases, the code will get a bit longer than what we've seen here, but luz can still help considerably with streamlining the workflow.
To conclude: with luz, you lose nothing of the flexibility that comes with torch, while gaining a lot in code simplicity, modularity, and maintainability. We'd be happy to hear you'll give it a try!
Thanks for reading!
Photo by JD Rincs on Unsplash
Krizhevsky, Alex. 2014. “One Weird Trick for Parallelizing Convolutional Neural Networks.” CoRR abs/1404.5997. http://arxiv.org/abs/1404.5997.