Custom loops with luz

Luz is a higher level API for torch that is designed to be highly flexible by providing a layered API that allows it to be useful no matter the level of control your need for your training loop.

In the getting started vignette we have seen the basics of luz and how to quickly modify parts of the training loop using callbacks and custom metrics. In this document we will describe how luz allows the user to get fine-grained control of the training loop.

Apart from the use of callbacks, there are three more ways that you can use luz (depending on how much control you need):

Let’s consider a simplified version of the net that we implemented in the getting started vignette:

Multiple optimizers

Suppose we want to do an experiment where we train the first fully connected layer using a learning rate of 0.1 and the second one using a learning rate of 0.01. We will minimize the same nn_cross_entropy_loss() for both, but for the first layer we want to add L1 regularization on the weights.

In order to use luz for this, we will implement two methods in the net module:

set_optimizers: returns a named list of optimizers depending on the ctx.
loss: computes the loss depending on the selected optimizer.

Let’s go to the code:

net <- nn_module(
  "Net",
  initialize = function() {
    self$fc1 <- nn_linear(100, 50)
    self$fc1 <- nn_linear(50, 10)
  },
  forward = function(x) {
    x %>% 
      self$fc1() %>% 
      nnf_relu() %>% 
      self$fc2()
  },
  set_optimizers = function(lr_fc1 = 0.1, lr_fc2 = 0.01) {
    list(
      opt_fc1 = optim_adam(self$fc1$parameters, lr = lr_fc1),
      opt_fc2 = optim_adam(self$fc2$parameters, lr = lr_fc2)
    )
  },
  loss = function(input, target) {
    pred <- ctx$model(input)
  
    if (ctx$opt_name == "opt_fc1") 
      nnf_cross_entropy(pred, target) + torch_norm(self$fc1$weight, p = 1)
    else if (ctx$opt_name == "opt_fc2")
      nnf_cross_entropy(pred, target)
  }
)

Notice that the model optimizers will be initialized according to the set_optimizers() method’s return value (a list). In this case, we are initializing the optimizers using different model parameters and learning rates.

The loss() method is responsible for computing the loss that will then be back-propagated to compute gradients and update the weights. This loss() method can access the ctx object that will contain an opt_name field, describing which optimizer is currently being used. Note that this function will be called once for each optimizer for each training and validation step. See help("ctx") for complete information about the context object.

We can finally setup and fit this module, however we no longer need to specify optimizers and loss functions.

fitted <- net %>% 
  setup(metrics = list(luz_metric_accuracy)) %>% 
  fit(train_dl, epochs = 10, valid_data = test_dl)

Now let’s re-implement this same model using the slightly more flexible approach of overriding the training and validation step.

Fully flexible step

Instead of implementing the loss() method, we can implement the step() method. This allows us to flexibly modify what happens when training and validating for each batch in the dataset. You are now responsible for updating the weights by stepping the optimizers and back-propagating the loss.

net <- nn_module(
  "Net",
  initialize = function() {
    self$fc1 <- nn_linear(100, 50)
    self$fc1 <- nn_linear(50, 10)
  },
  forward = function(x) {
    x %>% 
      self$fc1() %>% 
      nnf_relu() %>% 
      self$fc2()
  },
  set_optimizers = function(lr_fc1 = 0.1, lr_fc2 = 0.01) {
    list(
      opt_fc1 = optim_adam(self$fc1$parameters, lr = lr_fc1),
      opt_fc2 = optim_adam(self$fc2$parameters, lr = lr_fc2)
    )
  },
  step = function() {
    ctx$loss <- list()
    for (opt_name in names(ctx$optimizers)) {
    
      pred <- ctx$model(ctx$input)
      opt <- ctx$optimizers[[opt_name]]
      loss <- nnf_cross_entropy(pred, target)
      
      if (opt_name == "opt_fc1") {
        # we have L1 regularization in layer 1
        loss <- nnf_cross_entropy(pred, target) + 
          torch_norm(self$fc1$weight, p = 1)
      }
        
      if (ctx$training) {
        opt$zero_grad()
        loss$backward()
        opt$step()  
      }
      
      ctx$loss[[opt_name]] <- loss$detach()
    }
  }
)

The important things to notice here are:

The step() method is used for both training and validation. You need to be careful to only modify the weights when training. Again, you can get complete information regarding the context object using help("ctx").
ctx$optimizers is a named list holding each optimizer that was created when the set_optimizers() method was called.
You need to manually track the losses by saving saving them in a named list in ctx$loss. By convention, we use the same name as the optimizer it refers to. It is good practice to detach() them before saving to reduce memory usage.
Callbacks that would be called inside the default step() method like on_train_batch_after_pred, on_train_batch_after_loss, etc, won’t be automatically called. You can still cal them manually by adding ctx$call_callbacks("<callback name>") inside your training step. See the code for fit_one_batch() and valid_one_batch to find all the callbacks that won’t be called.

Custom loops with luz

Multiple optimizers

Fully flexible step

Next steps