Getting started

Training

While the training process is fully automated, the choice of hyperparameters (i.e. what to train and how) is left to you whenever the default values are not satisfactory.

It helps to understand the meta-architecture of our models. They are composed of a stack of 2 or 3 layers:

  1. Backbone: contains most of the weights. It generates abstract representations of the inputs for the head(s).
  2. Neck: an optional layer that fuses the backbone's features at multiple scales.
  3. Head(s): a head solves a specific ML task. Some heads attach to a single level; others, to multiple (with shared weights).
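The data flow through this stack can be sketched in plain Python (all names and stub values below are illustrative, not our actual API):

```python
def backbone(image):
    # Stub: produce abstract feature maps at levels 3-5, keyed by level index.
    return {level: f"features@{level}" for level in (3, 4, 5)}

def neck(features):
    # Stub: fuse the backbone's features across scales and extend to levels 3-7.
    return {level: f"fused@{level}" for level in range(3, 8)}

def shared_head(features):
    # A shared head applies the same weights at every level it attaches to.
    return {level: f"prediction@{level}" for level in features}

def forward(image):
    # Backbone -> (optional) neck -> head(s).
    return shared_head(neck(backbone(image)))
```

The point of the sketch is the shape of the data: each layer consumes and produces a mapping from level index to features, which is what lets a shared head attach to several levels at once.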

Neural networks are not scale-invariant; they typically capture large spatial contexts by processing the input in "levels", each of which downscales the feature maps by a factor of 2. If the top level of a backbone (i.e. its "depth") is i, then its feature maps are downscaled by 2^i (the "stride") compared to the input image.
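Concretely, the stride at each level follows directly from the level index (a quick sketch; the depth of 5 and the 512px input size are made-up numbers for illustration):

```python
# Stride doubles at each level: stride = 2 ** level.
depth = 5  # top level of a hypothetical backbone

strides = {level: 2 ** level for level in range(depth + 1)}
for level, stride in strides.items():
    # A 512px input yields feature maps of 512 // stride pixels per side.
    print(f"level {level}: stride {stride}, feature map {512 // stride}px")
```

So at the top level of this backbone (level 5), the stride is 32 and a 512px input is reduced to 16px feature maps.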

Here's an example model with a backbone (top level: 5), a neck (bottom level: 3, top level: 7), and a shared head (same levels as the neck):

    block-beta
    columns 7
  
    classDef title fill:#0000,stroke:#0000;
  
    block:level_group
      columns 1
      Level
      l7["7"] space l6["6"] space l5["5"] space l4["4"] space l3["3"] space l2["2"] space l1["1"] space l0["0"]
  
      class Level,l7,l6,l5,l4,l3,l2,l1,l0 title
    end
    class level_group title
  
    block:stride_group
      columns 1
      Stride
      s7["128"] space s6["64"] space s5["32"] space s4["16"] space s3["8"] space s2["4"] space s1["2"] space s0["1"]
  
      class Stride,s7,s6,s5,s4,s3,s2,s1,s0 title
    end
    class stride_group title
  
  
    block:input_group
      columns 1
      space:15
      image
    end
    class input_group title

    block:backbone_group:1
      columns 1
      Backbone
      space:4 B5 space B4 space B3 space B2 space B1 space:2
      class Backbone title
    end

    image --> B1
    B1 --> B2
    B2 --> B3
    B3 --> B4
    B4 --> B5

    block:neck_group:1
      columns 1
      Neck
      N7 space N6 space N5 space N4 space N3 space:6
      class Neck title
    end
    
    B5 --> N5
    B4 --> N4
    B3 --> N3

    N3 --> N4
    N4 --> N5
    N5 --> N6
    N6 --> N7

    block:head_group:1
      columns 1
      Head
      h7["H"] space h6["H"] space h5["H"] space h4["H"] space h3["H"] space:6
      class Head title
    end
    N7 --> h7
    N6 --> h6
    N5 --> h5
    N4 --> h4
    N3 --> h3
  

Jobs

Once hyperparameters are chosen, you can submit a training job (i.e. a request to find an available GPU runner and train your model on the project's dataset).
The job is always in one of these states:

  1. Queued: the job is waiting for all the jobs you previously submitted in the same project to complete.
  2. Starting: the job has been assigned an available runner, which is downloading your dataset to its local workspace and generally setting things up.
  3. Running: training is in progress. During this phase, logs are uploaded gradually, so you can follow the training's progress.

While a job is starting or running, you can stop it, which cancels it as soon as possible. If a job is neither stopped nor errors out, it runs until it reaches either the specified maximum number of iterations or the maximum number of training hours; it is then considered done.
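The lifecycle above can be modeled as a small state machine (a sketch; the state names follow the description, everything else here is illustrative):

```python
from enum import Enum

class JobState(Enum):
    QUEUED = "queued"      # waiting for earlier jobs in the same project
    STARTING = "starting"  # runner assigned; dataset downloading, setup in progress
    RUNNING = "running"    # training in progress, logs streaming
    STOPPED = "stopped"    # canceled by the user while starting or running
    ERRORED = "errored"    # the job failed
    DONE = "done"          # reached the iteration or training-hours limit

# Per the description, a job can only be stopped while starting or running.
STOPPABLE = {JobState.STARTING, JobState.RUNNING}

def can_stop(state: JobState) -> bool:
    return state in STOPPABLE
```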

Models can be "duplicated": you can create a new model with the same hyperparameters and modify only the one or few you want to change.
The "config" tab in the logs page lets you compare model hyperparameters side by side.

Multi-task learning

(coming soon)