While the training process itself is fully automated, choosing hyperparameters (i.e. what and how to train) is up to you whenever the default values are not satisfactory.
It helps to understand the meta-architecture of our models. They are composed of a stack of two or three stages:
Neural networks are not scale-invariant, and they usually manage to understand large spatial contexts by processing the input in "levels", each of which downscales feature maps by a factor of 2. If the top level of a backbone (i.e. its "depth") is i, then the feature maps are downscaled by 2^i (the "stride") compared to the input image.
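The level/stride relationship above can be sketched in a few lines. This is purely illustrative (the input size of 1024 is an arbitrary assumption, not a platform default):

```python
def stride(level: int) -> int:
    """Downscaling factor of the feature maps at a given level: 2**level."""
    return 2 ** level

# Feature-map resolution at levels 3..7 for a hypothetical 1024x1024 input.
for level in range(3, 8):
    side = 1024 // stride(level)
    print(f"level {level}: stride {stride(level):>3}, feature map {side}x{side}")
```

For example, a backbone whose top level is 5 produces feature maps downscaled by a factor of 32.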
Here's an example model with a backbone (top level: 5), a neck (bottom level: 3, top level: 7), and a shared head (same levels as the neck):
```mermaid
block-beta
  columns 7
  classDef title fill:#0000,stroke:#0000;

  block:level_group
    columns 1
    Level
    l7["7"] space l6["6"] space l5["5"] space l4["4"] space l3["3"] space l2["2"] space l1["1"] space l0["0"]
    class Level,l7,l6,l5,l4,l3,l2,l1,l0 title
  end
  class level_group title

  block:stride_group
    columns 1
    Stride
    s7["128"] space s6["64"] space s5["32"] space s4["16"] space s3["8"] space s2["4"] space s1["2"] space s0["1"]
    class Stride,s7,s6,s5,s4,s3,s2,s1,s0 title
  end
  class stride_group title

  block:input_group
    columns 1
    space:15
    image
  end
  class input_group title

  block:backbone_group:1
    columns 1
    Backbone
    space:4
    B5 space B4 space B3 space B2 space B1
    space:2
    class Backbone title
  end
  image --> B1
  B1 --> B2
  B2 --> B3
  B3 --> B4
  B4 --> B5

  block:neck_group:1
    columns 1
    Neck
    N7 space N6 space N5 space N4 space N3
    space:6
    class Neck title
  end
  B5 --> N5
  B4 --> N4
  B3 --> N3
  N3 --> N4
  N4 --> N5
  N5 --> N6
  N6 --> N7

  block:head_group:1
    columns 1
    Head
    h7["H"] space h6["H"] space h5["H"] space h4["H"] space h3["H"]
    space:6
    class Head title
  end
  N7 --> h7
  N6 --> h6
  N5 --> h5
  N4 --> h4
  N3 --> h3
```
Once hyperparameters are chosen, you can submit a training job (i.e. a
request to find an available GPU runner and train your model on the
project's dataset).
The job is always in one of these states:
- **queued**: the job is waiting for all other jobs you previously submitted in the same project to complete.
- **starting**: the job has been assigned an available runner, which is downloading your dataset to its local workspace and generally setting things up.
- **running**: training is underway. During this phase, logs are uploaded gradually, so you can follow the training's progress.
- **stopped**: while a job is starting or running, you can stop it, canceling it as soon as possible.
- **done**: if a job isn't stopped and doesn't error out, it runs until it reaches either the specified maximum number of iterations or the maximum number of training hours.
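The lifecycle described above can be summarized as a small state machine. This is a sketch of the states and the transitions implied by the text, not an actual API of the platform:

```python
from enum import Enum

class JobState(Enum):
    QUEUED = "queued"      # waiting for earlier jobs in the same project
    STARTING = "starting"  # runner assigned; dataset being downloaded
    RUNNING = "running"    # training; logs uploaded gradually
    STOPPED = "stopped"    # canceled by the user while starting/running
    ERRORED = "errored"    # failed during startup or training
    DONE = "done"          # reached max iterations or max training hours

# Transitions implied by the description above; STOPPED, ERRORED and
# DONE are terminal.
TRANSITIONS = {
    JobState.QUEUED: {JobState.STARTING},
    JobState.STARTING: {JobState.RUNNING, JobState.STOPPED, JobState.ERRORED},
    JobState.RUNNING: {JobState.DONE, JobState.STOPPED, JobState.ERRORED},
    JobState.STOPPED: set(),
    JobState.ERRORED: set(),
    JobState.DONE: set(),
}

def can_transition(current: JobState, target: JobState) -> bool:
    return target in TRANSITIONS[current]
```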
Models can be "duplicated", meaning that you can create a new model with
the same hyperparameters and modify only the ones you want to change.
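Conceptually, duplication is just a copy-with-overrides of the hyperparameter set. A minimal sketch (the hyperparameter names here are hypothetical, not the platform's actual keys):

```python
# Hypothetical hyperparameters of an existing model.
base = {"backbone_depth": 5, "max_iterations": 10_000, "learning_rate": 1e-3}

# Duplicate: same configuration, overriding only what you want to change.
duplicate = {**base, "max_iterations": 20_000}
```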
The "config" tab on the logs page lets you compare model hyperparameters
side by side.