Hyperparameters
Training jobs are configured by a set of hyperparameters. Each one has a
default value, and you can adjust any of them for fine-grained control over
the model architecture and the training process.
It helps to understand the meta-architecture of our models. They are
composed of 2 or 3 parts:
- Backbone: contains most of the weights. It generates abstract
  representations of the inputs for the head(s).
- Neck: optional layer that enhances features across scales.
- Head(s): each head solves a specific ML task. Some heads attach to a single
  level, others to multiple levels (with shared weights).
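The way the parts fit together can be sketched in a few lines of Python. This is only an illustration of the data flow, not the actual implementation: `run_model` and the `toy_*` functions are hypothetical names, and strings stand in for real feature maps.

```python
from typing import Callable, Dict, Iterable, Optional

# level index -> feature map (strings stand in for tensors in this sketch)
Features = Dict[int, str]


def run_model(
    image: str,
    backbone: Callable[[str], Features],
    neck: Optional[Callable[[Features], Features]],
    head: Callable[[str], str],
    head_levels: Iterable[int],
) -> Dict[int, str]:
    """Backbone -> optional neck -> head applied at each requested level."""
    features = backbone(image)
    if neck is not None:
        features = neck(features)
    # The same `head` callable is reused at every level, which is what
    # "shared weights" means for a multi-level head.
    return {level: head(features[level]) for level in head_levels}


# Hypothetical stand-ins: a backbone with levels 1 to 5 and a neck with levels 3 to 7.
def toy_backbone(image: str) -> Features:
    return {level: f"B{level}" for level in range(1, 6)}


def toy_neck(features: Features) -> Features:
    return {level: f"N{level}" for level in range(3, 8)}


def toy_head(feature: str) -> str:
    return f"H({feature})"


with_neck = run_model("image", toy_backbone, toy_neck, toy_head, range(3, 8))
without_neck = run_model("image", toy_backbone, None, toy_head, [5])
print(with_neck)
print(without_neck)
```

With a neck, the shared head reads the neck outputs; without one, a single-level head reads the top backbone level directly.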
Neural networks are not scale-invariant. They usually capture large spatial
contexts by processing the input in "levels", each of which downscales the
feature maps by a factor of 2. If the top level of a backbone (i.e. its
"depth") is i, then its output feature maps are downscaled by a factor of
2^i (the "stride") compared to the input image.
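To make the relationship between level, stride, and feature map size concrete, here is a small sketch. The helper names are hypothetical, and the rounding is an assumption: ceiling division is used here, but the actual layers may round differently for input sizes that are not multiples of the stride.

```python
from typing import Tuple


def stride(level: int) -> int:
    """Downscaling factor of a level relative to the input image (2^level)."""
    return 2 ** level


def feature_map_size(height: int, width: int, level: int) -> Tuple[int, int]:
    """Approximate spatial size of the feature map at a given level.

    Assumption: sizes are rounded up (ceiling division).
    """
    s = stride(level)
    return (-(-height // s), -(-width // s))


# Print level, stride, and feature map size for a 512x512 input.
for level in range(8):
    print(level, stride(level), feature_map_size(512, 512, level))
```

For a 512x512 input, level 3 has stride 8 and 64x64 feature maps, while level 7 has stride 128 and 4x4 feature maps.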
Here's an example model with a backbone (levels 1 to 5), a neck (levels 3 to
7), and a shared head (levels 3 to 7):
block-beta
columns 7
classDef title fill:#0000,stroke:#0000;
block:level_group
columns 1
Level
l7["7"] space l6["6"] space l5["5"] space l4["4"] space l3["3"] space l2["2"] space l1["1"] space l0["0"]
class Level,l7,l6,l5,l4,l3,l2,l1,l0 title
end
class level_group title
block:stride_group
columns 1
Stride
s7["128"] space s6["64"] space s5["32"] space s4["16"] space s3["8"] space s2["4"] space s1["2"] space s0["1"]
class Stride,s7,s6,s5,s4,s3,s2,s1,s0 title
end
class stride_group title
block:input_group
columns 1
space:15
image
end
class input_group title
block:backbone_group:1
columns 1
Backbone
space:4 B5 space B4 space B3 space B2 space B1 space:2
class Backbone title
end
image --> B1
B1 --> B2
B2 --> B3
B3 --> B4
B4 --> B5
block:neck_group:1
columns 1
Neck
N7 space N6 space N5 space N4 space N3 space:6
class Neck title
end
B5 --> N5
B4 --> N4
B3 --> N3
N4 --> N3
N5 --> N4
N5 --> N6
N6 --> N7
block:head_group:1
columns 1
Head
h7["H"] space h6["H"] space h5["H"] space h4["H"] space h3["H"] space:6
class Head title
end
N7 --> h7
N6 --> h6
N5 --> h5
N4 --> h4
N3 --> h3
Trainer
- Initialization: random, ImageNet-pretrained, or weights of any model from
  any project you're in
- Maximum training iterations
- Maximum training hours
- Number of validations
- Batch size
- Gradient clipping (optional): limit gradient norms to stabilize training
- Optimizer (choose one)
  - SGD
    - learning rate
    - weight decay (optional)
    - momentum (optional)
  - AdamW
    - learning rate
    - weight decay (optional)
- Learning rate schedule (optional, choose one)
  - Multi-step: multiply the learning rate by the given factor at every
    milestone
    - milestones
    - learning rate factor
  - One-cycle: start with a reduced learning rate, increase it to its
    reference value during the warm-up period, then decrease it again for
    the rest of the schedule
    - initial learning rate factor
    - final learning rate factor
    - warm-up period
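The gradient clipping and learning rate schedule options above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the trainer's actual code: all function names are hypothetical, milestones and the warm-up period are assumed to be expressed in iterations, a milestone is assumed to take effect at the iteration it names, and the one-cycle ramps are assumed to be linear (the real schedule may use a different curve).

```python
import math
from typing import List, Sequence


def clip_by_norm(grads: Sequence[float], max_norm: float) -> List[float]:
    """Rescale gradients so that their L2 norm does not exceed max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm:
        return list(grads)
    scale = max_norm / norm
    return [g * scale for g in grads]


def multi_step_lr(base_lr: float, iteration: int,
                  milestones: Sequence[int], factor: float) -> float:
    """Multiply the learning rate by `factor` once per milestone reached."""
    reached = sum(1 for m in milestones if iteration >= m)
    return base_lr * factor ** reached


def one_cycle_lr(base_lr: float, iteration: int, max_iterations: int,
                 warmup: int, initial_factor: float, final_factor: float) -> float:
    """Linear warm-up to base_lr, then linear decay to base_lr * final_factor."""
    if iteration < warmup:
        t = iteration / warmup
        return base_lr * (initial_factor + (1.0 - initial_factor) * t)
    t = (iteration - warmup) / max(1, max_iterations - warmup)
    return base_lr * (1.0 + (final_factor - 1.0) * t)


# Sample the one-cycle schedule at the start, end of warm-up, midpoint, and end.
for it in (0, 10, 55, 100):
    print(it, one_cycle_lr(1.0, it, 100, 10, 0.1, 0.01))
```

Multi-step keeps the learning rate constant between milestones, whereas one-cycle changes it at every iteration, peaking at the reference value when the warm-up period ends.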
Data
In order to batch samples together, they need to have the same shape.
We automatically scale and pad images to the desired resolution
("letterbox resizing"), thus avoiding any distortion or cropping.
Backbone
- (choose one)
  - ResNet 18, 34, 50, 101, 152
  - EfficientNet B0, B1, B2, B3, B4, B5, B6, B7
  - EfficientNet V2 small, medium, large
  - MobileNet V2, V3 small, V3 large
  - [more coming soon...]
- Resize input? If enabled, arbitrarily sized inputs are scaled to the
  training resolution during inference (which makes latency more
  predictable). If disabled, inputs are processed at their actual
  resolution.
- Top level ("depth"): level i has stride 2^i
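The letterbox resizing described under Data (scale, then pad, with no distortion or cropping) comes down to a little arithmetic, sketched below. This is an illustration rather than the platform's implementation: `letterbox_params` is a hypothetical helper, and centering the padding is an assumption (it could equally be applied on one side only). With "Resize input?" enabled, a similar computation would presumably bring inference inputs to the training resolution.

```python
from typing import Tuple


def letterbox_params(
    src_h: int, src_w: int, dst_h: int, dst_w: int
) -> Tuple[float, int, int, Tuple[int, int, int, int]]:
    """Return (scale, new_h, new_w, (top, bottom, left, right) padding).

    The image is scaled uniformly so that it fits inside the target,
    then padded to reach exactly dst_h x dst_w.
    """
    # A single scale for both axes keeps the aspect ratio (no distortion);
    # taking the minimum guarantees the image fits (no cropping).
    scale = min(dst_h / src_h, dst_w / src_w)
    new_h = min(dst_h, round(src_h * scale))
    new_w = min(dst_w, round(src_w * scale))
    pad_h = dst_h - new_h
    pad_w = dst_w - new_w
    # Assumption: padding is split evenly between the two sides.
    top, left = pad_h // 2, pad_w // 2
    return scale, new_h, new_w, (top, pad_h - top, left, pad_w - left)


# A 640x480 (width x height) image letterboxed to 512x512.
print(letterbox_params(480, 640, 512, 512))
```

In this example the image is scaled by 0.8 to 512x384 (width x height), and the remaining 128 rows are filled with padding above and below.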
Neck
- (optional, choose one)
  - FPN
    - Levels to fuse
    - Output channels: output feature maps from levels fused by the neck
      all have the same number of channels
  - BiFPN
    - Levels to fuse
    - Output channels: output feature maps from levels fused by the neck
      all have the same number of channels
    - Number of layers: how many times this bi-directional neck is repeated
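The interaction between the backbone depth, the levels to fuse, and the output channels can be summarized with a short sketch. It follows the example diagram at the top of this page, where neck levels up to the backbone depth read the matching backbone level and higher levels are derived from the neck level below. `describe_neck` is a hypothetical helper, the wiring of extra levels is an assumption generalized from that diagram, and the connections between neighboring neck levels within the fusion itself are not modeled.

```python
from typing import Dict, Iterable, Union

LevelInfo = Dict[str, Union[str, int]]


def describe_neck(
    backbone_depth: int, levels: Iterable[int], out_channels: int
) -> Dict[int, LevelInfo]:
    """Describe the input, stride, and channel count of each neck level."""
    plan: Dict[int, LevelInfo] = {}
    for level in sorted(levels):
        if level <= backbone_depth:
            source = f"B{level}"      # fed by the matching backbone level
        else:
            source = f"N{level - 1}"  # assumed: downsampled from the level below
        plan[level] = {
            "input": source,
            "stride": 2 ** level,
            "channels": out_channels,  # identical for every fused level
        }
    return plan


# Backbone depth 5, neck levels 3 to 7, as in the example diagram.
# The channel count of 256 is an arbitrary illustrative value.
plan = describe_neck(backbone_depth=5, levels=range(3, 8), out_channels=256)
for level, info in plan.items():
    print(level, info)
```

Because every neck output has the same number of channels, a single head with shared weights can be attached to all fused levels.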
Head
(see each task's docs)