Training jobs are configured by a set of hyperparameters. They all have default values, but you can adjust them to precisely control the model architecture and the training process.
It helps to understand the meta-architecture of our models. They are composed of two or three parts: a backbone, optionally a neck, and a head.
Neural networks are not scale-invariant; they typically capture large spatial context by processing the input in "levels", each of which downscales the feature maps by a factor of 2. If the top level of a backbone (i.e. its "depth") is i, then its feature maps are downscaled by 2^i (the "stride") compared to the input image.
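As a quick worked example of this relationship (plain Python, not tied to any project API), the sketch below prints the stride at each level:

```python
# Stride at level i: feature maps are downscaled by 2**i relative to the input image.
for level in range(8):
    print(f"level {level}: stride {2 ** level}")
# A backbone of depth 5 therefore ends at stride 2**5 = 32.
```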
Here's an example model with a backbone (levels 1 to 5), a neck (levels 3 to 7), and a shared head (levels 3 to 7):
```mermaid
block-beta
columns 7
%% transparent style used to hide label nodes and grouping boxes
classDef title fill:#0000,stroke:#0000;
block:level_group
columns 1
Level
l7["7"] space l6["6"] space l5["5"] space l4["4"] space l3["3"] space l2["2"] space l1["1"] space l0["0"]
class Level,l7,l6,l5,l4,l3,l2,l1,l0 title
end
class level_group title
block:stride_group
columns 1
Stride
s7["128"] space s6["64"] space s5["32"] space s4["16"] space s3["8"] space s2["4"] space s1["2"] space s0["1"]
class Stride,s7,s6,s5,s4,s3,s2,s1,s0 title
end
class stride_group title
block:input_group
columns 1
space:15
image
end
class input_group title
block:backbone_group:1
columns 1
Backbone
space:4 B5 space B4 space B3 space B2 space B1 space:2
class Backbone title
end
image --> B1
B1 --> B2
B2 --> B3
B3 --> B4
B4 --> B5
block:neck_group:1
columns 1
Neck
N7 space N6 space N5 space N4 space N3 space:6
class Neck title
end
B5 --> N5
B4 --> N4
B3 --> N3
N4 --> N3
N5 --> N4
N5 --> N6
N6 --> N7
block:head_group:1
columns 1
Head
h7["H"] space h6["H"] space h5["H"] space h4["H"] space h3["H"] space:6
class Head title
end
N7 --> h7
N6 --> h6
N5 --> h5
N4 --> h4
N3 --> h3
```
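To make the wiring above concrete, here is a minimal sketch of how the levels connect. It is plain Python, with a feature map represented only by its stride; none of the names below come from the actual training configuration.

```python
def build_pyramid(input_stride: int = 1) -> dict[str, dict[int, int]]:
    # Backbone: levels 1-5; each level halves the resolution, i.e. doubles the stride.
    backbone = {level: input_stride * 2**level for level in range(1, 6)}

    # Neck: levels 3-5 consume backbone features (refined by the top-down path in
    # the diagram); levels 6 and 7 are produced by downscaling level 5 further.
    neck = {level: backbone[level] for level in range(3, 6)}
    neck[6] = neck[5] * 2
    neck[7] = neck[6] * 2

    # Shared head: the same head runs at every neck level, so its outputs keep
    # the neck strides.
    head = dict(neck)
    return {"backbone": backbone, "neck": neck, "head": head}


print(build_pyramid()["neck"])  # {3: 8, 4: 16, 5: 32, 6: 64, 7: 128}
```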