Semantic segmentation models map each pixel of the input image to a category; the task can be seen as pixel-wise multiclass classification.
A dataset follows this structure:
endpoint_url/bucket
├── prefix/images/
├── prefix/semantic_maps/
└── prefix/metadata.yaml
Dataset images are placed directly inside images/ (subdirectories are ignored).
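The endpoint_url/bucket layout suggests an S3-compatible object store; as a minimal sketch, the images could be listed with boto3 (the endpoint, bucket, and prefix below are placeholders, not values defined by the format):

import boto3

s3 = boto3.client("s3", endpoint_url="https://endpoint_url")
# For large datasets, a paginator should be used instead of a single call.
response = s3.list_objects_v2(Bucket="bucket", Prefix="prefix/images/")
image_keys = []
for obj in response.get("Contents", []):
    rest = obj["Key"][len("prefix/images/"):]
    # Keep only files placed directly inside images/; subdirectories are ignored.
    if rest and "/" not in rest:
        image_keys.append(obj["Key"])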
The metadata file looks something like this:
task: semantic segmentation
annotations: semantic_maps/
categories: [cat1, cat2, cat3, cat4]
colors:
  cat1: [255, 0, 0]   # red
  cat2: [0, 0, 255]   # blue
  cat3: [255, 255, 0] # yellow
  cat4: [0, 255, 0]   # green
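A minimal sketch of reading these fields with PyYAML, assuming the file has been fetched to local disk (the file path and variable names are illustrative):

import yaml

with open("metadata.yaml") as f:
    metadata = yaml.safe_load(f)

categories = metadata["categories"]        # ["cat1", "cat2", "cat3", "cat4"]
colors = {name: tuple(rgb) for name, rgb in metadata["colors"].items()}
annotations_dir = metadata["annotations"]  # "semantic_maps/"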
The annotations field specifies the name of the folder containing the ground-truth annotations. Each annotation shares its base name (the file name without the extension) with the image it is associated with (e.g. semantic_maps/000.png annotates images/000.jpg).
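As a sketch, the pairing can be reconstructed from the file stems on a local copy of the dataset (the .jpg extension is only an example; adjust it to the actual image format):

from pathlib import Path

images_dir = Path("prefix/images")
maps_dir = Path("prefix/semantic_maps")

# images/000.jpg is annotated by semantic_maps/000.png, and so on.
pairs = [(img, maps_dir / f"{img.stem}.png")
         for img in sorted(images_dir.glob("*.jpg"))]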
An annotation is a color PNG image that uses the colors listed in the metadata to mark the semantic regions of its associated input image.
Category colors can be any RGB triplet except for black ([0, 0, 0]), which is reserved for pixels to be ignored in training and validation.
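For training, the color annotation typically has to be turned into a per-pixel index map; a minimal sketch with NumPy and Pillow, where the ignore index -1 is an arbitrary choice, not part of the format:

import numpy as np
from PIL import Image

IGNORE_INDEX = -1  # arbitrary sentinel for black ([0, 0, 0]) pixels

def decode_annotation(path, categories, colors):
    # Map every pixel to the index of its category; pixels whose color
    # matches no category (in particular black) keep IGNORE_INDEX.
    rgb = np.array(Image.open(path).convert("RGB"))
    mask = np.full(rgb.shape[:2], IGNORE_INDEX, dtype=np.int64)
    for index, name in enumerate(categories):
        mask[np.all(rgb == np.array(colors[name]), axis=-1)] = index
    return mask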
Annotation images may have a lower resolution than their associated input image,
but the aspect ratio must match.
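When the resolutions differ, the annotation can be upsampled to the image size with nearest-neighbor interpolation, which preserves the exact category colors (a sketch with Pillow; the file names are illustrative):

from PIL import Image

image = Image.open("images/000.jpg")
annotation = Image.open("semantic_maps/000.png")

# Nearest-neighbor never blends neighboring colors, so every output
# pixel still carries a valid category color (or black).
if annotation.size != image.size:
    annotation = annotation.resize(image.size, resample=Image.NEAREST)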