Object detection models detect object instances in an image and, for each instance, produce an axis-aligned rectangular bounding box along with the instance's category.
Datasets follow this structure:
endpoint_url/bucket
├── prefix/images/
├── prefix/instances.yaml
└── prefix/metadata.yaml
Dataset images are placed directly inside images/ (subdirectories are ignored).
The metadata file looks something like this:
task: object detection
annotations: instances.yaml
categories: [cat1, cat2, cat3]
The annotations field specifies the name of the file containing the ground-truth annotations.
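Once metadata.yaml is parsed (e.g. with a YAML library), it becomes a small mapping. The sketch below shows the checks a dataset loader might run on it; the field names come from the example above, but the `check_metadata` helper is hypothetical:

```python
# In-memory form of metadata.yaml after YAML parsing (sketch).
metadata = {
    "task": "object detection",
    "annotations": "instances.yaml",
    "categories": ["cat1", "cat2", "cat3"],
}

def check_metadata(md):
    """Minimal sanity checks a loader might run (hypothetical helper)."""
    assert md["task"] == "object detection"
    assert isinstance(md["annotations"], str)  # annotations filename
    assert md["categories"], "at least one category is required"

check_metadata(metadata)
```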
Here's an example of an annotations file:
000.jpg:
- category: cat2
bbox: [0, 3, 6, 10] # [xmin, ymin, xmax, ymax]
- category: cat1
bbox: [2, 2, 7, 9]
001.jpg: [] # no instances in this image
# ...
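Parsed, the annotations file yields a mapping from image filename to a list of instances. The sketch below mirrors the example above in plain Python and validates each instance against the categories from metadata.yaml; the `validate` helper is hypothetical, not part of the dataset format:

```python
CATEGORIES = ["cat1", "cat2", "cat3"]  # from metadata.yaml

# In-memory form of the annotations example after YAML parsing.
annotations = {
    "000.jpg": [
        {"category": "cat2", "bbox": [0, 3, 6, 10]},
        {"category": "cat1", "bbox": [2, 2, 7, 9]},
    ],
    "001.jpg": [],  # no instances in this image
}

def validate(annotations, categories):
    """Check each instance's category and bbox geometry (hypothetical helper)."""
    for image, instances in annotations.items():
        for inst in instances:
            assert inst["category"] in categories, image
            xmin, ymin, xmax, ymax = inst["bbox"]
            # bbox uses [xmin, ymin, xmax, ymax], so both extents must be positive
            assert xmin < xmax and ymin < ymax, image

validate(annotations, CATEGORIES)
```

An image with no detectable instances is still listed, with an empty list as its value, so the loader can distinguish "no instances" from "missing annotation".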