Tasks

Depth estimation

Depth estimation models map each pixel of an input image to a numerical value (its "depth"). The task can therefore be seen as pixel-wise regression.

Dataset format

Datasets follow this structure:

endpoint_url/bucket/
├── prefix/images/
├── prefix/depth_maps/
└── prefix/metadata.yaml
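
For illustration, such a dataset could be enumerated with any S3-compatible client; the sketch below assumes boto3 and uses the endpoint, bucket, and prefix above as placeholders:

import boto3
# Assumed S3-compatible storage; substitute your own endpoint, bucket and prefix.
s3 = boto3.client("s3", endpoint_url="https://endpoint_url")
pages = s3.get_paginator("list_objects_v2").paginate(Bucket="bucket", Prefix="prefix/images/")
image_keys = [obj["Key"] for page in pages for obj in page.get("Contents", [])]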

Dataset images are placed directly inside images/ (subdirectories are ignored).
The metadata file looks something like this:

metadata.yaml
task: depth estimation
annotations: depth_maps/
range: [-10, 8.7]
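
The fields are explained below; as a minimal sketch, the file can be loaded with any YAML parser (PyYAML is assumed here):

import yaml
with open("metadata.yaml") as f:
    metadata = yaml.safe_load(f)
annotations_dir = metadata["annotations"]  # e.g. "depth_maps/"
depth_min, depth_max = metadata["range"]   # e.g. -10 and 8.7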

The annotations field specifies the name of the folder containing the ground-truth annotations, which share their base file name with the image they annotate (e.g. depth_maps/000.png annotates images/000.jpg).
An annotation is an 8-bit or 16-bit grayscale PNG image.
The maximum depth value of the given range is mapped to pixel value 1, while the minimum depth value is mapped to the largest pixel value (255 for 8-bit maps, 65,535 for 16-bit maps).
Intermediate depth values are linearly interpolated and rounded to the nearest integer. Pure black pixels (value 0) are ignored during training and validation.
Intuitively, this means that the brighter a pixel, the "closer" it is in the foreground.
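
Following this mapping, here is a minimal sketch of encoding a metric depth map into a 16-bit annotation; the depth and valid arrays are hypothetical, and the range values are taken from the example metadata above:

import numpy as np
from PIL import Image
d_min, d_max = -10.0, 8.7                # "range" from metadata.yaml
max_px = 2**16 - 1                       # use 255 for 8-bit maps
depth = np.random.uniform(d_min, d_max, (100, 100))  # hypothetical metric depths
valid = np.ones((100, 100), dtype=bool)              # hypothetical validity mask
# Minimum depth -> brightest pixel (max_px), maximum depth -> pixel value 1.
pixels = np.rint(1 + (d_max - depth) / (d_max - d_min) * (max_px - 1))
pixels = np.clip(pixels, 1, max_px).astype(np.uint16)
pixels[~valid] = 0                       # pure black pixels are ignored
Image.fromarray(pixels).save("000.png")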

Here's an example of how to generate a 16-bit depth map PNG image in Python:

import numpy as np
from PIL import Image

# 100x100 array of random 16-bit pixel values; any pixel equal to 0 would be ignored as unannotated.
image = np.random.randint(0, 2**16, (100, 100), dtype=np.uint16)
Image.fromarray(image).save("tmp.png")
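
Conversely, a saved annotation can be decoded back into metric depth. A minimal sketch, assuming a 16-bit map such as tmp.png above and the example range from metadata.yaml:

import numpy as np
from PIL import Image
d_min, d_max = -10.0, 8.7                # "range" from metadata.yaml
max_px = 2**16 - 1                       # use 255 for 8-bit maps
pixels = np.asarray(Image.open("tmp.png"), dtype=np.float64)
valid = pixels > 0                       # pure black pixels carry no ground truth
depth = d_max - (pixels - 1) / (max_px - 1) * (d_max - d_min)
depth[~valid] = np.nan                   # mask out ignored pixels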