Depth estimation models map each pixel of an input image to a numerical value (its "depth"). The task can be viewed as pixel-wise regression.
Datasets follow this structure:
endpoint_url/bucket/
├── prefix/images/
├── prefix/depth_maps/
└── prefix/metadata.yaml
Dataset images are placed directly inside images/ (subdirectories are ignored).
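For illustration, here is a minimal sketch of listing the dataset images, assuming the bucket is hosted on S3-compatible storage and accessed with boto3 (the endpoint, bucket, and prefix names below are placeholders):

import boto3

s3 = boto3.client("s3", endpoint_url="https://endpoint_url")
paginator = s3.get_paginator("list_objects_v2")

image_keys = []
for page in paginator.paginate(Bucket="bucket", Prefix="prefix/images/"):
    for obj in page.get("Contents", []):
        suffix = obj["Key"][len("prefix/images/"):]
        # Keep only files placed directly inside images/ (subdirectories are ignored).
        if suffix and "/" not in suffix:
            image_keys.append(obj["Key"])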
The metadata file looks something like this:
task: depth estimation
annotations: depth_maps/
range: [-10, 8.7]
The annotations field specifies the name of the folder containing the ground-truth
annotations. Each annotation shares its base file name with the image it belongs
to (e.g. depth_maps/000.png annotates images/000.jpg).
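As a sketch, pairing an image with its annotation could look like the following, assuming a local copy of the dataset prefix and PyYAML installed (the paths are illustrative):

from pathlib import Path
import yaml

root = Path("prefix")  # illustrative local copy of the dataset prefix
meta = yaml.safe_load((root / "metadata.yaml").read_text())

image_path = root / "images" / "000.jpg"
# The annotation lives in the folder named by the "annotations" field and
# shares the image's base file name, with a .png extension.
depth_path = root / meta["annotations"] / (image_path.stem + ".png")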
An annotation is a (8-bit or 16-bit) grayscale PNG image.
The maximum depth value of the given range is mapped to the pixel value 1, while
the minimum depth value is mapped to the maximum pixel value (255 for 8-bit maps,
65,535 for 16-bit maps).
Intermediate depth values are linearly interpolated and rounded to the nearest
integer. Pure black pixels (value 0) are ignored during training and validation.
Intuitively, this means that the brighter a pixel, the closer the corresponding
point is to the foreground.
Here's an example of how to generate a 16-bit depth map PNG image in Python:
import numpy as np
from PIL import Image

# Random 16-bit values, just to illustrate the expected format.
depth_map = np.random.randint(0, 2**16, size=(100, 100), dtype=np.uint16)
Image.fromarray(depth_map).save("tmp.png")
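Building on this, here is a minimal sketch of encoding real-valued depths into a 16-bit map following the convention above; the depth array and the validity mask are hypothetical inputs, and the range is taken from the example metadata:

import numpy as np
from PIL import Image

depth = np.random.uniform(-10, 8.7, (100, 100))   # hypothetical depth values
valid = np.ones_like(depth, dtype=bool)           # hypothetical mask of valid pixels

d_min, d_max = -10.0, 8.7                         # "range" from metadata.yaml
max_pixel = 2**16 - 1                             # 65,535 for a 16-bit map

# Linear mapping: d_max -> pixel value 1, d_min -> max_pixel, rounded to the nearest integer.
scaled = (d_max - depth) / (d_max - d_min) * (max_pixel - 1) + 1
encoded = np.rint(scaled).astype(np.uint16)

# Invalid pixels are set to 0 so they are ignored during training and validation.
encoded[~valid] = 0

Image.fromarray(encoded).save("000.png")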