ONNX (Open Neural Network Exchange) is an open-source format for representing deep learning models. You can use Netron to visualize ONNX models.
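For example, if the netron Python package is installed, you can launch the viewer from a script (a minimal sketch; the model path is a placeholder):
import netron
# Open the Netron viewer in the browser for a local ONNX file
netron.start("model.onnx")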
ONNX Runtime supports efficient inference on many platforms, including desktop and mobile CPUs, GPUs, and even WebGPU, and it provides APIs in many programming languages.
As described in the official documentation, here's how you'd run inference in Python:
import onnxruntime as ort
import numpy as np
from PIL import Image
# Load the image; preprocessing (dtype, scaling, layout) must match the model's expected input
x = np.array(Image.open("image.jpg"))
ort_sess = ort.InferenceSession("model.onnx")
# None requests all outputs; the dict maps input names (here "input") to numpy arrays
outputs = ort_sess.run(None, {"input": x})
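The input name ("input" above) and the expected shape depend on how the model was exported; if you are unsure, you can query them from the session:
# Inspect the model's declared inputs and outputs
for node in ort_sess.get_inputs():
    print(node.name, node.shape, node.type)
for node in ort_sess.get_outputs():
    print(node.name, node.shape, node.type)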
TensorRT provides significant inference acceleration on NVIDIA GPUs. ONNX Runtime supports TensorRT as an execution provider (i.e. a backend used to run ONNX models):
# Providers are tried in order: TensorRT first, falling back to CUDA if TensorRT is unavailable
ort_sess = ort.InferenceSession(
    "model.onnx",
    providers=["TensorrtExecutionProvider", "CUDAExecutionProvider"],
)
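You can also run the model directly with the TensorRT runtime instead of going through ONNX Runtime. This requires building a serialized engine from the ONNX model first. Here is a minimal sketch using the TensorRT Python builder API; the file names are placeholders, and the default builder config is an assumption (precision, workspace size, and dynamic shapes may need extra configuration depending on your model and TensorRT version):
import tensorrt as trt
logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
# ONNX models must be parsed into an explicit-batch network
network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)
with open("model.onnx", "rb") as f:
    if not parser.parse(f.read()):
        raise RuntimeError("failed to parse the ONNX model")
config = builder.create_builder_config()
serialized_engine = builder.build_serialized_network(network, config)
with open("model.trt", "wb") as f:
    f.write(serialized_engine)
The trtexec command-line tool that ships with TensorRT can also build engines from ONNX models. Once you have a serialized engine, you can deserialize it and run inference with PyCUDA: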
import tensorrt as trt
import pycuda.driver as cuda
import pycuda.autoinit  # initializes the CUDA context
import numpy as np
from PIL import Image
# Deserialize the engine
with open("model.trt", "rb") as f:
    engine = trt.Runtime(trt.Logger(trt.Logger.WARNING)).deserialize_cuda_engine(f.read())
context = engine.create_execution_context()
# Prepare the input in page-locked host memory (dtype and shape must match the engine's input binding)
x = np.array(Image.open("image.jpg"))
inp = cuda.pagelocked_empty(x.size, x.dtype)
out = cuda.pagelocked_empty(trt.volume(engine.get_binding_shape(1)), np.float32)
np.copyto(inp, x.ravel())
# Allocate device memory for the input and output bindings
d_inp = cuda.mem_alloc(inp.nbytes)
d_out = cuda.mem_alloc(out.nbytes)
# Copy the input to the device, run inference, and copy the output back to the host
cuda.memcpy_htod(d_inp, inp)
context.execute_v2(bindings=[int(d_inp), int(d_out)])
cuda.memcpy_dtoh(out, d_out)
Our models are trained with PyTorch, and we provide the weights as a state dict that you can load for inference or further training. To load the weights, instantiate the model class with the same hyperparameters that were used during training:
from sihl import SihlLightningModule
import torch
# Instantiate the model with the same hyperparameters used during training
model = SihlLightningModule(**hyperparameters)
model.load_state_dict(torch.load("model.pt", weights_only=True))
model.eval()  # switch to evaluation mode for inference
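If you then want an ONNX file for the runtimes above, torch.onnx.export can trace the loaded model. This is a minimal sketch; the dummy input shape is a placeholder assumption and must match what the model expects:
# Export the model to ONNX by tracing the forward pass with a dummy input
dummy_input = torch.randn(1, 3, 224, 224)  # placeholder shape
torch.onnx.export(model, dummy_input, "model.onnx", input_names=["input"], output_names=["output"])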
[coming soon]