Open-source AI compression

smaller. faster. open.

Prune, quantize, and distill neural networks. Ship models that are smaller and faster on any hardware.

compress.py
from fasterai.prune.all import *
from fasterbench import benchmark

# Remove 50% of weights globally, keeping those with the largest final magnitude
pruner = Pruner(model, 50, 'global', large_final)
pruner.prune_model()

# Benchmark the compressed model on a dummy input
benchmark(model, dummy).summary()
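Under the hood, global magnitude pruning boils down to one idea: rank every weight in the model by absolute value and zero out the smallest ones. A minimal sketch with NumPy — names here are illustrative, not fasterai's API:

```python
import numpy as np

def global_magnitude_prune(weights, sparsity):
    """Zero out the `sparsity`% smallest-magnitude values across all arrays."""
    # Pool the magnitudes of every weight in the model into one vector
    flat = np.concatenate([np.abs(w).ravel() for w in weights])
    # A single global cutoff: the `sparsity`-th percentile of all magnitudes
    threshold = np.percentile(flat, sparsity)
    # Zero everything below the cutoff, layer by layer
    return [np.where(np.abs(w) < threshold, 0.0, w) for w in weights]

layers = [np.random.randn(64, 64), np.random.randn(64, 10)]
pruned = global_magnitude_prune(layers, 50)
sparsity = sum((w == 0).sum() for w in pruned) / sum(w.size for w in pruned)
print(f"achieved sparsity: {sparsity:.0%}")
```

Because the cutoff is global rather than per-layer, layers whose weights matter less end up more sparse — which is what the `'global'` argument in the snippet above selects.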
up to 10×

faster

up to 90%

smaller

up to 70%

less CO₂

Compared to the original uncompressed model. Best-case results from combined pruning, quantization, and distillation.

Choose your path

Two ways to optimize.

Use our open-source tools yourself, or let us handle it for you.

DIY

Use our tools

Open-source, Apache 2.0 licensed
Pruning, quantization, distillation, benchmarking
Full documentation and tutorials
Community support via Discord
Browse Libraries
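Quantization, the second technique on the list, trades precision for size: store weights as 8-bit integers plus a scale factor instead of 32-bit floats, for roughly 4× smaller storage. A hedged sketch of a simple symmetric per-tensor scheme — the underlying idea, not fasterai's pipeline:

```python
import numpy as np

def quantize_int8(w):
    # Symmetric per-tensor scale: map the largest magnitude to 127
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Recover approximate float weights for computation
    return q.astype(np.float32) * scale

w = np.random.randn(256, 256).astype(np.float32)
q, scale = quantize_int8(w)
err = np.abs(dequantize(q, scale) - w).max()
print(f"int8 storage is 4x smaller; max abs error {err:.4f}")
```

The rounding error is bounded by half the scale, so tensors with a few large outlier weights quantize worse — one reason production pipelines use per-channel scales or calibration data.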
Done for you

Work with us

We audit your model and recommend a compression strategy
We apply our proprietary optimization pipeline
We deliver a production-ready compressed model
Typical results: 3–10× speedup with minimal accuracy loss
Book a Call



© 2026 smaller. faster. open. All rights reserved.