Open-source AI compression

smaller. faster. open.

Prune, quantize, and distill neural networks. Ship models that are smaller and faster on any hardware.

compress.py
from fasterai.prune.all import *
from fasterbench import benchmark

# Remove 50% of weights globally, keeping those with the largest final magnitude
pruner = Pruner(model, 50, 'global', large_final)
pruner.prune_model()

# Benchmark the compressed model on a dummy input
benchmark(model, dummy).summary()
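Under the hood, global magnitude pruning boils down to one idea: rank every weight in the model by absolute value and zero out the smallest ones. A minimal sketch with NumPy — names here are illustrative, not fasterai's API:

```python
import numpy as np

def global_magnitude_prune(weights, sparsity):
    """Zero out the `sparsity`% smallest-magnitude values across all arrays."""
    # Pool the magnitudes of every weight in the model into one vector
    flat = np.concatenate([np.abs(w).ravel() for w in weights])
    # A single global cutoff: the `sparsity`-th percentile of all magnitudes
    threshold = np.percentile(flat, sparsity)
    # Zero everything below the cutoff, layer by layer
    return [np.where(np.abs(w) < threshold, 0.0, w) for w in weights]

layers = [np.random.randn(64, 64), np.random.randn(64, 10)]
pruned = global_magnitude_prune(layers, 50)
sparsity = sum((w == 0).sum() for w in pruned) / sum(w.size for w in pruned)
print(f"achieved sparsity: {sparsity:.0%}")
```

Because the cutoff is global rather than per-layer, layers whose weights matter less end up more sparse — which is what the `'global'` argument in the snippet above selects.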
up to 10×

faster

up to 90%

smaller

up to 70%

less CO₂

Compared to the original uncompressed model. Best-case results from combined pruning, quantization, and distillation.

Choose your path

Two ways to optimize.

Use our open-source tools yourself, or let us handle it for you.

DIY

Use our tools

Open-source, Apache 2.0 licensed
Pruning, quantization, distillation, benchmarking
Full documentation and tutorials
Community support via Discord
Browse Libraries
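Quantization, the second technique on the list, trades precision for size: store weights as 8-bit integers plus a scale factor instead of 32-bit floats, for roughly 4× smaller storage. A hedged sketch of a simple symmetric per-tensor scheme — the underlying idea, not fasterai's pipeline:

```python
import numpy as np

def quantize_int8(w):
    # Symmetric per-tensor scale: map the largest magnitude to 127
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Recover approximate float weights for computation
    return q.astype(np.float32) * scale

w = np.random.randn(256, 256).astype(np.float32)
q, scale = quantize_int8(w)
err = np.abs(dequantize(q, scale) - w).max()
print(f"int8 storage is 4x smaller; max abs error {err:.4f}")
```

The rounding error is bounded by half the scale, so tensors with a few large outlier weights quantize worse — one reason production pipelines use per-channel scales or calibration data.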
Done for you

Work with us

We audit your model and recommend a compression strategy
We apply our proprietary optimization pipeline
We deliver a production-ready compressed model
Typical results: 3–10× speedup with minimal accuracy loss
Book a Call



© 2026 smaller. faster. open. All rights reserved.