A high-performance inference engine for small language models and vision-language models on the edge.
Mobile-native LLM serving engine research in Rust. Paged KV cache, multi-session scheduling, and Metal/Vulkan kernels for on-device inference under 512MB RAM.
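To make the paged KV cache idea concrete, here is a minimal sketch of a page allocator shared by several sessions. It is an illustration under stated assumptions, not cellm's actual implementation: the names `PagedKvCache`, `append_token`, `release`, and the page size `PAGE_TOKENS = 16` are all hypothetical. Only the bookkeeping is shown; real pages would hold key/value tensors.

```rust
use std::collections::HashMap;

/// Token slots per page. Assumed value for illustration only.
pub const PAGE_TOKENS: usize = 16;

/// Hypothetical page allocator for a paged KV cache.
/// Each session owns a page table mapping logical pages to physical pages,
/// so memory is handed out in fixed-size pages instead of one large
/// contiguous buffer per session.
pub struct PagedKvCache {
    /// Physical page ids that are currently unused.
    free_pages: Vec<usize>,
    /// Session id -> physical pages, in logical order.
    page_tables: HashMap<u32, Vec<usize>>,
    /// Session id -> number of tokens stored so far.
    lengths: HashMap<u32, usize>,
}

impl PagedKvCache {
    /// Create a cache with `num_pages` physical pages, all free.
    pub fn new(num_pages: usize) -> Self {
        Self {
            // Reversed so that `pop` hands out page 0 first.
            free_pages: (0..num_pages).rev().collect(),
            page_tables: HashMap::new(),
            lengths: HashMap::new(),
        }
    }

    /// Reserve the slot for one more token of `session`.
    /// Returns (physical page, offset within page), or None when no free
    /// page is left. On failure the cache state is unchanged.
    pub fn append_token(&mut self, session: u32) -> Option<(usize, usize)> {
        let len = self.lengths.get(&session).copied().unwrap_or(0);
        let offset = len % PAGE_TOKENS;
        if offset == 0 {
            // The current page is full (or the session is new): take a new page.
            let page = self.free_pages.pop()?;
            self.page_tables.entry(session).or_default().push(page);
        }
        let page = *self.page_tables.get(&session)?.last()?;
        self.lengths.insert(session, len + 1);
        Some((page, offset))
    }

    /// Return all pages of a finished session to the free list.
    pub fn release(&mut self, session: u32) {
        if let Some(pages) = self.page_tables.remove(&session) {
            self.free_pages.extend(pages);
        }
        self.lengths.remove(&session);
    }

    /// Number of physical pages not assigned to any session.
    pub fn free_page_count(&self) -> usize {
        self.free_pages.len()
    }
}
```

In this sketch a scheduler would call `append_token` once per generated token and treat `None` as a signal to pause or evict a session. Because pages are returned through `release`, memory freed by one session becomes immediately usable by another, which is the property that matters under a tight RAM budget.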
CellmFFI.xcframework: 33MB total (11MB per platform: iOS, Simulator, macOS)
cellm-sdk.aar: 157KB + ~4.5MB .so
cellm-wasm: 2.6MB .wasm + WebGPU support + 21KB JS glue
© 2026 cellm research. jeff asante. minimalist technical documentation.