A high-performance inference engine for small language models and vision-language models on the edge.
A research project in Rust on mobile-native LLM serving: paged KV cache, multi-session scheduling, and Metal/Vulkan kernels for on-device inference under 512MB RAM.
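To illustrate the paged-KV-cache idea mentioned above, here is a minimal sketch of a block table that hands out fixed-size pages to sessions from a preallocated pool, so multiple sessions can grow and shrink without large contiguous allocations. All names (`PagedKvCache`, `PAGE_SIZE`, etc.) and the page size are illustrative assumptions, not cellm's actual API.

```rust
// Hypothetical paged KV-cache block table: token KV entries live in
// fixed-size pages; each session keeps a logical->physical page map.
// Names and sizes are illustrative, not this project's real interface.

const PAGE_SIZE: usize = 16; // tokens per page (assumed for the sketch)

struct PagedKvCache {
    free_pages: Vec<usize>,       // indices into a preallocated page pool
    page_tables: Vec<Vec<usize>>, // per-session logical -> physical pages
}

impl PagedKvCache {
    fn new(total_pages: usize) -> Self {
        Self {
            free_pages: (0..total_pages).rev().collect(),
            page_tables: Vec::new(),
        }
    }

    /// Open a new session; returns its id.
    fn new_session(&mut self) -> usize {
        self.page_tables.push(Vec::new());
        self.page_tables.len() - 1
    }

    /// Ensure the session can hold `n_tokens`, allocating pages on demand.
    fn reserve(&mut self, session: usize, n_tokens: usize) -> Result<(), &'static str> {
        let needed = (n_tokens + PAGE_SIZE - 1) / PAGE_SIZE;
        while self.page_tables[session].len() < needed {
            let page = self.free_pages.pop().ok_or("out of KV pages")?;
            self.page_tables[session].push(page);
        }
        Ok(())
    }

    /// Return all pages owned by a session to the pool.
    fn drop_session(&mut self, session: usize) {
        let pages = std::mem::take(&mut self.page_tables[session]);
        self.free_pages.extend(pages);
    }
}

fn main() {
    let mut cache = PagedKvCache::new(8); // 8 pages * 16 tokens = 128-token pool
    let a = cache.new_session();
    let b = cache.new_session();
    cache.reserve(a, 40).unwrap(); // ceil(40/16) = 3 pages
    cache.reserve(b, 20).unwrap(); // ceil(20/16) = 2 pages
    println!("free pages after alloc: {}", cache.free_pages.len()); // 3
    cache.drop_session(a);
    println!("free pages after drop:  {}", cache.free_pages.len()); // 6
}
```

Keeping page ownership in a per-session table is what lets a scheduler preempt or evict one session without touching another's memory, which matters under a hard RAM budget like the 512MB target above.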
CellmFFI.xcframework: 33MB total (11MB per platform: iOS, Simulator, macOS)