cellm

github.com/jeffasante/cellm →

A high-performance inference engine for small language models and vision-language models on the edge.

A research project in Rust on a mobile-native LLM serving engine. It combines a paged KV cache, multi-session scheduling, and Metal/Vulkan kernels, targeting on-device inference in under 512 MB of RAM.
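To make "paged KV cache" and "multi-session" concrete, here is a minimal bookkeeping sketch. It is not taken from the cellm source: the names `PagedKvCache`, `SessionId`, and `PAGE_TOKENS`, and the page size of 16 tokens, are all hypothetical. It assumes the general idea that KV memory is split into fixed-size pages which sessions claim on demand and return when they finish. The tensor storage itself is omitted.

```rust
use std::collections::HashMap;

/// Token slots per page (hypothetical value, chosen for illustration).
const PAGE_TOKENS: usize = 16;

/// Hypothetical session identifier.
type SessionId = u32;

/// Toy page allocator: tracks which fixed-size pages belong to which session.
/// Only the bookkeeping is shown; key/value tensor storage is left out.
pub struct PagedKvCache {
    /// Indices of pages not owned by any session.
    free_pages: Vec<usize>,
    /// Per-session list of physical pages, in logical order.
    page_tables: HashMap<SessionId, Vec<usize>>,
    /// Number of tokens stored so far for each session.
    lengths: HashMap<SessionId, usize>,
}

impl PagedKvCache {
    pub fn new(total_pages: usize) -> Self {
        Self {
            // Reversed so that `pop` hands out page 0 first.
            free_pages: (0..total_pages).rev().collect(),
            page_tables: HashMap::new(),
            lengths: HashMap::new(),
        }
    }

    /// Reserve a slot for one more token of `session`.
    /// Returns (physical_page, offset_in_page), or None if no page is free.
    pub fn append_token(&mut self, session: SessionId) -> Option<(usize, usize)> {
        let len = *self.lengths.get(&session).unwrap_or(&0);
        let offset = len % PAGE_TOKENS;
        if offset == 0 {
            // The session is new or its last page is full: claim a fresh page.
            let page = self.free_pages.pop()?;
            self.page_tables.entry(session).or_default().push(page);
        }
        let page = *self.page_tables.get(&session)?.last()?;
        self.lengths.insert(session, len + 1);
        Some((page, offset))
    }

    /// Return all pages owned by `session` to the free list.
    pub fn release(&mut self, session: SessionId) {
        if let Some(pages) = self.page_tables.remove(&session) {
            self.free_pages.extend(pages);
        }
        self.lengths.remove(&session);
    }

    /// Number of pages currently unowned.
    pub fn free_page_count(&self) -> usize {
        self.free_pages.len()
    }
}
```

In this sketch a session only holds memory for pages it has actually started filling, so several short sessions can share a small fixed pool, and a finished session's pages become available to others immediately after `release`.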

CellmFFI.xcframework: 33MB total (11MB per platform: iOS, Simulator, macOS)
cellm-sdk.aar: 157KB + ~4.5MB .so
cellm-wasm: 2.6MB .wasm + WebGPU support + 21KB JS glue [Launch Demo]


Project Overview

Core Architecture

Performance & Memory

Model Research

Development

Mobile & Bindings

© 2026 cellm research. jeff asante. minimalist technical documentation.