cellm

github.com/jeffasante/cellm

A high-performance inference engine for small language models and vision-language models on the edge.

Research into mobile-native LLM serving, written in Rust: a paged KV cache, multi-session scheduling, and Metal/Vulkan compute kernels for on-device inference under 512 MB of RAM.
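A paged KV cache splits the key/value tensors into fixed-size blocks and maps each session's logical token positions onto physical blocks on demand, so concurrent sessions draw from one pool instead of pre-reserving the full context length. The following is a minimal sketch of that block-allocator idea; the types, names, and 16-slot block size are illustrative assumptions, not cellm's actual API.

```rust
use std::collections::HashMap;

const BLOCK_SIZE: usize = 16; // token slots per physical block (assumed)

struct PagedKvCache {
    free_blocks: Vec<usize>,                 // indices of unused physical blocks
    block_tables: HashMap<u64, Vec<usize>>,  // session id -> logical-to-physical map
    tokens: HashMap<u64, usize>,             // tokens stored per session
}

impl PagedKvCache {
    fn new(num_blocks: usize) -> Self {
        Self {
            free_blocks: (0..num_blocks).rev().collect(),
            block_tables: HashMap::new(),
            tokens: HashMap::new(),
        }
    }

    /// Reserve space for one more token; a new block is allocated only
    /// when the session's current block is full. Returns false if the
    /// pool is exhausted, letting the scheduler decide what to evict.
    fn append_token(&mut self, session: u64) -> bool {
        let used = self.tokens.entry(session).or_insert(0);
        if *used % BLOCK_SIZE == 0 {
            match self.free_blocks.pop() {
                Some(block) => self.block_tables.entry(session).or_default().push(block),
                None => return false, // out of blocks: caller must preempt or evict
            }
        }
        *used += 1;
        true
    }

    /// Release all blocks owned by a finished session back to the pool.
    fn free_session(&mut self, session: u64) {
        if let Some(blocks) = self.block_tables.remove(&session) {
            self.free_blocks.extend(blocks);
        }
        self.tokens.remove(&session);
    }
}

fn main() {
    let mut cache = PagedKvCache::new(64); // 64 blocks * 16 slots = 1024 tokens
    for t in 0..40 {
        assert!(cache.append_token(1), "ran out of blocks at token {t}");
    }
    println!("session 1 uses {} blocks", cache.block_tables[&1].len()); // 3
    cache.free_session(1);
    println!("{} blocks free again", cache.free_blocks.len()); // 64
}
```

Returning false rather than panicking on exhaustion is one way a multi-session scheduler can stay under a hard memory budget: the scheduler can preempt or evict a session and retry, rather than over-provisioning each session up front.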

CellmFFI.xcframework: 33 MB total (11 MB per platform slice: iOS, iOS Simulator, macOS)
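Shipping the core as an xcframework implies the Rust crate is compiled once per target and exposed through a C ABI that Swift can call directly. Below is a hedged sketch of what one exported symbol might look like; the function name cellm_prompt_len is hypothetical, not cellm's real FFI surface.

```rust
use std::ffi::{c_char, CStr};
use std::os::raw::c_int;

/// Hypothetical exported entry point: count the characters in a
/// NUL-terminated UTF-8 prompt, returning -1 on null or invalid input.
/// `#[no_mangle]` plus `extern "C"` gives Swift a stable C symbol.
#[no_mangle]
pub extern "C" fn cellm_prompt_len(prompt: *const c_char) -> c_int {
    if prompt.is_null() {
        return -1;
    }
    // SAFETY: the caller guarantees a valid NUL-terminated string.
    let s = unsafe { CStr::from_ptr(prompt) };
    match s.to_str() {
        Ok(text) => text.chars().count() as c_int,
        Err(_) => -1,
    }
}
```

Building this as a static library per target and bundling the slices with a shared C header is the standard route to an xcframework of the kind listed above.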

Project Overview

Core Architecture

Performance & Memory

Model Research

Development

Mobile & Bindings