cellm
WASM LLM Inference Demo
Model
Load
Tokenizer:
Load Qwen Debug Model
Load SmolVLM Debug Model
Backend:
CPU (WASM)
WebGPU (Metal/Vulkan)
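The CPU (WASM) / WebGPU toggle above implies runtime feature detection: WebGPU is only usable when the browser exposes `navigator.gpu`, otherwise the demo must fall back to the CPU WASM path. A minimal sketch of that check (hypothetical helper, not the demo's actual code; `nav` is injected so the logic is testable outside a browser — in the page you would pass `navigator`):

```typescript
// Hypothetical backend picker, assuming WebGPU is preferred when available.
export type Backend = "webgpu" | "wasm";

export function pickBackend(nav: { gpu?: unknown }): Backend {
  // WebGPU is present only when `navigator.gpu` exists (Metal on macOS,
  // Vulkan/D3D12 elsewhere); otherwise fall back to CPU WASM inference.
  return nav.gpu !== undefined ? "webgpu" : "wasm";
}
```

In the page this would run once at startup, e.g. `pickBackend(navigator)`, to decide which backend radio button to enable by default.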
Waiting for model file...
VIEW LOGS
Suggested Models (Hugging Face)
NanoWhale-100M (MLA+MoE)
Qwen2.5-0.5B (8bit)
LFM-350M
SmolLM2-360M
Bonsai-1.7B
VLM Models (Text + Vision)
SmolVLM-256M (Vision)
SmolVLM-500M (Vision)
SmolVLM-256M (Debug Full)
Text
Chat
Inference
What is consciousness? Answer in one paragraph.
Max tokens:
Fair
Latency First
Throughput
Generate
Stop
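The Fair / Latency First / Throughput presets above suggest a scheduler trade-off between per-token latency and aggregate tokens/s. One plausible mapping onto scheduler knobs, purely illustrative — the field names and values here are assumptions, not the demo's real configuration:

```typescript
// Illustrative scheduler presets; all names and numbers are hypothetical.
type SchedulerPreset = "fair" | "latency-first" | "throughput";

interface SchedulerConfig {
  maxBatchSize: number; // requests decoded together per step
  roundRobin: boolean;  // share decode steps evenly across sessions
}

const PRESETS: Record<SchedulerPreset, SchedulerConfig> = {
  // Balance decode steps across concurrent chat sessions.
  "fair": { maxBatchSize: 4, roundRobin: true },
  // Minimize time-to-first-token for a single request.
  "latency-first": { maxBatchSize: 1, roundRobin: false },
  // Maximize tokens/s by batching more requests per step.
  "throughput": { maxBatchSize: 16, roundRobin: false },
};

export function configFor(preset: SchedulerPreset): SchedulerConfig {
  return PRESETS[preset];
}
```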
Quick Image Description
Describe
Describe this image in detail.
Attach an image and click Describe...
Tokens: 0 | Sessions: 0 | Free Blocks: 0
Chat
Send a message to start the conversation.
Load a VLM model and enable WebGPU for image chat.
+
Send
Stop
Vision (VLM) — Describe an Image
Image (JPEG/PNG):
Describe this image in detail.
Max tokens:
Describe
Sample Image
Stop
Upload an image and click Describe to generate a description...
Vision encoder: - ms | Decode: - ms | Total: - ms
Documentation
Load WASM Backend Docs