Anass Kartit

Writing

Notes from the workbench

AI, local LLMs, cloud architecture, and developer tools — benchmarked, built, and broken so you don't have to.

LATEST

Gemma 4 26B Won't Fit on My 24GB MacBook — Until I Did This

With Ollama I got 2 tok/s and broken tool calling. With the Unsloth Q3_K_XL quant on llama.cpp I got 49 tok/s and perfect tool calling. Then I built a Claude Code clone on top of it.

gemma 4, llama.cpp, unsloth, local ai, apple silicon