Pure C/C++ implementation of LLM inference with no dependencies. Runs on CPU and GPU, with quantization support for consumer hardware.