High-throughput, memory-efficient LLM inference engine. Supports PagedAttention, continuous batching, and tensor parallelism for production deployments.