The LLM evaluation framework. Pytest-like unit tests for LLMs with hallucination detection, RAG evaluation, and 50+ built-in metrics.
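To give a rough idea of what a "pytest-like unit test for an LLM" can look like, here is a minimal sketch in plain Python. It does not use the framework's actual API: `token_overlap`, `fake_llm`, and the `0.5` threshold are all hypothetical stand-ins, chosen only to show the general pattern of scoring a model output with a metric and asserting on the score.

```python
# Hypothetical sketch: none of these names come from the framework itself.

def token_overlap(expected: str, actual: str) -> float:
    """Toy metric: fraction of expected tokens that also appear in the output."""
    expected_tokens = set(expected.lower().split())
    actual_tokens = set(actual.lower().split())
    if not expected_tokens:
        # Nothing was required, so treat the output as fully passing.
        return 1.0
    return len(expected_tokens & actual_tokens) / len(expected_tokens)


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call so the example runs offline."""
    return "Paris is the capital of France."


def test_capital_question():
    # A pytest-style test: call the model, score the output, assert a threshold.
    output = fake_llm("What is the capital of France?")
    score = token_overlap("Paris", output)
    assert score >= 0.5  # assumed pass threshold for this illustration
```

Saved in a file named `test_*.py`, a function like `test_capital_question` would be collected and run by `pytest` like any other unit test. A real metric (for example, one for hallucination detection or RAG evaluation) would replace the toy `token_overlap` function.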