benmeryem-tech/llm-eval-kit
A lightweight, modular toolkit for evaluating and benchmarking Large Language Models, with a focus on reasoning quality, consistency, and error detection.
0 · Active
On the radar — signal detected
Stars: 3
Forks: 0
Contributors: 0
Language: Python
Score updated May 12, 2026