LLM as Evaluator
How large language models can score, grade, or critique the outputs of other models and of humans.
No evaluator experiments are published yet.