🧑‍⚖️ Self-improving evaluators in LangSmith

One method for evaluating LLM systems is to use another LLM "as a judge." These "LLM-as-a-Judge" evaluators can review raw text, using a prompt to guide the grader and automate human review. However, these systems require constant prompt engineering to stay aligned with human preferences.

In LangSmith, you can now use "LLM-as-a-Judge" evaluators with a self-improving feedback loop:
+ Allow a human to easily correct the "LLM-as-a-Judge"
+ Easily pass these corrections back to the "LLM-as-a-Judge" as few-shot examples

In part 1 last week, we showed how to apply self-improving evaluators to any LangSmith project:
+ The evaluator is applied automatically to all traces in your project and can run on production logs
+ It's easy to review, correct, and pass corrections back to improve the evaluator

Here in part 2, we show how to pin self-improving evaluators to any LangSmith dataset:
+ The evaluator is applied on every experiment run on your dataset

In both cases, the evaluator can be self-improved with human feedback!

🎥 Video: https://lnkd.in/gi6CG6qH
📓 Docs: https://lnkd.in/gPbYCnvm
🤓 Data flywheel resource: https://lnkd.in/gurFTjC9
✍️ Blog: https://lnkd.in/gvEgXuJU
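To make the feedback loop concrete, here is a minimal, framework-free sketch of the idea (this is *not* the LangSmith API; `fake_llm`, `judge`, and the data shapes are hypothetical stand-ins). A human correction is stored and replayed as a few-shot example in later judge prompts, so the judge's next grade reflects the reviewer's preference:

```python
# Sketch of a self-improving LLM-as-a-Judge loop. All names here are
# illustrative; in LangSmith the platform manages this loop for you.

RUBRIC = "You are a judge. Grade the answer PASS or FAIL for helpfulness."


def build_judge_prompt(answer: str, corrections: list[dict]) -> str:
    """Assemble the judge prompt; human corrections become few-shot examples."""
    shots = "".join(
        f"\nAnswer: {c['answer']}\nGrade: {c['grade']}" for c in corrections
    )
    return f"{RUBRIC}{shots}\nAnswer: {answer}\nGrade:"


def fake_llm(prompt: str) -> str:
    """Hypothetical model stub: if the prompt's few-shot examples already
    contain a grade for this exact answer, echo it (crudely mimicking
    in-context learning); otherwise default to PASS."""
    head, _, tail = prompt.rpartition("Answer:")
    answer = tail.split("\nGrade:")[0].strip()
    marker = f"Answer: {answer}\nGrade: "
    if marker in head:
        return head.split(marker, 1)[1].split("\n")[0].strip()
    return "PASS"


def judge(answer: str, corrections: list[dict]) -> str:
    """Run the LLM-as-a-Judge over one answer with the current corrections."""
    return fake_llm(build_judge_prompt(answer, corrections))


corrections: list[dict] = []

# 1) The judge grades an output on its own: defaults to PASS.
first = judge("We will refund you.", corrections)

# 2) A human reviewer disagrees and records a correction.
corrections.append({"answer": "We will refund you.", "grade": "FAIL"})

# 3) The correction is passed back as a few-shot example, and the
#    judge now grades the same answer FAIL.
second = judge("We will refund you.", corrections)
```

The same pattern scales to a real model: the few-shot block steers the judge toward human-preferred grades without rewriting the rubric by hand each time.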
Nag Maddula - Please take a look for the product modeler, to refine the answers before publishing to the user!
The concept of self-improving evaluators in LangSmith is a game-changer! It's impressive how it allows a human to easily correct and improve the LLM-as-a-Judge systems, creating a continuous feedback loop for enhanced performance. The ability to apply these evaluators to any project or dataset further adds to its utility. Excited to see more advancements in this space! Keep up the excellent work.