LangChain’s Post


239,009 followers

🧑‍⚖️ Self-improving evaluators in LangSmith

One method for evaluating LLM systems is to use another LLM "as a judge". These "LLM-as-a-Judge" evaluators can review raw text, using a prompt to guide the grader and automate human review. However, "LLM-as-a-Judge" systems require constant prompt engineering to stay aligned with human preferences.

In LangSmith, you can now use "LLM-as-a-Judge" evaluators with a self-improving feedback loop:
+ Allow a human to easily correct the "LLM-as-a-Judge"
+ Easily pass these corrections back to the "LLM-as-a-Judge" as few-shot examples

In part 1 last week, we showed how to apply self-improving evaluators to any LangSmith project:
+ The evaluator is applied to all traces in your project automatically and can run on production logs
+ It's easy to review, correct, and pass back corrections to improve the evaluator

Here in part 2, we show how to pin self-improving evaluators to any LangSmith dataset:
+ The evaluator is applied to every experiment run on your dataset

In both cases, the evaluator can be self-improved with human feedback!

🎥 Video: https://lnkd.in/gi6CG6qH
📓 Docs: https://lnkd.in/gPbYCnvm
🤓 Data flywheel resource: https://lnkd.in/gurFTjC9
✍️ Blog: https://lnkd.in/gvEgXuJU
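To make the feedback loop concrete, here is a minimal sketch in plain Python. It is not LangSmith's API: `SelfImprovingJudge`, `Correction`, `fake_llm`, and the prompt layout are all hypothetical names invented for illustration. The only idea taken from the post is that human corrections are stored and replayed to the judge as few-shot examples.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Correction:
    """A human-corrected judgment, kept for reuse as a few-shot example."""
    output: str
    corrected_score: int
    explanation: str


@dataclass
class SelfImprovingJudge:
    """Hypothetical LLM-as-a-Judge wrapper; not part of LangSmith."""
    instructions: str
    llm: Callable[[str], str]  # assumption: any prompt -> completion function
    corrections: List[Correction] = field(default_factory=list)
    max_examples: int = 5  # cap on how many recent corrections are replayed

    def build_prompt(self, output: str) -> str:
        # Grading instructions first, then the most recent human corrections
        # as few-shot examples, then the output that needs a score.
        recent = self.corrections[-self.max_examples:] if self.max_examples > 0 else []
        parts = [self.instructions]
        for c in recent:
            parts.append(
                f"Example output: {c.output}\n"
                f"Score: {c.corrected_score}\n"
                f"Reason: {c.explanation}"
            )
        parts.append(f"Output to grade: {output}\nScore:")
        return "\n\n".join(parts)

    def grade(self, output: str) -> int:
        # Assumes the model replies with a bare integer score.
        return int(self.llm(self.build_prompt(output)).strip())

    def correct(self, output: str, corrected_score: int, explanation: str) -> None:
        # A human disagreed with the judge: remember the corrected label.
        self.corrections.append(Correction(output, corrected_score, explanation))


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call so the sketch runs offline."""
    return "1"


judge = SelfImprovingJudge(
    instructions="Score 1 if the answer is concise, otherwise 0.",
    llm=fake_llm,
)
judge.correct("A long, rambling answer ...", 0, "Too verbose for the question.")
prompt_with_feedback = judge.build_prompt("A new answer to grade")
```

In this sketch the judge's base instructions never change; only the pool of examples grows as reviewers correct it. Capping the replayed examples with `max_examples` is a design choice made here to keep the prompt short, not something the post specifies.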


The concept of self-improving evaluators in LangSmith is a game-changer! It's impressive how it allows a human to easily correct and improve the LLM-as-a-Judge systems, creating a continuous feedback loop for enhanced performance. The ability to apply these evaluators to any project or dataset further adds to its utility. Excited to see more advancements in this space! Keep up the excellent work.


Nag Maddula - Please have a look at the product modeler, to refine the answers before publishing to the user!

Azaz Rasool

juggling | its fun when you have a clear view of your north star 💫

3w
Charles Dadi

@Nexa Forward | Streamline your business with AI.

3w