🐢 Open-Source Evaluation & Testing for LLMs and ML models
The LLM Evaluation Framework
The all-in-one LLM developer platform: prompt management, evaluation, human feedback, and deployment in one place.
Production-Grade Evaluation for LLM-Powered Applications
Prompty makes it easy to create, manage, debug, and evaluate LLM prompts for your AI applications. Prompty is an asset class and format for LLM prompts designed to enhance observability, understandability, and portability for developers.
The official evaluation suite and dynamic data release for MixEval.
Python SDK for running evaluations on LLM generated responses
Superpipe - optimized LLM pipelines for structured data
Connect agents to live web environments for evaluation.
Framework for LLM evaluation, guardrails and security
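As a rough illustration of what a guardrail can look like in practice, here is a minimal sketch (not the linked framework's API) that flags model outputs matching simple policy rules; the patterns and function names are placeholders.

```python
# A minimal output-guardrail sketch: block or flag responses that match
# simple policy rules before they reach the user. Patterns are illustrative only.
import re

BANNED_PATTERNS = [
    r"\b\d{3}-\d{2}-\d{4}\b",              # SSN-like pattern, stand-in for a PII rule
    r"(?i)ignore previous instructions",    # crude prompt-injection echo check
]

def check_output(text: str) -> tuple[bool, list[str]]:
    """Return (allowed, violations) for a model response."""
    violations = [p for p in BANNED_PATTERNS if re.search(p, text)]
    return (len(violations) == 0, violations)

if __name__ == "__main__":
    ok, hits = check_output("My SSN is 123-45-6789.")
    print(ok, hits)  # False, plus the matched pattern
```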
Evaluating LLMs with CommonGen-Lite
A simple GPT-based evaluation tool for multi-aspect, interpretable assessment of LLMs.
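To illustrate the general LLM-as-judge pattern such a tool builds on, here is a minimal multi-aspect scoring sketch using the openai>=1.x Python client; the aspects, prompt wording, and model name are illustrative assumptions, not the tool's actual interface.

```python
# Illustrative sketch of multi-aspect "LLM-as-judge" scoring.
# Assumes the openai>=1.x Python client and OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

ASPECTS = ["relevance", "coherence", "factuality"]  # example aspects, chosen for illustration

def judge(question: str, answer: str, aspect: str) -> str:
    """Ask a judge model to rate one aspect of an answer on a 1-5 scale with a short rationale."""
    prompt = (
        f"Rate the following answer for {aspect} on a scale of 1-5 "
        f"and briefly justify the score.\n\nQuestion: {question}\nAnswer: {answer}"
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",   # any judge-capable chat model
        messages=[{"role": "user", "content": prompt}],
        temperature=0,         # deterministic judging
    )
    return resp.choices[0].message.content

if __name__ == "__main__":
    q = "What causes tides on Earth?"
    a = "Tides are mainly caused by the gravitational pull of the Moon and, to a lesser extent, the Sun."
    for aspect in ASPECTS:
        print(aspect, "->", judge(q, a, aspect))
```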
Python SDK for experimenting, testing, evaluating & monitoring LLM-powered applications - Parea AI (YC S23)
[ACL'24] A Knowledge-grounded Interactive Evaluation Framework for Large Language Models
The implementation for the EMNLP 2023 paper "Beyond Factuality: A Comprehensive Evaluation of Large Language Models as Knowledge Generators"
Find better generation parameters for your LLM
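A common way to search for better generation parameters is a small grid sweep over sampling settings, scoring a few samples per configuration. The sketch below uses hypothetical `generate` and `score` placeholders to be swapped for a real model call and metric; it is not the linked project's API.

```python
# Minimal sketch of a parameter sweep over (temperature, top_p).
# `generate` and `score` are hypothetical placeholders for your model and metric.
import itertools
import random

def generate(prompt: str, temperature: float, top_p: float) -> str:
    # Placeholder: call your LLM here with the given sampling parameters.
    return f"output(temp={temperature}, top_p={top_p})"

def score(output: str) -> float:
    # Placeholder: plug in any metric (exact match, LLM judge, task-specific check).
    return random.random()

def sweep(prompt: str, temperatures, top_ps, samples_per_config: int = 3):
    """Grid-search sampling parameters and return them ranked by mean score."""
    results = []
    for temperature, top_p in itertools.product(temperatures, top_ps):
        scores = [score(generate(prompt, temperature, top_p)) for _ in range(samples_per_config)]
        results.append(((temperature, top_p), sum(scores) / len(scores)))
    return sorted(results, key=lambda r: r[1], reverse=True)

if __name__ == "__main__":
    ranked = sweep("Summarize the report.", temperatures=[0.2, 0.7, 1.0], top_ps=[0.9, 1.0])
    print("Best (params, mean score):", ranked[0])
```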
A framework for building scenario-simulation projects in which human and LLM-based agents can participate, with a user-friendly web UI for visualizing simulations and support for automatic evaluation at the agent-action level.
Open-LLM-Leaderboard: Open-Style Question Evaluation. Paper at https://arxiv.org/abs/2406.07545
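The core idea of open-style evaluation is to withhold the answer options, elicit a free-form response, and grade it against the gold answer. A minimal sketch, assuming a crude normalized-substring grader rather than the repository's actual pipeline:

```python
# Sketch of the open-style idea: drop the options, ask the question open-ended,
# and grade the free-form response against the gold answer with a simple string check.
import re

def normalize(text: str) -> str:
    """Lowercase and strip punctuation for a crude string comparison."""
    return re.sub(r"[^a-z0-9 ]", "", text.lower()).strip()

def to_open_style(mcq: dict) -> str:
    # Keep only the question stem; options are intentionally withheld from the model.
    return mcq["question"]

def grade(model_answer: str, gold_answer: str) -> bool:
    return normalize(gold_answer) in normalize(model_answer)

if __name__ == "__main__":
    item = {"question": "What is the capital of France?", "options": ["Paris", "Lyon"], "answer": "Paris"}
    prompt = to_open_style(item)                        # send this to the model under evaluation
    model_answer = "The capital of France is Paris."    # stand-in for a model response
    print(grade(model_answer, item["answer"]))          # True
```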
Official repository for the paper "ALERT: A Comprehensive Benchmark for Assessing Large Language Models’ Safety through Red Teaming"
DCR-Consistency: Divide-Conquer-Reasoning for Consistency Evaluation and Improvement of Large Language Models
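The title suggests a divide-then-judge procedure: split a candidate answer into sentences, check each against a reference, and aggregate the verdicts. The sketch below captures that general idea with a placeholder judge; the paper's prompts and aggregation almost certainly differ.

```python
# Simplified sketch of a divide-conquer-style consistency check:
# split the candidate into sentences, judge each against the reference, aggregate.
# `judge_sentence` is a hypothetical stand-in for an LLM call.
import re

def split_sentences(text: str) -> list[str]:
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def judge_sentence(sentence: str, reference: str) -> bool:
    # Placeholder: ask an LLM whether `sentence` is consistent with `reference`.
    return sentence.lower() in reference.lower()

def consistency_score(candidate: str, reference: str) -> float:
    """Fraction of candidate sentences judged consistent with the reference."""
    sentences = split_sentences(candidate)
    if not sentences:
        return 0.0
    verdicts = [judge_sentence(s, reference) for s in sentences]
    return sum(verdicts) / len(verdicts)

if __name__ == "__main__":
    ref = "the eiffel tower is in paris. it was completed in 1889."
    cand = "The Eiffel Tower is in Paris. It was completed in 1889."
    print(consistency_score(cand, ref))  # 1.0
```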