CivAgent is an LLM-based Human-like Agent acting as a Digital Player within the Strategy Game Unciv.
A framework for automatically manipulating and evaluating the political ideology of LLMs with two ideology tests: Wahl-O-Mat and Political Compass Test.
A prompt collection for testing and evaluation of LLMs.
The prompt engineering, prompt management, and prompt evaluation tool for Java.
An evaluation dataset comprising 274 grid-based puzzles of varying complexity.
Use LLMs for web scraping (data collection).
Trained Without My Consent (TraWiC): Detecting Code Inclusion In Language Models Trained on Code
This repository contains the lab work for Coursera course on "Generative AI with Large Language Models".
A compilation of referenced benchmark metrics to evaluate different aspects of knowledge for Large Language Models.
The prompt engineering, prompt management, and prompt evaluation tool for Go.
Dive into the world of LLM guardrails using tools like NVIDIA's NeMo Guardrails. Discover the mechanisms that ensure applications produce reliable, robust, safe, and ethical outputs, and understand their crucial role in LLM-based systems.
The prompt engineering, prompt management, and prompt evaluation tool for TypeScript, JavaScript, and NodeJS.
The Good, The Bad, and The Greedy: Evaluation of LLMs Should Not Ignore Non-Determinism
The prompt engineering, prompt management, and prompt evaluation tool for Kotlin.
Upload, score, and visually compare multiple LLM-graded summaries simultaneously!
Benchmark LLMs' abilities to plan, strategize, and reason by making them play chess against each other.
CS120.AI: an Angular- and Django-based chatbot designed for courses at Old Dominion University. Uses the Hugging Face Transformers library to fine-tune Llama 2 and a RAG-based method to query course data stored in a Pinecone vector database.