promptfoo / promptfoo

Test your prompts, agents, and RAGs. AI Red teaming, pentesting, and vulnerability scanning for LLMs. Compare performance of GPT, Claude, Gemini, Llama, and more. Simple declarative configs with command line and CI/CD integration.

testing ci evaluation ci-cd pentesting cicd vulnerability-scanners prompts evaluation-framework red-teaming rag llm prompt-engineering llmops prompt-testing llm-eval llm-evaluation llm-evaluation-framework

Updated Jan 14, 2026
TypeScript

msoedov / agentic_security

Agentic LLM Vulnerability Scanner / AI red teaming kit 🧪

agent-framework ai-red-team prompt-testing llm-security llm-vulnerabilities llm-evaluation llm-fuzzing llm-evaluation-framework llm-guardrails llm-scanner llm-jailbreaks llm-fuzzer llm-fuzzer-aggregator agent-security

Updated Dec 24, 2025
Python

babelcloud / LLM-RGB

LLM Reasoning and Generation Benchmark. Evaluate LLMs in complex scenarios systematically.

benchmark prompt llm prompt-engineering prompt-testing

Updated May 25, 2025
TypeScript

jhd3197 / Prompture

Prompture is an API-first library for requesting structured JSON output from LLMs (or any structure), validating it against a schema, and running comparative tests between models.

openai toon json-validation structured-output pydantic llm prompt-engineering ai-testing prompt-testing

Updated Nov 22, 2025
Python

aralyekta / prompttester

Test, compare, and optimize your AI prompts in minutes

prompt-testing llm-tools llm-test llm-evaluation prompt-test llm-testing

Updated Aug 13, 2025
JavaScript

prompt-foundry / typescript-sdk

The prompt engineering, prompt management, and prompt evaluation tool for TypeScript, JavaScript, and NodeJS.

typescript gpt open-ai gpt-3 gpt-4 llm prompt-engineering llmops prompt-testing prompt-manager prompt-management llm-eval llm-test llm-ops llm-evaluation prompt-evaluation

Updated Nov 15, 2025
TypeScript

bluewave-labs / evalwise

EvalWise is a developer-friendly platform for LLM evaluation and red teaming that helps test AI models for safety, compliance, and performance issues.

rag llm prompt-engineering llmops prompt-testing evals llm-evaluation rag-evaluation llm-evaluation-toolkit

Updated Nov 20, 2025
Python

calibrtr / llm-prompt-test

LLM Prompt Test helps you test Large Language Model (LLM) prompts to ensure they consistently meet your expectations.

testing tdd test prompt test-automation testing-tools prompts large-language-models llm prompt-engineering prompt-testing

Updated May 22, 2024
TypeScript

yukinagae / genkitx-promptfoo

Community Plugin for Genkit to use Promptfoo

plugin testing firebase ai evaluation prompt prompts evaluation-framework llm llmops prompt-testing llm-eval llm-evaluation llm-evaluation-framework promptfoo genkit genkitx genkit-plugin

Updated Jan 3, 2025
TypeScript

syamsasi99 / prompt-evaluator

prompt-evaluator is an open-source toolkit for evaluating, testing, and comparing LLM prompts. It provides a GUI-driven workflow for running prompt tests, tracking token usage, visualizing results, and ensuring reliability across models like OpenAI, Claude, and Gemini.

electron react typescript datascience developer-tools ai-evaluation llm prompt-engineering prompt-testing promptfoo ai-evaluation-tools ai-evaluation-metrics ai-evaluation-framework

Updated Dec 4, 2025
TypeScript

SEMalytics / claude_project_chat

Test Claude Projects without copy-pasting. Local workbench for prompt engineering, agent testing, and workflow iteration. Direct Claude.ai access via cookie auth, 20+ prompt templates, web fetch/search tools, file uploads. Stop switching tabs to test your prompts.

flask devtools api-client developer-tools testing-tools knowledge-base workflow-automation ai-agents claude ai-development llm prompt-engineering prompt-testing anthropic claude-api prompt-templates claude-projects

Updated Jan 13, 2026
JavaScript

amansoomro062 / atelier

An open-source AI prompt engineering playground with live code execution. Test OpenAI & Claude prompts, execute JavaScript, and iterate in real-time.

playground ai nextjs openai developer-tools claude llm prompt-engineering prompt-testing anthropic prompt-optimization system-prompts

Updated Nov 8, 2025
TypeScript

missoutlaw / Outlaw-Prompt

A Beautiful, Cinematic Prompt Engineering Studio

gemini openai ai-tools prompt-engineering prompting prompt-testing ai-prompts prompt-manager llm-prompting llm-tools ollama perplexity-ai ai-prompt image-generation-ai open-webui llm-prompts

Updated Dec 21, 2025

yukinagae / promptfoo-sample

Sample project demonstrating how to use Promptfoo, a test framework for evaluating the output of generative AI models.

testing evaluation prompts evaluation-framework llm llmops prompt-testing llm-eval llm-evaluation llm-evaluation-framework promptfoo

Updated Sep 10, 2024

radoslaw-sz / maia

A pytest-based framework for testing multi-agent AI systems. It provides a flexible and extensible platform for complex multi-agent simulations. Supports many integrations, such as LiteLLM, CrewAI, and LangChain.

python framework ai test agents maia llm prompt-engineering ai-testing prompt-testing agentic ai-testing-tool

Updated Sep 24, 2025
TypeScript

abdullahkhalid00 / prompt-db

A collection of prompts that I use on a day-to-day basis for work and leisure.

markdown jinja2 text prompts prompt-engineering chatgpt prompt-testing prompt-template

Updated Sep 9, 2024

GTMVP / modal-llm-evaluator

Run 1,000 LLM evaluations in 10 minutes. Test prompts across Claude, GPT-4, and Gemini with parallel execution, real-time cost tracking, and beautiful visualizations. Open source.

python testing benchmarking machine-learning automation ai modal developer-tools parallel-execution mlops streamlit llm prompt-engineering llms prompt-testing anthropic llm-evaluation cost-tracking google-gemini

Updated Dec 12, 2025
Python

alinaleo27 / ai-rag-eval-qa

AI RAG evaluation project using Ragas. Includes RAG metrics (precision, recall, faithfulness), retrieval diagnostics, and prompt testing examples for fintech/banking LLM systems. Designed as an AI QA Specialist portfolio project.

ai-qa prompt-testing llm-evaluation rag-evaluation ragas llm-testing

Updated Nov 17, 2025
Python

srdarkseer / PromptForge

Visual prompt engineering platform for creating, testing, and versioning LLM prompts across multiple providers (OpenAI, Anthropic, Mistral, Gemini).

ai-tools llm prompt-engineering prompt-testing prompt-optimization

Updated Nov 5, 2025
TypeScript

ashleysally00 / promptfoo-quickstart-guide

Quickstart guide for using PromptFoo to evaluate LLM prompts via CLI or Colab.

openai colab model-evaluation cli-tool llm prompt-engineering prompt-testing promptfoo

Updated Nov 23, 2025

prompt-testing

Here are 29 public repositories matching this topic...
