npm

mcp-evals

GitHub Action for evaluating MCP server tool calls using LLM-based scoring

128 stars · 113k/wk · updated 351d ago · github ↗
88 · good
▣ Overview

What it does

Node.js package and GitHub Action that validates MCP tool implementations using LLM-based scoring. Supports both TypeScript and YAML configuration formats. Evaluations are scored across five dimensions—accuracy, completeness, relevance, clarity, reasoning—on a 1–5 scale, with structured result objects. Includes built-in observability support with metrics and tracing via OTEL.

Who it's for

MCP server developers verifying tool correctness and CI/CD teams automating tool validation on pull requests. Useful for teams already running GitHub Actions who want to catch tool regressions before merge.

Common use cases

  • Score tool output accuracy on PR changes using the GitHub Action
  • Test error handling—e.g., invalid inputs—via YAML eval files
  • Verify multi-step tool functionality, such as weather forecasts, locally with npx mcp-eval
  • Monitor tool performance metrics with the OTEL-compatible observability stack
  • Gate merges on eval scores by parsing GitHub Action results
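
The merge-gating use case above can be sketched roughly as follows. This is an illustration only, not the package's API: the `EvalScores` class, the `passes_gate` function, and both thresholds are hypothetical. Only the five dimension names and the 1–5 scale come from the description on this page.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class EvalScores:
    """Hypothetical container for one eval's scores (each on a 1-5 scale)."""
    accuracy: int
    completeness: int
    relevance: int
    clarity: int
    reasoning: int


def score_values(scores: EvalScores) -> list:
    """Return the five dimension scores in declaration order."""
    return [getattr(scores, f.name) for f in fields(scores)]


def mean_score(scores: EvalScores) -> float:
    """Average the five dimension scores."""
    values = score_values(scores)
    return sum(values) / len(values)


def passes_gate(scores: EvalScores, min_mean: float = 3.5, min_each: int = 2) -> bool:
    """Pass only if the mean is high enough and no single dimension is too low.

    Both thresholds are illustrative assumptions, not package defaults.
    """
    values = score_values(scores)
    if any(not 1 <= v <= 5 for v in values):
        raise ValueError("each score must be between 1 and 5")
    return mean_score(scores) >= min_mean and min(values) >= min_each
```

Checking the minimum as well as the mean keeps one strong dimension from masking a failing one; a CI step could exit non-zero whenever `passes_gate` returns `False`.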

Setup pitfalls

  • Requires OPENAI_API_KEY or ANTHROPIC_API_KEY environment variable; defaults to GPT-4, which may incur token quota or billing issues on shared keys
  • The metrics and observability feature is alpha with unstable APIs; the docker-compose stack requires Docker and careful port configuration
  • Reads filesystem to load eval definitions—ensure sandboxing if running untrusted eval files
  • Last commit 351 days ago; the package does not appear to be actively maintained, so support for newer Anthropic SDK features or MCP spec updates may lag
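
The API-key pitfall above lends itself to a preflight check before any eval runs. The sketch below is an assumption about how one might do that in a CI script; `find_provider_key` is a hypothetical helper, and only the two environment variable names come from this page.

```python
import os
from typing import Mapping, Optional

# Variable names taken from the pitfall list above.
PROVIDER_KEYS = ("OPENAI_API_KEY", "ANTHROPIC_API_KEY")


def find_provider_key(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the name of the first provider key that is set and non-empty."""
    for name in PROVIDER_KEYS:
        # Treat whitespace-only values as unset.
        if env.get(name, "").strip():
            return name
    return None
```

Failing fast when this returns `None` avoids starting an eval run that can only end in an authentication error.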
▣ Score Breakdown
MCPScore = Σ(raw × weight)

Dimension   Weight   Raw   Weighted
Security    35%      100   35.0
Freshness   25%      85    21.3
Adoption    20%      81    16.2
Quality     10%      100   10.0
Trust       10%      50    5.0
Total                      87.5
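
The weighted sum can be reproduced from the listed values. The sketch below assumes the formula is applied exactly as written, with each percentage weight expressed as a fraction; `BREAKDOWN` and `mcp_score` are illustrative names, not part of the site or the package. The unrounded sum of these values is 87.45, which the listing displays as 87.5.

```python
# (weight, raw) pairs copied from the score breakdown above.
BREAKDOWN = {
    "Security": (0.35, 100),
    "Freshness": (0.25, 85),
    "Adoption": (0.20, 81),
    "Quality": (0.10, 100),
    "Trust": (0.10, 50),
}


def mcp_score(breakdown: dict) -> float:
    """Compute the sum of raw * weight over all dimensions."""
    return sum(weight * raw for weight, raw in breakdown.values())


def weighted_parts(breakdown: dict) -> dict:
    """Return each dimension's weighted contribution."""
    return {name: weight * raw for name, (weight, raw) in breakdown.items()}
```

Because the weights sum to 100%, the total stays on the same 0 to 100 scale as the raw scores.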
⚿ Capabilities & Risk Explainer
fs read · exec · secrets
◆ Risk level: medium · 1 tool · auth: API key
fs read + exec + secrets active — can execute code, access credentials, and make external network calls.
Tool name   Description    Destructive?
add         (none given)   ✓ no
⚙ Install config
Claude Desktop · Cursor · Windsurf · VS Code (Copilot) · Claude Code
add to your MCP client config:
{
  "mcpServers": {
    "mcp-evals": {
      "command": "npx",
      "args": [
        "-y",
        "mcp-evals"
      ]
    }
  }
}
📈 Score history · last 32 snapshots
4/30/2026 to 6/10/2026 · 32 snapshots
⚙ Maintenance health
8/100 · is this project alive?
contributors (1y): 1
top contributor share: 100%
releases (1y): 0
last release: 391d ago
ci: ✗ none
⛁ Raw data
weekly downloads: 113k
github stars: 128
forks: 13
open issues: 6
license: ✓ present
readme length: 6465 chars
last publish: 25d ago
last commit: 351d ago
last updated: 3d ago
install verified: ✗ fail · 28d ago