Score every LLM response. Route to the cheapest model that passes. Auto-escalate when quality drops.
npm install proofmark
First model to pass the gate wins. You pay for the cheapest sufficient answer.
Every response gets a quality score from 0.0 to 1.0. The gate checks XML structure, rubric completeness, injection attempts, and more.
Start with the cheapest model. If it passes the gate threshold, return immediately. No wasted spend on expensive models.
If quality drops below the threshold, the router automatically tries the next tier. The caller never sees a bad response.
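The cheapest-first escalation described above can be sketched roughly as follows. This is a minimal illustration, not Proofmark's internals: the `Tier` and `Routed` types, the `routeCheapestFirst` function, and the injected `score` callback are all hypothetical names. Returning the best-scoring attempt when no tier passes is also an assumption, since the text does not say what happens in that case.

```typescript
// Hypothetical types for illustration only; not Proofmark's actual API.
type Tier = {
  name: string;
  call: (prompt: string) => Promise<string>;
};

type Routed = {
  provider: string;
  text: string;
  quality: number;
  escalated: boolean;
};

// Try tiers in the order given (cheapest first) and return the first
// response whose score meets the threshold.
// Assumption: if no tier passes, fall back to the best-scoring attempt.
async function routeCheapestFirst(
  prompt: string,
  tiers: Tier[],
  score: (text: string) => number,
  threshold: number,
): Promise<Routed> {
  let best: Routed | undefined;
  let index = 0;
  for (const tier of tiers) {
    const text = await tier.call(prompt);
    const attempt: Routed = {
      provider: tier.name,
      text,
      quality: score(text),
      escalated: index > 0, // true once we have moved past the first tier
    };
    if (attempt.quality >= threshold) return attempt;
    if (best === undefined || attempt.quality > best.quality) best = attempt;
    index++;
  }
  if (best === undefined) throw new Error("at least one tier is required");
  return best;
}
```

Passing the scorer in as a function keeps the loop independent of how quality is measured, so the same sketch works with any gate that returns a number between 0.0 and 1.0.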
import { createRouter } from "proofmark";

const router = createRouter({
  openaiKey: process.env.OPENAI_API_KEY,
  anthropicKey: process.env.ANTHROPIC_API_KEY,
  minimaxKey: process.env.MINIMAX_API_KEY,
  promptId: "pmpt_your_stored_prompt"
});

const result = await router.evaluate("Analyze this SaaS idea");
// result.provider → "minimax" | "openai" | "anthropic"
// result.quality → 0.82
// result.escalated → false
Composite score: 40% structural + 60% per-response quality
| Check | What it catches |
|---|---|
| XML well-formedness | Broken tags, unclosed elements, malformed output |
| Rubric completeness | Missing evaluation dimensions, incomplete scoring |
| Probability sanity | Scores that violate statistical bounds |
| Injection detection | Prompt injection attempts in LLM output |
| Text length | Suspiciously short or truncated responses |
| Structural consistency | Format drift across repeated calls |
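To make the 40/60 weighting concrete, here is a small sketch of how such a composite could be computed. Only the two weights come from the text above. The `CheckResult` shape, the function names, and the choice to treat the structural score as the fraction of checks that passed are assumptions made for illustration, and the real gate may combine its checks differently.

```typescript
// Hypothetical shape: each check reports a simple pass or fail.
type CheckResult = { name: string; passed: boolean };

// Weights from the text: 40% structural, 60% per-response quality.
const STRUCTURAL_WEIGHT = 0.4;
const QUALITY_WEIGHT = 0.6;

// Assumption: the structural score is the fraction of checks that passed.
function structuralScore(checks: CheckResult[]): number {
  if (checks.length === 0) return 0;
  const passed = checks.filter((c) => c.passed).length;
  return passed / checks.length;
}

// Combine the structural score with a quality score clamped to [0, 1].
function compositeScore(checks: CheckResult[], quality: number): number {
  const q = Math.min(1, Math.max(0, quality));
  return STRUCTURAL_WEIGHT * structuralScore(checks) + QUALITY_WEIGHT * q;
}
```

Under these assumptions, a response that passes five of six checks with a per-response quality of 0.8 scores 0.4 × (5/6) + 0.6 × 0.8, or about 0.81.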
Standalone servers built on the core SDK. Each one imports from proofmark and runs independently.
OpenAI-compatible LLM proxy with quality gate. Drop-in replacement for /v1/chat/completions that scores and routes every request.
Prompt A/B testing service. Define experiments with Standard Schema validation, track metrics, and pick winners with statistical confidence.
Documentation quality scoring. Readability analysis, structural checks, and automatic Diataxis classification (Tutorial, Guide, Reference, Explanation).
Pedagogy-aware quality gate. Detects teaching methods (Suzuki, Kodaly, Orff), maps to Bloom's taxonomy, and audits curriculum sequences.
Proofmark is MIT licensed. The core SDK, quality gate, router, and all verticals are open source.