Quality gate for AI responses

Score every LLM response. Route to the cheapest model that passes. Auto-escalate when quality drops.

npm install proofmark
MiniMax $0.001/1K tokens
OpenAI $0.01/1K tokens
Opus $0.06/1K tokens

First model to pass the gate wins. You pay for the cheapest sufficient answer.
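As a rough illustration of what "cheapest sufficient answer" means for spend, the sketch below sums the listed per-1K-token prices for every tier tried before one passes. It assumes each attempted tier is billed in full and that every attempt uses the same token count. The `TIERS` array and `costOfAttempt` helper are hypothetical names for this example, not part of the proofmark API.

```javascript
// Prices per 1K tokens, copied from the list above.
const TIERS = [
  { name: "MiniMax", pricePer1K: 0.001 },
  { name: "OpenAI", pricePer1K: 0.01 },
  { name: "Opus", pricePer1K: 0.06 },
];

// Cost of a request that passes at tier index `passedAt` (0-based),
// assuming every earlier tier was tried and billed for `tokens` tokens.
function costOfAttempt(passedAt, tokens) {
  let total = 0;
  for (let i = 0; i <= passedAt; i++) {
    total += TIERS[i].pricePer1K * (tokens / 1000);
  }
  return total;
}

console.log(costOfAttempt(0, 1000).toFixed(3)); // first tier passes
console.log(costOfAttempt(2, 1000).toFixed(3)); // escalates through all three tiers
```

Under these assumptions, a 1K-token request costs about $0.001 when the first tier passes and about $0.071 when it escalates through all three.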

How it works

1. Score

Every response gets a quality score from 0.0 to 1.0. The gate checks XML structure, rubric completeness, signs of prompt injection, and more.

2. Route

Start with the cheapest model. If its response passes the gate threshold, the router returns it immediately, so nothing is spent on more expensive models.

3. Escalate

If a response scores below the threshold, the router automatically tries the next tier. The caller never sees a bad response.
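The three steps can be sketched as a single loop. This is a minimal illustration, not proofmark's actual implementation: `routeWithEscalation`, the `models` array, and the `score` callback are hypothetical, and the returned fields only mirror the result fields shown in the router example. The sketch also assumes that the router throws when no tier passes, which the text above does not specify.

```javascript
// Hypothetical sketch of cheapest-first routing with escalation.
// `models` is ordered cheapest first; each entry has a `name` and an async `call(prompt)`.
// `score(response)` returns a quality score in [0, 1].
async function routeWithEscalation(prompt, models, score, threshold = 0.7) {
  for (let i = 0; i < models.length; i++) {
    const response = await models[i].call(prompt);
    const quality = score(response);
    if (quality >= threshold) {
      // First model to pass the gate wins.
      return { provider: models[i].name, response, quality, escalated: i > 0 };
    }
    // Below threshold: fall through and try the next, more expensive tier.
  }
  // Assumption: with every tier exhausted, fail loudly rather than return a bad response.
  throw new Error("No model tier passed the quality gate");
}
```

Because the loop returns as soon as one response passes, a more expensive model is only called after every cheaper tier has fallen short.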

router.mjs
import { createRouter } from "proofmark";

const router = createRouter({
  openaiKey: process.env.OPENAI_API_KEY,
  anthropicKey: process.env.ANTHROPIC_API_KEY,
  minimaxKey: process.env.MINIMAX_API_KEY,
  promptId: "pmpt_your_stored_prompt"
});

const result = await router.evaluate("Analyze this SaaS idea");
// result.provider  → "minimax" | "openai" | "anthropic"
// result.quality   → 0.82
// result.escalated → false

Quality gate scoring

Composite score: 40% structural + 60% per-response quality
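Read literally, the composite is a weighted sum of two sub-scores. The sketch below shows that arithmetic, assuming both sub-scores are already normalized to the 0.0 to 1.0 range; `compositeScore` is a hypothetical helper name, not a documented proofmark export.

```javascript
// Weights from the text: 40% structural, 60% per-response quality.
const STRUCTURAL_WEIGHT = 0.4;
const QUALITY_WEIGHT = 0.6;

function compositeScore(structural, quality) {
  for (const value of [structural, quality]) {
    // Rejects out-of-range values, NaN, and non-numbers.
    if (!(value >= 0 && value <= 1)) {
      throw new RangeError("sub-scores must be numbers in [0, 1]");
    }
  }
  return STRUCTURAL_WEIGHT * structural + QUALITY_WEIGHT * quality;
}

// A structurally perfect response with a 0.7 quality sub-score.
console.log(compositeScore(1.0, 0.7).toFixed(2));
```

For example, a structural score of 1.0 and a quality score of 0.7 combine to roughly 0.82.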

Check                    What it catches
XML well-formedness      Broken tags, unclosed elements, malformed output
Rubric completeness      Missing evaluation dimensions, incomplete scoring
Probability sanity       Scores that violate statistical bounds
Injection detection      Prompt injection attempts in LLM output
Text length              Suspiciously short or truncated responses
Structural consistency   Format drift across repeated calls
Default quality threshold: 0.70. Responses scoring below this trigger automatic escalation to the next model tier.
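To make two of the structural checks concrete, here is a deliberately naive sketch of a tag-balance check and a minimum-length check, combined into a structural sub-score as the fraction of checks passed. All function names, the 40-character minimum, and the equal weighting of checks are assumptions for illustration. The tag check is a regex approximation, not a real XML parser, and proofmark's own checks may work quite differently.

```javascript
// Naive well-formedness check: every opening tag must be closed in the right order.
// Ignores comments, CDATA, and other XML features a real parser would handle.
function tagsBalanced(text) {
  const stack = [];
  const tagPattern = /<(\/?)([A-Za-z][\w.-]*)[^<>]*?(\/?)>/g;
  for (const [, closing, name, selfClosing] of text.matchAll(tagPattern)) {
    if (selfClosing) continue;            // <br/> needs no closing tag
    if (!closing) { stack.push(name); continue; }
    if (stack.pop() !== name) return false; // mismatched or unexpected closing tag
  }
  return stack.length === 0;              // anything left on the stack was never closed
}

// Flags suspiciously short or truncated responses (threshold is a made-up default).
function longEnough(text, minChars = 40) {
  return text.trim().length >= minChars;
}

// Structural sub-score: fraction of checks passed (equal weights, by assumption).
function structuralScore(text) {
  const checks = [tagsBalanced(text), longEnough(text)];
  return checks.filter(Boolean).length / checks.length;
}

// Gate decision using the default threshold from the text.
function passesGate(score, threshold = 0.7) {
  return score >= threshold;
}
```

A truncated response such as `<analysis><score>0.8</score>` fails both checks in this sketch: the `analysis` tag is never closed, and the text is under 40 characters.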

Verticals

Standalone servers built on the core SDK. Each one imports from proofmark and runs independently.

Gate (:8090)

OpenAI-compatible LLM proxy with quality gate. Drop-in replacement for /v1/chat/completions that scores and routes every request.
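Since Gate presents the same endpoint shape as `/v1/chat/completions`, a client could in principle talk to it with a standard chat completions request. The sketch below assumes a Gate server running locally on port 8090, Node 18+ for the global `fetch`, and no authentication header. The `model` value is a placeholder, since how Gate interprets that field is not described here, and both helper names are hypothetical.

```javascript
// Assumption: a Gate server is listening locally on the port listed for it.
const GATE_URL = "http://localhost:8090/v1/chat/completions";

// Builds a standard chat completions request body.
function buildGateRequest(userMessage, model = "placeholder-model") {
  return {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model,
      messages: [{ role: "user", content: userMessage }],
    }),
  };
}

// Sends the request and returns the assistant text from the first choice,
// following the standard chat completions response shape.
async function askGate(userMessage) {
  const res = await fetch(GATE_URL, buildGateRequest(userMessage));
  if (!res.ok) throw new Error(`Gate returned HTTP ${res.status}`);
  const data = await res.json();
  return data.choices[0].message.content;
}
```

An existing OpenAI client would typically only need its base URL pointed at the proxy, which is what "drop-in replacement" suggests.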

Bench (:8091)

Prompt A/B testing service. Define experiments with Standard Schema validation, track metrics, and pick winners with statistical confidence.
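One common way to "pick winners with statistical confidence" is a two-proportion z-test on pass rates. The sketch below illustrates that technique under the assumption that each prompt variant is judged by how many of its responses pass the gate. It is not a description of how Bench computes confidence, and the function names are hypothetical.

```javascript
// Two-proportion z-test: is variant B's pass rate different from variant A's?
function twoProportionZ(passesA, totalA, passesB, totalB) {
  const rateA = passesA / totalA;
  const rateB = passesB / totalB;
  // Pooled pass rate under the null hypothesis that both variants perform the same.
  const pooled = (passesA + passesB) / (totalA + totalB);
  const standardError = Math.sqrt(pooled * (1 - pooled) * (1 / totalA + 1 / totalB));
  if (standardError === 0) return 0; // all passes or all failures: no evidence either way
  return (rateB - rateA) / standardError;
}

// 1.96 is the two-sided critical value for 95% confidence.
function isSignificant(z, critical = 1.96) {
  return Math.abs(z) >= critical;
}

// Made-up counts for illustration: A passes 70 of 100, B passes 85 of 100.
const z = twoProportionZ(70, 100, 85, 100);
console.log(z.toFixed(2), isSignificant(z));
```

With these made-up counts the z-statistic is about 2.54, which clears the 1.96 bar, so B would be declared the winner at 95% confidence.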

Docs (:8092)

Documentation quality scoring. Readability analysis, structural checks, and automatic Diataxis classification (Tutorial, Guide, Reference, Explanation).

Teach (:8093)

Pedagogy-aware quality gate. Detects teaching methods (Suzuki, Kodaly, Orff), maps to Bloom's taxonomy, and audits curriculum sequences.

Pricing

Free

$0
  • Core SDK (MIT)
  • Quality gate scoring
  • 3-tier model routing
  • A/B testing framework
  • All verticals (self-hosted)

Pro

$29/mo
  • Everything in Free
  • Hosted Gate proxy
  • Dashboard analytics
  • Webhook notifications
  • Priority support

Enterprise

Custom
  • Everything in Pro
  • Custom model chains
  • SSO / SAML
  • SLA guarantee
  • Dedicated support

Powered by Polar

Open source

Proofmark is MIT licensed. The core SDK, quality gate, router, and all verticals are open source.

18 Unit tests
50 Adversarial cases
0% Crash rate
4 Verticals