Proofmark — Quality Gate for AI Responses

How it works

1

Score

Every response gets a quality score from 0.0 to 1.0. The gate checks XML structure, rubric completeness, injection attempts, and more.

2

Route

The router starts with the cheapest model. If that response meets the gate threshold, it is returned immediately, with no spend on more expensive models.

3

Escalate

If the score falls below the threshold, the router automatically retries on the next tier. The caller never sees the rejected response.
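The three steps above can be sketched as a single loop. This is an illustrative outline, not Proofmark's actual implementation: `routeWithEscalation`, the tier objects, and the `passed` field are hypothetical names, and returning the last attempt when every tier fails is an assumption the page does not confirm.

```javascript
// Illustrative sketch only: routeWithEscalation and the tier objects are
// hypothetical and are not part of the proofmark package.
async function routeWithEscalation(prompt, tiers, score, threshold = 0.7) {
  if (tiers.length === 0) throw new Error("at least one tier is required");
  let attempt = null;
  for (let i = 0; i < tiers.length; i++) {
    const text = await tiers[i].call(prompt); // tiers are ordered cheapest first
    const quality = score(text);              // expected to be in [0, 1]
    attempt = {
      provider: tiers[i].name,
      text,
      quality,
      escalated: i > 0,                       // true once we moved past the first tier
      passed: quality >= threshold,
    };
    if (attempt.passed) return attempt;       // gate passed: stop spending
  }
  // Assumption: when every tier fails, return the last attempt flagged as failed.
  return attempt;
}

// Stub tiers standing in for real model calls.
const demoTiers = [
  { name: "cheap", call: async () => "ok" },
  { name: "premium", call: async () => "<answer>A complete, well-formed response.</answer>" },
];

// Toy scorer: longer text scores higher, capped at 1.0.
const toyScore = (text) => Math.min(text.length / 40, 1);
```

Calling `routeWithEscalation("Analyze this SaaS idea", demoTiers, toyScore)` rejects the two-character reply from the first stub and resolves with the second tier's response, with `escalated` set to `true`.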

router.mjs

import { createRouter } from "proofmark";

// Provider keys are read from the environment; promptId is a placeholder
// for the ID of your stored prompt.
const router = createRouter({
  openaiKey: process.env.OPENAI_API_KEY,
  anthropicKey: process.env.ANTHROPIC_API_KEY,
  minimaxKey: process.env.MINIMAX_API_KEY,
  promptId: "pmpt_your_stored_prompt"
});

// Top-level await works here because router.mjs is an ES module.
const result = await router.evaluate("Analyze this SaaS idea");
// result.provider  → "minimax" | "openai" | "anthropic"
// result.quality   → 0.82 (example value)
// result.escalated → false

Quality gate scoring

Composite score: 40% structural + 60% per-response quality

Check	What it catches
XML well-formedness	Broken tags, unclosed elements, malformed output
Rubric completeness	Missing evaluation dimensions, incomplete scoring
Probability sanity	Scores that violate statistical bounds
Injection detection	Prompt injection attempts in LLM output
Text length	Suspiciously short or truncated responses
Structural consistency	Format drift across repeated calls
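To make two rows of the table concrete, here is a rough sketch of what a well-formedness check and a length check might look like. Both functions are hypothetical and deliberately naive: the tag matcher ignores comments, CDATA, and attribute values containing `>`, and the 20-character minimum is an arbitrary placeholder, not a Proofmark default.

```javascript
// Hypothetical, simplified checks; Proofmark's real checks may differ.

// Returns true when every opening tag has a matching closing tag in order.
function isWellFormed(xml) {
  const tagPattern = /<(\/?)([A-Za-z][\w.-]*)([^<>]*?)(\/?)>/g;
  const stack = [];
  let match;
  while ((match = tagPattern.exec(xml)) !== null) {
    const [, closing, name, , selfClosing] = match;
    if (selfClosing) continue;                 // <br/> opens and closes itself
    if (closing) {
      if (stack.pop() !== name) return false;  // mismatched or stray closing tag
    } else {
      stack.push(name);
    }
  }
  return stack.length === 0;                   // leftovers mean unclosed elements
}

// Flags responses shorter than an assumed minimum length.
function isSuspiciouslyShort(text, minChars = 20) {
  return text.trim().length < minChars;
}
```

A production gate would more likely rely on a real XML parser; the stack-based version is only meant to show the kind of failure the check targets.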

0.70 Default quality threshold. Responses scoring below this trigger automatic escalation to the next model tier.
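The 40/60 weighting and the 0.70 threshold combine as in the sketch below. It assumes both sub-scores are already normalized to the 0.0 to 1.0 range; `compositeScore` and `shouldEscalate` are illustrative names, not functions exported by the SDK.

```javascript
// Weights and threshold are taken from the page; function names are hypothetical.
const STRUCTURAL_WEIGHT = 0.4;
const QUALITY_WEIGHT = 0.6;
const DEFAULT_THRESHOLD = 0.7;

// Combines two sub-scores (each assumed to be in [0, 1]) into one composite.
function compositeScore(structural, quality) {
  for (const value of [structural, quality]) {
    if (!(value >= 0 && value <= 1)) {
      throw new RangeError("sub-scores must be between 0 and 1");
    }
  }
  return STRUCTURAL_WEIGHT * structural + QUALITY_WEIGHT * quality;
}

// True when the composite falls below the threshold and the next tier should run.
function shouldEscalate(score, threshold = DEFAULT_THRESHOLD) {
  return score < threshold;
}
```

For example, a structurally perfect response (1.0) with a per-response quality of 0.7 scores about 0.82 and passes, while sub-scores of 0.5 and 0.6 give about 0.56 and would escalate.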

Verticals

Standalone servers built on the core SDK. Each one imports from proofmark and runs independently.

:8090

Gate

OpenAI-compatible LLM proxy with quality gate. Drop-in replacement for /v1/chat/completions that scores and routes every request.
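Because Gate exposes the OpenAI chat-completions route, an existing client should only need a different base URL. The sketch below assumes Gate is running locally on port 8090 with no authentication header, and uses a placeholder `model` value; none of these details are confirmed by the page.

```javascript
// Assumptions: Gate listens on localhost:8090 and accepts the standard
// OpenAI chat-completions body. The model value is a placeholder.
function buildGateRequest(userMessage, baseUrl = "http://localhost:8090") {
  return {
    url: `${baseUrl}/v1/chat/completions`,
    options: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        model: "placeholder-model",
        messages: [{ role: "user", content: userMessage }],
      }),
    },
  };
}

// Sends the request with the global fetch available in Node 18+.
async function askGate(userMessage) {
  const { url, options } = buildGateRequest(userMessage);
  const response = await fetch(url, options);
  if (!response.ok) throw new Error(`Gate returned HTTP ${response.status}`);
  const data = await response.json();
  return data.choices[0].message.content; // standard OpenAI response shape
}
```

Separating `buildGateRequest` from `askGate` keeps the request shape inspectable without a running server.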

:8091

Bench

Prompt A/B testing service. Define experiments with Standard Schema validation, track metrics, and pick winners with statistical confidence.
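The page does not say which statistical test Bench uses to pick winners. One common choice for comparing pass rates between two prompt variants is a two-proportion z-test, sketched here with hypothetical function names and a 95% two-sided cutoff (z = 1.96).

```javascript
// Illustrative only: Bench's actual method is not documented on this page.

// z-statistic for the difference between two success rates (B minus A).
function twoProportionZ(successA, totalA, successB, totalB) {
  const pA = successA / totalA;
  const pB = successB / totalB;
  const pooled = (successA + successB) / (totalA + totalB);
  const standardError = Math.sqrt(pooled * (1 - pooled) * (1 / totalA + 1 / totalB));
  if (standardError === 0) return 0; // identical all-pass or all-fail samples
  return (pB - pA) / standardError;
}

// Returns "A", "B", or null when the difference is not significant.
function pickWinner(a, b, zCritical = 1.96) {
  const z = twoProportionZ(a.successes, a.trials, b.successes, b.trials);
  if (Math.abs(z) < zCritical) return null;
  return z > 0 ? "B" : "A";
}
```

With 100 of 1000 passes for variant A and 150 of 1000 for variant B, the difference clears the cutoff and `pickWinner` returns `"B"`; with 100 versus 105 passes it returns `null`.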

:8092

Docs

Documentation quality scoring. Readability analysis, structural checks, and automatic Diátaxis classification (Tutorial, How-to Guide, Reference, Explanation).
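As one example of what readability analysis can mean, the sketch below computes the Flesch Reading Ease score. Whether Docs uses this metric is an assumption; the syllable counter is a rough vowel-group heuristic, and the function names are hypothetical.

```javascript
// Illustrative only: the readability metric Docs uses is not stated on this page.

// Rough syllable estimate: count runs of consecutive vowels.
function countSyllables(word) {
  const groups = word.toLowerCase().match(/[aeiouy]+/g);
  return groups ? groups.length : 1;
}

// Flesch Reading Ease: higher scores indicate easier text.
function fleschReadingEase(text) {
  const sentences = text.split(/[.!?]+/).filter((s) => s.trim().length > 0);
  const words = text.match(/[A-Za-z']+/g) || [];
  if (sentences.length === 0 || words.length === 0) return null;
  const syllables = words.reduce((sum, word) => sum + countSyllables(word), 0);
  return (
    206.835 -
    1.015 * (words.length / sentences.length) -
    84.6 * (syllables / words.length)
  );
}
```

Short sentences with short words score high, while long, polysyllabic sentences score low or even negative, which makes the metric a cheap first signal before heavier structural checks.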

:8093

Teach

Pedagogy-aware quality gate. Detects teaching methods (Suzuki, Kodály, Orff), maps content to Bloom's taxonomy, and audits curriculum sequences.

Pricing

Free

$0

Core SDK (MIT)
Quality gate scoring
3-tier model routing
A/B testing framework
All verticals (self-hosted)

Pro

$29/mo

Everything in Free
Hosted Gate proxy
Dashboard analytics
Webhook notifications
Priority support

Enterprise

Custom

Everything in Pro
Custom model chains
SSO / SAML
SLA guarantee
Dedicated support

Powered by Polar

Open source

Proofmark is MIT-licensed. The core SDK, quality gate, router, and all verticals are open source.

View on GitHub

npm package

18 Unit tests

50 Adversarial cases

0% Crash rate

4 Verticals