Same code. Half the bill.

One integration. Every provider. Measurable savings.

Install
skalpel gateway
request:  chat.completions
model:    claude-opus-4-6
tokens:   3,847 in / 512 out
cache:    semantic hit
cost:     $0.000 (saved $0.048)
latency:  12ms (cached)
status:   ✓ served
Get Access

Start optimizing in minutes.

Create an account, connect a provider, swap one URL. That's it.

Free tier included. No credit card required.

How it works

Every request. Optimized automatically.

Cost: $0.048

  • INGRESS: Request received
  • CACHE: Semantic lookup
  • ROUTE: Model selection
  • COMPRESS: Token reduction
  • PROVIDER: Forward to LLM
  • RESPONSE: Result served
Capabilities

The layer between your code and model providers.

Intelligent Routing

Requests land on the right model at the right price. Rules you define, decisions you can trace.
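"Rules you define, decisions you can trace" can be pictured as a first-match-wins rule list where every decision records the rule that fired. This is a hypothetical sketch: the rule shape, `routeModel`, and the cheap-tier model name are assumptions, not Skalpel's actual configuration API.

```typescript
// Hypothetical rule-based router: the first matching rule wins, and the
// rule's name is returned so every routing decision is traceable.
type ChatRequest = { prompt: string; maxTokens: number };

type Rule = {
  name: string;
  matches: (req: ChatRequest) => boolean;
  model: string;
};

const rules: Rule[] = [
  {
    name: "short-query-to-cheap-model",
    matches: (req) => req.prompt.length < 200 && req.maxTokens <= 256,
    model: "claude-haiku-4-5", // illustrative cheap-tier model name
  },
  {
    name: "default",
    matches: () => true, // catch-all: always matches
    model: "claude-opus-4-6",
  },
];

function routeModel(req: ChatRequest): { model: string; reason: string } {
  const rule = rules.find((r) => r.matches(req))!; // "default" always matches
  return { model: rule.model, reason: rule.name };
}
```

Returning the rule name alongside the model is what makes the decision auditable later in the trace.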

Semantic Cache

Identical and near-identical requests served from cache. Scoped, auditable, and measured.
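The idea behind serving near-identical requests can be sketched as a cosine-similarity lookup over stored embeddings. This is a toy sketch, not Skalpel's implementation: the `SemanticCache` class, the 0.95 threshold, and the tiny vectors are all illustrative.

```typescript
// Toy semantic cache: store (embedding, response) pairs and serve any
// stored response whose embedding is close enough to the query's.
type Entry = { embedding: number[]; response: string };

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

class SemanticCache {
  private entries: Entry[] = [];
  constructor(private threshold = 0.95) {}

  // Returns the first stored response above the similarity threshold,
  // or null on a miss (caller forwards to the provider, then store()s).
  lookup(embedding: number[]): string | null {
    for (const e of this.entries) {
      if (cosine(e.embedding, embedding) >= this.threshold) return e.response;
    }
    return null;
  }

  store(embedding: number[], response: string): void {
    this.entries.push({ embedding, response });
  }
}
```

The threshold is the knob that separates "exact cache" from "semantic cache": at 1.0 only identical embeddings hit; lower values admit paraphrases at the cost of looser matches.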

Context Compression

Inputs compressed before they reach the provider. Structure preserved. Confidence reported.
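One minimal way "structure preserved" and "confidence reported" can coexist is to start with edits that provably cannot change meaning, such as whitespace normalization. A toy sketch with a hypothetical `compressContext`; Skalpel's actual compressor is not shown here.

```typescript
// Toy context compression: strip trailing whitespace and collapse runs of
// blank lines, report the size saved, and leave all content untouched.
function compressContext(input: string) {
  const compressed = input
    .split("\n")
    .map((line) => line.replace(/\s+$/, "")) // trailing whitespace per line
    .join("\n")
    .replace(/\n{3,}/g, "\n\n");             // runs of blank lines -> one blank
  return {
    compressed,
    saved: input.length - compressed.length, // crude character-count delta
    confidence: 1.0, // whitespace-only edits cannot change meaning
  };
}
```

Riskier transforms (summarization, deduplicating repeated context) would report a confidence below 1.0 and lean on the quality checks described below.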

Quality Protection

Shadow evaluations on every optimization path. Regressions caught before they ship.
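Shadow evaluation reduces to a simple loop: score the optimized path on sampled traffic and disable the engine when the score falls below its floor. A toy sketch with a hypothetical `shadowEval`; how samples are scored is out of scope here.

```typescript
// Toy shadow evaluation: given per-sample quality scores for an optimized
// path, trip the kill switch if the mean falls below the quality floor.
function shadowEval(
  scores: number[],     // quality scores in [0, 1] from shadow comparisons
  qualityFloor: number, // minimum acceptable mean score for this engine
): { mean: number; killSwitch: boolean } {
  const mean = scores.reduce((sum, s) => sum + s, 0) / scores.length;
  return { mean, killSwitch: mean < qualityFloor };
}
```

The kill switch flips the engine back to pass-through, so a regression degrades savings rather than output quality.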

Multi-Provider

Normalized interface. OpenAI, Anthropic, Google, and more through one endpoint.

Full Trace View

Every request: policies matched, cache status, route decision, final cost. Nothing hidden.
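Concretely, a per-request trace could take a shape like this. The field names are illustrative assumptions, not Skalpel's actual schema; the values echo the demo request shown at the top of the page.

```typescript
// Illustrative per-request trace: every field the text promises is present,
// i.e. policies matched, cache status, route decision, and final cost.
const trace = {
  request: "chat.completions",
  policiesMatched: ["workspace-default", "output-token-cap"],
  cache: { status: "semantic-hit", similarity: 0.97 },
  route: { model: "claude-opus-4-6", reason: "default" },
  cost: { baselineUsd: 0.048, actualUsd: 0.0, savedUsd: 0.048 },
  latencyMs: 12,
};
```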

Cost Optimization

Measured, auditable, and separated into verified and estimated savings.

Key Metrics
40% average cost reduction
<150ms p95 gateway overhead
99.9% monthly API availability
Relative Cost per Request

Production workload comparison

Skalpel: 38%
Direct: 100%
Basic cache: 78%
Prompt trim: 85%
Manual route: 72%
Direct Provider Call

Full price, no optimization

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

// Every call pays full token price
const response = await client.messages.create({
  model: "claude-opus-4-6",
  max_tokens: 1024,
  messages: [{ role: "user", content: prompt }],
});
Through Skalpel

Same code, lower cost

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic({
  baseURL: "https://gateway.skalpel.ai/v1",
  apiKey: process.env.SKALPEL_KEY,
});

// Same call. Optimized automatically.
const response = await client.messages.create({
  model: "claude-opus-4-6",
  max_tokens: 1024,
  messages: [{ role: "user", content: prompt }],
});

Pricing

Start free. Scale when you see the savings.

Free
$0/forever

For individual developers

Get Started Free
  • 1 workspace
  • 10,000 requests/month
  • Core dashboard
  • Basic routing
  • Exact cache
Pro
Popular
$25/month

For power users

Get started
  • Everything in Free +
  • Multiple workspaces
  • Advanced routing
  • Semantic cache
  • Prompt optimization
  • Compression
  • Quality monitoring
  • CLI patcher
Power
Heavy spender
$100/month

For teams burning through tokens

Get started
  • Everything in Pro +
  • Unlimited workspaces
  • Priority routing
  • Advanced compression
  • Full tracing retention
  • Billing exports
  • Dedicated support
  • Custom quality floors
Enterprise

For teams with compliance & security needs

  • Everything unlimited
  • SSO (SAML / OIDC)
  • SCIM provisioning
  • Audit log streaming
  • Private networking
  • Dedicated infrastructure
  • Custom SLAs
  • Self-hosted option
Talk to Founders

FAQ
How do I integrate?
Three ways: swap your base URL and use a Skalpel key, install our lightweight SDK wrapper, or run npx skalpel init to auto-patch your project. Most teams are live in under 10 minutes.
What happens to my data?
Skalpel processes requests in transit to apply caching, routing, and compression. We never store prompt or response bodies beyond the request lifecycle unless you opt in to tracing retention. Enterprise customers can run fully on-prem.
How are savings calculated?
We separate savings into three auditable buckets: verified (measured baseline vs actual), estimated (modeled counterfactual), and provider-side (prompt cache or batch discounts). They are never blurred together.
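Keeping the buckets from blurring together can be sketched as a type that never collapses them into one number. Illustrative only: `Savings` and `addSavings` are assumptions, not Skalpel's billing code.

```typescript
// Verified, estimated, and provider-side savings live in separate fields;
// a report must label each bucket rather than sum them silently.
type Savings = { verified: number; estimated: number; providerSide: number };

function addSavings(total: Savings, delta: Partial<Savings>): Savings {
  return {
    verified: total.verified + (delta.verified ?? 0),
    estimated: total.estimated + (delta.estimated ?? 0),
    providerSide: total.providerSide + (delta.providerSide ?? 0),
  };
}
```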
Will optimization change my outputs?
Correctness comes first. Every engine has a quality floor, shadow evaluation, and a kill switch. If an engine can change customer-visible meaning, it requires benchmark evidence before going live.
Which providers are supported?
OpenAI, Anthropic, Google (Gemini), and more coming. Each adapter handles streaming, tools, JSON schema, vision, and batch as first-class capabilities.
How do I get in touch?
Reach us at founders@skalpel.ai.
Get Started

Start routing smarter today.

Get started free