Self-hosted. No data egress. By Nativerse.

Your AI bill keeps growing. TokenLedger re-counts every token on your own machine.

It logs every billable API call across providers, independently re-counts the text that is checkable, and reconciles the provider's numbers three ways. Nothing you send or receive leaves the box.

Runs alongside your LiteLLM gateway. It audits the numbers and routes nothing. Self-hosted, SQLite on your machine, open source under Apache-2.0.

What it does, step by step.

Re-derive the numbers yourself instead of trusting the provider's, then label every figure by how sure we are.

01 / See
See the call

Capture what you sent, what came back, and the token counts the provider says you used.

02 / Re-count
Re-count locally

Re-tokenise the actual text with the model's own tokenizer. We never ask the provider to count.

03 / Reconcile
Reconcile three ways

Compare our count with the provider's and return a verdict: OK, over-count, or out of band.

04 / Price
Price it honestly

Apply pay-per-token or rented-GPU cost models for an effective cost per token you can compare.
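To make step 04 concrete, here is a minimal sketch of the arithmetic behind an effective cost per token. The function names, prices and GPU rates are hypothetical and chosen for illustration only; they are not TokenLedger's API or any provider's real price list.

```python
def pay_per_token_cost(input_tokens: int, output_tokens: int,
                       input_price_per_million: float,
                       output_price_per_million: float) -> float:
    """Cost of a batch of calls billed per token (prices quoted per million tokens)."""
    return (input_tokens * input_price_per_million
            + output_tokens * output_price_per_million) / 1_000_000


def rented_gpu_cost(hourly_rate: float, hours: float) -> float:
    """Cost of a rented GPU, which is paid for whether or not it serves tokens."""
    return hourly_rate * hours


def effective_cost_per_token(total_cost: float, total_tokens: int) -> float:
    """Normalise either cost model to one comparable figure: cost per token."""
    if total_tokens <= 0:
        raise ValueError("total_tokens must be positive")
    return total_cost / total_tokens


# Hypothetical figures, for illustration only.
api_cost = pay_per_token_cost(1_000_000, 500_000, 3.0, 15.0)  # 3.00 + 7.50
api_per_token = effective_cost_per_token(api_cost, 1_500_000)

gpu_cost = rented_gpu_cost(hourly_rate=2.0, hours=24)
gpu_per_token = effective_cost_per_token(gpu_cost, 12_000_000)
```

Dividing both cost models by the tokens actually served is what makes a per-token price list and an hourly GPU rental comparable on one axis.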

How sure we are

We tell you exactly what we can and cannot verify.

Exact

Output tokens on providers with a public tokenizer. We re-tokenise the text you received with tiktoken or the open-weight model's own tokenizer. Any billed count above our re-count is a hard discrepancy.

Bounded

Input tokens and closed models like Claude and Gemini. We re-count what you sent plus documented overhead, and flag figures outside a tolerance band. We never put a dollar figure on an estimate as though it were exact.

Unverifiable

Reasoning tokens and per-call cache. Billed but never returned, so there is nothing to re-count. We record them and never assert them.

Every result carries its confidence label. The tool never claims proof it does not have.
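As a rough sketch of how a verdict could depend on the confidence label, the snippet below pairs the three labels with the three verdicts. The type names, the `reconcile` signature, the 5% band and the choice to treat under-billing on an exact figure as OK are all assumptions made for illustration; they are not TokenLedger's actual implementation, and the real band width is still being validated.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Confidence(Enum):
    EXACT = "EXACT"
    BOUNDED = "BOUNDED"
    UNVERIFIABLE = "UNVERIFIABLE"


class Verdict(Enum):
    OK = "OK"
    OVER_COUNT = "over-count"
    OUT_OF_BAND = "out of band"


@dataclass(frozen=True)
class Finding:
    confidence: Confidence
    billed: int
    counted: Optional[int]     # None when there is nothing to re-count
    verdict: Optional[Verdict]  # None when nothing can be asserted


def reconcile(billed: int, counted: Optional[int],
              confidence: Confidence, band: float = 0.05) -> Finding:
    """Compare a billed count with a local re-count (hypothetical helper)."""
    # Unverifiable figures are recorded but never given a verdict.
    if confidence is Confidence.UNVERIFIABLE or counted is None:
        return Finding(Confidence.UNVERIFIABLE, billed, None, None)
    if confidence is Confidence.EXACT:
        # Exact re-count: any billed count above it is a hard discrepancy.
        verdict = Verdict.OVER_COUNT if billed > counted else Verdict.OK
    else:
        # Bounded estimate: only flag figures outside the tolerance band.
        allowed = counted * band
        verdict = (Verdict.OUT_OF_BAND
                   if abs(billed - counted) > allowed else Verdict.OK)
    return Finding(confidence, billed, counted, verdict)
```

For example, `reconcile(120, 100, Confidence.EXACT)` yields an over-count, while `reconcile(104, 100, Confidence.BOUNDED)` stays inside the assumed 5% band and is reported as OK.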

Sits beside LiteLLM and audits from the outside.

LiteLLM already writes spend logs. Point TokenLedger at them and it audits the numbers from the outside. It does not route or proxy your traffic.

A planted over-count, caught and labelled.

The offline demo plants realistic discrepancies and catches them, with every figure carrying its EXACT, BOUNDED, or UNVERIFIABLE label.

Run it yourself in 60 seconds.

terminal
pip install "tokenledger[exact]"
tokenledger demo
open tokenledger_demo.html

No signup, no API keys, nothing leaves your machine.

View on GitHub

Questions, answered plainly.

Does my prompt or response data leave my network?

No. All counting and reconciliation run locally. The only network call the system makes is the optional proxy forwarding your own request to the provider you chose. If you prefer, text can be stored as hashes only.

Do I replace my gateway?

No. TokenLedger runs alongside LiteLLM and audits the numbers. It routes nothing.

Can you verify Claude or Gemini token counts exactly?

No, and we say so. Closed models are bounded, not exact. We flag figures outside a tolerance band and never present an estimate as an exact result.

What about reasoning tokens?

Recorded, never asserted. They are billed but not returned, so there is nothing to re-count.

Is this validated with real customers?

The reconciliation engine, store, dashboard and report are working and tested offline. Demand and the closed-model band width are still being validated with design partners. We will not claim a result we have not measured.

See it on your own logs.

We run a small number of seven-day validations with teams whose AI spend is growing. Bring a sample of your gateway logs and we reconcile them with you, with no data leaving your environment.

Tell us your stack and the spend you cannot see today. We reply to every serious enquiry.

Prefer to talk first? Book a 20-minute call.