Self-hosted. No data egress. By Nativerse.
Your AI bill keeps growing. TokenLedger re-counts every token on your own machine.
It logs every billable API call across providers, independently re-counts the text that is checkable, and reconciles the provider's numbers three ways. Nothing you send or receive leaves the box.
Runs alongside your LiteLLM gateway. It audits the numbers and routes nothing. Self-hosted, SQLite on your machine, open source under Apache-2.0.
What it does, step by step.
Re-derive the numbers yourself instead of trusting the provider's, then label every figure by how sure we are.
Capture what you sent, what came back, and the token counts the provider says you used.
Re-tokenise the actual text with the model's own tokenizer. We never ask the provider to count.
Compare our count with the provider's and return a verdict: OK, over-count, or out of band.
Apply pay-per-token or rented-GPU cost models for an effective cost per token you can compare.
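The last step can be sketched in a few lines of Python. This is a minimal illustration, not TokenLedger's implementation: the class names `PayPerToken` and `RentedGpu`, their fields, and the prices used are all hypothetical. It only shows how two different billing models reduce to one comparable number, cost per token.

```python
from dataclasses import dataclass


@dataclass
class PayPerToken:
    """Hypothetical pay-per-token model: prices quoted per million tokens."""
    usd_per_million_input: float
    usd_per_million_output: float

    def cost_per_token(self, input_tokens: int, output_tokens: int) -> float:
        total = input_tokens + output_tokens
        if total == 0:
            raise ValueError("no tokens to price")
        usd = (input_tokens * self.usd_per_million_input
               + output_tokens * self.usd_per_million_output) / 1_000_000
        return usd / total


@dataclass
class RentedGpu:
    """Hypothetical rented-GPU model: a flat hourly rate spread over tokens served."""
    usd_per_hour: float

    def cost_per_token(self, hours: float, tokens_served: int) -> float:
        if tokens_served == 0:
            raise ValueError("no tokens served")
        return self.usd_per_hour * hours / tokens_served


# Illustrative prices only, not real provider rates.
api = PayPerToken(usd_per_million_input=1.0, usd_per_million_output=3.0)
gpu = RentedGpu(usd_per_hour=2.0)
api_cost = api.cost_per_token(input_tokens=1_000_000, output_tokens=1_000_000)
gpu_cost = gpu.cost_per_token(hours=1.0, tokens_served=1_000_000)
```

Both results are in dollars per token, so they can be compared directly.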
How sure we are
We tell you exactly what we can and cannot verify.
Output tokens on providers with a public tokenizer. We re-tokenise the text you received with tiktoken or the open-weight model's own tokenizer. A billed figure above our count is a hard discrepancy.
Input tokens and closed models like Claude and Gemini. We re-count what you sent plus documented overhead, and flag figures outside a tolerance band. We never dollarise an estimate as exact.
Reasoning tokens and per-call cache. Billed but never returned, so there is nothing to re-count. We record them and never assert them.
Every result carries its confidence label. The tool never claims proof it does not have.
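The exact case can be sketched as follows. This is an illustration under stated assumptions, not TokenLedger's code: `recount` and `is_hard_discrepancy` are hypothetical names, and the tokenizer is passed in as a plain function so the sketch runs without extra packages. With tiktoken installed you could pass `tiktoken.get_encoding("cl100k_base").encode` instead, provided that encoding matches the model that produced the text.

```python
from typing import Callable, Sequence

# Any function that turns text into a sequence of tokens can serve as the encoder.
Encoder = Callable[[str], Sequence]


def recount(text: str, encode: Encoder) -> int:
    """Count tokens locally by encoding the text that was actually received."""
    return len(encode(text))


def is_hard_discrepancy(billed: int, counted: int) -> bool:
    """For exactly countable output, billed above counted is a hard discrepancy."""
    return billed > counted


def whitespace_encode(text: str) -> list:
    """Stand-in tokenizer for the demo only; real models do not split on spaces."""
    return text.split()


counted = recount("the quick brown fox", whitespace_encode)
flagged = is_hard_discrepancy(billed=6, counted=counted)
```

Keeping the encoder injectable means the same check works for any model whose tokenizer is public.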
Sits beside LiteLLM and audits from the outside.
LiteLLM already writes spend logs. Point TokenLedger at them and it audits the numbers from the outside. It does not route or proxy your traffic.
# read what the gateway already wrote, re-count it independently
tokenledger ingest litellm_spendlogs.jsonl --format litellm
→ output re-counted exactly where a tokenizer exists
→ gateways hand back the provider's own number; we re-tokenise and verify it
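To make the idea of reading a spend log concrete, here is a minimal Python sketch of pulling billed counts out of JSON Lines rows. It is not how `tokenledger ingest` works internally. The field names `model`, `prompt_tokens`, and `completion_tokens` are assumptions about the export layout, and `BilledCall` and `read_spend_rows` are hypothetical names, so check them against your own log file.

```python
import json
from typing import Iterable, Iterator, NamedTuple


class BilledCall(NamedTuple):
    model: str
    billed_input: int
    billed_output: int


def read_spend_rows(lines: Iterable[str]) -> Iterator[BilledCall]:
    """Yield the provider-reported token counts from JSONL rows.

    Field names are assumed, not taken from a documented schema.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        row = json.loads(line)
        yield BilledCall(
            model=row.get("model", "unknown"),
            billed_input=int(row.get("prompt_tokens", 0)),
            billed_output=int(row.get("completion_tokens", 0)),
        )


# Made-up sample rows for illustration.
sample = [
    '{"model": "example-model", "prompt_tokens": 112, "completion_tokens": 89}',
    "",
]
calls = list(read_spend_rows(sample))
```

In practice you would pass an open file handle instead of the `sample` list.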
A planted over-count, caught and labelled.
The offline demo plants realistic discrepancies and catches them, with every figure carrying its EXACT, BOUNDED, or UNVERIFIABLE label.
| bucket | billed | re-counted | verdict |
|---|---|---|---|
| output | 89 | 64 | over-count, EXACT |
| input | 112 | ~20 | out of band, BOUNDED |
| reasoning | 1,024 | n/a | UNVERIFIABLE |
Offline demo, planted discrepancies. Figures are illustrative, not a measured customer result.
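The three rows above can be reproduced with a short Python sketch. It is an illustration only: the function name `reconcile`, the 25% tolerance, and the choice to treat billed-at-or-below-counted as OK are assumptions, not TokenLedger's actual rules. The real band width for closed models is still being validated.

```python
from typing import Optional


def reconcile(billed: int, recounted: Optional[int], exact: bool,
              tolerance: float = 0.25) -> str:
    """Return a verdict plus confidence label for one bucket.

    tolerance is a hypothetical band width used only for this sketch.
    """
    if recounted is None:
        # Nothing was returned to re-count, so nothing can be asserted.
        return "UNVERIFIABLE"
    if exact:
        # A public tokenizer exists, so the local count is authoritative.
        return "over-count, EXACT" if billed > recounted else "OK, EXACT"
    # Bounded case: the local count is an estimate, so use a tolerance band.
    low = recounted * (1 - tolerance)
    high = recounted * (1 + tolerance)
    if low <= billed <= high:
        return "OK, BOUNDED"
    return "out of band, BOUNDED"


output_verdict = reconcile(billed=89, recounted=64, exact=True)
input_verdict = reconcile(billed=112, recounted=20, exact=False)
reasoning_verdict = reconcile(billed=1024, recounted=None, exact=False)
```

Only the exact branch yields a hard over-count. The bounded branch can say a figure is out of band, never that it is wrong by a precise amount.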
Run it yourself in 60 seconds.
pip install "tokenledger[exact]"
tokenledger demo
open tokenledger_demo.html
No signup, no API keys, nothing leaves your machine.
Questions, answered plainly.
Does my prompt or response data leave my network?
No. All counting and reconciliation run locally. The only network call the system makes is the optional proxy forwarding your own request to the provider you chose. If you prefer, text can be stored as hashes only.
Do I replace my gateway?
No. TokenLedger runs alongside LiteLLM and audits the numbers. It routes nothing.
Can you verify Claude or Gemini token counts exactly?
No, and we say so. Closed models are bounded, not exact. We flag figures outside a tolerance band and never present an estimate as an exact result.
What about reasoning tokens?
Recorded, never asserted. They are billed but not returned, so there is nothing to re-count.
Is this validated with real customers?
The reconciliation engine, store, dashboard and report are working and tested offline. Demand and the closed-model band width are still being validated with design partners. We will not claim a result we have not measured.
See it on your own logs.
We run a small number of seven-day validations with teams whose AI spend is growing. Bring a sample of your gateway logs and we reconcile them with you, with no data leaving your environment.
Prefer to talk first? Use the Book now button, or book a 20-minute call.