If you’ve ever pasted a massive JSON blob into ChatGPT or Claude, you know the pain. It eats up your context window, and sometimes the model still hallucinates the structure when trying to generate it back.
Enter Toon Format (TOON), or Token-Oriented Object Notation. It’s a new data format designed specifically to be the “native tongue” of Large Language Models.
Why Should You Care?
- Token Savings: Uses ~40% fewer tokens than standard JSON.
- Higher Accuracy: Benchmarks show 74% accuracy in structure preservation vs JSON’s 70%.
- Lossless: It maps 1:1 to the JSON data model (Objects, Arrays, Primitives).
- Human Readable: Looks like what you'd get if YAML met CSV.
The Problem with JSON
JSON is the king of the web, but it wasn’t built for LLMs.
- Syntax Heavy: All those braces {}, quotes "", and commas count as tokens.
- Redundant: Repeated keys in lists of objects waste massive amounts of space.
[
  {"id": 1, "name": "Alice", "role": "admin"},
  {"id": 2, "name": "Bob", "role": "user"},
  {"id": 3, "name": "Charlie", "role": "user"}
]
In the example above, the words “id”, “name”, and “role” are repeated for every single user.
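To make that overhead concrete, here is a small sketch (not a real tokenizer count, just a simple occurrence count) showing how many times those key names are repeated in the serialized JSON:

```python
import json

users = [
    {"id": 1, "name": "Alice", "role": "admin"},
    {"id": 2, "name": "Bob", "role": "user"},
    {"id": 3, "name": "Charlie", "role": "user"},
]

as_json = json.dumps(users)
# Count how often each key name appears as a quoted string:
# 3 objects x 3 keys = 9 repetitions of pure structural overhead.
key_occurrences = sum(as_json.count(f'"{k}"') for k in ("id", "name", "role"))
print(key_occurrences)  # 9
```

With 1,000 users instead of 3, that becomes 3,000 repeated key strings, and a tokenizer pays for every one of them.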
The TOON Solution
Toon Format solves this by being “schema-aware”. It creates a header for arrays of objects (like a CSV) and uses indentation instead of braces (like YAML).
Here is that same data in TOON:
users[3]{id,name,role}:
  1,Alice,admin
  2,Bob,user
  3,Charlie,user
Notice the difference?
- Header: users[3]{id,name,role} tells the LLM "This is a list of 3 users with these specific fields."
- Data: The values are just comma-separated. No repeated keys.
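The header-plus-rows idea is simple enough to sketch in a few lines. This toy encoder (my own illustration, not the official implementation, and ignoring the spec's quoting and escaping rules) shows how a uniform list of flat objects collapses into the tabular form:

```python
def encode_tabular(name, rows):
    """Encode a uniform list of flat dicts as a TOON tabular array (toy sketch)."""
    fields = list(rows[0].keys())
    header = f"{name}[{len(rows)}]{{{','.join(fields)}}}:"
    lines = [header]
    for row in rows:
        # Each row is just comma-separated values, indented under the header.
        lines.append("  " + ",".join(str(row[f]) for f in fields))
    return "\n".join(lines)

users = [
    {"id": 1, "name": "Alice", "role": "admin"},
    {"id": 2, "name": "Bob", "user": "user", "role": "user"} if False else
    {"id": 2, "name": "Bob", "role": "user"},
    {"id": 3, "name": "Charlie", "role": "user"},
]
print(encode_tabular("users", users))
# users[3]{id,name,role}:
#   1,Alice,admin
#   2,Bob,user
#   3,Charlie,user
```

In practice you would use the official library rather than rolling your own, since real data needs proper quoting for values containing commas or newlines.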
Key Features
1. Tabular Arrays
As seen above, TOON collapses uniform lists of objects into tables. This is where the massive token savings come from.
2. Explicit Guardrails
The [N] syntax (e.g., users[3]) gives the LLM a hint about how many items to generate or expect. This drastically reduces “cut-off” generation errors.
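The same count is useful on the receiving side: when you parse a model's TOON output, you can check the declared length against the actual row count to catch truncation. A minimal sketch (my own validation helper, not part of the official tooling):

```python
import re

def check_length_guardrail(toon_text):
    """Return True if a tabular array's declared [N] matches its row count (sketch)."""
    lines = toon_text.splitlines()
    declared = int(re.search(r"\[(\d+)\]", lines[0]).group(1))
    rows = [line for line in lines[1:] if line.strip()]
    return declared == len(rows)

sample = "users[3]{id,name,role}:\n  1,Alice,admin\n  2,Bob,user\n  3,Charlie,user"
print(check_length_guardrail(sample))  # True
```

If the model stops early and emits only two rows under a `[3]` header, the check fails and you can retry the generation.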
3. Minimal Syntax
TOON uses significant whitespace (indentation) instead of braces, similar to Python or YAML. This removes visual noise and token overhead.
Comparison: JSON vs TOON
Let’s look at a more complex example from the official repository.
JSON:
{
  "context": {
    "task": "Hiking Trip",
    "location": "Boulder"
  },
  "hikes": [
    { "id": 1, "name": "Blue Lake", "dist": 7.5 },
    { "id": 2, "name": "Ridge", "dist": 9.2 }
  ]
}
TOON:
context:
  task: Hiking Trip
  location: Boulder
hikes[2]{id,name,dist}:
  1,Blue Lake,7.5
  2,Ridge,9.2
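Going the other way is just as mechanical: the header's field list tells you how to zip each row back into an object. A toy decoder sketch (again my own illustration; note that values come back as strings here, whereas a real decoder would restore numeric types):

```python
def decode_rows(header, rows):
    """Turn TOON tabular rows back into dicts using the header's field list (sketch)."""
    fields = header[header.index("{") + 1 : header.index("}")].split(",")
    return [dict(zip(fields, row.strip().split(","))) for row in rows]

hikes = decode_rows("hikes[2]{id,name,dist}:", ["  1,Blue Lake,7.5", "  2,Ridge,9.2"])
print(hikes[0])  # {'id': '1', 'name': 'Blue Lake', 'dist': '7.5'}
```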
Real-World Benchmarks
The team behind TOON ran extensive benchmarks across 4 different models (including Claude and GPT-4) to see how it stacks up against JSON, YAML, and XML. The results are significant.
1. Retrieval Accuracy
This tests how well an LLM can answer questions about the data provided in the format.
- TOON: 73.9% Accuracy
- JSON: 69.7% Accuracy
- YAML: 69.0% Accuracy
TOON actually helps the model understand the data better than native JSON, likely due to the cleaner structure and lack of syntax noise.
2. Token Efficiency
The real money-saver. Here is how many tokens were required for a mixed dataset (nested objects + lists):
- TOON: ~2,744 tokens
- JSON: ~4,545 tokens
- XML: ~5,167 tokens
That is a 39.6% reduction in tokens compared to JSON. If you are paying for API calls by the million tokens, this shaves roughly 40% off the input side of your bill.
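Plugging the benchmark's token counts into a quick back-of-the-envelope calculation (the per-token price below is a hypothetical placeholder, not a real rate):

```python
# Benchmark token counts for the mixed dataset, from the figures above.
json_tokens, toon_tokens = 4545, 2744
price_per_million = 3.00  # hypothetical $/1M input tokens, for illustration only
calls = 1_000_000         # imagine sending this payload a million times

json_cost = json_tokens * calls * price_per_million / 1_000_000
toon_cost = toon_tokens * calls * price_per_million / 1_000_000
print(f"JSON: ${json_cost:,.0f}  TOON: ${toon_cost:,.0f}  saved: {1 - toon_tokens / json_tokens:.1%}")
# JSON: $13,635  TOON: $8,232  saved: 39.6%
```

The percentage saved is independent of the price you plug in; only the dollar figures scale.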
Implementations & Tools
TOON is designed as a “translation layer”. You don’t need to rewrite your database. You keep using JSON for your code, but when you send data to an LLM (or receive it), you convert it to TOON.
CLI Tool
You can try it right now without installing anything:
# Convert JSON to TOON
npx @toon-format/cli input.json -o output.toon
# Pipe from stdin
echo '{"name": "Ada", "role": "dev"}' | npx @toon-format/cli
Libraries
Official and community libraries are available for most major languages:
- TypeScript/JS: @toon-format/toon
- Python: toon-python
- Go: toon-go
- Rust: toon-rust
- Java: toon-java
- Swift: toon-swift
- .NET: toon-dotnet
There are also community implementations for C++, PHP, Ruby, Kotlin, and more. Check out the full list on GitHub.
Conclusion
As we build more complex AI agents, context windows become precious real estate. Formats like Toon Format that treat “tokens” as a first-class constraint are the future of AI-Data interaction.
If you are building RAG pipelines or large-scale data extraction agents, giving TOON a try might just save you a fortune in API costs.