So you’ve learned about Toon Format and its impressive token savings. But here’s the real question: when should you actually use it?
Not every project needs TOON. Sometimes JSON works just fine. But if you’re dealing with structured data and LLMs, there are specific scenarios where switching to TOON can save you serious time, money, and headaches.
TL;DR: When TOON Shines
- RAG Systems: Feeding large datasets to LLMs for retrieval
- API Cost Optimization: High-volume LLM API calls with structured data
- Agent Systems: Multi-step workflows with data passing between agents
- Prompt Engineering: Few-shot examples with structured data
- Batch Processing: Converting datasets for LLM analysis
Use Case #1: RAG Systems with Large Datasets
Problem: You’re building a Retrieval-Augmented Generation system. Your vector search returns 20 product records, and you need to stuff them into the LLM’s context for the user’s question.
With JSON, those 20 products might consume 3,000+ tokens before you even add the user’s question or system prompt.
Solution: Convert your retrieved data to TOON before sending it to the LLM.
Before (JSON):
[
  {
    "id": "prod-001",
    "name": "Wireless Mouse",
    "price": 29.99,
    "stock": 145,
    "category": "Electronics"
  },
  {
    "id": "prod-002",
    "name": "USB-C Cable",
    "price": 12.99,
    "stock": 89,
    "category": "Electronics"
  }
  // ... 18 more products
]
After (TOON):
products[20]{id,name,price,stock,category}:
  prod-001,Wireless Mouse,29.99,145,Electronics
  prod-002,USB-C Cable,12.99,89,Electronics
  ...
Impact: With 20 products, you save approximately 1,200 tokens. That’s enough space to include more context, examples, or retrieved chunks.
Implementation Guide:
from toon import to_toon
import anthropic

# Your RAG retrieval
search_results = vector_db.search(query, limit=20)

# Convert to TOON
products_toon = to_toon(search_results)

# Build prompt
prompt = f"""Here are the available products:

{products_toon}

User question: {user_query}

Please recommend the best option and explain why."""

# Send to LLM (max_tokens is required by the Messages API)
client = anthropic.Anthropic()
response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    messages=[{"role": "user", "content": prompt}],
)
Use Case #2: Reducing API Costs in Production
Problem: Your app makes 1 million LLM API calls per month. Each call includes a structured dataset (user profile, purchase history, preferences). Your monthly token bill is $4,500.
Solution: Switch the structured data portions to TOON format.
Let’s say each request looks like this:
Before (JSON - 450 tokens):
{
  "user": {
    "id": "u-12345",
    "tier": "premium",
    "joinDate": "2024-03-15"
  },
  "purchases": [
    {"date": "2024-12-01", "item": "Widget A", "amount": 49.99},
    {"date": "2024-12-15", "item": "Gadget B", "amount": 89.99},
    {"date": "2025-01-10", "item": "Tool C", "amount": 129.99}
  ],
  "preferences": {
    "notifications": true,
    "newsletter": false
  }
}
After (TOON - ~270 tokens):
user:
  id: u-12345
  tier: premium
  joinDate: 2024-03-15
purchases[3]{date,item,amount}:
  2024-12-01,Widget A,49.99
  2024-12-15,Gadget B,89.99
  2025-01-10,Tool C,129.99
preferences:
  notifications: true
  newsletter: false
Cost Impact:
- Token reduction per request: 180 tokens (40%)
- Monthly token savings: 180M tokens
- Cost savings: ~$1,800/month (at typical API pricing)
That’s a 40% reduction in your data transfer costs, without changing any application logic.
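If you're curious what the conversion is actually doing, here is a minimal, illustrative sketch of the tabular encoding shown above. `encode_tabular` is a hypothetical helper, not part of any library; the real `to_toon` also handles nesting, quoting, and edge cases.

```python
# Minimal sketch of TOON's tabular encoding for a uniform list of flat dicts.
def encode_tabular(name, rows):
    """Emit a TOON tabular block: header with length + fields, then CSV-style rows."""
    fields = list(rows[0].keys())
    header = f"{name}[{len(rows)}]{{{','.join(fields)}}}:"
    lines = ["  " + ",".join(str(r[f]) for f in fields) for r in rows]
    return "\n".join([header] + lines)

purchases = [
    {"date": "2024-12-01", "item": "Widget A", "amount": 49.99},
    {"date": "2024-12-15", "item": "Gadget B", "amount": 89.99},
    {"date": "2025-01-10", "item": "Tool C", "amount": 129.99},
]
print(encode_tabular("purchases", purchases))
```

Because the field names appear once in the header instead of being repeated in every row, the token count drops roughly in proportion to the number of rows.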
Use Case #3: Multi-Agent Systems
Problem: You’re building an agent system where Agent A retrieves data, Agent B processes it, and Agent C generates a report. Each handoff involves passing structured data through the LLM.
With JSON, each agent consumes extra tokens just for syntax overhead.
Solution: Use TOON as the “data wire format” between agents.
Workflow Example:
from toon import to_toon

# Agent A: Data Retrieval
def agent_a_retrieve(query):
    results = database.query(query)
    return to_toon(results)  # Convert to TOON

# Agent B: Data Processing
def agent_b_process(data_toon):
    # Note: literal braces in an f-string must be doubled ({{ }})
    prompt = f"""Analyze this data and extract key insights:

{data_toon}

Provide insights in TOON format:
insights[N]{{category,description,importance}}:
..."""
    response = llm.generate(prompt)
    return response  # Already in TOON

# Agent C: Report Generation
def agent_c_report(insights_toon):
    prompt = f"""Generate a summary report from these insights:

{insights_toon}"""
    return llm.generate(prompt)
Benefits:
- Less token waste between agent handoffs
- Cleaner, more readable intermediate data
- Explicit schemas help agents understand data structure
Use Case #4: Prompt Engineering with Examples
Problem: You’re doing few-shot prompting and need to include 3-5 examples in your prompt. Each example has structured input/output. The examples alone consume 2,000+ tokens.
Solution: Format your examples in TOON to save tokens for the actual instruction and response.
Before:
Example 1:
Input: {"userId": "u-001", "items": [{"sku": "A1", "qty": 2}, {"sku": "B2", "qty": 1}]}
Output: {"total": 145.97, "shipping": 12.00, ...}
Example 2:
Input: {"userId": "u-002", "items": [{"sku": "C3", "qty": 5}]}
Output: {"total": 89.95, "shipping": 0, ...}
Token count: ~600 tokens
After:
Example 1:
Input:
userId: u-001
items[2]{sku,qty}:
  A1,2
  B2,1
Output:
total: 145.97
shipping: 12.00

Example 2:
Input:
userId: u-002
items[1]{sku,qty}:
  C3,5
Output:
total: 89.95
shipping: 0
Token count: ~360 tokens (40% savings)
This gives you room to include MORE examples, which often improves model performance more than verbose formatting.
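Assembling a few-shot prompt from TOON examples is mostly string plumbing. A sketch, assuming your examples have already been converted to TOON strings (e.g. via `to_toon`); `build_few_shot_prompt` is a hypothetical helper:

```python
# Sketch: build a few-shot prompt from (input, output) pairs of TOON strings.
def build_few_shot_prompt(instruction, examples, query_toon):
    parts = [instruction, ""]
    for i, (inp_toon, out_toon) in enumerate(examples, 1):
        parts += [f"Example {i}:", "Input:", inp_toon, "Output:", out_toon, ""]
    parts += ["Now process this input:", "Input:", query_toon]
    return "\n".join(parts)

examples = [
    ("userId: u-001\nitems[2]{sku,qty}:\n  A1,2\n  B2,1",
     "total: 145.97\nshipping: 12.00"),
]
prompt = build_few_shot_prompt(
    "Compute the order total from the input.",
    examples,
    "userId: u-003\nitems[1]{sku,qty}:\n  D4,2",
)
print(prompt)
```

The token budget you free up per example is what lets you add a third, fourth, or fifth example without crowding out the instruction.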
Use Case #5: Data Export & Batch Processing
Problem: You have 10,000 database records to analyze with an LLM. You can’t fit them all in one prompt, so you batch them into groups of 100.
Solution: Export batches to TOON format, maximizing the records per batch.
Scenario: Customer feedback analysis
Script:
import pandas as pd
from toon import to_toon

# Load customer feedback
df = pd.read_csv('feedback.csv')

# Process in batches
batch_size = 100
results = []

for i in range(0, len(df), batch_size):
    batch = df[i:i+batch_size].to_dict('records')
    batch_toon = to_toon({'feedback': batch})

    # Literal braces in an f-string must be doubled ({{ }})
    prompt = f"""Analyze this customer feedback and categorize by sentiment:

{batch_toon}

Return categories in TOON format:
categories[N]{{id,sentiment,theme}}:
..."""

    response = llm.generate(prompt)  # llm: your LLM client wrapper
    results.append(response)
Impact: By using TOON, you can fit 100 records instead of ~70 with JSON. That’s 30% fewer API calls needed.
When NOT to Use TOON
Let’s be honest: TOON isn’t always the answer. Here are scenarios where you should stick with JSON:
- Browser-based applications: If you’re sending data directly to a web frontend, JSON is native. The conversion overhead isn’t worth it.
- Small data payloads: If you’re only sending 3-4 fields, the token savings are minimal (maybe 10-20 tokens). Not worth the complexity.
- Non-LLM APIs: If your data is going to a REST API or traditional service, they expect JSON. Don’t convert unnecessarily.
- Complex nested structures: TOON excels with tabular data (arrays of objects). If your data is heavily nested with irregular shapes, JSON might be clearer.
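One way to operationalize that last point is a quick shape check before converting. This is an illustrative heuristic, not official guidance; `is_toon_friendly` and the `min_rows` threshold are assumptions:

```python
# Heuristic: TOON pays off for reasonably large, uniform arrays of flat records.
def is_toon_friendly(rows, min_rows=5):
    if not isinstance(rows, list) or len(rows) < min_rows:
        return False  # too small: savings won't cover the complexity
    if not all(isinstance(r, dict) for r in rows):
        return False
    first_keys = set(rows[0])
    # All rows share one schema and no values are nested containers
    return all(
        set(r) == first_keys
        and all(not isinstance(v, (dict, list)) for v in r.values())
        for r in rows
    )
```

Anything that fails the check can simply stay as JSON in the same prompt; mixing the two formats is fine as long as you label them.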
Getting Started Checklist
Ready to try TOON in your project? Here’s a practical checklist:
✓ Identify high-volume structured data flows
- Where are you sending arrays of objects to LLMs?
- Which prompts have repeated data patterns?
✓ Measure current token usage
- Use your LLM provider’s API to check token counts
- Calculate the cost per request
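For a first pass you can compare formats with a crude character-based estimate; for real numbers, use your provider's tokenizer. A sketch, where `rough_token_count` and the sample data are illustrative:

```python
import json

def rough_token_count(text):
    # Very rough estimate (~4 characters per token for English-ish text).
    # Use your provider's tokenizer for billing-grade measurements.
    return max(1, len(text) // 4)

records = [{"id": f"prod-{i:03d}", "price": 9.99} for i in range(1, 21)]
as_json = json.dumps(records, indent=2)
as_toon = "products[20]{id,price}:\n" + "\n".join(
    f"  {r['id']},{r['price']}" for r in records
)
print("JSON est.:", rough_token_count(as_json))
print("TOON est.:", rough_token_count(as_toon))
```

Even this crude estimate makes the gap visible; the exact percentage will vary with your tokenizer and data shape.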
✓ Run a pilot conversion
- Pick one endpoint or workflow
- Convert the data portion to TOON
- Measure token reduction and accuracy
✓ Add conversion functions
// Add to your codebase
import { toToon, fromToon } from '@toon-format/toon';

function prepareForLLM(data) {
  return toToon(data);
}

function parseFromLLM(toonString) {
  return fromToon(toonString);
}
✓ Update prompts
- Add clear instructions that data is in TOON format
- Include example TOON structures for output generation
✓ Monitor and iterate
- Track token savings
- Monitor parsing errors (should decrease)
- Measure response quality
Conclusion
Toon Format isn’t a replacement for JSON in your application layer. It’s a specialized tool for the LLM interface layer—the boundary where structured data meets natural language processing.
The sweet spot for TOON is anywhere you have:
- High volume (many API calls or large datasets)
- Structured data (arrays of objects with consistent schemas)
- Token constraints (cost concerns or context limits)
If you hit even two of those three, TOON is worth testing. Start with one high-impact use case, measure the results, and expand from there.
The 40% token savings are real, but the bigger win is often the improved reliability in structured output generation. Less syntax means fewer ways for the LLM to mess up.
Ready to optimize your LLM workflows? Check out the official Toon Format documentation or try our JSON to TOON converter to see your own data transformed.