Summary: The best structured input format for an LLM prompt depends on the shape of the data, but our research found a clear production winner for large evidence-heavy workflows: lossless evidence aliases. They reduced input size while preserving output integrity: accurate answers, complete required fields, stable citations, proper response length, and schema-valid output. TOON-style tables are a strong general-purpose runner-up for uniform object arrays. CSV/TSV works for flat data. XML tags are useful for separating prompt sections. YAML and raw JSON should be used more selectively. Keep JSON at the API boundary, not as the default prompt body.
Quick Answer: What is the best structured prompt format?
For large production workflows, the #1 recommendation is a lossless evidence-alias pattern: define repeated metadata once, encode evidence as compact rows, preserve stable citation IDs, validate that the encoded prompt can reconstruct the original records, and test that model outputs remain accurate, complete, and on target.
This is a pattern, not a universal file-format standard. If your payload is a simple uniform array and you do not need audit-grade citation reconstruction, TOON or CSV/TSV may be the more practical choice. If your payload is small, irregular, or deeply nested, minified JSON may be better. The ranking below is for evidence-heavy LLM inputs where correctness, traceability, and output quality matter as much as token count.
The practical ranking is:
| Rank | Format | Best for | Verdict |
|---|---|---|---|
| 1 | Lossless evidence aliases | Large auditable workflows with files, records, citations, and repeated metadata | Best balance of token reduction and output integrity |
| 2 | TOON-style tables | Uniform arrays of objects | Strong compact format when rows are regular and output schema is explicit |
| 3 | CSV / TSV plus metadata blocks | Flat tabular data | Efficient and accurate for simple tables, weak for nested evidence |
| 4 | XML-tagged prompt sections | Separating instructions, examples, context, and documents | Improves instruction adherence, not a data-compression format |
| 5 | YAML | Human-readable configs and moderate nesting | Useful for humans, weaker for strict machine contracts |
| 6 | Minified JSON | Small, nested, irregular objects | Safe fallback, still pays repeated-key cost |
| 7 | Pretty JSON | Debugging and human review | Worst default for large prompts; wastes context that could support better output |
The ranking criteria were:
- token reduction
- source-data preservation
- output accuracy and completeness
- citation validity
- schema adherence
- response length control
- implementation complexity
- fit for the data shape
The winning architecture is not one format everywhere. It is this:
application JSON
-> lossless evidence-alias encoder for model input
-> compact LLM prompt
-> provider structured output / schema validation
-> application JSON
Structured outputs control what comes out of the model. Evidence-alias encoding controls what you spend sending data in.
Should you send JSON to an LLM?
Use JSON at the boundary. Do not automatically use JSON inside the prompt.
If your application needs a typed response, use provider structured-output features: OpenAI Structured Outputs, Gemini response schemas, Claude JSON outputs, or strict tool use. If your application needs to send a large evidence set into the model, compress the input first with a lossless prompt encoding that removes repeated structure while preserving every source record.
Why engineers default to JSON
JSON is the obvious choice because every production system already understands it. It is portable, machine-readable, schema-compatible, easy to validate, easy to log, and easy to pass across APIs.
That is exactly why it should stay in the system.
The mistake is assuming that the same representation is optimal for the model’s attention and tokenizer. JSON was designed for software interchange, not for stuffing hundreds of near-identical records into a context window.
Consider a common agent workflow:
[
{
"file_id": "f_01H...",
"filename": "invoice_packet.pdf",
"segment_id": "s_01H...",
"page_start": 4,
"page_end": 5,
"schema_id": "document_extraction_v3",
"extraction_model": "example-extraction-model",
"customer_name": "Example Customer Inc.",
"field_name": "invoice_total",
"field_value": "12400",
"confidence": 0.98
},
{
"file_id": "f_01H...",
"filename": "invoice_packet.pdf",
"segment_id": "s_01H...",
"page_start": 4,
"page_end": 5,
"schema_id": "document_extraction_v3",
"extraction_model": "example-extraction-model",
"customer_name": "Example Customer Inc.",
"field_name": "payment_terms",
"field_value": "net_30",
"confidence": 0.97
}
]
Most of that text is repeated scaffolding. The LLM needs the evidence. It does not need the same file name, schema ID, model name, and object keys repeated on every row.
The benchmark
We tested this on a representative document-intelligence payload with extracted records, source files, segment references, evidence references, and policy-evaluation context. The examples here are generalized so the pattern is easier to apply across domains.
The baseline was a raw JSON-style prompt with repeated object keys and repeated metadata.
| Format | Relative input size | Reduction vs. raw JSON | Data preserved | Output integrity |
|---|---|---|---|---|
| Raw repeated JSON prompt | 1.00x | baseline | All extracted records | Accurate on small inputs; more likely to waste context on large repeated payloads |
| Lossless evidence aliases | ~0.25x | ~75% | All records and original data cells | Strongest balance of accuracy, completeness, citations, and length control |
| Evidence aliases plus response schema | ~0.25x | ~75% | All records and original data cells | Strongest production pattern when paired with provider structured outputs |
The key is that the compact versions were not summaries. The manifest preserved:
- every file extraction record
- every file reference
- every segment reference
- every evidence alias
- repeated string aliases
- original data key/value cells
This is prompt compression by representation, not by deletion.
Output integrity matters as much as token count
The goal is not to make prompts smaller at any cost. A compressed prompt is only useful if the answer still lands correctly.
In our testing, each format was evaluated across output quality dimensions, not just input size:
| Integrity check | What it tests | Why it matters |
|---|---|---|
| Source preservation | Can the encoded prompt reconstruct the original records? | Prevents silent data loss |
| Required field coverage | Does the answer include every required output field? | Avoids incomplete downstream payloads |
| Semantic accuracy | Does the model reach the same conclusion as the reference answer? | Measures whether compression changed the meaning |
| Citation validity | Do evidence references map back to real source records? | Keeps the answer auditable |
| Length adherence | Is the answer the right level of detail, not too short or bloated? | Prevents unusable summaries and runaway responses |
| Schema validity | Does the response conform to the expected output contract? | Keeps application parsing reliable |
| Boundary adherence | Does the model avoid unsupported claims outside the evidence? | Reduces hallucinated facts |
This changed the ranking. A format that saves tokens but causes omitted fields, broken citations, or under-specified answers is not a winner. The best formats were the ones that reduced repeated structure while making the model’s job clearer: what evidence exists, how records relate, what must be cited, and what shape the answer must take.
That is why evidence aliases ranked first. They did not merely shrink the input. They preserved the evidence map, gave the model compact handles for claims, and left more room for task instructions, output requirements, and response schema. TOON-style and CSV/TSV formats also performed well when the data was regular, but they needed explicit metadata and schema rules to maintain the same citation and completeness guarantees.
Raw JSON remained useful as a safe interchange format, but it was not automatically better for output quality. On large repeated payloads, raw JSON can spend too much context on serialization scaffolding. That leaves less room for the instructions that keep the answer accurate, complete, properly scoped, and appropriately sized.
What external research says
The broader engineering literature supports the direction of this pattern, but it also adds caveats that matter in production.
First, TOON is real and useful, but its best use case is narrower than some of the hype suggests. The TOON project describes it as a lossless input representation for JSON data, with a sweet spot in uniform arrays of objects. Its own benchmark focuses on comprehension and data retrieval: the model receives formatted data and answers questions about it. That is a reading task, not a test of whether models should generate TOON as an output format.
A 2026 TOON-vs-JSON generation benchmark is more cautious. It found that TOON can have a promising accuracy-per-token profile, but that its advantage can be reduced by the extra prompt instructions needed to teach or constrain the format. The paper also found that plain JSON generation had the best final accuracy in some settings, while constrained JSON had the lowest output token budget with some accuracy tradeoffs. That matches the recommendation here: compact formats are best for model input, but JSON Schema and structured outputs are still the safer production boundary for model output.
Second, the idea of declaring schema once and sending many rows is showing up outside TOON. ONTO, a columnar notation proposed for LLM input optimization, uses the same core design: declare field names once, arrange values in compact rows, and preserve hierarchy with indentation. Its reported reductions versus JSON and comprehension checks support the broader point: repeated object keys are often the main source of JSON overhead, and row-oriented encodings can preserve task accuracy when the format context is clear.
Third, prompt compression research is more skeptical than token-savings blog posts. “Prompt Compression in the Wild” found that real latency gains depend on whether compression overhead is offset by faster decoding. “The Compression Paradox in LLM Inference” found that aggressive compression can cause quality loss and provider-dependent behavior. This is why the benchmark criteria above include output integrity, unsupported claim rate, response length adherence, and cost per successful task. Fewer input tokens are not automatically better.
Fourth, provider structured-output features are output controls, not input-compression strategies. OpenAI’s Structured Outputs documentation distinguishes JSON mode from schema adherence: JSON mode can produce valid JSON without guaranteeing that it matches a schema, while Structured Outputs are designed for schema matching. Gemini’s structured-output docs similarly support a subset of JSON Schema. These tools are exactly what you want at the response boundary, but they do not remove repeated keys from the prompt you send into the model.
Structured-output benchmarks reinforce the same point. StructEval evaluates formats such as JSON, YAML, XML, CSV, HTML, and SVG using syntax and structural-correctness metrics, and SoEval was created specifically because structured-output capability was under-measured in general LLM benchmarks. The implication for engineering teams is straightforward: do not assume “looks structured” means “is valid, complete, and correct.” Parse it, validate it, and score it against the task.
Finally, long-context research gives another reason to avoid bloated prompts, but it should not be overstated. “Lost in the Middle” showed that models can struggle to use information placed in the middle of long contexts. Newer long-context models have improved on simple retrieval tasks, so the argument is not “models cannot read long context.” The practical point is narrower: if repeated serialization scaffolding consumes context, cost, and attention, remove it before it crowds out evidence and instructions.
Why raw JSON burns tokens
Tokenizers do not understand that repeated JSON keys are “free.” They see text. The OpenAI tokenizer guide puts it plainly: models see text as tokens, and token counts determine whether an input fits and what it costs. That means "filename" repeated hundreds of times is not metadata. It is billable context.
JSON bloat usually comes from five places:
- Repeated object keys
- Repeated metadata values
- Nested object structure
- Verbose IDs and URLs
- Pretty-printing and indentation
Minifying JSON helps, but only at the margins. It removes whitespace. It does not remove repeated keys or repeated values.
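As a sanity check, the effect can be measured directly with a tokenizer. The sketch below uses the tiktoken package and a generic encoding; the sample record is a stand-in for the payload above, and exact counts vary by model.

```python
import json
import tiktoken

# One extracted record with the repeated metadata from the example above.
record = {
    "file_id": "f_01H...",
    "filename": "invoice_packet.pdf",
    "schema_id": "document_extraction_v3",
    "field_name": "invoice_total",
    "field_value": "12400",
    "confidence": 0.98,
}
records = [record] * 100  # simulate a large, repetitive payload

enc = tiktoken.get_encoding("cl100k_base")
pretty = json.dumps(records, indent=2)
minified = json.dumps(records, separators=(",", ":"))

print("pretty JSON tokens:  ", len(enc.encode(pretty)))
print("minified JSON tokens:", len(enc.encode(minified)))
# Minification drops whitespace tokens, but every key and every repeated
# metadata value is still encoded once per record.
```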
The better pattern: evidence aliases
Evidence alias encoding normalizes the prompt the same way a database normalizes repeated data.
Instead of repeating file and segment metadata on every record, define it once:
FILE_REF
F001|invoice_packet.pdf|uploaded_document
F002|payment_terms.csv|system_export
SEGMENT_REF
S001|F001|page=1|status=completed|confidence=0.98
S002|F002|page=1|status=completed|confidence=0.97
EVIDENCE_REF
E001|S001
E002|S002
Then send the data as rows:
INVOICE_DATA
eid|field|value|unit|period
E001|invoice_total|12400|USD|2026-04
E001|payment_terms|net_30|text|2026-04
E001|due_date|2026-05-15|date|2026-04
E002|approved_limit|15000|USD|2026-04
The model can still cite E001. The application can still expand E001 back to the exact file, segment, page, confidence, and extraction record. The prompt just stops paying to repeat that metadata every time.
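A minimal sketch of that expansion on the application side, using the table names from the example above; a production decoder would parse the reference blocks out of the prompt text instead of hard-coding them.

```python
# Canonical reference tables held by the application (decoder side).
FILE_REF = {"F001": {"filename": "invoice_packet.pdf", "category": "uploaded_document"}}
SEGMENT_REF = {"S001": {"file": "F001", "page": 1, "status": "completed", "confidence": 0.98}}
EVIDENCE_REF = {"E001": {"segment": "S001"}}

def expand_evidence(eid: str) -> dict:
    """Join an evidence alias back to its segment and file metadata."""
    segment_id = EVIDENCE_REF[eid]["segment"]
    segment = SEGMENT_REF[segment_id]
    file_meta = FILE_REF[segment["file"]]
    return {"evidence_id": eid, "segment_id": segment_id, **segment, **file_meta}

print(expand_evidence("E001"))
# {'evidence_id': 'E001', 'segment_id': 'S001', 'file': 'F001', 'page': 1,
#  'status': 'completed', 'confidence': 0.98, 'filename': 'invoice_packet.pdf', ...}
```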
Why this is different from summarization
Summarization reduces the prompt by discarding detail.
Evidence aliases reduce the prompt by eliminating duplication.
That distinction matters for regulated workflows, financial analysis, legal review, security, healthcare, and any use case where the answer needs to be auditable. If the model says “the invoice amount exceeds the approved limit,” the system must be able to show which source file and value supported that statement. A compressed summary cannot always do that. A lossless alias can.
The encoder should pass a reconstruction test:
raw extraction JSON
-> encode to prompt aliases
-> decode aliases back to canonical JSON
-> compare hashes and record counts
If the decoded version does not match the canonical source, the format is not lossless enough for production.
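One way to write that test, as a sketch: canonicalize both sides and compare digests and counts. The encode_prompt and decode_prompt functions are placeholders for your own encoder and decoder.

```python
import hashlib
import json

def canonical_hash(records: list[dict]) -> str:
    """Hash a canonical serialization of the records (assumes stable record order)."""
    canonical = json.dumps(records, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def assert_lossless(records: list[dict], encode_prompt, decode_prompt) -> None:
    prompt_text = encode_prompt(records)   # raw extraction JSON -> alias prompt
    decoded = decode_prompt(prompt_text)   # alias prompt -> canonical JSON
    assert len(decoded) == len(records), "record count changed"
    assert canonical_hash(decoded) == canonical_hash(records), "content changed"
```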
Where JSON still belongs
This argument is not “replace JSON everywhere.”
JSON is still the right format for:
- API requests and responses
- database persistence
- validation contracts
- typed SDKs
- event streams
- audit exports
- model output schemas
The right split is:
Use compact structured text for model input.
Use JSON Schema for model output.
Use JSON for application boundaries.
OpenAI’s structured-output guidance makes the output side clear: JSON mode only guarantees valid JSON, while Structured Outputs match the response to a schema when supported. Gemini’s structured outputs similarly let developers provide a JSON Schema so the model returns predictable, type-safe JSON. Claude now supports JSON outputs and strict tool use for schema validation.
Those features are excellent. They do not solve input token bloat by themselves.
JSON mode is not the same thing as structured outputs
Many teams conflate three different things:
- “Please respond in JSON.”
- JSON mode.
- Strict schema-constrained structured outputs.
They are not equivalent.
OpenAI documents the distinction directly: JSON mode ensures valid JSON, but it does not guarantee that the output matches a specific schema. Structured Outputs are the stronger feature for schema adherence. Gemini’s docs make a similar point from the provider side: structured outputs can produce syntactically valid JSON matching a provided schema, but application code still needs semantic validation. Claude distinguishes JSON outputs from strict tool use: one controls the final response format, the other validates tool parameters.
In production, the safe pattern is:
Prompt: compact lossless input
Output: provider structured output
Application: validate business semantics anyway
Schema conformance does not mean the model reasoned correctly. It means the result is shaped correctly enough for your code to inspect it.
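To make that split concrete, here is a minimal sketch of the output side using the OpenAI Chat Completions json_schema response format. The model name is a placeholder, compact_prompt is assumed to hold the alias-encoded input, and other providers expose equivalent schema features under different parameter names.

```python
from openai import OpenAI

client = OpenAI()

REVIEW_SCHEMA = {
    "type": "object",
    "required": ["status", "reasoning", "evidence_refs"],
    "additionalProperties": False,
    "properties": {
        "status": {"type": "string", "enum": ["pass", "exception", "needs_review"]},
        "reasoning": {"type": "string"},
        "evidence_refs": {"type": "array", "items": {"type": "string"}},
    },
}

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[
        {"role": "system", "content": "Review the evidence and report policy exceptions."},
        {"role": "user", "content": compact_prompt},  # assumed: alias-encoded input
    ],
    response_format={
        "type": "json_schema",
        "json_schema": {"name": "review", "strict": True, "schema": REVIEW_SCHEMA},
    },
)
# Schema conformance is shape only; business semantics still need application checks.
```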
The ranked recommendations
Here is the practical order we would recommend to an engineering team choosing a structured input format for LLM prompts.
1. Lossless evidence aliases
This is the top recommendation for production agent workflows where the model receives many extracted records, files, citations, policies, or source references.
Evidence aliases win because they remove repeated structure without removing evidence. The encoder declares repeated file metadata, segment metadata, string values, and source references once. The data rows then cite compact IDs such as F001, S001, A001, and E001.
Why this is the best choice:
- It produced the strongest result in our representative benchmark: roughly 75% fewer input tokens while preserving the records and data cells needed for reconstruction.
- It keeps source citations stable, which matters for regulated, financial, legal, healthcare, and security workflows.
- It is lossless when implemented with a decoder and reconstruction test.
- It supported accurate, complete, properly scoped outputs because the model could cite compact evidence IDs instead of scanning repeated JSON scaffolding.
- It improved length control by leaving more context for task instructions, output constraints, and response schemas.
- It lets the model reason over compact evidence while the application keeps canonical JSON for storage, validation, and audit.
The tradeoff is engineering discipline. You need an encoder, a decoder, a manifest, token tests, and clear reconstruction rules. That is worth it for serious workflows because it turns prompt compression into software infrastructure instead of prompt hacking.
2. TOON-style table formats
TOON, or Token-Oriented Object Notation, is the best general-purpose alternative when your input is a uniform array of objects. Its core idea is similar to evidence aliases: declare fields once, then stream rows.
Why it ranks highly:
- It directly targets the biggest JSON problem: repeated keys in arrays of similar objects.
- It is more readable than dense custom encodings.
- It can be easier to adopt than building a domain-specific alias format from scratch.
- It is a strong fit for product catalogs, events, extracted entities, search results, and other repeated object lists.
- It can preserve output accuracy when the row shape is regular and the prompt explains how fields map into the output schema.
The limitation is that TOON is not automatically the best format for every shape. Deeply nested, irregular, or small objects may not benefit enough to justify the extra instructions. It is also a format choice, not a full audit system. If your workflow needs citation-preserving reconstruction, TOON may need additional evidence IDs and manifests layered on top.
3. CSV / TSV plus metadata blocks
CSV and TSV are excellent when the data is actually tabular. They are hard to beat for compactness when every row has the same columns.
Why they work:
- Column names appear once.
- Rows are compact.
- Models generally understand tables.
- They are simple to generate and inspect.
- They can produce concise, on-target answers when the task is a straightforward table evaluation.
The weakness is semantics. CSV does not naturally express nested objects, missing-vs-null distinctions, evidence provenance, data types, or per-cell confidence. For production use, pair tables with explicit metadata blocks:
TABLE: invoice_review
COLUMNS: evidence_id,field,value,units,threshold,status
NULL_RULES: __MISSING__ means absent key; null means explicit source null
UNITS: value and threshold are reported in the units column when applicable
This ranks below TOON and evidence aliases because it needs extra conventions once the workflow becomes more than a flat table.
4. XML-tagged prompt sections
XML tags are a strong prompt-organization tool. They are especially useful when you need to separate instructions, context, examples, documents, tool results, and output rules.
Why they are useful:
- Tags make boundaries explicit.
- They reduce instruction/context confusion.
- They are easy to nest around documents and examples.
- Anthropic explicitly recommends XML tags for structuring complex prompts.
But XML tags are not primarily a token-optimization format. They can make a prompt more reliable while also making large tabular payloads more verbose. Use XML-like tags for prompt boundaries; do not use XML as the main encoding for thousands of repeated data fields unless the task specifically benefits from that structure.
5. YAML
YAML is useful when humans need to read and edit the prompt payload. It can be more compact and readable than pretty JSON for moderate configuration objects.
Why it can help:
- It is easier for humans to scan.
- It avoids some JSON punctuation.
- It is convenient for configuration, rules, and small nested structures.
Why it ranks lower:
- Whitespace and indentation matter.
- Ambiguity around scalars can create parsing surprises.
- It is not as strong as JSON Schema or provider structured outputs for hard machine contracts.
- Large arrays still repeat keys unless you restructure them.
YAML is a good authoring format. It is not the best answer for high-volume evidence encoding.
6. Minified JSON
Minified JSON is the best fallback when the object is small, irregular, deeply nested, or when your team cannot safely introduce another representation yet.
Why it remains useful:
- Every system understands it.
- It preserves exact machine semantics.
- It is easy to validate.
- It avoids whitespace waste.
- It can be the most reliable option for small, irregular payloads where token bloat is not the limiting factor.
The problem is that minification only removes whitespace. It does not solve repeated keys, repeated metadata, or repeated IDs. For a few objects, that is fine. For hundreds of evidence records, it is usually the wrong default.
7. Pretty JSON
Pretty JSON belongs in logs, docs, debugging, and human review. It should be the last choice for large production prompts.
Why it ranks last:
- It repeats every key.
- It adds whitespace and indentation.
- It often repeats metadata across records.
- It can push workflows over context limits while adding no model-relevant information.
Pretty JSON is comfortable for engineers, but comfort is not the same as prompt efficiency.
The winners from this research
The research points to a three-part production pattern:
- Use lossless evidence aliases as the default format for large, auditable LLM inputs.
- Use TOON-style tables or CSV/TSV for simpler uniform arrays where full evidence reconstruction is not required.
- Score every format on output integrity: accuracy, completeness, citation validity, length adherence, and schema validity.
- Use provider structured outputs to return validated JSON at the application boundary.
That is the important distinction. The input format should be optimized for the model’s context window. The output format should be optimized for application validation.
A production encoder design
A production encoder should treat prompt format as an interface with tests.
1. Normalize repeated metadata
Create reference tables:
CONFIG_REF
C001|model=example-model|schema=document_extraction_v2
FILE_REF
F001|file_id=...|filename=...|category=...
SEGMENT_REF
S001|F001|page_start=1|page_end=2|confidence=0.98|C001
2. Alias repeated strings
Customer names, policy names, document categories, reporting periods, and source labels repeat constantly.
STRING_ALIAS
A001|Example Customer Inc.
A002|Payment Terms Review
A003|April 2026
Then rows can use @A001 instead of repeating long strings.
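A sketch of steps 1 and 2 in a few lines: assign sequential aliases to repeated values and render a reference block. The prefixes and block names follow the examples above; escaping, evidence IDs, and null handling come next.

```python
def build_alias_table(values: list[str], prefix: str) -> dict[str, str]:
    """Assign a stable sequential alias (e.g. A001) to each distinct value."""
    table: dict[str, str] = {}
    for value in values:
        if value not in table:
            table[value] = f"{prefix}{len(table) + 1:03d}"
    return table

def render_block(name: str, table: dict[str, str]) -> str:
    """Render a reference block like STRING_ALIAS with one alias|value row per entry."""
    return "\n".join([name] + [f"{alias}|{value}" for value, alias in table.items()])

names = ["Example Customer Inc.", "Example Customer Inc.", "Payment Terms Review"]
print(render_block("STRING_ALIAS", build_alias_table(names, "A")))
# STRING_ALIAS
# A001|Example Customer Inc.
# A002|Payment Terms Review
```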
3. Keep evidence IDs stable
Every row that may support a claim needs an evidence handle:
EVIDENCE_ALIAS
E001|S001|source=invoice_packet.pdf
The LLM should cite E001, not invent a citation.
4. Preserve null vs. missing
Production encoders need boring details:
__MISSING__ = key did not exist in source record
null = key existed with explicit JSON null
If you collapse these, you lose data semantics.
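A small sketch of keeping that distinction intact during encoding, using the sentinel convention above:

```python
MISSING = "__MISSING__"

def encode_cell(record: dict, key: str) -> str:
    """Distinguish an absent key from an explicit JSON null."""
    if key not in record:
        return MISSING      # key did not exist in the source record
    value = record[key]
    if value is None:
        return "null"       # key existed with an explicit null
    return str(value)

print(encode_cell({"amount": None}, "amount"))    # -> null
print(encode_cell({"amount": None}, "currency"))  # -> __MISSING__
```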
5. Validate round-trip integrity
The encoder should emit a manifest:
{
"source_sha256": "...",
"records_preserved": "...",
"data_cells_preserved": "...",
"file_refs": "...",
"segment_refs": "...",
"evidence_aliases": "...",
"string_aliases": "..."
}
This makes token optimization testable instead of aesthetic.
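A sketch of how the encoder might emit that manifest, assuming it already tracks its reference tables as dictionaries; the field names follow the example above, and the cell count here simply counts top-level keys per record.

```python
import hashlib
import json

def build_manifest(source_records: list[dict], file_refs: dict, segment_refs: dict,
                   evidence_aliases: dict, string_aliases: dict) -> dict:
    """Summarize what the encoder preserved so round-trip tests have a target."""
    canonical = json.dumps(source_records, sort_keys=True, separators=(",", ":"))
    return {
        "source_sha256": hashlib.sha256(canonical.encode("utf-8")).hexdigest(),
        "records_preserved": len(source_records),
        "data_cells_preserved": sum(len(r) for r in source_records),
        "file_refs": len(file_refs),
        "segment_refs": len(segment_refs),
        "evidence_aliases": len(evidence_aliases),
        "string_aliases": len(string_aliases),
    }
```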
Prompt template
A compact prompt should make the reconstruction rules obvious:
TASK
Review the document package and identify any policy exceptions.
Use only the evidence rows and reference tables below.
Return evidence IDs for every material claim.
RECONSTRUCTION RULES
- E### joins to EVIDENCE_ALIAS.eid.
- EVIDENCE_ALIAS.sid joins to SEGMENT_REF.sid.
- SEGMENT_REF.fid joins to FILE_REF.fid.
- @A### expands through STRING_ALIAS.
- __MISSING__ means the source key was absent.
- null means the source key was explicitly null.
FILE_REF
...
SEGMENT_REF
...
STRING_ALIAS
...
DATA_TABLES
...
Then enforce the response at the API boundary:
{
"type": "object",
"required": ["status", "reasoning", "evidence_refs"],
"additionalProperties": false,
"properties": {
"status": {
"type": "string",
"enum": ["pass", "exception", "needs_review"]
},
"reasoning": {
"type": "string"
},
"evidence_refs": {
"type": "array",
"items": { "type": "string" }
}
}
}
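After the provider responds, the application can check the same contract again and add semantic rules no schema can express, such as whether every cited evidence ID actually appeared in the prompt. A sketch using the Python jsonschema package; the schema argument is the contract above, and known_evidence_ids is assumed to come from the encoder.

```python
import json
from jsonschema import validate

def parse_and_check(raw_response: str, schema: dict, known_evidence_ids: set[str]) -> dict:
    """Validate the response shape, then the evidence references it cites."""
    payload = json.loads(raw_response)
    validate(instance=payload, schema=schema)  # raises on schema violations
    unknown = set(payload["evidence_refs"]) - known_evidence_ids
    if unknown:
        raise ValueError(f"response cites evidence IDs not in the prompt: {sorted(unknown)}")
    return payload
```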
What to benchmark
Do not choose a prompt format from a blog post, including this one. Run your own benchmark.
Track:
- input tokens
- output tokens
- total latency
- time to first token
- schema-valid response rate
- required field coverage
- semantic correctness
- citation correctness
- response length adherence
- unsupported claim rate
- reconstruction success
- cost per successful task
Prompt-compression research has the same warning. LLMLingua showed that prompt compression can reach very high compression ratios with limited quality loss in benchmark settings. A 2026 “Prompt Compression in the Wild” study found that real speedups depend on whether compression overhead is offset by faster decoding. In other words: fewer tokens are usually good, but token count is not the only metric.
For structured business workflows, add one more metric: audit survivability.
If the model makes a claim, can the application trace that claim back to a source document after compression?
Also test answer shape directly. A response can be schema-valid and still be too thin, too verbose, or off target. For production workflows, define a target output envelope before benchmarking:
OUTPUT_QUALITY_TARGETS
- include every required decision field
- include citations for every material claim
- keep summary length within the requested range
- avoid claims not supported by evidence IDs
- preserve required tables, sections, or action items
- return machine-readable JSON at the boundary
This is where compact input formats can help output quality. When repeated prompt scaffolding is removed, the model has more usable context for the instructions that keep the response complete, appropriately detailed, and parseable.
Where this matters most
This pattern is most valuable when:
- You send many records into a model.
- Records share common keys and metadata.
- You need evidence citations.
- You need downstream structured JSON.
- You are near context-window limits.
- You run the workflow often enough for token cost to matter.
Examples:
- document extraction review
- loan underwriting packets
- contract compliance monitoring
- insurance claim files
- medical necessity review
- compliance testing
- security event triage
- RAG systems with many retrieved passages
- multi-agent workflows that pass state between steps
This is less valuable when:
- The object is small.
- The data is deeply nested and irregular.
- A model already consumes a file natively.
- The LLM call is infrequent.
- Human readability is more important than token cost.
The rule of thumb
Use JSON for machines. Use compact evidence formats for models. Use schemas for outputs.
Raw JSON is a good system boundary and a bad default prompt body. If your prompt includes hundreds of repeated JSON objects, you are probably paying the model to read your serialization format instead of your data.
The fix is not to remove evidence. The fix is to encode evidence better.
Related implementation context
This article focuses on prompt input formats, but the format only works when it sits inside a larger production architecture. For regulated workflows, the surrounding system also needs a data engine that preserves source lineage, a policy engine that defines what evidence is required, and audit trails that connect every model-assisted decision back to source documents.
The same principle applies to document-heavy workflows such as document processing, policy evaluation, and building AI agents that do not hallucinate: compact inputs help, but only when extraction, validation, policy logic, and output schemas all preserve the evidence chain.
Implementation checklist
- Identify LLM calls that paste large JSON arrays into prompts.
- Count tokens for representative requests before changing anything.
- Build a lossless encoder for repeated metadata, evidence IDs, and row data.
- Add a decoder and hash-based round-trip test.
- Keep stable evidence aliases that the model can cite.
- Move output JSON schemas to provider structured-output APIs where supported.
- Validate semantic correctness after schema validation.
- Add token regression tests to CI (a minimal test sketch follows this checklist).
- Benchmark latency and cost, not only token count.
- Document format rules so prompt changes do not break reconstruction.
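The token regression test mentioned in the checklist can be as small as the sketch below, assuming tiktoken, a representative fixture file, and your own encode_prompt function; the budget is a placeholder taken from your measured baseline.

```python
import json
import tiktoken

TOKEN_BUDGET = 8_000  # placeholder: set from your measured baseline

def count_tokens(text: str) -> int:
    return len(tiktoken.get_encoding("cl100k_base").encode(text))

def test_encoded_prompt_stays_within_budget():
    with open("tests/fixtures/representative_payload.json") as f:  # hypothetical fixture
        records = json.load(f)
    prompt = encode_prompt(records)  # assumed: your lossless alias encoder
    assert count_tokens(prompt) <= TOKEN_BUDGET, "prompt encoding regressed past budget"
```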
Sources and further reading
- OpenAI: Structured model outputs
- OpenAI Cookbook: How to count tokens with tiktoken
- Google Gemini API: Structured outputs
- Claude API Docs: Structured outputs
- Claude API Docs: Prompting best practices and XML tags
- TOON: Token-Oriented Object Notation
- Token-Oriented Object Notation vs JSON: A Benchmark of Plain and Constrained Decoding Generation
- ONTO: A Token-Efficient Columnar Notation for LLM Input Optimization
- LLMLingua: Compressing Prompts for Accelerated Inference of Large Language Models
- LLMLingua-2: Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression
- Prompt Compression in the Wild: Measuring Latency, Rate Adherence, and Quality for Faster LLM Inference
- The Compression Paradox in LLM Inference: Provider-Dependent Energy Effects of Prompt Compression
- StructEval: Benchmarking LLMs’ Capabilities to Generate Structural Outputs
- Are LLMs good at structured outputs? A benchmark for evaluating structured output capabilities in LLMs
- Lost in the Middle: How Language Models Use Long Contexts