Blog / evidence notes
The evidence dashboard is the audit trail; the articles are the decision story. Each essay starts with a question, names the chart used to answer it, and closes with an operator takeaway. Read them in order or jump to the layer you need.
The arc runs from claim to practice. The overview essay states the single message and the evidence boundary. The evidence topics open each measured slice one at a time. The bridge essay marks where the voice shifts from measurement to production. The operations essays carry the same measurement habit into day-to-day running — recovery, cache routing, retention, and migration sizing. Underneath all of it, the evidence dashboard holds the governed tables and source charts every claim is drawn from.
A guided read of the public chart-data snapshot: where reasoning effort wastes spend, where it earns a trial, and why hidden reasoning tokens must be budgeted explicitly.
On short factual work, cost rises without a matching quality gain.
Internal tokens explain why the bill changes even when output looks similar.
Reasoning can pay when the evaluator moves enough to justify it.
Agent workloads need quality and latency read together.
Operator pattern (L6), not a new measurement: bound the loop the ceiling check can't reach.
Modeled crossover is planning guidance, not direct capacity evidence.
Inspect the governed tables and rendered source charts behind the prose.
One essay sits between the evidence and operations layers, naming why the voice changes and which habits carry across.
How the evidence-layer measurement habit transfers to production operations, and what stays the same.
These essays move one layer closer to production operations: recovery, cache routing, retention, and migration sizing.
retry-after-ms: Why PTU recovery should follow the service header instead of health-check polling.
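As a minimal sketch of the header-following idea, the helper below turns response headers into a wait time. It assumes the throttled response carries `retry-after-ms` (milliseconds) and possibly `retry-after` (seconds); the function name, the default, and the cap are illustrative choices, not values taken from the essay.

```python
from typing import Mapping


def retry_delay_seconds(
    headers: Mapping[str, str],
    default: float = 1.0,  # assumed fallback when no usable header is present
    cap: float = 60.0,     # assumed upper bound so a bad value cannot stall the caller
) -> float:
    """Return how long to wait before retrying, preferring the service's own hint."""
    # Header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}

    # Prefer the millisecond header, then fall back to the seconds header.
    for name, scale in (("retry-after-ms", 1000.0), ("retry-after", 1.0)):
        raw = lowered.get(name)
        if raw is None:
            continue
        try:
            return min(max(float(raw) / scale, 0.0), cap)
        except ValueError:
            # Non-numeric value (for example an HTTP date): try the next header.
            continue

    return min(default, cap)
```

The caller sleeps for the returned value and retries the real request, so recovery tracks what the service reports rather than the result of a separate health probe.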
prompt_cache_key bucketing: Cache keys are routing-affinity controls; bucket by workload, not by request ID.
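One way to picture workload bucketing is the sketch below. The helper name, the workload labels, and the optional sharding by tenant are hypothetical; the only point carried over from the essay is that the key is derived from the workload and never from a per-request identifier.

```python
import hashlib


def workload_cache_key(workload: str, tenant_id: str = "", shards: int = 1) -> str:
    """Build a stable cache key from the workload, optionally split into a few shards."""
    if shards <= 1:
        # Every request in the workload shares one key, so shared prefixes land together.
        return workload

    # Assumption: a busy workload may be split into a small, fixed number of shards.
    # A stable hash keeps a given tenant on the same shard across processes and restarts.
    digest = hashlib.sha256(tenant_id.encode("utf-8")).digest()
    shard = int.from_bytes(digest[:4], "big") % shards
    return f"{workload}-{shard}"
```

The returned string would be passed as the request's `prompt_cache_key`. Because no request ID enters the key, two requests from the same workload and tenant always map to the same bucket.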
When the operation assumes a longer cache window, make retention an explicit request policy.
Explain PTU demand through output weighting, reasoning tokens, cache shape, and max-token policy.
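The four factors named above can be combined in a rough planning formula. The sketch below is illustrative only: the field names, the output weight, and the cached-input weight are assumptions to be replaced with the figures for the actual model and deployment, and the result is modeled demand, not measured capacity.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    requests_per_minute: float
    input_tokens: int           # prompt tokens per request
    cached_input_tokens: int    # portion of the prompt served from cache
    visible_output_tokens: int  # tokens the caller actually sees
    reasoning_tokens: int       # hidden tokens generated before the answer
    max_output_tokens: int      # max-token policy: cap on generated tokens


def weighted_tokens_per_minute(
    w: Workload,
    output_weight: float,        # assumed cost of one generated token vs one input token
    cached_input_weight: float,  # assumed cost of one cached input token (0.0 means free)
) -> float:
    """Estimate weighted token demand per minute for sizing discussions."""
    uncached = w.input_tokens - w.cached_input_tokens
    # Reasoning tokens are generated tokens too; the cap bounds their sum with visible output.
    generated = min(w.visible_output_tokens + w.reasoning_tokens, w.max_output_tokens)
    per_request = (
        uncached
        + cached_input_weight * w.cached_input_tokens
        + output_weight * generated
    )
    return w.requests_per_minute * per_request


# Same visible output, with and without hidden reasoning tokens.
with_reasoning = Workload(10, 1000, 400, 200, 300, 1000)
without_reasoning = Workload(10, 1000, 400, 200, 0, 1000)
print(weighted_tokens_per_minute(with_reasoning, 4.0, 0.0))     # 26000.0
print(weighted_tokens_per_minute(without_reasoning, 4.0, 0.0))  # 14000.0
```

Under these assumed weights the two workloads return identical visible output, yet the one with reasoning tokens needs nearly twice the modeled demand, which is why the essay treats hidden tokens as a sizing input rather than a billing footnote.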