Modelnomics: Why You Should Not Run a Frontier Model to Send an Email

The most expensive habit in applied AI right now is not building too much. It is running a frontier model to do work that a model costing a fraction as much would do identically.

Call it Modelnomics: the discipline of matching the right model to the right job. Frontier reasoning models are extraordinary. They are also the wrong default for the bulk of what production AI systems actually do, which is run the same defined play thousands of times a day.

The anti-pattern: running a frontier model to send an email

Picture a workflow that fires a confirmation email after a form is submitted. The body is templated. The variables are known. The output is deterministic. Routing that through a top-tier reasoning model is the AI equivalent of chartering a jet to cross the street.

You pay three times for that mistake: frontier token rates on a trivial task, added latency your user feels, and, counterintuitively, often lower reliability. A powerful model handed an open-ended prompt has more room to "improve" output you wanted left exactly as specified.
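For contrast, here is a minimal sketch of the same confirmation email with no model in the loop. The template wording and the field names (`name`, `form_name`, `date`, `ref`) are hypothetical, chosen only for illustration; the point is that a templated body with known variables is a string substitution, not a reasoning task.

```python
from string import Template

# Hypothetical template and field names, for illustration only.
CONFIRMATION = Template(
    "Hi $name,\n\n"
    "We received your $form_name submission on $date.\n"
    "Reference: $ref\n"
)


def render_confirmation(fields: dict) -> str:
    """Render the confirmation body from known form variables."""
    # substitute() raises KeyError when a variable is missing, so a bad
    # submission fails loudly instead of being papered over.
    return CONFIRMATION.substitute(fields)


if __name__ == "__main__":
    print(render_confirmation({
        "name": "Ada",
        "form_name": "demo request",
        "date": "2025-01-15",
        "ref": "REQ-0001",
    }))
```

The same inputs always produce the same body, which is exactly the property an open-ended prompt cannot guarantee.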

The core principle: architect hot, run cold

Almost every AI workflow has two phases that get billed as if they were one.

Phase one is the heavy lift: the novel reasoning. Designing the system, mapping the edge cases, writing the prompt and the schema, solving the hard, never-seen-before problem. This is where frontier models earn their price. Spend here, and use the most capable model you can afford.

Phase two is execution: running the play you just defined. Once the workflow is designed and the prompt is locked, the ten-thousandth run is no longer a hard problem. It is a known transformation. Hand it to an economy model such as Composer 2.5 or Gemini Flash 3, and keep the frontier model on the bench for when something genuinely novel shows up.

Architect hot. Run cold. The expensive model designs the machine; the cheap model operates it.

The Modelnomics rubric: five questions per task

Before you assign a model to a step, score the step on five axes:

  1. Novelty. Is this a new problem, or a defined play being repeated? Novel work moves up a tier.

  2. Reasoning depth. Does it need multi-step judgment, or is it extraction, classification, or templating?

  3. Volume. Ten times a month tolerates a premium. Fifty thousand times a month does not.

  4. Latency tolerance. A user waiting on a live response feels every extra second; a nightly batch does not.

  5. Blast radius. What does an error cost? High-stakes, irreversible output justifies a stronger model and a review step.
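One way to make the rubric mechanical is to encode the five axes as fields and derive a suggested tier from them. The sketch below is an assumption-heavy illustration, not a calibrated scoring system: the `Step` type, the equal weighting of the axes, and the 50,000-runs threshold are all hypothetical and should be replaced by whatever your own evals support.

```python
from dataclasses import dataclass

TIERS = ["economy", "mid", "frontier"]


@dataclass
class Step:
    novel: bool               # 1. new problem, or a defined play repeated?
    deep_reasoning: bool      # 2. multi-step judgment vs. extraction/templating
    runs_per_month: int       # 3. volume
    live_user_waiting: bool   # 4. latency tolerance
    high_blast_radius: bool   # 5. is an error costly or irreversible?


def suggest_tier(step: Step, high_volume: int = 50_000) -> tuple[str, bool]:
    """Return (suggested_tier, needs_review). Weights are illustrative."""
    # Novelty, reasoning depth, and blast radius each push up one tier.
    level = int(step.novel) + int(step.deep_reasoning) + int(step.high_blast_radius)
    # High volume or a waiting user pushes down one tier, unless an
    # error would be costly enough to outweigh the savings.
    cost_pressure = step.runs_per_month >= high_volume or step.live_user_waiting
    if cost_pressure and not step.high_blast_radius:
        level -= 1
    level = max(0, min(level, len(TIERS) - 1))
    # High-stakes output also gets a review step, per axis 5.
    return TIERS[level], step.high_blast_radius
```

Under these assumptions, a routine, high-volume, user-facing step such as `Step(False, False, 50_000, True, False)` lands on the economy tier with no review, while a novel, deep, high-stakes step lands on the frontier tier with review.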

A task-to-tier map

Use this as a starting default, then tune it to your own evals:

  • Frontier tier (Opus 4.8, GPT-5.5 class): novel architecture, ambiguous multi-step reasoning, code that does not exist yet, gnarly debugging, one-time heavy lifts, anything where being right matters more than being cheap.

  • Mid tier: structured drafting against a clear brief, code edits with a tight spec, summarization that needs some judgment, first-pass research.

  • Economy tier (Composer 2.5, Gemini Flash 3 class): high-volume classification, field extraction, routing, tagging, templated generation, and the routine execution of any play you have already designed and validated.

  • No model at all: if the task is fully deterministic (send the email, write the row, hit the webhook), a template and a function beat any model on cost, speed, and reliability. The cheapest token is the one you never spend.
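The map above can live in code as a plain lookup table so that tier assignment is reviewable and versioned. This is a sketch only: the task-kind keys are hypothetical labels derived from the list above, and sending unknown kinds to the frontier tier is an assumption that follows the "uncertain or novel, escalate" rule rather than a requirement.

```python
# Hypothetical task-kind labels mapped to the four tiers described above.
TIER_MAP = {
    "novel_architecture": "frontier",
    "ambiguous_reasoning": "frontier",
    "hard_debugging": "frontier",
    "structured_drafting": "mid",
    "spec_code_edit": "mid",
    "judgment_summarization": "mid",
    "first_pass_research": "mid",
    "classification": "economy",
    "field_extraction": "economy",
    "routing": "economy",
    "tagging": "economy",
    "templated_generation": "economy",
    "send_email": "none",      # deterministic: template plus function
    "write_row": "none",
    "hit_webhook": "none",
}


def tier_for(task_kind: str) -> str:
    """Look up the default tier; unknown kinds escalate (an assumption)."""
    return TIER_MAP.get(task_kind, "frontier")
```

Keeping the defaults in one table also gives the quarterly audit a single place to edit when prices or capabilities move.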

Put a router in front

The mechanism that makes this real is a router: a cheap classifier that reads each incoming task and decides which tier handles it. Easy and known? Economy model. Uncertain or novel? Escalate to a stronger model, and only then. Most workloads skew heavily toward routine, so a router quietly moves the majority of traffic into the cheap lane without anyone noticing a quality drop.

Escalation should be the exception you measure, not the default you pay for.
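A router of this kind can be sketched in a few lines. Everything named here is an assumption for illustration: `keyword_classifier` is a stand-in for whatever cheap classifier you actually use, the labels are invented, and the 0.8 confidence threshold is arbitrary. The sketch also counts decisions so the escalation rate is something you measure.

```python
from collections import Counter
from typing import Callable


def make_router(classify: Callable[[str], tuple[str, float]],
                known_labels: set[str],
                min_confidence: float = 0.8):
    """Build a router that escalates only unknown or low-confidence tasks."""
    stats = Counter()

    def route(task: str) -> str:
        label, confidence = classify(task)
        if label in known_labels and confidence >= min_confidence:
            tier = "economy"    # easy and known: stay in the cheap lane
        else:
            tier = "frontier"   # uncertain or novel: escalate
        stats[tier] += 1
        return tier

    def escalation_rate() -> float:
        total = sum(stats.values())
        return stats["frontier"] / total if total else 0.0

    return route, escalation_rate


def keyword_classifier(task: str) -> tuple[str, float]:
    # Stand-in for a cheap classifier; labels and scores are made up.
    text = task.lower()
    if "invoice" in text:
        return "extract_invoice_fields", 0.95
    if "tag" in text:
        return "tag_ticket", 0.90
    return "unknown", 0.30
```

Usage would look like `route, rate = make_router(keyword_classifier, {"extract_invoice_fields", "tag_ticket"})`, then calling `route(task)` per task and checking `rate()` periodically. A rising escalation rate is a signal that the workload has drifted or the classifier needs attention.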

Why this compounds

Take a workflow handling 50,000 tasks a month where 90 percent are routine. The cost gap between a frontier model and an economy model on those routine tasks is frequently a factor of 20 to 40 per unit of work. Run all of it hot and you are not buying better outcomes on the routine 90 percent, because on that work the output is effectively identical. You are simply burning runway.
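The arithmetic is easy to check. The sketch below expresses cost in relative units, where one economy-model task costs 1 and one frontier-model task costs the price ratio. It assumes the router's own cost is negligible and that non-routine tasks still run on the frontier model; both are simplifications.

```python
def monthly_cost_units(tasks: int, routine_pct: int, price_ratio: int):
    """Return (all_hot, tiered) cost in units of one economy-model task."""
    routine = tasks * routine_pct // 100
    novel = tasks - routine
    all_hot = tasks * price_ratio               # everything on the frontier model
    tiered = routine * 1 + novel * price_ratio  # routine work moved to economy
    return all_hot, tiered


for ratio in (20, 40):
    all_hot, tiered = monthly_cost_units(50_000, 90, ratio)
    saving = 1 - tiered / all_hot
    print(f"{ratio}x gap: all-hot={all_hot:,} tiered={tiered:,} saving={saving:.2%}")
```

At a 20x gap, running everything hot costs 1,000,000 units against 145,000 tiered, a reduction of 85.5 percent. At 40x it is 2,000,000 against 245,000, a reduction of 87.75 percent.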

Multiply that across every workflow in the business and model selection stops being a technical detail. It becomes one of the larger line items you control, and one of the few you can cut without cutting anything a customer feels.

The implementation checklist

  1. Inventory your AI workflows and break each into discrete steps.

  2. Tag every step as phase one (design and novel) or phase two (execution and repeat).

  3. Set an economy model as the default for phase-two steps.

  4. Reserve frontier models for design, novel problems, and escalation.

  5. Add a router so easy tasks never touch an expensive model.

  6. Strip out any step where deterministic code does the job and no model is needed.

  7. Measure cost per outcome, not cost per token, and re-run the audit quarterly as prices and capabilities move.
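For step 7, cost per outcome can be computed from ordinary run logs. The sketch below assumes a hypothetical `RunRecord` with just two fields, the spend on a run and whether it produced an acceptable result; how you define "acceptable" is up to your own evals.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class RunRecord:
    cost_usd: float    # total spend for this run, retries included
    succeeded: bool    # did the run produce an acceptable outcome?


def cost_per_outcome(runs: Iterable[RunRecord]) -> float:
    """Total spend divided by successful outcomes."""
    runs = list(runs)
    total = sum(r.cost_usd for r in runs)
    wins = sum(1 for r in runs if r.succeeded)
    # Failed runs still cost money, so they stay in the numerator.
    return total / wins if wins else float("inf")
```

Because failures count toward spend but not toward outcomes, a model that is cheap per token yet fails often can still come out more expensive on this measure, which is why the comparison belongs here and not on the price sheet.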

FAQ

Does using a cheaper model mean worse results?

Not for the right tasks. On defined, repeatable work such as extraction, classification, and templated generation, a well-prompted economy model typically matches frontier output at a fraction of the cost; confirm that against your own evals before you switch. The quality risk shows up only when you push a cheap model at genuinely novel reasoning, which is exactly what the router exists to prevent.

When is a frontier model worth it?

When the problem is new, the reasoning is deep and multi-step, the output is high-stakes, or you are still designing the workflow. Pay for capability while you are figuring out the play. Stop paying for it once the play is defined.

What is the fastest win?

Find your highest-volume AI workflow, confirm whether it is running on a frontier model, and move the routine path to an economy model behind a router. That single change usually pays for the whole audit several times over.

Modelnomics is not about always buying cheap. It is about refusing to overpay for work that does not need the horsepower. Use the expensive model to build the machine. Use the cheap model to run it. That is the whole discipline.

This is the same model-tiering logic behind every AI system I design for clients. If your AI spend is climbing faster than your output, that gap is usually hiding in the wrong tier, and it is fixable. Start a conversation.
