Code Mode for MCP

LLMs have proven their usefulness in generating program code. They have not excelled at architecture or engineering ideas, and that is not their job, but generating structured code is one of their strengths. A reasonable question follows: can we entrust an LLM with decision-making, or do we want binary logic to make those decisions? The question is almost philosophical, because there are areas where a wrong decision has no fatal consequences, for example deciding when to clear the shoulders of rural roads. Whether an LLM schedules this every two weeks, once a month, or once every two months, it will not significantly affect traffic safety or stormwater overflow. But if the LLM pushes back the same schedule on a high-speed motorway, road safety suffers. It is becoming more and more obvious that for some questions in decision-making and planning, the good old if / else / case does the job better and more precisely.

What LLMs Are Good At, and Where They Stop

An LLM can write a book (this is not a joke) using your service’s MCP, but should it decide what to do with a transaction or an order? Obviously not. Moreover, there are specialized models trained precisely to make such decisions, with a human kept at the head of the process. The approach first articulated at Cloudflare is worth studying, and worth studying at different levels of management. What if we let the model do what it does best? Yes, exactly that: write code.

An LLM can write a scenario that is then processed together with the company’s rules (its expert knowledge), and the decision is made on top of that. This is a revolutionary idea for the use of language models. It also matches how business decisions are already made, through linguistic communication: think how many meetings with experts it takes to settle a single question. Whether we like it or not, we use language to do it.

A Claims Example

Consider an insurance company that processes claims. Hundreds, if not thousands, of reimbursement requests come in, and usually they are simply invoices. Can an LLM extract the information? Of course: this is a routine task for a language model paired with OCR. The model then writes down what it sees as a scenario in the Talon language. But the insurance company has its own expert rules, and those rules are applied on top of the LLM’s code.

Imagine a claim arrives as a PDF invoice from a clinic. The LLM reads the document, OCRs the amounts, identifies the policy holder, the diagnosis codes, the providers – and emits a Talon scenario describing what it found. The expert system then takes that scenario, checks it against policy facts, regulatory limits, fraud heuristics, and the insurer’s own rules, and decides whether to pay, partially pay, reject, or escalate to a human.

1. What the LLM generates from the invoice

The LLM never sees the policy, never sees the rules. It only sees the document. Its job is to convert unstructured text into a structured Talon scenario – a set of facts about this claim:

// ── Generated by the LLM from claim_invoice.pdf ─────────
// The LLM acts as an information provider, not a decision-maker.

fact "claim.C-2026-04891" {
  holder_id = "P-882341"
  provider_id = "clinic_zurich_01"
  visit_date = "2026-05-22"
  amount_chf = 620
  service_category = "outpatient"
  diagnosis_codes = ["J06.9", "R50.9"]
  referral_id = "REF-77120"
  document_source = "invoice_2026_04891.pdf"
}

That is the entire LLM contribution: one block of facts. No decisions, no thresholds, no policy logic. If the LLM hallucinates an amount, the worst it can do is produce a wrong fact – which the expert system can still cross-check.
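
The cross-check can start before any rule runs. Below is a minimal sketch, in Python, of a plausibility check on the extracted fact block. Everything in it is an assumption made for illustration: the function name `validate_claim_fact`, the choice of required fields, and the diagnosis-code pattern, which only tests the general shape of a code and does not look it up in any catalogue.

```python
import re
from datetime import date

# Hypothetical schema: field names mirror the Talon fact block above.
REQUIRED_FIELDS = {
    "holder_id": str,
    "provider_id": str,
    "visit_date": str,
    "amount_chf": (int, float),
    "service_category": str,
    "diagnosis_codes": list,
}

# Shape check only (letter, two digits, optional suffix); not a code lookup.
CODE_SHAPE = re.compile(r"^[A-Z][0-9]{2}(\.[0-9A-Z]{1,4})?$")


def validate_claim_fact(fact: dict) -> list[str]:
    """Return a list of problems; an empty list means the fact looks plausible."""
    problems = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in fact:
            problems.append(f"missing field: {name}")
        elif isinstance(fact[name], bool) or not isinstance(fact[name], expected):
            problems.append(f"wrong type for {name}")
    if problems:
        return problems  # value checks below need well-typed fields

    if fact["amount_chf"] <= 0:
        problems.append("amount_chf must be positive")
    try:
        date.fromisoformat(fact["visit_date"])
    except ValueError:
        problems.append("visit_date is not an ISO date")
    for code in fact["diagnosis_codes"]:
        if not isinstance(code, str) or not CODE_SHAPE.match(code):
            problems.append(f"diagnosis code has unexpected shape: {code}")
    return problems


claim = {
    "holder_id": "P-882341",
    "provider_id": "clinic_zurich_01",
    "visit_date": "2026-05-22",
    "amount_chf": 620,
    "service_category": "outpatient",
    "diagnosis_codes": ["J06.9", "R50.9"],
    "referral_id": "REF-77120",
}
print(validate_claim_fact(claim))  # prints []
```

A check like this catches malformed output, not a plausible-looking wrong amount. For that, the extracted value would still have to be compared with the source document or reviewed by a human.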

2. What the company’s expert rules add on top

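The rest of the file is authored by the insurer and reviewed by underwriters, lawyers, and compliance. The rules are written once and do not change between claims; the policy and provider facts come from the insurer’s own records, never from the LLM. This is the deterministic layer that makes the actual decision: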

// ── Authored by the insurance company ───────────────────
// Policy facts, regulatory limits, and decision rules.
// Reviewed by underwriting, legal, and compliance.

fact "policy.P-882341" {
  plan = "outpatient_basic"
  annual_limit_chf = 5000
  annual_used_chf = 1840
  deductible_chf = 500
  deductible_used_chf = 320
  excluded_categories = ["cosmetic", "experimental"]
  network = "tier_1"
}

fact "provider_directory" {
  in_network = ["clinic_zurich_01", "clinic_basel_04"]
  blacklisted = ["clinic_unverified_99"]
}

fact "regulatory_limits.outpatient" {
  max_per_visit_chf = 800
  requires_referral_if_over_chf = 400
}

// ── Conditions ──────────────────────────────────────────

define "provider_in_network" {
  "claim.provider_id" in "provider_directory.in_network"
}

define "provider_blacklisted" {
  "claim.provider_id" in "provider_directory.blacklisted"
}

define "category_excluded" {
  "claim.service_category" in "policy.excluded_categories"
}

define "exceeds_per_visit_cap" {
  "claim.amount_chf" > "regulatory_limits.outpatient.max_per_visit_chf"
}

define "needs_referral" {
  "claim.amount_chf" > "regulatory_limits.outpatient.requires_referral_if_over_chf"
    and "claim.referral_id" == null
}

define "within_annual_limit" {
  "policy.annual_used_chf" + "claim.amount_chf" <= "policy.annual_limit_chf"
}

// ── Decision rules ──────────────────────────────────────

rule "Auto-reject blacklisted provider" {
  when is "provider_blacklisted"
  do set "claim.decision" "rejected"
  do explain "Provider {claim.provider_id} is on the fraud blacklist"
  do notify "team.fraud" "claim.id"
}

rule "Auto-reject excluded category" {
  when is "category_excluded"
  do set "claim.decision" "rejected"
  do explain "Service category {claim.service_category} is not covered by plan {policy.plan}"
}

rule "Escalate when referral missing" {
  when is "needs_referral"
  do require "review.claims_officer"
  do explain "Amount {claim.amount_chf} CHF requires a referral; none provided"
}

rule "Cap payout at per-visit limit" {
  when is "exceeds_per_visit_cap"
  do set "claim.decision" "capped"
  do set "claim.payable_chf" "regulatory_limits.outpatient.max_per_visit_chf"
  do comment "claim" "Payout capped at regulatory per-visit maximum"
}

rule "Approve routine in-network claim" {
  when is "provider_in_network"
    and not is "category_excluded"
    and not is "exceeds_per_visit_cap"
    and not is "needs_referral"
    and is "within_annual_limit"
  do set "claim.decision" "approved"
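  // "policy.remaining_deductible" is not declared in the policy fact above;
  // assumed here to be derived as deductible_chf - deductible_used_chf.
  do set "claim.payable_chf" "claim.amount_chf" - "policy.remaining_deductible"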
  do schedule_payment "claim.payable_chf" to "claim.holder_id"
}

// ── Anomaly detection — escalate to a human ─────────────

rule "Unusual frequency from one holder" {
  when count("claims.by_holder", last_days = 30) > 6
  do require "review.fraud_analyst"
  do explain "Holder {claim.holder_id} submitted {count} claims in 30 days"
}
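
To see what these rules do with the example claim, it helps to run the numbers by hand. The following Python sketch mirrors the six `define` conditions and the `when` clause of the approval rule. It is not a Talon interpreter, and it rests on one assumption: the remaining deductible is computed as `deductible_chf - deductible_used_chf`, since the policy fact does not declare that field. The function name `evaluate_claim` is hypothetical.

```python
# Facts copied from the Talon blocks above.
claim = {
    "provider_id": "clinic_zurich_01",
    "amount_chf": 620,
    "service_category": "outpatient",
    "referral_id": "REF-77120",
}
policy = {
    "annual_limit_chf": 5000,
    "annual_used_chf": 1840,
    "deductible_chf": 500,
    "deductible_used_chf": 320,
    "excluded_categories": ["cosmetic", "experimental"],
}
providers = {
    "in_network": ["clinic_zurich_01", "clinic_basel_04"],
    "blacklisted": ["clinic_unverified_99"],
}
limits = {"max_per_visit_chf": 800, "requires_referral_if_over_chf": 400}


def evaluate_claim(claim, policy, providers, limits):
    """Evaluate the named conditions and the approval rule for one claim."""
    amount = claim["amount_chf"]
    conditions = {
        "provider_in_network": claim["provider_id"] in providers["in_network"],
        "provider_blacklisted": claim["provider_id"] in providers["blacklisted"],
        "category_excluded": claim["service_category"] in policy["excluded_categories"],
        "exceeds_per_visit_cap": amount > limits["max_per_visit_chf"],
        "needs_referral": amount > limits["requires_referral_if_over_chf"]
        and claim.get("referral_id") is None,
        "within_annual_limit": policy["annual_used_chf"] + amount
        <= policy["annual_limit_chf"],
    }
    # Same conjunction as the "Approve routine in-network claim" rule.
    approved = (
        conditions["provider_in_network"]
        and not conditions["category_excluded"]
        and not conditions["exceeds_per_visit_cap"]
        and not conditions["needs_referral"]
        and conditions["within_annual_limit"]
    )
    # Assumption: remaining deductible = deductible_chf - deductible_used_chf.
    remaining_deductible = policy["deductible_chf"] - policy["deductible_used_chf"]
    payable = amount - remaining_deductible if approved else None
    return {"conditions": conditions, "approved": approved, "payable_chf": payable}


result = evaluate_claim(claim, policy, providers, limits)
print(result["approved"], result["payable_chf"])  # prints: True 440
```

For this claim the 620 CHF invoice is above the 400 CHF referral threshold, but a referral is present, so no escalation is needed; it is below the 800 CHF per-visit cap; and 1840 + 620 = 2460 CHF stays within the 5000 CHF annual limit. With 180 CHF of deductible remaining, 440 CHF is payable.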

3. The LLM closes the loop – but only with permission

Once the expert system has made the decision, the LLM is allowed back in for one more task: drafting the letter to the policy holder in human language. The expert system verifies the draft against the recorded facts before anything is sent:

// ── LLM drafts, expert system verifies, then sends ──────

rule "Draft explanation letter, verify, then send" {
  when "claim.decision" in ["approved", "rejected", "capped"]
  do llm_draft "letter" from "claim.decision_facts" tone "formal"
  do verify "letter" against "claim.decision_facts"
  do send "letter" to "claim.holder_id" only_if "verify.passed" == true
}
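
What `verify` does internally is not specified here. As a rough illustration, a deliberately naive check could look like the Python sketch below. It is an assumption throughout: the function `verify_letter`, the contents of the decision facts, and the convention that amounts appear as a number followed by "CHF". It only confirms that every quoted amount exists somewhere in the recorded facts and that the letter names the recorded decision and no other.

```python
import re

DECISIONS = {"approved", "rejected", "capped"}
AMOUNT = re.compile(r"(\d+(?:\.\d+)?)\s*CHF")


def verify_letter(letter: str, decision_facts: dict) -> bool:
    """Naive check: amounts and decision wording must match the recorded facts."""
    recorded_numbers = {
        float(value)
        for value in decision_facts.values()
        if isinstance(value, (int, float)) and not isinstance(value, bool)
    }
    # Every "<number> CHF" in the draft must equal some recorded numeric fact.
    for quoted in AMOUNT.findall(letter):
        if float(quoted) not in recorded_numbers:
            return False

    # The letter must state the recorded decision and no contradicting one.
    text = letter.lower()
    decision = decision_facts["decision"]
    if decision not in text:
        return False
    return not any(other in text for other in DECISIONS - {decision})


# Hypothetical decision facts for the example claim.
facts = {"decision": "approved", "amount_chf": 620, "payable_chf": 440}
draft = "Your claim has been approved. Of the 620 CHF invoiced, 440 CHF will be paid."
print(verify_letter(draft, facts))  # prints True
```

This check fails closed: a draft with an amount that is not in the facts is simply not sent. It cannot tell whether a correct number is attached to the wrong statement, so a production verifier would need to be considerably stricter.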

Look at the division of labor. The LLM does what an LLM is genuinely good at: reading a messy invoice in any language, pulling out amounts, dates, provider IDs, diagnosis codes; and later, drafting a clear letter to the policy holder. The expert system does what it is good at: checking the extracted facts against policy limits, provider directories, regulatory caps, and fraud heuristics; deciding to approve, reject, cap, or escalate; and refusing to send any letter whose claims do not match the recorded facts.

This is Code Mode applied to a business decision. The LLM is not deciding whether the claim pays out. The LLM is generating a structured scenario from unstructured input. The decision is made by deterministic rules that an auditor, a regulator, or a claims supervisor can read and sign off on.

How Does the LLM Know Talon?

A reasonable worry: if the LLM has to emit Talon, does that mean every request is bloated by a giant grammar prompt? No. Talon is intentionally small. The whole grammar – enough for the LLM to write valid scenarios – fits in a system prompt of a few dozen lines:

You write scenarios in Talon. Talon has three top-level forms:

  fact "<name>" { key = value, ... }       // declare a fact
  define "<name>" { <condition expr> }     // name a condition
  rule "<name>" { when <cond> do <act>... } // fire actions when a condition holds

Values: string "x", number 42, boolean true/false, list [a, b], null.
Refs:   "namespace.field" reads from a fact.

Conditions:
  is "<name>"              not is "<name>"
  <ref> == <value>         <ref> != <value>
  <ref> < <value>          <ref> > <value>   (also <= >=)
  <ref> in <list>          <ref> not in <list>
  <ref> exists             <ref> not_empty
  <cond> and <cond>        <cond> or <cond>

Actions inside `do`:
  set <ref> <value>
  notify <channel> <message>
  explain "<text with {ref} interpolation>"
  require <reviewer>
  block <action>
  llm_extract <ref> into <ref>
  llm_draft <ref> from <ref> tone "<tone>"
  verify <ref> against <ref>
  send <ref> to <ref> only_if <ref> == <value>

Your job: read the input document and emit ONLY `fact` blocks.
Do not write `rule` or `define` -- those are authored by the company.

That is the entire prompt, about thirty lines. The model is now able to read an invoice, a service note, a contract, or a support ticket and emit a Talon fact block that the company’s existing rules can immediately reason over. No fine-tuning, no special model, no megabyte-long system prompt – just a tiny grammar that the LLM treats like any other structured output format.
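
An instruction in a prompt is a request, not a guarantee, so it is reasonable to enforce the "only `fact` blocks" constraint in code as well. The Python sketch below is one way to do that without a full parser, and it is an assumption, not part of Talon: the function names are hypothetical, and it presumes that Talon strings use double quotes with backslash escapes and that comments start with `//`.

```python
import re

# Matches a string literal or a line comment, whichever starts first.
_STRING_OR_COMMENT = re.compile(r'"(?:[^"\\]|\\.)*"|//[^\n]*')
# After blanking, a top-level form looks like: keyword followed by "".
_FORM = re.compile(r'\b(fact|define|rule)\s*""')


def top_level_forms(talon_source: str) -> list[str]:
    """List the form keywords found outside of strings and comments."""

    def blank(match: re.Match) -> str:
        # Keep an empty string literal as a marker, drop comments entirely.
        return '""' if match.group(0).startswith('"') else ""

    stripped = _STRING_OR_COMMENT.sub(blank, talon_source)
    return _FORM.findall(stripped)


def only_fact_blocks(talon_source: str) -> bool:
    """True if the source declares at least one form and all of them are facts."""
    forms = top_level_forms(talon_source)
    return bool(forms) and all(form == "fact" for form in forms)


llm_output = '''
// Generated from claim_invoice.pdf
fact "claim.C-2026-04891" {
  holder_id = "P-882341"
  amount_chf = 620
}
'''
print(only_fact_blocks(llm_output))  # prints True
```

Output that fails this check is rejected before it reaches the expert system, so a model that tries to write its own `rule` never gets to influence a decision.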

This is the second half of why Code Mode for MCP works. The language is small enough to teach in a prompt, and the LLM is constrained to the one form it is good at: producing facts. Everything else – the rules, the policies, the decisions – lives outside the prompt entirely, in the company’s expert system.

Why This Is the Future

This, I am convinced, is the future of low-risk LLM use for business decisions. Let the model write the script. Let the expert system run it. The LLM’s strength – generating structured language – becomes the bridge between unstructured human input and deterministic rules. The risk of “the model hallucinated a decision” disappears, because the model is no longer the one deciding. What remains is the risk of a wrong fact, which the expert system can cross-check. The model is preparing the case file.

For anything that touches money, safety, or compliance, this is the architecture that scales. It is also the architecture that survives an audit – because every decision can be traced to the rule that fired and the facts that triggered it. That, paired with the EITL principle, is what makes the combination of LLMs and expert systems genuinely usable in the enterprise.
