RFC-012: Untranslatables
Status: Proposed
Date: 2026-04-01
Authors: Eelco Hotting
Context
The engine's operation set is deliberately small — arithmetic, comparison, logical, conditional, and a handful of date operations. It only grows when real legal texts demand a new construct. This RFC is about streamlining what happens at that moment of discovery.
Dutch law regularly uses constructs that fall outside the current operation set. The law-generate process (LLM-driven translation from legal Dutch to machine-readable YAML) has no structured way to handle this. Without an explicit "I can't express this" path, the translator will approximate: flattening tables into fragile IF/cases trees, pre-computing dynamic values, silently omitting inexpressible conditions, or inventing creative encodings that pass validation but misrepresent the law.
We call these constructs untranslatables: the law is clear about what it means, but the engine's formal language cannot yet express it. The term is borrowed from translation theory — the law-generate process is translation, and some things don't cross the boundary between natural language and the engine's operation set.
Decision
The untranslatables mechanism has three layers: generation-time detection, schema annotation, and engine runtime behavior.
1. Generation-time detection
The law-generate skill recognizes when a legal construct cannot be faithfully expressed with the available operations. When this happens, the translator:
- Adds an untranslatables entry to the article's machine_readable section (see layer 2)
- Skips the untranslatable parts, with no approximation
- Generates execution logic for the parts that are translatable
- Reports all untranslatables in the generation summary
The law-reverse-validate skill detects likely workarounds that indicate the translator improvised instead of flagging:
- IF with >8 cases (possible inlined bracket table)
- Arithmetic chains that approximate rounding
- Hardcoded values that look like pre-computed aggregations
These are flagged as "possible untranslatable workaround — verify with human."
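The first heuristic can be sketched as a recursive walk over the parsed execution tree. This is a minimal illustration, not the skill's actual logic: the node shape (a dict with an "operation" key and a "cases" list) and the function name find_large_ifs are assumptions made for the example.

```python
from typing import Any, Iterator, Tuple

MAX_IF_CASES = 8  # threshold from the text: more than 8 cases is suspicious


def find_large_ifs(node: Any, path: str = "execution") -> Iterator[Tuple[str, int]]:
    """Yield (path, case_count) for every IF with more than MAX_IF_CASES cases."""
    if isinstance(node, dict):
        # Assumed node shape: {"operation": "IF", "cases": [...]}
        if node.get("operation") == "IF":
            case_count = len(node.get("cases", []))
            if case_count > MAX_IF_CASES:
                yield path, case_count
        for key, child in node.items():
            yield from find_large_ifs(child, f"{path}.{key}")
    elif isinstance(node, list):
        for index, child in enumerate(node):
            yield from find_large_ifs(child, f"{path}[{index}]")


# A 10-case IF that looks like an inlined bracket table, next to a small IF.
bracket_table = {
    "operation": "IF",
    "cases": [{"when": f"income < {n * 1000}", "then": n} for n in range(1, 11)],
}
small_if = {"operation": "IF", "cases": [{"when": "x", "then": 1}]}

findings = list(find_large_ifs({"actions": [bracket_table, small_if]}))
```

Each finding is only a candidate for human review. As noted under Tradeoffs, a large IF can be legitimate.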
2. Schema annotation
An optional untranslatables field on the machine_readable section. Each entry has two required fields (construct, reason) and three optional fields (suggestion, legal_text_excerpt, accepted):
    machine_readable:
      untranslatables:
        - construct: "afronden op hele euro's"
          reason: "Rounding is not available as an engine operation"
          suggestion: "Add ROUND/CEIL/FLOOR operation to engine"
          legal_text_excerpt: "Het bedrag wordt naar boven afgerond op hele euro's"
          accepted: false
      execution:
        # ... execution logic for the parts that ARE translatable

Untranslatables are structured (queryable by tooling), persistent (survive across generate/validate cycles), visible (surfaced in admin dashboard and pipeline), and co-located (next to the article they apply to).
The accepted field (default false) indicates whether a human has reviewed and acknowledged the gap. This enables per-article scoping of runtime behavior (see layer 3).
Articles with untranslatables may still have partial execution logic for the parts that are expressible.
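As an illustration of how tooling might consume these entries, the sketch below models one entry and checks the required and optional fields listed above. The class name UntranslatableEntry and the function parse_entry are hypothetical. The real schema definition may differ, for example in how it treats unknown fields.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional

REQUIRED_FIELDS = {"construct", "reason"}
OPTIONAL_FIELDS = {"suggestion", "legal_text_excerpt", "accepted"}


@dataclass(frozen=True)
class UntranslatableEntry:
    construct: str
    reason: str
    suggestion: Optional[str] = None
    legal_text_excerpt: Optional[str] = None
    accepted: bool = False  # default per the RFC: not yet reviewed by a human


def parse_entry(raw: Dict[str, Any]) -> UntranslatableEntry:
    """Build an entry from a parsed YAML mapping, rejecting malformed input."""
    keys = set(raw)
    missing = REQUIRED_FIELDS - keys
    if missing:
        raise ValueError(f"missing required field(s): {sorted(missing)}")
    # Assumption: unknown fields are an error rather than silently dropped.
    unknown = keys - REQUIRED_FIELDS - OPTIONAL_FIELDS
    if unknown:
        raise ValueError(f"unknown field(s): {sorted(unknown)}")
    return UntranslatableEntry(**raw)


entry = parse_entry(
    {
        "construct": "afronden op hele euro's",
        "reason": "Rounding is not available as an engine operation",
    }
)
```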
3. Engine runtime behavior
When the engine encounters articles with untranslatables, behavior is controlled by the --untranslatable flag. The engine always has partial execution logic (generation already excluded the untranslatable parts), so the modes control how the engine treats that known-incomplete result:
| Mode | Behavior | Use case |
|---|---|---|
| error (default) | Hard error on any unaccepted untranslatable. Accepted untranslatables execute their partial logic with a trace entry. | CI, production |
| propagate | Execute partial logic. Outputs from articles with untranslatables carry an UNTRANSLATABLE taint that propagates through downstream operations. | Audit, analysis |
| warn | Execute partial logic, log warning in trace. No taint propagation: outputs look normal but the trace shows they're incomplete. | Development, exploration |
| ignore | Execute partial logic silently. Only valid for entries with accepted: true; unaccepted untranslatables still error. | Human-verified acceptable gaps |
Default is fail-fast. Tolerating gaps requires explicit opt-in.
The trace records untranslatables regardless of mode. The flag controls what the engine does, not whether it notices.
In propagate mode, UNTRANSLATABLE behaves like NaN in floating point: any operation involving an untranslatable input produces an untranslatable output. The result shows exactly which outputs are tainted and which are trustworthy. The trace captures the origin point.
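The NaN-like behavior can be sketched with a single binary operation. This is a simplified model under stated assumptions: the Untranslatable class, its origin field, and the add function are illustrative names, and the engine's real Value type is not shown in this RFC.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Untranslatable:
    origin: str  # hypothetical: identifies the article where the gap originated


Value = Union[int, float, Untranslatable]


def add(left: Value, right: Value) -> Value:
    """Add two values; a tainted operand makes the result tainted."""
    for operand in (left, right):
        if isinstance(operand, Untranslatable):
            # Assumption: the first tainted operand's origin is preserved.
            return operand
    return left + right


base_amount = 1200
supplement = Untranslatable(origin="article 3")

tainted_total = add(base_amount, supplement)  # tainted, origin "article 3"
clean_total = add(base_amount, 300)           # ordinary arithmetic
```

Every other operation would need the same check, which is the engine complexity listed under Tradeoffs.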
The accepted field provides per-article scoping. In error mode (default), accepted untranslatables are allowed to execute their partial logic — a human has verified the gap is tolerable. Unaccepted untranslatables always error in error and ignore modes. This prevents a global bypass: teams cannot set --untranslatable=ignore and accidentally suppress newly-discovered gaps.
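The interaction between the flag and the accepted field can be written as a small decision function. The enum and function names below are hypothetical. The sketch also assumes that accepted makes no difference in propagate and warn modes, which the mode table implies but does not state outright.

```python
from enum import Enum


class Mode(Enum):
    ERROR = "error"
    PROPAGATE = "propagate"
    WARN = "warn"
    IGNORE = "ignore"


class Action(Enum):
    FAIL = "fail"                      # hard error, no result
    RUN_WITH_TRACE = "run_with_trace"  # partial logic plus a trace entry
    RUN_TAINTED = "run_tainted"        # partial logic, outputs carry the taint
    RUN_WITH_WARNING = "run_with_warning"
    RUN_SILENT = "run_silent"


def resolve(mode: Mode, accepted: bool) -> Action:
    """Decide how to treat one untranslatable entry under the given mode."""
    if mode is Mode.ERROR:
        return Action.RUN_WITH_TRACE if accepted else Action.FAIL
    if mode is Mode.IGNORE:
        # No global bypass: unaccepted gaps still fail in ignore mode.
        return Action.RUN_SILENT if accepted else Action.FAIL
    if mode is Mode.PROPAGATE:
        return Action.RUN_TAINTED
    return Action.RUN_WITH_WARNING  # Mode.WARN
```

A newly discovered gap has accepted: false by default, so it fails under both error and ignore until a human reviews it.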
Why
Benefits
- Prevents silent divergence between law text and machine-readable interpretation
- Fail-fast default: gaps are never silently ignored in CI or production
- Per-article acceptance prevents global bypass — new untranslatables always surface
- Propagation mode shows which outputs are affected without stopping execution
- Drives the engine roadmap: the corpus tells us which operations to add next
- A flagged gap is better than a plausible-looking wrong answer
- Each layer is independently useful — layer 1 requires no code changes
Tradeoffs
- More articles will be incomplete in the short term, because the translator can no longer paper over gaps
- Schema change for layer 2: all tooling must handle untranslatables gracefully
- Heuristic detection has false positives: a legitimate 10-case IF is not necessarily a workaround
- LLM compliance is imperfect — even with explicit instructions, models may still improvise; the reverse-validate heuristics are a safety net, not a guarantee
- Propagation adds engine complexity — a new value type handled in every operation
Alternatives Considered
Alternative 1: Expand the operation set preemptively
- Add operations before they're needed.
- Rejected: violates the principle of minimal, auditable operations. We add operations when concrete legal texts demand them.
Alternative 2: YAML comments only
- Flag issues as # UNTRANSLATABLE: ... comments, no schema field.
- Rejected as sole mechanism: comments are invisible to tooling, can't be queried, and are easily lost during edits.
Alternative 3: Fail the entire law
- Abort generation for the whole file if one article is untranslatable.
- Rejected: most laws have a mix. Partial coverage with clear annotations is more useful than no coverage.
Alternative 4: Propagate by default
- Default to propagation, require --untranslatable=error for strictness.
- Rejected: silent partial results in production are dangerous. The safe default is fail-fast.
Alternative 5: Global ignore without per-article scoping
- A single --untranslatable=ignore flag that bypasses all untranslatables.
- Rejected: breaks the trust model. A team verifying one gap and setting the flag would unknowingly suppress all future gaps. The accepted field provides the needed granularity.
Implementation Notes
Layer 1 requires editing the law-generate and law-reverse-validate skill definitions. No code changes.
Layer 2 requires adding untranslatables to the schema, and updating corpus, admin, and pipeline to parse, display, and track them.
Layer 3 requires adding an Untranslatable variant to the engine's Value type, propagation logic in every operation, CLI flag parsing, trace entries, and result distinction between clean and tainted outputs.
Ordering is layer 1 → 2 → 3. Each layer is independently useful.
References
- RFC-003: Inversion of Control — IoC pattern for cross-law references
- RFC-007: Reactive Execution — hooks and overrides mechanism
- Glossary of Dutch Legal Terms