RFC-014: Engine Conformance Test Suite
Status: Draft
Date: 2026-04-02
Authors: Eelco Hotting
Context
RFC-013 establishes that the schema is a specification and the engine is an implementation. Independent versioning allows third-party organisations to build their own engines. Without a shared behavioral contract, though, two engines claiming to support schema v0.5.1 might produce different outputs for identical inputs, with no way to determine which is correct.
Today, the regelrecht engine's correctness is verified through BDD tests (features/*.feature) using cucumber-rs. These are Rust-specific. A third-party Java or Python engine cannot run them. The tests verify integration behavior (loading regulations from the corpus, resolving cross-law references), not isolated schema-level operations. There is no artifact that says "if your engine supports schema v0.5.1, these are the exact inputs and outputs it must produce."
Three things depend on this:
Multi-org execution (RFC-009): when Org A accepts a value from Org B's engine, both engines must agree on what the law means. If they disagree, one has a bug. The conformance suite defines "correct."
Reproducibility (RFC-013): an Execution Receipt records which engine version produced a result. If a different engine (or engine version) re-executes the same regulation with the same inputs, it must produce the same outputs. The conformance suite makes this testable.
Schema as execution specification: without conformance tests, the schema only validates structure (does the YAML have the right fields?). It does not specify computation (what should the engine produce?). The conformance suite encodes the intended semantics of each operation.
Current state of testing
The engine has three layers of tests:
- Unit tests (`packages/engine/src/*.rs`, `#[cfg(test)]`): individual functions and operations in isolation. Rust-specific.
- BDD tests (`features/*.feature` + `packages/engine/tests/bdd/`): end-to-end scenarios using real regulation YAML from the corpus. Rust-specific (cucumber-rs). Cover realistic scenarios (zorgtoeslag, bijstand, erfgrensbeplanting) but do not test each operation in isolation.
- Mutation tests (`.github/workflows/mutation-testing.yml`): cargo-mutants to find untested code paths. Rust-specific.
None of these can be used by a third-party engine implementation.
Decision
1. Conformance test suite
A conformance test suite defines the expected behavior for each schema version. The suite is a collection of JSON files with no dependency on a specific runtime or test framework. Any engine that can read JSON and YAML can run the tests.
The suite lives under conformance/ in the repository:
conformance/
  v0.5.0/
    manifest.json
    arithmetic.json
    comparison.json
    logical.json
    conditionals.json
    collection.json
    variable_resolution.json
    cross_law.json
    date_operations.json
  v0.5.1/
    manifest.json
    arithmetic.json
    ...
    untranslatables.json

Each schema version has its own directory. A new schema version inherits all tests from the previous version (they must still pass) and adds tests for new features.
2. Test format
Each test file contains an array of test groups. Each group has a description, a regulation YAML (inline or referenced), and an array of test cases:
{
"description": "MULTIPLY operation",
"schema_version": "v0.5.0",
"tests": [
{
"description": "multiply two positive integers",
"regulation": {
"yaml": "$schema: https://...\n$id: test_multiply\n..."
},
"article": "1",
"parameters": {
"value": 21
},
"calculation_date": "2025-01-01",
"expected": {
"outputs": {
"result": 42
}
}
},
{
"description": "multiply by zero",
"regulation": {
"yaml": "$schema: https://...\n$id: test_multiply\n..."
},
"article": "1",
"parameters": {
"value": 0
},
"calculation_date": "2025-01-01",
"expected": {
"outputs": {
"result": 0
}
}
}
]
}

For tests that require multiple regulations (cross-law resolution, IoC):
{
"description": "cross-law source reference",
"schema_version": "v0.5.0",
"tests": [
{
"description": "resolve output from another regulation",
"regulations": [
{"yaml": "...primary law..."},
{"yaml": "...referenced law..."}
],
"article": "1",
"regulation_id": "primary_law",
"parameters": {
"input_value": 100
},
"calculation_date": "2025-01-01",
"expected": {
"outputs": {
"result": 200
}
}
}
]
}

The expected block can assert error conditions:
{
"expected": {
"error": "unsupported_schema_version"
}
}

Or trace properties (for engines that support tracing):
{
"expected": {
"outputs": {"result": 42},
"trace_contains": {
"node_type": "Operation",
"operation": "MULTIPLY"
}
}
}

Trace assertions are optional. An engine that does not produce traces can skip them. Output assertions are mandatory.
3. Conformance levels
Tests are tagged with a conformance level. An engine declares which levels it supports per schema version.
| Level | Scope | Operations covered |
|---|---|---|
| Core | Single-article evaluation with basic operations | ADD, SUBTRACT, MULTIPLY, DIVIDE, EQUALS, GREATER_THAN, LESS_THAN, GREATER_THAN_OR_EQUAL, LESS_THAN_OR_EQUAL, AND, OR, NOT, IF, IN, LIST, MAX, MIN |
| Cross-law | Multi-regulation evaluation with source references | All Core operations + source.regulation resolution, parameter passing |
| IoC | Delegated legislation via open terms | All Cross-law operations + open_terms, implements (RFC-003) |
| Temporal | Date operations and multi-version law selection | All Core operations + AGE, DATE_ADD, DATE, DAY_OF_WEEK, valid_from filtering, reference_date |
| Advanced | Full engine features | All above + hooks (RFC-007), overrides, procedures (RFC-008), untranslatables (RFC-012), data sources |
Each level includes all operations from previous levels. A Core-level engine can execute simple single-law computations. An Advanced-level engine can participate in the full multi-org execution model (RFC-009).
Note: the engine's ADD operation handles numeric addition, array concatenation, and string concatenation. String concatenation tests are in arithmetic.json alongside numeric ADD tests. Rounding operations (ROUND, CEIL, FLOOR) are not in the operation set yet (RFC-012 lists rounding as an untranslatable). When added, they join the Core level and get their own conformance tests.
The manifest file declares the level structure. Each level lists its test files and which operations it covers:
{
"schema_version": "v0.5.0",
"levels": {
"core": {
"test_files": ["arithmetic.json", "comparison.json", "logical.json",
"conditionals.json", "collection.json", "variable_resolution.json"],
"operations": ["ADD", "SUBTRACT", "MULTIPLY", "DIVIDE", "EQUALS",
"GREATER_THAN", "LESS_THAN", "GREATER_THAN_OR_EQUAL",
"LESS_THAN_OR_EQUAL", "AND", "OR", "NOT", "IF",
"IN", "LIST", "MAX", "MIN"]
},
"cross_law": {
"test_files": ["cross_law.json"],
"operations": []
},
"temporal": {
"test_files": ["date_operations.json", "temporal_resolution.json"],
"operations": ["AGE", "DATE_ADD", "DATE", "DAY_OF_WEEK"]
}
}
}

Levels without new operations (cross_law, ioc, advanced) have an empty operations array — they test resolution patterns and features, not additional operation types.
4. Coverage invariant
The engine defines Operation::SCHEMA_OPERATIONS — a const array of all operations that are part of the schema specification (excluding backward-compat aliases like NOT_EQUALS). An integration test (conformance_coverage.rs) reads the manifest and verifies:
- Every entry in `SCHEMA_OPERATIONS` appears in at least one conformance level
- Every operation in the manifest exists in `SCHEMA_OPERATIONS` (no phantom operations)
- No operation appears in more than one level
A separate unit test in types.rs verifies that SCHEMA_OPERATIONS + COMPAT_ALIASES account for every variant in the Operation enum. Adding a new operation variant without classifying it fails this test.
The chain of invariants: add an Operation variant → compiler forces a name() match arm → unit test forces adding it to SCHEMA_OPERATIONS or COMPAT_ALIASES → integration test forces adding schema operations to a conformance level in the manifest.
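The manifest side of these invariants can be checked by any implementation, not just the Rust engine. A Python sketch of the three checks (here `schema_operations` stands in for the engine's `SCHEMA_OPERATIONS` const; this is an illustration of the logic, not the actual conformance_coverage.rs):

```python
def check_coverage(manifest, schema_operations):
    """Check the three manifest invariants: every schema operation is
    covered by some level, no phantom operations exist, and no
    operation is assigned to more than one level."""
    assigned = []
    for spec in manifest["levels"].values():
        assigned.extend(spec["operations"])
    errors = [f"{op} not covered by any level"
              for op in schema_operations if op not in assigned]
    errors += [f"phantom operation {op} in manifest"
               for op in assigned if op not in schema_operations]
    errors += [f"{op} assigned to more than one level"
               for op in sorted(set(assigned)) if assigned.count(op) > 1]
    return errors
```

An empty result means the manifest and the operation set agree; each violation produces one diagnostic string.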
5. Provenance conformance (RFC-013 integration)
A dedicated provenance.json verifies that the engine produces correct Execution Receipts (RFC-013):
- Output includes an `engine_version` field
- Output includes a `schema_version` matching the regulation's `$schema`
- Output includes a `regulation_hash` as SHA-256 of the YAML content
- When executing with accepted values (replay mode), outputs match the original execution
Provenance tests are required at all conformance levels. Even a Core-level engine must stamp its outputs with provenance metadata.
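The receipt checks above need nothing beyond a hash function, which is why they are feasible at every level. A Python sketch (field names follow the summary above; `verify_receipt` itself is an illustrative helper, not a prescribed API):

```python
import hashlib

def verify_receipt(receipt, regulation_yaml, schema_version):
    """Check the provenance fields of an Execution Receipt:
    engine_version present, schema_version matching the regulation's
    $schema, and regulation_hash equal to SHA-256 of the YAML text."""
    expected_hash = hashlib.sha256(regulation_yaml.encode("utf-8")).hexdigest()
    return (
        bool(receipt.get("engine_version"))
        and receipt.get("schema_version") == schema_version
        and receipt.get("regulation_hash") == expected_hash
    )
```

The replay-mode check is not shown: it is an ordinary output comparison between the original and the re-executed run.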
6. Test derivation from BDD scenarios
The initial conformance test suite is derived from the existing BDD features. The process:
- For each Gherkin scenario in `features/*.feature`, extract the regulation YAML, input parameters, and expected outputs
- Decompose integration scenarios into isolated operation-level tests where possible
- Add edge cases not covered by the BDD scenarios (null handling, empty arrays, boundary values, negative numbers)
- Validate by running the conformance suite against the regelrecht Rust engine
The BDD features remain as integration tests for the regelrecht engine. The conformance suite is the portable subset.
7. CI integration
The conformance suite is validated in CI:
- On every PR: run the regelrecht engine against the full conformance suite for all supported schema versions
- On schema changes: verify that new schema versions include conformance tests and that existing tests still pass
- On engine changes: verify that no conformance test regresses
A new CI job conformance runs alongside the existing test job. It uses the evaluate binary to execute each test case and compares outputs.
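A sketch of what that job could look like as a GitHub Actions fragment; the step layout, build command, and wrapper script path are assumptions for illustration, not the actual workflow:

```yaml
conformance:
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
    - name: Build the evaluate binary
      run: cargo build --release
    - name: Run the conformance suite for every schema version
      # hypothetical wrapper that feeds each test case to `evaluate`
      # and diffs actual outputs against expected outputs
      run: |
        for version in conformance/v*/; do
          ./scripts/run_conformance.sh "$version"
        done
```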
Why
Benefits
- Any organisation can build an engine and prove it behaves correctly by running the conformance suite. No Rust toolchain required.
- The conformance tests define what each operation does. The schema specifies structure; the conformance suite specifies computation. Together they form the complete specification.
- If an engine change causes a conformance test to fail, it is a behavioral change that must be intentional and versioned (per RFC-013).
- Third-party engines can start at Core level and work up. They do not need the full operation set to be useful for simple law evaluation.
- The test format is plain JSON. A test runner is around 50 lines in any language.
Tradeoffs
- The test suite must be maintained alongside the schema. Every new operation or semantic change requires new or updated tests.
- The suite can only test documented behavior. Undocumented edge cases (how does DIVIDE handle division by zero? what happens when AGE is computed for a future date?) must be decided and encoded as tests. Writing conformance tests for these forces a decision that was previously deferred.
- Covering all operations, types, and edge cases produces many tests. The suite will grow with each schema version.
- Passing the suite does not guarantee identical behavior for all possible inputs, only for the tested cases. Mutation testing and property-based testing (Rust-specific) complement it for the regelrecht engine.
Alternatives Considered
Alternative 1: Shared BDD features with multi-language runners
- Write Gherkin features and provide step definitions for multiple languages (Rust, Java, Python).
- Rejected because maintaining step definitions in multiple languages is ongoing cost that scales with the number of implementations. The JSON format means each implementer writes a thin test runner once.
Alternative 2: WASM-based conformance (compile the test runner to WASM)
- Provide a WASM module that runs conformance tests. Any platform that supports WASM can run them.
- Rejected because it adds a WASM dependency for testing. JSON files are simpler, easier to inspect, and more widely supported.
Alternative 3: Conformance via the inter-engine protocol
- A conformance service that sends test requests to an engine's HTTP endpoint and validates responses.
- Not rejected, but complementary. The JSON suite tests core logic without requiring HTTP infrastructure. A protocol-level conformance service could be built on top, testing the HTTP interface against the same test cases.
Implementation Notes
Phase 1: Test format and Core level
- Define the JSON test format (as described above)
- Write Core-level tests covering all arithmetic, comparison, logical, and conditional operations
- Add a Rust-based test runner that executes the conformance suite against the regelrecht engine
- Add `conformance` job to CI
Phase 2: Cross-law and IoC levels
- Extract cross-law tests from the zorgtoeslag and bijstand BDD scenarios
- Write IoC tests covering `open_terms` and `implements` resolution
- Add temporal tests for date operations and `valid_from` filtering
Phase 3: Advanced level and provenance
- Add tests for hooks, overrides, untranslatables, procedures
- Add `provenance.json` testing Execution Receipt fields (RFC-013)
- Publish the suite as a standalone downloadable artifact (GitHub Release or npm package)
Phase 4: Documentation and third-party guide
- Write a guide for third-party engine implementers: how to run the suite, how to declare conformance level, how to report results
- Add conformance badge specification (e.g., "regelrecht conformant: Core v0.5.1")
Affected components
| File | Change |
|---|---|
| `conformance/` | New: test suite |
| `conformance/v0.5.0/manifest.json` | Test manifest for schema v0.5.0 |
| `packages/engine/src/types.rs` | `SCHEMA_OPERATIONS` and `COMPAT_ALIASES` consts, exhaustiveness test |
| `packages/engine/tests/conformance_coverage.rs` | New: coverage invariant tests (manifest vs engine operations) |
| `packages/engine/tests/conformance.rs` | New: Rust test runner for the conformance suite |
| `.github/workflows/ci.yml` | New `conformance` job |
References
- RFC-003: Inversion of Control — `open_terms` and `implements` (IoC conformance level)
- RFC-007: Reactive Execution — hooks and overrides (Advanced conformance level)
- RFC-008: Procedures — AWB procedure lifecycle (Advanced conformance level)
- RFC-012: Untranslatables — handling constructs beyond engine expressiveness (Advanced conformance level)
- RFC-013: Execution Provenance — Execution Receipt format (provenance conformance tests)
- JSON Schema Test Suite — model for language-agnostic conformance testing
- W3C Web Platform Tests — model for multi-implementation conformance testing at scale
- Glossary of Dutch Legal Terms