RFC-014: Engine Conformance Test Suite
Status: Draft
Date: 2026-04-02
Authors: Eelco Hotting
Context
RFC-013 establishes that the schema is a specification and the engine is an implementation. Independent versioning allows third-party organisations to build their own engines. Without a shared behavioral contract, though, two engines claiming to support schema v0.5.1 might produce different outputs for identical inputs, with no way to determine which is correct.
Today, the regelrecht engine's correctness is verified through BDD tests (features/*.feature) using cucumber-rs. These are Rust-specific. A third-party Java or Python engine cannot run them. The tests verify integration behavior (loading regulations from the corpus, resolving cross-law references), not isolated schema-level operations. There is no artifact that says "if your engine supports schema v0.5.1, these are the exact inputs and outputs it must produce."
Three things depend on this:
Multi-org execution (RFC-009): when Org A accepts a value from Org B's engine, both engines must agree on what the law means. If they disagree, one has a bug. The conformance suite defines "correct."
Reproducibility (RFC-013): an Execution Receipt records which engine version produced a result. If a different engine (or engine version) re-executes the same regulation with the same inputs, it must produce the same outputs. The conformance suite makes this testable.
Schema as execution specification: without conformance tests, the schema only validates structure (does the YAML have the right fields?). It does not specify computation (what should the engine produce?). The conformance suite encodes the intended semantics of each operation.
Current state of testing
The engine has three layers of tests:
- Unit tests (`packages/engine/src/*.rs`, `#[cfg(test)]`): individual functions and operations in isolation. Rust-specific.
- BDD tests (`features/*.feature` + `packages/engine/tests/bdd/`): end-to-end scenarios using real regulation YAML from the corpus. Rust-specific (cucumber-rs). Cover realistic scenarios (zorgtoeslag, bijstand, erfgrensbeplanting) but do not test each operation in isolation.
- Mutation tests (`.github/workflows/mutation-testing.yml`): cargo-mutants to find untested code paths. Rust-specific.
None of these can be used by a third-party engine implementation.
Decision
1. Conformance test suite
A conformance test suite defines the expected behavior for each schema version. The suite is a collection of JSON files with no dependency on a specific runtime or test framework. Any engine that can read JSON and YAML can run the tests.
The suite lives under conformance/ in the repository:
conformance/
  v0.5.0/
    manifest.json
    arithmetic.json
    comparison.json
    logical.json
    conditionals.json
    collection.json
    variable_resolution.json
    cross_law.json
    date_operations.json
  v0.5.1/
    manifest.json
    arithmetic.json
    ...
    untranslatables.json

Each schema version has its own directory. A new schema version inherits all tests from the previous version (they must still pass) and adds tests for new features.
2. Test format
Each test file contains an array of test groups. Each group has a description, a regulation YAML (inline or referenced), and an array of test cases:
{
"description": "MULTIPLY operation",
"schema_version": "v0.5.0",
"tests": [
{
"description": "multiply two positive integers",
"regulation": {
"yaml": "$schema: https://...\n$id: test_multiply\n..."
},
"article": "1",
"parameters": {
"value": 21
},
"calculation_date": "2025-01-01",
"expected": {
"outputs": {
"result": 42
}
}
},
{
"description": "multiply by zero",
"regulation": {
"yaml": "$schema: https://...\n$id: test_multiply\n..."
},
"article": "1",
"parameters": {
"value": 0
},
"calculation_date": "2025-01-01",
"expected": {
"outputs": {
"result": 0
}
}
}
]
}

For tests that require multiple regulations (cross-law resolution, IoC):
{
"description": "cross-law source reference",
"schema_version": "v0.5.0",
"tests": [
{
"description": "resolve output from another regulation",
"regulations": [
{"yaml": "...primary law..."},
{"yaml": "...referenced law..."}
],
"article": "1",
"regulation_id": "primary_law",
"parameters": {
"input_value": 100
},
"calculation_date": "2025-01-01",
"expected": {
"outputs": {
"result": 200
}
}
}
]
}

The expected block can assert error conditions:
{
"expected": {
"error": "unsupported_schema_version"
}
}

Or trace properties (for engines that support tracing):
{
"expected": {
"outputs": {"result": 42},
"trace_contains": {
"node_type": "Operation",
"operation": "MULTIPLY"
}
}
}

Trace assertions are optional. An engine that does not produce traces can skip them. Output assertions are mandatory.
3. Conformance levels
Tests are tagged with a conformance level. An engine declares which levels it supports per schema version.
| Level | Scope | Operations covered |
|---|---|---|
| Core | Single-article evaluation with basic operations | ADD, SUBTRACT, MULTIPLY, DIVIDE, EQUALS, GREATER_THAN, LESS_THAN, GREATER_THAN_OR_EQUAL, LESS_THAN_OR_EQUAL, AND, OR, NOT, IF, IN, LIST, MAX, MIN |
| Cross-law | Multi-regulation evaluation with source references | All Core operations + source.regulation resolution, parameter passing |
| IoC | Delegated legislation via open terms | All Cross-law operations + open_terms, implements (RFC-003) |
| Temporal | Date operations and multi-version law selection | All Core operations + AGE, DATE_ADD, DATE, DAY_OF_WEEK, valid_from filtering, reference_date |
| Advanced | Full engine features | All above + hooks (RFC-007), overrides, procedures (RFC-008), untranslatables (RFC-012), data sources |
Each level includes all operations from previous levels. A Core-level engine can execute simple single-law computations. An Advanced-level engine can participate in the full multi-org execution model (RFC-009).
Note: the engine's ADD operation handles numeric addition, array concatenation, and string concatenation. String concatenation tests are in arithmetic.json alongside numeric ADD tests. Rounding operations (ROUND, CEIL, FLOOR) are not in the operation set yet (RFC-012 lists rounding as an untranslatable). When added, they join the Core level and get their own conformance tests.
The manifest file declares the level structure. Each level lists its test files and which operations it covers:
{
"schema_version": "v0.5.0",
"levels": {
"core": {
"test_files": ["arithmetic.json", "comparison.json", "logical.json",
"conditionals.json", "collection.json", "variable_resolution.json"],
"operations": ["ADD", "SUBTRACT", "MULTIPLY", "DIVIDE", "EQUALS",
"GREATER_THAN", "LESS_THAN", "GREATER_THAN_OR_EQUAL",
"LESS_THAN_OR_EQUAL", "AND", "OR", "NOT", "IF",
"IN", "LIST", "MAX", "MIN"]
},
"cross_law": {
"test_files": ["cross_law.json"],
"operations": []
},
"temporal": {
"test_files": ["date_operations.json", "temporal_resolution.json"],
"operations": ["AGE", "DATE_ADD", "DATE", "DAY_OF_WEEK"]
}
}
}

Levels without new operations (cross_law, ioc, advanced) have an empty operations array — they test resolution patterns and features, not additional operation types.
4. Coverage invariant
The engine defines Operation::SCHEMA_OPERATIONS — a const array of all operations that are part of the schema specification (excluding backward-compat aliases like NOT_EQUALS). An integration test (conformance_coverage.rs) reads the manifest and verifies:
- Every entry in `SCHEMA_OPERATIONS` appears in at least one conformance level
- Every operation in the manifest exists in `SCHEMA_OPERATIONS` (no phantom operations)
- No operation appears in more than one level
A separate unit test in types.rs verifies that SCHEMA_OPERATIONS + COMPAT_ALIASES account for every variant in the Operation enum. Adding a new operation variant without classifying it fails this test.
The chain of invariants: add an Operation variant → compiler forces a name() match arm → unit test forces adding it to SCHEMA_OPERATIONS or COMPAT_ALIASES → integration test forces adding schema operations to a conformance level in the manifest.
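The manifest side of these invariants can be checked by any implementation, not just the Rust engine. A Python sketch of the three checks (here `schema_operations` stands in for the engine's `SCHEMA_OPERATIONS` const; this is an illustration of the logic, not the actual conformance_coverage.rs):

```python
def check_coverage(manifest, schema_operations):
    """Check the three manifest invariants: every schema operation is
    covered by some level, no phantom operations exist, and no
    operation is assigned to more than one level."""
    assigned = []
    for spec in manifest["levels"].values():
        assigned.extend(spec["operations"])
    errors = [f"{op} not covered by any level"
              for op in schema_operations if op not in assigned]
    errors += [f"phantom operation {op} in manifest"
               for op in assigned if op not in schema_operations]
    errors += [f"{op} assigned to more than one level"
               for op in sorted(set(assigned)) if assigned.count(op) > 1]
    return errors
```

An empty result means the manifest and the operation set agree; each violation produces one diagnostic string.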
5. Provenance conformance (RFC-013 integration)
A dedicated provenance.json verifies that the engine produces correct Execution Receipts (RFC-013):
- Output includes an `engine_version` field
- Output includes a `schema_version` matching the regulation's `$schema`
- Output includes a `regulation_hash` as SHA-256 of the YAML content
- When executing with accepted values (replay mode), outputs match the original execution
Provenance tests are required at all conformance levels. Even a Core-level engine must stamp its outputs with provenance metadata.
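The receipt checks above need nothing beyond a hash function, which is why they are feasible at every level. A Python sketch (field names follow the summary above; `verify_receipt` itself is an illustrative helper, not a prescribed API):

```python
import hashlib

def verify_receipt(receipt, regulation_yaml, schema_version):
    """Check the provenance fields of an Execution Receipt:
    engine_version present, schema_version matching the regulation's
    $schema, and regulation_hash equal to SHA-256 of the YAML text."""
    expected_hash = hashlib.sha256(regulation_yaml.encode("utf-8")).hexdigest()
    return (
        bool(receipt.get("engine_version"))
        and receipt.get("schema_version") == schema_version
        and receipt.get("regulation_hash") == expected_hash
    )
```

The replay-mode check is not shown: it is an ordinary output comparison between the original and the re-executed run.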
6. Test derivation from BDD scenarios
The initial conformance test suite is derived from the existing BDD features. The process:
- For each Gherkin scenario in `features/*.feature`, extract the regulation YAML, input parameters, and expected outputs
- Decompose integration scenarios into isolated operation-level tests where possible
- Add edge cases not covered by the BDD scenarios (null handling, empty arrays, boundary values, negative numbers)
- Validate by running the conformance suite against the regelrecht Rust engine
The BDD features remain as integration tests for the regelrecht engine. The conformance suite is the portable subset.
7. CI integration
The conformance suite is validated in CI:
- On every PR: run the regelrecht engine against the full conformance suite for all supported schema versions
- On schema changes: verify that new schema versions include conformance tests and that existing tests still pass
- On engine changes: verify that no conformance test regresses
A new CI job conformance runs alongside the existing test job. It uses the evaluate binary to execute each test case and compares outputs.
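A sketch of what that job could look like as a GitHub Actions fragment; the step layout, build command, and wrapper script path are assumptions for illustration, not the actual workflow:

```yaml
conformance:
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
    - name: Build the evaluate binary
      run: cargo build --release
    - name: Run the conformance suite for every schema version
      # hypothetical wrapper that feeds each test case to `evaluate`
      # and diffs actual outputs against expected outputs
      run: |
        for version in conformance/v*/; do
          ./scripts/run_conformance.sh "$version"
        done
```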
Why
Benefits
- Any organisation can build an engine and prove it behaves correctly by running the conformance suite. No Rust toolchain required.
- The conformance tests define what each operation does. The schema specifies structure; the conformance suite specifies computation. Together they form the complete specification.
- If an engine change causes a conformance test to fail, it is a behavioral change that must be intentional and versioned (per RFC-013).
- Third-party engines can start at Core level and work up. They do not need the full operation set to be useful for simple law evaluation.
- The test format is plain JSON. A test runner is around 50 lines in any language.
Tradeoffs
- The test suite must be maintained alongside the schema. Every new operation or semantic change requires new or updated tests.
- The suite can only test documented behavior. Undocumented edge cases (how does DIVIDE handle division by zero? what happens when AGE is computed for a future date?) must be decided and encoded as tests. Writing conformance tests for these forces a decision that was previously deferred.
- Covering all operations, types, and edge cases produces many tests. The suite will grow with each schema version.
- Passing the suite does not guarantee identical behavior for all possible inputs, only for the tested cases. Mutation testing and property-based testing (Rust-specific) complement it for the regelrecht engine.
Alternatives Considered
Alternative 1: Shared BDD features with multi-language runners
- Write Gherkin features and provide step definitions for multiple languages (Rust, Java, Python).
- Rejected because maintaining step definitions in multiple languages is ongoing cost that scales with the number of implementations. The JSON format means each implementer writes a thin test runner once.
Alternative 2: WASM-based conformance (compile the test runner to WASM)
- Provide a WASM module that runs conformance tests. Any platform that supports WASM can run them.
- Rejected because it adds a WASM dependency for testing. JSON files are simpler, easier to inspect, and more widely supported.
Alternative 3: Conformance via the inter-engine protocol
- A conformance service that sends test requests to an engine's HTTP endpoint and validates responses.
- Not rejected, but complementary. The JSON suite tests core logic without requiring HTTP infrastructure. A protocol-level conformance service could be built on top, testing the HTTP interface against the same test cases.
Implementation Notes
Phase 1: Test format and Core level
- Define the JSON test format (as described above)
- Write Core-level tests covering all arithmetic, comparison, logical, and conditional operations
- Add a Rust-based test runner that executes the conformance suite against the regelrecht engine
- Add `conformance` job to CI
Phase 2: Cross-law and IoC levels
- Extract cross-law tests from the zorgtoeslag and bijstand BDD scenarios
- Write IoC tests covering `open_terms` and `implements` resolution
- Add temporal tests for date operations and `valid_from` filtering
Phase 3: Advanced level and provenance
- Add tests for hooks, overrides, untranslatables, procedures
- Add `provenance.json` testing Execution Receipt fields (RFC-013)
- Publish the suite as a standalone downloadable artifact (GitHub Release or npm package)
Phase 4: Documentation and third-party guide
- Write a guide for third-party engine implementers: how to run the suite, how to declare conformance level, how to report results
- Add conformance badge specification (e.g., "regelrecht conformant: Core v0.5.1")
Affected components
| File | Change |
|---|---|
| `conformance/` | New: test suite |
| `conformance/v0.5.0/manifest.json` | Test manifest for schema v0.5.0 |
| `packages/engine/src/types.rs` | `SCHEMA_OPERATIONS` and `COMPAT_ALIASES` consts, exhaustiveness test |
| `packages/engine/tests/conformance_coverage.rs` | New: coverage invariant tests (manifest vs engine operations) |
| `packages/engine/tests/conformance.rs` | New: Rust test runner for the conformance suite |
| `.github/workflows/ci.yml` | New `conformance` job |
References
- RFC-003: Inversion of Control — `open_terms` and `implements` (IoC conformance level)
- RFC-007: Reactive Execution — hooks and overrides (Advanced conformance level)
- RFC-008: Procedures — AWB procedure lifecycle (Advanced conformance level)
- RFC-012: Untranslatables — handling constructs beyond engine expressiveness (Advanced conformance level)
- RFC-013: Execution Provenance — Execution Receipt format (provenance conformance tests)
- JSON Schema Test Suite — model for language-agnostic conformance testing
- W3C Web Platform Tests — model for multi-implementation conformance testing at scale
- Glossary of Dutch Legal Terms