
RFC-018: Note Infrastructure: Storage, Resolution, and Editor Integration

Status: Accepted
Date: 2026-05-18
Authors: Anne Schuth

Terminology

This RFC uses note (NL: notitie) for the feature, following the terminology decision in RFC-005. The W3C technical term Annotation is kept only where it refers to the data model itself. “Annotation” as a Dutch legal genre (annotatie, noot, “m.nt.”) is deliberately not what this feature is called.

Context

RFC-005 defines the note format: the W3C Web Annotation Data Model with TextQuoteSelector for version-resilient text anchoring. It specifies what a note looks like (selector, motivation, body, resolution states) but is explicitly storage-agnostic. It does not say where notes live, how they are loaded, who creates them, or how they are displayed.

This RFC addresses the implementation questions RFC-005 leaves open. They map directly onto the open design questions raised in interpretation discussions:

  1. Working notes: needed for the editor to bridge the gap between law text and machine_readable execution logic.
  2. Conflict resolution for generated notes: when notes are produced by tooling (e.g., an LLM suggesting linking notes), how are conflicts between generated and human-authored notes handled?
  3. Responsible law-to-YAML mapping: linking notes connect specific text fragments to the corresponding machine_readable elements, making the interpretation chain explicit and auditable.
  4. Note types and storage locations: different note types serve different purposes and may live in different places.
  5. Note history: version tracking via Git.
  6. Note scope: what does a note relate to? The entire law across all versions? A specific version? Is it personal or public? Does it target structure or content?
  7. Ambiguity tracking in work-in-progress laws: labelling in-depth cases (diepte-casussen) where we have imperfect information (“we don’t know what we don’t know”): open norms partially filled, open norms not yet filled, provisions that need explanation by implementation policy, and notes about missing documents that are still being searched for.

Two other RFCs shape the infrastructure:

  • RFC-009 (Multi-Organisation Execution) introduces competent_authority (bevoegd gezag) as the boundary that determines who can authoritatively execute what. The same concept applies to notes: the competent authority’s note linking text to machine_readable represents the official interpretation (gezaghebbende interpretatie). Other organisations, legal experts, and citizens can bring their own notes alongside, as advisory perspectives, not as the authoritative reading.

  • RFC-010 (Federated Corpus) introduces a “bring your own regulations” model where municipalities (gemeenten), provinces (provincies), and other organisations maintain laws in their own repositories. Notes need the same federated model: any organisation can annotate any law from their own perspective, in their own repository.

Current state

Nothing is implemented on main. A historical Python proof-of-concept exists on the feature/annotation-resolver branch (TextQuoteSelector resolution, fuzzy matching, BDD tests), but the stack has moved to Rust (engine) and Vue 3 (frontend). The Python code is not reused; the BDD scenarios are ported.

This RFC has been renumbered twice. An earlier draft circulated as PR #328 numbered “RFC-013”; that number was already taken on main (Execution Provenance), so it became RFC-016. RFC-016 in turn collided with the open PR #510, which uses that number for Collection Operations (foreach). PR #510 is the older claim, so the note infrastructure RFC takes the next free number, RFC-018 (rfc-011 was never used; rfc-016 and rfc-017 are claimed by PR #510). PR #328 is closed in favour of this RFC.

Decision

Ten interconnected decisions form the note infrastructure.

1. Storage: Sidecar YAML files

Notes are stored in separate YAML files alongside law files, not embedded in the law YAML. This preserves the verbatim legal text (RFC-005 requirement: notes must not modify the source) and enables independent versioning.

Directory convention:

corpus/
  regulation/nl/
    wet/
      wet_op_de_zorgtoeslag/                     # filesystem path
        2025-01-01.yaml                          # law text, $id: zorgtoeslagwet
  annotations/
    zorgtoeslagwet/                              # keyed by the law's $id
      annotations.yaml                           # law-level notes (all versions)
      2025-01-01.annotations.yaml                # version-specific notes (optional)
    _vocabulary/
      ambiguity.yaml                             # controlled vocabulary (Decision 9)

The annotation directory is keyed by the law’s $id (e.g. zorgtoeslagwet), not by its filesystem path under regulation/ (e.g. wet_op_de_zorgtoeslag). The $id is the stable identifier laws reference each other by.

  • annotations.yaml: law-level notes that apply across all versions. The TextQuoteSelector resolves on whichever version is being viewed.
  • {valid_from}.annotations.yaml: notes specific to a particular law version. Used when the annotated text only exists in that version (e.g., a percentage that was changed in a later amendment).

The directory is named annotations/ (not notes/) because it stores W3C Annotation objects; the storage path follows the data model, the feature name follows RFC-005.

Discovery: convention-based. Given a law with $id: zorgtoeslagwet, the engine looks for notes at corpus/annotations/zorgtoeslagwet/annotations.yaml. No registry manifest is needed for local notes.
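As a minimal sketch of this convention, the lookup can be written as a pure path computation. The function name note_paths and the corpus_root parameter are illustrative assumptions, not the engine's actual API.

```rust
use std::path::{Path, PathBuf};

/// Candidate sidecar files for a law, law-level file first.
/// Hypothetical helper: only the directory convention comes from this RFC.
pub fn note_paths(corpus_root: &Path, law_id: &str, valid_from: Option<&str>) -> Vec<PathBuf> {
    // The directory is keyed by the law's $id, not by its path under regulation/.
    let dir = corpus_root.join("annotations").join(law_id);
    let mut paths = vec![dir.join("annotations.yaml")];
    if let Some(date) = valid_from {
        // Version-specific notes are optional and sit next to the law-level file.
        paths.push(dir.join(format!("{date}.annotations.yaml")));
    }
    paths
}
```

Called with the corpus root, "zorgtoeslagwet" and Some("2025-01-01"), this yields the two files shown in the directory listing above; whether each file exists is checked by the caller.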

Validation: note files conform to schema/v0.5.2/annotation-schema.json, validated by just validate-annotations.

Example note file:

# corpus/annotations/zorgtoeslagwet/annotations.yaml
$schema: "https://raw.githubusercontent.com/MinBZK/regelrecht/refs/heads/main/schema/v0.5.2/annotation-schema.json"
annotations:
  - type: Annotation
    motivation: linking
    creator: "Dienst Toeslagen"
    target:
      source: "regelrecht://zorgtoeslagwet"
      selector:
        type: TextQuoteSelector
        exact: "zorgtoeslag ter grootte van dat verschil"
        prefix: "heeft de verzekerde aanspraak op een "
        suffix: ". Voor een verzekerde met een toeslagpartner"
        regelrecht:hint:
          type: CssSelector
          value: "article[number='2']"
          refinedBy:
            type: TextPositionSelector
            start: 112
            end: 152
    body:
      type: SpecificResource
      source: "regelrecht://zorgtoeslagwet/hoogte_zorgtoeslag#hoogte_zorgtoeslag"
      purpose: linking
    resolution: found
    workflow: resolved

2. Federated note sources

Notes follow the same “bring your own” pattern as laws in RFC-010. Any organisation can maintain notes in their own repository, alongside their regulations or in a dedicated note repository.

Required structure in a source repository:

annotations/
  {law_id}/
    annotations.yaml

When a source in corpus-registry.yaml (RFC-010) includes an annotations/ directory, the engine discovers and loads those notes alongside the source’s regulations.

Example: a municipality (gemeente) annotating a national law:

# In gemeente-amsterdam/regelrecht-amsterdam repo:
regulation/nl/
  verordening/
    afstemmingsverordening/
      2025-01-01.yaml                           # Amsterdam's ordinance
annotations/
  zorgtoeslagwet/
    annotations.yaml                             # Amsterdam's perspective on national law
  afstemmingsverordening/
    annotations.yaml                             # notes on their own ordinance

Amsterdam’s note on the Healthcare Allowance Act (Wet op de zorgtoeslag) might link text about “toeslagpartner” to their local implementation of the partner check, or add a comment explaining how Amsterdam interprets a provision in the context of their municipal social assistance (bijstand) policy.

Personal notes: a .local.annotations.yaml file (gitignored) follows the RFC-010 .local.yaml override pattern. Personal notes are loaded alongside public notes but not committed.

Loading order: the engine loads notes from all registered sources. Notes from different sources are layered, not prioritised (see Decision 8).

3. Note authority and provenance

Each note carries a creator field (part of the W3C Web Annotation Data Model) identifying who created it. The relationship between the creator and the annotated law determines the note’s authority level.

| Creator | Relationship to competent_authority | Authority level | Example |
|---|---|---|---|
| Competent authority (bevoegd gezag) | Same org | Authoritative (gezaghebbend) | Allowances Service (Dienst Toeslagen) annotates Healthcare Allowance Act |
| Other government org | Different org | Advisory (adviserend) | Municipality (gemeente) annotates national law from local perspective |
| Legal expert / researcher | No org | Personal (persoonlijk) | Law professor adds explanatory comment |
| Automated tooling | Generated | Generated (gegenereerd) | LLM-produced linking note |

The authority level is derived at display time, not stored. It follows the same principle as RFC-009’s execute/accept boundary: the law determines authority, the engine derives behaviour.

Derivation logic:

Is the note's creator the competent_authority for the target law?
  YES → Authoritative
  NO  →
    Is the creator a government organisation?
      YES → Advisory
      NO  →
        Is the creator an automated tool?
          YES → Generated
          NO  → Personal
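The decision tree above can be sketched as a small pure function. The Creator struct and its two boolean flags are hypothetical: the MVP stores only a display string, so how "government organisation" and "automated tool" are recognised is an assumption here.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AuthorityLevel {
    Authoritative,
    Advisory,
    Generated,
    Personal,
}

/// Hypothetical creator description; not part of the note schema.
pub struct Creator<'a> {
    pub name: &'a str,
    pub is_government_org: bool,
    pub is_automated_tool: bool,
}

/// Derives the authority level at display time; nothing is stored on the note.
pub fn derive_authority(creator: &Creator<'_>, competent_authority: &str) -> AuthorityLevel {
    if creator.name == competent_authority {
        AuthorityLevel::Authoritative
    } else if creator.is_government_org {
        AuthorityLevel::Advisory
    } else if creator.is_automated_tool {
        AuthorityLevel::Generated
    } else {
        AuthorityLevel::Personal
    }
}
```

The order of the checks mirrors the tree: a creator that matches the competent authority is authoritative regardless of the other flags.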

For the MVP, creator is a string field (e.g., "Dienst Toeslagen", "LLM-generated", "J. de Vries"). When RFC-009’s identity model (EngineIdentity with OIN) is implemented, creator can carry a structured identity with signature verification. The schema accommodates both:

# MVP: simple string
creator: "Dienst Toeslagen"

# Future: structured identity (W3C Agent)
creator:
  type: Organization
  name: "Dienst Toeslagen"
  organisation_id: "00000004003214345000"  # OIN

Provenance: the note file’s Git history provides when each note was created and by whom (git log, git blame). The schema optionally includes created and modified timestamps for systems that do not use Git.

4. Scope model

Each scope dimension maps to a concrete mechanism:

| Scope dimension | Mechanism | How it works |
|---|---|---|
| Entire law, all versions (hele wet, alle versies) | Law-level annotations.yaml | TextQuoteSelector resolves on whichever version is being viewed. A note on “zorgtoeslag” finds the word in both the 2020 and 2025 versions. |
| Specific version (specifieke versie) | Version-specific {valid_from}.annotations.yaml | A note on a percentage changed by Staatsblad 2008, 516 only makes sense on the pre-2008 version. |
| Personal / public (persoonlijk / publiek) | Public: in Git. Personal: .local.annotations.yaml (gitignored) | An expert’s draft notes before publishing; a student’s study notes. |
| Structure or content (structuur of inhoud) | Notes target text content via TextQuoteSelector | The note on “zorgtoeslag” finds the word regardless of which article it is in. If article 2 is renumbered to article 3, the note follows the text. Article numbers appear only as performance hints (non-authoritative). |

5. Note types and their use cases

The W3C Web Annotation vocabulary defines 13 motivation types. Four are primary for regelrecht.

Linking (koppeling)

Connects text to a machine_readable element. This is the most critical type: it makes the interpretation chain from law text to executable logic explicit and auditable.

- type: Annotation
  motivation: linking
  creator: "Dienst Toeslagen"
  target:
    source: "regelrecht://zorgtoeslagwet"
    selector:
      type: TextQuoteSelector
      exact: "zorgtoeslag ter grootte van dat verschil"
      prefix: "heeft de verzekerde aanspraak op een "
      suffix: ". Voor een verzekerde met een toeslagpartner"
  body:
    type: SpecificResource
    source: "regelrecht://zorgtoeslagwet/hoogte_zorgtoeslag#hoogte_zorgtoeslag"
    purpose: linking

The body.source uses a regelrecht:// URI (as defined in packages/engine/src/uri.rs) pointing to the specific execution element. The fragment (#hoogte_zorgtoeslag) identifies the output name.
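To illustrate how such a reference decomposes, here is a minimal sketch of a parser. It is not the code in packages/engine/src/uri.rs; the NoteUri struct, the field names, and the reading of the middle path segment as the "element" are assumptions made for illustration.

```rust
/// Parts of a `regelrecht://` reference as it appears in `body.source`.
/// Hypothetical type, not the engine's own URI representation.
#[derive(Debug, PartialEq, Eq)]
pub struct NoteUri<'a> {
    pub law_id: &'a str,
    pub element: Option<&'a str>,
    pub output: Option<&'a str>,
}

pub fn parse_note_uri(uri: &str) -> Option<NoteUri<'_>> {
    let rest = uri.strip_prefix("regelrecht://")?;
    // Split off the fragment first; it names the output.
    let (path, output) = match rest.split_once('#') {
        Some((path, fragment)) => (path, Some(fragment)),
        None => (rest, None),
    };
    // The first path segment is the law's $id; the remainder names the element.
    let (law_id, element) = match path.split_once('/') {
        Some((law, element)) => (law, Some(element)),
        None => (path, None),
    };
    if law_id.is_empty() {
        return None;
    }
    Some(NoteUri { law_id, element, output })
}
```

A target.source such as "regelrecht://zorgtoeslagwet" parses with both optional parts empty, while the body.source in the example above yields the law, the element, and the output name.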

Commenting (toelichting)

Human explanation of a legal concept or provision.

- type: Annotation
  motivation: commenting
  creator: "J. de Vries"
  target:
    source: "regelrecht://zorgtoeslagwet"
    selector:
      type: TextQuoteSelector
      exact: "normpremie"
      prefix: "Indien de "
      suffix: " voor een verzekerde"
  body:
    type: TextualBody
    value: >-
      The normative premium (normpremie) is a notional amount representing what
      the insured person is expected to contribute to health insurance. It is
      based on income, not on the actual premium paid.
    purpose: commenting
    format: text/plain
    language: en

Tagging (classificatie)

Classification of legal concepts for search, analysis, and cross-referencing. Also the carrier for ambiguity state (Decision 9).

- type: Annotation
  motivation: tagging
  creator: "regelrecht-tooling"
  target:
    source: "regelrecht://zorgtoeslagwet"
    selector:
      type: TextQuoteSelector
      exact: "verzekerde"
      prefix: "heeft de "
      suffix: " aanspraak op een zorgtoeslag"
  body:
    type: TextualBody
    value: "legal-subject"
    purpose: tagging

Questioning (vraag)

Open question raised during interpretation. Tracked via the workflow field. This type carries the ambiguity-tracking use case (Decision 6).

- type: Annotation
  motivation: questioning
  creator: "interpreter-team"
  target:
    source: "regelrecht://zorgtoeslagwet"
    selector:
      type: TextQuoteSelector
      exact: "drempelinkomen"
      prefix: "procent van het "
      suffix: ", vermeerderd met"
  body:
    type: TextualBody
    value: "Is this the same drempelinkomen defined in Awir article 2?"
    purpose: questioning
  workflow: open

6. Ambiguity tracking and missing documents

This decision answers the original interpretation-research need: labelling ambiguity in work-in-progress laws, where we start from existing law with (possibly unknown) implementation policy and want to reach a validated executable rule set, knowing we have imperfect information.

The states are not a fixed list. Today there are a handful (“open norm partially filled”, “open norm not yet filled”, “needs explanation by implementation policy”, “document still being searched for”); the set will grow as the interpretation process matures. Freezing it into a validated schema enum would mean every new state is a schema version bump plus a migration of existing note files plus an RFC amendment. That is exactly the brittleness RFC-005 avoids for text anchoring; we avoid it here too.

Model. Ambiguity is expressed with the existing W3C dimensions, no schema extension:

  • motivation: questioning: there is an open interpretation issue here.
  • workflow: open | resolved: has the issue been addressed?
  • a tagging body whose value is the specific ambiguity state, drawn from a controlled vocabulary (Decision 9).

A note can carry both a questioning body (the question in prose) and a tagging body (the machine-readable state). The W3C model permits multiple bodies.

Example: an open norm that is only partially filled in.

- type: Annotation
  motivation: questioning
  creator: "interpreter-team"
  target:
    source: "regelrecht://zorgtoeslagwet"
    selector:
      type: TextQuoteSelector
      exact: "bij ministeriële regeling"
      prefix: "worden "
      suffix: " nadere regels gesteld"
  body:
    - type: TextualBody
      value: >-
        Ministerial regulation partially located. The income brackets are
        implemented; the hardship clause is referenced but the implementing text
        has not been found yet.
      purpose: questioning
      format: text/plain
      language: en
    - type: TextualBody
      value: "open-norm-partial"
      purpose: tagging
  workflow: open

Missing documents. A note can be about a document that does not exist in the corpus yet but is needed to resolve a provision. The target is the text fragment that triggers the search; the body describes what is missing. This makes “what are we still looking for” a queryable property of the corpus, not tribal knowledge.

- type: Annotation
  motivation: questioning
  creator: "interpreter-team"
  target:
    source: "regelrecht://zorgtoeslagwet"
    selector:
      type: TextQuoteSelector
      exact: "beleidsregels van de Belastingdienst"
      prefix: "overeenkomstig de "
      suffix: " omtrent"
  body:
    - type: TextualBody
      value: >-
        These policy rules (beleidsregels) govern the discretionary hardship
        assessment but are not in the corpus. Searching Staatscourant and the
        Belastingdienst policy register.
      purpose: questioning
      format: text/plain
      language: en
    - type: TextualBody
      value: "missing-document"
      purpose: tagging
  workflow: open
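The "queryable property" can be sketched as a filter over already-parsed notes. The Note, Body, and Workflow types below are simplified stand-ins for illustration, not the engine's data types, and YAML parsing is left out.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Workflow {
    Open,
    Resolved,
}

/// Simplified body: only the two fields the query needs.
pub struct Body {
    pub purpose: String,
    pub value: String,
}

/// Simplified note: `exact` is the quoted text of the selector.
pub struct Note {
    pub motivation: String,
    pub exact: String,
    pub bodies: Vec<Body>,
    pub workflow: Option<Workflow>,
}

/// Open questioning notes that carry the given tagging value.
pub fn open_notes_tagged<'a>(notes: &'a [Note], tag: &str) -> Vec<&'a Note> {
    notes
        .iter()
        .filter(|n| n.motivation == "questioning")
        .filter(|n| n.workflow == Some(Workflow::Open))
        .filter(|n| n.bodies.iter().any(|b| b.purpose == "tagging" && b.value == tag))
        .collect()
}
```

Asking for the tag "missing-document" returns the documents still being searched for; a note whose workflow has moved to resolved drops out of the result.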

7. Resolver architecture

The TextQuoteSelector resolver is implemented in Rust as a module within the engine crate. It is exposed to the frontend via WASM.

Module structure:

packages/engine/src/
  annotation/
    mod.rs           # pub mod types; pub mod resolver;
    types.rs         # Data types
    resolver.rs      # Resolution algorithm

Types (in types.rs):

use serde::{Deserialize, Serialize};

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct TextQuoteSelector {
    pub exact: String,
    pub prefix: Option<String>,
    pub suffix: Option<String>,
    pub hint: Option<SelectorHint>,
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct SelectorHint {
    pub article_number: String,
    pub start: Option<usize>,
    pub end: Option<usize>,
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct MatchResult {
    pub status: MatchStatus,
    pub matches: Vec<TextMatch>,
}

#[derive(Debug, Clone, PartialEq, Serialize, Deserialize)]
pub enum MatchStatus {
    Found,
    Orphaned,
    Ambiguous,
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct TextMatch {
    pub article_number: String,
    pub start: usize,
    pub end: usize,
    pub confidence: f64,
    pub matched_text: String,
}

Resolution algorithm (in resolver.rs):

The hint is checked first so it provides an actual fast path: searching one article is O(article length) versus O(law length) for the full scan. The hint is non-authoritative, so a miss falls through to the full search rather than failing.

resolve(selector, articles) -> MatchResult:
  1. If hint is present (fast path):
     - Try exact match within the hinted article only
       (if the hint has a position, verify exact text at that offset first)
     - If found: return Found with confidence 1.0
     - If not: hint was outdated, fall through to the full search
  2. Try exact match across all articles:
     - Search for prefix + exact + suffix as substring
     - One match: return Found (confidence 1.0)
     - Multiple matches: return Ambiguous
  3. Fuzzy match across all articles:
     - For each sliding window of len(exact) +/- 30%:
       - exact_score = levenshtein_similarity(exact, window_text)
       - prefix_score = levenshtein_similarity(prefix, text_before_window)
       - suffix_score = levenshtein_similarity(suffix, text_after_window)
       - score = exact_score * 0.5 + prefix_score * 0.25 + suffix_score * 0.25
     - Collect matches above threshold (0.7), deduplicate overlapping spans
  4. One match (or a clear winner > 0.1 ahead): return Found, confidence = score
  5. Multiple equally-good matches: return Ambiguous
  6. No matches: return Orphaned
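Steps 1 and 2 can be sketched with standard-library string search alone. This is an illustration under assumptions, not the resolver itself: Article, Outcome, and resolve_exact are hypothetical names, offsets are byte offsets, the positional part of the hint is omitted, and the fuzzy phase is represented only by the NoExactMatch outcome.

```rust
pub struct Article<'a> {
    pub number: &'a str,
    pub text: &'a str,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Outcome {
    /// Article number plus byte start and end of `exact` in that article.
    Found(String, usize, usize),
    /// Number of equally valid exact locations.
    Ambiguous(usize),
    /// Nothing matched exactly; a full resolver would continue with step 3.
    NoExactMatch,
}

/// Spans of `exact` wherever prefix + exact + suffix occurs in one article.
fn spans_in(article: &Article<'_>, prefix: &str, exact: &str, suffix: &str) -> Vec<(String, usize, usize)> {
    let needle = format!("{prefix}{exact}{suffix}");
    article
        .text
        .match_indices(needle.as_str())
        .map(|(at, _)| {
            let start = at + prefix.len();
            (article.number.to_string(), start, start + exact.len())
        })
        .collect()
}

pub fn resolve_exact(
    prefix: &str,
    exact: &str,
    suffix: &str,
    hint_article: Option<&str>,
    articles: &[Article<'_>],
) -> Outcome {
    if exact.is_empty() {
        return Outcome::NoExactMatch;
    }
    // Step 1: fast path, search only the hinted article.
    if let Some(hinted) = hint_article {
        let in_hint = articles
            .iter()
            .find(|a| a.number == hinted)
            .and_then(|a| spans_in(a, prefix, exact, suffix).into_iter().next());
        if let Some((number, start, end)) = in_hint {
            return Outcome::Found(number, start, end);
        }
        // Outdated hint: fall through to the full search instead of failing.
    }
    // Step 2: exact search across all articles.
    let mut hits: Vec<_> = articles
        .iter()
        .flat_map(|a| spans_in(a, prefix, exact, suffix))
        .collect();
    match hits.len() {
        0 => Outcome::NoExactMatch,
        1 => {
            let (number, start, end) = hits.remove(0);
            Outcome::Found(number, start, end)
        }
        n => Outcome::Ambiguous(n),
    }
}
```

A bare common word without prefix or suffix comes back as Ambiguous, while the same word with a correct article hint resolves immediately; an outdated hint costs one extra article scan and nothing more.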

Dependency: strsim crate for Levenshtein distance.
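For readers without the crate at hand, the weighted score from step 3 can be sketched with a hand-written distance. The normalisation (one minus distance divided by the longer length) and the Quote struct are assumptions of this sketch; in the engine the strsim crate would supply the distance, for example through its normalized_levenshtein function.

```rust
/// Levenshtein edit distance over Unicode scalar values (two-row DP).
pub fn levenshtein(a: &str, b: &str) -> usize {
    let b_chars: Vec<char> = b.chars().collect();
    let mut prev: Vec<usize> = (0..=b_chars.len()).collect();
    for (i, ca) in a.chars().enumerate() {
        let mut curr = vec![i + 1];
        for (j, cb) in b_chars.iter().enumerate() {
            let substitution = prev[j] + usize::from(ca != *cb);
            let deletion = prev[j + 1] + 1;
            let insertion = curr[j] + 1;
            curr.push(substitution.min(deletion).min(insertion));
        }
        prev = curr;
    }
    prev[b_chars.len()]
}

/// Similarity in [0, 1]; 1.0 means identical.
pub fn similarity(a: &str, b: &str) -> f64 {
    let longest = a.chars().count().max(b.chars().count());
    if longest == 0 {
        return 1.0; // two empty strings are identical
    }
    1.0 - levenshtein(a, b) as f64 / longest as f64
}

/// A quote with its surrounding context: either the selector or a candidate window.
pub struct Quote<'a> {
    pub prefix: &'a str,
    pub exact: &'a str,
    pub suffix: &'a str,
}

/// Weights from step 3: exact 0.5, prefix 0.25, suffix 0.25.
pub fn window_score(selector: &Quote<'_>, window: &Quote<'_>) -> f64 {
    similarity(selector.exact, window.exact) * 0.5
        + similarity(selector.prefix, window.prefix) * 0.25
        + similarity(selector.suffix, window.suffix) * 0.25
}
```

With these weights, a changed percentage inside an otherwise unchanged sentence (the "3,5" to "2,7" scenario) still scores well above the 0.7 threshold, because prefix and suffix contribute their full 0.5.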

WASM bindings (additions to packages/engine/src/wasm.rs):

#[wasm_bindgen]
impl WasmEngine {
    /// Resolve a single TextQuoteSelector against a loaded law
    pub fn resolve_annotation(
        &self,
        law_id: &str,
        selector: JsValue,
    ) -> Result<JsValue, JsValue>;

    /// Resolve all notes from a YAML string against a loaded law
    pub fn resolve_annotations(
        &self,
        law_id: &str,
        annotations_yaml: &str,
    ) -> Result<JsValue, JsValue>;
}

BDD tests: port the 8 scenarios from feature/annotation-resolver:features/annotation.feature into features/notes.feature:

  1. Exact match: Zorgtoeslagwet 2025 article 4a
  2. Article renumbered: Staatsblad 2024, 291 (article 3 → article 4a)
  3. Text change: Staatsblad 2008, 516 (percentage 3,5 → 2,7)
  4. Orphaned note: text fully removed
  5. Ambiguous match: common word without context
  6. Unique with context: prefix/suffix disambiguation
  7. Hint optimisation: correct hint
  8. Hint fallback: outdated hint

Plus ambiguity scenarios (Decision 6): a questioning note with workflow: open and a tagging body open-norm-partial resolving correctly, and a workflow: resolved variant.

8. Conflict resolution and layering

Notes from different sources layer: they do not conflict. This is fundamentally different from laws, where RFC-010 uses priority to resolve $id collisions. Two organisations can both annotate the same text fragment; both notes are valid and visible.

Layering model:

Law text: "heeft de verzekerde aanspraak op een zorgtoeslag ter grootte van dat verschil"
           ─────────────────────────────────────────────

           ┌────────────────────────┼─────────────────────────┐
           │                        │                          │
    Dienst Toeslagen         Gemeente Amsterdam          LLM-generated
    motivation: linking      motivation: commenting      motivation: linking
    body: #hoogte_zorg-      body: "Core provision our    body: #berekening_
          toeslag                  bijstand calc refs"          zorgtoeslag
    authority: authoritative  authority: advisory         authority: generated

All three coexist. The editor displays them layered, with authoritative notes visually distinguished (solid highlight vs dashed border).

When law text changes:

  1. The resolver runs all notes against the new version
  2. Notes that still match: resolution: found (no action needed)
  3. Notes that no longer match above threshold: resolution: orphaned
  4. Orphaned notes are preserved with their original selector context
  5. CI reports orphaned notes as warnings, not errors

Tooling-generated notes:

  • carry creator: "{tool-name}" and authority level generated
  • are treated as suggestions: visible in the editor but visually distinguished
  • a human reviewer can promote a generated note by changing the creator to their own identity
  • conflicting generated notes (two tools suggesting different linking targets for the same text) are both shown; the human picks the correct one

9. Controlled vocabulary for tagging values

Tagging bodies (including ambiguity states from Decision 6) draw their value from a controlled vocabulary stored as a plain YAML list:

# corpus/annotations/_vocabulary/ambiguity.yaml
ambiguity:
  - id: open-norm-not-filled
    label: "Open norm, nog niet ingevuld"
    description: "An open norm whose implementing text has not been written or located."
  - id: open-norm-partial
    label: "Open norm, deels ingevuld"
    description: "An open norm partially implemented; some parts still unresolved."
  - id: needs-uitvoeringsbeleid
    label: "Behoeft uitleg door uitvoeringsbeleid"
    description: "The provision cannot be made executable without implementation policy."
  - id: missing-document
    label: "Document ontbreekt"
    description: "A document needed to resolve the provision is not in the corpus."

just validate-annotations warns (does not fail) when a tagging body’s value is not in the vocabulary. This catches typos and keeps the set queryable, without forking the W3C standard and without a schema bump when a state is added. The warning-not-error stance mirrors how orphaned notes are reported (Decision 8).
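A minimal sketch of the warn-only check, assuming the vocabulary ids and the tagging values have already been read from their YAML files. The function name and the warning text are illustrative, not the output of just validate-annotations.

```rust
use std::collections::HashSet;

/// Returns one warning per tagging value that is not a known vocabulary id.
/// An empty result means every value was recognised; nothing here fails the run.
pub fn vocabulary_warnings(tag_values: &[&str], vocabulary: &HashSet<&str>) -> Vec<String> {
    tag_values
        .iter()
        .filter(|value| !vocabulary.contains(*value))
        .map(|value| format!("warning: tagging value '{value}' is not in the controlled vocabulary"))
        .collect()
}
```

A typo such as "open-norm-partail" produces a single warning, while adding a new state only requires a new entry in the vocabulary file.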

10. Editor integration

Read path

                                                  ┌─────────────────────┐
                                                  │   useNotes.js        │
┌──────────────────┐    fetch    ┌────────────┐   │                     │
│ /data/annotations│───────────>│ annotations │──>│  WASM resolve()     │
│ /{lawId}/        │            │    .yaml    │   │         │           │
│ annotations.yaml │            └────────────┘   │         v           │
└──────────────────┘                              │  resolved positions │
                                                  └─────────┬───────────┘

                                                            v
                                                  ┌─────────────────────┐
                                                  │  AnnotatedText.vue  │
                                                  │  <mark> spans with  │
                                                  │  colour by motivation│
                                                  └─────────────────────┘

frontend/src/composables/useNotes.js:

  1. Fetches note YAML from /data/annotations/{lawId}/annotations.yaml
  2. Calls WasmEngine.resolveAnnotations() to get match positions per article
  3. Returns a reactive list of { note, match } objects, filtered by the selected article

frontend/src/components/AnnotatedText.vue:

A variant of ArticleText.vue for the editor’s Tekst pane. Takes article text and resolved positions, renders <mark> spans. Each span carries:

  • background colour by motivation (linking: blue, commenting: yellow, questioning: orange, tagging: green)
  • border style by authority (authoritative: solid, advisory: dashed, generated: dotted)
  • click handler to show note details

Write path (MVP)

  1. User selects text in AnnotatedText.vue
  2. useTextSelection.js captures the selection, extracts exact, computes prefix (30-50 chars before) and suffix (30-50 chars after)
  3. Validates uniqueness via WasmEngine.resolveAnnotation(): if the selector matches multiple locations, asks for more context
  4. NoteCreator.vue opens as a popover: pick motivation, select target machine_readable element (linking), type text (commenting/questioning), or pick an ambiguity tag from the vocabulary (questioning)
  5. New note added to local state and persisted in localStorage
  6. Export button downloads the updated note YAML for manual git commit
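Step 2 of this list can be sketched as follows. The sketch is in Rust to match the engine, although the composable itself is JavaScript; the function name, the QuoteSelector struct, and the use of character offsets (rather than the UTF-16 offsets a browser selection reports) are assumptions.

```rust
#[derive(Debug, PartialEq, Eq)]
pub struct QuoteSelector {
    pub exact: String,
    pub prefix: String,
    pub suffix: String,
}

/// Builds a selector from a selection given as character offsets into `text`.
/// `context` is the number of characters of prefix and suffix to capture.
pub fn build_selector(text: &str, start: usize, end: usize, context: usize) -> Option<QuoteSelector> {
    // Work on chars so that letters such as 'ë' are never split.
    let chars: Vec<char> = text.chars().collect();
    if start >= end || end > chars.len() {
        return None; // empty or out-of-range selection
    }
    let prefix_from = start.saturating_sub(context);
    let suffix_to = (end + context).min(chars.len());
    Some(QuoteSelector {
        exact: chars[start..end].iter().collect(),
        prefix: chars[prefix_from..start].iter().collect(),
        suffix: chars[end..suffix_to].iter().collect(),
    })
}
```

Near the start or end of an article the context is simply shorter than requested; the uniqueness check in step 3 then decides whether that is enough.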

Amended after implementation. Step 6 is no longer the only way out. The editor can also write notes back through editor-api. The manual export stays for the offline case.

The write runs through the active traject, exactly like law and scenario edits since the traject concept landed (PR #632). A traject is a named, member-scoped editing project with its own federated corpus config; its write_target_for_source map decides which backend (and branch) a given law’s edits land in. The notes sidecar for law_id is written to annotations/{law_id}/annotations.yaml through that same backend, so a note and a law edit made in one session ride the same branch and PR. There is no separate per-session branch and no X-Editor-Session header: the session cookie carries the active traject. With no active traject the save returns 403, the same rule the law and scenario writes follow.

This also settles the federated-target question from Decision 2 without a per-note source picker. An org annotating another org’s law into its own repo configures that as its traject’s writable source; the traject’s routing map then sends the notes there. Routing is a property of the traject, decided once when the traject is set up, not a free per-request override. The earlier ?source= override is gone: a second routing mechanism that could disagree with the traject’s own config was a privilege gap, not federation.

The write is append-only, and this matters. The browser sends only the notes it just created, not a rebuilt file. editor-api reads the current sidecar from the traject branch, appends the new notes (deduplicated by content), and validates the merged document against the schema. The non-shrink property is structural, not a post-hoc size check: the normal path keeps the existing file’s bytes verbatim and only appends, so it cannot drop a note. The one path that rebuilds the file (a base whose annotations is not a readable block sequence, e.g. flow style or non-LF) carries an explicit destructive-shrink guard that refuses rather than rebuild over content it could not parse. An earlier sketch had the browser send the whole file rebuilt from the static /data mirror; that mirror is the corpus at deploy time, not the branch, so a save silently discarded notes other contributors had added. Append-only follows directly from Decision 8: notes layer, they do not conflict, so the write path must never overwrite, only add.
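The append-with-dedup step can be sketched on notes represented as serialised strings. That representation and the function name are assumptions; the byte-preserving handling of the sidecar file itself is not modelled here.

```rust
use std::collections::HashSet;

/// Existing notes stay in place and in order; only unseen notes are appended.
/// The result can never be shorter than `existing`.
pub fn append_notes(existing: &[String], incoming: &[String]) -> Vec<String> {
    let mut seen: HashSet<&str> = existing.iter().map(String::as_str).collect();
    let mut merged: Vec<String> = existing.to_vec();
    for note in incoming {
        // `insert` returns false when the content is already present.
        if seen.insert(note.as_str()) {
            merged.push(note.clone());
        }
    }
    merged
}
```

Because the function only ever pushes, a stale or partial view in the browser can at worst cause a duplicate to be skipped or kept, never an existing note to disappear.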

Known limitation: read-your-writes holds within a single editor-api process (the traject’s checkout accumulates its own commits on disk), but the checkout is cloned once per traject and not pulled before each read. Under horizontal scaling, or if the traject branch is updated out-of-band, a replica can read a stale base. Append-only bounds the blast radius — the worst case is a dedup miss or a non-fast-forward push that fails loudly, not the silent loss of others’ notes the /data mirror caused — but a strict cross-replica guarantee needs a pull-before-read (or a single-writer assumption) and is not yet built.

Every note’s target.source must resolve to the law the path names. The schema allows any string there, so this is checked explicitly: a note whose source is absent or not a regelrecht://{law_id} URI is rejected, not skipped. It is the note-side counterpart of the $id/path guard on save_law.
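A minimal sketch of that guard, assuming the target source must equal regelrecht://{law_id} exactly. SourceError and both function names are hypothetical; the point is that one bad note fails the whole save instead of being dropped.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum SourceError {
    Missing,
    Mismatch { expected: String, found: String },
}

/// Checks one note's `target.source` against the law the sidecar path names.
pub fn check_target_source(source: Option<&str>, law_id: &str) -> Result<(), SourceError> {
    let expected = format!("regelrecht://{law_id}");
    match source {
        None => Err(SourceError::Missing),
        Some(found) if found == expected => Ok(()),
        Some(found) => Err(SourceError::Mismatch { expected, found: found.to_string() }),
    }
}

/// Rejects the whole batch on the first failing note and reports its index.
pub fn check_all(sources: &[Option<&str>], law_id: &str) -> Result<(), (usize, SourceError)> {
    for (index, source) in sources.iter().enumerate() {
        check_target_source(*source, law_id).map_err(|error| (index, error))?;
    }
    Ok(())
}
```

Returning the index of the offending note lets the caller point the author at the exact entry that was refused.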

The content-review half of Decision 8’s two-layer model relies on branch protection requiring code-owner review on main. A CODEOWNERS entry for corpus/annotations/ exists, but on its own it only requests a reviewer; it gates a merge only once the repository enables require_code_owner_reviews. Until then the schema/resolve checks in CI are the only enforced layer.

Known limitation: schema drift on an existing sidecar. The merged file is validated against the current schema before it is written. If the sidecar already on the branch contains a note that no longer satisfies the schema (a version bump landed without migrating that file), the append cannot proceed: writing it would commit a file CI then rejects. The write path validates the existing file separately and returns a distinct 409 (“the file is itself invalid, this is not your note”) rather than the generic note-invalid 400, so the author is not sent chasing a fault in their own valid note. The fix is to repair or migrate the offending file; this is rare and a hard, clearly-attributed stop is preferred over silently appending into a file that will fail the gate anyway.

Editor layout

The right pane’s segmented control gains a third option, labelled Notities:

┌──────────────────────────────────────────────────────────┐
│  [ Machine ]  [ YAML ]  [ Notities ]                     │
├──────────────────────────────────────────────────────────┤
│  Notities voor Artikel 2                                 │
│                                                          │
│  ┌─ Gezaghebbend (Dienst Toeslagen) ──────────────────┐ │
│  │ ► "zorgtoeslag ter grootte van dat verschil"        │ │
│  │   → #hoogte_zorgtoeslag (linking)                   │ │
│  └─────────────────────────────────────────────────────┘ │
│  ┌─ Adviserend (Gemeente Amsterdam) ──────────────────┐  │
│  │ ► "zorgtoeslag"  "Core provision..." (commenting)   │ │
│  └─────────────────────────────────────────────────────┘ │
│  ┌─ Vraag (interpreter-team) ────────────────────────┐  │
│  │ ► "bij ministeriële regeling"                       │ │
│  │   [open-norm-partial]  workflow: open               │ │
│  └─────────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────┘

Why

Benefits

  • Git gives full note history for free. Every change is a commit with author, timestamp, and diff. git blame shows who added each note and when.

  • Sidecar files preserve verbatim legal text. The law YAML is never modified (RFC-005 requirement). Essential for legal integrity: the source must be identical to the official publication.

  • Federated model lets every organisation bring its own notes. A municipality can annotate national laws from its own perspective without touching the central corpus. This mirrors RFC-010.

  • Authority model reflects legal reality. The competent authority’s interpretation carries more weight, as in administrative law. Authority is derived from existing schema fields (competent_authority), following RFC-009.

  • Linking notes make interpretation auditable. Every connection between text and machine_readable logic is an explicit, reviewable note. This supports the reasoning requirement (motiveringsplicht, Awb 3:46): the chain from law text to computation can be inspected by citizens, courts, and oversight bodies.

  • Ambiguity is tracked without freezing a vocabulary. Interpretation research can label open norms and missing documents today and refine the categories later, without schema migrations.

  • W3C standard enables interoperability. The format works with Hypothesis, Apache Annotator, and Recogito. External parties can produce notes without the regelrecht editor.

  • Rust resolver is the single source of truth. The same resolver runs in the engine (CI validation, server-side) and the browser (WASM). No divergence between what the editor shows and what the engine validates.
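As an illustration of the authority bullet above, the derivation from competent_authority could look roughly like this. It is a sketch under stated assumptions: `AuthorityLayer` and `derive_layer` are hypothetical names, and matching the note's creator against the law's competent_authority by exact string comparison is an assumption; the real rule may compare identifiers differently.

```rust
/// Layer a note is displayed in. Hypothetical type, for illustration only.
#[derive(Debug, PartialEq, Eq)]
pub enum AuthorityLayer {
    /// The note comes from the law's competent authority.
    Authoritative,
    /// The note comes from any other organisation.
    Advisory,
    /// The law declares no competent_authority, so the two
    /// layers above cannot be told apart.
    Undetermined,
}

/// Derive the layer from the law's `competent_authority` field and
/// the organisation that created the note.
/// Assumption: organisations are compared by exact string equality.
pub fn derive_layer(competent_authority: Option<&str>, note_creator: &str) -> AuthorityLayer {
    match competent_authority {
        None => AuthorityLayer::Undetermined,
        Some(authority) if authority == note_creator => AuthorityLayer::Authoritative,
        Some(_) => AuthorityLayer::Advisory,
    }
}
```

With the organisations from the editor mock-up, a note by Dienst Toeslagen on a law whose competent_authority is Dienst Toeslagen would come out as `Authoritative`, and a note by Gemeente Amsterdam on the same law as `Advisory`.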

Tradeoffs

  • Two places to look. A law’s notes live in a separate directory from its YAML source. Tooling must load both. Convention-based discovery keeps this straightforward but it is an extra step.

  • No real-time collaboration. Notes are Git-based. Concurrent editing requires branches and pull requests. Acceptable for the MVP; production may need a collaboration layer.

  • WASM adds build complexity. The wasm-pack build step is additional. The engine already supports WASM compilation, so the incremental cost is low.

  • Authority derivation requires competent_authority in the law. Laws without a declared competent_authority cannot distinguish authoritative from advisory notes. This is an existing gap (RFC-009 also depends on it).

  • Tag vocabulary is soft-validated. A typo in a tagging value produces a warning, not a hard failure. Deliberate: it keeps the vocabulary growable without ceremony.
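The soft validation described in the last bullet can be sketched as a check that collects warnings instead of returning an error. The function name `check_tags` and the warning text are hypothetical; only the tag value `open-norm-partial` appears in this RFC, and the vocabulary is passed in as a plain slice rather than loaded from the vocabulary file.

```rust
/// Return one warning per tag that is not in the controlled vocabulary.
/// An empty result means every tag is known. Nothing here fails hard:
/// the caller decides how to surface the warnings.
/// Hypothetical helper, for illustration only.
pub fn check_tags(vocabulary: &[&str], tags: &[&str]) -> Vec<String> {
    tags.iter()
        .copied()
        .filter(|tag| !vocabulary.contains(tag))
        .map(|tag| format!("warning: tag '{}' is not in the vocabulary", tag))
        .collect()
}
```

Because the result is a list of warnings rather than a `Result`, adding a new tag to the vocabulary never requires touching the validator, and a mistyped tag is still reported to the author.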

Alternatives Considered

Notes inside law YAML. Embed an annotations array in the law file. Rejected: modifies the verbatim source (RFC-005 forbids this) and creates noisy Git diffs where note changes obscure law changes.

Database storage. Store notes in PostgreSQL (the pipeline already uses it). Rejected for MVP: introduces a backend dependency where none exists, and loses Git’s built-in versioning, blame, and merge workflow. A database layer can be added later without changing the format.

JavaScript-only resolver. Implement TextQuoteSelector resolution in JS. Rejected: duplicates the fuzzy matching algorithm. The engine needs resolution for CI validation; a separate JS implementation would diverge. WASM compiles the same Rust code for the browser.

Centralised note authority. Designate one organisation as the note authority per law. Rejected: does not match RFC-010’s federated model or RFC-009’s multi-org reality. The layering model accommodates multiple legitimate annotators.

Dedicated regelrecht:ambiguity enum field. Add a validated enum to the schema for ambiguity states. Rejected: the state set is still emerging; an enum makes every new state a schema version bump plus migration plus RFC amendment. The tagging-body plus controlled-vocabulary approach gives queryability and typo detection without forking the W3C model.

Implementation Notes

Implementation is planned in seven phases (Phase 0 to Phase 6). Phase 0 is the review gate; each later phase delivers a working whole.

Phase 0: RFC review. RFC-005 and this RFC accepted and merged before code lands.

Phase 1: Rust resolver + BDD tests. Implement the annotation module (types.rs, resolver.rs). Port the 8 BDD scenarios. Add strsim to Cargo.toml.

Phase 2: Note schema + first notes. Create schema/v0.5.2/annotation-schema.json. Write the first real linking and commenting notes for the Healthcare Allowance Act article 2 plus the ambiguity vocabulary. Add validate-annotations to the Justfile.

Phase 3: WASM bindings. Add resolve_annotation() and resolve_annotations() to WasmEngine.

Phase 4: Frontend display. Create useNotes.js and AnnotatedText.vue. Integrate behind a feature flag. Update the build script to copy note files.

Phase 5: Frontend creation. Create useTextSelection.js and NoteCreator.vue.

Phase 6: CI validation + ambiguity use case. Add validate-annotations to the quality gate (orphaned notes as warnings). Add ambiguity BDD scenarios and one concrete ambiguity note in the corpus.
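To make Phases 1 and 6 concrete, here is a deliberately simplified sketch of TextQuoteSelector resolution and orphan detection. It is not the planned resolver: it only does exact matching with prefix/suffix disambiguation and omits the fuzzy matching for which strsim is added. The names `TextQuoteSelector` (as a Rust struct), `resolve`, and `orphaned_indices` are hypothetical; the `exact`, `prefix`, and `suffix` fields follow the W3C selector.

```rust
/// Simplified W3C TextQuoteSelector: the quoted text plus optional context.
pub struct TextQuoteSelector<'a> {
    pub exact: &'a str,
    pub prefix: Option<&'a str>,
    pub suffix: Option<&'a str>,
}

/// Byte offset of the first occurrence of `exact` whose surrounding text
/// matches `prefix` and `suffix` (when given). `None` means the selector
/// no longer resolves, i.e. the note is orphaned.
/// Assumption: exact matching only; no fuzzy fallback.
pub fn resolve(text: &str, sel: &TextQuoteSelector<'_>) -> Option<usize> {
    if sel.exact.is_empty() {
        return None;
    }
    text.match_indices(sel.exact)
        .map(|(start, _)| start)
        .find(|&start| {
            let end = start + sel.exact.len();
            let prefix_ok = sel.prefix.map_or(true, |p| text[..start].ends_with(p));
            let suffix_ok = sel.suffix.map_or(true, |s| text[end..].starts_with(s));
            prefix_ok && suffix_ok
        })
}

/// Indices of the selectors that do not resolve against `text`.
/// A CI step could report these as warnings rather than failures.
pub fn orphaned_indices(text: &str, selectors: &[TextQuoteSelector<'_>]) -> Vec<usize> {
    selectors
        .iter()
        .enumerate()
        .filter(|(_, sel)| resolve(text, sel).is_none())
        .map(|(index, _)| index)
        .collect()
}
```

The prefix and suffix are what let a short quote such as "zorgtoeslag" point at one specific occurrence in an article that uses the word several times; without them the first occurrence wins.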

Affected components

| File/Directory | Change |
|---|---|
| packages/engine/src/annotation/ | New: resolver module (mod.rs, types.rs, resolver.rs) |
| packages/engine/src/lib.rs | Add pub mod annotation and re-exports |
| packages/engine/src/wasm.rs | Add resolve_annotation(), resolve_annotations() |
| packages/engine/Cargo.toml | Add strsim dependency |
| schema/v0.5.2/annotation-schema.json | New: JSON Schema for note files |
| corpus/annotations/ | New: note sidecar files + _vocabulary/ambiguity.yaml |
| features/notes.feature | New: BDD scenarios (ported + ambiguity) |
| packages/engine/tests/bdd/steps/ | New: note step definitions |
| frontend/src/components/AnnotatedText.vue | New: text with highlights |
| frontend/src/components/NoteCreator.vue | New: note creation form |
| frontend/src/composables/useNotes.js | New: note loading and resolution |
| frontend/src/composables/useTextSelection.js | New: text selection capture |
| frontend/src/EditorApp.vue | Add Notities tab, use AnnotatedText behind a flag |
| frontend/scripts/copy-laws.js | Copy note files to public/data/annotations/ |
| Justfile | Add validate-annotations recipe |

References
