Skip to content

Corpus Library

The corpus library is a shared Rust crate for loading and managing the regulation corpus. It abstracts over multiple source types (local filesystem and GitHub repositories) and handles YAML parsing, registry management, and schema validation.

Overview

  • Language: Rust
  • Location: packages/corpus/
  • Type: Library crate (used by other packages)

What it does

The corpus library provides a single API for loading law files regardless of where they are stored. It reads the corpus-registry.yaml manifest, authenticates with remote sources, fetches YAML files, and parses them into typed Rust structures.

Other packages use it:

  • The editor-api uses it to serve law files to the frontend
  • The admin uses it to proxy corpus data to the dashboard
  • The engine uses the parsed output for execution

Key modules

ModulePurpose
registry.rsCorpusRegistry - loads corpus-registry.yaml, merges local overrides
source_map.rsSourceMap - maps law IDs to parsed regulation YAML
models.rsSource, SourceType (Local/GitHub), RegistryManifest
github.rsGitHubFetcher - fetches YAML via GitHub API (feature-gated)
validation.rsSchema validation against the JSON schema
auth.rsToken management for private repositories

Usage

rust
use regelrecht_corpus::CorpusRegistry;

let registry = CorpusRegistry::load("corpus-registry.yaml")?;
let source_map = registry.load_all().await?;
let law = source_map.get("zorgtoeslagwet");

The github feature flag enables remote fetching from GitHub repositories. Without it, only local filesystem sources are available.

Further reading