Jump to content

Nomos: Difference between revisions

From Yusupov's House
No edit summary
No edit summary
 
(One intermediate revision by the same user not shown)
Line 6: Line 6:
| 05_genre        = AI-generated legal document archive
| 05_genre        = AI-generated legal document archive
| 06_language    = Python
| 06_language    = Python
| 07_framework    = [[Django]] 6.0 / [[Wagtail]] 7.3
| 07_framework    = [[Django]] / [[Wagtail]]
| 08_license      = Proprietary
| 08_license      = Proprietary
}}
}}
Line 14: Line 14:
== Technology stack ==
== Technology stack ==


The application is built on Django 6.0.4 with Wagtail 7.3.1 as its content management framework and [[SQLite]] as its development database.<ref name="requirements">requirements.txt lists Django 6.0.4, wagtail 7.3.1, openai 2.32.0, requests 2.33.1, python-dotenv 1.2.2, beautifulsoup4 4.14.3, and Pillow 12.2.0.</ref> Additional dependencies include the [[OpenAI]] Python client for language-model calls, [[Requests (software)|Requests]] for HTTP communication with the Vlaamse Codex API, [[Beautiful Soup (HTML parser)|Beautiful Soup]] for scraping title inspiration from the Belgisch Staatsblad, [[Pillow (imaging library)|Pillow]] as a Wagtail dependency, and [[python-dotenv]] for environment configuration. The front end uses [[Bootstrap]] 5.3.8 loaded from the jsDelivr CDN with subresource integrity hashes. All user interface text is in Dutch. The planned production deployment targets a Hetzner VPS behind [[Nginx]] with [[Gunicorn]] and [[PostgreSQL]].
The application is built on Django with Wagtail as its content management framework and [[SQLite]] as its development database. Additional dependencies include the [[OpenAI]] Python client for language-model calls, [[Requests (software)|Requests]] for HTTP communication with the Vlaamse Codex API, [[Beautiful Soup (HTML parser)|Beautiful Soup]] for scraping title inspiration from the Belgisch Staatsblad, [[Pillow (imaging library)|Pillow]] as a Wagtail dependency, and [[python-dotenv]] for environment configuration. The front end uses [[Bootstrap]] loaded from the jsDelivr CDN with subresource integrity hashes. All user interface text is in Dutch. The planned production deployment targets a Hetzner VPS behind [[Nginx]] with [[Gunicorn]] and [[PostgreSQL]].


== Data model ==
== Data model ==
Line 62: Line 62:
=== Stage 2: Technical mutation ===
=== Stage 2: Technical mutation ===


The seed document's title (''opschrift'') is sent to the OpenAI Chat Completions API (default model: GPT-5, configurable via the <code>OPENAI_MODEL</code> environment variable). The system prompt instructs the model to behave as an expert in Flemish legislation and to perform a semantic shift: replace the core subject with a plausible but fictional technical equivalent while preserving the exact grammatical structure and bureaucratic tone. If a topic hint was obtained from the Belgisch Staatsblad, it is included as thematic guidance.<ref name="temperature">GPT-5 does not support custom temperature values. All API calls use the model's default temperature (1).</ref>
The seed document's title (''opschrift'') is sent to the OpenAI Chat Completions API (the model is configurable via the <code>OPENAI_MODEL</code> environment variable). The system prompt instructs the model to behave as an expert in Flemish legislation and to perform a semantic shift: replace the core subject with a plausible but fictional technical equivalent while preserving the exact grammatical structure and bureaucratic tone. If a topic hint was obtained from the Belgisch Staatsblad, it is included as thematic guidance.


=== Stage 3: Administrative drafting ===
=== Stage 3: Administrative drafting ===
Line 89: Line 89:
=== Text sanitisation ===
=== Text sanitisation ===


All generated text is passed through a sanitisation function that strips Unicode control characters (C0/C1 range, excluding newlines and tabs), applies NFC normalisation, removes empty list items and orphaned list wrappers from the HTML, and cleans whitespace artifacts. This addresses a known issue where GPT-5 occasionally emits ASCII control characters in place of Unicode punctuation.
All generated text is passed through a sanitisation function that strips Unicode control characters (C0/C1 range, excluding newlines and tabs), applies NFC normalisation, removes empty list items and orphaned list wrappers from the HTML, and cleans whitespace artifacts. This addresses a known issue where some language models occasionally emit ASCII control characters in place of Unicode punctuation.


=== Persistence ===
=== Persistence ===

Latest revision as of 15:42, 20 April 2026

Infobox
nameNomos
urlhttps://nomos.yusupov.cloud
developerMichel Vuijlsteke
released2026
genreAI-generated legal document archive
languagePython
frameworkDjango / Wagtail
licenseProprietary

Nomos is a web application hosted at nomos.yusupov.cloud that generates and publishes AI-created Flemish administrative legislation. Each day, the system fetches a real document from the Vlaamse Codex open-data API, performs a semantic shift on its subject matter using a large language model, and produces a structurally faithful but entirely fictional legal text — a decree, ministerial order, or circular — that reads as if it were published in the Belgisch Staatsblad. The generated documents are stored in a Wagtail content management system and presented through a GOV.UK-inspired Dutch-language public interface. The name Nomos derives from the Greek word νόμος, meaning "law."

Technology stack

The application is built on Django with Wagtail as its content management framework and SQLite as its development database. Additional dependencies include the OpenAI Python client for language-model calls, Requests for HTTP communication with the Vlaamse Codex API, Beautiful Soup for scraping title inspiration from the Belgisch Staatsblad, Pillow as a Wagtail dependency, and python-dotenv for environment configuration. The front end uses Bootstrap loaded from the jsDelivr CDN with subresource integrity hashes. All user interface text is in Dutch. The planned production deployment targets a Hetzner VPS behind Nginx with Gunicorn and PostgreSQL.

Data model

The data model uses Wagtail's page tree. A singleton DecreeIndexPage (limited to max_count = 1) serves as the parent of all generated documents. Each document is a DecreePage with the following fields:

  • instrument — one of four types: Decreet, Besluit, Omzendbrief, or Reglement.
  • full_title (TextField) — the complete title, unlimited in length. The standard Wagtail title field (255-character limit) holds a truncated copy for internal use.
  • body (RichTextField) — the full HTML text of the generated legislation.
  • publication_date (DateField) — set to the generation date.
  • status — either Geldig (valid) or Gearchiveerd (archived); defaults to Geldig.
  • seed_id (IntegerField, unique, nullable) — the numeric ID of the source document in the Vlaamse Codex, used for deduplication to ensure no seed is used twice.
  • seed_reference (URLField) — direct API link to the source document.
  • seed_document (TextField) — human-readable label recording the type and title of the source document.
  • generation_notes (TextField) — an LLM-generated human-readable summary describing how the generated document differs from its source.
  • revision_notes (TextField) — an LLM-generated summary of the subtle details introduced during the revision stage (see below); empty if the revision was rejected or produced no changes.

Full-text search is indexed on full_title and body via Wagtail's database search backend.

Generation pipeline

Generation is driven by the generate_decree management command, which invokes a multi-stage pipeline implemented in nomos/services/generator.py. The pipeline retries up to three times if validation fails.

Stage 1: Structural sourcing

The pick_random_seed() function in nomos/services/codex.py fetches up to 200 recent documents from the Vlaamse Codex open-data API (codex.opendata.api.vlaanderen.be). Candidates are filtered to four allowed document types:

Codex type Mapped instrument
Decreet DECREET
Besluit van de Vlaamse Regering BESLUIT
Ministerieel besluit BESLUIT
Omzendbrief OMZENDBRIEF

Documents whose seed_id already exists in the database are excluded. The remaining candidates are grouped by type, and a weighted random selection favours types that have been used less frequently in the preceding seven days. For each recently used instrument, the selection weight is reduced by two per occurrence (minimum weight of 1). Once a seed is selected, the system fetches its full detail and chapter/section structure from the API.

Topic inspiration

Before generating, the pipeline scrapes the Belgisch Staatsblad website (ejustice.just.fgov.be) for a random document title to use as thematic inspiration. To avoid reusing the same inspiration across runs, the scraper selects a random publication date from the past 90 days rather than always fetching the current edition. Federal and national references in the scraped title are replaced with Flemish equivalents using a table of 17 substitution pairs — for example, "Federale Overheidsdienst" becomes "Vlaamse overheidsdienst," "Koninklijk besluit" becomes "Besluit van de Vlaamse Regering," and "België" becomes "Vlaanderen." If the scrape fails, generation proceeds without a topic hint.

Stage 2: Technical mutation

The seed document's title (opschrift) is sent to the OpenAI Chat Completions API (the model is configurable via the OPENAI_MODEL environment variable). The system prompt instructs the model to behave as an expert in Flemish legislation and to perform a semantic shift: replace the core subject with a plausible but fictional technical equivalent while preserving the exact grammatical structure and bureaucratic tone. If a topic hint was obtained from the Belgisch Staatsblad, it is included as thematic guidance.

Stage 3: Administrative drafting

The tilted title, the seed document's full text (up to 4,000 characters), and its structural outline are sent to a second API call. The system prompt instructs the model to act as a legislative jurist of the Flemish government and to rewrite the source document about the new subject. Strict structural parity rules are enforced:

  • The output must contain the exact same number of chapters (hoofdstukken), sections (afdelingen), and articles (artikelen) as the source.
  • If the source has no chapter divisions, the output must not introduce them.
  • The total length must be comparable to the source.
  • Content must be entirely original — only the form is emulated.

The output is requested as clean HTML using <h2>, <h3>, <p>, <ol>, and <li> elements, without a top-level heading (which is rendered separately on the page).

Stage 3b: Subtle revision

After drafting, the full text is sent to an additional API call that introduces defamiliarisation through precision: the model is instructed to locate 3 to 5 passages dealing with execution, control, materials, or conditions, and to make a single local detail in each slightly more specific or procedural than necessary — for example, adding an unexpectedly precise measurement, a format requirement, or a procedural substep. The changes must be strictly additive; no text may be removed or truncated. Two programmatic guards run before the revision is accepted: an identity check rejects revisions that return the text unchanged, and a length check rejects revisions where the word count drops below 95% of the original (indicating deleted content). If a revision fails either guard, it is retried once. A separate validation call (the revision scrub) then checks whether the revision shifted the main subject, introduced too many or contextually inappropriate details, or deleted content. If the revision is rejected or produced no changes, the unrevised text is used.

Stage 4: Juridical scrub

The generated HTML is submitted to a validation call in which the model acts as a quality controller. It checks for the presence of narrative, poetic, or metaphorical language; references to fiction, imagination, or art; and humor or irony. Documents that do not read as authentic administrative texts are rejected with a reason. If all three attempts fail validation, the pipeline raises an error.

Stage 5: Generation notes

After successful validation, an API call generates a human-readable summary of the transformation. The model is instructed to act as an archivist and to describe in one or two plain-language sentences how the new document's subject differs from the original, without technical jargon. If the subtle revision was accepted, a second call compares the pre- and post-revision texts word by word and produces a bullet-point list of the specific passages that were changed, stored as revision_notes.

Text sanitisation

All generated text is passed through a sanitisation function that strips Unicode control characters (C0/C1 range, excluding newlines and tabs), applies NFC normalisation, removes empty list items and orphaned list wrappers from the HTML, and cleans whitespace artifacts. This addresses a known issue where some language models occasionally emit ASCII control characters in place of Unicode punctuation.

Persistence

The save_decree() function in nomos/services/storage.py creates a DecreePage as a child of the DecreeIndexPage. The slug is derived from the title (truncated to 200 characters) and made unique by appending a numeric suffix if necessary. The page is published immediately via Wagtail's save_revision().publish() mechanism. The publication date is set to the current date and the status defaults to Geldig.

Anti-sameness system

To prevent the archive from becoming repetitive, the system employs two mechanisms:

  • Seed deduplication: the seed_id field (unique integer) ensures that no Vlaamse Codex document is used as a source more than once. Before selecting a seed, the pipeline queries all existing seed_id values and excludes them from the candidate pool.
  • Type-weighted selection: the _get_recent_type_counts() function counts how many times each instrument type has appeared in the last seven days. Types with higher recent counts receive proportionally lower selection weights, encouraging the system to alternate between decrees, orders, and circulars.

Public interface

Index page

The index page lists all published DecreePage children ordered by publication date (newest first). A search bar and an instrument type dropdown filter are provided. Search uses Wagtail's database search backend, querying the full_title and body fields. The type filter applies an exact match on the instrument field. Each list entry displays the full title, publication date, instrument type label, and a colour-coded status tag (green for Geldig, grey for Gearchiveerd).

Document detail page

Each decree page displays:

  • A breadcrumb navigation link back to the index page.
  • The full title as a top-level heading.
  • A GOV.UK-style summary list with key/value rows for document type, publication date, and status (rendered as a tag badge).
  • The full decree body rendered as rich text.

Visual design

The interface is very loosely inspired by the GOV.UK Design System. The base template features a dark masthead with a yellow (#ffe615) accent border, the site name "Nomos" as a navigation link, and a dark/light mode toggle button. The toggle uses CSS custom properties for theming and persists the user's preference in localStorage. In dark mode, the masthead shifts to #1a1a1a, links become lighter, and status tag colours are adjusted for contrast. A responsive layout collapses the summary list key/value pairs into a single column below 576px.

Administration

The application uses the standard Wagtail admin interface. DecreePage content panels expose the instrument type, full title, body, publication date, and status. A separate "Generatie-informatie" settings panel groups the seed ID, seed reference URL, seed document description, generation notes, and revision notes — metadata that is recorded automatically during generation and is accessible to editors but not displayed on the public site.

The Django admin is available for lower-level database access.

Logging

Application logging is configured with two handlers: console output and a rotating file log (nomos.log in the project root). The nomos logger is set to INFO level and records each pipeline stage (seed selection, title tilting, drafting, validation, publication) with document identifiers and truncated titles.

Management commands

Command Purpose
generate_decree Run the full generation pipeline: fetch seed, tilt title, draft decree, apply subtle revision, validate, generate notes, and publish as a Wagtail page
setup_index_page Create the DecreeIndexPage as the site root (idempotent); removes the default Wagtail welcome page if present

Deployment

The planned production deployment targets nomos.yusupov.cloud on a Hetzner VPS running Nginx as a reverse proxy, Gunicorn as the WSGI application server, and PostgreSQL as the production database (replacing SQLite). TLS is to be provided by Let's Encrypt via Certbot. Daily generation is to be scheduled via cron calling python manage.py generate_decree.

See also

References