Brief
Build a public RAG demo on the EU AI Act that converts the Custom RAG Compliance Assistant tier from a sales claim into a clickable artefact. Five working days. The demo must be (a) grounded in the actual regulation text, not the model's training data; (b) cite specific Articles by paragraph; (c) catch and flag hallucinated citations before they reach the user; (d) link every citation back to EUR-Lex at the exact paragraph anchor; (e) ship as a public page with no email gate so it works as a sales artefact.
Discovery & Analysis
The AI Governance service line had three tiers — Sprint (€15K), Subscription (€25K/year), and Custom RAG Compliance Assistant (€60K build + €30K/year ops). The Custom RAG tier had no public proof artefact: just a paragraph of YAML and a price tag. Buyers evaluating a €90K year-one engagement should not be asked to trust on faith. The discovery question was therefore narrow: could a public, clickable, regulator-citeable assistant be built on a real piece of regulation, and made good enough that a Head of GRC at a mid-market company would consider trusting the same architecture on their own policies?
The harder constraint was credibility. ChatGPT-with-the-PDF hallucinates Article numbers under almost any prompt. Lawyers know this and have stopped trusting it. The MVP needed to demonstrate the specific architectural moves that fix the failure mode — paragraph-level chunking with stable anchor IDs, post-retrieval citation verification, EUR-Lex deep-linking — not just claim them.
Design & Development
The architecture is deliberately straightforward — most of the work is upstream of the chat:
• **Paragraph-level chunking, not page-level.** The EUR-Lex HTML uses stable anchor IDs (`art_6`, `art_6.2`) per article and per numbered paragraph. The chunker splits on those (sketched after this list): Article 6(1) becomes one chunk, 6(2) another, and so on. Result: 539 chunks. Each chunk's metadata includes the article number, paragraph number, full title, and the EUR-Lex URL with the right anchor fragment. Annex III items each become their own chunk because the high-risk list is the most-asked-about part of the regulation.
• **Same embedding model at ingest and query.** `gemini-embedding-2-preview` at 768 dimensions, matching the production rag-query function exactly. Mismatched models are the most common cause of bad retrieval in inherited RAG codebases — solved by picking the model once and using it everywhere.
• **topK=8, scoreThreshold=0.55.** Legal text has lower cosine similarity than conversational text: the same query that hits 0.85+ against an FAQ lands at 0.74-0.84 against legal prose. The threshold is tuned against the eval set, not against an arbitrary "looks-right" number.
• **Post-retrieval citation verification.** After Gemini generates the answer, a regex extracts every `Article N(M)?` reference. Each one is looked up against the set of article numbers actually retrieved from Pinecone (the flow is sketched after this list). References not in the set are flagged in the UI with a clear "unverified" banner. This is the load-bearing trust move: it's what makes the demo defensible for a legal audience that has been burned by ChatGPT.
• **System prompt that refuses to invent.** The instructions explicitly tell the model to cite only Articles that appear in the provided context, to refuse if context is insufficient, and to surface uncertainty rather than confabulate. Combined with the citation verification, this drives the false-citation rate to effectively zero on the test set.
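The chunker referenced in the first bullet, reduced to its essentials. A minimal sketch under stated assumptions: `cheerio` stands in for whatever HTML parser the build actually used, the `p.title` selector is illustrative, and the production chunker handles more edge cases (Annex items, recitals) than shown here.

```typescript
// Sketch: paragraph-level chunking keyed on EUR-Lex anchor IDs.
// Assumptions: cheerio as the parser; `p.title` as the title selector.
import { readFileSync } from "node:fs";
import * as cheerio from "cheerio";

interface Chunk {
  id: string;         // stable anchor, e.g. "art_6.2"
  article: string;    // "6"
  paragraph?: string; // "2" for Article 6(2); undefined for article-level chunks
  title: string;
  text: string;
  url: string;        // EUR-Lex deep link that opens at the exact paragraph
}

const EUR_LEX = "https://eur-lex.europa.eu/eli/reg/2024/1689/oj/eng";

export function chunkRegulation(htmlPath: string): Chunk[] {
  const $ = cheerio.load(readFileSync(htmlPath, "utf8"));
  const chunks: Chunk[] = [];
  // One chunk per article/paragraph anchor: art_6, art_6.2, art_6.3, ...
  $('[id^="art_"]').each((_, el) => {
    const id = $(el).attr("id")!;
    const [article, paragraph] = id.replace("art_", "").split(".");
    chunks.push({
      id,
      article,
      paragraph,
      title: $(el).find("p.title").first().text().trim(), // assumed selector
      text: $(el).text().trim(),
      url: `${EUR_LEX}#${id}`, // anchor fragment = paragraph-level deep link
    });
  });
  return chunks;
}
```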
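And the two moves that make the citations defensible — threshold-filtered retrieval and post-generation verification — in one sketch. Assumptions flagged inline: `embed()` is a stand-in for the Gemini embeddings call, the index name is illustrative, and the system prompt wording paraphrases the bullet above rather than quoting the production prompt.

```typescript
// Sketch: retrieval with the tuned parameters, then citation checking.
import { Pinecone } from "@pinecone-database/pinecone";

const TOP_K = 8;
const SCORE_THRESHOLD = 0.55; // tuned against the eval set, not eyeballed

// Illustrative prompt in the spirit of the system-prompt bullet above;
// passed to the Gemini generation call (not shown).
const SYSTEM_PROMPT =
  "Cite only Articles that appear in the provided context. If the context " +
  "is insufficient, refuse. Surface uncertainty; never invent citations.";

// Stand-in for the Gemini embeddings call; the real demo uses
// gemini-embedding-2-preview at 768 dimensions, same model as ingest.
async function embed(text: string): Promise<number[]> {
  throw new Error("wire up the Gemini embeddings API here");
}

export async function retrieve(question: string) {
  const pc = new Pinecone({ apiKey: process.env.PINECONE_API_KEY! });
  const res = await pc
    .index("ai-act") // assumed index name
    .namespace("ai-act-eu")
    .query({ vector: await embed(question), topK: TOP_K, includeMetadata: true });
  // Legal prose scores lower than conversational text; 0.55 keeps weak hits out.
  const matches = (res.matches ?? []).filter(m => (m.score ?? 0) >= SCORE_THRESHOLD);
  const retrievedArticles = new Set(matches.map(m => String(m.metadata?.article)));
  return { matches, retrievedArticles };
}

// After Gemini answers: extract every "Article N" / "Article N(M)" reference
// and flag any whose article number was not actually retrieved.
export function verifyCitations(answer: string, retrievedArticles: Set<string>) {
  const pattern = /Article\s+(\d+)(?:\((\d+)\))?/g;
  return [...answer.matchAll(pattern)].map(m => ({
    reference: m[0],                       // e.g. "Article 6(2)"
    verified: retrievedArticles.has(m[1]), // false = amber banner in the UI
  }));
}
```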
Stack: Gemini embeddings + Pinecone (768-dim cosine, namespace `ai-act-eu`) + Gemini 2.5 Flash for generation, all behind a single Netlify Function. React/TypeScript chat UI forked from the existing RAGChat component to add the per-message citation tray with EUR-Lex links, expandable source paragraphs, and the unverified-citation banner. Source HTML cached locally so ingestion is reproducible offline. 539 chunks ingested in one run from a single CLI script. Full eval set and run-eval harness shipped alongside.
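The run-eval harness has a deliberately small shape. A sketch, assuming an eval-set JSON of `{ question, expectedArticles }` pairs and the `retrieve()` helper from the sketch above; the file and module names are illustrative.

```typescript
// Sketch: a question passes if at least one expected article appears
// in the top-8 retrieved chunks after threshold filtering.
import { readFileSync } from "node:fs";
import { retrieve } from "./retrieve"; // the helper sketched above

interface EvalCase {
  question: string;
  expectedArticles: string[]; // e.g. ["6", "9"]
}

async function runEval(path = "eval-set.json") {
  const cases: EvalCase[] = JSON.parse(readFileSync(path, "utf8"));
  let hits = 0;
  for (const { question, expectedArticles } of cases) {
    const { retrievedArticles } = await retrieve(question);
    if (expectedArticles.some(a => retrievedArticles.has(a))) {
      hits++;
    } else {
      console.log(`MISS: ${question} (expected ${expectedArticles.join(", ")})`);
    }
  }
  console.log(`${hits}/${cases.length} questions retrieved an expected article`);
}

runEval().catch(console.error);
```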
Evaluation
The shipped demo proves four things that ChatGPT-with-the-PDF cannot:
• **Verified citations.** Every "Article N" reference in the answer is regex-extracted post-generation and checked against the article numbers actually retrieved from the vector index. References that aren't in the retrieved set are flagged in the UI with an amber banner. Hand-tested with trap prompts ("What does Article 200 say?" names an Article that does not exist in the regulation): the assistant correctly refuses rather than fabricating an answer.
• **97% retrieval accuracy across a 30-question audit.** The eval set covers Articles 4, 5, 6, 9, 10, 11, 13, 14, 15, 25, 26, 27, 50, 51, 53, 57, 60, 99 and Annex III — the AI Act's highest-stakes provisions. 29 of 30 questions retrieve at least one expected article paragraph in topK=8, usually as the #1 hit at 0.74-0.89 cosine similarity. The single failure is an eval-framing issue (the question asked about *enforcement*, the eval expected the *obligations* article — the retrieval was actually more relevant than the expectation).
• **EUR-Lex paragraph deep-links.** Each retrieved chunk carries an anchor URL like `eur-lex.europa.eu/eli/reg/2024/1689/oj/eng#art_6.2` so the citation tray opens at the exact paragraph quoted, not the regulation landing page. This is what lawyers ask for first.
• **Public, no email gate.** Free to use, free to share. The page itself is the sales artefact: buyers see what the architecture produces before any conversation. Rate limiting is handled at the provider (Gemini quota) tier; abuse mitigation is deferred until volume warrants it.
The same stack — Gemini embeddings, Pinecone, structured citation metadata, post-retrieval verification — powers what we ship to clients on top of *their* policies. The demo is the architecture; the build is what changes per client.
What this means for your organization
Compliance teams already use ChatGPT with the AI Act PDF. The problem isn't access to the text; it's that the answers come back with confidently cited Article numbers that don't actually exist in the regulation, or that exist but say something different. This assistant keeps the same question-answering pattern, but every cited Article is checked against what was actually retrieved before the answer reaches the user. Unverifiable citations get flagged. Verifiable ones link back to the exact paragraph on EUR-Lex.