Every day, a new cybersecurity roundup appears on this site covering the most relevant developments in AI security, privacy, and digital sovereignty. The entire process — from collecting articles to publishing the finished piece — is automated. It might look straightforward, but getting here involved solving several non-obvious problems. It was also important to us to build this ethically, respecting the copyright and intellectual property of the authors whose work we reference.
The Pipeline
The system is a Python CLI tool that runs as a single command. It moves through five stages — fetch, classify, summarize, reference, and post-process — each feeding its output into the next.
Fetching
We subscribe to roughly 60 RSS feeds spanning security news outlets, research blogs, vendor threat intelligence teams, privacy organizations, and AI safety researchers. The fetcher pulls all of them concurrently, strips HTML from the content, and produces a structured list of articles with URLs, titles, source names, summaries, and body text. Importantly, we only use content that appears in the RSS feeds themselves.
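The fetch-and-normalize step can be sketched with the standard library alone. Here `fetch_feed` is a hypothetical callable standing in for whatever feed-parsing library does the actual network work, and the field names are illustrative rather than our exact schema:

```python
import re
from concurrent.futures import ThreadPoolExecutor
from html.parser import HTMLParser

class _TextExtractor(HTMLParser):
    """Collects only the text nodes of an HTML fragment."""
    def __init__(self):
        super().__init__()
        self.parts = []
    def handle_data(self, data):
        self.parts.append(data)

def strip_html(html: str) -> str:
    """Remove tags and collapse whitespace."""
    p = _TextExtractor()
    p.feed(html)
    return re.sub(r"\s+", " ", " ".join(p.parts)).strip()

def fetch_all(feed_urls, fetch_feed, max_workers=10):
    """Pull all feeds concurrently and return a flat list of article dicts.
    fetch_feed(url) -> list of raw entry dicts (hypothetical signature)."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        per_feed = pool.map(fetch_feed, feed_urls)
    articles = []
    for entries in per_feed:
        for e in entries:
            articles.append({
                "url": e.get("link", ""),
                "title": e.get("title", ""),
                "source": e.get("source", ""),
                "summary": strip_html(e.get("summary", "")),
                "body": strip_html(e.get("content", "")),
            })
    return articles
```

A thread pool is a good fit here because the work is I/O-bound: sixty feeds fetched sequentially would dominate the pipeline's runtime.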
From there, we filter out articles shorter than 200 characters (typically link posts or bare announcements with no substance), deduplicate by URL, and narrow the results to articles published on the target date. On a typical day, this yields between 10 and 60 candidate articles.
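The filtering step is plain deterministic Python. A minimal sketch, assuming articles are dicts with `url`, `body`, and `published` fields (illustrative names, not our exact schema):

```python
def filter_candidates(articles, target_date):
    """Drop short articles, dedupe by URL, keep only the target date."""
    seen, kept = set(), []
    for a in articles:
        if len(a.get("body", "")) < 200:
            continue  # link posts / bare announcements with no substance
        if a["url"] in seen:
            continue  # duplicate URL
        if a.get("published") != target_date:
            continue  # not published on the target date
        seen.add(a["url"])
        kept.append(a)
    return kept
```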
Classification
Not every article from those 60 feeds is relevant to our focus. We use GPT-4o-mini to score each article from 0 to 10 based on our core interests: AI safety, EU AI Act developments, privacy and digital sovereignty, and secure development practices. The model receives the title, summary, and the first 1,000 characters of the body when available.
Articles scoring 3 or above make the cut. We cap the list at 30 to keep the roundup focused and costs reasonable. GPT-4o-mini is fast and inexpensive enough for this triage step — there’s no need to deploy a frontier model just to decide whether a Chrome patch article is relevant. We also benchmarked this approach against traditional classification methods and found that the LLM scored more accurately for our use case.
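The triage logic reduces to a threshold and a cap. In this sketch, `score_fn` stands in for the GPT-4o-mini call, and sorting by score before applying the cap is an assumption on our part about how the cutoff should behave:

```python
def select_articles(articles, score_fn, threshold=3, cap=30):
    """score_fn(article) -> int in 0..10 (e.g. an LLM relevance score).
    Keep articles at or above the threshold, best-scoring first, capped."""
    scored = [(score_fn(a), a) for a in articles]
    kept = [(s, a) for s, a in scored if s >= threshold]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [a for _, a in kept[:cap]]
```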
Summarization
This is where the core writing happens. We send all surviving articles to GPT-4.1 with a system prompt that instructs it to produce a thematic news roundup — not a list of individual summaries, but flowing prose that groups related stories under coherent section headings. We use a more capable model here because the quality of the narrative matters.
Early on, we asked the summarization model to insert source citations inline as it wrote. This consistently led to hallucinated or misattributed references. Splitting citation handling into its own stage solved the problem entirely.
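The shape of the summarization call looks roughly like the sketch below. The prompt text is illustrative, not our production prompt — but it captures the two lessons above: ask for thematic prose, and explicitly forbid citations, which are handled downstream:

```python
SYSTEM_PROMPT = """You are writing a daily cybersecurity news roundup.
Group related stories under coherent section headings and write flowing
prose that connects them. Do NOT summarize articles one by one, and do
NOT insert citations or references -- those are added in a later stage."""

def build_messages(articles):
    """Assemble the chat messages for the summarization model."""
    corpus = "\n\n".join(
        f"[{i + 1}] {a['title']} ({a['source']})\n{a['body']}"
        for i, a in enumerate(articles)
    )
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": corpus},
    ]
```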
Reference Insertion
A separate LLM call receives the finished narrative alongside the list of source articles. Its sole task is to insert numbered citation markers at appropriate positions in the text and append a Sources section at the bottom. Giving the model a single, focused job improved citation accuracy dramatically.
We validate the output deterministically after the LLM returns it, stripping any citations that point to nonexistent articles — a guardrail against the occasional hallucinated reference number.
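The guardrail itself needs no AI. A minimal sketch, assuming citation markers take the form `[n]` (the exact marker syntax is an assumption):

```python
import re

def validate_citations(text: str, num_articles: int) -> str:
    """Strip citation markers like [12] that point past the source list."""
    def keep_or_drop(m):
        n = int(m.group(1))
        return m.group(0) if 1 <= n <= num_articles else ""
    return re.sub(r"\[(\d+)\]", keep_or_drop, text)
```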
Post-Processing
The final stage involves no AI at all. We convert the inline citation markers into clickable anchor links, extract section headings to generate topic tags, and build YAML frontmatter with the title, date, tags, and summary. The result is a publish-ready markdown file.
Orchestration
A shell script ties the pipeline together. It invokes the Python tool, which writes a markdown file directly into the website’s content directory. The script then commits the new file to git and, optionally, deploys the site to production.
What We Learned
Decompose the AI work. Splitting the pipeline into separate stages — classify, summarize, reference — rather than asking a single model to handle everything in one pass made the biggest difference in output quality. Each stage gets a focused prompt and the right model for the job. Classification doesn’t need a frontier model; summarization does.
Add deterministic guardrails. LLMs will occasionally cite article 15 when only 7 exist, or wrap perfectly valid markdown in code fences. Every boundary between an LLM stage and the rest of the system needs validation logic that catches and corrects these errors programmatically.
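The code-fence case is a good example of how small these guardrails can be. A sketch of the unwrapping check (the exact heuristic we use may differ):

```python
FENCE = "`" * 3  # a Markdown code fence

def strip_fence_wrapper(text: str) -> str:
    """Unwrap output a model mistakenly wrapped in a code fence."""
    lines = text.strip().splitlines()
    if (len(lines) >= 2
            and lines[0].startswith(FENCE)
            and lines[-1].strip() == FENCE):
        return "\n".join(lines[1:-1])
    return text
```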
Log everything. We write every stage’s input and output to a session log. When a roundup comes out poorly, we can trace exactly what each model received and returned. This has been invaluable for debugging prompt issues and catching regressions.
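The session log doesn't need anything fancy. A minimal sketch using JSON Lines (one record per stage input or output; the file name and record shape are illustrative):

```python
import json
from pathlib import Path

def log_stage(session_dir: Path, stage: str, role: str, payload: str) -> None:
    """Append one stage's input or output to the session log."""
    session_dir.mkdir(parents=True, exist_ok=True)
    record = {"stage": stage, "role": role, "payload": payload}
    with (session_dir / "session.jsonl").open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
```

An append-only format like this makes post-mortems easy: when a roundup comes out wrong, you can replay exactly what each model saw, in order.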
Cost
The full pipeline costs roughly $0.05–0.15 per run, depending on how many articles pass classification. GPT-4o-mini for classification is negligible. The two GPT-4.1 calls — summarize and reference — account for nearly all the cost. At that price point, running it daily is an easy decision.