How We Built an Automated News Roundup with AI

Every day, a new cybersecurity roundup appears on this site covering the most relevant developments in AI security, privacy, and digital sovereignty. The entire process — from collecting articles to publishing the finished piece — is automated. It might look straightforward, but getting here involved solving several non-obvious problems. It was also important to us that we build this ethically, respecting the copyright and intellectual property of the authors whose work we reference.

The Pipeline

The system is a Python CLI tool that runs as a single command. It moves through five stages — fetch, classify, summarize, reference, and post-process — each feeding its output into the next.
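The stage chaining can be sketched as a simple composition. Everything here is illustrative — the function names and signatures are stand-ins, not the tool's actual API:

```python
# Hypothetical sketch of the five-stage pipeline. The stage functions are
# passed in as parameters; they stand in for the tool's real implementations.
def run_pipeline(target_date, fetch, classify, summarize, reference, post_process):
    articles = fetch(target_date)            # pull and normalize RSS items
    relevant = classify(articles)            # keep only on-topic articles
    narrative = summarize(relevant)          # thematic prose, no citations yet
    cited = reference(narrative, relevant)   # insert [N] markers + Sources section
    return post_process(cited)               # publish-ready markdown
```

Keeping each stage a plain function with a plain input and output is what makes the later logging and validation steps straightforward.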

Fetching

We subscribe to roughly 60 RSS feeds spanning security news outlets, research blogs, vendor threat intelligence teams, privacy organizations, and AI safety researchers. The fetcher pulls all of them concurrently, strips HTML from the content, and produces a structured list of articles with URLs, titles, source names, summaries, and body text. Importantly, we only use content that appears in the RSS feeds themselves.

From there, we filter out articles shorter than 200 characters (typically link posts or bare announcements with no substance), deduplicate by URL, and narrow the results to articles published on the target date. On a typical day, this yields between 10 and 60 candidate articles.
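The filtering step is purely deterministic. A minimal sketch, assuming articles are dicts with `url`, `body`, and `published` keys (a simplification of the real structure):

```python
from datetime import date

MIN_LEN = 200  # shorter items are usually bare link posts or announcements

def filter_candidates(articles, target: date):
    """Drop short items, deduplicate by URL, keep only the target date."""
    seen = set()
    out = []
    for a in articles:
        if len(a["body"]) < MIN_LEN:
            continue
        if a["url"] in seen:
            continue
        if a["published"] != target:
            continue
        seen.add(a["url"])
        out.append(a)
    return out
```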

Classification

Not every article from those 60 feeds is relevant to our focus. We use GPT-4o-mini to score each article from 0 to 10 based on our core interests: AI safety, EU AI Act developments, privacy and digital sovereignty, and secure development practices. The model receives the title, summary, and the first 1,000 characters of the body when available.

Articles scoring 3 or above make the cut. We cap the list at 30 to keep the roundup focused and costs reasonable. GPT-4o-mini is fast and inexpensive enough for this triage step — there’s no need to deploy a frontier model just to decide whether a Chrome patch article is relevant. We also benchmarked this approach against traditional classification methods and found that the LLM scored more accurately for our use case.
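The threshold-and-cap logic around the model call might look like this. The prompt wording is a paraphrase, and `score_fn` stands in for the GPT-4o-mini call; sorting by score before capping is our assumption about how the cap is applied:

```python
RELEVANCE_THRESHOLD = 3
MAX_ARTICLES = 30

def build_triage_prompt(article):
    """Assemble the classifier input: title, summary, and up to 1,000
    characters of body. The wording here is illustrative, not the real prompt."""
    parts = [f"Title: {article['title']}", f"Summary: {article['summary']}"]
    if article.get("body"):
        parts.append(f"Body: {article['body'][:1000]}")
    return "\n".join(parts)

def triage(articles, score_fn):
    """`score_fn` stands in for the GPT-4o-mini call returning 0-10."""
    scored = [(score_fn(build_triage_prompt(a)), a) for a in articles]
    kept = [a for s, a in sorted(scored, key=lambda t: -t[0])
            if s >= RELEVANCE_THRESHOLD]
    return kept[:MAX_ARTICLES]
```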

Summarization

This is where the core writing happens. We send all surviving articles to GPT-4.1 with a system prompt that instructs it to produce a thematic news roundup — not a list of individual summaries, but flowing prose that groups related stories under coherent section headings. We use a more capable model here because the quality of the narrative matters.

Early on, we asked the summarization model to insert source citations inline as it wrote. This consistently led to hallucinated or misattributed references. Splitting citation handling into its own stage solved the problem entirely.
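With citations out of scope, the summarization request reduces to a system prompt plus a numbered digest of the surviving articles. A sketch of how the call might be assembled — the system prompt below paraphrases the instructions described above, not the tool's actual prompt:

```python
SUMMARY_SYSTEM_PROMPT = (
    "Write a thematic news roundup: flowing prose that groups related "
    "stories under coherent section headings. Do not insert citations."
)  # paraphrase of the real prompt, which is not reproduced here

def build_summary_messages(articles):
    """Chat-style messages for the summarization call (e.g. GPT-4.1)."""
    digest = "\n\n".join(
        f"[{i}] {a['title']} ({a['source']})\n{a['body']}"
        for i, a in enumerate(articles, start=1)
    )
    return [
        {"role": "system", "content": SUMMARY_SYSTEM_PROMPT},
        {"role": "user", "content": digest},
    ]
```

Numbering the articles in the digest also gives the later reference stage a stable way to refer to sources.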

Reference Insertion

A separate LLM call receives the finished narrative alongside the list of source articles. Its sole task is to insert numbered citation markers at appropriate positions in the text and append a Sources section at the bottom. By giving the model a single, focused job, citation accuracy improved dramatically.

We validate the output deterministically after the LLM returns it, stripping any citations that point to nonexistent articles — a guardrail against the occasional hallucinated reference number.
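That guardrail is a few lines of deterministic code. A minimal sketch, assuming citations appear as `[N]` markers in the text:

```python
import re

CITATION = re.compile(r"\[(\d+)\]")

def strip_invalid_citations(text: str, n_sources: int) -> str:
    """Remove [N] markers whose N falls outside 1..n_sources."""
    def repl(m):
        n = int(m.group(1))
        return m.group(0) if 1 <= n <= n_sources else ""
    return CITATION.sub(repl, text)
```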

Post-Processing

The final stage involves no AI at all. We convert the inline citation markers into clickable anchor links, extract section headings to generate topic tags, and build YAML frontmatter with the title, date, tags, and summary. The result is a publish-ready markdown file.
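The marker-to-link conversion and frontmatter assembly can be sketched as follows. The `#ref-N` anchor scheme and the frontmatter field names are assumptions for illustration, not necessarily what the tool emits:

```python
import re
from datetime import date

def linkify_citations(text: str) -> str:
    """Turn each [N] marker into a markdown link to an in-page anchor.
    The #ref-N anchor naming is an assumed convention."""
    return re.sub(
        r"\[(\d+)\]",
        lambda m: f"[\\[{m.group(1)}\\]](#ref-{m.group(1)})",
        text,
    )

def build_frontmatter(title: str, day: date, tags, summary: str) -> str:
    """YAML frontmatter for the generated post; field names are illustrative."""
    return (
        "---\n"
        f'title: "{title}"\n'
        f"date: {day.isoformat()}\n"
        f"tags: [{', '.join(tags)}]\n"
        f'summary: "{summary}"\n'
        "---\n"
    )
```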

Orchestration

A shell script ties the pipeline together. It invokes the Python tool, which writes a markdown file directly into the website’s content directory. The script then commits the new file to git and, optionally, deploys the site to production. We verify the generated content before publishing.

What We Learned

Decompose the AI work. Splitting the pipeline into separate stages — classify, summarize, reference — rather than asking a single model to handle everything in one pass made the biggest difference in output quality. Each stage gets a focused prompt and the right model for the job. Classification doesn’t need a frontier model; summarization does.

Add deterministic guardrails. LLMs will occasionally cite article 15 when only 7 exist, or wrap perfectly valid markdown in code fences. Every boundary between an LLM stage and the rest of the system needs validation logic that catches and corrects these errors programmatically.

Log everything. We write every stage’s input and output to a session log (a JSON file per run). When a roundup comes out poorly, we can trace exactly what each model received and returned. This has been invaluable for debugging prompt issues and catching regressions.

Cost

The full pipeline costs roughly $0.05–0.15 per run, depending on how many articles pass classification. GPT-4o-mini for classification is negligible. The two GPT-4.1 calls — summarize and reference — account for nearly all the cost. At that price point, running it daily is an easy decision.
