I Built an AI That Writes AI News Articles — Here's the Full System

Why I Built This

I run a paid article platform called Draft. To drive traffic, I needed weekly AI industry analysis articles. Problem: I don't have time to write them manually every week.

But I also didn't want to just translate RSS feeds — that's not worth reading. I wanted original analysis backed by real data, generated fully automatically at zero cost.

Here's how I built it.

The Setup

Two machines, zero cloud API costs.

mini (Ubuntu Server) — Data Collection

Daily cron job collects from 9 RSS feeds + Hacker News
AI funding data (AI Funding Tracker, Crunchbase)
US federal & state AI legislation (GovTrack API, OpenStates API)
AI company stock prices, volume, sentiment (Trade2 system — ML analysis of all S&P 500 stocks)

M1 Max Mac (64GB RAM) — The Brain

Qwen 2.5 72B running locally via Ollama
Rich article text extraction (trafilatura)
DuckDuckGo web searches
9-phase pipeline generating the final article

No OpenAI API. No Claude API. Everything runs locally.

The monthly cost? $0. Just electricity.

For someone like me who wants to stay on top of AI industry news without paying for API calls, this turned out to be the perfect setup.
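For reference, talking to a local model through Ollama takes very little code. The sketch below assumes Ollama's default local endpoint (http://localhost:11434/api/generate) and a model tag of qwen2.5:72b. The function names and the timeout are illustrative, not the pipeline's actual code.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL = "qwen2.5:72b"  # assumed model tag; use whatever `ollama list` shows


def build_payload(prompt: str, model: str = MODEL) -> dict:
    """Build a non-streaming generate request for Ollama."""
    return {"model": model, "prompt": prompt, "stream": False}


def ask_qwen(prompt: str, timeout_s: int = 1800) -> str:
    """Send one prompt to the local model and return its text reply."""
    data = json.dumps(build_payload(prompt)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    # A 72B model can take many minutes per call, hence the long timeout.
    with urllib.request.urlopen(req, timeout=timeout_s) as resp:
        return json.loads(resp.read())["response"]
```

With "stream" set to False, Ollama returns a single JSON object, so the reply can be read in one go instead of being assembled from chunks.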

The 9-Phase Pipeline

Just telling an LLM "write an article" produces shallow content. I broke down how a human journalist works and turned it into 9 automated steps.

Phase A: Data Aggregation

Pull this week's data from mini's databases, then use trafilatura on the Mac to extract clean full text from each article URL.

From the last run:

News articles: 146 (9 sources)
Funding rounds: 25
AI regulation bills: 46 (federal + state)
Stock data: GOOGL, MSFT, META, NVDA daily prices & volume
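The full-text extraction step in Phase A can be sketched roughly like this. It uses trafilatura's fetch_url and extract functions. The helper names and the injectable extractor argument are my own illustration (the injection just makes the loop testable without network access).

```python
from typing import Callable, Optional


def _fetch_and_extract(url: str) -> Optional[str]:
    """Download a page and return its main text, or None on failure."""
    import trafilatura  # imported lazily so the loop below works without it in tests

    downloaded = trafilatura.fetch_url(url)
    if downloaded is None:
        return None
    return trafilatura.extract(downloaded)


def extract_full_text(
    urls: list[str],
    extractor: Callable[[str], Optional[str]] = _fetch_and_extract,
) -> dict[str, str]:
    """Map each URL to its extracted text, skipping pages that fail."""
    texts: dict[str, str] = {}
    for url in urls:
        try:
            text = extractor(url)
        except Exception:
            text = None  # one bad page should not stop the whole run
        if text:
            texts[url] = text.strip()
    return texts
```

Skipping failures quietly is a deliberate choice here: with well over a hundred URLs per run, a few dead links are expected.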

Phase B: Theme Discovery

Instead of "summarize the news," I ask Qwen to "find the stories the data is telling." Not a list of headlines — big themes (3-4 per week).

Last week's themes:

AI security breaches escalating
Big AI companies making strategic moves
Regulation accelerating
Funding surge in healthcare & robotics

Phase C: Deep Analysis

For each theme, Qwen reads the full article text (up to 3,000 chars per article). RSS titles and summaries aren't enough for deep analysis. Full text gives Qwen access to specific numbers, quotes, and facts.
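A minimal sketch of the per-article cap, assuming each article is a dict with "title" and "text" keys (a hypothetical shape, not the real schema). Cutting at the last space avoids ending the excerpt in the middle of a word.

```python
MAX_CHARS = 3000  # per-article cap described above


def clip(text: str, limit: int = MAX_CHARS) -> str:
    """Trim text to the limit, cutting at the last space when there is one."""
    if len(text) <= limit:
        return text
    cut = text[:limit]
    space = cut.rfind(" ")
    return cut[:space] if space > 0 else cut


def build_theme_context(articles: list[dict], limit: int = MAX_CHARS) -> str:
    """Join clipped articles into one block of prompt context."""
    parts = []
    for art in articles:
        parts.append(f"### {art['title']}\n{clip(art['text'], limit)}")
    return "\n\n".join(parts)
```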

Phase C2-C4: Qwen Says "Look This Up For Me"

This is the most interesting part.

After the initial analysis, I ask Qwen: "What else do you need to know to write a great article?" It responds:

{
  "queries": [
    {
      "query": "Claude Code leak GitHub details 512000 lines",
      "reason": "Need specifics on the scale and impact of the leak"
    },
    {
      "query": "Anthropic political action committee spending 2026",
      "reason": "Need PAC spending details"
    }
  ]
}

Qwen runs offline as a local LLM, so it can't search the web itself. The Mac acts as Qwen's hands, searching DuckDuckGo and fetching full article text with trafilatura.

Qwen: "Look up the Claude Code leak details"
  ↓
Mac: DuckDuckGo search → fetches TechCrunch article (5,093 chars)
  ↓
Qwen: "Got it. 512,000 lines of TypeScript leaked via npm. Updating analysis."

This loops until Qwen says it has enough information, for a maximum of 3 rounds. In the last run, Qwen requested 8 additional searches, and the Mac successfully fetched 7.
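The request-and-fetch loop can be sketched as follows. The JSON shape matches the sample response above. ask_model and search_and_fetch are hypothetical callables standing in for the Qwen call and the DuckDuckGo plus trafilatura step, and treating unparseable output as "no more questions" is my assumption.

```python
import json
from typing import Callable, Optional

MAX_ROUNDS = 3  # cap described above


def parse_queries(reply: str) -> list[str]:
    """Pull the query strings out of the model's JSON reply."""
    try:
        data = json.loads(reply)
    except json.JSONDecodeError:
        return []  # assumption: unparseable output means no more questions
    if not isinstance(data, dict):
        return []
    items = data.get("queries") or []
    return [q["query"] for q in items if isinstance(q, dict) and q.get("query")]


def research_loop(
    analysis: str,
    ask_model: Callable[[str, list], str],
    search_and_fetch: Callable[[str], Optional[str]],
) -> list[dict]:
    """Ask the model what it still needs, fetch it, and repeat up to MAX_ROUNDS."""
    findings: list[dict] = []
    for _ in range(MAX_ROUNDS):
        queries = parse_queries(ask_model(analysis, findings))
        if not queries:
            break  # the model has enough information
        for query in queries:
            text = search_and_fetch(query)
            if text:  # failed fetches are simply dropped
                findings.append({"query": query, "text": text})
    return findings
```

Passing the accumulated findings back into ask_model on each round is what lets the model decide it has enough and return an empty list.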

Phase D: Writing the Article

At this point, Qwen has:

146 news articles (with full text)
4 theme analyses
7 additional web research results
Real stock price and volume data
46 regulation bills

All of this goes into the article generation prompt. Key rules:

Don't write "ai_score rose to 72." Write "GOOGL gained 5.3% for the week, with volume at 71% of the 20-day average." Internal metrics mean nothing to readers.

Never write "according to our additional research." No references to the research process. Write as if you knew it all along.
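The reader-facing numbers in the first rule are easy to derive from daily rows. This is a sketch under two assumptions that may not match how Trade2 computes them: the closes list starts with the previous week's final close, and the 20-day volume average includes the latest day.

```python
def weekly_change_pct(closes: list[float]) -> float:
    """Percent change from the first close in the list to the last."""
    return (closes[-1] / closes[0] - 1) * 100


def volume_vs_average(volumes: list[float], window: int = 20) -> float:
    """Latest volume as a percentage of the trailing `window`-day average."""
    recent = volumes[-window:]
    return volumes[-1] / (sum(recent) / len(recent)) * 100


def describe(ticker: str, closes: list[float], volumes: list[float]) -> str:
    """Turn raw prices and volumes into a sentence a reader can use."""
    change = weekly_change_pct(closes)
    verb = "gained" if change >= 0 else "lost"
    ratio = volume_vs_average(volumes)
    return (
        f"{ticker} {verb} {abs(change):.1f}% for the week, "
        f"with volume at {ratio:.0f}% of the 20-day average."
    )
```

Doing this arithmetic in Python and handing the model finished figures keeps the numbers out of the LLM's hands.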

Phase E: Self-Review

Qwen reviews its own article as a "tough editor." It scores each aspect (originality, data usage, structure, depth, source balance) out of 5, then rewrites the article based on its own critique.
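One way to make the review machine-readable, sketched under assumptions: the review prompt asks for JSON with a "scores" object keyed by the five aspects, and anything scoring below 4 gets called out in the rewrite prompt. Both the format and the threshold are hypothetical.

```python
import json

ASPECTS = ("originality", "data_usage", "structure", "depth", "source_balance")


def parse_review(reply: str) -> dict[str, int]:
    """Read per-aspect scores (0 to 5) from a JSON review; reject anything else."""
    data = json.loads(reply)
    scores: dict[str, int] = {}
    for aspect in ASPECTS:
        value = int(data["scores"][aspect])
        if not 0 <= value <= 5:
            raise ValueError(f"{aspect} score out of range: {value}")
        scores[aspect] = value
    return scores


def weakest_aspects(scores: dict[str, int], threshold: int = 4) -> list[str]:
    """Aspects scoring below the threshold, lowest first."""
    low = [a for a in ASPECTS if scores[a] < threshold]
    return sorted(low, key=lambda a: scores[a])
```

The list returned by weakest_aspects can be pasted into the rewrite prompt so the second draft focuses on the lowest-scoring areas.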

Phase F: Final Polish + Publish

One more pass for consistency, then programmatic cleanup — Python mechanically removes any leaked phrases like "according to additional information" that the LLM might have missed.
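The mechanical cleanup can be sketched with a couple of regular expressions. The pattern covers the phrase variants quoted in this post. The capitalization and spacing fixes are my own additions for illustration.

```python
import re

PHRASE = r"according\s+to\s+(?:our\s+)?additional\s+(?:research|information)"

# Parenthetical use: "X, according to additional research, Y" becomes "X Y".
_PARENTHETICAL = re.compile(r",\s*" + PHRASE + r"\s*,\s*", re.IGNORECASE)
# Any other use, remembering whether the phrase opened a sentence.
_GENERAL = re.compile(
    r"(?P<start>^|[.!?]\s+)?\b" + PHRASE + r"\b,?\s*(?P<next>\w)?",
    re.IGNORECASE | re.MULTILINE,
)


def _drop(match: re.Match) -> str:
    start, nxt = match.group("start"), match.group("next") or ""
    if start is not None:
        nxt = nxt.upper()  # the following word now opens the sentence
    return (start or "") + nxt


def strip_leaks(text: str) -> str:
    """Remove leaked research-process phrases and tidy what is left."""
    text = _PARENTHETICAL.sub(" ", text)
    text = _GENERAL.sub(_drop, text)
    # Remove any space left stranded before punctuation.
    return re.sub(r"\s+([.,;!?])", r"\1", text)
```

For example, strip_leaks("According to additional research, the leak exposed 512,000 lines.") returns "The leak exposed 512,000 lines."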

Posted to Draft via API.

What Went Wrong

Google Blocks Scraping

Google search completely blocked my scraping attempts. I switched to DuckDuckGo's HTML endpoint, which has worked reliably in my runs.
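A rough sketch of the DuckDuckGo step using only the standard library. The endpoint URL, the result__a class on result links, and the uddg redirect parameter are assumptions about the page's current markup. None of it is a documented API, so it can change without notice.

```python
import urllib.request
from html.parser import HTMLParser
from urllib.parse import parse_qs, quote_plus, urlparse

SEARCH_URL = "https://html.duckduckgo.com/html/?q="  # assumed HTML endpoint


def search_url(query: str) -> str:
    """Build the search URL for a query."""
    return SEARCH_URL + quote_plus(query)


def unwrap(href: str) -> str:
    """Return the target of a redirect link carrying `uddg`, else the href itself."""
    target = parse_qs(urlparse(href).query).get("uddg")
    return target[0] if target else href


class ResultLinks(HTMLParser):
    """Collect hrefs from anchors whose class includes `result__a` (assumed markup)."""

    def __init__(self) -> None:
        super().__init__()
        self.urls: list[str] = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "a" and "result__a" in (attr.get("class") or "").split():
            self.urls.append(unwrap(attr.get("href") or ""))


def parse_results(html: str) -> list[str]:
    """Extract result URLs from a search results page."""
    parser = ResultLinks()
    parser.feed(html)
    return [u for u in parser.urls if u]


def search(query: str, timeout_s: int = 20) -> list[str]:
    """Fetch the results page and return the result URLs."""
    req = urllib.request.Request(search_url(query), headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(req, timeout=timeout_s) as resp:
        return parse_results(resp.read().decode("utf-8", errors="replace"))
```

The first URL returned can then go straight into the same trafilatura extraction used for the RSS articles.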

"According to additional research" Problem

When Qwen incorporates web search results, it tends to write "according to additional research..." — meaningless to readers. Three-layer defense:

Prompt instructs "never use this phrase"
Review phase tells LLM to remove it
Python mechanically strips remaining instances

72B Is Slow

Qwen 2.5 72B on M1 Max takes 2-15 minutes per call. The last run made 24 LLM calls in 103 minutes total, or about 4.3 minutes per call on average. That's fine for overnight runs, but not for real-time use.

Results

Metric	Value
LLM calls	24
Web searches	7 successful (of 8 requested)
Input data	146 articles + 25 funding rounds + 46 bills + stock data
Output article	4,290 characters
Total time	103 minutes
Cloud API cost	$0

Honest Assessment

What works:

Data-backed analysis is consistent. No "tired this week, phoning it in"
Cross-referencing stock data with news is tedious by hand but effortless once automated
Reading 46 regulation bills every week isn't realistic for one person. An LLM can do it

What doesn't (yet):

Still lacks the "aha!" insights that a great human analyst would catch
News sources skew toward TechCrunch
The "RSS translation" feel isn't fully gone despite all the prompting
72B is good but not GPT-4 level for nuanced analysis

What's next:

Auto-tracking predictions ("we said X last month — here's what actually happened")
Market share data integration (Similarweb)
Auto-generated charts in articles

Articles generated by this system are published on Draft.

Stack: Qwen 2.5 72B (M1 Max 64GB, local) + FastAPI + SQLite + Trade2 (S&P 500 ML analysis) + DuckDuckGo