I Built an AI That Writes AI News Articles — Here's the Full System
Why I Built This
I run a paid article platform called Draft. To drive traffic, I needed weekly AI industry analysis articles. Problem: I don't have time to write them manually every week.
But I also didn't want to just translate RSS feeds — that's not worth reading. I wanted original analysis backed by real data, generated fully automatically at zero cost.
Here's how I built it.
The Setup
Two machines, zero cloud API costs.
mini (Ubuntu Server) — Data Collection
- Daily cron job collects from 9 RSS feeds + Hacker News
- AI funding data (AI Funding Tracker, Crunchbase)
- US federal & state AI legislation (GovTrack API, OpenStates API)
- AI company stock prices, volume, sentiment (Trade2 system — ML analysis of all S&P 500 stocks)
M1 Max Mac (64GB RAM) — The Brain
- Qwen 2.5 72B running locally via Ollama
- Rich article text extraction (trafilatura)
- DuckDuckGo web searches
- 9-phase pipeline generating the final article
No OpenAI API. No Claude API. Everything runs locally.
The monthly cost? $0. Just electricity.
For someone like me who wants to stay on top of AI industry news without paying for API calls, this turned out to be the perfect setup.
The 9-Phase Pipeline
Just telling an LLM "write an article" produces shallow content. I broke down how a human journalist works and turned it into 9 automated steps.
Phase A: Data Aggregation
Pull this week's data from mini's databases, then use trafilatura on Mac to extract clean full-text from each article URL.
From the last run:
- News articles: 146 (9 sources)
- Funding rounds: 25
- AI regulation bills: 46 (federal + state)
- Stock data: GOOGL, MSFT, META, NVDA daily prices & volume
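A minimal sketch of the weekly pull, assuming the collector writes articles to a SQLite table. The table name `articles`, its columns, and the `load_week` helper are hypothetical; the real schema on mini is not shown in this post. The date in the demo is arbitrary.

```python
import sqlite3
from datetime import datetime, timedelta, timezone


def load_week(conn: sqlite3.Connection, now: datetime, days: int = 7) -> list[dict]:
    """Return articles published in the last `days` days, newest first."""
    # ISO-8601 strings with the same UTC offset sort chronologically as text.
    cutoff = (now - timedelta(days=days)).isoformat()
    rows = conn.execute(
        "SELECT source, title, url, published_at FROM articles "
        "WHERE published_at >= ? ORDER BY published_at DESC",
        (cutoff,),
    ).fetchall()
    keys = ("source", "title", "url", "published_at")
    return [dict(zip(keys, row)) for row in rows]


# Demo: an in-memory database stands in for mini's data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE articles (source TEXT, title TEXT, url TEXT, published_at TEXT)"
)
now = datetime(2026, 4, 6, tzinfo=timezone.utc)
conn.executemany(
    "INSERT INTO articles VALUES (?, ?, ?, ?)",
    [
        ("feed-a", "Recent story", "https://example.com/1",
         (now - timedelta(days=2)).isoformat()),
        ("feed-b", "Old story", "https://example.com/2",
         (now - timedelta(days=30)).isoformat()),
    ],
)
week = load_week(conn, now)
print(len(week), week[0]["title"])
```

Passing `now` in as a parameter keeps the function easy to test and lets a failed run be repeated later for the same week.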
Phase B: Theme Discovery
Instead of "summarize the news," I ask Qwen to "find the stories the data is telling." Not a list of headlines — big themes (3-4 per week).
Last week's themes:
- AI security breaches escalating
- Big AI companies making strategic moves
- Regulation accelerating
- Funding surge in healthcare & robotics
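Here is roughly what the theme-discovery call could look like. This is a sketch, not my exact prompt: the prompt wording, the JSON reply shape, and the `qwen2.5:72b` model tag are assumptions, and `build_theme_prompt` and `ask_ollama` are hypothetical helper names. The request itself goes to Ollama's local `/api/generate` endpoint with streaming turned off.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local port
MODEL = "qwen2.5:72b"  # assumed model tag


def build_theme_prompt(headlines: list[str], max_themes: int = 4) -> str:
    """Ask for a few big themes instead of a headline-by-headline summary."""
    bullet_list = "\n".join(f"- {h}" for h in headlines)
    return (
        "You are an industry analyst. Below is this week's data.\n"
        f"{bullet_list}\n\n"
        "Find the stories the data is telling. "
        f"Return 3 to {max_themes} big themes, not a list of headlines. "
        'Respond as JSON: {"themes": [{"title": "...", "evidence": ["..."]}]}'
    )


def ask_ollama(prompt: str) -> str:
    """Send one non-streaming generate request to a local Ollama server."""
    payload = json.dumps(
        {"model": MODEL, "prompt": prompt, "stream": False}
    ).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]


prompt = build_theme_prompt(["Story A", "Story B"])
print(prompt)
```

Calling `ask_ollama(prompt)` needs a running Ollama server with the model already pulled, so the demo above only builds and prints the prompt.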
Phase C: Deep Analysis
For each theme, Qwen reads the full article text (up to 3,000 chars per article). RSS titles and summaries aren't enough for deep analysis. Full text gives Qwen access to specific numbers, quotes, and facts.
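The 3,000-character budget can be applied with a small helper like the one below. It is a sketch: backing up to the last sentence boundary is my own assumption about a sensible way to cut, and `clip` and `build_context` are hypothetical names.

```python
MAX_CHARS = 3000  # per-article budget described above


def clip(text: str, limit: int = MAX_CHARS) -> str:
    """Cut text to at most `limit` characters, ending on a sentence if possible."""
    if len(text) <= limit:
        return text
    head = text[:limit]
    cut = head.rfind(". ")
    # Back up to a sentence end only if that keeps at least 80% of the budget.
    return head[: cut + 1] if cut > limit * 0.8 else head


def build_context(articles: list[dict]) -> str:
    """Join clipped article bodies into one block for the analysis prompt."""
    return "\n\n".join(f"## {a['title']}\n{clip(a['text'])}" for a in articles)


long_text = "Sentence one. " * 400  # 5,600 characters
clipped = clip(long_text)
context = build_context([{"title": "Demo", "text": long_text}])
print(len(clipped), clipped[-15:])
```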
Phase C2-C4: Qwen Says "Look This Up For Me"
This is the most interesting part.
After the initial analysis, I ask Qwen: "What else do you need to know to write a great article?" It responds:
{
  "queries": [
    {
      "query": "Claude Code leak GitHub details 512000 lines",
      "reason": "Need specifics on the scale and impact of the leak"
    },
    {
      "query": "Anthropic political action committee spending 2026",
      "reason": "Need PAC spending details"
    }
  ]
}
Qwen is offline (local LLM) — it can't search the web. So Mac acts as Qwen's hands, searching DuckDuckGo and fetching full article text with trafilatura.
Qwen: "Look up the Claude Code leak details"
↓
Mac: DuckDuckGo search → fetches TechCrunch article (5,093 chars)
↓
Qwen: "Got it. 512,000 lines of TypeScript leaked via npm. Updating analysis."
This loops until Qwen says it has enough information. Up to 3 rounds. In the last run, Qwen requested 8 additional searches, and Mac successfully fetched 7.
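The loop itself is simple. Here is a minimal sketch with the model call and the search step injected as plain functions so it can run without Ollama or a network. `research_loop`, `parse_queries`, and the convention that an empty `queries` list means "I have enough" are assumptions for illustration.

```python
import json
from typing import Callable, Optional

MAX_ROUNDS = 3  # the cap described above


def parse_queries(reply: str) -> list[str]:
    """Extract query strings from the model's JSON reply; empty list on bad input."""
    try:
        data = json.loads(reply)
    except json.JSONDecodeError:
        return []
    if not isinstance(data, dict):
        return []
    items = data.get("queries") or []
    return [q["query"] for q in items if isinstance(q, dict) and q.get("query")]


def research_loop(
    ask_model: Callable[[str, list[str]], str],
    search: Callable[[str], Optional[str]],
    analysis: str,
) -> list[str]:
    """Run up to MAX_ROUNDS rounds of 'model asks, Mac fetches'."""
    findings: list[str] = []
    for _ in range(MAX_ROUNDS):
        queries = parse_queries(ask_model(analysis, findings))
        if not queries:  # the model has enough information
            break
        for query in queries:
            text = search(query)
            if text:  # failed fetches are skipped, not retried
                findings.append(f"[{query}]\n{text}")
    return findings


# Demo with stand-ins: two queries in round one, none in round two.
replies = iter([
    '{"queries": [{"query": "a", "reason": "r"}, {"query": "b", "reason": "r"}]}',
    '{"queries": []}',
])
fake_model = lambda analysis, findings: next(replies)
fake_search = lambda query: None if query == "b" else f"text for {query}"
findings = research_loop(fake_model, fake_search, "initial analysis")
print(findings)
```

Treating unparseable JSON as "no more queries" means a malformed reply ends the research phase early instead of crashing an overnight run.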
Phase D: Writing the Article
At this point, Qwen has:
- 146 news articles (with full text)
- 4 theme analyses
- 7 additional web research results
- Real stock price and volume data
- 46 regulation bills
All of this goes into the article generation prompt. Key rules:
- Don't write "ai_score rose to 72." Write "GOOGL gained +5.3% for the week, with volume at 71% of the 20-day average." Internal metrics mean nothing to readers.
- Never write "according to our additional research." No references to the research process. Write as if you knew it all along.
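To make the first rule concrete, here is one way to turn raw price and volume data into a reader-facing sentence before it reaches the prompt. It is a sketch with made-up numbers: `stock_sentence` is a hypothetical helper, and I am assuming the weekly change is measured from the previous week's last close and that volume means the latest day against the 20-day average.

```python
def stock_sentence(ticker: str, closes: list[float], volumes: list[float]) -> str:
    """Describe a week in plain terms.

    closes:  previous week's last close followed by this week's daily closes
    volumes: the most recent daily volumes, oldest first (20 for a 20-day average)
    """
    change = (closes[-1] / closes[0] - 1) * 100
    average_volume = sum(volumes) / len(volumes)
    relative_volume = volumes[-1] / average_volume * 100
    verb = "gained" if change >= 0 else "lost"
    return (
        f"{ticker} {verb} {abs(change):.1f}% for the week, with volume at "
        f"{relative_volume:.0f}% of the {len(volumes)}-day average."
    )


# Made-up data for a fictional ticker.
closes = [200.0, 202.0, 201.0, 204.0, 206.0, 210.0]
volumes = [1_000_000] * 19 + [500_000]
sentence = stock_sentence("XYZ", closes, volumes)
print(sentence)
# XYZ gained 5.0% for the week, with volume at 51% of the 20-day average.
```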
Phase E: Self-Review
Qwen reviews its own article as a "tough editor." It scores each aspect (originality, data usage, structure, depth, source balance) out of 5, then rewrites the article based on its own critique.
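A sketch of how the review scores could be read back and turned into rewrite targets. The JSON reply shape, the threshold of 4, and the rule that a missing score counts as 1 are all assumptions for illustration; `parse_review` and `weakest` are hypothetical helpers.

```python
import json

ASPECTS = ("originality", "data_usage", "structure", "depth", "source_balance")


def parse_review(reply: str) -> dict[str, int]:
    """Read per-aspect scores (1 to 5); missing or invalid scores count as 1."""
    raw = json.loads(reply).get("scores", {})
    scores = {}
    for aspect in ASPECTS:
        value = raw.get(aspect)
        valid = type(value) is int and 1 <= value <= 5
        scores[aspect] = value if valid else 1
    return scores


def weakest(scores: dict[str, int], threshold: int = 4) -> list[str]:
    """Aspects scoring below the threshold, to name explicitly in the rewrite prompt."""
    return [aspect for aspect, score in scores.items() if score < threshold]


reply = '{"scores": {"originality": 3, "data_usage": 5, "structure": 4, "depth": 2}}'
scores = parse_review(reply)
targets = weakest(scores)
print(targets)
# ['originality', 'depth', 'source_balance']
```

Scoring a missing aspect as 1 is deliberately pessimistic: if the model forgets to grade something, the rewrite prompt still asks it to look at that aspect.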
Phase F: Final Polish + Publish
One more pass for consistency, then programmatic cleanup — Python mechanically removes any leaked phrases like "according to additional information" that the LLM might have missed.
Posted to Draft via API.
What Went Wrong
Google Blocks Scraping
Google search completely blocked my scraping attempts. I switched to DuckDuckGo's HTML endpoint, which has worked reliably so far.
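For reference, here is a sketch of the parsing side using only the standard library. The markup details are assumptions based on what the HTML endpoint returned when I looked (result links carrying the class `result__a`, with the real target URL in a `uddg` query parameter) and may change without notice. The download step is left out, and the demo runs on a hand-written snippet.

```python
from html.parser import HTMLParser
from urllib.parse import parse_qs, quote_plus, urlparse

SEARCH_URL = "https://html.duckduckgo.com/html/?q="


def search_url(query: str) -> str:
    """Build the request URL for a query."""
    return SEARCH_URL + quote_plus(query)


class ResultLinks(HTMLParser):
    """Collect target URLs from anchors marked as search results."""

    def __init__(self) -> None:
        super().__init__()
        self.urls: list[str] = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        classes = (attributes.get("class") or "").split()
        if tag != "a" or "result__a" not in classes:
            return
        href = attributes.get("href") or ""
        # Assumed: result links are redirects with the target in `uddg`.
        target = parse_qs(urlparse(href).query).get("uddg")
        self.urls.append(target[0] if target else href)


def extract_links(page: str) -> list[str]:
    parser = ResultLinks()
    parser.feed(page)
    return parser.urls


sample = (
    '<a class="result__a" '
    'href="//duckduckgo.com/l/?uddg=https%3A%2F%2Fexample.com%2Fa&amp;rut=abc">A</a>'
    '<a class="other" href="https://example.org">B</a>'
)
links = extract_links(sample)
print(search_url("Claude Code leak"), links)
```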
"According to additional research" Problem
When Qwen incorporates web search results, it tends to write "according to additional research..." — meaningless to readers. Three-layer defense:
- Prompt instructs "never use this phrase"
- Review phase tells LLM to remove it
- Python mechanically strips remaining instances
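The third layer can be as small as two regular expressions. This is a sketch, not my exact cleanup code: the phrase variants covered and the `strip_leaks` name are assumptions, and a real list would grow as new leaked phrasings show up.

```python
import re

PHRASE = r"according to (?:our |the )?additional (?:research|information)"

# Phrase at the start of a sentence: drop it and re-capitalize the next word.
LEADING = re.compile(
    rf"(^|[.!?]\s+){PHRASE},?\s*(\w)", re.IGNORECASE | re.MULTILINE
)
# Phrase anywhere else: drop it together with a preceding comma.
INLINE = re.compile(rf",?\s*{PHRASE}", re.IGNORECASE)


def strip_leaks(text: str) -> str:
    """Remove references to the research process from generated text."""
    text = LEADING.sub(lambda m: m.group(1) + m.group(2).upper(), text)
    return INLINE.sub("", text)


draft = (
    "According to additional research, the leak exposed source code. "
    "Prices rose, according to our additional information."
)
cleaned = strip_leaks(draft)
print(cleaned)
# The leak exposed source code. Prices rose.
```

Running the sentence-start pattern first matters: if the inline pattern ran first, the sentence that follows would be left starting with a lowercase letter.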
72B Is Slow
Qwen 2.5 72B on M1 Max takes 2-15 minutes per call. The last run's 24 LLM calls took 103 minutes in total, about 4.3 minutes per call on average. Fine for overnight runs, not for real-time.
Results
| Metric | Value |
|---|---|
| LLM calls | 24 |
| Web searches | 7 successful |
| Input data | 146 articles + 25 funding rounds + 46 bills + stock data |
| Output article | 4,290 characters |
| Total time | 103 minutes |
| Cloud API cost | $0 |
Honest Assessment
What works:
- Data-backed analysis is consistent. No "tired this week, phoning it in"
- Cross-referencing stock data with news is tedious by hand but effortless once automated
- No human can read 46 regulation bills weekly. An LLM can
What doesn't (yet):
- Still lacks the "aha!" insights that a great human analyst would catch
- News sources skew toward TechCrunch
- The "RSS translation" feel isn't fully gone despite all the prompting
- 72B is good but not GPT-4 level for nuanced analysis
What's next:
- Auto-tracking predictions ("we said X last month — here's what actually happened")
- Market share data integration (Similarweb)
- Auto-generated charts in articles
Articles generated by this system are published on Draft.
Stack: Qwen 2.5 72B (M1 Max 64GB, local) + FastAPI + SQLite + Trade2 (S&P 500 ML analysis) + DuckDuckGo