Personalized News Reader
A self-hosted, bilingual, friction-free press reader.
Overview
Personalized News Reader is a press reader I built for my own use. It aggregates paywalled articles from the French press (via Europresse/BnF) and the Spanish press (direct scraping), presents them in reader mode, and saves them to a personal, rated library. I have used it every day since it first ran.
The problem
I enjoy reading serious journalism in French and Spanish. The problem wasn’t access (I have legitimate access to Europresse through the BnF); it was friction. Every article meant opening Europresse, authenticating, searching, and navigating. That accumulated attention cost meant I ended up not reading at all.
The answer wasn’t to read less. It was to remove the access friction entirely.
The solution
The workflow is the heart of the project: from URL to reader mode in seconds, across two languages.
- I paste a URL (or type keywords in French mode) on the web, or share the article from Safari via an iOS Shortcut.
- Playwright automates access in headless mode: for French, BnF authentication and a Europresse search; for Spanish, direct access with paywall bypass (El País) using a persistent session.
- Readability cleans the content and the article appears in reader mode in under 15 seconds, ready to rate with like / dislike.
Architecture
   📱 iOS Shortcut      🌐 Web (Safari)
          │   POST /add       │
          └──────────┬────────┘
                     ▼
           ⚙️ Flask (Hetzner VPS)
          Google OAuth · job queue
                     │
          ┌──────────┴──────────┐
          │ FR: keywords        │ ES: direct URL
          ▼                     ▼
     🎭 Playwright         🎭 Playwright
     BnF → Europresse      paywall bypass
          │                     │
          ▼                     ▼
     📝 Clean text         📰 Readability
          └──────────┬──────────┘
                     ▼
          🗄️ SQLite (articles.db)
        like · dislike · time · scroll
                     ▼
           📖 Personal reader

The only manual step is running the Shortcut. Everything else is automated.
Tech stack
| Layer | Technology |
|---|---|
| Browser automation | Playwright + Chromium (headless) |
| Content extraction | readability-lxml (ES), custom extraction (FR) |
| Web server | Flask + SSE (real-time progress) |
| Storage | SQLite (careful connection management) |
| Authentication | Google OAuth 2.0 (Authlib) |
| Mobile integration | iOS Shortcuts → POST /add with token |
| Infrastructure | Hetzner VPS (Ubuntu) + Cloudflare Tunnel + systemd |
| Frontend | Vanilla HTML/CSS/JS, mobile-first (Safari iOS) |
Technical challenges solved
Chained authentication automation
Playwright manages two independent auth flows with no manual intervention: BnF/Europresse for French, and El País with persistent JSON sessions for Spanish.
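A minimal sketch of how a persistent JSON session can be restored and saved, assuming it maps onto Playwright's `storage_state` feature (the text does not say which mechanism is used). The helper names and the session file path are hypothetical; `browser` and `context` are the usual Playwright `Browser` and `BrowserContext` objects.

```python
from pathlib import Path


def context_options(session_file: Path) -> dict:
    """Keyword arguments for browser.new_context(): reuse a saved session if one exists."""
    if session_file.is_file():
        return {"storage_state": str(session_file)}
    return {}  # first run: start with a clean context and log in


def open_context(browser, session_file: Path):
    """Create a browser context, restoring cookies and localStorage when available."""
    return browser.new_context(**context_options(session_file))


def save_session(context, session_file: Path) -> None:
    """Persist the context's cookies and localStorage as JSON for the next run."""
    session_file.parent.mkdir(parents=True, exist_ok=True)
    context.storage_state(path=str(session_file))
```

In use, `open_context` would be called with a browser from `sync_playwright().start().chromium.launch(headless=True)`, and `save_session` after a successful login, so later runs skip the login step until the session expires.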
Robust extraction against hostile CMSs
Some sites inject an editor toolbar into the public DOM. I built a cleaning pipeline with anti-CMS garbage filters before Readability, with a paragraph-selector fallback.
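The paragraph-selector fallback can be sketched with the standard library alone. This is an illustration, not the actual pipeline: the class-name hints used to spot CMS debris are hypothetical, and the parser assumes reasonably well-formed HTML (unclosed tags inside a skipped subtree would confuse the depth counter).

```python
from html.parser import HTMLParser

# Hypothetical markers for editor debris; the real filter list is not given in the text.
GARBAGE_CLASS_HINTS = ("editor-toolbar", "cms-")
# Void elements never get a closing tag, so they must not affect depth tracking.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class ParagraphCollector(HTMLParser):
    """Collect text from <p> elements, skipping subtrees flagged as CMS debris."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.skip_depth = 0   # > 0 while inside a garbage subtree
        self.in_p = False
        self.buf = []
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.skip_depth:
            self.skip_depth += 1
            return
        classes = dict(attrs).get("class") or ""
        if any(hint in classes for hint in GARBAGE_CLASS_HINTS):
            self.skip_depth = 1
        elif tag == "p":
            self.in_p, self.buf = True, []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.skip_depth:
            self.skip_depth -= 1
        elif tag == "p" and self.in_p:
            text = " ".join("".join(self.buf).split())  # collapse whitespace
            if text:
                self.paragraphs.append(text)
            self.in_p = False

    def handle_data(self, data):
        if self.in_p and not self.skip_depth:
            self.buf.append(data)


def fallback_paragraphs(html: str) -> list:
    """Return cleaned paragraph texts, for use when the main extractor yields too little."""
    collector = ParagraphCollector()
    collector.feed(html)
    collector.close()
    return collector.paragraphs
```

A caller would try Readability first and switch to `fallback_paragraphs` only when the result is empty or suspiciously short.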
Image proxy with Referer spoofing
El País images are blocked by CORS/Referer policy. An /api/img server-side proxy route adds the correct Referer header, with a domain whitelist.
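The core of such a proxy is small: check the host against the whitelist, then build an upstream request carrying the Referer the image server expects. The sketch below uses only the standard library; the host name, the Referer value, and the function name are assumptions for illustration.

```python
from urllib.parse import urlsplit
from urllib.request import Request

# Hypothetical whitelist; the real list of image hosts is not given in the text.
ALLOWED_IMAGE_HOSTS = {"imagenes.elpais.com"}
# Assumed Referer value that the image host accepts.
SPOOFED_REFERER = "https://elpais.com/"


def build_image_request(url: str) -> Request:
    """Validate the target host, then attach the Referer header to the upstream request."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if parts.scheme != "https" or host not in ALLOWED_IMAGE_HOSTS:
        # Refusing unknown hosts keeps the route from becoming an open proxy.
        raise ValueError(f"refusing to proxy {url!r}")
    return Request(url, headers={"Referer": SPOOFED_REFERER})
```

The /api/img route would pass the result to `urllib.request.urlopen` with a timeout and relay the bytes along with the upstream `Content-Type`.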
Frictionless iOS integration
The /add endpoint responds in under 200 ms so the Shortcut notification appears immediately. Includes URL deduplication and a fix for the iOS bug that doubles the URL in the POST body.
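A sketch of the two URL clean-up steps, under stated assumptions: the doubled body is taken to be the same URL sent twice, either back to back or separated by whitespace (the text does not describe the exact shape), and deduplication is taken to ignore case in the host, trailing slashes, and fragments. Both function names are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def undouble_url(raw: str) -> str:
    """Collapse a body that contains the same URL twice into a single URL."""
    text = raw.strip()
    parts = text.split()
    # Case 1: identical copies separated by whitespace or a newline.
    if len(parts) > 1 and len(set(parts)) == 1:
        return parts[0]
    # Case 2: two identical copies concatenated with nothing in between.
    half, remainder = divmod(len(text), 2)
    if (remainder == 0 and half
            and text.startswith(("http://", "https://"))
            and text[:half] == text[half:]):
        return text[:half]
    return text


def dedupe_key(url: str) -> str:
    """Normalize a URL so trivially different spellings map to the same key."""
    p = urlsplit(url)
    return urlunsplit((p.scheme.lower(), p.netloc.lower(),
                       p.path.rstrip("/"), p.query, ""))
```

Comparing `dedupe_key(undouble_url(body))` against the keys already stored is enough to reject a repeat submission before any scraping starts, which helps keep the response fast.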
Background job queue with auto-drain
When the scraper is busy, new articles are queued. After each scrape completes, the system automatically picks up the next queued item, in order, with no manual retry needed.
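The auto-drain behaviour can be sketched with a lock, a deque, and a single worker thread. This is an illustrative design under assumptions, not the project's code: the class name, the `scrape` callable, and the "started"/"queued" return values are all hypothetical.

```python
import logging
import threading
from collections import deque


class DrainingQueue:
    """Run one scrape at a time; submissions made meanwhile wait and are drained in order."""

    def __init__(self, scrape):
        self._scrape = scrape          # callable taking one URL
        self._pending = deque()
        self._busy = False
        self._lock = threading.Lock()

    def submit(self, url: str) -> str:
        """Enqueue a URL and return immediately: 'started' or 'queued'."""
        with self._lock:
            self._pending.append(url)
            if self._busy:
                return "queued"        # the running worker will reach it
            self._busy = True
        threading.Thread(target=self._drain, daemon=True).start()
        return "started"

    def _drain(self) -> None:
        """Process items until the queue is empty, then mark the scraper idle."""
        while True:
            with self._lock:
                if not self._pending:
                    self._busy = False
                    return
                url = self._pending.popleft()
            try:
                self._scrape(url)
            except Exception:
                # A failed scrape is logged and must not stall the rest of the queue.
                logging.exception("scrape failed for %s", url)
```

Because `submit` never waits for the scrape itself, an HTTP handler built on it can answer right away while the work continues in the background.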
Personal production infrastructure
The service runs as a systemd daemon on a VPS, reachable from anywhere over HTTPS through a Cloudflare Tunnel, with Google OAuth as the auth layer.
Results
- Daily use since day one: the project solved a real problem and changed a habit.
- Two completely different sources unified in a single bilingual interface.
- Zero access friction: a URL shared from Safari → article in reader mode in seconds.
- A personal rated library, ready to feed a recommendation system.
What this project demonstrates
I built a complete end-to-end tool alone, from browser automation to production infrastructure, to solve a real everyday problem. The motivation wasn’t technical: it was recovering the habit of reading serious journalism by removing the friction that prevented it. The next step is the one that interests me most: using the rating history to build a recommender that learns my taste.