A small Python pipeline that pulls the past week from ~20 curated Kubernetes-and-cloud RSS feeds, asks Claude to synthesize a themed roundup, and writes a markdown post into the blog's content/digest/ directory.
Built on the same pattern as `compliance-analyzer`: mock-friendly architecture, real LLM, optional API key. Without `ANTHROPIC_API_KEY` set, the script falls back to a deterministic canned digest built from the headlines, so testing and CI don't need credentials.
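The optional-key fallback described above can be sketched roughly as follows. This is a minimal illustration, not the code in `digest.py`: the function names `build_digest`, `canned_digest`, and `call_claude` are hypothetical, and the canned output format is an assumption.

```python
import os


def canned_digest(headlines):
    """Deterministic fallback: the same headlines always give the same markdown."""
    lines = ["Weekly digest (offline mode, no LLM call)", ""]
    # Sorting removes any dependence on feed fetch order.
    for title, url in sorted(headlines):
        lines.append(f"- [{title}]({url})")
    return "\n".join(lines)


def call_claude(headlines, api_key):
    """Placeholder for the real LLM call (hypothetical name, not implemented here)."""
    raise NotImplementedError("wire up the Anthropic client here")


def build_digest(headlines, env=None):
    """Use the LLM only when a key is present; otherwise return the canned digest."""
    env = os.environ if env is None else env
    api_key = env.get("ANTHROPIC_API_KEY")
    if not api_key:
        return canned_digest(headlines)
    return call_claude(headlines, api_key)
```

Passing `env` explicitly keeps the branch easy to exercise in tests without touching the real environment.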
Does:
- Pulls RSS feeds only — no HTML scraping, no source bodies, no terms-of-service gymnastics.
- Passes titles + the publisher-provided summary to Claude — not full articles.
- Synthesizes one ~400-word post per week, organized by theme (not by source).
- Hard-codes constraints into the system prompt: never quote >10 words verbatim from any source, every claim must link back, drop pure marketing items.
- Writes a "Sources considered this week" block at the bottom linking every item that was passed in (cited or not).
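As a rough sketch of two of the steps above (keeping only the past week's items and rendering the sources block), assuming each feed entry has already been reduced to a plain dict. The keys `source`, `title`, `link`, and `published`, and both function names, are hypothetical; the real pipeline's data shape may differ.

```python
from datetime import datetime, timedelta, timezone


def recent_items(items, now, days=7):
    """Keep only items published within the last `days` days."""
    cutoff = now - timedelta(days=days)
    return [item for item in items if item["published"] >= cutoff]


def render_sources_block(items):
    """Link every item that was passed in, whether or not the digest cited it."""
    lines = ["## Sources considered this week", ""]
    # Sort by source then title so the block is stable from run to run.
    for item in sorted(items, key=lambda i: (i["source"], i["title"])):
        lines.append(f"- {item['source']}: [{item['title']}]({item['link']})")
    return "\n".join(lines)


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    demo = [
        {"source": "Example Blog", "title": "New post",
         "link": "https://example.com/new", "published": now - timedelta(days=2)},
        {"source": "Example Blog", "title": "Old post",
         "link": "https://example.com/old", "published": now - timedelta(days=10)},
    ]
    print(render_sources_block(recent_items(demo, now)))
```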
Does not:
- Scrape source HTML pages.
- Reproduce article bodies, code blocks, diagrams, or screenshots.
- Run as a "content farm" — the output is a synthesized roundup, with every claim attributed.
- Touch any source whose `robots.txt` blocks the bot.
The bot identifies itself in fetches with a User-Agent that links to a contact page on the blog: `fabiorollin-weekly-digest/1.0 (+https://blog.fabiorollin.com/digest/about/)`.
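One way to honor `robots.txt` for this User-Agent is Python's standard-library `urllib.robotparser`. The sketch below is an assumption about how such a check could look, not the script's actual code. The helper name `allowed_by_robots` is hypothetical, and it takes the `robots.txt` body as text so the example stays offline.

```python
from urllib.robotparser import RobotFileParser

USER_AGENT = (
    "fabiorollin-weekly-digest/1.0 "
    "(+https://blog.fabiorollin.com/digest/about/)"
)


def allowed_by_robots(robots_txt, url, user_agent=USER_AGENT):
    """Return True if the given robots.txt body lets `user_agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # The parser matches on the product token before the first "/",
    # here "fabiorollin-weekly-digest".
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    blocked = "User-agent: fabiorollin-weekly-digest\nDisallow: /\n"
    print(allowed_by_robots(blocked, "https://example.com/feed.xml"))
```

A feed whose site returns `False` here would simply be skipped for that run.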
| Layer | Choice |
|---|---|
| Fetch | feedparser |
| LLM | Anthropic Claude (Sonnet 4.5) |
| Output | Markdown into a Hugo blog content/digest/ |
| Schedule (Phase 2) | Kubernetes CronJob, weekly |
| Deploy (Phase 2) | CronJob commits to git → GitHub Actions rebuilds blog |

    pip install -r requirements.txt
    cp .env.example .env   # optional: set ANTHROPIC_API_KEY here

    # Dry-run to stdout, no file written
    python digest.py --dry-run

    # Write to ../blog/content/digest/weekly-YYYY-wWW.md
    python digest.py

Then rebuild the blog with the existing docker build / push / kubectl set image flow and the new digest will be live at https://blog.fabiorollin.com/digest/.
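The `weekly-YYYY-wWW.md` name in the commands above could be derived from the run date as in this sketch. It assumes ISO 8601 week numbering, which the README does not state, and the helper name `digest_path` is hypothetical.

```python
from datetime import date
from pathlib import Path


def digest_path(day, content_dir="../blog/content/digest"):
    """Build the output path for the week containing `day` (ISO week assumed)."""
    # isocalendar() yields (ISO year, ISO week, weekday); the ISO year can
    # differ from the calendar year around New Year.
    year, week, _ = day.isocalendar()
    return Path(content_dir) / f"weekly-{year}-w{week:02d}.md"


if __name__ == "__main__":
    print(digest_path(date.today()))
```

Using the ISO year rather than `day.year` avoids a mislabeled file in the last days of December, when the date can already belong to week 1 of the next year.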
The script and Dockerfile are already shaped for a CronJob deployment. The path:
- Push the blog content folder to a git repo (e.g., `github.com/fabiorollin/blog`).
- Set up GitHub Actions on that repo to rebuild + push the blog Docker image and roll the deployment whenever `content/digest/*.md` changes.
- Create a deploy key with write access for the digest bot:

      ssh-keygen -t ed25519 -C "digest-bot" -f digest_key -N ""
      # Add digest_key.pub to the blog repo's Deploy keys (write access)

- Build + push the digest image:

      ECR=807291695385.dkr.ecr.us-east-1.amazonaws.com
      aws ecr create-repository --repository-name weekly-digest --region us-east-1
      docker build -t $ECR/weekly-digest:latest .
      docker push $ECR/weekly-digest:latest

- Create the secrets in cluster:

      kubectl create namespace weekly-digest
      kubectl create secret generic anthropic-api -n weekly-digest \
        --from-literal=api-key=sk-ant-xxxxx
      ssh-keyscan github.com > known_hosts
      kubectl create secret generic blog-deploy-key -n weekly-digest \
        --from-file=id_ed25519=digest_key \
        --from-file=known_hosts=known_hosts

- Apply the CronJob:

      kubectl apply -f k8s/cronjob.yaml

The job runs every Monday at 13:00 UTC. The first run can be triggered manually:

    kubectl create job --from=cronjob/weekly-digest manual-run -n weekly-digest
    kubectl logs -f job/manual-run -n weekly-digest

- One Anthropic call per week, ~10K input + 600 output tokens with Claude Sonnet ≈ $0.05/week ($2.50/year)
- One CronJob pod, ~30 seconds/week, free inside EKS Auto's existing capacity
- ECR storage: <$1/month
- Net: ~$1–4/month
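The LLM line item above can be sanity-checked with a few lines of arithmetic. The per-token prices below are assumptions (USD 3 per million input tokens and USD 15 per million output tokens for Claude Sonnet); check current pricing before relying on them.

```python
# Assumed list prices in USD per million tokens; not taken from this README.
INPUT_USD_PER_MTOK = 3.00
OUTPUT_USD_PER_MTOK = 15.00


def weekly_cost(input_tokens=10_000, output_tokens=600):
    """Cost in USD of one digest call at the assumed prices."""
    total = input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    return total / 1_000_000


if __name__ == "__main__":
    per_week = weekly_cost()
    print(f"${per_week:.3f}/week, ${per_week * 52:.2f}/year")
```

At these assumed prices the call comes in a little under the rounded $0.05/week figure, so the LLM cost stays a small part of the monthly total next to ECR storage.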
Curated in sources.yaml — 20 feeds split across three tiers:
- Tier 1 (5): Kubernetes Blog, CNCF Blog, Last Week in Kubernetes Development, KubeFM, Learnk8s
- Tier 2 (10): Red Hat OpenShift, Datadog, Sysdig, Aqua, Snyk, Grafana, HashiCorp, Buoyant, Solo.io, VMware Tanzu
- Tier 3 (5): Container Solutions, iximiuz, Loft Labs, Ahmet Alp Balkan, KubeWeekly
Edit `sources.yaml` to add or drop sources, then rerun the script. No other config changes are needed.
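To illustrate why editing the file is enough, here is a sketch of how a tiered source list might be flattened into one feed list at startup. The schema (top-level tier keys holding lists of `name`/`url` entries) is an assumption; the actual layout of `sources.yaml` is not shown in this README, and `flatten_sources` is a hypothetical helper.

```python
def flatten_sources(config):
    """Turn {"tier1": [{"name": ..., "url": ...}, ...], ...} into a flat feed list."""
    feeds = []
    for tier in sorted(config):
        for entry in config[tier]:
            feeds.append({"tier": tier, "name": entry["name"], "url": entry["url"]})
    return feeds


if __name__ == "__main__":
    # In the real script this mapping would come from parsing sources.yaml,
    # for example with PyYAML's yaml.safe_load; placeholder URLs are used here.
    sample = {
        "tier1": [{"name": "Kubernetes Blog", "url": "https://example.com/k8s.xml"}],
        "tier2": [{"name": "Grafana", "url": "https://example.com/grafana.xml"}],
    }
    for feed in flatten_sources(sample):
        print(feed["tier"], feed["name"], feed["url"])
```

Because the feed list is rebuilt from the file on every run, adding or removing an entry takes effect on the next invocation.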

    weekly-digest/
    ├── digest.py              # main pipeline
    ├── sources.yaml           # curated RSS feed list
    ├── requirements.txt
    ├── Dockerfile             # for the Phase 2 CronJob
    ├── scripts/
    │   └── run.sh             # CronJob entrypoint: clone → run → commit → push
    ├── k8s/
    │   └── cronjob.yaml       # Namespace + CronJob with secret mounts
    ├── .env.example
    ├── .gitignore
    └── README.md

For a Solutions Engineer / platform engineer / AI-curious portfolio:
- A real LLM content pipeline in production — not a toy, runs on a schedule.
- Cost-bounded prompt design — bounded payload, single round-trip per week, ~$0.05/run.
- Editorial judgment encoded in the system prompt — copyright-aware, signal-over-noise.
- Kubernetes CronJob + git-driven CI/CD — same pattern as a release pipeline, just for content.
- Defensive ops — RSS-only, attribution-required, robots.txt respected, contact path published.