Skip to content

fabiorollin/weekly-digest

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

1 Commit
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Weekly Kubernetes Digest

A small Python pipeline that pulls the past week from ~20 curated Kubernetes-and-cloud RSS feeds, asks Claude to synthesize a themed roundup, and writes a markdown post into the blog's content/digest/ directory.

Built on the same pattern as compliance-analyzermock-friendly architecture, real LLM, optional API key. Without ANTHROPIC_API_KEY set, the script falls back to a deterministic canned digest from the headlines, so testing and CI don't need credentials.

What it does — and what it deliberately doesn't

Does:

  • Pulls RSS feeds only — no HTML scraping, no source bodies, no terms-of-service gymnastics.
  • Passes titles + the publisher-provided summary to Claude — not full articles.
  • Synthesizes one ~400-word post per week, organized by theme (not by source).
  • Hard-codes constraints into the system prompt: never quote >10 words verbatim from any source, every claim must link back, drop pure marketing items.
  • Writes a Sources considered this week block at the bottom linking every item that was passed in (cited or not).

Does not:

  • Scrape source HTML pages.
  • Reproduce article bodies, code blocks, diagrams, or screenshots.
  • Run as a "content farm" — the output is a synthesized roundup, with every claim attributed.
  • Touch any source whose robots.txt blocks the bot.

The bot identifies itself in fetches with a User-Agent that links to a contact page on the blog: fabiorollin-weekly-digest/1.0 (+https://blog.fabiorollin.com/digest/about/).

Stack

Layer Choice
Fetch feedparser
LLM Anthropic Claude (Sonnet 4.5)
Output Markdown into a Hugo blog content/digest/
Schedule (Phase 2) Kubernetes CronJob, weekly
Deploy (Phase 2) CronJob commits to git → GitHub Actions rebuilds blog

Quick start (local)

pip install -r requirements.txt
cp .env.example .env       # optional — set ANTHROPIC_API_KEY here

# Dry-run to stdout, no file written
python digest.py --dry-run

# Write to ../blog/content/digest/weekly-YYYY-wWW.md
python digest.py

Then rebuild the blog with the existing docker build / push / kubectl set image flow and the new digest will be live at https://blog.fabiorollin.com/digest/.

Phase 2: weekly automation in EKS

The script and Dockerfile are already shaped for a CronJob deployment. The path:

  1. Push the blog content folder to a git repo (e.g., github.com/fabiorollin/blog).
  2. Set up GitHub Actions on that repo to rebuild + push the blog Docker image and roll the deployment whenever content/digest/*.md changes.
  3. Create a deploy key with write access for the digest bot:
    ssh-keygen -t ed25519 -C "digest-bot" -f digest_key -N ""
    # Add digest_key.pub to the blog repo's Deploy keys (write access)
  4. Build + push the digest image:
    ECR=807291695385.dkr.ecr.us-east-1.amazonaws.com
    aws ecr create-repository --repository-name weekly-digest --region us-east-1
    docker build -t $ECR/weekly-digest:latest .
    docker push $ECR/weekly-digest:latest
  5. Create the secrets in cluster:
    kubectl create namespace weekly-digest
    kubectl create secret generic anthropic-api -n weekly-digest \
      --from-literal=api-key=sk-ant-xxxxx
    ssh-keyscan github.com > known_hosts
    kubectl create secret generic blog-deploy-key -n weekly-digest \
      --from-file=id_ed25519=digest_key \
      --from-file=known_hosts=known_hosts
  6. Apply the CronJob:
    kubectl apply -f k8s/cronjob.yaml

The job runs every Monday at 13:00 UTC. The first run can be triggered manually:

kubectl create job --from=cronjob/weekly-digest manual-run -n weekly-digest
kubectl logs -f job/manual-run -n weekly-digest

Cost

  • One Anthropic call per week, ~10K input + 600 output tokens with Claude Sonnet ≈ $0.05/week ($2.50/year)
  • One CronJob pod, ~30 seconds/week, free inside EKS Auto's existing capacity
  • ECR storage: <$1/month
  • Net: ~$1–4/month

Source list

Curated in sources.yaml — 20 feeds split across three tiers:

  • Tier 1 (5): Kubernetes Blog, CNCF Blog, Last Week in Kubernetes Development, KubeFM, Learnk8s
  • Tier 2 (10): Red Hat OpenShift, Datadog, Sysdig, Aqua, Snyk, Grafana, HashiCorp, Buoyant, Solo.io, VMware Tanzu
  • Tier 3 (5): Container Solutions, iximiuz, Loft Labs, Ahmet Alp Balkan, KubeWeekly

Edit sources.yaml to add or drop sources. Restart the script — no other config touches needed.

File layout

weekly-digest/
├── digest.py              # main pipeline
├── sources.yaml           # curated RSS feed list
├── requirements.txt
├── Dockerfile             # for the Phase 2 CronJob
├── scripts/
│   └── run.sh             # CronJob entrypoint: clone → run → commit → push
├── k8s/
│   └── cronjob.yaml       # Namespace + CronJob with secret mounts
├── .env.example
├── .gitignore
└── README.md

What this project demonstrates

For a Solutions Engineer / platform engineer / AI-curious portfolio:

  • A real LLM content pipeline in production — not a toy, runs on a schedule.
  • Cost-bounded prompt design — bounded payload, single round-trip per week, ~$0.05/run.
  • Editorial judgment encoded in the system prompt — copyright-aware, signal-over-noise.
  • Kubernetes CronJob + git-driven CI/CD — same pattern as a release pipeline, just for content.
  • Defensive ops — RSS-only, attribution-required, robots.txt respected, contact path published.

About

Automated weekly Kubernetes news roundup — Python + RSS + Anthropic Claude, ships as a Kubernetes CronJob

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors