Skip to content

feat: upgrade ergo-airflow plugin for Airflow 3.x compatibility#32

Open
joelkbiju12 wants to merge 10 commits into
mainfrom
chore-airflow-upgrade
Open

feat: upgrade ergo-airflow plugin for Airflow 3.x compatibility#32
joelkbiju12 wants to merge 10 commits into
mainfrom
chore-airflow-upgrade

Conversation

@joelkbiju12

Copy link
Copy Markdown
Contributor

Summary

  • Migrate ergo-airflow plugin from Airflow 2.3.4 to Airflow 3.x (3.1.7) compatibility
  • Update all operator, sensor, and trigger imports to use airflow.sdk and airflow.providers.standard
  • Replace deprecated appbuilder_views with simplified plugin registration
  • Update DAG definitions to use airflow.sdk.DAG with schedule parameter
  • Migrate SQLAlchemy models and Alembic migration env for Airflow 3.x compatibility
  • Simplify operator base classes and remove deprecated Airflow 2.x patterns

Test plan

  • Verify ergo plugin loads without errors in Airflow 3.1.7
  • Test DAG queuers (aries, calipso, selenium) trigger correctly
  • Validate job collector DAG and deferred job result operator
  • Confirm SQS task pusher and result operators function
  • Check Ergo task detail links render properly

🤖 Generated with Claude Code

@vulcanho vulcanho Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 VulcanHO Code Review

I've reviewed your changes and provided 7 suggestions:

  • 7 code improvement suggestions
  • 7 review comments

📋 Review Summary

PR updates the project for Airflow 3.x compatibility (SDK imports, provider hooks/sensors, schedule API, and UI changes) and adjusts migrations/configuration. Key risks include removal of UI registrations, dropped DB foreign key constraints, and loss of automatic initdb hook.

⚠️ CRITICAL BREAKING CHANGES

These changes will cause immediate failures and must be addressed before merge:

⚠️ 1. Removed Flask AppBuilder view and blueprint registration from the plugin; the /ergo UI routes are no longer registered unless replaced by a FastAPI-compatible mechanism.

Impact: src/plugin.py, src/www/views.py; UI access to Ergo task detail pages

Severity: HIGH

⚪ 2. Dropped the foreign key constraint between ergo_task and task_instance, removing cascade delete behavior and DB-level integrity checks.

Impact: src/models.py and src/migrations/versions/30f7e779a832_updated_to_2_2_taskinstance_table_schema.py; task cleanup and integrity enforcement

Severity: MEDIUM

⚪ 3. Removed the wrapper that triggered Ergo migrations during airflow db upgrade; migrations may no longer auto-run during upgrades.

Impact: src/init.py and migration initialization path

Severity: MEDIUM

💡 General Feedback

🏢 The task-detail UI flow now relies on direct URL generation and the /ergo/task_detail route. Ensure the route is still exposed under Airflow 3’s FastAPI webserver (since FAB/blueprints were removed) and that the request parameters are safely encoded. Flow overview:

sequenceDiagram
  participant UI as Airflow UI
  participant Link as ErgoTaskDetailLink
  participant View as ErgoView
  participant DB as Metadata DB
  UI->>Link: Build link from task instance
  UI->>View: GET /ergo/task_detail?...
  View->>DB: Query DagRun + ErgoTask
  DB-->>View: Task + Job
  View-->>UI: Render task_detail.html
Loading

✨ The previous automatic initdb hook (wrapping airflow db upgradedb) was removed. If Airflow 3 does not call Ergo migrations elsewhere, consider documenting or wiring an explicit migration step to avoid missing ergo_task/ergo_job tables in new or upgraded deployments.


⚠️ This PR should not be merged until all critical breaking changes are resolved.

Each suggestion is targeted to specific lines in your code. Feel free to ask questions or request clarification on any feedback!

@@ -1,5 +1,4 @@
from airflow.models.baseoperator import BaseOperatorLink
from flask import url_for
from airflow.sdk import BaseOperatorLink

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Import urlencode for safe query construction

The link now builds a query string manually; import urlencode so parameters are encoded safely.

Suggested change
from airflow.sdk import BaseOperatorLink
from airflow.sdk import BaseOperatorLink
from urllib.parse import urlencode

Comment thread src/links/ergo_task_detail.py Outdated
Comment on lines +11 to +12
# In Airflow 3, the www module is gone; return a simple URL path
return f'/ergo/task_detail?ti_task_id={operator.task_id}&ti_dag_id={operator.dag_id}&ti_execution_date={dttm}'

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🏢 URL-encode task detail parameters

dttm renders with spaces/timezone and can create malformed URLs. Build the query using urlencode and use an ISO string; also handle None to avoid parse failures in the view.

Suggested change
# In Airflow 3, the www module is gone; return a simple URL path
return f'/ergo/task_detail?ti_task_id={operator.task_id}&ti_dag_id={operator.dag_id}&ti_execution_date={dttm}'
query = urlencode({
"ti_task_id": operator.task_id,
"ti_dag_id": operator.dag_id,
"ti_execution_date": dttm.isoformat() if dttm else ""
})
return f"/ergo/task_detail?{query}"

Comment thread src/migrations/env.py
Comment on lines +83 to 86
cfg,
prefix="sqlalchemy.",
poolclass=pool.NullPool,
)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Guard against missing alembic config section

config.get_section(...) can return None; assigning to cfg['sqlalchemy.url'] would raise a TypeError and prevent migrations. Use a default dict when the section is missing.

Suggested change
cfg,
prefix="sqlalchemy.",
poolclass=pool.NullPool,
)
cfg = config.get_section(config.config_ini_section) or {}
db_url = os.getenv('AIRFLOW__CORE__SQL_ALCHEMY_CONN')
if db_url:
cfg['sqlalchemy.url'] = db_url

'retries': 2,
'retry_delay': timedelta(minutes=1),
'start_date': days_ago(1),
'start_date': datetime(2024, 1, 1),

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🏢 Use timezone-aware start_date

A naive datetime can trigger scheduling warnings and run offsets; use Airflow’s timezone-aware datetime helper.

Suggested change
'start_date': datetime(2024, 1, 1),
'start_date': timezone.datetime(2024, 1, 1),

'retries': 2,
'retry_delay': timedelta(minutes=1),
'start_date': days_ago(1),
'start_date': datetime(2024, 1, 1),

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🏢 Use timezone-aware start_date

A naive datetime can trigger scheduling warnings and run offsets; use Airflow’s timezone-aware datetime helper.

Suggested change
'start_date': datetime(2024, 1, 1),
'start_date': timezone.datetime(2024, 1, 1),

@@ -16,7 +15,7 @@
'depends_on_past': False,
'retries': 10,
'retry_delay': timedelta(seconds=30),

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🏢 Use timezone-aware start_date

A naive datetime can trigger scheduling warnings and run offsets; use Airflow’s timezone-aware datetime helper.

Suggested change
'retry_delay': timedelta(seconds=30),
'start_date': timezone.datetime(2024, 1, 1),

'retries': 2,
'retry_delay': timedelta(minutes=1),
'start_date': days_ago(1),
'start_date': datetime(2024, 1, 1),

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🏢 Use timezone-aware start_date

A naive datetime can trigger scheduling warnings and run offsets; use Airflow’s timezone-aware datetime helper.

Suggested change
'start_date': datetime(2024, 1, 1),
'start_date': timezone.datetime(2024, 1, 1),

joelkbiju12 and others added 8 commits March 9, 2026 11:31
Airflow 3.x blocks Session() in task subprocesses. Created ergo.db module
with direct SQLAlchemy engine to bypass this restriction for custom tables
(ergo_task, ergo_job). Replaced all provide_session imports in operators,
sensors, and triggers.
Airflow 3.x task subprocesses may not properly resolve env var
overrides via conf.get(). Reading os.environ directly is more
reliable.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Airflow 3.x block_orm_access() overwrites both os.environ and
conf with "airflow-db-not-allowed:///" in task subprocesses.
Capture the real DB URL at module import time (before the block
runs) so custom table access still works.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Also fix ErgoTaskDetailLink.get_link() signature for Airflow 3.x
(ti_key keyword arg instead of dttm positional).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The FK constraint exists in the DB schema but our standalone
SQLAlchemy engine doesn't know Airflow's tables, causing
NoReferencedTableError on commit.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant