Health data set upload#234

Open
reanbrenda wants to merge 8 commits into main from health-upload-dagster-flow

@reanbrenda
Collaborator

What type of PR is this?

  • build: Commits that affect build components like build tool, dependencies, project
    version
  • chore: Miscellaneous commits (e.g. modifying .gitignore)
  • ci: Commits are special build commits that affect the CI/CD pipeline
  • docs: Commits that affect documentation only
  • feat: Commits that add a new feature
  • fix: Commits that fix a bug
  • perf: Commits are special refactor commits that improve performance
  • refactor: Commits that rewrite/restructure your code, however does not change any
    behaviour
  • revert: Commits that revert another commit/PR, usually can be autogenerated on
    GitHub or using git revert
  • style: Commits are special refactor commits that edit the code to comply with a
    code style, linter, or formatter
  • test: Commits that add missing tests or correcting existing tests

Summary

What does this PR do
This PR adds a dedicated health dataset upload flow to the Giga Data Ingestion portal and API.

Changes in this PR:

  • New health upload UI path: Adds a "Health dataset" button on the upload landing page, routing users through a two-step flow: CSV file selection → health metadata form → submit
  • Health metadata form: Replicates the school metadata structure with health-specific labels (focal point, data owner, year, dataset description, modality, facility ID type) using a shared base form component
  • Health-specific facility ID options: Dropdown uses dhis2_id, hims_id, hfml_id, Other, Unknown instead of school ID types (EMIS, Examination code)
  • API, dedicated health storage path: Files uploaded with portal_dataset: "health" are stored at updated_master_schema/health-master//.csv with sidecar metadata JSON
  • Health tab on upload list: Health uploads appear under a new "Health" tab filtered by dataset=health
  • No portal DQ: Health uploads skip in-portal data quality checks (dq_status = SKIPPED); validation and enrichment happen downstream in Dagster
  • Success screen: Informs users that downstream staging in Azure Data Lake and Dagster will process the file
  • Copy improvements: Capitalizes "Health Master" as a proper noun; softens metadata prompt wording
  • Column-mapping refactor: Extracts a reusable BaseUploadMetadataForm component shared between school and health flows
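
For reviewers, the health-specific facility ID dropdown listed above could be modelled roughly as follows. This is a minimal sketch and not the code in this PR: the option values come from the summary, while `HEALTH_ID_TYPE_OPTIONS`, `HealthIdType`, and `isHealthIdType` are hypothetical names chosen for illustration.

```typescript
// Option values taken from the PR summary; all identifiers here are hypothetical.
const HEALTH_ID_TYPE_OPTIONS = [
  "dhis2_id",
  "hims_id",
  "hfml_id",
  "Other",
  "Unknown",
] as const;

// Union type derived from the tuple: "dhis2_id" | "hims_id" | "hfml_id" | "Other" | "Unknown".
type HealthIdType = (typeof HEALTH_ID_TYPE_OPTIONS)[number];

// Type guard that narrows arbitrary form input to one of the known options.
function isHealthIdType(value: string): value is HealthIdType {
  return HEALTH_ID_TYPE_OPTIONS.some((option) => option === value);
}
```

Deriving the union type from the `as const` array keeps the dropdown values and the accepted type in one place, so a school ID type such as `"EMIS"` is rejected by the guard.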

How to test

  1. Instructions on how to test
  2. Specify which files to review
  3. etc.

Link to Jira/Asana/Airtable task (if applicable)

placeholder

Wireframe screenshot/screencap (if applicable)

placeholder

Implementation screenshot/screencap (if applicable)

placeholder

reanbrenda and others added 3 commits May 6, 2026 14:21
Add a dedicated health upload path in the portal and API, including health metadata capture, blob path routing, timestamped filenames, and sidecar metadata persistence so health uploads are traceable and consistent with existing ingestion behavior.

Co-authored-by: Cursor <cursoragent@cursor.com>
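
The timestamped filename and sidecar metadata described in this commit message might look something like the sketch below. Everything in it is an assumption made for illustration: the `HealthUploadMetadata` fields, the `_YYYYMMDD-HHMMSS` suffix, and the `.metadata.json` sidecar name are hypothetical and may not match what the PR actually writes.

```typescript
// Hypothetical metadata shape; field names are illustrative, not the PR's schema.
interface HealthUploadMetadata {
  focalPoint: string;
  dataOwner: string;
  year: number;
  description: string;
}

// Formats a Date as a UTC timestamp of the form YYYYMMDD-HHMMSS.
function formatTimestamp(date: Date): string {
  const pad = (n: number): string => (n < 10 ? `0${n}` : `${n}`);
  return (
    `${date.getUTCFullYear()}${pad(date.getUTCMonth() + 1)}${pad(date.getUTCDate())}` +
    `-${pad(date.getUTCHours())}${pad(date.getUTCMinutes())}${pad(date.getUTCSeconds())}`
  );
}

// Returns the timestamped data filename plus the sidecar filename and JSON body.
function planUpload(
  originalName: string,
  metadata: HealthUploadMetadata,
  uploadedAt: Date,
): { dataName: string; sidecarName: string; sidecarBody: string } {
  // Drop a trailing ".csv" (case-insensitive) so the timestamp goes before the extension.
  const stem = originalName.replace(/\.csv$/i, "");
  const dataName = `${stem}_${formatTimestamp(uploadedAt)}.csv`;
  return {
    dataName,
    sidecarName: `${dataName}.metadata.json`,
    sidecarBody: JSON.stringify(
      { ...metadata, uploadedAt: uploadedAt.toISOString() },
      null,
      2,
    ),
  };
}
```

Naming the sidecar after the timestamped data file keeps the two traceable to each other even when the same source file is uploaded more than once.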
Collaborator

@brianmusisi brianmusisi left a comment

  1. Look at the comments I've shared and adjust everything related to them. This should be simple: upload file -> dump into the location. Nothing else.
  2. Remember that we need to select a country because the files need to go into each country's specific folder
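
To make the two points above concrete, the "upload file, dump into the country's folder" routing could be sketched as a single path helper. This is only an illustration under stated assumptions: `buildCountryBlobPath` is a hypothetical name, the `prefix/COUNTRY/filename` layout is assumed rather than taken from the PR, and countries are assumed to be identified by three-letter codes.

```typescript
// Hypothetical helper: joins a storage prefix, a country code, and a filename.
// The prefix/COUNTRY/filename layout is an assumption, not the PR's actual path.
function buildCountryBlobPath(
  prefix: string,
  countryCode: string,
  filename: string,
): string {
  // Assumption: countries are identified by three-letter codes (e.g. ISO 3166-1 alpha-3).
  if (!/^[A-Za-z]{3}$/.test(countryCode)) {
    throw new Error(`Invalid country code: ${countryCode}`);
  }
  // Reject empty names and names containing path separators, so the file
  // cannot land outside the country's folder.
  if (filename.length === 0 || /[\/\\]/.test(filename)) {
    throw new Error(`Invalid filename: ${filename}`);
  }
  // Trim leading and trailing slashes so the join never produces "//".
  const cleanPrefix = prefix.replace(/^\/+|\/+$/g, "");
  return `${cleanPrefix}/${countryCode.toUpperCase()}/${filename}`;
}
```

With this shape, the upload handler only needs the selected country and the file itself; no metadata form is involved in deciding where the file goes.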

Comment thread api/data_ingestion/routers/upload.py
@@ -0,0 +1,266 @@
import type { ReactNode } from "react";
Collaborator

I'm not clear on what the logic in this file does

"Unknown",
] as const;

const healthIdTypeOptions = [
Collaborator

Same here: we are only uploading a file, so this should not be relevant. It is just a data file dump.

],
};

/** Same field `name`s as school mapping (Zod / API JSON keys); health-specific labels and section titles. */
Collaborator

Same here, not required

Collaborator

Be careful with this file refactor: it deletes the content of this file, which affects the other upload flows that use this mapping.

3 participants