HyPhy v0.2: replace CDS input with genome FASTA + GTF #1265

Merged
d-callan merged 6 commits into galaxyproject:main from d-callan:hyphy-update
Jul 4, 2026

Conversation

@d-callan

@d-callan d-callan commented Jul 2, 2026

Collaborator

Updates the HyPhy workflow suite (CAPHEINE, Core, Preprocessing) to accept a reference genome FASTA and a GTF annotation file instead of a pre-extracted CDS FASTA. This enables gffread to derive coding sequences automatically, aligning the workflows with BRC Analytics' ASSEMBLY_FASTA_URL + GENE_MODEL_URL parameter model.

Changes

  • Workflow files (*.ga)
    • Replaced single reference cds input with reference GTF + reference Fasta inputs
    • Added gffread step to extract CDS from annotated genome
    • Bumped release to 0.2, removed source_metadata
  • Test data
    • Added denv1_genome.fasta (NC_001477.1 full genome, ~10.7 kb)
    • Added denv1_ref.gtf with two gene annotations (capsid protein C: 95–394; prM: 437–934)
    • Retained denv1_ref_cds.fasta for reference but removed from test inputs
  • Test definitions
    • Updated all 3 test YAML files to supply genome FASTA + GTF
    • Updated collection element test keys to match gffread gene_id output (capsid_protein_C, membrane_glycoprotein_precursor_prM)
  • Documentation
    • README.md: updated test data description
    • CHANGELOG.md: documented v0.2 changes
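For readers unfamiliar with the new input model, here is a minimal Python sketch of what "extract CDS from genome FASTA + GTF" means conceptually. It is not gffread's implementation and is not part of the workflow. The GTF rows are illustrative stand-ins built from the coordinates and gene_ids mentioned above (the actual denv1_ref.gtf may differ), the "+" strand and the `CDS` feature type are assumptions, and the genome is a placeholder string rather than NC_001477.1.

```python
from typing import Dict, List, Tuple

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def parse_gtf_cds(gtf_text: str) -> Dict[str, List[Tuple[int, int, str]]]:
    """Collect (start, end, strand) for every CDS row, keyed by gene_id."""
    cds_by_gene: Dict[str, List[Tuple[int, int, str]]] = {}
    for line in gtf_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 9 or fields[2] != "CDS":
            continue
        start, end, strand = int(fields[3]), int(fields[4]), fields[6]
        # Attribute column looks like: gene_id "x"; transcript_id "y";
        attrs = {}
        for item in fields[8].split(";"):
            item = item.strip()
            if item:
                key, _, value = item.partition(" ")
                attrs[key] = value.strip('"')
        cds_by_gene.setdefault(attrs["gene_id"], []).append((start, end, strand))
    return cds_by_gene


def extract_cds(genome: str, cds_by_gene) -> Dict[str, str]:
    """Slice each gene's CDS segments out of the genome sequence."""
    sequences = {}
    for gene_id, segments in cds_by_gene.items():
        ordered = sorted(segments)
        # GTF coordinates are 1-based and inclusive on both ends.
        seq = "".join(genome[start - 1:end] for start, end, _ in ordered)
        if ordered[0][2] == "-":
            seq = seq.translate(_COMPLEMENT)[::-1]  # reverse complement
        sequences[gene_id] = seq
    return sequences


# Illustrative rows only: coordinates and gene_ids are taken from this PR
# description; source column, strand, and frame are assumptions.
example_gtf = "\n".join([
    'NC_001477.1\texample\tCDS\t95\t394\t.\t+\t0\t'
    'gene_id "capsid_protein_C"; transcript_id "capsid_protein_C";',
    'NC_001477.1\texample\tCDS\t437\t934\t.\t+\t0\t'
    'gene_id "membrane_glycoprotein_precursor_prM"; '
    'transcript_id "membrane_glycoprotein_precursor_prM";',
])
toy_genome = "ACGT" * 300  # 1200 nt placeholder, not the real genome

cds = extract_cds(toy_genome, parse_gtf_cds(example_gtf))
for gene_id, seq in cds.items():
    # Length and remainder modulo 3 (0 means a whole number of codons)
    print(gene_id, len(seq), len(seq) % 3)
```

With the coordinates listed above, the two regions are 300 nt and 498 nt long, both whole multiples of three, and the dictionary keys match the collection element names used in the updated tests.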

Verification

  • All 4 CAPHEINE planemo test scenarios pass locally with --mulled_containers
  • gffread correctly extracts both CDS regions from the genome + GTF combo

FOR CONTRIBUTOR:

FOR REVIEWERS:

  • .dockstore.yml: file is present and aligned with creator metadata in workflow. ORCID identifiers are strongly encouraged in creator metadata. The .dockstore.yml file is required to run tests
  • Workflow is sufficiently generic to be used with lab data, does not hardcode sample names or reference data, and can be run without reading an accompanying tutorial.
  • In workflow: annotation field contains short description of what the workflow does. Should start with This workflow does/runs/performs … xyz … to generate/analyze/etc …
  • In workflow: workflow inputs and outputs have human readable names (spaces are fine, no underscore, dash only where spelling dictates it), no abbreviation unless it is generally understood. Altering input or output labels requires adjusting these labels in the workflow-tests.yml file as well
  • In workflow: name field should be human readable (spaces are fine, no underscore, dash only where spelling dictates it), no abbreviation unless generally understood
  • Workflow folder: prefer dash (-) over underscore (_), prefer all lowercase. Folder becomes repository in iwc-workflows organization and is included in TRS id
  • Readme explains what workflow does, what are valid inputs and what outputs users can expect. If a tutorial or other resources exist they can be linked. If a similar workflow exists in IWC readme should explain differences with existing workflow and when one might prefer one workflow over another
  • Changelog contains appropriate entries
  • Large files (> 100 KB) are uploaded to zenodo and location urls are used in test file
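The large-file item in the checklist can be checked mechanically. The following is a rough sketch, not an IWC tool: the helper name `oversized_files` is hypothetical, and reading "100 KB" as 100 × 1024 bytes is an assumption.

```python
import tempfile
from pathlib import Path
from typing import List, Tuple

SIZE_LIMIT_BYTES = 100 * 1024  # assumption: "100 KB" interpreted as 100 KiB


def oversized_files(root: str, limit: int = SIZE_LIMIT_BYTES) -> List[Tuple[str, int]]:
    """Return (relative path, size in bytes) for files under root larger than limit."""
    results = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            size = path.stat().st_size
            if size > limit:
                results.append((str(path.relative_to(root)), size))
    return results


# Demonstration on a throwaway directory with placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "small.fasta").write_bytes(b"A" * 11_000)
    Path(tmp, "large.fasta").write_bytes(b"A" * 150_000)
    print(oversized_files(tmp))  # [('large.fasta', 150000)]
```

By this measure a genome FASTA of roughly 10.7 kb of sequence should sit well below the threshold, assuming about one byte per base.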

@github-actions

github-actions Bot commented Jul 2, 2026


Test Results (powered by Planemo)

Test Summary

Test State Count
Total 7
Passed 6
Error 1
Failure 0
Skipped 0
Errored Tests
  • ❌ hyphy-compare.ga_0
Passed Tests
  • ✅ capheine-core-and-compare.ga_0
  • ✅ capheine-core-and-compare.ga_1
  • ✅ capheine-core-and-compare.ga_2
  • ✅ capheine-core-and-compare.ga_3
  • ✅ hyphy-core.ga_0
  • ✅ hyphy-preprocessing.ga_0

… readmes

- Swap the transposed reference GTF / reference Fasta input annotations in
  the HyPhy preprocessing subworkflow (labels/wiring were already correct)
- Update embedded workflow READMEs (CAPHEINE, Core, Preprocessing) that still
  described the removed 'Reference CDS FASTA' input to the new genome FASTA +
  GTF model with gffread CDS extraction
@mvdbeek

mvdbeek commented Jul 3, 2026

Member

@d-callan I pushed a small follow-up commit (193ae07) with two documentation fixes found during review — could you please review them when you get a chance?

  1. Swapped input annotations in the HyPhy preprocessing subworkflow: the reference GTF input was annotated "fasta for the genome…" and reference Fasta was annotated "gtf for the genome…". The labels and gffread wiring were already correct (GTF → input, FASTA → genome_fasta), so this only swaps the two description strings so they match their inputs. Applied consistently in the standalone preprocessing workflow and its embedded copies in Core and CAPHEINE.

  2. Stale embedded READMEs: the readme fields inside the .ga files still listed "Reference CDS FASTA" as input #2. Updated them (CAPHEINE, Core, Preprocessing) to the new genome FASTA + GTF annotation (gffread extracts CDS) model, matching the top-level README.md you already updated.

Net change is 18/18 lines, text-only — no wiring, tool, or test changes. The functional change (GTF+FASTA → gffread → CDS) looks good and the test element keys match the GTF gene_ids.

Two optional, non-blocking items I did not change, up to you:

  • Label capitalization is mixed: reference GTF vs reference Fasta (consider reference FASTA).
  • Pre-existing workflow_outputs labels use underscores (busted_output, fel_output, treefile, etc.) — not introduced here, but the release bump would be a natural time to make them human-readable.
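If it helps with the second item, here is a small sketch for listing labels that contain underscores. It assumes the .ga file is JSON with a top-level `steps` mapping, where input steps carry a `label` and each step may carry `workflow_outputs` entries with their own `label`. The example dictionary is a hand-written stand-in, not the real workflow, and `find_underscore_labels` is a hypothetical helper name.

```python
from typing import Dict, List

INPUT_STEP_TYPES = ("data_input", "data_collection_input", "parameter_input")


def find_underscore_labels(workflow: Dict) -> List[str]:
    """Return workflow input and output labels that contain an underscore."""
    flagged = []
    for step in workflow.get("steps", {}).values():
        labels = []
        # Input steps expose their user-facing name as the step label.
        if step.get("type") in INPUT_STEP_TYPES:
            labels.append(step.get("label"))
        # Outputs marked as workflow outputs carry their own label.
        for output in step.get("workflow_outputs", []):
            labels.append(output.get("label"))
        flagged.extend(label for label in labels if label and "_" in label)
    return flagged


# Hand-written stand-in using labels mentioned in this thread.
example_workflow = {
    "steps": {
        "0": {"type": "data_input", "label": "reference GTF", "workflow_outputs": []},
        "1": {"type": "data_input", "label": "reference Fasta", "workflow_outputs": []},
        "2": {
            "type": "tool",
            "label": None,
            "workflow_outputs": [
                {"label": "busted_output", "output_name": "busted_output"},
                {"label": "fel_output", "output_name": "fel_output"},
            ],
        },
    }
}
print(find_underscore_labels(example_workflow))  # ['busted_output', 'fel_output']
```

For a real file, the same function could be applied to the result of `json.load` on the .ga file. Note that a label such as treefile would not be flagged by this check, since it has no underscore, so it would still need a manual look.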

@mvdbeek mvdbeek left a comment

Member

Looks good to me, but have a look at the commit I added.

@d-callan d-callan merged commit 5c8d90f into galaxyproject:main Jul 4, 2026
11 of 16 checks passed
@d-callan d-callan deleted the hyphy-update branch July 4, 2026 11:44