galaxyproject · mertydn · May 9, 2026 · May 18, 2026
diff --git a/workflows/paleogenomics/aDNA-analysis/.dockstore.yml b/workflows/paleogenomics/aDNA-analysis/.dockstore.yml
@@ -0,0 +1,11 @@
+version: 1.2
+workflows:
+- name: aDNA-analysis
+  subclass: Galaxy
+  publish: true
+  primaryDescriptorPath: /aDNA-analysis.ga
+  testParameterFiles:
+  - /aDNA-analysis-tests.yml
+  authors:
+  - name: Ali Mert AYDIN
+    orcid: 0009-0008-9038-0815
diff --git a/workflows/paleogenomics/aDNA-analysis/CHANGELOG.md b/workflows/paleogenomics/aDNA-analysis/CHANGELOG.md
@@ -0,0 +1,5 @@
+# Changelog
+
+## [0.1] - 2026-05-09
+
+- First release.
diff --git a/workflows/paleogenomics/aDNA-analysis/README.md b/workflows/paleogenomics/aDNA-analysis/README.md
@@ -0,0 +1,72 @@
+# Ancient DNA analysis pipeline
+This workflow performs an ancient DNA (aDNA) analysis similar to the one in the [nf-core/eager](https://nf-co.re/eager/2.5.3/) workflow. nf-core/eager is a bioinformatics best-practice processing pipeline for genomic NGS data, with a focus on ancient DNA. It is suited to the (palaeo)genomic analysis of humans, animals, plants, microbes, and even microbiomes.
+
+The pipeline preprocesses raw data from FASTQ inputs or from preprocessed BAM inputs. It then aligns the reads and performs extensive general NGS and aDNA-specific quality control on the results.
+
+
+## Required Inputs
+To run this workflow successfully, you need to provide the following input datasets:
+
+* **`InputReads`:** The sequencing data for your sample, either as raw `FASTQ` reads or as a `BAM` file (the bundled test uses a `BAM` file, which the first step converts to `FASTQ`). The workflow supports both single-end and paired-end reads.
+* **`ReferenceGenome`:** The reference genome sequence for your target organism in `FASTA` format. It is required for read mapping (BWA) and variant calling.
+* **`HapMapChrXReference`:** A HapMap reference file for the X chromosome. ANGSD requires it to estimate nuclear X-chromosome contamination in human ancient DNA samples.
+
+
+## Workflow Steps
+By default the pipeline currently performs the following:
+
+### 1. Preprocessing and Quality Control
+* **Format Conversion:** Converts input files from BAM/SAM format to FASTQ format (`Picard SamToFastq`)
+* **Quality Control:** Evaluates read quality before and after trimming (`FastQC`)
+* **Adapter Trimming:** Removes adapter sequences and merges paired-end reads (`AdapterRemoval`)
+
+### 2. Read Mapping and Processing
+* **Alignment:** Maps reads to the provided reference genome (`BWA`)
+* **Filtering and Statistics:** Separates unmapped reads and calculates alignment statistics (`Samtools View and Flagstat`)
+* **Duplicate Removal:** Detects and marks PCR duplicates (`Picard MarkDuplicates`)
+* **Alignment Quality:** Generates detailed BAM quality metrics (`QualiMap BamQC`)
+* **Library Complexity:** Estimates library complexity (`Preseq`)
+
+### 3. Ancient DNA (aDNA) Analysis
+* **Damage Profiling:** Visualizes aDNA-specific C-to-T damage patterns (`mapDamage`)
+* **Endogenous Content:** Calculates the proportion of endogenous (target) DNA in the sample (`EndorSpy`)
+* **Contamination:** Estimates nuclear X-chromosome contamination using HapMap data (`ANGSD X-Contamination`)
+
+### 4. Biological Information
+* **Sex Determination:** Determines biological sex based on relative chromosome coverage ratio (`Sex.DetERRmine`)
+* **Mt/Nuc Ratio:** Calculates the ratio of mitochondrial reads to nuclear reads (`MtNucRatioCalculator`)
+
+### 5. Genotyping
+* **Variant Analysis:** Performs variant calling to generate VCF files (`FreeBayes`)
+* **Variant Statistics:** Calculates statistics for the generated variants (`Bcftools stats`)
+
+### 6. Metagenomic Screening (For Unmapped Reads)
+* **Read Extraction:** Extracts unmapped reads for microbial analysis (`Picard SamToFastq`)
+* **Quality Filter:** Filters low-complexity sequences (`BBTools BBduk`)
+* **Taxonomic Classification:** Performs microbiome/taxonomic screening without alignment (`Kraken2`)
+
+### 7. Reporting
+* **Summary Report:** Aggregates logs and statistics from the tools above into a single interactive HTML report (`MultiQC`)
+
+
+## Workflow Outputs
+After a successful run, the workflow marks the following files as its final outputs:
+
+* **`MultiQC Report`:** An interactive HTML report aggregating QC and analysis logs from all tools.
+* **`QualiMap BamQC Report`:** A detailed HTML report containing mapping quality metrics, GC content, and coverage statistics.
+* **`mapDamage Visualisation`:** Plots displaying the characteristic C-to-T deamination patterns at the ends of ancient DNA reads.
+* **`Kraken2 Report`:** A tabular report showing the taxonomic classification of unmapped reads.
+* **`EndorSpy Report`:** A JSON file containing the calculated endogenous DNA percentage.
+* **`Sex.DetERRmine Report`:** A JSON file containing relative chromosomal coverage and the calculated biological sex metrics.
+* **`Mt/Nuc Ratio Report`:** A JSON file containing the calculated ratio between mitochondrial and nuclear reads.
+* **`ANGSD Contamination Report`:** A tabular text file detailing the estimates of nuclear X-chromosome contamination.
+* **`Bcftools Stats Report`:** A text file containing summary statistics for the called variants (VCF).
+
+
+## Testing Data
+The workflow was validated using the following datasets and databases:
+
+* **`Primary Test Data`:** The [JK2067](https://github.com/nf-core/test-datasets/blob/eager/testdata/Human/bam/JK2067.bam) BAM file (HiSeq 1240k-captured UDG-half single-end libraries containing approximately 10,000 reads post-clipping) from Lamnidis et al., 2018, Nat. Commun.
+* **`Primary Reference Genome`:** The [hs37d5_chr21-MT.fa.gz](https://github.com/nf-core/test-datasets/blob/eager/reference/Human/hs37d5_chr21-MT.fa.gz) file was used as the reference genome sequence.
+* **`X-Chromosome Contamination Reference`:** The [HapMap ChrX](https://github.com/ANGSD/angsd/blob/master/RES/HapMapChrX.gz) dataset was used as the reference for estimating X-chromosome contamination with ANGSD.
+* **`Taxonomic Classification Database`:** The Minikraken v2 database was used for taxonomic classification with Kraken2.
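
Note on the README above: two of the metrics it lists, the endogenous DNA percentage and the Mt/Nuc ratio, are described as simple proportions of read counts. The following is a minimal sketch of that arithmetic only, assuming the counts are already available (for example from alignment statistics). The function names and the example counts are hypothetical and are not taken from `EndorSpy` or `MtNucRatioCalculator`, which may filter or count reads differently.

```python
def endogenous_percent(mapped_reads: int, total_reads: int) -> float:
    """Percentage of all reads that mapped to the reference (hypothetical helper)."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return 100.0 * mapped_reads / total_reads


def mt_nuc_ratio(mt_reads: int, nuclear_reads: int) -> float:
    """Ratio of mitochondrial reads to nuclear reads (hypothetical helper)."""
    if nuclear_reads <= 0:
        raise ValueError("nuclear_reads must be positive")
    return mt_reads / nuclear_reads


if __name__ == "__main__":
    # Illustrative counts only; these are not results from the test data.
    print(endogenous_percent(mapped_reads=250, total_reads=10_000))
    print(mt_nuc_ratio(mt_reads=30, nuclear_reads=120))
```

The guards against zero or negative denominators matter for low-coverage aDNA libraries, where a sample may have no nuclear reads at all.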
diff --git a/workflows/paleogenomics/aDNA-analysis/aDNA-analysis-tests.yml b/workflows/paleogenomics/aDNA-analysis/aDNA-analysis-tests.yml
@@ -0,0 +1,77 @@
+- doc: Test outline for aDNA-analysis.ga
+  job:
+    InputReads:
+      class: File
+      location: https://github.com/nf-core/test-datasets/raw/eager/testdata/Human/bam/JK2067.bam
+      filetype: bam
+    ReferenceGenome:
+      class: File
+      location: https://github.com/nf-core/test-datasets/raw/eager/reference/Human/hs37d5_chr21-MT.fa.gz
+      filetype: fasta.gz
+    HapMapChrXReference:
+      class: File
+      location: https://github.com/ANGSD/angsd/raw/master/RES/HapMapChrX.gz
+      filetype: gz
+  outputs:
+    EndorSpyReport:
+      asserts:
+        has_text:
+          text: "percent_on_target"
+    SexdetERRmineReport:
+      asserts:
+        has_text:
+          text: "Sex.DetERRmine"
+    MtNucReport:
+      asserts:
+        has_text:
+          text: "mtnuccalculator"
+    BamQCReport:
+      asserts:
+        has_text:
+          text: "Qualimap Report: BAM QC"
+    DamageVisualisation:
+      element_tests:
+        dnacomp:
+          asserts:
+            has_text:
+              text: "mapDamage"
+        misincorporation:
+          asserts:
+            has_text:
+              text: "mapDamage"
+        5pCtoT_freq:
+          asserts:
+            has_text:
+              text: "5pC>T"
+        3pGtoA_freq:
+          asserts:
+            has_text:
+              text: "3pG>A"
+        Fragmisincorporation_plot:
+          asserts:
+            has_size:
+              min: 100
+        lgdistribution:
+          asserts:
+            has_text:
+              text: "mapDamage"
+        Length_plot:
+          asserts:
+            has_size:
+              min: 100
+    ANGSDReport:
+      asserts:
+        has_text:
+          text: "Method1_MOM_estimate"
+    bcftoolsReport:
+      asserts:
+        has_text:
+          text: "ACT>TCGA"
+    Kraken2Report:
+      asserts:
+        has_text:
+          text: "root"
+    MultiQCReport:
+      asserts:
+        has_text:
+          text: "MultiQC"
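
Note on the test file above: it uses only two assertion types, `has_text` (the output contains a given string) and `has_size` with `min` (the output is at least that many bytes). The sketch below is a simplified stand-in for those two checks, meant to show what each test entry requires of an output. `check_asserts` is a hypothetical helper, not Galaxy's or Planemo's implementation, and it ignores every other assertion option.

```python
def check_asserts(data: bytes, asserts: dict) -> bool:
    """Return True if `data` satisfies the has_text / has_size(min) entries.

    Simplified sketch: only the two assertion forms used in the tests
    above are handled; anything else in `asserts` is ignored.
    """
    if "has_text" in asserts:
        needle = asserts["has_text"]["text"].encode("utf-8")
        if needle not in data:
            return False
    if "has_size" in asserts:
        if len(data) < asserts["has_size"]["min"]:
            return False
    return True


if __name__ == "__main__":
    # Illustrative content only; not real workflow output.
    report = b'{"percent_on_target": 2.5}'
    print(check_asserts(report, {"has_text": {"text": "percent_on_target"}}))
    print(check_asserts(b"\x89PNG", {"has_size": {"min": 100}}))
```

Under this reading, the `has_size` checks on the two plot elements only confirm that a non-trivial file was produced, while the `has_text` checks confirm that each report carries a marker string from its tool.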