Feat/sample and analytics by qplevier · Pull Request #372 · ckan/ckanext-dcat

qplevier · 2026-02-26T09:13:47Z

This pull request implements improved handling and serialization of the sample and analytics fields in DCAT-AP and HealthDCAT-AP profiles, aligning them with the latest Euro DCAT-AP 3 and HealthDCAT-AP specifications. It also enhances agent serialization for HealthDCAT-AP with support for additional properties. The changes update both parsing and serialization logic, as well as relevant tests and schemas.

DCAT-AP and Euro DCAT-AP 3:

Added support for the sample field as a list of URIs in both parsing (parse_dataset) and serialization (graph_from_catalog, _graph_from_dataset_v3), including schema updates and test coverage. [1] [2] [3] [4] [5] [6]
Refactored handling of distributions to consistently collect and serialize their URIs in a new distribution field, and improved compatibility tweaks for legacy support. [1] [2] [3] [4]

HealthDCAT-AP:

Added support for the analytics field as a list of URIs, including parsing, serialization, and test updates. [1] [2] [3] [4] [5]
Enhanced agent serialization to include publisherNote and publisherType properties, with multilingual support and proper RDF output.
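
To make the agent change concrete, here is a minimal, dependency-free sketch of emitting publisherNote and publisherType for an agent. It is not the code in this PR: triples are modelled as plain tuples rather than rdflib terms, the namespace URI is an assumption, and the function name and the `publisher_note` / `publisher_type` dict keys are hypothetical.

```python
# Assumed namespace for HealthDCAT-AP terms; verify against the profile.
HEALTHDCATAP = "http://healthdataportal.eu/ns/health#"


def agent_extra_triples(agent_ref, agent_dict):
    """Return (subject, predicate, object) tuples for the extra agent properties.

    Literals are modelled as ("literal", text, lang) and URIs as ("uri", value).
    """
    triples = []

    note = agent_dict.get("publisher_note")  # hypothetical key
    if isinstance(note, dict):
        # Multilingual value: one language-tagged literal per non-empty translation.
        for lang, text in sorted(note.items()):
            if text:
                triples.append(
                    (agent_ref, HEALTHDCATAP + "publisherNote", ("literal", text, lang))
                )
    elif note:
        # Plain string: emit a single literal without a language tag.
        triples.append(
            (agent_ref, HEALTHDCATAP + "publisherNote", ("literal", note, None))
        )

    publisher_type = agent_dict.get("publisher_type")  # hypothetical key
    if publisher_type:
        triples.append(
            (agent_ref, HEALTHDCATAP + "publisherType", ("uri", publisher_type))
        )

    return triples
```

Empty translations are skipped so that no empty literals end up in the graph.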

Testing and Configuration:

Updated and added tests to reflect the new handling of sample and analytics fields, and adjusted test configuration for database connectivity. [1] [2] [3] [4]

These changes ensure better compliance with the latest DCAT-AP standards and improve interoperability and data quality for CKAN-based data catalogs.

Extract distribution parsing into _parse_distribution and use it when building dataset dicts; collect distribution URIs into dataset_dict["distribution"] and emit DCAT.distribution triples when graphing. Add ADMS.sample handling in DCAT-AP3 parsing and round-trip graph serialization for dataset samples. Extend the HealthDCAT-AP profile to parse/serialize analytics distributions and include publisherNote/publisherType on agents, with helpers to read/write those properties. Update test to use .get() for analytics presence. Overall, this reduces duplicated distribution parsing code and adds support for sample, analytics, and agent metadata.

Add handling for the dataset 'sample' property in the European DCAT-AP 3 profile by using _add_list_triples_from_dict with ADMS.sample (allowing URI or literal). Update schema (dcat_ap_full.yaml) to include a 'sample' dataset field. Adjust tests: remove legacy v2 sample assertions, add parsing/serialization checks for v3, and update the example dataset JSON to include sample values. These changes enable parsing and serializing ADMS.sample values for DCAT-AP v3 datasets.
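
The "URI or literal" behaviour described above can be sketched as follows. This is a simplified, dependency-free illustration rather than the extension's implementation: triples are plain tuples instead of rdflib terms, and `is_uri`, `sample_triples` and `parse_samples` are hypothetical names. The URI check is deliberately naive (it requires a scheme and a host, so values such as `urn:` identifiers would be treated as literals here).

```python
from urllib.parse import urlparse

ADMS_SAMPLE = "http://www.w3.org/ns/adms#sample"


def is_uri(value):
    """Naive check: treat a value as a URI only if it has a scheme and a host."""
    parsed = urlparse(value)
    return bool(parsed.scheme and parsed.netloc)


def sample_triples(dataset_ref, dataset_dict):
    """Serialize dataset_dict["sample"] as adms:sample triples.

    Objects are modelled as ("uri", value) or ("literal", value).
    """
    triples = []
    for value in dataset_dict.get("sample") or []:
        if not value:
            continue  # skip empty entries
        kind = "uri" if is_uri(value) else "literal"
        triples.append((dataset_ref, ADMS_SAMPLE, (kind, value)))
    return triples


def parse_samples(triples, dataset_ref):
    """Read the adms:sample values of a dataset back into a list of strings."""
    return [
        obj[1]
        for subj, pred, obj in triples
        if subj == dataset_ref and pred == ADMS_SAMPLE
    ]
```

Serializing a dataset dict and parsing the result back should return the same list of sample values, which is the round trip the updated v3 tests are described as checking.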

amercader

@qplevier The sample and analytics handling looks good, but I'm not sure about the logic around distribution URIs.

amercader · 2026-05-22T13:02:11Z

+            distribution_uris.append(str(distribution))
+
+        if distribution_uris:
+            dataset_dict["distribution"] = distribution_uris


I'm confused about this. Why do we need it?

amercader · 2026-05-22T13:05:09Z

+        for dist_uri in dataset_dict.get("distribution", []):
+            if dist_uri:
+                g.add((dataset_ref, DCAT.distribution, URIRef(dist_uri)))
+


Why is this done? IIUC, dataset_dict["distribution"] is an internal property used during parsing that should not be part of the output dataset_dict, so it shouldn't be available when serializing (or at least we shouldn't rely on it being present).
Besides, the reference between the dataset and each distribution is already added on line 608.
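
To illustrate the point, here is a hedged sketch of deriving the dataset-to-distribution links from the resources alone, without reading a `distribution` key. It is not the serializer's actual code: triples are plain tuples instead of rdflib terms, `distribution_triples` is a hypothetical name, and the `uri` resource key with a fallback built from the resource `id` is an assumption about how distribution references are chosen.

```python
DCAT_DISTRIBUTION = "http://www.w3.org/ns/dcat#distribution"


def distribution_triples(dataset_ref, dataset_dict):
    """Link a dataset to its distributions using only dataset_dict["resources"].

    Any dataset_dict["distribution"] value is ignored on purpose: the links
    are derived from the resources, so the key is not needed for output.
    """
    triples = []
    for resource in dataset_dict.get("resources", []):
        # Assumed convention: prefer a stored URI, otherwise build one from the id.
        dist_ref = resource.get("uri") or f"{dataset_ref}/resource/{resource['id']}"
        triples.append((dataset_ref, DCAT_DISTRIBUTION, dist_ref))
    return triples
```

With this shape, the output is identical whether or not the dataset dict carries a `distribution` list, so a second loop over that list would only re-add links that already exist.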

amercader · 2026-05-22T13:07:08Z

        # Resources
+        distribution_uris = []
        for distribution in self._distributions(dataset_ref):
-


I know the method is long, but I'd like to keep this in the _parse_dataset_base() method for now so that other profiles don't break.

qplevier added 3 commits February 26, 2026 10:07

Update dcat_ap_full.yaml

7eaddee

hcvdwerf approved these changes Feb 26, 2026

Update CKAN config file path for test-core.ini

18150b0

amercader reviewed May 22, 2026

qplevier wants to merge 4 commits into ckan:master from GenomicDataInfrastructure:feat/sample-and-analytics

3 participants
