Add missing fields #352
Merged: amercader merged 40 commits into ckan:master from GenomicDataInfrastructure:add-missing-fields on Sep 24, 2025
Changes from 33 commits
Commits
e79099b
feat(rdf): support serialization of DatasetSeries from RDF and link c…
22face7
remove fallback. not support by dataseries extension
84ef929
Remove enter
1f5110b
fix: skip fluent for 2.9
2d70ee8
fix import
b061455
check if this works
c9e4626
Disable support for 2.9 in tests
01991b4
check if this works
ed6b696
Merge pull request #1 from GenomicDataInfrastructure/support-RDF-data…
hcvdwerf 806e248
feat(missing field) add missing fields
b1c8193
Add homepage
a6e1e4b
Also serrilaize homepage when available
217da9a
Added retention period to healthDCAT
cd8661b
Fix retention period UT
90dac79
fix test
1294f4a
feat(missing field) add missing fields
501a8de
Added DCAT AP 3 has version
c0efdfc
Merge branch 'add-missing-fields' of https://github.com/hcvdwerf/ckan…
c4ad649
Added has version to DCAT 3 and added missing dataservice fields
163d284
fix import
d8c04df
fix unit tests
63a2749
Added has version to DCAT 3 and added missing dataservice fields
fdbd8bc
update schema
572acbe
Merge branch 'add-missing-fields' of https://github.com/hcvdwerf/ckan…
1985f6e
fiix mapping documentation
3cc8905
Updated documetation for retention period
c03bdd2
fix(dataseries) cardanality for dataseries
cccd727
fix(UT-cardanality) fix UT for cardanality
6d23062
add applicable_legislation to Dataservice
fbbd48e
fix(dataservice (contact & creator)) fix mapping for creator and cont…
4d1e3a0
add mapping + UT for description within dataservice
2abc07c
Add if check by contactpoint
7759e07
Add modified, publisher, license and theme to dataservice
5762c35
fix(dataseries) Remove dataseries from pull request
5cf6942
Remove fluent extension tag
6eec310
Merge branch 'master' into add-missing-fields
hcvdwerf 4485715
Update health_dcat_ap.yaml
hcvdwerf ba081d7
fix: Always store as list when complex object
419c364
fix: parse of creator and contact within acces service
03a5f88
fix(croisant) point to mlcroisant version 1.0.22
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | @@ -210,39 +210,18 @@ def gather_stage(self, harvest_job): |
| | | return [] |
| | | |
| | | try: |
| | | - |
Contributor (Author) replied to the review comment on this line: yes done
| | | - source_dataset = model.Package.get(harvest_job.source.id) |
| | | - |
| | | - for dataset in parser.datasets(): |
| | | - if not dataset.get('name'): |
| | | - dataset['name'] = self._gen_new_name(dataset['title']) |
| | | - if dataset['name'] in self._names_taken: |
| | | - suffix = len([i for i in self._names_taken if i.startswith(dataset['name'] + '-')]) + 1 |
| | | - dataset['name'] = '{}-{}'.format(dataset['name'], suffix) |
| | | - self._names_taken.append(dataset['name']) |
| | | - |
| | | - # Unless already set by the parser, get the owner organization (if any) |
| | | - # from the harvest source dataset |
| | | - if not dataset.get('owner_org'): |
| | | - if source_dataset.owner_org: |
| | | - dataset['owner_org'] = source_dataset.owner_org |
| | | - |
| | | - # Try to get a unique identifier for the harvested dataset |
| | | - guid = self._get_guid(dataset, source_url=source_dataset.url) |
| | | - |
| | | - if not guid: |
| | | - self._save_gather_error('Could not get a unique identifier for dataset: {0}'.format(dataset), |
| | | - harvest_job) |
| | | - continue |
| | | - |
| | | - dataset['extras'].append({'key': 'guid', 'value': guid}) |
| | | - guids_in_source.append(guid) |
| | | - |
| | | - obj = HarvestObject(guid=guid, job=harvest_job, |
| | | - content=json.dumps(dataset)) |
| | | - |
| | | - obj.save() |
| | | - object_ids.append(obj.id) |
| | | + source_dataset = model.Package.get(harvest_job.source.id) |
| | | + |
| | | + series_ids, series_mapping = self._parse_and_collect( |
| | | + parser.dataset_series(), |
| | | + source_dataset, |
| | | + harvest_job, |
| | | + guids_in_source, |
| | | + is_series=True, |
| | | + collect_series_mapping=True |
| | | + ) |
| | | + object_ids += series_ids |
| | | + object_ids += self._parse_and_collect(parser.datasets(series_mapping), source_dataset, harvest_job, guids_in_source, is_series=False) |
| | | except Exception as e: |
| | | self._save_gather_error('Error when processsing dataset: %r / %s' % (e, traceback.format_exc()), |
| | | harvest_job) |
| | | @@ -422,3 +401,70 @@ def import_stage(self, harvest_object): |
| | | model.Session.commit() |
| | | |
| | | return True |
| | | + |
| | | + def _parse_and_collect( |
| | | + self, |
| | | + items, |
| | | + source_dataset, |
| | | + harvest_job, |
| | | + guids_in_source, |
| | | + is_series=False, |
| | | + collect_series_mapping=False |
| | | + ): |
| | | + object_ids = [] |
| | | + label = "dataset series" if is_series else "dataset" |
| | | + series_mapping = {} if collect_series_mapping else None |
| | | + |
| | | + for item in items: |
| | | + original_title = item.get("title", label) |
| | | + if not item.get("name"): |
| | | + item["name"] = self._gen_new_name(original_title) |
| | | + |
| | | + if item["name"] in self._names_taken: |
| | | + suffix = len([i for i in self._names_taken if i.startswith(item["name"] + "-")]) + 1 |
| | | + item["name"] = f"{item['name']}-{suffix}" |
| | | + |
| | | + self._names_taken.append(item["name"]) |
| | | + |
| | | + if not item.get("owner_org") and source_dataset.owner_org: |
| | | + item["owner_org"] = source_dataset.owner_org |
| | | + |
| | | + guid = self._get_guid(item, source_url=source_dataset.url) |
| | | + if not guid: |
| | | + self._save_gather_error(f"Could not get a unique identifier for {label}: {item}", harvest_job) |
| | | + continue |
| | | + |
| | | + item.setdefault("extras", []).append({"key": "guid", "value": guid}) |
| | | + guids_in_source.append(guid) |
| | | + |
| | | + obj = HarvestObject(guid=guid, job=harvest_job, content=json.dumps(item)) |
| | | + obj.save() |
| | | + object_ids.append(obj.id) |
| | | + |
| | | + # Store mapping of RDF URI to dataset name if requested |
| | | + if collect_series_mapping: |
| | | + series_uri = item.get("uri") or item.get("identifier") |
| | | + if series_uri: |
| | | + # Try to find an existing active dataset series by 'guid' match |
| | | + existing = model.Session.query(model.Package).\ |
| | | + join(model.PackageExtra).\ |
| | | + filter(model.PackageExtra.key == 'guid').\ |
| | | + filter(model.PackageExtra.value == series_uri).\ |
| | | + filter(model.Package.type == 'dataset_series').\ |
| | | + filter(model.Package.state == 'active').\ |
| | | + first() |
| | | + |
| | | + if existing: |
| | | + item["name"] = existing.name |
| | | + |
| | | + series_mapping[str(series_uri)] = { |
| | | + "id": existing.id if existing else item.get("id"), |
| | | + "name": item["name"] |
| | | + } |
| | | + |
| | | + |
| | | + if collect_series_mapping: |
| | | + return object_ids, series_mapping |
| | | + |
| | | + return object_ids |
| | | + |
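For readers following the `_parse_and_collect` helper in the diff, here is a minimal standalone sketch of two of its steps: the name de-duplication and the `setdefault`-based `guid` extra. The function names `unique_name` and `add_guid_extra`, the sample dataset name, and the example URL are hypothetical and are not part of this PR; the sketch only mirrors the logic visible in the diff and leaves out the harvester, the parser, and the database.

```python
def unique_name(name, names_taken):
    """Return a name that is not yet in names_taken and record it as taken.

    Mirrors the suffix logic in the diff: if the name is already taken,
    count the taken names that start with "<name>-" and use count + 1.
    """
    if name in names_taken:
        suffix = len([n for n in names_taken if n.startswith(name + "-")]) + 1
        name = f"{name}-{suffix}"
    names_taken.append(name)
    return name


def add_guid_extra(item, guid):
    """Append the guid extra, creating the 'extras' list if it is missing.

    The removed code used item['extras'].append(...), which raises KeyError
    when the parsed item has no 'extras' key; setdefault avoids that.
    """
    item.setdefault("extras", []).append({"key": "guid", "value": guid})
    return item


# Hypothetical sample data, for illustration only.
names_taken = []
first = unique_name("air-quality", names_taken)
second = unique_name("air-quality", names_taken)
third = unique_name("air-quality", names_taken)

item = add_guid_extra({"title": "Air quality"}, "https://example.org/dataset/1")
```

One property of the prefix count worth keeping in mind: it counts every taken name that begins with `<name>-`, so an unrelated name such as `air-quality-index` would also advance the suffix.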
I've removed support for CKAN 2.9 in be3c8d6, so this should no longer be necessary.
I reverted the test.yml.