CSW harvester OutputSchema config support #258 #259

Open
ccancellieri wants to merge 9 commits into ckan:master from ccancellieri:csw-output-schema
Conversation

@ccancellieri

ccancellieri commented Oct 22, 2021

Contributor

This closes #258 by adding support for an additional parameter in the CSW harvester JSON config:

"output_schema": "mdb"

mdb is the namespace prefix of the schema to use (in this case iso19115-3.2018):

{'mdb':'http://standards.iso.org/iso/19115/-3/mdb/2.0'}

Full example below:

{
    "user": "ckan_admin",
    "cql": "dc:identifier = '0-----292--------------------------'",
    "output_schema": "mdb",
    "default_tags": [],
    "default_extras": {},
    "group_mapping": {},
    "read_only": false
}

With this option set, the CSW harvester requests the metadata in the configured output schema, which must be supported by the target CSW server.
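As a minimal sketch of how such an option could be read (this is not the PR's actual code): the helper name `resolve_output_schema` and the `NAMESPACES` table are hypothetical, and the `gmd` default is an assumption based on the linked issue, which reports that gmd is currently imposed.

```python
import json

# Hypothetical prefix -> namespace table; the mdb entry is the one quoted above.
NAMESPACES = {
    'gmd': 'http://www.isotc211.org/2005/gmd',
    'mdb': 'http://standards.iso.org/iso/19115/-3/mdb/2.0',
}


def resolve_output_schema(config_str, default='gmd'):
    """Return (prefix, namespace) for a harvest source JSON config string."""
    # An empty or missing config falls back to the assumed default prefix.
    config = json.loads(config_str) if config_str else {}
    prefix = config.get('output_schema', default)
    if prefix not in NAMESPACES:
        raise ValueError('Unknown output_schema: {!r}'.format(prefix))
    return prefix, NAMESPACES[prefix]
```

Calling `resolve_output_schema('{"output_schema": "mdb"}')` would give the mdb prefix together with its namespace URI, while a config without the option would resolve to gmd.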

@ccancellieri

Contributor Author

This may also help with #209, #210 and #219.

@amercader amercader left a comment

Member

This looks good and useful @ccancellieri. I just added some minor comments



# load config
self._set_source_config(harvest_object.source.config)

Member

Can you document the new output_schema option and its default value in here so others are aware of it?

https://github.com/ckan/ckanext-spatial/blob/master/doc/harvesters.rst

Contributor Author

done

Contributor Author

Added a fallback to the default in case the server does not support the iso19139 -> iso19115 transformation.
The fallback logs the problem and switches back to the default, requesting iso19139 -> iso19139.
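The fallback described above could look roughly like the following sketch. The function name `pick_output_schema`, the `gmd` default and the warning log level are assumptions made for illustration, not the code in this PR.

```python
import logging

log = logging.getLogger(__name__)

DEFAULT_SCHEMA = 'gmd'  # assumed default prefix (iso19139)


def pick_output_schema(requested, supported, default=DEFAULT_SCHEMA):
    """Return the requested schema prefix if the server advertises it.

    `supported` maps schema prefixes to the namespace URIs advertised by
    the target server. If the requested prefix is missing, log the problem
    and fall back to the default instead of failing the harvest.
    """
    if requested in supported:
        return requested
    log.warning(
        'Output schema %r not supported by target server, falling back to %r',
        requested, default)
    return default
```

The design choice here is to keep harvesting with the default schema rather than abort the whole job when the server cannot do the transformation.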

Comment thread ckanext/spatial/lib/csw_client.py Outdated
self.sortby = SortBy([SortProperty('dc:identifier')])
# check capabilities
_cap = self.getcapabilities(endpoint)['response']
self.capabilities=etree.ElementTree(etree.fromstring(_cap))

Member

Please try to follow PEP8 guidelines, especially the spacing around = and after , :)

Contributor Author

Sorry, I can't validate the whole project and my code editor is not helping me. Good catch, I'll try to fix it.

Comment thread ckanext/spatial/lib/csw_client.py Outdated
csw = self._ows(**kw)

# fetch target csw server capabilities for requested output schema
output_schemas=self._get_output_schemas('GetRecords')

Member

Can we move this call to the __init__() method to avoid duplication and multiple calls to GetCapabilities?
Something like:

def __init__(self, endpoint=None):
    _cap = self.getcapabilities(endpoint)['response']
    self.capabilities = etree.ElementTree(etree.fromstring(_cap))
    self.output_schemas = {
        'GetRecords': self._get_output_schemas('GetRecords'),
        'GetRecordById': self._get_output_schemas('GetRecordById'),
    }
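For context, here is a minimal sketch of what a helper like `_get_output_schemas` might do with the cached capabilities document. It uses the standard library parser rather than the `etree` import used in the PR, returns a list rather than the mapping the PR code appears to use, and the sample XML is a hand-written assumption of what a CSW 2.0.2 server advertises.

```python
import xml.etree.ElementTree as ET

OWS_NS = {'ows': 'http://www.opengis.net/ows'}

# Hand-written, heavily trimmed GetCapabilities response (assumed shape).
SAMPLE = """<csw:Capabilities xmlns:csw="http://www.opengis.net/cat/csw/2.0.2"
    xmlns:ows="http://www.opengis.net/ows">
  <ows:OperationsMetadata>
    <ows:Operation name="GetRecords">
      <ows:Parameter name="outputSchema">
        <ows:Value>http://www.isotc211.org/2005/gmd</ows:Value>
        <ows:Value>http://standards.iso.org/iso/19115/-3/mdb/2.0</ows:Value>
      </ows:Parameter>
    </ows:Operation>
  </ows:OperationsMetadata>
</csw:Capabilities>"""


def get_output_schemas(capabilities_xml, operation):
    """Return the outputSchema values advertised for one CSW operation."""
    root = ET.fromstring(capabilities_xml)
    xpath = (
        ".//ows:Operation[@name='{}']"
        "/ows:Parameter[@name='outputSchema']/ows:Value"
    ).format(operation)
    # Skip empty <ows:Value/> elements, whose text would be None.
    return [v.text.strip() for v in root.findall(xpath, OWS_NS) if v.text]


schemas = get_output_schemas(SAMPLE, 'GetRecords')
```

Parsing once and storing the result per operation, as suggested above, means GetCapabilities is requested a single time per client instance.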

Contributor Author

done

# fetch target csw server capabilities for requested output schema
output_schemas = self.output_schemas['GetRecordById']
if not output_schemas.get(outputschema):
    raise CswError('Output schema \'{}\' not supported by target server: '.format(output_schemas))

Contributor Author

I should probably be more tolerant here, logging an ERROR and returning.

@frafra

frafra commented Feb 18, 2022

Contributor

This is great :) Do you need any help with this PR?

@frafra

frafra commented Mar 1, 2022

Contributor

I get this generic error after applying this PR (rebased on master): Error contacting the CSW server: can only parse strings. I think there is a problem with the changes made to the __init__ function of CswService.

@ccancellieri

Contributor Author

Ciao @frafra, thanks for looking into this.
I think something bad could happen here:

record = self._xmd(etree.fromstring(csw.response))

Would you be able to check the response provided by the server?

I apologize, but I'm no longer using this plugin. I changed approach, so my help on this will be very limited.
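The "can only parse strings" message is consistent with the parser being handed something other than a string, for example None when the server returned no body; that reading is an assumption, not something confirmed in this thread. A small guard like the hypothetical `parse_csw_response` below (standard library parser, not the PR's `etree` import) would surface the offending value instead of a generic error:

```python
import xml.etree.ElementTree as ET


def parse_csw_response(response):
    """Parse a CSW response body, failing with a descriptive message."""
    # Reject None, empty bodies and non-text objects before parsing.
    if not isinstance(response, (str, bytes)) or not response:
        raise ValueError(
            'CSW server returned no parseable body: {!r}'.format(response))
    return ET.fromstring(response)
```

Wrapping the `etree.fromstring(csw.response)` call this way would make it easier to see what the server actually sent.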

@frafra

frafra commented Mar 4, 2022

Contributor

@ccancellieri I think you are right, I will look into that.
Which approach have you taken, if I may ask? I am interested in harvesting data from GeoNetwork too.

@ccancellieri

ccancellieri commented Mar 7, 2022 via email

Contributor Author

@frafra

frafra commented Mar 8, 2022

Contributor

markstuart added a commit to data-govt-nz/ckanext-spatial that referenced this pull request Jun 2, 2023
Development

Successfully merging this pull request may close these issues.

Using CSW harvester OutputSchema is ignored while gmd is imposed

3 participants