Skip to content

Prefer default language values for some Literal nodes#143

Merged
metaodi merged 4 commits into
ckan:masterfrom
GovDataOfficial:prefer-default-language-values
Jan 8, 2019
Merged

Prefer default language values for some Literal nodes#143
metaodi merged 4 commits into
ckan:masterfrom
GovDataOfficial:prefer-default-language-values

Conversation

@seitenbau-govdata

Copy link
Copy Markdown
Member

If a graph contains multiple languages for a node, a random node is currently used when parsing.

This introduces the option to prefer the configured default language for some nodes:

  • dct:title (dataset and distribution)
  • dct:description (dataset and distribution)
  • foaf:name in dct:publisher
  • vcard:fn in dcat:contactPoint

The changes are related to #51 and #124.

* dct:title (dataset and distribution)
* dct:description (dataset and distribution)
* foaf:name in dct:publisher
* vcard:fn in dcat:contactPoint
@amercader

Copy link
Copy Markdown
Member

Thanks @seitenbau-govdata. I wonder if this behavior (picking the default language) should be the default for all fields, not just the 4 you included here (so not having to pass prefer_default_language=True as this would be the default) . What do you think? cc @metaodi

@seitenbau-govdata

Copy link
Copy Markdown
Member Author

@amercader We thought about that as well. It would make parsing more reproducible as no more random values would be picked in case of multiple languages. But as we weren't sure about the effects of picking the default language for all Literal elemens, we only chose the attributes we needed.

In case we use it for all elements, we would drop the whole use_default_lang parameter.

@metaodi

metaodi commented Dec 19, 2018

Copy link
Copy Markdown
Member

@amercader I think it would make sense to have this as the default for all fields, since the language attribute is used to determine this behavior. So I would drop the prefer_default_language parameter.

We currently use a similar mechanism in #124 to get some values as multilingual dict. But I think it's a bit different there, as we actually change the returned value (dict instead of string).

@seitenbau-govdata

Copy link
Copy Markdown
Member Author

The language handling is used for all Literal values now.

Regarding the fallback behavior, assume two nodes exist in the graph:

  • one without lang attribute
  • one with lang=en
  1. In case no default language is configured, always the lang=en one is picked now.
  2. If a language differing from "en" is configured as default, a random node (more precisely, the first one found) is returned like with the old logic.

What do you think about that behavior @amercader @metaodi ?

@metaodi

metaodi commented Dec 19, 2018

Copy link
Copy Markdown
Member

@seitenbau-govdata I think this is fine.

Why does this only apply to Literal values? I understand that most values will be Literal's anyway, but if another type has a language attribute, the same "rule" should apply. Or am I missing something? If not, I would drop the isinstance condition.

@seitenbau-govdata

Copy link
Copy Markdown
Member Author

@metaodi The reason is that the class Literal is the only class in rdflib at the moment which contains a property language. There are some additional checks (e.g. language value validation) and the mapping from the XML attribute langto the property language in the class Literal. So we just added the check for the type Literal.

@seitenbau-govdata

Copy link
Copy Markdown
Member Author

@amercader and @metaodi Happy New Year! Do you have any comments about our latest comment? We would like to update ckanext-dcat in our installation to the latest version. And these are the last changes that should be included in the new version. So we are able to remove the patched files and close the issue in our repo GovDataOfficial/ckanext-dcatde#6. 😉

@metaodi

metaodi commented Jan 8, 2019

Copy link
Copy Markdown
Member

@seitenbau-govdata I still think it would be more pythonic to not use isinstance, but rather catch a possible AttributeError. But it's really just a minor thing, so I will just merge this now and maybe refactor it later.

@metaodi metaodi merged commit 8f37873 into ckan:master Jan 8, 2019
@seitenbau-govdata

Copy link
Copy Markdown
Member Author

@metaodi Thanks for merging it. Now it would be perfect if we could have a new version from the master. Maybe you could release a new version?

@metaodi

metaodi commented Jan 10, 2019

Copy link
Copy Markdown
Member

@seitenbau-govdata I just released 0.0.9, have fun! 😄

@seitenbau-govdata

Copy link
Copy Markdown
Member Author

@metaodi Thanks a lot!

@seitenbau-govdata seitenbau-govdata deleted the prefer-default-language-values branch January 10, 2019 10:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants