
Multilingual support in DCAT profiles #318

Merged: amercader merged 10 commits into master from multilingual on Oct 31, 2024


@amercader


This builds on excellent code started by @stefina and @JVickery-TBS in #124 and #240 respectively, adapting it to the current profiles and generalizing it for maximum compatibility.

Multilingual support is provided via integration with ckanext-fluent, the supported way of implementing translations for CKAN fields.

At the serialization level, a separate triple with a language-tagged literal is added for each of the defined languages (if a translation is present):

@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix vcard: <http://www.w3.org/2006/vcard/ns#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<https://example.org/dataset/0112cf32-bce0-4071-9504-923375f9f2ad> a dcat:Dataset ;
    dct:title "Conjunt de dades de prova DCAT"@ca,
        "Test DCAT dataset"@en,
        "Conjunto de datos de prueba DCAT"@es ;
    dct:description "Una descripció qualsevol"@ca,
        "Some description"@en,
        "Una descripción cualquiera"@es ;
    dct:language "ca",
        "en",
        "es" ;
    dct:provenance [ a dct:ProvenanceStatement ;
        rdfs:label "Una declaració sobre la procedència"@ca,
            "Statement about provenance"@en,
            "Una declaración sobre la procedencia"@es ] .

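A minimal, stdlib-only sketch of the serialization rule above, assuming a fluent value is a plain `{lang: text}` dict and that the list of defined languages is passed in explicitly. `LangLiteral` and `literals_for_languages` are hypothetical names used only for illustration; the real profiles build rdflib literals, and where they read the language list from is not described here.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class LangLiteral:
    """Hypothetical stand-in for an RDF literal with an optional language tag."""
    value: str
    lang: Optional[str] = None

    def to_turtle(self) -> str:
        # Naive escaping (backslashes and double quotes only); enough for a sketch.
        escaped = self.value.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"@{self.lang}' if self.lang else f'"{escaped}"'


def literals_for_languages(
    translations: Dict[str, str], languages: List[str]
) -> List[LangLiteral]:
    """Return one language-tagged literal per defined language.

    Languages whose translation is missing or empty are skipped, so no
    triple is emitted for them.
    """
    return [
        LangLiteral(translations[lang], lang)
        for lang in languages
        if translations.get(lang)
    ]
```

With a title dict such as `{"en": "Test DCAT dataset", "ca": "Conjunt de dades de prova DCAT", "es": ""}` and `languages=["ca", "en", "es"]`, the empty Spanish translation is skipped, so only the `@ca` and `@en` literals are produced.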
When parsing, the parsers will import properties from DCAT serializations into the format expected by ckanext-fluent if the field is defined as fluent in the schema:

{
    "name": "test-dataset",
    "provenance": {
        "en": "Statement about provenance",
        "ca": "Una declaració sobre la procedència",
        "es": "Una declaración sobre la procedencia"
    }
}

As implemented in #124, if one of the languages is missing from the DCAT serialization, an empty string is returned for that language. Also, if the DCAT serialization does not specify the language of a value, the default CKAN language (`ckan.locale_default`) is used.
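The parsing rules can be sketched as a small function that turns the literals found for one fluent field into a `{lang: text}` dict. This assumes the literals have already been read from the graph as `(text, lang)` pairs; `fluent_value` is a hypothetical helper, not the extension's API, and ignoring tags outside the configured languages is an assumption of the sketch.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# One parsed value: (text, language tag or None), as read from the RDF graph.
ParsedLiteral = Tuple[str, Optional[str]]


def fluent_value(
    literals: Iterable[ParsedLiteral],
    languages: List[str],
    default_lang: str,
) -> Dict[str, str]:
    """Build a ckanext-fluent style {lang: text} dict for one fluent field."""
    # Every configured language starts out as an empty string, so missing
    # translations come back as "" rather than being absent.
    result = {lang: "" for lang in languages}
    for text, lang in literals:
        # Untagged literals are attributed to the default locale
        # (the role ckan.locale_default plays in the description).
        key = lang or default_lang
        # Assumption: tags outside `languages` are ignored, and the first
        # value seen for a language wins.
        if key in result and not result[key]:
            result[key] = text
    return result
```

For example, an untagged `dct:title` parsed with `default_lang="es"` ends up under `"es"`, while every other configured language maps to `""`.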

@JVickery-TBS this covers most of your changes in #240 except for the handling of translated fields in publishers / organizations. As it's difficult to come up with logic that works in the many different scenarios, this is best suited to a small custom profile. But let me know if I missed anything else besides this issue.

cc @seitenbau-govdata

`_add_triple_from_dict()` will check if the value is a dict and assume
it's a fluent field (i.e. `{"lang1": "value_lang1", "lang2": "value_lang2"}`).
`URIRefOrLiteral` also supports a `lang` parameter.
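A rough sketch of the dispatch described in this commit, using plain tuples instead of rdflib terms. `triples_from_value` and `uri_ref_or_literal` are hypothetical stand-ins for `_add_triple_from_dict()` and `URIRefOrLiteral`; the URI check here is deliberately crude and the real helper's logic may differ. What it shows is that a dict value fans out into one triple per language, and that a language tag only applies when the value ends up as a literal.

```python
from typing import Dict, List, Optional, Tuple, Union

# Hypothetical object terms: ("uri", value) or ("literal", value, lang).
Obj = Union[Tuple[str, str], Tuple[str, str, Optional[str]]]
Triple = Tuple[str, str, Obj]


def uri_ref_or_literal(value: str, lang: Optional[str] = None) -> Obj:
    """URIs carry no language tag; literals keep the one passed in."""
    # Crude URI test for the sketch only.
    if value.startswith(("http://", "https://")) and " " not in value:
        return ("uri", value)
    return ("literal", value, lang)


def triples_from_value(
    subject: str, predicate: str, value: Union[str, Dict[str, str]]
) -> List[Triple]:
    """A dict value is treated as a fluent field: one triple per language."""
    if isinstance(value, dict):
        return [
            (subject, predicate, uri_ref_or_literal(text, lang))
            for lang, text in value.items()
            if text  # skip languages without a translation
        ]
    # Plain values produce a single untagged term.
    return [(subject, predicate, uri_ref_or_literal(value))]
```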
Created multilingual versions of `_object_value()` and
`_object_value_list()` that store the different translations in the format
expected by the fluent fields, e.g.:

{
    "en": "Dataset title",
    "es": "Título del conjunto de datos"
}

and for tags:

{
    "en": ["Oaks", "Pines"],
    "es": ["Robles", "Pinos"]
}
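Multi-valued fields such as tags need the same grouping, but with a list per language. The sketch below assumes the keywords arrive as `(text, lang)` pairs; `fluent_value_list` is a hypothetical helper, and returning an empty list for languages without values and dropping duplicates are assumptions made here, not documented behaviour of `_object_value_list()`.

```python
from typing import Dict, Iterable, List, Optional, Tuple


def fluent_value_list(
    literals: Iterable[Tuple[str, Optional[str]]],
    languages: List[str],
    default_lang: str,
) -> Dict[str, List[str]]:
    """Group multi-valued literals (e.g. keywords) into {lang: [values]}."""
    # Assumption: every configured language is present, possibly as [].
    result: Dict[str, List[str]] = {lang: [] for lang in languages}
    for text, lang in literals:
        # Untagged literals go to the default language.
        key = lang or default_lang
        # Ignore unknown languages and keep the first occurrence of each value.
        if key in result and text not in result[key]:
            result[key].append(text)
    return result
```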

Core fields (those ending in `_translated`) are handled separately.
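For core fields, ckanext-fluent schemas typically keep translations in a separate `<field>_translated` field (for example `title_translated` next to `title`). Assuming that convention, a serializer could pick its source value as sketched below; `value_for_core_field` is a hypothetical helper, and the fallback rule (use the plain field when no translation is filled in) is an assumption.

```python
from typing import Any, Dict, Optional, Union


def value_for_core_field(
    dataset: Dict[str, Any], field: str
) -> Optional[Union[str, Dict[str, str]]]:
    """Prefer the fluent `<field>_translated` dict over the plain core field."""
    translated = dataset.get(f"{field}_translated")
    # Use the translations only if at least one language has a value.
    if isinstance(translated, dict) and any(translated.values()):
        return translated
    # Otherwise fall back to the untranslated core field (may be None).
    return dataset.get(field)
```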
amercader merged commit ac1c34b into master on Oct 31, 2024
amercader deleted the multilingual branch on October 31, 2024 13:51