Describe a language description (model, grammar)

In this section you will find information on how to describe a language description (model, grammar) with the minimal metadata required to register it in the ELG platform. To learn more about the ELG resource types, see Overview. Instructions for all data resources (technical requirements, instructions for registering on the platform) are given in Provide a Language Resource.

Language descriptions comprise:

  • models, including Machine Learning models, statistical models, word embeddings, n-gram models,
  • computational grammars of a language, language variety or for a specific domain or phenomenon.

The vast majority of these consist of a text part, but videos and images are also foreseen for cases such as sign language grammars.

Examples of metadata records for language descriptions

Monolingual computational grammar for a specific domain: Tourism Italian grammar Published at: https://live.european-language-grid.eu/catalogue/#/resource/service/ld/901

<?xml version="1.0" encoding="UTF-8"?>
<ms:MetadataRecord xmlns="http://w3id.org/meta-share/meta-share/" xmlns:datacite="http://purl.org/spar/datacite/" xmlns:dcat="http://www.w3.org/ns/dcat#" xmlns:ms="http://w3id.org/meta-share/meta-share/" xmlns:omtd="http://w3id.org/meta-share/omtd-share/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://w3id.org/meta-share/meta-share/ ../../Schema/ELG-SHARE.xsd">
        <ms:MetadataRecordIdentifier ms:MetadataRecordIdentifierScheme="http://w3id.org/meta-share/meta-share/elg">value automatically assigned - leave as is</ms:MetadataRecordIdentifier>
        <ms:metadataCreationDate>2020-10-03</ms:metadataCreationDate>
        <ms:metadataCurator>
                <ms:actorType>Person</ms:actorType>
                <ms:surname xml:lang="en">Smith</ms:surname>
                <ms:givenName xml:lang="en">John</ms:givenName>
                <ms:email>username@someDomain.com</ms:email>
        </ms:metadataCurator>
        <ms:compliesWith>http://w3id.org/meta-share/meta-share/ELG-SHARE</ms:compliesWith>
        <ms:metadataCreator>
                <ms:actorType>Person</ms:actorType>
                <ms:surname xml:lang="en">Smith</ms:surname>
                <ms:givenName xml:lang="en">John</ms:givenName>
                <ms:email>username@someDomain.com</ms:email>
        </ms:metadataCreator>
        <ms:DescribedEntity>
                <ms:LanguageResource>
                        <ms:entityType>LanguageResource</ms:entityType>
                        <ms:resourceName xml:lang="en">Tourism Italian grammar</ms:resourceName>
                        <ms:resourceShortName xml:lang="en">Tour.ita.grm</ms:resourceShortName>
                        <ms:description xml:lang="en">Tourism Italian abnf grammar, manually created. Created within the Portdial project</ms:description>
                        <ms:version>v1.0.0 (automatically assigned)</ms:version>
                        <ms:additionalInfo>
                                <ms:landingPage>https://sites.google.com/site/portdial2</ms:landingPage>
                        </ms:additionalInfo>
                        <ms:additionalInfo>
                                <ms:email>contact@someDomain.com</ms:email>
                        </ms:additionalInfo>
                        <ms:contact>
                                <ms:Person>
                                        <ms:actorType>Person</ms:actorType>
                                        <ms:surname xml:lang="en">Potamianos</ms:surname>
                                        <ms:givenName xml:lang="en">Alex</ms:givenName>
                                        <ms:email>contact@someDomain.com</ms:email>
                                </ms:Person>
                        </ms:contact>
                        <ms:keyword xml:lang="en">languagedescription</ms:keyword>
                        <ms:fundingProject>
                                <ms:projectName xml:lang="en">Portdial</ms:projectName>
                        </ms:fundingProject>
                        <ms:LRSubclass>
                                <ms:LanguageDescription>
                                        <ms:lrType>LanguageDescription</ms:lrType>
                                        <ms:LanguageDescriptionSubclass>
                                                <ms:Grammar>
                                                        <ms:ldSubclassType>Grammar</ms:ldSubclassType>
                                                        <ms:encodingLevel>http://w3id.org/meta-share/meta-share/morphology</ms:encodingLevel>
                                                </ms:Grammar>
                                        </ms:LanguageDescriptionSubclass>
                                        <ms:LanguageDescriptionMediaPart>
                                                <ms:LanguageDescriptionTextPart>
                                                        <ms:ldMediaType>LanguageDescriptionTextPart</ms:ldMediaType>
                                                        <ms:mediaType>http://w3id.org/meta-share/meta-share/text</ms:mediaType>
                                                        <ms:lingualityType>http://w3id.org/meta-share/meta-share/monolingual</ms:lingualityType>
                                                        <ms:language>
                                                                <ms:languageTag>it</ms:languageTag>
                                                                <ms:languageId>it</ms:languageId>
                                                        </ms:language>
                                                        <ms:metalanguage>
                                                                <ms:languageTag>und</ms:languageTag>
                                                                <ms:languageId>und</ms:languageId>
                                                        </ms:metalanguage>
                                                </ms:LanguageDescriptionTextPart>
                                        </ms:LanguageDescriptionMediaPart>
                                        <ms:DatasetDistribution>
                                                <ms:DatasetDistributionForm>http://w3id.org/meta-share/meta-share/downloadable</ms:DatasetDistributionForm>
                                                <ms:accessLocation>http://accessURL</ms:accessLocation>
                                                <ms:licenceTerms>
                                                        <ms:licenceTermsName xml:lang="en">CC-BY-SA-4.0</ms:licenceTermsName>
                                                        <ms:licenceTermsURL>https://spdx.org/licenses/CC-BY-SA-4.0.html</ms:licenceTermsURL>
                                                </ms:licenceTerms>
                                        </ms:DatasetDistribution>
                                        <ms:personalDataIncluded>false</ms:personalDataIncluded>
                                        <ms:sensitiveDataIncluded>false</ms:sensitiveDataIncluded>
                                </ms:LanguageDescription>
                        </ms:LRSubclass>
                </ms:LanguageResource>
        </ms:DescribedEntity>
</ms:MetadataRecord>

N-gram model: PANACEA Environment Corpus n-grams EL (Greek) Published at: https://live.european-language-grid.eu/catalogue/#/resource/service/ld/900

<?xml version="1.0" encoding="UTF-8"?>
<ms:MetadataRecord xmlns="http://w3id.org/meta-share/meta-share/" xmlns:datacite="http://purl.org/spar/datacite/" xmlns:dcat="http://www.w3.org/ns/dcat#" xmlns:ms="http://w3id.org/meta-share/meta-share/" xmlns:omtd="http://w3id.org/meta-share/omtd-share/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://w3id.org/meta-share/meta-share/ ../../Schema/ELG-SHARE.xsd">
        <ms:MetadataRecordIdentifier ms:MetadataRecordIdentifierScheme="http://w3id.org/meta-share/meta-share/elg">value automatically assigned - leave as is</ms:MetadataRecordIdentifier>
        <ms:metadataCreationDate>2020-10-03</ms:metadataCreationDate>
        <ms:metadataCurator>
                <ms:actorType>Person</ms:actorType>
                <ms:surname xml:lang="en">Smith</ms:surname>
                <ms:givenName xml:lang="en">John</ms:givenName>
                <ms:email>username@someDomain.com</ms:email>
        </ms:metadataCurator>
        <ms:compliesWith>http://w3id.org/meta-share/meta-share/ELG-SHARE</ms:compliesWith>
        <ms:metadataCreator>
                <ms:actorType>Person</ms:actorType>
                <ms:surname xml:lang="en">Smith</ms:surname>
                <ms:givenName xml:lang="en">John</ms:givenName>
                <ms:email>username@someDomain.com</ms:email>
        </ms:metadataCreator>
        <ms:DescribedEntity>
                <ms:LanguageResource>
                        <ms:entityType>LanguageResource</ms:entityType>
                        <ms:resourceName xml:lang="en">PANACEA Environment Corpus n-grams EL (Greek)</ms:resourceName>
                        <ms:description xml:lang="en">PANACEA Environment Corpus n-grams EL (Greek) 1.0 contains Greek word n-grams and Greek word/tag/lemma n-grams in the "Environment" (ENV) domain. N-grams are accompanied by their observed frequency counts. The length of the n-grams ranges from unigrams (single words) to five-grams. The data were collected in the context of PANACEA (http://www.panacea-lr.eu), an EU-FP7 Funded Project under Grant Agreement 248064.
The n-gram counts were generated from crawled Web pages that were automatically detected to be in the Greek language and were automatically classified as relevant to the ENV domain. The collection consisted of approximately 31.71 million tokens. Data collection took place in the summer of 2011.</ms:description>
                        <ms:version>v1.0</ms:version>
                        <ms:additionalInfo>
                                <ms:landingPage>http://nlp.ilsp.gr/panacea/D4.3/data/201209/gms/env_el/README.txt</ms:landingPage>
                        </ms:additionalInfo>
                        <ms:additionalInfo>
                                <ms:email>contact@someDomain.com</ms:email>
                        </ms:additionalInfo>
                        <ms:contact>
                                <ms:Person>
                                        <ms:actorType>Person</ms:actorType>
                                        <ms:surname xml:lang="en">Prokopidis</ms:surname>
                                        <ms:givenName xml:lang="en">Prokopis</ms:givenName>
                                        <ms:email>contact@someDomain.com</ms:email>
                                </ms:Person>
                        </ms:contact>
                        <ms:contact>
                                <ms:Person>
                                        <ms:actorType>Person</ms:actorType>
                                        <ms:surname xml:lang="en">Papavassiliou</ms:surname>
                                        <ms:givenName xml:lang="en">Vassilis</ms:givenName>
                                        <ms:email>contact@someDomain.com</ms:email>
                                </ms:Person>
                        </ms:contact>
                        <ms:keyword xml:lang="en">corpus</ms:keyword>
                        <ms:domain>
                                <ms:categoryLabel xml:lang="en">environment</ms:categoryLabel>
                        </ms:domain>
                        <ms:resourceCreator>
                                <ms:Organization>
                                        <ms:actorType>Organization</ms:actorType>
                                        <ms:organizationName xml:lang="en">Institute for Language and Speech Processing</ms:organizationName>
                                        <ms:website>http://www.ilsp.gr</ms:website>
                                </ms:Organization>
                        </ms:resourceCreator>
                        <ms:creationStartDate>2011-06-01</ms:creationStartDate>
                        <ms:creationEndDate>2011-08-31</ms:creationEndDate>
                        <ms:fundingProject>
                                <ms:projectName xml:lang="en">Platform for Automatic, Normalized Annotation and Cost-Effective Acquisition of Language Resources for Human Language Technologies</ms:projectName>
                                <ms:website>http://www.panacea-lr.eu</ms:website>
                        </ms:fundingProject>
                        <ms:LRSubclass>
                                <ms:LanguageDescription>
                                        <ms:lrType>LanguageDescription</ms:lrType>
                                        <ms:LanguageDescriptionSubclass>
                                                <ms:NGramModel>
                                                        <ms:ldSubclassType>NGramModel</ms:ldSubclassType>
                                                        <ms:baseItem>http://w3id.org/meta-share/meta-share/word</ms:baseItem>
                                                        <ms:order>5</ms:order>
                                                </ms:NGramModel>
                                        </ms:LanguageDescriptionSubclass>
                                        <ms:LanguageDescriptionMediaPart>
                                                <ms:LanguageDescriptionTextPart>
                                                        <ms:ldMediaType>LanguageDescriptionTextPart</ms:ldMediaType>
                                                        <ms:mediaType>http://w3id.org/meta-share/meta-share/text</ms:mediaType>
                                                        <ms:lingualityType>http://w3id.org/meta-share/meta-share/monolingual</ms:lingualityType>
                                                        <ms:language>
                                                                <ms:languageTag>el</ms:languageTag>
                                                                <ms:languageId>el</ms:languageId>
                                                        </ms:language>
                                                        <ms:metalanguage>
                                                                <ms:languageTag>und</ms:languageTag>
                                                                <ms:languageId>und</ms:languageId>
                                                        </ms:metalanguage>
                                                        <ms:creationDetails xml:lang="en">automatic web crawling, automatic language detection, data preprocessing (boilerpipe filtering, lemmatization &amp; tagging)</ms:creationDetails>
                                                </ms:LanguageDescriptionTextPart>
                                        </ms:LanguageDescriptionMediaPart>
                                        <ms:DatasetDistribution>
                                                <ms:DatasetDistributionForm>http://w3id.org/meta-share/meta-share/downloadable</ms:DatasetDistributionForm>
                                                <ms:accessLocation>http://metashare.ilsp.gr:8080/repository/download/490952dc1cec11e2b545842b2b6a04d78dc202de28d5421f91752610a781175e</ms:accessLocation>
                                                <ms:distributionTextFeature>
                                                        <ms:size>
                                                                <ms:amount>435189</ms:amount>
                                                                <ms:sizeUnit>http://w3id.org/meta-share/meta-share/unigram</ms:sizeUnit>
                                                        </ms:size>
                                                        <ms:size>
                                                                <ms:amount>3.860716E6</ms:amount>
                                                                <ms:sizeUnit>http://w3id.org/meta-share/meta-share/bigram</ms:sizeUnit>
                                                        </ms:size>
                                                        <ms:size>
                                                                <ms:amount>9.767383E6</ms:amount>
                                                                <ms:sizeUnit>http://w3id.org/meta-share/meta-share/trigram</ms:sizeUnit>
                                                        </ms:size>
                                                        <ms:size>
                                                                <ms:amount>1.368394E7</ms:amount>
                                                                <ms:sizeUnit>http://w3id.org/meta-share/meta-share/four-gram</ms:sizeUnit>
                                                        </ms:size>
                                                        <ms:size>
                                                                <ms:amount>1.495402E7</ms:amount>
                                                                <ms:sizeUnit>http://w3id.org/meta-share/meta-share/five-gram</ms:sizeUnit>
                                                        </ms:size>
                                                        <ms:dataFormat>http://w3id.org/meta-share/omtd-share/Text</ms:dataFormat>
                                                </ms:distributionTextFeature>
                                                <ms:licenceTerms>
                                                        <ms:licenceTermsName xml:lang="en">CC-BY-SA-4.0</ms:licenceTermsName>
                                                        <ms:licenceTermsURL>https://spdx.org/licenses/CC-BY-SA-4.0.html</ms:licenceTermsURL>
                                                </ms:licenceTerms>
                                                <ms:attributionText xml:lang="en">This LR has been created by Athena R.C./ILSP (www.ilsp.gr) and is licensed under a CC-BY-SA licence</ms:attributionText>
                                        </ms:DatasetDistribution>
                                        <ms:personalDataIncluded>false</ms:personalDataIncluded>
                                        <ms:sensitiveDataIncluded>false</ms:sensitiveDataIncluded>
                                </ms:LanguageDescription>
                        </ms:LRSubclass>
                </ms:LanguageResource>
        </ms:DescribedEntity>
</ms:MetadataRecord>

Minimal version metadata for language descriptions

The metadata elements (mandatory or recommended) that are common to all kinds of resources, including data language resources, are presented in section Minimal version - List of elements common to all LRTs. The additional metadata elements that are required or recommended for language descriptions are described below.

For a quick guide to the ELG template, see Template - Explanations.

LanguageDescription

Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.LanguageDescription

Data type component

Optionality Mandatory

Explanation & Instructions

Wraps together elements for language descriptions

Example

<ms:LRSubclass>
        <ms:LanguageDescription>
                <ms:lrType>LanguageDescription</ms:lrType>
                ...
        </ms:LanguageDescription>
</ms:LRSubclass>

LanguageDescriptionSubclass

Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.LanguageDescription.LanguageDescriptionSubclass

Data type component

Optionality Mandatory

Explanation & Instructions

The type of the language description (used for documentation purposes)

It wraps the set of elements that must be used for the language description subclasses described below (MLModel, NGramModel, Grammar).

Example

<ms:LanguageDescriptionSubclass>
        ...
</ms:LanguageDescriptionSubclass>

MLModel

Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.LanguageDescription.LanguageDescriptionSubclass.MLModel

Data type Component

Optionality Mandatory if applicable

Explanation & Instructions

Mandatory for Machine Learning (ML) models; an ML model, for our purposes, is defined as “The model artifact that is created through a training process involving an ML algorithm (that is, the learning algorithm) and the training data to learn from”.

The following elements are mandatory or recommended for ML models:

  • ldSubclassType (Mandatory): Used to mark the subclass of a language description. For ML models, the value is fixed to ‘MLModel’.
  • modelVariant (Recommended): Introduces a label that can be used to identify the variant of an ML model.
  • typesystem (Recommended): Specifies the typesystem (preferably through an identifier or URL) that has been used for the annotation of a resource, that is required for the input resource of a tool/service, or that should be used (dependency) for the annotation or in the training of an ML model.
  • method (Recommended): Specifies the method used for the development of a tool/service or the ML model. You must use one of the values from the CV.
  • mlFramework (Recommended): Specifies the framework that has been used for developing a model (e.g., keras, tensorflow).
  • trainingCorpusDetails (Recommended): Provides a detailed description of the training corpus (e.g., size, number of features, etc.).

Example

<ms:MLModel>
        <ms:ldSubclassType>MLModel</ms:ldSubclassType>
        <ms:modelVariant>factored</ms:modelVariant>
        <ms:typesystem>
                <ms:resourceName xml:lang="en">Universal dependencies</ms:resourceName>
                <ms:version>undefined</ms:version>
        </ms:typesystem>
        <ms:method>http://w3id.org/meta-share/omtd-share/DeepLearning</ms:method>
        <ms:mlFramework>tensorflow</ms:mlFramework>
        <ms:trainingCorpusDetails xml:lang="en">Trained on a corpus of tweets</ms:trainingCorpusDetails>
</ms:MLModel>

NGramModel

Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.LanguageDescription.LanguageDescriptionSubclass.NGramModel

Data type Component

Optionality Mandatory if applicable

Explanation & Instructions

Mandatory for n-gram models; an n-gram model, for our purposes, is defined as “A language model consisting of n-grams, i.e. specific sequences of a number of words”.

The following elements are mandatory or recommended for n-gram models:

  • ldSubclassType (Mandatory): Used to mark the subclass of a language description. For n-gram models, the value is fixed to ‘NGramModel’.
  • baseItem (Mandatory): Type of item that is represented in the n-gram resource.
  • order (Mandatory): Specifies the maximum number of items in the sequence.
  • perplexity (Recommended): Provides information on the perplexity derived from running the model on a test set taken from the same corpus.

Example

<ms:NGramModel>
        <ms:ldSubclassType>NGramModel</ms:ldSubclassType>
        <ms:baseItem>http://w3id.org/meta-share/meta-share/word</ms:baseItem>
        <ms:order>5</ms:order>
</ms:NGramModel>
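
The example above leaves out the recommended perplexity element. A variant that includes it might look as follows. The perplexity figure is invented for illustration, and the assumption that the element takes a plain numeric value and follows order should be checked against the ELG schema.

```xml
<ms:NGramModel>
        <ms:ldSubclassType>NGramModel</ms:ldSubclassType>
        <ms:baseItem>http://w3id.org/meta-share/meta-share/word</ms:baseItem>
        <ms:order>5</ms:order>
        <!-- hypothetical value; numeric content and position are assumptions -->
        <ms:perplexity>120.5</ms:perplexity>
</ms:NGramModel>
```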

Grammar

Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.LanguageDescription.LanguageDescriptionSubclass.Grammar

Data type Component

Optionality Mandatory if applicable

Explanation & Instructions

Mandatory for grammars; a grammar, for our purposes, is defined as “A set of rules governing what strings are valid or allowable in a language or text” [https://en.oxforddictionaries.com/definition/grammar]

The following set of elements are mandatory or recommended for computational grammars:

  • ldSubclassType (Mandatory): Used to mark the subclass of a language description. For grammars, the value is fixed to ‘Grammar’.
  • encodingLevel (Mandatory): Classifies the contents of a lexical/conceptual resource or language description as regards the linguistic level of analysis it caters for.
  • compliesWith (Recommended): Specifies the vocabulary/standard/best practice with which a resource is compliant.
  • formalism (Recommended): Specifies the formalism (bibliographic reference, URL, name) used for the creation/enrichment of the resource (grammar or tool/service).
  • ldTask (Recommended): Specifies the task performed by the language description.

Example

<ms:Grammar>
        <ms:ldSubclassType>Grammar</ms:ldSubclassType>
        <ms:encodingLevel>http://w3id.org/meta-share/meta-share/morphology</ms:encodingLevel>
        <ms:compliesWith>http://w3id.org/meta-share/meta-share/GrAF</ms:compliesWith>
</ms:Grammar>

LanguageDescriptionTextPart

Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.LanguageDescription.LanguageDescriptionMediaPart.LanguageDescriptionTextPart

Data type component

Optionality Mandatory if applicable

Explanation & Instructions

The textual part (or whole set) of a language description

You can repeat the group of elements for multiple textual parts.

The mandatory or recommended elements for the text part of language descriptions are:

  • mediaType (Mandatory): Specifies the media type of a language resource (the physical medium of the contents representation). For text parts, always use the value ‘text’.
  • lingualityType (Mandatory): Indicates whether the resource includes one, two or more languages.
  • multilingualityType (Mandatory if applicable): Indicates whether the resource (part) is parallel, comparable or mixed. It is required if lingualityType is bilingual or multilingual; select one of the values parallel (e.g., original text and its translations), comparable (e.g., corpus of the same domain in multiple languages) and multilingualSingleText (for resources that consist of segments including text in two or more languages, e.g., the transcription of a European Parliament session with MPs speaking in their native language).
  • language (Mandatory): Specifies the language that is used in the resource part, expressed according to the BCP47 recommendation. See language.
  • languageVariety (Mandatory if applicable): Relates a language resource that contains segments in a language variety (e.g., dialect, jargon) to it. Please use for dialect corpora.
  • metalanguage (Recommended): Specifies the language used for describing the contents of the resource part (as opposed to the language being described), expressed according to the BCP47 recommendation. See language.

Example

<ms:LanguageDescriptionMediaPart>
        <ms:LanguageDescriptionTextPart>
                <ms:ldMediaType>LanguageDescriptionTextPart</ms:ldMediaType>
                <ms:mediaType>http://w3id.org/meta-share/meta-share/text</ms:mediaType>
                <ms:lingualityType>http://w3id.org/meta-share/meta-share/monolingual</ms:lingualityType>
                <ms:language>
                        <ms:languageTag>es</ms:languageTag>
                        <ms:languageId>es</ms:languageId>
                </ms:language>
                <ms:metalanguage>
                        <ms:languageTag>en</ms:languageTag>
                        <ms:languageId>en</ms:languageId>
                </ms:metalanguage>
        </ms:LanguageDescriptionTextPart>
</ms:LanguageDescriptionMediaPart>
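
The example above is monolingual, so it does not need multilingualityType. For a bilingual text part the element becomes required, as stated in the list above. The following is a sketch only: the Spanish/English pairing is made up, and the ‘bilingual’ and ‘parallel’ URIs are assumed to follow the same pattern as the other controlled-vocabulary values on this page.

```xml
<ms:LanguageDescriptionMediaPart>
        <ms:LanguageDescriptionTextPart>
                <ms:ldMediaType>LanguageDescriptionTextPart</ms:ldMediaType>
                <ms:mediaType>http://w3id.org/meta-share/meta-share/text</ms:mediaType>
                <!-- assumed URI for the bilingual value -->
                <ms:lingualityType>http://w3id.org/meta-share/meta-share/bilingual</ms:lingualityType>
                <!-- required because lingualityType is bilingual; assumed URI -->
                <ms:multilingualityType>http://w3id.org/meta-share/meta-share/parallel</ms:multilingualityType>
                <!-- repeat the language element once per language -->
                <ms:language>
                        <ms:languageTag>es</ms:languageTag>
                        <ms:languageId>es</ms:languageId>
                </ms:language>
                <ms:language>
                        <ms:languageTag>en</ms:languageTag>
                        <ms:languageId>en</ms:languageId>
                </ms:language>
        </ms:LanguageDescriptionTextPart>
</ms:LanguageDescriptionMediaPart>
```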

DatasetDistribution

Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.LanguageDescription.DatasetDistribution

Data type component

Optionality Mandatory

Explanation & Instructions

Any form with which a dataset is distributed, such as a downloadable form in a specific format (e.g., spreadsheet, plain text, etc.) or an API with which it can be accessed.

You can repeat the element for multiple distributions.

The mandatory and recommended elements are:

  • DatasetDistributionForm (Mandatory): The form (medium/channel) used for distributing a language resource consisting of data (e.g., a corpus, a lexicon, etc.). The typical values are ‘downloadable’, ‘accessibleThroughInterface’, ‘accessibleThroughQuery’ (see more at DatasetDistributionForm).
  • downloadLocation (Mandatory if applicable): A URL where the language resource (mainly data but also downloadable software programmes or forms) can be downloaded from. Use this element if the value of DatasetDistributionForm is ‘downloadable’ and only for direct download links (i.e., from which the dataset is downloaded without the need of further actions such as clicks on a page).
  • accessLocation (Mandatory if applicable): A URL where the resource can be accessed from; it can be used for landing pages or for cases where the resource is accessible via an interface, i.e. cases where the resource itself is not provided with a direct link for downloading. Use if the value of DatasetDistributionForm is ‘accessibleThroughInterface’ or ‘accessibleThroughQuery’ but also for links used for downloading resources which are mentioned on a landing page or require some kind of action on the part of the user.
  • licenceTerms (Mandatory): See licenceTerms
  • cost (Mandatory if applicable): Introduces the cost for accessing a resource, formally described as a set of amount and currency unit. Please use only for resources available at a cost and not for free resources.

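Depending on the media parts of the language description, you must also use the corresponding distribution feature element (e.g., distributionTextFeature for text parts, as in the examples below).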

Example

<ms:DatasetDistribution>
        <ms:DatasetDistributionForm>http://w3id.org/meta-share/meta-share/downloadable</ms:DatasetDistributionForm>
        <ms:accessLocation>https://www.someAccessURL</ms:accessLocation>
        <ms:distributionTextFeature>
                <ms:size>
                        <ms:amount>17601</ms:amount>
                        <ms:sizeUnit>http://w3id.org/meta-share/meta-share/unit</ms:sizeUnit>
                </ms:size>
                <ms:dataFormat>http://w3id.org/meta-share/omtd-share/Xml</ms:dataFormat>
                <ms:characterEncoding>http://w3id.org/meta-share/meta-share/UTF-8</ms:characterEncoding>
        </ms:distributionTextFeature>
        <ms:licenceTerms>
                <ms:licenceTermsName xml:lang="en">openUnder-PSI</ms:licenceTermsName>
                <ms:licenceTermsURL>https://elrc-share.eu/terms/openUnderPSI.html</ms:licenceTermsURL>
        </ms:licenceTerms>
</ms:DatasetDistribution>

<ms:DatasetDistribution>
        <ms:DatasetDistributionForm>http://w3id.org/meta-share/meta-share/accessibleThroughInterface</ms:DatasetDistributionForm>
        <ms:accessLocation>https://www.someAccessURL</ms:accessLocation>
        <ms:distributionTextFeature>
                <ms:size>
                        <ms:amount>100</ms:amount>
                        <ms:sizeUnit>http://w3id.org/meta-share/meta-share/text1</ms:sizeUnit>
                </ms:size>
                <ms:dataFormat>http://w3id.org/meta-share/omtd-share/Pdf</ms:dataFormat>
                <ms:characterEncoding>http://w3id.org/meta-share/meta-share/UTF-8</ms:characterEncoding>
        </ms:distributionTextFeature>
        <ms:licenceTerms>
                <ms:licenceTermsName xml:lang="en">some commercial licence</ms:licenceTermsName>
                <ms:licenceTermsURL>https://elrc-share.eu/terms/someCommercialLicence.html</ms:licenceTermsURL>
        </ms:licenceTerms>
        <ms:cost>
                <ms:amount>10000</ms:amount>
                <ms:currency>http://w3id.org/meta-share/meta-share/euro</ms:currency>
        </ms:cost>
</ms:DatasetDistribution>
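
Both examples above use accessLocation. When the file can be fetched with a direct link, the list above calls for downloadLocation instead. A minimal sketch of that case, with a placeholder URL and a licence borrowed from the full records earlier on this page:

```xml
<ms:DatasetDistribution>
        <ms:DatasetDistributionForm>http://w3id.org/meta-share/meta-share/downloadable</ms:DatasetDistributionForm>
        <!-- placeholder: a link that starts the download without further user action -->
        <ms:downloadLocation>https://www.someDownloadURL/grammar.zip</ms:downloadLocation>
        <ms:licenceTerms>
                <ms:licenceTermsName xml:lang="en">CC-BY-SA-4.0</ms:licenceTermsName>
                <ms:licenceTermsURL>https://spdx.org/licenses/CC-BY-SA-4.0.html</ms:licenceTermsURL>
        </ms:licenceTerms>
</ms:DatasetDistribution>
```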

personalDataIncluded

Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.LanguageDescription.personalDataIncluded

Data type boolean

Optionality Mandatory

Explanation & Instructions

Specifies whether the language resource contains personal data (mainly in the sense falling under the GDPR)

If the resource contains personal data, you can use the (optional) personalDataDetails to provide more information.

Example

<ms:personalDataIncluded>true</ms:personalDataIncluded>
<ms:personalDataDetails>The corpus contains data on the place of living and place of birth of participants</ms:personalDataDetails>

sensitiveDataIncluded

Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.LanguageDescription.sensitiveDataIncluded

Data type boolean

Optionality Mandatory

Explanation & Instructions

Specifies whether the language resource contains sensitive data (e.g., medical/health-related, etc.) and thus requires special handling

If the resource contains sensitive data, you can use the (optional) sensitiveDataDetails to provide more information.

Example

<ms:sensitiveDataIncluded>true</ms:sensitiveDataIncluded>
<ms:sensitiveDataDetails>The corpus contains medical data for persons with disabilities</ms:sensitiveDataDetails>

anonymized

Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.LanguageDescription.anonymized

Data type boolean

Optionality Mandatory if applicable

Explanation & Instructions

Indicates whether the language resource has been anonymized

The element is mandatory if either personalDataIncluded or sensitiveDataIncluded has ‘true’ as value; anonymizationDetails must also be filled in with information on the anonymization method, etc.

Example

<ms:anonymized>true</ms:anonymized>
<ms:anonymizationDetails>pseudonymization performed manually</ms:anonymizationDetails>
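
The three elements work together: once personalDataIncluded or sensitiveDataIncluded is ‘true’, anonymized and anonymizationDetails are expected as well. The fragment below sketches how they might sit side by side inside LanguageDescription. The values are illustrative, and the element order is an assumption to be checked against the ELG schema.

```xml
<!-- personal data present, so anonymized becomes mandatory -->
<ms:personalDataIncluded>true</ms:personalDataIncluded>
<ms:personalDataDetails>The corpus contains data on the place of living and place of birth of participants</ms:personalDataDetails>
<ms:sensitiveDataIncluded>false</ms:sensitiveDataIncluded>
<ms:anonymized>true</ms:anonymized>
<ms:anonymizationDetails>pseudonymization performed manually</ms:anonymizationDetails>
```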