Minimal version

The minimal version of the ELG schema consists of the mandatory and recommended elements. These have been selected to cover the following needs:

  • identification and citation: resource name(s); identifier(s); a short description of the contents; versioning information; a contact point for further information (email or landing page); details of the resource provider(s) and resource creator(s); classification by domain, keywords and intended LT application; language coverage (language and, if needed, dialect); publication date;

  • support: links to manuals, training material; samples of the resource;

  • usage/access: distribution form (e.g., downloadable file, access through an interface, source code or binary file of software, etc.); licensing conditions; access location.

These metadata elements can be used to describe all resources, irrespective of resource type. Additional elements specific to each resource type are also required, such as size and format for data files, or dependencies and technical requirements for tools and services.

Outline and explanations for the following sections

The following sections present the minimal schema, grouped as described above, i.e. first for elements common to all LRTs, and then by resource type. Each section includes:

  • an overview, with a table of the mandatory (M) and recommended (R) elements. The table gives the element name, its optionality, and the section and tab of the interactive editor where the element can be found; elements are grouped by tab. The possible values for optionality are:

    • Mandatory (M): the element must always be included in the metadata record

    • Recommended (R): the use of the element is not enforced but provides important information

    • Mandatory if applicable (MA): the element must be filled in when specific conditions apply

    • Recommended if applicable (RA): the use of the element is recommended when specific conditions apply

  • a detailed presentation of each metadata element, with the following information:

    • Path: the path of the element in the XSD

    • Data type:

      • string

      • multilingual string: you can repeat the element for different language versions; to specify the language, you must use the xml:lang attribute with a value from IETF BCP 47 (IANA Language Subtag Registry); for all multilingual elements, a value in English (“en”) is mandatory

      • component: group of elements

      • Controlled Vocabulary (CV): value taken from a controlled vocabulary; a link to the relevant controlled vocabulary is provided

      • date: a date in xs:date format (e.g., 2020-02-28)

      • URL

    • Optionality: for an explanation of the values, see above.

    • Explanation & Instructions: a short definition of the element, followed by instructions on how it should be used in the specific context.

    • Example: one or more examples of the element in XML format.
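The multilingual-string rule above (one element per language version, with an English value always present) lends itself to an automated check. The Python sketch below uses only the standard library's xml.etree.ElementTree. It assumes that the ms prefix is bound to http://w3id.org/meta-share/meta-share/; the helper names and the German title are hypothetical and serve only as an illustration, not as part of the ELG tooling.

```python
import xml.etree.ElementTree as ET

# Assumption: the "ms" prefix is bound to this namespace URI in ELG records.
MS = "http://w3id.org/meta-share/meta-share/"
# ElementTree reports xml:lang in Clark notation under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def languages_of(parent: ET.Element, local_name: str) -> set[str]:
    """Collect the xml:lang values of all <ms:local_name> children of parent."""
    return {
        child.get(XML_LANG, "").lower()  # BCP 47 tags are case-insensitive
        for child in parent.findall(f"{{{MS}}}{local_name}")
    }


def has_english_value(parent: ET.Element, local_name: str) -> bool:
    """True if at least one language version of the element is tagged "en"."""
    return "en" in languages_of(parent, local_name)


# Hypothetical record fragment: the German title is invented for illustration.
record = ET.fromstring(
    f'<ms:LanguageResource xmlns:ms="{MS}">'
    '<ms:resourceName xml:lang="en">GATE: English Named Entity Recognizer</ms:resourceName>'
    '<ms:resourceName xml:lang="de">GATE: Englischer Named-Entity-Erkenner</ms:resourceName>'
    '</ms:LanguageResource>'
)
print(sorted(languages_of(record, "resourceName")))  # ['de', 'en']
print(has_english_value(record, "resourceName"))  # True
```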

Minimal elements for all entities

This page describes the minimal metadata elements common to all types of entities.

1. Overview

Element name | Optionality | Section | Tab
--- | --- | --- | ---
metadataCreationDate | R | - | -
metadataCurator | R | - | -
compliesWith | R | - | -
metadataCreator | R | - | -
sourceOfMetadataRecord | R | LRT / Organization / Project | Identity

2. Element presentation

This section presents the elements listed above in the order in which they appear in the table of the previous section.


MetadataRecord

Path MetadataRecord

Data type component

Optionality Mandatory

Explanation & Instructions

A set of formalized structured information used to describe the contents, structure, function, etc. of an entity, usually according to a specific set of rules (metadata schema)

The MetadataRecord element groups a set of administrative data; its main elements for metadata records registered by individuals (presented in the previous table) are:

  • metadataCreationDate: the date when the metadata record was created

  • metadataCurator: the person who is responsible for updating the metadata record once it has been imported into the ELG database; usually the same person as the metadataCreator

  • compliesWith: for ELG metadata records, this is by default the ELG-SHARE metadata schema

  • metadataCreator: the person who created the metadata record

  • sourceOfMetadataRecord: used for metadata records that have been imported into ELG from other catalogues, either automatically harvested or through a manual collection procedure; it consists of two mandatory elements, repositoryName and repositoryURL, and the optional element repositoryIdentifier.

All elements except sourceOfMetadataRecord are assigned automatically; they are therefore not displayed in the interactive editor and do not have to be added to the metadata file.

The sourceOfMetadataRecord element is mandatory for harvested records, for which it is assigned automatically. It is recommended for records registered by individuals and is therefore displayed in the interactive editor form under the “Language Resource/Technology”, “Project” or “Organization” section.

Example

<ms:MetadataRecord>
    <ms:MetadataRecordIdentifier ms:MetadataRecordIdentifierScheme="http://w3id.org/meta-share/meta-share/elg">default id</ms:MetadataRecordIdentifier>
    <ms:metadataCreationDate>2020-02-28</ms:metadataCreationDate>
    <ms:metadataCurator>
        <ms:actorType>Person</ms:actorType>
        <ms:surname xml:lang="en">Smith</ms:surname>
        <ms:givenName xml:lang="en">John</ms:givenName>
    </ms:metadataCurator>
    <ms:compliesWith>http://w3id.org/meta-share/meta-share/ELG-SHARE</ms:compliesWith>
    <ms:metadataCreator>
        <ms:actorType>Person</ms:actorType>
        <ms:surname xml:lang="en">Brown</ms:surname>
        <ms:givenName xml:lang="en">George</ms:givenName>
    </ms:metadataCreator>
    <ms:sourceOfMetadataRecord>
        <ms:repositoryName xml:lang="en">ELRC-SHARE</ms:repositoryName>
        <ms:repositoryURL>https://www.elrc-share.eu/</ms:repositoryURL>
    </ms:sourceOfMetadataRecord>
</ms:MetadataRecord>

Minimal elements for all language resources and technologies

This page describes the minimal metadata elements common to all language resources and technologies (LRTs).

1. Overview

Element name | Optionality | Section | Tab
--- | --- | --- | ---
resourceName | M | LRT | Identity
LRIdentifier | R | LRT | Identity
resourceShortName | R | LRT | Identity
description | M | LRT | Identity
version | M | LRT | Identity
versionDate | R | LRT | Identity
resourceProvider | R | LRT | Identity
resourceCreator | R | LRT | Identity
publicationDate | R | LRT | Identity
fundingProject | R | LRT | Identity
logo | R | LRT | Identity
sourceOfMetadataRecord | R | LRT | Identity
intendedApplication | R | LRT | Categories
compliesWith | R | LRT | Categories
domain | R | LRT | Categories
keyword | M | LRT | Categories
additionalInfo | M | LRT | Contact
contact | R | LRT | Contact
isDocumentedBy | R | LRT | Documentation
isToBeCitedBy | R | LRT | Documentation
replaces | R | LRT | Related LRTs
isVersionOf | R | LRT | Related LRTs
isPartOf | R | LRT | Related LRTs
isSimilarTo | R | LRT | Related LRTs
isRelatedToLR | R | LRT | Related LRTs
relation | R | LRT | Related LRTs

2. Element presentation

This section presents each of the elements listed above separately, in the order in which they appear in the table of the previous section.


resourceName

Path MetadataRecord.DescribedEntity.LanguageResource.resourceName

Data type multilingual string

Optionality Mandatory

Explanation & Instructions

Introduces a human-readable name or title by which the resource is known

This is the “brand name” of your resource; try to use a name that is unique.

Example

<ms:resourceName xml:lang="en">GATE: English Named Entity Recognizer</ms:resourceName>

LRIdentifier

Path MetadataRecord.DescribedEntity.LanguageResource.LRIdentifier

Data type string with attribute

Optionality Recommended if applicable

Explanation & Instructions

A string (e.g., a PID, a DOI, or an identifier internal to an organization) used to uniquely identify a language resource

You must also use the LRIdentifierScheme attribute to specify the identifier scheme (e.g., DOI, Handle, etc.).

If the resource is already described in another repository/catalogue and has a PID, please add it with the appropriate attribute.

Example

<ms:LRIdentifier ms:LRIdentifierScheme="http://w3id.org/meta-share/meta-share/elg">ELG id automatically assigned</ms:LRIdentifier>

resourceShortName

Path MetadataRecord.DescribedEntity.LanguageResource.resourceShortName

Data type multilingual string

Optionality Recommended

Explanation & Instructions

Introduces a short form (e.g., abbreviation, acronym, etc.) used to refer to a language resource

Example

<ms:resourceShortName xml:lang="en">annie-named-entity-recognizer</ms:resourceShortName>

description

Path MetadataRecord.DescribedEntity.LanguageResource.description

Data type multilingual string

Optionality Mandatory

Explanation & Instructions

Introduces a short free-text account that provides information about the resource (e.g., service function, contents of a data resource, technical information, etc.)

Example

<ms:description xml:lang="en">Automatically identifies names of persons, locations and organizations, as well as money amounts and time and date expressions, in English texts.</ms:description>

version

Path MetadataRecord.DescribedEntity.LanguageResource.version

Data type string

Optionality Mandatory

Explanation & Instructions

Associates a language resource with a pattern that indicates its version; the recommended way is to follow the semantic versioning guidelines (http://semver.org) and use a numeric pattern of the form major_version.minor_version.patch

If no version is provided, the system will automatically assign the resource a ‘v1.0.0 (automatically assigned)’ value
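As a quick self-check against the recommended major_version.minor_version.patch pattern, a minimal Python sketch might look as follows. It tolerates a leading ‘v’ (as in the automatically assigned ‘v1.0.0’) and, for simplicity, ignores the pre-release and build suffixes that semantic versioning also allows; the function name is hypothetical.

```python
import re

# Optional leading "v", then MAJOR.MINOR.PATCH as non-negative integers
# without leading zeros (the core pattern of semantic versioning).
SEMVER_CORE = re.compile(r"v?(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)")


def follows_recommended_pattern(version: str) -> bool:
    """Check a version string against the major.minor.patch recommendation."""
    return SEMVER_CORE.fullmatch(version.strip()) is not None


for value in ["v1.0.0", "2.3.11", "v8.6", "1.0"]:
    print(value, follows_recommended_pattern(value))
```

With this check, a two-part value such as v8.6 would be reported as not following the recommended pattern; since the pattern is only a recommendation, such a value is not necessarily invalid.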

Example

<ms:version>v8.6</ms:version>

versionDate

Path MetadataRecord.DescribedEntity.LanguageResource.versionDate

Data type date

Optionality Recommended

Explanation & Instructions

Identifies the date associated with the version of the language resource being described (as a recommendation, of the latest update of the particular version)

Example

<ms:versionDate>2020-02-10</ms:versionDate>

resourceProvider

Path MetadataRecord.DescribedEntity.LanguageResource.resourceProvider

Data type component

Optionality Recommended

Explanation & Instructions

The person/organization responsible for providing, curating, maintaining and making available (publishing) the resource

The resource provider plays a role similar to that of the publisher of a scientific article; it can be an individual or an organization.

For organizations, you must add the name of the organization (organizationName) and, if possible, the website.

For persons, you must add the given name and surname and, if possible, an email address or an identifier (such as an ORCID iD) to help uniquely identify them.

Example

<ms:resourceProvider>
        <ms:Organization>
                <ms:actorType>Organization</ms:actorType>
                <ms:organizationName xml:lang="en">Organization</ms:organizationName>
                <ms:website>https://provider.org/</ms:website>
        </ms:Organization>
</ms:resourceProvider>

<ms:resourceProvider>
        <ms:Person>
                <ms:actorType>Person</ms:actorType>
                <ms:surname xml:lang="en">Smith</ms:surname>
                <ms:givenName xml:lang="en">John</ms:givenName>
        </ms:Person>
</ms:resourceProvider>
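Since resourceProvider, resourceCreator and contact all wrap the same Person/Organization structure, it can be convenient to generate them with a small helper instead of typing them by hand. This is a sketch using Python's xml.etree.ElementTree, assuming the ms prefix is bound to http://w3id.org/meta-share/meta-share/; the function names are hypothetical, and only the mandatory name fields plus one optional field per actor type are covered.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Assumption: the "ms" prefix is bound to this namespace URI.
MS = "http://w3id.org/meta-share/meta-share/"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
ET.register_namespace("ms", MS)  # serialize with the familiar "ms:" prefix


def ms(tag: str) -> str:
    """Qualify a local element/attribute name with the ms namespace."""
    return f"{{{MS}}}{tag}"


def organization_actor(wrapper: str, name: str, website: Optional[str] = None) -> ET.Element:
    """Build <ms:{wrapper}><ms:Organization>...</ms:Organization></ms:{wrapper}>."""
    outer = ET.Element(ms(wrapper))
    org = ET.SubElement(outer, ms("Organization"))
    ET.SubElement(org, ms("actorType")).text = "Organization"
    ET.SubElement(org, ms("organizationName"), {XML_LANG: "en"}).text = name
    if website:  # "if possible": only written when known
        ET.SubElement(org, ms("website")).text = website
    return outer


def person_actor(wrapper: str, surname: str, given_name: str,
                 email: Optional[str] = None) -> ET.Element:
    """Build <ms:{wrapper}><ms:Person>...</ms:Person></ms:{wrapper}>."""
    outer = ET.Element(ms(wrapper))
    person = ET.SubElement(outer, ms("Person"))
    ET.SubElement(person, ms("actorType")).text = "Person"
    ET.SubElement(person, ms("surname"), {XML_LANG: "en"}).text = surname
    ET.SubElement(person, ms("givenName"), {XML_LANG: "en"}).text = given_name
    if email:  # optional, helps to uniquely identify the person
        ET.SubElement(person, ms("email")).text = email
    return outer


provider = person_actor("resourceProvider", "Smith", "John")
print(ET.tostring(provider, encoding="unicode"))
```

The same helpers can be called with "resourceCreator" or "contact" as the wrapper name.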

resourceCreator

Path MetadataRecord.DescribedEntity.LanguageResource.resourceCreator

Data type component

Optionality Recommended

Explanation & Instructions

Links a resource to the person, group or organization that has created the resource

The element is important for citation and acknowledgement purposes.

For organizations, you must add the name of the organization (organizationName) and, if possible, the website.

For persons, you must add the given name and surname and, if possible, an email address or an identifier (such as an ORCID iD) to help uniquely identify them.

Example

<ms:resourceCreator>
            <ms:Organization>
                    <ms:actorType>Organization</ms:actorType>
                    <ms:organizationName xml:lang="en">example organization</ms:organizationName>
                    <ms:website>https://provider.org/</ms:website>
            </ms:Organization>
</ms:resourceCreator>

<ms:resourceCreator>
            <ms:Person>
                    <ms:actorType>Person</ms:actorType>
                    <ms:surname xml:lang="en">Smith</ms:surname>
                    <ms:givenName xml:lang="en">John</ms:givenName>
            </ms:Person>
</ms:resourceCreator>

publicationDate

Path MetadataRecord.DescribedEntity.LanguageResource.publicationDate

Data type date

Optionality Recommended

Explanation & Instructions

Specifies the date when a language resource has been made available to the public

Publication date is important for citation purposes, as it is for scientific articles. If this is the first time your resource is published, use the same date as for metadataCreationDate. If the resource has previously been published in another repository, add the date on which it was first made available there.

Example

<ms:publicationDate>2015-12-17</ms:publicationDate>

fundingProject

Path MetadataRecord.DescribedEntity.LanguageResource.fundingProject

Data type component

Optionality Recommended when applicable

Explanation & Instructions

Links a language resource to the project that has funded its creation, enrichment, extension, etc.

Funding information is important for acknowledgement purposes.

For projects, you must provide the name of the project (projectName) and, if possible, a website (website) and/or an identifier (ProjectIdentifier). You may also provide the short name of the project (projectShortName), a grant number issued by the funding authority (grantNumber), the funder(s) (funder), in the form of organization, person or group, and a value selected from the fundingType controlled vocabulary.

Example

<ms:fundingProject>
            <ms:projectName xml:lang="en">European Language Resource Coordination LOT3</ms:projectName>
            <ms:projectShortName xml:lang="en">ELRC - LOT3</ms:projectShortName>
            <ms:ProjectIdentifier ms:ProjectIdentifierScheme="http://w3id.org/meta-share/meta-share/other">SMART 2015/1091 - 30-CE-0816766/00-92</ms:ProjectIdentifier>
            <ms:website>http://www.lr-coordination.eu</ms:website>
            <ms:grantNumber>EU 1234567890</ms:grantNumber>
            <ms:fundingType>http://w3id.org/meta-share/meta-share/serviceContract</ms:fundingType>
            <ms:fundingType>http://w3id.org/meta-share/meta-share/other</ms:fundingType>
            <ms:funder>
                    <ms:Organization>
                            <ms:actorType>Organization</ms:actorType>
                            <ms:organizationName xml:lang="en">Ministry of Research and Innovation</ms:organizationName>
                            <ms:website>http://www.ministry.org</ms:website>
                    </ms:Organization>
            </ms:funder>
</ms:fundingProject>


sourceOfMetadataRecord

Path MetadataRecord.sourceOfMetadataRecord

Data type component

Optionality Recommended

Explanation & Instructions

Refers to the entity (repository, catalogue, archive, etc.) from which the metadata record has been imported into the new catalogue

This element is a property of the metadata record; the ELG software assigns it automatically for automatically harvested records. For records originally included in other catalogues and registered in ELG by individuals, the element can be filled in in the LRT section of the editor.

It consists of two mandatory elements, repositoryName and repositoryURL, and the optional element repositoryIdentifier.

Example

<ms:sourceOfMetadataRecord>
        <ms:repositoryName xml:lang="en">ELRC-SHARE</ms:repositoryName>
        <ms:repositoryURL>https://www.elrc-share.eu/</ms:repositoryURL>
</ms:sourceOfMetadataRecord>
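The two mandatory children of sourceOfMetadataRecord can be checked before a record is submitted. Below is a minimal Python sketch, assuming the ms prefix is bound to http://w3id.org/meta-share/meta-share/ (the helper name is hypothetical), that reports which of repositoryName and repositoryURL is missing or empty; the optional repositoryIdentifier is not checked.

```python
import xml.etree.ElementTree as ET

MS = "http://w3id.org/meta-share/meta-share/"  # assumed URI of the "ms" prefix
REQUIRED = ("repositoryName", "repositoryURL")  # mandatory children per the text above


def missing_source_fields(source: ET.Element) -> list[str]:
    """Return the mandatory sub-elements that are absent or empty."""
    missing = []
    for name in REQUIRED:
        child = source.find(f"{{{MS}}}{name}")
        if child is None or not (child.text or "").strip():
            missing.append(name)
    return missing


fragment = ET.fromstring(
    f'<ms:sourceOfMetadataRecord xmlns:ms="{MS}">'
    '<ms:repositoryName xml:lang="en">ELRC-SHARE</ms:repositoryName>'
    '</ms:sourceOfMetadataRecord>'
)
print(missing_source_fields(fragment))  # ['repositoryURL']
```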

intendedApplication

Path MetadataRecord.DescribedEntity.LanguageResource.intendedApplication

Data type component

Optionality Recommended

Explanation & Instructions

Specifies an LT application for which the language resource has been created or for which it can be used or is recommended to be used

The element is important for discovery purposes.

You can use the LTClassRecommended element with one of the recommended values from the LT taxonomy (class ‘Function’ of the OMTD-SHARE ontology at http://w3id.org/meta-share/omtd-share/), or enter free text in the LTClassOther element.

You can repeat the element if the resource can be used for various applications. For instance, a part-of-speech tagger can be used as a component for named entity recognition, sentiment analysis, etc.

Example

<ms:intendedApplication>
            <ms:LTClassRecommended>http://w3id.org/meta-share/omtd-share/NamedEntityRecognition</ms:LTClassRecommended>
</ms:intendedApplication>

<ms:intendedApplication>
            <ms:LTClassRecommended>http://w3id.org/meta-share/omtd-share/SentimentAnalysis</ms:LTClassRecommended>
</ms:intendedApplication>

<ms:intendedApplication>
            <ms:LTClassOther>face recognition</ms:LTClassOther>
</ms:intendedApplication>
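When values come from a mixture of sources, a small helper can decide between LTClassRecommended and LTClassOther. The sketch below assumes that any value starting with the OMTD-SHARE base URI given above is a taxonomy class; it does not check that the class actually exists in the taxonomy. The helper name is hypothetical and the ms namespace URI (http://w3id.org/meta-share/meta-share/) is assumed.

```python
import xml.etree.ElementTree as ET

MS = "http://w3id.org/meta-share/meta-share/"  # assumed URI of the "ms" prefix
OMTD = "http://w3id.org/meta-share/omtd-share/"  # LT taxonomy base URI given above


def lt_class(wrapper: str, value: str) -> ET.Element:
    """Wrap one LT class: taxonomy URIs go into LTClassRecommended,
    anything else is treated as free text for LTClassOther."""
    outer = ET.Element(f"{{{MS}}}{wrapper}")
    child = "LTClassRecommended" if value.startswith(OMTD) else "LTClassOther"
    ET.SubElement(outer, f"{{{MS}}}{child}").text = value
    return outer


applications = [
    lt_class("intendedApplication", OMTD + "NamedEntityRecognition"),
    lt_class("intendedApplication", OMTD + "SentimentAnalysis"),
    lt_class("intendedApplication", "face recognition"),
]
for app in applications:
    print(app[0].tag.split("}")[1], app[0].text)
```

Because the function element of tools/services uses the same two children, the helper could equally be called with "function" as the wrapper.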

compliesWith

Path MetadataRecord.DescribedEntity.LanguageResource.compliesWith

Data type controlled vocabulary

Optionality Recommended

Explanation & Instructions

Specifies the vocabulary/standard/best practice with which a resource complies

You can use a value from the compliesWith controlled vocabulary.

Example

<ms:compliesWith>http://w3id.org/meta-share/meta-share/LemonOntolex</ms:compliesWith>

domain

Path MetadataRecord.DescribedEntity.LanguageResource.domain

Data type component

Optionality Recommended

Explanation & Instructions

Identifies the domain according to which a resource is classified

You must fill in the categoryLabel element with a free-text value. To also add a value from an established controlled vocabulary, use the DomainIdentifier element, with the DomainClassificationScheme attribute set to the appropriate scheme.

Example

<ms:domain>
        <ms:categoryLabel xml:lang="en">EDUCATION &amp; COMMUNICATIONS</ms:categoryLabel>
        <ms:DomainIdentifier ms:DomainClassificationScheme="http://w3id.org/meta-share/meta-share/EUROVOC">32</ms:DomainIdentifier>
</ms:domain>

<ms:domain>
        <ms:categoryLabel xml:lang="en">health</ms:categoryLabel>
</ms:domain>
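The rule that categoryLabel is always present while DomainIdentifier is optional (and needs its scheme attribute) can be captured in a builder. This is a minimal Python sketch, assuming the ms namespace URI http://w3id.org/meta-share/meta-share/; the function name is hypothetical. ElementTree escapes the ampersand automatically, which yields the &amp; seen in the example above.

```python
import xml.etree.ElementTree as ET
from typing import Optional

MS = "http://w3id.org/meta-share/meta-share/"  # assumed URI of the "ms" prefix
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def domain(label: str, scheme: Optional[str] = None,
           identifier: Optional[str] = None) -> ET.Element:
    """categoryLabel is always written; DomainIdentifier only when both a
    classification scheme and an identifier are supplied."""
    el = ET.Element(f"{{{MS}}}domain")
    ET.SubElement(el, f"{{{MS}}}categoryLabel", {XML_LANG: "en"}).text = label
    if scheme and identifier:
        ET.SubElement(
            el, f"{{{MS}}}DomainIdentifier",
            {f"{{{MS}}}DomainClassificationScheme": scheme},
        ).text = identifier
    return el


eurovoc = domain("EDUCATION & COMMUNICATIONS", MS + "EUROVOC", "32")
health = domain("health")
print(ET.tostring(eurovoc, encoding="unicode"))
```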

keyword

Path MetadataRecord.DescribedEntity.LanguageResource.keyword

Data type multilingual string

Optionality Mandatory

Explanation & Instructions

Introduces a word or phrase considered important for the description of a language resource, person or organization and thus used to index or classify it

You can repeat the element to add more keywords. Keywords are used for discovery purposes, so choose words or phrases that users are likely to search for when looking for resources similar to yours.

Example

<ms:keyword xml:lang="en">Named entity recognition</ms:keyword>
<ms:keyword xml:lang="en">person</ms:keyword>
<ms:keyword xml:lang="en">location</ms:keyword>
<ms:keyword xml:lang="en">fake news</ms:keyword>
<ms:keyword xml:lang="en">tweets</ms:keyword>
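Because keyword is a repeated element, a list of candidate keywords can be turned into ms:keyword elements programmatically. The sketch below also drops blank entries and case-insensitive duplicates; that de-duplication is an editorial choice of this example, not a schema rule. The function name is hypothetical and the ms namespace URI (http://w3id.org/meta-share/meta-share/) is assumed.

```python
import xml.etree.ElementTree as ET

MS = "http://w3id.org/meta-share/meta-share/"  # assumed URI of the "ms" prefix
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def keyword_elements(keywords: list[str], lang: str = "en") -> list[ET.Element]:
    """One <ms:keyword> per distinct non-blank keyword, in input order."""
    seen: set[str] = set()
    elements = []
    for kw in keywords:
        kw = kw.strip()
        key = kw.casefold()  # "Person" and "person" count as the same keyword
        if not kw or key in seen:
            continue
        seen.add(key)
        el = ET.Element(f"{{{MS}}}keyword", {XML_LANG: lang})
        el.text = kw
        elements.append(el)
    return elements


kws = keyword_elements(["Named entity recognition", "person", "Person", "location", " "])
print([k.text for k in kws])  # ['Named entity recognition', 'person', 'location']
```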

additionalInfo

Path MetadataRecord.DescribedEntity.LanguageResource.additionalInfo

Data type component

Optionality Mandatory

Explanation & Instructions

Introduces a point that can be used for further information (e.g. a landing page with a more detailed description of the resource or a general email that can be contacted for further queries)

It is recommended practice to give at least a landing page (landingPage) or a general email address (email); optionally, you can also specify a contact person (see contactPerson in the full schema).

Example

<ms:additionalInfo>
            <ms:landingPage>https://provider.example.com/product</ms:landingPage>
</ms:additionalInfo>

<ms:additionalInfo>
            <ms:email>product@example.com</ms:email>
</ms:additionalInfo>
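Since additionalInfo is mandatory and should carry at least a landing page or an email address, a pre-submission check might look like the following Python sketch. It assumes the ms namespace URI http://w3id.org/meta-share/meta-share/; the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

MS = "http://w3id.org/meta-share/meta-share/"  # assumed URI of the "ms" prefix


def has_contact_point(info: ET.Element) -> bool:
    """True if <ms:additionalInfo> has a non-empty landingPage or email child."""
    for name in ("landingPage", "email"):
        child = info.find(f"{{{MS}}}{name}")
        if child is not None and (child.text or "").strip():
            return True
    return False


def parse(fragment: str) -> ET.Element:
    """Parse an ms: fragment by wrapping it with the namespace declaration."""
    return ET.fromstring(f'<root xmlns:ms="{MS}">{fragment}</root>')[0]


page = parse("<ms:additionalInfo><ms:landingPage>https://provider.example.com/product"
             "</ms:landingPage></ms:additionalInfo>")
empty = parse("<ms:additionalInfo/>")
print(has_contact_point(page), has_contact_point(empty))  # True False
```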

contact

Path MetadataRecord.DescribedEntity.LanguageResource.contact

Data type component

Optionality Recommended

Explanation & Instructions

Specifies the data of the person/organization/group that can be contacted for information about a language resource

Example

<ms:contact>
        <ms:Person>
                <ms:actorType>Person</ms:actorType>
                <ms:surname xml:lang="en">Smith</ms:surname>
                <ms:givenName xml:lang="en">John</ms:givenName>
                <ms:PersonalIdentifier ms:PersonalIdentifierScheme="http://purl.org/spar/datacite/orcid">String</ms:PersonalIdentifier>
                <ms:email>smith@example.com</ms:email>
        </ms:Person>
</ms:contact>

isDocumentedBy

Path MetadataRecord.DescribedEntity.LanguageResource.isDocumentedBy

Data type component

Optionality Recommended

Explanation & Instructions

Links a language resource to a document (e.g., research paper describing its contents or its use in a project, user manual, etc.) or any other form of documentation (e.g., a URL with support information) that is related to the resource

You can use this element to add

  • supporting documentation (user manuals, training material, etc.) for the installation and use of your resource

  • scientific publications that describe the resource.

If you want, you can use one of the more fine-grained relations to documents (see full schema).

You can repeat the element if you want to add more documents.

You must fill in the title element with the title of the document (or even an entire bibliographic record). When available, it is also recommended to add the DocumentIdentifier element with the DOI of the document, or any other link to it; if you do, use the DocumentIdentifierScheme attribute to indicate the identifier type.

Example

<ms:isDocumentedBy>
        <ms:title xml:lang="en">Product User Manual</ms:title>
        <ms:DocumentIdentifier ms:DocumentIdentifierScheme="http://purl.org/spar/datacite/url">https://www.company.org/product.pdf</ms:DocumentIdentifier>
</ms:isDocumentedBy>
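The required title and the optional, scheme-qualified DocumentIdentifier can be combined in a small builder. This Python sketch reuses the URL scheme value from the example above as its default; the ms namespace URI (http://w3id.org/meta-share/meta-share/), the function name and the second title are assumptions for illustration. Pass a different scheme value when the identifier is, for instance, a DOI.

```python
import xml.etree.ElementTree as ET
from typing import Optional

MS = "http://w3id.org/meta-share/meta-share/"  # assumed URI of the "ms" prefix
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
URL_SCHEME = "http://purl.org/spar/datacite/url"  # scheme value used in the example above


def documented_by(title: str, identifier: Optional[str] = None,
                  scheme: str = URL_SCHEME) -> ET.Element:
    """Build <ms:isDocumentedBy>; the title is required, the identifier optional."""
    if not title.strip():
        raise ValueError("isDocumentedBy needs a non-empty title")
    el = ET.Element(f"{{{MS}}}isDocumentedBy")
    ET.SubElement(el, f"{{{MS}}}title", {XML_LANG: "en"}).text = title
    if identifier:
        ET.SubElement(
            el, f"{{{MS}}}DocumentIdentifier",
            {f"{{{MS}}}DocumentIdentifierScheme": scheme},
        ).text = identifier
    return el


manual = documented_by("Product User Manual", "https://www.company.org/product.pdf")
paper = documented_by("A paper describing the resource")  # hypothetical title, no identifier
```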

replaces

Path MetadataRecord.DescribedEntity.LanguageResource.replaces

Data type component

Optionality Recommended

Explanation & Instructions

Links the language resource being described to an older version of it that it has replaced

You must provide the resourceName of the language resource and, if possible, an LRIdentifier that will help uniquely identify it.

Example

<ms:replaces>
        <ms:resourceName xml:lang="en">COVID-19 Concept Embeddings</ms:resourceName>
        <ms:LRIdentifier ms:LRIdentifierScheme="http://w3id.org/meta-share/meta-share/doi">https://zenodo.org/record/3753531</ms:LRIdentifier>
</ms:replaces>

isVersionOf

Path MetadataRecord.DescribedEntity.LanguageResource.isVersionOf

Data type component

Optionality Recommended

Explanation & Instructions

Links the language resource being described to another of which it is a version (corrected, annotated, enriched, processed, etc.)

You must provide the resourceName of the language resource and, if possible, an LRIdentifier that will help uniquely identify it.

Example

<ms:isVersionOf>
        <ms:resourceName xml:lang="en">COVID-19 Concept Embeddings</ms:resourceName>
        <ms:LRIdentifier ms:LRIdentifierScheme="http://w3id.org/meta-share/meta-share/doi">https://zenodo.org/record/3753531</ms:LRIdentifier>
</ms:isVersionOf>

isPartOf

Path MetadataRecord.DescribedEntity.LanguageResource.isPartOf

Data type component

Optionality Recommended

Explanation & Instructions

Links two Language Resources: the one being described to another containing it (e.g., a monolingual corpus which is a part of a bilingual corpus)

You must provide the resourceName of the language resource and, if possible, an LRIdentifier that will help uniquely identify it.

Example

<ms:isPartOf>
        <ms:resourceName xml:lang="en">Multilingual Example corpus</ms:resourceName>
        <ms:LRIdentifier ms:LRIdentifierScheme="http://w3id.org/meta-share/meta-share/doi">https://zenodo.org/record/123456789</ms:LRIdentifier>
</ms:isPartOf>

isSimilarTo

Path MetadataRecord.DescribedEntity.LanguageResource.isSimilarTo

Data type component

Optionality Recommended

Explanation & Instructions

Links the language resource being described to another that it resembles. Examples are two resources built on the same theoretical principles, or the same resource available in different formats or processed at the same level with different tools.

You must provide the resourceName of the language resource and, if possible, an LRIdentifier that will help uniquely identify it.

Example

<ms:isSimilarTo>
        <ms:resourceName xml:lang="en">Multilingual Example corpus</ms:resourceName>
        <ms:LRIdentifier ms:LRIdentifierScheme="http://w3id.org/meta-share/meta-share/doi">https://zenodo.org/record/123456789</ms:LRIdentifier>
</ms:isSimilarTo>

isRelatedToLR

Path MetadataRecord.DescribedEntity.LanguageResource.isRelatedToLR

Data type component

Optionality Recommended

Explanation & Instructions

Links to a language resource that holds a relation with the entity being described (without further specification of the relation type).

You must provide the resourceName of the language resource and, if possible, an LRIdentifier that will help uniquely identify it.

Example

<ms:isRelatedToLR>
        <ms:resourceName xml:lang="en">Multilingual Example corpus</ms:resourceName>
        <ms:LRIdentifier ms:LRIdentifierScheme="http://w3id.org/meta-share/meta-share/doi">https://zenodo.org/record/123456789</ms:LRIdentifier>
</ms:isRelatedToLR>
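The relation elements replaces, isVersionOf, isPartOf, isSimilarTo and isRelatedToLR all wrap the same resourceName/LRIdentifier content, so one builder can serve all of them; generating the XML also guarantees that start and end tags match. This is a minimal Python sketch, assuming the ms namespace URI http://w3id.org/meta-share/meta-share/; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional

MS = "http://w3id.org/meta-share/meta-share/"  # assumed URI of the "ms" prefix
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
RELATIONS = {"replaces", "isVersionOf", "isPartOf", "isSimilarTo", "isRelatedToLR"}


def related_lr(relation: str, name: str, identifier: Optional[str] = None,
               scheme: Optional[str] = None) -> ET.Element:
    """Build one of the relation elements around resourceName/LRIdentifier."""
    if relation not in RELATIONS:
        raise ValueError(f"unknown relation element: {relation}")
    el = ET.Element(f"{{{MS}}}{relation}")
    ET.SubElement(el, f"{{{MS}}}resourceName", {XML_LANG: "en"}).text = name
    if identifier and scheme:  # LRIdentifier is optional but needs its scheme attribute
        ET.SubElement(
            el, f"{{{MS}}}LRIdentifier", {f"{{{MS}}}LRIdentifierScheme": scheme}
        ).text = identifier
    return el


part_of = related_lr(
    "isPartOf", "Multilingual Example corpus",
    "https://zenodo.org/record/123456789", MS + "doi",
)
# Serializing and re-parsing succeeds because the builder always closes what it opens.
print(ET.fromstring(ET.tostring(part_of)).tag)
```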

relation

Path MetadataRecord.DescribedEntity.LanguageResource.relation

Data type component

Optionality Recommended

Explanation & Instructions

Links two language resources and specifies the type of relation between them

You must provide the relationType (free text) and, for the relatedLR, the resourceName of the language resource and, if possible, an LRIdentifier that will help uniquely identify it.

Example

<ms:relation>
        <ms:relationType xml:lang="en">new relation</ms:relationType>
        <ms:relatedLR>
                <ms:resourceName xml:lang="en">COVID-19 Concept Embeddings</ms:resourceName>
                <ms:LRIdentifier ms:LRIdentifierScheme="http://w3id.org/meta-share/meta-share/doi">https://zenodo.org/record/3753531</ms:LRIdentifier>
        </ms:relatedLR>
</ms:relation>

Minimal elements for tools/services

This page describes the minimal metadata elements specific to tools/services.

1. Overview

Element name | Optionality | Section | Tab
--- | --- | --- | ---
function | M | Tool/Service | categories
developmentFramework | R | Tool/Service | categories
implementationLanguage | R | Tool/Service | categories
languageDependent | M | Tool/Service | technical
inputContentResource | M | Tool/Service | technical
processingResourceType | M | Tool/Service | technical
language | MA | Tool/Service | technical
mediaType | R | Tool/Service | technical
dataFormat | R | Tool/Service | technical
annotationType | R | Tool/Service | technical
sample | R | Tool/Service | technical
outputResource | R | Tool/Service | technical
processingResourceType | M | Tool/Service | technical
language | MA | Tool/Service | technical
mediaType | R | Tool/Service | technical
dataFormat | R | Tool/Service | technical
annotationType | R | Tool/Service | technical
requiredHardware | R | Tool/Service | technical
mlModel | R | Tool/Service | technical
parameter | R | Tool/Service | technical
evaluated | R | Tool/Service | evaluation
trl | R | Tool/Service | evaluation
SoftwareDistribution | M | distribution | technical
SoftwareDistributionForm | M | distribution | technical
webServiceType | MA | distribution | technical
dockerDownloadLocation | RA | distribution | technical
serviceAdapterDownloadLocation | RA | distribution | technical
downloadLocation | RA | distribution | technical
executionLocation | RA | distribution | technical
accessLocation | RA | distribution | technical
demoLocation | R | distribution | technical
privateResource | R | distribution | technical
additionalHWRequirements | R | distribution | technical
isDescribedBy | R | distribution | technical
licenceTerms | M | distribution | technical
cost | R | distribution | technical
membershipInstitution | R | distribution | technical

2. Element presentation

This section presents each of the elements listed above separately, in the order in which they appear in the table of the previous section.


function

Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.ToolService.function

Data type component

Optionality Mandatory

Explanation & Instructions

Specifies the operation/function/task that a software object performs

The element is important for discovery purposes.

You can fill in:

  • the LTClassRecommended element with one of the recommended values from the LT taxonomy, or

  • the LTClassOther element with a free text.

For services that perform multiple functions (e.g., syntactic and semantic annotation) you can repeat the element.

Example

<ms:function>
        <ms:LTClassRecommended>http://w3id.org/meta-share/omtd-share/NamedEntityRecognition</ms:LTClassRecommended>
</ms:function>

<ms:function>
        <ms:LTClassRecommended>http://w3id.org/meta-share/omtd-share/MachineTranslation</ms:LTClassRecommended>
</ms:function>

<ms:function>
        <ms:LTClassOther>video segmentation</ms:LTClassOther>
</ms:function>

developmentFramework

Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.ToolService.developmentFramework

Data type CV

Optionality Recommended

Explanation & Instructions

A framework or toolkit (e.g., a machine learning framework or an NLP toolkit) used in the development of a resource

Example

<ms:developmentFramework>
        <ms:DevelopmentFrameworkRecommended>http://w3id.org/meta-share/meta-share/TensorFlow</ms:DevelopmentFrameworkRecommended>
</ms:developmentFramework>

implementationLanguage

Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.ToolService.implementationLanguage

Data type string

Optionality Recommended

Explanation & Instructions

The programming language(s) in which a tool/service has been developed; this information is needed for running the tool/service when no executables are available

Example

<ms:implementationLanguage>Java v8</ms:implementationLanguage>

languageDependent

Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.ToolService.languageDependent

Data type boolean

Optionality Mandatory

Explanation & Instructions

Indicates whether the operation of the tool or service is language dependent or not

For language-dependent tools/services, you will be asked to also provide the language of the input and output resources.

Example

<ms:languageDependent>true</ms:languageDependent>
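The dependency between languageDependent and the language of the input resources can be verified automatically. In the Python sketch below, the ms namespace URI (http://w3id.org/meta-share/meta-share/) and the helper names are assumptions, and only inputContentResource is checked; the boolean is read according to the xs:boolean lexical forms (true/false/1/0). The sample fragment is trimmed for brevity and omits other mandatory elements.

```python
import xml.etree.ElementTree as ET

MS = "http://w3id.org/meta-share/meta-share/"  # assumed URI of the "ms" prefix


def q(tag: str) -> str:
    """Qualify a local name with the ms namespace."""
    return f"{{{MS}}}{tag}"


def is_language_dependent(tool: ET.Element) -> bool:
    """Read <ms:languageDependent>; xs:boolean accepts true/false/1/0."""
    flag = tool.find(q("languageDependent"))
    return flag is not None and (flag.text or "").strip() in ("true", "1")


def inputs_missing_language(tool: ET.Element) -> list[ET.Element]:
    """For language-dependent tools, return input resources lacking <ms:language>."""
    if not is_language_dependent(tool):
        return []
    return [inp for inp in tool.findall(q("inputContentResource"))
            if inp.find(q("language")) is None]


tool = ET.fromstring(
    f'<ms:ToolService xmlns:ms="{MS}">'
    '<ms:languageDependent>true</ms:languageDependent>'
    '<ms:inputContentResource>'
    '<ms:mediaType>http://w3id.org/meta-share/meta-share/text</ms:mediaType>'
    '</ms:inputContentResource>'
    '</ms:ToolService>'
)
print(len(inputs_missing_language(tool)))  # 1
```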

inputContentResource

Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.ToolService.inputContentResource

Data type component

Optionality Mandatory

Explanation & Instructions

Specifies the requirements set by a tool/service for the (content) resource that it processes

The following elements are mandatory or recommended:

  • processingResourceType (Mandatory): Specifies the resource type that a tool/service takes as input or produces as output; you must specify, for instance, if the tool/service can process a single file, or set of files, or processes a string typed in by the users.

  • language (Mandatory if applicable): Specifies the language that is used in the resource or supported by the tool/service, expressed according to the BCP47 recommendation. See language

  • mediaType (Recommended): Specifies the media type of the input/output of a language processing tool/service. For ELG functional services, this will be used to fit the appropriate GUI (e.g. “audio” for ASR applications, vs. “text” for Machine Translation applications)

  • dataFormat (Recommended): Indicates the format(s) of a data resource. Use it to indicate the data format(s) of the resource supported by the tool/service. The dataFormat controlled vocabulary lists data formats, with their mimetype and documentation on the particularities, thus catering for variations of formats, e.g. GATE XML, TEI variants, etc. You may also use a free text value.

  • characterEncoding (Recommended if applicable): Specifies the character encoding used for the input/output text resource of an LT service

  • annotationType (Recommended if applicable): Specifies the annotation type of the annotated version(s) of a resource or the annotation type a tool/service requires or produces as an output. Use this element only if the tool/service processes pre-annotated corpora; do not use it for tools/services that process raw files. The element takes a value from a controlled vocabulary (see annotationType) or a free text value.

Example

<!-- example for a tool with textual input -->
<ms:inputContentResource>
        <ms:processingResourceType>http://w3id.org/meta-share/meta-share/file1</ms:processingResourceType>
        <ms:language>
                <ms:languageTag>en</ms:languageTag>
                <ms:languageId>en</ms:languageId>
        </ms:language>
        <ms:mediaType>http://w3id.org/meta-share/meta-share/text</ms:mediaType>
        <ms:dataFormat><ms:dataFormatRecommended>http://w3id.org/meta-share/omtd-share/Json</ms:dataFormatRecommended></ms:dataFormat>
        <ms:characterEncoding>http://w3id.org/meta-share/meta-share/UTF-8</ms:characterEncoding>
</ms:inputContentResource>

<!-- example for an Automatic Speech Recognizer -->
<ms:inputContentResource>
        <ms:processingResourceType>http://w3id.org/meta-share/meta-share/file1</ms:processingResourceType>
        <ms:language>
                <ms:languageTag>de</ms:languageTag>
                <ms:languageId>de</ms:languageId>
        </ms:language>
        <ms:mediaType>http://w3id.org/meta-share/meta-share/audio</ms:mediaType>
        <ms:dataFormat><ms:dataFormatRecommended>http://w3id.org/meta-share/omtd-share/mp3</ms:dataFormatRecommended></ms:dataFormat>
        <ms:dataFormat><ms:dataFormatRecommended>http://w3id.org/meta-share/omtd-share/wav</ms:dataFormatRecommended></ms:dataFormat>
</ms:inputContentResource>

outputResource

Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.ToolService.outputResource

Data type component

Optionality Recommended if applicable

Explanation & Instructions

Describes the features of the output resource produced by a tool/service.

The set of elements is the same as for the inputContentResource.

Make sure you include the elements that are relevant to your application. For instance,

  • for annotation and information extraction tools/services, use the annotationType to indicate the results of your processing; you can repeat it to indicate multiple annotation types (e.g., part of speech, person, amount, location, etc.)

  • for Machine Translation tools, indicate the source language(s) in the inputContentResource and the target language(s) in the outputResource.

Example

<!-- example for an Information Extraction tool -->
<ms:outputResource>
        <ms:processingResourceType>http://w3id.org/meta-share/meta-share/file1</ms:processingResourceType>
        <ms:language>
                <ms:languageTag>en</ms:languageTag>
                <ms:languageId>en</ms:languageId>
        </ms:language>
        <ms:mediaType>http://w3id.org/meta-share/meta-share/text</ms:mediaType>
        <ms:dataFormat><ms:dataFormatRecommended>http://w3id.org/meta-share/omtd-share/Json</ms:dataFormatRecommended></ms:dataFormat>
        <ms:characterEncoding>http://w3id.org/meta-share/meta-share/UTF-8</ms:characterEncoding>
        <ms:annotationType><ms:annotationTypeRecommended>http://w3id.org/meta-share/omtd-share/Person</ms:annotationTypeRecommended></ms:annotationType>
        <ms:annotationType><ms:annotationTypeRecommended>http://w3id.org/meta-share/omtd-share/Location</ms:annotationTypeRecommended></ms:annotationType>
        <ms:annotationType><ms:annotationTypeRecommended>http://w3id.org/meta-share/omtd-share/Organization</ms:annotationTypeRecommended></ms:annotationType>
        <ms:annotationType><ms:annotationTypeRecommended>http://w3id.org/meta-share/omtd-share/Date</ms:annotationTypeRecommended></ms:annotationType>
</ms:outputResource>

<!-- example for a Machine Translation tool -->
<ms:outputResource>
        <ms:processingResourceType>http://w3id.org/meta-share/meta-share/file1</ms:processingResourceType>
        <ms:language>
                <ms:languageTag>en</ms:languageTag>
                <ms:languageId>en</ms:languageId>
        </ms:language>
        <ms:mediaType>http://w3id.org/meta-share/meta-share/text</ms:mediaType>
        <ms:dataFormat><ms:dataFormatRecommended>http://w3id.org/meta-share/omtd-share/Json</ms:dataFormatRecommended></ms:dataFormat>
        <ms:characterEncoding>http://w3id.org/meta-share/meta-share/UTF-8</ms:characterEncoding>
</ms:outputResource>
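For Machine Translation tools, the language pair can be read back from the inputContentResource and outputResource languages. The sketch below is a minimal illustration using Python's standard xml.etree.ElementTree; it assumes the ms prefix is bound to http://w3id.org/meta-share/meta-share/, and the helper names and the trimmed German-to-English record are hypothetical, not part of any ELG API.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for the "ms" prefix used in the examples above.
NS = {"ms": "http://w3id.org/meta-share/meta-share/"}


def language_tags(resource):
    """Return the languageTag values declared in an input/output resource element."""
    if resource is None:
        return []
    return [el.text for el in resource.findall("ms:language/ms:languageTag", NS)]


def translation_pairs(tool):
    """Pair each input language with each different output language."""
    sources = language_tags(tool.find("ms:inputContentResource", NS))
    targets = language_tags(tool.find("ms:outputResource", NS))
    return [(src, tgt) for src in sources for tgt in targets if src != tgt]


# Hypothetical German-to-English MT record, trimmed to the language elements.
record = """
<ms:ToolService xmlns:ms="http://w3id.org/meta-share/meta-share/">
  <ms:inputContentResource>
    <ms:language><ms:languageTag>de</ms:languageTag><ms:languageId>de</ms:languageId></ms:language>
  </ms:inputContentResource>
  <ms:outputResource>
    <ms:language><ms:languageTag>en</ms:languageTag><ms:languageId>en</ms:languageId></ms:language>
  </ms:outputResource>
</ms:ToolService>
"""

print(translation_pairs(ET.fromstring(record.strip())))  # [('de', 'en')]
```

If the output languages were recorded in the inputContentResource as well, the pairs would be ambiguous, which is why the source and target languages belong in separate components.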

language

Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.ToolService.language

Data type component

Optionality Mandatory if applicable

Explanation & Instructions

Specifies the language that is used in the resource or supported by the tool/service, expressed according to the BCP47 recommendation

The languageTag is composed of the languageId and, optionally, the scriptId, regionId and variantId; use the subtags that best describe the language(s) of your resource.

Example

<ms:language>
        <ms:languageTag>en</ms:languageTag>
        <ms:languageId>en</ms:languageId>
</ms:language>

<ms:language>
        <ms:languageTag>en-US</ms:languageTag>
        <ms:languageId>en</ms:languageId>
        <ms:regionId>US</ms:regionId>
</ms:language>
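The composition of languageTag from its subtags can be sketched as follows. This is a minimal illustration, not an ELG function: compose_language_tag is a hypothetical helper that joins the subtags in BCP47 order (language, script, region, variant) and applies the usual BCP47 casing conventions; it does not validate subtags against the IANA language subtag registry.

```python
def compose_language_tag(language_id, script_id=None, region_id=None, variant_id=None):
    """Join the subtags of a language element in BCP47 order.

    Subtags that are not given are skipped; casing follows the BCP47
    conventions (language lower case, script title case, region upper case).
    """
    parts = [language_id.lower()]
    if script_id:
        parts.append(script_id.title())
    if region_id:
        parts.append(region_id.upper())
    if variant_id:
        parts.append(variant_id)
    return "-".join(parts)


print(compose_language_tag("en", region_id="US"))                    # en-US
print(compose_language_tag("sr", script_id="latn", region_id="rs"))  # sr-Latn-RS
```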

sample

Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.ToolService.sample

Data type component

Optionality Recommended

Explanation & Instructions

Introduces a combination of the sample text(s) or sample file(s) and optional tags that can be used for feeding a processing service for testing purposes.

You can add either a free text value using the sampleText element, and/or link to a text using the samplesLocation. You can also introduce a tag (tag) that can be used as a criterion for selecting different samples for testing (e.g. the language value for Machine Translation services that operate on multiple languages).

Example

<ms:sample>
        <ms:sampleText>John is in Berlin.</ms:sampleText>
        <ms:tag>en</ms:tag>
</ms:sample>

<ms:sample>
        <ms:sampleText>Jean est à Berlin.</ms:sampleText>
        <ms:tag>fr</ms:tag>
</ms:sample>
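To show how the tag element can act as a selection criterion, here is a minimal sketch that returns the sample texts carrying a given tag. It assumes the record is parsed with Python's xml.etree.ElementTree and that the ms prefix is bound to http://w3id.org/meta-share/meta-share/; samples_for is a hypothetical helper, not the way the ELG platform itself selects samples.

```python
import xml.etree.ElementTree as ET

NS = {"ms": "http://w3id.org/meta-share/meta-share/"}  # assumed prefix binding

samples_xml = """
<ms:ToolService xmlns:ms="http://w3id.org/meta-share/meta-share/">
  <ms:sample><ms:sampleText>John is in Berlin.</ms:sampleText><ms:tag>en</ms:tag></ms:sample>
  <ms:sample><ms:sampleText>Jean est à Berlin.</ms:sampleText><ms:tag>fr</ms:tag></ms:sample>
</ms:ToolService>
"""


def samples_for(tool, wanted_tag):
    """Return the sampleText values of all samples tagged with wanted_tag."""
    texts = []
    for sample in tool.findall("ms:sample", NS):
        tags = [t.text for t in sample.findall("ms:tag", NS)]
        if wanted_tag in tags:
            texts.append(sample.findtext("ms:sampleText", default="", namespaces=NS))
    return texts


tool = ET.fromstring(samples_xml.strip())
print(samples_for(tool, "fr"))  # ['Jean est à Berlin.']
```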

requiredHardware

Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.ToolService.requiredHardware

Data type CV (requiredHardware)

Optionality Recommended

Explanation & Instructions

Specifies the type of hardware required for running a tool and/or computational grammar

Example

<ms:requiredHardware>http://w3id.org/meta-share/meta-share/ocrSystem</ms:requiredHardware>

mlModel

Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.ToolService.mlModel

Data type component

Optionality Recommended

Explanation & Instructions

Specifies the ML model that must be used together with the tool/service to perform the desired task

You must provide the resourceName of the language resource and, if possible, an LRIdentifier that will help uniquely identify it.

Example

<ms:mlModel>
        <ms:resourceName xml:lang="en">Bio2Vec - Results from October 13, 2017</ms:resourceName>
        <ms:LRIdentifier ms:LRIdentifierScheme="http://w3id.org/meta-share/meta-share/url">https://live.european-language-grid.eu/catalogue/ld/7509</ms:LRIdentifier>
</ms:mlModel>

parameter

Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.ToolService.parameter

Data type component

Optionality Recommended

Explanation & Instructions

Introduces a parameter used for running a tool/service

It can be filled in with the following elements:

  • parameterName (M): Introduces the name of the parameter as sent to a processing service

  • parameterLabel (M): Introduces a short name for a parameter suitable for use as a field label in a user interface

  • parameterDescription (M): Provides a short account of the parameter (e.g., function it performs, input / output requirements, etc.) in free text

  • parameterType (M): Classifies the parameter according to a specific (not yet standardised) typing system (e.g., whether it’s boolean, string, integer, a document, mapping, etc.)

  • optional (M): Specifies whether the parameter should be treated as mandatory or optional by user interfaces

  • multiValue (M): Specifies whether the parameter takes a list of values

  • defaultValue (MA): Specifies the initial value that user interfaces should use when prompting the user for a parameter taking a list of values

  • dataFormat (MA): Use to specify the data format, if applicable, for the input/output resource that can be used in the parameter; it takes a value from a recommended controlled vocabulary or a free text value.

  • enumerationValue (MA): Introduces a value of a list used inside parameters; it is a component with the following elements: valueLabel and valueDescription.

Example

<ms:parameter>
        <ms:parameterName>no_global</ms:parameterName>
        <ms:parameterLabel xml:lang="en">Skip global relation extraction</ms:parameterLabel>
        <ms:parameterDescription xml:lang="en">Speedup for large documents, but less extracted relations and lower accuracy.</ms:parameterDescription>
        <ms:parameterType>http://w3id.org/meta-share/meta-share/boolean</ms:parameterType>
        <ms:optional>true</ms:optional>
        <ms:multiValue>false</ms:multiValue>
        <ms:defaultValue>false</ms:defaultValue>
</ms:parameter>
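The parameter elements above are meant to drive user interfaces. As a rough illustration, the sketch below maps an ms:parameter element to a form-field description. The dictionary layout and the helper names are hypothetical, the ms namespace URI is assumed, and the parameter is the example above with its description omitted.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for the "ms" prefix; adjust it to match your record.
NS = {"ms": "http://w3id.org/meta-share/meta-share/"}


def to_bool(text):
    """Interpret an xsd:boolean lexical value ("true"/"1" are true)."""
    return (text or "").strip().lower() in ("true", "1")


def parameter_field(param):
    """Map an ms:parameter element to a hypothetical form-field description."""

    def get(name):
        return param.findtext(f"ms:{name}", namespaces=NS)

    return {
        "name": get("parameterName"),    # key sent to the processing service
        "label": get("parameterLabel"),  # label shown next to the input field
        "type": (get("parameterType") or "").rsplit("/", 1)[-1],  # last URI segment
        "required": not to_bool(get("optional")),
        "multi": to_bool(get("multiValue")),
        "default": get("defaultValue"),  # None when no defaultValue is given
    }


param_xml = """
<ms:parameter xmlns:ms="http://w3id.org/meta-share/meta-share/">
  <ms:parameterName>no_global</ms:parameterName>
  <ms:parameterLabel xml:lang="en">Skip global relation extraction</ms:parameterLabel>
  <ms:parameterType>http://w3id.org/meta-share/meta-share/boolean</ms:parameterType>
  <ms:optional>true</ms:optional>
  <ms:multiValue>false</ms:multiValue>
  <ms:defaultValue>false</ms:defaultValue>
</ms:parameter>
"""

print(parameter_field(ET.fromstring(param_xml.strip())))
```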

trl

Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.ToolService.trl

Data type CV (TRL)

Optionality Recommended

Explanation & Instructions

Specifies the TRL (Technology Readiness Level) of the technology according to the measurement system defined by the EC (https://ec.europa.eu/research/participants/data/ref/h2020/wp/2014_2015/annexes/h2020-wp1415-annex-g-trl_en.pdf)

Example

<ms:trl>http://w3id.org/meta-share/meta-share/trl4</ms:trl>

evaluated

Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.ToolService.evaluated

Data type boolean

Optionality Mandatory

Explanation & Instructions

Indicates whether the tool or service has been evaluated

If the tool/service has been evaluated, you can use the ‘evaluation’ component to give more detailed information; see evaluation for the relevant elements.

Example

<ms:evaluated>false</ms:evaluated>

SoftwareDistribution

Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.ToolService.SoftwareDistribution

Data type component

Optionality Mandatory

Explanation & Instructions

Any form with which software is distributed (e.g., web services, executable or code files, etc.)

This element groups together information that pertains to the physical form of a tool/service that is made available through the catalogue. For software that is distributed in multiple forms (e.g., as source code, as a web service, etc.), you can repeat this group of elements. The access location and the licensing conditions may differ for each distribution.

The following list includes the mandatory and recommended elements:

  • SoftwareDistributionForm (Mandatory): The medium, delivery channel or form (e.g., source code, API, web service, etc.) through which a software object is distributed. Use the value http://w3id.org/meta-share/meta-share/dockerImage for ELG integrated services.

  • webServiceType (Recommended if applicable): The type of a web service following the web service communication protocols. Recommended for web services.

  • dockerDownloadLocation (Mandatory if applicable): A location where the LT tool docker image is stored. For ELG integrated services, add the location from where the ELG team can download the docker image in order to test it.

  • serviceAdapterDownloadLocation (Mandatory if applicable): The URL where the docker image of the service adapter can be downloaded from. Required only for ELG integrated services implemented with an adapter.

  • executionLocation (Mandatory if applicable): A URL where the resource (mainly software) can be directly executed. Add here the REST endpoint at which the LT tool is exposed within the Docker image. It is also used for software available in the form of executable code or web services.

  • downloadLocation (Mandatory if applicable): A URL where a tool can be downloaded from. To be used only for direct links, i.e. for links that require no extra actions on the part of the user.

  • accessLocation (Mandatory if applicable): A URL where a tool can be accessed. It can be used, for instance, for links to tools that are included in a web page, or for tools that require authentication and authorization before being accessed.

  • demoLocation (Recommended if applicable): A URL providing access to a demo version of the tool/service. For ELG integrated services, this does not have to be filled in, since ELG provides a demo version at the “Try out” tab of the metadata record.

  • privateResource (Recommended): Specifies whether the resource is private so that its access/download location remains hidden.

  • additionalHWRequirements (Mandatory if applicable): A short text where you specify additional requirements for running the service, e.g. memory requirements, etc. The recommended format for this is: ‘limits_memory: X limits_cpu: Y’

  • licenceTerms (Mandatory): See licenceTerms

  • cost (Recommended if applicable): The cost for accessing a resource or the overall budget of a project, formally described as a set of amount (amount) and currency unit (currency). Fill in this element only if the tool/service is accessible for a fee.

  • membershipInstitution (Recommended if applicable): Introduces an institution with members that can benefit from specific conditions on the use of a resource (e.g. discount, unlimited access, etc.). Use this element only if such specific conditions apply.

Example

<ms:SoftwareDistribution>
        <ms:SoftwareDistributionForm>http://w3id.org/meta-share/meta-share/dockerImage</ms:SoftwareDistributionForm>
        <ms:executionLocation>http://localhost:8080/mt/process/</ms:executionLocation>
        <ms:dockerDownloadLocation>registry.gitlab.com/EXAMPLE</ms:dockerDownloadLocation>
        <ms:serviceAdapterDownloadLocation>registry.gitlab.com/serviceAdapter</ms:serviceAdapterDownloadLocation>
        <ms:privateResource>false</ms:privateResource>
        <ms:isDescribedBy>
                <ms:title xml:lang="en">description article</ms:title>
                <ms:DocumentIdentifier ms:DocumentIdentifierScheme="http://purl.org/spar/datacite/bibcode">String</ms:DocumentIdentifier>
        </ms:isDescribedBy>
        <ms:additionalHWRequirements>terabytes</ms:additionalHWRequirements>
        <ms:licenceTerms>
                <ms:licenceTermsName xml:lang="en">GNU Lesser General Public License v3.0 only</ms:licenceTermsName>
                <ms:licenceTermsURL>https://spdx.org/licenses/LGPL-3.0-only.html</ms:licenceTermsURL>
                <ms:LicenceIdentifier ms:LicenceIdentifierScheme="http://w3id.org/meta-share/meta-share/SPDX">LGPL-3.0-only</ms:LicenceIdentifier>
                <ms:conditionOfUse>http://w3id.org/meta-share/meta-share/unspecified</ms:conditionOfUse>
        </ms:licenceTerms>
        <ms:cost>
                <ms:amount>14500</ms:amount>
                <ms:currency>http://w3id.org/meta-share/meta-share/euro</ms:currency>
        </ms:cost>
        <ms:membershipInstitution>http://w3id.org/meta-share/meta-share/ELRA</ms:membershipInstitution>
</ms:SoftwareDistribution>
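The recommended format for additionalHWRequirements (‘limits_memory: X limits_cpu: Y’) is easy to check mechanically. The following is a minimal sketch, assuming only the two keys of the recommended format; the values "4Gi" and "2" are hypothetical, since the expected units are not specified here. Note that a free-text value such as the "terabytes" in the example above yields no limits.

```python
import re

# Matches the two keys of the recommended format 'limits_memory: X limits_cpu: Y'.
LIMIT = re.compile(r"\b(limits_memory|limits_cpu):\s*(\S+)")


def parse_hw_requirements(text):
    """Return the limits found in an additionalHWRequirements value as a dict."""
    return dict(LIMIT.findall(text))


print(parse_hw_requirements("limits_memory: 4Gi limits_cpu: 2"))
# {'limits_memory': '4Gi', 'limits_cpu': '2'}
print(parse_hw_requirements("terabytes"))  # {} (free text, not the recommended format)
```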

licenceTerms

Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.ToolService.SoftwareDistribution.licenceTerms

Data type component

Optionality Mandatory

Explanation & Instructions

Links the distribution (distributable form) of a language resource to the licence or terms of use/service (a specific legal document) with which it is distributed

The recommended practice is to add a licence name and identifier from the SPDX list of licences (https://spdx.org/licenses/). For proprietary licences or licences not included in the above list, please add a (unique) licence name and the URL where the text of the licence can be found.

You must also fill in the conditionOfUse element. For popular standard licences, we have already included the conditions of use, so you can add the element with the value http://w3id.org/meta-share/meta-share/unspecified. For proprietary licences, you can add the conditions of use or use the same value.

Example

<ms:licenceTerms>
        <ms:licenceTermsName xml:lang="en">GNU Lesser General Public License v3.0 only</ms:licenceTermsName>
        <ms:licenceTermsURL>https://spdx.org/licenses/LGPL-3.0-only.html</ms:licenceTermsURL>
        <ms:LicenceIdentifier ms:LicenceIdentifierScheme="http://w3id.org/meta-share/meta-share/SPDX">LGPL-3.0-only</ms:LicenceIdentifier>
        <ms:conditionOfUse>http://w3id.org/meta-share/meta-share/unspecified</ms:conditionOfUse>
</ms:licenceTerms>

<ms:licenceTerms>
        <ms:licenceTermsName xml:lang="en">publicDomain</ms:licenceTermsName>
        <ms:licenceTermsURL>https://elrc-share.eu/terms/publicDomain.html</ms:licenceTermsURL>
        <ms:conditionOfUse>http://w3id.org/meta-share/meta-share/noConditions</ms:conditionOfUse>
</ms:licenceTerms>

<ms:licenceTerms>
        <ms:licenceTermsName xml:lang="en">Creative Commons Attribution 4.0 International</ms:licenceTermsName>
        <ms:licenceTermsURL>https://creativecommons.org/licenses/by/4.0/legalcode</ms:licenceTermsURL>
        <ms:LicenceIdentifier ms:LicenceIdentifierScheme="http://w3id.org/meta-share/meta-share/SPDX">CC-BY-4.0</ms:LicenceIdentifier>
        <ms:conditionOfUse>http://w3id.org/meta-share/meta-share/attribution</ms:conditionOfUse>
</ms:licenceTerms>
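The licensing rules above (a licence name, an SPDX identifier or else the URL of the licence text, and a conditionOfUse value) can be checked with a short script. This is a sketch under stated assumptions: the ms namespace URI is assumed, licence_problems is a hypothetical helper rather than the ELG validator, and the second record is an invented incomplete entry.

```python
import xml.etree.ElementTree as ET

MS = "http://w3id.org/meta-share/meta-share/"  # assumed URI bound to the "ms" prefix
NS = {"ms": MS}
SPDX_SCHEME = MS + "SPDX"


def licence_problems(licence):
    """Return the gaps found in an ms:licenceTerms element (empty list = looks complete)."""
    problems = []
    if licence.findtext("ms:conditionOfUse", namespaces=NS) is None:
        problems.append("conditionOfUse is missing")
    if not licence.findtext("ms:licenceTermsName", namespaces=NS):
        problems.append("licenceTermsName is missing")
    # ElementTree keys namespaced attributes as "{uri}localName".
    has_spdx_id = any(
        el.get(f"{{{MS}}}LicenceIdentifierScheme") == SPDX_SCHEME
        for el in licence.findall("ms:LicenceIdentifier", NS)
    )
    if not has_spdx_id and not licence.findtext("ms:licenceTermsURL", namespaces=NS):
        problems.append("licenceTermsURL is missing (no SPDX identifier given)")
    return problems


public_domain = """<ms:licenceTerms xmlns:ms="http://w3id.org/meta-share/meta-share/">
  <ms:licenceTermsName xml:lang="en">publicDomain</ms:licenceTermsName>
  <ms:licenceTermsURL>https://elrc-share.eu/terms/publicDomain.html</ms:licenceTermsURL>
  <ms:conditionOfUse>http://w3id.org/meta-share/meta-share/noConditions</ms:conditionOfUse>
</ms:licenceTerms>"""

# Hypothetical incomplete entry for a proprietary licence.
incomplete = """<ms:licenceTerms xmlns:ms="http://w3id.org/meta-share/meta-share/">
  <ms:licenceTermsName xml:lang="en">Example proprietary licence</ms:licenceTermsName>
</ms:licenceTerms>"""

print(licence_problems(ET.fromstring(public_domain)))  # []
print(licence_problems(ET.fromstring(incomplete)))
```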

Minimal elements for corpora

This page describes the minimal metadata elements specific to corpora.

1. Overview

Corpora are collections of text documents, audio transcripts, audio and video recordings, etc. To cater for the representation of multimedia/multimodal language resources (e.g. a corpus of videos and their subtitles, or a corpus of audio recordings and their transcripts), the notion of “media part” is introduced in the model. Thus, a corpus consists of at least one media part of type text, audio, video, image or numerical text. Depending on the media part type, the DatasetDistribution component includes a set of text, audio, video, etc. distribution features.

The first table below has all the elements (mandatory and recommended) for a Corpus. The second table presents the mandatory and recommended elements for each media part. The third table presents the mandatory and recommended elements for the Distribution component, which includes elements that are specific to each media part.

Table 1 - Corpus common

| Element name | Optionality | Section | Tab |
|---|---|---|---|
| corpusSubclass | M | Corpus | Technical |
| personalDataIncluded | M | Corpus | Technical |
| personalDataDetails | RA | Corpus | Technical |
| sensitiveDataIncluded | M | Corpus | Technical |
| sensitiveDataDetails | RA | Corpus | Technical |
| anonymized | MA | Corpus | Technical |
| anonymizationDetails | RA | Corpus | Technical |
| isAnnotatedVersionOf | R | Corpus | Technical |

Table 2 - Media parts

| Element name | Optionality | Section | Tab |
|---|---|---|---|
| lingualityType | M | Corpus | text part |
| multilingualityType | MA | Corpus | text part |
| multilingualityTypeDetails | R | Corpus | text part |
| language | M | Corpus | text part |
| textType | R | Corpus | text part |
| annotation | RA | Corpus | text part |
| lingualityType | M | Corpus | audio part |
| multilingualityType | MA | Corpus | audio part |
| multilingualityTypeDetails | RA | Corpus | audio part |
| language | M | Corpus | audio part |
| AudioGenre | R | Corpus | audio part |
| SpeechGenre | R | Corpus | audio part |
| numberOfParticipants | R | Corpus | audio part |
| dialectAccentOfParticipants | R | Corpus | audio part |
| geographicDistributionOfParticipants | R | Corpus | audio part |
| annotation | RA | Corpus | audio part |
| lingualityType | M | Corpus | video part |
| multilingualityType | MA | Corpus | video part |
| multilingualityTypeDetails | RA | Corpus | video part |
| language | M | Corpus | video part |
| typeOfVideoContent | M | Corpus | video part |
| VideoGenre | R | Corpus | video part |
| numberOfParticipants | R | Corpus | video part |
| dialectAccentOfParticipants | R | Corpus | video part |
| geographicDistributionOfParticipants | R | Corpus | video part |
| annotation | RA | Corpus | video part |
| lingualityType | M | Corpus | image part |
| multilingualityType | RA | Corpus | image part |
| multilingualityTypeDetails | RA | Corpus | image part |
| language | M | Corpus | image part |
| typeOfImageContent | M | Corpus | image part |
| ImageGenre | R | Corpus | image part |
| annotation | RA | Corpus | image part |
| typeOfTextNumericalContent | M | Corpus | numerical text part |
| numberOfParticipants | R | Corpus | numerical text part |
| dialectAccentOfParticipants | R | Corpus | numerical text part |
| geographicDistributionOfParticipants | R | Corpus | numerical text part |
| annotation | RA | Corpus | numerical text part |

Table 3 - Distribution

| Element name | Optionality | Section | Tab |
|---|---|---|---|
| DatasetDistribution | M | Distribution | Technical |
| DatasetDistributionForm | M | Distribution | Technical |
| downloadLocation | MA | Distribution | Technical |
| accessLocation | MA | Distribution | Technical |
| distributionLocation | MA | Distribution | Technical |
| samplesLocation | R | Distribution | Technical |
| distributionTextFeature | MA | Distribution | Technical |
| distributionAudioFeature | MA | Distribution | Technical |
| distributionVideoFeature | MA | Distribution | Technical |
| distributionImageFeature | MA | Distribution | Technical |
| distributionTextNumericalFeature | MA | Distribution | Technical |
| licenceTerms | M | Distribution | Technical |
| cost | R | Distribution | Technical |
| membershipInstitution | R | Distribution | Technical |

2. Element presentation

This section presents each of the aforementioned elements separately, following the order of the elements in the tables of the previous section.


Corpus

Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.Corpus

Data type component

Optionality Mandatory

Explanation & Instructions

Wraps together the set of elements that is specific to corpora

Example

<ms:LRSubclass>
        <ms:Corpus>
                <ms:lrType>Corpus</ms:lrType>
        </ms:Corpus>
</ms:LRSubclass>

corpusSubclass

Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.Corpus.corpusSubclass

Data type CV (corpusSubclass)

Optionality Mandatory

Explanation & Instructions

Introduces a classification of corpora into types (used for descriptive reasons)

Use one of the values for raw corpora, annotated corpora (raw data mixed with annotations) or annotations (only annotations, without the original corpus)

Example

<ms:corpusSubclass>http://w3id.org/meta-share/meta-share/rawCorpus</ms:corpusSubclass>

<ms:corpusSubclass>http://w3id.org/meta-share/meta-share/annotatedCorpus</ms:corpusSubclass>

personalDataIncluded

Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.Corpus.personalDataIncluded

Data type CV

Optionality Mandatory

Explanation & Instructions

Specifies whether the language resource contains personal data (mainly in the sense falling under the GDPR)

If the resource contains personal data, you can use the (recommended) personalDataDetails to provide more information

Example

<ms:personalDataIncluded>http://w3id.org/meta-share/meta-share/yesP</ms:personalDataIncluded>
<ms:personalDataDetails>The corpus contains data on the place of living and place of birth of participants</ms:personalDataDetails>

sensitiveDataIncluded

Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.Corpus.sensitiveDataIncluded

Data type CV

Optionality Mandatory

Explanation & Instructions

Specifies whether the language resource contains sensitive data (e.g., medical/health-related, etc.) and thus requires special handling

If the resource contains sensitive data, you can use the (recommended) sensitiveDataDetails to provide more information.

Example

<ms:sensitiveDataIncluded>http://w3id.org/meta-share/meta-share/yesS</ms:sensitiveDataIncluded>
<ms:sensitiveDataDetails>The corpus contains medical data for persons with disabilities</ms:sensitiveDataDetails>

anonymized

Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.Corpus.anonymized

Data type CV

Optionality Mandatory if applicable

Explanation & Instructions

Indicates whether the language resource has been anonymized

The element is mandatory if personalDataIncluded or sensitiveDataIncluded has a ‘yes’ value (yesP or yesS, respectively); anonymizationDetails must also be filled in with information on the anonymization method, etc.

Example

<ms:anonymized>http://w3id.org/meta-share/meta-share/yesA</ms:anonymized>
<ms:anonymizationDetails>pseudonymization performed manually</ms:anonymizationDetails>
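The conditional rule for anonymized can be expressed as a small check. This is a minimal sketch over a hypothetical flat dictionary of element values (not the actual metadata record structure); the ‘yes’ values are the ones used in the examples of this section, and the namespace prefix of the CV URIs is taken from those examples.

```python
MS = "http://w3id.org/meta-share/meta-share/"

# 'Yes' values taken from the examples in this section.
YES_VALUES = {
    "personalDataIncluded": MS + "yesP",
    "sensitiveDataIncluded": MS + "yesS",
}


def anonymization_required(corpus):
    """True if personal or sensitive data is declared as included."""
    return any(corpus.get(element) == yes for element, yes in YES_VALUES.items())


def missing_anonymization_elements(corpus):
    """List the anonymization elements the rule above expects but that are absent."""
    if not anonymization_required(corpus):
        return []
    return [name for name in ("anonymized", "anonymizationDetails") if not corpus.get(name)]


# Hypothetical record: personal data declared, anonymization details not yet given.
corpus = {
    "personalDataIncluded": MS + "yesP",
    "anonymized": MS + "yesA",
}
print(missing_anonymization_elements(corpus))  # ['anonymizationDetails']
```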

isAnnotatedVersionOf

Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.Corpus.isAnnotatedVersionOf

Data type component

Optionality Recommended when applicable

Explanation & Instructions

Links to corpus B, the raw corpus of which the corpus being described (corpus A) is the annotated version

You must provide the resourceName of the language resource and, if possible, an LRIdentifier that will help uniquely identify it.

Example

<ms:isAnnotatedVersionOf>
        <ms:resourceName xml:lang="en">MTP Annotated German corpus - untagged version</ms:resourceName>
        <ms:LRIdentifier ms:LRIdentifierScheme="http://w3id.org/meta-share/meta-share/islrn">417-827-623-669-9</ms:LRIdentifier>
</ms:isAnnotatedVersionOf>

CorpusTextPart

Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.Corpus.CorpusMediaPart.CorpusTextPart

Data type component

Optionality Mandatory if applicable

Explanation & Instructions

The part of a corpus (or a whole corpus) that consists of textual segments (e.g., a corpus of publications, or transcriptions of an oral corpus, or subtitles, etc.)

You can repeat the group of elements for multiple textual parts.

The mandatory or recommended elements for the text part are:

  • mediaType (Mandatory): Specifies the media type of a language resource (the physical medium of the contents representation). For text parts, always use the value ‘text’.

  • lingualityType (Mandatory): Indicates whether the resource includes one, two or more languages. Computed by the system based on the number of languages or the ISO value for collective languages.

  • multilingualityType (Mandatory if applicable): Indicates whether the resource (part) is parallel, comparable or mixed. If lingualityType = bilingual or multilingual, it is required; select one of the values for parallel (e.g., original text and its translations), comparable (e.g. corpus of the same domain in multiple languages) and multilingualSingleText (for corpora that consist of segments including text in two or more languages, e.g., the transcription of a European Parliament session with MPs speaking in their native language).

  • language (Mandatory): Specifies the language that is used in the resource part, expressed according to the BCP47 recommendation. See language.

  • languageVariety (Mandatory if applicable): Relates a language resource that contains segments in a language variety (e.g., dialect, jargon) to that variety. Please use it for dialect corpora.

  • modalityType (Recommended if applicable): Specifies the type of the modality represented in the resource. For instance, you can use ‘spoken language’ to describe transcribed speech corpora.

  • TextGenre (Recommended): A category of text characterized by a particular style, form, or content according to a specific classification scheme. See TextGenre.

  • annotation (Mandatory if applicable): A set of features describing the annotated parts of a resource. See annotation.

Example

<ms:CorpusTextPart>
        <ms:corpusMediaType>CorpusTextPart</ms:corpusMediaType>
        <ms:mediaType>http://w3id.org/meta-share/meta-share/text</ms:mediaType>
        <ms:lingualityType>http://w3id.org/meta-share/meta-share/monolingual</ms:lingualityType>
        <ms:language>
                <ms:languageTag>es</ms:languageTag>
                <ms:languageId>es</ms:languageId>
        </ms:language>
</ms:CorpusTextPart>

<ms:CorpusTextPart>
        <ms:corpusMediaType>CorpusTextPart</ms:corpusMediaType>
        <ms:mediaType>http://w3id.org/meta-share/meta-share/text</ms:mediaType>
        <ms:lingualityType>http://w3id.org/meta-share/meta-share/bilingual</ms:lingualityType>
        <ms:language>
                <ms:languageTag>es</ms:languageTag>
                <ms:languageId>es</ms:languageId>
        </ms:language>
        <ms:language>
                <ms:languageTag>en</ms:languageTag>
                <ms:languageId>en</ms:languageId>
        </ms:language>
        <ms:multilingualityType>http://w3id.org/meta-share/meta-share/parallel</ms:multilingualityType>
        <ms:TextGenre>
                <ms:CategoryLabel>administrative texts</ms:CategoryLabel>
        </ms:TextGenre>
</ms:CorpusTextPart>

<ms:CorpusTextPart>
        <ms:corpusMediaType>CorpusTextPart</ms:corpusMediaType>
        <ms:mediaType>http://w3id.org/meta-share/meta-share/text</ms:mediaType>
        <ms:lingualityType>http://w3id.org/meta-share/meta-share/monolingual</ms:lingualityType>
        <ms:language>
                <ms:languageTag>en</ms:languageTag>
                <ms:languageId>en</ms:languageId>
        </ms:language>
        <ms:modalityType>http://w3id.org/meta-share/meta-share/spokenLanguage</ms:modalityType>
</ms:CorpusTextPart>
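Since lingualityType is computed from the number of languages and determines whether multilingualityType is required, the logic can be sketched as below. This is a simplified illustration, not the system's actual computation: it counts distinct primary language subtags (so "en" and "en-US" count as one language), ignores the collective-language codes mentioned above, and assumes the multilingual value follows the same URI pattern as the monolingual and bilingual values in the examples.

```python
MS = "http://w3id.org/meta-share/meta-share/"  # assumed base of the CV URIs


def linguality_type(language_tags):
    """Derive lingualityType from the distinct primary language subtags."""
    # "en" and "en-US" share the primary subtag "en", so they count as one language.
    languages = {tag.split("-")[0].lower() for tag in language_tags}
    if not languages:
        raise ValueError("language is mandatory for a text part")
    if len(languages) == 1:
        return MS + "monolingual"
    if len(languages) == 2:
        return MS + "bilingual"
    return MS + "multilingual"


def multilinguality_type_required(language_tags):
    """multilingualityType must be given for bilingual and multilingual parts."""
    return linguality_type(language_tags) != MS + "monolingual"


print(linguality_type(["es", "en"]))                    # http://w3id.org/meta-share/meta-share/bilingual
print(multilinguality_type_required(["en", "en-US"]))  # False
```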

CorpusAudioPart

Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.Corpus.CorpusMediaPart.CorpusAudioPart

Data type component

Optionality Mandatory if applicable

Explanation & Instructions

The part of a corpus (or whole corpus) that consists of audio segments

You can repeat the group of elements for multiple audio parts.

The mandatory or recommended elements for the audio part are:

  • mediaType (Mandatory): Specifies the media type of a language resource (the physical medium of the contents representation). For audio parts, always use the value ‘audio’

  • lingualityType (Mandatory): Indicates whether the resource includes one, two or more languages. Computed by the system based on the number of languages or the ISO value for collective languages.

  • multilingualityType (Mandatory if applicable): Indicates whether the resource (part) is parallel, comparable or mixed. If lingualityType = bilingual or multilingual, it is required; select one of the values for parallel (e.g., original text and its translations), comparable (e.g. corpus of the same domain in multiple languages) and multilingualSingleText (for corpora that consist of segments with content in two or more languages, e.g., the transcription of a European Parliament session with MPs speaking in their native language)

  • language (Mandatory): Specifies the language that is used in the resource part, expressed according to the BCP47 recommendation. See language

  • languageVariety (Mandatory if applicable): Relates a language resource that contains segments in a language variety (e.g., dialect, jargon) to that variety. Please use it for dialect corpora.

  • modalityType (Recommended if applicable): Specifies the type of the modality represented in the resource. For instance, you can use ‘spoken language’ to describe transcribed speech corpora.

  • AudioGenre (Recommended if applicable): A category of audio characterized by a particular style, form, or content according to a specific classification scheme. See AudioGenre

  • SpeechGenre (Recommended if applicable): A category for the conventionalized discourse of the speech part of a language resource, based on extra-linguistic and internal linguistic criteria. See SpeechGenre

  • annotation (Mandatory if applicable): A set of features describing the annotated parts of a resource. See annotation.

Example

<ms:CorpusAudioPart>
        <ms:corpusMediaType>CorpusAudioPart</ms:corpusMediaType>
        <ms:mediaType>http://w3id.org/meta-share/meta-share/audio</ms:mediaType>
        <ms:lingualityType>http://w3id.org/meta-share/meta-share/monolingual</ms:lingualityType>
        <ms:language>
                <ms:languageTag>en</ms:languageTag>
                <ms:languageId>en</ms:languageId>
        </ms:language>
        <ms:AudioGenre>
                <ms:CategoryLabel>conference noises</ms:CategoryLabel>
        </ms:AudioGenre>
</ms:CorpusAudioPart>

<ms:CorpusAudioPart>
        <ms:corpusMediaType>CorpusAudioPart</ms:corpusMediaType>
        <ms:mediaType>http://w3id.org/meta-share/meta-share/audio</ms:mediaType>
        <ms:lingualityType>http://w3id.org/meta-share/meta-share/monolingual</ms:lingualityType>
        <ms:language>
                <ms:languageTag>en</ms:languageTag>
                <ms:languageId>en</ms:languageId>
        </ms:language>
        <ms:modalityType>http://w3id.org/meta-share/meta-share/spokenLanguage</ms:modalityType>
        <ms:SpeechGenre>
                <ms:CategoryLabel>monologue</ms:CategoryLabel>
        </ms:SpeechGenre>
</ms:CorpusAudioPart>
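The note that lingualityType is computed from the number of languages, together with the conditional requirement on multilingualityType, can be illustrated with a small check over a single media part. This is a minimal sketch, not the ELG editor's own logic: it assumes the ms: prefix is bound to http://w3id.org/meta-share/meta-share/, counts distinct languageId values, and ignores ISO codes for collective languages. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: the ms: prefix is bound to this URI, which is also the base
# of the controlled-vocabulary values used for lingualityType.
MS = "http://w3id.org/meta-share/meta-share/"
NS = {"ms": MS}


def compute_linguality_type(language_ids):
    """Hypothetical helper: derive lingualityType from distinct language IDs.

    Collective-language codes, which the system also considers, are not handled.
    """
    count = len(set(language_ids))
    if count == 0:
        raise ValueError("at least one language is required")
    if count == 1:
        return MS + "monolingual"
    if count == 2:
        return MS + "bilingual"
    return MS + "multilingual"


def check_media_part(xml_text):
    """Return a list of problems found in one Corpus*Part element."""
    part = ET.fromstring(xml_text)
    ids = [el.text for el in part.findall("ms:language/ms:languageId", NS)]
    if not ids:
        return ["language is mandatory"]
    problems = []
    expected = compute_linguality_type(ids)
    declared = part.findtext("ms:lingualityType", namespaces=NS)
    if declared != expected:
        problems.append(f"lingualityType should be {expected}")
    # multilingualityType is mandatory for bilingual and multilingual parts.
    if expected != MS + "monolingual" and part.find("ms:multilingualityType", NS) is None:
        problems.append("multilingualityType is required")
    return problems


part = """<ms:CorpusAudioPart xmlns:ms="http://w3id.org/meta-share/meta-share/">
  <ms:lingualityType>http://w3id.org/meta-share/meta-share/bilingual</ms:lingualityType>
  <ms:language><ms:languageId>en</ms:languageId></ms:language>
  <ms:language><ms:languageId>fr</ms:languageId></ms:language>
</ms:CorpusAudioPart>"""
print(check_media_part(part))  # ['multilingualityType is required']
```

The same check applies unchanged to video and image parts, since they share these elements.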

CorpusVideoPart

Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.Corpus.CorpusMediaPart.CorpusVideoPart

Data type component

Optionality Mandatory if applicable

Explanation & Instructions

The part of a corpus (or a whole corpus) that consists of video segments (e.g., a corpus of video lectures, a part of a corpus with news, a sign language corpus, etc.)

You can repeat the group of elements for multiple video parts.

The mandatory or recommended elements for the video part are:

  • mediaType (Mandatory): Specifies the media type of a language resource (the physical medium of the contents representation). For video parts, always use the value ‘video’.

  • lingualityType (Mandatory): Indicates whether the resource includes one, two or more languages. Computed by the system based on the number of languages or the ISO code for collective languages.

  • multilingualityType (Mandatory if applicable): Indicates whether the resource (part) is parallel, comparable or mixed. It is required if lingualityType is bilingual or multilingual; select one of the values parallel (e.g., original text and its translations), comparable (e.g., corpus of the same domain in multiple languages) or multilingualSingleText (for corpora that consist of segments with content in two or more languages, e.g., the transcription of a European Parliament session with MPs speaking in their native languages).

  • language (Mandatory): Specifies the language used in the resource part, expressed according to the BCP47 recommendation. See language.

  • languageVariety (Mandatory if applicable): Relates a language resource to the language variety (e.g., dialect, jargon) in which some of its segments are expressed. Use it for dialect corpora.

  • modalityType (Recommended if applicable): Specifies the type of the modality represented in the resource. For instance, you can use ‘spoken language’ to describe transcribed speech corpora.

  • VideoGenre (Recommended): A classification of video parts based on extra-linguistic and internal linguistic criteria and reflected in the video style, form or content. See VideoGenre.

  • typeOfVideoContent (Mandatory): Main type of object or people represented in the video.

  • annotation (Mandatory if applicable): A set of features describing the annotated parts of a resource. See annotation.

Example

<ms:CorpusVideoPart>
        <ms:corpusMediaType>CorpusVideoPart</ms:corpusMediaType>
        <ms:mediaType>http://w3id.org/meta-share/meta-share/video</ms:mediaType>
        <ms:lingualityType>http://w3id.org/meta-share/meta-share/monolingual</ms:lingualityType>
        <ms:language>
                <ms:languageTag>en</ms:languageTag>
                <ms:languageId>en</ms:languageId>
        </ms:language>
        <ms:modalityType>http://w3id.org/meta-share/meta-share/bodyGesture</ms:modalityType>
        <ms:modalityType>http://w3id.org/meta-share/meta-share/facialExpression</ms:modalityType>
        <ms:modalityType>http://w3id.org/meta-share/meta-share/spokenLanguage</ms:modalityType>
        <ms:typeOfVideoContent>people eating at a restaurant</ms:typeOfVideoContent>
</ms:CorpusVideoPart>

<ms:CorpusVideoPart>
        <ms:corpusMediaType>CorpusVideoPart</ms:corpusMediaType>
        <ms:mediaType>http://w3id.org/meta-share/meta-share/video</ms:mediaType>
        <ms:lingualityType>http://w3id.org/meta-share/meta-share/monolingual</ms:lingualityType>
        <ms:language>
                <ms:languageTag>fr</ms:languageTag>
                <ms:languageId>fr</ms:languageId>
        </ms:language>
        <ms:VideoGenre>
                <ms:CategoryLabel>documentary</ms:CategoryLabel>
        </ms:VideoGenre>
        <ms:typeOfVideoContent>birds, wild animals, plants</ms:typeOfVideoContent>
</ms:CorpusVideoPart>

CorpusImagePart

Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.Corpus.CorpusMediaPart.CorpusImagePart

Data type component

Optionality Mandatory if applicable

Explanation & Instructions

The part of a corpus (or whole corpus) that consists of images (e.g., a corpus of photographs and their captions)

You can repeat the group of elements for multiple image parts.

The mandatory or recommended elements for the image part are:

  • mediaType (Mandatory): Specifies the media type of a language resource (the physical medium of the contents representation). For image parts, always use the value ‘image’.

  • lingualityType (Mandatory): Indicates whether the resource includes one, two or more languages. Computed by the system based on the number of languages or the ISO code for collective languages.

  • multilingualityType (Mandatory if applicable): Indicates whether the resource (part) is parallel, comparable or mixed. It is required if lingualityType is bilingual or multilingual; select one of the values parallel (e.g., original text and its translations), comparable (e.g., corpus of the same domain in multiple languages) or multilingualSingleText (for corpora that consist of segments with content in two or more languages, e.g., the transcription of a European Parliament session with MPs speaking in their native languages).

  • language (Mandatory): Specifies the language used in the resource part, expressed according to the BCP47 recommendation. See language.

  • languageVariety (Mandatory if applicable): Relates a language resource to the language variety (e.g., dialect, jargon) in which some of its segments are expressed. Use it for dialect corpora.

  • modalityType (Recommended if applicable): Specifies the type of the modality represented in the resource.

  • ImageGenre (Recommended): A category of images characterized by a particular style, form, or content according to a specific classification scheme. See ImageGenre.

  • typeOfImageContent (Mandatory): Main type of object or people represented in the image.

  • annotation (Mandatory if applicable): A set of features describing the annotated parts of a resource. See annotation.

Example

<ms:CorpusImagePart>
        <ms:corpusMediaType>CorpusImagePart</ms:corpusMediaType>
        <ms:mediaType>http://w3id.org/meta-share/meta-share/image</ms:mediaType>
        <ms:lingualityType>http://w3id.org/meta-share/meta-share/monolingual</ms:lingualityType>
        <ms:language>
                <ms:languageTag>el</ms:languageTag>
                <ms:languageId>el</ms:languageId>
        </ms:language>
        <ms:ImageGenre>
                <ms:CategoryLabel>comics</ms:CategoryLabel>
        </ms:ImageGenre>
        <ms:typeOfImageContent>human figures</ms:typeOfImageContent>
</ms:CorpusImagePart>

CorpusTextNumericalPart

Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.Corpus.CorpusMediaPart.CorpusTextNumericalPart

Data type component

Optionality Mandatory if applicable

Explanation & Instructions

The part of a corpus (or whole corpus) that consists of sets of textual representations of measurements and observations linked to sensorimotor recordings

You can repeat the group of elements for multiple numerical text parts.

The mandatory or recommended elements for this part are:

  • mediaType (Mandatory): Specifies the media type of a language resource (the physical medium of the contents representation). For numerical text parts, always use the value ‘textNumerical’.

  • typeOfTextNumericalContent (Mandatory): Main type of object or people represented in this part.

  • numberOfParticipants (Recommended): The number of persons participating in this part of the resource

  • dialectAccentOfParticipants (Recommended): Provides information on the dialect accent of the group of participants

  • geographicDistributionOfParticipants (Recommended): Gives information on the geographic distribution of the participants

  • annotation (Mandatory if applicable): A set of features describing the annotated parts of a resource. See annotation.

Example

<ms:CorpusTextNumericalPart>
        <ms:corpusMediaType>CorpusTextNumericalPart</ms:corpusMediaType>
        <ms:mediaType>http://w3id.org/meta-share/meta-share/textNumerical</ms:mediaType>
        <ms:typeOfTextNumericalContent>temperature measures</ms:typeOfTextNumericalContent>
</ms:CorpusTextNumericalPart>
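Each media part pairs a fixed mediaType value with its own mandatory elements. The sketch below expresses the unconditional (Mandatory) elements of the audio, video, image and text-numerical parts as a lookup table and checks one part against it, including whether corpusMediaType matches the element name. REQUIRED and check_part are hypothetical names, conditional (MA) elements are not covered, and the binding of the ms: prefix to http://w3id.org/meta-share/meta-share/ is an assumption.

```python
import xml.etree.ElementTree as ET

MS = "http://w3id.org/meta-share/meta-share/"  # assumed namespace and CV base
NS = {"ms": MS}

# Hypothetical summary of the Mandatory elements listed for each part:
# (expected mediaType term, mandatory child elements).
REQUIRED = {
    "CorpusAudioPart": ("audio", ["lingualityType", "language"]),
    "CorpusVideoPart": ("video", ["lingualityType", "language", "typeOfVideoContent"]),
    "CorpusImagePart": ("image", ["lingualityType", "language", "typeOfImageContent"]),
    "CorpusTextNumericalPart": ("textNumerical", ["typeOfTextNumericalContent"]),
}


def check_part(xml_text):
    """Return a list of problems found in one media part."""
    part = ET.fromstring(xml_text)
    # ElementTree stores tags as "{namespace}LocalName"; keep the local name.
    name = part.tag.split("}")[-1]
    media, children = REQUIRED[name]
    problems = []
    if part.findtext("ms:corpusMediaType", namespaces=NS) != name:
        problems.append(f"corpusMediaType should be {name}")
    if part.findtext("ms:mediaType", namespaces=NS) != MS + media:
        problems.append(f"mediaType should be {MS + media}")
    for child in children:
        if part.find(f"ms:{child}", NS) is None:
            problems.append(f"{child} is mandatory")
    return problems


# A text-numerical part whose corpusMediaType was mistakenly set to CorpusImagePart.
sample = """<ms:CorpusTextNumericalPart xmlns:ms="http://w3id.org/meta-share/meta-share/">
  <ms:corpusMediaType>CorpusImagePart</ms:corpusMediaType>
  <ms:mediaType>http://w3id.org/meta-share/meta-share/textNumerical</ms:mediaType>
  <ms:typeOfTextNumericalContent>temperature measures</ms:typeOfTextNumericalContent>
</ms:CorpusTextNumericalPart>"""
print(check_part(sample))  # ['corpusMediaType should be CorpusTextNumericalPart']
```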

TextGenre

Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.Corpus.CorpusMediaPart.CorpusTextPart.TextGenre

Data type component

Optionality Recommended

Explanation & Instructions

A category of text characterized by a particular style, form, or content according to a specific classification scheme

You can simply add a free-text value in the CategoryLabel element; if the value comes from an established controlled vocabulary, you can also use the TextGenreIdentifier element with its TextGenreClassificationScheme attribute to provide further details.

Example

<ms:TextGenre>
        <ms:CategoryLabel>movie subtitles</ms:CategoryLabel>
</ms:TextGenre>

<ms:TextGenre>
        <ms:CategoryLabel>news articles</ms:CategoryLabel>
</ms:TextGenre>
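TextGenre and the other genre components (AudioGenre, SpeechGenre, VideoGenre, ImageGenre) share one pattern: a free-text CategoryLabel, optionally accompanied by an identifier taken from a controlled vocabulary. The sketch below builds such a component with ElementTree. The element name <Kind>GenreIdentifier, the unprefixed attribute <Kind>GenreClassificationScheme, their position after CategoryLabel, and the ms namespace URI are assumptions modelled on the description above; check the full schema before relying on them. make_genre and the scheme value in the test are hypothetical.

```python
import xml.etree.ElementTree as ET

MS_NS = "http://w3id.org/meta-share/meta-share/"  # assumed ms namespace
ET.register_namespace("ms", MS_NS)


def ms(tag):
    """Qualify a local name with the (assumed) ms namespace."""
    return f"{{{MS_NS}}}{tag}"


def make_genre(kind, label, identifier=None, scheme=None):
    """Build e.g. <ms:TextGenre> with a CategoryLabel and an optional identifier.

    kind is "Text", "Audio", "Speech", "Video" or "Image".
    The identifier element and attribute names are assumptions.
    """
    genre = ET.Element(ms(f"{kind}Genre"))
    ET.SubElement(genre, ms("CategoryLabel")).text = label
    if identifier is not None:
        ident = ET.SubElement(genre, ms(f"{kind}GenreIdentifier"))
        ident.text = identifier
        if scheme is not None:
            ident.set(f"{kind}GenreClassificationScheme", scheme)
    return genre


print(ET.tostring(make_genre("Text", "news articles"), encoding="unicode"))
```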

AudioGenre

Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.Corpus.CorpusMediaPart.CorpusAudioPart.AudioGenre

Data type component

Optionality Recommended if applicable

Explanation & Instructions

A category of audio characterized by a particular style, form, or content according to a specific classification scheme

You can simply add a free-text value in the CategoryLabel element; if the value comes from an established controlled vocabulary, you can also use the AudioGenreIdentifier element with its AudioGenreClassificationScheme attribute to provide further details.

Example

<ms:AudioGenre>
        <ms:CategoryLabel>conference noises</ms:CategoryLabel>
</ms:AudioGenre>

SpeechGenre

Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.Corpus.CorpusMediaPart.CorpusAudioPart.SpeechGenre

Data type component

Optionality Recommended if applicable

Explanation & Instructions

A category for the conventionalized discourse of the speech part of a language resource, based on extra-linguistic and internal linguistic criteria

You can simply add a free-text value in the CategoryLabel element; if the value comes from an established controlled vocabulary, you can also use the SpeechGenreIdentifier element with its SpeechGenreClassificationScheme attribute to provide further details.

Example

<ms:SpeechGenre>
        <ms:CategoryLabel>broadcast news</ms:CategoryLabel>
</ms:SpeechGenre>

<ms:SpeechGenre>
        <ms:CategoryLabel>monologue</ms:CategoryLabel>
</ms:SpeechGenre>

VideoGenre

Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.Corpus.CorpusMediaPart.CorpusVideoPart.VideoGenre

Data type string (+ id + scheme)

Optionality Recommended if applicable

Explanation & Instructions

A classification of video parts based on extra-linguistic and internal linguistic criteria and reflected in the video style, form or content

You can simply add a free-text value in the CategoryLabel element; if the value comes from an established controlled vocabulary, you can also use the VideoGenreIdentifier element with its VideoGenreClassificationScheme attribute to provide further details.

Example

<ms:VideoGenre>
        <ms:CategoryLabel>documentaries</ms:CategoryLabel>
</ms:VideoGenre>

<ms:VideoGenre>
        <ms:CategoryLabel>video lectures</ms:CategoryLabel>
</ms:VideoGenre>

ImageGenre

Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.Corpus.CorpusMediaPart.CorpusImagePart.ImageGenre

Data type component

Optionality Recommended

Explanation & Instructions

A category of images characterized by a particular style, form, or content according to a specific classification scheme

You can simply add a free-text value in the CategoryLabel element; if the value comes from an established controlled vocabulary, you can also use the ImageGenreIdentifier element with its ImageGenreClassificationScheme attribute to provide further details.

Example

<ms:ImageGenre>
        <ms:CategoryLabel>human faces</ms:CategoryLabel>
</ms:ImageGenre>

<ms:ImageGenre>
        <ms:CategoryLabel>landscape</ms:CategoryLabel>
</ms:ImageGenre>

annotation

Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.Corpus.annotation

Data type component

Optionality Mandatory if applicable

Explanation & Instructions

Links a corpus to its annotated part(s)

You must use it for annotated corpora and annotations. You can repeat it for corpora that have separate files for each annotation type, or if you want to give information such as the different annotation tools used for each annotation level.

Enter at least the annotation type(s); if you want, you can give a more detailed description of the annotated parts - see the annotation component of the full schema.

Example

<ms:annotation>
        <ms:annotationType><ms:annotationTypeRecommended>http://w3id.org/meta-share/omtd-share/Lemma</ms:annotationTypeRecommended></ms:annotationType>
        <ms:annotationStandoff>false</ms:annotationStandoff>
        <ms:annotationMode>http://w3id.org/meta-share/meta-share/mixed</ms:annotationMode>
        <ms:isAnnotatedBy>
                <ms:resourceName xml:lang="en">Lemmatizer</ms:resourceName>
        </ms:isAnnotatedBy>
</ms:annotation>

<ms:annotation>
        <ms:annotationType><ms:annotationTypeRecommended>http://w3id.org/meta-share/omtd-share/PartOfSpeech</ms:annotationTypeRecommended></ms:annotationType>
        <ms:annotationStandoff>false</ms:annotationStandoff>
        <ms:tagset>
                <ms:resourceName xml:lang="en">Universal Dependencies</ms:resourceName>
        </ms:tagset>
        <ms:isAnnotatedBy>
                <ms:resourceName xml:lang="en">PoS tagger</ms:resourceName>
        </ms:isAnnotatedBy>
</ms:annotation>

<ms:annotation>
        <ms:annotationType><ms:annotationTypeRecommended>http://w3id.org/meta-share/omtd-share/SyntacticAnnotationType</ms:annotationTypeRecommended></ms:annotationType>
</ms:annotation>
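Since at least the annotation type must be entered for each annotation, a quick way to review a record is to collect the declared types and flag annotations without one. This sketch assumes the ms namespace URI below and wraps the annotations in a simplified ms:Corpus element (a real record contains many more elements); annotation_types is a hypothetical name. It reads any child of annotationType rather than only annotationTypeRecommended, so it does not depend on the exact name of other alternatives.

```python
import xml.etree.ElementTree as ET

MS_NS = "http://w3id.org/meta-share/meta-share/"  # assumed ms namespace
NS = {"ms": MS_NS}


def annotation_types(corpus_xml):
    """Collect the annotation types declared in every ms:annotation element.

    Raises ValueError if an annotation has no annotationType value.
    """
    root = ET.fromstring(corpus_xml)
    types = []
    for ann in root.iter(f"{{{MS_NS}}}annotation"):
        # annotationType wraps the actual value, e.g. annotationTypeRecommended.
        found = [el.text.strip() for el in ann.findall("ms:annotationType/*", NS)]
        if not found:
            raise ValueError("each annotation needs at least one annotationType")
        types.extend(found)
    return types


corpus = """<ms:Corpus xmlns:ms="http://w3id.org/meta-share/meta-share/">
  <ms:annotation>
    <ms:annotationType><ms:annotationTypeRecommended>http://w3id.org/meta-share/omtd-share/Lemma</ms:annotationTypeRecommended></ms:annotationType>
  </ms:annotation>
  <ms:annotation>
    <ms:annotationType><ms:annotationTypeRecommended>http://w3id.org/meta-share/omtd-share/PartOfSpeech</ms:annotationTypeRecommended></ms:annotationType>
  </ms:annotation>
</ms:Corpus>"""
print(annotation_types(corpus))
```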

DatasetDistribution

Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.Corpus.DatasetDistribution

Data type component

Optionality Mandatory

Explanation & Instructions

Any form with which a dataset is distributed, such as a downloadable form in a specific format (e.g., spreadsheet, plain text, etc.) or an API with which it can be accessed

You can repeat the element for multiple distributions.

The mandatory and recommended elements are:

  • DatasetDistributionForm (Mandatory): The form (medium/channel) used for distributing a language resource consisting of data (e.g., a corpus, a lexicon, etc.). The typical values are ‘downloadable’, ‘accessibleThroughInterface’, ‘accessibleThroughQuery’ (see more at DatasetDistributionForm).

  • downloadLocation (Mandatory if applicable): A URL where the language resource (mainly data but also downloadable software programmes or forms) can be downloaded from. Use this element if the value of DatasetDistributionForm is ‘downloadable’ and only for direct download links (i.e., from which the dataset is downloaded without the need of further actions such as clicks on a page).

  • accessLocation (Mandatory if applicable): A URL where the resource can be accessed from; it can be used for landing pages or for cases where the resource is accessible via an interface, i.e. cases where the resource itself is not provided with a direct link for downloading. Use if the value of DatasetDistributionForm is ‘accessibleThroughInterface’ or ‘accessibleThroughQuery’ but also for links used for downloading corpora which are mentioned on a landing page or require some kind of action on the part of the user.

  • samplesLocation (Recommended): Links a resource to one or more URLs with samples of a data resource or of the input or output of a tool/service.

  • licenceTerms (Mandatory): See licenceTerms

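  • cost (Mandatory if applicable): Introduces the cost for accessing a resource, expressed as a combination of amount and currency. Use it only for resources available at a cost, not for free resources.

Depending on the parts of the corpus, you must also use one or more of the following:

  • distributionTextFeature (Mandatory if applicable): for text parts. See distributionTextFeature.

  • distributionAudioFeature (Mandatory if applicable): for audio parts. See distributionAudioFeature.

  • distributionVideoFeature (Mandatory if applicable): for video parts. See distributionVideoFeature.

  • distributionImageFeature (Mandatory if applicable): for image parts. See distributionImageFeature.

  • distributionTextNumericalFeature (Mandatory if applicable): for text-numerical parts. See distributionTextNumericalFeature.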

Example

<ms:DatasetDistribution>
        <ms:DatasetDistributionForm>http://w3id.org/meta-share/meta-share/downloadable</ms:DatasetDistributionForm>
        <ms:accessLocation>https://www.someAccessURL.com</ms:accessLocation>
        <ms:samplesLocation>https://www.URLwithsamples.com</ms:samplesLocation>
        <ms:distributionTextFeature>
                <ms:size>
                        <ms:amount>17601</ms:amount>
                        <ms:sizeUnit><ms:sizeUnitRecommended>http://w3id.org/meta-share/meta-share/unit</ms:sizeUnitRecommended></ms:sizeUnit>
                </ms:size>
                <ms:dataFormat><ms:dataFormatRecommended>http://w3id.org/meta-share/omtd-share/Xml</ms:dataFormatRecommended></ms:dataFormat>
                <ms:characterEncoding>http://w3id.org/meta-share/meta-share/UTF-8</ms:characterEncoding>
        </ms:distributionTextFeature>
        <ms:licenceTerms>
                <ms:licenceTermsName xml:lang="en">openUnder-PSI</ms:licenceTermsName>
                <ms:licenceTermsURL>https://elrc-share.eu/terms/openUnderPSI.html</ms:licenceTermsURL>
        </ms:licenceTerms>
</ms:DatasetDistribution>

<ms:DatasetDistribution>
        <ms:DatasetDistributionForm>http://w3id.org/meta-share/meta-share/accessibleThroughInterface</ms:DatasetDistributionForm>
        <ms:accessLocation>https://www.someAccessURL.com</ms:accessLocation>
        <ms:distributionTextFeature>
                <ms:size>
                        <ms:amount>100</ms:amount>
                        <ms:sizeUnit><ms:sizeUnitRecommended>http://w3id.org/meta-share/meta-share/text1</ms:sizeUnitRecommended></ms:sizeUnit>
                </ms:size>
                <ms:dataFormat><ms:dataFormatRecommended>http://w3id.org/meta-share/omtd-share/Pdf</ms:dataFormatRecommended></ms:dataFormat>
                <ms:characterEncoding>http://w3id.org/meta-share/meta-share/UTF-8</ms:characterEncoding>
        </ms:distributionTextFeature>
        <ms:licenceTerms>
                <ms:licenceTermsName xml:lang="en">some commercial licence</ms:licenceTermsName>
                <ms:licenceTermsURL>https://elrc-share.eu/terms/someCommercialLicence.html</ms:licenceTermsURL>
        </ms:licenceTerms>
        <ms:cost>
                <ms:amount>10000</ms:amount>
                <ms:currency>http://w3id.org/meta-share/meta-share/euro</ms:currency>
        </ms:cost>
</ms:DatasetDistribution>
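The conditional rules for locations, licence and distribution features can be summarised as a presence check over one DatasetDistribution. This is a sketch under stated assumptions: the ms: prefix is bound to the URI below, accessLocation is accepted for downloadable distributions (the text allows it for downloads behind a landing page), and only the presence of elements is tested, not whether the URLs resolve. check_distribution and FEATURES are hypothetical names.

```python
import xml.etree.ElementTree as ET

MS = "http://w3id.org/meta-share/meta-share/"  # assumed namespace and CV base
NS = {"ms": MS}
FEATURES = ["distributionTextFeature", "distributionAudioFeature",
            "distributionVideoFeature", "distributionImageFeature",
            "distributionTextNumericalFeature"]


def check_distribution(xml_text):
    """Return a list of problems found in one DatasetDistribution."""
    dist = ET.fromstring(xml_text)

    def has(tag):
        return dist.find(f"ms:{tag}", NS) is not None

    problems = []
    form = dist.findtext("ms:DatasetDistributionForm", namespaces=NS)
    if form is None:
        problems.append("DatasetDistributionForm is mandatory")
    elif form == MS + "downloadable":
        # Direct links go in downloadLocation; landing pages may use accessLocation.
        if not (has("downloadLocation") or has("accessLocation")):
            problems.append("downloadable distributions need downloadLocation or accessLocation")
    elif form in (MS + "accessibleThroughInterface", MS + "accessibleThroughQuery"):
        if not has("accessLocation"):
            problems.append("accessLocation is required for this form")
    if not has("licenceTerms"):
        problems.append("licenceTerms is mandatory")
    if not any(has(feature) for feature in FEATURES):
        problems.append("at least one distribution feature is required")
    return problems


# The feature contents are omitted for brevity.
dist = """<ms:DatasetDistribution xmlns:ms="http://w3id.org/meta-share/meta-share/">
  <ms:DatasetDistributionForm>http://w3id.org/meta-share/meta-share/accessibleThroughInterface</ms:DatasetDistributionForm>
  <ms:distributionTextFeature/>
  <ms:licenceTerms><ms:licenceTermsName xml:lang="en">openUnder-PSI</ms:licenceTermsName></ms:licenceTerms>
</ms:DatasetDistribution>"""
print(check_distribution(dist))  # ['accessLocation is required for this form']
```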

distributionTextFeature

Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.Corpus.DatasetDistribution.distributionTextFeature

Data type component

Optionality Mandatory if applicable

Explanation & Instructions

Links to a feature that can be used for describing distinct distributable forms of text resources/parts

The following are mandatory or recommended:

  • size (Mandatory): The size of the text part, expressed as a combination of amount and sizeUnit (with a value from a recommended CV for sizeUnitRecommended) or a free text value (sizeUnitOther).

  • dataFormat (Mandatory): Indicates the format(s) of a data resource; it takes a value from a recommended CV (dataFormatRecommended) or a free value (dataFormatOther); the dataFormat includes the IANA mimetype and pointers to additional documentation for specialized formats (e.g., GATE XML, CONLL formats, etc.).

  • characterEncoding (Recommended): Specifies the character encoding used for a language resource data distribution.

Example

<ms:distributionTextFeature>
        <ms:size>
                <ms:amount>9139</ms:amount>
                <ms:sizeUnit><ms:sizeUnitRecommended>http://w3id.org/meta-share/meta-share/sentence</ms:sizeUnitRecommended></ms:sizeUnit>
        </ms:size>
        <ms:size>
                <ms:amount>40</ms:amount>
                <ms:sizeUnit><ms:sizeUnitRecommended>http://w3id.org/meta-share/meta-share/file</ms:sizeUnitRecommended></ms:sizeUnit>
        </ms:size>
        <ms:dataFormat><ms:dataFormatRecommended>http://w3id.org/meta-share/omtd-share/Xml</ms:dataFormatRecommended></ms:dataFormat>
        <ms:characterEncoding>http://w3id.org/meta-share/meta-share/UTF-8</ms:characterEncoding>
</ms:distributionTextFeature>
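Because size is a combination of amount and sizeUnit, and a feature may carry several sizes (for example sentences and files), it can help to read them back as (amount, unit) pairs. The sketch assumes the ms namespace URI below; read_sizes is a hypothetical name. It keeps only the last path segment of a sizeUnitRecommended IRI and uses a sizeUnitOther value as-is.

```python
import xml.etree.ElementTree as ET

MS_NS = "http://w3id.org/meta-share/meta-share/"  # assumed ms namespace
NS = {"ms": MS_NS}


def read_sizes(feature_xml):
    """Return (amount, unit) pairs for every ms:size child of a feature."""
    feature = ET.fromstring(feature_xml)
    sizes = []
    for size in feature.findall("ms:size", NS):
        amount = float(size.findtext("ms:amount", namespaces=NS))
        # sizeUnit wraps either sizeUnitRecommended (a CV IRI) or sizeUnitOther (free text).
        unit_el = size.find("ms:sizeUnit/*", NS)
        unit = unit_el.text.strip()
        if unit_el.tag == f"{{{MS_NS}}}sizeUnitRecommended":
            unit = unit.rsplit("/", 1)[-1]  # keep only the CV term, e.g. "sentence"
        sizes.append((amount, unit))
    return sizes


feature = """<ms:distributionTextFeature xmlns:ms="http://w3id.org/meta-share/meta-share/">
  <ms:size>
    <ms:amount>9139</ms:amount>
    <ms:sizeUnit><ms:sizeUnitRecommended>http://w3id.org/meta-share/meta-share/sentence</ms:sizeUnitRecommended></ms:sizeUnit>
  </ms:size>
  <ms:size>
    <ms:amount>40</ms:amount>
    <ms:sizeUnit><ms:sizeUnitRecommended>http://w3id.org/meta-share/meta-share/file</ms:sizeUnitRecommended></ms:sizeUnit>
  </ms:size>
</ms:distributionTextFeature>"""
print(read_sizes(feature))  # [(9139.0, 'sentence'), (40.0, 'file')]
```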

distributionAudioFeature

Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.Corpus.DatasetDistribution.distributionAudioFeature

Data type component

Optionality Mandatory if applicable

Explanation & Instructions

Links to a feature that can be used for describing distinct distributable forms of audio resources/parts

The following are mandatory or recommended:

  • size (Mandatory): The size of the audio part, expressed as a combination of amount and sizeUnit (with a value from a recommended CV for sizeUnitRecommended) or a free text value (sizeUnitOther).

  • dataFormat (Mandatory): Indicates the format(s) of a data resource; it takes a value from a recommended CV (dataFormatRecommended) or a free value (dataFormatOther); the dataFormat includes the IANA mimetype and pointers to additional documentation for specialized formats (e.g., GATE XML, CONLL formats, etc.).

  • durationOfAudio (Recommended): Specifies the duration of the audio recording including silences, music, pauses, etc., expressed as a combination of amount and durationUnit (with a value from the CV for durationUnit).

  • durationOfEffectiveSpeech (Recommended): Specifies the duration of effective speech of the audio (part of a) resource, expressed as a combination of amount and durationUnit (with a value from the CV for durationUnit).

  • audioFormat (Recommended): Indicates the format(s) of the audio (part of a) data resource, expressed as a combination of dataFormat (with a value from the CV for dataFormat) and compressed (true or false).

Example

<ms:distributionAudioFeature>
        <ms:size>
                <ms:amount>10</ms:amount>
                <ms:sizeUnit><ms:sizeUnitRecommended>http://w3id.org/meta-share/meta-share/file</ms:sizeUnitRecommended></ms:sizeUnit>
        </ms:size>
        <ms:durationOfAudio>
                <ms:amount>3</ms:amount>
                <ms:durationUnit>http://w3id.org/meta-share/meta-share/hour</ms:durationUnit>
        </ms:durationOfAudio>
        <ms:dataFormat><ms:dataFormatRecommended>http://w3id.org/meta-share/omtd-share/wav</ms:dataFormatRecommended></ms:dataFormat>
        <ms:audioFormat>
                <ms:dataFormat><ms:dataFormatRecommended>http://w3id.org/meta-share/omtd-share/wav</ms:dataFormatRecommended></ms:dataFormat>
                <ms:compressed>true</ms:compressed>
        </ms:audioFormat>
</ms:distributionAudioFeature>
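Since durationOfAudio includes silences, music and pauses, the effective speech of the same distribution should never last longer than the whole recording. The sketch converts both durations to seconds and compares them. The durationUnit terms in SECONDS_PER_UNIT (hour, minute, second) are an assumed subset of the CV, the ms namespace URI is assumed, and check_durations is a hypothetical name.

```python
import xml.etree.ElementTree as ET

NS = {"ms": "http://w3id.org/meta-share/meta-share/"}  # assumed ms namespace
# Assumption: durationUnit CV terms and their length in seconds (a subset only).
SECONDS_PER_UNIT = {"hour": 3600, "minute": 60, "second": 1}


def duration_seconds(duration_el):
    """Convert an element holding ms:amount and ms:durationUnit into seconds."""
    amount = float(duration_el.findtext("ms:amount", namespaces=NS))
    unit_iri = duration_el.findtext("ms:durationUnit", namespaces=NS).strip()
    return amount * SECONDS_PER_UNIT[unit_iri.rsplit("/", 1)[-1]]


def check_durations(feature_xml):
    """Flag effective speech that lasts longer than the whole recording."""
    feature = ET.fromstring(feature_xml)
    audio = feature.find("ms:durationOfAudio", NS)
    speech = feature.find("ms:durationOfEffectiveSpeech", NS)
    if audio is None or speech is None:
        return []  # both elements are only recommended
    if duration_seconds(speech) > duration_seconds(audio):
        return ["durationOfEffectiveSpeech exceeds durationOfAudio"]
    return []


feature = """<ms:distributionAudioFeature xmlns:ms="http://w3id.org/meta-share/meta-share/">
  <ms:durationOfAudio><ms:amount>3</ms:amount><ms:durationUnit>http://w3id.org/meta-share/meta-share/hour</ms:durationUnit></ms:durationOfAudio>
  <ms:durationOfEffectiveSpeech><ms:amount>150</ms:amount><ms:durationUnit>http://w3id.org/meta-share/meta-share/minute</ms:durationUnit></ms:durationOfEffectiveSpeech>
</ms:distributionAudioFeature>"""
print(check_durations(feature))  # []
```

Here 150 minutes (9000 s) of speech fit within 3 hours (10800 s) of audio; 200 minutes would be flagged.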

distributionVideoFeature

Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.Corpus.DatasetDistribution.distributionVideoFeature

Data type component

Optionality Mandatory if applicable

Explanation & Instructions

Links to a feature that can be used for describing distinct distributable forms of video resources/parts

The following are mandatory or recommended:

  • size (Mandatory): The size of the video part, expressed as a combination of amount and sizeUnit (with a value from a recommended CV for sizeUnitRecommended) or a free text value (sizeUnitOther).

  • durationOfVideo (Recommended): Specifies the duration of the video recording, expressed as a combination of amount and durationUnit (with a value from the CV for durationUnit).

  • dataFormat (Mandatory): Indicates the format(s) of a data resource; it takes a value from a recommended CV (dataFormatRecommended) or a free value (dataFormatOther); the dataFormat includes the IANA mimetype and pointers to additional documentation for specialized formats (e.g., GATE XML, CONLL formats, etc.).

  • videoFormat (Recommended): Indicates the format(s) of the video (part of a) data resource, expressed as a combination of dataFormat (with a value from the CV for dataFormat) and compressed (true or false).

Example

<ms:distributionVideoFeature>
        <ms:size>
                <ms:amount>9139</ms:amount>
                <ms:sizeUnit><ms:sizeUnitRecommended>http://w3id.org/meta-share/meta-share/screen</ms:sizeUnitRecommended></ms:sizeUnit>
        </ms:size>
        <ms:size>
                <ms:amount>40</ms:amount>
                <ms:sizeUnit><ms:sizeUnitRecommended>http://w3id.org/meta-share/meta-share/file</ms:sizeUnitRecommended></ms:sizeUnit>
        </ms:size>
        <ms:durationOfVideo>
                <ms:amount>40</ms:amount>
                <ms:durationUnit>http://w3id.org/meta-share/meta-share/hour</ms:durationUnit>
        </ms:durationOfVideo>
        <ms:dataFormat><ms:dataFormatRecommended>http://w3id.org/meta-share/omtd-share/wav</ms:dataFormatRecommended></ms:dataFormat>
        <ms:videoFormat>
                <ms:dataFormat><ms:dataFormatRecommended>http://w3id.org/meta-share/omtd-share/wav</ms:dataFormatRecommended></ms:dataFormat>
                <ms:compressed>true</ms:compressed>
        </ms:videoFormat>
</ms:distributionVideoFeature>

distributionImageFeature

Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.Corpus.DatasetDistribution.distributionImageFeature

Data type component

Optionality Mandatory if applicable

Explanation & Instructions

Links to a feature that can be used for describing distinct distributable forms of image resources/parts

The following are mandatory or recommended:

  • size (Mandatory): The size of the image part, expressed as a combination of amount and sizeUnit (with a value from a recommended CV for sizeUnitRecommended) or a free text value (sizeUnitOther).

  • dataFormat (Mandatory): Indicates the format(s) of a data resource; it takes a value from a recommended CV (dataFormatRecommended) or a free value (dataFormatOther); the dataFormat includes the IANA mimetype and pointers to additional documentation for specialized formats (e.g., GATE XML, CONLL formats, etc.).

  • imageFormat (Mandatory): Indicates the format(s) of the image (part of a) data resource, expressed as a combination of dataFormat (with a value from the CV for dataFormat) and compressed (true or false).

Example

<ms:distributionImageFeature>
        <ms:size>
                <ms:amount>100</ms:amount>
                <ms:sizeUnit><ms:sizeUnitRecommended>http://w3id.org/meta-share/meta-share/file</ms:sizeUnitRecommended></ms:sizeUnit>
        </ms:size>
        <ms:dataFormat><ms:dataFormatRecommended>http://w3id.org/meta-share/omtd-share/Pdf</ms:dataFormatRecommended></ms:dataFormat>
        <ms:imageFormat>
                <ms:dataFormat><ms:dataFormatRecommended>http://w3id.org/meta-share/omtd-share/Pdf</ms:dataFormatRecommended></ms:dataFormat>
                <ms:compressed>true</ms:compressed>
        </ms:imageFormat>
</ms:distributionImageFeature>
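The audioFormat, videoFormat and imageFormat elements all combine a dataFormat value with a compressed flag. The sketch below reads that pair for a given kind of feature. It assumes the ms namespace URI below and interprets compressed using the xsd:boolean lexical forms ("true"/"1" and "false"/"0"); read_media_format is a hypothetical name, and a missing compressed element is returned as None rather than guessed.

```python
import xml.etree.ElementTree as ET

NS = {"ms": "http://w3id.org/meta-share/meta-share/"}  # assumed ms namespace


def read_media_format(feature_xml, kind):
    """Return (dataFormat value, compressed) for kind "audio", "video" or "image".

    Returns None if the <kind>Format element is absent; compressed is None
    if the compressed element is missing.
    """
    feature = ET.fromstring(feature_xml)
    fmt = feature.find(f"ms:{kind}Format", NS)
    if fmt is None:
        return None
    # dataFormat wraps either dataFormatRecommended (a CV IRI) or dataFormatOther.
    data_format = fmt.find("ms:dataFormat/*", NS).text.strip()
    raw = fmt.findtext("ms:compressed", namespaces=NS)
    compressed = None if raw is None else raw.strip() in ("true", "1")
    return data_format, compressed


feature = """<ms:distributionImageFeature xmlns:ms="http://w3id.org/meta-share/meta-share/">
  <ms:imageFormat>
    <ms:dataFormat><ms:dataFormatRecommended>http://w3id.org/meta-share/omtd-share/Pdf</ms:dataFormatRecommended></ms:dataFormat>
    <ms:compressed>true</ms:compressed>
  </ms:imageFormat>
</ms:distributionImageFeature>"""
print(read_media_format(feature, "image"))
```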

distributionTextNumericalFeature

Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.Corpus.DatasetDistribution.distributionTextNumericalFeature

Data type component

Optionality Mandatory if applicable

Explanation & Instructions

Links to a feature that can be used for describing distinct distributable forms of text-numerical resources/parts

The following are mandatory or recommended:

  • size (Mandatory): The size of the text-numerical part, expressed as a combination of amount and sizeUnit (with a value from a recommended CV for sizeUnitRecommended) or a free text value (sizeUnitOther).

  • dataFormat (Mandatory): Indicates the format(s) of a data resource; it takes a value from a recommended CV (dataFormatRecommended) or a free value (dataFormatOther); the dataFormat includes the IANA mimetype and pointers to additional documentation for specialized formats (e.g., GATE XML, CONLL formats, etc.).

Example

<ms:distributionTextNumericalFeature>
        <ms:size>
                <ms:amount>30</ms:amount>
                <ms:sizeUnit><ms:sizeUnitRecommended>http://w3id.org/meta-share/meta-share/file</ms:sizeUnitRecommended></ms:sizeUnit>
        </ms:size>
        <ms:dataFormat><ms:dataFormatRecommended>http://w3id.org/meta-share/omtd-share/Pdf</ms:dataFormatRecommended></ms:dataFormat>
</ms:distributionTextNumericalFeature>

Minimal elements for models

This page describes the minimal metadata elements specific to models.

1. Overview

Although models are a subclass of language descriptions, we describe them separately here, as is also done in the editor.

The first table below lists all the elements (mandatory and recommended) for a model; the second lists the elements of the Distribution component as implemented for models.

Table 1 - Elements for models

| Element name                | Optionality | Section       | Tab        |
|-----------------------------|-------------|---------------|------------|
| ldSubclass                  | M           |               |            |
| languageDescriptionSubclass | M           |               |            |
| Model                       | MA          |               |            |
| modelFunction               | M           | Model/Grammar | Technical  |
| modelType                   | R           | Model/Grammar | Technical  |
| developmentFramework        | R           | Model/Grammar | Technical  |
| hasOriginalSource           | R           | Model/Grammar | Technical  |
| trainingCorpusDetails       | R           | Model/Grammar | Technical  |
| trainingProcessDetails      | R           | Model/Grammar | Technical  |
| biasDetails                 | R           | Model/Grammar | Technical  |
| requiresLR                  | R           | Model/Grammar | Technical  |
| NgramModel                  | MA          | Model/Grammar | Technical  |
| baseItem                    | M           | Model/Grammar | Technical  |
| order                       | M           | Model/Grammar | Technical  |
| unspecifiedPart             | MA          | Part          | Media part |
| language                    | M           | Part          | Media part |
| lingualityType              | M           | Part          | Media part |
| multilingualityType         | MA          | Part          | Media part |
| multilingualityTypeDetails  | R           | Part          | Media part |
| metalanguage                | R           | Part          | Media part |

Table 2 - Distribution

| Element name                   | Optionality | Section      | Tab       |
|--------------------------------|-------------|--------------|-----------|
| DatasetDistribution            | M           | Distribution | Technical |
| DatasetDistributionForm        | M           | Distribution | Technical |
| downloadLocation               | MA          | Distribution | Technical |
| accessLocation                 | MA          | Distribution | Technical |
| distributionLocation           | MA          | Distribution | Technical |
| samplesLocation                | R           | Distribution | Technical |
| distributionUnspecifiedFeature | M           | Distribution | Technical |
| licenceTerms                   | M           | Distribution | Technical |
| cost                           | R           | Distribution | Technical |
| membershipInstitution          | R           | Distribution | Technical |

2. Element presentation

In this section, each of the aforementioned elements is presented separately, in the order in which it appears in the tables of the previous section.


LanguageDescription

Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.LanguageDescription

Data type component

Optionality Mandatory

Explanation & Instructions

Wraps together elements for language descriptions

Example

<ms:LRSubclass>
        <ms:LanguageDescription>
                <ms:lrType>LanguageDescription</ms:lrType>
                ...
        </ms:LanguageDescription>
</ms:LRSubclass>

ldSubclass

Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.LanguageDescription.ldSubclass

Data type CV

Optionality Mandatory

Explanation & Instructions

The type of the language description

For models, always select http://w3id.org/meta-share/meta-share/model.

Example

<ms:ldSubclass>http://w3id.org/meta-share/meta-share/model</ms:ldSubclass>

LanguageDescriptionSubclass

Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.LanguageDescription.LanguageDescriptionSubclass

Data type component

Optionality Mandatory

Explanation & Instructions

The type of the language description (used for documentation purposes)

It wraps the set of elements that must be used for the Language Description subclasses. For models, this is the Model component.

Example

<ms:LanguageDescriptionSubclass><ms:Model>
        ...
</ms:Model></ms:LanguageDescriptionSubclass>

Model

Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.LanguageDescription.LanguageDescriptionSubclass.Model

Data type Component

Optionality Mandatory if applicable

Explanation & Instructions

Mandatory for all models, defined as “The model artifact that is created through a training process involving an algorithm (that is, the learning algorithm) and the training data to learn from”

The following elements are mandatory or recommended for ML models:

  • ldSubclassType (Mandatory): Used to mark the subclass of a language description. For ML models, the value is fixed to ‘Model’.

  • modelFunction (Mandatory): Specifies the operation/function/task that a model performs; use either a value from the recommended CV (modelFunctionRecommended) or a free text value (modelFunctionFree).

  • modelType (Recommended): A classification of models based on their algorithm; use either a value from the recommended CV (modelTypeRecommended) or a free text value (modelTypeFree).

  • modelVariant (Recommended): Introduces a label that can be used to identify the variant of a ML model.

  • developmentFramework (Recommended): A framework or toolkit (e.g., a machine learning framework or an NLP toolkit) used in the development of the resource.

  • trainingCorpusDetails (Recommended): Provides a detailed description of the training corpus (e.g., size, number of features, etc.).

  • trainingProcessDetails (Recommended): Provides a detailed description of the training process and method.

  • biasDetails (Recommended): Provides a detailed description of bias considerations for the model.

  • requiresLR (Recommended): Links to a language resource or technology that must be used for the operation of the model, such as the tool deploying it.

  • NGramModel (MA): You must use this for describing n-gram models; see NGramModel for more information.

Example

<ms:Model>
        <ms:ldSubclassType>Model</ms:ldSubclassType>
        <ms:modelFunction><ms:modelFunctionRecommended>http://w3id.org/meta-share/omtd-share/QuestionAnswering</ms:modelFunctionRecommended></ms:modelFunction>
        <ms:modelType><ms:modelTypeRecommended>http://w3id.org/meta-share/meta-share/DeepLearningModel</ms:modelTypeRecommended></ms:modelType>
        <ms:modelVariant>factored</ms:modelVariant>
        <ms:developmentFramework><ms:DevelopmentFrameworkRecommended>tensorflow</ms:DevelopmentFrameworkRecommended></ms:developmentFramework>
        <ms:trainingCorpusDetails xml:lang="en">Trained on a corpus of tweets</ms:trainingCorpusDetails>
</ms:Model>
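The choice between a recommended CV value and a free-text value for modelFunction and modelType can be sketched with Python's standard `xml.etree.ElementTree`. This is an illustration under stated assumptions, not ELG tooling: the namespace URI bound to `ms`, the rule that treats any `http://w3id.org/meta-share/` URI as a CV value, and the helper names `ms`, `add_cv_or_free` and `build_model` are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: ELG records bind the "ms" prefix to this namespace URI.
MS_NS = "http://w3id.org/meta-share/meta-share/"
ET.register_namespace("ms", MS_NS)

# Hypothetical rule for this sketch: URIs under this prefix are treated as
# values from the recommended CV; anything else is treated as free text.
CV_PREFIX = "http://w3id.org/meta-share/"


def ms(tag):
    """Return the Clark-notation name ({namespace}tag) of an ms: element."""
    return f"{{{MS_NS}}}{tag}"


def add_cv_or_free(parent, element, value):
    """Add <element> wrapping either <elementRecommended> or <elementFree>."""
    wrapper = ET.SubElement(parent, ms(element))
    suffix = "Recommended" if value.startswith(CV_PREFIX) else "Free"
    ET.SubElement(wrapper, ms(element + suffix)).text = value
    return wrapper


def build_model(model_function, model_type=None):
    """Build a Model component containing the mandatory modelFunction."""
    model = ET.Element(ms("Model"))
    ET.SubElement(model, ms("ldSubclassType")).text = "Model"
    add_cv_or_free(model, "modelFunction", model_function)  # mandatory
    if model_type is not None:  # recommended, so only added when known
        add_cv_or_free(model, "modelType", model_type)
    return model


model = build_model(
    "http://w3id.org/meta-share/omtd-share/QuestionAnswering",
    "http://w3id.org/meta-share/meta-share/DeepLearningModel",
)
print(ET.tostring(model, encoding="unicode"))
```

Passing a plain string such as "emotion classifier" as the model type would produce a modelTypeFree child instead of modelTypeRecommended.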

NGramModel

Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.LanguageDescription.LanguageDescriptionSubclass.Model.NGramModel

Data type Component

Optionality Mandatory if applicable

Explanation & Instructions

Mandatory for n-gram models; for our purposes, an n-gram model is defined as “A language model consisting of n-grams, i.e. specific sequences of a number of words”

The following elements are mandatory for n-gram models:

  • baseItem (Mandatory): Type of item that is represented in the n-gram resource.

  • order (Mandatory): Specifies the maximum number of items in the sequence.

Example

<ms:NGramModel>
        <ms:ldSubclassType>NGramModel</ms:ldSubclassType>
        <ms:baseItem>http://w3id.org/meta-share/meta-share/word</ms:baseItem>
        <ms:order>5</ms:order>
</ms:NGramModel>
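To illustrate what the `order` value denotes, the plain-Python sketch below (not part of the schema) enumerates the word n-grams that a model of a given order covers. It assumes the common convention that an order-n model covers all n-grams of length 1 to n; the function name `ngrams_up_to` is hypothetical.

```python
def ngrams_up_to(tokens, order):
    """Return every n-gram of length 1..order, shortest first.

    With baseItem = word, the tokens are words; `order` is the maximum
    number of items in one n-gram (5 in the example above).
    """
    grams = []
    for n in range(1, order + 1):
        # A sequence of len(tokens) items has len(tokens) - n + 1 n-grams.
        for start in range(len(tokens) - n + 1):
            grams.append(tuple(tokens[start:start + n]))
    return grams


print(ngrams_up_to("the cat sat".split(), 2))
```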

unspecifiedPart

Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.LanguageDescription.unspecifiedPart

Data type component

Optionality Mandatory

Explanation & Instructions

Groups together all information related to the languages of a model.

  • lingualityType (Mandatory): Indicates whether the resource includes one, two or more languages. It is computed by the system based on the number of languages or the ISO value for collective languages.

  • multilingualityType (Mandatory if applicable): Indicates whether the resource (part) is parallel, comparable or mixed. It is required if lingualityType is bilingual or multilingual; select one of the values parallel (e.g., original text and its translations), comparable (e.g., a corpus of the same domain in multiple languages) or multilingualSingleText (for corpora that consist of segments including text in two or more languages, e.g., the transcription of a European Parliament session with MPs speaking in their native languages).

  • language (Mandatory): Specifies the language that is used in the resource part, expressed according to the BCP47 recommendation. See language.

  • languageVariety (Mandatory if applicable): Relates a language resource that contains segments in a language variety (e.g., dialect, jargon) to that variety. Use it, for instance, for dialect resources.

  • metalanguage (Recommended): Specifies the metalanguage, if any, used in the resource part, expressed according to the BCP47 recommendation. See language.

Example

<ms:unspecifiedPart>
        <ms:lingualityType>http://w3id.org/meta-share/meta-share/monolingual</ms:lingualityType>
        <ms:language>
                <ms:languageTag>es</ms:languageTag>
                <ms:languageId>es</ms:languageId>
        </ms:language>
</ms:unspecifiedPart>
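The dependency between lingualityType and multilingualityType described above can be expressed as a small check. This is a sketch only: `compute_linguality` and `check_part` are hypothetical names, the values are the local names of the CV URIs used in the examples, and ISO codes for collective languages, which the system also takes into account, are ignored. For lexical/conceptual resources multilingualityType is only recommended, so a checker for those would report a warning rather than an error.

```python
# Local names of the multilingualityType CV values mentioned in the text.
MULTILINGUALITY_VALUES = {"parallel", "comparable", "multilingualSingleText"}


def compute_linguality(language_ids):
    """Derive lingualityType from the distinct languages of a part.

    Simplification: collective-language ISO codes are not handled.
    """
    count = len(set(language_ids))
    if count == 0:
        raise ValueError("at least one language is mandatory")
    if count == 1:
        return "monolingual"
    if count == 2:
        return "bilingual"
    return "multilingual"


def check_part(language_ids, multilinguality=None):
    """Return a list of problems with a part's language-related elements."""
    problems = []
    linguality = compute_linguality(language_ids)
    if linguality != "monolingual" and multilinguality is None:
        problems.append(f"multilingualityType is required for {linguality} parts")
    if multilinguality is not None and multilinguality not in MULTILINGUALITY_VALUES:
        problems.append(f"unknown multilingualityType: {multilinguality}")
    return problems


print(check_part(["es"]))
print(check_part(["en", "es"]))
```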

Minimal elements for grammars

This page describes the minimal metadata elements specific to grammars.

1. Overview

Although grammars are a subclass of language descriptions, we describe them here separately, as is also done in the editor.

In addition, as for corpora, we also cater for multimedia resources, which include not only text but also audio, video and image files. To handle these cases, the notion of “media part” is introduced in the model. Thus, a language description consists of at least one text, video or image part. Depending on the media part type, the DatasetDistribution component includes a set of text, video, etc. distribution features.

The first table below lists all the elements (mandatory and recommended) for a grammar. The second table presents the mandatory and recommended elements for each media part for grammars. The third table presents the mandatory and recommended elements for the Distribution component, which includes elements that are specific to each media part.

Table 1 - Elements for grammars

Element name | Optionality | Section | Tab
ldSubclass | M | |
languageDescriptionSubclass | M | |
Grammar | MA | Model/Grammar | technical
encodingLevel | M | Model/Grammar | technical
formalism | R | Model/Grammar | technical
ldTask | R | Model/Grammar | technical
personalDataIncluded | R | Model/Grammar | technical
personalDataDetails | RA | Model/Grammar | technical
sensitiveDataIncluded | R | Model/Grammar | technical
sensitiveDataDetails | RA | Model/Grammar | technical
anonymized | MA | Model/Grammar | technical
anonymizationDetails | RA | Model/Grammar | technical
requiresHardware | R | Model/Grammar | technical

Table 2 - Media parts

Element name | Optionality | Section | Tab
textPart | MA | LD | Part
lingualityType | M | LD | Part
multilingualityType | MA | LD | Part
multilingualityTypeDetails | R | LD | Part
language | M | LD | Part
metalanguage | R | LD | Part
videoPart | MA | LD | Part
lingualityType | M | LD | Part
multilingualityType | MA | LD | Part
multilingualityTypeDetails | RA | LD | Part
language | M | LD | Part
metalanguage | R | LD | Part
typeOfVideoContent | M | LD | Part
imagePart | MA | LD | Part
lingualityType | M | LD | Part
multilingualityType | RA | LD | Part
multilingualityTypeDetails | RA | LD | Part
language | M | LD | Part
metalanguage | R | LD | Part
typeOfImageContent | M | LD | Part

Table 3 - Distribution

Element name | Optionality | Section | Tab
DatasetDistribution | M | Distribution | Technical
DatasetDistributionForm | M | Distribution | Technical
downloadLocation | MA | Distribution | Technical
accessLocation | MA | Distribution | Technical
distributionLocation | MA | Distribution | Technical
samplesLocation | R | Distribution | Technical
distributionUnspecifiedFeature | M | Distribution | Technical
licenceTerms | M | Distribution | Technical
cost | R | Distribution | Technical
membershipInstitution | R | Distribution | Technical

2. Element presentation

In this section, each of the aforementioned elements is presented separately, in the order in which it appears in the tables of the previous section.


LanguageDescription

Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.LanguageDescription

Data type component

Optionality Mandatory

Explanation & Instructions

Wraps together elements for language descriptions

Example

<ms:LRSubclass>
        <ms:LanguageDescription>
                <ms:lrType>LanguageDescription</ms:lrType>
                ...
        </ms:LanguageDescription>
</ms:LRSubclass>

ldSubclass

Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.LanguageDescription.ldSubclass

Data type CV

Optionality Mandatory

Explanation & Instructions

The type of the language description

For grammars, always select http://w3id.org/meta-share/meta-share/grammar.

Example

<ms:ldSubclass>http://w3id.org/meta-share/meta-share/grammar</ms:ldSubclass>

LanguageDescriptionSubclass

Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.LanguageDescription.LanguageDescriptionSubclass

Data type component

Optionality Mandatory

Explanation & Instructions

The type of the language description (used for documentation purposes)

It wraps the set of elements that must be used for the Language Description subclasses. For grammars, this is the Grammar component.

Example

<ms:LanguageDescriptionSubclass><ms:Grammar>
        ...
</ms:Grammar></ms:LanguageDescriptionSubclass>

Grammar

Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.LanguageDescription.LanguageDescriptionSubclass.Grammar

Data type Component

Optionality Mandatory if applicable

Explanation & Instructions

Mandatory for grammars; grammar for our purposes is defined as “A set of rules governing what strings are valid or allowable in a language or text” [https://en.oxforddictionaries.com/definition/grammar]

The following elements are mandatory or recommended for computational grammars:

  • ldSubclassType (Mandatory): Used to mark the subclass of a language description. For grammars, the value is fixed to ‘Grammar’.

  • encodingLevel (Mandatory): Classifies the contents of a lexical/conceptual resource or language description as regards the linguistic level of analysis it caters for.

  • compliesWith (Recommended): Specifies the vocabulary/standard/best practice with which a resource complies.

  • formalism (Recommended): Specifies the formalism (bibliographic reference, URL, name) used for the creation/enrichment of the resource (grammar or tool/service).

  • ldTask (Recommended): Specifies the task performed by the language description.

Example

<ms:Grammar>
        <ms:ldSubclassType>Grammar</ms:ldSubclassType>
        <ms:encodingLevel>http://w3id.org/meta-share/meta-share/morphology</ms:encodingLevel>
        <ms:compliesWith>http://w3id.org/meta-share/meta-share/GrAF</ms:compliesWith>
</ms:Grammar>
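Because the XML fragments in this guide omit the namespace declaration for `ms`, they must be wrapped with one before an XML parser accepts them. The sketch below parses such a wrapped Grammar fragment with `xml.etree.ElementTree` and reports missing mandatory elements; the namespace URI and the function name `missing_grammar_elements` are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

MS_NS = "http://w3id.org/meta-share/meta-share/"  # assumed namespace URI
NS = {"ms": MS_NS}


def missing_grammar_elements(xml_text):
    """Return the mandatory Grammar elements that are absent or wrong."""
    grammar = ET.fromstring(xml_text)
    missing = []
    subclass_type = grammar.find("ms:ldSubclassType", NS)
    if subclass_type is None or subclass_type.text != "Grammar":
        missing.append("ldSubclassType")
    if not grammar.findall("ms:encodingLevel", NS):  # at least one required
        missing.append("encodingLevel")
    return missing


SAMPLE = """<ms:Grammar xmlns:ms="http://w3id.org/meta-share/meta-share/">
  <ms:ldSubclassType>Grammar</ms:ldSubclassType>
  <ms:encodingLevel>http://w3id.org/meta-share/meta-share/morphology</ms:encodingLevel>
  <ms:compliesWith>http://w3id.org/meta-share/meta-share/GrAF</ms:compliesWith>
</ms:Grammar>"""

print(missing_grammar_elements(SAMPLE))
```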

Minimal elements for lexical/conceptual resources

This page describes the minimal metadata elements specific to lexical/conceptual resources.

1. Overview

Lexical/conceptual resources comprise computational lexica, gazetteers, ontologies, term lists, etc. This class also includes multimedia dictionaries, sign language resources, etc., which contain not only text but also audio, video and image files. To handle these cases, the notion of “media part” is introduced in the model. Thus, a lexical/conceptual resource consists of at least one text, audio, video, image or numerical text part. Depending on the media part type, the DatasetDistribution component includes a set of text, audio, video, etc. distribution features.

The first table below lists all the elements (mandatory and recommended) for a lexical/conceptual resource. The second table presents the mandatory and recommended elements for each media part. The third table presents the mandatory and recommended elements for the Distribution component, which includes elements that are specific to each media part.

Table 1 - Lexical/Conceptual resource common elements

Element name | Optionality | Section | Tab
lcrSubclass | R | LCR | technical
encodingLevel | M | LCR | technical
contentType | R | LCR | technical
compliesWith | R | LCR | technical
personalDataIncluded | M | LCR | technical
personalDataDetails | RA | LCR | technical
sensitiveDataIncluded | M | LCR | technical
sensitiveDataDetails | RA | LCR | technical
anonymized | MA | LCR | technical
anonymizationDetails | RA | LCR | technical

Table 2 - Media parts

Element name | Optionality | Section | Tab
textPart | MA | LCR | Part
lingualityType | M | LCR | Part
multilingualityType | MA | LCR | Part
multilingualityTypeDetails | R | LCR | Part
language | M | LCR | Part
metalanguage | R | LCR | Part
audioPart | MA | LCR | Part
lingualityType | M | LCR | Part
multilingualityType | MA | LCR | Part
multilingualityTypeDetails | RA | LCR | Part
language | M | LCR | Part
metalanguage | R | LCR | Part
videoPart | MA | LCR | Part
lingualityType | M | LCR | Part
multilingualityType | MA | LCR | Part
multilingualityTypeDetails | RA | LCR | Part
language | M | LCR | Part
metalanguage | R | LCR | Part
typeOfVideoContent | M | LCR | Part
imagePart | MA | LCR | Part
lingualityType | M | LCR | Part
multilingualityType | RA | LCR | Part
multilingualityTypeDetails | RA | LCR | Part
language | M | LCR | Part
metalanguage | R | LCR | Part
typeOfImageContent | M | LCR | Part

Table 3 - Distribution

Element name | Optionality | Section | Tab
DatasetDistribution | M | Distribution | Technical
DatasetDistributionForm | M | Distribution | Technical
downloadLocation | MA | Distribution | Technical
accessLocation | MA | Distribution | Technical
distributionLocation | MA | Distribution | Technical
samplesLocation | R | Distribution | Technical
distributionTextFeature | MA | Distribution | Technical
distributionAudioFeature | MA | Distribution | Technical
distributionVideoFeature | MA | Distribution | Technical
distributionImageFeature | MA | Distribution | Technical
distributionTextNumericalFeature | MA | Distribution | Technical
licenceTerms | M | Distribution | Technical
cost | R | Distribution | Technical
membershipInstitution | R | Distribution | Technical

2. Element presentation

In this section, each of the aforementioned elements is presented separately, in the order in which it appears in the tables of the previous section.


LexicalConceptualResource

Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.LexicalConceptualResource

Data type component

Optionality Mandatory

Explanation & Instructions

Wraps together elements for lexical/conceptual resources

Example

<ms:LRSubclass>
        <ms:LexicalConceptualResource>
                <ms:lrType>LexicalConceptualResource</ms:lrType>
                ...
        </ms:LexicalConceptualResource>
</ms:LRSubclass>

lcrSubclass

Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.LexicalConceptualResource.lcrSubclass

Data type CV (lcrSubclass)

Optionality Recommended

Explanation & Instructions

Introduces a classification of lexical/conceptual resources into types (used for descriptive reasons)

Example

<ms:lcrSubclass>http://w3id.org/meta-share/meta-share/computationalLexicon</ms:lcrSubclass>

<ms:lcrSubclass>http://w3id.org/meta-share/meta-share/ontology</ms:lcrSubclass>

encodingLevel

Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.LexicalConceptualResource.encodingLevel

Data type CV (encodingLevel)

Optionality Mandatory

Explanation & Instructions

Classifies the contents of a lexical/conceptual resource or language description as regards the linguistic level of analysis it caters for

You can repeat the element for multiple encoding levels.

Example

<ms:encodingLevel>http://w3id.org/meta-share/meta-share/phonology</ms:encodingLevel>

<ms:encodingLevel>http://w3id.org/meta-share/meta-share/semantics</ms:encodingLevel>
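Repeating the element is how several encoding levels (or, likewise, content types) are recorded. A minimal ElementTree sketch, assuming the `ms` namespace URI shown, which in these examples also happens to be the prefix of the CV URIs; `add_repeated` is a hypothetical helper:

```python
import xml.etree.ElementTree as ET

MS_NS = "http://w3id.org/meta-share/meta-share/"  # assumed namespace URI
ET.register_namespace("ms", MS_NS)


def add_repeated(parent, tag, values):
    """Append one <ms:tag> element per value, preserving their order."""
    for value in values:
        ET.SubElement(parent, f"{{{MS_NS}}}{tag}").text = value
    return parent


lcr = ET.Element(f"{{{MS_NS}}}LexicalConceptualResource")
add_repeated(lcr, "encodingLevel", [MS_NS + "phonology", MS_NS + "semantics"])
levels = [e.text for e in lcr.findall(f"{{{MS_NS}}}encodingLevel")]
print(levels)
```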

ContentType

Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.LexicalConceptualResource.ContentType

Data type CV (ContentType)

Optionality Recommended

Explanation & Instructions

A more detailed account of the linguistic information contained in the lexical/conceptual resource

You can repeat the element for multiple content types.

Example

<ms:ContentType>http://w3id.org/meta-share/meta-share/collocation</ms:ContentType>

<ms:ContentType>http://w3id.org/meta-share/meta-share/definition</ms:ContentType>

compliesWith

Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.LexicalConceptualResource.compliesWith

Data type CV (compliesWith)

Optionality Recommended

Explanation & Instructions

Specifies the vocabulary/standard/best practice with which a resource complies

Example

<ms:compliesWith>http://w3id.org/meta-share/meta-share/LMF</ms:compliesWith>

LexicalConceptualResourceTextPart

Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.LexicalConceptualResource.LexicalConceptualResourceMediaPart.LexicalConceptualResourceTextPart

Data type component

Optionality Mandatory if applicable

Explanation & Instructions

A part (or whole set) of a lexical/conceptual resource that consists of textual elements

You can repeat the group of elements for multiple textual parts.

The mandatory or recommended elements for the text part of lexical/conceptual resources are:

  • mediaType (Mandatory): Specifies the media type of a language resource (the physical medium of the contents representation). For text parts, always use the value ‘text’.

  • lingualityType (Mandatory): Indicates whether the resource includes one, two or more languages.

  • multilingualityType (Recommended if applicable): Indicates whether the resource (part) is parallel, comparable or mixed. If lingualityType is bilingual or multilingual, it is recommended for lexical/conceptual resources; select one of the values parallel (e.g., bilingual dictionaries with source and translation equivalents) or comparable (e.g., lexica of the same domain in multiple languages).

  • language (Mandatory): Specifies the language that is used in the resource part, expressed according to the BCP47 recommendation. See language.

  • languageVariety (Mandatory if applicable): Relates a language resource that contains segments in a language variety (e.g., dialect, jargon) to that variety. Use it, for instance, for dialect resources.

  • metalanguage (Recommended if applicable): Specifies the language that is used as support for the resource (e.g., English for a grammar of French described in English or for a French dictionary with English definitions), expressed according to the BCP47 recommendation. See language.

  • modalityType (Recommended if applicable): Specifies the type of the modality represented in the resource. For instance, you can use ‘spoken language’ to describe transcribed speech corpora.

Example

<ms:LexicalConceptualResourceMediaPart>
        <ms:LexicalConceptualResourceTextPart>
                <ms:lcrMediaType>LexicalConceptualResourceTextPart</ms:lcrMediaType>
                <ms:mediaType>http://w3id.org/meta-share/meta-share/text</ms:mediaType>
                <ms:lingualityType>http://w3id.org/meta-share/meta-share/bilingual</ms:lingualityType>
                <ms:multilingualityType>http://w3id.org/meta-share/meta-share/parallel</ms:multilingualityType>
                <ms:language>
                        <ms:languageTag>en-US</ms:languageTag>
                        <ms:languageId>en</ms:languageId>
                        <ms:regionId>US</ms:regionId>
                </ms:language>
                <ms:language>
                        <ms:languageTag>es</ms:languageTag>
                        <ms:languageId>es</ms:languageId>
                </ms:language>
                <ms:metalanguage>
                        <ms:languageTag>es</ms:languageTag>
                        <ms:languageId>es</ms:languageId>
                </ms:metalanguage>
        </ms:LexicalConceptualResourceTextPart>
</ms:LexicalConceptualResourceMediaPart>
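In the example above, languageTag holds the full BCP47 tag while languageId and regionId hold its subtags. The sketch below derives all three from a simple tag. It assumes the `ms` namespace URI shown, handles only `language` or `language-REGION` tags (scripts, variants and extensions are rejected), and `language_element` is a hypothetical helper, not an ELG API.

```python
import xml.etree.ElementTree as ET

MS_NS = "http://w3id.org/meta-share/meta-share/"  # assumed namespace URI


def _is_region(subtag):
    """BCP47 region subtags are two letters or three digits."""
    return (len(subtag) == 2 and subtag.isalpha()) or (
        len(subtag) == 3 and subtag.isdigit()
    )


def language_element(tag, name="language"):
    """Build <ms:language> (or <ms:metalanguage>) from a language[-REGION] tag."""
    parts = tag.split("-")
    if len(parts) > 2 or (len(parts) == 2 and not _is_region(parts[1])):
        raise ValueError(f"tag not supported by this sketch: {tag}")
    elem = ET.Element(f"{{{MS_NS}}}{name}")
    ET.SubElement(elem, f"{{{MS_NS}}}languageTag").text = tag
    ET.SubElement(elem, f"{{{MS_NS}}}languageId").text = parts[0]
    if len(parts) == 2:  # regionId only when the tag carries a region
        ET.SubElement(elem, f"{{{MS_NS}}}regionId").text = parts[1]
    return elem


english = language_element("en-US")
spanish_meta = language_element("es", name="metalanguage")
```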

LexicalConceptualResourceAudioPart

Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.LexicalConceptualResource.LexicalConceptualResourceMediaPart.LexicalConceptualResourceAudioPart

Data type component

Optionality Mandatory if applicable

Explanation & Instructions

A part (or whole set) of a lexical/conceptual resource that consists of audio elements

You can repeat the group of elements for multiple audio parts.

The mandatory or recommended elements for the audio part of lexical/conceptual resources are:

  • mediaType (Mandatory): Specifies the media type of a language resource (the physical medium of the contents representation). For audio parts, always use the value ‘audio’.

  • lingualityType (Mandatory): Indicates whether the resource includes one, two or more languages.

  • multilingualityType (Recommended if applicable): Indicates whether the resource (part) is parallel, comparable or mixed. If lingualityType is bilingual or multilingual, it is recommended for lexical/conceptual resources; select one of the values parallel (e.g., bilingual dictionaries with source and translation equivalents) or comparable (e.g., lexica of the same domain in multiple languages).

  • language (Mandatory): Specifies the language that is used in the resource part, expressed according to the BCP47 recommendation. See language.

  • languageVariety (Mandatory if applicable): Relates a language resource that contains segments in a language variety (e.g., dialect, jargon) to that variety. Use it, for instance, for dialect resources.

  • metalanguage (Recommended if applicable): Specifies the language that is used as support for the resource (e.g., English for a grammar of French described in English or for a French dictionary with English definitions), expressed according to the BCP47 recommendation. See language.

  • modalityType (Recommended if applicable): Specifies the type of the modality represented in the resource. For instance, you can use ‘spoken language’ to describe transcribed speech corpora.

Example

<ms:LexicalConceptualResourceMediaPart>
        <ms:LexicalConceptualResourceAudioPart>
                <ms:lcrMediaType>LexicalConceptualResourceAudioPart</ms:lcrMediaType>
                <ms:mediaType>http://w3id.org/meta-share/meta-share/audio</ms:mediaType>
                <ms:lingualityType>http://w3id.org/meta-share/meta-share/bilingual</ms:lingualityType>
                <ms:multilingualityType>http://w3id.org/meta-share/meta-share/parallel</ms:multilingualityType>
                <ms:language>
                        <ms:languageTag>en-US</ms:languageTag>
                        <ms:languageId>en</ms:languageId>
                        <ms:regionId>US</ms:regionId>
                </ms:language>
                <ms:language>
                        <ms:languageTag>es</ms:languageTag>
                        <ms:languageId>es</ms:languageId>
                </ms:language>
                <ms:metalanguage>
                        <ms:languageTag>es</ms:languageTag>
                        <ms:languageId>es</ms:languageId>
                </ms:metalanguage>
        </ms:LexicalConceptualResourceAudioPart>
</ms:LexicalConceptualResourceMediaPart>

LexicalConceptualResourceVideoPart

Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.LexicalConceptualResource.LexicalConceptualResourceMediaPart.LexicalConceptualResourceVideoPart

Data type component

Optionality Mandatory if applicable

Explanation & Instructions

A part (or whole set) of a lexical/conceptual resource that consists of video elements

You can repeat the group of elements for multiple video parts.

The mandatory or recommended elements for the video part of lexical/conceptual resources are:

  • mediaType (Mandatory): Specifies the media type of a language resource (the physical medium of the contents representation). For video parts, always use the value ‘video’.

  • lingualityType (Mandatory): Indicates whether the resource includes one, two or more languages.

  • multilingualityType (Recommended if applicable): Indicates whether the resource (part) is parallel, comparable or mixed. If lingualityType is bilingual or multilingual, it is recommended for lexical/conceptual resources; select one of the values parallel (e.g., bilingual dictionaries with source and translation equivalents) or comparable (e.g., lexica of the same domain in multiple languages).

  • language (Mandatory): Specifies the language that is used in the resource part, expressed according to the BCP47 recommendation. See language.

  • languageVariety (Mandatory if applicable): Relates a language resource that contains segments in a language variety (e.g., dialect, jargon) to that variety. Use it, for instance, for dialect resources.

  • metalanguage (Recommended if applicable): Specifies the language that is used as support for the resource (e.g., English for a grammar of French described in English or for a French dictionary with English definitions), expressed according to the BCP47 recommendation. See language.

  • modalityType (Recommended if applicable): Specifies the type of the modality represented in the resource. For instance, you can use ‘spoken language’ to describe transcribed speech corpora.

Example

<ms:LexicalConceptualResourceMediaPart>
        <ms:LexicalConceptualResourceVideoPart>
                <ms:lcrMediaType>LexicalConceptualResourceVideoPart</ms:lcrMediaType>
                <ms:mediaType>http://w3id.org/meta-share/meta-share/video</ms:mediaType>
                <ms:lingualityType>http://w3id.org/meta-share/meta-share/bilingual</ms:lingualityType>
                <ms:multilingualityType>http://w3id.org/meta-share/meta-share/parallel</ms:multilingualityType>
                <ms:language>
                        <ms:languageTag>en-US</ms:languageTag>
                        <ms:languageId>en</ms:languageId>
                        <ms:regionId>US</ms:regionId>
                </ms:language>
                <ms:language>
                        <ms:languageTag>es</ms:languageTag>
                        <ms:languageId>es</ms:languageId>
                </ms:language>
                <ms:metalanguage>
                        <ms:languageTag>es</ms:languageTag>
                        <ms:languageId>es</ms:languageId>
                </ms:metalanguage>
        </ms:LexicalConceptualResourceVideoPart>
</ms:LexicalConceptualResourceMediaPart>

LexicalConceptualResourceImagePart

Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.LexicalConceptualResource.LexicalConceptualResourceMediaPart.LexicalConceptualResourceImagePart

Data type component

Optionality Mandatory if applicable

Explanation & Instructions

A part (or whole set) of a lexical/conceptual resource that consists of image elements

You can repeat the group of elements for multiple image parts.

The mandatory or recommended elements for the image part of lexical/conceptual resources are:

  • mediaType (Mandatory): Specifies the media type of a language resource (the physical medium of the contents representation). For image parts, always use the value ‘image’.

  • lingualityType (Mandatory): Indicates whether the resource includes one, two or more languages.

  • multilingualityType (Recommended if applicable): Indicates whether the resource (part) is parallel, comparable or mixed. If lingualityType is bilingual or multilingual, it is recommended for lexical/conceptual resources; select one of the values parallel (e.g., bilingual dictionaries with source and translation equivalents) or comparable (e.g., lexica of the same domain in multiple languages).

  • language (Mandatory): Specifies the language that is used in the resource part, expressed according to the BCP47 recommendation. See language.

  • languageVariety (Mandatory if applicable): Relates a language resource that contains segments in a language variety (e.g., dialect, jargon) to that variety. Use it, for instance, for dialect resources.

  • metalanguage (Recommended if applicable): Specifies the language that is used as support for the resource (e.g., English for a grammar of French described in English or for a French dictionary with English definitions), expressed according to the BCP47 recommendation. See language.

  • modalityType (Recommended if applicable): Specifies the type of the modality represented in the resource. For instance, you can use ‘spoken language’ to describe transcribed speech corpora.

Example

<ms:LexicalConceptualResourceMediaPart>
        <ms:LexicalConceptualResourceImagePart>
                <ms:lcrMediaType>LexicalConceptualResourceImagePart</ms:lcrMediaType>
                <ms:mediaType>http://w3id.org/meta-share/meta-share/image</ms:mediaType>
                <ms:lingualityType>http://w3id.org/meta-share/meta-share/bilingual</ms:lingualityType>
                <ms:multilingualityType>http://w3id.org/meta-share/meta-share/parallel</ms:multilingualityType>
                <ms:language>
                        <ms:languageTag>en-US</ms:languageTag>
                        <ms:languageId>en</ms:languageId>
                        <ms:regionId>US</ms:regionId>
                </ms:language>
                <ms:language>
                        <ms:languageTag>es</ms:languageTag>
                        <ms:languageId>es</ms:languageId>
                </ms:language>
                <ms:metalanguage>
                        <ms:languageTag>es</ms:languageTag>
                        <ms:languageId>es</ms:languageId>
                </ms:metalanguage>
        </ms:LexicalConceptualResourceImagePart>
</ms:LexicalConceptualResourceMediaPart>

Minimal elements for projects

This page describes the minimal metadata elements specific to projects.

N.B. The interactive editor supports the full schema, i.e. it also includes optional elements.

1. Overview

Element name                      Optionality   Tab
projectName                       M             Identity
ProjectIdentifier                 R             Identity
projectShortName                  R             Identity
projectAlternativeName            R             Identity
projectSummary                    R             Identity
website                           R             Identity
email                             R             Identity
logo                              R             Identity
fundingType                       R             Identity
funder                            R             Identity
fundingCountry                    R             Identity
socialMediaOccupationalAccount    R             Identity
LTArea                            R             Categories
domain                            R             Categories
keyword                           R             Categories
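Since projectName is the only mandatory element in this table, a project record can be valid while lacking most of the recommended ones. The Python sketch below is a hypothetical helper, not part of the ELG tooling: it reads an ms:Project element and lists which mandatory (M) and recommended (R) elements from the table are missing. It assumes the ms: namespace URI shown and checks only the presence of direct children, not their content.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for the "ms:" prefix.
MS = "http://w3id.org/meta-share/meta-share/"

# Optionality of the minimal project elements, transcribed from the overview table.
PROJECT_ELEMENTS = {
    "projectName": "M",
    "ProjectIdentifier": "R",
    "projectShortName": "R",
    "projectAlternativeName": "R",
    "projectSummary": "R",
    "website": "R",
    "email": "R",
    "logo": "R",
    "fundingType": "R",
    "funder": "R",
    "fundingCountry": "R",
    "socialMediaOccupationalAccount": "R",
    "LTArea": "R",
    "domain": "R",
    "keyword": "R",
}


def missing_elements(project: ET.Element) -> dict:
    """Report which M and R elements are absent from the direct children of <ms:Project>."""
    present = {child.tag.split("}")[-1] for child in project}  # strip "{namespace}"
    missing = {"M": [], "R": []}
    for name, optionality in PROJECT_ELEMENTS.items():
        if name not in present:
            missing[optionality].append(name)
    return missing


record = f"""<ms:Project xmlns:ms="{MS}">
  <ms:entityType>project</ms:entityType>
  <ms:projectName xml:lang="en">European Language Grid</ms:projectName>
  <ms:website>https://www.european-language-grid.eu/</ms:website>
</ms:Project>"""
report = missing_elements(ET.fromstring(record))
print("missing mandatory:", report["M"])
print("missing recommended:", report["R"])
```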

2. Element presentation

This section presents each of the aforementioned elements separately, following the order of the table in the previous section.


Project

Path MetadataRecord.DescribedEntity.Project

Data type component

Optionality Mandatory

Explanation & Instructions

Wraps together elements for projects

Example

<ms:Project>
        <ms:entityType>project</ms:entityType>
        ...
</ms:Project>

ProjectIdentifier

Path MetadataRecord.DescribedEntity.Project.ProjectIdentifier

Data type string

Optionality Recommended

Explanations & Instructions

A string (e.g., PID, internal to an organization, issued by the funding authority, etc.) used to uniquely identify a project

You must also use the attribute ProjectIdentifierScheme to specify the name of the scheme according to which the issuing authority assigns an identifier to a project. See ProjectIdentifierScheme for details.

Example

<ms:ProjectIdentifier ms:ProjectIdentifierScheme="http://w3id.org/meta-share/meta-share/cordis">219608</ms:ProjectIdentifier>

<ms:ProjectIdentifier ms:ProjectIdentifierScheme="http://w3id.org/meta-share/meta-share/cordis">219378</ms:ProjectIdentifier>
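Because the identifier value is meaningless without its scheme, it can help to build both together. The Python sketch below uses a hypothetical helper, project_identifier, and assumes the ms: namespace URI shown; it refuses to create an identifier without a ProjectIdentifierScheme value.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for the "ms:" prefix.
MS = "http://w3id.org/meta-share/meta-share/"
ET.register_namespace("ms", MS)


def project_identifier(value: str, scheme: str) -> ET.Element:
    """Build <ms:ProjectIdentifier> with its required ms:ProjectIdentifierScheme attribute."""
    if not scheme:
        raise ValueError("ProjectIdentifierScheme is required alongside the identifier")
    elem = ET.Element(
        f"{{{MS}}}ProjectIdentifier",
        {f"{{{MS}}}ProjectIdentifierScheme": scheme},
    )
    elem.text = value
    return elem


cordis = "http://w3id.org/meta-share/meta-share/cordis"
print(ET.tostring(project_identifier("219608", cordis), encoding="unicode"))
```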

projectName

Path MetadataRecord.DescribedEntity.Project.projectName

Data type multilingual string

Optionality Mandatory

Explanations & Instructions

The full name (title) of a project

Example

<ms:projectName xml:lang="en">Browser-based Multilingual Translation</ms:projectName>

<ms:projectName xml:lang="en">European Language Grid</ms:projectName>

projectShortName

Path MetadataRecord.DescribedEntity.Project.projectShortName

Data type multilingual string

Optionality Recommended

Explanations & Instructions

Introduces a short name (e.g., acronym, abbreviated form) by which a project is known

Example

<ms:projectShortName xml:lang="en">Bergamot</ms:projectShortName>

<ms:projectShortName xml:lang="en">ELG</ms:projectShortName>

projectAlternativeName

Path MetadataRecord.DescribedEntity.Project.projectAlternativeName

Data type multilingual string

Optionality Recommended

Explanations & Instructions

Introduces an alternative name (other than the short name) used for a project

Example

<ms:projectAlternativeName xml:lang="en">The European Language Grid</ms:projectAlternativeName>

projectSummary

Path MetadataRecord.DescribedEntity.Project.projectSummary

Data type multilingual string

Optionality Recommended

Explanations & Instructions

Introduces a short description (in free text) of the main objectives, mission or contents of the project

Example

<ms:projectSummary xml:lang="en">The Bergamot project will add and improve client-side machine translation in a web browser. Unlike current cloud-based options, running directly on users' machines empowers citizens to preserve their privacy and increases the uptake of language technologies in Europe in various sectors that require confidentiality. Free software integrated with an open-source web browser, such as Mozilla Firefox, will enable bottom-up adoption by non-experts, resulting in cost savings for private and public sector users who would otherwise procure translation or operate monolingually. To understand and support non-expert users, our user experience work package researches their needs and creates the user interface. Rather than simply translating text, this interface will expose improved quality estimates, addressing the rising public debate on algorithmic trust. Building on quality estimation research, we will enable users to confidently generate text in a language they do not speak, enabling cross-lingual online form filling. To improve quality overall, dynamic domain adaptation research addresses the peculiar writing style of a website or user by adapting translation on the fly using local information too private to upload to the cloud. These applications require adaptation and inference to run on desktop hardware with compact model downloads, which we address with neural network efficiency research. Our combined research on user experience, domain adaptation, quality estimation, outbound translation, and efficiency support a broad browser-based innovation plan.</ms:projectSummary>

<ms:projectSummary xml:lang="en">With 24 official EU and many more additional languages, multilingualism in Europe and an inclusive Digital Single Market can only be enabled through Language Technologies (LTs). European LT business is dominated by thousands of SMEs and a few large players. Many are world-class, with technologies that outperform the global players. However, European LT business is also fragmented by nation states, languages, verticals and sectors. Likewise, while much of European LT research is world-class, with results transferred into industry and commercial products, its full impact is held back by fragmentation. The key issue and challenge is the fragmentation of the European LT landscape. The European Language Grid (ELG) project will address this fragmentation by establishing the ELG as the primary platform for LT in Europe. The ELG will be a scalable cloud platform, providing, in an easy-to-integrate way, access to hundreds of commercial and non-commercial Language Technologies for all European languages, including running tools and services as well as data sets and resources. It will enable the commercial and non-commercial European LT community to deposit and upload their technologies and data sets into the ELG, to deploy them through the grid, and to connect with other resources. The ELG will boost the Multilingual Digital Single Market towards a thriving European LT community, creating new jobs and opportunities. Through open calls, up to 20 pilot projects will be financially supported to demonstrate the usefulness of the ELG. The proposal is rooted in the experience of a consortium with partners involved in all relevant initiatives. Based on these, 30 national competence centres and the European LT Board will be set up for European coordination.
The ELG will foster "language technologies for Europe built in Europe", tailored to our languages and cultures and to our societal and economical demands, benefitting the European citizen, society, innovation and industry.</ms:projectSummary>

website

Path MetadataRecord.DescribedEntity.Project.website

Data type URL

Optionality Recommended

Explanations & Instructions

Links to a URL that acts as the primary page (like a table of contents) introducing information about an organization (e.g., products, contact information, etc.) or project

Example

<ms:website>https://browser.mt/</ms:website>

<ms:website>https://www.european-language-grid.eu/</ms:website>

email

Path MetadataRecord.DescribedEntity.Project.email

Data type string

Optionality Recommended

Explanation & Instructions

Points to the email address of a project, used for information purposes

Example

<ms:email>info@project.eu</ms:email>

logo

Path MetadataRecord.DescribedEntity.Project.logo

Data type URL

Optionality Recommended

Explanations & Instructions

Links to a URL with an image file containing a symbol or graphic object used to identify the entity

In the interactive editor, users can also upload an image file.

Example

<ms:logo>https://ufal.mff.cuni.cz/sites/default/files/styles/drupal_projects_logo_style/public/bergamot_logo.png</ms:logo>

<ms:logo>https://www.european-language-grid.eu/wp-content/themes/elg_theme/fab/image/logo/rgb_elg__logo--colour.svg</ms:logo>

fundingType

Path MetadataRecord.DescribedEntity.Project.fundingType

Data type CV (fundingType)

Optionality Recommended

Explanations & Instructions

Specifies the type of funding of a project with regard to the source of the funding

Example

<ms:fundingType>http://w3id.org/meta-share/meta-share/euFunds</ms:fundingType>

funder

Path MetadataRecord.DescribedEntity.Project.funder

Data type component

Optionality Recommended

Explanations & Instructions

Identifies the person/organization/group that has financed the project

Funding information is important for acknowledgement purposes.

For organizations, you must provide the name of the organization (organizationName) and, if possible, a website (website) and/or an identifier (OrganizationIdentifier).

Example

<ms:funder>
        <ms:Organization>
                <ms:actorType>Organization</ms:actorType>
                <ms:organizationName xml:lang="en">European Commission</ms:organizationName>
                <ms:website>https://ec.europa.eu/info/index_en</ms:website>
        </ms:Organization>
</ms:funder>
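The instructions above make organizationName mandatory for an organization funder, with the website and identifier optional. The Python sketch below, built around a hypothetical helper named funder_organization, encodes that rule and requires a scheme whenever an identifier is given. The ms: namespace URI and the child order (name, identifier, website) are assumptions; check the order against the XSD before relying on it.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for the "ms:" prefix; xml:lang uses the predefined XML namespace.
MS = "http://w3id.org/meta-share/meta-share/"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
ET.register_namespace("ms", MS)


def funder_organization(name, lang="en", identifier=None, scheme=None, website=None):
    """Build <ms:funder> wrapping an organization: name required, identifier and website optional.

    The child order (name, identifier, website) is an assumption for illustration.
    """
    funder = ET.Element(f"{{{MS}}}funder")
    org = ET.SubElement(funder, f"{{{MS}}}Organization")
    ET.SubElement(org, f"{{{MS}}}actorType").text = "Organization"
    ET.SubElement(org, f"{{{MS}}}organizationName", {XML_LANG: lang}).text = name
    if identifier is not None:
        if not scheme:
            raise ValueError("OrganizationIdentifier needs an OrganizationIdentifierScheme")
        ET.SubElement(
            org,
            f"{{{MS}}}OrganizationIdentifier",
            {f"{{{MS}}}OrganizationIdentifierScheme": scheme},
        ).text = identifier
    if website is not None:
        ET.SubElement(org, f"{{{MS}}}website").text = website
    return funder


ec = funder_organization("European Commission", website="https://ec.europa.eu/info/index_en")
print(ET.tostring(ec, encoding="unicode"))
```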

fundingCountry

Path MetadataRecord.DescribedEntity.Project.fundingCountry

Data type CV (regionIdType)

Optionality Recommended

Explanations & Instructions

Specifies the funding country, in case of national funding, as a code from ISO 3166

Example

<ms:fundingCountry>EU</ms:fundingCountry>

socialMediaOccupationalAccount

Path MetadataRecord.DescribedEntity.Project.socialMediaOccupationalAccount

Data type multilingual string

Optionality Recommended

Explanations & Instructions

Introduces the social media or occupational account details of a person, organization or project

You must also use the attribute socialMediaOccupationalAccountType to specify the type of social media account. See socialMediaOccupationalAccountType for details.

Example

<ms:socialMediaOccupationalAccount ms:socialMediaOccupationalAccountType="http://w3id.org/meta-share/meta-share/facebook">https://www.facebook.com/project</ms:socialMediaOccupationalAccount>

LTArea

Path MetadataRecord.DescribedEntity.Project.LTArea

Data type component

Optionality Recommended

Explanations & Instructions

Introduces a Language Technology-related area that the project deals with

For details, see LTArea. More specifically, you can fill in:

  • the LTClassRecommended element with one of the recommended values from the LT taxonomy, or

  • the LTClassOther element with a free-text value.

Example

<ms:LTArea>
        <ms:LTClassRecommended>http://w3id.org/meta-share/omtd-share/MachineTranslation</ms:LTClassRecommended>
</ms:LTArea>
<ms:LTArea>
        <ms:LTClassOther>Browser-based Machine Translation</ms:LTClassOther>
</ms:LTArea>
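Each LTArea above holds either a taxonomy value or a free-text value. The Python sketch below, using a hypothetical helper named lt_area and an assumed ms: namespace URI, enforces that "one or the other" rule when building the element; repeat the element, as in the example, to record several areas.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for the "ms:" prefix.
MS = "http://w3id.org/meta-share/meta-share/"
ET.register_namespace("ms", MS)


def lt_area(recommended=None, other=None) -> ET.Element:
    """Build one <ms:LTArea> holding either a taxonomy URI or a free-text value, not both."""
    if (recommended is None) == (other is None):
        raise ValueError("pass exactly one of 'recommended' or 'other'")
    area = ET.Element(f"{{{MS}}}LTArea")
    if recommended is not None:
        ET.SubElement(area, f"{{{MS}}}LTClassRecommended").text = recommended
    else:
        ET.SubElement(area, f"{{{MS}}}LTClassOther").text = other
    return area


areas = [
    lt_area(recommended="http://w3id.org/meta-share/omtd-share/MachineTranslation"),
    lt_area(other="Browser-based Machine Translation"),
]
for area in areas:
    print(ET.tostring(area, encoding="unicode"))
```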

domain

Path MetadataRecord.DescribedEntity.Project.domain

Data type component

Optionality Recommended

Explanations & Instructions

Identifies a domain that the project deals with

You must fill in the categoryLabel element with a free-text value. If you prefer to add a value from an established controlled vocabulary, you can also use the DomainIdentifier element (with the attribute DomainClassificationScheme set to the appropriate value).

Example

<ms:domain>
        <ms:categoryLabel xml:lang="en">http://w3id.org/meta-share/omtd-share/NewsMediaJournalismAndPublishing</ms:categoryLabel>
</ms:domain>
<ms:domain>
        <ms:categoryLabel xml:lang="en">General</ms:categoryLabel>
</ms:domain>
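The domain component combines a mandatory free-text categoryLabel with an optional DomainIdentifier. The Python sketch below, with a hypothetical helper named domain, shows that combination. It assumes the ms: namespace URI shown and that DomainIdentifier sits inside ms:domain after the label; the scheme URI and code used for the second call are placeholders, not real vocabulary values.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for the "ms:" prefix; xml:lang uses the predefined XML namespace.
MS = "http://w3id.org/meta-share/meta-share/"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
ET.register_namespace("ms", MS)


def domain(label, lang="en", identifier=None, scheme=None):
    """Build <ms:domain>: a free-text categoryLabel plus an optional DomainIdentifier."""
    dom = ET.Element(f"{{{MS}}}domain")
    ET.SubElement(dom, f"{{{MS}}}categoryLabel", {XML_LANG: lang}).text = label
    if identifier is not None:
        if not scheme:
            raise ValueError("DomainIdentifier needs a DomainClassificationScheme")
        ET.SubElement(
            dom,
            f"{{{MS}}}DomainIdentifier",
            {f"{{{MS}}}DomainClassificationScheme": scheme},
        ).text = identifier
    return dom


general = domain("General")
# Hypothetical scheme URI and code, for illustration only.
coded = domain("News", identifier="HYPOTHETICAL-CODE",
               scheme="http://example.org/hypothetical-scheme")
print(ET.tostring(general, encoding="unicode"))
print(ET.tostring(coded, encoding="unicode"))
```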

keyword

Path MetadataRecord.DescribedEntity.Project.keyword

Data type multilingual string

Optionality Recommended

Explanations & Instructions

Introduces a word or phrase considered important for the description of the project and thus used to index or classify it

Example

<ms:keyword xml:lang="en">Machine translation</ms:keyword>
<ms:keyword xml:lang="en">translation integration</ms:keyword>

<ms:keyword xml:lang="en">Language technology services</ms:keyword>
<ms:keyword xml:lang="en">Multilingualism</ms:keyword>
<ms:keyword xml:lang="en">Less-resourced languages</ms:keyword>

Minimal elements for organizations

This page describes the minimal metadata elements specific to organizations.

N.B. The interactive editor supports the full schema, i.e. it also includes optional elements.

1. Overview

Element name                      Optionality   Tab
organizationName                  M             Identity
OrganizationIdentifier            R             Identity
organizationShortName             R             Identity
organizationAlternativeName       R             Identity
organizationBio                   R             Identity
logo                              R             Identity
LTArea                            R             Activities
serviceOffered                    R             Activities
domain                            R             Activities
keyword                           R             Activities
email                             R             Contact
website                           R             Contact
headOfficeAddress                 R             Contact
socialMediaOccupationalAccount    R             Contact
divisionCategory                  R             Division
isDivisionOf                      R             Division

2. Element presentation

This section presents each of the aforementioned elements separately, following the order of the table in the previous section.

Organization

Path MetadataRecord.DescribedEntity.Organization

Data type component

Optionality Mandatory

Explanation & Instructions

Wraps together elements for organizations

Example

<ms:Organization>
        <ms:entityType>organization</ms:entityType>
        ...
</ms:Organization>

organizationName

Path MetadataRecord.DescribedEntity.Organization.organizationName

Data type multilingual string

Optionality Mandatory

Explanation & Instructions

The full name of an organization

Example

<ms:organizationName xml:lang="en">Charles University</ms:organizationName>

<ms:organizationName xml:lang="en">Evaluation and Language Resources Distribution Agency</ms:organizationName>
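A multilingual string is expressed as one element per language, each with its own xml:lang attribute. The Python sketch below, using a hypothetical helper named multilingual, builds organizationName in English and Czech ("Univerzita Karlova" is the Czech name of Charles University). It assumes the ms: namespace URI shown and that the element may be repeated once per language.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for the "ms:" prefix; ElementTree serializes XML_LANG as xml:lang.
MS = "http://w3id.org/meta-share/meta-share/"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
ET.register_namespace("ms", MS)


def multilingual(parent: ET.Element, name: str, values: dict) -> None:
    """Append one ms:<name> element per language, each carrying its own xml:lang."""
    for lang, text in values.items():
        child = ET.SubElement(parent, f"{{{MS}}}{name}", {XML_LANG: lang})
        child.text = text


org = ET.Element(f"{{{MS}}}Organization")
multilingual(org, "organizationName", {"en": "Charles University", "cs": "Univerzita Karlova"})
print(ET.tostring(org, encoding="unicode"))
```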

OrganizationIdentifier

Path MetadataRecord.DescribedEntity.Organization.OrganizationIdentifier

Data type string

Optionality Recommended

Explanation & Instructions

A string (e.g., PID, internal to an organization, issued by the funding authority, etc.) used to uniquely identify an organization

You must also use the attribute OrganizationIdentifierScheme to specify the name of the scheme according to which an identifier is assigned to an organization by the authority that issues it. See OrganizationIdentifierScheme for details.

It is recommended to add an identifier issued by an authority, such as GRID, if available.

Example

<ms:OrganizationIdentifier ms:OrganizationIdentifierScheme="http://w3id.org/meta-share/meta-share/grid">https://www.grid.ac/institutes/grid.5216.0</ms:OrganizationIdentifier>

organizationShortName

Path MetadataRecord.DescribedEntity.Organization.organizationShortName

Data type multilingual string

Optionality Recommended

Explanation & Instructions

Introduces the short name (abbreviation, acronym, etc.) used for an organization

Example

<ms:organizationShortName xml:lang="en">CUNI</ms:organizationShortName>

<ms:organizationShortName xml:lang="en">ELDA</ms:organizationShortName>

organizationAlternativeName

Path MetadataRecord.DescribedEntity.Organization.organizationAlternativeName

Data type multilingual string

Optionality Recommended

Explanation & Instructions

Introduces an alternative name (other than the short name) used for an organization

Example

<ms:organizationAlternativeName xml:lang="cs">UNIVERZITA KARLOVA</ms:organizationAlternativeName>

<ms:organizationAlternativeName xml:lang="en">EVALUATIONS AND LANGUAGE RESOURCES DISTRIBUTION AGENCY</ms:organizationAlternativeName>

organizationBio

Path MetadataRecord.DescribedEntity.Organization.organizationBio

Data type multilingual string

Optionality Recommended

Explanation & Instructions

Introduces a short free-text account that provides information on an organization

Example

<ms:organizationBio xml:lang="en">Charles University was founded in 1348, making it one of the oldest universities in the world. Yet it is also renowned as a modern, dynamic, cosmopolitan and prestigious institution of higher education. It is the largest and most renowned Czech university, and is also the best-rated Czech university according to international rankings. There are currently 17 faculties at the University, plus 3 institutes, 6 other centres of teaching, research, development and other creative activities, a centre providing information services, 5 facilities serving the whole University, and the Rectorate - which is the executive management body for the whole University.</ms:organizationBio>

<ms:organizationBio xml:lang="en">The Evaluations and Language Resources Distribution Agency (ELDA) was created in 1995 as the organizational infrastructure with the mission of providing a central clearing house for Language Resources (LR) of the European Language Resources Association (ELRA). ELDA was set up to identify, classify, collect, validate and distribute the language resources that are needed by the Human Language Technology (HLT) community. Anticipating the evolutions in the HLT field, ELDA broadened its activities to cover multimedia/multimodal resources as well as evaluation activities, distributing the language resources needed for evaluation purposes, and conducting/coordinating evaluation campaigns. ELDA has played a significant role within the major Multimedia and Multimodal production projects that resulted in one of the most impressive catalogues of available data sets, embracing all aspects of Language Technologies. ELDA was also involved in evaluation initiatives, in several FPs’ projects involving HLT infrastructures, as well as in national programmes. In addition to work on data production, processing and annotation, validation and quality control, several of these projects also involved work on legal framework management for the produced resources. Moreover, ELDA has contributed to the development of open platforms and has joined forces with other European key players by bringing its assets (LR catalogue, evaluation services and benchmarking) to constitute Europe's backbone for Language Resources sharing and distribution. ELDA is also the initiator of the Language Resources and Evaluation Conference (LREC), held since 1998. With over 1200 participants, LREC is the major event on Language Resources (LRs) and Evaluation for Human Language Technologies (HLT).</ms:organizationBio>

logo

Path MetadataRecord.DescribedEntity.Organization.logo

Data type URL

Optionality Recommended

Explanation & Instructions

Links to a URL with an image file containing a symbol or graphic object used to identify the entity

In the interactive editor, users can also upload an image file.

Example

<ms:logo>https://cuni.cz/UKEN-1-version1-afoto.jpg</ms:logo>

<ms:logo>https://www.european-language-grid.eu/wp-content/uploads/2019/03/logo__consortium-elda.svg</ms:logo>

LTArea

Path MetadataRecord.DescribedEntity.Organization.LTArea

Data type component

Optionality Recommended

Explanation & Instructions

Introduces a Language Technology-related area that a person or organization is involved or active in

For details, see LTArea. More specifically, you can fill in:

  • the LTClassRecommended element with one of the recommended values from the LT taxonomy, or

  • the LTClassOther element with a free-text value.

Example

<ms:LTArea>
        <ms:LTClassRecommended>http://w3id.org/meta-share/omtd-share/LanguageTechnology</ms:LTClassRecommended>
</ms:LTArea>
<ms:LTArea>
        <ms:LTClassRecommended>http://w3id.org/meta-share/omtd-share/MachineTranslation</ms:LTClassRecommended>
</ms:LTArea>

serviceOffered

Path MetadataRecord.DescribedEntity.Organization.serviceOffered

Data type multilingual string

Optionality Recommended

Explanation & Instructions

Lists the service(s) offered by an organization or person

Example

<ms:serviceOffered xml:lang="en">Evaluation and benchmarking</ms:serviceOffered>
<ms:serviceOffered xml:lang="en">Legal support</ms:serviceOffered>

domain

Path MetadataRecord.DescribedEntity.Organization.domain

Data type component

Optionality Recommended

Explanation & Instructions

Identifies a domain that the organization deals with

You must fill in the categoryLabel element with a free-text value. If you prefer to add a value from an established controlled vocabulary, you can also use the DomainIdentifier element (with the attribute DomainClassificationScheme set to the appropriate value).

Example

<ms:domain>
        <ms:categoryLabel xml:lang="en">environment</ms:categoryLabel>
</ms:domain>

keyword

Path MetadataRecord.DescribedEntity.Organization.keyword

Data type multilingual string

Optionality Recommended

Explanation & Instructions

Introduces a word or phrase considered important for the description of the organization and thus used to index or classify it

Example

<ms:keyword xml:lang="en">Research infrastructures</ms:keyword>
<ms:keyword xml:lang="en">Language Resources</ms:keyword>
<ms:keyword xml:lang="en">Digital Humanities</ms:keyword>
<ms:keyword xml:lang="en">Language Resources and Evaluation</ms:keyword>
<ms:keyword xml:lang="en">Legal support</ms:keyword>
<ms:keyword xml:lang="en">Data management</ms:keyword>

email

Path MetadataRecord.DescribedEntity.Organization.email

Data type string

Optionality Recommended

Explanation & Instructions

Points to the email address of a person, organization or group

Example

<ms:email>info@company.eu</ms:email>

website

Path MetadataRecord.DescribedEntity.Organization.website

Data type URL

Optionality Recommended

Explanation & Instructions

Links to a URL that acts as the primary page (like a table of contents) introducing information about an organization (e.g., products, contact information, etc.) or project

Example

<ms:website>https://www.cuni.cz</ms:website>

<ms:website>http://www.elra.info/en/</ms:website>

headOfficeAddress

Path MetadataRecord.DescribedEntity.Organization.headOfficeAddress

Data type component

Optionality Recommended

Explanation & Instructions

Links to a set of elements that describe the full address of the head office of an organization (i.e. including street address, zip code, etc.). The only mandatory element in this set is country.

Example

<ms:headOfficeAddress>
        <ms:address xml:lang="en">OLD COLLEGE, SOUTH BRIDGE</ms:address>
        <ms:zipCode>EH8 9YL</ms:zipCode>
        <ms:city xml:lang="en">EDINBURGH</ms:city>
        <ms:country>GB</ms:country>
</ms:headOfficeAddress>
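Since country is the only mandatory element of the address, a builder can treat everything else as optional. The Python sketch below uses a hypothetical helper, head_office_address, and assumes the ms: namespace URI shown, a two-letter uppercase country code as in the example (GB), and the example's child order: address, zipCode, city, country.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for the "ms:" prefix; xml:lang uses the predefined XML namespace.
MS = "http://w3id.org/meta-share/meta-share/"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
ET.register_namespace("ms", MS)


def head_office_address(country, address=None, zip_code=None, city=None, lang="en"):
    """Build <ms:headOfficeAddress>; only country is mandatory.

    Assumes a two-letter uppercase country code and the child order of the example.
    """
    if not (len(country) == 2 and country.isalpha() and country.isupper()):
        raise ValueError(f"expected a two-letter country code, got {country!r}")
    elem = ET.Element(f"{{{MS}}}headOfficeAddress")
    if address is not None:
        ET.SubElement(elem, f"{{{MS}}}address", {XML_LANG: lang}).text = address
    if zip_code is not None:
        ET.SubElement(elem, f"{{{MS}}}zipCode").text = zip_code
    if city is not None:
        ET.SubElement(elem, f"{{{MS}}}city", {XML_LANG: lang}).text = city
    ET.SubElement(elem, f"{{{MS}}}country").text = country
    return elem


full = head_office_address("GB", "OLD COLLEGE, SOUTH BRIDGE", "EH8 9YL", "EDINBURGH")
minimal = head_office_address("CZ")
print(ET.tostring(full, encoding="unicode"))
print(ET.tostring(minimal, encoding="unicode"))
```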

socialMediaOccupationalAccount

Path MetadataRecord.DescribedEntity.Organization.socialMediaOccupationalAccount

Data type multilingual string

Optionality Recommended

Explanation & Instructions

Introduces the social media or occupational account details of a person or organization

You must also use the attribute socialMediaOccupationalAccountType to specify the type of social media account. See https://european-language-grid.readthedocs.io/en/stable/Documentation/ELG-SHAREschema.html#socialMediaOccupationalAccountType for details.

Example

<ms:socialMediaOccupationalAccount ms:socialMediaOccupationalAccountType="http://w3id.org/meta-share/meta-share/facebook">https://www.facebook.com/UFALMFFUK</ms:socialMediaOccupationalAccount>

divisionCategory

Path MetadataRecord.DescribedEntity.Organization.divisionCategory

Data type CV

Optionality Recommended

Explanation & Instructions

Classifies the division of an organization according to a controlled vocabulary

If the organization you describe is part of a parent organization, specify its category, e.g. a faculty or department of a university, a laboratory in a company, etc.

Example

<ms:divisionCategory>http://w3id.org/meta-share/meta-share/institute</ms:divisionCategory>

isDivisionOf

Path MetadataRecord.DescribedEntity.Organization.isDivisionOf

Data type component

Optionality Recommended

Explanation & Instructions

Links a division (e.g., a faculty or department of a university, a company branch) to the organization to which it belongs

Example

<ms:isDivisionOf>
        <ms:organizationName xml:lang="en">Charles University</ms:organizationName>
        <ms:website>https://www.cuni.cz</ms:website>
</ms:isDivisionOf>
1

Recommended elements do not have to be filled in to register a metadata record on the ELG platform. However, they increase the visibility and usability of the item, and providers are encouraged to fill them in. The ELG interactive editor contains both the mandatory and the recommended elements; the full schema is currently supported through the upload of metadata records.