Minimal version
The minimal version of the ELG schema consists of the mandatory and recommended elements 1. These have been carefully selected to serve purposes such as:
identification and citation: resource name(s); identifier(s); a short description of contents; versioning information; a contact point for further information (email or landing page); data of the resource provider(s) and resource creator(s); classification by domain, keywords and intended LT application; language coverage (language and, if needed, dialect); publication date;
support: links to manuals, training material; samples of the resource;
usage/access: distribution form (e.g., downloadable file, access via an interface, software source code or binary file, etc.); licensing conditions; access location.
These metadata elements can be used to describe all resources, irrespective of the resource type. Additional metadata elements, particular to each resource type, are required, such as size and format for data files, dependencies and technical requirements for tools and services, etc.
Outline and explanations for the following sections
The following sections present the minimal schema, grouped as described above: first the elements common to all language resources and technologies (LRTs), and then the elements specific to each resource type. Each section includes:
an overview, with a tabular presentation of the mandatory (M) and recommended (R) elements. More specifically, the table provides the element name, the element optionality, and the section and tab where the user can find each element in the interactive editor. The elements are grouped according to the tab where they are found. The values for optionality are:
Mandatory (M): the element must always be filled in the metadata record
Recommended (R): the use of the element is not enforced but provides important information
Mandatory if applicable (MA): the element must be filled in when specific conditions apply
Recommended if applicable (RA): the use of the element is recommended when specific conditions apply
a detailed presentation for each metadata element with the following information:
Path: the path of the element as in the XSD
Data type:
string
multilingual string: you can repeat the element for different language versions; to specify the language, you must use the xml:lang attribute with a value from IETF BCP 47 (the IANA Language Subtag Registry); for all metadata elements, a value in English (“en”) is mandatory
component: group of elements
Controlled Vocabulary (CV): value taken from a controlled vocabulary; a link to the relevant controlled vocabulary is provided
date: date in the format xs:date
URL
Optionality:
For an explanation of the values, see above.
Explanation & Instructions: A short definition of the element, followed by instructions on how it should be used in the specific context.
Example: One or more examples for the element in XML format.
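The multilingual string rule above (repeat the element once per language, tag each copy with xml:lang, and always include English) can be checked mechanically. Below is a minimal Python sketch using the standard library's xml.etree.ElementTree; the helper has_english_value is hypothetical, and the URI bound to the ms: prefix is an assumption based on the ELG-SHARE identifiers used in the examples, so check it against the XSD you validate with.

```python
import xml.etree.ElementTree as ET

# Assumed URI bound to the "ms:" prefix in ELG records.
MS = "http://w3id.org/meta-share/meta-share/"
# ElementTree reports xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def has_english_value(parent: ET.Element, local_name: str) -> bool:
    """True if some <ms:local_name> child of parent is tagged xml:lang="en" and is non-empty."""
    for child in parent.findall(f"{{{MS}}}{local_name}"):
        if child.get(XML_LANG) == "en" and (child.text or "").strip():
            return True
    return False


resource = ET.fromstring(
    f'<ms:LanguageResource xmlns:ms="{MS}">'
    '<ms:resourceName xml:lang="de">Beispielkorpus</ms:resourceName>'
    '<ms:resourceName xml:lang="en">Example corpus</ms:resourceName>'
    "</ms:LanguageResource>"
)
print(has_english_value(resource, "resourceName"))  # True
print(has_english_value(resource, "description"))   # False
```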
Minimal elements for all entities
This page describes the minimal metadata elements common to all types of entities.
1. Overview
| Element name | Optionality | Section | Tab |
|---|---|---|---|
| metadataCreationDate | R | | |
| metadataCurator | R | | |
| compliesWith | R | | |
| metadataCreator | R | | |
| sourceOfMetadataRecord | R | LRT | Identity |
| | | Organization | |
| | | Project | |
2. Element presentation
In this section, the elements listed above are presented in the order of the table in the previous section.
MetadataRecord
Path MetadataRecord
Data type component
Optionality Mandatory
Explanation & Instructions
A set of formalized structured information used to describe the contents, structure, function, etc. of an entity, usually according to a specific set of rules (metadata schema)
The MetadataRecord element wraps together a set of administrative data; for metadata records registered by individuals, its main elements (presented in the previous table, and mostly assigned automatically by the ELG software) are:
metadataCreationDate: the date when the metadata record was created
metadataCurator: the person who is responsible for updating the metadata record once it has been imported into the ELG database; this is usually the same person as the metadataCreator
compliesWith: for ELG metadata records, this is by default the ELG-SHARE metadata schema
metadataCreator: the person who created the metadata record
sourceOfMetadataRecord: used for metadata records that have been imported into ELG from other catalogues, either automatically harvested or through a manual collection procedure; it consists of two mandatory elements, repositoryName and repositoryURL, and the optional element repositoryIdentifier.
All elements apart from sourceOfMetadataRecord are automatically assigned; they are therefore not displayed in the interactive editor and do not have to be added to the metadata file.
The sourceOfMetadataRecord element is mandatory for harvested records and is automatically assigned to them. It is recommended for records registered by individuals and is therefore displayed in the interactive editor form under the section “Language Resource/Technology”, “Project” or “Organization”.
Example
<ms:MetadataRecord>
<ms:MetadataRecordIdentifier ms:MetadataRecordIdentifierScheme="http://w3id.org/meta-share/meta-share/elg">default id</ms:MetadataRecordIdentifier>
<ms:metadataCreationDate>2020-02-28</ms:metadataCreationDate>
<ms:metadataCurator>
<ms:actorType>Person</ms:actorType>
<ms:surname xml:lang="en">Smith</ms:surname>
<ms:givenName xml:lang="en">John</ms:givenName>
</ms:metadataCurator>
<ms:compliesWith>http://w3id.org/meta-share/meta-share/ELG-SHARE</ms:compliesWith>
<ms:metadataCreator>
<ms:actorType>Person</ms:actorType>
<ms:surname xml:lang="en">Brown</ms:surname>
<ms:givenName xml:lang="en">George</ms:givenName>
</ms:metadataCreator>
<ms:sourceOfMetadataRecord>
<ms:repositoryName xml:lang="en">ELRC-SHARE</ms:repositoryName>
<ms:repositoryURL>https://www.elrc-share.eu/</ms:repositoryURL>
</ms:sourceOfMetadataRecord>
</ms:MetadataRecord>
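If you prepare metadata files outside the interactive editor, a quick consistency check for sourceOfMetadataRecord may help. The Python sketch below reports which of its two mandatory children (repositoryName, repositoryURL) are missing or empty; the function name is hypothetical and the ms: namespace URI is an assumption, as noted in the code.

```python
import xml.etree.ElementTree as ET
from typing import List

MS = "http://w3id.org/meta-share/meta-share/"  # assumed URI of the ms: prefix
REQUIRED = ("repositoryName", "repositoryURL")  # repositoryIdentifier is optional


def missing_source_fields(record: ET.Element) -> List[str]:
    """Return the mandatory sourceOfMetadataRecord children that are absent or empty."""
    source = record.find(f"{{{MS}}}sourceOfMetadataRecord")
    if source is None:
        # Only recommended for records registered by individuals, so absence is not an error.
        return []
    missing = []
    for name in REQUIRED:
        child = source.find(f"{{{MS}}}{name}")
        if child is None or not (child.text or "").strip():
            missing.append(name)
    return missing


partial = ET.fromstring(
    f'<ms:MetadataRecord xmlns:ms="{MS}">'
    "<ms:sourceOfMetadataRecord>"
    '<ms:repositoryName xml:lang="en">ELRC-SHARE</ms:repositoryName>'
    "</ms:sourceOfMetadataRecord>"
    "</ms:MetadataRecord>"
)
print(missing_source_fields(partial))  # ['repositoryURL']
```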
Minimal elements for all language resources and technologies
This page describes the minimal metadata elements common to all language resources and technologies (LRTs).
1. Overview
| Element name | Optionality | Section | Tab |
|---|---|---|---|
| resourceName | M | LRT | Identity |
| LRIdentifier | R | LRT | Identity |
| resourceShortName | R | LRT | Identity |
| description | M | LRT | Identity |
| version | M | LRT | Identity |
| versionDate | R | LRT | Identity |
| resourceProvider | R | LRT | Identity |
| resourceCreator | R | LRT | Identity |
| publicationDate | R | LRT | Identity |
| fundingProject | R | LRT | Identity |
| logo | R | LRT | Identity |
| sourceOfMetadataRecord | R | LRT | Identity |
| intendedApplication | R | LRT | Categories |
| compliesWith | R | LRT | Categories |
| domain | R | LRT | Categories |
| keyword | M | LRT | Categories |
| additionalInfo | M | LRT | Contact |
| contact | R | LRT | Contact |
| isDocumentedBy | R | LRT | Documentation |
| isToBeCitedBy | R | LRT | Documentation |
| replaces | R | LRT | Related LRTs |
| isVersionOf | R | LRT | Related LRTs |
| isPartOf | R | LRT | Related LRTs |
| isSimilarTo | R | LRT | Related LRTs |
| isRelatedTo | R | LRT | Related LRTs |
| relation | R | LRT | Related LRTs |
2. Element presentation
In this section, each of the elements listed above is presented separately, in the order of the table in the previous section.
resourceName
Path MetadataRecord.DescribedEntity.LanguageResource.resourceName
Data type multilingual string
Optionality Mandatory
Explanation & Instructions
Introduces a human-readable name or title by which the resource is known
This is the “brand name” of your resource; try to use a name that is unique.
Example
<ms:resourceName xml:lang="en">GATE: English Named Entity Recognizer</ms:resourceName>
LRIdentifier
Path MetadataRecord.DescribedEntity.LanguageResource.LRIdentifier
Data type string with attribute
Optionality Recommended when applicable
Explanation & Instructions
A string (e.g., PID, DOI, an identifier internal to an organization, etc.) used to uniquely identify a language resource
You must also use the attribute LRIdentifierScheme to specify the identifier scheme (e.g., DOI, Handle, …).
If the resource is already described in another repository/catalogue and has a PID, please add it with the appropriate attribute.
Example
<ms:LRIdentifier ms:LRIdentifierScheme="http://w3id.org/meta-share/meta-share/elg">ELG id automatically assigned</ms:LRIdentifier>
resourceShortName
Path MetadataRecord.DescribedEntity.LanguageResource.resourceShortName
Data type multilingual string
Optionality Recommended
Explanation & Instructions
Introduces a short form (e.g., abbreviation, acronym, etc.) used to refer to a language resource
Example
<ms:resourceShortName xml:lang="en">annie-named-entity-recognizer</ms:resourceShortName>
description
Path MetadataRecord.DescribedEntity.LanguageResource.description
Data type multilingual string
Optionality Mandatory
Explanation & Instructions
Introduces a short free-text account that provides information about the resource (e.g., service function, contents of a data resource, technical information, etc.)
Example
<ms:description xml:lang="en">Identifies names of persons, locations, organizations, as well as money amounts, time and date expressions in English texts automatically. </ms:description>
version
Path MetadataRecord.DescribedEntity.LanguageResource.version
Data type string
Optionality Mandatory
Explanation & Instructions
Associates a language resource with a pattern that indicates its version; the recommended way is to follow the semantic versioning guidelines (http://semver.org) and use a numeric pattern of the form major_version.minor_version.patch
If no version is provided, the system will automatically assign the resource a ‘v1.0.0 (automatically assigned)’ value
Example
<ms:version>v8.6</ms:version>
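The recommended major.minor.patch pattern can be tested with a regular expression. The Python sketch below is deliberately stricter than the schema, which types version as a plain string: it accepts an optional leading "v" (as in the values on this page) and rejects leading zeros, as the semver rules do. A value such as v8.6 remains valid for ELG; it simply does not follow the recommended three-part pattern. The function name is hypothetical.

```python
import re

# major.minor.patch with no leading zeros (per semver.org); a leading "v"
# is tolerated because the values used in this documentation carry one.
VERSION_PATTERN = re.compile(r"v?(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)")


def follows_recommended_pattern(version: str) -> bool:
    return VERSION_PATTERN.fullmatch(version.strip()) is not None


for value in ("1.0.0", "v2.3.1", "v8.6", "01.2.3"):
    print(value, follows_recommended_pattern(value))
```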
versionDate
Path MetadataRecord.DescribedEntity.LanguageResource.versionDate
Data type date
Optionality Recommended
Explanation & Instructions
Identifies the date associated with the version of the language resource being described (as a recommendation, of the latest update of the particular version)
Example
<ms:versionDate>2020-02-10</ms:versionDate>
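Both versionDate and publicationDate use the date type (xs:date), whose lexical form is YYYY-MM-DD, optionally followed by a time zone (Z or an offset such as +02:00). The Python sketch below checks that form; it ignores negative and five-digit years, which XML Schema also permits, and the function name is hypothetical.

```python
import re
from datetime import date

# YYYY-MM-DD with an optional time zone: "Z" or "+hh:mm" / "-hh:mm".
XS_DATE = re.compile(r"([0-9]{4})-([0-9]{2})-([0-9]{2})(Z|[+-]([0-9]{2}):([0-9]{2}))?")


def is_xs_date(value: str) -> bool:
    m = XS_DATE.fullmatch(value)
    if m is None:
        return False
    year, month, day = (int(g) for g in m.group(1, 2, 3))
    try:
        date(year, month, day)  # rejects impossible dates such as 2020-02-30
    except ValueError:
        return False
    if m.group(4) and m.group(4) != "Z":
        hours, minutes = int(m.group(5)), int(m.group(6))
        # xs:date time zone offsets range from -14:00 to +14:00
        if hours > 14 or minutes > 59 or (hours == 14 and minutes != 0):
            return False
    return True


for value in ("2020-02-10", "2020-02-10+02:00", "2020-02-30", "10/02/2020"):
    print(value, is_xs_date(value))
```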
resourceProvider
Path MetadataRecord.DescribedEntity.LanguageResource.resourceProvider
Data type component
Optionality Recommended
Explanation & Instructions
The person/organization responsible for providing, curating, maintaining and making available (publishing) the resource
The resource provider is very similar to the publisher of scientific articles; it can be an individual or an organization.
For organizations, you must add the name of the organization (organizationName) and, if possible, the website.
For persons, you must add the given name and surname and, if possible, an email address or an identifier (such as ORCID id) to help uniquely identify them.
Example
<ms:resourceProvider>
<ms:Organization>
<ms:actorType>Organization</ms:actorType>
<ms:organizationName xml:lang="en">Organization</ms:organizationName>
<ms:website>https://provider.org/</ms:website>
</ms:Organization>
</ms:resourceProvider>
<ms:resourceProvider>
<ms:Person>
<ms:actorType>Person</ms:actorType>
<ms:surname xml:lang="en">Smith</ms:surname>
<ms:givenName xml:lang="en">John</ms:givenName>
</ms:Person>
</ms:resourceProvider>
resourceCreator
Path MetadataRecord.DescribedEntity.LanguageResource.resourceCreator
Data type component
Optionality Recommended
Explanation & Instructions
Links a resource to the person, group or organization that has created the resource
The element is important for citation and acknowledgement purposes.
For organizations, you must add the name of the organization (organizationName) and, if possible, the website.
For persons, you must add the given name and surname and, if possible, an email address or an identifier (such as ORCID id) to help uniquely identify them.
Example
<ms:resourceCreator>
<ms:Organization>
<ms:actorType>Organization</ms:actorType>
<ms:organizationName xml:lang="en">example organization</ms:organizationName>
<ms:website>https://provider.org/</ms:website>
</ms:Organization>
</ms:resourceCreator>
<ms:resourceCreator>
<ms:Person>
<ms:actorType>Person</ms:actorType>
<ms:surname xml:lang="en">Smith</ms:surname>
<ms:givenName xml:lang="en">John</ms:givenName>
</ms:Person>
</ms:resourceCreator>
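For scripted record creation, the actor structures used by resourceProvider and resourceCreator can be generated rather than written by hand. This Python sketch builds the two shapes shown in the examples (an Organization with organizationName and an optional website, a Person with surname and givenName); the function names are hypothetical, the ms: namespace URI is assumed, and optional details such as an email address or ORCID identifier are left out for brevity.

```python
import xml.etree.ElementTree as ET
from typing import Optional

MS = "http://w3id.org/meta-share/meta-share/"  # assumed URI of the ms: prefix
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
ET.register_namespace("ms", MS)  # serialize with the ms: prefix


def _add(parent: ET.Element, name: str, text: str, lang: Optional[str] = None) -> ET.Element:
    el = ET.SubElement(parent, f"{{{MS}}}{name}")
    el.text = text
    if lang is not None:
        el.set(XML_LANG, lang)
    return el


def organization(role: str, name: str, website: Optional[str] = None) -> ET.Element:
    """role is e.g. "resourceProvider" or "resourceCreator"; organizationName is required."""
    wrapper = ET.Element(f"{{{MS}}}{role}")
    org = ET.SubElement(wrapper, f"{{{MS}}}Organization")
    _add(org, "actorType", "Organization")
    _add(org, "organizationName", name, lang="en")
    if website is not None:
        _add(org, "website", website)
    return wrapper


def person(role: str, surname: str, given_name: str) -> ET.Element:
    """Surname and given name are required; both are given in English here."""
    wrapper = ET.Element(f"{{{MS}}}{role}")
    p = ET.SubElement(wrapper, f"{{{MS}}}Person")
    _add(p, "actorType", "Person")
    _add(p, "surname", surname, lang="en")
    _add(p, "givenName", given_name, lang="en")
    return wrapper


creator = person("resourceCreator", "Smith", "John")
provider = organization("resourceProvider", "Organization", "https://provider.org/")
print(ET.tostring(creator, encoding="unicode"))
```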
publicationDate
Path MetadataRecord.DescribedEntity.LanguageResource.publicationDate
Data type date
Optionality Recommended
Explanation & Instructions
Specifies the date when a language resource has been made available to the public
Publication date is important for citation purposes, just as for scientific articles. If this is the first time your resource is published, please use the same date as for metadataCreationDate. If the resource has been previously published in another repository, please add the date it was first provided there.
Example
<ms:publicationDate>2015-12-17</ms:publicationDate>
fundingProject
Path MetadataRecord.DescribedEntity.LanguageResource.fundingProject
Data type component
Optionality Recommended when applicable
Explanation & Instructions
Links a language resource to the project that has funded its creation, enrichment, extension, etc.
Funding information is important for acknowledgement purposes.
For projects, you must provide the name of the project (projectName) and, if possible, a website (website) and/or an identifier (ProjectIdentifier). You may also provide the short name of the project (projectShortName), a grant number issued by the funding authority (grantNumber), the funder(s) (funder) in the form of an organization, person or group, and a value selected from the fundingType controlled vocabulary.
Example
<ms:fundingProject>
<ms:projectName xml:lang="en">European Language Resource Coordination LOT3</ms:projectName>
<ms:projectShortName xml:lang="en">ELRC - LOT3</ms:projectShortName>
<ms:ProjectIdentifier ms:ProjectIdentifierScheme="http://w3id.org/meta-share/meta-share/other">SMART 2015/1091 - 30-CE-0816766/00-92</ms:ProjectIdentifier>
<ms:website>http://www.lr-coordination.eu</ms:website>
<ms:grantNumber>EU 1234567890</ms:grantNumber>
<ms:fundingType>http://w3id.org/meta-share/meta-share/serviceContract</ms:fundingType>
<ms:fundingType>http://w3id.org/meta-share/meta-share/other</ms:fundingType>
<ms:funder>
<ms:Organization>
<ms:actorType>Organization</ms:actorType>
<ms:organizationName xml:lang="en">Ministry of Research and Innovation</ms:organizationName>
<ms:website>http://www.ministry.org</ms:website>
</ms:Organization>
</ms:funder>
</ms:fundingProject>
logo
Path MetadataRecord.DescribedEntity.LanguageResource.logo
Data type URL
Optionality Recommended
Explanation & Instructions
Links to a URL with an image file containing a symbol or graphic object used to identify the entity
The logo is like a brand name for the resource; it is displayed next to the resource name in the catalogue. In the interactive editor form, you can also upload an image file.
Example
<ms:logo>https://gate.ac.uk/plugins/gau-0.1/images/logo-gate.png</ms:logo>
sourceOfMetadataRecord
Path MetadataRecord.sourceOfMetadataRecord
Data type component
Optionality Recommended
Explanation & Instructions
Refers to the entity (repository, catalogue, archive, etc.) from which the metadata record has been imported into the new catalogue
This element is a property of the metadata record, and it is automatically assigned by the ELG software for records automatically harvested. For records originally included in other catalogues and registered in ELG by individuals, the element can be filled in at the LRT section of the editor.
It consists of two mandatory elements, repositoryName and repositoryURL, and the optional element repositoryIdentifier.
Example
<ms:sourceOfMetadataRecord>
<ms:repositoryName xml:lang="en">ELRC-SHARE</ms:repositoryName>
<ms:repositoryURL>https://www.elrc-share.eu/</ms:repositoryURL>
</ms:sourceOfMetadataRecord>
intendedApplication
Path MetadataRecord.DescribedEntity.LanguageResource.intendedApplication
Data type component
Optionality Recommended
Explanation & Instructions
Specifies an LT application for which the language resource has been created or for which it can be used or is recommended to be used
The element is important for discovery purposes.
You can use the LTClassRecommended element with one of the recommended values from the LT taxonomy (class ‘Function’ of the OMTD-SHARE ontology at http://w3id.org/meta-share/omtd-share/), or add free text in the LTClassOther element.
You can repeat the element if the resource can be used for various applications. For instance, a part-of-speech tagger can be used as a component for named entity recognition, sentiment analysis, etc.
Example
<ms:intendedApplication>
<ms:LTClassRecommended>http://w3id.org/meta-share/omtd-share/NamedEntityRecognition</ms:LTClassRecommended>
</ms:intendedApplication>
<ms:intendedApplication>
<ms:LTClassRecommended>http://w3id.org/meta-share/omtd-share/SentimentAnalysis</ms:LTClassRecommended>
</ms:intendedApplication>
<ms:intendedApplication>
<ms:LTClassOther>face recognition</ms:LTClassOther>
</ms:intendedApplication>
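Choosing between LTClassRecommended and LTClassOther can be automated once you have a list of taxonomy values. In the Python sketch below, KNOWN_CLASSES is a hypothetical three-item stand-in for the LT taxonomy, not the real list, the function name is hypothetical, and the ms: namespace URI is assumed; the same logic would apply to the function element of tools/services.

```python
import xml.etree.ElementTree as ET

MS = "http://w3id.org/meta-share/meta-share/"    # assumed URI of the ms: prefix
OMTD = "http://w3id.org/meta-share/omtd-share/"  # LT taxonomy base, as in the examples

# Hypothetical stand-in for the LT taxonomy (class 'Function' of OMTD-SHARE).
KNOWN_CLASSES = {"NamedEntityRecognition", "SentimentAnalysis", "MachineTranslation"}


def lt_class(wrapper: str, value: str) -> ET.Element:
    """Build <ms:intendedApplication> (or <ms:function>) holding one LT class."""
    root = ET.Element(f"{{{MS}}}{wrapper}")
    if value in KNOWN_CLASSES:
        child = ET.SubElement(root, f"{{{MS}}}LTClassRecommended")
        child.text = OMTD + value
    else:
        # Anything outside the taxonomy goes into the free-text element.
        child = ET.SubElement(root, f"{{{MS}}}LTClassOther")
        child.text = value
    return root


applications = [lt_class("intendedApplication", v)
                for v in ("NamedEntityRecognition", "SentimentAnalysis", "face recognition")]
for app in applications:
    print(app[0].tag.split("}")[1], app[0].text)
```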
compliesWith
Path MetadataRecord.DescribedEntity.LanguageResource.compliesWith
Data type controlled vocabulary
Optionality Recommended
Explanation & Instructions
Specifies the vocabulary/standard/best practice with which a resource complies.
You can use a value from the compliesWith controlled vocabulary.
Example
<ms:compliesWith>http://w3id.org/meta-share/meta-share/LemonOntolex</ms:compliesWith>
domain
Path MetadataRecord.DescribedEntity.LanguageResource.domain
Data type component
Optionality Recommended
Explanation & Instructions
Identifies the domain according to which a resource is classified
You must fill in the categoryLabel element with a free text value. If you prefer to add a value from an established controlled vocabulary, you can also use the DomainIdentifier element (with the DomainClassificationScheme attribute set to the appropriate value).
Example
<ms:domain>
<ms:categoryLabel xml:lang="en">EDUCATION &amp; COMMUNICATIONS</ms:categoryLabel>
<ms:DomainIdentifier ms:DomainClassificationScheme="http://w3id.org/meta-share/meta-share/EUROVOC">32</ms:DomainIdentifier>
</ms:domain>
<ms:domain>
<ms:categoryLabel xml:lang="en">health</ms:categoryLabel>
</ms:domain>
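When you generate records programmatically, an XML library takes care of escaping: a literal & in a label such as EDUCATION & COMMUNICATIONS must appear as &amp; in the file. The Python sketch below builds a domain element with the required categoryLabel and an optional DomainIdentifier; the function name is hypothetical and the ms: namespace URI is assumed.

```python
import xml.etree.ElementTree as ET
from typing import Optional

MS = "http://w3id.org/meta-share/meta-share/"  # assumed URI of the ms: prefix
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
ET.register_namespace("ms", MS)


def domain(label: str, identifier: Optional[str] = None,
           scheme: Optional[str] = None) -> ET.Element:
    """Build <ms:domain>; categoryLabel is required, DomainIdentifier is optional."""
    root = ET.Element(f"{{{MS}}}domain")
    category = ET.SubElement(root, f"{{{MS}}}categoryLabel")
    category.text = label
    category.set(XML_LANG, "en")
    if identifier is not None:
        if scheme is None:
            raise ValueError("DomainIdentifier needs a DomainClassificationScheme")
        ident = ET.SubElement(root, f"{{{MS}}}DomainIdentifier")
        ident.text = identifier
        ident.set(f"{{{MS}}}DomainClassificationScheme", scheme)
    return root


xml_text = ET.tostring(domain("EDUCATION & COMMUNICATIONS", "32", MS + "EUROVOC"),
                       encoding="unicode")
print(xml_text)  # the label is serialized as EDUCATION &amp; COMMUNICATIONS
```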
keyword
Path MetadataRecord.DescribedEntity.LanguageResource.keyword
Data type multilingual string
Optionality Mandatory
Explanation & Instructions
Introduces a word or phrase considered important for the description of a language resource, person or organization and thus used to index or classify it
You can repeat the element if you want to add more keywords. Keywords are used for discovery purposes; so, try to use words or phrases that you think users will use to find similar resources to yours.
Example
<ms:keyword xml:lang="en">Named entity recognition</ms:keyword>
<ms:keyword xml:lang="en">person</ms:keyword>
<ms:keyword xml:lang="en">location</ms:keyword>
<ms:keyword xml:lang="en">fake news</ms:keyword>
<ms:keyword xml:lang="en">tweets</ms:keyword>
additionalInfo
Path MetadataRecord.DescribedEntity.LanguageResource.additionalInfo
Data type component
Optionality Mandatory
Explanation & Instructions
Introduces a point that can be used for further information (e.g. a landing page with a more detailed description of the resource or a general email that can be contacted for further queries)
It’s a recommended practice to give at least a landing page (landingPage) or a general email address (email); if you want, you can also specify a contact person (see contactPerson in the full schema).
Example
<ms:additionalInfo>
<ms:landingPage>https://provider.example.com/product</ms:landingPage>
</ms:additionalInfo>
<ms:additionalInfo>
<ms:email>product@example.com</ms:email>
</ms:additionalInfo>
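The additionalInfo recommendation (at least a landing page or an email address) translates into a simple check. This Python sketch is illustrative only; the function name is hypothetical and the ms: namespace URI is assumed.

```python
import xml.etree.ElementTree as ET

MS = "http://w3id.org/meta-share/meta-share/"  # assumed URI of the ms: prefix


def has_contact_point(resource: ET.Element) -> bool:
    """True if any <ms:additionalInfo> holds a non-empty landingPage or email."""
    for info in resource.findall(f"{{{MS}}}additionalInfo"):
        for name in ("landingPage", "email"):
            value = info.findtext(f"{{{MS}}}{name}")
            if value and value.strip():
                return True
    return False


with_email = ET.fromstring(
    f'<ms:LanguageResource xmlns:ms="{MS}">'
    "<ms:additionalInfo><ms:email>product@example.com</ms:email></ms:additionalInfo>"
    "</ms:LanguageResource>"
)
without_info = ET.fromstring(f'<ms:LanguageResource xmlns:ms="{MS}"/>')
print(has_contact_point(with_email), has_contact_point(without_info))  # True False
```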
contact
Path MetadataRecord.DescribedEntity.LanguageResource.contact
Data type component
Optionality Recommended
Explanation & Instructions
Specifies the data of the person/organization/group that can be contacted for information about a language resource
Example
<ms:contact>
<ms:Person>
<ms:actorType>Person</ms:actorType>
<ms:surname xml:lang="en">Smith</ms:surname>
<ms:givenName xml:lang="en">John</ms:givenName>
<ms:PersonalIdentifier ms:PersonalIdentifierScheme="http://purl.org/spar/datacite/orcid">String</ms:PersonalIdentifier>
<ms:email>smith@example.com</ms:email>
</ms:Person>
</ms:contact>
isDocumentedBy
Path MetadataRecord.DescribedEntity.LanguageResource.isDocumentedBy
Data type component
Optionality Recommended
Explanation & Instructions
Links a language resource to a document (e.g., research paper describing its contents or its use in a project, user manual, etc.) or any other form of documentation (e.g., a URL with support information) that is related to the resource
You can use this element to add
supporting documentation (user manuals, training material, etc.) for the installation and use of your resource
scientific publications that describe the resource.
If you want, you can use one of the more fine-grained relations to documents (see full schema).
You can repeat the element if you want to add more documents.
You must fill in the title element with the title of the document (or even an entire bibliographic record). When available, it’s also recommended to add the DocumentIdentifier with the DOI of the document, or any other link to the document; if you do, use the DocumentIdentifierScheme attribute to indicate the identifier type.
Example
<ms:isDocumentedBy>
<ms:title xml:lang="en">Product User Manual</ms:title>
<ms:DocumentIdentifier ms:DocumentIdentifierScheme="http://purl.org/spar/datacite/url">https://www.company.org/product.pdf</ms:DocumentIdentifier>
</ms:isDocumentedBy>
replaces
Path MetadataRecord.DescribedEntity.LanguageResource.replaces
Data type component
Optionality Recommended
Explanation & Instructions
Links two Language Resources: the one being described to another which is an older version and has been replaced
You must provide the resourceName of the language resource and, if possible, an LRIdentifier that will help uniquely identify it.
Example
<ms:replaces>
<ms:resourceName xml:lang="en">COVID-19 Concept Embeddings</ms:resourceName>
<ms:LRIdentifier ms:LRIdentifierScheme="http://w3id.org/meta-share/meta-share/doi">https://zenodo.org/record/3753531</ms:LRIdentifier>
</ms:replaces>
isVersionOf
Path MetadataRecord.DescribedEntity.LanguageResource.isVersionOf
Data type component
Optionality Recommended
Explanation & Instructions
Links two Language Resources: the one being described to another which is a version (corrected, annotated, enriched, processed, etc.) of it
You must provide the resourceName of the language resource and, if possible, an LRIdentifier that will help uniquely identify it.
Example
<ms:isVersionOf>
<ms:resourceName xml:lang="en">COVID-19 Concept Embeddings</ms:resourceName>
<ms:LRIdentifier ms:LRIdentifierScheme="http://w3id.org/meta-share/meta-share/doi">https://zenodo.org/record/3753531</ms:LRIdentifier>
</ms:isVersionOf>
isPartOf
Path MetadataRecord.DescribedEntity.LanguageResource.isPartOf
Data type component
Optionality Recommended
Explanation & Instructions
Links two Language Resources: the one being described to another containing it (e.g., a monolingual corpus which is a part of a bilingual corpus)
You must provide the resourceName of the language resource and, if possible, an LRIdentifier that will help uniquely identify it.
Example
<ms:isPartOf>
<ms:resourceName xml:lang="en">Multilingual Example corpus</ms:resourceName>
<ms:LRIdentifier ms:LRIdentifierScheme="http://w3id.org/meta-share/meta-share/doi">https://zenodo.org/record/123456789</ms:LRIdentifier>
</ms:isPartOf>
isSimilarTo
Path MetadataRecord.DescribedEntity.LanguageResource.isSimilarTo
Data type component
Optionality Recommended
Explanation & Instructions
Links two Language Resources: the one being described to another to which it bears a resemblance. Examples are: two resources that have been built on the same theoretical principles; the same resource in different formats, or processed at the same level with different tools.
You must provide the resourceName of the language resource and, if possible, an LRIdentifier that will help uniquely identify it.
Example
<ms:isSimilarTo>
<ms:resourceName xml:lang="en">Multilingual Example corpus</ms:resourceName>
<ms:LRIdentifier ms:LRIdentifierScheme="http://w3id.org/meta-share/meta-share/doi">https://zenodo.org/record/123456789</ms:LRIdentifier>
</ms:isSimilarTo>
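The replaces, isVersionOf, isPartOf and isSimilarTo elements share the same content: a required resourceName and an optional LRIdentifier with its scheme attribute. A single builder can therefore cover all four; in this Python sketch the function name is hypothetical and the ms: namespace URI is assumed. Because the wrapper element is created from one tag name, its opening and closing tags always match when serialized.

```python
import xml.etree.ElementTree as ET
from typing import Optional

MS = "http://w3id.org/meta-share/meta-share/"  # assumed URI of the ms: prefix
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
ET.register_namespace("ms", MS)
RELATIONS = {"replaces", "isVersionOf", "isPartOf", "isSimilarTo"}


def related_lr(relation: str, name: str, identifier: Optional[str] = None,
               scheme: Optional[str] = None) -> ET.Element:
    """Build e.g. <ms:isPartOf> with a required resourceName and an optional LRIdentifier."""
    if relation not in RELATIONS:
        raise ValueError(f"unsupported relation: {relation}")
    root = ET.Element(f"{{{MS}}}{relation}")
    resource_name = ET.SubElement(root, f"{{{MS}}}resourceName")
    resource_name.text = name
    resource_name.set(XML_LANG, "en")
    if identifier is not None:
        if scheme is None:
            raise ValueError("LRIdentifier needs an LRIdentifierScheme")
        lr_id = ET.SubElement(root, f"{{{MS}}}LRIdentifier")
        lr_id.text = identifier
        lr_id.set(f"{{{MS}}}LRIdentifierScheme", scheme)
    return root


part_of = related_lr("isPartOf", "Multilingual Example corpus",
                     "https://zenodo.org/record/123456789", MS + "doi")
print(ET.tostring(part_of, encoding="unicode"))
```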
relation
Path MetadataRecord.DescribedEntity.LanguageResource.relation
Data type component
Optionality Recommended
Explanation & Instructions
Links two Language Resources and specifies the type of relation between them
You must provide the relationType (free text) and, for the relatedLR, the resourceName of the language resource and, if possible, an LRIdentifier that will help uniquely identify it.
Example
<ms:relation>
<ms:relationType xml:lang="en">new relation</ms:relationType>
<ms:relatedLR>
<ms:resourceName xml:lang="en">COVID-19 Concept Embeddings</ms:resourceName>
<ms:LRIdentifier ms:LRIdentifierScheme="http://w3id.org/meta-share/meta-share/doi">https://zenodo.org/record/3753531</ms:LRIdentifier>
</ms:relatedLR>
</ms:relation>
Minimal elements for tools/services
This page describes the minimal metadata elements specific to tools/services.
1. Overview
| Element name | Optionality | Section | Tab |
|---|---|---|---|
| function | M | Tool/Service | categories |
| developmentFramework | R | Tool/Service | categories |
| implementationLanguage | R | Tool/Service | categories |
| languageDependent | M | Tool/Service | technical |
| inputContentResource | M | Tool/Service | technical |
| processingResourceType | M | Tool/Service | technical |
| language | MA | Tool/Service | technical |
| mediaType | R | Tool/Service | technical |
| dataFormat | R | Tool/Service | technical |
| annotationType | R | Tool/Service | technical |
| sample | R | Tool/Service | technical |
| outputResource | R | Tool/Service | technical |
| processingResourceType | M | Tool/Service | technical |
| language | MA | Tool/Service | technical |
| mediaType | R | Tool/Service | technical |
| dataFormat | R | Tool/Service | technical |
| annotationType | R | Tool/Service | technical |
| requiredHardware | R | Tool/Service | technical |
| mlModel | R | Tool/Service | technical |
| parameter | R | Tool/Service | technical |
| evaluated | R | Tool/Service | evaluation |
| trl | R | Tool/Service | evaluation |
| SoftwareDistribution | M | distribution | technical |
| SoftwareDistributionForm | M | distribution | technical |
| webServiceType | MA | distribution | technical |
| dockerDownloadLocation | RA | distribution | technical |
| serviceAdapterDownloadLocation | RA | distribution | technical |
| downloadLocation | RA | distribution | technical |
| executionLocation | RA | distribution | technical |
| accessLocation | RA | distribution | technical |
| demoLocation | R | distribution | technical |
| privateResource | R | distribution | technical |
| additionalHWRequirements | R | distribution | technical |
| isDescribedBy | R | distribution | technical |
| licenceTerms | M | distribution | technical |
| cost | R | distribution | technical |
| membershipInstitution | R | distribution | technical |
2. Element presentation
In this section, each of the elements listed above is presented separately, in the order of the table in the previous section.
function
Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.ToolService.function
Data type component
Optionality Mandatory
Explanation & Instructions
Specifies the operation/function/task that a software object performs
The element is important for discovery purposes.
You can fill in:
the LTClassRecommended element with one of the recommended values from the LT taxonomy, or
the LTClassOther element with free text.
For services that perform multiple functions (e.g., syntactic and semantic annotation) you can repeat the element.
Example
<ms:function>
<ms:LTClassRecommended>http://w3id.org/meta-share/omtd-share/NamedEntityRecognition</ms:LTClassRecommended>
</ms:function>
<ms:function>
<ms:LTClassRecommended>http://w3id.org/meta-share/omtd-share/MachineTranslation</ms:LTClassRecommended>
</ms:function>
<ms:function>
<ms:LTClassOther>video segmentation</ms:LTClassOther>
</ms:function>
developmentFramework
Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.ToolService.developmentFramework
Data type CV
Optionality Recommended
Explanation & Instructions
A framework or toolkit (Machine Learning model, NLP toolkit) used in the development of a resource
Example
<ms:developmentFramework>
<ms:DevelopmentFrameworkRecommended>http://w3id.org/meta-share/meta-share/TensorFlow</ms:DevelopmentFrameworkRecommended>
</ms:developmentFramework>
implementationLanguage
Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.ToolService.implementationLanguage
Data type string
Optionality Recommended
Explanation & Instructions
The programming language(s) used for the development of a tool/service; this information is needed for running the tool/service when no executables are available
Example
<ms:implementationLanguage>Java v8</ms:implementationLanguage>
languageDependent
Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.ToolService.languageDependent
Data type boolean
Optionality Mandatory
Explanation & Instructions
Indicates whether the operation of the tool or service is language dependent or not
For language-dependent tools/services, you will be asked to also provide the language of the input and output resources.
Example
<ms:languageDependent>true</ms:languageDependent>
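The dependency between languageDependent and the language of the input and output resources can be checked as in the Python sketch below. It is a simplified illustration: the function name is hypothetical, the ms: namespace URI is assumed, and it only looks at direct inputContentResource and outputResource children of the element passed in.

```python
import xml.etree.ElementTree as ET
from typing import List

MS = "http://w3id.org/meta-share/meta-share/"  # assumed URI of the ms: prefix


def resources_missing_language(tool: ET.Element) -> List[str]:
    """For a language-dependent ToolService, name the input/output resources lacking <ms:language>."""
    flag = (tool.findtext(f"{{{MS}}}languageDependent") or "").strip()
    if flag not in ("true", "1"):  # xs:boolean also allows "1"/"0"
        return []
    missing = []
    for tag in ("inputContentResource", "outputResource"):
        for resource in tool.findall(f"{{{MS}}}{tag}"):
            if resource.find(f"{{{MS}}}language") is None:
                missing.append(tag)
    return missing


tool = ET.fromstring(
    f'<ms:ToolService xmlns:ms="{MS}">'
    "<ms:languageDependent>true</ms:languageDependent>"
    "<ms:inputContentResource><ms:language><ms:languageTag>en</ms:languageTag>"
    "</ms:language></ms:inputContentResource>"
    "<ms:outputResource/>"
    "</ms:ToolService>"
)
print(resources_missing_language(tool))  # ['outputResource']
```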
inputContentResource
Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.ToolService.inputContentResource
Data type component
Optionality Mandatory
Explanation & Instructions
Specifies the requirements set by a tool/service for the (content) resource that it processes
The following elements are mandatory or recommended:
processingResourceType (Mandatory): Specifies the resource type that a tool/service takes as input or produces as output; you must specify, for instance, whether the tool/service can process a single file or a set of files, or processes a string typed in by the user.
language (Mandatory if applicable): Specifies the language that is used in the resource or supported by the tool/service, expressed according to the BCP 47 recommendation. See language.
mediaType (Recommended): Specifies the media type of the input/output of a language processing tool/service. For ELG functional services, this will be used to fit the appropriate GUI (e.g., “audio” for ASR applications vs. “text” for Machine Translation applications).
dataFormat (Recommended): Indicates the format(s) of a data resource. Use it to indicate the data format of the resource supported by the tool/service. The dataFormat controlled vocabulary lists data formats with their MIME types and documentation on their particularities, thus catering for variants of formats, e.g. GATE XML, TEI variants, etc. You may also use a free text value.
characterEncoding (Recommended if applicable): Specifies the character encoding used for the input/output text resource of an LT service.
annotationType (Recommended if applicable): Specifies the annotation type of the annotated version(s) of a resource or the annotation type a tool/service requires or produces as an output. Use this element only if the tool/service processes pre-annotated corpora; do not use it for tools/services that process raw files. The element takes a value from the annotationType controlled vocabulary or a free text value.
Example
<!-- example for a tool with textual input -->
<ms:inputContentResource>
<ms:processingResourceType>http://w3id.org/meta-share/meta-share/file1</ms:processingResourceType>
<ms:language>
<ms:languageTag>en</ms:languageTag> <ms:languageId>en</ms:languageId>
</ms:language>
<ms:mediaType>http://w3id.org/meta-share/meta-share/text</ms:mediaType>
<ms:dataFormat><ms:dataFormatRecommended>http://w3id.org/meta-share/omtd-share/Json</ms:dataFormatRecommended></ms:dataFormat>
<ms:characterEncoding>http://w3id.org/meta-share/meta-share/UTF-8</ms:characterEncoding>
</ms:inputContentResource>
<!-- example for an Automatic Speech Recognizer -->
<ms:inputContentResource>
<ms:processingResourceType>http://w3id.org/meta-share/meta-share/file1</ms:processingResourceType>
<ms:language>
<ms:languageTag>de</ms:languageTag> <ms:languageId>de</ms:languageId>
</ms:language>
<ms:mediaType>http://w3id.org/meta-share/meta-share/audio</ms:mediaType>
<ms:dataFormat><ms:dataFormatRecommended>http://w3id.org/meta-share/omtd-share/mp3</ms:dataFormatRecommended></ms:dataFormat>
<ms:dataFormat><ms:dataFormatRecommended>http://w3id.org/meta-share/omtd-share/wav</ms:dataFormatRecommended></ms:dataFormat>
</ms:inputContentResource>
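If records are generated by a script rather than the interactive editor, the input description can be assembled programmatically. The following is a minimal Python sketch using the standard library's `xml.etree.ElementTree`. It assumes the `ms:` prefix is bound to `http://w3id.org/meta-share/meta-share/` and that `languageTag` equals `languageId`, meaning there are no script, region or variant subtags. The helper names `sub` and `input_content_resource` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed namespace binding for the ms: prefix, plus the OMTD-SHARE vocabulary base.
MS = "http://w3id.org/meta-share/meta-share/"
OMTD = "http://w3id.org/meta-share/omtd-share/"
ET.register_namespace("ms", MS)


def sub(parent, tag, text=None):
    """Append an ms:-qualified child element, optionally with text content."""
    child = ET.SubElement(parent, f"{{{MS}}}{tag}")
    if text is not None:
        child.text = text
    return child


def input_content_resource(lang, media, formats, encoding=None):
    """Build an ms:inputContentResource for a tool that processes single files.

    Hypothetical helper: languageTag and languageId are both set to `lang`,
    which is only correct for bare language subtags such as "en" or "de".
    """
    root = ET.Element(f"{{{MS}}}inputContentResource")
    sub(root, "processingResourceType", MS + "file1")
    language = sub(root, "language")
    sub(language, "languageTag", lang)
    sub(language, "languageId", lang)
    sub(root, "mediaType", MS + media)
    for fmt in formats:
        # dataFormat is repeated once per supported format.
        data_format = sub(root, "dataFormat")
        sub(data_format, "dataFormatRecommended", OMTD + fmt)
    if encoding is not None:
        # Only meaningful for textual input.
        sub(root, "characterEncoding", MS + encoding)
    return root


asr_input = input_content_resource("de", "audio", ["mp3", "wav"])
print(ET.tostring(asr_input, encoding="unicode"))
```

Passing `"UTF-8"` as the encoding produces the text-input variant. The helper does not validate values against the controlled vocabularies.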
outputResource¶
Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.ToolService.outputResource
Data type component
Optionality Recommended if applicable
Explanation & Instructions
Describes the features of the output resource processed by a tool/service.
The set of elements is the same as for the inputContentResource.
Make sure that you add here what is relevant for your application. For instance:
for annotation and information extraction tools/services, use the annotationType element to indicate the results of your processing; you can repeat it to indicate multiple annotation types (e.g., part of speech, person, amount, location, etc.);
for Machine Translation tools, indicate the input and output languages in the inputContentResource and the outputResource respectively.
Example
<!-- example for an Information Extraction tool -->
<ms:outputResource>
<ms:processingResourceType>http://w3id.org/meta-share/meta-share/file1</ms:processingResourceType>
<ms:language>
<ms:languageTag>en</ms:languageTag>
<ms:languageId>en</ms:languageId>
</ms:language>
<ms:mediaType>http://w3id.org/meta-share/meta-share/text</ms:mediaType>
<ms:dataFormat><ms:dataFormatRecommended>http://w3id.org/meta-share/omtd-share/Json</ms:dataFormatRecommended></ms:dataFormat>
<ms:characterEncoding>http://w3id.org/meta-share/meta-share/UTF-8</ms:characterEncoding>
<ms:annotationType><ms:annotationTypeRecommended>http://w3id.org/meta-share/omtd-share/Person</ms:annotationTypeRecommended></ms:annotationType>
<ms:annotationType><ms:annotationTypeRecommended>http://w3id.org/meta-share/omtd-share/Location</ms:annotationTypeRecommended></ms:annotationType>
<ms:annotationType><ms:annotationTypeRecommended>http://w3id.org/meta-share/omtd-share/Organization</ms:annotationTypeRecommended></ms:annotationType>
<ms:annotationType><ms:annotationTypeRecommended>http://w3id.org/meta-share/omtd-share/Date</ms:annotationTypeRecommended></ms:annotationType>
</ms:outputResource>
<!-- example for a Machine Translation tool -->
<ms:outputResource>
<ms:processingResourceType>http://w3id.org/meta-share/meta-share/file1</ms:processingResourceType>
<ms:language>
<ms:languageTag>en</ms:languageTag>
<ms:languageId>en</ms:languageId>
</ms:language>
<ms:mediaType>http://w3id.org/meta-share/meta-share/text</ms:mediaType>
<ms:dataFormat><ms:dataFormatRecommended>http://w3id.org/meta-share/omtd-share/Json</ms:dataFormatRecommended></ms:dataFormat>
<ms:characterEncoding>http://w3id.org/meta-share/meta-share/UTF-8</ms:characterEncoding>
</ms:outputResource>
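Because annotationType is repeatable, it is easy to list the same value twice when copying lines. A small check like the following Python sketch can flag repeated values before a record is submitted. It assumes the `ms:` namespace URI shown and only inspects values from the recommended vocabulary (`annotationTypeRecommended`). `parse_fragment` and `duplicate_annotation_types` are hypothetical helpers, not part of any ELG tooling.

```python
import xml.etree.ElementTree as ET
from collections import Counter

MS = "http://w3id.org/meta-share/meta-share/"  # assumed binding of the ms: prefix


def parse_fragment(fragment):
    """Parse a fragment that uses ms: without declaring it; return its first element."""
    return ET.fromstring(f'<wrapper xmlns:ms="{MS}">{fragment}</wrapper>')[0]


def duplicate_annotation_types(output_resource):
    """Return annotation type values that occur more than once, sorted."""
    values = [
        el.text.strip()
        for el in output_resource.iter(f"{{{MS}}}annotationTypeRecommended")
        if el.text
    ]
    return sorted(value for value, count in Counter(values).items() if count > 1)


fragment = """
<ms:outputResource>
  <ms:annotationType><ms:annotationTypeRecommended>http://w3id.org/meta-share/omtd-share/Person</ms:annotationTypeRecommended></ms:annotationType>
  <ms:annotationType><ms:annotationTypeRecommended>http://w3id.org/meta-share/omtd-share/Date</ms:annotationTypeRecommended></ms:annotationType>
  <ms:annotationType><ms:annotationTypeRecommended>http://w3id.org/meta-share/omtd-share/Date</ms:annotationTypeRecommended></ms:annotationType>
</ms:outputResource>
"""
print(duplicate_annotation_types(parse_fragment(fragment)))
# ['http://w3id.org/meta-share/omtd-share/Date']
```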
language¶
Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.ToolService.language
Data type component
Optionality Mandatory if applicable
Explanation & Instructions
Specifies the language that is used in the resource or supported by the tool/service, expressed according to the BCP47 recommendation
The element languageTag is composed of the languageId and, optionally, the scriptId, regionId and variantId; you can use those elements that best describe the language(s) of your resource.
Example
<ms:language>
<ms:languageTag>en</ms:languageTag>
<ms:languageId>en</ms:languageId>
</ms:language>
<ms:language>
<ms:languageTag>en-US</ms:languageTag>
<ms:languageId>en</ms:languageId>
<ms:regionId>US</ms:regionId>
</ms:language>
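The composition rule can be illustrated with a short Python sketch. It joins the subtags in BCP47 order (language, script, region, variant) and applies the usual BCP47 casing conventions. The function name is hypothetical, and the function does not validate subtags against the IANA Language Subtag Registry.

```python
from typing import Optional


def compose_language_tag(language_id: str,
                         script_id: Optional[str] = None,
                         region_id: Optional[str] = None,
                         variant_id: Optional[str] = None) -> str:
    """Join subtags as language[-script][-region][-variant] (no validation)."""
    parts = [language_id.lower()]          # language: lowercase, e.g. "en"
    if script_id:
        parts.append(script_id.title())    # script: title case, e.g. "Latn"
    if region_id:
        parts.append(region_id.upper())    # region: uppercase, e.g. "US"
    if variant_id:
        parts.append(variant_id.lower())   # variant: lowercase, e.g. "1996"
    return "-".join(parts)


print(compose_language_tag("en", region_id="US"))   # en-US
print(compose_language_tag("sr", "Latn", "RS"))     # sr-Latn-RS
```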
sample¶
Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.ToolService.sample
Data type component
Optionality Recommended
Explanation & Instructions
Introduces a combination of the sample text(s) or sample file(s) and optional tags that can be used for feeding a processing service for testing purposes.
You can add a free text value using the sampleText element and/or a link to a text using the samplesLocation element. You can also introduce a tag (tag) that can be used as a criterion for selecting different samples for testing (e.g. the language value for Machine Translation services that operate on multiple languages).
Example
<ms:sample>
<ms:sampleText>John is in Berlin.</ms:sampleText>
<ms:tag>en</ms:tag>
</ms:sample>
<ms:sample>
<ms:sampleText>Jean est à Berlin.</ms:sampleText>
<ms:tag>fr</ms:tag>
</ms:sample>
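When a service offers several samples, a client can use the tag to pick the appropriate one, for instance by language. The Python sketch below groups sample texts by tag. It assumes the `ms:` namespace URI shown and handles only sampleText, not samplesLocation. `samples_by_tag` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

MS = "http://w3id.org/meta-share/meta-share/"  # assumed binding of the ms: prefix


def samples_by_tag(fragment):
    """Map each tag value (None if absent) to the sample texts carrying it."""
    root = ET.fromstring(f'<wrapper xmlns:ms="{MS}">{fragment}</wrapper>')
    grouped = defaultdict(list)
    for sample in root.iter(f"{{{MS}}}sample"):
        tag = sample.findtext(f"{{{MS}}}tag")
        for text_el in sample.findall(f"{{{MS}}}sampleText"):
            grouped[tag].append(text_el.text)
    return dict(grouped)


samples = """
<ms:sample><ms:sampleText>John is in Berlin.</ms:sampleText><ms:tag>en</ms:tag></ms:sample>
<ms:sample><ms:sampleText>Jean est à Berlin.</ms:sampleText><ms:tag>fr</ms:tag></ms:sample>
"""
print(samples_by_tag(samples).get("fr"))  # ['Jean est à Berlin.']
```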
requiredHardware¶
Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.ToolService.requiredHardware
Data type CV (requiredHardware)
Optionality Recommended
Explanation & Instructions
Specifies the type of hardware required for running a tool and/or computational grammar
Example
<ms:requiredHardware>http://w3id.org/meta-share/meta-share/ocrSystem</ms:requiredHardware>
mlModel¶
Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.ToolService.mlModel
Data type component
Optionality Recommended
Explanation & Instructions
Specifies the ML model that must be used together with the tool/service to perform the desired task
You must provide the resourceName of the language resource and, if possible, an LRIdentifier that will help uniquely identify it.
Example
<ms:mlModel>
<ms:resourceName xml:lang="en">Bio2Vec - Results from October 13, 2017</ms:resourceName>
<ms:LRIdentifier ms:LRIdentifierScheme="http://w3id.org/meta-share/meta-share/url">https://live.european-language-grid.eu/catalogue/ld/7509</ms:LRIdentifier>
</ms:mlModel>
parameter¶
Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.ToolService.parameter
Data type component
Optionality Recommended
Explanation & Instructions
Introduces a parameter used for running a tool/service
It can be filled in with the following elements:
parameterName
(M): Introduces the name of the parameter as sent to a processing service.
parameterLabel
(M): Introduces a short name for a parameter, suitable for use as a field label in a user interface.
parameterDescription
(M): Provides a short account of the parameter (e.g., the function it performs, input/output requirements, etc.) in free text.
parameterType
(M): Classifies the parameter according to a specific (not yet standardised) typing system (e.g., whether it is boolean, string, integer, a document, mapping, etc.).
optional
(M): Specifies whether the parameter should be treated as mandatory or optional by user interfaces.
multiValue
(M): Specifies whether the parameter takes a list of values.
defaultValue
(MA): Specifies the initial value that user interfaces should use when prompting the user for a parameter taking a list of values.
dataFormat
(MA): Use to specify the data format, if applicable, for the input/output resource that can be used in the parameter; it takes a value from a recommended controlled vocabulary or a free text value.
enumerationValue
(MA): Introduces a value of a list used inside parameters; it is a component with the following elements: valueLabel and valueDescription.
Example
<ms:parameter>
<ms:parameterName>no_global</ms:parameterName>
<ms:parameterLabel xml:lang="en">Skip global relation extraction</ms:parameterLabel>
<ms:parameterDescription xml:lang="en">Speedup for large documents, but less extracted relations and lower accuracy.</ms:parameterDescription>
<ms:parameterType>http://w3id.org/meta-share/meta-share/boolean</ms:parameterType>
<ms:optional>true</ms:optional>
<ms:multiValue>false</ms:multiValue>
<ms:defaultValue>false</ms:defaultValue>
</ms:parameter>
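A client building a form from this metadata has to turn the textual `true`/`false` values into real booleans. The Python sketch below reads the mandatory fields of one parameter and converts the default for boolean parameters only. It assumes the `ms:` namespace URI shown. `Parameter`, `parse_parameter` and `typed_default` are hypothetical names, and other parameter types are returned as raw text.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional

MS = "http://w3id.org/meta-share/meta-share/"  # assumed binding of the ms: prefix


@dataclass
class Parameter:
    name: str
    label: str
    type_uri: str
    optional: bool
    multi_value: bool
    default: Optional[str]  # raw text; defaultValue is only mandatory if applicable


def _as_bool(text: Optional[str]) -> bool:
    return (text or "").strip().lower() == "true"


def parse_parameter(fragment: str) -> Parameter:
    """Read the fields of a single ms:parameter element."""
    el = ET.fromstring(f'<wrapper xmlns:ms="{MS}">{fragment}</wrapper>')[0]

    def get(tag: str) -> Optional[str]:
        return el.findtext(f"{{{MS}}}{tag}")

    return Parameter(
        name=get("parameterName"),
        label=get("parameterLabel"),
        type_uri=get("parameterType"),
        optional=_as_bool(get("optional")),
        multi_value=_as_bool(get("multiValue")),
        default=get("defaultValue"),
    )


def typed_default(param: Parameter):
    """Convert the default to a Python value; only booleans are handled here."""
    if param.default is None:
        return None
    if param.type_uri == MS + "boolean":
        return _as_bool(param.default)
    return param.default


example = """
<ms:parameter>
  <ms:parameterName>no_global</ms:parameterName>
  <ms:parameterLabel xml:lang="en">Skip global relation extraction</ms:parameterLabel>
  <ms:parameterType>http://w3id.org/meta-share/meta-share/boolean</ms:parameterType>
  <ms:optional>true</ms:optional>
  <ms:multiValue>false</ms:multiValue>
  <ms:defaultValue>false</ms:defaultValue>
</ms:parameter>
"""
param = parse_parameter(example)
print(param.name, param.optional, typed_default(param))  # no_global True False
```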
trl¶
Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.ToolService.trl
Data type CV (TRL)
Optionality Recommended
Explanation & Instructions
Specifies the TRL (Technology Readiness Level) of the technology according to the measurement system defined by the EC (https://ec.europa.eu/research/participants/data/ref/h2020/wp/2014_2015/annexes/h2020-wp1415-annex-g-trl_en.pdf)
Example
<ms:trl>http://w3id.org/meta-share/meta-share/trl4</ms:trl>
evaluated¶
Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.ToolService.evaluated
Data type boolean
Optionality Mandatory
Explanation & Instructions
Indicates whether the tool or service has been evaluated
If the tool/service has been evaluated, you can use the ‘evaluation’ component to give more detailed information; see here for the relevant elements.
Example
<ms:evaluated>false</ms:evaluated>
SoftwareDistribution¶
Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.ToolService.SoftwareDistribution
Data type component
Optionality Mandatory
Explanation & Instructions
Any form with which software is distributed (e.g., web services, executable or code files, etc.)
This element groups together information that pertains to the physical form of a tool/service that is made available through the catalogue. For software that is distributed with multiple forms (e.g., as source code, as a web service, etc.), you can repeat this group of elements. The access location and the licensing conditions may differ for each distribution.
The following list includes the mandatory and recommended elements:
SoftwareDistributionForm
(Mandatory): The medium, delivery channel or form (e.g., source code, API, web service, etc.) through which a software object is distributed. Use the value http://w3id.org/meta-share/meta-share/dockerImage for ELG integrated services.
webServiceType
(Recommended if applicable): The type of a web service following the web service communication protocols. Recommended for web services.
dockerDownloadLocation
(Mandatory if applicable): A location where the LT tool docker image is stored. For ELG integrated services, add the location from where the ELG team can download the docker image in order to test it.
serviceAdapterDownloadLocation
(Mandatory if applicable): The URL where the docker image of the service adapter can be downloaded from. Required only for ELG integrated services implemented with an adapter.
executionLocation
(Mandatory if applicable): A URL where the resource (mainly software) can be directly executed. Add here the REST endpoint at which the LT tool is exposed within the Docker image. It is also used for software available in the form of executable code or web services.
downloadLocation
(Mandatory if applicable): A URL where a tool can be downloaded from. To be used only for direct links, i.e. for links that require no extra actions on the part of the user.
accessLocation
(Mandatory if applicable): A URL where a tool can be accessed. It can be used, for instance, for links to tools that are included in a web page, or for tools that require authentication and authorization before being accessed.
demoLocation
(Recommended if applicable): A URL providing access to a demo version of the tool/service. For ELG integrated services, this does not have to be filled in, since ELG provides a demo version at the “Try out” tab of the metadata record.
privateResource
(Recommended): Specifies whether the resource is private, so that its access/download location remains hidden.
additionalHWRequirements
(Mandatory if applicable): A short text where you specify additional requirements for running the service, e.g. memory requirements, etc. The recommended format for this is: ‘limits_memory: X limits_cpu: Y’.
licenceTerms
(Mandatory): See licenceTerms.
cost
(Recommended if applicable): The cost for accessing a resource or the overall budget of a project, formally described as a set of amount (amount) and currency unit (currency). Fill in this element only if the tool/service can be accessed for a fee.
membershipInstitution
(Recommended if applicable): Introduces an institution with members that can benefit from specific conditions on the use of a resource (e.g. discount, unlimited access, etc.). Use this element only if such specific conditions apply.
Example
<ms:SoftwareDistribution>
<ms:SoftwareDistributionForm>http://w3id.org/meta-share/meta-share/dockerImage</ms:SoftwareDistributionForm>
<ms:executionLocation>http://localhost:8080/mt/process/</ms:executionLocation>
<ms:dockerDownloadLocation>registry.gitlab.com/EXAMPLE</ms:dockerDownloadLocation>
<ms:serviceAdapterDownloadLocation>registry.gitlab.com/serviceAdapter</ms:serviceAdapterDownloadLocation>
<ms:privateResource>false</ms:privateResource>
<ms:isDescribedBy>
<ms:title xml:lang="en">description article</ms:title>
<ms:DocumentIdentifier ms:DocumentIdentifierScheme="http://purl.org/spar/datacite/bibcode">String</ms:DocumentIdentifier>
</ms:isDescribedBy>
<ms:additionalHWRequirements>terabytes</ms:additionalHWRequirements>
<ms:licenceTerms>
<ms:licenceTermsName xml:lang="en">GNU Lesser General Public License v3.0 only</ms:licenceTermsName>
<ms:licenceTermsURL>https://spdx.org/licenses/LGPL-3.0-only.html</ms:licenceTermsURL>
<ms:LicenceIdentifier ms:LicenceIdentifierScheme="http://w3id.org/meta-share/meta-share/SPDX">LGPL-3.0-only</ms:LicenceIdentifier>
<ms:conditionOfUse>http://w3id.org/meta-share/meta-share/unspecified</ms:conditionOfUse>
</ms:licenceTerms>
<ms:cost>
<ms:amount>14500</ms:amount>
<ms:currency>http://w3id.org/meta-share/meta-share/euro</ms:currency>
</ms:cost>
<ms:membershipInstitution>http://w3id.org/meta-share/meta-share/ELRA</ms:membershipInstitution>
</ms:SoftwareDistribution>
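Before submitting an ELG integrated service, it can help to check that the locations the ELG team needs are present and that the hardware requirements follow the recommended `limits_memory: X limits_cpu: Y` format. The Python sketch below does both. The rule set is an assumption for illustration: it expects dockerDownloadLocation and executionLocation whenever the form is `dockerImage`, and it is not an official validator. The namespace URI, the helper names and the `2Gi`/`1` values are also illustrative.

```python
import re
import xml.etree.ElementTree as ET

MS = "http://w3id.org/meta-share/meta-share/"  # assumed binding of the ms: prefix

# Hypothetical rule set for this sketch: elements expected for every distribution,
# plus extra ones assumed to apply to ELG integrated (dockerImage) services.
ALWAYS = ["SoftwareDistributionForm", "licenceTerms"]
BY_FORM = {MS + "dockerImage": ["dockerDownloadLocation", "executionLocation"]}

# Recommended additionalHWRequirements format: 'limits_memory: X limits_cpu: Y'.
HW_PATTERN = re.compile(r"\s*limits_memory:\s*(\S+)\s+limits_cpu:\s*(\S+)\s*")


def missing_elements(dist):
    """List expected child elements that are absent from an ms:SoftwareDistribution."""
    form = dist.findtext(f"{{{MS}}}SoftwareDistributionForm")
    expected = ALWAYS + BY_FORM.get(form, [])
    return [tag for tag in expected if dist.find(f"{{{MS}}}{tag}") is None]


def parse_hw_requirements(text):
    """Return the memory/CPU limits, or None if text does not follow the format."""
    match = HW_PATTERN.fullmatch(text or "")
    if match is None:
        return None
    return {"limits_memory": match.group(1), "limits_cpu": match.group(2)}


dist = ET.fromstring(f"""
<ms:SoftwareDistribution xmlns:ms="{MS}">
  <ms:SoftwareDistributionForm>{MS}dockerImage</ms:SoftwareDistributionForm>
  <ms:executionLocation>http://localhost:8080/mt/process/</ms:executionLocation>
</ms:SoftwareDistribution>
""".strip())
print(missing_elements(dist))  # ['licenceTerms', 'dockerDownloadLocation']
print(parse_hw_requirements("limits_memory: 2Gi limits_cpu: 1"))
```

A free-text value such as `terabytes` does not match the recommended format, so `parse_hw_requirements` returns `None` for it.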
licenceTerms¶
Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.ToolService.SoftwareDistribution.licenceTerms
Data type component
Optionality Mandatory
Explanation & Instructions
Links the distribution (distributable form) of a language resource to the licence or terms of use/service (a specific legal document) with which it is distributed
The recommended practice is to add a licence name and identifier from the SPDX list of licences (https://spdx.org/licenses/). For proprietary licences or licences not included in the above list, please add a (unique) licence name and the URL where the text of the licence can be found.
You must also fill in the conditionOfUse element. For popular standard licences, the conditions of use are already included, so you can add the element with the value http://w3id.org/meta-share/meta-share/unspecified. For proprietary licences, you can either add the conditions of use or use the same value.
Example
<ms:licenceTerms>
<ms:licenceTermsName xml:lang="en">GNU Lesser General Public License v3.0 only</ms:licenceTermsName>
<ms:licenceTermsURL>https://spdx.org/licenses/LGPL-3.0-only.html</ms:licenceTermsURL>
<ms:LicenceIdentifier ms:LicenceIdentifierScheme="http://w3id.org/meta-share/meta-share/SPDX">LGPL-3.0-only</ms:LicenceIdentifier>
<ms:conditionOfUse>http://w3id.org/meta-share/meta-share/unspecified</ms:conditionOfUse>
</ms:licenceTerms>
<ms:licenceTerms>
<ms:licenceTermsName xml:lang="en">publicDomain</ms:licenceTermsName>
<ms:licenceTermsURL>https://elrc-share.eu/terms/publicDomain.html</ms:licenceTermsURL>
<ms:conditionOfUse>http://w3id.org/meta-share/meta-share/noConditions</ms:conditionOfUse>
</ms:licenceTerms>
<ms:licenceTerms>
<ms:licenceTermsName xml:lang="en">Creative Commons Attribution 4.0 International</ms:licenceTermsName>
<ms:licenceTermsURL>https://creativecommons.org/licenses/by/4.0/legalcode</ms:licenceTermsURL>
<ms:LicenceIdentifier ms:LicenceIdentifierScheme="http://w3id.org/meta-share/meta-share/SPDX">CC-BY-4.0</ms:LicenceIdentifier>
<ms:conditionOfUse>http://w3id.org/meta-share/meta-share/attribution</ms:conditionOfUse>
</ms:licenceTerms>
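The licence examples differ in which elements they carry, so a quick check can be useful. The Python sketch below reports the SPDX identifier when one is given and lists any of licenceTermsName, licenceTermsURL and conditionOfUse that are missing or empty. Treating those three as always expected follows the examples above and is an assumption. The namespace URI and the `licence_summary` name are likewise illustrative.

```python
import xml.etree.ElementTree as ET

MS = "http://w3id.org/meta-share/meta-share/"  # assumed binding of the ms: prefix
EXPECTED = ("licenceTermsName", "licenceTermsURL", "conditionOfUse")


def licence_summary(fragment):
    """Report the SPDX identifier (if any) and any empty or missing fields."""
    lt = ET.fromstring(f'<wrapper xmlns:ms="{MS}">{fragment}</wrapper>')[0]
    problems = [
        f"missing {tag}"
        for tag in EXPECTED
        if not (lt.findtext(f"{{{MS}}}{tag}") or "").strip()
    ]
    spdx_id = None
    ident = lt.find(f"{{{MS}}}LicenceIdentifier")
    # The scheme is an ms:-qualified attribute, so ElementTree keys it by full URI.
    if ident is not None and ident.get(f"{{{MS}}}LicenceIdentifierScheme") == MS + "SPDX":
        spdx_id = ident.text
    return {"spdx": spdx_id, "problems": problems}


lgpl = """
<ms:licenceTerms>
  <ms:licenceTermsName xml:lang="en">GNU Lesser General Public License v3.0 only</ms:licenceTermsName>
  <ms:licenceTermsURL>https://spdx.org/licenses/LGPL-3.0-only.html</ms:licenceTermsURL>
  <ms:LicenceIdentifier ms:LicenceIdentifierScheme="http://w3id.org/meta-share/meta-share/SPDX">LGPL-3.0-only</ms:LicenceIdentifier>
  <ms:conditionOfUse>http://w3id.org/meta-share/meta-share/unspecified</ms:conditionOfUse>
</ms:licenceTerms>
"""
print(licence_summary(lgpl))  # {'spdx': 'LGPL-3.0-only', 'problems': []}
```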
Minimal elements for corpora¶
This page describes the minimal metadata elements specific to corpora.
1. Overview¶
Corpora are collections of text documents, audio transcripts, audio and video recordings, etc. To cater for the representation of multimedia/multimodal language resources (e.g. a corpus of videos and their subtitles, or a corpus of audio recordings and their transcripts), the notion of “media part” is introduced in the model. Thus, a corpus consists of at least one text, audio, video, image or numerical text part. Depending on the media part type, the DatasetDistribution component includes a set of text, audio, video, etc. distribution features.
The first table below has all the elements (mandatory and recommended) for a Corpus. The second table presents the mandatory and recommended elements for each media part. The third table presents the mandatory and recommended elements for the Distribution component, which includes elements that are specific to each media part.
Table 1 - Corpus common
Element name | Optionality | Section | Tab |
---|---|---|---|
corpusSubclass | M | Corpus | Technical |
personalDataIncluded | M | Corpus | Technical |
personalDataDetails | RA | Corpus | Technical |
sensitiveDataIncluded | M | Corpus | Technical |
sensitiveDataDetails | RA | Corpus | Technical |
anonymized | MA | Corpus | Technical |
anonymizationDetails | RA | Corpus | Technical |
isAnnotatedVersionOf | R | Corpus | Technical |
Table 2 - Media parts
Element name | Optionality | Section | Tab |
---|---|---|---|
lingualityType | M | Corpus | text part |
multilingualityType | MA | Corpus | text part |
multilingualityTypeDetails | R | Corpus | text part |
language | M | Corpus | text part |
textType | R | Corpus | text part |
annotation | RA | Corpus | text part |
lingualityType | M | Corpus | audio part |
multilingualityType | MA | Corpus | audio part |
multilingualityTypeDetails | RA | Corpus | audio part |
language | M | Corpus | audio part |
AudioGenre | R | Corpus | audio part |
SpeechGenre | R | Corpus | audio part |
numberOfParticipants | R | Corpus | audio part |
dialectAccentOfParticipants | R | Corpus | audio part |
geographicDistributionOfParticipants | R | Corpus | audio part |
annotation | RA | Corpus | audio part |
lingualityType | M | Corpus | video part |
multilingualityType | MA | Corpus | video part |
multilingualityTypeDetails | RA | Corpus | video part |
language | M | Corpus | video part |
typeOfVideoContent | M | Corpus | video part |
VideoGenre | R | Corpus | video part |
numberOfParticipants | R | Corpus | video part |
dialectAccentOfParticipants | R | Corpus | video part |
geographicDistributionOfParticipants | R | Corpus | video part |
annotation | RA | Corpus | video part |
lingualityType | M | Corpus | image part |
multilingualityType | RA | Corpus | image part |
multilingualityTypeDetails | RA | Corpus | image part |
language | M | Corpus | image part |
typeOfImageContent | M | Corpus | image part |
ImageGenre | R | Corpus | image part |
annotation | RA | Corpus | image part |
typeOfTextNumericalContent | M | Corpus | numerical text part |
numberOfParticipants | R | Corpus | numerical text part |
dialectAccentOfParticipants | R | Corpus | numerical text part |
geographicDistributionOfParticipants | R | Corpus | numerical text part |
annotation | RA | Corpus | numerical text part |
Table 3 - Distribution
Element name | Optionality | Section | Tab |
---|---|---|---|
DatasetDistribution | M | Distribution | Technical |
DatasetDistributionForm | M | Distribution | Technical |
downloadLocation | MA | Distribution | Technical |
accessLocation | MA | Distribution | Technical |
distributionLocation | MA | Distribution | Technical |
samplesLocation | R | Distribution | Technical |
distributionTextFeature | MA | Distribution | Technical |
distributionAudioFeature | MA | Distribution | Technical |
distributionVideoFeature | MA | Distribution | Technical |
distributionImageFeature | MA | Distribution | Technical |
distributionTextNumericalFeature | MA | Distribution | Technical |
licenceTerms | M | Distribution | Technical |
cost | R | Distribution | Technical |
membershipInstitution | R | Distribution | Technical |
2. Element presentation¶
This section presents each of the elements listed above separately, in the order in which they appear in the tables of the previous section.
Corpus¶
Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.Corpus
Data type component
Optionality Mandatory
Explanation & Instructions
Wraps together the set of elements that is specific to corpora
Example
<ms:LRSubclass>
<ms:Corpus>
<ms:lrType>Corpus</ms:lrType>
</ms:Corpus>
</ms:LRSubclass>
corpusSubclass¶
Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.Corpus.corpusSubclass
Data type CV (corpusSubclass)
Optionality Mandatory
Explanation & Instructions
Introduces a classification of corpora into types (used for descriptive reasons)
Use one of the values for raw corpora, annotated corpora (raw corpora together with their annotations) or annotations (only the annotations, without the original corpus).
Example
<ms:corpusSubclass>http://w3id.org/meta-share/meta-share/rawCorpus</ms:corpusSubclass>
<ms:corpusSubclass>http://w3id.org/meta-share/meta-share/annotatedCorpus</ms:corpusSubclass>
personalDataIncluded¶
Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.Corpus.personalDataIncluded
Data type CV
Optionality Mandatory
Explanation & Instructions
Specifies whether the language resource contains personal data (mainly in the sense falling under the GDPR)
If the resource contains personal data, you can use the (recommended) personalDataDetails element to provide more information.
Example
<ms:personalDataIncluded>http://w3id.org/meta-share/meta-share/yesP</ms:personalDataIncluded>
<ms:personalDataDetails>The corpus contains data on the place of living and place of birth of participants</ms:personalDataDetails>
sensitiveDataIncluded¶
Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.Corpus.sensitiveDataIncluded
Data type CV
Optionality Mandatory
Explanation & Instructions
Specifies whether the language resource contains sensitive data (e.g., medical/health-related, etc.) and thus requires special handling
If the resource contains sensitive data, you can use the (recommended) sensitiveDataDetails element to provide more information.
Example
<ms:sensitiveDataIncluded>http://w3id.org/meta-share/meta-share/yesS</ms:sensitiveDataIncluded>
<ms:sensitiveDataDetails>The corpus contains medical data for persons with disabilities</ms:sensitiveDataDetails>
anonymized¶
Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.Corpus.anonymized
Data type CV
Optionality Mandatory if applicable
Explanation & Instructions
Indicates whether the language resource has been anonymized
The element is mandatory if either personalDataIncluded or sensitiveDataIncluded has a positive value (e.g. yesP or yesS); in that case, anonymizationDetails must also be filled in with information on the anonymization method, etc.
Example
<ms:anonymized>http://w3id.org/meta-share/meta-share/yesA</ms:anonymized>
<ms:anonymizationDetails>pseudonymization performed manually</ms:anonymizationDetails>
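The dependency between these three elements can be checked mechanically. The Python sketch below flags a corpus that declares personal or sensitive data but has no anonymized value. It assumes that yesP, yesS and yesA (the values in the examples above) are the positive values. It also interprets the rule as requiring anonymizationDetails only when anonymization was performed, which is a reading of the text rather than a confirmed schema constraint. The namespace URI and `anonymization_issues` are illustrative.

```python
import xml.etree.ElementTree as ET

MS = "http://w3id.org/meta-share/meta-share/"  # assumed binding of the ms: prefix


def anonymization_issues(fragment):
    """Check the anonymized/anonymizationDetails rule for an ms:Corpus fragment."""
    corpus = ET.fromstring(f'<wrapper xmlns:ms="{MS}">{fragment}</wrapper>')[0]

    def value(tag):
        return (corpus.findtext(f"{{{MS}}}{tag}") or "").strip()

    # yesP / yesS / yesA are assumed to be the positive vocabulary values.
    has_personal = value("personalDataIncluded") == MS + "yesP"
    has_sensitive = value("sensitiveDataIncluded") == MS + "yesS"
    issues = []
    if has_personal or has_sensitive:
        if not value("anonymized"):
            issues.append("anonymized is required")
        elif value("anonymized") == MS + "yesA" and not value("anonymizationDetails"):
            issues.append("anonymizationDetails should describe the method")
    return issues


record = """
<ms:Corpus>
  <ms:personalDataIncluded>http://w3id.org/meta-share/meta-share/yesP</ms:personalDataIncluded>
  <ms:sensitiveDataIncluded>http://w3id.org/meta-share/meta-share/yesS</ms:sensitiveDataIncluded>
</ms:Corpus>
"""
print(anonymization_issues(record))  # ['anonymized is required']
```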
isAnnotatedVersionOf¶
Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.Corpus.isAnnotatedVersionOf
Data type component
Optionality Recommended if applicable
Explanation & Instructions
Links the corpus being described (corpus A) to corpus B, the raw corpus of which A is the annotated version
You must provide the resourceName of the language resource and, if possible, an LRIdentifier that will help uniquely identify it.
Example
<ms:isAnnotatedVersionOf>
<ms:resourceName xml:lang="en">MTP Annotated German corpus - untagged version</ms:resourceName>
<ms:LRIdentifier ms:LRIdentifierScheme="http://w3id.org/meta-share/meta-share/islrn">417-827-623-669-9</ms:LRIdentifier>
</ms:isAnnotatedVersionOf>
CorpusTextPart¶
Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.Corpus.CorpusMediaPart.CorpusTextPart
Data type component
Optionality Mandatory if applicable
Explanation & Instructions
The part of a corpus (or a whole corpus) that consists of textual segments (e.g., a corpus of publications, transcriptions of an oral corpus, subtitles, etc.)
You can repeat the group of elements for multiple textual parts.
The mandatory or recommended elements for the text part are:
mediaType
(Mandatory): Specifies the media type of a language resource (the physical medium of the contents representation). For text parts, always use the value ‘text’.
lingualityType
(Mandatory): Indicates whether the resource includes one, two or more languages. Computed by the system based on the number of languages or the ISO value for collective languages.
multilingualityType
(Mandatory if applicable): Indicates whether the resource (part) is parallel, comparable or mixed. It is required if lingualityType is bilingual or multilingual; select one of the values for parallel (e.g., original text and its translations), comparable (e.g., corpus of the same domain in multiple languages) or multilingualSingleText (for corpora that consist of segments including text in two or more languages, e.g., the transcription of a European Parliament session with MPs speaking in their native languages).
language
(Mandatory): Specifies the language that is used in the resource part, expressed according to the BCP47 recommendation. See language.
languageVariety
(Mandatory if applicable): Relates a language resource that contains segments in a language variety (e.g., dialect, jargon) to that variety. Please use it for dialect corpora.
modalityType
(Recommended if applicable): Specifies the type of the modality represented in the resource. For instance, you can use ‘spoken language’ to describe transcribed speech corpora.
TextGenre
(Recommended): A category of text characterized by a particular style, form, or content according to a specific classification scheme. See TextGenre.
annotation
(Mandatory if applicable): A set of features describing the annotated parts of a resource. See annotation.
Example
<ms:CorpusTextPart>
<ms:corpusMediaType>CorpusTextPart</ms:corpusMediaType>
<ms:mediaType>http://w3id.org/meta-share/meta-share/text</ms:mediaType>
<ms:lingualityType>http://w3id.org/meta-share/meta-share/monolingual</ms:lingualityType>
<ms:language>
<ms:languageTag>es</ms:languageTag>
<ms:languageId>es</ms:languageId>
</ms:language>
</ms:CorpusTextPart>
<ms:CorpusTextPart>
<ms:corpusMediaType>CorpusTextPart</ms:corpusMediaType>
<ms:mediaType>http://w3id.org/meta-share/meta-share/text</ms:mediaType>
<ms:lingualityType>http://w3id.org/meta-share/meta-share/bilingual</ms:lingualityType>
<ms:language>
<ms:languageTag>es</ms:languageTag>
<ms:languageId>es</ms:languageId>
</ms:language>
<ms:language>
<ms:languageTag>en</ms:languageTag>
<ms:languageId>en</ms:languageId>
</ms:language>
<ms:multilingualityType>http://w3id.org/meta-share/meta-share/parallel</ms:multilingualityType>
<ms:TextGenre>
<ms:CategoryLabel>administrative texts</ms:CategoryLabel>
</ms:TextGenre>
</ms:CorpusTextPart>
<ms:CorpusTextPart>
<ms:corpusMediaType>CorpusTextPart</ms:corpusMediaType>
<ms:mediaType>http://w3id.org/meta-share/meta-share/text</ms:mediaType>
<ms:lingualityType>http://w3id.org/meta-share/meta-share/monolingual</ms:lingualityType>
<ms:language>
<ms:languageTag>en</ms:languageTag>
<ms:languageId>en</ms:languageId>
</ms:language>
<ms:modalityType>http://w3id.org/meta-share/meta-share/spokenLanguage</ms:modalityType>
</ms:CorpusTextPart>
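The rules for lingualityType and multilingualityType can be sketched as a consistency check. The Python sketch below counts the language elements of a text part, compares the count with the declared lingualityType, and requires multilingualityType for bilingual or multilingual parts. It is a simplification: it ignores the ISO codes for collective languages, which the system also takes into account. The `multilingual` vocabulary URI, the namespace URI and `text_part_issues` are assumptions.

```python
import xml.etree.ElementTree as ET

MS = "http://w3id.org/meta-share/meta-share/"  # assumed binding of the ms: prefix


def text_part_issues(fragment):
    """Check lingualityType/multilingualityType consistency for one CorpusTextPart."""
    part = ET.fromstring(f'<wrapper xmlns:ms="{MS}">{fragment}</wrapper>')[0]
    issues = []
    n_languages = len(part.findall(f"{{{MS}}}language"))
    linguality = part.findtext(f"{{{MS}}}lingualityType")
    if n_languages == 0:
        issues.append("language is mandatory")
    else:
        # Simplification: counts language elements, ignoring collective ISO codes.
        expected = {1: "monolingual", 2: "bilingual"}.get(n_languages, "multilingual")
        if linguality != MS + expected:
            issues.append(f"lingualityType should be {expected}")
    needs_multilinguality = linguality in (MS + "bilingual", MS + "multilingual")
    if needs_multilinguality and part.find(f"{{{MS}}}multilingualityType") is None:
        issues.append("multilingualityType is required")
    return issues


bilingual = """
<ms:CorpusTextPart>
  <ms:lingualityType>http://w3id.org/meta-share/meta-share/bilingual</ms:lingualityType>
  <ms:language><ms:languageTag>es</ms:languageTag><ms:languageId>es</ms:languageId></ms:language>
  <ms:language><ms:languageTag>en</ms:languageTag><ms:languageId>en</ms:languageId></ms:language>
</ms:CorpusTextPart>
"""
print(text_part_issues(bilingual))  # ['multilingualityType is required']
```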
CorpusAudioPart¶
Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.Corpus.CorpusMediaPart.CorpusAudioPart
Data type component
Optionality Mandatory if applicable
Explanation & Instructions
The part of a corpus (or whole corpus) that consists of audio segments
You can repeat the group of elements for multiple audio parts.
The mandatory or recommended elements for the audio part are:
mediaType
(Mandatory): Specifies the media type of a language resource (the physical medium of the contents representation). For audio parts, always use the value ‘audio’.
lingualityType
(Mandatory): Indicates whether the resource includes one, two or more languages. Computed by the system based on the number of languages or the ISO value for collective languages.
multilingualityType
(Mandatory if applicable): Indicates whether the resource (part) is parallel, comparable or mixed. It is required if lingualityType is bilingual or multilingual; select one of the values for parallel (e.g., original text and its translations), comparable (e.g., corpus of the same domain in multiple languages) or multilingualSingleText (for corpora that consist of segments with content in two or more languages, e.g., the transcription of a European Parliament session with MPs speaking in their native languages).
language
(Mandatory): Specifies the language that is used in the resource part, expressed according to the BCP47 recommendation. See language.
languageVariety
(Mandatory if applicable): Relates a language resource that contains segments in a language variety (e.g., dialect, jargon) to that variety. Please use it for dialect corpora.
modalityType
(Recommended if applicable): Specifies the type of the modality represented in the resource. For instance, you can use ‘spoken language’ to describe transcribed speech corpora.
AudioGenre
(Recommended if applicable): A category of audio characterized by a particular style, form, or content according to a specific classification scheme. See AudioGenre.
SpeechGenre
(Recommended if applicable): A category for the conventionalized discourse of the speech part of a language resource, based on extra-linguistic and internal linguistic criteria. See SpeechGenre.
annotation
(Mandatory if applicable): A set of features describing the annotated parts of a resource. See annotation.
Example
<ms:CorpusAudioPart>
<ms:corpusMediaType>CorpusAudioPart</ms:corpusMediaType>
<ms:mediaType>http://w3id.org/meta-share/meta-share/audio</ms:mediaType>
<ms:lingualityType>http://w3id.org/meta-share/meta-share/monolingual</ms:lingualityType>
<ms:language>
<ms:languageTag>en</ms:languageTag>
<ms:languageId>en</ms:languageId>
</ms:language>
<ms:AudioGenre>
<ms:CategoryLabel>conference noises</ms:CategoryLabel>
</ms:AudioGenre>
</ms:CorpusAudioPart>
<ms:CorpusAudioPart>
<ms:corpusMediaType>CorpusAudioPart</ms:corpusMediaType>
<ms:mediaType>http://w3id.org/meta-share/meta-share/audio</ms:mediaType>
<ms:lingualityType>http://w3id.org/meta-share/meta-share/monolingual</ms:lingualityType>
<ms:language>
<ms:languageTag>en</ms:languageTag>
<ms:languageId>en</ms:languageId>
</ms:language>
<ms:modalityType>http://w3id.org/meta-share/meta-share/spokenLanguage</ms:modalityType>
<ms:SpeechGenre>
<ms:CategoryLabel>monologue</ms:CategoryLabel>
</ms:SpeechGenre>
</ms:CorpusAudioPart>
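Each media part is expected to carry a fixed mediaType value: text for text parts, audio for audio parts and video for video parts. The Python sketch below reports parts whose mediaType does not match their type. It covers only these three part types, because the text does not state the rule for other parts. It assumes the `ms:` namespace URI shown, and `media_type_mismatches` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

MS = "http://w3id.org/meta-share/meta-share/"  # assumed binding of the ms: prefix

# Media type value each part is documented to use.
EXPECTED_MEDIA = {
    "CorpusTextPart": "text",
    "CorpusAudioPart": "audio",
    "CorpusVideoPart": "video",
}


def media_type_mismatches(fragment):
    """Return (part, found mediaType) pairs that do not match the part type."""
    root = ET.fromstring(f'<wrapper xmlns:ms="{MS}">{fragment}</wrapper>')
    mismatches = []
    for part_name, media in EXPECTED_MEDIA.items():
        for part in root.iter(f"{{{MS}}}{part_name}"):
            found = part.findtext(f"{{{MS}}}mediaType")
            if found != MS + media:
                mismatches.append((part_name, found))
    return mismatches


parts = """
<ms:CorpusAudioPart><ms:mediaType>http://w3id.org/meta-share/meta-share/audio</ms:mediaType></ms:CorpusAudioPart>
<ms:CorpusAudioPart><ms:mediaType>http://w3id.org/meta-share/meta-share/text</ms:mediaType></ms:CorpusAudioPart>
"""
print(media_type_mismatches(parts))
```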
CorpusVideoPart¶
Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.Corpus.CorpusMediaPart.CorpusVideoPart
Data type component
Optionality Mandatory if applicable
Explanation & Instructions
The part of a corpus (or a whole corpus) that consists of video segments (e.g., a corpus of video lectures, a part of a corpus with news, a sign language corpus, etc.)
You can repeat the group of elements for multiple video parts.
The mandatory or recommended elements for the video part are:
mediaType
(Mandatory): Specifies the media type of a language resource (the physical medium of the contents representation). For video parts, always use the value ‘video’.
lingualityType
(Mandatory ): Indicates whether the resource includes one, two or more languages. Computed by the system based on the number of language or the ISO value for collective languages.
multilingualityType
(Mandatory if applicable): Indicates whether the resource (part) is parallel, comparable or mixed. It is required if lingualityType is bilingual or multilingual; select one of the values parallel (e.g., original text and its translations), comparable (e.g., corpus of the same domain in multiple languages) or multilingualSingleText (for corpora that consist of segments with content in two or more languages, e.g., the transcription of a European Parliament session with MPs speaking in their native languages).
language
(Mandatory): Specifies the language that is used in the resource part, expressed according to the BCP47 recommendation. See language.
languageVariety
(Mandatory if applicable): Relates a language resource that contains segments in a language variety (e.g., dialect, jargon) to that variety. Use it for dialect corpora.
modalityType
(Recommended if applicable): Specifies the type of the modality represented in the resource. For instance, you can use ‘spoken language’ to describe transcribed speech corpora.
VideoGenre
(Recommended): A classification of video parts based on extra-linguistic and internal linguistic criteria and reflected in the video style, form or content. See VideoGenre.
typeOfVideoContent
(Mandatory): Main type of object or people represented in the video.
annotation
(Mandatory if applicable): A set of features describing the annotated parts of a resource. See annotation.
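The relationship between lingualityType and multilingualityType described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions and does not reproduce the ELG editor's logic. The helper names are hypothetical. The bilingual and multilingual URIs are assumed to follow the same pattern as the monolingual URI used in the examples. Collective-language codes are not handled.

```python
# Hypothetical helpers illustrating the lingualityType / multilingualityType rules.
# Assumption: the bilingual/multilingual CV URIs follow the pattern of the
# monolingual URI used in the examples.
MS = "http://w3id.org/meta-share/meta-share/"


def linguality_type(language_ids):
    """Derive lingualityType from the number of distinct languages.

    Simplification: ISO codes for collective languages, which the system
    also takes into account, are not handled here.
    """
    count = len(set(language_ids))
    if count == 0:
        raise ValueError("at least one language is required")
    if count == 1:
        return MS + "monolingual"
    if count == 2:
        return MS + "bilingual"
    return MS + "multilingual"


def requires_multilinguality_type(linguality):
    """multilingualityType must be filled in for bilingual or multilingual parts."""
    return linguality in (MS + "bilingual", MS + "multilingual")


# Usage: an English-French video part is bilingual, so multilingualityType
# (parallel, comparable or multilingualSingleText) must be added.
part_linguality = linguality_type(["en", "fr"])
print(part_linguality)  # http://w3id.org/meta-share/meta-share/bilingual
print(requires_multilinguality_type(part_linguality))  # True
```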
Example
<ms:CorpusVideoPart>
<ms:corpusMediaType>CorpusVideoPart</ms:corpusMediaType>
<ms:mediaType>http://w3id.org/meta-share/meta-share/video</ms:mediaType>
<ms:lingualityType>http://w3id.org/meta-share/meta-share/monolingual</ms:lingualityType>
<ms:language>
<ms:languageTag>en</ms:languageTag>
<ms:languageId>en</ms:languageId>
</ms:language>
<ms:modalityType>http://w3id.org/meta-share/meta-share/bodyGesture</ms:modalityType>
<ms:modalityType>http://w3id.org/meta-share/meta-share/facialExpression</ms:modalityType>
<ms:modalityType>http://w3id.org/meta-share/meta-share/spokenLanguage</ms:modalityType>
<ms:typeOfVideoContent>people eating at a restaurant</ms:typeOfVideoContent>
</ms:CorpusVideoPart>
<ms:CorpusVideoPart>
<ms:corpusMediaType>CorpusVideoPart</ms:corpusMediaType>
<ms:mediaType>http://w3id.org/meta-share/meta-share/video</ms:mediaType>
<ms:lingualityType>http://w3id.org/meta-share/meta-share/monolingual</ms:lingualityType>
<ms:language>
<ms:languageTag>fr</ms:languageTag>
<ms:languageId>fr</ms:languageId>
</ms:language>
<ms:VideoGenre>
<ms:CategoryLabel>documentary</ms:CategoryLabel>
</ms:VideoGenre>
<ms:typeOfVideoContent>birds, wild animals, plants</ms:typeOfVideoContent>
</ms:CorpusVideoPart>
CorpusImagePart¶
Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.Corpus.CorpusMediaPart.CorpusImagePart
Data type component
Optionality Mandatory if applicable
Explanation & Instructions
The part of a corpus (or a whole corpus) that consists of images (e.g., a corpus of photographs and their captions).
You can repeat the group of elements for multiple image parts.
The mandatory or recommended elements for the image part are:
mediaType
(Mandatory): Specifies the media type of a language resource (the physical medium of the contents representation). For image parts, always use the value ‘image’.
lingualityType
(Mandatory): Indicates whether the resource includes one, two or more languages. Computed by the system based on the number of languages or the ISO code for collective languages.
multilingualityType
(Mandatory if applicable): Indicates whether the resource (part) is parallel, comparable or mixed. It is required if lingualityType is bilingual or multilingual; select one of the values parallel (e.g., original text and its translations), comparable (e.g., corpus of the same domain in multiple languages) or multilingualSingleText (for corpora that consist of segments with content in two or more languages, e.g., the transcription of a European Parliament session with MPs speaking in their native languages).
language
(Mandatory): Specifies the language that is used in the resource part, expressed according to the BCP47 recommendation. See language.
languageVariety
(Mandatory if applicable): Relates a language resource that contains segments in a language variety (e.g., dialect, jargon) to that variety. Use it for dialect corpora.
modalityType
(Recommended if applicable): Specifies the type of the modality represented in the resource.
ImageGenre
(Recommended): A category of images characterized by a particular style, form, or content according to a specific classification scheme. See ImageGenre.
typeOfImageContent
(Mandatory): Main type of object or people represented in the image.
annotation
(Mandatory if applicable): A set of features describing the annotated parts of a resource. See annotation.
Example
<ms:CorpusImagePart>
<ms:corpusMediaType>CorpusImagePart</ms:corpusMediaType>
<ms:mediaType>http://w3id.org/meta-share/meta-share/image</ms:mediaType>
<ms:lingualityType>http://w3id.org/meta-share/meta-share/monolingual</ms:lingualityType>
<ms:language>
<ms:languageTag>el</ms:languageTag>
<ms:languageId>el</ms:languageId>
</ms:language>
<ms:ImageGenre>
<ms:CategoryLabel>comics</ms:CategoryLabel>
</ms:ImageGenre>
<ms:typeOfImageContent>human figures</ms:typeOfImageContent>
</ms:CorpusImagePart>
CorpusTextNumericalPart¶
Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.Corpus.CorpusMediaPart.CorpusTextNumericalPart
Data type component
Optionality Mandatory if applicable
Explanation & Instructions
The part of a corpus (or a whole corpus) that consists of sets of textual representations of measurements and observations linked to sensorimotor recordings.
You can repeat the group of elements for multiple numerical text parts.
The mandatory or recommended elements for this part are:
mediaType
(Mandatory): Specifies the media type of a language resource (the physical medium of the contents representation). For numerical text parts, always use the value ‘textNumerical’.
typeOfTextNumericalContent
(Mandatory): Main type of object or people represented in this part.
numberOfParticipants
(Recommended): The number of persons participating in the part of the resource.
dialectAccentOfParticipants
(Recommended): Provides information on the dialect accent of the group of participants.
geographicDistributionOfParticipants
(Recommended): Gives information on the geographic distribution of the participants.
annotation
(Mandatory if applicable): A set of features describing the annotated parts of a resource. See annotation.
Example
<ms:CorpusTextNumericalPart>
<ms:corpusMediaType>CorpusTextNumericalPart</ms:corpusMediaType>
<ms:mediaType>http://w3id.org/meta-share/meta-share/textNumerical</ms:mediaType>
<ms:typeOfTextNumericalContent>temperature measures</ms:typeOfTextNumericalContent>
</ms:CorpusTextNumericalPart>
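Because corpusMediaType is expected to name the part it sits in, building the fragment programmatically is one way to keep the two consistent. The sketch below uses Python's standard xml.etree.ElementTree. The ms namespace URI and the function names are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI bound to the ms: prefix in ELG metadata records.
MS = "http://w3id.org/meta-share/meta-share/"
ET.register_namespace("ms", MS)


def ms(tag):
    """Return the qualified name {namespace}tag for an ms: element."""
    return f"{{{MS}}}{tag}"


def text_numerical_part(content):
    """Build a minimal CorpusTextNumericalPart with its mandatory elements."""
    part = ET.Element(ms("CorpusTextNumericalPart"))
    # corpusMediaType repeats the name of the part itself.
    ET.SubElement(part, ms("corpusMediaType")).text = "CorpusTextNumericalPart"
    ET.SubElement(part, ms("mediaType")).text = MS + "textNumerical"
    ET.SubElement(part, ms("typeOfTextNumericalContent")).text = content
    return part


print(ET.tostring(text_numerical_part("temperature measures"), encoding="unicode"))
```

The recommended participant elements (numberOfParticipants and so on) could be appended in the same way with further SubElement calls.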
TextGenre¶
Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.Corpus.CorpusMediaPart.CorpusTextPart.TextGenre
Data type component
Optionality Recommended
Explanation & Instructions
A category of text characterized by a particular style, form, or content according to a specific classification scheme.
You can simply add a free text value in the CategoryLabel element; if you have used a value from an established controlled vocabulary, you can also use the TextGenreIdentifier element and the TextGenreClassificationScheme attribute to provide further details.
Example
<ms:TextGenre>
<ms:CategoryLabel>movie subtitles</ms:CategoryLabel>
</ms:TextGenre>
<ms:TextGenre>
<ms:CategoryLabel>news articles</ms:CategoryLabel>
</ms:TextGenre>
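The sketch below shows the two options for TextGenre: a free-text CategoryLabel only, or a CategoryLabel plus a TextGenreIdentifier when the value comes from a controlled vocabulary. It makes two assumptions, both of which should be checked against the ELG XSD before relying on them. The first is the ms namespace URI shown. The second is that the TextGenreClassificationScheme attribute sits on TextGenreIdentifier in the ms namespace. The identifier and scheme values in the usage lines are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI bound to the ms: prefix.
MS = "http://w3id.org/meta-share/meta-share/"
ET.register_namespace("ms", MS)


def text_genre(label, identifier=None, scheme=None):
    """Build a TextGenre element with a free-text CategoryLabel.

    If a controlled-vocabulary value was used, also add TextGenreIdentifier
    carrying the TextGenreClassificationScheme attribute (attribute placement
    and namespace are assumptions of this sketch).
    """
    genre = ET.Element(f"{{{MS}}}TextGenre")
    ET.SubElement(genre, f"{{{MS}}}CategoryLabel").text = label
    if identifier is not None:
        if scheme is None:
            raise ValueError("an identifier needs its classification scheme")
        ident = ET.SubElement(genre, f"{{{MS}}}TextGenreIdentifier")
        ident.set(f"{{{MS}}}TextGenreClassificationScheme", scheme)
        ident.text = identifier
    return genre


free_only = text_genre("news articles")
with_cv = text_genre("news", identifier="NEWS-01", scheme="hypotheticalScheme")
print(ET.tostring(with_cv, encoding="unicode"))
```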
AudioGenre¶
Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.Corpus.CorpusMediaPart.CorpusAudioPart.AudioGenre
Data type component
Optionality Recommended if applicable
Explanation & Instructions
A category of audio characterized by a particular style, form, or content according to a specific classification scheme.
You can simply add a free text value in the CategoryLabel element; if you have used a value from an established controlled vocabulary, you can also use the AudioGenreIdentifier element and the AudioGenreClassificationScheme attribute to provide further details.
Example
<ms:AudioGenre>
<ms:CategoryLabel>conference noises</ms:CategoryLabel>
</ms:AudioGenre>
SpeechGenre¶
Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.Corpus.CorpusMediaPart.CorpusAudioPart.SpeechGenre
Data type component
Optionality Recommended if applicable
Explanation & Instructions
A category for the conventionalized discourse of the speech part of a language resource, based on extra-linguistic and internal linguistic criteria.
You can simply add a free text value in the CategoryLabel element; if you have used a value from an established controlled vocabulary, you can also use the SpeechGenreIdentifier element and the SpeechGenreClassificationScheme attribute to provide further details.
Example
<ms:SpeechGenre>
<ms:CategoryLabel>broadcast news</ms:CategoryLabel>
</ms:SpeechGenre>
<ms:SpeechGenre>
<ms:CategoryLabel>monologue</ms:CategoryLabel>
</ms:SpeechGenre>
VideoGenre¶
Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.Corpus.CorpusMediaPart.CorpusVideoPart.VideoGenre
Data type string (+ id + scheme)
Optionality Recommended if applicable
Explanation & Instructions
A classification of video parts based on extra-linguistic and internal linguistic criteria and reflected in the video style, form or content.
You can simply add a free text value in the CategoryLabel element; if you have used a value from an established controlled vocabulary, you can also use the VideoGenreIdentifier element and the VideoClassificationScheme attribute to provide further details.
Example
<ms:VideoGenre>
<ms:CategoryLabel>documentaries</ms:CategoryLabel>
</ms:VideoGenre>
<ms:VideoGenre>
<ms:CategoryLabel>video lectures</ms:CategoryLabel>
</ms:VideoGenre>
ImageGenre¶
Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.Corpus.CorpusMediaPart.CorpusImagePart.ImageGenre
Data type component
Optionality Recommended
Explanation & Instructions
A category of images characterized by a particular style, form, or content according to a specific classification scheme.
You can simply add a free text value in the CategoryLabel element; if you have used a value from an established controlled vocabulary, you can also use the ImageGenreIdentifier element and the ImageClassificationScheme attribute to provide further details.
Example
<ms:ImageGenre>
<ms:CategoryLabel>human faces</ms:CategoryLabel>
</ms:ImageGenre>
<ms:ImageGenre>
<ms:CategoryLabel>landscape</ms:CategoryLabel>
</ms:ImageGenre>
annotation¶
Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.Corpus.annotation
Data type component
Optionality Mandatory if applicable
Explanation & Instructions
Links a corpus to its annotated part(s).
You must use it for annotated corpora and annotations. You can repeat it for corpora that have separate files for each annotation type, or if you want to give information such as the use of different annotation tools for each annotation level.
Enter at least the annotation type(s); if you want, you can give a more detailed description of the annotated parts - see the annotation component of the full schema.
Example
<ms:annotation>
<ms:annotationType><ms:annotationTypeRecommended>http://w3id.org/meta-share/omtd-share/Lemma</ms:annotationTypeRecommended></ms:annotationType>
<ms:annotationStandoff>false</ms:annotationStandoff>
<ms:annotationMode>http://w3id.org/meta-share/meta-share/mixed</ms:annotationMode>
<ms:isAnnotatedBy>
<ms:resourceName xml:lang="en">Lemmatizer</ms:resourceName>
</ms:isAnnotatedBy>
</ms:annotation>
<ms:annotation>
<ms:annotationType><ms:annotationTypeRecommended>http://w3id.org/meta-share/omtd-share/PartOfSpeech</ms:annotationTypeRecommended></ms:annotationType>
<ms:annotationStandoff>false</ms:annotationStandoff>
<ms:tagset>
<ms:resourceName xml:lang="en">Universal Dependencies</ms:resourceName>
</ms:tagset>
<ms:isAnnotatedBy>
<ms:resourceName xml:lang="en">PoS tagger</ms:resourceName>
</ms:isAnnotatedBy>
</ms:annotation>
<ms:annotation>
<ms:annotationType><ms:annotationTypeRecommended>http://w3id.org/meta-share/omtd-share/SyntacticAnnotationType</ms:annotationTypeRecommended></ms:annotationType>
</ms:annotation>
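Because you must enter at least the annotation type for each annotation, a quick check on a draft is to list the types it declares. The minimal Python sketch below does this. It wraps the fragment in a dummy root that binds the ms prefix, because the example above consists of several top-level elements. The namespace URI and the function name are assumptions.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI bound to the ms: prefix.
MS = "http://w3id.org/meta-share/meta-share/"


def annotation_types(fragment):
    """Return the last URI segment of each annotationTypeRecommended value."""
    # Dummy root: binds ms: and allows several sibling annotation elements.
    root = ET.fromstring(f'<root xmlns:ms="{MS}">{fragment}</root>')
    path = "/".join(
        f"{{{MS}}}{tag}"
        for tag in ("annotation", "annotationType", "annotationTypeRecommended")
    )
    return [(el.text or "").strip().rsplit("/", 1)[-1] for el in root.findall(path)]


fragment = (
    "<ms:annotation><ms:annotationType><ms:annotationTypeRecommended>"
    "http://w3id.org/meta-share/omtd-share/Lemma"
    "</ms:annotationTypeRecommended></ms:annotationType></ms:annotation>"
    "<ms:annotation><ms:annotationType><ms:annotationTypeRecommended>"
    "http://w3id.org/meta-share/omtd-share/PartOfSpeech"
    "</ms:annotationTypeRecommended></ms:annotationType></ms:annotation>"
)
print(annotation_types(fragment))  # ['Lemma', 'PartOfSpeech']
```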
DatasetDistribution¶
Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.Corpus.DatasetDistribution
Data type component
Optionality Mandatory
Explanation & Instructions
Any form in which a dataset is distributed, such as a downloadable form in a specific format (e.g., spreadsheet, plain text, etc.) or an API through which it can be accessed.
You can repeat the element for multiple distributions.
The mandatory and recommended elements are:
DatasetDistributionForm
(Mandatory): The form (medium/channel) used for distributing a language resource consisting of data (e.g., a corpus, a lexicon, etc.). The typical values are ‘downloadable’, ‘accessibleThroughInterface’, ‘accessibleThroughQuery’ (see more at DatasetDistributionForm).
downloadLocation
(Mandatory if applicable): A URL from which the language resource (mainly data, but also downloadable software programs or forms) can be downloaded. Use this element if the value of DatasetDistributionForm is ‘downloadable’, and only for direct download links (i.e., links from which the dataset is downloaded without further actions such as clicks on a page).
accessLocation
(Mandatory if applicable): A URL from which the resource can be accessed; it can be used for landing pages or for cases where the resource is accessible via an interface, i.e., cases where the resource itself is not provided with a direct download link. Use it if the value of DatasetDistributionForm is ‘accessibleThroughInterface’ or ‘accessibleThroughQuery’, but also for download links that are given on a landing page or that require some kind of action on the part of the user.
samplesLocation
(Recommended): Links a resource to one or more URLs with samples of a data resource, or of the input or output resource of a tool/service.
licenceTerms
(Mandatory): See licenceTerms
cost
(Mandatory if applicable): Introduces the cost for accessing a resource, expressed as a combination of amount and currency. Use it only for resources available at a cost, not for free resources.
Depending on the parts of the corpus, you must also use one or more of the following:
distributionTextFeature: see distributionTextFeature
distributionAudioFeature: see distributionAudioFeature
distributionVideoFeature: see distributionVideoFeature
distributionImageFeature: see distributionImageFeature
distributionTextNumericalFeature: see distributionTextNumericalFeature
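The rules above for choosing between downloadLocation and accessLocation can be condensed into a small decision function. This is a hypothetical helper and is not part of any ELG tool. It covers only the three typical DatasetDistributionForm values named above and raises an error for any other form.

```python
def location_element(distribution_form, direct_download):
    """Choose between downloadLocation and accessLocation (illustrative sketch).

    distribution_form: the DatasetDistributionForm URI or its last segment.
    direct_download: whether the URL fetches the data without further
    actions such as clicks on a page.
    """
    form = distribution_form.rsplit("/", 1)[-1]
    if form == "downloadable" and direct_download:
        return "downloadLocation"
    if form in ("downloadable", "accessibleThroughInterface", "accessibleThroughQuery"):
        # Landing pages, interfaces, queries and indirect download links.
        return "accessLocation"
    raise ValueError(f"no rule sketched for DatasetDistributionForm {form!r}")


print(location_element("http://w3id.org/meta-share/meta-share/downloadable", True))   # downloadLocation
print(location_element("http://w3id.org/meta-share/meta-share/downloadable", False))  # accessLocation
```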
Example
<ms:DatasetDistribution>
<ms:DatasetDistributionForm>http://w3id.org/meta-share/meta-share/downloadable</ms:DatasetDistributionForm>
<ms:accessLocation>https://www.someAccessURL.com</ms:accessLocation>
<ms:samplesLocation>https://www.URLwithsamples.com</ms:samplesLocation>
<ms:distributionTextFeature>
<ms:size>
<ms:amount>17601</ms:amount>
<ms:sizeUnit><ms:sizeUnitRecommended>http://w3id.org/meta-share/meta-share/unit</ms:sizeUnitRecommended></ms:sizeUnit>
</ms:size>
<ms:dataFormat><ms:dataFormatRecommended>http://w3id.org/meta-share/omtd-share/Xml</ms:dataFormatRecommended></ms:dataFormat>
<ms:characterEncoding>http://w3id.org/meta-share/meta-share/UTF-8</ms:characterEncoding>
</ms:distributionTextFeature>
<ms:licenceTerms>
<ms:licenceTermsName xml:lang="en">openUnder-PSI</ms:licenceTermsName>
<ms:licenceTermsURL>https://elrc-share.eu/terms/openUnderPSI.html</ms:licenceTermsURL>
</ms:licenceTerms>
</ms:DatasetDistribution>
<ms:DatasetDistribution>
<ms:DatasetDistributionForm>http://w3id.org/meta-share/meta-share/accessibleThroughInterface</ms:DatasetDistributionForm>
<ms:accessLocation>https://www.someAccessURL.com</ms:accessLocation>
<ms:distributionTextFeature>
<ms:size>
<ms:amount>100</ms:amount>
<ms:sizeUnit><ms:sizeUnitRecommended>http://w3id.org/meta-share/meta-share/text1</ms:sizeUnitRecommended></ms:sizeUnit>
</ms:size>
<ms:dataFormat><ms:dataFormatRecommended>http://w3id.org/meta-share/omtd-share/Pdf</ms:dataFormatRecommended></ms:dataFormat>
<ms:characterEncoding>http://w3id.org/meta-share/meta-share/UTF-8</ms:characterEncoding>
</ms:distributionTextFeature>
<ms:licenceTerms>
<ms:licenceTermsName xml:lang="en">some commercial licence</ms:licenceTermsName>
<ms:licenceTermsURL>https://elrc-share.eu/terms/someCommercialLicence.html</ms:licenceTermsURL>
</ms:licenceTerms>
<ms:cost>
<ms:amount>10000</ms:amount>
<ms:currency>http://w3id.org/meta-share/meta-share/euro</ms:currency>
</ms:cost>
</ms:DatasetDistribution>
distributionTextFeature¶
Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.Corpus.DatasetDistribution.distributionTextFeature
Data type component
Optionality Mandatory if applicable
Explanation & Instructions
Links to a feature that can be used for describing distinct distributable forms of text resources/parts
The following are mandatory or recommended:
size
(Mandatory): The size of the text part, expressed as a combination of amount and sizeUnit (with either a value from the recommended CV, sizeUnitRecommended, or a free text value, sizeUnitOther).
dataFormat
(Mandatory): Indicates the format(s) of a data resource; it takes a value from a recommended CV (dataFormatRecommended) or a free value (dataFormatOther); the dataFormat includes the IANA mimetype and pointers to additional documentation for specialized formats (e.g., GATE XML, CONLL formats, etc.).
characterEncoding
(Recommended): Specifies the character encoding used for a language resource data distribution.
Example
<ms:distributionTextFeature>
<ms:size>
<ms:amount>9139</ms:amount>
<ms:sizeUnit><ms:sizeUnitRecommended>http://w3id.org/meta-share/meta-share/sentence</ms:sizeUnitRecommended></ms:sizeUnit>
</ms:size>
<ms:size>
<ms:amount>40</ms:amount>
<ms:sizeUnit><ms:sizeUnitRecommended>http://w3id.org/meta-share/meta-share/file</ms:sizeUnitRecommended></ms:sizeUnit>
</ms:size>
<ms:dataFormat><ms:dataFormatRecommended>http://w3id.org/meta-share/omtd-share/Xml</ms:dataFormatRecommended></ms:dataFormat>
<ms:characterEncoding>http://w3id.org/meta-share/meta-share/UTF-8</ms:characterEncoding>
</ms:distributionTextFeature>
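The structure above, with one size element per unit plus dataFormat and an optional characterEncoding, can be generated from a few values. The minimal sketch below uses Python's ElementTree. It assumes the ms namespace URI shown and assumes that sizeUnitRecommended values are URIs under that namespace, as in the examples. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI bound to the ms: prefix.
MS = "http://w3id.org/meta-share/meta-share/"
ET.register_namespace("ms", MS)


def q(tag):
    """Qualified {namespace}tag name for an ms: element."""
    return f"{{{MS}}}{tag}"


def text_feature(sizes, data_format, encoding=None):
    """Build a distributionTextFeature from (amount, sizeUnit) pairs.

    sizes: list of (amount, unit) pairs, where unit is the last segment of a
    sizeUnitRecommended URI such as "sentence" or "file".
    """
    if not sizes:
        raise ValueError("size is mandatory")
    feature = ET.Element(q("distributionTextFeature"))
    for amount, unit in sizes:  # one size element per unit
        size = ET.SubElement(feature, q("size"))
        ET.SubElement(size, q("amount")).text = str(amount)
        unit_el = ET.SubElement(size, q("sizeUnit"))
        ET.SubElement(unit_el, q("sizeUnitRecommended")).text = MS + unit
    fmt = ET.SubElement(feature, q("dataFormat"))
    ET.SubElement(fmt, q("dataFormatRecommended")).text = data_format
    if encoding is not None:  # characterEncoding is recommended, not mandatory
        ET.SubElement(feature, q("characterEncoding")).text = MS + encoding
    return feature


feature = text_feature([(9139, "sentence"), (40, "file")],
                       "http://w3id.org/meta-share/omtd-share/Xml", "UTF-8")
print(ET.tostring(feature, encoding="unicode"))
```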
distributionAudioFeature¶
Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.Corpus.DatasetDistribution.distributionAudioFeature
Data type component
Optionality Mandatory if applicable
Explanation & Instructions
Links to a feature that can be used for describing distinct distributable forms of audio resources/parts
The following are mandatory or recommended:
size
(Mandatory): The size of the audio part, expressed as a combination of amount and sizeUnit (with either a value from the recommended CV, sizeUnitRecommended, or a free text value, sizeUnitOther).
dataFormat
(Mandatory): Indicates the format(s) of a data resource; it takes a value from a recommended CV (dataFormatRecommended) or a free value (dataFormatOther); the dataFormat includes the IANA mimetype and pointers to additional documentation for specialized formats (e.g., GATE XML, CONLL formats, etc.).
durationOfAudio
(Recommended): Specifies the duration of the audio recording, including silences, music, pauses, etc., expressed as a combination of amount and durationUnit (with a value from the CV for durationUnit).
durationOfEffectiveSpeech
(Recommended): Specifies the duration of effective speech of the audio (part of a) resource, expressed as a combination of amount and durationUnit (with a value from the CV for durationUnit).
audioFormat
(Recommended): Indicates the format(s) of the audio (part of a) data resource, expressed as a combination of dataFormat (with a value from a CV for dataFormat) and compressed.
Example
<ms:distributionAudioFeature>
<ms:size>
<ms:amount>10</ms:amount>
<ms:sizeUnit><ms:sizeUnitRecommended>http://w3id.org/meta-share/meta-share/file</ms:sizeUnitRecommended></ms:sizeUnit>
</ms:size>
<ms:durationOfAudio>
<ms:amount>3</ms:amount>
<ms:durationUnit>http://w3id.org/meta-share/meta-share/hour</ms:durationUnit>
</ms:durationOfAudio>
<ms:dataFormat><ms:dataFormatRecommended>http://w3id.org/meta-share/omtd-share/wav</ms:dataFormatRecommended></ms:dataFormat>
<ms:audioFormat>
<ms:dataFormat><ms:dataFormatRecommended>http://w3id.org/meta-share/omtd-share/wav</ms:dataFormatRecommended></ms:dataFormat>
<ms:compressed>true</ms:compressed>
</ms:audioFormat>
</ms:distributionAudioFeature>
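A small computation makes the difference concrete. durationOfAudio covers the whole recording, silences and music included. durationOfEffectiveSpeech counts only the speech. The sketch below assumes you have speech segments as (start, end) times in seconds, for example from a transcription. The segment values and the function name are hypothetical. Overlapping segments, such as two speakers talking at once, are counted only once.

```python
def effective_speech_seconds(speech_segments):
    """Total length of (start, end) speech segments in seconds, merging overlaps."""
    total = 0.0
    current_start = current_end = None
    for start, end in sorted(speech_segments):
        if current_end is None or start > current_end:
            # Close the previous merged interval and start a new one.
            if current_end is not None:
                total += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlapping or touching segment: extend the current interval.
            current_end = max(current_end, end)
    if current_end is not None:
        total += current_end - current_start
    return total


segments = [(0, 10), (5, 15), (20, 30)]  # hypothetical speech segments
print(effective_speech_seconds(segments))  # 25.0
```

Consider a 3-hour recording whose merged speech segments add up to 9000 seconds. For that recording, durationOfAudio would be 3 hours and durationOfEffectiveSpeech 2.5 hours.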
distributionVideoFeature¶
Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.Corpus.DatasetDistribution.distributionVideoFeature
Data type component
Optionality Mandatory if applicable
Explanation & Instructions
Links to a feature that can be used for describing distinct distributable forms of video resources/parts
The following are mandatory or recommended:
size
(Mandatory): The size of the video part, expressed as a combination of amount and sizeUnit (with either a value from the recommended CV, sizeUnitRecommended, or a free text value, sizeUnitOther).
durationOfVideo
(Recommended): Specifies the duration of the video recording, expressed as a combination of amount and durationUnit (with a value from the CV for durationUnit).
dataFormat
(Mandatory): Indicates the format(s) of a data resource; it takes a value from a recommended CV (dataFormatRecommended) or a free value (dataFormatOther); the dataFormat includes the IANA mimetype and pointers to additional documentation for specialized formats (e.g., GATE XML, CONLL formats, etc.).
videoFormat
(Recommended): Indicates the format(s) of the video (part of a) data resource, expressed as a combination of dataFormat (with a value from a CV for dataFormat) and compressed.
Example
<ms:distributionVideoFeature>
<ms:size>
<ms:amount>9139</ms:amount>
<ms:sizeUnit><ms:sizeUnitRecommended>http://w3id.org/meta-share/meta-share/screen</ms:sizeUnitRecommended></ms:sizeUnit>
</ms:size>
<ms:size>
<ms:amount>40</ms:amount>
<ms:sizeUnit><ms:sizeUnitRecommended>http://w3id.org/meta-share/meta-share/file</ms:sizeUnitRecommended></ms:sizeUnit>
</ms:size>
<ms:durationOfVideo>
<ms:amount>40</ms:amount>
<ms:durationUnit>http://w3id.org/meta-share/meta-share/hour</ms:durationUnit>
</ms:durationOfVideo>
<ms:dataFormat><ms:dataFormatRecommended>http://w3id.org/meta-share/omtd-share/wav</ms:dataFormatRecommended></ms:dataFormat>
<ms:videoFormat>
<ms:dataFormat><ms:dataFormatRecommended>http://w3id.org/meta-share/omtd-share/wav</ms:dataFormatRecommended></ms:dataFormat>
<ms:compressed>true</ms:compressed>
</ms:videoFormat>
</ms:distributionVideoFeature>
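Hand-written fragments like the examples in this section are easy to get wrong. A closing tag in the wrong order or a missing end tag makes the XML malformed. The minimal Python sketch below checks well-formedness. It wraps the fragment in a dummy root that binds the ms prefix; the namespace URI is an assumption. It checks only well-formedness, not validity against the ELG schema.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI bound to the ms: prefix.
MS = "http://w3id.org/meta-share/meta-share/"


def check_fragment(fragment):
    """Return None if the ms: fragment is well-formed, else the parser's message."""
    try:
        ET.fromstring(f'<root xmlns:ms="{MS}">{fragment}</root>')
    except ET.ParseError as err:
        return str(err)
    return None


good = "<ms:dataFormat><ms:dataFormatRecommended>x</ms:dataFormatRecommended></ms:dataFormat>"
bad = "<ms:dataFormat><ms:dataFormatRecommended>x</ms:dataFormat></ms:dataFormatRecommended>"
print(check_fragment(good))  # None
print(check_fragment(bad))   # a "mismatched tag" message with line and column
```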
distributionImageFeature¶
Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.Corpus.DatasetDistribution.distributionImageFeature
Data type component
Optionality Mandatory if applicable
Explanation & Instructions
Links to a feature that can be used for describing distinct distributable forms of image resources/parts
The following are mandatory or recommended:
size
(Mandatory): The size of the image part, expressed as a combination of amount and sizeUnit (with either a value from the recommended CV, sizeUnitRecommended, or a free text value, sizeUnitOther).
dataFormat
(Mandatory): Indicates the format(s) of a data resource; it takes a value from a recommended CV (dataFormatRecommended) or a free value (dataFormatOther); the dataFormat includes the IANA mimetype and pointers to additional documentation for specialized formats (e.g., GATE XML, CONLL formats, etc.).
imageFormat
(Mandatory): Indicates the format(s) of the image (part of a) data resource, expressed as a combination of dataFormat (with a value from a CV for dataFormat) and compressed.
Example
<ms:distributionImageFeature>
<ms:size>
<ms:amount>100</ms:amount>
<ms:sizeUnit><ms:sizeUnitRecommended>http://w3id.org/meta-share/meta-share/file</ms:sizeUnitRecommended></ms:sizeUnit>
</ms:size>
<ms:dataFormat><ms:dataFormatRecommended>http://w3id.org/meta-share/omtd-share/Pdf</ms:dataFormatRecommended></ms:dataFormat>
<ms:imageFormat>
<ms:dataFormat><ms:dataFormatRecommended>http://w3id.org/meta-share/omtd-share/Pdf</ms:dataFormatRecommended></ms:dataFormat>
<ms:compressed>true</ms:compressed>
</ms:imageFormat>
</ms:distributionImageFeature>
distributionTextNumericalFeature¶
Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.Corpus.DatasetDistribution.distributionTextNumericalFeature
Data type component
Optionality Mandatory if applicable
Explanation & Instructions
Links to a feature that can be used for describing distinct distributable forms of numerical text resources/parts
The following are mandatory or recommended:
size
(Mandatory): The size of the numerical text part, expressed as a combination of amount and sizeUnit (with either a value from the recommended CV, sizeUnitRecommended, or a free text value, sizeUnitOther).
dataFormat
(Mandatory): Indicates the format(s) of a data resource; it takes a value from a recommended CV (dataFormatRecommended) or a free value (dataFormatOther); the dataFormat includes the IANA mimetype and pointers to additional documentation for specialized formats (e.g., GATE XML, CONLL formats, etc.).
Example
<ms:distributionTextNumericalFeature>
<ms:size>
<ms:amount>30</ms:amount>
<ms:sizeUnit><ms:sizeUnitRecommended>http://w3id.org/meta-share/meta-share/file</ms:sizeUnitRecommended></ms:sizeUnit>
</ms:size>
<ms:dataFormat><ms:dataFormatRecommended>http://w3id.org/meta-share/omtd-share/Pdf</ms:dataFormatRecommended></ms:dataFormat>
</ms:distributionTextNumericalFeature>
Minimal elements for models¶
This page describes the minimal metadata elements specific to models.
1. Overview¶
Although models are a subclass of language descriptions, we describe them separately here, as is also done in the editor.
The first table below lists all the elements (mandatory and recommended) for a model; the second lists those of the Distribution component as implemented for models.
Table 1 - Elements for models
| Element name | Optionality | Section | Tab |
|---|---|---|---|
| ldSubclass | M | | |
| languageDescriptionSubclass | M | | |
| Model | MA | | |
| modelFunction | M | Model/Grammar | technical |
| modelType | R | Model/Grammar | technical |
| developmentFramework | R | Model/Grammar | technical |
| hasOriginalSource | R | Model/Grammar | technical |
| trainingCorpusDetails | R | Model/Grammar | technical |
| trainingProcessDetails | R | Model/Grammar | technical |
| biasDetails | R | Model/Grammar | technical |
| requiresLR | R | Model/Grammar | technical |
| NGramModel | MA | Model/Grammar | technical |
| baseItem | M | Model/Grammar | technical |
| order | M | Model/Grammar | technical |
| unspecifiedPart | MA | Part | Media part |
| language | M | Part | Media part |
| lingualityType | M | Part | Media part |
| multilingualityType | MA | Part | Media part |
| multilingualityTypeDetails | R | Part | Media part |
| metalanguage | R | Part | Media part |
Table 2 - Distribution
| Element name | Optionality | Section | Tab |
|---|---|---|---|
| DatasetDistribution | M | Distribution | Technical |
| DatasetDistributionForm | M | Distribution | Technical |
| downloadLocation | MA | Distribution | Technical |
| accessLocation | MA | Distribution | Technical |
| distributionLocation | MA | Distribution | Technical |
| samplesLocation | R | Distribution | Technical |
| distributionUnspecifiedFeature | M | Distribution | Technical |
| licenceTerms | M | Distribution | Technical |
| cost | R | Distribution | Technical |
| membershipInstitution | R | Distribution | Technical |
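Table 2 can also serve as a checklist. The sketch below copies its optionality codes into a dictionary and reports the mandatory (M) elements missing from a draft distribution. It ignores MA elements, whose conditions need human judgment. It is a hypothetical helper, not an ELG validation tool.

```python
# Optionality codes copied from Table 2 (M, MA = mandatory if applicable, R).
DISTRIBUTION_OPTIONALITY = {
    "DatasetDistribution": "M",
    "DatasetDistributionForm": "M",
    "downloadLocation": "MA",
    "accessLocation": "MA",
    "distributionLocation": "MA",
    "samplesLocation": "R",
    "distributionUnspecifiedFeature": "M",
    "licenceTerms": "M",
    "cost": "R",
    "membershipInstitution": "R",
}


def missing_mandatory(present):
    """Return, sorted, the M elements of Table 2 absent from the set of present names."""
    return sorted(
        name for name, optionality in DISTRIBUTION_OPTIONALITY.items()
        if optionality == "M" and name not in present
    )


print(missing_mandatory({"DatasetDistribution", "DatasetDistributionForm", "accessLocation"}))
# ['distributionUnspecifiedFeature', 'licenceTerms']
```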
2. Element presentation¶
This section presents each of the aforementioned elements separately, in the order in which they appear in the tables of the previous section.
LanguageDescription¶
Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.LanguageDescription
Data type component
Optionality Mandatory
Explanation & Instructions
Wraps together the elements for language descriptions.
Example
<ms:LRSubclass>
<ms:LanguageDescription>
<ms:lrType>LanguageDescription</ms:lrType>
...
</ms:LanguageDescription>
</ms:LRSubclass>
ldSubclass¶
Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.LanguageDescription.ldSubclass
Data type CV
Optionality Mandatory
Explanation & Instructions
The type of the language description
For models, always select http://w3id.org/meta-share/meta-share/model.
Example
<ms:ldSubclass>http://w3id.org/meta-share/meta-share/model</ms:ldSubclass>
LanguageDescriptionSubclass¶
Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.LanguageDescription.LanguageDescriptionSubclass
Data type component
Optionality Mandatory
Explanation & Instructions
The type of the language description (used for documentation purposes)
It wraps the set of elements that must be used for the Language Description subclasses. For models, this is the Model component.
Example
<ms:LanguageDescriptionSubclass><ms:Model>
...
</ms:Model></ms:LanguageDescriptionSubclass>
Model¶
Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.LanguageDescription.LanguageDescriptionSubclass.Model
Data type Component
Optionality Mandatory if applicable
Explanation & Instructions
Mandatory for all models, defined as “The model artifact that is created through a training process involving an algorithm (that is, the learning algorithm) and the training data to learn from”
The following elements are mandatory or recommended for ML models:
ldSubclassType
(Mandatory): Used to mark the subclass of a language description. For ML models, the value is fixed to ‘Model’.
modelFunction
(Mandatory): Specifies the operation/function/task that a model performs; use either a value from the recommended CV (modelFunctionRecommended) or a free text value (modelFunctionFree).
modelType
(Recommended): A classification of models based on their algorithm; use either a value from the recommended CV (modelTypeRecommended) or a free text value (modelTypeFree).
modelVariant
(Recommended): Introduces a label that can be used to identify the variant of an ML model.
developmentFramework
(Recommended): A framework or toolkit (e.g., a machine learning framework or an NLP toolkit) used in the development of a resource.
trainingCorpusDetails
(Recommended): Provides a detailed description of the training corpus (e.g., size, number of features, etc.).
trainingProcessDetails
(Recommended): Provides a detailed description of the training process and method.
biasDetails
(Recommended): Provides a detailed description of bias considerations for the model.
requiresLR
(Recommended): Links to a language resource or technology that must be used for the operation of the model, such as the tool deploying it.
NGramModel
(Mandatory if applicable): You must use this for describing n-gram models; see NGramModel for more information.
Example
<ms:Model>
<ms:ldSubclassType>Model</ms:ldSubclassType>
<ms:modelFunction><ms:modelFunctionRecommended>http://w3id.org/meta-share/omtd-share/QuestionAnswering</ms:modelFunctionRecommended></ms:modelFunction>
<ms:modelType><ms:modelTypeRecommended>http://w3id.org/meta-share/meta-share/DeepLearningModel</ms:modelTypeRecommended></ms:modelType>
<ms:modelVariant>factored</ms:modelVariant>
<ms:developmentFramework><ms:DevelopmentFrameworkRecommended>tensorflow</ms:DevelopmentFrameworkRecommended></ms:developmentFramework>
<ms:trainingCorpusDetails xml:lang="en">Trained on a corpus of tweets</ms:trainingCorpusDetails>
</ms:Model>
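modelFunction is mandatory and takes either a CV value (modelFunctionRecommended) or free text (modelFunctionFree). A few lines of Python can therefore check a draft record. The sketch assumes the ms namespace URI shown. It is not the ELG validator, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI bound to the ms: prefix.
MS = "http://w3id.org/meta-share/meta-share/"


def q(tag):
    """Qualified {namespace}tag name for an ms: element."""
    return f"{{{MS}}}{tag}"


def model_functions(model_xml):
    """Return the modelFunction values of a model fragment (hypothetical check).

    Each modelFunction must contain modelFunctionRecommended or
    modelFunctionFree, and at least one modelFunction must be present.
    """
    model = ET.fromstring(model_xml)
    values = []
    for function in model.findall(q("modelFunction")):
        choice = function.find(q("modelFunctionRecommended"))
        if choice is None:
            choice = function.find(q("modelFunctionFree"))
        if choice is None or not (choice.text or "").strip():
            raise ValueError("modelFunction needs modelFunctionRecommended or modelFunctionFree")
        values.append(choice.text.strip())
    if not values:
        raise ValueError("modelFunction is mandatory for models")
    return values


record = (
    f'<ms:Model xmlns:ms="{MS}"><ms:modelFunction>'
    "<ms:modelFunctionRecommended>http://w3id.org/meta-share/omtd-share/QuestionAnswering"
    "</ms:modelFunctionRecommended></ms:modelFunction></ms:Model>"
)
print(model_functions(record))
```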
NGramModel¶
Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.LanguageDescription.LanguageDescriptionSubclass.Model.NGramModel
Data type Component
Optionality Mandatory if applicable
Explanation & Instructions
Mandatory for n-gram models; for our purposes, an n-gram model is defined as “A language model consisting of n-grams, i.e. specific sequences of a number of words”.
The following elements are mandatory for n-gram models:
baseItem
(Mandatory): Type of item that is represented in the n-gram resource.
order
(Mandatory): Specifies the maximum number of items in the sequence.
Example
<ms:NGramModel>
<ms:ldSubclassType>NGramModel</ms:ldSubclassType>
<ms:baseItem>http://w3id.org/meta-share/meta-share/word</ms:baseItem>
<ms:order>5</ms:order>
</ms:NGramModel>
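To illustrate how baseItem and order relate: with baseItem set to word and order set to 5, as in the example above, the model covers word sequences of one up to five items. The Python sketch below is purely illustrative and not part of the ELG tooling; the helper name ngrams_up_to is hypothetical.

```python
def ngrams_up_to(tokens, order):
    """Return all n-grams of length 1..order from a list of tokens (hypothetical helper)."""
    result = []
    for n in range(1, order + 1):
        # Slide a window of size n over the token list.
        for i in range(len(tokens) - n + 1):
            result.append(tuple(tokens[i:i + n]))
    return result


tokens = "the grid hosts language resources".split()
grams = ngrams_up_to(tokens, 5)
print(len(grams))                   # 15, i.e. 5 + 4 + 3 + 2 + 1
print(max(len(g) for g in grams))   # 5, the value of order
```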
unspecifiedPart¶
Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.LanguageDescription.unspecifiedPart
Data type component
Optionality Mandatory
Explanation & Instructions
Groups together all information related to languages for a model.
lingualityType
(Mandatory): Indicates whether the resource includes one, two or more languages. Computed by the system based on the number of languages or the ISO value for collective languages.
multilingualityType
(Mandatory if applicable): Indicates whether the resource (part) is parallel, comparable or mixed. It is required if lingualityType is bilingual or multilingual; select one of the values parallel (e.g., original text and its translations), comparable (e.g., corpus of the same domain in multiple languages) or multilingualSingleText (for corpora that consist of segments including text in two or more languages, e.g., the transcription of a European Parliament session with MPs speaking in their native languages).
language
(Mandatory): Specifies the language that is used in the resource part, expressed according to the BCP47 recommendation. See language.
languageVariety
(Mandatory if applicable): Relates a language resource that contains segments in a language variety (e.g., dialect, jargon) to it. Please use for dialect corpora.
metalanguage
(Recommended): Specifies the metalanguage, if used, in the resource part, expressed according to the BCP47 recommendation. See language.
Example
<ms:unspecifiedPart>
<ms:lingualityType>http://w3id.org/meta-share/meta-share/monolingual</ms:lingualityType>
<ms:language>
<ms:languageTag>es</ms:languageTag>
<ms:languageId>es</ms:languageId>
</ms:language>
</ms:unspecifiedPart>
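The two rules above (lingualityType derived from the number of languages, and multilingualityType required for bilingual or multilingual parts) can be checked with a short script. The following is a minimal Python sketch, not the ELG implementation: it counts distinct languageTag values, ignores the ISO collective-language case mentioned above, and assumes the ms prefix is bound to http://w3id.org/meta-share/meta-share/ (the sample declares this explicitly, so the sketch is self-consistent either way). The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: namespace URI bound to the "ms" prefix; check it against your records.
MS_NS = "http://w3id.org/meta-share/meta-share/"
CV = "http://w3id.org/meta-share/meta-share/"  # prefix of the CV values shown above


def expected_linguality(language_tags):
    """Derive lingualityType from the number of distinct languages (collective ISO codes ignored)."""
    n = len(set(language_tags))
    if n == 0:
        raise ValueError("at least one language is mandatory")
    return CV + {1: "monolingual", 2: "bilingual"}.get(n, "multilingual")


def check_part(xml_text):
    """Return a list of problems found in an unspecifiedPart-like element (illustrative checks only)."""
    part = ET.fromstring(xml_text)
    ns = {"ms": MS_NS}
    tags = [t.text for t in part.findall("ms:language/ms:languageTag", ns)]
    declared = part.findtext("ms:lingualityType", namespaces=ns)
    expected = expected_linguality(tags)
    problems = []
    if declared != expected:
        problems.append(f"lingualityType should be {expected}")
    if expected != CV + "monolingual" and part.find("ms:multilingualityType", ns) is None:
        problems.append("multilingualityType is required for bilingual/multilingual parts")
    return problems


SAMPLE = f"""<ms:unspecifiedPart xmlns:ms="{MS_NS}">
  <ms:lingualityType>{CV}monolingual</ms:lingualityType>
  <ms:language><ms:languageTag>es</ms:languageTag><ms:languageId>es</ms:languageId></ms:language>
</ms:unspecifiedPart>"""

print(check_part(SAMPLE))  # [] (no problems found)
```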
Minimal elements for grammars¶
This page describes the minimal metadata elements specific to grammars.
1. Overview¶
Although grammars are a subclass of language descriptions, we describe them here separately, as is also done in the interactive editor.
In addition, as for corpora, we also cater for multimedia resources, which include not only text but also audio, video and image files. To cater for these cases, the notion of “media part” is introduced in the model. Thus, a language description consists of at least one text, video or image part. Depending on the media part type, the DatasetDistribution component includes a set of text, video, etc. distribution features.
The first table below lists all the elements (mandatory and recommended) for a grammar. The second table presents the mandatory and recommended elements for each media part for grammars. The third table presents the mandatory and recommended elements for the Distribution component, which includes elements that are specific to each media part.
Table 1 - Elements for grammars
Element name | Optionality | Section | Tab |
---|---|---|---|
ldSubclass | M | | |
languageDescriptionSubclass | M | | |
Grammar | MA | Model/Grammar | technical |
encodingLevel | M | Model/Grammar | technical |
formalism | R | Model/Grammar | technical |
ldTask | R | Model/Grammar | technical |
personalDataIncluded | R | Model/Grammar | technical |
personalDataDetails | RA | Model/Grammar | technical |
sensitiveDataIncluded | R | Model/Grammar | technical |
sensitiveDataDetails | RA | Model/Grammar | technical |
anonymized | MA | Model/Grammar | technical |
anonymizationDetails | RA | Model/Grammar | technical |
requiresHardware | R | Model/Grammar | technical |
Table 2 - Media parts
Element name | Optionality | Section | Tab |
---|---|---|---|
textPart | MA | LD | Part |
lingualityType | M | LD | Part |
multilingualityType | MA | LD | Part |
multilingualityTypeDetails | R | LD | Part |
language | M | LD | Part |
metalanguage | R | LD | Part |
videoPart | MA | LD | Part |
lingualityType | M | LD | Part |
multilingualityType | MA | LD | Part |
multilingualityTypeDetails | RA | LD | Part |
language | M | LD | Part |
metalanguage | R | LD | Part |
typeOfVideoContent | M | LD | Part |
imagePart | MA | LD | Part |
lingualityType | M | LD | Part |
multilingualityType | RA | LD | Part |
multilingualityTypeDetails | RA | LD | Part |
language | M | LD | Part |
metalanguage | R | LD | Part |
typeOfImageContent | M | LD | Part |
Table 3 - Distribution
Element name | Optionality | Section | Tab |
---|---|---|---|
DatasetDistribution | M | Distribution | Technical |
DatasetDistributionForm | M | Distribution | Technical |
downloadLocation | MA | Distribution | Technical |
accessLocation | MA | Distribution | Technical |
distributionLocation | MA | Distribution | Technical |
samplesLocation | R | Distribution | Technical |
distributionUnspecifiedFeature | M | Distribution | Technical |
licenceTerms | M | Distribution | Technical |
cost | R | Distribution | Technical |
membershipInstitution | R | Distribution | Technical |
2. Element presentation¶
In this section, each of the aforementioned elements is presented separately, in the order in which they appear in the tables of the previous section.
LanguageDescription¶
Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.LanguageDescription
Data type component
Optionality Mandatory
Explanation & Instructions
Wraps together elements for language descriptions
Example
<ms:LRSubclass>
<ms:LanguageDescription>
<ms:lrType>LanguageDescription</ms:lrType>
...
</ms:LanguageDescription>
</ms:LRSubclass>
ldSubclass¶
Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.LanguageDescription.ldSubclass
Data type CV
Optionality Mandatory
Explanation & Instructions
The type of the language description
For grammars, always select http://w3id.org/meta-share/meta-share/grammar.
Example
<ms:ldSubclass>http://w3id.org/meta-share/meta-share/grammar</ms:ldSubclass>
LanguageDescriptionSubclass¶
Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.LanguageDescription.LanguageDescriptionSubclass
Data type component
Optionality Mandatory
Explanation & Instructions
The type of the language description (used for documentation purposes)
It wraps the set of elements that must be used for the Language Description subclasses. For grammars, this is the Grammar component.
Example
<ms:LanguageDescriptionSubclass><ms:Grammar>
...
</ms:Grammar></ms:LanguageDescriptionSubclass>
Grammar¶
Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.LanguageDescription.LanguageDescriptionSubclass.Grammar
Data type Component
Optionality Mandatory if applicable
Explanation & Instructions
Mandatory for grammars; grammar for our purposes is defined as “A set of rules governing what strings are valid or allowable in a language or text” [https://en.oxforddictionaries.com/definition/grammar]
The following elements are mandatory or recommended for computational grammars:
ldSubclassType
(Mandatory): Used to mark the subclass of a language description. For grammars, the value is fixed to ‘Grammar’.
encodingLevel
(Mandatory): Classifies the contents of a lexical/conceptual resource or language description as regards the linguistic level of analysis it caters for.
compliesWith
(Recommended): Specifies the vocabulary/standard/best practice with which a resource complies.
formalism
(Recommended): Specifies the formalism (bibliographic reference, URL, name) used for the creation/enrichment of the resource (grammar or tool/service).
ldTask
(Recommended): Specifies the task performed by the language description.
Example
<ms:Grammar>
<ms:ldSubclassType>Grammar</ms:ldSubclassType>
<ms:encodingLevel>http://w3id.org/meta-share/meta-share/morphology</ms:encodingLevel>
<ms:compliesWith>http://w3id.org/meta-share/meta-share/GrAF</ms:compliesWith>
</ms:Grammar>
Minimal elements for lexical/conceptual resources¶
This page describes the minimal metadata elements specific to lexical/conceptual resources.
1. Overview¶
Lexical/Conceptual resources comprise computational lexica, gazetteers, ontologies, term lists, etc. Under this class, we also include multimedia dictionaries, sign language resources, etc., which include not only text but also audio, video and image files. To cater for these cases, the notion of “media part” is introduced in the model. Thus, a lexical/conceptual resource consists of at least one text, audio, video, image or numerical text part. Depending on the media part type, the DatasetDistribution component includes a set of text, audio, video, etc. distribution features.
The first table below has all the elements (mandatory and recommended) for a lexical/conceptual resource. The second table presents the mandatory and recommended elements for each media part. The third table presents the mandatory and recommended elements for the Distribution component, which includes elements that are specific to each media part.
Table 1 - Lexical/Conceptual resource common elements
Element name | Optionality | Section | Tab |
---|---|---|---|
lcrSubclass | R | LCR | technical |
encodingLevel | M | LCR | technical |
contentType | R | LCR | technical |
compliesWith | R | LCR | technical |
personalDataIncluded | M | LCR | technical |
personalDataDetails | RA | LCR | technical |
sensitiveDataIncluded | M | LCR | technical |
sensitiveDataDetails | RA | LCR | technical |
anonymized | MA | LCR | technical |
anonymizationDetails | RA | LCR | technical |
Table 2 - Media parts
Element name | Optionality | Section | Tab |
---|---|---|---|
textPart | MA | LCR | Part |
lingualityType | M | LCR | Part |
multilingualityType | MA | LCR | Part |
multilingualityTypeDetails | R | LCR | Part |
language | M | LCR | Part |
metalanguage | R | LCR | Part |
audioPart | MA | LCR | Part |
lingualityType | M | LCR | Part |
multilingualityType | MA | LCR | Part |
multilingualityTypeDetails | RA | LCR | Part |
language | M | LCR | Part |
metalanguage | R | LCR | Part |
videoPart | MA | LCR | Part |
lingualityType | M | LCR | Part |
multilingualityType | MA | LCR | Part |
multilingualityTypeDetails | RA | LCR | Part |
language | M | LCR | Part |
metalanguage | R | LCR | Part |
typeOfVideoContent | M | LCR | Part |
imagePart | MA | LCR | Part |
lingualityType | M | LCR | Part |
multilingualityType | RA | LCR | Part |
multilingualityTypeDetails | RA | LCR | Part |
language | M | LCR | Part |
metalanguage | R | LCR | Part |
typeOfImageContent | M | LCR | Part |
Table 3 - Distribution
Element name | Optionality | Section | Tab |
---|---|---|---|
DatasetDistribution | M | Distribution | Technical |
DatasetDistributionForm | M | Distribution | Technical |
downloadLocation | MA | Distribution | Technical |
accessLocation | MA | Distribution | Technical |
distributionLocation | MA | Distribution | Technical |
samplesLocation | R | Distribution | Technical |
distributionTextFeature | MA | Distribution | Technical |
distributionAudioFeature | MA | Distribution | Technical |
distributionVideoFeature | MA | Distribution | Technical |
distributionImageFeature | MA | Distribution | Technical |
distributionTextNumericalFeature | MA | Distribution | Technical |
licenceTerms | M | Distribution | Technical |
cost | R | Distribution | Technical |
membershipInstitution | R | Distribution | Technical |
2. Element presentation¶
In this section, each of the aforementioned elements is presented separately, in the order in which they appear in the tables of the previous section.
LexicalConceptualResource¶
Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.LexicalConceptualResource
Data type component
Optionality Mandatory
Explanation & Instructions
Wraps together elements for lexical/conceptual resources
Example
<ms:LRSubclass>
<ms:LexicalConceptualResource>
<ms:lrType>LexicalConceptualResource</ms:lrType>
...
</ms:LexicalConceptualResource>
</ms:LRSubclass>
lcrSubclass¶
Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.LexicalConceptualResource.lcrSubclass
Data type CV (lcrSubclass)
Optionality Recommended
Explanation & Instructions
Introduces a classification of lexical/conceptual resources into types (used for descriptive reasons)
Example
<ms:lcrSubclass>http://w3id.org/meta-share/meta-share/computationalLexicon</ms:lcrSubclass>
<ms:lcrSubclass>http://w3id.org/meta-share/meta-share/ontology</ms:lcrSubclass>
encodingLevel¶
Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.LexicalConceptualResource.encodingLevel
Data type CV (encodingLevel)
Optionality Mandatory
Explanation & Instructions
Classifies the contents of a lexical/conceptual resource or language description as regards the linguistic level of analysis it caters for
You can repeat the element for multiple encoding levels.
Example
<ms:encodingLevel>http://w3id.org/meta-share/meta-share/phonology</ms:encodingLevel>
<ms:encodingLevel>http://w3id.org/meta-share/meta-share/semantics</ms:encodingLevel>
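Since encodingLevel (and likewise ContentType) is repeatable, a script that generates records simply emits one element per value. The following is a minimal Python sketch using the standard-library ElementTree module; the helper add_repeated is hypothetical, and the namespace URI bound to the ms prefix is an assumption to check against your own records.

```python
import xml.etree.ElementTree as ET

MS_NS = "http://w3id.org/meta-share/meta-share/"  # assumed URI for the "ms" prefix
ET.register_namespace("ms", MS_NS)  # serialize the namespace with the "ms" prefix


def add_repeated(parent, name, values):
    """Append one <ms:name> child per value, e.g. one per encoding level (hypothetical helper)."""
    for value in values:
        ET.SubElement(parent, f"{{{MS_NS}}}{name}").text = value


lcr = ET.Element(f"{{{MS_NS}}}LexicalConceptualResource")
add_repeated(lcr, "encodingLevel", [MS_NS + "phonology", MS_NS + "semantics"])
# Prints the element with two ms:encodingLevel children.
print(ET.tostring(lcr, encoding="unicode"))
```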
ContentType¶
Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.LexicalConceptualResource.ContentType
Data type CV (ContentType)
Optionality Recommended
Explanation & Instructions
A more detailed account of the linguistic information contained in the lexical/conceptual resource
You can repeat the element for multiple content types.
Example
<ms:ContentType>http://w3id.org/meta-share/meta-share/collocation</ms:ContentType>
<ms:ContentType>http://w3id.org/meta-share/meta-share/definition</ms:ContentType>
compliesWith¶
Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.LexicalConceptualResource.compliesWith
Data type CV (compliesWith)
Optionality Recommended
Explanation & Instructions
Specifies the vocabulary/standard/best practice with which a resource complies
Example
<ms:compliesWith>http://w3id.org/meta-share/meta-share/LMF</ms:compliesWith>
LexicalConceptualResourceTextPart¶
Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.LexicalConceptualResource.LexicalConceptualResourceMediaPart.LexicalConceptualResourceTextPart
Data type component
Optionality Mandatory if applicable
Explanation & Instructions
A part (or whole set) of a lexical/conceptual resource that consists of textual elements
You can repeat the group of elements for multiple textual parts.
The mandatory or recommended elements for the text part of lexical/conceptual resources are:
mediaType
(Mandatory): Specifies the media type of a language resource (the physical medium of the contents representation). For text parts, always use the value ‘text’.
lingualityType
(Mandatory): Indicates whether the resource includes one, two or more languages.
multilingualityType
(Recommended if applicable): Indicates whether the resource (part) is parallel, comparable or mixed. If lingualityType = bilingual or multilingual, it is recommended for lexical/conceptual resources; select one of the values for parallel (e.g., bilingual dictionaries with source and translation equivalents), comparable (e.g. lexica of the same domain in multiple languages).
language
(Mandatory): Specifies the language that is used in the resource part, expressed according to the BCP47 recommendation. See language.
languageVariety
(Mandatory if applicable): Relates a language resource that contains segments in a language variety (e.g., dialect, jargon) to it. Please use for dialect corpora.
metalanguage
(Recommended if applicable): Specifies the language that is used as support for the resource (e.g., English for a grammar of French described in English or for a French dictionary with English definitions), expressed according to the BCP47 recommendation. See language.
modalityType
(Recommended if applicable): Specifies the type of the modality represented in the resource. For instance, you can use ‘spoken language’ to describe transcribed speech corpora.
Example
<ms:LexicalConceptualResourceMediaPart>
<ms:LexicalConceptualResourceTextPart>
<ms:lcrMediaType>LexicalConceptualResourceTextPart</ms:lcrMediaType>
<ms:mediaType>http://w3id.org/meta-share/meta-share/text</ms:mediaType>
<ms:lingualityType>http://w3id.org/meta-share/meta-share/bilingual</ms:lingualityType>
<ms:multilingualityType>http://w3id.org/meta-share/meta-share/parallel</ms:multilingualityType>
<ms:language>
<ms:languageTag>en-US</ms:languageTag>
<ms:languageId>en</ms:languageId>
<ms:regionId>US</ms:regionId>
</ms:language>
<ms:language>
<ms:languageTag>es</ms:languageTag>
<ms:languageId>es</ms:languageId>
</ms:language>
<ms:metalanguage>
<ms:languageTag>es</ms:languageTag>
<ms:languageId>es</ms:languageId>
</ms:metalanguage>
</ms:LexicalConceptualResourceTextPart>
</ms:LexicalConceptualResourceMediaPart>
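The example above splits the BCP47 tag en-US into languageId en and regionId US. A simplified Python sketch of that decomposition follows; the helper split_language_tag is hypothetical, handles only the primary language and region subtags, and is no substitute for a full BCP47 parser.

```python
import re


def split_language_tag(tag):
    """Split a simple BCP 47 tag (e.g. "en-US") into the languageId/regionId values
    used in the example above. Script, variant and extension subtags are skipped."""
    subtags = tag.split("-")
    values = {"languageId": subtags[0].lower()}
    for sub in subtags[1:]:
        if len(sub) == 1:
            # A singleton ("u", "x", ...) starts an extension or private-use sequence; stop here.
            break
        # A region subtag is two letters or three digits (e.g. "US", "419").
        if re.fullmatch(r"[A-Za-z]{2}|[0-9]{3}", sub):
            values["regionId"] = sub.upper()
            break
    return values


print(split_language_tag("en-US"))  # {'languageId': 'en', 'regionId': 'US'}
print(split_language_tag("es"))     # {'languageId': 'es'}
```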
LexicalConceptualResourceAudioPart¶
Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.LexicalConceptualResource.LexicalConceptualResourceMediaPart.LexicalConceptualResourceAudioPart
Data type component
Optionality Mandatory if applicable
Explanation & Instructions
A part (or whole set) of a lexical/conceptual resource that consists of audio elements
You can repeat the group of elements for multiple audio parts.
The mandatory or recommended elements for the audio part of lexical/conceptual resources are:
mediaType
(Mandatory): Specifies the media type of a language resource (the physical medium of the contents representation). For audio parts, always use the value ‘audio’.
lingualityType
(Mandatory): Indicates whether the resource includes one, two or more languages.
multilingualityType
(Recommended if applicable): Indicates whether the resource (part) is parallel, comparable or mixed. If lingualityType = bilingual or multilingual, it is recommended for lexical/conceptual resources; select one of the values for parallel (e.g., bilingual dictionaries with source and translation equivalents), comparable (e.g. lexica of the same domain in multiple languages).
language
(Mandatory): Specifies the language that is used in the resource part, expressed according to the BCP47 recommendation. See language.
languageVariety
(Mandatory if applicable): Relates a language resource that contains segments in a language variety (e.g., dialect, jargon) to it. Please use for dialect corpora.
metalanguage
(Recommended if applicable): Specifies the language that is used as support for the resource (e.g., English for a grammar of French described in English or for a French dictionary with English definitions), expressed according to the BCP47 recommendation. See language.
modalityType
(Recommended if applicable): Specifies the type of the modality represented in the resource. For instance, you can use ‘spoken language’ to describe transcribed speech corpora.
Example
<ms:LexicalConceptualResourceMediaPart>
<ms:LexicalConceptualResourceAudioPart>
<ms:lcrMediaType>LexicalConceptualResourceAudioPart</ms:lcrMediaType>
<ms:mediaType>http://w3id.org/meta-share/meta-share/audio</ms:mediaType>
<ms:lingualityType>http://w3id.org/meta-share/meta-share/bilingual</ms:lingualityType>
<ms:multilingualityType>http://w3id.org/meta-share/meta-share/parallel</ms:multilingualityType>
<ms:language>
<ms:languageTag>en-US</ms:languageTag>
<ms:languageId>en</ms:languageId>
<ms:regionId>US</ms:regionId>
</ms:language>
<ms:language>
<ms:languageTag>es</ms:languageTag>
<ms:languageId>es</ms:languageId>
</ms:language>
<ms:metalanguage>
<ms:languageTag>es</ms:languageTag>
<ms:languageId>es</ms:languageId>
</ms:metalanguage>
</ms:LexicalConceptualResourceAudioPart>
</ms:LexicalConceptualResourceMediaPart>
LexicalConceptualResourceVideoPart¶
Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.LexicalConceptualResource.LexicalConceptualResourceMediaPart.LexicalConceptualResourceVideoPart
Data type component
Optionality Mandatory if applicable
Explanation & Instructions
A part (or whole set) of a lexical/conceptual resource that consists of video elements
You can repeat the group of elements for multiple video parts.
The mandatory or recommended elements for the video part of lexical/conceptual resources are:
mediaType
(Mandatory): Specifies the media type of a language resource (the physical medium of the contents representation). For video parts, always use the value ‘video’.
lingualityType
(Mandatory): Indicates whether the resource includes one, two or more languages.
multilingualityType
(Recommended if applicable): Indicates whether the resource (part) is parallel, comparable or mixed. If lingualityType = bilingual or multilingual, it is recommended for lexical/conceptual resources; select one of the values for parallel (e.g., bilingual dictionaries with source and translation equivalents), comparable (e.g. lexica of the same domain in multiple languages).
language
(Mandatory): Specifies the language that is used in the resource part, expressed according to the BCP47 recommendation. See language.
languageVariety
(Mandatory if applicable): Relates a language resource that contains segments in a language variety (e.g., dialect, jargon) to it. Please use for dialect corpora.
metalanguage
(Recommended if applicable): Specifies the language that is used as support for the resource (e.g., English for a grammar of French described in English or for a French dictionary with English definitions), expressed according to the BCP47 recommendation. See language.
modalityType
(Recommended if applicable): Specifies the type of the modality represented in the resource. For instance, you can use ‘spoken language’ to describe transcribed speech corpora.
Example
<ms:LexicalConceptualResourceMediaPart>
<ms:LexicalConceptualResourceVideoPart>
<ms:lcrMediaType>LexicalConceptualResourceVideoPart</ms:lcrMediaType>
<ms:mediaType>http://w3id.org/meta-share/meta-share/video</ms:mediaType>
<ms:lingualityType>http://w3id.org/meta-share/meta-share/bilingual</ms:lingualityType>
<ms:multilingualityType>http://w3id.org/meta-share/meta-share/parallel</ms:multilingualityType>
<ms:language>
<ms:languageTag>en-US</ms:languageTag>
<ms:languageId>en</ms:languageId>
<ms:regionId>US</ms:regionId>
</ms:language>
<ms:language>
<ms:languageTag>es</ms:languageTag>
<ms:languageId>es</ms:languageId>
</ms:language>
<ms:metalanguage>
<ms:languageTag>es</ms:languageTag>
<ms:languageId>es</ms:languageId>
</ms:metalanguage>
</ms:LexicalConceptualResourceVideoPart>
</ms:LexicalConceptualResourceMediaPart>
LexicalConceptualResourceImagePart¶
Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.LexicalConceptualResource.LexicalConceptualResourceMediaPart.LexicalConceptualResourceImagePart
Data type component
Optionality Mandatory if applicable
Explanation & Instructions
A part (or whole set) of a lexical/conceptual resource that consists of image elements
You can repeat the group of elements for multiple image parts.
The mandatory or recommended elements for the image part of lexical/conceptual resources are:
mediaType
(Mandatory): Specifies the media type of a language resource (the physical medium of the contents representation). For image parts, always use the value ‘image’.
lingualityType
(Mandatory): Indicates whether the resource includes one, two or more languages.
multilingualityType
(Recommended if applicable): Indicates whether the resource (part) is parallel, comparable or mixed. If lingualityType = bilingual or multilingual, it is recommended for lexical/conceptual resources; select one of the values for parallel (e.g., bilingual dictionaries with source and translation equivalents), comparable (e.g. lexica of the same domain in multiple languages).
language
(Mandatory): Specifies the language that is used in the resource part, expressed according to the BCP47 recommendation. See language.
languageVariety
(Mandatory if applicable): Relates a language resource that contains segments in a language variety (e.g., dialect, jargon) to it. Please use for dialect corpora.
metalanguage
(Recommended if applicable): Specifies the language that is used as support for the resource (e.g., English for a grammar of French described in English or for a French dictionary with English definitions), expressed according to the BCP47 recommendation. See language.
modalityType
(Recommended if applicable): Specifies the type of the modality represented in the resource. For instance, you can use ‘spoken language’ to describe transcribed speech corpora.
Example
<ms:LexicalConceptualResourceMediaPart>
<ms:LexicalConceptualResourceImagePart>
<ms:lcrMediaType>LexicalConceptualResourceImagePart</ms:lcrMediaType>
<ms:mediaType>http://w3id.org/meta-share/meta-share/image</ms:mediaType>
<ms:lingualityType>http://w3id.org/meta-share/meta-share/bilingual</ms:lingualityType>
<ms:multilingualityType>http://w3id.org/meta-share/meta-share/parallel</ms:multilingualityType>
<ms:language>
<ms:languageTag>en-US</ms:languageTag>
<ms:languageId>en</ms:languageId>
<ms:regionId>US</ms:regionId>
</ms:language>
<ms:language>
<ms:languageTag>es</ms:languageTag>
<ms:languageId>es</ms:languageId>
</ms:language>
<ms:metalanguage>
<ms:languageTag>es</ms:languageTag>
<ms:languageId>es</ms:languageId>
</ms:metalanguage>
</ms:LexicalConceptualResourceImagePart>
</ms:LexicalConceptualResourceMediaPart>
Minimal elements for projects¶
This page describes the minimal metadata elements specific to projects.
N.B. The interactive editor supports the full schema, i.e. it also includes optional elements.
1. Overview¶
Element name | Optionality | Tab |
---|---|---|
projectName | M | Identity |
ProjectIdentifier | R | Identity |
projectShortName | R | Identity |
projectAlternativeName | R | Identity |
projectSummary | R | Identity |
website | R | Identity |
email | R | Identity |
logo | R | Identity |
fundingType | R | Identity |
funder | R | Identity |
fundingCountry | R | Identity |
socialMediaOccupationalAccount | R | Identity |
LTArea | R | Categories |
domain | R | Categories |
keyword | R | Categories |
2. Element presentation¶
In this section, each of the aforementioned elements is presented separately, in the order in which they appear in the table of the previous section.
Project¶
Path MetadataRecord.DescribedEntity.Project
Data type component
Optionality Mandatory
Explanation & Instructions
Wraps together elements for projects
Example
<ms:Project>
<ms:entityType>project</ms:entityType>
...
</ms:Project>
ProjectIdentifier¶
Path MetadataRecord.DescribedEntity.Project.ProjectIdentifier
Data type string
Optionality Recommended
Explanations & Instructions
A string (e.g., PID, internal to an organization, issued by the funding authority, etc.) used to uniquely identify a project
You must also use the attribute ProjectIdentifierScheme to specify the name of the scheme according to which an identifier is assigned to a project by the authority that issues it. See ProjectIdentifierScheme for details.
Example
<ms:ProjectIdentifier ms:ProjectIdentifierScheme="http://w3id.org/meta-share/meta-share/cordis">219608</ms:ProjectIdentifier>
<ms:ProjectIdentifier ms:ProjectIdentifierScheme="http://w3id.org/meta-share/meta-share/cordis">219378</ms:ProjectIdentifier>
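Because every ProjectIdentifier must carry a ProjectIdentifierScheme attribute, a consumer of the record reads both together. The following is a minimal Python sketch with the standard-library ElementTree module; the function name project_identifiers is hypothetical, and the namespace URI bound to ms is an assumption (the sample declares it explicitly, so the sketch is self-consistent).

```python
import xml.etree.ElementTree as ET

MS_NS = "http://w3id.org/meta-share/meta-share/"  # assumed URI bound to "ms"

SNIPPET = f"""<ms:Project xmlns:ms="{MS_NS}">
  <ms:ProjectIdentifier ms:ProjectIdentifierScheme="{MS_NS}cordis">219608</ms:ProjectIdentifier>
</ms:Project>"""


def project_identifiers(xml_text):
    """Return (scheme, identifier) pairs; raise if an identifier lacks the scheme attribute."""
    root = ET.fromstring(xml_text)
    pairs = []
    for el in root.iter(f"{{{MS_NS}}}ProjectIdentifier"):
        # The attribute is namespaced (ms:ProjectIdentifierScheme), so use its qualified name.
        scheme = el.get(f"{{{MS_NS}}}ProjectIdentifierScheme")
        if scheme is None:
            raise ValueError(f"identifier {el.text!r} has no ProjectIdentifierScheme")
        pairs.append((scheme, el.text))
    return pairs


print(project_identifiers(SNIPPET))
# [('http://w3id.org/meta-share/meta-share/cordis', '219608')]
```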
projectName¶
Path MetadataRecord.DescribedEntity.Project.projectName
Data type multilingual string
Optionality Mandatory
Explanations & Instructions
The full name (title) of a project
Example
<ms:projectName xml:lang="en">Browser-based Multilingual Translation</ms:projectName>
<ms:projectName xml:lang="en">European Language Grid</ms:projectName>
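projectName, like the other multilingual strings on this page, can be repeated with different xml:lang values. The Python sketch below shows one way a consumer might pick a display value; the helper pick_by_lang and the sample names are hypothetical, and the namespace URI for ms is assumed.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
MS_NS = "http://w3id.org/meta-share/meta-share/"  # assumed URI for the "ms" prefix


def pick_by_lang(elements, preferred=("en",)):
    """Return the text of the first element whose xml:lang is in `preferred`,
    falling back to the first element (or None if there are none)."""
    by_lang = {}
    for el in elements:
        by_lang.setdefault(el.get(XML_LANG), el.text)  # keep the first value per language
    for lang in preferred:
        if lang in by_lang:
            return by_lang[lang]
    return elements[0].text if elements else None


# Hypothetical record with the same name in two languages.
RECORD = f"""<ms:Project xmlns:ms="{MS_NS}">
  <ms:projectName xml:lang="es">Proyecto de ejemplo</ms:projectName>
  <ms:projectName xml:lang="en">Example project</ms:projectName>
</ms:Project>"""

names = ET.fromstring(RECORD).findall(f"{{{MS_NS}}}projectName")
print(pick_by_lang(names))           # Example project
print(pick_by_lang(names, ("de",)))  # Proyecto de ejemplo
```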
projectShortName¶
Path MetadataRecord.DescribedEntity.Project.projectShortName
Data type multilingual string
Optionality Recommended
Explanations & Instructions
Introduces a short name (e.g., acronym, abbreviated form) by which a project is known
Example
<ms:projectShortName xml:lang="en">Bergamot</ms:projectShortName>
<ms:projectShortName xml:lang="en">ELG</ms:projectShortName>
projectAlternativeName¶
Path MetadataRecord.DescribedEntity.Project.projectAlternativeName
Data type multilingual string
Optionality Recommended
Explanations & Instructions
Introduces an alternative name (other than the short name) used for a project
Example
<ms:projectAlternativeName xml:lang="en">The European Language Grid</ms:projectAlternativeName>
projectSummary¶
Path MetadataRecord.DescribedEntity.Project.projectSummary
Data type multilingual string
Optionality Recommended
Explanations & Instructions
Introduces a short description (in free text) of the main objectives, mission or contents of the project
Example
<ms:projectSummary xml:lang="en">The Bergamot project will add and improve client-side machine translation in a web browser. Unlike current cloud-based options, running directly on users' machines empowers citizens to preserve their privacy and increases the uptake of language technologies in Europe in various sectors that require confidentiality. Free software integrated with an open-source web browser, such as Mozilla Firefox, will enable bottom-up adoption by non-experts, resulting in cost savings for private and public sector users who would otherwise procure translation or operate monolingually. To understand and support non-expert users, our user experience work package researches their needs and creates the user interface. Rather than simply translating text, this interface will expose improved quality estimates, addressing the rising public debate on algorithmic trust. Building on quality estimation research, we will enable users to confidently generate text in a language they do not speak, enabling cross-lingual online form filling. To improve quality overall, dynamic domain adaptation research addresses the peculiar writing style of a website or user by adapting translation on the fly using local information too private to upload to the cloud. These applications require adaptation and inference to run on desktop hardware with compact model downloads, which we address with neural network efficiency research. Our combined research on user experience, domain adaptation, quality estimation, outbound translation, and efficiency support a broad browser-based innovation plan.</ms:projectSummary>
<ms:projectSummary xml:lang="en">With 24 official EU and many more additional languages, multilingualism in Europe and an inclusive Digital Single Market can only be enabled through Language Technologies (LTs). European LT business is dominated by thousands of SMEs and a few large players. Many are world-class, with technologies that outperform the global players. However, European LT business is also fragmented by nation states, languages, verticals and sectors. Likewise, while much of European LT research is world-class, with results transferred into industry and commercial products, its full impact is held back by fragmentation. The key issue and challenge is the fragmentation of the European LT landscape. The European Language Grid (ELG) project will address this fragmentation by establishing the ELG as the primary platform for LT in Europe. The ELG will be a scalable cloud platform, providing, in an easy-to-integrate way, access to hundreds of commercial and non-commercial Language Technologies for all European languages, including running tools and services as well as data sets and resources. It will enable the commercial and non-commercial European LT community to deposit and upload their technologies and data sets into the ELG, to deploy them through the grid, and to connect with other resources. The ELG will boost the Multilingual Digital Single Market towards a thriving European LT community, creating new jobs and opportunities. Through open calls, up to 20 pilot projects will be financially supported to demonstrate the usefulness of the ELG. The proposal is rooted in the experience of a consortium with partners involved in all relevant initiatives. Based on these, 30 national competence centres and the European LT Board will be set up for European coordination.
The ELG will foster "language technologies for Europe built in Europe", tailored to our languages and cultures and to our societal and economical demands, benefitting the European citizen, society, innovation and industry.</ms:projectSummary>
website¶
Path MetadataRecord.DescribedEntity.Project.website
Data type URL
Optionality Recommended
Explanations & Instructions
Links to a URL that acts as the primary page (like a table of contents) introducing information about an organization (e.g., products, contact information, etc.) or project
Example
<ms:website>https://browser.mt/</ms:website>
<ms:website>https://www.european-language-grid.eu/</ms:website>
email¶
Path MetadataRecord.DescribedEntity.Project.email
Data type string
Optionality Recommended
Explanations & Instructions
Points to the email address used by a project for information purposes
Example
<ms:email>info@project.eu</ms:email>
logo¶
Path MetadataRecord.DescribedEntity.Project.logo
Data type URL
Optionality Recommended
Explanations & Instructions
Links to a URL with an image file containing a symbol or graphic object used to identify the entity
In the interactive editor, users can also upload an image file.
Example
<ms:logo>https://ufal.mff.cuni.cz/sites/default/files/styles/drupal_projects_logo_style/public/bergamot_logo.png</ms:logo>
<ms:logo>https://www.european-language-grid.eu/wp-content/themes/elg_theme/fab/image/logo/rgb_elg__logo--colour.svg</ms:logo>
fundingType¶
Path MetadataRecord.DescribedEntity.Project.fundingType
Data type CV (fundingType)
Optionality Recommended
Explanations & Instructions
Specifies the type of funding of a project with regard to the source of the funding
Example
<ms:fundingType>http://w3id.org/meta-share/meta-share/euFunds</ms:fundingType>
funder¶
Path MetadataRecord.DescribedEntity.Project.funder
Data type component
Optionality Recommended
Explanations & Instructions
Identifies the person/organization/group that has financed the project
Funding information is important for acknowledgement purposes.
For organizations, you must provide the name of the organization (organizationName) and, if possible, a website (website) and/or an identifier (OrganizationIdentifier).
Example
<ms:funder>
<ms:Organization>
<ms:actorType>Organization</ms:actorType>
<ms:organizationName xml:lang="en">European Commission</ms:organizationName>
<ms:website>https://ec.europa.eu/info/index_en</ms:website>
</ms:Organization>
</ms:funder>
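The funder structure above can also be generated programmatically. The following is a minimal sketch using Python's standard xml.etree.ElementTree, not an ELG-provided tool. It assumes the ms prefix is bound to http://w3id.org/meta-share/meta-share/, and the helper names (build_funder, q) are hypothetical. organizationName is always written, while website is added only when supplied, mirroring the instructions above.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Assumption: namespace URI bound to the "ms" prefix in ELG metadata records.
MS = "http://w3id.org/meta-share/meta-share/"
# ElementTree serializes this reserved namespace with the "xml:" prefix.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
ET.register_namespace("ms", MS)


def q(tag: str) -> str:
    """Return the namespace-qualified (Clark notation) name of an ms: element."""
    return f"{{{MS}}}{tag}"


def build_funder(name: str, website: Optional[str] = None, lang: str = "en") -> ET.Element:
    """Build an ms:funder that wraps an ms:Organization (hypothetical helper).

    organizationName is always written; website is written only if given.
    """
    funder = ET.Element(q("funder"))
    org = ET.SubElement(funder, q("Organization"))
    ET.SubElement(org, q("actorType")).text = "Organization"
    org_name = ET.SubElement(org, q("organizationName"))
    org_name.set(XML_LANG, lang)
    org_name.text = name
    if website:
        ET.SubElement(org, q("website")).text = website
    return funder
```

For example, `ET.tostring(build_funder("European Commission", "https://ec.europa.eu/info/index_en"), encoding="unicode")` yields a string equivalent to the example above (on a single line, with the namespace declaration on the root element).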
fundingCountry¶
Path MetadataRecord.DescribedEntity.Project.fundingCountry
Data type CV (regionIdType)
Optionality Recommended
Explanations & Instructions
Specifies the funding country, in case of national funding, using a code from ISO 3166
Example
<ms:fundingCountry>EU</ms:fundingCountry>
LTArea¶
Path MetadataRecord.DescribedEntity.Project.LTArea
Data type component
Optionality Recommended
Explanations & Instructions
Introduces a Language Technology-related area that the project deals with
For details, see LTArea. More specifically, you can fill in either:
the LTClassRecommended element with one of the recommended values from the LT taxonomy, or
the LTClassOther element with free text.
Example
<ms:LTArea>
<ms:LTClassRecommended>http://w3id.org/meta-share/omtd-share/MachineTranslation</ms:LTClassRecommended>
</ms:LTArea>
<ms:LTArea>
<ms:LTClassOther>Browser-based Machine Translation</ms:LTClassOther>
</ms:LTArea>
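Each LTArea holds exactly one of the two child elements. A minimal sketch of that choice in Python follows; the function name build_lt_area is hypothetical, the ms namespace URI is assumed, and the prefix test is only a heuristic based on the taxonomy URIs in the examples: it does not check that a URI actually exists in the LT taxonomy.

```python
import xml.etree.ElementTree as ET

# Assumption: namespace URI bound to the "ms" prefix in ELG metadata records.
MS = "http://w3id.org/meta-share/meta-share/"
ET.register_namespace("ms", MS)
# Prefix shared by the LT taxonomy values in the examples above.
OMTD_PREFIX = "http://w3id.org/meta-share/omtd-share/"


def q(tag: str) -> str:
    """Return the namespace-qualified name of an ms: element."""
    return f"{{{MS}}}{tag}"


def build_lt_area(value: str) -> ET.Element:
    """Wrap a value in ms:LTArea with exactly one child (hypothetical helper).

    Heuristic: a value starting with the taxonomy prefix goes into
    LTClassRecommended; anything else is treated as free text (LTClassOther).
    """
    area = ET.Element(q("LTArea"))
    child = "LTClassRecommended" if value.startswith(OMTD_PREFIX) else "LTClassOther"
    ET.SubElement(area, q(child)).text = value
    return area
```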
domain¶
Path MetadataRecord.DescribedEntity.Project.domain
Data type component
Optionality Recommended
Explanations & Instructions
Identifies a domain that the project deals with
You must fill in the categoryLabel element with a free-text value. If you prefer to add a value from an established controlled vocabulary, you can also use the DomainIdentifier element (with the attribute DomainClassificationScheme set to the appropriate value).
Example
<ms:domain>
<ms:categoryLabel xml:lang="en">http://w3id.org/meta-share/omtd-share/NewsMediaJournalismAndPublishing</ms:categoryLabel>
</ms:domain>
<ms:domain>
<ms:categoryLabel xml:lang="en">General</ms:categoryLabel>
</ms:domain>
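The rule above (categoryLabel always present, DomainIdentifier optional but accompanied by its scheme attribute) can be sketched in Python as follows. This is an illustration under assumptions: build_domain is a hypothetical helper, the ms namespace URI is assumed, the attribute is written as ms:DomainClassificationScheme by analogy with ms:OrganizationIdentifierScheme, and the child order has not been checked against the ELG XSD.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Assumption: namespace URI bound to the "ms" prefix in ELG metadata records.
MS = "http://w3id.org/meta-share/meta-share/"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
ET.register_namespace("ms", MS)


def q(tag: str) -> str:
    """Return the namespace-qualified name of an ms: element or attribute."""
    return f"{{{MS}}}{tag}"


def build_domain(label: str, identifier: Optional[str] = None,
                 scheme: Optional[str] = None, lang: str = "en") -> ET.Element:
    """Build an ms:domain (hypothetical helper).

    categoryLabel is mandatory; DomainIdentifier is optional but, if given,
    must carry the DomainClassificationScheme attribute.
    """
    if not label:
        raise ValueError("categoryLabel is mandatory")
    if identifier and not scheme:
        raise ValueError("DomainIdentifier requires a DomainClassificationScheme")
    domain = ET.Element(q("domain"))
    cat = ET.SubElement(domain, q("categoryLabel"))
    cat.set(XML_LANG, lang)
    cat.text = label
    if identifier:
        ident = ET.SubElement(domain, q("DomainIdentifier"))
        ident.set(q("DomainClassificationScheme"), scheme)
        ident.text = identifier
    return domain
```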
keyword¶
Path MetadataRecord.DescribedEntity.Project.keyword
Data type multilingual string
Optionality Recommended
Explanations & Instructions
Introduces a word or phrase considered important for the description of the project and thus used to index or classify it
Example
<ms:keyword xml:lang="en">Machine translation</ms:keyword>
<ms:keyword xml:lang="en">translation integration</ms:keyword>
<ms:keyword xml:lang="en">Language technology services</ms:keyword>
<ms:keyword xml:lang="en">Multilingualism</ms:keyword>
<ms:keyword xml:lang="en">Less-resourced languages</ms:keyword>
Minimal elements for organizations¶
This page describes the minimal metadata elements specific to organizations.
N.B. The interactive editor supports the full schema, i.e. it also includes optional elements.
1. Overview¶
Element name | Optionality | Tab
organizationName | M | Identity
OrganizationIdentifier | R | Identity
organizationShortName | R | Identity
organizationAlternativeName | R | Identity
organizationBio | R | Identity
logo | R | Identity
LTArea | R | Activities
serviceOffered | R | Activities
domain | R | Activities
keyword | R | Activities
email | R | Contact
website | R | Contact
headOfficeAddress | R | Contact
socialMediaOccupationalAccount | R | Contact
divisionCategory | R | Division
isDivisionOf | R | Division
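The table can also be read as a checklist. The sketch below uses Python's standard xml.etree.ElementTree to report which mandatory and recommended elements from the table are absent from an ms:Organization record. It assumes that ms is bound to http://w3id.org/meta-share/meta-share/ and only inspects direct children of ms:Organization; check_organization is a hypothetical helper and no substitute for validation against the ELG XML schema.

```python
import xml.etree.ElementTree as ET
from typing import List, Set, Tuple

# Assumption: namespace URI bound to the "ms" prefix in ELG metadata records.
MS = "http://w3id.org/meta-share/meta-share/"

# Element names and optionality copied from the overview table.
MANDATORY = ["organizationName"]
RECOMMENDED = [
    "OrganizationIdentifier", "organizationShortName", "organizationAlternativeName",
    "organizationBio", "logo", "LTArea", "serviceOffered", "domain", "keyword",
    "email", "website", "headOfficeAddress", "socialMediaOccupationalAccount",
    "divisionCategory", "isDivisionOf",
]


def _missing(names: List[str], present: Set[str]) -> List[str]:
    """Return the names whose ms:-qualified tag is not among the present tags."""
    return [n for n in names if f"{{{MS}}}{n}" not in present]


def check_organization(xml_text: str) -> Tuple[List[str], List[str]]:
    """Return (missing mandatory, missing recommended) children of ms:Organization."""
    root = ET.fromstring(xml_text)
    if root.tag == f"{{{MS}}}Organization":
        org = root
    else:
        org = root.find(f".//{{{MS}}}Organization")
    if org is None:
        raise ValueError("no ms:Organization element found")
    present = {child.tag for child in org}
    return _missing(MANDATORY, present), _missing(RECOMMENDED, present)
```

An empty first list means the mandatory element is present; the second list shows which recommended elements could still be added to improve the visibility of the record.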
2. Element presentation¶
In this section, each of the aforementioned elements is presented separately, following the order of the table in the previous section.
Organization¶
Path MetadataRecord.DescribedEntity.Organization
Data type component
Optionality Mandatory
Explanation & Instructions
Wraps together elements for organizations
Example
<ms:Organization>
<ms:entityType>Organization</ms:entityType>
...
</ms:Organization>
organizationName¶
Path MetadataRecord.DescribedEntity.Organization.organizationName
Data type multilingual string
Optionality Mandatory
Explanation & Instructions
The full name of an organization
Example
<ms:organizationName xml:lang="en">Charles University</ms:organizationName>
<ms:organizationName xml:lang="en">Evaluation and Language Resources Distribution Agency</ms:organizationName>
OrganizationIdentifier¶
Path MetadataRecord.DescribedEntity.Organization.OrganizationIdentifier
Data type string
Optionality Recommended
Explanation & Instructions
A string (e.g., PID, internal to an organization, issued by the funding authority, etc.) used to uniquely identify an organization
You must also use the attribute OrganizationIdentifierScheme to specify the name of the scheme according to which an identifier is assigned to an organization by the authority that issues it. See OrganizationIdentifierScheme for details.
It is recommended to add an identifier issued by an authority, such as GRID, if available.
Example
<ms:OrganizationIdentifier ms:OrganizationIdentifierScheme="http://w3id.org/meta-share/meta-share/grid">https://www.grid.ac/institutes/grid.5216.0</ms:OrganizationIdentifier>
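Since every OrganizationIdentifier must carry its scheme attribute, a record can be checked for identifiers that lack one. The following Python sketch does this with the standard xml.etree.ElementTree; the function name identifiers_without_scheme is hypothetical, and the ms namespace URI is an assumption.

```python
import xml.etree.ElementTree as ET
from typing import List

# Assumption: namespace URI bound to the "ms" prefix in ELG metadata records.
MS = "http://w3id.org/meta-share/meta-share/"
SCHEME_ATTR = f"{{{MS}}}OrganizationIdentifierScheme"


def identifiers_without_scheme(xml_text: str) -> List[str]:
    """Return the values of ms:OrganizationIdentifier elements lacking the scheme attribute."""
    root = ET.fromstring(xml_text)
    # iter() walks the whole tree, including the root element itself.
    return [
        (el.text or "").strip()
        for el in root.iter(f"{{{MS}}}OrganizationIdentifier")
        if not el.get(SCHEME_ATTR)
    ]
```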
organizationShortName¶
Path MetadataRecord.DescribedEntity.Organization.organizationShortName
Data type multilingual string
Optionality Recommended
Explanation & Instructions
Introduces the short name (abbreviation, acronym, etc.) used for an organization
Example
<ms:organizationShortName xml:lang="en">CUNI</ms:organizationShortName>
<ms:organizationShortName xml:lang="en">ELDA</ms:organizationShortName>
organizationAlternativeName¶
Path MetadataRecord.DescribedEntity.Organization.organizationAlternativeName
Data type multilingual string
Optionality Recommended
Explanation & Instructions
Introduces an alternative name (other than the short name) used for an organization
Example
<ms:organizationAlternativeName xml:lang="en">UNIVERZITA KARLOVA</ms:organizationAlternativeName>
<ms:organizationAlternativeName xml:lang="en">EVALUATIONS AND LANGUAGE RESOURCES DISTRIBUTION AGENCY</ms:organizationAlternativeName>
organizationBio¶
Path MetadataRecord.DescribedEntity.Organization.organizationBio
Data type multilingual string
Optionality Recommended
Explanation & Instructions
Introduces a short free-text account that provides information on an organization
Example
<ms:organizationBio xml:lang="en">Charles University was founded in 1348, making it one of the oldest universities in the world. Yet it is also renowned as a modern, dynamic, cosmopolitan and prestigious institution of higher education. It is the largest and most renowned Czech university, and is also the best-rated Czech university according to international rankings. There are currently 17 faculties at the University, plus 3 institutes, 6 other centres of teaching, research, development and other creative activities, a centre providing information services, 5 facilities serving the whole University, and the Rectorate - which is the executive management body for the whole University.</ms:organizationBio>
<ms:organizationBio xml:lang="en">The Evaluations and Language Resources Distribution Agency (ELDA) was created in 1995 as the organizational infrastructure with the mission of providing a central clearing house for Language Resources (LR) of the European Language Resources Association (ELRA). ELDA was set up to identify, classify, collect, validate and distribute the language resources that are needed by the Human Language Technology (HLT) community. Anticipating the evolutions in the HLT field, ELDA broadened its activities to cover multimedia/multimodal resources as well as evaluation activities, distributing the language resources needed for evaluation purposes, and conducting/coordinating evaluation campaigns. ELDA has played a significant role within the major Multimedia and Multimodal production projects that resulted in one of the most impressive catalogues of available data sets, embracing all aspects of Language Technologies. ELDA was also involved in evaluation initiatives, in several FPs’ projects involving HLT infrastructures, as well as in national programmes. In addition to work on data production, processing and annotation, validation and quality control, several of these projects also involved work on legal framework management for the produced resources. Moreover, ELDA has contributed to the development of open platforms and has joined forces with other European key players by bringing its assets (LR catalogue, evaluation services and benchmarking) to constitute Europe's backbone for Language Resources sharing and distribution. ELDA is also the initiator of the Language Resources and Evaluation Conference (LREC), held since 1998. With over 1200 participants, LREC is the major event on Language Resources (LRs) and Evaluation for Human Language Technologies (HLT).</ms:organizationBio>
logo¶
Path MetadataRecord.DescribedEntity.Organization.logo
Data type URL
Optionality Recommended
Explanation & Instructions
Links to a URL with an image file containing a symbol or graphic object used to identify the entity
In the interactive editor, users can also upload an image file.
Example
<ms:logo>https://cuni.cz/UKEN-1-version1-afoto.jpg</ms:logo>
<ms:logo>https://www.european-language-grid.eu/wp-content/uploads/2019/03/logo__consortium-elda.svg</ms:logo>
LTArea¶
Path MetadataRecord.DescribedEntity.Organization.LTArea
Data type component
Optionality Recommended
Explanation & Instructions
Introduces a Language Technology-related area that a person or organization is involved or active in
For details, see LTArea. More specifically, you can fill in either:
the LTClassRecommended element with one of the recommended values from the LT taxonomy, or
the LTClassOther element with free text.
Example
<ms:LTArea>
<ms:LTClassRecommended>http://w3id.org/meta-share/omtd-share/LanguageTechnology</ms:LTClassRecommended>
</ms:LTArea>
<ms:LTArea>
<ms:LTClassRecommended>http://w3id.org/meta-share/omtd-share/MachineTranslation</ms:LTClassRecommended>
</ms:LTArea>
serviceOffered¶
Path MetadataRecord.DescribedEntity.Organization.serviceOffered
Data type multilingual string
Optionality Recommended
Explanation & Instructions
Lists the service(s) offered by an organization or person
Example
<ms:serviceOffered xml:lang="en">Evaluation and benchmarking</ms:serviceOffered>
<ms:serviceOffered xml:lang="en">Legal support</ms:serviceOffered>
domain¶
Path MetadataRecord.DescribedEntity.Organization.domain
Data type component
Optionality Recommended
Explanation & Instructions
Identifies a domain that the organization deals with
You must fill in the categoryLabel element with a free-text value. If you prefer to add a value from an established controlled vocabulary, you can also use the DomainIdentifier element (with the attribute DomainClassificationScheme set to the appropriate value).
Example
<ms:domain>
<ms:categoryLabel xml:lang="en">environment</ms:categoryLabel>
</ms:domain>
keyword¶
Path MetadataRecord.DescribedEntity.Organization.keyword
Data type multilingual string
Optionality Recommended
Explanation & Instructions
Introduces a word or phrase considered important for the description of the organization and thus used to index or classify it
Example
<ms:keyword xml:lang="en">Research infrastructures</ms:keyword>
<ms:keyword xml:lang="en">Language Resources</ms:keyword>
<ms:keyword xml:lang="en">Digital Humanities</ms:keyword>
<ms:keyword xml:lang="en">Language Resources and Evaluation</ms:keyword>
<ms:keyword xml:lang="en">Legal support</ms:keyword>
<ms:keyword xml:lang="en">Data management</ms:keyword>
email¶
Path MetadataRecord.DescribedEntity.Organization.email
Data type string
Optionality Recommended
Explanation & Instructions
Points to the email address of a person, organization or group
Example
<ms:email>info@company.eu</ms:email>
website¶
Path MetadataRecord.DescribedEntity.Organization.website
Data type URL
Optionality Recommended
Explanation & Instructions
Links to a URL that acts as the primary page (like a table of contents) introducing information about an organization (e.g., products, contact information, etc.) or project
Example
<ms:website>https://www.cuni.cz</ms:website>
<ms:website>http://www.elra.info/en/</ms:website>
headOfficeAddress¶
Path MetadataRecord.DescribedEntity.Organization.headOfficeAddress
Data type component
Optionality Recommended
Explanation & Instructions
Links to a set of elements that describe the full address of the head office of an organization (i.e. including street address, zip code, etc.). The only mandatory element in this set is country.
Example
<ms:headOfficeAddress>
<ms:address xml:lang="en">OLD COLLEGE, SOUTH BRIDGE</ms:address>
<ms:zipCode>EH8 9YL</ms:zipCode>
<ms:city xml:lang="en">EDINBURGH</ms:city>
<ms:country>GB</ms:country>
</ms:headOfficeAddress>
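Because country is the only mandatory element of headOfficeAddress, a quick check can confirm it is present and non-empty. This Python sketch (standard xml.etree.ElementTree) accepts either a headOfficeAddress element or a larger record containing one; head_office_country is a hypothetical helper and the ms namespace URI is an assumption. It does not validate the country code itself.

```python
import xml.etree.ElementTree as ET

# Assumption: namespace URI bound to the "ms" prefix in ELG metadata records.
MS = "http://w3id.org/meta-share/meta-share/"


def head_office_country(xml_text: str) -> str:
    """Return the ms:country of the first ms:headOfficeAddress, raising if it is missing."""
    root = ET.fromstring(xml_text)
    if root.tag == f"{{{MS}}}headOfficeAddress":
        addr = root
    else:
        addr = root.find(f".//{{{MS}}}headOfficeAddress")
    if addr is None:
        raise ValueError("no ms:headOfficeAddress element found")
    country = addr.find(f"{{{MS}}}country")
    if country is None or not (country.text or "").strip():
        raise ValueError("headOfficeAddress must contain a non-empty ms:country")
    return country.text.strip()
```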
socialMediaOccupationalAccount¶
Path MetadataRecord.DescribedEntity.Organization.socialMediaOccupationalAccount
Data type multilingual string
Optionality Recommended
Explanation & Instructions
Introduces the social media or occupational account details of a person or organization
You must also use the attribute socialMediaOccupationalAccountType to specify the type of social media account. See https://european-language-grid.readthedocs.io/en/stable/Documentation/ELG-SHAREschema.html#socialMediaOccupationalAccountType for details.
Example
<ms:socialMediaOccupationalAccount ms:socialMediaOccupationalAccountType="http://w3id.org/meta-share/meta-share/facebook">https://www.facebook.com/UFALMFFUK</ms:socialMediaOccupationalAccount>
divisionCategory¶
Path MetadataRecord.DescribedEntity.Organization.divisionCategory
Data type CV
Optionality Recommended
Explanation & Instructions
Classifies the division of an organization according to a controlled vocabulary
If the organization you describe is part of a parent organization, specify its category, e.g. faculty or department of a university, laboratory in a company, etc.
Example
<ms:divisionCategory>http://w3id.org/meta-share/meta-share/institute</ms:divisionCategory>
isDivisionOf¶
Path MetadataRecord.DescribedEntity.Organization.isDivisionOf
Data type component
Optionality Recommended
Explanation & Instructions
Links a division (e.g. a faculty or department of a university, a laboratory in a company) to the organization it belongs to
Example
<ms:isDivisionOf>
<ms:organizationName xml:lang="en">Charles University</ms:organizationName>
<ms:website>https://www.cuni.cz</ms:website>
</ms:isDivisionOf>
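divisionCategory and isDivisionOf are typically filled in together when the described organization is part of a parent organization. The Python sketch below builds such a record with the standard xml.etree.ElementTree. It is illustrative only: build_division and the division name "Example Institute" are hypothetical, the ms namespace URI is assumed, other elements of the record (such as entityType) are omitted, and the child order has not been checked against the ELG XML schema.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Assumption: namespace URI bound to the "ms" prefix in ELG metadata records.
MS = "http://w3id.org/meta-share/meta-share/"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
ET.register_namespace("ms", MS)


def q(tag: str) -> str:
    """Return the namespace-qualified name of an ms: element."""
    return f"{{{MS}}}{tag}"


def _named(parent: ET.Element, name: str, lang: str) -> None:
    """Append an ms:organizationName with a language tag to parent."""
    el = ET.SubElement(parent, q("organizationName"))
    el.set(XML_LANG, lang)
    el.text = name


def build_division(name: str, category_uri: str, parent_name: str,
                   parent_website: Optional[str] = None, lang: str = "en") -> ET.Element:
    """Build an ms:Organization describing a division of a parent organization."""
    org = ET.Element(q("Organization"))
    _named(org, name, lang)
    ET.SubElement(org, q("divisionCategory")).text = category_uri
    parent = ET.SubElement(org, q("isDivisionOf"))
    _named(parent, parent_name, lang)
    if parent_website:
        ET.SubElement(parent, q("website")).text = parent_website
    return org
```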
1 To register a metadata record at the ELG platform, the recommended elements do not have to be filled in. However, they increase the visibility and usability of the item, and providers are encouraged to fill them in. The ELG interactive editor contains both the mandatory and recommended elements. The full schema is currently supported through the upload of metadata records.
socialMediaOccupationalAccount¶
Path MetadataRecord.DescribedEntity.Project.socialMediaOccupationalAccount
Data type multilingual string
Optionality Recommended
Explanations & Instructions
Introduces the social media or occupational account details of a person, organization or project
You must also use the attribute socialMediaOccupationalAccountType to specify the type of social media account. See socialMediaOccupationalAccountType for details.
Example