Metadata schema
This annex provides an overview of ELG’s metadata schema, ELG-SHARE. We describe the basic concepts, provide links to the full schema documentation, and finally present the “minimal version” of the schema, consisting only of required and recommended elements.
Basic concepts
The following figure shows the main notions upon which the ELG schema builds.
These include:
MetadataRecord: corresponds to the catalogue entry. It records information about the registration process, such as who created the entry and when, whether it was harvested from another catalogue, and who is responsible for its curation (updates).
DescribedEntity: corresponds to any entity that can be described by a metadata record, such as a Language Resource, a Person or an Organization (cf. Types of catalogue entries).
LanguageResource: a described entity that is further classified into one of four resource types: ToolService, Corpus, LexicalConceptualResource and LanguageDescription. A Language Resource is described through a set of metadata elements common to all four types, plus a further set specific to each type.
Distribution: corresponds to the physical form in which a Language Resource is made available through the catalogue, e.g. as a downloadable file or as a form accessed via an interface.
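As a rough illustration of how these notions relate, the following sketch models them as Python dataclasses. The class and field names are hypothetical simplifications chosen for this example; they mirror the concepts described above, not the actual element structure of the ELG-SHARE XSD.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class ResourceType(Enum):
    """The four Language Resource types named in the schema."""
    TOOL_SERVICE = "ToolService"
    CORPUS = "Corpus"
    LEXICAL_CONCEPTUAL_RESOURCE = "LexicalConceptualResource"
    LANGUAGE_DESCRIPTION = "LanguageDescription"


@dataclass
class DescribedEntity:
    """Any entity a metadata record can describe (resource, person, organization...)."""
    name: str


@dataclass
class Distribution:
    """A physical form in which a resource is made available (hypothetical fields)."""
    form: str             # e.g. "downloadable file" or "accessed via an interface"
    access_location: str


@dataclass
class LanguageResource(DescribedEntity):
    """A described entity classified into exactly one of the four resource types."""
    resource_type: ResourceType
    distributions: List[Distribution] = field(default_factory=list)


@dataclass
class MetadataRecord:
    """The catalogue entry: registration information plus the entity it describes."""
    described_entity: DescribedEntity
    created_by: str
    harvested_from: Optional[str] = None  # set only if harvested from another catalogue
    curator: Optional[str] = None         # who is responsible for updates


# Usage: a corpus with one downloadable distribution, registered directly.
corpus = LanguageResource(
    name="Example corpus",
    resource_type=ResourceType.CORPUS,
    distributions=[Distribution("downloadable file", "https://example.org/corpus.zip")],
)
record = MetadataRecord(described_entity=corpus, created_by="curator@example.org")
```

The sketch keeps MetadataRecord separate from the entity it describes, reflecting the distinction between the catalogue entry and the described entity.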
Full schema documentation
The ELG-SHARE schema Git repository contains the full schema XSD and its documentation, as well as templates and examples of metadata records for all resource types.
You can browse the full schema documentation here:
Minimal version
The minimal version comprises a carefully selected set of metadata elements that are deemed important for purposes such as:
identification and citation: resource name(s); identifier(s); a short description of contents; versioning information; a contact point for further information (email or landing page); details of the resource provider(s) and resource creator(s); classification by domain, keywords and intended LT application; language coverage (language and, if needed, dialect); publication date;
support: links to manuals and training material; samples of the resource;
usage/access: distribution form (e.g. as downloadable file, a form that can be accessed via an interface, source code or binary file of software, etc.); licensing conditions; access location.
These metadata elements can be used to describe all resources, irrespective of the resource type. Additional elements specific to each resource type are also required, such as size and format for data files, or prerequisites for tools and services.
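The split between elements common to all resources and elements specific to each type can be sketched as a simple completeness check. This is a minimal sketch assuming a flat dictionary record; the element names below are hypothetical placeholders, not the actual ELG-SHARE element paths, and the check does not replace validation against the XSD.

```python
from typing import Dict, List

# Hypothetical element names, for illustration only; consult the
# ELG-SHARE documentation for the actual element paths.
COMMON_ELEMENTS: List[str] = [
    "resourceName", "description", "version", "contact",
    "language", "licence", "publicationDate",
]

# Hypothetical type-specific additions, following the examples in the text:
# size and format for data resources, prerequisites for tools and services.
TYPE_SPECIFIC_ELEMENTS: Dict[str, List[str]] = {
    "Corpus": ["size", "dataFormat"],
    "ToolService": ["prerequisites"],
}


def missing_elements(record: Dict[str, object], resource_type: str) -> List[str]:
    """Return the required element names that are absent or empty in the record."""
    required = COMMON_ELEMENTS + TYPE_SPECIFIC_ELEMENTS.get(resource_type, [])
    return [name for name in required if not record.get(name)]


record = {"resourceName": "Example corpus", "description": "A sample corpus.", "version": "1.0.0"}
print(missing_elements(record, "Corpus"))
# ['contact', 'language', 'licence', 'publicationDate', 'size', 'dataFormat']
```

The common list is always checked; the type-specific list is appended only for the given resource type, so the same record yields different gaps depending on whether it describes a corpus or a tool.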
For each metadata element we present the following information:
- Path: the path of the element, as in the XSD
- Data type:
  - string
  - multilingual string: the element can be repeated for different language versions; to specify the language, you must use the xml:lang attribute with a language tag conforming to IETF BCP 47, built from subtags in the IANA Language Subtag Registry; for every multilingual element, a value in English ("en") is mandatory
  - component: a group of elements
  - Controlled Vocabulary (CV): a value taken from a controlled vocabulary; a link to the relevant controlled vocabulary is provided
  - date: a date in the xs:date format (e.g. 2020-05-31)
  - URL
- Optionality:
  - Mandatory (M): the element must always be filled in the metadata record
  - Recommended (R): the use of the element is not enforced, but it provides important information
  - Mandatory if applicable (MA): the element must be filled in when specific conditions apply
  - Recommended if applicable (RA): the use of the element is recommended when specific conditions apply
- Explanation & Instructions: a short definition of the element, followed by instructions on how it should be used in the specific context.
- Example: one or more examples of the element in XML format.
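To illustrate the multilingual string and date data types, the following sketch uses Python's standard xml.etree.ElementTree to emit one element per language version with an xml:lang attribute, enforcing the mandatory English value, and a date serialized as xs:date. The namespace URI and the element names (LanguageResource, resourceName, publicationDate) are assumptions made for this example; check the XSD for the actual paths.

```python
import datetime
import xml.etree.ElementTree as ET
from typing import Dict

# Assumed namespace URI and element names, for illustration only; check the XSD.
MS_NS = "http://w3id.org/meta-share/meta-share/"
# ElementTree serializes the XML namespace with the reserved "xml" prefix.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

ET.register_namespace("ms", MS_NS)


def add_multilingual(parent: ET.Element, tag: str, values: Dict[str, str]) -> None:
    """Append one element per language version; an English ("en") value is required."""
    if "en" not in values:
        raise ValueError(f"{tag}: an English ('en') value is mandatory")
    for lang, text in values.items():
        child = ET.SubElement(parent, f"{{{MS_NS}}}{tag}")
        child.set(XML_LANG, lang)  # BCP 47 language tag, e.g. "en", "de"
        child.text = text


def add_date(parent: ET.Element, tag: str, value: datetime.date) -> None:
    """Append a date element; date.isoformat() yields YYYY-MM-DD, a valid xs:date."""
    ET.SubElement(parent, f"{{{MS_NS}}}{tag}").text = value.isoformat()


entity = ET.Element(f"{{{MS_NS}}}LanguageResource")
add_multilingual(entity, "resourceName", {"en": "Example corpus", "de": "Beispielkorpus"})
add_date(entity, "publicationDate", datetime.date(2021, 3, 15))
print(ET.tostring(entity, encoding="unicode"))
```

Raising an error when "en" is missing mirrors the rule that every multilingual element needs an English value; other language versions are simply repeated elements with different xml:lang values.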