Metadata schema

This annex provides an overview of ELG’s metadata schema, ELG-SHARE. We describe the basic concepts, provide links to the full schema documentation, and finally present the “minimal version” of the schema, consisting only of required and recommended elements.

Basic concepts

The following figure shows the main notions upon which the ELG schema builds.

[Figure: ELG schema basic concepts]

These include:

  • MetadataRecord: It corresponds to the catalogue entry, and records information concerning the registration process, such as who created the entry and when, whether it was harvested from another catalogue, who is responsible for its curation (updates), etc.

  • DescribedEntity: It corresponds to any entity that can be described by a metadata record. It can be a Language Resource, a Person, an Organization, etc. (cf. Types of catalogue entries).

  • LanguageResource: It is further classified into one of four resource types: ToolService, Corpus, LexicalConceptualResource and LanguageDescription. A Language Resource is described through a set of metadata elements common to all four types, plus a further set specific to its own type.

  • Distribution: It corresponds to the physical form in which a Language Resource is made available through the catalogue, e.g. as a downloadable file or as a form accessed via an interface.
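To make the relationships between these notions concrete, here is a minimal sketch of them as Python dataclasses. Only the class names and the four resource types come from the description above; every field name (name, form, creator, creation_date, harvested_from, curator) is a hypothetical choice made for illustration and is not taken from the XSD.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class ResourceType(Enum):
    # The four resource types named in the text.
    TOOL_SERVICE = "ToolService"
    CORPUS = "Corpus"
    LEXICAL_CONCEPTUAL_RESOURCE = "LexicalConceptualResource"
    LANGUAGE_DESCRIPTION = "LanguageDescription"


@dataclass
class Distribution:
    # Hypothetical field: the form in which the resource is made available.
    form: str


@dataclass
class DescribedEntity:
    # Anything a metadata record can describe (Language Resource, Person, ...).
    name: str  # hypothetical field


@dataclass
class LanguageResource(DescribedEntity):
    # A Language Resource has exactly one of the four types
    # and may be offered in several distributions.
    resource_type: ResourceType = ResourceType.CORPUS
    distributions: List[Distribution] = field(default_factory=list)


@dataclass
class MetadataRecord:
    # The catalogue entry: wraps the described entity and keeps
    # registration details (all field names here are assumptions).
    described_entity: DescribedEntity
    creator: str
    creation_date: str
    harvested_from: Optional[str] = None
    curator: Optional[str] = None


record = MetadataRecord(
    described_entity=LanguageResource(
        name="Example corpus",
        resource_type=ResourceType.CORPUS,
        distributions=[Distribution(form="downloadable file")],
    ),
    creator="Jane Doe",
    creation_date="2020-01-01",
)
```

In this sketch a LanguageResource is one kind of DescribedEntity, so the same MetadataRecord wrapper could equally hold a person or an organization.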

Full schema documentation

You can find the full schema XSD and its documentation, as well as templates and examples of metadata records for all resource types, in the ELG-SHARE schema Git repository.

You can browse the full schema documentation here:

Minimal version

The minimal version comprises a set of carefully selected metadata elements that are deemed important for various reasons, such as:

  • identification and citation: resource name(s); identifier(s); a short description of contents; versioning information; a contact point for further information (email or landing page); details of the resource provider(s) and resource creator(s); classification by domain, keywords and intended LT application; language coverage (language and, if needed, dialect); publication date;

  • support: links to manuals, training material; samples of the resource;

  • usage/access: distribution form (e.g. as downloadable file, a form that can be accessed via an interface, source code or binary file of software, etc.); licensing conditions; access location.

These metadata elements can be used to describe all resources, irrespective of the resource type. Additional metadata elements, particular to each resource type, are required, such as size and format for data files, prerequisites for tools and services, etc.

For each metadata element we present the following information:

  • Path: the path of the element as defined in the XSD

  • Data type:
    • string

    • multilingual string: you can repeat the element for different language versions; to specify the language, you must use the XML attribute lang with a value that conforms to IETF BCP 47 (subtags are listed in the IANA Language Subtag Registry); for every metadata element of this type, a value in English (“en”) is mandatory

    • component: group of elements

    • Controlled Vocabulary (CV): value taken from a controlled vocabulary; a link to the relevant controlled vocabulary is provided

    • date: date in the format xs:date

    • URL

  • Optionality:
    • Mandatory (M): the element must always be filled in the metadata record

    • Recommended (R): the use of the element is not enforced but provides important information

    • Mandatory if applicable (MA): the element must be filled in when specific conditions apply

    • Recommended if applicable (RA): the use of the element is recommended when specific conditions apply

  • Explanation & Instructions: A short definition of the element, followed by instructions on how it should be used in the specific context.

  • Example: One or more examples for the element in XML format.
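Two of the data type rules above lend themselves to a quick automated check: a multilingual string needs an English version, and a date must follow xs:date. The following is a minimal sketch using Python's standard library, under stated assumptions: the "lang" attribute is taken to be the standard xml:lang, the element names in the sample (record, resourceName, description, publicationDate) are illustrative rather than copied from the XSD, and the date check covers only the plain YYYY-MM-DD form of xs:date (no timezone suffix or negative years).

```python
import re
import xml.etree.ElementTree as ET
from datetime import date

# Namespace URI bound to the reserved "xml" prefix by the XML specification.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical fragment: element names are illustrative, not from the XSD.
SAMPLE = """
<record>
  <resourceName xml:lang="en">Example corpus</resourceName>
  <resourceName xml:lang="de">Beispielkorpus</resourceName>
  <description xml:lang="el">Παράδειγμα</description>
  <publicationDate>2020-05-17</publicationDate>
</record>
"""

_DATE_RE = re.compile(r"(\d{4})-(\d{2})-(\d{2})")


def languages_of(root, tag):
    """Return the set of xml:lang values found on repetitions of `tag`."""
    return {el.get(XML_LANG) for el in root.findall(tag)}


def has_english_value(root, tag):
    """Check the rule that a multilingual string needs an "en" version."""
    return "en" in languages_of(root, tag)


def is_simple_xs_date(text):
    """Accept only the plain YYYY-MM-DD form of xs:date with a real calendar date."""
    match = _DATE_RE.fullmatch(text)
    if match is None:
        return False
    try:
        date(int(match[1]), int(match[2]), int(match[3]))
    except ValueError:
        return False
    return True


root = ET.fromstring(SAMPLE)
name_ok = has_english_value(root, "resourceName")        # English version present
description_ok = has_english_value(root, "description")  # only a Greek version
date_ok = is_simple_xs_date(root.findtext("publicationDate"))
```

With this sample, resourceName passes the English-value check while description does not, since it is given only in Greek. For real records, validating against the published XSD remains the authoritative check; a script like this is only a convenience for catching the most common omissions early.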