Minimal elements for language descriptions

This page describes the minimal metadata elements specific to language descriptions, a type of language resource under which we subsume both models and grammars.


LanguageDescription

Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.LanguageDescription

Data type Component

Optionality Mandatory

Explanation & Instructions

Wraps together elements for language descriptions

Example

<ms:LRSubclass>
        <ms:LanguageDescription>
                <ms:lrType>LanguageDescription</ms:lrType>
                ...
        </ms:LanguageDescription>
</ms:LRSubclass>

LanguageDescriptionSubclass

Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.LanguageDescription.LanguageDescriptionSubclass

Data type Component

Optionality Mandatory

Explanation & Instructions

The type of the language description (used for documentation purposes)

It wraps the elements that must be used for the language description subclasses described on this page: MLModel, NGramModel and Grammar.

Example

<ms:LanguageDescriptionSubclass>
        ...
</ms:LanguageDescriptionSubclass>
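
To make the nesting implied by the Path values concrete, the following is a minimal sketch that places one subclass component inside LanguageDescriptionSubclass. It is assembled only from the paths and example values given on this page; the position of LanguageDescriptionSubclass relative to its sibling elements is an assumption, and the other elements of a complete record are elided.

```xml
<ms:LRSubclass>
        <ms:LanguageDescription>
                <ms:lrType>LanguageDescription</ms:lrType>
                <!-- assumed position; sibling elements are not shown -->
                <ms:LanguageDescriptionSubclass>
                        <!-- one of MLModel, NGramModel or Grammar -->
                        <ms:NGramModel>
                                <ms:ldSubclassType>NGramModel</ms:ldSubclassType>
                                <ms:baseItem>http://w3id.org/meta-share/meta-share/word</ms:baseItem>
                                <ms:order>5</ms:order>
                        </ms:NGramModel>
                </ms:LanguageDescriptionSubclass>
                ...
        </ms:LanguageDescription>
</ms:LRSubclass>
```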

MLModel

Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.LanguageDescription.LanguageDescriptionSubclass.MLModel

Data type Component

Optionality Mandatory if applicable

Explanation & Instructions

Mandatory for Machine Learning (ML) models. An ML model, for our purposes, is defined as “The model artifact that is created through a training process involving an ML algorithm (that is, the learning algorithm) and the training data to learn from”.

The following elements are mandatory or recommended for ML models:

  • ldSubclassType (Mandatory): Used to mark the subclass of a language description. For ML models, the value is fixed to ‘MLModel’.

  • modelVariant (Recommended): Introduces a label that can be used to identify the variant of an ML model.

  • typesystem (Recommended): Specifies the typesystem (preferably through an identifier or URL) that has been used for the annotation of a resource, that is required for the input resource of a tool/service, or that should be used (dependency) for the annotation or in the training of an ML model.

  • method (Recommended): Specifies the method used for the development of a tool/service or the ML model. You must use one of the values from the CV.

  • mlFramework (Recommended): Specifies the framework that has been used for developing a model (e.g. keras, tensorflow).

  • trainingCorpusDetails (Recommended): Provides a detailed description of the training corpus (e.g. size, number of features).

Example

<ms:MLModel>
        <ms:ldSubclassType>MLModel</ms:ldSubclassType>
        <ms:modelVariant>factored</ms:modelVariant>
        <ms:typesystem>
                <ms:resourceName xml:lang="en">Universal dependencies</ms:resourceName>
                <ms:version>undefined</ms:version>
        </ms:typesystem>
        <ms:method>http://w3id.org/meta-share/omtd-share/DeepLearning</ms:method>
        <ms:mlFramework>tensorflow</ms:mlFramework>
        <ms:trainingCorpusDetails xml:lang="en">Trained on a corpus of tweets</ms:trainingCorpusDetails>
</ms:MLModel>

NGramModel

Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.LanguageDescription.LanguageDescriptionSubclass.NGramModel

Data type Component

Optionality Mandatory if applicable

Explanation & Instructions

Mandatory for n-gram models. An n-gram model, for our purposes, is defined as “A language model consisting of n-grams, i.e. specific sequences of a number of words”.

The following elements are mandatory or recommended for n-gram models:

  • ldSubclassType (Mandatory): Used to mark the subclass of a language description. For n-gram models, the value is fixed to ‘NGramModel’.

  • baseItem (Mandatory): Type of item that is represented in the n-gram resource.

  • order (Mandatory): Specifies the maximum number of items in the sequence.

  • perplexity (Recommended): Provides information on the perplexity derived from running the model on a test set taken from the same corpus.

Example

<ms:NGramModel>
        <ms:ldSubclassType>NGramModel</ms:ldSubclassType>
        <ms:baseItem>http://w3id.org/meta-share/meta-share/word</ms:baseItem>
        <ms:order>5</ms:order>
</ms:NGramModel>
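
The example above omits the recommended perplexity element. A minimal sketch of how it might be added is shown below; the element name ms:perplexity is inferred from the list above, its position after ms:order is an assumption, and the value is a hypothetical placeholder rather than a measured result.

```xml
<ms:NGramModel>
        <ms:ldSubclassType>NGramModel</ms:ldSubclassType>
        <ms:baseItem>http://w3id.org/meta-share/meta-share/word</ms:baseItem>
        <ms:order>5</ms:order>
        <!-- hypothetical value, for illustration only -->
        <ms:perplexity>120.5</ms:perplexity>
</ms:NGramModel>
```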

Grammar

Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.LanguageDescription.LanguageDescriptionSubclass.Grammar

Data type Component

Optionality Mandatory if applicable

Explanation & Instructions

Mandatory for grammars. A grammar, for our purposes, is defined as “A set of rules governing what strings are valid or allowable in a language or text” [https://en.oxforddictionaries.com/definition/grammar].

The following elements are mandatory or recommended for computational grammars:

  • ldSubclassType (Mandatory): Used to mark the subclass of a language description. For grammars, the value is fixed to ‘Grammar’.

  • encodingLevel (Mandatory): Classifies the contents of a lexical/conceptual resource or language description as regards the linguistic level of analysis it caters for.

  • compliesWith (Recommended): Specifies the vocabulary/standard/best practice with which a resource complies.

  • formalism (Recommended): Specifies the formalism (bibliographic reference, URL, name) used for the creation/enrichment of the resource (grammar or tool/service).

  • ldTask (Recommended): Specifies the task performed by the language description.

Example

<ms:Grammar>
        <ms:ldSubclassType>Grammar</ms:ldSubclassType>
        <ms:encodingLevel>http://w3id.org/meta-share/meta-share/morphology</ms:encodingLevel>
        <ms:compliesWith>http://w3id.org/meta-share/meta-share/GrAF</ms:compliesWith>
</ms:Grammar>
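
The example above does not show the recommended formalism element. The sketch below adds it under stated assumptions: the formalism is given as a plain name with an xml:lang attribute (by analogy with trainingCorpusDetails in the MLModel example), it is placed after ms:compliesWith, and the value "HPSG" is purely illustrative. ldTask is left out because its value structure is not described on this page.

```xml
<ms:Grammar>
        <ms:ldSubclassType>Grammar</ms:ldSubclassType>
        <ms:encodingLevel>http://w3id.org/meta-share/meta-share/morphology</ms:encodingLevel>
        <ms:compliesWith>http://w3id.org/meta-share/meta-share/GrAF</ms:compliesWith>
        <!-- assumed shape and illustrative value -->
        <ms:formalism xml:lang="en">HPSG</ms:formalism>
</ms:Grammar>
```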