Minimal elements for language descriptions
This page describes the minimal metadata elements specific to language descriptions, a type of language resource under which we subsume both models and grammars.
LanguageDescription
Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.LanguageDescription
Data type component
Optionality Mandatory
Explanation & Instructions
Wraps together elements for language descriptions
Example
<ms:LRSubclass>
<ms:LanguageDescription>
<ms:lrType>LanguageDescription</ms:lrType>
...
</ms:LanguageDescription>
</ms:LRSubclass>
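Each Path on this page lists nested element names separated by dots. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`. The hypothetical helper `find_by_path` follows such a path through a record. It assumes that the `ms` prefix is bound to `http://w3id.org/meta-share/meta-share/`. Treat that URI and the skeleton record as assumptions to check against the schema you actually validate with.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for the "ms" prefix; verify it against your schema.
MS_NS = "http://w3id.org/meta-share/meta-share/"
NS = {"ms": MS_NS}


def find_by_path(root, path):
    """Follow a dot-separated Path from the root element.

    The first segment must name the root itself; returns the element at the
    end of the path, or None as soon as a step is missing.
    """
    first, *rest = path.split(".")
    if root.tag != f"{{{MS_NS}}}{first}":
        return None
    node = root
    for step in rest:
        node = node.find(f"ms:{step}", NS)
        if node is None:
            return None
    return node


# Hypothetical skeleton record holding only the wrapping elements.
RECORD = f"""<ms:MetadataRecord xmlns:ms="{MS_NS}">
  <ms:DescribedEntity>
    <ms:LanguageResource>
      <ms:LRSubclass>
        <ms:LanguageDescription>
          <ms:lrType>LanguageDescription</ms:lrType>
        </ms:LanguageDescription>
      </ms:LRSubclass>
    </ms:LanguageResource>
  </ms:DescribedEntity>
</ms:MetadataRecord>"""

root = ET.fromstring(RECORD)
path = "MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.LanguageDescription"
ld = find_by_path(root, path)
print(ld.findtext("ms:lrType", namespaces=NS))  # LanguageDescription
```

Returning `None` on the first missing step makes it easy to distinguish a record that lacks the wrapper from one whose wrapper is present but empty.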
LanguageDescriptionSubclass
Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.LanguageDescription.LanguageDescriptionSubclass
Data type component
Optionality Mandatory
Explanation & Instructions
The type of the language description (used for documentation purposes)
It wraps the set of elements that must be used for each of the language description subclasses:
Machine Learning Model: See MLModel
N-gram model: See NGramModel
Computational grammar: See Grammar
Example
<ms:LanguageDescriptionSubclass>
...
</ms:LanguageDescriptionSubclass>
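A record uses the subclass that matches its resource. The sketch below reports which of the three subclasses a `LanguageDescriptionSubclass` wrapper contains. It also checks that the subclass's `ldSubclassType` carries the fixed value, which, per the instructions on this page, equals the subclass name. The helper name `subclass_of` and the namespace URI `http://w3id.org/meta-share/meta-share/` are assumptions. Requiring exactly one subclass is our reading of the wrapper, not a rule stated here.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for the "ms" prefix; verify it against your schema.
MS_NS = "http://w3id.org/meta-share/meta-share/"
NS = {"ms": MS_NS}
# The three subclasses listed above.
SUBCLASSES = ("MLModel", "NGramModel", "Grammar")


def subclass_of(wrapper):
    """Return the name of the single subclass inside a
    LanguageDescriptionSubclass element, or raise ValueError."""
    local_names = [child.tag.split("}")[-1] for child in wrapper]
    found = [name for name in local_names if name in SUBCLASSES]
    if len(found) != 1:
        raise ValueError(f"expected exactly one of {SUBCLASSES}, found {found}")
    name = found[0]
    # ldSubclassType is documented as fixed to the subclass name.
    declared = wrapper.find(f"ms:{name}", NS).findtext("ms:ldSubclassType", namespaces=NS)
    if declared != name:
        raise ValueError(f"ldSubclassType is {declared!r}, expected {name!r}")
    return name


snippet = f"""<ms:LanguageDescriptionSubclass xmlns:ms="{MS_NS}">
  <ms:NGramModel>
    <ms:ldSubclassType>NGramModel</ms:ldSubclassType>
  </ms:NGramModel>
</ms:LanguageDescriptionSubclass>"""

print(subclass_of(ET.fromstring(snippet)))  # NGramModel
```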
MLModel
Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.LanguageDescription.LanguageDescriptionSubclass.MLModel
Data type Component
Optionality Mandatory if applicable
Explanation & Instructions
Mandatory for Machine Learning (ML) models; an ML model, for our purposes, is defined as “The model artifact that is created through a training process involving an ML algorithm (that is, the learning algorithm) and the training data to learn from”
The following elements are mandatory or recommended for ML models:
ldSubclassType
(Mandatory): Used to mark the subclass of a language description. For ML models, the value is fixed to ‘MLModel’.
modelVariant
(Recommended): Introduces a label that can be used to identify the variant of an ML model.
typesystem
(Recommended): Specifies the typesystem (preferably through an identifier or URL) that has been used for the annotation of a resource, that is required for the input resource of a tool/service, or that should be used (dependency) for the annotation or in the training of an ML model.
method
(Recommended): Specifies the method used for the development of a tool/service or the ML model. You must use one of the values from the controlled vocabulary (CV).
mlFramework
(Recommended): Specifies the framework that has been used for developing a model (e.g., keras, tensorflow, etc.).
trainingCorpusDetails
(Recommended): Provides a detailed description of the training corpus (e.g., size, number of features, etc.).
Example
<ms:MLModel>
<ms:ldSubclassType>MLModel</ms:ldSubclassType>
<ms:modelVariant>factored</ms:modelVariant>
<ms:typesystem>
<ms:resourceName xml:lang="en">Universal dependencies</ms:resourceName>
<ms:version>undefined</ms:version>
</ms:typesystem>
<ms:method>http://w3id.org/meta-share/omtd-share/DeepLearning</ms:method>
<ms:mlFramework>tensorflow</ms:mlFramework>
<ms:trainingCorpusDetails xml:lang="en">Trained on a corpus of tweets</ms:trainingCorpusDetails>
</ms:MLModel>
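The sketch below shows how the mandatory and recommended elements fit together. It builds an `MLModel` fragment like the example above using `xml.etree.ElementTree`. The function name `build_ml_model`, its parameters, and the namespace URI `http://w3id.org/meta-share/meta-share/` are assumptions made for illustration. Element order follows the example, and the `version` child of `typesystem` reuses the example's `undefined` value.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for the "ms" prefix; verify it against your schema.
MS_NS = "http://w3id.org/meta-share/meta-share/"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
ET.register_namespace("ms", MS_NS)  # serialize with the "ms:" prefix


def ms(tag):
    """Qualify a local element name with the ms namespace."""
    return f"{{{MS_NS}}}{tag}"


def build_ml_model(variant=None, typesystem_name=None, method=None,
                   framework=None, training_details=None):
    """Build an MLModel element. ldSubclassType is mandatory and always
    written; each recommended element is written only if a value is given."""
    model = ET.Element(ms("MLModel"))
    ET.SubElement(model, ms("ldSubclassType")).text = "MLModel"
    if variant:
        ET.SubElement(model, ms("modelVariant")).text = variant
    if typesystem_name:
        typesystem = ET.SubElement(model, ms("typesystem"))
        ET.SubElement(typesystem, ms("resourceName"), {XML_LANG: "en"}).text = typesystem_name
        # Mirrors the example above, which records the version as "undefined".
        ET.SubElement(typesystem, ms("version")).text = "undefined"
    if method:
        ET.SubElement(model, ms("method")).text = method
    if framework:
        ET.SubElement(model, ms("mlFramework")).text = framework
    if training_details:
        ET.SubElement(model, ms("trainingCorpusDetails"), {XML_LANG: "en"}).text = training_details
    return model


model = build_ml_model(
    variant="factored",
    typesystem_name="Universal dependencies",
    method="http://w3id.org/meta-share/omtd-share/DeepLearning",
    framework="tensorflow",
    training_details="Trained on a corpus of tweets",
)
print(ET.tostring(model, encoding="unicode"))
```

Calling `build_ml_model()` with no arguments yields the smallest valid-looking fragment, containing only the mandatory `ldSubclassType`.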
NGramModel
Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.LanguageDescription.LanguageDescriptionSubclass.NGramModel
Data type Component
Optionality Mandatory if applicable
Explanation & Instructions
Mandatory for n-gram models; an n-gram model, for our purposes, is defined as “A language model consisting of n-grams, i.e. specific sequences of a number of words”
The following elements are mandatory or recommended for n-gram models:
ldSubclassType
(Mandatory): Used to mark the subclass of a language description. For n-gram models, the value is fixed to ‘NGramModel’.
baseItem
(Mandatory): Type of item that is represented in the n-gram resource.
order
(Mandatory): Specifies the maximum number of items in the sequence.
perplexity
(Recommended): Provides information on the perplexity derived from running the model on a test set taken from the same corpus.
Example
<ms:NGramModel>
<ms:ldSubclassType>NGramModel</ms:ldSubclassType>
<ms:baseItem>http://w3id.org/meta-share/meta-share/word</ms:baseItem>
<ms:order>5</ms:order>
</ms:NGramModel>
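The obligations listed for n-gram models can be checked mechanically. The sketch below uses a hypothetical `check_ngram_model` function built on `xml.etree.ElementTree`, assuming the `ms` namespace URI `http://w3id.org/meta-share/meta-share/`. It reports missing mandatory elements as errors and a missing `perplexity` as a warning. Treating `order` as a positive integer is our assumption, based on its description as the maximum number of items in a sequence.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for the "ms" prefix; verify it against your schema.
NS = {"ms": "http://w3id.org/meta-share/meta-share/"}


def check_ngram_model(model):
    """Return (errors, warnings) for an NGramModel element: missing or wrong
    mandatory elements are errors, missing recommended ones are warnings."""
    errors, warnings = [], []
    if model.findtext("ms:ldSubclassType", namespaces=NS) != "NGramModel":
        errors.append("ldSubclassType must be 'NGramModel'")
    if not model.findtext("ms:baseItem", namespaces=NS):
        errors.append("baseItem is mandatory")
    # Assumption: order, a maximum sequence length, must be a positive integer.
    order = model.findtext("ms:order", namespaces=NS)
    if order is None:
        errors.append("order is mandatory")
    elif not order.strip().isdecimal() or int(order) < 1:
        errors.append(f"order must be a positive integer, got {order!r}")
    if model.find("ms:perplexity", NS) is None:
        warnings.append("perplexity is recommended")
    return errors, warnings


example = """<ms:NGramModel xmlns:ms="http://w3id.org/meta-share/meta-share/">
  <ms:ldSubclassType>NGramModel</ms:ldSubclassType>
  <ms:baseItem>http://w3id.org/meta-share/meta-share/word</ms:baseItem>
  <ms:order>5</ms:order>
</ms:NGramModel>"""

errors, warnings = check_ngram_model(ET.fromstring(example))
print(errors)    # []
print(warnings)  # ['perplexity is recommended']
```

Applied to the example above, the check passes and only flags the recommended `perplexity` element as absent.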
Grammar
Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.LanguageDescription.LanguageDescriptionSubclass.Grammar
Data type Component
Optionality Mandatory if applicable
Explanation & Instructions
Mandatory for grammars; a grammar, for our purposes, is defined as “A set of rules governing what strings are valid or allowable in a language or text” [https://en.oxforddictionaries.com/definition/grammar]
The following elements are mandatory or recommended for computational grammars:
ldSubclassType
(Mandatory): Used to mark the subclass of a language description. For grammars, the value is fixed to ‘Grammar’.
encodingLevel
(Mandatory): Classifies the contents of a lexical/conceptual resource or language description with regard to the linguistic level of analysis it caters for.
compliesWith
(Recommended): Specifies the vocabulary/standard/best practice with which the resource complies.
formalism
(Recommended): Specifies the formalism (bibliographic reference, URL, name) used for the creation/enrichment of the resource (grammar or tool/service).
ldTask
(Recommended): Specifies the task performed by the language description.
Example
<ms:Grammar>
<ms:ldSubclassType>Grammar</ms:ldSubclassType>
<ms:encodingLevel>http://w3id.org/meta-share/meta-share/morphology</ms:encodingLevel>
<ms:compliesWith>http://w3id.org/meta-share/meta-share/GrAF</ms:compliesWith>
</ms:Grammar>
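The Grammar obligations can be expressed as two lists of element names. In this sketch, the lists are copied from the instructions above. The helper name `missing_grammar_elements` and the namespace URI `http://w3id.org/meta-share/meta-share/` are assumptions. The helper checks only element presence and the fixed `ldSubclassType` value. It does not check whether values such as the `encodingLevel` URI come from the controlled vocabulary.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for the "ms" prefix; verify it against your schema.
NS = {"ms": "http://w3id.org/meta-share/meta-share/"}

# Element names copied from the Grammar instructions above.
MANDATORY = ["ldSubclassType", "encodingLevel"]
RECOMMENDED = ["compliesWith", "formalism", "ldTask"]


def missing_grammar_elements(grammar):
    """Report absent Grammar elements per obligation level, plus whether
    ldSubclassType carries its fixed value 'Grammar'."""
    def absent(names):
        return [name for name in names if grammar.find(f"ms:{name}", NS) is None]

    return {
        "mandatory": absent(MANDATORY),
        "recommended": absent(RECOMMENDED),
        "fixed_value_ok": grammar.findtext("ms:ldSubclassType", namespaces=NS) == "Grammar",
    }


example = """<ms:Grammar xmlns:ms="http://w3id.org/meta-share/meta-share/">
  <ms:ldSubclassType>Grammar</ms:ldSubclassType>
  <ms:encodingLevel>http://w3id.org/meta-share/meta-share/morphology</ms:encodingLevel>
  <ms:compliesWith>http://w3id.org/meta-share/meta-share/GrAF</ms:compliesWith>
</ms:Grammar>"""

print(missing_grammar_elements(ET.fromstring(example)))
# {'mandatory': [], 'recommended': ['formalism', 'ldTask'], 'fixed_value_ok': True}
```

For the example above, all mandatory elements are present, and `formalism` and `ldTask` are reported as missing recommended elements.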