Minimal elements for corpora

This page describes the minimal metadata elements specific to corpora.


Corpus

Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.Corpus

Data type component

Optionality Mandatory

Explanation & Instructions

Wraps together the set of elements that is specific to corpora

Example

<ms:LRSubclass>
        <ms:Corpus>
                <ms:lrType>Corpus</ms:lrType>
        </ms:Corpus>
</ms:LRSubclass>

corpusSubclass

Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.Corpus.corpusSubclass

Data type CV (corpusSubclass)

Optionality Mandatory

Explanation & Instructions

Introduces a classification of corpora into types (used for descriptive reasons)

Use one of the values for raw corpora, annotated corpora (raw text mixed with annotations), or annotations (only annotations, without the original corpus).

Example

<ms:corpusSubclass>http://w3id.org/meta-share/meta-share/rawCorpus</ms:corpusSubclass>

<ms:corpusSubclass>http://w3id.org/meta-share/meta-share/annotatedCorpus</ms:corpusSubclass>

CorpusTextPart

Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.Corpus.CorpusMediaPart.CorpusTextPart

Data type component

Optionality Mandatory if applicable

Explanation & Instructions

The part of a corpus (or a whole corpus) that consists of textual segments (e.g., a corpus of publications, transcriptions of an oral corpus, subtitles, etc.)

You can repeat the group of elements for multiple textual parts.

The mandatory or recommended elements for the text part are:

  • mediaType (Mandatory): Specifies the media type of a language resource (the physical medium of the contents representation). For text parts, always use the value ‘text’.
  • lingualityType (Mandatory): Indicates whether the resource includes one, two or more languages.
  • multilingualityType (Mandatory if applicable): Indicates whether the resource (part) is parallel, comparable or mixed. It is required if lingualityType = bilingual or multilingual; select one of the values for parallel (e.g., original text and its translations), comparable (e.g., a corpus of the same domain in multiple languages) and multilingualSingleText (for corpora that consist of segments including text in two or more languages, e.g., the transcription of a European Parliament session with MPs speaking in their native language).
  • language (Mandatory): Specifies the language that is used in the resource part, expressed according to the BCP47 recommendation. See language.
  • languageVariety (Mandatory if applicable): Relates a language resource that contains segments in a language variety (e.g., dialect, jargon) to it. Please use for dialect corpora.
  • modalityType (Recommended if applicable): Specifies the type of the modality represented in the resource. For instance, you can use ‘spoken language’ to describe transcribed speech corpora.
  • TextGenre (Recommended): A category of text characterized by a particular style, form, or content according to a specific classification scheme. See TextGenre.
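
The conditional rule for multilingualityType can be checked mechanically. The following is a minimal sketch, not part of the schema tooling: the function name missing_multilinguality is hypothetical, and the namespace URI bound to the ms: prefix is an assumption that you should adjust to match your own records.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for the ms: prefix; adjust it to match your records.
MS = "http://w3id.org/meta-share/meta-share/"
NS = {"ms": MS}


def missing_multilinguality(part: ET.Element) -> bool:
    """Return True if multilingualityType is required but absent (hypothetical helper)."""
    linguality = part.findtext("ms:lingualityType", default="", namespaces=NS)
    # The controlled values are URIs; compare only the last path segment.
    needs_it = linguality.strip().rsplit("/", 1)[-1] in {"bilingual", "multilingual"}
    has_it = part.find("ms:multilingualityType", NS) is not None
    return needs_it and not has_it


# A bilingual text part that lacks multilingualityType.
sample = f"""<ms:CorpusTextPart xmlns:ms="{MS}">
  <ms:lingualityType>{MS}bilingual</ms:lingualityType>
</ms:CorpusTextPart>"""
part = ET.fromstring(sample)
print(missing_multilinguality(part))  # prints True
```

The same check should apply unchanged to audio, video and image parts, since they share the lingualityType and multilingualityType elements.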

Example

<ms:CorpusTextPart>
        <ms:corpusMediaType>CorpusTextPart</ms:corpusMediaType>
        <ms:mediaType>http://w3id.org/meta-share/meta-share/text</ms:mediaType>
        <ms:lingualityType>http://w3id.org/meta-share/meta-share/monolingual</ms:lingualityType>
        <ms:language>
                <ms:languageTag>es</ms:languageTag>
                <ms:languageId>es</ms:languageId>
        </ms:language>
</ms:CorpusTextPart>

<ms:CorpusTextPart>
        <ms:corpusMediaType>CorpusTextPart</ms:corpusMediaType>
        <ms:mediaType>http://w3id.org/meta-share/meta-share/text</ms:mediaType>
        <ms:lingualityType>http://w3id.org/meta-share/meta-share/bilingual</ms:lingualityType>
        <ms:language>
                <ms:languageTag>es</ms:languageTag>
                <ms:languageId>es</ms:languageId>
        </ms:language>
        <ms:language>
                <ms:languageTag>en</ms:languageTag>
                <ms:languageId>en</ms:languageId>
        </ms:language>
        <ms:multilingualityType>http://w3id.org/meta-share/meta-share/parallel</ms:multilingualityType>
        <ms:TextGenre>
                <ms:CategoryLabel>administrative texts</ms:CategoryLabel>
        </ms:TextGenre>
</ms:CorpusTextPart>

<ms:CorpusTextPart>
        <ms:corpusMediaType>CorpusTextPart</ms:corpusMediaType>
        <ms:mediaType>http://w3id.org/meta-share/meta-share/text</ms:mediaType>
        <ms:lingualityType>http://w3id.org/meta-share/meta-share/monolingual</ms:lingualityType>
        <ms:language>
                <ms:languageTag>en</ms:languageTag>
                <ms:languageId>en</ms:languageId>
        </ms:language>
        <ms:modalityType>http://w3id.org/meta-share/meta-share/spokenLanguage</ms:modalityType>
</ms:CorpusTextPart>

CorpusAudioPart

Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.Corpus.CorpusMediaPart.CorpusAudioPart

Data type component

Optionality Mandatory if applicable

Explanation & Instructions

The part of a corpus (or whole corpus) that consists of audio segments

You can repeat the group of elements for multiple audio parts.

The mandatory or recommended elements for the audio part are:

  • mediaType (Mandatory): Specifies the media type of a language resource (the physical medium of the contents representation). For audio parts, always use the value ‘audio’.
  • lingualityType (Mandatory): Indicates whether the resource includes one, two or more languages.
  • multilingualityType (Mandatory if applicable): Indicates whether the resource (part) is parallel, comparable or mixed. It is required if lingualityType = bilingual or multilingual; select one of the values for parallel (e.g., original text and its translations), comparable (e.g., a corpus of the same domain in multiple languages) and multilingualSingleText (for corpora that consist of segments including text in two or more languages, e.g., the transcription of a European Parliament session with MPs speaking in their native language).
  • language (Mandatory): Specifies the language that is used in the resource part, expressed according to the BCP47 recommendation. See language.
  • languageVariety (Mandatory if applicable): Relates a language resource that contains segments in a language variety (e.g., dialect, jargon) to it. Please use for dialect corpora.
  • modalityType (Recommended if applicable): Specifies the type of the modality represented in the resource. For instance, you can use ‘spoken language’ to describe transcribed speech corpora.
  • AudioGenre (Recommended if applicable): A category of audio characterized by a particular style, form, or content according to a specific classification scheme. See AudioGenre.
  • SpeechGenre (Recommended if applicable): A category for the conventionalized discourse of the speech part of a language resource, based on extra-linguistic and internal linguistic criteria. See SpeechGenre.

Example

<ms:CorpusAudioPart>
        <ms:corpusMediaType>CorpusAudioPart</ms:corpusMediaType>
        <ms:mediaType>http://w3id.org/meta-share/meta-share/audio</ms:mediaType>
        <ms:lingualityType>http://w3id.org/meta-share/meta-share/monolingual</ms:lingualityType>
        <ms:language>
                <ms:languageTag>en</ms:languageTag>
                <ms:languageId>en</ms:languageId>
        </ms:language>
        <ms:AudioGenre>
                <ms:CategoryLabel>conference noises</ms:CategoryLabel>
        </ms:AudioGenre>
</ms:CorpusAudioPart>

<ms:CorpusAudioPart>
        <ms:corpusMediaType>CorpusAudioPart</ms:corpusMediaType>
        <ms:mediaType>http://w3id.org/meta-share/meta-share/audio</ms:mediaType>
        <ms:lingualityType>http://w3id.org/meta-share/meta-share/monolingual</ms:lingualityType>
        <ms:language>
                <ms:languageTag>en</ms:languageTag>
                <ms:languageId>en</ms:languageId>
        </ms:language>
        <ms:modalityType>http://w3id.org/meta-share/meta-share/spokenLanguage</ms:modalityType>
        <ms:SpeechGenre>
                <ms:CategoryLabel>monologue</ms:CategoryLabel>
        </ms:SpeechGenre>
</ms:CorpusAudioPart>

CorpusVideoPart

Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.Corpus.CorpusMediaPart.CorpusVideoPart

Data type component

Optionality Mandatory if applicable

Explanation & Instructions

The part of a corpus (or a whole corpus) that consists of video segments (e.g., a corpus of video lectures, a part of a corpus with news, a sign language corpus, etc.)

You can repeat the group of elements for multiple video parts.

The mandatory or recommended elements for the video part are:

  • mediaType (Mandatory): Specifies the media type of a language resource (the physical medium of the contents representation). For video parts, always use the value ‘video’.
  • lingualityType (Mandatory): Indicates whether the resource includes one, two or more languages.
  • multilingualityType (Mandatory if applicable): Indicates whether the resource (part) is parallel, comparable or mixed. It is required if lingualityType = bilingual or multilingual; select one of the values for parallel (e.g., original text and its translations), comparable (e.g., a corpus of the same domain in multiple languages) and multilingualSingleText (for corpora that consist of segments including text in two or more languages, e.g., the transcription of a European Parliament session with MPs speaking in their native language).
  • language (Mandatory): Specifies the language that is used in the resource part, expressed according to the BCP47 recommendation. See language.
  • languageVariety (Mandatory if applicable): Relates a language resource that contains segments in a language variety (e.g., dialect, jargon) to it. Please use for dialect corpora.
  • modalityType (Recommended if applicable): Specifies the type of the modality represented in the resource. For instance, you can use ‘spoken language’ to describe transcribed speech corpora.
  • VideoGenre (Recommended): A classification of video parts based on extra-linguistic and internal linguistic criteria and reflected on the video style, form or content. See VideoGenre.
  • typeOfVideoContent (Mandatory): Main type of object or people represented in the video.

Example

<ms:CorpusVideoPart>
        <ms:corpusMediaType>CorpusVideoPart</ms:corpusMediaType>
        <ms:mediaType>http://w3id.org/meta-share/meta-share/video</ms:mediaType>
        <ms:lingualityType>http://w3id.org/meta-share/meta-share/monolingual</ms:lingualityType>
        <ms:language>
                <ms:languageTag>en</ms:languageTag>
                <ms:languageId>en</ms:languageId>
        </ms:language>
        <ms:modalityType>http://w3id.org/meta-share/meta-share/bodyGesture</ms:modalityType>
        <ms:modalityType>http://w3id.org/meta-share/meta-share/facialExpression</ms:modalityType>
        <ms:modalityType>http://w3id.org/meta-share/meta-share/spokenLanguage</ms:modalityType>
        <ms:typeOfVideoContent>people eating at a restaurant</ms:typeOfVideoContent>
</ms:CorpusVideoPart>

<ms:CorpusVideoPart>
        <ms:corpusMediaType>CorpusVideoPart</ms:corpusMediaType>
        <ms:mediaType>http://w3id.org/meta-share/meta-share/video</ms:mediaType>
        <ms:lingualityType>http://w3id.org/meta-share/meta-share/monolingual</ms:lingualityType>
        <ms:language>
                <ms:languageTag>fr</ms:languageTag>
                <ms:languageId>fr</ms:languageId>
        </ms:language>
        <ms:VideoGenre>
                <ms:CategoryLabel>documentary</ms:CategoryLabel>
        </ms:VideoGenre>
        <ms:typeOfVideoContent>birds, wild animals, plants</ms:typeOfVideoContent>
</ms:CorpusVideoPart>

CorpusImagePart

Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.Corpus.CorpusMediaPart.CorpusImagePart

Data type component

Optionality Mandatory if applicable

Explanation & Instructions

The part of a corpus (or whole corpus) that consists of images (e.g., a corpus of photographs and their captions)

You can repeat the group of elements for multiple image parts.

The mandatory or recommended elements for the image part are:

  • mediaType (Mandatory): Specifies the media type of a language resource (the physical medium of the contents representation). For image parts, always use the value ‘image’.
  • lingualityType (Mandatory): Indicates whether the resource includes one, two or more languages.
  • multilingualityType (Mandatory if applicable): Indicates whether the resource (part) is parallel, comparable or mixed. It is required if lingualityType = bilingual or multilingual; select one of the values for parallel (e.g., original text and its translations), comparable (e.g., a corpus of the same domain in multiple languages) and multilingualSingleText (for corpora that consist of segments including text in two or more languages, e.g., the transcription of a European Parliament session with MPs speaking in their native language).
  • language (Mandatory): Specifies the language that is used in the resource part, expressed according to the BCP47 recommendation. See language.
  • languageVariety (Mandatory if applicable): Relates a language resource that contains segments in a language variety (e.g., dialect, jargon) to it. Please use for dialect corpora.
  • modalityType (Recommended if applicable): Specifies the type of the modality represented in the resource.
  • ImageGenre (Recommended): A category of images characterized by a particular style, form, or content according to a specific classification scheme. See ImageGenre.
  • typeOfImageContent (Mandatory): Main type of object or people represented in the image.

Example

<ms:CorpusImagePart>
        <ms:corpusMediaType>CorpusImagePart</ms:corpusMediaType>
        <ms:mediaType>http://w3id.org/meta-share/meta-share/image</ms:mediaType>
        <ms:lingualityType>http://w3id.org/meta-share/meta-share/monolingual</ms:lingualityType>
        <ms:language>
                <ms:languageTag>el</ms:languageTag>
                <ms:languageId>el</ms:languageId>
        </ms:language>
        <ms:ImageGenre>
                <ms:CategoryLabel>comics</ms:CategoryLabel>
        </ms:ImageGenre>
        <ms:typeOfImageContent>human figures</ms:typeOfImageContent>
</ms:CorpusImagePart>

TextGenre

Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.Corpus.CorpusMediaPart.CorpusTextPart.TextGenre

Data type component

Optionality Recommended

Explanation & Instructions

A category of text characterized by a particular style, form, or content according to a specific classification scheme

You can add only a free text value at the CategoryLabel element; if you have used a value from an established controlled vocabulary, you can use the TextGenreIdentifier and the attribute TextGenreClassificationScheme.

Example

<ms:TextGenre>
        <ms:CategoryLabel>movie subtitles</ms:CategoryLabel>
</ms:TextGenre>

<ms:TextGenre>
        <ms:CategoryLabel>news articles</ms:CategoryLabel>
</ms:TextGenre>

AudioGenre

Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.Corpus.CorpusMediaPart.CorpusAudioPart.AudioGenre

Data type component

Optionality Recommended if applicable

Explanation & Instructions

A category of audio characterized by a particular style, form, or content according to a specific classification scheme

You can add only a free text value at the CategoryLabel element; if you have used a value from an established controlled vocabulary, you can use the AudioGenreIdentifier and the attribute AudioGenreClassificationScheme to provide further details.

Example

<ms:AudioGenre>
        <ms:CategoryLabel>conference noises</ms:CategoryLabel>
</ms:AudioGenre>

SpeechGenre

Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.Corpus.CorpusMediaPart.CorpusAudioPart.SpeechGenre

Data type component

Optionality Recommended if applicable

Explanation & Instructions

A category for the conventionalized discourse of the speech part of a language resource, based on extra-linguistic and internal linguistic criteria

You can add only a free text value at the CategoryLabel element; if you have used a value from an established controlled vocabulary, you can use the SpeechGenreIdentifier and the attribute SpeechGenreClassificationScheme to provide further details.

Example

<ms:SpeechGenre>
        <ms:CategoryLabel>broadcast news</ms:CategoryLabel>
</ms:SpeechGenre>

<ms:SpeechGenre>
        <ms:CategoryLabel>monologue</ms:CategoryLabel>
</ms:SpeechGenre>

VideoGenre

Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.Corpus.CorpusMediaPart.CorpusVideoPart.VideoGenre

Data type string (+ id + scheme)

Optionality Recommended if applicable

Explanation & Instructions

A classification of video parts based on extra-linguistic and internal linguistic criteria and reflected on the video style, form or content

You can add only a free text value at the CategoryLabel element; if you have used a value from an established controlled vocabulary, you can use the VideoGenreIdentifier and the attribute VideoClassificationScheme

Example

<ms:VideoGenre>
        <ms:CategoryLabel>documentaries</ms:CategoryLabel>
</ms:VideoGenre>

<ms:VideoGenre>
        <ms:CategoryLabel>video lectures</ms:CategoryLabel>
</ms:VideoGenre>

ImageGenre

Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.Corpus.CorpusMediaPart.CorpusImagePart.ImageGenre

Data type component

Optionality Recommended

Explanation & Instructions

A category of images characterized by a particular style, form, or content according to a specific classification scheme

You can add only a free text value at the CategoryLabel element; if you have used a value from an established controlled vocabulary, you can use the ImageGenreIdentifier and the attribute ImageClassificationScheme to provide further details.

Example

<ms:ImageGenre>
        <ms:CategoryLabel>human faces</ms:CategoryLabel>
</ms:ImageGenre>

<ms:ImageGenre>
        <ms:CategoryLabel>landscape</ms:CategoryLabel>
</ms:ImageGenre>

DatasetDistribution

Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.Corpus.DatasetDistribution

Data type component

Optionality Mandatory

Explanation & Instructions

Any form with which a dataset is distributed, such as a downloadable form in a specific format (e.g., spreadsheet, plain text, etc.) or an API with which it can be accessed

You can repeat the element for multiple distributions.

The mandatory and recommended elements are:

  • DatasetDistributionForm (Mandatory): The form (medium/channel) used for distributing a language resource consisting of data (e.g., a corpus, a lexicon, etc.). The typical values are ‘downloadable’, ‘accessibleThroughInterface’, ‘accessibleThroughQuery’ (see more at DatasetDistributionForm).
  • downloadLocation (Mandatory if applicable): A URL where the language resource (mainly data but also downloadable software programmes or forms) can be downloaded from. Use this element if the value of DatasetDistributionForm is ‘downloadable’ and only for direct download links (i.e., from which the dataset is downloaded without the need of further actions such as clicks on a page).
  • accessLocation (Mandatory if applicable): A URL where the resource can be accessed from; it can be used for landing pages or for cases where the resource is accessible via an interface, i.e. cases where the resource itself is not provided with a direct link for downloading. Use if the value of DatasetDistributionForm is ‘accessibleThroughInterface’ or ‘accessibleThroughQuery’ but also for links used for downloading corpora which are mentioned on a landing page or require some kind of action on the part of the user.
  • samplesLocation (Recommended): Links a resource to a URL (or URLs) with samples of a data resource or of the input or output resource of a tool/service.
  • licenceTerms (Mandatory): See licenceTerms
  • cost (Mandatory if applicable): Introduces the cost for accessing a resource, formally described as a set of amount and currency unit. Please use only for resources available at a cost and not for free resources.
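
The choice between downloadLocation and accessLocation described above can be sketched as a small decision function. This is only an illustration under stated assumptions: the name location_element and the direct_link flag are hypothetical, and distribution forms other than the three typical values are deliberately left undecided because the text above does not cover them.

```python
from typing import Optional


def location_element(form_uri: str, direct_link: bool = True) -> Optional[str]:
    """Suggest which location element to fill in (hypothetical helper).

    form_uri is the DatasetDistributionForm value; direct_link says whether the
    URL downloads the dataset without any further action by the user.
    """
    # The controlled values are URIs; compare only the last path segment.
    form = form_uri.rstrip("/").rsplit("/", 1)[-1]
    if form == "downloadable":
        # Landing pages and links that need extra clicks go to accessLocation.
        return "downloadLocation" if direct_link else "accessLocation"
    if form in {"accessibleThroughInterface", "accessibleThroughQuery"}:
        return "accessLocation"
    return None  # other forms are not covered by the rules described above


base = "http://w3id.org/meta-share/meta-share/"
print(location_element(base + "downloadable"))
print(location_element(base + "downloadable", direct_link=False))
print(location_element(base + "accessibleThroughQuery"))
```

The second call corresponds to a downloadable corpus whose link points to a landing page, which is why it resolves to accessLocation.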

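Depending on the parts of the corpus, you must also use one or more of the following:

  • distributionTextFeature (Mandatory if applicable): Describes the distributable form of text parts. See distributionTextFeature.
  • distributionAudioFeature (Mandatory if applicable): Describes the distributable form of audio parts. See distributionAudioFeature.
  • distributionVideoFeature (Mandatory if applicable): Describes the distributable form of video parts. See distributionVideoFeature.
  • distributionImageFeature (Mandatory if applicable): Describes the distributable form of image parts. See distributionImageFeature.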

Example

<ms:DatasetDistribution>
        <ms:DatasetDistributionForm>http://w3id.org/meta-share/meta-share/downloadable</ms:DatasetDistributionForm>
        <ms:accessLocation>https://www.someAccessURL.com</ms:accessLocation>
        <ms:samplesLocation>https://www.URLwithsamples.com</ms:samplesLocation>
        <ms:distributionTextFeature>
                <ms:size>
                        <ms:amount>17601</ms:amount>
                        <ms:sizeUnit>http://w3id.org/meta-share/meta-share/unit</ms:sizeUnit>
                </ms:size>
                <ms:dataFormat>http://w3id.org/meta-share/omtd-share/Xml</ms:dataFormat>
                <ms:characterEncoding>http://w3id.org/meta-share/meta-share/UTF-8</ms:characterEncoding>
        </ms:distributionTextFeature>
        <ms:licenceTerms>
                <ms:licenceTermsName xml:lang="en">openUnder-PSI</ms:licenceTermsName>
                <ms:licenceTermsURL>https://elrc-share.eu/terms/openUnderPSI.html</ms:licenceTermsURL>
        </ms:licenceTerms>
</ms:DatasetDistribution>

<ms:DatasetDistribution>
        <ms:DatasetDistributionForm>http://w3id.org/meta-share/meta-share/accessibleThroughInterface</ms:DatasetDistributionForm>
        <ms:accessLocation>https://www.someAccessURL.com</ms:accessLocation>
        <ms:distributionTextFeature>
                <ms:size>
                        <ms:amount>100</ms:amount>
                        <ms:sizeUnit>http://w3id.org/meta-share/meta-share/text1</ms:sizeUnit>
                </ms:size>
                <ms:dataFormat>http://w3id.org/meta-share/omtd-share/Pdf</ms:dataFormat>
                <ms:characterEncoding>http://w3id.org/meta-share/meta-share/UTF-8</ms:characterEncoding>
        </ms:distributionTextFeature>
        <ms:licenceTerms>
                <ms:licenceTermsName xml:lang="en">some commercial licence</ms:licenceTermsName>
                <ms:licenceTermsURL>https://elrc-share.eu/terms/someCommercialLicence.html</ms:licenceTermsURL>
        </ms:licenceTerms>
        <ms:cost>
                <ms:amount>10000</ms:amount>
                <ms:currency>http://w3id.org/meta-share/meta-share/euro</ms:currency>
        </ms:cost>
</ms:DatasetDistribution>

distributionTextFeature

Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.Corpus.DatasetDistribution.distributionTextFeature

Data type component

Optionality Mandatory if applicable

Explanation & Instructions

Links to a feature that can be used for describing distinct distributable forms of text resources/parts

The following are mandatory or recommended:

  • size (Mandatory): The size of the text part, expressed as a combination of amount and sizeUnit (with a value from a CV for sizeUnit).
  • dataFormat (Mandatory): Indicates the format(s) of a data resource; it takes a value from a CV (dataFormat); the dataFormat includes the IANA mimetype and pointers to additional documentation for specialized formats (e.g., GATE XML, CONLL formats, etc.).
  • characterEncoding (Recommended): Specifies the character encoding used for a language resource data distribution.

Example

<ms:distributionTextFeature>
        <ms:size>
                <ms:amount>9139</ms:amount>
                <ms:sizeUnit>http://w3id.org/meta-share/meta-share/sentence</ms:sizeUnit>
        </ms:size>
        <ms:size>
                <ms:amount>40</ms:amount>
                <ms:sizeUnit>http://w3id.org/meta-share/meta-share/file</ms:sizeUnit>
        </ms:size>
        <ms:dataFormat>http://w3id.org/meta-share/omtd-share/Xml</ms:dataFormat>
        <ms:characterEncoding>http://w3id.org/meta-share/meta-share/UTF-8</ms:characterEncoding>
</ms:distributionTextFeature>

distributionAudioFeature

Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.Corpus.DatasetDistribution.distributionAudioFeature

Data type component

Optionality Mandatory if applicable

Explanation & Instructions

Links to a feature that can be used for describing distinct distributable forms of audio resources/parts

The following are mandatory or recommended:

  • size (Mandatory): The size of the audio part, expressed as a combination of amount and sizeUnit (with a value from a CV for sizeUnit).
  • durationOfAudio (Recommended): Specifies the duration of the audio recording including silences, music, pauses, etc., expressed as a combination of amount and durationUnit (with a value from the CV for durationUnit).
  • durationOfEffectiveSpeech (Recommended): Specifies the duration of effective speech of the audio (part of a) resource, expressed as a combination of amount and durationUnit (with a value from the CV for durationUnit).
  • audioFormat (Mandatory): Indicates the format(s) of the audio (part of a) data resource, expressed as a value of dataFormat (with a value from a CV for dataFormat) and compressed.

Example

<ms:distributionAudioFeature>
        <ms:size>
                <ms:amount>10</ms:amount>
                <ms:sizeUnit>http://w3id.org/meta-share/meta-share/file</ms:sizeUnit>
        </ms:size>
        <ms:durationOfAudio>
                <ms:amount>3</ms:amount>
                <ms:durationUnit>http://w3id.org/meta-share/meta-share/hour</ms:durationUnit>
        </ms:durationOfAudio>
        <ms:audioFormat>
                <ms:dataFormat>http://w3id.org/meta-share/omtd-share/wav</ms:dataFormat>
                <ms:compressed>true</ms:compressed>
        </ms:audioFormat>
</ms:distributionAudioFeature>

distributionVideoFeature

Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.Corpus.DatasetDistribution.distributionVideoFeature

Data type component

Optionality Mandatory if applicable

Explanation & Instructions

Links to a feature that can be used for describing distinct distributable forms of video resources/parts

The following are mandatory or recommended:

  • size (Mandatory): The size of the video part, expressed as a combination of amount and sizeUnit (with a value from a CV for sizeUnit).
  • durationOfVideo (Recommended): Specifies the duration of the video recording, expressed as a combination of amount and durationUnit (with a value from the CV for durationUnit).
  • videoFormat (Mandatory): Indicates the format(s) of the video (part of a) data resource, expressed as a value of dataFormat (with a value from a CV for dataFormat) and compressed.

Example

<ms:distributionVideoFeature>
        <ms:size>
                <ms:amount>9139</ms:amount>
                <ms:sizeUnit>http://w3id.org/meta-share/meta-share/screen</ms:sizeUnit>
        </ms:size>
        <ms:size>
                <ms:amount>40</ms:amount>
                <ms:sizeUnit>http://w3id.org/meta-share/meta-share/file</ms:sizeUnit>
        </ms:size>
        <ms:durationOfVideo>
                <ms:amount>40</ms:amount>
                <ms:durationUnit>http://w3id.org/meta-share/meta-share/hour</ms:durationUnit>
        </ms:durationOfVideo>
        <ms:videoFormat>
                <ms:dataFormat>http://w3id.org/meta-share/omtd-share/wav</ms:dataFormat>
                <ms:compressed>true</ms:compressed>
        </ms:videoFormat>
</ms:distributionVideoFeature>

distributionImageFeature

Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.Corpus.DatasetDistribution.distributionImageFeature

Data type component

Optionality Mandatory if applicable

Explanation & Instructions

Links to a feature that can be used for describing distinct distributable forms of image resources/parts

The following are mandatory or recommended:

  • size (Mandatory): The size of the image part, expressed as a combination of amount and sizeUnit (with a value from a CV for sizeUnit).
  • imageFormat (Mandatory): Indicates the format(s) of the image (part of a) data resource, expressed as a value of dataFormat (with a value from a CV for dataFormat) and compressed.

Example

<ms:distributionImageFeature>
        <ms:size>
                <ms:amount>100</ms:amount>
                <ms:sizeUnit>http://w3id.org/meta-share/meta-share/file</ms:sizeUnit>
        </ms:size>
        <ms:imageFormat>
                <ms:dataFormat>http://w3id.org/meta-share/omtd-share/Pdf</ms:dataFormat>
                <ms:compressed>true</ms:compressed>
        </ms:imageFormat>
</ms:distributionImageFeature>

personalDataIncluded

Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.Corpus.personalDataIncluded

Data type boolean

Optionality Mandatory

Explanation & Instructions

Specifies whether the language resource contains personal data (mainly in the sense falling under the GDPR)

If the resource contains personal data, you can use the (optional) personalDataDetails to provide more information

Example

<ms:personalDataIncluded>true</ms:personalDataIncluded>
<ms:personalDataDetails>The corpus contains data on the place of living and place of birth of participants</ms:personalDataDetails>

sensitiveDataIncluded

Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.Corpus.sensitiveDataIncluded

Data type boolean

Optionality Mandatory

Explanation & Instructions

Specifies whether the language resource contains sensitive data (e.g., medical/health-related, etc.) and thus requires special handling

If the resource contains sensitive data, you can use the (optional) sensitiveDataDetails to provide more information.

Example

<ms:sensitiveDataIncluded>true</ms:sensitiveDataIncluded>
<ms:sensitiveDataDetails>The corpus contains medical data for persons with disabilities</ms:sensitiveDataDetails>

anonymized

Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.Corpus.anonymized

Data type boolean

Optionality Mandatory if applicable

Explanation & Instructions

Indicates whether the language resource has been anonymized

The element is mandatory if either personalDataIncluded or sensitiveDataIncluded has ‘true’ as its value; anonymizationDetails must also be filled in with information on the anonymization method, etc.
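
The dependency between these elements can be expressed as a simple check. The sketch below is illustrative only: the function names are hypothetical, the namespace URI bound to the ms: prefix is an assumption, and it reads the rule above as requiring anonymizationDetails only when anonymized is ‘true’, which is an interpretation rather than something the schema text states explicitly.

```python
import xml.etree.ElementTree as ET
from typing import List

# Assumed namespace URI for the ms: prefix; adjust it to match your records.
MS = "http://w3id.org/meta-share/meta-share/"
NS = {"ms": MS}


def is_true(corpus: ET.Element, name: str) -> bool:
    """Return True if the boolean child element <ms:name> has the value 'true'."""
    return corpus.findtext(f"ms:{name}", default="", namespaces=NS).strip() == "true"


def anonymization_problems(corpus: ET.Element) -> List[str]:
    """List missing anonymization elements for a Corpus element (hypothetical helper)."""
    problems = []
    needs_flag = is_true(corpus, "personalDataIncluded") or is_true(corpus, "sensitiveDataIncluded")
    if needs_flag and corpus.find("ms:anonymized", NS) is None:
        problems.append("missing anonymized")
    # Assumption: details are expected only when the corpus was anonymized.
    details = corpus.findtext("ms:anonymizationDetails", default="", namespaces=NS).strip()
    if is_true(corpus, "anonymized") and not details:
        problems.append("missing anonymizationDetails")
    return problems


sample = f"""<ms:Corpus xmlns:ms="{MS}">
  <ms:personalDataIncluded>true</ms:personalDataIncluded>
  <ms:sensitiveDataIncluded>false</ms:sensitiveDataIncluded>
</ms:Corpus>"""
corpus = ET.fromstring(sample)
print(anonymization_problems(corpus))  # prints ['missing anonymized']
```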

Example

<ms:anonymized>true</ms:anonymized>
<ms:anonymizationDetails>pseudonymization performed manually</ms:anonymizationDetails>

annotation

Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.Corpus.annotation

Data type component

Optionality Mandatory if applicable

Explanation & Instructions

Links a corpus to its annotated part(s)

You must use it for annotated corpora and annotations. You can repeat it for corpora that have separate files for each annotation type, or if you want to give information such as the use of different annotation tools for each annotation level.

Enter at least the annotation type(s); if you want, you can give a more detailed description of the annotated parts - see the annotation component of the full schema.
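
As a rough illustration of the rule that annotated corpora and annotations need at least one annotation element, the following sketch flags a Corpus element whose corpusSubclass is anything other than rawCorpus and that has no annotation child. The function name annotation_missing is hypothetical, the namespace URI bound to the ms: prefix is an assumption, and treating every non-raw subclass as requiring annotation is a simplification of the three values listed for corpusSubclass.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for the ms: prefix; adjust it to match your records.
MS = "http://w3id.org/meta-share/meta-share/"
NS = {"ms": MS}


def annotation_missing(corpus: ET.Element) -> bool:
    """Return True if an annotation element is expected but absent (hypothetical helper)."""
    subclass = corpus.findtext("ms:corpusSubclass", default="", namespaces=NS).strip()
    if not subclass:
        # A missing corpusSubclass is a separate problem; do not report it here.
        return False
    is_raw = subclass.rsplit("/", 1)[-1] == "rawCorpus"
    has_annotation = corpus.find("ms:annotation", NS) is not None
    return not is_raw and not has_annotation


sample = f"""<ms:Corpus xmlns:ms="{MS}">
  <ms:corpusSubclass>{MS}annotatedCorpus</ms:corpusSubclass>
</ms:Corpus>"""
corpus = ET.fromstring(sample)
print(annotation_missing(corpus))  # prints True
```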

Example

<ms:annotation>
        <ms:annotationType>http://w3id.org/meta-share/omtd-share/Lemma</ms:annotationType>
        <ms:annotationStandoff>false</ms:annotationStandoff>
        <ms:annotationMode>http://w3id.org/meta-share/meta-share/mixed</ms:annotationMode>
        <ms:isAnnotatedBy>
                <ms:resourceName xml:lang="en">Lemmatizer</ms:resourceName>
        </ms:isAnnotatedBy>
</ms:annotation>

<ms:annotation>
        <ms:annotationType>http://w3id.org/meta-share/omtd-share/PartOfSpeech</ms:annotationType>
        <ms:annotationStandoff>false</ms:annotationStandoff>
        <ms:tagset>
                <ms:resourceName xml:lang="en">Universal Dependencies</ms:resourceName>
        </ms:tagset>
        <ms:isAnnotatedBy>
                <ms:resourceName xml:lang="en">PoS tagger</ms:resourceName>
        </ms:isAnnotatedBy>
</ms:annotation>

<ms:annotation>
        <ms:annotationType>http://w3id.org/meta-share/omtd-share/SyntacticAnnotationType</ms:annotationType>
</ms:annotation>