Minimal elements for corpora¶
This page describes the minimal metadata elements specific to corpora.
Corpus¶
Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.Corpus
Data type component
Optionality Mandatory
Explanation & Instructions
Wraps together the set of elements that is specific to corpora
Example
<ms:LRSubclass>
<ms:Corpus>
<ms:lrType>Corpus</ms:lrType>
</ms:Corpus>
</ms:LRSubclass>
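To show how the elements described on this page fit together, here is a minimal sketch of a complete Corpus component for a raw, monolingual text corpus. It reuses only elements and values that appear elsewhere on this page. The CorpusMediaPart wrapper is inferred from the element paths, and the order of the child elements is an assumption that should be checked against the schema. Like the other examples, the fragment assumes the ms prefix is declared on an enclosing element.

```xml
<ms:LRSubclass>
  <ms:Corpus>
    <ms:lrType>Corpus</ms:lrType>
    <ms:corpusSubclass>http://w3id.org/meta-share/meta-share/rawCorpus</ms:corpusSubclass>
    <!-- Wrapper name taken from the paths on this page (assumption) -->
    <ms:CorpusMediaPart>
      <ms:CorpusTextPart>
        <ms:corpusMediaType>CorpusTextPart</ms:corpusMediaType>
        <ms:mediaType>http://w3id.org/meta-share/meta-share/text</ms:mediaType>
        <ms:lingualityType>http://w3id.org/meta-share/meta-share/monolingual</ms:lingualityType>
        <ms:language>
          <ms:languageTag>en</ms:languageTag>
          <ms:languageId>en</ms:languageId>
        </ms:language>
      </ms:CorpusTextPart>
    </ms:CorpusMediaPart>
    <ms:DatasetDistribution>
      <ms:DatasetDistributionForm>http://w3id.org/meta-share/meta-share/accessibleThroughInterface</ms:DatasetDistributionForm>
      <ms:accessLocation>https://www.someAccessURL.com</ms:accessLocation>
      <ms:distributionTextFeature>
        <ms:size>
          <ms:amount>40</ms:amount>
          <ms:sizeUnit>http://w3id.org/meta-share/meta-share/file</ms:sizeUnit>
        </ms:size>
        <ms:dataFormat>http://w3id.org/meta-share/omtd-share/Xml</ms:dataFormat>
      </ms:distributionTextFeature>
      <ms:licenceTerms>
        <ms:licenceTermsName xml:lang="en">openUnder-PSI</ms:licenceTermsName>
        <ms:licenceTermsURL>https://elrc-share.eu/terms/openUnderPSI.html</ms:licenceTermsURL>
      </ms:licenceTerms>
    </ms:DatasetDistribution>
    <!-- Both flags are mandatory; anonymized is omitted because neither is true -->
    <ms:personalDataIncluded>false</ms:personalDataIncluded>
    <ms:sensitiveDataIncluded>false</ms:sensitiveDataIncluded>
  </ms:Corpus>
</ms:LRSubclass>
```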
corpusSubclass¶
Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.Corpus.corpusSubclass
Data type CV (corpusSubclass)
Optionality Mandatory
Explanation & Instructions
Introduces a classification of corpora into types (used for descriptive reasons)
Use one of the values for raw corpora, annotated corpora (raw data mixed with annotations), or annotations (annotations only, without the original corpus).
Example
<ms:corpusSubclass>http://w3id.org/meta-share/meta-share/rawCorpus</ms:corpusSubclass>
<ms:corpusSubclass>http://w3id.org/meta-share/meta-share/annotatedCorpus</ms:corpusSubclass>
CorpusTextPart¶
Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.Corpus.CorpusMediaPart.CorpusTextPart
Data type component
Optionality Mandatory if applicable
Explanation & Instructions
The part of a corpus (or a whole corpus) that consists of textual segments (e.g., a corpus of publications, transcriptions of an oral corpus, subtitles, etc.).
You can repeat the group of elements for multiple textual parts.
The mandatory or recommended elements for the text part are:
mediaType (Mandatory): Specifies the media type of a language resource (the physical medium of the contents representation). For text parts, always use the value ‘text’.
lingualityType (Mandatory): Indicates whether the resource includes one, two or more languages.
multilingualityType (Mandatory if applicable): Indicates whether the resource (part) is parallel, comparable or mixed. It is required if lingualityType is bilingual or multilingual; select one of the values parallel (e.g., original text and its translations), comparable (e.g., corpus of the same domain in multiple languages) or multilingualSingleText (for corpora that consist of segments including text in two or more languages, e.g., the transcription of a European Parliament session with MPs speaking in their native language).
language (Mandatory): Specifies the language that is used in the resource part, expressed according to the BCP47 recommendation. See language.
languageVariety (Mandatory if applicable): Relates a language resource that contains segments in a language variety (e.g., dialect, jargon) to it. Please use for dialect corpora.
modalityType (Recommended if applicable): Specifies the type of the modality represented in the resource. For instance, you can use ‘spoken language’ to describe transcribed speech corpora.
TextGenre (Recommended): A category of text characterized by a particular style, form, or content according to a specific classification scheme. See TextGenre.
Example
<ms:CorpusTextPart>
<ms:corpusMediaType>CorpusTextPart</ms:corpusMediaType>
<ms:mediaType>http://w3id.org/meta-share/meta-share/text</ms:mediaType>
<ms:lingualityType>http://w3id.org/meta-share/meta-share/monolingual</ms:lingualityType>
<ms:language>
<ms:languageTag>es</ms:languageTag>
<ms:languageId>es</ms:languageId>
</ms:language>
</ms:CorpusTextPart>
<ms:CorpusTextPart>
<ms:corpusMediaType>CorpusTextPart</ms:corpusMediaType>
<ms:mediaType>http://w3id.org/meta-share/meta-share/text</ms:mediaType>
<ms:lingualityType>http://w3id.org/meta-share/meta-share/bilingual</ms:lingualityType>
<ms:language>
<ms:languageTag>es</ms:languageTag>
<ms:languageId>es</ms:languageId>
</ms:language>
<ms:language>
<ms:languageTag>en</ms:languageTag>
<ms:languageId>en</ms:languageId>
</ms:language>
<ms:multilingualityType>http://w3id.org/meta-share/meta-share/parallel</ms:multilingualityType>
<ms:TextGenre>
<ms:CategoryLabel>administrative texts</ms:CategoryLabel>
</ms:TextGenre>
</ms:CorpusTextPart>
<ms:CorpusTextPart>
<ms:corpusMediaType>CorpusTextPart</ms:corpusMediaType>
<ms:mediaType>http://w3id.org/meta-share/meta-share/text</ms:mediaType>
<ms:lingualityType>http://w3id.org/meta-share/meta-share/monolingual</ms:lingualityType>
<ms:language>
<ms:languageTag>en</ms:languageTag>
<ms:languageId>en</ms:languageId>
</ms:language>
<ms:modalityType>http://w3id.org/meta-share/meta-share/spokenLanguage</ms:modalityType>
</ms:CorpusTextPart>
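The examples above show the parallel case only. For a text part whose individual segments mix two languages, a sketch could look like the following. The multilingualSingleText URI is an assumption, formed by analogy with the parallel URI used above, and should be checked against the controlled vocabulary.

```xml
<ms:CorpusTextPart>
  <ms:corpusMediaType>CorpusTextPart</ms:corpusMediaType>
  <ms:mediaType>http://w3id.org/meta-share/meta-share/text</ms:mediaType>
  <ms:lingualityType>http://w3id.org/meta-share/meta-share/bilingual</ms:lingualityType>
  <ms:language>
    <ms:languageTag>en</ms:languageTag>
    <ms:languageId>en</ms:languageId>
  </ms:language>
  <ms:language>
    <ms:languageTag>fr</ms:languageTag>
    <ms:languageId>fr</ms:languageId>
  </ms:language>
  <!-- Required because lingualityType is bilingual; URI assumed by analogy with "parallel" -->
  <ms:multilingualityType>http://w3id.org/meta-share/meta-share/multilingualSingleText</ms:multilingualityType>
</ms:CorpusTextPart>
```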
CorpusAudioPart¶
Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.Corpus.CorpusMediaPart.CorpusAudioPart
Data type component
Optionality Mandatory if applicable
Explanation & Instructions
The part of a corpus (or whole corpus) that consists of audio segments
You can repeat the group of elements for multiple audio parts.
The mandatory or recommended elements for the audio part are:
mediaType (Mandatory): Specifies the media type of a language resource (the physical medium of the contents representation). For audio parts, always use the value ‘audio’.
lingualityType (Mandatory): Indicates whether the resource includes one, two or more languages.
multilingualityType (Mandatory if applicable): Indicates whether the resource (part) is parallel, comparable or mixed. It is required if lingualityType is bilingual or multilingual; select one of the values parallel (e.g., original text and its translations), comparable (e.g., corpus of the same domain in multiple languages) or multilingualSingleText (for corpora that consist of segments including text in two or more languages, e.g., the transcription of a European Parliament session with MPs speaking in their native language).
language (Mandatory): Specifies the language that is used in the resource part, expressed according to the BCP47 recommendation. See language.
languageVariety (Mandatory if applicable): Relates a language resource that contains segments in a language variety (e.g., dialect, jargon) to it. Please use for dialect corpora.
modalityType (Recommended if applicable): Specifies the type of the modality represented in the resource. For instance, you can use ‘spoken language’ to describe transcribed speech corpora.
AudioGenre (Recommended if applicable): A category of audio characterized by a particular style, form, or content according to a specific classification scheme. See AudioGenre.
SpeechGenre (Recommended if applicable): A category for the conventionalized discourse of the speech part of a language resource, based on extra-linguistic and internal linguistic criteria. See SpeechGenre.
Example
<ms:CorpusAudioPart>
<ms:corpusMediaType>CorpusAudioPart</ms:corpusMediaType>
<ms:mediaType>http://w3id.org/meta-share/meta-share/audio</ms:mediaType>
<ms:lingualityType>http://w3id.org/meta-share/meta-share/monolingual</ms:lingualityType>
<ms:language>
<ms:languageTag>en</ms:languageTag>
<ms:languageId>en</ms:languageId>
</ms:language>
<ms:AudioGenre>
<ms:CategoryLabel>conference noises</ms:CategoryLabel>
</ms:AudioGenre>
</ms:CorpusAudioPart>
<ms:CorpusAudioPart>
<ms:corpusMediaType>CorpusAudioPart</ms:corpusMediaType>
<ms:mediaType>http://w3id.org/meta-share/meta-share/audio</ms:mediaType>
<ms:lingualityType>http://w3id.org/meta-share/meta-share/monolingual</ms:lingualityType>
<ms:language>
<ms:languageTag>en</ms:languageTag>
<ms:languageId>en</ms:languageId>
</ms:language>
<ms:modalityType>http://w3id.org/meta-share/meta-share/spokenLanguage</ms:modalityType>
<ms:SpeechGenre>
<ms:CategoryLabel>monologue</ms:CategoryLabel>
</ms:SpeechGenre>
</ms:CorpusAudioPart>
CorpusVideoPart¶
Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.Corpus.CorpusMediaPart.CorpusVideoPart
Data type component
Optionality Mandatory if applicable
Explanation & Instructions
The part of a corpus (or a whole corpus) that consists of video segments (e.g., a corpus of video lectures, a part of a corpus with news, a sign language corpus, etc.)
You can repeat the group of elements for multiple video parts.
The mandatory or recommended elements for the video part are:
mediaType (Mandatory): Specifies the media type of a language resource (the physical medium of the contents representation). For video parts, always use the value ‘video’.
lingualityType (Mandatory): Indicates whether the resource includes one, two or more languages.
multilingualityType (Mandatory if applicable): Indicates whether the resource (part) is parallel, comparable or mixed. It is required if lingualityType is bilingual or multilingual; select one of the values parallel (e.g., original text and its translations), comparable (e.g., corpus of the same domain in multiple languages) or multilingualSingleText (for corpora that consist of segments including text in two or more languages, e.g., the transcription of a European Parliament session with MPs speaking in their native language).
language (Mandatory): Specifies the language that is used in the resource part, expressed according to the BCP47 recommendation. See language.
languageVariety (Mandatory if applicable): Relates a language resource that contains segments in a language variety (e.g., dialect, jargon) to it. Please use for dialect corpora.
modalityType (Recommended if applicable): Specifies the type of the modality represented in the resource. For instance, you can use ‘spoken language’ to describe transcribed speech corpora.
VideoGenre (Recommended): A classification of video parts based on extra-linguistic and internal linguistic criteria and reflected on the video style, form or content. See VideoGenre.
typeOfVideoContent (Mandatory): Main type of object or people represented in the video.
Example
<ms:CorpusVideoPart>
<ms:corpusMediaType>CorpusVideoPart</ms:corpusMediaType>
<ms:mediaType>http://w3id.org/meta-share/meta-share/video</ms:mediaType>
<ms:lingualityType>http://w3id.org/meta-share/meta-share/monolingual</ms:lingualityType>
<ms:language>
<ms:languageTag>en</ms:languageTag>
<ms:languageId>en</ms:languageId>
</ms:language>
<ms:modalityType>http://w3id.org/meta-share/meta-share/bodyGesture</ms:modalityType>
<ms:modalityType>http://w3id.org/meta-share/meta-share/facialExpression</ms:modalityType>
<ms:modalityType>http://w3id.org/meta-share/meta-share/spokenLanguage</ms:modalityType>
<ms:typeOfVideoContent>people eating at a restaurant</ms:typeOfVideoContent>
</ms:CorpusVideoPart>
<ms:CorpusVideoPart>
<ms:corpusMediaType>CorpusVideoPart</ms:corpusMediaType>
<ms:mediaType>http://w3id.org/meta-share/meta-share/video</ms:mediaType>
<ms:lingualityType>http://w3id.org/meta-share/meta-share/monolingual</ms:lingualityType>
<ms:language>
<ms:languageTag>fr</ms:languageTag>
<ms:languageId>fr</ms:languageId>
</ms:language>
<ms:VideoGenre>
<ms:CategoryLabel>documentary</ms:CategoryLabel>
</ms:VideoGenre>
<ms:typeOfVideoContent>birds, wild animals, plants</ms:typeOfVideoContent>
</ms:CorpusVideoPart>
CorpusImagePart¶
Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.Corpus.CorpusMediaPart.CorpusImagePart
Data type component
Optionality Mandatory if applicable
Explanation & Instructions
The part of a corpus (or whole corpus) that consists of images (e.g., a corpus of photographs and their captions).
You can repeat the group of elements for multiple image parts.
The mandatory or recommended elements for the image part are:
mediaType (Mandatory): Specifies the media type of a language resource (the physical medium of the contents representation). For image parts, always use the value ‘image’.
lingualityType (Mandatory): Indicates whether the resource includes one, two or more languages.
multilingualityType (Mandatory if applicable): Indicates whether the resource (part) is parallel, comparable or mixed. It is required if lingualityType is bilingual or multilingual; select one of the values parallel (e.g., original text and its translations), comparable (e.g., corpus of the same domain in multiple languages) or multilingualSingleText (for corpora that consist of segments including text in two or more languages, e.g., the transcription of a European Parliament session with MPs speaking in their native language).
language (Mandatory): Specifies the language that is used in the resource part, expressed according to the BCP47 recommendation. See language.
languageVariety (Mandatory if applicable): Relates a language resource that contains segments in a language variety (e.g., dialect, jargon) to it. Please use for dialect corpora.
modalityType (Recommended if applicable): Specifies the type of the modality represented in the resource.
ImageGenre (Recommended): A category of images characterized by a particular style, form, or content according to a specific classification scheme. See ImageGenre.
typeOfImageContent (Mandatory): Main type of object or people represented in the image.
Example
<ms:CorpusImagePart>
<ms:corpusMediaType>CorpusImagePart</ms:corpusMediaType>
<ms:mediaType>http://w3id.org/meta-share/meta-share/image</ms:mediaType>
<ms:lingualityType>http://w3id.org/meta-share/meta-share/monolingual</ms:lingualityType>
<ms:language>
<ms:languageTag>el</ms:languageTag>
<ms:languageId>el</ms:languageId>
</ms:language>
<ms:ImageGenre>
<ms:CategoryLabel>comics</ms:CategoryLabel>
</ms:ImageGenre>
<ms:typeOfImageContent>human figures</ms:typeOfImageContent>
</ms:CorpusImagePart>
TextGenre¶
Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.Corpus.CorpusMediaPart.CorpusTextPart.TextGenre
Data type component
Optionality Recommended
Explanation & Instructions
A category of text characterized by a particular style, form, or content according to a specific classification scheme
You can add just a free-text value in the CategoryLabel element; if you have used a value from an established controlled vocabulary, you can use the TextGenreIdentifier and the attribute TextGenreClassificationScheme to provide further details.
Example
<ms:TextGenre>
<ms:CategoryLabel>movie subtitles</ms:CategoryLabel>
</ms:TextGenre>
<ms:TextGenre>
<ms:CategoryLabel>news articles</ms:CategoryLabel>
</ms:TextGenre>
AudioGenre¶
Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.Corpus.CorpusMediaPart.CorpusAudioPart.AudioGenre
Data type component
Optionality Recommended if applicable
Explanation & Instructions
A category of audio characterized by a particular style, form, or content according to a specific classification scheme
You can add just a free-text value in the CategoryLabel element; if you have used a value from an established controlled vocabulary, you can use the AudioGenreIdentifier and the attribute AudioGenreClassificationScheme to provide further details.
Example
<ms:AudioGenre>
<ms:CategoryLabel>conference noises</ms:CategoryLabel>
</ms:AudioGenre>
SpeechGenre¶
Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.Corpus.CorpusMediaPart.CorpusAudioPart.SpeechGenre
Data type component
Optionality Recommended if applicable
Explanation & Instructions
A category for the conventionalized discourse of the speech part of a language resource, based on extra-linguistic and internal linguistic criteria
You can add just a free-text value in the CategoryLabel element; if you have used a value from an established controlled vocabulary, you can use the SpeechGenreIdentifier and the attribute SpeechGenreClassificationScheme to provide further details.
Example
<ms:SpeechGenre>
<ms:CategoryLabel>broadcast news</ms:CategoryLabel>
</ms:SpeechGenre>
<ms:SpeechGenre>
<ms:CategoryLabel>monologue</ms:CategoryLabel>
</ms:SpeechGenre>
VideoGenre¶
Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.Corpus.CorpusMediaPart.CorpusVideoPart.VideoGenre
Data type string (+ id + scheme)
Optionality Recommended if applicable
Explanation & Instructions
A classification of video parts based on extra-linguistic and internal linguistic criteria and reflected on the video style, form or content
You can add just a free-text value in the CategoryLabel element; if you have used a value from an established controlled vocabulary, you can use the VideoGenreIdentifier and the attribute VideoClassificationScheme to provide further details.
Example
<ms:VideoGenre>
<ms:CategoryLabel>documentaries</ms:CategoryLabel>
</ms:VideoGenre>
<ms:VideoGenre>
<ms:CategoryLabel>video lectures</ms:CategoryLabel>
</ms:VideoGenre>
ImageGenre¶
Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.Corpus.CorpusMediaPart.CorpusImagePart.ImageGenre
Data type component
Optionality Recommended
Explanation & Instructions
A category of images characterized by a particular style, form, or content according to a specific classification scheme
You can add just a free-text value in the CategoryLabel element; if you have used a value from an established controlled vocabulary, you can use the ImageGenreIdentifier and the attribute ImageClassificationScheme to provide further details.
Example
<ms:ImageGenre>
<ms:CategoryLabel>human faces</ms:CategoryLabel>
</ms:ImageGenre>
<ms:ImageGenre>
<ms:CategoryLabel>landscape</ms:CategoryLabel>
</ms:ImageGenre>
DatasetDistribution¶
Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.Corpus.DatasetDistribution
Data type component
Optionality Mandatory
Explanation & Instructions
Any form with which a dataset is distributed, such as a downloadable form in a specific format (e.g., spreadsheet, plain text, etc.) or an API with which it can be accessed.
You can repeat the element for multiple distributions.
The mandatory and recommended elements are:
DatasetDistributionForm (Mandatory): The form (medium/channel) used for distributing a language resource consisting of data (e.g., a corpus, a lexicon, etc.). The typical values are ‘downloadable’, ‘accessibleThroughInterface’ and ‘accessibleThroughQuery’ (see more at DatasetDistributionForm).
downloadLocation (Mandatory if applicable): A URL where the language resource (mainly data but also downloadable software programmes or forms) can be downloaded from. Use this element if the value of DatasetDistributionForm is ‘downloadable’, and only for direct download links (i.e., links from which the dataset is downloaded without the need for further actions such as clicks on a page).
accessLocation (Mandatory if applicable): A URL where the resource can be accessed from; it can be used for landing pages or for cases where the resource is accessible via an interface, i.e. cases where the resource itself is not provided with a direct link for downloading. Use it if the value of DatasetDistributionForm is ‘accessibleThroughInterface’ or ‘accessibleThroughQuery’, but also for download links that are mentioned on a landing page or require some kind of action on the part of the user.
samplesLocation (Recommended): Links a resource to a URL (or URLs) with samples of a data resource or of the input or output resource of a tool/service.
licenceTerms (Mandatory): See licenceTerms.
cost (Mandatory if applicable): Introduces the cost for accessing a resource, formally described as a set of amount and currency unit. Please use only for resources available at a cost and not for free resources.
Depending on the parts of the corpus, you must also use one or more of the following:
distributionTextFeature: See distributionTextFeature.
distributionAudioFeature: See distributionAudioFeature.
distributionVideoFeature: See distributionVideoFeature.
distributionImageFeature: See distributionImageFeature.
Example
<ms:DatasetDistribution>
<ms:DatasetDistributionForm>http://w3id.org/meta-share/meta-share/downloadable</ms:DatasetDistributionForm>
<ms:accessLocation>https://www.someAccessURL.com</ms:accessLocation>
<ms:samplesLocation>https://www.URLwithsamples.com</ms:samplesLocation>
<ms:distributionTextFeature>
<ms:size>
<ms:amount>17601</ms:amount>
<ms:sizeUnit>http://w3id.org/meta-share/meta-share/unit</ms:sizeUnit>
</ms:size>
<ms:dataFormat>http://w3id.org/meta-share/omtd-share/Xml</ms:dataFormat>
<ms:characterEncoding>http://w3id.org/meta-share/meta-share/UTF-8</ms:characterEncoding>
</ms:distributionTextFeature>
<ms:licenceTerms>
<ms:licenceTermsName xml:lang="en">openUnder-PSI</ms:licenceTermsName>
<ms:licenceTermsURL>https://elrc-share.eu/terms/openUnderPSI.html</ms:licenceTermsURL>
</ms:licenceTerms>
</ms:DatasetDistribution>
<ms:DatasetDistribution>
<ms:DatasetDistributionForm>http://w3id.org/meta-share/meta-share/accessibleThroughInterface</ms:DatasetDistributionForm>
<ms:accessLocation>https://www.someAccessURL.com</ms:accessLocation>
<ms:distributionTextFeature>
<ms:size>
<ms:amount>100</ms:amount>
<ms:sizeUnit>http://w3id.org/meta-share/meta-share/text1</ms:sizeUnit>
</ms:size>
<ms:dataFormat>http://w3id.org/meta-share/omtd-share/Pdf</ms:dataFormat>
<ms:characterEncoding>http://w3id.org/meta-share/meta-share/UTF-8</ms:characterEncoding>
</ms:distributionTextFeature>
<ms:licenceTerms>
<ms:licenceTermsName xml:lang="en">some commercial licence</ms:licenceTermsName>
<ms:licenceTermsURL>https://elrc-share.eu/terms/someCommercialLicence.html</ms:licenceTermsURL>
</ms:licenceTerms>
<ms:cost>
<ms:amount>10000</ms:amount>
<ms:currency>http://w3id.org/meta-share/meta-share/euro</ms:currency>
</ms:cost>
</ms:DatasetDistribution>
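Neither example above uses downloadLocation. For a distribution that can be fetched through a direct link, a sketch could look like the following. The URL is hypothetical, and placing downloadLocation where accessLocation appears in the examples above is an assumption about element order.

```xml
<ms:DatasetDistribution>
  <ms:DatasetDistributionForm>http://w3id.org/meta-share/meta-share/downloadable</ms:DatasetDistributionForm>
  <!-- Hypothetical direct-download link: fetching it returns the dataset itself -->
  <ms:downloadLocation>https://www.example.org/corpus.zip</ms:downloadLocation>
  <ms:distributionTextFeature>
    <ms:size>
      <ms:amount>40</ms:amount>
      <ms:sizeUnit>http://w3id.org/meta-share/meta-share/file</ms:sizeUnit>
    </ms:size>
    <ms:dataFormat>http://w3id.org/meta-share/omtd-share/Xml</ms:dataFormat>
    <ms:characterEncoding>http://w3id.org/meta-share/meta-share/UTF-8</ms:characterEncoding>
  </ms:distributionTextFeature>
  <ms:licenceTerms>
    <ms:licenceTermsName xml:lang="en">openUnder-PSI</ms:licenceTermsName>
    <ms:licenceTermsURL>https://elrc-share.eu/terms/openUnderPSI.html</ms:licenceTermsURL>
  </ms:licenceTerms>
</ms:DatasetDistribution>
```

If the link leads to a landing page or requires a further action from the user, accessLocation is the appropriate element, as described above.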
distributionTextFeature¶
Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.Corpus.DatasetDistribution.distributionTextFeature
Data type component
Optionality Mandatory if applicable
Explanation & Instructions
Links to a feature that can be used for describing distinct distributable forms of text resources/parts
The following are mandatory or recommended:
size (Mandatory): The size of the text part, expressed as a combination of amount and sizeUnit (with a value from a CV for sizeUnit).
dataFormat (Mandatory): Indicates the format(s) of a data resource; it takes a value from a CV (dataFormat); the dataFormat includes the IANA mimetype and pointers to additional documentation for specialized formats (e.g., GATE XML, CONLL formats, etc.).
characterEncoding (Recommended): Specifies the character encoding used for a language resource data distribution.
Example
<ms:distributionTextFeature>
<ms:size>
<ms:amount>9139</ms:amount>
<ms:sizeUnit>http://w3id.org/meta-share/meta-share/sentence</ms:sizeUnit>
</ms:size>
<ms:size>
<ms:amount>40</ms:amount>
<ms:sizeUnit>http://w3id.org/meta-share/meta-share/file</ms:sizeUnit>
</ms:size>
<ms:dataFormat>http://w3id.org/meta-share/omtd-share/Xml</ms:dataFormat>
<ms:characterEncoding>http://w3id.org/meta-share/meta-share/UTF-8</ms:characterEncoding>
</ms:distributionTextFeature>
distributionAudioFeature¶
Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.Corpus.DatasetDistribution.distributionAudioFeature
Data type component
Optionality Mandatory if applicable
Explanation & Instructions
Links to a feature that can be used for describing distinct distributable forms of audio resources/parts
The following are mandatory or recommended:
size (Mandatory): The size of the audio part, expressed as a combination of amount and sizeUnit (with a value from a CV for sizeUnit).
durationOfAudio (Recommended): Specifies the duration of the audio recording including silences, music, pauses, etc., expressed as a combination of amount and durationUnit (with a value from the CV for durationUnit).
durationOfEffectiveSpeech (Recommended): Specifies the duration of effective speech of the audio (part of a) resource, expressed as a combination of amount and durationUnit (with a value from the CV for durationUnit).
audioFormat (Mandatory): Indicates the format(s) of the audio (part of a) data resource, expressed as a value of dataFormat (with a value from a CV for dataFormat) and compressed.
Example
<ms:distributionAudioFeature>
<ms:size>
<ms:amount>10</ms:amount>
<ms:sizeUnit>http://w3id.org/meta-share/meta-share/file</ms:sizeUnit>
</ms:size>
<ms:durationOfAudio>
<ms:amount>3</ms:amount>
<ms:durationUnit>http://w3id.org/meta-share/meta-share/hour</ms:durationUnit>
</ms:durationOfAudio>
<ms:audioFormat>
<ms:dataFormat>http://w3id.org/meta-share/omtd-share/wav</ms:dataFormat>
<ms:compressed>true</ms:compressed>
</ms:audioFormat>
</ms:distributionAudioFeature>
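The example above omits durationOfEffectiveSpeech. A sketch that reports both durations could look like the following. The amounts are illustrative, and placing durationOfEffectiveSpeech directly after durationOfAudio is an assumption about element order.

```xml
<ms:distributionAudioFeature>
  <ms:size>
    <ms:amount>10</ms:amount>
    <ms:sizeUnit>http://w3id.org/meta-share/meta-share/file</ms:sizeUnit>
  </ms:size>
  <!-- Full recording time, including silences, music and pauses -->
  <ms:durationOfAudio>
    <ms:amount>3</ms:amount>
    <ms:durationUnit>http://w3id.org/meta-share/meta-share/hour</ms:durationUnit>
  </ms:durationOfAudio>
  <!-- Time actually occupied by speech (illustrative value) -->
  <ms:durationOfEffectiveSpeech>
    <ms:amount>2</ms:amount>
    <ms:durationUnit>http://w3id.org/meta-share/meta-share/hour</ms:durationUnit>
  </ms:durationOfEffectiveSpeech>
  <ms:audioFormat>
    <ms:dataFormat>http://w3id.org/meta-share/omtd-share/wav</ms:dataFormat>
    <ms:compressed>false</ms:compressed>
  </ms:audioFormat>
</ms:distributionAudioFeature>
```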
distributionVideoFeature¶
Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.Corpus.DatasetDistribution.distributionVideoFeature
Data type component
Optionality Mandatory if applicable
Explanation & Instructions
Links to a feature that can be used for describing distinct distributable forms of video resources/parts
The following are mandatory or recommended:
size (Mandatory): The size of the video part, expressed as a combination of amount and sizeUnit (with a value from a CV for sizeUnit).
durationOfVideo (Recommended): Specifies the duration of the video recording, expressed as a combination of amount and durationUnit (with a value from the CV for durationUnit).
videoFormat (Mandatory): Indicates the format(s) of the video (part of a) data resource, expressed as a value of dataFormat (with a value from a CV for dataFormat) and compressed.
Example
<ms:distributionVideoFeature>
<ms:size>
<ms:amount>9139</ms:amount>
<ms:sizeUnit>http://w3id.org/meta-share/meta-share/screen</ms:sizeUnit>
</ms:size>
<ms:size>
<ms:amount>40</ms:amount>
<ms:sizeUnit>http://w3id.org/meta-share/meta-share/file</ms:sizeUnit>
</ms:size>
<ms:durationOfVideo>
<ms:amount>40</ms:amount>
<ms:durationUnit>http://w3id.org/meta-share/meta-share/hour</ms:durationUnit>
</ms:durationOfVideo>
<ms:videoFormat>
<ms:dataFormat>http://w3id.org/meta-share/omtd-share/wav</ms:dataFormat>
<ms:compressed>true</ms:compressed>
</ms:videoFormat>
</ms:distributionVideoFeature>
distributionImageFeature¶
Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.Corpus.DatasetDistribution.distributionImageFeature
Data type component
Optionality Mandatory if applicable
Explanation & Instructions
Links to a feature that can be used for describing distinct distributable forms of image resources/parts
The following are mandatory or recommended:
size (Mandatory): The size of the image part, expressed as a combination of amount and sizeUnit (with a value from a CV for sizeUnit).
imageFormat (Mandatory): Indicates the format(s) of the image (part of a) data resource, expressed as a value of dataFormat (with a value from a CV for dataFormat) and compressed.
Example
<ms:distributionImageFeature>
<ms:size>
<ms:amount>100</ms:amount>
<ms:sizeUnit>http://w3id.org/meta-share/meta-share/file</ms:sizeUnit>
</ms:size>
<ms:imageFormat>
<ms:dataFormat>http://w3id.org/meta-share/omtd-share/Pdf</ms:dataFormat>
<ms:compressed>true</ms:compressed>
</ms:imageFormat>
</ms:distributionImageFeature>
personalDataIncluded¶
Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.Corpus.personalDataIncluded
Data type boolean
Optionality Mandatory
Explanation & Instructions
Specifies whether the language resource contains personal data (mainly in the sense falling under the GDPR)
If the resource contains personal data, you can use the (optional) personalDataDetails to provide more information.
Example
<ms:personalDataIncluded>true</ms:personalDataIncluded>
<ms:personalDataDetails>The corpus contains data on the place of living and place of birth of participants</ms:personalDataDetails>
sensitiveDataIncluded¶
Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.Corpus.sensitiveDataIncluded
Data type boolean
Optionality Mandatory
Explanation & Instructions
Specifies whether the language resource contains sensitive data (e.g., medical/health-related, etc.) and thus requires special handling
If the resource contains sensitive data, you can use the (optional) sensitiveDataDetails to provide more information.
Example
<ms:sensitiveDataIncluded>true</ms:sensitiveDataIncluded>
<ms:sensitiveDataDetails>The corpus contains medical data for persons with disabilities</ms:sensitiveDataDetails>
anonymized¶
Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.Corpus.anonymized
Data type boolean
Optionality Mandatory if applicable
Explanation & Instructions
Indicates whether the language resource has been anonymized
The element is mandatory if either personalDataIncluded or sensitiveDataIncluded has ‘true’ as its value; anonymizationDetails must also be filled in with information on the anonymization method, etc.
Example
<ms:anonymized>true</ms:anonymized>
<ms:anonymizationDetails>pseudonymization performed manually</ms:anonymizationDetails>
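Because anonymized depends on the two preceding flags, it may help to see the three elements side by side. In the sketch below, personalDataIncluded is true, so anonymized and anonymizationDetails become mandatory even though sensitiveDataIncluded is false. The order of the sibling elements is an assumption, and the detail texts are taken from the examples on this page.

```xml
<ms:personalDataIncluded>true</ms:personalDataIncluded>
<ms:personalDataDetails>The corpus contains data on the place of living and place of birth of participants</ms:personalDataDetails>
<ms:sensitiveDataIncluded>false</ms:sensitiveDataIncluded>
<!-- Mandatory here because personalDataIncluded is true -->
<ms:anonymized>true</ms:anonymized>
<ms:anonymizationDetails>pseudonymization performed manually</ms:anonymizationDetails>
```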
annotation¶
Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.Corpus.annotation
Data type component
Optionality Mandatory if applicable
Explanation & Instructions
Links a corpus to its annotated part(s)
You must use it for annotated corpora and annotations. You can repeat it for corpora that have separate files for each annotation type, or if you want to give information such as the use of different annotation tools for each annotation level.
Enter at least the annotation type(s); if you want, you can give a more detailed description of the annotated parts - see the annotation component of the full schema.
Example
<ms:annotation>
<ms:annotationType>http://w3id.org/meta-share/omtd-share/Lemma</ms:annotationType>
<ms:annotationStandoff>false</ms:annotationStandoff>
<ms:annotationMode>http://w3id.org/meta-share/meta-share/mixed</ms:annotationMode>
<ms:isAnnotatedBy>
<ms:resourceName xml:lang="en">Lemmatizer</ms:resourceName>
</ms:isAnnotatedBy>
</ms:annotation>
<ms:annotation>
<ms:annotationType>http://w3id.org/meta-share/omtd-share/PartOfSpeech</ms:annotationType>
<ms:annotationStandoff>false</ms:annotationStandoff>
<ms:tagset>
<ms:resourceName xml:lang="en">Universal Dependencies</ms:resourceName>
</ms:tagset>
<ms:isAnnotatedBy>
<ms:resourceName xml:lang="en">PoS tagger</ms:resourceName>
</ms:isAnnotatedBy>
</ms:annotation>
<ms:annotation>
<ms:annotationType>http://w3id.org/meta-share/omtd-share/SyntacticAnnotationType</ms:annotationType>
</ms:annotation>