Describe a lexical/conceptual resource (lexicon, glossary, ontology, etc.)

In this section you will find information on how to describe a lexical/conceptual resource with the minimal metadata required to register it in the ELG platform. To find out more about the ELG resource types, see Overview. Instructions for all data resources (technical requirements, instructions for registering on the platform) are in Provide a Language Resource.

Examples of lexical/conceptual resources include:

  • computational lexica, which are used for computational processing and include morphological, syntactic and semantic information,
  • dictionaries in digital format,
  • ontologies and controlled vocabularies,
  • monolingual and multilingual terminological glossaries,
  • word lists, gazetteers of place names, proper names, etc.

They typically consist of a text part, but they may also comprise audio and video files, as in the case of:

  • multimedia lexica with sound recordings (e.g., pronunciation of a word) and images (e.g., pictures denoting the sense of a word),
  • sign language lexica with videos.

Examples of metadata records for lexical/conceptual resources

Terminological lexicon: INTERA Corpus - the Bulgarian-English terms from the BG-EN pair

Published at:

<?xml version="1.0" encoding="UTF-8"?>
<ms:MetadataRecord xmlns="" xmlns:datacite="" xmlns:dcat="" xmlns:ms="" xmlns:omtd="" xmlns:xsi="" xsi:schemaLocation=" ../../Schema/ELG-SHARE.xsd">
        <ms:MetadataRecordIdentifier ms:MetadataRecordIdentifierScheme="">value automatically assigned - leave as is</ms:MetadataRecordIdentifier>
                <ms:surname xml:lang="en">Smith</ms:surname>
                <ms:givenName xml:lang="en">John</ms:givenName>
                <ms:surname xml:lang="en">Smith</ms:surname>
                <ms:givenName xml:lang="en">John</ms:givenName>
                        <ms:resourceName xml:lang="en">INTERA Corpus - the Bulgarian-English terms from the BG-EN pair</ms:resourceName>
                        <ms:description xml:lang="en">The Bulgarian-English terms from the BG-EN pair of the INTERA corpus; written language, domain specific (law, education).</ms:description>
                        <ms:version>v1.0.0 (automatically assigned)</ms:version>
                                        <ms:surname xml:lang="en">Gavrilidou</ms:surname>
                                        <ms:givenName xml:lang="en">Maria</ms:givenName>
                        <ms:keyword xml:lang="en">lexicalconceptualresource</ms:keyword>
                                <ms:categoryLabel xml:lang="en">education</ms:categoryLabel>
                                <ms:categoryLabel xml:lang="en">law</ms:categoryLabel>
                                <ms:projectName xml:lang="en">Integrated European language data Repository Area</ms:projectName>
                                <ms:actualUseDetails xml:lang="en">nlpApplications</ms:actualUseDetails>
                                <ms:title xml:lang="en">Building Multilingual Terminological Resources</ms:title>
                                <ms:title xml:lang="en">Building parallel corpora for eContent professionals</ms:title>
                                <ms:title xml:lang="en">Language resources production models: the case of INTERA multilingual corpus and terminology</ms:title>
                                <ms:title xml:lang="en">D5.2 - Report on the multilingual resources production</ms:title>
                                <ms:DocumentIdentifier ms:DocumentIdentifierScheme=""></ms:DocumentIdentifier>
                                <ms:relationType xml:lang="en">isExtractedfrom</ms:relationType>
                                        <ms:resourceName xml:lang="en">INTERA corpus</ms:resourceName>
                                                        <ms:licenceTermsName xml:lang="en">CC-BY-4.0</ms:licenceTermsName>
                                                        <ms:LicenceIdentifier ms:LicenceIdentifierScheme="">ELG-ENT-LIC-270220-00000072</ms:LicenceIdentifier>
                                                <ms:attributionText xml:lang="en">The INTERA Corpus - the Bulgarian-English terms from the BG-EN pair of the ILSP/RC Athena licensed under CC-BY as accessed via META-SHARE</ms:attributionText>

Computational lexicon: MCL - Multifunctional Computational Lexicon of Contemporary Portuguese

Published at:

<?xml version="1.0" encoding="UTF-8"?>
<ms:MetadataRecord xmlns:ms="" xmlns:xsi="" xsi:schemaLocation=" ../../Schema/ELG-SHARE.xsd">
        <ms:MetadataRecordIdentifier ms:MetadataRecordIdentifierScheme="">value automatically assigned - leave as is</ms:MetadataRecordIdentifier>
                <ms:surname xml:lang="en">Smith</ms:surname>
                <ms:givenName xml:lang="en">John</ms:givenName>
                <ms:surname xml:lang="en">Smith</ms:surname>
                <ms:givenName xml:lang="en">John</ms:givenName>
                        <ms:resourceName xml:lang="en">MCL - Multifunctional Computational Lexicon of Contemporary Portuguese</ms:resourceName>
                        <ms:description xml:lang="en">MCL is a 26,443 lemma Frequency Lexicon with 140,315 tokens, with the minimum lemma frequency of 6, extracted from CORLEX, a contemporary Portuguese corpus (16,210,438 words). CORLEX is a subcorpus of the Reference Corpus of Contemporary Portuguese and contains written and spoken texts of several types, being genre diversity a characteristic of this corpus. CORLEX contains mainly journalistic texts (56% of the written corpus and 53% of the whole corpus). In order to extract the lexicon, all the different lexical forms occurring in the corpus were indexed and subsequently tagged morphosyntactically and lemmatised by PALAVROSO. Each lemma in MCL is followed by morphosyntactic and quantitative information. The same information is given regarding each lemma token (inflected forms and some compounds). The lexicon indexations are listed in alphabetical order or decreasing frequency order.</ms:description>
                        <ms:LRIdentifier ms:LRIdentifierScheme="">489-956-642-755-8</ms:LRIdentifier>
                        <ms:LRIdentifier ms:LRIdentifierScheme="">ELRA-L0096</ms:LRIdentifier>
                        <ms:keyword xml:lang="en">lexicalconceptualresource</ms:keyword>
                                                        <ms:licenceTermsName xml:lang="en">ELRA-VAR-ACADEMIC-MEMBER-COMMERCIALUSE-1.0</ms:licenceTermsName>
                                                                <ms:organizationName xml:lang="en">ELRA</ms:organizationName>
                                                        <ms:licenceTermsName xml:lang="en">ELRA-END-USER-ACADEMIC-MEMBER-NONCOMMERCIALUSE-1.0</ms:licenceTermsName>
                                                                <ms:organizationName xml:lang="en">ELRA</ms:organizationName>
                                                        <ms:licenceTermsName xml:lang="en">ELRA-VAR-COMMERCIAL-MEMBER-COMMERCIALUSE-1.0</ms:licenceTermsName>
                                                                <ms:organizationName xml:lang="en">ELRA</ms:organizationName>
                                                        <ms:licenceTermsName xml:lang="en">ELRA-END-USER-COMMERCIAL-MEMBER-NONCOMMERCIALUSE-1.0</ms:licenceTermsName>
                                                                <ms:organizationName xml:lang="en">ELRA</ms:organizationName>
                                                        <ms:licenceTermsName xml:lang="en">ELRA-VAR-ACADEMIC-NOMEMBER-COMMERCIALUSE-1.0</ms:licenceTermsName>
                                                                <ms:organizationName xml:lang="en">ELRA</ms:organizationName>
                                                        <ms:licenceTermsName xml:lang="en">ELRA-END-USER-ACADEMIC-NOMEMBER-NONCOMMERCIALUSE-1.0</ms:licenceTermsName>
                                                                <ms:organizationName xml:lang="en">ELRA</ms:organizationName>
                                                        <ms:licenceTermsName xml:lang="en">ELRA-VAR-COMMERCIAL-NOMEMBER-COMMERCIALUSE-1.0</ms:licenceTermsName>
                                                                <ms:organizationName xml:lang="en">ELRA</ms:organizationName>
                                                        <ms:licenceTermsName xml:lang="en">ELRA-END-USER-COMMERCIAL-NOMEMBER-NONCOMMERCIALUSE-1.0</ms:licenceTermsName>
                                                                <ms:organizationName xml:lang="en">ELRA</ms:organizationName>

Minimal version metadata for lexical/conceptual resources

The metadata elements (mandatory or recommended) that are common to all kinds of resources, including data language resources, are presented in section Minimal version - List of elements common to all LRTs. The additional metadata elements that are required or recommended for lexical/conceptual resources are described below.

For a quick guide to the ELG template, see Template - Explanations.


Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.LexicalConceptualResource

Data type component

Optionality Mandatory

Explanation & Instructions

Wraps together elements for lexical/conceptual resources




Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.LexicalConceptualResource.lcrSubclass

Data type CV (lcrSubclass)

Optionality Recommended

Explanation & Instructions

Introduces a classification of lexical/conceptual resources into types (used for descriptive reasons)





Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.LexicalConceptualResource.encodingLevel

Data type CV (encodingLevel)

Optionality Mandatory

Explanation & Instructions

Classifies the contents of a lexical/conceptual resource or language description as regards the linguistic level of analysis it caters for

You can repeat the element for multiple encoding levels.
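
As a rough illustration of how the elements described so far nest, the fragment below follows the paths given above. It is only a sketch: the ms prefix is assumed to be declared on the enclosing MetadataRecord (as in the example records earlier on this page), and the bracketed values are placeholders, not actual entries of the lcrSubclass and encodingLevel controlled vocabularies.

```xml
<!-- Sketch only: bracketed values are placeholders, not real vocabulary entries -->
<ms:LexicalConceptualResource>
    <!-- Recommended: one value from the lcrSubclass vocabulary -->
    <ms:lcrSubclass>[value from the lcrSubclass vocabulary]</ms:lcrSubclass>
    <!-- Mandatory and repeatable: one element per encoding level -->
    <ms:encodingLevel>[first value from the encodingLevel vocabulary]</ms:encodingLevel>
    <ms:encodingLevel>[second value from the encodingLevel vocabulary]</ms:encodingLevel>
    <!-- remaining elements of the component omitted -->
</ms:LexicalConceptualResource>
```

Repeating the element, rather than packing several values into one element, keeps each encoding level a separate vocabulary value.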





Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.LexicalConceptualResource.ContentType

Data type CV (ContentType)

Optionality Mandatory

Explanation & Instructions

A more detailed account of the linguistic information contained in the lexical/conceptual resource

You can repeat the element for multiple content types.





Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.LexicalConceptualResource.compliesWith

Data type CV (compliesWith)

Optionality Mandatory

Explanation & Instructions

Specifies the vocabulary/standard/best practice with which a resource complies

You can repeat the element for multiple vocabularies, standards or best practices.




Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.LexicalConceptualResource.LexicalConceptualResourceMediaPart.LexicalConceptualResourceTextPart

Data type component

Optionality Mandatory if applicable

Explanation & Instructions

A part (or whole set) of a lexical/conceptual resource that consists of textual elements

You can repeat the group of elements for multiple textual parts.

The mandatory or recommended elements for the text part of lexical/conceptual resources are:

  • mediaType (Mandatory): Specifies the media type of a language resource (the physical medium of the contents representation). For text parts, always use the value ‘text’.
  • lingualityType (Mandatory): Indicates whether the resource includes one, two or more languages.
  • multilingualityType (Mandatory if applicable): Indicates whether the resource (part) is parallel, comparable or mixed. It is required if lingualityType is bilingual or multilingual; select one of the values: parallel (e.g., original text and its translations), comparable (e.g., corpus of the same domain in multiple languages) or multilingualSingleText (for corpora that consist of segments including text in two or more languages, e.g., the transcription of a European Parliament session with MPs speaking in their native language).
  • language (Mandatory): Specifies the language that is used in the resource part, expressed according to the BCP47 recommendation. See language.
  • languageVariety (Mandatory if applicable): Relates a language resource that contains segments in a language variety (e.g., dialect, jargon) to it. Please use for dialect corpora.
  • metalanguage (Mandatory): Specifies the language that is used to describe the contents of the resource part (e.g., the language of the definitions in a lexicon), expressed according to the BCP47 recommendation. See language.
  • modalityType (Recommended if applicable): Specifies the type of the modality represented in the resource. For instance, you can use ‘spoken language’ to describe transcribed speech corpora.
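
To make the list above more concrete, here is a hedged sketch of a text part for a bilingual Bulgarian-English resource such as the terminological lexicon shown earlier. The element names come from the list; the values are written as bare names for readability and may differ from the exact form required by the controlled vocabularies. The languageTag child element is an assumption about the inner structure of language and metalanguage, and the ms prefix is assumed to be declared on the enclosing MetadataRecord.

```xml
<!-- Sketch only: values are illustrative; check the vocabularies for exact forms -->
<ms:LexicalConceptualResourceTextPart>
    <!-- For text parts, always 'text' -->
    <ms:mediaType>text</ms:mediaType>
    <ms:lingualityType>bilingual</ms:lingualityType>
    <!-- Required here because lingualityType is bilingual -->
    <ms:multilingualityType>parallel</ms:multilingualityType>
    <!-- One language element per language, as BCP47 tags -->
    <ms:language>
        <!-- languageTag is an assumed child element -->
        <ms:languageTag>bg</ms:languageTag>
    </ms:language>
    <ms:language>
        <ms:languageTag>en</ms:languageTag>
    </ms:language>
    <ms:metalanguage>
        <ms:languageTag>en</ms:languageTag>
    </ms:metalanguage>
</ms:LexicalConceptualResourceTextPart>
```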




Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.LexicalConceptualResource.DatasetDistribution

Data type component

Optionality Mandatory

Explanation & Instructions

Any form with which a dataset is distributed, such as a downloadable form in a specific format (e.g., spreadsheet, plain text, etc.) or an API with which it can be accessed

You can repeat the element for multiple distributions.

The mandatory and recommended elements are:

  • DatasetDistributionForm (Mandatory): The form (medium/channel) used for distributing a language resource consisting of data (e.g., a corpus, a lexicon, etc.). The typical values are ‘downloadable’, ‘accessibleThroughInterface’, ‘accessibleThroughQuery’ (see more at DatasetDistributionForm).
  • downloadLocation (Mandatory if applicable): A URL where the language resource (mainly data but also downloadable software programmes or forms) can be downloaded from. Use this element if the value of datasetDistributionForm is ‘downloadable’ and only for direct download links (i.e., from which the dataset is downloaded without the need of further actions such as clicks on a page).
  • accessLocation (Mandatory if applicable): A URL where the resource can be accessed from; it can be used for landing pages or for cases where the resource is accessible via an interface, i.e. cases where the resource itself is not provided with a direct link for downloading. Use if the value of datasetDistributionForm is ‘accessibleThroughInterface’ or ‘accessibleThroughQuery’ but also for links used for downloading corpora which are mentioned on a landing page or require some kind of action on the part of the user.
  • licenceTerms (Mandatory): See licenceTerms.
  • cost (Mandatory if applicable): Introduces the cost for accessing a resource, formally described as a set of amount and currency unit. Please use only for resources available at a cost and not for free resources.
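
The fragment below sketches how these elements might be combined for a freely downloadable resource. It is an illustration, not a complete or validated distribution: the download URL is hypothetical, the form value is written as a bare name instead of its exact vocabulary entry, licenceTerms may require further child elements (see licenceTerms), and the ms prefix is assumed to be declared on the enclosing MetadataRecord.

```xml
<!-- Sketch only: hypothetical URL, abbreviated licenceTerms -->
<ms:DatasetDistribution>
    <ms:DatasetDistributionForm>downloadable</ms:DatasetDistributionForm>
    <!-- Direct link only: the file downloads without further clicks -->
    <ms:downloadLocation>https://example.org/downloads/lexicon-v1.zip</ms:downloadLocation>
    <ms:licenceTerms>
        <ms:licenceTermsName xml:lang="en">CC-BY-4.0</ms:licenceTermsName>
        <!-- other licenceTerms children omitted -->
    </ms:licenceTerms>
    <!-- no cost element: the resource is free -->
</ms:DatasetDistribution>
```

If the link led to a landing page instead of the file itself, accessLocation would replace downloadLocation, as explained in the list above.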

Depending on the parts of the corpus, you must also use one or more of the following:


                <ms:licenceTermsName xml:lang="en">openUnder-PSI</ms:licenceTermsName>

                <ms:licenceTermsName xml:lang="en">some commercial licence</ms:licenceTermsName>


Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.LexicalConceptualResource.personalDataIncluded

Data type boolean

Optionality Mandatory

Explanation & Instructions

Specifies whether the language resource contains personal data (mainly in the sense falling under the GDPR)

If the resource contains personal data, you can use the (optional) personalDataDetails to provide more information.


<ms:personalDataDetails>The corpus contains data on the place of living and place of birth of participants</ms:personalDataDetails>


Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.LexicalConceptualResource.sensitiveDataIncluded

Data type boolean

Optionality Mandatory

Explanation & Instructions

Specifies whether the language resource contains sensitive data (e.g., medical/health-related, etc.) and thus requires special handling

If the resource contains sensitive data, you can use the (optional) sensitiveDataDetails to provide more information.


<ms:sensitiveDataDetails>The corpus contains medical data for persons with disabilities</ms:sensitiveDataDetails>


Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.LexicalConceptualResource.anonymized

Data type boolean

Optionality Mandatory if applicable

Explanation & Instructions

Indicates whether the language resource has been anonymized

The element is mandatory if either personalDataIncluded or sensitiveDataIncluded has ‘true’ as its value; anonymizationDetails must also be filled in with information on the anonymization method, etc.


<ms:anonymizationDetails>pseudonymization performed manually</ms:anonymizationDetails>
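
Taken together, the three flags and their detail elements might sit side by side as in the fragment below. This is a sketch under the assumption that the values are plain booleans, as the data type stated above suggests; the order of the elements is also an assumption, and the ms prefix is assumed to be declared on the enclosing MetadataRecord. The detail texts are reused from the examples above.

```xml
<!-- Sketch only: sibling elements inside the resource component; order is assumed -->
<ms:personalDataIncluded>true</ms:personalDataIncluded>
<ms:personalDataDetails>The corpus contains data on the place of living and place of birth of participants</ms:personalDataDetails>
<ms:sensitiveDataIncluded>false</ms:sensitiveDataIncluded>
<!-- Mandatory here because personalDataIncluded is true -->
<ms:anonymized>true</ms:anonymized>
<ms:anonymizationDetails>pseudonymization performed manually</ms:anonymizationDetails>
```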