Contribute a model
This page describes how to contribute a model to the European Language Grid (ELG). You can describe a model and either upload its contents to ELG or include in its description a link to the location where it can be accessed.
0. Before you start
Please make sure that the model you want to contribute complies with our terms of use.
Please make sure you have registered and been assigned the provider role.
1. Prepare the content files (for ELG hosted resources)
If you wish to upload the model to ELG, you must package it in a compressed format (currently as a .zip file).
If the files are available in multiple formats (e.g. XML, TXT and PDF), you are advised to package them in separate .zip files, one per data format.
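The packaging advice above can be sketched with Python's standard `zipfile` module. This is only an illustration under stated assumptions: the function name `package_by_format`, the archive naming pattern and the idea of grouping files by extension are hypothetical choices, not something ELG prescribes.

```python
import zipfile
from collections import defaultdict
from pathlib import Path


def package_by_format(source_dir: Path, output_dir: Path) -> dict[str, Path]:
    """Create one .zip archive per file extension found under source_dir."""
    groups: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(source_dir.rglob("*")):
        if path.is_file():
            # Group by lowercase extension without the dot, e.g. "xml", "txt".
            ext = path.suffix.lower().lstrip(".") or "noext"
            groups[ext].append(path)

    output_dir.mkdir(parents=True, exist_ok=True)
    archives: dict[str, Path] = {}
    for ext, files in groups.items():
        # Hypothetical naming pattern: <directory name>_<format>.zip
        archive_path = output_dir / f"{source_dir.name}_{ext}.zip"
        with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as zf:
            for f in files:
                # Store paths relative to source_dir so the archive is portable.
                zf.write(f, arcname=f.relative_to(source_dir).as_posix())
        archives[ext] = archive_path
    return archives
```

For instance, `package_by_format(Path("my_model"), Path("dist"))` (both directory names are placeholders) would write files such as `dist/my_model_xml.zip` and `dist/my_model_txt.zip`, which can then be uploaded separately.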
2. Describe the model
Metadata overview
The model must be described according to the ELG schema and comply at least with the minimal version. The metadata elements that you need to provide for the model are organized (for presentation purposes) into groups.
Examples
Example 1: Machine Learning (ML) model
English Model (CoNLL-2003) for NameTag
<?xml version="1.0" encoding="UTF-8"?>
<ms:MetadataRecord xmlns:ms="http://w3id.org/meta-share/meta-share/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://w3id.org/meta-share/meta-share/ ../Schema/ELG-SHARE.xsd">
<ms:DescribedEntity>
<ms:LanguageResource>
<ms:entityType>LanguageResource</ms:entityType>
<ms:resourceName xml:lang="en">English Model (CoNLL-2003) for NameTag</ms:resourceName>
<ms:description xml:lang="en">English model for NameTag, a named entity recognition tool. The model is trained on CoNLL-2003 training data. Recognizes PER, ORG, LOC and MISC named entities. Achieves F-measure 84.73 on CoNLL-2003 test data.</ms:description>
<ms:version>undefined</ms:version>
<ms:additionalInfo>
<ms:landingPage>http://hdl.handle.net/11234/1-3118</ms:landingPage>
</ms:additionalInfo>
<ms:keyword xml:lang="en">NameTag</ms:keyword>
<ms:keyword xml:lang="en">English</ms:keyword>
<ms:keyword xml:lang="en">named entity recognition</ms:keyword>
<ms:resourceProvider>
<ms:Organization>
<ms:actorType>Organization</ms:actorType>
<ms:organizationName xml:lang="en">Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)</ms:organizationName>
</ms:Organization>
</ms:resourceProvider>
<ms:publicationDate>2014-04-08</ms:publicationDate>
<ms:resourceCreator>
<ms:Person>
<ms:actorType>Person</ms:actorType>
<ms:surname xml:lang="en">Straka</ms:surname>
<ms:givenName xml:lang="en">Milan</ms:givenName>
</ms:Person>
</ms:resourceCreator>
<ms:resourceCreator>
<ms:Person>
<ms:actorType>Person</ms:actorType>
<ms:surname xml:lang="en">Straková</ms:surname>
<ms:givenName xml:lang="en">Jana</ms:givenName>
</ms:Person>
</ms:resourceCreator>
<ms:fundingProject>
<ms:projectName xml:lang="en">LINDAT/CLARIN: Institut pro analýzu, zpracování a distribuci lingvistických dat</ms:projectName>
<ms:grantNumber>LM2010013</ms:grantNumber>
<ms:fundingType>http://w3id.org/meta-share/meta-share/nationalFunds</ms:fundingType>
<ms:funder>
<ms:Organization>
<ms:actorType>Organization</ms:actorType>
<ms:organizationName xml:lang="en">Ministerstvo školství, mládeže a tělovýchovy České republiky</ms:organizationName>
</ms:Organization>
</ms:funder>
</ms:fundingProject>
<ms:fundingProject>
<ms:projectName xml:lang="en">Integrace jazykových zdrojů za účelem extrakce informací z přirozených textů</ms:projectName>
<ms:grantNumber>1ET101120503</ms:grantNumber>
<ms:fundingType>http://w3id.org/meta-share/meta-share/nationalFunds</ms:fundingType>
<ms:funder>
<ms:Organization>
<ms:actorType>Organization</ms:actorType>
<ms:organizationName xml:lang="en">Grantová agentura Akademie věd České republiky</ms:organizationName>
</ms:Organization>
</ms:funder>
</ms:fundingProject>
<ms:fundingProject>
<ms:projectName xml:lang="en">Teoretické základy informatiky a výpočetní lingvistiky</ms:projectName>
<ms:grantNumber>SVV 267 314</ms:grantNumber>
<ms:fundingType>http://w3id.org/meta-share/meta-share/nationalFunds</ms:fundingType>
<ms:funder>
<ms:Organization>
<ms:actorType>Organization</ms:actorType>
<ms:organizationName xml:lang="en">Univerzita Karlova v Praze (mimo GAUK)</ms:organizationName>
</ms:Organization>
</ms:funder>
</ms:fundingProject>
<ms:fundingProject>
<ms:projectName xml:lang="en">LINDAT/CLARIN: Institut pro analýzu, zpracování a distribuci lingvistických dat</ms:projectName>
<ms:grantNumber>LM2015071</ms:grantNumber>
<ms:fundingType>http://w3id.org/meta-share/meta-share/nationalFunds</ms:fundingType>
<ms:funder>
<ms:Organization>
<ms:actorType>Organization</ms:actorType>
<ms:organizationName xml:lang="en">Ministerstvo školství, mládeže a tělovýchovy České republiky</ms:organizationName>
</ms:Organization>
</ms:funder>
</ms:fundingProject>
<ms:fundingProject>
<ms:projectName xml:lang="en">LINDAT/CLARIN - Výzkumná infrastruktura pro jazykové technologie - rozšíření repozitáře a výpočetní kapacity</ms:projectName>
<ms:grantNumber>CZ.02.1.01/0.0/0.0/16_013/0001781</ms:grantNumber>
<ms:fundingType>http://w3id.org/meta-share/meta-share/nationalFunds</ms:fundingType>
<ms:funder>
<ms:Organization>
<ms:actorType>Organization</ms:actorType>
<ms:organizationName xml:lang="en">Ministerstvo školství, mládeže a tělovýchovy České republiky</ms:organizationName>
</ms:Organization>
</ms:funder>
</ms:fundingProject>
<ms:LRSubclass>
<ms:LanguageDescription>
<ms:lrType>LanguageDescription</ms:lrType>
<ms:LanguageDescriptionSubclass>
<ms:MLModel>
<ms:ldSubclassType>MlModel</ms:ldSubclassType>
</ms:MLModel>
</ms:LanguageDescriptionSubclass>
<ms:LanguageDescriptionMediaPart>
<ms:LanguageDescriptionTextPart>
<ms:ldMediaType>LanguageDescriptionTextPart</ms:ldMediaType>
<ms:mediaType>http://w3id.org/meta-share/meta-share/text</ms:mediaType>
<ms:lingualityType>http://w3id.org/meta-share/meta-share/monolingual</ms:lingualityType>
<ms:multilingualityType>http://w3id.org/meta-share/meta-share/unspecified</ms:multilingualityType>
<ms:language>
<ms:languageTag>en</ms:languageTag>
<ms:languageId>en</ms:languageId>
</ms:language>
</ms:LanguageDescriptionTextPart>
</ms:LanguageDescriptionMediaPart>
<ms:DatasetDistribution>
<ms:DatasetDistributionForm>http://w3id.org/meta-share/meta-share/downloadable</ms:DatasetDistributionForm>
<ms:downloadLocation>https://lindat.mff.cuni.cz/repository/xmlui/bitstream/11234/1-3118/1/english-conll-140408.zip</ms:downloadLocation>
<ms:accessLocation>http://hdl.handle.net/11234/1-3118</ms:accessLocation>
<ms:samplesLocation>http://lindat.mff.cuni.cz/services/nametag/</ms:samplesLocation>
<ms:distributionTextFeature>
<ms:size>
<ms:amount>9.1</ms:amount>
<ms:sizeUnit>http://w3id.org/meta-share/meta-share/mb</ms:sizeUnit>
</ms:size>
<ms:dataFormat>http://w3id.org/meta-share/omtd-share/BinaryFormat</ms:dataFormat>
</ms:distributionTextFeature>
<ms:licenceTerms>
<ms:licenceTermsName xml:lang="en">Creative Commons Attribution Non Commercial Share Alike 4.0 International</ms:licenceTermsName>
<ms:licenceTermsURL>http://creativecommons.org/licenses/by-nc-sa/4.0/</ms:licenceTermsURL>
<ms:conditionOfUse>http://w3id.org/meta-share/meta-share/attribution</ms:conditionOfUse>
<ms:conditionOfUse>http://w3id.org/meta-share/meta-share/nonCommercialUse</ms:conditionOfUse>
<ms:conditionOfUse>http://w3id.org/meta-share/meta-share/shareAlike</ms:conditionOfUse>
</ms:licenceTerms>
</ms:DatasetDistribution>
<ms:personalDataIncluded>false</ms:personalDataIncluded>
<ms:sensitiveDataIncluded>false</ms:sensitiveDataIncluded>
</ms:LanguageDescription>
</ms:LRSubclass>
</ms:LanguageResource>
</ms:DescribedEntity>
</ms:MetadataRecord>
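To see how the elements of such a record fit together, the following sketch reads a few fields from a trimmed copy of Example 1 using Python's standard `xml.etree.ElementTree`. The helper name `summarize` and the selection of fields are illustrative assumptions; the element names and the namespace URI are taken from the record above.

```python
import xml.etree.ElementTree as ET

# Namespace used by the metadata record in Example 1.
NS = {"ms": "http://w3id.org/meta-share/meta-share/"}

# Trimmed copy of Example 1, reduced to the elements read below.
RECORD = """<ms:MetadataRecord xmlns:ms="http://w3id.org/meta-share/meta-share/">
  <ms:DescribedEntity>
    <ms:LanguageResource>
      <ms:resourceName xml:lang="en">English Model (CoNLL-2003) for NameTag</ms:resourceName>
      <ms:LRSubclass>
        <ms:LanguageDescription>
          <ms:LanguageDescriptionMediaPart>
            <ms:LanguageDescriptionTextPart>
              <ms:language>
                <ms:languageTag>en</ms:languageTag>
              </ms:language>
            </ms:LanguageDescriptionTextPart>
          </ms:LanguageDescriptionMediaPart>
          <ms:DatasetDistribution>
            <ms:licenceTerms>
              <ms:licenceTermsURL>http://creativecommons.org/licenses/by-nc-sa/4.0/</ms:licenceTermsURL>
              <ms:conditionOfUse>http://w3id.org/meta-share/meta-share/attribution</ms:conditionOfUse>
              <ms:conditionOfUse>http://w3id.org/meta-share/meta-share/nonCommercialUse</ms:conditionOfUse>
              <ms:conditionOfUse>http://w3id.org/meta-share/meta-share/shareAlike</ms:conditionOfUse>
            </ms:licenceTerms>
          </ms:DatasetDistribution>
        </ms:LanguageDescription>
      </ms:LRSubclass>
    </ms:LanguageResource>
  </ms:DescribedEntity>
</ms:MetadataRecord>"""


def summarize(record_xml: str) -> dict:
    """Collect name, languages, licence URLs and conditions of use (hypothetical helper)."""
    root = ET.fromstring(record_xml)
    resource = root.find("ms:DescribedEntity/ms:LanguageResource", NS)
    if resource is None:
        raise ValueError("no ms:LanguageResource element found")
    return {
        "name": resource.findtext("ms:resourceName", default="", namespaces=NS),
        "languages": [e.text for e in resource.iterfind(".//ms:language/ms:languageTag", NS)],
        "licence_urls": [e.text for e in resource.iterfind(".//ms:licenceTermsURL", NS)],
        # Keep only the last path segment of each condition URI.
        "conditions": [
            (e.text or "").rsplit("/", 1)[-1]
            for e in resource.iterfind(".//ms:conditionOfUse", NS)
        ],
    }


summary = summarize(RECORD)
print(summary["name"], summary["conditions"])
```

Every element lookup passes the `ms` namespace mapping explicitly, because the record's elements are all namespace-qualified and an unqualified path such as `resourceName` would match nothing.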
Example 2: N-gram model
PANACEA Environment Corpus n-grams EL (Greek)
Published at https://live.european-language-grid.eu/catalogue/#/resource/service/ld/900
<?xml version="1.0" encoding="UTF-8"?>
<ms:MetadataRecord xmlns:ms="http://w3id.org/meta-share/meta-share/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://w3id.org/meta-share/meta-share/ ../Schema/ELG-SHARE.xsd">
<ms:DescribedEntity>
<ms:LanguageResource>
<ms:entityType>LanguageResource</ms:entityType>
<ms:resourceName xml:lang="en">PANACEA Environment Corpus n-grams EL (Greek)</ms:resourceName>
<ms:description xml:lang="en">PANACEA Environment Corpus n-grams EL (Greek) 1.0 contains Greek word n-grams and Greek word/tag/lemma n-grams in the "Environment" (ENV) domain. N-grams are accompanied by their observed frequency counts. The length of the n-grams ranges from unigrams (single words) to five-grams. The data were collected in the context of PANACEA (http://www.panacea-lr.eu), an EU-FP7 Funded Project under Grant Agreement 248064.
The n-gram counts were generated from crawled Web pages that were automatically detected to be in the Greek language and were automatically classified as relevant to the ENV domain. The collection consisted of approximately 31.71 million tokens. Data collection took place in the summer of 2011.</ms:description>
<ms:version>1.0</ms:version>
<ms:additionalInfo>
<ms:landingPage>http://nlp.ilsp.gr/panacea/D4.3/data/201209/gms/env_el/README.txt</ms:landingPage>
</ms:additionalInfo>
<ms:additionalInfo>
<ms:email>contact@someDomain.com</ms:email>
</ms:additionalInfo>
<ms:contact>
<ms:Person>
<ms:actorType>Person</ms:actorType>
<ms:surname xml:lang="en">Prokopidis</ms:surname>
<ms:givenName xml:lang="en">Prokopis</ms:givenName>
<ms:email>contact@someDomain.com</ms:email>
</ms:Person>
</ms:contact>
<ms:contact>
<ms:Person>
<ms:actorType>Person</ms:actorType>
<ms:surname xml:lang="en">Papavassiliou</ms:surname>
<ms:givenName xml:lang="en">Vassilis</ms:givenName>
<ms:email>contact@someDomain.com</ms:email>
</ms:Person>
</ms:contact>
<ms:keyword xml:lang="en">corpus</ms:keyword>
<ms:domain>
<ms:categoryLabel xml:lang="en">environment</ms:categoryLabel>
</ms:domain>
<ms:resourceCreator>
<ms:Organization>
<ms:actorType>Organization</ms:actorType>
<ms:organizationName xml:lang="en">Institute for Language and Speech Processing</ms:organizationName>
<ms:website>http://www.ilsp.gr</ms:website>
</ms:Organization>
</ms:resourceCreator>
<ms:creationStartDate>2011-06-01</ms:creationStartDate>
<ms:creationEndDate>2011-08-31</ms:creationEndDate>
<ms:fundingProject>
<ms:projectName xml:lang="en">Platform for Automatic, Normalized Annotation and Cost-Effective Acquisition of Language Resources for Human Language </ms:projectName>
<ms:website>http://www.panacea-lr.eu</ms:website>
</ms:fundingProject>
<ms:LRSubclass>
<ms:LanguageDescription>
<ms:lrType>LanguageDescription</ms:lrType>
<ms:LanguageDescriptionSubclass>
<ms:NGramModel>
<ms:ldSubclassType>NGramModel</ms:ldSubclassType>
<ms:baseItem>http://w3id.org/meta-share/meta-share/word</ms:baseItem>
<ms:order>5</ms:order>
</ms:NGramModel>
</ms:LanguageDescriptionSubclass>
<ms:LanguageDescriptionMediaPart>
<ms:LanguageDescriptionTextPart>
<ms:ldMediaType>LanguageDescriptionTextPart</ms:ldMediaType>
<ms:mediaType>http://w3id.org/meta-share/meta-share/text</ms:mediaType>
<ms:lingualityType>http://w3id.org/meta-share/meta-share/monolingual</ms:lingualityType>
<ms:language>
<ms:languageTag>el</ms:languageTag>
<ms:languageId>el</ms:languageId>
</ms:language>
<ms:metalanguage>
<ms:languageTag>und</ms:languageTag>
<ms:languageId>und</ms:languageId>
</ms:metalanguage>
<ms:creationDetails xml:lang="en">automatic web crawling, automatic language detection, data preprocessing (boilerpipe filtering, lemmatization &amp; tagging)</ms:creationDetails>
</ms:LanguageDescriptionTextPart>
</ms:LanguageDescriptionMediaPart>
<ms:DatasetDistribution>
<ms:DatasetDistributionForm>http://w3id.org/meta-share/meta-share/downloadable</ms:DatasetDistributionForm>
<ms:accessLocation>http://metashare.ilsp.gr:8080/repository/download/490952dc1cec11e2b545842b2b6a04d78dc202de28d5421f91752610a781175e</ms:accessLocation>
<ms:distributionTextFeature>
<ms:size>
<ms:amount>435189</ms:amount>
<ms:sizeUnit>http://w3id.org/meta-share/meta-share/unigram</ms:sizeUnit>
</ms:size>
<ms:size>
<ms:amount>3.860716E6</ms:amount>
<ms:sizeUnit>http://w3id.org/meta-share/meta-share/bigram</ms:sizeUnit>
</ms:size>
<ms:size>
<ms:amount>9.767383E6</ms:amount>
<ms:sizeUnit>http://w3id.org/meta-share/meta-share/trigram</ms:sizeUnit>
</ms:size>
<ms:size>
<ms:amount>1.368394E7</ms:amount>
<ms:sizeUnit>http://w3id.org/meta-share/meta-share/four-gram</ms:sizeUnit>
</ms:size>
<ms:size>
<ms:amount>1.495402E7</ms:amount>
<ms:sizeUnit>http://w3id.org/meta-share/meta-share/five-gram</ms:sizeUnit>
</ms:size>
<ms:dataFormat>http://w3id.org/meta-share/omtd-share/Text</ms:dataFormat>
</ms:distributionTextFeature>
<ms:licenceTerms>
<ms:licenceTermsName xml:lang="en">CC-BY-SA-4.0</ms:licenceTermsName>
<ms:licenceTermsURL>https://spdx.org/licenses/CC-BY-SA-4.0.html</ms:licenceTermsURL>
<ms:conditionOfUse>http://w3id.org/meta-share/meta-share/attribution</ms:conditionOfUse>
<ms:conditionOfUse>http://w3id.org/meta-share/meta-share/shareAlike</ms:conditionOfUse>
</ms:licenceTerms>
<ms:attributionText xml:lang="en">This LR has been created by Athena R.C./ILSP (www.ilsp.gr) and is licensed under a CC-BY-SA licence</ms:attributionText>
</ms:DatasetDistribution>
<ms:personalDataIncluded>false</ms:personalDataIncluded>
<ms:sensitiveDataIncluded>false</ms:sensitiveDataIncluded>
</ms:LanguageDescription>
</ms:LRSubclass>
</ms:LanguageResource>
</ms:DescribedEntity>
</ms:MetadataRecord>
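Example 2 records one `ms:size` element per n-gram order, and some amounts are written in scientific notation (e.g. `3.860716E6`). As a rough sketch, the sizes could be read back as integers with Python's standard library. The function name `ngram_counts` is a hypothetical helper, and the fragment below copies only the `ms:distributionTextFeature` sizes from the record above.

```python
import xml.etree.ElementTree as ET

NS = {"ms": "http://w3id.org/meta-share/meta-share/"}
BASE = "http://w3id.org/meta-share/meta-share/"

# Size entries copied from Example 2 (amount, unit name).
SIZES = [
    ("435189", "unigram"),
    ("3.860716E6", "bigram"),
    ("9.767383E6", "trigram"),
    ("1.368394E7", "four-gram"),
    ("1.495402E7", "five-gram"),
]

# Rebuild the fragment so the example stays short and self-contained.
FRAGMENT = (
    f'<ms:distributionTextFeature xmlns:ms="{BASE}">'
    + "".join(
        f"<ms:size><ms:amount>{amount}</ms:amount>"
        f"<ms:sizeUnit>{BASE}{unit}</ms:sizeUnit></ms:size>"
        for amount, unit in SIZES
    )
    + "</ms:distributionTextFeature>"
)


def ngram_counts(xml_text: str) -> dict[str, int]:
    """Map each size unit (last URI segment) to its amount as an integer."""
    root = ET.fromstring(xml_text)
    counts: dict[str, int] = {}
    for size in root.iterfind(".//ms:size", NS):
        amount = size.findtext("ms:amount", default="0", namespaces=NS)
        unit = size.findtext("ms:sizeUnit", default="", namespaces=NS)
        # float() accepts both plain and scientific notation.
        counts[unit.rsplit("/", 1)[-1]] = round(float(amount))
    return counts


counts = ngram_counts(FRAGMENT)
print(counts)
print("total n-grams:", sum(counts.values()))
```

Note that amounts written with seven significant digits, such as `1.368394E7`, are only as precise as the notation allows, so the recovered four-gram and five-gram counts may be rounded values.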
3. Register the model at ELG
The current release of ELG offers two options for registering a catalogue item:
the ELG interactive editor (see Use the interactive editor)
the upload of a metadata file in XML format that conforms to the ELG schema (see Create and upload metadata files).
To upload the content files for the model, you can follow the procedure described here.
4. Manage and submit for publication
Through the “My items” page you can access your metadata record (see Manage your items) and edit it until you are satisfied. You can then submit it for publication, in line with the publication lifecycle defined for ELG metadata records.
Once submitted, the metadata record can no longer be edited and is visible only to you and to us, the ELG platform administrators.
Before it is published, your submission undergoes a validation process, which is described in detail in CHAPTER 4: VALIDATING ITEMS.
Once approved, it will appear in the ELG catalogue and you will receive a notification email.