Provide a functional LT service

Currently, ELG supports the integration of tools/services that fall into one of the following broad categories:

  • Information Extraction (IE),
  • Text Classification (TC),
  • Machine Translation (MT),
  • Automatic Speech Recognition (ASR), and
  • Text to Speech Generation (TTS).

How an LT Service is integrated into ELG

An overview of the ELG platform is depicted below.

Platform overview

The following bullets summarize how LT services are deployed and invoked in ELG.

  • All LT Services (as well as all the other ELG components) are deployed (run as containers) on a Kubernetes (k8s) cluster; k8s is a system for automating deployment, scaling, and management of containerized applications.
  • All LT Services are integrated into ELG via the LT Service Execution Orchestrator/Server. This server exposes a common public REST API used for invoking any of the deployed backend LT Services. The public API is used by ELG’s Test/Trial UIs, which are embedded in the ELG Catalogue; it can also be invoked from the command line or from any programming language (see the Test an LT service section for more information). Some of the HTTP endpoints offered by the API are given below; for more information, see the Public LT API specification.
Endpoint                                                 Type   Consumes                         Produces
https://{domain}/execution/processText/{ltServiceID}     POST   ‘application/json’               ‘application/json’
https://{domain}/execution/processText/{ltServiceID}     POST   ‘text/plain’ or ‘text/html’      ‘application/json’
https://{domain}/execution/processAudio/{ltServiceID}    POST   ‘audio/x-wav’ or ‘audio/wav’     ‘application/json’
https://{domain}/execution/processAudio/{ltServiceID}    POST   ‘audio/mpeg’                     ‘application/json’

{domain} is ‘live.european-language-grid.eu’ and {ltServiceID} is the ID of the backend LT service. This ID is assigned/configured during registration; see section Register an LT Service to the platform (‘LT Service is deployed to ELG and configured’ step).

Note

The REST API that an LT Service X exposes (see previous section) is used for communication between the LT Service Execution Orchestrator Server and X (the ELG Internal LT API).

  • When the LT Service Execution Orchestrator receives a processing request for service X, it retrieves X’s k8s REST endpoint from the database and sends a request to it. This endpoint is configured/specified during the registration process; see section Register an LT Service to the platform (‘LT Service is deployed to ELG and configured’ step). When the Orchestrator gets the response from the LT Service, it returns it to the application/client that sent the initial call.
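As a rough illustration of the public endpoints listed above, the following Python sketch builds a plain-text POST request for the processText endpoint. The request is only constructed, not sent: whether the live endpoint needs authentication is not covered in this section. The function name and the service ID "my-service" are hypothetical and used only for illustration.

```python
import urllib.request

DOMAIN = "live.european-language-grid.eu"  # value of {domain} given above


def build_process_text_request(lt_service_id: str, text: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for the processText endpoint."""
    url = f"https://{DOMAIN}/execution/processText/{lt_service_id}"
    return urllib.request.Request(
        url,
        data=text.encode("utf-8"),
        headers={"Content-Type": "text/plain"},
        method="POST",
    )


# "my-service" is a hypothetical ID; real IDs are assigned during registration.
request = build_process_text_request("my-service", "Ian works in Sheffield.")
print(request.get_method(), request.full_url)
```

Passing the resulting object to urllib.request.urlopen would send the call; according to the endpoint table, the reply is expected as ‘application/json’.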

Technical Requirements

The requirements for integrating an LT tool/service are the following:

Expose an ELG compatible endpoint: You MUST create an application that exposes an HTTP endpoint for the provided LT tool(s). The application MUST consume (via the aforementioned HTTP endpoint) requests that follow the ELG JSON format, call the underlying LT tool and produce responses again in the ELG JSON format. For a detailed description of the JSON-based HTTP protocol (ELG Internal LT API) that you have to implement, see the Internal LT API specification annex.
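A minimal sketch of such an application, using only the Python standard library, might look as follows. The JSON field names ("type", "content", "response", "annotations", "failure") are assumptions about the ELG JSON format and must be checked against the Internal LT API specification; run_tool is a hypothetical stand-in for the real LT tool, and the choice of HTTP status codes is also an assumption.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def run_tool(text: str) -> list:
    """Hypothetical stand-in for the real LT tool: marks capitalised words."""
    spans, pos = [], 0
    for word in text.split():
        start = text.index(word, pos)
        pos = start + len(word)
        if word[0].isupper():
            spans.append({"start": start, "end": pos})
    return spans


def handle_elg_request(request: dict) -> dict:
    """Map an assumed ELG text request to an assumed ELG annotations response."""
    if request.get("type") != "text" or "content" not in request:
        # Assumed failure envelope; the specification defines the real one.
        return {"failure": {"errors": [{"text": "unsupported request"}]}}
    annotations = {"Capitalised": run_tool(request["content"])}
    return {"response": {"type": "annotations", "annotations": annotations}}


class ElgHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the JSON body, delegate to the pure function, write JSON back.
        length = int(self.headers.get("Content-Length", 0))
        body = json.loads(self.rfile.read(length) or b"{}")
        result = handle_elg_request(body)
        payload = json.dumps(result).encode("utf-8")
        self.send_response(200 if "response" in result else 400)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def make_server(port: int = 8080) -> HTTPServer:
    """Create the server; call .serve_forever() on the result to start it."""
    return HTTPServer(("", port), ElgHandler)
```

Keeping the request-to-response mapping in a plain function, separate from the HTTP handler, makes it possible to test the mapping without starting a server.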

Dockerization: You MUST dockerize the application and upload the respective image(s) to a Docker registry, such as GitLab, DockerHub, or Azure Container Registry. You MAY select whichever of the following three options best fits your needs:

  • LT tools packaged in one standalone image: One docker image is created that contains the application that exposes the ELG compatible endpoint and the actual LT tool.
  • LT tools running remotely outside the ELG infrastructure: For these tools, one proxy image is created that exposes one (or more) ELG compatible endpoints; the proxy container communicates with the actual LT service that runs outside the ELG infrastructure.
  • LT tools requiring an adapter: For tools that already offer an image that exposes a non-ELG compatible endpoint (HTTP-based or other), a second adapter image SHOULD be created that exposes an ELG-compatible endpoint and acts as proxy to the container that hosts the actual LT tool.
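For the proxy and adapter options, the extra container essentially translates between the ELG JSON format and whatever the actual tool understands. The sketch below shows that translation as two pure functions. Both formats are assumptions: the ELG field names should be verified against the Internal LT API specification, and the native format ("input", "lang", "outputs") is entirely hypothetical.

```python
def elg_to_native(elg_request: dict) -> dict:
    """Translate an assumed ELG text request into a hypothetical native request."""
    params = elg_request.get("params") or {}
    return {"input": elg_request["content"], "lang": params.get("lang", "en")}


def native_to_elg(native_response: dict) -> dict:
    """Wrap hypothetical native outputs in an assumed ELG 'texts' response."""
    texts = [{"content": text} for text in native_response["outputs"]]
    return {"response": {"type": "texts", "texts": texts}}


def adapt(elg_request: dict, call_tool) -> dict:
    """Run one request through the adapter.

    call_tool is any callable that delivers the native request to the actual
    tool (for example over HTTP to a second container) and returns its reply.
    """
    return native_to_elg(call_tool(elg_to_native(elg_request)))


def fake_tool(native_request: dict) -> dict:
    """Hypothetical tool used only to exercise the adapter: upper-cases input."""
    return {"outputs": [native_request["input"].upper()]}


result = adapt({"type": "text", "content": "hallo"}, fake_tool)
print(result)
```

In a real adapter image, call_tool would perform the network call to the container that hosts the tool, while the two translation functions stay unchanged.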

The following diagram shows the three options for integrating an LT tool:

Integration options

In the Dockerization annex you will find more information on how you can create an ELG-compatible Docker image.

Describe a functional LT service

To register an ELG-compliant LT service at the platform, you must describe it according to the ELG metadata schema (at least minimal version), i.e., you have to provide a metadata record; some of the metadata elements are used for deploying/integrating your service to the platform.

Note

For this release, you MUST create an ELG-compliant XML metadata file and upload it to the platform. Upcoming releases will also provide a metadata editor as well as other functionalities supporting an easy import of metadata records.

You will find templates of metadata records for each of ELG’s five categories in this GitLab folder and some examples of already registered services here.

Examples of metadata records for LT services

ANNIE’s Named Entity Recognizer (IE tool)

<?xml version="1.0" encoding="UTF-8"?>
<ms:MetadataRecord xsi:schemaLocation="http://w3id.org/meta-share/meta-share/ ../../../Schema/ELG-SHARE.xsd" xmlns:ms="http://w3id.org/meta-share/meta-share/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
        <ms:MetadataRecordIdentifier ms:MetadataRecordIdentifierScheme="http://w3id.org/meta-share/meta-share/elg">default id</ms:MetadataRecordIdentifier>
        <ms:metadataCreationDate>2020-02-25</ms:metadataCreationDate>
        <ms:metadataLastDateUpdated>2020-02-25</ms:metadataLastDateUpdated>
        <ms:metadataCurator>
                <ms:actorType>Person</ms:actorType>
                <ms:surname xml:lang="en">Roberts</ms:surname>
                <ms:givenName xml:lang="en">Ian</ms:givenName>
                <ms:email>username1@somedomain.com</ms:email>
        </ms:metadataCurator>
        <ms:compliesWith>http://w3id.org/meta-share/meta-share/ELG-SHARE</ms:compliesWith>
        <ms:metadataCreator>
                <ms:actorType>Person</ms:actorType>
                <ms:surname xml:lang="en">Roberts</ms:surname>
                <ms:givenName xml:lang="en">Ian</ms:givenName>
                <ms:email>username2@somedomain.com</ms:email>
        </ms:metadataCreator>
        <ms:DescribedEntity>
                <ms:LanguageResource>
                        <ms:entityType>LanguageResource</ms:entityType>
                        <ms:resourceName xml:lang="en">GATE: English Named Entity Recognizer</ms:resourceName>
                        <ms:resourceShortName xml:lang="en">annie-named-entity-recognizer</ms:resourceShortName>
                        <ms:description xml:lang="en">Identify names of &lt;em&gt;persons&lt;/em&gt;, &lt;em&gt;locations&lt;/em&gt;, &lt;em&gt;organizations&lt;/em&gt;, as well as &lt;em&gt;money amounts&lt;/em&gt;, &lt;em&gt;time and date expressions&lt;/em&gt; in English texts automatically. </ms:description>
                        <ms:LRIdentifier ms:LRIdentifierScheme="http://w3id.org/meta-share/meta-share/elg">ELG id automatically assigned</ms:LRIdentifier>
                        <ms:version>v8.6</ms:version>
                        <ms:additionalInfo>
                                <ms:landingPage>https://cloud.gate.ac.uk/shopfront/displayItem/annie-named-entity-recognizer</ms:landingPage>
                        </ms:additionalInfo>
                        <ms:keyword xml:lang="en">Named Entity Recognition</ms:keyword>
                        <ms:keyword xml:lang="en">English</ms:keyword>
                        <ms:resourceProvider>
                                <ms:Group>
                                        <ms:actorType>Group</ms:actorType>
                                        <ms:organizationName xml:lang="en">GATE Team, University of Sheffield</ms:organizationName>
                                        <ms:website>https://gate.ac.uk/</ms:website>
                                </ms:Group>
                        </ms:resourceProvider>
                        <ms:publicationDate>2020-02-25</ms:publicationDate>
                        <ms:resourceCreator>
                                <ms:Person>
                                        <ms:actorType>Person</ms:actorType>
                                        <ms:surname xml:lang="en">Roberts</ms:surname>
                                        <ms:givenName xml:lang="en">Ian</ms:givenName>
                                        <ms:email>username3@somedomain.com</ms:email>
                                </ms:Person>
                        </ms:resourceCreator>
                        <ms:intendedApplication>
                                <ms:LTClassRecommended>http://w3id.org/meta-share/omtd-share/NamedEntityRecognition</ms:LTClassRecommended>
                        </ms:intendedApplication>
                        <ms:LRSubclass>
                                <ms:ToolService>
                                        <ms:lrType>ToolService</ms:lrType>
                                        <ms:function>
                                                <ms:LTClassRecommended>http://w3id.org/meta-share/omtd-share/NamedEntityRecognition</ms:LTClassRecommended>
                                        </ms:function>
                                        <ms:SoftwareDistribution>
                                                <ms:SoftwareDistributionForm>http://w3id.org/meta-share/meta-share/dockerImage</ms:SoftwareDistributionForm>
                                                <ms:executionLocation>http://localhost:8080/process</ms:executionLocation>
                                                <ms:dockerDownloadLocation>registry.gitlab.com/european-language-grid/usfd/gate-ie-tools/annie:8.6-0.0.3</ms:dockerDownloadLocation>
                                                <ms:licenceTerms>
                                                        <ms:licenceTermsName xml:lang="en">GNU Lesser General Public License v3.0 only</ms:licenceTermsName>
                                                        <ms:licenceTermsURL>https://spdx.org/licenses/LGPL-3.0-only.html</ms:licenceTermsURL>
                                                        <ms:LicenceIdentifier ms:LicenceIdentifierScheme="http://w3id.org/meta-share/meta-share/SPDX">LGPL-3.0-only</ms:LicenceIdentifier>
                                                        <ms:LicenceIdentifier ms:LicenceIdentifierScheme="http://w3id.org/meta-share/meta-share/elg">ELG-ENT-LIC-270220-00000199</ms:LicenceIdentifier>
                                                </ms:licenceTerms>
                                        </ms:SoftwareDistribution>
                                        <ms:languageDependent>true</ms:languageDependent>
                                        <ms:inputContentResource>
                                                <ms:processingResourceType>http://w3id.org/meta-share/meta-share/file1</ms:processingResourceType>
                                                <ms:language>
                                                        <ms:languageTag>en</ms:languageTag>
                                                        <ms:languageId>en</ms:languageId>
                                                </ms:language>
                                                <ms:mediaType>http://w3id.org/meta-share/meta-share/text</ms:mediaType>
                                                <ms:dataFormat>http://w3id.org/meta-share/omtd-share/Json</ms:dataFormat>
                                                <ms:characterEncoding>http://w3id.org/meta-share/meta-share/UTF-8</ms:characterEncoding>
                                        </ms:inputContentResource>
                                        <ms:outputResource>
                                                <ms:processingResourceType>http://w3id.org/meta-share/meta-share/file1</ms:processingResourceType>
                                                <ms:language>
                                                        <ms:languageTag>en</ms:languageTag>
                                                        <ms:languageId>en</ms:languageId>
                                                </ms:language>
                                                <ms:mediaType>http://w3id.org/meta-share/meta-share/text</ms:mediaType>
                                                <ms:dataFormat>http://w3id.org/meta-share/omtd-share/Json</ms:dataFormat>
                                                <ms:characterEncoding>http://w3id.org/meta-share/meta-share/UTF-8</ms:characterEncoding>
                                                <!-- annotations: :Address, :Date, :Location, :Organization, :Person, :Money, :Percent, :Token, :SpaceToken, :Sentence -->
                                                <ms:annotationType>http://w3id.org/meta-share/omtd-share/Person</ms:annotationType>
                                                <ms:annotationType>http://w3id.org/meta-share/omtd-share/Location</ms:annotationType>
                                                <ms:annotationType>http://w3id.org/meta-share/omtd-share/Organization</ms:annotationType>
                                                <ms:annotationType>http://w3id.org/meta-share/omtd-share/Date</ms:annotationType>
                                        </ms:outputResource>
                                        <ms:trl>http://w3id.org/meta-share/meta-share/trl4</ms:trl>
                                        <ms:evaluated>false</ms:evaluated>
                                </ms:ToolService>
                        </ms:LRSubclass>
                </ms:LanguageResource>
        </ms:DescribedEntity>
</ms:MetadataRecord>

The Docker image for this LT tool is stored in the GitLab registry.

Edinburgh’s German to English engine (MT tool)

<?xml version="1.0" encoding="UTF-8"?>
<ms:MetadataRecord xmlns:ms="http://w3id.org/meta-share/meta-share/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://w3id.org/meta-share/meta-share/ ../../Schema/ELG-SHARE.xsd">
        <ms:MetadataRecordIdentifier ms:MetadataRecordIdentifierScheme="http://w3id.org/meta-share/meta-share/elg">default id</ms:MetadataRecordIdentifier>
        <ms:metadataCreationDate>2020-02-28</ms:metadataCreationDate>
        <ms:metadataLastDateUpdated>2020-02-28</ms:metadataLastDateUpdated>
        <ms:metadataCurator>
                <ms:actorType>Person</ms:actorType>
                <ms:surname xml:lang="en">Germann</ms:surname>
                <ms:givenName xml:lang="en">Ulrich</ms:givenName>
                <ms:PersonalIdentifier ms:PersonalIdentifierScheme="http://w3id.org/meta-share/meta-share/elg">ELG-ENT-PER-050320-00000787</ms:PersonalIdentifier>
                <ms:email>user@somedomain.uk</ms:email>
        </ms:metadataCurator>
        <ms:compliesWith>http://w3id.org/meta-share/meta-share/ELG-SHARE</ms:compliesWith>
        <ms:metadataCreator>
                <ms:actorType>Person</ms:actorType>
                <ms:surname xml:lang="en">Germann</ms:surname>
                <ms:givenName xml:lang="en">Ulrich</ms:givenName>
                <ms:PersonalIdentifier ms:PersonalIdentifierScheme="http://w3id.org/meta-share/meta-share/elg">ELG-ENT-PER-050320-00000787</ms:PersonalIdentifier>
                <ms:email>user@somedomain.uk</ms:email>
        </ms:metadataCreator>
        <ms:DescribedEntity>
                <ms:LanguageResource>
                        <ms:entityType>LanguageResource</ms:entityType>
                        <ms:resourceName xml:lang="en">UEDIN Machine Translation Service for German to English</ms:resourceName>
                        <ms:resourceShortName xml:lang="en">UEDIN-MT-DeEn</ms:resourceShortName>
                        <ms:description xml:lang="en">A machine translation (MT) service for German-to-English translation based on the Marian machine translation framework. The translation model is a basic transformer model trained on ca 13.3M sentence pairs using Marian NMT</ms:description>
                        <ms:LRIdentifier ms:LRIdentifierScheme="http://w3id.org/meta-share/meta-share/elg">ELG id automatically assigned</ms:LRIdentifier>
                        <ms:version>v1.0.0</ms:version>
                        <ms:additionalInfo>
                                <ms:email>user@somedomain.uk</ms:email>
                        </ms:additionalInfo>
                        <ms:keyword xml:lang="en">Machine Translation</ms:keyword>
                        <ms:keyword xml:lang="en">German</ms:keyword>
                        <ms:keyword xml:lang="en">English</ms:keyword>
                        <ms:keyword xml:lang="en">Neural machine translation</ms:keyword>
                        <ms:keyword xml:lang="en">Marian framework</ms:keyword>
                        <ms:resourceProvider>
                                <ms:Organization>
                                        <ms:actorType>Organization</ms:actorType>
                                        <ms:organizationName xml:lang="en">UEDIN</ms:organizationName>
                                        <ms:OrganizationIdentifier ms:OrganizationIdentifierScheme="http://w3id.org/meta-share/meta-share/elg">ELG-ENT-ORG-280220-00000397</ms:OrganizationIdentifier>
                                        <ms:website>https://www.ed.ac.uk/informatics/</ms:website>
                                </ms:Organization>
                        </ms:resourceProvider>
                        <ms:publicationDate>2020-02-28</ms:publicationDate>
                        <ms:resourceCreator>
                                <ms:Person>
                                        <ms:actorType>Person</ms:actorType>
                                        <ms:surname xml:lang="en">Germann</ms:surname>
                                        <ms:givenName xml:lang="en">Ulrich</ms:givenName>
                                        <ms:PersonalIdentifier ms:PersonalIdentifierScheme="http://w3id.org/meta-share/meta-share/elg">ELG-ENT-PER-050320-00000787</ms:PersonalIdentifier>
                                        <ms:email>user@somedomain.uk</ms:email>
                                </ms:Person>
                        </ms:resourceCreator>
                        <ms:intendedApplication>
                                <ms:LTClassRecommended>http://w3id.org/meta-share/omtd-share/MachineTranslation</ms:LTClassRecommended>
                        </ms:intendedApplication>
                        <ms:LRSubclass>
                                <ms:ToolService>
                                        <ms:lrType>ToolService</ms:lrType>
                                        <ms:function>
                                                <ms:LTClassRecommended>http://w3id.org/meta-share/omtd-share/MachineTranslation</ms:LTClassRecommended>
                                        </ms:function>
                                        <ms:SoftwareDistribution>
                                                <ms:SoftwareDistributionForm>http://w3id.org/meta-share/meta-share/dockerImage</ms:SoftwareDistributionForm>
                                                <ms:executionLocation>http://localhost:18080/api/elg/v1</ms:executionLocation>
                                                <ms:dockerDownloadLocation>mt4elg-de-en</ms:dockerDownloadLocation>
                                                <ms:additionalHWRequirements>limits_memory: 2048Mi limits_cpu: 1.5</ms:additionalHWRequirements>
                                                <ms:licenceTerms>
                                                        <ms:licenceTermsName xml:lang="en">CC BY-SA 4.0</ms:licenceTermsName>
                                                        <ms:licenceTermsURL>https://creativecommons.org/licenses/by-sa/4.0/</ms:licenceTermsURL>
                                                        <ms:LicenceIdentifier ms:LicenceIdentifierScheme="http://w3id.org/meta-share/meta-share/elg">ELG-ENT-LIC-270220-00000097</ms:LicenceIdentifier>
                                                </ms:licenceTerms>
                                        </ms:SoftwareDistribution>
                                        <ms:languageDependent>true</ms:languageDependent>
                                        <ms:inputContentResource>
                                                <ms:processingResourceType>http://w3id.org/meta-share/meta-share/file1</ms:processingResourceType>
                                                <ms:language>
                                                        <ms:languageTag>de</ms:languageTag>
                                                        <ms:languageId>de</ms:languageId>
                                                </ms:language>
                                                <ms:mediaType>http://w3id.org/meta-share/meta-share/text</ms:mediaType>
                                                <ms:dataFormat>http://w3id.org/meta-share/omtd-share/Json</ms:dataFormat>
                                                <ms:characterEncoding>http://w3id.org/meta-share/meta-share/UTF-8</ms:characterEncoding>
                                        </ms:inputContentResource>
                                        <ms:outputResource>
                                                <ms:processingResourceType>http://w3id.org/meta-share/meta-share/file1</ms:processingResourceType>
                                                <ms:language>
                                                        <ms:languageTag>en</ms:languageTag>
                                                        <ms:languageId>en</ms:languageId>
                                                </ms:language>
                                                <ms:mediaType>http://w3id.org/meta-share/meta-share/text</ms:mediaType>
                                                <ms:dataFormat>http://w3id.org/meta-share/omtd-share/Json</ms:dataFormat>
                                                <ms:characterEncoding>http://w3id.org/meta-share/meta-share/UTF-8</ms:characterEncoding>
                                        </ms:outputResource>
                                        <ms:trl>http://w3id.org/meta-share/meta-share/trl4</ms:trl>
                                        <ms:evaluated>false</ms:evaluated>
                                </ms:ToolService>
                        </ms:LRSubclass>
                </ms:LanguageResource>
        </ms:DescribedEntity>
</ms:MetadataRecord>

The Docker image for this LT tool is stored on DockerHub.
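Some of the elements in these records drive deployment, notably executionLocation and dockerDownloadLocation. As a quick sanity check before uploading a record, they can be read back with Python's standard xml.etree.ElementTree module. This is only a sketch: the function name is hypothetical, and the sample below is a trimmed fragment of the ANNIE record, not a complete or valid metadata record.

```python
import xml.etree.ElementTree as ET

NS = {"ms": "http://w3id.org/meta-share/meta-share/"}

# Trimmed fragment of the ANNIE record above; not a complete record.
SAMPLE = """
<ms:MetadataRecord xmlns:ms="http://w3id.org/meta-share/meta-share/">
  <ms:DescribedEntity><ms:LanguageResource><ms:LRSubclass><ms:ToolService>
    <ms:SoftwareDistribution>
      <ms:executionLocation>http://localhost:8080/process</ms:executionLocation>
      <ms:dockerDownloadLocation>registry.gitlab.com/european-language-grid/usfd/gate-ie-tools/annie:8.6-0.0.3</ms:dockerDownloadLocation>
    </ms:SoftwareDistribution>
  </ms:ToolService></ms:LRSubclass></ms:LanguageResource></ms:DescribedEntity>
</ms:MetadataRecord>
"""


def deployment_info(xml_text: str) -> list:
    """Return one dict per SoftwareDistribution with its deployment fields."""
    root = ET.fromstring(xml_text.strip())
    fields = ("executionLocation", "dockerDownloadLocation")
    return [
        {name: dist.findtext(f"ms:{name}", namespaces=NS) for name in fields}
        for dist in root.iterfind(".//ms:SoftwareDistribution", NS)
    ]


info = deployment_info(SAMPLE)
print(info)
```

A field that is absent from the record comes back as None, which makes missing deployment information easy to spot.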

Minimal version metadata

The metadata elements (mandatory or recommended) that are common to all kinds of resources, including functional LT services, are presented in Minimal version - List of elements common to all LRTs. For a quick guide to the ELG template, see Template - Explanations.

In addition, the following metadata elements are required or recommended for tools/services:

ToolService

Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.ToolService

Data type component

Optionality Mandatory

Explanation & Instructions

Introduces the set of elements that is specific to tools/services

Example

<ms:LRSubclass>
        <ms:ToolService>
                <ms:lrType>ToolService</ms:lrType>
                ...
        </ms:ToolService>
</ms:LRSubclass>

function

Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.ToolService.function

Data type component

Optionality Mandatory

Explanation & Instructions

Specifies the operation/function/task that a software object performs

The element is important for discovery purposes. You can fill in:

  • the LTClassRecommended element with one of the recommended values from the LT taxonomy, or
  • the LTClassOther element with a free text.

For services that perform multiple functions (e.g., syntactic and semantic annotation) you can repeat the element.

Example

<ms:function>
        <ms:LTClassRecommended>http://w3id.org/meta-share/omtd-share/NamedEntityRecognition</ms:LTClassRecommended>
</ms:function>

<ms:function>
        <ms:LTClassRecommended>http://w3id.org/meta-share/omtd-share/MachineTranslation</ms:LTClassRecommended>
</ms:function>

<ms:function>
        <ms:LTClassOther>video segmentation</ms:LTClassOther>
</ms:function>
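Where records are generated by a script rather than written by hand, building the function element programmatically keeps the opening and closing tags matched. The following sketch uses Python's standard xml.etree.ElementTree; the helper name and its parameters are hypothetical, and the namespace URIs are the ones that appear in the examples above.

```python
import xml.etree.ElementTree as ET

MS = "http://w3id.org/meta-share/meta-share/"
OMTD = "http://w3id.org/meta-share/omtd-share/"
ET.register_namespace("ms", MS)  # serialise with the ms: prefix


def function_element(value: str, recommended: bool = True) -> ET.Element:
    """Build <ms:function> holding either LTClassRecommended or LTClassOther."""
    function = ET.Element(f"{{{MS}}}function")
    tag = "LTClassRecommended" if recommended else "LTClassOther"
    child = ET.SubElement(function, f"{{{MS}}}{tag}")
    # Recommended values are URIs from the LT taxonomy; other values are free text.
    child.text = OMTD + value if recommended else value
    return function


ner = function_element("NamedEntityRecognition")
other = function_element("video segmentation", recommended=False)
print(ET.tostring(ner, encoding="unicode"))
print(ET.tostring(other, encoding="unicode"))
```

The helper does not check that a recommended value actually exists in the LT taxonomy; that check would need the taxonomy itself.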

SoftwareDistribution

Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.ToolService.SoftwareDistribution

Data type component

Optionality Mandatory

Explanation & Instructions

Any form with which software is distributed (e.g., web services, executable or code files, etc.)

This element groups together information that pertains to the physical form of a tool/service that is made available through the catalogue. For software that is distributed with multiple forms (e.g., as source code, as a web service, etc.), you can repeat this group of elements. The access location and the licensing conditions may differ for each distribution.

The following list includes the mandatory and recommended elements:

  • SoftwareDistributionForm (Mandatory): The medium, delivery channel or form (e.g., source code, API, web service, etc.) through which a software object is distributed. Use the value http://w3id.org/meta-share/meta-share/dockerImage.
  • dockerDownloadLocation (Mandatory if applicable): A location where the LT tool Docker image is stored. Add the location from which the ELG team can download the Docker image in order to test it.
  • serviceAdapterDownloadLocation (Mandatory if applicable): The URL from which the Docker image of the service adapter can be downloaded. Required only for ELG functional services implemented with an adapter.
  • executionLocation (Mandatory): A URL where the resource (mainly software) can be directly executed. Add here the REST endpoint at which the LT tool is exposed within the Docker image.
  • additionalHWRequirements (Mandatory if applicable): A short text where you specify additional requirements for running the service, e.g. memory requirements. The recommended format is: ‘limits_memory: X limits_cpu: Y’
  • licenceTerms (Mandatory): See licenceTerms
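The recommended ‘limits_memory: X limits_cpu: Y’ format for the hardware requirements is simple enough to check mechanically. The sketch below parses such a string into key/value pairs with a regular expression; the function name is hypothetical, and how the platform itself interprets the values is not described here.

```python
import re


def parse_hw_requirements(text: str) -> dict:
    """Split 'key: value key: value' text into a dict of strings."""
    return dict(re.findall(r"(\w+):\s*(\S+)", text))


# Value taken from the Edinburgh MT example record.
limits = parse_hw_requirements("limits_memory: 2048Mi limits_cpu: 1.5")
print(limits)
```

Values are kept as strings so that unit suffixes such as Mi are preserved as written.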

licenceTerms

Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.ToolService.SoftwareDistribution.licenceTerms

Data type component

Optionality Mandatory

Explanation & Instructions

Links the distribution (distributable form) of a language resource to the licence or terms of use/service (a specific legal document) with which it is distributed

The recommended practice is to add a licence name and identifier from the SPDX list of licences (https://spdx.org/licenses/). For proprietary licences or licences not included in the above list, please add a (unique) licence name and the URL where the text of the licence can be found.

Example

<ms:licenceTerms>
        <ms:licenceTermsName xml:lang="en">GNU Lesser General Public License v3.0 only</ms:licenceTermsName>
        <ms:licenceTermsURL>https://spdx.org/licenses/LGPL-3.0-only.html</ms:licenceTermsURL>
        <ms:LicenceIdentifier ms:LicenceIdentifierScheme="http://w3id.org/meta-share/meta-share/SPDX">LGPL-3.0-only</ms:LicenceIdentifier>
</ms:licenceTerms>

<ms:licenceTerms>
        <ms:licenceTermsName xml:lang="en">publicDomain</ms:licenceTermsName>
        <ms:licenceTermsURL>https://elrc-share.eu/terms/publicDomain.html</ms:licenceTermsURL>
</ms:licenceTerms>

<ms:licenceTerms>
        <ms:licenceTermsName xml:lang="en">Creative Commons Attribution 4.0 International</ms:licenceTermsName>
        <ms:licenceTermsURL>https://creativecommons.org/licenses/by/4.0/legalcode</ms:licenceTermsURL>
        <ms:LicenceIdentifier ms:LicenceIdentifierScheme="http://w3id.org/meta-share/meta-share/SPDX">CC-BY-4.0</ms:LicenceIdentifier>
</ms:licenceTerms>

languageDependent

Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.ToolService.languageDependent

Data type boolean

Optionality Mandatory

Explanation & Instructions

Indicates whether the operation of the tool or service is language dependent or not

For language-dependent tools/services, you will be asked to also provide the language of the input and output resources.

Example

<ms:languageDependent>true</ms:languageDependent>

inputContentResource

Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.ToolService.inputContentResource

Data type component

Optionality Mandatory

Explanation & Instructions

Specifies the requirements set by a tool/service for the (content) resource that it processes

The following elements are mandatory or recommended:

  • processingResourceType: Specifies the resource type that a tool/service takes as input or produces as output; you must specify, for instance, if the tool/service can process a single file, or set of files, or processes a string typed in by the users.
  • language: Specifies the language that is used in the resource or supported by the tool/service, expressed according to the BCP47 recommendation. See language.
  • mediaType (Recommended): Specifies the media type of the input/output of a language processing tool/service. For ELG functional services, this will be used to select the appropriate GUI (e.g. “audio” for ASR applications vs. “text” for Machine Translation applications).
  • dataFormat (Recommended): Indicates the format(s) of a data resource. Use this element to indicate the data format of the resource supported by the tool/service. The dataFormat CV lists data formats, with their mimetype and documentation on the particularities, thus catering for variations of formats, e.g. GATE XML, TEI variants, etc.
  • characterEncoding (Recommended if applicable): Specifies the character encoding used for the input/output text resource of an LT service.
  • annotationType (Recommended if applicable): Specifies the annotation type of the annotated version(s) of a resource, or the annotation type that a tool/service requires or produces as output. Use this element only if the tool/service processes pre-annotated corpora; do not use it for tools/services that process raw files. The element takes a value from a CV; see annotationType.

Example

<!-- example for a tool with textual input -->
<ms:inputContentResource>
        <ms:processingResourceType>http://w3id.org/meta-share/meta-share/file1</ms:processingResourceType>
        <ms:language>
                <ms:languageTag>en</ms:languageTag>
                <ms:languageId>en</ms:languageId>
        </ms:language>
        <ms:mediaType>http://w3id.org/meta-share/meta-share/text</ms:mediaType>
        <ms:dataFormat>http://w3id.org/meta-share/omtd-share/Json</ms:dataFormat>
        <ms:characterEncoding>http://w3id.org/meta-share/meta-share/UTF-8</ms:characterEncoding>
</ms:inputContentResource>

<!-- example for an Automatic Speech Recognizer -->
<ms:inputContentResource>
        <ms:processingResourceType>http://w3id.org/meta-share/meta-share/file1</ms:processingResourceType>
        <ms:language>
                <ms:languageTag>de</ms:languageTag>
                <ms:languageId>de</ms:languageId>
        </ms:language>
        <ms:mediaType>http://w3id.org/meta-share/meta-share/audio</ms:mediaType>
        <ms:dataFormat>http://w3id.org/meta-share/omtd-share/mp3</ms:dataFormat>
        <ms:dataFormat>http://w3id.org/meta-share/omtd-share/wav</ms:dataFormat>
</ms:inputContentResource>
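The ToolService elements described so far as Mandatory (function, SoftwareDistribution, languageDependent and inputContentResource) can be checked for presence before a record is uploaded. The following sketch does only that: it is not a substitute for validating against the ELG-SHARE schema, and the function name and sample fragment are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {"ms": "http://w3id.org/meta-share/meta-share/"}
MANDATORY = ["function", "SoftwareDistribution", "languageDependent", "inputContentResource"]

# Hypothetical fragment that deliberately omits inputContentResource.
SAMPLE = """
<ms:LRSubclass xmlns:ms="http://w3id.org/meta-share/meta-share/">
  <ms:ToolService>
    <ms:lrType>ToolService</ms:lrType>
    <ms:function><ms:LTClassOther>video segmentation</ms:LTClassOther></ms:function>
    <ms:SoftwareDistribution/>
    <ms:languageDependent>true</ms:languageDependent>
  </ms:ToolService>
</ms:LRSubclass>
"""


def missing_mandatory(xml_text: str) -> list:
    """List mandatory ToolService children that are absent from the record."""
    root = ET.fromstring(xml_text.strip())
    tool = root.find(".//ms:ToolService", NS)
    if tool is None:
        return ["ToolService"]
    return [name for name in MANDATORY if tool.find(f"ms:{name}", NS) is None]


missing = missing_mandatory(SAMPLE)
print(missing)
```

The check looks only at direct children of ToolService and does not inspect their content, so an empty SoftwareDistribution still counts as present.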

outputResource

Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.ToolService.outputResource

Data type component

Optionality Recommended if applicable

Explanation & Instructions

Describes the features of the output resource processed by a tool/service.

The set of elements is the same as for inputContentResource.

Make sure that you add here what is relevant for your application. For instance,

  • for annotation and information extraction tools/services, use the annotationType element to indicate the results of your processing; you can repeat it to indicate multiple annotation types (e.g., part of speech, person, amount, location, etc.)
  • for Machine Translation tools, indicate the input and output languages respectively.

Example

<!-- example for an Information Extraction tool -->
<ms:outputResource>
        <ms:processingResourceType>http://w3id.org/meta-share/meta-share/file1</ms:processingResourceType>
        <ms:language>
                <ms:languageTag>en</ms:languageTag>
                <ms:languageId>en</ms:languageId>
        </ms:language>
        <ms:mediaType>http://w3id.org/meta-share/meta-share/text</ms:mediaType>
        <ms:dataFormat>http://w3id.org/meta-share/omtd-share/Json</ms:dataFormat>
        <ms:characterEncoding>http://w3id.org/meta-share/meta-share/UTF-8</ms:characterEncoding>
        <ms:annotationType>http://w3id.org/meta-share/omtd-share/Person</ms:annotationType>
        <ms:annotationType>http://w3id.org/meta-share/omtd-share/Location</ms:annotationType>
        <ms:annotationType>http://w3id.org/meta-share/omtd-share/Organization</ms:annotationType>
        <ms:annotationType>http://w3id.org/meta-share/omtd-share/Date</ms:annotationType>
</ms:outputResource>

<!-- example for a Machine Translation tool -->
<ms:outputResource>
        <ms:processingResourceType>http://w3id.org/meta-share/meta-share/file1</ms:processingResourceType>
        <ms:language>
                <ms:languageTag>en</ms:languageTag>
                <ms:languageId>en</ms:languageId>
        </ms:language>
        <ms:mediaType>http://w3id.org/meta-share/meta-share/text</ms:mediaType>
        <ms:dataFormat>http://w3id.org/meta-share/omtd-share/Json</ms:dataFormat>
        <ms:characterEncoding>http://w3id.org/meta-share/meta-share/UTF-8</ms:characterEncoding>
</ms:outputResource>
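
As a quick local check before uploading a record, the repeated annotationType elements of an outputResource can be listed with a few lines of Python. This is only a sketch: the namespace URI bound to the ms prefix is an assumption (take the actual value from the root element of your own metadata file), and the fragment is a shortened copy of the Information Extraction example above.

```python
import xml.etree.ElementTree as ET

# Assumption: the URI bound to the "ms" prefix. Check the xmlns:ms
# declaration in your own metadata record and adjust if it differs.
MS_NS = "http://w3id.org/meta-share/meta-share/"

# Shortened copy of the Information Extraction example above.
FRAGMENT = f"""
<ms:outputResource xmlns:ms="{MS_NS}">
    <ms:mediaType>http://w3id.org/meta-share/meta-share/text</ms:mediaType>
    <ms:annotationType>http://w3id.org/meta-share/omtd-share/Person</ms:annotationType>
    <ms:annotationType>http://w3id.org/meta-share/omtd-share/Location</ms:annotationType>
</ms:outputResource>
"""


def annotation_types(xml_text):
    """Return the last path segment of every annotationType value."""
    root = ET.fromstring(xml_text.strip())
    elements = root.findall("ms:annotationType", {"ms": MS_NS})
    return [el.text.strip().rsplit("/", 1)[-1] for el in elements]


print(annotation_types(FRAGMENT))
```

The function only reads the direct children of the element it is given, so it should be called on a single outputResource fragment rather than on a whole metadata record.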

language

Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.ToolService.language

Data type component

Optionality Mandatory if applicable

Explanation & Instructions

Specifies the language that is used in the resource or supported by the tool/service, expressed according to the BCP47 recommendation.

The element languageTag is composed of the languageId, and optionally scriptId, regionId and variantId; you can use those elements that best describe the language(s) of your resource.

Example

<ms:language>
        <ms:languageTag>en</ms:languageTag>
        <ms:languageId>en</ms:languageId>
</ms:language>

<ms:language>
        <ms:languageTag>en-US</ms:languageTag>
        <ms:languageId>en</ms:languageId>
        <ms:regionId>US</ms:regionId>
</ms:language>
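
The way languageTag is composed from its parts can be sketched as follows. The function name and parameter names are illustrative only and are not part of the ELG schema; the order of the subtags (language, script, region, variant) follows BCP47.

```python
def compose_language_tag(language_id, script_id=None, region_id=None, variant_id=None):
    """Join the given subtags with hyphens in BCP47 order, skipping absent ones."""
    parts = [language_id, script_id, region_id, variant_id]
    return "-".join(part for part in parts if part)


# The two examples above, plus one that also carries a script subtag.
print(compose_language_tag("en"))
print(compose_language_tag("en", region_id="US"))
print(compose_language_tag("sr", script_id="Latn", region_id="RS"))
```

The sketch does not validate the subtags themselves; it only shows how the value of languageTag relates to the values of languageId, scriptId, regionId and variantId.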

implementationLanguage

Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.ToolService.implementationLanguage

Data type string

Optionality Recommended

Explanation & Instructions

The programming language(s) used for the development of a tool/service.

Example

<ms:implementationLanguage>Java v8.1</ms:implementationLanguage>

trl

Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.ToolService.trl

Data type CV http://w3id.org/meta-share/meta-share/trl

Optionality Recommended

Explanation & Instructions

Specifies the TRL (Technology Readiness Level) of the technology according to the measurement system defined by the EC (https://ec.europa.eu/research/participants/data/ref/h2020/wp/2014_2015/annexes/h2020-wp1415-annex-g-trl_en.pdf)

Example

<ms:trl>http://w3id.org/meta-share/meta-share/trl4</ms:trl>

evaluated

Path MetadataRecord.DescribedEntity.LanguageResource.LRSubclass.ToolService.evaluated

Data type boolean

Optionality Mandatory

Explanation & Instructions

Indicates whether the tool or service has been evaluated.

If the tool/service has been evaluated, you can use the ‘evaluation’ component to give more detailed information; see here for the relevant elements.

Example

<ms:evaluated>false</ms:evaluated>

Register an LT Service to the platform

The following steps should be followed:

  • Provide a metadata record: Sign into the ELG platform using your credentials and press the “upload” button on the main menu.
Upload menu

Then upload the XML file that contains the metadata. In the current release, this is the only way to provide them. Upcoming releases will also include a metadata editor and other functionalities.

Upload metadata XML

The metadata record is validated at import against the metadata schema. Additional rules that check for syntactic and partial semantic integrity are also applied. If the file is found invalid, you will see a message with a list of errors; you must correct them and re-upload the file. If it is valid, you will be shown a success message and the file will be imported into the database. At this stage, it is visible only to the platform administrators.

  • LT Service is assigned to a reviewer: The administrator will assign it to a reviewer; during the review process, the metadata record is visible only to you (LT provider) and the reviewer.
LT Service under review.
  • LT Service is deployed to ELG and configured: The reviewer deploys the LT service to the k8s cluster by creating the appropriate configuration YAML file and uploading it to the respective GitLab repository. The CI/CD pipeline that is responsible for deployments then automatically installs the new service on the k8s cluster. If you request it, a separate dedicated k8s namespace can be created for the LT service before the YAML file is created. The reviewer assigns the following to the service:

    • The k8s REST endpoint that will be used for invoking it. The endpoint follows this template: http://{k8s service name for the registered LT tool}.{k8s namespace for the registered LT tool}.svc.cluster.local{the path where the REST service is running at}. The {the path where the REST service is running at} part can be found in the executionLocation field in the metadata. For instance, for Edinburgh’s MT tool above it is ‘/api/elg/v1’.
    • An ID that will be used to call it.
    • Which “try out” UI will be used for testing it and visualizing the returned results.
  • LT Service is tested: On the LT landing page, there is a “Try out” tab and a “Code samples” tab; both can be used to test the service with some input; see Test an LT service section. The reviewer can help you identify integration issues and resolve them. This process continues until the LT service is correctly integrated into the platform. The procedure may require the reviewer to have access to the k8s cluster (e.g., to check container start-up/failures, logs, etc.).

    Try out UI
  • LT Service is published: When the LT service works as expected, the reviewer will approve it; the metadata record is then published and visible to all ELG users through the catalogue.
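
The internal endpoint template from the deployment step can be illustrated with a short sketch. The function names, the service name and the namespace below are hypothetical; the path is taken from the executionLocation value, as described in that step.

```python
from urllib.parse import urlsplit


def path_from_execution_location(execution_location):
    """Extract the path part of an executionLocation URL."""
    return urlsplit(execution_location).path


def internal_endpoint(service_name, namespace, path):
    """Fill in the k8s endpoint template; the service is assumed to listen on port 80."""
    if not path.startswith("/"):
        path = "/" + path
    return f"http://{service_name}.{namespace}.svc.cluster.local{path}"


# Hypothetical service name and namespace, for illustration only.
path = path_from_execution_location("http://localhost:8080/api/elg/v1")
print(internal_endpoint("my-mt-service", "my-namespace", path))
```

The port that appears in executionLocation is dropped on purpose: inside the cluster the container port is mapped to port 80, so the endpoint carries no explicit port.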

Frequently Asked Questions

Question: What is a k8s namespace and when should an LT Provider ask for one?
Answer: A k8s namespace is a virtual sub-cluster, which can be used to restrict access to the containers that run within it. You should ask for a dedicated namespace (in the ELG k8s cluster) when you need to ensure isolation and security, i.e., to limit access to your container, logs, etc.
Question: The image that I have created is not publicly available. Is it possible to register it to the ELG platform?
Answer: Yes, it can be registered. A k8s secret containing the required credentials will be created for the namespace in which your image is going to be deployed. k8s will then be able to pull the image and deploy it.
Question: Are there any requirements for executionLocation? For example, does an IE tool have to expose a specific path or use a specific port?
Answer: No, you can use any valid port or path. This holds for any kind of LT tool (IE, MT, ASR, etc.). The internal container port will be mapped (via port mapping) to port 80. Remember that the endpoint of the LT service follows this pattern: http://{k8s service name for the registered LT tool}.{k8s namespace for the registered LT tool}.svc.cluster.local{the path where the REST service is running at}, which assumes that the service is exposed on port 80.
Question: I have n different versions of the same IE LT tool, e.g., one version per language. How should I register them to the platform? Do I have to create one Docker image with all the different versions, or one image per version?
Answer: Both are possible. In both cases you will have to provide a separate metadata record for each LT tool. However, when the tools are packaged together, all metadata records must point to the same image location (dockerDownloadLocation), and each tool has to listen on a different HTTP endpoint (executionLocation) but on the same port (for simplicity), e.g., “http://localhost:8080/NamedEntityRecognitionEN” and “http://localhost:8080/NamedEntityRecognitionDE”.
Question: Should the Docker image that I will provide have a specific tag?
Answer: The images that are stored in GitLab or DockerHub are not immutable, even when they have been assigned a specific/custom tag; their creators can overwrite them. ELG (currently) does not have a private Docker registry that caches images. Therefore, when ELG later spawns a new instance of an LT service, it might download (pull) and use an image that is no longer ELG compatible because it has been overwritten (e.g., by accident). So, yes, it is recommended (but not enforced) to give the image that you register a custom tag dedicated to ELG, since the :latest tag is the one most commonly overwritten.
Question: How many resources will be allocated for my LT container in the k8s cluster?
Answer: By default, 512MB of RAM and half a CPU core. If your LT service requires more resources, specify this with the additionalHWRequirements metadata element (see the MT example above) or contact the ELG administrators.
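
The tagging recommendation in the FAQ above can be checked locally before registration. The following is a minimal sketch, not an ELG tool: the function names and the image references are hypothetical, and the check only inspects the reference string (it does not contact any registry). It relies on the fact that Docker falls back to the latest tag when no tag is given.

```python
def image_tag(image_ref):
    """Return the tag of a Docker image reference, or None if no tag is given."""
    name = image_ref.split("@", 1)[0]        # drop an optional digest
    last_component = name.rsplit("/", 1)[-1]  # ignore a registry host:port prefix
    if ":" in last_component:
        return last_component.rsplit(":", 1)[1]
    return None  # Docker would use "latest" in this case


def has_custom_tag(image_ref):
    """True if the reference carries an explicit tag other than 'latest'."""
    tag = image_tag(image_ref)
    return tag is not None and tag != "latest"


# Hypothetical image references, for illustration only.
print(has_custom_tag("registry.example.com:5000/group/ner:elg-1.0"))
print(has_custom_tag("group/ner:latest"))
print(has_custom_tag("group/ner"))
```

A custom tag does not make the image immutable; it only lowers the chance that the image ELG pulls is overwritten by a routine rebuild.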