Contribute an ELG compatible service

This page describes how to contribute a language technology service to run on the cloud platform of the European Language Grid.

Currently, ELG supports the integration of tools/services that fall into one of the following broad categories:

  • Information Extraction (IE) : Services that take text and annotate it with metadata on specific segments, e.g. Named Entity Recognition (NER), the task of extracting persons, locations, and organizations from a given text.

  • Text Classification (TC) : Services that take text and return a classification for the given text from a finite set of classes, e.g. Text Categorization which is the task of categorizing text into (usually labelled) organized categories.

  • Machine Translation (MT) : Services that take text in one language and translate it into text in another language, possibly with additional metadata associated with each segment (sentence, phrase, etc.).

  • Automatic Speech Recognition (ASR) : Services that take audio as input and produce text (e.g., a transcription) as output, possibly with metadata associated with each segment.

  • Text-to-Speech Generation (TTS) : Services that take text as input and produce audio as output.

Overview: How an LT Service is integrated into ELG

An overview of the ELG platform is depicted below.

Platform overview

The following bullets summarize how LT services are deployed and invoked in ELG.

  • All LT Services (as well as all the other ELG components) are deployed (run as containers) on a Kubernetes (k8s) cluster; k8s is a system for automating deployment, scaling, and management of containerised applications.

  • All LT Services are integrated into ELG via the LT Service Execution Orchestrator/Server. This server exposes a common public REST (Representational State Transfer) API that is used for invoking any of the deployed backend LT Services. The public API is used by ELG’s Trial UIs, which are embedded in the ELG Catalogue; it can also be invoked from the command line or from any programming language (for more information, see Use an LT service). Some of the HTTP endpoints offered by the API are listed below; for more information, see Public LT API specification.

| Endpoint | Type | Consumes | Produces |
| --- | --- | --- | --- |
| https://{domain}/execution/processText/{ltServiceID} | POST | ‘application/json’ | ‘application/json’ |
| https://{domain}/execution/processText/{ltServiceID} | POST | ‘text/plain’ or ‘text/html’ | ‘application/json’ |
| https://{domain}/execution/processAudio/{ltServiceID} | POST | ‘audio/x-wav’ or ‘audio/wav’ | ‘application/json’ |
| https://{domain}/execution/processAudio/{ltServiceID} | POST | ‘audio/mpeg’ | ‘application/json’ |

{domain} is ‘live.european-language-grid.eu’ and {ltServiceID} is the ID of the backend LT service. This ID is assigned/configured during registration; see section 3. Manage and submit the service for publication - ‘LT Service is deployed to ELG and configured’ step.
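As an illustration of calling the public API from a programming language, here is a minimal Python sketch that uses only the standard library. It builds a POST request to the processText endpoint with a plain-text body and parses the JSON that the endpoint produces. The service ID "1234" is a placeholder, and any authentication header the public API may require is deliberately not shown; Use an LT service and the Public LT API specification remain the authoritative references.

```python
import json
import urllib.request

DOMAIN = "live.european-language-grid.eu"


def build_process_text_request(lt_service_id: str, text: str,
                               domain: str = DOMAIN) -> urllib.request.Request:
    """Build (without sending) a POST to the processText endpoint with a text/plain body."""
    url = f"https://{domain}/execution/processText/{lt_service_id}"
    return urllib.request.Request(
        url,
        data=text.encode("utf-8"),
        headers={"Content-Type": "text/plain"},
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and parse the JSON ('application/json') response."""
    with urllib.request.urlopen(request) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Usage (placeholder service ID; authentication, if required, is not shown):
# result = send(build_process_text_request("1234", "Angela Merkel visited Paris."))
```

Building the request separately from sending it makes the URL and headers easy to inspect before any network call is made.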

Note

The REST API exposed by an LT Service X (see above) is used for the communication between the LT Service Execution Orchestrator/Server and X; it is an ELG internal API (see Internal LT Service API specification).

  • When the LT Service Execution Orchestrator receives a processing request for service X, it retrieves X’s k8s REST endpoint from the database and sends a request to it. This endpoint is configured during the registration process; see section 3. Manage and submit the service for publication - ‘LT Service is deployed to ELG and configured’ step. When the Orchestrator gets the response from the LT Service, it returns it to the application/client that sent the initial call.
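The routing step described above can be pictured with a toy Python model. This is not the Orchestrator’s actual code: a plain dictionary stands in for the database that maps each LT service ID to its k8s REST endpoint, and an injected transport function stands in for the HTTP call, so the forwarding logic can run without a cluster. The service ID and endpoint URL in the usage comment are hypothetical.

```python
from typing import Callable, Dict

# A transport takes (endpoint URL, request body) and returns the service's JSON reply.
Transport = Callable[[str, dict], dict]


class ToyOrchestrator:
    """Toy model of the routing described above (not ELG's implementation)."""

    def __init__(self, endpoints: Dict[str, str], transport: Transport) -> None:
        # Stands in for the database: LT service ID -> k8s REST endpoint.
        self._endpoints = endpoints
        self._transport = transport

    def process(self, lt_service_id: str, request: dict) -> dict:
        endpoint = self._endpoints.get(lt_service_id)
        if endpoint is None:
            raise LookupError(f"unknown LT service ID: {lt_service_id}")
        # Forward the request to the service and relay its response unchanged.
        return self._transport(endpoint, request)


# Usage with a fake transport (hypothetical ID and endpoint):
# orch = ToyOrchestrator(
#     {"1234": "http://my-ner.my-namespace.svc.cluster.local/process"},
#     transport=lambda url, body: {"response": {"type": "annotations", "annotations": {}}},
# )
# orch.process("1234", {"type": "text", "content": "Hello"})
```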

0. Before you start

  • Please make sure that the service you want to contribute complies with our terms of use.

  • Please make sure you have registered and been assigned the provider role.

  • Please make sure that your service meets the technical requirements below, and choose one of the three integration options.

Technical requirements and integration options

The requirements for integrating an LT tool/service to ELG are the following:

Expose an ELG compatible endpoint: You MUST create an application that exposes an HTTP endpoint for the provided LT tool(s). The application MUST consume (via the aforementioned HTTP endpoint) requests that follow the ELG JSON format, call the underlying LT tool and produce responses again in the ELG JSON format. For a detailed description of the JSON-based HTTP protocol (ELG Internal LT API) that you have to implement, see the Internal LT API specification.
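To make this requirement concrete, the following is a minimal sketch of an ELG-compatible endpoint written with Python’s standard library only. It assumes the text request and annotations response shapes as we read them in the Internal LT API specification (a request with "type": "text" and "content", a reply wrapped in "response" with "type": "annotations", and a "failure" object for errors); the specification remains authoritative. The “LT tool” is a deliberately trivial stand-in that marks capitalised words, and the path /process, port 8080 and the error code string are illustrative assumptions.

```python
import json
import re
from http.server import BaseHTTPRequestHandler, HTTPServer

PATH = "/process"  # illustrative; would appear in executionLocation as http://localhost:8080/process
_CAPITALISED = re.compile(r"[A-Z][a-z]+")


def process(request: dict) -> dict:
    """Turn an ELG text request into an ELG annotations (or failure) response."""
    if request.get("type") != "text":
        # Error code modelled on ELG's standard messages (assumption; check the spec).
        return {"failure": {"errors": [{
            "code": "elg.request.type.unsupported",
            "text": "Request type {0} not supported by this service",
            "params": [str(request.get("type"))],
        }]}}
    text = request.get("content", "")
    # Stand-in "LT tool": mark every capitalised word as an Entity annotation.
    entities = [{"start": m.start(), "end": m.end()} for m in _CAPITALISED.finditer(text)]
    return {"response": {"type": "annotations", "annotations": {"Entity": entities}}}


class ELGHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        if self.path != PATH:
            self.send_error(404, "Unknown path")
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            body = json.loads(self.rfile.read(length))
        except ValueError:  # covers invalid JSON and undecodable bytes
            body = None
        if not isinstance(body, dict):
            self.send_error(400, "Expected a JSON object")
            return
        payload = json.dumps(process(body)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(host: str = "0.0.0.0", port: int = 8080) -> None:
    """Start the server (blocks until interrupted)."""
    HTTPServer((host, port), ELGHandler).serve_forever()
```

Calling serve() starts the server. Keeping the request handling in the pure process function means it can be tested without opening a socket.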

Dockerisation: You MUST dockerise the application and upload the respective image(s) to a Docker registry, such as GitLab, DockerHub, Azure Container Registry, etc. You MAY select whichever of the following three options best fits your needs:

  • LT tools packaged in one standalone image: One docker image is created that contains the application that exposes the ELG-compatible endpoint and the actual LT tool.

  • LT tools running remotely outside the ELG infrastructure 1 : For these tools, one proxy image is created that exposes one (or more) ELG-compatible endpoints; the proxy container communicates with the actual LT service that runs outside the ELG infrastructure.

  • LT tools requiring an adapter: For tools that already offer an image that exposes a non-ELG compatible endpoint (HTTP-based or other), a second adapter image SHOULD be created that exposes an ELG-compatible endpoint and acts as proxy to the container that hosts the actual LT tool.
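For the adapter option, the adapter’s job is essentially format translation in both directions. The Python sketch below assumes a hypothetical wrapped tool that accepts {"text": ...} and returns {"entities": [...]} items with "begin", "end" and "label"; neither format comes from ELG or from a real tool, and call_tool stands in for the HTTP (or other) call the adapter would make to the tool’s container. The ELG response shape follows our reading of the Internal LT API specification.

```python
from collections import defaultdict
from typing import Callable

# Hypothetical native format of the wrapped tool (not an ELG or real-tool format):
#   request:  {"text": "..."}
#   response: {"entities": [{"begin": 0, "end": 6, "label": "Person"}, ...]}


def elg_to_native(elg_request: dict) -> dict:
    """Translate an ELG text request into the wrapped tool's own request."""
    return {"text": elg_request["content"]}


def native_to_elg(native_response: dict) -> dict:
    """Group the tool's entities by label into an ELG annotations response."""
    annotations = defaultdict(list)
    for ent in native_response.get("entities", []):
        annotations[ent["label"]].append({"start": ent["begin"], "end": ent["end"]})
    return {"response": {"type": "annotations", "annotations": dict(annotations)}}


def adapt(elg_request: dict, call_tool: Callable[[dict], dict]) -> dict:
    """call_tool stands in for the call to the container hosting the actual tool."""
    return native_to_elg(call_tool(elg_to_native(elg_request)))
```

The same two translation functions would sit behind an HTTP handler in the adapter image; only call_tool changes between a test and a deployment.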

The following diagram shows the three different options for integrating an LT tool:

Integration options

1. Dockerize your service

Build/Store Docker images

Ideally, the source code of your LT tool/service already resides on GitLab, where a built-in Continuous Integration (CI) Runner can take care of building the image. GitLab also offers a container registry that can be used for storing the built image. For this, you need to add a .gitlab-ci.yml file as well as a Dockerfile (i.e., the recipe for building the image) at the root level of your GitLab repository. Here you can find an example. After each new commit, the CI Runner is automatically triggered and runs the CI pipeline that is defined in .gitlab-ci.yml. You can see the progress of the pipeline on the respective page in the GitLab UI (“CI / CD -> Jobs”); when it completes successfully, you can also find the image under “Packages -> Container Registry”.

Your image can also be built and tagged on your machine by running the docker build command. It can then be uploaded (with docker push) to the GitLab registry, DockerHub (a public Docker registry) or any other Docker registry.

For instance, for this GitLab-hosted project, the commands would be:

  • docker login registry.gitlab.com

to log in, so that you are allowed to push an image;

  • docker build -t registry.gitlab.com/european-language-grid/dfki/elg-jtok .

to build an image (locally) for the project. Note that before running docker build you have to download (clone) a copy of the project and be in its top-level directory (elg-jtok); the trailing . tells docker build to use the current directory as the build context;

  • docker push registry.gitlab.com/european-language-grid/dfki/elg-jtok

to push the image to GitLab.
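If you script the build instead of typing the commands, a small Python helper can assemble them. The sketch below also applies the dedicated-tag advice given in the FAQ at the end of this page; the tag elg-1.0 is a hypothetical example. The registry host for docker login is taken from the first component of the image name, following Docker’s convention that a component containing a dot or colon (or equal to localhost) names a registry host; otherwise plain docker login targets Docker Hub.

```python
import subprocess
from typing import List


def docker_commands(image: str, tag: str) -> List[List[str]]:
    """Return the login, build and push commands for image:tag."""
    ref = f"{image}:{tag}"
    first = image.split("/", 1)[0]
    login = ["docker", "login"]
    # Pass an explicit registry only when the first component looks like a host.
    if "/" in image and ("." in first or ":" in first or first == "localhost"):
        login.append(first)
    return [
        login,
        ["docker", "build", "-t", ref, "."],  # "." = build context (repository root)
        ["docker", "push", ref],
    ]


def run_all(commands: List[List[str]], repo_dir: str = ".") -> None:
    """Run each command inside the cloned repository, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, cwd=repo_dir, check=True)


# Usage (hypothetical tag; run from the cloned elg-jtok directory):
# run_all(docker_commands("registry.gitlab.com/european-language-grid/dfki/elg-jtok", "elg-1.0"))
```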

The following links give some more information on docker commands, plus some examples:

Dockerization of a Python-based LT tool

An example of a Python-based LT tool: Python-based example.

Dockerization of a Java-based tool

A Spring Boot starter to make it as easy as possible to create ELG-compliant tools in Java is provided at: ELG Spring Boot Starter.

2. Describe and register the service at ELG

You can describe and register the service in one of two ways: with the interactive editor, or by uploading a metadata (XML) file.

In both modes, you MUST indicate that it is an ELG compatible service. More specifically, if you use the interactive editor, select the Service or Tool form and, when prompted, select Yes.

Select "yes" for ELG-compatible services

If you decide to upload a metadata file, you MUST check the box next to ELG-compatible service at the upload page.

Upload metadata XML

Depending on your answer, the respective box will or will not be checked in the editor. You can change your decision at any time through the editor form.

ELG-compatible service tickbox

The service MUST be described according to the ELG schema and include at least the mandatory metadata elements.

The following figure gives an overview of the metadata elements you must provide 2 for an ELG-compatible service, replicating the editor (with sections horizontally and tabs vertically) so that you can easily track each element. In the editor, all elements, mandatory or not, are explained by definitions and examples.

ELG compatible service at a glance

To describe any resource efficiently, you need to name it, provide a short description of it and indicate its version 3. You are then asked for one or more keywords for the resource and an email address or a landing page for anyone who wishes to have additional information about it.

For services, you must also specify the function (i.e., the task it performs, e.g. Named Entity Recognition, Machine Translation, Speech Recognition, etc.) and supply the technical specifications of its input, at least the resource type it processes (e.g. corpus, lexical/conceptual resource, etc.). You must also indicate whether it is language independent and, if not, specify the input language(s). Depending on the function, you may be required to add further information, e.g. the language(s) of the output resource for Machine Translation services.

You also have to describe each distributable form of the service separately (i.e. each way in which the user can obtain it, e.g. in a downloadable form, as a file with the source code, or as a docker image). For each distribution, you must always specify the licence under which it is made available. In the case of ELG compatible services, one Software Distribution with the following elements MUST be included in the metadata record. The editor will guide you through the process of filling them in.

  • Software distribution form (SoftwareDistributionForm): For ELG compatible services, use the value docker image (http://w3id.org/meta-share/meta-share/dockerImage).

  • Docker download location (dockerDownloadLocation): Add the location from where the ELG team can download the docker image in order to integrate it in the platform.

  • Service adapter download location (serviceAdapterDownloadLocation): The URL from which the docker image of the service adapter can be downloaded. Required only for ELG integrated services implemented with an adapter.

  • Execution location (executionLocation): Add here the REST endpoint at which the LT tool is exposed within the Docker image.

  • Private (privateResource): Specifies whether the resource is private so that its access/download location remains hidden when the item is published in the ELG catalogue.

  • Additional h/w requirements (additionalHwRequirements): A short text where you specify additional requirements for running the service, e.g. memory requirements, etc. The recommended format for this is: ‘limits_memory: X limits_cpu: Y’.
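Before submitting, it can help to sanity-check these values. The Python sketch below is a hypothetical pre-submission check, not part of ELG: it represents the distribution as a flat dictionary keyed by the element names listed above (the real metadata record is structured differently) and only checks points stated on this page. Requiring executionLocation to be a full http(s) URL is our assumption, based on the examples in the FAQ. The editor’s own validation remains authoritative.

```python
from urllib.parse import urlparse

DOCKER_IMAGE = "http://w3id.org/meta-share/meta-share/dockerImage"


def check_distribution(dist: dict, uses_adapter: bool = False) -> list:
    """Return a list of problems found in a simplified distribution record."""
    problems = []
    if dist.get("SoftwareDistributionForm") != DOCKER_IMAGE:
        problems.append("SoftwareDistributionForm should be docker image")
    if not dist.get("dockerDownloadLocation"):
        problems.append("dockerDownloadLocation is missing")
    location = dist.get("executionLocation")
    if not location:
        problems.append("executionLocation is missing")
    elif urlparse(location).scheme not in ("http", "https"):
        # Assumption: a full URL such as http://localhost:8080/process is expected.
        problems.append("executionLocation should be an http(s) URL")
    if uses_adapter and not dist.get("serviceAdapterDownloadLocation"):
        problems.append("serviceAdapterDownloadLocation is required with an adapter")
    return problems
```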

3. Manage and submit the service for publication

Through the My items page you can access your metadata record (see Manage your items) and edit it until you are satisfied. You can then submit it for publication, in line with the publication lifecycle defined for ELG metadata records.

At this stage, the metadata record can no longer be edited and is only visible to you and to us, the ELG platform administrators.

Before it is published, the service undergoes a validation process, which is described in detail at CHAPTER 4: VALIDATING ITEMS.

During this process, the service is deployed to ELG, configured and tested to ensure it conforms to the ELG technical specifications. We describe here the main steps in this process:

  • LT Service is deployed to ELG and configured: The LT service is deployed (by the validator) to the k8s cluster by creating the appropriate configuration YAML file and uploading it to the respective GitLab repository. The CI/CD pipeline that is responsible for deployments then automatically installs the new service on the k8s cluster. If you request it, a separate, dedicated k8s namespace can be created for the LT service before the YAML file is created. The validator of the service assigns to it:

    • the k8s REST endpoint that will be used for invoking it, according to the following template: http://{k8s service name for the registered LT tool}.{k8s namespace for the registered LT tool}.svc.cluster.local{the path where the REST service is running at}. The {the path where the REST service is running at} part can be found in the executionLocation field of the metadata. For instance, for Edinburgh’s MT tool above it is ‘/api/elg/v1’.

    • an ID that will be used to call it.

    • the “try out” UI that will be used for testing it and visualizing the returned results.

  • LT Service is tested: On the service’s landing page, there is a Try out tab and a Code samples tab, both of which can be used to test the service with some input; see the Use an LT service section. The validator can help you identify and resolve integration issues. This process continues until the LT service is correctly integrated into the platform. It may require the validator to have access to the k8s cluster (e.g., to check container start-ups/failures, logs, etc.).

  • LT Service is published: When the LT service works as expected, the validator will approve it; the metadata record is then published and visible to all ELG users through the catalogue.
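The endpoint template used in the deployment step can be expressed as a small Python function. It assumes executionLocation holds either a full URL such as http://localhost:8080/api/elg/v1 or just a path; only the path is kept because, as the FAQ below explains, the container port is mapped to port 80, so the template carries no port. The service and namespace names in the example are hypothetical.

```python
from urllib.parse import urlparse


def k8s_endpoint(service_name: str, namespace: str, execution_location: str) -> str:
    """Fill in http://{service}.{namespace}.svc.cluster.local{path}."""
    # Keep only the path: the container port is mapped to 80, so no port appears.
    # Falling back to "/" when there is no path is an assumption of this sketch.
    path = urlparse(execution_location).path or "/"
    return f"http://{service_name}.{namespace}.svc.cluster.local{path}"


# Example (hypothetical names):
# k8s_endpoint("my-mt", "my-namespace", "http://localhost:8080/api/elg/v1")
# -> "http://my-mt.my-namespace.svc.cluster.local/api/elg/v1"
```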

Frequently asked questions

Question: What is a k8s namespace and when should an LT Provider ask for one?
Answer: A k8s namespace is a virtual sub-cluster, which can be used to restrict access to the containers that run within it. You should ask for a dedicated namespace (in the ELG k8s cluster) when you need to ensure isolation and security, i.e., to limit access to your containers, logs, etc.
Question: The image that I have created is not publicly available. Is it possible to register it to the ELG platform?
Answer: Yes, it can be registered. A k8s secret containing the required credentials will be created for the namespace in which your image is going to be deployed. k8s will then be able to pull the image and deploy it.
Question: Are there any requirements for executionLocation? For example, does an IE tool have to expose a specific path or use a specific port?
Answer: No, you can use any valid port or path. This holds for any kind of LT tool (IE, MT, ASR, etc.). The internal container port will be mapped (via port mapping) to port 80. Remember that the endpoint of the LT service follows this pattern: http://{k8s service name for the registered LT tool}.{k8s namespace for the registered LT tool}.svc.cluster.local{the path where the REST service is running at}, which assumes that the service is exposed on port 80.
Question: I have n different versions of the same IE LT tool, e.g., one version per language. How should I register them to the platform? Do I have to create one Docker image with all the different versions, or one image per version?
Answer: Both are possible. In both cases, you will have to provide a separate metadata record for each LT tool. However, if the tools are packaged together, all metadata records must point to the same image location (dockerDownloadLocation), and each tool has to listen on a different HTTP endpoint (executionLocation) but on the same port (for simplicity), e.g., http://localhost:8080/NamedEntityRecognitionEN and http://localhost:8080/NamedEntityRecognitionDE.
Question: Should the Docker image that I will provide have a specific tag?
Answer: The images that are stored in GitLab or DockerHub are not immutable, even when they have been assigned a specific/custom tag; thus, they may be overwritten (by their creators). ELG (currently) does not have a private Docker registry that caches images. Therefore, when ELG tries (at some point) to spawn a new instance of an LT service, it might download (pull) and use an image that is no longer ELG compatible because it has been overwritten (e.g. by accident). So, yes, it is recommended (but not enforced) to give the image that you register a custom tag dedicated to ELG, since :latest is the tag that is most commonly overwritten.
Question: How many resources will be allocated for my LT container in the k8s cluster?
Answer: By default, 512MB of RAM and half a CPU core. If your LT service requires more resources, you have to specify them by using the additionalHwRequirements metadata element (see the MT example above) or by contacting the ELG administrators.
Question: What is a YAML file and what does it contain?
Answer: Each service has a YAML file which contains information about the allocated resources in the k8s cluster (see question above) and the scaling parameters (whether it is readily available at all times or started on demand).
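The default allocation and the recommended additionalHwRequirements format (‘limits_memory: X limits_cpu: Y’, see section 2) can be combined into a small helper that shows which limits would apply. This is a hypothetical Python sketch, not how ELG builds the YAML file: it keeps any value it does not find at the default stated above and passes units through unchanged without validating them. Values such as 4Gi in the tests are illustrative.

```python
import re

# Defaults stated in the FAQ: 512MB of RAM and half a CPU core.
DEFAULTS = {"limits_memory": "512MB", "limits_cpu": "0.5"}
_FIELD = re.compile(r"\b(limits_memory|limits_cpu)\s*:\s*(\S+)")


def effective_resources(additional_hw_requirements: str = "") -> dict:
    """Overlay values from 'limits_memory: X limits_cpu: Y' onto the defaults."""
    resources = dict(DEFAULTS)
    for key, value in _FIELD.findall(additional_hw_requirements or ""):
        resources[key] = value  # units are passed through unchanged
    return resources
```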
1

Services running remotely outside the ELG infrastructure are marked as such with a tag on their view page.

2

You must fill in at least the mandatory elements for the metadata record to be saved. In addition, depending on the values you provide for other elements, you may be required to fill in specific “mandatory if applicable” elements (indicated in the figure with an asterisk).

3

If no version number is provided, the system will automatically number it as “1.0.0” with an indication that it has been automatically assigned. We recommend, however, the use of Semantic Versioning (https://semver.org/) for labelling versions.