Quickstart

The SDK contains a class for each main ELG functionality.

The Catalog class browses the catalogue, the Entity class represents an ELG entity (i.e., an ELG resource), the Service class runs ELG services, and so on.

These classes can be imported directly from the elg package as follows:

[1]:
from elg import Catalog, Entity, Service, Authentication, Corpus

Browsing the catalogue

First, initialize a Catalog object.

Then use the search method to search for resources. This method returns the matching Entity objects, each of which can be displayed individually.

For example, we can search for a Machine Translation service for English and French.

[2]:
catalog = Catalog()

# Search the catalogue; each result is an Entity
results = catalog.search(
    resource = "Tool/Service", # "Corpus", "Lexical/Conceptual resource" or "Language description"
    function = "Machine Translation", # function should be passed only if resource is set to "Tool/Service"
    languages = ["en", "fr"], # string or list if multiple languages
    limit = 100,
)
print(f"Machine Translation service for English and French:\n{list(results)[0]}")
Machine Translation service for English and French:
----------------------------------------------------------------------
Id             597
Name           Transformer en-fr: Machine Translation Model Trained
               Using Tensor2tensor
Resource type  Tool/Service
Entity type    LanguageResource
Description    Transformer en-fr translation model performs automatic
               translation of raw text from en to fr
Licences       ['Apache License 2.0']
Languages      ['English', 'French']
Status         None
----------------------------------------------------------------------

As another example, we can search for a German NER corpus.

[3]:
results = catalog.search(
    resource = "Corpus", # "Corpus", "Lexical/Conceptual resource" or "Language description"
    languages = ["German"], # string or list if multiple languages
    search="ner",
    limit = 100,
)
print(f"German corpus for NER:\n{list(results)[0]}")
German corpus for NER:
----------------------------------------------------------------------
Id             5010
Name           GermEval 2014 NER Shared Task
Resource type  Corpus
Entity type    LanguageResource
Description    The data was sampled from German Wikipedia and News
               Corpora as a collection of citations.The dataset covers
               over 31,000 sentences corresponding to over 590,000
               tokens.
Licences       ['Creative Commons Attribution 4.0 International']
Languages      ['German']
Status         None
----------------------------------------------------------------------
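Both examples above take the first hit with list(results)[0], which raises an IndexError when the search matches nothing. A minimal sketch of a more defensive pattern follows. Note that first_or_none is a hypothetical helper, not part of the elg package, and it assumes only that the search results are iterable.

```python
from typing import Iterable, Optional, TypeVar

T = TypeVar("T")


def first_or_none(results: Iterable[T]) -> Optional[T]:
    """Return the first item of any iterable, or None if it is empty.

    Hypothetical helper (not part of the elg package). It works for lists
    and for lazy iterators alike, and it consumes at most one item.
    """
    return next(iter(results), None)


# Stand-in data so the sketch runs without network access; in practice
# `results` would be the value returned by catalog.search(...).
print(first_or_none(["entity A", "entity B"]))  # a search with hits
print(first_or_none(iter([])))                  # a search with no hits
```

Checking the returned value against None before printing or using it avoids an exception on an empty search.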

Using the ELG services

You can use the ELG services directly in Python. Every ELG service can be initialized using its id and then called with your custom input.

[4]:
lt = Service.from_id(474)
result = lt("Nikolas Tesla lives in Berlin.")
print(f"\n{result}")
Calling:
        [474] Cogito Discover Named Entity Recognizer
with request:
        type: text - content: Nikolas Tesla lives in Berlin. - mimeType: text/plain


type='annotations' warnings=None features=None annotations={'People': [Annotation(start=0, end=13, source_start=None, source_end=None, features={'SURNAME': 'Tesla', 'SEX': 'M', 'name': 'Nikolas Tesla', 'NAME': 'Nikolas'})], 'Place': [Annotation(start=23, end=29, source_start=None, source_end=None, features={'Lemma': 'Berlin', 'name': 'Berlin', 'Glossa': 'Staatshauptstadt in Berlin (Deutschland/Europa', 'GEOREF': 'Berlin/Deutschland/Europa'})]}
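The start and end values in the annotations above appear to be character offsets into the input text. This is an assumption based on the output shown here, not a documented guarantee. Under that assumption, the annotated spans can be recovered by slicing the input string. The sketch below uses plain tuples copied from the output rather than the SDK's Annotation objects, so it runs without calling the service; spans_by_label is a hypothetical helper.

```python
from typing import Dict, List, Tuple

text = "Nikolas Tesla lives in Berlin."

# (start, end) offsets copied by hand from the service output above.
offsets: Dict[str, List[Tuple[int, int]]] = {
    "People": [(0, 13)],
    "Place": [(23, 29)],
}


def spans_by_label(
    text: str, offsets: Dict[str, List[Tuple[int, int]]]
) -> Dict[str, List[str]]:
    """Map each annotation label to the substrings its offsets cover.

    Assumes offsets are character positions with an exclusive end,
    which matches Python's slicing convention.
    """
    return {
        label: [text[start:end] for start, end in pairs]
        for label, pairs in offsets.items()
    }


spans = spans_by_label(text, offsets)
print(spans)
```

With the offsets above, the slices are "Nikolas Tesla" for People and "Berlin" for Place, which agrees with the name features in the service output.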

A service can also be initialized from a catalogue search result.

[5]:
catalog = Catalog()
results = catalog.search(
    resource = "Tool/Service",
    function = "Machine Translation",
    languages = ["en", "fr"],
    limit = 1,
)
service = Service.from_entity(next(results))
print(service)
----------------------------------------------------------------------
Id             597
Name           Transformer en-fr: Machine Translation Model Trained
               Using Tensor2tensor
Resource type  Tool/Service
Entity type    LanguageResource
Description    Transformer en-fr translation model performs automatic
               translation of raw text from en to fr
Licences       ['Apache License 2.0']
Languages      ['English', 'French']
Status         None
----------------------------------------------------------------------

Different types of input can be used to call a service. It is also possible to authenticate with a specific scope to obtain an offline token that never expires.

Downloading a corpus

You can use the Python SDK to download ELG corpora.

[6]:
corpus = Corpus.from_id(913)
corpus.download()
Downloading:
        [913] 2006 CoNLL Shared Task - Ten Languages

Please, visit the licence of this corpus distribution by clicking: https://live.european-language-grid.eu/catalogue_backend/static/project/licences/ELG-ENT-LIC-050320-00000769.pdf

Do you accept the licence terms: (yes/[no]): yes

Downloading the corpus distribution to 2006_CoNLL_Shared_Task_Ten_Languages.zip:
100%|██████████| 19.0M/19.0M [00:03<00:00, 4.98MiB/s]

As with services, corpora can be initialized directly from catalogue search results.
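A rough sketch of that pattern is given below. The text above does not name the method involved; by analogy with Service.from_entity, the sketch assumes a Corpus.from_entity class method, which should be checked against the SDK reference. To stay runnable without network access, the constructor is passed in as an argument and exercised with a stand-in; corpus_from_first_result is a hypothetical helper, not part of the elg package.

```python
from typing import Any, Callable, Iterable, Optional


def corpus_from_first_result(
    results: Iterable[Any],
    make_corpus: Callable[[Any], Any],
) -> Optional[Any]:
    """Build a corpus object from the first search result, if there is one.

    `make_corpus` is the constructor applied to the entity. With the SDK
    this would presumably be Corpus.from_entity (an assumption).
    """
    entity = next(iter(results), None)
    if entity is None:
        return None  # the search matched nothing
    return make_corpus(entity)


# Stand-ins so the sketch runs offline. With the SDK installed, the call
# would look roughly like this (Corpus.from_entity is assumed):
#   results = Catalog().search(resource="Corpus", languages=["German"],
#                              search="ner", limit=1)
#   corpus = corpus_from_first_result(results, Corpus.from_entity)
#   if corpus is not None:
#       corpus.download()
demo = corpus_from_first_result(["GermEval 2014"], lambda e: f"Corpus({e})")
print(demo)
```

Passing the constructor as a parameter keeps the helper independent of the SDK, so the same function would also work with Service.from_entity.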