Quickstart
The SDK contains a class for each main ELG functionality: the Catalog class is for browsing the catalogue, the Entity class represents an ELG entity (i.e., an ELG resource), the Service class is for calling ELG services, the Corpus class is for downloading corpora, the Authentication class handles authentication, and so on.
These classes can be imported directly from the elg package as follows:
from elg import Catalog, Entity, Service, Authentication, Corpus
Browsing the catalogue
First, initialize a Catalog object.
Then use its search method to look for resources. The method returns an iterator of Entity objects (not a list), and each Entity can be printed individually.
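Since the search results are yielded lazily rather than returned as a list, it can be convenient to take only as many as you need. Below is a minimal, standard-library-only sketch; first_or_none, take, and the stand-in generator fake_search are hypothetical names used for illustration (they are not part of the elg package), and fake_search simply replaces a live catalog.search(...) call so the example runs offline.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Optional, TypeVar

T = TypeVar("T")


# Hypothetical helpers, not part of the elg package
def first_or_none(results: Iterable[T]) -> Optional[T]:
    """Return the first result, or None if the search yielded nothing."""
    return next(iter(results), None)


def take(results: Iterable[T], n: int) -> List[T]:
    """Return at most n results without consuming the rest of the iterator."""
    return list(islice(results, n))


def fake_search() -> Iterator[str]:
    """Hypothetical stand-in for catalog.search(...): yields results one at a time."""
    yield from ["result-1", "result-2", "result-3"]


print(first_or_none(fake_search()))  # result-1
print(take(fake_search(), 2))        # ['result-1', 'result-2']
print(first_or_none([]))             # None
```

With a real search, first_or_none(results) is an alternative to list(results)[0] that returns None instead of raising IndexError when nothing matches, and it does not build a list of every result.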
For example, we can search for a Machine Translation service for English and French.
catalog = Catalog()
# Search the catalogue; the matching resources are yielded as Entity objects
results = catalog.search(
    resource="Tool/Service",  # or "Corpus", "Lexical/Conceptual resource", "Language description"
    function="Machine Translation",  # function should only be passed if resource is "Tool/Service"
    languages=["en", "fr"],  # a string, or a list for multiple languages
    limit=100,
)
print(f"Machine Translation service for English and French:\n{list(results)[0]}")
Machine Translation service for English and French:
----------------------------------------------------------------------
Id 597
Name Transformer en-fr: Machine Translation Model Trained
Using Tensor2tensor
Resource type Tool/Service
Entity type LanguageResource
Description Transformer en-fr translation model performs automatic
translation of raw text from en to fr
Licences ['Apache License 2.0']
Languages ['English', 'French']
Status None
----------------------------------------------------------------------
Another example is searching for a German NER corpus.
results = catalog.search(
    resource="Corpus",  # or "Tool/Service", "Lexical/Conceptual resource", "Language description"
    languages=["German"],  # a string, or a list for multiple languages
    search="ner",  # search keyword
    limit=100,
)
print(f"German corpus for NER:\n{list(results)[0]}")
German corpus for NER:
----------------------------------------------------------------------
Id 5010
Name GermEval 2014 NER Shared Task
Resource type Corpus
Entity type LanguageResource
Description The data was sampled from German Wikipedia and News
Corpora as a collection of citations.The dataset covers
over 31,000 sentences corresponding to over 590,000
tokens.
Licences ['Creative Commons Attribution 4.0 International']
Languages ['German']
Status None
----------------------------------------------------------------------
Using the ELG services
You can use the ELG services directly in Python. Every ELG service can be initialized from its id and then called with your own input.
lt = Service.from_id(474)
result = lt("Nikolas Tesla lives in Berlin.")
print(f"\n{result}")
Calling:
[474] Cogito Discover Named Entity Recognizer
with request:
type: text - content: Nikolas Tesla lives in Berlin. - mimeType: text/plain
type='annotations' warnings=None features=None annotations={'People': [Annotation(start=0, end=13, source_start=None, source_end=None, features={'SURNAME': 'Tesla', 'SEX': 'M', 'name': 'Nikolas Tesla', 'NAME': 'Nikolas'})], 'Place': [Annotation(start=23, end=29, source_start=None, source_end=None, features={'Lemma': 'Berlin', 'name': 'Berlin', 'Glossa': 'Staatshauptstadt in Berlin (Deutschland/Europa', 'GEOREF': 'Berlin/Deutschland/Europa'})]}
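In the printed result, each Annotation carries start and end character offsets into the input text (with an exclusive end: 0 to 13 covers "Nikolas Tesla"), so the annotated spans can be recovered by slicing. A minimal sketch: annotated_spans is a hypothetical helper, and the Annotation namedtuple is a stand-in that only mimics the start, end, and features fields shown above rather than the SDK's own class. The printed result suggests the real dictionary is available as result.annotations, but treat that attribute name as an assumption.

```python
from collections import namedtuple
from typing import Dict, List, Tuple

# Stand-in mimicking the fields shown in the printed Annotation objects (not the SDK class)
Annotation = namedtuple("Annotation", ["start", "end", "features"])


# Hypothetical helper, not part of the elg package
def annotated_spans(text: str, annotations: Dict[str, list]) -> List[Tuple[str, str]]:
    """Return (annotation type, covered text) pairs using the character offsets."""
    spans = []
    for ann_type, anns in annotations.items():
        for ann in anns:
            # end is exclusive, so a plain slice recovers the annotated text
            spans.append((ann_type, text[ann.start:ann.end]))
    return spans


text = "Nikolas Tesla lives in Berlin."
annotations = {
    "People": [Annotation(start=0, end=13, features={"name": "Nikolas Tesla"})],
    "Place": [Annotation(start=23, end=29, features={"name": "Berlin"})],
}
print(annotated_spans(text, annotations))
# [('People', 'Nikolas Tesla'), ('Place', 'Berlin')]
```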
A service can also be initialized from a catalogue search result.
catalog = Catalog()
results = catalog.search(
    resource="Tool/Service",
    function="Machine Translation",
    languages=["en", "fr"],
    limit=1,
)
# Build a Service from the first Entity returned by the search
service = Service.from_entity(next(results))
print(service)
----------------------------------------------------------------------
Id 597
Name Transformer en-fr: Machine Translation Model Trained
Using Tensor2tensor
Resource type Tool/Service
Entity type LanguageResource
Description Transformer en-fr translation model performs automatic
translation of raw text from en to fr
Licences ['Apache License 2.0']
Languages ['English', 'French']
Status None
----------------------------------------------------------------------
Different types of input can be used to call a service. It is also possible to authenticate with a specific scope to obtain an offline token that will never expire.
Downloading a corpus
You can use the Python SDK to download ELG corpora.
corpus = Corpus.from_id(913)
corpus.download()
Downloading:
[913] 2006 CoNLL Shared Task - Ten Languages
Please, visit the licence of this corpus distribution by clicking: https://live.european-language-grid.eu/catalogue_backend/static/project/licences/ELG-ENT-LIC-050320-00000769.pdf
Do you accept the licence terms: (yes/[no]): yes
Downloading the corpus distribution to 2006_CoNLL_Shared_Task_Ten_Languages.zip:
100%|██████████| 19.0M/19.0M [00:03<00:00, 4.98MiB/s]
As with services, corpora can also be initialized directly from catalogue search results.
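The download log shows the corpus distribution being saved as a zip archive under a relative path (2006_CoNLL_Shared_Task_Ten_Languages.zip), apparently in the working directory. A minimal sketch for inspecting and unpacking such an archive with Python's standard zipfile module; list_corpus_files and extract_corpus are hypothetical helpers, and the demo writes a small throwaway archive so it runs without a real download.

```python
import os
import tempfile
import zipfile
from typing import List


# Hypothetical helper, not part of the elg package
def list_corpus_files(zip_path: str) -> List[str]:
    """Return the names of the regular files stored in a downloaded corpus archive."""
    with zipfile.ZipFile(zip_path) as archive:
        return [info.filename for info in archive.infolist() if not info.is_dir()]


# Hypothetical helper, not part of the elg package
def extract_corpus(zip_path: str, target_dir: str) -> None:
    """Unpack the archive into target_dir, creating the directory if needed."""
    os.makedirs(target_dir, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target_dir)


# Demo on a throwaway archive; with a real download, pass the .zip name from the log instead
with tempfile.TemporaryDirectory() as tmp:
    demo_zip = os.path.join(tmp, "demo_corpus.zip")
    with zipfile.ZipFile(demo_zip, "w") as archive:
        archive.writestr("data/train.txt", "token\tlabel\n")
        archive.writestr("README", "demo\n")
    print(list_corpus_files(demo_zip))  # ['data/train.txt', 'README']
    extract_corpus(demo_zip, os.path.join(tmp, "corpus"))
    print(os.path.exists(os.path.join(tmp, "corpus", "data", "train.txt")))  # True
```

For the download above, list_corpus_files("2006_CoNLL_Shared_Task_Ten_Languages.zip") would list the archive's contents before you decide where to extract it.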