Browsing the ELG catalogue
Search the ELG catalogue using Python
[1]:
from elg import Catalog
First, initialize a Catalog object.
[2]:
catalog = Catalog()
Then use the search method to look for resources. The method yields Entity objects, each of which can be displayed individually. For example, we can search for a Machine Translation service for English and French.
[3]:
results = catalog.search(
    resource = "Tool/Service", # "Corpus", "Lexical/Conceptual resource" or "Language description"
    function = "Machine Translation", # function should be passed only if resource is set to "Tool/Service"
    languages = ["en", "fr"], # string or list if multiple languages
    limit = 100,
)
print(f"Machine Translation service for English and French:\n{list(results)[0]}")
Machine Translation service for English and French:
----------------------------------------------------------------------
Id 597
Name Transformer en-fr: Machine Translation Model Trained
Using Tensor2tensor
Resource type Tool/Service
Entity type LanguageResource
Description Transformer en-fr translation model performs automatic
translation of raw text from en to fr
Licences ['Apache License 2.0']
Languages ['English', 'French']
Status None
----------------------------------------------------------------------
Another example is a search for a German NER corpus.
[4]:
results = catalog.search(
    resource = "Corpus", # "Corpus", "Lexical/Conceptual resource" or "Language description"
    languages = ["German"], # string or list if multiple languages
    search = "ner",
    limit = 100,
)
print(f"German corpus for NER:\n{next(results)}")
German corpus for NER:
----------------------------------------------------------------------
Id 5010
Name GermEval 2014 NER Shared Task
Resource type Corpus
Entity type LanguageResource
Description The data was sampled from German Wikipedia and News
Corpora as a collection of citations.The dataset covers
over 31,000 sentences corresponding to over 590,000
tokens.
Licences ['Creative Commons Attribution 4.0 International']
Languages ['German']
Status None
----------------------------------------------------------------------
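The two searches above pick the first hit in different ways: `list(results)[0]` collects every result before indexing, while `next(results)` pulls only one. The sketch below illustrates the difference on a stand-in generator. It assumes that `search` returns an iterator, which is what the `next(results)` call implies; `fake_search` is a hypothetical placeholder and not part of the elg package.

```python
from itertools import islice


def fake_search(limit):
    """Hypothetical stand-in for catalog.search: yields results lazily."""
    for i in range(limit):
        yield f"entity-{i}"


# next() consumes only the first item; the None default avoids a
# StopIteration error when the search returns nothing.
first = next(fake_search(limit=100), None)

# islice() takes the first few items without building the full list.
top_three = list(islice(fake_search(limit=100), 3))

# An empty search falls back to the default instead of raising.
empty = next(fake_search(limit=0), None)

print(first)
print(top_three)
print(empty)
```

With a real Catalog, the same pattern would presumably apply by replacing `fake_search(...)` with the `catalog.search(...)` call.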
You can initialize a Service from an Entity.
We can use the catalog to search for a Named Entity Recognition service for French and initialize a Service with the returned Entity.
[5]:
catalog = Catalog()
results = catalog.search(
    resource = "Tool/Service",
    function = "Named Entity Recognition",
    languages = ["fr"],
    limit = 1,
)
entity = next(results)
print(entity)
from elg import Service
lt = Service.from_entity(entity=entity)
result = lt("Jean Dupond vit à Paris.")
print(f"\n{result}")
----------------------------------------------------------------------
Id 474
Name Cogito Discover Named Entity Recognizer
Resource type Tool/Service
Entity type LanguageResource
Description Annotation of entities: People, Organizations, Places,
Known concepts, Unknown concepts. And also tags: urls,
mail addresses, phone numbers, addresses, dates, time,
measures, money, percentage, file folder.
Licences ['Cogito Discover License']
Languages ['English', 'German', 'Portuguese', 'Dutch', 'French',
'Spanish', 'Italian']
Status None
----------------------------------------------------------------------
Warning: The refresh token will expire in -2520.0 seconds!
Calling:
[474] Cogito Discover Named Entity Recognizer
with request:
type: text - content: Jean Dupond vit à Paris. - mimeType: text/plain
type='annotations' warnings=None features=None annotations={'People': [Annotation(start=0, end=11, source_start=None, source_end=None, features={'SURNAME': 'Dupond', 'SEX': 'M', 'name': 'Jean Dupond', 'NAME': 'Jean'})], 'Place': [Annotation(start=18, end=23, source_start=None, source_end=None, features={'Lemma': 'Paris', 'name': 'Paris', 'Glossa': 'capitale in Paris (Île-de-France/France/Europe', 'GEOREF': 'Paris/Île-de-France/France/Europe'})]}
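The annotations in the output above carry `start` and `end` character offsets into the request text. As a minimal sketch of how those offsets map back to the input, the example below recovers the annotated substrings. The `Annotation` dataclass here is a hypothetical stand-in that mirrors only the fields printed above; it is not the class used by the elg package, and `extract_spans` is an illustrative helper.

```python
from dataclasses import dataclass, field


@dataclass
class Annotation:
    """Hypothetical stand-in mirroring fields shown in the printed output."""
    start: int
    end: int
    features: dict = field(default_factory=dict)


text = "Jean Dupond vit à Paris."

# Offsets copied from the service response shown above.
annotations = {
    "People": [Annotation(0, 11, {"name": "Jean Dupond"})],
    "Place": [Annotation(18, 23, {"name": "Paris"})],
}


def extract_spans(text, annotations):
    """Return (label, surface text) pairs by slicing text with the offsets."""
    spans = []
    for label, items in annotations.items():
        for ann in items:
            spans.append((label, text[ann.start:ann.end]))
    return spans


spans = extract_spans(text, annotations)
for label, surface in spans:
    print(f"{label}: {surface}")
```

With the real response, the dictionary would presumably come from the `annotations` attribute of the returned result rather than being built by hand.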