
class elg.corpus.Corpus(id: int, resource_name: str, resource_short_name: List[str], resource_type: str, entity_type: str, description: str, keywords: List[str], detail: str, licences: List[str], languages: List[str], country_of_registration: List[str], creation_date: str, last_date_updated: str, functional_service: bool, functions: List[str], intended_applications: List[str], views: int, downloads: int, size: int, service_execution_count: int, status: str, under_construction: bool, record: dict, auth_object: elg.authentication.Authentication, auth_file: str, scope: str, domain: str, use_cache: bool, cache_dir: str)

Class to represent a corpus and to download ELG corpora.


from elg import Corpus

# You can initialize a corpus from its id. You will be asked to authenticate on the ELG website.
corpus = Corpus.from_id(913)

# You can display the corpus information.
print(corpus)

# You can download the corpus. Note that only corpora hosted on ELG are downloadable using the Python SDK.
corpus.download()

# By default, the corpus is downloaded to the current location and the filename is the name of the ELG corpus.
# You can override this with the folder and filename parameters.
corpus.download(filename="ELG_corpus", folder="/tmp/")

# You can create a corpus from a catalog search result. First you need to search for a corpus using the catalog.
# Let's search for a German corpus.
from elg import Catalog

catalog = Catalog()
results = catalog.search(
    resource="Corpus",
    languages=["German"],
    limit=1,
)

corpus = Corpus.from_entity(results[0])

download(distribution_idx: int = 0, filename: str = None, folder: str = './')

Method to download the corpus if possible. Only corpora hosted on ELG can be downloaded.

Parameters

  • distribution_idx (int, optional) – Index of the distribution of the corpus to download. Defaults to 0.

  • filename (str, optional) – Name of the output file. If None, the name of the corpus will be used. Defaults to None.

  • folder (str, optional) – Path to the folder in which to save the downloaded file. Defaults to “./”.
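
The way the filename and folder defaults combine can be sketched as follows. This is only an illustration of the documented defaults, not the SDK's implementation: `resolve_output_path` and the corpus name used here are hypothetical, and any file extension the SDK may add is left out.

```python
from pathlib import Path
from typing import Optional


def resolve_output_path(
    corpus_name: str, filename: Optional[str] = None, folder: str = "./"
) -> Path:
    """Hypothetical helper mirroring the documented defaults of download()."""
    # Fall back to the corpus name when no filename is given.
    name = filename if filename is not None else corpus_name
    # Join the target folder and the chosen name.
    return Path(folder) / name


# Defaults: current folder, file named after the corpus (name is made up).
default_path = resolve_output_path("Some ELG corpus")

# Explicit filename and folder, as in the usage example of this page.
custom_path = resolve_output_path("Some ELG corpus", filename="ELG_corpus", folder="/tmp/")

print(default_path)
print(custom_path)
```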

classmethod from_entity(entity: elg.entity.Entity, auth_object: Optional[str] = None, auth_file: Optional[str] = None, scope: Optional[str] = None, use_cache: bool = True, cache_dir='~/.cache/elg')

Class method to initialize a Corpus object from an Entity object. You can provide authentication information through the auth_object or the auth_file parameter. If no authentication information is provided, a new Authentication object will be initialized.

Parameters

  • entity (elg.Entity) – Entity object to init as a Corpus.

  • auth_object (elg.Authentication, optional) – elg.Authentication object to use. Defaults to None.

  • auth_file (str, optional) – json file that contains the authentication tokens. Defaults to None.

  • scope (str, optional) – Scope to use when requesting tokens. Can be set to “openid”, or to “offline_access” to get offline tokens. Defaults to “openid”.

  • domain (str, optional) – ELG domain you want to use: “live” for the public ELG, “dev” for the development ELG, or any other value for a local ELG. Defaults to “live”.

  • use_cache (bool, optional) – True if you want to use cached files. Defaults to True.

  • cache_dir (str, optional) – path to the cache_dir. Set it to None to not store any cached files. Defaults to “~/.cache/elg”.
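
The choice between auth_object, auth_file, and a fresh authentication can be pictured with the sketch below. It assumes that an explicit object takes precedence over a token file, which this page does not state. `FakeAuthentication` and `resolve_authentication` are hypothetical stand-ins, not part of the SDK.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeAuthentication:
    """Stand-in for elg.Authentication that only records its origin."""

    source: str


def resolve_authentication(
    auth_object: Optional[FakeAuthentication] = None,
    auth_file: Optional[str] = None,
) -> FakeAuthentication:
    # Assumed precedence: explicit object first, then token file,
    # and finally a newly initialized authentication.
    if auth_object is not None:
        return auth_object
    if auth_file is not None:
        return FakeAuthentication(source=f"file:{auth_file}")
    return FakeAuthentication(source="new")


existing = FakeAuthentication(source="existing")
print(resolve_authentication(auth_object=existing).source)
print(resolve_authentication(auth_file="tokens.json").source)
print(resolve_authentication().source)
```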

Returns

Corpus object with authentication information.

Return type

elg.Corpus

classmethod from_id(id: int, auth_object: Optional[elg.authentication.Authentication] = None, auth_file: Optional[str] = None, scope: Optional[str] = None, domain: Optional[str] = None, use_cache: bool = True, cache_dir: str = '~/.cache/elg')

Class method to initialize a Corpus object from its id. You can provide authentication information through the auth_object or the auth_file parameter. If no authentication information is provided, a new Authentication object will be initialized.

Parameters

  • id (int) – id of the corpus.

  • auth_object (elg.Authentication, optional) – elg.Authentication object to use. Defaults to None.

  • auth_file (str, optional) – json file that contains the authentication tokens. Defaults to None.

  • scope (str, optional) – Scope to use when requesting tokens. Can be set to “openid”, or to “offline_access” to get offline tokens. Defaults to “openid”.

  • domain (str, optional) – ELG domain you want to use: “live” for the public ELG, “dev” for the development ELG, or any other value for a local ELG. Defaults to “live”.

  • use_cache (bool, optional) – True if you want to use cached files. Defaults to True.

  • cache_dir (str, optional) – path to the cache_dir. Set it to None to not store any cached files. Defaults to “~/.cache/elg”.
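
The domain and cache_dir conventions described above can be summarized in a short sketch. Both helpers are hypothetical and only restate the documented behaviour. Whether the SDK expands “~” with `os.path.expanduser` is an assumption.

```python
import os
from typing import Optional


def describe_domain(domain: Optional[str] = None) -> str:
    """Hypothetical helper: map a domain value to the ELG instance it selects."""
    # The documented default is "live".
    domain = "live" if domain is None else domain
    if domain == "live":
        return "public ELG"
    if domain == "dev":
        return "development ELG"
    # Any other value is treated as a local ELG.
    return f"local ELG at {domain}"


def resolve_cache_dir(cache_dir: Optional[str] = "~/.cache/elg") -> Optional[str]:
    """Hypothetical helper: None disables storing cached files."""
    if cache_dir is None:
        return None
    # Assumption: the "~" prefix is expanded to the user's home directory.
    return os.path.expanduser(cache_dir)


print(describe_domain())
print(describe_domain("dev"))
print(resolve_cache_dir(None))
```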

Returns

The initialized Corpus object.

Return type

elg.Corpus

class elg.corpus.Distribution(corpus_id: int, domain: str, form: str, distribution_location: str, download_location: str, access_location: str, licence: elg.corpus.Licence, cost: str, attribution_text: str)

Class to represent a corpus distribution.

classmethod from_data(corpus_id: int, domain: str, data: dict)

Class method to initialize a Distribution object from its metadata.

Parameters

  • corpus_id (int) – id of the corpus the distribution is from.

  • domain (str) – ELG domain you want to use: “live” for the public ELG, “dev” for the development ELG, or any other value for a local ELG.

  • data (dict) – metadata information of the distribution.

Returns

The initialized Distribution object.

Return type

elg.Distribution


Method to check whether the distribution is downloadable.

Returns

True if the distribution is downloadable, False otherwise.

Return type

bool

class elg.corpus.Licence(name: str, urls: List[str], identifiers: List[dict])

Class to represent a licence.
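
The relation between a distribution, its licence, and the from_data alternative constructor can be illustrated with simplified stand-ins. This is a sketch only. `LicenceSketch` and `DistributionSketch` are not the SDK classes and keep just a few of the fields listed in the signatures above. The metadata keys “form” and “licence” are hypothetical, since the real ELG metadata layout is not shown on this page.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LicenceSketch:
    """Simplified stand-in for elg.corpus.Licence."""

    name: str
    urls: List[str] = field(default_factory=list)
    identifiers: List[dict] = field(default_factory=list)


@dataclass
class DistributionSketch:
    """Simplified stand-in for elg.corpus.Distribution."""

    corpus_id: int
    domain: str
    form: str
    licence: LicenceSketch

    @classmethod
    def from_data(cls, corpus_id: int, domain: str, data: dict) -> "DistributionSketch":
        # Hypothetical keys: the real metadata structure may differ.
        licence_data = data["licence"]
        licence = LicenceSketch(
            name=licence_data["name"],
            urls=licence_data.get("urls", []),
        )
        return cls(corpus_id=corpus_id, domain=domain, form=data["form"], licence=licence)


# Made-up metadata, used only to exercise the alternative constructor.
example = DistributionSketch.from_data(
    corpus_id=913,
    domain="live",
    data={"form": "example-form", "licence": {"name": "Example licence"}},
)
print(example.licence.name)
```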