Create an ELG compatible service

Example with the integration of a NER model from HuggingFace: https://huggingface.co/elastic/distilbert-base-cased-finetuned-conll03-english

0. Set up the environment

Before starting, we will set up a new environment. Let’s create a new folder:

mkdir distilbert-ner-en
cd distilbert-ner-en

And now a new environment:

conda create -n python-services-distilbert-ner-en python=3.7
conda activate python-services-distilbert-ner-en

We will also install the packages needed:

pip install torch==1.10.2 transformers==4.16.2

1. Have the model running locally

The first step is to have the model we want to integrate into ELG running locally. In our case, we will firstly download the model using git lfs (make sure to have it installed https://git-lfs.github.com/):

git lfs install
git clone https://huggingface.co/elastic/distilbert-base-cased-finetuned-conll03-english

And then create a simple python script that run the model (we can call it use.py):

from transformers import pipeline

class DistilbertNEREn:

    nlp = pipeline("ner", "distilbert-base-cased-finetuned-conll03-english")

    def run(self, input_str):
        return self.nlp(input_str)

We can have a try using the Python Interpreter:

>>> from use import DistilbertNEREn
>>> model = DistilbertNEREn()
>>> model.run("Albert is from Germany.")
[{'word': 'Albert', 'score': 0.9995226263999939, 'entity': 'B-PER', 'index': 1}, {'word': 'Germany', 'score': 0.9997316598892212, 'entity': 'B-LOC', 'index': 4}]

The model is working well but is not yet ELG compatible.

2. Create an ELG compatible service

To create the ELG compatible service, we will use the ELG Python SDK installable through PIP (we include the flask extra to use the FlaskService class):

pip install 'elg[flask]'

Then we need to make our model inherits from the FlaskService class of elg and uses ELG Request and Response object. Let’s create a new python file called elg_service.py:

from transformers import pipeline

from elg import FlaskService
from elg.model import AnnotationsResponse

class DistilbertNEREn(FlaskService):

    nlp = pipeline("ner", "distilbert-base-cased-finetuned-conll03-english")

    def convert_outputs(self, outputs, content):
        annotations = {}
        offset = 0
        for output in outputs:
            word = output["word"]
            score = output["score"]
            entity = output["entity"]
            start = content.find(word) + offset
            end = start + len(word)
            content = content[end - offset :]
            offset = end
            if entity not in annotations.keys():
                annotations[entity] = [
                    {
                        "start": start,
                        "end": end,
                        "features": {
                            "word": str(word),
                            "score": str(score),
                        },
                    }
                ]
            else:
                annotations[entity].append(
                    {
                        "start": start,
                        "end": end,
                        "features": {
                            "word": str(word),
                            "score": str(score),
                        },
                    }
                )
        return AnnotationsResponse(annotations=annotations)

    def process_text(self, content):
        outputs = self.nlp(content.content)
        return self.convert_outputs(outputs, content.content)

flask_service = DistilbertNEREn("distilbert-ner-en")
app = flask_service.app

We also need to initialize the service at the end of the file and create the app variable.

We can now test our service from the Python Interpreter to make sure that everything is working:

>>> from elg_service import DistilbertNEREn
>>> from elg.model import TextRequest
>>> service = DistilbertNEREn("distilbert-ner-en")
>>> request = TextRequest(content="Albert is from Germany.")
>>> response = service.process_text(request)
>>> print(response)
type='annotations' warnings=None features=None annotations={'B-PER': [Annotation(start=0, end=6, sourceStart=None, sourceEnd=None, features={'word': 'Albert', 'score': 0.9995226263999939})], 'B-LOC': [Annotation(start=15, end=22, sourceStart=None, sourceEnd=None, features={'word': 'Germany', 'score': 0.9997316598892212})]}
>>> print(type(response))
<class 'elg.model.response.AnnotationsResponse.AnnotationsResponse'>

Our service is using an ELG Request object as input and an ELG Response object as output. It is therefore ELG compatible.

3. Generate the Docker image

To generate the Docker image, the first step is to create a Dockerfile. To simplify the process, you can use the elg CLI and run:

elg docker create --path ./elg_service.py --classname=DistilbertNEREn \
      --required_folders=distilbert-base-cased-finetuned-conll03-english \
      --requirements=torch==1.10.2 --requirements=transformers==4.16.2 \
      --base_image=python:3.7

It will generate the Dockerfile but also a file called docker_entrypoint.sh used to start the service inside the container, and a requirements.txt file.

Special note for PyTorch

The default PyTorch packages from PyPi are extremely large (over 700MB) as they are compiled with support for many different GPU architectures. Since the ELG platform currently supports only CPU-based inference you can generate a much smaller image by using the CPU-only builds of torch that are available from PyTorch’s own website:

elg docker create --path ./elg_service.py --classname=DistilbertNEREn \
      --required_folders=distilbert-base-cased-finetuned-conll03-english \
      --requirements="-f https://download.pytorch.org/whl/torch_stable.html" \
      --requirements=torch==1.10.2+cpu --requirements=transformers==4.16.2 \
      --base_image=python:3.7

We can now build the Docker image (here you need to replace with the name of your docker registry) (we use –platform linux/amd64 to make sure the resulting image will be compatible with the ELG cluster):

docker build --platform linux/amd64 -t airklizz/distilbert-ner-en:v1 .

It will generate a Docker image locally that we can test using the local installation feature of the ELG Python SDK (see the following session to have more information) as follows:

  • Generate the local installation configuration files

elg local-installation docker \
    --image airklizz/distilbert-ner-en:v1 \
    --execution_location http://localhost:8000/process \
    --name ner-elg-service \
    --gui
  • Start the local installation

cd elg_local_installation && docker-compose up

The local intallation takes a couple of seconds to start and once it done, you should be able to test the service directly from the GUI accessible at http://localhost:8080.

Local installation of the NER service from HF

The image is working and is compatible with ELG so we can push it on Docker Hub (or the Docker registry of you choice):

docker push airklizz/distilbert-ner-en:v1

4. Create a new service on ELG

You have to go to the ELG, have a provider account and then you can add a new ‘Service or Tool’.

../../_images/1.png

You have to fill all the information needed.

../../_images/2.png

And then save and create the item. You will be redirect to the landing page of the service.

../../_images/3.png

When the landing page looks good to you, you can submit the service for publication through the ‘Actions’ button.

At that step, the ELG technical team will deploy the Docker image of your service into the ELG Kubernetes cluster. Once it done, the service is tested and published.

The service is now publicly available on ELG.

4 And can be run through the UI:

5 or through the Python SDK (you need to find the id of the service, here: 6366):

from elg import Service

service = Service.from_id(6366)
response = service('Albert is from Germany.')
print(response)
# type='annotations' warnings=None features=None annotations={'B-PER': [Annotation(start=0, end=6, sourceStart=None, sourceEnd=None, features={'word': 'Albert', 'score': 0.9995226263999939})], 'B-LOC': [Annotation(start=15, end=22, sourceStart=None, sourceEnd=None, features={'word': 'Germany', 'score': 0.9997316598892212})]}