Benchmark ELG services

The Benchmark class lets you compare services by their responses and their inference time. It can be used to choose between different services.

[1]:
from elg import Benchmark

A benchmark can be initialized with a list of service IDs (the from_ids method) or with a list of entities (the from_entities method). Here we compare English-to-German machine translation services.

[2]:
ben = Benchmark.from_ids([610, 624])

The benchmark can be run on multiple inputs, and each input can be sent several times to make the measurements more reliable. The first run is usually slower than the following ones because the service pods have to be initialized.

[ ]:
result = ben(["Bush is the president of the USA and lives in Washington.", "ELG is an amazing project."], number_of_runs=2)
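To illustrate why repeated runs matter, here is a minimal sketch that does not call ELG at all. The names make_fake_service and time_runs are hypothetical, and the stand-in service simply sleeps longer on its first call, which is an assumption about how pod initialization shows up in the timings.

```python
import time


def make_fake_service(cold_start=0.2, warm=0.01):
    """Return a hypothetical stand-in for a remote service.

    The first call sleeps for `cold_start` seconds to imitate pod
    initialization; later calls sleep for `warm` seconds.
    """
    state = {"calls": 0}

    def fake_service(text):
        delay = cold_start if state["calls"] == 0 else warm
        state["calls"] += 1
        time.sleep(delay)
        return text.upper()

    return fake_service


def time_runs(service, text, number_of_runs=2):
    """Call `service` repeatedly and return the elapsed time of each run."""
    timings = []
    for _ in range(number_of_runs):
        start = time.perf_counter()
        service(text)
        timings.append(time.perf_counter() - start)
    return timings


timings = time_runs(make_fake_service(), "ELG is an amazing project.", number_of_runs=3)
warm_timings = timings[1:]  # drop the simulated cold-start run
print("all runs: ", [round(t, 3) for t in timings])
print("warm only:", [round(t, 3) for t in warm_timings])
```

With a single run, the cold start would dominate the measurement; with several runs, the later ones give a better idea of the steady-state response time.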

The benchmark call returns a benchmark result object that can be used to compare the results.

You can get an overview of the results,

[4]:
df = result.compare()
print("General comparison:\n")
df
General comparison:

[4]:
service  request_input                                              run  result                                             response_time
610      Bush is the president of the USA and lives in Washington.  0    {'content': 'Bush ist Präsident der USA und le...  14.931094
                                                                    1    {'content': 'Bush ist Präsident der USA und le...  1.573129
         ELG is an amazing project.                                 0    {'content': 'ELG ist ein großartiges Projekt.'...  1.520602
                                                                    1    {'content': 'ELG ist ein großartiges Projekt.'...  1.472771
624      Bush is the president of the USA and lives in Washington.  0    Bush ist der Präsident der USA und lebt in Was...  13.735600
                                                                    1    Bush ist der Präsident der USA und lebt in Was...  1.576197
         ELG is an amazing project.                                 0    ELG ist ein erstaunliches Projekt.                 1.535216
                                                                    1    ELG ist ein erstaunliches Projekt.                 1.470568

compare only the results,

[5]:
df = result.compare_results()
print("Comparison of the results:\n")
df
Comparison of the results:

[5]:
service  request_input                                              result
610      Bush is the president of the USA and lives in Washington.  {'content': 'Bush ist Präsident der USA und le...
         ELG is an amazing project.                                 {'content': 'ELG ist ein großartiges Projekt.'...
624      Bush is the president of the USA and lives in Washington.  Bush ist der Präsident der USA und lebt in Was...
         ELG is an amazing project.                                 ELG ist ein erstaunliches Projekt.

or only the response time.

[6]:
df = result.compare_response_times()
print("Comparison of the response time:\n")
df
Comparison of the response time:

[6]:
         response_time
service  count  mean      std       min       25%       50%       75%       max
610      4.0    4.874399  6.704589  1.472771  1.508644  1.546865  4.912620  14.931094
624      4.0    4.579395  6.104291  1.470568  1.519054  1.555707  4.616048  13.735600
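The means above are pulled up by the slow first call of each service. The sketch below rebuilds the response times reported above as a plain pandas DataFrame and compares the summary over all runs with the mean over run 1 only. It assumes the frame is indexed by (service, request_input, run) as the overview suggests, and it does not use the elg package. Dropping run 0 is a blunt filter: it also discards the run 0 measurement of the second input, which was not affected by the cold start.

```python
import pandas as pd

BUSH = "Bush is the president of the USA and lives in Washington."
ELG = "ELG is an amazing project."

# Response times copied from the translation benchmark output above.
rows = [
    (610, BUSH, 0, 14.931094),
    (610, BUSH, 1, 1.573129),
    (610, ELG, 0, 1.520602),
    (610, ELG, 1, 1.472771),
    (624, BUSH, 0, 13.735600),
    (624, BUSH, 1, 1.576197),
    (624, ELG, 0, 1.535216),
    (624, ELG, 1, 1.470568),
]
df = pd.DataFrame(rows, columns=["service", "request_input", "run", "response_time"])
df = df.set_index(["service", "request_input", "run"])

# Summary over every measurement, cold start included.
all_runs = df.groupby(level="service")["response_time"].describe()

# Keep only the later runs (run > 0) to leave out the cold start.
warm = df[df.index.get_level_values("run") > 0]
warm_mean = warm.groupby(level="service")["response_time"].mean()

print(all_runs[["mean", "50%"]])
print(warm_mean)
```

The median (the 50% column) is another way to get a figure that is less sensitive to a single slow call.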

The compare methods return a DataFrame object that can be exported to CSV, Excel, and many other formats for easier inspection.

[7]:
result.compare().to_csv("/tmp/result.csv")

We can take another example and compare sentiment analysis services.

[ ]:
ben = Benchmark.from_ids([477, 510])
inputs = [
    "This movie is not good at all.",
    "This movie is not good but it was a good moment at the cinema.",
    "This movie is not so bad.",
    "I liked the movie but it was not must seen.",
    "It was the best movie I have ever seen."
]
result = ben(
    inputs,
    output_funcs=[
        lambda x: x.features["OVERALL"],
        lambda x: x.annotations["SentenceSet"][0].features["score"] * 100
    ]
)
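The call above passes one function per service in output_funcs. A reasonable reading, stated here as an assumption rather than documented behavior, is that each function turns that service's response into a value that can be compared across services, in the same order as the service IDs. The sketch below mimics this with hypothetical response objects built from SimpleNamespace; the attribute names mirror the lambdas above, and the values are illustrative.

```python
from types import SimpleNamespace

# Hypothetical stand-ins for two differently shaped responses.
# The first exposes an overall score directly; the second nests a
# score in [-1, 1] inside a sentence-level annotation.
response_a = SimpleNamespace(features={"OVERALL": -70.4})
response_b = SimpleNamespace(
    annotations={"SentenceSet": [SimpleNamespace(features={"score": -0.5})]}
)

# One extraction function per response shape, as in the call above.
output_funcs = [
    lambda x: x.features["OVERALL"],
    lambda x: x.annotations["SentenceSet"][0].features["score"] * 100,
]

# Apply each function to the matching response to get comparable numbers.
scores = [func(resp) for func, resp in zip(output_funcs, [response_a, response_b])]
print(scores)
```

Multiplying the second score by 100 puts both values on a similar scale, which is what makes the side-by-side comparison readable.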
[9]:
print("Result:\n")
result.compare()
Result:

[9]:
service  request_input                                                   run  result  response_time
477      I liked the movie but it was not must seen.                     0    4.5     1.536851
                                                                         1    4.5     1.536089
         It was the best movie I have ever seen.                         0    13.7    1.550828
                                                                         1    13.7    1.553860
         This movie is not good at all.                                  0    -70.4   1.716957
                                                                         1    -70.4   1.536368
         This movie is not good but it was a good moment at the cinema.  0    19      1.527658
                                                                         1    19      1.533827
         This movie is not so bad.                                       0    0       1.519724
                                                                         1    0       1.552159
510      I liked the movie but it was not must seen.                     0    50.0    1.532401
                                                                         1    50.0    1.531347
         It was the best movie I have ever seen.                         0    50.0    1.544706
                                                                         1    50.0    1.549288
         This movie is not good at all.                                  0    -50.0   1.772147
                                                                         1    -50.0   1.538869
         This movie is not good but it was a good moment at the cinema.  0    -50.0   1.532049
                                                                         1    -50.0   1.540134
         This movie is not so bad.                                       0    100.0   1.522124
                                                                         1    100.0   1.545880