Benchmark ELG services
The Benchmark class lets you compare services by their responses and their inference times. It can help you choose between different services.
[1]:
from elg import Benchmark
A benchmark can be initialized with a list of service IDs (from_ids method) or with a list of entities (from_entities method). Here we compare English-to-German machine translation services.
[2]:
ben = Benchmark.from_ids([610, 624])
The benchmark can be run on multiple inputs, and each input can be sent several times to make the measurements more reliable. The first run is usually slower than the following ones because the service pods have to be initialized.
[ ]:
result = ben(["Bush is the president of the USA and lives in Washington.", "ELG is an amazing project."], number_of_runs=2)
The benchmark call returns a benchmark result object that can be used to compare the services.
You can get an overview of the results,
[4]:
df = result.compare()
print("General comparison:\n")
df
General comparison:
[4]:
service | request_input | run | result | response_time
---|---|---|---|---
610 | Bush is the president of the USA and lives in Washington. | 0 | {'content': 'Bush ist Präsident der USA und le... | 14.931094
610 | Bush is the president of the USA and lives in Washington. | 1 | {'content': 'Bush ist Präsident der USA und le... | 1.573129
610 | ELG is an amazing project. | 0 | {'content': 'ELG ist ein großartiges Projekt.'... | 1.520602
610 | ELG is an amazing project. | 1 | {'content': 'ELG ist ein großartiges Projekt.'... | 1.472771
624 | Bush is the president of the USA and lives in Washington. | 0 | Bush ist der Präsident der USA und lebt in Was... | 13.735600
624 | Bush is the president of the USA and lives in Washington. | 1 | Bush ist der Präsident der USA und lebt in Was... | 1.576197
624 | ELG is an amazing project. | 0 | ELG ist ein erstaunliches Projekt. | 1.535216
624 | ELG is an amazing project. | 1 | ELG ist ein erstaunliches Projekt. | 1.470568
compare only the results,
[5]:
df = result.compare_results()
print("Comparison of the results:\n")
df
Comparison of the results:
[5]:
service | request_input | result
---|---|---
610 | Bush is the president of the USA and lives in Washington. | {'content': 'Bush ist Präsident der USA und le...
610 | ELG is an amazing project. | {'content': 'ELG ist ein großartiges Projekt.'...
624 | Bush is the president of the USA and lives in Washington. | Bush ist der Präsident der USA und lebt in Was...
624 | ELG is an amazing project. | ELG ist ein erstaunliches Projekt.
or only the response time.
[6]:
df = result.compare_response_times()
print("Comparison of the response time:\n")
df
Comparison of the response time:
[6]:
response_time statistics per service:
service | count | mean | std | min | 25% | 50% | 75% | max
---|---|---|---|---|---|---|---|---
610 | 4.0 | 4.874399 | 6.704589 | 1.472771 | 1.508644 | 1.546865 | 4.912620 | 14.931094
624 | 4.0 | 4.579395 | 6.104291 | 1.470568 | 1.519054 | 1.555707 | 4.616048 | 13.735600
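The mean and std above are dominated by the slow first call to each service. As a minimal sketch of how the warm-up call could be set aside before summarizing, the snippet below rebuilds the response times from the general comparison by hand instead of calling the ELG services. It assumes that the frame returned by compare() is indexed by service, request_input and run, as the printed table suggests, and that the warm-up is run 0 of the first input.

```python
import pandas as pd

first_input = "Bush is the president of the USA and lives in Washington."
second_input = "ELG is an amazing project."

# Hand-built stand-in for result.compare(); the values are copied from the
# general comparison table, and the index layout is an assumption.
index = pd.MultiIndex.from_tuples(
    [
        (610, first_input, 0), (610, first_input, 1),
        (610, second_input, 0), (610, second_input, 1),
        (624, first_input, 0), (624, first_input, 1),
        (624, second_input, 0), (624, second_input, 1),
    ],
    names=["service", "request_input", "run"],
)
df = pd.DataFrame(
    {"response_time": [14.931094, 1.573129, 1.520602, 1.472771,
                       13.735600, 1.576197, 1.535216, 1.470568]},
    index=index,
)

# Assumption: the warm-up call is run 0 of the first input sent to a service.
is_warmup = (
    (df.index.get_level_values("run") == 0)
    & (df.index.get_level_values("request_input") == first_input)
)
steady = df[~is_warmup]

# Per-service statistics without the warm-up call.
summary = steady.groupby(level="service")["response_time"].agg(["mean", "median"])
print(summary)
```

With the warm-up call removed, the two services are much closer to each other than the raw means suggest. The median (the 50% column above) is another way to get a figure that is less sensitive to a single slow call.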
The compare methods return a DataFrame object that can be exported to CSV, Excel, and many other formats for further inspection or visualization.
[7]:
result.compare().to_csv("/tmp/result.csv")
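If the exported file is meant for a spreadsheet or another tool, it can help to flatten the row index into ordinary columns first. The sketch below uses a small hand-built frame in place of result.compare(); the index layout (service, request_input, run) is an assumption based on the tables printed above, and only standard pandas calls are used.

```python
import pandas as pd

# Hand-built stand-in for result.compare(), limited to two rows of service 624.
index = pd.MultiIndex.from_tuples(
    [
        (624, "ELG is an amazing project.", 0),
        (624, "ELG is an amazing project.", 1),
    ],
    names=["service", "request_input", "run"],
)
df = pd.DataFrame(
    {
        "result": ["ELG ist ein erstaunliches Projekt."] * 2,
        "response_time": [1.535216, 1.470568],
    },
    index=index,
)

# Turn the index levels into regular columns so that every row is self-describing.
flat = df.reset_index()

# With no path given, to_csv returns the CSV text instead of writing a file.
csv_text = flat.to_csv(index=False)

# The same flat frame can be serialized as a list of JSON records.
json_text = flat.to_json(orient="records")

print(csv_text)
```

Passing a file path to to_csv or to_json writes the output to disk instead, as in the cell above.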
We can take another example and compare sentiment analysis services.
[ ]:
ben = Benchmark.from_ids([477, 510])
inputs = [
    "This movie is not good at all.",
    "This movie is not good but it was a good moment at the cinema.",
    "This movie is not so bad.",
    "I liked the movie but it was not must seen.",
    "It was the best movie I have ever seen.",
]
result = ben(
    inputs,
    output_funcs=[
        lambda x: x.features["OVERALL"],
        lambda x: x.annotations["SentenceSet"][0].features["score"] * 100,
    ],
)
[9]:
print("Result:\n")
result.compare()
Result:
[9]:
service | request_input | run | result | response_time
---|---|---|---|---
477 | I liked the movie but it was not must seen. | 0 | 4.5 | 1.536851
477 | I liked the movie but it was not must seen. | 1 | 4.5 | 1.536089
477 | It was the best movie I have ever seen. | 0 | 13.7 | 1.550828
477 | It was the best movie I have ever seen. | 1 | 13.7 | 1.553860
477 | This movie is not good at all. | 0 | -70.4 | 1.716957
477 | This movie is not good at all. | 1 | -70.4 | 1.536368
477 | This movie is not good but it was a good moment at the cinema. | 0 | 19 | 1.527658
477 | This movie is not good but it was a good moment at the cinema. | 1 | 19 | 1.533827
477 | This movie is not so bad. | 0 | 0 | 1.519724
477 | This movie is not so bad. | 1 | 0 | 1.552159
510 | I liked the movie but it was not must seen. | 0 | 50.0 | 1.532401
510 | I liked the movie but it was not must seen. | 1 | 50.0 | 1.531347
510 | It was the best movie I have ever seen. | 0 | 50.0 | 1.544706
510 | It was the best movie I have ever seen. | 1 | 50.0 | 1.549288
510 | This movie is not good at all. | 0 | -50.0 | 1.772147
510 | This movie is not good at all. | 1 | -50.0 | 1.538869
510 | This movie is not good but it was a good moment at the cinema. | 0 | -50.0 | 1.532049
510 | This movie is not good but it was a good moment at the cinema. | 1 | -50.0 | 1.540134
510 | This movie is not so bad. | 0 | 100.0 | 1.522124
510 | This movie is not so bad. | 1 | 100.0 | 1.545880
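The output_funcs used above are plain callables that reduce each service's response to a single comparable value. The sketch below illustrates the idea without calling any service: response_a and response_b are hypothetical stand-ins whose attribute layout is inferred only from the two lambdas in the cell above, and the pairing of one function per service, in the same order as the IDs, is an assumption.

```python
from types import SimpleNamespace

# Hypothetical stand-ins for the response objects of two sentiment services.
# Their shapes mirror what the lambdas access; real objects come from the elg library.
response_a = SimpleNamespace(features={"OVERALL": 4.5})
response_b = SimpleNamespace(
    annotations={"SentenceSet": [SimpleNamespace(features={"score": 0.5})]}
)

# Same extraction functions as in the benchmark call above.
output_funcs = [
    lambda x: x.features["OVERALL"],
    lambda x: x.annotations["SentenceSet"][0].features["score"] * 100,
]

# Assumption: the i-th function is applied to the responses of the i-th service.
scores = [func(resp) for func, resp in zip(output_funcs, [response_a, response_b])]
print(scores)
```

Multiplying the second score by 100 appears intended to bring both services onto a similar scale, so that the result column can be read side by side.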