Benchmark ELG services

The Benchmark class lets you compare services on their responses and their inference time. It can be used to choose between different services.

[1]:

from elg import Benchmark

A benchmark can be initialized with a list of service IDs (from_ids method) or with a list of entities (from_entities method). Here we compare English to German Machine Translation services.

[2]:

ben = Benchmark.from_ids([610, 624])

The benchmark can be run on multiple inputs, and each input can be sent several times (number_of_runs) to make the measurements more reliable. The first request is usually much slower than the following ones because the service pods have to be initialized.

[ ]:

result = ben(["Bush is the president of the USA and lives in Washington.", "ELG is an amazing project."], number_of_runs=2)

Calling the benchmark returns a benchmark result object that can be used to compare the services.

You can get an overview of the results,

[4]:

df = result.compare()
print("General comparison:\n")
df

General comparison:

[4]:

			result	response_time
service	request_input	run
610	Bush is the president of the USA and lives in Washington.	0	{'content': 'Bush ist Präsident der USA und le...	14.931094
	Bush is the president of the USA and lives in Washington.	1	{'content': 'Bush ist Präsident der USA und le...	1.573129
	ELG is an amazing project.	0	{'content': 'ELG ist ein großartiges Projekt.'...	1.520602
	ELG is an amazing project.	1	{'content': 'ELG ist ein großartiges Projekt.'...	1.472771
624	Bush is the president of the USA and lives in Washington.	0	Bush ist der Präsident der USA und lebt in Was...	13.735600
	Bush is the president of the USA and lives in Washington.	1	Bush ist der Präsident der USA und lebt in Was...	1.576197
	ELG is an amazing project.	0	ELG ist ein erstaunliches Projekt.	1.535216
	ELG is an amazing project.	1	ELG ist ein erstaunliches Projekt.	1.470568

compare only the results,

[5]:

df = result.compare_results()
print("Comparison of the results:\n")
df

Comparison of the results:

[5]:

		result
service	request_input
610	Bush is the president of the USA and lives in Washington.	{'content': 'Bush ist Präsident der USA und le...
610	ELG is an amazing project.	{'content': 'ELG ist ein großartiges Projekt.'...
624	Bush is the president of the USA and lives in Washington.	Bush ist der Präsident der USA und lebt in Was...
624	ELG is an amazing project.	ELG ist ein erstaunliches Projekt.

or only the response time.

[6]:

df = result.compare_response_times()
print("Comparison of the response time:\n")
df

Comparison of the response time:

[6]:

	response_time
	count	mean	std	min	25%	50%	75%	max
service
610	4.0	4.874399	6.704589	1.472771	1.508644	1.546865	4.912620	14.931094
624	4.0	4.579395	6.104291	1.470568	1.519054	1.555707	4.616048	13.735600
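The mean and std above are dominated by the slow first request. As a minimal sketch of how to account for that, the snippet below recomputes the statistics for service 610 from the values in the overview table, with and without the first request. It uses only the Python standard library and does not call the ELG API; the variable names and the assumption that the first request is the only cold start are illustrative.

```python
from statistics import mean, median

# Response times in seconds for service 610, copied from the overview table.
# Keys are (index of the request input, run).
times_610 = {
    (0, 0): 14.931094,  # first request; presumably includes pod initialization
    (0, 1): 1.573129,
    (1, 0): 1.520602,
    (1, 1): 1.472771,
}

all_times = list(times_610.values())

# Assumption: only the very first request is a cold start, so drop it.
warm_times = [t for key, t in times_610.items() if key != (0, 0)]

print(f"mean (all requests):   {mean(all_times):.6f}")
print(f"median (all requests): {median(all_times):.6f}")
print(f"mean (warm requests):  {mean(warm_times):.6f}")
```

The median is already close to the warm-request mean, so the 50% column of the table is a more representative figure than the mean when only a few runs are available.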

The compare methods return a DataFrame object that can be exported to CSV, Excel, and many other formats for further inspection or visualization.

[7]:

result.compare().to_csv("/tmp/result.csv")

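We can take another example and compare sentiment analysis services. Here the output_funcs argument receives one function per service, each extracting a comparable score from that service's response.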

[ ]:

ben = Benchmark.from_ids([477, 510])
inputs = [
    "This movie is not good at all.",
    "This movie is not good but it was a good moment at the cinema.",
    "This movie is not so bad.",
    "I liked the movie but it was not must seen.",
    "It was the best movie I have ever seen."
]
result = ben(
    inputs,
    output_funcs=[
        lambda x: x.features["OVERALL"],
        lambda x: x.annotations["SentenceSet"][0].features["score"] * 100
    ]
)
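To illustrate what the two lambdas above do, here is a minimal sketch that applies them to hypothetical stand-in objects. The stand-ins are built with SimpleNamespace and model only the attributes the lambdas access; they are not real ELG response classes, and the raw values are made up for illustration.

```python
from types import SimpleNamespace

# Hypothetical stand-ins for the responses of the two services.
# Only the attributes used by the output functions are modelled.
response_a = SimpleNamespace(features={"OVERALL": -70.4})
response_b = SimpleNamespace(
    annotations={"SentenceSet": [SimpleNamespace(features={"score": -0.5})]}
)

# Same extraction functions as in the benchmark call, one per service.
output_funcs = [
    lambda x: x.features["OVERALL"],
    lambda x: x.annotations["SentenceSet"][0].features["score"] * 100,
]

# Pair each function with the response of the corresponding service.
values = [func(resp) for func, resp in zip(output_funcs, [response_a, response_b])]
print(values)
```

Multiplying the second score by 100 puts both services on a similar scale, so the result column can be compared across services.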

[9]:

print("Result:\n")
result.compare()

Result:

[9]:

			result	response_time
service	request_input	run
477	I liked the movie but it was not must seen.	0	4.5	1.536851
	I liked the movie but it was not must seen.	1	4.5	1.536089
	It was the best movie I have ever seen.	0	13.7	1.550828
	It was the best movie I have ever seen.	1	13.7	1.553860
	This movie is not good at all.	0	-70.4	1.716957
	This movie is not good at all.	1	-70.4	1.536368
	This movie is not good but it was a good moment at the cinema.	0	19	1.527658
	This movie is not good but it was a good moment at the cinema.	1	19	1.533827
	This movie is not so bad.	0	0	1.519724
	This movie is not so bad.	1	0	1.552159
510	I liked the movie but it was not must seen.	0	50.0	1.532401
	I liked the movie but it was not must seen.	1	50.0	1.531347
	It was the best movie I have ever seen.	0	50.0	1.544706
	It was the best movie I have ever seen.	1	50.0	1.549288
	This movie is not good at all.	0	-50.0	1.772147
	This movie is not good at all.	1	-50.0	1.538869
	This movie is not good but it was a good moment at the cinema.	0	-50.0	1.532049
	This movie is not good but it was a good moment at the cinema.	1	-50.0	1.540134
	This movie is not so bad.	0	100.0	1.522124
	This movie is not so bad.	1	100.0	1.545880