Benchmark

class elg.benchmark.Benchmark(services: List[Service])

Class that executes multiple services in parallel and makes it easy to compare their outputs on the same input.

Examples:

from elg import Benchmark

# A benchmark can be initialized with a list of service IDs (from_ids method) or with a list of entities (from_entities method).
# Here we compare English to German Machine Translation services.
ben = Benchmark.from_ids([610, 624])

# The benchmark can be run on multiple inputs, and it can be run several times to get reliable measurements (the first run
# is usually slower than the following ones because the service pods have to be initialized).
result = ben(["Bush is the president of the USA and lives in Washington.", "ELG is an amazing project."], number_of_runs=2)

# The benchmark call returns a BenchmarkResult object that can be used to compare the results.

# You can have an overview of the result,
df = result.compare()
print("General comparison:\n", df)

# compare only the results,
df = result.compare_results()
print("Comparison of the results:\n", df)

# or only the response time.
df = result.compare_response_times()
print("Comparison of the response time:\n", df)

# The compare methods return a DataFrame object that can be exported to CSV, Excel and many other formats
# for easier viewing.
result.compare().to_csv("/tmp/result.csv")

# We can take another example and compare sentiment analysis services.
ben = Benchmark.from_ids([477, 510])
inputs = [
    "This movie is not good at all.",
    "This movie is not good but it was a good moment at the cinema.",
    "This movie is not so bad.",
    "I liked the movie but it was not must seen.",
    "It was the best movie I have ever seen."
]
result = ben(
    inputs,
    output_funcs=[
        lambda x: x.features["OVERALL"],
        lambda x: x.annotations["SentenceSet"][0].features["score"] * 100
    ]
)

# Print the comparison explicitly so it is also shown when the example runs as a script.
print("Result:\n", result.compare())
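
The output_funcs list in the sentiment analysis example gives one extraction function per service, so that responses with different layouts end up on a comparable scale. The following is a minimal sketch of that idea, not the library's implementation: the response objects are hypothetical stand-ins built with SimpleNamespace, and the positional pairing performed by the hypothetical normalize helper is an assumption based on the example above.

```python
from types import SimpleNamespace

# Hypothetical stand-ins for two service responses with different layouts.
response_a = SimpleNamespace(features={"OVERALL": 12.5})
response_b = SimpleNamespace(
    annotations={"SentenceSet": [SimpleNamespace(features={"score": 0.25})]}
)

# One extraction function per service, in the same order as the services.
output_funcs = [
    lambda x: x.features["OVERALL"],
    lambda x: x.annotations["SentenceSet"][0].features["score"] * 100,
]


def normalize(responses, funcs):
    """Apply the i-th function to the i-th response (assumed pairing)."""
    if len(responses) != len(funcs):
        raise ValueError("expected one output function per service")
    return [func(response) for response, func in zip(responses, funcs)]


scores = normalize([response_a, response_b], output_funcs)
print(scores)
```

Scaling the second score by 100 puts both values in the same range, which is what makes the side-by-side comparison meaningful.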

classmethod from_entities(entities: List[Entity], auth_object: Optional[str] = None, auth_file: Optional[str] = None, scope: Optional[str] = None, use_cache: bool = True, cache_dir='~/.cache/elg'): Class method to initialize a Benchmark from a list of entities, which are converted into services using the from_entity class method of the Service class. Refer to that method for further explanation.

classmethod from_ids(ids: List[int], auth_object: Optional[Authentication] = None, auth_file: Optional[str] = None, scope: Optional[str] = None, domain: Optional[str] = None, use_cache: bool = True, cache_dir: str = '~/.cache/elg'): Class method to initialize a Benchmark from a list of IDs, which are converted into services using the from_id class method of the Service class. Refer to that method for further explanation.

__call__(request_inputs: Optional[Union[str, List[str], Request, List[Request]]] = None, request_type: str = 'text', sync_mode: bool = False, timeout: Optional[int] = None, check_file: bool = True, output_funcs: Union[str, Callable, List[Union[Callable, str]]] = 'auto', number_of_runs: int = 2)

Method to run the comparison of the services with the given inputs.

Parameters

request_inputs (Union[str, List[str], Request, List[Request]], optional) – list of inputs on which to compare the services. Each input must correspond to the request_input parameter of the Service __call__ method. Defaults to None.
request_type (str, optional) – specifies the type of the request. Can be “text”, “structuredText”, or “audio”. It is only used if an input is not a Request object. Defaults to “text”.
sync_mode (bool, optional) – sync_mode parameter to give to the Service __call__ method. Defaults to False.
timeout (int, optional) – timeout parameter to give to the Service __call__ method. Defaults to None.
check_file (bool, optional) – check_file parameter to give to the Service __call__ method. Defaults to True.
output_funcs (Union[str, Callable, List[Union[str, Callable]]], optional) – output_func parameters to give to the Service __call__ method. Defaults to “auto”.
number_of_runs (int, optional) – number of times to run the services on each input. It is recommended to run the services at least 2 times because on the first run the services usually need to be loaded in the ELG cluster, which increases the response time. The response time of the second run is therefore more representative. Defaults to 2.

Returns

Result of the Benchmark call. To obtain the pandas DataFrame containing all the results, call the compare method on the returned BenchmarkResult object.

Return type

BenchmarkResult
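
The effect of number_of_runs can be illustrated without calling any real service. The sketch below is only a simulation under stated assumptions: FakeService and run_benchmark are hypothetical names, the latencies are invented numbers rather than measurements, and the loop is sequential whereas the Benchmark class runs services in parallel.

```python
class FakeService:
    """Hypothetical stand-in for a service whose first call pays a start-up cost."""

    def __init__(self, name, cold_start_s, warm_s):
        self.name = name
        self.cold_start_s = cold_start_s
        self.warm_s = warm_s
        self.loaded = False

    def __call__(self, text):
        # Simulated latency instead of a real network call.
        latency = self.warm_s if self.loaded else self.cold_start_s + self.warm_s
        self.loaded = True
        return {"result": text.upper(), "response_time": latency}


def run_benchmark(services, inputs, number_of_runs=2):
    """Collect one record per (run, service, input) combination."""
    records = []
    for run in range(number_of_runs):
        for service in services:
            for text in inputs:
                response = service(text)
                records.append(
                    {"run": run, "service": service.name, "request_input": text, **response}
                )
    return records


services = [FakeService("a", 5.0, 0.25), FakeService("b", 8.0, 0.5)]
records = run_benchmark(services, ["hello"], number_of_runs=2)

first_run = [r["response_time"] for r in records if r["run"] == 0]
last_run = [r["response_time"] for r in records if r["run"] == 1]
print("first run:", first_run)
print("last run:", last_run)
```

In this simulation the first run includes the start-up cost, so only the later run reflects the steady-state response time of each service.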

class elg.benchmark.BenchmarkResult(services: List[Service], request_inputs: List[str])

Class that represents the result of a Benchmark call.

set_colwidth(value: Optional[int] = None)

Method to easily change the colwidth value of pandas to better visualize the DataFrame.

Parameters: value (int, optional) – value of the colwidth. Defaults to None.
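
For readers unfamiliar with the pandas display options, the following is a sketch of what such a helper might look like. It assumes the method forwards to the pandas option display.max_colwidth, which the text above does not confirm; the standalone function below is illustrative and not the library's code.

```python
import pandas as pd


def set_colwidth(value=None):
    """Set the maximum column width pandas uses when printing a DataFrame.

    Assumption: None is passed through to pandas, where it means no truncation.
    """
    pd.set_option("display.max_colwidth", value)


# Truncate long cell contents (for example translated sentences) to 20 characters.
set_colwidth(20)
print(pd.get_option("display.max_colwidth"))
```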

compare(columns: List[str] = ['result', 'response_time'], func: Union[str, list, dict] = 'last', level: str = 'run', colwidth: int = 0, **agg_kwargs)

Method to compare the obtained results. It returns a pandas DataFrame object containing the comparison

Parameters

columns (List[str], optional) – columns of the DataFrame to return. Defaults to [“result”, “response_time”].
func (Union[str, list, dict], optional) – function to use for the comparison. For all the possible functions, see https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.core.groupby.DataFrameGroupBy.aggregate.html?. Defaults to “last”.
level (str, optional) – level of the comparison. The level value can be ‘service’, ‘request_input’, or ‘run’. Defaults to “run”.
colwidth (int, optional) – if set, changes the colwidth parameter of pandas to better visualize the DataFrame. Defaults to 0.

Raises

ValueError – raised if the level parameter is not set to a valid value.

Returns

pandas DataFrame object containing the comparison

Return type

pd.DataFrame
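
Since func is passed to a pandas groupby aggregation, its effect can be shown on a small hand-made table. This is a sketch under assumptions: the column names and the long-format layout (one row per service, input and run) are hypothetical and may not match the DataFrame that BenchmarkResult builds internally.

```python
import pandas as pd

# Hypothetical long-format results: one row per (service, request_input, run).
df = pd.DataFrame(
    {
        "service": ["a", "a", "b", "b"],
        "request_input": ["hello", "hello", "hello", "hello"],
        "run": [0, 1, 0, 1],
        "result": ["HELLO", "HELLO", "Hello", "Hello"],
        "response_time": [5.25, 0.25, 8.5, 0.5],
    }
)

# func="last" keeps the values of the final run for each service/input pair.
last = df.groupby(["service", "request_input"])[["result", "response_time"]].agg("last")
print(last)

# A list of function names summarizes the response times per service.
summary = df.groupby("service")["response_time"].agg(["mean", "max"])
print(summary)
```

Keeping only the last run discards the slower first call, while an aggregate such as the mean mixes both runs, which is worth bearing in mind when choosing func.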

compare_results(func: Union[str, list, dict] = 'last', level: str = 'request_input', colwidth: int = 0, **agg_kwargs): Method similar to the compare method, with default parameters optimized to compare the results.

compare_response_times(func: Union[str, list, dict] = 'describe', level: str = 'service', colwidth: int = 0, **agg_kwargs): Method similar to the compare method, with default parameters optimized to compare the response times.