Benchmarking Information¶
A benchmark is a comparison of the performance of code across the commit history. It is a way to measure the performance of the code and to ensure that the code is not getting slower over time. In this notebook, we will cover the best ways to write benchmarks and how to run them. Throughout the notebook, we will use the asv
package for benchmarking.
Setting up asv¶
Prior to installing asv, you need to activate a conda environment (for example, base) and install conda-build. You can do this by running the following commands:
conda activate base
conda install conda-build
After running the above commands, you can install asv by running the following command:
pip install asv
Running the benchmarks¶
To run the benchmarks, you can run the following command:
asv run
This command will run the benchmarks for the last two commits. asv can also run for a specific commit or a range of commits, which can be done with the following commands:
asv run <tag/branch>^!
runs the benchmarks for the last commit of the given tag/branch.
asv run master..mybranch
runs the benchmarks for the commits between master and mybranch.
asv run HASHFILE:hashestobenchmark.txt
runs the benchmarks for the commits listed in hashestobenchmark.txt, where each line is a commit hash.
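For instance, hashestobenchmark.txt is simply a plain text file with one commit hash per line; the hashes below are placeholders, not real TARDIS commits:
a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8a9b0
0f9e8d7c6b5a4f3e2d1c0b9a8f7e6d5c4b3a2f1e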
Some of the important options of asv run are:
--quick
quickly runs the benchmarks (each benchmark is run only once).
-e
shows errors raised in the benchmarks.
--bench <file>
runs the benchmarks only for the given file. The extension of the file should not be included.
--skip-existing-successful
skips the benchmarks that have already been run successfully.
Example: asv run master^! --quick -e --bench run_tardis
To view the results in a web interface, you can run the following commands:
asv publish
asv preview
In order to run the TARDIS benchmarks, you need to change the atomic_data_fname
function in the benchmark_base.py file and download the atomic data. Here are the changes:
def atomic_data_fname(self):
    from pathlib import Path  # may already be imported at module level in benchmark_base.py

    from tardis.io.configuration.config_internal import get_data_dir

    # Look for the atomic data file in the TARDIS data directory.
    data_dir = get_data_dir()
    atomic_data_fname = f"{data_dir}/kurucz_cd23_chianti_H_He.h5"
    if not Path(atomic_data_fname).exists():
        atom_data_missing_str = (
            f"{atomic_data_fname} atomic data file does not seem to exist"
        )
        raise Exception(atom_data_missing_str)
    return atomic_data_fname
After this, you need to download the atomic data file, which can be done by running the following cell:
[1]:
from tardis.io.atom_data import download_atom_data
download_atom_data('kurucz_cd23_chianti_H_He')
Atomic Data kurucz_cd23_chianti_H_He already exists in /home/runner/Downloads/tardis-data/kurucz_cd23_chianti_H_He.h5. Will not download - override with force_download=True.
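As an optional sanity check (a minimal sketch assuming the default data directory), you can confirm that the file exists at the path the changed atomic_data_fname function will look in:
from pathlib import Path

from tardis.io.configuration.config_internal import get_data_dir

# Print the data directory and whether the atomic data file is present there.
data_dir = get_data_dir()
print(data_dir, Path(f"{data_dir}/kurucz_cd23_chianti_H_He.h5").exists())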
Writing Benchmarks¶
TARDIS has adopted a class-based way of writing benchmarks. You can browse the benchmarks directory in the tardis repository to understand how the benchmarks are written. Here are some of the important points to keep in mind while writing benchmarks (a sketch illustrating them follows this list):
The name of the benchmark file must specify the directory along with the file name; mentioning the tardis directory is not required.
The class name should be the same as the file name with Benchmark prepended to it.
The class should inherit from BenchmarkBase if required.
Every class should have a setup function, which is used to set up the environment for the benchmark. This is done so that we can avoid potential noise in the benchmarks. For example, if a function needs a parameter that takes some time to set up, we can prepare it in the setup function.
Common functions that can be inherited by multiple files in the future should be written in the benchmark_base.py file.
The benchmark should be written in a function with the name time_<function_name>.
Based on local runs, setting the repeat count might reduce the time needed to run the benchmarks in the GitHub Actions. For example, if a benchmark takes little time to run, it might be a good idea to repeat it, say, 4 times, but if a benchmark takes more time to run, it might be better to repeat it only 2 times.
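As a minimal sketch of these conventions (the file name, the benchmarked computation, and the parameter values below are hypothetical, not actual TARDIS benchmarks):
# Hypothetical file: benchmarks/transport_geometry.py
# The class name is the file name (CamelCased) with "Benchmark" prepended.
# It would inherit from BenchmarkBase (in benchmark_base.py) only if its shared helpers were needed.
import numpy as np


class BenchmarkTransportGeometry:
    # Repeat count tuned from local runs to keep the GitHub Actions runtime reasonable.
    repeat = 2

    def setup(self):
        # Expensive preparation happens here so it does not add noise to the timings.
        self.radii = np.linspace(1e13, 2e15, 1_000_000)

    def time_shell_volumes(self):
        # Only the body of time_* functions is timed by asv.
        4.0 / 3.0 * np.pi * self.radii**3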