
Benchmarking Information

A benchmark compares the performance of code across the commit history; it measures how fast the code runs and ensures that it is not getting slower over time. In this notebook, we cover the best ways to write benchmarks and how to run them. Throughout the notebook, we will be using the asv package for benchmarking.

Setting up asv

Prior to installing asv, you need to set up a conda environment and install conda-build. You can do this by running the following commands:

conda activate base
conda install conda-build

After running the above commands, you can install asv by running the following command:

pip install asv

Running the benchmarks

To run the benchmarks, you can run the following command:

asv run

By default, this command runs the benchmarks for the last two commits. asv can also run for a specific commit or a range of commits, which can be done by running the following commands:

  • asv run <tag/branch>^! runs for the last commit of the given tag/branch.

  • asv run master..mybranch runs for the commits between master and mybranch.

  • asv run HASHFILE:hashestobenchmark.txt runs for the commits listed in the file hashestobenchmark.txt, where each line is a commit hash (a sketch of the file format follows this list).
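For example, a hashestobenchmark.txt file could look like this (the hashes below are hypothetical placeholders for real commit hashes from your repository):

3f2a9c1d8b7e6f5a4c3d2e1f0a9b8c7d6e5f4a3b
9e8d7c6b5a4f3e2d1c0b9a8f7e6d5c4b3a2f1e0d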

Some of the important options of asv run are:

  • --quick runs each benchmark only once, which is faster but gives less accurate timings.

  • -e (short for --show-stderr) displays the stderr output produced by the benchmarks, which helps with debugging errors.

  • --bench <file> runs only the benchmarks in the given file. The file extension should not be included.

  • --skip-existing-successful skips the benchmarks that have already been run successfully.

Example: asv run master^! --quick -e --bench run_tardis

To view the results as a website, you can run the following commands; asv publish generates a static website from the results, and asv preview serves it locally:

asv publish
asv preview

In order to run the TARDIS benchmarks, you need to change the atomic_data_fname function in the benchmark_base.py file and download the atomic data. Here are the changes:

from pathlib import Path


def atomic_data_fname(self):
    # Deferred import so the configuration machinery is only loaded when needed
    from tardis.io.configuration.config_internal import get_data_dir

    data_dir = get_data_dir()
    atomic_data_fname = f"{data_dir}/kurucz_cd23_chianti_H_He.h5"

    if not Path(atomic_data_fname).exists():
        atom_data_missing_str = (
            f"{atomic_data_fname} atomic datafile "
            f"does not seem to exist"
        )
        raise Exception(atom_data_missing_str)

    return atomic_data_fname

After this, you need to download the atomic data file, which can be done by running the following code:

[1]:
from tardis.io.atom_data.util import download_atom_data

download_atom_data('kurucz_cd23_chianti_H_He')
Atomic Data kurucz_cd23_chianti_H_He already exists in /home/runner/Downloads/tardis-data/kurucz_cd23_chianti_H_He.h5. Will not download - override with force_download=True.

Writing Benchmarks

TARDIS has adopted a class-based approach to writing benchmarks. You can browse the benchmarks directory in the tardis repository to see how the benchmarks are written. Here are some important points to keep in mind while writing benchmarks (a sketch of a benchmark class follows this list):

  • The file name must specify the directory along with the file; mentioning the leading tardis directory is not required.

  • The class name should be the same as the file name with Benchmark prepended to it.

  • The class should inherit from BenchmarkBase if required.

  • Every class should have a setup method, which prepares the environment for the benchmark. This keeps expensive preparation out of the timed code and avoids noise in the results. For example, if a function needs a parameter that takes time to construct, build it in setup.

  • Common functions that may be inherited by multiple files in the future should be written in the benchmark_base.py file.

  • The benchmark should be written in a function with the name time_<function_name>.

  • Based on local runs, setting the repeat attribute can reduce the time the benchmarks take in GitHub Actions. For example, if a benchmark runs quickly, it might be a good idea to repeat it, say, 4 times; if it takes longer to run, 2 repetitions might be enough.
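Putting these points together, here is a minimal sketch of what a benchmark file might look like. The module path, class name, and the work being timed are hypothetical placeholders; a real benchmark would exercise actual TARDIS functions and inherit from BenchmarkBase where needed.

import numpy as np


# Hypothetical file: benchmarks/transport_montecarlo_estimators.py
class BenchmarkTransportMontecarloEstimators:
    # How many times asv repeats the timed function; tune per benchmark.
    repeat = 2

    def setup(self):
        # Expensive preparation lives here so it is excluded from the
        # timed region and does not add noise to the results.
        rng = np.random.default_rng(seed=42)
        self.packet_energies = rng.random(1_000_000)

    def time_sum_packet_energies(self):
        # Only the body of this time_* method is timed by asv.
        self.packet_energies.sum()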