tardisbase.testing.regression_comparison.visualize_files module

class tardisbase.testing.regression_comparison.visualize_files.MultiCommitCompare(regression_repo_path, commits, tardis_commits=None, tardis_repo_path=None, file_extensions=None, compare_function='git_diff')[source]

Bases: object

Visualizes file changes across commits in a matrix format.

This class analyzes changes to files across multiple Git commits and displays the results in a tabular matrix format showing file transitions (added, deleted, modified, unchanged) between commits.

Supports filtering by file extensions to focus analysis on specific file types (e.g., .h5, .hdf5, .py files).

Parameters:
  • regression_repo_path (str or Path) – Path to the regression data repository.

  • commits (list of str) – List of regression data commit hashes to analyze.

  • tardis_commits (list of str, optional) – List of corresponding TARDIS commits (for case 2).

  • tardis_repo_path (str or Path, optional) – Path to the TARDIS repository (used to retrieve TARDIS commit messages).

  • file_extensions (tuple of str, optional) – File extensions to filter by (e.g., (‘.h5’, ‘.hdf5’)). If None, analyzes all files.

  • compare_function (str, optional) –

    Comparison method to use. Default is ‘git_diff’.

    Options:

    • ‘git_diff’: Uses git’s built-in diff functionality to compare files directly within the repository.

    • ‘cmd_diff’: Extracts files to temporary locations and uses the system’s diff command.

analyze_commits()[source]

Analyze file changes across all commits.

Notes

Requires at least 2 commits. Populates transition_columns, file_transitions, and all_files attributes.
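The pairing of consecutive commits described above can be sketched as follows. This is a minimal illustration and not the class's actual implementation: the helper name `build_transitions`, the `files_by_commit` mapping, and the `older->newer` column label format are all assumptions made for the example.

```python
def build_transitions(commits, files_by_commit):
    """Hypothetical helper: pair consecutive commits and collect the files they touch.

    commits         -- ordered list of commit hashes (oldest first)
    files_by_commit -- assumed mapping of commit hash -> set of file paths
    """
    if len(commits) < 2:
        raise ValueError("at least 2 commits are required")

    transition_columns = []
    all_files = set()
    # Each column describes one older -> newer step in the commit list.
    for older, newer in zip(commits, commits[1:]):
        transition_columns.append(f"{older}->{newer}")
        all_files |= files_by_commit[older] | files_by_commit[newer]
    return transition_columns, sorted(all_files)


example_columns, example_files = build_transitions(
    ["aaa1111", "bbb2222", "ccc3333"],
    {"aaa1111": {"x.h5"}, "bbb2222": {"x.h5", "y.h5"}, "ccc3333": {"y.h5"}},
)
```

With N commits this yields N - 1 transition columns, which is why at least two commits are needed.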

cmd_diff_compare(file_path, older_commit, newer_commit)[source]

Compare two versions of a file using the command-line diff tool.

Parameters:
  • file_path (str) – Path to the file to compare.

  • older_commit (str) – Older commit hash.

  • newer_commit (str) – Newer commit hash.

Returns:

True if files differ, False if identical.

Return type:

bool
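To illustrate the return convention (True when the files differ, False when identical), here is a small sketch that compares two already-extracted files. It uses the standard-library `filecmp` module as a portable stand-in for the system diff command, so it is not the method's actual implementation; `files_differ` is a hypothetical name.

```python
import filecmp
import tempfile
from pathlib import Path


def files_differ(older_path, newer_path):
    """Return True if the two files differ in content, False if identical."""
    # shallow=False forces a content comparison instead of trusting os.stat() metadata.
    return not filecmp.cmp(older_path, newer_path, shallow=False)


with tempfile.TemporaryDirectory() as temp_dir:
    base = Path(temp_dir) / "data_older.txt"
    same = Path(temp_dir) / "data_same.txt"
    edited = Path(temp_dir) / "data_newer.txt"
    base.write_text("flux = 1.0\n")
    same.write_text("flux = 1.0\n")
    edited.write_text("flux = 1.5\n")

    result_same = files_differ(base, same)       # identical contents
    result_changed = files_differ(base, edited)  # one value changed
```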

create_file_data_row(file_path)[source]

Create a data row for the file change matrix.

Parameters:

file_path (str) – File path for the row.

Returns:

Row data with file path and change symbols.

Return type:

dict of str
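A row of the matrix can be pictured as one dictionary per file, with one entry per transition column. The sketch below assumes a `file_transitions` mapping of column label to `{file_path: symbol}` and a `"File"` key for the path; both the layout and the key name are assumptions for illustration, not the documented internals.

```python
ABSENT = "−"  # symbol assumed to mean "file not present in this transition"


def build_row(file_path, transition_columns, file_transitions):
    """Hypothetical helper: collect one file's change symbols across all transitions."""
    row = {"File": file_path}
    for column in transition_columns:
        # Fall back to the "absent" symbol when the transition has no entry for this file.
        row[column] = file_transitions.get(column, {}).get(file_path, ABSENT)
    return row


example_row = build_row(
    "spectrum.h5",
    ["c1->c2", "c2->c3"],
    {"c1->c2": {"spectrum.h5": "A"}, "c2->c3": {"other.h5": "M"}},
)
```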

extract_file_from_commit(commit_hash, file_path, temp_dir, suffix)[source]

Extract a single file from a git commit to a temporary location.

Parameters:
  • commit_hash (str) – Git commit hash to extract file from.

  • file_path (str) – Path to the file within the commit.

  • temp_dir (str or Path) – Temporary directory to extract file to.

  • suffix (str) – Suffix to add to the extracted filename.

Returns:

Path to the extracted file.

Return type:

str

get_analysis_results()[source]

Get complete file change analysis results.

Returns:

Commit info DataFrame, legend Series, and matrix DataFrame. Returns None if no analysis has been done.

Return type:

tuple of (pandas.DataFrame, pandas.Series, pandas.DataFrame) or None

Notes

Must be called after analyze_commits().

get_changes_with_git(older_commit, newer_commit)[source]

Analyze file changes between two commits.

Parameters:
  • older_commit (str) – Older commit hash.

  • newer_commit (str) – Newer commit hash.

Returns:

File paths mapped to change symbols (A/D/M/•/−).

Return type:

dict of str
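The mapping from file presence to change symbols can be sketched as below. The reading of the symbols (A = added, D = deleted, M = modified, • = unchanged, − = absent from both commits) is inferred from the class description and is an assumption, as are the helper name `classify_changes` and the `is_modified` callback.

```python
UNCHANGED = "•"
ABSENT = "−"


def classify_changes(candidate_files, older_files, newer_files, is_modified):
    """Hypothetical helper: map each file path to a change symbol.

    is_modified -- assumed callable taking a file path and returning True
                   when the content differs between the two commits
    """
    changes = {}
    for path in candidate_files:
        in_older = path in older_files
        in_newer = path in newer_files
        if in_older and in_newer:
            changes[path] = "M" if is_modified(path) else UNCHANGED
        elif in_newer:
            changes[path] = "A"  # only in the newer commit
        elif in_older:
            changes[path] = "D"  # only in the older commit
        else:
            changes[path] = ABSENT
    return changes


example_changes = classify_changes(
    ["a.h5", "b.h5", "c.h5", "d.h5", "e.h5"],
    older_files={"a.h5", "b.h5", "d.h5"},
    newer_files={"a.h5", "b.h5", "c.h5"},
    is_modified=lambda path: path == "a.h5",
)
```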

get_commit_info()[source]

Get commit information table.

Returns:

DataFrame containing commit information.

Return type:

pandas.DataFrame

get_dataframe_matrix()[source]

Get the file change matrix as a DataFrame.

Returns:

Legend Series and matrix DataFrame, or (None, None) if no files are found.

Return type:

tuple of (pandas.Series, pandas.DataFrame) or (None, None)

get_files_in_commit(commit_hash, file_extensions=None)[source]

Extract file paths from a Git commit, optionally filtered by extensions.

Parameters:
  • commit_hash (str) – Git commit hash to analyze.

  • file_extensions (tuple of str, optional) – File extensions to filter by (e.g., (‘.h5’, ‘.hdf5’)). If None, returns all files.

Returns:

Set of file paths in the commit.

Return type:

set of str
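The extension filter can be illustrated with `str.endswith`, which accepts a tuple of suffixes. This sketch assumes the file listing has already been obtained (for example from `git ls-tree -r --name-only <commit>`); the helper name `filter_by_extensions` is hypothetical.

```python
def filter_by_extensions(paths, file_extensions=None):
    """Hypothetical helper: keep only paths ending in one of the given extensions."""
    if file_extensions is None:
        return set(paths)  # no filter: keep every file
    # str.endswith accepts a tuple and matches if any suffix fits.
    suffixes = tuple(file_extensions)
    return {path for path in paths if path.endswith(suffixes)}


listing = ["data/spectrum.h5", "data/plasma.hdf5", "README.md", "setup.py"]
hdf_only = filter_by_extensions(listing, (".h5", ".hdf5"))
everything = filter_by_extensions(listing)
```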

is_file_modified(file_path, older_commit, newer_commit)[source]

Check if a file was modified between two commits.

Uses the configured comparison function (git_diff or cmd_diff) to determine if the file content differs between the two commits.

Parameters:
  • file_path (str) – Path to the file to check.

  • older_commit (str) – Older commit hash.

  • newer_commit (str) – Newer commit hash.

Returns:

True if file was modified, False otherwise.

Return type:

bool

Raises:

ValueError – If an invalid comparison function is configured.
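One common way to get this behaviour is a lookup table from the configured name to a comparison callable, raising ValueError for unknown names. The sketch below is an assumption about structure, not the method's actual code; `pick_comparator` and the placeholder lambdas are hypothetical.

```python
def pick_comparator(compare_function, comparators):
    """Hypothetical helper: resolve a comparison method name to a callable."""
    try:
        return comparators[compare_function]
    except KeyError:
        valid = ", ".join(sorted(comparators))
        raise ValueError(
            f"Invalid compare_function {compare_function!r}; expected one of: {valid}"
        ) from None


# Placeholder callables standing in for the real git_diff / cmd_diff comparisons.
example_comparators = {
    "git_diff": lambda file_path, older, newer: True,
    "cmd_diff": lambda file_path, older, newer: False,
}

chosen = pick_comparator("git_diff", example_comparators)
was_modified = chosen("spectrum.h5", "aaa1111", "bbb2222")
```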