tardisbase.testing.regression_comparison.visualize_files module
- class tardisbase.testing.regression_comparison.visualize_files.MultiCommitCompare(regression_repo_path, commits, tardis_commits=None, tardis_repo_path=None, file_extensions=None, compare_function='git_diff')
Bases: object
Visualizes file changes across commits in a matrix format.
This class analyzes changes to files across multiple Git commits and displays the results in a tabular matrix format showing file transitions (added, deleted, modified, unchanged) between commits.
Supports filtering by file extensions to focus analysis on specific file types (e.g., .h5, .hdf5, .py files).
- Parameters:
regression_repo_path (str or Path) – Path to the regression data repository.
commits (list of str) – List of regression data commit hashes to analyze.
tardis_commits (list of str, optional) – List of corresponding TARDIS commits (for case 2).
tardis_repo_path (str or Path, optional) – Path to TARDIS repository (for getting TARDIS commit messages).
file_extensions (tuple of str, optional) – File extensions to filter by (e.g., ('.h5', '.hdf5')). If None, analyzes all files.
compare_function (str, optional) – Comparison method to use. Default is 'git_diff'. Options:
- 'git_diff': Uses git's built-in diff functionality to compare files directly within the repository.
- 'cmd_diff': Extracts files to temporary locations and uses the system's diff command.
- analyze_commits()
Analyze file changes across all commits.
Notes
Requires at least 2 commits. Populates transition_columns, file_transitions, and all_files attributes.
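The transition labels named above (added, deleted, modified, unchanged) can be illustrated with a minimal sketch. It is not the class's implementation: `classify_transition`, `build_matrix`, the extra `"absent"` label, and the per-commit dicts mapping a path to a content fingerprint are all hypothetical stand-ins chosen so the example runs without a Git repository.

```python
def classify_transition(path, older, newer):
    """Label how `path` changed between two commit snapshots.

    `older` and `newer` are hypothetical dicts mapping file path -> content
    fingerprint (for example a hash) for one commit each.
    """
    if path not in older and path not in newer:
        return "absent"  # hypothetical label: file exists in neither commit
    if path not in older:
        return "added"
    if path not in newer:
        return "deleted"
    return "unchanged" if older[path] == newer[path] else "modified"


def build_matrix(snapshots):
    """Return {path: [label per consecutive commit pair]}, oldest commit first."""
    if len(snapshots) < 2:
        raise ValueError("need at least 2 commits to compare")
    # Union of every path seen in any commit.
    all_files = sorted(set().union(*snapshots))
    pairs = list(zip(snapshots, snapshots[1:]))
    return {
        path: [classify_transition(path, old, new) for old, new in pairs]
        for path in all_files
    }
```

Each row of the resulting mapping corresponds to one file and each list entry to one commit-to-commit transition, which is the same shape as the matrix this class reports.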
- cmd_diff_compare(file_path, older_commit, newer_commit)
Compare files using command-line diff tool.
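As a rough illustration of comparing two already-extracted files with the system diff tool, the sketch below relies on the exit status of `diff -q` (0 when the files match, 1 when they differ, greater than 1 on error). The helper name `files_differ` is hypothetical, and the method documented here may invoke `diff` differently.

```python
import subprocess


def files_differ(path_a, path_b):
    """Return True if the system `diff` reports that two files differ."""
    # -q suppresses the diff body; only the exit status is needed.
    result = subprocess.run(
        ["diff", "-q", str(path_a), str(path_b)],
        capture_output=True,
    )
    if result.returncode > 1:
        # Exit status above 1 means diff itself failed (e.g. missing file).
        raise RuntimeError(result.stderr.decode(errors="replace"))
    return result.returncode == 1
```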
- extract_file_from_commit(commit_hash, file_path, temp_dir, suffix)
Extract a single file from a git commit to temporary location.
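One common way to read a file as it existed at a given commit is `git show <commit>:<path>`. The sketch below assumes that approach; whether this method uses it is not stated here. The helper names, the `repo_path` argument, and the way `suffix` is folded into the output file name are hypothetical.

```python
import subprocess
from pathlib import Path


def git_show_command(commit_hash, file_path):
    """Build the `git show` invocation that prints a file as of a commit."""
    # Git expects forward slashes in the <commit>:<path> form.
    return ["git", "show", f"{commit_hash}:{Path(file_path).as_posix()}"]


def extract_file(repo_path, commit_hash, file_path, temp_dir, suffix):
    """Write the file's bytes at `commit_hash` into `temp_dir`.

    Returns the path of the written copy, or None if git could not
    produce the file (for example, it does not exist in that commit).
    """
    result = subprocess.run(
        git_show_command(commit_hash, file_path),
        cwd=repo_path,
        capture_output=True,  # bytes, so binary files such as HDF5 survive
    )
    if result.returncode != 0:
        return None
    source = Path(file_path)
    # Hypothetical naming scheme: <stem>_<suffix><extension>
    target = Path(temp_dir) / f"{source.stem}_{suffix}{source.suffix}"
    target.write_bytes(result.stdout)
    return target
```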
- get_analysis_results()
Get complete file change analysis results.
- Returns:
Commit info DataFrame, legend Series, and matrix DataFrame. Returns None if no analysis has been done.
- Return type:
tuple of (pandas.DataFrame, pandas.Series, pandas.DataFrame) or None
Notes
Must be called after analyze_commits().
- get_commit_info()
Get commit information table.
- Returns:
DataFrame containing commit information.
- Return type:
pandas.DataFrame
- get_dataframe_matrix()
Get the file change matrix as a DataFrame.
- Returns:
Legend Series and matrix DataFrame, or (None, None) if no files are found.
- Return type:
tuple of (pandas.Series, pandas.DataFrame) or (None, None)
- get_files_in_commit(commit_hash, file_extensions=None)
Extract file paths from a Git commit, optionally filtered by extensions.
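The extension filter can be sketched in a few lines. This is an illustration only: `filter_by_extensions` is a hypothetical helper, and it assumes the list of paths has already been obtained, for instance from the output of `git ls-tree -r --name-only <commit>`.

```python
def filter_by_extensions(paths, file_extensions=None):
    """Keep only paths ending in one of `file_extensions`.

    With file_extensions=None every path is kept, mirroring the
    "analyze all files" behaviour described for the class.
    """
    if file_extensions is None:
        return list(paths)
    # str.endswith accepts a tuple and matches any of its entries.
    wanted = tuple(file_extensions)
    return [p for p in paths if p.endswith(wanted)]
```

For example, `filter_by_extensions(["a.h5", "b.py"], (".h5", ".hdf5"))` keeps only `"a.h5"`.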
- is_file_modified(file_path, older_commit, newer_commit)
Check if a file was modified between two commits.
Uses the configured comparison function (git_diff or cmd_diff) to determine if the file content differs between the two commits.
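- Parameters:
file_path – Path of the file to check.
older_commit – The older of the two commits to compare.
newer_commit – The newer of the two commits to compare.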
- Returns:
True if file was modified, False otherwise.
- Return type:
bool
- Raises:
ValueError – If an invalid comparison function is configured.
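The dispatch on the configured comparison method, including the ValueError for an unknown name, might look roughly like the sketch below. It is written as a standalone function rather than a method, and the `comparers` mapping and the stub callables in the usage lines are hypothetical; the real class presumably routes to its own git-diff and `cmd_diff_compare` logic.

```python
def is_file_modified(file_path, older_commit, newer_commit,
                     compare_function, comparers):
    """Dispatch to the comparer registered under `compare_function`.

    `comparers` is a hypothetical dict mapping a method name such as
    'git_diff' or 'cmd_diff' to a callable
    (file_path, older_commit, newer_commit) -> bool.
    """
    try:
        compare = comparers[compare_function]
    except KeyError:
        raise ValueError(
            f"Invalid compare_function: {compare_function!r}"
        ) from None
    return bool(compare(file_path, older_commit, newer_commit))


# Usage with stub comparers standing in for the real comparison logic.
stubs = {
    "git_diff": lambda path, old, new: old != new,
    "cmd_diff": lambda path, old, new: False,
}
changed = is_file_modified("a.h5", "c1", "c2", "git_diff", stubs)
```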