oats.git
Git repository operations, diff extraction, and dataset building.
oats.git.git_diff_extractor
Git Commit Diff Extractor
A tool to extract git diffs from commits in the /opt/ds/oats repository, store commit metadata and diffs in a pandas DataFrame, and save results to disk.
- oats.git.git_diff_extractor.get_git_repo(repo_path)[source]
Initialize and return a GitPython Repo object.
- Return type:
Repo
- Parameters:
repo_path – Filesystem path to the Git repository.
- Returns:
A git.Repo instance for the given path.
- Raises:
ValueError – If the path does not exist or is not a Git repository.
- oats.git.git_diff_extractor.extract_commit_data(repo, max_commits=None)[source]
Extract commit metadata and diffs from the repository.
- oats.git.git_diff_extractor.create_dataframe(commits_data)[source]
Create a pandas DataFrame from a list of commit data dicts.
- Return type:
DataFrame
- Parameters:
commits_data – List of dicts returned by extract_commit_data().
- Returns:
A pandas DataFrame with one row per commit.
- oats.git.git_diff_extractor.save_dataframe(df, output_file)[source]
Save the DataFrame to a CSV file on disk.
- oats.git.git_diff_extractor.new_api(areq=None, repo_path=None)[source]
Main API function to extract git diffs from a repository.
Extracts commit metadata and diffs, builds a DataFrame, and saves it to CSV.
oats.git.git_to_df_converter
Git Commit to Pandas DataFrame Converter
Extracts Git commit history and converts it into a structured pandas DataFrame, sorted by commit date in descending order. Can save the result as a Parquet file.
- oats.git.git_to_df_converter.extract_git_commits(repo_path)[source]
Extract Git commit history from a repository and return as a pandas DataFrame.
Iterates over all commits in the repository, collecting SHA, author, date, message, and parent hashes. Results are sorted by commit date in descending order.
- Return type:
DataFrame
- Parameters:
repo_path – Path to the Git repository.
- Returns:
DataFrame with columns id, author_name, author_email, commit_date, message, parents.
- Raises:
Exception – If the repository cannot be opened or read.
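The descending date sort described above can be illustrated with plain dicts and the standard library. This is only a sketch: the records are invented sample data using the documented column names, and the real function returns a pandas DataFrame rather than a list.

```python
from datetime import datetime, timezone

# Hypothetical commit records; only the key names come from the documentation.
records = [
    {"id": "a1", "author_name": "Ann", "author_email": "ann@example.com",
     "commit_date": datetime(2024, 1, 5, tzinfo=timezone.utc),
     "message": "first", "parents": []},
    {"id": "b2", "author_name": "Bob", "author_email": "bob@example.com",
     "commit_date": datetime(2024, 3, 1, tzinfo=timezone.utc),
     "message": "third", "parents": ["a1"]},
    {"id": "c3", "author_name": "Ann", "author_email": "ann@example.com",
     "commit_date": datetime(2024, 2, 10, tzinfo=timezone.utc),
     "message": "second", "parents": ["a1"]},
]

# Newest commit first, matching the documented sort order.
newest_first = sorted(records, key=lambda r: r["commit_date"], reverse=True)
ids = [r["id"] for r in newest_first]
print(ids)
```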
oats.git.repo_to_parquet
Simple script to convert a single Git repository to Parquet format.
Extracts all commits from the specified repository, builds a pandas DataFrame,
and saves the result as a Parquet file. Defaults to /opt/ds/oats as the
source repository and /tmp/repo-commits.pq as the output path.
- oats.git.repo_to_parquet.get_arg_vals()[source]
Parse and return CLI arguments for the repo-to-Parquet converter.
- Return type:
- Returns:
Parsed namespace with repo_path and out_file attributes.
- oats.git.repo_to_parquet.main()[source]
CLI entry point — parse args and convert a Git repository to Parquet format.
- Return type:
- oats.git.repo_to_parquet.convert_repo_to_parquet(repo_path, out_file)[source]
Convert a Git repository’s commit history to a Parquet file.
Extracts all commits from the given repository, builds a pandas DataFrame, and saves it to the specified Parquet output path.
- Return type:
DataFrame | None
- Parameters:
repo_path – Path to the Git repository.
out_file – Path where the Parquet file will be written.
- Returns:
The DataFrame of commits on success, or None on failure.
oats.git.build_git_repo_to_dataset
Git Repository to Markdown Dataset Builder
This module analyzes a Git repository and generates markdown representations of each commit’s diff, storing them in a pandas DataFrame under the column ‘git_diff_md’.
- oats.git.build_git_repo_to_dataset.build_git_diff_markdown(diff_index)[source]
Convert a Git diff index to a markdown table.
The table has columns: File, Type (Added/Modified/Deleted/Renamed), and Changes.
- Return type:
- Parameters:
diff_index – A GitPython diff index (iterable of diff objects).
- Returns:
A markdown-formatted table as a string, or empty string if no diffs.
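A minimal sketch of the table layout described above, using only the standard library. DiffEntry is a hypothetical stand-in for a GitPython diff object, and the content of the Changes column is an assumption, since the documentation does not say what it holds.

```python
from dataclasses import dataclass


# Hypothetical stand-in for a GitPython diff object. Real diff objects
# expose a single-letter change_type such as "A", "M", "D", or "R".
@dataclass
class DiffEntry:
    path: str
    change_type: str
    changes: str  # assumed free-text summary, e.g. "+3/-1"


TYPE_LABELS = {"A": "Added", "M": "Modified", "D": "Deleted", "R": "Renamed"}


def diff_entries_to_markdown(entries):
    """Render entries as a markdown table, or "" when there are none."""
    entries = list(entries)
    if not entries:
        return ""
    rows = ["| File | Type | Changes |", "| --- | --- | --- |"]
    for entry in entries:
        # Fall back to the raw letter for change types not in the table.
        label = TYPE_LABELS.get(entry.change_type, entry.change_type)
        rows.append(f"| {entry.path} | {label} | {entry.changes} |")
    return "\n".join(rows)


table = diff_entries_to_markdown([
    DiffEntry("README.md", "M", "+3/-1"),
    DiffEntry("oats/new.py", "A", "+40/-0"),
])
print(table)
```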
- oats.git.build_git_repo_to_dataset.extract_commit_info(commit)[source]
Extract basic metadata from a Git commit object.
- Return type:
- Parameters:
commit – A GitPython commit object.
- Returns:
Dict with keys commit_hash, author, email, date, message.
- oats.git.build_git_repo_to_dataset.build_git_dataset(repo_path)[source]
Build a pandas DataFrame containing git commit diffs in markdown format.
Iterates over all commits in the repository, extracts metadata, computes diffs between consecutive commits, and formats each diff as a markdown table.
- Return type:
DataFrame
- Parameters:
repo_path – Path to the Git repository.
- Returns:
DataFrame with columns commit_hash, author, email, date, message, git_diff_md.
- Raises:
Exception – If the repository cannot be opened or processed.
- oats.git.build_git_repo_to_dataset.main()[source]
CLI entry point — build the git dataset and optionally save to Parquet.
- Return type:
DataFrame
- oats.git.build_git_repo_to_dataset.new_api(areq)[source]
High-level API function for integrating with AgentReq.
Delegates to build_git_dataset() using the repo path from the request.
- Return type:
DataFrame
- Parameters:
areq – AgentReq object with a repo_path attribute (defaults to .).
- Returns:
DataFrame with commit information and markdown diffs.
oats.git.git_commit_search1
git_commit_search.py — Search a git repository’s commit history for case-insensitive phrases, or replay a predefined set of git actions loaded from a JSON file. REPLACE_USER in repo: REPLACE_REPO_PATH search term: REPLACE_GIT_SEARCH
- Usage:
# Phrase search (builds -G actions automatically)
python git_commit_search.py -r REPLACE_REPO_PATH -p "REPLACE_GIT_SEARCH"
# Replay a saved action plan
python git_commit_search.py -r REPLACE_REPO_PATH -f git_actions.json
# Both: run file actions first, then phrase search
python git_commit_search.py -r . -f git_actions.json -p "REPLACE_GIT_SEARCH" -a
# Verbose output
python git_commit_search.py -f git_actions.json -v
- class oats.git.git_commit_search1.ActionType(*values)[source]
Supported git sub-commands for commit search actions.
- Members:
LOG: Run git log with the configured filters.
SHOW: Run git show to inspect a specific commit.
- LOG = 'log'
- SHOW = 'show'
- class oats.git.git_commit_search1.GitCommitAction(**data)[source]
Bases:
BaseModel
Represents a single git command: either git log or git show.
Covers every flag used during the feed-detail-fix1 UUID investigation:
git log --author=<email> --oneline [-n N] [--skip N]
git log --all --oneline [--follow] -S <pickaxe> [-- <path>]
git log --all --oneline [--regexp-ignore-case] -G <phrase> [-- <path>]
git show <hash> [--stat] [-- <path>]
- model_config: ClassVar[ConfigDict] = {'extra': 'ignore', 'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- action: ActionType
- class oats.git.git_commit_search1.GitSearchPlan(**data)[source]
Bases:
BaseModel
Top-level wrapper for a git_actions JSON file.
Holds the repository path, an optional human-readable name, and an ordered list of GitCommitAction objects to execute.
- model_config: ClassVar[ConfigDict] = {'extra': 'ignore', 'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- actions: list[GitCommitAction] | None
- oats.git.git_commit_search1.parse_args(argv=None)[source]
Parse CLI arguments and return the populated Namespace.
- Return type:
- Short flags:
- -r
path to the git repository (default: current directory)
- -f
path to a git_actions JSON file
- -p
one or more case-insensitive phrases to search for
- -a
search all branches (–all) when using -p phrases
- -n
maximum commits to return per phrase search (default: 50)
- -A
filter by author e-mail or name when using -p phrases
- -P
restrict phrase search to a specific file path
- -v
enable DEBUG-level logging
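The short flags listed above could be wired up with argparse roughly as follows. Only the flags and the two documented defaults come from the documentation. Every dest name, the help text, and the use of nargs="+" for -p are assumptions made for this sketch.

```python
import argparse


def parse_args_sketch(argv=None):
    """Parse the documented short flags; dest names are hypothetical."""
    parser = argparse.ArgumentParser(description="Search git commit history")
    parser.add_argument("-r", dest="repo", default=".",
                        help="path to the git repository")
    parser.add_argument("-f", dest="actions_file", default=None,
                        help="path to a git_actions JSON file")
    # nargs="+" is one way to accept "one or more" phrases (assumed).
    parser.add_argument("-p", dest="phrases", nargs="+", default=[],
                        help="case-insensitive phrases to search for")
    parser.add_argument("-a", dest="all_branches", action="store_true",
                        help="search all branches when using -p")
    parser.add_argument("-n", dest="limit", type=int, default=50,
                        help="maximum commits per phrase search")
    parser.add_argument("-A", dest="author", default=None,
                        help="filter by author e-mail or name")
    parser.add_argument("-P", dest="path_filter", default=None,
                        help="restrict phrase search to a file path")
    parser.add_argument("-v", dest="verbose", action="store_true",
                        help="enable DEBUG-level logging")
    return parser.parse_args(argv)


args = parse_args_sketch(["-r", ".", "-p", "uuid", "feed detail", "-a", "-n", "10"])
print(args.phrases, args.all_branches, args.limit)
```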
- oats.git.git_commit_search1.load_search_plan(file_path)[source]
Read and validate a search plan from file_path.
Accepts either:
* a bare JSON array [ {...}, {...} ]
* a wrapped object { "name": "...", "repo": ".", "actions": [ ... ] }
- Return type:
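To illustrate the two accepted JSON shapes, here is a stdlib-only sketch that normalizes both into one wrapped dict. It works on a JSON string rather than a file path and returns a plain dict instead of a validated GitSearchPlan. The function name, the "." default for repo, and turning a missing action list into an empty list are all assumptions.

```python
import json


def load_plan_sketch(text):
    """Normalize either accepted JSON shape into one wrapped dict."""
    data = json.loads(text)
    if isinstance(data, list):
        # Bare array: treat the whole document as the action list.
        return {"name": None, "repo": ".", "actions": data}
    if isinstance(data, dict):
        return {
            "name": data.get("name"),
            "repo": data.get("repo", "."),
            "actions": data.get("actions") or [],
        }
    raise ValueError("expected a JSON array or a JSON object")


bare = load_plan_sketch('[{"action": "log"}, {"action": "show"}]')
wrapped = load_plan_sketch(
    '{"name": "demo", "repo": ".", "actions": [{"action": "log"}]}'
)
print(len(bare["actions"]), wrapped["name"])
```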
- oats.git.git_commit_search1.build_phrase_actions(phrases, *, all_branches=False, limit=50, author=None, path_filter=None)[source]
Convert a list of plain-text phrases into GitCommitAction objects using case-insensitive -G grep searches.
- Return type:
- Parameters:
phrases – List of text strings to search for.
all_branches – If True, pass --all to git log.
limit – Cap on commits returned per phrase.
author – Optional author filter.
path_filter – Optional file path to restrict search to.
- Returns:
One GitCommitAction per phrase.
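As a rough illustration, each phrase could map to a git log command line like the one built below. The function name and the ordering of the arguments are assumptions. The real function returns GitCommitAction objects rather than argument lists.

```python
def phrase_to_git_log_args(phrase, *, all_branches=False, limit=50,
                           author=None, path_filter=None):
    """Build a git log argument list for one case-insensitive -G search."""
    args = ["git", "log"]
    if all_branches:
        args.append("--all")
    args += ["--oneline", "--regexp-ignore-case", "-n", str(limit)]
    if author:
        args.append(f"--author={author}")
    # -G selects commits whose diff contains lines matching the pattern.
    args += ["-G", phrase]
    if path_filter:
        # Everything after "--" is treated as a path, not a revision.
        args += ["--", path_filter]
    return args


simple = phrase_to_git_log_args("uuid")
full = phrase_to_git_log_args("feed detail", all_branches=True, limit=10,
                              author="a@example.com", path_filter="src/app.py")
print(" ".join(simple))
```

Note that git treats the -G argument as a regular expression, so phrases containing regex metacharacters will not be matched literally.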
- oats.git.git_commit_search1.execute_action(repo, action)[source]
Execute a single GitCommitAction against repo and return a result dict with keys:
* action – the action type string
* description – human-readable label (may be None)
* command – the git command line that was run
* output – raw text output from git
* lines – output split into non-empty lines
* match_count – number of output lines (useful for log searches)
* error – error message string if the command failed, else None
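The shape of the result dict can be sketched without running git at all. The helper name below is hypothetical, and treating whitespace-only lines as empty is an assumption about how the real function filters its output.

```python
def build_result(action, command, output, description=None, error=None):
    """Assemble a result dict with the documented keys from raw git output."""
    # Keep only lines that contain something other than whitespace.
    lines = [line for line in output.splitlines() if line.strip()]
    return {
        "action": action,
        "description": description,
        "command": command,
        "output": output,
        "lines": lines,
        "match_count": len(lines),
        "error": error,
    }


result = build_result(
    "log",
    "git log --oneline -G uuid",
    "a1b2c3d Fix uuid parsing\n\n9f8e7d6 Add uuid column\n",
)
print(result["match_count"], result["lines"])
```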
- oats.git.git_commit_search1.run_git_search(plan, *, repo_path='.')[source]
Open the git repository at repo_path, execute every action in plan in order, and return the collected results.
- Return type:
- Parameters:
plan – The loaded and validated GitSearchPlan.
repo_path – Filesystem path to the git repo root. The plan’s own repo field is used as the default; the caller’s repo_path argument takes precedence when supplied.
- Returns:
A list of result dicts, one per action (see execute_action()).
oats.git.walk_up_dir_path_to_find_git_config
Walk up the directory tree to find the first directory containing .git/config.
- oats.git.walk_up_dir_path_to_find_git_config.walk_up_dir_path_to_find_git_config(dir_path)[source]
Walk up the directory tree from dir_path to find the Git repository root.
Starting at the given directory, checks each ancestor for a .git/config file. Stops at the filesystem root if no Git repo is found.
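The upward walk can be sketched with pathlib as follows. The function name is hypothetical, and returning None when no repository is found is an assumption, since the documentation does not state what the real function returns in that case.

```python
import tempfile
from pathlib import Path


def find_git_root(dir_path):
    """Return the nearest directory at or above dir_path containing .git/config."""
    current = Path(dir_path).resolve()
    # Path.parents ends at the filesystem root, so the loop stops there.
    for candidate in (current, *current.parents):
        if (candidate / ".git" / "config").is_file():
            return candidate
    return None  # assumed behavior when no repository is found


# Build a throwaway directory tree that looks like a repository root.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp).resolve()
    (root / ".git").mkdir()
    (root / ".git" / "config").write_text("[core]\n")
    nested = root / "a" / "b"
    nested.mkdir(parents=True)
    found = find_git_root(nested)
    print(found == root)
```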
oats.git.worktree
Git worktree manager — creates isolated workspaces for sub-agents.
Each worktree is a separate checkout of the repository, allowing sub-agents to modify files without affecting the main working tree.
- class oats.git.worktree.WorktreeManager(repo_dir)[source]
Bases:
object
Manages git worktrees for isolated sub-agent work.
Worktrees are created under <repo>/.coder-worktrees/<id>/ and can be cleaned up after the agent finishes.
- __init__(repo_dir)[source]
Initialize the worktree manager for the given repository.
- Parameters:
repo_dir – Path to the root of the Git repository.
- async create(branch_name=None)[source]
Create an isolated worktree.
- Return type:
- Parameters:
branch_name – Optional branch name. If None, creates a detached worktree from the current HEAD.
- Returns:
Path to the new worktree directory.
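The following sketch shows how the worktree path and the corresponding git worktree add command line might be derived. It only builds the arguments and does not run git or use asyncio. The function name, the short random id, and the choice of -b (which creates a new branch) for the named-branch case are assumptions, not the documented behavior of create().

```python
import uuid
from pathlib import Path


def worktree_add_args(repo_dir, branch_name=None, worktree_id=None):
    """Return (worktree_path, argv) for a new worktree under .coder-worktrees/."""
    worktree_id = worktree_id or uuid.uuid4().hex[:8]  # assumed id scheme
    path = Path(repo_dir) / ".coder-worktrees" / worktree_id
    args = ["git", "-C", str(repo_dir), "worktree", "add"]
    if branch_name is None:
        # Detached checkout of the current HEAD.
        args += ["--detach", str(path)]
    else:
        # Assumption: create a new branch with the given name.
        args += ["-b", branch_name, str(path)]
    return path, args


detached_path, detached_args = worktree_add_args("/repo", worktree_id="abc")
branch_path, branch_args = worktree_add_args("/repo", "agent-1", worktree_id="def")
print(detached_args)
```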
- async cleanup(worktree_path)[source]
Remove a worktree.
Only removes if the worktree has no uncommitted changes.
- Return type: