oats.git

Git repository operations, diff extraction, and dataset building.

oats.git.coauthor

Git co-author enforcement for the coder agent.

Ensures all git commit commands include a Co-Authored-By trailer so that AI-generated code is properly attributed, in line with the Linux Foundation’s stance on AI contribution disclosure.

Usage:

from oats.git.coauthor import ensure_coauthor_trailer
cmd = ensure_coauthor_trailer('git commit -m "fix bug"')
oats.git.coauthor.ensure_coauthor_trailer(command)[source]

Ensure a git commit command includes the coder co-author trailer.

Inspects the command for a -m message flag (quoted or heredoc style) and appends the Co-Authored-By trailer. Falls back to adding a --trailer flag if the message format is unrecognized.

Return type:

str

Parameters:

command – The full shell command string.

Returns:

The original or modified command string with the trailer included.
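
The behaviour described above can be approximated with a short sketch. This is not the oats implementation: the function name add_trailer_sketch, the trailer text, and the regular expression are assumptions made for illustration, and heredoc-style messages are not handled here.

```python
import re

# Hypothetical trailer text; the value oats actually uses is not shown in this document.
TRAILER = "Co-Authored-By: Example Agent <agent@example.com>"


def add_trailer_sketch(command: str, trailer: str = TRAILER) -> str:
    """Illustrative only: add a trailer to a `git commit` shell command."""
    # Leave non-commit commands, and commands that already carry the trailer, untouched.
    if "git commit" not in command or trailer in command:
        return command

    # Recognize a double-quoted -m message (escaped quotes are not handled in this sketch).
    match = re.search(r'-m\s+"([^"]*)"', command)
    if match:
        new_message = f"{match.group(1)}\n\n{trailer}"
        return command[:match.start()] + f'-m "{new_message}"' + command[match.end():]

    # Unrecognized message format: fall back to git's --trailer option.
    return f"{command} --trailer '{trailer}'"


print(add_trailer_sketch('git commit -m "fix bug"'))
```

The trailer is separated from the message body by a blank line, which is the position where git expects trailers.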

oats.git.git_diff_extractor

Git Commit Diff Extractor

A tool to extract git diffs from commits in the /opt/ds/oats repository, store commit metadata and diffs in a pandas DataFrame, and save results to disk.

oats.git.git_diff_extractor.get_git_repo(repo_path)[source]

Initialize and return a GitPython Repo object.

Return type:

Repo

Parameters:

repo_path – Filesystem path to the Git repository.

Returns:

A git.Repo instance for the given path.

Raises:

ValueError – If the path does not exist or is not a Git repository.

oats.git.git_diff_extractor.extract_commit_data(repo, max_commits=None)[source]

Extract commit metadata and diffs from the repository.

Return type:

List[Dict[str, Any]]

Parameters:
  • repo – A git.Repo instance to extract from.

  • max_commits – Optional cap on the number of commits to process.

Returns:

List of dicts, each containing commit hash, author info, dates, message, and diff content.

oats.git.git_diff_extractor.create_dataframe(commits_data)[source]

Create a pandas DataFrame from a list of commit data dicts.

Return type:

DataFrame

Parameters:

commits_data – List of dicts returned by extract_commit_data().

Returns:

A pandas DataFrame with one row per commit.

oats.git.git_diff_extractor.save_dataframe(df, output_file)[source]

Save the DataFrame to a CSV file on disk.

Return type:

None

Parameters:
  • df – The pandas DataFrame to persist.

  • output_file – Filesystem path for the output CSV.

Raises:

Exception – If the file cannot be written.

oats.git.git_diff_extractor.new_api(areq=None, repo_path=None)[source]

Main API function to extract git diffs from a repository.

Extracts commit metadata and diffs, builds a DataFrame, and saves it to CSV.

Return type:

Dict[str, Any]

Parameters:
  • areq – AgentReq instance containing configuration (repo_path, max_commits, output_file).

  • repo_path – Direct repo path override (ignored if areq provides one).

Returns:

Dictionary with keys success, message, output_file, commit_count, df.
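
A caller would typically branch on the success key before using the other fields. The sketch below shows one way to do that; summarize_result is a hypothetical helper, and the example dict is hand-written rather than produced by new_api().

```python
from typing import Any, Dict


def summarize_result(result: Dict[str, Any]) -> str:
    """Turn a new_api()-style result dict into a one-line status string."""
    if not result.get("success"):
        return f"extraction failed: {result.get('message', 'no message')}"
    return f"{result.get('commit_count', 0)} commits written to {result.get('output_file')}"


# Hand-written stand-in for a real return value; all values are made up.
example = {
    "success": True,
    "message": "ok",
    "output_file": "/tmp/diffs.csv",
    "commit_count": 3,
    "df": None,
}
print(summarize_result(example))
```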

oats.git.git_diff_extractor.setup_parser()[source]

Set up and return the CLI argument parser for the diff extractor.

Return type:

ArgumentParser

Returns:

Configured argparse.ArgumentParser instance.

oats.git.git_diff_extractor.main()[source]

CLI entry point — parse args, run diff extraction, and return exit code.

Return type:

int

oats.git.git_to_df_converter

Git Commit to Pandas DataFrame Converter

Extracts Git commit history and converts it into a structured pandas DataFrame, sorted by commit date in descending order. Can save the result as a Parquet file.

oats.git.git_to_df_converter.extract_git_commits(repo_path)[source]

Extract Git commit history from a repository and return as a pandas DataFrame.

Iterates over all commits in the repository, collecting SHA, author, date, message, and parent hashes. Results are sorted by commit date in descending order.

Return type:

DataFrame

Parameters:

repo_path – Path to the Git repository.

Returns:

DataFrame with columns id, author_name, author_email, commit_date, message, parents.

Raises:

Exception – If the repository cannot be opened or read.
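
To make the row shape and the descending date order concrete, here is a minimal standard-library sketch. It uses plain dicts in place of DataFrame rows; the records are invented and sort_newest_first is a hypothetical name, not part of the module.

```python
from datetime import datetime, timezone

# Made-up commit records using the documented column names.
records = [
    {"id": "a1", "author_name": "Ann", "author_email": "ann@example.com",
     "commit_date": datetime(2024, 1, 5, tzinfo=timezone.utc),
     "message": "add parser", "parents": ["c3"]},
    {"id": "b2", "author_name": "Bob", "author_email": "bob@example.com",
     "commit_date": datetime(2024, 3, 1, tzinfo=timezone.utc),
     "message": "fix parser", "parents": ["a1"]},
    {"id": "c3", "author_name": "Ann", "author_email": "ann@example.com",
     "commit_date": datetime(2023, 12, 31, tzinfo=timezone.utc),
     "message": "initial commit", "parents": []},
]


def sort_newest_first(rows):
    """Order commit records by commit_date, most recent first."""
    return sorted(rows, key=lambda row: row["commit_date"], reverse=True)


for row in sort_newest_first(records):
    print(row["id"], row["commit_date"].date(), row["message"])
```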

oats.git.git_to_df_converter.save_dataframe_to_parquet(df, output_path)[source]

Save a pandas DataFrame to a Parquet file.

Creates parent directories as needed (unless the path is an S3 URI).

Return type:

None

Parameters:
  • df – The DataFrame to persist.

  • output_path – Local filesystem path or S3 URI for the Parquet file.

Raises:

Exception – If the file cannot be written.
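
The directory-creation rule can be sketched as follows. This is an assumption about how such a check might look, not the module's code: ensure_parent_dir is a hypothetical helper, and detecting S3 by the s3:// prefix is a guess.

```python
from pathlib import Path


def ensure_parent_dir(output_path: str) -> bool:
    """Create the parent directory of a local path; skip S3 URIs.

    Returns True if a local directory was ensured, False if the path was skipped.
    """
    # Assumption: S3 destinations are recognized by their URI scheme.
    if output_path.startswith("s3://"):
        return False
    Path(output_path).parent.mkdir(parents=True, exist_ok=True)
    return True
```

Using exist_ok=True keeps the call idempotent, so repeated saves to the same directory do not raise.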

oats.git.git_to_df_converter.main()[source]

CLI entry point — parse args, extract commits, and save to Parquet.

Return type:

None

oats.git.repo_to_parquet

Simple script to convert a single Git repository to Parquet format.

Extracts all commits from the specified repository, builds a pandas DataFrame, and saves the result as a Parquet file. Defaults to /opt/ds/oats as the source repository and /tmp/repo-commits.pq as the output path.

oats.git.repo_to_parquet.get_arg_vals()[source]

Parse and return CLI arguments for the repo-to-Parquet converter.

Return type:

Namespace

Returns:

Parsed namespace with repo_path and out_file attributes.

oats.git.repo_to_parquet.main()[source]

CLI entry point — parse args and convert a Git repository to Parquet format.

Return type:

None

oats.git.repo_to_parquet.convert_repo_to_parquet(repo_path, out_file)[source]

Convert a Git repository’s commit history to a Parquet file.

Extracts all commits from the given repository, builds a pandas DataFrame, and saves it to the specified Parquet output path.

Return type:

DataFrame | None

Parameters:
  • repo_path – Path to the Git repository.

  • out_file – Path where the Parquet file will be written.

Returns:

The DataFrame of commits on success, or None on failure.

oats.git.build_git_repo_to_dataset

Git Repository to Markdown Dataset Builder

This module analyzes a Git repository and generates markdown representations of each commit’s diff, storing them in a pandas DataFrame under the column ‘git_diff_md’.

oats.git.build_git_repo_to_dataset.build_git_diff_markdown(diff_index)[source]

Convert a Git diff index to a markdown table.

The table has columns: File, Type (Added/Modified/Deleted/Renamed), and Changes.

Return type:

str

Parameters:

diff_index – A GitPython diff index (iterable of diff objects).

Returns:

A markdown-formatted table as a string, or empty string if no diffs.
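
The table layout can be illustrated with a small sketch that works on plain tuples instead of GitPython diff objects. The function name diff_rows_to_markdown and the content of the Changes column are assumptions; the document does not say what that column holds.

```python
from typing import Iterable, Tuple


def diff_rows_to_markdown(rows: Iterable[Tuple[str, str, str]]) -> str:
    """Render (file, type, changes) tuples as a markdown table."""
    rows = list(rows)
    if not rows:
        # Mirrors the documented behaviour: no diffs gives an empty string.
        return ""
    lines = ["| File | Type | Changes |", "| --- | --- | --- |"]
    for path, change_type, changes in rows:
        lines.append(f"| {path} | {change_type} | {changes} |")
    return "\n".join(lines)


# Invented sample rows for illustration.
print(diff_rows_to_markdown([
    ("src/app.py", "Modified", "+4 / -1"),
    ("README.md", "Added", "+20"),
]))
```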

oats.git.build_git_repo_to_dataset.extract_commit_info(commit)[source]

Extract basic metadata from a Git commit object.

Return type:

dict

Parameters:

commit – A GitPython commit object.

Returns:

Dict with keys commit_hash, author, email, date, message.

oats.git.build_git_repo_to_dataset.build_git_dataset(repo_path)[source]

Build a pandas DataFrame containing git commit diffs in markdown format.

Iterates over all commits in the repository, extracts metadata, computes diffs between consecutive commits, and formats each diff as a markdown table.

Return type:

DataFrame

Parameters:

repo_path – Path to the Git repository.

Returns:

DataFrame with columns commit_hash, author, email, date, message, git_diff_md.

Raises:

Exception – If the repository cannot be opened or processed.

oats.git.build_git_repo_to_dataset.main()[source]

CLI entry point — build the git dataset and optionally save to Parquet.

Return type:

DataFrame

oats.git.build_git_repo_to_dataset.new_api(areq)[source]

High-level API function for integrating with AgentReq.

Delegates to build_git_dataset() using the repo path from the request.

Return type:

DataFrame

Parameters:

areq – AgentReq object with a repo_path attribute (defaults to .).

Returns:

DataFrame with commit information and markdown diffs.

oats.git.git_commit_search1

git_commit_search.py: search a git repository’s commit history for case-insensitive phrases, or replay a predefined set of git actions loaded from a JSON file.

Usage:

# Phrase search (builds -G actions automatically)
python git_commit_search.py -r REPLACE_REPO_PATH -p "REPLACE_GIT_SEARCH"

# Replay a saved action plan
python git_commit_search.py -r REPLACE_REPO_PATH -f git_actions.json

# Both: run file actions first, then phrase search
python git_commit_search.py -r . -f git_actions.json -p "REPLACE_GIT_SEARCH" -a

# Verbose output
python git_commit_search.py -f git_actions.json -v

class oats.git.git_commit_search1.ActionType(*values)[source]

Bases: str, Enum

Supported git sub-commands for commit search actions.

Members:

  • LOG – Run git log with the configured filters.

  • SHOW – Run git show to inspect a specific commit.

LOG = 'log'
SHOW = 'show'
class oats.git.git_commit_search1.GitCommitAction(**data)[source]

Bases: BaseModel

Represents a single git command — either git log or git show.

Covers every flag used during the feed-detail-fix1 UUID investigation:

git log --author=<email> --oneline [-n N] [--skip N]
git log --all --oneline [--follow] -S <pickaxe> [-- <path>]
git log --all --oneline [--regexp-ignore-case] -G <phrase> [-- <path>]
git show <hash> [--stat] [-- <path>]

model_config: ClassVar[ConfigDict] = {'extra': 'ignore', 'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

action: ActionType
description: str | None
author: str | None
all_branches: bool
follow: bool
oneline: bool
limit: int | None
skip: int | None
pickaxe: str | None
grep_phrase: str | None
path_filter: str | None
commit_hash: str | None
stat: bool
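
As a rough illustration of how these fields could map onto the git command forms listed above, the sketch below uses a dataclass stand-in rather than the real pydantic model. ActionSketch, to_argv, the flag ordering, and the HEAD default for show are all assumptions; the module's actual command builder is not shown in this document.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ActionSketch:
    """Stand-in for GitCommitAction, mirroring its fields (minus description)."""
    action: str = "log"
    author: Optional[str] = None
    all_branches: bool = False
    follow: bool = False
    oneline: bool = True
    limit: Optional[int] = None
    skip: Optional[int] = None
    pickaxe: Optional[str] = None
    grep_phrase: Optional[str] = None
    path_filter: Optional[str] = None
    commit_hash: Optional[str] = None
    stat: bool = False


def to_argv(a: ActionSketch) -> List[str]:
    """Build a git argument list from an action (illustrative mapping only)."""
    if a.action == "show":
        # Assumption: fall back to HEAD when no hash is given.
        argv = ["git", "show", a.commit_hash or "HEAD"]
        if a.stat:
            argv.append("--stat")
    else:
        argv = ["git", "log"]
        if a.all_branches:
            argv.append("--all")
        if a.author:
            argv.append(f"--author={a.author}")
        if a.oneline:
            argv.append("--oneline")
        if a.follow:
            argv.append("--follow")
        if a.limit is not None:
            argv += ["-n", str(a.limit)]
        if a.skip is not None:
            argv.append(f"--skip={a.skip}")
        if a.pickaxe:
            argv += ["-S", a.pickaxe]
        if a.grep_phrase:
            # -G takes a regex; --regexp-ignore-case makes it case-insensitive.
            argv += ["--regexp-ignore-case", "-G", a.grep_phrase]
    if a.path_filter:
        argv += ["--", a.path_filter]
    return argv


print(to_argv(ActionSketch(grep_phrase="uuid", all_branches=True, limit=50)))
```

Returning an argument list rather than a single string avoids shell quoting problems when a phrase contains spaces.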
class oats.git.git_commit_search1.GitSearchPlan(**data)[source]

Bases: BaseModel

Top-level wrapper for a git_actions JSON file.

Holds the repository path, an optional human-readable name, and an ordered list of GitCommitAction objects to execute.

model_config: ClassVar[ConfigDict] = {'extra': 'ignore', 'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

name: str | None
repo: str | None
actions: list[GitCommitAction] | None
oats.git.git_commit_search1.parse_args(argv=None)[source]

Parse CLI arguments and return the populated Namespace.

Return type:

Namespace

Short flags:
-r

path to the git repository (default: current directory)

-f

path to a git_actions JSON file

-p

one or more case-insensitive phrases to search for

-a

search all branches (--all) when using -p phrases

-n

maximum commits to return per phrase search (default: 50)

-A

filter by author e-mail or name when using -p phrases

-P

restrict phrase search to a specific file path

-v

enable DEBUG-level logging

oats.git.git_commit_search1.load_search_plan(file_path)[source]

Read and validate a search plan from file_path.

Accepts either:

  • a bare JSON array [ {...}, {...} ]

  • a wrapped object { "name": "...", "repo": ".", "actions": [ ... ] }

Return type:

GitSearchPlan
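
The two accepted input shapes can be normalized into one structure before validation. The following sketch shows that idea with plain dicts; normalize_plan is a hypothetical function, and the real loader returns a validated GitSearchPlan rather than a dict.

```python
import json
from typing import Any, Dict


def normalize_plan(text: str) -> Dict[str, Any]:
    """Accept a bare JSON array or a wrapped object and return a plan-shaped dict."""
    data = json.loads(text)
    if isinstance(data, list):
        # Bare array: treat it as the actions list of an unnamed plan.
        return {"name": None, "repo": None, "actions": data}
    if isinstance(data, dict):
        return {
            "name": data.get("name"),
            "repo": data.get("repo"),
            "actions": data.get("actions"),
        }
    raise ValueError("expected a JSON array or object")


bare = '[{"action": "log", "grep_phrase": "uuid"}]'
wrapped = '{"name": "demo", "repo": ".", "actions": [{"action": "show", "commit_hash": "abc123"}]}'
print(normalize_plan(bare))
print(normalize_plan(wrapped))
```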

oats.git.git_commit_search1.build_phrase_actions(phrases, *, all_branches=False, limit=50, author=None, path_filter=None)[source]

Convert a list of plain-text phrases into GitCommitAction objects using case-insensitive -G grep searches.

Return type:

list[GitCommitAction]

Parameters:
  • phrases – List of text strings to search for.

  • all_branches – If True, pass --all to git log.

  • limit – Cap on commits returned per phrase.

  • author – Optional author filter.

  • path_filter – Optional file path to restrict search to.

Returns:

One GitCommitAction per phrase.
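
A minimal sketch of the one-action-per-phrase mapping, using plain dicts in place of GitCommitAction objects. The name phrase_actions_sketch and the description text are assumptions for illustration.

```python
from typing import Any, Dict, List, Optional


def phrase_actions_sketch(
    phrases: List[str],
    *,
    all_branches: bool = False,
    limit: int = 50,
    author: Optional[str] = None,
    path_filter: Optional[str] = None,
) -> List[Dict[str, Any]]:
    """Return one log-action dict per phrase, sharing the same filters."""
    return [
        {
            "action": "log",
            "description": f"search for {phrase!r}",  # hypothetical label format
            "grep_phrase": phrase,
            "all_branches": all_branches,
            "limit": limit,
            "author": author,
            "path_filter": path_filter,
            "oneline": True,
        }
        for phrase in phrases
    ]


for action in phrase_actions_sketch(["uuid", "feed detail"], all_branches=True):
    print(action["grep_phrase"], action["limit"])
```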

oats.git.git_commit_search1.execute_action(repo, action)[source]

Execute a single GitCommitAction against repo and return a result dict with keys:

  • action – the action type string

  • description – human-readable label (may be None)

  • command – the git command line that was run

  • output – raw text output from git

  • lines – output split into non-empty lines

  • match_count – number of output lines (useful for log searches)

  • error – error message string if the command failed, else None

Return type:

dict[str, Any]
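
The relationship between output, lines, and match_count can be shown without running git. In this sketch make_result is a hypothetical helper that assembles a result dict from already-captured output; it does not execute any command.

```python
from typing import Any, Dict, Optional


def make_result(
    action: str,
    description: Optional[str],
    command: str,
    output: str,
    error: Optional[str] = None,
) -> Dict[str, Any]:
    """Assemble a result dict with the documented keys from captured output."""
    # Keep only non-empty lines, as described for the lines key.
    lines = [line for line in output.splitlines() if line.strip()]
    return {
        "action": action,
        "description": description,
        "command": command,
        "output": output,
        "lines": lines,
        "match_count": len(lines),
        "error": error,
    }


# Invented git log --oneline output for illustration.
sample = make_result("log", None, "git log --oneline -G uuid", "abc123 fix uuid\n\ndef456 add uuid\n")
print(sample["match_count"], sample["lines"])
```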

Open the git repository at repo_path, execute every action in plan in order, and return the collected results.

Return type:

list[dict[str, Any]]

Parameters:
  • plan – The loaded and validated GitSearchPlan.

  • repo_path – Filesystem path to the git repo root. The plan’s own repo field is used as the default; the caller’s repo_path argument takes precedence when supplied.

Returns:

A list of result dicts, one per action (see execute_action()).

oats.git.git_commit_search1.main(argv=None)[source]

Program entry point — parse args, build plan, run search.

Return type:

None

oats.git.walk_up_dir_path_to_find_git_config

Walk up the directory tree to find the first directory containing .git/config.

oats.git.walk_up_dir_path_to_find_git_config.walk_up_dir_path_to_find_git_config(dir_path)[source]

Walk up the directory tree from dir_path to find the Git repository root.

Starting at the given directory, checks each ancestor for a .git/config file. Stops at the filesystem root if no Git repo is found.

Return type:

Tuple[bool, str]

Parameters:

dir_path – The directory to start the search from.

Returns:

(True, repo_dir) with the absolute path to the repo root on success, or (False, "") if no Git repository is found.
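
The upward search can be sketched with pathlib. This is an illustrative reimplementation under the documented contract, not the module's source; find_repo_root is a hypothetical name.

```python
from pathlib import Path
from typing import Tuple


def find_repo_root(dir_path: str) -> Tuple[bool, str]:
    """Check dir_path and each ancestor for a .git/config file."""
    start = Path(dir_path).resolve()
    # start.parents runs from the immediate parent up to the filesystem root.
    for candidate in (start, *start.parents):
        if (candidate / ".git" / "config").is_file():
            return True, str(candidate)
    return False, ""


found, repo_dir = find_repo_root(".")
print(found, repo_dir)
```

Checking for .git/config rather than just .git means a stray empty .git directory is not mistaken for a repository.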

oats.git.walk_up_dir_path_to_find_git_config.main()[source]

CLI entry point — walk up from the target directory and print the git repo root.

Return type:

None

oats.git.worktree

Git worktree manager — creates isolated workspaces for sub-agents.

Each worktree is a separate checkout of the repository, allowing sub-agents to modify files without affecting the main working tree.

class oats.git.worktree.WorktreeManager(repo_dir)[source]

Bases: object

Manages git worktrees for isolated sub-agent work.

Worktrees are created under <repo>/.coder-worktrees/<id>/ and can be cleaned up after the agent finishes.

__init__(repo_dir)[source]

Initialize the worktree manager for the given repository.

Parameters:

repo_dir – Path to the root of the Git repository.

async create(branch_name=None)[source]

Create an isolated worktree.

Return type:

Path

Parameters:

branch_name – Optional branch name. If None, creates a detached worktree from the current HEAD.

Returns:

Path to the new worktree directory.
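
For orientation, the sketch below builds the kind of git worktree command that the two cases (detached versus named branch) could translate into. It only constructs an argument list and runs nothing; worktree_add_argv, the id format, and the exact flags used by WorktreeManager are assumptions.

```python
import uuid
from pathlib import Path
from typing import List, Optional


def worktree_add_argv(
    repo_dir: str,
    branch_name: Optional[str] = None,
    worktree_id: Optional[str] = None,
) -> List[str]:
    """Build a `git worktree add` command for <repo>/.coder-worktrees/<id>/."""
    # Assumption: ids are short random hex strings when none is supplied.
    wid = worktree_id or uuid.uuid4().hex[:8]
    path = Path(repo_dir) / ".coder-worktrees" / wid
    argv = ["git", "-C", repo_dir, "worktree", "add"]
    if branch_name is None:
        # Detached checkout of the current HEAD.
        argv += ["--detach", str(path)]
    else:
        # Create a new branch at HEAD and check it out in the worktree.
        argv += ["-b", branch_name, str(path)]
    return argv


print(worktree_add_argv("/opt/ds/oats", worktree_id="demo"))
print(worktree_add_argv("/opt/ds/oats", branch_name="agent/fix-1", worktree_id="demo"))
```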

async cleanup(worktree_path)[source]

Remove a worktree.

Only removes if the worktree has no uncommitted changes.

Return type:

None

async has_changes(worktree_path)[source]

Check if a worktree has uncommitted changes.

Return type:

bool

async list_worktrees()[source]

List all worktrees managed by this instance.

Return type:

list[Path]

async merge_back(worktree_path, target_branch=None)[source]

Merge worktree changes back to the main branch.

Returns the merge commit message or error.

Return type:

str