oats.trajectory

Trajectory logging, metrics, retrieval, and reporting.

oats.trajectory.logger

Hook-driven trajectory logger.

Registers global Python handlers on the HookEngine that persist each turn into the SQLite/FTS5 trajectory store. Registration is idempotent and gated by CODER_FEATURE_TRAJECTORY_STORE; calling install() when the flag is off is a no-op.

Turn indexing uses a per-session in-memory counter keyed on session_id and backfilled from TrajectoryStore.session_turns() on first touch, so a restarted process continues numbering where it left off.
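A minimal sketch of the per-session counter described above. It assumes turn indices start at 0 and that the next index is one past the highest persisted turn_idx. `TurnCounter` and its `load_turn_indices` callable are hypothetical stand-ins for the logger's internal state and for reading turn_idx values via TrajectoryStore.session_turns().

```python
from typing import Callable, Dict, Iterable


class TurnCounter:
    """Hypothetical per-session turn counter, lazily backfilled from persisted turns."""

    def __init__(self, load_turn_indices: Callable[[str], Iterable[int]]) -> None:
        # Assumed stand-in for reading turn_idx values from session_turns(session_id).
        self._load = load_turn_indices
        self._next: Dict[str, int] = {}

    def next_index(self, session_id: str) -> int:
        if session_id not in self._next:
            # First touch in this process: resume after the highest persisted index.
            existing = list(self._load(session_id))
            self._next[session_id] = max(existing) + 1 if existing else 0
        idx = self._next[session_id]
        self._next[session_id] = idx + 1
        return idx

    def clear(self) -> None:
        # Mirrors the "clear per-session counters" part of reset_for_tests().
        self._next.clear()
```

Because the backfill runs only on first touch, the store is read once per session per process; later turns are numbered from memory.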

oats.trajectory.logger.install()

Register the logger’s global handlers. Idempotent. Returns True if the handlers are active, False if the feature flag is off.

Called once at process startup when the feature flag is on. Safe to call more than once — subsequent calls short-circuit so handlers aren’t registered multiple times.

Return type:

bool

oats.trajectory.logger.reset_for_tests()

Testing hook — forget the installed flag and clear per-session counters.

Return type:

None
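The idempotent, flag-gated behaviour of install() and its test reset can be sketched as follows. This is not the module's code: `install_sketch`, `reset_for_tests_sketch`, the `register` callable (standing in for HookEngine handler registration), and the set of truthy flag values are all assumptions.

```python
import os
import threading
from typing import Callable

FLAG = "CODER_FEATURE_TRAJECTORY_STORE"
_TRUTHY = {"1", "true", "yes", "on"}  # assumed flag parsing, not confirmed by the source

_lock = threading.Lock()
_installed = False


def _flag_enabled() -> bool:
    return os.environ.get(FLAG, "").strip().lower() in _TRUTHY


def install_sketch(register: Callable[[], None]) -> bool:
    """Run `register` at most once, and only while the feature flag is on."""
    global _installed
    if not _flag_enabled():
        return False  # flag off: no-op
    with _lock:
        if not _installed:
            register()  # hypothetical stand-in for registering HookEngine handlers
            _installed = True
    return True


def reset_for_tests_sketch() -> None:
    """Forget the installed flag so the next call registers again."""
    global _installed
    with _lock:
        _installed = False
```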

oats.trajectory.metrics

Per-turn self-improvement metrics.

Writes to the turn_metrics table in the same SQLite db as the trajectory store so retrieval records and turn outcomes can be joined for A/B analysis.

Two entry points:

  • log_retrieval_used() — at prompt-injection time, records what was retrieved (if anything) for a given (session_id, turn_idx).

  • log_turn_outcome() — at turn end, records whether the model completed cleanly, how many agent-loop iterations it took, and tool-error count.
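A sketch of how the two entry points can fill in the same row, assuming turn_metrics is keyed on (session_id, turn_idx) and each call upserts only its own columns. The schema, the helper names `log_retrieval` and `log_outcome`, and the rule that retrieval_used is true whenever anything was retrieved are illustrative assumptions; the upsert syntax needs SQLite 3.24 or newer.

```python
import json
import sqlite3
import time

# Hypothetical schema; the real turn_metrics table may differ.
SCHEMA = """
CREATE TABLE IF NOT EXISTS turn_metrics (
    session_id TEXT NOT NULL,
    turn_idx INTEGER NOT NULL,
    user_prompt TEXT,
    retrieved_ids TEXT NOT NULL DEFAULT '[]',
    retrieved_scores TEXT NOT NULL DEFAULT '[]',
    retrieval_used INTEGER NOT NULL DEFAULT 0,
    iterations INTEGER,
    tool_error_count INTEGER,
    completed INTEGER,
    created_at REAL NOT NULL,
    updated_at REAL NOT NULL,
    PRIMARY KEY (session_id, turn_idx)
)
"""


def log_retrieval(conn, session_id, turn_idx, user_prompt, retrieved):
    """Prompt time: split (score, trajectory_id) pairs into two JSON lists."""
    scores = [score for score, _ in retrieved]
    ids = [tid for _, tid in retrieved]
    now = time.time()
    conn.execute(
        """INSERT INTO turn_metrics (session_id, turn_idx, user_prompt, retrieved_ids,
               retrieved_scores, retrieval_used, created_at, updated_at)
           VALUES (?, ?, ?, ?, ?, ?, ?, ?)
           ON CONFLICT(session_id, turn_idx) DO UPDATE SET
               user_prompt = excluded.user_prompt,
               retrieved_ids = excluded.retrieved_ids,
               retrieved_scores = excluded.retrieved_scores,
               retrieval_used = excluded.retrieval_used,
               updated_at = excluded.updated_at""",
        (session_id, turn_idx, user_prompt[:2000], json.dumps(ids),
         json.dumps(scores), int(bool(retrieved)), now, now),
    )


def log_outcome(conn, session_id, turn_idx, iterations, tool_error_count, completed):
    """Turn end: fill in the outcome columns of the same (session_id, turn_idx) row."""
    now = time.time()
    conn.execute(
        """INSERT INTO turn_metrics (session_id, turn_idx, iterations, tool_error_count,
               completed, created_at, updated_at)
           VALUES (?, ?, ?, ?, ?, ?, ?)
           ON CONFLICT(session_id, turn_idx) DO UPDATE SET
               iterations = excluded.iterations,
               tool_error_count = excluded.tool_error_count,
               completed = excluded.completed,
               updated_at = excluded.updated_at""",
        (session_id, turn_idx, iterations, tool_error_count, int(completed), now, now),
    )
```

Keeping both writes on one row is what makes the retrieval/outcome join for A/B analysis a plain single-table query.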

report() aggregates the table into a diagnostics dict that callers can render to stdout, Markdown, or JSON. The CLI at python -m oats.trajectory.report prints a weekly summary.

class oats.trajectory.metrics.TurnMetricRow(session_id, turn_idx, user_prompt, retrieved_ids, retrieved_scores, retrieval_used, iterations, tool_error_count, completed, duration_ms, model_id, created_at, updated_at)

Bases: object

One row from the turn_metrics table.

session_id: str

The session this turn belongs to.

turn_idx: int

Sequential turn index within the session.

user_prompt: str | None

The user’s prompt text (truncated to 2,000 characters).

retrieved_ids: list[int]

IDs of the trajectory records that were retrieved (stored JSON-encoded in the turn_metrics table).

retrieved_scores: list[float]

BM25 scores of the retrieved records (stored JSON-encoded in the turn_metrics table).

retrieval_used: bool

Whether retrieval was used for this turn.

iterations: int | None

Number of agent-loop iterations the turn took.

tool_error_count: int | None

Number of tool errors encountered during the turn.

completed: bool | None

Whether the turn completed successfully.

duration_ms: int | None

Turn duration in milliseconds.

model_id: str | None

ID of the model used for this turn.

created_at: float

Timestamp when the row was first inserted.

updated_at: float

Timestamp when the row was last updated.

__init__(session_id, turn_idx, user_prompt, retrieved_ids, retrieved_scores, retrieval_used, iterations, tool_error_count, completed, duration_ms, model_id, created_at, updated_at)
oats.trajectory.metrics.log_retrieval_used(*, session_id, turn_idx, user_prompt, retrieved, model_id=None, store=None)

Record what was retrieved for this turn. retrieved is a sequence of (score, trajectory_id) pairs.

Return type:

None

oats.trajectory.metrics.log_turn_outcome(*, session_id, turn_idx, iterations, tool_error_count, completed, duration_ms=None, model_id=None, store=None)

Record the outcome of a finished turn.

Return type:

None

class oats.trajectory.metrics.CohortStats(label, turns=0, completed=0, avg_iterations=0.0, avg_tool_errors=0.0, avg_duration_ms=0.0)

Bases: object

Aggregate stats for one slice of turns.

label: str
turns: int = 0
completed: int = 0
avg_iterations: float = 0.0
avg_tool_errors: float = 0.0
avg_duration_ms: float = 0.0

property completion_rate: float

Fraction of turns in this cohort that completed successfully.

__init__(label, turns=0, completed=0, avg_iterations=0.0, avg_tool_errors=0.0, avg_duration_ms=0.0)
oats.trajectory.metrics.report(since_days=7.0, *, store=None)

Aggregate turn_metrics rows from the last since_days days.

Splits turns into two cohorts (retrieval used vs. not used) so callers can tell whether in-context retrieval is pulling its weight. Returns a dict ready for JSON or Markdown rendering.

Return type:

dict
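One way to produce the two cohorts with a single GROUP BY, sketched against a hypothetical minimal turn_metrics table. `Cohort`, `cohort_report`, the dict keys, the 0.0 completion rate for an empty cohort, and skipping turns that have no recorded outcome are assumptions, not necessarily what report() does.

```python
import sqlite3
import time
from dataclasses import dataclass


@dataclass
class Cohort:
    """Trimmed-down, hypothetical analogue of CohortStats."""
    label: str
    turns: int = 0
    completed: int = 0
    avg_iterations: float = 0.0
    avg_tool_errors: float = 0.0

    @property
    def completion_rate(self) -> float:
        # Assumed: an empty cohort reports 0.0 instead of dividing by zero.
        return self.completed / self.turns if self.turns else 0.0


def cohort_report(conn: sqlite3.Connection, since_days: float = 7.0) -> dict:
    cutoff = time.time() - since_days * 86400
    rows = conn.execute(
        """SELECT retrieval_used, COUNT(*), COALESCE(SUM(completed), 0),
                  COALESCE(AVG(iterations), 0.0), COALESCE(AVG(tool_error_count), 0.0)
           FROM turn_metrics
           WHERE created_at >= ? AND completed IS NOT NULL
           GROUP BY retrieval_used""",
        (cutoff,),
    ).fetchall()
    cohorts = {"retrieval": Cohort("retrieval"), "baseline": Cohort("baseline")}
    for used, n, done, avg_it, avg_err in rows:
        key = "retrieval" if used else "baseline"
        cohorts[key] = Cohort(key, n, done, avg_it, avg_err)
    return {
        "since_days": since_days,
        "total_turns": sum(c.turns for c in cohorts.values()),
        # vars() omits the property, so add completion_rate explicitly.
        "cohorts": {k: {**vars(c), "completion_rate": c.completion_rate}
                    for k, c in cohorts.items()},
    }
```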

oats.trajectory.metrics.format_report_markdown(data)

Render the report dict as a Markdown table.

Produces a header with the time window and total turns, followed by a table of cohort statistics and an interpretation note.

Return type:

str

Parameters:

data – The dict returned by report().

Returns:

A Markdown-formatted string.
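A sketch of rendering such a dict as a Markdown table. The input shape (`since_days`, `total_turns`, and a `cohorts` mapping) and the helper name `render_markdown` are hypothetical, and the interpretation note that format_report_markdown() appends is omitted.

```python
def render_markdown(data: dict) -> str:
    """Render a hypothetical report dict as a header plus a cohort table."""
    lines = [
        f"## Trajectory report (last {data['since_days']:g} days)",
        "",
        f"Total turns: {data['total_turns']}",
        "",
        "| Cohort | Turns | Completion rate | Avg iterations | Avg tool errors |",
        "|---|---:|---:|---:|---:|",
    ]
    for label, c in data["cohorts"].items():
        lines.append(
            f"| {label} | {c['turns']} | {c['completion_rate']:.1%} "
            f"| {c['avg_iterations']:.2f} | {c['avg_tool_errors']:.2f} |"
        )
    return "\n".join(lines)
```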

oats.trajectory.store

SQLite + FTS5 trajectory store.

One row per “turn record”: a user prompt, an assistant reply, a tool call, or a tool result. An FTS5 table mirrors the content column so callers can rank past turns by BM25 using FTS5’s built-in bm25() auxiliary function.
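A sketch of the FTS5 mirror and a BM25 query, using an external-content FTS5 table kept in sync by triggers. Table, trigger, and column names are assumptions (the real schema stores more columns), and the example needs a Python build whose SQLite includes FTS5.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE trajectories (
    id INTEGER PRIMARY KEY,
    session_id TEXT NOT NULL,
    kind TEXT NOT NULL,
    content TEXT NOT NULL
);
-- External-content FTS5 table indexing trajectories.content.
CREATE VIRTUAL TABLE trajectories_fts USING fts5(
    content, content='trajectories', content_rowid='id'
);
-- Triggers keep the index in sync with the base table.
CREATE TRIGGER trajectories_ai AFTER INSERT ON trajectories BEGIN
    INSERT INTO trajectories_fts(rowid, content) VALUES (new.id, new.content);
END;
CREATE TRIGGER trajectories_ad AFTER DELETE ON trajectories BEGIN
    INSERT INTO trajectories_fts(trajectories_fts, rowid, content)
    VALUES ('delete', old.id, old.content);
END;
CREATE TRIGGER trajectories_au AFTER UPDATE ON trajectories BEGIN
    INSERT INTO trajectories_fts(trajectories_fts, rowid, content)
    VALUES ('delete', old.id, old.content);
    INSERT INTO trajectories_fts(rowid, content) VALUES (new.id, new.content);
END;
""")


def search(conn, query, limit=10):
    rows = conn.execute(
        """SELECT t.id, t.content, f.score
           FROM (SELECT rowid, bm25(trajectories_fts) AS score
                 FROM trajectories_fts WHERE trajectories_fts MATCH ?
                 ORDER BY score LIMIT ?) AS f
           JOIN trajectories t ON t.id = f.rowid
           ORDER BY f.score""",
        (query, limit),
    ).fetchall()
    # bm25() is lower (more negative) for better matches; negate so higher = better.
    return [(-score, tid, content) for tid, content, score in rows]
```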

Concurrency strategy (borrowed from the Hermes state-store pattern and adapted to coder2’s layering):

  • WAL journal mode: readers and a single writer proceed concurrently without blocking each other.

  • Short busy_timeout (1.0s) + application-level retry with random jitter (20–150 ms per retry, 15 retries). SQLite’s built-in busy handler is deterministic and causes convoys under parallel writers; jitter disperses them.

  • PRAGMA synchronous = NORMAL — durable after fsync’d checkpoints while keeping per-insert cost low. Trajectory data is observability, not a money-transfer log; this is the right durability/throughput trade.

  • Writes run through asyncio.to_thread so they never block the event loop that drives HookEngine.fire.
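The connection settings and jittered retry described above might look roughly like this. `open_connection` and `with_busy_retry` are hypothetical helpers; the retry count and jitter range come from the text, and detecting SQLITE_BUSY by the error message is a simplifying assumption (Python's sqlite3 surfaces it as an OperationalError such as "database is locked").

```python
import random
import sqlite3
import time
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def open_connection(path: str) -> sqlite3.Connection:
    # timeout=1.0 sets SQLite's busy timeout to 1 s; isolation_level=None is autocommit.
    conn = sqlite3.connect(path, timeout=1.0, isolation_level=None, check_same_thread=False)
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute("PRAGMA synchronous=NORMAL")
    return conn


def with_busy_retry(
    fn: Callable[[], T],
    retries: int = 15,
    jitter_ms: Tuple[int, int] = (20, 150),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying busy/locked errors after a random 20-150 ms pause."""
    for attempt in range(retries + 1):
        try:
            return fn()
        except sqlite3.OperationalError as exc:
            msg = str(exc).lower()
            # Re-raise non-busy errors immediately, and busy errors once retries run out.
            if attempt == retries or ("locked" not in msg and "busy" not in msg):
                raise
            sleep(random.uniform(*jitter_ms) / 1000.0)
    raise RuntimeError("unreachable")
```

Randomizing each pause means two writers that collide are unlikely to wake at the same instant again, which is the convoy the deterministic built-in handler tends to produce.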

class oats.trajectory.store.TrajectoryRecord(id, session_id, parent_session_id, turn_idx, role, kind, tool_name, content, created_at)

Bases: object

One row from the trajectory store.

id: int
session_id: str
parent_session_id: str | None
turn_idx: int
role: str
kind: str
tool_name: str | None
content: str
created_at: float
__init__(id, session_id, parent_session_id, turn_idx, role, kind, tool_name, content, created_at)

class oats.trajectory.store.TrajectoryStore(db_path=None)

Bases: object

Persistent, full-text searchable log of agent turns.

Thread-safe: a single process-wide connection guarded by an RLock. This is the simplest correct choice under SQLite WAL; multiple connections would also work but buy nothing here, since writers are serialized anyway.

Call record() (sync) from any thread, or arecord() (async) from coroutine code.
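A trimmed-down sketch of this threading model: one shared connection, an RLock around every use, and an async wrapper that offloads the blocking call with asyncio.to_thread. `MiniStore` and its simplified schema are hypothetical; the real TrajectoryStore also maintains the FTS5 index and retries on SQLITE_BUSY.

```python
import asyncio
import sqlite3
import threading
import time
from typing import Optional


class MiniStore:
    """Hypothetical store: one shared connection serialized by an RLock."""

    def __init__(self, path: str = ":memory:") -> None:
        self._lock = threading.RLock()
        # check_same_thread=False lets worker threads share the connection;
        # the RLock ensures only one thread uses it at a time.
        self._conn = sqlite3.connect(path, isolation_level=None, check_same_thread=False)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS trajectories ("
            "id INTEGER PRIMARY KEY, session_id TEXT, turn_idx INTEGER, "
            "role TEXT, kind TEXT, content TEXT, created_at REAL)"
        )

    def record(self, *, session_id: str, turn_idx: int, role: str, kind: str,
               content: str, created_at: Optional[float] = None) -> int:
        ts = time.time() if created_at is None else created_at
        with self._lock:
            cur = self._conn.execute(
                "INSERT INTO trajectories (session_id, turn_idx, role, kind, content, created_at) "
                "VALUES (?, ?, ?, ?, ?, ?)",
                (session_id, turn_idx, role, kind, content, ts),
            )
            return cur.lastrowid  # read inside the lock so it belongs to this insert

    async def arecord(self, **kwargs) -> int:
        # Run the blocking insert on a worker thread so the event loop stays free.
        return await asyncio.to_thread(self.record, **kwargs)

    def count(self) -> int:
        with self._lock:
            return self._conn.execute("SELECT COUNT(*) FROM trajectories").fetchone()[0]
```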

__init__(db_path=None)

Initialize the trajectory store, creating the database if needed.

Opens a SQLite connection in WAL mode with autocommit and creates the trajectories table, its FTS5 virtual table (kept in sync by triggers), and the turn_metrics table.

Parameters:

db_path – Path to the SQLite database file. Defaults to <data_dir>/trajectories.db.

record(*, session_id, turn_idx, role, kind, content, tool_name=None, parent_session_id=None, created_at=None)

Insert one turn record. Returns the new row id.

Retries on SQLITE_BUSY with random jitter so that concurrent writers don’t convoy on SQLite’s built-in busy handler.

Return type:

int

async arecord(**kwargs)

Async wrapper around record() — offloads to a thread.

Return type:

int

search(query, *, limit=10, session_id=None, kinds=None)

BM25-rank past turns by query. Returns (score, record) pairs.

In SQLite FTS5, bm25() returns lower (more negative) values for better matches. The sign is flipped before returning, so higher scores mean more relevant results, matching the convention used elsewhere in coder2.

Return type:

list[tuple[float, TrajectoryRecord]]

async asearch(query, **kwargs)

Async wrapper around search() — offloads to a thread.

Return type:

list[tuple[float, TrajectoryRecord]]

session_turns(session_id, limit=1000)

Return up to limit turns for a session, ordered by turn_idx.

Return type:

list[TrajectoryRecord]

count()

Total number of records. Handy for tests.

Return type:

int

close()

Close the underlying SQLite connection.

Return type:

None

oats.trajectory.store.get_store(db_path=None)

Return the process-wide trajectory store, creating it on first use.

Return type:

TrajectoryStore

oats.trajectory.store.reset_store()

Testing hook — drop the module-level singleton.

Return type:

None
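A sketch of a lazily created, process-wide singleton with a test reset, in the spirit of get_store() and reset_store(). `_Store`, `get_store_sketch`, and `reset_store_sketch` are hypothetical, and ignoring db_path after the first call is an assumption about the real behaviour.

```python
import threading
from typing import Optional


class _Store:
    """Hypothetical stand-in for TrajectoryStore."""

    def __init__(self, db_path: Optional[str] = None) -> None:
        self.db_path = db_path or "trajectories.db"


_store: Optional[_Store] = None
_store_lock = threading.Lock()


def get_store_sketch(db_path: Optional[str] = None) -> _Store:
    global _store
    if _store is None:
        with _store_lock:
            if _store is None:  # double-checked so only one thread constructs it
                _store = _Store(db_path)
    return _store  # assumed: db_path is ignored once the singleton exists


def reset_store_sketch() -> None:
    """Drop the singleton so the next get_store_sketch() builds a fresh one."""
    global _store
    with _store_lock:
        _store = None
```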

oats.trajectory.retrieval

Retrieval-augmented examples from the trajectory store.

Given a fresh user prompt, find the past prompts that most resemble it, then pull the tool outcomes that immediately followed them in the same session. Each (matched prompt, successful continuation) pair becomes an in-context example the current model can imitate.

Design notes:

  • Retrieval uses TrajectoryStore.search() over KIND_PROMPT rows so we’re ranking user intents, not tool noise.

  • Continuation lookup is scoped to the original session via TrajectoryStore.session_turns() so we never mix threads across sessions. We take at most continuation_limit turns following the matched prompt’s turn_idx, cap each turn’s content length, and stop at the next prompt (which would belong to a different user intent).

  • Examples are formatted as plain text with no special delimiters, so the injected block’s token cost stays predictable: it is bounded by top_k, continuation_limit, and the per-field content cap.
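The continuation walk in these design notes can be sketched as a pure function over one session's records. It assumes prompt records carry kind == "prompt" (the actual value of KIND_PROMPT is not shown here) and locates the matched prompt by record id; `Turn` and `continuation_after` are hypothetical names.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass
class Turn:
    """Trimmed-down, hypothetical stand-in for TrajectoryRecord."""
    id: int
    turn_idx: int
    kind: str
    content: str
    tool_name: Optional[str] = None


def continuation_after(turns: List[Turn], prompt_id: int,
                       continuation_limit: int = 4, content_cap: int = 400) -> List[Turn]:
    """Records that followed a matched prompt, stopping at the next prompt."""
    ordered = sorted(turns, key=lambda t: (t.turn_idx, t.id))
    start = next((i for i, t in enumerate(ordered) if t.id == prompt_id), None)
    if start is None:
        return []
    out: List[Turn] = []
    for t in ordered[start + 1:]:
        # A new prompt begins a different user intent; also stop at the limit.
        if t.kind == "prompt" or len(out) >= continuation_limit:
            break
        out.append(replace(t, content=t.content[:content_cap]))
    return out
```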

class oats.trajectory.retrieval.Example(score, prompt_record, continuation)

Bases: object

One retrieval-augmented example.

score: float
prompt_record: TrajectoryRecord
continuation: list[TrajectoryRecord]

format(*, content_cap=400)

Format this example as a plain-text block for injection into the system prompt.

Renders the matched past prompt and each continuation turn with its tool name or kind tag. Content is truncated to content_cap chars.

Return type:

str

Parameters:

content_cap – Maximum characters per content field.

Returns:

A multi-line string with the formatted example.

__init__(score, prompt_record, continuation)

oats.trajectory.retrieval.retrieve_examples(user_prompt, *, top_k=2, min_score=0.0, continuation_limit=4, store=None, exclude_session_id=None)

Return up to top_k examples for user_prompt.

exclude_session_id lets the caller skip the current session so the model isn’t handed back a stale version of what it just did.

Return type:

list[Example]
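A sketch of the post-search filtering that retrieve_examples() implies: drop hits from the excluded session, drop hits at or below min_score, and keep the top_k best. Whether the real function compares min_score strictly is not stated, so the strict `>` is an assumption, as are the names `Hit` and `select_hits`.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Hit:
    """Hypothetical stand-in for a matched prompt TrajectoryRecord."""
    id: int
    session_id: str
    content: str


def select_hits(hits: List[Tuple[float, Hit]], *, top_k: int = 2, min_score: float = 0.0,
                exclude_session_id: Optional[str] = None) -> List[Tuple[float, Hit]]:
    kept = [
        (score, rec) for score, rec in hits
        if score > min_score and rec.session_id != exclude_session_id
    ]
    kept.sort(key=lambda pair: pair[0], reverse=True)  # higher score = more relevant
    return kept[:top_k]
```

Because the exclusion can discard hits, a caller would likely ask search() for more than top_k results before filtering.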

oats.trajectory.retrieval.format_examples_section(examples)

Render the examples as a block for the system prompt, or return None if there are no examples.

Return type:

str | None

oats.trajectory.report

CLI for the trajectory self-improvement report.

Usage:

python -m oats.trajectory.report                  # 7-day markdown report
python -m oats.trajectory.report --since 30       # 30-day window
python -m oats.trajectory.report --json           # raw dict

The heavy lifting lives in oats.trajectory.metrics; this module only handles argument parsing and output formatting.

oats.trajectory.report.main(argv=None)

CLI entry point for the trajectory self-improvement report.

Parses --since (window in days, default 7) and --json (emit JSON instead of Markdown). Delegates to oats.trajectory.metrics.report() and oats.trajectory.metrics.format_report_markdown().

Return type:

int

Parameters:

argv – Command-line arguments (defaults to sys.argv[1:]).

Returns:

Exit code (always 0 on success).
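A sketch of the argument handling main() describes. `main_sketch` is hypothetical: its `report`, `to_markdown`, and `write` parameters are injected stand-ins for oats.trajectory.metrics.report(), format_report_markdown(), and print(), so the example runs without the package.

```python
import argparse
import json
from typing import Callable, List, Optional


def main_sketch(
    argv: Optional[List[str]] = None,
    *,
    report: Callable[[float], dict] = lambda days: {"since_days": days, "total_turns": 0},
    to_markdown: Callable[[dict], str] = lambda d: f"Turns in last {d['since_days']:g} days: {d['total_turns']}",
    write: Callable[[str], None] = print,
) -> int:
    parser = argparse.ArgumentParser(prog="trajectory-report")
    parser.add_argument("--since", type=float, default=7.0, help="window in days")
    parser.add_argument("--json", action="store_true", help="emit JSON instead of Markdown")
    args = parser.parse_args(argv)  # argv=None falls back to sys.argv[1:]
    data = report(args.since)
    write(json.dumps(data, indent=2) if args.json else to_markdown(data))
    return 0
```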