oats.trajectory

Trajectory logging, metrics, retrieval, and reporting.

oats.trajectory.logger

Hook-driven trajectory logger.

Registers global Python handlers on the HookEngine that persist each turn into the SQLite/FTS5 trajectory store. Registration is idempotent and gated by CODER_FEATURE_TRAJECTORY_STORE; calling install() when the flag is off is a no-op.

Turn indexing uses a per-session in-memory counter keyed on session_id and backfilled from TrajectoryStore.session_turns() on first touch, so a restarted process continues numbering where it left off.
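The backfill-on-first-touch counter described above can be sketched roughly as follows. This is an illustration only: `persisted_turn_indices` is a hypothetical stand-in for a lookup via TrajectoryStore.session_turns(), and starting new sessions at index 0 is an assumption, not something the source states.

```python
# Hypothetical persisted state: turn indices already stored per session.
PERSISTED = {"s1": [0, 1, 2]}


def persisted_turn_indices(session_id: str) -> list[int]:
    """Stand-in for reading a session's stored turn indices from the store."""
    return PERSISTED.get(session_id, [])


# In-memory counters keyed on session_id; each holds the next index to hand out.
_counters: dict[str, int] = {}


def next_turn_idx(session_id: str) -> int:
    """Return the next turn index, backfilling from storage on first touch."""
    if session_id not in _counters:
        seen = persisted_turn_indices(session_id)
        # Resume after the highest stored index; assume 0 for a new session.
        _counters[session_id] = max(seen) + 1 if seen else 0
    idx = _counters[session_id]
    _counters[session_id] = idx + 1
    return idx
```

With the sample data, the first call for "s1" continues at 3 instead of restarting at 0, which is the behavior a restarted process needs. The sketch is not thread-safe; a real implementation would likely need a lock around the counter.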

oats.trajectory.logger.install()[source]

Register the logger’s global handlers. Idempotent. Returns True if active.

Called once at process startup when the feature flag is on. Safe to call more than once — subsequent calls short-circuit so handlers aren’t registered multiple times.

Return type:: bool

oats.trajectory.logger.reset_for_tests()[source]

Testing hook — forget the installed flag and clear per-session counters.

Return type:: None

oats.trajectory.metrics

Per-turn self-improvement metrics.

Writes to the turn_metrics table in the same SQLite db as the trajectory store so retrieval records and turn outcomes can be joined for A/B analysis.

Two entry points:

log_retrieval_used() — at prompt-injection time, records what was retrieved (if anything) for a given (session_id, turn_idx).
log_turn_outcome() — at turn end, records whether the model completed cleanly, how many agent-loop iterations it took, and tool-error count.
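One way the two entry points could share a single row per (session_id, turn_idx) is sketched below. The schema is an assumed subset of the TurnMetricRow columns, and the functions are simplified stand-ins with fewer parameters than the documented signatures.

```python
import json
import sqlite3
import time

conn = sqlite3.connect(":memory:")
# Assumed schema: keyed on (session_id, turn_idx) so both calls hit one row.
conn.execute(
    """
    CREATE TABLE turn_metrics (
        session_id       TEXT    NOT NULL,
        turn_idx         INTEGER NOT NULL,
        retrieved_ids    TEXT    NOT NULL DEFAULT '[]',
        retrieval_used   INTEGER NOT NULL DEFAULT 0,
        iterations       INTEGER,
        tool_error_count INTEGER,
        completed        INTEGER,
        updated_at       REAL    NOT NULL,
        PRIMARY KEY (session_id, turn_idx)
    )
    """
)


def _ensure_row(session_id: str, turn_idx: int) -> None:
    # Create the row if it does not exist yet; otherwise leave it untouched.
    conn.execute(
        "INSERT OR IGNORE INTO turn_metrics (session_id, turn_idx, updated_at) "
        "VALUES (?, ?, ?)",
        (session_id, turn_idx, time.time()),
    )


def log_retrieval_used(*, session_id, turn_idx, retrieved):
    """Stand-in: record retrieved IDs from (score, trajectory_id) pairs."""
    _ensure_row(session_id, turn_idx)
    ids = [traj_id for _score, traj_id in retrieved]
    conn.execute(
        "UPDATE turn_metrics SET retrieved_ids = ?, retrieval_used = ?, "
        "updated_at = ? WHERE session_id = ? AND turn_idx = ?",
        (json.dumps(ids), int(bool(ids)), time.time(), session_id, turn_idx),
    )


def log_turn_outcome(*, session_id, turn_idx, iterations, tool_error_count, completed):
    """Stand-in: record how the turn ended on the same row."""
    _ensure_row(session_id, turn_idx)
    conn.execute(
        "UPDATE turn_metrics SET iterations = ?, tool_error_count = ?, "
        "completed = ?, updated_at = ? WHERE session_id = ? AND turn_idx = ?",
        (iterations, tool_error_count, int(completed), time.time(), session_id, turn_idx),
    )
```

Because each call first ensures the row exists, the two calls can arrive in either order, and a turn with no retrieval still gets an outcome row that lands in the "not used" cohort.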

report() aggregates the table into a diagnostics dict that callers can render to stdout, Markdown, or JSON. The CLI at python -m oats.trajectory.report prints a weekly summary.

class oats.trajectory.metrics.TurnMetricRow(session_id, turn_idx, user_prompt, retrieved_ids, retrieved_scores, retrieval_used, iterations, tool_error_count, completed, duration_ms, model_id, created_at, updated_at)[source]

Bases: object

One row from the turn_metrics table.

session_id: The session this turn belongs to.

turn_idx: Sequential turn index within the session.

user_prompt: The user’s prompt text (truncated to 2000 chars).

retrieved_ids: JSON-encoded list of trajectory IDs that were retrieved.

retrieved_scores: JSON-encoded list of BM25 scores for retrieved items.

retrieval_used: Whether retrieval was used for this turn.

iterations: Number of agent-loop iterations the turn took.

tool_error_count: Number of tool errors encountered.

completed: Whether the turn completed successfully.

duration_ms: Turn duration in milliseconds.

model_id: The model ID used for this turn.

created_at: Timestamp when the row was first inserted.

updated_at: Timestamp when the row was last updated.

session_id: str

turn_idx: int

user_prompt: str | None

retrieved_ids: list[int]

retrieved_scores: list[float]

retrieval_used: bool

iterations: int | None

tool_error_count: int | None

completed: bool | None

duration_ms: int | None

model_id: str | None

created_at: float

updated_at: float

__init__(session_id, turn_idx, user_prompt, retrieved_ids, retrieved_scores, retrieval_used, iterations, tool_error_count, completed, duration_ms, model_id, created_at, updated_at)

oats.trajectory.metrics.log_retrieval_used(*, session_id, turn_idx, user_prompt, retrieved, model_id=None, store=None)[source]

Record what was retrieved for this turn. retrieved is a sequence of (score, trajectory_id) pairs.

Return type:: None

oats.trajectory.metrics.log_turn_outcome(*, session_id, turn_idx, iterations, tool_error_count, completed, duration_ms=None, model_id=None, store=None)[source]

Record the outcome of a finished turn.

Return type:: None

class oats.trajectory.metrics.CohortStats(label, turns=0, completed=0, avg_iterations=0.0, avg_tool_errors=0.0, avg_duration_ms=0.0)[source]

Bases: object

Aggregate stats for one slice of turns.

label: str

turns: int = 0

completed: int = 0

avg_iterations: float = 0.0

avg_tool_errors: float = 0.0

avg_duration_ms: float = 0.0

property completion_rate: float

Fraction of turns in this cohort that completed successfully.

__init__(label, turns=0, completed=0, avg_iterations=0.0, avg_tool_errors=0.0, avg_duration_ms=0.0)
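As an illustration of how completion_rate relates to the turns and completed fields, here is a trimmed-down stand-in for the class. Returning 0.0 for an empty cohort is an assumption; the source does not say how a cohort with zero turns is handled.

```python
from dataclasses import dataclass


@dataclass
class CohortStats:
    """Stand-in with only the fields needed to derive the completion rate."""

    label: str
    turns: int = 0
    completed: int = 0

    @property
    def completion_rate(self) -> float:
        # Assumption: an empty cohort reports 0.0 instead of dividing by zero.
        return self.completed / self.turns if self.turns else 0.0
```

For example, a cohort with 4 turns of which 3 completed has a completion rate of 0.75.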

oats.trajectory.metrics.report(since_days=7.0, *, store=None)[source]

Aggregate turn_metrics across the last since_days.

Splits into two cohorts — retrieval-used vs. not — so callers can tell whether in-context retrieval is pulling its weight. Returns a dict ready for JSON or Markdown rendering.

Return type:: dict
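The cohort split can be sketched with a single GROUP BY over the time window. The table below carries only the columns the query needs, and the shape of the returned dict (keys such as "cohorts" and "no_retrieval") is an assumption made for this illustration.

```python
import sqlite3
import time


def cohort_report(conn: sqlite3.Connection, since_days: float = 7.0) -> dict:
    """Aggregate turn_metrics rows newer than the cutoff into two cohorts."""
    cutoff = time.time() - since_days * 86400
    rows = conn.execute(
        """
        SELECT retrieval_used, COUNT(*), COALESCE(SUM(completed), 0), AVG(iterations)
        FROM turn_metrics
        WHERE created_at >= ?
        GROUP BY retrieval_used
        """,
        (cutoff,),
    ).fetchall()
    cohorts = {}
    for used, turns, completed, avg_iterations in rows:
        label = "retrieval" if used else "no_retrieval"
        cohorts[label] = {
            "turns": turns,
            "completed": completed,
            "completion_rate": completed / turns,  # turns >= 1 inside a group
            "avg_iterations": avg_iterations,
        }
    return {"since_days": since_days, "cohorts": cohorts}


# Usage with a throwaway in-memory table (sample data, not real results).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE turn_metrics "
    "(retrieval_used INTEGER, completed INTEGER, iterations INTEGER, created_at REAL)"
)
now = time.time()
conn.executemany(
    "INSERT INTO turn_metrics VALUES (?, ?, ?, ?)",
    [
        (1, 1, 2, now),
        (1, 1, 4, now),
        (0, 1, 3, now),
        (0, 0, 5, now),
        (1, 0, 9, now - 30 * 86400),  # outside the 7-day window, ignored
    ],
)
data = cohort_report(conn)
```

Comparing the two cohorts' completion rates and average iterations side by side is what lets a reader judge whether retrieval helps.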

oats.trajectory.metrics.format_report_markdown(data)[source]

Render the report dict as a Markdown table.

Produces a header with the time window and total turns, followed by a table of cohort statistics and an interpretation note.

Return type:: str
Parameters:: data – The dict returned by report().
Returns:: A Markdown-formatted string.

oats.trajectory.store

SQLite + FTS5 trajectory store.

One row per “turn record” — a user prompt, an assistant reply, a tool call, or a tool result. FTS5 mirrors the content column so callers can rank past turns by BM25 using SQLite’s builtin bm25() function.

Concurrency strategy (borrowed from the Hermes state-store pattern and adapted to coder2’s layering):

WAL journal mode — concurrent readers tolerate a single writer.
Short busy_timeout (1.0 s) plus application-level retry with random jitter (20–150 ms per retry, up to 15 retries). SQLite’s built-in busy handler is deterministic and causes convoys under parallel writers; jitter disperses them.
PRAGMA synchronous = NORMAL — durable after fsync’d checkpoints while keeping per-insert cost low. Trajectory data is observability, not a money-transfer log; this is the right durability/throughput trade.
Writes run through asyncio.to_thread so they never block the event loop that drives HookEngine.fire.
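The four points above can be combined in a short sketch. The helper names (`open_connection`, `execute_with_retry`, `aexecute`) are hypothetical, and detecting a busy database by matching "locked" or "busy" in the error message is an assumption about how the retry condition might be checked.

```python
import asyncio
import random
import sqlite3
import threading
import time

_lock = threading.RLock()  # serializes use of the shared connection


def open_connection(path: str) -> sqlite3.Connection:
    """Open a connection with WAL, a 1.0 s busy timeout, and synchronous=NORMAL."""
    conn = sqlite3.connect(
        path,
        timeout=1.0,  # SQLite busy timeout, in seconds
        isolation_level=None,  # autocommit
        check_same_thread=False,  # allow use from worker threads
    )
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute("PRAGMA synchronous=NORMAL")
    return conn


def execute_with_retry(conn, sql, params=(), retries=15):
    """Run one statement, retrying on a busy database with random jitter."""
    for attempt in range(retries + 1):  # first attempt plus up to 15 retries
        try:
            with _lock:
                return conn.execute(sql, params)
        except sqlite3.OperationalError as exc:
            message = str(exc).lower()
            is_busy = "locked" in message or "busy" in message
            if not is_busy or attempt == retries:
                raise
            # Sleep 20-150 ms so competing writers do not retry in lockstep.
            time.sleep(random.uniform(0.020, 0.150))


async def aexecute(conn, sql, params=()):
    """Offload the blocking write to a thread so the event loop stays free."""
    return await asyncio.to_thread(execute_with_retry, conn, sql, params)
```

WAL only takes effect for file-backed databases, so the sketch should be tried against a real path and not ":memory:".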

class oats.trajectory.store.TrajectoryRecord(id, session_id, parent_session_id, turn_idx, role, kind, tool_name, content, created_at)[source]

Bases: object

One row from the trajectory store.

id: int

session_id: str

parent_session_id: str | None

turn_idx: int

role: str

kind: str

tool_name: str | None

content: str

created_at: float

__init__(id, session_id, parent_session_id, turn_idx, role, kind, tool_name, content, created_at)

class oats.trajectory.store.TrajectoryStore(db_path=None)[source]

Bases: object

Persistent, full-text searchable log of agent turns.

Thread-safe: a single process-wide connection guarded by an RLock, which is the simpler correct choice on SQLite WAL (multiple connections also work but buy nothing here since we serialize writers anyway).

Call record() (sync) from any thread, or arecord() (async) from coroutine code.

__init__(db_path=None)[source]

Initialize the trajectory store, creating the database if needed.

Opens a SQLite connection in WAL mode with autocommit, then creates the trajectories table, the FTS5 virtual table (with triggers that keep it in sync), and the turn_metrics table.

Parameters:: db_path – Path to the SQLite database file. Defaults to <data_dir>/trajectories.db.

record(*, session_id, turn_idx, role, kind, content, tool_name=None, parent_session_id=None, created_at=None)[source]

Insert one turn record. Returns the new row id.

Retries under SQLITE_BUSY with random jitter so concurrent writers don’t convoy on the builtin busy handler.

Return type:: int

async arecord(**kwargs)[source]

Async wrapper around record() — offloads to a thread.

Return type:: int

search(query, *, limit=10, session_id=None, kinds=None)[source]

BM25-rank past turns by query. Returns (score, record) pairs.

SQLite FTS5’s bm25() returns negative scores, and a more negative value means a better match. search() flips the sign before returning so that higher = more relevant, matching the convention used elsewhere in coder2.

Return type:: list[tuple[float, TrajectoryRecord]]
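The sign flip can be seen in a small standalone sketch. It assumes a Python build whose sqlite3 module has FTS5 enabled; the table name `turns_fts` and the sample rows are invented for illustration, and the function returns row ids where the real method returns TrajectoryRecord objects.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical standalone FTS5 table; the real store mirrors a content column.
conn.execute("CREATE VIRTUAL TABLE turns_fts USING fts5(content)")
conn.executemany(
    "INSERT INTO turns_fts (rowid, content) VALUES (?, ?)",
    [
        (1, "retry sqlite writes with random jitter"),
        (2, "render the weekly markdown report"),
        (3, "enable sqlite wal mode"),
    ],
)


def search(query: str, limit: int = 10) -> list[tuple[float, int]]:
    """Return (score, rowid) pairs, best match first, with higher = better."""
    rows = conn.execute(
        "SELECT rowid, bm25(turns_fts) FROM turns_fts "
        "WHERE turns_fts MATCH ? "
        "ORDER BY bm25(turns_fts) "  # ascending: most negative (best) first
        "LIMIT ?",
        (query, limit),
    ).fetchall()
    # Negate the raw score so callers can treat larger values as more relevant.
    return [(-raw, rowid) for rowid, raw in rows]


results = search("sqlite OR jitter")
```

Here the row mentioning both terms ranks ahead of the row mentioning only "sqlite", and the returned scores are positive after the flip. The query string is passed to MATCH as-is, so free-form user text would need escaping before it is used this way.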

async asearch(query, **kwargs)[source]

Async wrapper around search() — offloads to a thread.

Return type:: list[tuple[float, TrajectoryRecord]]

session_turns(session_id, limit=1000)[source]

All turns for a session, ordered by turn_idx.

Return type:: list[TrajectoryRecord]

count()[source]

Total number of records. Handy for tests.

Return type:: int

close()[source]

Close the underlying SQLite connection.

Return type:: None

oats.trajectory.store.get_store(db_path=None)[source]

Return the process-wide trajectory store, creating it on first use.

Return type:: TrajectoryStore

oats.trajectory.store.reset_store()[source]

Testing hook — drop the module-level singleton.

Return type:: None
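A rough sketch of the process-wide singleton that get_store() and reset_store() describe. The `Store` class is a placeholder, and two behaviors here are assumptions: that a db_path passed after the first call is ignored, and that resetting also closes the old instance.

```python
import threading


class Store:
    """Placeholder for the real store; it only remembers its path."""

    def __init__(self, db_path=None):
        self.db_path = db_path
        self.closed = False

    def close(self) -> None:
        self.closed = True


_store = None
_store_lock = threading.Lock()


def get_store(db_path=None) -> Store:
    """Return the shared store, creating it on first use."""
    global _store
    with _store_lock:
        if _store is None:
            _store = Store(db_path)
        # Assumption: later calls return the existing instance unchanged.
        return _store


def reset_store() -> None:
    """Drop the singleton so the next get_store() builds a fresh one."""
    global _store
    with _store_lock:
        if _store is not None:
            _store.close()  # assumption: reset also closes the old store
        _store = None
```

In tests, calling reset_store() between cases keeps one test's database from leaking into the next.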

oats.trajectory.retrieval

Retrieval-augmented examples from the trajectory store.

Given a fresh user prompt, find the top past prompts that resemble it, then pull the tool outcomes that immediately followed them in the same session. That (matched prompt → successful continuation) pair becomes an in-context example the current model can imitate.

Design notes:

Retrieval uses TrajectoryStore.search() over KIND_PROMPT rows so we’re ranking user intents, not tool noise.
Continuation lookup is scoped to the original session via TrajectoryStore.session_turns() so we never mix threads across sessions. We take at most continuation_limit turns following the matched prompt’s turn_idx, cap each turn’s content length, and stop at the next prompt (which would belong to a different user intent).
Examples are formatted plainly — no special delimiters — so the injection costs a predictable number of tokens regardless of retrieval depth.
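The continuation rule in the second note can be sketched as a small pure function. `Turn` is a simplified stand-in for TrajectoryRecord, the string value of KIND_PROMPT is assumed (the source names only the constant), and the sketch further assumes each record has its own increasing turn_idx.

```python
from dataclasses import dataclass

KIND_PROMPT = "prompt"  # assumed value; only the constant's name is documented


@dataclass
class Turn:
    """Simplified stand-in for a trajectory record."""

    turn_idx: int
    kind: str
    content: str


def continuation_after(turns, prompt_turn_idx, continuation_limit=4, content_cap=400):
    """Collect the turns that follow a matched prompt within one session."""
    picked = []
    for turn in sorted(turns, key=lambda t: t.turn_idx):
        if turn.turn_idx <= prompt_turn_idx:
            continue  # skip the matched prompt and anything before it
        if turn.kind == KIND_PROMPT:
            break  # the next prompt starts a different user intent
        # Cap the content so one long tool result cannot dominate the example.
        picked.append(Turn(turn.turn_idx, turn.kind, turn.content[:content_cap]))
        if len(picked) >= continuation_limit:
            break
    return picked
```

Because the input is a single session's turns, the result can never mix records from different sessions, which is the scoping guarantee the note describes.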

class oats.trajectory.retrieval.Example(score, prompt_record, continuation)[source]

Bases: object

One retrieval-augmented example.

score: float

prompt_record: TrajectoryRecord

continuation: list[TrajectoryRecord]

format(*, content_cap=400)[source]

Format this example as a plain-text block for injection into the system prompt.

Renders the matched past prompt and each continuation turn with its tool name or kind tag. Content is truncated to content_cap chars.

Parameters:: content_cap – Maximum characters per content field.
Returns:: A multi-line string with the formatted example.
Return type:: str

__init__(score, prompt_record, continuation)

oats.trajectory.retrieval.retrieve_examples(user_prompt, *, top_k=2, min_score=0.0, continuation_limit=4, store=None, exclude_session_id=None)[source]

Return up to top_k examples for user_prompt.

exclude_session_id lets the caller skip the current session so the model isn’t handed back a stale version of what it just did.

Return type:: list[Example]
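How top_k, min_score, and exclude_session_id might interact when filtering search hits is sketched below. `select_matches` is a hypothetical helper operating on (score, session_id) pairs; whether the real function drops scores strictly below min_score, as done here, is an assumption.

```python
def select_matches(hits, *, top_k=2, min_score=0.0, exclude_session_id=None):
    """Pick up to top_k hits, best first, honoring the score floor and exclusion."""
    selected = []
    for score, session_id in sorted(hits, key=lambda hit: hit[0], reverse=True):
        if score < min_score:
            continue  # assumption: scores strictly below the floor are dropped
        if exclude_session_id is not None and session_id == exclude_session_id:
            continue  # skip the caller's own session
        selected.append((score, session_id))
        if len(selected) == top_k:
            break
    return selected
```

Excluding the current session before applying top_k means the caller still receives up to top_k examples from other sessions even when its own session scores highest.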

oats.trajectory.retrieval.format_examples_section(examples)[source]

Render an examples block for the system prompt; None if no examples.

Return type:: str | None

oats.trajectory.report

CLI for the trajectory self-improvement report.

Usage:

python -m oats.trajectory.report                 # 7-day markdown report
python -m oats.trajectory.report --since 30      # 30-day window
python -m oats.trajectory.report --json          # raw dict

The heavy lifting lives in oats.trajectory.metrics; this module is just argument parsing and output formatting.

oats.trajectory.report.main(argv=None)[source]

CLI entry point for the trajectory self-improvement report.

Parses --since (window in days, default 7) and --json (emit JSON instead of Markdown). Delegates to oats.trajectory.metrics.report() and oats.trajectory.metrics.format_report_markdown().

Parameters:: argv – Command-line arguments (defaults to sys.argv[1:]).
Returns:: Exit code (0 on success).
Return type:: int
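A minimal sketch of an entry point with the same two options. `fake_report` is a hypothetical stand-in for the metrics report function, the Markdown output is a one-line placeholder, and treating --since as a float is an assumption based on the since_days=7.0 default.

```python
import argparse
import json


def fake_report(since_days: float) -> dict:
    """Hypothetical stand-in for the real aggregation; returns a fixed shape."""
    return {"since_days": since_days, "total_turns": 0}


def main(argv=None) -> int:
    parser = argparse.ArgumentParser(description="Trajectory report (sketch)")
    parser.add_argument("--since", type=float, default=7.0, help="window in days")
    parser.add_argument(
        "--json", action="store_true", help="emit JSON instead of Markdown"
    )
    # argparse falls back to sys.argv[1:] when argv is None.
    args = parser.parse_args(argv)

    data = fake_report(args.since)
    if args.json:
        print(json.dumps(data, indent=2))
    else:
        print(f"# Trajectory report (last {data['since_days']:g} days)")
    return 0
```

Accepting argv as a parameter keeps the entry point easy to exercise from tests, for example main(["--since", "30", "--json"]).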