oats.trajectory
Trajectory logging, metrics, retrieval, and reporting.
oats.trajectory.logger
Hook-driven trajectory logger.
Registers global Python handlers on the HookEngine that persist each
turn into the SQLite/FTS5 trajectory store. Registration is idempotent and
gated by CODER_FEATURE_TRAJECTORY_STORE; calling install() when the
flag is off is a no-op.
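The idempotent, flag-gated registration described above could look roughly like the sketch below. It is not the actual implementation: `FakeHookEngine`, the `engine` and `env` parameters, the `"turn_end"` event name, and the set of strings treated as "on" are all assumptions made so the example runs on its own (the documented install() takes no arguments).

```python
import os

_installed = False  # hypothetical module-level guard


class FakeHookEngine:
    """Stand-in for HookEngine; only records what gets registered."""

    def __init__(self):
        self.handlers = []

    def register(self, event, handler):
        self.handlers.append((event, handler))


def _flag_enabled(env) -> bool:
    # Assumption: these strings count as "on"; the real parsing may differ.
    value = env.get("CODER_FEATURE_TRAJECTORY_STORE", "")
    return value.strip().lower() in {"1", "true", "yes", "on"}


def install(engine, env=os.environ) -> bool:
    """Register handlers once; return True if the logger is active."""
    global _installed
    if not _flag_enabled(env):
        return False  # flag off: no-op
    if _installed:
        return True  # already registered: short-circuit
    engine.register("turn_end", lambda payload: None)  # hypothetical event name
    _installed = True
    return True
```

Calling `install()` a second time returns True without touching the engine again, which is what keeps handlers from being registered twice.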
Turn indexing uses a per-session in-memory counter keyed on session_id
and backfilled from TrajectoryStore.session_turns() on first touch, so
a restarted process continues numbering where it left off.
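A minimal sketch of the backfilled counter, under stated assumptions: `TurnCounter` and `load_existing_turns` are hypothetical names, turn indices are assumed to start at 0, and the loader is a plain callable standing in for TrajectoryStore.session_turns(). The sketch is not thread-safe.

```python
class TurnCounter:
    """Per-session turn counter that consults storage only on first touch."""

    def __init__(self, load_existing_turns):
        # load_existing_turns(session_id) -> list of turn indices already stored
        self._load = load_existing_turns
        self._next = {}

    def next_idx(self, session_id: str) -> int:
        if session_id not in self._next:
            # First touch: continue after the highest stored index, if any.
            existing = self._load(session_id)
            self._next[session_id] = max(existing) + 1 if existing else 0
        idx = self._next[session_id]
        self._next[session_id] = idx + 1
        return idx


# Pretend session "a" already has three turns on disk from an earlier process.
stored = {"a": [0, 1, 2]}
counter = TurnCounter(lambda session_id: stored.get(session_id, []))
first_a = counter.next_idx("a")  # 3: continues where the old process left off
first_b = counter.next_idx("b")  # 0: brand-new session
```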
- oats.trajectory.logger.install()[source]
Register the logger’s global handlers. Idempotent. Returns True if active.
Called once at process startup when the feature flag is on. Safe to call more than once — subsequent calls short-circuit so handlers aren’t registered multiple times.
- Return type:
bool
oats.trajectory.metrics
Per-turn self-improvement metrics.
Writes to the turn_metrics table in the same SQLite db as the trajectory
store so retrieval records and turn outcomes can be joined for A/B analysis.
Two entry points:
- log_retrieval_used(): at prompt-injection time, records what was retrieved (if anything) for a given (session_id, turn_idx).
- log_turn_outcome(): at turn end, records whether the model completed cleanly, how many agent-loop iterations it took, and the tool-error count.
report() aggregates the table into a diagnostics dict callers can
render to stdout, Markdown, or JSON. The CLI at python -m
oats.trajectory.report prints a weekly summary.
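The two-writes-per-turn flow and the cohort split can be illustrated with an in-memory SQLite table. This is a sketch, not the module's code: the column subset, the primary key on (session_id, turn_idx), and the helper names `record_retrieval`, `record_outcome`, and `cohort_summary` are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE turn_metrics (
           session_id TEXT NOT NULL,
           turn_idx INTEGER NOT NULL,
           retrieval_used INTEGER NOT NULL DEFAULT 0,
           iterations INTEGER,
           tool_error_count INTEGER,
           completed INTEGER,
           PRIMARY KEY (session_id, turn_idx))"""
)


def _ensure_row(session_id, turn_idx):
    conn.execute(
        "INSERT OR IGNORE INTO turn_metrics (session_id, turn_idx) VALUES (?, ?)",
        (session_id, turn_idx),
    )


def record_retrieval(session_id, turn_idx, retrieved):
    # Called at prompt-injection time; an empty list means nothing was injected.
    _ensure_row(session_id, turn_idx)
    conn.execute(
        "UPDATE turn_metrics SET retrieval_used = ? WHERE session_id = ? AND turn_idx = ?",
        (1 if retrieved else 0, session_id, turn_idx),
    )


def record_outcome(session_id, turn_idx, iterations, tool_error_count, completed):
    # Called at turn end; updates the same row keyed on (session_id, turn_idx).
    _ensure_row(session_id, turn_idx)
    conn.execute(
        "UPDATE turn_metrics SET iterations = ?, tool_error_count = ?, completed = ? "
        "WHERE session_id = ? AND turn_idx = ?",
        (iterations, tool_error_count, int(completed), session_id, turn_idx),
    )


def cohort_summary():
    rows = conn.execute(
        "SELECT retrieval_used, COUNT(*), COALESCE(SUM(completed), 0), AVG(iterations) "
        "FROM turn_metrics GROUP BY retrieval_used"
    ).fetchall()
    return {
        ("retrieval" if used else "no_retrieval"): {
            "turns": turns,
            "completion_rate": done / turns,
            "avg_iterations": avg_iterations,
        }
        for used, turns, done, avg_iterations in rows
    }


record_retrieval("s1", 0, [(1.2, 7)])
record_outcome("s1", 0, iterations=2, tool_error_count=0, completed=True)
record_retrieval("s1", 1, [])
record_outcome("s1", 1, iterations=4, tool_error_count=1, completed=False)
record_retrieval("s2", 0, [])
record_outcome("s2", 0, iterations=6, tool_error_count=0, completed=True)
summary = cohort_summary()
```

Because both writes land on the same row, a single GROUP BY on retrieval_used is enough to compare the two cohorts.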
- class oats.trajectory.metrics.TurnMetricRow(session_id, turn_idx, user_prompt, retrieved_ids, retrieved_scores, retrieval_used, iterations, tool_error_count, completed, duration_ms, model_id, created_at, updated_at)[source]
Bases: object
One row from the turn_metrics table.
- session_id
The session this turn belongs to.
- turn_idx
Sequential turn index within the session.
- user_prompt
The user’s prompt text (truncated to 2000 chars).
- retrieved_ids
JSON-encoded list of trajectory IDs that were retrieved.
- retrieved_scores
JSON-encoded list of BM25 scores for retrieved items.
- retrieval_used
Whether retrieval was used for this turn.
- iterations
Number of agent-loop iterations the turn took.
- tool_error_count
Number of tool errors encountered.
- completed
Whether the turn completed successfully.
- duration_ms
Turn duration in milliseconds.
- model_id
The model ID used for this turn.
- created_at
Timestamp when the row was first inserted.
- updated_at
Timestamp when the row was last updated.
- session_id: str
- turn_idx: int
- retrieval_used: bool
- created_at: float
- updated_at: float
- __init__(session_id, turn_idx, user_prompt, retrieved_ids, retrieved_scores, retrieval_used, iterations, tool_error_count, completed, duration_ms, model_id, created_at, updated_at)
- oats.trajectory.metrics.log_retrieval_used(*, session_id, turn_idx, user_prompt, retrieved, model_id=None, store=None)[source]
Record what was retrieved for this turn.
retrieved holds (score, trajectory_id) pairs.
- Return type:
- oats.trajectory.metrics.log_turn_outcome(*, session_id, turn_idx, iterations, tool_error_count, completed, duration_ms=None, model_id=None, store=None)[source]
Record the outcome of a finished turn.
- Return type:
- class oats.trajectory.metrics.CohortStats(label, turns=0, completed=0, avg_iterations=0.0, avg_tool_errors=0.0, avg_duration_ms=0.0)[source]
Bases: object
Aggregate stats for one slice of turns.
- label: str
- turns: int = 0
- completed: int = 0
- avg_iterations: float = 0.0
- avg_tool_errors: float = 0.0
- avg_duration_ms: float = 0.0
- property completion_rate: float
Fraction of turns in this cohort that completed successfully.
- __init__(label, turns=0, completed=0, avg_iterations=0.0, avg_tool_errors=0.0, avg_duration_ms=0.0)
- oats.trajectory.metrics.report(since_days=7.0, *, store=None)[source]
Aggregate turn_metrics across the last since_days days.
Splits the turns into two cohorts, retrieval-used vs. not, so callers can tell whether in-context retrieval is pulling its weight. Returns a dict ready for JSON or Markdown rendering.
- Return type:
dict
- oats.trajectory.metrics.format_report_markdown(data)[source]
Render the report dict as a Markdown table.
Produces a header with the time window and total turns, followed by a table of cohort statistics and an interpretation note.
- Return type:
str
- Parameters:
data – The dict returned by report().
- Returns:
A Markdown-formatted string.
oats.trajectory.store
SQLite + FTS5 trajectory store.
One row per “turn record” — a user prompt, an assistant reply, a tool call,
or a tool result. FTS5 mirrors the content column so callers can rank
past turns by BM25 using SQLite’s builtin bm25() function.
Concurrency strategy (borrowed from the Hermes state-store pattern and adapted to coder2’s layering):
- WAL journal mode: concurrent readers tolerate a single writer.
- Short busy_timeout (1.0s) plus application-level retry with random jitter (20–150 ms per retry, 15 retries). SQLite’s built-in busy handler is deterministic and causes convoys under parallel writers; jitter disperses them.
- PRAGMA synchronous = NORMAL: durable after fsync’d checkpoints while keeping per-insert cost low. Trajectory data is observability, not a money-transfer log; this is the right durability/throughput trade.
- Writes run through asyncio.to_thread so they never block the event loop that drives HookEngine.fire.
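The connection settings and jittered retry listed above might be sketched as follows. The helper names `open_connection` and `with_jitter_retry` are hypothetical, the error-message check is a simplification, and reading "15 retries" as 15 attempts after the first is an assumption.

```python
import random
import sqlite3
import time


def open_connection(path):
    # timeout=1.0 is sqlite3's busy timeout in seconds; isolation_level=None
    # puts the connection in autocommit mode.
    conn = sqlite3.connect(
        path, timeout=1.0, isolation_level=None, check_same_thread=False
    )
    conn.execute("PRAGMA journal_mode = WAL")
    conn.execute("PRAGMA synchronous = NORMAL")
    return conn


def with_jitter_retry(fn, *, retries=15, min_delay=0.020, max_delay=0.150,
                      sleep=time.sleep):
    """Call fn(); on a busy/locked error, sleep a random 20-150 ms and retry."""
    for attempt in range(retries + 1):
        try:
            return fn()
        except sqlite3.OperationalError as exc:
            message = str(exc).lower()
            if "locked" not in message and "busy" not in message:
                raise  # not a contention error
            if attempt == retries:
                raise  # out of retries
            # Random delay so parallel writers do not wake up in lockstep.
            sleep(random.uniform(min_delay, max_delay))
```

A write would then be wrapped as `with_jitter_retry(lambda: conn.execute(sql, params))`, leaving the short busy timeout as the first line of defense and the jitter as the second.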
- class oats.trajectory.store.TrajectoryRecord(id, session_id, parent_session_id, turn_idx, role, kind, tool_name, content, created_at)[source]
Bases: object
One row from the trajectory store.
- id: int
- session_id: str
- turn_idx: int
- role: str
- kind: str
- content: str
- created_at: float
- __init__(id, session_id, parent_session_id, turn_idx, role, kind, tool_name, content, created_at)
- class oats.trajectory.store.TrajectoryStore(db_path=None)[source]
Bases: object
Persistent, full-text searchable log of agent turns.
Thread-safe: a single process-wide connection guarded by an RLock, which is the simpler correct choice on SQLite WAL (multiple connections also work but buy nothing here since we serialize writers anyway).
Call record() (sync) from any thread, or arecord() (async) from coroutine code.
- __init__(db_path=None)[source]
Initialize the trajectory store, creating the database if needed.
Opens a SQLite connection in WAL mode with autocommit, creates the trajectories table, the FTS5 virtual table, and the turn_metrics table (with triggers to keep FTS5 in sync).
- Parameters:
db_path – Path to the SQLite database file. Defaults to <data_dir>/trajectories.db.
- record(*, session_id, turn_idx, role, kind, content, tool_name=None, parent_session_id=None, created_at=None)[source]
Insert one turn record. Returns the new row id.
Retries under SQLITE_BUSY with random jitter so concurrent writers don’t convoy on the builtin busy handler.
- Return type:
int
- async arecord(**kwargs)[source]
Async wrapper around record(); offloads to a thread.
- Return type:
int
- search(query, *, limit=10, session_id=None, kinds=None)[source]
BM25-rank past turns by query. Returns (score, record) pairs.
Lower rank is better in SQLite FTS5 (bm25() returns a negative score). We flip sign before returning so higher = more relevant, to match the convention used elsewhere in coder2.
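The sign flip can be seen in a small, self-contained FTS5 query. This sketch assumes the local SQLite build has FTS5 compiled in; the table name `turns_fts`, the sample rows, and the tuple shape returned here are illustrative and do not reflect the store's actual schema or return type.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE turns_fts USING fts5(content)")
conn.executemany(
    "INSERT INTO turns_fts (content) VALUES (?)",
    [
        ("fix the failing unit test in the parser",),
        ("add a unit test for the tokenizer",),
        ("update the README badges",),
    ],
)


def search(query, limit=10):
    rows = conn.execute(
        "SELECT rowid, content, bm25(turns_fts) AS raw "
        "FROM turns_fts WHERE turns_fts MATCH ? "
        "ORDER BY raw LIMIT ?",  # ascending raw value = best match first
        (query, limit),
    ).fetchall()
    # Negate so that a higher score means a more relevant hit.
    return [(-raw, rowid, content) for rowid, content, raw in rows]


hits = search("parser")
```

Ordering is done on the raw bm25() value inside SQL, so the flipped scores come back already sorted from most to least relevant.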
- async asearch(query, **kwargs)[source]
Async wrapper around search(); offloads to a thread.
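The sync-plus-async pairing used by record()/arecord() and search()/asearch() follows a common pattern, sketched here with a toy in-memory store. `SketchStore` is hypothetical and keeps rows in a list rather than SQLite; asyncio.to_thread requires Python 3.9 or later.

```python
import asyncio
import threading


class SketchStore:
    """Toy store: a lock-guarded list standing in for the SQLite connection."""

    def __init__(self):
        self._lock = threading.RLock()
        self._rows = []

    def record(self, *, session_id, turn_idx, content):
        # Blocking path: safe to call from any thread.
        with self._lock:
            self._rows.append((session_id, turn_idx, content))
            return len(self._rows)  # stand-in for the new row id

    async def arecord(self, **kwargs):
        # Offload the blocking call so the event loop keeps running.
        return await asyncio.to_thread(self.record, **kwargs)


async def demo():
    store = SketchStore()
    ids = await asyncio.gather(
        *(store.arecord(session_id="s", turn_idx=i, content=f"turn {i}")
          for i in range(3))
    )
    return sorted(ids)
```

The async method adds no logic of its own, so the lock in the sync method remains the single place where writes are serialized.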
- oats.trajectory.store.get_store(db_path=None)[source]
Return the process-wide trajectory store, creating it on first use.
- Return type:
TrajectoryStore
oats.trajectory.retrieval
Retrieval-augmented examples from the trajectory store.
Given a fresh user prompt, find the top past prompts that resemble it, then
pull the tool outcomes that immediately followed them in the same session.
That (matched prompt → successful continuation) pair becomes an
in-context example the current model can imitate.
Design notes:
- Retrieval uses TrajectoryStore.search() over KIND_PROMPT rows so we’re ranking user intents, not tool noise.
- Continuation lookup is scoped to the original session via TrajectoryStore.session_turns() so we never mix threads across sessions. We take at most continuation_limit turns following the matched prompt’s turn_idx, cap each turn’s content length, and stop at the next prompt (which would belong to a different user intent).
- Examples are formatted plainly, with no special delimiters, so the injection costs a predictable number of tokens regardless of retrieval depth.
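The continuation rule in the design notes (follow the matched prompt, stop at the next prompt, cap count and length) can be sketched with plain dataclasses. `Turn`, `continuation_for`, and the kind string `"prompt"` are assumptions; the real code works on TrajectoryRecord rows and the KIND_PROMPT constant, whose value is not shown here.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    turn_idx: int
    kind: str
    content: str


def continuation_for(turns, matched_idx, *, continuation_limit=4, content_cap=400):
    """Return the turns that followed the matched prompt within one session."""
    out = []
    for turn in sorted(turns, key=lambda t: t.turn_idx):
        if turn.turn_idx <= matched_idx:
            continue  # at or before the matched prompt
        if turn.kind == "prompt":
            break  # next user intent starts here
        out.append(Turn(turn.turn_idx, turn.kind, turn.content[:content_cap]))
        if len(out) >= continuation_limit:
            break
    return out


session = [
    Turn(0, "prompt", "rename the config loader"),
    Turn(1, "tool_call", "grep -n load_config"),
    Turn(2, "tool_result", "x" * 500),
    Turn(3, "reply", "Renamed in three files."),
    Turn(4, "prompt", "now update the docs"),
    Turn(5, "tool_call", "open README"),
]
continuation = continuation_for(session, matched_idx=0)
```

Here the continuation covers turns 1 through 3 and stops before turn 4, with the oversized tool result truncated to content_cap characters.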
- class oats.trajectory.retrieval.Example(score, prompt_record, continuation)[source]
Bases: object
One retrieval-augmented example.
- score: float
- prompt_record: TrajectoryRecord
- continuation: list[TrajectoryRecord]
- format(*, content_cap=400)[source]
Format this example as a plain-text block for injection into the system prompt.
Renders the matched past prompt and each continuation turn with its tool name or kind tag. Content is truncated to content_cap chars.
- Return type:
str
- Parameters:
content_cap – Maximum characters per content field.
- Returns:
A multi-line string with the formatted example.
- __init__(score, prompt_record, continuation)
- oats.trajectory.retrieval.retrieve_examples(user_prompt, *, top_k=2, min_score=0.0, continuation_limit=4, store=None, exclude_session_id=None)[source]
Return up to top_k examples for user_prompt.
exclude_session_id lets the caller skip the current session so the model isn’t handed back a stale version of what it just did.
- Return type:
list[Example]
oats.trajectory.report
CLI for the trajectory self-improvement report.
Usage:
python -m oats.trajectory.report # 7-day markdown report
python -m oats.trajectory.report --since 30 # 30-day window
python -m oats.trajectory.report --json # raw dict
The heavy lifting lives in oats.trajectory.metrics; this module is
just argument parsing and output formatting.
- oats.trajectory.report.main(argv=None)[source]
CLI entry point for the trajectory self-improvement report.
Parses --since (window in days, default 7) and --json (emit JSON instead of Markdown). Delegates to oats.trajectory.metrics.report() and oats.trajectory.metrics.format_report_markdown().
- Return type:
int
- Parameters:
argv – Command-line arguments (defaults to sys.argv[1:]).
- Returns:
Exit code (always 0 on success).
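An entry point with the two documented flags could be wired up along these lines. This is only a sketch: `fake_report` is a stand-in for oats.trajectory.metrics.report(), the dict keys and the Markdown heading are invented for illustration, and the real formatter is not reproduced.

```python
import argparse
import json


def fake_report(since_days):
    # Stand-in for the real aggregation; keys are illustrative only.
    return {"since_days": since_days, "total_turns": 0}


def main(argv=None):
    parser = argparse.ArgumentParser(prog="report-sketch")
    parser.add_argument("--since", type=float, default=7.0,
                        help="window in days")
    parser.add_argument("--json", action="store_true",
                        help="emit JSON instead of Markdown")
    args = parser.parse_args(argv)  # argv=None falls back to sys.argv[1:]

    data = fake_report(args.since)
    if args.json:
        print(json.dumps(data, indent=2))
    else:
        print(f"# Trajectory report (last {data['since_days']:g} days)")
    return 0
```

Taking `argv` as a parameter lets tests call `main(["--since", "30", "--json"])` directly without patching sys.argv.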