The Part 7 agent_turn inlines every tool result verbatim into the next message. That’s fine for short outputs but blows the context window on a 50-row run_sql, a multi-KB skill body, or a long exec_js log. Part 11 fixes that.
Three glued-together pieces:
1. log_tool: every dispatch persists the full output as an OAMP memory tagged kind=tool_output with the LLM’s tool_call_id.
2. Truncated preview: the next message carries only the first 600 bytes of the output, followed by a marker of the form ...[+N bytes. full output: fetch_tool_output(tool_call_id='call_…')]. The model knows where to find the rest.
3. fetch_tool_output(tool_call_id): a registered tool that recovers the full bytes by id when the agent decides it needs them.
Pieces 1 and 3 are your TODO 8 and TODO 9. Piece 2 (the agent_turn redefinition with the truncation marker) is the pre-built cell at the end of Part 11; re-run cell §11’s agent_turn to revert to the minimal version.
A 50-row run_sql of cargo_items is ~10 KB. A skillbox body for agent/ora-error-catalog is ~25 KB. Three of those in one turn and you’re at 100 KB of context for outputs the model only skimmed once.
After offload + truncation:
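- The full output is stored in OAMP, keyed by tool_call_id.
- The next message carries at most a 600-byte preview, plus a fetch_tool_output(tool_call_id=...) hint when truncation happens.
Most truncated outputs are never refetched: the model only pulls full bytes when its preview isn’t enough. Bandwidth follows attention.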
log_tool
The write side of offload. Every dispatch persists the full tool output as an OAMP memory tagged kind=tool_output with the LLM’s tool_call_id.
Solution:
def log_tool(thread_id, tool_call_id, tool_name, tool_args, tool_output):
    memory_client.add_memory(
        tool_output,
        user_id=USER_ID, agent_id=AGENT_ID,
        thread_id=thread_id,
        metadata={
            "kind": "tool_output",
            "tool_call_id": tool_call_id,
            "tool_name": tool_name,
            "tool_args": json.dumps(tool_args),
        },
    )
The metadata shape isn’t optional — TODO 9 (tool_fetch_tool_output) looks rows up by metadata_filter={"kind": "tool_output", "tool_call_id": ...}, so the keys here must match. tool_args is JSON-serialised so OAMP’s metadata store (a JSON column) can index it without a custom encoder for whatever Python types the caller passed in.
The hard-stop assert below your implementation calls log_tool with a synthetic tool_call_id, then queries OAMP with the same metadata filter and checks the row came back with the right shape.
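The write-then-read check described above can be sketched without a live OAMP instance. The snippet below is a minimal illustration, assuming a hypothetical in-memory stand-in (FakeStore, FakeMemoryClient, Record) that mimics only the two calls used in this part, add_memory(...) and _store.list(...). The ids, the SQL argument name, and the matching logic are placeholders for the sketch, not OAMP behaviour.

```python
import json
from dataclasses import dataclass

USER_ID, AGENT_ID = "demo-user", "demo-agent"  # placeholder ids for the sketch


@dataclass
class Record:
    content: str
    metadata: dict


class FakeStore:
    """Hypothetical stand-in for memory_client._store (not the OAMP API)."""

    def __init__(self):
        self.rows = []

    def list(self, kind, user_id, agent_id, metadata_filter, limit=1):
        # Keep rows whose metadata contains every key/value in the filter.
        hits = [
            r for r in self.rows
            if all(r.metadata.get(k) == v for k, v in metadata_filter.items())
        ]
        return hits[:limit]


class FakeMemoryClient:
    """Hypothetical stand-in exposing only add_memory and _store."""

    def __init__(self):
        self._store = FakeStore()

    def add_memory(self, content, user_id, agent_id, thread_id, metadata):
        self._store.rows.append(Record(content, dict(metadata)))


memory_client = FakeMemoryClient()


def log_tool(thread_id, tool_call_id, tool_name, tool_args, tool_output):
    memory_client.add_memory(
        tool_output,
        user_id=USER_ID, agent_id=AGENT_ID,
        thread_id=thread_id,
        metadata={
            "kind": "tool_output",
            "tool_call_id": tool_call_id,
            "tool_name": tool_name,
            "tool_args": json.dumps(tool_args),
        },
    )


# Write with a synthetic id, then read back with the same metadata filter.
log_tool("thread-1", "call_test_001", "run_sql", {"query": "SELECT 1"}, "1 row")
rows = memory_client._store.list(
    "memory",
    user_id=USER_ID, agent_id=AGENT_ID,
    metadata_filter={"kind": "tool_output", "tool_call_id": "call_test_001"},
    limit=1,
)
assert len(rows) == 1
assert rows[0].content == "1 row"
assert rows[0].metadata["tool_name"] == "run_sql"
assert json.loads(rows[0].metadata["tool_args"]) == {"query": "SELECT 1"}
```

If either metadata key in log_tool is renamed, the filter stops matching and the first assertion fails, which is the coupling between TODO 8 and TODO 9 that the paragraph above warns about.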
tool_fetch_tool_output
The read side, mirror image of TODO 8. The agent calls this when its inlined preview was truncated and it needs the missing bytes to answer.
The lookup uses OAMP’s metadata_filter:
records = memory_client._store.list(
    "memory",
    user_id=USER_ID, agent_id=AGENT_ID,
    metadata_filter={"kind": "tool_output", "tool_call_id": tool_call_id},
    limit=1,
)
Return a JSON object with tool_name, tool_args, and tool_output if found, or {"error": ...} if no record matches.
Solution:
@register
def tool_fetch_tool_output(tool_call_id: str) -> str:
    """Retrieve the full, untruncated output of a previous tool call.
    Use this when a prior tool result in your context was truncated with
    '...[+N bytes. full output: fetch_tool_output(tool_call_id=...)]' and you need
    the missing bytes to answer.
    """
    records = memory_client._store.list(
        "memory",
        user_id=USER_ID, agent_id=AGENT_ID,
        metadata_filter={"kind": "tool_output", "tool_call_id": tool_call_id},
        limit=1,
    )
    if not records:
        return json.dumps({"error": f"no tool call with id {tool_call_id}"})
    r = records[0]
    meta = r.metadata or {}
    return json.dumps({
        "tool_name": meta.get("tool_name"),
        "tool_args": meta.get("tool_args"),
        "tool_output": r.content,
    })
After this is registered, the pre-built agent_turn redefinition in the next cell calls log_tool for each dispatch and emits the truncation marker.
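One detail on the consuming side: because log_tool stores tool_args as a JSON string, the payload returned by tool_fetch_tool_output carries tool_args as a string nested inside the outer JSON, so reading it back as a dict takes a second json.loads. A minimal sketch, where decode_fetch_result is a hypothetical helper, the query argument name is assumed, and the sample payloads are built by hand to match the shape of the solution above:

```python
import json


def decode_fetch_result(payload: str) -> dict:
    """Parse a tool_fetch_tool_output payload (hypothetical helper)."""
    result = json.loads(payload)
    if "error" in result:
        raise LookupError(result["error"])
    # tool_args was serialised once by log_tool, so decode it separately.
    if isinstance(result.get("tool_args"), str):
        result["tool_args"] = json.loads(result["tool_args"])
    return result


# Hand-built payload in the "found" shape: tool_args is a JSON string.
found = json.dumps({
    "tool_name": "run_sql",
    "tool_args": json.dumps({"query": "SELECT * FROM cargo_items"}),
    "tool_output": "id,name\n1,crate",
})
decoded = decode_fetch_result(found)
print(decoded["tool_args"]["query"])

# Hand-built payload in the "not found" shape.
missing = json.dumps({"error": "no tool call with id call_missing"})
try:
    decode_fetch_result(missing)
    lookup_error = None
except LookupError as exc:
    lookup_error = str(exc)
print("lookup failed:", lookup_error)
```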
agent_turn
The Part 7 agent_turn had this dispatch tail:
messages.append({"role": "tool", "tool_call_id": tc.id, "content": output})
The Part 11 redefinition becomes:
log_tool(thread_id, tc.id, name, args, output)
if len(output) <= 600:
    preview = output
else:
    preview = (
        output[:600] +
        f" ...[+{len(output)-600} bytes. "
        f"full output: fetch_tool_output(tool_call_id='{tc.id}')]"
    )
messages.append({"role": "tool", "tool_call_id": tc.id, "content": preview})
Two changes:
- log_tool(...): full output → OAMP memory tagged with the tool_call_id.
- preview: what actually goes into the next LLM call. Compact, with a recovery hint.
A worked example:
1. The model calls run_sql("SELECT * FROM cargo_items"). Output is 12 KB.
2. The next message carries the first 600 bytes followed by ...[+11400 bytes. full output: fetch_tool_output(tool_call_id='call_X7Y')].
3. If the preview isn’t enough, the model calls fetch_tool_output(tool_call_id="call_X7Y"). The harness retrieves the full 12 KB from OAMP and returns it. The model answers with the complete data.
Crucially, this decision is the model’s, not a heuristic. Bandwidth follows attention.
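The truncation rule from the dispatch tail can be pulled out into a small function to check the arithmetic in isolation. This is a minimal sketch: make_preview and PREVIEW_LIMIT are hypothetical names introduced here, and the 12,000-character input simply stands in for a 12 KB run_sql result.

```python
PREVIEW_LIMIT = 600  # same threshold as the dispatch tail


def make_preview(output: str, tool_call_id: str, limit: int = PREVIEW_LIMIT) -> str:
    """Return output unchanged if short, else a prefix plus the recovery marker."""
    if len(output) <= limit:
        return output
    return (
        output[:limit]
        + f" ...[+{len(output) - limit} bytes. "
        + f"full output: fetch_tool_output(tool_call_id='{tool_call_id}')]"
    )


short = make_preview("3 rows", "call_A1")
long = make_preview("x" * 12_000, "call_X7Y")

print(short)       # short outputs pass through untouched
print(long[600:])  # only the marker that follows the 600-character prefix
# ->  ...[+11400 bytes. full output: fetch_tool_output(tool_call_id='call_X7Y')]
```

Note that len() counts characters rather than encoded bytes, so the "bytes" figure in the marker is exact only for ASCII output.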
- A 50-row run_sql, a multi-KB skill body, or a long exec_js log can each consume more context than the rest of the turn combined.
- Full output → OAMP memory tagged with the tool_call_id. Compact preview → message list. The marker tells the model where the rest is.
fetch_tool_output returns “no tool call with id …”: the tool_call_id doesn’t match any OAMP memory. Either the offload write failed, or the tool_call_id was mangled. Check memory_client._store.list(...) directly with the same filter.
Truncation marker shows but model never calls fetch_tool_output — that’s usually fine; the model decided the preview was enough. If you suspect it’s misreading the marker, lower the truncation threshold (currently 600 bytes) or change the marker text to be more explicit.
Full outputs accumulate in OAMP and the table grows — yes, that’s by design. In production, prune kind=tool_output memories older than N days from OAMP, or scope them to a specific thread_id and delete on thread close.
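The pruning policy can be sketched independently of the store. The snippet below only selects which records to drop and leaves the actual delete call to whatever OAMP provides. It assumes each record is a plain dict with id, thread_id, created_at, and metadata fields, which is a hypothetical shape chosen for the illustration; select_prunable is likewise a made-up helper name.

```python
from datetime import datetime, timedelta, timezone


def select_prunable(records, now, max_age_days=None, closed_thread_id=None):
    """Return tool_output records that are too old or belong to a closed thread."""
    prunable = []
    for rec in records:
        if rec["metadata"].get("kind") != "tool_output":
            continue  # never touch other kinds of memory
        too_old = (
            max_age_days is not None
            and now - rec["created_at"] > timedelta(days=max_age_days)
        )
        closed = closed_thread_id is not None and rec["thread_id"] == closed_thread_id
        if too_old or closed:
            prunable.append(rec)
    return prunable


now = datetime(2025, 1, 31, tzinfo=timezone.utc)
records = [
    {"id": "m1", "thread_id": "t1", "created_at": now - timedelta(days=40),
     "metadata": {"kind": "tool_output", "tool_call_id": "call_old"}},
    {"id": "m2", "thread_id": "t2", "created_at": now - timedelta(days=2),
     "metadata": {"kind": "tool_output", "tool_call_id": "call_new"}},
    {"id": "m3", "thread_id": "t1", "created_at": now - timedelta(days=90),
     "metadata": {"kind": "note"}},
]

by_age = [r["id"] for r in select_prunable(records, now, max_age_days=30)]
by_thread = [r["id"] for r in select_prunable(records, now, closed_thread_id="t2")]
print(by_age, by_thread)  # -> ['m1'] ['m2']
```

Filtering on kind first keeps the policy from ever selecting non-offload memories, even old ones such as m3.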