# Memory and Context Engineering for AI Agents with Oracle AI Database, LangChain and Tavily

--------

[![Open in Colab](https://img.shields.io/badge/Open%20in-Colab-F9AB00?style=flat-square&logo=googlecolab)](https://colab.research.google.com/github/oracle-devrel/oracle-ai-developer-hub/blob/main/notebooks/memory_context_engineering_agents.ipynb)

In this notebook, you'll learn how to engineer memory systems that give AI agents the ability to remember, learn, and adapt across conversations. 
Moving beyond simple RAG, we implement a complete **Memory Manager** with six distinct memory types, each serving a specific cognitive function.



## What You'll Build

| Memory Type | Purpose | Storage |
|-------------|---------|---------|
| **Conversational** | Chat history per thread | SQL Table |
| **Knowledge Base** | Searchable documents & facts | Vector Store |
| **Workflow** | Learned action patterns | Vector Store |
| **Toolbox** | Dynamic tool definitions | Vector Store |
| **Entity** | People, places, systems extracted from context | Vector Store |
| **Summary** | Compressed context for long conversations | Vector Store |


## Key Concepts Covered

- **Memory Engineering**: Design patterns for agent memory systems
- **Context Engineering**: Techniques for optimizing what goes into the LLM context
- **Context Window Management**: Monitor usage, auto-summarize at thresholds
- **Just-in-Time Retrieval**: Compact summaries with on-demand expansion
- **Dynamic Tool Calling**: Semantic tool discovery and execution
- **Entity Extraction**: LLM-powered entity recognition and storage



## Prerequisites

- Python 3.10+
- Oracle AI Database (local Docker or cloud)
- OpenAI API key
- Tavily API key

## By the End
You'll have a reusable `MemoryManager` class and an agent loop that demonstrate how modern AI agents maintain context, learn from interactions, and manage information across sessions.

In [1]:
! pip install -qU langchain-oracledb sentence-transformers langchain-openai langchain langchain-community tavily-python

# Local Installation of Oracle AI Database via Docker [Memory Core]

--------

This section walks you through setting up **Oracle AI Database 26ai** locally using Docker. Oracle AI Database is a converged database that combines relational, document, graph, and vector data in a single engine, making it well suited to AI applications that need semantic search, embeddings storage, and vector similarity queries.

**What you'll do:**
1. Pull and run the Oracle Database Docker container
2. Establish a connection from Python using `oracledb`
3. Create a dedicated user for vector operations

This local setup gives you a fully functional Oracle database for development and testing without needing cloud infrastructure.

### Installing Oracle AI Database via Docker

For this notebook we will use a local installation of [Oracle AI Database](https://www.oracle.com/database/free/get-started/).

1. Install and start Docker: Docker Desktop (Mac/Windows) or Docker Engine (Linux). Make sure it's running.
    - If you use Docker Desktop on macOS, you can start it from the terminal with ```open /Applications/Docker.app```
2. We are going to pull the [docker image](https://container-registry.oracle.com/ords/f?p=113:4:13936724845291:::4:P4_REPOSITORY,AI_REPOSITORY,AI_REPOSITORY_NAME,P4_REPOSITORY_NAME,P4_EULA_ID,P4_BUSINESS_AREA_ID:1863,1863,Oracle%20Database%20Free,Oracle%20Database%20Free,1,0&cs=3cVNH02fFYhB723ODpNnr0JZI1S7Z64nRyL_zC1Ls5BSVLafGsOLMFvFoPhn8JeeB8tXPhkfFKH8-dkrL_z3_0g)
3. Run a container from the Oracle image

    ```
      docker run -d \
        --name oracle-free \
        -p 1521:1521 -p 5500:5500 \
        -e ORACLE_PWD=OraclePwd_2025 \
        -v $HOME/oracle/full_data:/opt/oracle/oradata \
        container-registry.oracle.com/database/free:latest

    ```

> 🚫 **Troubleshoot**  
> If you see the error:  
> *`docker: Error response from daemon: Conflict. The container name "/oracle-free" is already in use by container ... You have to remove (or rename) that container to be able to reuse that name.`*  
>
> 🧩 **Fix:**  
> - Remove the existing container:  
>   ```bash
>   docker rm oracle-free
>   ```  
> - Then re-run your Docker command from **Step 3** to start a new container.


### 🚀 One-Click Database Setup

The cell below handles **everything automatically**:
- ✅ Checks if Docker is running
- ✅ Checks if Oracle container exists and is healthy
- ✅ Waits for database to be ready (with progress indicator)
- ✅ Fixes the listener for ARM Macs (Apple Silicon)
- ✅ Creates the VECTOR user with proper privileges
- ✅ Tests the connection

**Just run the cell below and wait for the ✅ success message!**


In [2]:
import subprocess
import time
import sys

def setup_oracle_database(container_name="oracle-free", vector_password="VectorPwd_2025"):
    """
    Complete Oracle Database setup - handles everything in one call.
    
    This function:
    1. Checks Docker is running
    2. Verifies container exists and is healthy
    3. Waits for database to be ready
    4. Fixes listener for ARM Macs
    5. Creates VECTOR user
    6. Tests connection
    """
    print("=" * 60)
    print("🚀 ORACLE DATABASE SETUP")
    print("=" * 60)
    
    # Step 1: Check Docker
    print("\n[1/6] Checking Docker...")
    try:
        result = subprocess.run(['docker', 'info'], capture_output=True, text=True, timeout=10)
        if result.returncode != 0:
            print("   ❌ Docker is not running!")
            print("   💡 Start Docker Desktop and try again.")
            return False
        print("   ✅ Docker is running")
    except FileNotFoundError:
        print("   ❌ Docker not found! Please install Docker.")
        return False
    except subprocess.TimeoutExpired:
        print("   ❌ Docker is not responding. Please restart Docker.")
        return False
    
    # Step 2: Check container
    print(f"\n[2/6] Checking container '{container_name}'...")
    result = subprocess.run(
        ['docker', 'ps', '-a', '--filter', f'name={container_name}', '--format', '{{.Status}}'],
        capture_output=True, text=True
    )
    status = result.stdout.strip()
    
    if not status:
        print(f"   ❌ Container '{container_name}' not found!")
        print("   💡 Run the docker run command from the previous cell first.")
        return False
    elif "Up" not in status:
        print(f"   ⚠️  Container exists but not running. Starting...")
        subprocess.run(['docker', 'start', container_name], capture_output=True)
        time.sleep(5)
    
    print(f"   ✅ Container is running")
    
    # Step 3: Wait for database ready
    print("\n[3/6] Waiting for database to be ready...")
    print("   (This can take 2-5 minutes on Apple Silicon Macs)")
    
    max_wait = 300  # 5 minutes
    check_interval = 10
    elapsed = 0
    
    while elapsed < max_wait:
        # Check container health
        result = subprocess.run(
            ['docker', 'ps', '--filter', f'name={container_name}', '--format', '{{.Status}}'],
            capture_output=True, text=True
        )
        if "healthy" in result.stdout.lower():
            print(f"\n   ✅ Database is healthy!")
            break
        
        # Also check logs for ready message
        logs = subprocess.run(
            ['docker', 'logs', '--tail', '20', container_name],
            capture_output=True, text=True
        )
        if "DATABASE IS READY TO USE" in logs.stdout:
            print(f"\n   ✅ Database is ready!")
            break
        
        # Progress indicator
        dots = "." * ((elapsed // check_interval) % 4 + 1)
        print(f"\r   ⏳ Waiting{dots.ljust(5)} ({elapsed}s elapsed)", end="", flush=True)
        time.sleep(check_interval)
        elapsed += check_interval
    else:
        print(f"\n   ❌ Timeout waiting for database. Check 'docker logs {container_name}'")
        return False
    
    # Step 4: Fix listener (for ARM Macs)
    print("\n[4/6] Configuring listener...")
    
    # Fix listener.ora
    subprocess.run(
        ['docker', 'exec', container_name, 'bash', '-c',
         "sed -i 's/HOST = [^)]*)/HOST = 0.0.0.0)/g' /opt/oracle/product/26ai/dbhomeFree/network/admin/listener.ora"],
        capture_output=True
    )
    
    # Restart listener
    subprocess.run(['docker', 'exec', container_name, 'lsnrctl', 'stop'], capture_output=True)
    start_result = subprocess.run(
        ['docker', 'exec', container_name, 'lsnrctl', 'start'],
        capture_output=True, text=True
    )
    
    if "Listening on" not in start_result.stdout:
        print("   ❌ Failed to start listener")
        return False
    
    # Register services
    subprocess.run(
        ['docker', 'exec', container_name, 'bash', '-c',
         "export ORACLE_SID=FREE && sqlplus -s / as sysdba <<< 'ALTER SYSTEM REGISTER;'"],
        capture_output=True
    )
    print("   ✅ Listener configured and running")
    
    # Step 5: Create VECTOR user
    print("\n[5/6] Creating VECTOR user...")
    
    create_user_sql = f'''
    DECLARE
        user_count NUMBER;
    BEGIN
        SELECT COUNT(*) INTO user_count FROM all_users WHERE username = 'VECTOR';
        IF user_count = 0 THEN
            EXECUTE IMMEDIATE 'CREATE USER VECTOR IDENTIFIED BY {vector_password}';
            EXECUTE IMMEDIATE 'GRANT CONNECT, RESOURCE, CREATE SESSION TO VECTOR';
            EXECUTE IMMEDIATE 'GRANT UNLIMITED TABLESPACE TO VECTOR';
            EXECUTE IMMEDIATE 'GRANT CREATE TABLE, CREATE SEQUENCE, CREATE VIEW TO VECTOR';
            DBMS_OUTPUT.PUT_LINE('CREATED');
        ELSE
            DBMS_OUTPUT.PUT_LINE('EXISTS');
        END IF;
    END;
    /
    '''
    
    result = subprocess.run(
        ['docker', 'exec', container_name, 'bash', '-c',
         f"export ORACLE_SID=FREE && sqlplus -s / as sysdba <<< \"ALTER SESSION SET CONTAINER = FREEPDB1; {create_user_sql}\""],
        capture_output=True, text=True
    )
    
    if "ORA-" in result.stdout:
        print(f"   ⚠️  Warning: {result.stdout}")
    else:
        print("   ✅ VECTOR user ready")
    
    # Step 6: Test connection
    print("\n[6/6] Testing connection...")
    try:
        import oracledb
        conn = oracledb.connect(
            user="VECTOR",
            password=vector_password,
            dsn="127.0.0.1:1521/FREEPDB1"
        )
        with conn.cursor() as cur:
            cur.execute("SELECT 1 FROM dual")
            cur.fetchone()
        conn.close()
        print("   ✅ Connection successful!")
    except Exception as e:
        print(f"   ❌ Connection failed: {e}")
        return False
    
    # Success!
    print("\n" + "=" * 60)
    print("🎉 SETUP COMPLETE!")
    print("=" * 60)
    print(f"""
You can now connect to Oracle:
    User: VECTOR
    Password: {vector_password}
    DSN: 127.0.0.1:1521/FREEPDB1
""")
    return True


In [3]:
# Run this cell after starting your Docker container
# It handles everything: waits for ready, fixes listener, creates user, tests connection
setup_oracle_database()

🚀 ORACLE DATABASE SETUP

[1/6] Checking Docker...
   ❌ Docker is not running!
   💡 Start Docker Desktop and try again.


False

### Connection Helper Function

The code below defines a reusable function that connects to Oracle Database with automatic retry logic and helpful error messages.

**What it does:**
1. Attempts to connect using the `oracledb` Python driver
2. Retries up to 3 times if the connection fails (useful when the database is still starting)
3. Prints the Oracle version banner on a successful connection, which also shows the version you are running
4. Provides troubleshooting hints for common connection errors
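
The retry pattern described above can be sketched in isolation, without a database. This is a minimal illustration rather than the notebook's helper: `retry` and `flaky_connect` are hypothetical names, and `ConnectionError` stands in for the driver-specific exception.

```python
import time

def retry(operation, max_retries=3, retry_delay=0.0, retry_on=(ConnectionError,)):
    """Call operation() until it succeeds or max_retries attempts are used up."""
    for attempt in range(1, max_retries + 1):
        try:
            return operation()
        except retry_on:
            if attempt == max_retries:
                raise  # out of attempts: surface the last error
            time.sleep(retry_delay)

# Hypothetical flaky operation: fails twice, then succeeds.
calls = {"count": 0}

def flaky_connect():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("database still starting")
    return "connected"

result = retry(flaky_connect, max_retries=3, retry_delay=0.0)
print(result, "after", calls["count"], "attempts")
```

Retrying only on a narrow set of exceptions keeps genuine configuration errors (for example, a wrong password) from being masked by repeated attempts.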


In [None]:
import oracledb
import time

def connect_to_oracle(max_retries=3, retry_delay=5, user="sys", password="OraclePwd_2025", dsn="127.0.0.1:1521/FREEPDB1", program="langchain_oracledb_deep_research_demo"):
    """
    Connect to Oracle database with retry logic and better error handling.
    
    Args:
        max_retries: Maximum number of connection attempts
        retry_delay: Seconds to wait between retries
    """
    
    for attempt in range(1, max_retries + 1):
        try:
            print(f"Connection attempt {attempt}/{max_retries}...")
            conn = oracledb.connect(
                user=user,
                password=password,
                dsn=dsn,
                program=program
            )
            print("✓ Connected successfully!")
            
            # Test the connection (no trailing semicolon: the driver rejects it)
            with conn.cursor() as cur:
                cur.execute("SELECT banner FROM v$version WHERE banner LIKE 'Oracle%'")
                banner = cur.fetchone()[0]
                # Banner should include the version you are running
                print(f"\n{banner}")
            
            return conn
            
        except oracledb.OperationalError as e:
            error_msg = str(e)
            print(f"✗ Connection failed (attempt {attempt}/{max_retries})")
            
            if "DPY-4011" in error_msg or "Connection reset by peer" in error_msg:
                print("  → This usually means:")
                print("    1. Database is still starting up (wait 2-3 minutes)")
                print("    2. Listener configuration issue")
                print("    3. Container is not running")
                
                if attempt < max_retries:
                    print(f"\n  Waiting {retry_delay} seconds before retry...")
                    time.sleep(retry_delay)
                else:
                    print("\n  💡 Try running: setup_oracle_database()")
                    print("     This will fix the listener and verify the connection.")
                    raise
            else:
                raise
        except Exception as e:
            print(f"✗ Unexpected error: {e}")
            raise
    
    raise ConnectionError("Failed to connect after all retries")

Make sure Docker is running before going through the next steps.

Connect as the `VECTOR` user, whose dedicated schema stores embeddings and vector data.


In [None]:
vector_conn = connect_to_oracle(
    user="VECTOR",
    password="VectorPwd_2025",
    dsn="127.0.0.1:1521/FREEPDB1",
    program="langchain_oracledb_deep_research_demo",
)

print("Using user:", vector_conn.username)

✅ **Setup complete!** You now have Oracle AI Database running locally with an active connection.

Next, we'll create vector stores using **LangChain's Oracle integration** to store embeddings and metadata for semantic search.

# Vector Search With LangChain and Oracle AI Database

--------

This section demonstrates how to use **LangChain's Oracle Vector Store (OracleVS)** to store and search documents using semantic similarity. 

Vector search enables finding documents based on meaning rather than exact keyword matches.

## What You'll Learn

| Step | Description |
|------|-------------|
| **1. Initialize Embeddings** | Load a HuggingFace embedding model to convert text into vectors |
| **2. Create Vector Store** | Set up an Oracle-backed vector store with distance strategy |
| **3. Create Index** | Build an IVF (Inverted File) index for fast similarity search |
| **4. Add Documents** | Store text with metadata in the vector database |
| **5. Query** | Search for similar documents using natural language |
| **6. Filter Results** | Use metadata filters to narrow down search results |

## Key Components

- **`OracleVS`**: LangChain's Oracle vector store integration
- **`HuggingFaceEmbeddings`**: Converts text to 768-dimensional vectors
- **`DistanceStrategy.EUCLIDEAN_DISTANCE`**: Measures similarity between vectors
- **IVF Index**: Speeds up searches on large datasets by clustering vectors
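
To make the IVF idea concrete, here is a toy sketch in pure Python. It is not how Oracle builds its index: the 2-D vectors, the hand-picked `centroids`, and the `ivf_search` function are all assumptions chosen for illustration. The point is only that a query is compared against one cluster instead of every stored vector.

```python
import math

def euclidean(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Toy 2-D vectors standing in for 768-dimensional embeddings.
vectors = {
    "doc_a": (0.1, 0.2),
    "doc_b": (0.2, 0.1),
    "doc_c": (5.0, 5.1),
    "doc_d": (5.2, 4.9),
}

# Hypothetical, hand-picked centroids; a real index learns these from the data.
centroids = [(0.0, 0.0), (5.0, 5.0)]

def nearest_centroid(vec):
    """Index of the centroid closest to vec."""
    return min(range(len(centroids)), key=lambda i: euclidean(vec, centroids[i]))

# Build step: file each vector under its nearest centroid.
inverted_lists = {i: [] for i in range(len(centroids))}
for name, vec in vectors.items():
    inverted_lists[nearest_centroid(vec)].append(name)

def ivf_search(query, k=1):
    """Search only the cluster whose centroid is closest to the query."""
    candidates = inverted_lists[nearest_centroid(query)]
    return sorted(candidates, key=lambda n: euclidean(query, vectors[n]))[:k]

print(ivf_search((5.0, 5.0), k=2))
```

Because only one cluster is scanned, the search is approximate: a true nearest neighbor that was filed under a different centroid would be missed. That is the usual speed-versus-recall trade-off of this index family.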

## Creating Vector Stores with LangChain OracleVS

In [None]:
from langchain_oracledb.vectorstores import OracleVS
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_oracledb.vectorstores.oraclevs import create_index
from langchain_community.vectorstores.utils import DistanceStrategy

# Initialize the embedding model
embedding_model = HuggingFaceEmbeddings(
    model_name="sentence-transformers/paraphrase-mpnet-base-v2"
)

# Initialize the vector store
vector_store = OracleVS(
    client=vector_conn, 
    embedding_function=embedding_model,
    table_name="VECTOR_SEARCH_DEMO",
    distance_strategy=DistanceStrategy.EUCLIDEAN_DISTANCE
)

In [None]:
# Helper to safely create index (skips if already exists)
def safe_create_index(conn, vs, idx_name):
    """Create index, skipping if it already exists."""
    try:
        create_index(
            client=conn,
            vector_store=vs,
            params={"idx_name": idx_name, "idx_type": "IVF"}
        )
        print(f"  ✅ Created index: {idx_name}")
    except Exception as e:
        if "ORA-00955" in str(e):
            print(f"  ⏭️ Index already exists: {idx_name} (skipped)")
        else:
            raise


In [None]:
import logging

# Suppress langchain_oracledb logging; remove this line if you want to see the debug logs
logging.getLogger("langchain_oracledb").setLevel(logging.CRITICAL)

# Create an IVF index for fast similarity search
safe_create_index(vector_conn, vector_store, "oravs_ivf")

## Ingesting Example Data

In [None]:
# add texts to the vector database
texts = ["A tablespace can be online (accessible) or offline (not accessible) whenever the database is open.\nA tablespace is usually online so that its data is available to users. The SYSTEM tablespace and temporary tablespaces cannot be taken offline.", "The database stores LOBs differently from other data types. Creating a LOB column implicitly creates a LOB segment and a LOB index. "]
metadata = [
    {"id": "100", "link": "Document Example Test 1"},
    {"id": "101", "link": "Document Example Test 2"},
]

# Simple Ingestion
vector_store.add_texts(
    texts=texts, # This is the text embeddings will be generated from
    metadatas=metadata # This is the metadata that will be stored with the text
)


## Querying the Vector Store

Search for documents similar to a natural language query. 

The vector store converts queries to an embedding and finds the closest matches.
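
The embed-then-rank flow can be sketched without any model or database. In the sketch below, `toy_embed` is a hypothetical stand-in for a real embedding model (it just counts a few vocabulary words), and the documents are shortened versions of the example texts. With Euclidean distance, a lower score means a closer match.

```python
import math

VOCAB = ["tablespace", "online", "lob", "segment", "index"]

def toy_embed(text):
    """Hypothetical stand-in for an embedding model: counts of vocabulary words."""
    words = text.lower().replace(".", " ").replace("?", " ").split()
    return [sum(1 for w in words if w.startswith(term)) for term in VOCAB]

def euclidean(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

documents = [
    "A tablespace can be online or offline.",
    "Creating a LOB column creates a LOB segment and a LOB index.",
]
doc_vectors = [toy_embed(d) for d in documents]

def search(query, k=1):
    """Embed the query, then return the k closest documents with their distances."""
    q = toy_embed(query)
    scored = sorted(zip(documents, doc_vectors), key=lambda p: euclidean(q, p[1]))
    return [(doc, euclidean(q, vec)) for doc, vec in scored[:k]]

top_doc, top_score = search("How does Oracle handle tablespaces?")[0]
print(top_score, top_doc)
```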


Basic Search

In [None]:
query = "How does Oracle handle tablespaces?"

results = vector_store.similarity_search(query, k=3)

for i, doc in enumerate(results, start=1):
    print(f"--- Result {i} ---")
    print("Text:", doc.page_content)
    print("Metadata:", doc.metadata)


Search With Scores

In [None]:
results = vector_store.similarity_search_with_score(query, k=3)

for doc, score in results:
    print("Score:", score)
    print("Text :", doc.page_content)
    print("Meta :", doc.metadata)
    print("------")

Filter by exact match on a metadata field

In [None]:
query = "How are tablespaces made available to users?"

# This will only return docs where metadata.link == "Document Example Test 1".
docs = vector_store.similarity_search(
    query, k=3,
    filter={"link": {"$eq": "Document Example Test 1"}},
)

for doc in docs:
    print("Text:", doc.page_content[:120], "...")
    print("Meta:", doc.metadata)
    print("------")

Filter by id list ($in)

In [None]:
docs = vector_store.similarity_search(
    query="Explain database storage concepts",
    k=5,
    filter={"id": {"$in": ["100"]}},  # only id 100
)

print(docs)
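
The semantics of the two filter operators used above can be illustrated on plain dictionaries. This is a sketch of what `$eq` and `$in` mean, not of how `OracleVS` evaluates filters internally; the `matches` helper is hypothetical and supports only these two operators.

```python
def matches(metadata, flt):
    """Return True if metadata satisfies every condition in flt."""
    for field, condition in flt.items():
        value = metadata.get(field)
        for op, operand in condition.items():
            if op == "$eq":
                ok = value == operand
            elif op == "$in":
                ok = value in operand
            else:
                raise ValueError(f"unsupported operator: {op}")
            if not ok:
                return False
    return True

records = [
    {"id": "100", "link": "Document Example Test 1"},
    {"id": "101", "link": "Document Example Test 2"},
]

by_link = [r["id"] for r in records if matches(r, {"link": {"$eq": "Document Example Test 1"}})]
by_id = [r["id"] for r in records if matches(r, {"id": {"$in": ["100"]}})]
print(by_link, by_id)
```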

# Memory Engineering and Agent Memory
--------



**`Agent Memory`** is the exocortex that augments an LLM: it captures, encodes, stores, links, and retrieves information beyond the model's parametric and contextual limits. 
It provides the persistence and structure required for long-horizon reasoning and reliable behaviour.

**`Memory Engineering`** is the scaffolding and control harness that we design to move information optimally and efficiently into, through, and across all components of an AI system (databases, LLMs, applications, etc.). It ensures that data is captured, transformed, organized, and retrieved in the right way at the right time, so agents can behave reliably, believably, and capably.

This is the core section of the notebook where we build a complete **`Memory Manager`** for AI agents. 

Just like humans have different types of memory (short-term, long-term, procedural), AI agents benefit from specialized memory systems.

## Why Memory Engineering Matters

Without memory, agents:
- Forget previous conversations
- Can't learn from past interactions
- Repeat the same mistakes
- Lack context for complex tasks

With proper memory engineering, agents can:
- Maintain context across sessions
- Learn and improve over time
- Access relevant knowledge when needed
- Execute complex multi-step workflows

## Memory Types We'll Implement

| Memory Type | Human Analogy | Purpose | Storage |
|-------------|---------------|---------|---------|
| **Conversational** | Short-term memory | Chat history per thread | SQL Table |
| **Knowledge Base** | Long-term semantic memory | Facts, documents, search results | Vector Store |
| **Workflow** | Procedural memory | Learned action patterns | Vector Store |
| **Toolbox** | Skill memory | Available tools & capabilities | Vector Store |
| **Entity** | Episodic memory | People, places, systems mentioned | Vector Store |
| **Summary** | Compressed memory | Condensed context for long conversations | Vector Store |

## Steps in This Section

1. **Define table names** for each memory type
2. **Create SQL table** for conversational history
3. **Create vector stores** for semantic memories
4. **Build indexes** for fast similarity search
5. **Implement the `MemoryManager` class** with read/write methods for each memory type
6. **Initialize the memory manager** with all storage backends

## Define Memory Tables and Stores
First, we define table names for each memory type. 

These tables will be created in Oracle Database to persist agent memory.

In [None]:
# Table names for each memory type
CONVERSATIONAL_TABLE = "CONVERSATIONAL_MEMORY"  # Episodic memory
KNOWLEDGE_BASE_TABLE = "SEMANTIC_MEMORY"  # Semantic memory
WORKFLOW_TABLE = "WORKFLOW_MEMORY"  # Procedural memory
TOOLBOX_TABLE = "TOOLBOX_MEMORY"  # Procedural memory
ENTITY_TABLE = "ENTITY_MEMORY"  # Semantic memory
SUMMARY_TABLE = "SUMMARY_MEMORY"  # Semantic memory

ALL_TABLES = [CONVERSATIONAL_TABLE, KNOWLEDGE_BASE_TABLE, WORKFLOW_TABLE, TOOLBOX_TABLE, ENTITY_TABLE, SUMMARY_TABLE]

# Drop existing tables to start fresh
for table in ALL_TABLES:
    try:
        with vector_conn.cursor() as cur:
            cur.execute(f"DROP TABLE {table} PURGE")
    except Exception as e:
        if "ORA-00942" in str(e):
            print(f"  - {table} (does not exist)")
        else:
            print(f"  ✗ {table}: {e}")
            
vector_conn.commit()

In [None]:
# Model token limits (for context management).
# Approximate values; check the model provider's documentation for current limits.
MODEL_TOKEN_LIMITS = {
    "gpt-5": 256000,
    "gpt-5-mini": 128000,
    "gpt-4o": 128000,
    "gpt-4o-mini": 128000,
    "gpt-4-turbo": 128000,
    "gpt-4": 8192,
    "gpt-3.5-turbo": 16385,
}
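
A limits table like this is typically used to decide when a conversation should be summarized. The sketch below shows one way that check could look. The names `TOKEN_LIMITS`, `estimate_tokens`, and `context_usage`, the four-characters-per-token heuristic, and the 80% threshold are all assumptions for illustration; a real implementation would use the model's tokenizer.

```python
# Trimmed copy of the limits table so the sketch runs on its own.
TOKEN_LIMITS = {"gpt-4o": 128000, "gpt-4": 8192}

def estimate_tokens(text):
    """Rough heuristic (assumption): about 4 characters per token."""
    return max(1, len(text) // 4)

def context_usage(text, model, threshold=0.8):
    """Return (fraction_used, should_summarize) for the given model."""
    limit = TOKEN_LIMITS[model]
    fraction = estimate_tokens(text) / limit
    return fraction, fraction >= threshold

short_fraction, short_flag = context_usage("hello world", "gpt-4")
long_fraction, long_flag = context_usage("x" * 32000, "gpt-4")
print(short_flag, long_flag)
```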

### Create Conversational Memory Table

This function below creates a SQL table to store chat history. 

Unlike vector stores, conversational memory uses a traditional table because we need exact retrieval by thread ID (not similarity search).

**What it does:**
- Creates a table with columns: `id`, `thread_id`, `role`, `content`, `timestamp`, `metadata`, `created_at`, `summary_id`
- Adds an index on `thread_id` for fast conversation lookups
- Adds an index on `timestamp` for chronological ordering
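
The access pattern that motivates a plain table (exact lookup by thread, in order) can be tried out with Python's built-in `sqlite3` module. This is only a stand-in for illustration: the schema is simplified, SQLite syntax differs from Oracle's, and rows are ordered by an auto-incrementing `id` instead of a timestamp.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE conversational_memory (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        thread_id TEXT NOT NULL,
        role TEXT NOT NULL,
        content TEXT NOT NULL
    )
""")
# Index on thread_id so per-thread lookups avoid a full table scan.
conn.execute("CREATE INDEX idx_thread_id ON conversational_memory(thread_id)")

rows = [
    ("thread-1", "user", "What is a tablespace?"),
    ("thread-2", "user", "Unrelated question"),
    ("thread-1", "assistant", "A logical storage unit."),
]
conn.executemany(
    "INSERT INTO conversational_memory (thread_id, role, content) VALUES (?, ?, ?)",
    rows,
)

def read_thread(thread_id, limit=10):
    """Return up to `limit` (role, content) pairs for one thread, oldest first."""
    cur = conn.execute(
        "SELECT role, content FROM conversational_memory "
        "WHERE thread_id = ? ORDER BY id LIMIT ?",
        (thread_id, limit),
    )
    return cur.fetchall()

history = read_thread("thread-1")
print(history)
```

No similarity search is involved: the thread ID identifies exactly which rows belong to the conversation, which is why a relational table fits this memory type.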


In [None]:
def create_conversational_history_table(conn, table_name: str = "CONVERSATIONAL_MEMORY"):
    """
    Create a table to store conversational history.

    Args:
        conn: Oracle database connection
        table_name: Name of the table to create
    """
    with conn.cursor() as cur:
        # Drop the table if it already exists
        try:
            cur.execute(f"DROP TABLE {table_name}")
        except oracledb.DatabaseError:
            pass  # Table doesn't exist yet
        
        # Create table with proper schema
        cur.execute(f"""
            CREATE TABLE {table_name} (
                id VARCHAR2(100) DEFAULT SYS_GUID() PRIMARY KEY,
                thread_id VARCHAR2(100) NOT NULL,
                role VARCHAR2(50) NOT NULL,
                content CLOB NOT NULL,
                timestamp TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
                metadata CLOB,
                created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
                summary_id VARCHAR2(100) DEFAULT NULL
            )
        """)
        
        # Create index on thread_id for faster lookups
        cur.execute(f"""
            CREATE INDEX idx_{table_name.lower()}_thread_id ON {table_name}(thread_id)
        """)
        
        # Create index on timestamp for ordering
        cur.execute(f"""
            CREATE INDEX idx_{table_name.lower()}_timestamp ON {table_name}(timestamp)
        """)
        
    conn.commit()
    print(f"Table {table_name} created successfully with indexes")
    return table_name


In [None]:
# Create the table
CONVERSATION_HISTORY_TABLE = create_conversational_history_table(vector_conn, CONVERSATIONAL_TABLE)

### Create Vector Stores for Each Memory Type

Here we create 5 separate vector stores, one for each vector-backed memory type. 

Each vector store is backed by its own Oracle table and uses the same embedding model for consistency.

| Vector Store | Purpose |
|--------------|---------|
| `knowledge_base_vs` | Store documents, facts, and search results |
| `workflow_vs` | Store learned action patterns and tool sequences |
| `toolbox_vs` | Store tool definitions for semantic tool discovery |
| `entity_vs` | Store extracted entities (people, places, systems) |
| `summary_vs` | Store compressed summaries for long conversations |


In [None]:
knowledge_base_vs = OracleVS(
    client=vector_conn,
    embedding_function=embedding_model,
    table_name=KNOWLEDGE_BASE_TABLE,
    distance_strategy=DistanceStrategy.EUCLIDEAN_DISTANCE,
)

workflow_vs = OracleVS(
    client=vector_conn,
    embedding_function=embedding_model,
    table_name=WORKFLOW_TABLE,
    distance_strategy=DistanceStrategy.EUCLIDEAN_DISTANCE,
)

toolbox_vs = OracleVS(
    client=vector_conn,
    embedding_function=embedding_model,
    table_name=TOOLBOX_TABLE,
    distance_strategy=DistanceStrategy.EUCLIDEAN_DISTANCE,
)

entity_vs = OracleVS(
    client=vector_conn,
    embedding_function=embedding_model,
    table_name=ENTITY_TABLE,
    distance_strategy=DistanceStrategy.EUCLIDEAN_DISTANCE,
)

summary_vs = OracleVS(
    client=vector_conn,
    embedding_function=embedding_model,
    table_name=SUMMARY_TABLE,
    distance_strategy=DistanceStrategy.EUCLIDEAN_DISTANCE,
)

Then we create an index for each of the vector stores.

In [None]:
print("Creating vector indexes...")
safe_create_index(vector_conn, knowledge_base_vs, "knowledge_base_vs_ivf")
safe_create_index(vector_conn, workflow_vs, "workflow_vs_ivf")
safe_create_index(vector_conn, toolbox_vs, "toolbox_vs_ivf")
safe_create_index(vector_conn, entity_vs, "entity_vs_ivf")
safe_create_index(vector_conn, summary_vs, "summary_vs_ivf")
print("All indexes created!")


## Programmatic vs Agentic Operations

A key design decision in memory engineering is determining which operations should be **programmatic** (executed automatically by code) versus **agentic** (decided by the LLM at runtime).

| Operation | Programmatic | Agentic |
|-----------|:------------:|:-------:|
| `read_conversational_memory()` | ✅ | ❌ |
| `read_knowledge_base()` | ✅ | ❌ |
| `read_workflow()` | ✅ | ❌ |
| `read_entity()` | ✅ | ❌ |
| `read_summary_context()` | ✅ | ❌ |
| `write_conversational_memory()` | ✅ | ❌ |
| `write_workflow()` | ✅ | ❌ |
| `write_entity()` | ✅ | ❌ |
| `search_tavily()` | ❌ | ✅ |
| `expand_summary()` | ❌ | ✅ |
| `summarize_and_store()` | ❌ | ✅ |

### Why Memory Reads are Programmatic

Memory retrieval operations are **always executed** at the start of each agent loop because:

1. **Context is essential**: The agent needs memory to understand the conversation and avoid repeating mistakes. Without this, every interaction starts from scratch.

2. **The agent can't know what it doesn't know**: If the agent had to decide whether to check memory, it would need to already know what's in memory, which is a chicken-and-egg problem.

3. **Consistency**: Always loading memory ensures the agent has a predictable, complete view of its knowledge.

### Why Memory Writes are Programmatic

Storing conversations, workflows, and entities happens automatically because:

1. **Reliability**: We don't want the agent to "forget" to save important information. Conversation history must be persisted consistently.

2. **Completeness**: Every interaction should be recorded. Selective saving would create gaps in memory.

3. **Reduced cognitive load**: Letting the agent focus on the task rather than memory management leads to better responses.

### Why Tool Calls are Agentic

External actions like web search and summary expansion are left to the agent's discretion because:

1. **Intent matters**: Only the agent knows if it needs more information. Automatically searching for every query would be wasteful.

2. **Cost considerations**: External API calls have latency and may incur costs. The agent should only call them when genuinely needed.

3. **Judgment required**: Deciding *what* to search for or *which* summary to expand requires understanding the user's intent, something the LLM excels at.
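
The split between programmatic and agentic operations can be sketched as a single agent turn. Everything here is a stand-in: `run_turn`, `fake_decide`, and `fake_tools` are hypothetical names, a Python list plays the role of memory, and a keyword check plays the role of the LLM's tool decision.

```python
def run_turn(user_message, memory, decide, tools):
    """One agent turn: programmatic memory reads/writes around an agentic tool choice."""
    # Programmatic: always load memory before calling the model.
    context = list(memory)
    # Programmatic: always persist the incoming message.
    memory.append(("user", user_message))

    # Agentic: the decision function (an LLM in practice) picks a tool, or None.
    tool_name = decide(user_message, context)
    tool_result = tools[tool_name](user_message) if tool_name else None

    reply = f"answer using {len(context)} remembered messages"
    if tool_result is not None:
        reply += f" and {tool_result}"
    # Programmatic: always persist the reply.
    memory.append(("assistant", reply))
    return reply

# Hypothetical stand-ins for the LLM decision and the search tool.
def fake_decide(message, context):
    return "search" if "latest" in message else None

fake_tools = {"search": lambda q: "web results"}

memory = []
first = run_turn("hello", memory, fake_decide, fake_tools)
second = run_turn("latest Oracle release?", memory, fake_decide, fake_tools)
print(first)
print(second)
```

The reads and writes run on every turn regardless of the message, while the tool runs only when the decision function asks for it.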

## Memory Manager Implementation

The `MemoryManager` class is the central abstraction that unifies all memory operations. It provides a clean interface for reading and writing to different memory types, hiding the complexity of SQL queries and vector store operations.

### What We're Building

A single class that manages 6 types of memory with consistent read/write patterns:

| Memory Type | Storage | Write Method | Read Method |
|-------------|---------|--------------|-------------|
| **Conversational** | SQL Table | `write_conversational_memory()` | `read_conversational_memory()` |
| **Knowledge Base** | Vector Store | `write_knowledge_base()` | `read_knowledge_base()` |
| **Workflow** | Vector Store | `write_workflow()` | `read_workflow()` |
| **Toolbox** | Vector Store | `write_toolbox()` | `read_toolbox()` |
| **Entity** | Vector Store | `write_entity()` | `read_entity()` |
| **Summary** | Vector Store | `write_summary()` | `read_summary_memory()`, `read_summary_context()` |

### Key Features

- **Thread-based conversations**: Messages are organized by `thread_id` for multi-conversation support
- **Semantic search**: Vector stores enable finding relevant content by meaning, not just keywords
- **Metadata filtering**: Workflows filter by `num_steps > 0`, summaries filter by `id`
- **LLM-powered entity extraction**: Automatically extracts people, places, and systems from text
- **Formatted context output**: Each read method returns formatted text ready for the LLM context

### Alternative: Memory Manager Frameworks

There are existing frameworks that abstract memory management for AI agents:

| Framework | Description |
|-----------|-------------|
| **LangChain Memory** | Built-in memory classes (ConversationBufferMemory, VectorStoreRetrieverMemory) |
| **Mem0** | Dedicated memory layer for AI agents with automatic memory management |
| **LlamaIndex** | Document-based memory with various storage backends |
| **Zep** | Long-term memory service for AI assistants |

### Pros and Cons of Building Your Own

| Approach | Pros | Cons |
|----------|------|------|
| **Custom (what we're doing)** | Full control, tailored to your needs, deeper understanding, no external dependencies | More code to maintain, need to handle edge cases yourself |
| **Using a framework** | Faster to implement, battle-tested, community support, handles edge cases | Less control, may not fit your exact use case, additional dependency |

> **For learning purposes**, building your own memory manager (as we do here) gives you a deep understanding of how memory engineering works. 
> 
> **For production**, you might consider using or extending an existing framework. 
>
> For example, this simple notebook only illustrates reads and writes, but not deletion and updates.

In [None]:
import json as json_lib
from datetime import datetime

class MemoryManager:
    """
    A simplified memory manager for AI agents using Oracle AI Database.
    
    Manages 6 types of memory:
    - Conversational: Chat history per thread (SQL table)
    - Knowledge Base: Searchable documents (Vector store)
    - Workflow: Execution patterns (Vector store)
    - Toolbox: Available tools (Vector store)
    - Entity: People, places, systems (Vector store)
    - Summary: Compressed context for long conversations (Vector store)
    """
    
    def __init__(self, conn, conversation_table: str, knowledge_base_vs, workflow_vs, toolbox_vs, entity_vs, summary_vs):
        self.conn = conn
        self.conversation_table = conversation_table
        self.knowledge_base_vs = knowledge_base_vs
        self.workflow_vs = workflow_vs
        self.toolbox_vs = toolbox_vs
        self.entity_vs = entity_vs
        self.summary_vs = summary_vs
    
    # ==================== CONVERSATIONAL MEMORY (SQL) ====================
    
    def write_conversational_memory(self, content: str, role: str, thread_id: str) -> str:
        """Store a message in conversation history."""
        thread_id = str(thread_id)
        with self.conn.cursor() as cur:
            id_var = cur.var(str)
            cur.execute(f"""
                INSERT INTO {self.conversation_table} (thread_id, role, content, metadata, timestamp)
                VALUES (:thread_id, :role, :content, :metadata, CURRENT_TIMESTAMP)
                RETURNING id INTO :id
            """, {"thread_id": thread_id, "role": role, "content": content, "metadata": "{}", "id": id_var})
            record_id = id_var.getvalue()[0] if id_var.getvalue() else None
        self.conn.commit()
        return record_id
    
    def read_conversational_memory(self, thread_id: str, limit: int = 10) -> str:
        """Read conversation history for a thread (excludes summarized messages)."""
        thread_id = str(thread_id)
        with self.conn.cursor() as cur:
            cur.execute(f"""
                SELECT role, content, timestamp FROM {self.conversation_table}
                WHERE thread_id = :thread_id AND summary_id IS NULL
                ORDER BY timestamp ASC
                FETCH FIRST :limit ROWS ONLY
            """, {"thread_id": thread_id, "limit": limit})
            results = cur.fetchall()
        
        messages = [f"[{ts.strftime('%H:%M:%S')}] [{role}] {content}" for role, content, ts in results]
        messages_formatted = '\n'.join(messages)
        return f"""## Conversation Memory: This is the conversation history for the current thread
### How to use: Use the conversation history to answer the question

{messages_formatted}"""
    
    def mark_as_summarized(self, thread_id: str, summary_id: str):
        """Mark all unsummarized messages in a thread as summarized."""
        thread_id = str(thread_id)
        with self.conn.cursor() as cur:
            cur.execute(f"""
                UPDATE {self.conversation_table}
                SET summary_id = :summary_id
                WHERE thread_id = :thread_id AND summary_id IS NULL
            """, {"summary_id": summary_id, "thread_id": thread_id})
        self.conn.commit()
        print(f"  📦 Marked messages as summarized (summary_id: {summary_id})")
    
    # ==================== KNOWLEDGE BASE (Vector Store) ====================
    
    def write_knowledge_base(self, text: str, metadata: dict):
        """Store text in knowledge base with metadata."""
        self.knowledge_base_vs.add_texts([text], [metadata])
    
    def read_knowledge_base(self, query: str, k: int = 3) -> str:
        """Search knowledge base for relevant content."""
        results = self.knowledge_base_vs.similarity_search(query, k=k)
        content = "\n".join([doc.page_content for doc in results])
        return f"""## Knowledge Base Memory: This is general information relevant to the question
### How to use: Use the knowledge base as background information that can help answer the question

{content}"""
    
    
    # ==================== WORKFLOW (Vector Store) ====================
    
    def write_workflow(self, query: str, steps: list, final_answer: str, success: bool = True):
        """Store a completed workflow pattern for future reference."""
        # Format steps as text
        steps_text = "\n".join([f"Step {i+1}: {s}" for i, s in enumerate(steps)])
        text = f"Query: {query}\nSteps:\n{steps_text}\nAnswer: {final_answer[:200]}"
        
        metadata = {
            "query": query,
            "success": success,
            "num_steps": len(steps),
            "timestamp": datetime.now().isoformat()
        }
        self.workflow_vs.add_texts([text], [metadata])
    
    def read_workflow(self, query: str, k: int = 3) -> str:
        """Search for similar past workflows with at least 1 step."""
        # Filter to only include workflows that have steps (num_steps > 0)
        results = self.workflow_vs.similarity_search(
            query, 
            k=k, 
            filter={"num_steps": {"$gt": 0}}
        )
        if not results:
            return "## Workflow Memory\nNo relevant workflows found."
        content = "\n---\n".join([doc.page_content for doc in results])
        return f"""## Workflow Memory: These are past workflows relevant to the question
### How to use: Follow these steps when answering the question, especially when using tools and external sources

{content}"""
    
    # ==================== TOOLBOX (Vector Store) ====================
    
    def write_toolbox(self, text: str, metadata: dict):
        """Store a tool definition in the toolbox."""
        self.toolbox_vs.add_texts([text], [metadata])
    
    def read_toolbox(self, query: str, k: int = 3) -> list[dict]:
        """Find relevant tools and return OpenAI-compatible schemas."""
        results = self.toolbox_vs.similarity_search(query, k=k)
        tools = []
        for doc in results:
            meta = doc.metadata
            # Extract parameters from metadata and convert to OpenAI format
            stored_params = meta.get("parameters", {})
            properties = {}
            required = []
            
            for param_name, param_info in stored_params.items():
                # Convert stored param info to OpenAI schema format
                param_type = param_info.get("type", "string")
                # Map Python types to JSON schema types
                type_mapping = {
                    "<class 'str'>": "string",
                    "<class 'int'>": "integer", 
                    "<class 'float'>": "number",
                    "<class 'bool'>": "boolean",
                    "str": "string",
                    "int": "integer",
                    "float": "number",
                    "bool": "boolean"
                }
                json_type = type_mapping.get(param_type, "string")
                properties[param_name] = {"type": json_type}
                
                # If no default, it's required
                if "default" not in param_info:
                    required.append(param_name)
            
            tools.append({
                "type": "function",
                "function": {
                    "name": meta.get("name", "tool"),
                    "description": meta.get("description", ""),
                    "parameters": {"type": "object", "properties": properties, "required": required}
                }
            })
        return tools

    # ==================== ENTITY (Vector Store) ====================
    
    def extract_entities(self, text: str, llm_client) -> list[dict]:
        """Use LLM to extract entities (people, places, systems) from text."""
        if not text or len(text.strip()) < 5:
            return []
        
        prompt = f'''Extract entities from: "{text[:500]}"
Return JSON: [{{"name": "X", "type": "PERSON|PLACE|SYSTEM", "description": "brief"}}]
If none: []'''

        try:
            response = llm_client.chat.completions.create(
                model="gpt-4o-mini",
                messages=[{"role": "user", "content": prompt}],
                temperature=0.0,
                max_tokens=300
            )
            result = response.choices[0].message.content.strip()
            
            # Extract JSON array from response
            start, end = result.find("["), result.rfind("]")
            if start == -1 or end == -1:
                return []
            
            parsed = json_lib.loads(result[start:end+1])
            return [{"name": e["name"], "type": e.get("type", "UNKNOWN"), "description": e.get("description", "")} 
                    for e in parsed if isinstance(e, dict) and e.get("name")]
        except Exception:
            # Treat any LLM or parsing failure as "no entities found"
            return []
    
    def write_entity(self, name: str, entity_type: str, description: str, llm_client=None, text: str = None):
        """Store an entity OR extract and store entities from text."""
        if text and llm_client:
            # Extract and store entities from text
            entities = self.extract_entities(text, llm_client)
            for e in entities:
                self.entity_vs.add_texts(
                    [f"{e['name']} ({e['type']}): {e['description']}"],
                    [{"name": e['name'], "type": e['type'], "description": e['description']}]
                )
            return entities
        else:
            # Store single entity directly
            self.entity_vs.add_texts(
                [f"{name} ({entity_type}): {description}"],
                [{"name": name, "type": entity_type, "description": description}]
            )
    
    def read_entity(self, query: str, k: int = 5) -> str:
        """Search for relevant entities."""
        results = self.entity_vs.similarity_search(query, k=k)
        if not results:
            return "## Entity Memory\nNo entities found."
        
        entities = [f"• {doc.metadata.get('name', '?')}: {doc.metadata.get('description', '')}" 
                    for doc in results if hasattr(doc, 'metadata')]
        entities_formatted = '\n'.join(entities)
        return f"""## Entity Memory: These are the entities relevant to the question
### How to use: Use the entities to answer the question, especially when having long conversations

{entities_formatted}"""
    
    # ==================== SUMMARY (Vector Store) ====================
    
    def write_summary(self, summary_id: str, full_content: str, summary: str, description: str):
        """Store a summary with its original content."""
        self.summary_vs.add_texts(
            [f"{summary_id}: {description}"],
            [{"id": summary_id, "full_content": full_content, "summary": summary, "description": description}]
        )
        return summary_id
    
    def read_summary_memory(self, summary_id: str) -> str:
        """Retrieve a specific summary by ID (just-in-time retrieval)."""
        results = self.summary_vs.similarity_search(
            summary_id, 
            k=5, 
            filter={"id": summary_id}
        )
        if not results:
            return f"Summary {summary_id} not found."
        doc = results[0]
        return doc.metadata.get('summary', 'No summary content.')
    
    def read_summary_context(self, query: str = "", k: int = 10) -> str:
        """Get available summaries for context window (IDs + descriptions only)."""
        results = self.summary_vs.similarity_search(query or "summary", k=k)
        if not results:
            return "## Summary Memory\nNo summaries available."
        
        lines = ["## Summary Memory", "Use expand_summary(id) to get full content:"]
        for doc in results:
            sid = doc.metadata.get('id', '?')
            desc = doc.metadata.get('description', 'No description')
            lines.append(f"  • [ID: {sid}] {desc}")
        return "\n".join(lines) 

In [None]:
# Initialize the MemoryManager instance
# Note: Uses SQL table for conversational memory, vector stores for others
memory_manager = MemoryManager(
    conn=vector_conn,
    conversation_table=CONVERSATION_HISTORY_TABLE, 
    knowledge_base_vs=knowledge_base_vs,
    workflow_vs=workflow_vs,
    toolbox_vs=toolbox_vs,
    entity_vs=entity_vs,
    summary_vs=summary_vs
)

## Creating the Agent's Toolbox

### The Scalability Problem with Tools

As your AI system grows, you might have **hundreds of tools** available: APIs, database queries, calculators, search engines, and more. However, passing all tools to the LLM at inference time creates serious problems:

| Problem | Impact |
|---------|--------|
| **Context bloat** | Tool definitions consume tokens, leaving less room for actual content |
| **Tool selection failure** | LLMs struggle to choose the right tool when presented with too many options |
| **Increased latency** | More tokens = slower inference |
| **Higher costs** | More tokens = higher API costs |

Model providers like OpenAI and Anthropic typically recommend limiting the number of tools exposed to an LLM (often 10-20 max for reliable selection).

### The Solution: Semantic Tool Retrieval

The `Toolbox` class solves this by treating tools as a **searchable memory**:

1. **Register hundreds of tools**: Store all available tools with their descriptions and embeddings
2. **Retrieve only relevant tools**: At inference time, use vector search to find tools semantically relevant to the current query
3. **Pass a focused toolset**: Only the retrieved tools (typically 3-5) are passed to the LLM

This approach means your system can **scale to hundreds of tools** while the LLM only sees the most relevant ones for each query.
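
The retrieval step can be illustrated without a database. The sketch below ranks a small, hypothetical tool registry by cosine similarity. The bag-of-words `toy_embed` function is only a stand-in for a real embedding model, and the notebook itself relies on a vector store rather than this in-memory loop.

```python
import math
from collections import Counter

def toy_embed(text: str) -> Counter:
    """Stand-in for a real embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Hypothetical registry: tool name -> description used as the retrieval key.
TOOLS = {
    "search_web": "search the web for recent news and articles",
    "get_weather": "get the current weather forecast for a city",
    "run_sql": "run a sql query against the database",
}

def retrieve_tools(query: str, k: int = 2) -> list[str]:
    """Return the names of the k tools whose descriptions best match the query."""
    q = toy_embed(query)
    ranked = sorted(TOOLS, key=lambda name: cosine(q, toy_embed(TOOLS[name])), reverse=True)
    return ranked[:k]

top = retrieve_tools("what is the weather forecast in Paris", k=1)
```

Only the names returned by `retrieve_tools` would be turned into tool schemas for the LLM call, so the prompt size depends on `k` rather than on the size of the registry.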

### How the Code Works

The `Toolbox` class uses **docstrings as the retrieval key**:

```
User Query → Embed Query → Vector Search → Find tools with similar docstrings → Return relevant tools
```

| Component | Purpose |
|-----------|---------|
| `get_embedding()` | Converts tool description to a vector |
| `ToolMetadata` | Pydantic model storing tool name, description, signature, parameters |
| `_augment_docstring()` | Uses LLM to improve the docstring for better retrieval |
| `_generate_queries()` | Creates synthetic queries that would trigger this tool |
| `register_tool()` | Decorator that stores tool with its embedding in the toolbox |

When you call `memory_manager.read_toolbox(query)`, it performs a similarity search to find tools whose docstrings are semantically similar to the query.
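
To make the conversion to an OpenAI-style tool schema concrete, here is a simplified sketch. It maps annotation objects directly using `inspect`, whereas the notebook stores annotations as strings in metadata and maps those strings. `to_tool_schema` and the sample `search_web` function are hypothetical names used only for illustration.

```python
import inspect

# Python annotation -> JSON Schema type; anything else falls back to "string".
JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}

def to_tool_schema(func) -> dict:
    """Build a function-calling schema from a function's signature and docstring."""
    sig = inspect.signature(func)
    properties, required = {}, []
    for name, param in sig.parameters.items():
        properties[name] = {"type": JSON_TYPES.get(param.annotation, "string")}
        # A parameter without a default value must be supplied by the model.
        if param.default is inspect.Parameter.empty:
            required.append(name)
    return {
        "type": "function",
        "function": {
            "name": func.__name__,
            "description": (func.__doc__ or "").strip(),
            "parameters": {"type": "object", "properties": properties, "required": required},
        },
    }

def search_web(query: str, max_results: int = 5) -> str:  # hypothetical tool
    """Search the web."""
    return ""

schema = to_tool_schema(search_web)
```

Here `query` ends up in `required` because it has no default, while `max_results` is optional.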

### The Intersection of Three Engineering Disciplines

This implementation combines techniques from **memory engineering**, **context engineering**, and **prompt engineering**:

| Discipline | Technique Used | How It Helps |
|------------|----------------|--------------|
| **Memory Engineering** | Toolbox as procedural memory | Tools are stored and retrieved like learned skills |
| **Memory Engineering** | Docstring augmentation | LLM improves docstrings for better semantic retrieval |
| **Memory Engineering** | Synthetic query generation | Creates example queries to improve tool discoverability |
| **Context Engineering** | Selective tool retrieval | Only relevant tools enter the context, reducing bloat |
| **Context Engineering** | Context offloading | Tool results can be summarized to save context space |
| **Prompt Engineering** | Role setting | "You are a technical writer" improves docstring quality |

### Key Insight

The `augment=True` flag in `@toolbox.register_tool(augment=True)` triggers:
1. **Docstring augmentation**: LLM rewrites the docstring to be clearer and more searchable
2. **Synthetic query generation**: LLM generates example queries that would need this tool
3. **Rich embedding**: Combines name + augmented docstring + signature + queries for better retrieval

This means a simple one-line docstring like `"Search the web"` becomes a rich, detailed description that's much more likely to be retrieved when the user asks something like `"What's the latest news about AI?"`
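
The decorator pattern behind the `augment` flag can be sketched in isolation. In the sketch below, `fake_augment` and `fake_queries` are stand-ins for the two LLM calls, and `REGISTRY` is a hypothetical dictionary that replaces the vector store. Unlike the notebook's `register_tool`, which returns the tool ID, this version returns the function itself so it stays callable under its own name.

```python
import inspect
from typing import Callable, Optional

REGISTRY: dict[str, str] = {}  # tool name -> text that would be embedded

def fake_augment(docstring: str) -> str:
    """Stand-in for the LLM docstring rewrite."""
    return f"{docstring} Use this tool for up-to-date information from the internet."

def fake_queries(docstring: str) -> list[str]:
    """Stand-in for LLM-generated example queries."""
    return ["what is the latest news about AI?"]

def register_tool(func: Optional[Callable] = None, augment: bool = False):
    def decorator(f: Callable) -> Callable:
        doc = f.__doc__ or ""
        signature = str(inspect.signature(f))
        parts = [f.__name__, doc, signature]
        if augment:
            rich_doc = fake_augment(doc)
            # Rich text: name + augmented docstring + signature + example queries
            parts = [f.__name__, rich_doc, signature, *fake_queries(rich_doc)]
        REGISTRY[f.__name__] = " ".join(parts)
        return f
    # Bare use (@register_tool) passes the function; @register_tool(...) does not.
    return decorator if func is None else decorator(func)

@register_tool
def add(a: int, b: int) -> int:
    """Add two numbers."""
    return a + b

@register_tool(augment=True)
def search_web(query: str) -> str:
    """Search the web."""
    return f"results for {query}"
```

After registration, the text stored for `search_web` contains the example query about AI news, which is what makes a question phrased that way land closer to the tool in embedding space.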

In [None]:
import inspect
import uuid
from typing import Callable, Optional, Union
from pydantic import BaseModel

def get_embedding(text: str) -> list[float]:
    """
    Get the embedding for a text using the configured embedding model.
    """
    return embedding_model.embed_query(text)


class ToolMetadata(BaseModel):
    """Metadata for a registered tool."""
    name: str
    description: str
    signature: str
    parameters: dict
    return_type: str


class Toolbox:
    """
    A toolbox for registering, storing, and retrieving tools with LLM-powered augmentation.
    
    Tools are stored with embeddings for semantic retrieval, allowing the agent to
    find relevant tools based on natural language queries.
    """
    
    def __init__(self, memory_manager, llm_client, model: str = "gpt-4o-mini"):
        """
        Initialize the Toolbox.
        
        Args:
            memory_manager: MemoryManager instance for storing tools
            llm_client: OpenAI client for LLM augmentation
            model: Model to use for augmentation (default: gpt-4o-mini)
        """
        self.memory_manager = memory_manager
        self.llm_client = llm_client
        self.model = model
        self._tools: dict[str, Callable] = {}  # Maps tool_id -> callable
        self._tools_by_name: dict[str, Callable] = {}  # Maps function_name -> callable for execution
    
    def _augment_docstring(self, docstring: str) -> str:
        """
        Use LLM to improve and expand a tool's docstring.
        
        Takes a basic docstring and returns an enhanced version with:
        - Clearer description of what the tool does
        - Better formatted parameters and return values
        - Usage examples and edge cases
        
        Args:
            docstring: The original docstring to augment
            
        Returns:
            An improved, more detailed docstring
        """
        if not docstring.strip():
            return "No description provided."


        # NOTE: Casting the model as a technical writer is a [prompt engineering] technique intended to improve docstring quality.
        # Although some research suggests that role descriptions have little effect on LLM output quality,
        # the technique is still worth knowing.
        prompt = f"""You are a technical writer. Improve the following function docstring to be more clear, 
            comprehensive, and useful. Include:
            1. A clear concise summary
            2. Detailed description of what the function does
            3. When to use this function
            4. Any important notes or caveats

            Original docstring:
            {docstring}

            Return ONLY the improved docstring, no other text.
        """

        response = self.llm_client.chat.completions.create(
            model=self.model,
            messages=[{"role": "user", "content": prompt}],
            temperature=0.3,
            max_tokens=500
        )
        
        return response.choices[0].message.content.strip()
    
    def _generate_queries(self, docstring: str, num_queries: int = 5) -> list[str]:
        """
        Generate synthetic example queries that would lead to using this tool.
        
        These queries are used to improve retrieval - by embedding both the tool
        description AND example queries, we increase the chances of finding the
        right tool when the user asks a related question.
        
        Args:
            docstring: The tool's docstring (ideally augmented)
            num_queries: Number of example queries to generate
            
        Returns:
            List of example natural language queries
        """
        prompt = f"""Based on the following tool description, generate {num_queries} diverse example queries 
            that a user might ask when they need this tool. Make them natural and varied.

            Tool description:
            {docstring}

            Return ONLY a JSON array of strings, like: ["query1", "query2", ...]
        """

        response = self.llm_client.chat.completions.create(
            model=self.model,
            messages=[{"role": "user", "content": prompt}],
            temperature=0.3,
            max_tokens=300
        )
        
        try:
            import json
            queries = json.loads(response.choices[0].message.content.strip())
            return queries if isinstance(queries, list) else []
        except json.JSONDecodeError:
            # Fallback: extract queries from text
            return [response.choices[0].message.content.strip()]
    
    def _get_tool_metadata(self, func: Callable) -> ToolMetadata:
        """
        Extract metadata from a function for storage and retrieval.
        
        Args:
            func: The function to extract metadata from
            
        Returns:
            ToolMetadata object with function details
        """
        sig = inspect.signature(func)
        
        # Extract parameter info
        parameters = {}
        for name, param in sig.parameters.items():
            param_info = {"name": name}
            if param.annotation != inspect.Parameter.empty:
                param_info["type"] = str(param.annotation)
            if param.default != inspect.Parameter.empty:
                param_info["default"] = str(param.default)
            parameters[name] = param_info
        
        # Extract return type
        return_type = "Any"
        if sig.return_annotation != inspect.Signature.empty:
            return_type = str(sig.return_annotation)
        
        return ToolMetadata(
            name=func.__name__,
            description=func.__doc__ or "No description",
            signature=str(sig),
            parameters=parameters,
            return_type=return_type
        )
    
    def register_tool(
        self, func: Optional[Callable] = None, augment: bool = False
    ) -> Union[str, Callable]:
        """
        Register a function as a tool in the toolbox.

        Can be used as a decorator or called directly:
        
            @toolbox.register_tool
            def my_tool(): ...
            
            @toolbox.register_tool(augment=True)
            def my_enhanced_tool(): ...
            
            tool_id = toolbox.register_tool(some_function)

        Parameters:
        -----------
        func : Callable, optional
            The function to register as a tool. If None, returns a decorator.
        augment : bool, optional
            Whether to augment the tool docstring and generate synthetic queries
            using the configured LLM provider.
            
        Returns:
        --------
        Union[str, Callable]
            If func is provided, returns the tool ID. Otherwise returns a decorator.
        """

        def decorator(f: Callable) -> str:
            docstring = f.__doc__ or ""
            signature = str(inspect.signature(f))
            object_id = uuid.uuid4()
            object_id_str = str(object_id)

            # NOTE: Augmentation is a [memory engineering] technique: the LLM rewrites the tool's
            # docstring to improve the tool's discoverability and retrieval.
            if augment:
                # Use LLM to enhance the tool's discoverability
                augmented_docstring = self._augment_docstring(docstring)
                queries = self._generate_queries(augmented_docstring)
                
                # Create rich embedding text combining all information
                embedding_text = f"{f.__name__} {augmented_docstring} {signature} {' '.join(queries)}"
                embedding = get_embedding(embedding_text)
                
                tool_data = self._get_tool_metadata(f)
                tool_data.description = augmented_docstring  # Use augmented description

                tool_dict = {
                    "_id": object_id_str,  # Use string, not UUID object
                    "embedding": embedding,
                    "queries": queries,
                    "augmented": True,
                    **tool_data.model_dump(),
                }
            else:
                # Basic registration without augmentation
                embedding = get_embedding(f"{f.__name__} {docstring} {signature}")
                tool_data = self._get_tool_metadata(f)

                tool_dict = {
                    "_id": object_id_str,  # Use string, not UUID object
                    "embedding": embedding,
                    "augmented": False,
                    **tool_data.model_dump(),
                }

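            # Store the tool in the toolbox memory for retrieval.
            # The vector store embeds the text it is given, so pass the augmented
            # text when available; otherwise augmentation would not affect search.
            stored_text = embedding_text if augment else f"{f.__name__} {docstring} {signature}"
            self.memory_manager.write_toolbox(stored_text, tool_dict)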
            
            # Keep reference to the callable for execution
            self._tools[object_id_str] = f
            self._tools_by_name[f.__name__] = f  # Also store by name for easy lookup
            return object_id_str

        if func is None:
            return decorator
        return decorator(func)


In [None]:
import os
import getpass

# Function to securely get and set environment variables
def set_env_securely(var_name, prompt):
    value = getpass.getpass(prompt)
    os.environ[var_name] = value


In [None]:
set_env_securely("OPENAI_API_KEY", "OpenAI API Key: ")

In [None]:
from openai import OpenAI

client = OpenAI()

# Initialize the Toolbox
toolbox = Toolbox(memory_manager=memory_manager, llm_client=client)

# Context Engineering Techniques

--------


> **Context engineering** refers to the set of strategies for curating and maintaining the optimal set of tokens (information) during LLM inference, including all the other information that may land there outside of the prompts.
> 
> Source: *Anthropic*

While memory engineering focuses on *what to store and retrieve*, context engineering focuses on *how to manage what's in the context window right now*. This includes monitoring usage, compressing information, and providing just-in-time access to details.

## What This Section Covers

| Step | Function | Purpose |
|------|----------|---------|
| **1. Calculate Usage** | `calculate_context_usage()` | Monitor what % of the context window is used |
| **2. Summarize** | `summarise_context_window()` | Compress long content into summaries using LLM |
| **3. Offload** | `offload_to_summary()` | Auto-trigger summarization when usage exceeds threshold |
| **4. Just-in-Time Retrieval** | `expand_summary()` tool | Let agent expand summaries on demand |

**Just-in-Time (JIT) retrieval** fetches only the information the agent needs at the moment it needs it, based on the current task, query, or reasoning step. Instead of loading pre-computed or pre-cached context upfront, the system retrieves the minimal, most relevant data on demand, which keeps the context from overloading.

In the context of agent memory, JIT is a retrieval-control strategy: memory access is triggered by the agent's current goal, query, or reasoning step. Rather than preloading large histories or the full knowledge base, the system filters, ranks, and injects only the information that materially influences the next token. This reduces context saturation, improves attention allocation, and increases reasoning fidelity.
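
A minimal sketch of the idea, assuming a plain dictionary as the summary store instead of the notebook's vector store. The IDs, descriptions, and summary text are made-up sample data, and `summary_index` and `expand_summary_sketch` are hypothetical names.

```python
# Hypothetical in-memory store: summary id -> (short description, full summary).
SUMMARIES = {
    "abc123": ("Earlier discussion about vector index choices", "The user compared two index types and ..."),
    "def456": ("Web search results about agent memory", "Three articles were found covering ..."),
}

def summary_index() -> str:
    """Compact view that always sits in the context: IDs and descriptions only."""
    return "\n".join(f"[Summary ID: {sid}] {desc}" for sid, (desc, _) in SUMMARIES.items())

def expand_summary_sketch(summary_id: str) -> str:
    """Fetched only when the agent decides it needs the details."""
    entry = SUMMARIES.get(summary_id)
    return entry[1] if entry else f"Summary {summary_id} not found."

context = summary_index()                   # cheap: one line per summary
details = expand_summary_sketch("abc123")   # on demand: the longer text
```

The context carries one short line per summary, and the longer text is loaded only for the ID the agent asks about.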

## The Context Management Flow

```
Context built → Check usage % → If >80%: Summarize & offload → Store summary with ID
                                                              ↓
Agent sees: [Summary ID: abc123] Brief description ← Agent can call expand_summary("abc123") if needed
```

This approach keeps the context lean while giving the agent access to full details when required.
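
As a worked example of the usage check, assume the rough heuristic of 4 characters per token and a 128,000-token limit. An 80% threshold then corresponds to 102,400 tokens, or about 409,600 characters. The constants and helper names below are assumptions for illustration. Real token counts depend on the tokenizer and the model.

```python
CHARS_PER_TOKEN = 4        # rough heuristic, not a real tokenizer
CONTEXT_LIMIT = 128_000    # assumed token limit for the model
THRESHOLD_PERCENT = 80.0

def usage_percent(context: str) -> float:
    """Estimate what share of the context window a string would occupy."""
    tokens = len(context) // CHARS_PER_TOKEN
    return round(tokens / CONTEXT_LIMIT * 100, 1)

def needs_offload(context: str) -> bool:
    """True once the estimated usage reaches the threshold."""
    return usage_percent(context) >= THRESHOLD_PERCENT

small = "x" * 40_000     # about 10,000 tokens
large = "x" * 420_000    # about 105,000 tokens

print(usage_percent(small), needs_offload(small))
print(usage_percent(large), needs_offload(large))
```

Because the estimate is character-based, it can drift from the true token count, so a threshold below 100% leaves a safety margin.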

In [None]:
# Context window calculator - returns percentage used
def calculate_context_usage(context: str, model: str = "gpt-4o-mini") -> dict:
    """Calculate context window usage as percentage."""
    estimated_tokens = len(context) // 4  # ~4 chars per token
    max_tokens = MODEL_TOKEN_LIMITS.get(model, 128000)
    percentage = (estimated_tokens / max_tokens) * 100
    return {"tokens": estimated_tokens, "max": max_tokens, "percent": round(percentage, 1)}


In [None]:
# Context summariser - calls LLM and stores summary
import uuid

def summarise_context_window(content: str, memory_manager, llm_client, model: str = "gpt-4o-mini") -> dict:
    """Summarise content using LLM and store in summary memory."""
    # Call LLM to summarise
    response = llm_client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": f"Summarise this in 2-3 sentences:\n{content[:3000]}"}],
        max_tokens=200
    )
    summary = response.choices[0].message.content
    
    # Generate one-liner description
    desc_response = llm_client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": f"Write a 10-word label for: {summary}"}],
        max_tokens=30
    )
    description = desc_response.choices[0].message.content.strip()
    
    # Store in memory
    summary_id = str(uuid.uuid4())[:8]
    memory_manager.write_summary(summary_id, content, summary, description)
    
    return {"id": summary_id, "description": description, "summary": summary}


In [None]:
# Context offloader - replaces content with summary reference
def offload_to_summary(context: str, memory_manager, llm_client, threshold_percent: float = 80.0) -> tuple:
    """If context exceeds threshold, summarise and return compacted version."""
    usage = calculate_context_usage(context)
    
    if usage['percent'] < threshold_percent:
        return context, []  # No offload needed
    
    # Summarise the context
    result = summarise_context_window(context, memory_manager, llm_client)
    
    # Return compact reference instead of full content
    compact = f"[Summary ID: {result['id']}] {result['description']}"
    return compact, [result]


### Summary Tools & Conversation Compaction

Below we register the `expand_summary` and `summarize_and_store` functions as tools the agent can call.

#### Design Logic: Why Mark Instead of Delete?

When conversation history grows large, we need to reduce context window usage. We had two choices:

| Approach | Pros | Cons |
|----------|------|------|
| **Delete summarized messages** | Simple, immediate space savings | Permanent data loss, can't audit or recover |
| **Mark as summarized (our choice)** | Preserves history, reversible, auditable | Slightly more complex queries |

**Our intuition:** Memory should be *compressed* or *forgotten*, not *erased*. By marking messages with a `summary_id` instead of deleting them:

1. **Full history is preserved**: Original messages remain in the database for auditing, debugging, or reprocessing
2. **Linkage is maintained**: Each summary knows which messages it represents (via `summary_id`)
3. **Reversible**: If a summary is deleted, you could "unsummarize" by clearing the `summary_id`

#### The Flow

```
Thread has 50 messages → Context too large → summarize_conversation(thread_id)
                                                    ↓
                        1. Read unsummarized messages
                        2. LLM summarizes them
                        3. Store summary with unique ID
                        4. UPDATE messages SET summary_id = 'abc123'
                                                    ↓
                        Next read: Only new messages appear + Summary ID reference
```

This is a form of **log compaction**, a pattern borrowed from databases and message queues where old entries are compressed but not lost.
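The mark-instead-of-delete flow can be sketched with an in-memory SQLite table. This is a minimal illustration, not the notebook's Oracle implementation: the `messages` schema, the `summaries` dictionary, and the helper names `read_unsummarized`, `compact_thread`, and `unsummarize` are all hypothetical, and the `summarize` callable stands in for the LLM call.

```python
import sqlite3
import uuid

# Hypothetical schema for illustration; the notebook's Oracle table may differ.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages ("
    "id INTEGER PRIMARY KEY, thread_id TEXT, role TEXT, "
    "content TEXT, summary_id TEXT)"
)
conn.executemany(
    "INSERT INTO messages (thread_id, role, content) VALUES (?, ?, ?)",
    [("t1", "user", f"message {i}") for i in range(5)],
)

summaries = {}  # stand-in for summary memory


def read_unsummarized(thread_id):
    """Return only the messages that no summary has absorbed yet."""
    rows = conn.execute(
        "SELECT content FROM messages "
        "WHERE thread_id = ? AND summary_id IS NULL ORDER BY id",
        (thread_id,),
    )
    return [row[0] for row in rows]


def compact_thread(thread_id, summarize):
    """Summarize unsummarized messages, then mark (not delete) them."""
    pending = read_unsummarized(thread_id)
    if not pending:
        return None
    summary_id = uuid.uuid4().hex[:8]
    summaries[summary_id] = summarize(pending)  # stand-in for the LLM call
    conn.execute(
        "UPDATE messages SET summary_id = ? "
        "WHERE thread_id = ? AND summary_id IS NULL",
        (summary_id, thread_id),
    )
    return summary_id


def unsummarize(summary_id):
    """Reverse a compaction by clearing the marker on its messages."""
    summaries.pop(summary_id, None)
    conn.execute(
        "UPDATE messages SET summary_id = NULL WHERE summary_id = ?",
        (summary_id,),
    )


sid = compact_thread("t1", lambda msgs: f"{len(msgs)} messages summarized")
```

After `compact_thread` runs, all five rows are still in the table, but `read_unsummarized("t1")` returns an empty list. Calling `unsummarize(sid)` makes the original messages visible again, which is the reversibility that deletion would not allow.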

In [None]:
# Summary tools for the agent
@toolbox.register_tool(augment=True)
def expand_summary(summary_id: str) -> str:
    """Expand a summary reference to full content. Use when you need more details from a [Summary ID: xxx] reference."""
    return memory_manager.read_summary_memory(summary_id)

@toolbox.register_tool(augment=True)
def summarize_and_store(text: str, thread_id: str = None) -> str:
    """Summarize long text and store in memory. Returns a summary ID for later retrieval with expand_summary."""
    result = summarise_context_window(text, memory_manager, client)
    # If thread_id provided, mark conversation messages as summarized
    if thread_id:
        memory_manager.mark_as_summarized(thread_id, result['id'])
    return f"Stored as [Summary ID: {result['id']}] {result['description']}"

def summarize_conversation(thread_id: str) -> dict:
    """
    Summarize all unsummarized messages in a thread and mark them.
    Call this to compact a thread's conversation history.
    """
    # Read current unsummarized messages
    conv_memory = memory_manager.read_conversational_memory(thread_id, limit=100)
    
    if not conv_memory or "[]" in conv_memory:
        return {"status": "nothing_to_summarize"}
    
    # Summarize the conversation
    result = summarise_context_window(conv_memory, memory_manager, client)
    
    # Mark messages as summarized
    memory_manager.mark_as_summarized(thread_id, result['id'])
    
    print(f"✅ Conversation summarized: [Summary ID: {result['id']}]")
    return result



# Web Access with Tavily

--------

This section demonstrates how to create an **agentic tool** that the LLM can call to search the web. 

We use [Tavily](https://tavily.com/), an AI-optimized search API designed for LLM applications.

## What This Section Does

1. **Initialize the Tavily client**: Set up the search API with an API key
2. **Register `search_tavily` as a tool**: Use `@toolbox.register_tool(augment=True)` to make it discoverable
3. **Implement the search-and-store pattern**: Results are automatically written to knowledge base memory
4. **Test tool retrieval**: Verify the tool can be found via semantic search

## The Search-and-Store Pattern

A web search gives the agent external context that it does not have at execution time. When the agent calls `search_tavily()`, it doesn't just return those results; it also **persists them to the knowledge base**, so the agent can reuse the information in subsequent iterations:

```
Agent calls search_tavily("latest AI news")
       ↓
Tavily API returns results
       ↓
Each result is written to knowledge_base_vs with metadata (title, URL, timestamp)
       ↓
Future queries can retrieve this information without searching again
```

This pattern means the agent **learns** from its searches. Information discovered once becomes part of the agent's long-term memory, available for future conversations without additional API calls.
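The effect of search-and-store can be sketched without any external service. Everything in this example is a hypothetical stand-in: `fake_web_search` replaces the Tavily call, a plain list replaces the vector store, and keyword overlap replaces semantic similarity. The memory-first check inside `search_and_store` is added here only to make the reuse visible; in the notebook, reuse happens because the agent reads the knowledge base while building its context.

```python
from datetime import datetime, timezone

knowledge_base = []  # stand-in for the knowledge base vector store
search_calls = []    # records every simulated API call


def fake_web_search(query):
    """Hypothetical stand-in for a web search API call."""
    search_calls.append(query)
    return [{
        "title": "Example",
        "url": "https://example.com",
        "content": f"Result for {query}",
    }]


def lookup(query):
    """Naive keyword overlap standing in for semantic similarity search."""
    terms = set(query.lower().split())
    return [doc for doc in knowledge_base
            if terms & set(doc["text"].lower().split())]


def search_and_store(query):
    """Answer from memory when possible; otherwise search and persist."""
    hits = lookup(query)
    if hits:
        return hits
    stored = []
    for result in fake_web_search(query):
        doc = {
            "text": (f"Title: {result['title']}\n"
                     f"Content: {result['content']}\n"
                     f"URL: {result['url']}"),
            "metadata": {
                "url": result["url"],
                "query": query,
                "timestamp": datetime.now(timezone.utc).isoformat(),
            },
        }
        knowledge_base.append(doc)
        stored.append(doc)
    return stored


first = search_and_store("latest AI news")   # triggers the simulated search
second = search_and_store("AI news today")   # served from stored results
```

Only the first call reaches the simulated search API. The second query overlaps with the stored text, so it is answered from memory.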

In [None]:
set_env_securely("TAVILY_API_KEY", "Tavily API Key: ")

In [None]:
import os
from datetime import datetime

from tavily import TavilyClient

# Don't forget to set your API key!
tavily_client = TavilyClient(api_key=os.environ["TAVILY_API_KEY"])

@toolbox.register_tool(augment=True)
def search_tavily(query: str, max_results: int = 5):
    """
    Use this function to search the web and store the results in the knowledge base.
    """
    response = tavily_client.search(query=query, max_results=max_results)
    results = response.get("results", [])

    # Write each result to the knowledge base
    for result in results:
        # Create the text content to embed
        text = f"Title: {result.get('title', '')}\nContent: {result.get('content', '')}\nURL: {result.get('url', '')}"
        
        # Create metadata
        metadata = {
            "title": result.get("title", ""),
            "url": result.get("url", ""),
            "score": result.get("score", 0),
            "source_type": "tavily_search",
            "query": query,
            "timestamp": datetime.now().isoformat()
        }
        
        # Write to knowledge base
        memory_manager.write_knowledge_base(text, metadata)

    return results

In [None]:
import pprint
retrieved_tools = memory_manager.read_toolbox("Search the internet")
pprint.pprint(retrieved_tools)

# Agent Execution

--------


This is where everything comes together. We build a complete **agent loop** that integrates all the memory types, context engineering, and tool calling we've implemented.

## What This Section Contains

| Component | Purpose |
|-----------|---------|
| `AGENT_SYSTEM_PROMPT` | Instructions telling the LLM how to use memory and tools |
| `execute_tool()` | Looks up and executes tools from the toolbox by name |
| `call_openai_chat()` | Wrapper for OpenAI Chat Completions API with tool support |
| `call_agent()` | The main agent loop that orchestrates everything |
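To make the tool lookup concrete, here is a simplified sketch of a registry that builds a Chat Completions style function schema from a Python signature and dispatches calls by name. The `ToolRegistry` class, its method names, and the `PY_TO_JSON` mapping are hypothetical and much simpler than the notebook's toolbox. Only the outer schema shape (`type`, `function`, `name`, `description`, `parameters`) follows the OpenAI tool format.

```python
import inspect
import json

# Minimal mapping from Python annotations to JSON Schema types (assumption:
# anything unrecognized is treated as a string).
PY_TO_JSON = {str: "string", int: "integer", float: "number", bool: "boolean"}


class ToolRegistry:
    """Hypothetical, simplified stand-in for the notebook's toolbox."""

    def __init__(self):
        self.tools_by_name = {}

    def register(self, fn):
        """Decorator that makes a function callable by name."""
        self.tools_by_name[fn.__name__] = fn
        return fn

    def schema(self, name):
        """Build a function-tool schema from the function's signature."""
        fn = self.tools_by_name[name]
        properties, required = {}, []
        for pname, param in inspect.signature(fn).parameters.items():
            properties[pname] = {
                "type": PY_TO_JSON.get(param.annotation, "string")
            }
            if param.default is inspect.Parameter.empty:
                required.append(pname)
        return {
            "type": "function",
            "function": {
                "name": name,
                "description": inspect.getdoc(fn) or "",
                "parameters": {
                    "type": "object",
                    "properties": properties,
                    "required": required,
                },
            },
        }

    def execute(self, name, arguments_json):
        """Look up a tool by name and call it with JSON-encoded arguments."""
        if name not in self.tools_by_name:
            return f"Error: Tool '{name}' not found"
        return str(self.tools_by_name[name](**json.loads(arguments_json)))


registry = ToolRegistry()


@registry.register
def add(a: int, b: int = 1) -> int:
    """Add two integers."""
    return a + b


add_schema = registry.schema("add")
add_result = registry.execute("add", '{"a": 2, "b": 3}')
```

Returning an error string for an unknown tool, rather than raising, lets the model see the failure as a tool result and choose a different tool on the next iteration.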


In [None]:
import json as json_lib

from openai import OpenAI

client = OpenAI()

# ==================== SYSTEM PROMPT ====================
# The prompt below uses a prompt engineering technique called role description,
# which is intended to improve the quality of the LLM's output.
# Some research suggests that role description doesn't really affect output quality,
# but it is still a useful prompt engineering technique to know.
AGENT_SYSTEM_PROMPT = """
# System Instructions
You are an intelligent assistant with access to memory and tools.

IMPORTANT: The user's input contains CONTEXT that has already been retrieved for you:
- Conversation Memory: Previous conversations
- Knowledge Base Memory: Relevant documents
- Summary Memory: Compressed summaries with IDs

## Summary Memory
When you see [Summary ID: xxx] entries, you can call expand_summary(summary_id) to get the full content.
Use this for just-in-time retrieval when you need more details.

When answering:
1. FIRST, use the context provided in the input
2. If you need more detail from a summary, call expand_summary
3. Only use search tools if context is insufficient
"""

def execute_tool(tool_name: str, tool_args: dict) -> str:
    """Execute a tool by looking it up in the toolbox."""
    
    if tool_name not in toolbox._tools_by_name:
        return f"Error: Tool '{tool_name}' not found"
    
    return str(toolbox._tools_by_name[tool_name](**tool_args) or "Done")

# ==================== OPENAI CHAT FUNCTION ====================
def call_openai_chat(messages: list, tools: list = None, model: str = "gpt-4o-mini", temperature: float = 0.4):
    """Call OpenAI Chat Completions API with tools."""
    kwargs = {"model": model, "messages": messages, "temperature": temperature}
    if tools:
        kwargs["tools"] = tools
        kwargs["tool_choice"] = "auto"
    return client.chat.completions.create(**kwargs)

## The Agent Loop Flow

```
1. BUILD CONTEXT
   ├── Read conversational memory (chat history)
   ├── Read knowledge base (relevant documents)
   ├── Read workflow memory (past action patterns)
   ├── Read entity memory (people, places, systems)
   └── Read summary context (available summary IDs)

2. CHECK CONTEXT USAGE
   └── If >80% used → Summarize and offload

3. GET TOOLS
   └── Retrieve semantically relevant tools from toolbox

4. STORE USER MESSAGE
   └── Write to conversational memory + extract entities

5. AGENT LOOP (up to max_iterations)
   ├── Call LLM with context + tools
   ├── If tool calls → Execute tools, add results to messages
   └── If no tool calls → Return final answer

6. SAVE RESULTS
   ├── Write workflow (if tools were used)
   ├── Extract entities from response
   └── Store assistant response in conversational memory
```
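Step 2 of the flow, the context usage check, can be sketched as follows. This is not the notebook's `calculate_context_usage`: the function names `estimate_context_usage` and `maybe_compact`, the 128,000-token window, and the four-characters-per-token heuristic are all assumptions chosen for illustration, and a real implementation would count tokens with the model's tokenizer.

```python
def estimate_context_usage(text, max_tokens=128_000, chars_per_token=4):
    """Roughly estimate how much of the context window a string uses.

    Assumption: about 4 characters per token, a coarse heuristic for
    English text rather than a real tokenizer count.
    """
    tokens = len(text) // chars_per_token
    return {
        "tokens": tokens,
        "max": max_tokens,
        "percent": round(100 * tokens / max_tokens, 1),
    }


def maybe_compact(context, summarize, threshold_percent=80, max_tokens=128_000):
    """Return the context unchanged, or a summary if usage exceeds the threshold."""
    usage = estimate_context_usage(context, max_tokens)
    if usage["percent"] <= threshold_percent:
        return context, usage
    compact = summarize(context)  # stand-in for the LLM summarization call
    return compact, estimate_context_usage(compact, max_tokens)


# Tiny window so the example crosses the threshold: 40 chars is ~10 tokens.
long_context = "x" * 40
compacted, usage_after = maybe_compact(
    long_context, summarize=lambda c: c[:8], max_tokens=10
)
```

With a 10-token window, the 40-character input sits at 100% usage and is compacted, while a short input passes through untouched.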

## Key Design Decisions

- **Memory is loaded programmatically**: The agent always has context without deciding to "remember"
- **Tools are retrieved semantically**: Only relevant tools are passed to the LLM
- **Context is monitored**: Auto-summarization prevents overflow
- **Everything is persisted**: Conversations, workflows, and entities are saved for future use
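The core of the loop can be exercised without an API key by swapping the model for a scripted stand-in. In this minimal sketch, `run_agent_loop` and `scripted_llm` are hypothetical names and messages are plain dictionaries. The message shapes (an assistant message carrying `tool_calls`, followed by `tool` messages keyed by `tool_call_id`) mirror the Chat Completions tool-calling convention.

```python
import json


def run_agent_loop(call_llm, tools, context, max_iterations=5):
    """Call the model until it answers without requesting a tool."""
    messages = [{"role": "user", "content": context}]
    steps = []
    for _ in range(max_iterations):
        reply = call_llm(messages)
        tool_calls = reply.get("tool_calls") or []
        if not tool_calls:
            return reply.get("content") or "", steps
        # Echo the assistant's tool request back into the transcript
        messages.append({"role": "assistant", "content": reply.get("content"),
                         "tool_calls": tool_calls})
        for tc in tool_calls:
            name = tc["function"]["name"]
            try:
                args = json.loads(tc["function"]["arguments"])
                result = str(tools[name](**args))
                steps.append(f"{name} -> success")
            except Exception as exc:
                # Report the failure to the model instead of crashing the loop
                result = f"Error: {exc}"
                steps.append(f"{name} -> failed")
            messages.append({"role": "tool", "tool_call_id": tc["id"],
                             "content": result})
    return "", steps  # iteration budget exhausted without a final answer


def scripted_llm(messages):
    """Fake model: request one tool call, then answer from its result."""
    if messages[-1]["role"] == "tool":
        return {"content": f"The answer is {messages[-1]['content']}"}
    return {"content": None, "tool_calls": [{
        "id": "call_1",
        "type": "function",
        "function": {"name": "add", "arguments": '{"a": 2, "b": 3}'},
    }]}


answer, steps = run_agent_loop(
    scripted_llm, {"add": lambda a, b: a + b}, "What is 2 + 3?"
)
```

The scripted model asks for `add` on the first turn and produces a final answer on the second, so the loop ends after two iterations with one recorded step.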

In [None]:
# ==================== MAIN AGENT LOOP ====================
def call_agent(query: str, thread_id: str = "1", max_iterations: int = 10) -> str:
    """Agent loop with context window monitoring and summarization."""
    thread_id = str(thread_id)
    steps = []
    summaries = []  # Track created summaries
    
    # 1. Build context from memory
    print("\n" + "="*50)
    print("🧠 BUILDING CONTEXT...")
    
    context = f"# Question\n{query}\n\n"
    context += memory_manager.read_conversational_memory(thread_id) + "\n\n"
    context += memory_manager.read_knowledge_base(query) + "\n\n"
    context += memory_manager.read_workflow(query) + "\n\n"
    context += memory_manager.read_entity(query) + "\n\n"
    context += memory_manager.read_summary_context(query) + "\n\n"  # Shows IDs + descriptions

    print("====CONTEXT WINDOW=====\n")
    print(context)
    
    # 2. Check context usage - summarize if >80%
    usage = calculate_context_usage(context)
    print(f"📊 Context: {usage['percent']}% ({usage['tokens']}/{usage['max']} tokens)")
    
    if usage['percent'] > 80:
        print("⚠️ Context >80% - summarizing...")
        context, summaries = offload_to_summary(context, memory_manager, client)
        # Rebuild the context from the question plus summary references,
        # so the question survives compaction and each reference appears once
        if summaries:
            summary_section = "## Summary Memory\n"
            for s in summaries:
                summary_section += f"[Summary ID: {s['id']}] {s['description']}\n"
            context = f"# Question\n{query}\n\n{summary_section}"
        usage = calculate_context_usage(context)
        print(f"📊 After summarization: {usage['percent']}%")
    
    # 3. Get tools
    dynamic_tools = memory_manager.read_toolbox(query, k=5)
    print(f"🔧 Tools: {[t['function']['name'] for t in dynamic_tools]}")
    
    # 4. Store user message & extract entities
    memory_manager.write_conversational_memory(query, "user", thread_id)
    try:
        memory_manager.write_entity("", "", "", llm_client=client, text=query)
    except Exception:
        pass  # Entity extraction is best-effort; don't block the agent
    
    # 5. Agent loop
    messages = [{"role": "system", "content": AGENT_SYSTEM_PROMPT}, {"role": "user", "content": context}]
    final_answer = ""
    
    print("\n🤖 AGENT LOOP")
    for iteration in range(max_iterations):
        print(f"\n--- Iteration {iteration + 1} ---")
        
        response = call_openai_chat(messages, tools=dynamic_tools)
        msg = response.choices[0].message
        
        if msg.tool_calls:
            messages.append({"role": "assistant", "content": msg.content, "tool_calls": [
                {"id": tc.id, "type": "function", "function": {"name": tc.function.name, "arguments": tc.function.arguments}}
                for tc in msg.tool_calls
            ]})
            
            for tc in msg.tool_calls:
                tool_name = tc.function.name
                tool_args = json_lib.loads(tc.function.arguments)
                # Format args for display (truncate long values)
                args_display = {k: (v[:50] + '...' if isinstance(v, str) and len(v) > 50 else v) 
                               for k, v in tool_args.items()}
                print(f"🛠️ {tool_name}({args_display})")
                
                try:
                    result = execute_tool(tool_name, tool_args)
                    steps.append(f"{tool_name}({args_display}) → success")
                except Exception as e:
                    result = f"Error: {e}"
                    steps.append(f"{tool_name}({args_display}) → failed")
                
                print(f"   → {result[:200]}...")
                messages.append({"role": "tool", "tool_call_id": tc.id, "content": result})
        else:
            final_answer = msg.content or ""
            print(f"\n✅ DONE ({len(steps)} tool calls)")
            break
    
    # 6. Save workflow & entities
    if steps:
        memory_manager.write_workflow(query, steps, final_answer)
    try:
        memory_manager.write_entity("", "", "", llm_client=client, text=final_answer)
    except Exception:
        pass  # Entity extraction is best-effort; don't block the agent
    memory_manager.write_conversational_memory(final_answer, "assistant", thread_id)
    
    print("\n" + "="*50 + f"\n💬 ANSWER:\n{final_answer}\n" + "="*50)
    return final_answer


In [None]:
call_agent("These results are good, can you double click into the most relevant one and tell me more about it?", thread_id="0")