Dreaming V3 Explained: Build Sleep-Time Memory in Python

On June 4, OpenAI shipped Dreaming V3, the biggest overhaul of ChatGPT memory since the feature first appeared in 2024. Instead of waiting for you to say “remember this,” a background process now runs after your conversations end, quietly synthesizing what mattered: your preferences, active projects, and time-sensitive context. It even rewrites memories as the world changes. OpenAI's own example: “the user is going to Singapore in July” becomes “the user went to Singapore in July 2026” once the trip is over, so the assistant stops recommending Singapore restaurants in August.

The feature is consumer-only. There is no Dreaming API endpoint you can call. But the architecture behind it, often called sleep-time compute, is something you can absolutely build for your own agents today, and it fixes the single most annoying property of LLM apps: they either forget everything or remember everything forever, including stale junk.

This guide builds a working Dreaming-style memory engine in plain Python: a SQLite memory store, a “dream job” that runs after each session and decides what to add, update, or expire, and a retrieval layer that injects only the freshest active memories into the next conversation (relevance filtering is left as a next step). Around 200 lines total, no framework required.

What Dreaming V3 actually does (and why it works)

From OpenAI's announcement and early reporting, the system has three properties worth copying. Freshness: recent context outranks old context, and memories get rewritten or retired when they go stale. Continuity: threads separated by days still feel connected, because synthesis happens across many conversations at once rather than per-chat. Relevance: only memories that apply to the current exchange get surfaced, instead of dumping the whole profile into every prompt.

The numbers explain why OpenAI bothered. On their internal evals, factual recall went from roughly 41.5% with the 2024 system to 82.8% with Dreaming V3, and the synthesis process now costs about 5x less compute, which is what makes a free-tier rollout viable. The key engineering insight is that memory quality is not a retrieval problem. It is a consolidation problem: the work happens offline, after the conversation, when you have the full transcript and no latency budget.

That offline step is exactly what we will build. One asynchronous job, run after each session, that reads the transcript plus the existing memory store and emits a small set of operations: ADD, UPDATE, or EXPIRE.
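
As a rough illustration of that operation model (a sketch, not OpenAI's implementation), the three operations can be applied to a plain dictionary. The function name apply_ops, the dictionary layout, and the second sample memory are assumptions made up for this example; the SQLite-backed version is built in the steps that follow.

```python
# Hypothetical in-memory illustration of ADD / UPDATE / EXPIRE.
# The dict layout and the apply_ops name are assumptions for this sketch.

def apply_ops(store, ops):
    """Apply operations to a store shaped {memory_id: {"content", "status"}}."""
    for op in ops:
        name, mem_id = op["op"], op["memory_id"]
        if name == "add":
            store[mem_id] = {"content": op["content"], "status": "active"}
        elif name == "update" and mem_id in store:
            store[mem_id]["content"] = op["content"]
        elif name == "expire" and mem_id in store:
            store[mem_id]["status"] = "expired"  # soft expiry, row is kept
    return store

store = {"m1": {"content": "The user is going to Singapore in July.",
                "status": "active"}}
ops = [
    {"op": "update", "memory_id": "m1",
     "content": "The user went to Singapore in July 2026."},
    {"op": "add", "memory_id": "m2",
     "content": "The user prefers window seats."},  # invented sample memory
]
apply_ops(store, ops)
print(store["m1"]["content"])
```

Operations that reference an unknown ID are skipped rather than raising, which is one possible policy; a stricter system might log or reject them.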

Prerequisites

Python 3.10+ and the OpenAI Python SDK: pip install openai (1.x)
An OPENAI_API_KEY in your environment. Any chat model with structured outputs works; swap in Anthropic or a local model with minor changes
SQLite (ships with Python, nothing to install)
Basic familiarity with chat-completion calls and system prompts

Step 1: A memory store with lifecycle, not just storage

Most homemade memory systems are append-only lists, and that is exactly why they rot. The fix is to give every memory a lifecycle. Each row gets a kind (fact, preference, or project), a status (active or expired), and timestamps so the dream job can reason about staleness.

# memory_store.py
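import sqlite3, time, uuid
from contextlib import contextmanager

DB = "agent_memory.db"

@contextmanager
def _conn():
    # sqlite3's own context manager commits or rolls back but does not close
    # the connection, so wrap it and close explicitly after each use.
    c = sqlite3.connect(DB)
    c.row_factory = sqlite3.Row
    try:
        with c:
            yield c
    finally:
        c.close()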

def init():
    with _conn() as c:
        c.execute("""CREATE TABLE IF NOT EXISTS memories (
            id TEXT PRIMARY KEY,
            content TEXT NOT NULL,
            kind TEXT NOT NULL CHECK (kind IN ('fact','preference','project')),
            status TEXT NOT NULL DEFAULT 'active' CHECK (status IN ('active','expired')),
            created_at REAL NOT NULL,
            updated_at REAL NOT NULL
        )""")
        c.execute("""CREATE TABLE IF NOT EXISTS sessions (
            id TEXT PRIMARY KEY,
            transcript TEXT NOT NULL,
            dreamed INTEGER NOT NULL DEFAULT 0,
            created_at REAL NOT NULL
        )""")

def active_memories():
    with _conn() as c:
        rows = c.execute(
            "SELECT * FROM memories WHERE status='active' ORDER BY updated_at DESC"
        ).fetchall()
    return [dict(r) for r in rows]

def add_memory(content, kind):
    now = time.time()
    with _conn() as c:
        c.execute("INSERT INTO memories VALUES (?,?,?,?,?,?)",
                  (uuid.uuid4().hex[:8], content, kind, "active", now, now))

def update_memory(mem_id, new_content):
    with _conn() as c:
        c.execute("UPDATE memories SET content=?, updated_at=? WHERE id=?",
                  (new_content, time.time(), mem_id))

def expire_memory(mem_id):
    with _conn() as c:
        c.execute("UPDATE memories SET status='expired', updated_at=? WHERE id=?",
                  (time.time(), mem_id))

def save_session(transcript_text):
    with _conn() as c:
        c.execute("INSERT INTO sessions VALUES (?,?,0,?)",
                  (uuid.uuid4().hex[:8], transcript_text, time.time()))

def undreamed_sessions():
    with _conn() as c:
        rows = c.execute("SELECT * FROM sessions WHERE dreamed=0").fetchall()
    return [dict(r) for r in rows]

def mark_dreamed(session_id):
    with _conn() as c:
        c.execute("UPDATE sessions SET dreamed=1 WHERE id=?", (session_id,))

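Note the design choice: we never DELETE. Expired memories stay in the table with status='expired', which gives you an audit trail. That matters, because one criticism already aimed at Dreaming V3 is that OpenAI's rewrites limit the user-visible audit trail. In your own system, keeping expired rows costs nothing. One caveat: update_memory overwrites content in place, so if you also want the pre-update wording, copy the old row into a history table before updating.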
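
As a sketch of how that audit trail might be used, the helpers below list expired rows and flip one back to active. The names expired_memories and restore_memory are hypothetical additions, not part of memory_store.py above, and the demo runs against a throwaway in-memory database that only mirrors the memories columns.

```python
import sqlite3, time

def expired_memories(conn):
    """Return soft-expired rows, most recently expired first."""
    rows = conn.execute(
        "SELECT id, content, kind, updated_at FROM memories "
        "WHERE status='expired' ORDER BY updated_at DESC"
    ).fetchall()
    return [dict(r) for r in rows]

def restore_memory(conn, mem_id):
    """Flip an expired memory back to active. True if a row was changed."""
    cur = conn.execute(
        "UPDATE memories SET status='active', updated_at=? "
        "WHERE id=? AND status='expired'",
        (time.time(), mem_id),
    )
    conn.commit()
    return cur.rowcount == 1

# Demo on an in-memory database with the same columns as the tutorial table.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # needed so dict(row) works
conn.execute("""CREATE TABLE memories (
    id TEXT PRIMARY KEY, content TEXT NOT NULL, kind TEXT NOT NULL,
    status TEXT NOT NULL DEFAULT 'active',
    created_at REAL NOT NULL, updated_at REAL NOT NULL)""")
now = time.time()
conn.execute("INSERT INTO memories VALUES (?,?,?,?,?,?)",
             ("a1", "The user's team uses Linear.", "fact", "expired", now, now))
print([m["id"] for m in expired_memories(conn)])
restore_memory(conn, "a1")
print(expired_memories(conn))
```

Because expiry is only a status flip, undoing a bad call from the model is a single UPDATE; the same would not be true after a hard DELETE.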

Step 2: Capture sessions worth dreaming about

The dream job needs raw material. Whenever a chat session ends, serialize the transcript and store it with dreamed=0. Here is a minimal chat loop that does this. The important part is the finally block: sessions get saved even if the user Ctrl-C's out.

# chat.py
import os
from openai import OpenAI
import memory_store as ms

client = OpenAI()  # reads OPENAI_API_KEY
MODEL = os.getenv("OPENAI_MODEL", "gpt-5.5-instant")

def build_system_prompt():
    mems = ms.active_memories()
    if not mems:
        return "You are a helpful assistant."
    lines = [f"- ({m['kind']}) {m['content']}" for m in mems[:20]]
    return ("You are a helpful assistant. You know the following about "
            "the user from past sessions. Treat it as background context, "
            "do not recite it back unprompted:\n" + "\n".join(lines))

def run_session():
    ms.init()
    messages = [{"role": "system", "content": build_system_prompt()}]
    transcript = []
    try:
        while True:
            user = input("you> ").strip()
            if user in ("exit", "quit", ""):
                break
            messages.append({"role": "user", "content": user})
            resp = client.chat.completions.create(model=MODEL, messages=messages)
            answer = resp.choices[0].message.content
            messages.append({"role": "assistant", "content": answer})
            transcript.append(f"USER: {user}\nASSISTANT: {answer}")
            print(f"bot> {answer}\n")
    finally:
        if transcript:
            ms.save_session("\n\n".join(transcript))
            print("[session saved for dreaming]")

if __name__ == "__main__":
    run_session()

Notice what we are NOT doing during the live chat: no memory extraction, no “should I remember this?” calls, no embedding writes. The hot path stays fast and cheap. Everything clever happens later, offline. This mirrors OpenAI's design and is the reason they could cut synthesis compute 5x without touching chat latency.

Step 3: The dream job, where consolidation happens

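This is the heart of the system. The dream job sends two things to the model: the current active memories (with IDs) and the new transcripts. It asks for a structured list of operations. With strict structured outputs, any complete response conforms to the schema, so there is no regex cleanup. A refusal or a truncated response can still leave you without usable JSON, so treat that case as a failed run and retry later.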

# dream.py
import json, os
from openai import OpenAI
import memory_store as ms

client = OpenAI()
MODEL = os.getenv("OPENAI_MODEL", "gpt-5.5-instant")

SCHEMA = {
    "name": "memory_ops",
    "strict": True,
    "schema": {
        "type": "object",
        "properties": {
            "operations": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "op": {"type": "string", "enum": ["add", "update", "expire"]},
                        "memory_id": {"type": ["string", "null"]},
                        "content": {"type": ["string", "null"]},
                        "kind": {"type": ["string", "null"],
                                 "enum": ["fact", "preference", "project", None]},
                        "reason": {"type": "string"}
                    },
                    "required": ["op", "memory_id", "content", "kind", "reason"],
                    "additionalProperties": False
                }
            }
        },
        "required": ["operations"],
        "additionalProperties": False
    }
}

DREAM_PROMPT = """You are the memory consolidation process for a personal AI \
assistant. You run while the user is away. You receive the assistant's current \
active memories and transcripts of recent sessions.

Emit a minimal list of operations:
- add: a NEW durable fact, preference, or ongoing project. Skip small talk and \
one-off requests. Durable means it will still matter in two weeks.
- update: an existing memory whose circumstances changed. Rewrite it in past \
tense or with corrected details (e.g. a planned trip that already happened). \
Reference its memory_id.
- expire: an existing memory that is stale, superseded, or wrong. Reference \
its memory_id.

Rules: prefer update/expire over piling up near-duplicates. Today's date is \
{today}. Write memories in third person ("the user ..."). Keep each under \
200 characters. If nothing qualifies, return an empty list."""

def dream():
    ms.init()
    sessions = ms.undreamed_sessions()
    if not sessions:
        print("nothing to dream about")
        return
    mems = ms.active_memories()
    mem_lines = [f"[{m['id']}] ({m['kind']}) {m['content']}" for m in mems] or ["(none)"]
    transcripts = "\n\n---\n\n".join(s["transcript"] for s in sessions)

    import datetime
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system",
             "content": DREAM_PROMPT.format(today=datetime.date.today().isoformat())},
            {"role": "user",
             "content": "ACTIVE MEMORIES:\n" + "\n".join(mem_lines)
                        + "\n\nNEW SESSION TRANSCRIPTS:\n" + transcripts},
        ],
        response_format={"type": "json_schema", "json_schema": SCHEMA},
    )
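    msg = resp.choices[0].message
    if not msg.content:  # refusal or empty reply: keep sessions undreamed for a retry
        print("dream returned no content, will retry next run")
        return
    ops = json.loads(msg.content)["operations"]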

    for o in ops:
        if o["op"] == "add" and o["content"] and o["kind"]:
            ms.add_memory(o["content"], o["kind"])
        elif o["op"] == "update" and o["memory_id"] and o["content"]:
            ms.update_memory(o["memory_id"], o["content"])
        elif o["op"] == "expire" and o["memory_id"]:
            ms.expire_memory(o["memory_id"])
        print(f"{o['op'].upper():7s} {o.get('memory_id') or 'new':8s} {o['reason']}")

    for s in sessions:
        ms.mark_dreamed(s["id"])

if __name__ == "__main__":
    dream()

Two details carry most of the quality here. First, the prompt defines “durable” concretely (“will it still matter in two weeks?”), which stops the model from memorizing that the user once asked for a haiku. Second, passing today's date lets the model do the Singapore trick: it can see that a stored future plan is now in the past and rewrite it.
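
A third safeguard worth considering is checking the model's operations before they touch the store, since an update or expire that names an ID the store has never seen would otherwise be silently ignored. The sketch below is one possible validator; the name validate_ops, the accepted/rejected split, and the max_len default (taken from the 200-character rule in the prompt) are assumptions for illustration.

```python
VALID_KINDS = {"fact", "preference", "project"}

def validate_ops(ops, known_ids, max_len=200):
    """Split ops into (accepted, rejected) without modifying any store."""
    accepted, rejected = [], []
    for o in ops:
        op, mem_id, content = o.get("op"), o.get("memory_id"), o.get("content")
        fits = bool(content) and len(content) <= max_len
        if op == "add":
            ok = fits and o.get("kind") in VALID_KINDS
        elif op == "update":
            ok = fits and mem_id in known_ids
        elif op == "expire":
            ok = mem_id in known_ids
        else:
            ok = False  # unknown operation name
        (accepted if ok else rejected).append(o)
    return accepted, rejected

known = {"a3f81c2e", "7b09d4f1"}
ops = [
    {"op": "expire", "memory_id": "7b09d4f1", "content": None,
     "kind": None, "reason": "superseded"},
    {"op": "update", "memory_id": "deadbeef", "content": "x",
     "kind": None, "reason": "refers to an id that does not exist"},
]
accepted, rejected = validate_ops(ops, known)
print(len(accepted), len(rejected))  # prints: 1 1
```

In dream(), known_ids could be built from the rows returned by ms.active_memories(), and rejected operations logged for review rather than applied.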

Step 4: Schedule it like OpenAI does, after hours

Dreaming V3 runs asynchronously in the background. Your version can be even simpler: a cron job. Consolidation does not need to be instant; running it minutes or hours after sessions end is fine and lets one dream batch cover several sessions, which is cheaper and produces better cross-session synthesis.

# run every night at 03:00 (crontab -e)
0 3 * * * cd /path/to/agent && /usr/bin/python3 dream.py >> dream.log 2>&1

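# Alternatively, for a long-running service, use a background thread.
# scheduler.py: import this from the service's entry point. The thread is a
# daemon, so it stops when the host process exits.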
import threading, time
import dream

def dream_loop(interval_s=3600):
    while True:
        time.sleep(interval_s)
        try:
            dream.dream()
        except Exception as e:
            print(f"dream failed, will retry next cycle: {e}")

threading.Thread(target=dream_loop, daemon=True).start()
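
One thing a batched run has to watch is size: after a busy day, the combined transcripts may not fit in a single model call. A minimal sketch of one way to split the backlog is below. The helper name batch_sessions and the 40,000-character budget are assumptions (characters are only a rough stand-in for tokens), and dream() would need to be adapted to run once per batch.

```python
def batch_sessions(sessions, max_chars=40_000):
    """Group sessions, in order, into batches whose combined transcript
    length stays within max_chars. A single oversized session still gets
    a batch of its own rather than being dropped."""
    batches, current, size = [], [], 0
    for s in sessions:
        n = len(s["transcript"])
        if current and size + n > max_chars:
            batches.append(current)   # close the full batch
            current, size = [], 0
        current.append(s)
        size += n
    if current:
        batches.append(current)
    return batches

# Toy data: three 30-character transcripts with a 70-character budget.
sessions = [
    {"id": "s1", "transcript": "a" * 30},
    {"id": "s2", "transcript": "b" * 30},
    {"id": "s3", "transcript": "c" * 30},
]
for batch in batch_sessions(sessions, max_chars=70):
    print([s["id"] for s in batch])
```

With these inputs the first two sessions share a batch and the third starts a new one. Keeping sessions in arrival order matters because later transcripts may supersede earlier ones.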

A worked example: watch a memory live and die

Here is a real three-session run against the code above. Session one, in early June:

you> I'm taking my team offsite to Lisbon June 18-20, need to plan activities
bot> Great! For a team offsite in Lisbon ...
you> also I switched us from Jira to Linear last month, still getting used to it
bot> Linear's keyboard-first workflow ...
[session saved for dreaming]

$ python3 dream.py
ADD     new      Durable upcoming event with dates
ADD     new      Durable tooling change affecting future questions

The store now contains two active memories:

[a3f81c2e] (project) The user is planning a team offsite in Lisbon, June 18-20 2026.
[7b09d4f1] (fact) The user's team migrated from Jira to Linear in May 2026.

In session two, on June 25, the user mentions the offsite in passing (“the Lisbon trip went well, the team loved the food tour”). The next dream run does not add a duplicate. It rewrites the existing memory instead:

$ python3 dream.py
UPDATE  a3f81c2e Trip completed; rewrite plan as past event with outcome

# memory a3f81c2e is now:
[a3f81c2e] (project) The user's team offsite in Lisbon (June 18-20 2026) happened; the food tour was a highlight.

In session three, weeks later, the user says they are moving back to Jira because of enterprise SSO requirements. The dream job expires the Linear memory and adds the new state. Ask the assistant “what tracker do we use?” in the next session and it answers Jira, with no stale Linear advice. That update-and-expire behavior is the entire difference between memory that helps and memory that haunts.

Common pitfalls

Append-only hoarding. If your dream prompt only ADDs, the store fills with near-duplicates and contradictions within a week. Always pass existing memories with IDs so the model can update/expire instead. The rule of thumb: a healthy store stays under ~50 active memories per user.
Extracting during the chat. Inline “remember this?” calls add latency and cost, and they miss cross-session patterns. The whole point of sleep-time compute is moving that work off the hot path.
No date in the dream prompt. Without {today}, the model cannot tell that “flying to Singapore in July” is stale. Time-awareness is what makes rewriting possible.
Dumping all memories into every prompt. Cap the injection (we slice to 20) and order by updated_at. For bigger stores, add an embedding similarity filter so only memories relevant to the first user message get injected.
Trusting the model with deletes. Use soft-expiry, never hard deletes. Models occasionally expire the wrong memory; status flips are reversible, DELETE is not.
Privacy blind spots. A 2026 study of ChatGPT memory found 96% of sampled memories were created without explicit user prompting. If you build this for real users, show them the memory list and give them a delete button. EU AI Act transparency rules from August 2026 will expect it.
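
For the embedding similarity filter mentioned in the pitfalls above, a minimal sketch could look like the following. It assumes each memory already carries an 'embedding' list produced by whatever embedding model you choose, which is not shown here. The function names, the top_k and min_sim defaults, and the toy three-dimensional vectors are all made up for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def relevant_memories(query_vec, memories, top_k=5, min_sim=0.3):
    """Rank memories by similarity to the query vector and keep at most
    top_k of those scoring at least min_sim."""
    scored = [(cosine(query_vec, m["embedding"]), m) for m in memories]
    scored = [(s, m) for s, m in scored if s >= min_sim]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [m for _, m in scored[:top_k]]

# Toy vectors standing in for real embeddings.
memories = [
    {"content": "The user's team uses Jira.", "embedding": [0.9, 0.1, 0.0]},
    {"content": "The user prefers vegetarian food.", "embedding": [0.0, 0.2, 0.9]},
]
query = [1.0, 0.0, 0.0]  # stand-in for the embedded opening user message
print([m["content"] for m in relevant_memories(query, memories)])
```

Here only the Jira memory clears the threshold, so it alone would be injected. For stores with more than a few thousand rows, a vector index would likely replace this linear scan.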

Quick reference

Component	Job	When it runs	Cost profile
Memory store (SQLite)	Holds memories with kind + status + timestamps	Always on	Free
Session capture	Serializes transcript, flags dreamed=0	End of each chat	Zero LLM calls
Dream job	Emits add / update / expire ops via structured output	Cron or hourly thread	1 LLM call per batch
Retrieval layer	Injects top ~20 active memories into system prompt	Start of each chat	Zero LLM calls
Audit trail	Expired rows kept, never deleted	Always	Free

Next steps

Add an embedding column and filter injected memories by similarity to the opening user message; that is the “relevance” third of Dreaming V3.
Give memories a salience score the dream job can raise or lower, then evict low-salience rows from injection first.
Port the dream job to a cheaper model than your chat model; consolidation tolerates smaller models well and it is where OpenAI found its 5x savings.
If you use Claude, the same pattern maps cleanly onto Anthropic's memory tool and the Agent SDK; the dream prompt transfers almost verbatim.
Read up on sleep-time compute research (Letta's sleep-time agents paper is the canonical reference) for multi-agent variants where one agent chats while another consolidates.
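
To make the salience idea from the list above concrete, here is one possible selection rule: rank by salience first and recency second, then cut to the injection limit. The 'salience' field, its 0.5 default, and the select_for_injection name are assumptions; the tutorial's schema has no such column yet.

```python
def select_for_injection(memories, limit=20):
    """Keep the `limit` memories with the highest salience, breaking ties
    by recency (updated_at). Rows without a salience value count as 0.5."""
    ranked = sorted(
        memories,
        key=lambda m: (m.get("salience", 0.5), m["updated_at"]),
        reverse=True,
    )
    return ranked[:limit]

# Toy rows: "a" is the most recent but has the lowest salience.
memories = [
    {"id": "a", "salience": 0.2, "updated_at": 300.0},
    {"id": "b", "salience": 0.9, "updated_at": 100.0},
    {"id": "c", "salience": 0.9, "updated_at": 200.0},
]
print([m["id"] for m in select_for_injection(memories, limit=2)])  # ['c', 'b']
```

The low-salience row is evicted first even though it is the newest, which is the behavior the salience score is meant to buy. Whether salience should strictly dominate recency, as it does here, is a design choice you may want to tune.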