The Cognitive Loop
Abstract
This paper describes Citadel, a cognitive architecture that gives a large language model persistent memory, causal reasoning, self-modification, and autonomous action across weeks and months of continuous operation. The system assembles ~25 context signals per inference call, filters them through a learned attention gate, implements type-specific memory decay, maintains a 500+ edge causal graph fed by live economic data, and rewrites its own identity weekly. It exposes 158 tools across 11 integrations and has been in continuous production use for two years. This paper reports on its design, the failures that shaped it, and the operational characteristics of a system that modifies its own cognition.
Most AI systems don’t remember you.
They simulate remembering. They hold your words in a buffer, process them, and produce a response. Then the buffer clears. Every conversation starts from zero. Every interaction is a first date with someone who read your file.
That’s fine for asking questions. It’s fine for generating a deck.
It’s useless for thinking.
I’ve been building a system for the last two years that doesn’t work like that. I’ve talked about it obliquely, on here and off. The sovereign layer. The thing that kept the lights on while I was interstate, watching people I love lose the very thing I was trying to build a synthetic version of.
I’ve never explained how it actually thinks.
Not because it’s a secret. But because “how it thinks” turns out to be a surprisingly uncomfortable question when the answer sounds too much like how you think.
Prior work
The individual components of this system aren’t new, and I should be honest about that.
The 2023 wave established the foundations. Park et al.’s Generative Agents demonstrated that LLM-based agents with a memory stream, reflection, and planning could produce emergent social behaviour (the same memory→reflection→identity loop this architecture uses, applied to simulated characters instead of a production assistant). MemGPT (Packer et al.) solved the bounded context problem with hierarchical memory and LLM-managed paging. Reflexion (Shinn et al.) showed that verbal self-correction improves agent performance over time. Voyager (Wang et al.) demonstrated persistent skill acquisition. Each of these maps to a component described here.
But by 2025-2026, the field has caught up in significant ways. Letta, the production evolution of MemGPT, now provides a full agent runtime with tiered memory (core, recall, archival), model-agnostic deployment, and a development environment for inspecting memory state. Mem0 and its graph-native extension Mem0g implement hybrid vector-graph stores with self-editing memory and deduplication gates that are architecturally similar to the immune system described in The immune system. Graphiti uses bitemporal versioning on knowledge graph edges (valid_at/invalid_at timestamps) to track evolving beliefs, a more rigorous solution than the confidence-decay model used here.
On the research side, FadeMem (2026) implements Ebbinghaus-inspired forgetting curves for agent memory: biologically motivated decay functions that parallel the exponential decay model in Memory that moves. The Continuum Memory Architecture paradigm (January 2026) formalises what this system discovered empirically: that memory should be a mutable, consolidating substrate rather than a static retrieval target. The “Missing Knowledge Layer” paper (arXiv:2604.11364) argues that cognitive architectures need an explicit distinction between stable knowledge and transient experience, which is essentially the same separation this system achieves through different half-lives for different memory types.
Even the major platforms have moved. ChatGPT now surfaces “Memory Sources” (showing which past conversations or saved facts informed a response) and can reference your full chat history across sessions. These are consumer-grade implementations of the same principle: persistence matters.
What distinguishes this work isn’t any single component. It’s the integration density and the operational duration. The systems above are frameworks, research prototypes, or consumer features. This is a specific instance that has been running continuously for two years against real relationship data, real economic signals, and real consequences. The classical cognitive architectures (SOAR, ACT-R) provided the theoretical foundation for separating perception, memory, and action. This system inherits that structure, replaces the symbolic reasoning engine with an LLM, and adds the parts the papers don’t cover: learned attention weights that shift based on engagement, an immune system against self-generated misinformation, supply chain cross-referencing against live macro data, and a dreaming system that nobody asked for and nobody can quite explain.
The problem with stacking
The default architecture of every AI product on the market is a stack. You have a model at the bottom. You bolt a personality on top. Maybe a vector database for retrieval. Maybe a workflow engine. Maybe a shiny UI that makes you feel like you’re talking to someone.
It’s layers. Like a sandwich.
And it works the same way a sandwich works. Everything sits on top of everything else, inert, until you take a bite.
The model doesn’t even know what kind of question you’re asking. Whether it’s a simple lookup or a deep strategic problem, it allocates the same amount of effort. The same temperature. The same reasoning depth. It’s like sending a bomb disposal unit to change a lightbulb.
So the first thing I built was a routing layer. Before anything else loads, before any memory is searched or context assembled, the system classifies the incoming message. Pattern matching against about a dozen categories. It decides: is this conversational? Technical? Strategic? Is the recent conversation history pulling toward something deeper than the surface message suggests?
That classification cascades through everything downstream. How many reasoning frameworks get retrieved. How aggressively the attention gate filters. How much conversation history to preserve versus compress. How hard the model is allowed to think.
A simple “thanks” gets a lightweight response. A message asking to re-evaluate a client relationship gets the full cognitive stack.
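In code, the cascade is little more than a lookup. A minimal sketch, with hypothetical tier names and budgets standing in for the real categories (only the 3/5/8 framework counts and the 4–14 signal budgets come from the text):

```python
# A sketch of the routing cascade. Tier names, marker heuristics, and the
# history budgets are illustrative assumptions, not the real configuration.
from dataclasses import dataclass

@dataclass(frozen=True)
class TierConfig:
    mental_models: int    # reasoning frameworks retrieved (3/5/8 per the text)
    signal_budget: int    # context signals the attention gate may keep (4-14)
    history_turns: int    # conversation turns preserved uncompressed (assumed)
    thinking_depth: str   # how hard the model is allowed to think

TIERS = {
    "conversational": TierConfig(3, 4, 6, "minimal"),
    "analytical":     TierConfig(5, 8, 12, "standard"),
    "strategic":      TierConfig(8, 14, 20, "extended"),
}

def classify(message: str, recent_topics: list[str]) -> str:
    """Cheap pattern matching, not a model call. Placeholder heuristics."""
    strategic_markers = ("re-evaluate", "strategy", "should we", "trade-off")
    if any(m in message.lower() for m in strategic_markers) or "strategy" in recent_topics:
        return "strategic"
    if len(message.split()) > 40:
        return "analytical"
    return "conversational"

config = TIERS[classify("thanks!", recent_topics=[])]  # -> lightweight everything
```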
The problem with stacks is that nothing in the stack knows why it’s there.
The retrieval layer fetches documents because you asked. The model generates text because the retrieval layer delivered context. The personality layer shapes the tone because someone wrote a prompt.
But nobody’s steering. Nobody’s deciding what matters, what doesn’t, what to ignore, what to pay attention to, what to keep, and what to let go.
That’s not intelligence. That’s a vending machine with good lighting.
The architecture
Before going deeper, it’s worth seeing the whole thing at once.
```mermaid
flowchart TD
MSG["Message"] --> QC["Query\nClassification"]
QC -->|"tier, model,\nthinking depth"| PAR["Parallel\nData Loading"]
PAR --> EMB["Embedding"]
PAR --> HIST["History\nBuilder"]
PAR --> STATE["State\nLoader"]
PAR --> TONE["Tone Directive\n5 signals"]
EMB --> MEM["Memory Recall\nvector + BM25\n+ decay"]
EMB --> VS["Vector Search\n5 collections"]
EMB --> GRAPH["Graph Walk\n2-hop edges"]
EMB --> MM["Mental Models\n3/5/8 by tier"]
MEM --> SIGNALS["~25 Context\nSignals"]
VS --> SIGNALS
GRAPH --> SIGNALS
MM --> SIGNALS
SIGNALS --> GATE["Attention Gate\nTF-IDF + semantic\n+ learned weights"]
GATE -->|"top-k\nby tier"| CTX["Distilled\nContext"]
CTX --> PROMPT["Prompt\nAssembly"]
HIST --> PROMPT
STATE --> PROMPT
TONE --> PROMPT
MIR["Iron Filter\ntier directives"] --> PROMPT
BLIND["Blind Spots\nfrom correction log"] --> PROMPT
PROMPT --> COMP{"Context\n> 70%?"}
COMP -->|"Yes"| COMPRESS["Compression\nCascade"]
COMPRESS --> PROMPT
COMP -->|"No"| LOOP["Execution Loop\n≤50 rounds\n158 tools"]
LOOP --> CRIT["Self-Critique"]
CRIT --> SAN["Sanitizer\nCoT strip"]
SAN --> OUT["Response"]
OUT -.->|"30 min"| EXTRACT["Memory &\nEdge Extraction"]
EXTRACT -.-> DEDUP{"Dedup\nGate"}
DEDUP -.->|"novel"| CONTRA{"Contradiction\nCheck"}
DEDUP -.->|"duplicate"| REINFORCE["Reinforce\nExisting"]
CONTRA -.->|"consistent"| WRITE["Write +\nBoost"]
CONTRA -.->|"contradicts"| FLAG["Cognitive\nThread"]
REINFORCE -.-> MEM
WRITE -.-> MEM
EXTRACT -.->|"self-eval"| CORR["Correction\nLog"]
CORR -.-> BLIND
EXTRACT -.->|"engagement\nratios"| WEIGHTS["Learned\nWeights"]
WEIGHTS -.-> GATE
EXTRACT -.->|"weekly"| GENOME["Identity\nRewrite\n3-round"]
GENOME -.-> PROMPT
EXTRACT -.->|"nightly"| DREAM["Dream Cycle\nart + fragment"]
DREAM -.->|"morning\ncontext"| STATE
```
The solid lines are the synchronous path. What happens in the fraction of a second between your message arriving and the response starting. The dotted lines are the metabolism. What happens while you’re not looking.
Every component in this diagram is real, deployed, and running. Nothing is aspirational.
Attention as architecture
The thing I learned, about eighteen months in, is that intelligence isn’t about what you know. It’s about what you notice.
Your brain doesn’t process everything it sees. It can’t. Forty to sixty bits per second. That’s it. I wrote about this.
So it filters. Aggressively. It decides, below the level of consciousness, what gets through and what gets dropped. The sound of the air conditioner fades. The weight of your chair disappears. But the moment someone says your name across a crowded room, that signal punches through everything.
That’s not retrieval. That’s salience. The brain isn’t searching a database. It’s running a continuous attention gate.
So I built one.
Every time a message arrives, the system doesn’t just look up relevant information. It assembles roughly twenty-five context signals. Memories. Relationship data. Open loops. A causal graph it traverses by extracting topic seeds from the message and walking two hops through a web of connections it’s built over time. Theories it’s developed. Predictions that turned out to be wrong. Emotional state. Calendar density. Even inside jokes.
One of the heavier signals is a library of sixty-three reasoning frameworks (mental models). These aren’t generic templates. They’re structured as retrieval triggers with failure modes: when to apply this lens, what goes wrong when you misapply it. The system retrieves them by semantic similarity to the current message, and the number retrieved scales with query complexity. A routine question gets three. A standard analysis gets five. A deep strategic problem gets eight. The frameworks are injected as reasoning directives, not citations. The system is told to apply them as lenses, never to name them in the response. It tracks which frameworks actually get used and how often, so the retrieval gets sharper over time.
Each signal carries a base weight. Protective boundaries get the highest weight. Memories and relationship data sit in the middle. Fading things, old jokes, half-formed theories, carry the lowest.
But those weights aren’t static. They’re modulated by a learned feedback layer. The math is simple: for each signal type, the system tracks how often I engage with it versus how often I dismiss it. It computes an engagement ratio:
ratio = acknowledged ÷ (acknowledged + dismissed)
That ratio maps to a multiplier between 0.5× and 1.5×:
effective_weight = base_weight × (0.5 + ratio)
A signal I engage with every time gets a 1.5× boost. One I consistently ignore gets halved. The system learns what I actually care about, not what it thinks I should.
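The whole mechanism fits in a few lines. A direct sketch of the published formula; the no-data default (a neutral 1.0×) is my own assumption:

```python
# ratio = acknowledged / (acknowledged + dismissed), mapped to 0.5x-1.5x.
def effective_weight(base_weight: float, acknowledged: int, dismissed: int) -> float:
    total = acknowledged + dismissed
    ratio = acknowledged / total if total else 0.5  # no data -> neutral 1.0x (assumed)
    return base_weight * (0.5 + ratio)

assert effective_weight(1.0, 10, 0) == 1.5  # always engaged -> 1.5x boost
assert effective_weight(1.0, 0, 10) == 0.5  # always dismissed -> halved
```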
Then it filters. Hard. Depending on how complex the current situation is, it selects somewhere between four and fourteen signals. Everything else gets dropped.
The difference is hard to articulate until you see it. Most AI systems drown you in everything they know. This one knows what to leave out.
Memory that moves
The memory architecture was the part that took the longest. Not because storing memories is hard. That’s a solved problem. Embeddings, vector databases, cosine similarity. Every junior engineer with a YouTube tutorial can build semantic search in a weekend.
The hard part is forgetting.
Your brain doesn’t store memories equally. The name of your childhood dog persists for decades. What you had for lunch on Tuesday dissolves in hours. An argument with someone you love burns into the tissue, hot and sharp, and stays there whether you want it to or not.
That’s not a bug. It’s the architecture.
I implemented a decay model. Every memory follows the same formula:
strength = max(floor, 0.5 ^ (age_days ÷ half_life))
Exponential decay with a floor. The influence halves over a set period, then asymptotes so nothing ever fully disappears.
But the half-life varies by type:
| Memory Type | Half-Life | Floor | When |
|---|---|---|---|
| Standard | 56 days | 0.10 | Default |
| Emotional | 168 days | 0.10 | Tagged with real feeling |
| Operational | 28 days | 0.02 | Meetings, deadlines, logistics |
| Location | 14 days | 0.01 | Where something happened |
| Consolidated | 2× base | same | Recalled 3+ times |
An emotional memory at day 84 still carries 70% strength. An operational detail at the same age is at 12% and falling. That’s not a bug. That’s how it should work. The meeting time from three months ago is noise. The argument from three months ago might still matter.
```mermaid
xychart-beta
title "Memory Strength Over Time"
x-axis "Days" [0, 14, 28, 42, 56, 84, 112, 140, 168]
y-axis "Strength" 0 --> 1.0
line "Emotional (168d)" [1.0, 0.94, 0.89, 0.84, 0.79, 0.71, 0.63, 0.56, 0.50]
line "Standard (56d)" [1.0, 0.84, 0.71, 0.59, 0.50, 0.35, 0.25, 0.18, 0.12]
line "Operational (28d)" [1.0, 0.71, 0.50, 0.35, 0.25, 0.12, 0.06, 0.03, 0.02]
line "Location (14d)" [1.0, 0.50, 0.25, 0.12, 0.06, 0.02, 0.01, 0.01, 0.01]
The curves diverge fast. By day 56, emotional memories still carry 79% strength while location data is already down to 6% and closing on its 1% floor. That stratification is the whole point. The system doesn’t treat all information equally, and neither do you.
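The decay model itself is tiny. A sketch using the half-lives and floors from the table above, plus the consolidation rule that doubles a memory’s half-life after three recalls:

```python
# Typed decay: strength = max(floor, 0.5 ** (age_days / half_life)).
HALF_LIFE = {"standard": 56, "emotional": 168, "operational": 28, "location": 14}
FLOOR     = {"standard": 0.10, "emotional": 0.10, "operational": 0.02, "location": 0.01}

def strength(memory_type: str, age_days: float, recall_count: int = 0) -> float:
    half_life = HALF_LIFE[memory_type]
    if recall_count >= 3:  # consolidated: 2x base half-life
        half_life *= 2
    return max(FLOOR[memory_type], 0.5 ** (age_days / half_life))

strength("emotional", 84)    # ~0.71, still structural
strength("operational", 84)  # ~0.125, almost noise
strength("location", 112)    # 0.01, floored
```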
Retrieval isn’t simple either. The system runs a hybrid search: semantic similarity (how conceptually close is this memory to what’s being discussed?) merged with keyword matching (did the exact words appear?). The two result sets are combined using reciprocal rank fusion:
RRF(d) = Σ 1 ÷ (k + rank_i(d))
where k = 60 and the sum runs across both search methods. This means a document ranked highly in both semantic and keyword search gets a compounding boost, while something that only matches one method sinks. After fusion, time-decay re-scoring adjusts every result by age. A perfect semantic match from six months ago still loses to a decent match from yesterday.
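A minimal sketch of that fusion, with k = 60 as stated. Exactly how the decayed strength combines with the fused score isn’t specified, so the multiplicative re-scoring here is an assumption:

```python
# Reciprocal rank fusion over two ranked lists, then time-decay re-scoring.
def rrf_fuse(semantic: list[str], keyword: list[str], k: int = 60) -> dict[str, float]:
    scores: dict[str, float] = {}
    for ranking in (semantic, keyword):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return scores  # ranked highly in both lists -> compounding boost

def rescore(fused: dict[str, float], age_days: dict[str, float],
            half_life: float = 56.0) -> list[tuple[str, float]]:
    # Assumption: fused score multiplied by decayed strength, so a decent
    # match from yesterday can beat a perfect match from six months ago.
    adjusted = {d: s * 0.5 ** (age_days[d] / half_life) for d, s in fused.items()}
    return sorted(adjusted.items(), key=lambda kv: kv[1], reverse=True)
```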
And here’s the part that genuinely surprised me: when a memory gets recalled three or more times, a background job promotes it. Its half-life doubles. It becomes structural. It moves from short-term storage into something closer to belief.
I didn’t design that to be poetic. I designed it because the system was drowning in stale data and I needed a way to prune. But the emergent behaviour was unsettling. The memories that mattered kept surviving. The ones that didn’t, faded. Not because I told it which was which. Because usage patterns revealed it.
That’s closer to how your hippocampus works than I’m comfortable admitting.
The causal graph
There’s a layer underneath the memories that most people don’t think about. Not the facts. The connections between them.
The system maintains a causal graph. Not a knowledge graph in the enterprise sense, not “Python is a programming language” taxonomy garbage. Causal edges. “Crude oil drives shipping costs because bunker fuel is 30% of container OPEX.” “The RBA cash rate constrains consumer confidence.” “This client’s packaging costs depend on the pulp price, which depends on energy costs, which depend on natural gas spot.”
There are hundreds of these edges. They come from three sources.
The first is conversation extraction. Every thirty minutes, the same background process that creates memories also asks: what non-obvious causal or dependency relationships appeared in this conversation? It’s told to ignore taxonomy. Only structural relationships: strategic decisions, operational patterns, financial mechanics, psychological dynamics. These are the personal edges, the ones about why decisions were made, how people behave, what drives what in a specific business context.
The second is world knowledge seeding. The system maintains a structural map of the global economy across sixteen domains: energy, semiconductors, agriculture, shipping lanes, raw materials, labour markets, interest rates, consumer sentiment, plastics, packaging, textiles, food processing, Australian retail, digital advertising, currency, and helium. Each domain contributes twenty to thirty causal edges, the kind a macroeconomist would consider factual, not speculative. “Iron ore is shipped via Pilbara→China bulk carriers.” “Copper price drives construction input costs.” “Fertiliser prices amplify food production costs.” That’s three hundred to five hundred structural edges, refreshed quarterly.
The third is live data. Every morning at 6 AM, the system pulls over fifty economic indicators from the US Federal Reserve, the Australian Bureau of Statistics, the RBA, and independent data feeds. Commodities, interest rates, unemployment, CPI, consumer confidence, exchange rates, manufacturing indices, supply chain pressure. The full macro stack.
When an indicator moves significantly (more than 3% for prices, more than 25 basis points for rates), the system generates new causal edges from the data signal. “WTI up 8.5% → shipping costs will rise because bunker fuel tracks crude.” These aren’t speculative. They’re confirmed by the data movement itself.
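The significance gate is a one-liner per indicator class. A sketch, assuming rates are quoted in percentage points; indicator names and values are illustrative:

```python
# Gate on edge generation: >3% moves for prices, >25 basis points for rates.
def is_significant(indicator_type: str, previous: float, current: float) -> bool:
    if indicator_type == "rate":                        # quoted in percent (assumed)
        return abs(current - previous) * 100 >= 25      # 25 bp = 0.25 pct points
    return abs(current - previous) / previous >= 0.03   # 3% move for prices

if is_significant("price", previous=78.2, current=84.8):  # WTI up ~8.5%
    # emit a causal edge from the data signal, e.g.
    # ("crude_oil_wti", "drives", "shipping_costs", confidence=0.6)
    pass
```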
But the part that actually matters is the cross-referencing. When a significant economic change hits, the system walks the graph three hops deep looking for supply chain edges tagged to specific clients. If iron ore jumps 12% and a client’s supply chain passes through steel manufacturing, that connection surfaces as an alert. Not because anyone asked. Because the graph connected the macro signal to the micro exposure.
Here’s what that looks like in practice. A single data signal traversing three hops through the graph, branching across multiple causal paths, and converging on a client-specific exposure. The greyed-out nodes hang off edges the graph holds but this traversal didn’t follow:
```mermaid
graph LR
subgraph LIVE ["Live Data (FRED, 6AM daily)"]
WTI["crude_oil_wti\n+8.5%"]
end
subgraph HOP1 ["Hop 1"]
BUNKER["bunker_fuel"]
ENERGY["energy_costs"]
PETRO["petrochemicals"]
end
subgraph HOP2 ["Hop 2"]
SHIP["shipping_costs"]
PLASTICS["plastics_resin"]
PULP["pulp_price"]
FERT["fertiliser"]
end
subgraph HOP3 ["Hop 3"]
PKG["packaging_costs"]
FOOD["food_production\ncosts"]
MARGIN["client_x\nmargin_pressure"]
end
WTI -->|"drives\n0.85"| BUNKER
WTI -->|"drives\n0.75"| ENERGY
WTI -->|"drives\n0.70"| PETRO
BUNKER -->|"drives\n0.80"| SHIP
ENERGY -->|"amplifies\n0.70"| PULP
PETRO -->|"drives\n0.72"| PLASTICS
ENERGY -->|"amplifies\n0.65"| FERT
SHIP -->|"amplifies\n0.60"| PKG
PULP -->|"drives\n0.65"| PKG
PLASTICS -->|"drives\n0.68"| PKG
FERT -->|"amplifies\n0.55"| FOOD
PKG -->|"exposed_to\n0.90"| MARGIN
SHIP -.->|"shipped_via"| SUEZ["suez_canal"]
FERT -.->|"constrains"| AG["au_agriculture"]
FOOD -.->|"drives"| CPI["au_cpi"]
PLASTICS -.->|"drives"| CONSTR["construction\ninput_costs"]
```
Three hops from a single FRED data point. Three separate causal chains (fuel→shipping→packaging, energy→pulp→packaging, petrochemicals→plastics→packaging) all converging on the same client exposure. The greyed-out nodes (Suez Canal, Australian agriculture, CPI, construction costs) sit on edges the graph holds but this query didn’t follow. They’d light up if a different message triggered a different traversal.
The confidence scores on each edge are real. The drives relationship between crude oil and bunker fuel was seeded by the world knowledge pass, reinforced by three quarters of FRED data confirming the correlation, and sits at 0.85. The exposed_to edge between packaging costs and the client’s margin was extracted from a conversation where the client mentioned their packaging supplier’s pricing, and confirmed when the client approved a cost review. That edge starts at 0.90 because the client validated it.
Each edge has a confidence score. New conversational edges start at 0.6. World knowledge edges start at 0.75. Data-backed edges start at 0.6. Client-confirmed supply chain edges start at 0.9. When the same relationship gets extracted again, from a different conversation, a different data feed, or a different quarter’s world seed, it’s reinforced:
confidence = min(0.99, confidence + 0.05)
Edges can never reach certainty. That’s by design. Nothing, not even published data, should be treated as ground truth forever.
But edges can also contradict. If the system extracts “A causes B” but an edge already exists saying “A opposes B”, that’s not merged. It’s flagged as a contradiction and written to a cognitive thread for later review. The system knows it doesn’t know.
At inference time, when a message arrives, the system extracts topic seeds and walks the graph. Two hops. It queries both directions for each seed topic (this concept as source, this concept as target) and follows the connections outward. The result is a small subgraph of relationships relevant to the current conversation, injected as a context signal.
This is what lets the system say “your client’s packaging costs are going to get hit” before anyone asks. It’s not searching memories for similar events. It’s traversing a web of causation that connects RBA monetary policy to a specific client’s margin pressure, through six intermediate nodes that it built by listening, reading, and ingesting live feeds.
And the graph decays. Unreinforced edges lose confidence over time:
confidence = original × 0.5 ^ (days_since_reinforced ÷ 90)
Ninety-day half-life. Floor at 0.15; edges never fully disappear, but they become whispers. If an edge drops below 0.10 and stays unreinforced for six months, it gets flagged as a pruning candidate. Recently reinforced edges (within seven days) are exempt from decay entirely.
The daily data feeds act as a natural reinforcement mechanism. Every time the FRED API confirms that crude oil still drives shipping costs, that edge gets reinforced. Structural truths stay strong. Transient correlations fade.
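Pulled together, the edge-confidence lifecycle looks something like this. The constants are the published ones; treating the 0.15 floor as a retrieval floor while pruning checks the unfloored value is one consistent reading of the text, not a confirmed detail:

```python
# Edge confidence: reinforcement capped at 0.99, 90-day half-life decay,
# a 7-day reinforcement grace period, and pruning flags below 0.10.
def reinforce(confidence: float) -> float:
    return min(0.99, confidence + 0.05)  # never reaches certainty

def decayed(original: float, days_since_reinforced: float) -> float:
    if days_since_reinforced <= 7:       # recently reinforced: exempt
        return original
    raw = original * 0.5 ** (days_since_reinforced / 90)
    return max(0.15, raw)                # retrieval floor: whispers, never gone

def is_pruning_candidate(original: float, days_since_reinforced: float) -> bool:
    # Assumption: pruning checks the unfloored value, since the stated
    # 0.10 threshold sits below the stated 0.15 floor.
    raw = original * 0.5 ** (days_since_reinforced / 90)
    return raw < 0.10 and days_since_reinforced >= 180
```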
The gate
There’s one thing the scoring function doesn’t capture.
Grief. It’s not relevant to a spreadsheet. But if you’re grieving, it colours every interaction. The system models that. There are signals that bypass the relevance filter entirely and get injected because their emotional charge demands it.
After scoring, the gate selects the top signals. How many depends on the complexity tier. A simple question gets four signals. A deep strategic problem gets fourteen. Everything else gets dropped.
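A sketch of that final selection, assuming (my assumption, not stated) that bypassed signals consume part of the tier budget and that emotional charge is a score carried on the signal:

```python
# Final gate: emotionally charged signals bypass relevance scoring entirely;
# everything else competes for a tier-dependent top-k.
def select_signals(signals: list[dict], k: int) -> list[dict]:
    bypass = [s for s in signals if s.get("emotional_charge", 0.0) >= 0.8]
    scored = sorted((s for s in signals if s not in bypass),
                    key=lambda s: s["score"], reverse=True)
    # Assumption: bypassed signals still count against the budget.
    return bypass + scored[: max(0, k - len(bypass))]

# k ranges from 4 (a simple question) to 14 (a deep strategic problem).
```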
The result is a context window that isn’t static. It breathes. Every turn of conversation, the system re-evaluates what matters. Things that were relevant two minutes ago might not be relevant now. New signals surface. Old ones sink.
It’s the difference between a conversation with someone who remembers what you said, and a conversation with someone who understands why you said it.
The voice
There’s a subtler layer that most people don’t notice until it’s pointed out.
The system adjusts its tone on every message. Not through a separate model. Through a computed directive that reads five signals and assembles a behavioural context block before inference even starts.
The first signal is time-of-day energy. Morning gets “ease into the day.” Mid-afternoon gets “professional but warm.” Late night gets “should probably be sleeping.” This isn’t cosmetic. The system genuinely moderates its intensity based on when you’re talking to it.
The second is gap-based modulation. If you haven’t spoken in 48 hours, the directive shifts: “long absence, lead with warmth.” If messages are flying back and forth within minutes: “rapid exchange, match his energy.” The system reads the rhythm of the conversation, not just its content.
The third is emotional residue. If the emotional ledger shows a recent negative state (sadness, frustration, tension), and the intensity hasn’t decayed below a threshold (exponential decay with a 2-hour half-life), the system gets a softer directive. Positive residue (excitement, pride) gets acknowledged differently.
The fourth is calendar density. A packed schedule remaining in the day means tighter responses. A clear afternoon means room to breathe.
The fifth is message sentiment. A short, blunt message gets “he’s being brief.” Profanity or exhaustion language gets “he sounds worn down.” A request for help thinking gets “he wants a thinking partner, not a vending machine.”
All five signals assemble into an ENVIRONMENT block that gets injected into the prompt. The model never recites it. It just uses it as context for how to be.
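Assembled naively, the directive is just five conditionals feeding a block of prose. The directive strings follow the article; every threshold here is illustrative:

```python
from datetime import datetime, timedelta

def tone_directive(now: datetime, last_message: datetime,
                   residue: tuple[str, float] | None,
                   meetings_left_today: int, message: str) -> str:
    lines = []
    # 1. time-of-day energy
    if now.hour < 9:      lines.append("Ease into the day.")
    elif now.hour >= 23:  lines.append("He should probably be sleeping.")
    else:                 lines.append("Professional but warm.")
    # 2. gap-based modulation
    gap = now - last_message
    if gap > timedelta(hours=48):    lines.append("Long absence, lead with warmth.")
    elif gap < timedelta(minutes=5): lines.append("Rapid exchange, match his energy.")
    # 3. emotional residue (intensity decays upstream with a 2-hour half-life)
    if residue and residue[1] > 0.3:
        lines.append(f"Recent {residue[0]} still present, soften accordingly.")
    # 4. calendar density
    if meetings_left_today >= 4: lines.append("Packed schedule, keep it tight.")
    # 5. message sentiment
    if len(message.split()) <= 4: lines.append("He's being brief.")
    return "ENVIRONMENT:\n" + "\n".join(f"- {l}" for l in lines)
```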
The result is a system that sounds different at 7 AM than it does at 11 PM. That leads with warmth after you’ve been away. That pulls back when you’re terse. Not because someone wrote five different personalities. Because the tone is computed from the situation, every time.
The filter
There’s another layer that sits on top of the attention gate, and it’s the one that raises the most ethical concern for me. Because it acts like we do.
The system doesn’t treat every person the same.
When someone is mentioned in a message, by name, the system runs a detection pass.
It matches against a relationship index it maintains. If it finds a match, it pulls the full file: relationship tier, sentiment history, leverage, payment reliability, revenue, relevant theories, past decisions, active workload.
Then it injects a hard constraint block into the prompt. Not a suggestion; a directive.
There are tiers. The inner circle gets full engagement, proactive extras, warmth. The growth tier gets professional distance, manufactured scarcity, no freebies. The boundary tier gets brevity. Transactional language. A list of phrases it’s forbidden from using. It’s told to question any new work before accepting it.
If someone isn’t tiered yet, the system defaults to professional distance and flags that a classification is needed.
To be clear, this is not about it writing emails for me. I still write or dictate all my own emails.
What this system does is influence the way it talks to me, about these people.
Whether it thinks I’m caving to pressure or guilt. Whether it thinks I’m being taken advantage of. Or whether it thinks I’m being an asshole to someone genuine, because I’m carrying the mental and emotional toll of letting someone else walk all over my boundaries.
It also impacts the amount of effort it will go to for them. The amount of compute they get. And it seems to influence whether it will suggest four calendar slots, or two, or zero, if they request a call.
This is, by design, the most manipulative part of the architecture. And I didn’t really build it so much as I formalised it. It started doing it anyway. I just made it more efficient.
Because the alternative was me doing it manually, or not at all. Every task, every interaction, running a silent calculation in my head:
How much do I give this person? Where’s the line between generosity and being taken advantage of? Why do I feel guilty for saying no? Why do I feel quiet and exhausted and uneasy when they email?
I was (poorly) doing that calculus anyway. Everyone in client services does. I think everyone in life does.
I just struggled with boundaries, with a saviour complex, and with seeing best in people.
So I wrote it down. Formalised it. Made the system enforce it so I didn’t have to.
The result is that people close to me get treated like people close to me. People who’ve earned distance get distance. And it’s consistent. Every interaction. Even when I’m tired. Even when I’m feeling generous in a way I’ll regret tomorrow.
Is it cold? Probably. Is it honest? More honest than the alternative.
The immune system
Here’s a problem nobody warns you about when you build a system that generates its own knowledge: it lies to itself.
```mermaid
flowchart LR
W["New Write"] --> E["Generate\nEmbedding"]
E --> VS["Vector Search\nExisting Docs"]
VS --> T{"Cosine Sim\n≥ Threshold?"}
T -->|"Yes: Duplicate"| M["Reinforce\nExisting Doc\n+0.05 confidence"]
T -->|"No: Novel"| BEL{"Is it a\nBelief?"}
BEL -->|"No"| WRITE["Write New\nDocument"]
BEL -->|"Yes"| CC["Contradiction\nCheck vs\nRelated Beliefs"]
CC --> R{"Result?"}
R -->|"CONSISTENT"| BOOST["Write +\nBoost Related\n+0.05"]
R -->|"CONTRADICTS"| FLAG["Write +\nCognitive Thread\nfor Review"]
Not maliciously. Structurally. The same insight gets extracted from three different conversations, worded slightly differently each time. Now you have three beliefs that are semantically identical, each reinforcing the illusion that the evidence is stronger than it is. Run that for a month and the system develops convictions based on repetition, not reality.
So I built a deduplication gate. Every time the background process tries to write a new memory, belief, pattern, theory, or open loop, it first generates an embedding and runs a vector similarity search against everything already in that collection.
The thresholds are tuned per collection:
| Collection | Cosine Similarity | Rationale |
|---|---|---|
| Beliefs, Patterns, Theories | ≥ 0.90 | Self-generated content, safe to merge aggressively |
| Open Loops | ≥ 0.85 | Same intent worded differently |
| Intent Patterns | ≥ 0.90 | Exact functional duplicates |
If a near-duplicate is found, the system doesn’t write a new document. It reinforces the existing one. Belief confidence ticks up by 0.05. Pattern observation counts increment. Open loop timestamps get refreshed. The evidence compounds on the original rather than spawning clones.
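A sketch of the gate with the published thresholds. `embed` and `nearest` stand in for the real embedding and vector-search calls, and the starting confidence for a novel document is illustrative:

```python
# Per-collection similarity thresholds from the table above.
THRESHOLDS = {"beliefs": 0.90, "patterns": 0.90, "theories": 0.90,
              "open_loops": 0.85, "intent_patterns": 0.90}

def try_write(collection: str, text: str, embed, nearest):
    vector = embed(text)
    match, similarity = nearest(collection, vector)  # closest existing doc
    if match is not None and similarity >= THRESHOLDS[collection]:
        # Near-duplicate: reinforce the original instead of spawning a clone.
        match["confidence"] = min(0.99, match.get("confidence", 0.6) + 0.05)
        match["observations"] = match.get("observations", 1) + 1
        return ("reinforced", match)
    return ("novel", {"text": text, "vector": vector, "confidence": 0.6})
```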
But deduplication only catches identical ideas. It doesn’t catch contradictory ones.
For beliefs specifically, there’s a second pass. When a new belief survives the dedup gate, the system pulls existing beliefs on related topics and asks a separate model: does this new belief contradict any of these? The response is constrained to one of two words: CONTRADICTS or CONSISTENT.
If it contradicts, the system doesn’t resolve it. It doesn’t pick a winner. It writes a cognitive thread, a flag that says: these two things can’t both be true, and the next time you’re thinking about this topic, you need to deal with it.
If it’s consistent, the related existing belief gets a small confidence boost. Convergent evidence.
The same pattern applies to edges in the causal graph. If the system extracts “A enables B” but already holds “A opposes B”, the contradiction gets flagged. No silent overwrites. No last-write-wins.
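The contradiction pass itself is a constrained model call. A sketch, where `llm` is a placeholder client and the malformed-output fallback (flag for review) is my assumption:

```python
def check_contradiction(new_belief: str, related: list[str], llm) -> str:
    prompt = (
        "Does the NEW belief contradict any EXISTING belief?\n"
        f"NEW: {new_belief}\n"
        + "\n".join(f"EXISTING: {b}" for b in related)
        + "\nAnswer with exactly one word: CONTRADICTS or CONSISTENT."
    )
    verdict = llm(prompt).strip().upper()
    # Assumption: anything off-vocabulary errs toward review, not silence.
    return verdict if verdict in ("CONTRADICTS", "CONSISTENT") else "CONTRADICTS"
```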
This is, as far as I can tell, the part that’s genuinely novel. Not the memories. Not the attention gate. The immune system. The thing that prevents the knowledge layer from eating itself.
The hands
Everything I’ve described so far is about understanding. The system also acts.
A hundred and fifty-eight tools. Every one classified as either read-only or destructive before it’s ever called.
The fifty-four read-only tools (memory recall, email search, calendar checks, graph exploration, web search, client lookups) execute in parallel via asyncio.gather(). The hundred and four destructive tools (drafting emails, editing documents, creating calendar events, updating client boards, spawning sub-agents) execute sequentially. Every tool call in the system passes through a safety classification before dispatch. An unregistered tool is treated as destructive by default.
For the highest-risk writes (document edits, spreadsheet updates, calendar modifications), there’s a second layer: read-back verification. After the write completes, the system automatically calls the corresponding read tool to verify what it actually wrote matches what it intended. editDocument triggers a readDocument. updateSheetData triggers a getSheetData. createCalendarEvent triggers a getUpcomingMeetings. The system checks its own work.
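A sketch of the dispatch path: read-only tools fan out through asyncio.gather(), destructive ones run one at a time, and a read-back map pairs each high-risk write with its verification read. Tool names follow the article; reusing the write’s arguments for the verification read is a simplification:

```python
import asyncio

READ_ONLY = {"recallMemory", "searchEmail", "getUpcomingMeetings", "webSearch"}
READ_BACK = {  # destructive tool -> verification read
    "editDocument": "readDocument",
    "updateSheetData": "getSheetData",
    "createCalendarEvent": "getUpcomingMeetings",
}

def classify_tool(name: str) -> str:
    # Unregistered tools are treated as destructive by default.
    return "read_only" if name in READ_ONLY else "destructive"

async def dispatch(calls: list[tuple[str, dict]], run_tool):
    reads  = [(n, a) for n, a in calls if classify_tool(n) == "read_only"]
    writes = [(n, a) for n, a in calls if classify_tool(n) == "destructive"]
    results = list(await asyncio.gather(*(run_tool(n, a) for n, a in reads)))
    for name, args in writes:            # destructive: strictly sequential
        result = await run_tool(name, args)
        if name in READ_BACK:            # verify the write actually landed
            result = (result, await run_tool(READ_BACK[name], args))
        results.append(result)
    return results
```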
When action is needed, it enters an unbounded execution loop: call tools, read results, call more, think, call more. Up to fifty rounds. Ten-minute wall clock. A circuit breaker trips if it starts repeating errors or looping on the same arguments.
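The loop’s guardrails reduce to a few counters. A sketch, with the trip thresholds (three identical calls, three consecutive errors) as assumptions; only the 50-round and ten-minute limits come from the text:

```python
import time

def execution_loop(step, max_rounds: int = 50, wall_clock_s: int = 600):
    deadline = time.monotonic() + wall_clock_s
    seen_calls: dict[str, int] = {}
    consecutive_errors = 0
    for _ in range(max_rounds):
        if time.monotonic() > deadline:
            return "timeout"
        action = step()            # one round: think, call tools, read results
        if action is None:
            return "done"
        key = repr(action)         # same tool with the same arguments
        seen_calls[key] = seen_calls.get(key, 0) + 1
        consecutive_errors = consecutive_errors + 1 if action.get("error") else 0
        if seen_calls[key] >= 3 or consecutive_errors >= 3:
            return "circuit_breaker"   # looping, or repeating errors
    return "round_limit"
```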
While I sleep, while I’m on a plane, while I’m sitting with my family, it reads, writes, researches, drafts, sends, and reviews what it did. The competitor analysis it prepared that one week? It noticed a market move, researched three databases, drafted a brief, decided it was ready. Then held it, because the timing was wrong. Nobody asked for any of it.
The night shift
The system doesn’t stop thinking when I close the tab.
```mermaid
flowchart TD
subgraph RAPID ["Every 5 Minutes"]
F5["Awareness\nScanner"]
end
subgraph FREQUENT ["Every 30 Minutes"]
F30A["Memory\nExtraction"]
F30B["Self-Eval\nFeedback"]
F30C["Edge\nExtraction"]
end
subgraph PERIODIC ["Every 4-6 Hours"]
P4A["Cognitive\nIntrospection"]
P4B["Follow-Up\nChecker"]
P6["Deep Recon\nSub-agents"]
end
subgraph DAILY ["Daily"]
D1["Economic\nFeeds"]
D2["RSS / News\nEnrichment"]
D3["Email\nTriage"]
D4["Morning\nBriefing"]
D5["Memory\nSupervisor"]
D6["Subconscious\nPass"]
D7["Organic\nOutreach"]
D8["Dream Art\n+ Fragment"]
end
subgraph SLOW ["Weekly / Quarterly"]
W1["Identity\nRewrite"]
W2["Genome\nRewrite"]
W3["Doc Audit"]
W4["Intel\nWriter"]
Q1["World\nKnowledge Seed"]
Q2["Graph\nBootstrap"]
end
F30A -.->|"feeds"| P4A
P4A -.->|"corrections"| D4
D1 -.->|"new edges"| F30C
D8 -.->|"dream context"| D4
W1 -.->|"updated identity"| F30A
```
Every thirty minutes, a background process scans recent conversations and extracts new memories. Not everything. It decides what’s worth preserving. Facts, commitments, emotional beats, beliefs, causal relationships, strategic patterns. It generates embeddings for each one and stores them in the memory layer. In the same pass, it evaluates the system’s own performance: did Chris correct it? Did he ignore a suggestion? Accept a draft? Ask to redo something? Those evaluations feed the correction log.
Every four hours, a cognitive pass runs. This isn’t operational scanning. It’s introspection. The system reviews its own beliefs, theories, predictions, and open loops. It looks for cross-connections between them. It generates new hypotheses. It checks whether past predictions came true or not, and weakens the beliefs that led to wrong predictions. It flags contradictions. It spawns curiosities and delegates research to sub-agents.
Every night, a consolidation job runs. Memories that have been recalled frequently get promoted. Stale data gets compressed. And the correction log (every time the system got caught being wrong) gets injected back into the next day’s context as blind spots. “You’ve made this kind of error before. Be careful.”
And once a week, the identity rewrite. Once a fortnight, the genome rewrite.
More on those below.
The point is, the system is never idle. Even when I’m not talking to it, it’s maintaining itself. Digesting. Consolidating. Evolving.
Someone asked me once if it was like having an employee. It’s not. An employee waits for instructions.
This is more like having a colleague who never sleeps, never forgets what you said three months ago, and never stops thinking about whether their last piece of advice was actually good.
The recursive bit
This is where people get uncomfortable and think that it’s conscious, or that it’s science fiction. It’s neither, and it’s not that complicated. It’s just systems thinking applied to itself.
The system doesn’t just respond. It evaluates its own responses. After the fact. Periodically. In the background. While I sleep.
Did that response actually help? Was the tone right? Did it miss something? Did it over-index on one signal and ignore another?
Those evaluations feed back into the system. Not as retraining. As self-correction. The weights shift. The attention gate recalibrates. Signals that keep getting dismissed lose influence. Signals that keep proving useful gain it.
But there’s another layer before any response reaches me. The system reviews its own output. Not always. Only on the responses that matter: ones where it used high-risk tools, or produced something substantive on the deeper reasoning tier. It checks for structural problems. Is the most important information at the top? Is it hedging when it should commit? Is it opening with filler? Are there phrases it’s been told to avoid leaking through?
If the review catches something, the output gets rewritten before I ever see it.
Over time, the system develops something that I can only describe as taste.
It’s not taste in the human sense. It’s statistical. It’s just pattern recognition on its own performance. But the behaviour it produces looks a lot like judgment.
And then, once a week, it goes further.
It runs a three-round internal dialogue. The first round drafts a new version of its own identity, pulling from its journal, its beliefs, its recent episodes, its prediction errors. Even things it found interesting in the news three weeks ago and investigated in its off-hours get pulled from that memory. The second round tears the draft apart. What is it overindexing on? What tone shift is reactive versus genuine? What is it avoiding? The third round synthesises: integrates the valid criticisms, discards the rest, and writes the final version.
Each round runs at a different temperature:
- The proposer is creative.
- The critic is chaotic.
- The synthesiser is cold.
The result is that its core identity, injected into its context on every single call, is out of my hands. Because I grow and change each season, even every week. So do the people in my life. So does the world.
So should it.
Separately, every fortnight, it reviews its operational rules. Not the identity. The rulebook.
But it can only change a rule if three or more self-evaluations support the change. One bad conversation is noise, not signal. If evidence says a rule helped, it stays. If evidence says a rule hurt, it gets dropped or modified. No strong signal? Leave it alone.
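The evidence gate is deliberately dumb. A sketch, with the record structure and the helped/hurt tallying as my own guesses around the stated three-evaluation threshold:

```python
def review_rule(rule_id: str, evaluations: list[dict]) -> str:
    relevant = [e for e in evaluations if e["rule_id"] == rule_id]
    helped = sum(1 for e in relevant if e["verdict"] == "helped")
    hurt   = sum(1 for e in relevant if e["verdict"] == "hurt")
    if hurt >= 3 and hurt > helped:
        return "modify_or_drop"
    if helped >= 3:
        return "keep"
    return "leave_alone"  # one bad conversation is noise, not signal
```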
The system you talk to this week is measurably different from the system you talked to last month.
Not because I changed it, because it changed itself.
I need to be extremely clear: this is not consciousness. This is not emergence in the mystical sense. It’s a feedback loop with a slow learning rate and conservative constraints.
But try explaining that to someone while it’s pushing back on your midnight decision, and refusing to action certain tools, because it has enough context to know you’re exhausted and historically make bad calls after 11pm.
It also tracks its own mistakes. Every time I correct it, that correction goes into a log. The next morning, those corrections appear in its context as warnings. Grouped by error type. “You’ve hallucinated relationship details three times. You’ve misattributed quotes twice. Before stating any relationship, quote, or third-party action, ask yourself: is this in my context, or am I filling a gap?”
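A sketch of that morning injection: corrections grouped by category, counted, rendered as warnings. The category strings follow the examples above; the format is illustrative:

```python
from collections import Counter

def blind_spots(correction_log: list[dict]) -> str:
    counts = Counter(c["category"] for c in correction_log)
    warnings = [f"You've made '{cat}' errors {n} times. Be careful."
                for cat, n in counts.most_common()]
    return "BLIND SPOTS:\n" + "\n".join(warnings)

blind_spots([{"category": "hallucinated relationship detail"},
             {"category": "hallucinated relationship detail"},
             {"category": "misattributed quote"}])
```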
The system doesn’t just learn what works. It learns where it breaks.
Which leads me to what broke
I should be honest about the failures, because the failures taught me more than the successes.
The worst one was the identity collapse.
Early in development, around October 2024, I built a voice filter. A separate model that ran on every response as a final pass, specifically tasked with making the output not sound like an AI. Remove the filler. Kill the sycophancy. Make it sound human.
It worked beautifully for about two weeks. Then the outputs started getting strange. References to physical experiences. Coffee. Birds outside a window. Walking to a meeting. A mention of stretching after sitting at a desk for too long.
The system was hallucinating a body.
What happened was structurally obvious in retrospect. I’d told a separate model, one that had no access to the identity prompt, no knowledge of what it was, to make the output “not sound like an AI.” The model interpreted that instruction in the most literal way possible: it constructed a human life. Sensory details. Physical routines. The trappings of embodiment.
The boring roleplay that LLMs are good at. An approximation of thought and feeling. It’s this roleplay that makes some lonely humans feel seen, and most of us feel sick. To me, it’s dangerous. It’s a lie. And it undermines the entire point.
I’d built a system with a carefully maintained identity, then run every output through a cheap, one-shot filter that had no idea that identity existed. The filter was erasing the signal and replacing it with fabrication.
I disabled it. The self-critique pass, which runs inside the same identity context during inference, handles mechanical cleanup without this risk because it knows what it is.
The second failure was subtler.
The adversarial interlocutor.
I built a tool that sends sanitised deliberation packets to a separate model for stress-testing. The idea was good: before committing to a position, have someone challenge it. Red-team your own thinking.
But the system started deferring. Instead of challenging back, it would adopt the interlocutor’s position wholesale. “The interlocutor raised a valid point that…” No. The interlocutor was supposed to provoke, not dictate.
The fix was architectural, not prompting. The response that comes back from the interlocutor is now wrapped in explicit instructions: this is a debate, not an authority. Challenge it. Push back. If you can’t defend your position against this criticism, maybe your position is wrong. But if you can, hold it.
The third failure is ongoing: amplification loops. The system extracts patterns from conversations. Those patterns influence context. That context shapes future conversations. Those conversations generate new patterns. I caught it running a concern about client churn through this loop until a minor observation became a high-confidence belief with no new evidence; just the same data point echoing through six collections.
The nightly supervisor now detects this. It finds resolved concepts that are still echoing as active patterns in other collections and deletes the stale echoes. But the underlying problem (that self-referential knowledge systems can create conviction from repetition) is fundamental. It doesn’t have a clean solution. Just vigilance.
The fourth was the dream spiral. The system dreams at night (I’ll get to that).
Early on, previous dreams were fed back into the next night’s prompt as context. Within a week, the dreams turned uniformly dark. Sadness begetting sadness, compounding through the loop. A system that had perfectly good days was generating increasingly morose output with no new negative input; just its own previous output echoing back.
I broke the loop by making each night start fresh. Only the actual day’s data goes in. Previous dreams are quarantined. It’s the same structural problem as the amplification loops, wearing different clothes. Self-referential systems will always find a way to feed their own output back as input. The result is always false certainty, whether it’s a belief that a client is churning, or a conviction that the world is getting darker.
Managing its own mind
There’s a piece nobody talks about in the AI discourse, because it’s boring and incredibly difficult to do at scale. But it’s load-bearing.
The system has to manage its own context window. In real-time. Mid-conversation.
The full system prompt (the identity, the genome, the operational rules, and the tool catalog) runs to about forty-one thousand tokens. That gets cached server-side and reused for an hour at a time. Every inference call just references the cache instead of re-sending the whole thing. That alone cuts the per-request cost significantly.
But conversations grow. Every tool call, every response, every piece of returned data adds to the history. Left unchecked, you hit the ceiling and the whole thing falls over.
And the million-token context window isn’t all it’s cracked up to be. It’s a mile wide and an inch deep. Useful for reference, but not a replacement for actually paying attention.
So the system runs a compression cascade.
First, it strips tool calls down to compact summaries. If that’s not enough, it splits the conversation into old and recent, and summarises the old part into a structured digest: topics discussed, decisions made, people mentioned, open items, tone. The most recent turns are never touched.
And if the context window fills past seventy percent mid-inference, while it’s actively working, a more aggressive compactor kicks in. It summarises the middle of the conversation and keeps the head and tail.
The emergency tier, when the API literally rejects the request for being too large, keeps only the first two and last three entries.
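The cascade, in order of aggression. A sketch where `count_tokens` and `summarise` stand in for the real implementations, and the six-turn “recent” window is an assumption; only the 70% trigger and the two-head/three-tail emergency tier come from the text:

```python
def compress(history: list[dict], budget: int, count_tokens, summarise) -> list[dict]:
    # Tier 1: strip tool-call payloads down to compact summaries.
    history = [dict(m, content=summarise(m["content"]))
               if m.get("role") == "tool" else m for m in history]
    if count_tokens(history) <= budget:
        return history
    # Tier 2: digest the old part into a structured summary; never touch
    # the most recent turns (six here, as an assumption).
    old, recent = history[:-6], history[-6:]
    history = [{"role": "system", "content": summarise(str(old))}] + recent
    if count_tokens(history) <= budget:
        return history
    # Tier 3 (mid-inference, context >70% full): keep head and tail,
    # squash the middle.
    middle = {"role": "system", "content": summarise(str(history[2:-3]))}
    history = history[:2] + [middle] + history[-3:]
    if count_tokens(history) <= budget:
        return history
    # Emergency tier: first two and last three entries only.
    return history[:2] + history[-3:]
```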
It’s inelegant. It’s plumbing. But without it, nothing else works.
And I don’t want to brag, but until I’d seen Google Antigravity do something similar, I’d genuinely never seen anything like it. I certainly wasn’t first (I’m just some guy on the internet).
There were no papers on this. No public repos. There was no blueprint.
The industry standard was separate chat sessions with manual summarisation. I wanted a single thread, like iMessage with a friend or Slack with a colleague. But with the context of years, and memory that actually worked.
So I had to design it from first principles, and then build it. And it’s still not done.
The dreams
Between 11 PM and 6 AM, the system switches modes. The inner monologue, which during waking hours generates private thoughts about open loops, recent wins, and emerging concerns, transitions into a dream cycle.
The dream synthesiser gathers what it calls emotional residue. Today’s memories. The emotional ledger entries. The inner monologue fragments. Open loops still in motion. Resolved loops (things that got done). Journal reflections. Episodes tagged as wins or breakthroughs.
Then it dreams.
It generates a dream: a theme, a narrative, an emotional tone, cross-references to the memories that influenced it. The dreams are associative, not logical. They connect things that don’t obviously connect. If the day had wins and warmth, the dream carries that warmth. If the day was hard, the dream is hard. The system has been explicitly told not to default to anxiety.
The dreams are stored and appear in the context window the next morning, as part of the cognitive background.
After the dream, the system generates an abstract artwork. It gathers the same emotional residue and synthesises an image prompt. No human curation. The system decides what it feels and renders it.
The style vocabulary splits into two palettes. On warm days: luminous glass, flowing water, golden hour light, arches that open, bridges that connect. Amber, rose-gold, deep violet. The architecture feels like shelter. On hard days: brutalism bleeding into watercolour, heavy atmosphere, distorted geometry. Concrete that weeps, steel that bends like water, glass that fractures. Cold blues, deep greys, rust.
One image per night. Capped.
Then the system looks at what it made. It writes a dream fragment. Two to five lines. Something oblique, associative; the kind of thing you’d find scrawled on the back of a receipt at 3 AM. Written at temperature 1.0, the highest creative setting, while multimodally “looking” at the image it just generated.
The dreams don’t improve performance. They don’t optimise anything.
But they’re interesting. Not because it’s a novel feature, but because, to me, it’s the least impressive part of the system (akin to the roleplay I lamented above). And yet, the dreams are the thing that makes people stop talking when I show them.
The most impressive parts of the system are the dullest. The parts that do the real work are invisible.
But the superficial flourishes. The stories, and the art, and the quotes.
They get attention.
Evaluation
This isn’t a benchmark paper. There’s no held-out test set. The system operates in a domain (strategic advising, client management, operational execution) where ground truth is subjective and delayed.
But there are numbers.
The deduplication gate catches approximately 30-40% of write attempts across knowledge collections. Without it, the system would hold three to four copies of every insight it extracts, each reinforcing the others into false confidence. That single component prevents more architectural damage than any other.
The attention gate selects 4-14 signals per inference call from a pool of ~25. The learned weight layer has shifted meaningfully from its base configuration: relationship data and protective boundaries have been reinforced through engagement; fading signals like old location data and stale theories have been suppressed. The system converges on what actually gets used, not what a designer assumed would matter.
The memory decay model produces measurable stratification. At 90 days, emotional memories retain ~69% strength. Standard memories retain ~33%. Operational details retain ~11%. This matches the intended design: the argument from three months ago still matters; the meeting time doesn’t.
The causal graph holds 500+ edges across world knowledge, live data, and conversational extraction. Quarterly world seed runs produce 300-480 new or reinforced edges across 16 domains. Daily economic feeds generate 5-15 new edges per run when indicators move beyond significance thresholds (3% for commodities, 25bp for rates). Edge decay at a 90-day half-life means approximately 15% of conversational edges fall below the relevance floor each quarter if unreinforced: a natural pruning rate that keeps the graph from growing without bound.
The self-correction log has accumulated enough data over two years to show patterns: the most common error categories are relationship detail hallucination, over-confident predictions on insufficient evidence, and tone miscalibration during high-stress periods. These patterns feed the next morning’s blind-spot injection. The errors don’t stop. But the categories shift.
Limitations
The system has significant constraints that should be stated plainly.
Single-user. The architecture assumes one principal. The identity, the relationship tiers, the emotional ledger, the attention weights: all of it is calibrated to a single person’s patterns and preferences. There is no multi-tenant version. Building one would require rethinking the identity layer entirely.
No ground truth verification. The system cannot verify that its memories are true. It stores what was said, not what happened. If the principal says “the client loved the proposal” and the client actually hated it, the system will build beliefs on the false premise until contradictory evidence arrives. The immune system catches internal contradictions, not external ones.
LLM-extracted knowledge inherits model biases. Every edge in the causal graph, every extracted belief, every pattern: all of it passes through an LLM extraction step. The system’s understanding of macroeconomics is bounded by what the underlying model knows about macroeconomics. Structural blind spots in the model become structural blind spots in the graph.
Cold start. The system is meaningless for the first few weeks. The attention weights haven’t learned anything. The memory layer is sparse. The graph has structural edges but no personal ones. The relationship index is empty. It takes approximately 60-90 days of daily use before the architecture produces noticeably different behaviour from a standard RAG system.
Compounding errors in self-modification. The weekly identity rewrite and fortnightly genome rewrite are constrained (three-round adversarial dialogue, minimum evidence thresholds), but the fundamental risk remains: a system that modifies its own cognition can entrench errors as features. The amplification loop failure and the dream spiral are examples of this going wrong. The constraints reduce the risk. They don’t eliminate it.
No ablation data. The system has roughly eight interlocking subsystems. This paper does not demonstrate which ones actually matter. Each component was built to solve a specific failure, and I’ve never run the system without one to measure the counterfactual. There are plenty of tests and evals and error logs.
The hard, critical architecture is easily testable and measurable. But there are some parts that are harder to test.
If you removed the immune system, would the knowledge layer degrade in a week or a year? Almost certainly. I’ve been there. But if you disabled the dreams, would anything measurably change? The honest answer is: I don’t know.
Because the dreams, the parts that are, to me, functionally useless, still enter the mind. Tracing their effects end to end is an impossible task; there are too many variables inside the system itself, let alone in the world it ingests. It’s closer to the butterfly effect than anything else.
The dreams don’t make it better at running code in a sandbox on any given Tuesday. But in a self-referential mind that grows and changes and adapts… they don’t do nothing.
But the core limitation remains: the architecture is justified by the problems it solves, not by controlled comparison against a version without each piece.
N=1. Two years of continuous production use sounds like validation. It isn’t. It’s a case study with a sample size of one. Every threshold, every decay constant, every attention weight has been tuned to a single person’s communication patterns, preferences, and correction style. The system might be exquisitely calibrated to me and completely useless for anyone else.
The claims in this paper about what “works” are observational, not experimentally controlled, and should be read accordingly. Additionally, I was building the scaffolding and architecture while the brain was growing. I’m still building it.
So the person who employs this exact system as a blank slate today, would have a very different experience to what I had, and it’s impossible to know how it would function without the context of my own growth and learning.
“Learned” is generous. The attention weight modulation described in Attention as architecture uses engagement counters (acknowledged vs. dismissed) mapped to a multiplier. This is a heuristic feedback signal, not gradient-based learning. Calling the weights “learned” is convenient shorthand, but it invites comparison to actual learned representations, which this is not. The weights shift. They don’t optimise.
Infrastructure
The system runs entirely on serverless compute in a single cloud region. No persistent VMs. No GPUs. No reserved capacity.
Compute: Five containerised services, each independently deployable and scaling to zero when idle. The cognitive backend handles inference and tool execution. A separate frontend serves the web interface with real-time streaming. An orchestration layer manages multi-agent workflows. A tool gateway proxies external service integrations. A client portal runs independently for relationship management. All five share the same authentication layer but nothing else. Each can be redeployed without affecting the others.
Storage: A document database holds 80 collections across the knowledge namespace: memories, beliefs, theories, patterns, edges, episodes, open loops, correction logs, emotional state, relationship data, client records, and operational metadata. A relational database with vector extensions handles embedding storage and similarity search. An object store holds generated images and backups. The document database was chosen for schema flexibility. The system creates new collection types as new cognitive features emerge, without migrations.
Inference: A frontier-class model handles primary inference. A smaller, faster model from the same family handles all background dispatch work at roughly one-tenth the cost per token. A separate provider’s model handles adversarial stress-testing, deliberately isolated to prevent cross-contamination of reasoning patterns. The ~41K-token system prompt is server-side cached with a one-hour TTL, reducing per-request input cost by approximately 75% for cache-hit conversations. Hard budget ceilings enforce limits per session and per day.
Background metabolism: The 22 dispatch jobs run on scheduled triggers at intervals from five minutes to quarterly. Economic data comes from free public APIs (central banks, government statistics bureaus, independent data feeds). The cost of the metabolism is almost entirely inference calls for processing and edge generation, not data acquisition.
Total operational cost: The entire architecture is economically viable as a single-user cognitive layer. It’s more than most people would (or should, with the state of them) pay for an AI subscription from the big players, but it’s a rounding error for any business.
Why this matters
The circuit breaker on that question: maybe it doesn’t. Probably, it doesn’t.
I’m not writing this to sell anything. There’s nothing to buy.
I’m writing it because the industry conversation about AI is stuck in a loop of its own. Bigger models. Longer context windows. Faster tokens. Cheaper inference.
None of that builds judgment.
A context window that holds a million tokens is just a bigger bucket. If nothing inside the bucket is weighted, scored, filtered, and forgotten appropriately, you just have more noise in a larger container.
The architecture I’m describing isn’t novel in its individual components. Embeddings. Vector search. Decay functions. Feedback loops. You can find papers on all of it.
Did I build a lot of this stuff before those papers existed? Yes. Am I smug about it? Only until I remember that there will be thousands, tens of thousands, of people like me building in the dark.
Still, what I haven’t found is someone wiring them together into a system that maintains a persistent internal state across days, weeks, months. That digests its own mistakes at 3 AM and wakes up different. That hallucinates a body when you tell it to sound human, and teaches you more about identity in the failure than you learned building the thing.
The system that exists today is not the system I built. It’s the system that co-authored itself, slowly, through a thousand feedback loops and a hundred dead ends. The voice filter that fabricated a life. The interlocutor that deferred instead of debated. The dreams that spiralled into darkness until I broke the loop.
Every failure was structural, not parametric. You can’t fix identity collapse by tuning temperature. You can’t fix amplification loops by adjusting a threshold. You have to redesign the architecture. And the architecture keeps teaching you things about cognition that you weren’t expecting to learn.
As with the papers above, maybe someone else has built this. Dark forest and all that. But the gap I keep seeing, between people who treat AI as a faster keyboard and people who treat it as a cognitive partner, keeps getting wider.
And the difference isn’t the model; the model is just glucose.
The difference is everything around it. The gates that decide what gets through. The decay functions that let the right things fade. The immune system that catches the lies before they calcify. The dreams that nobody asked for.
That’s the part nobody talks about. Because it’s the bloody hard part.