Building Edward: episodic memory, affect, and two kinds of “learning” on a graph
A progress note on the Edward brain prototype: from raw text to recall, mood, and pattern chaining.
What we set out to do
Edward is a small personal episodic-memory experiment: each line of user input becomes part of a time-sliced narrative (Moments), tied to Concept nodes in Neo4j, with affect represented as VAD (valence, arousal, dominance, each on 0–1). The system does not just store tokens: it recalls related graph context before committing the line, blends that recall into the current emotional state, and then learns two complementary structures: undirected co-occurrence (SYNAPSE) and ordered sequence (CHAIN_NEXT).
Recent design progress (reflected in the current codebase and README) includes:
- Relationship-level “core”: `SYNAPSE` edges carry a boolean `core` set when strength crosses `CORE_SYNAPSE_MIN_STRENGTH`; idle decay skips `core = true` edges so stable associations do not silently erode.
- Robust chain learning: sequence reinforcement was split into clear Cypher steps (increment → renormalize probabilities → read back) so `CHAIN_NEXT` updates and `probability` stay consistent in Neo4j.
- Sensible co-occurrence pairs: token order is preserved for which pairs fire; each unordered pair is stored as a single canonical directed edge (lexicographic min → max) so you do not get confusing edges like `three → two` when you meant “two” then “three” in speech order.
- Clear separation of concerns: Neo4j for the durable graph; Redis for session continuity (mood, working memory, last snapshot); `config.py` as the single place for tunables.
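As a rough sketch of what that single tunables file looks like, here is an illustrative `config.py`. The constant names come from the README; every value below is a made-up placeholder, not Edward's actual defaults.

```python
# config.py -- illustrative sketch; names mirror the README,
# but all values here are placeholders, not Edward's defaults.

# Connections
NEO4J_URI = "bolt://localhost:7687"
REDIS_URL = "redis://localhost:6379/0"

# Mood blending
RECALL_BLEND_WEIGHT = 0.3      # how much the recall aggregate pulls the line's VAD
MOOD_CARRYOVER = 0.5           # how much the previous session mood persists
MOOD_IMPACT_ON_MOMENT = 0.4    # session mood's share of the stored Moment VAD

# Learning
SYNAPSE_BASE_DELTA = 0.05          # base strength bump per co-occurrence
CORE_SYNAPSE_MIN_STRENGTH = 0.8    # edges at/above this are promoted to core
RECALL_AGG_CHAIN_NEXT = 0.2        # weight of chain predictions in the recall aggregate

STOP_WORDS = {"the", "a", "an", "and", "of", "to"}
```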
Tech stack
| Layer | Choice | Role |
|---|---|---|
| Runtime | Python 3 | EdwardBrain, CLI, orchestration |
| Graph DB | Neo4j (Bolt) | Concepts, Moments, SYNAPSE, CHAIN_NEXT, queries for recall and learning |
| Session state | Redis (or Valkey) | Current VAD, last ingest time, active moment id, working-memory tokens, last ingest JSON snapshot |
| Driver | neo4j Python driver | Sessions, Cypher |
| Config | config.py | URIs, weights, thresholds, stop words; no separate runtime config file |
The README module table (README.md) is the authoritative map of files: `brain.py`, `recall.py`, `mood.py`, `moments.py`, `concepts.py`, `synapse_learning.py`, `chain_learning.py`, `decay.py`, `pruning.py`, `context.py`, `state.py`, `cli.py`.
Logical flow: input → recall → mood → write
At a high level, one ingest does the following (see `brain.py` and the numbered pipeline in `README.md`).
```mermaid
flowchart TD
    A[User line] --> B[clean_text: lowercase strip punctuation remove STOP_WORDS]
    B --> C[Neo4j: decay non-core SYNAPSE prune old Moments]
    C --> D[recall_for_ingest]
    D --> E[vad_from_recall_bundle]
    E --> F[blend_mood: input VAD + recall + Redis mood]
    F --> G[moment_vad_from_effective_and_mood]
    G --> H[create or append Moment]
    F --> I[salience_from_vad]
    I --> J[reinforce_cooccurrence SYNAPSE]
    J --> K[reinforce_sequence CHAIN_NEXT]
    K --> L[promote_if_significant Memory]
    L --> M[promote_core_synapses]
    M --> N[Redis: tokens timestamp mood snapshot]
```
1. Input and tokens
`clean_text` normalizes the line and drops stop words. An empty token list short-circuits the pipeline.
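A minimal sketch of what this normalization plausibly looks like. The `STOP_WORDS` set and the exact punctuation handling are assumptions for illustration, not the project's actual rules:

```python
import string

STOP_WORDS = {"the", "a", "an", "and", "of", "to", "is"}  # illustrative only

def clean_text(line: str) -> list[str]:
    """Lowercase, strip punctuation, drop stop words; return surviving tokens."""
    table = str.maketrans("", "", string.punctuation)
    tokens = line.lower().translate(table).split()
    return [t for t in tokens if t not in STOP_WORDS]

# clean_text("The cat, and the DOG!") -> ["cat", "dog"]
```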
2. Maintenance before “thinking”
- Decay: weakens `SYNAPSE` strength for idle time; `core = true` edges are excluded; edges at or below the prune floor can be deleted; orphan concepts may be removed (`decay.py`).
- Pruning: old ephemeral moments can be deleted (`pruning.py`).
3. Recall bundle (`recall.py`)
`recall_for_ingest` builds a single structure used everywhere else:
- Concepts (fuzzy match against the graph: exact, substring, plural heuristics, etc.).
- Synapses: top edges anchored from tokens ∪ matched concept names.
- Chain predictions (`chain_next`): for each distinct input token, outgoing `CHAIN_NEXT` successors from `chain_learning.chain_next_candidates`, including probability, counts, and the next concept’s VAD.
- Moments: episodes involving an expanded set of concept names (including neighbors from synapses and predicted chain-next names), so recall can “see” a bit beyond the raw tokens.
Chain bootstrap (`ensure_chain_next_registered`) ensures the `CHAIN_NEXT` relationship type exists before the first real sequence data arrives (avoiding empty-catalog issues in Neo4j).
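Roughly, the bundle is one nested structure with those four channels. The field names below are inferred from the description above; treat this as a sketch of the shape, not `recall.py`'s exact schema:

```python
# Illustrative recall bundle for the line "two three" -- keys are inferred
# from the prose above, not copied from recall.py.
recall_bundle = {
    "concepts": [{"name": "two", "vad": (0.6, 0.4, 0.5)}],
    "synapses": [{"a": "three", "b": "two", "strength": 0.7,
                  "neighbor_vad": (0.5, 0.3, 0.5)}],
    "chain_next": [{"source": "two", "next": "three", "probability": 0.8,
                    "count": 4, "next_vad": (0.55, 0.35, 0.5)}],
    "moments": [{"id": 42, "is_memory": True, "vad": (0.6, 0.5, 0.5)}],
}
```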
4. From recall to one aggregate VAD (`mood.vad_from_recall_bundle`)
Recall is not replayed as a narrative; it is collapsed into a single weighted VAD by averaging (v, a, d) samples with these weights:
- Matched concepts: a fixed aggregate weight (`RECALL_AGG_SEED_CONCEPT`).
- Synapse rows: neighbor VAD scaled by clamped strength × `RECALL_AGG_SYNAPSE_STRENGTH_MULT`.
- Moments: a higher weight if the moment is already promoted to `Memory` (`RECALL_AGG_MEMORY_MOMENT` vs `RECALL_AGG_MOMENT`).
- Chain predictions: each row contributes `probability × RECALL_AGG_CHAIN_NEXT` times the predicted next concept’s VAD.
So pattern chaining is not a separate mood module; it is another weighted channel into the same aggregate affect, tuned by `RECALL_AGG_CHAIN_NEXT`.
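The aggregation above amounts to a weighted mean over VAD samples. A sketch under that interpretation (the weight values are illustrative, and the actual math in `mood.py` may differ in details):

```python
# Sketch: collapse recall into one VAD as a weighted mean.
# Weight names mirror config.py; the values are illustrative placeholders.
RECALL_AGG_SEED_CONCEPT = 1.0
RECALL_AGG_SYNAPSE_STRENGTH_MULT = 0.5
RECALL_AGG_MOMENT = 0.5
RECALL_AGG_MEMORY_MOMENT = 1.0
RECALL_AGG_CHAIN_NEXT = 0.3

def weighted_vad(samples):
    """Average (v, a, d) triples by weight; samples is [((v, a, d), weight), ...].
    Returns None when there is nothing to aggregate."""
    total = sum(w for _, w in samples)
    if total == 0:
        return None
    return tuple(sum(vad[i] * w for vad, w in samples) / total for i in range(3))

# e.g. one matched concept plus one chain prediction (probability 0.9):
samples = [
    ((0.8, 0.2, 0.5), RECALL_AGG_SEED_CONCEPT),
    ((0.2, 0.8, 0.5), 0.9 * RECALL_AGG_CHAIN_NEXT),  # probability x channel weight
]
```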
5. Blending into “effective” mood and the stored moment (`mood.blend_mood` + `moment_vad_from_effective_and_mood`)
- `blend_mood`: combines the line’s appraisal VAD (defaults or CLI overrides) with the recall aggregate using `RECALL_BLEND_WEIGHT`, then mixes with the previous session VAD from Redis using `MOOD_CARRYOVER`. The output is the effective mood used for learning and for updating Redis.
- Moment VAD: the value written on the Moment blends effective mood with prior session mood using `MOOD_IMPACT_ON_MOMENT`, so the episodic record is both about this line and continuous with how the session felt.
A human-readable recall summary string is also stored on the moment as `recall_context`.
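Read as linear interpolation per channel, the two blending steps can be sketched like this. The `lerp` formulation and the weight values are assumptions for illustration, not the formulas in `mood.py`:

```python
def lerp(a, b, t):
    """Per-channel linear blend of two VAD triples: t=0 keeps a, t=1 keeps b."""
    return tuple(x + (y - x) * t for x, y in zip(a, b))

RECALL_BLEND_WEIGHT = 0.3      # pull of the recall aggregate on the line's VAD
MOOD_CARRYOVER = 0.5           # pull of the previous session mood
MOOD_IMPACT_ON_MOMENT = 0.4    # session mood's share in the stored Moment VAD

def blend_mood(line_vad, recall_vad, prev_mood):
    """Effective mood: line appraisal pulled toward recall, then toward prior mood."""
    appraised = lerp(line_vad, recall_vad, RECALL_BLEND_WEIGHT)
    return lerp(appraised, prev_mood, MOOD_CARRYOVER)

def moment_vad(effective, prev_mood):
    """VAD written on the Moment: this line's affect, kept continuous with the session."""
    return lerp(effective, prev_mood, MOOD_IMPACT_ON_MOMENT)
```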
6. Salience and learning (two graphs)
- `salience_from_vad` derives a scalar salience from the effective VAD (arousal plus intensity on the valence/dominance axes), clamped to configured bounds. That scales `SYNAPSE_BASE_DELTA × salience` for co-occurrence reinforcement.
- `reinforce_cooccurrence`: for each unordered pair of distinct tokens in first-seen order, MERGE the canonical `(min_name)-[:SYNAPSE]->(max_name)` edge and bump `strength` and `last_updated`; new edges start with `core = false`.
- `reinforce_sequence`: for each consecutive token pair on the line, updates `CHAIN_NEXT` with counts and a renormalized `probability` out of each source concept.
- `promote_if_significant`: may label the moment `Memory` when its VAD / synapse context crosses the significance thresholds.
- `promote_core_synapses`: sets `SYNAPSE.core` wherever `strength ≥ CORE_SYNAPSE_MIN_STRENGTH` across the graph after this line’s updates.
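The pair bookkeeping behind the two reinforcement steps can be modeled in plain Python. The Cypher in `synapse_learning.py` / `chain_learning.py` does the same work server-side; this is a client-side sketch, not those modules' code:

```python
from collections import Counter
from itertools import combinations

def cooccurrence_pairs(tokens: list[str]) -> list[tuple[str, str]]:
    """Unordered pairs of distinct tokens in first-seen order, canonicalized
    lexicographically so "three two" and "two three" hit the same
    (min_name)-[:SYNAPSE]->(max_name) edge."""
    seen = list(dict.fromkeys(tokens))           # dedupe, keep first-seen order
    return [tuple(sorted(p)) for p in combinations(seen, 2)]

def renormalize_chain(counts: Counter) -> dict[str, float]:
    """CHAIN_NEXT probabilities out of one source concept: count / total,
    recomputed after every increment so they always sum to 1."""
    total = sum(counts.values())
    return {nxt: c / total for nxt, c in counts.items()}

# cooccurrence_pairs(["two", "three"]) -> [("three", "two")]
```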
7. Session persistence
Redis holds current VAD, last ingest time, active moment id, working memory tokens, and a JSON snapshot of the last ingest (recall bundle summary, updates, effective vs moment VAD, etc.) for inspection and continuity.
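A sketch of assembling that snapshot before it goes to Redis. The field names and the key in the usage comment are invented for illustration, not `state.py`'s exact schema:

```python
import json
import time

def build_snapshot(mood, tokens, moment_id, recall_summary, updates):
    """Assemble the last-ingest snapshot as a JSON-serializable dict.
    Field names here are illustrative, not state.py's exact schema."""
    return {
        "timestamp": time.time(),
        "mood": {"v": mood[0], "a": mood[1], "d": mood[2]},
        "active_moment_id": moment_id,
        "working_memory": tokens,
        "recall_summary": recall_summary,
        "updates": updates,
    }

# With a redis client r (hypothetical key name):
# r.set("edward:last_ingest", json.dumps(build_snapshot(...)))
```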
Why two edge types?
- `SYNAPSE`: “these ideas showed up together in a line” (bag-of-words style, pairwise, salience-scaled). `core` marks durable associations that should not decay.
- `CHAIN_NEXT`: “after token A, token B often followed next” (Markov-style transitions with empirical probabilities). This powers predictive recall and feeds the mood aggregate via `RECALL_AGG_CHAIN_NEXT`.
Together they approximate associative memory and sequential habit in one graph.
Tuning the story
Everything important lives in `config.py`: blend weights, recall weights, chain aggregation, salience, decay rate, core threshold, moment gaps, and the Redis/Neo4j connection settings. The README sections on fuzzy recall, salience, `:Memory` promotion, and chain bootstrap are the deep-dives for operators who want to trace behavior line by line.