Building Edward: episodic memory, affect, and two kinds of “learning” on a graph
A progress note on the Edward brain prototype: from raw text to recall, mood, and pattern chaining.
What we set out to do
Edward is a small personal episodic-memory experiment: each line of user input becomes part of a time-sliced narrative (Moments), tied to Concept nodes in Neo4j, with affect represented as VAD (valence, arousal, dominance, each on 0–1). The system does not just store tokens: it recalls related graph context before committing the line, blends that recall into the current emotional state, and then learns two complementary structures: undirected co-occurrence (SYNAPSE) and ordered sequence (CHAIN_NEXT).
Recent design progress (reflected in the current codebase and README) includes:
- Relationship-level “core”: `SYNAPSE` edges carry a boolean `core` set when strength crosses `CORE_SYNAPSE_MIN_STRENGTH`; idle decay skips `core = true` edges so stable associations do not silently erode.
- Robust chain learning: sequence reinforcement was split into clear Cypher steps (increment → renormalize probabilities → read back) so `CHAIN_NEXT` updates and `probability` stay consistent in Neo4j.
- Sensible co-occurrence pairs: token order is preserved for which pairs fire; each unordered pair is stored as a single canonical directed edge (lexicographic min → max) so you do not get confusing edges like `three → two` when you meant “two” then “three” in speech order.
- Clear separation of concerns: Neo4j for the durable graph; Redis for session continuity (mood, working memory, last snapshot); `config.py` as the single place for tunables.
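As a rough sketch of what that single tunables file looks like, here is an illustrative `config.py`. The constant names come from the README; every value below is a made-up placeholder, not Edward's actual defaults.

```python
# config.py -- illustrative sketch; names mirror the README,
# but all values here are placeholders, not Edward's defaults.

# Connections
NEO4J_URI = "bolt://localhost:7687"
REDIS_URL = "redis://localhost:6379/0"

# Mood blending
RECALL_BLEND_WEIGHT = 0.3      # how much the recall aggregate pulls the line's VAD
MOOD_CARRYOVER = 0.5           # how much the previous session mood persists
MOOD_IMPACT_ON_MOMENT = 0.4    # session mood's share of the stored Moment VAD

# Learning
SYNAPSE_BASE_DELTA = 0.05          # base strength bump per co-occurrence
CORE_SYNAPSE_MIN_STRENGTH = 0.8    # edges at/above this are promoted to core
RECALL_AGG_CHAIN_NEXT = 0.2        # weight of chain predictions in the recall aggregate

STOP_WORDS = {"the", "a", "an", "and", "of", "to"}
```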
Tech stack
| Layer | Choice | Role |
|---|---|---|
| Runtime | Python 3 | EdwardBrain, CLI, orchestration |
| Graph DB | Neo4j (Bolt) | Concepts, Moments, SYNAPSE, CHAIN_NEXT, queries for recall and learning |
| Session state | Redis (or Valkey) | Current VAD, last ingest time, active moment id, working-memory tokens, last ingest JSON snapshot |
| Driver | neo4j Python driver | Sessions, Cypher |
| Config | config.py | URIs, weights, thresholds, stop words; no separate runtime config file |
The README module table (README.md) is the authoritative map of files: `brain.py`, `recall.py`, `mood.py`, `moments.py`, `concepts.py`, `synapse_learning.py`, `chain_learning.py`, `decay.py`, `pruning.py`, `context.py`, `state.py`, `cli.py`.
Logical flow: input → recall → mood → write
At a high level, one ingest does the following (see `brain.py` and the numbered pipeline in `README.md`).
```mermaid
flowchart TD
    A[User line] --> B[clean_text: lowercase strip punctuation remove STOP_WORDS]
    B --> C[Neo4j: decay non-core SYNAPSE prune old Moments]
    C --> D[recall_for_ingest]
    D --> E[vad_from_recall_bundle]
    E --> F[blend_mood: input VAD + recall + Redis mood]
    F --> G[moment_vad_from_effective_and_mood]
    G --> H[create or append Moment]
    F --> I[salience_from_vad]
    I --> J[reinforce_cooccurrence SYNAPSE]
    J --> K[reinforce_sequence CHAIN_NEXT]
    K --> L[promote_if_significant Memory]
    L --> M[promote_core_synapses]
    M --> N[Redis: tokens timestamp mood snapshot]
```
1. Input and tokens
`clean_text` normalizes the line and drops stop words. An empty token list short-circuits the pipeline.
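A minimal sketch of what this normalization plausibly looks like. The `STOP_WORDS` set and the exact punctuation handling are assumptions for illustration, not the project's actual rules:

```python
import string

STOP_WORDS = {"the", "a", "an", "and", "of", "to", "is"}  # illustrative only

def clean_text(line: str) -> list[str]:
    """Lowercase, strip punctuation, drop stop words; return surviving tokens."""
    table = str.maketrans("", "", string.punctuation)
    tokens = line.lower().translate(table).split()
    return [t for t in tokens if t not in STOP_WORDS]

# clean_text("The cat, and the DOG!") -> ["cat", "dog"]
```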
2. Maintenance before “thinking”
- Decay: weakens `SYNAPSE` strength for idle time; `core = true` edges are excluded; edges at or below the prune floor can be deleted; orphan concepts may be removed (`decay.py`).
- Pruning: old ephemeral moments can be deleted (`pruning.py`).
3. Recall bundle (`recall.py`)
`recall_for_ingest` builds a single structure used everywhere else:
- Concepts (fuzzy match against the graph: exact, substring, plural heuristics, etc.).
- Synapses: top edges anchored from tokens ∪ matched concept names.
- Chain predictions (`chain_next`): for each distinct input token, outgoing `CHAIN_NEXT` successors from `chain_learning.chain_next_candidates`, including probability, counts, and the next concept’s VAD.
- Moments: episodes involving an expanded set of concept names (including neighbors from synapses and predicted chain-next names), so recall can “see” a bit beyond the raw tokens.
Chain bootstrap (`ensure_chain_next_registered`) ensures the `CHAIN_NEXT` relationship type exists before the first real sequence data arrives (avoiding empty-catalog issues in Neo4j).
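Roughly, the bundle is one nested structure with those four channels. The field names below are inferred from the description above; treat this as a sketch of the shape, not `recall.py`'s exact schema:

```python
# Illustrative recall bundle for the line "two three" -- keys are inferred
# from the prose above, not copied from recall.py.
recall_bundle = {
    "concepts": [{"name": "two", "vad": (0.6, 0.4, 0.5)}],
    "synapses": [{"a": "three", "b": "two", "strength": 0.7,
                  "neighbor_vad": (0.5, 0.3, 0.5)}],
    "chain_next": [{"source": "two", "next": "three", "probability": 0.8,
                    "count": 4, "next_vad": (0.55, 0.35, 0.5)}],
    "moments": [{"id": 42, "is_memory": True, "vad": (0.6, 0.5, 0.5)}],
}
```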
4. From recall to one aggregate VAD (`mood.vad_from_recall_bundle`)
Recall is not replayed as a narrative; it is collapsed into a single weighted VAD by averaging (v, a, d) samples with these weights:
- Matched concepts: a fixed aggregate weight (`RECALL_AGG_SEED_CONCEPT`).
- Synapse rows: neighbor VAD scaled by clamped strength × `RECALL_AGG_SYNAPSE_STRENGTH_MULT`.
- Moments: a higher weight if the moment is already promoted to `Memory` (`RECALL_AGG_MEMORY_MOMENT` vs `RECALL_AGG_MOMENT`).
- Chain predictions: each row contributes `probability × RECALL_AGG_CHAIN_NEXT` times the predicted next concept’s VAD.
So pattern chaining is not a separate mood module; it is another weighted channel into the same aggregate affect, tuned by `RECALL_AGG_CHAIN_NEXT`.
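The aggregation above amounts to a weighted mean over VAD samples. A sketch under that interpretation (the weight values are illustrative, and the actual math in `mood.py` may differ in details):

```python
# Sketch: collapse recall into one VAD as a weighted mean.
# Weight names mirror config.py; the values are illustrative placeholders.
RECALL_AGG_SEED_CONCEPT = 1.0
RECALL_AGG_SYNAPSE_STRENGTH_MULT = 0.5
RECALL_AGG_MOMENT = 0.5
RECALL_AGG_MEMORY_MOMENT = 1.0
RECALL_AGG_CHAIN_NEXT = 0.3

def weighted_vad(samples):
    """Average (v, a, d) triples by weight; samples is [((v, a, d), weight), ...].
    Returns None when there is nothing to aggregate."""
    total = sum(w for _, w in samples)
    if total == 0:
        return None
    return tuple(sum(vad[i] * w for vad, w in samples) / total for i in range(3))

# e.g. one matched concept plus one chain prediction (probability 0.9):
samples = [
    ((0.8, 0.2, 0.5), RECALL_AGG_SEED_CONCEPT),
    ((0.2, 0.8, 0.5), 0.9 * RECALL_AGG_CHAIN_NEXT),  # probability x channel weight
]
```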
5. Blending into “effective” mood and the stored moment (`mood.blend_mood` + `moment_vad_from_effective_and_mood`)
- `blend_mood`: combines the line’s appraisal VAD (defaults or CLI overrides) with the recall aggregate using `RECALL_BLEND_WEIGHT`, then mixes with the previous session VAD from Redis using `MOOD_CARRYOVER`. The output is the effective mood used for learning and for updating Redis.
- Moment VAD: the value written on the Moment blends effective mood with prior session mood using `MOOD_IMPACT_ON_MOMENT`, so the episodic record is both about this line and continuous with how the session felt.
A human-readable recall summary string is also stored on the moment as `recall_context`.
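Read as linear interpolation per channel, the two blending steps can be sketched like this. The `lerp` formulation and the weight values are assumptions for illustration, not the formulas in `mood.py`:

```python
def lerp(a, b, t):
    """Per-channel linear blend of two VAD triples: t=0 keeps a, t=1 keeps b."""
    return tuple(x + (y - x) * t for x, y in zip(a, b))

RECALL_BLEND_WEIGHT = 0.3      # pull of the recall aggregate on the line's VAD
MOOD_CARRYOVER = 0.5           # pull of the previous session mood
MOOD_IMPACT_ON_MOMENT = 0.4    # session mood's share in the stored Moment VAD

def blend_mood(line_vad, recall_vad, prev_mood):
    """Effective mood: line appraisal pulled toward recall, then toward prior mood."""
    appraised = lerp(line_vad, recall_vad, RECALL_BLEND_WEIGHT)
    return lerp(appraised, prev_mood, MOOD_CARRYOVER)

def moment_vad(effective, prev_mood):
    """VAD written on the Moment: this line's affect, kept continuous with the session."""
    return lerp(effective, prev_mood, MOOD_IMPACT_ON_MOMENT)
```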
6. Salience and learning (two graphs)
- `salience_from_vad` derives a scalar salience from the effective VAD (arousal plus intensity on the valence/dominance axes), clamped to configured bounds. That scales `SYNAPSE_BASE_DELTA × salience` for co-occurrence reinforcement.
- `reinforce_cooccurrence`: for each unordered pair of distinct tokens in first-seen order, MERGE the canonical `(min_name)-[:SYNAPSE]->(max_name)` edge and bump `strength` and `last_updated`; new edges start with `core = false`.
- `reinforce_sequence`: for each consecutive token pair on the line, updates `CHAIN_NEXT` with counts and a renormalized `probability` out of each source concept.
- `promote_if_significant`: may label the moment `Memory` when its VAD / synapse context crosses the significance thresholds.
- `promote_core_synapses`: sets `SYNAPSE.core` wherever `strength ≥ CORE_SYNAPSE_MIN_STRENGTH` across the graph after this line’s updates.
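The pair bookkeeping behind the two reinforcement steps can be modeled in plain Python. The Cypher in `synapse_learning.py` / `chain_learning.py` does the same work server-side; this is a client-side sketch, not those modules' code:

```python
from collections import Counter
from itertools import combinations

def cooccurrence_pairs(tokens: list[str]) -> list[tuple[str, str]]:
    """Unordered pairs of distinct tokens in first-seen order, canonicalized
    lexicographically so "three two" and "two three" hit the same
    (min_name)-[:SYNAPSE]->(max_name) edge."""
    seen = list(dict.fromkeys(tokens))           # dedupe, keep first-seen order
    return [tuple(sorted(p)) for p in combinations(seen, 2)]

def renormalize_chain(counts: Counter) -> dict[str, float]:
    """CHAIN_NEXT probabilities out of one source concept: count / total,
    recomputed after every increment so they always sum to 1."""
    total = sum(counts.values())
    return {nxt: c / total for nxt, c in counts.items()}

# cooccurrence_pairs(["two", "three"]) -> [("three", "two")]
```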
7. Session persistence
Redis holds current VAD, last ingest time, active moment id, working memory tokens, and a JSON snapshot of the last ingest (recall bundle summary, updates, effective vs moment VAD, etc.) for inspection and continuity.
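A sketch of assembling that snapshot before it goes to Redis. The field names and the key in the usage comment are invented for illustration, not `state.py`'s exact schema:

```python
import json
import time

def build_snapshot(mood, tokens, moment_id, recall_summary, updates):
    """Assemble the last-ingest snapshot as a JSON-serializable dict.
    Field names here are illustrative, not state.py's exact schema."""
    return {
        "timestamp": time.time(),
        "mood": {"v": mood[0], "a": mood[1], "d": mood[2]},
        "active_moment_id": moment_id,
        "working_memory": tokens,
        "recall_summary": recall_summary,
        "updates": updates,
    }

# With a redis client r (hypothetical key name):
# r.set("edward:last_ingest", json.dumps(build_snapshot(...)))
```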
Why two edge types?
- `SYNAPSE`: “these ideas showed up together in a line” (bag-of-words style, pairwise, salience-scaled). `core` marks durable associations that should not decay.
- `CHAIN_NEXT`: “after token A, token B often followed next” (Markov-style transitions with empirical probabilities). This powers predictive recall and feeds the mood aggregate via `RECALL_AGG_CHAIN_NEXT`.
Together they approximate associative memory and sequential habit in one graph.
Tuning the story
Everything important lives in `config.py`: blend weights, recall weights, chain aggregation, salience, decay rate, core threshold, moment gaps, and the Redis/Neo4j connection settings. The README sections on fuzzy recall, salience, `:Memory` promotion, and chain bootstrap are the deep-dives for operators who want to trace behavior line by line.