Core
- In production agent memory, the recurring complaint is that “vector search just doesn’t find it” when users rephrase the same intent.
- Genesys Memory System replaces flat vector storage with a causal graph, so memories are retrieved by relationships and not just by embedding proximity. Put another way, Genesys extends Graph RAG into agent memory with causal reasoning.
- You’ll learn what changes, what the “89.9% on LoCoMo” signal implies, and how to run a memory service locally with Docker Compose.
Introduction
Agent “memory” sounds straightforward: store past user facts and retrieve them later. In production, it’s messier. Users don’t repeat themselves the same way twice, and the most common implementation—flat vector storage with nearest-neighbor search—often fails at the exact moment you need it: when the query is phrased differently.
This is the motivation behind the Genesys Memory System, which positions itself as an alternative to flat vector storage by modeling memories as a causal graph. Instead of betting everything on embedding similarity, it tries to retrieve based on relationships and causal structure—closer to how “why” and “because” show up in real conversations.
In this post, you’ll map the failure mode (“vector search just doesn’t find it”) to concrete retrieval behavior, understand what “causal graph vs flat vector storage” changes in practice, interpret the “Genesys Memory System LoCoMo benchmark” claim (89.9% and +22 points vs Mem0), and package a minimal memory service with Docker for local iteration.
Hook: Why “vector search just doesn’t find it”
Flat vector storage typically means: embed each memory as a vector, dump it into a vector index, and retrieve top-k by cosine similarity. This works well when the query is semantically close to the stored text. It fails when the user asks for the same underlying fact through a different framing, different entities, or different causal direction.
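To make the flat baseline concrete, here is a minimal sketch of top-k retrieval by cosine similarity. The three-dimensional "embeddings" and memory ids are made-up values for illustration; a real system would use an embedding model and an ANN index rather than a linear scan.

```python
import math
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query_vec: List[float], index: Dict[str, List[float]], k: int) -> List[Tuple[str, float]]:
    """Score every stored vector against the query and keep the k best."""
    scored = [(mem_id, cosine(query_vec, vec)) for mem_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy index: memory id -> embedding (values are invented)
index = {
    "diet-note": [0.9, 0.1, 0.0],
    "db-decision": [0.0, 0.2, 0.9],
    "generic-preferences": [0.5, 0.5, 0.5],
}
query = [0.8, 0.2, 0.1]
for mem_id, score in top_k(query, index, k=2):
    print(f"{mem_id}: {score:.3f}")
```

Note that the generic memory ranks second here purely because its vector sits near everything. Each memory is scored in isolation, so nothing in this loop can express "this memory explains that one".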
Common production failure patterns:
- Rephrasing drift: “What did I say about my diet?” vs “Remind me what I’m avoiding these days.” Same intent, different lexical anchors.
- Causal inversion: “Why did we pick Postgres?” vs “What constraints made us avoid DynamoDB?” The relevant memory is about constraints/decisions, not the literal technology name.
- Multi-hop recall: The answer requires chaining: user preference → project constraint → decision. Flat top-k often returns one hop, not the chain.
In agent memory, the hard part isn’t storing text—it’s retrieving the right fact when the user’s wording changes and the “reason” matters more than the “phrase.”
When teams say “vector search just doesn’t find it,” they’re usually describing one of two things: (1) the correct memory isn’t in top-k, or (2) the correct memory is present but not salient enough for the model to use because it’s missing connecting context. Both are retrieval problems, not generation problems.
What Genesys changes: causal graph vs flat vectors
The core shift in Genesys is the storage and retrieval primitive. Rather than treating each memory as an independent point in a vector space (flat vector storage), Genesys models memories as nodes and relationships as edges in a causal graph. The goal is to retrieve not only “similar” snippets, but also the causal neighborhood that explains why something is true or relevant.
Flat vector storage: what it optimizes for
Flat vector storage is optimized for semantic proximity between a query and a chunk. In practice, it tends to:
- Return high-similarity paraphrases (good for FAQ-like recall).
- Miss causal/relational relevance when similar words aren’t present.
- Over-retrieve generic memories (“preferences”, “project”) that embed similarly.
Causal graph memory: what it optimizes for
A causal graph representation aims to make retrieval robust to rephrasing by anchoring on relationships: decisions, motivations, constraints, outcomes, and dependencies. Instead of “find the closest chunk,” retrieval can look like:
- Identify candidate nodes related to the query intent (entities, events, decisions).
- Traverse edges to pull supporting context (causes, effects, prerequisites).
- Return a structured bundle: key memory + justification path.
That “justification path” is the practical difference. In production, you want the agent to answer with the relevant fact and the supporting context that makes the fact usable and less hallucination-prone.
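The three retrieval steps can be sketched as a bounded breadth-first search that remembers the edges it walked. This is an illustration under stated assumptions, not the Genesys API: the node ids, the `EDGES` list, and the `retrieve_bundle` helper are all hypothetical.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

Edge = Tuple[str, str, str]  # (source node, edge type, target node)

# Hypothetical chain: user preference -> project constraint -> decision
EDGES: List[Edge] = [
    ("preference:sql-dashboards", "MOTIVATES", "constraint:sql-reporting"),
    ("constraint:sql-reporting", "MOTIVATES", "decision:use-postgres"),
]


def build_adjacency(edges: List[Edge]) -> Dict[str, List[Tuple[str, str]]]:
    """Index edges by source node for fast neighbor lookup."""
    adj: Dict[str, List[Tuple[str, str]]] = {}
    for src, etype, dst in edges:
        adj.setdefault(src, []).append((etype, dst))
    return adj


def retrieve_bundle(seed: str, target: str, adj: Dict[str, List[Tuple[str, str]]],
                    max_depth: int = 3) -> Optional[dict]:
    """Search from a candidate node; return the key memory plus the edge path to it."""
    queue = deque([(seed, [])])
    seen = {seed}
    while queue:
        node, path = queue.popleft()
        if node == target:
            return {"key_memory": target, "justification_path": path}
        if len(path) >= max_depth:
            continue  # depth budget spent on this branch
        for etype, nxt in adj.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [(node, etype, nxt)]))
    return None


bundle = retrieve_bundle("preference:sql-dashboards", "decision:use-postgres", build_adjacency(EDGES))
print(bundle)
```

The depth limit is what keeps the multi-hop chain affordable: with `max_depth=1` the same call returns `None` because the decision is two hops away.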
Comparison: causal graph vs flat vector storage
| Criteria | Flat vector storage | Causal graph (Genesys) |
|---|---|---|
| Best at | Direct semantic similarity | Relational + causal recall |
| Rephrasing robustness | Often brittle when anchors change | Improves when relationships stay stable |
| Multi-hop context | Requires luck in top-k or extra prompting | Natural via graph traversal |
| Explainability | “These chunks were similar” | “This is relevant because A → B → C” |
| Operational complexity | Simple indexing + ANN search | Graph schema + edge management + traversal |
Graph RAG vs Genesys Memory: What’s the Difference
At first glance, Graph RAG and the Genesys Memory System look very similar. Both move beyond simple vector search and introduce structure into retrieval. But they solve slightly different problems.
Let’s break this down with a simple example.
A Simple Example: “Why did we choose Postgres?”
Imagine your system has stored this information:
“We chose Postgres because we needed strong consistency and SQL reporting.”
How Graph RAG handles this
Graph RAG first converts information into a knowledge graph:
[Postgres] ← (chosen because) ← [strong consistency]
[Postgres] ← (chosen because) ← [SQL reporting]
Now when a user asks:
- “Why did we choose Postgres?”
- or even “What requirements led us to select our database?”
Graph RAG:
- Finds the node Postgres
- Traverses connected nodes
- Returns related facts + relationships
👉 Result:
It gives connected context, not just similar text.
📌 This is the key improvement over standard RAG:
- Instead of isolated chunks, Graph RAG retrieves relational context (Memgraph)
How Genesys Memory handles this
Genesys builds a causal memory graph, not just a knowledge graph.
It represents the same information like this:
[need: strong consistency] → (MOTIVATES) → [decision: Postgres]
[need: SQL reporting] → (MOTIVATES) → [decision: Postgres]
Now when the user asks:
- “Why did we choose Postgres?”
- OR “Why didn’t we use DynamoDB?”
Genesys:
- Identifies the decision node
- Traverses causal relationships
- Returns:
  - the decision
  - the reasons (causes)
  - optionally the reasoning chain
👉 Result:
It answers not just what is connected, but what caused the decision
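A "why" lookup over the two MOTIVATES edges above can be sketched by indexing edges by their effect and walking toward the causes. This is a minimal sketch, not how Genesys is implemented: `index_by_effect` and `why` are hypothetical helpers, and the depth cap is an assumption added to keep the walk bounded.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# The two causal edges from the example: (cause, edge type, effect)
CAUSAL_EDGES = [
    ("need: strong consistency", "MOTIVATES", "decision: Postgres"),
    ("need: SQL reporting", "MOTIVATES", "decision: Postgres"),
]


def index_by_effect(edges: List[Tuple[str, str, str]]) -> Dict[str, List[Tuple[str, str]]]:
    """Map each effect node to the (edge type, cause) pairs pointing at it."""
    incoming: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for cause, etype, effect in edges:
        incoming[effect].append((etype, cause))
    return incoming


def why(node: str, incoming: Dict[str, List[Tuple[str, str]]],
        depth: int = 0, max_depth: int = 3) -> List[Tuple[int, str, str, str]]:
    """Walk cause-ward from node; each row is (depth, cause, edge type, effect)."""
    if depth >= max_depth:
        return []
    rows = []
    for etype, cause in incoming.get(node, []):
        rows.append((depth, cause, etype, node))
        rows.extend(why(cause, incoming, depth + 1, max_depth))
    return rows


for depth, cause, etype, effect in why("decision: Postgres", index_by_effect(CAUSAL_EDGES)):
    print(f"{'  ' * depth}{cause} -[{etype}]-> {effect}")
```

If the needs themselves had recorded causes, the same call would return them at depth 1, which is the "reasoning chain" in the list above.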
Key Similarities
Graph RAG and Genesys share a strong foundation:
- Both use graph structures (nodes + edges)
- Both support multi-hop reasoning
- Both improve retrieval beyond simple similarity
- Both provide more explainable outputs
👉 In fact, Graph RAG itself is an evolution of traditional RAG that adds structure and reasoning (Memgraph)
Key Differences
The difference becomes clear when you look at what kind of reasoning they enable:
| Aspect | Graph RAG | Genesys Memory |
|---|---|---|
| Core idea | Knowledge graph retrieval | Causal memory system |
| Focus | “What is related?” | “Why did this happen?” |
| Relationships | General (entity links) | Explicit causal links |
| Data type | Documents / knowledge base | Conversations + decisions |
| Time awareness | Limited | Strong (memory evolves over time) |
| Reasoning depth | Multi-hop | Causal + decision reasoning |
The Subtle but Important Shift
Graph RAG answers:
👉 “What information is connected?”
Genesys answers:
👉 “What led to this outcome?”
This aligns with a deeper shift in AI systems:
- Graph RAG → structured retrieval
- Genesys → structured memory + reasoning
In fact, recent research directions are already moving from graph-based retrieval toward causal graph reasoning, because traditional systems still rely on correlation rather than true cause-effect understanding (Medium)
Vector RAG → finds similar text
Graph RAG → finds connected information
Genesys → finds causal reasoning behind decisions
When to Use What
- Use Graph RAG when:
  - You are working with documents
  - Relationships between entities matter
  - You need better multi-hop retrieval
- Use Genesys-style memory when:
  - You are building agents
  - You need to remember decisions over time
  - “why” matters more than “what”
Benchmark signal: 89.9% LoCoMo and +22 vs Mem0
The headline claim is that Genesys reports “89.9% on LoCoMo — 22 points above Mem0.” Treat this as a signal, not a guarantee, but it’s a useful signal because it targets the exact pain point: memory retrieval under natural conversational variation.
How to interpret the LoCoMo number
Benchmarks like LoCoMo are typically designed to test whether a system can recall and use previously seen information in a way that matches the user’s intent, including rephrasing and indirect references. If a system scores higher, it suggests its retrieval strategy is better aligned with how memory is queried in real conversations.
What matters operationally is not the absolute number, but what the delta implies: a large jump over a baseline memory approach indicates the retrieval primitive (here, causal graph vs flat vector storage) may be capturing relevance that embedding similarity misses.
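The delta can be unpacked with simple arithmetic. The only inputs are the two reported figures; the baseline score below is implied by them rather than quoted from a report, and it assumes the "22 points" is not heavily rounded.

```python
genesys_score = 89.9   # reported LoCoMo score, in percent
delta_vs_mem0 = 22.0   # reported lead over Mem0, in percentage points

# Implied baseline and the lift relative to that baseline
implied_mem0 = genesys_score - delta_vs_mem0
relative_lift = delta_vs_mem0 / implied_mem0

print(f"implied Mem0 score: {implied_mem0:.1f}%")      # 67.9%
print(f"relative improvement: {relative_lift:.1%}")    # 32.4%
```

Read this way, the claim is roughly a one-third relative improvement over the baseline, which is the scale of change you would expect from a different retrieval primitive rather than a tuning tweak.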
Why “+22 points above Mem0” is notable
Mem0 is often used as a reference point for agent memory implementations. A +22 point lift suggests Genesys’ approach is not a marginal tweak (like different chunking), but a different retrieval model. If your production complaint is “vector search just doesn’t find it,” you’re likely hitting a structural mismatch—so a structural change (graph retrieval) is exactly what you’d expect to move metrics.
Check out the report here: Genesys Benchmarking Report
To keep this grounded, here’s a minimal configuration shape you can use when running a memory service that supports both a graph store and a vector fallback. The point is to make retrieval behavior explicit and testable.
This YAML config defines a graph-first retrieval policy with a vector fallback, plus traversal limits to control latency.
service:
  host: 0.0.0.0
  port: 8080
memory:
  retrieval_policy: graph_first
  graph:
    traversal:
      max_depth: 3
      max_nodes: 40
      edge_types: ["CAUSES", "MOTIVATES", "DEPENDS_ON", "RESULTS_IN", "REFERS_TO"]
  vector_fallback:
    enabled: true
    top_k: 8
observability:
  log_level: info
  include_justification_path: true
limits:
  request_timeout_ms: 1500
  max_memory_write_bytes: 65536
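Because the config drives retrieval behavior, it is worth failing fast on a bad value. The sketch below checks the parsed structure of the YAML config; `validate_memory_config` is a hypothetical helper, and the dict stands in for what a YAML parser such as `yaml.safe_load` would return.

```python
from typing import Any, Dict, List


def validate_memory_config(cfg: Dict[str, Any]) -> List[str]:
    """Return human-readable problems; an empty list means the config looks sane."""
    problems: List[str] = []
    memory = cfg.get("memory", {})
    if memory.get("retrieval_policy") != "graph_first":
        problems.append("memory.retrieval_policy should be 'graph_first'")
    traversal = memory.get("graph", {}).get("traversal", {})
    for key in ("max_depth", "max_nodes"):
        value = traversal.get(key)
        # bool is a subclass of int in Python, so reject it explicitly
        if isinstance(value, bool) or not isinstance(value, int) or value < 1:
            problems.append(f"memory.graph.traversal.{key} must be a positive integer")
    fallback = memory.get("vector_fallback", {})
    if fallback.get("enabled") and not isinstance(fallback.get("top_k"), int):
        problems.append("memory.vector_fallback.top_k must be an integer when enabled")
    return problems


# Parsed form of the relevant part of the YAML config
config = {
    "memory": {
        "retrieval_policy": "graph_first",
        "graph": {"traversal": {"max_depth": 3, "max_nodes": 40}},
        "vector_fallback": {"enabled": True, "top_k": 8},
    }
}
print(validate_memory_config(config))
```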
Production architecture notes: retrieval behavior when users rephrase questions
Rephrasing is not an edge case; it’s the default. In production, the same user intent appears as:
- Different nouns (“my onboarding doc” vs “the setup guide”)
- Different verbs (“pick” vs “avoid” vs “switch”)
- Different directionality (cause vs effect)
What to watch in retrieval logs
If you’re evaluating Genesys-style graph retrieval, instrument for these signals:
- Hit rate under paraphrase: does the correct memory show up without prompt hacks?
- Justification path quality: are edges meaningful or noisy?
- Latency distribution: traversal can spike tail latency if unconstrained.
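Two of these signals, plus how often the fallback fires, can be computed from per-query log records. This is a sketch under assumptions: the record fields (`expected_id`, `returned_ids`, `used`, `latency_ms`) are hypothetical names for data you would log yourself, and the sample values are invented.

```python
import math
from typing import Any, Dict, List

# Hypothetical evaluation records, one per paraphrased query
records: List[Dict[str, Any]] = [
    {"expected_id": "m-1", "returned_ids": ["m-1"], "used": "graph", "latency_ms": 12.0},
    {"expected_id": "m-1", "returned_ids": ["m-1", "m-4"], "used": "graph", "latency_ms": 18.0},
    {"expected_id": "m-2", "returned_ids": [], "used": "vector_fallback", "latency_ms": 95.0},
    {"expected_id": "m-3", "returned_ids": ["m-3"], "used": "vector_fallback", "latency_ms": 40.0},
]


def hit_rate(rows: List[Dict[str, Any]]) -> float:
    """Share of queries whose expected memory appears in the results."""
    hits = sum(1 for r in rows if r["expected_id"] in r["returned_ids"])
    return hits / len(rows) if rows else 0.0


def fallback_rate(rows: List[Dict[str, Any]]) -> float:
    """Share of queries answered by the vector fallback instead of the graph."""
    used = sum(1 for r in rows if r["used"] == "vector_fallback")
    return used / len(rows) if rows else 0.0


def percentile(values: List[float], pct: float) -> float:
    """Nearest-rank percentile for pct in (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


print("hit rate:", hit_rate(records))
print("fallback rate:", fallback_rate(records))
print("p95 latency (ms):", percentile([r["latency_ms"] for r in records], 95))
```

Justification path quality is harder to score automatically; a reasonable start is to sample paths from the same records and review them by hand.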
Concrete behavior: “graph-first, vector-fallback”
A practical production posture is to attempt graph retrieval first (because it’s more robust to rephrasing), then fall back to vector search for “loose” semantic matches. The key is to keep the fallback explicit so you can measure when the graph fails and why.
This bash snippet shows a local run workflow: start the stack, write a memory, then query it twice with rephrased prompts and compare results.
# Start services
docker compose up -d --build

# Write a memory that includes a decision and its cause
curl -sS -X POST http://localhost:8080/memory/write \
  -H 'Content-Type: application/json' \
  -d '{
    "user_id": "u-123",
    "text": "We chose Postgres because we needed strong consistency and SQL reporting.",
    "tags": ["decision", "database"],
    "edges": [
      {"type": "MOTIVATES", "from": "need:strong-consistency", "to": "decision:use-postgres"},
      {"type": "MOTIVATES", "from": "need:sql-reporting", "to": "decision:use-postgres"}
    ]
  }' | jq .

# Query 1: direct phrasing
curl -sS -X POST http://localhost:8080/memory/query \
  -H 'Content-Type: application/json' \
  -d '{"user_id":"u-123","query":"Why did we pick Postgres?"}' | jq .

# Query 2: rephrased / inverted causality
curl -sS -X POST http://localhost:8080/memory/query \
  -H 'Content-Type: application/json' \
  -d '{"user_id":"u-123","query":"What constraints made us avoid other databases?"}' | jq .

# Inspect logs for whether retrieval used graph traversal or vector fallback
docker compose logs -n 200 memory
Genesys’ role in the memory workflow
In an agent stack, Genesys sits between the conversation stream and the model prompt. Its job is to (1) ingest interaction events into a structured memory representation and (2) retrieve the right subset of memories when a new user message arrives—especially when the message is rephrased.
Where the causal graph helps
- Ingestion: extract entities/events/decisions and link them (cause/effect, depends-on, refers-to).
- Retrieval: map a query to candidate nodes, traverse edges to gather supporting context, and return a compact memory bundle.
- Prompt assembly: include the memory plus a justification path so the LLM can “see” why the memory is relevant.
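The prompt-assembly step can be sketched as a small formatter that turns a retrieval result into a text block for the model. The input shape (memories with `id` and `text`, justification steps with `from`, `edge`, and `to`) mirrors the demo service's response in this post; `format_memory_block` itself is a hypothetical helper and the wording of the block is only one option.

```python
from typing import Any, Dict, List


def format_memory_block(memories: List[Dict[str, Any]], justification: List[Dict[str, str]]) -> str:
    """Render retrieved memories and their justification path as prompt text."""
    lines = ["Relevant memories:"]
    for m in memories:
        lines.append(f"- [{m['id']}] {m['text']}")
    if justification:
        lines.append("Why these are relevant:")
        for step in justification:
            lines.append(f"- {step['from']} -[{step['edge']}]-> {step['to']}")
    return "\n".join(lines)


block = format_memory_block(
    memories=[{"id": "m-1", "text": "We chose Postgres because we needed strong consistency and SQL reporting."}],
    justification=[{"from": "need:strong-consistency", "edge": "MOTIVATES", "to": "decision:use-postgres"}],
)
print(block)
```

Keeping the path as explicit lines, rather than folding it into prose, makes it easy to diff in logs when a retrieval looks wrong.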

Operational guardrails
Graph retrieval can be powerful, but you need constraints:
- Traversal limits (depth, max nodes) to cap latency.
- Edge hygiene to prevent noisy links from polluting retrieval.
- Fallback policy to avoid “no result” failures when the graph is sparse.
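Edge hygiene is the guardrail that is easiest to skip, so here is a minimal sketch of a write-time filter. It assumes edges arrive as dicts with `type`, `from`, and `to` keys (the shape used in the write request in this post) and that the allowed types match the `edge_types` list in the config; `clean_edges` is a hypothetical helper.

```python
from typing import Dict, Iterable, List, Set, Tuple

ALLOWED_EDGE_TYPES = {"CAUSES", "MOTIVATES", "DEPENDS_ON", "RESULTS_IN", "REFERS_TO"}


def clean_edges(edges: Iterable[Dict[str, str]],
                allowed: Set[str] = ALLOWED_EDGE_TYPES) -> List[Dict[str, str]]:
    """Drop edges with unknown types, missing endpoints, self-loops, or exact duplicates."""
    seen: Set[Tuple[str, str, str]] = set()
    kept: List[Dict[str, str]] = []
    for edge in edges:
        etype, src, dst = edge.get("type"), edge.get("from"), edge.get("to")
        if etype not in allowed:
            continue  # unknown edge type
        if not src or not dst or src == dst:
            continue  # missing endpoint or self-loop
        key = (etype, src, dst)
        if key in seen:
            continue  # exact duplicate
        seen.add(key)
        kept.append(edge)
    return kept


raw = [
    {"type": "MOTIVATES", "from": "need:sql-reporting", "to": "decision:use-postgres"},
    {"type": "MOTIVATES", "from": "need:sql-reporting", "to": "decision:use-postgres"},
    {"type": "MOTIVATES", "from": "decision:use-postgres", "to": "decision:use-postgres"},
    {"type": "SIMILAR_TO", "from": "need:sql-reporting", "to": "need:strong-consistency"},
]
print(clean_edges(raw))
```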
Deployment workflow: Dockerfile + Docker Compose
To make this practical, you want a local, reproducible way to run a memory service and iterate on retrieval behavior. Below is a minimal containerized setup: a Python-based HTTP service that reads the YAML config and exposes /memory/write and /memory/query. It’s intentionally small so you can swap in the real Genesys implementation or adapt the interface to match it.
Dockerfile: package the memory service
This Dockerfile builds a small FastAPI service image. It installs dependencies, copies the app, and runs Uvicorn on port 8080.
FROM python:3.12-slim
ENV PYTHONDONTWRITEBYTECODE=1 \
    PYTHONUNBUFFERED=1
WORKDIR /app
# Quote the extras specifier so the shell never treats the brackets as a glob
RUN pip install --no-cache-dir fastapi==0.115.0 "uvicorn[standard]==0.30.6" pyyaml==6.0.2
COPY app.py /app/app.py
COPY config.yaml /app/config.yaml
EXPOSE 8080
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8080"]
Docker Compose: run locally as a service
This Compose file runs the memory service and mounts the config so you can tweak traversal limits and fallback behavior without rebuilding.
services:
  memory:
    build:
      context: .
    container_name: genesys-memory-local
    ports:
      - "8080:8080"
    volumes:
      - ./config.yaml:/app/config.yaml:ro
    environment:
      - CONFIG_PATH=/app/config.yaml
Minimal service implementation (working example)
This is a compact, working FastAPI app that demonstrates the shape of graph-first retrieval with a vector-like fallback. It’s not a full Genesys implementation, but it lets you test the key production behavior: rephrasing and causal traversal returning a justification path.
import os
from collections import defaultdict, deque
from typing import Any, Dict, List, Optional

import yaml
from fastapi import FastAPI
from pydantic import BaseModel, Field

app = FastAPI(title="Memory Service (graph-first)")
CONFIG_PATH = os.getenv("CONFIG_PATH", "config.yaml")


def load_config() -> Dict[str, Any]:
    with open(CONFIG_PATH, "r", encoding="utf-8") as f:
        return yaml.safe_load(f)


# In-memory stores for local dev
MEMORIES: Dict[str, List[Dict[str, Any]]] = defaultdict(list)  # user_id -> list of memories
# user_id -> node id -> edge type -> neighbor node ids
GRAPH: Dict[str, Dict[str, Dict[str, List[str]]]] = defaultdict(
    lambda: defaultdict(lambda: defaultdict(list))
)


class Edge(BaseModel):
    type: str
    # "from" is a Python keyword, so the JSON field is mapped through an alias
    from_: str = Field(alias="from")
    to: str


class WriteRequest(BaseModel):
    user_id: str
    text: str
    tags: Optional[List[str]] = None
    edges: Optional[List[Edge]] = None


class QueryRequest(BaseModel):
    user_id: str
    query: str


def tokenize(s: str) -> set:
    # Lowercase, strip surrounding punctuation, and drop tokens that end up empty
    return {tok for t in s.split() if (tok := t.strip(".,!?()\"'").lower())}


def vector_fallback(user_id: str, query: str, top_k: int) -> List[Dict[str, Any]]:
    # Token overlap stands in for embedding similarity in this demo
    q = tokenize(query)
    scored = []
    for m in MEMORIES[user_id]:
        score = len(q & tokenize(m["text"]))
        scored.append((score, m))
    scored.sort(key=lambda x: x[0], reverse=True)
    return [m for score, m in scored[:top_k] if score > 0]


def graph_retrieve(user_id: str, query: str, max_depth: int, max_nodes: int) -> Dict[str, Any]:
    # Seed nodes from query tokens by (crude) substring match against known node ids
    q = tokenize(query)
    graph = GRAPH[user_id]
    nodes = set()
    for node in graph.keys():
        if any(tok in node.lower() for tok in q):
            nodes.add(node)

    visited = set()
    justification = []
    results = []
    dq = deque([(n, 0) for n in sorted(nodes)])
    while dq and len(visited) < max_nodes:
        node, depth = dq.popleft()
        if node in visited:
            continue
        visited.add(node)
        # Assumption: a memory is attached to every node named in its edges (see write_memory)
        results.extend(m for m in MEMORIES[user_id] if node in m.get("nodes", []))
        if depth >= max_depth:
            continue
        for edge_key, neighs in graph.get(node, {}).items():
            for nxt in neighs:
                justification.append({"from": node, "edge": edge_key, "to": nxt})
                dq.append((nxt, depth + 1))

    # Deduplicate memories by id
    seen = set()
    uniq = []
    for m in results:
        mid = m.get("id")
        if mid and mid in seen:
            continue
        if mid:
            seen.add(mid)
        uniq.append(m)
    return {"memories": uniq[:8], "justification": justification[:50], "visited_nodes": list(visited)}


@app.post("/memory/write")
def write_memory(req: WriteRequest):
    mem_id = f"m-{len(MEMORIES[req.user_id]) + 1}"
    edges = req.edges or []
    # Record which graph nodes this memory mentions so traversal can return it
    nodes = sorted({n for e in edges for n in (e.from_, e.to)})
    MEMORIES[req.user_id].append(
        {"id": mem_id, "text": req.text, "tags": req.tags or [], "nodes": nodes}
    )
    # Add edges into adjacency list, keyed by edge type
    for e in edges:
        GRAPH[req.user_id][e.from_][e.type].append(e.to)
        # Also store reverse reference to improve recall under inverted phrasing
        GRAPH[req.user_id][e.to][f"REV_{e.type}"].append(e.from_)
    return {"ok": True, "id": mem_id, "graph_nodes": len(GRAPH[req.user_id])}


@app.post("/memory/query")
def query_memory(req: QueryRequest):
    cfg = load_config()
    policy = cfg["memory"]["retrieval_policy"]
    if policy != "graph_first":
        return {"error": "Only graph_first is supported in this demo"}
    gcfg = cfg["memory"]["graph"]["traversal"]
    out = graph_retrieve(
        req.user_id,
        req.query,
        max_depth=int(gcfg["max_depth"]),
        max_nodes=int(gcfg["max_nodes"]),
    )
    used = "graph"
    if not out["memories"] and cfg["memory"]["vector_fallback"]["enabled"]:
        used = "vector_fallback"
        top_k = int(cfg["memory"]["vector_fallback"]["top_k"])
        out = {"memories": vector_fallback(req.user_id, req.query, top_k), "justification": [], "visited_nodes": []}
    return {"used": used, **out}
Image build/push step (for promotion to a cluster)
If you later promote this service to a Kubernetes deployment, the Docker workflow is the same: build, tag, push to a registry, then deploy. Here are the concrete commands (replace the registry with yours).
# Build and tag
docker build -t ghcr.io/your-org/genesys-memory-service:0.1.0 .
# Authenticate (example: GitHub Container Registry)
echo "$GITHUB_TOKEN" | docker login ghcr.io -u your-username --password-stdin
# Push
docker push ghcr.io/your-org/genesys-memory-service:0.1.0
Conclusion
The recurring production complaint—“vector search just doesn’t find it”—is usually a mismatch between how users ask and how memories are indexed. Flat vector storage optimizes for semantic proximity; it’s often brittle under rephrasing, causal inversion, and multi-hop recall. The Genesys approach reframes memory as a causal graph, aiming to retrieve not just a similar chunk but the connected context that makes an answer correct and usable.
The reported Genesys Memory System LoCoMo benchmark signal (89.9% and +22 points vs Mem0) is notable because it aligns with the exact failure mode teams see in agent memory: rephrased queries that should map to the same underlying facts. If your agent’s memory breaks when users change wording, a graph-first retrieval policy is a reasonable direction to evaluate.
Next step: run the Docker Compose setup locally, write a few “decision + cause” memories from your own domain, and test retrieval under rephrasing. If you can’t reproduce the failure locally, you won’t be able to fix it in production—make retrieval behavior measurable before you scale it.

