Genesys Memory System: 89.9% LoCoMo vs Vectors

Core

  • In production agent memory, “vector search just doesn’t find it” when users rephrase the same intent.
  • Genesys Memory System replaces flat vector storage with a causal graph, so memories are retrieved by relationships rather than by embedding proximity alone. Put another way, Genesys extends Graph RAG into agent memory with causal reasoning.
  • You’ll learn what changes, what the “89.9% on LoCoMo” signal implies, and how to run a memory service locally with Docker Compose.

Introduction

Agent “memory” sounds straightforward: store past user facts and retrieve them later. In production, it’s messier. Users don’t repeat themselves the same way twice, and the most common implementation—flat vector storage with nearest-neighbor search—often fails at the exact moment you need it: when the query is phrased differently.

This is the motivation behind the Genesys Memory System, which positions itself as an alternative to flat vector storage by modeling memories as a causal graph. Instead of betting everything on embedding similarity, it tries to retrieve based on relationships and causal structure—closer to how “why” and “because” show up in real conversations.

In this post, you’ll map the failure mode (“vector search just doesn’t find it”) to concrete retrieval behavior, understand what moving from flat vector storage to a causal graph changes in practice, interpret the LoCoMo benchmark claim (89.9%, 22 points above Mem0), and package a minimal memory service with Docker for local iteration.

Hook: Why “vector search just doesn’t find it”

Flat vector storage typically means: embed each memory as a vector, dump it into a vector index, and retrieve top-k by cosine similarity. This works well when the query is semantically close to the stored text. It fails when the user asks for the same underlying fact through a different framing, different entities, or different causal direction.
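
As a rough illustration of that pipeline, the sketch below ranks memories by cosine similarity. The `embed` function is a toy bag-of-words stand-in (an assumption made to keep the example dependency-free, not a real embedding model), so it exaggerates a rephrasing problem that learned embeddings only partly solve. The memories and queries are made up for the example.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(w.strip(".,!?\"'").lower() for w in text.split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k(query: str, memories: list[str], k: int = 2) -> list[tuple[float, str]]:
    """Score every memory against the query and keep the k best."""
    q = embed(query)
    scored = [(cosine(q, embed(m)), m) for m in memories]
    return sorted(scored, key=lambda s: s[0], reverse=True)[:k]


memories = [
    "I cut out gluten and dairy last month.",
    "The project deadline moved to next Friday.",
]

# Shares words with the stored memory, so it scores above zero.
direct = top_k("What did I say about gluten and dairy?", memories, k=1)
# Same intent, no shared words: every memory scores zero with this toy embedding.
rephrased = top_k("Remind me which foods are off limits these days.", memories, k=1)

print(direct)
print(rephrased)
```

With the toy embedding, the rephrased query has no lexical anchor in common with the memory it needs, which is the "correct memory isn't in top-k" case in miniature.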

Common production failure patterns:

  • Rephrasing drift: “What did I say about my diet?” vs “Remind me what I’m avoiding these days.” Same intent, different lexical anchors.
  • Causal inversion: “Why did we pick Postgres?” vs “What constraints made us avoid DynamoDB?” The relevant memory is about constraints/decisions, not the literal technology name.
  • Multi-hop recall: The answer requires chaining: user preference → project constraint → decision. Flat top-k often returns one hop, not the chain.

In agent memory, the hard part isn’t storing text—it’s retrieving the right fact when the user’s wording changes and the “reason” matters more than the “phrase.”

When teams say “vector search just doesn’t find it,” they’re usually describing one of two things: (1) the correct memory isn’t in top-k, or (2) the correct memory is present but not salient enough for the model to use because it’s missing connecting context. Both are retrieval problems, not generation problems.

What Genesys changes: causal graph vs flat vectors

The core shift in Genesys is the storage and retrieval primitive. Rather than treating each memory as an independent point in a vector space (flat vector storage), Genesys models memories as nodes and relationships as edges in a causal graph. The goal is to retrieve not only “similar” snippets, but also the causal neighborhood that explains why something is true or relevant.

Flat vector storage: what it optimizes for

Flat vector storage is optimized for semantic proximity between a query and a chunk. In practice, it tends to:

  • Return high-similarity paraphrases (good for FAQ-like recall).
  • Miss causal/relational relevance when similar words aren’t present.
  • Over-retrieve generic memories (“preferences”, “project”) that embed similarly.

Causal graph memory: what it optimizes for

A causal graph representation aims to make retrieval robust to rephrasing by anchoring on relationships: decisions, motivations, constraints, outcomes, and dependencies. Instead of “find the closest chunk,” retrieval can look like:

  • Identify candidate nodes related to the query intent (entities, events, decisions).
  • Traverse edges to pull supporting context (causes, effects, prerequisites).
  • Return a structured bundle: key memory + justification path.

That “justification path” is the practical difference. In production, you want the agent to answer with the relevant fact and the supporting context that makes the fact usable and less hallucination-prone.
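
The three retrieval steps above can be sketched as a bounded breadth-first traversal. This is a minimal illustration under stated assumptions, not Genesys's actual retrieval code: the `EDGES` graph, the node ids, and the `retrieve_bundle` helper are all hypothetical, and the seed node is assumed to have been chosen already.

```python
from collections import deque

# Hypothetical toy graph: node id -> list of (edge_type, neighbor id).
EDGES = {
    "pref:sql-dashboards": [("MOTIVATES", "constraint:sql-reporting")],
    "constraint:sql-reporting": [("MOTIVATES", "decision:use-postgres")],
    "decision:use-postgres": [],
}


def retrieve_bundle(seed: str, edges: dict, max_depth: int = 3, max_nodes: int = 40) -> dict:
    """Walk outward from the seed and record every edge followed."""
    visited, path = [], []
    seen = {seed}
    queue = deque([(seed, 0)])
    while queue and len(visited) < max_nodes:
        node, depth = queue.popleft()
        visited.append(node)
        if depth >= max_depth:
            continue  # depth budget reached: keep the node, do not expand it
        for edge_type, nxt in edges.get(node, []):
            path.append((node, edge_type, nxt))
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return {"key_memory": seed, "nodes": visited, "justification_path": path}


bundle = retrieve_bundle("pref:sql-dashboards", EDGES)
for src, edge_type, dst in bundle["justification_path"]:
    print(f"{src} -[{edge_type}]-> {dst}")
```

The `max_depth` and `max_nodes` arguments are the levers that keep traversal cost bounded; lowering `max_depth` to 1 returns only the first hop of the chain.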

Comparison: causal graph vs flat vector storage

Criteria | Flat vector storage | Causal graph (Genesys)
Best at | Direct semantic similarity | Relational + causal recall
Rephrasing robustness | Often brittle when anchors change | Improves when relationships stay stable
Multi-hop context | Requires luck in top-k or extra prompting | Natural via graph traversal
Explainability | “These chunks were similar” | “This is relevant because A → B → C”
Operational complexity | Simple indexing + ANN search | Graph schema + edge management + traversal

Graph RAG vs Genesys Memory: What’s the Difference

At first glance, Graph RAG and the Genesys Memory System look very similar. Both move beyond simple vector search and introduce structure into retrieval. But they solve slightly different problems.

Let’s break this down with a simple example.


A Simple Example: “Why did we choose Postgres?”

Imagine your system has stored this information:

“We chose Postgres because we needed strong consistency and SQL reporting.”


How Graph RAG handles this

Graph RAG first converts information into a knowledge graph:

[Postgres] ← (chosen because) ← [strong consistency]
[Postgres] ← (chosen because) ← [SQL reporting]

Now when a user asks:

  • “Why did we choose Postgres?”
  • or even “What requirements led us to select our database?”

Graph RAG:

  1. Finds the node Postgres
  2. Traverses connected nodes
  3. Returns related facts + relationships

👉 Result:
It gives connected context, not just similar text.

📌 This is the key improvement over standard RAG:

  • Instead of isolated chunks, Graph RAG retrieves relational context (Memgraph)

How Genesys Memory handles this

Genesys builds a causal memory graph, not just a knowledge graph.

It represents the same information like this:

[need: strong consistency] → (MOTIVATES) → [decision: Postgres]
[need: SQL reporting] → (MOTIVATES) → [decision: Postgres]

Now when the user asks:

  • “Why did we choose Postgres?”
  • OR “Why didn’t we use DynamoDB?”

Genesys:

  1. Identifies the decision node
  2. Traverses causal relationships
  3. Returns:
    • the decision
    • the reasons (causes)
    • optionally the reasoning chain

👉 Result:
It answers not just what is connected, but what caused the decision.
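
To make the contrast concrete, here is a small sketch that separates "what is connected" from "what caused this". It is an illustration only, not Genesys's storage model: the two MOTIVATES edges mirror the example above, while the REFERS_TO edge, the `doc:schema-guide` node, and both helper functions are hypothetical.

```python
# Edges as (source, edge_type, target). The REFERS_TO edge is a hypothetical
# non-causal link added for contrast.
TRIPLES = [
    ("need:strong-consistency", "MOTIVATES", "decision:use-postgres"),
    ("need:sql-reporting", "MOTIVATES", "decision:use-postgres"),
    ("decision:use-postgres", "REFERS_TO", "doc:schema-guide"),
]
CAUSAL_TYPES = {"CAUSES", "MOTIVATES"}


def related(node: str) -> set[str]:
    """Everything connected to the node, in either direction."""
    out = set()
    for src, _, dst in TRIPLES:
        if src == node:
            out.add(dst)
        elif dst == node:
            out.add(src)
    return out


def causes_of(node: str) -> list[str]:
    """Only the sources of causal edges that point at the node."""
    return sorted(src for src, etype, dst in TRIPLES if dst == node and etype in CAUSAL_TYPES)


print(related("decision:use-postgres"))    # includes the schema guide
print(causes_of("decision:use-postgres"))  # only the two needs
```

Filtering on edge type is what turns a neighborhood lookup into a "why" answer: the schema guide is related to the decision, but it did not cause it.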


Key Similarities

Graph RAG and Genesys share a strong foundation:

  • Both use graph structures (nodes + edges)
  • Both support multi-hop reasoning
  • Both improve retrieval beyond simple similarity
  • Both provide more explainable outputs

👉 In fact, Graph RAG itself is an evolution of traditional RAG that adds structure and reasoning (Memgraph)


Key Differences

The difference becomes clear when you look at what kind of reasoning they enable:

Aspect | Graph RAG | Genesys Memory
Core idea | Knowledge graph retrieval | Causal memory system
Focus | “What is related?” | “Why did this happen?”
Relationships | General (entity links) | Explicit causal links
Data type | Documents / knowledge base | Conversations + decisions
Time awareness | Limited | Strong (memory evolves over time)
Reasoning depth | Multi-hop | Causal + decision reasoning

The Subtle but Important Shift

Graph RAG answers:

👉 “What information is connected?”

Genesys answers:

👉 “What led to this outcome?”

This aligns with a deeper shift in AI systems:

  • Graph RAG → structured retrieval
  • Genesys → structured memory + reasoning

In fact, recent research directions are already moving from graph-based retrieval toward causal graph reasoning, because traditional systems still rely on correlation rather than true cause-effect understanding (Medium)


Vector RAG   → finds similar text
Graph RAG    → finds connected information
Genesys      → finds causal reasoning behind decisions

When to Use What

  • Use Graph RAG when:
    • You are working with documents
    • Relationships between entities matter
    • You need better multi-hop retrieval
  • Use Genesys-style memory when:
    • You are building agents
    • You need to remember decisions over time
    • “why” matters more than “what”

Benchmark signal: 89.9% LoCoMo and +22 vs Mem0

The headline claim is that Genesys reports “89.9% on LoCoMo, 22 points above Mem0.” Treat this as a signal, not a guarantee. It is still a useful signal because it targets the exact pain point: memory retrieval under natural conversational variation.

How to interpret the LoCoMo number

Benchmarks like LoCoMo are typically designed to test whether a system can recall and use previously seen information in a way that matches the user’s intent, including rephrasing and indirect references. If a system scores higher, it suggests its retrieval strategy is better aligned with how memory is queried in real conversations.

What matters operationally is not the absolute number, but what the delta implies: a large jump over a baseline memory approach indicates the retrieval primitive (here, causal graph vs flat vector storage) may be capturing relevance that embedding similarity misses.

Why “+22 points above Mem0” is notable

Mem0 is often used as a reference point for agent memory implementations. A +22 point lift suggests Genesys’ approach is not a marginal tweak (like different chunking), but a different retrieval model. If your production complaint is “vector search just doesn’t find it,” you’re likely hitting a structural mismatch—so a structural change (graph retrieval) is exactly what you’d expect to move metrics.

For the full numbers, see the Genesys Benchmarking Report.

To keep this grounded, here’s a minimal configuration shape you can use when running a memory service that supports both a graph store and a vector fallback. The point is to make retrieval behavior explicit and testable.

This YAML config defines a graph-first retrieval policy with a vector fallback, plus traversal limits to control latency.


service:
  host: 0.0.0.0
  port: 8080

memory:
  retrieval_policy: graph_first
  graph:
    traversal:
      max_depth: 3
      max_nodes: 40
      edge_types: ["CAUSES", "MOTIVATES", "DEPENDS_ON", "RESULTS_IN", "REFERS_TO"]
  vector_fallback:
    enabled: true
    top_k: 8

observability:
  log_level: info
  include_justification_path: true
  
limits:
  request_timeout_ms: 1500
  max_memory_write_bytes: 65536
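
Because a mistyped limit silently changes retrieval behavior, it can help to validate the config at startup. The sketch below is one possible check, assuming the YAML above has already been parsed into a dict (for example with `yaml.safe_load`). The `validate_memory_config` helper and its rules are hypothetical and not part of Genesys.

```python
KNOWN_EDGE_TYPES = {"CAUSES", "MOTIVATES", "DEPENDS_ON", "RESULTS_IN", "REFERS_TO"}


def validate_memory_config(cfg: dict) -> list[str]:
    """Return human-readable problems; an empty list means the config looks sane."""
    problems = []
    memory = cfg.get("memory", {})
    traversal = memory.get("graph", {}).get("traversal", {})

    # Traversal limits must be positive integers, or latency is unbounded.
    for key in ("max_depth", "max_nodes"):
        value = traversal.get(key)
        if not isinstance(value, int) or value < 1:
            problems.append(f"memory.graph.traversal.{key} must be a positive integer")

    unknown = set(traversal.get("edge_types", [])) - KNOWN_EDGE_TYPES
    if unknown:
        problems.append(f"unknown edge types: {sorted(unknown)}")

    fallback = memory.get("vector_fallback", {})
    top_k = fallback.get("top_k")
    if fallback.get("enabled") and not (isinstance(top_k, int) and top_k >= 1):
        problems.append("memory.vector_fallback.top_k must be a positive integer when enabled")
    return problems


# Dict form of the memory section shown above.
cfg = {
    "memory": {
        "retrieval_policy": "graph_first",
        "graph": {"traversal": {"max_depth": 3, "max_nodes": 40,
                                "edge_types": ["CAUSES", "MOTIVATES", "DEPENDS_ON", "RESULTS_IN", "REFERS_TO"]}},
        "vector_fallback": {"enabled": True, "top_k": 8},
    }
}
print(validate_memory_config(cfg))
```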
  


Production architecture notes: retrieval behavior when users rephrase questions

Rephrasing is not an edge case; it’s the default. In production, the same user intent appears as:

  • Different nouns (“my onboarding doc” vs “the setup guide”)
  • Different verbs (“pick” vs “avoid” vs “switch”)
  • Different directionality (cause vs effect)

What to watch in retrieval logs

If you’re evaluating Genesys-style graph retrieval, instrument for these signals:

  • Hit rate under paraphrase: does the correct memory show up without prompt hacks?
  • Justification path quality: are edges meaningful or noisy?
  • Latency distribution: traversal can spike tail latency if unconstrained.
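
The first signal, hit rate under paraphrase, is easy to turn into a number. The sketch below is one way to do it, assuming you can wrap your retriever as a function that maps a query string to memory ids. The `keyword_retrieve` stand-in, the memory ids, and the test cases are hypothetical.

```python
from typing import Callable, Iterable


def paraphrase_hit_rate(
    retrieve: Callable[[str], Iterable[str]],
    cases: list[tuple[str, list[str]]],
) -> float:
    """Fraction of queries whose results contain the expected memory id."""
    hits = total = 0
    for expected_id, queries in cases:
        for query in queries:
            total += 1
            if expected_id in set(retrieve(query)):
                hits += 1
    return hits / total if total else 0.0


# Hypothetical stand-in retriever: returns ids whose keywords appear in the query.
KEYWORDS = {"m-1": {"postgres"}, "m-2": {"diet"}}


def keyword_retrieve(query: str) -> list[str]:
    words = {w.strip("?.,").lower() for w in query.split()}
    return [mid for mid, kws in KEYWORDS.items() if kws & words]


# Each case pairs an expected memory with a direct query and a paraphrase.
cases = [
    ("m-1", ["Why did we pick Postgres?", "What constraints made us avoid DynamoDB?"]),
    ("m-2", ["What did I say about my diet?", "Remind me what I'm avoiding these days."]),
]
rate = paraphrase_hit_rate(keyword_retrieve, cases)
print(f"hit rate under paraphrase: {rate:.2f}")
```

Swapping `keyword_retrieve` for a call into your real memory service gives a before/after number you can track as you change the retrieval policy.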

Concrete behavior: “graph-first, vector-fallback”

A practical production posture is to attempt graph retrieval first (because it’s more robust to rephrasing), then fall back to vector search for “loose” semantic matches. The key is to keep the fallback explicit so you can measure when the graph fails and why.
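
One way to keep the fallback measurable is to tally which path answered each query. The sketch below assumes every query response carries a `used` field, as the demo service later in this post does. The `path_breakdown` helper and the sample responses are hypothetical.

```python
from collections import Counter


def path_breakdown(responses: list[dict]) -> dict[str, float]:
    """Share of queries served by each retrieval path."""
    counts = Counter(r.get("used", "unknown") for r in responses)
    total = sum(counts.values())
    return {path: n / total for path, n in counts.items()} if total else {}


# Made-up responses standing in for a batch of logged query results.
sample = [
    {"used": "graph"},
    {"used": "graph"},
    {"used": "vector_fallback"},
    {"used": "graph"},
]
breakdown = path_breakdown(sample)
print(breakdown)
```

A rising fallback share is a prompt to look at graph coverage: either edges are missing at ingestion time or query-to-node seeding is too strict.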

This bash snippet shows a local run workflow: start the stack, write a memory, then query it twice with rephrased prompts and compare results.


# Start services
docker compose up -d --build

# Write a memory that includes a decision and its cause
curl -sS -X POST http://localhost:8080/memory/write \
  -H 'Content-Type: application/json' \
  -d '{
    "user_id": "u-123",
    "text": "We chose Postgres because we needed strong consistency and SQL reporting.",
    "tags": ["decision", "database"],
    "edges": [
      {"type": "MOTIVATES", "from": "need:strong-consistency", "to": "decision:use-postgres"},
      {"type": "MOTIVATES", "from": "need:sql-reporting", "to": "decision:use-postgres"}
    ]
  }' | jq .

# Query 1: direct phrasing
curl -sS -X POST http://localhost:8080/memory/query \
  -H 'Content-Type: application/json' \
  -d '{"user_id":"u-123","query":"Why did we pick Postgres?"}' | jq .

# Query 2: rephrased / inverted causality
curl -sS -X POST http://localhost:8080/memory/query \
  -H 'Content-Type: application/json' \
  -d '{"user_id":"u-123","query":"What constraints made us avoid other databases?"}' | jq .

# The "used" field in each response shows whether graph traversal or the vector fallback answered.
# Service logs are still useful for spotting request errors.
docker compose logs -n 200 memory
  

Genesys’ role in the memory workflow

In an agent stack, Genesys sits between the conversation stream and the model prompt. Its job is to (1) ingest interaction events into a structured memory representation and (2) retrieve the right subset of memories when a new user message arrives—especially when the message is rephrased.

Where the causal graph helps

  • Ingestion: extract entities/events/decisions and link them (cause/effect, depends-on, refers-to).
  • Retrieval: map a query to candidate nodes, traverse edges to gather supporting context, and return a compact memory bundle.
  • Prompt assembly: include the memory plus a justification path so the LLM can “see” why the memory is relevant.
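
The prompt assembly step can be as simple as rendering the bundle into text. This sketch assumes the bundle has the shape returned by the demo `/memory/query` endpoint later in this post (`memories` plus `justification` entries with `from`, `edge`, and `to`). The `format_memory_block` helper and its wording are hypothetical.

```python
def format_memory_block(bundle: dict) -> str:
    """Render retrieved memories and their justification path as prompt text."""
    lines = ["Relevant memories:"]
    lines += [f"- {m['text']}" for m in bundle.get("memories", [])]
    steps = bundle.get("justification", [])
    if steps:
        lines.append("Why they are relevant:")
        lines += [f"- {s['from']} -[{s['edge']}]-> {s['to']}" for s in steps]
    return "\n".join(lines)


bundle = {
    "memories": [
        {"id": "m-1", "text": "We chose Postgres because we needed strong consistency and SQL reporting."}
    ],
    "justification": [
        {"from": "need:strong-consistency", "edge": "MOTIVATES", "to": "decision:use-postgres"}
    ],
}
prompt_block = format_memory_block(bundle)
print(prompt_block)
```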

Figure: Architecture diagram showing Genesys Memory System as a graph-based memory service with graph traversal retrieval and vector fallback in an agent workflow

Operational guardrails

Graph retrieval can be powerful, but you need constraints:

  • Traversal limits (depth, max nodes) to cap latency.
  • Edge hygiene to prevent noisy links from polluting retrieval.
  • Fallback policy to avoid “no result” failures when the graph is sparse.

Deployment workflow: Dockerfile + Docker Compose

To make this practical, you want a local, reproducible way to run a memory service and iterate on retrieval behavior. Below is a minimal containerized setup: a Python-based HTTP service that reads the YAML config and exposes /memory/write and /memory/query. It’s intentionally small so you can swap in the real Genesys implementation or adapt the interface to match it.

Dockerfile: package the memory service

This Dockerfile builds a small FastAPI service image. It installs dependencies, copies the app, and runs Uvicorn on port 8080.


FROM python:3.12-slim

ENV PYTHONDONTWRITEBYTECODE=1 \
    PYTHONUNBUFFERED=1

WORKDIR /app

# Quote the requirements so the shell never treats "[standard]" as a glob pattern
RUN pip install --no-cache-dir "fastapi==0.115.0" "uvicorn[standard]==0.30.6" "pyyaml==6.0.2"

COPY app.py /app/app.py
COPY config.yaml /app/config.yaml

EXPOSE 8080

CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8080"]
  

Docker Compose: run locally as a service

This Compose file runs the memory service and mounts the config so you can tweak traversal limits and fallback behavior without rebuilding.


services:
  memory:
    build:
      context: .
    container_name: genesys-memory-local
    ports:
      - "8080:8080"
    volumes:
      - ./config.yaml:/app/config.yaml:ro
    environment:
      - CONFIG_PATH=/app/config.yaml
  

Minimal service implementation (working example)

This is a compact, working FastAPI app that demonstrates the shape of graph-first retrieval with a vector-like fallback. It’s not a full Genesys implementation, but it lets you test the key production behavior: rephrasing and causal traversal returning a justification path.


import os
from collections import defaultdict, deque
from typing import Any, Dict, List, Optional

import yaml
from fastapi import FastAPI
from pydantic import BaseModel, Field

app = FastAPI(title="Memory Service (graph-first)")

CONFIG_PATH = os.getenv("CONFIG_PATH", "config.yaml")


def load_config() -> Dict[str, Any]:
    with open(CONFIG_PATH, "r", encoding="utf-8") as f:
        return yaml.safe_load(f)


# In-memory stores for local dev
MEMORIES: Dict[str, List[Dict[str, Any]]] = defaultdict(list)  # user_id -> list of memories
# user_id -> node id -> edge type -> list of neighbor node ids
GRAPH: Dict[str, Dict[str, Dict[str, List[str]]]] = defaultdict(
    lambda: defaultdict(lambda: defaultdict(list))
)


class Edge(BaseModel):
    type: str
    # "from" is a reserved word in Python, so the JSON key is mapped through an alias
    from_: str = Field(..., alias="from")
    to: str


class WriteRequest(BaseModel):
    user_id: str
    text: str
    tags: Optional[List[str]] = None
    edges: Optional[List[Edge]] = None


class QueryRequest(BaseModel):
    user_id: str
    query: str


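def tokenize(s: str) -> set:
    # Strip surrounding punctuation and lowercase. Drop tokens that end up empty,
    # since an empty string would match every node id in graph_retrieve.
    tokens = (t.strip(".,!?()\"'").lower() for t in s.split())
    return {t for t in tokens if t}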


def vector_fallback(user_id: str, query: str, top_k: int) -> List[Dict[str, Any]]:
    q = tokenize(query)
    scored = []
    for m in MEMORIES[user_id]:
        score = len(q & tokenize(m["text"]))
        scored.append((score, m))
    scored.sort(key=lambda x: x[0], reverse=True)
    return [m for score, m in scored[:top_k] if score > 0]


def graph_retrieve(user_id: str, query: str, max_depth: int, max_nodes: int) -> Dict[str, Any]:
    # Seed nodes from query tokens by matching against known node ids
    q = tokenize(query)
    nodes = set()
    for node in GRAPH[user_id].keys():
        if any(tok in node.lower() for tok in q):
            nodes.add(node)

    visited = set()
    justification = []
    results = []

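    dq = deque([(n, 0) for n in nodes])
    while dq and len(visited) < max_nodes:
        node, depth = dq.popleft()
        if node in visited:
            continue
        visited.add(node)

        # Assumed heuristic for this demo: attach memories whose text shares a
        # token with the node id (e.g. "postgres" in "decision:use-postgres")
        node_tokens = tokenize(node.replace(":", " ").replace("-", " "))
        results.extend(m for m in MEMORIES[user_id] if node_tokens & tokenize(m["text"]))

        if depth >= max_depth:
            continue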

        for edge_key, neighs in GRAPH[user_id][node].items():
            for nxt in neighs:
                justification.append({"from": node, "edge": edge_key, "to": nxt})
                dq.append((nxt, depth + 1))

    # Deduplicate memories by id
    seen = set()
    uniq = []
    for m in results:
        mid = m.get("id")
        if mid and mid in seen:
            continue
        if mid:
            seen.add(mid)
        uniq.append(m)

    return {"memories": uniq[:8], "justification": justification[:50], "visited_nodes": list(visited)}


@app.post("/memory/write")
def write_memory(req: WriteRequest):
    mem_id = f"m-{len(MEMORIES[req.user_id]) + 1}"
    MEMORIES[req.user_id].append({"id": mem_id, "text": req.text, "tags": req.tags or []})

    # Add edges into adjacency list, keyed by edge type
    if req.edges:
        for e in req.edges:
            GRAPH[req.user_id][e.from_][e.type].append(e.to)
            # Also store reverse reference to improve recall under inverted phrasing
            GRAPH[req.user_id][e.to][f"REV_{e.type}"].append(e.from_)

    return {"ok": True, "id": mem_id, "graph_nodes": len(GRAPH[req.user_id])}


@app.post("/memory/query")
def query_memory(req: QueryRequest):
    cfg = load_config()
    policy = cfg["memory"]["retrieval_policy"]

    if policy != "graph_first":
        return {"error": "Only graph_first is supported in this demo"}

    gcfg = cfg["memory"]["graph"]["traversal"]
    out = graph_retrieve(
        req.user_id,
        req.query,
        max_depth=int(gcfg["max_depth"]),
        max_nodes=int(gcfg["max_nodes"]),
    )

    used = "graph"
    if not out["memories"] and cfg["memory"]["vector_fallback"]["enabled"]:
        used = "vector_fallback"
        top_k = int(cfg["memory"]["vector_fallback"]["top_k"])
        out = {"memories": vector_fallback(req.user_id, req.query, top_k), "justification": [], "visited_nodes": []}

    return {"used": used, **out}
  

Image build/push step (for promotion to a cluster)

If you later promote this service to a Kubernetes deployment, the Docker workflow is the same: build, tag, push to a registry, then deploy. Here are the concrete commands (replace the registry with yours).


# Build and tag
docker build -t ghcr.io/your-org/genesys-memory-service:0.1.0 .

# Authenticate (example: GitHub Container Registry)
echo "$GITHUB_TOKEN" | docker login ghcr.io -u your-username --password-stdin

# Push
docker push ghcr.io/your-org/genesys-memory-service:0.1.0
  

Conclusion

The recurring production complaint—“vector search just doesn’t find it”—is usually a mismatch between how users ask and how memories are indexed. Flat vector storage optimizes for semantic proximity; it’s often brittle under rephrasing, causal inversion, and multi-hop recall. The Genesys approach reframes memory as a causal graph, aiming to retrieve not just a similar chunk but the connected context that makes an answer correct and usable.

The reported LoCoMo result for Genesys (89.9%, 22 points above Mem0) is notable because it aligns with the exact failure mode teams see in agent memory: rephrased queries that should map to the same underlying facts. If your agent’s memory breaks when users change wording, a graph-first retrieval policy is a reasonable direction to evaluate.

Next step: run the Docker Compose setup locally, write a few “decision + cause” memories from your own domain, and test retrieval under rephrasing. If you can’t reproduce the failure locally, you won’t be able to fix it in production—make retrieval behavior measurable before you scale it.
