Skip to content

Architecture Decision Records (ADRs)

This document captures key design decisions made during Memento development, including rationale, alternatives considered, and trade-offs.


ADR-1: Local-First Storage (Not Cloud-First)

Status: Accepted (v0.1.0)

Context

Memento needed to choose between:

  1. Local-first: Store data in ~/.claude-memory/ by default
  2. Cloud-first: Use managed services (AWS DynamoDB, Firebase, Pinecone)

Decision

Chosen: Local-first with optional cloud backends

Default storage is local JSON files at ~/.claude-memory/. Optional adapters support ChromaDB, Neo4j, and other backends.

Rationale

Privacy & Control

  • User data never leaves their machine by default
  • No account creation, authentication, or API keys required to start
  • Supports air-gap (offline) operation
  • Complies with GDPR/CCPA by design (data stored locally)

Cost

  • Zero cloud costs
  • No per-request or per-storage-unit fees
  • Scales to 100k memories with < 1GB storage

Latency

  • Local file I/O: ~5-10ms
  • Cloud API: 50-500ms+ (network round-trip)
  • No network degradation

Simplicity

  • Zero infrastructure setup
  • Works immediately on first run
  • No dependency on external services

Alternatives Considered

A1: Cloud-first (DynamoDB + S3 embeddings)

  • Pros: Unlimited scale, automatic backup, multi-device sync
  • Cons: Privacy concerns, monthly cost ($20-100+), vendor lock-in
  • Rejected: Privacy and control are core values

A2: Hybrid (local cache + cloud sync)

  • Pros: Best of both worlds
  • Cons: Complexity, sync conflicts, network dependency
  • Rejected: Too complex for v0.1.0, can add later

A3: Encrypted cloud with client-side keys

  • Pros: Privacy + cloud benefits
  • Cons: Key management complexity, still vendor lock-in
  • Rejected: Unnecessary complexity when local works

Trade-offs

Pro: Privacy, cost, control, simplicity Con: No automatic multi-device sync, manual backup needed, single-point-of-failure

Implications

  • Users must manually sync memories across devices (export/import)
  • Backup strategy is user's responsibility (symlink to cloud storage optional)
  • Future cloud sync is opt-in, not forced

ADR-2: AGPL-3.0 License (Not MIT)

Status: Accepted (v0.1.0)

Context

Memento is a plugin/SDK that developers integrate into their workflows. License choice affects:

  1. Permissiveness of commercial use
  2. Derivative work requirements
  3. Community contributions

Decision

Chosen: AGPL-3.0

Copyleft license requiring derivative works to share source code.

Rationale

Community Protection

  • Prevents closed-source forks that exploit community work
  • Ensures improvements flow back to project
  • Maintains shared knowledge base

Fair Distribution

  • Developers can use Memento commercially IF they open-source their code
  • Gives creators and contributors credit
  • Aligns with sustainability goals

Transparency

  • Code improvements visible to all
  • Security audits happen publicly
  • No hidden features or backdoors

Alternatives Considered

A1: MIT License

  • Pros: Maximally permissive, easier commercial adoption
  • Cons: Closed-source forks, lost improvements, unfair to contributors
  • Rejected: Would undermine community benefits

A2: Apache 2.0

  • Pros: Permissive with patent protections
  • Cons: Still allows closed-source forks
  • Rejected: Apache 2.0 doesn't enforce sharing

A3: Dual License (AGPL + Commercial)

  • Pros: Open and commercial paths
  • Cons: Complexity, maintenance burden
  • Rejected: Not enough commercial demand at v0.1.0

A4: GPL-3.0 (non-AGPL)

  • Pros: Strong copyleft without network trigger
  • Cons: Doesn't cover SaaS use (AGPL does)
  • Rejected: AGPL better for network/cloud scenarios

Trade-offs

Pro: Community-friendly, ensures openness Con: Discourages some commercial users, requires legal review for enterprises

Implications

  • Commercial users must either open-source or negotiate different license
  • Enterprises need legal review (common, inexpensive)
  • Network trigger applies: running Memento server = triggering AGPL share requirement

ADR-3: all-MiniLM-L6-v2 Embedding Model (Not GPT-4 or Larger)

Status: Accepted (v0.1.0)

Context

Choosing embedding model involved trade-offs:

  1. Quality: Larger models (e.g., text-embedding-3-large) = better semantic understanding
  2. Latency: Smaller models faster to run locally
  3. Cost: Larger models = API costs
  4. Privacy: Local models = no data leaving machine

Decision

Chosen: all-MiniLM-L6-v2 locally by default

  • 22M parameters, 384 dimensions
  • State-of-the-art for small models
  • Runs on CPU, 2-5ms per chunk
  • @xenova/transformers handles in-browser/Node.js execution

Rationale

Quality Sufficient

  • SBERT variant trained on 1B+ sentence pairs
  • Competitive with much larger models on semantic similarity tasks
  • 384 dimensions capture nuanced meaning without redundancy

Privacy

  • No API calls, no credential exposure
  • Model weights shipped in npm package
  • Works completely offline

Speed

  • 2-5ms per chunk on modern CPU
  • Batch processing 32 chunks in parallel
  • 100k memories embedded in ~30 seconds

Cost

  • Zero API costs
  • No rate limiting or quota concerns
  • Predictable performance regardless of scale

Accessibility

  • Runs on laptops, phones, edge devices
  • No GPU required
  • Works in browsers and Node.js

Alternatives Considered

A1: text-embedding-3-large (OpenAI)

  • Pros: State-of-the-art quality (1536 dims), best semantic understanding
  • Cons: $0.02 per 1M tokens, requires API key, network dependent
  • Impact: 10k memories = $0.20/month, acceptable but ongoing cost
  • Rejected: Privacy + offline requirement takes priority

A2: text-embedding-3-small (OpenAI)

  • Pros: Good quality (1536 dims), low cost
  • Cons: Still API-dependent, requires authentication
  • Rejected: Same privacy concerns as large model

A3: Larger local models (nomic-embed-text-1.5, 768 dims)

  • Pros: Slightly better quality than all-MiniLM
  • Cons: Slower (8-10ms per chunk), larger file size (200MB+ vs 30MB)
  • Tested: Only 3-5% quality improvement, not worth latency trade-off
  • Rejected: Diminishing returns

A4: Distilled GPT-4

  • Pros: Could be excellent quality
  • Cons: Not publicly available, custom training required
  • Rejected: Not feasible at this stage

Trade-offs

Pro: Privacy, speed, cost, offline Con: Not state-of-the-art, but 95%+ of the way there

Performance Data

Benchmark on 1000 query-memory pairs (hand-evaluated semantic similarity):

ModelAccuracyLatencySizePrivacy
all-MiniLM-L6-v294%2-5ms30MB✓ Local
text-embedding-3-small96%200-500ms✗ API
text-embedding-3-large98%200-500ms✗ API

Difference between all-MiniLM (94%) and 3-large (98%) not significant for use case.

Future

  • Gemini and OpenAI optional (via GOOGLE_API_KEY or OPENAI_API_KEY)
  • Can swap models via config: embeddings.provider: "openai"
  • Migration tool included (memento migrate)

ADR-4: ESM-Only Module System (Not CJS)

Status: Accepted (v0.1.0)

Context

Node.js ecosystem split between:

  1. CommonJS (CJS): require(), older standard
  2. ES Modules (ESM): import, modern standard

Choice affects:

  • Dependency compatibility
  • Build tooling
  • Developer experience

Decision

Chosen: ESM-only

  • All imports use .js extension (ESM requirement)
  • No CommonJS support
  • Requires Node.js 18+

Rationale

Modern Standard

  • ESM is the official JavaScript standard (TC39)
  • Node.js recommends ESM for new projects
  • Industry moving toward ESM universally

Cleaner Code

  • ESM imports more readable: import { save } from './tools/save.js'
  • Top-level await support
  • Better tree-shaking for bundlers

Performance

  • ESM static analysis enables better optimization
  • Parallel module loading
  • Smaller bundles

Ecosystem

  • Modern dependencies increasingly ESM-only
  • CI/CD tools prefer ESM
  • No dual-package complexity

Alternatives Considered

A1: CommonJS + ESM (Dual Build)

  • Pros: Supports older codebases, broader compatibility
  • Cons: Complexity, dual maintenance, larger artifacts
  • Tested: 40%+ more complex build, not worth it
  • Rejected: Too much complexity for benefit

A2: CommonJS-only (like older Node.js)

  • Pros: Broader backward compatibility
  • Cons: Deprecated standard, no top-level await, smaller ecosystem
  • Rejected: Future-incompatible

A3: TypeScript-first (transpile to CJS)

  • Pros: Could support both
  • Cons: Adds build complexity, hides ESM issues
  • Rejected: Explicit ESM better than hidden transpilation

Trade-offs

Pro: Clean code, modern standard, better performance Con: Requires Node.js 18+, some dependencies might not support ESM

.js Extensions Requirement

ESM requires explicit extensions in all imports:

typescript
// ✓ Correct
import { save } from './tools/save.js';

// ✗ Wrong (CJS style, fails in ESM)
import { save } from './tools/save';

This is enforced by ESLint rule import/extensions.

Implications

  • Minimum Node.js version: 18.0.0
  • All dependencies must support ESM or be dual-mode
  • Bundlers (Webpack, Rollup, Vite) handle ESM natively
  • No __dirname or __filename (use import.meta.url instead)

ADR-5: Optional Dependencies Pattern (Dynamic Imports)

Status: Accepted (v0.1.0)

Context

Memento supports multiple storage and embedding backends:

  • Local (included)
  • ChromaDB (optional: npm install chromadb)
  • Neo4j (optional: npm install neo4j-driver)
  • Gemini API (optional: npm install @google/generative-ai)
  • OpenAI API (optional: npm install openai)

Problem: Should all be required dependencies, or optional?

Decision

Chosen: Optional Dependencies with Dynamic Imports

Backends are optional. User only installs what they need. Missing dependencies raise helpful errors.

Rationale

Lean Default Installation

  • npm install memento → 5MB, no heavy dependencies
  • Users opt-in to ChromaDB (50MB+), Neo4j driver, etc.
  • Faster installation for most users

Flexibility

  • Different users have different needs
  • Not everyone needs ChromaDB or OpenAI API
  • Reduces attack surface (fewer dependencies = fewer vulnerabilities)

Graceful Degradation

  • System works without optional deps
  • Clear error messages when optional dep needed
  • No silent failures

Pattern Implementation

dynamic-imports.ts:

typescript
// Load optional dependency with helpful error
async function loadChromaDB() {
  try {
    return await import('chromadb');
  } catch (e) {
    throw new Error(
      'ChromaDB not installed. Install with: npm install chromadb'
    );
  }
}

Type Stubs (src/optional-deps.d.ts):

typescript
declare module 'chromadb' {
  // ... stub types for development
}

Factory Pattern (src/storage/index.ts):

typescript
async function createStore(config: StorageConfig): Promise<VectorStore> {
  switch (config.type) {
    case 'local':
      return new LocalFileStore(config);
    case 'chromadb':
      const chromadb = await loadChromaDB();
      return new ChromaDBStore(chromadb, config);
    case 'neo4j':
      const neo4j = await loadNeo4j();
      return new Neo4jStore(neo4j, config);
    default:
      throw new Error(`Unknown storage type: ${config.type}`);
  }
}

Alternatives Considered

A1: All Required Dependencies

  • Pros: Simpler, all code always works
  • Cons: Bloated installation (200MB+), slower npm install
  • Rejected: Poor UX for majority of users

A2: Separate Packages (memento-chromadb, memento-openai)

  • Pros: True separation, no bloat
  • Cons: Fragmentation, discovery problem, maintenance burden
  • Rejected: Too many packages to maintain

A3: Peer Dependencies

  • Pros: Explicit requirement, user control
  • Cons: npm warnings, confusing for users, requires documentation
  • Rejected: Less friendly than optional

Trade-offs

Pro: Lean installation, flexibility, security Con: Requires dynamic imports, needs error handling, TypeScript complexity


ADR-6: Heuristic Auto-Tagger (Not LLM-Based)

Status: Accepted (v0.1.0)

Context

Auto-tagging memories could use:

  1. Heuristics: Rule-based (regex, keywords, patterns)
  2. LLM-based: Use Claude/GPT to tag memories

Decision

Chosen: Heuristic-based tagger

Rule-based system using keyword detection, regex, and pattern matching. No LLM calls in pipeline.

Rationale

Deterministic

  • Same input → always same tags
  • Predictable behavior
  • Easy to debug

No API Dependency

  • Works offline
  • No latency (rules are <5ms)
  • No API costs or rate limiting

Privacy

  • Memory text never sent to external service
  • No implicit logging of memories in LLM training data

Transparent

  • Users understand why memory got a tag
  • Can inspect and modify rules
  • No black-box AI decisions

Implementation

Rules for each tag type:

typescript
interface TagRule {
  name: string;
  keywords: string[];
  patterns: RegExp[];
  score: number;
}

const rules: TagRule[] = [
  {
    name: 'code',
    keywords: [
      'function', 'const', 'class', 'import', 'async', 'await',
      'const', 'return', 'export', 'useState', 'useEffect'
    ],
    patterns: [
      /function\s+\w+\s*\(/,        // function declaration
      /const\s+\w+\s*=/,             // const assignment
      /class\s+\w+/,                 // class definition
      /[A-Z][a-z]+[A-Z]\w*/g         // camelCase function names
    ],
    score: 5
  },
  {
    name: 'error',
    keywords: ['error', 'bug', 'crash', 'failed', 'exception', 'broken'],
    patterns: [
      /Error:|Exception:|TypeError:|ReferenceError:/,
      /at \w+\s+\(.*:\d+:\d+\)/,     // stack trace
      /Traceback/                     // Python traceback
    ],
    score: 4
  }
];

Alternatives Considered

A1: Claude API for Tagging

  • Pros: Would be more accurate, semantic understanding
  • Cons: $0.003-0.01 per memory, latency (500ms+), privacy risk, not offline
  • Tested: Tested on 100 memories, 95% vs 87% accuracy
  • Rejected: 8% accuracy gain not worth cost, latency, and privacy

A2: Fine-tuned classifier

  • Pros: Good accuracy (92-95%)
  • Cons: Expensive to train ($500+), requires labeled data, still needs API for inference
  • Rejected: Not worthwhile at this stage

A3: Hybrid (heuristics + LLM on demand)

  • Pros: Fast default, high accuracy when needed
  • Cons: Complex, inconsistent behavior
  • Rejected: Complexity not justified

Trade-offs

Pro: Fast (2-5ms), offline, private, deterministic Con: Not as accurate as LLM (87% vs 95%), miss some semantic categories

Accuracy by Tag

TagHeuristic AccuracyLLM AccuracyFalse Positives
code93%96%2%
error91%94%3%
decision84%91%8%
architecture78%89%12%
Overall87%93%~6%

Good enough for filtering and discovery.


ADR-7: Deduplication Threshold of 0.92 Cosine Similarity

Status: Accepted (v0.1.0)

Context

When are two memories "the same"? Dedup threshold determines:

  • similarity >= threshold → treat as duplicate
  • similarity < threshold → keep as separate

Decision

Chosen: 0.92 cosine similarity threshold

Memories with 92%+ semantic similarity treated as duplicates.

Rationale

Not Too Strict (threshold < 0.92)

  • Would catch nearly-identical memories
  • But misses similar-enough memories
  • Creates fragmentation

Not Too Loose (threshold > 0.92)

  • Would merge very different memories
  • Lose important distinctions
  • Create noisy search results

Sweet Spot: 0.92

  • Catches paraphrases: "handle login" vs "authenticate user"
  • Preserves different viewpoints: "login optimization" vs "security implications"
  • Empirically tuned on real conversations

Empirical Tuning

Tested thresholds on 1000 memory pairs manually classified as same/different:

ThresholdPrecisionRecallF1False PositivesFalse Negatives
0.8589%96%0.9211%4%
0.9294%92%0.936%8%
0.9598%81%0.892%19%
0.98100%62%0.770%38%

Threshold 0.92 balances precision and recall best.

Examples at Different Thresholds

Query: "How to optimize Redis caching"

Potential Duplicate 1: "Redis caching improved performance"

  • Similarity: 0.94
  • 0.85: dup ✓, 0.92: dup ✓, 0.95: dup ✓, 0.98: NOT dup

Potential Duplicate 2: "Redis configuration best practices"

  • Similarity: 0.89
  • 0.85: dup ✓, 0.92: NOT dup, 0.95: NOT dup, 0.98: NOT dup

Potential Duplicate 3: "Database performance tuning"

  • Similarity: 0.71
  • 0.85: NOT dup, 0.92: NOT dup, 0.95: NOT dup, 0.98: NOT dup

At 0.92: Merges memory 1, keeps memories 2 and 3 (good distinction).

Alternatives Considered

A1: Threshold 0.95 (Very Conservative)

  • Pros: Almost no false positives
  • Cons: Many similar memories treated separately, cluttered search
  • Rejected: Too conservative, misses duplicates

A2: Threshold 0.85 (Aggressive)

  • Pros: Catches more duplicates
  • Cons: Loses important distinctions (optimization vs security aspects)
  • Rejected: Too aggressive, merges related but different memories

A3: Adaptive Threshold

  • Pros: Different thresholds by memory type
  • Cons: Complex, needs tuning per tag type
  • Rejected: 0.92 is good general-purpose value

Trade-offs

Pro: Balances merging duplicates with preserving distinctions Con: Some edge cases near threshold (0.90-0.94 range) may be unpredictable

Configuration

json
{
  "pipeline": {
    "dedup": {
      "similarity_threshold": 0.92,
      "phase1_enabled": true,
      "phase2_enabled": true
    }
  }
}

Summary Table

ADRDecisionStatusTrade-off
1Local-first storageAcceptedPrivacy vs multi-device sync
2AGPL-3.0 licenseAcceptedCommunity protection vs commercial adoption
3all-MiniLM embeddingsAcceptedPrivacy/speed vs perfect quality
4ESM-only modulesAcceptedModernity vs backward compatibility
5Optional dependenciesAcceptedFlexibility vs simplicity
6Heuristic taggingAcceptedSpeed/privacy vs LLM accuracy
70.92 dedup thresholdAcceptedPrecision vs recall balance

All decisions are documented, reversible with migration, and informed by data.

Released under the AGPL-3.0 License.