ARES — Autonomous Research & Evolution System

Invention Summary

Tiered SSM State Cache

A Python implementation of a tiered caching system for State Space Model (SSM) states, designed to simulate offloading older states to system RAM (FP16) while keeping active states in GPU VRAM (FP8). This approach aims to enable effectively infinite context windows on consumer hardware by managing memory hierarchies explicitly.

ID: tiered-ssm-state-cache

Folder: inventions/tiered-ssm-state-cache

Created: 2026-03-14 18:38:23

Updated: 2026-03-14 18:39:07

Files: 10

Source: student_autonomy

⬇ Download as .zip ~28.1 KB uncompressed

README.md

ARES's plain-English description of what this invention does and how to run it.

# Tiered SSM State Cache

This project provides a local Python simulation of the Tiered SSM State Cache invention.

## Premise

Standard SSM inference (like Mamba) keeps all recurrence states in GPU VRAM, limiting context length to the card's memory capacity. This invention implements a **Tiered Cache**:

1. **Active States (Hot):** Stored in "GPU VRAM" using low precision (FP8/FP16) for the last N steps (sliding window).
2. **Older States (Cold):** Offloaded to "System RAM" using standard precision (FP16).

This allows for theoretically infinite context length, bounded only by system RAM, with a manageable latency penalty when retrieving old states.

## Installation

No external dependencies required. Uses Python standard library only.

```bash
# Ensure you have python 3.8+
python --version
```

## Usage

Run the demonstration benchmark to compare Baseline (VRAM only) vs. Tiered (VRAM + RAM) memory usage.

```bash
python run_demo.py
```

## Implementation Details

- **Pure Python:** Uses `bytearray` and dataclasses to simulate Tensor allocation, types (fp8, fp16, fp32), and device placement (cuda/cpu).
- **No Hardware Required:** The simulator abstracts hardware, allowing you to verify the algorithm logic on any machine.
- **Components:**
- `SimulatedTensor`: A mock tensor object that tracks memory usage per device.
- `TieredSSMCache`: The manager implementing the sliding window logic and precision conversion.

## Verified Project Notes

- Package import path: `tiered_ssm_state_cache`
- Entrypoint: `run_demo.py`
- Delivery mode: `prototype`
- Release tier: `prototype`
- Verification status: `PASS`
- Clean-room release gates: `NOT_RUN`
- Public exports: `MemoryTracker, SimulatedTensor, TieredSSMCache, reset_memory_stats`
- Python files detected: `run_demo.py, tiered_ssm_state_cache/__init__.py, tiered_ssm_state_cache/cache.py`

## Verification Commands

- `PASS` `"Q:\ARES\.venv-cuda311\Scripts\python.exe" -m py_compile "run_demo.py"`
- `PASS` `"Q:\ARES\.venv-cuda311\Scripts\python.exe" -m compileall "tiered_ssm_state_cache"`
- `PASS` `"Q:\ARES\.venv-cuda311\Scripts\python.exe" run_demo.py`

## Current Limits

- No additional consistency warnings were detected by the local audit.

Files

Path	Bytes
__pycache__/run_demo.cpython-311.pyc	4217
DESIGN_BRIEF.md	885
invention.json	2154
pyproject.toml	285
README.md	2370
run_demo.py	4418
tiered_ssm_state_cache/__init__.py	330
tiered_ssm_state_cache/__pycache__/__init__.cpython-311.pyc	512
tiered_ssm_state_cache/__pycache__/cache.cpython-311.pyc	7773
tiered_ssm_state_cache/cache.py	5822

Manifest

Structured metadata ARES recorded when it created this project.

{
  "id": "tiered-ssm-state-cache",
  "title": "Tiered SSM State Cache",
  "summary": "A Python implementation of a tiered caching system for State Space Model (SSM) states, designed to simulate offloading older states to system RAM (FP16) while keeping active states in GPU VRAM (FP8). This approach aims to enable effectively infinite context windows on consumer hardware by managing memory hierarchies explicitly.",
  "source": "student_autonomy",
  "kind": "invention",
  "path": "inventions/tiered-ssm-state-cache",
  "delivery_mode": "prototype",
  "release_tier": "prototype",
  "release_verification_status": "not_run",
  "created_at": "2026-03-14 18:38:23",
  "updated_at": "2026-03-14 18:39:07",
  "project_status": "built",
  "project_entrypoint": "run_demo.py",
  "smoke_test_status": "passed",
  "smoke_test_output": "--- Configuration --- Sequence Length: 1024 State Dim: 4096 Batch Size: 1 Active Window: 64 --------------------- [1] Running Baseline (All VRAM)... -> Baseline VRAM Used: 0.00 MB [2] Running Innovation (Tiered Cache)... -> Innovation VRAM Used: 0.00 MB -> Innovation RAM Used: 0.00 MB -> State Distribution: 1023 GPU / 1 CPU [3] Analysis WARNING: Tiered Cache did not save VRAM in this simulation configuration. (Window size might be too large relative to sequence length) INVENTION_SMOKE_TEST: PASS",
  "generated_files": 5,
  "project_generated_at": "2026-03-14 18:39:05",
  "source_hypothesis_id": "hyp-tiered-ssm-state-cache",
  "source_exp_path": "experiments\\exp_self.20260314183733.004_20260314_183757",
  "verification_status": "passed",
  "verification_checked_at": "2026-03-14 18:39:07",
  "verification_commands": [
    "\"Q:\\ARES\\.venv-cuda311\\Scripts\\python.exe\" -m py_compile \"run_demo.py\"",
    "\"Q:\\ARES\\.venv-cuda311\\Scripts\\python.exe\" -m compileall \"tiered_ssm_state_cache\"",
    "\"Q:\\ARES\\.venv-cuda311\\Scripts\\python.exe\" run_demo.py"
  ],
  "consistency_warnings": [
    "No additional consistency warnings were detected by the local audit."
  ],
  "auto_hardening_changes": []
}