# Cross-Layer State Distillation (CLSD)
## Overview
Cross-Layer State Distillation (CLSD) is a model compression and optimization technique where the internal state (activations) of a "teacher" layer is used as a target for a "student" layer.
Unlike traditional Knowledge Distillation, which aligns final outputs, CLSD aligns intermediate feature maps. The student layer is trained to reproduce the teacher's internal representation directly, with the aim of obtaining, earlier in the network, representations that would otherwise require more depth.
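To make the distinction concrete, here is a minimal sketch that computes both kinds of loss on a toy two-layer network. It does not use this package: the `dense` and `mse` helpers, the sigmoid activation, and all weight values are assumptions chosen purely for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dense(weights, inputs):
    """One dense layer: sigmoid(W x). The sigmoid is an assumed activation."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs))) for row in weights]

def mse(a, b):
    """Mean squared error between two equal-length activation vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)

# Hypothetical weights: both networks share the same second layer and
# differ only in the first (hidden) layer.
teacher_w1 = [[0.5, -0.2], [0.1, 0.4]]
student_w1 = [[0.0, 0.1], [-0.3, 0.2]]
shared_w2 = [[0.3, 0.7]]

x = [1.0, 2.0]
teacher_hidden = dense(teacher_w1, x)
student_hidden = dense(student_w1, x)
teacher_out = dense(shared_w2, teacher_hidden)
student_out = dense(shared_w2, student_hidden)

# Output-level distillation compares only the final predictions.
output_loss = mse(student_out, teacher_out)
# CLSD compares the intermediate states themselves.
state_loss = mse(student_hidden, teacher_hidden)

print(f"output-level loss: {output_loss:.6f}")
print(f"state-level loss:  {state_loss:.6f}")
```

In this sketch, driving the state-level loss to zero would also zero the output-level loss, because the second layer is shared. The converse does not hold in general: two different hidden states can produce the same output.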
## Project Scope
This package provides a minimal, dependency-free implementation of CLSD using pure Python.
**Features:**
* **Zero Dependencies:** Uses only Python standard library (`random`, `math`).
* **Custom Neural Engine:** Includes a tiny matrix/tensor library and gradient logic written specifically for the distillation task.
* **CLSD Trainer:** Logic to minimize the Mean Squared Error (MSE) between the student layer's state and the teacher layer's state.
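As a rough idea of what a standard-library-only matrix helper can look like, the sketch below multiplies matrices stored as lists of rows. The `matmul` and `transpose` functions are hypothetical stand-ins and are not taken from the package's `Tensor` class, whose internals are not shown in this README.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    if not a or not b or len(a[0]) != len(b):
        raise ValueError("inner dimensions must match")
    cols_b = list(zip(*b))  # columns of b, so each entry is a plain dot product
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]

def transpose(m):
    """Swap rows and columns."""
    return [list(col) for col in zip(*m)]

# A single-row input (1 x 3) times a weight matrix (3 x 2) gives a 1 x 2 result.
batch = [[1.0, 2.0, 3.0]]
weights = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
print(matmul(batch, weights))
```

Storing matrices as nested lists keeps the code dependency-free at the cost of speed, which matches the trade-off this project makes.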
## Limitations
* **Simulation Only:** This implementation is intended for educational and algorithmic demonstration. It uses synthetic random data and small network dimensions.
* **Performance:** Pure Python matrix operations are significantly slower than C-accelerated libraries like NumPy or PyTorch. Do not use for production training.
* **Scope:** The backward pass is specialized for the distillation loss (MSE between layer states); it does not implement a general computational graph.
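A backward pass specialized for one loss can be written out by hand rather than derived from a graph. The following is a minimal sketch of that idea, assuming a single layer of the form `sigmoid(W x)` with no bias. The activation, the helper names (`forward`, `mse_grad`, `sgd_step`), the learning rate, and the weight ranges are all assumptions for illustration and may differ from what `core.py` does.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(weights, x):
    """Return sigmoid(W x) for a weight matrix stored as a list of rows."""
    return [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in weights]

def mse(student, teacher):
    return sum((s - t) ** 2 for s, t in zip(student, teacher)) / len(student)

def mse_grad(weights, x, teacher):
    """Hand-derived dL/dW for L = mean((sigmoid(W x) - teacher)^2)."""
    student = forward(weights, x)
    n = len(student)
    grads = []
    for s, t in zip(student, teacher):
        # Chain rule: dL/ds * ds/dz, where ds/dz = s * (1 - s) for a sigmoid.
        delta = (2.0 / n) * (s - t) * s * (1.0 - s)
        grads.append([delta * xi for xi in x])  # dz/dW_row = x
    return grads

def sgd_step(weights, grads, lr):
    """Plain gradient descent update, returning a new weight matrix."""
    return [[w - lr * g for w, g in zip(w_row, g_row)]
            for w_row, g_row in zip(weights, grads)]

rng = random.Random(0)  # fixed seed so the run is repeatable
x = [rng.random() for _ in range(4)]
teacher_w = [[rng.uniform(-0.5, 0.5) for _ in range(4)] for _ in range(4)]
student_w = [[rng.uniform(-0.5, 0.5) for _ in range(4)] for _ in range(4)]
target = forward(teacher_w, x)  # the teacher state is fixed during training

initial_loss = mse(forward(student_w, x), target)
for _ in range(200):
    student_w = sgd_step(student_w, mse_grad(student_w, x, target), lr=0.5)
final_loss = mse(forward(student_w, x), target)

print(f"initial loss: {initial_loss:.6f}")
print(f"final loss:   {final_loss:.6f}")
```

Because the gradient formula is tied to this exact layer and loss, changing either one means rewriting `mse_grad` by hand, which is the trade-off the scope limitation describes.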
## Installation
No installation required. Ensure you have Python 3.7+.
## Usage
Run the smoke test to see CLSD in action:
```bash
python run_demo.py
```
### Code Example
```python
import random
from cross_layer_state_distillation_clsd import Layer, CLSDTrainer, Tensor
# 1. Define a simple architecture
input_size = 4
hidden_size = 4

# Student: the layer whose weights will be updated
student_layer = Layer(input_size, hidden_size)
# Teacher: the layer whose state serves as the target
# (in this toy example it has the same shape as the student)
teacher_layer = Layer(input_size, hidden_size)

# Initialize the trainer
trainer = CLSDTrainer(learning_rate=0.1)

# 2. Generate one row of synthetic input
data = Tensor([[random.random() for _ in range(input_size)]])

# 3. Compute both layers' states for the same input
teacher_state = teacher_layer.forward(data)
student_state = student_layer.forward(data)
print(f"Initial Distillation Loss: {trainer.compute_loss(student_state, teacher_state):.4f}")

# 4. Distill: update the student's weights toward the teacher's state
trainer.distill_step(student_layer, data, teacher_state)

# 5. Recompute the student's state and compare the loss
new_student_state = student_layer.forward(data)
print(f"Post-Distillation Loss: {trainer.compute_loss(new_student_state, teacher_state):.4f}")
```
<!-- ARES_AUTO_VERIFIED_SUMMARY:START -->
## Verified Project Notes
- Package import path: `cross_layer_state_distillation_clsd`
- Entrypoint: `run_demo.py`
- Delivery mode: `prototype`
- Release tier: `prototype`
- Verification status: `PASS`
- Clean-room release gates: `NOT_RUN`
- Public exports: `CLSDTrainer, Layer, Tensor`
- Python files detected: `run_demo.py, cross_layer_state_distillation_clsd/__init__.py, cross_layer_state_distillation_clsd/core.py`
## Verification Commands
- `PASS` `"/home/corbybender/ares/.venv-linux/bin/python" -m py_compile "run_demo.py"`
- `PASS` `"/home/corbybender/ares/.venv-linux/bin/python" -m compileall "cross_layer_state_distillation_clsd"`
- `PASS` `"/home/corbybender/ares/.venv-linux/bin/python" run_demo.py`
## Current Limits
- No additional consistency warnings were detected by the local audit.
<!-- ARES_AUTO_VERIFIED_SUMMARY:END -->
| Path | Bytes |
|---|---|
| __pycache__/run_demo.cpython-314.pyc | 4171 |
| cross_layer_state_distillation_clsd/__init__.py | 224 |
| cross_layer_state_distillation_clsd/__pycache__/__init__.cpython-314.pyc | 459 |
| cross_layer_state_distillation_clsd/__pycache__/core.cpython-314.pyc | 12284 |
| cross_layer_state_distillation_clsd/core.py | 6775 |
| DESIGN_BRIEF.md | 797 |
| invention.json | 3300 |
| pyproject.toml | 371 |
| README.md | 3516 |
| run_demo.py | 3387 |
{
"id": "cross-layer-state-distillation-clsd",
"title": "Cross-Layer State Distillation (CLSD)",
"summary": "A pure-Python implementation of Cross-Layer State Distillation, a technique where internal activations (states) of a teacher layer are used to train a student layer within the same or a different network. This project implements a lightweight neural engine from scratch to demonstrate how knowledge can be compressed from deeper representations into shallower layers.",
"source": "student_autonomy",
"kind": "invention",
"path": "inventions/cross-layer-state-distillation-clsd",
"delivery_mode": "prototype",
"release_tier": "prototype",
"release_verification_status": "not_run",
"created_at": "2026-03-29 07:57:19",
"updated_at": "2026-03-29 07:58:08",
"project_entrypoint": "run_demo.py",
"smoke_test_status": "passed",
"smoke_test_output": "--- Initializing CLSD Demo --- Config: Input=5, Hidden=5, Steps=200 Input Vector: [0.99, 0.64, 0.56, 0.68, 0.84] Initial Teacher State (first 3): [0.4636, 0.5002, 0.5242] Initial Student State (first 3): [0.4955, 0.5099, 0.4559] Initial Distillation Loss (MSE): 0.001248 Starting Distillation Training... Step 50/200 - Loss: 0.000010 Step 100/200 - Loss: 0.000000 Step 150/200 - Loss: 0.000000 Step 200/200 - Loss: 0.000000 --- Results --- Final Teacher State (first 3): [0.4636, 0.5002, 0.5242] Fina",
"generated_files": 5,
"project_generated_at": "2026-03-29 07:58:07",
"source_exp_path": "experiments\\exp_self.20260308175017.011_20260308_175052",
"verification_status": "passed",
"verification_results": [
{
"command": "\"/home/corbybender/ares/.venv-linux/bin/python\" -m py_compile \"run_demo.py\"",
"passed": true,
"returncode": 0,
"timed_out": false,
"stdout_excerpt": "",
"stderr_excerpt": ""
},
{
"command": "\"/home/corbybender/ares/.venv-linux/bin/python\" -m compileall \"cross_layer_state_distillation_clsd\"",
"passed": true,
"returncode": 0,
"timed_out": false,
"stdout_excerpt": "Listing 'cross_layer_state_distillation_clsd'...",
"stderr_excerpt": ""
},
{
"command": "\"/home/corbybender/ares/.venv-linux/bin/python\" run_demo.py",
"passed": true,
"returncode": 0,
"timed_out": false,
"stdout_excerpt": "--- Initializing CLSD Demo ---\nConfig: Input=5, Hidden=5, Steps=200\n\nInput Vector: [0.99, 0.64, 0.56, 0.68, 0.84]\n\nInitial Teacher State (first 3): [0.4636, 0.5002, 0.5242]\nInitial Student State (first 3): [0.4955, 0.5099, 0.4559]\nInitial Distillation Loss (MSE): 0.001248\n\nStarting Distillation Training...\n Step 50/200 - Loss: 0.000010\n Step 100/200 - Loss: 0.000000\n Step 150/200 - Loss: 0.000000\n Step 200/200 - Loss: 0.000000\n\n--- Results ---\nFinal Teacher State (first 3): [0.4636, 0.5002, 0.5242]\nFinal Student State (first 3): [0.4636, 0.5002, 0.5242]\nFinal Distillation Loss (MSE): 0.000000\n\nSUCCESS: Student layer successfully mimicked Teacher layer state.\n\nINVENTION_SMOKE_TEST: PASS",
"stderr_excerpt": ""
}
],
"project_status": "built"
}