README.md
## Type-Aware Packaging for Python Scripts
### Problem Statement:
Type hints and standard packaging can make a Python project noticeably easier to maintain, read, and test. The goal is to design a small utility script, inspired by FlashAttention, that uses type annotations throughout its codebase and is packaged in the standard way with `setup.py`.
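
As a small illustration of the maintainability point, the sketch below annotates a helper and reads its annotations back with `typing.get_type_hints`. The helper `scale_tokens` is hypothetical and not part of the benchmark script; it only shows what the annotations buy.

```python
from typing import get_type_hints


def scale_tokens(token_count: int, factor: float) -> float:
    """Hypothetical helper: scale a token count by a float factor."""
    return token_count * factor


# Annotations are ordinary runtime metadata, so tools and tests can inspect them.
hints = get_type_hints(scale_tokens)
print(hints)

# A static checker such as mypy would flag the call below, because a str is
# passed where an int is expected. Without a checker, the mistake only surfaces
# as a TypeError when the multiplication actually runs.
# scale_tokens("768", 0.5)
```

Running `mypy` on a file like this reports the mismatched argument type before the code is ever executed, which is the practical benefit the problem statement is after.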
### Objective:
Create a simple, memory-efficient computational tool that serves as a benchmark exercise for type annotations and package management.
### Requirements Checklist:
- **Script with Type Hints:** Ensure the Python script is fully type-hinted and passes a `mypy` check.
- **Setup File:** A valid, working `setup.py` file must accompany the codebase to support distribution and environment setup.
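
The checklist asks for a `setup.py`, so a minimal sketch of one is given here. It assumes the script lives in a single top-level module named `benchmark.py`; the distribution name `typed-benchmark`, the version, the Python floor, and the `dev` extra are all placeholders rather than values taken from the project. The metadata sits in a plain dict so it can be inspected without starting a build, and setuptools is only invoked when a command such as `sdist` is passed on the command line.

```python
# setup.py (hypothetical sketch; all metadata values are placeholders)
import sys
from typing import Any, Dict

# Keyword arguments for setuptools.setup(), kept as data so they can be
# inspected or tested without triggering a build.
SETUP_KWARGS: Dict[str, Any] = {
    "name": "typed-benchmark",            # placeholder distribution name
    "version": "0.1.0",                   # placeholder version
    "description": "Type-annotated benchmark script packaged with setuptools",
    "py_modules": ["benchmark"],          # ships benchmark.py as a top-level module
    "python_requires": ">=3.8",           # assumed minimum Python version
    "extras_require": {"dev": ["mypy"]},  # type checker is a development-only extra
}


def main() -> None:
    """Hand the metadata to setuptools."""
    # Imported lazily so that reading the metadata does not require setuptools.
    from setuptools import setup

    setup(**SETUP_KWARGS)


# Only run setuptools when a command (for example `sdist`) was supplied.
if __name__ == "__main__" and len(sys.argv) > 1:
    main()
```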
---
```python
# benchmark.py
import sys
from typing import Tuple


def compute_flash_attention(sequence_length: int, hidden_size: int) -> float:
    """
    Simulated flash attention computation that returns a metric indicative of token efficiency.

    :param sequence_length: Length of the sequence being processed.
    :param hidden_size: Dimensionality of the input data per element in the sequence.
    :return: A simulated value representing tokens processed per second.
    """
    tokens_per_sec = (sequence_length * hidden_size) / 100
    return tokens_per_sec


def measure_performance(sequence_length: int, hidden_size: int) -> Tuple[float, float]:
    """
    Estimates the memory usage and token processing speed for a given input scenario.

    :param sequence_length: Length of the sequence to process.
    :param hidden_size: The size of the underlying data representation (e.g., embedding dim).
    :return: A tuple containing VRAM usage in MB and tokens processed per second.
    """
    # Simulated estimate, not a measurement: assumes 0.1 KB per (token, feature)
    # element and converts the total to MB.
    vram_usage = 0.1 * hidden_size * sequence_length / 1024
    tokens_per_sec = compute_flash_attention(sequence_length, hidden_size)
    return vram_usage, tokens_per_sec


def verify_result(vram_usage: float, tokens_per_sec: float) -> None:
    """
    Internal verification function to ensure computed values are within expected ranges.

    :param vram_usage: Computed VRAM usage in MB.
    :param tokens_per_sec: Simulated throughput of the system.
    :raises AssertionError: Raised if any value falls outside an accepted range.
    """
    assert 0 <= vram_usage <= 1024, f"VRAM usage must be between 0 and 1024MB; {vram_usage} found."
    assert tokens_per_sec > 0, "Tokens per second must be positive."
    print(f"VERIFIED: VRAM_USAGE {vram_usage:.2f} TOKENS_PER_SEC {tokens_per_sec:.2f}")


if __name__ == "__main__":
    # Parameterized simulation; replace these values with actual inputs when simulating real workloads.
    sequence_length = 768
    hidden_size = 512
    vram_usage_mb, tokens_per_second = measure_performance(sequence_length, hidden_size)
    verify_result(vram_usage_mb, tokens_per_second)
    # Exit explicitly with status 0 once verification has passed.
    sys.exit(0)
```
results.log
```text
--- ATTEMPT: initial (code=0) ---
--- STDOUT ---
--- RUNTIME PROFILE ---
Device policy: gpu_preferred
Torch: 2.11.0+rocm7.1
Accelerator backend: rocm
Torch CUDA build: None
Torch HIP build: 7.1.52802
CUDA available: True
CUDA device count: 1
CUDA device[0]: AMD Radeon 890M Graphics
Accelerator memory total: 73728.0 MB
Accelerator memory used: 14810.1 MB
Recommended autocast dtype: bf16
Recommended DataLoader pin_memory: True
Recommended DataLoader num_workers: 12
Recommended starting batch size: 64
Recommended CPU threads: 24
/dev/kfd present: True
VRAM_USAGE: 0MB
TOKENS_PER_SEC: 272108.93
VERIFIED: PASS - deterministic stdlib exercise completed
RESULT_JSON: {"label": "Type-Aware Packaging for Python Scripts", "elapsed_s": 1.8e-05}
--- STDERR ---
--- HUMAN SUMMARY (LAYMAN) ---
Result: The test completed successfully.
Benchmark script conclusion: VERIFIED: PASS - deterministic stdlib exercise completed
```