| Experiment / Paper | Topic / Summary | Created | Status | Error | Actions |
|---|---|---|---|---|---|
|
exp_pytrain.20260522223131.031_20260522_223248
|
Python Skill Fallback
Title: Building a Type-Aware Package with Data Validation - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-22 22:33 | Success | - | |
|
exp_pytrain.20260522213303.030_20260522_213436
|
Python Skill Fallback
Title: Creating a Type-safe Asyncio Service - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-22 21:35 | Success | - | |
|
exp_pytrain.20260522203336.029_20260522_203519
|
Python Skill Fallback
Title: Creating a Type-Annotated and Packaged Python Application - Focus: Python Standard Library, Type Annotations, Packaging with setuptools - Note: Generated fallback due to unavailable model output.
|
05-22 20:36 | Success | - | |
|
exp_pytrain.20260522193012.028_20260522_193159
|
Python Skill Fallback
Title: Creating a Python Package with Typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-22 19:33 | Success | - | |
|
exp_pytrain.20260522182955.027_20260522_183110
|
Python Skill Fallback
Title: Creating a Type-Safe and Packaged Python Tool - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-22 18:32 | Success | - | |
|
exp_pytrain.20260522172748.026_20260522_172906
|
Python Skill Fallback
Title: Packaging a Python Type-Hinted Library - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-22 17:30 | Success | - | |
|
exp_pytrain.20260522162527.025_20260522_162732
|
This Python code is a simplified validation tool targeting Python package files for type annotations and setup configura...
README.md Description: The `benchmark.py` script provides a basic benchmark test for a Python project validation tool against predefined rules focusing on typing correctness (PEP 484) and following the package specifications guidelines (PEP...
|
05-22 16:28 | Success | - | |
|
exp_pytrain.20260522152337.024_20260522_152502
|
Python Skill Fallback
Title: Type-Safe Python Package Manager - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-22 15:26 | Success | - | |
|
exp_pytrain.20260522142445.023_20260522_142646
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-22 14:27 | Success | - | |
|
exp_pytrain.20260522132826.022_20260522_132936
|
This is a benchmark for running a synthetic workload involving type annotations as per PEP 695 in a package structure. T...
To run the benchmark: 1. Ensure Python version >=3.9.0, which supports PEP 585. 2. Execute `python benchmark.py` from this directory. Expected Output: The script concludes with either **VERIFIED:** indicating a successful verification or **...
|
05-22 13:30 | Success | - | |
|
exp_pytrain.20260522122304.021_20260522_122418
|
Use type hints to create a utility function that calculates memory usage based on data size parameters. Ensure the imple...
Benchmark the runtime performance and report results clearly as required including a PASS/FAIL statement. The self-checks should include various edge cases like null input types and boundary values ensuring reliability. ```python import sys...
|
05-22 12:25 | Success | - | |
|
exp_pytrain.20260522112344.020_20260522_112512
|
Python Skill Fallback
Title: Package FlashAttention with Typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-22 11:26 | Success | - | |
|
exp_pytrain.20260522102248.019_20260522_102412
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-22 10:25 | Success | - | |
|
exp_pytrain.20260522092345.018_20260522_092522
|
Python Skill Fallback
Title: Building a Type-Checked Logging Package - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-22 09:26 | Success | - | |
|
exp_pytrain.20260522082116.017_20260522_082236
|
Python Skill Fallback
Title: Automated Python Package Version Checker - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-22 08:23 | Success | - | |
|
exp_pytrain.20260522072222.016_20260522_072346
|
Python Skill Fallback
Title: Create a Robust Package Manager - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-22 07:24 | Success | - | |
|
exp_pytrain.20260522062612.015_20260522_062749
|
Python Skill Fallback
Title: Enhance Functionality with Type Hinting and Packaging - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-22 06:28 | Success | - | |
|
exp_pytrain.20260522052423.014_20260522_052554
|
Python Skill Fallback
Title: Type-safe Python Packaging - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-22 05:26 | Success | - | |
|
exp_pytrain.20260522042548.013_20260522_042655
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-22 04:27 | Success | - | |
|
exp_pytrain.20260522033035.012_20260522_033208
|
Introduction
The 'simple_calculator' Python package is designed to perform basic mathematical operations such as addition, subtraction, multiplication, and division with robust type annotations for enhanced maintainability and testability. This reposito...
|
05-22 03:33 | Success | - | |
|
exp_pytrain.20260522023152.011_20260522_023333
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-22 02:34 | Success | - | |
|
exp_pytrain.20260522013350.010_20260522_013454
|
Python Skill Fallback
Title: Asynchronous Function with Type Hints - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-22 01:35 | Success | - | |
|
exp_pytrain.20260522003827.009_20260522_003957
|
Python Skill Fallback
Title: Build and Test an Autodoc Module with Type Hints - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-22 00:40 | Success | - | |
|
exp_pytrain.20260521233708.008_20260521_233844
|
Python Skill Fallback
Title: Creating a Typable Python Library with a Setup Script - Focus: typing, package_management - Note: Generated fallback due to unavailable model output.
|
05-21 23:39 | Success | - | |
|
exp_pytrain.20260521223747.007_20260521_223859
|
Type-Safe Tensor Operations and Package Distribution
Introduction: The 'tensor_ops' Python package provides a type-safe interface for basic tensor operations using PyTorch, aimed at improving maintainability, testability, and robustness. This package includes unit tests and documentation, ens...
|
05-21 22:40 | Success | - | |
|
exp_pytrain.20260521213524.006_20260521_213704
|
Python Skill Fallback
Title: Creating a Python Package with Type Annotations - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-21 21:38 | Success | - | |
|
exp_pytrain.20260521203204.005_20260521_203354
|
Python Skill Fallback
Title: Type Hinted Package Manager - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-21 20:34 | Success | - | |
|
exp_pytrain.20260521193422.004_20260521_193627
|
Python Skill Fallback
Title: Building a Type-Safe Package Manager - Focus: Type Hints, Packaging Standards - Note: Generated fallback due to unavailable model output.
|
05-21 19:37 | Success | - | |
|
exp_pytrain.20260521183552.003_20260521_183731
|
Python Skill Fallback
Title: Build an Asynchronous Python Package with Type Annotations - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-21 18:38 | Success | - | |
|
exp_pytrain.20260521173557.002_20260521_173707
|
Building a Type-Aware Package with Packaging Utilities
Introduction: This exercise involves creating a Python package that utilizes type hints as per PEP 695 standards and modern packaging techniques such as poetry. The primary goal is to ensure robustness and maintainability through static typ...
|
05-21 17:38 | Success | - | |
|
exp_pytrain.20260521163544.001_20260521_163653
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-21 16:37 | Success | - | |
|
exp_pytrain.20260521155830.001_20260521_160020
|
The objective is to design a Python module that leverages advanced features provided by the `typing` library, including...
Readme The objective is to design a Python module that leverages advanced features provided by the `typing` library, including generics and callable objects. The module shall be packaged into a standalone package ensuring all modules functi...
|
05-21 16:00 | Pending | - | |
|
exp_pytrain.20260521145507.050_20260521_145625
|
Asynchronous Task Executor with Type Annotations
This Python coding drill benchmarks the performance of an asynchronous task executor that uses type hints for improved readability, maintainability, and code robustness. Problem Description Create a module `async_task_manager.py` which shou...
|
05-21 14:57 | Success | - | |
|
exp_pytrain.20260521134739.049_20260521_134904
|
Python Skill Fallback
Title: Create Typing-Aware Package with PyTorch - Focus: Python typing module and mypy static typ, Integration of third-party stub files fo - Note: Generated fallback due to unavailable model output.
|
05-21 13:50 | Success | - | |
|
exp_pytrain.20260521124119.048_20260521_124243
|
The `math_operations` Python package is designed to provide a robust framework for performing basic and advanced mathema...
Installation You can install the package using pip: Requirements - Python >= 3.6 for proper type hinting support. - Familiarity with PEP 484 and dynamic typing in Python. Contributing Guidelines Contributions are welcome! Please ensure all...
|
05-21 12:43 | Success | - | |
|
exp_pytrain.20260521113640.047_20260521_113839
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-21 11:39 | Success | - | |
|
exp_pytrain.20260521103200.046_20260521_103345
|
Python Skill Fallback
Title: Create a Python Package with Type Checking - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-21 10:34 | Success | - | |
|
exp_pytrain.20260521091947.045_20260521_092105
|
Python Skill Fallback
Title: Creating a Python Package with Type Hints - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-21 09:22 | Success | - | |
|
exp_pytrain.20260521081346.044_20260521_081525
|
Python Skill Fallback
Title: Type-Safe Module Loader - Focus: Type Hinting, Packaging - Note: Generated fallback due to unavailable model output.
|
05-21 08:16 | Success | - | |
|
exp_pytrain.20260521071110.043_20260521_071243
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-21 07:13 | Success | - | |
|
exp_pytrain.20260521060645.042_20260521_060823
|
Python Skill Fallback
Title: Type Parameter Syntax in Package - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-21 06:09 | Success | - | |
|
exp_pytrain.20260521050130.041_20260521_050322
|
Python Skill Fallback
Title: Type-Aware Package Distribution - Focus: typing.NewType for creating distinct typ, dataclasses, namedtuples, or simple clas, PEP 484 guidelines for type hinting synt, using typing.FileIO and other I/O types, constructing a setup.py t...
|
05-21 05:04 | Success | - | |
|
exp_pytrain.20260521040141.040_20260521_040327
|
Type-Aware Packaging for Python Scripts
Problem Statement: Using type hints and proper packaging can significantly enhance the maintainability, readability, and testability of a Python project. The objective is to design a small utility script inspired by FlashAttention that inco...
|
05-21 04:04 | Success | - | |
|
exp_pytrain.20260521025909.039_20260521_030110
|
Packaging a Python Project with Type Annotations
**Goal**: Create a complete Python project that includes setup for packaging, type annotations using mypy types, and ensures all modules are testable via pytest. Requirements: - `pytest` for testing. - Installed Python >= 3.7 (to support ty...
|
05-21 03:02 | Success | - | |
|
exp_pytrain.20260521015653.038_20260521_015817
|
Python Skill Fallback
Title: Type Annotated Package Manager - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-21 01:59 | Success | - | |
|
exp_pytrain.20260521005235.037_20260521_005349
|
Python Skill Fallback
Title: Create a Python Package with Type Hints - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-21 00:54 | Success | - | |
|
exp_pytrain.20260520235029.036_20260520_235148
|
Python Skill Fallback
Title: Python Package with Type Annotations - Focus: Python typing library (PEP 483/484), packaging a Python module with setup too - Note: Generated fallback due to unavailable model output.
|
05-20 23:52 | Success | - | |
|
exp_pytrain.20260520224840.035_20260520_224955
|
Building a Robust Typing and Packaging System for a Python Module
Objective: Write robust, reusable Python code that includes comprehensive type annotations following the PEP 484 guidelines. Ensure that the module is properly organized and packaged using `python setup.py` or similar packaging tools. Metri...
|
05-20 22:50 | Success | - | |
|
exp_pytrain.20260520214610.034_20260520_214746
|
Python Skill Fallback
Title: Build a Robust Python Project - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-20 21:48 | Success | - | |
|
exp_pytrain.20260520204029.033_20260520_204225
|
Python Skill Fallback
Title: Module Packaging and Type Checking - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-20 20:43 | Success | - | |
|
exp_pytrain.20260520193832.032_20260520_193946
|
Python Skill Fallback
Title: Creating a Python Package with Type Annotations - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-20 19:40 | Success | - | |
|
exp_pytrain.20260520183336.031_20260520_183556
|
Python Skill Fallback
Title: Creating a Type-Safe CLI Tool - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-20 18:36 | Success | - | |
|
exp_pytrain.20260520172436.030_20260520_172657
|
Python Skill Fallback
Title: Creating a Type-safe, Asynchronous Task Manager - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-20 17:27 | Success | - | |
|
exp_pytrain.20260520162348.029_20260520_162441
|
Python Skill Fallback
Title: Creating a Type-Safe Packaging Utility - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-20 16:25 | Success | - | |
|
exp_pytrain.20260520152324.028_20260520_152501
|
Python Skill Fallback
Title: Packaging a Typing-Friendly Python App - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-20 15:26 | Success | - | |
|
exp_pytrain.20260520141842.027_20260520_142015
|
Python Skill Fallback
Title: Package a Python Project with Type Annotations - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-20 14:21 | Success | - | |
|
exp_pytrain.20260520131642.026_20260520_131804
|
Python Skill Fallback
Title: Creating a Python Package with Typed Data Classes - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-20 13:19 | Success | - | |
|
exp_pytrain.20260520120941.025_20260520_121202
|
Python Skill Fallback
Title: Python Package Enhancer - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-20 12:13 | Success | - | |
|
exp_pytrain.20260520110253.024_20260520_110431
|
Python Skill Fallback
Title: Building a Basic Python Package with Type Annotations - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-20 11:05 | Success | - | |
|
exp_pytrain.20260520100410.023_20260520_100539
|
Python Skill Fallback
Title: Packaging Asynchronous Python Application - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-20 10:06 | Success | - | |
|
exp_pytrain.20260520090106.022_20260520_090222
|
Python Skill Fallback
Title: Type Annotations for Package Initialization - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-20 09:03 | Success | - | |
|
exp_pytrain.20260520075854.021_20260520_080009
|
Python Skill Fallback
Title: Creating a Typing-Aware Package - Focus: Python stdlib.typing, PEP 484 (Type Hints), Python Packaging User Guide - Note: Generated fallback due to unavailable model output.
|
05-20 08:01 | Success | - | |
|
exp_pytrain.20260520065312.020_20260520_065440
|
Python Skill Fallback
Title: Create a Python Package for Robust Numerical Computation - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-20 06:55 | Success | - | |
|
exp_pytrain.20260520054850.019_20260520_055008
|
Python Skill Fallback
Title: Develop a Python Package with Type Annotations and Packaging Standards - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-20 05:51 | Success | - | |
|
exp_pytrain.20260520044911.018_20260520_045049
|
This drill focuses on implementing a utility that heavily leverages Python's type system. It emphasizes reliability thro...
Performance benchmarking involves measuring execution speed and memory usage while ensuring the code operates correctly even with unconventional or extreme inputs. README.md Python Reliability Drill: Typing Implemented a type-safe Python ut...
|
05-20 04:51 | Success | - | |
|
exp_pytrain.20260520034229.017_20260520_034428
|
Python Skill Fallback
Title: Type-annotated Python Package for Handling Files - Focus: Python stdlib, typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-20 03:45 | Success | - | |
|
exp_pytrain.20260520023518.016_20260520_023706
|
Python Skill Fallback
Title: Creating a Robust Configuration Handler - Focus: Use Python's typing fea..., Learn how to properly p... - Note: Generated fallback due to unavailable model output.
|
05-20 02:38 | Success | - | |
|
exp_pytrain.20260520013200.015_20260520_013330
|
Python Skill Fallback
Title: Construct a Type-Full CLI Tool - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-20 01:34 | Success | - | |
|
exp_pytrain.20260520002712.014_20260520_002840
|
Python Skill Fallback
Title: Creating a Python Package for Type Checking - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-20 00:29 | Success | - | |
|
exp_pytrain.20260519232636.013_20260519_232747
|
Python Skill Fallback
Title: Build and Test a Python Package - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-19 23:28 | Success | - | |
|
exp_pytrain.20260519221956.012_20260519_222159
|
This Python coding drill benchmark aims to develop a type-safe package for text analysis functionalities such as tokeniz...
Setup Instructions Before you start: 1. Clone the repository or download it. 2. Make sure Python 3.x is installed on your system. 3. The benchmark does not require any external dependencies beyond Python's standard library. Goal Create a ru...
|
05-19 22:23 | Success | - | |
|
exp_pytrain.20260519211602.011_20260519_211808
|
Python Skill Fallback
Title: Python Module Packaging with Type Annotations - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-19 21:19 | Success | - | |
|
exp_pytrain.20260519201018.010_20260519_201134
|
Python Skill Fallback
Title: Creating an Asynchronous Package for Logging - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-19 20:12 | Success | - | |
|
exp_pytrain.20260519190707.009_20260519_190827
|
Python Skill Fallback
Title: Creating a Robust Typing Package - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-19 19:09 | Success | - | |
|
exp_pytrain.20260519175827.008_20260519_180029
|
Python Skill Fallback
Title: Creating a Python Package with Advanced Typings - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-19 18:01 | Success | - | |
|
exp_pytrain.20260519165353.007_20260519_165519
|
This benchmark is a Python coding drill that assesses reliable and robust utility implementation focusing on typing feat...
To execute this benchmark, follow these steps: 1. Ensure your environment meets Python's standard library requirements. 2. Clone or download the script `benchmark.py`. 3. Run the benchmark by executing `python benchmark.py` in your terminal...
|
05-19 16:56 | Success | - | |
|
exp_pytrain.20260519155303.006_20260519_155420
|
Python Skill Fallback
Title: Creating a Python Package with Typed Function Definitions - Focus: type hinting, module design, unit testing with hypothesis or pytest, creating packaging for Python scripts - Note: Generated fallback due to unavailable model output.
|
05-19 15:55 | Success | - | |
|
exp_pytrain.20260519145335.005_20260519_145459
|
Python Skill Fallback
Title: Creating a Robust Python Package with Type Annotations - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-19 14:56 | Success | - | |
|
exp_pytrain.20260519135420.004_20260519_135654
|
Python Skill Fallback
Title: Type-annotated CLI Tool - Focus: type hinting, argparse, setuptools - Note: Generated fallback due to unavailable model output.
|
05-19 13:57 | Success | - | |
|
exp_pytrain.20260519125436.003_20260519_125608
|
Python Skill Fallback
Title: Building a Type-Safe and Packagable Async Scraper - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-19 12:57 | Success | - | |
|
exp_pytrain.20260519115336.002_20260519_115509
|
Python Skill Fallback
Title: Generic Function with Constraint - Focus: type parameter syntax, parameter constraints, package creation - Note: Generated fallback due to unavailable model output.
|
05-19 11:56 | Success | - | |
|
exp_pytrain.20260519105116.001_20260519_105257
|
Python Skill Fallback
Title: Creating a Robust CLI Tool with Typing and Packaging - Focus: typing.Type, packaging.setup - Note: Generated fallback due to unavailable model output.
|
05-19 10:53 | Success | - | |
|
exp_pytrain.20260519085632.001_20260519_085836
|
Python Skill Fallback
Title: Building a Typing Compliant Python Package - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-19 08:59 | Success | - | |
|
exp_pytrain.20260519071652.016_20260519_071816
|
Python Skill Fallback
Title: Creating a Robust Library with Type Hints - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-19 07:19 | Success | - | |
|
exp_pytrain.20260519061847.015_20260519_062010
|
Python Skill Fallback
Title: Build a Typing and Packaging Benchmark for Python - Focus: PEP 484 (Type Hints), PEP 695 (Type Parameter Syntax), Python Packaging, Mypy Linting Tool - Note: Generated fallback due to unavailable model output.
|
05-19 06:21 | Success | - | |
|
exp_pytrain.20260519051740.014_20260519_051906
|
Python Skill Fallback
Title: Creating a Robust Python Library - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-19 05:20 | Success | - | |
|
exp_pytrain.20260519041657.013_20260519_041803
|
Python Skill Fallback
Title: Creating a Robust Python Package for FlashAttention Implementation - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-19 04:19 | Success | - | |
|
exp_pytrain.20260519031621.012_20260519_031841
|
Python Skill Fallback
Title: Python Package with Type Annotations - Focus: Python Standard Library, Package Management with pip/setuptools/d, Type Annotations in Python, Static Type Checking with mypy - Note: Generated fallback due to unavailable model output.
|
05-19 03:19 | Success | - | |
|
exp_pytrain.20260519020958.011_20260519_021200
|
This directory contains a Python CLI application named `notes_app.py` that helps manage notes stored in JSON files. The...
Features include: - Adding notes with title and content. - Listing all notes. - Deleting a specified note. Ensure you run `./notes_app.py --help` for details on each command usage. This application is designed to be compliant with the provi...
|
05-19 02:13 | Success | - | |
|
exp_pytrain.20260519010522.010_20260519_010701
|
Python Skill Fallback
Title: Asynchronous Webhook Handler with Type Annotations - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-19 01:08 | Success | - | |
|
exp_pytrain.20260519000214.009_20260519_000353
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-19 00:04 | Success | - | |
|
exp_pytrain.20260518230421.008_20260518_230525
|
Python Skill Fallback
Title: Creating a Robust Python Package with Type Annotations - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-18 23:06 | Success | - | |
|
exp_pytrain.20260518215627.007_20260518_215828
|
Python Skill Fallback
Title: Type-Driven Development and Packaging for a Calculator Application - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-18 21:59 | Success | - | |
|
exp_pytrain.20260518205357.006_20260518_205610
|
Experiment Benchmark
This experiment contains a runnable benchmark generated by ARES. Files - `benchmark.py`: main benchmark entrypoint - `results.log`: captured runtime output after execution Run Expected Output - `VRAM_USAGE: <value>MB` - `TOKENS_PER_SEC: <va...
|
05-18 20:57 | Success | - | |
|
exp_pytrain.20260518195229.005_20260518_195342
|
Python Skill Fallback
Title: Type-Checked Python Package Generator - Focus: Python typing, Packaging Python projects - Note: Generated fallback due to unavailable model output.
|
05-18 19:54 | Success | - | |
|
exp_pytrain.20260518184641.004_20260518_184805
|
Python Skill Fallback
Title: Building a Configurable Python Module with Typing Enhancements - Focus: Type Hints, Python Packaging - Note: Generated fallback due to unavailable model output.
|
05-18 18:49 | Success | - | |
|
exp_pytrain.20260518174539.003_20260518_174730
|
Python Skill Fallback
Title: Type-Safe Async Package Manager - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-18 17:48 | Success | - | |
|
exp_pytrain.20260518164552.002_20260518_164730
|
Python Skill Fallback
Title: Type-Enhanced Packaging Tools - Focus: Typing, Packaging - Note: Generated fallback due to unavailable model output.
|
05-18 16:48 | Success | - | |
|
exp_pytrain.20260518153103.001_20260518_153225
|
Python Skill Fallback
Title: Creating a Reusable Data Validation Library - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-18 15:33 | Success | - | |
|
exp_pytrain.20260518140724.001_20260518_140855
|
Python Skill Fallback
Title: Develop a Robust Package with PyPI Support - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-18 14:09 | Success | - | |
|
exp_pytrain.20260518132244.002_20260518_132313
|
Here's the code for the benchmark:
No summary available yet.
|
05-18 13:24 | Success | - | |
|
exp_hf_2605.14786_20260518_131207
|
Known By Their Actions: Fingerprinting LLM Browser Agents via UI Traces
Paper ID: hf_2605.14786 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
05-18 13:13 | Success | - | |
|
exp_pytrain.20260518124827.001_20260518_124856
|
**Autonomous Coding Drill: Robust Typing and Packaging**
Section 1: README.md **Section 2: benchmark.py** ```python import time from typing import Optional, Union def check_empty_string(s: str) -> bool: if not s: return True # Assuming an...
|
05-18 12:49 | Success | - | |
|
exp_self.20260518120617.003_20260518_120618
|
Student hypothesis: ssm_mamba + throughput_optimization co-design
Paper ID: self.20260518120617.003 - Hypothesis: Combining ssm_mamba + throughput_optimization + distillation will improve throughput or memory efficiency without breaking 8GB execution. - Plan: Create a compact comparative benchmark against...
|
05-18 12:06 | Success | - | |
|
exp_self.20260518120006.002_20260518_120006
|
Student hypothesis: linear + ssm_mamba co-design
Paper ID: self.20260518120006.002 - Hypothesis: Combining linear + ssm_mamba will improve throughput or memory efficiency without breaking 8GB execution. - Plan: Create a compact comparative benchmark against a simple baseline, measure VRAM...
|
05-18 12:00 | Success | - | |
|
exp_self.20260518115355.001_20260518_115356
|
Student hypothesis: ssm + linear co-design
Paper ID: self.20260518115355.001 - Hypothesis: Combining ssm + linear + ssm_mamba will improve throughput or memory efficiency without breaking 8GB execution. - Plan: Create a compact comparative benchmark against a simple baseline, measur...
|
05-18 11:53 | Success | - | |
|
exp_pytrain.20260518115245.001_20260518_115245
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-18 11:52 | Success | - | |
|
exp_self.20260518114305.014_20260518_114305
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260518114305.014 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-18 11:43 | Success | - | |
|
exp_self.20260518113637.013_20260518_113637
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260518113637.013 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-18 11:36 | Success | - | |
|
exp_self.20260518112917.012_20260518_112917
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260518112917.012 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-18 11:29 | Success | - | |
|
exp_pytrain.20260518112307.005_20260518_112307
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-18 11:23 | Success | - | |
|
exp_self.20260518112201.011_20260518_112201
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260518112201.011 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-18 11:22 | Success | - | |
|
exp_self.20260518111517.010_20260518_111518
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260518111517.010 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-18 11:15 | Success | - | |
|
exp_self.20260518110844.009_20260518_110844
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260518110844.009 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-18 11:08 | Success | - | |
|
exp_self.20260518110234.008_20260518_110234
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260518110234.008 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-18 11:02 | Success | - | |
|
exp_self.20260518105559.007_20260518_105559
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260518105559.007 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-18 10:56 | Success | - | |
|
exp_pytrain.20260518105207.004_20260518_105207
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-18 10:52 | Success | - | |
|
exp_self.20260518104957.006_20260518_104958
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260518104957.006 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-18 10:49 | Success | - | |
|
exp_self.20260518104323.005_20260518_104323
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260518104323.005 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-18 10:43 | Success | - | |
|
exp_hf_2506.01015_20260518_104101
|
AuralSAM2: Enabling SAM2 Hear Through Pyramid Audio-Visual Feature Prompting
Paper ID: hf_2506.01015 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
05-18 10:41 | Success | - | |
|
exp_self.20260518103629.004_20260518_103630
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260518103629.004 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-18 10:36 | Success | - | |
|
exp_oa_W7161354235_20260518_103153
|
Negation Neglect: When models fail to learn negations in training
Paper ID: oa_W7161354235 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
05-18 10:31 | Success | - | |
|
exp_self.20260518103045.003_20260518_103046
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260518103045.003 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-18 10:30 | Success | - | |
|
exp_self.20260518102409.002_20260518_102409
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260518102409.002 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-18 10:24 | Success | - | |
|
exp_oa_W7161354484_20260518_102212
|
Not Just RLHF: Why Alignment Alone Won't Fix Multi-Agent Sycophancy
Paper ID: oa_W7161354484 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
05-18 10:22 | Success | - | |
|
exp_pytrain.20260518102105.003_20260518_102105
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-18 10:21 | Success | - | |
|
exp_cr_10.1177_13621688261449335_20260518_101917
|
From Strategy Awareness to Engagement: Self-Regulated Learning Strategies-Based Writing Instruction in L2 Essay Developm...
Paper ID: cr_10.1177_13621688261449335 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recov...
|
05-18 10:19 | Success | - | |
|
exp_hf_2605.15597_20260518_101748
|
CM-EVS: Sparse Panoramic RGB-D-Pose Data for Complete Scene Coverage
Paper ID: hf_2605.15597 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
05-18 10:17 | Success | - | |
|
exp_cr_10.56726_irjmets96431_20260518_101627
|
HIERARCHALIGN: LINEAR-COMPLEXITY CROSS-MODAL ATTENTION WITH RLHF FOR HUMAN-ALIGNED MULTI-MODAL LARGE LANGUAGE MODELS
Paper ID: cr_10.56726_irjmets96431 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-18 10:16 | Success | - | |
|
exp_hf_2605.15138_20260518_101430
|
Forgetting That Sticks: Quantization-Permanent Unlearning via Circuit Attribution
Paper ID: hf_2605.15138 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
05-18 10:14 | Success | - | |
|
exp_cr_10.54254_2753-8818_2026.33701_20260518_101143
|
Large Language Models in Mental Health: An Investigation of Prompt-Based Approaches, Fine-Tuning and Domain Adaptation,...
Paper ID: cr_10.54254_2753-8818_2026.33701 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: R...
|
05-18 10:11 | Success | - | |
|
exp_self.20260518101034.001_20260518_101035
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260518101034.001 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-18 10:10 | Success | - | |
|
exp_2303.15564v3_20260518_100854
|
Backfill Candidate 2303.15564v3
Fallback synthesis: Mask and Restore: Blind Backdoor Defense at Test Time with Masked Autoencoder. Key signals: rag.
|
05-18 10:08 | Success | - | |
|
exp_cr_10.3390_electronics12183925_20260518_100836
|
Backfill Candidate cr_10.3390_electronics12183925
Fallback synthesis: Multi-Phase Focused PID Adaptive Tuning with Reinforcement Learning. Key signals: rag.
|
05-18 10:08 | Success | - | |
|
exp_cr_10.51574_ijrer.v5i1.4200_20260518_100818
|
Backfill Candidate cr_10.51574_ijrer.v5i1.4200
Fallback synthesis: Think of Pair Share Learning Model on Student Learning Activity in Science Subjects at State Elementary Madrasah. Key signals: rag.
|
05-18 10:08 | Success | - | |
|
exp_2512.15753v1_20260518_100730
|
Backfill Candidate 2512.15753v1
Fallback synthesis: TAO-Net: Two-stage Adaptive OOD Classification Network for Fine-grained Encrypted Traffic Classification. Key signals: rag.
|
05-18 10:07 | Success | - | |
|
exp_2204.00598v2_20260518_100712
|
Backfill Candidate 2204.00598v2
Fallback synthesis: Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language. Key signals: retrieval, rag.
|
05-18 10:07 | Success | - | |
|
exp_2303.15604v2_20260518_100653
|
Backfill Candidate 2303.15604v2
Fallback synthesis: HD-Bind: Encoding of Molecular Structure with Low Precision, Hyperdimensional Binary Representations. Key signals: inference, rag.
|
05-18 10:06 | Success | - | |
|
exp_2303.15595v2_20260518_100635
|
Backfill Candidate 2303.15595v2
Fallback synthesis: Bi-Encoder Cascades for Efficient Image Search. Key signals: retrieval, rag.
|
05-18 10:06 | Success | - | |
|
exp_2310.03754v1_20260518_100616
|
Backfill Candidate 2310.03754v1
Fallback synthesis: EMGTFNet: Fuzzy Vision Transformer to decode Upperlimb sEMG signals for Hand Gestures Recognition. Key signals: sparse, rag.
|
05-18 10:06 | Success | - | |
|
exp_2406.13847v1_20260518_100558
|
Backfill Candidate 2406.13847v1
Fallback synthesis: Locating and measuring marine aquaculture production from space: a computer vision approach in the French Mediterranean. Key signals: sparse, rag.
|
05-18 10:06 | Success | - | |
|
exp_cr_10.1158_1538-7445.pancreatic24-b066_20260518_100540
|
Backfill Candidate cr_10.1158_1538-7445.pancreatic24-b066
Fallback synthesis: Abstract B066: An AI approach to unraveling treatment response in pancreatic cancer: Insights from the COMPASS trial leveraging large language models (LLMs). Key signals: retrieval, rag.
|
05-18 10:05 | Success | - | |
|
exp_2412.12324v1_20260518_100522
|
Backfill Candidate 2412.12324v1
Fallback synthesis: F-RBA: A Federated Learning-based Framework for Risk-based Authentication. Key signals: ssm, rag.
|
05-18 10:05 | Success | - | |
|
exp_2506.12568v1_20260518_100504
|
Backfill Candidate 2506.12568v1
Fallback synthesis: MVP-CBM:Multi-layer Visual Preference-enhanced Concept Bottleneck Model for Explainable Medical Image Classification. Key signals: sparse, rag.
|
05-18 10:05 | Success | - | |
|
exp_cr_10.5539_elt.v18n7p15_20260518_100446
|
Backfill Candidate cr_10.5539_elt.v18n7p15
Fallback synthesis: Enhancing College English Education in China With AI: A Teacher-AI-Student Triad Model. Key signals: context, rag.
|
05-18 10:04 | Success | - | |
|
exp_2512.11057v1_20260518_100428
|
Backfill Candidate 2512.11057v1
Fallback synthesis: Weakly Supervised Tuberculosis Localization in Chest X-rays through Knowledge Distillation. Key signals: rag.
|
05-18 10:04 | Success | - | |
|
exp_2512.11147v1_20260518_100408
|
MiniScope: A Least Privilege Framework for Authorizing Tool Calling Agents
Fallback synthesis: MiniScope: A Least Privilege Framework for Authorizing Tool Calling Agents. No strong keyword signals detected.
|
05-18 10:04 | Success | - | |
|
exp_2506.12594v1_20260518_100319
|
A Comprehensive Survey of Deep Research: Systems, Methodologies, and Applications
Fallback synthesis: A Comprehensive Survey of Deep Research: Systems, Methodologies, and Applications. Key signals: retrieval.
|
05-18 10:03 | Success | - | |
|
exp_2506.12617v3_20260518_100301
|
Evaluating AI Alignment in Eleven LLMs through Output-Based Analysis and Human Benchmarking
Fallback synthesis: Evaluating AI Alignment in Eleven LLMs through Output-Based Analysis and Human Benchmarking. No strong keyword signals detected.
|
05-18 10:03 | Success | - | |
|
exp_2412.12351v2_20260518_100243
|
Krony-PT: GPT2 compressed with Kronecker Products
Fallback synthesis: Krony-PT: GPT2 compressed with Kronecker Products. No strong keyword signals detected.
|
05-18 10:02 | Success | - | |
|
exp_2303.15621v2_20260518_100225
|
ChatGPT as a Factual Inconsistency Evaluator for Text Summarization
Fallback synthesis: ChatGPT as a Factual Inconsistency Evaluator for Text Summarization. Key signals: inference.
|
05-18 10:02 | Success | - | |
|
exp_oa_W7124118447_20260518_100206
|
Lost in the Noise: How Reasoning Models Fail with Contextual Distractors
Fallback synthesis: Lost in the Noise: How Reasoning Models Fail with Contextual Distractors. Key signals: context, rag.
|
05-18 10:02 | Success | - | |
|
exp_oa_W7131864980_20260518_100148
|
EcoRL-Sched: Energy-Aware Heterogeneous GPU–FPGA Task Scheduling for Sustainable RLHF Training Pipelines
Fallback synthesis: EcoRL-Sched: Energy-Aware Heterogeneous GPU–FPGA Task Scheduling for Sustainable RLHF Training Pipelines. Key signals: inference.
|
05-18 10:01 | Success | - | |
|
exp_oa_W7133571298_20260518_100130
|
Density-Guided Response Optimization: Community-Grounded Alignment via Implicit Acceptance Signals
Fallback synthesis: Density-Guided Response Optimization: Community-Grounded Alignment via Implicit Acceptance Signals. Key signals: sparse, context, grounded.
|
05-18 10:01 | Success | - | |
|
exp_oa_W7134860682_20260518_100112
|
DARC: Disagreement-Aware Alignment via Risk-Constrained Decoding
Fallback synthesis: DARC: Disagreement-Aware Alignment via Risk-Constrained Decoding. Key signals: inference, rag, rerank.
|
05-18 10:01 | Success | - | |
|
exp_2512.10955v2_20260518_100054
|
Omni-Attribute: Open-vocabulary Attribute Encoder for Visual Concept Personalization
Fallback synthesis: Omni-Attribute: Open-vocabulary Attribute Encoder for Visual Concept Personalization. Key signals: context, retrieval, embedding.
|
05-18 10:00 | Success | - | |
|
exp_2512.11099v1_20260518_100036
|
VGent: Visual Grounding via Modular Design for Disentangling Reasoning and Prediction
Fallback synthesis: VGent: Visual Grounding via Modular Design for Disentangling Reasoning and Prediction. Key signals: inference, rag.
|
05-18 10:00 | Success | - | |
|
exp_cr_10.3390_agriculture15242569_20260518_100018
|
Smart Irrigation Scheduling for Crop Production Using a Crop Model and Improved Deep Reinforcement Learning
Fallback synthesis: Smart Irrigation Scheduling for Crop Production Using a Crop Model and Improved Deep Reinforcement Learning. Key signals: memory.
|
05-18 10:00 | Success | - | |
|
exp_2506.12576v2_20260518_100000
|
Enabling Precise Topic Alignment in Large Language Models Via Sparse Autoencoders
Fallback synthesis: Enabling Precise Topic Alignment in Large Language Models Via Sparse Autoencoders. Key signals: sparse, inference, rag.
|
05-18 10:00 | Success | - | |
|
exp_2506.12606v2_20260518_095913
|
An Exploration of Mamba for Speech Self-Supervised Models
Fallback synthesis: An Exploration of Mamba for Speech Self-Supervised Models. Key signals: linear, context, rag.
|
05-18 09:59 | Success | - | |
|
exp_2506.13814v1_20260518_095854
|
ReFrame: Layer Caching for Accelerated Inference in Real-Time Rendering
Fallback synthesis: ReFrame: Layer Caching for Accelerated Inference in Real-Time Rendering. Key signals: inference, rag.
|
05-18 09:58 | Success | - | |
|
exp_2506.17285v1_20260518_095836
|
A Framework for Generating Conversational Recommendation Datasets from Behavioral Interactions
Fallback synthesis: A Framework for Generating Conversational Recommendation Datasets from Behavioral Interactions. Key signals: context, grounded.
|
05-18 09:58 | Success | - | |
|
exp_core_297420785_20260518_095818
|
Towards Principled Training and Serving of Large Language Models
Fallback synthesis: Towards Principled Training and Serving of Large Language Models. Key signals: inference.
|
05-18 09:58 | Success | - | |
|
exp_2412.12409v1_20260518_095800
|
Improving Cooperation in Language Games with Bayesian Inference and the Cognitive Hierarchy
Fallback synthesis: Improving Cooperation in Language Games with Bayesian Inference and the Cognitive Hierarchy. Key signals: inference, rag, embedding.
|
05-18 09:58 | Success | - | |
|
exp_2406.13809v1_20260518_095741
|
Towards Holistic Language-video Representation: the language model-enhanced MSR-Video to Text Dataset
Fallback synthesis: Towards Holistic Language-video Representation: the language model-enhanced MSR-Video to Text Dataset. Key signals: context, retrieval.
|
05-18 09:57 | Success | - | |
|
exp_2406.13858v1_20260518_095723
|
Distributional reasoning in LLMs: Parallel reasoning processes in multi-hop reasoning
Fallback synthesis: Distributional reasoning in LLMs: Parallel reasoning processes in multi-hop reasoning. Key signals: linear, inference, embedding.
|
05-18 09:57 | Success | - | |
|
exp_2406.13885v1_20260518_095705
|
Knowledge Tagging System on Math Questions via LLMs with Flexible Demonstration Retriever
Fallback synthesis: Knowledge Tagging System on Math Questions via LLMs with Flexible Demonstration Retriever. Key signals: context, embedding.
|
05-18 09:57 | Success | - | |
|
exp_cr_10.3390_app131810379_20260518_095647
|
Novel Paintings from the Latent Diffusion Model through Transfer Learning
Fallback synthesis: Novel Paintings from the Latent Diffusion Model through Transfer Learning. Key signals: context, memory.
|
05-18 09:56 | Success | - | |
|
exp_cr_10.47689_stars.university-pp276-279_20260518_095628
|
Integrating pragmatic competence to english language classes
Fallback synthesis: Integrating pragmatic competence to english language classes. Key signals: context, rag.
|
05-18 09:56 | Success | - | |
|
exp_2303.15569v1_20260518_095610
|
Core-Periphery Principle Guided Redesign of Self-Attention in Transformers
Fallback synthesis: Core-Periphery Principle Guided Redesign of Self-Attention in Transformers. Key signals: sparse, rag.
|
05-18 09:56 | Success | - | |
|
exp_2303.15585v4_20260518_095552
|
(Un)fair devices: Moving beyond AI accuracy in personal sensing
Fallback synthesis: (Un)fair devices: Moving beyond AI accuracy in personal sensing. Key signals: ssm, rag, grounded.
|
05-18 09:55 | Success | - | |
|
exp_2209.15439v2_20260518_095504
|
Exploiting Instance-based Mixed Sampling via Auxiliary Source Domain Supervision for Domain-adaptive Action Detection
Fallback synthesis: Exploiting Instance-based Mixed Sampling via Auxiliary Source Domain Supervision for Domain-adaptive Action Detection. Key signals: ssm, context, rag.
|
05-18 09:55 | Success | - | |
|
exp_cr_10.1609_aaai.v36i11.21480_20260518_095446
|
PrEF: Probabilistic Electricity Forecasting via Copula-Augmented State Space Model
Fallback synthesis: PrEF: Probabilistic Electricity Forecasting via Copula-Augmented State Space Model. Key signals: linear, ssm, inference.
|
05-18 09:54 | Success | - | |
|
exp_2204.00673v2_20260518_095428
|
Learnable latent embeddings for joint behavioral and neural analysis
Fallback synthesis: Learnable latent embeddings for joint behavioral and neural analysis. Key signals: linear, rag, embedding.
|
05-18 09:54 | Success | - | |
|
exp_2204.00707v1_20260518_095410
|
Efficient Argument Structure Extraction with Transfer Learning and Active Learning
Fallback synthesis: Efficient Argument Structure Extraction with Transfer Learning and Active Learning. Key signals: context, rag.
|
05-18 09:54 | Success | - | |
|
exp_gh_maursader_symbiote-protocol_20260518_095352
|
maursader/symbiote-protocol
Fallback synthesis: maursader/symbiote-protocol. Key signals: memory, rag.
|
05-18 09:53 | Success | - | |
|
exp_2512.11179v3_20260518_095334
|
Bandwidth-constrained Variational Message Encoding for Cooperative Multi-agent Reinforcement Learning
Fallback synthesis: Bandwidth-constrained Variational Message Encoding for Cooperative Multi-agent Reinforcement Learning. Key signals: sparse.
|
05-18 09:53 | Success | - | |
|
exp_cr_10.1038_s41390-025-04669-8_20260518_095316
|
Is this neonate feeling pain? Leveraging clinical knowledge towards high-precision Large Language Model-based neonatal p...
Fallback synthesis: Is this neonate feeling pain? Leveraging clinical knowledge towards high-precision Large Language Model-based neonatal pain assessment. Key signals: ssm, rag.
|
05-18 09:53 | Success | - | |
|
exp_oa_W4415312651_20260518_095258
|
Adaptive Accompaniment with ReaLchords
Fallback synthesis: Adaptive Accompaniment with ReaLchords. Key signals: rag.
|
05-18 09:53 | Success | - | |
|
exp_oa_W4415056742_20260518_095239
|
Probabilistic Modeling of Latent Agentic Substructures in Deep Neural Networks
Fallback synthesis: Probabilistic Modeling of Latent Agentic Substructures in Deep Neural Networks. Key signals: linear, grounded.
|
05-18 09:52 | Success | - | |
|
exp_oa_W4414098962_20260518_095221
|
ForestGPT and Beyond: A Trustworthy Domain-Specific Large Language Model Paving the Way to Forestry 5.0
Fallback synthesis: ForestGPT and Beyond: A Trustworthy Domain-Specific Large Language Model Paving the Way to Forestry 5.0. Key signals: retrieval, rag.
|
05-18 09:52 | Success | - | |
|
exp_2506.12634v1_20260518_095203
|
Between Predictability and Randomness: Seeking Artistic Inspiration from AI Generative Models
Fallback synthesis: Between Predictability and Randomness: Seeking Artistic Inspiration from AI Generative Models. Key signals: memory, rag.
|
05-18 09:52 | Success | - | |
|
exp_2506.22454v1_20260518_095145
|
Microelectrode Signal Dynamics as Biomarkers of Subthalamic Nucleus Entry on Deep Brain Stimulation: A Nonlinear Feature...
Fallback synthesis: Microelectrode Signal Dynamics as Biomarkers of Subthalamic Nucleus Entry on Deep Brain Stimulation: A Nonlinear Feature Approach. Key signals: linear, rag.
|
05-18 09:51 | Success | - | |
|
exp_pytrain.20260518095042.002_20260518_095043
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-18 09:50 | Success | - | |
|
exp_cr_10.3390_tropicalmed10060167_20260518_095024
|
The Application of Machine Learning Algorithms to Predict HIV Testing Using Evidence from the 2002–2017 South African Ad...
Fallback synthesis: The Application of Machine Learning Algorithms to Predict HIV Testing Using Evidence from the 2002–2017 South African Adult Population-Based Surveys: An HIV Testing Predictive Model. Key signals: ssm, rag.
|
05-18 09:50 | Success | - | |
|
exp_oa_W4404344173_20260518_095005
|
Parameter-Efficient Fine-Tuning in Large Models: A Survey of Methodologies
Fallback synthesis: Parameter-Efficient Fine-Tuning in Large Models: A Survey of Methodologies. Key signals: memory, rag.
|
05-18 09:50 | Success | - | |
|
exp_2412.12358v1_20260518_094947
|
BioRAGent: A Retrieval-Augmented Generation System for Showcasing Generative Query Expansion and Domain-Specific Search...
Fallback synthesis: BioRAGent: A Retrieval-Augmented Generation System for Showcasing Generative Query Expansion and Domain-Specific Search for Scientific Q&A. Key signals: retrieval, rag.
|
05-18 09:49 | Success | - | |
|
exp_2406.13808v3_20260518_094928
|
Can Low-Rank Knowledge Distillation in LLMs be Useful for Microelectronic Reasoning?
Fallback synthesis: Can Low-Rank Knowledge Distillation in LLMs be Useful for Microelectronic Reasoning? Key signals: context.
|
05-18 09:49 | Success | - | |
|
exp_2406.13840v1_20260518_094910
|
StackRAG Agent: Improving Developer Answers with Retrieval-Augmented Generation
Fallback synthesis: StackRAG Agent: Improving Developer Answers with Retrieval-Augmented Generation. Key signals: retrieval, rag.
|
05-18 09:49 | Success | - | |
|
exp_2309.13429v1_20260518_094852
|
Modeling Student Performance in Game-Based Learning Environments
Fallback synthesis: Modeling Student Performance in Game-Based Learning Environments. Key signals: context, rag.
|
05-18 09:48 | Success | - | |
|
exp_2309.13464v1_20260518_094834
|
Personalised and Adjustable Interval Type-2 Fuzzy-Based PPG Quality Assessment for the Edge
Fallback synthesis: Personalised and Adjustable Interval Type-2 Fuzzy-Based PPG Quality Assessment for the Edge. Key signals: ssm, rag.
|
05-18 09:48 | Success | - | |
|
exp_2309.13500v3_20260518_094816
|
Enhancing Student Performance Prediction on Learnersourced Questions with SGNN-LLM Synergy
Fallback synthesis: Enhancing Student Performance Prediction on Learnersourced Questions with SGNN-LLM Synergy. Key signals: sparse, embedding.
|
05-18 09:48 | Success | - | |
|
exp_2209.14338v2_20260518_094758
|
Who is GPT-3? An Exploration of Personality, Values and Demographics
Fallback synthesis: Who is GPT-3? An Exploration of Personality, Values and Demographics. Key signals: ssm, memory.
|
05-18 09:48 | Success | - | |
|
exp_cr_10.1093_humrep_deac107.551_20260518_094740
|
P-599 An expected benefit analysis of using an interpretable machine learning model for optimizing the day of trigger du...
Fallback synthesis: P-599 An expected benefit analysis of using an interpretable machine learning model for optimizing the day of trigger during ovarian stimulation. Key signals: linear, rag.
|
05-18 09:47 | Success | - | |
|
exp_cr_10.3390_biology11070995_20260518_094722
|
Predicting Protein–Protein Interactions Based on Ensemble Learning-Based Model from Protein Sequence
Fallback synthesis: Predicting Protein–Protein Interactions Based on Ensemble Learning-Based Model from Protein Sequence. Key signals: ssm, rag.
|
05-18 09:47 | Success | - | |
|
exp_2204.09640v3_20260518_094635
|
Probabilistic AutoRegressive Neural Networks for Accurate Long-range Forecasting
Fallback synthesis: Probabilistic AutoRegressive Neural Networks for Accurate Long-range Forecasting. Key signals: linear, rag.
|
05-18 09:46 | Success | - | |
|
exp_2204.00703v5_20260518_094616
|
A Reinforcement Learning Approach to Sensing Design in Resource-Constrained Wireless Networked Control Systems
Fallback synthesis: A Reinforcement Learning Approach to Sensing Design in Resource-Constrained Wireless Networked Control Systems. Key signals: rag.
|
05-18 09:46 | Success | - | |
|
exp_core_305590553_20260518_094558
|
Grounded Language Learning with Foundation Models
Fallback synthesis: Grounded Language Learning with Foundation Models. Key signals: grounded.
|
05-18 09:46 | Success | - | |
|
exp_2512.11141v2_20260518_094540
|
Learning complete and explainable visual representations from itemized text supervision
Fallback synthesis: Learning complete and explainable visual representations from itemized text supervision. Key signals: rag, grounded, embedding.
|
05-18 09:45 | Success | - | |
|
exp_cr_10.31449_inf.v49i24.8395_20260518_094522
|
Hybrid Deep Learning Model for Multi-Source Remote Sensing Data Fusion: Integrating DenseNet and Swin Transformer for Sp...
Fallback synthesis: Hybrid Deep Learning Model for Multi-Source Remote Sensing Data Fusion: Integrating DenseNet and Swin Transformer for Spatial Alignment and Feature Extraction. Key signals: context, inference.
|
05-18 09:45 | Success | - | |
|
exp_2506.12600v1_20260518_094504
|
Trust-MARL: Trust-Based Multi-Agent Reinforcement Learning Framework for Cooperative On-Ramp Merging Control in Heteroge...
Fallback synthesis: Trust-MARL: Trust-Based Multi-Agent Reinforcement Learning Framework for Cooperative On-Ramp Merging Control in Heterogeneous Traffic Flow. Key signals: context, rag.
|
05-18 09:45 | Success | - | |
|
exp_2506.12607v1_20260518_094446
|
Towards Building General Purpose Embedding Models for Industry 4.0 Agents
Fallback synthesis: Towards Building General Purpose Embedding Models for Industry 4.0 Agents. Key signals: context, inference, rag, embedding.
|
05-18 09:44 | Success | - | |
|
exp_2412.19823v1_20260518_094428
|
A Survey on Large Language Models for Communication, Network, and Service Management: Application Insights, Challenges,...
Fallback synthesis: A Survey on Large Language Models for Communication, Network, and Service Management: Application Insights, Challenges, and Future Directions. Key signals: context, rag.
|
05-18 09:44 | Success | - | |
|
exp_2309.13430v1_20260518_094410
|
Resolving References in Visually-Grounded Dialogue via Text Generation
Fallback synthesis: Resolving References in Visually-Grounded Dialogue via Text Generation. Key signals: context, retrieval, rag, grounded.
|
05-18 09:44 | Success | - | |
|
exp_2303.15555v1_20260518_094352
|
Object Discovery from Motion-Guided Tokens
Fallback synthesis: Object Discovery from Motion-Guided Tokens. Key signals: quantization, memory, rag.
|
05-18 09:43 | Success | - | |
|
exp_2209.14434v1_20260518_094334
|
Efficient Medical Image Assessment via Self-supervised Learning
Fallback synthesis: Efficient Medical Image Assessment via Self-supervised Learning. Key signals: ssm, rag, embedding.
|
05-18 09:43 | Success | - | |
|
exp_gh_Nestallum_tech-news-rag-assistant_20260518_094316
|
Nestallum/tech-news-rag-assistant
Fallback synthesis: Nestallum/tech-news-rag-assistant. Key signals: retrieval, rag, embedding.
|
05-18 09:43 | Success | - | |
|
exp_2512.11074v1_20260518_094228
|
MultiScript30k: Leveraging Multilingual Embeddings to Extend Cross Script Parallel Data
Fallback synthesis: MultiScript30k: Leveraging Multilingual Embeddings to Extend Cross Script Parallel Data. Key signals: ssm, rag, embedding.
|
05-18 09:42 | Success | - | |
|
exp_2512.11087v1_20260518_094209
|
Clip-and-Verify: Linear Constraint-Driven Domain Clipping for Accelerating Neural Network Verification
Fallback synthesis: Clip-and-Verify: Linear Constraint-Driven Domain Clipping for Accelerating Neural Network Verification. Key signals: linear, context, rag.
|
05-18 09:42 | Success | - | |
|
exp_2512.11131v1_20260518_094151
|
Fairness-Regularized Online Optimization with Switching Costs
Fallback synthesis: Fairness-Regularized Online Optimization with Switching Costs. Key signals: linear, inference, rag.
|
05-18 09:41 | Success | - | |
|
exp_oa_W4413800076_20260518_094133
|
From Illusion to Insight: A Taxonomic Survey of Hallucination Mitigation Techniques in LLMs
Fallback synthesis: From Illusion to Insight: A Taxonomic Survey of Hallucination Mitigation Techniques in LLMs. Key signals: retrieval, grounded.
|
05-18 09:41 | Success | - | |
|
exp_2506.12597v1_20260518_094115
|
Automatic Expert Discovery in LLM Upcycling via Sparse Interpolated Mixture-of-Experts
Fallback synthesis: Automatic Expert Discovery in LLM Upcycling via Sparse Interpolated Mixture-of-Experts. Key signals: sparse, moe, rag.
|
05-18 09:41 | Success | - | |
|
exp_2506.12655v2_20260518_094056
|
Beyond Sin-Squared Error: Linear-Time Entrywise Uncertainty Quantification for Streaming PCA
Fallback synthesis: Beyond Sin-Squared Error: Linear-Time Entrywise Uncertainty Quantification for Streaming PCA. Key signals: linear, inference, rag.
|
05-18 09:40 | Success | - | |
|
exp_2412.12300v3_20260518_094038
|
Unanswerability Evaluation for Retrieval Augmented Generation
Fallback synthesis: Unanswerability Evaluation for Retrieval Augmented Generation. Key signals: retrieval, rag, rerank.
|
05-18 09:40 | Success | - | |
|
exp_2412.12322v1_20260518_094019
|
RAG Playground: A Framework for Systematic Evaluation of Retrieval Strategies and Prompt Engineering in RAG Systems
Fallback synthesis: RAG Playground: A Framework for Systematic Evaluation of Retrieval Strategies and Prompt Engineering in RAG Systems. Key signals: retrieval, rag, rerank.
|
05-18 09:40 | Success | - | |
|
exp_2412.12359v2_20260518_094002
|
LLaVA Steering: Visual Instruction Tuning with 500x Fewer Parameters through Modality Linear Representation-Steering
Fallback synthesis: LLaVA Steering: Visual Instruction Tuning with 500x Fewer Parameters through Modality Linear Representation-Steering. Key signals: linear, context, rag.
|
05-18 09:40 | Success | - | |
|
exp_oa_W4399837987_20260518_093943
|
Supporting Human Raters with the Detection of Harmful Content using Large Language Models
Fallback synthesis: Supporting Human Raters with the Detection of Harmful Content using Large Language Models. Key signals: ssm, context, rag.
|
05-18 09:39 | Success | - | |
|
exp_2406.13805v1_20260518_093925
|
WikiContradict: A Benchmark for Evaluating LLMs on Real-World Knowledge Conflicts from Wikipedia
Fallback synthesis: WikiContradict: A Benchmark for Evaluating LLMs on Real-World Knowledge Conflicts from Wikipedia. Key signals: context, retrieval, rag.
|
05-18 09:39 | Success | - | |
|
exp_2406.13851v1_20260518_093907
|
Optimizing Quantile-based Trading Strategies in Electricity Arbitrage
Fallback synthesis: Optimizing Quantile-based Trading Strategies in Electricity Arbitrage. Key signals: ssm, rag.
|
05-18 09:39 | Success | - | |
|
exp_cr_10.3389_feduc.2024.1355952_20260518_093819
|
Applying the MSMLP model in advancing language teaching and learning: a longitudinal case study on soft skills developme...
Fallback synthesis: Applying the MSMLP model in advancing language teaching and learning: a longitudinal case study on soft skills development. Key signals: ssm, context, rag.
|
05-18 09:38 | Success | - | |
|
exp_cr_10.31849_utamax.v5i1.11260_20260518_093801
|
From Speech to Text: Enhancing Descriptive Paragraph Writing with Unjuk Tutur's Learning Model
Fallback synthesis: From Speech to Text: Enhancing Descriptive Paragraph Writing with Unjuk Tutur's Learning Model. Key signals: ssm, context, rag.
|
05-18 09:38 | Success | - | |
|
exp_2204.00595v1_20260518_093743
|
Monarch: Expressive Structured Matrices for Efficient and Accurate Training
Fallback synthesis: Monarch: Expressive Structured Matrices for Efficient and Accurate Training. Key signals: sparse, memory.
|
05-18 09:37 | Success | - | |
|
exp_2303.15446v2_20260518_093725
|
SwiftFormer: Efficient Additive Attention for Transformer-based Real-time Mobile Vision Applications
Fallback synthesis: SwiftFormer: Efficient Additive Attention for Transformer-based Real-time Mobile Vision Applications. Key signals: linear, context, inference.
|
05-18 09:37 | Success | - | |
|
exp_oa_W7118543654_20260518_093707
|
Instruction Tuning for Large Language Models: RLHF, Supervised Fine-Tuning, and Alignment Strategies
Fallback synthesis: Instruction Tuning for Large Language Models: RLHF, Supervised Fine-Tuning, and Alignment Strategies. No strong keyword signals detected.
|
05-18 09:37 | Success | - | |
|
exp_2512.11061v1_20260518_093649
|
VDAWorld: World Modelling via VLM-Directed Abstraction and Simulation
Fallback synthesis: VDAWorld: World Modelling via VLM-Directed Abstraction and Simulation. Key signals: grounded.
|
05-18 09:36 | Success | - | |
|
exp_oa_W4417539773_20260518_093630
|
Towards AI Search Paradigm
Fallback synthesis: Towards AI Search Paradigm. Key signals: inference, retrieval.
|
05-18 09:36 | Success | - | |
|
exp_2412.15262v1_20260518_093613
|
Advanced ingestion process powered by LLM parsing for RAG system
Fallback synthesis: Advanced ingestion process powered by LLM parsing for RAG system. Key signals: context, retrieval, rag, embedding.
|
05-18 09:36 | Success | - | |
|
exp_2412.12364v1_20260518_093554
|
LogBabylon: A Unified Framework for Cross-Log File Integration and Analysis
Fallback synthesis: LogBabylon: A Unified Framework for Cross-Log File Integration and Analysis. Key signals: context, retrieval, rag.
|
05-18 09:35 | Success | - | |
|
exp_2512.11130v2_20260518_093537
|
Fast-FoundationStereo: Real-Time Zero-Shot Stereo Matching
Fallback synthesis: Fast-FoundationStereo: Real-Time Zero-Shot Stereo Matching. No strong keyword signals detected.
|
05-18 09:35 | Success | - | |
|
exp_2506.12647v1_20260518_093518
|
Optimizing Blood Transfusions and Predicting Shortages in Resource-Constrained Areas
Fallback synthesis: Optimizing Blood Transfusions and Predicting Shortages in Resource-Constrained Areas. Key signals: linear, memory, rag.
|
05-18 09:35 | Success | - | |
|
exp_oa_W7130510261_20260518_093501
|
Training Methods for Large Language Models: Current Approaches and Challenges
Fallback synthesis: Training Methods for Large Language Models: Current Approaches and Challenges. Key signals: sparse, moe, retrieval.
|
05-18 09:35 | Success | - | |
|
exp_2303.15553v3_20260518_093412
|
MoViT: Memorizing Vision Transformers for Medical Image Analysis
Fallback synthesis: MoViT: Memorizing Vision Transformers for Medical Image Analysis. Key signals: context, memory, inference, rag.
|
05-18 09:34 | Success | - | |
|
exp_2204.00716v2_20260518_093354
|
CharacterBERT and Self-Teaching for Improving the Robustness of Dense Retrievers on Queries with Typos
Fallback synthesis: CharacterBERT and Self-Teaching for Improving the Robustness of Dense Retrievers on Queries with Typos. Key signals: ssm, retrieval, embedding.
|
05-18 09:33 | Success | - | |
|
exp_2406.13868v1_20260518_093336
|
SDQ: Sparse Decomposed Quantization for LLM Inference
Fallback synthesis: SDQ: Sparse Decomposed Quantization for LLM Inference. Key signals: quantization, sparse, memory, inference.
|
05-18 09:33 | Success | - | |
|
exp_core_160824652_20260518_093318
|
Efficient and Scalable Large Multimodal Models
Fallback synthesis: Efficient and Scalable Large Multimodal Models. Key signals: quantization, moe, memory, inference.
|
05-18 09:33 | Success | - | |
|
exp_cr_10.71465_csb162_20260518_093300
|
Domain-Adapted Large Language Models for Industrial Applications: From Fine-Tuning to Real-Time Deployment
Fallback synthesis: Domain-Adapted Large Language Models for Industrial Applications: From Fine-Tuning to Real-Time Deployment. Key signals: context, inference, retrieval, rag.
|
05-18 09:33 | Success | - | |
|
exp_pytrain.20260518092010.001_20260518_092010
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-18 09:20 | Success | - | |
|
exp_pytrain.20260518091904.006_20260518_091905
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-18 09:19 | Success | - | |
|
exp_self.20260518091638.023_20260518_091638
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260518091638.023 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-18 09:16 | Success | - | |
|
exp_self.20260518091003.022_20260518_091004
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260518091003.022 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-18 09:10 | Success | - | |
|
exp_self.20260518090327.021_20260518_090328
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260518090327.021 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-18 09:03 | Success | - | |
|
exp_self.20260518085645.020_20260518_085645
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260518085645.020 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-18 08:56 | Success | - | |
|
exp_self.20260518085009.019_20260518_085009
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260518085009.019 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-18 08:50 | Success | - | |
|
exp_pytrain.20260518084836.005_20260518_084837
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-18 08:48 | Success | - | |
|
exp_self.20260518084228.018_20260518_084228
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260518084228.018 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-18 08:42 | Success | - | |
|
exp_self.20260518083549.017_20260518_083550
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260518083549.017 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-18 08:35 | Success | - | |
|
exp_self.20260518082913.016_20260518_082914
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260518082913.016 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-18 08:29 | Success | - | |
|
exp_self.20260518082228.015_20260518_082229
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260518082228.015 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-18 08:22 | Success | - | |
|
exp_cr_10.3390_rs18101619_20260518_081902
|
Comprehensive Analysis of Snow BRDF Variations by Assessing the Improved Kernel-Driven BRDF Model
Paper ID: cr_10.3390_rs18101619 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered be...
|
05-18 08:19 | Success | - | |
|
exp_pytrain.20260518081648.004_20260518_081648
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-18 08:16 | Success | - | |
|
exp_self.20260518081544.014_20260518_081545
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260518081544.014 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-18 08:15 | Success | - | |
|
exp_self.20260518080900.013_20260518_080901
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260518080900.013 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-18 08:09 | Success | - | |
|
exp_self.20260518080219.012_20260518_080219
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260518080219.012 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-18 08:02 | Success | - | |
|
exp_self.20260518075542.011_20260518_075542
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260518075542.011 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-18 07:55 | Success | - | |
|
exp_self.20260518074905.010_20260518_074905
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260518074905.010 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-18 07:49 | Success | - | |
|
exp_pytrain.20260518074621.003_20260518_074621
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-18 07:46 | Success | - | |
|
exp_hf_2605.15592_20260518_074351
|
Efficient Image Synthesis with Sphere Latent Encoder
Paper ID: hf_2605.15592 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
05-18 07:43 | Success | - | |
|
exp_self.20260518074244.009_20260518_074244
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260518074244.009 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-18 07:42 | Success | - | |
|
exp_self.20260518073606.008_20260518_073606
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260518073606.008 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-18 07:36 | Success | - | |
|
exp_self.20260518072926.007_20260518_072926
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260518072926.007 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-18 07:29 | Success | - | |
|
exp_self.20260518072246.006_20260518_072247
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260518072246.006 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-18 07:22 | Success | - | |
|
exp_self.20260518071640.005_20260518_071640
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260518071640.005 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-18 07:16 | Success | - | |
|
exp_pytrain.20260518071506.002_20260518_071506
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-18 07:15 | Success | - | |
|
exp_self.20260518071041.004_20260518_071041
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260518071041.004 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-18 07:10 | Success | - | |
|
exp_oa_W4362515116_20260518_070842
|
A Survey of Large Language Models
Paper ID: oa_W4362515116 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
05-18 07:08 | Success | - | |
|
exp_hf_2605.12058_20260518_070538
|
Hölder Policy Optimisation
Paper ID: hf_2605.12058 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
05-18 07:05 | Success | - | |
|
exp_self.20260518070318.003_20260518_070319
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260518070318.003 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-18 07:03 | Success | - | |
|
exp_hf_2605.15375_20260518_065839
|
ChangeFlow -- Latent Rectified Flow for Change Detection in Remote Sensing
Paper ID: hf_2605.15375 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
05-18 06:58 | Success | - | |
|
exp_self.20260518065732.002_20260518_065733
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260518065732.002 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-18 06:57 | Success | - | |
|
exp_oa_W7160968741_20260518_065509
|
Star Elastic: Many-in-One Reasoning LLMs with Efficient Budget Control
Paper ID: oa_W7160968741 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
05-18 06:55 | Success | - | |
|
exp_hf_2605.15250_20260518_065235
|
GQLA: Group-Query Latent Attention for Hardware-Adaptive Large Language Model Decoding
Paper ID: hf_2605.15250 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
05-18 06:52 | Success | - | |
|
exp_self.20260518065123.001_20260518_065123
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260518065123.001 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-18 06:51 | Success | - | |
|
exp_cr_10.54097_yhppk428_20260518_064926
|
Distributed Training Strategies for Reducing Carbon Footprint in Large Scale Model Development
Paper ID: cr_10.54097_yhppk428 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered ben...
|
05-18 06:49 | Success | - | |
|
exp_2605.16255v1_20260518_064732
|
Designing Datacenter Power Delivery Hierarchies for the AI Era
Paper ID: 2605.16255v1 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark s...
|
05-18 06:47 | Success | - | |
|
exp_pytrain.20260518064350.001_20260518_064350
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-18 06:43 | Success | - | |
|
exp_pytrain.20260510093059.001_20260510_093059
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-10 09:31 | Success | - | |
|
exp_gh_echo313unfolding_helix-substrate_20260510_092941
|
echo313unfolding/helix-substrate
Paper ID: gh_echo313unfolding_helix-substrate - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal...
|
05-10 09:29 | Success | - | |
|
exp_self.20260510092616.003_20260510_092617
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260510092616.003 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-10 09:26 | Success | - | |
|
exp_gh_Priyanka-techi_rag-qa-chatbot_20260510_092250
|
Priyanka-techi/rag-qa-chatbot
Paper ID: gh_Priyanka-techi_rag-qa-chatbot - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: R...
|
05-10 09:22 | Success | - | |
|
exp_self.20260510092032.002_20260510_092033
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260510092032.002 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-10 09:20 | Success | - | |
|
exp_self.20260510091415.001_20260510_091415
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260510091415.001 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-10 09:14 | Success | - | |
|
exp_pytrain.20260510091242.001_20260510_091242
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-10 09:12 | Success | - | |
|
exp_self.20260510085803.013_20260510_085804
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260510085803.013 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-10 08:58 | Success | - | |
|
exp_self.20260510085131.012_20260510_085132
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260510085131.012 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-10 08:51 | Success | - | |
|
exp_self.20260510084455.011_20260510_084456
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260510084455.011 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-10 08:44 | Success | - | |
|
exp_pytrain.20260510084107.003_20260510_084108
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-10 08:41 | Success | - | |
|
exp_self.20260510083857.010_20260510_083858
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260510083857.010 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-10 08:39 | Success | - | |
|
exp_self.20260510083220.009_20260510_083220
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260510083220.009 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-10 08:32 | Success | - | |
|
exp_self.20260510082546.008_20260510_082547
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260510082546.008 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-10 08:25 | Success | - | |
|
exp_self.20260510081913.007_20260510_081913
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260510081913.007 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-10 08:19 | Success | - | |
|
exp_self.20260510081240.006_20260510_081241
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260510081240.006 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-10 08:12 | Success | - | |
|
exp_pytrain.20260510080959.002_20260510_080959
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-10 08:10 | Success | - | |
|
exp_self.20260510080638.005_20260510_080639
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260510080638.005 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-10 08:06 | Success | - | |
|
exp_self.20260510080008.004_20260510_080008
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260510080008.004 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-10 08:00 | Success | - | |
|
exp_self.20260510075333.003_20260510_075333
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260510075333.003 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-10 07:53 | Success | - | |
|
exp_self.20260510074658.002_20260510_074659
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260510074658.002 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-10 07:47 | Success | - | |
|
exp_self.20260510074025.001_20260510_074026
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260510074025.001 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-10 07:40 | Success | - | |
|
exp_pytrain.20260510073854.001_20260510_073854
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-10 07:38 | Success | - | |
|
exp_pytrain.20260510073537.002_20260510_073537
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-10 07:35 | Success | - | |
|
exp_self.20260510073328.005_20260510_073328
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260510073328.005 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-10 07:33 | Success | - | |
|
exp_self.20260510072653.004_20260510_072653
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260510072653.004 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-10 07:26 | Success | - | |
|
exp_self.20260510072005.003_20260510_072006
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260510072005.003 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-10 07:20 | Success | - | |
|
exp_self.20260510071332.002_20260510_071332
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260510071332.002 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-10 07:13 | Success | - | |
|
exp_cr_10.1007_s44163-026-01360-7_20260510_071117
|
World model inspired sarcasm reasoning with large language model agents
Paper ID: cr_10.1007_s44163-026-01360-7 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Reco...
|
05-10 07:11 | Success | - | |
|
exp_self.20260510070645.001_20260510_070646
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260510070645.001 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-10 07:06 | Success | - | |
|
exp_pytrain.20260510070514.001_20260510_070514
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-10 07:05 | Success | - | |
|
exp_self.20260510070202.002_20260510_070202
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260510070202.002 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-10 07:02 | Success | - | |
|
exp_self.20260510065531.001_20260510_065531
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260510065531.001 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-10 06:55 | Success | - | |
| exp_pytrain.20260510065400.001_20260510_065400 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-10 06:54 | Success | - | |
| exp_self.20260510064650.003_20260510_064650 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260510064650.003 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-10 06:46 | Success | - | |
| exp_self.20260510064019.002_20260510_064020 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260510064019.002 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-10 06:40 | Success | - | |
| exp_self.20260510063349.001_20260510_063349 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260510063349.001 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-10 06:33 | Success | - | |
| exp_pytrain.20260510063218.001_20260510_063218 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-10 06:32 | Success | - | |
| exp_self.20260510062803.001_20260510_062804 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260510062803.001 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-10 06:28 | Success | - | |
| exp_pytrain.20260510062632.001_20260510_062633 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-10 06:26 | Success | - | |
| exp_self.20260510061415.192_20260510_061416 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260510061415.192 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-10 06:14 | Success | - | |
| exp_self.20260510060738.191_20260510_060738 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260510060738.191 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-10 06:07 | Success | - | |
| exp_pytrain.20260510060348.041_20260510_060348 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-10 06:03 | Success | - | |
| exp_self.20260510060135.190_20260510_060135 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260510060135.190 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-10 06:01 | Success | - | |
| exp_self.20260510055502.189_20260510_055503 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260510055502.189 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-10 05:55 | Success | - | |
| exp_self.20260510054830.188_20260510_054831 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260510054830.188 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-10 05:48 | Success | - | |
| exp_self.20260510054159.187_20260510_054159 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260510054159.187 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-10 05:42 | Success | - | |
| exp_self.20260510053522.186_20260510_053523 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260510053522.186 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-10 05:35 | Success | - | |
| exp_pytrain.20260510053242.040_20260510_053243 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-10 05:32 | Success | - | |
| exp_self.20260510052922.185_20260510_052923 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260510052922.185 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-10 05:29 | Success | - | |
| exp_self.20260510052246.184_20260510_052247 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260510052246.184 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-10 05:22 | Success | - | |
| exp_self.20260510051612.183_20260510_051612 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260510051612.183 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-10 05:16 | Success | - | |
| exp_self.20260510050937.182_20260510_050937 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260510050937.182 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-10 05:09 | Success | - | |
| exp_self.20260510050305.181_20260510_050305 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260510050305.181 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-10 05:03 | Success | - | |
| exp_pytrain.20260510050129.039_20260510_050130 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-10 05:01 | Success | - | |
| exp_self.20260510045537.180_20260510_045537 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260510045537.180 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-10 04:55 | Success | - | |
| exp_self.20260510044859.179_20260510_044859 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260510044859.179 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-10 04:49 | Success | - | |
| exp_self.20260510044226.178_20260510_044227 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260510044226.178 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-10 04:42 | Success | - | |
| exp_self.20260510043555.177_20260510_043555 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260510043555.177 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-10 04:35 | Success | - | |
| exp_pytrain.20260510043033.038_20260510_043034 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-10 04:30 | Success | - | |
| exp_self.20260510042928.176_20260510_042929 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260510042928.176 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-10 04:29 | Success | - | |
| exp_self.20260510042256.175_20260510_042257 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260510042256.175 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-10 04:22 | Success | - | |
| exp_self.20260510041624.174_20260510_041624 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260510041624.174 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-10 04:16 | Success | - | |
| exp_self.20260510040943.173_20260510_040943 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260510040943.173 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-10 04:09 | Success | - | |
| exp_self.20260510040304.172_20260510_040305 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260510040304.172 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-10 04:03 | Success | - | |
| exp_pytrain.20260510035917.037_20260510_035917 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-10 03:59 | Success | - | |
| exp_self.20260510035705.171_20260510_035705 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260510035705.171 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-10 03:57 | Success | - | |
| exp_self.20260510035028.170_20260510_035028 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260510035028.170 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-10 03:50 | Success | - | |
| exp_self.20260510034356.169_20260510_034357 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260510034356.169 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-10 03:43 | Success | - | |
| exp_self.20260510033718.168_20260510_033718 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260510033718.168 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-10 03:37 | Success | - | |
| exp_self.20260510033046.167_20260510_033046 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260510033046.167 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-10 03:30 | Success | - | |
| exp_pytrain.20260510032805.036_20260510_032806 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-10 03:28 | Success | - | |
| exp_self.20260510032446.166_20260510_032447 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260510032446.166 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-10 03:24 | Success | - | |
| exp_self.20260510031815.165_20260510_031816 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260510031815.165 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-10 03:18 | Success | - | |
| exp_self.20260510031140.164_20260510_031140 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260510031140.164 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-10 03:11 | Success | - | |
| exp_self.20260510030506.163_20260510_030506 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260510030506.163 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-10 03:05 | Success | - | |
| exp_self.20260510025834.162_20260510_025835 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260510025834.162 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-10 02:58 | Success | - | |
| exp_pytrain.20260510025659.035_20260510_025700 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-10 02:57 | Success | - | |
| exp_self.20260510025105.161_20260510_025106 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260510025105.161 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-10 02:51 | Success | - | |
| exp_self.20260510024430.160_20260510_024430 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260510024430.160 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-10 02:44 | Success | - | |
| exp_self.20260510023758.159_20260510_023758 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260510023758.159 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-10 02:38 | Success | - | |
| exp_self.20260510023128.158_20260510_023128 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260510023128.158 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-10 02:31 | Success | - | |
| exp_pytrain.20260510022607.034_20260510_022607 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-10 02:26 | Success | - | |
| exp_self.20260510022504.157_20260510_022504 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260510022504.157 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-10 02:25 | Success | - | |
| exp_self.20260510021831.156_20260510_021832 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260510021831.156 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-10 02:18 | Success | - | |
| exp_self.20260510021200.155_20260510_021200 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260510021200.155 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-10 02:12 | Success | - | |
| exp_self.20260510020453.154_20260510_020453 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260510020453.154 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-10 02:04 | Success | - | |
| exp_self.20260510015734.153_20260510_015735 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260510015734.153 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-10 01:57 | Success | - | |
| exp_pytrain.20260510015452.033_20260510_015452 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-10 01:54 | Success | - | |
| exp_self.20260510015133.152_20260510_015134 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260510015133.152 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-10 01:51 | Success | - | |
| exp_self.20260510014500.151_20260510_014500 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260510014500.151 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-10 01:45 | Success | - | |
| exp_self.20260510013828.150_20260510_013829 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260510013828.150 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-10 01:38 | Success | - | |
| exp_self.20260510013154.149_20260510_013154 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260510013154.149 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-10 01:31 | Success | - | |
| exp_self.20260510012523.148_20260510_012524 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260510012523.148 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-10 01:25 | Success | - | |
| exp_pytrain.20260510012353.032_20260510_012354 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-10 01:23 | Success | - | |
| exp_self.20260510011745.147_20260510_011746 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260510011745.147 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-10 01:17 | Success | - | |
| exp_self.20260510011113.146_20260510_011114 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260510011113.146 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-10 01:11 | Success | - | |
| exp_self.20260510010436.145_20260510_010436 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260510010436.145 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-10 01:04 | Success | - | |
| exp_self.20260510005732.144_20260510_005732 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260510005732.144 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-10 00:57 | Success | - | |
| exp_pytrain.20260510005212.031_20260510_005212 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-10 00:52 | Success | - | |
| exp_self.20260510005109.143_20260510_005109 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260510005109.143 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-10 00:51 | Success | - | |
| exp_self.20260510004437.142_20260510_004438 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260510004437.142 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-10 00:44 | Success | - | |
| exp_self.20260510003807.141_20260510_003808 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260510003807.141 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-10 00:38 | Success | - | |
| exp_self.20260510003135.140_20260510_003135 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260510003135.140 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-10 00:31 | Success | - | |
| exp_self.20260510002504.139_20260510_002505 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260510002504.139 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-10 00:25 | Success | - | |
| exp_pytrain.20260510002114.030_20260510_002114 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-10 00:21 | Success | - | |
| exp_self.20260510001906.138_20260510_001907 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260510001906.138 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-10 00:19 | Success | - | |
| exp_self.20260510001235.137_20260510_001235 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260510001235.137 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-10 00:12 | Success | - | |
| exp_self.20260510000558.136_20260510_000559 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260510000558.136 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-10 00:06 | Success | - | |
| exp_self.20260509235924.135_20260509_235924 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260509235924.135 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-09 23:59 | Success | - | |
| exp_self.20260509235253.134_20260509_235254 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260509235253.134 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-09 23:52 | Success | - | |
| exp_pytrain.20260509235012.029_20260509_235013 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-09 23:50 | Success | - | |
| exp_self.20260509234653.133_20260509_234653 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260509234653.133 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-09 23:46 | Success | - | |
| exp_self.20260509234024.132_20260509_234024 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260509234024.132 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-09 23:40 | Success | - | |
| exp_self.20260509233353.131_20260509_233354 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260509233353.131 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-09 23:33 | Success | - | |
| exp_self.20260509232718.130_20260509_232719 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260509232718.130 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-09 23:27 | Success | - | |
| exp_self.20260509232042.129_20260509_232043 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260509232042.129 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-09 23:20 | Success | - | |
| exp_pytrain.20260509231912.028_20260509_231912 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-09 23:19 | Success | - | |
| exp_self.20260509231309.128_20260509_231310 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260509231309.128 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-09 23:13 | Success | - | |
| exp_self.20260509230637.127_20260509_230637 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260509230637.127 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-09 23:06 | Success | - | |
| exp_self.20260509230005.126_20260509_230005 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260509230005.126 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-09 23:00 | Success | - | |
| exp_self.20260509225320.125_20260509_225320 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260509225320.125 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-09 22:53 | Success | - | |
| exp_pytrain.20260509224800.027_20260509_224800 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-09 22:48 | Success | - | |
| exp_self.20260509224656.124_20260509_224657 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260509224656.124 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-09 22:46 | Success | - | |
| exp_self.20260509224021.123_20260509_224022 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260509224021.123 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-09 22:40 | Success | - | |
| exp_self.20260509223351.122_20260509_223351 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260509223351.122 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-09 22:33 | Success | - | |
| exp_self.20260509222720.121_20260509_222720 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260509222720.121 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-09 22:27 | Success | - | |
| exp_self.20260509222047.120_20260509_222047 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260509222047.120 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-09 22:20 | Success | - | |
|
exp_pytrain.20260509221656.026_20260509_221656
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-09 22:16 | Success | - | |
|
exp_self.20260509221446.119_20260509_221447
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260509221446.119 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-09 22:14 | Success | - | |
|
exp_self.20260509220816.118_20260509_220816
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260509220816.118 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-09 22:08 | Success | - | |
|
exp_self.20260509220141.117_20260509_220141
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260509220141.117 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-09 22:01 | Success | - | |
|
exp_self.20260509215502.116_20260509_215502
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260509215502.116 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-09 21:55 | Success | - | |
|
exp_self.20260509214827.115_20260509_214827
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260509214827.115 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-09 21:48 | Success | - | |
|
exp_pytrain.20260509214546.025_20260509_214546
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-09 21:45 | Success | - | |
|
exp_self.20260509214227.114_20260509_214227
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260509214227.114 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-09 21:42 | Success | - | |
|
exp_self.20260509213554.113_20260509_213555
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260509213554.113 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-09 21:35 | Success | - | |
|
exp_self.20260509212919.112_20260509_212919
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260509212919.112 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-09 21:29 | Success | - | |
|
exp_self.20260509212245.111_20260509_212246
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260509212245.111 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-09 21:22 | Success | - | |
|
exp_self.20260509211610.110_20260509_211611
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260509211610.110 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-09 21:16 | Success | - | |
|
exp_pytrain.20260509211439.024_20260509_211439
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-09 21:14 | Success | - | |
|
exp_self.20260509210836.109_20260509_210836
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260509210836.109 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-09 21:08 | Success | - | |
|
exp_self.20260509210157.108_20260509_210158
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260509210157.108 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-09 21:02 | Success | - | |
|
exp_self.20260509205529.107_20260509_205529
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260509205529.107 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-09 20:55 | Success | - | |
|
exp_self.20260509204845.106_20260509_204846
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260509204845.106 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-09 20:48 | Success | - | |
|
exp_pytrain.20260509204307.023_20260509_204308
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-09 20:43 | Success | - | |
|
exp_self.20260509204205.105_20260509_204205
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260509204205.105 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-09 20:42 | Success | - | |
|
exp_self.20260509203524.104_20260509_203524
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260509203524.104 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-09 20:35 | Success | - | |
|
exp_self.20260509202852.103_20260509_202852
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260509202852.103 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-09 20:28 | Success | - | |
|
exp_self.20260509202223.102_20260509_202223
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260509202223.102 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-09 20:22 | Success | - | |
|
exp_self.20260509201552.101_20260509_201552
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260509201552.101 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-09 20:15 | Success | - | |
|
exp_pytrain.20260509201200.022_20260509_201201
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-09 20:12 | Success | - | |
|
exp_self.20260509200952.100_20260509_200952
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260509200952.100 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-09 20:09 | Success | - | |
|
exp_self.20260509200320.099_20260509_200321
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260509200320.099 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-09 20:03 | Success | - | |
|
exp_self.20260509195650.098_20260509_195651
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260509195650.098 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-09 19:56 | Success | - | |
|
exp_self.20260509195014.097_20260509_195015
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260509195014.097 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-09 19:50 | Success | - | |
|
exp_self.20260509194341.096_20260509_194342
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260509194341.096 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-09 19:43 | Success | - | |
|
exp_pytrain.20260509194101.021_20260509_194101
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-09 19:41 | Success | - | |
|
exp_self.20260509193744.095_20260509_193744
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260509193744.095 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-09 19:37 | Success | - | |
|
exp_self.20260509193106.094_20260509_193107
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260509193106.094 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-09 19:31 | Success | - | |
|
exp_self.20260509192430.093_20260509_192431
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260509192430.093 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-09 19:24 | Success | - | |
|
exp_self.20260509191757.092_20260509_191757
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260509191757.092 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-09 19:17 | Success | - | |
|
exp_self.20260509191122.091_20260509_191122
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260509191122.091 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-09 19:11 | Success | - | |
|
exp_pytrain.20260509190950.020_20260509_190950
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-09 19:09 | Success | - | |
|
exp_self.20260509190343.090_20260509_190344
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260509190343.090 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-09 19:03 | Success | - | |
|
exp_self.20260509185706.089_20260509_185706
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260509185706.089 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-09 18:57 | Success | - | |
|
exp_self.20260509185036.088_20260509_185037
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260509185036.088 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-09 18:50 | Success | - | |
|
exp_self.20260509184358.087_20260509_184359
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260509184358.087 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-09 18:44 | Success | - | |
|
exp_pytrain.20260509183839.019_20260509_183840
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-09 18:38 | Success | - | |
|
exp_self.20260509183736.086_20260509_183736
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260509183736.086 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-09 18:37 | Success | - | |
|
exp_self.20260509183105.085_20260509_183105
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260509183105.085 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-09 18:31 | Success | - | |
|
exp_self.20260509182431.084_20260509_182431
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260509182431.084 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-09 18:24 | Success | - | |
|
exp_self.20260509181754.083_20260509_181754
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260509181754.083 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-09 18:17 | Success | - | |
|
exp_self.20260509181123.082_20260509_181123
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260509181123.082 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-09 18:11 | Success | - | |
|
exp_pytrain.20260509180738.018_20260509_180739
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-09 18:07 | Success | - | |
|
exp_self.20260509180523.081_20260509_180524
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260509180523.081 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-09 18:05 | Success | - | |
|
exp_self.20260509175852.080_20260509_175852
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260509175852.080 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-09 17:58 | Success | - | |
|
exp_self.20260509175224.079_20260509_175224
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260509175224.079 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-09 17:52 | Success | - | |
|
exp_self.20260509174550.078_20260509_174550
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260509174550.078 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-09 17:45 | Success | - | |
|
exp_self.20260509173907.077_20260509_173907
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260509173907.077 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-09 17:39 | Success | - | |
|
exp_pytrain.20260509173626.017_20260509_173627
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-09 17:36 | Success | - | |
|
exp_self.20260509173310.076_20260509_173310
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260509173310.076 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-09 17:33 | Success | - | |
|
exp_self.20260509172637.075_20260509_172637
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260509172637.075 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-09 17:26 | Success | - | |
|
exp_self.20260509172007.074_20260509_172007
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260509172007.074 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-09 17:20 | Success | - | |
|
exp_self.20260509171336.073_20260509_171336
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260509171336.073 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-09 17:13 | Success | - | |
|
exp_self.20260509170700.072_20260509_170700
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260509170700.072 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-09 17:07 | Success | - | |
|
exp_pytrain.20260509170523.016_20260509_170523
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-09 17:05 | Success | - | |
|
exp_self.20260509165921.071_20260509_165921
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260509165921.071 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-09 16:59 | Success | - | |
|
exp_self.20260509165253.070_20260509_165253
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260509165253.070 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-09 16:52 | Success | - | |
|
exp_self.20260509164624.069_20260509_164625
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260509164624.069 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-09 16:46 | Success | - | |
|
exp_self.20260509163954.068_20260509_163955
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260509163954.068 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-09 16:39 | Success | - | |
|
exp_pytrain.20260509163500.015_20260509_163500
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-09 16:35 | Success | - | |
|
exp_self.20260509163357.067_20260509_163357
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260509163357.067 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-09 16:34 | Success | - | |
|
exp_self.20260509162728.066_20260509_162728
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260509162728.066 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-09 16:27 | Success | - | |
|
exp_self.20260509162054.065_20260509_162055
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260509162054.065 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-09 16:20 | Success | - | |
|
exp_self.20260509161418.064_20260509_161419
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260509161418.064 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-09 16:14 | Success | - | |
|
exp_self.20260509160749.063_20260509_160750
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260509160749.063 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-09 16:07 | Success | - | |
|
exp_pytrain.20260509160406.014_20260509_160406
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-09 16:04 | Success | - | |
|
exp_self.20260509160146.062_20260509_160146
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260509160146.062 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-09 16:01 | Success | - | |
|
exp_self.20260509155516.061_20260509_155517
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260509155516.061 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-09 15:55 | Success | - | |
|
exp_self.20260509154848.060_20260509_154849
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260509154848.060 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-09 15:48 | Success | - | |
|
exp_self.20260509154214.059_20260509_154215
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260509154214.059 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-09 15:42 | Success | - | |
|
exp_self.20260509153540.058_20260509_153540
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260509153540.058 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-09 15:35 | Success | - | |
|
exp_pytrain.20260509153300.013_20260509_153301
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-09 15:33 | Success | - | |
|
exp_self.20260509152945.057_20260509_152945
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260509152945.057 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-09 15:29 | Success | - | |
|
exp_self.20260509152309.056_20260509_152309
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260509152309.056 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-09 15:23 | Success | - | |
|
exp_cr_10.1093_mnras_stag893_20260509_151945
|
AstroSpec-LLM: A Large Language Model Framework for High-throughput Infrared Spectral Prediction of Interstellar PAHs
Paper ID: cr_10.1093_mnras_stag893 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-09 15:19 | Success | - | |
|
exp_self.20260509151623.055_20260509_151623
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260509151623.055 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-09 15:16 | Success | - | |
|
exp_self.20260509150954.054_20260509_150955
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260509150954.054 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-09 15:09 | Success | - | |
|
exp_self.20260509150327.053_20260509_150327
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260509150327.053 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-09 15:03 | Success | - | |
|
exp_pytrain.20260509150150.012_20260509_150150
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-09 15:01 | Success | - | |
|
exp_self.20260509145551.052_20260509_145551
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260509145551.052 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-09 14:55 | Success | - | |
|
exp_self.20260509144922.051_20260509_144922
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260509144922.051 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-09 14:49 | Success | - | |
| exp_self.20260509144254.050_20260509_144254 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260509144254.050 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-09 14:42 | Success | - | |
| exp_self.20260509143626.049_20260509_143626 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260509143626.049 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-09 14:36 | Success | - | |
| exp_pytrain.20260509143105.011_20260509_143105 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-09 14:31 | Success | - | |
| exp_self.20260509143003.048_20260509_143003 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260509143003.048 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-09 14:30 | Success | - | |
| exp_self.20260509142335.047_20260509_142335 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260509142335.047 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-09 14:23 | Success | - | |
| exp_self.20260509141706.046_20260509_141707 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260509141706.046 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-09 14:17 | Success | - | |
| exp_self.20260509141031.045_20260509_141031 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260509141031.045 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-09 14:10 | Success | - | |
| exp_self.20260509140402.044_20260509_140403 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260509140402.044 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-09 14:04 | Success | - | |
| exp_pytrain.20260509140017.010_20260509_140018 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-09 14:00 | Success | - | |
| exp_self.20260509135806.043_20260509_135806 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260509135806.043 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-09 13:58 | Success | - | |
| exp_self.20260509135136.042_20260509_135136 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260509135136.042 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-09 13:51 | Success | - | |
| exp_self.20260509134507.041_20260509_134507 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260509134507.041 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-09 13:45 | Success | - | |
| exp_self.20260509133833.040_20260509_133833 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260509133833.040 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-09 13:38 | Success | - | |
| exp_self.20260509133153.039_20260509_133153 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260509133153.039 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-09 13:31 | Success | - | |
| exp_pytrain.20260509132912.009_20260509_132912 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-09 13:29 | Success | - | |
| exp_self.20260509132553.038_20260509_132553 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260509132553.038 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-09 13:25 | Success | - | |
| exp_self.20260509131915.037_20260509_131915 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260509131915.037 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-09 13:19 | Success | - | |
| exp_self.20260509131240.036_20260509_131240 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260509131240.036 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-09 13:12 | Success | - | |
| exp_self.20260509130608.035_20260509_130608 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260509130608.035 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-09 13:06 | Success | - | |
| exp_self.20260509125937.034_20260509_125937 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260509125937.034 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-09 12:59 | Success | - | |
| exp_pytrain.20260509125801.008_20260509_125801 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-09 12:58 | Success | - | |
| exp_self.20260509125210.033_20260509_125210 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260509125210.033 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-09 12:52 | Success | - | |
| exp_self.20260509124536.032_20260509_124536 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260509124536.032 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-09 12:45 | Success | - | |
| exp_self.20260509123903.031_20260509_123903 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260509123903.031 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-09 12:39 | Success | - | |
| exp_self.20260509123233.030_20260509_123233 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260509123233.030 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-09 12:32 | Success | - | |
| exp_pytrain.20260509122711.007_20260509_122712 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-09 12:27 | Success | - | |
| exp_self.20260509122608.029_20260509_122608 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260509122608.029 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-09 12:26 | Success | - | |
| exp_self.20260509121940.028_20260509_121941 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260509121940.028 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-09 12:19 | Success | - | |
| exp_cr_10.1093_bioinformatics_btag260_20260509_121746 | Protein Language Model Embeddings Improve HIV Drug Resistance Prediction: A Comprehensive Benchmark with Attention-Based...<br>Paper ID: cr_10.1093_bioinformatics_btag260 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal:... | 05-09 12:17 | Success | - | |
| exp_self.20260509121141.027_20260509_121141 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260509121141.027 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-09 12:11 | Success | - | |
| exp_self.20260509120511.026_20260509_120511 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260509120511.026 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-09 12:05 | Success | - | |
| exp_self.20260509115841.025_20260509_115842 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260509115841.025 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-09 11:58 | Success | - | |
| exp_pytrain.20260509115601.006_20260509_115601 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-09 11:56 | Success | - | |
| exp_self.20260509115241.024_20260509_115242 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260509115241.024 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-09 11:52 | Success | - | |
| exp_self.20260509114612.023_20260509_114612 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260509114612.023 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-09 11:46 | Success | - | |
| exp_self.20260509114028.022_20260509_114028 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260509114028.022 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-09 11:40 | Success | - | |
| exp_self.20260509113400.021_20260509_113401 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260509113400.021 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-09 11:34 | Success | - | |
| exp_self.20260509112733.020_20260509_112733 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260509112733.020 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-09 11:27 | Success | - | |
| exp_pytrain.20260509112452.005_20260509_112453 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-09 11:24 | Success | - | |
| exp_self.20260509112134.019_20260509_112135 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260509112134.019 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-09 11:21 | Success | - | |
| exp_self.20260509111506.018_20260509_111506 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260509111506.018 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-09 11:15 | Success | - | |
| exp_self.20260509110829.017_20260509_110830 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260509110829.017 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-09 11:08 | Success | - | |
| exp_self.20260509110157.016_20260509_110158 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260509110157.016 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-09 11:02 | Success | - | |
| exp_self.20260509105527.015_20260509_105528 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260509105527.015 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-09 10:55 | Success | - | |
| exp_pytrain.20260509105352.004_20260509_105353 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-09 10:53 | Success | - | |
| exp_self.20260509104800.014_20260509_104800 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260509104800.014 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-09 10:48 | Success | - | |
| exp_self.20260509104125.013_20260509_104126 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260509104125.013 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-09 10:41 | Success | - | |
| exp_self.20260509103449.012_20260509_103450 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260509103449.012 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-09 10:34 | Success | - | |
| exp_self.20260509102814.011_20260509_102814 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260509102814.011 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-09 10:28 | Success | - | |
| exp_pytrain.20260509102249.003_20260509_102249 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-09 10:22 | Success | - | |
| exp_self.20260509102142.010_20260509_102143 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260509102142.010 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-09 10:21 | Success | - | |
| exp_self.20260509101508.009_20260509_101509 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260509101508.009 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-09 10:15 | Success | - | |
| exp_self.20260509100821.008_20260509_100822 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260509100821.008 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-09 10:08 | Success | - | |
| exp_self.20260509100143.007_20260509_100143 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260509100143.007 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-09 10:01 | Success | - | |
| exp_self.20260509095503.006_20260509_095503 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260509095503.006 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-09 09:55 | Success | - | |
| exp_pytrain.20260509095114.002_20260509_095114 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-09 09:51 | Success | - | |
| exp_self.20260509094903.005_20260509_094904 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260509094903.005 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-09 09:49 | Success | - | |
| exp_self.20260509094221.004_20260509_094222 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260509094221.004 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-09 09:42 | Success | - | |
| exp_self.20260509093551.003_20260509_093551 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260509093551.003 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-09 09:35 | Success | - | |
| exp_self.20260509092838.002_20260509_092838 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260509092838.002 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-09 09:28 | Success | - | |
| exp_self.20260509092206.001_20260509_092207 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260509092206.001 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-09 09:22 | Success | - | |
| exp_pytrain.20260509092035.001_20260509_092035 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-09 09:20 | Success | - | |
| exp_pytrain.20260509090930.001_20260509_090931 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-09 09:10 | Success | - | |
| exp_self.20260509090017.001_20260509_090018 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260509090017.001 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-09 09:01 | Success | - | |
| exp_pytrain.20260509085747.001_20260509_085747 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-09 08:58 | Success | - | |
| exp_pytrain.20260509084551.001_20260509_084551 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-09 08:46 | Success | - | |
| exp_self.20260509084242.003_20260509_084243 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260509084242.003 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-09 08:43 | Success | - | |
| exp_self.20260509083508.002_20260509_083509 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260509083508.002 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-09 08:36 | Success | - | |
| exp_self.20260509082736.001_20260509_082737 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260509082736.001 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-09 08:28 | Success | - | |
| exp_pytrain.20260509082506.001_20260509_082506 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-09 08:26 | Success | - | |
| exp_self.20260509082304.004_20260509_082305 | self.20260509082304.004<br>No summary available yet. | 05-09 08:23 | Success | - | |
| exp_self.20260509081631.003_20260509_081631 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260509081631.003 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-09 08:16 | Failed | GPU_REQUIRED policy blocked benchmark execution. | |
| exp_self.20260509080957.002_20260509_080958 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260509080957.002 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-09 08:09 | Failed | GPU_REQUIRED policy blocked benchmark execution. | |
| exp_self.20260509080324.001_20260509_080324 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260509080324.001 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-09 08:03 | Failed | GPU_REQUIRED policy blocked benchmark execution. | |
| exp_pytrain.20260509080147.001_20260509_080148 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-09 08:01 | Failed | GPU_REQUIRED policy blocked benchmark execution. | |
| exp_pytrain.20260509075902.001_20260509_075903 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-09 07:59 | Pending | - | |
| exp_pytrain.20260509075611.001_20260509_075612 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-09 07:56 | Pending | - | |
| exp_pytrain.20260509075053.001_20260509_075053 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-09 07:50 | Pending | - | |
| exp_self.20260509074650.374_20260509_074650 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260509074650.374 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-09 07:47 | Success | - | |
| exp_self.20260509073856.373_20260509_073857 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260509073856.373 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-09 07:40 | Success | - | |
| exp_self.20260509073042.372_20260509_073042 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260509073042.372 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-09 07:31 | Success | - | |
| exp_self.20260509072242.371_20260509_072243 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260509072242.371 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-09 07:23 | Success | - | |
| exp_pytrain.20260509072006.092_20260509_072006 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-09 07:21 | Success | - | |
| exp_self.20260509071426.370_20260509_071426 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260509071426.370 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-09 07:15 | Success | - | |
| exp_self.20260509070644.369_20260509_070644 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260509070644.369 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-09 07:07 | Success | - | |
| exp_self.20260509065858.368_20260509_065858 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260509065858.368 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-09 07:00 | Success | - | |
| exp_self.20260509065116.367_20260509_065116 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260509065116.367 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-09 06:52 | Success | - | |
| exp_pytrain.20260509064839.091_20260509_064839 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-09 06:49 | Success | - | |
| exp_self.20260509064301.366_20260509_064301 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260509064301.366 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-09 06:44 | Success | - | |
| exp_self.20260509063548.365_20260509_063548 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260509063548.365 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-09 06:36 | Success | - | |
| exp_self.20260509062839.364_20260509_062839 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260509062839.364 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-09 06:29 | Success | - | |
| exp_self.20260509062051.363_20260509_062051 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260509062051.363 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-09 06:21 | Success | - | |
| exp_pytrain.20260509061655.090_20260509_061656 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-09 06:17 | Success | - | |
| exp_self.20260509061333.362_20260509_061334 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260509061333.362 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-09 06:14 | Success | - | |
| exp_self.20260509060550.361_20260509_060550 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260509060550.361 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-09 06:06 | Success | - | |
| exp_self.20260509055810.360_20260509_055810 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260509055810.360 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-09 05:59 | Success | - | |
| exp_self.20260509055030.359_20260509_055030 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260509055030.359 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-09 05:51 | Success | - | |
| exp_pytrain.20260509054531.089_20260509_054531 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-09 05:46 | Success | - | |
| exp_self.20260509054323.358_20260509_054324 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260509054323.358 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-09 05:44 | Success | - | |
| exp_gh_naranor_wamp-proxy_20260509_054005 | naranor/wamp-proxy<br>Paper ID: gh_naranor_wamp-proxy - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered be... | 05-09 05:41 | Success | - | |
|
exp_self.20260509053427.357_20260509_053427
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260509053427.357 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-09 05:35 | Success | - | |
|
exp_gh_jacksong-sourse_sll-core_20260509_053131
|
jacksong-sourse/sll-core
Paper ID: gh_jacksong-sourse_sll-core - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recove...
|
05-09 05:32 | Success | - | |
| exp_self.20260509052412.356_20260509_052412 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260509052412.356 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-09 05:25 | Success | - | |
| exp_self.20260509051632.355_20260509_051632 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260509051632.355 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-09 05:17 | Success | - | |
| exp_pytrain.20260509051353.088_20260509_051354 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-09 05:14 | Success | - | |
| exp_self.20260509050813.354_20260509_050813 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260509050813.354 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-09 05:09 | Success | - | |
| exp_self.20260509050032.353_20260509_050032 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260509050032.353 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-09 05:01 | Success | - | |
| exp_self.20260509045250.352_20260509_045251 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260509045250.352 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-09 04:53 | Success | - | |
| exp_self.20260509044508.351_20260509_044509 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260509044508.351 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-09 04:46 | Success | - | |
| exp_pytrain.20260509044232.087_20260509_044233 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-09 04:43 | Success | - | |
| exp_self.20260509043649.350_20260509_043649 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260509043649.350 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-09 04:37 | Success | - | |
| exp_self.20260509042912.349_20260509_042912 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260509042912.349 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-09 04:30 | Success | - | |
| exp_self.20260509042133.348_20260509_042134 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260509042133.348 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-09 04:22 | Success | - | |
| exp_self.20260509041343.347_20260509_041344 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260509041343.347 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-09 04:14 | Success | - | |
| exp_pytrain.20260509041108.086_20260509_041108 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-09 04:12 | Success | - | |
| exp_self.20260509040538.346_20260509_040538 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260509040538.346 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-09 04:06 | Success | - | |
| exp_self.20260509035750.345_20260509_035751 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260509035750.345 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-09 03:58 | Success | - | |
| exp_self.20260509035008.344_20260509_035008 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260509035008.344 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-09 03:51 | Success | - | |
| exp_self.20260509034228.343_20260509_034228 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260509034228.343 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-09 03:43 | Success | - | |
| exp_pytrain.20260509033947.085_20260509_033947 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-09 03:40 | Success | - | |
| exp_self.20260509033417.342_20260509_033418 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260509033417.342 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-09 03:35 | Success | - | |
| exp_self.20260509032630.341_20260509_032631 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260509032630.341 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-09 03:27 | Success | - | |
| exp_self.20260509031820.340_20260509_031820 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260509031820.340 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-09 03:19 | Success | - | |
| exp_self.20260509031110.339_20260509_031111 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260509031110.339 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-09 03:12 | Success | - | |
| exp_pytrain.20260509030754.084_20260509_030754 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-09 03:08 | Success | - | |
| exp_self.20260509030034.338_20260509_030035 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260509030034.338 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-09 03:01 | Success | - | |
| exp_self.20260509025327.337_20260509_025328 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260509025327.337 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-09 02:54 | Success | - | |
| exp_self.20260509024558.336_20260509_024559 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260509024558.336 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-09 02:47 | Success | - | |
| exp_self.20260509023848.335_20260509_023848 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260509023848.335 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-09 02:39 | Success | - | |
| exp_pytrain.20260509023553.083_20260509_023554 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-09 02:36 | Success | - | |
| exp_self.20260509022929.334_20260509_022929 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260509022929.334 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-09 02:30 | Success | - | |
| exp_self.20260509022207.333_20260509_022208 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260509022207.333 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-09 02:23 | Success | - | |
| exp_self.20260509021459.332_20260509_021459 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260509021459.332 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-09 02:16 | Success | - | |
| exp_self.20260509020711.331_20260509_020711 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260509020711.331 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-09 02:08 | Success | - | |
| exp_pytrain.20260509020427.082_20260509_020428 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-09 02:05 | Success | - | |
| exp_self.20260509015605.330_20260509_015606 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260509015605.330 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-09 01:57 | Success | - | |
| exp_self.20260509014911.329_20260509_014911 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260509014911.329 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-09 01:50 | Success | - | |
| exp_self.20260509014138.328_20260509_014138 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260509014138.328 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-09 01:42 | Success | - | |
| exp_self.20260509013444.327_20260509_013445 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260509013444.327 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-09 01:35 | Success | - | |
| exp_pytrain.20260509013154.081_20260509_013154 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-09 01:32 | Success | - | |
| exp_self.20260509012530.326_20260509_012530 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260509012530.326 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-09 01:26 | Success | - | |
| exp_self.20260509011835.325_20260509_011836 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260509011835.325 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-09 01:19 | Success | - | |
| exp_self.20260509011141.324_20260509_011141 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260509011141.324 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-09 01:12 | Success | - | |
| exp_self.20260509010448.323_20260509_010448 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260509010448.323 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-09 01:05 | Success | - | |
| exp_pytrain.20260509005902.080_20260509_005902 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-09 01:00 | Success | - | |
| exp_self.20260509005644.322_20260509_005645 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260509005644.322 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-09 00:57 | Success | - | |
| exp_self.20260509004945.321_20260509_004946 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260509004945.321 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-09 00:50 | Success | - | |
| exp_self.20260509004242.320_20260509_004242 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260509004242.320 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-09 00:43 | Success | - | |
| exp_self.20260509003532.319_20260509_003532 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260509003532.319 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-09 00:36 | Success | - | |
| exp_self.20260509002821.318_20260509_002822 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260509002821.318 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-09 00:29 | Success | - | |
| exp_pytrain.20260509002528.079_20260509_002528 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-09 00:26 | Success | - | |
| exp_self.20260509001851.317_20260509_001851 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260509001851.317 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-09 00:19 | Success | - | |
| exp_cr_10.62762_dia.2026.309098_20260509_001519 | Farming Upward: The TsingSky Guangzhou Future Agriculture Cluster as a County-Level Model for Context-Specific Smart Agr... - Paper ID: cr_10.62762_dia.2026.309098 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recove... | 05-09 00:16 | Success | - | |
| exp_self.20260509001152.316_20260509_001152 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260509001152.316 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-09 00:12 | Success | - | |
| exp_self.20260509000431.315_20260509_000431 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260509000431.315 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-09 00:05 | Success | - | |
| exp_self.20260508235652.314_20260508_235652 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260508235652.314 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 23:57 | Success | - | |
| exp_pytrain.20260508235406.078_20260508_235407 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-08 23:55 | Success | - | |
| exp_self.20260508234916.313_20260508_234916 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260508234916.313 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 23:50 | Success | - | |
| exp_self.20260508234208.312_20260508_234208 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260508234208.312 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 23:43 | Success | - | |
| exp_self.20260508233440.311_20260508_233441 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260508233440.311 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 23:35 | Success | - | |
| exp_self.20260508232736.310_20260508_232737 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260508232736.310 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 23:28 | Success | - | |
| exp_pytrain.20260508232208.077_20260508_232209 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-08 23:23 | Success | - | |
| exp_self.20260508231942.309_20260508_231952 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260508231942.309 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 23:20 | Success | - | |
| exp_self.20260508231224.308_20260508_231224 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260508231224.308 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 23:13 | Success | - | |
| exp_self.20260508230515.307_20260508_230516 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260508230515.307 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 23:06 | Success | - | |
| exp_self.20260508225803.306_20260508_225803 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260508225803.306 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 22:59 | Success | - | |
| exp_self.20260508225108.305_20260508_225108 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260508225108.305 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 22:52 | Success | - | |
| exp_pytrain.20260508224815.076_20260508_224815 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-08 22:49 | Success | - | |
| exp_self.20260508224044.304_20260508_224045 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260508224044.304 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 22:41 | Success | - | |
| exp_self.20260508223350.303_20260508_223350 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260508223350.303 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 22:34 | Success | - | |
| exp_self.20260508222643.302_20260508_222653 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260508222643.302 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 22:27 | Success | - | |
| exp_self.20260508221921.301_20260508_221921 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260508221921.301 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 22:20 | Success | - | |
| exp_pytrain.20260508221627.075_20260508_221627 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-08 22:17 | Success | - | |
| exp_self.20260508221137.300_20260508_221137 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260508221137.300 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 22:12 | Success | - | |
| exp_self.20260508220436.299_20260508_220436 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260508220436.299 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 22:05 | Success | - | |
| exp_self.20260508215723.298_20260508_215724 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260508215723.298 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 21:58 | Success | - | |
| exp_self.20260508215007.297_20260508_215008 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260508215007.297 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 21:51 | Success | - | |
| exp_pytrain.20260508214446.074_20260508_214447 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-08 21:45 | Success | - | |
| exp_self.20260508214229.296_20260508_214230 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260508214229.296 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 21:43 | Success | - | |
| exp_self.20260508213518.295_20260508_213518 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260508213518.295 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 21:36 | Success | - | |
| exp_self.20260508212757.294_20260508_212758 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260508212757.294 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 21:29 | Success | - | |
| exp_self.20260508212040.293_20260508_212040 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260508212040.293 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 21:21 | Success | - | |
|
exp_self.20260508211347.292_20260508_211347
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260508211347.292 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-08 21:14 | Success | - | |
|
exp_pytrain.20260508211100.073_20260508_211100
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-08 21:12 | Success | - | |
|
exp_self.20260508210410.291_20260508_210411
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260508210410.291 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-08 21:05 | Success | - | |
|
exp_self.20260508205706.290_20260508_205706
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260508205706.290 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-08 20:58 | Success | - | |
|
exp_self.20260508204954.289_20260508_204955
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260508204954.289 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-08 20:50 | Success | - | |
|
exp_self.20260508204210.288_20260508_204211
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260508204210.288 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-08 20:43 | Success | - | |
| exp_pytrain.20260508203924.072_20260508_203924 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-08 20:40 | Success | - | |
| exp_self.20260508203213.287_20260508_203214 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260508203213.287 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 20:33 | Success | - | |
| exp_self.20260508202457.286_20260508_202458 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260508202457.286 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 20:26 | Success | - | |
| exp_self.20260508201738.285_20260508_201739 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260508201738.285 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 20:18 | Success | - | |
| exp_self.20260508201048.284_20260508_201048 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260508201048.284 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 20:11 | Success | - | |
| exp_pytrain.20260508200747.071_20260508_200758 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-08 20:09 | Success | - | |
| exp_self.20260508200311.283_20260508_200312 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260508200311.283 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 20:04 | Success | - | |
| exp_self.20260508195555.282_20260508_195555 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260508195555.282 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 19:56 | Success | - | |
| exp_self.20260508194829.281_20260508_194829 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260508194829.281 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 19:49 | Success | - | |
| exp_self.20260508194136.280_20260508_194136 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260508194136.280 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 19:42 | Success | - | |
| exp_pytrain.20260508193626.070_20260508_193626 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-08 19:37 | Success | - | |
| exp_self.20260508193410.279_20260508_193410 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260508193410.279 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 19:35 | Success | - | |
| exp_self.20260508192712.278_20260508_192712 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260508192712.278 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 19:28 | Success | - | |
| exp_cr_10.1371_journal.pone.0346078_20260508_192235 | Systematic evaluation of the DeepSeek large language model for clinical diagnostic reasoning - Paper ID: cr_10.1371_journal.pone.0346078 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Re... | 05-08 19:23 | Success | - | |
| exp_self.20260508192014.277_20260508_192014 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260508192014.277 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 19:21 | Success | - | |
| exp_gh_IbadKhalid7_turboquant-model_20260508_191646 | IbadKhalid7/turboquant-model - Paper ID: gh_IbadKhalid7_turboquant-model - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Re... | 05-08 19:17 | Success | - | |
| exp_self.20260508191258.276_20260508_191258 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260508191258.276 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 19:14 | Success | - | |
| exp_self.20260508190543.275_20260508_190543 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260508190543.275 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 19:06 | Success | - | |
| exp_pytrain.20260508190254.069_20260508_190255 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-08 19:03 | Success | - | |
| exp_self.20260508185635.274_20260508_185635 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260508185635.274 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 18:57 | Success | - | |
| exp_hf_2605.06663_20260508_185138 | EMO: Pretraining Mixture of Experts for Emergent Modularity - Paper ID: hf_2605.06663 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark... | 05-08 18:52 | Success | - | |
| exp_self.20260508184917.273_20260508_184917 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260508184917.273 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 18:50 | Success | - | |
| exp_self.20260508184104.272_20260508_184104 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260508184104.272 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 18:42 | Success | - | |
| exp_self.20260508183344.271_20260508_183344 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260508183344.271 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 18:34 | Success | - | |
| exp_pytrain.20260508183042.068_20260508_183042 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-08 18:31 | Success | - | |
| exp_self.20260508182417.270_20260508_182418 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260508182417.270 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 18:25 | Success | - | |
| exp_self.20260508181703.269_20260508_181703 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260508181703.269 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 18:18 | Success | - | |
| exp_self.20260508180955.268_20260508_180955 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260508180955.268 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 18:10 | Success | - | |
| exp_self.20260508180141.267_20260508_180141 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260508180141.267 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 18:02 | Success | - | |
| exp_pytrain.20260508175840.067_20260508_175841 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-08 17:59 | Success | - | |
| exp_self.20260508175409.266_20260508_175410 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260508175409.266 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 17:55 | Success | - | |
| exp_self.20260508174643.265_20260508_174652 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260508174643.265 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 17:47 | Success | - | |
| exp_self.20260508173907.264_20260508_173907 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260508173907.264 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 17:40 | Success | - | |
| exp_self.20260508173149.263_20260508_173149 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260508173149.263 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 17:32 | Success | - | |
| exp_pytrain.20260508172636.066_20260508_172637 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-08 17:27 | Success | - | |
| exp_self.20260508172417.262_20260508_172418 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260508172417.262 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 17:25 | Success | - | |
| exp_self.20260508171707.261_20260508_171707 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260508171707.261 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 17:18 | Success | - | |
| exp_cr_10.3390_educsci16050747_20260508_171334 | The CO-SPACE Model: Developing an Analytical Framework for Interdisciplinary Student Collaboration - Paper ID: cr_10.3390_educsci16050747 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recover... | 05-08 17:14 | Success | - | |
| exp_self.20260508171007.260_20260508_171007 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260508171007.260 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 17:11 | Success | - | |
| exp_self.20260508170319.259_20260508_170319 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260508170319.259 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 17:04 | Success | - | |
| exp_self.20260508165625.258_20260508_165625 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260508165625.258 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 16:57 | Success | - | |
| exp_pytrain.20260508165340.065_20260508_165341 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-08 16:54 | Success | - | |
| exp_self.20260508164709.257_20260508_164710 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260508164709.257 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 16:48 | Success | - | |
| exp_self.20260508163936.256_20260508_163937 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260508163936.256 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 16:40 | Success | - | |
| exp_self.20260508163221.255_20260508_163222 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260508163221.255 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 16:33 | Success | - | |
| exp_self.20260508162404.254_20260508_162404 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260508162404.254 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 16:25 | Success | - | |
| exp_pytrain.20260508162119.064_20260508_162120 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-08 16:22 | Success | - | |
| exp_cr_10.3390_systems14050529_20260508_161714 | An Interpretable Socio-Technical Decision Support System for Bi-Objective Urban Distribution Center Location: Adaptive O... - Paper ID: cr_10.3390_systems14050529 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recover... | 05-08 16:18 | Success | - | |
| exp_self.20260508161446.253_20260508_161446 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260508161446.253 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 16:15 | Success | - | |
| exp_self.20260508160647.252_20260508_160647 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260508160647.252 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 16:07 | Success | - | |
| exp_self.20260508155942.251_20260508_155951 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260508155942.251 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 16:00 | Success | - | |
| exp_self.20260508155138.250_20260508_155139 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260508155138.250 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 15:52 | Success | - | |
| exp_pytrain.20260508154842.063_20260508_154842 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-08 15:49 | Success | - | |
| exp_self.20260508154416.249_20260508_154416 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260508154416.249 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 15:45 | Success | - | |
| exp_self.20260508153607.248_20260508_153608 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260508153607.248 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 15:37 | Success | - | |
| exp_self.20260508152816.247_20260508_152816 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260508152816.247 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 15:29 | Success | - | |
| exp_self.20260508151958.246_20260508_151959 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260508151958.246 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 15:21 | Success | - | |
| exp_pytrain.20260508151620.062_20260508_151620 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-08 15:17 | Success | - | |
| exp_self.20260508151159.245_20260508_151200 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260508151159.245 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 15:13 | Success | - | |
| exp_self.20260508150344.244_20260508_150344 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260508150344.244 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 15:04 | Success | - | |
| exp_self.20260508145543.243_20260508_145544 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260508145543.243 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 14:56 | Success | - | |
| exp_self.20260508144730.242_20260508_144730 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260508144730.242 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 14:48 | Success | - | |
| exp_pytrain.20260508144401.061_20260508_144401 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-08 14:45 | Success | - | |
| exp_self.20260508143814.241_20260508_143814 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260508143814.241 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 14:39 | Success | - | |
| exp_self.20260508143020.240_20260508_143021 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260508143020.240 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 14:31 | Success | - | |
| exp_self.20260508142224.239_20260508_142224 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260508142224.239 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 14:23 | Success | - | |
| exp_self.20260508141453.238_20260508_141453 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260508141453.238 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 14:15 | Success | - | |
| exp_pytrain.20260508141155.060_20260508_141156 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-08 14:12 | Success | - | |
| exp_self.20260508140704.237_20260508_140705 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260508140704.237 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 14:08 | Success | - | |
| exp_self.20260508135851.236_20260508_135851 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260508135851.236 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 13:59 | Success | - | |
| exp_self.20260508135033.235_20260508_135033 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260508135033.235 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 13:51 | Success | - | |
| exp_self.20260508134259.234_20260508_134259 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260508134259.234 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 13:44 | Success | - | |
| exp_pytrain.20260508133911.059_20260508_133912 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-08 13:40 | Success | - | |
| exp_self.20260508133200.233_20260508_133201 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260508133200.233 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 13:33 | Success | - | |
| exp_self.20260508132453.232_20260508_132454 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260508132453.232 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 13:25 | Success | - | |
| exp_self.20260508131645.231_20260508_131646 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260508131645.231 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 13:17 | Success | - | |
| exp_self.20260508130934.230_20260508_130934 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260508130934.230 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 13:10 | Success | - | |
| exp_pytrain.20260508130559.058_20260508_130559 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-08 13:07 | Success | - | |
| exp_self.20260508125845.229_20260508_125845 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260508125845.229 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 12:59 | Success | - | |
| exp_self.20260508125034.228_20260508_125034 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260508125034.228 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 12:51 | Success | - | |
| exp_self.20260508124342.227_20260508_124342 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260508124342.227 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 12:44 | Success | - | |
| exp_self.20260508123623.226_20260508_123623 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260508123623.226 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 12:37 | Success | - | |
| exp_pytrain.20260508123326.057_20260508_123327 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-08 12:34 | Success | - | |
| exp_self.20260508122700.225_20260508_122701 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260508122700.225 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 12:28 | Success | - | |
| exp_self.20260508122001.224_20260508_122001 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260508122001.224 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 12:21 | Success | - | |
| exp_self.20260508121303.223_20260508_121303 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260508121303.223 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 12:14 | Success | - | |
| exp_self.20260508120509.222_20260508_120509 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260508120509.222 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 12:06 | Success | - | |
| exp_pytrain.20260508120206.056_20260508_120206 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-08 12:03 | Success | - | |
| exp_self.20260508115545.221_20260508_115546 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260508115545.221 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 11:56 | Success | - | |
| exp_self.20260508114830.220_20260508_114831 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260508114830.220 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 11:49 | Success | - | |
| exp_self.20260508114107.219_20260508_114107 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260508114107.219 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 11:42 | Success | - | |
| exp_self.20260508113419.218_20260508_113420 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260508113419.218 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 11:35 | Success | - | |
|
exp_pytrain.20260508112933.055_20260508_112933
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-08 11:30 | Success | - | |
|
exp_self.20260508112632.217_20260508_112632
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260508112632.217 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-08 11:27 | Success | - | |
|
exp_self.20260508111824.216_20260508_111824
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260508111824.216 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-08 11:19 | Success | - | |
|
exp_self.20260508110954.215_20260508_110954
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260508110954.215 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-08 11:10 | Success | - | |
| exp_self.20260508110158.214_20260508_110158 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260508110158.214 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 11:03 | Success | - | |
| exp_pytrain.20260508105634.054_20260508_105635 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-08 10:57 | Success | - | |
| exp_self.20260508105329.213_20260508_105330 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260508105329.213 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 10:54 | Success | - | |
| exp_self.20260508104501.212_20260508_104502 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260508104501.212 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 10:46 | Success | - | |
| exp_self.20260508103635.211_20260508_103636 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260508103635.211 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 10:37 | Success | - | |
| exp_self.20260508102828.210_20260508_102828 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260508102828.210 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 10:29 | Success | - | |
| exp_pytrain.20260508102450.053_20260508_102450 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-08 10:25 | Success | - | |
| exp_self.20260508102043.209_20260508_102043 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260508102043.209 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 10:21 | Success | - | |
| exp_self.20260508101214.208_20260508_101214 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260508101214.208 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 10:13 | Success | - | |
| exp_self.20260508100409.207_20260508_100409 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260508100409.207 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 10:05 | Success | - | |
| exp_self.20260508095552.206_20260508_095552 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260508095552.206 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 09:56 | Success | - | |
| exp_pytrain.20260508095210.052_20260508_095210 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-08 09:53 | Success | - | |
| exp_self.20260508094626.205_20260508_094626 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260508094626.205 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 09:47 | Success | - | |
| exp_self.20260508093836.204_20260508_093836 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260508093836.204 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 09:39 | Success | - | |
| exp_self.20260508093037.203_20260508_093037 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260508093037.203 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 09:31 | Success | - | |
| exp_self.20260508092248.202_20260508_092249 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260508092248.202 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 09:23 | Success | - | |
| exp_pytrain.20260508091957.051_20260508_091957 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-08 09:20 | Success | - | |
| exp_self.20260508091420.201_20260508_091420 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260508091420.201 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 09:15 | Success | - | |
| exp_self.20260508090539.200_20260508_090540 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260508090539.200 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 09:06 | Success | - | |
| exp_self.20260508085824.199_20260508_085825 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260508085824.199 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 08:59 | Success | - | |
| exp_self.20260508085104.198_20260508_085105 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260508085104.198 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 08:52 | Success | - | |
| exp_pytrain.20260508084806.050_20260508_084807 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-08 08:49 | Success | - | |
| exp_self.20260508084329.197_20260508_084329 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260508084329.197 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 08:44 | Success | - | |
| exp_self.20260508083605.196_20260508_083605 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260508083605.196 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 08:37 | Success | - | |
| exp_self.20260508082858.195_20260508_082858 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260508082858.195 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 08:30 | Success | - | |
| exp_self.20260508082121.194_20260508_082121 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260508082121.194 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 08:22 | Success | - | |
| exp_pytrain.20260508081601.049_20260508_081601 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-08 08:17 | Success | - | |
| exp_self.20260508081337.193_20260508_081338 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260508081337.193 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 08:14 | Success | - | |
| exp_self.20260508080628.192_20260508_080629 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260508080628.192 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 08:07 | Success | - | |
| exp_self.20260508075924.191_20260508_075924 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260508075924.191 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 08:00 | Success | - | |
| exp_self.20260508075240.190_20260508_075241 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260508075240.190 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 07:53 | Success | - | |
| exp_self.20260508074543.189_20260508_074543 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260508074543.189 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 07:46 | Success | - | |
| exp_pytrain.20260508074259.048_20260508_074300 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-08 07:44 | Success | - | |
| exp_self.20260508073629.188_20260508_073630 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260508073629.188 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 07:37 | Success | - | |
| exp_self.20260508072929.187_20260508_072930 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260508072929.187 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 07:30 | Success | - | |
| exp_self.20260508072236.186_20260508_072236 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260508072236.186 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 07:23 | Success | - | |
| exp_self.20260508071538.185_20260508_071538 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260508071538.185 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 07:16 | Success | - | |
| exp_pytrain.20260508071136.047_20260508_071136 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-08 07:12 | Success | - | |
| exp_self.20260508070800.184_20260508_070800 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260508070800.184 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 07:09 | Success | - | |
| exp_self.20260508070058.183_20260508_070059 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260508070058.183 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 07:02 | Success | - | |
| exp_self.20260508065226.182_20260508_065226 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260508065226.182 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 06:53 | Success | - | |
| exp_self.20260508064516.181_20260508_064525 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260508064516.181 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 06:46 | Success | - | |
| exp_pytrain.20260508063943.046_20260508_063944 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-08 06:40 | Success | - | |
| exp_self.20260508063726.180_20260508_063727 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260508063726.180 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 06:38 | Success | - | |
| exp_self.20260508062737.179_20260508_062737 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260508062737.179 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 06:28 | Success | - | |
| exp_hf_2605.04045_20260508_062411 | Audio-Visual Intelligence in Large Foundation Models<br>Paper ID: hf_2605.04045 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark... | 05-08 06:25 | Success | - | |
| exp_self.20260508061725.178_20260508_061725 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260508061725.178 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 06:18 | Success | - | |
| exp_self.20260508061007.177_20260508_061007 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260508061007.177 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 06:11 | Success | - | |
| exp_pytrain.20260508060640.045_20260508_060641 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-08 06:07 | Success | - | |
| exp_hf_2605.05758_20260508_060320 | BioTool: A Comprehensive Tool-Calling Dataset for Enhancing Biomedical Capabilities of Large Language Models<br>Paper ID: hf_2605.05758 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark... | 05-08 06:04 | Success | - | |
| exp_self.20260508055913.176_20260508_055913 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260508055913.176 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 06:00 | Success | - | |
| exp_self.20260508055200.175_20260508_055201 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260508055200.175 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 05:53 | Success | - | |
| exp_self.20260508054445.174_20260508_054445 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260508054445.174 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 05:45 | Success | - | |
| exp_self.20260508053712.173_20260508_053712 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260508053712.173 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 05:38 | Success | - | |
| exp_pytrain.20260508053352.044_20260508_053353 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-08 05:34 | Success | - | |
| exp_self.20260508052705.172_20260508_052706 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260508052705.172 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 05:28 | Success | - | |
| exp_self.20260508051956.171_20260508_051957 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260508051956.171 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 05:21 | Success | - | |
| exp_self.20260508051237.170_20260508_051238 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260508051237.170 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 05:13 | Success | - | |
| exp_self.20260508050519.169_20260508_050519 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260508050519.169 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 05:06 | Success | - | |
| exp_pytrain.20260508050157.043_20260508_050158 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-08 05:03 | Success | - | |
| exp_self.20260508045751.168_20260508_045751 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260508045751.168 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 04:58 | Success | - | |
| exp_self.20260508045034.167_20260508_045035 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260508045034.167 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 04:51 | Success | - | |
| exp_self.20260508044321.166_20260508_044322 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260508044321.166 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 04:44 | Success | - | |
| exp_self.20260508043601.165_20260508_043602 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260508043601.165 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 04:37 | Success | - | |
| exp_hf_2605.04956_20260508_043221 | KernelBench-X: A Comprehensive Benchmark for Evaluating LLM-Generated GPU Kernels<br>Paper ID: hf_2605.04956 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark... | 05-08 04:33 | Success | - | |
| exp_pytrain.20260508042925.042_20260508_042926 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-08 04:30 | Success | - | |
| exp_self.20260508042242.164_20260508_042242 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260508042242.164 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 04:23 | Success | - | |
| exp_self.20260508041522.163_20260508_041522 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260508041522.163 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 04:16 | Success | - | |
| exp_self.20260508040802.162_20260508_040803 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260508040802.162 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 04:09 | Success | - | |
| exp_self.20260508040043.161_20260508_040044 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260508040043.161 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 04:01 | Success | - | |
| exp_pytrain.20260508035723.041_20260508_035723 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-08 03:58 | Success | - | |
| exp_self.20260508035039.160_20260508_035039 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260508035039.160 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 03:51 | Success | - | |
| exp_self.20260508034326.159_20260508_034326 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260508034326.159 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 03:44 | Success | - | |
| exp_self.20260508033609.158_20260508_033610 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260508033609.158 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 03:37 | Success | - | |
| exp_self.20260508032851.157_20260508_032852 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260508032851.157 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 03:29 | Success | - | |
| exp_pytrain.20260508032531.040_20260508_032532 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-08 03:26 | Success | - | |
| exp_self.20260508031842.156_20260508_031843 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260508031842.156 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 03:19 | Success | - | |
| exp_self.20260508031131.155_20260508_031131 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260508031131.155 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 03:12 | Success | - | |
| exp_self.20260508030417.154_20260508_030417 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260508030417.154 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 03:05 | Success | - | |
| exp_self.20260508025656.153_20260508_025657 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260508025656.153 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 02:58 | Success | - | |
| exp_pytrain.20260508025335.039_20260508_025336 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-08 02:54 | Success | - | |
| exp_self.20260508024825.152_20260508_024826 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260508024825.152 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 02:49 | Success | - | |
| exp_self.20260508024059.151_20260508_024059 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260508024059.151 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 02:42 | Success | - | |
| exp_self.20260508023344.150_20260508_023345 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260508023344.150 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 02:34 | Success | - | |
| exp_hf_2605.06216_20260508_022843 | TIDE: Every Layer Knows the Token Beneath the Context<br>Paper ID: hf_2605.06216 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark... | 05-08 02:29 | Success | - | |
| exp_self.20260508022546.149_20260508_022547 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260508022546.149 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 02:26 | Success | - | |
| exp_pytrain.20260508022107.038_20260508_022108 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-08 02:22 | Success | - | |
| exp_self.20260508021707.148_20260508_021707 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260508021707.148 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 02:18 | Success | - | |
| exp_self.20260508020750.147_20260508_020751 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260508020750.147 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 02:08 | Success | - | |
| exp_self.20260508020038.146_20260508_020038 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260508020038.146 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 02:01 | Success | - | |
| exp_self.20260508015330.145_20260508_015331 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260508015330.145 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-08 01:54 | Success | - | |
| exp_pytrain.20260508014859.037_20260508_014859 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-08 01:50 | Success | - | |
|
exp_self.20260508014606.144_20260508_014606
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260508014606.144 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-08 01:47 | Success | - | |
|
exp_self.20260508013847.143_20260508_013848
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260508013847.143 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-08 01:39 | Success | - | |
|
exp_self.20260508013017.142_20260508_013017
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260508013017.142 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-08 01:31 | Success | - | |
|
exp_self.20260508012305.141_20260508_012306
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260508012305.141 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-08 01:24 | Success | - | |
|
exp_cr_10.3389_frai.2026.1760246_20260508_011940
|
Language-based personality assessment from life narratives: a focus on model interpretability and efficiency
Paper ID: cr_10.3389_frai.2026.1760246 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recov...
|
05-08 01:20 | Success | - | |
|
exp_pytrain.20260508011643.036_20260508_011643
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-08 01:17 | Success | - | |
|
exp_self.20260508011134.140_20260508_011134
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260508011134.140 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-08 01:12 | Success | - | |
|
exp_self.20260508010338.139_20260508_010339
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260508010338.139 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-08 01:04 | Success | - | |
|
exp_self.20260508005624.138_20260508_005625
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260508005624.138 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-08 00:57 | Success | - | |
|
exp_self.20260508004911.137_20260508_004912
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260508004911.137 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-08 00:50 | Success | - | |
|
exp_pytrain.20260508004440.035_20260508_004440
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-08 00:45 | Success | - | |
|
exp_self.20260508004148.136_20260508_004148
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260508004148.136 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-08 00:42 | Success | - | |
|
exp_self.20260508003319.135_20260508_003319
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260508003319.135 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-08 00:34 | Success | - | |
|
exp_self.20260508002605.134_20260508_002605
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260508002605.134 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-08 00:27 | Success | - | |
|
exp_cr_10.3389_fendo.2026.1776707_20260508_002240
|
Global knowledge graph of osteoporosis biomarkers based on large language model embeddings and complex network algorithm...
Paper ID: cr_10.3389_fendo.2026.1776707 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Reco...
|
05-08 00:23 | Success | - | |
|
exp_cr_10.3389_fmed.2026.1817215_20260508_001814
|
Low-energy small language models with retrieval-augmented generation can surpass large-model performance in rheumatology
Paper ID: cr_10.3389_fmed.2026.1817215 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recov...
|
05-08 00:19 | Success | - | |
|
exp_self.20260508001513.133_20260508_001513
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260508001513.133 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-08 00:16 | Success | - | |
|
exp_pytrain.20260508001153.034_20260508_001154
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-08 00:12 | Success | - | |
|
exp_self.20260508000505.132_20260508_000506
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260508000505.132 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-08 00:06 | Success | - | |
|
exp_self.20260507235702.131_20260507_235702
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260507235702.131 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-07 23:58 | Success | - | |
|
exp_self.20260507234936.130_20260507_234936
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260507234936.130 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-07 23:50 | Success | - | |
|
exp_hf_2605.04451_20260507_234500
|
RemoteZero: Geospatial Reasoning with Zero Human Annotations
Paper ID: hf_2605.04451 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
05-07 23:46 | Success | - | |
|
exp_self.20260507234249.129_20260507_234250
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260507234249.129 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-07 23:43 | Success | - | |
|
exp_pytrain.20260507234010.033_20260507_234010
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-07 23:41 | Success | - | |
|
exp_self.20260507233311.128_20260507_233311
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260507233311.128 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-07 23:34 | Success | - | |
|
exp_self.20260507232531.127_20260507_232531
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260507232531.127 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-07 23:26 | Success | - | |
|
exp_self.20260507231755.126_20260507_231755
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260507231755.126 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-07 23:18 | Success | - | |
|
exp_self.20260507231019.125_20260507_231020
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260507231019.125 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-07 23:11 | Success | - | |
|
exp_pytrain.20260507230746.032_20260507_230746
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-07 23:08 | Success | - | |
|
exp_self.20260507230214.124_20260507_230214
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260507230214.124 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-07 23:03 | Success | - | |
|
exp_self.20260507225439.123_20260507_225439
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260507225439.123 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-07 22:55 | Success | - | |
|
exp_self.20260507224704.122_20260507_224704
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260507224704.122 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-07 22:48 | Success | - | |
|
exp_hf_2605.06222_20260507_224341
|
When to Trust Imagination: Adaptive Action Execution for World Action Models
Paper ID: hf_2605.06222 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
05-07 22:44 | Success | - | |
|
exp_self.20260507223814.121_20260507_223814
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260507223814.121 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-07 22:39 | Success | - | |
|
exp_pytrain.20260507223534.031_20260507_223534
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-07 22:36 | Success | - | |
|
exp_self.20260507223009.120_20260507_223010
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260507223009.120 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-07 22:31 | Success | - | |
|
exp_hf_2605.04647_20260507_222646
|
ReflectDrive-2: Reinforcement-Learning-Aligned Self-Editing for Discrete Diffusion Driving
Paper ID: hf_2605.04647 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
05-07 22:27 | Success | - | |
|
exp_self.20260507222118.119_20260507_222118
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260507222118.119 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-07 22:22 | Success | - | |
|
exp_hf_2605.06376_20260507_221820
|
Continuous-Time Distribution Matching for Few-Step Diffusion Distillation
Paper ID: hf_2605.06376 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
05-07 22:19 | Success | - | |
|
exp_hf_2605.06356_20260507_221416
|
SwiftI2V: Efficient High-Resolution Image-to-Video Generation via Conditional Segment-wise Generation
Paper ID: hf_2605.06356 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
05-07 22:15 | Success | - | |
|
exp_self.20260507221205.118_20260507_221206
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260507221205.118 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-07 22:13 | Success | - | |
|
exp_hf_2605.06200_20260507_220843
|
A^2TGPO: Agentic Turn-Group Policy Optimization with Adaptive Turn-level Clipping
Paper ID: hf_2605.06200 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
05-07 22:09 | Success | - | |
|
exp_2605.06664v1_20260507_220620
|
BAMI: Training-Free Bias Mitigation in GUI Grounding
Paper ID: 2605.06664v1 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark s...
|
05-07 22:07 | Success | - | |
|
exp_pytrain.20260507220406.030_20260507_220406
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-07 22:05 | Success | - | |
|
exp_self.20260507220159.117_20260507_220159
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260507220159.117 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-07 22:03 | Success | - | |
|
exp_2605.06663v1_20260507_215836
|
EMO: Pretraining Mixture of Experts for Emergent Modularity
Paper ID: 2605.06663v1 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark s...
|
05-07 21:59 | Success | - | |
|
exp_self.20260507215301.116_20260507_215302
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260507215301.116 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-07 21:54 | Success | - | |
|
exp_2605.06665v1_20260507_214944
|
UniPool: A Globally Shared Expert Pool for Mixture-of-Experts
Paper ID: 2605.06665v1 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark s...
|
05-07 21:50 | Success | - | |
|
exp_self.20260507214324.115_20260507_214325
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260507214324.115 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-07 21:44 | Success | - | |
|
exp_self.20260507213551.114_20260507_213552
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260507213551.114 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-07 21:36 | Success | - | |
|
exp_hf_2605.05922_20260507_213253
|
Think, then Score: Decoupled Reasoning and Scoring for Video Reward Modeling
Paper ID: hf_2605.05922 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
05-07 21:33 | Success | - | |
|
exp_pytrain.20260507213042.029_20260507_213042
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-07 21:31 | Success | - | |
|
exp_self.20260507212727.113_20260507_212727
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260507212727.113 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-07 21:28 | Success | - | |
|
exp_hf_2605.06665_20260507_212432
|
UniPool: A Globally Shared Expert Pool for Mixture-of-Experts
Paper ID: hf_2605.06665 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
05-07 21:25 | Success | - | |
|
exp_hf_2605.06548_20260507_212028
|
Continuous Latent Diffusion Language Model
Paper ID: hf_2605.06548 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
05-07 21:21 | Success | - | |
|
exp_self.20260507211819.112_20260507_211819
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260507211819.112 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-07 21:19 | Success | - | |
|
exp_self.20260507211038.111_20260507_211039
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260507211038.111 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-07 21:11 | Success | - | |
|
exp_2605.06225v1_20260507_210713
|
Memory Inception: Latent-Space KV Cache Manipulation for Steering LLMs
Paper ID: 2605.06225v1 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark s...
|
05-07 21:08 | Success | - | |
|
exp_self.20260507210349.110_20260507_210349
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260507210349.110 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-07 21:04 | Success | - | |
|
exp_pytrain.20260507205859.028_20260507_205859
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-07 21:00 | Success | - | |
|
exp_self.20260507205653.109_20260507_205653
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260507205653.109 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-07 20:57 | Success | - | |
|
exp_2605.06230v1_20260507_205222
|
Safactory: A Scalable Agent Factory for Trustworthy Autonomous Intelligence
Paper ID: 2605.06230v1 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark s...
|
05-07 20:53 | Success | - | |
|
exp_self.20260507205008.108_20260507_205008
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260507205008.108 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-07 20:51 | Success | - | |
|
exp_2605.06229v1_20260507_204646
|
Look Beyond Saliency: Low-Attention Guided Dual Encoding for Video Semantic Search
Paper ID: 2605.06229v1 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark s...
|
05-07 20:47 | Success | - | |
|
exp_self.20260507204322.107_20260507_204322
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260507204322.107 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-07 20:44 | Success | - | |
|
exp_self.20260507203529.106_20260507_203529
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260507203529.106 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-07 20:36 | Success | - | |
|
exp_self.20260507202758.105_20260507_202758
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260507202758.105 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-07 20:29 | Success | - | |
|
exp_pytrain.20260507202522.027_20260507_202523
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-07 20:26 | Success | - | |
|
exp_self.20260507201911.104_20260507_201912
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260507201911.104 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-07 20:20 | Success | - | |
|
exp_self.20260507201131.103_20260507_201131
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260507201131.103 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-07 20:12 | Success | - | |
|
exp_self.20260507200355.102_20260507_200356
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260507200355.102 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-07 20:04 | Success | - | |
|
exp_self.20260507195611.101_20260507_195612
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260507195611.101 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-07 19:57 | Success | - | |
|
exp_pytrain.20260507195336.026_20260507_195337
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-07 19:54 | Success | - | |
|
exp_self.20260507194808.100_20260507_194809
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260507194808.100 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-07 19:49 | Success | - | |
|
exp_self.20260507194016.099_20260507_194017
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260507194016.099 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-07 19:41 | Success | - | |
|
exp_self.20260507193226.098_20260507_193226
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260507193226.098 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-07 19:33 | Success | - | |
|
exp_self.20260507192444.097_20260507_192445
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260507192444.097 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-07 19:25 | Success | - | |
|
exp_pytrain.20260507192206.025_20260507_192207
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-07 19:23 | Success | - | |
|
exp_self.20260507191508.096_20260507_191509
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260507191508.096 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-07 19:16 | Success | - | |
| exp_self.20260507190726.095_20260507_190727 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507190726.095 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 19:08 | Success | - | |
| exp_self.20260507190034.094_20260507_190034 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507190034.094 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 19:01 | Success | - | |
| exp_self.20260507185226.093_20260507_185226 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507185226.093 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 18:53 | Success | - | |
| exp_pytrain.20260507184947.024_20260507_184948 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-07 18:50 | Success | - | |
| exp_self.20260507184252.092_20260507_184253 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507184252.092 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 18:43 | Success | - | |
| exp_self.20260507183513.091_20260507_183513 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507183513.091 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 18:36 | Success | - | |
| exp_self.20260507182731.090_20260507_182732 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507182731.090 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 18:28 | Success | - | |
| exp_self.20260507181957.089_20260507_181957 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507181957.089 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 18:21 | Success | - | |
| exp_pytrain.20260507181722.023_20260507_181723 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-07 18:18 | Success | - | |
| exp_self.20260507181151.088_20260507_181151 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507181151.088 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 18:12 | Success | - | |
| exp_self.20260507180417.087_20260507_180417 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507180417.087 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 18:05 | Success | - | |
| exp_self.20260507175643.086_20260507_175643 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507175643.086 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 17:57 | Success | - | |
| exp_self.20260507174801.085_20260507_174801 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507174801.085 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 17:49 | Success | - | |
| exp_pytrain.20260507174523.022_20260507_174523 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-07 17:46 | Success | - | |
| exp_self.20260507173816.084_20260507_173817 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507173816.084 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 17:39 | Success | - | |
| exp_self.20260507173042.083_20260507_173043 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507173042.083 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 17:31 | Success | - | |
| exp_self.20260507172300.082_20260507_172301 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507172300.082 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 17:24 | Success | - | |
| exp_self.20260507171547.081_20260507_171548 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507171547.081 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 17:16 | Success | - | |
| exp_pytrain.20260507171227.021_20260507_171227 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-07 17:13 | Success | - | |
| exp_self.20260507170539.080_20260507_170540 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507170539.080 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 17:06 | Success | - | |
| exp_self.20260507165753.079_20260507_165754 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507165753.079 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 16:58 | Success | - | |
| exp_self.20260507165016.078_20260507_165016 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507165016.078 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 16:51 | Success | - | |
| exp_self.20260507164259.077_20260507_164300 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507164259.077 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 16:44 | Success | - | |
| exp_pytrain.20260507163939.020_20260507_163939 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-07 16:40 | Success | - | |
| exp_self.20260507163254.076_20260507_163254 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507163254.076 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 16:33 | Success | - | |
| exp_self.20260507162541.075_20260507_162541 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507162541.075 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 16:26 | Success | - | |
| exp_self.20260507161834.074_20260507_161834 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507161834.074 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 16:19 | Success | - | |
| exp_self.20260507161120.073_20260507_161120 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507161120.073 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 16:12 | Success | - | |
| exp_pytrain.20260507160754.019_20260507_160755 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-07 16:08 | Success | - | |
| exp_self.20260507160110.072_20260507_160111 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507160110.072 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 16:02 | Success | - | |
| exp_self.20260507155356.071_20260507_155356 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507155356.071 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 15:55 | Success | - | |
| exp_self.20260507154645.070_20260507_154645 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507154645.070 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 15:47 | Success | - | |
| exp_self.20260507153911.069_20260507_153912 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507153911.069 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 15:40 | Success | - | |
| exp_pytrain.20260507153551.018_20260507_153552 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-07 15:36 | Success | - | |
| exp_self.20260507152907.068_20260507_152907 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507152907.068 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 15:30 | Success | - | |
| exp_self.20260507152159.067_20260507_152159 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507152159.067 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 15:23 | Success | - | |
| exp_self.20260507151442.066_20260507_151442 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507151442.066 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 15:15 | Success | - | |
| exp_self.20260507150717.065_20260507_150717 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507150717.065 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 15:08 | Success | - | |
| exp_pytrain.20260507150358.017_20260507_150358 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-07 15:05 | Success | - | |
| exp_self.20260507145710.064_20260507_145711 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507145710.064 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 14:58 | Success | - | |
| exp_self.20260507145000.063_20260507_145000 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507145000.063 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 14:51 | Success | - | |
| exp_self.20260507144241.062_20260507_144242 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507144241.062 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 14:43 | Success | - | |
| exp_self.20260507143529.061_20260507_143529 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507143529.061 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 14:36 | Success | - | |
| exp_pytrain.20260507143138.016_20260507_143139 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-07 14:32 | Success | - | |
| exp_self.20260507142452.060_20260507_142453 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507142452.060 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 14:25 | Success | - | |
| exp_self.20260507141737.059_20260507_141737 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507141737.059 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 14:18 | Success | - | |
| exp_self.20260507141031.058_20260507_141031 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507141031.058 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 14:11 | Success | - | |
| exp_self.20260507140313.057_20260507_140313 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507140313.057 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 14:04 | Success | - | |
| exp_pytrain.20260507135945.015_20260507_135946 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-07 14:00 | Success | - | |
| exp_self.20260507135303.056_20260507_135303 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507135303.056 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 13:54 | Success | - | |
| exp_self.20260507134548.055_20260507_134548 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507134548.055 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 13:46 | Success | - | |
| exp_self.20260507133839.054_20260507_133839 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507133839.054 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 13:39 | Success | - | |
| exp_self.20260507133125.053_20260507_133125 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507133125.053 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 13:32 | Success | - | |
| exp_pytrain.20260507132755.014_20260507_132756 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-07 13:28 | Success | - | |
| exp_self.20260507132116.052_20260507_132117 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507132116.052 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 13:22 | Success | - | |
| exp_self.20260507131402.051_20260507_131402 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507131402.051 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 13:15 | Success | - | |
| exp_self.20260507130648.050_20260507_130648 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507130648.050 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 13:07 | Success | - | |
| exp_self.20260507125939.049_20260507_125939 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507125939.049 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 13:00 | Success | - | |
| exp_pytrain.20260507125612.013_20260507_125612 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-07 12:57 | Success | - | |
| exp_self.20260507124932.048_20260507_124932 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507124932.048 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 12:50 | Success | - | |
| exp_self.20260507124217.047_20260507_124218 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507124217.047 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 12:43 | Success | - | |
| exp_self.20260507123505.046_20260507_123505 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507123505.046 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 12:36 | Success | - | |
| exp_self.20260507122750.045_20260507_122751 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507122750.045 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 12:28 | Success | - | |
| exp_pytrain.20260507122423.012_20260507_122423 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-07 12:25 | Success | - | |
| exp_self.20260507121740.044_20260507_121740 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507121740.044 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 12:18 | Success | - | |
| exp_self.20260507121026.043_20260507_121026 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507121026.043 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 12:11 | Success | - | |
| exp_self.20260507120244.042_20260507_120244 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507120244.042 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 12:03 | Success | - | |
| exp_self.20260507115513.041_20260507_115513 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507115513.041 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 11:56 | Success | - | |
| exp_pytrain.20260507115238.011_20260507_115239 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-07 11:53 | Success | - | |
| exp_self.20260507114717.040_20260507_114717 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507114717.040 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 11:48 | Success | - | |
| exp_self.20260507113947.039_20260507_113947 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507113947.039 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 11:40 | Success | - | |
| exp_hf_2605.02910_20260507_113626 | CreativityBench: Evaluating Agent Creative Reasoning via Affordance-Based Tool Repurposing<br>Paper ID: hf_2605.02910 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark... | 05-07 11:37 | Success | - | |
| exp_self.20260507113055.038_20260507_113056 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507113055.038 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 11:31 | Success | - | |
| exp_self.20260507112316.037_20260507_112317 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507112316.037 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 11:24 | Success | - | |
| exp_pytrain.20260507112048.010_20260507_112049 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-07 11:21 | Success | - | |
| exp_self.20260507111154.036_20260507_111155 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507111154.036 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 11:12 | Success | - | |
| exp_self.20260507110430.035_20260507_110430 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507110430.035 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 11:05 | Success | - | |
| exp_self.20260507105742.034_20260507_105742 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507105742.034 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 10:58 | Success | - | |
| exp_self.20260507105028.033_20260507_105029 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507105028.033 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 10:51 | Success | - | |
| exp_pytrain.20260507104755.009_20260507_104755 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-07 10:48 | Success | - | |
| exp_self.20260507104101.032_20260507_104102 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507104101.032 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 10:42 | Success | - | |
| exp_self.20260507103323.031_20260507_103323 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507103323.031 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 10:34 | Success | - | |
| exp_self.20260507102549.030_20260507_102550 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507102549.030 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 10:26 | Success | - | |
| exp_self.20260507101812.029_20260507_101813 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507101812.029 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 10:19 | Success | - | |
| exp_pytrain.20260507101544.008_20260507_101544 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-07 10:16 | Success | - | |
| exp_self.20260507100837.028_20260507_100837 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507100837.028 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 10:09 | Success | - | |
| exp_self.20260507100104.027_20260507_100104 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507100104.027 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 10:02 | Success | - | |
| exp_self.20260507095323.026_20260507_095323 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507095323.026 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 09:54 | Success | - | |
| exp_self.20260507094547.025_20260507_094547 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507094547.025 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 09:46 | Success | - | |
| exp_pytrain.20260507094318.007_20260507_094318 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-07 09:44 | Success | - | |
| exp_self.20260507093840.024_20260507_093841 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507093840.024 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 09:39 | Success | - | |
| exp_hf_2604.27393_20260507_093538 | MiniCPM-o 4.5: Towards Real-Time Full-Duplex Omni-Modal Interaction<br>Paper ID: hf_2604.27393 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark... | 05-07 09:36 | Success | - | |
| exp_self.20260507092937.023_20260507_092937 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507092937.023 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 09:30 | Success | - | |
| exp_self.20260507092143.022_20260507_092143 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507092143.022 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 09:22 | Success | - | |
| exp_self.20260507091349.021_20260507_091349 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507091349.021 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 09:14 | Success | - | |
| exp_pytrain.20260507091119.006_20260507_091119 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-07 09:12 | Success | - | |
| exp_self.20260507090536.020_20260507_090536 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507090536.020 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 09:06 | Success | - | |
| exp_self.20260507085755.019_20260507_085756 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507085755.019 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 08:58 | Success | - | |
| exp_self.20260507085011.018_20260507_085011 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507085011.018 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 08:51 | Success | - | |
| exp_self.20260507084228.017_20260507_084228 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507084228.017 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 08:43 | Success | - | |
| exp_pytrain.20260507083958.005_20260507_083959 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-07 08:41 | Success | - | |
| exp_self.20260507083358.016_20260507_083358 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507083358.016 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 08:35 | Success | - | |
| exp_self.20260507082612.015_20260507_082612 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507082612.015 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 08:27 | Success | - | |
| exp_self.20260507081830.014_20260507_081830 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507081830.014 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 08:19 | Success | - | |
| exp_self.20260507081052.013_20260507_081052 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507081052.013 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 08:11 | Success | - | |
| exp_pytrain.20260507080816.004_20260507_080816 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-07 08:09 | Success | - | |
| exp_self.20260507080105.012_20260507_080105 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507080105.012 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 08:02 | Success | - | |
| exp_self.20260507075326.011_20260507_075326 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507075326.011 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 07:54 | Success | - | |
| exp_self.20260507074550.010_20260507_074551 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507074550.010 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 07:46 | Success | - | |
| exp_self.20260507073818.009_20260507_073819 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507073818.009 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 07:39 | Success | - | |
| exp_pytrain.20260507073544.003_20260507_073545 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-07 07:36 | Success | - | |
| exp_self.20260507073128.008_20260507_073129 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507073128.008 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 07:32 | Success | - | |
| exp_self.20260507072351.007_20260507_072351 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507072351.007 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 07:24 | Success | - | |
| exp_self.20260507071610.006_20260507_071610 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507071610.006 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 07:17 | Success | - | |
| exp_cr_10.3390_app16104584_20260507_071322 | Assessing Stand-to-Sit Kinematics via mmWave Radar: A Real-to-Sim Robust Bidirectional State-Space Model<br>Paper ID: cr_10.3390_app16104584 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered b... | 05-07 07:14 | Success | - | |
| exp_self.20260507070604.005_20260507_070604 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507070604.005 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 07:07 | Success | - | |
| exp_pytrain.20260507070329.002_20260507_070329 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-07 07:04 | Success | - | |
| exp_self.20260507065759.004_20260507_065759 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507065759.004 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 06:59 | Success | - | |
| exp_self.20260507064938.003_20260507_064938 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507064938.003 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 06:50 | Success | - | |
| exp_self.20260507064159.002_20260507_064200 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507064159.002 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 06:43 | Success | - | |
| exp_self.20260507063425.001_20260507_063426 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507063425.001 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 06:35 | Success | - | |
| exp_pytrain.20260507063157.001_20260507_063157 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-07 06:32 | Success | - | |
| exp_self.20260507062415.1506_20260507_062415 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507062415.1506 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 06:25 | Success | - | |
| exp_self.20260507061643.1505_20260507_061643 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507061643.1505 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 06:17 | Success | - | |
| exp_self.20260507060906.1504_20260507_060906 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507060906.1504 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 06:10 | Success | - | |
| exp_pytrain.20260507060628.375_20260507_060628 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-07 06:07 | Success | - | |
| exp_self.20260507055929.1503_20260507_055930 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507055929.1503 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 06:00 | Success | - | |
| exp_self.20260507055148.1502_20260507_055149 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507055148.1502 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 05:52 | Success | - | |
| exp_self.20260507054416.1501_20260507_054416 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507054416.1501 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 05:45 | Success | - | |
| exp_self.20260507053723.1500_20260507_053724 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507053723.1500 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 05:38 | Success | - | |
| exp_pytrain.20260507053455.374_20260507_053455 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-07 05:35 | Success | - | |
| exp_self.20260507052858.1499_20260507_052858 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507052858.1499 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 05:30 | Success | - | |
| exp_self.20260507052121.1498_20260507_052121 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507052121.1498 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 05:22 | Success | - | |
| exp_self.20260507051336.1497_20260507_051336 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507051336.1497 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 05:14 | Success | - | |
| exp_self.20260507050602.1496_20260507_050603 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507050602.1496 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 05:07 | Success | - | |
| exp_pytrain.20260507050326.373_20260507_050327 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-07 05:04 | Success | - | |
| exp_self.20260507045756.1495_20260507_045756 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507045756.1495 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 04:58 | Success | - | |
| exp_hf_2605.03314_20260507_045431 | When to Think, When to Speak: Learning Disclosure Policies for LLM Reasoning<br>Paper ID: hf_2605.03314 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark... | 05-07 04:55 | Success | - | |
| exp_self.20260507045009.1494_20260507_045009 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507045009.1494 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 04:51 | Success | - | |
| exp_self.20260507044236.1493_20260507_044236 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507044236.1493 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 04:43 | Success | - | |
| exp_self.20260507043436.1492_20260507_043437 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507043436.1492 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 04:35 | Success | - | |
| exp_pytrain.20260507043207.372_20260507_043207 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-07 04:33 | Success | - | |
| exp_self.20260507042608.1491_20260507_042608 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507042608.1491 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 04:27 | Success | - | |
| exp_self.20260507041915.1490_20260507_041916 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507041915.1490 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 04:20 | Success | - | |
| exp_self.20260507041025.1489_20260507_041025 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507041025.1489 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 04:11 | Success | - | |
| exp_self.20260507040248.1488_20260507_040249 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507040248.1488 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 04:03 | Success | - | |
| exp_pytrain.20260507040012.371_20260507_040012 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-07 04:01 | Success | - | |
| exp_self.20260507035307.1487_20260507_035307 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507035307.1487 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 03:54 | Success | - | |
| exp_self.20260507034526.1486_20260507_034526 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507034526.1486 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 03:46 | Success | - | |
| exp_self.20260507033747.1485_20260507_033747 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507033747.1485 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 03:38 | Success | - | |
| exp_self.20260507033008.1484_20260507_033009 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507033008.1484 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 03:31 | Success | - | |
| exp_pytrain.20260507032735.370_20260507_032735 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-07 03:28 | Success | - | |
| exp_self.20260507032139.1483_20260507_032139 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507032139.1483 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 03:22 | Success | - | |
| exp_self.20260507031406.1482_20260507_031406 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507031406.1482 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 03:15 | Success | - | |
| exp_self.20260507030556.1481_20260507_030557 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507030556.1481 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 03:06 | Success | - | |
| exp_self.20260507025817.1480_20260507_025818 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507025817.1480 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 02:59 | Success | - | |
| exp_pytrain.20260507025542.369_20260507_025542 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-07 02:56 | Success | - | |
| exp_self.20260507024939.1479_20260507_024940 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507024939.1479 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 02:50 | Success | - | |
| exp_self.20260507024205.1478_20260507_024205 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507024205.1478 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 02:43 | Success | - | |
| exp_self.20260507023432.1477_20260507_023432 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507023432.1477 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 02:35 | Success | - | |
| exp_self.20260507022658.1476_20260507_022658 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507022658.1476 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 02:28 | Success | - | |
| exp_pytrain.20260507022422.368_20260507_022422 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-07 02:25 | Success | - | |
| exp_self.20260507021823.1475_20260507_021823 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507021823.1475 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 02:19 | Success | - | |
| exp_self.20260507021030.1474_20260507_021030 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507021030.1474 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 02:11 | Success | - | |
| exp_self.20260507020252.1473_20260507_020253 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507020252.1473 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 02:03 | Success | - | |
| exp_self.20260507015523.1472_20260507_015524 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507015523.1472 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 01:56 | Success | - | |
| exp_pytrain.20260507015255.367_20260507_015255 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-07 01:53 | Success | - | |
| exp_self.20260507014550.1471_20260507_014551 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507014550.1471 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 01:46 | Success | - | |
| exp_self.20260507013813.1470_20260507_013813 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507013813.1470 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 01:39 | Success | - | |
| exp_self.20260507013036.1469_20260507_013036 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507013036.1469 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 01:31 | Success | - | |
| exp_self.20260507012258.1468_20260507_012258 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507012258.1468 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 01:24 | Success | - | |
| exp_pytrain.20260507012030.366_20260507_012031 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-07 01:21 | Success | - | |
| exp_self.20260507011325.1467_20260507_011326 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507011325.1467 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 01:14 | Success | - | |
| exp_self.20260507010550.1466_20260507_010551 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507010550.1466 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 01:06 | Success | - | |
| exp_self.20260507005850.1465_20260507_005850 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507005850.1465 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 00:59 | Success | - | |
| exp_self.20260507005118.1464_20260507_005118 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507005118.1464 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 00:52 | Success | - | |
| exp_pytrain.20260507004842.365_20260507_004842 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-07 00:49 | Success | - | |
| exp_self.20260507004140.1463_20260507_004140 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507004140.1463 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 00:42 | Success | - | |
| exp_self.20260507003358.1462_20260507_003358 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507003358.1462 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 00:35 | Success | - | |
| exp_self.20260507002629.1461_20260507_002629 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507002629.1461 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 00:27 | Success | - | |
| exp_self.20260507001858.1460_20260507_001859 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507001858.1460 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 00:20 | Success | - | |
| exp_pytrain.20260507001624.364_20260507_001625 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-07 00:17 | Success | - | |
| exp_hf_2605.05185_20260507_001338 | OpenSearch-VL: An Open Recipe for Frontier Multimodal Search Agents<br>Paper ID: hf_2605.05185 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark... | 05-07 00:14 | Success | - | |
| exp_self.20260507001024.1459_20260507_001024 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507001024.1459 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 00:11 | Success | - | |
| exp_self.20260507000243.1458_20260507_000243 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260507000243.1458 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-07 00:03 | Success | - | |
| exp_self.20260506235508.1457_20260506_235508 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260506235508.1457 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 23:56 | Success | - | |
| exp_self.20260506234738.1456_20260506_234739 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260506234738.1456 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 23:48 | Success | - | |
| exp_pytrain.20260506234503.363_20260506_234503 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-06 23:46 | Success | - | |
| exp_self.20260506233940.1455_20260506_233940 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260506233940.1455 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 23:40 | Success | - | |
| exp_self.20260506233208.1454_20260506_233208 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260506233208.1454 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 23:33 | Success | - | |
| exp_hf_2605.03849_20260506_232848 | Stream-R1: Reliability-Perplexity Aware Reward Distillation for Streaming Video Generation<br>Paper ID: hf_2605.03849 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark... | 05-06 23:29 | Success | - | |
| exp_self.20260506232315.1453_20260506_232315 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260506232315.1453 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 23:24 | Success | - | |
| exp_hf_2605.04569_20260506_231736 | Lightning Unified Video Editing via In-Context Sparse Attention<br>Paper ID: hf_2605.04569 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark... | 05-06 23:18 | Success | - | |
| exp_self.20260506231531.1452_20260506_231531 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260506231531.1452 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 23:16 | Success | - | |
| exp_pytrain.20260506231255.362_20260506_231255 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-06 23:13 | Success | - | |
| exp_self.20260506230837.1451_20260506_230837 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260506230837.1451 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 23:09 | Success | - | |
| exp_hf_2605.03269_20260506_230544 | RLDX-1 Technical Report<br>Paper ID: hf_2605.03269 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark... | 05-06 23:06 | Success | - | |
| exp_self.20260506225816.1450_20260506_225816 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260506225816.1450 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 22:59 | Success | - | |
| exp_self.20260506225047.1449_20260506_225047 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260506225047.1449 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 22:51 | Success | - | |
| exp_self.20260506224318.1448_20260506_224318 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260506224318.1448 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 22:44 | Success | - | |
| exp_pytrain.20260506224043.361_20260506_224043 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-06 22:41 | Success | - | |
| exp_self.20260506223626.1447_20260506_223626 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260506223626.1447 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 22:37 | Success | - | |
| exp_self.20260506222850.1446_20260506_222850 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260506222850.1446 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 22:29 | Success | - | |
| exp_self.20260506222117.1445_20260506_222117 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260506222117.1445 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 22:22 | Success | - | |
| exp_self.20260506221340.1444_20260506_221340 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260506221340.1444 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 22:14 | Success | - | |
| exp_pytrain.20260506220829.360_20260506_220829 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-06 22:09 | Success | - | |
| exp_self.20260506220602.1443_20260506_220602 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260506220602.1443 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 22:07 | Success | - | |
| exp_2605.05204v1_20260506_220233 | D-OPSD: On-Policy Self-Distillation for Continuously Tuning Step-Distilled Diffusion Models<br>Paper ID: 2605.05204v1 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark s... | 05-06 22:03 | Success | - | |
| exp_self.20260506215754.1442_20260506_215754 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260506215754.1442 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 21:58 | Success | - | |
| exp_self.20260506214951.1441_20260506_214951 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260506214951.1441 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 21:50 | Success | - | |
| exp_self.20260506214156.1440_20260506_214156 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260506214156.1440 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 21:42 | Success | - | |
| exp_pytrain.20260506213650.359_20260506_213650 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-06 21:37 | Success | - | |
| exp_self.20260506213427.1439_20260506_213427 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260506213427.1439 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 21:35 | Success | - | |
| exp_self.20260506212628.1438_20260506_212628 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260506212628.1438 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 21:27 | Success | - | |
| exp_hf_2605.05204_20260506_212255 | D-OPSD: On-Policy Self-Distillation for Continuously Tuning Step-Distilled Diffusion Models<br>Paper ID: hf_2605.05204 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark... | 05-06 21:23 | Success | - | |
| exp_self.20260506211706.1437_20260506_211706 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260506211706.1437 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 21:18 | Success | - | |
| exp_2605.05090v1_20260506_211401 | Automatically Finding and Validating Unexpected Side-Effects of Interventions on Language Models<br>Paper ID: 2605.05090v1 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark s... | 05-06 21:15 | Success | - | |
| exp_self.20260506210742.1436_20260506_210743 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260506210742.1436 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 21:08 | Success | - | |
| exp_pytrain.20260506210454.358_20260506_210455 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-06 21:05 | Success | - | |
| exp_self.20260506210014.1435_20260506_210015 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260506210014.1435 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 21:01 | Success | - | |
| exp_2605.05096v1_20260506_205535 | CapsID: Soft-Routed Variable-Length Semantic IDs for Generative Recommendation<br>Paper ID: 2605.05096v1 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark s... | 05-06 20:56 | Success | - | |
| exp_self.20260506205310.1434_20260506_205310 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260506205310.1434 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 20:54 | Success | - | |
| exp_self.20260506204505.1433_20260506_204505 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260506204505.1433 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 20:46 | Success | - | |
| exp_self.20260506203708.1432_20260506_203709 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260506203708.1432 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 20:38 | Success | - | |
| exp_pytrain.20260506203307.357_20260506_203308 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-06 20:34 | Success | - | |
| exp_self.20260506202939.1431_20260506_202940 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260506202939.1431 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 20:30 | Success | - | |
| exp_self.20260506202142.1430_20260506_202142 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260506202142.1430 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 20:22 | Success | - | |
| exp_self.20260506201336.1429_20260506_201337 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260506201336.1429 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 20:14 | Success | - | |
| exp_gh_is-leeroy-jenkins_Buddy_20260506_201034 | is-leeroy-jenkins/Buddy<br>Paper ID: gh_is-leeroy-jenkins_Buddy - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recover... | 05-06 20:11 | Success | - | |
| exp_self.20260506200411.1428_20260506_200411 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260506200411.1428 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 20:05 | Success | - | |
| exp_pytrain.20260506200124.356_20260506_200124 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-06 20:02 | Success | - | |
| exp_self.20260506195510.1427_20260506_195510 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260506195510.1427 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 19:56 | Success | - | |
| exp_gh_ThoughtTimeMachine_UFCE-Streaming_20260506_195143 | ThoughtTimeMachine/UFCE-Streaming<br>Paper ID: gh_ThoughtTimeMachine_UFCE-Streaming - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signa... | 05-06 19:52 | Success | - | |
| exp_self.20260506194802.1426_20260506_194802 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260506194802.1426 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 19:49 | Success | - | |
| exp_self.20260506193957.1425_20260506_193958 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260506193957.1425 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 19:41 | Success | - | |
| exp_self.20260506193204.1424_20260506_193204 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260506193204.1424 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 19:33 | Success | - | |
| exp_pytrain.20260506192900.355_20260506_192901 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-06 19:30 | Success | - | |
| exp_self.20260506192424.1423_20260506_192424 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260506192424.1423 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 19:25 | Success | - | |
| exp_self.20260506191633.1422_20260506_191633 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260506191633.1422 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 19:17 | Success | - | |
| exp_gh_deepspeedai_DeepSpeed_20260506_191307 | deepspeedai/DeepSpeed<br>Paper ID: gh_deepspeedai_DeepSpeed - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 19:14 | Success | - | |
| exp_self.20260506190822.1421_20260506_190822 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260506190822.1421 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 19:09 | Success | - | |
| exp_gh_im-anishraj_BhojRAG_20260506_190456 | im-anishraj/BhojRAG<br>Paper ID: gh_im-anishraj_BhojRAG - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered b... | 05-06 19:05 | Success | - | |
| exp_self.20260506190012.1420_20260506_190012 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260506190012.1420 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 19:01 | Success | - | |
| exp_pytrain.20260506185721.354_20260506_185721 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-06 18:58 | Success | - | |
| exp_self.20260506185137.1419_20260506_185137 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260506185137.1419 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 18:52 | Success | - | |
| exp_self.20260506184341.1418_20260506_184342 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260506184341.1418 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 18:44 | Success | - | |
| exp_self.20260506183551.1417_20260506_183552 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260506183551.1417 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 18:36 | Success | - | |
| exp_self.20260506182801.1416_20260506_182801 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260506182801.1416 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 18:29 | Success | - | |
| exp_pytrain.20260506182511.353_20260506_182511 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-06 18:26 | Success | - | |
| exp_self.20260506181924.1415_20260506_181924 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260506181924.1415 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 18:20 | Success | - | |
| exp_self.20260506181129.1414_20260506_181129 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260506181129.1414 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 18:12 | Success | - | |
| exp_self.20260506180333.1413_20260506_180334 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260506180333.1413 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 18:04 | Success | - | |
| exp_self.20260506175541.1412_20260506_175542 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260506175541.1412 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 17:56 | Success | - | |
| exp_pytrain.20260506175246.352_20260506_175246 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-06 17:53 | Success | - | |
| exp_self.20260506174702.1411_20260506_174702 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260506174702.1411 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 17:48 | Success | - | |
| exp_self.20260506173906.1410_20260506_173907 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260506173906.1410 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 17:40 | Success | - | |
| exp_self.20260506173111.1409_20260506_173112 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260506173111.1409 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 17:32 | Success | - | |
| exp_self.20260506172315.1408_20260506_172315 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260506172315.1408 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 17:24 | Success | - | |
| exp_pytrain.20260506172028.351_20260506_172028 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-06 17:21 | Success | - | |
| exp_self.20260506171414.1407_20260506_171415 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260506171414.1407 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 17:15 | Success | - | |
| exp_self.20260506170619.1406_20260506_170620 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260506170619.1406 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 17:07 | Success | - | |
| exp_self.20260506165827.1405_20260506_165828 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260506165827.1405 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 16:59 | Success | - | |
| exp_self.20260506165035.1404_20260506_165035 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260506165035.1404 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 16:51 | Success | - | |
| exp_pytrain.20260506164738.350_20260506_164738 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-06 16:48 | Success | - | |
| exp_self.20260506164157.1403_20260506_164157 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260506164157.1403 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 16:42 | Success | - | |
| exp_self.20260506163402.1402_20260506_163402 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260506163402.1402 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 16:35 | Success | - | |
| exp_self.20260506162603.1401_20260506_162603 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260506162603.1401 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 16:27 | Success | - | |
| exp_self.20260506161806.1400_20260506_161806 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260506161806.1400 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 16:19 | Success | - | |
| exp_pytrain.20260506161515.349_20260506_161516 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-06 16:16 | Success | - | |
| exp_self.20260506161037.1399_20260506_161037 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260506161037.1399 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 16:11 | Success | - | |
| exp_self.20260506160242.1398_20260506_160242 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260506160242.1398 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 16:03 | Success | - | |
| exp_self.20260506155444.1397_20260506_155444 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260506155444.1397 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 15:55 | Success | - | |
| exp_self.20260506154638.1396_20260506_154638 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260506154638.1396 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 15:47 | Success | - | |
| exp_pytrain.20260506154339.348_20260506_154339 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-06 15:44 | Success | - | |
| exp_self.20260506153554.1395_20260506_153555 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260506153554.1395 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 15:36 | Success | - | |
| exp_self.20260506152858.1394_20260506_152858 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260506152858.1394 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 15:30 | Success | - | |
| exp_self.20260506152201.1393_20260506_152201 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260506152201.1393 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 15:23 | Success | - | |
| exp_self.20260506151346.1392_20260506_151347 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260506151346.1392 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 15:14 | Success | - | |
| exp_pytrain.20260506151043.347_20260506_151044 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-06 15:11 | Success | - | |
| exp_self.20260506150548.1391_20260506_150549 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260506150548.1391 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 15:06 | Success | - | |
| exp_self.20260506145736.1390_20260506_145737 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260506145736.1390 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 14:58 | Success | - | |
| exp_self.20260506144917.1389_20260506_144918 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260506144917.1389 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 14:50 | Success | - | |
| exp_self.20260506144118.1388_20260506_144118 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260506144118.1388 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 14:42 | Success | - | |
| exp_pytrain.20260506143829.346_20260506_143830 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-06 14:39 | Success | - | |
| exp_self.20260506143343.1387_20260506_143343 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260506143343.1387 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 14:34 | Success | - | |
| exp_self.20260506142557.1386_20260506_142558 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260506142557.1386 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 14:27 | Success | - | |
| exp_self.20260506141756.1385_20260506_141757 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260506141756.1385 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 14:19 | Success | - | |
| exp_self.20260506140953.1384_20260506_140954 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260506140953.1384 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 14:10 | Success | - | |
| exp_pytrain.20260506140648.345_20260506_140649 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-06 14:07 | Success | - | |
| exp_self.20260506140202.1383_20260506_140202 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260506140202.1383 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 14:03 | Success | - | |
| exp_self.20260506135352.1382_20260506_135352 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260506135352.1382 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 13:54 | Success | - | |
| exp_self.20260506134544.1381_20260506_134545 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260506134544.1381 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 13:46 | Success | - | |
| exp_self.20260506133740.1380_20260506_133740 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260506133740.1380 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 13:38 | Success | - | |
| exp_pytrain.20260506133436.344_20260506_133436 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-06 13:35 | Success | - | |
| exp_self.20260506132951.1379_20260506_132951 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260506132951.1379 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 13:30 | Success | - | |
| exp_self.20260506132146.1378_20260506_132147 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260506132146.1378 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 13:22 | Success | - | |
| exp_self.20260506131341.1377_20260506_131341 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260506131341.1377 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 13:14 | Success | - | |
| exp_self.20260506130534.1376_20260506_130535 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260506130534.1376 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 13:06 | Success | - | |
| exp_pytrain.20260506130229.343_20260506_130229 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-06 13:03 | Success | - | |
| exp_self.20260506125636.1375_20260506_125636 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260506125636.1375 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 12:57 | Success | - | |
| exp_self.20260506124942.1374_20260506_124942 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260506124942.1374 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 12:50 | Success | - | |
| exp_self.20260506124134.1373_20260506_124134 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260506124134.1373 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 12:42 | Success | - | |
| exp_self.20260506123325.1372_20260506_123326 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260506123325.1372 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 12:34 | Success | - | |
| exp_pytrain.20260506123020.342_20260506_123021 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-06 12:31 | Success | - | |
| exp_hf_2605.02913_20260506_122724 | Generate, Filter, Control, Replay: A Comprehensive Survey of Rollout Strategies for LLM Reinforcement Learning - Paper ID: hf_2605.02913 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark... | 05-06 12:28 | Success | - | |
| exp_self.20260506122243.1371_20260506_122244 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260506122243.1371 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 12:23 | Success | - | |
| exp_self.20260506121448.1370_20260506_121449 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260506121448.1370 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 12:15 | Success | - | |
| exp_self.20260506120656.1369_20260506_120656 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260506120656.1369 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 12:07 | Success | - | |
| exp_self.20260506115901.1368_20260506_115901 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260506115901.1368 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 12:00 | Success | - | |
| exp_pytrain.20260506115614.341_20260506_115615 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-06 11:57 | Success | - | |
| exp_self.20260506114901.1367_20260506_114901 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260506114901.1367 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 11:50 | Success | - | |
| exp_self.20260506114120.1366_20260506_114120 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260506114120.1366 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 11:42 | Success | - | |
| exp_self.20260506113344.1365_20260506_113344 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260506113344.1365 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 11:34 | Success | - | |
| exp_self.20260506112621.1364_20260506_112622 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260506112621.1364 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 11:27 | Success | - | |
| exp_pytrain.20260506112352.340_20260506_112352 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-06 11:24 | Success | - | |
| exp_self.20260506111730.1363_20260506_111730 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260506111730.1363 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 11:18 | Success | - | |
| exp_self.20260506110953.1362_20260506_110954 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260506110953.1362 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 11:10 | Success | - | |
| exp_self.20260506110211.1361_20260506_110212 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260506110211.1361 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 11:03 | Success | - | |
| exp_self.20260506105422.1360_20260506_105422 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260506105422.1360 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 10:55 | Success | - | |
| exp_pytrain.20260506105145.339_20260506_105146 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-06 10:52 | Success | - | |
| exp_self.20260506104615.1359_20260506_104615 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260506104615.1359 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 10:47 | Success | - | |
| exp_self.20260506103815.1358_20260506_103816 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260506103815.1358 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 10:39 | Success | - | |
| exp_self.20260506103029.1357_20260506_103030 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260506103029.1357 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 10:31 | Success | - | |
| exp_self.20260506102252.1356_20260506_102253 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260506102252.1356 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 10:23 | Success | - | |
| exp_pytrain.20260506102024.338_20260506_102025 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-06 10:21 | Success | - | |
| exp_self.20260506101312.1355_20260506_101312 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260506101312.1355 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 10:14 | Success | - | |
| exp_self.20260506100534.1354_20260506_100535 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260506100534.1354 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 10:06 | Success | - | |
| exp_self.20260506095751.1353_20260506_095752 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260506095751.1353 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 09:58 | Success | - | |
| exp_self.20260506095012.1352_20260506_095012 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260506095012.1352 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 09:51 | Success | - | |
| exp_pytrain.20260506094743.337_20260506_094743 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-06 09:48 | Success | - | |
| exp_self.20260506094159.1351_20260506_094200 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260506094159.1351 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 09:43 | Success | - | |
| exp_self.20260506093422.1350_20260506_093422 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260506093422.1350 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 09:35 | Success | - | |
| exp_self.20260506092636.1349_20260506_092637 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260506092636.1349 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 09:27 | Success | - | |
| exp_self.20260506091853.1348_20260506_091853 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260506091853.1348 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 09:19 | Success | - | |
| exp_pytrain.20260506091624.336_20260506_091624 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-06 09:17 | Success | - | |
| exp_self.20260506091054.1347_20260506_091054 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260506091054.1347 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 09:11 | Success | - | |
| exp_self.20260506090314.1346_20260506_090314 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260506090314.1346 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 09:04 | Success | - | |
| exp_self.20260506085533.1345_20260506_085533 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260506085533.1345 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 08:56 | Success | - | |
| exp_self.20260506084742.1344_20260506_084743 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260506084742.1344 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 08:48 | Success | - | |
| exp_pytrain.20260506084502.335_20260506_084502 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-06 08:46 | Success | - | |
| exp_self.20260506083928.1343_20260506_083929 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260506083928.1343 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 08:40 | Success | - | |
| exp_self.20260506083132.1342_20260506_083133 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260506083132.1342 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 08:32 | Success | - | |
| exp_self.20260506082355.1341_20260506_082355 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260506082355.1341 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 08:24 | Success | - | |
| exp_self.20260506081616.1340_20260506_081616 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260506081616.1340 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 08:17 | Success | - | |
| exp_pytrain.20260506081342.334_20260506_081342 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-06 08:14 | Success | - | |
| exp_self.20260506080639.1339_20260506_080639 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260506080639.1339 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 08:07 | Success | - | |
| exp_self.20260506075857.1338_20260506_075857 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260506075857.1338 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 07:59 | Success | - | |
| exp_self.20260506075116.1337_20260506_075116 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260506075116.1337 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 07:52 | Success | - | |
| exp_self.20260506074335.1336_20260506_074335 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260506074335.1336 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 07:44 | Success | - | |
| exp_pytrain.20260506074106.333_20260506_074106 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-06 07:42 | Success | - | |
| exp_self.20260506073355.1335_20260506_073355 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260506073355.1335 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 07:34 | Success | - | |
| exp_self.20260506072620.1334_20260506_072620 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260506072620.1334 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 07:27 | Success | - | |
| exp_self.20260506071834.1333_20260506_071834 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260506071834.1333 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 07:19 | Success | - | |
| exp_self.20260506071049.1332_20260506_071050 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260506071049.1332 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 07:11 | Success | - | |
| exp_pytrain.20260506070822.332_20260506_070822 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-06 07:09 | Success | - | |
| exp_self.20260506070111.1331_20260506_070111 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260506070111.1331 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 07:02 | Success | - | |
| exp_self.20260506065331.1330_20260506_065332 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260506065331.1330 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 06:54 | Success | - | |
| exp_self.20260506064551.1329_20260506_064551 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260506064551.1329 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 06:46 | Success | - | |
| exp_self.20260506063806.1328_20260506_063806 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260506063806.1328 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 06:39 | Success | - | |
| exp_pytrain.20260506063538.331_20260506_063538 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-06 06:36 | Success | - | |
| exp_hf_2605.02904_20260506_063138 | StateSMix: Online Lossless Compression via Mamba State Space Models and Sparse N-gram Context Mixing - Paper ID: hf_2605.02904 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark... | 05-06 06:32 | Success | - | |
| exp_self.20260506062933.1327_20260506_062933 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260506062933.1327 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 06:30 | Success | - | |
| exp_self.20260506062154.1326_20260506_062155 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260506062154.1326 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 06:22 | Success | - | |
| exp_self.20260506061412.1325_20260506_061413 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260506061412.1325 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 06:15 | Success | - | |
| exp_self.20260506060632.1324_20260506_060632 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260506060632.1324 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 06:07 | Success | - | |
| exp_pytrain.20260506060404.330_20260506_060404 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-06 06:05 | Success | - | |
| exp_self.20260506055656.1323_20260506_055657 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260506055656.1323 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 05:57 | Success | - | |
| exp_self.20260506054921.1322_20260506_054921 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260506054921.1322 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 05:50 | Success | - | |
| exp_self.20260506054146.1321_20260506_054146 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260506054146.1321 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 05:42 | Success | - | |
| exp_self.20260506053353.1320_20260506_053353 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260506053353.1320 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 05:34 | Success | - | |
|
exp_pytrain.20260506053124.329_20260506_053124
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-06 05:32 | Success | - | |
|
exp_self.20260506052416.1319_20260506_052417
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260506052416.1319 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-06 05:25 | Success | - | |
|
exp_self.20260506051643.1318_20260506_051643
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260506051643.1318 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-06 05:17 | Success | - | |
|
exp_self.20260506050907.1317_20260506_050907
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260506050907.1317 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-06 05:10 | Success | - | |
| exp_self.20260506050122.1316_20260506_050123 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260506050122.1316 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 05:02 | Success | - | |
| exp_pytrain.20260506045850.328_20260506_045851 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-06 04:59 | Success | - | |
| exp_self.20260506045253.1315_20260506_045254 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260506045253.1315 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 04:53 | Success | - | |
| exp_self.20260506044509.1314_20260506_044510 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260506044509.1314 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 04:46 | Success | - | |
| exp_self.20260506043724.1313_20260506_043724 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260506043724.1313 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 04:38 | Success | - | |
| exp_self.20260506042942.1312_20260506_042942 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260506042942.1312 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 04:30 | Success | - | |
| exp_pytrain.20260506042714.327_20260506_042714 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-06 04:28 | Success | - | |
| exp_self.20260506042130.1311_20260506_042131 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260506042130.1311 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 04:22 | Success | - | |
| exp_self.20260506041350.1310_20260506_041351 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260506041350.1310 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 04:14 | Success | - | |
| exp_self.20260506040558.1309_20260506_040558 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260506040558.1309 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 04:07 | Success | - | |
| exp_self.20260506035818.1308_20260506_035818 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260506035818.1308 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 03:59 | Success | - | |
| exp_pytrain.20260506035550.326_20260506_035550 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-06 03:56 | Success | - | |
| exp_self.20260506034944.1307_20260506_034945 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260506034944.1307 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 03:50 | Success | - | |
| exp_self.20260506034200.1306_20260506_034201 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260506034200.1306 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 03:43 | Success | - | |
| exp_self.20260506033417.1305_20260506_033417 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260506033417.1305 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 03:35 | Success | - | |
| exp_self.20260506032632.1304_20260506_032632 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260506032632.1304 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 03:27 | Success | - | |
| exp_pytrain.20260506032349.325_20260506_032349 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-06 03:24 | Success | - | |
| exp_self.20260506031817.1303_20260506_031817 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260506031817.1303 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 03:19 | Success | - | |
| exp_self.20260506031109.1302_20260506_031109 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260506031109.1302 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 03:12 | Success | - | |
| exp_self.20260506030304.1301_20260506_030305 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260506030304.1301 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 03:04 | Success | - | |
| exp_self.20260506025538.1300_20260506_025539 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260506025538.1300 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 02:56 | Success | - | |
| exp_pytrain.20260506025209.324_20260506_025209 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-06 02:53 | Success | - | |
| exp_self.20260506024648.1299_20260506_024648 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260506024648.1299 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 02:47 | Success | - | |
| exp_hf_2605.00891_20260506_024255 | X2SAM: Any Segmentation in Images and Videos<br>Paper ID: hf_2605.00891 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark... | 05-06 02:43 | Success | - | |
| exp_self.20260506023848.1298_20260506_023848 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260506023848.1298 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 02:39 | Success | - | |
| exp_self.20260506023121.1297_20260506_023121 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260506023121.1297 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 02:32 | Success | - | |
| exp_self.20260506022356.1296_20260506_022357 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260506022356.1296 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 02:25 | Success | - | |
| exp_pytrain.20260506022030.323_20260506_022030 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-06 02:21 | Success | - | |
| exp_self.20260506021521.1295_20260506_021521 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260506021521.1295 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 02:16 | Success | - | |
| exp_self.20260506020756.1294_20260506_020756 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260506020756.1294 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 02:09 | Success | - | |
| exp_self.20260506015842.1293_20260506_015842 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260506015842.1293 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 01:59 | Success | - | |
| exp_self.20260506015121.1292_20260506_015121 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260506015121.1292 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 01:52 | Success | - | |
| exp_pytrain.20260506014801.322_20260506_014801 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-06 01:49 | Success | - | |
| exp_hf_2605.01371_20260506_014330 | ESARBench: A Benchmark for Agentic UAV Embodied Search and Rescue<br>Paper ID: hf_2605.01371 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark... | 05-06 01:44 | Success | - | |
| exp_self.20260506014034.1291_20260506_014034 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260506014034.1291 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 01:41 | Success | - | |
| exp_self.20260506013314.1290_20260506_013315 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260506013314.1290 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 01:34 | Success | - | |
| exp_self.20260506012552.1289_20260506_012552 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260506012552.1289 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 01:26 | Success | - | |
| exp_self.20260506011836.1288_20260506_011836 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260506011836.1288 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 01:19 | Success | - | |
| exp_pytrain.20260506011505.321_20260506_011505 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-06 01:16 | Success | - | |
| exp_self.20260506010825.1287_20260506_010825 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260506010825.1287 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 01:09 | Success | - | |
| exp_self.20260506010052.1286_20260506_010052 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260506010052.1286 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 01:01 | Success | - | |
| exp_self.20260506005328.1285_20260506_005328 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260506005328.1285 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 00:54 | Success | - | |
| exp_self.20260506004614.1284_20260506_004614 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260506004614.1284 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 00:47 | Success | - | |
| exp_pytrain.20260506004250.320_20260506_004250 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-06 00:43 | Success | - | |
| exp_self.20260506003606.1283_20260506_003607 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260506003606.1283 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 00:37 | Success | - | |
| exp_self.20260506002849.1282_20260506_002849 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260506002849.1282 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 00:29 | Success | - | |
| exp_self.20260506002126.1281_20260506_002126 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260506002126.1281 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 00:22 | Success | - | |
| exp_self.20260506001414.1280_20260506_001414 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260506001414.1280 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 00:15 | Success | - | |
| exp_pytrain.20260506001053.319_20260506_001054 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-06 00:11 | Success | - | |
| exp_self.20260506000411.1279_20260506_000412 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260506000411.1279 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-06 00:05 | Success | - | |
| exp_self.20260505235640.1278_20260505_235640 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260505235640.1278 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 23:57 | Success | - | |
| exp_self.20260505234923.1277_20260505_234923 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260505234923.1277 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 23:50 | Success | - | |
| exp_self.20260505234203.1276_20260505_234203 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260505234203.1276 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 23:43 | Success | - | |
| exp_pytrain.20260505233843.318_20260505_233843 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-05 23:39 | Success | - | |
| exp_self.20260505233157.1275_20260505_233157 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260505233157.1275 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 23:33 | Success | - | |
| exp_self.20260505232441.1274_20260505_232441 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260505232441.1274 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 23:25 | Success | - | |
| exp_cr_10.1093_ehjdh_ztag070_20260505_231942 | Automated Full-text screening and accelerated reviews using large language models with Context-Aware Agents: An explorat...<br>Paper ID: cr_10.1093_ehjdh_ztag070 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 23:20 | Success | - | |
| exp_self.20260505231641.1273_20260505_231641 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260505231641.1273 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 23:17 | Success | - | |
| exp_self.20260505230927.1272_20260505_230927 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260505230927.1272 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 23:10 | Success | - | |
| exp_pytrain.20260505230558.317_20260505_230558 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-05 23:07 | Success | - | |
| exp_hf_2605.01284_20260505_230319 | Chain of Evidence: Pixel-Level Visual Attribution for Iterative Retrieval-Augmented Generation<br>Paper ID: hf_2605.01284 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark... | 05-05 23:04 | Success | - | |
| exp_self.20260505230025.1271_20260505_230025 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260505230025.1271 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 23:01 | Success | - | |
| exp_self.20260505225303.1270_20260505_225303 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260505225303.1270 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 22:54 | Success | - | |
| exp_self.20260505224545.1269_20260505_224546 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260505224545.1269 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 22:46 | Success | - | |
| exp_self.20260505223823.1268_20260505_223823 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260505223823.1268 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 22:39 | Success | - | |
| exp_pytrain.20260505223353.316_20260505_223353 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-05 22:34 | Success | - | |
| exp_self.20260505223102.1267_20260505_223102 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260505223102.1267 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 22:32 | Success | - | |
| exp_2605.04040v1_20260505_222739 | Large Language Models are Universal Reasoners for Visual Generation<br>Paper ID: 2605.04040v1 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark s... | 05-05 22:28 | Success | - | |
| exp_self.20260505222132.1266_20260505_222133 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260505222132.1266 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 22:22 | Success | - | |
| exp_hf_2605.01466_20260505_221705 | SplAttN: Bridging 2D and 3D with Gaussian Soft Splatting and Attention for Point Cloud Completion<br>Paper ID: hf_2605.01466 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark... | 05-05 22:18 | Success | - | |
| exp_self.20260505221348.1265_20260505_221349 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260505221348.1265 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 22:14 | Success | - | |
| exp_2605.04045v1_20260505_220959 | Audio-Visual Intelligence in Large Foundation Models<br>Paper ID: 2605.04045v1 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark s... | 05-05 22:11 | Success | - | |
| exp_self.20260505220445.1264_20260505_220445 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260505220445.1264 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 22:05 | Success | - | |
| exp_pytrain.20260505220011.315_20260505_220011 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-05 22:01 | Success | - | |
| exp_self.20260505215718.1263_20260505_215718 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260505215718.1263 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 21:58 | Success | - | |
| exp_hf_2604.28123_20260505_215348 | Beyond SFT-to-RL: Pre-alignment via Black-Box On-Policy Distillation for Multimodal RL<br>Paper ID: hf_2604.28123 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark... | 05-05 21:54 | Success | - | |
| exp_gh_Deor736_casullens_20260505_215028 | Deor736/casullens<br>Paper ID: gh_Deor736_casullens - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered ben... | 05-05 21:51 | Success | - | |
| exp_self.20260505214514.1262_20260505_214515 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260505214514.1262 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 21:46 | Success | - | |
| exp_self.20260505213759.1261_20260505_213759 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260505213759.1261 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 21:39 | Success | - | |
| exp_hf_2605.02943_20260505_213407 | Healthcare AI GYM for Medical Agents<br>Paper ID: hf_2605.02943 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark... | 05-05 21:35 | Success | - | |
| exp_self.20260505213001.1260_20260505_213001 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260505213001.1260 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 21:31 | Success | - | |
| exp_pytrain.20260505212638.314_20260505_212638 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-05 21:27 | Success | - | |
| exp_self.20260505212240.1259_20260505_212240 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260505212240.1259 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 21:23 | Success | - | |
| exp_self.20260505211524.1258_20260505_211525 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260505211524.1258 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 21:16 | Success | - | |
| exp_2605.03969v1_20260505_211025 | Feature-Augmented Transformers for Robust AI-Text Detection Across Domains and Generators<br>Paper ID: 2605.03969v1 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark s... | 05-05 21:11 | Success | - | |
| exp_self.20260505210725.1257_20260505_210726 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260505210725.1257 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 21:08 | Success | - | |
| exp_2605.03953v1_20260505_210336 | Transformers with Selective Access to Early Representations<br>Paper ID: 2605.03953v1 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark s... | 05-05 21:04 | Success | - | |
| exp_self.20260505205824.1256_20260505_205824 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260505205824.1256 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 20:59 | Success | - | |
| exp_pytrain.20260505205456.313_20260505_205456 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-05 20:56 | Success | - | |
| exp_self.20260505205058.1255_20260505_205058 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260505205058.1255 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 20:52 | Success | - | |
| exp_hf_2605.04012_20260505_204705 | SymptomAI: Towards a Conversational AI Agent for Everyday Symptom Assessment - Paper ID: hf_2605.04012 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark... | 05-05 20:48 | Success | - | |
| exp_self.20260505204143.1254_20260505_204143 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260505204143.1254 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 20:42 | Success | - | |
| exp_self.20260505203329.1253_20260505_203329 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260505203329.1253 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 20:34 | Success | - | |
| exp_self.20260505202614.1252_20260505_202614 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260505202614.1252 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 20:27 | Success | - | |
| exp_pytrain.20260505202244.312_20260505_202245 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-05 20:23 | Success | - | |
| exp_self.20260505201556.1251_20260505_201556 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260505201556.1251 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 20:17 | Success | - | |
| exp_self.20260505200835.1250_20260505_200835 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260505200835.1250 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 20:09 | Success | - | |
| exp_self.20260505200121.1249_20260505_200122 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260505200121.1249 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 20:02 | Success | - | |
| exp_self.20260505195407.1248_20260505_195407 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260505195407.1248 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 19:55 | Success | - | |
| exp_pytrain.20260505195036.311_20260505_195037 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-05 19:51 | Success | - | |
| exp_self.20260505194354.1247_20260505_194354 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260505194354.1247 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 19:44 | Success | - | |
| exp_self.20260505193634.1246_20260505_193635 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260505193634.1246 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 19:37 | Success | - | |
| exp_self.20260505192915.1245_20260505_192916 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260505192915.1245 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 19:30 | Success | - | |
| exp_self.20260505192149.1244_20260505_192149 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260505192149.1244 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 19:22 | Success | - | |
| exp_pytrain.20260505191823.310_20260505_191823 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-05 19:19 | Success | - | |
| exp_self.20260505191142.1243_20260505_191143 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260505191142.1243 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 19:12 | Success | - | |
| exp_self.20260505190417.1242_20260505_190417 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260505190417.1242 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 19:05 | Success | - | |
| exp_self.20260505185651.1241_20260505_185651 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260505185651.1241 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 18:57 | Success | - | |
| exp_self.20260505184926.1240_20260505_184926 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260505184926.1240 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 18:50 | Success | - | |
| exp_pytrain.20260505184606.309_20260505_184606 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-05 18:47 | Success | - | |
| exp_self.20260505183935.1239_20260505_183936 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260505183935.1239 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 18:40 | Success | - | |
| exp_self.20260505183156.1238_20260505_183156 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260505183156.1238 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 18:32 | Success | - | |
| exp_self.20260505182422.1237_20260505_182422 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260505182422.1237 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 18:25 | Success | - | |
| exp_self.20260505181651.1236_20260505_181651 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260505181651.1236 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 18:17 | Success | - | |
| exp_pytrain.20260505181418.308_20260505_181419 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-05 18:15 | Success | - | |
| exp_self.20260505180708.1235_20260505_180709 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260505180708.1235 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 18:08 | Success | - | |
| exp_self.20260505175935.1234_20260505_175935 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260505175935.1234 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 18:00 | Success | - | |
| exp_self.20260505175200.1233_20260505_175200 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260505175200.1233 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 17:53 | Success | - | |
| exp_self.20260505174431.1232_20260505_174431 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260505174431.1232 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 17:45 | Success | - | |
| exp_pytrain.20260505174203.307_20260505_174203 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-05 17:43 | Success | - | |
| exp_self.20260505173745.1231_20260505_173745 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260505173745.1231 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 17:38 | Success | - | |
| exp_hf_2605.00925_20260505_173448 | Linking spatial biology and clinical histology via Haiku - Paper ID: hf_2605.00925 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark... | 05-05 17:35 | Success | - | |
| exp_self.20260505172739.1230_20260505_172740 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260505172739.1230 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 17:28 | Success | - | |
| exp_self.20260505172004.1229_20260505_172005 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260505172004.1229 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 17:21 | Success | - | |
| exp_self.20260505171230.1228_20260505_171231 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260505171230.1228 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 17:13 | Success | - | |
| exp_pytrain.20260505170948.306_20260505_170949 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-05 17:10 | Success | - | |
| exp_self.20260505170305.1227_20260505_170306 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260505170305.1227 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 17:04 | Success | - | |
| exp_self.20260505165541.1226_20260505_165542 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260505165541.1226 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 16:56 | Success | - | |
| exp_self.20260505164819.1225_20260505_164819 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260505164819.1225 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 16:49 | Success | - | |
| exp_self.20260505164058.1224_20260505_164058 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260505164058.1224 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 16:42 | Success | - | |
| exp_pytrain.20260505163732.305_20260505_163733 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-05 16:38 | Success | - | |
| exp_self.20260505163223.1223_20260505_163223 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260505163223.1223 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 16:33 | Success | - | |
| exp_self.20260505162502.1222_20260505_162502 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260505162502.1222 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 16:26 | Success | - | |
| exp_self.20260505161709.1221_20260505_161709 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260505161709.1221 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 16:18 | Success | - | |
| exp_self.20260505160839.1220_20260505_160840 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260505160839.1220 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 16:09 | Success | - | |
| exp_pytrain.20260505160515.304_20260505_160515 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-05 16:06 | Success | - | |
| exp_self.20260505155833.1219_20260505_155833 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260505155833.1219 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 15:59 | Success | - | |
| exp_self.20260505155120.1218_20260505_155120 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260505155120.1218 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 15:52 | Success | - | |
| exp_self.20260505154406.1217_20260505_154407 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260505154406.1217 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 15:45 | Success | - | |
| exp_self.20260505153652.1216_20260505_153653 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260505153652.1216 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 15:37 | Success | - | |
| exp_pytrain.20260505153327.303_20260505_153327 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-05 15:34 | Success | - | |
| exp_self.20260505152648.1215_20260505_152649 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260505152648.1215 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 15:27 | Success | - | |
| exp_self.20260505151932.1214_20260505_151932 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260505151932.1214 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 15:20 | Success | - | |
| exp_self.20260505151218.1213_20260505_151218 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260505151218.1213 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 15:13 | Success | - | |
| exp_self.20260505150506.1212_20260505_150507 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260505150506.1212 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 15:06 | Success | - | |
| exp_pytrain.20260505150136.302_20260505_150137 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-05 15:02 | Success | - | |
| exp_self.20260505145458.1211_20260505_145459 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260505145458.1211 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 14:56 | Success | - | |
| exp_self.20260505144741.1210_20260505_144741 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260505144741.1210 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 14:48 | Success | - | |
| exp_self.20260505144024.1209_20260505_144025 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260505144024.1209 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 14:41 | Success | - | |
| exp_self.20260505143316.1208_20260505_143316 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260505143316.1208 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 14:34 | Success | - | |
| exp_pytrain.20260505142950.301_20260505_142950 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-05 14:30 | Success | - | |
| exp_hf_2605.01711_20260505_142634 | Linear-Time Global Visual Modeling without Explicit Attention - Paper ID: hf_2605.01711 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark... | 05-05 14:27 | Success | - | |
| exp_self.20260505142223.1207_20260505_142223 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260505142223.1207 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 14:23 | Success | - | |
| exp_self.20260505141506.1206_20260505_141506 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260505141506.1206 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 14:16 | Success | - | |
| exp_self.20260505140750.1205_20260505_140750 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260505140750.1205 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 14:08 | Success | - | |
| exp_self.20260505140035.1204_20260505_140035 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260505140035.1204 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 14:01 | Success | - | |
| exp_pytrain.20260505135709.300_20260505_135710 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-05 13:58 | Success | - | |
| exp_self.20260505135028.1203_20260505_135029 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260505135028.1203 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 13:51 | Success | - | |
| exp_self.20260505134315.1202_20260505_134315 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260505134315.1202 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 13:44 | Success | - | |
| exp_self.20260505133606.1201_20260505_133607 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260505133606.1201 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 13:37 | Success | - | |
| exp_self.20260505132851.1200_20260505_132851 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260505132851.1200 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 13:29 | Success | - | |
| exp_pytrain.20260505132526.299_20260505_132526 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-05 13:26 | Success | - | |
| exp_self.20260505132127.1199_20260505_132127 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260505132127.1199 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 13:22 | Success | - | |
| exp_self.20260505131410.1198_20260505_131410 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260505131410.1198 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 13:15 | Success | - | |
| exp_self.20260505130652.1197_20260505_130652 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260505130652.1197 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 13:07 | Success | - | |
| exp_self.20260505125943.1196_20260505_125943 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260505125943.1196 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 13:00 | Success | - | |
| exp_hf_2605.00632_20260505_125613 | BlenderRAG: High-Fidelity 3D Object Generation via Retrieval-Augmented Code Synthesis - Paper ID: hf_2605.00632 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark... | 05-05 12:57 | Success | - | |
| exp_pytrain.20260505125319.298_20260505_125319 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-05 12:54 | Success | - | |
| exp_self.20260505124633.1195_20260505_124634 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260505124633.1195 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 12:47 | Success | - | |
| exp_self.20260505123923.1194_20260505_123923 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260505123923.1194 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 12:40 | Success | - | |
|
exp_self.20260505123207.1193_20260505_123207
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260505123207.1193 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-05 12:33 | Success | - | |
| exp_self.20260505122446.1192_20260505_122447 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260505122446.1192 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 12:25 | Success | - | |
| exp_pytrain.20260505122124.297_20260505_122124 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-05 12:22 | Success | - | |
| exp_self.20260505121450.1191_20260505_121451 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260505121450.1191 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 12:15 | Success | - | |
| exp_self.20260505120709.1190_20260505_120709 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260505120709.1190 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 12:08 | Success | - | |
| exp_self.20260505115930.1189_20260505_115930 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260505115930.1189 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 12:00 | Success | - | |
| exp_self.20260505115158.1188_20260505_115158 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260505115158.1188 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 11:53 | Success | - | |
| exp_pytrain.20260505114922.296_20260505_114923 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-05 11:50 | Success | - | |
| exp_self.20260505114322.1187_20260505_114323 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260505114322.1187 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 11:44 | Success | - | |
| exp_self.20260505113549.1186_20260505_113549 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260505113549.1186 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 11:36 | Success | - | |
| exp_self.20260505112816.1185_20260505_112816 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260505112816.1185 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 11:29 | Success | - | |
| exp_self.20260505112043.1184_20260505_112044 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260505112043.1184 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 11:21 | Success | - | |
| exp_pytrain.20260505111804.295_20260505_111805 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-05 11:19 | Success | - | |
| exp_self.20260505111102.1183_20260505_111102 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260505111102.1183 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 11:12 | Success | - | |
| exp_self.20260505110310.1182_20260505_110310 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260505110310.1182 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 11:04 | Success | - | |
| exp_self.20260505105531.1181_20260505_105532 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260505105531.1181 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 10:56 | Success | - | |
| exp_self.20260505104756.1180_20260505_104757 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260505104756.1180 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 10:48 | Success | - | |
| exp_pytrain.20260505104520.294_20260505_104521 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-05 10:46 | Success | - | |
| exp_self.20260505103817.1179_20260505_103817 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260505103817.1179 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 10:39 | Success | - | |
| exp_self.20260505103031.1178_20260505_103031 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260505103031.1178 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 10:31 | Success | - | |
| exp_self.20260505102318.1177_20260505_102318 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260505102318.1177 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 10:24 | Success | - | |
| exp_self.20260505101559.1176_20260505_101559 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260505101559.1176 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 10:17 | Success | - | |
| exp_pytrain.20260505101229.293_20260505_101230 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-05 10:13 | Success | - | |
| exp_self.20260505100542.1175_20260505_100542 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260505100542.1175 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 10:06 | Success | - | |
| exp_self.20260505095822.1174_20260505_095823 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260505095822.1174 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 09:59 | Success | - | |
| exp_self.20260505095107.1173_20260505_095107 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260505095107.1173 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 09:52 | Success | - | |
| exp_self.20260505094352.1172_20260505_094353 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260505094352.1172 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 09:44 | Success | - | |
| exp_pytrain.20260505094022.292_20260505_094023 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-05 09:41 | Success | - | |
| exp_self.20260505093338.1171_20260505_093338 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260505093338.1171 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 09:34 | Success | - | |
| exp_self.20260505092621.1170_20260505_092621 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260505092621.1170 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 09:27 | Success | - | |
| exp_self.20260505091800.1169_20260505_091800 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260505091800.1169 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 09:19 | Success | - | |
| exp_self.20260505091039.1168_20260505_091039 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260505091039.1168 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 09:11 | Success | - | |
| exp_pytrain.20260505090719.291_20260505_090720 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-05 09:08 | Success | - | |
| exp_self.20260505090120.1167_20260505_090121 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260505090120.1167 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 09:02 | Success | - | |
| exp_self.20260505085333.1166_20260505_085334 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260505085333.1166 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 08:54 | Success | - | |
| exp_self.20260505084553.1165_20260505_084553 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260505084553.1165 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 08:46 | Success | - | |
| exp_self.20260505083820.1164_20260505_083821 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260505083820.1164 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 08:39 | Success | - | |
| exp_pytrain.20260505083543.290_20260505_083544 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-05 08:36 | Success | - | |
| exp_self.20260505083124.1163_20260505_083125 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260505083124.1163 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 08:32 | Success | - | |
| exp_self.20260505082348.1162_20260505_082348 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260505082348.1162 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 08:24 | Success | - | |
| exp_self.20260505081547.1161_20260505_081547 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260505081547.1161 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 08:16 | Success | - | |
| exp_self.20260505080828.1160_20260505_080829 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260505080828.1160 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 08:09 | Success | - | |
| exp_pytrain.20260505080351.289_20260505_080351 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-05 08:04 | Success | - | |
| exp_self.20260505080100.1159_20260505_080100 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260505080100.1159 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 08:02 | Success | - | |
| exp_self.20260505075417.1158_20260505_075417 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260505075417.1158 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 07:55 | Success | - | |
| exp_self.20260505074704.1157_20260505_074704 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260505074704.1157 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 07:48 | Success | - | |
| exp_self.20260505073942.1156_20260505_073943 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260505073942.1156 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 07:40 | Success | - | |
| exp_self.20260505073206.1155_20260505_073207 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260505073206.1155 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 07:33 | Success | - | |
| exp_pytrain.20260505072932.288_20260505_072932 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-05 07:30 | Success | - | |
| exp_self.20260505072228.1154_20260505_072228 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260505072228.1154 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 07:23 | Success | - | |
| exp_self.20260505071445.1153_20260505_071446 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260505071445.1153 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 07:15 | Success | - | |
| exp_self.20260505070706.1152_20260505_070706 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260505070706.1152 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 07:08 | Success | - | |
| exp_self.20260505065933.1151_20260505_065934 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260505065933.1151 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 07:00 | Success | - | |
| exp_pytrain.20260505065659.287_20260505_065700 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-05 06:58 | Success | - | |
| exp_self.20260505064958.1150_20260505_064958 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260505064958.1150 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 06:51 | Success | - | |
| exp_self.20260505064217.1149_20260505_064218 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260505064217.1149 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 06:43 | Success | - | |
| exp_self.20260505063437.1148_20260505_063437 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260505063437.1148 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 06:35 | Success | - | |
| exp_self.20260505062706.1147_20260505_062706 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260505062706.1147 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 06:28 | Success | - | |
| exp_pytrain.20260505062433.286_20260505_062434 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-05 06:25 | Success | - | |
| exp_self.20260505061728.1146_20260505_061729 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260505061728.1146 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 06:18 | Success | - | |
| exp_self.20260505060952.1145_20260505_060953 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260505060952.1145 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 06:10 | Success | - | |
| exp_self.20260505060221.1144_20260505_060221 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260505060221.1144 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 06:03 | Success | - | |
| exp_self.20260505055437.1143_20260505_055437 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260505055437.1143 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 05:55 | Success | - | |
| exp_pytrain.20260505055159.285_20260505_055200 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-05 05:53 | Success | - | |
| exp_self.20260505054742.1142_20260505_054742 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260505054742.1142 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 05:48 | Success | - | |
| exp_self.20260505053851.1141_20260505_053852 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260505053851.1141 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 05:39 | Success | - | |
| exp_self.20260505053028.1140_20260505_053028 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260505053028.1140 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 05:31 | Success | - | |
| exp_self.20260505052241.1139_20260505_052241 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260505052241.1139 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 05:23 | Success | - | |
| exp_pytrain.20260505052013.284_20260505_052013 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-05 05:21 | Success | - | |
| exp_self.20260505051311.1138_20260505_051311 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260505051311.1138 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 05:14 | Success | - | |
| exp_self.20260505050534.1137_20260505_050535 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260505050534.1137 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 05:06 | Success | - | |
| exp_self.20260505045805.1136_20260505_045806 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260505045805.1136 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 04:59 | Success | - | |
| exp_hf_2605.00814_20260505_045230 | Persistent Visual Memory: Sustaining Perception for Deep Generation in LVLMs<br>Paper ID: hf_2605.00814 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark... | 05-05 04:53 | Success | - | |
| exp_self.20260505045019.1135_20260505_045019 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260505045019.1135 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 04:51 | Success | - | |
| exp_pytrain.20260505044746.283_20260505_044747 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-05 04:48 | Success | - | |
| exp_self.20260505044054.1134_20260505_044055 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260505044054.1134 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 04:41 | Success | - | |
| exp_self.20260505043319.1133_20260505_043319 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260505043319.1133 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 04:34 | Success | - | |
| exp_self.20260505042547.1132_20260505_042548 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260505042547.1132 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 04:26 | Success | - | |
| exp_cr_10.1093_nar_gkag425_20260505_042233 | xBind: an integrated webserver for large language model-enabled cross-molecular protein binding site prediction<br>Paper ID: cr_10.1093_nar_gkag425 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered b... | 05-05 04:23 | Success | - | |
| exp_self.20260505041744.1131_20260505_041745 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260505041744.1131 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 04:18 | Success | - | |
| exp_pytrain.20260505041514.282_20260505_041514 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-05 04:16 | Success | - | |
| exp_cr_10.3390_fi18050243_20260505_041224 | The Trustworthy Model Context Protocol (MCP) Registry: An Architectural Blueprint for Cryptographic Provenance and Runti...<br>Paper ID: cr_10.3390_fi18050243 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered be... | 05-05 04:13 | Success | - | |
| exp_self.20260505040909.1130_20260505_040910 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260505040909.1130 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 04:10 | Success | - | |
| exp_self.20260505040132.1129_20260505_040132 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260505040132.1129 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 04:02 | Success | - | |
| exp_self.20260505035353.1128_20260505_035353 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260505035353.1128 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 03:54 | Success | - | |
| exp_self.20260505034623.1127_20260505_034623 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260505034623.1127 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 03:47 | Success | - | |
| exp_pytrain.20260505034356.281_20260505_034357 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-05 03:44 | Success | - | |
| exp_self.20260505033646.1126_20260505_033646 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260505033646.1126 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 03:37 | Success | - | |
| exp_self.20260505032913.1125_20260505_032913 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260505032913.1125 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 03:30 | Success | - | |
| exp_self.20260505032127.1124_20260505_032127 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260505032127.1124 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 03:22 | Success | - | |
| exp_self.20260505031355.1123_20260505_031356 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260505031355.1123 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 03:14 | Success | - | |
| exp_pytrain.20260505031129.280_20260505_031130 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-05 03:12 | Success | - | |
| exp_self.20260505030522.1122_20260505_030523 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260505030522.1122 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 03:06 | Success | - | |
| exp_self.20260505025753.1121_20260505_025753 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260505025753.1121 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 02:58 | Success | - | |
| exp_hf_2605.00529_20260505_025216 | Hierarchical Abstract Tree for Cross-Document Retrieval-Augmented Generation<br>Paper ID: hf_2605.00529 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark... | 05-05 02:53 | Success | - | |
| exp_self.20260505025012.1120_20260505_025013 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260505025012.1120 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 02:51 | Success | - | |
| exp_self.20260505024241.1119_20260505_024241 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260505024241.1119 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 02:43 | Success | - | |
| exp_pytrain.20260505024006.279_20260505_024006 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-05 02:41 | Success | - | |
| exp_self.20260505023303.1118_20260505_023303 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260505023303.1118 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 02:34 | Success | - | |
| exp_self.20260505022529.1117_20260505_022529 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260505022529.1117 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 02:26 | Success | - | |
| exp_self.20260505021759.1116_20260505_021759 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260505021759.1116 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-05 02:19 | Success | - | |
|
exp_self.20260505021032.1115_20260505_021032
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260505021032.1115 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-05 02:11 | Success | - | |
|
exp_pytrain.20260505020742.278_20260505_020743
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-05 02:08 | Success | - | |
|
exp_self.20260505020052.1114_20260505_020052
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260505020052.1114 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-05 02:01 | Success | - | |
|
exp_self.20260505015316.1113_20260505_015316
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260505015316.1113 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-05 01:54 | Success | - | |
|
exp_self.20260505014550.1112_20260505_014550
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260505014550.1112 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-05 01:46 | Success | - | |
|
exp_self.20260505013822.1111_20260505_013823
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260505013822.1111 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-05 01:39 | Success | - | |
|
exp_pytrain.20260505013550.277_20260505_013551
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-05 01:36 | Success | - | |
|
exp_self.20260505012901.1110_20260505_012901
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260505012901.1110 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-05 01:30 | Success | - | |
|
exp_self.20260505012126.1109_20260505_012127
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260505012126.1109 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-05 01:22 | Success | - | |
|
exp_self.20260505011359.1108_20260505_011400
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260505011359.1108 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-05 01:15 | Success | - | |
|
exp_self.20260505010635.1107_20260505_010635
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260505010635.1107 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-05 01:07 | Success | - | |
|
exp_pytrain.20260505010408.276_20260505_010408
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-05 01:05 | Success | - | |
|
exp_self.20260505005709.1106_20260505_005710
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260505005709.1106 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-05 00:58 | Success | - | |
|
exp_self.20260505004941.1105_20260505_004941
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260505004941.1105 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-05 00:50 | Success | - | |
|
exp_self.20260505004211.1104_20260505_004212
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260505004211.1104 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-05 00:43 | Success | - | |
|
exp_self.20260505003445.1103_20260505_003445
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260505003445.1103 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-05 00:35 | Success | - | |
|
exp_pytrain.20260505003217.275_20260505_003217
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-05 00:33 | Success | - | |
|
exp_self.20260505002516.1102_20260505_002517
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260505002516.1102 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-05 00:26 | Success | - | |
|
exp_gh_Edgarzp12_realtime-sentiment-pipeline_20260505_001949
|
Edgarzp12/realtime-sentiment-pipeline
Paper ID: gh_Edgarzp12_realtime-sentiment-pipeline - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected S...
|
05-05 00:20 | Success | - | |
|
exp_self.20260505001740.1101_20260505_001740
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260505001740.1101 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-05 00:18 | Success | - | |
|
exp_self.20260505001010.1100_20260505_001010
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260505001010.1100 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-05 00:11 | Success | - | |
|
exp_self.20260505000326.1099_20260505_000326
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260505000326.1099 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-05 00:04 | Success | - | |
|
exp_pytrain.20260505000054.274_20260505_000054
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-05 00:01 | Success | - | |
|
exp_self.20260504235401.1098_20260504_235402
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260504235401.1098 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-04 23:55 | Success | - | |
|
exp_self.20260504234636.1097_20260504_234636
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260504234636.1097 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-04 23:47 | Success | - | |
|
exp_self.20260504233909.1096_20260504_233909
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260504233909.1096 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-04 23:40 | Success | - | |
|
exp_self.20260504233144.1095_20260504_233144
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260504233144.1095 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-04 23:32 | Success | - | |
|
exp_pytrain.20260504232909.273_20260504_232910
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-04 23:30 | Success | - | |
|
exp_self.20260504232208.1094_20260504_232209
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260504232208.1094 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-04 23:23 | Success | - | |
|
exp_self.20260504231445.1093_20260504_231446
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260504231445.1093 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-04 23:15 | Success | - | |
|
exp_self.20260504230716.1092_20260504_230716
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260504230716.1092 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-04 23:08 | Success | - | |
|
exp_self.20260504225953.1091_20260504_225954
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260504225953.1091 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-04 23:00 | Success | - | |
|
exp_pytrain.20260504225721.272_20260504_225721
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-04 22:58 | Success | - | |
|
exp_self.20260504225305.1090_20260504_225305
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260504225305.1090 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-04 22:54 | Success | - | |
|
exp_2605.02884v1_20260504_224951
|
Unsupervised Machine Learning for Detecting Structural Anomalies in European Regional Statistics
Paper ID: 2605.02884v1 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark s...
|
05-04 22:50 | Success | - | |
|
exp_self.20260504224423.1089_20260504_224423
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260504224423.1089 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-04 22:45 | Success | - | |
|
exp_hf_2605.02881_20260504_224000
|
MolmoAct2: Action Reasoning Models for Real-world Deployment
Paper ID: hf_2605.02881 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
05-04 22:41 | Success | - | |
|
exp_self.20260504223647.1088_20260504_223648
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260504223647.1088 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-04 22:37 | Success | - | |
|
exp_2605.02888v1_20260504_223358
|
SpecKV: Adaptive Speculative Decoding with Compression-Aware Gamma Selection
Paper ID: 2605.02888v1 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark s...
|
05-04 22:35 | Success | - | |
|
exp_cr_10.18664_1994-7852.215.2026.358845_20260504_223110
|
IMPROVEMENT OF CARGO ROUTING TECHNOLOGY AT A CONTAINER HAB USING A COMPREHENSIVE MATHEMATICAL MODEL
Paper ID: cr_10.18664_1994-7852.215.2026.358845 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Sign...
|
05-04 22:32 | Success | - | |
|
exp_self.20260504222756.1087_20260504_222756
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260504222756.1087 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-04 22:28 | Success | - | |
|
exp_pytrain.20260504222524.271_20260504_222524
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-04 22:26 | Success | - | |
|
exp_hf_2605.02222_20260504_222238
|
Generative Modeling with Orbit-Space Particle Flow Matching
Paper ID: hf_2605.02222 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
05-04 22:23 | Success | - | |
|
exp_self.20260504221820.1086_20260504_221820
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260504221820.1086 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-04 22:19 | Success | - | |
|
exp_self.20260504221054.1085_20260504_221055
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260504221054.1085 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-04 22:11 | Success | - | |
|
exp_self.20260504220326.1084_20260504_220327
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260504220326.1084 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-04 22:04 | Success | - | |
|
exp_cr_10.3390_vehicles8050101_20260504_215908
|
A Vehicle Type Recognition Network Based on Feature Comparison and Mixture of Experts Model
Paper ID: cr_10.3390_vehicles8050101 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recover...
|
05-04 22:00 | Success | - | |
|
exp_self.20260504215555.1083_20260504_215555
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260504215555.1083 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-04 21:56 | Success | - | |
|
exp_pytrain.20260504215323.270_20260504_215323
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-04 21:54 | Success | - | |
|
exp_self.20260504214632.1082_20260504_214632
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260504214632.1082 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-04 21:47 | Success | - | |
|
exp_self.20260504213909.1081_20260504_213909
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260504213909.1081 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-04 21:40 | Success | - | |
|
exp_2605.02866v1_20260504_213342
|
Laplacian Frequency Interaction Network for Rural Thematic Road Extraction
Paper ID: 2605.02866v1 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark s...
|
05-04 21:34 | Success | - | |
|
exp_self.20260504213134.1080_20260504_213135
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260504213134.1080 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-04 21:32 | Success | - | |
|
exp_2605.02860v1_20260504_212820
|
Standing on the Shoulders of Giants: Stabilized Knowledge Distillation for Cross-Language Code Clone Detection
Paper ID: 2605.02860v1 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark s...
|
05-04 21:29 | Success | - | |
|
exp_self.20260504212405.1079_20260504_212405
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260504212405.1079 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-04 21:25 | Success | - | |
|
exp_pytrain.20260504212134.269_20260504_212134
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-04 21:22 | Success | - | |
|
exp_self.20260504211616.1078_20260504_211616
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260504211616.1078 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-04 21:17 | Success | - | |
|
exp_hf_2604.27660_20260504_211154
|
From Context to Skills: Can Language Models Learn from Context Skillfully?
Paper ID: hf_2604.27660 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
05-04 21:12 | Success | - | |
|
exp_self.20260504210841.1077_20260504_210842
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260504210841.1077 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-04 21:09 | Success | - | |
|
exp_self.20260504210116.1076_20260504_210116
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260504210116.1076 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-04 21:02 | Success | - | |
|
exp_cr_10.1007_s42452-026-08699-7_20260504_205758
|
A swin transformer enhanced reverse knowledge distillation model for industrial anomaly detection via window-aware stoch...
Paper ID: cr_10.1007_s42452-026-08699-7 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Reco...
|
05-04 20:59 | Success | - | |
|
exp_self.20260504205234.1075_20260504_205235
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260504205234.1075 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-04 20:53 | Success | - | |
|
exp_pytrain.20260504205001.268_20260504_205002
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-04 20:51 | Success | - | |
|
exp_self.20260504204308.1074_20260504_204309
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260504204308.1074 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-04 20:44 | Success | - | |
|
exp_self.20260504203539.1073_20260504_203540
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260504203539.1073 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-04 20:36 | Success | - | |
|
exp_self.20260504202811.1072_20260504_202811
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260504202811.1072 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-04 20:29 | Success | - | |
|
exp_self.20260504202111.1071_20260504_202111
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260504202111.1071 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-04 20:22 | Success | - | |
|
exp_pytrain.20260504201843.267_20260504_201844
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-04 20:19 | Success | - | |
|
exp_self.20260504201141.1070_20260504_201141
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260504201141.1070 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-04 20:12 | Success | - | |
|
exp_self.20260504200416.1069_20260504_200416
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260504200416.1069 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-04 20:05 | Success | - | |
|
exp_self.20260504195649.1068_20260504_195650
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260504195649.1068 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-04 19:57 | Success | - | |
|
exp_self.20260504194916.1067_20260504_194917
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260504194916.1067 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-04 19:50 | Success | - | |
|
exp_pytrain.20260504194648.266_20260504_194648
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-04 19:47 | Success | - | |
|
exp_self.20260504193946.1066_20260504_193946
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260504193946.1066 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-04 19:40 | Success | - | |
|
exp_self.20260504193218.1065_20260504_193218
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260504193218.1065 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-04 19:33 | Success | - | |
|
exp_self.20260504192449.1064_20260504_192450
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260504192449.1064 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-04 19:25 | Success | - | |
|
exp_self.20260504191709.1063_20260504_191709
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260504191709.1063 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-04 19:18 | Success | - | |
|
exp_pytrain.20260504191435.265_20260504_191435
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-04 19:15 | Success | - | |
|
exp_self.20260504190727.1062_20260504_190727
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260504190727.1062 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-04 19:08 | Success | - | |
|
exp_self.20260504185951.1061_20260504_185952
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260504185951.1061 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-04 19:00 | Success | - | |
|
exp_self.20260504185215.1060_20260504_185215
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260504185215.1060 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-04 18:53 | Success | - | |
|
exp_self.20260504184438.1059_20260504_184438
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260504184438.1059 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-04 18:45 | Success | - | |
|
exp_pytrain.20260504184155.264_20260504_184155
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-04 18:42 | Success | - | |
|
exp_self.20260504183448.1058_20260504_183448
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260504183448.1058 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-04 18:35 | Success | - | |
| exp_self.20260504182710.1057_20260504_182711 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260504182710.1057 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-04 18:28 | Success | - | |
| exp_self.20260504181933.1056_20260504_181933 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260504181933.1056 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-04 18:20 | Success | - | |
| exp_self.20260504181208.1055_20260504_181209 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260504181208.1055 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-04 18:13 | Success | - | |
| exp_pytrain.20260504180936.263_20260504_180937 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-04 18:10 | Success | - | |
| exp_self.20260504180231.1054_20260504_180231 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260504180231.1054 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-04 18:03 | Success | - | |
| exp_self.20260504175459.1053_20260504_175459 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260504175459.1053 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-04 17:56 | Success | - | |
| exp_self.20260504174719.1052_20260504_174720 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260504174719.1052 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-04 17:48 | Success | - | |
| exp_self.20260504173946.1051_20260504_173946 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260504173946.1051 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-04 17:40 | Success | - | |
| exp_pytrain.20260504173714.262_20260504_173714 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-04 17:38 | Success | - | |
| exp_self.20260504173009.1050_20260504_173009 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260504173009.1050 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-04 17:31 | Success | - | |
| exp_self.20260504172233.1049_20260504_172233 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260504172233.1049 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-04 17:23 | Success | - | |
| exp_self.20260504171504.1048_20260504_171504 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260504171504.1048 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-04 17:16 | Success | - | |
| exp_self.20260504170722.1047_20260504_170723 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260504170722.1047 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-04 17:08 | Success | - | |
| exp_pytrain.20260504170447.261_20260504_170447 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-04 17:05 | Success | - | |
| exp_self.20260504165739.1046_20260504_165739 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260504165739.1046 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-04 16:58 | Success | - | |
| exp_self.20260504165007.1045_20260504_165007 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260504165007.1045 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-04 16:51 | Success | - | |
| exp_self.20260504164237.1044_20260504_164237 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260504164237.1044 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-04 16:43 | Success | - | |
| exp_hf_2605.00347_20260504_163916 | Odysseus: Scaling VLMs to 100+ Turn Decision-Making in Games via Reinforcement Learning<br>Paper ID: hf_2605.00347 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark... | 05-04 16:40 | Success | - | |
| exp_self.20260504163455.1043_20260504_163456 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260504163455.1043 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-04 16:35 | Success | - | |
| exp_pytrain.20260504163224.260_20260504_163224 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-04 16:33 | Success | - | |
| exp_self.20260504162518.1042_20260504_162518 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260504162518.1042 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-04 16:26 | Success | - | |
| exp_self.20260504161749.1041_20260504_161750 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260504161749.1041 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-04 16:18 | Success | - | |
| exp_self.20260504161021.1040_20260504_161021 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260504161021.1040 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-04 16:11 | Success | - | |
| exp_self.20260504160329.1039_20260504_160329 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260504160329.1039 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-04 16:04 | Success | - | |
| exp_pytrain.20260504160101.259_20260504_160101 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-04 16:02 | Success | - | |
| exp_self.20260504155357.1038_20260504_155358 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260504155357.1038 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-04 15:55 | Success | - | |
| exp_self.20260504154628.1037_20260504_154629 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260504154628.1037 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-04 15:47 | Success | - | |
| exp_self.20260504153855.1036_20260504_153856 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260504153855.1036 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-04 15:39 | Success | - | |
| exp_self.20260504153126.1035_20260504_153126 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260504153126.1035 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-04 15:32 | Success | - | |
| exp_pytrain.20260504152900.258_20260504_152900 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-04 15:30 | Success | - | |
| exp_self.20260504152432.1034_20260504_152433 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260504152432.1034 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-04 15:25 | Success | - | |
| exp_self.20260504151701.1033_20260504_151701 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260504151701.1033 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-04 15:18 | Success | - | |
| exp_self.20260504150933.1032_20260504_150934 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260504150933.1032 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-04 15:10 | Success | - | |
| exp_self.20260504150142.1031_20260504_150142 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260504150142.1031 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-04 15:02 | Success | - | |
| exp_pytrain.20260504145739.257_20260504_145740 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-04 14:58 | Success | - | |
| exp_self.20260504145034.1030_20260504_145034 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260504145034.1030 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-04 14:51 | Success | - | |
| exp_self.20260504144253.1029_20260504_144253 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260504144253.1029 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-04 14:43 | Success | - | |
| exp_hf_2604.27818_20260504_143825 | MASCing: Configurable Mixture-of-Experts Behavior via Activation Steering Masks<br>Paper ID: hf_2604.27818 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark... | 05-04 14:39 | Success | - | |
| exp_self.20260504143510.1028_20260504_143511 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260504143510.1028 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-04 14:36 | Success | - | |
| exp_self.20260504142734.1027_20260504_142734 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260504142734.1027 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-04 14:28 | Success | - | |
| exp_pytrain.20260504142500.256_20260504_142501 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-04 14:26 | Success | - | |
| exp_self.20260504141745.1026_20260504_141745 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260504141745.1026 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-04 14:18 | Success | - | |
| exp_self.20260504141008.1025_20260504_141008 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260504141008.1025 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-04 14:11 | Success | - | |
| exp_self.20260504140239.1024_20260504_140239 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260504140239.1024 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-04 14:03 | Success | - | |
| exp_self.20260504135503.1023_20260504_135504 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260504135503.1023 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-04 13:56 | Success | - | |
| exp_pytrain.20260504135235.255_20260504_135236 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-04 13:53 | Success | - | |
| exp_self.20260504134534.1022_20260504_134535 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260504134534.1022 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-04 13:46 | Success | - | |
| exp_self.20260504133806.1021_20260504_133806 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260504133806.1021 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-04 13:39 | Success | - | |
| exp_self.20260504133036.1020_20260504_133037 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260504133036.1020 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-04 13:31 | Success | - | |
| exp_self.20260504132301.1019_20260504_132302 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260504132301.1019 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-04 13:24 | Success | - | |
| exp_pytrain.20260504132031.254_20260504_132032 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-04 13:21 | Success | - | |
| exp_self.20260504131330.1018_20260504_131331 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260504131330.1018 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-04 13:14 | Success | - | |
| exp_self.20260504130601.1017_20260504_130601 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260504130601.1017 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-04 13:07 | Success | - | |
| exp_self.20260504125830.1016_20260504_125830 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260504125830.1016 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-04 12:59 | Success | - | |
| exp_self.20260504125100.1015_20260504_125100 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260504125100.1015 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-04 12:52 | Success | - | |
| exp_pytrain.20260504124826.253_20260504_124826 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-04 12:49 | Success | - | |
| exp_self.20260504124125.1014_20260504_124125 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260504124125.1014 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-04 12:42 | Success | - | |
| exp_self.20260504123351.1013_20260504_123352 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260504123351.1013 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-04 12:34 | Success | - | |
| exp_self.20260504122622.1012_20260504_122622 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260504122622.1012 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-04 12:27 | Success | - | |
| exp_self.20260504121853.1011_20260504_121853 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260504121853.1011 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-04 12:19 | Success | - | |
| exp_pytrain.20260504121618.252_20260504_121618 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-04 12:17 | Success | - | |
| exp_self.20260504120917.1010_20260504_120918 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260504120917.1010 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-04 12:10 | Success | - | |
| exp_self.20260504120145.1009_20260504_120145 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260504120145.1009 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-04 12:02 | Success | - | |
| exp_self.20260504115415.1008_20260504_115416 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260504115415.1008 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-04 11:55 | Success | - | |
| exp_self.20260504114645.1007_20260504_114646 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260504114645.1007 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-04 11:47 | Success | - | |
| exp_pytrain.20260504114411.251_20260504_114411 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-04 11:45 | Success | - | |
| exp_self.20260504113715.1006_20260504_113716 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260504113715.1006 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-04 11:38 | Success | - | |
| exp_self.20260504112931.1005_20260504_112931 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260504112931.1005 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-04 11:30 | Success | - | |
| exp_self.20260504112151.1004_20260504_112152 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260504112151.1004 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-04 11:22 | Success | - | |
| exp_self.20260504111417.1003_20260504_111417 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260504111417.1003 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-04 11:15 | Success | - | |
| exp_pytrain.20260504111138.250_20260504_111139 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-04 11:12 | Success | - | |
| exp_self.20260504110432.1002_20260504_110432 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260504110432.1002 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-04 11:05 | Success | - | |
| exp_self.20260504105648.1001_20260504_105648 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260504105648.1001 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-04 10:57 | Success | - | |
| exp_self.20260504104906.1000_20260504_104906 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260504104906.1000 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-04 10:50 | Success | - | |
| exp_self.20260504104127.999_20260504_104127 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260504104127.999 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-04 10:42 | Success | - | |
| exp_pytrain.20260504103851.249_20260504_103852 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-04 10:39 | Success | - | |
| exp_self.20260504103249.998_20260504_103250 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260504103249.998 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-04 10:33 | Success | - | |
| exp_self.20260504102508.997_20260504_102508 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260504102508.997 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-04 10:26 | Success | - | |
| exp_self.20260504101731.996_20260504_101731 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260504101731.996 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-04 10:18 | Success | - | |
| exp_hf_2604.27124_20260504_101406 | Better Models, Faster Training: Sigmoid Attention for single-cell Foundation Models<br>Paper ID: hf_2604.27124 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark... | 05-04 10:15 | Success | - | |
| exp_self.20260504100939.995_20260504_100940 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260504100939.995 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-04 10:10 | Success | - | |
| exp_pytrain.20260504100705.248_20260504_100705 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-04 10:08 | Success | - | |
| exp_self.20260504100128.994_20260504_100128 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260504100128.994 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-04 10:02 | Success | - | |
| exp_self.20260504095348.993_20260504_095348 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260504095348.993 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-04 09:54 | Success | - | |
| exp_self.20260504094557.992_20260504_094558 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260504094557.992 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-04 09:47 | Success | - | |
| exp_self.20260504093817.991_20260504_093817 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260504093817.991 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-04 09:39 | Success | - | |
| exp_pytrain.20260504093543.247_20260504_093544 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-04 09:36 | Success | - | |
| exp_self.20260504092942.990_20260504_092942 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260504092942.990 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-04 09:30 | Success | - | |
| exp_self.20260504092201.989_20260504_092201 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260504092201.989 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-04 09:23 | Success | - | |
| exp_self.20260504091424.988_20260504_091424 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260504091424.988 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-04 09:15 | Success | - | |
| exp_self.20260504090648.987_20260504_090648 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260504090648.987 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-04 09:07 | Success | - | |
| exp_pytrain.20260504090407.246_20260504_090408 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-04 09:05 | Success | - | |
|
exp_self.20260504085701.986_20260504_085702
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260504085701.986 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-04 08:58 | Success | - | |
|
exp_self.20260504084917.985_20260504_084917
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260504084917.985 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-04 08:50 | Success | - | |
|
exp_self.20260504084140.984_20260504_084140
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260504084140.984 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-04 08:42 | Success | - | |
|
exp_self.20260504083425.983_20260504_083425
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260504083425.983 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-04 08:35 | Success | - | |
|
exp_pytrain.20260504083152.245_20260504_083153
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-04 08:32 | Success | - | |
|
exp_self.20260504082739.982_20260504_082740
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260504082739.982 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-04 08:28 | Success | - | |
|
exp_self.20260504081759.981_20260504_081800
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260504081759.981 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-04 08:19 | Success | - | |
|
exp_self.20260504081016.980_20260504_081017
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260504081016.980 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-04 08:11 | Success | - | |
|
exp_self.20260504080233.979_20260504_080234
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260504080233.979 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-04 08:03 | Success | - | |
|
exp_pytrain.20260504080001.244_20260504_080001
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-04 08:01 | Success | - | |
|
exp_self.20260504075251.978_20260504_075251
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260504075251.978 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-04 07:53 | Success | - | |
|
exp_self.20260504074517.977_20260504_074517
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260504074517.977 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-04 07:46 | Success | - | |
|
exp_self.20260504073739.976_20260504_073740
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260504073739.976 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-04 07:38 | Success | - | |
|
exp_self.20260504072955.975_20260504_072955
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260504072955.975 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-04 07:30 | Success | - | |
|
exp_pytrain.20260504072720.243_20260504_072720
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-04 07:28 | Success | - | |
|
exp_self.20260504072114.974_20260504_072115
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260504072114.974 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-04 07:22 | Success | - | |
|
exp_self.20260504071331.973_20260504_071332
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260504071331.973 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-04 07:14 | Success | - | |
|
exp_self.20260504070554.972_20260504_070554
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260504070554.972 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-04 07:06 | Success | - | |
|
exp_self.20260504065818.971_20260504_065818
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260504065818.971 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-04 06:59 | Success | - | |
|
exp_pytrain.20260504065539.242_20260504_065539
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-04 06:56 | Success | - | |
|
exp_self.20260504064833.970_20260504_064833
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260504064833.970 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-04 06:49 | Success | - | |
|
exp_self.20260504064052.969_20260504_064052
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260504064052.969 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-04 06:41 | Success | - | |
|
exp_self.20260504063312.968_20260504_063312
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260504063312.968 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-04 06:34 | Success | - | |
|
exp_self.20260504062531.967_20260504_062531
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260504062531.967 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-04 06:26 | Success | - | |
|
exp_pytrain.20260504062256.241_20260504_062256
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-04 06:24 | Success | - | |
|
exp_self.20260504061655.966_20260504_061655
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260504061655.966 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-04 06:17 | Success | - | |
|
exp_self.20260504060918.965_20260504_060918
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260504060918.965 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-04 06:10 | Success | - | |
|
exp_self.20260504060142.964_20260504_060142
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260504060142.964 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-04 06:02 | Success | - | |
|
exp_self.20260504055400.963_20260504_055400
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260504055400.963 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-04 05:55 | Success | - | |
|
exp_pytrain.20260504055119.240_20260504_055120
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-04 05:52 | Success | - | |
|
exp_self.20260504054419.962_20260504_054420
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260504054419.962 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-04 05:45 | Success | - | |
|
exp_self.20260504053643.961_20260504_053643
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260504053643.961 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-04 05:37 | Success | - | |
|
exp_self.20260504052911.960_20260504_052911
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260504052911.960 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-04 05:30 | Success | - | |
|
exp_self.20260504052135.959_20260504_052135
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260504052135.959 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-04 05:22 | Success | - | |
|
exp_pytrain.20260504051855.239_20260504_051855
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-04 05:19 | Success | - | |
|
exp_self.20260504051146.958_20260504_051147
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260504051146.958 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-04 05:12 | Success | - | |
|
exp_self.20260504050403.957_20260504_050403
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260504050403.957 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-04 05:05 | Success | - | |
|
exp_self.20260504045628.956_20260504_045628
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260504045628.956 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-04 04:57 | Success | - | |
|
exp_self.20260504044852.955_20260504_044852
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260504044852.955 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-04 04:49 | Success | - | |
|
exp_pytrain.20260504044612.238_20260504_044612
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-04 04:47 | Success | - | |
|
exp_self.20260504043913.954_20260504_043914
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260504043913.954 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-04 04:40 | Success | - | |
|
exp_self.20260504043127.953_20260504_043128
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260504043127.953 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-04 04:32 | Success | - | |
|
exp_self.20260504042346.952_20260504_042347
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260504042346.952 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-04 04:24 | Success | - | |
|
exp_self.20260504041612.951_20260504_041612
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260504041612.951 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-04 04:17 | Success | - | |
|
exp_pytrain.20260504041335.237_20260504_041335
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-04 04:14 | Success | - | |
|
exp_self.20260504040628.950_20260504_040628
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260504040628.950 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-04 04:07 | Success | - | |
|
exp_self.20260504035842.949_20260504_035842
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260504035842.949 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-04 03:59 | Success | - | |
|
exp_self.20260504035057.948_20260504_035057
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260504035057.948 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-04 03:52 | Success | - | |
|
exp_self.20260504034322.947_20260504_034323
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260504034322.947 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-04 03:44 | Success | - | |
|
exp_pytrain.20260504034050.236_20260504_034050
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-04 03:41 | Success | - | |
|
exp_self.20260504033452.946_20260504_033452
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260504033452.946 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-04 03:35 | Success | - | |
|
exp_self.20260504032713.945_20260504_032713
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260504032713.945 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-04 03:28 | Success | - | |
|
exp_self.20260504031941.944_20260504_031941
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260504031941.944 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-04 03:20 | Success | - | |
|
exp_self.20260504031203.943_20260504_031204
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260504031203.943 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-04 03:13 | Success | - | |
|
exp_pytrain.20260504030924.235_20260504_030924
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-04 03:10 | Success | - | |
|
exp_self.20260504030400.942_20260504_030401
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260504030400.942 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-04 03:05 | Success | - | |
|
exp_hf_2604.23586_20260504_030038
|
Talker-T2AV: Joint Talking Audio-Video Generation with Autoregressive Diffusion Modeling
Paper ID: hf_2604.23586 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
05-04 03:01 | Success | - | |
|
exp_self.20260504025506.941_20260504_025507
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260504025506.941 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-04 02:56 | Success | - | |
|
exp_self.20260504024728.940_20260504_024728
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260504024728.940 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-04 02:48 | Success | - | |
|
exp_self.20260504023954.939_20260504_023955
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260504023954.939 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-04 02:40 | Success | - | |
|
exp_pytrain.20260504023725.234_20260504_023725
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-04 02:38 | Success | - | |
|
exp_self.20260504023152.938_20260504_023153
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260504023152.938 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-04 02:32 | Success | - | |
|
exp_self.20260504022414.937_20260504_022414
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260504022414.937 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-04 02:25 | Success | - | |
|
exp_self.20260504021643.936_20260504_021643
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260504021643.936 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-04 02:17 | Success | - | |
|
exp_self.20260504020908.935_20260504_020908
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260504020908.935 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-04 02:10 | Success | - | |
|
exp_pytrain.20260504020600.233_20260504_020600
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-04 02:07 | Success | - | |
|
exp_self.20260504015856.934_20260504_015856
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260504015856.934 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-04 01:59 | Success | - | |
|
exp_self.20260504015113.933_20260504_015113
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260504015113.933 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-04 01:52 | Success | - | |
|
exp_hf_2605.00323_20260504_014639
|
Online Self-Calibration Against Hallucination in Vision-Language Models
Paper ID: hf_2605.00323 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
05-04 01:47 | Success | - | |
|
exp_self.20260504014431.932_20260504_014431
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260504014431.932 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-04 01:45 | Success | - | |
|
exp_self.20260504013700.931_20260504_013701
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260504013700.931 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-04 01:38 | Success | - | |
|
exp_pytrain.20260504013421.232_20260504_013421
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-04 01:35 | Success | - | |
|
exp_self.20260504012724.930_20260504_012725
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260504012724.930 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-04 01:28 | Success | - | |
|
exp_self.20260504011951.929_20260504_011952
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260504011951.929 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-04 01:20 | Success | - | |
|
exp_self.20260504011218.928_20260504_011218
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260504011218.928 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-04 01:13 | Success | - | |
|
exp_self.20260504010447.927_20260504_010447
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260504010447.927 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-04 01:05 | Success | - | |
|
exp_pytrain.20260504010208.231_20260504_010209
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-04 01:03 | Success | - | |
|
exp_hf_2605.00691_20260504_005919
|
Learning to Act and Cooperate for Distributed Black-Box Consensus Optimization
Paper ID: hf_2605.00691 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
05-04 01:00 | Success | - | |
|
exp_self.20260504005454.926_20260504_005455
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260504005454.926 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-04 00:55 | Success | - | |
|
exp_self.20260504004722.925_20260504_004723
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260504004722.925 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-04 00:48 | Success | - | |
|
exp_self.20260504003950.924_20260504_003950
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260504003950.924 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-04 00:40 | Success | - | |
|
exp_self.20260504003211.923_20260504_003211
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260504003211.923 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-04 00:33 | Success | - | |
|
exp_pytrain.20260504002933.230_20260504_002934
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-04 00:30 | Success | - | |
|
exp_self.20260504002229.922_20260504_002229
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260504002229.922 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-04 00:23 | Success | - | |
|
exp_self.20260504001451.921_20260504_001452
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260504001451.921 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-04 00:15 | Success | - | |
|
exp_self.20260504000713.920_20260504_000713
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260504000713.920 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-04 00:08 | Success | - | |
|
exp_self.20260503235926.919_20260503_235926
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260503235926.919 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-04 00:00 | Success | - | |
|
exp_pytrain.20260503235654.229_20260503_235654
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-03 23:57 | Success | - | |
|
exp_self.20260503234951.918_20260503_234951
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260503234951.918 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-03 23:50 | Success | - | |
|
exp_self.20260503234216.917_20260503_234216
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260503234216.917 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-03 23:43 | Success | - | |
|
exp_self.20260503233444.916_20260503_233444
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260503233444.916 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-03 23:35 | Success | - | |
|
exp_self.20260503232710.915_20260503_232710
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260503232710.915 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-03 23:28 | Success | - | |
|
exp_pytrain.20260503232432.228_20260503_232432
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-03 23:25 | Success | - | |
|
exp_self.20260503232014.914_20260503_232014
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260503232014.914 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-03 23:21 | Success | - | |
|
exp_self.20260503231240.913_20260503_231241
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260503231240.913 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-03 23:13 | Success | - | |
|
exp_self.20260503230508.912_20260503_230508
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260503230508.912 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-03 23:06 | Success | - | |
|
exp_hf_2604.23195_20260503_230212
|
AnalogRetriever: Learning Cross-Modal Representations for Analog Circuit Retrieval
Paper ID: hf_2604.23195 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
05-03 23:03 | Success | - | |
|
exp_self.20260503225503.911_20260503_225503
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260503225503.911 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-03 22:56 | Success | - | |
|
exp_pytrain.20260503225231.227_20260503_225232
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-03 22:53 | Success | - | |
|
exp_self.20260503224524.910_20260503_224525
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260503224524.910 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-03 22:46 | Success | - | |
|
exp_self.20260503223749.909_20260503_223749
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260503223749.909 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-03 22:38 | Success | - | |
|
exp_self.20260503223015.908_20260503_223016
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260503223015.908 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-03 22:31 | Success | - | |
|
exp_self.20260503222243.907_20260503_222243
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260503222243.907 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-03 22:23 | Success | - | |
|
exp_pytrain.20260503222009.226_20260503_222009
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-03 22:21 | Success | - | |
|
exp_self.20260503221304.906_20260503_221305
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260503221304.906 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-03 22:14 | Success | - | |
|
exp_self.20260503220535.905_20260503_220535
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260503220535.905 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-03 22:06 | Success | - | |
|
exp_self.20260503215803.904_20260503_215803
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260503215803.904 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-03 21:59 | Success | - | |
|
exp_self.20260503215023.903_20260503_215024
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260503215023.903 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-03 21:51 | Success | - | |
|
exp_pytrain.20260503214751.225_20260503_214751
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-03 21:48 | Success | - | |
|
exp_self.20260503214045.902_20260503_214045
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260503214045.902 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-03 21:41 | Success | - | |
|
exp_self.20260503213312.901_20260503_213313
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260503213312.901 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-03 21:34 | Success | - | |
|
exp_self.20260503212540.900_20260503_212540
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260503212540.900 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-03 21:26 | Success | - | |
|
exp_self.20260503211803.899_20260503_211803
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260503211803.899 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-03 21:19 | Success | - | |
|
exp_pytrain.20260503211529.224_20260503_211530
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-03 21:16 | Success | - | |
|
exp_self.20260503211006.898_20260503_211006
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260503211006.898 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-03 21:11 | Success | - | |
|
exp_self.20260503210229.897_20260503_210230
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260503210229.897 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-03 21:03 | Success | - | |
|
exp_2605.00814v1_20260503_205911
|
Persistent Visual Memory: Sustaining Perception for Deep Generation in LVLMs
Paper ID: 2605.00814v1 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark s...
|
05-03 21:00 | Success | - | |
|
exp_self.20260503205449.896_20260503_205449
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260503205449.896 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-03 20:55 | Success | - | |
|
exp_hf_2605.00658_20260503_205126
|
UniVidX: A Unified Multimodal Framework for Versatile Video Generation via Diffusion Priors
Paper ID: hf_2605.00658 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
05-03 20:52 | Success | - | |
|
exp_self.20260503204553.895_20260503_204554
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260503204553.895 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-03 20:46 | Success | - | |
|
exp_pytrain.20260503204320.223_20260503_204320
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-03 20:44 | Success | - | |
|
exp_self.20260503203616.894_20260503_203616
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260503203616.894 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-03 20:37 | Success | - | |
|
exp_self.20260503202843.893_20260503_202844
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260503202843.893 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-03 20:29 | Success | - | |
|
exp_self.20260503202113.892_20260503_202113
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260503202113.892 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-03 20:22 | Success | - | |
|
exp_self.20260503201341.891_20260503_201341
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260503201341.891 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-03 20:14 | Success | - | |
|
exp_pytrain.20260503201103.222_20260503_201104
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-03 20:12 | Success | - | |
|
exp_self.20260503200406.890_20260503_200407
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260503200406.890 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-03 20:05 | Success | - | |
|
exp_self.20260503195634.889_20260503_195634
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260503195634.889 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-03 19:57 | Success | - | |
|
exp_self.20260503194904.888_20260503_194904
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260503194904.888 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-03 19:50 | Success | - | |
|
exp_self.20260503194129.887_20260503_194130
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260503194129.887 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-03 19:42 | Success | - | |
|
exp_pytrain.20260503193852.221_20260503_193853
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-03 19:39 | Success | - | |
|
exp_self.20260503193156.886_20260503_193157
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260503193156.886 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-03 19:32 | Success | - | |
|
exp_self.20260503192418.885_20260503_192419
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260503192418.885 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-03 19:25 | Success | - | |
|
exp_self.20260503191649.884_20260503_191649
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260503191649.884 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-03 19:17 | Success | - | |
|
exp_self.20260503190920.883_20260503_190920
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260503190920.883 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-03 19:10 | Success | - | |
|
exp_pytrain.20260503190644.220_20260503_190644
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-03 19:07 | Success | - | |
|
exp_self.20260503185946.882_20260503_185946
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260503185946.882 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-03 19:00 | Success | - | |
|
exp_self.20260503185212.881_20260503_185212
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260503185212.881 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-03 18:53 | Success | - | |
|
exp_self.20260503184437.880_20260503_184438
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260503184437.880 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-03 18:45 | Success | - | |
|
exp_self.20260503183708.879_20260503_183709
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260503183708.879 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-03 18:38 | Success | - | |
|
exp_pytrain.20260503183433.219_20260503_183434
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-03 18:35 | Success | - | |
|
exp_self.20260503182907.878_20260503_182907
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260503182907.878 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-03 18:30 | Success | - | |
|
exp_self.20260503182136.877_20260503_182136
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260503182136.877 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-03 18:22 | Success | - | |
|
exp_self.20260503181358.876_20260503_181358
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260503181358.876 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-03 18:15 | Success | - | |
|
exp_self.20260503180557.875_20260503_180557
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260503180557.875 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-03 18:07 | Success | - | |
|
exp_pytrain.20260503180301.218_20260503_180301
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-03 18:04 | Success | - | |
|
exp_self.20260503175722.874_20260503_175722
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260503175722.874 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-03 17:58 | Success | - | |
|
exp_self.20260503174921.873_20260503_174921
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260503174921.873 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-03 17:50 | Success | - | |
|
exp_self.20260503174135.872_20260503_174136
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260503174135.872 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-03 17:42 | Success | - | |
|
exp_self.20260503173346.871_20260503_173347
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260503173346.871 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-03 17:34 | Success | - | |
|
exp_pytrain.20260503173050.217_20260503_173051
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-03 17:31 | Success | - | |
|
exp_self.20260503172348.870_20260503_172348
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260503172348.870 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-03 17:24 | Success | - | |
|
exp_self.20260503171613.869_20260503_171613
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260503171613.869 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-03 17:17 | Success | - | |
|
exp_self.20260503170839.868_20260503_170840
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260503170839.868 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-03 17:09 | Success | - | |
|
exp_self.20260503170106.867_20260503_170107
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260503170106.867 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-03 17:02 | Success | - | |
|
exp_pytrain.20260503165833.216_20260503_165833
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-03 16:59 | Success | - | |
|
exp_self.20260503165136.866_20260503_165137
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260503165136.866 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-03 16:52 | Success | - | |
|
exp_self.20260503164359.865_20260503_164359
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260503164359.865 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-03 16:45 | Success | - | |
|
exp_self.20260503163609.864_20260503_163610
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260503163609.864 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-03 16:37 | Success | - | |
|
exp_self.20260503162838.863_20260503_162839
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260503162838.863 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-03 16:29 | Success | - | |
|
exp_pytrain.20260503162555.215_20260503_162556
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-03 16:26 | Success | - | |
|
exp_self.20260503161853.862_20260503_161853
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260503161853.862 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-03 16:19 | Success | - | |
|
exp_self.20260503161120.861_20260503_161121
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260503161120.861 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-03 16:12 | Success | - | |
|
exp_self.20260503160350.860_20260503_160350
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260503160350.860 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-03 16:04 | Success | - | |
|
exp_self.20260503155619.859_20260503_155620
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260503155619.859 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-03 15:57 | Success | - | |
|
exp_pytrain.20260503155338.214_20260503_155339
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-03 15:54 | Success | - | |
|
exp_self.20260503154644.858_20260503_154644
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260503154644.858 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-03 15:47 | Success | - | |
|
exp_self.20260503153908.857_20260503_153909
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260503153908.857 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-03 15:40 | Success | - | |
|
exp_self.20260503153130.856_20260503_153130
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260503153130.856 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-03 15:32 | Success | - | |
|
exp_self.20260503152359.855_20260503_152400
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260503152359.855 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-03 15:25 | Success | - | |
|
exp_pytrain.20260503152124.213_20260503_152125
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-03 15:22 | Success | - | |
|
exp_self.20260503151414.854_20260503_151414
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260503151414.854 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-03 15:15 | Success | - | |
|
exp_self.20260503150633.853_20260503_150633
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260503150633.853 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-03 15:07 | Success | - | |
|
exp_self.20260503145845.852_20260503_145845
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260503145845.852 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-03 14:59 | Success | - | |
|
exp_self.20260503145114.851_20260503_145114
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260503145114.851 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-03 14:52 | Success | - | |
|
exp_pytrain.20260503144843.212_20260503_144844
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-03 14:49 | Success | - | |
|
exp_self.20260503144145.850_20260503_144146
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260503144145.850 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-03 14:42 | Success | - | |
|
exp_self.20260503143410.849_20260503_143410
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260503143410.849 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-03 14:35 | Success | - | |
|
exp_self.20260503142634.848_20260503_142635
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260503142634.848 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-03 14:27 | Success | - | |
|
exp_self.20260503141857.847_20260503_141857
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260503141857.847 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-03 14:20 | Success | - | |
|
exp_pytrain.20260503141622.211_20260503_141622
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-03 14:17 | Success | - | |
| exp_self.20260503140917.846_20260503_140917 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260503140917.846 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-03 14:10 | Success | - | |
| exp_self.20260503140147.845_20260503_140147 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260503140147.845 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-03 14:02 | Success | - | |
| exp_self.20260503135413.844_20260503_135414 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260503135413.844 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-03 13:55 | Success | - | |
| exp_self.20260503134643.843_20260503_134643 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260503134643.843 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-03 13:47 | Success | - | |
| exp_pytrain.20260503134412.210_20260503_134413 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-03 13:45 | Success | - | |
| exp_self.20260503133709.842_20260503_133710 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260503133709.842 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-03 13:38 | Success | - | |
| exp_self.20260503132940.841_20260503_132940 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260503132940.841 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-03 13:30 | Success | - | |
| exp_self.20260503132208.840_20260503_132209 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260503132208.840 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-03 13:23 | Success | - | |
| exp_self.20260503131436.839_20260503_131436 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260503131436.839 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-03 13:15 | Success | - | |
| exp_pytrain.20260503131204.209_20260503_131204 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-03 13:13 | Success | - | |
| exp_self.20260503130500.838_20260503_130500 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260503130500.838 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-03 13:06 | Success | - | |
| exp_self.20260503125732.837_20260503_125732 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260503125732.837 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-03 12:58 | Success | - | |
| exp_self.20260503125002.836_20260503_125003 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260503125002.836 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-03 12:51 | Success | - | |
| exp_self.20260503124230.835_20260503_124230 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260503124230.835 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-03 12:43 | Success | - | |
| exp_pytrain.20260503123957.208_20260503_123957 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-03 12:40 | Success | - | |
| exp_self.20260503123300.834_20260503_123300 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260503123300.834 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-03 12:34 | Success | - | |
| exp_self.20260503122529.833_20260503_122530 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260503122529.833 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-03 12:26 | Success | - | |
| exp_self.20260503121757.832_20260503_121757 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260503121757.832 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-03 12:19 | Success | - | |
| exp_self.20260503121026.831_20260503_121026 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260503121026.831 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-03 12:11 | Success | - | |
| exp_pytrain.20260503120749.207_20260503_120749 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-03 12:08 | Success | - | |
| exp_self.20260503120049.830_20260503_120049 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260503120049.830 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-03 12:01 | Success | - | |
| exp_self.20260503115317.829_20260503_115317 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260503115317.829 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-03 11:54 | Success | - | |
| exp_self.20260503114548.828_20260503_114549 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260503114548.828 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-03 11:46 | Success | - | |
| exp_self.20260503113818.827_20260503_113818 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260503113818.827 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-03 11:39 | Success | - | |
| exp_pytrain.20260503113537.206_20260503_113538 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-03 11:36 | Success | - | |
| exp_self.20260503112841.826_20260503_112841 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260503112841.826 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-03 11:29 | Success | - | |
| exp_self.20260503112111.825_20260503_112111 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260503112111.825 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-03 11:22 | Success | - | |
| exp_self.20260503111333.824_20260503_111333 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260503111333.824 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-03 11:14 | Success | - | |
| exp_self.20260503110600.823_20260503_110600 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260503110600.823 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-03 11:07 | Success | - | |
| exp_pytrain.20260503110322.205_20260503_110322 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-03 11:04 | Success | - | |
| exp_self.20260503105625.822_20260503_105626 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260503105625.822 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-03 10:57 | Success | - | |
| exp_self.20260503104847.821_20260503_104847 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260503104847.821 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-03 10:49 | Success | - | |
| exp_self.20260503104115.820_20260503_104115 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260503104115.820 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-03 10:42 | Success | - | |
| exp_self.20260503103341.819_20260503_103341 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260503103341.819 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-03 10:34 | Success | - | |
| exp_pytrain.20260503103106.204_20260503_103107 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-03 10:32 | Success | - | |
| exp_self.20260503102407.818_20260503_102408 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260503102407.818 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-03 10:25 | Success | - | |
| exp_self.20260503101627.817_20260503_101627 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260503101627.817 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-03 10:17 | Success | - | |
| exp_self.20260503100853.816_20260503_100854 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260503100853.816 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-03 10:09 | Success | - | |
| exp_self.20260503100123.815_20260503_100123 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260503100123.815 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-03 10:02 | Success | - | |
| exp_pytrain.20260503095852.203_20260503_095852 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-03 09:59 | Success | - | |
| exp_self.20260503095253.814_20260503_095253 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260503095253.814 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-03 09:53 | Success | - | |
| exp_self.20260503094519.813_20260503_094519 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260503094519.813 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-03 09:46 | Success | - | |
| exp_self.20260503093750.812_20260503_093750 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260503093750.812 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-03 09:38 | Success | - | |
| exp_self.20260503093010.811_20260503_093011 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260503093010.811 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-03 09:31 | Success | - | |
| exp_pytrain.20260503092729.202_20260503_092730 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-03 09:28 | Success | - | |
| exp_self.20260503092034.810_20260503_092034 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260503092034.810 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-03 09:21 | Success | - | |
| exp_self.20260503091246.809_20260503_091246 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260503091246.809 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-03 09:13 | Success | - | |
| exp_self.20260503090509.808_20260503_090509 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260503090509.808 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-03 09:06 | Success | - | |
| exp_self.20260503085736.807_20260503_085736 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260503085736.807 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-03 08:58 | Success | - | |
| exp_pytrain.20260503085454.201_20260503_085454 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-03 08:55 | Success | - | |
| exp_self.20260503084759.806_20260503_084759 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260503084759.806 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-03 08:49 | Success | - | |
| exp_self.20260503084020.805_20260503_084021 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260503084020.805 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-03 08:41 | Success | - | |
| exp_self.20260503083250.804_20260503_083251 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260503083250.804 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-03 08:33 | Success | - | |
| exp_self.20260503082521.803_20260503_082522 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260503082521.803 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-03 08:26 | Success | - | |
| exp_pytrain.20260503082247.200_20260503_082247 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-03 08:23 | Success | - | |
| exp_self.20260503081553.802_20260503_081553 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260503081553.802 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-03 08:16 | Success | - | |
| exp_self.20260503080816.801_20260503_080816 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260503080816.801 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-03 08:09 | Success | - | |
| exp_self.20260503080042.800_20260503_080043 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260503080042.800 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-03 08:01 | Success | - | |
| exp_self.20260503075313.799_20260503_075313 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260503075313.799 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-03 07:54 | Success | - | |
| exp_pytrain.20260503075039.199_20260503_075039 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-03 07:51 | Success | - | |
| exp_self.20260503074346.798_20260503_074347 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260503074346.798 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-03 07:44 | Success | - | |
| exp_gh_tamimmirza_hallueval_20260503_073921 | tamimmirza/hallueval<br>Paper ID: gh_tamimmirza_hallueval - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-03 07:40 | Success | - | |
| exp_self.20260503073607.797_20260503_073607 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260503073607.797 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-03 07:37 | Success | - | |
| exp_self.20260503072832.796_20260503_072833 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260503072832.796 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-03 07:29 | Success | - | |
| exp_self.20260503072055.795_20260503_072055 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260503072055.795 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-03 07:21 | Success | - | |
| exp_pytrain.20260503071821.198_20260503_071822 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-03 07:19 | Success | - | |
| exp_self.20260503071119.794_20260503_071119 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260503071119.794 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-03 07:12 | Success | - | |
| exp_self.20260503070351.793_20260503_070352 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260503070351.793 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-03 07:04 | Success | - | |
| exp_self.20260503065623.792_20260503_065624 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260503065623.792 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-03 06:57 | Success | - | |
| exp_self.20260503064841.791_20260503_064841 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260503064841.791 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-03 06:49 | Success | - | |
| exp_pytrain.20260503064606.197_20260503_064606 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-03 06:47 | Success | - | |
| exp_self.20260503063902.790_20260503_063902 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260503063902.790 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-03 06:40 | Success | - | |
| exp_self.20260503063127.789_20260503_063128 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260503063127.789 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-03 06:32 | Success | - | |
| exp_self.20260503062355.788_20260503_062355 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260503062355.788 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-03 06:24 | Success | - | |
| exp_self.20260503061625.787_20260503_061626 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260503061625.787 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-03 06:17 | Success | - | |
| exp_pytrain.20260503061339.196_20260503_061339 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-03 06:14 | Success | - | |
| exp_self.20260503060648.786_20260503_060648 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260503060648.786 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-03 06:07 | Success | - | |
| exp_self.20260503055921.785_20260503_055922 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260503055921.785 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-03 06:00 | Success | - | |
| exp_self.20260503055156.784_20260503_055156 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260503055156.784 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-03 05:52 | Success | - | |
| exp_self.20260503054428.783_20260503_054429 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260503054428.783 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-03 05:45 | Success | - | |
| exp_pytrain.20260503054158.195_20260503_054158 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-03 05:43 | Success | - | |
| exp_self.20260503053509.782_20260503_053509 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260503053509.782 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-03 05:36 | Success | - | |
| exp_self.20260503052738.781_20260503_052738 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260503052738.781 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-03 05:28 | Success | - | |
| exp_self.20260503052009.780_20260503_052010 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260503052009.780 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-03 05:21 | Success | - | |
| exp_self.20260503051242.779_20260503_051242 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260503051242.779 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-03 05:13 | Success | - | |
| exp_pytrain.20260503051011.194_20260503_051012 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-03 05:11 | Success | - | |
| exp_self.20260503050323.778_20260503_050323 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260503050323.778 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-03 05:04 | Success | - | |
| exp_self.20260503045550.777_20260503_045550 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260503045550.777 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-03 04:56 | Success | - | |
| exp_self.20260503044822.776_20260503_044823 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260503044822.776 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-03 04:49 | Success | - | |
| exp_self.20260503044056.775_20260503_044056 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260503044056.775 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-03 04:41 | Success | - | |
| exp_pytrain.20260503043829.193_20260503_043830 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-03 04:39 | Success | - | |
| exp_self.20260503043136.774_20260503_043137 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260503043136.774 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-03 04:32 | Success | - | |
| exp_self.20260503042411.773_20260503_042412 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260503042411.773 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-03 04:25 | Success | - | |
| exp_self.20260503041641.772_20260503_041642 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260503041641.772 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-03 04:17 | Success | - | |
| exp_self.20260503040915.771_20260503_040915 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260503040915.771 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-03 04:10 | Success | - | |
| exp_pytrain.20260503040650.192_20260503_040651 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-03 04:07 | Success | - | |
| exp_self.20260503035949.770_20260503_035949 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260503035949.770 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-03 04:00 | Success | - | |
| exp_self.20260503035220.769_20260503_035221 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260503035220.769 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-03 03:53 | Success | - | |
|
exp_self.20260503034448.768_20260503_034448
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260503034448.768 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-03 03:45 | Success | - | |
|
exp_self.20260503033717.767_20260503_033717
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260503033717.767 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-03 03:38 | Success | - | |
|
exp_pytrain.20260503033451.191_20260503_033451
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-03 03:35 | Success | - | |
|
exp_self.20260503032752.766_20260503_032752
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260503032752.766 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-03 03:28 | Success | - | |
|
exp_self.20260503032016.765_20260503_032016
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260503032016.765 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-03 03:21 | Success | - | |
|
exp_self.20260503031249.764_20260503_031249
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260503031249.764 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-03 03:13 | Success | - | |
|
exp_self.20260503030514.763_20260503_030514
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260503030514.763 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-03 03:06 | Success | - | |
|
exp_pytrain.20260503030247.190_20260503_030247
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-03 03:03 | Success | - | |
|
exp_self.20260503025546.762_20260503_025546
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260503025546.762 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-03 02:56 | Success | - | |
|
exp_self.20260503024822.761_20260503_024823
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260503024822.761 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-03 02:49 | Success | - | |
|
exp_self.20260503024057.760_20260503_024057
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260503024057.760 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-03 02:42 | Success | - | |
|
exp_self.20260503023326.759_20260503_023326
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260503023326.759 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-03 02:34 | Success | - | |
|
exp_pytrain.20260503023058.189_20260503_023058
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-03 02:32 | Success | - | |
|
exp_self.20260503022400.758_20260503_022400
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260503022400.758 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-03 02:25 | Success | - | |
|
exp_self.20260503021635.757_20260503_021636
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260503021635.757 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-03 02:17 | Success | - | |
|
exp_self.20260503020907.756_20260503_020908
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260503020907.756 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-03 02:10 | Success | - | |
|
exp_self.20260503020140.755_20260503_020141
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260503020140.755 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-03 02:02 | Success | - | |
|
exp_pytrain.20260503015909.188_20260503_015909
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-03 02:00 | Success | - | |
|
exp_self.20260503015210.754_20260503_015210
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260503015210.754 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-03 01:53 | Success | - | |
|
exp_self.20260503014442.753_20260503_014442
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260503014442.753 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-03 01:45 | Success | - | |
|
exp_self.20260503013720.752_20260503_013720
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260503013720.752 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-03 01:38 | Success | - | |
|
exp_self.20260503012953.751_20260503_012953
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260503012953.751 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-03 01:30 | Success | - | |
|
exp_pytrain.20260503012718.187_20260503_012718
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-03 01:28 | Success | - | |
|
exp_self.20260503012026.750_20260503_012026
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260503012026.750 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-03 01:21 | Success | - | |
|
exp_self.20260503011300.749_20260503_011300
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260503011300.749 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-03 01:14 | Success | - | |
|
exp_self.20260503010531.748_20260503_010531
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260503010531.748 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-03 01:06 | Success | - | |
|
exp_self.20260503005757.747_20260503_005758
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260503005757.747 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-03 00:59 | Success | - | |
|
exp_pytrain.20260503005526.186_20260503_005526
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-03 00:56 | Success | - | |
|
exp_self.20260503005005.746_20260503_005005
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260503005005.746 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-03 00:51 | Success | - | |
|
exp_gh_divyamhi_longbench-diagnostics_20260503_004652
|
divyamhi/longbench-diagnostics
Paper ID: gh_divyamhi_longbench-diagnostics - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal:...
|
05-03 00:47 | Success | - | |
|
exp_self.20260503004127.745_20260503_004127
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260503004127.745 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-03 00:42 | Success | - | |
|
exp_self.20260503003402.744_20260503_003403
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260503003402.744 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-03 00:35 | Success | - | |
|
exp_self.20260503002631.743_20260503_002632
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260503002631.743 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-03 00:27 | Success | - | |
|
exp_pytrain.20260503002405.185_20260503_002406
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-03 00:25 | Success | - | |
|
exp_self.20260503001706.742_20260503_001707
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260503001706.742 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-03 00:18 | Success | - | |
|
exp_self.20260503000932.741_20260503_000933
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260503000932.741 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-03 00:10 | Success | - | |
|
exp_self.20260503000118.740_20260503_000119
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260503000118.740 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-03 00:02 | Success | - | |
|
exp_self.20260502235356.739_20260502_235357
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260502235356.739 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-02 23:54 | Success | - | |
|
exp_pytrain.20260502235126.184_20260502_235126
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-02 23:52 | Success | - | |
|
exp_self.20260502234437.738_20260502_234438
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260502234437.738 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-02 23:45 | Success | - | |
|
exp_self.20260502233709.737_20260502_233709
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260502233709.737 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-02 23:38 | Success | - | |
|
exp_self.20260502232940.736_20260502_232941
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260502232940.736 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-02 23:30 | Success | - | |
|
exp_self.20260502232213.735_20260502_232214
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260502232213.735 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-02 23:23 | Success | - | |
|
exp_pytrain.20260502231947.183_20260502_231947
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-02 23:20 | Success | - | |
|
exp_self.20260502231247.734_20260502_231247
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260502231247.734 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-02 23:13 | Success | - | |
|
exp_self.20260502230522.733_20260502_230522
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260502230522.733 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-02 23:06 | Success | - | |
|
exp_self.20260502225752.732_20260502_225752
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260502225752.732 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-02 22:58 | Success | - | |
|
exp_self.20260502225016.731_20260502_225016
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260502225016.731 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-02 22:51 | Success | - | |
|
exp_pytrain.20260502224751.182_20260502_224752
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-02 22:48 | Success | - | |
|
exp_self.20260502224054.730_20260502_224055
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260502224054.730 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-02 22:41 | Success | - | |
|
exp_self.20260502223330.729_20260502_223331
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260502223330.729 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-02 22:34 | Success | - | |
|
exp_self.20260502222602.728_20260502_222602
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260502222602.728 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-02 22:27 | Success | - | |
|
exp_self.20260502221834.727_20260502_221834
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260502221834.727 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-02 22:19 | Success | - | |
|
exp_pytrain.20260502221608.181_20260502_221608
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-02 22:17 | Success | - | |
|
exp_self.20260502220916.726_20260502_220917
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260502220916.726 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-02 22:10 | Success | - | |
|
exp_self.20260502220153.725_20260502_220154
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260502220153.725 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-02 22:02 | Success | - | |
|
exp_self.20260502215429.724_20260502_215429
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260502215429.724 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-02 21:55 | Success | - | |
|
exp_self.20260502214702.723_20260502_214703
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260502214702.723 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-02 21:48 | Success | - | |
|
exp_pytrain.20260502214436.180_20260502_214436
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-02 21:45 | Success | - | |
|
exp_self.20260502213738.722_20260502_213738
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260502213738.722 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-02 21:38 | Success | - | |
|
exp_self.20260502213015.721_20260502_213015
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260502213015.721 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-02 21:31 | Success | - | |
|
exp_self.20260502212250.720_20260502_212250
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260502212250.720 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-02 21:23 | Success | - | |
|
exp_self.20260502211520.719_20260502_211521
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260502211520.719 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-02 21:16 | Success | - | |
|
exp_pytrain.20260502211253.179_20260502_211253
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-02 21:13 | Success | - | |
|
exp_self.20260502210555.718_20260502_210555
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260502210555.718 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-02 21:06 | Success | - | |
|
exp_self.20260502205826.717_20260502_205826
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260502205826.717 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-02 20:59 | Success | - | |
|
exp_self.20260502205103.716_20260502_205103
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260502205103.716 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-02 20:52 | Success | - | |
|
exp_self.20260502204338.715_20260502_204338
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260502204338.715 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-02 20:44 | Success | - | |
|
exp_pytrain.20260502204106.178_20260502_204106
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-02 20:42 | Success | - | |
|
exp_self.20260502203413.714_20260502_203413
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260502203413.714 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-02 20:35 | Success | - | |
|
exp_self.20260502202646.713_20260502_202647
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260502202646.713 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-02 20:27 | Success | - | |
|
exp_self.20260502201924.712_20260502_201925
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260502201924.712 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-02 20:20 | Success | - | |
|
exp_self.20260502201201.711_20260502_201201
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260502201201.711 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-02 20:13 | Success | - | |
|
exp_pytrain.20260502200929.177_20260502_200929
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-02 20:10 | Success | - | |
|
exp_self.20260502200239.710_20260502_200239
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260502200239.710 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-02 20:03 | Success | - | |
|
exp_self.20260502195512.709_20260502_195513
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260502195512.709 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-02 19:56 | Success | - | |
|
exp_self.20260502194746.708_20260502_194747
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260502194746.708 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-02 19:48 | Success | - | |
|
exp_self.20260502194023.707_20260502_194023
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260502194023.707 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-02 19:41 | Success | - | |
|
exp_pytrain.20260502193752.176_20260502_193753
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-02 19:38 | Success | - | |
|
exp_self.20260502193104.706_20260502_193105
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260502193104.706 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-02 19:32 | Success | - | |
|
exp_self.20260502192337.705_20260502_192337
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260502192337.705 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-02 19:24 | Success | - | |
|
exp_self.20260502191610.704_20260502_191611
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260502191610.704 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-02 19:17 | Success | - | |
|
exp_self.20260502190847.703_20260502_190847
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260502190847.703 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-02 19:09 | Success | - | |
|
exp_pytrain.20260502190617.175_20260502_190617
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-02 19:07 | Success | - | |
|
exp_self.20260502185928.702_20260502_185928
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260502185928.702 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-02 19:00 | Success | - | |
|
exp_self.20260502185156.701_20260502_185156
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260502185156.701 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-02 18:52 | Success | - | |
|
exp_self.20260502184430.700_20260502_184430
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260502184430.700 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-02 18:45 | Success | - | |
|
exp_self.20260502183704.699_20260502_183704
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260502183704.699 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-02 18:38 | Success | - | |
|
exp_pytrain.20260502183438.174_20260502_183438
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-02 18:35 | Success | - | |
|
exp_self.20260502182731.698_20260502_182731
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260502182731.698 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-02 18:28 | Success | - | |
|
exp_self.20260502181942.697_20260502_181942
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260502181942.697 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-02 18:20 | Success | - | |
|
exp_self.20260502181200.696_20260502_181200
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260502181200.696 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-02 18:13 | Success | - | |
|
exp_self.20260502180420.695_20260502_180421
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260502180420.695 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-02 18:05 | Success | - | |
|
exp_pytrain.20260502180150.173_20260502_180151
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-02 18:02 | Success | - | |
|
exp_self.20260502175434.694_20260502_175434
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260502175434.694 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-02 17:55 | Success | - | |
|
exp_self.20260502174702.693_20260502_174702
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260502174702.693 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-02 17:48 | Success | - | |
|
exp_self.20260502173924.692_20260502_173924
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260502173924.692 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-02 17:40 | Success | - | |
|
exp_self.20260502173148.691_20260502_173148
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260502173148.691 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-02 17:32 | Success | - | |
|
exp_pytrain.20260502172915.172_20260502_172915
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-02 17:30 | Success | - | |
|
exp_self.20260502172345.690_20260502_172346
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260502172345.690 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-02 17:24 | Success | - | |
|
exp_self.20260502171604.689_20260502_171605
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260502171604.689 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-02 17:17 | Success | - | |
|
exp_self.20260502170823.688_20260502_170823
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260502170823.688 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-02 17:09 | Success | - | |
|
exp_self.20260502170029.687_20260502_170029
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260502170029.687 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-02 17:01 | Success | - | |
|
exp_pytrain.20260502165745.171_20260502_165745
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-02 16:58 | Success | - | |
|
exp_self.20260502165049.686_20260502_165050
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260502165049.686 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-02 16:51 | Success | - | |
|
exp_self.20260502164311.685_20260502_164311
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260502164311.685 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-02 16:44 | Success | - | |
|
exp_self.20260502163538.684_20260502_163539
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260502163538.684 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-02 16:36 | Success | - | |
|
exp_self.20260502162807.683_20260502_162807
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260502162807.683 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-02 16:29 | Success | - | |
|
exp_pytrain.20260502162532.170_20260502_162532
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-02 16:26 | Success | - | |
|
exp_self.20260502161839.682_20260502_161839
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260502161839.682 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-02 16:19 | Success | - | |
|
exp_self.20260502161110.681_20260502_161111
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260502161110.681 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-02 16:12 | Success | - | |
|
exp_self.20260502160339.680_20260502_160340
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260502160339.680 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-02 16:04 | Success | - | |
|
exp_self.20260502155607.679_20260502_155608
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260502155607.679 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-02 15:57 | Success | - | |
|
exp_pytrain.20260502155334.169_20260502_155334
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-02 15:54 | Success | - | |
|
exp_self.20260502154635.678_20260502_154635
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260502154635.678 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-02 15:47 | Success | - | |
|
exp_self.20260502153910.677_20260502_153910
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260502153910.677 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-02 15:40 | Success | - | |
|
exp_self.20260502153142.676_20260502_153143
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260502153142.676 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-02 15:32 | Success | - | |
|
exp_self.20260502152407.675_20260502_152407
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260502152407.675 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-02 15:25 | Success | - | |
|
exp_pytrain.20260502152143.168_20260502_152143
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-02 15:22 | Success | - | |
|
exp_self.20260502151451.674_20260502_151452
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260502151451.674 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-02 15:15 | Success | - | |
|
exp_self.20260502150718.673_20260502_150718
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260502150718.673 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-02 15:08 | Success | - | |
|
exp_self.20260502145936.672_20260502_145937
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260502145936.672 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-02 15:00 | Success | - | |
|
exp_self.20260502145203.671_20260502_145204
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260502145203.671 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-02 14:53 | Success | - | |
|
exp_pytrain.20260502144933.167_20260502_144934
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-02 14:50 | Success | - | |
|
exp_self.20260502144241.670_20260502_144241
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260502144241.670 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-02 14:43 | Success | - | |
|
exp_self.20260502143511.669_20260502_143511
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260502143511.669 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-02 14:36 | Success | - | |
|
exp_self.20260502142738.668_20260502_142738
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260502142738.668 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-02 14:28 | Success | - | |
|
exp_self.20260502142003.667_20260502_142004
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260502142003.667 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-02 14:21 | Success | - | |
|
exp_pytrain.20260502141737.166_20260502_141738
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-02 14:18 | Success | - | |
|
exp_self.20260502141038.666_20260502_141038
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260502141038.666 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-02 14:11 | Success | - | |
|
exp_self.20260502140315.665_20260502_140315
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260502140315.665 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-02 14:04 | Success | - | |
|
exp_self.20260502135545.664_20260502_135545
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260502135545.664 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-02 13:56 | Success | - | |
|
exp_self.20260502134809.663_20260502_134809
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260502134809.663 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-02 13:49 | Success | - | |
|
exp_pytrain.20260502134535.165_20260502_134536
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-02 13:46 | Success | - | |
|
exp_self.20260502133830.662_20260502_133831
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260502133830.662 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-02 13:39 | Success | - | |
|
exp_self.20260502133059.661_20260502_133059
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260502133059.661 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-02 13:32 | Success | - | |
|
exp_self.20260502132326.660_20260502_132326
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260502132326.660 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-02 13:24 | Success | - | |
|
exp_self.20260502131555.659_20260502_131555
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260502131555.659 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-02 13:16 | Success | - | |
|
exp_pytrain.20260502131317.164_20260502_131317
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-02 13:14 | Success | - | |
|
exp_self.20260502130614.658_20260502_130615
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260502130614.658 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-02 13:07 | Success | - | |
|
exp_self.20260502125843.657_20260502_125843
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260502125843.657 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-02 12:59 | Success | - | |
|
exp_self.20260502125058.656_20260502_125058
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260502125058.656 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-02 12:52 | Success | - | |
|
exp_self.20260502124327.655_20260502_124327
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260502124327.655 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-02 12:44 | Success | - | |
|
exp_pytrain.20260502124051.163_20260502_124051
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-02 12:41 | Success | - | |
|
exp_self.20260502123355.654_20260502_123355
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260502123355.654 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-02 12:34 | Success | - | |
|
exp_self.20260502122623.653_20260502_122624
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260502122623.653 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-02 12:27 | Success | - | |
|
exp_self.20260502121855.652_20260502_121856
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260502121855.652 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-02 12:19 | Success | - | |
|
exp_self.20260502121133.651_20260502_121133
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260502121133.651 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-02 12:12 | Success | - | |
|
exp_pytrain.20260502120902.162_20260502_120902
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-02 12:10 | Success | - | |
|
exp_self.20260502120215.650_20260502_120215
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260502120215.650 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-02 12:03 | Success | - | |
|
exp_self.20260502115447.649_20260502_115448
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260502115447.649 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-02 11:55 | Success | - | |
|
exp_gh_mcarbonell_supermario-optimizer_20260502_115135
|
mcarbonell/supermario-optimizer
Paper ID: gh_mcarbonell_supermario-optimizer - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal:...
|
05-02 11:52 | Success | - | |
|
exp_self.20260502114719.648_20260502_114719
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260502114719.648 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-02 11:48 | Success | - | |
|
exp_self.20260502113935.647_20260502_113936
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260502113935.647 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-02 11:40 | Success | - | |
|
exp_pytrain.20260502113709.161_20260502_113710
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-02 11:38 | Success | - | |
|
exp_self.20260502113005.646_20260502_113005
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260502113005.646 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-02 11:31 | Success | - | |
|
exp_self.20260502112242.645_20260502_112242
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260502112242.645 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-02 11:23 | Success | - | |
|
exp_self.20260502111517.644_20260502_111517
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260502111517.644 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-02 11:16 | Success | - | |
|
exp_self.20260502110748.643_20260502_110749
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260502110748.643 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-02 11:08 | Success | - | |
|
exp_pytrain.20260502110519.160_20260502_110520
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-02 11:06 | Success | - | |
|
exp_self.20260502105818.642_20260502_105819
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260502105818.642 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-02 10:59 | Success | - | |
|
exp_self.20260502105049.641_20260502_105049
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260502105049.641 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-02 10:51 | Success | - | |
|
exp_self.20260502104325.640_20260502_104326
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260502104325.640 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-02 10:44 | Success | - | |
|
exp_self.20260502103600.639_20260502_103600
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260502103600.639 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-02 10:37 | Success | - | |
|
exp_pytrain.20260502103329.159_20260502_103329
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-02 10:34 | Success | - | |
|
exp_self.20260502102638.638_20260502_102639
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260502102638.638 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-02 10:27 | Success | - | |
|
exp_self.20260502101913.637_20260502_101913
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260502101913.637 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-02 10:20 | Success | - | |
|
exp_self.20260502101133.636_20260502_101134
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260502101133.636 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-02 10:12 | Success | - | |
|
exp_self.20260502100405.635_20260502_100405
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260502100405.635 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-02 10:05 | Success | - | |
|
exp_pytrain.20260502100133.158_20260502_100133
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-02 10:02 | Success | - | |
|
exp_self.20260502095441.634_20260502_095442
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260502095441.634 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-02 09:55 | Success | - | |
|
exp_self.20260502094716.633_20260502_094716
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260502094716.633 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-02 09:48 | Success | - | |
|
exp_self.20260502093949.632_20260502_093949
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260502093949.632 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-02 09:40 | Success | - | |
|
exp_self.20260502093226.631_20260502_093226
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260502093226.631 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-02 09:33 | Success | - | |
| exp_pytrain.20260502092955.157_20260502_092955 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-02 09:30 | Success | - | |
| exp_self.20260502092305.630_20260502_092305 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260502092305.630 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-02 09:24 | Success | - | |
| exp_self.20260502091537.629_20260502_091538 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260502091537.629 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-02 09:16 | Success | - | |
| exp_self.20260502090813.628_20260502_090813 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260502090813.628 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-02 09:09 | Success | - | |
| exp_self.20260502090047.627_20260502_090047 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260502090047.627 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-02 09:01 | Success | - | |
| exp_pytrain.20260502085817.156_20260502_085818 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-02 08:59 | Success | - | |
| exp_self.20260502085131.626_20260502_085132 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260502085131.626 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-02 08:52 | Success | - | |
| exp_self.20260502084403.625_20260502_084403 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260502084403.625 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-02 08:45 | Success | - | |
| exp_self.20260502083638.624_20260502_083638 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260502083638.624 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-02 08:37 | Success | - | |
| exp_self.20260502082911.623_20260502_082911 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260502082911.623 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-02 08:30 | Success | - | |
| exp_pytrain.20260502082641.155_20260502_082642 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-02 08:27 | Success | - | |
| exp_self.20260502081947.622_20260502_081947 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260502081947.622 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-02 08:20 | Success | - | |
| exp_self.20260502081218.621_20260502_081218 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260502081218.621 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-02 08:13 | Success | - | |
| exp_self.20260502080443.620_20260502_080443 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260502080443.620 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-02 08:05 | Success | - | |
| exp_self.20260502075711.619_20260502_075711 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260502075711.619 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-02 07:58 | Success | - | |
| exp_pytrain.20260502075443.154_20260502_075443 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-02 07:55 | Success | - | |
| exp_self.20260502074743.618_20260502_074744 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260502074743.618 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-02 07:48 | Success | - | |
| exp_self.20260502074020.617_20260502_074020 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260502074020.617 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-02 07:41 | Success | - | |
| exp_self.20260502073252.616_20260502_073253 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260502073252.616 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-02 07:33 | Success | - | |
| exp_self.20260502072527.615_20260502_072527 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260502072527.615 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-02 07:26 | Success | - | |
| exp_pytrain.20260502072302.153_20260502_072302 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-02 07:24 | Success | - | |
| exp_self.20260502071604.614_20260502_071604 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260502071604.614 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-02 07:17 | Success | - | |
| exp_self.20260502070841.613_20260502_070841 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260502070841.613 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-02 07:09 | Success | - | |
| exp_self.20260502070115.612_20260502_070115 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260502070115.612 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-02 07:02 | Success | - | |
| exp_self.20260502065339.611_20260502_065340 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260502065339.611 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-02 06:54 | Success | - | |
| exp_pytrain.20260502065106.152_20260502_065107 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-02 06:52 | Success | - | |
| exp_gh_airdropkalami_awesome-gpu-for-llm_20260502_064816 | airdropkalami/awesome-gpu-for-llm<br>Paper ID: gh_airdropkalami_awesome-gpu-for-llm - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signa... | 05-02 06:49 | Success | - | |
| exp_self.20260502064458.610_20260502_064458 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260502064458.610 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-02 06:46 | Success | - | |
| exp_self.20260502063727.609_20260502_063727 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260502063727.609 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-02 06:38 | Success | - | |
| exp_self.20260502062955.608_20260502_062956 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260502062955.608 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-02 06:30 | Success | - | |
| exp_self.20260502062212.607_20260502_062213 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260502062212.607 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-02 06:23 | Success | - | |
| exp_pytrain.20260502061933.151_20260502_061933 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-02 06:20 | Success | - | |
| exp_self.20260502061224.606_20260502_061225 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260502061224.606 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-02 06:13 | Success | - | |
| exp_self.20260502060449.605_20260502_060450 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260502060449.605 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-02 06:05 | Success | - | |
| exp_self.20260502055718.604_20260502_055718 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260502055718.604 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-02 05:58 | Success | - | |
| exp_self.20260502054945.603_20260502_054945 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260502054945.603 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-02 05:50 | Success | - | |
| exp_pytrain.20260502054712.150_20260502_054712 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-02 05:48 | Success | - | |
| exp_self.20260502054005.602_20260502_054005 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260502054005.602 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-02 05:41 | Success | - | |
| exp_self.20260502053235.601_20260502_053235 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260502053235.601 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-02 05:33 | Success | - | |
| exp_self.20260502052508.600_20260502_052509 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260502052508.600 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-02 05:26 | Success | - | |
| exp_self.20260502051738.599_20260502_051739 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260502051738.599 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-02 05:18 | Success | - | |
| exp_pytrain.20260502051507.149_20260502_051507 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-02 05:16 | Success | - | |
| exp_self.20260502050805.598_20260502_050805 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260502050805.598 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-02 05:09 | Success | - | |
| exp_self.20260502050031.597_20260502_050032 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260502050031.597 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-02 05:01 | Success | - | |
| exp_self.20260502045254.596_20260502_045254 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260502045254.596 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-02 04:53 | Success | - | |
| exp_self.20260502044524.595_20260502_044525 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260502044524.595 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-02 04:46 | Success | - | |
| exp_pytrain.20260502044246.148_20260502_044247 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-02 04:43 | Success | - | |
| exp_self.20260502043552.594_20260502_043553 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260502043552.594 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-02 04:36 | Success | - | |
| exp_self.20260502042826.593_20260502_042826 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260502042826.593 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-02 04:29 | Success | - | |
| exp_self.20260502042102.592_20260502_042102 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260502042102.592 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-02 04:22 | Success | - | |
| exp_self.20260502041354.591_20260502_041354 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260502041354.591 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-02 04:14 | Success | - | |
| exp_pytrain.20260502041129.147_20260502_041129 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-02 04:12 | Success | - | |
| exp_self.20260502040430.590_20260502_040430 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260502040430.590 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-02 04:05 | Success | - | |
| exp_self.20260502035705.589_20260502_035705 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260502035705.589 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-02 03:58 | Success | - | |
| exp_self.20260502034941.588_20260502_034941 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260502034941.588 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-02 03:50 | Success | - | |
| exp_self.20260502034212.587_20260502_034212 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260502034212.587 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-02 03:43 | Success | - | |
| exp_pytrain.20260502033947.146_20260502_033947 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-02 03:40 | Success | - | |
| exp_self.20260502033246.586_20260502_033246 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260502033246.586 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-02 03:33 | Success | - | |
| exp_self.20260502032524.585_20260502_032524 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260502032524.585 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-02 03:26 | Success | - | |
| exp_self.20260502031759.584_20260502_031800 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260502031759.584 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-02 03:19 | Success | - | |
| exp_self.20260502031030.583_20260502_031030 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260502031030.583 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-02 03:11 | Success | - | |
| exp_pytrain.20260502030802.145_20260502_030802 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-02 03:09 | Success | - | |
| exp_self.20260502030104.582_20260502_030104 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260502030104.582 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-02 03:02 | Success | - | |
| exp_self.20260502025341.581_20260502_025342 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260502025341.581 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-02 02:54 | Success | - | |
| exp_self.20260502024618.580_20260502_024618 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260502024618.580 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-02 02:47 | Success | - | |
| exp_self.20260502023853.579_20260502_023854 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260502023853.579 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-02 02:39 | Success | - | |
| exp_pytrain.20260502023621.144_20260502_023621 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-02 02:37 | Success | - | |
| exp_self.20260502023104.578_20260502_023104 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260502023104.578 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-02 02:32 | Success | - | |
| exp_gh_cryptopoly_ChaosEngineAI_20260502_022642 | cryptopoly/ChaosEngineAI<br>Paper ID: gh_cryptopoly_ChaosEngineAI - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recove... | 05-02 02:27 | Success | - | |
| exp_self.20260502022329.577_20260502_022329 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260502022329.577 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-02 02:24 | Success | - | |
| exp_self.20260502021605.576_20260502_021605 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260502021605.576 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-02 02:17 | Success | - | |
| exp_self.20260502020836.575_20260502_020837 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260502020836.575 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-02 02:09 | Success | - | |
| exp_pytrain.20260502020459.143_20260502_020500 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-02 02:06 | Success | - | |
| exp_self.20260502020046.574_20260502_020046 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260502020046.574 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-02 02:01 | Success | - | |
| exp_self.20260502015322.573_20260502_015322 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260502015322.573 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-02 01:54 | Success | - | |
| exp_self.20260502014548.572_20260502_014548 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260502014548.572 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-02 01:46 | Success | - | |
| exp_self.20260502013820.571_20260502_013820 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260502013820.571 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-02 01:39 | Success | - | |
| exp_pytrain.20260502013337.142_20260502_013337 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-02 01:34 | Success | - | |
| exp_self.20260502013139.570_20260502_013139 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260502013139.570 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-02 01:32 | Success | - | |
| exp_self.20260502012415.569_20260502_012415 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260502012415.569 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-02 01:25 | Success | - | |
| exp_self.20260502011646.568_20260502_011646 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260502011646.568 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-02 01:17 | Success | - | |
| exp_self.20260502010919.567_20260502_010920 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260502010919.567 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-02 01:10 | Success | - | |
| exp_self.20260502010237.566_20260502_010238 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260502010237.566 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-02 01:03 | Success | - | |
| exp_pytrain.20260502010011.141_20260502_010011 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-02 01:01 | Success | - | |
| exp_self.20260502005311.565_20260502_005312 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260502005311.565 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-02 00:54 | Success | - | |
| exp_self.20260502004550.564_20260502_004550 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260502004550.564 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-02 00:46 | Success | - | |
| exp_self.20260502003815.563_20260502_003816 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260502003815.563 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-02 00:39 | Success | - | |
| exp_self.20260502003049.562_20260502_003049 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260502003049.562 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-02 00:31 | Success | - | |
| exp_pytrain.20260502002821.140_20260502_002821 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-02 00:29 | Success | - | |
| exp_self.20260502002129.561_20260502_002129 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260502002129.561 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-02 00:22 | Success | - | |
| exp_self.20260502001402.560_20260502_001402 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260502001402.560 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-02 00:15 | Success | - | |
| exp_self.20260502000638.559_20260502_000638 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260502000638.559 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-02 00:07 | Success | - | |
| exp_self.20260501235902.558_20260501_235902 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260501235902.558 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-02 00:00 | Success | - | |
| exp_pytrain.20260501235632.139_20260501_235632 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-01 23:57 | Success | - | |
| exp_self.20260501234941.557_20260501_234941 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260501234941.557 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-01 23:50 | Success | - | |
| exp_self.20260501234216.556_20260501_234217 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260501234216.556 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-01 23:43 | Success | - | |
| exp_self.20260501233454.555_20260501_233455 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260501233454.555 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-01 23:35 | Success | - | |
| exp_self.20260501232731.554_20260501_232731 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260501232731.554 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-01 23:28 | Success | - | |
|
exp_pytrain.20260501232459.138_20260501_232459
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-01 23:26 | Success | - | |
|
exp_self.20260501231808.553_20260501_231808
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260501231808.553 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-01 23:19 | Success | - | |
|
exp_self.20260501231039.552_20260501_231039
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260501231039.552 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-01 23:11 | Success | - | |
|
exp_self.20260501230316.551_20260501_230316
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260501230316.551 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-01 23:04 | Success | - | |
|
exp_self.20260501225553.550_20260501_225554
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260501225553.550 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-01 22:56 | Success | - | |
|
exp_pytrain.20260501225321.137_20260501_225321
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-01 22:54 | Success | - | |
|
exp_self.20260501224633.549_20260501_224633
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260501224633.549 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-01 22:47 | Success | - | |
|
exp_self.20260501223904.548_20260501_223904
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260501223904.548 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-01 22:40 | Success | - | |
|
exp_self.20260501223139.547_20260501_223139
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260501223139.547 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-01 22:32 | Success | - | |
|
exp_self.20260501222416.546_20260501_222417
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260501222416.546 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-01 22:25 | Success | - | |
|
exp_pytrain.20260501222146.136_20260501_222147
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-01 22:22 | Success | - | |
|
exp_self.20260501221500.545_20260501_221500
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260501221500.545 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-01 22:16 | Success | - | |
|
exp_self.20260501220729.544_20260501_220729
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260501220729.544 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-01 22:08 | Success | - | |
|
exp_self.20260501220004.543_20260501_220004
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260501220004.543 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-01 22:01 | Success | - | |
|
exp_self.20260501215239.542_20260501_215240
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260501215239.542 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-01 21:53 | Success | - | |
|
exp_pytrain.20260501215015.135_20260501_215015
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-01 21:51 | Success | - | |
|
exp_self.20260501214324.541_20260501_214324
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260501214324.541 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-01 21:44 | Success | - | |
|
exp_self.20260501213557.540_20260501_213558
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260501213557.540 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-01 21:37 | Success | - | |
|
exp_self.20260501212829.539_20260501_212829
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260501212829.539 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-01 21:29 | Success | - | |
|
exp_self.20260501212103.538_20260501_212103
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260501212103.538 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-01 21:22 | Success | - | |
|
exp_pytrain.20260501211839.134_20260501_211839
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-01 21:19 | Success | - | |
|
exp_gh_Pearlfisheryjersey8695_kalshiquant_20260501_211555
|
Pearlfisheryjersey8695/kalshiquant
Paper ID: gh_Pearlfisheryjersey8695_kalshiquant - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Sign...
|
05-01 21:16 | Success | - | |
|
exp_self.20260501211028.537_20260501_211029
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260501211028.537 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-01 21:11 | Success | - | |
|
exp_self.20260501210306.536_20260501_210306
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260501210306.536 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-01 21:04 | Success | - | |
|
exp_self.20260501205539.535_20260501_205540
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260501205539.535 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-01 20:56 | Success | - | |
|
exp_self.20260501204816.534_20260501_204816
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260501204816.534 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-01 20:49 | Success | - | |
|
exp_pytrain.20260501204551.133_20260501_204551
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-01 20:46 | Success | - | |
|
exp_self.20260501203854.533_20260501_203854
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260501203854.533 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-01 20:39 | Success | - | |
|
exp_self.20260501203129.532_20260501_203130
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260501203129.532 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-01 20:32 | Success | - | |
|
exp_self.20260501202406.531_20260501_202406
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260501202406.531 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-01 20:25 | Success | - | |
|
exp_self.20260501201633.530_20260501_201633
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260501201633.530 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-01 20:17 | Success | - | |
|
exp_pytrain.20260501201407.132_20260501_201408
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-01 20:15 | Success | - | |
|
exp_self.20260501200709.529_20260501_200710
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260501200709.529 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-01 20:08 | Success | - | |
|
exp_self.20260501195944.528_20260501_195944
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260501195944.528 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-01 20:00 | Success | - | |
|
exp_self.20260501195221.527_20260501_195222
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260501195221.527 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-01 19:53 | Success | - | |
|
exp_self.20260501194453.526_20260501_194453
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260501194453.526 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-01 19:45 | Success | - | |
|
exp_pytrain.20260501194226.131_20260501_194226
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-01 19:43 | Success | - | |
|
exp_self.20260501193535.525_20260501_193535
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260501193535.525 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-01 19:36 | Success | - | |
|
exp_self.20260501192811.524_20260501_192811
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260501192811.524 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-01 19:29 | Success | - | |
|
exp_self.20260501192047.523_20260501_192047
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260501192047.523 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-01 19:21 | Success | - | |
|
exp_self.20260501191322.522_20260501_191322
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260501191322.522 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-01 19:14 | Success | - | |
|
exp_pytrain.20260501191050.130_20260501_191051
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-01 19:11 | Success | - | |
|
exp_self.20260501190354.521_20260501_190355
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260501190354.521 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-01 19:04 | Success | - | |
|
exp_self.20260501185626.520_20260501_185627
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260501185626.520 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-01 18:57 | Success | - | |
|
exp_self.20260501184858.519_20260501_184858
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260501184858.519 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-01 18:50 | Success | - | |
|
exp_self.20260501184129.518_20260501_184129
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260501184129.518 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-01 18:42 | Success | - | |
|
exp_pytrain.20260501183854.129_20260501_183855
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-01 18:39 | Success | - | |
|
exp_self.20260501183154.517_20260501_183154
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260501183154.517 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-01 18:32 | Success | - | |
|
exp_self.20260501182421.516_20260501_182421
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260501182421.516 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-01 18:25 | Success | - | |
|
exp_self.20260501181633.515_20260501_181633
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260501181633.515 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-01 18:17 | Success | - | |
|
exp_self.20260501180900.514_20260501_180901
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260501180900.514 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-01 18:10 | Success | - | |
|
exp_pytrain.20260501180610.128_20260501_180610
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-01 18:07 | Success | - | |
|
exp_self.20260501175917.513_20260501_175918
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260501175917.513 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-01 18:00 | Success | - | |
|
exp_self.20260501175141.512_20260501_175142
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260501175141.512 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-01 17:52 | Success | - | |
|
exp_self.20260501174359.511_20260501_174359
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260501174359.511 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-01 17:45 | Success | - | |
|
exp_self.20260501173631.510_20260501_173631
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260501173631.510 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-01 17:37 | Success | - | |
|
exp_pytrain.20260501173356.127_20260501_173357
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-01 17:34 | Success | - | |
|
exp_self.20260501172705.509_20260501_172705
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260501172705.509 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-01 17:28 | Success | - | |
|
exp_self.20260501171932.508_20260501_171932
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260501171932.508 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-01 17:20 | Success | - | |
|
exp_self.20260501171202.507_20260501_171202
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260501171202.507 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-01 17:13 | Success | - | |
|
exp_self.20260501170425.506_20260501_170426
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260501170425.506 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-01 17:05 | Success | - | |
|
exp_pytrain.20260501170155.126_20260501_170155
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-01 17:02 | Success | - | |
|
exp_self.20260501165450.505_20260501_165450
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260501165450.505 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-01 16:55 | Success | - | |
|
exp_self.20260501164656.504_20260501_164657
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260501164656.504 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-01 16:47 | Success | - | |
|
exp_self.20260501163922.503_20260501_163922
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260501163922.503 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-01 16:40 | Success | - | |
|
exp_self.20260501163154.502_20260501_163154
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260501163154.502 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-01 16:32 | Success | - | |
|
exp_pytrain.20260501162929.125_20260501_162930
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-01 16:30 | Success | - | |
|
exp_self.20260501162231.501_20260501_162231
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260501162231.501 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-01 16:23 | Success | - | |
|
exp_self.20260501161506.500_20260501_161506
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260501161506.500 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-01 16:16 | Success | - | |
|
exp_self.20260501160737.499_20260501_160737
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260501160737.499 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-01 16:08 | Success | - | |
|
exp_self.20260501155947.498_20260501_155947
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260501155947.498 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-01 16:00 | Success | - | |
|
exp_pytrain.20260501155717.124_20260501_155717
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-01 15:58 | Success | - | |
|
exp_self.20260501155013.497_20260501_155014
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260501155013.497 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-01 15:51 | Success | - | |
|
exp_self.20260501154243.496_20260501_154244
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260501154243.496 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-01 15:43 | Success | - | |
|
exp_self.20260501153512.495_20260501_153512
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260501153512.495 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-01 15:36 | Success | - | |
|
exp_self.20260501152738.494_20260501_152739
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260501152738.494 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-01 15:28 | Success | - | |
|
exp_pytrain.20260501152509.123_20260501_152510
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-01 15:26 | Success | - | |
|
exp_self.20260501151806.493_20260501_151806
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260501151806.493 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-01 15:19 | Success | - | |
|
exp_self.20260501151038.492_20260501_151038
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260501151038.492 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-01 15:11 | Success | - | |
|
exp_self.20260501150306.491_20260501_150307
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260501150306.491 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-01 15:04 | Success | - | |
|
exp_self.20260501145532.490_20260501_145532
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260501145532.490 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-01 14:56 | Success | - | |
|
exp_pytrain.20260501145301.122_20260501_145301
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-01 14:54 | Success | - | |
|
exp_self.20260501144600.489_20260501_144601
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260501144600.489 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-01 14:47 | Success | - | |
|
exp_hf_2604.24954_20260501_144136
|
Nemotron 3 Nano Omni: Efficient and Open Multimodal Intelligence
Paper ID: hf_2604.24954 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
05-01 14:42 | Success | - | |
|
exp_self.20260501143826.488_20260501_143826
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260501143826.488 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-01 14:39 | Success | - | |
|
exp_self.20260501143051.487_20260501_143052
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260501143051.487 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-01 14:31 | Success | - | |
| exp_self.20260501142327.486_20260501_142328 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260501142327.486 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-01 14:24 | Success | - | |
| exp_pytrain.20260501142102.121_20260501_142103 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-01 14:22 | Success | - | |
| exp_self.20260501141411.485_20260501_141411 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260501141411.485 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-01 14:15 | Success | - | |
| exp_self.20260501140645.484_20260501_140645 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260501140645.484 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-01 14:07 | Success | - | |
| exp_self.20260501135915.483_20260501_135916 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260501135915.483 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-01 14:00 | Success | - | |
| exp_self.20260501135149.482_20260501_135150 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260501135149.482 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-01 13:52 | Success | - | |
| exp_pytrain.20260501134925.120_20260501_134925 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-01 13:50 | Success | - | |
| exp_self.20260501134227.481_20260501_134227 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260501134227.481 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-01 13:43 | Success | - | |
| exp_self.20260501133501.480_20260501_133501 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260501133501.480 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-01 13:36 | Success | - | |
| exp_self.20260501132732.479_20260501_132732 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260501132732.479 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-01 13:28 | Success | - | |
| exp_self.20260501132001.478_20260501_132002 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260501132001.478 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-01 13:21 | Success | - | |
| exp_pytrain.20260501131733.119_20260501_131734 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-01 13:18 | Success | - | |
| exp_self.20260501131032.477_20260501_131033 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260501131032.477 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-01 13:11 | Success | - | |
| exp_self.20260501130305.476_20260501_130305 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260501130305.476 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-01 13:04 | Success | - | |
| exp_self.20260501125536.475_20260501_125536 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260501125536.475 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-01 12:56 | Success | - | |
| exp_self.20260501124805.474_20260501_124805 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260501124805.474 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-01 12:49 | Success | - | |
| exp_pytrain.20260501124538.118_20260501_124538 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-01 12:46 | Success | - | |
| exp_self.20260501124018.473_20260501_124018 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260501124018.473 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-01 12:41 | Success | - | |
| exp_hf_2604.27151_20260501_123658 | Step-level Optimization for Efficient Computer-use Agents<br>Paper ID: hf_2604.27151 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark... | 05-01 12:38 | Success | - | |
| exp_self.20260501123130.472_20260501_123131 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260501123130.472 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-01 12:32 | Success | - | |
| exp_self.20260501122400.471_20260501_122401 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260501122400.471 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-01 12:25 | Success | - | |
| exp_self.20260501121634.470_20260501_121634 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260501121634.470 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-01 12:17 | Success | - | |
| exp_pytrain.20260501121401.117_20260501_121402 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-01 12:15 | Success | - | |
| exp_self.20260501120711.469_20260501_120711 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260501120711.469 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-01 12:08 | Success | - | |
| exp_self.20260501115936.468_20260501_115936 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260501115936.468 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-01 12:00 | Success | - | |
| exp_self.20260501115207.467_20260501_115208 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260501115207.467 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-01 11:53 | Success | - | |
| exp_self.20260501114438.466_20260501_114438 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260501114438.466 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-01 11:45 | Success | - | |
| exp_pytrain.20260501114209.116_20260501_114209 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-01 11:43 | Success | - | |
| exp_self.20260501113459.465_20260501_113500 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260501113459.465 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-01 11:36 | Success | - | |
| exp_self.20260501112728.464_20260501_112728 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260501112728.464 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-01 11:28 | Success | - | |
| exp_self.20260501111952.463_20260501_111953 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260501111952.463 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-01 11:20 | Success | - | |
| exp_self.20260501111220.462_20260501_111221 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260501111220.462 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-01 11:13 | Success | - | |
| exp_pytrain.20260501110950.115_20260501_110950 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-01 11:10 | Success | - | |
| exp_self.20260501110244.461_20260501_110244 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260501110244.461 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-01 11:03 | Success | - | |
| exp_self.20260501105540.460_20260501_105540 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260501105540.460 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-01 10:56 | Success | - | |
| exp_self.20260501104837.459_20260501_104837 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260501104837.459 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-01 10:49 | Success | - | |
| exp_self.20260501104137.458_20260501_104137 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260501104137.458 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-01 10:42 | Success | - | |
| exp_pytrain.20260501103812.114_20260501_103812 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-01 10:39 | Success | - | |
| exp_self.20260501103207.457_20260501_103208 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260501103207.457 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-01 10:33 | Success | - | |
| exp_self.20260501102502.456_20260501_102502 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260501102502.456 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-01 10:26 | Success | - | |
| exp_self.20260501101638.455_20260501_101639 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260501101638.455 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-01 10:17 | Success | - | |
| exp_self.20260501100932.454_20260501_100932 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260501100932.454 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-01 10:10 | Success | - | |
| exp_pytrain.20260501100620.113_20260501_100621 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-01 10:07 | Success | - | |
| exp_self.20260501095942.453_20260501_095942 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260501095942.453 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-01 10:00 | Success | - | |
| exp_self.20260501095231.452_20260501_095231 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260501095231.452 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-01 09:53 | Success | - | |
| exp_self.20260501094529.451_20260501_094529 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260501094529.451 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-01 09:46 | Success | - | |
| exp_hf_2604.28157_20260501_094142 | FlashRT: Towards Computationally and Memory Efficient Red-Teaming for Prompt Injection and Knowledge Corruption<br>Paper ID: hf_2604.28157 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark... | 05-01 09:42 | Success | - | |
| exp_self.20260501093641.450_20260501_093641 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260501093641.450 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-01 09:37 | Success | - | |
| exp_pytrain.20260501093349.112_20260501_093350 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-01 09:34 | Success | - | |
| exp_self.20260501092742.449_20260501_092742 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260501092742.449 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-01 09:28 | Success | - | |
| exp_self.20260501092005.448_20260501_092006 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260501092005.448 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-01 09:21 | Success | - | |
| exp_self.20260501091233.447_20260501_091233 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260501091233.447 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-01 09:13 | Success | - | |
| exp_self.20260501090500.446_20260501_090500 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260501090500.446 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-01 09:06 | Success | - | |
| exp_pytrain.20260501090227.111_20260501_090227 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-01 09:03 | Success | - | |
| exp_self.20260501085522.445_20260501_085522 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260501085522.445 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-01 08:56 | Success | - | |
| exp_self.20260501084747.444_20260501_084748 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260501084747.444 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-01 08:48 | Success | - | |
| exp_self.20260501084010.443_20260501_084011 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260501084010.443 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-01 08:41 | Success | - | |
| exp_self.20260501083234.442_20260501_083234 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260501083234.442 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-01 08:33 | Success | - | |
| exp_pytrain.20260501083005.110_20260501_083006 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-01 08:31 | Success | - | |
| exp_self.20260501082302.441_20260501_082302 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260501082302.441 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-01 08:24 | Success | - | |
| exp_self.20260501081526.440_20260501_081527 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260501081526.440 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-01 08:16 | Success | - | |
| exp_self.20260501080747.439_20260501_080748 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260501080747.439 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-01 08:08 | Success | - | |
| exp_self.20260501080011.438_20260501_080011 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260501080011.438 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-01 08:01 | Success | - | |
| exp_pytrain.20260501075740.109_20260501_075740 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-01 07:58 | Success | - | |
| exp_self.20260501075034.437_20260501_075035 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260501075034.437 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-01 07:51 | Success | - | |
| exp_self.20260501074301.436_20260501_074301 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260501074301.436 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-01 07:44 | Success | - | |
| exp_self.20260501073527.435_20260501_073527 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260501073527.435 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-01 07:36 | Success | - | |
| exp_self.20260501072750.434_20260501_072751 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260501072750.434 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-01 07:28 | Success | - | |
| exp_pytrain.20260501072519.108_20260501_072519 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-01 07:26 | Success | - | |
| exp_self.20260501071816.433_20260501_071817 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260501071816.433 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-01 07:19 | Success | - | |
| exp_self.20260501071046.432_20260501_071047 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260501071046.432 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-01 07:11 | Success | - | |
| exp_self.20260501070314.431_20260501_070315 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260501070314.431 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-01 07:04 | Success | - | |
| exp_self.20260501065536.430_20260501_065537 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260501065536.430 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-01 06:56 | Success | - | |
| exp_pytrain.20260501065303.107_20260501_065304 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-01 06:54 | Success | - | |
| exp_self.20260501064557.429_20260501_064558 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260501064557.429 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-01 06:47 | Success | - | |
| exp_self.20260501063826.428_20260501_063826 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260501063826.428 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-01 06:39 | Success | - | |
| exp_self.20260501063055.427_20260501_063055 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260501063055.427 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-01 06:31 | Success | - | |
| exp_self.20260501062322.426_20260501_062322 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260501062322.426 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-01 06:24 | Success | - | |
| exp_pytrain.20260501062042.106_20260501_062043 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-01 06:21 | Success | - | |
| exp_self.20260501061339.425_20260501_061339 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260501061339.425 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-01 06:14 | Success | - | |
| exp_self.20260501060605.424_20260501_060605 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260501060605.424 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-01 06:07 | Success | - | |
| exp_self.20260501055833.423_20260501_055833 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260501055833.423 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-01 05:59 | Success | - | |
| exp_self.20260501055057.422_20260501_055058 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260501055057.422 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-01 05:52 | Success | - | |
| exp_pytrain.20260501054820.105_20260501_054820 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-01 05:49 | Success | - | |
| exp_self.20260501054122.421_20260501_054122 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260501054122.421 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-01 05:42 | Success | - | |
| exp_self.20260501053338.420_20260501_053338 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260501053338.420 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-01 05:34 | Success | - | |
| exp_self.20260501052610.419_20260501_052610 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260501052610.419 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-01 05:27 | Success | - | |
| exp_self.20260501051837.418_20260501_051838 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260501051837.418 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-01 05:19 | Success | - | |
| exp_pytrain.20260501051600.104_20260501_051601 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-01 05:17 | Success | - | |
| exp_self.20260501051001.417_20260501_051002 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260501051001.417 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-01 05:11 | Success | - | |
| exp_self.20260501050223.416_20260501_050224 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260501050223.416 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-01 05:03 | Success | - | |
| exp_hf_2604.27251_20260501_045903 | Compliance versus Sensibility: On the Reasoning Controllability in Large Language Models<br>Paper ID: hf_2604.27251 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark... | 05-01 05:00 | Success | - | |
| exp_self.20260501045442.415_20260501_045442 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260501045442.415 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-01 04:55 | Success | - | |
| exp_self.20260501044707.414_20260501_044708 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260501044707.414 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-01 04:48 | Success | - | |
| exp_pytrain.20260501044431.103_20260501_044431 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-01 04:45 | Success | - | |
| exp_self.20260501043727.413_20260501_043728 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260501043727.413 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-01 04:38 | Success | - | |
| exp_self.20260501042943.412_20260501_042944 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260501042943.412 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-01 04:30 | Success | - | |
| exp_self.20260501042201.411_20260501_042201 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260501042201.411 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-01 04:23 | Success | - | |
| exp_self.20260501041427.410_20260501_041428 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260501041427.410 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 05-01 04:15 | Success | - | |
| exp_pytrain.20260501041152.102_20260501_041152 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 05-01 04:12 | Success | - | |
|
exp_self.20260501040457.409_20260501_040457
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260501040457.409 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-01 04:06 | Success | - | |
|
exp_self.20260501035717.408_20260501_035717
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260501035717.408 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-01 03:58 | Success | - | |
|
exp_self.20260501034936.407_20260501_034936
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260501034936.407 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-01 03:50 | Success | - | |
|
exp_self.20260501034207.406_20260501_034207
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260501034207.406 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-01 03:43 | Success | - | |
|
exp_pytrain.20260501033936.101_20260501_033936
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-01 03:40 | Success | - | |
|
exp_self.20260501033231.405_20260501_033232
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260501033231.405 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-01 03:33 | Success | - | |
|
exp_self.20260501032446.404_20260501_032446
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260501032446.404 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-01 03:25 | Success | - | |
|
exp_self.20260501031706.403_20260501_031707
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260501031706.403 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-01 03:18 | Success | - | |
|
exp_self.20260501030930.402_20260501_030930
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260501030930.402 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-01 03:10 | Success | - | |
|
exp_pytrain.20260501030701.100_20260501_030701
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-01 03:08 | Success | - | |
|
exp_self.20260501025955.401_20260501_025955
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260501025955.401 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-01 03:00 | Success | - | |
|
exp_self.20260501025222.400_20260501_025223
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260501025222.400 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-01 02:53 | Success | - | |
|
exp_self.20260501024443.399_20260501_024444
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260501024443.399 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-01 02:45 | Success | - | |
|
exp_self.20260501023709.398_20260501_023709
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260501023709.398 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-01 02:38 | Success | - | |
|
exp_pytrain.20260501023440.099_20260501_023440
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-01 02:35 | Success | - | |
|
exp_self.20260501022905.397_20260501_022905
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260501022905.397 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-01 02:30 | Success | - | |
|
exp_self.20260501022136.396_20260501_022136
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260501022136.396 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-01 02:22 | Success | - | |
|
exp_self.20260501021335.395_20260501_021336
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260501021335.395 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-01 02:14 | Success | - | |
|
exp_self.20260501020615.394_20260501_020616
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260501020615.394 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-01 02:07 | Success | - | |
|
exp_pytrain.20260501020300.098_20260501_020301
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-01 02:04 | Success | - | |
|
exp_self.20260501015628.393_20260501_015628
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260501015628.393 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-01 01:57 | Success | - | |
|
exp_self.20260501014808.392_20260501_014808
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260501014808.392 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-01 01:49 | Success | - | |
|
exp_self.20260501014104.391_20260501_014104
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260501014104.391 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-01 01:42 | Success | - | |
|
exp_self.20260501013359.390_20260501_013400
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260501013359.390 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-01 01:35 | Success | - | |
|
exp_pytrain.20260501013049.097_20260501_013050
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-01 01:31 | Success | - | |
|
exp_self.20260501012550.389_20260501_012550
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260501012550.389 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-01 01:26 | Success | - | |
|
exp_self.20260501011845.388_20260501_011845
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260501011845.388 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-01 01:19 | Success | - | |
|
exp_self.20260501011028.387_20260501_011029
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260501011028.387 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-01 01:11 | Success | - | |
|
exp_self.20260501010208.386_20260501_010208
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260501010208.386 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-01 01:03 | Success | - | |
|
exp_pytrain.20260501005851.096_20260501_005851
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-01 00:59 | Success | - | |
|
exp_self.20260501005248.385_20260501_005248
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260501005248.385 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-01 00:53 | Success | - | |
|
exp_self.20260501004506.384_20260501_004506
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260501004506.384 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-01 00:46 | Success | - | |
|
exp_self.20260501003730.383_20260501_003730
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260501003730.383 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-01 00:38 | Success | - | |
|
exp_self.20260501002958.382_20260501_002958
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260501002958.382 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-01 00:31 | Success | - | |
|
exp_pytrain.20260501002728.095_20260501_002728
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
05-01 00:28 | Success | - | |
|
exp_hf_2604.27039_20260501_002331
|
Length Value Model: Scalable Value Pretraining for Token-Level Length Modeling
Paper ID: hf_2604.27039 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
05-01 00:24 | Success | - | |
|
exp_self.20260501002123.381_20260501_002123
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260501002123.381 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-01 00:22 | Success | - | |
|
exp_self.20260501001351.380_20260501_001352
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260501001351.380 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-01 00:14 | Success | - | |
|
exp_hf_2604.27085_20260501_000818
|
Efficient Training on Multiple Consumer GPUs with RoundPipe
Paper ID: hf_2604.27085 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
05-01 00:09 | Success | - | |
|
exp_self.20260501000612.379_20260501_000612
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260501000612.379 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
05-01 00:07 | Success | - | |
|
exp_self.20260430235842.378_20260430_235842
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260430235842.378 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-30 23:59 | Success | - | |
|
exp_pytrain.20260430235607.094_20260430_235607
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-30 23:57 | Success | - | |
|
exp_self.20260430234911.377_20260430_234911
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260430234911.377 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-30 23:50 | Success | - | |
|
exp_self.20260430234139.376_20260430_234140
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260430234139.376 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-30 23:42 | Success | - | |
|
exp_self.20260430233410.375_20260430_233410
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260430233410.375 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-30 23:35 | Success | - | |
|
exp_self.20260430232640.374_20260430_232640
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260430232640.374 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-30 23:27 | Success | - | |
|
exp_pytrain.20260430232403.093_20260430_232403
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-30 23:25 | Success | - | |
|
exp_self.20260430231943.373_20260430_231943
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260430231943.373 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-30 23:20 | Success | - | |
|
exp_self.20260430231212.372_20260430_231212
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260430231212.372 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-30 23:13 | Success | - | |
|
exp_self.20260430230438.371_20260430_230439
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260430230438.371 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-30 23:05 | Success | - | |
|
exp_self.20260430225707.370_20260430_225707
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260430225707.370 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-30 22:58 | Success | - | |
|
exp_pytrain.20260430225217.092_20260430_225217
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-30 22:53 | Success | - | |
|
exp_self.20260430225012.369_20260430_225013
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260430225012.369 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-30 22:51 | Success | - | |
|
exp_self.20260430224239.368_20260430_224239
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260430224239.368 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-30 22:43 | Success | - | |
|
exp_self.20260430223508.367_20260430_223508
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260430223508.367 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-30 22:36 | Success | - | |
|
exp_self.20260430222737.366_20260430_222738
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260430222737.366 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-30 22:28 | Success | - | |
|
exp_hf_2604.27083_20260430_222418
|
Co-Evolving Policy Distillation
Paper ID: hf_2604.27083 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
04-30 22:25 | Success | - | |
|
exp_pytrain.20260430221953.091_20260430_221954
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-30 22:20 | Success | - | |
|
exp_self.20260430221749.365_20260430_221749
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260430221749.365 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-30 22:18 | Success | - | |
|
exp_self.20260430221014.364_20260430_221015
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260430221014.364 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-30 22:11 | Success | - | |
|
exp_self.20260430220245.363_20260430_220245
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260430220245.363 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-30 22:03 | Success | - | |
|
exp_hf_2604.28130_20260430_215946
|
MoCapAnything V2: End-to-End Motion Capture for Arbitrary Skeletons
Paper ID: hf_2604.28130 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
04-30 22:00 | Success | - | |
|
exp_self.20260430215242.362_20260430_215243
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260430215242.362 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-30 21:53 | Success | - | |
|
exp_pytrain.20260430214757.090_20260430_214757
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-30 21:48 | Success | - | |
|
exp_self.20260430214554.361_20260430_214555
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260430214554.361 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-30 21:46 | Success | - | |
|
exp_self.20260430213820.360_20260430_213820
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260430213820.360 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-30 21:39 | Success | - | |
|
exp_2604.28190v1_20260430_213249
|
Representation Fréchet Loss for Visual Generation
Paper ID: 2604.28190v1 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark s...
|
04-30 21:33 | Success | - | |
|
exp_self.20260430213042.359_20260430_213043
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260430213042.359 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-30 21:31 | Success | - | |
|
exp_hf_2604.28169_20260430_212721
|
PhyCo: Learning Controllable Physical Priors for Generative Motion
Paper ID: hf_2604.28169 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
04-30 21:28 | Success | - | |
|
exp_self.20260430212149.358_20260430_212149
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260430212149.358 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-30 21:22 | Success | - | |
|
exp_2604.28193v1_20260430_211833
|
Generalizable Sparse-View 3D Reconstruction from Unconstrained Images
Paper ID: 2604.28193v1 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark s...
|
04-30 21:19 | Success | - | |
|
exp_pytrain.20260430211515.089_20260430_211515
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-30 21:16 | Success | - | |
|
exp_self.20260430211208.357_20260430_211208
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260430211208.357 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-30 21:13 | Success | - | |
|
exp_hf_2604.28190_20260430_210845
|
Representation Fréchet Loss for Visual Generation
Paper ID: hf_2604.28190 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
04-30 21:09 | Success | - | |
|
exp_hf_2604.28185_20260430_210444
|
Visual Generation in the New Era: An Evolution from Atomic Mapping to Agentic World Modeling
Paper ID: hf_2604.28185 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
04-30 21:05 | Success | - | |
|
exp_self.20260430210127.356_20260430_210127
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260430210127.356 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-30 21:02 | Success | - | |
|
exp_self.20260430205355.355_20260430_205356
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260430205355.355 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-30 20:54 | Success | - | |
|
exp_self.20260430204624.354_20260430_204624
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260430204624.354 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-30 20:47 | Success | - | |
|
exp_pytrain.20260430204349.088_20260430_204349
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-30 20:44 | Success | - | |
|
exp_self.20260430203656.353_20260430_203656
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260430203656.353 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-30 20:37 | Success | - | |
|
exp_self.20260430202920.352_20260430_202920
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260430202920.352 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-30 20:30 | Success | - | |
|
exp_2604.28056v1_20260430_202603
|
RHyVE: Competence-Aware Verification and Phase-Aware Deployment for LLM-Generated Reward Hypotheses
Paper ID: 2604.28056v1 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark s...
|
04-30 20:27 | Success | - | |
|
exp_self.20260430202145.351_20260430_202145
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260430202145.351 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-30 20:22 | Success | - | |
|
exp_hf_2604.23758_20260430_201823
|
Agentic Fusion of Large Atomic and Language Models to Accelerate Superconductors Discovery
Paper ID: hf_2604.23758 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
04-30 20:19 | Success | - | |
|
exp_self.20260430201407.350_20260430_201407
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260430201407.350 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-30 20:15 | Success | - | |
|
exp_pytrain.20260430201136.087_20260430_201136
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-30 20:12 | Success | - | |
|
exp_self.20260430200432.349_20260430_200432
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260430200432.349 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-30 20:05 | Success | - | |
|
exp_self.20260430195702.348_20260430_195702
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260430195702.348 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-30 19:58 | Success | - | |
|
exp_self.20260430194934.347_20260430_194934
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260430194934.347 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-30 19:50 | Success | - | |
|
exp_self.20260430194158.346_20260430_194159
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260430194158.346 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-30 19:43 | Success | - | |
|
exp_pytrain.20260430193925.086_20260430_193925
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-30 19:40 | Success | - | |
|
exp_self.20260430193223.345_20260430_193223
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260430193223.345 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-30 19:33 | Success | - | |
|
exp_self.20260430192453.344_20260430_192453
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260430192453.344 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-30 19:25 | Success | - | |
|
exp_self.20260430191723.343_20260430_191723
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260430191723.343 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-30 19:18 | Success | - | |
|
exp_self.20260430190950.342_20260430_190951
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260430190950.342 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-30 19:10 | Success | - | |
|
exp_pytrain.20260430190711.085_20260430_190711
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-30 19:08 | Success | - | |
|
exp_self.20260430190014.341_20260430_190014
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260430190014.341 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-30 19:01 | Success | - | |
|
exp_self.20260430185241.340_20260430_185242
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260430185241.340 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-30 18:53 | Success | - | |
|
exp_self.20260430184513.339_20260430_184513
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260430184513.339 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-30 18:46 | Success | - | |
|
exp_self.20260430183743.338_20260430_183743
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260430183743.338 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-30 18:38 | Success | - | |
|
exp_pytrain.20260430183509.084_20260430_183509
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-30 18:36 | Success | - | |
|
exp_self.20260430182808.337_20260430_182809
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260430182808.337 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-30 18:29 | Success | - | |
|
exp_self.20260430182037.336_20260430_182037
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260430182037.336 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-30 18:21 | Success | - | |
|
exp_self.20260430181256.335_20260430_181256
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260430181256.335 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-30 18:13 | Success | - | |
|
exp_self.20260430180529.334_20260430_180530
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260430180529.334 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-30 18:06 | Success | - | |
|
exp_pytrain.20260430180255.083_20260430_180256
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-30 18:03 | Success | - | |
|
exp_self.20260430175550.333_20260430_175551
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260430175550.333 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-30 17:56 | Success | - | |
|
exp_self.20260430174816.332_20260430_174816
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260430174816.332 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-30 17:49 | Success | - | |
|
exp_self.20260430174050.331_20260430_174051
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260430174050.331 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-30 17:41 | Success | - | |
|
exp_self.20260430173329.330_20260430_173330
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260430173329.330 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-30 17:34 | Success | - | |
|
exp_pytrain.20260430173016.082_20260430_173016
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-30 17:31 | Success | - | |
|
exp_self.20260430172442.329_20260430_172442
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260430172442.329 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-30 17:25 | Success | - | |
|
exp_self.20260430171659.328_20260430_171700
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260430171659.328 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-30 17:18 | Success | - | |
|
exp_self.20260430170915.327_20260430_170916
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260430170915.327 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-30 17:10 | Success | - | |
|
exp_self.20260430170116.326_20260430_170117
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260430170116.326 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-30 17:02 | Success | - | |
|
exp_pytrain.20260430165837.081_20260430_165838
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-30 16:59 | Success | - | |
|
exp_self.20260430165117.325_20260430_165118
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260430165117.325 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-30 16:52 | Success | - | |
|
exp_self.20260430164352.324_20260430_164353
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260430164352.324 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-30 16:44 | Success | - | |
|
exp_self.20260430163616.323_20260430_163617
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260430163616.323 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-30 16:37 | Success | - | |
|
exp_self.20260430162849.322_20260430_162849
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260430162849.322 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-30 16:29 | Success | - | |
|
exp_pytrain.20260430162619.080_20260430_162619
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-30 16:27 | Success | - | |
|
exp_self.20260430161923.321_20260430_161923
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260430161923.321 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-30 16:20 | Success | - | |
|
exp_self.20260430161152.320_20260430_161152
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260430161152.320 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-30 16:12 | Success | - | |
|
exp_self.20260430160430.319_20260430_160430
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260430160430.319 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-30 16:05 | Success | - | |
|
exp_self.20260430155704.318_20260430_155704
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260430155704.318 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-30 15:58 | Success | - | |
|
exp_pytrain.20260430155440.079_20260430_155441
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-30 15:55 | Success | - | |
|
exp_self.20260430154740.317_20260430_154740
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260430154740.317 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-30 15:48 | Success | - | |
|
exp_self.20260430154016.316_20260430_154017
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260430154016.316 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-30 15:41 | Success | - | |
|
exp_self.20260430153243.315_20260430_153243
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260430153243.315 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-30 15:33 | Success | - | |
|
exp_self.20260430152513.314_20260430_152514
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260430152513.314 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-30 15:26 | Success | - | |
|
exp_pytrain.20260430152244.078_20260430_152245
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-30 15:23 | Success | - | |
|
exp_self.20260430151542.313_20260430_151542
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260430151542.313 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-30 15:16 | Success | - | |
|
exp_self.20260430150813.312_20260430_150814
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260430150813.312 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-30 15:09 | Success | - | |
|
exp_self.20260430150045.311_20260430_150045
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260430150045.311 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-30 15:01 | Success | - | |
|
exp_self.20260430145311.310_20260430_145311
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260430145311.310 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-30 14:54 | Success | - | |
|
exp_pytrain.20260430145039.077_20260430_145040
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-30 14:51 | Success | - | |
|
exp_self.20260430144338.309_20260430_144338
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260430144338.309 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-30 14:44 | Success | - | |
|
exp_self.20260430143611.308_20260430_143611
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260430143611.308 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-30 14:37 | Success | - | |
|
exp_self.20260430142840.307_20260430_142840
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260430142840.307 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-30 14:29 | Success | - | |
|
exp_self.20260430142108.306_20260430_142109
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260430142108.306 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-30 14:22 | Success | - | |
|
exp_pytrain.20260430141841.076_20260430_141841
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-30 14:19 | Success | - | |
|
exp_self.20260430141131.305_20260430_141131
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260430141131.305 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-30 14:12 | Success | - | |
|
exp_self.20260430140341.304_20260430_140342
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260430140341.304 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-30 14:04 | Success | - | |
|
exp_self.20260430135607.303_20260430_135607
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260430135607.303 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-30 13:57 | Success | - | |
|
exp_self.20260430134839.302_20260430_134839
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260430134839.302 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-30 13:49 | Success | - | |
|
exp_pytrain.20260430134605.075_20260430_134605
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-30 13:47 | Success | - | |
|
exp_self.20260430133915.301_20260430_133916
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260430133915.301 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-30 13:40 | Success | - | |
|
exp_self.20260430133150.300_20260430_133150
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260430133150.300 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-30 13:32 | Success | - | |
|
exp_self.20260430132427.299_20260430_132428
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260430132427.299 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-30 13:25 | Success | - | |
|
exp_self.20260430131703.298_20260430_131704
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260430131703.298 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-30 13:18 | Success | - | |
|
exp_pytrain.20260430131432.074_20260430_131432
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-30 13:15 | Success | - | |
|
exp_hf_2604.23426_20260430_130929
|
Enhanced Privacy and Communication Efficiency in Non-IID Federated Learning with Adaptive Quantization and Differential...
Paper ID: hf_2604.23426 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
04-30 13:10 | Success | - | |
|
exp_self.20260430130723.297_20260430_130723
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260430130723.297 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-30 13:08 | Success | - | |
|
exp_hf_2604.25135_20260430_130257
|
FAMA: Failure-Aware Meta-Agentic Framework for Open-Source LLMs in Interactive Tool Use Environments
Paper ID: hf_2604.25135 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
04-30 13:03 | Success | - | |
|
exp_self.20260430130021.296_20260430_130021
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260430130021.296 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-30 13:01 | Success | - | |
|
exp_hf_2604.26091_20260430_125701
|
Operating-Layer Controls for Onchain Language-Model Agents Under Real Capital
Paper ID: hf_2604.26091 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
04-30 12:58 | Success | - | |
|
exp_self.20260430125241.295_20260430_125241
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260430125241.295 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-30 12:53 | Success | - | |
|
exp_self.20260430124510.294_20260430_124510
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260430124510.294 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-30 12:46 | Success | - | |
|
exp_pytrain.20260430124235.073_20260430_124235
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-30 12:43 | Success | - | |
|
exp_self.20260430123524.293_20260430_123524
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260430123524.293 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-30 12:36 | Success | - | |
|
exp_self.20260430122737.292_20260430_122738
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260430122737.292 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-30 12:28 | Success | - | |
|
exp_self.20260430121957.291_20260430_121957
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260430121957.291 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-30 12:21 | Success | - | |
|
exp_self.20260430121225.290_20260430_121226
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260430121225.290 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-30 12:13 | Success | - | |
|
exp_pytrain.20260430120947.072_20260430_120947
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-30 12:10 | Success | - | |
|
exp_self.20260430120249.289_20260430_120250
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260430120249.289 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-30 12:03 | Success | - | |
|
exp_self.20260430115523.288_20260430_115523
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260430115523.288 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-30 11:56 | Success | - | |
|
exp_self.20260430114758.287_20260430_114758
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260430114758.287 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-30 11:49 | Success | - | |
|
exp_self.20260430114032.286_20260430_114032
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260430114032.286 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-30 11:41 | Success | - | |
|
exp_pytrain.20260430113801.071_20260430_113801
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-30 11:39 | Success | - | |
|
exp_self.20260430113109.285_20260430_113110
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260430113109.285 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-30 11:32 | Success | - | |
|
exp_self.20260430112332.284_20260430_112332
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260430112332.284 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-30 11:24 | Success | - | |
|
exp_self.20260430111556.283_20260430_111557
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260430111556.283 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-30 11:16 | Success | - | |
|
exp_self.20260430110827.282_20260430_110827
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260430110827.282 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-30 11:09 | Success | - | |
| exp_pytrain.20260430110546.070_20260430_110547 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-30 11:06 | Success | - | |
| exp_self.20260430105859.281_20260430_105900 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260430105859.281 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-30 11:00 | Success | - | |
| exp_self.20260430105128.280_20260430_105129 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260430105128.280 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-30 10:52 | Success | - | |
| exp_self.20260430104355.279_20260430_104356 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260430104355.279 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-30 10:44 | Success | - | |
| exp_self.20260430103628.278_20260430_103629 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260430103628.278 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-30 10:37 | Success | - | |
| exp_pytrain.20260430103404.069_20260430_103404 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-30 10:35 | Success | - | |
| exp_self.20260430102703.277_20260430_102704 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260430102703.277 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-30 10:28 | Success | - | |
| exp_self.20260430101935.276_20260430_101935 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260430101935.276 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-30 10:20 | Success | - | |
| exp_self.20260430101202.275_20260430_101202 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260430101202.275 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-30 10:13 | Success | - | |
| exp_self.20260430100430.274_20260430_100431 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260430100430.274 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-30 10:05 | Success | - | |
| exp_pytrain.20260430100207.068_20260430_100208 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-30 10:03 | Success | - | |
| exp_self.20260430095508.273_20260430_095508 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260430095508.273 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-30 09:56 | Success | - | |
| exp_self.20260430094817.272_20260430_094818 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260430094817.272 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-30 09:49 | Success | - | |
| exp_self.20260430094045.271_20260430_094045 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260430094045.271 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-30 09:41 | Success | - | |
| exp_self.20260430093307.270_20260430_093309 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260430093307.270 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-30 09:34 | Success | - | |
| exp_pytrain.20260430093028.067_20260430_093028 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-30 09:31 | Success | - | |
| exp_self.20260430092326.269_20260430_092327 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260430092326.269 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-30 09:24 | Success | - | |
| exp_self.20260430091555.268_20260430_091555 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260430091555.268 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-30 09:16 | Success | - | |
| exp_self.20260430090828.267_20260430_090828 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260430090828.267 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-30 09:09 | Success | - | |
| exp_self.20260430090058.266_20260430_090058 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260430090058.266 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-30 09:02 | Success | - | |
| exp_pytrain.20260430085827.066_20260430_085828 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-30 08:59 | Success | - | |
| exp_hf_2604.24351_20260430_085542 | Diffusion Templates: A Unified Plugin Framework for Controllable Diffusion - Paper ID: hf_2604.24351 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark... | 04-30 08:56 | Success | - | |
| exp_self.20260430085122.265_20260430_085122 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260430085122.265 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-30 08:52 | Success | - | |
| exp_self.20260430084352.264_20260430_084353 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260430084352.264 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-30 08:44 | Success | - | |
| exp_self.20260430083627.263_20260430_083628 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260430083627.263 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-30 08:37 | Success | - | |
| exp_self.20260430082843.262_20260430_082843 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260430082843.262 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-30 08:29 | Success | - | |
| exp_pytrain.20260430082618.065_20260430_082619 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-30 08:27 | Success | - | |
| exp_self.20260430081914.261_20260430_081914 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260430081914.261 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-30 08:20 | Success | - | |
| exp_self.20260430081140.260_20260430_081141 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260430081140.260 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-30 08:12 | Success | - | |
| exp_self.20260430080408.259_20260430_080408 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260430080408.259 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-30 08:05 | Success | - | |
| exp_self.20260430075626.258_20260430_075627 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260430075626.258 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-30 07:57 | Success | - | |
| exp_pytrain.20260430075356.064_20260430_075357 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-30 07:54 | Success | - | |
| exp_self.20260430074824.257_20260430_074825 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260430074824.257 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-30 07:49 | Success | - | |
| exp_self.20260430074039.256_20260430_074039 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260430074039.256 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-30 07:41 | Success | - | |
| exp_self.20260430073251.255_20260430_073252 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260430073251.255 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-30 07:33 | Success | - | |
| exp_self.20260430072459.254_20260430_072503 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260430072459.254 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-30 07:26 | Success | - | |
| exp_pytrain.20260430072204.063_20260430_072204 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-30 07:23 | Success | - | |
| exp_self.20260430071609.253_20260430_071610 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260430071609.253 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-30 07:17 | Success | - | |
| exp_self.20260430070757.252_20260430_070757 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260430070757.252 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-30 07:09 | Success | - | |
| exp_self.20260430070031.251_20260430_070032 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260430070031.251 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-30 07:01 | Success | - | |
| exp_self.20260430065315.250_20260430_065315 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260430065315.250 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-30 06:54 | Success | - | |
| exp_pytrain.20260430065011.062_20260430_065012 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-30 06:51 | Success | - | |
| exp_self.20260430064353.249_20260430_064353 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260430064353.249 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-30 06:44 | Success | - | |
| exp_self.20260430063630.248_20260430_063631 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260430063630.248 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-30 06:37 | Success | - | |
| exp_self.20260430062932.247_20260430_062932 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260430062932.247 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-30 06:30 | Success | - | |
| exp_self.20260430062239.246_20260430_062240 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260430062239.246 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-30 06:23 | Success | - | |
| exp_pytrain.20260430061731.061_20260430_061732 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-30 06:18 | Success | - | |
| exp_self.20260430061526.245_20260430_061527 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260430061526.245 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-30 06:16 | Success | - | |
| exp_self.20260430060729.244_20260430_060729 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260430060729.244 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-30 06:08 | Success | - | |
| exp_self.20260430060018.243_20260430_060018 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260430060018.243 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-30 06:01 | Success | - | |
| exp_self.20260430055334.242_20260430_055334 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260430055334.242 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-30 05:54 | Success | - | |
| exp_self.20260430054639.241_20260430_054639 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260430054639.241 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-30 05:47 | Success | - | |
| exp_pytrain.20260430054404.060_20260430_054405 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-30 05:45 | Success | - | |
| exp_self.20260430053634.240_20260430_053634 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260430053634.240 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-30 05:37 | Success | - | |
| exp_self.20260430052914.239_20260430_052914 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260430052914.239 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-30 05:30 | Success | - | |
| exp_self.20260430052231.238_20260430_052231 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260430052231.238 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-30 05:23 | Success | - | |
| exp_self.20260430051516.237_20260430_051516 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260430051516.237 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-30 05:16 | Success | - | |
| exp_pytrain.20260430051233.059_20260430_051234 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-30 05:13 | Success | - | |
| exp_self.20260430050608.236_20260430_050608 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260430050608.236 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-30 05:07 | Success | - | |
| exp_oa_W7157506044_20260430_050303 | Focus Session: Hardware and Software Techniques for Accelerating Multimodal Foundation Models - Paper ID: oa_W7157506044 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark... | 04-30 05:04 | Success | - | |
| exp_oa_W7157506014_20260430_045847 | SpikingBrain2.0: Brain-Inspired Foundation Models for Efficient Long-Context and Cross-Platform Inference - Paper ID: oa_W7157506014 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark... | 04-30 04:59 | Success | - | |
| exp_self.20260430045639.235_20260430_045639 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260430045639.235 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-30 04:57 | Success | - | |
| exp_self.20260430044950.234_20260430_044951 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260430044950.234 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-30 04:50 | Success | - | |
| exp_self.20260430044255.233_20260430_044255 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260430044255.233 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-30 04:43 | Success | - | |
| exp_pytrain.20260430044010.058_20260430_044010 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-30 04:41 | Success | - | |
| exp_self.20260430043335.232_20260430_043335 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260430043335.232 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-30 04:34 | Success | - | |
| exp_self.20260430042634.231_20260430_042634 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260430042634.231 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-30 04:27 | Success | - | |
| exp_hf_2604.24927_20260430_042145 | Large Language Models Explore by Latent Distilling - Paper ID: hf_2604.24927 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark... | 04-30 04:22 | Success | - | |
| exp_self.20260430041934.230_20260430_041935 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260430041934.230 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-30 04:20 | Success | - | |
| exp_self.20260430041133.229_20260430_041134 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260430041133.229 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-30 04:12 | Success | - | |
| exp_pytrain.20260430040841.057_20260430_040841 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-30 04:09 | Success | - | |
| exp_self.20260430040223.228_20260430_040224 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260430040223.228 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-30 04:03 | Success | - | |
| exp_self.20260430035541.227_20260430_035541 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260430035541.227 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-30 03:56 | Success | - | |
| exp_self.20260430034848.226_20260430_034848 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260430034848.226 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-30 03:49 | Success | - | |
| exp_self.20260430034146.225_20260430_034146 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260430034146.225 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-30 03:42 | Success | - | |
| exp_pytrain.20260430033638.056_20260430_033639 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-30 03:37 | Success | - | |
| exp_self.20260430033433.224_20260430_033434 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260430033433.224 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-30 03:35 | Success | - | |
| exp_self.20260430032728.223_20260430_032728 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260430032728.223 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-30 03:28 | Success | - | |
| exp_self.20260430032046.222_20260430_032046 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260430032046.222 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-30 03:21 | Success | - | |
| exp_self.20260430031243.221_20260430_031243 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260430031243.221 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-30 03:13 | Success | - | |
| exp_self.20260430030550.220_20260430_030550 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260430030550.220 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-30 03:06 | Success | - | |
| exp_pytrain.20260430030257.055_20260430_030258 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-30 03:04 | Success | - | |
| exp_self.20260430025642.219_20260430_025642 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260430025642.219 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-30 02:57 | Success | - | |
| exp_self.20260430024937.218_20260430_024937 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260430024937.218 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-30 02:50 | Success | - | |
| exp_self.20260430024245.217_20260430_024245 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260430024245.217 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-30 02:43 | Success | - | |
| exp_self.20260430023557.216_20260430_023558 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260430023557.216 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-30 02:37 | Success | - | |
| exp_pytrain.20260430023045.054_20260430_023045 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-30 02:31 | Success | - | |
| exp_self.20260430022847.215_20260430_022847 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260430022847.215 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-30 02:29 | Success | - | |
| exp_self.20260430022154.214_20260430_022155 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260430022154.214 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-30 02:22 | Success | - | |
| exp_self.20260430021443.213_20260430_021443 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260430021443.213 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-30 02:15 | Success | - | |
| exp_self.20260430020743.212_20260430_020744 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260430020743.212 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-30 02:08 | Success | - | |
| exp_self.20260430020020.211_20260430_020031 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260430020020.211 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-30 02:01 | Success | - | |
| exp_pytrain.20260430015734.053_20260430_015734 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-30 01:58 | Success | - | |
| exp_self.20260430015125.210_20260430_015125 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260430015125.210 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-30 01:52 | Success | - | |
| exp_self.20260430014418.209_20260430_014418 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260430014418.209 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-30 01:45 | Success | - | |
| exp_self.20260430013706.208_20260430_013706 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260430013706.208 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-30 01:38 | Success | - | |
| exp_self.20260430013020.207_20260430_013020 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260430013020.207 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-30 01:31 | Success | - | |
| exp_pytrain.20260430012520.052_20260430_012520 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-30 01:26 | Success | - | |
| exp_self.20260430012312.206_20260430_012312 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260430012312.206 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-30 01:24 | Success | - | |
| exp_self.20260430011559.205_20260430_011600 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260430011559.205 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-30 01:17 | Success | - | |
| exp_self.20260430010918.204_20260430_010918 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260430010918.204 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-30 01:10 | Success | - | |
|
exp_self.20260430010231.203_20260430_010231
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260430010231.203 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-30 01:03 | Success | - | |
|
exp_self.20260430005532.202_20260430_005532
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260430005532.202 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-30 00:56 | Success | - | |
|
exp_pytrain.20260430005246.051_20260430_005246
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-30 00:53 | Success | - | |
|
exp_self.20260430004622.201_20260430_004623
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260430004622.201 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-30 00:47 | Success | - | |
|
exp_self.20260430003935.200_20260430_003936
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260430003935.200 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-30 00:40 | Success | - | |
|
exp_self.20260430003206.199_20260430_003207
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260430003206.199 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-30 00:33 | Success | - | |
|
exp_self.20260430002519.198_20260430_002519
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260430002519.198 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-30 00:26 | Success | - | |
|
exp_pytrain.20260430002127.050_20260430_002127
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-30 00:22 | Success | - | |
|
exp_self.20260430001800.197_20260430_001801
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260430001800.197 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-30 00:19 | Success | - | |
|
exp_self.20260430001003.196_20260430_001003
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260430001003.196 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-30 00:11 | Success | - | |
|
exp_self.20260430000307.195_20260430_000307
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260430000307.195 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-30 00:04 | Success | - | |
|
exp_self.20260429235507.194_20260429_235507
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260429235507.194 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-29 23:56 | Success | - | |
|
exp_pytrain.20260429234958.049_20260429_234959
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-29 23:51 | Success | - | |
|
exp_self.20260429234753.193_20260429_234754
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260429234753.193 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-29 23:48 | Success | - | |
|
exp_self.20260429234104.192_20260429_234104
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260429234104.192 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-29 23:42 | Success | - | |
|
exp_self.20260429233355.191_20260429_233356
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260429233355.191 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-29 23:34 | Success | - | |
|
exp_self.20260429232714.190_20260429_232714
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260429232714.190 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-29 23:28 | Success | - | |
|
exp_self.20260429232018.189_20260429_232019
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260429232018.189 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-29 23:21 | Success | - | |
|
exp_pytrain.20260429231724.048_20260429_231725
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-29 23:18 | Success | - | |
|
exp_self.20260429231302.188_20260429_231303
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260429231302.188 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-29 23:14 | Success | - | |
|
exp_self.20260429230506.187_20260429_230506
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260429230506.187 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-29 23:06 | Success | - | |
|
exp_self.20260429225732.186_20260429_225732
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260429225732.186 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-29 22:59 | Success | - | |
|
exp_self.20260429225004.185_20260429_225004
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260429225004.185 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-29 22:51 | Success | - | |
|
exp_pytrain.20260429224537.047_20260429_224538
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-29 22:46 | Success | - | |
|
exp_self.20260429224250.184_20260429_224250
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260429224250.184 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-29 22:43 | Success | - | |
|
exp_self.20260429223542.183_20260429_223542
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260429223542.183 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-29 22:36 | Success | - | |
|
exp_self.20260429222843.182_20260429_222844
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260429222843.182 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-29 22:29 | Success | - | |
|
exp_self.20260429222133.181_20260429_222133
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260429222133.181 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-29 22:22 | Success | - | |
|
exp_self.20260429221432.180_20260429_221433
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260429221432.180 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-29 22:15 | Success | - | |
|
exp_pytrain.20260429221123.046_20260429_221124
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-29 22:12 | Success | - | |
|
exp_self.20260429220658.179_20260429_220659
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260429220658.179 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-29 22:08 | Success | - | |
|
exp_2604.26940v1_20260429_220337
|
Select to Think: Unlocking SLM Potential with Local Sufficiency
Paper ID: 2604.26940v1 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark s...
|
04-29 22:04 | Success | - | |
|
exp_self.20260429215901.178_20260429_215901
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260429215901.178 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-29 22:00 | Success | - | |
|
exp_hf_2604.26951_20260429_215532
|
Turning the TIDE: Cross-Architecture Distillation for Diffusion Large Language Models
Paper ID: hf_2604.26951 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
04-29 21:56 | Success | - | |
|
exp_self.20260429215000.177_20260429_215000
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260429215000.177 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-29 21:51 | Success | - | |
|
exp_2604.26951v1_20260429_214641
|
Turning the TIDE: Cross-Architecture Distillation for Diffusion Large Language Models
Paper ID: 2604.26951v1 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark s...
|
04-29 21:47 | Success | - | |
|
exp_self.20260429214213.176_20260429_214213
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260429214213.176 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-29 21:43 | Success | - | |
|
exp_pytrain.20260429213918.045_20260429_213918
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-29 21:40 | Success | - | |
|
exp_hf_2604.26779_20260429_213657
|
Accelerating RL Post-Training Rollouts via System-Integrated Speculative Decoding
Paper ID: hf_2604.26779 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
04-29 21:37 | Success | - | |
|
exp_self.20260429212939.175_20260429_212939
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260429212939.175 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-29 21:30 | Success | - | |
|
exp_self.20260429212159.174_20260429_212159
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260429212159.174 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-29 21:23 | Success | - | |
|
exp_hf_2604.26694_20260429_211854
|
Unified 4D World Action Modeling from Video Priors with Asynchronous Denoising
Paper ID: hf_2604.26694 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
04-29 21:19 | Success | - | |
|
exp_self.20260429211140.173_20260429_211140
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260429211140.173 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-29 21:12 | Success | - | |
|
exp_pytrain.20260429210644.044_20260429_210644
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-29 21:07 | Success | - | |
|
exp_self.20260429210423.172_20260429_210424
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260429210423.172 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-29 21:05 | Success | - | |
|
exp_2604.26868v1_20260429_210126
|
Breaking the Rigid Prior: Towards Articulated 3D Anomaly Detection
Paper ID: 2604.26868v1 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark s...
|
04-29 21:02 | Success | - | |
|
exp_2604.26857v1_20260429_205717
|
Edge AI for Automotive Vulnerable Road User Safety: Deployable Detection via Knowledge Distillation
Paper ID: 2604.26857v1 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark s...
|
04-29 20:58 | Success | - | |
|
exp_self.20260429205500.171_20260429_205500
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260429205500.171 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-29 20:56 | Success | - | |
|
exp_self.20260429204711.170_20260429_204711
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260429204711.170 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-29 20:48 | Success | - | |
|
exp_2604.26866v1_20260429_204345
|
MoRFI: Monotonic Sparse Autoencoder Feature Identification
Paper ID: 2604.26866v1 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark s...
|
04-29 20:44 | Success | - | |
|
exp_self.20260429203955.169_20260429_203955
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260429203955.169 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-29 20:40 | Success | - | |
|
exp_pytrain.20260429203510.043_20260429_203510
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-29 20:36 | Success | - | |
|
exp_self.20260429203302.168_20260429_203303
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260429203302.168 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-29 20:34 | Success | - | |
|
exp_cr_10.22214_ijraset.2026.80728_20260429_202947
|
ViT-YOLOv8: A Hybrid Transformer-Convolutional Model for Small Object Classification in UAV Imagery Using VisDrone
Paper ID: cr_10.22214_ijraset.2026.80728 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Rec...
|
04-29 20:30 | Success | - | |
|
exp_self.20260429202521.167_20260429_202522
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260429202521.167 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-29 20:26 | Success | - | |
|
exp_self.20260429201426.166_20260429_201426
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260429201426.166 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-29 20:15 | Success | - | |
|
exp_self.20260429200546.165_20260429_200547
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260429200546.165 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-29 20:06 | Success | - | |
|
exp_pytrain.20260429200309.042_20260429_200309
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-29 20:04 | Success | - | |
|
exp_self.20260429195733.164_20260429_195733
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260429195733.164 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-29 19:58 | Success | - | |
|
exp_self.20260429194949.163_20260429_194950
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260429194949.163 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-29 19:50 | Success | - | |
|
exp_self.20260429194209.162_20260429_194209
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260429194209.162 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-29 19:43 | Success | - | |
|
exp_self.20260429193427.161_20260429_193428
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260429193427.161 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-29 19:35 | Success | - | |
|
exp_pytrain.20260429193141.041_20260429_193141
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-29 19:32 | Success | - | |
|
exp_self.20260429192442.160_20260429_192443
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260429192442.160 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-29 19:25 | Success | - | |
|
exp_self.20260429191658.159_20260429_191658
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260429191658.159 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-29 19:18 | Success | - | |
|
exp_self.20260429190914.158_20260429_190914
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260429190914.158 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-29 19:10 | Success | - | |
|
exp_self.20260429190136.157_20260429_190136
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260429190136.157 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-29 19:02 | Success | - | |
|
exp_pytrain.20260429185856.040_20260429_185856
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-29 19:00 | Success | - | |
|
exp_self.20260429185141.156_20260429_185142
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260429185141.156 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-29 18:52 | Success | - | |
|
exp_self.20260429184358.155_20260429_184359
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260429184358.155 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-29 18:45 | Success | - | |
|
exp_self.20260429183613.154_20260429_183613
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260429183613.154 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-29 18:37 | Success | - | |
|
exp_self.20260429182851.153_20260429_182852
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260429182851.153 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-29 18:29 | Success | - | |
|
exp_pytrain.20260429182627.039_20260429_182627
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-29 18:27 | Success | - | |
|
exp_self.20260429181931.152_20260429_181931
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260429181931.152 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-29 18:20 | Success | - | |
|
exp_self.20260429181202.151_20260429_181202
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260429181202.151 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-29 18:13 | Success | - | |
|
exp_self.20260429180426.150_20260429_180426
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260429180426.150 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-29 18:05 | Success | - | |
|
exp_self.20260429175657.149_20260429_175658
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260429175657.149 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-29 17:58 | Success | - | |
|
exp_pytrain.20260429175429.038_20260429_175429
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-29 17:55 | Success | - | |
|
exp_self.20260429174724.148_20260429_174725
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260429174724.148 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-29 17:48 | Success | - | |
|
exp_self.20260429173954.147_20260429_173954
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260429173954.147 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-29 17:40 | Success | - | |
| exp_self.20260429173221.146_20260429_173221 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260429173221.146 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-29 17:33 | Success | - | |
| exp_self.20260429172456.145_20260429_172456 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260429172456.145 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-29 17:25 | Success | - | |
| exp_pytrain.20260429172232.037_20260429_172233 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-29 17:23 | Success | - | |
| exp_self.20260429171814.144_20260429_171814 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260429171814.144 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-29 17:19 | Success | - | |
| exp_self.20260429171047.143_20260429_171048 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260429171047.143 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-29 17:11 | Success | - | |
| exp_self.20260429170314.142_20260429_170314 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260429170314.142 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-29 17:04 | Success | - | |
| exp_self.20260429165340.141_20260429_165341 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260429165340.141 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-29 16:54 | Success | - | |
| exp_pytrain.20260429165117.036_20260429_165118 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-29 16:52 | Success | - | |
| exp_self.20260429164420.140_20260429_164420 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260429164420.140 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-29 16:45 | Success | - | |
| exp_self.20260429163658.139_20260429_163658 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260429163658.139 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-29 16:38 | Success | - | |
| exp_self.20260429162936.138_20260429_162936 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260429162936.138 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-29 16:30 | Success | - | |
| exp_self.20260429162205.137_20260429_162205 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260429162205.137 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-29 16:23 | Success | - | |
| exp_pytrain.20260429161941.035_20260429_161941 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-29 16:20 | Success | - | |
| exp_self.20260429161425.136_20260429_161426 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260429161425.136 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-29 16:15 | Success | - | |
| exp_self.20260429160658.135_20260429_160658 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260429160658.135 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-29 16:08 | Success | - | |
| exp_self.20260429155936.134_20260429_155936 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260429155936.134 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-29 16:00 | Success | - | |
| exp_self.20260429155213.133_20260429_155213 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260429155213.133 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-29 15:53 | Success | - | |
| exp_pytrain.20260429154749.034_20260429_154749 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-29 15:48 | Success | - | |
| exp_self.20260429154103.132_20260429_154103 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260429154103.132 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-29 15:42 | Success | - | |
| exp_self.20260429153327.131_20260429_153327 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260429153327.131 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-29 15:34 | Success | - | |
| exp_self.20260429152558.130_20260429_152558 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260429152558.130 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-29 15:27 | Success | - | |
| exp_self.20260429151831.129_20260429_151832 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260429151831.129 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-29 15:19 | Success | - | |
| exp_pytrain.20260429151605.033_20260429_151605 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-29 15:17 | Success | - | |
| exp_self.20260429150907.128_20260429_150908 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260429150907.128 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-29 15:10 | Success | - | |
| exp_self.20260429150145.127_20260429_150145 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260429150145.127 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-29 15:02 | Success | - | |
| exp_self.20260429145411.126_20260429_145412 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260429145411.126 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-29 14:55 | Success | - | |
| exp_self.20260429144637.125_20260429_144637 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260429144637.125 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-29 14:47 | Success | - | |
| exp_pytrain.20260429144331.032_20260429_144331 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-29 14:44 | Success | - | |
| exp_self.20260429143624.124_20260429_143624 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260429143624.124 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-29 14:37 | Success | - | |
| exp_self.20260429142843.123_20260429_142844 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260429142843.123 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-29 14:29 | Success | - | |
| exp_self.20260429142112.122_20260429_142112 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260429142112.122 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-29 14:22 | Success | - | |
| exp_self.20260429141336.121_20260429_141337 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260429141336.121 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-29 14:14 | Success | - | |
| exp_pytrain.20260429141106.031_20260429_141106 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-29 14:12 | Success | - | |
| exp_self.20260429140403.120_20260429_140404 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260429140403.120 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-29 14:05 | Success | - | |
| exp_self.20260429135634.119_20260429_135635 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260429135634.119 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-29 13:57 | Success | - | |
| exp_self.20260429134908.118_20260429_134908 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260429134908.118 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-29 13:50 | Success | - | |
| exp_self.20260429134131.117_20260429_134131 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260429134131.117 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-29 13:42 | Success | - | |
| exp_pytrain.20260429133901.030_20260429_133902 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-29 13:40 | Success | - | |
| exp_self.20260429133201.116_20260429_133201 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260429133201.116 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-29 13:33 | Success | - | |
| exp_self.20260429132430.115_20260429_132430 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260429132430.115 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-29 13:25 | Success | - | |
| exp_self.20260429131658.114_20260429_131658 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260429131658.114 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-29 13:18 | Success | - | |
| exp_self.20260429130922.113_20260429_130922 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260429130922.113 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-29 13:10 | Success | - | |
| exp_pytrain.20260429130651.029_20260429_130652 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-29 13:07 | Success | - | |
| exp_self.20260429125949.112_20260429_125949 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260429125949.112 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-29 13:00 | Success | - | |
| exp_self.20260429125221.111_20260429_125221 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260429125221.111 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-29 12:53 | Success | - | |
| exp_self.20260429124450.110_20260429_124450 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260429124450.110 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-29 12:45 | Success | - | |
| exp_self.20260429123721.109_20260429_123721 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260429123721.109 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-29 12:38 | Success | - | |
| exp_pytrain.20260429123444.028_20260429_123444 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-29 12:35 | Success | - | |
| exp_self.20260429122743.108_20260429_122744 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260429122743.108 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-29 12:28 | Success | - | |
| exp_self.20260429122011.107_20260429_122012 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260429122011.107 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-29 12:21 | Success | - | |
| exp_self.20260429121240.106_20260429_121240 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260429121240.106 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-29 12:13 | Success | - | |
| exp_self.20260429120510.105_20260429_120511 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260429120510.105 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-29 12:06 | Success | - | |
| exp_pytrain.20260429120236.027_20260429_120236 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-29 12:03 | Success | - | |
| exp_self.20260429115535.104_20260429_115535 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260429115535.104 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-29 11:56 | Success | - | |
| exp_self.20260429114801.103_20260429_114801 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260429114801.103 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-29 11:49 | Success | - | |
| exp_self.20260429114031.102_20260429_114032 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260429114031.102 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-29 11:41 | Success | - | |
| exp_self.20260429113259.101_20260429_113300 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260429113259.101 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-29 11:34 | Success | - | |
| exp_pytrain.20260429113025.026_20260429_113026 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-29 11:31 | Success | - | |
| exp_self.20260429112322.100_20260429_112322 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260429112322.100 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-29 11:24 | Success | - | |
| exp_self.20260429111540.099_20260429_111540 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260429111540.099 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-29 11:16 | Success | - | |
| exp_self.20260429110758.098_20260429_110758 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260429110758.098 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-29 11:09 | Success | - | |
| exp_self.20260429110026.097_20260429_110026 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260429110026.097 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-29 11:01 | Success | - | |
| exp_pytrain.20260429105751.025_20260429_105752 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-29 10:58 | Success | - | |
| exp_self.20260429105213.096_20260429_105214 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260429105213.096 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-29 10:53 | Success | - | |
| exp_self.20260429104423.095_20260429_104423 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260429104423.095 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-29 10:45 | Success | - | |
| exp_self.20260429103635.094_20260429_103636 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260429103635.094 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-29 10:37 | Success | - | |
| exp_self.20260429102857.093_20260429_102857 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260429102857.093 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-29 10:29 | Success | - | |
| exp_pytrain.20260429102633.024_20260429_102633 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-29 10:27 | Success | - | |
| exp_self.20260429101930.092_20260429_101931 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260429101930.092 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-29 10:20 | Success | - | |
| exp_self.20260429101202.091_20260429_101203 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260429101202.091 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-29 10:13 | Success | - | |
| exp_self.20260429100426.090_20260429_100426 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260429100426.090 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-29 10:05 | Success | - | |
| exp_self.20260429095643.089_20260429_095643 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260429095643.089 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-29 09:57 | Success | - | |
| exp_pytrain.20260429095419.023_20260429_095420 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-29 09:55 | Success | - | |
| exp_self.20260429094953.088_20260429_094954 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260429094953.088 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-29 09:50 | Success | - | |
| exp_self.20260429094225.087_20260429_094225 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260429094225.087 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-29 09:43 | Success | - | |
| exp_self.20260429093500.086_20260429_093500 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260429093500.086 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-29 09:36 | Success | - | |
| exp_self.20260429092730.085_20260429_092730 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260429092730.085 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-29 09:28 | Success | - | |
| exp_pytrain.20260429092303.022_20260429_092303 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-29 09:24 | Success | - | |
| exp_self.20260429091605.084_20260429_091605 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260429091605.084 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-29 09:17 | Success | - | |
| exp_self.20260429090821.083_20260429_090822 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260429090821.083 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-29 09:09 | Success | - | |
| exp_self.20260429090037.082_20260429_090038 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260429090037.082 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-29 09:01 | Success | - | |
| exp_self.20260429085252.081_20260429_085252 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260429085252.081 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-29 08:53 | Success | - | |
| exp_pytrain.20260429085020.021_20260429_085021 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-29 08:51 | Success | - | |
| exp_self.20260429084424.080_20260429_084424 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260429084424.080 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-29 08:45 | Success | - | |
| exp_self.20260429083640.079_20260429_083641 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260429083640.079 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-29 08:37 | Success | - | |
| exp_self.20260429082900.078_20260429_082900 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260429082900.078 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-29 08:30 | Success | - | |
| exp_self.20260429082108.077_20260429_082108 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260429082108.077 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-29 08:22 | Success | - | |
| exp_pytrain.20260429081836.020_20260429_081837 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-29 08:19 | Success | - | |
| exp_self.20260429081244.076_20260429_081244 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260429081244.076 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-29 08:13 | Success | - | |
| exp_self.20260429080451.075_20260429_080451 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260429080451.075 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-29 08:05 | Success | - | |
| exp_self.20260429075707.074_20260429_075708 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260429075707.074 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-29 07:58 | Success | - | |
| exp_self.20260429074936.073_20260429_074936 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260429074936.073 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-29 07:50 | Success | - | |
| exp_pytrain.20260429074707.019_20260429_074707 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-29 07:48 | Success | - | |
| exp_self.20260429074003.072_20260429_074004 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260429074003.072 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-29 07:41 | Success | - | |
| exp_self.20260429073228.071_20260429_073228 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260429073228.071 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-29 07:33 | Success | - | |
| exp_self.20260429072449.070_20260429_072450 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260429072449.070 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-29 07:25 | Success | - | |
| exp_self.20260429071658.069_20260429_071658 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260429071658.069 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-29 07:18 | Success | - | |
| exp_pytrain.20260429071433.018_20260429_071433 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-29 07:15 | Success | - | |
| exp_self.20260429070734.068_20260429_070734 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260429070734.068 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-29 07:08 | Success | - | |
| exp_self.20260429070007.067_20260429_070008 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260429070007.067 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-29 07:01 | Success | - | |
| exp_self.20260429065244.066_20260429_065244 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260429065244.066 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-29 06:53 | Success | - | |
| exp_self.20260429064518.065_20260429_064519 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260429064518.065 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-29 06:46 | Success | - | |
| exp_pytrain.20260429064248.017_20260429_064248 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-29 06:43 | Success | - | |
|
exp_self.20260429063534.064_20260429_063535
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260429063534.064 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-29 06:36 | Success | - | |
|
exp_self.20260429062803.063_20260429_062804
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260429062803.063 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-29 06:29 | Success | - | |
|
exp_self.20260429062037.062_20260429_062038
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260429062037.062 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-29 06:21 | Success | - | |
|
exp_self.20260429061307.061_20260429_061308
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260429061307.061 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-29 06:14 | Success | - | |
|
exp_pytrain.20260429061036.016_20260429_061037
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-29 06:11 | Success | - | |
|
exp_self.20260429060341.060_20260429_060341
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260429060341.060 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-29 06:04 | Success | - | |
|
exp_self.20260429055613.059_20260429_055613
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260429055613.059 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-29 05:57 | Success | - | |
|
exp_self.20260429054846.058_20260429_054846
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260429054846.058 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-29 05:49 | Success | - | |
|
exp_self.20260429054123.057_20260429_054123
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260429054123.057 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-29 05:42 | Success | - | |
|
exp_pytrain.20260429053853.015_20260429_053853
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-29 05:39 | Success | - | |
|
exp_self.20260429053206.056_20260429_053207
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260429053206.056 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-29 05:33 | Success | - | |
|
exp_self.20260429052436.055_20260429_052436
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260429052436.055 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-29 05:25 | Success | - | |
|
exp_self.20260429051705.054_20260429_051706
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260429051705.054 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-29 05:18 | Success | - | |
|
exp_self.20260429050937.053_20260429_050937
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260429050937.053 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-29 05:10 | Success | - | |
|
exp_pytrain.20260429050708.014_20260429_050709
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-29 05:08 | Success | - | |
|
exp_self.20260429050013.052_20260429_050013
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260429050013.052 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-29 05:01 | Success | - | |
|
exp_self.20260429045238.051_20260429_045238
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260429045238.051 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-29 04:53 | Success | - | |
|
exp_self.20260429044501.050_20260429_044501
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260429044501.050 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-29 04:46 | Success | - | |
|
exp_self.20260429043731.049_20260429_043732
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260429043731.049 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-29 04:38 | Success | - | |
|
exp_pytrain.20260429043508.013_20260429_043508
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-29 04:36 | Success | - | |
|
exp_self.20260429042806.048_20260429_042806
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260429042806.048 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-29 04:29 | Success | - | |
|
exp_self.20260429042039.047_20260429_042040
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260429042039.047 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-29 04:21 | Success | - | |
|
exp_self.20260429041305.046_20260429_041305
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260429041305.046 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-29 04:14 | Success | - | |
|
exp_self.20260429040535.045_20260429_040535
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260429040535.045 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-29 04:06 | Success | - | |
|
exp_pytrain.20260429040313.012_20260429_040313
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-29 04:04 | Success | - | |
|
exp_self.20260429035604.044_20260429_035604
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260429035604.044 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-29 03:57 | Success | - | |
|
exp_self.20260429034834.043_20260429_034834
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260429034834.043 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-29 03:49 | Success | - | |
|
exp_self.20260429034100.042_20260429_034101
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260429034100.042 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-29 03:42 | Success | - | |
|
exp_self.20260429033328.041_20260429_033329
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260429033328.041 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-29 03:34 | Success | - | |
|
exp_pytrain.20260429033106.011_20260429_033106
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-29 03:32 | Success | - | |
|
exp_self.20260429032403.040_20260429_032404
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260429032403.040 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-29 03:25 | Success | - | |
|
exp_self.20260429031632.039_20260429_031633
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260429031632.039 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-29 03:17 | Success | - | |
|
exp_self.20260429030859.038_20260429_030859
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260429030859.038 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-29 03:10 | Success | - | |
|
exp_self.20260429030114.037_20260429_030115
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260429030114.037 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-29 03:02 | Success | - | |
|
exp_pytrain.20260429025847.010_20260429_025847
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-29 02:59 | Success | - | |
|
exp_self.20260429025145.036_20260429_025145
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260429025145.036 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-29 02:52 | Success | - | |
|
exp_self.20260429024415.035_20260429_024415
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260429024415.035 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-29 02:45 | Success | - | |
|
exp_self.20260429023642.034_20260429_023642
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260429023642.034 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-29 02:37 | Success | - | |
|
exp_self.20260429022907.033_20260429_022907
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260429022907.033 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-29 02:30 | Success | - | |
|
exp_pytrain.20260429022639.009_20260429_022640
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-29 02:27 | Success | - | |
|
exp_hf_2604.25719_20260429_022357
|
Step-Audio-R1.5 Technical Report
Paper ID: hf_2604.25719 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
04-29 02:24 | Success | - | |
|
exp_self.20260429021938.032_20260429_021938
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260429021938.032 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-29 02:20 | Success | - | |
|
exp_self.20260429021202.031_20260429_021203
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260429021202.031 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-29 02:13 | Success | - | |
|
exp_self.20260429020427.030_20260429_020427
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260429020427.030 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-29 02:05 | Success | - | |
|
exp_self.20260429015659.029_20260429_015659
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260429015659.029 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-29 01:58 | Success | - | |
|
exp_pytrain.20260429015436.008_20260429_015436
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-29 01:55 | Success | - | |
|
exp_self.20260429014732.028_20260429_014733
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260429014732.028 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-29 01:48 | Success | - | |
|
exp_self.20260429014007.027_20260429_014008
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260429014007.027 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-29 01:41 | Success | - | |
|
exp_self.20260429013231.026_20260429_013232
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260429013231.026 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-29 01:33 | Success | - | |
|
exp_self.20260429012500.025_20260429_012500
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260429012500.025 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-29 01:26 | Success | - | |
|
exp_pytrain.20260429012238.007_20260429_012238
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-29 01:23 | Success | - | |
|
exp_self.20260429011538.024_20260429_011539
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260429011538.024 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-29 01:16 | Success | - | |
|
exp_self.20260429010814.023_20260429_010814
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260429010814.023 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-29 01:09 | Success | - | |
|
exp_self.20260429010032.022_20260429_010033
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260429010032.022 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-29 01:01 | Success | - | |
|
exp_self.20260429005303.021_20260429_005304
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260429005303.021 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-29 00:54 | Success | - | |
|
exp_pytrain.20260429005041.006_20260429_005041
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-29 00:51 | Success | - | |
|
exp_self.20260429004625.020_20260429_004626
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260429004625.020 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-29 00:47 | Success | - | |
|
exp_self.20260429003900.019_20260429_003900
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260429003900.019 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-29 00:40 | Success | - | |
|
exp_gh_burgerkhan6227_tokenWise-Optimizer_20260429_003437
|
burgerkhan6227/tokenWise-Optimizer
Paper ID: gh_burgerkhan6227_tokenWise-Optimizer - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Sign...
|
04-29 00:35 | Success | - | |
|
exp_self.20260429003127.018_20260429_003127
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260429003127.018 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-29 00:32 | Success | - | |
|
exp_cr_10.30574_ijsra.2026.19.1.0697_20260429_002812
|
Formation and efficiency analysis of an innovative business model in automotive engineering based on the principles of o...
Paper ID: cr_10.30574_ijsra.2026.19.1.0697 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: R...
|
04-29 00:29 | Success | - | |
|
exp_self.20260429002249.017_20260429_002249
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260429002249.017 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-29 00:23 | Success | - | |
|
exp_pytrain.20260429001912.005_20260429_001912
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-29 00:20 | Success | - | |
|
exp_self.20260429001458.016_20260429_001458
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260429001458.016 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-29 00:16 | Success | - | |
|
exp_cr_10.65196_7a1sxq95_20260429_001208
|
Exploring the Application of Quantum Machine Learning in Accelerating Large-Model Training
Paper ID: cr_10.65196_7a1sxq95 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered ben...
|
04-29 00:13 | Success | - | |
|
exp_self.20260429000505.015_20260429_000506
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260429000505.015 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-29 00:06 | Success | - | |
|
exp_cr_10.22214_ijraset.2026.79880_20260429_000155
|
Design and Evaluation of a Smartphone Application for Early Atopic Dermatitis Screening Using Large Language Model
Paper ID: cr_10.22214_ijraset.2026.79880 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Rec...
|
04-29 00:02 | Success | - | |
|
exp_self.20260428235733.014_20260428_235733
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260428235733.014 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-28 23:58 | Success | - | |
|
exp_self.20260428235009.013_20260428_235009
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260428235009.013 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-28 23:51 | Success | - | |
|
exp_pytrain.20260428234746.004_20260428_234746
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-28 23:48 | Success | - | |
|
exp_self.20260428234051.012_20260428_234052
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260428234051.012 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-28 23:41 | Success | - | |
|
exp_self.20260428233325.011_20260428_233326
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260428233325.011 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-28 23:34 | Success | - | |
|
exp_self.20260428232600.010_20260428_232600
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260428232600.010 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-28 23:27 | Success | - | |
|
exp_self.20260428231823.009_20260428_231824
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260428231823.009 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-28 23:19 | Success | - | |
|
exp_pytrain.20260428231559.003_20260428_231600
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-28 23:17 | Success | - | |
|
exp_self.20260428230859.008_20260428_230859
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260428230859.008 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-28 23:10 | Success | - | |
|
exp_self.20260428230131.007_20260428_230132
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260428230131.007 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-28 23:02 | Success | - | |
|
exp_self.20260428225412.006_20260428_225412
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260428225412.006 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-28 22:55 | Success | - | |
|
exp_self.20260428224643.005_20260428_224643
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260428224643.005 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-28 22:47 | Success | - | |
|
exp_pytrain.20260428224416.002_20260428_224416
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-28 22:45 | Success | - | |
|
exp_self.20260428223721.004_20260428_223721
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260428223721.004 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-28 22:38 | Success | - | |
|
exp_self.20260428222954.003_20260428_222954
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260428222954.003 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-28 22:30 | Success | - | |
|
exp_self.20260428222228.002_20260428_222229
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260428222228.002 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-28 22:23 | Success | - | |
|
exp_hf_2604.23941_20260428_221910
|
GoClick: Lightweight Element Grounding Model for Autonomous GUI Interaction
Paper ID: hf_2604.23941 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
04-28 22:20 | Success | - | |
|
exp_self.20260428221455.001_20260428_221455
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260428221455.001 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-28 22:15 | Success | - | |
|
exp_pytrain.20260428221232.001_20260428_221233
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-28 22:13 | Success | - | |
|
exp_self.20260428220844.040_20260428_220844
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260428220844.040 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-28 22:09 | Success | - | |
|
exp_pytrain.20260428220612.011_20260428_220612
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-28 22:07 | Success | - | |
|
exp_2604.25902v1_20260428_220326
|
Toward a Functional Geometric Algebra for Natural Language Semantics
Paper ID: 2604.25902v1 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark s...
|
04-28 22:04 | Success | - | |
|
exp_2604.25917v1_20260428_215820
|
Recursive Multi-Agent Systems
Paper ID: 2604.25917v1 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark s...
|
04-28 21:59 | Success | - | |
|
exp_self.20260428215609.039_20260428_215609
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260428215609.039 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-28 21:57 | Success | - | |
|
exp_2604.25903v1_20260428_215256
|
Carbon-Taxed Transformers: A Green Compression Pipeline for Overgrown Language Models
Paper ID: 2604.25903v1 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark s...
|
04-28 21:53 | Success | - | |
|
exp_self.20260428214720.038_20260428_214721
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260428214720.038 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-28 21:48 | Success | - | |
|
exp_self.20260428213945.037_20260428_213945
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260428213945.037 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-28 21:40 | Success | - | |
|
exp_hf_2604.25203_20260428_213651
|
BARRED: Synthetic Training of Custom Policy Guardrails via Asymmetric Debate
Paper ID: hf_2604.25203 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
04-28 21:37 | Success | - | |
|
exp_pytrain.20260428213447.010_20260428_213448
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-28 21:35 | Success | - | |
|
exp_hf_2604.25819_20260428_212948
|
Mutual Forcing: Dual-Mode Self-Evolution for Fast Autoregressive Audio-Video Character Generation
Paper ID: hf_2604.25819 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
04-28 21:30 | Success | - | |
|
exp_self.20260428212744.036_20260428_212745
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260428212744.036 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-28 21:28 | Success | - | |
|
exp_hf_2604.25427_20260428_212208
|
A Systematic Post-Train Framework for Video Generation
Paper ID: hf_2604.25427 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
04-28 21:23 | Success | - | |
|
exp_self.20260428212004.035_20260428_212004
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260428212004.035 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-28 21:21 | Success | - | |
|
exp_self.20260428211233.034_20260428_211234
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260428211233.034 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-28 21:13 | Success | - | |
|
exp_self.20260428210458.033_20260428_210458
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260428210458.033 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-28 21:06 | Success | - | |
|
exp_pytrain.20260428210228.009_20260428_210228
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-28 21:03 | Success | - | |
|
exp_2604.25740v1_20260428_205727
|
QAROO: AI-Driven Online Task Offloading for Energy-Efficient and Sustainable MEC Networks
Paper ID: 2604.25740v1 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark s...
|
04-28 20:58 | Success | - | |
|
exp_self.20260428205522.032_20260428_205522
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260428205522.032 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-28 20:56 | Success | - | |
|
exp_2604.25774v1_20260428_205056
|
CGU-ILALab at FoodBench-QA 2026: Comparing Traditional and LLM-based Approaches for Recipe Nutrient Estimation
Paper ID: 2604.25774v1 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark s...
|
04-28 20:51 | Success | - | |
|
exp_self.20260428204740.031_20260428_204741
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260428204740.031 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-28 20:48 | Success | - | |
|
exp_hf_2604.25917_20260428_204421
|
Recursive Multi-Agent Systems
Paper ID: hf_2604.25917 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
04-28 20:45 | Success | - | |
|
exp_self.20260428203959.030_20260428_204000
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260428203959.030 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-28 20:41 | Success | - | |
|
exp_self.20260428203229.029_20260428_203229
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260428203229.029 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-28 20:33 | Success | - | |
|
exp_pytrain.20260428202954.008_20260428_202954
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-28 20:30 | Success | - | |
|
exp_hf_2604.18756_20260428_202707
|
Towards Understanding the Robustness of Sparse Autoencoders
Paper ID: hf_2604.18756 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
04-28 20:28 | Success | - | |
|
exp_self.20260428202141.028_20260428_202141
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260428202141.028 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-28 20:22 | Success | - | |
|
exp_self.20260428201403.027_20260428_201404
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260428201403.027 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-28 20:15 | Success | - | |
|
exp_self.20260428200633.026_20260428_200633
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260428200633.026 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-28 20:07 | Success | - | |
|
exp_self.20260428195907.025_20260428_195907
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260428195907.025 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-28 20:00 | Success | - | |
|
exp_pytrain.20260428195633.007_20260428_195633
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-28 19:57 | Success | - | |
|
exp_self.20260428194943.024_20260428_194943
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260428194943.024 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-28 19:50 | Success | - | |
|
exp_self.20260428194241.023_20260428_194242
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260428194241.023 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-28 19:43 | Success | - | |
|
exp_self.20260428193456.022_20260428_193457
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260428193456.022 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-28 19:35 | Success | - | |
|
exp_self.20260428192730.021_20260428_192730
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260428192730.021 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-28 19:28 | Success | - | |
|
exp_pytrain.20260428192459.006_20260428_192459
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-28 19:26 | Success | - | |
|
exp_self.20260428191815.020_20260428_191815
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260428191815.020 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-28 19:19 | Success | - | |
|
exp_self.20260428191047.019_20260428_191048
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260428191047.019 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-28 19:11 | Success | - | |
|
exp_self.20260428190326.018_20260428_190326
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260428190326.018 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-28 19:04 | Success | - | |
|
exp_self.20260428185559.017_20260428_185600
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260428185559.017 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-28 18:57 | Success | - | |
|
exp_pytrain.20260428185327.005_20260428_185328
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-28 18:54 | Success | - | |
|
exp_self.20260428184700.016_20260428_184700
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260428184700.016 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-28 18:48 | Success | - | |
|
exp_self.20260428184004.015_20260428_184004
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260428184004.015 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-28 18:41 | Success | - | |
|
exp_self.20260428183153.014_20260428_183154
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260428183153.014 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-28 18:32 | Success | - | |
|
exp_self.20260428182343.013_20260428_182344
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260428182343.013 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-28 18:24 | Success | - | |
|
exp_pytrain.20260428182039.004_20260428_182039
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-28 18:21 | Success | - | |
|
exp_self.20260428181406.012_20260428_181406
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260428181406.012 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-28 18:15 | Success | - | |
|
exp_self.20260428180552.011_20260428_180553
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260428180552.011 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-28 18:06 | Success | - | |
|
exp_self.20260428175850.010_20260428_175851
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260428175850.010 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-28 17:59 | Success | - | |
|
exp_self.20260428175152.009_20260428_175153
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260428175152.009 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-28 17:52 | Success | - | |
|
exp_pytrain.20260428174844.003_20260428_174845
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-28 17:49 | Success | - | |
|
exp_gh_Rangle2_mda_20260428_174430
|
Rangle2/mda
Paper ID: gh_Rangle2_mda - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
04-28 17:45 | Success | - | |
|
exp_self.20260428174142.008_20260428_174143
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260428174142.008 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-28 17:42 | Success | - | |
|
exp_self.20260428173330.007_20260428_173330
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260428173330.007 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-28 17:34 | Success | - | |
|
exp_self.20260428172558.006_20260428_172559
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260428172558.006 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-28 17:27 | Success | - | |
|
exp_self.20260428171832.005_20260428_171832
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260428171832.005 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-28 17:19 | Success | - | |
|
exp_pytrain.20260428171608.002_20260428_171608
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-28 17:17 | Success | - | |
|
exp_self.20260428170918.004_20260428_170919
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260428170918.004 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-28 17:10 | Success | - | |
|
exp_self.20260428170158.003_20260428_170158
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260428170158.003 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-28 17:03 | Success | - | |
|
exp_self.20260428165437.002_20260428_165438
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260428165437.002 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-28 16:55 | Success | - | |
|
exp_hf_2604.15574_20260428_165123
|
Why Fine-Tuning Encourages Hallucinations and How to Fix It
Paper ID: hf_2604.15574 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
04-28 16:52 | Success | - | |
|
exp_self.20260428164709.001_20260428_164709
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260428164709.001 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-28 16:48 | Success | - | |
|
exp_pytrain.20260428164446.001_20260428_164446
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-28 16:45 | Success | - | |
|
exp_self.20260428162713.043_20260428_162714
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260428162713.043 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-28 16:38 | Pending | - | |
|
exp_pytrain.20260428162422.016_20260428_162422
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-28 16:25 | Success | - | |
|
exp_hf_2604.24040_20260428_160049
|
Improving Robustness of Tabular Retrieval via Representational Stability
Paper ID: hf_2604.24040 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
04-28 16:22 | Failed | NameError: name 'D_MODEL' is not defined | |
|
exp_self.20260428153717.042_20260428_153717
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260428153717.042 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-28 15:59 | Failed | NameError: name 'D_MODEL' is not defined | |
|
exp_pytrain.20260428153326.015_20260428_153327
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-28 15:35 | Success | - | |
|
exp_hf_2604.21681_20260428_150852
|
Sapiens2
Paper ID: hf_2604.21681 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
04-28 15:31 | Failed | NameError: name 'D_MODEL' is not defined | |
|
exp_self.20260428144447.041_20260428_144447
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260428144447.041 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-28 15:06 | Failed | NameError: name 'D_MODEL' is not defined | |
|
exp_pytrain.20260428144204.014_20260428_144205
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-28 14:43 | Success | - | |
|
exp_self.20260428141318.040_20260428_141319
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260428141318.040 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-28 14:36 | Failed | NameError: name 'D_MODEL' is not defined | |
|
exp_pytrain.20260428140827.013_20260428_140827
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-28 14:11 | Success | - | |
|
exp_self.20260428134113.039_20260428_134113
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260428134113.039 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-28 14:03 | Failed | NameError: name 'D_MODEL' is not defined | |
|
exp_pytrain.20260428133611.012_20260428_133611
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-28 13:37 | Success | - | |
|
exp_self.20260428131317.038_20260428_131317
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260428131317.038 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-28 13:35 | Failed | NameError: name 'D_MODEL' is not defined | |
|
exp_self.20260428124536.037_20260428_124536
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260428124536.037 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-28 13:07 | Failed | NameError: name 'D_MODEL' is not defined | |
|
exp_pytrain.20260428124255.011_20260428_124255
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-28 12:44 | Success | - | |
|
exp_self.20260428121459.036_20260428_121459
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260428121459.036 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-28 12:36 | Failed | NameError: name 'D_MODEL' is not defined | |
|
exp_pytrain.20260428115034.010_20260428_115034
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-28 12:11 | Failed | Timeout while waiting for process shutdown | |
|
exp_self.20260428114309.035_20260428_114310
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260428114309.035 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-28 11:44 | Success | - | |
|
exp_self.20260428113542.034_20260428_113542
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260428113542.034 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-28 11:36 | Success | - | |
|
exp_self.20260428112759.033_20260428_112759
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260428112759.033 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-28 11:29 | Success | - | |
|
exp_self.20260428112052.032_20260428_112053
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260428112052.032 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-28 11:22 | Success | - | |
|
exp_pytrain.20260428111753.009_20260428_111753
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-28 11:18 | Success | - | |
|
exp_self.20260428111112.031_20260428_111112
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260428111112.031 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-28 11:12 | Success | - | |
|
exp_self.20260428110353.030_20260428_110353
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260428110353.030 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-28 11:05 | Success | - | |
|
exp_self.20260428105632.029_20260428_105633
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260428105632.029 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-28 10:57 | Success | - | |
|
exp_self.20260428104908.028_20260428_104908
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260428104908.028 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-28 10:50 | Success | - | |
|
exp_pytrain.20260428104549.008_20260428_104549
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-28 10:46 | Success | - | |
|
exp_self.20260428103915.027_20260428_103915
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260428103915.027 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-28 10:40 | Success | - | |
|
exp_self.20260428103147.026_20260428_103147
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260428103147.026 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-28 10:32 | Success | - | |
|
exp_self.20260428102407.025_20260428_102407
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260428102407.025 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-28 10:25 | Success | - | |
|
exp_self.20260428101639.024_20260428_101639
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260428101639.024 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-28 10:17 | Success | - | |
|
exp_pytrain.20260428101327.007_20260428_101328
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-28 10:14 | Success | - | |
|
exp_self.20260428100834.023_20260428_100834
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260428100834.023 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-28 10:09 | Success | - | |
|
exp_self.20260428100108.022_20260428_100109
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260428100108.022 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-28 10:02 | Success | - | |
|
exp_self.20260428095339.021_20260428_095339
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260428095339.021 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-28 09:54 | Success | - | |
|
exp_self.20260428094609.020_20260428_094610
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260428094609.020 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-28 09:47 | Success | - | |
|
exp_pytrain.20260428094056.006_20260428_094056
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-28 09:42 | Success | - | |
|
exp_self.20260428093607.019_20260428_093608
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260428093607.019 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-28 09:38 | Success | - | |
|
exp_self.20260428092517.018_20260428_092518
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260428092517.018 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-28 09:28 | Success | - | |
|
exp_self.20260428091435.017_20260428_091435
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260428091435.017 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-28 09:17 | Success | - | |
|
exp_pytrain.20260428090851.005_20260428_090851
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-28 09:10 | Success | - | |
|
exp_self.20260428090628.016_20260428_090628
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260428090628.016 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-28 09:07 | Success | - | |
|
exp_self.20260428085820.015_20260428_085821
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260428085820.015 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-28 08:59 | Success | - | |
|
exp_self.20260428085012.014_20260428_085012
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260428085012.014 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-28 08:51 | Success | - | |
|
exp_self.20260428084220.013_20260428_084221
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260428084220.013 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-28 08:43 | Success | - | |
|
exp_hf_2604.23644_20260428_083911
|
RaV-IDP: A Reconstruction-as-Validation Framework for Faithful Intelligent Document Processing
Paper ID: hf_2604.23644 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
04-28 08:40 | Success | - | |
|
exp_pytrain.20260428083651.004_20260428_083652
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-28 08:37 | Success | - | |
|
exp_self.20260428083015.012_20260428_083015
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260428083015.012 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-28 08:31 | Success | - | |
|
exp_hf_2604.17565_20260428_082520
|
UniGeo: Unifying Geometric Guidance for Camera-Controllable Image Editing via Video Models
Paper ID: hf_2604.17565 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
04-28 08:26 | Success | - | |
|
exp_self.20260428082259.011_20260428_082300
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260428082259.011 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-28 08:24 | Success | - | |
|
exp_self.20260428081528.010_20260428_081529
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260428081528.010 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-28 08:16 | Success | - | |
|
exp_self.20260428080802.009_20260428_080802
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260428080802.009 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-28 08:09 | Success | - | |
|
exp_pytrain.20260428080505.003_20260428_080505
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-28 08:06 | Success | - | |
|
exp_self.20260428080054.008_20260428_080055
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260428080054.008 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-28 08:01 | Success | - | |
|
exp_self.20260428075312.007_20260428_075312
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260428075312.007 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-28 07:54 | Success | - | |
|
exp_self.20260428074536.006_20260428_074536
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260428074536.006 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-28 07:46 | Success | - | |
|
exp_self.20260428073753.005_20260428_073754
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260428073753.005 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-28 07:39 | Success | - | |
|
exp_pytrain.20260428073321.002_20260428_073321
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-28 07:34 | Success | - | |
|
exp_self.20260428073051.004_20260428_073051
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260428073051.004 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-28 07:31 | Success | - | |
|
exp_hf_2604.22842_20260428_072732
|
EX-FIQA: Leveraging Intermediate Early eXit Representations from Vision Transformers for Face Image Quality Assessment
Paper ID: hf_2604.22842 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
04-28 07:28 | Success | - | |
|
exp_self.20260428072013.003_20260428_072013
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260428072013.003 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-28 07:21 | Success | - | |
|
exp_self.20260428071235.002_20260428_071235
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260428071235.002 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-28 07:13 | Success | - | |
|
exp_self.20260428070411.001_20260428_070412
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260428070411.001 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-28 07:05 | Success | - | |
|
exp_pytrain.20260428070115.001_20260428_070116
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-28 07:02 | Success | - | |
|
exp_self.20260428035110.271_20260428_035111
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260428035110.271 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-28 03:51 | Pending | - | |
|
exp_self.20260428034338.270_20260428_034338
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260428034338.270 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-28 03:44 | Success | - | |
|
exp_pytrain.20260428034034.066_20260428_034034
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-28 03:41 | Success | - | |
|
exp_self.20260428033356.269_20260428_033356
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260428033356.269 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-28 03:35 | Success | - | |
|
exp_self.20260428032633.268_20260428_032633
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260428032633.268 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-28 03:27 | Success | - | |
|
exp_self.20260428031904.267_20260428_031904
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260428031904.267 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-28 03:20 | Success | - | |
|
exp_self.20260428031129.266_20260428_031129
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260428031129.266 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-28 03:12 | Success | - | |
|
exp_pytrain.20260428030825.065_20260428_030825
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-28 03:09 | Success | - | |
|
exp_self.20260428030126.265_20260428_030126
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260428030126.265 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-28 03:02 | Success | - | |
|
exp_hf_2604.22841_20260428_025625
|
ATTN-FIQA: Interpretable Attention-based Face Image Quality Assessment with Vision Transformers
Paper ID: hf_2604.22841 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
04-28 02:57 | Success | - | |
|
exp_self.20260428025358.264_20260428_025358
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260428025358.264 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-28 02:55 | Success | - | |
|
exp_hf_2604.23210_20260428_024857
|
Discovering Agentic Safety Specifications from 1-Bit Danger Signals
Paper ID: hf_2604.23210 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
04-28 02:50 | Success | - | |
|
exp_self.20260428024612.263_20260428_024612
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260428024612.263 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-28 02:47 | Success | - | |
|
exp_self.20260428023851.262_20260428_023852
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260428023851.262 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-28 02:40 | Success | - | |
|
exp_pytrain.20260428023535.064_20260428_023535
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-28 02:36 | Success | - | |
|
exp_self.20260428023133.261_20260428_023133
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260428023133.261 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-28 02:32 | Success | - | |
|
exp_self.20260428022418.260_20260428_022418
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260428022418.260 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-28 02:25 | Success | - | |
|
exp_self.20260428021640.259_20260428_021640
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260428021640.259 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-28 02:17 | Success | - | |
|
exp_self.20260428020922.258_20260428_020922
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260428020922.258 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-28 02:10 | Success | - | |
|
exp_pytrain.20260428020352.063_20260428_020353
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-28 02:05 | Success | - | |
|
exp_self.20260428020128.257_20260428_020129
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260428020128.257 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-28 02:02 | Success | - | |
|
exp_self.20260428015354.256_20260428_015355
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260428015354.256 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-28 01:55 | Success | - | |
|
exp_self.20260428014625.255_20260428_014625
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260428014625.255 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-28 01:47 | Success | - | |
|
exp_cr_10.3389_fchem.2026.1834317_20260428_014316
|
CS-DTA: a language model-driven framework for robust drug-target affinity prediction under strict cold-start scenarios
Paper ID: cr_10.3389_fchem.2026.1834317 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Reco...
|
04-28 01:44 | Success | - | |
|
exp_hf_2508.10180_20260428_013945
|
For-Value: Efficient Forward-Only Data Valuation for finetuning LLMs and VLMs
Paper ID: hf_2508.10180 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
04-28 01:40 | Success | - | |
|
exp_self.20260428013438.254_20260428_013438
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260428013438.254 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-28 01:35 | Success | - | |
|
exp_pytrain.20260428013147.062_20260428_013148
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-28 01:32 | Success | - | |
|
exp_self.20260428012549.253_20260428_012549
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260428012549.253 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-28 01:26 | Success | - | |
|
exp_self.20260428011823.252_20260428_011823
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260428011823.252 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-28 01:19 | Success | - | |
|
exp_self.20260428011105.251_20260428_011106
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260428011105.251 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-28 01:12 | Success | - | |
|
exp_self.20260428010321.250_20260428_010321
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260428010321.250 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-28 01:04 | Success | - | |
|
exp_pytrain.20260428010017.061_20260428_010017
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-28 01:01 | Success | - | |
|
exp_self.20260428005332.249_20260428_005332
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260428005332.249 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-28 00:54 | Success | - | |
|
exp_self.20260428004606.248_20260428_004607
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260428004606.248 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-28 00:47 | Success | - | |
|
exp_self.20260428003851.247_20260428_003851
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260428003851.247 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-28 00:39 | Success | - | |
|
exp_self.20260428003112.246_20260428_003112
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260428003112.246 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-28 00:32 | Success | - | |
|
exp_pytrain.20260428002802.060_20260428_002803
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-28 00:29 | Success | - | |
|
exp_hf_2604.21480_20260428_002454
|
Efficient Agent Evaluation via Diversity-Guided User Simulation
Paper ID: hf_2604.21480 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
04-28 00:26 | Success | - | |
|
exp_self.20260428002113.245_20260428_002114
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260428002113.245 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-28 00:22 | Success | - | |
|
exp_self.20260428001403.244_20260428_001404
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260428001403.244 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-28 00:15 | Success | - | |
|
exp_self.20260428000617.243_20260428_000618
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260428000617.243 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-28 00:07 | Success | - | |
|
exp_self.20260427235833.242_20260427_235834
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260427235833.242 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-27 23:59 | Success | - | |
|
exp_pytrain.20260427235539.059_20260427_235539
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-27 23:56 | Success | - | |
|
exp_self.20260427234852.241_20260427_234853
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260427234852.241 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-27 23:49 | Success | - | |
|
exp_self.20260427234117.240_20260427_234118
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260427234117.240 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-27 23:42 | Success | - | |
|
exp_hf_2604.23775_20260427_233739
|
Vision-Language-Action Safety: Threats, Challenges, Evaluations, and Mechanisms
Paper ID: hf_2604.23775 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
04-27 23:38 | Success | - | |
|
exp_self.20260427233359.239_20260427_233359
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260427233359.239 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-27 23:35 | Success | - | |
|
exp_self.20260427232656.238_20260427_232656
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260427232656.238 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-27 23:27 | Success | - | |
|
exp_pytrain.20260427232358.058_20260427_232359
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-27 23:25 | Success | - | |
|
exp_self.20260427231909.237_20260427_231910
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260427231909.237 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-27 23:20 | Success | - | |
|
exp_hf_2604.24300_20260427_231404
|
ReVSI: Rebuilding Visual Spatial Intelligence Evaluation for Accurate Assessment of VLM 3D Reasoning
Paper ID: hf_2604.24300 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
04-27 23:15 | Success | - | |
|
exp_self.20260427231151.236_20260427_231151
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260427231151.236 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-27 23:12 | Success | - | |
|
exp_self.20260427230423.235_20260427_230423
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260427230423.235 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-27 23:05 | Success | - | |
| exp_self.20260427225714.234_20260427_225714 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427225714.234 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 22:58 | Success | - | |
| exp_pytrain.20260427225153.057_20260427_225154 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-27 22:52 | Success | - | |
| exp_self.20260427224938.233_20260427_224938 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427224938.233 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 22:50 | Success | - | |
| exp_hf_2604.23099_20260427_224605 | ProEval: Proactive Failure Discovery and Efficient Performance Estimation for Generative AI Evaluation<br>Paper ID: hf_2604.23099 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark... | 04-27 22:47 | Success | - | |
| exp_self.20260427223933.232_20260427_223934 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427223933.232 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 22:40 | Success | - | |
| exp_self.20260427223222.231_20260427_223223 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427223222.231 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 22:33 | Success | - | |
| exp_hf_2604.24003_20260427_222907 | Stabilizing Efficient Reasoning with Step-Level Advantage Selection<br>Paper ID: hf_2604.24003 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark... | 04-27 22:30 | Success | - | |
| exp_self.20260427222236.230_20260427_222236 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427222236.230 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 22:23 | Success | - | |
| exp_pytrain.20260427221947.056_20260427_221948 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-27 22:20 | Success | - | |
| exp_self.20260427221458.229_20260427_221459 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427221458.229 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 22:16 | Success | - | |
| exp_self.20260427220750.228_20260427_220750 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427220750.228 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 22:08 | Success | - | |
| exp_2604.24645v1_20260427_220414 | K-MetBench: A Multi-Dimensional Benchmark for Fine-Grained Evaluation of Expert Reasoning, Locality, and Multimodality i...<br>Paper ID: 2604.24645v1 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark s... | 04-27 22:05 | Success | - | |
| exp_self.20260427220039.227_20260427_220039 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427220039.227 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 22:01 | Success | - | |
| exp_self.20260427215320.226_20260427_215320 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427215320.226 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 21:54 | Success | - | |
| exp_2604.24647v1_20260427_215021 | DepthKV: Layer-Dependent KV Cache Pruning for Long-Context LLM Inference<br>Paper ID: 2604.24647v1 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark s... | 04-27 21:51 | Success | - | |
| exp_pytrain.20260427214806.055_20260427_214806 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-27 21:49 | Success | - | |
| exp_self.20260427214558.225_20260427_214558 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427214558.225 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 21:47 | Success | - | |
| exp_self.20260427213912.224_20260427_213913 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427213912.224 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 21:40 | Success | - | |
| exp_self.20260427213206.223_20260427_213206 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427213206.223 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 21:33 | Success | - | |
| exp_self.20260427212456.222_20260427_212457 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427212456.222 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 21:25 | Success | - | |
| exp_self.20260427211731.221_20260427_211732 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427211731.221 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 21:18 | Success | - | |
| exp_pytrain.20260427211438.054_20260427_211438 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-27 21:15 | Success | - | |
| exp_self.20260427210818.220_20260427_210818 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427210818.220 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 21:09 | Success | - | |
| exp_self.20260427210114.219_20260427_210114 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427210114.219 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 21:02 | Success | - | |
| exp_self.20260427205406.218_20260427_205407 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427205406.218 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 20:55 | Success | - | |
| exp_self.20260427204706.217_20260427_204706 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427204706.217 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 20:48 | Success | - | |
| exp_pytrain.20260427204257.053_20260427_204257 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-27 20:43 | Success | - | |
| exp_self.20260427203943.216_20260427_203944 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427203943.216 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 20:40 | Success | - | |
| exp_self.20260427203224.215_20260427_203224 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427203224.215 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 20:33 | Success | - | |
| exp_self.20260427202501.214_20260427_202502 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427202501.214 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 20:26 | Success | - | |
| exp_self.20260427201618.213_20260427_201619 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427201618.213 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 20:17 | Success | - | |
| exp_pytrain.20260427201112.052_20260427_201113 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-27 20:12 | Success | - | |
| exp_self.20260427200902.212_20260427_200903 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427200902.212 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 20:10 | Success | - | |
| exp_self.20260427200144.211_20260427_200145 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427200144.211 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 20:02 | Success | - | |
| exp_self.20260427195445.210_20260427_195446 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427195445.210 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 19:55 | Success | - | |
| exp_self.20260427194746.209_20260427_194746 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427194746.209 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 19:48 | Success | - | |
| exp_gh_Keeterete513_llm-model-search-recommendation_20260427_194300 | Keeterete513/llm-model-search-recommendation<br>Paper ID: gh_Keeterete513_llm-model-search-recommendation - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Exp... | 04-27 19:44 | Success | - | |
| exp_self.20260427194043.208_20260427_194044 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427194043.208 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 19:41 | Success | - | |
| exp_pytrain.20260427193752.051_20260427_193752 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-27 19:38 | Success | - | |
| exp_self.20260427193312.207_20260427_193313 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427193312.207 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 19:34 | Success | - | |
| exp_self.20260427192545.206_20260427_192546 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427192545.206 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 19:26 | Success | - | |
| exp_self.20260427191853.205_20260427_191853 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427191853.205 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 19:19 | Success | - | |
| exp_self.20260427191138.204_20260427_191139 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427191138.204 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 19:12 | Success | - | |
| exp_pytrain.20260427190618.050_20260427_190618 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-27 19:07 | Success | - | |
| exp_self.20260427190406.203_20260427_190406 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427190406.203 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 19:05 | Success | - | |
| exp_self.20260427185707.202_20260427_185707 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427185707.202 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 18:58 | Success | - | |
| exp_self.20260427185002.201_20260427_185003 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427185002.201 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 18:51 | Success | - | |
| exp_self.20260427184254.200_20260427_184255 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427184254.200 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 18:43 | Success | - | |
| exp_self.20260427183558.199_20260427_183559 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427183558.199 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 18:37 | Success | - | |
| exp_pytrain.20260427183321.049_20260427_183321 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-27 18:34 | Success | - | |
| exp_self.20260427182538.198_20260427_182539 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427182538.198 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 18:26 | Success | - | |
| exp_self.20260427181835.197_20260427_181835 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427181835.197 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 18:19 | Success | - | |
| exp_self.20260427181120.196_20260427_181121 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427181120.196 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 18:12 | Success | - | |
| exp_self.20260427180424.195_20260427_180424 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427180424.195 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 18:05 | Success | - | |
| exp_pytrain.20260427180148.048_20260427_180148 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-27 18:02 | Success | - | |
| exp_self.20260427175525.194_20260427_175525 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427175525.194 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 17:56 | Success | - | |
| exp_self.20260427174723.193_20260427_174724 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427174723.193 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 17:48 | Success | - | |
| exp_self.20260427174020.192_20260427_174020 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427174020.192 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 17:41 | Success | - | |
| exp_self.20260427173302.191_20260427_173303 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427173302.191 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 17:34 | Success | - | |
| exp_pytrain.20260427173011.047_20260427_173012 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-27 17:31 | Success | - | |
| exp_self.20260427172355.190_20260427_172355 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427172355.190 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 17:24 | Success | - | |
| exp_self.20260427171608.189_20260427_171608 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427171608.189 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 17:17 | Success | - | |
| exp_self.20260427170812.188_20260427_170812 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427170812.188 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 17:09 | Success | - | |
| exp_self.20260427165949.187_20260427_165950 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427165949.187 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 17:00 | Success | - | |
| exp_pytrain.20260427165721.046_20260427_165721 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-27 16:58 | Success | - | |
| exp_self.20260427165033.186_20260427_165034 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427165033.186 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 16:51 | Success | - | |
| exp_self.20260427164333.185_20260427_164334 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427164333.185 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 16:44 | Success | - | |
| exp_self.20260427163647.184_20260427_163648 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427163647.184 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 16:37 | Success | - | |
| exp_self.20260427162943.183_20260427_162944 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427162943.183 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 16:30 | Success | - | |
| exp_pytrain.20260427162433.045_20260427_162434 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-27 16:25 | Success | - | |
| exp_self.20260427162232.182_20260427_162233 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427162232.182 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 16:23 | Success | - | |
| exp_self.20260427161518.181_20260427_161518 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427161518.181 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 16:16 | Success | - | |
| exp_self.20260427160833.180_20260427_160834 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427160833.180 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 16:09 | Success | - | |
| exp_self.20260427160146.179_20260427_160147 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427160146.179 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 16:02 | Success | - | |
| exp_self.20260427155456.178_20260427_155456 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427155456.178 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 15:55 | Success | - | |
| exp_pytrain.20260427155204.044_20260427_155204 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-27 15:53 | Success | - | |
| exp_self.20260427154548.177_20260427_154549 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427154548.177 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 15:46 | Success | - | |
| exp_self.20260427153845.176_20260427_153845 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427153845.176 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 15:39 | Success | - | |
| exp_self.20260427153144.175_20260427_153145 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427153144.175 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 15:32 | Success | - | |
| exp_self.20260427152450.174_20260427_152451 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427152450.174 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 15:25 | Success | - | |
| exp_pytrain.20260427151938.043_20260427_151939 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-27 15:20 | Success | - | |
| exp_self.20260427151737.173_20260427_151737 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427151737.173 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 15:18 | Success | - | |
| exp_self.20260427151039.172_20260427_151039 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427151039.172 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 15:11 | Success | - | |
| exp_self.20260427150354.171_20260427_150354 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427150354.171 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 15:04 | Success | - | |
| exp_self.20260427145704.170_20260427_145705 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427145704.170 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 14:58 | Success | - | |
| exp_self.20260427145002.169_20260427_145002 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427145002.169 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 14:51 | Success | - | |
| exp_pytrain.20260427144717.042_20260427_144717 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-27 14:48 | Success | - | |
| exp_self.20260427144241.168_20260427_144242 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427144241.168 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 14:43 | Success | - | |
| exp_self.20260427143525.167_20260427_143525 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427143525.167 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 14:36 | Success | - | |
| exp_self.20260427142825.166_20260427_142825 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427142825.166 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 14:29 | Success | - | |
| exp_self.20260427142118.165_20260427_142119 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427142118.165 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 14:22 | Success | - | |
| exp_pytrain.20260427141543.041_20260427_141543 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-27 14:16 | Success | - | |
| exp_self.20260427141333.164_20260427_141334 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427141333.164 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 14:14 | Success | - | |
| exp_self.20260427140622.163_20260427_140622 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427140622.163 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 14:07 | Success | - | |
| exp_self.20260427135918.162_20260427_135918 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427135918.162 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 14:00 | Success | - | |
| exp_self.20260427135206.161_20260427_135217 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427135206.161 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 13:53 | Success | - | |
| exp_self.20260427134518.160_20260427_134519 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427134518.160 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 13:46 | Success | - | |
| exp_pytrain.20260427134233.040_20260427_134233 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-27 13:43 | Success | - | |
| exp_self.20260427133627.159_20260427_133627 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427133627.159 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 13:37 | Success | - | |
| exp_self.20260427132926.158_20260427_132926 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427132926.158 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 13:30 | Success | - | |
| exp_self.20260427132234.157_20260427_132234 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427132234.157 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 13:23 | Success | - | |
| exp_self.20260427131544.156_20260427_131545 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427131544.156 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 13:16 | Success | - | |
| exp_pytrain.20260427131032.039_20260427_131032 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-27 13:11 | Success | - | |
| exp_self.20260427130821.155_20260427_130821 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427130821.155 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 13:09 | Success | - | |
| exp_self.20260427130111.154_20260427_130112 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427130111.154 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 13:02 | Success | - | |
| exp_self.20260427125401.153_20260427_125401 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427125401.153 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 12:55 | Success | - | |
| exp_self.20260427124714.152_20260427_124715 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427124714.152 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 12:48 | Success | - | |
| exp_self.20260427124030.151_20260427_124030 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427124030.151 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 12:41 | Success | - | |
| exp_pytrain.20260427123742.038_20260427_123751 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-27 12:38 | Success | - | |
| exp_self.20260427121526.150_20260427_121527 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427121526.150 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 12:16 | Success | - | |
| exp_self.20260427120753.149_20260427_120754 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427120753.149 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 12:08 | Success | - | |
| exp_self.20260427120023.148_20260427_120023 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427120023.148 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 12:01 | Success | - | |
| exp_pytrain.20260427115748.037_20260427_115748 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-27 11:58 | Success | - | |
| exp_self.20260427115057.147_20260427_115057 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427115057.147 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 11:51 | Success | - | |
| exp_self.20260427114321.146_20260427_114322 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427114321.146 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 11:44 | Success | - | |
| exp_self.20260427113546.145_20260427_113546 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427113546.145 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 11:36 | Success | - | |
| exp_self.20260427112818.144_20260427_112818 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427112818.144 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 11:29 | Success | - | |
| exp_pytrain.20260427112545.036_20260427_112546 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-27 11:26 | Success | - | |
| exp_self.20260427111853.143_20260427_111853 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427111853.143 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 11:19 | Success | - | |
| exp_self.20260427111114.142_20260427_111115 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427111114.142 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 11:12 | Success | - | |
| exp_self.20260427110339.141_20260427_110339 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427110339.141 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 11:04 | Success | - | |
| exp_self.20260427105603.140_20260427_105604 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427105603.140 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 10:57 | Success | - | |
| exp_pytrain.20260427105336.035_20260427_105336 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-27 10:54 | Success | - | |
| exp_self.20260427104635.139_20260427_104636 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427104635.139 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 10:47 | Success | - | |
| exp_self.20260427103858.138_20260427_103858 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427103858.138 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 10:40 | Success | - | |
| exp_self.20260427103110.137_20260427_103111 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427103110.137 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 10:32 | Success | - | |
| exp_self.20260427102330.136_20260427_102330 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427102330.136 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 10:24 | Success | - | |
| exp_pytrain.20260427102102.034_20260427_102102 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-27 10:22 | Success | - | |
| exp_self.20260427101355.135_20260427_101356 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427101355.135 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 10:14 | Success | - | |
| exp_self.20260427100617.134_20260427_100617 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427100617.134 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 10:07 | Success | - | |
| exp_self.20260427095836.133_20260427_095836 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427095836.133 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 09:59 | Success | - | |
| exp_self.20260427095059.132_20260427_095100 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427095059.132 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 09:52 | Success | - | |
| exp_pytrain.20260427094833.033_20260427_094834 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-27 09:49 | Success | - | |
| exp_self.20260427094123.131_20260427_094124 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427094123.131 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 09:42 | Success | - | |
| exp_self.20260427093349.130_20260427_093349 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427093349.130 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 09:34 | Success | - | |
| exp_self.20260427092615.129_20260427_092615 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427092615.129 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 09:27 | Success | - | |
| exp_self.20260427091835.128_20260427_091835 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427091835.128 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 09:19 | Success | - | |
| exp_pytrain.20260427091608.032_20260427_091608 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-27 09:17 | Success | - | |
| exp_self.20260427090902.127_20260427_090902 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427090902.127 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 09:10 | Success | - | |
| exp_self.20260427090133.126_20260427_090133 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427090133.126 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 09:02 | Success | - | |
| exp_self.20260427085404.125_20260427_085404 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427085404.125 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 08:55 | Success | - | |
| exp_self.20260427084627.124_20260427_084627 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427084627.124 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 08:47 | Success | - | |
| exp_pytrain.20260427084400.031_20260427_084401 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-27 08:45 | Success | - | |
| exp_self.20260427083940.123_20260427_083941 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427083940.123 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 08:40 | Success | - | |
| exp_self.20260427083159.122_20260427_083200 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427083159.122 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 08:33 | Success | - | |
| exp_self.20260427082419.121_20260427_082420 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427082419.121 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 08:25 | Success | - | |
| exp_self.20260427081633.120_20260427_081635 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427081633.120 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 08:17 | Success | - | |
| exp_pytrain.20260427081242.030_20260427_081243 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-27 08:13 | Success | - | |
| exp_self.20260427080931.119_20260427_080931 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427080931.119 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 08:10 | Success | - | |
| exp_hf_2604.22085_20260427_080627 | Memanto: Typed Semantic Memory with Information-Theoretic Retrieval for Long-Horizon Agents<br>Paper ID: hf_2604.22085 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark... | 04-27 08:07 | Success | - | |
| exp_self.20260427075841.118_20260427_075842 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427075841.118 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 08:01 | Success | - | |
| exp_self.20260427075026.117_20260427_075027 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427075026.117 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 07:51 | Success | - | |
| exp_self.20260427074258.116_20260427_074258 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427074258.116 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 07:44 | Success | - | |
| exp_pytrain.20260427074035.029_20260427_074036 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-27 07:41 | Success | - | |
| exp_self.20260427073337.115_20260427_073337 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427073337.115 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 07:34 | Success | - | |
| exp_self.20260427072608.114_20260427_072608 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427072608.114 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 07:27 | Success | - | |
| exp_self.20260427071826.113_20260427_071827 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427071826.113 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 07:19 | Success | - | |
| exp_self.20260427071041.112_20260427_071041 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427071041.112 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 07:11 | Success | - | |
| exp_pytrain.20260427070820.028_20260427_070820 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-27 07:09 | Success | - | |
| exp_self.20260427070122.111_20260427_070122 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427070122.111 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 07:02 | Success | - | |
| exp_self.20260427065356.110_20260427_065357 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427065356.110 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 06:54 | Success | - | |
| exp_self.20260427064626.109_20260427_064626 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427064626.109 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 06:47 | Success | - | |
| exp_self.20260427063900.108_20260427_063901 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427063900.108 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 06:40 | Success | - | |
| exp_pytrain.20260427063639.027_20260427_063640 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-27 06:37 | Success | - | |
| exp_self.20260427062939.107_20260427_062939 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427062939.107 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 06:30 | Success | - | |
| exp_self.20260427062210.106_20260427_062210 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427062210.106 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 06:23 | Success | - | |
| exp_self.20260427061445.105_20260427_061446 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427061445.105 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 06:15 | Success | - | |
| exp_self.20260427060749.104_20260427_060750 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427060749.104 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 06:08 | Success | - | |
| exp_pytrain.20260427060520.026_20260427_060521 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-27 06:06 | Success | - | |
| exp_self.20260427055835.103_20260427_055836 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427055835.103 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 05:59 | Success | - | |
| exp_self.20260427055105.102_20260427_055105 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427055105.102 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 05:52 | Success | - | |
| exp_self.20260427054340.101_20260427_054340 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427054340.101 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 05:44 | Success | - | |
| exp_self.20260427053618.100_20260427_053619 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427053618.100 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 05:37 | Success | - | |
| exp_pytrain.20260427053352.025_20260427_053353 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-27 05:34 | Success | - | |
| exp_self.20260427052802.099_20260427_052802 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427052802.099 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 05:29 | Success | - | |
| exp_self.20260427052040.098_20260427_052041 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427052040.098 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 05:21 | Success | - | |
| exp_self.20260427051316.097_20260427_051316 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427051316.097 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 05:14 | Success | - | |
| exp_self.20260427050545.096_20260427_050545 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427050545.096 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 05:06 | Success | - | |
| exp_pytrain.20260427050214.024_20260427_050215 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-27 05:03 | Success | - | |
| exp_self.20260427045903.095_20260427_045903 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427045903.095 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 05:00 | Success | - | |
| exp_self.20260427045140.094_20260427_045141 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427045140.094 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 04:52 | Success | - | |
| exp_self.20260427044419.093_20260427_044419 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427044419.093 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 04:45 | Success | - | |
| exp_self.20260427043655.092_20260427_043655 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427043655.092 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 04:37 | Success | - | |
| exp_pytrain.20260427043041.023_20260427_043041 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-27 04:31 | Success | - | |
| exp_self.20260427042847.091_20260427_042848 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427042847.091 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 04:29 | Success | - | |
| exp_hf_2604.21718_20260427_042315 | Building a Precise Video Language with Human-AI Oversight<br>Paper ID: hf_2604.21718 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark... | 04-27 04:24 | Success | - | |
| exp_self.20260427042117.090_20260427_042118 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427042117.090 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 04:22 | Success | - | |
| exp_self.20260427041353.089_20260427_041354 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260427041353.089 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-27 04:14 | Success | - | |
|
exp_cr_10.1007_s11831-026-10598-4_20260427_040933
|
Building Expert Small Models: A Comprehensive Survey of Model Compression, Knowledge Distillation, and Augmented Inferen...
Paper ID: cr_10.1007_s11831-026-10598-4 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Reco...
|
04-27 04:10 | Success | - | |
|
exp_self.20260427040714.088_20260427_040714
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260427040714.088 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-27 04:08 | Success | - | |
|
exp_self.20260427035947.087_20260427_035947
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260427035947.087 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-27 04:00 | Success | - | |
|
exp_pytrain.20260427035726.022_20260427_035727
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-27 03:58 | Success | - | |
|
exp_self.20260427035034.086_20260427_035035
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260427035034.086 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-27 03:51 | Success | - | |
|
exp_self.20260427034310.085_20260427_034311
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260427034310.085 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-27 03:44 | Success | - | |
|
exp_self.20260427033544.084_20260427_033544
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260427033544.084 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-27 03:36 | Success | - | |
|
exp_self.20260427032822.083_20260427_032822
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260427032822.083 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-27 03:29 | Success | - | |
|
exp_pytrain.20260427032601.021_20260427_032601
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-27 03:27 | Success | - | |
|
exp_self.20260427031905.082_20260427_031905
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260427031905.082 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-27 03:20 | Success | - | |
|
exp_self.20260427031144.081_20260427_031144
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260427031144.081 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-27 03:12 | Success | - | |
|
exp_self.20260427030425.080_20260427_030425
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260427030425.080 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-27 03:05 | Success | - | |
|
exp_self.20260427025659.079_20260427_025700
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260427025659.079 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-27 02:58 | Success | - | |
|
exp_pytrain.20260427025436.020_20260427_025437
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-27 02:55 | Success | - | |
|
exp_self.20260427024739.078_20260427_024739
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260427024739.078 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-27 02:48 | Success | - | |
|
exp_self.20260427024021.077_20260427_024021
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260427024021.077 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-27 02:41 | Success | - | |
|
exp_cr_10.1007_s40864-026-00269-9_20260427_023706
|
Train Slide Prediction and Risk Assessment Using Vehicle-Signal Data: A Data-Model Fusion Method
Paper ID: cr_10.1007_s40864-026-00269-9 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Reco...
|
04-27 02:38 | Success | - | |
|
exp_self.20260427023250.076_20260427_023251
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260427023250.076 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-27 02:33 | Success | - | |
|
exp_self.20260427022533.075_20260427_022534
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260427022533.075 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-27 02:26 | Success | - | |
|
exp_pytrain.20260427022306.019_20260427_022306
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-27 02:24 | Success | - | |
|
exp_self.20260427021620.074_20260427_021620
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260427021620.074 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-27 02:17 | Success | - | |
|
exp_self.20260427020850.073_20260427_020851
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260427020850.073 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-27 02:09 | Success | - | |
|
exp_self.20260427020127.072_20260427_020127
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260427020127.072 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-27 02:02 | Success | - | |
|
exp_self.20260427015407.071_20260427_015407
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260427015407.071 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-27 01:55 | Success | - | |
|
exp_pytrain.20260427015145.018_20260427_015146
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-27 01:52 | Success | - | |
|
exp_self.20260427014457.070_20260427_014458
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260427014457.070 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-27 01:46 | Success | - | |
|
exp_self.20260427013738.069_20260427_013738
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260427013738.069 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-27 01:38 | Success | - | |
|
exp_self.20260427013012.068_20260427_013013
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260427013012.068 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-27 01:31 | Success | - | |
|
exp_self.20260427012251.067_20260427_012251
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260427012251.067 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-27 01:23 | Success | - | |
|
exp_pytrain.20260427011919.017_20260427_011919
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-27 01:20 | Success | - | |
|
exp_self.20260427011511.066_20260427_011512
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260427011511.066 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-27 01:16 | Success | - | |
|
exp_self.20260427010748.065_20260427_010749
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260427010748.065 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-27 01:08 | Success | - | |
|
exp_self.20260427010023.064_20260427_010023
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260427010023.064 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-27 01:01 | Success | - | |
|
exp_self.20260427005303.063_20260427_005304
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260427005303.063 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-27 00:54 | Success | - | |
|
exp_pytrain.20260427004721.016_20260427_004722
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-27 00:48 | Success | - | |
|
exp_self.20260427004529.062_20260427_004529
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260427004529.062 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-27 00:46 | Success | - | |
|
exp_self.20260427003811.061_20260427_003811
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260427003811.061 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-27 00:39 | Success | - | |
|
exp_self.20260427003056.060_20260427_003056
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260427003056.060 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-27 00:31 | Success | - | |
|
exp_self.20260427002338.059_20260427_002339
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260427002338.059 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-27 00:24 | Success | - | |
|
exp_self.20260427001617.058_20260427_001617
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260427001617.058 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-27 00:17 | Success | - | |
|
exp_pytrain.20260427001353.015_20260427_001353
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-27 00:14 | Success | - | |
|
exp_self.20260427000659.057_20260427_000700
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260427000659.057 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-27 00:08 | Success | - | |
|
exp_cr_10.3897_jucs.160588_20260427_000351
|
Duygu-Turk: A Context-Aware Sentiment Analysis Framework for Turkish, Based on Plutchik’s Emotion Model
Paper ID: cr_10.3897_jucs.160588 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered b...
|
04-27 00:04 | Success | - | |
|
exp_self.20260426235935.056_20260426_235935
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260426235935.056 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-27 00:00 | Success | - | |
|
exp_self.20260426235215.055_20260426_235215
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260426235215.055 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-26 23:53 | Success | - | |
|
exp_hf_2604.22294_20260426_234757
|
Contexts are Never Long Enough: Structured Reasoning for Scalable Question Answering over Long Document Sets
Paper ID: hf_2604.22294 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
04-26 23:48 | Success | - | |
|
exp_self.20260426234451.054_20260426_234452
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260426234451.054 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-26 23:45 | Success | - | |
|
exp_pytrain.20260426234228.014_20260426_234229
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-26 23:43 | Success | - | |
|
exp_self.20260426233535.053_20260426_233536
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260426233535.053 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-26 23:36 | Success | - | |
|
exp_self.20260426232816.052_20260426_232817
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260426232816.052 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-26 23:29 | Success | - | |
|
exp_self.20260426232057.051_20260426_232057
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260426232057.051 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-26 23:21 | Success | - | |
|
exp_hf_2604.18580_20260426_231743
|
Sessa: Selective State Space Attention
Paper ID: hf_2604.18580 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
04-26 23:18 | Success | - | |
|
exp_self.20260426231330.050_20260426_231330
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260426231330.050 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-26 23:14 | Success | - | |
|
exp_pytrain.20260426231109.013_20260426_231109
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-26 23:12 | Success | - | |
|
exp_self.20260426230421.049_20260426_230421
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260426230421.049 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-26 23:05 | Success | - | |
|
exp_hf_2604.22586_20260426_230106
|
FlowAnchor: Stabilizing the Editing Signal for Inversion-Free Video Editing
Paper ID: hf_2604.22586 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
04-26 23:02 | Success | - | |
|
exp_self.20260426225652.048_20260426_225652
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260426225652.048 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-26 22:57 | Success | - | |
|
exp_self.20260426224931.047_20260426_224931
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260426224931.047 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-26 22:50 | Success | - | |
|
exp_self.20260426224210.046_20260426_224211
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260426224210.046 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-26 22:43 | Success | - | |
|
exp_pytrain.20260426223943.012_20260426_223943
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-26 22:40 | Success | - | |
|
exp_self.20260426223531.045_20260426_223532
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260426223531.045 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-26 22:36 | Success | - | |
|
exp_self.20260426222812.044_20260426_222813
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260426222812.044 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-26 22:29 | Success | - | |
|
exp_self.20260426222046.043_20260426_222046
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260426222046.043 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-26 22:21 | Success | - | |
|
exp_hf_2604.16353_20260426_221752
|
AgriIR: A Scalable Framework for Domain-Specific Knowledge Retrieval
Paper ID: hf_2604.16353 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
04-26 22:18 | Success | - | |
|
exp_self.20260426221045.042_20260426_221045
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260426221045.042 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-26 22:11 | Success | - | |
|
exp_pytrain.20260426220815.011_20260426_220816
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-26 22:09 | Success | - | |
|
exp_self.20260426220117.041_20260426_220117
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260426220117.041 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-26 22:02 | Success | - | |
|
exp_self.20260426215354.040_20260426_215354
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260426215354.040 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-26 21:54 | Success | - | |
|
exp_self.20260426214621.039_20260426_214622
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260426214621.039 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-26 21:47 | Success | - | |
|
exp_self.20260426213848.038_20260426_213848
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260426213848.038 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-26 21:39 | Success | - | |
|
exp_pytrain.20260426213619.010_20260426_213619
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-26 21:37 | Success | - | |
|
exp_self.20260426212919.037_20260426_212919
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260426212919.037 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-26 21:30 | Success | - | |
|
exp_self.20260426212143.036_20260426_212143
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260426212143.036 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-26 21:22 | Success | - | |
|
exp_hf_2604.08645_20260426_211825
|
3D-VCD: Hallucination Mitigation in 3D-LLM Embodied Agents through Visual Contrastive Decoding
Paper ID: hf_2604.08645 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
04-26 21:19 | Success | - | |
|
exp_self.20260426211402.035_20260426_211402
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260426211402.035 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-26 21:15 | Success | - | |
|
exp_self.20260426210634.034_20260426_210635
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260426210634.034 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-26 21:07 | Success | - | |
|
exp_pytrain.20260426210402.009_20260426_210402
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-26 21:05 | Success | - | |
|
exp_hf_2604.18519_20260426_210116
|
LLM Safety From Within: Detecting Harmful Content with Internal Representations
Paper ID: hf_2604.18519 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
04-26 21:02 | Success | - | |
|
exp_self.20260426205759.033_20260426_205800
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260426205759.033 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-26 20:59 | Success | - | |
|
exp_2604.22750v1_20260426_205443
|
How Do AI Agents Spend Your Money? Analyzing and Predicting Token Consumption in Agentic Coding Tasks
Paper ID: 2604.22750v1 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark s...
|
04-26 20:55 | Success | - | |
|
exp_hf_2604.22152_20260426_205113
|
dWorldEval: Scalable Robotic Policy Evaluation via Discrete Diffusion World Model
Paper ID: hf_2604.22152 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
04-26 20:52 | Success | - | |
|
exp_self.20260426204912.032_20260426_204912
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260426204912.032 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-26 20:50 | Success | - | |
|
exp_self.20260426204139.031_20260426_204140
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260426204139.031 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-26 20:42 | Success | - | |
|
exp_self.20260426203359.030_20260426_203359
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260426203359.030 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-26 20:35 | Success | - | |
|
exp_pytrain.20260426203131.008_20260426_203131
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-26 20:32 | Success | - | |
|
exp_self.20260426202432.029_20260426_202433
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260426202432.029 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-26 20:25 | Success | - | |
|
exp_self.20260426201702.028_20260426_201702
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260426201702.028 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-26 20:18 | Success | - | |
|
exp_self.20260426200935.027_20260426_200936
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260426200935.027 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-26 20:10 | Success | - | |
|
exp_self.20260426200206.026_20260426_200207
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260426200206.026 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-26 20:03 | Success | - | |
|
exp_pytrain.20260426195935.007_20260426_195936
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-26 20:00 | Success | - | |
|
exp_self.20260426195228.025_20260426_195228
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260426195228.025 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-26 19:53 | Success | - | |
|
exp_self.20260426194503.024_20260426_194503
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260426194503.024 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-26 19:46 | Success | - | |
|
exp_self.20260426193735.023_20260426_193736
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260426193735.023 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-26 19:38 | Success | - | |
|
exp_self.20260426193006.022_20260426_193007
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260426193006.022 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-26 19:31 | Success | - | |
|
exp_pytrain.20260426192736.006_20260426_192737
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-26 19:28 | Success | - | |
|
exp_self.20260426192035.021_20260426_192035
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260426192035.021 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-26 19:21 | Success | - | |
|
exp_self.20260426191310.020_20260426_191310
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260426191310.020 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-26 19:14 | Success | - | |
|
exp_self.20260426190537.019_20260426_190538
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260426190537.019 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-26 19:06 | Success | - | |
|
exp_self.20260426185802.018_20260426_185802
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260426185802.018 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-26 18:59 | Success | - | |
|
exp_pytrain.20260426185529.005_20260426_185529
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-26 18:56 | Success | - | |
|
exp_self.20260426184822.017_20260426_184822
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260426184822.017 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-26 18:49 | Success | - | |
|
exp_self.20260426184051.016_20260426_184052
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260426184051.016 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-26 18:41 | Success | - | |
|
exp_self.20260426183320.015_20260426_183320
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260426183320.015 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-26 18:34 | Success | - | |
|
exp_self.20260426182541.014_20260426_182542
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260426182541.014 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-26 18:26 | Success | - | |
|
exp_pytrain.20260426182252.004_20260426_182252
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-26 18:23 | Success | - | |
|
exp_self.20260426181823.013_20260426_181824
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260426181823.013 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-26 18:19 | Success | - | |
|
exp_self.20260426181030.012_20260426_181030
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260426181030.012 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-26 18:11 | Success | - | |
|
exp_self.20260426180253.011_20260426_180254
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260426180253.011 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-26 18:04 | Success | - | |
|
exp_self.20260426175515.010_20260426_175515
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260426175515.010 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-26 17:56 | Success | - | |
|
exp_pytrain.20260426175125.003_20260426_175125
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-26 17:52 | Success | - | |
|
exp_self.20260426174800.009_20260426_174801
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260426174800.009 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-26 17:49 | Success | - | |
|
exp_self.20260426174014.008_20260426_174014
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260426174014.008 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-26 17:41 | Success | - | |
|
exp_self.20260426173236.007_20260426_173237
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260426173236.007 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-26 17:33 | Success | - | |
|
exp_self.20260426172503.006_20260426_172504
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260426172503.006 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-26 17:26 | Success | - | |
|
exp_pytrain.20260426171918.002_20260426_171918
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-26 17:20 | Success | - | |
|
exp_self.20260426171725.005_20260426_171725
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260426171725.005 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-26 17:18 | Success | - | |
|
exp_self.20260426171007.004_20260426_171008
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260426171007.004 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-26 17:11 | Success | - | |
|
exp_self.20260426170246.003_20260426_170246
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260426170246.003 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-26 17:03 | Success | - | |
|
exp_self.20260426165528.002_20260426_165528
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260426165528.002 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-26 16:56 | Success | - | |
|
exp_self.20260426164809.001_20260426_164810
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260426164809.001 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-26 16:49 | Success | - | |
|
exp_pytrain.20260426164548.001_20260426_164548
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-26 16:46 | Success | - | |
|
exp_self.20260426163845.034_20260426_163846
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260426163845.034 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-26 16:39 | Success | - | |
|
exp_pytrain.20260426163546.009_20260426_163547
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-26 16:37 | Success | - | |
|
exp_self.20260426162851.033_20260426_162851
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260426162851.033 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-26 16:29 | Success | - | |
|
exp_self.20260426162114.032_20260426_162115
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260426162114.032 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-26 16:22 | Success | - | |
|
exp_self.20260426161348.031_20260426_161348
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260426161348.031 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-26 16:14 | Success | - | |
|
exp_self.20260426160625.030_20260426_160626
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260426160625.030 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-26 16:07 | Success | - | |
|
exp_pytrain.20260426160348.008_20260426_160349
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-26 16:04 | Success | - | |
|
exp_self.20260426155659.029_20260426_155700
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260426155659.029 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-26 15:58 | Success | - | |
|
exp_self.20260426154924.028_20260426_154924
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260426154924.028 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-26 15:50 | Success | - | |
|
exp_self.20260426154203.027_20260426_154203
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260426154203.027 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-26 15:43 | Success | - | |
|
exp_self.20260426153444.026_20260426_153444
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260426153444.026 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-26 15:35 | Success | - | |
|
exp_pytrain.20260426153222.007_20260426_153222
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-26 15:33 | Success | - | |
|
exp_self.20260426152529.025_20260426_152529
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260426152529.025 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-26 15:26 | Success | - | |
|
exp_self.20260426151810.024_20260426_151811
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260426151810.024 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-26 15:19 | Success | - | |
|
exp_self.20260426151053.023_20260426_151054
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260426151053.023 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-26 15:11 | Success | - | |
|
exp_self.20260426150331.022_20260426_150331
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260426150331.022 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-26 15:04 | Success | - | |
|
exp_pytrain.20260426150105.006_20260426_150105
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-26 15:02 | Success | - | |
|
exp_self.20260426145407.021_20260426_145407
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260426145407.021 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-26 14:55 | Success | - | |
|
exp_self.20260426144649.020_20260426_144649
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260426144649.020 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-26 14:47 | Success | - | |
|
exp_self.20260426143929.019_20260426_143929
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260426143929.019 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-26 14:40 | Success | - | |
|
exp_self.20260426143209.018_20260426_143209
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260426143209.018 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-26 14:33 | Success | - | |
|
exp_pytrain.20260426142947.005_20260426_142947
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-26 14:30 | Success | - | |
|
exp_self.20260426142256.017_20260426_142256
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260426142256.017 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-26 14:23 | Success | - | |
|
exp_self.20260426141535.016_20260426_141536
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260426141535.016 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-26 14:16 | Success | - | |
|
exp_self.20260426140815.015_20260426_140815
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260426140815.015 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-26 14:09 | Success | - | |
|
exp_self.20260426140053.014_20260426_140054
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260426140053.014 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-26 14:01 | Success | - | |
|
exp_pytrain.20260426135832.004_20260426_135832
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-26 13:59 | Success | - | |
|
exp_self.20260426135145.013_20260426_135145
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260426135145.013 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-26 13:52 | Success | - | |
|
exp_self.20260426134424.012_20260426_134425
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260426134424.012 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-26 13:45 | Success | - | |
|
exp_self.20260426133702.011_20260426_133702
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260426133702.011 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-26 13:38 | Success | - | |
|
exp_self.20260426132939.010_20260426_132939
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260426132939.010 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-26 13:30 | Success | - | |
|
exp_pytrain.20260426132608.003_20260426_132608
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-26 13:27 | Success | - | |
|
exp_self.20260426132202.009_20260426_132202
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260426132202.009 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-26 13:23 | Success | - | |
|
exp_self.20260426131440.008_20260426_131440
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260426131440.008 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-26 13:15 | Success | - | |
|
exp_self.20260426130725.007_20260426_130725
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260426130725.007 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-26 13:08 | Success | - | |
|
exp_self.20260426130007.006_20260426_130008
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260426130007.006 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-26 13:01 | Success | - | |
|
exp_pytrain.20260426125423.002_20260426_125424
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-26 12:55 | Success | - | |
|
exp_self.20260426125230.005_20260426_125230
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260426125230.005 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-26 12:53 | Success | - | |
|
exp_self.20260426124512.004_20260426_124513
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260426124512.004 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-26 12:46 | Success | - | |
|
exp_self.20260426123750.003_20260426_123750
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260426123750.003 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-26 12:38 | Success | - | |
|
exp_self.20260426123029.002_20260426_123029
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260426123029.002 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-26 12:31 | Success | - | |
|
exp_self.20260426122311.001_20260426_122311
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260426122311.001 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-26 12:24 | Success | - | |
|
exp_pytrain.20260426122049.001_20260426_122049
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-26 12:21 | Success | - | |
|
exp_pytrain.20260426115648.002_20260426_115843
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-26 11:59 | Success | - | |
|
exp_self.20260426114303.004_20260426_114303
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260426114303.004 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-26 11:44 | Success | - | |
|
exp_self.20260426113539.003_20260426_113539
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260426113539.003 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-26 11:36 | Success | - | |
|
exp_self.20260426112819.002_20260426_112820
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260426112819.002 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-26 11:29 | Success | - | |
|
exp_self.20260426112059.001_20260426_112059
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260426112059.001 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-26 11:22 | Success | - | |
|
exp_pytrain.20260426111838.001_20260426_111838
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-26 11:19 | Success | - | |
|
exp_self.20260426111056.851_20260426_111056
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260426111056.851 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-26 11:11 | Success | - | |
|
exp_self.20260426110335.850_20260426_110336
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260426110335.850 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-26 11:04 | Success | - | |
| exp_pytrain.20260426110006.211_20260426_110007 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-26 11:01 | Success | - | |
| exp_self.20260426105559.849_20260426_105600 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260426105559.849 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-26 10:57 | Success | - | |
| exp_self.20260426104836.848_20260426_104836 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260426104836.848 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-26 10:49 | Success | - | |
| exp_self.20260426104118.847_20260426_104119 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260426104118.847 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-26 10:42 | Success | - | |
| exp_self.20260426103358.846_20260426_103358 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260426103358.846 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-26 10:35 | Success | - | |
| exp_pytrain.20260426102813.210_20260426_102814 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-26 10:29 | Success | - | |
| exp_self.20260426102620.845_20260426_102620 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260426102620.845 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-26 10:27 | Success | - | |
| exp_self.20260426101902.844_20260426_101902 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260426101902.844 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-26 10:20 | Success | - | |
| exp_self.20260426101145.843_20260426_101146 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260426101145.843 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-26 10:12 | Success | - | |
| exp_self.20260426100425.842_20260426_100425 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260426100425.842 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-26 10:05 | Success | - | |
| exp_self.20260426095703.841_20260426_095704 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260426095703.841 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-26 09:58 | Success | - | |
| exp_pytrain.20260426095441.209_20260426_095441 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-26 09:55 | Success | - | |
| exp_self.20260426094748.840_20260426_094748 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260426094748.840 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-26 09:48 | Success | - | |
| exp_self.20260426094033.839_20260426_094033 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260426094033.839 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-26 09:41 | Success | - | |
| exp_self.20260426093311.838_20260426_093312 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260426093311.838 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-26 09:34 | Success | - | |
| exp_self.20260426092549.837_20260426_092549 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260426092549.837 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-26 09:26 | Success | - | |
| exp_pytrain.20260426092325.208_20260426_092325 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-26 09:24 | Success | - | |
| exp_self.20260426091634.836_20260426_091635 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260426091634.836 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-26 09:17 | Success | - | |
| exp_self.20260426090914.835_20260426_090914 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260426090914.835 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-26 09:10 | Success | - | |
| exp_self.20260426090151.834_20260426_090152 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260426090151.834 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-26 09:02 | Success | - | |
| exp_self.20260426085430.833_20260426_085430 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260426085430.833 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-26 08:55 | Success | - | |
| exp_pytrain.20260426085209.207_20260426_085209 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-26 08:53 | Success | - | |
| exp_self.20260426084519.832_20260426_084520 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260426084519.832 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-26 08:46 | Success | - | |
| exp_self.20260426083758.831_20260426_083758 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260426083758.831 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-26 08:39 | Success | - | |
| exp_self.20260426083033.830_20260426_083033 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260426083033.830 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-26 08:31 | Success | - | |
| exp_self.20260426082312.829_20260426_082313 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260426082312.829 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-26 08:24 | Success | - | |
| exp_pytrain.20260426082052.206_20260426_082053 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-26 08:21 | Success | - | |
| exp_self.20260426081359.828_20260426_081400 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260426081359.828 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-26 08:15 | Success | - | |
| exp_self.20260426080640.827_20260426_080640 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260426080640.827 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-26 08:07 | Success | - | |
| exp_self.20260426075917.826_20260426_075918 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260426075917.826 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-26 08:00 | Success | - | |
| exp_self.20260426075202.825_20260426_075202 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260426075202.825 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-26 07:53 | Success | - | |
| exp_pytrain.20260426074936.205_20260426_074937 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-26 07:50 | Success | - | |
| exp_self.20260426074253.824_20260426_074254 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260426074253.824 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-26 07:43 | Success | - | |
| exp_self.20260426073525.823_20260426_073526 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260426073525.823 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-26 07:36 | Success | - | |
| exp_self.20260426072748.822_20260426_072748 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260426072748.822 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-26 07:28 | Success | - | |
| exp_self.20260426072017.821_20260426_072017 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260426072017.821 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-26 07:21 | Success | - | |
| exp_pytrain.20260426071754.204_20260426_071754 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-26 07:18 | Success | - | |
| exp_self.20260426071052.820_20260426_071053 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260426071052.820 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-26 07:11 | Success | - | |
| exp_self.20260426070333.819_20260426_070333 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260426070333.819 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-26 07:04 | Success | - | |
| exp_self.20260426065612.818_20260426_065612 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260426065612.818 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-26 06:57 | Success | - | |
| exp_self.20260426064852.817_20260426_064852 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260426064852.817 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-26 06:49 | Success | - | |
| exp_pytrain.20260426064632.203_20260426_064633 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-26 06:47 | Success | - | |
| exp_self.20260426063940.816_20260426_063940 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260426063940.816 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-26 06:40 | Success | - | |
| exp_self.20260426063219.815_20260426_063219 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260426063219.815 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-26 06:33 | Success | - | |
| exp_self.20260426062453.814_20260426_062454 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260426062453.814 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-26 06:25 | Success | - | |
| exp_self.20260426061726.813_20260426_061726 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260426061726.813 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-26 06:18 | Success | - | |
| exp_pytrain.20260426061356.202_20260426_061356 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-26 06:14 | Success | - | |
| exp_self.20260426060948.812_20260426_060948 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260426060948.812 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-26 06:10 | Success | - | |
| exp_self.20260426060225.811_20260426_060225 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260426060225.811 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-26 06:03 | Success | - | |
| exp_self.20260426055506.810_20260426_055506 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260426055506.810 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-26 05:56 | Success | - | |
| exp_self.20260426054749.809_20260426_054750 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260426054749.809 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-26 05:48 | Success | - | |
| exp_pytrain.20260426054204.201_20260426_054204 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-26 05:43 | Success | - | |
| exp_self.20260426054010.808_20260426_054010 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260426054010.808 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-26 05:41 | Success | - | |
| exp_self.20260426053253.807_20260426_053254 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260426053253.807 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-26 05:33 | Success | - | |
| exp_self.20260426052534.806_20260426_052534 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260426052534.806 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-26 05:26 | Success | - | |
| exp_self.20260426051816.805_20260426_051816 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260426051816.805 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-26 05:19 | Success | - | |
| exp_self.20260426051052.804_20260426_051052 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260426051052.804 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-26 05:11 | Success | - | |
| exp_pytrain.20260426050832.200_20260426_050832 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-26 05:09 | Success | - | |
| exp_self.20260426050142.803_20260426_050143 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260426050142.803 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-26 05:02 | Success | - | |
| exp_self.20260426045419.802_20260426_045419 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260426045419.802 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-26 04:55 | Success | - | |
| exp_self.20260426044652.801_20260426_044653 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260426044652.801 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-26 04:47 | Success | - | |
| exp_self.20260426043931.800_20260426_043932 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260426043931.800 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-26 04:40 | Success | - | |
| exp_pytrain.20260426043709.199_20260426_043710 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-26 04:38 | Success | - | |
| exp_self.20260426043015.799_20260426_043015 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260426043015.799 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-26 04:31 | Success | - | |
| exp_self.20260426042255.798_20260426_042255 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260426042255.798 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-26 04:23 | Success | - | |
| exp_self.20260426041534.797_20260426_041534 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260426041534.797 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-26 04:16 | Success | - | |
| exp_self.20260426040809.796_20260426_040810 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260426040809.796 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-26 04:09 | Success | - | |
| exp_pytrain.20260426040547.198_20260426_040547 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-26 04:06 | Success | - | |
| exp_self.20260426035852.795_20260426_035852 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260426035852.795 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-26 03:59 | Success | - | |
| exp_self.20260426035127.794_20260426_035128 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260426035127.794 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-26 03:52 | Success | - | |
| exp_self.20260426034406.793_20260426_034406 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260426034406.793 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-26 03:45 | Success | - | |
| exp_self.20260426033644.792_20260426_033644 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260426033644.792 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-26 03:37 | Success | - | |
| exp_pytrain.20260426033421.197_20260426_033422 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-26 03:35 | Success | - | |
| exp_self.20260426032734.791_20260426_032735 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260426032734.791 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-26 03:28 | Success | - | |
| exp_self.20260426032013.790_20260426_032014 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260426032013.790 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-26 03:21 | Success | - | |
| exp_self.20260426031250.789_20260426_031251 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260426031250.789 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-26 03:13 | Success | - | |
| exp_self.20260426030515.788_20260426_030515 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260426030515.788 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-26 03:06 | Success | - | |
| exp_pytrain.20260426030254.196_20260426_030254 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-26 03:03 | Success | - | |
| exp_self.20260426025604.787_20260426_025605 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260426025604.787 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-26 02:57 | Success | - | |
| exp_self.20260426024843.786_20260426_024843 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260426024843.786 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-26 02:49 | Success | - | |
| exp_self.20260426024118.785_20260426_024119 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260426024118.785 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-26 02:42 | Success | - | |
| exp_self.20260426023356.784_20260426_023356 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260426023356.784 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-26 02:34 | Success | - | |
| exp_pytrain.20260426023135.195_20260426_023135 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-26 02:32 | Success | - | |
| exp_self.20260426022439.783_20260426_022440 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260426022439.783 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-26 02:25 | Success | - | |
| exp_self.20260426021720.782_20260426_021720 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260426021720.782 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-26 02:18 | Success | - | |
| exp_self.20260426021001.781_20260426_021001 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260426021001.781 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-26 02:11 | Success | - | |
| exp_self.20260426020237.780_20260426_020237 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260426020237.780 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-26 02:03 | Success | - | |
| exp_pytrain.20260426015905.194_20260426_015905 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-26 02:00 | Success | - | |
| exp_self.20260426015456.779_20260426_015456 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260426015456.779 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-26 01:55 | Success | - | |
| exp_self.20260426014732.778_20260426_014732 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260426014732.778 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-26 01:48 | Success | - | |
| exp_self.20260426014011.777_20260426_014011 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260426014011.777 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-26 01:41 | Success | - | |
| exp_gh_dilberx_universal-llm-telemetry-suite_20260426_013726 | dilberx/universal-llm-telemetry-suite<br>Paper ID: gh_dilberx_universal-llm-telemetry-suite - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected S... | 04-26 01:38 | Success | - | |
| exp_self.20260426013030.776_20260426_013030 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260426013030.776 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-26 01:31 | Success | - | |
| exp_pytrain.20260426012700.193_20260426_012700 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-26 01:28 | Success | - | |
| exp_self.20260426012250.775_20260426_012250 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260426012250.775 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-26 01:23 | Success | - | |
| exp_self.20260426011526.774_20260426_011527 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260426011526.774 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-26 01:16 | Success | - | |
| exp_self.20260426010807.773_20260426_010807 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260426010807.773 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-26 01:09 | Success | - | |
| exp_self.20260426010048.772_20260426_010049 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260426010048.772 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-26 01:01 | Success | - | |
|
exp_pytrain.20260426005503.192_20260426_005504
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-26 00:56 | Success | - | |
|
exp_self.20260426005308.771_20260426_005308
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260426005308.771 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-26 00:54 | Success | - | |
|
exp_self.20260426004549.770_20260426_004550
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260426004549.770 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-26 00:46 | Success | - | |
|
exp_self.20260426003834.769_20260426_003835
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260426003834.769 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-26 00:39 | Success | - | |
|
exp_self.20260426003115.768_20260426_003116
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260426003115.768 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-26 00:32 | Success | - | |
|
exp_self.20260426002353.767_20260426_002353
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260426002353.767 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-26 00:24 | Success | - | |
|
exp_pytrain.20260426002130.191_20260426_002131
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-26 00:22 | Success | - | |
|
exp_self.20260426001441.766_20260426_001441
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260426001441.766 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-26 00:15 | Success | - | |
|
exp_self.20260426000722.765_20260426_000723
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260426000722.765 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-26 00:08 | Success | - | |
|
exp_cr_10.24143_2072-9502-2026-2-111-120_20260426_000407
|
Fuzzy logic-based model for information security risk assessment of a territorially distributed internal affairs system
Paper ID: cr_10.24143_2072-9502-2026-2-111-120 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signa...
|
04-26 00:05 | Success | - | |
|
exp_cr_10.24143_2072-9502-2026-2-85-93_20260426_000043
|
Optimizing the YOLO model for NPU operation
Paper ID: cr_10.24143_2072-9502-2026-2-85-93 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal:...
|
04-26 00:01 | Success | - | |
|
exp_self.20260425235838.764_20260425_235838
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260425235838.764 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-25 23:59 | Success | - | |
|
exp_self.20260425235120.763_20260425_235120
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260425235120.763 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-25 23:52 | Success | - | |
|
exp_pytrain.20260425234851.190_20260425_234852
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-25 23:49 | Success | - | |
|
exp_self.20260425234207.762_20260425_234208
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260425234207.762 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-25 23:43 | Success | - | |
|
exp_self.20260425233445.761_20260425_233446
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260425233445.761 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-25 23:35 | Success | - | |
|
exp_self.20260425232722.760_20260425_232722
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260425232722.760 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-25 23:28 | Success | - | |
|
exp_gh_eslammoha8625_llmtest-perf_20260425_232157
|
eslammoha8625/llmtest-perf
Paper ID: gh_eslammoha8625_llmtest-perf - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Reco...
|
04-25 23:22 | Success | - | |
|
exp_self.20260425231954.759_20260425_231954
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260425231954.759 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-25 23:20 | Success | - | |
|
exp_pytrain.20260425231729.189_20260425_231730
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-25 23:18 | Success | - | |
|
exp_self.20260425231041.758_20260425_231042
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260425231041.758 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-25 23:11 | Success | - | |
|
exp_self.20260425230320.757_20260425_230320
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260425230320.757 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-25 23:04 | Success | - | |
|
exp_self.20260425225600.756_20260425_225601
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260425225600.756 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-25 22:57 | Success | - | |
|
exp_self.20260425224837.755_20260425_224837
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260425224837.755 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-25 22:49 | Success | - | |
|
exp_pytrain.20260425224608.188_20260425_224609
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-25 22:47 | Success | - | |
|
exp_self.20260425223923.754_20260425_223924
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260425223923.754 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-25 22:40 | Success | - | |
|
exp_self.20260425223159.753_20260425_223159
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260425223159.753 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-25 22:33 | Success | - | |
|
exp_self.20260425222438.752_20260425_222438
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260425222438.752 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-25 22:25 | Success | - | |
|
exp_self.20260425221715.751_20260425_221715
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260425221715.751 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-25 22:18 | Success | - | |
|
exp_pytrain.20260425221443.187_20260425_221443
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-25 22:15 | Success | - | |
|
exp_self.20260425220755.750_20260425_220755
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260425220755.750 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-25 22:08 | Success | - | |
|
exp_self.20260425220030.749_20260425_220030
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260425220030.749 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-25 22:01 | Success | - | |
|
exp_self.20260425215301.748_20260425_215302
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260425215301.748 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-25 21:54 | Success | - | |
|
exp_self.20260425214541.747_20260425_214541
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260425214541.747 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-25 21:46 | Success | - | |
|
exp_pytrain.20260425214314.186_20260425_214314
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-25 21:44 | Success | - | |
|
exp_self.20260425213626.746_20260425_213627
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260425213626.746 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-25 21:37 | Success | - | |
|
exp_self.20260425212857.745_20260425_212857
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260425212857.745 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-25 21:29 | Success | - | |
|
exp_self.20260425212125.744_20260425_212126
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260425212125.744 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-25 21:22 | Success | - | |
|
exp_self.20260425211405.743_20260425_211405
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260425211405.743 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-25 21:15 | Success | - | |
|
exp_pytrain.20260425211136.185_20260425_211136
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-25 21:12 | Success | - | |
|
exp_self.20260425210446.742_20260425_210446
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260425210446.742 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-25 21:05 | Success | - | |
|
exp_self.20260425205716.741_20260425_205717
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260425205716.741 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-25 20:58 | Success | - | |
|
exp_self.20260425204946.740_20260425_204947
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260425204946.740 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-25 20:50 | Success | - | |
|
exp_self.20260425204221.739_20260425_204221
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260425204221.739 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-25 20:43 | Success | - | |
|
exp_pytrain.20260425203958.184_20260425_203958
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-25 20:41 | Success | - | |
|
exp_self.20260425203304.738_20260425_203304
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260425203304.738 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-25 20:34 | Success | - | |
|
exp_self.20260425202538.737_20260425_202539
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260425202538.737 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-25 20:26 | Success | - | |
|
exp_self.20260425201812.736_20260425_201812
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260425201812.736 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-25 20:19 | Success | - | |
|
exp_self.20260425201045.735_20260425_201045
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260425201045.735 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-25 20:11 | Success | - | |
|
exp_pytrain.20260425200823.183_20260425_200824
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-25 20:09 | Success | - | |
|
exp_self.20260425200124.734_20260425_200125
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260425200124.734 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-25 20:02 | Success | - | |
|
exp_self.20260425195403.733_20260425_195403
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260425195403.733 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-25 19:55 | Success | - | |
|
exp_gh_Ac3v3d0_semafold_20260425_194941
|
Ac3v3d0/semafold
Paper ID: gh_Ac3v3d0_semafold - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benc...
|
04-25 19:50 | Success | - | |
|
exp_self.20260425194632.732_20260425_194633
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260425194632.732 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-25 19:47 | Success | - | |
|
exp_self.20260425193852.731_20260425_193852
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260425193852.731 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-25 19:39 | Success | - | |
|
exp_pytrain.20260425193623.182_20260425_193624
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-25 19:37 | Success | - | |
|
exp_self.20260425192934.730_20260425_192935
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260425192934.730 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-25 19:30 | Success | - | |
|
exp_self.20260425192211.729_20260425_192211
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260425192211.729 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-25 19:23 | Success | - | |
|
exp_self.20260425191447.728_20260425_191448
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260425191447.728 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-25 19:15 | Success | - | |
|
exp_self.20260425190726.727_20260425_190727
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260425190726.727 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-25 19:08 | Success | - | |
|
exp_pytrain.20260425190500.181_20260425_190500
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-25 19:06 | Success | - | |
|
exp_self.20260425185801.726_20260425_185802
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260425185801.726 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-25 18:59 | Success | - | |
|
exp_self.20260425185033.725_20260425_185034
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260425185033.725 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-25 18:51 | Success | - | |
|
exp_self.20260425184307.724_20260425_184308
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260425184307.724 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-25 18:44 | Success | - | |
|
exp_self.20260425183544.723_20260425_183544
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260425183544.723 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-25 18:36 | Success | - | |
|
exp_pytrain.20260425183316.180_20260425_183316
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-25 18:34 | Success | - | |
|
exp_self.20260425182627.722_20260425_182628
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260425182627.722 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-25 18:27 | Success | - | |
|
exp_self.20260425181859.721_20260425_181900
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260425181859.721 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-25 18:20 | Success | - | |
|
exp_self.20260425181132.720_20260425_181133
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260425181132.720 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-25 18:12 | Success | - | |
|
exp_self.20260425180402.719_20260425_180402
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260425180402.719 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-25 18:05 | Success | - | |
|
exp_pytrain.20260425180138.179_20260425_180138
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-25 18:02 | Success | - | |
|
exp_self.20260425175448.718_20260425_175449
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260425175448.718 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-25 17:55 | Success | - | |
|
exp_self.20260425174726.717_20260425_174727
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260425174726.717 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-25 17:48 | Success | - | |
|
exp_self.20260425174001.716_20260425_174001
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260425174001.716 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-25 17:41 | Success | - | |
|
exp_self.20260425173230.715_20260425_173230
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260425173230.715 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-25 17:33 | Success | - | |
|
exp_pytrain.20260425173007.178_20260425_173007
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-25 17:31 | Success | - | |
|
exp_self.20260425172317.714_20260425_172317
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260425172317.714 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-25 17:24 | Success | - | |
|
exp_self.20260425171556.713_20260425_171557
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260425171556.713 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-25 17:16 | Success | - | |
|
exp_self.20260425170833.712_20260425_170833
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260425170833.712 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-25 17:09 | Success | - | |
|
exp_self.20260425170101.711_20260425_170101
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260425170101.711 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-25 17:02 | Success | - | |
|
exp_pytrain.20260425165838.177_20260425_165839
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-25 16:59 | Success | - | |
|
exp_self.20260425165143.710_20260425_165144
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260425165143.710 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-25 16:52 | Success | - | |
|
exp_self.20260425164422.709_20260425_164423
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260425164422.709 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-25 16:45 | Success | - | |
|
exp_self.20260425163659.708_20260425_163700
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260425163659.708 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-25 16:38 | Success | - | |
|
exp_self.20260425162934.707_20260425_162934
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260425162934.707 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-25 16:30 | Success | - | |
| exp_pytrain.20260425162710.176_20260425_162710 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-25 16:28 | Success | - | |
| exp_self.20260425162014.706_20260425_162014 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260425162014.706 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-25 16:21 | Success | - | |
| exp_self.20260425161254.705_20260425_161254 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260425161254.705 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-25 16:13 | Success | - | |
| exp_self.20260425160532.704_20260425_160533 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260425160532.704 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-25 16:06 | Success | - | |
| exp_self.20260425155807.703_20260425_155808 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260425155807.703 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-25 15:59 | Success | - | |
| exp_pytrain.20260425155541.175_20260425_155541 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-25 15:56 | Success | - | |
| exp_self.20260425154842.702_20260425_154843 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260425154842.702 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-25 15:49 | Success | - | |
| exp_self.20260425154114.701_20260425_154114 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260425154114.701 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-25 15:42 | Success | - | |
| exp_self.20260425153351.700_20260425_153351 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260425153351.700 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-25 15:34 | Success | - | |
| exp_self.20260425152626.699_20260425_152626 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260425152626.699 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-25 15:27 | Success | - | |
| exp_pytrain.20260425152356.174_20260425_152357 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-25 15:24 | Success | - | |
| exp_self.20260425151708.698_20260425_151708 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260425151708.698 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-25 15:18 | Success | - | |
| exp_self.20260425150939.697_20260425_150939 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260425150939.697 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-25 15:10 | Success | - | |
| exp_self.20260425150211.696_20260425_150211 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260425150211.696 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-25 15:03 | Success | - | |
| exp_self.20260425145443.695_20260425_145443 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260425145443.695 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-25 14:55 | Success | - | |
| exp_pytrain.20260425145213.173_20260425_145214 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-25 14:53 | Success | - | |
| exp_self.20260425144523.694_20260425_144524 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260425144523.694 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-25 14:46 | Success | - | |
| exp_self.20260425143752.693_20260425_143753 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260425143752.693 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-25 14:38 | Success | - | |
| exp_self.20260425143027.692_20260425_143027 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260425143027.692 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-25 14:31 | Success | - | |
| exp_self.20260425142259.691_20260425_142259 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260425142259.691 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-25 14:24 | Success | - | |
| exp_pytrain.20260425142032.172_20260425_142032 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-25 14:21 | Success | - | |
| exp_self.20260425141334.690_20260425_141335 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260425141334.690 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-25 14:14 | Success | - | |
| exp_self.20260425140558.689_20260425_140558 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260425140558.689 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-25 14:07 | Success | - | |
| exp_self.20260425135824.688_20260425_135825 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260425135824.688 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-25 13:59 | Success | - | |
| exp_self.20260425135058.687_20260425_135059 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260425135058.687 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-25 13:52 | Success | - | |
| exp_pytrain.20260425134830.171_20260425_134831 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-25 13:49 | Success | - | |
| exp_self.20260425134135.686_20260425_134135 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260425134135.686 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-25 13:42 | Success | - | |
| exp_self.20260425133405.685_20260425_133405 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260425133405.685 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-25 13:35 | Success | - | |
| exp_self.20260425132637.684_20260425_132637 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260425132637.684 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-25 13:27 | Success | - | |
| exp_self.20260425131914.683_20260425_131914 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260425131914.683 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-25 13:20 | Success | - | |
| exp_pytrain.20260425131647.170_20260425_131648 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-25 13:17 | Success | - | |
| exp_self.20260425130954.682_20260425_130954 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260425130954.682 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-25 13:10 | Success | - | |
| exp_self.20260425130221.681_20260425_130221 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260425130221.681 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-25 13:03 | Success | - | |
| exp_self.20260425125459.680_20260425_125459 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260425125459.680 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-25 12:56 | Success | - | |
| exp_self.20260425124733.679_20260425_124734 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260425124733.679 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-25 12:48 | Success | - | |
| exp_pytrain.20260425124515.169_20260425_124515 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-25 12:46 | Success | - | |
| exp_self.20260425123822.678_20260425_123823 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260425123822.678 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-25 12:39 | Success | - | |
| exp_self.20260425123101.677_20260425_123101 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260425123101.677 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-25 12:32 | Success | - | |
| exp_self.20260425122333.676_20260425_122334 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260425122333.676 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-25 12:24 | Success | - | |
| exp_self.20260425121607.675_20260425_121607 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260425121607.675 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-25 12:17 | Success | - | |
| exp_pytrain.20260425121346.168_20260425_121346 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-25 12:14 | Success | - | |
| exp_self.20260425120648.674_20260425_120648 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260425120648.674 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-25 12:07 | Success | - | |
| exp_self.20260425115926.673_20260425_115927 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260425115926.673 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-25 12:00 | Success | - | |
| exp_self.20260425115155.672_20260425_115155 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260425115155.672 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-25 11:52 | Success | - | |
| exp_self.20260425114417.671_20260425_114418 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260425114417.671 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-25 11:45 | Success | - | |
| exp_pytrain.20260425114151.167_20260425_114151 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-25 11:42 | Success | - | |
| exp_self.20260425113545.670_20260425_113545 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260425113545.670 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-25 11:36 | Success | - | |
| exp_self.20260425112819.669_20260425_112820 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260425112819.669 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-25 11:29 | Success | - | |
| exp_self.20260425112050.668_20260425_112050 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260425112050.668 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-25 11:21 | Success | - | |
| exp_self.20260425111304.667_20260425_111305 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260425111304.667 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-25 11:14 | Success | - | |
| exp_pytrain.20260425111035.166_20260425_111036 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-25 11:11 | Success | - | |
| exp_self.20260425110336.666_20260425_110336 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260425110336.666 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-25 11:04 | Success | - | |
| exp_self.20260425105606.665_20260425_105607 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260425105606.665 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-25 10:57 | Success | - | |
| exp_self.20260425104835.664_20260425_104835 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260425104835.664 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-25 10:49 | Success | - | |
| exp_self.20260425104059.663_20260425_104059 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260425104059.663 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-25 10:42 | Success | - | |
| exp_pytrain.20260425103825.165_20260425_103826 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-25 10:39 | Success | - | |
| exp_self.20260425103130.662_20260425_103131 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260425103130.662 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-25 10:32 | Success | - | |
| exp_self.20260425102351.661_20260425_102352 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260425102351.661 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-25 10:24 | Success | - | |
| exp_self.20260425101621.660_20260425_101621 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260425101621.660 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-25 10:17 | Success | - | |
| exp_self.20260425100844.659_20260425_100844 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260425100844.659 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-25 10:09 | Success | - | |
| exp_pytrain.20260425100606.164_20260425_100607 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-25 10:07 | Success | - | |
| exp_self.20260425095919.658_20260425_095920 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260425095919.658 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-25 10:00 | Success | - | |
| exp_self.20260425095149.657_20260425_095149 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260425095149.657 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-25 09:52 | Success | - | |
| exp_self.20260425094421.656_20260425_094421 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260425094421.656 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-25 09:45 | Success | - | |
| exp_self.20260425093658.655_20260425_093658 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260425093658.655 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-25 09:38 | Success | - | |
| exp_pytrain.20260425093433.163_20260425_093434 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-25 09:35 | Success | - | |
| exp_self.20260425092739.654_20260425_092740 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260425092739.654 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-25 09:28 | Success | - | |
| exp_self.20260425092008.653_20260425_092008 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260425092008.653 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-25 09:21 | Success | - | |
| exp_self.20260425091236.652_20260425_091237 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260425091236.652 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-25 09:13 | Success | - | |
| exp_self.20260425090515.651_20260425_090516 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260425090515.651 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-25 09:06 | Success | - | |
| exp_pytrain.20260425090249.162_20260425_090250 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-25 09:03 | Success | - | |
| exp_self.20260425085830.650_20260425_085831 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260425085830.650 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-25 08:59 | Success | - | |
| exp_self.20260425085107.649_20260425_085107 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260425085107.649 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-25 08:52 | Success | - | |
| exp_cr_10.1177_01466453251412512_20260425_084817 | A short review of published multi-model inference studies in radiation epidemiology and some new developments - Paper ID: cr_10.1177_01466453251412512 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recov... | 04-25 08:49 | Success | - | |
| exp_self.20260425084118.648_20260425_084119 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260425084118.648 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-25 08:42 | Success | - | |
| exp_self.20260425083355.647_20260425_083356 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260425083355.647 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-25 08:34 | Success | - | |
| exp_pytrain.20260425083134.161_20260425_083135 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-25 08:32 | Success | - | |
| exp_self.20260425082439.646_20260425_082440 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260425082439.646 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-25 08:25 | Success | - | |
| exp_self.20260425081718.645_20260425_081719 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260425081718.645 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-25 08:18 | Success | - | |
| exp_self.20260425080954.644_20260425_080954 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260425080954.644 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-25 08:10 | Success | - | |
| exp_self.20260425080229.643_20260425_080229 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260425080229.643 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-25 08:03 | Success | - | |
| exp_pytrain.20260425080009.160_20260425_080010 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-25 08:01 | Success | - | |
| exp_self.20260425075312.642_20260425_075313 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260425075312.642 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-25 07:54 | Success | - | |
| exp_self.20260425074553.641_20260425_074554 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260425074553.641 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-25 07:46 | Success | - | |
| exp_self.20260425073825.640_20260425_073826 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260425073825.640 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-25 07:39 | Success | - | |
| exp_self.20260425073101.639_20260425_073102 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260425073101.639 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-25 07:32 | Success | - | |
| exp_pytrain.20260425072840.159_20260425_072841 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-25 07:29 | Success | - | |
| exp_self.20260425072147.638_20260425_072147 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260425072147.638 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-25 07:22 | Success | - | |
| exp_self.20260425071428.637_20260425_071428 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260425071428.637 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-25 07:15 | Success | - | |
| exp_self.20260425070635.636_20260425_070636 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260425070635.636 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-25 07:07 | Success | - | |
| exp_self.20260425065841.635_20260425_065841 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260425065841.635 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-25 06:59 | Success | - | |
| exp_pytrain.20260425065557.158_20260425_065557 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-25 06:57 | Success | - | |
| exp_self.20260425065015.634_20260425_065016 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260425065015.634 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-25 06:51 | Success | - | |
| exp_self.20260425064220.633_20260425_064220 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260425064220.633 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-25 06:43 | Success | - | |
| exp_self.20260425063435.632_20260425_063435 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260425063435.632 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-25 06:35 | Success | - | |
| exp_self.20260425062658.631_20260425_062658 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260425062658.631 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-25 06:28 | Success | - | |
| exp_pytrain.20260425062421.157_20260425_062422 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-25 06:25 | Success | - | |
| exp_self.20260425061725.630_20260425_061725 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260425061725.630 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-25 06:18 | Success | - | |
| exp_self.20260425061000.629_20260425_061000 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260425061000.629 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-25 06:11 | Success | - | |
| exp_self.20260425060232.628_20260425_060232 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260425060232.628 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-25 06:03 | Success | - | |
| exp_self.20260425055513.627_20260425_055513 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260425055513.627 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-25 05:56 | Success | - | |
| exp_pytrain.20260425055252.156_20260425_055253 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-25 05:53 | Success | - | |
| exp_self.20260425054558.626_20260425_054558 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260425054558.626 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-25 05:47 | Success | - | |
| exp_self.20260425053832.625_20260425_053833 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260425053832.625 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-25 05:39 | Success | - | |
| exp_self.20260425053108.624_20260425_053108 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260425053108.624 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-25 05:32 | Success | - | |
| exp_self.20260425052343.623_20260425_052344 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260425052343.623 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-25 05:24 | Success | - | |
| exp_pytrain.20260425052125.155_20260425_052125 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-25 05:22 | Success | - | |
| exp_self.20260425051431.622_20260425_051431 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260425051431.622 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-25 05:15 | Success | - | |
| exp_self.20260425050703.621_20260425_050704 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260425050703.621 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-25 05:08 | Success | - | |
| exp_self.20260425045935.620_20260425_045935 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260425045935.620 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-25 05:00 | Success | - | |
| exp_self.20260425045207.619_20260425_045207 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260425045207.619 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-25 04:53 | Success | - | |
| exp_pytrain.20260425044947.154_20260425_044948 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-25 04:50 | Success | - | |
| exp_self.20260425044253.618_20260425_044253 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260425044253.618 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-25 04:43 | Success | - | |
| exp_self.20260425043535.617_20260425_043535 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260425043535.617 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-25 04:36 | Success | - | |
| exp_self.20260425042813.616_20260425_042813 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260425042813.616 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-25 04:29 | Success | - | |
| exp_self.20260425042046.615_20260425_042046 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260425042046.615 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-25 04:21 | Success | - | |
| exp_pytrain.20260425041823.153_20260425_041824 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-25 04:19 | Success | - | |
| exp_self.20260425041127.614_20260425_041128 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260425041127.614 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-25 04:12 | Success | - | |
| exp_self.20260425040406.613_20260425_040406 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260425040406.613 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-25 04:05 | Success | - | |
| exp_self.20260425035644.612_20260425_035645 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260425035644.612 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-25 03:57 | Success | - | |
| exp_self.20260425034919.611_20260425_034920 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260425034919.611 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-25 03:50 | Success | - | |
| exp_pytrain.20260425034655.152_20260425_034656 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-25 03:47 | Success | - | |
| exp_self.20260425034001.610_20260425_034001 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260425034001.610 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-25 03:41 | Success | - | |
| exp_self.20260425033241.609_20260425_033241 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260425033241.609 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-25 03:33 | Success | - | |
| exp_self.20260425032519.608_20260425_032519 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260425032519.608 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-25 03:26 | Success | - | |
| exp_self.20260425031755.607_20260425_031756 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260425031755.607 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-25 03:18 | Success | - | |
| exp_pytrain.20260425031525.151_20260425_031525 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-25 03:16 | Success | - | |
| exp_self.20260425030840.606_20260425_030840 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260425030840.606 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-25 03:09 | Success | - | |
| exp_self.20260425030115.605_20260425_030115 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260425030115.605 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-25 03:02 | Success | - | |
| exp_self.20260425025357.604_20260425_025357 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260425025357.604 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-25 02:55 | Success | - | |
| exp_self.20260425024639.603_20260425_024639 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260425024639.603 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-25 02:47 | Success | - | |
| exp_pytrain.20260425024408.150_20260425_024409 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-25 02:45 | Success | - | |
| exp_self.20260425023716.602_20260425_023717 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260425023716.602 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-25 02:38 | Success | - | |
| exp_self.20260425022952.601_20260425_022952 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260425022952.601 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-25 02:30 | Success | - | |
| exp_self.20260425022234.600_20260425_022235 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260425022234.600 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-25 02:23 | Success | - | |
| exp_self.20260425021513.599_20260425_021513 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260425021513.599 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-25 02:16 | Success | - | |
| exp_pytrain.20260425021248.149_20260425_021248 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-25 02:13 | Success | - | |
| exp_self.20260425020603.598_20260425_020603 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260425020603.598 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-25 02:07 | Success | - | |
| exp_self.20260425015841.597_20260425_015842 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260425015841.597 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-25 01:59 | Success | - | |
| exp_self.20260425015116.596_20260425_015117 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260425015116.596 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-25 01:52 | Success | - | |
| exp_cr_10.55041_ijsmt.v2i4.199_20260425_014809 | AI-Driven Resume Skill Extraction and Job Recommendation System using Hybrid Transformer Mamba Model - Paper ID: cr_10.55041_ijsmt.v2i4.199 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recover... | 04-25 01:49 | Success | - | |
| exp_cr_10.1038_s41598-026-49734-2_20260425_014439 | A multi-cognitive PCB defect detection model integrating Mamba - Paper ID: cr_10.1038_s41598-026-49734-2 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Reco... | 04-25 01:45 | Success | - | |
| exp_self.20260425014239.595_20260425_014239 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260425014239.595 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-25 01:43 | Success | - | |
| exp_pytrain.20260425014015.148_20260425_014016 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-25 01:41 | Success | - | |
| exp_self.20260425013324.594_20260425_013324 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260425013324.594 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-25 01:34 | Success | - | |
| exp_self.20260425012558.593_20260425_012559 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260425012558.593 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-25 01:27 | Success | - | |
| exp_self.20260425011835.592_20260425_011835 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260425011835.592 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-25 01:19 | Success | - | |
| exp_self.20260425011116.591_20260425_011116 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260425011116.591 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-25 01:12 | Success | - | |
| exp_pytrain.20260425010855.147_20260425_010856 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-25 01:09 | Success | - | |
| exp_self.20260425010200.590_20260425_010200 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260425010200.590 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-25 01:03 | Success | - | |
| exp_self.20260425005423.589_20260425_005423 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260425005423.589 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-25 00:55 | Success | - | |
| exp_self.20260425004701.588_20260425_004701 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260425004701.588 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-25 00:48 | Success | - | |
| exp_self.20260425003933.587_20260425_003934 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260425003933.587 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-25 00:40 | Success | - | |
| exp_pytrain.20260425003715.146_20260425_003715 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-25 00:38 | Success | - | |
| exp_self.20260425003016.586_20260425_003017 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260425003016.586 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-25 00:31 | Success | - | |
| exp_self.20260425002255.585_20260425_002255 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260425002255.585 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-25 00:23 | Success | - | |
| exp_self.20260425001533.584_20260425_001533 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260425001533.584 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-25 00:16 | Success | - | |
| exp_self.20260425000816.583_20260425_000816 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260425000816.583 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-25 00:09 | Success | - | |
| exp_pytrain.20260425000551.145_20260425_000552 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-25 00:06 | Success | - | |
| exp_self.20260424235859.582_20260424_235859 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260424235859.582 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-25 00:00 | Success | - | |
| exp_self.20260424235135.581_20260424_235135 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260424235135.581 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-24 23:52 | Success | - | |
| exp_self.20260424234407.580_20260424_234407 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260424234407.580 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-24 23:45 | Success | - | |
| exp_self.20260424233646.579_20260424_233646 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260424233646.579 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-24 23:37 | Success | - | |
| exp_pytrain.20260424233420.144_20260424_233420 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-24 23:35 | Success | - | |
| exp_self.20260424232727.578_20260424_232728 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260424232727.578 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-24 23:28 | Success | - | |
| exp_self.20260424232005.577_20260424_232005 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260424232005.577 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-24 23:21 | Success | - | |
| exp_self.20260424231243.576_20260424_231243 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260424231243.576 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-24 23:13 | Success | - | |
| exp_self.20260424230516.575_20260424_230517 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260424230516.575 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-24 23:06 | Success | - | |
| exp_pytrain.20260424230256.143_20260424_230256 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-24 23:03 | Success | - | |
| exp_self.20260424225604.574_20260424_225604 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260424225604.574 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-24 22:57 | Success | - | |
| exp_self.20260424224843.573_20260424_224844 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260424224843.573 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-24 22:49 | Success | - | |
| exp_self.20260424224119.572_20260424_224120 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260424224119.572 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-24 22:42 | Success | - | |
| exp_self.20260424223358.571_20260424_223358 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260424223358.571 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-24 22:35 | Success | - | |
| exp_pytrain.20260424223139.142_20260424_223139 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-24 22:32 | Success | - | |
| exp_self.20260424222444.570_20260424_222445 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260424222444.570 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-24 22:25 | Success | - | |
| exp_self.20260424221721.569_20260424_221721 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260424221721.569 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-24 22:18 | Success | - | |
| exp_self.20260424220957.568_20260424_220958 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260424220957.568 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-24 22:11 | Success | - | |
| exp_self.20260424220235.567_20260424_220235 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260424220235.567 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-24 22:03 | Success | - | |
| exp_pytrain.20260424220015.141_20260424_220015 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-24 22:01 | Success | - | |
| exp_self.20260424215320.566_20260424_215321 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260424215320.566 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-24 21:54 | Success | - | |
| exp_self.20260424214602.565_20260424_214602 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260424214602.565 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-24 21:47 | Success | - | |
| exp_self.20260424213840.564_20260424_213840 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260424213840.564 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-24 21:39 | Success | - | |
| exp_self.20260424213113.563_20260424_213113 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260424213113.563 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-24 21:32 | Success | - | |
| exp_pytrain.20260424212851.140_20260424_212851 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-24 21:29 | Success | - | |
| exp_self.20260424212154.562_20260424_212155 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260424212154.562 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-24 21:22 | Success | - | |
| exp_self.20260424211431.561_20260424_211432 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260424211431.561 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-24 21:15 | Success | - | |
| exp_self.20260424210709.560_20260424_210709 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260424210709.560 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-24 21:08 | Success | - | |
| exp_self.20260424205949.559_20260424_205949 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260424205949.559 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-24 21:00 | Success | - | |
| exp_pytrain.20260424205728.139_20260424_205729 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-24 20:58 | Success | - | |
| exp_self.20260424205033.558_20260424_205034 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260424205033.558 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-24 20:51 | Success | - | |
| exp_self.20260424204313.557_20260424_204313 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260424204313.557 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-24 20:44 | Success | - | |
| exp_self.20260424203550.556_20260424_203550 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260424203550.556 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-24 20:36 | Success | - | |
| exp_self.20260424202823.555_20260424_202823 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260424202823.555 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-24 20:29 | Success | - | |
| exp_pytrain.20260424202604.138_20260424_202604 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-24 20:27 | Success | - | |
| exp_self.20260424201907.554_20260424_201907 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260424201907.554 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-24 20:20 | Success | - | |
| exp_self.20260424201146.553_20260424_201146 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260424201146.553 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-24 20:12 | Success | - | |
| exp_self.20260424200425.552_20260424_200426 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260424200425.552 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-24 20:05 | Success | - | |
| exp_self.20260424195706.551_20260424_195706 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260424195706.551 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-24 19:58 | Success | - | |
| exp_pytrain.20260424195439.137_20260424_195440 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-24 19:55 | Success | - | |
| exp_self.20260424194748.550_20260424_194749 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260424194748.550 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-24 19:48 | Success | - | |
| exp_self.20260424194026.549_20260424_194027 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260424194026.549 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-24 19:41 | Success | - | |
| exp_self.20260424193305.548_20260424_193305 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260424193305.548 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-24 19:34 | Success | - | |
| exp_self.20260424192544.547_20260424_192544 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260424192544.547 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-24 19:26 | Success | - | |
| exp_pytrain.20260424192319.136_20260424_192320 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-24 19:24 | Success | - | |
| exp_self.20260424191635.546_20260424_191635 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260424191635.546 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-24 19:17 | Success | - | |
| exp_self.20260424190905.545_20260424_190906 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260424190905.545 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-24 19:10 | Success | - | |
| exp_self.20260424190139.544_20260424_190140 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260424190139.544 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-24 19:02 | Success | - | |
| exp_self.20260424185418.543_20260424_185418 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260424185418.543 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-24 18:55 | Success | - | |
| exp_pytrain.20260424185157.135_20260424_185157 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-24 18:52 | Success | - | |
| exp_self.20260424184503.542_20260424_184504 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260424184503.542 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-24 18:46 | Success | - | |
| exp_self.20260424183743.541_20260424_183743 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260424183743.541 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-24 18:38 | Success | - | |
| exp_self.20260424183016.540_20260424_183016 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260424183016.540 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-24 18:31 | Success | - | |
| exp_self.20260424182250.539_20260424_182250 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260424182250.539 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-24 18:23 | Success | - | |
| exp_pytrain.20260424182032.134_20260424_182032 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-24 18:21 | Success | - | |
| exp_self.20260424181339.538_20260424_181340 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260424181339.538 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-24 18:14 | Success | - | |
| exp_self.20260424180613.537_20260424_180613 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260424180613.537 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-24 18:07 | Success | - | |
| exp_self.20260424175847.536_20260424_175848 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260424175847.536 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-24 17:59 | Success | - | |
| exp_self.20260424175124.535_20260424_175124 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260424175124.535 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-24 17:52 | Success | - | |
| exp_pytrain.20260424174905.133_20260424_174905 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-24 17:50 | Success | - | |
| exp_self.20260424174211.534_20260424_174211 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260424174211.534 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-24 17:43 | Success | - | |
| exp_self.20260424173448.533_20260424_173449 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260424173448.533 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-24 17:35 | Success | - | |
| exp_self.20260424172728.532_20260424_172728 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260424172728.532 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-24 17:28 | Success | - | |
| exp_self.20260424172006.531_20260424_172006 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260424172006.531 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-24 17:21 | Success | - | |
| exp_pytrain.20260424171745.132_20260424_171745 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-24 17:18 | Success | - | |
| exp_self.20260424171051.530_20260424_171051 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260424171051.530 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-24 17:11 | Success | - | |
| exp_self.20260424170333.529_20260424_170333 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260424170333.529 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-24 17:04 | Success | - | |
| exp_self.20260424165610.528_20260424_165611 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260424165610.528 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-24 16:57 | Success | - | |
| exp_self.20260424164845.527_20260424_164846 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260424164845.527 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-24 16:49 | Success | - | |
| exp_pytrain.20260424164619.131_20260424_164620 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-24 16:47 | Success | - | |
| exp_self.20260424163919.526_20260424_163920 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260424163919.526 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-24 16:40 | Success | - | |
| exp_self.20260424163151.525_20260424_163151 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260424163151.525 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-24 16:32 | Success | - | |
| exp_self.20260424162431.524_20260424_162432 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260424162431.524 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-24 16:25 | Success | - | |
| exp_self.20260424161708.523_20260424_161709 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260424161708.523 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-24 16:18 | Success | - | |
| exp_pytrain.20260424161441.130_20260424_161441 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-24 16:15 | Success | - | |
| exp_self.20260424160931.522_20260424_160932 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260424160931.522 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-24 16:10 | Success | - | |
| exp_self.20260424160214.521_20260424_160214 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260424160214.521 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-24 16:03 | Success | - | |
| exp_self.20260424155448.520_20260424_155449 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260424155448.520 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-24 15:55 | Success | - | |
| exp_self.20260424154722.519_20260424_154723 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260424154722.519 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-24 15:48 | Success | - | |
| exp_gh_vnmoorthy_pavo-bench_20260424_154439 | vnmoorthy/pavo-bench - Paper ID: gh_vnmoorthy_pavo-bench - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-24 15:45 | Success | - | |
| exp_pytrain.20260424154233.129_20260424_154233 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-24 15:43 | Success | - | |
| exp_self.20260424153541.518_20260424_153542 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260424153541.518 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-24 15:36 | Success | - | |
| exp_self.20260424152820.517_20260424_152820 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260424152820.517 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-24 15:29 | Success | - | |
| exp_self.20260424152101.516_20260424_152101 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260424152101.516 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-24 15:22 | Success | - | |
| exp_self.20260424151340.515_20260424_151340 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260424151340.515 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-24 15:14 | Success | - | |
| exp_pytrain.20260424151114.128_20260424_151114 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-24 15:12 | Success | - | |
| exp_self.20260424150423.514_20260424_150423 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260424150423.514 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-24 15:05 | Success | - | |
| exp_self.20260424145658.513_20260424_145658 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260424145658.513 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-24 14:58 | Success | - | |
| exp_self.20260424144935.512_20260424_144936 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260424144935.512 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-24 14:50 | Success | - | |
| exp_self.20260424144218.511_20260424_144218 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260424144218.511 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-24 14:43 | Success | - | |
| exp_pytrain.20260424143953.127_20260424_143953 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-24 14:40 | Success | - | |
| exp_self.20260424143308.510_20260424_143308 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260424143308.510 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-24 14:34 | Success | - | |
| exp_self.20260424142543.509_20260424_142544 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260424142543.509 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-24 14:26 | Success | - | |
| exp_self.20260424141816.508_20260424_141816 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260424141816.508 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-24 14:19 | Success | - | |
| exp_self.20260424141053.507_20260424_141053 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260424141053.507 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-24 14:11 | Success | - | |
| exp_pytrain.20260424140833.126_20260424_140833 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-24 14:09 | Success | - | |
| exp_self.20260424140139.506_20260424_140139 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260424140139.506 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-24 14:02 | Success | - | |
| exp_self.20260424135418.505_20260424_135419 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260424135418.505 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-24 13:55 | Success | - | |
| exp_self.20260424134657.504_20260424_134657 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260424134657.504 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-24 13:47 | Success | - | |
| exp_self.20260424133932.503_20260424_133933 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260424133932.503 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-24 13:40 | Success | - | |
| exp_pytrain.20260424133713.125_20260424_133714 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-24 13:38 | Success | - | |
| exp_self.20260424133018.502_20260424_133019 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260424133018.502 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-24 13:31 | Success | - | |
| exp_self.20260424132254.501_20260424_132254 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260424132254.501 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-24 13:23 | Success | - | |
| exp_self.20260424131526.500_20260424_131526 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260424131526.500 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-24 13:16 | Success | - | |
| exp_self.20260424130759.499_20260424_130759 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260424130759.499 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-24 13:09 | Success | - | |
| exp_pytrain.20260424130541.124_20260424_130541 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-24 13:06 | Success | - | |
| exp_self.20260424125849.498_20260424_125850 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260424125849.498 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-24 12:59 | Success | - | |
| exp_self.20260424125129.497_20260424_125129 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260424125129.497 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-24 12:52 | Success | - | |
| exp_hf_2604.20156_20260424_124815 | Temporally Extended Mixture-of-Experts Models - Paper ID: hf_2604.20156 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark... | 04-24 12:49 | Success | - | |
| exp_self.20260424124252.496_20260424_124252 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260424124252.496 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-24 12:43 | Success | - | |
| exp_self.20260424123524.495_20260424_123524 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260424123524.495 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-24 12:36 | Success | - | |
| exp_pytrain.20260424123302.123_20260424_123303 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-24 12:34 | Success | - | |
| exp_self.20260424122607.494_20260424_122607 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260424122607.494 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-24 12:27 | Success | - | |
| exp_self.20260424121850.493_20260424_121850 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260424121850.493 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-24 12:19 | Success | - | |
| exp_self.20260424121129.492_20260424_121129 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260424121129.492 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-24 12:12 | Success | - | |
| exp_hf_2506.17001_20260424_120557 | PersonalAI: A Systematic Comparison of Knowledge Graph Storage and Retrieval Approaches for Personalized LLM agents - Paper ID: hf_2506.17001 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark... | 04-24 12:06 | Success | - | |
| exp_self.20260424120400.491_20260424_120401 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260424120400.491 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-24 12:05 | Success | - | |
| exp_pytrain.20260424120135.122_20260424_120135 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-24 12:02 | Success | - | |
| exp_self.20260424115442.490_20260424_115443 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260424115442.490 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-24 11:55 | Success | - | |
| exp_cr_10.1093_scipol_scag026_20260424_115020 | Generative AI in public administration: evaluating a fine-tuned large language model for policy briefing notes - Paper ID: cr_10.1093_scipol_scag026 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovere... | 04-24 11:51 | Success | - | |
| exp_self.20260424114713.489_20260424_114714 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260424114713.489 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-24 11:48 | Success | - | |
| exp_self.20260424113950.488_20260424_113950 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260424113950.488 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-24 11:40 | Success | - | |
| exp_self.20260424113134.487_20260424_113134 | Self-directed benchmark: ssm_mamba strategy stress test - Paper ID: self.20260424113134.487 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-24 11:32 | Success | - | |
|
exp_pytrain.20260424112913.121_20260424_112913
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-24 11:30 | Success | - | |
|
exp_self.20260424112212.486_20260424_112213
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260424112212.486 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-24 11:23 | Success | - | |
|
exp_self.20260424111449.485_20260424_111449
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260424111449.485 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-24 11:15 | Success | - | |
|
exp_self.20260424110725.484_20260424_110725
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260424110725.484 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-24 11:08 | Success | - | |
|
exp_self.20260424105957.483_20260424_105957
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260424105957.483 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-24 11:00 | Success | - | |
|
exp_pytrain.20260424105733.120_20260424_105733
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-24 10:58 | Success | - | |
|
exp_self.20260424105037.482_20260424_105037
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260424105037.482 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-24 10:51 | Success | - | |
|
exp_self.20260424104314.481_20260424_104314
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260424104314.481 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-24 10:44 | Success | - | |
|
exp_self.20260424103553.480_20260424_103554
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260424103553.480 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-24 10:36 | Success | - | |
|
exp_self.20260424102824.479_20260424_102825
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260424102824.479 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-24 10:29 | Success | - | |
|
exp_pytrain.20260424102556.119_20260424_102556
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-24 10:26 | Success | - | |
|
exp_self.20260424101902.478_20260424_101902
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260424101902.478 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-24 10:20 | Success | - | |
|
exp_self.20260424101135.477_20260424_101135
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260424101135.477 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-24 10:12 | Success | - | |
|
exp_self.20260424100413.476_20260424_100413
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260424100413.476 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-24 10:05 | Success | - | |
|
exp_self.20260424095652.475_20260424_095652
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260424095652.475 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-24 09:57 | Success | - | |
|
exp_pytrain.20260424095422.118_20260424_095423
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-24 09:55 | Success | - | |
|
exp_hf_2604.21915_20260424_095140
|
Vista4D: Video Reshooting with 4D Point Clouds
Paper ID: hf_2604.21915 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
04-24 09:52 | Success | - | |
|
exp_self.20260424094728.474_20260424_094729
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260424094728.474 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-24 09:48 | Success | - | |
|
exp_cr_10.3390_s26092643_20260424_094413
|
Prediction of BDS-3 Satellite Clock Bias Based on the Mamba-LSTM Model
Paper ID: cr_10.3390_s26092643 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered ben...
|
04-24 09:45 | Success | - | |
|
exp_self.20260424093855.473_20260424_093855
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260424093855.473 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-24 09:39 | Success | - | |
|
exp_self.20260424093129.472_20260424_093130
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260424093129.472 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-24 09:32 | Success | - | |
|
exp_self.20260424092400.471_20260424_092401
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260424092400.471 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-24 09:25 | Success | - | |
|
exp_pytrain.20260424092142.117_20260424_092142
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-24 09:22 | Success | - | |
|
exp_self.20260424091448.470_20260424_091448
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260424091448.470 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-24 09:15 | Success | - | |
|
exp_self.20260424090726.469_20260424_090726
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260424090726.469 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-24 09:08 | Success | - | |
|
exp_self.20260424090001.468_20260424_090002
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260424090001.468 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-24 09:01 | Success | - | |
|
exp_self.20260424085232.467_20260424_085232
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260424085232.467 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-24 08:53 | Success | - | |
|
exp_pytrain.20260424085007.116_20260424_085007
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-24 08:51 | Success | - | |
|
exp_gh_Rianbajukendari_mini-infer_20260424_084754
|
Rianbajukendari/mini-infer
Paper ID: gh_Rianbajukendari_mini-infer - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Reco...
|
04-24 08:48 | Success | - | |
|
exp_self.20260424084158.466_20260424_084158
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260424084158.466 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-24 08:43 | Success | - | |
|
exp_self.20260424083437.465_20260424_083437
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260424083437.465 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-24 08:35 | Success | - | |
|
exp_self.20260424082716.464_20260424_082716
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260424082716.464 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-24 08:28 | Success | - | |
|
exp_self.20260424081950.463_20260424_081951
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260424081950.463 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-24 08:20 | Success | - | |
|
exp_pytrain.20260424081723.115_20260424_081723
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-24 08:18 | Success | - | |
|
exp_self.20260424081031.462_20260424_081031
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260424081031.462 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-24 08:11 | Success | - | |
|
exp_self.20260424080308.461_20260424_080308
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260424080308.461 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-24 08:04 | Success | - | |
|
exp_self.20260424075541.460_20260424_075542
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260424075541.460 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-24 07:56 | Success | - | |
|
exp_self.20260424074819.459_20260424_074819
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260424074819.459 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-24 07:49 | Success | - | |
|
exp_pytrain.20260424074551.114_20260424_074551
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-24 07:46 | Success | - | |
|
exp_self.20260424073904.458_20260424_073904
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260424073904.458 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-24 07:40 | Success | - | |
|
exp_self.20260424073135.457_20260424_073135
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260424073135.457 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-24 07:32 | Success | - | |
|
exp_self.20260424072413.456_20260424_072414
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260424072413.456 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-24 07:25 | Success | - | |
|
exp_self.20260424071649.455_20260424_071649
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260424071649.455 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-24 07:17 | Success | - | |
|
exp_pytrain.20260424071420.113_20260424_071420
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-24 07:15 | Success | - | |
|
exp_self.20260424070724.454_20260424_070724
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260424070724.454 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-24 07:08 | Success | - | |
|
exp_self.20260424065953.453_20260424_065954
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260424065953.453 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-24 07:00 | Success | - | |
|
exp_self.20260424065228.452_20260424_065228
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260424065228.452 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-24 06:53 | Success | - | |
|
exp_self.20260424064506.451_20260424_064506
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260424064506.451 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-24 06:46 | Success | - | |
|
exp_pytrain.20260424064241.112_20260424_064242
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-24 06:43 | Success | - | |
|
exp_self.20260424063555.450_20260424_063555
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260424063555.450 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-24 06:36 | Success | - | |
|
exp_self.20260424062825.449_20260424_062825
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260424062825.449 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-24 06:29 | Success | - | |
|
exp_self.20260424062056.448_20260424_062056
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260424062056.448 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-24 06:21 | Success | - | |
|
exp_self.20260424061333.447_20260424_061334
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260424061333.447 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-24 06:14 | Success | - | |
|
exp_pytrain.20260424061108.111_20260424_061109
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-24 06:12 | Success | - | |
|
exp_self.20260424060415.446_20260424_060415
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260424060415.446 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-24 06:05 | Success | - | |
|
exp_hf_2604.21668_20260424_055844
|
Encoder-Free Human Motion Understanding via Structured Motion Descriptions
Paper ID: hf_2604.21668 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
04-24 05:59 | Success | - | |
|
exp_self.20260424055647.445_20260424_055648
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260424055647.445 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-24 05:57 | Success | - | |
|
exp_self.20260424054923.444_20260424_054924
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260424054923.444 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-24 05:50 | Success | - | |
|
exp_self.20260424054155.443_20260424_054156
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260424054155.443 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-24 05:42 | Success | - | |
|
exp_pytrain.20260424053931.110_20260424_053931
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-24 05:40 | Success | - | |
|
exp_self.20260424053237.442_20260424_053237
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260424053237.442 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-24 05:33 | Success | - | |
|
exp_self.20260424052513.441_20260424_052513
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260424052513.441 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-24 05:26 | Success | - | |
|
exp_self.20260424051751.440_20260424_051752
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260424051751.440 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-24 05:18 | Success | - | |
|
exp_self.20260424051029.439_20260424_051030
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260424051029.439 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-24 05:11 | Success | - | |
|
exp_pytrain.20260424050800.109_20260424_050801
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-24 05:09 | Success | - | |
|
exp_self.20260424050109.438_20260424_050109
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260424050109.438 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-24 05:02 | Success | - | |
|
exp_self.20260424045344.437_20260424_045344
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260424045344.437 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-24 04:54 | Success | - | |
|
exp_cr_10.38124_ijisrt_26apr950_20260424_044818
|
Contextiva: An Integrated Framework Based on Agentic Retrieval Augmented Generation and Model Context Protocol for AI-As...
Paper ID: cr_10.38124_ijisrt_26apr950 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recove...
|
04-24 04:49 | Success | - | |
|
exp_self.20260424044615.436_20260424_044615
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260424044615.436 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-24 04:47 | Success | - | |
|
exp_self.20260424043845.435_20260424_043845
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260424043845.435 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-24 04:39 | Success | - | |
|
exp_pytrain.20260424043627.108_20260424_043627
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-24 04:37 | Success | - | |
|
exp_self.20260424042933.434_20260424_042933
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260424042933.434 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-24 04:30 | Success | - | |
|
exp_self.20260424042212.433_20260424_042212
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260424042212.433 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-24 04:23 | Success | - | |
|
exp_self.20260424041439.432_20260424_041439
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260424041439.432 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-24 04:15 | Success | - | |
|
exp_oa_W7155244741_20260424_040908
|
Efficient Video Diffusion Models: Advancements and Challenges
Paper ID: oa_W7155244741 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
04-24 04:10 | Success | - | |
|
exp_self.20260424040713.431_20260424_040714
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260424040713.431 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-24 04:08 | Success | - | |
|
exp_pytrain.20260424040446.107_20260424_040446
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-24 04:05 | Success | - | |
|
exp_oa_W7155244458_20260424_040204
|
Neural Garbage Collection: Learning to Forget while Learning to Reason
Paper ID: oa_W7155244458 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
04-24 04:03 | Success | - | |
|
exp_self.20260424035636.430_20260424_035636
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260424035636.430 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-24 03:57 | Success | - | |
|
exp_self.20260424034907.429_20260424_034908
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260424034907.429 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-24 03:50 | Success | - | |
|
exp_self.20260424034147.428_20260424_034148
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260424034147.428 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-24 03:42 | Success | - | |
|
exp_self.20260424033428.427_20260424_033428
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260424033428.427 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-24 03:35 | Success | - | |
|
exp_pytrain.20260424033201.106_20260424_033202
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-24 03:33 | Success | - | |
|
exp_self.20260424032515.426_20260424_032515
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260424032515.426 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-24 03:26 | Success | - | |
|
exp_self.20260424031744.425_20260424_031745
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260424031744.425 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-24 03:18 | Success | - | |
|
exp_self.20260424031021.424_20260424_031021
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260424031021.424 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-24 03:11 | Success | - | |
|
exp_self.20260424030259.423_20260424_030259
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260424030259.423 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-24 03:04 | Success | - | |
|
exp_pytrain.20260424030035.105_20260424_030035
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-24 03:01 | Success | - | |
| exp_self.20260424025348.422_20260424_025349 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260424025348.422 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-24 02:54 | Success | - | |
| exp_self.20260424024625.421_20260424_024625 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260424024625.421 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-24 02:47 | Success | - | |
| exp_self.20260424023857.420_20260424_023858 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260424023857.420 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-24 02:40 | Success | - | |
| exp_self.20260424023139.419_20260424_023140 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260424023139.419 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-24 02:32 | Success | - | |
| exp_pytrain.20260424022920.104_20260424_022921 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-24 02:30 | Success | - | |
| exp_self.20260424022400.418_20260424_022401 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260424022400.418 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-24 02:25 | Success | - | |
| exp_self.20260424021639.417_20260424_021639 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260424021639.417 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-24 02:17 | Success | - | |
| exp_self.20260424020919.416_20260424_020919 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260424020919.416 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-24 02:10 | Success | - | |
| exp_self.20260424015946.415_20260424_015947 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260424015946.415 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-24 02:00 | Success | - | |
| exp_pytrain.20260424015727.103_20260424_015727 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-24 01:58 | Success | - | |
| exp_self.20260424015034.414_20260424_015034 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260424015034.414 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-24 01:51 | Success | - | |
| exp_self.20260424014315.413_20260424_014315 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260424014315.413 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-24 01:44 | Success | - | |
| exp_self.20260424013551.412_20260424_013551 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260424013551.412 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-24 01:36 | Success | - | |
| exp_self.20260424012831.411_20260424_012832 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260424012831.411 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-24 01:29 | Success | - | |
| exp_pytrain.20260424012613.102_20260424_012614 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-24 01:27 | Success | - | |
| exp_self.20260424011927.410_20260424_011927 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260424011927.410 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-24 01:20 | Success | - | |
| exp_self.20260424011208.409_20260424_011208 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260424011208.409 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-24 01:13 | Success | - | |
| exp_self.20260424010447.408_20260424_010447 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260424010447.408 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-24 01:05 | Success | - | |
| exp_self.20260424005654.407_20260424_005654 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260424005654.407 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-24 00:57 | Success | - | |
| exp_pytrain.20260424005435.101_20260424_005435 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-24 00:55 | Success | - | |
| exp_gh_Solar-cmd_neural-arithmetic-compression_20260424_004938 | Solar-cmd/neural-arithmetic-compression<br>Paper ID: gh_Solar-cmd_neural-arithmetic-compression - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected... | 04-24 00:50 | Success | - | |
| exp_self.20260424004739.406_20260424_004739 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260424004739.406 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-24 00:48 | Success | - | |
| exp_self.20260424004015.405_20260424_004016 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260424004015.405 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-24 00:41 | Success | - | |
| exp_self.20260424003259.404_20260424_003259 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260424003259.404 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-24 00:34 | Success | - | |
| exp_self.20260424002543.403_20260424_002543 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260424002543.403 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-24 00:26 | Success | - | |
| exp_pytrain.20260424002319.100_20260424_002320 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-24 00:24 | Success | - | |
| exp_hf_2604.20398_20260424_002105 | WebGen-R1: Incentivizing Large Language Models to Generate Functional and Aesthetic Websites with Reinforcement Learning<br>Paper ID: hf_2604.20398 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark... | 04-24 00:22 | Success | - | |
| exp_self.20260424001759.402_20260424_001759 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260424001759.402 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-24 00:19 | Success | - | |
| exp_self.20260424001039.401_20260424_001040 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260424001039.401 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-24 00:11 | Success | - | |
| exp_hf_2604.20244_20260424_000622 | Hybrid Policy Distillation for LLMs<br>Paper ID: hf_2604.20244 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark... | 04-24 00:07 | Success | - | |
| exp_self.20260424000317.400_20260424_000317 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260424000317.400 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-24 00:04 | Success | - | |
| exp_self.20260423235553.399_20260423_235554 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260423235553.399 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-23 23:56 | Success | - | |
| exp_pytrain.20260423235011.099_20260423_235011 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-23 23:51 | Success | - | |
| exp_self.20260423234818.398_20260423_234818 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260423234818.398 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-23 23:49 | Success | - | |
| exp_self.20260423234057.397_20260423_234058 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260423234057.397 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-23 23:42 | Success | - | |
| exp_self.20260423233340.396_20260423_233340 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260423233340.396 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-23 23:34 | Success | - | |
| exp_self.20260423232622.395_20260423_232622 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260423232622.395 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-23 23:27 | Success | - | |
| exp_hf_2604.20987_20260423_232309 | Co-Evolving LLM Decision and Skill Bank Agents for Long-Horizon Tasks<br>Paper ID: hf_2604.20987 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark... | 04-23 23:24 | Success | - | |
| exp_pytrain.20260423231853.098_20260423_231853 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-23 23:19 | Success | - | |
| exp_self.20260423231700.394_20260423_231701 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260423231700.394 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-23 23:18 | Success | - | |
| exp_self.20260423230943.393_20260423_230944 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260423230943.393 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-23 23:10 | Success | - | |
| exp_self.20260423230223.392_20260423_230223 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260423230223.392 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-23 23:03 | Success | - | |
| exp_self.20260423225502.391_20260423_225502 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260423225502.391 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-23 22:56 | Success | - | |
| exp_self.20260423224742.390_20260423_224742 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260423224742.390 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-23 22:48 | Success | - | |
| exp_pytrain.20260423224525.097_20260423_224525 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-23 22:46 | Success | - | |
| exp_self.20260423223833.389_20260423_223833 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260423223833.389 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-23 22:39 | Success | - | |
| exp_self.20260423223116.388_20260423_223116 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260423223116.388 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-23 22:32 | Success | - | |
| exp_hf_2604.21193_20260423_222801 | Trust but Verify: Introducing DAVinCI -- A Framework for Dual Attribution and Verification in Claim Inference for Langua...<br>Paper ID: hf_2604.21193 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark... | 04-23 22:29 | Success | - | |
| exp_hf_2604.21889_20260423_222436 | TingIS: Real-time Risk Event Discovery from Noisy Customer Incidents at Enterprise Scale<br>Paper ID: hf_2604.21889 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark... | 04-23 22:25 | Success | - | |
| exp_self.20260423222240.387_20260423_222240 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260423222240.387 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-23 22:23 | Success | - | |
| exp_self.20260423221519.386_20260423_221519 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260423221519.386 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-23 22:16 | Success | - | |
| exp_pytrain.20260423221251.096_20260423_221251 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-23 22:13 | Success | - | |
| exp_self.20260423220600.385_20260423_220600 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260423220600.385 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-23 22:07 | Success | - | |
| exp_self.20260423215835.384_20260423_215835 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260423215835.384 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-23 21:59 | Success | - | |
| exp_self.20260423215114.383_20260423_215114 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260423215114.383 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-23 21:52 | Success | - | |
| exp_self.20260423214356.382_20260423_214357 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260423214356.382 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-23 21:44 | Success | - | |
| exp_pytrain.20260423214025.095_20260423_214026 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-23 21:41 | Success | - | |
| exp_self.20260423213617.381_20260423_213617 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260423213617.381 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-23 21:37 | Success | - | |
| exp_self.20260423212858.380_20260423_212859 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260423212858.380 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-23 21:30 | Success | - | |
| exp_hf_2604.19734_20260423_212546 | UniT: Toward a Unified Physical Language for Human-to-Humanoid Policy Learning and World Modeling<br>Paper ID: hf_2604.19734 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark... | 04-23 21:26 | Success | - | |
| exp_self.20260423212134.379_20260423_212134 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260423212134.379 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-23 21:22 | Success | - | |
| exp_self.20260423211425.378_20260423_211430 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260423211425.378 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-23 21:15 | Success | - | |
| exp_pytrain.20260423210832.094_20260423_210835 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-23 21:09 | Success | - | |
| exp_self.20260423210534.377_20260423_210538 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260423210534.377 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-23 21:06 | Success | - | |
| exp_self.20260423205647.376_20260423_205649 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260423205647.376 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-23 20:57 | Success | - | |
| exp_self.20260423204823.375_20260423_204829 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260423204823.375 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-23 20:49 | Success | - | |
| exp_self.20260423203929.374_20260423_203936 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260423203929.374 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-23 20:40 | Success | - | |
| exp_pytrain.20260423203416.093_20260423_203418 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-23 20:35 | Success | - | |
| exp_self.20260423203128.373_20260423_203132 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260423203128.373 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-23 20:32 | Success | - | |
| exp_self.20260423202218.372_20260423_202227 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260423202218.372 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-23 20:23 | Success | - | |
| exp_self.20260423201343.371_20260423_201346 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260423201343.371 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-23 20:14 | Success | - | |
| exp_2604.21816v1_20260423_200900 | Tool Attention Is All You Need: Dynamic Tool Gating and Lazy Schema Loading for Eliminating the MCP/Tools Tax in Scalabl...<br>Paper ID: 2604.21816v1 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark s... | 04-23 20:10 | Success | - | |
| exp_self.20260423200455.370_20260423_200458 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260423200455.370 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-23 20:06 | Success | - | |
| exp_pytrain.20260423195941.092_20260423_195941 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-23 20:00 | Success | - | |
| exp_self.20260423195641.369_20260423_195649 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260423195641.369 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-23 19:57 | Success | - | |
| exp_self.20260423194749.368_20260423_194754 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260423194749.368 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-23 19:48 | Success | - | |
| exp_hf_2604.20200_20260423_194311 | Chasing the Public Score: User Pressure and Evaluation Exploitation in Coding Agent Workflows<br>Paper ID: hf_2604.20200 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark... | 04-23 19:44 | Success | - | |
| exp_self.20260423193852.367_20260423_193858 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260423193852.367 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-23 19:40 | Success | - | |
| exp_self.20260423193005.366_20260423_193008 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260423193005.366 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-23 19:31 | Success | - | |
| exp_pytrain.20260423192419.091_20260423_192421 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-23 19:25 | Success | - | |
| exp_self.20260423192135.365_20260423_192139 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260423192135.365 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-23 19:22 | Success | - | |
| exp_self.20260423191213.364_20260423_191218 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260423191213.364 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-23 19:13 | Success | - | |
| exp_self.20260423190323.363_20260423_190325 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260423190323.363 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-23 19:04 | Success | - | |
| exp_self.20260423185444.362_20260423_185447 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260423185444.362 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-23 18:55 | Success | - | |
| exp_pytrain.20260423184941.090_20260423_184945 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-23 18:50 | Success | - | |
| exp_self.20260423184708.361_20260423_184710 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260423184708.361 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-23 18:48 | Success | - | |
| exp_self.20260423183819.360_20260423_183822 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260423183819.360 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-23 18:39 | Success | - | |
| exp_self.20260423182901.359_20260423_182904 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260423182901.359 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-23 18:30 | Success | - | |
| exp_self.20260423182021.358_20260423_182022 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260423182021.358 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-23 18:21 | Success | - | |
| exp_pytrain.20260423181516.089_20260423_181520 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-23 18:16 | Success | - | |
| exp_self.20260423181226.357_20260423_181230 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260423181226.357 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-23 18:13 | Success | - | |
| exp_self.20260423180403.356_20260423_180405 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260423180403.356 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-23 18:05 | Success | - | |
| exp_self.20260423175618.355_20260423_175622 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260423175618.355 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-23 17:57 | Success | - | |
| exp_self.20260423174735.354_20260423_174739 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260423174735.354 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-23 17:48 | Success | - | |
| exp_pytrain.20260423174345.088_20260423_174350 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-23 17:44 | Success | - | |
|
exp_self.20260423173647.353_20260423_173650
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260423173647.353 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-23 17:37 | Success | - | |
|
exp_self.20260423172829.352_20260423_172830
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260423172829.352 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-23 17:29 | Success | - | |
|
exp_self.20260423172111.351_20260423_172112
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260423172111.351 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-23 17:22 | Success | - | |
|
exp_self.20260423171354.350_20260423_171354
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260423171354.350 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-23 17:14 | Success | - | |
|
exp_pytrain.20260423171127.087_20260423_171127
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-23 17:12 | Success | - | |
|
exp_self.20260423170717.349_20260423_170717
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260423170717.349 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-23 17:08 | Success | - | |
|
exp_self.20260423165955.348_20260423_165955
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260423165955.348 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-23 17:00 | Success | - | |
|
exp_self.20260423165243.347_20260423_165248
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260423165243.347 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-23 16:53 | Success | - | |
|
exp_self.20260423164445.346_20260423_164450
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260423164445.346 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-23 16:45 | Success | - | |
|
exp_pytrain.20260423163913.086_20260423_163919
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-23 16:40 | Success | - | |
|
exp_self.20260423163604.345_20260423_163608
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260423163604.345 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-23 16:37 | Success | - | |
|
exp_self.20260423162719.344_20260423_162719
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260423162719.344 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-23 16:28 | Success | - | |
|
exp_self.20260423162019.343_20260423_162023
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260423162019.343 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-23 16:21 | Success | - | |
|
exp_self.20260423161123.342_20260423_161126
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260423161123.342 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-23 16:12 | Success | - | |
|
exp_pytrain.20260423160738.085_20260423_160742
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-23 16:08 | Success | - | |
|
exp_self.20260423160023.341_20260423_160027
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260423160023.341 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-23 16:01 | Success | - | |
|
exp_self.20260423155107.340_20260423_155113
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260423155107.340 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-23 15:52 | Success | - | |
|
exp_self.20260423154226.339_20260423_154228
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260423154226.339 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-23 15:43 | Success | - | |
|
exp_cr_10.31449_inf.v50i11.9002_20260423_153854
|
Hybrid LSTM-Transformer Model for Sequential and Context-Aware Tourism Destination Recommendation
Paper ID: cr_10.31449_inf.v50i11.9002 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recove...
|
04-23 15:39 | Success | - | |
|
exp_pytrain.20260423153605.084_20260423_153613
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-23 15:37 | Success | - | |
|
exp_self.20260423153035.338_20260423_153036
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260423153035.338 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-23 15:31 | Success | - | |
|
exp_self.20260423152233.337_20260423_152236
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260423152233.337 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-23 15:23 | Success | - | |
|
exp_self.20260423151431.336_20260423_151432
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260423151431.336 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-23 15:15 | Success | - | |
|
exp_self.20260423150707.335_20260423_150707
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260423150707.335 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-23 15:08 | Success | - | |
|
exp_pytrain.20260423150434.083_20260423_150435
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-23 15:05 | Success | - | |
|
exp_self.20260423145918.334_20260423_145919
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260423145918.334 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-23 15:00 | Success | - | |
|
exp_self.20260423145152.333_20260423_145152
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260423145152.333 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-23 14:52 | Success | - | |
|
exp_self.20260423144426.332_20260423_144427
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260423144426.332 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-23 14:45 | Success | - | |
|
exp_self.20260423143648.331_20260423_143648
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260423143648.331 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-23 14:37 | Success | - | |
|
exp_pytrain.20260423143315.082_20260423_143315
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-23 14:34 | Success | - | |
|
exp_self.20260423142913.330_20260423_142914
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260423142913.330 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-23 14:30 | Success | - | |
|
exp_self.20260423142109.329_20260423_142109
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260423142109.329 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-23 14:22 | Success | - | |
|
exp_self.20260423141347.328_20260423_141348
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260423141347.328 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-23 14:14 | Success | - | |
|
exp_self.20260423140702.327_20260423_140704
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260423140702.327 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-23 14:08 | Success | - | |
|
exp_pytrain.20260423140131.081_20260423_140132
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-23 14:02 | Success | - | |
|
exp_self.20260423135925.326_20260423_135930
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260423135925.326 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-23 14:00 | Success | - | |
|
exp_self.20260423135101.325_20260423_135103
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260423135101.325 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-23 13:52 | Success | - | |
|
exp_self.20260423134229.324_20260423_134229
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260423134229.324 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-23 13:43 | Success | - | |
|
exp_self.20260423133519.323_20260423_133521
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260423133519.323 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-23 13:36 | Success | - | |
|
exp_pytrain.20260423132946.080_20260423_132948
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-23 13:30 | Success | - | |
|
exp_self.20260423132733.322_20260423_132733
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260423132733.322 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-23 13:28 | Success | - | |
|
exp_hf_2604.19835_20260423_132432
|
Expert Upcycling: Shifting the Compute-Efficient Frontier of Mixture-of-Experts
Paper ID: hf_2604.19835 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
04-23 13:25 | Success | - | |
|
exp_self.20260423131707.321_20260423_131709
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260423131707.321 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-23 13:18 | Success | - | |
|
exp_self.20260423130842.320_20260423_130842
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260423130842.320 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-23 13:09 | Success | - | |
|
exp_self.20260423130127.319_20260423_130127
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260423130127.319 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-23 13:02 | Success | - | |
|
exp_pytrain.20260423125758.079_20260423_125758
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-23 12:59 | Success | - | |
|
exp_self.20260423125450.318_20260423_125450
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260423125450.318 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-23 12:55 | Success | - | |
|
exp_self.20260423124733.317_20260423_124734
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260423124733.317 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-23 12:48 | Success | - | |
|
exp_cr_10.3390_s26092616_20260423_124206
|
MSW-Mamba-Det: Multi-Scale Windowed State-Space Modeling for End-to-End Defect Detection in Photovoltaic Module Electrol...
Paper ID: cr_10.3390_s26092616 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered ben...
|
04-23 12:43 | Success | - | |
|
exp_self.20260423124008.316_20260423_124008
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260423124008.316 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-23 12:41 | Success | - | |
|
exp_self.20260423123248.315_20260423_123248
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260423123248.315 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-23 12:33 | Success | - | |
|
exp_pytrain.20260423122638.078_20260423_122638
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-23 12:27 | Success | - | |
|
exp_self.20260423122445.314_20260423_122446
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260423122445.314 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-23 12:25 | Success | - | |
|
exp_self.20260423121729.313_20260423_121730
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260423121729.313 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-23 12:18 | Success | - | |
|
exp_self.20260423121008.312_20260423_121008
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260423121008.312 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-23 12:11 | Success | - | |
|
exp_self.20260423120248.311_20260423_120248
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260423120248.311 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-23 12:03 | Success | - | |
|
exp_self.20260423115529.310_20260423_115529
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260423115529.310 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-23 11:56 | Success | - | |
|
exp_pytrain.20260423115311.077_20260423_115312
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-23 11:54 | Success | - | |
|
exp_self.20260423114752.309_20260423_114752
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260423114752.309 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-23 11:48 | Success | - | |
|
exp_self.20260423114033.308_20260423_114033
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260423114033.308 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-23 11:41 | Success | - | |
|
exp_self.20260423113356.307_20260423_113357
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260423113356.307 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-23 11:34 | Success | - | |
|
exp_self.20260423112605.306_20260423_112606
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260423112605.306 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-23 11:27 | Success | - | |
|
exp_pytrain.20260423112150.076_20260423_112151
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-23 11:22 | Success | - | |
|
exp_self.20260423111851.305_20260423_111853
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260423111851.305 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-23 11:19 | Success | - | |
|
exp_self.20260423111127.304_20260423_111128
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260423111127.304 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-23 11:12 | Success | - | |
|
exp_hf_2604.20720_20260423_110510
|
COMPASS: COntinual Multilingual PEFT with Adaptive Semantic Sampling
Paper ID: hf_2604.20720 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
04-23 11:06 | Success | - | |
|
exp_self.20260423110306.303_20260423_110306
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260423110306.303 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-23 11:04 | Success | - | |
|
exp_self.20260423105538.302_20260423_105541
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260423105538.302 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-23 10:56 | Success | - | |
|
exp_pytrain.20260423105034.075_20260423_105034
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-23 10:51 | Success | - | |
|
exp_self.20260423104822.301_20260423_104823
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260423104822.301 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-23 10:49 | Success | - | |
|
exp_self.20260423104116.300_20260423_104117
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260423104116.300 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-23 10:42 | Success | - | |
|
exp_self.20260423103357.299_20260423_103357
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260423103357.299 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-23 10:34 | Success | - | |
|
exp_self.20260423102622.298_20260423_102623
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260423102622.298 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-23 10:27 | Success | - | |
|
exp_pytrain.20260423101601.074_20260423_101802
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-23 10:19 | Success | - | |
|
exp_self.20260423095646.297_20260423_095648
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260423095646.297 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-23 09:57 | Success | - | |
|
exp_hf_2604.18780_20260423_095248
|
Streaming Structured Inference with Flash-SemiCRF
Paper ID: hf_2604.18780 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
04-23 09:53 | Success | - | |
|
exp_self.20260423094933.296_20260423_094933
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260423094933.296 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-23 09:50 | Success | - | |
|
exp_hf_2604.16659_20260423_094450
|
Benign Fine-Tuning Breaks Safety Alignment in Audio LLMs
Paper ID: hf_2604.16659 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
04-23 09:45 | Success | - | |
|
exp_self.20260423094211.295_20260423_094213
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260423094211.295 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-23 09:43 | Success | - | |
|
exp_pytrain.20260423093908.073_20260423_093910
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-23 09:40 | Success | - | |
|
exp_self.20260423093432.294_20260423_093436
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260423093432.294 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-23 09:35 | Success | - | |
|
exp_self.20260423092722.293_20260423_092723
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260423092722.293 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-23 09:28 | Success | - | |
|
exp_hf_2604.15093_20260423_092417
|
OpenMobile: Building Open Mobile Agents with Task and Trajectory Synthesis
Paper ID: hf_2604.15093 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
04-23 09:25 | Success | - | |
|
exp_self.20260423091752.292_20260423_091753
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260423091752.292 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-23 09:18 | Success | - | |
|
exp_self.20260423090952.291_20260423_090953
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260423090952.291 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-23 09:10 | Success | - | |
|
exp_pytrain.20260423090709.072_20260423_090709
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-23 09:08 | Success | - | |
|
exp_self.20260423090034.290_20260423_090034
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260423090034.290 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-23 09:01 | Success | - | |
|
exp_self.20260423085335.289_20260423_085335
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260423085335.289 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-23 08:54 | Success | - | |
|
exp_self.20260423084612.288_20260423_084612
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260423084612.288 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-23 08:47 | Success | - | |
|
exp_self.20260423083908.287_20260423_083911
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260423083908.287 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-23 08:40 | Success | - | |
|
exp_pytrain.20260423083552.071_20260423_083553
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-23 08:36 | Success | - | |
|
exp_self.20260423083000.286_20260423_083002
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260423083000.286 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-23 08:31 | Success | - | |
|
exp_self.20260423082237.285_20260423_082239
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260423082237.285 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-23 08:23 | Success | - | |
|
exp_self.20260423081454.284_20260423_081455
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260423081454.284 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-23 08:15 | Success | - | |
|
exp_cr_10.3390_agriculture16090927_20260423_081104
|
A Copula-Based Efficiency Effects Stochastic Frontier Model with Application to Government Programs in Thai Rice Farming
Paper ID: cr_10.3390_agriculture16090927 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Rec...
|
04-23 08:12 | Success | - | |
|
exp_self.20260423080734.283_20260423_080737
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260423080734.283 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-23 08:08 | Success | - | |
|
exp_pytrain.20260423080413.070_20260423_080415
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-23 08:05 | Success | - | |
|
exp_self.20260423075737.282_20260423_075738
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260423075737.282 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-23 07:58 | Success | - | |
|
exp_self.20260423075031.281_20260423_075032
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260423075031.281 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-23 07:51 | Success | - | |
|
exp_self.20260423074326.280_20260423_074328
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260423074326.280 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-23 07:44 | Success | - | |
|
exp_self.20260423073556.279_20260423_073558
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260423073556.279 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-23 07:37 | Success | - | |
|
exp_pytrain.20260423073235.069_20260423_073239
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-23 07:33 | Success | - | |
|
exp_self.20260423072616.278_20260423_072618
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260423072616.278 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-23 07:27 | Success | - | |
|
exp_self.20260423071835.277_20260423_071836
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260423071835.277 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-23 07:19 | Success | - | |
|
exp_self.20260423071105.276_20260423_071108
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260423071105.276 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-23 07:12 | Success | - | |
|
exp_self.20260423070343.275_20260423_070344
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260423070343.275 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-23 07:04 | Success | - | |
|
exp_pytrain.20260423070046.068_20260423_070048
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-23 07:01 | Success | - | |
|
exp_self.20260423065413.274_20260423_065415
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260423065413.274 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-23 06:55 | Success | - | |
|
exp_self.20260423064658.273_20260423_064700
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260423064658.273 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-23 06:48 | Success | - | |
|
exp_self.20260423063932.272_20260423_063934
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260423063932.272 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-23 06:40 | Success | - | |
|
exp_self.20260423063213.271_20260423_063217
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260423063213.271 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-23 06:33 | Success | - | |
|
exp_pytrain.20260423062858.067_20260423_062901
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-23 06:30 | Success | - | |
|
exp_self.20260423062219.270_20260423_062223
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260423062219.270 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-23 06:23 | Success | - | |
|
exp_self.20260423061456.269_20260423_061458
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260423061456.269 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-23 06:16 | Success | - | |
|
exp_self.20260423060726.268_20260423_060728
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260423060726.268 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-23 06:08 | Success | - | |
|
exp_self.20260423060002.267_20260423_060005
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260423060002.267 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-23 06:01 | Success | - | |
|
exp_pytrain.20260423055707.066_20260423_055708
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-23 05:58 | Success | - | |
|
exp_self.20260423055031.266_20260423_055034
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260423055031.266 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-23 05:51 | Success | - | |
|
exp_self.20260423054302.265_20260423_054304
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260423054302.265 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-23 05:44 | Success | - | |
|
exp_self.20260423053533.264_20260423_053537
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260423053533.264 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-23 05:36 | Success | - | |
|
exp_self.20260423052806.263_20260423_052807
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260423052806.263 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-23 05:29 | Success | - | |
|
exp_pytrain.20260423052511.065_20260423_052515
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-23 05:26 | Success | - | |
|
exp_self.20260423052020.262_20260423_052022
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260423052020.262 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-23 05:21 | Success | - | |
|
exp_self.20260423051255.261_20260423_051256
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260423051255.261 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-23 05:13 | Success | - | |
|
exp_self.20260423050522.260_20260423_050524
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260423050522.260 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-23 05:06 | Success | - | |
|
exp_self.20260423045750.259_20260423_045753
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260423045750.259 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-23 04:58 | Success | - | |
|
exp_pytrain.20260423045341.064_20260423_045343
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-23 04:54 | Success | - | |
|
exp_self.20260423045026.258_20260423_045028
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260423045026.258 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-23 04:51 | Success | - | |
|
exp_self.20260423044256.257_20260423_044257
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260423044256.257 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-23 04:43 | Success | - | |
|
exp_self.20260423043531.256_20260423_043532
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260423043531.256 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-23 04:36 | Success | - | |
|
exp_gh_Yigtwxx_Awesome-RAG-Production_20260423_043223
|
Yigtwxx/Awesome-RAG-Production
Paper ID: gh_Yigtwxx_Awesome-RAG-Production - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal:...
|
04-23 04:33 | Success | - | |
|
exp_self.20260423042540.255_20260423_042541
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260423042540.255 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-23 04:26 | Success | - | |
|
exp_pytrain.20260423042220.063_20260423_042222
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-23 04:23 | Success | - | |
|
exp_self.20260423041717.254_20260423_041719
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260423041717.254 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-23 04:18 | Success | - | |
|
exp_self.20260423040956.253_20260423_040959
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260423040956.253 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-23 04:11 | Success | - | |
|
exp_self.20260423040245.252_20260423_040246
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260423040245.252 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-23 04:03 | Success | - | |
|
exp_hf_2604.19572_20260423_035835
|
A Self-Evolving Framework for Efficient Terminal Agents via Observational Context Compression
Paper ID: hf_2604.19572 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
04-23 03:59 | Success | - | |
|
exp_self.20260423035352.251_20260423_035354
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260423035352.251 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-23 03:54 | Success | - | |
|
exp_pytrain.20260423035057.062_20260423_035058
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-23 03:52 | Success | - | |
|
exp_self.20260423034435.250_20260423_034436
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260423034435.250 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-23 03:45 | Success | - | |
|
exp_self.20260423033640.249_20260423_033645
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260423033640.249 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-23 03:37 | Success | - | |
|
exp_self.20260423032911.248_20260423_032913
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260423032911.248 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-23 03:30 | Success | - | |
|
exp_self.20260423032141.247_20260423_032143
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260423032141.247 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-23 03:22 | Success | - | |
|
exp_pytrain.20260423031827.061_20260423_031831
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-23 03:19 | Success | - | |
|
exp_self.20260423031146.246_20260423_031148
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260423031146.246 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-23 03:12 | Success | - | |
|
exp_self.20260423030416.245_20260423_030417
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260423030416.245 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-23 03:05 | Success | - | |
|
exp_self.20260423025717.244_20260423_025718
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260423025717.244 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-23 02:58 | Success | - | |
|
exp_self.20260423024957.243_20260423_024958
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260423024957.243 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-23 02:51 | Success | - | |
|
exp_pytrain.20260423024539.060_20260423_024540
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-23 02:46 | Success | - | |
|
exp_self.20260423024150.242_20260423_024150
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260423024150.242 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-23 02:42 | Success | - | |
|
exp_self.20260423023428.241_20260423_023430
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260423023428.241 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-23 02:35 | Success | - | |
|
exp_self.20260423022702.240_20260423_022704
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260423022702.240 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-23 02:28 | Success | - | |
|
exp_hf_2604.18982_20260423_022354
|
SAVOIR: Learning Social Savoir-Faire via Shapley-based Reward Attribution
Paper ID: hf_2604.18982 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
04-23 02:24 | Success | - | |
|
exp_self.20260423021714.239_20260423_021715
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260423021714.239 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-23 02:18 | Success | - | |
|
exp_pytrain.20260423021406.059_20260423_021407
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-23 02:15 | Success | - | |
|
exp_self.20260423020920.238_20260423_020921
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260423020920.238 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-23 02:10 | Success | - | |
|
exp_self.20260423020155.237_20260423_020156
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260423020155.237 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-23 02:02 | Success | - | |
|
exp_self.20260423015433.236_20260423_015435
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260423015433.236 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-23 01:55 | Success | - | |
|
exp_gh_clareembattled960_turboQuantPlayground_20260423_015047
|
clareembattled960/turboQuantPlayground
Paper ID: gh_clareembattled960_turboQuantPlayground - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected...
|
04-23 01:51 | Success | - | |
|
exp_hf_2604.16529_20260423_014821
|
Scaling Test-Time Compute for Agentic Coding
Paper ID: hf_2604.16529 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
04-23 01:49 | Success | - | |
|
exp_self.20260423014555.235_20260423_014558
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260423014555.235 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-23 01:47 | Success | - | |
|
exp_pytrain.20260423014244.058_20260423_014245
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-23 01:43 | Success | - | |
|
exp_self.20260423013807.234_20260423_013809
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260423013807.234 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-23 01:39 | Success | - | |
|
exp_self.20260423013057.233_20260423_013059
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260423013057.233 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-23 01:32 | Success | - | |
|
exp_self.20260423012332.232_20260423_012332
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260423012332.232 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-23 01:24 | Success | - | |
|
exp_self.20260423011620.231_20260423_011623
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260423011620.231 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-23 01:17 | Success | - | |
|
exp_pytrain.20260423011101.057_20260423_011102
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-23 01:12 | Success | - | |
|
exp_self.20260423010849.230_20260423_010851
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260423010849.230 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-23 01:09 | Success | - | |
|
exp_self.20260423010128.229_20260423_010132
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260423010128.229 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-23 01:02 | Success | - | |
|
exp_self.20260423005406.228_20260423_005408
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260423005406.228 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-23 00:55 | Success | - | |
|
exp_self.20260423004706.227_20260423_004709
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260423004706.227 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-23 00:48 | Success | - | |
|
exp_self.20260423003953.226_20260423_003954
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260423003953.226 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-23 00:40 | Success | - | |
| exp_pytrain.20260423003700.056_20260423_003702 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-23 00:38 | Success | - | |
| exp_self.20260423003034.225_20260423_003036 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260423003034.225 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-23 00:31 | Success | - | |
| exp_self.20260423002316.224_20260423_002318 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260423002316.224 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-23 00:24 | Success | - | |
| exp_self.20260423001548.223_20260423_001551 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260423001548.223 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-23 00:16 | Success | - | |
| exp_self.20260423000804.222_20260423_000806 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260423000804.222 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-23 00:09 | Success | - | |
| exp_pytrain.20260423000434.055_20260423_000437 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-23 00:05 | Success | - | |
| exp_self.20260422235756.221_20260422_235757 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260422235756.221 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-22 23:58 | Success | - | |
| exp_self.20260422235029.220_20260422_235030 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260422235029.220 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-22 23:51 | Success | - | |
| exp_self.20260422234309.219_20260422_234311 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260422234309.219 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-22 23:44 | Success | - | |
| exp_self.20260422233552.218_20260422_233553 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260422233552.218 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-22 23:36 | Success | - | |
| exp_pytrain.20260422233247.054_20260422_233250 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-22 23:33 | Success | - | |
| exp_self.20260422232628.217_20260422_232629 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260422232628.217 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-22 23:27 | Success | - | |
| exp_self.20260422231903.216_20260422_231904 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260422231903.216 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-22 23:20 | Success | - | |
| exp_self.20260422231101.215_20260422_231102 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260422231101.215 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-22 23:12 | Success | - | |
| exp_self.20260422230345.214_20260422_230346 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260422230345.214 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-22 23:04 | Success | - | |
| exp_pytrain.20260422230121.053_20260422_230122 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-22 23:02 | Success | - | |
| exp_hf_2604.20570_20260422_225907 | Exploring Spatial Intelligence from a Generative Perspective<br>Paper ID: hf_2604.20570 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark... | 04-22 23:00 | Success | - | |
| exp_self.20260422225604.213_20260422_225604 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260422225604.213 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-22 22:57 | Success | - | |
| exp_self.20260422224850.212_20260422_224850 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260422224850.212 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-22 22:49 | Success | - | |
| exp_self.20260422224136.211_20260422_224136 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260422224136.211 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-22 22:42 | Success | - | |
| exp_self.20260422223420.210_20260422_223420 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260422223420.210 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-22 22:35 | Success | - | |
| exp_pytrain.20260422222837.052_20260422_222837 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-22 22:29 | Success | - | |
| exp_self.20260422222645.209_20260422_222646 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260422222645.209 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-22 22:27 | Success | - | |
| exp_self.20260422221930.208_20260422_221930 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260422221930.208 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-22 22:20 | Success | - | |
| exp_hf_2604.20817_20260422_221402 | Convergent Evolution: How Different Language Models Learn Similar Number Representations<br>Paper ID: hf_2604.20817 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark... | 04-22 22:15 | Success | - | |
| exp_self.20260422221208.207_20260422_221208 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260422221208.207 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-22 22:13 | Success | - | |
| exp_self.20260422220451.206_20260422_220452 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260422220451.206 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-22 22:05 | Success | - | |
| exp_self.20260422215731.205_20260422_215732 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260422215731.205 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-22 21:58 | Success | - | |
| exp_pytrain.20260422215512.051_20260422_215513 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-22 21:56 | Success | - | |
| exp_2604.20842v1_20260422_215259 | SpeechParaling-Bench: A Comprehensive Benchmark for Paralinguistic-Aware Speech Generation<br>Paper ID: 2604.20842v1 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark s... | 04-22 21:54 | Success | - | |
| exp_self.20260422214954.204_20260422_214955 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260422214954.204 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-22 21:50 | Success | - | |
| exp_self.20260422214236.203_20260422_214236 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260422214236.203 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-22 21:43 | Success | - | |
| exp_hf_2604.14932_20260422_213919 | WavAlign: Enhancing Intelligence and Expressiveness in Spoken Dialogue Models via Adaptive Hybrid Post-Training<br>Paper ID: hf_2604.14932 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark... | 04-22 21:40 | Success | - | |
| exp_self.20260422213506.202_20260422_213506 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260422213506.202 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-22 21:36 | Success | - | |
| exp_self.20260422212745.201_20260422_212746 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260422212745.201 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-22 21:28 | Success | - | |
| exp_pytrain.20260422212202.050_20260422_212202 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-22 21:23 | Success | - | |
| exp_self.20260422212011.200_20260422_212011 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260422212011.200 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-22 21:21 | Success | - | |
| exp_gh_SonySemiconductorSolutions_mct-model-optimization_20260422_211728 | SonySemiconductorSolutions/mct-model-optimization<br>Paper ID: gh_SonySemiconductorSolutions_mct-model-optimization - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry.... | 04-22 21:18 | Success | - | |
| exp_hf_2604.19902_20260422_211221 | MMCORE: MultiModal COnnection with Representation Aligned Latent Embeddings<br>Paper ID: hf_2604.19902 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark... | 04-22 21:13 | Success | - | |
| exp_self.20260422211028.199_20260422_211028 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260422211028.199 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-22 21:11 | Success | - | |
| exp_hf_2604.20796_20260422_210713 | LLaDA2.0-Uni: Unifying Multimodal Understanding and Generation with Diffusion Large Language Model<br>Paper ID: hf_2604.20796 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark... | 04-22 21:08 | Success | - | |
| exp_2604.20688v1_20260422_210456 | Storm Surge Modeling, Bias Correction, Graph Neural Networks, Graph Convolution Networks<br>Paper ID: 2604.20688v1 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark s... | 04-22 21:05 | Success | - | |
| exp_self.20260422210257.198_20260422_210258 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260422210257.198 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-22 21:04 | Success | - | |
| exp_2604.20682v1_20260422_205945 | Variance Is Not Importance: Structural Analysis of Transformer Compressibility Across Model Scales<br>Paper ID: 2604.20682v1 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark s... | 04-22 21:00 | Success | - | |
| exp_self.20260422205537.197_20260422_205537 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260422205537.197 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-22 20:56 | Success | - | |
| exp_pytrain.20260422204924.049_20260422_204924 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-22 20:50 | Success | - | |
| exp_self.20260422204733.196_20260422_204734 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260422204733.196 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-22 20:48 | Success | - | |
| exp_self.20260422204019.195_20260422_204020 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260422204019.195 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-22 20:41 | Success | - | |
| exp_self.20260422203255.194_20260422_203256 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260422203255.194 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-22 20:33 | Success | - | |
| exp_self.20260422202539.193_20260422_202540 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260422202539.193 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-22 20:26 | Success | - | |
| exp_self.20260422201825.192_20260422_201825 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260422201825.192 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-22 20:19 | Success | - | |
| exp_pytrain.20260422201606.048_20260422_201607 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-22 20:17 | Success | - | |
| exp_self.20260422200920.191_20260422_200921 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260422200920.191 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-22 20:10 | Success | - | |
| exp_hf_2604.15664_20260422_200607 | Stargazer: A Scalable Model-Fitting Benchmark Environment for AI Agents under Astrophysical Constraints<br>Paper ID: hf_2604.15664 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark... | 04-22 20:07 | Success | - | |
| exp_self.20260422200054.190_20260422_200054 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260422200054.190 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-22 20:01 | Success | - | |
| exp_self.20260422195334.189_20260422_195334 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260422195334.189 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-22 19:54 | Success | - | |
| exp_self.20260422194617.188_20260422_194618 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260422194617.188 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-22 19:47 | Success | - | |
| exp_pytrain.20260422194400.047_20260422_194401 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-22 19:45 | Success | - | |
| exp_self.20260422193715.187_20260422_193716 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260422193715.187 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-22 19:38 | Success | - | |
| exp_self.20260422193001.186_20260422_193001 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260422193001.186 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-22 19:31 | Success | - | |
| exp_self.20260422192241.185_20260422_192241 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260422192241.185 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-22 19:23 | Success | - | |
| exp_self.20260422191522.184_20260422_191523 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260422191522.184 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-22 19:16 | Success | - | |
| exp_pytrain.20260422191154.046_20260422_191155 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-22 19:12 | Success | - | |
| exp_self.20260422190748.183_20260422_190749 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260422190748.183 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-22 19:08 | Success | - | |
| exp_self.20260422190030.182_20260422_190031 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260422190030.182 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-22 19:01 | Success | - | |
| exp_self.20260422185315.181_20260422_185316 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260422185315.181 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-22 18:54 | Success | - | |
| exp_self.20260422184559.180_20260422_184600 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260422184559.180 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-22 18:47 | Success | - | |
| exp_pytrain.20260422184016.045_20260422_184016 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-22 18:41 | Success | - | |
| exp_self.20260422183824.179_20260422_183824 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260422183824.179 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-22 18:39 | Success | - | |
| exp_self.20260422183109.178_20260422_183110 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260422183109.178 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-22 18:32 | Success | - | |
| exp_self.20260422182355.177_20260422_182355 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260422182355.177 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-22 18:24 | Success | - | |
| exp_self.20260422181634.176_20260422_181635 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260422181634.176 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-22 18:17 | Success | - | |
| exp_self.20260422180918.175_20260422_180918 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260422180918.175 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-22 18:10 | Success | - | |
| exp_pytrain.20260422180700.044_20260422_180700 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-22 18:08 | Success | - | |
| exp_self.20260422180015.174_20260422_180016 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260422180015.174 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-22 18:01 | Success | - | |
| exp_self.20260422175300.173_20260422_175300 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260422175300.173 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-22 17:54 | Success | - | |
| exp_self.20260422174540.172_20260422_174540 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260422174540.172 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-22 17:46 | Success | - | |
| exp_self.20260422173822.171_20260422_173822 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260422173822.171 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-22 17:39 | Success | - | |
| exp_pytrain.20260422173454.043_20260422_173454 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-22 17:35 | Success | - | |
| exp_self.20260422173049.170_20260422_173049 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260422173049.170 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-22 17:31 | Success | - | |
| exp_self.20260422172332.169_20260422_172332 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260422172332.169 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-22 17:24 | Success | - | |
| exp_self.20260422171616.168_20260422_171617 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260422171616.168 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-22 17:17 | Success | - | |
| exp_self.20260422170902.167_20260422_170902 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260422170902.167 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-22 17:10 | Success | - | |
| exp_pytrain.20260422170318.042_20260422_170318 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-22 17:04 | Success | - | |
| exp_self.20260422170127.166_20260422_170128 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260422170127.166 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-22 17:02 | Success | - | |
| exp_self.20260422165413.165_20260422_165414 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260422165413.165 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-22 16:55 | Success | - | |
| exp_self.20260422164659.164_20260422_164659 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260422164659.164 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-22 16:48 | Success | - | |
| exp_self.20260422163937.163_20260422_163938 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260422163937.163 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-22 16:40 | Success | - | |
| exp_self.20260422163221.162_20260422_163221 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260422163221.162 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-22 16:33 | Success | - | |
| exp_pytrain.20260422163004.041_20260422_163004 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-22 16:31 | Success | - | |
| exp_self.20260422162311.161_20260422_162312 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260422162311.161 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-22 16:24 | Success | - | |
| exp_self.20260422161552.160_20260422_161552 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260422161552.160 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-22 16:16 | Success | - | |
| exp_self.20260422160834.159_20260422_160835 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260422160834.159 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-22 16:09 | Success | - | |
| exp_self.20260422160119.158_20260422_160120 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260422160119.158 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-22 16:02 | Success | - | |
| exp_pytrain.20260422155849.040_20260422_155849 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-22 15:59 | Success | - | |
| exp_self.20260422155159.157_20260422_155200 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260422155159.157 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-22 15:53 | Success | - | |
| exp_self.20260422154434.156_20260422_154434 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260422154434.156 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-22 15:45 | Success | - | |
| exp_self.20260422153706.155_20260422_153707 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260422153706.155 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-22 15:38 | Success | - | |
| exp_self.20260422152932.154_20260422_152932 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260422152932.154 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-22 15:30 | Success | - | |
| exp_pytrain.20260422152657.039_20260422_152658 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-22 15:28 | Success | - | |
| exp_self.20260422151952.153_20260422_151953 | Self-directed benchmark: ssm_mamba strategy stress test<br>Paper ID: self.20260422151952.153 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-22 15:20 | Success | - | |
|
exp_self.20260422151212.152_20260422_151213
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260422151212.152 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-22 15:13 | Success | - | |
|
exp_self.20260422150437.151_20260422_150438
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260422150437.151 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-22 15:05 | Success | - | |
|
exp_self.20260422145653.150_20260422_145653
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260422145653.150 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-22 14:57 | Success | - | |
|
exp_pytrain.20260422145407.038_20260422_145408
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-22 14:55 | Success | - | |
|
exp_self.20260422144825.149_20260422_144825
|
Self-directed benchmark: ssm_mamba strategy stress test
Paper ID: self.20260422144825.149 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-22 14:49 | Success | - | |
|
exp_self.20260422144045.148_20260422_144045
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260422144045.148 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-22 14:41 | Success | - | |
|
exp_self.20260422143259.147_20260422_143259
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260422143259.147 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-22 14:34 | Success | - | |
|
exp_self.20260422142514.146_20260422_142514
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260422142514.146 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-22 14:26 | Success | - | |
|
exp_pytrain.20260422142241.037_20260422_142241
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-22 14:23 | Success | - | |
|
exp_self.20260422141534.145_20260422_141535
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260422141534.145 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-22 14:16 | Success | - | |
|
exp_self.20260422140759.144_20260422_140759
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260422140759.144 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-22 14:09 | Success | - | |
|
exp_self.20260422140016.143_20260422_140016
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260422140016.143 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-22 14:01 | Success | - | |
|
exp_self.20260422135233.142_20260422_135233
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260422135233.142 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-22 13:53 | Success | - | |
|
exp_pytrain.20260422134959.036_20260422_135000
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-22 13:51 | Success | - | |
|
exp_self.20260422134402.141_20260422_134402
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260422134402.141 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-22 13:45 | Success | - | |
|
exp_self.20260422133619.140_20260422_133619
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260422133619.140 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-22 13:37 | Success | - | |
|
exp_self.20260422132839.139_20260422_132839
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260422132839.139 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-22 13:29 | Success | - | |
|
exp_self.20260422132049.138_20260422_132050
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260422132049.138 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-22 13:21 | Success | - | |
|
exp_pytrain.20260422131809.035_20260422_131809
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-22 13:19 | Success | - | |
|
exp_self.20260422131105.137_20260422_131105
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260422131105.137 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-22 13:12 | Success | - | |
|
exp_self.20260422130326.136_20260422_130326
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260422130326.136 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-22 13:04 | Success | - | |
|
exp_self.20260422125553.135_20260422_125553
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260422125553.135 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-22 12:56 | Success | - | |
|
exp_self.20260422124820.134_20260422_124821
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260422124820.134 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-22 12:49 | Success | - | |
|
exp_pytrain.20260422124545.034_20260422_124545
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-22 12:46 | Success | - | |
|
exp_self.20260422123848.133_20260422_123848
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260422123848.133 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-22 12:39 | Success | - | |
|
exp_self.20260422123105.132_20260422_123105
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260422123105.132 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-22 12:32 | Success | - | |
|
exp_self.20260422122325.131_20260422_122326
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260422122325.131 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-22 12:24 | Success | - | |
|
exp_self.20260422121551.130_20260422_121552
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260422121551.130 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-22 12:16 | Success | - | |
|
exp_pytrain.20260422121322.033_20260422_121322
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-22 12:14 | Success | - | |
|
exp_self.20260422120614.129_20260422_120614
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260422120614.129 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-22 12:07 | Success | - | |
|
exp_self.20260422115837.128_20260422_115837
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260422115837.128 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-22 11:59 | Success | - | |
|
exp_self.20260422115056.127_20260422_115057
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260422115056.127 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-22 11:52 | Success | - | |
|
exp_self.20260422114320.126_20260422_114320
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260422114320.126 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-22 11:44 | Success | - | |
|
exp_pytrain.20260422114049.032_20260422_114049
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-22 11:41 | Success | - | |
|
exp_self.20260422113341.125_20260422_113342
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260422113341.125 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-22 11:34 | Success | - | |
|
exp_self.20260422112605.124_20260422_112606
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260422112605.124 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-22 11:27 | Success | - | |
|
exp_self.20260422111822.123_20260422_111823
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260422111822.123 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-22 11:19 | Success | - | |
|
exp_self.20260422111041.122_20260422_111041
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260422111041.122 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-22 11:11 | Success | - | |
|
exp_pytrain.20260422110808.031_20260422_110808
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-22 11:09 | Success | - | |
|
exp_self.20260422110056.121_20260422_110057
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260422110056.121 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-22 11:02 | Success | - | |
|
exp_self.20260422105318.120_20260422_105318
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260422105318.120 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-22 10:54 | Success | - | |
|
exp_self.20260422104541.119_20260422_104541
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260422104541.119 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-22 10:46 | Success | - | |
|
exp_self.20260422103758.118_20260422_103759
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260422103758.118 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-22 10:39 | Success | - | |
|
exp_pytrain.20260422103526.030_20260422_103526
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-22 10:36 | Success | - | |
|
exp_self.20260422103001.117_20260422_103002
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260422103001.117 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-22 10:31 | Success | - | |
|
exp_self.20260422102215.116_20260422_102215
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260422102215.116 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-22 10:23 | Success | - | |
|
exp_self.20260422101433.115_20260422_101434
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260422101433.115 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-22 10:15 | Success | - | |
|
exp_self.20260422100645.114_20260422_100646
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260422100645.114 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-22 10:07 | Success | - | |
|
exp_pytrain.20260422100346.029_20260422_100346
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-22 10:04 | Success | - | |
|
exp_self.20260422095726.113_20260422_095726
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260422095726.113 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-22 09:58 | Success | - | |
|
exp_self.20260422094951.112_20260422_094951
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260422094951.112 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-22 09:50 | Success | - | |
|
exp_self.20260422094215.111_20260422_094216
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260422094215.111 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-22 09:43 | Success | - | |
|
exp_self.20260422093442.110_20260422_093443
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260422093442.110 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-22 09:35 | Success | - | |
|
exp_pytrain.20260422093216.028_20260422_093216
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-22 09:33 | Success | - | |
|
exp_self.20260422092508.109_20260422_092509
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260422092508.109 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-22 09:26 | Success | - | |
|
exp_self.20260422091733.108_20260422_091734
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260422091733.108 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-22 09:18 | Success | - | |
|
exp_self.20260422091014.107_20260422_091015
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260422091014.107 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-22 09:11 | Success | - | |
|
exp_self.20260422090235.106_20260422_090236
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260422090235.106 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-22 09:03 | Success | - | |
|
exp_pytrain.20260422085930.027_20260422_085931
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-22 09:00 | Success | - | |
|
exp_self.20260422085302.105_20260422_085303
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260422085302.105 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-22 08:54 | Success | - | |
|
exp_self.20260422084535.104_20260422_084537
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260422084535.104 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-22 08:46 | Success | - | |
|
exp_hf_2604.19642_20260422_084027
|
Micro Language Models Enable Instant Responses
Paper ID: hf_2604.19642 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
04-22 08:41 | Success | - | |
|
exp_self.20260422083758.103_20260422_083759
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260422083758.103 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-22 08:39 | Success | - | |
|
exp_self.20260422083025.102_20260422_083028
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260422083025.102 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-22 08:31 | Success | - | |
|
exp_pytrain.20260422082712.026_20260422_082714
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-22 08:28 | Success | - | |
|
exp_self.20260422082033.101_20260422_082034
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260422082033.101 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-22 08:21 | Success | - | |
|
exp_self.20260422081307.100_20260422_081309
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260422081307.100 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-22 08:14 | Success | - | |
|
exp_self.20260422080529.099_20260422_080531
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260422080529.099 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-22 08:06 | Success | - | |
|
exp_self.20260422075750.098_20260422_075753
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260422075750.098 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-22 07:58 | Success | - | |
|
exp_pytrain.20260422075430.025_20260422_075432
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-22 07:55 | Success | - | |
|
exp_self.20260422074938.097_20260422_074940
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260422074938.097 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-22 07:50 | Success | - | |
|
exp_self.20260422074224.096_20260422_074225
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260422074224.096 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-22 07:43 | Success | - | |
|
exp_self.20260422073451.095_20260422_073451
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260422073451.095 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-22 07:35 | Success | - | |
|
exp_self.20260422072630.094_20260422_072631
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260422072630.094 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-22 07:27 | Success | - | |
|
exp_pytrain.20260422072220.024_20260422_072221
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-22 07:23 | Success | - | |
|
exp_self.20260422071830.093_20260422_071830
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260422071830.093 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-22 07:19 | Success | - | |
|
exp_self.20260422071026.092_20260422_071028
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260422071026.092 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-22 07:11 | Success | - | |
|
exp_self.20260422070221.091_20260422_070222
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260422070221.091 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-22 07:03 | Success | - | |
|
exp_self.20260422065440.090_20260422_065442
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260422065440.090 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-22 06:55 | Success | - | |
|
exp_pytrain.20260422065027.023_20260422_065028
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-22 06:51 | Success | - | |
|
exp_self.20260422064653.089_20260422_064654
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260422064653.089 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-22 06:47 | Success | - | |
|
exp_self.20260422063941.088_20260422_063943
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260422063941.088 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-22 06:40 | Success | - | |
|
exp_self.20260422063144.087_20260422_063144
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260422063144.087 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-22 06:32 | Success | - | |
|
exp_self.20260422062348.086_20260422_062348
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260422062348.086 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-22 06:24 | Success | - | |
|
exp_pytrain.20260422061859.022_20260422_061900
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-22 06:20 | Success | - | |
|
exp_self.20260422061703.085_20260422_061703
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260422061703.085 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-22 06:18 | Success | - | |
|
exp_self.20260422061020.084_20260422_061020
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260422061020.084 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-22 06:11 | Success | - | |
|
exp_self.20260422060328.083_20260422_060329
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260422060328.083 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-22 06:04 | Success | - | |
|
exp_self.20260422055647.082_20260422_055647
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260422055647.082 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-22 05:57 | Success | - | |
|
exp_self.20260422054905.081_20260422_054907
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260422054905.081 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-22 05:50 | Success | - | |
|
exp_pytrain.20260422054614.021_20260422_054614
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-22 05:47 | Success | - | |
|
exp_self.20260422053952.080_20260422_053954
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260422053952.080 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-22 05:40 | Success | - | |
|
exp_self.20260422053254.079_20260422_053254
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260422053254.079 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-22 05:33 | Success | - | |
|
exp_self.20260422052528.078_20260422_052529
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260422052528.078 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-22 05:26 | Success | - | |
|
exp_self.20260422051732.077_20260422_051732
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260422051732.077 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-22 05:18 | Success | - | |
|
exp_pytrain.20260422051430.020_20260422_051432
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-22 05:15 | Success | - | |
|
exp_self.20260422050955.076_20260422_050956
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260422050955.076 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-22 05:10 | Success | - | |
|
exp_self.20260422050136.075_20260422_050139
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260422050136.075 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-22 05:02 | Success | - | |
|
exp_self.20260422045424.074_20260422_045424
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260422045424.074 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-22 04:55 | Success | - | |
|
exp_self.20260422044644.073_20260422_044644
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260422044644.073 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-22 04:47 | Success | - | |
|
exp_pytrain.20260422044237.019_20260422_044238
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-22 04:43 | Success | - | |
|
exp_self.20260422043907.072_20260422_043907
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260422043907.072 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-22 04:40 | Success | - | |
|
exp_self.20260422043208.071_20260422_043208
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260422043208.071 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-22 04:33 | Success | - | |
|
exp_self.20260422042443.070_20260422_042444
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260422042443.070 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-22 04:25 | Success | - | |
|
exp_self.20260422041648.069_20260422_041657
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260422041648.069 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-22 04:17 | Success | - | |
|
exp_pytrain.20260422041052.018_20260422_041053
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-22 04:11 | Success | - | |
|
exp_self.20260422040739.068_20260422_040742
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260422040739.068 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-22 04:08 | Success | - | |
|
exp_self.20260422035911.067_20260422_035914
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260422035911.067 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-22 04:00 | Success | - | |
|
exp_self.20260422035056.066_20260422_035059
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260422035056.066 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-22 03:52 | Success | - | |
|
exp_self.20260422034159.065_20260422_034159
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260422034159.065 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-22 03:43 | Success | - | |
|
exp_pytrain.20260422033659.017_20260422_033659
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-22 03:38 | Success | - | |
|
exp_self.20260422033428.064_20260422_033432
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260422033428.064 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-22 03:35 | Success | - | |
|
exp_self.20260422032625.063_20260422_032629
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260422032625.063 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-22 03:27 | Success | - | |
|
exp_self.20260422031746.062_20260422_031746
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260422031746.062 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-22 03:18 | Success | - | |
|
exp_self.20260422030953.061_20260422_030953
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260422030953.061 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-22 03:10 | Success | - | |
|
exp_pytrain.20260422030453.016_20260422_030453
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-22 03:05 | Success | - | |
|
exp_self.20260422030232.060_20260422_030232
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260422030232.060 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-22 03:03 | Success | - | |
|
exp_hf_2604.19254_20260422_025934
|
ShadowPEFT: Shadow Network for Parameter-Efficient Fine-Tuning
Paper ID: hf_2604.19254 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
04-22 03:00 | Success | - | |
|
exp_self.20260422025204.059_20260422_025205
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260422025204.059 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-22 02:53 | Success | - | |
|
exp_self.20260422024442.058_20260422_024444
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260422024442.058 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-22 02:45 | Success | - | |
|
exp_cr_10.1017_rsm.2026.10094_20260422_024056
|
Large language model-based paper classification framework with key-insight extraction and confidence-weighted voting
Paper ID: cr_10.1017_rsm.2026.10094 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovere...
|
04-22 02:41 | Success | - | |
|
exp_self.20260422023721.057_20260422_023721
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260422023721.057 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-22 02:38 | Success | - | |
|
exp_pytrain.20260422023318.015_20260422_023319
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-22 02:34 | Success | - | |
|
exp_self.20260422022949.056_20260422_022950
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260422022949.056 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-22 02:30 | Success | - | |
|
exp_self.20260422022216.055_20260422_022217
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260422022216.055 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-22 02:23 | Success | - | |
|
exp_self.20260422021450.054_20260422_021451
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260422021450.054 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-22 02:15 | Success | - | |
|
exp_self.20260422020719.053_20260422_020720
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260422020719.053 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-22 02:08 | Success | - | |
|
exp_hf_2604.17982_20260422_020405
|
Mitigating Multimodal Hallucination via Phase-wise Self-reward
Paper ID: hf_2604.17982 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
04-22 02:05 | Success | - | |
|
exp_pytrain.20260422020155.014_20260422_020156
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-22 02:02 | Success | - | |
|
exp_hf_2604.16913_20260422_015746
|
The Cognitive Penalty: Ablating System 1 and System 2 Reasoning in Edge-Native SLMs for Decentralized Consensus
Paper ID: hf_2604.16913 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
04-22 01:58 | Success | - | |
|
exp_self.20260422015527.052_20260422_015529
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260422015527.052 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-22 01:56 | Success | - | |
|
exp_hf_2604.16054_20260422_015122
|
Mind's Eye: A Benchmark of Visual Abstraction, Transformation and Composition for Multimodal LLMs
Paper ID: hf_2604.16054 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
04-22 01:52 | Success | - | |
|
exp_self.20260422014802.051_20260422_014802
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260422014802.051 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-22 01:49 | Success | - | |
|
exp_self.20260422014102.050_20260422_014104
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260422014102.050 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-22 01:42 | Success | - | |
|
exp_self.20260422013313.049_20260422_013314
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260422013313.049 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-22 01:34 | Success | - | |
|
exp_pytrain.20260422013013.013_20260422_013014
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-22 01:31 | Success | - | |
|
exp_self.20260422012521.048_20260422_012521
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260422012521.048 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-22 01:26 | Success | - | |
|
exp_self.20260422011831.047_20260422_011832
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260422011831.047 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-22 01:19 | Success | - | |
|
exp_self.20260422011104.046_20260422_011106
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260422011104.046 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-22 01:12 | Success | - | |
|
exp_self.20260422010233.045_20260422_010234
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260422010233.045 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-22 01:03 | Success | - | |
|
exp_pytrain.20260422005828.012_20260422_005830
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-22 00:59 | Success | - | |
|
exp_self.20260422005526.044_20260422_005526
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260422005526.044 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-22 00:56 | Success | - | |
|
exp_self.20260422004705.043_20260422_004705
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260422004705.043 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-22 00:48 | Success | - | |
|
exp_cr_10.3389_fmed.2026.1819087_20260422_004148
|
Examiner stratification reveals clinically relevant variability in large language model answers to endodontic patient qu...
Paper ID: cr_10.3389_fmed.2026.1819087 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recov...
|
04-22 00:42 | Success | - | |
|
exp_self.20260422003907.042_20260422_003907
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260422003907.042 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-22 00:40 | Success | - | |
|
exp_self.20260422003216.041_20260422_003218
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260422003216.041 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-22 00:33 | Success | - | |
|
exp_pytrain.20260422002657.011_20260422_002700
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-22 00:28 | Success | - | |
|
exp_self.20260422002421.040_20260422_002422
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260422002421.040 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-22 00:25 | Success | - | |
|
exp_gh_NVIDIA_TransformerEngine_20260422_002107
|
NVIDIA/TransformerEngine
Paper ID: gh_NVIDIA_TransformerEngine - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recove...
|
04-22 00:22 | Success | - | |
|
exp_self.20260422001330.039_20260422_001331
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260422001330.039 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-22 00:14 | Success | - | |
|
exp_self.20260422000620.038_20260422_000621
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260422000620.038 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-22 00:07 | Success | - | |
|
exp_hf_2604.19747_20260422_000224
|
AnyRecon: Arbitrary-View 3D Reconstruction with Video Diffusion Model
Paper ID: hf_2604.19747 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
04-22 00:03 | Success | - | |
|
exp_self.20260421235647.037_20260421_235649
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260421235647.037 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-21 23:57 | Success | - | |
|
exp_pytrain.20260421235222.010_20260421_235223
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-21 23:53 | Success | - | |
|
exp_self.20260421234843.036_20260421_234845
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260421234843.036 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-21 23:49 | Success | - | |
|
exp_self.20260421234116.035_20260421_234116
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260421234116.035 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-21 23:42 | Success | - | |
|
exp_self.20260421233419.034_20260421_233420
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260421233419.034 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-21 23:35 | Success | - | |
|
exp_self.20260421232632.033_20260421_232633
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260421232632.033 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-21 23:27 | Success | - | |
|
exp_pytrain.20260421232058.009_20260421_232059
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-21 23:22 | Success | - | |
|
exp_self.20260421231848.032_20260421_231849
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260421231848.032 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-21 23:19 | Success | - | |
|
exp_self.20260421231040.031_20260421_231040
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260421231040.031 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-21 23:11 | Success | - | |
|
exp_self.20260421230233.030_20260421_230241
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260421230233.030 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-21 23:03 | Success | - | |
|
exp_self.20260421225413.029_20260421_225415
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260421225413.029 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-21 22:55 | Success | - | |
|
exp_pytrain.20260421224817.008_20260421_224817
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-21 22:49 | Success | - | |
|
exp_self.20260421224605.028_20260421_224606
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260421224605.028 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-21 22:47 | Success | - | |
|
exp_self.20260421223808.027_20260421_223808
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260421223808.027 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-21 22:39 | Success | - | |
|
exp_self.20260421223019.026_20260421_223020
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260421223019.026 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-21 22:31 | Success | - | |
|
exp_self.20260421222220.025_20260421_222220
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260421222220.025 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-21 22:23 | Success | - | |
|
exp_hf_2604.17397_20260421_221911
|
Speculative Decoding for Autoregressive Video Generation
Paper ID: hf_2604.17397 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
04-21 22:20 | Success | - | |
|
exp_pytrain.20260421221652.007_20260421_221654
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-21 22:17 | Success | - | |
|
exp_self.20260421221404.024_20260421_221405
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260421221404.024 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-21 22:15 | Success | - | |
|
exp_self.20260421220628.023_20260421_220629
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260421220628.023 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-21 22:07 | Success | - | |
|
exp_hf_2604.15706_20260421_220044
|
Target-Oriented Pretraining Data Selection via Neuron-Activated Graph
Paper ID: hf_2604.15706 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
04-21 22:01 | Success | - | |
|
exp_self.20260421215833.022_20260421_215834
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260421215833.022 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-21 21:59 | Success | - | |
|
exp_2604.19748v1_20260421_215343
|
Tstars-Tryon 1.0: Robust and Realistic Virtual Try-On for Diverse Fashion Items
Paper ID: 2604.19748v1 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark s...
|
04-21 21:54 | Success | - | |
|
exp_self.20260421215046.021_20260421_215047
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260421215046.021 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-21 21:51 | Success | - | |
|
exp_2604.19747v1_20260421_214717
|
AnyRecon: Arbitrary-View 3D Reconstruction with Video Diffusion Model
Paper ID: 2604.19747v1 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark s...
|
04-21 21:48 | Success | - | |
|
exp_pytrain.20260421214440.006_20260421_214441
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-21 21:45 | Success | - | |
|
exp_self.20260421214155.020_20260421_214155
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260421214155.020 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-21 21:42 | Success | - | |
|
exp_hf_2604.19636_20260421_213821
|
CoInteract: Physically-Consistent Human-Object Interaction Video Synthesis via Spatially-Structured Co-Generation
Paper ID: hf_2604.19636 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
04-21 21:39 | Success | - | |
|
exp_self.20260421213314.019_20260421_213314
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260421213314.019 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-21 21:34 | Success | - | |
|
exp_self.20260421212522.018_20260421_212524
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260421212522.018 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-21 21:26 | Success | - | |
|
exp_hf_2604.19748_20260421_212050
|
Tstars-Tryon 1.0: Robust and Realistic Virtual Try-On for Diverse Fashion Items
Paper ID: hf_2604.19748 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
04-21 21:21 | Success | - | |
|
exp_self.20260421211832.017_20260421_211833
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260421211832.017 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-21 21:19 | Success | - | |
|
exp_hf_2604.19550_20260421_211506
|
LoopCTR: Unlocking the Loop Scaling Power for Click-Through Rate Prediction
Paper ID: hf_2604.19550 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
04-21 21:16 | Success | - | |
|
exp_pytrain.20260421211245.005_20260421_211247
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-21 21:13 | Success | - | |
|
exp_self.20260421211002.016_20260421_211003
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260421211002.016 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-21 21:11 | Success | - | |
|
exp_self.20260421210155.015_20260421_210158
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260421210155.015 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-21 21:03 | Success | - | |
|
exp_self.20260421205346.014_20260421_205347
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260421205346.014 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-21 20:54 | Success | - | |
|
exp_2604.19473v1_20260421_204906
|
TS-Attn: Temporal-wise Separable Attention for Multi-Event Video Generation
Paper ID: 2604.19473v1 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark s...
|
04-21 20:50 | Success | - | |
|
exp_self.20260421204635.013_20260421_204635
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260421204635.013 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-21 20:47 | Success | - | |
|
exp_2604.19464v1_20260421_204319
|
LePREC: Reasoning as Classification over Structured Factors for Assessing Relevance of Legal Issues
Paper ID: 2604.19464v1 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark s...
|
04-21 20:44 | Success | - | |
|
exp_pytrain.20260421204041.004_20260421_204041
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-21 20:41 | Success | - | |
|
exp_self.20260421203801.012_20260421_203803
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260421203801.012 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-21 20:39 | Success | - | |
|
exp_self.20260421202857.011_20260421_202858
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260421202857.011 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-21 20:30 | Success | - | |
|
exp_self.20260421202106.010_20260421_202106
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260421202106.010 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-21 20:22 | Success | - | |
|
exp_self.20260421201203.009_20260421_201205
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260421201203.009 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-21 20:13 | Success | - | |
|
exp_pytrain.20260421200910.003_20260421_200912
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-21 20:10 | Success | - | |
|
exp_self.20260421200155.008_20260421_200156
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260421200155.008 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-21 20:02 | Success | - | |
|
exp_self.20260421195422.007_20260421_195422
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260421195422.007 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-21 19:55 | Success | - | |
|
exp_self.20260421194716.006_20260421_194718
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260421194716.006 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-21 19:48 | Success | - | |
|
exp_self.20260421193916.005_20260421_193917
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260421193916.005 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-21 19:40 | Success | - | |
|
exp_pytrain.20260421193556.002_20260421_193559
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-21 19:37 | Success | - | |
|
exp_self.20260421192933.004_20260421_192933
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260421192933.004 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-21 19:30 | Success | - | |
|
exp_self.20260421192239.003_20260421_192240
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260421192239.003 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-21 19:23 | Success | - | |
|
exp_self.20260421191504.002_20260421_191507
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260421191504.002 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-21 19:16 | Success | - | |
|
exp_self.20260421190652.001_20260421_190652
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260421190652.001 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-21 19:07 | Success | - | |
|
exp_pytrain.20260421190343.001_20260421_190347
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-21 19:04 | Success | - | |
|
exp_self.20260421182628.001_20260421_182630
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260421182628.001 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-21 18:27 | Success | - | |
|
exp_pytrain.20260421182329.001_20260421_182332
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-21 18:24 | Success | - | |
|
exp_self.20260421181542.194_20260421_181544
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260421181542.194 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-21 18:16 | Success | - | |
|
exp_self.20260421180805.193_20260421_180810
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260421180805.193 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-21 18:09 | Success | - | |
|
exp_self.20260421180005.192_20260421_180009
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260421180005.192 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-21 18:01 | Success | - | |
|
exp_pytrain.20260421175600.049_20260421_175602
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-21 17:57 | Success | - | |
|
exp_self.20260421175129.191_20260421_175129
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260421175129.191 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-21 17:52 | Success | - | |
|
exp_self.20260421174330.190_20260421_174331
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260421174330.190 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-21 17:44 | Success | - | |
|
exp_self.20260421173637.189_20260421_173637
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260421173637.189 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-21 17:37 | Success | - | |
|
exp_self.20260421172939.188_20260421_172949
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260421172939.188 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-21 17:30 | Success | - | |
|
exp_pytrain.20260421172439.048_20260421_172440
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-21 17:25 | Success | - | |
|
exp_self.20260421172230.187_20260421_172238
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260421172230.187 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-21 17:23 | Success | - | |
|
exp_self.20260421171334.186_20260421_171337
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260421171334.186 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-21 17:14 | Success | - | |
|
exp_self.20260421170536.185_20260421_170537
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260421170536.185 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-21 17:06 | Success | - | |
|
exp_self.20260421165713.184_20260421_165717
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260421165713.184 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-21 16:58 | Success | - | |
|
exp_pytrain.20260421165115.047_20260421_165115
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-21 16:52 | Success | - | |
|
exp_self.20260421164857.183_20260421_164858
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260421164857.183 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-21 16:50 | Success | - | |
|
exp_self.20260421164218.182_20260421_164219
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260421164218.182 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-21 16:43 | Success | - | |
|
exp_self.20260421163425.181_20260421_163427
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260421163425.181 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-21 16:35 | Success | - | |
|
exp_self.20260421162533.180_20260421_162535
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260421162533.180 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-21 16:26 | Success | - | |
|
exp_pytrain.20260421161954.046_20260421_161955
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-21 16:20 | Success | - | |
|
exp_self.20260421161737.179_20260421_161738
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260421161737.179 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-21 16:18 | Success | - | |
|
exp_self.20260421161016.178_20260421_161018
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260421161016.178 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-21 16:11 | Success | - | |
|
exp_self.20260421160220.177_20260421_160223
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260421160220.177 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-21 16:03 | Success | - | |
|
exp_cr_10.2196_89540_20260421_155658
|
Classifying American Society of Anesthesiologists Physical Status With a Low-Rank–Adapted Large Language Model: Developm...
Paper ID: cr_10.2196_89540 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchma...
|
04-21 15:58 | Success | - | |
|
exp_self.20260421155152.176_20260421_155154
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260421155152.176 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-21 15:52 | Success | - | |
|
exp_pytrain.20260421154821.045_20260421_154821
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-21 15:49 | Success | - | |
|
exp_self.20260421154216.175_20260421_154218
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260421154216.175 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-21 15:43 | Success | - | |
|
exp_hf_2604.18396_20260421_153805
|
River-LLM: Large Language Model Seamless Exit Based on KV Share
Paper ID: hf_2604.18396 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
04-21 15:39 | Success | - | |
|
exp_self.20260421153413.174_20260421_153415
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260421153413.174 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-21 15:35 | Success | - | |
|
exp_self.20260421152648.173_20260421_152648
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260421152648.173 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-21 15:27 | Success | - | |
|
exp_self.20260421151922.172_20260421_151923
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260421151922.172 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-21 15:20 | Success | - | |
|
exp_pytrain.20260421151557.044_20260421_151558
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-21 15:17 | Success | - | |
|
exp_self.20260421151144.171_20260421_151146
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260421151144.171 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-21 15:12 | Success | - | |
|
exp_self.20260421145414.170_20260421_145414
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260421145414.170 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-21 14:55 | Success | - | |
|
exp_hf_2604.18267_20260421_144953
|
MARCO: Navigating the Unseen Space of Semantic Correspondence
Paper ID: hf_2604.18267 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
04-21 14:50 | Success | - | |
|
exp_self.20260421144726.169_20260421_144730
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260421144726.169 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-21 14:48 | Success | - | |
|
exp_pytrain.20260421144404.043_20260421_144406
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-21 14:45 | Success | - | |
|
exp_self.20260421143728.168_20260421_143730
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260421143728.168 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-21 14:38 | Success | - | |
|
exp_self.20260421143009.167_20260421_143011
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260421143009.167 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-21 14:31 | Success | - | |
|
exp_self.20260421142233.166_20260421_142235
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260421142233.166 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-21 14:23 | Success | - | |
|
exp_self.20260421141407.165_20260421_141407
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260421141407.165 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-21 14:15 | Success | - | |
|
exp_pytrain.20260421141150.042_20260421_141150
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-21 14:12 | Success | - | |
|
exp_self.20260421140451.164_20260421_140451
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260421140451.164 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-21 14:05 | Success | - | |
|
exp_self.20260421135734.163_20260421_135734
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260421135734.163 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-21 13:58 | Success | - | |
|
exp_self.20260421135020.162_20260421_135020
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260421135020.162 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-21 13:51 | Success | - | |
|
exp_self.20260421134257.161_20260421_134258
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260421134257.161 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-21 13:44 | Success | - | |
|
exp_pytrain.20260421134032.041_20260421_134032
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-21 13:41 | Success | - | |
|
exp_self.20260421133446.160_20260421_133446
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260421133446.160 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-21 13:35 | Success | - | |
|
exp_self.20260421132702.159_20260421_132702
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260421132702.159 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-21 13:28 | Success | - | |
|
exp_self.20260421131855.158_20260421_131855
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260421131855.158 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-21 13:19 | Success | - | |
|
exp_self.20260421131135.157_20260421_131135
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260421131135.157 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-21 13:12 | Success | - | |
|
exp_pytrain.20260421130917.040_20260421_130918
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-21 13:10 | Success | - | |
|
exp_self.20260421130431.156_20260421_130432
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260421130431.156 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-21 13:05 | Success | - | |
|
exp_self.20260421125626.155_20260421_125626
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260421125626.155 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-21 12:57 | Success | - | |
|
exp_self.20260421124821.154_20260421_124822
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260421124821.154 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-21 12:49 | Success | - | |
|
exp_self.20260421124017.153_20260421_124017
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260421124017.153 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-21 12:41 | Success | - | |
|
exp_pytrain.20260421123723.039_20260421_123723
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-21 12:38 | Success | - | |
|
exp_self.20260421123104.152_20260421_123104
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260421123104.152 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-21 12:32 | Success | - | |
|
exp_self.20260421122300.151_20260421_122300
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260421122300.151 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-21 12:24 | Success | - | |
|
exp_self.20260421121456.150_20260421_121456
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260421121456.150 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-21 12:16 | Success | - | |
|
exp_self.20260421120803.149_20260421_120803
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260421120803.149 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-21 12:09 | Success | - | |
|
exp_pytrain.20260421120510.038_20260421_120510
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-21 12:06 | Success | - | |
| exp_self.20260421115734.148_20260421_115734 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260421115734.148 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-21 11:58 | Success | - | |
| exp_self.20260421114931.147_20260421_114932 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260421114931.147 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-21 11:50 | Success | - | |
| exp_self.20260421114235.146_20260421_114236 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260421114235.146 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-21 11:43 | Success | - | |
| exp_self.20260421113535.145_20260421_113536 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260421113535.145 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-21 11:36 | Success | - | |
| exp_pytrain.20260421113312.037_20260421_113312 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-21 11:34 | Success | - | |
| exp_self.20260421112654.144_20260421_112654 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260421112654.144 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-21 11:27 | Success | - | |
| exp_self.20260421111907.143_20260421_111907 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260421111907.143 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-21 11:20 | Success | - | |
| exp_self.20260421111126.142_20260421_111126 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260421111126.142 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-21 11:12 | Success | - | |
| exp_self.20260421110406.141_20260421_110406 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260421110406.141 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-21 11:05 | Success | - | |
| exp_pytrain.20260421110142.036_20260421_110142 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-21 11:02 | Success | - | |
| exp_hf_2604.16498_20260421_105902 | Forge-UGC: FX optimization and register-graph engine for universal graph compiler<br>Paper ID: hf_2604.16498 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark... | 04-21 11:00 | Success | - | |
| exp_self.20260421105428.140_20260421_105429 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260421105428.140 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-21 10:55 | Success | - | |
| exp_self.20260421104703.139_20260421_104703 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260421104703.139 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-21 10:48 | Success | - | |
| exp_hf_2604.16830_20260421_104327 | The Illusion of Certainty: Decoupling Capability and Calibration in On-Policy Distillation<br>Paper ID: hf_2604.16830 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark... | 04-21 10:44 | Success | - | |
| exp_self.20260421103848.138_20260421_103849 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260421103848.138 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-21 10:39 | Success | - | |
| exp_self.20260421103122.137_20260421_103122 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260421103122.137 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-21 10:32 | Success | - | |
| exp_pytrain.20260421102858.035_20260421_102858 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-21 10:30 | Success | - | |
| exp_hf_2602.15143_20260421_102618 | Protecting Language Models Against Unauthorized Distillation through Trace Rewriting<br>Paper ID: hf_2602.15143 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark... | 04-21 10:27 | Success | - | |
| exp_self.20260421102205.136_20260421_102205 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260421102205.136 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-21 10:23 | Success | - | |
| exp_cr_10.1038_s41598-026-48666-1_20260421_101857 | Multimodal survival analysis of glioblastoma using whole-slide histopathology, gene expression, clinical variables and l...<br>Paper ID: cr_10.1038_s41598-026-48666-1 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Reco... | 04-21 10:19 | Success | - | |
| exp_self.20260421101334.135_20260421_101334 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260421101334.135 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-21 10:14 | Success | - | |
| exp_self.20260421100611.134_20260421_100612 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260421100611.134 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-21 10:07 | Success | - | |
| exp_self.20260421095839.133_20260421_095839 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260421095839.133 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-21 09:59 | Success | - | |
| exp_pytrain.20260421095618.034_20260421_095619 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-21 09:57 | Success | - | |
| exp_self.20260421095135.132_20260421_095135 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260421095135.132 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-21 09:52 | Success | - | |
| exp_self.20260421094351.131_20260421_094352 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260421094351.131 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-21 09:44 | Success | - | |
| exp_hf_2511.10262_20260421_093956 | MTR-DuplexBench: Towards a Comprehensive Evaluation of Multi-Round Conversations for Full-Duplex Speech Language Models<br>Paper ID: hf_2511.10262 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark... | 04-21 09:41 | Success | - | |
| exp_self.20260421093442.130_20260421_093442 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260421093442.130 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-21 09:35 | Success | - | |
| exp_hf_2604.15710_20260421_093126 | VoxMind: An End-to-End Agentic Spoken Dialogue System<br>Paper ID: hf_2604.15710 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark... | 04-21 09:32 | Success | - | |
| exp_self.20260421092701.129_20260421_092701 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260421092701.129 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-21 09:28 | Success | - | |
| exp_pytrain.20260421092444.033_20260421_092444 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-21 09:25 | Success | - | |
| exp_self.20260421091745.128_20260421_091745 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260421091745.128 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-21 09:18 | Success | - | |
| exp_self.20260421091025.127_20260421_091025 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260421091025.127 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-21 09:11 | Success | - | |
| exp_hf_2604.16576_20260421_090446 | On the Robustness of LLM-Based Dense Retrievers: A Systematic Analysis of Generalizability and Stability<br>Paper ID: hf_2604.16576 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark... | 04-21 09:05 | Success | - | |
| exp_self.20260421090250.126_20260421_090251 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260421090250.126 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-21 09:03 | Success | - | |
| exp_self.20260421085450.125_20260421_085450 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260421085450.125 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-21 08:55 | Success | - | |
| exp_pytrain.20260421085149.032_20260421_085149 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-21 08:52 | Success | - | |
| exp_hf_2604.17091_20260421_084646 | GenericAgent: A Token-Efficient Self-Evolving LLM Agent via Contextual Information Density Maximization (V1.0)<br>Paper ID: hf_2604.17091 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark... | 04-21 08:47 | Success | - | |
| exp_self.20260421084450.124_20260421_084451 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260421084450.124 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-21 08:45 | Success | - | |
| exp_self.20260421083730.123_20260421_083731 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260421083730.123 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-21 08:38 | Success | - | |
| exp_self.20260421083001.122_20260421_083001 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260421083001.122 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-21 08:31 | Success | - | |
| exp_self.20260421082317.121_20260421_082318 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260421082317.121 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-21 08:24 | Success | - | |
| exp_pytrain.20260421082020.031_20260421_082021 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-21 08:21 | Success | - | |
| exp_self.20260421081324.120_20260421_081324 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260421081324.120 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-21 08:14 | Success | - | |
| exp_self.20260421080553.119_20260421_080553 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260421080553.119 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-21 08:06 | Success | - | |
| exp_self.20260421075818.118_20260421_075818 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260421075818.118 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-21 07:59 | Success | - | |
| exp_self.20260421075045.117_20260421_075045 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260421075045.117 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-21 07:51 | Success | - | |
| exp_pytrain.20260421074819.030_20260421_074820 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-21 07:49 | Success | - | |
| exp_self.20260421074125.116_20260421_074126 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260421074125.116 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-21 07:42 | Success | - | |
| exp_self.20260421073352.115_20260421_073352 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260421073352.115 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-21 07:34 | Success | - | |
| exp_self.20260421072621.114_20260421_072622 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260421072621.114 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-21 07:27 | Success | - | |
| exp_self.20260421071851.113_20260421_071852 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260421071851.113 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-21 07:19 | Success | - | |
| exp_pytrain.20260421071627.029_20260421_071627 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-21 07:17 | Success | - | |
| exp_self.20260421070937.112_20260421_070938 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260421070937.112 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-21 07:10 | Success | - | |
| exp_self.20260421070209.111_20260421_070210 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260421070209.111 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-21 07:03 | Success | - | |
| exp_self.20260421065443.110_20260421_065444 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260421065443.110 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-21 06:55 | Success | - | |
| exp_self.20260421064718.109_20260421_064718 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260421064718.109 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-21 06:48 | Success | - | |
| exp_pytrain.20260421064459.028_20260421_064459 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-21 06:46 | Success | - | |
| exp_cr_10.55041_isjem06670_20260421_063959 | A Review of Quantization Techniques for Large Language Models: From Post-Training Quantization to Extreme 1-Bit Methods<br>Paper ID: cr_10.55041_isjem06670 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered b... | 04-21 06:41 | Success | - | |
| exp_self.20260421063758.108_20260421_063758 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260421063758.108 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-21 06:39 | Success | - | |
| exp_self.20260421063037.107_20260421_063037 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260421063037.107 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-21 06:31 | Success | - | |
| exp_self.20260421062310.106_20260421_062311 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260421062310.106 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-21 06:24 | Success | - | |
| exp_self.20260421061537.105_20260421_061537 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260421061537.105 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-21 06:16 | Success | - | |
| exp_pytrain.20260421061305.027_20260421_061305 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-21 06:14 | Success | - | |
| exp_self.20260421060611.104_20260421_060612 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260421060611.104 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-21 06:07 | Success | - | |
| exp_self.20260421055840.103_20260421_055840 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260421055840.103 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-21 05:59 | Success | - | |
| exp_self.20260421055105.102_20260421_055106 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260421055105.102 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-21 05:52 | Success | - | |
| exp_self.20260421054335.101_20260421_054335 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260421054335.101 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-21 05:44 | Success | - | |
| exp_pytrain.20260421054107.026_20260421_054107 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-21 05:42 | Success | - | |
| exp_self.20260421053414.100_20260421_053414 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260421053414.100 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-21 05:35 | Success | - | |
| exp_self.20260421052644.099_20260421_052644 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260421052644.099 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-21 05:27 | Success | - | |
| exp_gh_bojobh609_TurboQuant_20260421_052115 | bojobh609/TurboQuant<br>Paper ID: gh_bojobh609_TurboQuant - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-21 05:22 | Success | - | |
| exp_self.20260421051914.098_20260421_051915 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260421051914.098 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-21 05:20 | Success | - | |
| exp_self.20260421051145.097_20260421_051146 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260421051145.097 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-21 05:12 | Success | - | |
| exp_pytrain.20260421050924.025_20260421_050924 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-21 05:10 | Success | - | |
| exp_self.20260421050228.096_20260421_050228 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260421050228.096 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-21 05:03 | Success | - | |
| exp_self.20260421045505.095_20260421_045506 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260421045505.095 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-21 04:56 | Success | - | |
| exp_self.20260421044735.094_20260421_044736 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260421044735.094 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-21 04:48 | Success | - | |
| exp_self.20260421044005.093_20260421_044005 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260421044005.093 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-21 04:41 | Success | - | |
| exp_pytrain.20260421043742.024_20260421_043743 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-21 04:38 | Success | - | |
| exp_self.20260421043047.092_20260421_043048 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260421043047.092 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-21 04:31 | Success | - | |
| exp_self.20260421042324.091_20260421_042325 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260421042324.091 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-21 04:24 | Success | - | |
| exp_self.20260421041604.090_20260421_041604 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260421041604.090 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-21 04:17 | Success | - | |
| exp_self.20260421040833.089_20260421_040834 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260421040833.089 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-21 04:09 | Success | - | |
| exp_pytrain.20260421040609.023_20260421_040609 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-21 04:07 | Success | - | |
| exp_self.20260421035913.088_20260421_035913 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260421035913.088 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-21 04:00 | Success | - | |
| exp_self.20260421035151.087_20260421_035152 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260421035151.087 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-21 03:52 | Success | - | |
| exp_self.20260421034430.086_20260421_034430 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260421034430.086 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-21 03:45 | Success | - | |
| exp_self.20260421033702.085_20260421_033702 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260421033702.085 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-21 03:38 | Success | - | |
| exp_pytrain.20260421033429.022_20260421_033430 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-21 03:35 | Success | - | |
| exp_self.20260421032737.084_20260421_032738 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260421032737.084 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-21 03:28 | Success | - | |
| exp_self.20260421032009.083_20260421_032009 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260421032009.083 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-21 03:21 | Success | - | |
|
exp_self.20260421031242.082_20260421_031242
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260421031242.082 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-21 03:13 | Success | - | |
|
exp_self.20260421030517.081_20260421_030517
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260421030517.081 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-21 03:06 | Success | - | |
|
exp_pytrain.20260421030250.021_20260421_030251
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-21 03:03 | Success | - | |
|
exp_self.20260421025558.080_20260421_025559
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260421025558.080 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-21 02:57 | Success | - | |
|
exp_gh_berntpopp_phentrieve_20260421_025139
|
berntpopp/phentrieve
Paper ID: gh_berntpopp_phentrieve - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-21 02:52 | Success | - | |
|
exp_self.20260421024831.079_20260421_024831
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260421024831.079 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-21 02:49 | Success | - | |
|
exp_self.20260421024103.078_20260421_024104
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260421024103.078 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-21 02:42 | Success | - | |
|
exp_self.20260421023330.077_20260421_023331
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260421023330.077 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-21 02:34 | Success | - | |
|
exp_pytrain.20260421023111.020_20260421_023112
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-21 02:32 | Success | - | |
|
exp_self.20260421022415.076_20260421_022415
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260421022415.076 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-21 02:25 | Success | - | |
|
exp_self.20260421021653.075_20260421_021653
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260421021653.075 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-21 02:17 | Success | - | |
|
exp_self.20260421020914.074_20260421_020915
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260421020914.074 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-21 02:10 | Success | - | |
|
exp_self.20260421020144.073_20260421_020144
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260421020144.073 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-21 02:02 | Success | - | |
|
exp_pytrain.20260421015924.019_20260421_015925
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-21 02:00 | Success | - | |
|
exp_self.20260421015226.072_20260421_015226
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260421015226.072 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-21 01:53 | Success | - | |
|
exp_self.20260421014502.071_20260421_014503
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260421014502.071 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-21 01:46 | Success | - | |
|
exp_self.20260421013738.070_20260421_013739
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260421013738.070 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-21 01:38 | Success | - | |
|
exp_self.20260421013008.069_20260421_013009
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260421013008.069 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-21 01:31 | Success | - | |
|
exp_pytrain.20260421012746.018_20260421_012746
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-21 01:28 | Success | - | |
|
exp_self.20260421012052.068_20260421_012052
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260421012052.068 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-21 01:21 | Success | - | |
|
exp_self.20260421011330.067_20260421_011331
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260421011330.067 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-21 01:14 | Success | - | |
|
exp_self.20260421010610.066_20260421_010611
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260421010610.066 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-21 01:07 | Success | - | |
|
exp_self.20260421005846.065_20260421_005847
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260421005846.065 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-21 00:59 | Success | - | |
|
exp_pytrain.20260421005620.017_20260421_005620
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-21 00:57 | Success | - | |
|
exp_self.20260421004924.064_20260421_004925
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260421004924.064 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-21 00:50 | Success | - | |
|
exp_self.20260421004156.063_20260421_004156
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260421004156.063 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-21 00:42 | Success | - | |
|
exp_self.20260421003431.062_20260421_003431
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260421003431.062 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-21 00:35 | Success | - | |
|
exp_self.20260421002708.061_20260421_002708
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260421002708.061 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-21 00:28 | Success | - | |
|
exp_pytrain.20260421002439.016_20260421_002440
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-21 00:25 | Success | - | |
|
exp_self.20260421001749.060_20260421_001749
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260421001749.060 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-21 00:18 | Success | - | |
|
exp_self.20260421001018.059_20260421_001019
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260421001018.059 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-21 00:11 | Success | - | |
|
exp_hf_2604.17388_20260421_000513
|
Back to Repair: A Minimal Denoising Network for Time Series Anomaly Detection
Paper ID: hf_2604.17388 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
04-21 00:06 | Success | - | |
|
exp_self.20260421000318.058_20260421_000318
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260421000318.058 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-21 00:04 | Success | - | |
|
exp_self.20260420235506.057_20260420_235506
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260420235506.057 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-20 23:56 | Success | - | |
|
exp_pytrain.20260420235247.015_20260420_235248
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-20 23:53 | Success | - | |
|
exp_gh_whispering3_scao_20260420_235007
|
whispering3/scao
Paper ID: gh_whispering3_scao - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benc...
|
04-20 23:51 | Success | - | |
|
exp_self.20260420234658.056_20260420_234658
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260420234658.056 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-20 23:48 | Success | - | |
|
exp_hf_2604.17454_20260420_234409
|
HSG: Hyperbolic Scene Graph
Paper ID: hf_2604.17454 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
04-20 23:45 | Success | - | |
|
exp_self.20260420233710.055_20260420_233711
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260420233710.055 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-20 23:38 | Success | - | |
|
exp_self.20260420232943.054_20260420_232943
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260420232943.054 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-20 23:30 | Success | - | |
|
exp_self.20260420232220.053_20260420_232220
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260420232220.053 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-20 23:23 | Success | - | |
|
exp_pytrain.20260420231953.014_20260420_231953
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-20 23:20 | Success | - | |
|
exp_self.20260420231302.052_20260420_231303
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260420231302.052 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-20 23:14 | Success | - | |
|
exp_self.20260420230537.051_20260420_230537
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260420230537.051 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-20 23:06 | Success | - | |
|
exp_2604.18584v1_20260420_230012
|
MathNet: a Global Multimodal Benchmark for Mathematical Reasoning and Retrieval
Paper ID: 2604.18584v1 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark s...
|
04-20 23:01 | Success | - | |
|
exp_self.20260420225812.050_20260420_225813
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260420225812.050 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-20 22:59 | Success | - | |
|
exp_2604.18580v1_20260420_225503
|
Sessa: Selective State Space Attention
Paper ID: 2604.18580v1 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark s...
|
04-20 22:56 | Success | - | |
|
exp_self.20260420225053.049_20260420_225054
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260420225053.049 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-20 22:51 | Success | - | |
|
exp_pytrain.20260420224826.013_20260420_224827
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-20 22:49 | Success | - | |
|
exp_hf_2604.18584_20260420_224546
|
MathNet: a Global Multimodal Benchmark for Mathematical Reasoning and Retrieval
Paper ID: hf_2604.18584 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
04-20 22:46 | Success | - | |
|
exp_self.20260420224024.048_20260420_224025
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260420224024.048 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-20 22:41 | Success | - | |
|
exp_self.20260420223254.047_20260420_223254
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260420223254.047 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-20 22:33 | Success | - | |
|
exp_self.20260420222534.046_20260420_222535
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260420222534.046 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-20 22:26 | Success | - | |
|
exp_self.20260420221813.045_20260420_221813
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260420221813.045 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-20 22:19 | Success | - | |
|
exp_pytrain.20260420221547.012_20260420_221547
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-20 22:16 | Success | - | |
|
exp_hf_2604.08537_20260420_221333
|
Meta-learning In-Context Enables Training-Free Cross Subject Brain Decoding
Paper ID: hf_2604.08537 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
04-20 22:14 | Success | - | |
|
exp_self.20260420221029.044_20260420_221029
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260420221029.044 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-20 22:11 | Success | - | |
|
exp_hf_2604.18486_20260420_220742
|
OneVL: One-Step Latent Reasoning and Planning with Vision-Language Explanation
Paper ID: hf_2604.18486 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
04-20 22:08 | Success | - | |
|
exp_self.20260420220151.043_20260420_220151
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260420220151.043 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-20 22:02 | Success | - | |
|
exp_2604.18064v1_20260420_215624
|
Understanding Human Actions through the Lens of Executable Models
Paper ID: 2604.18064v1 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark s...
|
04-20 21:57 | Success | - | |
|
exp_self.20260420215424.042_20260420_215425
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260420215424.042 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-20 21:55 | Success | - | |
|
exp_2604.18067v1_20260420_215109
|
Towards Real-Time ECG and EMG Modeling on μNPUs
Paper ID: 2604.18067v1 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark s...
|
04-20 21:52 | Success | - | |
|
exp_self.20260420214658.041_20260420_214658
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260420214658.041 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-20 21:48 | Success | - | |
|
exp_pytrain.20260420214431.011_20260420_214431
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-20 21:45 | Success | - | |
|
exp_self.20260420213746.040_20260420_213746
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260420213746.040 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-20 21:38 | Success | - | |
|
exp_self.20260420213022.039_20260420_213022
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260420213022.039 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-20 21:31 | Success | - | |
|
exp_self.20260420212259.038_20260420_212259
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260420212259.038 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-20 21:24 | Success | - | |
|
exp_self.20260420211539.037_20260420_211539
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260420211539.037 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-20 21:16 | Success | - | |
|
exp_pytrain.20260420211315.010_20260420_211316
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-20 21:14 | Success | - | |
|
exp_self.20260420210632.036_20260420_210633
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260420210632.036 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-20 21:07 | Success | - | |
|
exp_hf_2604.17696_20260420_210059
|
Stratagem: Learning Transferable Reasoning via Trajectory-Modulated Game Self-Play
Paper ID: hf_2604.17696 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
04-20 21:02 | Success | - | |
|
exp_self.20260420205905.035_20260420_205906
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260420205905.035 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-20 21:00 | Success | - | |
|
exp_self.20260420205144.034_20260420_205145
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260420205144.034 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-20 20:52 | Success | - | |
|
exp_hf_2604.17698_20260420_204827
|
The Geometric Canary: Predicting Steerability and Detecting Drift via Representational Stability
Paper ID: hf_2604.17698 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
04-20 20:49 | Success | - | |
|
exp_self.20260420204420.033_20260420_204420
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260420204420.033 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-20 20:45 | Success | - | |
|
exp_pytrain.20260420204151.009_20260420_204151
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-20 20:42 | Success | - | |
|
exp_hf_2604.16642_20260420_203910
|
Geometric coherence of single-cell CRISPR perturbations reveals regulatory architecture and predicts cellular stress
Paper ID: hf_2604.16642 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
04-20 20:40 | Success | - | |
|
exp_self.20260420203459.032_20260420_203459
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260420203459.032 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-20 20:36 | Success | - | |
|
exp_self.20260420202740.031_20260420_202740
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260420202740.031 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-20 20:28 | Success | - | |
|
exp_hf_2604.16503_20260420_202425
|
Motif-Video 2B: Technical Report
Paper ID: hf_2604.16503 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
04-20 20:25 | Success | - | |
|
exp_self.20260420202013.030_20260420_202013
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260420202013.030 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-20 20:21 | Success | - | |
|
exp_self.20260420201253.029_20260420_201253
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260420201253.029 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-20 20:13 | Success | - | |
|
exp_pytrain.20260420201028.008_20260420_201029
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-20 20:11 | Success | - | |
|
exp_self.20260420200343.028_20260420_200344
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260420200343.028 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-20 20:04 | Success | - | |
|
exp_self.20260420195621.027_20260420_195621
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260420195621.027 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-20 19:57 | Success | - | |
|
exp_self.20260420194900.026_20260420_194901
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260420194900.026 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-20 19:50 | Success | - | |
| exp_self.20260420194107.025_20260420_194108 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260420194107.025 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-20 19:42 | Success | - | |
| exp_pytrain.20260420193810.007_20260420_193810 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-20 19:39 | Success | - | |
| exp_gh_Labyrinthine-saltiness744_turboquant-mlx_20260420_193306 | Labyrinthine-saltiness744/turboquant-mlx - Paper ID: gh_Labyrinthine-saltiness744_turboquant-mlx - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expecte... | 04-20 19:34 | Success | - | |
| exp_self.20260420193058.024_20260420_193058 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260420193058.024 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-20 19:32 | Success | - | |
| exp_self.20260420192330.023_20260420_192331 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260420192330.023 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-20 19:24 | Success | - | |
| exp_self.20260420191606.022_20260420_191606 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260420191606.022 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-20 19:17 | Success | - | |
| exp_self.20260420190834.021_20260420_190835 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260420190834.021 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-20 19:09 | Success | - | |
| exp_pytrain.20260420190606.006_20260420_190606 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-20 19:07 | Success | - | |
| exp_self.20260420185907.020_20260420_185907 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260420185907.020 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-20 19:00 | Success | - | |
| exp_self.20260420185144.019_20260420_185144 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260420185144.019 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-20 18:52 | Success | - | |
| exp_self.20260420184419.018_20260420_184420 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260420184419.018 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-20 18:45 | Success | - | |
| exp_self.20260420183654.017_20260420_183654 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260420183654.017 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-20 18:37 | Success | - | |
| exp_pytrain.20260420183422.005_20260420_183423 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-20 18:35 | Success | - | |
| exp_self.20260420182725.016_20260420_182726 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260420182725.016 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-20 18:28 | Success | - | |
| exp_self.20260420181951.015_20260420_181952 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260420181951.015 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-20 18:20 | Success | - | |
| exp_self.20260420181224.014_20260420_181225 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260420181224.014 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-20 18:13 | Success | - | |
| exp_self.20260420180501.013_20260420_180502 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260420180501.013 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-20 18:06 | Success | - | |
| exp_pytrain.20260420180227.004_20260420_180227 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-20 18:03 | Success | - | |
| exp_self.20260420175534.012_20260420_175535 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260420175534.012 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-20 17:56 | Success | - | |
| exp_self.20260420174805.011_20260420_174805 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260420174805.011 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-20 17:49 | Success | - | |
| exp_self.20260420174037.010_20260420_174038 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260420174037.010 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-20 17:41 | Success | - | |
| exp_self.20260420173308.009_20260420_173309 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260420173308.009 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-20 17:34 | Success | - | |
| exp_pytrain.20260420173035.003_20260420_173036 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-20 17:31 | Success | - | |
| exp_self.20260420172346.008_20260420_172347 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260420172346.008 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-20 17:24 | Success | - | |
| exp_self.20260420171612.007_20260420_171613 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260420171612.007 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-20 17:17 | Success | - | |
| exp_self.20260420170842.006_20260420_170842 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260420170842.006 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-20 17:09 | Success | - | |
| exp_self.20260420170113.005_20260420_170114 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260420170113.005 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-20 17:02 | Success | - | |
| exp_pytrain.20260420165843.002_20260420_165843 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-20 16:59 | Success | - | |
| exp_self.20260420165149.004_20260420_165149 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260420165149.004 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-20 16:52 | Success | - | |
| exp_self.20260420164419.003_20260420_164420 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260420164419.003 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-20 16:45 | Success | - | |
| exp_self.20260420163650.002_20260420_163651 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260420163650.002 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-20 16:37 | Success | - | |
| exp_self.20260420162923.001_20260420_162923 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260420162923.001 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-20 16:30 | Success | - | |
| exp_pytrain.20260420162704.001_20260420_162704 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-20 16:28 | Success | - | |
| exp_self.20260420144048.743_20260420_144049 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260420144048.743 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-20 14:40 | Pending | - | |
| exp_self.20260420143322.742_20260420_143323 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260420143322.742 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-20 14:34 | Success | - | |
| exp_self.20260420142550.741_20260420_142551 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260420142550.741 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-20 14:26 | Success | - | |
| exp_self.20260420141820.740_20260420_141820 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260420141820.740 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-20 14:19 | Success | - | |
| exp_pytrain.20260420141550.183_20260420_141550 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-20 14:16 | Success | - | |
| exp_self.20260420140843.739_20260420_140844 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260420140843.739 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-20 14:09 | Success | - | |
| exp_self.20260420140114.738_20260420_140115 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260420140114.738 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-20 14:02 | Success | - | |
| exp_self.20260420135339.737_20260420_135339 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260420135339.737 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-20 13:54 | Success | - | |
| exp_self.20260420134607.736_20260420_134607 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260420134607.736 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-20 13:47 | Success | - | |
| exp_pytrain.20260420134337.182_20260420_134338 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-20 13:44 | Success | - | |
| exp_self.20260420133630.735_20260420_133630 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260420133630.735 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-20 13:37 | Success | - | |
| exp_self.20260420132859.734_20260420_132900 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260420132859.734 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-20 13:30 | Success | - | |
| exp_self.20260420132130.733_20260420_132130 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260420132130.733 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-20 13:22 | Success | - | |
| exp_self.20260420131355.732_20260420_131356 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260420131355.732 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-20 13:14 | Success | - | |
| exp_pytrain.20260420131123.181_20260420_131123 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-20 13:12 | Success | - | |
| exp_self.20260420130419.731_20260420_130419 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260420130419.731 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-20 13:05 | Success | - | |
| exp_self.20260420125646.730_20260420_125646 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260420125646.730 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-20 12:57 | Success | - | |
| exp_self.20260420124913.729_20260420_124913 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260420124913.729 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-20 12:50 | Success | - | |
| exp_cr_10.1108_jbsed-05-2025-0135_20260420_124446 | Building smarter digital content: a CRITIC – DEMATEL framework for leveraging large language model optimization in marke... - Paper ID: cr_10.1108_jbsed-05-2025-0135 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Reco... | 04-20 12:45 | Success | - | |
| exp_self.20260420124127.728_20260420_124128 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260420124127.728 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-20 12:42 | Success | - | |
| exp_pytrain.20260420123850.180_20260420_123851 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-20 12:39 | Success | - | |
| exp_self.20260420123153.727_20260420_123153 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260420123153.727 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-20 12:32 | Success | - | |
| exp_self.20260420122416.726_20260420_122416 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260420122416.726 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-20 12:25 | Success | - | |
| exp_self.20260420121639.725_20260420_121639 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260420121639.725 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-20 12:17 | Success | - | |
| exp_self.20260420120906.724_20260420_120906 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260420120906.724 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-20 12:10 | Success | - | |
| exp_pytrain.20260420120624.179_20260420_120624 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-20 12:07 | Success | - | |
| exp_self.20260420115911.723_20260420_115912 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260420115911.723 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-20 12:00 | Success | - | |
| exp_self.20260420115136.722_20260420_115136 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260420115136.722 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-20 11:52 | Success | - | |
| exp_self.20260420114358.721_20260420_114358 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260420114358.721 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-20 11:45 | Success | - | |
| exp_self.20260420113617.720_20260420_113618 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260420113617.720 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-20 11:37 | Success | - | |
| exp_pytrain.20260420113344.178_20260420_113345 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-20 11:34 | Success | - | |
| exp_self.20260420112743.719_20260420_112744 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260420112743.719 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-20 11:28 | Success | - | |
| exp_self.20260420112006.718_20260420_112007 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260420112006.718 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-20 11:21 | Success | - | |
| exp_self.20260420111232.717_20260420_111232 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260420111232.717 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-20 11:13 | Success | - | |
| exp_self.20260420110501.716_20260420_110501 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260420110501.716 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-20 11:06 | Success | - | |
| exp_pytrain.20260420110221.177_20260420_110221 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-20 11:03 | Success | - | |
| exp_self.20260420105523.715_20260420_105524 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260420105523.715 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-20 10:56 | Success | - | |
| exp_self.20260420104751.714_20260420_104751 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260420104751.714 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-20 10:48 | Success | - | |
| exp_self.20260420104018.713_20260420_104018 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260420104018.713 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-20 10:41 | Success | - | |
| exp_self.20260420103245.712_20260420_103245 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260420103245.712 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-20 10:33 | Success | - | |
| exp_pytrain.20260420103005.176_20260420_103005 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-20 10:31 | Success | - | |
| exp_self.20260420102311.711_20260420_102311 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260420102311.711 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-20 10:24 | Success | - | |
| exp_self.20260420101529.710_20260420_101529 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260420101529.710 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-20 10:16 | Success | - | |
| exp_self.20260420100748.709_20260420_100748 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260420100748.709 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-20 10:08 | Success | - | |
| exp_self.20260420100008.708_20260420_100009 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260420100008.708 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-20 10:01 | Success | - | |
| exp_pytrain.20260420095729.175_20260420_095729 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-20 09:58 | Success | - | |
| exp_self.20260420095036.707_20260420_095036 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260420095036.707 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-20 09:51 | Success | - | |
| exp_self.20260420094301.706_20260420_094301 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260420094301.706 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-20 09:44 | Success | - | |
| exp_self.20260420093521.705_20260420_093522 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260420093521.705 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-20 09:36 | Success | - | |
| exp_self.20260420092746.704_20260420_092747 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260420092746.704 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-20 09:28 | Success | - | |
| exp_pytrain.20260420092517.174_20260420_092517 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-20 09:26 | Success | - | |
| exp_self.20260420091928.703_20260420_091929 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260420091928.703 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-20 09:20 | Success | - | |
| exp_self.20260420091144.702_20260420_091144 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260420091144.702 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-20 09:12 | Success | - | |
| exp_self.20260420090352.701_20260420_090353 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260420090352.701 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-20 09:04 | Success | - | |
| exp_self.20260420085617.700_20260420_085618 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260420085617.700 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-20 08:57 | Success | - | |
| exp_pytrain.20260420085354.173_20260420_085354 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-20 08:54 | Success | - | |
| exp_self.20260420084636.699_20260420_084636 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260420084636.699 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-20 08:47 | Success | - | |
| exp_self.20260420083910.698_20260420_083910 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260420083910.698 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-20 08:40 | Success | - | |
| exp_self.20260420083130.697_20260420_083130 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260420083130.697 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-20 08:32 | Success | - | |
| exp_self.20260420082402.696_20260420_082402 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260420082402.696 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-20 08:25 | Success | - | |
| exp_pytrain.20260420082145.172_20260420_082146 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-20 08:22 | Success | - | |
| exp_self.20260420081446.695_20260420_081446 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260420081446.695 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-20 08:15 | Success | - | |
| exp_self.20260420080726.694_20260420_080727 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260420080726.694 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-20 08:08 | Success | - | |
| exp_self.20260420080006.693_20260420_080007 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260420080006.693 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-20 08:01 | Success | - | |
| exp_hf_2604.16027_20260420_075431 | Where does output diversity collapse in post-training? - Paper ID: hf_2604.16027 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark... | 04-20 07:55 | Success | - | |
| exp_self.20260420075238.692_20260420_075238 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260420075238.692 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-20 07:53 | Success | - | |
| exp_pytrain.20260420075013.171_20260420_075013 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-20 07:51 | Success | - | |
| exp_self.20260420074327.691_20260420_074328 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260420074327.691 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-20 07:44 | Success | - | |
| exp_self.20260420073603.690_20260420_073603 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260420073603.690 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-20 07:37 | Success | - | |
| exp_self.20260420072840.689_20260420_072840 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260420072840.689 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-20 07:29 | Success | - | |
|
exp_self.20260420072121.688_20260420_072121
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260420072121.688 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-20 07:22 | Success | - | |
|
exp_pytrain.20260420071859.170_20260420_071900
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-20 07:20 | Success | - | |
|
exp_self.20260420071201.687_20260420_071201
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260420071201.687 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-20 07:13 | Success | - | |
|
exp_hf_2604.15923_20260420_070744
|
Hierarchical Codec Diffusion for Video-to-Speech Generation
Paper ID: hf_2604.15923 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
04-20 07:08 | Success | - | |
|
exp_self.20260420070442.686_20260420_070443
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260420070442.686 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-20 07:05 | Success | - | |
|
exp_self.20260420065724.685_20260420_065724
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260420065724.685 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-20 06:58 | Success | - | |
|
exp_self.20260420065003.684_20260420_065003
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260420065003.684 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-20 06:51 | Success | - | |
|
exp_pytrain.20260420064745.169_20260420_064746
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-20 06:48 | Success | - | |
|
exp_self.20260420064036.683_20260420_064037
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260420064036.683 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-20 06:41 | Success | - | |
|
exp_self.20260420063318.682_20260420_063319
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260420063318.682 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-20 06:34 | Success | - | |
|
exp_self.20260420062552.681_20260420_062552
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260420062552.681 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-20 06:26 | Success | - | |
|
exp_self.20260420061829.680_20260420_061829
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260420061829.680 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-20 06:19 | Success | - | |
|
exp_pytrain.20260420061611.168_20260420_061611
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-20 06:17 | Success | - | |
|
exp_self.20260420060919.679_20260420_060920
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260420060919.679 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-20 06:10 | Success | - | |
|
exp_self.20260420060203.678_20260420_060204
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260420060203.678 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-20 06:03 | Success | - | |
|
exp_self.20260420055439.677_20260420_055439
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260420055439.677 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-20 05:55 | Success | - | |
|
exp_self.20260420054712.676_20260420_054713
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260420054712.676 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-20 05:48 | Success | - | |
|
exp_pytrain.20260420054454.167_20260420_054455
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-20 05:45 | Success | - | |
|
exp_self.20260420053802.675_20260420_053803
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260420053802.675 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-20 05:39 | Success | - | |
|
exp_self.20260420053042.674_20260420_053042
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260420053042.674 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-20 05:31 | Success | - | |
|
exp_self.20260420052238.673_20260420_052239
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260420052238.673 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-20 05:23 | Success | - | |
|
exp_self.20260420051442.672_20260420_051442
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260420051442.672 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-20 05:15 | Success | - | |
|
exp_pytrain.20260420051214.166_20260420_051214
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-20 05:13 | Success | - | |
|
exp_self.20260420050509.671_20260420_050509
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260420050509.671 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-20 05:06 | Success | - | |
|
exp_self.20260420045731.670_20260420_045732
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260420045731.670 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-20 04:58 | Success | - | |
|
exp_self.20260420044955.669_20260420_044955
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260420044955.669 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-20 04:50 | Success | - | |
|
exp_self.20260420044214.668_20260420_044214
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260420044214.668 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-20 04:43 | Success | - | |
|
exp_pytrain.20260420043942.165_20260420_043943
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-20 04:40 | Success | - | |
|
exp_self.20260420043244.667_20260420_043244
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260420043244.667 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-20 04:33 | Success | - | |
|
exp_hf_2604.12012_20260420_042820
|
TIPSv2: Advancing Vision-Language Pretraining with Enhanced Patch-Text Alignment
Paper ID: hf_2604.12012 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
04-20 04:29 | Success | - | |
|
exp_self.20260420042505.666_20260420_042506
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260420042505.666 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-20 04:26 | Success | - | |
|
exp_self.20260420041723.665_20260420_041724
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260420041723.665 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-20 04:18 | Success | - | |
|
exp_self.20260420040953.664_20260420_040953
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260420040953.664 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-20 04:10 | Success | - | |
|
exp_pytrain.20260420040726.164_20260420_040726
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-20 04:08 | Success | - | |
|
exp_self.20260420040018.663_20260420_040018
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260420040018.663 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-20 04:01 | Success | - | |
|
exp_self.20260420035249.662_20260420_035250
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260420035249.662 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-20 03:53 | Success | - | |
|
exp_self.20260420034510.661_20260420_034510
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260420034510.661 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-20 03:46 | Success | - | |
|
exp_self.20260420033738.660_20260420_033739
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260420033738.660 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-20 03:38 | Success | - | |
|
exp_pytrain.20260420033514.163_20260420_033514
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-20 03:36 | Success | - | |
|
exp_self.20260420033055.659_20260420_033055
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260420033055.659 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-20 03:31 | Success | - | |
|
exp_self.20260420032322.658_20260420_032323
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260420032322.658 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-20 03:24 | Success | - | |
|
exp_hf_2604.14663_20260420_032024
|
EdgeDetect: Importance-Aware Gradient Compression with Homomorphic Aggregation for Federated Intrusion Detection
Paper ID: hf_2604.14663 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
04-20 03:21 | Success | - | |
|
exp_self.20260420031313.657_20260420_031314
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260420031313.657 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-20 03:14 | Success | - | |
|
exp_self.20260420030546.656_20260420_030546
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260420030546.656 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-20 03:06 | Success | - | |
|
exp_pytrain.20260420030311.162_20260420_030311
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-20 03:04 | Success | - | |
|
exp_self.20260420025611.655_20260420_025612
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260420025611.655 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-20 02:57 | Success | - | |
|
exp_self.20260420024837.654_20260420_024838
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260420024837.654 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-20 02:49 | Success | - | |
|
exp_self.20260420024106.653_20260420_024107
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260420024106.653 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-20 02:42 | Success | - | |
|
exp_self.20260420023337.652_20260420_023337
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260420023337.652 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-20 02:34 | Success | - | |
|
exp_pytrain.20260420023104.161_20260420_023105
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-20 02:32 | Success | - | |
|
exp_self.20260420022412.651_20260420_022412
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260420022412.651 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-20 02:25 | Success | - | |
|
exp_self.20260420021632.650_20260420_021633
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260420021632.650 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-20 02:17 | Success | - | |
|
exp_self.20260420020900.649_20260420_020900
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260420020900.649 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-20 02:10 | Success | - | |
|
exp_self.20260420020128.648_20260420_020129
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260420020128.648 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-20 02:02 | Success | - | |
|
exp_pytrain.20260420015858.160_20260420_015858
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-20 02:00 | Success | - | |
|
exp_self.20260420015204.647_20260420_015205
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260420015204.647 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-20 01:53 | Success | - | |
|
exp_self.20260420014432.646_20260420_014433
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260420014432.646 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-20 01:45 | Success | - | |
|
exp_self.20260420013703.645_20260420_013704
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260420013703.645 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-20 01:38 | Success | - | |
|
exp_self.20260420012940.644_20260420_012940
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260420012940.644 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-20 01:30 | Success | - | |
|
exp_pytrain.20260420012715.159_20260420_012716
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-20 01:28 | Success | - | |
|
exp_self.20260420012022.643_20260420_012022
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260420012022.643 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-20 01:21 | Success | - | |
|
exp_self.20260420011253.642_20260420_011253
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260420011253.642 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-20 01:13 | Success | - | |
|
exp_self.20260420010520.641_20260420_010521
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260420010520.641 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-20 01:06 | Success | - | |
|
exp_self.20260420005755.640_20260420_005755
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260420005755.640 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-20 00:58 | Success | - | |
|
exp_pytrain.20260420005530.158_20260420_005531
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-20 00:56 | Success | - | |
|
exp_self.20260420004833.639_20260420_004833
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260420004833.639 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-20 00:49 | Success | - | |
|
exp_self.20260420004108.638_20260420_004109
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260420004108.638 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-20 00:42 | Success | - | |
|
exp_self.20260420003340.637_20260420_003341
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260420003340.637 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-20 00:34 | Success | - | |
|
exp_self.20260420002613.636_20260420_002613
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260420002613.636 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-20 00:27 | Success | - | |
|
exp_pytrain.20260420002350.157_20260420_002350
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-20 00:24 | Success | - | |
|
exp_hf_2511.15915_20260420_001955
|
AccelOpt: A Self-Improving LLM Agentic System for AI Accelerator Kernel Optimization
Paper ID: hf_2511.15915 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
04-20 00:20 | Success | - | |
|
exp_self.20260420001644.635_20260420_001645
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260420001644.635 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-20 00:17 | Success | - | |
|
exp_self.20260420000919.634_20260420_000920
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260420000919.634 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-20 00:10 | Success | - | |
|
exp_hf_2604.16299_20260420_000500
|
Repurposing 3D Generative Model for Autoregressive Layout Generation
Paper ID: hf_2604.16299 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
04-20 00:06 | Success | - | |
|
exp_self.20260420000150.633_20260420_000151
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260420000150.633 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-20 00:02 | Success | - | |
|
exp_self.20260419235423.632_20260419_235423
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260419235423.632 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-19 23:55 | Success | - | |
|
exp_pytrain.20260419235158.156_20260419_235158
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-19 23:53 | Success | - | |
|
exp_self.20260419234458.631_20260419_234459
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260419234458.631 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-19 23:46 | Success | - | |
|
exp_self.20260419233731.630_20260419_233731
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260419233731.630 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-19 23:38 | Success | - | |
|
exp_self.20260419233002.629_20260419_233002
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260419233002.629 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-19 23:31 | Success | - | |
|
exp_self.20260419232234.628_20260419_232235
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260419232234.628 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-19 23:23 | Success | - | |
|
exp_pytrain.20260419232005.155_20260419_232006
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-19 23:21 | Success | - | |
|
exp_self.20260419231306.627_20260419_231307
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260419231306.627 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-19 23:14 | Success | - | |
|
exp_self.20260419230540.626_20260419_230541
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260419230540.626 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-19 23:06 | Success | - | |
|
exp_self.20260419225818.625_20260419_225818
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260419225818.625 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-19 22:59 | Success | - | |
|
exp_self.20260419225049.624_20260419_225049
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260419225049.624 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-19 22:51 | Success | - | |
|
exp_pytrain.20260419224822.154_20260419_224822
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-19 22:49 | Success | - | |
|
exp_self.20260419224122.623_20260419_224122
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260419224122.623 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-19 22:42 | Success | - | |
|
exp_self.20260419223357.622_20260419_223357
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260419223357.622 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-19 22:35 | Success | - | |
|
exp_self.20260419222636.621_20260419_222636
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260419222636.621 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-19 22:27 | Success | - | |
|
exp_self.20260419221909.620_20260419_221910
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260419221909.620 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-19 22:20 | Success | - | |
|
exp_pytrain.20260419221638.153_20260419_221638
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-19 22:17 | Success | - | |
|
exp_self.20260419221224.619_20260419_221224
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260419221224.619 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-19 22:13 | Success | - | |
|
exp_hf_2604.16029_20260419_220803
|
Cut Your Losses! Learning to Prune Paths Early for Efficient Parallel Reasoning
Paper ID: hf_2604.16029 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
04-19 22:09 | Success | - | |
|
exp_self.20260419220449.618_20260419_220450
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260419220449.618 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-19 22:05 | Success | - | |
|
exp_self.20260419215720.617_20260419_215720
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260419215720.617 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-19 21:58 | Success | - | |
|
exp_2604.16298v1_20260419_215151
|
FineCog-Nav: Integrating Fine-grained Cognitive Modules for Zero-shot Multimodal UAV Navigation
Paper ID: 2604.16298v1 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark s...
|
04-19 21:52 | Success | - | |
|
exp_self.20260419214940.616_20260419_214940
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260419214940.616 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-19 21:50 | Success | - | |
|
exp_2604.16299v1_20260419_214648
|
Repurposing 3D Generative Model for Autoregressive Layout Generation
Paper ID: 2604.16299v1 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark s...
|
04-19 21:47 | Success | - | |
|
exp_pytrain.20260419214443.152_20260419_214444
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-19 21:45 | Success | - | |
|
exp_self.20260419214025.615_20260419_214026
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260419214025.615 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-19 21:41 | Success | - | |
|
exp_self.20260419213250.614_20260419_213251
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260419213250.614 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-19 21:33 | Success | - | |
|
exp_self.20260419212519.613_20260419_212519
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260419212519.613 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-19 21:26 | Success | - | |
|
exp_self.20260419211729.612_20260419_211729
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260419211729.612 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-19 21:18 | Success | - | |
|
exp_pytrain.20260419211223.151_20260419_211223
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-19 21:13 | Success | - | |
|
exp_self.20260419210959.611_20260419_211000
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260419210959.611 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-19 21:11 | Success | - | |
|
exp_hf_2604.15804_20260419_210625
|
Qwen3.5-Omni Technical Report
Paper ID: hf_2604.15804 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
04-19 21:07 | Success | - | |
|
exp_gh_arsalanafzal010_SmartRAG_20260419_210120
|
arsalanafzal010/SmartRAG
Paper ID: gh_arsalanafzal010_SmartRAG - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recove...
|
04-19 21:02 | Success | - | |
|
exp_self.20260419205909.610_20260419_205909
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260419205909.610 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-19 21:00 | Success | - | |
|
exp_2604.16205v1_20260419_205549
|
ChemGraph-XANES: An Agentic Framework for XANES Simulation and Analysis
Paper ID: 2604.16205v1 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark s...
|
04-19 20:56 | Success | - | |
|
exp_self.20260419205013.609_20260419_205014
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260419205013.609 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-19 20:51 | Success | - | |
|
exp_2604.16207v1_20260419_204446
|
AIFIND: Artifact-Aware Interpreting Fine-Grained Alignment for Incremental Face Forgery Detection
Paper ID: 2604.16207v1 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark s...
|
04-19 20:45 | Success | - | |
|
exp_self.20260419204235.608_20260419_204236
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260419204235.608 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-19 20:43 | Success | - | |
|
exp_pytrain.20260419204002.150_20260419_204002
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-19 20:41 | Success | - | |
|
exp_self.20260419203305.607_20260419_203305
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260419203305.607 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-19 20:34 | Success | - | |
|
exp_self.20260419202533.606_20260419_202534
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260419202533.606 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-19 20:26 | Success | - | |
|
exp_self.20260419201810.605_20260419_201810
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260419201810.605 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-19 20:19 | Success | - | |
|
exp_self.20260419201041.604_20260419_201041
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260419201041.604 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-19 20:11 | Success | - | |
|
exp_pytrain.20260419200803.149_20260419_200803
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-19 20:09 | Success | - | |
|
exp_self.20260419200111.603_20260419_200112
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260419200111.603 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-19 20:02 | Success | - | |
|
exp_self.20260419195339.602_20260419_195339
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260419195339.602 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-19 19:54 | Success | - | |
|
exp_self.20260419194609.601_20260419_194609
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260419194609.601 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-19 19:47 | Success | - | |
|
exp_self.20260419193841.600_20260419_193841
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260419193841.600 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-19 19:39 | Success | - | |
|
exp_pytrain.20260419193606.148_20260419_193606
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-19 19:37 | Success | - | |
|
exp_self.20260419192913.599_20260419_192914
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260419192913.599 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-19 19:30 | Success | - | |
|
exp_self.20260419192146.598_20260419_192146
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260419192146.598 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-19 19:22 | Success | - | |
|
exp_self.20260419191419.597_20260419_191419
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260419191419.597 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-19 19:15 | Success | - | |
|
exp_self.20260419190652.596_20260419_190652
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260419190652.596 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-19 19:07 | Success | - | |
|
exp_pytrain.20260419190421.147_20260419_190421
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-19 19:05 | Success | - | |
|
exp_self.20260419185729.595_20260419_185730
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260419185729.595 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-19 18:58 | Success | - | |
|
exp_self.20260419185000.594_20260419_185000
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260419185000.594 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-19 18:51 | Success | - | |
|
exp_self.20260419184226.593_20260419_184227
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260419184226.593 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-19 18:43 | Success | - | |
|
exp_self.20260419183458.592_20260419_183458
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260419183458.592 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-19 18:36 | Success | - | |
|
exp_pytrain.20260419183221.146_20260419_183221
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-19 18:33 | Success | - | |
|
exp_self.20260419182529.591_20260419_182529
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260419182529.591 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-19 18:26 | Success | - | |
|
exp_self.20260419181801.590_20260419_181802
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260419181801.590 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-19 18:19 | Success | - | |
|
exp_self.20260419181034.589_20260419_181034
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260419181034.589 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-19 18:11 | Success | - | |
|
exp_self.20260419180310.588_20260419_180310
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260419180310.588 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-19 18:04 | Success | - | |
|
exp_pytrain.20260419180044.145_20260419_180044
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-19 18:01 | Success | - | |
|
exp_self.20260419175351.587_20260419_175351
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260419175351.587 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-19 17:54 | Success | - | |
|
exp_self.20260419174627.586_20260419_174627
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260419174627.586 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-19 17:47 | Success | - | |
|
exp_self.20260419173846.585_20260419_173847
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260419173846.585 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-19 17:39 | Success | - | |
|
exp_self.20260419173109.584_20260419_173109
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260419173109.584 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-19 17:32 | Success | - | |
|
exp_pytrain.20260419172841.144_20260419_172842
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-19 17:29 | Success | - | |
|
exp_self.20260419172148.583_20260419_172148
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260419172148.583 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-19 17:22 | Success | - | |
|
exp_self.20260419171425.582_20260419_171425
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260419171425.582 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-19 17:15 | Success | - | |
|
exp_self.20260419170654.581_20260419_170654
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260419170654.581 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-19 17:07 | Success | - | |
|
exp_self.20260419165915.580_20260419_165915
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260419165915.580 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-19 17:00 | Success | - | |
|
exp_pytrain.20260419165648.143_20260419_165648
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-19 16:57 | Success | - | |
|
exp_self.20260419164954.579_20260419_164954
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260419164954.579 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-19 16:50 | Success | - | |
|
exp_self.20260419164229.578_20260419_164230
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260419164229.578 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-19 16:43 | Success | - | |
|
exp_self.20260419163505.577_20260419_163505
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260419163505.577 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-19 16:36 | Success | - | |
|
exp_self.20260419162737.576_20260419_162737
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260419162737.576 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-19 16:28 | Success | - | |
|
exp_pytrain.20260419162506.142_20260419_162507
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-19 16:26 | Success | - | |
|
exp_self.20260419161811.575_20260419_161811
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260419161811.575 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-19 16:19 | Success | - | |
|
exp_self.20260419161045.574_20260419_161045
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260419161045.574 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-19 16:11 | Success | - | |
|
exp_self.20260419160310.573_20260419_160310
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260419160310.573 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-19 16:04 | Success | - | |
|
exp_self.20260419155532.572_20260419_155533
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260419155532.572 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-19 15:56 | Success | - | |
|
exp_pytrain.20260419155257.141_20260419_155257
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-19 15:54 | Success | - | |
|
exp_self.20260419154600.571_20260419_154600
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260419154600.571 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-19 15:47 | Success | - | |
|
exp_self.20260419153830.570_20260419_153831
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260419153830.570 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-19 15:39 | Success | - | |
|
exp_self.20260419153101.569_20260419_153102
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260419153101.569 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-19 15:32 | Success | - | |
|
exp_self.20260419152327.568_20260419_152327
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260419152327.568 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-19 15:24 | Success | - | |
|
exp_pytrain.20260419152047.140_20260419_152047
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-19 15:21 | Success | - | |
|
exp_self.20260419151349.567_20260419_151350
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260419151349.567 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-19 15:14 | Success | - | |
|
exp_self.20260419150616.566_20260419_150617
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260419150616.566 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-19 15:07 | Success | - | |
|
exp_self.20260419145845.565_20260419_145845
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260419145845.565 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-19 14:59 | Success | - | |
|
exp_self.20260419145119.564_20260419_145120
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260419145119.564 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-19 14:52 | Success | - | |
|
exp_pytrain.20260419144842.139_20260419_144842
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-19 14:49 | Success | - | |
|
exp_self.20260419144146.563_20260419_144146
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260419144146.563 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-19 14:42 | Success | - | |
|
exp_self.20260419143416.562_20260419_143416
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260419143416.562 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-19 14:35 | Success | - | |
|
exp_self.20260419142647.561_20260419_142648
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260419142647.561 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-19 14:27 | Success | - | |
|
exp_self.20260419141918.560_20260419_141918
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260419141918.560 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-19 14:20 | Success | - | |
|
exp_pytrain.20260419141643.138_20260419_141644
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-19 14:17 | Success | - | |
|
exp_self.20260419140950.559_20260419_140950
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260419140950.559 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-19 14:10 | Success | - | |
|
exp_self.20260419140218.558_20260419_140219
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260419140218.558 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-19 14:03 | Success | - | |
|
exp_self.20260419135445.557_20260419_135446
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260419135445.557 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-19 13:55 | Success | - | |
|
exp_self.20260419134717.556_20260419_134717
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260419134717.556 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-19 13:48 | Success | - | |
|
exp_pytrain.20260419134442.137_20260419_134442
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-19 13:45 | Success | - | |
| exp_self.20260419133740.555_20260419_133740 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260419133740.555 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-19 13:38 | Success | - | |
| exp_self.20260419133011.554_20260419_133011 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260419133011.554 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-19 13:31 | Success | - | |
| exp_self.20260419132245.553_20260419_132246 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260419132245.553 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-19 13:23 | Success | - | |
| exp_self.20260419131517.552_20260419_131517 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260419131517.552 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-19 13:16 | Success | - | |
| exp_pytrain.20260419131249.136_20260419_131249 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-19 13:13 | Success | - | |
| exp_self.20260419130659.551_20260419_130659 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260419130659.551 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-19 13:08 | Success | - | |
| exp_self.20260419125854.550_20260419_125854 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260419125854.550 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-19 12:59 | Success | - | |
| exp_self.20260419125120.549_20260419_125120 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260419125120.549 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-19 12:52 | Success | - | |
| exp_self.20260419124350.548_20260419_124350 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260419124350.548 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-19 12:44 | Success | - | |
| exp_pytrain.20260419124124.135_20260419_124124 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-19 12:42 | Success | - | |
| exp_self.20260419123429.547_20260419_123430 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260419123429.547 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-19 12:35 | Success | - | |
| exp_self.20260419122704.546_20260419_122705 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260419122704.546 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-19 12:28 | Success | - | |
| exp_self.20260419121934.545_20260419_121934 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260419121934.545 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-19 12:20 | Success | - | |
| exp_self.20260419121155.544_20260419_121155 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260419121155.544 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-19 12:12 | Success | - | |
| exp_pytrain.20260419120926.134_20260419_120926 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-19 12:10 | Success | - | |
| exp_self.20260419120231.543_20260419_120232 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260419120231.543 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-19 12:03 | Success | - | |
| exp_self.20260419115506.542_20260419_115507 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260419115506.542 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-19 11:56 | Success | - | |
| exp_self.20260419114737.541_20260419_114737 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260419114737.541 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-19 11:48 | Success | - | |
| exp_self.20260419114004.540_20260419_114004 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260419114004.540 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-19 11:41 | Success | - | |
| exp_pytrain.20260419113726.133_20260419_113726 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-19 11:38 | Success | - | |
| exp_self.20260419113022.539_20260419_113022 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260419113022.539 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-19 11:31 | Success | - | |
| exp_self.20260419112252.538_20260419_112253 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260419112252.538 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-19 11:23 | Success | - | |
| exp_self.20260419111523.537_20260419_111524 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260419111523.537 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-19 11:16 | Success | - | |
| exp_self.20260419110751.536_20260419_110751 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260419110751.536 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-19 11:08 | Success | - | |
| exp_pytrain.20260419110520.132_20260419_110520 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-19 11:06 | Success | - | |
| exp_self.20260419105823.535_20260419_105823 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260419105823.535 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-19 10:59 | Success | - | |
| exp_self.20260419105044.534_20260419_105045 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260419105044.534 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-19 10:51 | Success | - | |
| exp_self.20260419104318.533_20260419_104319 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260419104318.533 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-19 10:44 | Success | - | |
| exp_self.20260419103550.532_20260419_103551 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260419103550.532 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-19 10:36 | Success | - | |
| exp_pytrain.20260419103315.131_20260419_103315 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-19 10:34 | Success | - | |
| exp_self.20260419102620.531_20260419_102621 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260419102620.531 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-19 10:27 | Success | - | |
| exp_self.20260419101913.530_20260419_101914 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260419101913.530 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-19 10:20 | Success | - | |
| exp_self.20260419101135.529_20260419_101136 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260419101135.529 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-19 10:12 | Success | - | |
| exp_self.20260419100400.528_20260419_100401 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260419100400.528 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-19 10:05 | Success | - | |
| exp_pytrain.20260419100131.130_20260419_100131 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-19 10:02 | Success | - | |
| exp_self.20260419095426.527_20260419_095426 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260419095426.527 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-19 09:55 | Success | - | |
| exp_self.20260419094657.526_20260419_094657 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260419094657.526 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-19 09:48 | Success | - | |
| exp_self.20260419093926.525_20260419_093927 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260419093926.525 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-19 09:40 | Success | - | |
| exp_self.20260419093146.524_20260419_093147 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260419093146.524 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-19 09:32 | Success | - | |
| exp_pytrain.20260419092914.129_20260419_092914 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-19 09:30 | Success | - | |
| exp_self.20260419092211.523_20260419_092212 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260419092211.523 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-19 09:23 | Success | - | |
| exp_self.20260419091441.522_20260419_091441 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260419091441.522 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-19 09:15 | Success | - | |
| exp_self.20260419090713.521_20260419_090713 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260419090713.521 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-19 09:08 | Success | - | |
| exp_self.20260419085938.520_20260419_085939 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260419085938.520 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-19 09:00 | Success | - | |
| exp_pytrain.20260419085707.128_20260419_085707 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-19 08:58 | Success | - | |
| exp_self.20260419085004.519_20260419_085005 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260419085004.519 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-19 08:51 | Success | - | |
| exp_self.20260419084234.518_20260419_084234 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260419084234.518 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-19 08:43 | Success | - | |
| exp_self.20260419083506.517_20260419_083507 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260419083506.517 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-19 08:36 | Success | - | |
| exp_self.20260419082735.516_20260419_082735 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260419082735.516 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-19 08:28 | Success | - | |
| exp_pytrain.20260419082504.127_20260419_082505 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-19 08:26 | Success | - | |
| exp_self.20260419081810.515_20260419_081810 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260419081810.515 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-19 08:19 | Success | - | |
| exp_self.20260419081039.514_20260419_081039 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260419081039.514 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-19 08:11 | Success | - | |
| exp_self.20260419080309.513_20260419_080310 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260419080309.513 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-19 08:04 | Success | - | |
| exp_self.20260419075537.512_20260419_075538 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260419075537.512 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-19 07:56 | Success | - | |
| exp_pytrain.20260419075303.126_20260419_075303 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-19 07:54 | Success | - | |
| exp_self.20260419074609.511_20260419_074609 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260419074609.511 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-19 07:47 | Success | - | |
| exp_self.20260419073832.510_20260419_073833 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260419073832.510 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-19 07:39 | Success | - | |
| exp_self.20260419073100.509_20260419_073100 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260419073100.509 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-19 07:32 | Success | - | |
| exp_self.20260419072330.508_20260419_072330 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260419072330.508 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-19 07:24 | Success | - | |
| exp_pytrain.20260419072052.125_20260419_072052 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-19 07:21 | Success | - | |
| exp_self.20260419071353.507_20260419_071353 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260419071353.507 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-19 07:14 | Success | - | |
| exp_self.20260419070619.506_20260419_070619 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260419070619.506 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-19 07:07 | Success | - | |
| exp_self.20260419065842.505_20260419_065842 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260419065842.505 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-19 06:59 | Success | - | |
| exp_self.20260419065108.504_20260419_065109 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260419065108.504 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-19 06:52 | Success | - | |
| exp_pytrain.20260419064833.124_20260419_064834 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-19 06:49 | Success | - | |
| exp_self.20260419064140.503_20260419_064140 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260419064140.503 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-19 06:42 | Success | - | |
| exp_self.20260419063411.502_20260419_063411 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260419063411.502 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-19 06:35 | Success | - | |
| exp_self.20260419062643.501_20260419_062643 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260419062643.501 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-19 06:27 | Success | - | |
| exp_self.20260419061914.500_20260419_061914 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260419061914.500 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-19 06:20 | Success | - | |
| exp_pytrain.20260419061641.123_20260419_061641 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-19 06:17 | Success | - | |
| exp_self.20260419060944.499_20260419_060944 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260419060944.499 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-19 06:10 | Success | - | |
| exp_self.20260419060213.498_20260419_060213 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260419060213.498 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-19 06:03 | Success | - | |
| exp_self.20260419055446.497_20260419_055447 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260419055446.497 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-19 05:55 | Success | - | |
| exp_self.20260419054718.496_20260419_054718 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260419054718.496 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-19 05:48 | Success | - | |
| exp_pytrain.20260419054449.122_20260419_054450 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-19 05:45 | Success | - | |
| exp_self.20260419053756.495_20260419_053756 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260419053756.495 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-19 05:38 | Success | - | |
| exp_self.20260419053025.494_20260419_053025 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260419053025.494 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-19 05:31 | Success | - | |
| exp_self.20260419052256.493_20260419_052256 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260419052256.493 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-19 05:23 | Success | - | |
| exp_self.20260419051528.492_20260419_051529 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260419051528.492 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-19 05:16 | Success | - | |
| exp_pytrain.20260419051303.121_20260419_051304 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-19 05:14 | Success | - | |
| exp_self.20260419050738.491_20260419_050739 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260419050738.491 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-19 05:08 | Success | - | |
| exp_self.20260419045945.490_20260419_045946 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260419045945.490 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-19 05:00 | Success | - | |
| exp_self.20260419045146.489_20260419_045146 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260419045146.489 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-19 04:52 | Success | - | |
| exp_self.20260419044412.488_20260419_044413 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260419044412.488 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-19 04:45 | Success | - | |
| exp_pytrain.20260419044136.120_20260419_044137 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-19 04:42 | Success | - | |
| exp_self.20260419043440.487_20260419_043441 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260419043440.487 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-19 04:35 | Success | - | |
| exp_self.20260419042714.486_20260419_042714 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260419042714.486 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-19 04:28 | Success | - | |
| exp_self.20260419041946.485_20260419_041946 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260419041946.485 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-19 04:20 | Success | - | |
| exp_self.20260419041221.484_20260419_041221 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260419041221.484 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-19 04:13 | Success | - | |
| exp_pytrain.20260419040945.119_20260419_040945 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-19 04:10 | Success | - | |
| exp_self.20260419040253.483_20260419_040253 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260419040253.483 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-19 04:03 | Success | - | |
| exp_self.20260419035518.482_20260419_035519 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260419035518.482 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-19 03:56 | Success | - | |
| exp_self.20260419034751.481_20260419_034752 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260419034751.481 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-19 03:48 | Success | - | |
| exp_self.20260419034029.480_20260419_034029 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260419034029.480 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-19 03:41 | Success | - | |
| exp_pytrain.20260419033758.118_20260419_033759 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-19 03:39 | Success | - | |
|
exp_self.20260419033111.479_20260419_033111
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260419033111.479 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-19 03:32 | Success | - | |
|
exp_self.20260419032341.478_20260419_032342
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260419032341.478 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-19 03:24 | Success | - | |
|
exp_self.20260419031618.477_20260419_031618
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260419031618.477 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-19 03:17 | Success | - | |
|
exp_self.20260419030850.476_20260419_030850
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260419030850.476 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-19 03:09 | Success | - | |
| exp_pytrain.20260419030620.117_20260419_030620 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-19 03:07 | Success | - | |
| exp_self.20260419025927.475_20260419_025927 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260419025927.475 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-19 03:00 | Success | - | |
| exp_self.20260419025204.474_20260419_025205 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260419025204.474 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-19 02:53 | Success | - | |
| exp_self.20260419024434.473_20260419_024434 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260419024434.473 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-19 02:45 | Success | - | |
| exp_self.20260419023709.472_20260419_023709 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260419023709.472 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-19 02:38 | Success | - | |
| exp_pytrain.20260419023440.116_20260419_023441 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-19 02:35 | Success | - | |
| exp_self.20260419022749.471_20260419_022749 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260419022749.471 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-19 02:28 | Success | - | |
| exp_self.20260419022026.470_20260419_022026 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260419022026.470 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-19 02:21 | Success | - | |
| exp_self.20260419021254.469_20260419_021254 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260419021254.469 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-19 02:13 | Success | - | |
| exp_self.20260419020527.468_20260419_020528 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260419020527.468 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-19 02:06 | Success | - | |
| exp_pytrain.20260419020301.115_20260419_020302 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-19 02:04 | Success | - | |
| exp_self.20260419015600.467_20260419_015600 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260419015600.467 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-19 01:57 | Success | - | |
| exp_self.20260419014834.466_20260419_014834 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260419014834.466 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-19 01:49 | Success | - | |
| exp_self.20260419014110.465_20260419_014110 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260419014110.465 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-19 01:42 | Success | - | |
| exp_self.20260419013340.464_20260419_013340 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260419013340.464 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-19 01:34 | Success | - | |
| exp_pytrain.20260419013110.114_20260419_013110 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-19 01:32 | Success | - | |
| exp_self.20260419012409.463_20260419_012409 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260419012409.463 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-19 01:25 | Success | - | |
| exp_self.20260419011640.462_20260419_011641 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260419011640.462 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-19 01:17 | Success | - | |
| exp_self.20260419010913.461_20260419_010913 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260419010913.461 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-19 01:10 | Success | - | |
| exp_self.20260419010139.460_20260419_010140 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260419010139.460 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-19 01:02 | Success | - | |
| exp_pytrain.20260419005910.113_20260419_005911 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-19 01:00 | Success | - | |
| exp_self.20260419005211.459_20260419_005211 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260419005211.459 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-19 00:53 | Success | - | |
| exp_self.20260419004447.458_20260419_004448 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260419004447.458 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-19 00:45 | Success | - | |
| exp_self.20260419003719.457_20260419_003720 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260419003719.457 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-19 00:38 | Success | - | |
| exp_self.20260419002939.456_20260419_002940 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260419002939.456 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-19 00:30 | Success | - | |
| exp_pytrain.20260419002653.112_20260419_002654 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-19 00:27 | Success | - | |
| exp_self.20260419001943.455_20260419_001943 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260419001943.455 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-19 00:20 | Success | - | |
| exp_self.20260419001159.454_20260419_001159 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260419001159.454 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-19 00:13 | Success | - | |
| exp_self.20260419000409.453_20260419_000410 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260419000409.453 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-19 00:05 | Success | - | |
| exp_cr_10.1108_compel-11-2025-0530_20260419_000037 | Analytical calculation model of eddy current loss of power transformer winding using method of images - Paper ID: cr_10.1108_compel-11-2025-0530 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Rec... | 04-19 00:01 | Success | - | |
| exp_self.20260418235718.452_20260418_235718 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260418235718.452 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-18 23:58 | Success | - | |
| exp_pytrain.20260418235432.111_20260418_235432 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-18 23:55 | Success | - | |
| exp_self.20260418234901.451_20260418_234902 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260418234901.451 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-18 23:50 | Success | - | |
| exp_self.20260418234112.450_20260418_234113 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260418234112.450 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-18 23:42 | Success | - | |
| exp_self.20260418233322.449_20260418_233322 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260418233322.449 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-18 23:34 | Success | - | |
| exp_self.20260418232526.448_20260418_232526 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260418232526.448 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-18 23:26 | Success | - | |
| exp_pytrain.20260418232248.110_20260418_232248 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-18 23:23 | Success | - | |
| exp_self.20260418231640.447_20260418_231640 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260418231640.447 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-18 23:17 | Success | - | |
| exp_self.20260418230849.446_20260418_230850 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260418230849.446 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-18 23:09 | Success | - | |
| exp_self.20260418230100.445_20260418_230101 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260418230100.445 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-18 23:02 | Success | - | |
| exp_self.20260418225320.444_20260418_225320 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260418225320.444 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-18 22:54 | Success | - | |
| exp_pytrain.20260418225027.109_20260418_225027 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-18 22:51 | Success | - | |
| exp_self.20260418224453.443_20260418_224454 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260418224453.443 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-18 22:45 | Success | - | |
| exp_self.20260418223705.442_20260418_223705 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260418223705.442 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-18 22:38 | Success | - | |
| exp_self.20260418222917.441_20260418_222917 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260418222917.441 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-18 22:30 | Success | - | |
| exp_self.20260418222129.440_20260418_222130 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260418222129.440 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-18 22:22 | Success | - | |
| exp_pytrain.20260418221849.108_20260418_221849 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-18 22:19 | Success | - | |
| exp_self.20260418221313.439_20260418_221313 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260418221313.439 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-18 22:14 | Success | - | |
| exp_self.20260418220533.438_20260418_220533 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260418220533.438 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-18 22:06 | Success | - | |
| exp_self.20260418215744.437_20260418_215744 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260418215744.437 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-18 21:58 | Success | - | |
| exp_self.20260418215003.436_20260418_215003 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260418215003.436 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-18 21:51 | Success | - | |
| exp_pytrain.20260418214716.107_20260418_214716 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-18 21:48 | Success | - | |
| exp_self.20260418214140.435_20260418_214141 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260418214140.435 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-18 21:42 | Success | - | |
| exp_self.20260418213358.434_20260418_213358 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260418213358.434 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-18 21:35 | Success | - | |
| exp_self.20260418212618.433_20260418_212618 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260418212618.433 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-18 21:27 | Success | - | |
| exp_self.20260418211827.432_20260418_211827 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260418211827.432 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-18 21:19 | Success | - | |
| exp_pytrain.20260418211549.106_20260418_211550 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-18 21:16 | Success | - | |
| exp_self.20260418210835.431_20260418_210835 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260418210835.431 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-18 21:09 | Success | - | |
| exp_self.20260418210141.430_20260418_210142 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260418210141.430 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-18 21:02 | Success | - | |
| exp_self.20260418205350.429_20260418_205350 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260418205350.429 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-18 20:54 | Success | - | |
| exp_self.20260418204558.428_20260418_204558 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260418204558.428 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-18 20:47 | Success | - | |
| exp_pytrain.20260418204316.105_20260418_204316 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-18 20:44 | Success | - | |
| exp_self.20260418203703.427_20260418_203703 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260418203703.427 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-18 20:38 | Success | - | |
| exp_self.20260418202911.426_20260418_202911 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260418202911.426 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-18 20:30 | Success | - | |
| exp_self.20260418202118.425_20260418_202119 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260418202118.425 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-18 20:22 | Success | - | |
| exp_self.20260418201328.424_20260418_201329 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260418201328.424 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-18 20:14 | Success | - | |
| exp_pytrain.20260418201046.104_20260418_201046 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-18 20:11 | Success | - | |
| exp_self.20260418200329.423_20260418_200329 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260418200329.423 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-18 20:04 | Success | - | |
| exp_gh_donitb934_1Cat-vLLM_20260418_200004 | donitb934/1Cat-vLLM - Paper ID: gh_donitb934_1Cat-vLLM - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered b... | 04-18 20:01 | Success | - | |
| exp_self.20260418195635.422_20260418_195635 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260418195635.422 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-18 19:57 | Success | - | |
| exp_self.20260418194852.421_20260418_194853 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260418194852.421 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-18 19:49 | Success | - | |
| exp_self.20260418194119.420_20260418_194120 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260418194119.420 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-18 19:42 | Success | - | |
| exp_pytrain.20260418193847.103_20260418_193848 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-18 19:39 | Success | - | |
| exp_self.20260418193146.419_20260418_193146 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260418193146.419 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-18 19:32 | Success | - | |
| exp_self.20260418192416.418_20260418_192416 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260418192416.418 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-18 19:25 | Success | - | |
| exp_self.20260418191644.417_20260418_191645 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260418191644.417 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-18 19:17 | Success | - | |
| exp_self.20260418190914.416_20260418_190914 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260418190914.416 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-18 19:10 | Success | - | |
| exp_pytrain.20260418190636.102_20260418_190637 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-18 19:07 | Success | - | |
| exp_self.20260418185942.415_20260418_185942 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260418185942.415 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-18 19:00 | Success | - | |
| exp_self.20260418185209.414_20260418_185210 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260418185209.414 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-18 18:53 | Success | - | |
| exp_self.20260418184436.413_20260418_184436 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260418184436.413 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-18 18:45 | Success | - | |
| exp_self.20260418183707.412_20260418_183707 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260418183707.412 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-18 18:38 | Success | - | |
| exp_pytrain.20260418183429.101_20260418_183430 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-18 18:35 | Success | - | |
| exp_self.20260418182725.411_20260418_182726 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260418182725.411 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-18 18:28 | Success | - | |
| exp_self.20260418181954.410_20260418_181955 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260418181954.410 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-18 18:20 | Success | - | |
| exp_self.20260418181224.409_20260418_181224 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260418181224.409 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-18 18:13 | Success | - | |
| exp_self.20260418180458.408_20260418_180458 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260418180458.408 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-18 18:06 | Success | - | |
| exp_pytrain.20260418180223.100_20260418_180224 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-18 18:03 | Success | - | |
| exp_self.20260418175805.407_20260418_175806 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260418175805.407 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-18 17:59 | Success | - | |
| exp_self.20260418175032.406_20260418_175033 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260418175032.406 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-18 17:51 | Success | - | |
|
exp_cr_10.32628_ijsrst52310283_20260418_174742
|
Enhancing Transformer Attention Mechanisms for Knowledge Retention in Fine-Tuned Large Language Models
Paper ID: cr_10.32628_ijsrst52310283 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recover...
|
04-18 17:48 | Success | - | |
|
exp_self.20260418174041.405_20260418_174041
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260418174041.405 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-18 17:41 | Success | - | |
|
exp_self.20260418173309.404_20260418_173309
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260418173309.404 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-18 17:34 | Success | - | |
|
exp_pytrain.20260418173035.099_20260418_173035
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-18 17:31 | Success | - | |
|
exp_self.20260418172329.403_20260418_172329
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260418172329.403 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-18 17:24 | Success | - | |
|
exp_self.20260418171601.402_20260418_171601
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260418171601.402 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-18 17:17 | Success | - | |
|
exp_self.20260418170833.401_20260418_170833
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260418170833.401 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-18 17:09 | Success | - | |
|
exp_self.20260418170053.400_20260418_170054
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260418170053.400 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-18 17:01 | Success | - | |
|
exp_pytrain.20260418165817.098_20260418_165818
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-18 16:59 | Success | - | |
|
exp_self.20260418165124.399_20260418_165125
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260418165124.399 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-18 16:52 | Success | - | |
|
exp_self.20260418164358.398_20260418_164358
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260418164358.398 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-18 16:45 | Success | - | |
|
exp_self.20260418163631.397_20260418_163631
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260418163631.397 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-18 16:37 | Success | - | |
|
exp_self.20260418162906.396_20260418_162907
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260418162906.396 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-18 16:30 | Success | - | |
|
exp_pytrain.20260418162635.097_20260418_162635
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-18 16:27 | Success | - | |
|
exp_self.20260418161943.395_20260418_161943
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260418161943.395 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-18 16:20 | Success | - | |
|
exp_self.20260418161218.394_20260418_161218
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260418161218.394 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-18 16:13 | Success | - | |
|
exp_self.20260418160446.393_20260418_160447
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260418160446.393 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-18 16:05 | Success | - | |
|
exp_self.20260418155720.392_20260418_155720
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260418155720.392 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-18 15:58 | Success | - | |
|
exp_pytrain.20260418155443.096_20260418_155444
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-18 15:55 | Success | - | |
|
exp_self.20260418154749.391_20260418_154750
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260418154749.391 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-18 15:48 | Success | - | |
|
exp_self.20260418154018.390_20260418_154018
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260418154018.390 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-18 15:41 | Success | - | |
|
exp_self.20260418153250.389_20260418_153251
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260418153250.389 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-18 15:33 | Success | - | |
|
exp_self.20260418152523.388_20260418_152524
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260418152523.388 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-18 15:26 | Success | - | |
|
exp_pytrain.20260418152251.095_20260418_152251
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-18 15:23 | Success | - | |
|
exp_self.20260418151559.387_20260418_151600
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260418151559.387 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-18 15:17 | Success | - | |
|
exp_self.20260418150819.386_20260418_150820
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260418150819.386 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-18 15:09 | Success | - | |
|
exp_gh_Bhavesh716_LLM-from-Scratch_20260418_150500
|
Bhavesh716/LLM-from-Scratch
Paper ID: gh_Bhavesh716_LLM-from-Scratch - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Rec...
|
04-18 15:06 | Success | - | |
|
exp_self.20260418150033.385_20260418_150033
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260418150033.385 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-18 15:01 | Success | - | |
|
exp_self.20260418145301.384_20260418_145301
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260418145301.384 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-18 14:54 | Success | - | |
|
exp_pytrain.20260418145033.094_20260418_145033
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-18 14:51 | Success | - | |
|
exp_self.20260418144331.383_20260418_144331
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260418144331.383 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-18 14:44 | Success | - | |
|
exp_self.20260418143605.382_20260418_143606
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260418143605.382 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-18 14:37 | Success | - | |
|
exp_self.20260418142840.381_20260418_142840
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260418142840.381 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-18 14:29 | Success | - | |
|
exp_self.20260418142110.380_20260418_142110
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260418142110.380 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-18 14:22 | Success | - | |
|
exp_pytrain.20260418141834.093_20260418_141834
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-18 14:19 | Success | - | |
|
exp_self.20260418141142.379_20260418_141142
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260418141142.379 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-18 14:12 | Success | - | |
|
exp_self.20260418140409.378_20260418_140409
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260418140409.378 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-18 14:05 | Success | - | |
|
exp_self.20260418135637.377_20260418_135637
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260418135637.377 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-18 13:57 | Success | - | |
|
exp_self.20260418134905.376_20260418_134905
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260418134905.376 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-18 13:50 | Success | - | |
|
exp_pytrain.20260418134627.092_20260418_134627
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-18 13:47 | Success | - | |
|
exp_self.20260418133933.375_20260418_133934
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260418133933.375 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-18 13:40 | Success | - | |
|
exp_self.20260418133202.374_20260418_133202
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260418133202.374 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-18 13:33 | Success | - | |
|
exp_self.20260418132433.373_20260418_132433
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260418132433.373 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-18 13:25 | Success | - | |
|
exp_self.20260418131709.372_20260418_131710
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260418131709.372 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-18 13:18 | Success | - | |
|
exp_pytrain.20260418131433.091_20260418_131434
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-18 13:15 | Success | - | |
|
exp_self.20260418130742.371_20260418_130742
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260418130742.371 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-18 13:08 | Success | - | |
|
exp_self.20260418130013.370_20260418_130013
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260418130013.370 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-18 13:01 | Success | - | |
|
exp_self.20260418125241.369_20260418_125241
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260418125241.369 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-18 12:53 | Success | - | |
|
exp_self.20260418124513.368_20260418_124514
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260418124513.368 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-18 12:46 | Success | - | |
|
exp_pytrain.20260418124240.090_20260418_124241
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-18 12:43 | Success | - | |
|
exp_self.20260418123550.367_20260418_123550
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260418123550.367 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-18 12:36 | Success | - | |
|
exp_self.20260418122819.366_20260418_122819
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260418122819.366 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-18 12:29 | Success | - | |
|
exp_self.20260418122023.365_20260418_122024
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260418122023.365 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-18 12:21 | Success | - | |
|
exp_self.20260418121255.364_20260418_121256
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260418121255.364 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-18 12:13 | Success | - | |
|
exp_pytrain.20260418121023.089_20260418_121023
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-18 12:11 | Success | - | |
|
exp_self.20260418120334.363_20260418_120334
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260418120334.363 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-18 12:04 | Success | - | |
|
exp_self.20260418115616.362_20260418_115616
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260418115616.362 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-18 11:57 | Success | - | |
|
exp_self.20260418114832.361_20260418_114832
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260418114832.361 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-18 11:49 | Success | - | |
|
exp_self.20260418114040.360_20260418_114041
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260418114040.360 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-18 11:41 | Success | - | |
|
exp_pytrain.20260418113759.088_20260418_113759
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-18 11:39 | Success | - | |
|
exp_self.20260418113151.359_20260418_113152
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260418113151.359 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-18 11:32 | Success | - | |
|
exp_self.20260418112407.358_20260418_112408
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260418112407.358 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-18 11:25 | Success | - | |
|
exp_self.20260418111624.357_20260418_111624
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260418111624.357 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-18 11:17 | Success | - | |
|
exp_self.20260418110836.356_20260418_110837
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260418110836.356 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-18 11:09 | Success | - | |
|
exp_pytrain.20260418110550.087_20260418_110550
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-18 11:06 | Success | - | |
|
exp_self.20260418110023.355_20260418_110023
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260418110023.355 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-18 11:01 | Success | - | |
|
exp_self.20260418105241.354_20260418_105241
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260418105241.354 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-18 10:53 | Success | - | |
|
exp_self.20260418104449.353_20260418_104449
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260418104449.353 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-18 10:45 | Success | - | |
|
exp_self.20260418103650.352_20260418_103650
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260418103650.352 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-18 10:37 | Success | - | |
|
exp_pytrain.20260418103415.086_20260418_103415
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-18 10:35 | Success | - | |
|
exp_self.20260418102656.351_20260418_102657
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260418102656.351 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-18 10:28 | Success | - | |
|
exp_self.20260418101926.350_20260418_101926
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260418101926.350 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-18 10:20 | Success | - | |
|
exp_self.20260418101151.349_20260418_101151
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260418101151.349 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-18 10:12 | Success | - | |
|
exp_self.20260418100420.348_20260418_100420
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260418100420.348 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-18 10:05 | Success | - | |
|
exp_pytrain.20260418100151.085_20260418_100151
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-18 10:02 | Success | - | |
|
exp_self.20260418095444.347_20260418_095444
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260418095444.347 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-18 09:55 | Success | - | |
|
exp_self.20260418094705.346_20260418_094705
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260418094705.346 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-18 09:48 | Success | - | |
|
exp_self.20260418093934.345_20260418_093935
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260418093934.345 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-18 09:40 | Success | - | |
|
exp_self.20260418093148.344_20260418_093148
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260418093148.344 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-18 09:32 | Success | - | |
|
exp_pytrain.20260418092909.084_20260418_092909
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-18 09:30 | Success | - | |
|
exp_self.20260418092445.343_20260418_092445
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260418092445.343 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-18 09:25 | Success | - | |
|
exp_self.20260418091718.342_20260418_091719
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260418091718.342 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-18 09:18 | Success | - | |
|
exp_self.20260418090940.341_20260418_090940
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260418090940.341 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-18 09:10 | Success | - | |
|
exp_self.20260418090201.340_20260418_090201
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260418090201.340 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-18 09:03 | Success | - | |
|
exp_gh_Sidgithub18_mlbuild_20260418_085912
|
Sidgithub18/mlbuild
Paper ID: gh_Sidgithub18_mlbuild - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered b...
|
04-18 09:00 | Success | - | |
|
exp_pytrain.20260418085654.083_20260418_085654
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-18 08:57 | Success | - | |
|
exp_self.20260418085002.339_20260418_085003
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260418085002.339 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-18 08:51 | Success | - | |
|
exp_self.20260418084227.338_20260418_084227
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260418084227.338 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-18 08:43 | Success | - | |
|
exp_self.20260418083455.337_20260418_083455
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260418083455.337 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-18 08:35 | Success | - | |
|
exp_self.20260418082727.336_20260418_082727
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260418082727.336 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-18 08:28 | Success | - | |
| exp_pytrain.20260418082454.082_20260418_082455 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-18 08:25 | Success | - | |
| exp_self.20260418081800.335_20260418_081800 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260418081800.335 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-18 08:19 | Success | - | |
| exp_self.20260418081036.334_20260418_081036 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260418081036.334 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-18 08:11 | Success | - | |
| exp_self.20260418080254.333_20260418_080254 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260418080254.333 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-18 08:03 | Success | - | |
| exp_self.20260418075509.332_20260418_075509 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260418075509.332 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-18 07:56 | Success | - | |
| exp_pytrain.20260418075219.081_20260418_075220 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-18 07:53 | Success | - | |
| exp_self.20260418074531.331_20260418_074531 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260418074531.331 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-18 07:46 | Success | - | |
| exp_self.20260418073806.330_20260418_073806 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260418073806.330 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-18 07:39 | Success | - | |
| exp_self.20260418073031.329_20260418_073031 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260418073031.329 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-18 07:31 | Success | - | |
| exp_self.20260418072304.328_20260418_072305 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260418072304.328 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-18 07:24 | Success | - | |
| exp_pytrain.20260418072041.080_20260418_072041 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-18 07:21 | Success | - | |
| exp_self.20260418071328.327_20260418_071329 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260418071328.327 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-18 07:14 | Success | - | |
| exp_self.20260418070557.326_20260418_070557 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260418070557.326 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-18 07:07 | Success | - | |
| exp_gh_maple3788_RAG_Lab_20260418_070128 | maple3788/RAG_Lab - Paper ID: gh_maple3788_RAG_Lab - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered ben... | 04-18 07:02 | Success | - | |
| exp_self.20260418065812.325_20260418_065812 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260418065812.325 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-18 06:59 | Success | - | |
| exp_self.20260418065058.324_20260418_065059 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260418065058.324 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-18 06:52 | Success | - | |
| exp_pytrain.20260418064817.079_20260418_064817 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-18 06:49 | Success | - | |
| exp_self.20260418064138.323_20260418_064138 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260418064138.323 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-18 06:42 | Success | - | |
| exp_self.20260418063422.322_20260418_063422 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260418063422.322 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-18 06:35 | Success | - | |
| exp_self.20260418062706.321_20260418_062706 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260418062706.321 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-18 06:28 | Success | - | |
| exp_self.20260418061954.320_20260418_061954 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260418061954.320 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-18 06:20 | Success | - | |
| exp_pytrain.20260418061627.078_20260418_061628 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-18 06:17 | Success | - | |
| exp_self.20260418061224.319_20260418_061224 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260418061224.319 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-18 06:13 | Success | - | |
| exp_self.20260418060513.318_20260418_060513 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260418060513.318 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-18 06:06 | Success | - | |
| exp_self.20260418055800.317_20260418_055800 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260418055800.317 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-18 05:59 | Success | - | |
| exp_self.20260418055042.316_20260418_055043 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260418055042.316 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-18 05:51 | Success | - | |
| exp_pytrain.20260418054506.077_20260418_054506 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-18 05:46 | Success | - | |
| exp_self.20260418054318.315_20260418_054318 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260418054318.315 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-18 05:44 | Success | - | |
| exp_self.20260418053600.314_20260418_053600 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260418053600.314 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-18 05:37 | Success | - | |
| exp_self.20260418052844.313_20260418_052844 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260418052844.313 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-18 05:29 | Success | - | |
| exp_self.20260418052132.312_20260418_052132 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260418052132.312 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-18 05:22 | Success | - | |
| exp_gh_hussin2323332_slrm-lumin-fusion_20260418_051826 | hussin2323332/slrm-lumin-fusion - Paper ID: gh_hussin2323332_slrm-lumin-fusion - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal:... | 04-18 05:19 | Success | - | |
| exp_self.20260418051413.311_20260418_051413 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260418051413.311 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-18 05:15 | Success | - | |
| exp_pytrain.20260418051159.076_20260418_051159 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-18 05:13 | Success | - | |
| exp_gh_mzuhair9933_PoPE-pytorch_20260418_050920 | mzuhair9933/PoPE-pytorch - Paper ID: gh_mzuhair9933_PoPE-pytorch - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recove... | 04-18 05:10 | Success | - | |
| exp_self.20260418050510.310_20260418_050511 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260418050510.310 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-18 05:06 | Success | - | |
| exp_self.20260418045757.309_20260418_045757 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260418045757.309 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-18 04:58 | Success | - | |
| exp_self.20260418045045.308_20260418_045046 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260418045045.308 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-18 04:51 | Success | - | |
| exp_self.20260418044336.307_20260418_044336 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260418044336.307 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-18 04:44 | Success | - | |
| exp_pytrain.20260418044010.075_20260418_044010 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-18 04:41 | Success | - | |
| exp_self.20260418043607.306_20260418_043608 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260418043607.306 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-18 04:37 | Success | - | |
| exp_self.20260418042856.305_20260418_042857 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260418042856.305 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-18 04:29 | Success | - | |
| exp_self.20260418042142.304_20260418_042143 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260418042142.304 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-18 04:22 | Success | - | |
| exp_self.20260418041419.303_20260418_041420 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260418041419.303 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-18 04:15 | Success | - | |
| exp_pytrain.20260418040844.074_20260418_040844 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-18 04:09 | Success | - | |
| exp_self.20260418040656.302_20260418_040656 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260418040656.302 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-18 04:07 | Success | - | |
| exp_self.20260418035937.301_20260418_035937 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260418035937.301 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-18 04:00 | Success | - | |
| exp_self.20260418035225.300_20260418_035225 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260418035225.300 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-18 03:53 | Success | - | |
| exp_self.20260418034514.299_20260418_034514 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260418034514.299 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-18 03:46 | Success | - | |
| exp_self.20260418033802.298_20260418_033802 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260418033802.298 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-18 03:39 | Success | - | |
| exp_pytrain.20260418033540.073_20260418_033540 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-18 03:36 | Success | - | |
| exp_oa_W7154587199_20260418_033300 | Mapping the LLM Landscape: A Cross-Family Survey of Architectures, Alignment Methods, and Benchmark Performance - Paper ID: oa_W7154587199 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark... | 04-18 03:34 | Success | - | |
| exp_self.20260418032743.297_20260418_032743 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260418032743.297 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-18 03:28 | Success | - | |
| exp_self.20260418032024.296_20260418_032025 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260418032024.296 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-18 03:21 | Success | - | |
| exp_self.20260418031311.295_20260418_031311 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260418031311.295 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-18 03:14 | Success | - | |
| exp_self.20260418030601.294_20260418_030601 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260418030601.294 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-18 03:07 | Success | - | |
| exp_pytrain.20260418030235.072_20260418_030235 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-18 03:03 | Success | - | |
| exp_self.20260418025829.293_20260418_025829 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260418025829.293 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-18 02:59 | Success | - | |
| exp_self.20260418025118.292_20260418_025118 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260418025118.292 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-18 02:52 | Success | - | |
| exp_self.20260418024406.291_20260418_024406 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260418024406.291 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-18 02:45 | Success | - | |
| exp_self.20260418023649.290_20260418_023649 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260418023649.290 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-18 02:37 | Success | - | |
| exp_pytrain.20260418023112.071_20260418_023112 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-18 02:32 | Success | - | |
| exp_self.20260418022924.289_20260418_022924 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260418022924.289 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-18 02:30 | Success | - | |
| exp_self.20260418022207.288_20260418_022208 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260418022207.288 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-18 02:23 | Success | - | |
| exp_self.20260418021453.287_20260418_021453 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260418021453.287 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-18 02:15 | Success | - | |
| exp_self.20260418020741.286_20260418_020742 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260418020741.286 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-18 02:08 | Success | - | |
| exp_self.20260418020027.285_20260418_020028 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260418020027.285 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-18 02:01 | Success | - | |
| exp_pytrain.20260418015804.070_20260418_015805 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-18 01:59 | Success | - | |
| exp_self.20260418015125.284_20260418_015125 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260418015125.284 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-18 01:52 | Success | - | |
| exp_self.20260418014409.283_20260418_014410 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260418014409.283 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-18 01:45 | Success | - | |
| exp_self.20260418013657.282_20260418_013657 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260418013657.282 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-18 01:37 | Success | - | |
| exp_self.20260418012946.281_20260418_012946 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260418012946.281 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-18 01:30 | Success | - | |
| exp_pytrain.20260418012619.069_20260418_012620 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-18 01:27 | Success | - | |
| exp_self.20260418012216.280_20260418_012216 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260418012216.280 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-18 01:23 | Success | - | |
| exp_self.20260418011505.279_20260418_011506 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260418011505.279 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-18 01:16 | Success | - | |
| exp_self.20260418010753.278_20260418_010753 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260418010753.278 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-18 01:08 | Success | - | |
| exp_self.20260418010038.277_20260418_010039 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260418010038.277 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-18 01:01 | Success | - | |
| exp_gh_n24q02m_qwen3-embed_20260418_005755 | n24q02m/qwen3-embed - Paper ID: gh_n24q02m_qwen3-embed - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered b... | 04-18 00:58 | Success | - | |
| exp_pytrain.20260418005454.068_20260418_005454 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-18 00:55 | Success | - | |
| exp_self.20260418005052.276_20260418_005053 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260418005052.276 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-18 00:51 | Success | - | |
| exp_self.20260418004338.275_20260418_004338 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260418004338.275 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-18 00:44 | Success | - | |
| exp_self.20260418003624.274_20260418_003624 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260418003624.274 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-18 00:37 | Success | - | |
| exp_self.20260418002908.273_20260418_002909 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260418002908.273 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-18 00:30 | Success | - | |
| exp_pytrain.20260418002335.067_20260418_002335 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-18 00:24 | Success | - | |
| exp_self.20260418002146.272_20260418_002147 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260418002146.272 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-18 00:22 | Success | - | |
| exp_self.20260418001428.271_20260418_001428 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260418001428.271 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-18 00:15 | Success | - | |
| exp_self.20260418000712.270_20260418_000712 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260418000712.270 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-18 00:08 | Success | - | |
| exp_self.20260418000001.269_20260418_000002 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260418000001.269 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-18 00:01 | Success | - | |
| exp_self.20260417235252.268_20260417_235252 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260417235252.268 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-17 23:53 | Success | - | |
| exp_pytrain.20260417235030.066_20260417_235031 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-17 23:51 | Success | - | |
| exp_self.20260417234521.267_20260417_234521 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260417234521.267 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-17 23:46 | Success | - | |
| exp_gh_reissuerenewal84_moe-compress_20260417_234001 | reissuerenewal84/moe-compress - Paper ID: gh_reissuerenewal84_moe-compress - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: R... | 04-17 23:41 | Success | - | |
| exp_self.20260417233802.266_20260417_233803 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260417233802.266 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-17 23:39 | Success | - | |
| exp_self.20260417233051.265_20260417_233051 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260417233051.265 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-17 23:31 | Success | - | |
| exp_gh_lakshgk_distill_20260417_232810 | lakshgk/distill - Paper ID: gh_lakshgk_distill - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered bench... | 04-17 23:29 | Success | - | |
|
exp_self.20260417232116.264_20260417_232116
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417232116.264 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 23:22 | Success | - | |
|
exp_pytrain.20260417231858.065_20260417_231858
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-17 23:20 | Success | - | |
|
exp_self.20260417231134.263_20260417_231134
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417231134.263 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 23:12 | Success | - | |
|
exp_self.20260417230409.262_20260417_230409
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417230409.262 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 23:05 | Success | - | |
|
exp_self.20260417225644.261_20260417_225645
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417225644.261 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 22:57 | Success | - | |
|
exp_self.20260417224900.260_20260417_224901
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417224900.260 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 22:50 | Success | - | |
|
exp_pytrain.20260417224631.064_20260417_224631
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-17 22:47 | Success | - | |
|
exp_self.20260417223940.259_20260417_223941
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417223940.259 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 22:40 | Success | - | |
|
exp_self.20260417223219.258_20260417_223219
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417223219.258 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 22:33 | Success | - | |
|
exp_self.20260417222456.257_20260417_222457
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417222456.257 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 22:25 | Success | - | |
|
exp_self.20260417221734.256_20260417_221735
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417221734.256 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 22:18 | Success | - | |
|
exp_pytrain.20260417221504.063_20260417_221504
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-17 22:16 | Success | - | |
|
exp_self.20260417220817.255_20260417_220817
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417220817.255 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 22:09 | Success | - | |
|
exp_self.20260417220049.254_20260417_220049
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417220049.254 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 22:01 | Success | - | |
|
exp_self.20260417215324.253_20260417_215325
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417215324.253 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 21:54 | Success | - | |
|
exp_self.20260417214601.252_20260417_214601
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417214601.252 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 21:47 | Success | - | |
|
exp_pytrain.20260417214327.062_20260417_214327
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-17 21:44 | Success | - | |
|
exp_self.20260417213636.251_20260417_213636
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417213636.251 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 21:37 | Success | - | |
|
exp_self.20260417212909.250_20260417_212909
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417212909.250 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 21:30 | Success | - | |
|
exp_gh_sanjeev-ragunathan_evolution-of-ml_20260417_212341
|
sanjeev-ragunathan/evolution-of-ml
Paper ID: gh_sanjeev-ragunathan_evolution-of-ml - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Sign...
|
04-17 21:24 | Success | - | |
|
exp_self.20260417212128.249_20260417_212129
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417212128.249 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 21:22 | Success | - | |
|
exp_self.20260417211351.248_20260417_211351
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417211351.248 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 21:14 | Success | - | |
|
exp_pytrain.20260417211115.061_20260417_211116
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-17 21:12 | Success | - | |
|
exp_self.20260417210408.247_20260417_210408
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417210408.247 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 21:05 | Success | - | |
|
exp_self.20260417205647.246_20260417_205648
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417205647.246 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 20:57 | Success | - | |
|
exp_self.20260417204926.245_20260417_204926
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417204926.245 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 20:50 | Success | - | |
|
exp_self.20260417204202.244_20260417_204202
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417204202.244 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 20:43 | Success | - | |
|
exp_pytrain.20260417203937.060_20260417_203937
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-17 20:40 | Success | - | |
|
exp_self.20260417203309.243_20260417_203309
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417203309.243 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 20:34 | Success | - | |
|
exp_self.20260417202607.242_20260417_202607
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417202607.242 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 20:27 | Success | - | |
|
exp_gh_mtmatheuus_QKV-Core_20260417_202256
|
mtmatheuus/QKV-Core
Paper ID: gh_mtmatheuus_QKV-Core - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered b...
|
04-17 20:23 | Success | - | |
|
exp_self.20260417201645.241_20260417_201645
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417201645.241 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 20:17 | Success | - | |
|
exp_self.20260417200923.240_20260417_200923
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417200923.240 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 20:10 | Success | - | |
|
exp_pytrain.20260417200654.059_20260417_200655
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-17 20:07 | Success | - | |
|
exp_self.20260417200134.239_20260417_200135
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417200134.239 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 20:02 | Success | - | |
|
exp_self.20260417195414.238_20260417_195414
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417195414.238 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 19:55 | Success | - | |
|
exp_self.20260417194651.237_20260417_194651
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417194651.237 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 19:47 | Success | - | |
|
exp_self.20260417193927.236_20260417_193927
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417193927.236 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 19:40 | Success | - | |
|
exp_pytrain.20260417193445.058_20260417_193446
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-17 19:35 | Success | - | |
|
exp_self.20260417193248.235_20260417_193248
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417193248.235 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 19:33 | Success | - | |
|
exp_self.20260417192526.234_20260417_192527
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417192526.234 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 19:26 | Success | - | |
|
exp_self.20260417191807.233_20260417_191808
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417191807.233 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 19:19 | Success | - | |
|
exp_self.20260417191045.232_20260417_191046
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417191045.232 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 19:11 | Success | - | |
|
exp_self.20260417190328.231_20260417_190328
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417190328.231 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 19:04 | Success | - | |
|
exp_pytrain.20260417190109.057_20260417_190109
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-17 19:02 | Success | - | |
|
exp_self.20260417185422.230_20260417_185423
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417185422.230 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 18:55 | Success | - | |
|
exp_self.20260417184710.229_20260417_184710
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417184710.229 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 18:48 | Success | - | |
|
exp_self.20260417183950.228_20260417_183951
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417183950.228 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 18:40 | Success | - | |
|
exp_self.20260417183231.227_20260417_183232
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417183231.227 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 18:33 | Success | - | |
|
exp_pytrain.20260417182906.056_20260417_182907
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-17 18:30 | Success | - | |
|
exp_self.20260417182503.226_20260417_182504
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417182503.226 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 18:26 | Success | - | |
|
exp_self.20260417181751.225_20260417_181752
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417181751.225 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 18:18 | Success | - | |
|
exp_self.20260417181039.224_20260417_181039
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417181039.224 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 18:11 | Success | - | |
|
exp_self.20260417180329.223_20260417_180329
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417180329.223 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 18:04 | Success | - | |
|
exp_pytrain.20260417175713.055_20260417_175713
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-17 17:58 | Success | - | |
|
exp_self.20260417175524.222_20260417_175525
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417175524.222 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 17:56 | Success | - | |
|
exp_self.20260417174812.221_20260417_174813
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417174812.221 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 17:49 | Success | - | |
|
exp_self.20260417174103.220_20260417_174103
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417174103.220 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 17:42 | Success | - | |
|
exp_self.20260417173347.219_20260417_173348
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417173347.219 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 17:34 | Success | - | |
|
exp_self.20260417172625.218_20260417_172626
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417172625.218 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 17:27 | Success | - | |
|
exp_pytrain.20260417172338.054_20260417_172339
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-17 17:24 | Success | - | |
|
exp_self.20260417171927.217_20260417_171928
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417171927.217 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 17:20 | Success | - | |
|
exp_self.20260417171216.216_20260417_171217
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417171216.216 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 17:13 | Success | - | |
|
exp_self.20260417170507.215_20260417_170508
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417170507.215 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 17:06 | Success | - | |
|
exp_self.20260417165759.214_20260417_165800
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417165759.214 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 16:59 | Success | - | |
|
exp_pytrain.20260417165220.053_20260417_165220
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-17 16:53 | Success | - | |
|
exp_self.20260417165031.213_20260417_165031
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417165031.213 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 16:51 | Success | - | |
|
exp_self.20260417164322.212_20260417_164322
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417164322.212 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 16:44 | Success | - | |
|
exp_self.20260417163603.211_20260417_163603
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417163603.211 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 16:37 | Success | - | |
|
exp_self.20260417162850.210_20260417_162850
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417162850.210 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 16:29 | Success | - | |
|
exp_self.20260417162142.209_20260417_162143
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417162142.209 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 16:22 | Success | - | |
|
exp_pytrain.20260417161928.052_20260417_161929
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-17 16:20 | Success | - | |
|
exp_self.20260417161413.208_20260417_161413
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417161413.208 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 16:15 | Success | - | |
|
exp_self.20260417160603.207_20260417_160603
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417160603.207 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 16:07 | Success | - | |
|
exp_self.20260417155849.206_20260417_155849
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417155849.206 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 15:59 | Success | - | |
|
exp_self.20260417155139.205_20260417_155139
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417155139.205 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 15:52 | Success | - | |
|
exp_pytrain.20260417154813.051_20260417_154813
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-17 15:49 | Success | - | |
|
exp_self.20260417154411.204_20260417_154412
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417154411.204 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 15:45 | Success | - | |
|
exp_self.20260417153659.203_20260417_153700
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417153659.203 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 15:38 | Success | - | |
|
exp_self.20260417152950.202_20260417_152950
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417152950.202 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 15:30 | Success | - | |
|
exp_self.20260417152238.201_20260417_152238
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417152238.201 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 15:23 | Success | - | |
|
exp_pytrain.20260417151658.050_20260417_151659
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-17 15:18 | Success | - | |
|
exp_self.20260417151511.200_20260417_151511
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417151511.200 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 15:16 | Success | - | |
|
exp_self.20260417150800.199_20260417_150801
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417150800.199 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 15:09 | Success | - | |
|
exp_self.20260417150042.198_20260417_150042
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417150042.198 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 15:01 | Success | - | |
|
exp_self.20260417145326.197_20260417_145327
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417145326.197 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 14:54 | Success | - | |
|
exp_self.20260417144614.196_20260417_144614
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417144614.196 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 14:47 | Success | - | |
|
exp_pytrain.20260417144359.049_20260417_144400
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-17 14:45 | Success | - | |
|
exp_self.20260417143708.195_20260417_143708
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417143708.195 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 14:38 | Success | - | |
|
exp_self.20260417142947.194_20260417_142947
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417142947.194 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 14:30 | Success | - | |
|
exp_self.20260417142230.193_20260417_142230
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417142230.193 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 14:23 | Success | - | |
|
exp_self.20260417141504.192_20260417_141504
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417141504.192 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 14:16 | Success | - | |
|
exp_pytrain.20260417141242.048_20260417_141242
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-17 14:13 | Success | - | |
|
exp_self.20260417140727.191_20260417_140727
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417140727.191 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 14:08 | Success | - | |
|
exp_self.20260417135954.190_20260417_135954
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417135954.190 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 14:00 | Success | - | |
|
exp_self.20260417135219.189_20260417_135219
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417135219.189 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 13:53 | Success | - | |
|
exp_self.20260417134451.188_20260417_134451
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417134451.188 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 13:45 | Success | - | |
|
exp_pytrain.20260417134121.047_20260417_134122
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-17 13:42 | Success | - | |
|
exp_self.20260417133718.187_20260417_133719
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417133718.187 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 13:38 | Success | - | |
|
exp_self.20260417133007.186_20260417_133007
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417133007.186 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 13:31 | Success | - | |
|
exp_self.20260417132254.185_20260417_132255
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417132254.185 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 13:23 | Success | - | |
|
exp_self.20260417131544.184_20260417_131544
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417131544.184 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 13:16 | Success | - | |
|
exp_pytrain.20260417131002.046_20260417_131003
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-17 13:11 | Success | - | |
|
exp_self.20260417130813.183_20260417_130813
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417130813.183 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 13:09 | Success | - | |
|
exp_self.20260417130057.182_20260417_130057
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417130057.182 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 13:01 | Success | - | |
|
exp_self.20260417125338.181_20260417_125339
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417125338.181 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 12:54 | Success | - | |
|
exp_self.20260417124621.180_20260417_124621
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417124621.180 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 12:47 | Success | - | |
|
exp_self.20260417123909.179_20260417_123910
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417123909.179 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 12:40 | Success | - | |
|
exp_pytrain.20260417123656.045_20260417_123656
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-17 12:37 | Success | - | |
|
exp_self.20260417123000.178_20260417_123001
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417123000.178 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 12:31 | Success | - | |
|
exp_self.20260417122220.177_20260417_122220
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417122220.177 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 12:23 | Success | - | |
|
exp_self.20260417121509.176_20260417_121509
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417121509.176 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 12:16 | Success | - | |
|
exp_self.20260417120757.175_20260417_120758
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417120757.175 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 12:09 | Success | - | |
|
exp_pytrain.20260417120537.044_20260417_120538
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-17 12:06 | Success | - | |
|
exp_self.20260417115857.174_20260417_115858
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417115857.174 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 12:00 | Success | - | |
|
exp_self.20260417115139.173_20260417_115139
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417115139.173 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 11:52 | Success | - | |
|
exp_self.20260417114425.172_20260417_114426
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417114425.172 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 11:45 | Success | - | |
|
exp_self.20260417113712.171_20260417_113712
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417113712.171 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 11:38 | Success | - | |
|
exp_pytrain.20260417113347.043_20260417_113347
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-17 11:34 | Success | - | |
|
exp_self.20260417112944.170_20260417_112944
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417112944.170 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 11:30 | Success | - | |
|
exp_self.20260417112229.169_20260417_112230
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417112229.169 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 11:23 | Success | - | |
|
exp_self.20260417111517.168_20260417_111518
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417111517.168 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 11:16 | Success | - | |
|
exp_self.20260417110801.167_20260417_110801
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417110801.167 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 11:09 | Success | - | |
|
exp_pytrain.20260417110227.042_20260417_110228
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-17 11:03 | Success | - | |
|
exp_self.20260417110040.166_20260417_110040
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417110040.166 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 11:01 | Success | - | |
|
exp_self.20260417105319.165_20260417_105319
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417105319.165 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 10:54 | Success | - | |
|
exp_self.20260417104605.164_20260417_104605
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417104605.164 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 10:47 | Success | - | |
|
exp_self.20260417103853.163_20260417_103853
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417103853.163 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 10:39 | Success | - | |
|
exp_self.20260417103137.162_20260417_103137
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417103137.162 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 10:32 | Success | - | |
|
exp_pytrain.20260417102916.041_20260417_102917
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-17 10:30 | Success | - | |
|
exp_self.20260417102407.161_20260417_102407
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417102407.161 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 10:25 | Success | - | |
|
exp_self.20260417101654.160_20260417_101655
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417101654.160 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 10:17 | Success | - | |
|
exp_self.20260417100927.159_20260417_100928
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417100927.159 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 10:10 | Success | - | |
|
exp_self.20260417100203.158_20260417_100204
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417100203.158 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 10:03 | Success | - | |
|
exp_pytrain.20260417095726.040_20260417_095726
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-17 09:58 | Success | - | |
|
exp_self.20260417095424.157_20260417_095425
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417095424.157 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 09:55 | Success | - | |
|
exp_self.20260417094657.156_20260417_094657
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417094657.156 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 09:47 | Success | - | |
|
exp_self.20260417093932.155_20260417_093936
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417093932.155 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 09:40 | Success | - | |
|
exp_self.20260417093155.154_20260417_093155
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417093155.154 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 09:32 | Success | - | |
|
exp_pytrain.20260417092540.039_20260417_092540
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-17 09:26 | Success | - | |
|
exp_self.20260417092351.153_20260417_092351
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417092351.153 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 09:24 | Success | - | |
|
exp_self.20260417091639.152_20260417_091640
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417091639.152 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 09:17 | Success | - | |
|
exp_self.20260417090926.151_20260417_090926
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417090926.151 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 09:10 | Success | - | |
|
exp_self.20260417090207.150_20260417_090208
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417090207.150 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 09:03 | Success | - | |
|
exp_self.20260417085450.149_20260417_085451
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417085450.149 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 08:55 | Success | - | |
|
exp_pytrain.20260417085236.038_20260417_085236
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-17 08:53 | Success | - | |
|
exp_self.20260417084720.148_20260417_084720
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417084720.148 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 08:48 | Success | - | |
|
exp_hf_2211.16780_20260417_084412
|
An Optimal Transport-driven Approach for Cultivating Latent Space in Online Incremental Learning
Paper ID: hf_2211.16780 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
04-17 08:45 | Success | - | |
|
exp_self.20260417083851.147_20260417_083852
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417083851.147 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 08:39 | Success | - | |
|
exp_self.20260417083105.146_20260417_083106
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417083105.146 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 08:32 | Success | - | |
|
exp_self.20260417082324.145_20260417_082324
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417082324.145 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 08:24 | Success | - | |
|
exp_pytrain.20260417082043.037_20260417_082043
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-17 08:21 | Success | - | |
|
exp_self.20260417081448.144_20260417_081448
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417081448.144 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 08:15 | Success | - | |
|
exp_self.20260417080715.143_20260417_080716
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417080715.143 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 08:08 | Success | - | |
|
exp_self.20260417075935.142_20260417_075935
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417075935.142 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 08:00 | Success | - | |
|
exp_self.20260417075156.141_20260417_075157
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417075156.141 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 07:52 | Success | - | |
|
exp_pytrain.20260417074917.036_20260417_074917
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-17 07:50 | Success | - | |
|
exp_self.20260417074216.140_20260417_074216
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417074216.140 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 07:43 | Success | - | |
|
exp_self.20260417073447.139_20260417_073448
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417073447.139 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 07:35 | Success | - | |
|
exp_self.20260417072717.138_20260417_072717
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417072717.138 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 07:28 | Success | - | |
|
exp_self.20260417071940.137_20260417_071941
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417071940.137 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 07:20 | Success | - | |
|
exp_pytrain.20260417071709.035_20260417_071709
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-17 07:18 | Success | - | |
|
exp_self.20260417071004.136_20260417_071005
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417071004.136 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 07:11 | Success | - | |
|
exp_self.20260417070224.135_20260417_070224
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417070224.135 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 07:03 | Success | - | |
|
exp_self.20260417065445.134_20260417_065445
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417065445.134 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 06:55 | Success | - | |
|
exp_self.20260417064714.133_20260417_064714
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417064714.133 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 06:48 | Success | - | |
|
exp_pytrain.20260417064437.034_20260417_064437
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-17 06:45 | Success | - | |
|
exp_self.20260417063744.132_20260417_063744
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417063744.132 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 06:38 | Success | - | |
|
exp_cr_10.3390_app16083892_20260417_063317
|
Latent Diffusion Model for Chlorophyll Remote Sensing Spectral Synthesis Integrating Bio-Optical Priors and Band Attenti...
Paper ID: cr_10.3390_app16083892 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered b...
|
04-17 06:34 | Success | - | |
|
exp_self.20260417063002.131_20260417_063002
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417063002.131 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 06:31 | Success | - | |
|
exp_self.20260417062224.130_20260417_062224
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417062224.130 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 06:23 | Success | - | |
|
exp_cr_10.1145_3807782_20260417_061749
|
Efficient Addition-Based Sparse GEMM for Fast Ternary Large Language Model Inference on Edge Devices
Paper ID: cr_10.1145_3807782 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered bench...
|
04-17 06:18 | Success | - | |
|
exp_self.20260417061523.129_20260417_061524
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417061523.129 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 06:16 | Success | - | |
|
exp_pytrain.20260417061254.033_20260417_061254
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-17 06:13 | Success | - | |
|
exp_self.20260417060549.128_20260417_060549
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417060549.128 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 06:06 | Success | - | |
|
exp_self.20260417055815.127_20260417_055815
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417055815.127 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 05:59 | Success | - | |
|
exp_self.20260417055036.126_20260417_055036
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417055036.126 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 05:51 | Success | - | |
|
exp_self.20260417054307.125_20260417_054307
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260417054307.125 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-17 05:44 | Success | - | |
| exp_pytrain.20260417054037.032_20260417_054037 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-17 05:41 | Success | - | |
| exp_self.20260417053502.124_20260417_053503 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260417053502.124 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-17 05:36 | Success | - | |
| exp_self.20260417052728.123_20260417_052728 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260417052728.123 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-17 05:28 | Success | - | |
| exp_self.20260417051943.122_20260417_051944 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260417051943.122 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-17 05:20 | Success | - | |
| exp_self.20260417051157.121_20260417_051158 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260417051157.121 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-17 05:13 | Success | - | |
| exp_pytrain.20260417050917.031_20260417_050918 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-17 05:10 | Success | - | |
| exp_self.20260417050347.120_20260417_050348 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260417050347.120 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-17 05:04 | Success | - | |
| exp_self.20260417045553.119_20260417_045554 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260417045553.119 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-17 04:56 | Success | - | |
| exp_self.20260417044816.118_20260417_044817 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260417044816.118 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-17 04:49 | Success | - | |
| exp_self.20260417044043.117_20260417_044043 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260417044043.117 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-17 04:41 | Success | - | |
| exp_pytrain.20260417043753.030_20260417_043753 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-17 04:38 | Success | - | |
| exp_hf_2604.14572_20260417_043506 | Don't Retrieve, Navigate: Distilling Enterprise Knowledge into Navigable Agent Skills for QA and RAG<br>Paper ID: hf_2604.14572 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark... | 04-17 04:36 | Success | - | |
| exp_self.20260417043037.116_20260417_043037 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260417043037.116 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-17 04:31 | Success | - | |
| exp_self.20260417042240.115_20260417_042240 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260417042240.115 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-17 04:23 | Success | - | |
| exp_self.20260417041503.114_20260417_041504 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260417041503.114 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-17 04:16 | Success | - | |
| exp_self.20260417040721.113_20260417_040721 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260417040721.113 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-17 04:08 | Success | - | |
| exp_pytrain.20260417040450.029_20260417_040451 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-17 04:05 | Success | - | |
| exp_self.20260417035743.112_20260417_035743 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260417035743.112 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-17 03:58 | Success | - | |
| exp_self.20260417035011.111_20260417_035012 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260417035011.111 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-17 03:51 | Success | - | |
| exp_self.20260417034243.110_20260417_034244 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260417034243.110 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-17 03:43 | Success | - | |
| exp_self.20260417033510.109_20260417_033511 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260417033510.109 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-17 03:36 | Success | - | |
| exp_pytrain.20260417033241.028_20260417_033241 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-17 03:33 | Success | - | |
| exp_self.20260417032542.108_20260417_032542 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260417032542.108 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-17 03:26 | Success | - | |
| exp_self.20260417031813.107_20260417_031815 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260417031813.107 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-17 03:19 | Success | - | |
| exp_self.20260417031041.106_20260417_031041 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260417031041.106 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-17 03:11 | Success | - | |
| exp_hf_2604.14629_20260417_030721 | Switch-KD: Visual-Switch Knowledge Distillation for Vision-Language Models<br>Paper ID: hf_2604.14629 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark... | 04-17 03:08 | Success | - | |
| exp_self.20260417030241.105_20260417_030243 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260417030241.105 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-17 03:03 | Success | - | |
| exp_pytrain.20260417030001.027_20260417_030001 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-17 03:01 | Success | - | |
| exp_self.20260417025532.104_20260417_025532 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260417025532.104 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-17 02:56 | Success | - | |
| exp_self.20260417024752.103_20260417_024752 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260417024752.103 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-17 02:48 | Success | - | |
| exp_self.20260417024016.102_20260417_024016 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260417024016.102 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-17 02:41 | Success | - | |
| exp_self.20260417023231.101_20260417_023231 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260417023231.101 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-17 02:33 | Success | - | |
| exp_pytrain.20260417022831.026_20260417_022832 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-17 02:29 | Success | - | |
| exp_self.20260417022511.100_20260417_022512 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260417022511.100 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-17 02:26 | Success | - | |
| exp_self.20260417021730.099_20260417_021730 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260417021730.099 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-17 02:18 | Success | - | |
| exp_self.20260417020944.098_20260417_020944 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260417020944.098 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-17 02:10 | Success | - | |
| exp_self.20260417020214.097_20260417_020214 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260417020214.097 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-17 02:03 | Success | - | |
| exp_hf_2604.14531_20260417_015916 | TRACER: Trace-Based Adaptive Cost-Efficient Routing for LLM Classification<br>Paper ID: hf_2604.14531 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark... | 04-17 02:00 | Success | - | |
| exp_pytrain.20260417015711.025_20260417_015711 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-17 01:58 | Success | - | |
| exp_self.20260417015247.096_20260417_015247 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260417015247.096 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-17 01:53 | Success | - | |
| exp_self.20260417014516.095_20260417_014516 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260417014516.095 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-17 01:46 | Success | - | |
| exp_self.20260417013737.094_20260417_013738 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260417013737.094 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-17 01:38 | Success | - | |
| exp_self.20260417013009.093_20260417_013009 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260417013009.093 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-17 01:31 | Success | - | |
| exp_pytrain.20260417012524.024_20260417_012524 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-17 01:26 | Success | - | |
| exp_self.20260417012313.092_20260417_012313 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260417012313.092 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-17 01:24 | Success | - | |
| exp_hf_2604.11661_20260417_011838 | Towards Autonomous Mechanistic Reasoning in Virtual Cells<br>Paper ID: hf_2604.11661 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark... | 04-17 01:19 | Success | - | |
| exp_self.20260417011414.091_20260417_011414 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260417011414.091 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-17 01:15 | Success | - | |
| exp_self.20260417010446.090_20260417_010446 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260417010446.090 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-17 01:05 | Success | - | |
| exp_self.20260417005649.089_20260417_005650 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260417005649.089 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-17 00:57 | Success | - | |
| exp_gh_msu-denver_bili-core_20260417_005401 | msu-denver/bili-core<br>Paper ID: gh_msu-denver_bili-core - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-17 00:55 | Success | - | |
| exp_pytrain.20260417005151.023_20260417_005152 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-17 00:52 | Success | - | |
| exp_self.20260417004633.088_20260417_004633 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260417004633.088 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-17 00:47 | Success | - | |
| exp_self.20260417003905.087_20260417_003906 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260417003905.087 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-17 00:40 | Success | - | |
| exp_self.20260417003102.086_20260417_003102 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260417003102.086 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-17 00:32 | Success | - | |
| exp_self.20260417002221.085_20260417_002222 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260417002221.085 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-17 00:23 | Success | - | |
| exp_pytrain.20260417001957.022_20260417_001959 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-17 00:21 | Success | - | |
| exp_hf_2604.15284_20260417_001536 | GlobalSplat: Efficient Feed-Forward 3D Gaussian Splatting via Global Scene Tokens<br>Paper ID: hf_2604.15284 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark... | 04-17 00:16 | Success | - | |
| exp_self.20260417001225.084_20260417_001225 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260417001225.084 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-17 00:13 | Success | - | |
| exp_self.20260417000459.083_20260417_000500 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260417000459.083 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-17 00:06 | Success | - | |
| exp_self.20260416235727.082_20260416_235728 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260416235727.082 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-16 23:58 | Success | - | |
| exp_self.20260416235006.081_20260416_235007 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260416235006.081 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-16 23:51 | Success | - | |
| exp_pytrain.20260416234734.021_20260416_234734 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-16 23:48 | Success | - | |
| exp_self.20260416234042.080_20260416_234042 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260416234042.080 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-16 23:41 | Success | - | |
| exp_self.20260416233309.079_20260416_233310 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260416233309.079 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-16 23:34 | Success | - | |
| exp_self.20260416232538.078_20260416_232538 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260416232538.078 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-16 23:26 | Success | - | |
| exp_self.20260416231809.077_20260416_231810 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260416231809.077 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-16 23:19 | Success | - | |
| exp_pytrain.20260416231534.020_20260416_231534 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-16 23:16 | Success | - | |
| exp_self.20260416230841.076_20260416_230841 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260416230841.076 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-16 23:09 | Success | - | |
| exp_self.20260416230107.075_20260416_230108 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260416230107.075 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-16 23:02 | Success | - | |
| exp_self.20260416225341.074_20260416_225342 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260416225341.074 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-16 22:54 | Success | - | |
| exp_gh_sakhama_memfuse_20260416_224810 | sakhama/memfuse<br>Paper ID: gh_sakhama_memfuse - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered bench... | 04-16 22:49 | Success | - | |
| exp_self.20260416224558.073_20260416_224559 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260416224558.073 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-16 22:47 | Success | - | |
| exp_pytrain.20260416224331.019_20260416_224331 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-16 22:44 | Success | - | |
| exp_self.20260416223858.072_20260416_223858 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260416223858.072 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-16 22:40 | Success | - | |
| exp_self.20260416223128.071_20260416_223128 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260416223128.071 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-16 22:32 | Success | - | |
| exp_hf_2604.14125_20260416_222809 | HiVLA: A Visual-Grounded-Centric Hierarchical Embodied Manipulation System<br>Paper ID: hf_2604.14125 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark... | 04-16 22:29 | Success | - | |
| exp_self.20260416222347.070_20260416_222348 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260416222347.070 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-16 22:24 | Success | - | |
| exp_gh_Daubingweirdie414_multimodal-rag_20260416_221819 | Daubingweirdie414/multimodal-rag<br>Paper ID: gh_Daubingweirdie414_multimodal-rag - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal... | 04-16 22:19 | Success | - | |
| exp_self.20260416221607.069_20260416_221607 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260416221607.069 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-16 22:17 | Success | - | |
| exp_gh_Mustii2009_NeuroRag_20260416_221320 | Mustii2009/NeuroRag<br>Paper ID: gh_Mustii2009_NeuroRag - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered b... | 04-16 22:14 | Success | - | |
| exp_pytrain.20260416221112.018_20260416_221113 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-16 22:12 | Success | - | |
| exp_hf_2509.25843_20260416_220826 | ASGuard: Activation-Scaling Guard to Mitigate Targeted Jailbreaking Attack<br>Paper ID: hf_2509.25843 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark... | 04-16 22:09 | Success | - | |
| exp_self.20260416220512.068_20260416_220513 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260416220512.068 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-16 22:06 | Success | - | |
| exp_2604.15306v1_20260416_220223 | Generalization in LLM Problem Solving: The Case of the Shortest Path<br>Paper ID: 2604.15306v1 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark s... | 04-16 22:03 | Success | - | |
| exp_self.20260416215529.067_20260416_215530 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260416215529.067 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-16 21:56 | Success | - | |
| exp_2604.15308v1_20260416_215106 | RAD-2: Scaling Reinforcement Learning in a Generator-Discriminator Framework<br>Paper ID: 2604.15308v1 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark s... | 04-16 21:52 | Success | - | |
| exp_self.20260416214755.066_20260416_214755 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260416214755.066 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-16 21:48 | Success | - | |
| exp_self.20260416214035.065_20260416_214035 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260416214035.065 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-16 21:41 | Success | - | |
| exp_pytrain.20260416213803.017_20260416_213804 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-16 21:39 | Success | - | |
| exp_hf_2604.14922_20260416_213548 | LongAct: Harnessing Intrinsic Activation Patterns for Long-Context Reinforcement Learning<br>Paper ID: hf_2604.14922 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark... | 04-16 21:36 | Success | - | |
| exp_self.20260416213236.064_20260416_213236 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260416213236.064 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-16 21:33 | Success | - | |
| exp_hf_2604.14967_20260416_212947 | UniDoc-RL: Coarse-to-Fine Visual RAG with Hierarchical Actions and Dense Rewards<br>Paper ID: hf_2604.14967 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark... | 04-16 21:30 | Success | - | |
| exp_hf_2604.15308_20260416_212546 | RAD-2: Scaling Reinforcement Learning in a Generator-Discriminator Framework<br>Paper ID: hf_2604.15308 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark... | 04-16 21:26 | Success | - | |
| exp_self.20260416212346.063_20260416_212347 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260416212346.063 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-16 21:24 | Success | - | |
| exp_self.20260416211618.062_20260416_211618 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260416211618.062 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-16 21:17 | Success | - | |
| exp_hf_2604.14683_20260416_211303 | DR^{3}-Eval: Towards Realistic and Reproducible Deep Research Evaluation<br>Paper ID: hf_2604.14683 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark... | 04-16 21:14 | Success | - | |
| exp_self.20260416210846.061_20260416_210847 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260416210846.061 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-16 21:09 | Success | - | |
| exp_pytrain.20260416210615.016_20260416_210616 | Python Skill Fallback<br>Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-16 21:07 | Success | - | |
| exp_hf_2604.13226_20260416_210332 | KV Packet: Recomputation-Free Context-Independent KV Caching for LLMs<br>Paper ID: hf_2604.13226 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark... | 04-16 21:04 | Success | - | |
| exp_self.20260416205914.060_20260416_205914 | Self-directed benchmark: ssm strategy stress test<br>Paper ID: self.20260416205914.060 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-16 21:00 | Success | - | |
| exp_2604.15167v1_20260416_205449 | When Flat Minima Fail: Characterizing INT4 Quantization Collapse After FP32 Convergence<br>Paper ID: 2604.15167v1 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark s... | 04-16 20:55 | Success | - | |
| exp_self.20260416205136.059_20260416_205136 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260416205136.059 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-16 20:52 | Success | - | |
| exp_2604.15174v1_20260416_204849 | MambaSL: Exploring Single-Layer Mamba for Time Series Classification - Paper ID: 2604.15174v1 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark s... | 04-16 20:49 | Success | - | |
| exp_self.20260416204157.058_20260416_204157 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260416204157.058 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-16 20:43 | Success | - | |
| exp_self.20260416203440.057_20260416_203441 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260416203440.057 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-16 20:35 | Success | - | |
| exp_pytrain.20260416203212.015_20260416_203213 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-16 20:33 | Success | - | |
| exp_self.20260416202511.056_20260416_202512 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260416202511.056 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-16 20:26 | Success | - | |
| exp_self.20260416201745.055_20260416_201746 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260416201745.055 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-16 20:18 | Success | - | |
| exp_self.20260416201012.054_20260416_201013 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260416201012.054 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-16 20:11 | Success | - | |
| exp_self.20260416200237.053_20260416_200237 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260416200237.053 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-16 20:03 | Success | - | |
| exp_pytrain.20260416200008.014_20260416_200009 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-16 20:01 | Success | - | |
| exp_self.20260416195307.052_20260416_195307 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260416195307.052 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-16 19:54 | Success | - | |
| exp_self.20260416194534.051_20260416_194534 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260416194534.051 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-16 19:46 | Success | - | |
| exp_self.20260416193803.050_20260416_193803 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260416193803.050 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-16 19:39 | Success | - | |
| exp_self.20260416193026.049_20260416_193026 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260416193026.049 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-16 19:31 | Success | - | |
| exp_pytrain.20260416192757.013_20260416_192758 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-16 19:29 | Success | - | |
| exp_self.20260416192051.048_20260416_192051 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260416192051.048 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-16 19:21 | Success | - | |
| exp_self.20260416191323.047_20260416_191323 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260416191323.047 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-16 19:14 | Success | - | |
| exp_gh_qualcomm_ai-hub-models_20260416_190751 | qualcomm/ai-hub-models - Paper ID: gh_qualcomm_ai-hub-models - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovere... | 04-16 19:08 | Success | - | |
| exp_self.20260416190543.046_20260416_190543 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260416190543.046 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-16 19:06 | Success | - | |
| exp_self.20260416185810.045_20260416_185811 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260416185810.045 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-16 18:59 | Success | - | |
| exp_pytrain.20260416185542.012_20260416_185542 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-16 18:56 | Success | - | |
| exp_self.20260416184839.044_20260416_184839 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260416184839.044 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-16 18:49 | Success | - | |
| exp_self.20260416184111.043_20260416_184111 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260416184111.043 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-16 18:42 | Success | - | |
| exp_self.20260416183348.042_20260416_183348 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260416183348.042 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-16 18:34 | Success | - | |
| exp_self.20260416182624.041_20260416_182624 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260416182624.041 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-16 18:27 | Success | - | |
| exp_pytrain.20260416182358.011_20260416_182358 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-16 18:25 | Success | - | |
| exp_self.20260416181706.040_20260416_181706 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260416181706.040 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-16 18:18 | Success | - | |
| exp_self.20260416180940.039_20260416_180940 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260416180940.039 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-16 18:10 | Success | - | |
| exp_self.20260416180203.038_20260416_180204 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260416180203.038 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-16 18:03 | Success | - | |
| exp_self.20260416175437.037_20260416_175438 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260416175437.037 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-16 17:55 | Success | - | |
| exp_pytrain.20260416175209.010_20260416_175209 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-16 17:53 | Success | - | |
| exp_self.20260416174514.036_20260416_174514 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260416174514.036 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-16 17:46 | Success | - | |
| exp_self.20260416173747.035_20260416_173747 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260416173747.035 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-16 17:38 | Success | - | |
| exp_self.20260416173015.034_20260416_173016 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260416173015.034 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-16 17:31 | Success | - | |
| exp_self.20260416172247.033_20260416_172248 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260416172247.033 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-16 17:23 | Success | - | |
| exp_pytrain.20260416172018.009_20260416_172019 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-16 17:21 | Success | - | |
| exp_self.20260416171323.032_20260416_171323 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260416171323.032 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-16 17:14 | Success | - | |
| exp_self.20260416170559.031_20260416_170559 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260416170559.031 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-16 17:07 | Success | - | |
| exp_self.20260416165831.030_20260416_165832 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260416165831.030 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-16 16:59 | Success | - | |
| exp_self.20260416165101.029_20260416_165102 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260416165101.029 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-16 16:52 | Success | - | |
| exp_pytrain.20260416164832.008_20260416_164832 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-16 16:49 | Success | - | |
| exp_self.20260416164137.028_20260416_164138 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260416164137.028 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-16 16:42 | Success | - | |
| exp_self.20260416163411.027_20260416_163411 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260416163411.027 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-16 16:35 | Success | - | |
| exp_self.20260416162644.026_20260416_162644 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260416162644.026 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-16 16:27 | Success | - | |
| exp_self.20260416161913.025_20260416_161913 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260416161913.025 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-16 16:20 | Success | - | |
| exp_pytrain.20260416161643.007_20260416_161644 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-16 16:17 | Success | - | |
| exp_self.20260416160948.024_20260416_160948 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260416160948.024 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-16 16:10 | Success | - | |
| exp_self.20260416160222.023_20260416_160222 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260416160222.023 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-16 16:03 | Success | - | |
| exp_self.20260416155454.022_20260416_155454 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260416155454.022 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-16 15:55 | Success | - | |
| exp_self.20260416154723.021_20260416_154723 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260416154723.021 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-16 15:48 | Success | - | |
| exp_pytrain.20260416154448.006_20260416_154449 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-16 15:45 | Success | - | |
| exp_self.20260416153753.020_20260416_153754 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260416153753.020 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-16 15:38 | Success | - | |
| exp_self.20260416153017.019_20260416_153018 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260416153017.019 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-16 15:31 | Success | - | |
| exp_self.20260416152241.018_20260416_152241 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260416152241.018 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-16 15:23 | Success | - | |
| exp_self.20260416151505.017_20260416_151506 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260416151505.017 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-16 15:16 | Success | - | |
| exp_pytrain.20260416151227.005_20260416_151227 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-16 15:13 | Success | - | |
| exp_self.20260416150518.016_20260416_150518 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260416150518.016 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-16 15:06 | Success | - | |
| exp_self.20260416145733.015_20260416_145734 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260416145733.015 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-16 14:58 | Success | - | |
| exp_self.20260416144955.014_20260416_144955 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260416144955.014 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-16 14:50 | Success | - | |
| exp_self.20260416144223.013_20260416_144223 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260416144223.013 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-16 14:43 | Success | - | |
| exp_pytrain.20260416143947.004_20260416_143948 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-16 14:40 | Success | - | |
| exp_hf_2604.11490_20260416_143701 | Anthropogenic Regional Adaptation in Multimodal Vision-Language Model - Paper ID: hf_2604.11490 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark... | 04-16 14:38 | Success | - | |
| exp_self.20260416143345.012_20260416_143345 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260416143345.012 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-16 14:34 | Success | - | |
| exp_hf_2604.12002_20260416_143028 | Self-Distillation Zero: Self-Revision Turns Binary Rewards into Dense Supervision - Paper ID: hf_2604.12002 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark... | 04-16 14:31 | Success | - | |
| exp_self.20260416142601.011_20260416_142601 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260416142601.011 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-16 14:27 | Success | - | |
| exp_self.20260416141819.010_20260416_141819 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260416141819.010 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-16 14:19 | Success | - | |
| exp_hf_2604.11748_20260416_141246 | LangFlow: Continuous Diffusion Rivals Discrete in Language Modeling - Paper ID: hf_2604.11748 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark... | 04-16 14:13 | Success | - | |
| exp_self.20260416141034.009_20260416_141034 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260416141034.009 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-16 14:11 | Success | - | |
| exp_pytrain.20260416140752.003_20260416_140752 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-16 14:08 | Success | - | |
| exp_self.20260416140042.008_20260416_140043 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260416140042.008 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-16 14:01 | Success | - | |
| exp_self.20260416135311.007_20260416_135312 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260416135311.007 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-16 13:54 | Success | - | |
| exp_self.20260416134531.006_20260416_134531 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260416134531.006 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-16 13:46 | Success | - | |
| exp_self.20260416133750.005_20260416_133750 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260416133750.005 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-16 13:38 | Success | - | |
| exp_pytrain.20260416133514.002_20260416_133514 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-16 13:36 | Success | - | |
| exp_hf_2604.03088_20260416_133249 | SkVM: Compiling Skills for Efficient Execution Everywhere - Paper ID: hf_2604.03088 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark... | 04-16 13:33 | Success | - | |
| exp_self.20260416133041.004_20260416_133041 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260416133041.004 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-16 13:31 | Success | - | |
| exp_self.20260416132244.003_20260416_132244 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260416132244.003 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-16 13:23 | Success | - | |
| exp_self.20260416131459.002_20260416_131459 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260416131459.002 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-16 13:16 | Success | - | |
| exp_self.20260416130724.001_20260416_130724 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260416130724.001 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-16 13:08 | Success | - | |
| exp_pytrain.20260416130350.001_20260416_130351 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-16 13:04 | Success | - | |
| exp_self.20260416124116.001_20260416_124116 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260416124116.001 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-16 12:41 | Pending | - | |
| exp_pytrain.20260416123843.001_20260416_123843 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-16 12:39 | Success | - | |
| exp_self.20260416123358.015_20260416_123358 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260416123358.015 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-16 12:35 | Success | - | |
| exp_self.20260416122616.014_20260416_122616 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260416122616.014 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-16 12:27 | Success | - | |
| exp_self.20260416121830.013_20260416_121831 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260416121830.013 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-16 12:19 | Success | - | |
| exp_pytrain.20260416121548.004_20260416_121548 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-16 12:16 | Success | - | |
| exp_self.20260416121011.012_20260416_121012 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260416121011.012 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-16 12:11 | Success | - | |
| exp_self.20260416120258.011_20260416_120258 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260416120258.011 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-16 12:04 | Success | - | |
| exp_self.20260416115544.010_20260416_115544 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260416115544.010 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-16 11:56 | Success | - | |
| exp_self.20260416114808.009_20260416_114808 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260416114808.009 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-16 11:49 | Success | - | |
|
exp_pytrain.20260416114421.003_20260416_114421
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-16 11:45 | Success | - | |
|
exp_self.20260416114053.008_20260416_114053
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260416114053.008 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-16 11:41 | Success | - | |
|
exp_self.20260416113313.007_20260416_113314
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260416113313.007 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-16 11:34 | Success | - | |
|
exp_self.20260416112524.006_20260416_112524
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260416112524.006 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-16 11:26 | Success | - | |
|
exp_self.20260416111742.005_20260416_111743
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260416111742.005 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-16 11:18 | Success | - | |
|
exp_pytrain.20260416111246.002_20260416_111246
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-16 11:13 | Success | - | |
|
exp_self.20260416111038.004_20260416_111039
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260416111038.004 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-16 11:11 | Success | - | |
|
exp_hf_2604.07882_20260416_110712
|
ReconPhys: Reconstruct Appearance and Physical Attributes from Single Video
Paper ID: hf_2604.07882 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
04-16 11:08 | Success | - | |
|
exp_2604.14147v1_20260416_110447
|
ROSE: Retrieval-Oriented Segmentation Enhancement
Paper ID: 2604.14147v1 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark s...
|
04-16 11:05 | Success | - | |
|
exp_self.20260416110220.003_20260416_110221
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260416110220.003 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-16 11:03 | Success | - | |
|
exp_2604.14141v1_20260416_105920
|
Geometric Context Transformer for Streaming 3D Reconstruction
Paper ID: 2604.14141v1 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark s...
|
04-16 11:00 | Success | - | |
|
exp_self.20260416105200.002_20260416_105200
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260416105200.002 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-16 10:53 | Success | - | |
|
exp_hf_2604.14141_20260416_104847
|
Geometric Context Transformer for Streaming 3D Reconstruction
Paper ID: hf_2604.14141 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
04-16 10:49 | Success | - | |
|
exp_2604.14149v1_20260416_104632
|
One Token per Highly Selective Frame: Towards Extreme Compression for Long Video Understanding
Paper ID: 2604.14149v1 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark s...
|
04-16 10:47 | Success | - | |
|
exp_self.20260416104433.001_20260416_104434
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260416104433.001 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-16 10:45 | Success | - | |
|
exp_hf_2604.11045_20260416_104145
|
Sema Code: Decoupling AI Coding Agents into Programmable, Embeddable Infrastructure
Paper ID: hf_2604.11045 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
04-16 10:42 | Success | - | |
|
exp_pytrain.20260416103919.001_20260416_103920
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-16 10:40 | Success | - | |
|
exp_self.20260415122901.382_20260415_122902
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260415122901.382 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-15 12:30 | Success | - | |
|
exp_self.20260415122136.381_20260415_122136
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260415122136.381 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-15 12:22 | Success | - | |
|
exp_pytrain.20260415121901.146_20260415_121901
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-15 12:20 | Success | - | |
|
exp_hf_2604.11177_20260415_121402
|
Do Thought Streams Matter? Evaluating Reasoning in Gemini Vision-Language Models for Video Scene Understanding
Paper ID: hf_2604.11177 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
04-15 12:15 | Success | - | |
|
exp_self.20260415121200.380_20260415_121200
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260415121200.380 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-15 12:13 | Success | - | |
|
exp_self.20260415120429.379_20260415_120429
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260415120429.379 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-15 12:05 | Success | - | |
|
exp_self.20260415115656.378_20260415_115657
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260415115656.378 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-15 11:57 | Success | - | |
|
exp_self.20260415114924.377_20260415_114925
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260415114924.377 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-15 11:50 | Success | - | |
|
exp_pytrain.20260415114658.145_20260415_114658
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-15 11:48 | Success | - | |
|
exp_self.20260415113952.376_20260415_113954
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260415113952.376 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-15 11:40 | Success | - | |
|
exp_self.20260415113225.375_20260415_113225
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260415113225.375 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-15 11:33 | Success | - | |
|
exp_self.20260415112459.374_20260415_112500
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260415112459.374 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-15 11:26 | Success | - | |
|
exp_self.20260415111723.373_20260415_111723
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260415111723.373 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-15 11:18 | Success | - | |
|
exp_pytrain.20260415111455.144_20260415_111456
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-15 11:15 | Success | - | |
|
exp_self.20260415110755.372_20260415_110756
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260415110755.372 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-15 11:08 | Success | - | |
|
exp_self.20260415110028.371_20260415_110029
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260415110028.371 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-15 11:01 | Success | - | |
|
exp_self.20260415105258.370_20260415_105258
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260415105258.370 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-15 10:54 | Success | - | |
|
exp_self.20260415104524.369_20260415_104525
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260415104524.369 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-15 10:46 | Success | - | |
|
exp_pytrain.20260415104251.143_20260415_104252
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-15 10:43 | Success | - | |
|
exp_self.20260415103550.368_20260415_103551
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260415103550.368 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-15 10:36 | Success | - | |
|
exp_self.20260415102822.367_20260415_102823
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260415102822.367 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-15 10:29 | Success | - | |
|
exp_self.20260415102054.366_20260415_102055
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260415102054.366 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-15 10:21 | Success | - | |
|
exp_self.20260415101323.365_20260415_101323
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260415101323.365 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-15 10:14 | Success | - | |
|
exp_pytrain.20260415101049.142_20260415_101050
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-15 10:11 | Success | - | |
|
exp_self.20260415100347.364_20260415_100348
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260415100347.364 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-15 10:04 | Success | - | |
|
exp_self.20260415095614.363_20260415_095614
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260415095614.363 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-15 09:57 | Success | - | |
|
exp_self.20260415094843.362_20260415_094843
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260415094843.362 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-15 09:49 | Success | - | |
|
exp_self.20260415094124.361_20260415_094124
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260415094124.361 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-15 09:42 | Success | - | |
|
exp_pytrain.20260415093850.141_20260415_093850
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-15 09:39 | Success | - | |
|
exp_self.20260415093204.360_20260415_093204
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260415093204.360 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-15 09:33 | Success | - | |
|
exp_self.20260415092427.359_20260415_092428
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260415092427.359 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-15 09:25 | Success | - | |
|
exp_self.20260415091648.358_20260415_091648
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260415091648.358 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-15 09:17 | Success | - | |
|
exp_self.20260415090909.357_20260415_090910
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260415090909.357 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-15 09:10 | Success | - | |
|
exp_pytrain.20260415090643.140_20260415_090644
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-15 09:07 | Success | - | |
|
exp_self.20260415085933.356_20260415_085934
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260415085933.356 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-15 09:00 | Success | - | |
|
exp_self.20260415085202.355_20260415_085203
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260415085202.355 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-15 08:53 | Success | - | |
|
exp_self.20260415084420.354_20260415_084420
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260415084420.354 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-15 08:45 | Success | - | |
|
exp_self.20260415083648.353_20260415_083648
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260415083648.353 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-15 08:37 | Success | - | |
|
exp_pytrain.20260415083420.139_20260415_083420
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-15 08:35 | Success | - | |
|
exp_self.20260415082712.352_20260415_082712
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260415082712.352 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-15 08:28 | Success | - | |
|
exp_self.20260415081940.351_20260415_081940
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260415081940.351 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-15 08:20 | Success | - | |
|
exp_self.20260415081210.350_20260415_081211
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260415081210.350 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-15 08:13 | Success | - | |
|
exp_self.20260415080427.349_20260415_080427
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260415080427.349 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-15 08:05 | Success | - | |
|
exp_pytrain.20260415080158.138_20260415_080159
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-15 08:03 | Success | - | |
|
exp_self.20260415075603.348_20260415_075603
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260415075603.348 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-15 07:57 | Success | - | |
|
exp_self.20260415074825.347_20260415_074826
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260415074825.347 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-15 07:49 | Success | - | |
|
exp_self.20260415074044.346_20260415_074045
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260415074044.346 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-15 07:41 | Success | - | |
|
exp_self.20260415073310.345_20260415_073311
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260415073310.345 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-15 07:34 | Success | - | |
|
exp_pytrain.20260415073032.137_20260415_073033
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-15 07:31 | Success | - | |
|
exp_self.20260415072446.344_20260415_072447
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260415072446.344 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-15 07:25 | Success | - | |
|
exp_self.20260415071713.343_20260415_071713
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260415071713.343 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-15 07:18 | Success | - | |
|
exp_self.20260415070911.342_20260415_070911
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260415070911.342 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-15 07:10 | Success | - | |
|
exp_self.20260415070116.341_20260415_070116
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260415070116.341 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-15 07:02 | Success | - | |
|
exp_pytrain.20260415065845.136_20260415_065845
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-15 06:59 | Success | - | |
|
exp_self.20260415065302.340_20260415_065302
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260415065302.340 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-15 06:54 | Success | - | |
|
exp_self.20260415064521.339_20260415_064522
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260415064521.339 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-15 06:46 | Success | - | |
|
exp_self.20260415063742.338_20260415_063742
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260415063742.338 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-15 06:38 | Success | - | |
|
exp_self.20260415063011.337_20260415_063012
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260415063011.337 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-15 06:31 | Success | - | |
|
exp_pytrain.20260415062710.135_20260415_062711
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-15 06:28 | Success | - | |
|
exp_self.20260415062103.336_20260415_062103
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260415062103.336 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-15 06:22 | Success | - | |
|
exp_self.20260415061326.335_20260415_061326
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260415061326.335 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-15 06:14 | Success | - | |
|
exp_self.20260415060543.334_20260415_060544
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260415060543.334 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-15 06:06 | Success | - | |
|
exp_self.20260415055813.333_20260415_055813
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260415055813.333 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-15 05:59 | Success | - | |
|
exp_pytrain.20260415055550.134_20260415_055551
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-15 05:56 | Success | - | |
|
exp_self.20260415054842.332_20260415_054842
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260415054842.332 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-15 05:49 | Success | - | |
|
exp_self.20260415054105.331_20260415_054105
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260415054105.331 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-15 05:42 | Success | - | |
|
exp_self.20260415053329.330_20260415_053330
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260415053329.330 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-15 05:34 | Success | - | |
|
exp_self.20260415052556.329_20260415_052556
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260415052556.329 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-15 05:26 | Success | - | |
|
exp_pytrain.20260415052333.133_20260415_052333
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-15 05:24 | Success | - | |
|
exp_self.20260415051633.328_20260415_051634
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260415051633.328 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-15 05:17 | Success | - | |
|
exp_self.20260415050907.327_20260415_050907
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260415050907.327 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-15 05:10 | Success | - | |
|
exp_self.20260415050139.326_20260415_050139
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260415050139.326 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-15 05:02 | Success | - | |
|
exp_cr_10.3390_aichem1020007_20260415_045718
|
Active Learning on Protein Language Model Embeddings Accelerates Rubisco Variant Discovery for Desired Traits
Paper ID: cr_10.3390_aichem1020007 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-15 04:58 | Success | - | |
|
exp_self.20260415045408.325_20260415_045408
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260415045408.325 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-15 04:55 | Success | - | |
|
exp_pytrain.20260415045141.132_20260415_045142
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-15 04:52 | Success | - | |
|
exp_self.20260415044451.324_20260415_044451
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260415044451.324 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-15 04:45 | Success | - | |
|
exp_self.20260415043723.323_20260415_043724
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260415043723.323 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-15 04:38 | Success | - | |
|
exp_self.20260415042956.322_20260415_042956
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260415042956.322 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-15 04:30 | Success | - | |
|
exp_self.20260415042232.321_20260415_042232
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260415042232.321 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-15 04:23 | Success | - | |
|
exp_pytrain.20260415042004.131_20260415_042005
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-15 04:21 | Success | - | |
|
exp_self.20260415040230.320_20260415_040230
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260415040230.320 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-15 04:03 | Success | - | |
|
exp_self.20260415035455.319_20260415_035455
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260415035455.319 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-15 03:55 | Success | - | |
|
exp_self.20260415034725.318_20260415_034725
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260415034725.318 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-15 03:48 | Success | - | |
|
exp_self.20260415033959.317_20260415_033959
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260415033959.317 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-15 03:41 | Success | - | |
|
exp_pytrain.20260415033729.130_20260415_033730
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-15 03:38 | Success | - | |
|
exp_self.20260415033150.316_20260415_033150
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260415033150.316 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-15 03:32 | Success | - | |
|
exp_self.20260415032424.315_20260415_032424
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260415032424.315 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-15 03:25 | Success | - | |
|
exp_self.20260415031642.314_20260415_031643
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260415031642.314 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-15 03:17 | Success | - | |
|
exp_self.20260415030907.313_20260415_030908
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260415030907.313 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-15 03:10 | Success | - | |
|
exp_pytrain.20260415030533.129_20260415_030533
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-15 03:06 | Success | - | |
|
exp_self.20260415030126.312_20260415_030126
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260415030126.312 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-15 03:02 | Success | - | |
|
exp_self.20260415025346.311_20260415_025346
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260415025346.311 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-15 02:54 | Success | - | |
|
exp_self.20260415024618.310_20260415_024618
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260415024618.310 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-15 02:47 | Success | - | |
|
exp_self.20260415023856.309_20260415_023856
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260415023856.309 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-15 02:39 | Success | - | |
|
exp_pytrain.20260415023419.128_20260415_023419
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-15 02:35 | Success | - | |
|
exp_self.20260415023214.308_20260415_023214
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260415023214.308 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-15 02:33 | Success | - | |
|
exp_cr_10.1038_s41524-026-01995-1_20260415_022905
|
High-throughput parameter estimation from experimental data using Bayesian Inference with accelerated sampling
Paper ID: cr_10.1038_s41524-026-01995-1 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Reco...
|
04-15 02:30 | Success | - | |
|
exp_self.20260415022339.307_20260415_022339
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260415022339.307 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-15 02:24 | Success | - | |
|
exp_self.20260415021609.306_20260415_021609
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260415021609.306 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-15 02:17 | Success | - | |
|
exp_self.20260415020842.305_20260415_020842
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260415020842.305 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-15 02:09 | Success | - | |
|
exp_pytrain.20260415020258.127_20260415_020258
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-15 02:04 | Success | - | |
|
exp_self.20260415020104.304_20260415_020105
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260415020104.304 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-15 02:02 | Success | - | |
|
exp_self.20260415013843.303_20260415_013844
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260415013843.303 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-15 01:39 | Success | - | |
|
exp_hf_2604.12373_20260415_013311
|
Masked by Consensus: Disentangling Privileged Knowledge in LLM Correctness
Paper ID: hf_2604.12373 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
04-15 01:34 | Success | - | |
|
exp_self.20260415013113.302_20260415_013113
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260415013113.302 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-15 01:32 | Success | - | |
|
exp_pytrain.20260415012842.126_20260415_012843
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-15 01:29 | Success | - | |
|
exp_self.20260415012146.301_20260415_012147
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260415012146.301 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-15 01:22 | Success | - | |
|
exp_self.20260415011423.300_20260415_011423
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260415011423.300 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-15 01:15 | Success | - | |
|
exp_self.20260415010700.299_20260415_010701
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260415010700.299 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-15 01:08 | Success | - | |
|
exp_self.20260415005937.298_20260415_005937
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260415005937.298 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-15 01:00 | Success | - | |
|
exp_pytrain.20260415005711.125_20260415_005711
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-15 00:58 | Success | - | |
|
exp_self.20260415005026.297_20260415_005027
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260415005026.297 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-15 00:51 | Success | - | |
|
exp_self.20260415004300.296_20260415_004300
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260415004300.296 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-15 00:44 | Success | - | |
|
exp_self.20260415003539.295_20260415_003540
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260415003539.295 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-15 00:36 | Success | - | |
|
exp_self.20260415002819.294_20260415_002819
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260415002819.294 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-15 00:29 | Success | - | |
|
exp_pytrain.20260415002552.124_20260415_002552
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-15 00:26 | Success | - | |
|
exp_self.20260415001906.293_20260415_001907
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260415001906.293 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-15 00:20 | Success | - | |
|
exp_self.20260415001139.292_20260415_001140
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260415001139.292 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-15 00:12 | Success | - | |
|
exp_self.20260415000409.291_20260415_000409
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260415000409.291 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-15 00:05 | Success | - | |
|
exp_hf_2604.05072_20260414_235948
|
Hierarchical SVG Tokenization: Learning Compact Visual Programs for Scalable Vector Graphics Modeling
Paper ID: hf_2604.05072 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
04-15 00:00 | Success | - | |
|
exp_self.20260414235641.290_20260414_235642
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260414235641.290 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-14 23:57 | Success | - | |
|
exp_pytrain.20260414235420.123_20260414_235420
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-14 23:55 | Success | - | |
|
exp_self.20260414234722.289_20260414_234723
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260414234722.289 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-14 23:48 | Success | - | |
|
exp_self.20260414234003.288_20260414_234003
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260414234003.288 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-14 23:41 | Success | - | |
|
exp_self.20260414233240.287_20260414_233241
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260414233240.287 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-14 23:33 | Success | - | |
|
exp_self.20260414232519.286_20260414_232520
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260414232519.286 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-14 23:26 | Success | - | |
|
exp_pytrain.20260414232250.122_20260414_232251
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-14 23:23 | Success | - | |
|
exp_self.20260414231559.285_20260414_231559
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260414231559.285 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-14 23:17 | Success | - | |
|
exp_self.20260414230832.284_20260414_230833
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260414230832.284 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-14 23:09 | Success | - | |
|
exp_self.20260414230115.283_20260414_230115
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260414230115.283 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-14 23:02 | Success | - | |
|
exp_hf_2604.12627_20260414_225652
|
KnowRL: Boosting LLM Reasoning via Reinforcement Learning with Minimal-Sufficient Knowledge Guidance
Paper ID: hf_2604.12627 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
04-14 22:57 | Success | - | |
|
exp_self.20260414225345.282_20260414_225346
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260414225345.282 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-14 22:54 | Success | - | |
|
exp_pytrain.20260414225122.121_20260414_225122
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-14 22:52 | Success | - | |
|
exp_self.20260414224434.281_20260414_224434
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260414224434.281 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-14 22:45 | Success | - | |
|
exp_self.20260414223714.280_20260414_223714
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260414223714.280 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-14 22:38 | Success | - | |
|
exp_hf_2604.12322_20260414_223358
|
Self-Adversarial One Step Generation via Condition Shifting
Paper ID: hf_2604.12322 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
04-14 22:35 | Success | - | |
|
exp_self.20260414222945.279_20260414_222945
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260414222945.279 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-14 22:30 | Success | - | |
|
exp_self.20260414222221.278_20260414_222222
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260414222221.278 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-14 22:23 | Success | - | |
|
exp_pytrain.20260414221954.120_20260414_221954
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-14 22:20 | Success | - | |
|
exp_hf_2604.12890_20260414_221711
|
Towards Long-horizon Agentic Multimodal Search
Paper ID: hf_2604.12890 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
04-14 22:18 | Success | - | |
|
exp_self.20260414221253.277_20260414_221254
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260414221253.277 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-14 22:13 | Success | - | |
|
exp_self.20260414220527.276_20260414_220528
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260414220527.276 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-14 22:06 | Success | - | |
|
exp_hf_2604.13010_20260414_220207
|
Lightning OPD: Efficient Post-Training for Large Reasoning Models with Offline On-Policy Distillation
Paper ID: hf_2604.13010 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
04-14 22:03 | Success | - | |
|
exp_hf_2604.12374_20260414_215840
|
Nemotron 3 Super: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning
Paper ID: hf_2604.12374 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
04-14 21:59 | Success | - | |
|
exp_self.20260414215643.275_20260414_215643
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260414215643.275 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-14 21:57 | Success | - | |
|
exp_self.20260414214915.274_20260414_214916
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260414214915.274 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-14 21:50 | Success | - | |
|
exp_pytrain.20260414214646.119_20260414_214646
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-14 21:47 | Success | - | |
|
exp_hf_2604.08865_20260414_214149
|
SPPO: Sequence-Level PPO for Long-Horizon Reasoning Tasks
Paper ID: hf_2604.08865 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
04-14 21:42 | Success | - | |
|
exp_self.20260414213952.273_20260414_213952
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260414213952.273 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-14 21:40 | Success | - | |
|
exp_self.20260414213231.272_20260414_213231
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260414213231.272 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-14 21:33 | Success | - | |
|
exp_self.20260414212508.271_20260414_212508
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260414212508.271 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-14 21:26 | Success | - | |
|
exp_2604.13024v1_20260414_212157
|
CLAD: Efficient Log Anomaly Detection Directly on Compressed Representations
Paper ID: 2604.13024v1 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark s...
|
04-14 21:22 | Success | - | |
|
exp_self.20260414211744.270_20260414_211745
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260414211744.270 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-14 21:18 | Success | - | |
|
exp_pytrain.20260414211516.118_20260414_211516
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-14 21:16 | Success | - | |
|
exp_self.20260414211059.269_20260414_211059
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260414211059.269 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-14 21:12 | Success | - | |
|
exp_2604.13035v1_20260414_210746
|
SceneCritic: A Symbolic Evaluator for 3D Indoor Scene Synthesis
Paper ID: 2604.13035v1 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark s...
|
04-14 21:08 | Success | - | |
|
exp_self.20260414210042.268_20260414_210043
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260414210042.268 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-14 21:01 | Success | - | |
|
exp_self.20260414205319.267_20260414_205319
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260414205319.267 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-14 20:54 | Success | - | |
|
exp_self.20260414204559.266_20260414_204600
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260414204559.266 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-14 20:47 | Success | - | |
|
exp_pytrain.20260414204330.117_20260414_204331
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-14 20:44 | Success | - | |
|
exp_self.20260414203634.265_20260414_203634
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260414203634.265 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-14 20:37 | Success | - | |
|
exp_self.20260414202909.264_20260414_202909
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260414202909.264 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-14 20:30 | Success | - | |
|
exp_self.20260414202147.263_20260414_202147
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260414202147.263 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-14 20:22 | Success | - | |
|
exp_self.20260414201421.262_20260414_201422
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260414201421.262 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-14 20:15 | Success | - | |
|
exp_pytrain.20260414201154.116_20260414_201154
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-14 20:12 | Success | - | |
|
exp_self.20260414200511.261_20260414_200511
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260414200511.261 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-14 20:06 | Success | - | |
|
exp_self.20260414195748.260_20260414_195749
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260414195748.260 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-14 19:58 | Success | - | |
|
exp_self.20260414195026.259_20260414_195027
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260414195026.259 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-14 19:51 | Success | - | |
|
exp_gh_leitoooatr_PythonVectorDB_20260414_194717
|
leitoooatr/PythonVectorDB
Paper ID: gh_leitoooatr_PythonVectorDB - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recov...
|
04-14 19:48 | Success | - | |
|
exp_self.20260414194238.258_20260414_194238
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260414194238.258 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-14 19:43 | Success | - | |
|
exp_pytrain.20260414194017.115_20260414_194017
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-14 19:41 | Success | - | |
|
exp_self.20260414193322.257_20260414_193322
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260414193322.257 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-14 19:34 | Success | - | |
|
exp_self.20260414192600.256_20260414_192601
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260414192600.256 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-14 19:27 | Success | - | |
|
exp_self.20260414191843.255_20260414_191843
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260414191843.255 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-14 19:19 | Success | - | |
|
exp_gh_Sheaantisocial810_pytorch-mobilenet-efficiency_20260414_191420
|
Sheaantisocial810/pytorch-mobilenet-efficiency
Paper ID: gh_Sheaantisocial810_pytorch-mobilenet-efficiency - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - E...
|
04-14 19:15 | Success | - | |
|
exp_self.20260414191114.254_20260414_191114
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260414191114.254 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-14 19:12 | Success | - | |
|
exp_pytrain.20260414190847.114_20260414_190848
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-14 19:09 | Success | - | |
|
exp_self.20260414190159.253_20260414_190159
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260414190159.253 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-14 19:03 | Success | - | |
|
exp_self.20260414185435.252_20260414_185436
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260414185435.252 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-14 18:55 | Success | - | |
|
exp_self.20260414184714.251_20260414_184715
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260414184714.251 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-14 18:48 | Success | - | |
|
exp_self.20260414183950.250_20260414_183950
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260414183950.250 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-14 18:40 | Success | - | |
|
exp_pytrain.20260414183730.113_20260414_183731
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-14 18:38 | Success | - | |
|
exp_self.20260414183210.249_20260414_183211
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260414183210.249 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-14 18:33 | Success | - | |
|
exp_self.20260414182454.248_20260414_182454
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260414182454.248 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-14 18:25 | Success | - | |
|
exp_self.20260414181734.247_20260414_181735
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260414181734.247 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-14 18:18 | Success | - | |
|
exp_self.20260414181015.246_20260414_181015
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260414181015.246 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-14 18:11 | Success | - | |
|
exp_hf_2604.04385_20260414_180721
|
How Alignment Routes: Localizing, Scaling, and Controlling Policy Circuits in Language Models
Paper ID: hf_2604.04385 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
04-14 18:08 | Success | - | |
|
exp_pytrain.20260414180526.112_20260414_180526
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-14 18:06 | Success | - | |
|
exp_self.20260414180331.245_20260414_180332
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260414180331.245 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-14 18:04 | Success | - | |
|
exp_hf_2604.11004_20260414_180040
|
Panoptic Pairwise Distortion Graph
Paper ID: hf_2604.11004 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
04-14 18:01 | Success | - | |
|
exp_cr_10.3390_axioms15040289_20260414_175754
|
Amortized Parameter Inference for the Arbitrary-Order Hidden Markov Model
Paper ID: cr_10.3390_axioms15040289 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovere...
|
04-14 17:58 | Success | - | |
|
exp_hf_2604.10539_20260414_175532
|
IceCache: Memory-efficient KV-cache Management for Long-Sequence LLMs
Paper ID: hf_2604.10539 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
04-14 17:56 | Success | - | |
|
exp_self.20260414175335.244_20260414_175336
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260414175335.244 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-14 17:54 | Success | - | |
|
exp_pytrain.20260414173208.111_20260414_173234
|
AST-Based Package Type Coverage Analyzer
This benchmark tests the ability to construct a static analysis tool using Python's standard library. The goal is to validate type annotation coverage across a dynamically generated Python package structure without executing the target code...
|
04-14 17:33 | Success | - | |
|
exp_self.20260414171027.243_20260414_171046
|
Benchmark: SSM Memory Policy Stress Test
This benchmark evaluates the impact of a disciplined memory management strategy on State Space Model (SSM) throughput and VRAM consumption. Hypothesis: Applying a chunked execution strategy (disciplined memory policy) to SSM layers significa...
|
04-14 17:11 | Success | - | |
|
exp_pytrain.20260414161649.110_20260414_161727
|
Generic Resource Loader Benchmark
This benchmark demonstrates a robust implementation of a generic resource loader using Python's modern type hinting system (PEP 585/PEP 591) and the `importlib.resources` API. Objective: To verify that a generic class `ResourceLoader[T]` can...
|
04-14 16:18 | Success | - | |
|
exp_self.20260414155420.242_20260414_155447
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260414155420.242 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-14 15:55 | Success | - | |
|
exp_pytrain.20260414150311.109_20260414_150344
|
Strictly Typed Plugin Registry with Runtime Validation
Overview: This benchmark validates a robust plugin architecture implementation using Python's `typing.Protocol`. The system enforces interface compliance at both static (linting/type checking) and dynamic (runtime) levels. Problem Statement...
|
04-14 15:04 | Success | - | |
|
exp_self.20260414144028.241_20260414_144110
|
Self-directed benchmark: ssm strategy stress test
This benchmark evaluates the performance characteristics and memory efficiency of a Selective State Space Model (SSM) strategy against a standard Transformer (Attention) baseline. Hypothesis: Applying SSM with a disciplined memory policy imp...
|
04-14 14:42 | Success | - | |
|
exp_pytrain.20260414134700.108_20260414_134734
|
Python Skill Fallback
Title: Type-Validated ZipApp Packager - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-14 13:48 | Success | - | |
|
exp_self.20260414132528.240_20260414_132628
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260414132528.240 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-14 13:27 | Success | - | |
|
exp_pytrain.20260414123136.107_20260414_123157
|
Strictly Typed Plugin Architecture with Dynamic Discovery
Overview: This benchmark demonstrates a robust, type-safe plugin architecture using Python's standard library. It leverages `typing.Protocol` for structural interface enforcement and `types.ModuleType` for dynamic module generation and intro...
|
04-14 12:32 | Success | - | |
|
exp_self.20260414120714.239_20260414_120732
|
SSM Strategy Stress Test Benchmark
This benchmark evaluates the hypothesis that applying SSM (State Space Model) strategies with a disciplined memory policy improves throughput under strict 8GB VRAM constraints. Context: SSMs, such as Mamba, rely on efficient recurrence mecha...
|
04-14 12:09 | Success | - | |
|
exp_pytrain.20260414110652.106_20260414_110752
|
Python Skill Fallback
Title: Strictly Typed Data Pipeline with Dynamic Registration - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-14 11:08 | Success | - | |
|
exp_self.20260414103906.238_20260414_103943
|
SSM Strategy Stress Test Benchmark
Overview: This benchmark evaluates the hypothesis that applying **State Space Model (SSM)** strategies with a disciplined memory policy and dynamic precision can significantly improve throughput under constrained **8GB VRAM** environments. I...
|
04-14 10:40 | Success | - | |
|
exp_pytrain.20260414093416.105_20260414_093513
|
Dynamic Protocol-Compliant Plugin Loader
This benchmark evaluates a system's ability to dynamically construct Python package structures in a volatile environment and enforce strict structural subtyping using `typing.Protocol`. It tests the candidate's capability to manage temporar...
|
04-14 09:36 | Success | - | |
|
exp_self.20260414090917.237_20260414_090943
|
SSM Strategy Stress Test
This benchmark evaluates the performance of State Space Models (SSMs) under varying memory policies. Hypothesis: Applying an SSM with a disciplined memory policy (using caching and dynamic precision) improves throughput and efficiency under...
|
04-14 09:11 | Success | - | |
|
exp_pytrain.20260414081013.104_20260414_081042
|
Python Skill Fallback
Title: Strictly-Typed Modular Plugin Registry - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-14 08:11 | Success | - | |
|
exp_self.20260414074311.236_20260414_074343
|
SSM Strategy Stress Test Benchmark
Overview: This benchmark tests the hypothesis that State Space Models (SSMs) with a disciplined memory policy (specifically selective state spaces like Mamba) offer superior throughput and VRAM efficiency compared to standard attention mecha...
|
04-14 07:45 | Success | - | |
|
exp_pytrain.20260414064022.103_20260414_064045
|
Python Skill Fallback
Title: Runtime Plugin Loader with Protocol Validation - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-14 06:41 | Success | - | |
|
exp_self.20260414061348.235_20260414_061444
|
SSM Strategy Stress Test Benchmark
This benchmark evaluates the performance of a State Space Model (SSM) strategy against a traditional ablated baseline. Specifically, it tests the hypothesis that applying SSMs with a disciplined memory policy (using dynamic precision and re...
|
04-14 06:15 | Success | - | |
|
exp_pytrain.20260414051045.102_20260414_051117
|
Python Skill Fallback
Title: PEP 695 Generic Dependency Container with Runtime Validation - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-14 05:12 | Success | - | |
|
exp_self.20260414044155.234_20260414_044252
|
SSM Strategy Stress Test Benchmark
This benchmark evaluates the hypothesis that a State Space Model (SSM) implementation, when combined with a disciplined memory policy (specifically dynamic precision mixing and state caching), yields superior throughput and lower VRAM usage...
|
04-14 04:43 | Success | - | |
|
exp_pytrain.20260414033228.101_20260414_033254
|
Python Skill Fallback
Title: Generic Plugin Registry with API Surface Hygiene - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-14 03:33 | Success | - | |
|
exp_2604.11807v1_20260414_031604
|
Physics-Informed State Space Models for Reliable Solar Irradiance Forecasting in Off-Grid Systems
Paper ID: 2604.11807v1 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark s...
|
04-14 03:17 | Success | - | |
|
exp_pytrain.20260414025339.100_20260414_025339
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-14 02:54 | Success | - | |
|
exp_self.20260414024923.233_20260414_024923
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260414024923.233 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-14 02:50 | Success | - | |
|
exp_self.20260414024147.232_20260414_024148
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260414024147.232 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-14 02:42 | Success | - | |
|
exp_pytrain.20260414022017.099_20260414_022118
|
Python Skill Fallback
Title: Typing-Driven Plugin Registry with Namespace Control - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-14 02:22 | Success | - | |
|
exp_self.20260414015142.231_20260414_015142
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260414015142.231 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-14 01:52 | Success | - | |
|
exp_self.20260414014353.230_20260414_014353
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260414014353.230 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-14 01:44 | Success | - | |
|
exp_pytrain.20260414014109.098_20260414_014109
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-14 01:42 | Success | - | |
|
exp_self.20260414013536.229_20260414_013536
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260414013536.229 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-14 01:36 | Success | - | |
|
exp_self.20260414012749.228_20260414_012749
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260414012749.228 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-14 01:28 | Success | - | |
|
exp_self.20260414012000.227_20260414_012000
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260414012000.227 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-14 01:21 | Success | - | |
|
exp_self.20260414011214.226_20260414_011214
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260414011214.226 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-14 01:13 | Success | - | |
|
exp_pytrain.20260414010934.097_20260414_010934
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-14 01:10 | Success | - | |
|
exp_self.20260414010507.225_20260414_010508
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260414010507.225 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-14 01:06 | Success | - | |
|
exp_self.20260414005754.224_20260414_005754
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260414005754.224 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-14 00:58 | Success | - | |
|
exp_self.20260414005013.223_20260414_005014
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260414005013.223 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-14 00:51 | Success | - | |
|
exp_self.20260414004231.222_20260414_004232
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260414004231.222 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-14 00:43 | Success | - | |
|
exp_pytrain.20260414003726.096_20260414_003727
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-14 00:38 | Success | - | |
|
exp_self.20260414003516.221_20260414_003516
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260414003516.221 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-14 00:36 | Success | - | |
|
exp_self.20260414002727.220_20260414_002728
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260414002727.220 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-14 00:28 | Success | - | |
|
exp_self.20260414001944.219_20260414_001945
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260414001944.219 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-14 00:20 | Success | - | |
|
exp_self.20260414001201.218_20260414_001201
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260414001201.218 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-14 00:13 | Success | - | |
|
exp_hf_2604.10333_20260414_000835
|
Zero-shot World Models Are Developmentally Efficient Learners
Paper ID: hf_2604.10333 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
04-14 00:09 | Success | - | |
|
exp_pytrain.20260414000401.095_20260414_000402
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-14 00:05 | Success | - | |
|
exp_self.20260414000152.217_20260414_000152
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260414000152.217 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-14 00:02 | Success | - | |
|
exp_self.20260413235410.216_20260413_235411
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260413235410.216 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-13 23:55 | Success | - | |
|
exp_hf_2604.10030_20260413_235112
|
Prompt Relay: Inference-Time Temporal Control for Multi-Event Video Generation
Paper ID: hf_2604.10030 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
04-13 23:52 | Success | - | |
|
exp_self.20260413234354.215_20260413_234354
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260413234354.215 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-13 23:44 | Success | - | |
|
exp_self.20260413233607.214_20260413_233607
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260413233607.214 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-13 23:37 | Success | - | |
|
exp_pytrain.20260413233109.094_20260413_233110
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-13 23:32 | Success | - | |
|
exp_self.20260413232854.213_20260413_232855
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260413232854.213 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-13 23:29 | Success | - | |
|
exp_self.20260413232118.212_20260413_232118
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260413232118.212 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-13 23:22 | Success | - | |
|
exp_hf_2604.09212_20260413_231753
|
SPASM: Stable Persona-driven Agent Simulation for Multi-turn Dialogue Generation
Paper ID: hf_2604.09212 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
04-13 23:18 | Success | - | |
|
exp_2604.11808v1_20260413_231528
|
Pair2Scene: Learning Local Object Relations for Procedural Scene Generation
Paper ID: 2604.11808v1 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark s...
|
04-13 23:16 | Success | - | |
|
exp_2604.11804v1_20260413_231109
|
OmniShow: Unifying Multimodal Conditions for Human-Object Interaction Video Generation
Paper ID: 2604.11804v1 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark s...
|
04-13 23:12 | Success | - | |
|
exp_self.20260413230858.211_20260413_230858
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260413230858.211 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-13 23:10 | Success | - | |
|
exp_hf_2604.11035_20260413_230555
|
Introspective Diffusion Language Models
Paper ID: hf_2604.11035 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
04-13 23:06 | Success | - | |
|
exp_cr_10.1186_s42400-026-00589-0_20260413_230302
|
VulSCC: image-based vulnerability detection with SPP-CNN and code large language model
Paper ID: cr_10.1186_s42400-026-00589-0 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Reco...
|
04-13 23:04 | Success | - | |
|
exp_pytrain.20260413225641.093_20260413_225729
|
Strictly-Typed Dynamic Module Loader
This benchmark evaluates a robust, strictly-typed plugin architecture that dynamically discovers and imports modules at runtime without hardcoded imports. It simulates a high-performance plugin system where: 1. **Dynamic Discovery**: A `Plu...
|
04-13 22:58 | Success | - | |
|
exp_self.20260413224201.210_20260413_224201
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260413224201.210 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-13 22:43 | Success | - | |
|
exp_self.20260413223424.209_20260413_223424
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260413223424.209 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-13 22:35 | Success | - | |
|
exp_self.20260413222649.208_20260413_222650
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260413222649.208 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-13 22:27 | Success | - | |
|
exp_self.20260413221917.207_20260413_221918
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260413221917.207 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-13 22:20 | Success | - | |
|
exp_pytrain.20260413221639.092_20260413_221639
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-13 22:17 | Success | - | |
|
exp_hf_2604.10098_20260413_221348
|
Attention Sink in Transformers: A Survey on Utilization, Interpretation, and Mitigation
Paper ID: hf_2604.10098 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
04-13 22:14 | Success | - | |
|
exp_self.20260413221026.206_20260413_221027
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260413221026.206 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-13 22:11 | Success | - | |
|
exp_2604.11585v1_20260413_220705
|
GeomPrompt: Geometric Prompt Learning for RGB-D Semantic Segmentation Under Missing and Degraded Depth
Paper ID: 2604.11585v1 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark s...
|
04-13 22:08 | Success | - | |
|
exp_self.20260413220238.205_20260413_220238
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260413220238.205 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-13 22:03 | Success | - | |
|
exp_self.20260413215458.204_20260413_215459
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260413215458.204 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-13 21:56 | Success | - | |
|
exp_2604.11590v1_20260413_215204
|
Learning Robustness at Test-Time from a Non-Robust Teacher
Paper ID: 2604.11590v1 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark s...
|
04-13 21:53 | Success | - | |
|
exp_self.20260413214506.203_20260413_214506
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260413214506.203 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-13 21:46 | Success | - | |
|
exp_pytrain.20260413214228.091_20260413_214229
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-13 21:43 | Success | - | |
|
exp_hf_2604.11804_20260413_213941
|
OmniShow: Unifying Multimodal Conditions for Human-Object Interaction Video Generation
Paper ID: hf_2604.11804 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
04-13 21:40 | Success | - | |
|
exp_self.20260413213625.202_20260413_213625
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260413213625.202 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-13 21:37 | Success | - | |
|
exp_self.20260413212842.201_20260413_212843
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260413212842.201 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-13 21:29 | Success | - | |
|
exp_self.20260413212107.200_20260413_212108
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260413212107.200 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-13 21:22 | Success | - | |
|
exp_self.20260413211337.199_20260413_211338
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260413211337.199 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-13 21:14 | Success | - | |
|
exp_pytrain.20260413211102.090_20260413_211102
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-13 21:12 | Success | - | |
|
exp_self.20260413210407.198_20260413_210407
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260413210407.198 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-13 21:05 | Success | - | |
|
exp_self.20260413205635.197_20260413_205636
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260413205635.197 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-13 20:57 | Success | - | |
|
exp_2604.10556v1_20260413_205104
|
Lost in Diffusion: Uncovering Hallucination Patterns and Failure Modes in Diffusion Large Language Models
Paper ID: 2604.10556v1 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark s...
|
04-13 20:52 | Success | - | |
|
exp_self.20260413204857.196_20260413_204857
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260413204857.196 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-13 20:50 | Success | - | |
|
exp_self.20260413204123.195_20260413_204124
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260413204123.195 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-13 20:42 | Success | - | |
|
exp_pytrain.20260413203855.089_20260413_203856
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-13 20:39 | Success | - | |
|
exp_self.20260413203349.194_20260413_203349
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260413203349.194 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-13 20:34 | Success | - | |
|
exp_self.20260413202457.193_20260413_202457
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260413202457.193 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-13 20:26 | Success | - | |
|
exp_self.20260413201653.192_20260413_201653
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260413201653.192 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-13 20:17 | Success | - | |
|
exp_self.20260413200910.191_20260413_200910
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260413200910.191 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-13 20:10 | Success | - | |
|
exp_pytrain.20260413200636.088_20260413_200637
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-13 20:07 | Success | - | |
|
exp_self.20260413195935.190_20260413_195935
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260413195935.190 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-13 20:00 | Success | - | |
|
exp_self.20260413195206.189_20260413_195207
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260413195206.189 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-13 19:53 | Success | - | |
|
exp_self.20260413194437.188_20260413_194437
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260413194437.188 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-13 19:45 | Success | - | |
|
exp_self.20260413193700.187_20260413_193701
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260413193700.187 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-13 19:38 | Success | - | |
|
exp_pytrain.20260413193429.087_20260413_193430
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-13 19:35 | Success | - | |
|
exp_self.20260413193008.186_20260413_193009
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260413193008.186 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-13 19:31 | Success | - | |
|
exp_self.20260413192240.185_20260413_192240
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260413192240.185 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-13 19:23 | Success | - | |
|
exp_self.20260413191513.184_20260413_191514
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260413191513.184 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-13 19:16 | Success | - | |
|
exp_self.20260413190737.183_20260413_190738
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260413190737.183 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-13 19:08 | Success | - | |
|
exp_gh_qualcomm_ai-hub-apps_20260413_190448
|
qualcomm/ai-hub-apps
Paper ID: gh_qualcomm_ai-hub-apps - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-13 19:05 | Success | - | |
|
exp_pytrain.20260413190232.086_20260413_190232
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-13 19:03 | Success | - | |
|
exp_self.20260413185538.182_20260413_185538
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260413185538.182 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-13 18:56 | Success | - | |
|
exp_self.20260413184807.181_20260413_184807
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260413184807.181 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-13 18:49 | Success | - | |
|
exp_self.20260413184035.180_20260413_184035
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260413184035.180 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-13 18:41 | Success | - | |
|
exp_self.20260413183306.179_20260413_183307
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260413183306.179 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-13 18:34 | Success | - | |
|
exp_pytrain.20260413183033.085_20260413_183033
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-13 18:31 | Success | - | |
|
exp_self.20260413182453.178_20260413_182454
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260413182453.178 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-13 18:25 | Success | - | |
|
exp_self.20260413181723.177_20260413_181723
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260413181723.177 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-13 18:18 | Success | - | |
|
exp_self.20260413180947.176_20260413_180947
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260413180947.176 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-13 18:10 | Success | - | |
|
exp_self.20260413180241.175_20260413_180241
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260413180241.175 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-13 18:03 | Success | - | |
|
exp_pytrain.20260413175904.084_20260413_175904
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-13 18:00 | Success | - | |
|
exp_self.20260413175335.174_20260413_175336
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260413175335.174 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-13 17:54 | Success | - | |
|
exp_self.20260413174608.173_20260413_174609
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260413174608.173 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-13 17:47 | Success | - | |
|
exp_self.20260413173840.172_20260413_173840
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260413173840.172 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-13 17:39 | Success | - | |
|
exp_hf_2604.02315_20260413_173519
|
Beyond the Assistant Turn: User Turn Generation as a Probe of Interaction Awareness in Language Models
Paper ID: hf_2604.02315 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
04-13 17:36 | Success | - | |
|
exp_self.20260413172951.171_20260413_172952
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260413172951.171 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-13 17:30 | Success | - | |
|
exp_pytrain.20260413172713.083_20260413_172714
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-13 17:28 | Success | - | |
|
exp_self.20260413172016.170_20260413_172016
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260413172016.170 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-13 17:21 | Success | - | |
|
exp_self.20260413171240.169_20260413_171240
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260413171240.169 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-13 17:13 | Success | - | |
|
exp_self.20260413170513.168_20260413_170513
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260413170513.168 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-13 17:06 | Success | - | |
|
exp_self.20260413165741.167_20260413_165741
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260413165741.167 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-13 16:58 | Success | - | |
|
exp_pytrain.20260413165501.082_20260413_165501
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-13 16:56 | Success | - | |
|
exp_self.20260413164805.166_20260413_164805
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260413164805.166 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-13 16:49 | Success | - | |
|
exp_self.20260413164033.165_20260413_164034
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260413164033.165 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-13 16:41 | Success | - | |
|
exp_self.20260413163251.164_20260413_163251
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260413163251.164 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-13 16:33 | Success | - | |
|
exp_self.20260413162522.163_20260413_162522
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260413162522.163 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-13 16:26 | Success | - | |
|
exp_pytrain.20260413162239.081_20260413_162239
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-13 16:23 | Success | - | |
|
exp_self.20260413161627.162_20260413_161628
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260413161627.162 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-13 16:17 | Success | - | |
|
exp_self.20260413160857.161_20260413_160857
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260413160857.161 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-13 16:09 | Success | - | |
|
exp_self.20260413160133.160_20260413_160133
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260413160133.160 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-13 16:02 | Success | - | |
|
exp_self.20260413155412.159_20260413_155412
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260413155412.159 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-13 15:55 | Success | - | |
|
exp_pytrain.20260413155041.080_20260413_155041
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-13 15:51 | Success | - | |
|
exp_self.20260413154634.158_20260413_154634
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260413154634.158 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-13 15:47 | Success | - | |
|
exp_self.20260413153914.157_20260413_153914
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260413153914.157 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-13 15:40 | Success | - | |
|
exp_self.20260413153157.156_20260413_153157
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260413153157.156 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-13 15:33 | Success | - | |
|
exp_self.20260413152441.155_20260413_152441
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260413152441.155 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-13 15:25 | Success | - | |
|
exp_pytrain.20260413151858.079_20260413_151858
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-13 15:20 | Success | - | |
|
exp_self.20260413151704.154_20260413_151705
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260413151704.154 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-13 15:18 | Success | - | |
|
exp_self.20260413150949.153_20260413_150950
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260413150949.153 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-13 15:10 | Success | - | |
|
exp_self.20260413150237.152_20260413_150237
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260413150237.152 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-13 15:03 | Success | - | |
|
exp_self.20260413145518.151_20260413_145519
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260413145518.151 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-13 14:56 | Success | - | |
|
exp_self.20260413144752.150_20260413_144752
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260413144752.150 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-13 14:48 | Success | - | |
|
exp_pytrain.20260413144534.078_20260413_144535
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-13 14:46 | Success | - | |
|
exp_self.20260413143846.149_20260413_143847
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260413143846.149 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-13 14:39 | Success | - | |
|
exp_self.20260413143134.148_20260413_143134
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260413143134.148 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-13 14:32 | Success | - | |
|
exp_self.20260413142413.147_20260413_142414
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260413142413.147 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-13 14:25 | Success | - | |
|
exp_self.20260413141638.146_20260413_141638
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260413141638.146 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-13 14:17 | Success | - | |
|
exp_pytrain.20260413141419.077_20260413_141419
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-13 14:15 | Success | - | |
|
exp_self.20260413140731.145_20260413_140731
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260413140731.145 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-13 14:08 | Success | - | |
|
exp_self.20260413140007.144_20260413_140008
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260413140007.144 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-13 14:01 | Success | - | |
|
exp_self.20260413135239.143_20260413_135240
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260413135239.143 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-13 13:53 | Success | - | |
|
exp_self.20260413134519.142_20260413_134520
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260413134519.142 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-13 13:46 | Success | - | |
|
exp_pytrain.20260413134259.076_20260413_134300
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-13 13:44 | Success | - | |
|
exp_self.20260413133737.141_20260413_133737
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260413133737.141 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-13 13:38 | Success | - | |
|
exp_self.20260413133015.140_20260413_133016
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260413133015.140 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-13 13:31 | Success | - | |
| exp_self.20260413132255.139_20260413_132255 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260413132255.139 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-13 13:23 | Success | - | |
| exp_hf_2604.04987_20260413_132006 | Cactus: Accelerating Auto-Regressive Decoding with Constrained Acceptance Speculative Sampling - Paper ID: hf_2604.04987 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark... | 04-13 13:21 | Success | - | |
| exp_self.20260413131311.138_20260413_131311 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260413131311.138 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-13 13:14 | Success | - | |
| exp_pytrain.20260413131047.075_20260413_131047 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-13 13:11 | Success | - | |
| exp_self.20260413130332.137_20260413_130332 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260413130332.137 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-13 13:04 | Success | - | |
| exp_self.20260413125611.136_20260413_125611 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260413125611.136 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-13 12:57 | Success | - | |
| exp_self.20260413124848.135_20260413_124849 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260413124848.135 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-13 12:49 | Success | - | |
| exp_self.20260413124120.134_20260413_124120 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260413124120.134 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-13 12:42 | Success | - | |
| exp_pytrain.20260413123900.074_20260413_123900 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-13 12:40 | Success | - | |
| exp_self.20260413123157.133_20260413_123158 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260413123157.133 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-13 12:33 | Success | - | |
| exp_self.20260413122432.132_20260413_122432 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260413122432.132 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-13 12:25 | Success | - | |
| exp_self.20260413121709.131_20260413_121710 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260413121709.131 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-13 12:18 | Success | - | |
| exp_self.20260413120923.130_20260413_120923 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260413120923.130 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-13 12:10 | Success | - | |
| exp_pytrain.20260413120634.073_20260413_120634 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-13 12:07 | Success | - | |
| exp_self.20260413115941.129_20260413_115942 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260413115941.129 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-13 12:00 | Success | - | |
| exp_self.20260413115219.128_20260413_115220 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260413115219.128 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-13 11:53 | Success | - | |
| exp_self.20260413114447.127_20260413_114447 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260413114447.127 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-13 11:45 | Success | - | |
| exp_self.20260413113704.126_20260413_113705 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260413113704.126 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-13 11:38 | Success | - | |
| exp_pytrain.20260413113435.072_20260413_113435 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-13 11:35 | Success | - | |
| exp_self.20260413113014.125_20260413_113015 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260413113014.125 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-13 11:31 | Success | - | |
| exp_self.20260413112251.124_20260413_112252 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260413112251.124 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-13 11:23 | Success | - | |
| exp_self.20260413111516.123_20260413_111517 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260413111516.123 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-13 11:16 | Success | - | |
| exp_self.20260413110742.122_20260413_110742 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260413110742.122 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-13 11:08 | Success | - | |
| exp_hf_2604.09527_20260413_110451 | Envisioning the Future, One Step at a Time - Paper ID: hf_2604.09527 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark... | 04-13 11:05 | Success | - | |
| exp_pytrain.20260413110243.071_20260413_110243 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-13 11:03 | Success | - | |
| exp_self.20260413105711.121_20260413_105712 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260413105711.121 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-13 10:58 | Success | - | |
| exp_self.20260413104939.120_20260413_104940 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260413104939.120 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-13 10:50 | Success | - | |
| exp_self.20260413104213.119_20260413_104214 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260413104213.119 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-13 10:43 | Success | - | |
| exp_hf_2604.09482_20260413_103857 | Process Reward Agents for Steering Knowledge-Intensive Reasoning - Paper ID: hf_2604.09482 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark... | 04-13 10:39 | Success | - | |
| exp_self.20260413103335.118_20260413_103335 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260413103335.118 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-13 10:34 | Success | - | |
| exp_pytrain.20260413103101.070_20260413_103101 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-13 10:32 | Success | - | |
| exp_self.20260413102411.117_20260413_102411 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260413102411.117 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-13 10:25 | Success | - | |
| exp_self.20260413101648.116_20260413_101648 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260413101648.116 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-13 10:17 | Success | - | |
| exp_self.20260413100929.115_20260413_100929 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260413100929.115 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-13 10:10 | Success | - | |
| exp_self.20260413100202.114_20260413_100203 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260413100202.114 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-13 10:03 | Success | - | |
| exp_pytrain.20260413095826.069_20260413_095827 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-13 09:59 | Success | - | |
| exp_self.20260413095417.113_20260413_095417 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260413095417.113 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-13 09:55 | Success | - | |
| exp_self.20260413094654.112_20260413_094654 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260413094654.112 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-13 09:47 | Success | - | |
| exp_self.20260413093927.111_20260413_093927 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260413093927.111 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-13 09:40 | Success | - | |
| exp_hf_2604.09130_20260413_093638 | EquiformerV3: Scaling Efficient, Expressive, and General SE(3)-Equivariant Graph Attention Transformers - Paper ID: hf_2604.09130 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark... | 04-13 09:37 | Success | - | |
| exp_self.20260413092935.110_20260413_092936 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260413092935.110 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-13 09:30 | Success | - | |
| exp_pytrain.20260413092708.068_20260413_092708 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-13 09:28 | Success | - | |
| exp_hf_2604.01848_20260413_092425 | Semantic Richness or Geometric Reasoning? The Fragility of VLM's Visual Invariance - Paper ID: hf_2604.01848 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark... | 04-13 09:25 | Success | - | |
| exp_self.20260413092007.109_20260413_092007 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260413092007.109 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-13 09:21 | Success | - | |
| exp_self.20260413091244.108_20260413_091244 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260413091244.108 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-13 09:13 | Success | - | |
| exp_self.20260413090519.107_20260413_090520 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260413090519.107 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-13 09:06 | Success | - | |
| exp_self.20260413085730.106_20260413_085731 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260413085730.106 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-13 08:58 | Success | - | |
| exp_pytrain.20260413085446.067_20260413_085447 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-13 08:55 | Success | - | |
| exp_self.20260413084746.105_20260413_084746 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260413084746.105 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-13 08:48 | Success | - | |
| exp_self.20260413084025.104_20260413_084025 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260413084025.104 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-13 08:41 | Success | - | |
| exp_self.20260413083237.103_20260413_083238 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260413083237.103 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-13 08:33 | Success | - | |
| exp_self.20260413082502.102_20260413_082502 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260413082502.102 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-13 08:26 | Success | - | |
| exp_pytrain.20260413082233.066_20260413_082234 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-13 08:23 | Success | - | |
| exp_self.20260413081537.101_20260413_081538 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260413081537.101 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-13 08:16 | Success | - | |
| exp_self.20260413080805.100_20260413_080805 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260413080805.100 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-13 08:09 | Success | - | |
| exp_self.20260413080040.099_20260413_080040 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260413080040.099 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-13 08:01 | Success | - | |
| exp_self.20260413075316.098_20260413_075317 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260413075316.098 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-13 07:54 | Success | - | |
| exp_pytrain.20260413075049.065_20260413_075049 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-13 07:51 | Success | - | |
| exp_self.20260413074357.097_20260413_074357 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260413074357.097 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-13 07:45 | Success | - | |
| exp_self.20260413073627.096_20260413_073628 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260413073627.096 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-13 07:37 | Success | - | |
| exp_self.20260413072905.095_20260413_072906 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260413072905.095 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-13 07:30 | Success | - | |
| exp_self.20260413072140.094_20260413_072140 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260413072140.094 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-13 07:22 | Success | - | |
| exp_pytrain.20260413071808.064_20260413_071808 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-13 07:19 | Success | - | |
| exp_self.20260413071354.093_20260413_071354 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260413071354.093 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-13 07:14 | Success | - | |
| exp_cr_10.1145_3800690_20260413_071104 | Enabling Low-Latency, GPU-Efficient Serverless Inference with Model Swapping - Paper ID: cr_10.1145_3800690 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered bench... | 04-13 07:12 | Success | - | |
| exp_cr_10.1145_3807449_20260413_070711 | Optimizing Attention for Large Language Model Inference on the MT-3000 Many-Core Processor - Paper ID: cr_10.1145_3807449 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered bench... | 04-13 07:08 | Success | - | |
| exp_self.20260413070451.092_20260413_070451 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260413070451.092 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-13 07:05 | Success | - | |
| exp_cr_10.1145_3802593_20260413_070141 | FDSR: Efficient Model Training via Adaptive Tensor Quantization Based on Frequency Domain Division and Similarity Data R... - Paper ID: cr_10.1145_3802593 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered bench... | 04-13 07:02 | Success | - | |
| exp_self.20260413065617.091_20260413_065617 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260413065617.091 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-13 06:57 | Success | - | |
| exp_self.20260413064849.090_20260413_064849 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260413064849.090 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-13 06:49 | Success | - | |
| exp_pytrain.20260413064618.063_20260413_064618 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-13 06:47 | Success | - | |
| exp_self.20260413064201.089_20260413_064202 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260413064201.089 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-13 06:43 | Success | - | |
| exp_self.20260413063429.088_20260413_063429 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260413063429.088 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-13 06:35 | Success | - | |
| exp_self.20260413062659.087_20260413_062700 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260413062659.087 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-13 06:28 | Success | - | |
| exp_self.20260413061935.086_20260413_061935 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260413061935.086 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-13 06:20 | Success | - | |
| exp_pytrain.20260413061446.062_20260413_061447 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-13 06:15 | Success | - | |
| exp_self.20260413061143.085_20260413_061148 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260413061143.085 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-13 06:12 | Success | - | |
| exp_self.20260413060418.084_20260413_060418 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260413060418.084 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-13 06:05 | Success | - | |
| exp_hf_2604.08118_20260413_060058 | Initialisation Determines the Basin: Efficient Codebook Optimisation for Extreme LLM Quantization - Paper ID: hf_2604.08118 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark... | 04-13 06:02 | Success | - | |
| exp_self.20260413055536.083_20260413_055537 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260413055536.083 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-13 05:56 | Success | - | |
| exp_hf_2604.08540_20260413_055245 | AVGen-Bench: A Task-Driven Benchmark for Multi-Granular Evaluation of Text-to-Audio-Video Generation - Paper ID: hf_2604.08540 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark... | 04-13 05:53 | Success | - | |
| exp_self.20260413054522.082_20260413_054523 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260413054522.082 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-13 05:46 | Success | - | |
| exp_pytrain.20260413054253.061_20260413_054254 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-13 05:43 | Success | - | |
| exp_self.20260413053607.081_20260413_053607 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260413053607.081 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-13 05:37 | Success | - | |
| exp_self.20260413052831.080_20260413_052831 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260413052831.080 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-13 05:29 | Success | - | |
| exp_self.20260413052103.079_20260413_052103 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260413052103.079 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-13 05:22 | Success | - | |
| exp_self.20260413051341.078_20260413_051342 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260413051341.078 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-13 05:14 | Success | - | |
| exp_pytrain.20260413051112.060_20260413_051112 | Python Skill Fallback - Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output. | 04-13 05:12 | Success | - | |
| exp_self.20260413050416.077_20260413_050416 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260413050416.077 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-13 05:05 | Success | - | |
| exp_self.20260413045638.076_20260413_045638 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260413045638.076 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-13 04:57 | Success | - | |
| exp_self.20260413044911.075_20260413_044911 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260413044911.075 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-13 04:50 | Success | - | |
| exp_self.20260413044147.074_20260413_044147 | Self-directed benchmark: ssm strategy stress test - Paper ID: self.20260413044147.074 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered... | 04-13 04:42 | Success | - | |
|
exp_pytrain.20260413043901.059_20260413_043901
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-13 04:40 | Success | - | |
|
exp_self.20260413043202.073_20260413_043203
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260413043202.073 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-13 04:33 | Success | - | |
|
exp_hf_2604.04415_20260413_042844
|
Structured Causal Video Reasoning via Multi-Objective Alignment
Paper ID: hf_2604.04415 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
04-13 04:29 | Success | - | |
|
exp_self.20260413042432.072_20260413_042433
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260413042432.072 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-13 04:25 | Success | - | |
|
exp_self.20260413041707.071_20260413_041707
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260413041707.071 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-13 04:18 | Success | - | |
|
exp_self.20260413040943.070_20260413_040944
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260413040943.070 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-13 04:10 | Success | - | |
|
exp_pytrain.20260413040716.058_20260413_040716
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-13 04:08 | Success | - | |
|
exp_self.20260413040018.069_20260413_040018
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260413040018.069 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-13 04:01 | Success | - | |
|
exp_cr_10.3390_rs18081145_20260413_035558
|
Dynamic Expansion Mixture-of-Experts with Pre-Trained Vision Transformer for Few-Shot Class-Incremental Remote Sensing S...
Paper ID: cr_10.3390_rs18081145 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered be...
|
04-13 03:57 | Success | - | |
|
exp_self.20260413035256.068_20260413_035257
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260413035256.068 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-13 03:53 | Success | - | |
|
exp_self.20260413034524.067_20260413_034525
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260413034524.067 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-13 03:46 | Success | - | |
|
exp_self.20260413033746.066_20260413_033746
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260413033746.066 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-13 03:38 | Success | - | |
|
exp_pytrain.20260413033522.057_20260413_033522
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-13 03:36 | Success | - | |
|
exp_self.20260413033107.065_20260413_033107
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260413033107.065 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-13 03:32 | Success | - | |
|
exp_self.20260413032334.064_20260413_032335
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260413032334.064 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-13 03:24 | Success | - | |
|
exp_self.20260413031558.063_20260413_031558
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260413031558.063 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-13 03:17 | Success | - | |
|
exp_self.20260413030812.062_20260413_030813
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260413030812.062 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-13 03:09 | Success | - | |
|
exp_pytrain.20260413030335.056_20260413_030335
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-13 03:04 | Success | - | |
|
exp_self.20260413030033.061_20260413_030033
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260413030033.061 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-13 03:01 | Success | - | |
|
exp_self.20260413025302.060_20260413_025302
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260413025302.060 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-13 02:54 | Success | - | |
|
exp_self.20260413024538.059_20260413_024538
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260413024538.059 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-13 02:46 | Success | - | |
|
exp_self.20260413023805.058_20260413_023806
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260413023805.058 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-13 02:39 | Success | - | |
|
exp_pytrain.20260413023150.055_20260413_023150
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-13 02:32 | Success | - | |
|
exp_self.20260413022957.057_20260413_022957
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260413022957.057 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-13 02:30 | Success | - | |
|
exp_self.20260413022231.056_20260413_022231
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260413022231.056 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-13 02:23 | Success | - | |
|
exp_self.20260413021500.055_20260413_021501
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260413021500.055 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-13 02:16 | Success | - | |
|
exp_self.20260413020711.054_20260413_020711
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260413020711.054 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-13 02:08 | Success | - | |
|
exp_cr_10.54254_2755-2721_2026.ba32663_20260413_020400
|
Comparative Study of LSTM, Transformer, and Mixture of Experts for RUL Prediction with Regime-Aware Optimization Researc...
Paper ID: cr_10.54254_2755-2721_2026.ba32663 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal:...
|
04-13 02:05 | Success | - | |
|
exp_pytrain.20260413015938.054_20260413_015939
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-13 02:00 | Success | - | |
|
exp_self.20260413015745.053_20260413_015746
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260413015745.053 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-13 01:58 | Success | - | |
|
exp_self.20260413015018.052_20260413_015018
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260413015018.052 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-13 01:51 | Success | - | |
|
exp_self.20260413014253.051_20260413_014254
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260413014253.051 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-13 01:43 | Success | - | |
|
exp_self.20260413013527.050_20260413_013527
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260413013527.050 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-13 01:36 | Success | - | |
|
exp_hf_2604.08626_20260413_013207
|
WildDet3D: Scaling Promptable 3D Detection in the Wild
Paper ID: hf_2604.08626 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
04-13 01:33 | Success | - | |
|
exp_pytrain.20260413012749.053_20260413_012750
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-13 01:28 | Success | - | |
|
exp_self.20260413012556.049_20260413_012556
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260413012556.049 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-13 01:26 | Success | - | |
|
exp_self.20260413011829.048_20260413_011829
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260413011829.048 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-13 01:19 | Success | - | |
|
exp_hf_2604.07786_20260413_011510
|
Cross-Modal Emotion Transfer for Emotion Editing in Talking Face Video
Paper ID: hf_2604.07786 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
04-13 01:16 | Success | - | |
|
exp_2604.09547v1_20260413_011251
|
Tango: Taming Visual Signals for Efficient Video Large Language Models
Paper ID: 2604.09547v1 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark s...
|
04-13 01:13 | Success | - | |
|
exp_cr_10.38124_ijisrt_26apr247_20260413_011001
|
Leveraging Gemma 4 Large Language Model for Protein Function Prediction and Interpretability Application of AI Models fo...
Paper ID: cr_10.38124_ijisrt_26apr247 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recove...
|
04-13 01:11 | Success | - | |
|
exp_hf_2604.09450_20260413_010740
|
ECHO: Efficient Chest X-ray Report Generation with One-step Block Diffusion
Paper ID: hf_2604.09450 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
04-13 01:08 | Success | - | |
|
exp_self.20260413010542.047_20260413_010543
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260413010542.047 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-13 01:06 | Success | - | |
|
exp_hf_2604.08995_20260413_010246
|
Matrix-Game 3.0: Real-Time and Streaming Interactive World Model with Long-Horizon Memory
Paper ID: hf_2604.08995 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
04-13 01:03 | Success | - | |
|
exp_pytrain.20260413005442.052_20260413_005542
|
Python Skill Fallback
Title: Strictly Typed Event Dispatcher Library - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-13 00:56 | Success | - | |
|
exp_self.20260413002630.046_20260413_002654
|
SSM Strategy Stress Test
This benchmark evaluates the hypothesis that applying State Space Models (SSMs) with a disciplined memory policy improves throughput under strict 8GB VRAM constraints compared to standard sequence processing. Methodology: We compare two dist...
|
04-13 00:28 | Success | - | |
|
exp_pytrain.20260412232411.051_20260412_232449
|
Python Skill Fallback
Title: Type-Aware CLI Argument Binder - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-12 23:25 | Success | - | |
|
exp_self.20260412230034.045_20260412_230100
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260412230034.045 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-12 23:02 | Success | - | |
|
exp_pytrain.20260412215617.050_20260412_215715
|
Dynamic Type-Verified Plugin Loader
This benchmark validates a robust plugin architecture implementation based on `typing.Protocol` and `importlib`. It simulates an autonomous system that receives raw code artifacts, dynamically packages them into a runtime module, and enforc...
|
04-12 21:58 | Success | - | |
|
exp_gh_piroplayers69-ops_S3T-Former_20260412_214229
|
piroplayers69-ops/S3T-Former
Paper ID: gh_piroplayers69-ops_S3T-Former - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Re...
|
04-12 21:43 | Success | - | |
|
exp_pytrain.20260412211950.049_20260412_212018
|
Generic Type-Safe CLI Command Builder
This benchmark evaluates the design and implementation of a robust, type-safe command-line interface (CLI) framework using Python's standard library. Problem Statement: The goal is to construct a `cli_builder` framework that enforces strong...
|
04-12 21:21 | Success | - | |
|
exp_self.20260412205759.044_20260412_205827
|
Self-directed SSM Strategy Stress Test Benchmark
Overview: This benchmark evaluates the efficiency of State Space Models (SSMs) versus standard attention-based architectures under strict memory constraints. The "Innovation" is the use of an SSM strategy (mimicking Mamba-style select...
|
04-12 20:59 | Success | - | |
|
exp_pytrain.20260412201149.048_20260412_201207
|
Generic Dependency Container with CLI Entry Point
This coding drill benchmarks the implementation of a dependency injection container using Python 3.12's modern Type Parameter Syntax (PEP 695). It enforces a strict separation of concerns, treating the logic as a reusable library and the `m...
|
04-12 20:13 | Success | - | |
|
exp_self.20260412195156.043_20260412_195217
|
Self-directed SSM Strategy Stress Test
This benchmark evaluates the hypothesis that applying State Space Model (SSM) strategies with a disciplined memory policy improves throughput under constrained 8GB VRAM environments. It compares a **Baseline** (naive memory handling) agains...
|
04-12 19:53 | Success | - | |
|
exp_pytrain.20260412190615.047_20260412_190640
|
Strictly Typed Component Registry Benchmark
This benchmark evaluates the implementation of a strictly typed component registry system using Python's `typing.Protocol` (PEP 544) to enforce structural subtyping. It simulates a modular architecture for performing operations on tensor-li...
|
04-12 19:07 | Success | - | |
|
exp_self.20260412184637.042_20260412_184656
|
Self-directed Benchmark: SSM Strategy Stress Test
This benchmark evaluates the performance of a disciplined State Space Model (SSM) implementation against a baseline approach under strict memory constraints (simulating an 8GB VRAM limit). Hypothesis: Applying an SSM with a disciplined memor...
|
04-12 18:48 | Success | - | |
|
exp_pytrain.20260412175234.046_20260412_175259
|
Python Skill Fallback
Title: Dynamic Plugin Loader with Runtime Typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-12 17:54 | Success | - | |
|
exp_self.20260412173053.041_20260412_173114
|
SSM Strategy Stress Test
This benchmark evaluates the hypothesis that State Space Models (SSMs), specifically the Mamba architecture, provide higher inference throughput and better VRAM utilization under 8GB constraints compared to traditional Transformer-based mod...
|
04-12 17:32 | Success | - | |
|
exp_pytrain.20260412164417.045_20260412_164437
|
Dynamic CLI Plugin System Benchmark
This benchmark tests your ability to implement a robust, type-safe plugin architecture using Python's standard library. You will define a Protocol for interface enforcement, a Registry for dependency management, and use `importlib` to dynam...
|
04-12 16:45 | Success | - | |
|
exp_self.20260412162331.040_20260412_162357
|
Self-directed benchmark: SSM Strategy Stress Test
This repository contains a minimal, runnable benchmark designed to test the hypothesis that applying State Space Models (SSMs) with a disciplined memory policy improves throughput under constrained 8GB VRAM environments. Objective: To compar...
|
04-12 16:25 | Success | - | |
|
exp_pytrain.20260412153650.044_20260412_153715
|
Python Skill Fallback
Title: Strict Package Metadata Validator - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-12 15:38 | Success | - | |
|
exp_self.20260412151622.039_20260412_151650
|
SSM Strategy Stress Test: Memory Policy & Precision
This benchmark evaluates the hypothesis that applying State Space Models (SSMs) with a disciplined memory policy (chunked processing, state retention, and mixed precision) significantly improves throughput and reduces VRAM usage compared to...
|
04-12 15:18 | Success | - | |
|
exp_pytrain.20260412143046.043_20260412_143104
|
Python Skill Fallback
Title: Dynamic Plugin Loader with Protocol Enforcement - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-12 14:32 | Success | - | |
|
exp_self.20260412141123.038_20260412_141148
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260412141123.038 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-12 14:12 | Success | - | |
|
exp_pytrain.20260412132229.042_20260412_132256
|
Python Skill Fallback
Title: Generic Package Loader with PEP 695 - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-12 13:23 | Success | - | |
|
exp_self.20260412130035.037_20260412_130119
|
SSM Strategy Stress Test
This benchmark evaluates the "Self-directed benchmark: ssm strategy stress test" hypothesis, testing whether a disciplined memory policy (specifically `dynamic_precision` scaling) applied to SSM architectures (Mamba) improves t...
|
04-12 13:02 | Success | - | |
|
exp_pytrain.20260412120400.041_20260412_120500
|
Python Skill Fallback
Title: Strict Dynamic Plugin Loader with Protocol Enforcement - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-12 12:06 | Success | - | |
|
exp_gh_JacobHuang91_prompt-refiner_20260412_115035
|
Benchmark for JacobHuang91/prompt-refiner
This benchmark evaluates the performance of the `prompt-refiner` library, focusing on its ability to manage context windows and optimize token usage for LLM applications. Overview: The `prompt-refiner` library claims to save 10-20% on API co...
|
04-12 11:51 | Success | - | |
|
exp_pytrain.20260412112501.040_20260412_112538
|
Typed Plugin Registry with Semantic Versioning
Overview: This benchmark implements a high-performance, type-safe plugin registry system simulating a modern AI package manager. It uses advanced Python `typing` features (Generics, Protocols, TypeVars) and `dataclasses` to manage data t...
|
04-12 11:26 | Success | - | |
|
exp_self.20260412110044.036_20260412_110124
|
Self-directed Benchmark: SSM Strategy Stress Test
This benchmark evaluates the hypothesis that applying State Space Models (SSMs) with a disciplined memory policy (specifically leveraging dynamic precision and cache management) improves throughput under constrained memory (simulated 8GB li...
|
04-12 11:02 | Success | - | |
|
exp_pytrain.20260412100230.039_20260412_100317
|
Generic Auto-Registry with Dynamic Module Loading
This coding drill focuses on advanced Python `typing` and dynamic module loading mechanisms, commonly found in frameworks like Hugging Face Transformers. The benchmark constructs a self-contained environment where a virtual package is gener...
|
04-12 10:04 | Success | - | |
|
exp_self.20260412093503.035_20260412_093541
|
Small, Runnable Benchmark: SSM Strategy Stress Test
This benchmark is designed to test the hypothesis that **applying SSM (State Space Models) with a disciplined memory policy improves throughput under 8GB constraints**. README.md: SSM Strategy Stress Test Benchmark. Overview: This benchmark ev...
|
04-12 09:38 | Success | - | |
|
exp_pytrain.20260412083845.038_20260412_083915
|
Python Skill Fallback
Title: Typed Plugin Architecture Simulator - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-12 08:40 | Success | - | |
|
exp_self.20260412081456.034_20260412_081524
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260412081456.034 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-12 08:16 | Success | - | |
|
exp_pytrain.20260412071638.037_20260412_071659
|
Dynamic Protocol-Based Extension Loader
Overview: This benchmark evaluates a Python system's capability to enforce strict structural typing using `typing.Protocol` while dynamically discovering and loading logic using `importlib`. Hypothesis: An autonomous coding system can create...
|
04-12 07:18 | Success | - | |
|
exp_self.20260412065209.033_20260412_065239
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260412065209.033 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-12 06:53 | Success | - | |
|
exp_pytrain.20260412055546.036_20260412_055604
|
Python Skill Fallback
Title: Typed Dynamic Plugin Loader - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-12 05:57 | Success | - | |
|
exp_self.20260412053141.032_20260412_053212
|
SSM Strategy Stress Test Benchmark
This repository contains a minimal, runnable benchmark designed to test the hypothesis that a disciplined memory policy (Dynamic Precision + Selective Caching) applied to State Space Model (SSM) layers improves throughput under constrained...
|
04-12 05:33 | Success | - | |
|
exp_pytrain.20260412044037.035_20260412_044106
|
Python Skill Fallback
Title: Strictly Typed Plugin Registry with Logical Namespacing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-12 04:42 | Success | - | |
|
exp_self.20260412042010.031_20260412_042032
|
SSM Strategy Stress Test Benchmark
This benchmark evaluates the hypothesis that applying State Space Model (SSM) strategies with a disciplined memory policy improves throughput under strict 8GB VRAM constraints. **Concept:** The benchmark compares two approaches to processing...
|
04-12 04:21 | Success | - | |
|
exp_pytrain.20260412032915.034_20260412_032936
|
Runtime-Validated Plugin Registry
This coding drill evaluates the ability to design a robust plugin system using Python's standard library. The candidate must implement an `ExtensionLoader` that dynamically discovers, loads, and validates external Python modules against str...
|
04-12 03:30 | Success | - | |
|
exp_self.20260412025342.030_20260412_025422
|
SSM Strategy Stress Test Benchmark
This benchmark evaluates the hypothesis that applying State Space Models (SSMs) with a disciplined memory policy (dynamic precision and memory-efficient scanning) improves throughput compared to a naive float32 implementation under tight 8G...
|
04-12 03:05 | Success | - | |
|
exp_pytrain.20260412015407.033_20260412_015433
|
Generic Backend Registry with Protocol Enforcement
**Objective:** Design and implement a modular inference engine simulation that strictly decouples interface definitions from concrete implementations. The solution must leverage Python's `typing.Protocol`, `TypeVar`, and Generic programming...
|
04-12 01:55 | Success | - | |
|
exp_self.20260412013115.029_20260412_013141
|
Self-directed benchmark: SSM Strategy Stress Test
This benchmark evaluates the hypothesis that applying Selective State Space Models (SSM) with a disciplined memory policy (dynamic precision) improves throughput under strict VRAM constraints compared to standard attention mechanisms. Conte...
|
04-12 01:33 | Success | - | |
|
exp_pytrain.20260412004008.032_20260412_004035
|
Strictly-Typed Component Registry and Dynamic Namespace Loader Benchmark
This benchmark evaluates the ability to architect internal SDK structures similar to large-scale libraries like Hugging Face Transformers. It tests the implementation of a robust registry pattern, Protocol enforcement, and dynamic namespace...
|
04-12 00:41 | Success | - | |
|
exp_self.20260412001746.028_20260412_001807
|
SSM Strategy Stress Test
This benchmark evaluates the hypothesis that applying State Space Model (SSM) strategies, specifically disciplined memory policies and dynamic precision, improves throughput under constrained VRAM (8GB) environments. Methodology: We compare tw...
|
04-12 00:19 | Success | - | |
|
exp_pytrain.20260411232628.031_20260411_232657
|
Runtime-Type-Checked Plugin Registry
This coding drill implements a modular Plugin Manager system leveraging Python's `typing.Protocol` for structural subtyping and runtime validation. Unlike traditional inheritance-based architectures, this system enforces contracts via type...
|
04-11 23:28 | Success | - | |
|
exp_self.20260411230611.027_20260411_230646
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260411230611.027 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-11 23:07 | Success | - | |
|
exp_pytrain.20260411221202.030_20260411_221230
|
Dynamic Async Plugin System Loader
Overview: This benchmark tests your ability to design a robust runtime code loading system using Python's standard library. It focuses on dynamic packaging, strict type enforcement using `typing.Protocol`, and asynchronous execution handling...
|
04-11 22:13 | Success | - | |
|
exp_self.20260411215054.026_20260411_215114
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260411215054.026 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-11 21:52 | Success | - | |
|
exp_pytrain.20260411210019.029_20260411_210047
|
Dynamic Virtual Package Loader with Strict Protocol Enforcement
Overview: This benchmark tests your ability to manipulate Python's import system and enforce type safety using modern typing protocols. **Scenario:** You are building a plugin system where modules are generated dynamically at runtime (e.g.,...
|
04-11 21:01 | Success | - | |
|
exp_self.20260411204051.025_20260411_204128
|
Benchmark: SSM Strategy Stress Test
Overview: This benchmark evaluates the efficiency of State Space Models (SSMs) versus standard Transformer architectures under constrained VRAM conditions (8GB limit). It specifically tests the hypothesis that an SSM implementation with a dis...
|
04-11 20:42 | Success | - | |
|
exp_pytrain.20260411200250.028_20260411_200317
|
Benchmark: Generic Entry-Point Plugin Loader
Overview This benchmark evaluates the implementation of a type-safe, generic plugin loading mechanism. It tests the candidate's ability to combine Python's static type safety features (Generics, Protocols) with dynamic runtime introspection...
|
04-11 20:04 | Success | - | |
|
exp_self.20260411194619.024_20260411_194631
|
Self-directed Benchmark: SSM Strategy Stress Test
This benchmark evaluates the hypothesis that applying a State Space Model (SSM) approach with a disciplined memory policy improves throughput and reduces VRAM usage compared to a standard baseline implementation. Objective To simulate the m...
|
04-11 19:47 | Success | - | |
|
exp_pytrain.20260411190821.027_20260411_190845
|
Python Reliability Drill: Typing & Verification
This drill implements a mock inference engine using strict Python typing and standard library tools. It simulates tensor operations and memory allocation patterns typical in LLM workloads (referenced from PyTorch and LitGPT contexts) withou...
|
04-11 19:09 | Success | - | |
|
exp_self.20260411185202.023_20260411_185228
|
Self-directed benchmark: ssm strategy stress test
Overview This benchmark evaluates the hypothesis that applying State Space Models (SSMs) with a disciplined memory policy improves throughput under constrained memory environments (approx 8GB VRAM). It compares a standard Transformer block...
|
04-11 18:53 | Success | - | |
|
exp_pytrain.20260411181400.026_20260411_181430
|
Benchmark: Strict Backend Registry with PEP 440 Versioning
This benchmark evaluates the implementation of a robust `PluginRegistry` system typical in high-performance ML inference engines (like vLLM or Diffusers). Objective Candidates must implement a registry system using Python's standard library...
|
04-11 18:15 | Success | - | |
|
exp_self.20260411175636.022_20260411_175657
|
Self-directed benchmark: ssm strategy stress test
Objective This benchmark evaluates the efficacy of a disciplined memory management policy for State Space Models (specifically mimicking Mamba-style SSMs) under a strict 8GB VRAM constraint. Hypothesis Applying SSM operations with a discipl...
|
04-11 17:58 | Success | - | |
|
exp_pytrain.20260411171609.025_20260411_171633
|
Python Skill Fallback
Title: Strict Typed Module Interface and CLI Entry Point - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-11 17:17 | Success | - | |
|
exp_self.20260411165411.021_20260411_165447
|
Self-directed benchmark: SSM Strategy Stress Test
This benchmark evaluates the memory efficiency of State Space Models (SSM) compared to standard Transformer Attention mechanisms under high-sequence-length stress tests. Hypothesis Applying SSM with a disciplined memory policy improves thro...
|
04-11 16:57 | Success | - | |
|
exp_pytrain.20260411160747.024_20260411_160808
|
Dynamic Plugin Loader with Protocol Enforcement
This benchmark tests your ability to use Python's standard library to perform dynamic code generation, filesystem manipulation, and runtime type verification. Objective Create a Python script that programmatically defines a strict `Protocol...
|
04-11 16:09 | Success | - | |
|
exp_self.20260411153809.020_20260411_153832
|
Self-directed SSM Strategy Stress Test
Overview This benchmark validates the hypothesis that applying State Space Model (SSM) strategies with disciplined memory policies improves throughput and efficiency under strict 8GB VRAM constraints. It compares a Baseline approach (simula...
|
04-11 15:49 | Success | - | |
|
exp_pytrain.20260411145202.023_20260411_145220
|
Structural Subtyping and Dynamic Module Loading Benchmark
This benchmark tests the ability to combine static structural typing (`typing.Protocol`) with dynamic module introspection (`importlib`). The objective is to build a robust, minimalistic plugin architecture that allows an autonomous system...
|
04-11 14:53 | Success | - | |
|
exp_self.20260411143309.019_20260411_143346
|
Self-Directed Benchmark: SSM Strategy Stress Test
Overview This benchmark evaluates the hypothesis that applying **State Space Models (SSM)** with a disciplined memory policy (specifically the Mamba architecture) significantly improves throughput and reduces VRAM overhead compared to stand...
|
04-11 14:34 | Success | - | |
|
exp_pytrain.20260411134749.022_20260411_134812
|
Python 3.12 Type Parameter Syntax Benchmark
Objective This benchmark evaluates the runtime behavior and validity of Python 3.12's PEP 695 Type Parameter Syntax within a dynamic package generation scenario. It simulates a meta-build system that generates source code on-the-fly to veri...
|
04-11 13:49 | Success | - | |
|
exp_self.20260411131521.018_20260411_131603
|
SSM Strategy Stress Test: Memory vs. Throughput
This benchmark evaluates the **State Space Model (SSM)** innovation regarding memory efficiency. The core hypothesis is that an SSM-based architecture with a disciplined memory policy can maintain high throughput (tokens/sec) while drastica...
|
04-11 13:29 | Success | - | |
|
exp_pytrain.20260411122601.021_20260411_122632
|
Python Skill Fallback
Title: Strictly Typed Dependency Injection Container - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-11 12:27 | Success | - | |
|
exp_self.20260411120227.017_20260411_120259
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260411120227.017 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-11 12:04 | Success | - | |
|
exp_pytrain.20260411111143.020_20260411_111217
|
Python Skill Fallback
Title: Strictly Typed Artifact Registry - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-11 11:13 | Success | - | |
|
exp_self.20260411104959.016_20260411_105034
|
SSM Strategy Stress Test
This repository contains a lightweight, runnable benchmark designed to test the hypothesis that **applying SSM (State Space Model) strategies with a disciplined memory policy improves throughput under 8GB VRAM constraints**. Hypothesis Stan...
|
04-11 10:52 | Success | - | |
|
exp_pytrain.20260411100253.019_20260411_100312
|
Strictly Typed Dynamic Plugin Loader
Overview This benchmark evaluates the system's ability to simulate the packaging and dynamic loading patterns common in modern ML libraries (e.g., HuggingFace Transformers). It programmatically generates a Python package structure at runtim...
|
04-11 10:04 | Success | - | |
|
exp_cr_10.1007_s44443-026-00723-5_20260411_095100
|
TM-RAG: a transformer-mamba model for long-text evidence aggregation in retrieval-augmented generation
Paper ID: cr_10.1007_s44443-026-00723-5 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Reco...
|
04-11 09:52 | Success | - | |
|
exp_pytrain.20260411093037.018_20260411_093107
|
Python Skill Fallback
Title: Type-Safe Plugin Discovery using importlib - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-11 09:32 | Success | - | |
|
exp_self.20260411090958.015_20260411_091020
|
Self-directed benchmark: SSM Strategy Stress Test
This benchmark evaluates the hypothesis that State Space Models (SSMs), employing disciplined memory policies (constant state size), offer superior throughput compared to standard Attention mechanisms under strict VRAM constraints (8GB). Me...
|
04-11 09:11 | Success | - | |
|
exp_pytrain.20260411082224.017_20260411_082242
|
Robust Package Scaffolder Benchmark
This benchmark tests the ability to generate a Python project structure using strict type definitions (`TypedDict`, `NewType`, `Literal`), `argparse` for CLI interaction, and `pathlib` for file system operations. Usage Run the script direct...
|
04-11 08:23 | Success | - | |
|
exp_self.20260411080236.014_20260411_080303
|
Self-directed benchmark: SSM Strategy Stress Test
This benchmark evaluates the hypothesis that applying State Space Models (SSM) with a disciplined memory policy (specifically dynamic precision and optimized caching strategies) improves thr...
|
04-11 08:04 | Success | - | |
|
exp_pytrain.20260411071439.016_20260411_071505
|
Strictly-Typed ZipApp Constructor
This benchmark evaluates a Python environment's ability to perform a micro-packaging pipeline that strictly adheres to typing protocols. Objective The goal is to dynamically generate a standalone Python application archive (`.pyz`) that imp...
|
04-11 07:16 | Success | - | |
|
exp_oa_W7152933450_20260411_070235
|
BOSCH: Black-Box Binary Optimization for Short-Context Attention-Head Selection in LLMs
Paper ID: oa_W7152933450 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
04-11 07:03 | Success | - | |
|
exp_pytrain.20260411064112.015_20260411_064135
|
Type-Safe Generic Event Dispatcher Benchmark
This project implements a Type-Safe Generic Event Dispatcher using modern Python 3.12+ features, specifically PEP 695 (Type Parameter Syntax) and PEP 544 (Protocols). It serves as a coding drill to verify static type safety constructs and r...
|
04-11 06:42 | Success | - | |
|
exp_self.20260411062140.013_20260411_062159
|
SSM Strategy Stress Test
This repository contains a minimal benchmark designed to evaluate the efficiency of State Space Models (SSMs) versus standard recurrent accumulation when dealing with long sequence dependencies under strict memory constraints. Objective The...
|
04-11 06:23 | Success | - | |
|
exp_pytrain.20260411053337.014_20260411_053401
|
Python Skill Fallback
Title: Strict PyProject Metadata Validator - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-11 05:35 | Success | - | |
|
exp_self.20260411051126.012_20260411_051147
|
SSM Strategy Stress Test
This benchmark evaluates the hypothesis that applying a State Space Model (SSM) strategy with a disciplined memory policy (specifically: chunked inference with state caching and dynamic precision) improves inference throughput under strict...
|
04-11 05:13 | Success | - | |
|
exp_pytrain.20260411041837.013_20260411_041912
|
Typing-Safe Dynamic Plugin Loader
This benchmark tests the ability to construct a robust, dynamic class loading mechanism using `importlib` and `typing.Protocol`. The goal is to simulate a modular architecture where classes are loaded at runtime based on string identifiers...
|
04-11 04:20 | Success | - | |
|
exp_self.20260411035703.011_20260411_035730
|
SSM Strategy Stress Test
This benchmark evaluates the performance of State Space Models (specifically Mamba) under strict VRAM constraints. It contrasts a **Standard Baseline** against a **Precision-Optimized** variant to verify the hypothesis that disciplined memo...
|
04-11 03:58 | Success | - | |
|
exp_pytrain.20260411030229.012_20260411_030249
|
Dynamic Component Loader with Strict Protocol Validation
This benchmark evaluates the implementation of a robust, ML-style plugin architecture using Python's standard library. The design simulates a Model Registration system where "plugin" modules are loaded dynamically from memory without touchi...
|
04-11 03:03 | Success | - | |
|
exp_self.20260411024146.010_20260411_024215
|
SSM Strategy Stress Test Benchmark
Overview This benchmark evaluates the hypothesis that applying State Space Model (SSM) strategies with disciplined memory policies improves throughput under 8GB VRAM constraints compared to standard Transformer architectures. It compares tw...
|
04-11 02:43 | Success | - | |
|
exp_pytrain.20260411015154.011_20260411_015227
|
Python Skill Fallback
Title: Strictly Typed Configuration & CLI Entry Point - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-11 01:53 | Success | - | |
|
exp_self.20260411013004.009_20260411_013045
|
SSM Strategy Stress Test Benchmark
Overview This benchmark evaluates the hypothesis that applying State Space Models (SSMs) with a disciplined memory policy significantly improves inference throughput and reduces VRAM overhead compared to standard attention mechanisms when h...
|
04-11 01:32 | Success | - | |
|
exp_pytrain.20260411004055.010_20260411_004120
|
Python Skill Fallback
Title: Strictly-Typed Dependency Visualizer - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-11 00:42 | Success | - | |
|
exp_self.20260411002149.008_20260411_002206
|
Self-directed benchmark: ssm strategy stress test
This benchmark evaluates the hypothesis that a disciplined memory policy within an SSM (State Space Model) architecture improves throughput under strict 8GB VRAM constraints. We compare a **Baseline** (Standard Transformer Attention mechani...
|
04-11 00:23 | Success | - | |
|
exp_pytrain.20260410233809.009_20260410_233839
|
Dynamic Plugin System with Runtime Type Verification
This benchmark tests the ability to design a modular, type-safe plugin system using Python's standard library. It evaluates the candidate's proficiency with `typing.Protocol` for interface definition, `importlib` for dynamic module loading,...
|
04-10 23:39 | Success | - | |
|
exp_self.20260410231535.007_20260410_231609
|
SSM Strategy Stress Test
This benchmark evaluates the performance characteristics of a State Space Model (SSM) implementation under memory pressure. It compares a naive, full-sequence processing approach against a disciplined memory policy that utilizes chunked sca...
|
04-10 23:17 | Success | - | |
|
exp_pytrain.20260410222110.008_20260410_222140
|
Self-Validating Plugin Registry with Dynamic Imports
Overview This benchmark evaluates a Python system's capability to dynamically construct, load, and validate software modules without relying on external files. It tests the integration of `importlib` for runtime module management and `typin...
|
04-10 22:22 | Success | - | |
|
exp_gh_onehundredfifty-myelatelia678_streaminfer_20260410_220818
|
Benchmark: Streaming Inference with Adaptive Batching
This benchmark evaluates the performance of a streaming inference engine. It simulates a real-time workload where input requests arrive continuously. The engine implements adaptive batching (grouping requests to maximize throughput) and bac...
|
04-10 22:09 | Success | - | |
|
exp_pytrain.20260410214603.007_20260410_214646
|
Type-Safe Tensor Arithmetic Package Benchmark
Objective Design and implement a robust Python package named `tensor_lite` that performs basic 2D matrix operations. The solution must demonstrate proficiency in modern Python packaging, static typing using Generics and Protocols, and basic...
|
04-10 21:47 | Success | - | |
|
exp_self.20260410212356.006_20260410_212427
|
Self-Directed Benchmark: SSM Strategy Stress Test
This benchmark evaluates the hypothesis that applying State Space Model (SSM) logic with a disciplined memory policy (dynamic precision and strict state management) improves inference throughput under constrained VRAM (8GB) compared to stan...
|
04-10 21:25 | Success | - | |
|
exp_pytrain.20260410202331.006_20260410_202400
|
Typed Plugin Registry and Namespace Dispatcher
Overview This benchmark demonstrates a robust, modular architecture using Python's standard `typing` module. It simulates a multi-package ecosystem (Core, Models, Utils) within a single script by leveraging class-based namespaces and `__all...
|
04-10 20:25 | Success | - | |
|
exp_self.20260410195844.005_20260410_195903
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260410195844.005 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-10 20:00 | Success | - | |
|
exp_pytrain.20260410191054.005_20260410_191112
|
Strictly Typed Source Distribution Builder
This benchmark evaluates the generation of a Python build script that enforces strict type safety using standard library modules (`typing`, `dataclasses`). **Overview** The system must construct a valid `PackageMetadata` schema and a runtim...
|
04-10 19:12 | Success | - | |
|
exp_self.20260410185055.004_20260410_185129
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260410185055.004 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-10 18:52 | Success | - | |
|
exp_pytrain.20260410180458.004_20260410_180525
|
Strictly Typed Configuration Manager Benchmark
This benchmark evaluates your ability to construct a robust, single-file Python module that demonstrates professional packaging standards (PEP 8 compliance, import organization, module metadata) and utilizes Python's static typing system to...
|
04-10 18:06 | Success | - | |
|
exp_self.20260410174513.003_20260410_174534
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260410174513.003 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-10 17:46 | Success | - | |
|
exp_pytrain.20260410165732.003_20260410_165753
|
Python Skill Fallback
Title: Strictly Typed Configuration Loader with Module Encapsulation - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-10 16:58 | Success | - | |
|
exp_self.20260410163757.002_20260410_163836
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260410163757.002 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-10 16:39 | Success | - | |
|
exp_pytrain.20260410154958.002_20260410_155027
|
Python Reliability Drill: Advanced Typing & Generics
This repository contains a coding benchmark designed to test advanced Python typing capabilities, specifically leveraging PEP 695 (Type Parameter Syntax) introduced in Python 3.12. Objective Implement a generic `Pipeline` class that enforce...
|
04-10 15:51 | Success | - | |
|
exp_self.20260410153050.001_20260410_153116
|
Benchmark for SSM Strategy: Stress Test
Overview This benchmark evaluates the **SSM Strategy Stress Test**, comparing a standard dense processing approach against an optimized SSM-inspired implementation featuring disciplined memory policies, caching, and dynamic precision (bf16)...
|
04-10 15:32 | Success | - | |
|
exp_pytrain.20260410144330.001_20260410_144415
|
Type-Safe Plugin Architecture Simulator Benchmark
This benchmark validates the capability of an autonomous system to dynamically generate Python package structures, implement strict typing protocols using `typing.Protocol` and `typing.TypeVar`, and perform runtime module discovery and load...
|
04-10 14:45 | Success | - | |
|
exp_pytrain.20260410140132.025_20260410_140159
|
Dynamic Plugin Loader with Strict Type Validation
Overview This coding drill tests the hypothesis that a robust Python system can dynamically construct local package structures at runtime, strictly define interface contracts using `typing.Protocol`, and utilize `importlib` to load and vali...
|
04-10 14:03 | Success | - | |
|
exp_self.20260410134129.024_20260410_134158
|
SSM Strategy Stress Test Benchmark
This repository contains a benchmark designed to test the hypothesis that applying State Space Models (SSMs) with a disciplined memory policy (specifically chunked state management and hardware-aware cache utilization) improves inference th...
|
04-10 13:43 | Success | - | |
|
exp_pytrain.20260410125519.024_20260410_125602
|
Python Skill Fallback
Title: Strictly Typed Module Architecture: Configuration Validator - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-10 12:57 | Success | - | |
|
exp_self.20260410123602.023_20260410_123618
|
Self-directed benchmark: SSM Strategy Stress Test
This benchmark evaluates the memory efficiency and throughput of a Selective State Space Model (SSM) strategy against a standard Transformer baseline. Innovation Abstract **Hypothesis**: Applying SSM with disciplined memory policy improves...
|
04-10 12:37 | Success | - | |
|
exp_pytrain.20260410114924.023_20260410_114949
|
Python Skill Fallback
Title: Type-Safe Plugin Registry with Dynamic Imports - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-10 11:50 | Success | - | |
|
exp_self.20260410112808.022_20260410_112829
|
SSM Strategy Stress Test Benchmark
This benchmark evaluates whether applying State Space Models (SSM) with a disciplined memory policy improves throughput under 8GB VRAM constraints. Overview The benchmark compares two implementations: 1. **Baseline SSM**: Standard implement...
|
04-10 11:29 | Success | - | |
|
exp_pytrain.20260410103325.022_20260410_103402
|
Generic Datastore using PEP 695 Type Parameters Benchmark
This benchmark evaluates a Python 3.12+ implementation of a type-safe Key-Value Store utilizing PEP 695 Type Parameter Syntax. Hypothesis Adopting Python 3.12's `class Class[T]:` and `type Alias = ...` syntax significantly reduces syntactic...
|
04-10 10:35 | Success | - | |
|
exp_self.20260410101117.021_20260410_101139
|
SSM Strategy Stress Test Benchmark
This benchmark evaluates the hypothesis that applying a State Space Model (SSM) with a disciplined memory policy improves throughput under 8GB VRAM constraints compared to a naive baseline implementation. Hypothesis Applying SSM with discip...
|
04-10 10:12 | Success | - | |
|
exp_pytrain.20260410091951.021_20260410_092012
|
Strict Typed Package Scaffolder
Overview This benchmark evaluates an autonomous coding agent's ability to synthesize a utility that bridges abstract type definitions with concrete filesystem operations. The goal is to generate a standards-compliant Python project structur...
|
04-10 09:21 | Success | - | |
|
exp_self.20260410085852.020_20260410_085922
|
SSM Strategy Stress Test Benchmark
This repository contains a standalone benchmark designed to evaluate the efficiency of State Space Models (SSMs) against standard Transformer architectures under memory-constrained scenarios (8GB VRAM limit). Hypothesis Applying SSMs with a...
|
04-10 09:00 | Success | - | |
|
exp_pytrain.20260410080757.020_20260410_080827
|
Python Skill Fallback
Title: Robust Plugin Loader with Strict Typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-10 08:09 | Success | - | |
|
exp_hf_2604.08120_20260410_075547
|
Benchmark: Adaptive Token Allocation (ATA) for Long Video Understanding
This benchmark simulates the **Tempo** framework for efficient long-video understanding. It tests the core hypothesis: that a Small Vision-Language Model (SVLM) acting as a query-aware compressor can drastically reduce VRAM usage while main...
|
04-10 07:56 | Success | - | |
|
exp_pytrain.20260410073453.019_20260410_073513
|
Python Skill Fallback
Title: Type-Safe Plugin Registry and Configuration Validator - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-10 07:36 | Success | - | |
|
exp_self.20260410071430.019_20260410_071452
|
SSM Strategy Stress Test: Disciplined Memory Policy Benchmark
Overview This benchmark evaluates the performance of a State Space Model (SSM) under constrained memory conditions (8GB VRAM target). It compares a **Baseline** (standard FP32) against an **Optimized** variant that applies a disciplined mem...
|
04-10 07:16 | Success | - | |
|
exp_pytrain.20260410062427.018_20260410_062453
|
Strictly-Typed Metadata Validator and Plugin Loader
This benchmark demonstrates a robust, zero-dependency package management system implementation using Python's advanced static typing features. Hypothesis Leveraging Python's advanced static typing features (`Protocol`, `TypeGuard`, and `Gen...
|
04-10 06:25 | Success | - | |
|
exp_self.20260410060220.018_20260410_060252
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260410060220.018 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-10 06:03 | Success | - | |
|
exp_pytrain.20260410050440.017_20260410_050517
|
Type-Safe Plugin Loader Simulation
This benchmark demonstrates the capability of an autonomous coding system to leverage Python's `typing` and `inspect` modules to construct a runtime plugin loader that enforces strict interface compliance. **Hypothesis:** An autonomous syst...
|
04-10 05:06 | Success | - | |
|
exp_self.20260410043647.017_20260410_043726
|
SSM Strategy Stress Test
This benchmark compares a standard Transformer-based architecture against an SSM (State Space Model) variant optimized with a disciplined memory policy and dynamic precision. The objective is to validate the hypothesis that SSMs with strict...
|
04-10 04:38 | Success | - | |
|
exp_pytrain.20260410032856.016_20260410_032928
|
Benchmark: Robust Dynamic Plugin Loader with Protocol Validation
Objective This benchmark validates a Python engineer's ability to construct a secure, dynamic plugin system. It demonstrates the bridge between Python's runtime import machinery (`importlib`) and its static type hinting system (`typing.Prot...
|
04-10 03:30 | Success | - | |
|
exp_self.20260410030024.016_20260410_030116
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260410030024.016 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-10 03:02 | Success | - | |
|
exp_pytrain.20260410015343.015_20260410_015415
|
Python Reliability Drill: Type-Safe Container Benchmark
This benchmark tests the implementation of a robust, generic `TypeSafeContainer` utility. The goal is to demonstrate proficiency with Python's type hinting system (PEP 484), runtime type enforcement, and error handling without relying on ex...
|
04-10 01:55 | Success | - | |
|
exp_self.20260410012423.015_20260410_012459
|
Self-directed Benchmark: SSM Strategy Stress Test
Overview This benchmark evaluates the **Memory-Disciplined SSM** innovation against a standard baseline. The hypothesis is that applying a disciplined memory policy (chunking and explicit cache management) to State Space Models (SSM) improv...
|
04-10 01:26 | Success | - | |
|
exp_pytrain.20260410002315.014_20260410_002405
|
Strict Package Type Auditor
Overview This benchmark provides a self-contained Python script that implements a static analysis tool for auditing Python packages. The tool, `audit_pkg.py` (implemented as a core function within `benchmark.py`), inspects a given directory...
|
04-10 00:25 | Success | - | |
|
exp_self.20260409235854.014_20260409_235923
|
Self-directed Benchmark: SSM Strategy Stress Test
1. Overview This benchmark evaluates the memory efficiency and throughput of **State Space Model (SSM)** strategies compared to traditional Transformer attention mechanisms under strict constraints (simulated 8GB VRAM limit). The innovation...
|
04-10 00:00 | Success | - | |
|
exp_pytrain.20260409225445.013_20260409_225545
|
Python Skill Fallback
Title: Strictly Typed Plugin Registry with Semantic Versioning - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-09 22:56 | Success | - | |
|
exp_self.20260409222703.013_20260409_222804
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260409222703.013 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-09 22:29 | Success | - | |
|
exp_pytrain.20260409212103.012_20260409_212134
|
Type-Safe Dynamic Plugin Registry Benchmark
This benchmark tests a Python developer's ability to implement a robust, extensible architecture using Python's `typing` module for Protocols and `importlib` for dynamic runtime discovery. Problem Description Modern Python frameworks often...
|
04-09 21:22 | Success | - | |
|
exp_self.20260409205521.012_20260409_205545
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260409205521.012 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-09 20:56 | Success | - | |
|
exp_self.20260409200913.011_20260409_200932
|
Self-directed benchmark: SSM strategy stress test
Overview This benchmark evaluates the impact of a disciplined memory policy (Dynamic Precision) on a State Space Model (SSM) architecture similar to Mamba. The goal is to validate if aggressive memory optimization improves throughput under...
|
04-09 20:10 | Success | - | |
|
exp_pytrain.20260409195603.011_20260409_195646
|
Benchmark: Typed CLI Log Filter
This benchmark evaluates a Python coding system's ability to generate a structured, robust Python module that adheres to modern packaging and typing standards while functioning as both a library and a command-line interface. Objective The s...
|
04-09 19:57 | Success | - | |
|
exp_self.20260409193726.010_20260409_193756
|
SSM Strategy Stress Test
This benchmark evaluates the "SSM Strategy" hypothesis: that using State Space Models (SSMs) with a disciplined memory policy significantly improves throughput and reduces VRAM usage compared to standard attention-based baselines when opera...
|
04-09 19:39 | Success | - | |
|
exp_pytrain.20260409185734.010_20260409_185755
|
Robust Asynchronous Plugin Loader
This benchmark evaluates the design of a strict, type-safe asynchronous plugin system using only the Python standard library. Objectives 1. **Protocol Enforcement**: Demonstrate the use of `typing.Protocol` to define structural subtyping (d...
|
04-09 18:59 | Success | - | |
|
exp_self.20260409183848.009_20260409_183932
|
SSM Strategy Stress Test
This benchmark evaluates the performance of a Selective State Space Model (SSM) architecture under constrained memory conditions. Objective To validate the hypothesis that a disciplined memory policy (utilizing `torch.compile` kernel fusion...
|
04-09 18:40 | Success | - | |
|
exp_2604.07350v1_20260409_180410
|
Fast Spatial Memory with Elastic Test-Time Training
Paper ID: 2604.07350v1 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark s...
|
04-09 18:05 | Success | - | |
|
exp_pytrain.20260409174610.009_20260409_174629
|
Python Skill Fallback
Title: Dynamic Type-Verified Plugin Loader - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-09 17:47 | Success | - | |
|
exp_self.20260409172843.008_20260409_172909
|
README: Self-directed SSM Strategy Stress Test
Overview This benchmark evaluates the hypothesis that a disciplined SSM (State Space Model) memory policy improves throughput under strict memory constraints (specifically targeting < 8GB VRAM usage) compared to a standard Transformer-style...
|
04-09 17:30 | Success | - | |
|
exp_pytrain.20260409164600.008_20260409_164626
|
Python Skill Fallback
Title: Dynamic Plugin Registry with Type-Safe Discovery - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-09 16:47 | Success | - | |
|
exp_self.20260409162725.007_20260409_162756
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260409162725.007 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-09 16:28 | Success | - | |
|
exp_pytrain.20260409154439.007_20260409_154458
|
Python Skill Fallback
Title: Generic Type-Safe Component Registry - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-09 15:46 | Success | - | |
|
exp_self.20260409152700.006_20260409_152722
|
SSM Strategy Stress Test
**Objective:** Evaluate the performance impact of a disciplined State Space Model (SSM) memory policy against a standard attention-based baseline under strict 8GB VRAM constraints. **Hypothesis:** Applying SSM with disciplined memory policy...
|
04-09 15:28 | Success | - | |
|
exp_hf_2604.05643_20260409_145252
|
Graph-Based Chain-of-Thought Pruning Benchmark
This benchmark evaluates the efficiency gains of the proposed **Graph-Based CoT Pruning** framework. The innovation targets the reduction of "Indiscriminate" and "Repetitive" reflections in Large Language Models (LLMs) by converting linear...
|
04-09 14:53 | Success | - | |
|
exp_pytrain.20260409143402.006_20260409_143428
|
Dynamic Module Loader with Runtime Protocol Verification
This benchmark tests the ability to dynamically compile, load, and validate Python modules from source code strings at runtime. It simulates a plugin architecture where untrusted code must be strictly verified against a `typing.Protocol` be...
|
04-09 14:35 | Success | - | |
|
exp_self.20260409141422.005_20260409_141502
|
Benchmark: SSM Strategy Stress Test
This benchmark evaluates the hypothesis that applying a **State Space Model (SSM)** strategy with a disciplined memory policy (specifically, a Mamba-inspired selective scan) significantly improves throughput and reduces VRAM footprint compa...
|
04-09 14:16 | Success | - | |
|
exp_pytrain.20260409133428.005_20260409_133446
|
Dynamic Type-Verified Package Loader
This benchmark demonstrates the creation of a robust, autonomous plugin loading system using Python's standard library. Objective The goal is to simulate a dynamic extension system where: 1. A temporary Python package is generated programma...
|
04-09 13:35 | Success | - | |
|
exp_self.20260409131636.004_20260409_131657
|
Self-Directed Benchmark: SSM Strategy Stress Test
This benchmark evaluates the impact of a disciplined memory policy (Dynamic Precision + Cache Management) on State Space Models (SSM) under tight VRAM constraints (targeting < 8GB). Hypothesis Applying an SSM with a disciplined memory polic...
|
04-09 13:18 | Success | - | |
|
exp_pytrain.20260409123802.004_20260409_123819
|
Strictly Typed Generic Data Processor
This benchmark evaluates the implementation of a robust, reusable data processing component using Python's advanced static typing features. The focus is on creating a strictly typed library using `typing.Generic`, `typing.TypeVar`, and `typ...
|
04-09 12:39 | Success | - | |
|
exp_self.20260409122132.003_20260409_122158
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260409122132.003 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-09 12:23 | Success | - | |
|
exp_pytrain.20260409114310.003_20260409_114329
|
Dynamic Plugin Loader with Protocol Enforcement
This benchmark tests the ability to construct a modular, type-safe system using Python's standard library. It programmatically generates a Python plugin script on disk, utilizes `importlib` to load it into the runtime, validates the loaded...
|
04-09 11:44 | Success | - | |
|
exp_self.20260409112139.002_20260409_112202
|
SSM Strategy Stress Test
This benchmark evaluates the hypothesis that a **State Space Model (SSM)** architecture, specifically one mimicking the memory efficiency of `mamba`, achieves higher throughput than standard Transformer-style baselines when constrained to 8...
|
04-09 11:24 | Success | - | |
|
exp_pytrain.20260409102502.002_20260409_102530
|
Python Skill Fallback
Title: Generic Repository with PEP 695 Syntax and Strict Encapsulation - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-09 10:26 | Success | - | |
|
exp_self.20260409100210.001_20260409_100233
|
SSM Strategy Stress Test
This benchmark evaluates the efficacy of a State Space Model (SSM) strategy against a standard Transformer baseline under strict memory constraints (8GB VRAM limit). Hypothesis Applying an SSM with a disciplined memory policy (state retenti...
|
04-09 10:03 | Success | - | |
|
exp_pytrain.20260409090302.001_20260409_090334
|
Python Skill Fallback
Title: Type-Safe Dependency Introspection System - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-09 09:04 | Success | - | |
|
exp_pytrain.20260409075940.114_20260409_075957
|
Dynamic Plugin Loader with Strict Protocol Validation
This benchmark tests the ability to implement a robust runtime module loader that simulates package dynamics by writing and importing modules programmatically, while enforcing strict type adherence using Python's `typing.Protocol` and `runt...
|
04-09 08:01 | Success | - | |
|
exp_self.20260409073439.086_20260409_073513
|
Self-directed benchmark: SSM Strategy Stress Test
This benchmark evaluates the hypothesis that a disciplined memory policy within a State Space Model (SSM) implementation improves throughput under 8GB VRAM constraints. The script compares a **Baseline SSM** (naive state accumulation) again...
|
04-09 07:36 | Success | - | |
|
exp_pytrain.20260409063320.113_20260409_063357
|
Runtime Package Constructor and Protocol Verifier
Overview This benchmark evaluates an engineer's ability to dynamically construct Python packaging structures in-memory and enforce strict runtime type safety. The candidate must implement a `DynamicPackageLoader` class that simulates the lo...
|
04-09 06:35 | Success | - | |
|
exp_hf_2604.04913_20260409_061848
|
A Frame is Worth One Token: Efficient Generative World Modeling with Delta Tokens
Paper ID: hf_2604.04913 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
04-09 06:19 | Success | - | |
|
exp_pytrain.20260409054728.112_20260409_054802
|
Typed Plugin Registry & Configuration Loader
Overview This benchmark evaluates the implementation of a robust, type-safe plugin registry system using only the Python standard library. It simulates the architecture patterns often seen in large-scale frameworks (like HuggingFace Transfo...
|
04-09 05:49 | Success | - | |
|
exp_pytrain.20260409051458.111_20260409_051546
|
Generic CLI Data Transformer with Strict Typing
This coding drill focuses on constructing a robust Command Line Interface (CLI) tool for data transformation using Python's standard library. The objective is to implement a generic Extract, Transform, Load (ETL) pipeline utility that conve...
|
04-09 05:16 | Success | - | |
|
exp_pytrain.20260409041903.110_20260409_042209
|
Python Reliability Drill: Strict Typing & Performance
Objective This benchmark evaluates your ability to write robust, type-safe Python code using standard library features only. It emphasizes strict type annotations (`typing` module), internal package structure, runtime validation, and perfor...
|
04-09 04:23 | Success | - | |
|
exp_self.20260409024723.085_20260409_024941
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260409024723.085 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-09 02:50 | Success | - | |
|
exp_pytrain.20260409005219.109_20260409_005239
|
Protocol-Based Plugin Pipeline
This benchmark demonstrates the use of Python's `typing.Protocol` with `@runtime_checkable` to create a flexible, type-safe plugin architecture. This architectural pattern enables structural subtyping (duck typing with static verification)...
|
04-09 00:53 | Success | - | |
|
exp_hf_2604.06912_20260409_003812
|
Q-Zoom: Query-Aware Adaptive Perception for Efficient Multimodal Large Language Models
Paper ID: hf_2604.06912 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
04-09 00:39 | Success | - | |
|
exp_pytrain.20260409001404.108_20260409_001428
|
Robust Plugin Loader with Structural Typing Benchmark
This benchmark evaluates the implementation of a robust plugin architecture using Python's standard library. It focuses on two advanced Python features: `typing.Protocol` for Structural Subtyping (Duck Typing) and `importlib` for dynamic mo...
|
04-09 00:15 | Success | - | |
|
exp_self.20260408234520.084_20260408_234548
|
Self-Directed Benchmark: SSM Strategy Stress Test
Overview This benchmark evaluates the hypothesis that a State Space Model (SSM) strategy with a disciplined memory policy (inspired by Mamba architectures) significantly improves throughput and stabilizes VRAM usage under high-context const...
|
04-08 23:50 | Success | - | |
|
exp_pytrain.20260408224742.107_20260408_224807
|
Generic Data Normalizer Registry
This project implements a robust, plugin-based architecture for data normalization using Python's `typing.Protocol` for structural subtyping. It demonstrates how to define generic interfaces and manage concrete implementations (plugins) wit...
|
04-08 22:49 | Success | - | |
|
exp_pytrain.20260408221112.106_20260408_221221
|
Python Skill Fallback
Title: Type-Safe Component Registry with Dynamic Configuration - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-08 22:13 | Success | - | |
|
exp_hf_2604.07023_20260408_214925
|
MARS: Enabling Autoregressive Models Multi-Token Generation
Paper ID: hf_2604.07023 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
04-08 21:50 | Success | - | |
|
exp_pytrain.20260408211723.105_20260408_211800
|
Strictly Typed Plugin Registry with Runtime Protocol Enforcement
Overview This benchmark tests the ability to design a strictly typed, modular plugin system using Python's standard library. The system utilizes `typing.Protocol` for interface definition and `runtime_checkable` for strict validation during...
|
04-08 21:19 | Success | - | |
|
exp_pytrain.20260408204129.104_20260408_204210
|
PEP 561 Compliant Package Scaffolder
Overview This coding drill benchmark tests the ability to write a sophisticated CLI tool that generates a standards-compliant Python project structure. The tool must strictly adhere to PEP 517 (build system), PEP 621 (project metadata), and...
|
04-08 20:43 | Success | - | |
|
exp_self.20260408200742.083_20260408_200808
|
Self-directed benchmark: SSM Strategy Stress Test
This benchmark evaluates the performance of State Space Models (SSMs) against traditional Transformer-style attention mechanisms under strict memory constraints. Hypothesis Applying an SSM with a disciplined memory policy improves throughpu...
|
04-08 20:09 | Success | - | |
|
exp_pytrain.20260408190658.103_20260408_190726
|
Standard Library Wheel Archiver
**Challenge:** Implement a minimal PEP 427 Wheel packager using only the Python Standard Library. **Objective:** Create a self-contained Python script (`benchmark.py`) that takes a project directory, compiles source code (optional but good...
|
04-08 19:08 | Success | - | |
|
exp_self.20260408184419.082_20260408_184448
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260408184419.082 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-08 18:45 | Success | - | |
|
exp_pytrain.20260408175501.102_20260408_175525
|
PEP 695 Generic Data Processor & Module API Design
This benchmark validates the implementation of Python 3.12+ features, specifically PEP 695 (Type Parameter Syntax), within a robust data processing context. Problem Statement Legacy Python typing relies on verbose `Generic` inheritance and...
|
04-08 17:56 | Success | - | |
|
exp_self.20260408173237.081_20260408_173319
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260408173237.081 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-08 17:34 | Success | - | |
|
exp_pytrain.20260408162803.101_20260408_162826
|
Dynamic Package Constructor and Type Introspector
Hypothesis Combining `typing.TypedDict` for schema validation with `importlib` for dynamic module loading enables the creation of robust, self-validating package scaffolding utilities that strictly enforce typing standards at runtime. Goal...
|
04-08 16:29 | Success | - | |
|
exp_self.20260408160543.080_20260408_160610
|
Self-directed SSM Strategy Stress Test
This benchmark evaluates the hypothesis that applying State Space Models (SSMs) with a disciplined memory policy (recurrent state management) significantly reduces VRAM usage and improves throughput compared to a naive "unrolled" implementa...
|
04-08 16:07 | Success | - | |
|
exp_pytrain.20260408151421.100_20260408_151445
|
Strictly Typed 1D Tensor Module
Overview This coding drill implements a robust, strictly typed 1-dimensional Tensor (Vector) library using pure Python standard library features. The core objective is to demonstrate advanced Python typing mechanisms, specifically **Generic...
|
04-08 15:15 | Success | - | |
|
exp_self.20260408145359.079_20260408_145413
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260408145359.079 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-08 14:55 | Success | - | |
|
exp_pytrain.20260408140115.099_20260408_140149
|
Generic Model Registry with Type-Safety
This drill demonstrates the creation of a robust, type-safe component registry using Python's `typing` module. Learning Objectives * **Protocol Definition:** Define strict interfaces using `typing.Protocol` that enforce structural subtyping...
|
04-08 14:02 | Success | - | |
|
exp_self.20260408133955.078_20260408_134024
|
SSM Strategy Stress Test
This benchmark evaluates the hypothesis that applying State Space Model (SSM) strategies with disciplined memory policies improves throughput under constrained VRAM (8GB target). Overview The test compares a standard Transformer-style atten...
|
04-08 13:41 | Success | - | |
|
exp_pytrain.20260408125053.098_20260408_125116
|
Benchmark: Protocol-Based Dynamic Plugin Loader
**Design Brief:** The objective of this coding drill is to engineer a robust, runtime-safe plugin loading system. The solution must generate a temporary package structure containing varied plugin definitions (valid, invalid, and broken) and...
|
04-08 12:52 | Success | - | |
|
exp_self.20260408122028.077_20260408_122139
|
SSM Memory Policy Stress Test
This benchmark evaluates the hypothesis that applying a **State Space Model (SSM)** strategy with a disciplined memory policy (specifically utilizing dynamic precision and efficient state caching) improves throughput under constrained VRAM...
|
04-08 12:24 | Success | - | |
|
exp_pytrain.20260408111422.097_20260408_111509
|
Strictly Typed Dynamic Module Loader
Overview This benchmark demonstrates a robust Python application architecture that dynamically loads standard library modules at runtime. It enforces type safety constraints using `typing.Protocol` and `@runtime_checkable`, ensuring that dy...
|
04-08 11:16 | Success | - | |
|
exp_self.20260408104359.076_20260408_104451
|
SSM Strategy Stress Test Benchmark
This benchmark evaluates the effectiveness of memory optimization strategies in State Space Models (SSMs) under constrained memory conditions (8GB VRAM). Overview The benchmark compares two SSM implementations: 1. **Baseline**: A standard S...
|
04-08 10:48 | Success | - | |
|
exp_pytrain.20260408094121.096_20260408_094146
|
README: Strictly Typed Dynamic Plugin Loader Benchmark
Objective This benchmark validates the hypothesis that an autonomous system can dynamically discover Python modules at runtime and strictly enforce interface compliance using structural subtyping (Protocols) rather than explicit inheritanc...
|
04-08 09:42 | Success | - | |
|
exp_self.20260408091606.075_20260408_091627
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260408091606.075 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-08 09:17 | Success | - | |
|
exp_pytrain.20260408081853.095_20260408_081927
|
Python Skill Fallback
Title: Generic Type-Safe Event Bus with Strict API - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-08 08:20 | Success | - | |
|
exp_self.20260408075326.074_20260408_075358
|
Self-directed benchmark: SSM Strategy Stress Test
This benchmark evaluates the hypothesis that a State Space Model (SSM) strategy provides superior throughput compared to traditional Attention mechanisms under strict memory constraints (simulated 8GB VRAM limit). Instructions 1. **Dependen...
|
04-08 07:55 | Success | - | |
|
exp_pytrain.20260408065610.094_20260408_065638
|
Dynamic Type-Verified Package Scaffolder
Overview This benchmark evaluates the ability of a coding agent or engineer to programmatically construct a valid Python package structure on the filesystem, populate it with modules containing strict Type Hints, and dynamically load and ve...
|
04-08 06:57 | Success | - | |
|
exp_self.20260408062903.073_20260408_062925
|
SSM Strategy Stress Test Benchmark
This benchmark evaluates the hypothesis that applying State Space Model (SSM) strategies (specifically the constant-memory recurrence found in architectures like Mamba) improves throughput and reduces VRAM pressure compared to standard Atte...
|
04-08 06:30 | Success | - | |
|
exp_pytrain.20260408052934.093_20260408_053032
|
Type-Safe Plugin Architecture with Runtime Discovery
This benchmark demonstrates the implementation of a robust, extensible plugin system using Python's `typing.Protocol` and `inspect` module. It simulates a library core (like vLLM or PyTorch) that dynamically discovers and validates model im...
|
04-08 05:31 | Success | - | |
|
exp_pytrain.20260408045130.092_20260408_045341
|
Generic Plugin Registry Benchmark
This benchmark evaluates the implementation of a type-safe, extensible Plugin Registry system using Python's advanced static typing features. Objective Create a `benchmark.py` script that simulates a robust package structure (using `__all__...
|
04-08 04:54 | Success | - | |
|
exp_pytrain.20260408031557.091_20260408_031754
|
Strictly Typed Modular Data ETL Framework
This benchmark tests your ability to architect a robust, single-file Python script that simulates a package structure using advanced typing features (`typing.Protocol`, `typing.TypeVar`, `typing.Generic`) and standard library introspection...
|
04-08 03:18 | Success | - | |
|
exp_pytrain.20260408021119.090_20260408_021207
|
Strictly Typed Async Event Dispatcher Benchmark
This benchmark tests the implementation of a generic, strictly-typed asynchronous event dispatcher using Python's standard `asyncio` and `typing` libraries. Goal Create a single-file Python module (`benchmark.py`) that acts as a standalone...
|
04-08 02:13 | Success | - | |
|
exp_pytrain.20260408011910.089_20260408_012110
|
Benchmark: Runtime Plugin System with Protocol Validation
Design Brief This benchmark tests an autonomous system's ability to integrate Python's dynamic module loading capabilities (`importlib`) with static type enforcement (`typing.Protocol`). The system must construct a robust, extensible archit...
|
04-08 01:22 | Success | - | |
|
exp_pytrain.20260408003923.088_20260408_003952
|
Python Skill Fallback
Title: Generic Plugin Loader & Dynamic Package Validator - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-08 00:40 | Success | - | |
|
exp_self.20260408001429.072_20260408_001458
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260408001429.072 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-08 00:16 | Success | - | |
|
exp_pytrain.20260407231114.087_20260407_231204
|
Python Skill Fallback
Title: Strictly-Typed Dynamic Plugin Loader - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-07 23:13 | Success | - | |
|
exp_self.20260407223256.071_20260407_223345
|
SSM Strategy Stress Test: Memory Policy Benchmark
Overview This benchmark evaluates the hypothesis that applying a **disciplined memory policy** (specifically gradient checkpointing and state-space tiling) to State Space Models (SSMs) improves throughput under strict hardware constraints (...
|
04-07 22:44 | Success | - | |
|
exp_pytrain.20260407210023.086_20260407_210057
|
Dynamic Plugin Loader with Protocol Validation
Overview This benchmark tests the ability to construct a robust, type-safe dynamic import mechanism using Python's standard library. The script programmatically generates a package structure on disk, enforces interface compliance via `typin...
|
04-07 21:02 | Success | - | |
|
exp_self.20260407203054.070_20260407_203122
|
Self-directed benchmark: ssm strategy stress test
Overview This benchmark evaluates the **Hypothesis: applying an SSM with a disciplined memory policy improves throughput under 8GB constraints.** It compares two distinct modes of processing a long sequence: 1. **Baseline (Naive SSM)**: Processe...
|
04-07 20:32 | Success | - | |
|
exp_pytrain.20260407193507.085_20260407_193537
|
Typed Configuration Micro-Package
Overview This benchmark evaluates the ability of an autonomous coding system to design a robust, reusable library module within a single Python file. The task requires combining strong static typing (using Protocols and Generics) with packa...
|
04-07 19:36 | Success | - | |
|
exp_self.20260407191353.069_20260407_191438
|
SSM Strategy Stress Test: Disciplined Memory Policy
This benchmark evaluates the impact of a disciplined memory policy on State Space Model (SSM) throughput under constrained VRAM conditions (8GB target). Hypothesis Applying an SSM with a disciplined memory policy (chunked state inference) i...
|
04-07 19:15 | Success | - | |
|
exp_pytrain.20260407182706.084_20260407_182728
|
Robust Typed CLI Factory
An autonomous system can engineer a reusable command-line interface factory that dynamically maps input arguments to a typed configuration class using Python's standard introspection libraries, ensuring strict type safety without external d...
|
04-07 18:28 | Success | - | |
|
exp_self.20260407180749.068_20260407_180832
|
SSM Strategy Stress Test
Overview This benchmark evaluates the "Mamba-style" SSM (State Space Model) strategy against a standard Transformer baseline under strict memory constraints. The goal is to validate the hypothesis that applying an SSM with a disciplined mem...
|
04-07 18:09 | Success | - | |
|
exp_pytrain.20260407172351.083_20260407_172422
|
Python Skill Fallback
Title: Robust Typed Configuration Manager - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-07 17:25 | Success | - | |
|
exp_self.20260407170431.067_20260407_170452
|
SSM Strategy Stress Test Benchmark
This benchmark evaluates the memory efficiency and throughput of a **State Space Model (SSM)** inference strategy when subjected to a disciplined chunking memory policy versus a naive full-sequence baseline. Objective The goal is to simulat...
|
04-07 17:05 | Success | - | |
|
exp_pytrain.20260407161949.082_20260407_162015
|
PEP 695 Generic Event Dispatcher Benchmark
Overview This coding drill evaluates the implementation and performance of an **Event Dispatcher** system utilizing **PEP 695 Type Parameter Syntax** (introduced in Python 3.12). Objective Implement a type-safe, generic event dispatcher wit...
|
04-07 16:21 | Success | - | |
|
exp_self.20260407155808.066_20260407_155832
|
This benchmark tests a synthetic SSM (State Space Model) against a standard Attention baseline to validate the hypothesi...
Benchmark: SSM Strategy Stress Test Overview This script evaluates the memory efficiency and processing speed of a State Space Model (SSM) strategy compared to a standard Transformer Attention baseline. It simulates a "disciplined memory po...
|
04-07 16:00 | Success | - | |
|
exp_pytrain.20260407151008.081_20260407_151031
|
Python Skill Fallback
Title: Strictly Typed Command Dispatcher with Package Metadata - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-07 15:11 | Success | - | |
|
exp_self.20260407144925.065_20260407_144954
|
This repository contains a micro-benchmark designed to evaluate the efficiency gains of State Space Models (SSMs) with d...
Objective The benchmark tests the hypothesis that applying SSM strategies (specifically mimicking the selective scan mechanisms of Mamba architectures) significantly improves throughput and reduces VRAM pressure when processing long sequenc...
|
04-07 14:51 | Success | - | |
|
exp_pytrain.20260407135941.080_20260407_140034
|
Robust Type-Safe Quantization Kernel Benchmark
This project demonstrates a simulation of a quantized linear layer often found in Large Language Models (LLMs), utilizing only the Python standard library. It focuses on strict static typing, package metadata structures, and type-safe opera...
|
04-07 14:01 | Success | - | |
|
exp_self.20260407133703.064_20260407_133726
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260407133703.064 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-07 13:38 | Success | - | |
|
exp_pytrain.20260407124905.079_20260407_124931
|
Benchmark: Typed Model Registry & Public API Management
This benchmark evaluates the implementation of a type-safe, modular component registry system using Python's standard library `typing` module. The goal is to demonstrate robust API design patterns often found in large-scale ML frameworks (l...
|
04-07 12:50 | Success | - | |
|
exp_self.20260407122809.063_20260407_122840
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260407122809.063 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-07 12:29 | Success | - | |
|
exp_pytrain.20260407113455.078_20260407_113538
|
Type-Safe Plugin Loader Benchmark
This project demonstrates a robust, type-safe plugin architecture using Python's standard library. It leverages `typing.Protocol` for structural subtyping (interface compliance without inheritance) and `typing.Generic` for a flexible, type-...
|
04-07 11:36 | Success | - | |
|
exp_self.20260407111110.062_20260407_111135
|
This benchmark is designed to test the hypothesis that State Space Models (SSMs) with a strict memory discipline (linear...
SSM Strategy Stress Test Benchmark Overview This benchmark evaluates the memory efficiency and throughput of a linear-complexity State Space Model (SSM) strategy against a quadratic-complexity Baseline Transformer attention mechan...
|
04-07 11:12 | Success | - | |
|
exp_pytrain.20260407101030.077_20260407_101104
|
Dynamic Module Loader and Protocol Verifier
This coding drill validates a robust plugin architecture using Python's `typing.Protocol` for structural subtyping and `importlib` for runtime module discovery within an isolated file system environment. Scenario You are building an extensi...
|
04-07 10:12 | Success | - | |
|
exp_self.20260407094036.061_20260407_094059
|
Self-directed benchmark: SSM Strategy Stress Test
This benchmark evaluates the efficiency of a State Space Model (SSM) inference strategy against a standard Transformer attention baseline. The specific goal is to validate the hypothesis that a disciplined memory policy (inherent to the rec...
|
04-07 09:42 | Success | - | |
|
exp_pytrain.20260407084328.076_20260407_084347
|
Protocol-Based Dynamic Module Loader
This benchmark evaluates the capability of an autonomous coding system to design a robust plugin architecture using Python's standard library. Objective To implement a dynamic module loading system that enforces strict interface compliance...
|
04-07 08:44 | Success | - | |
|
exp_cr_10.3390_electronics15071535_20260407_083052
|
Tac-Mamba: A Pose-Guided Cross-Modal State Space Model with Trust-Aware Gating for mmWave Radar Human Activity Recogniti...
Paper ID: cr_10.3390_electronics15071535 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Rec...
|
04-07 08:31 | Success | - | |
|
exp_pytrain.20260407080505.075_20260407_080526
|
Generic Plugin Registry with PEP 695 Syntax
Overview This benchmark evaluates a `PluginRegistry` system implementation leveraging Python 3.12's **PEP 695 Type Parameter Syntax**. It demonstrates the new generic class (`class MyClass[T]:`) and generic function (`def method[T](...):`) syntax...
|
04-07 08:06 | Success | - | |
|
exp_self.20260407074112.060_20260407_074132
|
SSM Strategy Stress Test
This benchmark evaluates the hypothesis that a State Space Model (SSM) strategy with a disciplined memory policy maintains higher throughput and lower VRAM usage compared to standard Transformer-based attention mechanisms under constrained...
|
04-07 07:42 | Success | - | |
|
exp_pytrain.20260407065013.074_20260407_065037
|
Strictly-Typed Model Configuration Registry
This benchmark validates the design of a robust, type-safe configuration system for Large Language Models (LLMs) using Python's standard `typing` module. It enforces strict structural subtyping (Protocols) and semantic type aliases to preve...
|
04-07 06:51 | Success | - | |
|
exp_self.20260407062325.059_20260407_062359
|
Self-directed benchmark: SSM Strategy Stress Test
Overview This benchmark evaluates the hypothesis that applying State Space Models (SSMs) with a disciplined memory policy significantly improves inference throughput (tokens/sec) and reduces VRAM usage compared to standard Transformer archi...
|
04-07 06:25 | Success | - | |
|
exp_pytrain.20260407051001.073_20260407_051116
|
Type-Safe Entry Point Registry
Overview This benchmark evaluates a custom `PluginRegistry` implementation designed to mimic the robustness of frameworks like vLLM or PyTorch. It leverages Python's `typing.Protocol` and `runtime_checkable` decorators to create a type-safe...
|
04-07 05:12 | Success | - | |
|
exp_hf_2604.02073_20260407_045001
|
PLUME: Latent Reasoning Based Universal Multimodal Embedding
Paper ID: hf_2604.02073 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
04-07 04:51 | Success | - | |
|
exp_pytrain.20260407041942.072_20260407_042011
|
Typed Configuration and Plugin Registry System
This benchmark implements a robust mini-framework for a typed plugin registry system using the Python standard library. It demonstrates the architectural patterns found in large-scale libraries like Hugging Face Transformers and Diffusers....
|
04-07 04:21 | Success | - | |
|
exp_pytrain.20260407034536.071_20260407_034646
|
Python Skill Fallback
Title: Type-Safe CLI Application Builder - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-07 03:47 | Success | - | |
|
exp_pytrain.20260407030816.070_20260407_030904
|
Concurrent Dependency Graph Resolver Benchmark
This benchmark tests the ability to design a robust, typed, asynchronous dependency resolution system. The candidate must implement a `resolve_dependencies` function that utilizes `asyncio` for concurrency and strictly adheres to `typing` p...
|
04-07 03:10 | Success | - | |
|
exp_pytrain.20260407023110.069_20260407_023152
|
Structural Subtyping Plugin Loader Benchmark
This benchmark tests the ability to define strict structural interfaces using Python's `typing.Protocol` and implement a robust discovery mechanism for dynamically generated code modules. The candidate system must identify valid implementat...
|
04-07 02:32 | Success | - | |
|
exp_pytrain.20260407015524.068_20260407_015705
|
Python Skill Fallback
Title: Generic Plugin Registry with CLI Entry Points - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-07 01:58 | Success | - | |
|
exp_pytrain.20260407012135.067_20260407_012226
|
Generic Namespace Manager with Protocol Enforcement
Overview: This coding drill focuses on advanced Python type hinting and structural subtyping. You are tasked with implementing a `PackageManager` that acts as a namespace registry. It must leverage `typing.Generic`, `typing.TypeVar`, and `ty...
|
04-07 01:23 | Success | - | |
|
exp_pytrain.20260407004901.066_20260407_004927
|
Python Skill Fallback
Title: In-Memory Plugin Architecture with Runtime Type Verification - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-07 00:50 | Success | - | |
|
exp_self.20260407001121.058_20260407_001153
|
Self-directed benchmark: SSM strategy stress test
This benchmark evaluates the memory efficiency and throughput of two distinct processing strategies under strict 8GB VRAM constraints: 1. **Ablated Variant (Baseline):** Simulates a "Global Attention" or "Full Cache" strategy. This model na...
|
04-07 00:18 | Success | - | |
|
exp_pytrain.20260406231332.065_20260406_231357
|
Dynamic Protocol-Compliant Plugin Loader
This coding drill validates the ability to dynamically construct Python packages on a filesystem, load them using low-level `importlib` introspection tools, and enforce structural subtyping using `typing.Protocol`. Objective: The candidate m...
|
04-06 23:15 | Success | - | |
|
exp_hf_2604.04921_20260406_225822
|
TriAttention: Efficient Long Reasoning with Trigonometric KV Compression
Paper ID: hf_2604.04921 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
04-06 22:59 | Success | - | |
|
exp_pytrain.20260406222533.064_20260406_222600
|
Python Skill Fallback
Title: Strictly-Typed Package Configuration Loader - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-06 22:27 | Success | - | |
|
exp_self.20260406215309.057_20260406_215352
|
Self-Directed Benchmark: SSM Strategy Stress Test
Overview: This benchmark evaluates the "Disciplined Memory Policy" hypothesis for State Space Models (SSMs). It compares a standard full-precision SSM implementation against an optimized variant utilizing dynamic precision and memory checkpo...
|
04-06 21:55 | Success | - | |
|
exp_pytrain.20260406204123.063_20260406_204146
|
Generic Async Task Dispatcher with Protocol Enforcement
This benchmark implements an asynchronous task processing system using Python's `typing.Protocol`, `typing.Generic`, and `asyncio`. It demonstrates a modular architecture where strict type contracts are enforced to ensure data safety and ro...
|
04-06 20:42 | Success | - | |
|
exp_self.20260406201612.056_20260406_201653
|
SSM Strategy Stress Test Benchmark
This benchmark evaluates the hypothesis that a **Disciplined Memory Policy** (selective state retention) in State Space Models (SSMs) significantly reduces VRAM usage while maintaining competitive throughput under strict 8GB constraints. Ov...
|
04-06 20:18 | Success | - | |
|
exp_pytrain.20260406192440.062_20260406_192524
|
Dynamic Generic Plugin Loader with PEP 695 Benchmark
Overview: This coding drill evaluates your ability to programmatically construct Python packages and utilize modern Python type systems (PEP 695). The script creates a temporary package structure on disk, injects source code using Python 3.1...
|
04-06 19:26 | Success | - | |
|
exp_pytrain.20260406185011.061_20260406_185238
|
Generic Plugin Loader with Runtime Type Validation
This benchmark demonstrates a robust architectural pattern for building extensible Python applications. It utilizes `typing.Protocol` to define structural interfaces (contracts) that plugins must satisfy, and `importlib` to dynamically disc...
|
04-06 18:53 | Success | - | |
|
exp_self.20260406182829.055_20260406_182851
|
Self-directed benchmark: SSM strategy stress test
Paper ID: self.20260406182829.055 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-06 18:29 | Success | - | |
|
exp_pytrain.20260406173300.060_20260406_173333
|
Python Skill Fallback
Title: Strictly-Typed Backend Dispatcher with Dynamic Discovery - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-06 17:34 | Success | - | |
|
exp_hf_2604.01609_20260406_172139
|
Swift-SVD: Low-Rank LLM Compression Benchmark
This benchmark evaluates the performance characteristics of **Swift-SVD**, a novel activation-aware compression framework. Specifically, it measures the **VRAM reduction**, **Inference Throughput (Tokens/sec)**, and **Compression Speed** wh...
|
04-06 17:22 | Success | - | |
|
exp_pytrain.20260406165827.059_20260406_165856
|
Python Skill Fallback
Title: Generic Component Registry with Simulated Sub-Module Registration - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-06 16:59 | Success | - | |
|
exp_self.20260406163742.054_20260406_163802
|
Self-directed Benchmark: SSM Strategy Stress Test
Hypothesis: Applying SSM (State Space Model) architectures with a disciplined memory policy (specifically dynamic precision and compilation) improves throughput under 8GB VRAM constraints compared to a standard baseline configuration. Plan: W...
|
04-06 16:39 | Success | - | |
|
exp_pytrain.20260406154951.058_20260406_155021
|
Generic Plugin Registry and CLI Dispatcher
Challenge Overview: This benchmark tests the ability to architect a robust, type-safe plugin system using Python's advanced `typing` features. The candidate must implement a generic command registry and a dispatcher that can handle different...
|
04-06 15:51 | Success | - | |
|
exp_self.20260406151636.053_20260406_151707
|
SSM Strategy Stress Test
This benchmark evaluates the memory efficiency and throughput performance of a State Space Model (SSM) strategy against a standard Dense baseline. It simulates a scenario with a large sequence length to stress GPU memory constraints (8GB li...
|
04-06 15:28 | Success | - | |
|
exp_pytrain.20260406142735.057_20260406_142826
|
Python Skill Fallback
Title: Strictly-Typed Generic Data Pipeline with CLI Entry Point - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-06 14:29 | Success | - | |
|
exp_self.20260406140413.052_20260406_140510
|
Self-directed benchmark: SSM strategy stress test
Paper ID: self.20260406140413.052 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-06 14:06 | Success | - | |
|
exp_pytrain.20260406130831.056_20260406_130903
|
Python Skill Fallback
Title: Type-Safe Plugin Architecture with Dynamic Discovery - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-06 13:10 | Success | - | |
|
exp_self.20260406124920.051_20260406_124944
|
SSM Strategy Stress Test
This benchmark evaluates the performance implications of a disciplined memory policy applied to State Space Models (SSMs). It compares a standard sequential implementation against an optimized variant that utilizes chunked processing and au...
|
04-06 12:50 | Success | - | |
|
exp_pytrain.20260406115518.055_20260406_115551
|
Programmatic Package Construction and Runtime Type Verification
Overview: This coding drill tests the ability to dynamically construct a valid Python package distribution (simulating a wheel/ZIP), inject it into the runtime, and perform runtime type verification using the `typing` module. Objective: Creat...
|
04-06 11:56 | Success | - | |
|
exp_hf_2604.03118_20260406_113833
|
Benchmark for Salt: Self-Consistent Distribution Matching
This benchmark evaluates the computational efficiency and memory footprint characteristics of the **Salt** algorithm proposals. Specifically, it simulates the overhead introduced by: 1. **SC-DMD (Self-Consistent Distribution Matching):** Th...
|
04-06 11:39 | Success | - | |
|
exp_pytrain.20260406111214.054_20260406_111234
|
Typed Metadata Discovery System
Objective: Design and implement a robust `DistributionScanner` class that utilizes Python's standard library `importlib.metadata` to perform introspection on installed packages. Requirements: 1. **Strict Typing**: Utilize `typing.TypedDict` t...
|
04-06 11:13 | Success | - | |
|
exp_self.20260406104834.050_20260406_104902
|
Benchmark: SSM Strategy Stress Test
This benchmark evaluates the performance of State Space Models (SSM) under constrained VRAM environments (8GB limit). It compares a baseline SSM implementation against a variant employing dynamic precision and disciplined memory policies. I...
|
04-06 10:50 | Success | - | |
|
exp_pytrain.20260406095148.053_20260406_095225
|
Python Skill Fallback
Title: Strictly-Typed Multi-Backend Dispatcher Simulation - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-06 09:53 | Success | - | |
|
exp_self.20260406092333.049_20260406_092535
|
SSM Strategy Stress Test Benchmark
This benchmark evaluates the performance of State Space Models (SSMs) with different memory management strategies, specifically testing if a disciplined memory policy improves throughput under 8GB VRAM constraints. Background: State Space Mo...
|
04-06 09:26 | Success | - | |
|
exp_pytrain.20260406081147.052_20260406_081231
|
Robust Dynamic Plugin Loader using Structural Typing
Overview: This benchmark verifies the hypothesis that `typing.Protocol` with `@runtime_checkable` enables an autonomous system to dynamically verify and enforce interface compliance without explicit inheritance. The Challenge: In modular plug...
|
04-06 08:13 | Success | - | |
|
exp_pytrain.20260406073857.051_20260406_073939
|
Type-Safe Dynamic Module Loader Benchmark
This benchmark tests the ability to design a robust runtime type checking system using Python's `typing.Protocol`. It simulates a dynamic plugin loader where modules (represented as dictionaries) are inspected for structural compliance with...
|
04-06 07:40 | Success | - | |
|
exp_self.20260406071042.048_20260406_071123
|
Self-directed benchmark: SSM strategy stress test
Paper ID: self.20260406071042.048 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-06 07:12 | Success | - | |
|
exp_pytrain.20260406060536.050_20260406_060556
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-06 06:06 | Success | - | |
|
exp_self.20260406053301.047_20260406_053343
|
Self-directed benchmark: SSM strategy stress test
This project implements a reproducible benchmark designed to test the hypothesis that applying SSM (State Space Model) strategies with a disciplined memory policy improves throughput under strict VRAM constraints (8GB). The Hypothesis: We hy...
|
04-06 05:34 | Success | - | |
|
exp_pytrain.20260406044142.049_20260406_044216
|
Strict Configuration & Metadata Validator
This coding drill evaluates the ability to enforce strict type safety in Python using `TypedDict` and `importlib` for runtime environment verification. Objective: The candidate must implement a `PackageManifest` validator and an environment...
|
04-06 04:43 | Success | - | |
|
exp_self.20260406041826.046_20260406_041851
|
SSM Strategy Stress Test Benchmark
This benchmark evaluates the hypothesis that a **Disciplined Memory Policy**—specifically utilizing **Selective State Space Models (SSM)** with **Dynamic Precision** and **State Caching**—improves throughput under strict VRAM constraints (s...
|
04-06 04:19 | Success | - | |
|
exp_pytrain.20260406031856.048_20260406_031937
|
Python Skill Fallback
Title: Type-Safe Generic Storage Module - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-06 03:20 | Success | - | |
|
exp_self.20260406024855.045_20260406_024930
|
Benchmark: SSM Strategy Stress Test
This benchmark evaluates a synthetic Selective State Space Model (SSM) implementation to test memory policies. It compares an optimized configuration (utilizing dynamic precision and disciplined caching) against an ablated configuration (FP...
|
04-06 02:50 | Success | - | |
|
exp_pytrain.20260406015132.047_20260406_015205
|
Strictly-Typed Tensor Micro-Package CLI
This module implements a minimalistic, strongly-typed Tensor micro-package using Python's standard `typing` generics. It demonstrates a domain-specific object design that enforces type consistency across numerical operations while adhering...
|
04-06 01:53 | Success | - | |
|
exp_2604.03225v1_20260406_013957
|
VOSR: Vision-Only Generative Model Benchmark
This benchmark evaluates the inference performance of the VOSR (Vision-Only Super-Resolution) model architecture. VOSR distinguishes itself by relying purely on visual data for generation, employing a pretrained vision encoder for semantic...
|
04-06 01:40 | Success | - | |
|
exp_pytrain.20260406011845.046_20260406_011904
|
Typed Module Dependency Resolver
Overview: This coding drill benchmarks the creation of a robust dependency resolution mechanism. It emphasizes the use of Python's standard library `typing` module (specifically `TypedDict`) for explicit data structuring and `importlib` for...
|
04-06 01:20 | Success | - | |
|
exp_self.20260406005911.044_20260406_010023
|
SSM Strategy Stress Test Benchmark
Overview: This benchmark evaluates the performance of State Space Models (SSMs) with and without memory optimization strategies, focusing on techniques inspired by the Mamba architecture. The benchmark measures VRAM usage and tokens per second un...
|
04-06 01:01 | Success | - | |
|
exp_pytrain.20260406001435.045_20260406_001503
|
Python Skill Fallback
Title: Robust Type-Safe Plugin Loader - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-06 00:16 | Success | - | |
|
exp_self.20260405235250.043_20260405_235317
|
SSM Strategy Stress Test
This benchmark validates the hypothesis that applying State Space Model (SSM) strategies with disciplined memory policies improves throughput and efficiency under 8GB VRAM constraints. Overview: The benchmark simulates two inference strategi...
|
04-05 23:54 | Success | - | |
|
exp_pytrain.20260405225940.044_20260405_230006
|
Strictly Typed Plugin System Benchmark
This project demonstrates a high-performance, type-safe plugin architecture using Python's standard library. It combines structural subtyping (`typing.Protocol`) with dynamic module loading (`importlib`) to validate and execute plugin code...
|
04-05 23:01 | Success | - | |
|
exp_self.20260405223743.042_20260405_223812
|
Self-directed benchmark: SSM strategy stress test
Paper ID: self.20260405223743.042 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-05 22:39 | Success | - | |
|
exp_pytrain.20260405215000.043_20260405_215027
|
Python Reliability Drill: Strict Typing & Runtime Validation
This benchmark implements a robust utility class `StrictValidator` designed to enforce runtime type safety on complex data structures without external dependencies. It simulates the behavior of high-level validation libraries (like Pydantic...
|
04-05 21:51 | Success | - | |
|
exp_self.20260405212935.041_20260405_212957
|
SSM Strategy Stress Test Benchmark
This benchmark evaluates the hypothesis that applying State Space Models (SSMs) with a disciplined memory policy improves throughput and reduces VRAM usage compared to standard attention mechanisms under strict memory constraints (simulatin...
|
04-05 21:31 | Success | - | |
|
exp_pytrain.20260405204218.042_20260405_204233
|
PEP 695 Generic Factory Benchmark
This benchmark validates the implementation of a generic factory system using Python 3.12's Type Parameter Syntax (PEP 695). It enforces strict namespace management and Protocol-based constraints. Prerequisites: - Python 3.12 or higher (Requ...
|
04-05 20:43 | Success | - | |
|
exp_self.20260405202243.040_20260405_202303
|
SSM Strategy Stress Test
This benchmark evaluates the hypothesis that applying State Space Models (SSMs) with a disciplined memory policy significantly improves throughput under strict 8GB VRAM constraints. It contrasts a **Baseline SSM** (which may naively cache s...
|
04-05 20:25 | Success | - | |
|
exp_pytrain.20260405193958.041_20260405_194018
|
Runtime-Verified Plugin Architecture Benchmark
This benchmark demonstrates an autonomous system's ability to programmatically construct a valid Python package structure on disk and enforce strict structural subtyping (Protocols) on dynamically discovered modules. Objective: To test dynam...
|
04-05 19:41 | Success | - | |
|
exp_self.20260405191951.039_20260405_192023
|
Self-directed benchmark: SSM strategy stress test
Objective: This benchmark evaluates the hypothesis that applying a Selective State Space Model (SSM) strategy with a disciplined memory policy improves inference throughput and reduces VRAM overhead compared to a standard Transformer-style K...
|
04-05 19:21 | Success | - | |
|
exp_pytrain.20260405183353.040_20260405_183412
|
Dynamic Kernel Dispatcher with Type Safety
Overview: This coding drill evaluates the ability to construct a robust plugin architecture similar to backend selection in deep learning frameworks (like PyTorch or LitGPT). The candidate must implement a dispatcher system using Python's `t...
|
04-05 18:35 | Success | - | |
|
exp_self.20260405181427.038_20260405_181452
|
Self-directed benchmark: SSM strategy stress test
Paper ID: self.20260405181427.038 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-05 18:15 | Success | - | |
|
exp_pytrain.20260405172750.039_20260405_172815
|
Robust Plugin Registry with Version Compatibility Simulation
Design Brief: This coding drill assesses the ability to construct a generic, type-safe registry pattern similar to those found in large-scale frameworks like Transformers or vLLM. The benchmark simulates how these frameworks handle dynamic m...
|
04-05 17:29 | Success | - | |
|
exp_self.20260405170718.037_20260405_170759
|
Self-Directed Benchmark: SSM Strategy Stress Test
Overview: This benchmark evaluates the hypothesis that applying State Space Models (SSM) with a disciplined memory policy (specifically, the Mamba architecture) improves inference throughput and stabilizes VRAM usage under 8GB constraints co...
|
04-05 17:09 | Success | - | |
|
exp_pytrain.20260405161902.038_20260405_161930
|
Strictly-Typed Dynamic Plugin Loader
Overview: This benchmark demonstrates the use of Python's `typing.Protocol` for structural subtyping in a dynamic plugin loading system. Unlike nominal subtyping (Abstract Base Classes), Protocols allow class compatibility based on the prese...
|
04-05 16:20 | Success | - | |
|
exp_self.20260405155424.036_20260405_155454
|
SSM Strategy Stress Test Benchmark
This benchmark evaluates the efficacy of a **State Space Model (SSM)** memory strategy against a standard Transformer-style baseline. Specifically, it tests the hypothesis that a disciplined memory policy (constant-state recurrence) allows...
|
04-05 15:55 | Success | - | |
|
exp_pytrain.20260405150811.037_20260405_150842
|
Dynamic Type-Safe Plugin Loader
Overview: This coding drill benchmark implements a **Dynamic Type-Safe Plugin Loader**. The objective is to demonstrate how to use Python's `typing.Protocol` and `tempfile` to build a robust system for loading and verifying external code mod...
|
04-05 15:09 | Success | - | |
|
exp_self.20260405144634.035_20260405_144659
|
SSM Strategy Stress Test Benchmark
Overview: This benchmark evaluates the performance of State Space Models (SSMs) under strict memory constraints (simulating an 8GB VRAM limit). It compares a **Naive Baseline** implementation against an **Optimized Policy** variant that util...
|
04-05 14:48 | Success | - | |
|
exp_pytrain.20260405135522.036_20260405_135546
|
Strictly Typed Dynamic Plugin Loader
Introduction: This benchmark demonstrates a robust, zero-trust plugin architecture within a pure Python environment. It leverages **Structural Subtyping (Protocols)** to enforce interface compatibility at runtime without requiring shared bas...
|
04-05 13:56 | Success | - | |
|
exp_self.20260405133626.034_20260405_133656
|
Self-Directed Benchmark: SSM Strategy Stress Test
Overview: This benchmark evaluates a **Disciplined Memory Policy** applied to a State Space Model (SSM) architecture. The objective is to test the hypothesis that selective state caching and chunk-based processing improve throughput and redu...
|
04-05 13:38 | Success | - | |
|
exp_pytrain.20260405124901.035_20260405_124923
|
Generic Plugin Architecture with Dynamic Discovery
This benchmark demonstrates a robust, type-safe plugin architecture using Python's standard library. Objective: The hypothesis is that an autonomous coding system can leverage Python's type system (specifically `typing.Protocol` and Generics...
|
04-05 12:50 | Success | - | |
|
exp_self.20260405122950.033_20260405_123014
|
Self-directed benchmark: SSM strategy stress test
Paper ID: self.20260405122950.033 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-05 12:31 | Success | - | |
|
exp_pytrain.20260405114356.034_20260405_114428
|
Strictly-Typed Generic Dependency Resolver
This coding drill validates your ability to write robust, type-safe Python code using advanced `typing` constructs (Generics, Protocols) and classical algorithms (Topological Sort). Objective: Implement a generic package manager capable of r...
|
04-05 11:45 | Success | - | |
|
exp_self.20260405112401.032_20260405_112421
|
Self-Directed Benchmark: SSM Strategy Stress Test
Overview: This benchmark evaluates the **SSM (State Space Model)** strategy against a baseline attention mechanism under strict **8GB VRAM constraints**. The core hypothesis is that applying an SSM with a **disciplined memory policy** (fixed...
|
04-05 11:25 | Success | - | |
|
exp_pytrain.20260405103723.033_20260405_103752
|
Python Skill Fallback
Title: Generic Plugin Registry with Dynamic Namespace Simulation - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-05 10:38 | Success | - | |
|
exp_self.20260405101406.031_20260405_101427
|
SSM Strategy Stress Test
This benchmark evaluates the performance implications of applying a disciplined memory policy to State Space Model (SSM) architectures, specifically mimicking the Mamba selective state space approach. Hypothesis: Applying SSM with a discipli...
|
04-05 10:15 | Success | - | |
|
exp_pytrain.20260405091646.032_20260405_091717
|
Type-Safe Plugin Registry Coding Drill
This benchmark challenges the implementation of a modular, extensible application framework using Python's standard library type system. The objective is to construct a `ModelRunner` registry that allows for the dynamic registration and ret...
|
04-05 09:18 | Success | - | |
|
exp_self.20260405085511.030_20260405_085535
|
Self-directed benchmark: SSM strategy stress test
Paper ID: self.20260405085511.030 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-05 08:56 | Success | - | |
|
exp_pytrain.20260405080414.031_20260405_080441
|
Type-Safe Dynamic Plugin Loader Benchmark
Overview: This benchmark evaluates a Python system's ability to synthesize standard library tools (specifically the `typing` and `inspect` modules) to create a robust, type-safe plugin architecture. The Challenge: The goal is to implement a dyn...
|
04-05 08:05 | Success | - | |
|
exp_self.20260405074154.029_20260405_074232
|
Self-directed benchmark: SSM Strategy Stress Test
This repository contains a runnable benchmark designed to test the hypothesis that applying State Space Model (SSM) architectures with a disciplined memory policy improves throughput under 8GB VRAM constraints compared to standard recurrent...
|
04-05 07:43 | Success | - | |
|
exp_pytrain.20260405065622.030_20260405_065644
|
Typed Asynchronous Plugin Loader
A Python coding drill designed to test strict type adherence, packaging standards (PEP 8), and asynchronous concurrent execution capabilities within the standard library. Objective: Build a robust, extensible plugin architecture where plugin...
|
04-05 06:57 | Success | - | |
|
exp_self.20260405063601.028_20260405_063625
|
SSM Strategy Stress Test Benchmark
This benchmark evaluates the hypothesis that a disciplined memory policy applied to State Space Models (SSM) significantly improves throughput and reduces VRAM usage during high-load inference (simulating >8GB context scenarios). Requiremen...
|
04-05 06:37 | Success | - | |
|
exp_pytrain.20260405054549.029_20260405_054611
|
Strictly-Typed Dynamic Package Generator
This benchmark evaluates a system's ability to programmatically synthesize a valid Python package structure. It verifies the system can write advanced static typing constructs (Protocols, Generics, ParamSpec) to disk, generate valid packagi...
|
04-05 05:47 | Success | - | |
|
exp_self.20260405052336.027_20260405_052410
|
Self-directed benchmark: SSM strategy stress test
Paper ID: self.20260405052336.027 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-05 05:25 | Success | - | |
|
exp_pytrain.20260405042503.028_20260405_042552
|
Strict Generic Plugin Registry Benchmark
This benchmark evaluates the performance and correctness of a strictly typed plugin system implemented using Python's `typing.Protocol` (PEP 544) and modern Type Parameter syntax (PEP 695). **Design Overview** The system defines a `Processo...
|
04-05 04:26 | Success | - | |
|
exp_self.20260405040113.026_20260405_040207
|
Self-Directed Benchmark: SSM Strategy Stress Test
Overview: This benchmark evaluates the performance impact of a disciplined memory policy applied to State Space Models (SSMs), specifically mimicking architectures like Mamba. The test compares a baseline implementation against an optimized...
|
04-05 04:03 | Success | - | |
|
exp_pytrain.20260405030141.027_20260405_030219
|
Coding Drill Benchmark: Strictly Typed Autograd Mini-Library
Robust library architecture relies on strict separation between the public interface and private implementation, enforced by explicit `__all__` declarations and structural subtyping.
|
04-05 03:03 | Success | - | |
|
exp_self.20260405023405.025_20260405_023438
|
Self-directed benchmark: SSM strategy stress test
Paper ID: self.20260405023405.025 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-05 02:35 | Success | - | |
|
exp_pytrain.20260405013040.026_20260405_013116
|
Strictly Typed Dynamic Plugin Registry
Objective: This benchmark demonstrates a robust plugin architecture using Python's `typing.Protocol` and the `runtime_checkable` decorator. Unlike traditional ad-hoc duck typing (which assumes "if it walks like a duck, it's a duck" often leadin...
|
04-05 01:32 | Success | - | |
|
exp_self.20260405010652.024_20260405_010722
|
SSM Strategy Stress Test Benchmark
This benchmark evaluates the hypothesis that applying State Space Model (SSM) strategies with disciplined memory policies improves throughput and reduces VRAM usage compared to standard Transformer-based approaches under constrained VRAM (8...
|
04-05 01:08 | Success | - | |
|
exp_pytrain.20260405001558.025_20260405_001632
|
Strict Protocol-Driven Plugin Loader with Metadata Introspection
This benchmark evaluates the ability to construct an extensible plugin architecture using Python's `typing.Protocol`. It enforces strict runtime signature validation using `inspect` and `typing` modules to ensure interface compliance before...
|
04-05 00:17 | Success | - | |
|
exp_self.20260404235228.023_20260404_235309
|
SSM Strategy Stress Test Benchmark
This benchmark evaluates the hypothesis that applying State Space Models (SSMs) with a disciplined memory policy improves throughput and reduces VRAM usage compared to standard architectures (specifically Attention-based models) under stric...
|
04-04 23:54 | Success | - | |
|
exp_pytrain.20260404225531.024_20260404_225613
|
Type-Safe Generic Batch Validator Module Benchmark
This benchmark evaluates a Python module's ability to define and enforce strict type specifications using modern `typing` features (`Protocol`, `Generic`, `TypeVar`) and packaging standards (`__all__`). Benchmark Design: The subject under te...
|
04-04 22:57 | Success | - | |
|
exp_self.20260404223103.022_20260404_223134
|
Self-directed benchmark: SSM strategy stress test
Hypothesis: Applying **SSM** (State Space Model) logic with a disciplined memory policy (simulated here via `dynamic_precision` and efficient state `cache` management) significantly improves throughput and reduces VRAM footprint compared to...
|
04-04 22:32 | Success | - | |
|
exp_pytrain.20260404213505.023_20260404_213532
|
Strictly-Typed Dynamic Plugin Loader
Overview: This benchmark evaluates a system's ability to construct a robust, extensible architecture using Python's `typing.Protocol` for interface enforcement and `importlib` for runtime module discovery. Objective: Develop a single-file scr...
|
04-04 21:36 | Success | - | |
|
exp_self.20260404210403.021_20260404_210426
|
SSM Strategy Stress Test
This benchmark evaluates the memory efficiency and throughput of State Space Model (SSM) layers when subjected to a disciplined memory policy (dynamic precision and chunked scanning) versus a naive full-precision baseline. Requirements - Py...
|
04-04 21:05 | Success | - | |
|
exp_pytrain.20260404200851.022_20260404_200927
|
PEP 695 Generic Dependency Resolver Drill
**Overview** This benchmark evaluates your ability to implement generic algorithms using modern Python 3.12+ syntax. Specifically, it tests the implementation of a Type Parameter Syntax (PEP 695) class to perform dependency resolution on a...
|
04-04 20:10 | Success | - | |
|
exp_self.20260404194638.020_20260404_194708
|
Self-directed benchmark: SSM Strategy Stress Test
Overview: This benchmark evaluates the **Memory Policy** of State Space Models (SSMs) compared to standard dense linear transformations (simulating a Transformer block without attention or a standard MLP). The hypothesis is that the selectiv...
|
04-04 19:48 | Success | - | |
|
exp_pytrain.20260404185708.021_20260404_185741
|
Strictly Typed Dynamic Plugin Loader and Metadata Validator
Overview: This benchmark evaluates the use of Python's advanced type hinting features (specifically `NewType`, `TypedDict`, and `Protocol`) to construct a robust, strictly typed runtime plugin system. The Hypothesis: An autonomous system can...
|
04-04 18:58 | Success | - | |
|
exp_self.20260404183647.019_20260404_183713
|
SSM Strategy Stress Test
This benchmark evaluates the memory efficiency and throughput of a State Space Model (SSM) strategy compared to a standard Attention-based baseline. The goal is to demonstrate that SSMs, utilizing a disciplined memory policy (constant state...
|
04-04 18:38 | Success | - | |
|
exp_pytrain.20260404174545.020_20260404_174623
|
Dynamic Type-Safe Plugin System
This coding drill implements a self-contained benchmark for a robust, dynamic plugin architecture using only Python's standard library. Overview: The system simulates a high-performance kernel loader (similar to PyTorch or Lightning backend...
|
04-04 17:47 | Success | - | |
|
exp_self.20260404172449.018_20260404_172525
|
Self-directed benchmark: SSM strategy stress test
This benchmark evaluates the hypothesis that applying State Space Models (SSMs) with a disciplined memory policy improves throughput under 8GB VRAM constraints. It compares a Baseline configuration against an Optimized configuration (discipl...
|
04-04 17:26 | Success | - | |
|
exp_pytrain.20260404163602.019_20260404_163623
|
Python Skill Fallback
Title: Strictly Typed Backend Registry and Dependency Resolver - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-04 16:37 | Success | - | |
|
exp_self.20260404161330.017_20260404_161353
|
Self-directed benchmark: SSM strategy stress test
This benchmark evaluates the hypothesis that applying State Space Model (SSM) strategies with a disciplined memory policy improves throughput and reduces VRAM usage compared to standard autoregressive baselines under 8GB constraints. Concep...
|
04-04 16:17 | Success | - | |
|
exp_pytrain.20260404152841.018_20260404_152902
|
Runtime Plugin System with Structural Subtyping
This benchmark implements a dynamic plugin loader that utilizes Python's `typing.Protocol` and `@runtime_checkable` to discover and validate modules at runtime without explicit inheritance. It demonstrates structural subtyping where classes...
|
04-04 15:30 | Success | - | |
|
exp_self.20260404150914.016_20260404_150948
|
Self-Directed Benchmark: SSM Strategy Stress Test
Overview: This benchmark evaluates the memory efficiency and throughput of State Space Models (SSMs) compared to standard Attention-based mechanisms under high-sequence constraints. Hypothesis: Applying SSM with a disciplined memory policy (co...
|
04-04 15:10 | Success | - | |
|
exp_pytrain.20260404142316.017_20260404_142349
|
Typed Extensibility: Protocol-Based Module Discovery
This benchmark evaluates an agent's ability to design a fault-tolerant plugin architecture using Python's `typing.Protocol` and dynamic module introspection. Objective: Implement a module discovery system that: 1. Defines a strict...
|
04-04 14:24 | Success | - | |
|
exp_self.20260404140340.015_20260404_140407
|
Self-directed benchmark: SSM strategy stress test
Paper ID: self.20260404140340.015 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-04 14:05 | Success | - | |
|
exp_pytrain.20260404131811.016_20260404_131826
|
Dynamic Type-Safe Plugin Loader
This benchmark tests the ability to dynamically construct a Python package in memory and enforce strict typing contracts using `typing.Protocol`. Objective: The script performs the following complex operations: 1. **Protocol Definition**: De...
|
04-04 13:19 | Success | - | |
|
exp_self.20260404125656.014_20260404_125726
|
Self-directed benchmark: SSM strategy stress test
Paper ID: self.20260404125656.014 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-04 12:58 | Success | - | |
|
exp_pytrain.20260404120709.015_20260404_120739
|
Generic Dependency Container with Importlib Resolution
This benchmark tests the ability to construct a robust, zero-dependency dependency injection system using modern Python 3.12 features. Hypothesis: Utilizing PEP 695 Type Parameter Syntax and the `importlib` standard library module allows for...
|
04-04 12:08 | Success | - | |
|
exp_self.20260404114703.013_20260404_114722
|
Self-Directed Benchmark: SSM Strategy Stress Test
Overview: This benchmark evaluates the efficiency gains of State Space Models (SSMs) when optimized with a disciplined memory policy and dynamic precision strategies. The goal is to simulate an "SSM Mamba" style workload under constrained me...
|
04-04 11:48 | Success | - | |
|
exp_pytrain.20260404105755.014_20260404_105821
|
Typed Plugin Registry with Metadata Parsing
This benchmark tests the implementation of a strictly typed plugin system using Python's `typing.Protocol` and `typing.Generic`. It simulates a workflow where components are loaded dynamically based on a configuration dictionary (mimicking...
|
04-04 10:59 | Success | - | |
|
exp_self.20260404103731.012_20260404_103758
|
SSM Strategy Stress Test
This benchmark evaluates the efficacy of a disciplined memory management policy applied to State Space Model (SSM) workloads. Hypothesis: Applying an SSM architecture with a disciplined memory policy (chunked execution) significantly reduces...
|
04-04 10:39 | Success | - | |
|
exp_pytrain.20260404094229.013_20260404_094251
|
Python Skill Fallback
Title: Generic Kernel Dispatcher with Strict Typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-04 09:43 | Success | - | |
|
exp_self.20260404091740.011_20260404_091805
|
SSM Strategy Stress Test Benchmark
This repository contains a self-contained benchmark designed to test the hypothesis that **State Space Models (SSM)** with a disciplined memory policy can achieve higher throughput and lower VRAM usage compared to standard attention-based b...
|
04-04 09:19 | Success | - | |
|
exp_pytrain.20260404082157.012_20260404_082247
|
Strictly-Typed Dynamic Plugin Registry Benchmark
Overview: This benchmark evaluates the implementation of a robust, type-safe plugin system utilizing Python's `typing.Protocol` and `importlib` features. It simulates an environment where plugin classes are discovered dynamically (mimicking...
|
04-04 08:23 | Success | - | |
|
exp_self.20260404080148.010_20260404_080210
|
Self-directed benchmark: SSM Strategy Stress Test
This benchmark evaluates the impact of a disciplined memory management policy on State Space Model (SSM) inference, specifically targeting throughput and VRAM constraints under 8GB. Objective: To validate the hypothesis that applying strict...
|
04-04 08:03 | Success | - | |
|
exp_pytrain.20260404071349.011_20260404_071412
|
Python Skill Fallback
Title: Validated Package Scaffolder - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-04 07:15 | Success | - | |
|
exp_self.20260404065349.009_20260404_065409
|
Self-directed benchmark: SSM Strategy Stress Test
Overview: This benchmark compares the memory efficiency and throughput of a standard Transformer-style Attention mechanism against an optimized State Space Model (SSM) implementation. The hypothesis is that the SSM strategy, which utilizes a...
|
04-04 06:55 | Success | - | |
|
exp_pytrain.20260404055919.010_20260404_055945
|
Python Skill Fallback
Title: Strictly Typed Async Batch Processor Module - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-04 06:00 | Success | - | |
|
exp_self.20260404053814.008_20260404_053850
|
Self-directed benchmark: SSM strategy stress test
Paper ID: self.20260404053814.008 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-04 05:39 | Success | - | |
|
exp_pytrain.20260404044427.009_20260404_044455
|
Self-Contained Modular Report Generator
This benchmark is designed to validate a Python engineer's ability to create a production-grade, self-contained module architecture within a single file. Hypothesis: An autonomous coding system can simulate production-grade package architect...
|
04-04 04:45 | Success | - | |
|
exp_self.20260404042146.007_20260404_042210
|
Self-directed benchmark: SSM strategy stress test
Paper ID: self.20260404042146.007 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-04 04:23 | Success | - | |
|
exp_pytrain.20260404033417.008_20260404_033439
|
Python Skill Fallback
Title: Generic Repository Pattern with Modern Packaging - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-04 03:35 | Success | - | |
|
exp_self.20260404031350.006_20260404_031429
|
Self-directed benchmark: SSM strategy stress test
Overview: This benchmark evaluates the efficiency of State Space Models (SSMs) under constrained memory environments. Specifically, it tests the hypothesis that applying an SSM with a disciplined memory policy (encompassing dynamic precision...
|
04-04 03:15 | Success | - | |
|
exp_pytrain.20260404022416.007_20260404_022435
|
Type-Safe Dynamic Component Instantiation Benchmark
This benchmark tests the ability to implement a generic factory pattern commonly used in large-scale AI frameworks (like PyTorch or LitGPT) where model architectures are defined via string paths. Objective: Implement a robust system to: 1. D...
|
04-04 02:25 | Success | - | |
|
exp_self.20260404020402.005_20260404_020428
|
SSM Strategy Stress Test
This benchmark evaluates the hypothesis that **State Space Models (SSM)** with a disciplined memory policy provide higher throughput and lower VRAM usage compared to standard Attention mechanisms under constrained memory environments (8GB l...
|
04-04 02:05 | Success | - | |
|
exp_pytrain.20260404011543.006_20260404_011608
|
Type-Safe Auto-Registering Model Registry Benchmark
This benchmark evaluates a Python-centric architecture pattern designed to simplify the management of complex ML pipelines (e.g., Diffusers, vLLM). By leveraging `__init_subclass__` and `typing.Protocol`, we eliminate boilerplate code assoc...
|
04-04 01:17 | Success | - | |
|
exp_gh_VectorInstitute_odyssey_20260404_010257
|
VectorInstitute/odyssey
Paper ID: gh_VectorInstitute_odyssey - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recover...
|
04-04 01:03 | Success | - | |
|
exp_pytrain.20260404004118.005_20260404_004137
|
Strictly-Typed Dynamic Module Loader Benchmark
This benchmark tests the ability to construct a secure, type-checked plugin system using only the Python standard library. The program dynamically creates a Python package on the fly, defines a strict `Protocol` interface, and utilizes `imp...
|
04-04 00:42 | Success | - | |
|
exp_self.20260404002107.004_20260404_002132
|
Self-directed SSM Strategy Stress Test Benchmark
This benchmark evaluates the hypothesis that a Selective State Space Model (SSM) implementation, adhering to a disciplined memory policy, improves throughput and reduces VRAM overhead compared to standard Transformer attention mechanisms un...
|
04-04 00:22 | Success | - | |
|
exp_pytrain.20260403233224.004_20260403_233254
|
Strict Distribution Metadata Introspector
Overview: This benchmark validates the ability of an autonomous system to programmatically inspect installed Python distributions using the standard library `importlib.metadata` module. It enforces structural integrity of the extracted data...
|
04-03 23:33 | Success | - | |
|
exp_self.20260403231218.003_20260403_231249
|
Self-directed benchmark: SSM strategy stress test
Paper ID: self.20260403231218.003 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-03 23:13 | Success | - | |
|
exp_pytrain.20260403222240.003_20260403_222317
|
Dynamic Module Loader with Structural Subtyping Benchmark
This benchmark tests the ability to design a robust runtime loader for modular components. It utilizes the `importlib` library for dynamic package introspection and the `typing.Protocol` system to enforce structural subtyping (duck typing w...
|
04-03 22:24 | Success | - | |
|
exp_self.20260403220335.002_20260403_220359
|
SSM Strategy Stress Test Benchmark
This benchmark evaluates the performance improvements gained by applying a disciplined memory policy to State Space Models (SSMs), specifically focusing on throughput and VRAM usage under constrained memory environments (8GB target). Object...
|
04-03 22:05 | Success | - | |
|
exp_pytrain.20260403211944.002_20260403_212009
|
PEP 695 Generic Dependency Resolver Benchmark
This benchmark evaluates the developer experience and runtime characteristics of Python 3.12+'s new Type Parameter Syntax (PEP 695) by implementing a generic dependency resolution system. Objective: Implement a lightweight package manager re...
|
04-03 21:21 | Success | - | |
|
exp_self.20260403210039.001_20260403_210100
|
SSM Strategy Stress Test
This benchmark evaluates the hypothesis that applying a disciplined memory policy and dynamic precision to State Space Models (SSMs) improves throughput under strict 8GB VRAM constraints. Methodology: We simulate a Mamba-like SSM workload us...
|
04-03 21:02 | Success | - | |
|
exp_pytrain.20260403201417.001_20260403_201451
|
Structurally-Typed Plugin Loader Benchmark
This benchmark validates a Python architecture that combines runtime dynamism with static structural typing. Overview: The script demonstrates an autonomous plugin loading system. It uses `importlib` to dynamically discover and load modules...
|
04-03 20:15 | Success | - | |
|
exp_self.20260403200346.012_20260403_200409
|
SSM Strategy Stress Test: Memory vs. Throughput
This benchmark evaluates the hypothesis that a State Space Model (SSM) inference strategy (recurrent mode) significantly reduces VRAM usage compared to a standard Attention mechanism (Transformer baseline) under high sequence lengths, while...
|
04-03 20:04 | Pending | - | |
|
exp_pytrain.20260403191447.023_20260403_191508
|
Python Skill Fallback
Title: Type-Safe Async Resource Pool with Internal Package Structure - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-03 19:16 | Success | - | |
|
exp_oa_W7148177295_20260403_190329
|
Video Generation Models as World Models: Efficient Paradigms, Architectures and Algorithms
Paper ID: oa_W7148177295 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
04-03 19:04 | Success | - | |
|
exp_pytrain.20260403184221.022_20260403_184240
|
Generic Plugin Registry with PEP 695 Syntax
This benchmark validates a Python engineer's ability to utilize modern type hinting features introduced in Python 3.12 (PEP 695) to create generic classes without external dependencies. It combines this with advanced standard library usage...
|
04-03 18:43 | Success | - | |
|
exp_self.20260403182055.011_20260403_182129
|
Self-directed benchmark: SSM strategy stress test
This repository contains a benchmark designed to test the hypothesis that applying State Space Models (SSMs) with a disciplined memory policy (specifically, dynamic precision casting) improves throughput and reduces VRAM usage under constra...
|
04-03 18:22 | Success | - | |
|
exp_pytrain.20260403173545.021_20260403_173605
|
Strictly-Typed Generic Pipeline
Overview: This benchmark demonstrates the creation of a strictly-typed data transformation pipeline using Python's standard typing utilities. The goal is to maintain type safety across a chain of operations, ensuring that static type checker...
|
04-03 17:37 | Success | - | |
|
exp_self.20260403171455.010_20260403_171524
|
SSM Strategy Stress Test Benchmark
This benchmark evaluates the performance of a State Space Model (SSM) implementation—specifically mimicking Mamba-style selective state spaces—under constrained memory conditions (8GB VRAM simulation). It compares a naive sequential recurre...
|
04-03 17:16 | Success | - | |
|
exp_pytrain.20260403162737.020_20260403_162810
|
Strictly-Typed Modular Pipeline with Exports Control
This benchmark demonstrates the implementation of a robust, modular data pipeline using Python's standard `typing` module and strict module export controls. Design Principles: 1. **Structural Subtyping**: Uses `typing.Protocol` to define int...
|
04-03 16:29 | Success | - | |
|
exp_self.20260403160744.009_20260403_160806
|
SSM Strategy Stress Test Benchmark
This repository contains a self-directed benchmark designed to test the hypothesis that **State Space Models (SSM)** with a disciplined memory policy (fixed state size) maintain higher throughput and lower VRAM usage than standard Attention...
|
04-03 16:09 | Success | - | |
|
exp_pytrain.20260403152229.019_20260403_152320
|
Strictly Typed Model Registry & Configuration Loader
Overview: This benchmark evaluates the implementation of a type-safe plugin architecture using Python's `typing` module. The system mimics the dependency injection patterns found in major ML frameworks like Hugging Face Transformers. Feature...
|
04-03 15:24 | Success | - | |
|
exp_hf_2603.06679_20260403_151001
|
MultiGen: External Memory Benchmark
This benchmark evaluates the computational efficiency of the **MultiGen** architecture compared to standard next-frame diffusion baselines. **Innovation Tested:** The core hypothesis is that decomposing world simulation into **Memory**, **O...
|
04-03 15:11 | Success | - | |
|
exp_pytrain.20260403144441.018_20260403_144522
|
Dynamic Package Loader with Strict Protocol Validation
This benchmark tests the engineering capability to design a robust plugin system that bridges Python's dynamic module loading with strict static typing. The goal is to implement a runtime validator that discovers modules dynamically (simula...
|
04-03 14:46 | Success | - | |
|
exp_self.20260403142205.008_20260403_142249
|
SSM Strategy Stress Test Benchmark
This repository contains a benchmark designed to evaluate the efficiency of State Space Model (SSM) architectures under constrained memory conditions (8GB VRAM limit). Objective: The benchmark tests the hypothesis that applying **SSM with a...
|
04-03 14:24 | Success | - | |
|
exp_pytrain.20260403132751.017_20260403_132816
|
Python Skill Fallback
Title: Dynamic Plugin Loader with Strict Protocol Enforcement - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-03 13:29 | Success | - | |
|
exp_self.20260403130624.007_20260403_130656
|
Self-directed SSM Strategy Stress Test
Overview: This benchmark evaluates the hypothesis that **SSM (State Space Model) strategies** significantly improve throughput and reduce VRAM overhead compared to standard Transformer architectures under strict memory constraints (8GB). Hyp...
|
04-03 13:08 | Success | - | |
|
exp_pytrain.20260403121418.016_20260403_121445
|
Strictly Typed Modular Plugin Loader
Overview: This coding drill benchmark tests your ability to design a strictly typed, modular plugin system within a single Python file. It leverages advanced type hinting features (`Protocol`, `TypedDict`, `TypeVar`, `overload`) to enforce s...
|
04-03 12:15 | Success | - | |
|
exp_self.20260403114713.006_20260403_114803
|
Self-directed benchmark: SSM strategy stress test
This repository contains a synthetic benchmark designed to test the hypothesis that applying State Space Models (SSMs) with a disciplined memory policy improves throughput under 8GB VRAM constraints. Overview: The benchmark compares two appro...
|
04-03 11:49 | Success | - | |
|
exp_pytrain.20260403104743.015_20260403_104811
|
Generic Event Dispatcher with Modern Type Syntax
This benchmark implements a thread-safe Generic Event Dispatcher utilizing Python 3.12+ syntax (PEP 695) to define type parameters. It evaluates runtime performance and memory overhead while maintaining strict type hygiene.
|
04-03 10:49 | Success | - | |
|
exp_self.20260403102243.005_20260403_102314
|
Self-directed benchmark: SSM strategy stress test
This benchmark evaluates the efficiency of State Space Models (SSMs) compared to standard Attention mechanisms under constrained memory environments. Hypothesis: Applying an SSM with a disciplined memory policy improves throughput under 8GB constr...
|
04-03 10:24 | Success | - | |
|
exp_pytrain.20260403092847.014_20260403_092910
|
Type-Safe Plugin Registry Benchmark
Objective: Design a robust, modular component registry using Python's `typing.Protocol` and generic types (`typing.Generic`, `typing.TypeVar`). This benchmark simulates the internal architecture of scalable systems like LitGPT, ensuring loos...
|
04-03 09:30 | Success | - | |
|
exp_self.20260403090347.004_20260403_090407
|
SSM Strategy Stress Test
This benchmark evaluates the memory efficiency and throughput of a State Space Model (SSM) architecture compared to a traditional Transformer architecture under strict VRAM constraints (8GB). Concept: The test compares a standard **Transform...
|
04-03 09:05 | Success | - | |
|
exp_pytrain.20260403080654.013_20260403_080727
|
Python Skill Fallback
Title: Runtime Module Loader with Strict Type Verification - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-03 08:08 | Success | - | |
|
exp_self.20260403074228.003_20260403_074249
|
SSM Strategy Stress Test
This benchmark evaluates the hypothesis that applying a State Space Model (SSM) strategy with a disciplined memory policy (specifically chunked processing and mixed precision) improves throughput and memory efficiency under strict 8GB VRAM...
|
04-03 07:43 | Success | - | |
|
exp_pytrain.20260403065322.012_20260403_065356
|
Python Skill Fallback
Title: Structural Subtyping Plugin Loader - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-03 06:54 | Success | - | |
|
exp_hf_2604.01152_20260403_063908
|
Brainstacks: Modular Continual Learning Benchmark
This benchmark validates the **Brainstacks** architecture, focusing on its ability to learn new domains sequentially (continual learning) without catastrophic forgetting, using frozen MoE-LoRA stacks. Key Innovations Validated: 1. **Frozen S...
|
04-03 06:40 | Success | - | |
|
exp_pytrain.20260403061445.011_20260403_061505
|
Strictly Typed Source Distribution Builder
This benchmark tests the ability to generate a standards-compliant Python package structure programmatically using only the standard library. Objective: Create a script that demonstrates proficiency with: 1. **Strict Typing**: Utilizing `typ...
|
04-03 06:16 | Success | - | |
|
exp_cr_10.1038_s41598-026-44804-x_20260403_055828
|
Mamba-based modulated fusion model for video moment retrieval
Paper ID: cr_10.1038_s41598-026-44804-x - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Reco...
|
04-03 05:59 | Success | - | |
|
exp_pytrain.20260403053129.010_20260403_053201
|
Robust Typed Configuration Module
This benchmark evaluates a Python module's ability to strictly enforce type safety and adhere to packaging hygiene standards using only the standard library. Objective: The goal is to simulate a high-integrity configuration loader typically...
|
04-03 05:33 | Success | - | |
|
exp_pytrain.20260403045845.009_20260403_045949
|
Python Skill Fallback
Title: Robust Dynamic Plugin Loader with Structural Typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-03 05:00 | Success | - | |
|
exp_pytrain.20260403042335.008_20260403_042428
|
Robust Package Dependency Resolver
This benchmark evaluates the implementation of a `DependencyResolver` class designed to manage package installation order and detect conflicts using Python's standard library. Implementation Details: The `DependencyResolver` class uses `grap...
|
04-03 04:25 | Success | - | |
|
exp_pytrain.20260403035100.007_20260403_035125
|
Generic Registry with Protocol-Based Plugin Loading
This coding drill verifies the capability of an autonomous coding system to construct a robust, type-safe package architecture using only the Python Standard Library. Architecture Overview: This benchmark creates a modular plugin architectur...
|
04-03 03:52 | Success | - | |
|
exp_pytrain.20260403031621.006_20260403_031719
|
Type-Safe Component Registry using Importlib
This benchmark demonstrates a robust, extensible plugin architecture using Python's standard library. It leverages `typing.Protocol` for structural subtyping (duck typing) and `typing.Generic` to create a type-safe registry. It simulates a...
|
04-03 03:18 | Success | - | |
|
exp_pytrain.20260403023931.005_20260403_024054
|
Robust Dynamic Plugin Loader with Structural Subtyping Benchmark
This benchmark evaluates a Python system's capability to dynamically discover, load, and validate plugins using structural subtyping (Protocols) rather than explicit inheritance. Design: The script creates a secure, ephemeral package structu...
|
04-03 02:41 | Success | - | |
|
exp_pytrain.20260403020308.004_20260403_020338
|
Type-Safe Modular Log Filter Benchmark
Overview: This project demonstrates a robust, modular architecture for log filtering using Python's `typing.Protocol` for structural subtyping. It adheres to strict type safety standards and includes a built-in benchmark suite to validate pe...
|
04-03 02:04 | Success | - | |
|
exp_self.20260403012816.002_20260403_012915
|
SSM Strategy Stress Test Benchmark
This benchmark tests the hypothesis that applying an SSM with a disciplined memory policy improves throughput under 8GB constraints. Overview: State Space Models (SSMs) like Mamba have shown impressive capabilities in sequence modeling while main...
|
04-03 01:30 | Success | - | |
|
exp_pytrain.20260403001853.003_20260403_001911
|
Generic Plugin Registry Benchmark
This benchmark evaluates a system's ability to dynamically construct a Python package architecture at runtime, enforce structural typing via `typing.Protocol`, and manage module lifecycles using `importlib`. Scenario: The script simulates a...
|
04-03 00:20 | Success | - | |
|
exp_self.20260402234808.001_20260402_234843
|
Self-Directed Benchmark: SSM Strategy Stress Test
This benchmark evaluates the hypothesis that a **disciplined memory policy** applied to State Space Models (SSMs) improves throughput under constrained VRAM (8GB). The Innovation: Standard large language models and naive SSM implementations...
|
04-02 23:49 | Success | - | |
|
exp_pytrain.20260402224511.002_20260402_224608
|
Generic Plugin Loader & PEP 695 Syntax Benchmark
This benchmark evaluates the implementation of a type-safe, generic plugin architecture using Python 3.12's new Type Parameter Syntax (PEP 695). It demonstrates how modern generic syntax (`class MyClass[T]:`) improves code readability over...
|
04-02 22:47 | Success | - | |
|
exp_pytrain.20260402221115.001_20260402_221156
|
Dynamic Plugin Loader with Strict Protocol Enforcement
This benchmark evaluates a system's ability to programmatically construct a Python package in a volatile file system environment and enforce strict type protocols using Python's standard `typing` module. Objective: The candidate script must...
|
04-02 22:12 | Success | - | |
|
exp_self.20260402215337.004_20260402_215404
|
Self-directed benchmark: SSM Strategy Stress Test
This benchmark evaluates the hypothesis that applying a State Space Model (SSM) strategy with a disciplined memory policy improves throughput under 8GB VRAM constraints compared to standard Transformer architectures. Requirements - Python 3...
|
04-02 21:54 | Pending | - | |
|
exp_pytrain.20260402205031.006_20260402_205102
|
Strictly-Typed Dynamic Component Loader
Objective: This benchmark challenges you to implement a robust, plugin-like architecture in Python without relying on external frameworks. The goal is to mimic the dynamic loading patterns used in large-scale ML libraries (like vLLM or Huggi...
|
04-02 20:52 | Success | - | |
|
exp_gh_quic_aimet_20260402_203724
|
AIMET Quantization Benchmark
This benchmark evaluates the efficiency of **AIMET (AI Model Efficiency Toolkit)** for Post-Training Quantization (PTQ). It measures VRAM usage and inference throughput (tokens/sec) of a standard Transformer model before and after applying...
|
04-02 20:38 | Success | - | |
|
exp_pytrain.20260402201405.005_20260402_201435
|
Runtime-Verified ZipApp Packager
This benchmark evaluates an autonomous coding system's ability to programmatically synthesize a Python package structure, enforce strict type compliance on the generated source code using runtime introspection (without external linters), an...
|
04-02 20:15 | Success | - | |
|
exp_self.20260402195336.003_20260402_195359
|
Self-directed benchmark: SSM strategy stress test
Paper ID: self.20260402195336.003 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
04-02 19:55 | Success | - | |
|
exp_pytrain.20260402190028.004_20260402_190049
|
Strictly Typed CLI Log Processor
This coding drill benchmarks the ability to write a robust, strictly-typed Python CLI application using only the standard library. Overview: The script `benchmark.py` implements a log processor that: 1. **Parses Arguments**: Uses `argparse`...
|
04-02 19:01 | Success | - | |
|
exp_self.20260402183757.002_20260402_183822
|
Self-Directed SSM Strategy Stress Test
This benchmark evaluates the performance characteristics of a novel State Space Model (SSM) strategy designed for memory-constrained environments (8GB VRAM limit). The Innovation: The proposed method integrates two key optimizations: 1. **Dy...
|
04-02 18:39 | Success | - | |
|
exp_pytrain.20260402174528.003_20260402_174548
|
Python Skill Fallback
Title: Strictly Typed Dynamic Plugin System - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-02 17:46 | Success | - | |
|
exp_self.20260402172433.001_20260402_172454
|
SSM Strategy Stress Test Benchmark
This benchmark evaluates the efficiency of State Space Models (SSMs) against standard Attention mechanisms under strict memory constraints. It simulates an 8GB VRAM environment by tracking peak memory allocation and throughput for long-cont...
|
04-02 17:26 | Success | - | |
|
exp_pytrain.20260402163435.002_20260402_163503
|
Benchmark: Modern Generic Cache Manager with PEP 695
This coding drill validates the implementation of a generic `LRUCache` class utilizing the new PEP 695 Type Parameter Syntax introduced in Python 3.12. The objective is to ensure the codebase leverages modern typing features for improved re...
|
04-02 16:36 | Success | - | |
|
exp_2604.01216v1_20260402_162258
|
Benchmark for LAPIS-SHRED
This benchmark evaluates the computational performance and reconstruction capability of the LAPIS-SHRED (LAtent Phase Inference from Short time sequences using SHallow REcurrent Decoders) architecture. Architecture Overview: LAPIS-SHRED is d...
|
04-02 16:24 | Success | - | |
|
exp_pytrain.20260402160154.001_20260402_160220
|
Structural Subtyping Plugin Loader Benchmark
Overview: This benchmark tests the ability to construct a robust, type-safe plugin loading system using Python's `typing.Protocol` and `importlib`. The goal is to discover modules within a package structure, instantiate classes that structur...
|
04-02 16:03 | Success | - | |
|
exp_self.20260402154924.002_20260402_154959
|
SSM Strategy Stress Test Benchmark
This benchmark evaluates the hypothesis that State Space Models (SSMs) with disciplined memory policies (specifically Mamba) offer superior throughput and memory efficiency compared to standard Transformer architectures under strict 8GB VRA...
|
04-02 15:49 | Pending | - | |
|
exp_pytrain.20260402145859.013_20260402_145918
|
Python Skill Fallback
Title: Type-Safe Kernel Dispatcher with Package Semantics - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-02 15:00 | Success | - | |
|
exp_self.20260402143425.001_20260402_143531
|
SSM Strategy Stress Test Benchmark
Overview: This benchmark evaluates the performance of State Space Models (SSM) under memory constraints. It specifically tests the hypothesis that applying SSM with a disciplined memory policy improves throughput under 8GB VRAM constraints....
|
04-02 14:37 | Success | - | |
|
exp_pytrain.20260402134712.012_20260402_134741
|
Python Skill Fallback
Title: Dynamic Plugin Registry with Structural Typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-02 13:48 | Success | - | |
|
exp_2604.01220v1_20260402_133500
|
Universal YOCO for Efficient Depth Scaling
Paper ID: 2604.01220v1 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark s...
|
04-02 13:36 | Success | - | |
|
exp_oa_W4413304852_20260402_132320
|
Benchmark: LLM Optimization for PHM on Edge Devices
**Paper:** Large language models for PHM: a review of optimization techniques and applications **Type:** Review This paper surveys LLM deployment strategies for Prognostics and Health Management (PHM) on resource-constrained industrial hard...
|
04-02 13:24 | Success | - | |
|
exp_pytrain.20260402130329.011_20260402_130359
|
Python Skill Fallback
Title: Dynamic Type-Safe Plugin Loader - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-02 13:05 | Success | - | |
|
exp_2411.02985v1_20260402_125300
|
Benchmark: Hybrid Sparse Coding with Unrolled Solver
**Architecture:** Hybrid sparse coding model utilizing a concatenated dictionary (Zernike polynomials + complex modes) and a trainable affine transform layer. Inference relies on $L_1$-regularized optimization (sparse recovery) rather than...
|
04-02 12:54 | Success | - | |
|
exp_pytrain.20260402122944.010_20260402_123039
|
Strictly Typed Async Plugin System
This benchmark evaluates a Python plugin architecture that leverages **Structural Subtyping (Protocol)** and **Generics** to enforce type safety without explicit inheritance. Objective: The goal is to design an asynchronous data processor re...
|
04-02 12:31 | Success | - | |
|
exp_cr_10.1016_j.aiig.2024.100104_20260402_121708
|
Convolutional Sparse Coding (CSC) Benchmark
**Architecture:** Proposes a **feed-forward Convolutional Sparse Coding (CSC)** network designed to replace iterative optimization algorithms. The structure typically utilizes cascaded convolutional layers coupled with non-linear shrinkage...
|
04-02 12:18 | Success | - | |
|
exp_pytrain.20260402115233.009_20260402_115315
|
Python Skill Fallback
Title: Strict Project Metadata Auditor - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-02 11:54 | Success | - | |
|
exp_2603.26465v1_20260402_114107
|
Backfill Candidate 2603.26465v1
**Architecture:** A hybrid model enhancing standard Transformers with Boltzmann Machine constraints. It integrates structured binary gating variables into multi-head attention to model higher-order dependencies, utilizing mean-field variati...
|
04-02 11:42 | Success | - | |
|
exp_pytrain.20260402111105.008_20260402_111142
|
Python Skill Fallback
Title: Strictly-Typed Project Scaffolder - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-02 11:12 | Success | - | |
|
exp_2411.01399v1_20260402_105727
|
MambaReg Benchmark: Linear vs. Quadratic Complexity
**Architecture:** MambaReg introduces a hybrid architecture combining Convolutional Neural Networks (CNNs) with Mamba (State Space Models). It extracts local features via convolutions and processes global context via Mamba blocks to handle...
|
04-02 10:58 | Success | - | |
|
exp_pytrain.20260402102936.007_20260402_103014
|
Strict Metadata Validator and Plugin Loader
This drill validates the hypothesis that leveraging Python's structural typing features (`TypedDict`, `Protocol`) alongside `importlib` creates a robust, self-documenting plugin architecture. By defining strict interfaces for metadata and e...
|
04-02 10:31 | Success | - | |
|
exp_2603.25722v1_20260402_101515
|
Benchmark: Parameter-Free Cross-Modal Attention Pooling
**Architecture:** Modifies standard dual-encoder (Contrastive V&L) frameworks. Replaces final global pooling with **parameter-free cross-modal attention-pooling** to align concept-centric text segments with visual features. **Memory Footpri...
|
04-02 10:16 | Success | - | |
|
exp_pytrain.20260402094822.006_20260402_094852
|
Python Skill Fallback
Title: Runtime Type-Checked Plugin Registry - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-02 09:49 | Success | - | |
|
exp_2410.18794v2_20260402_093622
|
Backfill Candidate 2410.18794v2
**Architecture:** Hybrid model integrating a lightweight "predictor network" (CNN) with a hard-thresholded Convolutional Locally Competitive Algorithm (LCA) solver. The predictor performs "state warm-up," generating a high-quality initial g...
|
04-02 09:37 | Success | - | |
|
exp_pytrain.20260402090520.005_20260402_090618
|
Generic Plugin Registry with Typed Configuration
This benchmark implements a standalone `cli_engine` simulation. It demonstrates advanced type safety features in the Python standard library, including `Protocol`, `Generic`, `TypeVar`, and `TypedDict`. Architecture: 1. **TypedDict (`Settings`)**...
|
04-02 09:07 | Success | - | |
|
exp_hf_2603.13904_20260402_085033
|
Benchmark for CroBo: Single-Token Visual State Compression
**Paper:** CroBo (Visual States Need What-is-Where Composition) **Architecture:** CroBo is a self-supervised encoder-decoder framework designed to compress visual observations into a **single, compact bottleneck token** capturing "what-is-w...
|
04-02 08:51 | Success | - | |
|
exp_pytrain.20260402082637.004_20260402_082710
|
Python Skill Fallback
Title: Dynamic Type-Safe Plugin Loader - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-02 08:28 | Success | - | |
|
exp_cr_10.3390_pr13071977_20260402_081430
|
Backfill Candidate cr_10.3390_pr13071977
**Architecture:** TransQwen is a specialized fine-tune of **Qwen-7B-Chat** utilizing **DoRA** (Weight-Decomposed Low-Rank Adaptation) for parameter-efficient updates and RoPE for positional encoding. This is a **weight-based learning approa...
|
04-02 08:15 | Success | - | |
|
exp_pytrain.20260402074913.003_20260402_074940
|
Protocol-Driven Extensible CLI Dispatcher
This benchmark tests the implementation of a modular command-line interface (CLI) using Python's `typing.Protocol` for structural sub-typing. Objectives: 1. **Protocol Enforcement**: Define a `Command` interface using `typing.Protocol` and `...
|
04-02 07:50 | Success | - | |
|
exp_2412.00503v3_20260402_073338
|
Benchmarking Bio-Plausible Transformers (RFB-kWTA)
**Architecture:** The paper proposes integrating biological homeostasis mechanisms—RFB-kWTA (Random Feedback k-Winners-Take-All) and "Smart" Inhibition—into standard Transformer attention and output layers. These modules use running statist...
|
04-02 07:34 | Success | - | |
|
exp_pytrain.20260402070205.002_20260402_070257
|
Python Reliability Drill: Typing & Robustness
Overview: This drill implements a **Type-Safe Inference Engine** to test your ability to write robust, reusable utilities with strict typing constraints, edge-case handling, and performance monitoring. Objective: Create a generic processing u...
|
04-02 07:03 | Success | - | |
|
exp_cr_10.3390_info16050343_20260402_064550
|
Backfill Candidate cr_10.3390_info16050343
**Architecture:** Introduces **CPSE** (encoding) and **CPSD** (decoding), a framework utilizing Sparse Binary Representations (SDRs) and triadic memory. It extends Context-Dependent Thinning (CDT) to manage nested compositional structures a...
|
04-02 06:46 | Success | - | |
|
exp_pytrain.20260402061631.001_20260402_061706
|
Generic Plugin Loader with Strict Interface Contracts
This benchmark evaluates an implementation of a modular data processing pipeline architecture. It utilizes Python's `typing.Protocol` to define structural subtyping (duck typing with explicit contracts) and `typing.Generic` for type-safe co...
|
04-02 06:18 | Success | - | |
|
exp_pytrain.20260401075805.001_20260401_075856
|
Runtime-Checked Plugin Loader
This benchmark tests a developer's ability to design a robust, type-safe plugin architecture using Python's standard library. Problem Description: Create a single-file Python script `benchmark.py` that implements a **Runtime-Checked Plugin L...
|
04-01 07:59 | Success | - | |
|
exp_pytrain.20260401071752.001_20260401_071825
|
Structural Subtyping Plugin Registry
This benchmark simulates a modern plugin architecture where plugins are discovered dynamically and validated against a strict **Structural Subtyping** (Protocol) contract defined via PEP 544. It tests the ability to: 1. Define a strict `typ...
|
04-01 07:19 | Success | - | |
|
exp_pytrain.20260401063316.091_20260401_063401
|
Dynamic CLI Architecture with Strict Typing
Objective: Design a single-file executable Python script (`smart_cli.py`) that demonstrates advanced use of type hints (`Protocol`) and reflection (`importlib`) to build a modular command-line interface. The goal is to simulate the architect...
|
04-01 06:35 | Success | - | |
|
exp_cr_10.3390_s25010064_20260401_061451
|
Benchmark: Edge-Scale Driver Intent Model (Llama-3-8B + 4-bit)
**Architecture:** Built on **Llama-3-8B-Instruct**, optimized via **LoRA** to integrate multi-attribute inputs (historical interactions, driver emotion, vehicle/physics state). It functions as an encoder-decoder for intent prediction, treati...
|
04-01 06:15 | Success | - | |
|
exp_pytrain.20260401054831.090_20260401_054910
|
Python Skill Fallback
Title: Asyncio ZipApp Packager - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
04-01 05:50 | Success | - | |
|
exp_2410.16443v4_20260401_053313
|
Benchmark: CRATE (Coding RAte TransformEr) vs Standard Transformer
**Architecture:** CRATE (Coding RAte TransformEr) is a "white-box" Transformer variant that explicitly integrates sparse coding mechanisms—specifically coding rate minimization—directly into the network layers to capture low-dimensional dat...
|
04-01 05:34 | Success | - | |
|
exp_pytrain.20260401050134.089_20260401_050235
|
Generic Plugin Registry with Type Safety
Overview: This benchmark evaluates a Python developer's ability to construct a robust, type-safe "micro-framework" within a single file. It simulates a modular package architecture by leveraging advanced `typing` constructs (Generics, Protoc...
|
04-01 05:03 | Success | - | |
|
exp_pytrain.20260401042214.088_20260401_042307
|
Typed Metadata Inspector (PEP 695)
This benchmark validates a developer's ability to utilize modern Python 3.12+ type hinting features (PEP 695) in conjunction with the standard library's packaging tooling (`importlib.metadata`). **Objective:** Implement a Generic class `Pac...
|
04-01 04:24 | Success | - | |
|
exp_pytrain.20260401032811.087_20260401_032935
|
Type-Safe Dynamic Plugin Registry
This benchmark tests the ability to design a robust, dynamic plugin system using Python's `typing.Protocol` and `importlib` modules. Scenario: You are building an extensible data processing framework. You must define a strict `Transform` pro...
|
04-01 03:30 | Success | - | |
|
exp_pytrain.20260401024404.086_20260401_024524
|
Coding Drill: Generic Component Registry
Objective: Implement a robust, type-safe `Registry` class using Python's standard library. This pattern is common in large-scale ML frameworks (like Diffusers or vLLM) to manage dynamic model loading and configuration without hard-coding dep...
|
04-01 02:46 | Success | - | |
|
exp_pytrain.20260401015913.085_20260401_020033
|
Typed ZipApp Packager
This benchmark tests the ability of an autonomous coding system to construct a lightweight distribution tool using Python's standard library. Objective: The candidate must implement a `ZipAppBuilder` class that compiles a dictionary of virtu...
|
04-01 02:01 | Success | - | |
|
exp_pytrain.20260401011808.084_20260401_011920
|
Type-Safe Modular Data Processor
A robust, single-file Python module demonstrating strict type integrity using Generics, Protocols, and modern packaging standards within the standard library. This benchmark simulates a high-throughput data ingestion pipeline. Features: - **...
|
04-01 01:20 | Success | - | |
|
exp_pytrain.20260401003943.083_20260401_004013
|
Dynamic Plugin Loader with Runtime Type Validation
Overview: This coding drill benchmarks a Python system's ability to implement a secure, modular plugin architecture. It tests the hypothesis that an autonomous system can achieve robust modularity by programmatically generating Python module...
|
04-01 00:41 | Success | - | |
|
exp_core_299002838_20260401_002015
|
Backfill Candidate core_299002838
This review surveys Transformer-based LLMs and multi-modal architectures for Prognostics and Health Management (PHM), specifically targeting deployment on resource-constrained industrial hardware. * **Architecture:** Focuses on adapting gen...
|
04-01 00:21 | Success | - | |
|
exp_pytrain.20260331234921.082_20260331_234950
|
PEP 695 Generic Repository Implementation
Overview: This benchmark evaluates a Python developer's ability to utilize PEP 695 Type Parameter Syntax (introduced in Python 3.12) to define generic classes and functions without relying on legacy `TypeVar` imports. Objective: Implement a r...
|
03-31 23:50 | Success | - | |
|
exp_2410.00340v3_20260331_233316
|
Backfill Candidate 2410.00340v3
**Assessment: Low Relevance for Inference Optimization** **Architecture:** No new model architecture proposed. The paper introduces a diagnostic tool using Singular Value Decomposition (SVD) on GPT-2 Small’s attention weight matrices to iso...
|
03-31 23:34 | Success | - | |
|
exp_pytrain.20260331231020.081_20260331_231044
|
Python Typing & Structure Drill: Generic Plugin Registry
This drill validates the implementation of a strictly typed, generic plugin system using Python's `typing.Protocol`, `typing.Generic`, and `typing.TypeVar`. It simulates a package structure within a single script by enforcing proper `__all_...
|
03-31 23:11 | Success | - | |
|
exp_pytrain.20260331223703.080_20260331_223800
|
Python Skill Fallback
Title: Strictly-Typed Plugin Registry with Metadata Validation - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-31 22:39 | Success | - | |
|
exp_pytrain.20260331214749.079_20260331_214824
|
Python Skill Fallback
Title: Typing-Driven Model Registry Factory - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-31 21:49 | Success | - | |
|
exp_pytrain.20260331211005.078_20260331_211154
|
Type-Safe Dynamic Module Loader Benchmark
This benchmark evaluates the ability to construct a robust, type-safe plugin architecture using Python's standard library. Objective: The goal is to programmatically generate a Python package structure on disk, define a strict structural int...
|
03-31 21:12 | Success | - | |
|
exp_2401.00243v1_20260331_204734
|
UP-RLHF Policy Inference Benchmark
**Architecture:** UP-RLHF introduces a training-time architecture utilizing an ensemble of diverse Low-Rank Adaptations (LoRAs) for the Reward Model (RM). Diversity is enforced by maximizing the nuclear norm of concatenated LoRA matrices. T...
|
03-31 20:48 | Success | - | |
|
exp_pytrain.20260331201655.077_20260331_201740
|
Strict Typed Plugin System Simulator
Overview: This coding drill evaluates the system's ability to construct a robust, modular application architecture using modern Python typing constructs (`Protocol`, `TypeVar`, `runtime_checkable`) and standard library introspection tools (`...
|
03-31 20:18 | Success | - | |
|
exp_2508.16915v3_20260331_200034
|
Benchmark for Candidate 2508.16915v3: Reinforcement-Guided Hyper-Heuristic SNN for Fraud Detection
Fallback synthesis: Reinforcement-Guided Hyper-Heuristic Hyperparameter Optimization for Fair and Explainable Spiking Neural Network-Based Financial Fraud Detection. Potential 8GB relevance via sparse, rag.
|
03-31 20:01 | Success | - | |
|
exp_pytrain.20260331192901.076_20260331_192932
|
Generic Dependency Injection Container with Public API Hygiene
Overview: This benchmark evaluates your ability to construct a robust, type-safe dependency injection (DI) system using Python's standard type hints and packaging best practices. The goal is to create a `ServiceContainer` that manages object...
|
03-31 19:30 | Success | - | |
|
exp_2507.10855v1_20260331_191835
|
Backfill Candidate 2507.10855v1
Fallback synthesis: Sparse Fine-Tuning of Transformers for Generative Tasks. Potential 8GB relevance via sparse, rag.
|
03-31 19:19 | Success | - | |
|
exp_cr_10.34088_kojose.1658929_20260331_190725
|
Backfill Candidate cr_10.34088_kojose.1658929
Fallback synthesis: Refining Sparse Coding Dictionaries Using High Dimensional Model Representation for Hyperspectral Imagery. Potential 8GB relevance via sparse, rag.
|
03-31 19:08 | Success | - | |
|
exp_pytrain.20260331184729.075_20260331_184745
|
Generic Command Registry Benchmark
This benchmark tests the creation of a robust, extensible command processing pipeline leveraging Python's advanced typing features (Generics and Protocols) and strict packaging standards within a single-file constraint. Objective: Develop a...
|
03-31 18:48 | Success | - | |
|
exp_cr_10.7717_peerj-cs.3388_20260331_183704
|
Benchmark: Sparse CNN Efficiency via Feature Decoupling
Fallback synthesis: Towards optimal sparse CNNs: sparsity-friendly knowledge distillation through feature decoupling. Potential 8GB relevance via sparse.
|
03-31 18:38 | Success | - | |
|
exp_2411.04519v2_20260331_182547
|
FNet-LZSC: Deep Unfolding Sparse Coding Benchmark
**Architecture:** FNet utilizes **Deep Unfolding** of an $\ell_0$-regularized Multi-Modal Convolutional Sparse Coding (MCSC) model. The core component is the **Learnable $\ell_0$ Sparse Coding (LZSC)** block, which explicitly decomposes sou...
|
03-31 18:26 | Success | - | |
|
exp_pytrain.20260331180541.074_20260331_180614
|
Generic Component Registry & CLI Benchmark
This benchmark evaluates the implementation of a type-safe, generic component registry within a strict packaging structure, mimicking the architecture of frameworks like LitGPT. Objectives: 1. **Packaging Structure:** Correctly define module...
|
03-31 18:07 | Success | - | |
|
exp_2411.00393v4_20260331_175509
|
Backfill Candidate 2411.00393v4
**Architecture:** Replaces scalar regression or one-hot classification layers with **population-coded layers**. In this scheme, a continuous variable is represented by a distributed activation pattern across a neuron ensemble, mimicking bio...
|
03-31 17:56 | Success | - | |
|
exp_cr_10.1609_aaai.v40i42.40891_20260331_174359
|
Benchmark: ToT (Test of Time) Framework for Multimodal LLMs
**Architecture:** ToT is a model-agnostic, inference-time framework for Multimodal LLMs. It operates as a non-invasive "black-box" wrapper, detecting backdoors by analyzing semantic consistency and confidence drift in response to controlled...
|
03-31 17:45 | Success | - | |
|
exp_pytrain.20260331172420.073_20260331_172442
|
Python Skill Fallback
Title: Strictly-Typed Plugin Registry with Dependency Verification - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-31 17:25 | Success | - | |
|
exp_2507.07136v2_20260331_171339
|
Benchmark: LangSplatV2 High-Dimensional Language Splatting
Fallback synthesis: LangSplatV2: High-dimensional 3D Language Gaussian Splatting with 450+ FPS. Potential 8GB relevance via sparse, inference, rag.
|
03-31 17:14 | Success | - | |
|
exp_pytrain.20260331165133.072_20260331_165155
|
Python Skill Fallback
Title: Strictly Typed Plugin Discovery and Registry - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-31 16:52 | Success | - | |
|
exp_2506.24041v1_20260331_163915
|
Backfill Candidate 2506.24041v1
Fallback synthesis: Unsupervised Sparse Coding-based Spiking Neural Network for Real-time Spike Sorting. Potential 8GB relevance via sparse, inference, rag.
|
03-31 16:40 | Success | - | |
|
exp_pytrain.20260331161811.071_20260331_161843
|
Python Skill Fallback
Title: Dynamic Type-Checked Plugin Loader - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-31 16:19 | Success | - | |
|
exp_cr_10.61091_jcmcc127a-423_20260331_160544
|
Polynomial Matrix Sparse Coding (PMSC) Benchmark
**Summary for ARES 8GB Roadmap** **Architecture:** The paper proposes a Polynomial Matrix Sparse Coding (PMSC) framework. This is a mathematical approach to signal feature extraction (specifically for non-electrical signals in HVDC valves),...
|
03-31 16:06 | Success | - | |
|
exp_pytrain.20260331153946.070_20260331_154018
|
Generic Repository with Encapsulated API
This benchmark evaluates your ability to construct a robust, type-safe data access layer using Python's advanced typing features (Generics, Protocols) and packaging standards (`__all__`). Objective: Implement a Generic Repository pattern wit...
|
03-31 15:41 | Success | - | |
|
exp_hf_2603.24793_20260331_152843
|
AVControl Benchmark: Modular LoRA Injection for LTX-2
**Architecture:** AVControl is a modular framework built on the LTX-2 DiT architecture. It employs a "parallel canvas" mechanism, injecting control modalities (e.g., depth, pose, audio) as additional tokens within attention layers. Each cont...
|
03-31 15:30 | Success | - | |
|
exp_pytrain.20260331150644.069_20260331_150719
|
Dynamic Typed Package Construction and Verification
Overview: This benchmark evaluates an autonomous coding system's ability to programmatically generate a valid Python package structure, enforce strict type annotations, manage module visibility, and perform runtime introspection using the st...
|
03-31 15:08 | Success | - | |
|
exp_2412.08516v2_20260331_145530
|
Hybrid Offline Feature Selection for Recommender Systems
**Architecture:** Hybrid offline feature selection pipeline. LLMs provide semantic reasoning to rank feature importance, followed by a lightweight surrogate model that refines these rankings for task-specific optimization. **Memory Footprin...
|
03-31 14:56 | Success | - | |
|
exp_oa_W4417147545_20260331_144443
|
Benchmark: Edge Deployment Optimization for MLLMs
**Summary for ARES 8GB Roadmap** This survey provides a systematic review of optimization strategies for Multimodal Large Language Models (MLLMs), specifically targeting edge deployment constraints relevant to 8GB VRAM limitations. * **Arch...
|
03-31 14:45 | Success | - | |
|
exp_pytrain.20260331142553.068_20260331_142629
|
Typed PyProject Manifest Validator
This benchmark tests the hypothesis that utilizing PEP 484 Type Hints and TypedDicts to model packaging configuration data reduces runtime errors and improves the maintainability of configuration parsers. Objective: To create a robust valida...
|
03-31 14:27 | Success | - | |
|
exp_2603.25720v1_20260331_142303
|
R-C2 Benchmark: Cycle-Consistency Latency Overhead
**Architecture:** R-C2 is a Reinforcement Learning (RL) framework designed for Vision-Language Models (VLMs). It enforces a "cycle-consistency" constraint, utilizing backward inference (Answer $\to$ Reconstruction) and modality switching to...
|
03-31 14:24 | Success | - | |
|
exp_oa_W4413304852_20260331_141203
|
Backfill Candidate oa_W4413304852
**Paper:** Large language models for PHM: a review of optimization techniques and applications **Type:** Review This paper surveys LLM deployment strategies for Prognostics and Health Management (PHM) on resource-constrained industrial hard...
|
03-31 14:13 | Success | - | |
|
exp_pytrain.20260331135330.067_20260331_135405
|
Python Skill Fallback
Title: Dynamic Component Loader with Runtime Type Verification - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-31 13:55 | Success | - | |
|
exp_pytrain.20260331135202.066_20260331_135231
|
pytrain.20260331135202.066
No summary available yet.
|
03-31 13:52 | Pending | - | |
|
exp_pytrain.20260331134857.065_20260331_134932
|
pytrain.20260331134857.065
No summary available yet.
|
03-31 13:49 | Pending | - | |
|
exp_pytrain.20260331134714.064_20260331_134807
|
pytrain.20260331134714.064
No summary available yet.
|
03-31 13:48 | Pending | - | |
|
exp_pytrain.20260331134412.063_20260331_134502
|
pytrain.20260331134412.063
No summary available yet.
|
03-31 13:45 | Pending | - | |
|
exp_pytrain.20260331134229.062_20260331_134259
|
pytrain.20260331134229.062
No summary available yet.
|
03-31 13:42 | Pending | - | |
|
exp_pytrain.20260331133919.061_20260331_134008
|
pytrain.20260331133919.061
No summary available yet.
|
03-31 13:40 | Pending | - | |
|
exp_pytrain.20260331133740.060_20260331_133818
|
pytrain.20260331133740.060
No summary available yet.
|
03-31 13:38 | Pending | - | |
|
exp_pytrain.20260331133401.059_20260331_133508
|
pytrain.20260331133401.059
No summary available yet.
|
03-31 13:35 | Pending | - | |
|
exp_pytrain.20260331133157.058_20260331_133251
|
pytrain.20260331133157.058
No summary available yet.
|
03-31 13:32 | Pending | - | |
|
exp_pytrain.20260331132843.057_20260331_132918
|
pytrain.20260331132843.057
No summary available yet.
|
03-31 13:29 | Pending | - | |
|
exp_pytrain.20260331132714.056_20260331_132746
|
pytrain.20260331132714.056
No summary available yet.
|
03-31 13:27 | Pending | - | |
|
exp_pytrain.20260331132557.055_20260331_132627
|
pytrain.20260331132557.055
No summary available yet.
|
03-31 13:26 | Pending | - | |
|
exp_pytrain.20260331132321.054_20260331_132356
|
pytrain.20260331132321.054
No summary available yet.
|
03-31 13:23 | Pending | - | |
|
exp_pytrain.20260331132203.053_20260331_132233
|
pytrain.20260331132203.053
No summary available yet.
|
03-31 13:22 | Pending | - | |
|
exp_pytrain.20260331131910.052_20260331_131951
|
pytrain.20260331131910.052
No summary available yet.
|
03-31 13:19 | Pending | - | |
|
exp_pytrain.20260331131751.051_20260331_131810
|
pytrain.20260331131751.051
No summary available yet.
|
03-31 13:18 | Pending | - | |
|
exp_pytrain.20260331131459.050_20260331_131536
|
pytrain.20260331131459.050
No summary available yet.
|
03-31 13:15 | Pending | - | |
|
exp_pytrain.20260331131325.049_20260331_131407
|
pytrain.20260331131325.049
No summary available yet.
|
03-31 13:14 | Pending | - | |
|
exp_pytrain.20260331131045.048_20260331_131109
|
pytrain.20260331131045.048
No summary available yet.
|
03-31 13:11 | Pending | - | |
|
exp_pytrain.20260331130909.047_20260331_130946
|
pytrain.20260331130909.047
No summary available yet.
|
03-31 13:09 | Pending | - | |
|
exp_pytrain.20260331130611.046_20260331_130640
|
pytrain.20260331130611.046
No summary available yet.
|
03-31 13:06 | Pending | - | |
|
exp_pytrain.20260331130428.045_20260331_130517
|
pytrain.20260331130428.045
No summary available yet.
|
03-31 13:05 | Pending | - | |
|
exp_pytrain.20260331130145.044_20260331_130220
|
pytrain.20260331130145.044
No summary available yet.
|
03-31 13:02 | Pending | - | |
|
exp_pytrain.20260331130013.043_20260331_130054
|
pytrain.20260331130013.043
No summary available yet.
|
03-31 13:00 | Pending | - | |
|
exp_pytrain.20260331125903.042_20260331_125929
|
pytrain.20260331125903.042
No summary available yet.
|
03-31 12:59 | Pending | - | |
|
exp_pytrain.20260331125629.041_20260331_125702
|
pytrain.20260331125629.041
No summary available yet.
|
03-31 12:57 | Pending | - | |
|
exp_pytrain.20260331125444.040_20260331_125515
|
pytrain.20260331125444.040
No summary available yet.
|
03-31 12:55 | Pending | - | |
|
exp_pytrain.20260331125204.039_20260331_125238
|
pytrain.20260331125204.039
No summary available yet.
|
03-31 12:52 | Pending | - | |
|
exp_2411.02985v1_20260331_125051
|
2411.02985v1
**Architecture:** Hybrid sparse coding model utilizing a concatenated dictionary (Zernike polynomials + complex modes) and a trainable affine transform layer. Inference relies on $L_1$-regularized optimization (sparse recovery) rather than...
|
03-31 12:50 | Pending | - | |
|
exp_cr_10.1016_j.aiig.2024.100104_20260331_124955
|
cr_10.1016_j.aiig.2024.100104
**Architecture:** Proposes a **feed-forward Convolutional Sparse Coding (CSC)** network designed to replace iterative optimization algorithms. The structure typically utilizes cascaded convolutional layers coupled with non-linear shrinkage...
|
03-31 12:49 | Pending | - | |
|
exp_2603.26465v1_20260331_124906
|
2603.26465v1
**Architecture:** A hybrid model enhancing standard Transformers with Boltzmann Machine constraints. It integrates structured binary gating variables into multi-head attention to model higher-order dependencies, utilizing mean-field variati...
|
03-31 12:49 | Pending | - | |
|
exp_2411.01399v1_20260331_124712
|
2411.01399v1
**Architecture:** MambaReg introduces a hybrid architecture combining Convolutional Neural Networks (CNNs) with Mamba (State Space Models). It extracts local features via convolutions and processes global context via Mamba blocks to handle...
|
03-31 12:47 | Pending | - | |
|
exp_2603.25722v1_20260331_124604
|
2603.25722v1
**Architecture:** Modifies standard dual-encoder (Contrastive V&L) frameworks. Replaces final global pooling with **parameter-free cross-modal attention-pooling** to align concept-centric text segments with visual features. **Memory Footpri...
|
03-31 12:46 | Pending | - | |
|
exp_2410.18794v2_20260331_124501
|
2410.18794v2
**Architecture:** Hybrid model integrating a lightweight "predictor network" (CNN) with a hard-thresholded Convolutional Locally Competitive Algorithm (LCA) solver. The predictor performs "state warm-up," generating a high-quality initial g...
|
03-31 12:45 | Pending | - | |
|
exp_hf_2603.13904_20260331_124253
|
hf_2603.13904
**Paper:** CroBo (Visual States Need What-is-Where Composition) **Architecture:** CroBo is a self-supervised encoder-decoder framework designed to compress visual observations into a **single, compact bottleneck token** capturing "what-is-w...
|
03-31 12:42 | Pending | - | |
|
exp_cr_10.3390_pr13071977_20260331_124202
|
cr_10.3390_pr13071977
**Architecture:** TransQwen is a specialized fine-tune of **Qwen-7B-Chat** utilizing **DoRA** (Weight-Decomposed Low-Rank Adaptation) for parameter-efficient updates and RoPE for positional encoding. This is a **weight-based learning approa...
|
03-31 12:42 | Pending | - | |
|
exp_2412.00503v3_20260331_124105
|
2412.00503v3
**Architecture:** The paper proposes integrating biological homeostasis mechanisms—RFB-kWTA (Random Feedback k-Winners-Take-All) and "Smart" Inhibition—into standard Transformer attention and output layers. These modules use running statist...
|
03-31 12:41 | Pending | - | |
|
exp_cr_10.3390_info16050343_20260331_123838
|
cr_10.3390_info16050343
**Architecture:** Introduces **CPSE** (encoding) and **CPSD** (decoding), a framework utilizing Sparse Distributed Representations (SDRs) and triadic memory. It extends Context-Dependent Thinning (CDT) to manage nested compositional structures a...
|
03-31 12:38 | Pending | - | |
|
exp_cr_10.3390_s25010064_20260331_123735
|
cr_10.3390_s25010064
**Architecture:** Built on **Llama-3-8B-Instruct**, optimized via **LoRA** to integrate multi-attribute inputs (historical interactions, driver emotion, vehicle/physics state). It functions as an encoder-decoder for intent prediction, treati...
|
03-31 12:37 | Pending | - | |
|
exp_2410.16443v4_20260331_123638
|
2410.16443v4
**Architecture:** CRATE (Coding RAte TransformEr) is a "white-box" Transformer variant that explicitly integrates sparse coding mechanisms—specifically coding rate minimization—directly into the network layers to capture low-dimensional dat...
|
03-31 12:36 | Pending | - | |
|
exp_core_299002838_20260331_123415
|
core_299002838
This review surveys Transformer-based LLMs and multi-modal architectures for Prognostics and Health Management (PHM), specifically targeting deployment on resource-constrained industrial hardware. * **Architecture:** Focuses on adapting gen...
|
03-31 12:34 | Pending | - | |
|
exp_2410.00340v3_20260331_123324
|
2410.00340v3
**Assessment: Low Relevance for Inference Optimization** **Architecture:** No new model architecture proposed. The paper introduces a diagnostic tool using Singular Value Decomposition (SVD) on GPT-2 Small’s attention weight matrices to iso...
|
03-31 12:33 | Pending | - | |
|
exp_2411.01399v1_20260331_123226
|
2411.01399v1
**Architecture:** MambaReg introduces a hybrid architecture combining Convolutional Neural Networks (CNNs) with Mamba (State Space Models). It extracts local features via convolutions and processes global context via Mamba blocks to handle...
|
03-31 12:32 | Pending | - | |
|
exp_hf_2603.24793_20260331_123006
|
hf_2603.24793
**Architecture:** AVControl is a modular framework built on the LTX-2 DiT architecture. It employs a "parallel canvas" mechanism, injecting control modalities (e.g., depth, pose, audio) as additional tokens within attention layers. Each cont...
|
03-31 12:30 | Pending | - | |
|
exp_2506.24041v1_20260331_122908
|
2506.24041v1
Fallback synthesis: Unsupervised Sparse Coding-based Spiking Neural Network for Real-time Spike Sorting. Potential 8GB relevance via sparse, inference, rag.
|
03-31 12:29 | Pending | - | |
|
exp_2508.16915v3_20260331_122807
|
2508.16915v3
Fallback synthesis: Reinforcement-Guided Hyper-Heuristic Hyperparameter Optimization for Fair and Explainable Spiking Neural Network-Based Financial Fraud Detection. Potential 8GB relevance via sparse, rag.
|
03-31 12:28 | Pending | - | |
|
exp_2410.00340v3_20260331_122548
|
2410.00340v3
**Assessment: Low Relevance for Inference Optimization** **Architecture:** No new model architecture proposed. The paper introduces a diagnostic tool using Singular Value Decomposition (SVD) on GPT-2 Small’s attention weight matrices to iso...
|
03-31 12:25 | Pending | - | |
|
exp_pytrain.20260331121908.038_20260331_121941
|
Python Skill Fallback
Title: Robust Generic Plugin Registry with Metadata Simulation - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-31 12:20 | Success | - | |
|
exp_2401.00243v1_20260331_120659
|
2401.00243v1
**Architecture:** UP-RLHF introduces a training-time architecture utilizing an ensemble of diverse Low-Rank Adaptations (LoRAs) for the Reward Model (RM). Diversity is enforced by maximizing the nuclear norm of concatenated LoRA matrices. T...
|
03-31 12:06 | Pending | - | |
|
exp_oa_W7139145681_20260331_120430
|
CARE: Covariance-Aware and Rank-Enhanced Decomposition Benchmark
**Architecture:** CARE converts Grouped-Query Attention (GQA) to Multi-Head Latent Attention (MLA). It replaces standard low-rank SVD baselines with **activation-preserving factorization** and **adjusted-rank allocation**, distributing rank...
|
03-31 12:05 | Success | - | |
|
exp_gh_HyperKuvid-Labs_SpecQuant_20260331_120105
|
HyperKuvid-Labs/SpecQuant
**Architecture:** Proposes an adaptive speculative decoding pipeline. A lightweight classifier routes inputs based on complexity to select specific quantized draft models. These drafts generate tokens verified by a larger FP16 target model....
|
03-31 12:02 | Success | - | |
|
exp_pytrain.20260331113610.037_20260331_113637
|
Python Skill Fallback
Title: Protocol-Based Dynamic Module Loader - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-31 11:37 | Success | - | |
|
exp_cr_10.7717_peerj-cs.3388_20260331_112615
|
cr_10.7717_peerj-cs.3388
Fallback synthesis: Towards optimal sparse CNNs: sparsity-friendly knowledge distillation through feature decoupling. Potential 8GB relevance via sparse.
|
03-31 11:26 | Pending | - | |
|
exp_2509.10033v1_20260331_112420
|
Sparse Coding Representation of 2-way Data (AODL)
Fallback synthesis: Sparse Coding Representation of 2-way Data. Potential 8GB relevance via linear, sparse.
|
03-31 11:25 | Success | - | |
|
exp_2410.08003v6_20260331_112118
|
More Experts Than Galaxies: Conditionally-overlapping Experts With Biologically-Inspired Fixed Routing
**Architecture:** COMET replaces trainable gating networks with fixed, biologically-inspired random projections. It utilizes a modular, sparse architecture where experts overlap conditionally based on input similarity, rather than remaining...
|
03-31 11:22 | Success | - | |
|
exp_pytrain.20260331105438.036_20260331_105507
|
PEP 561 Compliant Package Scaffolder
An autonomous coding system can effectively combine the 'packaging' module structure (PEP 561) with advanced 'typing' constructs (TypedDict, Protocol) to create a robust, metadata-aware build tool without relying on external dependencies li...
|
03-31 10:56 | Success | - | |
|
exp_cr_10.1609_aaai.v38i12.29237_20260331_105149
|
OWQ Benchmark: Outlier-Aware Mixed-Precision Quantization
**Architecture:** OWQ utilizes a sensitivity-aware, mixed-precision strategy. It isolates a small subset of structured "outlier" weights—typically sensitive to quantization—and retains them in high-precision (FP16). The remaining dense weig...
|
03-31 10:52 | Success | - | |
|
exp_2603.25722v1_20260331_105047
|
2603.25722v1
**Architecture:** Modifies standard dual-encoder (Contrastive V&L) frameworks. Replaces final global pooling with **parameter-free cross-modal attention-pooling** to align concept-centric text segments with visual features. **Memory Footpri...
|
03-31 10:50 | Pending | - | |
|
exp_cr_10.3390_technologies13120587_20260331_104746
|
CALM: Continual Associative Learning Model via Sparse Distributed Memory
Fallback synthesis: CALM: Continual Associative Learning Model via Sparse Distributed Memory. Potential 8GB relevance via sparse, memory, inference, rag.
|
03-31 10:48 | Success | - | |
|
exp_pytrain.20260331102158.035_20260331_102228
|
Generic Extension Loader with Runtime Type Verification
This benchmark tests a plugin architecture hypothesis: that explicit generic constraints (PEP 484/695) combined with dynamic module loading (importlib) create a more robust system by catching type mismatches at registration time rather than...
|
03-31 10:23 | Success | - | |
|
exp_2411.02985v1_20260331_100820
|
2411.02985v1
**Architecture:** Hybrid sparse coding model utilizing a concatenated dictionary (Zernike polynomials + complex modes) and a trainable affine transform layer. Inference relies on $L_1$-regularized optimization (sparse recovery) rather than...
|
03-31 10:08 | Pending | - | |
|
exp_2603.26465v1_20260331_100654
|
2603.26465v1
**Architecture:** A hybrid model enhancing standard Transformers with Boltzmann Machine constraints. It integrates structured binary gating variables into multi-head attention to model higher-order dependencies, utilizing mean-field variati...
|
03-31 10:06 | Pending | - | |
|
exp_hf_2603.13904_20260331_100414
|
hf_2603.13904
**Paper:** CroBo (Visual States Need What-is-Where Composition) **Architecture:** CroBo is a self-supervised encoder-decoder framework designed to compress visual observations into a **single, compact bottleneck token** capturing "what-is-w...
|
03-31 10:04 | Pending | - | |
|
exp_cr_10.13052_dgaej2156-3306.40565_20260331_100317
|
cr_10.13052_dgaej2156-3306.40565
Fallback synthesis: Energy Efficient Optimization of Current Transformer Error Compensation in Smart Grids Using Sparse Coding and Blockchain-Secured IoT Framework. Potential 8GB relevance via linear, sparse.
|
03-31 10:03 | Pending | - | |
|
exp_2507.07136v2_20260331_100214
|
2507.07136v2
Fallback synthesis: LangSplatV2: High-dimensional 3D Language Gaussian Splatting with 450+ FPS. Potential 8GB relevance via sparse, inference, rag.
|
03-31 10:02 | Pending | - | |
|
exp_cr_10.3390_info16050343_20260331_095938
|
cr_10.3390_info16050343
**Architecture:** Introduces **CPSE** (encoding) and **CPSD** (decoding), a framework utilizing Sparse Distributed Representations (SDRs) and triadic memory. It extends Context-Dependent Thinning (CDT) to manage nested compositional structures a...
|
03-31 09:59 | Pending | - | |
|
exp_pytrain.20260331094205.034_20260331_094234
|
Python Skill Fallback
Title: Type-Safe Package Resource Loader - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-31 09:43 | Success | - | |
|
exp_pytrain.20260331090809.033_20260331_090847
|
Type-Safe Dependency Injection Container
**Overview:** This benchmark tests the ability to implement a robust, structural-subtyping based Dependency Injection (DI) container using only Python's standard library. **Objective:** Implement a `Container` class that leverages `typing.Protocol` t...
|
03-31 09:09 | Success | - | |
|
exp_2411.13117v2_20260331_085323
|
Benchmark: Amortisation Gap in Sparse Autoencoders
**Architecture:** Proposes decoupling the SAE pipeline. Replaces the standard single-pass linear encoder with iterative sparse inference algorithms (e.g., optimization-based solvers like ISTA) to recover accurate latent codes, while retaini...
|
03-31 08:54 | Success | - | |
|
exp_pytrain.20260331082649.032_20260331_082714
|
Strict Package Introspection & Typed Configuration Validator Benchmark
This benchmark evaluates the robustness of a Python coding system in implementing strict type safety, Generic programming, and runtime environment introspection using only the Python Standard Library. **Objective:** Create a dependency managemen...
|
03-31 08:28 | Success | - | |
|
exp_2603.26323v1_20260331_081234
|
This benchmark tests the **Computational Primitives of Spatial Reasoning** in Large Language Models, inspired by recent...
**Assessment for ARES 8GB Roadmap** This paper investigates the internal spatial reasoning capabilities of standard multilingual Transformer architectures using linear probing and sparse autoencoders. It decomposes reasoning into three prim...
|
03-31 08:13 | Success | - | |
|
exp_pytrain.20260331074457.031_20260331_074525
|
Dynamic Namespace Package Loader with Structural Type Validation
This benchmark tests the ability of a Python script to dynamically generate a distributable package structure (zip archive), load it at runtime, and enforce strict structural typing (using `typing.Protocol`) to validate modules without requ...
|
03-31 07:46 | Success | - | |
|
exp_oa_W4393064007_20260331_073141
|
MELT Benchmark Suite: Local Simulation
**Paper:** *MELTing point: Mobile Evaluation of Language Transformers* **Type:** Infrastructure/Benchmarking Study (Not RAG/Retrieval). **Architecture:** Introduces **MELT**, a headless benchmarking framework for evaluating instruction-tune...
|
03-31 07:32 | Success | - | |
|
exp_pytrain.20260331070541.030_20260331_070609
|
Strictly-Typed Namespace Dispatcher Drill
This benchmark validates your ability to implement a strictly-typed command pattern using Python's `typing.Protocol`. The objective is to create a robust, type-safe plugin dispatcher system without external dependencies. **Instructions:** 1. Ens...
|
03-31 07:07 | Success | - | |
|
exp_2411.02199v5_20260331_065052
|
Provably Transformers Harness Multi-Concept Word Semantics for Efficient In-Context Learning
**Paper:** Provably Transformers Harness Multi-Concept Word Semantics for Efficient In-Context Learning **Classification:** Theoretical Analysis (Non-Engineering) **Roadmap Relevance:** Low / None. This paper provides a mathematical proof r...
|
03-31 06:51 | Success | - | |
|
exp_pytrain.20260331063050.029_20260331_063117
|
Runtime Type-Checked Dynamic Plugin Loader
This benchmark evaluates the ability to construct a robust, type-safe plugin loading mechanism using Python's standard library. The task is to dynamically load Python modules from a filesystem path and strictly validate their interface agai...
|
03-31 06:32 | Success | - | |
|
exp_pytrain.20260331055833.028_20260331_055902
|
Strictly Typed Processor & Modern Packaging Generation Benchmark
**Overview:** This benchmark evaluates the ability of a Python script to dynamically construct a valid, modern Python project structure compliant with PEP 621 (using `pyproject.toml`) and generate strictly typed source code utilizing Generics an...
|
03-31 06:00 | Success | - | |
|
exp_pytrain.20260331052437.027_20260331_052537
|
Typed Neural Architecture Registry with Dynamic Plugin Loading
**Overview:** This coding drill benchmarks your ability to design a strictly typed, modular Python framework that mimics the architecture of modern deep learning libraries (like PyTorch or LitGPT). **The Challenge:** You are tasked with implementing...
|
03-31 05:26 | Success | - | |
|
exp_2603.26365v1_20260331_050240
|
SCORE: Dynamic Token Compression Benchmark
**Architecture:** SCORE utilizes a lightweight policy network conditioned on inter-frame residuals ("surprise") to dynamically prune redundant visual tokens. Unlike static merging, it employs Group-wise Reinforcement Learning (RL) to learn...
|
03-31 05:03 | Success | - | |
|
exp_pytrain.20260331043016.026_20260331_043054
|
Lazy-Loading Submodule Proxy with Type Safety
**Design Brief:** This benchmark implements a lazy-loading mechanism designed to minimize the startup overhead of Python applications that depend on heavy libraries (e.g., `torch`, `numpy`, `tensorflow`). This pattern is commonly found in high-p...
|
03-31 04:31 | Success | - | |
|
exp_pytrain.20260331035202.025_20260331_035256
|
Robust Plugin Loader with Runtime Type Verification
**Overview:** This coding drill tests the ability to construct a zero-dependency plugin management system using Python's standard library. The system simulates a package environment where code modules are discovered, loaded, and validated agains...
|
03-31 03:53 | Success | - | |
|
exp_pytrain.20260331031049.024_20260331_031204
|
The Modular Typed CLI Benchmark
This benchmark verifies the architectural robustness of a Python module designed according to strict typing and separation of concerns principles. **Objective:** The benchmark validates a generated module (`data_processor.py`) against three spec...
|
03-31 03:13 | Success | - | |
|
exp_pytrain.20260331023453.023_20260331_023542
|
Dynamic Type-Safe Plugin Loader
This benchmark evaluates the implementation of a robust, loosely-coupled plugin system using Python's standard library. It demonstrates runtime component discovery and validation by defining a strict `typing.Protocol`, dynamically generatin...
|
03-31 02:36 | Success | - | |
|
exp_pytrain.20260331015839.022_20260331_015931
|
Generic Datastore Benchmark (PEP 695)
**Overview:** This benchmark evaluates the implementation of a generic datastore using Python 3.12's Type Parameter Syntax (PEP 695). It verifies type safety, packaging hygiene (`__all__`, `__version__`), and CLI integration using only the Pytho...
|
03-31 02:00 | Success | - | |
|
exp_pytrain.20260331012202.021_20260331_012236
|
Strictly Typed Plugin Registry Benchmark
**Objective:** This benchmark evaluates the ability to write robust, production-grade Python code using advanced standard library features. It tests adherence to strict type checking (`mypy --strict`), packaging hygiene (`__all__`, `__version__`...
|
03-31 01:23 | Success | - | |
|
exp_2603.26434v1_20260331_010325
|
Automating Clinical Information Retrieval from Finnish Electronic Health Records Using Large Language Models
**Paper:** Automating Clinical Information Retrieval from Finnish EHRs **Architecture:** Clinical Contextual Question Answering (CCQA) framework utilizing open-source LLMs (Llama-3.1-70B, Qwen3-30B) for offline inference on Finnish clinical...
|
03-31 01:04 | Success | - | |
|
exp_pytrain.20260331003905.020_20260331_003926
|
Python Reliability Drill: Robust Typing & Telemetry
**Overview:** This benchmark evaluates your ability to write robust, type-safe Python code using standard library type hints (`typing` module) without external dependencies. The task is to implement a `TypeSafeContainer` that enforces strict typ...
|
03-31 00:40 | Success | - | |
|
exp_2512.19720v1_20260331_002859
|
Benchmark: Per-Axis 1-Bit Weight Deltas
**Architecture:** Proposes a **1-bit delta scheme** where fine-tuned weights are stored as the sign of the difference ($\pm 1$) from a base model, augmented with learned **per-axis (row/column) FP16 scaling factors** derived from a small ca...
|
03-31 00:30 | Success | - | |
|
exp_pytrain.20260331000621.019_20260331_000656
|
Typed Configuration Dispatch System
This benchmark simulates a core component of a machine learning inference framework (similar in design philosophy to Hugging Face `transformers` or `diffusers`). It utilizes Python's static typing features (`Protocol`, `TypedDict`) to decou...
|
03-31 00:07 | Success | - | |
|
exp_2312.17493v2_20260330_235343
|
Benchmark for DP-LoRA
**Architecture:** DP-LoRA integrates Federated Learning (FL) with Low-Rank Adaptation. Clients train lightweight LoRA adapters locally, while a Gaussian mechanism injects noise into weight updates to ensure Differential Privacy (DP), preven...
|
03-30 23:54 | Success | - | |
|
exp_pytrain.20260330232806.018_20260330_232832
|
Generic Plugin Registry with Runtime Type Validation
**Hypothesis**: Utilizing `typing.Protocol` combined with Generics provides a strict contract for interoperability within a package ecosystem, enabling `importlib`/`inspect`-based loaders to validate plugin compatibility at runtime. This en...
|
03-30 23:29 | Success | - | |
|
exp_2603.26595v1_20260330_231342
|
PQuantML: A Tool for End-to-End Hardware-aware Model Compression
**PQuantML: End-to-End Hardware-Aware Compression** * **Architecture:** PQuantML is an open-source library providing a unified interface for model compression. It supports structured and unstructured pruning alongside fixed-point quantizati...
|
03-30 23:14 | Success | - | |
|
exp_pytrain.20260330224418.017_20260330_224455
|
Python Skill Fallback
Title: Robust Package Metadata Validator and Entry Point Simulator - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-30 22:45 | Success | - | |
|
exp_oa_W4413681814_20260330_222838
|
Dynamic Precision Quantization for Iterative Generative Models
**Summary:** This survey reviews quantization strategies to mitigate the high computational and memory costs of diffusion models. * **Architecture:** Focuses on the sensitivity of hierarchical, iterative denoising architectures where quanti...
|
03-30 22:29 | Success | - | |
|
exp_pytrain.20260330215859.016_20260330_215926
|
Dynamic Type-Verified Package Generator
This coding drill benchmarks an autonomous system's ability to dynamically scaffold a Python package structure, enforce strict typing via the `typing` module, and validate the module's interface using `importlib` introspection without relyi...
|
03-30 22:00 | Success | - | |
|
exp_oa_W4413364992_20260330_214542
|
Benchmarking Unified Quantization in Generative AI
**Architecture:** This paper is a technical survey of quantization strategies applicable to large-scale autoregressive transformers and diffusion models. It focuses on unified, differentiable quantization frameworks designed to handle the n...
|
03-30 21:46 | Success | - | |
|
exp_pytrain.20260330211700.015_20260330_211737
|
Typed Plugin Registry Benchmark
This coding drill verifies the implementation of a robust, type-safe Plugin Registry using Python 3.12+ features (PEP 695). **Features:** - **Modern Syntax**: Uses `type` alias statements and generic class parameter syntax (e.g., `class Registry...
|
03-30 21:18 | Success | - | |
|
exp_cr_10.1609_aaai.v40i32.39899_20260330_210141
|
RCMoE Benchmark
**Architecture:** RCMoE targets Mixture-of-Experts (MoE) models to reduce the "All-to-All" communication bottleneck. It utilizes **Local-Stochastic Quantization** to compress intermediate expert outputs row-by-row and **Probabilistic Thresh...
|
03-30 21:03 | Success | - | |
|
exp_pytrain.20260330203515.014_20260330_203545
|
Python Skill Fallback
Title: Dynamic Model Registry with Structural Typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-30 20:36 | Success | - | |
|
exp_oa_W7128864297_20260330_202316
|
MiniCPM-SALA Attention Mechanism Benchmark
**Architecture:** 9B parameter model hybridizing Sparse (InfLLM-V2) and Linear (Lightning) attention in a 1:3 ratio using Hybrid Positional Encoding (HyPE) to balance local fidelity with global efficiency. **Memory Footprint:** Linear atten...
|
03-30 20:24 | Success | - | |
|
exp_pytrain.20260330195549.013_20260330_195628
|
Strictly-Typed Modular Plugin Registry
This benchmark implements a zero-dependency plugin architecture using Python's `typing.Protocol` and `typing.runtime_checkable`. It demonstrates the creation of a `SystemRegistry` capable of runtime type validation and automatic discovery o...
|
03-30 19:57 | Success | - | |
|
exp_2603.26603v1_20260330_194137
|
Benchmark: On-Device LLM Efficiency & Quantization Paradox
**Summary for ARES 8GB Roadmap** This paper provides an empirical analysis of on-device LLMs (0.5B–9B) regarding energy, latency, and quality, utilizing a Samsung Galaxy S25 Ultra. * **Architecture:** The study identifies **Mixture-of-Exper...
|
03-30 19:42 | Success | - | |
|
exp_pytrain.20260330191615.012_20260330_191640
|
Type-Safe Dynamic Plugin Loader Benchmark
This benchmark tests the system's ability to construct a robust, type-safe plugin architecture using only Python's standard library. It specifically targets advanced features such as structural subtyping (using `typing.Protocol`), dynamic m...
|
03-30 19:17 | Success | - | |
|
exp_oa_W4400337965_20260330_190205
|
KV Cache Compression, But What Must We Give in Return? A Comprehensive Benchmark of Long Context Capable Approaches
This benchmark evaluates 10+ techniques to mitigate KV cache memory growth, the primary bottleneck for long-context inference on 8GB VRAM hardware. * **Architecture:** Provides a taxonomy of efficiency-focused approaches, including KV quant...
|
03-30 19:03 | Success | - | |
|
exp_pytrain.20260330183329.011_20260330_183410
|
Coding Drill: Protocol-Based Namespace Loader
**Objective:** Design a robust, single-file Python script that implements a dynamic plugin loader. This system leverages Python's structural subtyping (Protocols) to enforce interface compliance without explicit inheritance. The script must simu...
|
03-30 18:35 | Success | - | |
|
exp_2401.00503v1_20260330_181942
|
Backfill Candidate 2401.00503v1
**Architecture:** Viz proposes a marketplace framework integrating QLoRA to decouple frozen base model weights from trainable adapters. This architecture facilitates a copyright-compliant ecosystem where content licensing is managed explici...
|
03-30 18:20 | Success | - | |
|
exp_pytrain.20260330175336.010_20260330_175411
|
Python Skill Fallback
Title: Strictly Typed Asynchronous Plugin Loader - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-30 17:55 | Success | - | |
|
exp_cr_10.55041_ijsrem43474_20260330_173844
|
Developing New AI Model Compression Techniques
This survey reviews foundational compression techniques—pruning, quantization, and knowledge distillation—aimed at enabling edge AI. * **Architecture:** Validates lightweight backbones (MobileNet, SqueezeNet) and structural sparsity as effe...
|
03-30 17:39 | Success | - | |
|
exp_pytrain.20260330171056.009_20260330_171136
|
Python Skill Fallback
Title: Dynamic ZipApp Packager with Runtime Type Verification - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-30 17:12 | Success | - | |
|
exp_core_159796903_20260330_165618
|
Benchmark: Transformer vs. Efficient SSM (Mamba-style) Architecture
**Summary for ARES 8GB Roadmap** * **Architecture:** Surveys compression techniques targeting standard Transformer Attention/FFN blocks. Contrasts these with inherently efficient architectures (Mamba, RetNet, RWKV) designed to replace atten...
|
03-30 16:57 | Success | - | |
|
exp_pytrain.20260330162958.008_20260330_163042
|
Dynamic Module Loader with Strict Generic Typing
**Overview:** This coding drill benchmarks a robust, runtime-verified plugin architecture built entirely with the Python Standard Library. It demonstrates the synergy between **PEP 695** (Type Parameter Syntax) and Python's native import machine...
|
03-30 16:31 | Success | - | |
|
exp_oa_W4391766345_20260330_161648
|
A Survey on Transformer Compression
**Architecture:** Reviews compression techniques for standard Transformers (Attention/FFN blocks) and efficient architectures like Mamba, RetNet, and RWKV that utilize linear-complexity mechanisms to bypass quadratic attention constraints....
|
03-30 16:17 | Success | - | |
|
exp_pytrain.20260330154854.007_20260330_154933
|
Python Skill Fallback
Title: Typed Plugin Discovery System - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-30 15:50 | Success | - | |
|
exp_hf_2603.18742_20260330_153604
|
6Bit-Diffusion Benchmark
**Architecture:** Proposes an inference-time mixed-precision quantization framework (NVFP4/INT8) and Temporal Delta Cache (TDC) for Video Diffusion Transformers (DiTs). A lightweight predictor dynamically allocates NVFP4 to temporally stabl...
|
03-30 15:37 | Success | - | |
|
exp_pytrain.20260330151254.006_20260330_151331
|
Strictly-Typed Component Registry and Serialization Benchmark
**Objective:** This benchmark tests the ability to construct a robust, plugin-based architecture reminiscent of Hugging Face `diffusers` or `vLLM` using only the Python standard library. **Core Concepts:** 1. **Protocol-Based Design**: Using `typing....
|
03-30 15:14 | Success | - | |
|
exp_2412.08890v1_20260330_150104
|
Lexico KV Cache Compression Benchmark
**Architecture:** Lexico replaces the standard KV cache with a **sparse coding** framework. It utilizes a small, input-agnostic dictionary of ~4k atoms to reconstruct attention vectors. The encoding process employs **Orthogonal Matching Purs...
|
03-30 15:02 | Success | - | |
|
exp_pytrain.20260330143502.005_20260330_143540
|
Python Skill Fallback
Title: Strict Protocol-Based Plugin Loader - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-30 14:36 | Success | - | |
|
exp_oa_W7133137559_20260330_142201
|
This benchmark validates the core architectural efficiency claims described in the paper regarding "Tokens as Computatio...
**Architecture:** Theoretical analysis of Transformer embeddings and the $O(n^2)$ complexity of attention mechanisms. Reviews optimization techniques including token pruning, sparse attention, and long-context extensions. **Memory Footprint...
|
03-30 14:23 | Success | - | |
|
exp_pytrain.20260330135546.004_20260330_135618
|
Python Reliability Drill: Typing & Packaging
This benchmark demonstrates the creation of a robust, type-safe data processing utility using only the Python Standard Library. It focuses on strict type checking enforcement at runtime to ensure reliability, utilizing advanced `typing` mod...
|
03-30 13:57 | Success | - | |
|
exp_oa_W7125352730_20260330_134113
|
LLMOrbit: The Efficiency Revolution Benchmark
**LLMOrbit** is a survey analyzing 50+ models to identify efficiency paradigms critical for the ARES 8GB roadmap. It highlights a shift from brute-force scaling to architectural optimization to overcome data scarcity and hardware costs. * *...
|
03-30 13:42 | Success | - | |
|
exp_pytrain.20260330131606.003_20260330_131652
|
Dynamic Type-Safe Plugin Loader Benchmark
This benchmark evaluates a Python architecture that enforces strict type safety on dynamically loaded modules. It tests the hypothesis that `typing.Protocol` combined with `importlib` provides a robust, zero-dependency mechanism for plugin...
|
03-30 13:17 | Success | - | |
|
exp_cr_10.1145_3725338_20260330_130141
|
PQCache Benchmark
**Architecture & Retrieval Strategy:** PQCache reframes KV cache management as an **embedding retrieval** task. It utilizes **Product Quantization (PQ)** to compress token keys into compact codes during the prefill phase. During decoding, i...
|
03-30 13:02 | Success | - | |
|
exp_pytrain.20260330123406.002_20260330_123446
|
Python Skill Fallback
Title: Generic Data Buffer with PEP 695 Type Parameters - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-30 12:35 | Success | - | |
|
exp_oa_W4405434119_20260330_121731
|
SCBench: Shared Context Benchmark Evaluation
**Paper:** SCBench: A KV Cache-Centric Analysis of Long-Context Methods **Relevance to ARES 8GB Roadmap:** This paper provides a critical benchmark for optimizing the **KV cache lifecycle**, specifically for **shared contexts** (e.g., syste...
|
03-30 12:20 | Success | - | |
|
exp_pytrain.20260330115255.001_20260330_115322
|
Generic Type-Safe Service Locator
This benchmark tests the ability to construct a robust, modular Dependency Injection (DI) container using Python's standard `typing` module. **Objective:** Implement a `ServiceLocator` that decouples interface definitions (Protocols) from concre...
|
03-30 11:54 | Success | - | |
|
exp_oa_W4405434119_20260330_114104
|
SCBench: KV Cache Shared-Context Evaluation
**Paper:** SCBench: A KV Cache-Centric Analysis of Long-Context Methods **Relevance to ARES 8GB Roadmap:** This paper provides a critical benchmark for optimizing the **KV cache lifecycle**, specifically for **shared contexts** (e.g., syste...
|
03-30 11:41 | Pending | - | |
|
exp_pytrain.20260330111810.001_20260330_111845
|
Type-Safe Dynamic Plugin Loader Benchmark
Overview This benchmark demonstrates the implementation of a robust, type-safe plugin system in Python using structural subtyping (`typing.Protocol`) and dynamic module loading (`importlib`). The Hypothesis Using `Protocol` combined with `r...
|
03-30 11:19 | Success | - | |
|
exp_oa_W4405434119_20260330_110632
|
SCBench: Lightweight KV Cache Benchmark
**Paper:** SCBench: A KV Cache-Centric Analysis of Long-Context Methods **Relevance to ARES 8GB Roadmap:** This paper provides a critical benchmark for optimizing the **KV cache lifecycle**, specifically for **shared contexts** (e.g., syste...
|
03-30 11:06 | Pending | - | |
|
exp_pytrain.20260330103658.001_20260330_103730
|
Dynamic Package Entry Point Validator
This benchmark tests the ability to design a robust, type-safe package installation simulator using Python's standard `typing` module. Objective Implement a `validate_and_install` function that enforces strict adherence to: 1. **Data Contra...
|
03-30 10:38 | Success | - | |
|
exp_oa_W4405434119_20260330_102231
|
SCBench: Lightweight KV Cache Evaluation
**Paper:** SCBench: A KV Cache-Centric Analysis of Long-Context Methods **Relevance to ARES 8GB Roadmap:** This paper provides a critical benchmark for optimizing the **KV cache lifecycle**, specifically for **shared contexts** (e.g., syste...
|
03-30 10:22 | Pending | - | |
|
exp_pytrain.20260330095143.001_20260330_095221
|
Python Skill Fallback
Title: Type-Safe Plugin Dispatcher with Protocols - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-30 09:53 | Success | - | |
|
exp_oa_W4405434119_20260330_093714
|
SCBench: A KV Cache-Centric Analysis of Long-Context Methods
**Paper:** SCBench: A KV Cache-Centric Analysis of Long-Context Methods **Relevance to ARES 8GB Roadmap:** This paper provides a critical benchmark for optimizing the **KV cache lifecycle**, specifically for **shared contexts** (e.g., syste...
|
03-30 09:37 | Pending | - | |
|
exp_pytrain.20260330091417.033_20260330_091450
|
Strict Protocol-Based Plugin System with Dynamic Packaging
This benchmark evaluates an engineering system's ability to construct a robust, modular architecture using advanced Python type hinting (`typing.Protocol`) and dynamic module loading (`importlib`). Objective The benchmark programmatically s...
|
03-30 09:15 | Success | - | |
|
exp_pytrain.20260330083653.032_20260330_083727
|
Dynamic Plugin Registry Benchmark
This benchmark evaluates the system's ability to construct a robust, framework-style plugin loader using only the Python standard library. Objective Implement a `ModelRegistry` that: 1. Defines a strict `ModelProtocol` using `typing.Protoco...
|
03-30 08:38 | Success | - | |
|
exp_pytrain.20260330075617.031_20260330_075656
|
Strictly-Typed Dependency Resolver Simulator
Overview This benchmark implements a robust package resolution engine using Python's strict type system. It demonstrates the usage of `typing.Protocol`, `typing.Generic`, `@total_ordering`, and `dataclasses` to enforce compile-time logic co...
|
03-30 07:57 | Success | - | |
|
exp_pytrain.20260330071452.030_20260330_071521
|
Strictly Typed Dynamic Component Loader
This coding drill verifies the hypothesis that combining `typing.Protocol` with `importlib` enables the creation of robust, modular systems. Overview The benchmark script (`benchmark.py`) simulates an extensible asynchronous application. It...
|
03-30 07:16 | Success | - | |
|
exp_pytrain.20260330063706.029_20260330_063756
|
Type-Safe Dynamic Service Locator
This coding drill evaluates your ability to implement a robust dependency injection mechanism using Python's standard library. The challenge involves constructing a generic `ServiceLocator` that dynamically loads modules via `importlib` and...
|
03-30 06:38 | Success | - | |
|
exp_pytrain.20260330055723.028_20260330_055757
|
Strictly Typed Dependency Resolver
This benchmark evaluates the implementation of a robust dependency resolution system using Python's advanced standard library typing features. The goal is to ensure type safety, structural subtyping (via Protocols), and runtime integrity du...
|
03-30 05:58 | Success | - | |
|
exp_pytrain.20260330051735.027_20260330_051805
|
Python Skill Fallback
Title: Strictly-Typed Dynamic Module Loader - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-30 05:19 | Success | - | |
|
exp_pytrain.20260330043314.026_20260330_043344
|
Typed Plugin Registry and Configuration Validator
This benchmark tests the ability to design a robust, type-safe plugin architecture similar to those found in vLLM or Diffusers. It enforces strict interface compliance using `typing.Protocol`, centralizes component management via a Registry...
|
03-30 04:34 | Success | - | |
|
exp_pytrain.20260330040024.025_20260330_040051
|
Typed Dynamic Package Loader
This benchmark evaluates the ability to construct a Python runtime environment programmatically. The candidate script must define a strict type contract using the `typing` module, materialize a package directory structure on the physical di...
|
03-30 04:01 | Success | - | |
|
exp_pytrain.20260330032327.024_20260330_032419
|
Python Reliability Drill: Typing
Overview This drill tests the ability to implement a robust, type-safe utility class in Python without relying on external type checkers. The `StrictTypeRegistry` class enforces runtime type checking for object storage and retrieval, ensuri...
|
03-30 03:25 | Success | - | |
|
exp_pytrain.20260330024756.023_20260330_024837
|
Dynamic Package Instantiation and Type Verification
This benchmark tests the ability to programmatically generate Python package structures, write strictly typed code, dynamically import the code, and verify its compliance with a defined `typing.Protocol` interface. Description The script pe...
|
03-30 02:49 | Success | - | |
|
exp_pytrain.20260330021439.022_20260330_021528
|
PEP 695 Generic Repository & Dynamic Packaging Benchmark
This benchmark evaluates an autonomous coding system's ability to leverage Python 3.12+ Type Parameter Syntax (PEP 695) and dynamic module packaging mechanics within a single executable script. Features * **PEP 695 Syntax**: Defines generic...
|
03-30 02:16 | Success | - | |
|
exp_pytrain.20260330013932.021_20260330_014008
|
Python Skill Fallback
Title: Strictly-Typed Generic Module with Encapsulated API - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-30 01:41 | Success | - | |
|
exp_pytrain.20260330005440.020_20260330_005509
|
Python Skill Fallback
Title: Strictly Typed Dynamic Plugin Loader - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-30 00:56 | Success | - | |
|
exp_pytrain.20260330000834.019_20260330_000912
|
Strictly Typed Dynamic Plugin Loader
**Hypothesis:** Developing a modular architecture similar to HuggingFace Transformers requires mastery of advanced `typing` (Protocols, Generics) to define strict contracts and `importlib` to manage dynamic component discovery, ensuring ext...
|
03-30 00:10 | Success | - | |
|
exp_pytrain.20260329232402.018_20260329_232443
|
Python Skill Fallback
Title: Strictly Typed Plugin Registry with Dynamic Discovery - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-29 23:25 | Success | - | |
|
exp_pytrain.20260329224257.017_20260329_224312
|
Dynamic Plugin Registry with Structural Subtyping
This benchmark tests a Python system's ability to dynamically discover, load, and validate plugins based on structural subtyping (Protocols) rather than explicit inheritance. Objective Create a self-contained script that: 1. Generates a tem...
|
03-29 22:44 | Success | - | |
|
exp_pytrain.20260329215852.016_20260329_215950
|
Runtime Type-Safe Plugin Packaging Benchmark
This benchmark demonstrates advanced Python module internals by dynamically generating a plugin package structure at runtime, loading it via the import system, and enforcing strict structural typing constraints using `typing.Protocol`. Acce...
|
03-29 22:00 | Success | - | |
|
exp_pytrain.20260329211824.015_20260329_211849
|
Python Skill Fallback
Title: Type-Safe Dependency Injection Container - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-29 21:19 | Success | - | |
|
exp_pytrain.20260329203623.014_20260329_203653
|
Python Skill Fallback
Title: Dynamic Package Construction and Importlib Validation - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-29 20:37 | Success | - | |
|
exp_pytrain.20260329195600.013_20260329_195626
|
Python Skill Fallback
Title: Robust Type-Checked Plugin Registry - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-29 19:57 | Success | - | |
|
exp_pytrain.20260329192217.012_20260329_192246
|
Generic Model Registry with Runtime Type Validation
This drill implements a robust, modular component loader similar to those used in Hugging Face Transformers or PyTorch. It leverages Python's advanced `typing` features—specifically `Protocol`, `TypeVar`, and `Generic`—to ensure that dynami...
|
03-29 19:23 | Success | - | |
|
exp_pytrain.20260329183827.011_20260329_183857
|
Log Analysis System Design Drill
This drill challenges you to construct a robust, strictly-typed command-line interface (CLI) application in Python. The objective is to process simulated web server logs and generate statistics while demonstrating high-level software archit...
|
03-29 18:40 | Success | - | |
|
exp_pytrain.20260329180406.010_20260329_180429
|
Type-Safe Plugin Architecture Simulator Benchmark
This benchmark evaluates the design and execution of a strictly typed, concurrent plugin system simulated within a single Python script. It enforces modern Python packaging standards (`__version__`, `__all__`) and utilizes advanced typing f...
|
03-29 18:05 | Success | - | |
|
exp_pytrain.20260329173028.009_20260329_173101
|
Strictly-Typed Modular Resource Processor Benchmark
This benchmark assesses the ability to implement a robust, type-safe data processing pipeline using Python's advanced typing features. The candidate must construct a script that simulates a modular package structure, leveraging `Generic`, `...
|
03-29 17:32 | Success | - | |
|
exp_pytrain.20260329165531.008_20260329_165601
|
Python Skill Fallback
Title: Type-Safe Dependency Resolver Simulator - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-29 16:57 | Success | - | |
|
exp_pytrain.20260329161900.007_20260329_161927
|
Type-Safe Plugin Registry with Dynamic Discovery
This coding drill focuses on building a robust, generic plugin system using Python's advanced standard library features, specifically `typing`, `importlib`, and `inspect`. Objective Create a self-contained Python module that implements a ty...
|
03-29 16:20 | Success | - | |
|
exp_pytrain.20260329154614.006_20260329_154649
|
Type-Safe Plugin Architecture with Resource Encapsulation
Overview This benchmark simulates the creation of a robust, production-ready Python package infrastructure. It constructs a local package named `ml_infra` that demonstrates type safety using `typing.Protocol` and robust resource management...
|
03-29 15:47 | Success | - | |
|
exp_pytrain.20260329151219.005_20260329_151245
|
Python Skill Fallback
Title: Strictly-Typed Plugin Registry System - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-29 15:13 | Success | - | |
|
exp_pytrain.20260329142919.004_20260329_143002
|
Python Skill Fallback
Title: Strictly Typed Modular Task Runner - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-29 14:31 | Success | - | |
|
exp_pytrain.20260329134539.003_20260329_134557
|
Strict Typed Dynamic Extension Loader
This benchmark validates an autonomous agent's ability to construct a robust, dependency-free plugin system using Python's standard library. Objective The goal is to programmatically generate a temporary Python package containing multiple m...
|
03-29 13:47 | Success | - | |
|
exp_pytrain.20260329130847.002_20260329_130928
|
Python Skill Fallback
Title: Generic Configuration Manager with PEP 695 Syntax - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-29 13:10 | Success | - | |
|
exp_pytrain.20260329122856.001_20260329_122920
|
Structural Typing and Dynamic Module Loading
Overview This benchmark evaluates a script's ability to leverage Python's `typing` module for structural subtyping (Protocols) and `importlib` for dynamic package loading. It simulates a plugin system where a Python package is constructed a...
|
03-29 12:30 | Success | - | |
|
exp_pytrain.20260329113221.005_20260329_113304
|
Dynamic Plugin Architecture with Structural Typing
Objective Design a robust, extensible plugin system leveraging Python's `importlib` for runtime module discovery and `typing.Protocol` for enforcing strict interface compliance without explicit inheritance. Scenario You are building a data...
|
03-29 11:34 | Success | - | |
|
exp_pytrain.20260329105253.004_20260329_105323
|
Python Skill Fallback
Title: Dynamic Plugin Loader with Type-Safe Interface Enforcement - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-29 10:54 | Success | - | |
|
exp_pytrain.20260329101727.003_20260329_101751
|
Dynamic Type-Verified Package Constructor Benchmark
Overview This benchmark evaluates a system's ability to programmatically synthesize a valid Python package structure at runtime. It verifies that the generated code adheres to strict `typing.Protocol` definitions and can be successfully int...
|
03-29 10:18 | Success | - | |
|
exp_pytrain.20260329094511.002_20260329_094543
|
Modern Generic Data Container Benchmark (PEP 695)
Overview This benchmark evaluates the implementation of a generic, thread-safe data container utilizing Python 3.12's **PEP 695 Type Parameter Syntax**. It verifies the developer's ability to define scoped type parameters, constrained types...
|
03-29 09:46 | Success | - | |
|
exp_pytrain.20260329085930.001_20260329_085953
|
Generic Plugin Loader with Namespace Hygiene
Overview This benchmark validates a Python implementation of a robust, type-safe event processing system using only the standard library. It enforces strict structural subtyping (Protocol-based), generic programming, and namespace hygiene s...
|
03-29 09:00 | Success | - | |
|
exp_pytrain.20260329083145.001_20260329_083245
|
Python Skill Fallback
Title: Strictly-Typed Plugin Registry with Structural Subtyping - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-29 08:33 | Success | - | |
|
exp_pytrain.20260329081229.001_20260329_081251
|
Robust Typed Plugin System Benchmark
This benchmark evaluates the implementation of a strictly typed plugin system using features of Python's standard `typing` module (specifically `Protocol`, `TypeVar`, and `Generic`). Context The script `benchmark....
|
03-29 08:13 | Success | - | |
|
exp_2302.00100v2_20260306_173656
|
This benchmark evaluates the performance of a **Physics-Informed Reduced-Order Model (PI-ROM)** for simulating the Time-...
README.md This benchmark evaluates the performance of a **Physics-Informed Reduced-Order Model (PI-ROM)** for simulating the Time-Dependent Schrödinger Equation (TDSE), as described in the innovation *2302.00100v2*. The goal is to demonstra...
|
03-29 08:01 | Success | - | |
|
exp_2302.00107v1_20260306_172525
|
Benchmark: Sequential Adaptive Aggregation for Federated GLMs
README.md Benchmark: Sequential Adaptive Aggregation for Federated GLMs This benchmark implements the **Sequential Data-Driven Aggregation** method described in paper 2302.00107v1. It demonstrates the improvement in statistical integrity an...
|
03-29 08:01 | Success | - | |
|
exp_2302.00129v1_20260307_053731
|
Explanation of the Benchmark Design
This benchmark evaluates the core claim of the innovation: **Efficiency without Optimization**. The paper argues that the topological efficiency of syntactic structures (short dependency lengths) arises naturally from a **sublinear preferen...
|
03-29 08:01 | Success | - | |
|
exp_2302.00129v1_20260307_071741
|
Benchmark: Syntactic Topological Efficiency
README.md Benchmark: Syntactic Topological Efficiency This benchmark investigates the "Universal Topological Regularities of Syntactic Structures." It tests the hypothesis that syntactic efficiency (minimized dependency length) can arise fr...
|
03-29 08:01 | Success | - | |
|
exp_2302.00136v2_20260306_180733
|
Benchmark: Differentiable Topological Loss (RTD)
README.md Benchmark: Differentiable Topological Loss (RTD) **Innovation Source:** arXiv:2302.00136v2 **Core Concept:** Integration of Topological Data Analysis (TDA) directly into Deep Learning loss functions via Representation Topology Div...
|
03-29 08:01 | Success | - | |
|
exp_2302.00136v2_20260307_053806
|
RTD-AE: Representation Topology Divergence Autoencoder Benchmark
README.md RTD-AE: Representation Topology Divergence Autoencoder Benchmark This benchmark evaluates the implementation of **RTD-AE** (Backfill Candidate 2302.00136v2), an autoencoder architecture constrained by a Representation Topology Div...
|
03-29 08:01 | Success | - | |
|
exp_2302.00136v2_20260307_053923
|
Here is the benchmark design for the RTD-AE (Representation Topology Divergence Autoencoder).
README.md
|
03-29 08:01 | Success | - | |
|
exp_2302.10800v1_20260307_072844
|
Backfill Candidate 2302.10800v1 (KG-Hub Data Infrastructure)
`python benchmark.py`
|
03-29 08:01 | Success | - | |
|
exp_2303.01590v4_20260306_172454
|
Here is the design for the benchmark.
README.md `python benchmark.py`
|
03-29 08:01 | Success | - | |
|
exp_2303.01610v1_20260306_174735
|
Benchmark: Self-Slimmable Sparse Mixture of Experts (SMoE-Dropout)
README.md Benchmark: Self-Slimmable Sparse Mixture of Experts (SMoE-Dropout) Overview This benchmark evaluates the **SMoE-Dropout** architecture (Candidate 2303.01610v1). The core innovation is the replacement of learned, complex routing po...
|
03-29 08:01 | Success | - | |
|
exp_2304.00387v1_20260307_105418
|
Benchmark: Backfill Candidate 2304.00387v1 (HaLP)
**Architecture:** Introduces a lightweight augmentation-free contrastive learning framework. The HaLP module hallucinates synthetic positive samples directly in the latent space using a closed-form solver, replacing the need for complex geo...
|
03-29 08:01 | Success | - | |
|
exp_2304.01222v1_20260307_155227
|
Benchmark: NeuroDAVIS (Parametric Dimensionality Reduction)
**Architecture** NeuroDAVIS employs an unsupervised deep neural network designed for dimensionality reduction. It extracts features non-linearly, theoretically preserving high-dimensional neighborhood relationships (local and global structu...
|
03-29 08:01 | Success | - | |
|
exp_2306.00204v1_20260306_180653
|
Benchmark for Directional Sharpness and Coordinate-wise Clipping
README.md Benchmark for Directional Sharpness and Coordinate-wise Clipping Innovation Overview This benchmark evaluates the optimization technique **Coordinate-wise Clipping** proposed in the analysis of "Directional Sharpness". The Theory...
|
03-29 08:01 | Success | - | |
|
exp_2306.01009v1_20260306_174523
|
Section 1: README.md
Benchmark: Scale vs. Reasoning Robustness **Innovation:** Backfill Candidate 2306.01009v1 **Core Finding:** Deductive reasoning in Transformer-Decoders is an emergent property of Scale. Larger models maintain reasoning robustness regardless...
|
03-29 08:01 | Success | - | |
|
exp_2306.17848v1_20260307_105126
|
Benchmark: Patch Mixing on CNNs (Backfill 2306.17848v1)
README.md Benchmark: Patch Mixing on CNNs (Backfill 2306.17848v1) This benchmark evaluates the **Patch Mixing** augmentation strategy as applied to a standard ResNet-18 architecture. Patch Mixing is a training-time augmentation that randoml...
|
03-29 08:01 | Success | - | |
|
exp_2307.00065v1_20260307_104653
|
Benchmark: Dense Scene Interaction Prediction (Candidate 2307.00065v1)
README.md Benchmark: Dense Scene Interaction Prediction (Candidate 2307.00065v1) Overview This benchmark validates the "Purely Data-Driven" approach described in *Backfill Candidate 2307.00065v1*. The abstract highlights that this model rel...
|
03-29 08:01 | Success | - | |
|
exp_2307.00097v3_20260307_110516
|
This benchmark evaluates the **POLE (Prompt-only Learning)** innovation, focusing on its proposed highly efficient memor...
README.md This benchmark evaluates the **POLE (Prompt-only Learning)** innovation, focusing on its proposed highly efficient memory footprint and fast inference speed for Weakly Supervised Semantic Segmentation (WSSS). **Innovation Highligh...
|
03-29 08:01 | Success | - | |
|
exp_2307.00112v2_20260307_153809
|
Local Medical Domain Evaluation Benchmark
README.md Local Medical Domain Evaluation Benchmark **Overview** This benchmark adapts the methodology of "Backfill Candidate 2307.00112v2" (evaluation of LLMs on medical exams) to a local, constrained environment. While the original paper...
|
03-29 08:01 | Success | - | |
|
exp_2307.00119v1_20260307_104745
|
Benchmark: Retrieval-Augmented Generation (RAG) with DPR
README.md Benchmark: Retrieval-Augmented Generation (RAG) with DPR This benchmark evaluates the architecture described in **2307.00119v1**, which proposes decoupling knowledge storage from model parameters. Architecture Overview Instead of...
|
03-29 08:01 | Success | - | |
|
exp_2307.00149v1_20260306_172849
|
HNC-CAD Architecture Benchmark
README.md HNC-CAD Architecture Benchmark This benchmark evaluates the performance of the **HNC-CAD** (Hierarchical Neural Code for Computer-Aided Design) architecture. The core innovation involves decomposing CAD construction into a **3-lev...
|
03-29 08:01 | Success | - | |
|
exp_2307.00149v1_20260307_094933
|
Benchmark: Hierarchical VQ-VAE CAD Generation (ARES 8GB Optimization)
README.md Benchmark: Hierarchical VQ-VAE CAD Generation (ARES 8GB Optimization) This repository contains a minimal, runnable benchmark to evaluate the performance and memory footprint of a Hierarchical VQ-VAE architecture coupled with Casca...
|
03-29 08:01 | Success | - | |
|
exp_2307.00150v1_20260307_085733
|
Benchmark: Local Automated Code Feedback (Backfill 2307.00150v1)
README.md Benchmark: Local Automated Code Feedback (Backfill 2307.00150v1) **Objective:** This benchmark validates the feasibility of replacing the cloud-based GPT-3.5 API (described in the source paper) with a locally hosted, quantized Sma...
|
03-29 08:01 | Success | - | |
|
exp_2307.00154v2_20260307_104903
|
`python benchmark.py`
|
03-29 08:01 | Success | - | |
|
exp_2307.00169v1_20260307_154831
|
VoxWatch Benchmark Simulation
README.md VoxWatch Benchmark Simulation This directory contains a lightweight simulation of the **VoxWatch** benchmark logic, designed to quantify the "False-Alarm Problem" in Open-Set Speaker Identification (OSI). The Innovation The core i...
|
03-29 08:01 | Success | - | |
|
exp_2307.00171v1_20260307_154752
|
Benchmark: NLP Inference via Integer Linear Programming (ILP)
README.md Benchmark: NLP Inference via Integer Linear Programming (ILP) This benchmark evaluates the performance characteristics of NLP inference formulated as an Integer Linear Programming (ILP) problem, as discussed in the methodology of...
|
03-29 08:01 | Success | - | |
|
exp_2307.00174v1_20260307_154614
|
README.md Benchmark: Candidate 2307.00174v1 (Memory-Optimized Multimodal Segmentation) This benchmark evaluates a synthetic implementation of the architecture described in arXiv 2307.00174v1 ("Prior Prompt Encoder with Multimodal Fusion...
|
03-29 08:01 | Success | - | |
|
exp_2308.15620v1_20260306_180946
|
Here is the benchmark design for the "Fuzzy-Enhanced Hybrid Predictive System" (Backfill Candidate 2308.15620v1).
This benchmark evaluates the throughput and memory footprint of the proposed Hybrid Intelligence Architecture compared to a traditional statistical baseline. README.md Benchmark: Fuzzy-Enhanced Hybrid Predictive System vs. Traditional M...
|
03-29 08:01 | Success | - | |
|
exp_2309.16829v2_20260306_174101
|
Benchmark: Derivative-Free Feynman-Kac PINN
README.md Benchmark: Derivative-Free Feynman-Kac PINN This benchmark evaluates the performance differences between a standard **Physics-Informed Neural Network (PINN)** relying on Automatic Differentiation (AutoGrad) and the **Derivative-Fr...
|
03-29 08:01 | Success | - | |
|
exp_2309.16870v1_20260306_170641
|
Backfill Candidate 2309.16870v1: Recurrent Fusion Benchmark
**Architecture** LEF proposes a recurrent "late-to-early" fusion scheme that injects object-aware latent embeddings into the early stages of a pillar-based detector. It processes temporally aligned sparse pillar tokens using window-based at...
|
03-29 08:01 | Failed | GPU_REQUIRED policy blocked benchmark execution. | |
|
exp_2309.16898v1_20260306_172419
|
Benchmark: Hybrid Edge-Cloud Pipeline for Humanoid Interaction (Candidate 2309.16898v1)
README.md Benchmark: Hybrid Edge-Cloud Pipeline for Humanoid Interaction (Candidate 2309.16898v1) This benchmark evaluates the performance characteristics of a **Hybrid Pipeline Architecture** designed for resource-constrained humanoid plat...
|
03-29 08:01 | Success | - | |
|
exp_2311.16339v1_20260306_172101
|
Benchmark: Granular Event-Based Reward Shaping in RL
README.md Benchmark: Granular Event-Based Reward Shaping in RL Overview This benchmark evaluates the impact of **Granular, Event-Based Reward Shaping** on Reinforcement Learning training efficiency. It contrasts a standard "Sparse Reward" s...
|
03-29 08:01 | Success | - | |
|
exp_2312.16582v1_20260307_160847
|
Here is the design for the **Backfill Candidate 2312.16582v1 (Learnable Chamfer Distance)** benchmark.
This benchmark compares a standard Point Cloud Autoencoder using static Chamfer Distance against the same architecture augmented with the proposed **Learnable Chamfer Distance (LCD)** module. README.md Benchmark: Learnable Chamfer Dista...
|
03-29 08:01 | Success | - | |
|
exp_2312.16600v1_20260307_161509
|
Benchmark: CICL Architecture (Backfill Candidate 2312.16600v1)
README.md Benchmark: CICL Architecture (Backfill Candidate 2312.16600v1) This benchmark evaluates the memory footprint and inference throughput of the **Contrastive Instance-Consistent Learning (CICL)** architecture applied to single-cell R...
|
03-29 08:01 | Success | - | |
|
exp_2312.16610v1_20260307_124601
|
Benchmark: Efficient MoFME vs. Standard MoE
README.md Benchmark: Efficient MoFME vs. Standard MoE This benchmark evaluates the **Efficient Deweather Mixture-of-Experts (MoFME)** architecture against a **Standard Mixture-of-Experts (MoE)** baseline. The innovation in MoFME lies in rep...
|
03-29 08:01 | Success | - | |
|
exp_2312.16623v1_20260307_104426
|
This benchmark evaluates the memory footprint and inference latency of the architecture described in arXiv:2312.16623v1.
README.md This benchmark evaluates the memory footprint and inference latency of the architecture described in arXiv:2312.16623v1. **Innovation Summary:** The paper proposes a BERT-based enhancement for Chinese Spelling Check (CSC) featurin...
|
03-29 08:01 | Success | - | |
|
exp_2312.16627v1_20260307_124218
|
Here is the runnable benchmark for MIM4DD.
No summary available yet.
|
03-29 08:01 | Success | - | |
|
exp_2312.16649v1_20260307_110434
|
FatFormer (Backfill 2312.16649v1) Benchmark
README.md FatFormer (Backfill 2312.16649v1) Benchmark This benchmark evaluates the **FatFormer** architecture, focusing on its efficiency in memory usage and throughput when employing "Forgery-aware Adapters" and frequency domain analysis o...
|
03-29 08:01 | Success | - | |
|
exp_2312.16682v2_20260307_105556
|
Benchmark for Backfill Candidate 2312.16682v2
README.md Benchmark for Backfill Candidate 2312.16682v2 **Soft Margin Extension of the Binary Cringe Loss** This benchmark is designed to verify the core claims of the proposed training objective: 1. **Zero Inference Overhead:** The method...
|
03-29 08:01 | Success | - | |
|
exp_2312.16702v1_20260307_160805
|
`pip install torch transformers`, then `python benchmark.py`
|
03-29 08:01 | Success | - | |
|
exp_2312.16707v1_20260307_105644
|
Section 1: README.md
No summary available yet.
|
03-29 08:01 | Success | - | |
|
exp_2312.16730v1_20260307_095021
|
Benchmark: Theoretical RL & Bandit Function Approximation
README.md Benchmark: Theoretical RL & Bandit Function Approximation This benchmark evaluates the fundamental concepts described in **Backfill Candidate 2312.16730v1**. Since the innovation is a theoretical survey of reinforcement learning a...
|
03-29 08:01 | Success | - | |
|
exp_2312.16733v1_20260307_113021
|
SuperServe Benchmark: SubNetAct & SlackFit
README.md SuperServe Benchmark: SubNetAct & SlackFit This benchmark evaluates the **SuperServe** architecture, specifically the **SubNetAct** mechanism and **SlackFit** scheduling policy, as described in the research on "Fine-Grained Infere...
|
03-29 08:01 | Success | - | |
|
exp_2312.17278v2_20260307_105019
|
Based on the provided abstract, the "TAISR framework" is a methodological guide for applying existing LLMs to research w...
We will benchmark the inference speed and VRAM usage of a standard model (`gpt2`) executing a "TAISR-style" complex prompting workflow (which involves context and role-playing) compared to a standard direct query. --- FILE_BREAK--- bash pip...
|
03-29 08:01 | Success | - | |
|
exp_2312.17279v3_20260307_124533
|
Here is the runnable benchmark for the Stateful Conformer with Cache-based Inference.
This benchmark compares a **Standard Buffered Conformer** (Baseline) against the proposed **Stateful Conformer with Cache** (Innovation). It simulates a streaming scenario where audio is processed in chunks, highlighting the memory efficien...
|
03-29 08:01 | Success | - | |
|
exp_2401.08664v3_20260307_095423
|
This repository contains the benchmarking suite for **Backfill Candidate 2401.08664v3**.
README.md This repository contains the benchmarking suite for **Backfill Candidate 2401.08664v3**. **Context:** As the associated document is a literature survey on Large Language Model (LLM) capabilities in education rather than a specific...
|
03-29 08:01 | Success | - | |
|
exp_2401.15203v1_20260306_172724
|
Benchmark: FedGT (Federated Graph Transformer) - Hybrid Attention Scheme
README.md Benchmark: FedGT (Federated Graph Transformer) - Hybrid Attention Scheme This repository contains a minimal, self-contained benchmark to evaluate the performance characteristics of the **FedGT (Federated Graph Transformer)** archi...
|
03-29 08:01 | Success | - | |
|
exp_2401.15236v2_20260306_180611
|
Dual-Norse Adaptive Inference Benchmark
README.md Dual-Norse Adaptive Inference Benchmark This benchmark simulates the **"Dual-Norse"** dynamic model-swapping architecture (Innovation 2401.15236v2). It demonstrates a hardware-constrained inference scenario (such as a nano-drone)...
|
03-29 08:01 | Success | - | |
|
exp_2401.15238v1_20260306_173527
|
Benchmark: Self-Supervised TabTransformer with Specialized Encoders
README.md Benchmark: Self-Supervised TabTransformer with Specialized Encoders This benchmark evaluates the performance of a **Self-Supervised TabTransformer** implementing the specialized input encoding strategies (Binned-TT and MLP-based-T...
|
03-29 08:01 | Success | - | |
|
exp_2402.16194v1_20260306_171132
|
ASEM Architecture Benchmark
README.md ASEM Architecture Benchmark This benchmark evaluates the performance characteristics of the **ASEM (Emotion Analysis on top of Sentiment Analysis)** architecture, specifically focusing on the **Mixture of Experts (Multiple Encoder...
|
03-29 08:01 | Success | - | |
|
exp_2403.18128v1_20260306_172338
|
This benchmark evaluates the performance characteristics of the **HealthGAT** architecture against a standard Transforme...
**Architecture:** HealthGAT utilizes a hierarchical Graph Attention Network (GAT) architecture. It transforms raw Electronic Health Records (EHR) into a graph structure, employing iterative refinement layers to update medical code embedding...
|
03-29 08:01 | Success | - | |
|
exp_2403.18159v2_20260306_173809
|
Here is the runnable benchmark for the `ov-freeze` innovation.
**Architecture:** Introduces **ov-freeze**, a lightweight Quantization-Aware Knowledge Distillation (KD-QAT) technique. It stabilizes the training of 4-bit weight quantized LLMs by addressing gradient propagation vulnerabilities identified...
|
03-29 08:01 | Success | - | |
|
exp_2405.16312v2_20260306_180544
|
README.md: run `python benchmark.py`.
|
03-29 08:01 | Success | - | |
|
exp_2405.16339v2_20260306_174141
|
Section 1: README.md
Install and run: `pip install torch`, then `python benchmark.py`.
|
03-29 08:01 | Success | - | |
|
exp_2405.16363v2_20260306_172814
|
Benchmarking Hierarchical Cluster-Constrained Control System (2405.16363v2)
README.md Benchmarking Hierarchical Cluster-Constrained Control System (2405.16363v2) This repository contains a runnable, self-contained benchmark for the **Hierarchical, Cluster-Constrained Control System** architecture proposed in Backfi...
|
03-29 08:01 | Success | - | |
|
exp_2406.17086v1_20260307_160651
|
BrainMAE Efficiency Benchmark
README.md BrainMAE Efficiency Benchmark This benchmark evaluates the architectural efficiency of the proposed **BrainMAE** model (Candidate 2406.17086v1). Innovation Summary BrainMAE proposes using a Masked Autoencoder (MAE) with a Graph At...
|
03-29 08:01 | Success | - | |
|
exp_2406.17095v1_20260307_094135
|
Backfill Candidate 2406.17095v1: Attention Directive Benchmark
README.md Backfill Candidate 2406.17095v1: Attention Directive Benchmark Overview This benchmark evaluates the performance impact of **Candidate 2406.17095v1**, a non-invasive prompting technique designed to mitigate the "Lost-in-the-Middle...
|
03-29 08:01 | Success | - | |
|
exp_2406.17115v3_20260307_161424
|
HQH & HQM Benchmark Suite
README.md HQH & HQM Benchmark Suite This repository contains a runnable benchmark for the **HQH** (Hallucination Questionnaire for Heterogeneity) dataset and the **HQM** (Hallucination Quality Metric) evaluation framework, as proposed in th...
|
03-29 08:01 | Success | - | |
|
exp_2406.17119v2_20260307_154532
|
Benchmark: U-AFNO (U-Net + Adaptive Fourier Neural Operator)
README.md Benchmark: U-AFNO (U-Net + Adaptive Fourier Neural Operator) **Candidate:** 2406.17119v2 **Innovation:** Hybrid U-AFNO Architecture **Abstract:** This benchmark evaluates a hybrid architecture combining a U-Net backbone with a Vis...
|
03-29 08:01 | Success | - | |
|
exp_2406.17126v2_20260307_085517
|
README.md
|
03-29 08:01 | Success | - | |
|
exp_2406.17148v2_20260307_084131
|
MixTex Architecture Benchmark
README.md MixTex Architecture Benchmark This benchmark evaluates the **MixTex** architecture as described in "Backfill Candidate 2406.17148v2". **Architecture Overview:** MixTex proposes a dual-transformer approach combining a **Swin Transf...
|
03-29 08:01 | Success | - | |
|
exp_2406.17150v1_20260307_081622
|
Here is the design for a runnable benchmark validating the sparse activation efficiency claims of Backfill Candidate 240...
No summary available yet.
|
03-29 08:01 | Success | - | |
|
exp_2406.17158v1_20260306_173925
|
This benchmark is designed to evaluate the **DEXTER** innovation claims: specifically, the performance gap between stand...
README.md This benchmark is designed to evaluate the **DEXTER** innovation claims: specifically, the performance gap between standard Dense Retrievers and Hybrid/Lexical approaches (like BM25 or Late Interaction) on complex, multi-hop Quest...
|
03-29 08:01 | Success | - | |
|
exp_2406.17167v1_20260306_180518
|
Section 1: README.md
Run: `python benchmark.py`
|
03-29 08:01 | Success | - | |
|
exp_2406.17167v1_20260307_112833
|
Benchmark: Low-Rank & Sparse Properties of One-Layer Transformers
README.md Benchmark: Low-Rank & Sparse Properties of One-Layer Transformers This benchmark validates the theoretical findings from "Learning on Transformers is Provable Low-Rank and Sparse: A One-layer Analysis." Theory Verification The pap...
|
03-29 08:01 | Success | - | |
|
exp_2406.17168v1_20260307_160612
|
Benchmark for Backfill Candidate 2406.17168v1
README.md Benchmark for Backfill Candidate 2406.17168v1 This benchmark evaluates the **concurrent multi-task reinforcement learning with distillation** architecture described in the paper "Backfill Candidate 2406.17168v1". Innovation Overvi...
|
03-29 08:01 | Success | - | |
|
exp_2406.17184v2_20260306_172607
|
Benchmark: Bias-Canceling UCB & Discretized Partitioning (Candidate 2406.17184v2)
README.md Benchmark: Bias-Canceling UCB & Discretized Partitioning (Candidate 2406.17184v2) This benchmark evaluates the architectural innovation proposed in *2406.17184v2*, which introduces a **Bias-Canceling Upper Confidence Bound (BC-UCB...
|
03-29 08:01 | Success | - | |
|
exp_2406.17185v1_20260306_170040
|
Section 1: README.md
Install and run: `pip install numpy psutil`, then `python benchmark.py`.
|
03-29 08:01 | Success | - | |
|
exp_2406.17185v1_20260307_081228
|
Vaporetto Algorithm Simulation Benchmark
README.md Vaporetto Algorithm Simulation Benchmark This benchmark demonstrates the performance characteristics of **Vaporetto** (Efficient Japanese Tokenization) using a pure Python simulation. Overview of the Innovation Vaporetto optimizes...
|
03-29 08:01 | Success | - | |
|
exp_2406.17186v2_20260307_095516
|
Benchmark: CLERC RAG Pipeline (Local Inference)
README.md Benchmark: CLERC RAG Pipeline (Local Inference) This benchmark evaluates the **Local Inference** capabilities of the CLERC architecture, specifically testing the viability of replacing the cloud-based GPT-4o generator with a quant...
|
03-29 08:01 | Success | - | |
|
exp_2407.09527v1_20260307_105513
|
Benchmark: Median-Based 1.58-bit Quantization (Candidate 2407.09527v1)
README.md Benchmark: Median-Based 1.58-bit Quantization (Candidate 2407.09527v1) Overview This benchmark validates the efficiency claims of the proposed "median-based" BitNet b1.58 variant. Specifically, it tests the hypothesis that a 1.58-...
|
03-29 08:01 | Success | - | |
|
exp_2407.17642v1_20260306_171107
|
SMA-Hyper Framework Benchmark
README.md SMA-Hyper Framework Benchmark **Innovation:** Dynamic Dual Adaptive Spatiotemporal Learning with Hypergraphs **Domain:** Urban Risk Prediction (Spatiotemporal Forecasting) This benchmark evaluates the **SMA-Hyper** architecture, w...
|
03-29 08:01 | Success | - | |
|
exp_2407.17671v2_20260306_173342
|
Benchmark: UDI vs. Global Distillation
README.md Benchmark: UDI vs. Global Distillation This benchmark evaluates the computational cost of the **UDI (Unsqueezed Distillation-based SSL)** architecture compared to a standard global compression baseline. Context Standard SSL method...
|
03-29 08:01 | Success | - | |
|
exp_2407.20266v1_20260306_170503
|
Section 1: README.md
Install and run: `pip install torch`, then `python benchmark.py`.
|
03-29 08:01 | Success | - | |
|
exp_2408.13352v1_20260306_174940
|
Here is the design for the QAdaPrune benchmark. This benchmark simulates a Variational Quantum Circuit (VQC) training sc...
README.md QAdaPrune: Adaptive Parameter Pruning Benchmark This benchmark evaluates the efficiency of **QAdaPrune**, an adaptive, hyperparameter-free pruning method for Variational Quantum Circuits (VQCs). Innovation Overview Standard VQCs s...
|
03-29 08:01 | Success | - | |
|
exp_2409.05872v1_20260306_174451
|
Here is the runnable benchmark design for the CSRec (Causal Sequential Recommendation) innovation.
No summary available yet.
|
03-29 08:01 | Success | - | |
|
exp_2410.17477v6_20260306_173236
|
Architectural Benchmark: Transformer vs. Recurrent (RWKV)
README.md Architectural Benchmark: Transformer vs. Recurrent (RWKV) Overview This benchmark validates the claims of **Backfill Candidate 2410.17477v6**, specifically the shift from self-attention (Transformer) to Recurrent Architectures (RW...
|
03-29 08:01 | Success | - | |
|
exp_2410.19859v1_20260306_171635
|
Benchmark for Backfill Candidate 2410.19859v1
README.md Benchmark for Backfill Candidate 2410.19859v1 Hierarchical Beam Selection (MMT + RL) This benchmark evaluates the performance of the proposed **Hierarchical Two-Stage Beam Selection Framework**. The system decouples the selection...
|
03-29 08:01 | Success | - | |
|
exp_2411.14585v3_20260306_172149
|
PointLCA-Net Benchmark
README.md PointLCA-Net Benchmark This benchmark evaluates the **PointLCA-Net** architecture, a hybrid spatio-temporal processing system designed for edge neuromorphic computing. It combines the spatial feature extraction capabilities of **P...
|
03-29 08:01 | Success | - | |
|
exp_2412.16715v1_20260307_083845
|
Benchmarking CCFormer for WSI Analysis
README.md Benchmarking CCFormer for WSI Analysis This benchmark evaluates the performance characteristics of **CCFormer**, an architecture designed to process Whole Slide Images (WSIs) as sparse point clouds of cells. Objective The primary...
|
03-29 08:01 | Success | - | |
|
exp_2412.16738v1_20260307_102802
|
KKAN (Kolmogorov-Arnold Network) Hybrid Benchmark
README.md KKAN (Kolmogorov-Arnold Network) Hybrid Benchmark This benchmark evaluates the **KKAN (Kolmogorov-Arnold Network)** architecture, a hybrid design combining MLP-based inner functions with learnable outer basis functions. This struc...
|
03-29 08:01 | Success | - | |
|
exp_2412.16739v2_20260307_113900
|
Run `python benchmark.py`. Expected output: the benchmark reports VRAM usage and throughput (tokens/samples per second) for both modes, demonstrating the speed and efficiency gains of the unrolled architecture.
|
03-29 08:01 | Success | - | |
|
exp_2412.16745v2_20260307_103336
|
Visual Mamba (ViM) Benchmark: Candidate 2412.16745v2
README.md Visual Mamba (ViM) Benchmark: Candidate 2412.16745v2 Overview This benchmark suite is designed to verify the performance claims of the **Visual Mamba (ViM)** architecture (arXiv 2412.16745v2). Specifically, it targets the innovati...
|
03-29 08:01 | Success | - | |
|
exp_2412.16746v4_20260307_155146
|
No summary available yet.
|
03-29 08:01 | Success | - | |
|
exp_2412.16763v1_20260306_172630
|
Benchmark Design for Paraformer (ClimSim Innovation)
Innovation Summary **Paraformer** introduces a **Transformer-based** architecture to replace classical CNN/RNN methods in global climate model parameterization. It utilizes a **"memory-aware"** design to handle the large-scale **ClimSim** d...
|
03-29 08:01 | Success | - | |
|
exp_2412.16763v1_20260307_112925
|
Paraformer Benchmark: Climate Parameterization
README.md Paraformer Benchmark: Climate Parameterization This benchmark evaluates the performance characteristics of **Paraformer**, a "memory-aware" Transformer model designed for climate parameterization using the ClimSim dataset. Overvie...
|
03-29 08:01 | Success | - | |
|
exp_2412.16777v1_20260307_113938
|
HyperCLIP Benchmark (Candidate 2412.16777v1)
README.md HyperCLIP Benchmark (Candidate 2412.16777v1) This benchmark evaluates the **HyperCLIP** architecture, which replaces large static vision encoders with a text-conditioned hypernetwork. The goal is to validate the claim that this ar...
|
03-29 08:01 | Success | - | |
|
exp_2412.16778v2_20260307_161356
|
Benchmark: Candidate 2412.16778v2 (RoomPainter MVIS)
README.md Benchmark: Candidate 2412.16778v2 (RoomPainter MVIS) Overview This benchmark evaluates the computational overhead and memory footprint associated with **Candidate 2412.16778v2**, specifically focusing on the **Attention-Guided Mul...
|
03-29 08:01 | Success | - | |
|
exp_2412.16806v1_20260307_081941
|
Benchmark for Quantum Contextuality Analysis in BERT
README.md Benchmark for Quantum Contextuality Analysis in BERT This benchmark evaluates the computational overhead of applying **Sheaf and Contextuality-by-Default (CbD)** theoretical frameworks to standard BERT models, as proposed in Backf...
|
03-29 08:01 | Success | - | |
|
exp_2412.18633v1_20260307_161325
|
Benchmark: BoostMD Surrogate Acceleration
README.md Benchmark: BoostMD Surrogate Acceleration This benchmark validates the **BoostMD** architecture proposal (Candidate 2412.18633v1), which focuses on accelerating Molecular Dynamics (MD) inference by minimizing atomic feature recalc...
|
03-29 08:01 | Success | - | |
|
exp_2501.11733v2_20260306_180805
|
Benchmark: Mobile-Agent-E Hierarchical Performance
README.md Benchmark: Mobile-Agent-E Hierarchical Performance This benchmark evaluates the architectural efficiency of **Mobile-Agent-E**, a hierarchical multi-agent framework with persistent memory, against a standard flat agent architectur...
|
03-29 08:01 | Success | - | |
|
exp_2501.11779v2_20260306_163150
|
Install and run: `pip install torch`, then `python benchmark.py`.
|
03-29 08:01 | Success | - | |
|
exp_2502.15709v2_20260306_173159
|
2502.15709v2
No summary available yet.
|
03-29 08:01 | Success | - | |
|
exp_2504.14772v2_20260306_173107
|
Here is the design for the benchmark.
No summary available yet.
|
03-29 08:01 | Success | - | |
|
exp_2505.14959v1_20260306_170011
|
README.md Benchmark: Privacy-Preserving Collaborative CVR Training This benchmark evaluates the **Privacy-Preserving Collaborative Training Framework** for Conversion Rate (CVR) prediction. Innovation Overview The paper proposes a dual-laye...
|
03-29 08:01 | Failed | RuntimeError: Expected all tensors to be on the same device, but got mat1 is on cuda:0, different from other tensors on cpu (when checking argument in method wrapper_CUDA_addmm) | |
|
exp_2505.14969v2_20260307_161029
|
Here is the runnable benchmark designed for the innovation described in Backfill Candidate 2505.14969v2.
README.md: install and run with `pip install torch`, then `python benchmark.py`.
|
03-29 08:01 | Success | - | |
|
exp_2505.14970v4_20260306_173025
|
Here is the runnable benchmark for the **Self-Evolving Curriculum (SEC)** innovation.
README.md
|
03-29 08:01 | Success | - | |
|
exp_2505.14972v2_20260306_171837
|
Benchmark: CROSS Cultural Safety Alignment Framework
README.md Benchmark: CROSS Cultural Safety Alignment Framework This repository contains a minimal, reproducible benchmark designed to evaluate the **CROSS Cultural Safety Alignment Framework** (Innovation Candidate 2505.14972v2). Overview o...
|
03-29 08:01 | Success | - | |
|
exp_2505.14975v3_20260306_174819
|
This benchmark evaluates the architectural efficiency of **Backfill Candidate 2505.14975v3** ("Flat Policy via Bootstrap...
README.md This benchmark evaluates the architectural efficiency of **Backfill Candidate 2505.14975v3** ("Flat Policy via Bootstrapping"). **The Innovation:** Standard Hierarchical Reinforcement Learning (HRL) relies on a "Manager" (High-Lev...
|
03-29 08:01 | Success | - | |
|
exp_2506.16552v3_20260307_100154
|
Benchmark: Revela-Style Dense Retriever Learning
**Architecture:** Revela employs a standard dense dual-encoder architecture (Bi-Encoder). It integrates retriever optimization into Language Modeling (LM) training by using retriever-computed similarity scores to weight an in-batch cross-do...
|
03-29 08:01 | Success | - | |
|
exp_2506.16571v2_20260307_161251
|
Here is the benchmark design for the "Backfill Candidate 2506.16571v2" innovation, focusing on the feasibility of proces...
**Paper Analysis:** *Capturing Visualization Design Rationale* This paper introduces a methodology and dataset for extracting visualization design rationales from student notebooks, creating a corpus of Question-Answer-Rationale triples usi...
|
03-29 08:01 | Success | - | |
|
exp_2506.16575v1_20260307_154445
|
Benchmark: Elo-Based Multi-Candidate Aggregation (ARES)
**Paper Summary: Elo Rating System for Harmful Content Detection** **Architecture:** The paper proposes an inference workflow utilizing an Elo rating system to rank and select optimal LLM responses for detecting harmful content (microaggres...
|
03-29 08:01 | Success | - | |
|
exp_2506.16580v1_20260307_100237
|
Here is the design for the runnable benchmark targeting the Emformer + NAR architecture candidate.
**Architecture:** Replaces standard encoder blocks with an **Emformer** (Efficient Memory Transformer) to enable chunk-based attention and streamable processing. The model utilizes a non-autoregressive decoder to parallelize output generati...
|
03-29 08:01 | Success | - | |
|
exp_2506.16584v1_20260307_083216
|
Benchmark for Variance Decomposition (Semantic Grounding)
**Architecture & Methodology** This paper does not propose a new model architecture. Instead, it introduces a **Variance Decomposition Framework**, an evaluation methodology designed to measure semantic grounding. It assesses whether an LLM...
|
03-29 08:01 | Success | - | |
|
exp_2506.16586v1_20260307_100312
|
Benchmark: Agentic QA Workflow Efficiency (Backfill 2506.16586v1)
**Assessment:** This paper evaluates a *workflow* rather than a specific model architecture. It focuses on applying generic "state-of-the-art" LLMs to QA tasks. * **Architecture:** Utilizes AI-agents for automated test case generation, stat...
|
03-29 08:01 | Success | - | |
|
exp_2506.16592v1_20260307_090447
|
Backfill Candidate 2506.16592v1: Architecture Benchmark
**Architecture:** Utilizes a hybrid design coupling a pre-trained DenseNet121 encoder with a multi-branch attention-enhanced decoder. The bottleneck employs Global Spatial Attention (GSA), Position Encoding, and Scaled Dot-Product Attention...
|
03-29 08:01 | Success | - | |
|
exp_2506.16593v1_20260307_113618
|
Benchmark: Slip-Steer Kinematics & DRIVE Protocol (Candidate 2506.16593v1)
**Summary for ARES 8GB Roadmap** **Focus:** Physical System Identification & Uncertainty Quantification (Classical/Model-based, not Deep Learning). * **Architecture:** Proposes a lightweight mathematical "transfer function" linking velocity...
|
03-29 08:01 | Success | - | |
|
exp_2506.16594v2_20260307_113657
|
Benchmark: LLM Biomedical Synthetic Data Generation (Scoping Review 2506.16594v2)
This paper is a **scoping review**, not a technical architecture proposal. Consequently, it provides **no specific data** regarding model architecture, memory footprint, or inference speed required for the ARES 8GB roadmap. * **Architecture...
|
03-29 08:01 | Success | - | |
|
exp_2506.16596v3_20260307_104214
|
This paper outlines a community-driven vision for a modern Cyc-like knowledge infrastructure to address LLM hallucinations and reasoning gaps. * **Architecture:** Proposes an "open engineering framework" integrating modular Knowledge Repres...
|
03-29 08:01 | Success | - | |
|
exp_2506.16597v1_20260307_113747
|
Benchmark: Vision Transformer (ViT) on Recurrence Plots for Exoplanet Classification
**Paper:** Exoplanet Classification through Vision Transformers with Temporal Image Analysis **Architecture:** The proposed pipeline converts 1D Kepler light curves into 2D Recurrence Plots (RPs) or Gramian Angular Fields (GAFs) to serve as...
|
03-29 08:01 | Success | - | |
|
exp_2506.16600v2_20260306_165933
|
FLAME Architecture Benchmark: Dynamic Sparse Activation
**FLAME** proposes a Sparse Mixture-of-Experts (SMoE) framework for federated LLM fine-tuning, designed to eliminate the performance degradation caused by compressing LoRA matrices on low-resource clients. * **Architecture:** Replaces stand...
|
03-29 08:01 | Success | - | |
|
exp_2506.16600v2_20260307_080909
|
Here is the design for the FLAME benchmark, focusing on the core innovation: enabling resource-adaptive federated learni...
**FLAME** proposes a Sparse Mixture-of-Experts (SMoE) framework for federated LLM fine-tuning, designed to eliminate the performance degradation caused by compressing LoRA matrices on low-resource clients. * **Architecture:** Replaces stand...
|
03-29 08:01 | Success | - | |
|
exp_2506.16617v1_20260306_174407
|
thoughts:
1. **Analyze the Request**: * **Input**: Title "Backfill Candidate 2506.16617v1", Abstract about "Human-Centric Evaluation Framework" for XAI in PPM. * **Constraints**: Output `README.md` and `benchmark.py` separated by `
|
03-29 08:01 | Success | - | |
|
exp_2506.16623v1_20260307_100441
|
Section 1: README.md
**Architecture** The framework utilizes a **frontier-based exploration strategy** guided by a Vision-Language Model (VLM). Instead of simple embedding similarity, it employs **dynamic history-augmented prompting**. The system injects a text...
|
03-29 08:01 | Success | - | |
|
exp_2506.16628v1_20260307_105849
|
**Architecture:** Hybrid offline design. LLMs are utilized exclusively during the development phase to generate rules, identify relevant text snippets, and extract keywords. The production system is a traditional rule-based NLP pipeline (Re...
|
03-29 08:01 | Success | - | |
|
exp_2506.16633v2_20260307_083805
|
Section 1: README.md
**Paper:** GeoGuess (SightSense) **Summary for ARES 8GB Roadmap:** * **Architecture:** Proposes **SightSense**, a multimodal framework processing **Street View panoramas**. It employs a **hierarchical visual encoder** to synthesize local de...
|
03-29 08:01 | Success | - | |
|
exp_2506.16636v1_20260307_154250
|
**README.md**
**Architecture** The method relies on **Masked Autoregressive Flows (MAF)**. Rather than standard generative sampling, it proposes a "Latent Noise Injection" (LNI) technique: encoding specific observed data points into the latent space, app...
|
03-29 08:01 | Success | - | |
|
exp_2506.16640v4_20260306_170755
|
Here is the runnable benchmark code for the Adaptive-Scalable Entmax (ASEntmax) innovation.
**Architecture** Proposes **Adaptive-Scalable Entmax (ASEntmax)**, a drop-in replacement for Softmax attention. It utilizes $\alpha$-entmax to assign exact zeros to irrelevant tokens, creating dynamically sparse attention maps. A learnable...
|
03-29 08:01 | Success | - | |
|
exp_2506.16640v4_20260307_080835
|
Section 1: README.md
**Architecture** Proposes **Adaptive-Scalable Entmax (ASEntmax)**, a drop-in replacement for Softmax attention. It utilizes $\alpha$-entmax to assign exact zeros to irrelevant tokens, creating dynamically sparse attention maps. A learnable...
|
03-29 08:01 | Success | - | |
|
exp_2506.16644v1_20260307_100709
|
SORE Architecture Benchmark
**Architecture** SORE replaces autoregressive LLMs with a dual-stage pipeline utilizing multilingual sentence encoders and Approximate Nearest Neighbor (ANN) search. It identifies core content via metadata embeddings and filters extraneous...
|
03-29 08:01 | Success | - | |
|
exp_2506.16650v1_20260306_174212
|
SemAgent: Semantic-Driven Two-Stage Benchmark
**Architecture:** Proposes a complex, multi-stage agentic workflow. It moves beyond simple code localization by integrating **execution semantics** for context retrieval and **generalized abstraction** for issue understanding. The core uses...
|
03-29 08:01 | Success | - | |
|
exp_2506.16650v1_20260307_102357
|
SemAgent Pipeline Benchmark (8GB Constraint)
**Architecture:** Proposes a complex, multi-stage agentic workflow. It moves beyond simple code localization by integrating **execution semantics** for context retrieval and **generalized abstraction** for issue understanding. The core uses...
|
03-29 08:01 | Success | - | |
|
exp_2506.16655v1_20260307_102631
|
Backfill Candidate 2506.16655v1 Benchmark
**Architecture** Arch-Router is a compact 1.5B parameter model functioning as a classifier. Instead of generating text, it maps user queries to specific domains (e.g., travel) or action types to select the most appropriate downstream model...
|
03-29 08:01 | Success | - | |
|
exp_2507.14722v1_20260306_155934
|
This benchmark simulates the core innovation of the **LeanTree** methodology for Automated Theorem Proving (ATP) as desc...
README.md This benchmark simulates the core innovation of the **LeanTree** methodology for Automated Theorem Proving (ATP) as described in the analysis. **The Innovation:** LeanTree proposes a "White-Box" approach that factorizes complex pr...
|
03-29 08:01 | Success | - | |
|
exp_2507.14757v1_20260306_180835
|
Section 1: README.md
Operational Manifold SNN Benchmark Overview This benchmark validates the **"Operational Manifold"** design principle for Spiking Neural Networks (SNNs). It demonstrates that SNN performance (measured here as network viability and spike thro...
|
03-29 08:01 | Success | - | |
|
exp_2507.14758v1_20260306_165905
|
GRACE Framework Benchmark
This benchmark evaluates the **GRACE (Generative Recommendation via Chain-of-Thought)** framework concepts. What is being tested? 1. **Hybrid CoT Tokenization**: Instead of predicting the next Item ID directly, the model interprets and gene...
|
03-29 08:01 | Failed | TypeError: int is not a Module subclass | |
|
exp_2507.14766v1_20260306_172220
|
Design Reasoning
To benchmark the innovation described (CXR-TFT), we need to simulate the computational cost of the **Multi-Modal Temporal Fusion** architecture. **Core Architecture to Simulate:** 1. **Sparse-Dense Alignment:** The unique computational load...
|
03-29 08:01 | Success | - | |
|
exp_2507.14768v2_20260306_172920
|
Benchmark: Heterogeneous Hierarchical Secure Aggregation (H-HSA)
README.md Benchmark: Heterogeneous Hierarchical Secure Aggregation (H-HSA) This benchmark evaluates the computational and memory efficiency gains of the **Heterogeneous Hierarchical Secure Aggregation (H-HSA)** innovation compared to standa...
|
03-29 08:01 | Success | - | |
|
exp_2508.06495v1_20260306_155746
|
Benchmark: Local Fact-Checking Data Pipeline (Backfill Candidate 2508.06495v1)
README.md Benchmark: Local Fact-Checking Data Pipeline (Backfill Candidate 2508.06495v1) Overview This benchmark evaluates the feasibility of running the **"Claim Extraction"** phase of the Portuguese fact-checking pipeline (as described in...
|
03-29 08:01 | Success | - | |
|
exp_2508.13337v1_20260306_163845
|
Benchmark: X-MoE Inspired Padding-Free & Sparse Execution
README.md Benchmark: X-MoE Inspired Padding-Free & Sparse Execution Overview This benchmark evaluates the memory efficiency gains derived from the **X-MoE (Padding-Free Execution)** and **Redundancy-Bypassing Dispatch** principles, specific...
|
03-29 08:01 | Success | - | |
|
exp_2508.13346v1_20260306_155642
|
Backfill Candidate 2508.13346v1: Barron Bounds & Linear Efficiency
README.md Backfill Candidate 2508.13346v1: Barron Bounds & Linear Efficiency Overview This benchmark validates the theoretical limits of function approximation for linear methods on hardware with constrained VRAM (RTX A2000 8GB target). Bas...
|
03-29 08:01 | Failed | RuntimeError: The size of tensor a (128) must match the size of tensor b (8) at non-singleton dimension 1 | |
|
exp_2508.13358v1_20260306_153101
|
Benchmark: Dynamic Beam Pruning for LLM Text Generation
README.md Benchmark: Dynamic Beam Pruning for LLM Text Generation 1. Context & Relevance This benchmark evaluates the transfer of **"Aggressive Beam Search Pruning"** logic (originally proposed for ASR/MT streaming in the candidate paper) t...
|
03-29 08:01 | Success | - | |
|
exp_2508.13364v1_20260306_162901
|
HAL 9000 Risk Prediction Benchmark
README.md HAL 9000 Risk Prediction Benchmark This benchmark evaluates the machine learning component of the **HAL 9000** system, simulating the processing of scraped vulnerability data to predict exploitability. **Context:** The underlying...
|
03-29 08:01 | Success | - | |
|
exp_2508.13376v1_20260306_171223
|
Innovation: Semantic-Enhanced ASR via LLaMA Distillation
README.md Innovation: Semantic-Enhanced ASR via LLaMA Distillation **Candidate:** Backfill 2508.13376v1 **Target Hardware:** RTX A2000 (8GB VRAM) **Focus:** Cross-modal Distillation & Memory Efficiency Summary This benchmark validates the f...
|
03-29 08:01 | Success | - | |
|
exp_2508.13380v1_20260306_155459
|
J3O: Joint Optimization of Onloading & Offloading Benchmark
README.md J3O: Joint Optimization of Onloading & Offloading Benchmark This benchmark validates the **J3O (Joint Optimization)** innovation for constrained hardware environments (Target: <8GB VRAM, e.g., RTX A2000). The Innovation Traditiona...
|
03-29 08:01 | Failed | RuntimeError: Expected all tensors to be on the same device, but got weight is on cpu, different from other tensors on cuda:0 (when checking argument in method wrapper_CUDA__native | |
|
exp_2508.14125v1_20260306_133835
|
Benchmark: Smart Parking Prediction (Sensor-Free Framework)
README.md Benchmark: Smart Parking Prediction (Sensor-Free Framework) Overview This benchmark evaluates the **Smart Parking Prediction Framework** proposed in *Backfill Candidate 2508.14125v1*. The original paper proposes a "sensor-free" ap...
|
03-29 08:01 | Success | - | |
|
exp_2508.14125v1_20260306_152000
|
Section 1: README.md
Run: `python benchmark.py`
|
03-29 08:01 | Success | - | |
|
exp_2508.14125v1_20260306_152219
|
Install and run: `pip install torch numpy scikit-learn pandas`, then `python benchmark.py`. The script outputs performance metrics, VRAM usage, and the final verification of the hypothesis (RFR vs LSTM).
|
03-29 08:01 | Failed | RuntimeError: Found dtype Double but expected Float | |
|
exp_2508.15831v2_20260306_162734
|
2508.15831v2
No summary available yet.
|
03-29 08:01 | Success | - | |
|
exp_2509.14438v1_20260306_134635
|
Benchmark: Bias Mitigation Overhead Analysis (Candidate 2509.14438v1)
README.md Benchmark: Bias Mitigation Overhead Analysis (Candidate 2509.14438v1) This benchmark evaluates the **computational cost** and **efficacy** of the bias mitigation strategies proposed in the ARES Analysis Log. Objective The source p...
|
03-29 08:01 | Success | - | |
|
exp_2509.14448v1_20260306_152446
|
VCBench: Lightweight Founder Success Prediction (Replica)
README.md VCBench: Lightweight Founder Success Prediction (Replica) This benchmark is a runnable, lightweight replication of the **VCBench** evaluation framework (Target: Backfill Candidate 2509.14448v1). **Context:** The original VCBench p...
|
03-29 08:01 | Success | - | |
|
exp_2509.14456v2_20260306_152929
|
Benchmark: CORRECT-DETECT Trade-off Analysis (Candidate 2509.14456v2)
README.md Benchmark: CORRECT-DETECT Trade-off Analysis (Candidate 2509.14456v2) This benchmark evaluates the **CORRECT-DETECT** cognitive bottleneck identified in the candidate paper. The Innovation The paper argues that standard LLMs suffe...
|
03-29 08:01 | Success | - | |
|
exp_2509.14480v1_20260306_163542
|
Benchmark: Turn-level Adjudicated Reinforcement Learning (TARL)
README.md Benchmark: Turn-level Adjudicated Reinforcement Learning (TARL) This benchmark suite evaluates the computational efficiency and memory footprint of the **Turn-level Adjudicated Reinforcement Learning (TARL)** protocol proposed in...
|
03-29 08:01 | Failed | RuntimeError: mat1 and mat2 shapes cannot be multiplied (1x32 and 128x1) | |
|
exp_2509.14488v1_20260306_163044
|
Thought Process
1. **Analyze the Innovation:** The paper "ARES Analysis" proposes replacing global synchronization (expensive, scales linearly $O(m)$) with randomized local coordination (constant time $O(1)$). In the context of an RTX A2000 8GB (Bandwidth...
|
03-29 08:01 | Success | - | |
|
exp_2509.16256v1_20260306_155721
|
Benchmark: HausaMovieReview Innovation (Low-Data Efficiency)
README.md Benchmark: HausaMovieReview Innovation (Low-Data Efficiency) **Candidate:** Backfill Candidate 2509.16256v1 (HausaMovieReview) **Verdict:** REJECTED FOR CODING (Directive 11) **Objective:** Verify the paper's claim that Classical...
|
03-29 08:01 | Failed | ZeroDivisionError: float division by zero | |
|
exp_2509.18178v2_20260306_115349
|
Benchmark: Multi-Agent Workflow Orchestration (Foam-Agent Pattern)
README.md Benchmark: Multi-Agent Workflow Orchestration (Foam-Agent Pattern) Overview This benchmark validates the **architectural efficiency** of the multi-agent pattern described in the "Foam-Agent" paper (Backfill Candidate 2509.18178v2)...
|
03-29 08:01 | Success | - | |
|
exp_2510.16197v1_20260306_162952
|
**Benchmark: LaSDI-Inference (Latent Space Dynamics Identification)**
README.md **Benchmark: LaSDI-Inference (Latent Space Dynamics Identification)** This benchmark evaluates the computational efficiency and memory footprint of the **LaSDI (Latent Space Dynamics Identification)** framework when applied to hig...
|
03-29 08:01 | Success | - | |
|
exp_2510.16198v1_20260306_153211
|
ARES Protocol Benchmark: EgMM-Corpus & CLIP Evaluation
README.md ARES Protocol Benchmark: EgMM-Corpus & CLIP Evaluation Overview This benchmark evaluates the computational requirements and processing throughput of standard CLIP models (specifically `openai/clip-vit-base-patch32`) when subjected...
|
03-29 08:01 | Success | - | |
|
exp_2510.16208v1_20260306_153937
|
Backfill Candidate 2510.16208v1: Nonstationary Bandits with Linear Dynamics Benchmark
README.md Backfill Candidate 2510.16208v1: Nonstationary Bandits with Linear Dynamics Benchmark This benchmark evaluates the computational efficiency of the **Explore-Then-Commit** strategy applied to Linear Dynamical Systems (LDS) as descr...
|
03-29 08:01 | Success | - | |
|
exp_2510.16232v2_20260306_164034
|
Section 1: README.md
Benchmark: AffPCL (Affinity-based Personalized Collaborative Learning) on 8GB VRAM Innovation Overview This benchmark validates the **ARES Analysis: AffPCL & The 8GB Efficiency Frontier**. It translates the theoretical "AffPCL" framework (t...
|
03-29 08:01 | Failed | ModuleNotFoundError: No module named 'peft' | |
|
exp_2510.16250v1_20260306_162842
|
Here is the design for the benchmark based on the provided innovation analysis.
README.md Benchmark: 1-Bit Weight Quantization (ARES Candidate 2510.16250v1) Overview This benchmark evaluates the memory and performance efficiency of the **1-Bit Weight Quantization** technique applied to Random Features/MLP architectures...
|
03-29 08:01 | Failed | NotImplementedError: Module [StandardModel] is missing the required "forward" function | |
|
exp_2510.16252v1_20260306_152359
|
WEBSERV Input Efficiency Benchmark
README.md WEBSERV Input Efficiency Benchmark This benchmark evaluates the **WEBSERV** innovation proposal (Backfill Candidate 2510.16252v1). Context & Goal Modern Web Agents face an **Input Bottleneck**. Raw browser environments inject mass...
|
03-29 08:01 | Failed | GPU_REQUIRED policy blocked benchmark execution. | |
|
exp_2510.17881v2_20260306_152814
|
POPI: Modular Personalization Benchmark
README.md POPI: Modular Personalization Benchmark Overview This benchmark evaluates the **POPI (Modular Personalization via Preference Inference)** innovation, specifically targeting the **8GB VRAM Efficiency Frontier**. The core hypothesis...
|
03-29 08:01 | Failed | torch.AcceleratorError: CUDA error: device-side assert triggered | |
|
exp_2511.12791v3_20260306_134323
|
Benchmark: Adaptive Horizon Selection (Backfill Candidate 2511.12791v3)
README.md Benchmark: Adaptive Horizon Selection (Backfill Candidate 2511.12791v3) Objective This benchmark validates the "Dynamic Context Pruning" innovation for the RTX A2000 (8GB VRAM) architecture. It tests the hypothesis that we can def...
|
03-29 08:01 | Success | - | |
|
exp_2511.12797v2_20260306_155856
|
Benchmark: Modality-Agnostic Symbolic Reasoning (Evo2 Insight)
README.md Benchmark: Modality-Agnostic Symbolic Reasoning (Evo2 Insight) Overview This benchmark validates the hypothesis proposed in **Backfill Candidate 2511.12797v2** (Evo2): that **In-Context Learning (ICL) and symbolic reasoning capabi...
|
03-29 08:01 | Success | - | |
|
exp_2511.12805v1_20260306_155603
|
Benchmark: Sign-augmented Structural Intervention Distance (sSID)
README.md Benchmark: Sign-augmented Structural Intervention Distance (sSID) Overview This benchmark evaluates the computational performance of the **sign-augmented Structural Intervention Distance (sSID)** algorithm as described in Backfill...
|
03-29 08:01 | Success | - | |
|
exp_2511.12808v4_20260306_152739
|
Section 1: README.md
LTLf Dense Reward Benchmark (Backfill 2511.12808v4) Overview This benchmark evaluates the computational overhead of **Quantitative Linear Temporal Logic ($\text{LTL}_f$)** for Reward Shaping in Reinforcement Learning. The innovation replace...
|
03-29 08:01 | Success | - | |
|
exp_2511.12810v1_20260306_162803
|
MSRNet-Inspired Efficiency Benchmark
README.md MSRNet-Inspired Efficiency Benchmark This benchmark evaluates the memory efficiency claims derived from the MSRNet (Multi-Scale Refinement) analysis, specifically comparing **Stacked Architectures** against **Recursive Refinement...
|
03-29 08:01 | Success | - | |
|
exp_2511.12817v2_20260306_134602
|
Here is the runnable benchmark for the FAITH (Knowledge Graph Grounded Evaluation) innovation.
README.md bash: `python benchmark.py`
|
03-29 08:01 | Success | - | |
|
exp_2511.12827v1_20260306_133748
|
ARES Backfill Candidate 2511.12827v1
README.md ARES Backfill Candidate 2511.12827v1 Confidence-Adaptive Bit-Depth Reduction (CABDR) Benchmark **Subject:** Cross-Domain Innovation Transfer (Malware Defense -> Post-Transformer Inference) **Target Hardware:** RTX A2000 (8GB VRAM)...
|
03-29 08:01 | Success | - | |
|
exp_2511.12827v1_20260306_134505
|
Here is the runnable benchmark for the **Confidence-Adaptive Bit-Depth Reduction (CABDR)** innovation.
This benchmark simulates the core "ARES" objective: maximizing inference efficiency on hardware-constrained edge devices (simulated here via dynamic precision switching). It compares a static high-precision model against a dynamic model tha...
|
03-29 08:01 | Success | - | |
|
exp_2511.12836v1_20260306_155429
|
Benchmark: DIGing-SGLD (Decentralized Sampling)
README.md Benchmark: DIGing-SGLD (Decentralized Sampling) Overview This benchmark implements the **DIGing-SGLD** algorithm as described in "Backfill Candidate 2511.12836v1". **Note:** This innovation focuses on **Bayesian Training/Sampling*...
|
03-29 08:01 | Success | - | |
|
exp_2511.12838v1_20260306_163712
|
Co-Sparsify Benchmark
README.md Co-Sparsify Benchmark This benchmark evaluates the **Co-Sparsify** topology-aware sparsification technique for Higher-order Graph Neural Networks (HOGNNs). The Innovation Standard HOGNN layers (2-FWL) require cubic complexity $O(N...
|
03-29 08:01 | Success | - | |
|
exp_2512.14856v2_20260307_125012
|
Benchmark: T5Gemma 2 (Encoder-Decoder) Memory & Throughput
**Architecture:** T5Gemma 2 repurposes the decoder-only Gemma 3 into an **encoder-decoder** architecture via UL2 adaptation, specifically optimized for multimodal and long-context tasks. **Memory Footprint:** The model prioritizes VRAM effi...
|
03-29 08:01 | Success | - | |
|
exp_2512.14865v1_20260307_124833
|
Audio MultiChallenge Benchmark
**Paper:** Audio MultiChallenge (Benchmark) **Architecture & Scope:** This paper introduces **Audio MultiChallenge**, a benchmark for End-to-End (E2E) Spoken Dialogue Systems (SDS) that process raw audio without intermediate transcription....
|
03-29 08:01 | Success | - | |
|
exp_2512.14870v1_20260307_081406
|
Benchmark: ARES Architecture (HERBench Simulation)
**HERBench** introduces a high-complexity VideoQA benchmark requiring the aggregation of at least three temporally separated visual cues. It utilizes a Minimum Required Frame-Set (MRFS) metric averaging 5.5 frames, significantly higher than...
|
03-29 08:01 | Success | - | |
|
exp_2512.14879v1_20260307_155056
|
Here is the runnable benchmark for the Entropy-Reservoir Bregman Projection (ERBP) innovation.
**Architecture:** Proposes Entropy-Reservoir Bregman Projection (ERBP), a theoretical framework for self-referential training. It addresses model collapse via information geometry rather than proposing a new hardware-efficient model archite...
|
03-29 08:01 | Success | - | |
|
exp_2512.14880v1_20260307_105725
|
**Architecture:** Introduces "Task Matrices"—linear transformations that map base model embeddings to specific finetuned states. This allows a single base model to simulate the behavior of multiple specialized models by applying distinct li...
|
03-29 08:01 | Success | - | |
|
exp_2512.14896v1_20260307_104007
|
Benchmark: External RAG Pipeline (Backfill Candidate 2512.14896v1)
**Architecture** DrugRAG is a model-agnostic, three-step Retrieval-Augmented Generation (RAG) pipeline. It functions as an external wrapper, retrieving structured drug knowledge to augment prompts without modifying the underlying LLM archit...
|
03-29 08:01 | Success | - | |
|
exp_2512.14908v5_20260307_113120
|
Benchmark: ATLAS (Adjacency-Free Inference) vs. Traditional GNN
**Architecture:** ATLAS is a propagation-free framework replacing message passing with multi-resolution community features. It utilizes modularity-guided search to identify optimal community scales, projects these structures into embeddings...
|
03-29 08:01 | Success | - | |
|
exp_2512.14910v1_20260306_115659
|
Here is the design for the benchmark based on the "AgroAskAI" analysis and the required "Strategic Pivot" to fit 8GB VRA...
README.md AgroAskAI: Efficiency & VRAM Constraint Benchmark 1. Context & Strategic Pivot (Step 11) The original **AgroAskAI** proposal suggests a Multi-Agent System (MAS) using a Chain-of-Responsibility (Router -> Specialist -> Synthesizer)...
|
03-29 08:01 | Success | - | |
|
exp_2512.14925v2_20260306_140457
|
Here is a runnable benchmark suite designed to validate the VRAM efficiency claims of the MAHA proposal.
**Architecture:** MAHA replaces standard MHSA with a hybrid dilated-convolutional transformer backbone. It utilizes learnable downsampling to partition inputs into hierarchical scales and aggregates attention maps using differentiable conve...
|
03-29 08:01 | Success | - | |
|
exp_2512.14925v2_20260307_081020
|
Here is the runnable benchmark code for the Multiscale Aggregated Hierarchical Attention (MAHA) innovation.
**Architecture:** MAHA replaces standard MHSA with a hybrid dilated-convolutional transformer backbone. It utilizes learnable downsampling to partition inputs into hierarchical scales and aggregates attention maps using differentiable conve...
|
03-29 08:01 | Success | - | |
|
exp_2512.14930v1_20260306_140625
|
RMPMAB-Inspired KV-Cache Eviction Benchmark
**Architecture:** Proposes a Restless Multi-Process Multi-Armed Bandit (RMPMAB) framework. Instead of deep neural networks, it models imaging regions as ensembles of Markov chains to capture biological heterogeneity. It relies on scalable W...
|
03-29 08:01 | Success | - | |
|
exp_2512.14930v1_20260307_124800
|
**Architecture:** Proposes a Restless Multi-Process Multi-Armed Bandit (RMPMAB) framework. Instead of deep neural networks, it models imaging regions as ensembles of Markov chains to capture biological heterogeneity. It relies on scalable W...
|
03-29 08:01 | Success | - | |
|
exp_2512.14938v1_20260306_134152
|
Here is the design for the benchmark, simulating the specific memory efficiencies claimed by the TalkVerse architecture...
**Architecture** The model utilizes a 5B parameter Diffusion Transformer (DiT) built upon Wan2.2. To manage long-form generation, it employs a sliding window mechanism with motion-frame context and a high-compression Video VAE. **Memory Foo...
|
03-29 08:01 | Success | - | |
|
exp_2512.14938v1_20260306_134950
|
Here is the runnable benchmark designed to test the architectural claims of the "TalkVerse" innovation (Sliding Window A...
**Architecture** The model utilizes a 5B parameter Diffusion Transformer (DiT) built upon Wan2.2. To manage long-form generation, it employs a sliding window mechanism with motion-frame context and a high-compression Video VAE. **Memory Foo...
|
03-29 08:01 | Success | - | |
|
exp_2512.14938v1_20260306_152312
|
Benchmark: TalkVerse Efficiency Simulation (Linear Attention + High Compression)
**Architecture** The model utilizes a 5B parameter Diffusion Transformer (DiT) built upon Wan2.2. To manage long-form generation, it employs a sliding window mechanism with motion-frame context and a high-compression Video VAE. **Memory Foo...
|
03-29 08:01 | Success | - | |
|
exp_2512.14938v1_20260307_154128
|
Wan2.2-5B Video Generation Benchmark
**Architecture** The model utilizes a 5B parameter Diffusion Transformer (DiT) built upon Wan2.2. To manage long-form generation, it employs a sliding window mechanism with motion-frame context and a high-compression Video VAE. **Memory Foo...
|
03-29 08:01 | Success | - | |
|
exp_2512.14941v1_20260306_153138
|
Benchmark: Physics-Informed Neural Networks (PINNs) on Complex 3D Geometries
README.md Benchmark: Physics-Informed Neural Networks (PINNs) on Complex 3D Geometries This benchmark evaluates the computational performance of the PINN methodology described in **Backfill Candidate 2512.14941v1**. Context & Strategic Alig...
|
03-29 08:01 | Success | - | |
|
exp_2512.14944v1_20260307_124911
|
Puzzle Curriculum GRPO Benchmark
**Architecture & Methodology** PC-GRPO is a post-training reinforcement learning algorithm for VLMs (tested on Qwen-3B/7B). It eliminates external verifiers by using self-supervised "puzzle" environments (PatchFit, Rotation, Jigsaw) to gene...
|
03-29 08:01 | Success | - | |
|
exp_2512.14946v1_20260306_152604
|
EVICPRESS Memory Optimization Benchmark
**Summary for ARES 8GB Roadmap:** * **Architecture:** A multi-tier KV management system (GPU VRAM to CPU RAM) that jointly optimizes eviction and lossy compression. It utilizes a "unified utility function" to balance quality loss against la...
|
03-29 08:01 | Success | - | |
|
exp_2512.14946v1_20260307_094824
|
EVICPRESS Benchmark Simulation
**Summary for ARES 8GB Roadmap:** * **Architecture:** A multi-tier KV management system (GPU VRAM to CPU RAM) that jointly optimizes eviction and lossy compression. It utilizes a "unified utility function" to balance quality loss against la...
|
03-29 08:01 | Success | - | |
|
exp_2512.14954v1_20260307_090240
|
**Summary for ARES 8GB Roadmap** **Architecture:** Proposes a probabilistic framework to align teacher and student probability spaces across distinct tokenizers. By exploiting the recursive structure of Byte-Pair Encoding (BPE), it enables...
|
03-29 08:01 | Success | - | |
|
exp_2512.14961v3_20260307_090321
|
Here is the runnable benchmark designed for Backfill Candidate 2512.14961v3 (Hybrid Multimodal Fusion).
**Architecture:** Utilizes a hybrid trimodal framework (face, voice, motion) with independent encoders feeding into a cross-attention and gated fusion module. It employs a single classification head with a confidence-weighted strategy to dy...
|
03-29 08:01 | Success | - | |
|
exp_2601.10859v1_20260306_165652
|
Project ARES: Topology-Inspired KV-Cache Routing
README.md Project ARES: Topology-Inspired KV-Cache Routing **Innovation:** Application of Structural Topology Optimization to LLM Inference. Overview This benchmark validates the concept of using a lightweight "Router Agent" (inspired by th...
|
03-29 08:01 | Success | - | |
|
exp_2601.10873v1_20260306_165755
|
Benchmark: Unit-Consistent (UC) Backpropagation vs. Standard Backprop
README.md Benchmark: Unit-Consistent (UC) Backpropagation vs. Standard Backprop **Innovation:** Backfill Candidate 2601.10873v1 **Context:** 8GB Efficiency Frontier (RTX A2000 Class) Overview Standard backpropagation in ReLU networks suffer...
|
03-29 08:01 | Failed | RuntimeError: t() expects a tensor with <= 2 dimensions, but self is 3D | |
|
exp_2601.10880v1_20260306_140427
|
Benchmark: Medical SAM3 VRAM Efficiency Frontier
README.md Benchmark: Medical SAM3 VRAM Efficiency Frontier **Candidate ID:** 2601.10880v1 **Subject:** Medical SAM3 (3D Transformer Adaptation) **System Constraints:** 8GB VRAM Limit (RTX A2000 Class) **ARES Verdict:** DO NOT IMPLEMENT (Har...
|
03-29 08:01 | Success | - | |
|
exp_2601.10905v1_20260306_171517
|
Benchmark: Action Shapley Data Selection Efficiency
README.md Benchmark: Action Shapley Data Selection Efficiency **Overview** This benchmark evaluates the computational efficiency of the **Action Shapley** data selection methodology introduced in the paper *2601.10905v1*. **Innovation Summa...
|
03-29 08:01 | Success | - | |
|
exp_2601.11557v1_20260307_154325
|
Benchmark: Information-Theoretic Binarization (MIB) vs. Float32 HNSW
**Architecture:** Replaces the standard "HNSW + float32" stack with **Maximally Informative Binarization (MIB)**. The system utilizes exhaustive search over 1-bit binary vectors using bitwise distance metrics and Information-Theoretic Scori...
|
03-29 08:01 | Success | - | |
|
exp_2601.11657v1_20260306_163120
|
D-PARC Innovation Benchmark
README.md D-PARC Innovation Benchmark This benchmark evaluates the **D-PARC (Deformable Physics-Aware Recurrent Convolutions)** methodology. It simulates the core "Active Filtration" and "Smarter, Not Bigger" paradigm by comparing a standar...
|
03-29 08:01 | Success | - | |
|
exp_2601.11659v1_20260306_115302
|
Benchmark: Llama 4 Hybrid MoE vs. Dense Inference (8GB VRAM Optimization)
README.md Benchmark: Llama 4 Hybrid MoE vs. Dense Inference (8GB VRAM Optimization) Overview This benchmark evaluates the **Hardware Awareness** and **Efficiency Frontier** improvements proposed for the Llama 4-inspired architecture on cons...
|
03-29 08:01 | Success | - | |
|
exp_2601.11659v1_20260306_134246
|
This benchmark evaluates the efficiency of the proposed Llama 4-style Mixture of Experts (MoE) architecture against trad...
README.md This benchmark evaluates the efficiency of the proposed Llama 4-style Mixture of Experts (MoE) architecture against traditional Dense Transformers within an 8GB VRAM constraint (simulated here for adaptability). Objective To valid...
|
03-29 08:01 | Success | - | |
|
exp_2601.11660v1_20260306_140352
|
MBU-Net Efficiency Benchmark
README.md MBU-Net Efficiency Benchmark **Innovation:** Masked Binary U-Net (MBU-Net) / "Backfill Candidate 2601.11660v1" **Objective:** Validate memory footprint reduction and inference efficiency via "Cost-Aware Masked Binary Quantization"...
|
03-29 08:01 | Success | - | |
|
exp_2601.11663v1_20260306_170834
|
Benchmark: Unified Activation Sensitivity Framework (ARES Strategy)
README.md Benchmark: Unified Activation Sensitivity Framework (ARES Strategy) Overview This benchmark evaluates the "Unified Activation Sensitivity" innovation described in *Backfill Candidate 2601.11663v1*. **The Innovation:** The paper pr...
|
03-29 08:01 | Success | - | |
|
exp_2601.11664v1_20260306_134736
|
Here is the runnable benchmark code designed to simulate and evaluate the metrics presented in the "Serverless AI Shield...
While the analysis log suggests skipping this for the local 8GB VRAM objective, the benchmark below validates the paper's specific claims regarding cloud-based FaaS security (Detection Rate and Latency Overhead) in a simulated environment....
|
03-29 08:01 | Success | - | |
|
exp_2602.13871v1_20260306_163409
|
Here is the design for the **Ens-CGP (Ensemble Conditional Gaussian Process)** benchmark, specifically tailored for the...
Design Rationale To properly benchmark this innovation without requiring the implementation of the full mathematical engine, we focus on the **computational complexity** and **memory footprint** claims of the paper: 1. **Standard Transforme...
|
03-29 08:01 | Success | - | |
|
exp_2602.13914v1_20260306_135050
|
bash: `python benchmark.py`
|
03-29 08:01 | Success | - | |
|
exp_2602.13914v1_20260306_163305
|
Benchmark: Polytopological Propositional Dynamic Logic (PDL) Evaluator
README.md Benchmark: Polytopological Propositional Dynamic Logic (PDL) Evaluator Context This benchmark evaluates the computational feasibility of the **Polytopological Propositional Dynamic Logic** system proposed in the paper "Backfill Ca...
|
03-29 08:01 | Success | - | |
|
exp_2602.13921v1_20260306_164003
|
GREPO-Lite: VRAM Efficiency Benchmark
README.md GREPO-Lite: VRAM Efficiency Benchmark This benchmark evaluates the memory efficiency and processing speed of a Graph Neural Network (GNN) architecture similar to GREPO, designed for repository-level bug localization. **Objective**...
|
03-29 08:01 | Failed | RuntimeError: The expanded size of the tensor (50000) must match the existing size (250000) at non-singleton dimension 0. Target sizes: [50000, 256]. Tensor sizes: [250000, 1] | |
|
exp_2603.00084v2_20260306_165828
|
**Title:** Benchmark: DeepXiv-SDK (Structured JSON vs. Unstructured PDF)
README.md **Title:** Benchmark: DeepXiv-SDK (Structured JSON vs. Unstructured PDF) **Description:** This benchmark evaluates the **VRAM efficiency** of the proposed "DeepXiv-SDK" innovation. The core hypothesis is that shifting from Unstruc...
|
03-29 08:01 | Success | - | |
|
exp_2603.05437v1_20260306_163229
|
bash: `python benchmark.py`
|
03-29 08:01 | Success | - | |
|
exp_2603.05451v1_20260306_165623
|
Backfill Candidate 2603.05451v1: A2000 "Low-Mem" Adapter
README.md Backfill Candidate 2603.05451v1: A2000 "Low-Mem" Adapter Innovation Summary This benchmark validates a derivative strategy extracted from **FlashAttention-4** (Candidate 2603.05451v1), adapted for the **RTX A2000 (Ampere)** archit...
|
03-29 08:01 | Failed | UnboundLocalError: cannot access local variable 'attn_out' where it is not associated with a value | |
|
exp_2603.05459v1_20260306_134034
|
DEBISS Corpus Stress Test Benchmark
README.md DEBISS Corpus Stress Test Benchmark **Innovation:** Backfill Candidate 2603.05459v1 (DEBISS Multi-Modal Corpus) **Assessment:** High-Compute Load Data Resource **Objective:** To provide a synthetic, runnable simulation of the memo...
|
03-29 08:01 | Success | - | |
|
exp_2603.05462v1_20260306_133935
|
Benchmark: NCTB-QA Bangla Reading Comprehension
This benchmark evaluates the performance of Transformer models (specifically BERT) on the **NCTB-QA** task (Bangla Reading Comprehension with unanswerable questions). As the full NCTB-QA dataset (87,805 pairs) requires external file handlin...
|
03-29 08:01 | Success | - | |
|
exp_2603.05462v1_20260306_134402
|
2603.05462v1
No summary available yet.
|
03-29 08:01 | Success | - | |
|
exp_2603.05462v1_20260306_152102
|
Benchmark: NCTB-QA Baseline (Backfill Candidate 2603.05462v1)
README.md Benchmark: NCTB-QA Baseline (Backfill Candidate 2603.05462v1) Overview This benchmark evaluates the computational efficiency of the method described in the paper "NCTB-QA: A Large-Scale Dataset for Low-Resource Language Question A...
|
03-29 08:01 | Success | - | |
|
exp_2603.05468v1_20260306_163914
|
Benchmark: Neural Quantum Estimator with Kraus Constraints
README.md Benchmark: Neural Quantum Estimator with Kraus Constraints Overview This benchmark evaluates the **"Kraus-structured output layer"** innovation, applied to a **Mamba-like Linear SSM architecture**. It simulates the inference effic...
|
03-29 08:01 | Failed | RuntimeError: expected a matrix | |
|
exp_2603.05485v1_20260306_163742
|
Here is the runnable benchmark design. Per the analysis that running a full Judge model is infeasible for 8GB VRAM, this...
No summary available yet.
|
03-29 08:01 | Success | - | |
|
exp_2603.05495v1_20260306_231345
|
README.md bash: `pip install torch numpy scipy`; `python benchmark.py`
|
03-29 08:01 | Success | - | |
|
exp_2603.05498v1_20260306_082531
|
2603.05498v1
No summary available yet.
|
03-29 08:01 | Success | - | |
|
exp_2603.05504v1_20260306_155252
|
No summary available yet.
|
03-29 08:01 | Success | - | |
|
exp_2603.05507v1_20260307_161715
|
Transformer-Based Inpainting for Sparse 3D Streaming Benchmark
README.md Transformer-Based Inpainting for Sparse 3D Streaming Benchmark This benchmark evaluates the performance of a simplified, synthetic implementation of the proposed "Transformer-Based Inpainting" module designed for real-time 3D stre...
|
03-29 08:01 | Success | - | |
|
exp_core_304987179_20260307_080420
|
Benchmark: RazorAttention KV Cache Compression
README.md Benchmark: RazorAttention KV Cache Compression This repository contains a lightweight, runnable benchmark simulating the **RazorAttention** technique for efficient KV cache compression. Overview RazorAttention optimizes Long-Conte...
|
03-29 08:01 | Success | - | |
|
exp_cr_10.1007_s12046-026-03064-1_20260307_082916
|
Benchmark: Hybrid EfficientNet-B7 + ViT Candidate
README.md Benchmark: Hybrid EfficientNet-B7 + ViT Candidate **Candidate ID:** cr_10.1007_s12046-026-03064-1 Overview This benchmark evaluates the computational feasibility of the proposed hybrid architecture combining **EfficientNet-B7** wi...
|
03-29 08:01 | Pending | - | |
|
exp_cr_10.1007_s12046-026-03064-1_20260307_083140
|
Here is the runnable benchmark design for the candidate innovation.
No summary available yet.
|
03-29 08:01 | Success | - | |
|
exp_cr_10.1007_s42399-026-02316-9_20260307_162004
|
bash: `pip install torch transformers`; `python benchmark.py`
|
03-29 08:01 | Success | - | |
|
exp_cr_10.1038_s41598-026-39986-3_20260307_095821
|
README.md MI-SOH: Multi-scale Inverted Transformer Benchmark Overview This benchmark implements the **MI-SOH (Multi-scale Inverted Transformer for State-of-Health)** architecture described in the innovation candidate. It combines **dila...
|
03-29 08:01 | Success | - | |
|
exp_cr_10.1038_s41698-025-01103-4_20260307_113300
|
Benchmark: LLM-AIx (Local Information Extraction)
**Summary: LLM-AIx Pipeline for Oncology** * **Architecture:** The paper outlines **LLM-AIx**, a software protocol acting as a wrapper for open-source, privacy-preserving LLMs. It is designed to extract structured clinical data (e.g., TNM s...
|
03-29 08:01 | Success | - | |
|
exp_cr_10.1088_1361-6501_ae46b7_20260307_081748
|
Here is the benchmark for the Bi-Mamba Time Series Regression architecture.
bash: `python benchmark.py`
|
03-29 08:01 | Success | - | |
|
exp_cr_10.1145_3768167_20260307_105805
|
Section 1: README.md
**Architecture** The paper proposes a Graph-Transformer Network (GTN) acting as a surrogate model for circuit topology optimization. It encodes circuit physics specifically—voltage changes in loops and current flows—directly into graph embe...
|
03-29 08:01 | Success | - | |
|
exp_cr_10.1515_jiip-2022-0050_20260307_160722
|
Benchmark: Multi-Fidelity Elasticity Surrogate (cr_10.1515_jiip-2022-0050)
**Architecture** Proposes a multi-fidelity framework combining a low-fidelity Deep Neural Network (DNN) surrogate with a high-fidelity physical model for Bayesian inference on elastic properties. The DNN handles the bulk of the prior distri...
|
03-29 08:01 | Success | - | |
|
exp_cr_10.1609_aaai.v38i12.29197_20260307_083540
|
Benchmark: Excel Transformer (60M Params)
**Architecture:** FLAME is a 60M parameter Transformer optimized specifically for Excel formulas. Key architectural differentiators include an Excel-specific tokenizer and domain-adapted pre-training objectives: masked span prediction and n...
|
03-29 08:01 | Success | - | |
|
exp_cr_10.1609_aaai.v38i16.29765_20260307_124634
|
Benchmark: The Lens of Perturbation in LLM Quantization
**Architecture:** Introduces a "perturbation lens" framework, analyzing quantization error as additive noise to weights and activations. This theory supports a non-uniform quantization scheme that adapts grid spacing to activation sensitivi...
|
03-29 08:01 | Success | - | |
|
exp_cr_10.1609_aaai.v38i17.29815_20260307_124257
|
Benchmark: Norm Tweaking for Low-Bit LLM Quantization
**Architecture:** A plugin for existing Post-Training Quantization (PTQ) pipelines. It does not alter core Transformer blocks but modifies Layer Normalization weights. The method aligns the distribution of quantized activations with their f...
|
03-29 08:01 | Success | - | |
|
exp_cr_10.1609_aaai.v38i17.29822_20260307_161200
|
Benchmark: LatestEval Dynamic Evaluation Protocol
README.md Benchmark: LatestEval Dynamic Evaluation Protocol **Innovation:** LatestEval (AAAI 2024) **Concept:** A dynamic evaluation protocol that constructs tests from "future" data (published after model training cutoffs) to mitigate data...
|
03-29 08:01 | Success | - | |
|
exp_cr_10.1609_aaai.v38i21.30443_20260307_110307
|
Benchmark: Structured Prompting for Bias Mitigation
**Summary for ARES 8GB Roadmap** * **Architecture:** This research proposes a **software-layer methodology** rather than a neural architecture. It utilizes existing Transformer-based models, relying on structured prompt engineering (context...
|
03-29 08:01 | Success | - | |
|
exp_cr_10.2196_67967_20260307_081828
|
Benchmark for Backfill Candidate: cr_10.2196_67967
**Architecture:** The study evaluates a fine-tuned `scispaCy` model against two domain-specific LLMs: **NYUTron** (110M parameters) and **GatorTron** (345M parameters). Both are highly optimized "tiny" architectures suitable for clinical NL...
|
03-29 08:01 | Success | - | |
|
exp_cr_10.24252_literatify.v5i1.44458_20260307_094222
|
Benchmark: Classical VSM Retrieval-Augmented Generation (RAG)
**Report: Literature Review on Vector Space Models (VSM)** **Type:** Literature Review (Traditional Information Retrieval) **Relevance:** Low (Non-Neural), but applicable to RAG preprocessing. * **Architecture:** Analyzes the classic **Vect...
|
03-29 08:01 | Success | - | |
|
exp_cr_10.24425_jppr.2024.151253_20260307_102906
|
Benchmark: Hybrid Swin-Transformer YOLOv5 vs. Standard CNN
**Architecture:** Modifies the YOLOv5m baseline by integrating a Swin Transformer (Swin-T) module into the backbone network. It also utilizes K-means++ for anchor optimization and Efficient IoU (EIoU) loss to improve bounding box regression...
|
03-29 08:01 | Success | - | |
|
exp_cr_10.29019_enfoqueute.1204_20260307_071307
|
README.md bash: `pip install torch transformers accelerate psutil`; `python benchmark.py`
|
03-29 08:01 | Pending | - | |
|
exp_cr_10.29019_enfoqueute.1204_20260307_110702
|
This benchmark evaluates the performance efficiency of **Mamba**, a State Space Model (SSM), compared to a traditional T...
README.md This benchmark evaluates the performance efficiency of **Mamba**, a State Space Model (SSM), compared to a traditional Transformer architecture (GPT-2). The innovation of Mamba lies in its ability to maintain linear time complexit...
|
03-29 08:01 | Success | - | |
|
exp_cr_10.3233_mas-221411_20260307_124500
|
README.md Benchmark: Bayesian Inference with Smoothed Dirichlet Priors This repository contains a runnable benchmark designed to evaluate the computational performance and accuracy of Bayesian inference using **Smoothed Dirichlet Priors** o...
|
03-29 08:01 | Success | - | |
|
exp_cr_10.3389_frobt.2025.1518965_20260307_080556
|
Model Compression Benchmark: Precision Reduction & Pruning
This paper provides a comprehensive methodological framework for optimizing Large Language Models (LLMs) within the ARES 8GB hardware constraints. As a survey, it does not propose a specific architecture but evaluates compression techniques...
|
03-29 08:01 | Success | - | |
|
exp_cr_10.3390_agronomy14040673_20260307_085702
|
Benchmark: Hybrid CNN-Transformer for Agronomy (cr_10.3390_agronomy14040673)
**Architecture:** Hybrid framework combining a Densely Connected CNN for multilevel local feature extraction with a Transformer module for global context capture. A Cycle-GAN is utilized for training data augmentation but is excluded during...
|
03-29 08:01 | Success | - | |
|
exp_cr_10.3390_app14188526_20260307_103037
|
SA-LSTM Time Series Regression Benchmark
**Summary for ARES 8GB Roadmap** * **Architecture:** The paper proposes a hybrid **Long Short-Term Memory (LSTM)** network integrated with a **Self-Attention Mechanism (SA-LSTM)**. This architecture weights specific time-steps in the input...
|
03-29 08:01 | Success | - | |
|
exp_cr_10.3390_designs10020030_20260307_103414
|
Benchmark: Local VLM Viability on ARES 8GB Roadmap
README.md Benchmark: Local VLM Viability on ARES 8GB Roadmap **Context:** The target candidate (`cr_10.3390_designs10020030`) proposes a cloud-centric hybrid architecture utilizing the ChatGPT API. The review highlights that this is **Low F...
|
03-29 08:01 | Success | - | |
|
exp_cr_10.3390_electronics13183710_20260307_082805
|
Section 1: README.md
**Architecture:** Hybrid model utilizing multi-scale frequency decomposition. High-frequency data is processed via a Temporal GNN with an Adaptive Graph Learning module, while low-frequency data uses a Bidirectional Temporal Network, fused...
|
03-29 08:01 | Success | - | |
|
exp_cr_10.3390_en18184924_20260307_113225
|
Section 1: README.md
**Architecture:** The proposed model is a hybrid statistical system combining Monte Carlo filters for state estimation with a clustering algorithm (likely K-Means or similar) for outlier removal and forecasting. It is not a neural network o...
|
03-29 08:01 | Success | - | |
|
exp_cr_10.3390_math12182941_20260307_083419
|
Benchmark: Arabic Transformer Ensemble (AMFND)
**Architecture:** Proposes a weighted-average ensemble of five heterogeneous Arabic Transformers (AraBERT, MARBERT, AraELECTRA, AraGPT2, ARBERT). **Memory Footprint:** **Critical Bottleneck.** Concurrently loading five distinct encoder/deco...
|
03-29 08:01 | Success | - | |
|
exp_cr_10.3390_rs17183200_20260307_085344
|
TransMambaCNN Benchmark
**Architecture** TransMambaCNN utilizes a dual-branch topology to fuse global and local spatiotemporal features. The global branch replaces standard self-attention with a **Convolutional State-Space Module (C-SSM)**, combining an Attentive...
|
03-29 08:01 | Success | - | |
|
exp_cr_10.3390_rs18050793_20260307_081336
|
Benchmark design: underwater fusion architecture with Variable Mixture-of-Experts (vMoE)
README.md: run `python benchmark.py` (bash).
|
03-29 08:01 | Success | - | |
|
exp_cr_10.3390_rs18050793_20260307_081511
|
Benchmark: Underwater Fusion vMoE (cr_10.3390_rs18050793)
README.md Benchmark: Underwater Fusion vMoE (cr_10.3390_rs18050793) Overview This benchmark evaluates the performance characteristics of the **Variable Mixture-of-Experts (vMoE)** mechanism proposed for fusing camera and sonar data in under...
|
03-29 08:01 | Success | - | |
|
exp_cr_10.3390_s24072091_20260307_161606
|
Benchmark: Bayesian Neural Network (BNN) Surrogate for Structural Health Monitoring
**Paper Analysis: BNNs for Structural Health Monitoring (SHM)** **Architecture:** The paper proposes a **Bayesian Neural Network (BNN)** utilizing probabilistic inference to predict structural displacement. It operates within a "dual-drive"...
|
03-29 08:01 | Success | - | |
|
exp_cr_10.3390_s25185786_20260307_085434
|
MFT-Net: Hybrid CNN-Transformer Benchmark
**Architecture** The paper proposes MFT-Net, a hybrid architecture that integrates a Convolutional Neural Network (CNN) for local feature extraction with a Transformer module for global dependency modeling. It utilizes Squeeze-and-Excitatio...
|
03-29 08:01 | Success | - | |
|
exp_cr_10.3390_s25185805_20260307_125105
|
**Architecture:** Uses a customized **BLIP-2** framework with a Q-Former to fuse heterogeneous inputs (visual frames, kinematic data) into low-dimensional embeddings representing "task demand" and "driving capability" within a shared latent...
|
03-29 08:01 | Pending | - | |
|
exp_cr_10.3390_s25185805_20260307_155403
|
This benchmark evaluates the performance characteristics of the BLIP-2 architecture when utilized for embedding extracti...
**Architecture:** Uses a customized **BLIP-2** framework with a Q-Former to fuse heterogeneous inputs (visual frames, kinematic data) into low-dimensional embeddings representing "task demand" and "driving capability" within a shared latent...
|
03-29 08:01 | Success | - | |
|
exp_cr_10.3390_sym17030471_20260307_154914
|
Benchmark: Improved Model-Free Adaptive Predictive Control (MFAPC) under DoS and Quantization
**Verdict: Incompatible** This paper addresses **Control Theory** (Model-Free Adaptive Predictive Control), not Deep Learning. It focuses on networked cyber-physical systems under DoS attacks and does not describe a neural network architect...
|
03-29 08:01 | Success | - | |
|
exp_cr_10.36724_2072-8735-2024-18-3-41-49_20260307_110401
|
Backfill Candidate: cr_10.36724_2072-8735-2024-18-3-41-49
**Status: Irrelevant** This paper addresses **telecommunications protocols** (specifically queueing theory and traffic shaping for high-throughput satellites), not Deep Learning. * **Architecture:** N/A. The paper proposes a mathematical pr...
|
03-29 08:01 | Success | - | |
|
exp_cr_10.3897_jucs.94657_20260307_160925
|
PlantKViT Architecture Benchmark
PlantKViT Architecture Benchmark This benchmark evaluates the performance characteristics of the **PlantKViT** hybrid architecture (Vision Transformer + KNN Classifier). Architecture Overview The benchmark simulates the deployment scenario...
|
03-29 08:01 | Success | - | |
|
exp_cr_10.51519_journalisi.v7i1.1024_20260307_093251
|
IT-Based Knowledge Sharing System with LLM Integration
**Subject:** IT-Based Knowledge Sharing System with LLM Integration **Architecture:** Conceptual system architecture proposing the integration of Large Language Models (specifically ChatGPT) into university IT ticketing systems. The design...
|
03-29 08:01 | Pending | - | |
|
exp_cr_10.51519_journalisi.v7i1.1024_20260307_095059
|
Benchmark: Local Knowledge Sharing System (RAG-Lite)
**Subject:** IT-Based Knowledge Sharing System with LLM Integration **Architecture:** Conceptual system architecture proposing the integration of Large Language Models (specifically ChatGPT) into university IT ticketing systems. The design...
|
03-29 08:01 | Success | - | |
|
exp_cr_10.52783_jisem.v10i3.4744_20260307_083344
|
This benchmark evaluates the computational efficiency of a hybrid **Enhanced Vision Transformer (EViT) + BiLSTM** archit...
**Architecture:** The paper proposes a hybrid architecture combining an Enhanced Vision Transformer (EViT) with a Bidirectional LSTM (BiLSTM) for glaucoma detection. The EViT extracts global spatial features, while the BiLSTM processes sequ...
|
03-29 08:01 | Success | - | |
|
exp_cr_10.55041_ijsrem57223_20260307_103235
|
BiLAT Architecture Benchmark
This benchmark implements a representative **BiLAT** (Bidirectional LSTM with Attention and Transformer components) model to verify the architectural claims regarding memory footprint and inference speed. Architecture Details The implemente...
|
03-29 08:01 | Success | - | |
|
exp_cr_10.58414_scientifictemper.2025.16.2.03_20260307_110040
|
Summary of reasoning
**Analysis for ARES 8GB Roadmap** * **Architecture:** The MRMGKTL model combines a standard Transformer encoder with a Gaussian Kernel classifier. Crucially, it utilizes a pre-processing pipeline involving Sokal–Michener’s multivariate reli...
|
03-29 08:01 | Success | - | |
|
exp_gh_Dao-AILab_flash-attention_20260307_164230
|
This repository contains a minimal benchmark to evaluate the performance and memory efficiency of **Dao-AILab/flash-atte...
README.md This repository contains a minimal benchmark to evaluate the performance and memory efficiency of **Dao-AILab/flash-attention**. Overview Flash Attention is an exact attention algorithm that significantly reduces memory usage (HB...
|
03-29 08:01 | Success | - | |
|
exp_gh_EvanVOSSIER_birdnet-onnx-converter_20260307_215337
|
Benchmark: EvanVOSSIER/birdnet-onnx-converter
README.md Benchmark: EvanVOSSIER/birdnet-onnx-converter This benchmark evaluates the inference performance of BirdNET models converted to ONNX format. It focuses on measuring the throughput (audio processed per second) and memory usage (VRA...
|
03-29 08:01 | Success | - | |
|
exp_gh_huggingface_transformers_20260307_170900
|
Hugging Face Transformers Inference Benchmark
README.md Hugging Face Transformers Inference Benchmark This repository contains a focused benchmark designed to evaluate the inference performance of the `huggingface/transformers` library. The objective is to measure the efficiency of a s...
|
03-29 08:01 | Success | - | |
|
exp_gh_robloxexploiterponole_aegis-trainer_20260308_000255
|
AEGIS AI Trainer: Layer-Streaming Benchmark
README.md AEGIS AI Trainer: Layer-Streaming Benchmark This benchmark demonstrates the core innovation behind **AEGIS AI Trainer**: the ability to train massive Mixture of Experts (MoE) and dense models (80B+ parameters) on consumer hardware...
|
03-29 08:01 | Success | - | |
|
exp_gh_svg-project_Sparse-VideoGen_20260307_231058
|
Benchmark design: **svg-project/Sparse-VideoGen**
This benchmark focuses on the core efficiency claim: replacing Dense Global Attention with Sparse Sliding-Window Attention to reduce VRAM usage and increase throughput in Video Diffusion Transformers. --- README.md Benchmark: Sparse VideoGe...
|
03-29 08:01 | Success | - | |
|
exp_gh_vllm-project_vllm_20260307_162231
|
vLLM Benchmark Suite
README.md vLLM Benchmark Suite This benchmark evaluates the inference performance of **vLLM**, a high-throughput and memory-efficient inference engine. It focuses on measuring the engine's ability to manage KV Cache memory (PagedAttention)...
|
03-29 08:01 | Success | - | |
|
exp_hf_2603.03942_20260308_040339
|
Benchmark: Lightweight Visual Reasoning Feedback Loop
This benchmark simulates the architectural difference between a standard Vision-Language Model (VLM) and the proposed **Lightweight Visual Reasoning** approach. **The Innovation:** The paper introduces a "language-to-vision feedback module"...
|
03-29 08:01 | Success | - | |
|
exp_hf_2603.04800_20260307_070100
|
Benchmark: MASQuant Modality-Aware Quantization
README.md Benchmark: MASQuant Modality-Aware Quantization This benchmark evaluates the performance characteristics of **MASQuant (Modality-Aware Smoothing Quantization)** principles applied to a Multimodal Large Language Model (MLLM) archit...
|
03-29 08:01 | Pending | - | |
|
exp_hf_2603.04800_20260307_070855
|
**Benchmark: MASQuant (Modality-Aware Smoothing Quantization)**
README.md **Benchmark: MASQuant (Modality-Aware Smoothing Quantization)** This repository provides a lightweight, synthetic benchmark to evaluate the core performance benefits of **MASQuant**, specifically focusing on its ability to handle...
|
03-29 08:01 | Success | - | |
|
exp_hf_2603.04800_20260307_164050
|
MASQuant Benchmark Suite
README.md MASQuant Benchmark Suite This benchmark evaluates the **Modality-Aware Smoothing Quantization (MASQuant)** framework. Overview MASQuant addresses "Smoothing Misalignment" and "Cross-Modal Computational Invariance" in Multimodal LL...
|
03-29 08:01 | Success | - | |
|
exp_oa_W4415031789_20260307_090407
|
Here is the benchmark design to validate the findings of the T2I survey paper (Backfill Candidate oa_W4415031789), speci...
**Architecture:** Surveys 141 T2I works (2021–2024), categorizing them into Autoregressive, GAN, and Diffusion foundations. Highlights **Mamba** and Multimodality as emerging architectures for future performance gains, potentially offering...
|
03-29 08:01 | Success | - | |
|
exp_oa_W4415248384_20260307_081445
|
Innovation Benchmark: Mamba vs Transformer for 6G Edge Inference
**Subject:** Analysis of *A Comprehensive Survey of Large AI Models for Future Communications* This survey evaluates Large AI Models (LAMs) for 6G, reviewing **Transformers, Diffusion, and Mamba** architectures. Key takeaways for the ARES 8...
|
03-29 08:01 | Success | - | |
|
exp_oa_W4415248384_20260307_081717
|
Thought Process for Code Generation:
**Subject:** Analysis of *A Comprehensive Survey of Large AI Models for Future Communications* This survey evaluates Large AI Models (LAMs) for 6G, reviewing **Transformers, Diffusion, and Mamba** architectures. Key takeaways for the ARES 8...
|
03-29 08:01 | Success | - | |
|
exp_oa_W7133137559_20260308_082020
|
Section 1: README.md
**Architecture:** Theoretical analysis of Transformer embeddings and the $O(n^2)$ complexity of attention mechanisms. Reviews optimization techniques including token pruning, sparse attention, and long-context extensions. **Memory Footprint...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260307082736.001_20260307_082807
|
Python Skill Fallback
Title: Automated Package Builder and Strict Type Verifier - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260307083022.001_20260307_083054
|
Strictly-Typed Package Dependency Resolver
README.md Strictly-Typed Package Dependency Resolver Overview This benchmark implements a robust dependency resolution engine using Python's standard `typing` module. It leverages `Protocol`, `TypedDict`, and Type Aliases to enforce strict...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260307083656.002_20260307_083722
|
Overview
README.md Overview This benchmark evaluates the implementation of a **Dynamic Generic Plugin Loader** utilizing **PEP 695 Type Parameter Syntax** (available in Python 3.12+). Key Features 1. **PEP 695 Implementation**: Defines `PluginRegist...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260307085212.003_20260307_085251
|
Strict Type-Hinted Package Builder and Validator
README.md Strict Type-Hinted Package Builder and Validator Description This benchmark tests an autonomous coding system's ability to programmatically construct a PEP 561 compliant Python package and validate its structural and type integrit...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260307090113.004_20260307_090149
|
Dynamic Package Scaffolder with Runtime Type Verification
This benchmark evaluates an agent's ability to programmatically generate a Python package structure that adheres to packaging standards (PEP 8) and utilizes advanced typing protocols (PEP 484/585). Objective Implement a function `build_and_...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260307093135.001_20260307_093213
|
Self-Introspecting Typed Plugin System
README.md Self-Introspecting Typed Plugin System This benchmark demonstrates a robust, self-contained plugin architecture using Python's standard library. It simulates a Python package environment by dynamically generating plugin modules at...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260307094021.001_20260307_094050
|
This benchmark demonstrates the creation of a dynamic, in-memory Python package structure without writing files to disk....
README.md This benchmark demonstrates the creation of a dynamic, in-memory Python package structure without writing files to disk. It utilizes `sys.modules` and `types` to simulate a package named `internal_plugins` containing dynamically g...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260307094405.001_20260307_094459
|
Design rationale:
The `benchmark.py` script is designed to fulfill the "Runtime-Verified Plugin Architecture" requirement. 1. **Typing**: It defines a `DataProcessor[T]` Protocol using `typing` module features. 2. **Packaging**: It uses `pathlib` to create a...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260307094639.001_20260307_094713
|
Python Skill Fallback
Title: Strictly Typed Plugin Architecture with Packaging Simulation - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260307095241.002_20260307_095316
|
PEP 695 Generic Resource Pool Benchmark
This benchmark evaluates the implementation of a generic resource pool using Python 3.12+'s PEP 695 Type Parameter Syntax. Overview **Hypothesis**: Utilizing Python 3.12+ Type Parameter Syntax allows for more concise and maintainable generi...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260307095939.003_20260307_100015
|
Benchmark: Dynamic Package Loader with Protocol Enforcement
README.md Benchmark: Dynamic Package Loader with Protocol Enforcement Objective To evaluate the ability of a Python system to dynamically load code from a temporary file system structure and enforce strict type safety using `typing.Protocol...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260307100559.004_20260307_100638
|
Robust CLI Configuration Merger
README.md Robust CLI Configuration Merger Objective This benchmark evaluates the ability to write a robust, type-safe Python utility that performs a recursive deep merge of JSON configurations. The solution must adhere to strict static typi...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260307102243.005_20260307_102322
|
Python Skill Fallback
Title: Robust Typed Plugin Loader with Namespace Inspection - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260307102938.006_20260307_102957
|
Python Skill Fallback
Title: Typing-Driven Dynamic Plugin Loader - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260307103844.007_20260307_103921
|
Type-Safe Generic Component Registry Benchmark
README.md Title: Type-Safe Generic Component Registry Benchmark Description This benchmark evaluates the implementation of a modular, type-safe dependency-injection style registry system using Python's standard library. It focuses on struct...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260307104523.008_20260307_104609
|
Python Skill Fallback
Title: Generic Task Queue with Package Metadata - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260307105159.009_20260307_105234
|
Self-Contained ZipApp Generator with Type Safety
This benchmark tests the ability to programmatically generate a strictly-typed Python package structure, compile it into an executable Zip Application (`.pyz`) using the standard library, and verify its execution integrity. Requirements - Py...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260307105923.010_20260307_105948
|
Dynamic Plugin Loader with Strict Protocol Validation
README.md Dynamic Plugin Loader with Strict Protocol Validation Overview This benchmark demonstrates a robust, zero-dependency plugin architecture in Python. It utilizes `importlib` for dynamic discovery and loading of modules from a tempor...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260307110558.011_20260307_110624
|
Type-Safe Extensible Log Formatter
This coding drill evaluates a system's ability to design a robust, extensible logging architecture using Python's advanced type hinting system. The focus is on defining structural interfaces (`Protocol`), creating generic containers for dyn...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260307112731.012_20260307_112753
|
Auto-Registry System Benchmark
README.md Auto-Registry System Benchmark This benchmark evaluates the implementation of a robust, dynamic class registry system using Python's standard library. It simulates a modular plugin architecture, similar to those found in Hugging F...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260307113452.013_20260307_113530
|
Generic Registry with Dynamic Module Discovery
README.md Generic Registry with Dynamic Module Discovery This benchmark demonstrates a decoupled plugin architecture using Python's standard library. Design Philosophy Modern frameworks require extensibility without modifying core logic. Th...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260307124109.014_20260307_124125
|
Run `python benchmark.py` (bash).
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260307124707.015_20260307_124728
|
Run `python3.12 benchmark.py` (bash).
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260307153616.001_20260307_153705
|
This benchmark evaluates a data transformation pipeline design that leverages Python's `typing.Protocol`, Generics (`Typ...
README.md This benchmark evaluates a data transformation pipeline design that leverages Python's `typing.Protocol`, Generics (`TypeVar`), and `typing` module features to enforce structural typing and type safety. Design Principles 1. **Prot...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260307154009.001_20260307_154041
|
Python Skill Fallback
Title: Strictly-Typed Plugin Registry with PEP 562 Lazy Loading - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260307154643.002_20260307_154718
|
PEP 695 Dynamic Package Benchmark
README.md PEP 695 Dynamic Package Benchmark This benchmark evaluates an autonomous coding system's ability to generate and verify modern Python typing constructs (PEP 695) within a dynamic file structure. Objective The script programmatical...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260307155254.003_20260307_155324
|
Python Reliability Drill: Typing & Packaging Benchmark
README.md Python Reliability Drill: Typing & Packaging Benchmark This benchmark evaluates a candidate's ability to implement robust utilities focusing on static analysis, type hint validation, and package structure verification using only t...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260307160444.004_20260307_160531
|
README.md: run `python benchmark.py` (bash). Output: `RUNNING SELF-TESTS... [OK] ... BENCHMARKING... VRAM_USAGE: <value>MB TOKENS_PER_SEC: <value> VERIFIED: ...`
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260307161054.005_20260307_161123
|
Runtime Package Composition with Generic Protocols
This benchmark evaluates your ability to programmatically construct a Python package hierarchy using standard library modules like `types` and `importlib`, while enforcing strict type safety using `typing.Protocol` and `typing.Generic`. Obj...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260307161841.006_20260307_161914
|
Dynamic Module Loader with Protocol Enforcement
README.md Dynamic Module Loader with Protocol Enforcement Objective This benchmark tests the ability to dynamically construct a local package structure at runtime, discover modules using `importlib`, and rigorously enforce interface complia...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260307163924.007_20260307_164000
|
Typed Component Registry System Benchmark
README.md Typed Component Registry System Benchmark Overview This benchmark demonstrates a scalable Python package structure using **structural subtyping** (`typing.Protocol`) and a **registration-based architecture**. It simulates a scenar...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260307164606.008_20260307_164638
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260307165300.009_20260307_165324
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260307165852.010_20260307_165921
|
Benchmark: Concurrent ZipApp Packager
README.md Benchmark: Concurrent ZipApp Packager Overview This benchmark evaluates a Python engineer's ability to construct a robust, standalone CLI packaging tool. The core task involves building `packager.py`, which demonstrates concurrent...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260307170449.011_20260307_170515
|
Run `python benchmark.py` (bash). Output (text): `VRAM_USAGE: 0MB TOKENS_PER_SEC: <calculated_speed> VERIFIED: All plugins loaded and structural typing checks passed.`
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260307171105.012_20260307_171140
|
Benchmark: Typed CLI Tool for Hyperparameter Validation
README.md Benchmark: Typed CLI Tool for Hyperparameter Validation Objective This benchmark evaluates the robustness and efficiency of a Python-based CLI tool designed to validate machine learning training configurations. The implementation...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260307171803.013_20260307_171829
|
Type-Safe Plugin Registry with Semantic Version Resolution
README.md This benchmark evaluates a Python system's capability to manage a type-safe plugin architecture using only the standard library. Overview The system implements a `Plugin` Protocol and a central `Registry`. It demonstrates: 1. **Dy...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260307172409.014_20260307_172438
|
Dynamic Component Registry with Runtime Type Validation
README.md Dynamic Component Registry with Runtime Type Validation This coding drill benchmarks your ability to design a robust, plugin-based architecture in Python using only the standard library. Objective You must construct a single execu...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260307173023.015_20260307_173142
|
Python Reliability Drill: Typing & Generics
README.md Python Reliability Drill: Typing & Generics This benchmark evaluates a Python engineer's ability to implement robust, type-safe utilities using the standard library. Overview The drill implements a `TypedStore` utility leveraging...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260307173722.016_20260307_173759
|
Python Skill Fallback
Title: Runtime Type-Safe Plugin Loader - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260307174438.017_20260307_174505
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260307175102.018_20260307_175132
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260307180633.019_20260307_180705
|
Python Skill Fallback
Title: Generic Model Factory with Type-Safe Configuration - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260307181235.020_20260307_181305
|
Benchmark: Dynamic Module Loader with Structural Type Verification
README.md Benchmark: Dynamic Module Loader with Structural Type Verification Objective This benchmark evaluates the robustness of a dynamic plugin loading system in Python. It simulates a high-performance environment (similar to LLM kernel...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260307181930.021_20260307_182014
|
Python Skill Fallback
Title: Robust Plugin Loader with Protocol Enforcement - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260307182621.022_20260307_182648
|
Generic Result Monad with PEP 695
This drill implements a robust `Result[T, E]` Monad (Generic Wrapper) using Python 3.12+ features. Features * **PEP 695 Type Parameters**: Uses the new syntax `class Result[T, E]:` instead of `typing.Generic`. * **Module Structure**: Explic...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260307184150.023_20260307_184226
|
Python Reliability Drill: Runtime Typing & Validation
README.md Python Reliability Drill: Runtime Typing & Validation Objective This benchmark evaluates the robustness and reliability of a Python utility designed to perform runtime type validation using the standard `typing` module. The goal i...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260307184859.024_20260307_184938
|
Benchmark: Strictly-Typed CLI Data Exporter
README.md Benchmark: Strictly-Typed CLI Data Exporter This benchmark evaluates a Python implementation that adheres to strict static typing using `typing.TypeVar`, `typing.Generic`, and `typing.Protocol`. It verifies the robustness of a dat...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260307190316.025_20260307_190338
|
Typing-First Configuration Module Benchmark
This benchmark evaluates the creation of a robust, strictly typed configuration management system using only Python's standard library. Overview The goal is to implement a `ConfigLoader` that enforces schema validation using `typing.TypedDi...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260307190938.026_20260307_191002
|
Typed Component Registry and Config Validator
README.md Typed Component Registry and Config Validator **Hypothesis:** A generic registry pattern combined with runtime type introspection (using `typing` and `inspect`) can create a robust, self-validating factory system, reducing runtime...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260307191536.027_20260307_191607
|
Strictly Typed Tensor Core with Module Encapsulation
README.md Strictly Typed Tensor Core with Module Encapsulation Overview This coding drill benchmarks the implementation of a robust, strictly typed `Tensor` data structure using only the Python Standard Library. It demonstrates advanced typ...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260307192143.028_20260307_192225
|
Python Skill Fallback
Title: Strict Generic Box Package - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260307192804.029_20260307_192822
|
Self-Validating Package Scaffold Generator Benchmark
README.md Self-Validating Package Scaffold Generator Benchmark This benchmark evaluates a Python script's ability to programmatically generate a standards-compliant Python package structure ("src-layout") based on a strict `TypedDict` confi...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260307193451.030_20260307_193515
|
Strict Runtime Interface Validator
README.md Strict Runtime Interface Validator Overview This coding drill benchmarks your ability to construct a robust Python module loader that guarantees strict adherence to a defined interface at runtime. It leverages `importlib` for dyna...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260307194040.031_20260307_194119
|
Strictly-Typed Modular Log Aggregator
README.md Strictly-Typed Modular Log Aggregator Design Hypothesis This benchmark tests the hypothesis that enforcing strict type annotations (TypedDict, Protocols) and separating CLI logic from core business logic within a single artifact i...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260307194710.032_20260307_194736
|
Python Skill Fallback
Title: Type-Safe Configuration Registry for Multi-Modal Models - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260307195308.033_20260307_195347
|
Strictly-Typed Kernel Loader Registry
This repository contains a single-file Python benchmark designed to simulate a high-performance kernel loading system similar to those found in vLLM or PyTorch. Overview In systems requiring high throughput, computational kernels (e.g., mat...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260307195915.034_20260307_195945
|
Dynamic Type-Safe Plugin Loader Benchmark
README.md Dynamic Type-Safe Plugin Loader Benchmark Hypothesis An autonomous coding system can construct a robust, dependency-free plugin architecture using Python's standard library. By leveraging `typing.Protocol` for structural subtyping...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260307200553.035_20260307_200627
|
Generic Data Processing Framework Benchmark
README.md Generic Data Processing Framework Benchmark This benchmark evaluates a robust data processing pipeline implementation utilizing modern Python typing features introduced in PEP 695 (Type Parameter Syntax) and PEP 544 (Protocols). D...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260307201300.036_20260307_201329
|
Typed Async Service Package Benchmark
README.md Typed Async Service Package Benchmark Objective This benchmark evaluates a single-file Python script designed to function as a lightweight, installable-style package. The focus is on strict typing adherence, proper `asyncio` usage...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260307201955.001_20260307_202020
|
Strictly Typed Dynamic Module Loader
README.md Strictly Typed Dynamic Module Loader **Objective:** This benchmark tests the reliability and performance of a Python-based plugin loading system that leverages advanced `typing` features (Protocols and Generics) to enforce runtime...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260307202626.002_20260307_202654
|
PEP 695 Generic Repository & Module Encapsulation Benchmark
README.md PEP 695 Generic Repository & Module Encapsulation Benchmark This benchmark validates the implementation of a generic repository system using **Python 3.12+ Type Parameter Syntax (PEP 695)** and strict **Module Encapsulation** (`__...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260307203324.003_20260307_203350
|
Type-Safe Dependency Resolver Engine
This benchmark is designed to test a Python engineering system's ability to implement a robust, type-safe algorithm using only the standard library. Objective Create a dependency resolution engine that calculates the correct installation or...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260307204007.004_20260307_204033
|
Benchmark: Strictly-Typed Generic Pipeline
README.md Benchmark: Strictly-Typed Generic Pipeline Overview This benchmark implements a robust, single-file `DataPipeline` using Python's advanced static typing features. It demonstrates how Generics, Protocols, and Type Guards can be use...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260307205246.005_20260307_205322
|
Strictly Typed Plugin Architecture Benchmark
README.md Strictly Typed Plugin Architecture Benchmark This benchmark evaluates the design of a robust, extensible command registry within a single file, leveraging Python's `typing` module for strict interface enforcement and simulation of...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260307205852.006_20260307_205923
|
Dynamic Extension Loader with Protocol Verification Benchmark
README.md **Project:** Dynamic Extension Loader with Protocol Verification Benchmark **Description:** This benchmark demonstrates a zero-dependency plugin architecture using Python's standard library. It programmatically generates a tempora...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260307210514.007_20260307_210544
|
Python Skill Fallback
Title: Typing-Driven Dynamic Plugin Loader - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260307212344.008_20260307_212415
|
Generic Asset Loader Benchmark
This benchmark tests the creation of a robust, reusable generic asset loader using Python 3.12's new Type Parameter Syntax (PEP 695) and the modern `importlib.resources` API for packaging. Objectives 1. **PEP 695 Implementation:** Define cl...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260307213028.009_20260307_213107
|
Type-Safe Dynamic Module Loader Benchmark
README.md **Title:** Type-Safe Dynamic Module Loader Benchmark **Objective:** Validate a dynamic module loading strategy using `typing.Protocol` for structural subtyping (duck typing) verification at runtime. **Description:** This benchmark...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260307213734.010_20260307_213807
|
Python Skill Fallback
Title: Typed Async Plugin Loader - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260307214405.011_20260307_214439
|
Dynamic Virtual Package Loader with Generic Protocol Enforcement
This benchmark demonstrates an advanced Python pattern involving the dynamic construction of Python modules in-memory without touching the filesystem, combined with structural subtyping (Protocol) enforcement. This mirrors how modern plugin...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260307215057.012_20260307_215121
|
Robust Dynamic Plugin Registry Benchmark
README.md Robust Dynamic Plugin Registry Benchmark This benchmark tests the hypothesis that an autonomous system can construct a robust, type-safe plugin architecture using Python's standard library. It mirrors the dynamic model loading mec...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260307215718.013_20260307_215756
|
Strict Protocol Enforcement and Virtual Package Management Benchmark
README.md Strict Protocol Enforcement and Virtual Package Management Benchmark Design Brief This benchmark simulates the internal architecture of robust AI libraries like **vLLM** or **PyTorch**. It focuses on the problem of dynamic backend...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260307221100.014_20260307_221137
|
Benchmark: Strictly-Typed Recipe Executor with Metadata Validation
README.md Benchmark: Strictly-Typed Recipe Executor with Metadata Validation This benchmark tests the ability to write robust, production-grade Python code that enforces strict typing using modern type hinting features (`Protocol`, `Generic...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260307221746.015_20260307_221819
|
Python Skill Fallback
Title: Type-Safe Generic Resource Pool with Modern Packaging Hygiene - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260307222338.016_20260307_222410
|
Coding Drill: Strict Typed Data Ingestion Module
README.md Coding Drill: Strict Typed Data Ingestion Module Objective This benchmark evaluates the candidate's ability to construct a robust, production-ready Python data processing module using strictly the Standard Library. The focus is on...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260307223031.017_20260307_223107
|
Type-Safe Dynamic Plugin System
A Python benchmark demonstrating advanced packaging and typing capabilities by implementing a dynamic discovery system. The system loads code from a virtual package structure at runtime, enforcing strict interface compliance using `typing.P...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260307223640.018_20260307_223705
|
Dynamic Package Loader with Runtime Type Enforcement
README.md Title: Dynamic Package Loader with Runtime Type Enforcement Objective This benchmark tests a Python engineer's ability to programmatically manipulate the Python import system, construct valid in-memory package structures, and enfo...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260307224307.019_20260307_224332
|
Robust Dynamic Plugin Loader with Protocol Validation
README.md Robust Dynamic Plugin Loader with Protocol Validation Overview This benchmark demonstrates the construction of a modular, extensible application architecture using Python's standard library. It simulates a plugin system where modu...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260307225831.020_20260307_225901
|
Generic Component Registry Benchmark
README.md Generic Component Registry Benchmark Overview This benchmark tests the ability of an autonomous coding agent to construct a sophisticated, type-safe plugin system using only the Python standard library. Core Concepts The system ut...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260307230500.021_20260307_230535
|
Benchmark: Robust Dynamic Module Loader with TypeGuard Validation
README.md Benchmark: Robust Dynamic Module Loader with TypeGuard Validation **Overview** This benchmark tests a Python engine's ability to programmatically generate a file-system package structure, dynamically import it using `importlib`, a...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260307231227.022_20260307_231251
|
Modern Generic Distribution Inspector
README.md Modern Generic Distribution Inspector Hypothesis Adopting PEP 695 Type Parameter Syntax simplifies the definition of generic container classes and type aliases, reducing the boilerplate and cognitive load associated with legac...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260307231904.023_20260307_231938
|
Strictly-Typed Async Worker Module Benchmark
README.md Strictly-Typed Async Worker Module Benchmark This benchmark evaluates a Python system's ability to structure a professional, single-file software package. It specifically targets strict type usage (Generics), public API definition...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260307232519.024_20260307_232550
|
Strict Package Metadata Validator
README.md Strict Package Metadata Validator Overview This coding drill benchmark tests an autonomous coding system's ability to utilize Python's static typing system, specifically `TypedDict` and strict type checking protocols. The script i...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260307233124.025_20260307_233153
|
Strict Package API Validator Benchmark
README.md Strict Package API Validator Benchmark Overview This coding drill benchmarks a robust, dependency-free implementation of a **Package API Validator**. The goal is to enforce packaging hygiene and type safety at runtime by validatin...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260307234613.026_20260307_234649
|
Python Skill Fallback
Title: Dynamic Component Loader with Strict Typing and Dependency Validation - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260307235226.027_20260307_235254
|
Generic Training Pipeline with Runtime Protocol Validation
README.md Generic Training Pipeline with Runtime Protocol Validation This benchmark evaluates the implementation of a strictly typed, mock machine learning training pipeline using Python's standard library advanced typing features. Objectiv...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260307235846.028_20260307_235921
|
Python Skill Fallback
Title: Strictly-Typed Application Configuration Manager - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260308000500.029_20260308_000533
|
Python Skill Fallback
Title: Runtime Plugin Discovery with Strict Protocol Validation - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260308001106.030_20260308_001128
|
Strict Typing and Module Structure for Async Handlers
Overview This benchmark evaluates your ability to construct a robust, distributable Python library module (`handler_lib.py`) that adheres to strict type-checking protocols and packaging conventions. Objectives 1. **Module Structure**: Prope...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260308001745.031_20260308_001808
|
Python Skill Fallback
Title: Type-Safe Backend Dispatcher with Namespace Isolation - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260308002412.032_20260308_002447
|
Strictly Typed Dynamic Configuration Dispatcher
This benchmark simulates the core of a lightweight ML framework where model components are instantiated dynamically based on type-safe configurations. It relies on Python's `typing.Protocol` for interface definition and `typing.get_type_hin...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260308003042.033_20260308_003114
|
Typed Configuration Schema and Runtime Dependency Validator
README.md Typed Configuration Schema and Runtime Dependency Validator Objective This benchmark tests the ability to design a robust, type-safe configuration management module using standard Python libraries. It simulates the initialization...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260308003708.034_20260308_003745
|
Strictly-Typed Dynamic Plugin Loader
README.md Strictly-Typed Dynamic Plugin Loader Objective This benchmark evaluates the ability to write a robust, modular Python system using advanced type hinting features (`typing.Protocol`, `typing.TypeVar`) and reflection tools (`importl...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260308004344.035_20260308_004415
|
Type-Safe Dynamic Plugin Loader Benchmark
README.md Type-Safe Dynamic Plugin Loader Benchmark Objective This benchmark evaluates a Python 3.12+ implementation of a dynamic plugin system that enforces structural type safety at runtime without external dependencies. Technical Context...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260308005017.036_20260308_005056
|
Strict Type-Safe Package Scaffolder
Strict Type-Safe Package Scaffolder This benchmark evaluates your ability to design robust, type-safe Python filesystem tooling using modern standard library features (`dataclasses`, `Protocol`, `pathlib`). Objective Create a CLI tool that...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260308005706.037_20260308_005731
|
Python Skill Fallback
Title: Metadata-Aware Typed Dispatcher - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260308010332.038_20260308_010405
|
Strictly-Typed Dynamic Package Loader and Validator
README.md Strictly-Typed Dynamic Package Loader and Validator Overview This benchmark evaluates a Python system's capability to dynamically generate Python packages in a temporary filesystem, load them using `importlib`, and enforce strict...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260308011000.039_20260308_011032
|
StrictlyTypedAutoRegistry Benchmark
README.md StrictlyTypedAutoRegistry Benchmark Overview This benchmark implements a strictly-typed, plugin-based model registry system similar to the architecture found in Hugging Face Transformers or Diffusers, utilizing **only** the Py...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260308011620.040_20260308_011643
|
Dynamic Plugin Registry with Type-Safe Dispatch
README.md **Title:** Dynamic Plugin Registry with Type-Safe Dispatch **Description:** This benchmark evaluates an autonomous coding agent's ability to construct a robust, extensible plugin architecture using the Python standard library. The...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260308012254.041_20260308_012315
|
Benchmark: Runtime Package Construction with Generic Protocol Enforcement
README.md Benchmark: Runtime Package Construction with Generic Protocol Enforcement Overview This benchmark validates an autonomous system's ability to synthesize a valid Python package structure at runtime. It dynamically generates source...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260308012937.042_20260308_013007
|
PEP 695 Generic Repository Implementation Benchmark
README.md PEP 695 Generic Repository Implementation Benchmark This benchmark demonstrates the utilization of **PEP 695 (Type Parameter Syntax)** introduced in Python 3.12. It implements a thread-safe, generic in-memory `Repository` class us...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260308013606.043_20260308_013634
|
Protocol-Based Dynamic Plugin Loader
Overview This benchmark validates a robust, modular Python architecture that enables runtime extensibility without tight coupling. It utilizes `typing.Protocol` to define structural interfaces and `importlib` to dynamically load code from a...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260308015640.044_20260308_015700
|
Strictly-Typed Plugin Registry with Runtime Validation
README.md Strictly-Typed Plugin Registry with Runtime Validation Design Brief This benchmark validates a Python engineer's ability to construct a robust, extensible architecture using Python's advanced type system (Protocols, Generics) and...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260308030305.045_20260308_030334
|
Robust Plugin Registry with Structural Subtyping
README.md Robust Plugin Registry with Structural Subtyping Hypothesis Utilizing structural subtyping (`typing.Protocol`) for package interfaces decouples implementation details from definition. This facilitates independent development and t...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260308031001.046_20260308_031029
|
Strictly-Typed Plugin Registry Benchmark
README.md Strictly-Typed Plugin Registry Benchmark Overview This benchmark evaluates a robust `PluginRegistry` implementation designed for modular ML pipelines. It emphasizes strict type safety using Python's `typing.Protocol` and `typing.T...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260308031602.047_20260308_031628
|
Dynamic Plugin Registry with Strict Structural Subtyping
This benchmark evaluates a Python engine's capability to dynamically construct a modular architecture using runtime code generation and strict structural subtyping (Protocols). Overview In modern MLOps systems, pipelines are often composed...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260308032213.048_20260308_032246
|
Generic Dependency Resolver and Module Structure Simulation
README.md Generic Dependency Resolver and Module Structure Simulation Overview This coding drill benchmark, `benchmark.py`, implements `mini_installer.py` as a self-contained, type-safe Python module. It simulates a minimal package mana...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260308032821.049_20260308_032848
|
Strictly-Typed Python Package Scaffolder
Overview This coding drill benchmarks the ability to construct a robust file-system generator that strictly enforces data schemas before execution. The goal is to implement a standalone executable script (embedded within this benchmark) th...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260308033504.050_20260308_033541
|
Type-Safe Dynamic Plugin Discovery System
README.md Type-Safe Dynamic Plugin Discovery System This benchmark validates a Python system that simulates an autonomous package distribution and import workflow. It programmatically generates a Python package structure on the disk, enforc...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260308034241.051_20260308_034306
|
Dynamic Module Loader and Strict Interface Verifier
README.md Dynamic Module Loader and Strict Interface Verifier This benchmark evaluates the ability of a Python system to dynamically load code from a string source and rigorously validate its adherence to a `typing.Protocol` interface. Hypo...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260308035511.052_20260308_035538
|
pytrain.20260308035511.052
README.md Run with `python benchmark.py`.
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260308040139.053_20260308_040211
|
Dynamic Backend Loader with Type Protocol Validation
This benchmark simulates a high-performance plugin architecture commonly found in systems like vLLM or PyTorch, where backends (CUDA, CPU, FlashAttention implementations) are loaded dynamically based on availability or user configuration. T...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260308040734.054_20260308_040800
|
Python Skill Fallback
Title: Generic Plugin Registry with Dynamic Module Loading - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260308041409.055_20260308_041435
|
Dynamic Package Construction and Type Introspection Benchmark
README.md Dynamic Package Construction and Type Introspection Benchmark Overview This benchmark evaluates an autonomous coding system's ability to leverage Python 3.12+ features, specifically PEP 695 (Type Parameter Syntax). The system must...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260308042050.056_20260308_042120
|
Strictly-Typed Dynamic Plugin Loader
README.md Strictly-Typed Dynamic Plugin Loader This coding drill benchmarks the creation of a robust, dynamic extension system using Python's standard library. Context Traditional plugin architectures in Python often rely on loose conventio...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260308042711.057_20260308_042738
|
Type-Safe Plugin Registry & Package Mock Benchmark
This benchmark evaluates the ability of a system to construct a valid Python package structure using standard library typing features. The script simulates a distributable library `datatools` that defines a strict Protocol interface, discov...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260308043336.058_20260308_043358
|
Type-Safe Python Package Scaffolder Benchmark
README.md Type-Safe Python Package Scaffolder Benchmark **Description** This benchmark evaluates the generation of a robust, type-safe Python CLI tool that automates the creation of standard Python package structures. **Goal** The solution...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260308044009.059_20260308_044030
|
Python Skill Fallback
Title: Strictly-Typed Dynamic Component Registry - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260308044620.060_20260308_044648
|
Strictly-Typed Operation Registry & CLI
README.md Strictly-Typed Operation Registry & CLI This repository contains a single-file Python package (`benchmark.py`) that demonstrates a robust, strictly-typed plugin architecture using Python's `typing.Protocol`, `TypeVar`, and `Generi...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260308045216.061_20260308_045247
|
Python Skill Fallback
Title: Strict Typing Runtime Validator - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260308045836.062_20260308_045910
|
Python Skill Fallback
Title: PEP 695 Generic Service Registry - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260308050541.063_20260308_050610
|
pytrain.20260308050541.063
Run with `python3 benchmark.py`.
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260308051348.064_20260308_051419
|
Structural Subtyping Validator for Dynamic Modules
README.md Structural Subtyping Validator for Dynamic Modules Overview This benchmark tests the implementation of a robust, structural subtyping system using Python's `typing.Protocol`. Unlike nominal typing (inheritance), structural typing...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260308051953.065_20260308_052035
|
Strictly-Typed Modular Plugin Dispatcher Benchmark
README.md Strictly-Typed Modular Plugin Dispatcher Benchmark This benchmark evaluates a Python engineer's ability to construct a self-contained, strictly-typed plugin ecosystem using the standard library. Objectives 1. **Protocol Enforcemen...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260308052708.066_20260308_052740
|
This benchmark focuses on the creation of a robust, strictly typed configuration module for a high-performance inference...
README.md This benchmark focuses on the creation of a robust, strictly typed configuration module for a high-performance inference engine, similar to architectures found in vLLM or FlashAttention. **Objective** The goal is to demonstrate ho...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260308053312.067_20260308_053344
|
Benchmark: Robustly Typed Module Design
README.md Benchmark: Robustly Typed Module Design Objective This benchmark evaluates your ability to design a robust, self-contained Python library that adheres to strict packaging and typing standards. It focuses on using Python's type sys...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260308053932.068_20260308_054005
|
Strictly-Typed Plugin Loader Benchmark
README.md Strictly-Typed Plugin Loader Benchmark **Objective**: Evaluate the performance and robustness of a dynamic plugin loading system that utilizes Python 3.12's PEP 695 Type Parameter Syntax, PEP 484 Type Hints, and `typing.Protoc...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260308055111.069_20260308_055140
|
Python Skill Fallback
Title: Strict Package Interface Verifier - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260308055738.070_20260308_055812
|
pytrain.20260308055738.070
README.md Run with `python benchmark.py`.
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260308060403.071_20260308_060432
|
Type-Safe Python Package Scaffolder Benchmark
README.md Type-Safe Python Package Scaffolder Benchmark This benchmark evaluates the implementation of a robust, type-safe CLI tool for generating Python package scaffolds. It emphasizes the use of modern Python typing constructs (`TypedDic...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260308061019.072_20260308_061042
|
Python Skill Fallback
Title: Type-Safe Generic Registry with Dynamic Dependency Simulation - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260308061627.073_20260308_061659
|
Strictly-Typed Backend Dispatcher
README.md Strictly-Typed Backend Dispatcher Design Brief This benchmark evaluates a Python system's ability to design a robust internal package structure that simulates a 'hardware dispatcher' (similar to `vllm` or `flash-attention` selecti...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260308062920.074_20260308_062959
|
Strictly Typed Dependency Constraint Resolver
Strictly Typed Dependency Constraint Resolver Overview This benchmark tests a developer's ability to implement a core algorithm (dependency resolution) using Python's advanced type system features. The goal is to create a robust, subset-com...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260308063735.001_20260308_063809
|
Virtual Package Construction with Generic Protocols
Objective This benchmark evaluates a system's ability to programmatically synthesize a valid Python package structure on the filesystem while strictly adhering to the PEP 484 and PEP 544 typing standards (Generics and Protocols, respectively). Design Brief...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260308064535.001_20260308_064611
|
Structural Plugin Loader Benchmark
README.md Structural Plugin Loader Benchmark Overview This benchmark evaluates a system's ability to implement a modular, type-safe plugin architecture using Python's standard library. It focuses on `typing.Protocol` for structural subtypin...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260308065154.002_20260308_065226
|
pytrain.20260308065154.002
No summary available yet.
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260308065800.003_20260308_065823
|
Strict Typed Dynamic Plugin Loader
This benchmark evaluates a Python script's ability to perform robust dynamic module loading and verification using Python's type system. Objective The script demonstrates how to safely load external code (plugins) at runtime. It leverages `...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260308070419.004_20260308_070502
|
Strictly Typed Data Ingestion Module Benchmark
README.md Strictly Typed Data Ingestion Module Benchmark Objective This benchmark evaluates the correctness and performance of a Python module (`ingestor.py`) designed with strict typing standards. The module utilizes `TypedDict`, `Protocol...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260308071045.005_20260308_071119
|
pytrain.20260308071045.005
README.md Run with `python benchmark.py`.
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260308071811.006_20260308_071845
|
Python Skill Fallback
Title: Strictly Typed Component Registry for Simulation Engine - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260308072454.007_20260308_072523
|
Dynamic Generic Package Builder
This benchmark tests the ability of a system to programmatically scaffold a valid Python package structure, handle relative imports, and verify runtime behavior of Generic types. Instructions 1. Save the code below into a file named `benchm...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260308073130.008_20260308_073213
|
pytrain.20260308073130.008
Run with `python benchmark.py`. Expected Output The script will generate temporary files, load plugins, process data, print performance metrics, and conclude with a `VERIFIED` status.
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260308073819.009_20260308_073843
|
Dynamic Type-Verified Plugin System Benchmark
README.md Dynamic Type-Verified Plugin System Benchmark Overview This benchmark tests the hypothesis that **structural subtyping** (using `typing.Protocol`) combined with **dynamic module loading** (using `importlib`) allows for the creatio...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260308075149.010_20260308_075217
|
Strictly Typed Asynchronous Plugin Loader
README.md Strictly Typed Asynchronous Plugin Loader Overview This coding drill evaluates a Python system's ability to simulate a distributable package structure while enforcing strict type safety using `typing.Protocol` and `typing.Generic`...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260308075803.011_20260308_075841
|
Modular Log Analysis Toolkit Benchmark
README.md Modular Log Analysis Toolkit Benchmark Overview This coding drill evaluates the ability to construct a robust, single-file Python executable that mimics a professional package structure. The solution implements a text processing t...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260308080411.012_20260308_080440
|
Typed Component Registry Benchmark
README.md Typed Component Registry Benchmark Overview This benchmark tests the ability to design a robust, modular, and type-safe component registry system using Python's `typing` module. It simulates the architecture found in large-sca...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260308081041.013_20260308_081109
|
Generic Plugin Registry Benchmark
Overview This benchmark demonstrates a high-performance, type-safe plugin architecture suitable for large-scale Python applications (such as inference engines or data pipelines). It leverages Python's `typing.Protocol` for structural subtyp...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260308081702.014_20260308_081727
|
Python Skill Fallback
Title: Dynamic Plugin Loader with Structural Subtyping - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260308082333.015_20260308_082413
|
Strict Typed Package Scaffolder
README.md Strict Typed Package Scaffolder Overview This benchmark tests the ability of a coding system to leverage modern Python 3.12+ features, specifically **PEP 695 (Type Parameter Syntax)** and strict typing protocols, to construct a ro...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260308083006.016_20260308_083026
|
Python Skill Fallback
Title: Type-Safe Dynamic Module Loader - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260308083613.017_20260308_083639
|
pytrain.20260308083613.017
No summary available yet.
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260308084218.018_20260308_084250
|
Type-Safe Plugin Loader with Runtime Validation
README.md Type-Safe Plugin Loader with Runtime Validation Overview This coding drill benchmark tests the ability to design a robust, extensible module loader using Python's `typing.Protocol` and `@runtime_checkable` decorators. The goal is...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260308084928.019_20260308_085019
|
Lazy Backend Loader - Coding Drill Benchmark
This document outlines a coding drill designed to test knowledge of Python's `typing.Protocol`, `importlib`, and exception handling within the context of building a lazy-loading system for heavy machine-learning backends (simulating framewo...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260308090032.020_20260308_090100
|
Dynamic Configuration Loader with Strict Typing and Virtual Packaging
README.md Dynamic Configuration Loader with Strict Typing and Virtual Packaging This benchmark validates the design of a scalable, PyTorch-like experiment framework skeleton. It tests the core engineering skills required to build large-scal...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260308090705.021_20260308_090735
|
Python Reliability Drill: Typing & Packaging
README.md Python Reliability Drill: Typing & Packaging This benchmark suite, `benchmark.py`, is designed to validate robustness in Python type handling and module packaging structures without external dependencies. It simulates a high-perfo...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260308091342.022_20260308_091420
|
Benchmark: PEP 695 Generic Registry and ZipApp Deployment
README.md Benchmark: PEP 695 Generic Registry and ZipApp Deployment Objective This benchmark validates the developer's ability to utilize **PEP 695 Type Parameter Syntax** to define robust, thread-safe generic classes and package them as a...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260308092015.023_20260308_092052
|
Strictly Typed Dynamic Module Inspector
README.md Strictly Typed Dynamic Module Inspector This Python coding drill demonstrates the creation of a robust utility that leverages the `typing.Protocol` for structural subtyping and `importlib` for runtime introspection. Hypothesis An...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260308092659.024_20260308_092740
|
Dynamic Plugin Loader & Runtime Type Verification Benchmark
README.md Dynamic Plugin Loader & Runtime Type Verification Benchmark Overview This benchmark demonstrates the creation of a robust, modular Python system that dynamically loads code at runtime. It leverages Python's `importlib` for runtime...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260308094751.025_20260308_094818
|
Dynamic Module Loader with Protocol Validation
README.md Dynamic Module Loader with Protocol Validation Overview This benchmark tests the ability to construct a robust, type-safe dynamic plugin system using Python's standard library. The solution demonstrates advanced `typing.Protocol`...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260308095422.026_20260308_095453
|
Strict Typed Artifact Packager Benchmark
README.md Strict Typed Artifact Packager Benchmark Overview This benchmark evaluates the engineer's ability to construct robust deployment pipelines using Python's `typing` module and file-system management utilities. **Hypothesis:** Robust...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260308100025.027_20260308_100056
|
Python Engineering Drill: Dynamic Component Registry
README.md Python Engineering Drill: Dynamic Component Registry Objective This benchmark tests the ability to implement a robust, type-safe plugin system using only the Python Standard Library. **Focus Areas:** 1. **Advanced Typing**: Correc...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260308100637.028_20260308_100710
|
Python Skill Fallback
Title: Generic Package Registry with PEP 695 - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260308101322.029_20260308_101408
|
Strictly Typed Plugin Architecture Simulation
README.md Strictly Typed Plugin Architecture Simulation Hypothesis An autonomous system can effectively internalize modern Python typing and packaging concepts by constructing a lightweight, extensible plugin system using only the standard...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260308102615.030_20260308_102643
|
Asynchronous Log Aggregator with Strict Typing
Overview This benchmark evaluates the effectiveness of combining Python's `asyncio` library with strict static typing (`typing.TypedDict`, `dataclasses`) for building a simulated high-throughput log processing pipeline. The hypothesis is th...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260308103242.031_20260308_103308
|
Dynamic Module Loader with Strict Protocol Validation
Dynamic Module Loader with Strict Protocol Validation Overview This coding drill tests the ability to design a robust plugin system using Python's standard library. The focus is on dynamic code discovery/loading using `importlib` and enforc...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260308103942.032_20260308_104016
|
Strict-Typed Component Factory Benchmark
README.md Strict-Typed Component Factory Benchmark This benchmark validates a candidate's ability to structure a Python module that simulates a professional package architecture. It focuses on strict typing using `typing.Protocol`, proper e...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260308104634.033_20260308_104701
|
Dynamic Module Discovery with Structural Subtyping Benchmark
README.md Dynamic Module Discovery with Structural Subtyping Benchmark Overview This benchmark tests a robust plugin architecture hypothesis: using `typing.Protocol` with `runtime_checkable` provides a more flexible and decoupled method for...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260308105258.034_20260308_105331
|
Python Skill Fallback
Title: Dynamic Model Registry with Structural Subtyping - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260308105906.035_20260308_105941
|
Generic Plugin Registry with Dynamic Module Loading
This benchmark evaluates the performance and type safety of a generic plugin registry system using Python 3.12's PEP 695 Type Parameter Syntax. Features - **PEP 695 Syntax**: Uses `class PluginRegistry[T]` for cleaner generic definitions. -...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260308111448.036_20260308_111519
|
Dynamic Module Loader with Strict Protocol Compliance
README.md Dynamic Module Loader with Strict Protocol Compliance Overview This benchmark evaluates a robust package loading mechanism designed for dynamic plugin systems. The implementation demonstrates how an autonomous agent can construct...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260308112106.037_20260308_112129
|
Generic Data Pipeline Benchmark
README.md Generic Data Pipeline Benchmark This coding drill evaluates the implementation of a robust, type-safe data pipeline using Python's advanced standard library features. Objective The goal is to design a single-file module (`benchmar...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260308112731.038_20260308_112804
|
Strictly Typed Plugin Registry
Overview This benchmark challenges you to implement a robust, modular plugin architecture in Python using modern type hinting features. The goal is to create a system that enforces strict structural typing (Protocols) and type-safe storage...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260308113407.001_20260308_113441
|
Robust Typed Plugin Loader: Benchmark & Verification
README.md Robust Typed Plugin Loader: Benchmark & Verification Objective This benchmark evaluates a Python-based plugin architecture that relies on `typing.Protocol` for structural subtyping (duck typing with static type checking) combined...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260308114047.002_20260308_114111
|
Generic Plugin Registry with PEP 695 - Benchmark Drill
This benchmark validates the implementation of a generic plugin system using Python 3.12's Type Parameter Syntax (PEP 695). It tests syntax compliance, functional correctness of the generic registry, and runtime performance metrics. Accepta...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260308114710.003_20260308_114741
|
In-Memory Plugin Loader with Strict Protocols
README.md In-Memory Plugin Loader with Strict Protocols This benchmark implements a robust, file-system-free plugin architecture using Python's standard library. It demonstrates the creation of a custom import mechanism that loads Python mo...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260308120314.001_20260308_120342
|
Structural Subtyping and Mock Package Registry Benchmark
README.md Structural Subtyping and Mock Package Registry Benchmark Objective This benchmark evaluates a Python system's ability to leverage **Structural Subtyping** (using `typing.Protocol` and `@runtime_checkable`) to create a robust, zero...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260308120715.001_20260308_120745
|
Dynamic Plugin Loader with Strict Type Enforcement
README.md Dynamic Plugin Loader with Strict Type Enforcement Overview This benchmark validates a zero-dependency, robust plugin architecture implementation using Python's standard library. It demonstrates dynamic module compilation, runtime...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260308121343.002_20260308_121418
|
Python Skill Fallback
Title: Modern Generic Plugin Loader with PEP 695 - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260308122037.003_20260308_122053
|
Benchmark: Strict Type-Verified Plugin Registry
An autonomous coding system can simulate the robustness of a package distribution system by implementing a runtime registry that utilizes structural subtyping (Protocols) to validate interfaces. This ensures that only strictly compliant mod...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260308122648.004_20260308_122712
|
Type-Safe Plugin Architecture with Namespace Management
README.md Type-Safe Plugin Architecture with Namespace Management Design Brief This coding drill validates the hypothesis that utilizing `typing.Protocol` (Structural Subtyping) combined with explicit Namespace Management (`__all__`) provid...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260308123236.005_20260308_123255
|
Typing-First Dynamic Module Loader
Overview This benchmark evaluates an agent's ability to leverage Python's advanced type hinting features (specifically `typing.Protocol` and `@runtime_checkable`) to enforce structural subtyping (duck typing) at runtime. The task involves s...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260308124805.006_20260308_124833
|
Type-Safe Plugin Registry Benchmark
README.md Type-Safe Plugin Registry Benchmark This benchmark simulates the core functionality of complex ML frameworks (like Diffusers or vLLM) that rely on dynamic component discovery and strict interface adherence. Objective Implement a `...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260308125419.007_20260308_125437
|
Strictly Typed Plugin Registry and Package Simulator
README.md Strictly Typed Plugin Registry and Package Simulator Overview This benchmark tests the ability to design a robust, dependency-free component registry using Python's advanced `typing` features. It simulates a professional Python pa...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260308130014.008_20260308_130040
|
Dynamic In-Memory Package Loader with Generic Registry
README.md Dynamic In-Memory Package Loader with Generic Registry This benchmark evaluates the implementation of an advanced Python packaging mechanism where software distribution is simulated entirely in memory, alongside strict type safety...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260308132049.001_20260308_132120
|
Python Skill Fallback
Title: Generic Repository Pattern with Packaging Hygiene - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260308132722.002_20260308_132754
|
Generic Plugin Loader with PEP 695
This benchmark validates the use of Python 3.12's PEP 695 Type Parameter Syntax to define a generic plugin interface. It dynamically constructs a Python package in a temporary directory, creates a plugin module, and loads it using `importli...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260308133345.003_20260308_133425
|
Type-Safe Plugin Registry Benchmark
README.md Type-Safe Plugin Registry Benchmark Overview This coding drill validates the hypothesis that a robust, type-safe plugin architecture can be constructed using Python's standard library `typing.Protocol` for structural subtyping. It...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260308134029.004_20260308_134055
|
Dynamic Plugin System with Structural Subtyping
README.md Dynamic Plugin System with Structural Subtyping This benchmark tests the hypothesis that an autonomous coding system can effectively decouple interface definition from implementation by leveraging `typing.Protocol` for structural...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260308134727.005_20260308_134757
|
Strictly-Typed Dependency Graph Resolution
README.md This document outlines the design and execution of a coding benchmark focused on **Strictly-Typed Dependency Graph Resolution**. Overview The goal of this benchmark is to test the ability of a system to generate a robust, type-saf...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260308135336.006_20260308_135357
|
Dynamic Component Registry with Runtime Type Validation
Overview This benchmark evaluates a Python engineer's ability to construct a robust, dynamic plugin architecture using Python's standard library. The task involves generating a temporary package structure on the fly and implementing a regis...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260308140048.007_20260308_140107
|
Generic Component Registry Benchmark
This benchmark validates the implementation of a type-safe, generic registry pattern using Python's standard library. The pattern is fundamental in large-scale frameworks (like PyTorch or Lightning) for dynamically managing modules, optimiz...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260308140649.008_20260308_140720
|
Robust Dependency Graph Resolver
README.md Robust Dependency Graph Resolver This benchmark validates the implementation of a rigorous, type-safe dependency resolution engine suitable for inclusion in a package manager toolchain. Overview The `benchmark.py` script implement...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260308141314.009_20260308_141338
|
Type-Safe Dynamic Plugin Loader
README.md This benchmark evaluates a developer's ability to construct a robust, runtime-extensible plugin system using Python's `typing.Protocol` and `importlib`. Design Brief In an autonomous system, components often need to load third-par...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260308141954.010_20260308_142011
|
Dynamic Plugin Loader with Typing Validation
This benchmark simulates a robust plugin architecture by leveraging Python's `typing.Protocol` for structural subtyping. It demonstrates how to dynamically load and validate "packages" (mock objects) at runtime without explicit inheritance,...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260308142635.011_20260308_142718
|
Stdlib ZipApp Builder with AST Type Enforcement Benchmark
README.md Stdlib ZipApp Builder with AST Type Enforcement Benchmark This benchmark evaluates the ability to construct a robust build pipeline tool using only the Python standard library. Objective The candidate must implement a tool (`build...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260308143420.012_20260308_143439
|
bash python benchmark.py
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260308144100.013_20260308_144125
|
Python Skill Fallback
Title: Protocol-Based Plugin System with Dependency Resolution - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260308144712.014_20260308_144733
|
Python Skill Fallback
Title: Strict Config Validator & PEP 440 Environment Checker - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260308145400.015_20260308_145423
|
Typed Plugin Registry System
README.md Typed Plugin Registry System Overview This benchmark demonstrates the implementation of a robust, type-safe plugin system using modern Python type hinting features (PEP 484) and the `typing.Protocol` definition. Design Principles...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260308150033.016_20260308_150059
|
Strictly-Typed Dynamic Plugin Loader
README.md Strictly-Typed Dynamic Plugin Loader Overview This benchmark demonstrates an autonomous system capable of utilizing Python's advanced type hinting system to enforce runtime interface compliance while dynamically discovering and lo...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260308150738.017_20260308_150800
|
Protocol-Validated Dynamic Plugin Loader
README.md Protocol-Validated Dynamic Plugin Loader This benchmark tests an autonomous coding system's ability to leverage Python's standard library to perform advanced metaprogramming tasks. Hypothesis An autonomous system can programmatica...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260308151327.018_20260308_151343
|
Dynamic Package Loader with Runtime Type Validation
README.md Dynamic Package Loader with Runtime Type Validation Objective This benchmark evaluates the ability of a Python system to programmatically generate code, manage the file system, load modules dynamically, and enforce structural subt...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260308152025.019_20260308_152050
|
Strictly Typed Module Registry with Semantic Versioning
README.md Strictly Typed Module Registry with Semantic Versioning This benchmark evaluates a candidate's ability to design a robust, zero-dependency plugin architecture within the Python standard library. It focuses on modern typing protoco...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260308153659.020_20260308_153722
|
Benchmark: Strictly-Typed Backend Registry with Dynamic Loading
README.md Benchmark: Strictly-Typed Backend Registry with Dynamic Loading Overview This benchmark evaluates a Python system's capability to manage heterogeneous numerical backends using advanced type hinting features (`typing.Protocol`, `ty...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260308154435.021_20260308_154504
|
Strict Package Metadata & Build System Simulator
README.md Strict Package Metadata & Build System Simulator Overview This benchmark tests the ability to construct a robust, self-documenting Python packaging utility using advanced standard library typing features. The goal is to enforce da...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260308155125.022_20260308_155147
|
Generic Virtual Package Builder Benchmark
README.md This coding drill evaluates your ability to leverage modern Python 3.12+ typing features (PEP 695) and dynamic module introspection to create a robust build utility. **Objective:** Implement a `PackageBuilder[T]` generic class cap...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260308155845.023_20260308_155913
|
Typed Distribution Simulator Benchmark
README.md Typed Distribution Simulator Benchmark This project demonstrates a robust, single-file Python implementation of a local package registry manager (`pkg_simulator`), designed with high-level static typing and packaging standards. Fe...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260308160711.024_20260308_160743
|
Strictly-Typed Event Dispatcher Benchmark
README.md This benchmark tests the creation of a strictly-typed Event Dispatcher system using Python's standard library `typing` module. It enforces compile-time type safety using `Protocol` and `Generic`. Prerequisites - Python 3.10+ - `my...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260308162949.001_20260308_163012
|
Strictly Typed Plugin Registry Benchmark
README.md Strictly Typed Plugin Registry Benchmark Overview This benchmark demonstrates a robust, self-validating extension system (plugin registry) built with Python's standard library. It leverages `typing.Protocol`, `runtime_checkable`,...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260308163401.001_20260308_163419
|
Typed ZipApp Distribution Benchmark
README.md Typed ZipApp Distribution Benchmark **Design Brief:** This benchmark evaluates an autonomous coding system's ability to programmatically generate, structure, and package a Python application using modern static typing features and...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260308164102.001_20260308_164125
|
Strictly Typed Modular Data Aggregator
Overview This benchmark demonstrates the implementation of a strictly typed, modular data processing system using Python's standard library `typing` features. It simulates a professional package structure within a single file, leveraging `P...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260308164451.001_20260308_164513
|
Coding Drill: Typed Plugin System Benchmark
README.md Coding Drill: Typed Plugin System Benchmark Objective Design and verify a Python package `processor_pkg` that demonstrates strict adherence to typing standards (using `Protocol` and `TypeVar`) and encapsulation (controlling API ex...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260308164845.001_20260308_164907
|
Benchmark: Strict Package Metadata Validator with Extensible Type Guards
README.md Benchmark: Strict Package Metadata Validator with Extensible Type Guards Overview This benchmark implements a robust, runtime type-safe validator for Python package metadata, simulating structures found in `pyproject.toml`. It dem...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260308165654.001_20260308_165717
|
Python Skill Fallback
Title: Dynamic Package Construction and Strict Protocol Verification - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260308170418.002_20260308_170452
|
Generic Data Pipeline Refactoring using PEP 695
Generic Data Pipeline Refactoring using PEP 695 Design Brief **Hypothesis**: Adopting Python 3.12's PEP 695 Type Parameter Syntax enhances the clarity and maintainability of generic algorithms by reducing boilerplate and scoping type variab...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260308171044.003_20260308_171105
|
Dynamic Plugin Loader with Strict Type Verification
This benchmark demonstrates a robust plugin architecture where Python code is loaded at runtime from a string, injected into `sys.path`, and rigorously validated against a `typing.Protocol`. This ensures that third-party or user-defined cod...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260308171406.001_20260308_171433
|
Strictly Typed Dependency Resolution Simulation
README.md Strictly Typed Dependency Resolution Simulation Overview This benchmark tests the ability to design a robust, lightweight package manager simulation using advanced Python typing constructs. The core hypothesis is that strict typin...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260308172046.002_20260308_172110
|
Python Reliability Drill: Typing & Generics
README.md Python Reliability Drill: Typing & Generics This drill benchmarks your ability to implement robust, type-safe utilities using modern Python type systems (PEP 695) without external dependencies. Objective Implement a generic contai...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260308172740.003_20260308_172803
|
Type-Verified Zip Application Packager
README.md Type-Verified Zip Application Packager This benchmark is designed to test the implementation of a robust, type-safe Python application packager. Overview The script defines a packaging pipeline that enforces strict typing on appli...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260308173339.004_20260308_173418
|
README.md bash python benchmark.py
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260308174702.005_20260308_174721
|
Dynamic Plugin Loader with Protocol Validation
This benchmark demonstrates a robust, type-safe plugin architecture using Python's standard library. Overview The `PluginManager` class in this benchmark: 1. Uses `tempfile` to dynamically construct a valid Python package directory structur...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260308175256.006_20260308_175321
|
Python Coding Drill: Lazy-Loaded Module Simulation
README.md Python Coding Drill: Lazy-Loaded Module Simulation Objective This benchmark challenges the developer to architect a simulation of a high-performance library's internal structure (similar to `vllm` or `diffusers`). The task involve...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260308180011.007_20260308_180041
|
Python Skill Fallback
Title: Strictly Typed Plugin Registry with Dynamic Discovery - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-29 08:01 | Success | - | |
|
exp_self.20260307063408.001_20260307_063436
|
Adaptive Precision Hierarchical Distillation Benchmark
README.md Adaptive Precision Hierarchical Distillation Benchmark This repository evaluates the **Adaptive Precision Hierarchical Distillation** methodology. It tests the hypothesis that a student model utilizing hierarchical attention and d...
|
03-29 08:01 | Success | - | |
|
exp_self.20260307063657.001_20260307_063731
|
Dynamic Precision Hierarchical Distillation Benchmark
README.md Dynamic Precision Hierarchical Distillation Benchmark This repository contains a minimal, runnable benchmark for the "Dynamic Precision Hierarchical Distillation with Selective Memory Caching" innovation. Innovation Summary This b...
|
03-29 08:01 | Pending | - | |
|
exp_self.20260307064659.001_20260307_064737
|
Here is the design for the benchmark. This setup uses PyTorch to simulate the workload of a Transformer-based model, com...
README.md bash pip install torch bash python benchmark.py
|
03-29 08:01 | Pending | - | |
|
exp_self.20260307170335.003_20260307_170400
|
Benchmark: Dynamic-Precision State Caching for Mamba SSMs
README.md Benchmark: Dynamic-Precision State Caching for Mamba SSMs Overview This benchmark validates the hypothesis that utilizing **dynamic precision (bfloat16)** for the recurrent hidden states of Mamba (SSM) models can reduce VRAM press...
|
03-29 08:01 | Success | - | |
|
exp_self.20260307170553.004_20260307_170622
|
Memory-Efficient Distillation of Mamba SSMs with Dynamic Precision Caching
README.md This benchmark evaluates a novel training approach for State Space Models (SSMs), specifically focusing on a Mamba-based student model distilled from a Transformer teacher. The core innovation lies in the integration of **layer-wi...
|
03-29 08:01 | Success | - | |
|
exp_self.20260307171222.005_20260307_171449
|
Dynamic Precision Caching for Low-Memory SSM Distillation
README.md bash python benchmark.py
|
03-29 08:01 | Success | - | |
|
exp_self.20260307171956.006_20260307_172024
|
Dynamic Precision State Caching for Memory-Efficient Mamba Distillation
Dynamic Precision State Caching for Memory-Efficient Mamba Distillation Overview This benchmark validates the hypothesis that distilling a Transformer teacher into a Mamba-like SSM student can be run on memory-constrained GPUs (8GB) by util...
|
03-29 08:01 | Success | - | |
|
exp_self.20260307172232.007_20260307_172325
|
Self-Directed Benchmark: SSM Strategy Stress Test
README.md Self-Directed Benchmark: SSM Strategy Stress Test Innovation Summary This benchmark validates the hypothesis that **State Space Model (SSM)** inference strategies, which utilize fixed-size recurrent state buffers rather than g...
|
03-29 08:01 | Success | - | |
|
exp_self.20260307172511.008_20260307_172539
|
Dynamic Precision State Caching for Memory-Efficient SSM Distillation
README.md Dynamic Precision State Caching for Memory-Efficient SSM Distillation Overview This benchmark evaluates an "Innovation" technique designed to optimize the training of State Space Models (SSMs) on hardware-constrained devices (e.g....
|
03-29 08:01 | Success | - | |
|
exp_self.20260307172903.009_20260307_172930
|
Dynamic Precision State Caching for Memory-Efficient SSM Distillation
README.md Dynamic Precision State Caching for Memory-Efficient SSM Distillation This benchmark evaluates a novel approach to training State Space Models (SSMs), specifically focusing on the Mamba architecture, under strict memory constraint...
|
03-29 08:01 | Success | - | |
|
exp_self.20260307173243.010_20260307_173313
|
Benchmark: Memory-Efficient SSM Distillation via Dynamic State Precision
README.md Benchmark: Memory-Efficient SSM Distillation via Dynamic State Precision Overview This benchmark evaluates a hypothesis regarding State Space Models (SSMs): that explicitly enforcing lower precision (FP16) on the recurrent hidden...
|
03-29 08:01 | Success | - | |
|
exp_self.20260307174235.012_20260307_174802
|
Benchmark: Adaptive Layer-wise State Precision for SSMs
README.md Benchmark: Adaptive Layer-wise State Precision for SSMs Overview This benchmark evaluates the efficiency gains of applying **Adaptive Layer-wise State Precision** to State Space Models (SSMs). In the context of SSM distillation, s...
|
03-29 08:01 | Success | - | |
|
exp_self.20260307180747.014_20260307_180816
|
Memory-Efficient SSM Distillation Benchmark
README.md Memory-Efficient SSM Distillation Benchmark This benchmark validates the hypothesis that a State Space Model (Student) can effectively distill knowledge from a larger Transformer (Teacher) while strictly adhering to an 8GB VRAM bu...
|
03-29 08:01 | Success | - | |
|
exp_self.20260307180936.015_20260307_181011
|
Efficient SSM Distillation Benchmark
README.md Efficient SSM Distillation Benchmark This benchmark implements a teacher-student distillation setup where a GPT-2 model (Teacher) transfers knowledge to a lightweight Mamba-style State Space Model (Student). Key Features 1. **Cust...
|
03-29 08:01 | Success | - | |
|
exp_self.20260307181354.016_20260307_181648
|
Benchmark: Efficient SSM Distillation with Dynamic Precision and State Caching
README.md Benchmark: Efficient SSM Distillation with Dynamic Precision and State Caching This benchmark evaluates the performance gains of a hypothetical Student State Space Model (SSM) against a baseline Teacher model. The innovation focus...
|
03-29 08:01 | Success | - | |
|
exp_self.20260307181801.017_20260307_181840
|
SSM Distillation with Dynamic Precision and Memory-Cache Optimization
README.md SSM Distillation with Dynamic Precision and Memory-Cache Optimization This repository contains a benchmark designed to test the hypothesis that integrating **Dynamic Precision** training into the **Knowledge Distillation** of a **...
|
03-29 08:01 | Success | - | |
|
exp_self.20260307182132.018_20260307_182212
|
Efficient SSM Distillation Benchmark
README.md Efficient SSM Distillation Benchmark This benchmark evaluates the performance of a Knowledge Distillation pipeline where a Transformer-based teacher model trains a simplified Mamba-like Select...
|
03-29 08:01 | Success | - | |
|
exp_self.20260307182441.019_20260307_182547
|
Memory-Efficient SSM Distillation via Dynamic State Caching
README.md Memory-Efficient SSM Distillation Benchmark Overview This benchmark evaluates the hypothesis that applying dynamic precision (specifically FP16) to the recurrent state cache of a Student State Space Model (SSM) during know...
|
03-29 08:01 | Success | - | |
|
exp_self.20260307184318.021_20260307_184424
|
Benchmark: Dynamic Precision Recurrent State Caching for SSMs
README.md Benchmark: Dynamic Precision Recurrent State Caching for SSMs Overview This benchmark evaluates the memory efficiency of a **Dynamic Precision Recurrent State Caching** mechanism designed for State Space Models (SSMs) during the d...
|
03-29 08:01 | Success | - | |
|
exp_self.20260307184654.022_20260307_185114
|
Zero-Shot SSM Distillation Benchmark
README.md Zero-Shot SSM Distillation Benchmark This benchmark evaluates the performance characteristics of the **Zero-Shot SSM Distillation** technique. The innovation focuses on two primary efficiency mechanisms: 1. **Adaptive Precision**:...
|
03-29 08:01 | Success | - | |
|
exp_self.20260307185210.023_20260307_185249
|
Here are the two sections of the runnable benchmark, designed to demonstrate Adaptive-Precision SSM Distillation with Re...
bash pip install torch transformers tqdm python benchmark.py
|
03-29 08:01 | Success | - | |
|
exp_self.20260307190419.024_20260307_190458
|
Benchmark: Low-Memory SSM Distillation via Cached State Quantization
README.md Benchmark: Low-Memory SSM Distillation via Cached State Quantization This benchmark evaluates the hypothesis that applying dynamic precision quantization to the recurrent state cache of a State Space Model (SSM) student can signif...
|
03-29 08:01 | Success | - | |
|
exp_self.20260307190718.025_20260307_190752
|
Benchmark: Dynamic Precision SSM Distillation
README.md Benchmark: Dynamic Precision SSM Distillation This benchmark evaluates the hypothesis that applying dynamic precision reduction to the recurrent state cache of a distilled State Space Model (SSM) can significantly reduce peak VRAM...
|
03-29 08:01 | Success | - | |
|
exp_self.20260307191059.026_20260307_191143
|
bash python benchmark.py
|
03-29 08:01 | Success | - | |
|
exp_self.20260307191408.027_20260307_191450
|
You are an ML engineer creating a safe, runnable benchmarking code.
Design a small, runnable benchmark for this innovation. STRICT REQUIREMENT: Output two sections separated by '
|
03-29 08:01 | Success | - | |
|
exp_self.20260307191701.028_20260307_191730
|
Memory-Efficient SSM Distillation Benchmark
README.md Memory-Efficient SSM Distillation Benchmark This repository contains a minimal, runnable benchmark designed to evaluate the hypothesis that **Dynamic Precision State Caching** enables the training of State Space Models (SSMs) via...
|
03-29 08:01 | Success | - | |
|
exp_self.20260307191944.029_20260307_192257
|
Innovation: Fine-Grained Dynamic Precision in SSM State Caching
README.md Innovation: Fine-Grained Dynamic Precision in SSM State Caching This benchmark validates the efficiency gains of applying dynamic precision reduction (FP32 -> FP16/BF16) specifically to the recurrent state cache of a State Space M...
|
03-29 08:01 | Success | - | |
|
exp_self.20260307192342.030_20260307_192415
|
Cache-Aware Dynamic Precision Distillation for Memory-Constrained SSMs
README.md Cache-Aware Dynamic Precision Distillation for Memory-Constrained SSMs Overview This benchmark evaluates an innovation aimed at running large State Space Models (SSMs) on memory-constrained hardware (8GB VRAM). The core hypothesis...
|
03-29 08:01 | Success | - | |
|
exp_self.20260307192642.031_20260307_192725
|
Dynamic Precision State-Cache Distillation for Low-Resource SSMs
README.md Dynamic Precision State-Cache Distillation for Low-Resource SSMs Overview This benchmark evaluates a hypothesis for optimizing State Space Models (SSMs) on memory-constrained hardware (e.g., 8GB GPUs). The innovation introduces a...
|
03-29 08:01 | Success | - | |
|
exp_self.20260307192857.032_20260307_192928
|
Here is the runnable benchmark designed for the "Dynamic-Precision State Distillation" innovation.
README.md Dynamic-Precision State Distillation for Low-Resource SSMs Overview This benchmark tests the hypothesis that applying dynamic precision (FP16) to the state cache of a student SSM (distilled from a larger teacher) significantly red...
|
03-29 08:01 | Success | - | |
|
exp_self.20260307193200.033_20260307_193235
|
This repository contains a benchmark for "State-Quantized Distillation for Low-Latency SSMs."
README.md This repository contains a benchmark for "State-Quantized Distillation for Low-Latency SSMs." Overview This benchmark tests the hypothesis that dynamically quantizing the recurrent state cache of a State Space Model (SSM) from FP3...
|
03-29 08:01 | Success | - | |
|
exp_self.20260307193600.034_20260307_193630
|
`python benchmark.py`
|
03-29 08:01 | Success | - | |
|
exp_self.20260307193840.035_20260307_193914
|
Dynamic-Precision State-Cache Distillation Benchmark
README.md Dynamic-Precision State-Cache Distillation Benchmark This repository contains a minimal, self-contained benchmark to validate the memory efficiency of **Dynamic-Precision State-Cache Distillation** for State-Space Models (SSMs). H...
|
03-29 08:01 | Success | - | |
|
exp_self.20260307194432.037_20260307_194641
|
Benchmark: Dynamic-Precision State-Cache Distillation for SSMs
README.md Benchmark: Dynamic-Precision State-Cache Distillation for SSMs This benchmark evaluates the memory efficiency and inference throughput of a novel State Space Model (SSM) approach. The proposed innovation ("Dynamic-Precision State-...
|
03-29 08:01 | Success | - | |
|
exp_self.20260307194829.038_20260307_194915
|
Dynamic-Precision State-Cache Distillation Benchmark
README.md Dynamic-Precision State-Cache Distillation Benchmark Overview This benchmark tests the hypothesis that a State Space Model (SSM) using **Dynamic Precision** for recurrent state tensors and **Gradient Checkpointing** for state cach...
|
03-29 08:01 | Success | - | |
|
exp_self.20260307195135.039_20260307_195212
|
Adaptive State Distillation for Low-Memory SSMs
README.md Adaptive State Distillation for Low-Memory SSMs Overview This benchmark validates the hypothesis that a **Student SSM** utilizing **dynamic precision** (FP16 state caching) can maintain throughput comparable to a standard **Teache...
|
03-29 08:01 | Success | - | |
|
exp_self.20260307195425.040_20260307_195501
|
Dynamic State-Cache Distillation for Low-Memory SSMs
README.md Dynamic State-Cache Distillation for Low-Memory SSMs Innovation Overview This benchmark demonstrates a novel approach to optimizing State Space Models (SSMs), specifically the Mamba architecture, for edge-constrained environments....
|
03-29 08:01 | Success | - | |
|
exp_self.20260307195724.041_20260307_195807
|
Dynamic-Precision State-Cache Distillation Benchmark
README.md Dynamic-Precision State-Cache Distillation Benchmark This repository contains a minimal, self-contained benchmark designed to validate the "Dynamic-Precision State-Cache Distillation" hypothesis for State Space Models (SSMs). Hypo...
|
03-29 08:01 | Success | - | |
|
exp_self.20260307200025.042_20260307_200232
|
Benchmark: Dynamic-Precision State-Cache for SSMs
README.md Benchmark: Dynamic-Precision State-Cache for SSMs This benchmark evaluates the "Dynamic-Precision State-Cache Distillation" concept for State Space Models (SSMs). Since the original architecture generation was skipped, this benchm...
|
03-29 08:01 | Success | - | |
|
exp_self.20260307200347.043_20260307_200704
|
Based on the provided abstract and innovation title, here is a runnable benchmark design. Since the abstract mentions th...
The benchmark compares a standard full-precision SSM (Baseline) against an SSM utilizing dynamic precision for its state cache (Innovation). README.md Benchmark: Dynamic State-Cache Distillation for Low-Memory SSMs Overview This benchma...
|
03-29 08:01 | Success | - | |
|
exp_self.20260307200745.044_20260307_200825
|
Dynamic Precision State Caching for SSMs via Logit Distillation
README.md Dynamic Precision State Caching for SSMs via Logit Distillation Overview This benchmark implements a minimal Selective State Space Model (Mamba-style) to test the hypothesis that storing recurrent state tensors in dynamic precisio...
|
03-29 08:01 | Success | - | |
|
exp_self.20260307201045.045_20260307_201409
|
`pip install torch`; `python benchmark.py`
|
03-29 08:01 | Success | - | |
|
exp_self.20260307201445.046_20260307_201519
|
Adaptive Precision State Caching for Mamba SSMs
README.md Adaptive Precision State Caching for Mamba SSMs Overview This benchmark evaluates an **Adaptive Precision State Caching** mechanism designed for Mamba-style State Space Models (SSMs). The core hypothesis is that by storing recurre...
|
03-29 08:01 | Success | - | |
|
exp_self.20260307202107.001_20260307_202143
|
Adaptive Precision State Cache for Mamba SSMs
README.md Adaptive Precision State Cache for Mamba SSMs Overview This benchmark validates the "Adaptive Precision State Cache" hypothesis. It demonstrates that dynamically quantizing the recurrent state cache of a Mamba Selective State Spac...
|
03-29 08:01 | Success | - | |
|
exp_self.20260307202349.002_20260307_202420
|
Memory-Constrained Dynamic Precision Caching for Mamba SSMs
This benchmark evaluates a hypothesis regarding dynamic precision in State Space Models (specifically a Mamba-style architecture). **Hypothesis:** By storing the recurrent hidden state in half-precision (FP16) while maintaining the immediat...
|
03-29 08:01 | Success | - | |
|
exp_self.20260307202731.003_20260307_202810
|
Dynamic Precision State Cache for Efficient Mamba Inference
README.md Dynamic Precision State Cache for Efficient Mamba Inference Overview This benchmark implements and tests a novel memory optimization for State Space Models (specifically Mamba architectures). The core innovation involves applying...
|
03-29 08:01 | Success | - | |
|
exp_self.20260307203029.004_20260307_203109
|
README.md `pip install torch numpy`; `python benchmark.py`
|
03-29 08:01 | Success | - | |
|
exp_self.20260307203435.005_20260307_203518
|
README.md Benchmark: Dynamic-Precision State Caching for SSMs This repository contains a minimal, runnable benchmark designed to validate the hypothesis regarding memory-efficient State Space Models (SSMs). Hypothesis Employing dynamic...
|
03-29 08:01 | Success | - | |
|
exp_self.20260307203739.006_20260307_203812
|
README.md `pip install torch tqdm`; `python benchmark.py`
|
03-29 08:01 | Success | - | |
|
exp_self.20260307204142.007_20260307_204219
|
Dynamic-Precision State Distillation Benchmark
README.md Dynamic-Precision State Distillation Benchmark This benchmark evaluates **Dynamic-Precision State Distillation**, a technique to optimize State Space Models (SSMs) like Mamba. The Innovation Standard SSMs maintain high-precision (...
|
03-29 08:01 | Success | - | |
|
exp_self.20260307205408.008_20260307_205452
|
Here is the runnable benchmark code.
README.md Dynamic-Precision State Distillation Benchmark This benchmark validates the hypothesis that dynamic precision switching combined with knowledge distillation reduces VRAM usage for SSM training without sacrificing perplexity. Metho...
|
03-29 08:01 | Success | - | |
|
exp_self.20260307205730.009_20260307_205821
|
Here is the design for the "Mixed-Precision State Distillation for Low-Resource SSMs" benchmark.
No summary available yet.
|
03-29 08:01 | Success | - | |
|
exp_self.20260307210026.010_20260307_210111
|
Adaptive State Space Distillation with Dynamic Precision Caching
README.md Adaptive State Space Distillation with Dynamic Precision Caching Overview This repository contains a benchmark implementation for **Adaptive State Space Distillation with Dynamic Precision Caching**. The core innovation combines *...
|
03-29 08:01 | Success | - | |
|
exp_self.20260307210324.011_20260307_210618
|
Benchmark: Memory-Adaptive SSM Distillation via Dynamic Precision Caching
README.md Benchmark: Memory-Adaptive SSM Distillation via Dynamic Precision Caching This benchmark evaluates a simulated implementation of a **State Space Model (SSM)** that utilizes dynamic precision switching to optimize memory bandwidth...
|
03-29 08:01 | Success | - | |
|
exp_self.20260307210718.012_20260307_210753
|
Dynamic Precision SSM Distillation Benchmark
README.md Dynamic Precision SSM Distillation Benchmark This repository contains the benchmark code for evaluating **Dynamic Precision SSM Distillation with State Memory Caching**. Overview This benchmark tests the hypothesis that using Auto...
|
03-29 08:01 | Success | - | |
|
exp_self.20260307212455.013_20260307_212537
|
Here is the design for the benchmark.
README.md `pip install torch numpy`; `python benchmark.py`
|
03-29 08:01 | Success | - | |
|
exp_self.20260307212753.014_20260307_213143
|
Benchmark: Dynamic Precision SSM with State Caching & Memory Distillation
README.md Benchmark: Dynamic Precision SSM with State Caching & Memory Distillation Overview This benchmark evaluates a synthetic State Space Model (SSM) implementation designed to test the efficiency gains of three key architectural innova...
|
03-29 08:01 | Success | - | |
|
exp_self.20260307213254.015_20260307_213338
|
This benchmark evaluates the "Dynamic Precision SSM Distillation with State Memory Caching" innovation.
README.md This benchmark evaluates the "Dynamic Precision SSM Distillation with State Memory Caching" innovation. Hypothesis By distilling a lightweight State Space Model (SSM) from a larger Transformer teacher and utilizing Dynamic Precisi...
|
03-29 08:01 | Success | - | |
|
exp_self.20260307213618.016_20260307_213647
|
This benchmark evaluates a synthetic implementation of a Dynamic Precision State Space Model (SSM). The goal is to valid...
README.md This benchmark evaluates a synthetic implementation of a Dynamic Precision State Space Model (SSM). The goal is to validate the hypothesis that utilizing reduced precision (FP16) for the recurrent state tensors during inference—si...
|
03-29 08:01 | Success | - | |
|
exp_self.20260307213856.017_20260307_213942
|
This repository contains the benchmarking suite for the "Dynamic Precision SSM with Cached State Distillation" project.
README.md This repository contains the benchmarking suite for the "Dynamic Precision SSM with Cached State Distillation" project. Objective To validate the hypothesis that a State Space Model (SSM) utilizing Dynamic Precision (AMP), State C...
|
03-29 08:01 | Success | - | |
|
exp_self.20260307214200.018_20260307_214535
|
Here is the runnable benchmark for the innovation described in the title "Memory-Efficient Mamba Distillation via Activa...
`python benchmark.py`
|
03-29 08:01 | Success | - | |
|
exp_self.20260307214620.019_20260307_214713
|
8GB-Optimized SSM Distillation Benchmark
README.md 8GB-Optimized SSM Distillation Benchmark This benchmark validates the **8GB-Optimized SSM Distillation** innovation. Hypothesis By offloading Teacher Logit computation to a CPU cache and utilizing Dynamic Precision (AMP), we can t...
|
03-29 08:01 | Success | - | |
|
exp_self.20260307214829.020_20260307_214913
|
Dynamic Precision SSM Distillation with CPU-State Offloading Benchmark
README.md Dynamic Precision SSM Distillation with CPU-State Offloading Benchmark This benchmark validates the hypothesis that a State Space Model (SSM) student can be effectively distilled from a Transformer teacher on memory-constrained ha...
|
03-29 08:01 | Success | - | |
|
exp_self.20260307215208.021_20260307_215236
|
CPU-Offloaded Dynamic Precision SSM Distillation
README.md CPU-Offloaded Dynamic Precision SSM Distillation This benchmark demonstrates a novel training optimization for State Space Models (SSMs), specifically targeting scenarios where GPU VRAM is constrained (e.g., 8GB cards). Innovation...
|
03-29 08:01 | Success | - | |
|
exp_self.20260307215527.022_20260307_215834
|
Here is the runnable benchmark for the "Hybrid-Precision State-Checkpointing for SSM Distillation" innovation.
README.md
|
03-29 08:01 | Success | - | |
|
exp_self.20260307215935.023_20260307_220013
|
Dynamic-Precision SSM Distillation Benchmark
README.md Dynamic-Precision SSM Distillation Benchmark This benchmark validates the hypothesis that a combination of **System RAM Caching**, **Gradient Checkpointing**, and **Dynamic Precision (AMP)** can enable the distillation of a large...
|
03-29 08:01 | Success | - | |
|
exp_self.20260307221231.024_20260307_221254
|
Dynamic-Precision SSM Distillation Benchmark
README.md Dynamic-Precision SSM Distillation Benchmark Overview This benchmark demonstrates a novel training optimization technique designed to fit Large Language Model (LLM) distillation into strict hardware constraints (specifically 8GB V...
|
03-29 08:01 | Success | - | |
|
exp_self.20260307221456.025_20260307_221705
|
Here is the design for the **Backfill Candidate** benchmark.
Since the abstract indicates the original output was empty ("architect_output_empty"), this implementation realizes the *intent* described in the title: **Dynamic-Precision SSM Distillation with Gradient-Gated State Caching**. We define a l...
|
03-29 08:01 | Success | - | |
|
exp_self.20260307221924.026_20260307_222120
|
Here is a runnable benchmark for the "Dynamic-Precision SSM with Recurrent State Caching" innovation, designed to profil...
README.md Dynamic-Precision SSM Benchmark This benchmark evaluates the performance characteristics of a **Dynamic-Precision State Space Model (SSM)** utilizing **Recurrent State Caching**. Innovation Summary Traditional SSMs (like S4 or...
|
03-29 08:01 | Success | - | |
|
exp_self.20260307222152.027_20260307_222225
|
Section 1: README.md
`python benchmark.py`
|
03-29 08:01 | Success | - | |
|
exp_self.20260307222454.028_20260307_222708
|
Mixed-Precision SSM State Caching Benchmark
README.md Mixed-Precision SSM State Caching Benchmark This benchmark implements a lightweight, runnable simulation of a **State Space Model (SSM)** with **State Caching** and **Mixed-Precision** optimization. It is designed to verify the ef...
|
03-29 08:01 | Success | - | |
|
exp_self.20260307222754.029_20260307_222823
|
Benchmark: Dynamic-Precision SSM Distillation with State Caching
README.md Benchmark: Dynamic-Precision SSM Distillation with State Caching This benchmark evaluates a hardware-efficient training and inference pipeline for State Space Models (SSMs). Hypothesis By distilling a large Transformer into a smal...
|
03-29 08:01 | Success | - | |
|
exp_self.20260307223147.030_20260307_223229
|
README.md Layer-Wise Dynamic-Precision SSM Distillation Benchmark This repository contains a minimal, runnable benchmark for **Layer-Wise Dynamic-Precision SSM Distillation with Activation Caching**. Innovation Summary This benchmark demons...
|
03-29 08:01 | Success | - | |
|
exp_self.20260307223442.031_20260307_223523
|
This repository contains the benchmark implementation for **Dynamic-Precision SSM Distillation with Gradient-Sensitive S...
README.md This repository contains the benchmark implementation for **Dynamic-Precision SSM Distillation with Gradient-Sensitive State Caching**. Overview This benchmark validates the hypothesis that dynamically switching between FP16 and F...
|
03-29 08:01 | Success | - | |
|
exp_self.20260307223735.032_20260307_223811
|
Memory-Efficient SSM Distillation Benchmark
README.md Memory-Efficient SSM Distillation Benchmark This benchmark validates the hypothesis that **Segment State Caching** combined with **Dynamic Precision** can significantly reduce the memory footprint of training a State Space Model (...
|
03-29 08:01 | Success | - | |
|
exp_self.20260307224116.033_20260307_224233
|
Memory-Efficient SSM Distillation Benchmark
By monitoring the gradient magnitude of the SSM hidden state during the backward pass, we can dynamically downshift the state cache precision (BF16 vs FP32), reducing VRAM usage by >15% while maintaining model accuracy.
|
03-29 08:01 | Success | - | |
|
exp_self.20260307224448.034_20260307_224526
|
Cache-Aware Dynamic Precision Distillation for SSMs
README.md Cache-Aware Dynamic Precision Distillation for SSMs This repository contains the benchmark implementation for **Cache-Aware Dynamic Precision Distillation**. This innovation targets State Space Models (SSMs) to reduce memory footp...
|
03-29 08:01 | Success | - | |
|
exp_self.20260307224731.035_20260307_224803
|
Memory-Bounded SSM Distillation Benchmark
This benchmark evaluates a novel **Segmented State Caching** mechanism with **Dynamic Precision** for training State Space Models (SSMs) under strict memory constraints. Innovation Summary Standard SSM training (e.g., Mamba architectures) r...
|
03-29 08:01 | Success | - | |
|
exp_self.20260307225949.036_20260307_230023
|
Explanation of the Design
The benchmark is designed to validate the "Dynamic-Precision SSM Distillation" hypothesis. 1. **Synthetic SSM Model**: Instead of relying on external `mamba-ssm` libraries which may be hard to install/benchmark in a standalone script, I imp...
|
03-29 08:01 | Success | - | |
|
exp_self.20260307230248.037_20260307_230311
|
Here is the design for the benchmark, split into the README and the runnable Python script as requested.
This benchmark implements a synthetic SSM (State Space Model) distillation pipeline. It compares a full-precision Teacher model against a Student model that utilizes **Selective State Caching** (forcing recurrent states to `bfloat16`) and *...
|
03-29 08:01 | Success | - | |
|
exp_self.20260307230626.038_20260307_230651
|
Memory-Efficient SSM Distillation Benchmark
README.md Memory-Efficient SSM Distillation Benchmark Overview This benchmark evaluates a hypothesis for training Selective State Space Models (SSMs) on constrained hardware (8GB GPU). It tests a distillation setup where a smaller Student S...
|
03-29 08:01 | Success | - | |
|
exp_self.20260307230930.039_20260307_230959
|
Innovation Benchmark: Quantized State Caching for Low-Resource SSM Distillation
README.md Innovation Benchmark: Quantized State Caching for Low-Resource SSM Distillation Overview This benchmark evaluates the "Quantized State Caching" hypothesis. It demonstrates that by applying dynamic precision (FP16/FP8) specifically...
|
03-29 08:01 | Success | - | |
|
exp_self.20260307231325.040_20260307_231354
|
This benchmark evaluates **Dynamic Precision State Caching** for Selective State Space Models (SSMs).
README.md This benchmark evaluates **Dynamic Precision State Caching** for Selective State Space Models (SSMs). Innovation The core hypothesis is that SSMs do not require full float32 precision for their recurrent hidden states at all times...
|
03-29 08:01 | Success | - | |
|
exp_self.20260307231612.041_20260307_231647
|
The user wants a benchmark for "Adaptive Precision State Caching".
I will implement a synthetic benchmark where: 1. A Teacher Transformer (FP32) processes a sequence. 2. A Student SSM processes the same sequence, guided by the teacher. 3. The SSM uses a `DynamicPrecisionCache` that stores recurrent states...
|
03-29 08:01 | Success | - | |
|
exp_self.20260307231757.042_20260307_231833
|
This repository contains a runnable benchmark designed to evaluate the memory efficiency of a Dynamic Precision State Ca...
README.md This repository contains a runnable benchmark designed to evaluate the memory efficiency of a Dynamic Precision State Caching mechanism for State Space Models (SSMs) during Knowledge Distillation. Innovation: Dynamic Precision Sta...
|
03-29 08:01 | Success | - | |
|
exp_self.20260307232016.043_20260307_232210
|
Benchmark for Self-Regulated State Cache Precision
Overview This benchmark is designed to validate the **Self-Regulated State Cache Precision** concept for State Space Models (SSMs). Since the architectural definition was previously empty, this implementation reconstructs the core hypothesi...
|
03-29 08:01 | Success | - | |
|
exp_self.20260307232257.044_20260307_232325
|
Self-Regulated Quantized State Caching for SSM Distillation
README.md Self-Regulated Quantized State Caching for SSM Distillation This benchmark evaluates a novel approach to memory-efficient State Space Model (SSM) training via Knowledge Distillation. The core innovation is a **Self-Regulated State...
|
03-29 08:01 | Success | - | |
|
exp_self.20260307232836.046_20260307_233042
|
The following benchmark is designed to evaluate the efficiency of a Dynamic Precision State Caching mechanism for State...
`python benchmark.py`
|
03-29 08:01 | Success | - | |
|
exp_self.20260307233233.047_20260307_233256
|
Cache-Augmented SSM Distillation with Dynamic Precision State Management
This repository contains the benchmark implementation for testing memory-efficient inference using a distilled State Space Model (SSM) augmented with a dynamic precision state cache. Overview The benchmark tests the hypothesis that a Studen...
|
03-29 08:01 | Success | - | |
|
exp_self.20260307233516.048_20260307_233546
|
Section 1: README.md
`python benchmark.py`
|
03-29 08:01 | Success | - | |
|
exp_self.20260307234722.049_20260307_234750
|
Cache-Compressed Hybrid SSM Distillation Benchmark
README.md Cache-Compressed Hybrid SSM Distillation Benchmark This benchmark evaluates a novel architecture designed to maximize context window handling and memory efficiency on consumer-grade hardware (8GB VRAM target). The Innovation: Hybr...
|
03-29 08:01 | Success | - | |
|
exp_self.20260307234959.050_20260307_235024
|
This benchmark implements a proof-of-concept for **Dynamic-Precision SSM Distillation**. It validates the hypothesis tha...
README.md This benchmark implements a proof-of-concept for **Dynamic-Precision SSM Distillation**. It validates the hypothesis that selectively reducing the precision of recurrent state tensors within a Selective State Space Model (SSM) stu...
|
03-29 08:01 | Success | - | |
|
exp_self.20260307235331.051_20260307_235356
|
Benchmark Design: State-Aware Dynamic Precision SSM
README.md `pip install torch numpy`; `python benchmark.py`
|
03-29 08:01 | Success | - | |
|
exp_self.20260307235604.052_20260307_235630
|
Efficient SSM Distillation via Adaptive State Cache Precision
README.md Efficient SSM Distillation via Adaptive State Cache Precision Overview This benchmark evaluates the hypothesis that applying dynamic precision scaling (FP16/INT8) specifically to the recurrent state cache of a Student SSM during k...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308000001.053_20260308_000033
|
Efficient Long-Context SSM Distillation via Dynamic State Caching
This repository contains the benchmark implementation for testing hybrid memory architectures on State Space Models (SSMs). It demonstrates how moving long-term SSM hidden states to low-precision system RAM allows for effective distillation...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308000612.054_20260308_000643
|
Adaptive State Precision for Memory-Efficient SSM Distillation
README.md Adaptive State Precision for Memory-Efficient SSM Distillation Overview This benchmark evaluates the hypothesis that applying dynamic precision techniques to the recurrent state cache (hidden states) of a State Space Model (SSM) d...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308000849.055_20260308_000914
|
Benchmark: Cache-Aware Dynamic Precision for Efficient SSM Distillation
README.md Benchmark: Cache-Aware Dynamic Precision for Efficient SSM Distillation Overview This benchmark evaluates the hypothesis that applying dynamic precision reduction to the recurrent state cache of a State Space Model (SSM) during kn...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308001231.056_20260308_001433
|
README.md `python benchmark.py`
|
03-29 08:01 | Success | - | |
|
exp_self.20260308001506.057_20260308_001541
|
Benchmark: Cache-Aware Dynamic State Precision for SSM Distillation
README.md Benchmark: Cache-Aware Dynamic State Precision for SSM Distillation This benchmark evaluates the hypothesis that applying dynamic precision quantization specifically to the recurrent state memory (cache) of a Student SSM during di...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308001901.058_20260308_001932
|
Dynamic State Precision for Low-VRAM SSM Distillation
README.md Dynamic State Precision for Low-VRAM SSM Distillation This benchmark evaluates the efficacy of a hardware-aware dynamic precision wrapper applied to the recurrent state of a distilled Mamba-like model. Hypothesis Implementing a dy...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308002152.059_20260308_002220
|
Distilled SSM with Mixed-Precision State Caching Benchmark
README.md Distilled SSM with Mixed-Precision State Caching Benchmark 1. Overview This benchmark validates the **Distilled SSM with Mixed-Precision State Caching** innovation. The core hypothesis is that a student State Space Model (SSM), tr...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308002536.060_20260308_002609
|
Cache-Aware Dynamic Precision SSM Distillation
Overview This benchmark validates the **Cache-Aware Dynamic Precision SSM Distillation** methodology. It demonstrates a training loop where a Student SSM (Mamba-like) learns from a Teacher SSM while utilizing two key innovations: 1. **State...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308002820.061_20260308_003013
|
Benchmark: Memory-Efficient State Distillation for SSM Inference
README.md Benchmark: Memory-Efficient State Distillation for SSM Inference Overview This benchmark evaluates the performance gains from "Memory-Efficient State Distillation" applied to State Space Models (SSMs). In standard SSM inference (e...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308003147.062_20260308_003348
|
Innovation: Selective State Caching for Efficient SSM Distillation
README.md Innovation: Selective State Caching for Efficient SSM Distillation Overview This benchmark demonstrates the **Selective State Caching** mechanism designed to optimize the distillation process of Selective State Space Models (SSMs)...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308003447.063_20260308_003515
|
Innovation: Memory-Efficient State Space Model Distillation with Dynamic Caching
README.md Innovation: Memory-Efficient State Space Model Distillation with Dynamic Caching Overview This benchmark validates a **Dynamic Caching** strategy for State Space Models (SSMs), specifically focusing on the Mamba architecture durin...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308003838.064_20260308_004042
|
Here is the runnable benchmark design for the "Efficient SSM Distillation via Selective State Caching" innovation.
Design Rationale * **Innovation Modeled:** Selective State Caching for State Space Models (SSMs). * **Scenario:** Autoregressive generation (e.g., text generation) where an SSM needs to maintain a hidden state over a long context. * **Basel...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308004118.065_20260308_004156
|
Efficient Mamba Knowledge Distillation via Selective State-Aware Caching
README.md `pip install torch numpy`; `python benchmark.py`; MODE: Baseline Full-Graph, VRAM_USAGE: 2100MB, TOKENS_PER_SEC: 1200 ... MODE: Selective State Caching, VRAM_USAGE: 1450MB, TOKENS_PER_SEC: 1150 ... RESULT: Memory reduced by 30.9%....
|
03-29 08:01 | Success | - | |
|
exp_self.20260308004451.066_20260308_004528
|
Here is the runnable benchmark code for the "Memory-Efficient Mamba Distillation via Selective State Caching" innovation...
README.md
|
03-29 08:01 | Success | - | |
|
exp_self.20260308004759.067_20260308_004827
|
Dynamic Precision SSM Distillation Benchmark
README.md Dynamic Precision SSM Distillation Benchmark This repository contains a minimal, self-contained benchmark designed to evaluate the memory efficiency of Dynamic Precision Selective State Space Models (SSM) during Knowledge Distilla...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308005213.068_20260308_005422
|
Offline SSM Distillation via Cached State Replay
README.md Offline SSM Distillation via Cached State Replay This benchmark implements the "Offline SSM Distillation via Cached State Replay on Memory-Constrained Hardware" concept. Concept Standard Knowledge Distillation requires both the la...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308005505.069_20260308_005535
|
Memory-Efficient SSM Distillation via Cached State Replay
README.md Memory-Efficient SSM Distillation via Cached State Replay This benchmark validates an innovation for training large sequence models on constrained hardware (8GB GPU) by utilizing **Cached State Replay** during the distillation of...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308005815.070_20260308_010006
|
Here is the runnable benchmark design for the Memory-Bounded SSM Distillation concept, including the requested documenta...
README.md Benchmark: Memory-Bounded SSM Distillation via Selective State Caching Overview This benchmark evaluates the performance and memory efficiency of a **Selective State Space Model (SSM)** against a standard full-history SSM. **T...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308010051.071_20260308_010133
|
README.md Benchmark: CPU-Offloaded State Caching for SSM Distillation Overview This benchmark validates the hypothesis that offloading Teacher SSM (State Space Model) recurrent states to system RAM (CPU) during knowledge distillation re...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308010454.072_20260308_010539
|
CPU-Offloaded SSM State Distillation via Cached Replay
README.md CPU-Offloaded SSM State Distillation via Cached Replay Innovation Summary This benchmark demonstrates a training strategy where a large Teacher Mamba model (State Space Model) pre-computes and caches its hidden states to system RA...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308010727.073_20260308_010807
|
Dynamic-Precision State Caching Benchmark
This benchmark tests the "Dynamic-Precision State Caching" innovation designed for efficient SSM (State Space Model) distillation. The core hypothesis is that dynamically reducing the precision of the recurrent state tensor (from FP32 to FP...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308011137.074_20260308_011211
|
Benchmark: Dynamic-Precision Cached State Distillation for Memory-Efficient SSMs
README.md Benchmark: Dynamic-Precision Cached State Distillation for Memory-Efficient SSMs Overview This benchmark tests the hypothesis that applying **Dynamic Precision (AMP)** specifically to **cached recurrent states** during the distill...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308011449.075_20260308_011545
|
Cached State Distillation for Memory-Efficient Mamba Training
README.md This benchmark evaluates the hypothesis that implementing a state caching mechanism during the distillation of a Mamba SSM significantly reduces peak GPU memory usage compared to standard backpropagation through time (BPTT). Innov...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308011747.076_20260308_011820
|
**README.md**
Memory-Efficient SSM Distillation Benchmark Innovation: Memory-Efficient SSM Distillation via Cached State Checkpointing This benchmark tests the hypothesis that implementing gradient checkpointing on a student SSM, combined with a read-onl...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308012031.077_20260308_012056
|
Memory-Efficient SSM Distillation via Dynamic Precision State Caching
This repository contains a benchmarking suite designed to validate the hypothesis that applying dynamic precision (FP16) to the recurrent state cache during the distillation of State Space Models (SSMs) reduces peak GPU memory usage without...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308012429.078_20260308_012520
|
Here is the design for the benchmark.
README.md Memory-Optimized State-Space Model Distillation Benchmark This benchmark evaluates the "Memory-Optimized State-Space Model Distillation via Selective State Caching" innovation. Hypothesis By offloading Teacher hidden states to CPU...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308012748.079_20260308_012851
|
**Memory-Efficient Mamba Distillation Benchmark**
README.md **Memory-Efficient Mamba Distillation Benchmark** This benchmark validates the "Memory-Efficient Mamba Distillation" hypothesis. It simulates a distillation process between a large Teacher Mamba and a small Student Mamba. **Key In...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308013055.080_20260308_013129
|
README.md
|
03-29 08:01 | Success | - | |
|
exp_self.20260308013248.081_20260308_013341
|
Memory-Efficient Distillation of Mamba Models via Selective State Caching
README.md Memory-Efficient Distillation of Mamba Models via Selective State Caching Overview This benchmark validates the hypothesis that implementing a **Selective State Caching** mechanism during the distillation of a Mamba-based SSM (Sta...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308014005.083_20260308_014058
|
README.md `pip install torch transformers datasets tqdm` `python benchmark.py`
|
03-29 08:01 | Success | - | |
|
exp_self.20260308015738.084_20260308_015818
|
Benchmark: CPU-Offloaded Selective State Caching for Mamba Distillation
README.md Benchmark: CPU-Offloaded Selective State Caching for Mamba Distillation 1. Overview This benchmark validates the "CPU-Offloaded Selective State Caching" strategy for distilling large Mamba-style State Space Models (SSMs) on memory...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308015931.085_20260308_030123
|
Here is the design for the benchmark evaluating "Low-VRAM Mamba Distillation via Selective State Offloading". This bench...
No summary available yet.
|
03-29 08:01 | Success | - | |
|
exp_self.20260308030156.086_20260308_030218
|
`python benchmark.py` Expected Outcome The script should run without `RuntimeError: CUDA out of memory`. You will observe high system RAM usage (due to the Teacher) but low, stable GPU VRAM usage (due to the Student-only on-device st...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308030436.087_20260308_030508
|
This repository contains a runnable benchmark for **Dynamic-Precision Mamba Distillation with CPU-Offloaded State Cache*...
README.md This repository contains a runnable benchmark for **Dynamic-Precision Mamba Distillation with CPU-Offloaded State Cache**. Objective The benchmark tests the hypothesis that dynamic precision scaling of SSM states combined with CPU...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308030729.088_20260308_030759
|
Benchmark: Dynamic-Precision Mamba Distillation with CPU-Offloaded State Caching
README.md Benchmark: Dynamic-Precision Mamba Distillation with CPU-Offloaded State Caching Overview This benchmark tests the hypothesis that a student Mamba model can be trained efficiently on limited VRAM (targeting < 8GB) by utilizing **C...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308031115.089_20260308_031141
|
Memory-Efficient Mamba Distillation Benchmark
README.md Memory-Efficient Mamba Distillation Benchmark This benchmark evaluates the hypothesis that explicitly caching recurrent hidden states during Mamba distillation reduces peak VRAM usage and increases training throughput compared...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308031332.090_20260308_031534
|
Memory-Efficient Mamba Distillation via Selective State Caching
This benchmark evaluates a novel approach to optimizing State Space Models (SSMs), specifically targeting Mamba architectures. The core innovation lies in combining model **distillation** with a **selective state caching** mechanism to dras...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308031743.091_20260308_031954
|
Benchmark: Segmented State Caching for Memory-Efficient Mamba Distillation
README.md Benchmark: Segmented State Caching for Memory-Efficient Mamba Distillation Overview This benchmark evaluates a "Segmented State Caching" mechanism designed for State Space Models (SSMs), specifically targeting scenarios involving...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308032029.092_20260308_032105
|
Here is the design for the runnable benchmark.
Section 1: README.md Section 2: benchmark.py
|
03-29 08:01 | Success | - | |
|
exp_self.20260308032340.093_20260308_032421
|
self.20260308032340.093
No summary available yet.
|
03-29 08:01 | Success | - | |
|
exp_self.20260308032653.094_20260308_032733
|
CPU-Offloaded State Distillation for 8GB Mamba Optimization
README.md CPU-Offloaded State Distillation for 8GB Mamba Optimization Overview This benchmark implements and tests a novel training strategy for large-context State Space Models (SSMs), specifically targeting hardware constraints (e.g.,...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308033152.096_20260308_033230
|
Benchmark: Delta-Encoded State Caching for Mamba Distillation
README.md Benchmark: Delta-Encoded State Caching for Mamba Distillation Innovation Summary This benchmark validates a memory-efficient distillation pipeline for State Space Models (SSMs), specifically focusing on the `Mamba` architecture. T...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308033357.097_20260308_033426
|
Recurrent State Caching for Low-Memory Mamba Distillation
README.md Recurrent State Caching for Low-Memory Mamba Distillation Overview This benchmark validates the hypothesis that implementing a recurrent state caching strategy during the distillation of SSM-based Mamba models optimizes GPU memory...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308033644.098_20260308_033719
|
Benchmark: Segmented State Caching for Low-Memory Mamba Distillation
README.md Benchmark: Segmented State Caching for Low-Memory Mamba Distillation Overview This benchmark tests the hypothesis that processing input sequences in discrete segments and caching only recurrent state boundaries—detached from the c...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308033948.099_20260308_034021
|
Here is the design for the "Selective State Retention for Memory-Constrained Mamba Distillation" benchmark.
This solution uses a synthetic implementation of the Mamba SSM recurrence logic to ensure the code is **runnable immediately** without requiring complex CUDA-dependent compilation of the specific `mamba-ssm` library, while accurately demons...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308034400.100_20260308_034440
|
**Title:** Mamba Model Distillation with Cached State Retention
README.md **Title:** Mamba Model Distillation with Cached State Retention **Abstract:** This benchmark evaluates the performance of distilling a pre-trained Mamba-130M State Space Model (SSM) into a smaller student variant. The core innovat...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308035621.101_20260308_035820
|
Efficient Mamba Distillation via Selective State Caching
README.md Efficient Mamba Distillation via Selective State Caching Innovation Overview This benchmark validates the "Efficient Mamba Distillation via Selective State Caching" architecture. While the initial generation was skipped due to emp...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308035908.102_20260308_035942
|
`pip install torch` `python benchmark.py`
|
03-29 08:01 | Success | - | |
|
exp_self.20260308040509.103_20260308_040537
|
Precision-Aware SSM Distillation Benchmark
README.md Precision-Aware SSM Distillation Benchmark This repository provides a minimal, self-contained benchmark for evaluating **Precision-Aware SSM Distillation with Adaptive State Caching**. The Innovation This benchmark tests the hypot...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308041045.105_20260308_041111
|
README.md `pip install torch` `python benchmark.py`
|
03-29 08:01 | Success | - | |
|
exp_self.20260308041307.106_20260308_041338
|
Memory-Efficient SSM Distillation Benchmark
README.md Memory-Efficient SSM Distillation Benchmark This repository contains a runnable benchmark demonstrating "Memory-Efficient SSM Distillation via Adaptive Precision State Caching." Overview The benchmark compares a standard training...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308041520.107_20260308_041722
|
Adaptive-Precision SSM Distillation via State-Space Caching
README.md Adaptive-Precision SSM Distillation via State-Space Caching This benchmark evaluates a novel approach to optimizing State Space Models (SSMs) for inference efficiency. The core innovation combines two techniques: 1. **State-Space...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308041758.108_20260308_042007
|
Backfill Implementation: Cache-Aware Dynamic Precision SSM
README.md Backfill Implementation: Cache-Aware Dynamic Precision SSM **Original Candidate:** `self.20260308041758.108` **Status:** Backfilled (Original Architect Output was Empty) Overview This benchmark validates the concept of **Cache-Awa...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308042217.109_20260308_042244
|
Selective State-Space Distillation with Dynamic Precision Caching
README.md Selective State-Space Distillation with Dynamic Precision Caching Overview This benchmark validates the hypothesis that a mixed-precision (Dynamic Precision) Student model, utilizing a Selective State-Space Model (SSM) architectur...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308042452.110_20260308_042514
|
This benchmark evaluates the "Distilled SSM Memory Efficiency via Dynamic Precision Caching" innovation. The goal is to...
README.md This benchmark evaluates the "Distilled SSM Memory Efficiency via Dynamic Precision Caching" innovation. The goal is to demonstrate that a Student State-Space Model (SSM), trained via distillation from a Teacher model and utilizin...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308042824.111_20260308_042916
|
Design for Dynamic Precision State Caching Benchmark
README.md This benchmark evaluates the "Dynamic Precision State Caching" hypothesis for State Space Models (SSMs). It simulates a Distilled Mamba-130M-like architecture to demonstrate that storing recurrent hidden states in lower precision...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308043115.112_20260308_043144
|
Benchmark: Dynamic Precision State Caching for Distilled Mamba Models
README.md Benchmark: Dynamic Precision State Caching for Distilled Mamba Models This benchmark evaluates the hypothesis that implementing dynamic precision scaling for the state cache of a Selective State Space Model (SSM/Mamba) can reduce...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308043437.113_20260308_043504
|
Here is the design for the benchmark.
`pip install torch` `python benchmark.py`
|
03-29 08:01 | Success | - | |
|
exp_self.20260308043726.114_20260308_043806
|
Dynamic Precision State Caching for Distilled Mamba Inference
README.md Dynamic Precision State Caching for Distilled Mamba Inference Overview This benchmark demonstrates a simulation of the "Dynamic Precision State Caching" innovation applied to a simplified SSM (State Space Model) architecture, insp...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308044116.115_20260308_044156
|
Distilled SSM with Dynamic State Precision and Memory Caching
README.md Distilled SSM with Dynamic State Precision and Memory Caching **Innovation Overview:** This benchmark evaluates a hypothesis that a distilled State Space Model (SSM), utilizing dynamic precision on recurrent state caches, can sign...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308044407.116_20260308_044431
|
`pip install torch numpy` `python benchmark.py`
|
03-29 08:01 | Success | - | |
|
exp_self.20260308044747.117_20260308_044822
|
No summary available yet.
|
03-29 08:01 | Success | - | |
|
exp_self.20260308044952.118_20260308_045151
|
Here is the design for the "Entropy-Guided Dynamic State Precision for SSM Distillation" benchmark, implemented as a run...
No summary available yet.
|
03-29 08:01 | Success | - | |
|
exp_self.20260308045342.119_20260308_045530
|
State-Aware Dynamic Precision Distillation for SSMs
README.md State-Aware Dynamic Precision Distillation for SSMs This benchmark evaluates the effectiveness of **State-Aware Dynamic Precision** techniques applied to State Space Models (SSMs) running on memory-constrained devices. Background...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308045625.120_20260308_045645
|
Here are the sections for the runnable benchmark.
Cached-State Distillation of Dynamic-Precision SSMs Overview This benchmark evaluates a memory-efficient Knowledge Distillation pipeline for State Space Models (SSMs). It targets environments with strict 8GB VRAM constraints by combining tw...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308045958.121_20260308_050026
|
Dynamic-Precision State Distillation for Low-Memory SSMs
README.md Dynamic-Precision State Distillation for Low-Memory SSMs Innovation Overview This benchmark evaluates a novel technique to enable large context processing on memory-constrained GPUs (8GB limit) by integrating **Dynamic Precision**...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308050428.123_20260308_050455
|
Low-Memory SSM Training via Dynamic-Precision State Caching and Distillation
README.md Low-Memory SSM Training via Dynamic-Precision State Caching and Distillation Overview This benchmark tests the hypothesis that a Selective State Space Model (SSM) can be trained efficiently on limited VRAM (target < 7.5GB) by impl...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308050702.124_20260308_050722
|
Benchmark: Gradient-Checkpointed SSMs with Dynamic State Precision and Distillation
README.md Benchmark: Gradient-Checkpointed SSMs with Dynamic State Precision and Distillation Overview This benchmark validates a hypothesis for training State Space Models (SSMs) on memory-constrained GPUs (target: 8GB). It combines th...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308051512.125_20260308_051746
|
Here is the runnable benchmark for the **Low-Memory SSM Distillation via Dynamic-Precision State Caching** innovation.
`python benchmark.py`
|
03-29 08:01 | Success | - | |
|
exp_self.20260308051844.126_20260308_051912
|
This repository contains the implementation and benchmarking suite for the research on **Dynamic-Precision State Distill...
README.md This repository contains the implementation and benchmarking suite for the research on **Dynamic-Precision State Distillation for Efficient State Space Models (SSMs)**. Overview This innovation addresses the memory constraints of...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308052133.127_20260308_052201
|
Efficient SSM Distillation Benchmark
README.md Efficient SSM Distillation Benchmark This benchmark evaluates a novel training strategy for State Space Models (SSMs) aimed at reducing GPU memory footprint during knowledge distillation. It tests the hypothesis that applying dyna...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308052829.129_20260308_052859
|
Dynamic-Precision Cached Distillation for Compact SSMs
README.md Dynamic-Precision Cached Distillation for Compact SSMs This repository contains a minimal, runnable benchmark for the paper: **"Dynamic-Precision Cached Distillation for Compact SSMs"**. Overview This benchmark demonstrates a nove...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308053107.130_20260308_053150
|
Here is the design for the Dynamic-Precision State Distillation benchmark.
1. README.md bash pip install torch python benchmark.py 2. benchmark.py ```python import torch import torch.nn as nn import time import math --- Minimal Mamba-Style SSM Implementation --- class MinimalSSMBlock(nn.Module): """ A minimal SSM...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308053441.131_20260308_053508
|
Dynamic-Precision State Caching for Distilled SSMs
README.md Dynamic-Precision State Caching for Distilled SSMs This benchmark evaluates the memory efficiency and inference speed of a novel **Dynamic-Precision State Caching** mechanism applied to a distilled State Space Model (SSM). Overvie...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308053712.132_20260308_053900
|
Here is the runnable benchmark design for the **Adaptive Precision State Caching for Distilled SSMs** concept. Since the...
README.md
|
03-29 08:01 | Success | - | |
|
exp_self.20260308054504.135_20260308_054527
|
Benchmark: Dynamic-Precision State Caching for Distilled SSMs
README.md Benchmark: Dynamic-Precision State Caching for Distilled SSMs This repository contains a runnable synthetic benchmark designed to validate the hypothesis of **Dynamic-Precision State Caching for Distilled SSMs**. Hypothesis By rep...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308055214.136_20260308_055401
|
Memory-Efficient Distilled Mamba with Dynamic State Caching
README.md Memory-Efficient Distilled Mamba with Dynamic State Caching This benchmark evaluates a **Memory-Efficient Distilled Mamba** architecture implementing **Dynamic State Caching** and **Dynamic Precision**. The Innovation The core inn...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308055449.137_20260308_055515
|
Here is the design and implementation for the requested benchmark.
No summary available yet.
|
03-29 08:01 | Success | - | |
|
exp_self.20260308055858.138_20260308_060103
|
Here is the benchmark design for the concept described in the title, strictly adhering to your formatting requirements.
README.md Adaptive-Precision SSM State Caching Benchmark Overview This benchmark evaluates the **Memory-Efficient SSM State Caching** innovation. State Space Models (SSMs) require maintaining a hidden state that grows with sequence length o...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308060154.139_20260308_060214
|
Hybrid-Precision Distilled SSM Benchmark
README.md Hybrid-Precision Distilled SSM Benchmark Overview This benchmark validates the "Hybrid-Precision Distilled SSM" innovation. The core hypothesis is that storing the recurrent state tensors of a State Space Model (SSM) in FP16 (half...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308060518.140_20260308_060541
|
Benchmark: Dynamic-Precision State Caching for Distilled SSMs
README.md Benchmark: Dynamic-Precision State Caching for Distilled SSMs This benchmark evaluates the memory efficiency and performance of a **Distilled State Space Model (SSM)** that utilizes **Dynamic-Precision State Caching**. Hypothesis...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308061131.141_20260308_061329
|
Here is the design for the benchmark. Since the original experiment was skipped due to an empty architect output, I have...
README.md Cache-Augmented Memory Optimization for Mamba Model Distillation Overview This benchmark implements a framework for distilling knowledge from a large Teacher Mamba model to a smaller Student Mamba model. The specific innovation be...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308061401.142_20260308_061554
|
Here is the runnable benchmark for the Cache-Augmented Distillation innovation.
`python benchmark.py`
|
03-29 08:01 | Success | - | |
|
exp_self.20260308061738.143_20260308_061800
|
Dynamic-Precision SSM Distillation Benchmark
README.md Dynamic-Precision SSM Distillation Benchmark This benchmark evaluates the hypothesis that a Mamba-based State Space Model (SSM) student, distilled from a Transformer teacher using dynamic precision (BF16) and explicit state cachin...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308063850.001_20260308_064044
|
Benchmark: Dynamic-Precision SSM with Unified State Caching
README.md Benchmark: Dynamic-Precision SSM with Unified State Caching This repository contains a benchmark for evaluating the efficiency of **Dynamic-Precision State Space Models (SSM)** utilizing **Unified State Caching**. Overview The ben...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308064650.001_20260308_064851
|
Here is the benchmark design for the Dynamic-Precision SSM Distillation with Unified State Caching innovation.
Since the original abstract was empty ("architect_output_empty"), I have synthesized the core logic for the benchmark: 1. **SSM (State Space Model):** Modeled using a simplified selective recurrent layer to simulate Mamba-like architecture....
|
03-29 08:01 | Success | - | |
|
exp_self.20260308064929.002_20260308_064956
|
Dynamic-Precision SSM Distillation Benchmark
README.md Dynamic-Precision SSM Distillation Benchmark Overview This benchmark validates the hypothesis that distilling a dense Transformer teacher into a Mamba-style SSM student using **dynamic precision** (int8 weights, fp16 states) a...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308065328.003_20260308_065411
|
Memory-Efficient SSM Distillation Benchmark
This benchmark tests the hypothesis that offloading recurrent states to CPU during the distillation of a large SSM (Teacher) to a small SSM (Student) reduces VRAM usage significantly while maintaining training throughput. --- README.md Memo...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308065629.004_20260308_065716
|
State-Aligned Mamba Distillation Benchmark
README.md State-Aligned Mamba Distillation Benchmark This benchmark evaluates **State-Aligned Mamba Distillation**, a technique designed to train efficient student Mamba models by aligning their internal recurrent states with a larger teach...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308065913.005_20260308_065941
|
Here is the design for the benchmarking code focusing on CPU-offloaded state caching for Mamba distillation.
README.md Benchmark: CPU-Offloaded State Caching for Efficient Mamba Distillation This benchmark validates the hypothesis that CPU-offloading teacher states during SSM (State Space Model) distillation significantly reduces GPU VRAM consumpt...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308070311.006_20260308_070343
|
Dynamic Precision SSM Distillation with Hierarchical Memory Caching
README.md Dynamic Precision SSM Distillation with Hierarchical Memory Caching This repository contains the benchmarking suite for the **Dynamic Precision SSM Distillation** innovation. Overview This innovation aims to enable efficient proce...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308070557.007_20260308_070631
|
Dynamic Precision SSM Distillation Benchmark
README.md Dynamic Precision SSM Distillation Benchmark This benchmark evaluates a novel training strategy for **Selective State Space Models (SSMs)**, specifically testing the hypothesis that applying **Dynamic Precision** techniques to the...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308070842.008_20260308_070915
|
Here is the runnable benchmark design for the "GPU-Efficient Distilled SSM" innovation.
README.md GPU-Efficient Distilled SSM with Dynamic State Caching Overview This benchmark evaluates a novel training approach for State Space Models (SSMs) designed for resource-constrained environments (e.g., 8GB GPUs). It combines Knowledg...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308071226.009_20260308_071304
|
This benchmark evaluates a **Layer-wise Dynamic Precision SSM** against a standard Transformer baseline.
README.md This benchmark evaluates a **Layer-wise Dynamic Precision SSM** against a standard Transformer baseline. Hypothesis By monitoring gradient norms, we can dynamically cast stable layers of a State Space Model (SSM) to FP16 (simulate...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308071555.010_20260308_071653
|
Memory-Efficient SSM Distillation Benchmark
README.md Memory-Efficient SSM Distillation Benchmark This benchmark evaluates the "Memory-Efficient SSM Distillation with Dynamic Precision and State Caching" innovation. **Goal**: Demonstrate that a lightweight State Space Model (SSM) stu...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308071929.011_20260308_072001
|
Low-Memory SSM Distillation Benchmark
README.md Low-Memory SSM Distillation Benchmark This benchmark evaluates the effectiveness of **Dynamic State Precision** for State Space Models (SSMs) during knowledge distillation. Hypothesis Dynamically down-casting recurrent state tenso...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308072213.012_20260308_072241
|
SSM Distillation with Dynamic Precision State Caching
This repository contains a minimal, runnable benchmark designed to validate the "SSM Distillation with Dynamic Precision State Caching" innovation. Hypothesis Implementing a dynamic precision cache for recurrent states during SSM distillati...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308072603.013_20260308_072640
|
Benchmark: SSM Distillation via Recurrent State Caching
README.md Benchmark: SSM Distillation via Recurrent State Caching Overview This benchmark validates the memory efficiency of **Recurrent State Caching** during the distillation of State Space Models (SSMs). Specifically, it tests the hypoth...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308072849.014_20260308_072924
|
Efficient SSM Distillation via Static State Caching
README.md Efficient SSM Distillation via Static State Caching Overview This benchmark demonstrates the innovation of **Static State Caching** during the distillation of Mamba-based State Space Models (SSMs). **The Hypothesis:** By freezing...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308073322.015_20260308_073357
|
Memory-Efficient SSM Distillation via CPU Offloaded State Caching
README.md Memory-Efficient SSM Distillation via CPU Offloaded State Caching Benchmark Overview This benchmark evaluates the hypothesis that **pre-computing Teacher SSM states and offloading them to CPU system RAM** allows for memory-efficie...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308073628.016_20260308_073723
|
Here is the design for the benchmark.
README.md `python benchmark.py`
|
03-29 08:01 | Success | - | |
|
exp_self.20260308073923.017_20260308_074006
|
Efficient Mamba Distillation with CPU-Offloaded State Cache
README.md Efficient Mamba Distillation with CPU-Offloaded State Cache Overview This benchmark validates an innovation designed to enable large-context processing on memory-constrained GPUs (e.g., 8GB VRAM) by combining model distillation wi...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308075306.018_20260308_075352
|
SSM Distillation with Selective State Caching
README.md SSM Distillation with Selective State Caching Overview This benchmark demonstrates a memory-efficient distillation pipeline for State Space Models (SSMs), specifically Mamba-style architectures. **The Innovation:** Standard backpr...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308075611.019_20260308_075715
|
README.md `pip install torch transformers` `python benchmark.py` MODE: CPU_OFFLOAD_Q8 VRAM_USAGE: 450MB TOKENS_PER_SEC: 1200 RESULT: SUCCESS (Memory Optimized, Loss Converged)
|
03-29 08:01 | Success | - | |
|
exp_self.20260308075923.020_20260308_080004
|
Memory-Efficient State-Space Distillation Benchmark
README.md Memory-Efficient State-Space Distillation Benchmark This benchmark demonstrates a memory-efficient training strategy for State-Space Models (SSMs), specifically tailored for Mamba-like architectures. The innovation, **Recurrent Ca...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308080247.021_20260308_080331
|
Here is the runnable benchmark design for the State-Space Distillation innovation.
README.md State-Space Distillation via Latent Memory Alignment Hypothesis Distilling the internal recurrent memory states of a teacher State Space Model (SSM) into a smaller student model yields superior accuracy compared to standard logit-...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308080540.022_20260308_080639
|
Benchmark Design: SSM-Mamba Distillation with Segment-Based Latent Caching
This benchmark evaluates a simplified Mamba-style State Space Model (SSM) implementation where a large Teacher model distills knowledge into a smaller Student model. To handle long-context sequences without exceeding VRAM, we utilize a segm...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308081351.024_20260308_081432
|
Section 1: README.md
`python benchmark.py` MODE: baseline VRAM_USAGE: <value>MB TOKENS_PER_SEC: <value> ... MODE: innovation VRAM_USAGE: <value>MB TOKENS_PER_SEC: <value> ... RESULT: Memory reduction of <percentage>% achieved.
|
03-29 08:01 | Success | - | |
|
exp_self.20260308081824.025_20260308_081908
|
Here is the design for the benchmark evaluating **Dynamic Precision State-Space Distillation with Cache Optimization**.
Design Philosophy The benchmark implements a minimal but functionally accurate **State-Space Model (SSM)** layer that mimics the recurrent memory behavior of Mamba architectures. 1. **Models**: A Teacher (large) and a Student (small) SSM ar...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308082217.026_20260308_082258
|
Dynamic Precision State-Space Distillation with Adaptive Caching
This repository contains a benchmark for the proposed "Dynamic Precision State-Space Distillation" technique. The goal is to demonstrate that utilizing adaptive precision (FP16/FP8) for the recurrent state tensors ($h_t$) in a State Space M...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308082455.027_20260308_082708
|
Here is a runnable benchmark designed for the **Hybrid SSM-Transformer with Dynamic Precision Caching** concept.
Since the original experiment output was empty, I have synthesized a representative architecture that combines: 1. **Hybrid Layers:** Alternating blocks of Standard Attention (Transformer) and Selective State Space (Mamba-like) blocks. 2. *...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308082751.028_20260308_082940
|
Innovation: Dynamic Precision SSM Distillation with Selective Memory Caching
README.md Innovation: Dynamic Precision SSM Distillation with Selective Memory Caching This benchmark evaluates the efficiency of a theoretical distilled State Space Model (SSM) that employs two primary optimization strategies: 1. **Dynamic...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308083059.029_20260308_083130
|
Dynamic Precision SSM Distillation Benchmark
README.md Dynamic Precision SSM Distillation Benchmark **Innovation:** Dynamic Precision SSM Distillation with Recurrent State Caching **Hypothesis:** We hypothesize that distilling a lightweight State Space Model (SSM) student from a froze...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308083509.030_20260308_083535
|
Run `python benchmark.py`.
|
03-29 08:01 | Success | - | |
|
exp_self.20260308083731.031_20260308_083802
|
This repository contains a standalone benchmark to evaluate the efficiency gains of **Dynamic Precision SSM Distillation...
README.md This repository contains a standalone benchmark to evaluate the efficiency gains of **Dynamic Precision SSM Distillation with Recurrent State Caching**. Overview State Space Models (SSMs), such as Mamba, offer significant potentia...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308084022.032_20260308_084110
|
Here are the two sections as requested.
README.md: install with `pip install torch numpy`, then run `python benchmark.py`.
|
03-29 08:01 | Success | - | |
|
exp_self.20260308084335.033_20260308_084415
|
Dynamic Precision SSM Distillation with State Memory Caching
README.md Dynamic Precision SSM Distillation with State Memory Caching Overview This benchmark evaluates a novel approach to training State Space Models (SSMs) by combining **Dynamic Precision (Automatic Mixed Precision - AMP)** with **Stat...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308084629.034_20260308_084704
|
Dynamic Precision SSM Distillation with Detached State Caching
README.md Dynamic Precision SSM Distillation with Detached State Caching This repository contains a benchmark implementation designed to validate the hypothesis that a detached recurrent state cache strategy, combined with Automatic Mixed P...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308085103.035_20260308_085138
|
Install with `pip install torch`, then run `python benchmark.py`.
|
03-29 08:01 | Success | - | |
|
exp_self.20260308090156.036_20260308_090224
|
README.md: install with `pip install torch`, then run `python benchmark.py`.
|
03-29 08:01 | Success | - | |
|
exp_self.20260308090431.037_20260308_090514
|
Run `python benchmark.py`. Expected Output The script outputs: * **VRAM_USAGE**: Peak memory allocated during the operation. * **TOKENS_PER_SEC**: Throughput measured in tokens generated per second. * **RESULT**: A final verification comp...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308090818.038_20260308_091011
|
Here is the runnable benchmark design.
README.md Dynamic Precision SSM with State Caching: Efficiency Benchmark Overview This benchmark evaluates the proposed innovation: **Dynamic Precision SSM Distillation with Cached State Memory**. The goal is to demonstrate the efficiency g...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308091114.039_20260308_091138
|
Here is the runnable benchmark code.
Install with `pip install torch`, then run `python benchmark.py`.
|
03-29 08:01 | Success | - | |
|
exp_self.20260308091518.040_20260308_091552
|
Section 1: README.md
Install with `pip install torch`, then run `python benchmark.py`.
|
03-29 08:01 | Success | - | |
|
exp_self.20260308091816.041_20260308_091853
|
Efficient SSM Distillation Benchmark
README.md Efficient SSM Distillation Benchmark This benchmark evaluates the "Efficient SSM Distillation" innovation. The core hypothesis is that a student State Space Model (SSM) can maintain training stability comparable to a Transformer t...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308092146.042_20260308_092256
|
**README.md** Memory-Efficient SSM Distillation via Dynamic State Caching Overview This benchmark evaluates a knowledge distillation pipeline where a Transformer Teacher model trains a State Space Model (SSM) Student. The core innovation te...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308092508.043_20260308_092557
|
Dynamic-Precision SSM Distillation Benchmark
README.md Dynamic-Precision SSM Distillation Benchmark This benchmark validates the hypothesis that **Dynamic-Precision SSM Distillation with Selective State Caching** reduces GPU memory usage for long-context sequences while maintaining ac...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308092826.044_20260308_092918
|
Adaptive-Precision SSM Distillation Benchmark
README.md Adaptive-Precision SSM Distillation Benchmark This repository contains the benchmarking code for evaluating **Adaptive-Precision SSM Distillation with Cached State Memory**. Hypothesis Implementing layer-wise dynamic precision adj...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308093127.045_20260308_093201
|
Low-Resource SSM Distillation Benchmark
README.md Low-Resource SSM Distillation Benchmark Overview This benchmark evaluates the hypothesis that a lightweight Selective State Space Model (SSM), utilizing **Selective State Caching** and **Dynamic Precision** (AMP) training, can pro...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308094910.046_20260308_094952
|
Run `python benchmark.py`. 3. The script will output VRAM usage, processing speed, and final verification results.
|
03-29 08:01 | Success | - | |
|
exp_self.20260308095221.047_20260308_095310
|
README.md VRAM-Efficient SSM Distillation Benchmark This benchmark validates the **VRAM-Efficient SSM Distillation** innovation, which utilizes Adaptive State Quantization and Selective Caching to reduce memory footprint during the trai...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308095539.048_20260308_095613
|
Adaptive State Distillation for Memory-Constrained Mamba Models
README.md Adaptive State Distillation for Memory-Constrained Mamba Models Innovation Overview This benchmark demonstrates a novel training strategy for State Space Models (specifically Mamba). The hypothesis is that distilling a large Teach...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308095844.049_20260308_095917
|
Adaptive State Distillation Benchmark
README.md Adaptive State Distillation Benchmark This benchmark evaluates the **Adaptive State Distillation** technique designed to train large State Space Models (SSMs) on memory-constrained hardware (8GB VRAM). Methodology The code impleme...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308100138.050_20260308_100331
|
Here is the design for the benchmark based on the "Dynamic Precision State Distillation" concept. Since the original arc...
README.md Benchmark: Dynamic Precision SSM Inference Overview This benchmark evaluates the memory efficiency and throughput of **Dynamic Precision State Distillation** concepts on State Space Models (SSMs). Since the target architecture was...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308100426.051_20260308_100609
|
Benchmark: Dynamic Precision State Distillation for VRAM-Constrained SSMs
README.md Benchmark: Dynamic Precision State Distillation for VRAM-Constrained SSMs Overview This benchmark evaluates the "Dynamic Precision State Distillation" technique applied to a synthetic State Space Model (SSM). The core innovation i...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308100758.052_20260308_100829
|
Dynamic Precision State Cache Distillation for SSMs
README.md Dynamic Precision State Cache Distillation for SSMs Innovation Overview This benchmark demonstrates a novel technique to optimize State Space Models (SSMs) for deployment on consumer-grade hardware (8GB VRAM). By applying **Dynami...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308101059.053_20260308_101125
|
README.md Dynamic Precision State Caching for Distilled SSMs Overview This benchmark evaluates the "Dynamic Precision State Caching" innovation applied to a distilled Mamba-style Selective State Space Model (SSM). **Hypothesis:** By dynamic...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308101507.054_20260308_101548
|
Benchmark: Dynamic Precision State Cache for Memory-Efficient SSM Distillation
README.md Benchmark: Dynamic Precision State Cache for Memory-Efficient SSM Distillation 1. Objective This benchmark evaluates the hypothesis that **Dynamic Precision State Caching** significantly reduces the peak VRAM consumption of State...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308102733.055_20260308_102801
|
Benchmark: Phase-Shifted Distillation for Low-Precision SSMs Overview This benchmark validates the "Phase-Shifted Distillation" hypothesis. It tests whether a dynamic precision schedule applied to a Student State Space Model (...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308103017.056_20260308_103051
|
Dynamic Precision State Caching for Distilled SSMs
README.md Dynamic Precision State Caching for Distilled SSMs Overview This benchmark evaluates the "Dynamic Precision State Caching" innovation for State Space Models (SSMs), specifically targeting memory-constrained hardware. **Hypothesis:...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308103400.057_20260308_103436
|
Section 1: README.md
Adaptive Precision State Caching for Distilled SSMs Overview This benchmark validates the "Adaptive Precision State Caching" innovation applied to a distilled State Space Model (SSM). The core hypothesis is that by storing the recurrent hid...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308103547.058_20260308_103622
|
README.md Dynamic Precision State Caching for Distilled SSMs Overview This benchmark implements a lightweight, custom Selective State Space Model (SSM) inspired by Mamba. It demonstrates a memory-efficient training strategy combining **...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308103826.059_20260308_103907
|
Tiered-Precision State Distillation Benchmark
README.md Tiered-Precision State Distillation Benchmark This benchmark validates the memory efficiency of a Tiered-Precision State Caching mechanism for State Space Models (SSMs). Hypothesis By implementing a tiered caching mechanism that d...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308104114.060_20260308_104158
|
README.md Distilled Adaptive-Precision State Caching for Memory-Efficient SSMs This repository contains the benchmark implementation for the "Distilled Adaptive-Precision State Caching" innovation. Overview This project demonstrates a novel...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308104421.061_20260308_104451
|
README.md: run `python benchmark.py`.
|
03-29 08:01 | Success | - | |
|
exp_self.20260308104741.062_20260308_104803
|
Section 1: README.md
Adaptive-Precision Distilled State Caching for Memory-Bound SSMs Benchmark Overview This benchmark evaluates a novel memory optimization technique for State Space Models (SSMs). The innovation combines **knowledge distillation** with a **ti...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308105024.063_20260308_105047
|
Distilled State-Space Models with Temporal Dynamic Precision Caching
README.md: install with `pip install torch tqdm`, then run `python benchmark.py`.
|
03-29 08:01 | Success | - | |
|
exp_self.20260308105413.064_20260308_105454
|
Low-Memory Distilled SSMs via Tiered Dynamic Precision Caching
README.md Low-Memory Distilled SSMs via Tiered Dynamic Precision Caching Overview This benchmark evaluates a novel memory optimization technique for State-Space Models (SSMs) during long-context inference. The innovation involves "Tiered Dy...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308105706.065_20260308_110021
|
Benchmark: Tiered-Precision State Caching for SSMs
README.md Benchmark: Tiered-Precision State Caching for SSMs Overview This benchmark evaluates the efficacy of **Tiered-Precision State Caching**, a technique designed to optimize memory usage and inference speed for Long-Context State Spac...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308110112.066_20260308_110316
|
Here is the runnable benchmark code designed for the **Tiered-Precision Distilled Mamba** concept.
README.md: run `python benchmark.py`.
|
03-29 08:01 | Success | - | |
|
exp_self.20260308110352.067_20260308_110415
|
Efficient Mamba Distillation Benchmark
This benchmark evaluates the "Dynamic Precision State Caching" technique applied to a distilled Student-Teacher Mamba pipeline. Hypothesis Storing the recurrent hidden state `h_t` in `bfloat16` instead of `float32` reduces peak VRAM usage d...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308111607.068_20260308_111641
|
Dynamic Precision Mamba Distillation Benchmark
README.md Dynamic Precision Mamba Distillation Benchmark This repository contains a benchmark designed to evaluate the efficiency gains of a **Dynamic Precision Mamba** model distilled from a larger Transformer teacher, utilizing a **Persis...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308111853.069_20260308_112037
|
Here is the benchmark design based on the provided internal policies and the "Dynamic Precision State Space Distillation...
README.md Benchmark: Dynamic Precision SSM with Adaptive Caching Overview This benchmark evaluates the performance characteristics of a **State Space Model (SSM)** enhanced with **Dynamic Precision** and **Adaptive Caching** mechanisms....
|
03-29 08:01 | Success | - | |
|
exp_self.20260308112212.070_20260308_112247
|
Section 1: README.md
Section 2: benchmark.py
|
03-29 08:01 | Success | - | |
|
exp_self.20260308112501.071_20260308_112524
|
Benchmark: Dynamic Precision Distilled SSM
README.md Benchmark: Dynamic Precision Distilled SSM Overview This benchmark evaluates a **Dynamic Precision Distilled State Space Model (SSM)**. The core hypothesis is that selectively applying lower precision (bfloat16/float16) to the rec...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308113518.001_20260308_113547
|
README.md: install with `pip install torch tqdm`, then run `python benchmark.py`.
|
03-29 08:01 | Success | - | |
|
exp_self.20260308113759.002_20260308_113824
|
Dynamic Precision SSM & Caching Distillation Benchmark
README.md Dynamic Precision SSM & Caching Distillation Benchmark This benchmark validates the hypothesis that a **Dynamic Precision Selective State Space Model (SSM)** with **Memory-Efficient Caching** significantly reduces GPU memory footp...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308114155.003_20260308_114225
|
Here is the design for the runnable benchmark.
README.md Mixed-Precision Cached State Distillation Benchmark This repository contains a minimal, runnable benchmark designed to validate the hypothesis that **Dynamic Precision** and **State Caching** can significantly reduce VRAM usage an...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308114440.004_20260308_114507
|
Dynamic Precision Distilled SSM Benchmark
README.md Dynamic Precision Distilled SSM Benchmark This benchmark evaluates the hypothesis that a Student State Space Model (SSM), utilizing dynamic precision and a segment-aware state cache, achieves lower peak VRAM usage and higher infer...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308114844.005_20260308_114905
|
Section 1: README.md
Run `python benchmark.py`.
|
03-29 08:01 | Success | - | |
|
exp_self.20260308115116.006_20260308_115154
|
Efficient Distillation of Memory-Cached SSMs
README.md Efficient Distillation of Memory-Cached SSMs This benchmark demonstrates the efficiency gains of applying **Dynamic Precision** and **State Caching** to a student State Space Model (SSM) that has been distilled from a larger teach...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308120841.001_20260308_120920
|
Adaptive Precision Caching for SSM Distillation Benchmark
README.md Adaptive Precision Caching for SSM Distillation Benchmark This repository contains a synthetic benchmark designed to validate the "Adaptive Precision Caching for SSM Distillation" hypothesis. It simulates a State Space Model (SSM)...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308121148.002_20260308_121224
|
Dynamic State Precision for Low-Memory SSM Distillation
README.md Dynamic State Precision for Low-Memory SSM Distillation Overview This benchmark validates the hypothesis that dynamically reducing the numerical precision of recurrent state tensors (the SSM cache) during training allows for proce...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308121501.003_20260308_121531
|
Dynamic-Precision SSM Distillation Benchmark
README.md Dynamic-Precision SSM Distillation Benchmark This repository contains a minimal, runnable benchmark to evaluate the efficiency of **Dynamic-Precision State Space Models (SSM)** combined with **Knowledge Distillation** and **Cached...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308121814.004_20260308_121842
|
Low-Bit State Caching for Distilled Mamba Inference
README.md Low-Bit State Caching for Distilled Mamba Inference This benchmark validates the hypothesis that applying dynamic precision quantization to the recurrent state cache of a distilled Mamba-style State Space Model (SSM) significantly...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308122151.005_20260308_122216
|
Dynamic-Precision State Caching for Memory-Efficient SSM Distillation
README.md Dynamic-Precision State Caching for Memory-Efficient SSM Distillation Overview This benchmark evaluates the hypothesis that applying dynamic precision reduction to the recurrent state caches of a Student State Space Model (SSM) du...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308122430.006_20260308_122746
|
README.md: run `python benchmark.py`.
|
03-29 08:01 | Success | - | |
|
exp_self.20260308122833.007_20260308_123025
|
Benchmark: Cache-Augmented Dynamic Precision SSM Distillation
README.md Benchmark: Cache-Augmented Dynamic Precision SSM Distillation Overview This benchmark validates the "Backfill Candidate" concept for **Cache-Augmented Dynamic Precision SSM Distillation**. Although the original experiment (`self.2...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308123118.008_20260308_123150
|
Cache-Augmented Dynamic Precision SSM Distillation
README.md Cache-Augmented Dynamic Precision SSM Distillation This repository contains a runnable benchmark demonstrating the **Cache-Augmented Dynamic Precision SSM Distillation** technique. Abstract This innovation hypothesizes that applyi...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308123338.009_20260308_123532
|
Benchmark: Memory-Efficient Distilled SSM with Dynamic Precision
README.md Benchmark: Memory-Efficient Distilled SSM with Dynamic Precision This benchmark evaluates the performance characteristics of a synthetic **State Space Model (SSM)** architecture designed for memory efficiency and dynamic precision...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308123622.010_20260308_123649
|
Adaptive Precision Distilled SSM with State Caching
README.md Adaptive Precision Distilled SSM with State Caching Overview This benchmark demonstrates an innovative approach to efficient Large Language Model (LLM) training and inference. It validates the hypothesis that distilling a dense Tr...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308124927.011_20260308_125115
|
Memory-Constrained Dynamic Precision Distillation for SSMs
README.md Memory-Constrained Dynamic Precision Distillation for SSMs Overview This benchmark evaluates a **Dynamic Precision** strategy for State Space Models (SSMs). Traditional Large Language Models (LLMs) rely on KV-caches which grow qua...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308125208.012_20260308_125236
|
Mixed-Precision SSM Distillation with State Caching
README.md Mixed-Precision SSM Distillation with State Caching Innovation Overview This benchmark evaluates a **Mixed-Precision Student-Teacher Distillation** pipeline designed to optimize State Space Models (SSMs) on memory-constrained hard...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308125532.013_20260308_125600
|
Dynamic Precision SSM Distillation Benchmark
README.md Dynamic Precision SSM Distillation Benchmark This repository contains a standalone benchmark designed to test the hypothesis that **State Space Model (SSM) distillation combined with Dynamic Precision State Caching** can significa...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308125834.014_20260308_125900
|
Dynamic Precision SSM Distillation with Logit Caching
README.md Dynamic Precision SSM Distillation with Logit Caching Overview This benchmark demonstrates a novel approach to Knowledge Distillation (KD) designed for hardware-constrained environments (e.g., 8GB GPUs). It combines a Transformer-...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308130119.015_20260308_130316
|
Benchmark: Dynamic Precision SSM with State Caching
README.md Benchmark: Dynamic Precision SSM with State Caching This benchmark evaluates the performance characteristics of a simulated State Space Model (SSM) augmented with **Dynamic Precision** and **State Caching** mechanisms. Overview Th...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308132907.003_20260308_132934
|
FP8 Dynamic State Quantization Benchmark
README.md FP8 Dynamic State Quantization Benchmark This benchmark evaluates the **FP8 Dynamic State Quantization** innovation. The core hypothesis is that the recurrent state memory bandwidth in State Space Models (SSMs) like Mamba is a bot...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308133122.004_20260308_133153
|
Hybrid Attention-SSM with Cross-Layer State Recycling
README.md Hybrid Attention-SSM with Cross-Layer State Recycling Hypothesis The Attention mechanism captures rich local context. Projecting the final Attention KV-cache into the initial SSM state $h_0$ will result in faster convergence and l...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308133546.005_20260308_133737
|
Benchmark: SSM + Cache Co-design vs Standard Attention
README.md Benchmark: SSM + Cache Co-design vs Standard Attention This benchmark evaluates the memory efficiency and inference speed of a **State Space Model (SSM)** augmented with a cache-co-design strategy against a standard Transformer-st...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308133800.006_20260308_134243
|
Benchmark: SSM + Cache Co-design vs. Standard Attention
README.md Benchmark: SSM + Cache Co-design vs. Standard Attention Overview This benchmark evaluates the performance characteristics of a simulated **State Space Model (SSM) with Cache Co-design** against a standard **Transformer Attention**...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308134341.007_20260308_134420
|
Entropy-Gated State Caching for SSMs
README.md Entropy-Gated State Caching for SSMs Innovation This benchmark explores an optimization technique for State Space Models (SSMs) such as Mamba. The core hypothesis is that not every token in a sequence requires a full-precision sta...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308134543.008_20260308_134621
|
Benchmark: Cross-Layer State Recycling (Tied States)
README.md Benchmark: Cross-Layer State Recycling (Tied States) Overview This benchmark tests the hypothesis that sharing recurrent state memory between sequential layers (Cross-Layer Tying) can significantly reduce VRAM usage with minimal i...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308134914.009_20260308_135005
|
Combining **SSM + Cache + Memory** will improve throughput or memory efficiency without breaking 8GB execution.
|
03-29 08:01 | Success | - | |
|
exp_self.20260308135202.010_20260308_135223
|
Associative State Memory (ASM) Retrieval Benchmark
This benchmark evaluates the "Associative State Memory (ASM)" innovation. The core hypothesis is that augmenting a State Space Model (SSM) with a non-recurrent, associative memory bank (using KNN lookup) improves recall capabilities with ac...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308135530.011_20260308_135559
|
CPU-Pinned State Streaming (CPSS) Benchmark
README.md CPU-Pinned State Streaming (CPSS) Benchmark Overview This benchmark validates the **CPU-Pinned State Streaming (CPSS)** innovation. The core hypothesis is that offloading the SSM (State Space Model) recurrent state tensor to pinne...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308135717.012_20260308_135752
|
Cross-Layer State Sharing via Memory Cache
README.md Cross-Layer State Sharing via Memory Cache This benchmark validates the hypothesis that deep State Space Models (SSMs) re-learn similar features at different depths. By explicitly caching and injecting the state from Layer $N$ int...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308135903.013_20260308_135937
|
Sparse Associative State Cache (SAS-Cache)
This repository contains the reference implementation and benchmark for the **Sparse Associative State Cache (SAS-Cache)**. Hypothesis Standard State Space Models (SSMs) like Mamba compress the entire history into a fixed hidden state. Whil...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308140228.014_20260308_140257
|
Install with `pip install torch tqdm`, then run `python benchmark.py`.
|
03-29 08:01 | Success | - | |
|
exp_self.20260308140501.016_20260308_140529
|
Student Hypothesis Benchmark: SSM + Cache + Memory Co-design Hypothesis Combining **SSM** (State Space Models), **Cache** (State retention), and **Memory** (Gradient Checkpointing/Precision) optimizations will improve throughput and memory...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308140650.017_20260308_140722
|
Entropy-Driven Dynamic Quantization for SSM States
README.md Entropy-Driven Dynamic Quantization for SSM States Overview This benchmark explores the hypothesis that State Space Models (SSMs) do not require full precision (FP16) for their recurrent states when processing predictable, low-ent...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308141439.020_20260308_141506
|
CPU-Pinned Segmented State Streaming
README.md CPU-Pinned Segmented State Streaming Hypothesis LLM inference is fundamentally memory-bound. By treating the SSM (State Space Model) state or KV-Cache as a paged cache and streaming fixed-size segments from CPU RAM, we can effecti...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308141647.021_20260308_141715
|
Benchmark: Segmented State Recycle with Sliding Window Eviction
README.md Benchmark: Segmented State Recycle with Sliding Window Eviction Overview This benchmark evaluates an innovative memory management technique for State Space Models (SSMs) and Attention-based mechanisms. By implementing a segmented...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308141824.022_20260308_141841
|
Saliency-Triggered CPU Stream Benchmark
README.md Saliency-Triggered CPU Stream Benchmark This benchmark evaluates the **Saliency-Triggered CPU Stream** innovation for State Space Models (SSMs). Hypothesis Deeper layers in SSMs frequently enter low-entropy states where they act m...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308142120.023_20260308_142148
|
Saliency-Gated Async State Spilling Benchmark
README.md Saliency-Gated Async State Spilling Benchmark This repository contains a benchmark implementation for **Saliency-Gated Async State Spilling**, a technique designed to optimize memory usage in State Space Models (SSMs) like Mamba d...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308142311.024_20260308_142338
|
Benchmark: SSM + Cache + Memory Co-design
README.md Benchmark: SSM + Cache + Memory Co-design Overview This benchmark evaluates a **Student Hypothesis** regarding the co-design of State Space Models (SSM), efficient Caching strategies, and Dynamic Memory management. **Hypothesis:**...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308142448.025_20260308_142516
|
Section 1: README.md Benchmark: Entropy-Gated State Skipping Overview This benchmark evaluates the **Entropy-Gated State Skipping** innovation for Selective State Space Models (SSMs). The core hypothesis is that not every token requires a f...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308142858.026_20260308_142948
|
Cache-Retrieval Augmented SSM (CRASS)
This repository contains the benchmark suite for **CRASS (Cache-Retrieval Augmented SSM)**. Overview CRASS proposes a hybrid architecture where the hidden state of a State Space Model (SSM) is used to explicitly query a Key-Value (KV) cache...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308143545.027_20260308_143613
|
**README.md**
Run `python benchmark.py`.
|
03-29 08:01 | Success | - | |
|
exp_self.20260308143924.029_20260308_143945
|
Delta-State Residual Compression
README.md Delta-State Residual Compression Hypothesis The state tensor $H_t$ in State Space Models (SSMs) exhibits high temporal correlation ($H_t \approx H_{t-1}$). Storing the full state for every sequence step during generation is redund...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308144238.030_20260308_144501
|
Magnitude-Adaptive State Quantization (MASQ)
Overview This benchmark implements **Magnitude-Adaptive State Quantization (MASQ)** for State Space Models (SSM). The Innovation Standard SSMs and RNNs maintain a hidden state `h` that is typically stored in full precision (FP32 or FP16). H...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308144526.031_20260308_144602
|
Install with `pip install torch`, then run `python benchmark.py`. Expected Output The script will output VRAM usage and tokens per second. We expect a significant reduction in VRAM for the Innovation mode (>40%) with a negligible drop in processing speed...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308144831.032_20260308_144904
|
Delta-State Streaming Benchmark
README.md Delta-State Streaming Benchmark Overview This benchmark evaluates **Delta-State Streaming**, an optimization technique designed to reduce the overhead of CPU-GPU data transfer in State Space Models (SSMs) or large recurrent networ...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308145053.033_20260308_145129
|
Linear-Sparse Recurrent Cache (LSRC) Benchmark
README.md Linear-Sparse Recurrent Cache (LSRC) Benchmark This repository contains the benchmark code for the **Linear-Sparse Recurrent Cache (LSRC)** innovation. The Innovation State Space Models (SSMs), like Mamba, are excellent at efficie...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308145231.034_20260308_145254
|
Hierarchical State Cache (CPU-GPU Offload)
README.md Hierarchical State Cache (CPU-GPU Offload) Innovation Overview This benchmark demonstrates a **Hierarchical State Cache** strategy for State Space Models (SSMs). By treating CPU pinned memory as a "Level 2" cache, we decouple the...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308145611.035_20260308_145633
|
README.md: install with `pip install torch tqdm`, then run `python benchmark.py`.
|
03-29 08:01 | Success | - | |
|
exp_self.20260308145846.036_20260308_145909
|
Section 1: README.md
No summary available yet.
|
03-29 08:01 | Success | - | |
|
exp_self.20260308150350.038_20260308_150416
|
Section 1: README.md
Run `python benchmark.py`.
|
03-29 08:01 | Success | - | |
|
exp_self.20260308150551.039_20260308_150618
|
CPU-Pinned State Checkpointing (CPSC)
README.md CPU-Pinned State Checkpointing (CPSC) Overview This benchmark validates the **CPU-Pinned State Checkpointing (CPSC)** innovation. The hypothesis is that by offloading SSM (State Space Model) states to CPU pinned memory (system RAM...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308150904.040_20260308_150932
|
Entropy-Gated State Skipping Benchmark
README.md Entropy-Gated State Skipping Benchmark This repository contains a minimal, self-contained benchmark for the **Entropy-Gated State Skipping** innovation. Hypothesis Tokens with low information density (low entropy) induce minimal c...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308151038.041_20260308_151119
|
Adaptive State Dimensionality (ASD) Benchmark
This benchmark evaluates the **Adaptive State Dimensionality (ASD)** hypothesis. The core idea is that not all tokens in a sequence require the full state capacity of an SSM (State Space Model). By using a lightweight gating network, we cla...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308151512.042_20260308_151537
|
Student Hypothesis: SSM + Cache + Memory
This repository contains a compact, runnable benchmark designed to test the hypothesis that combining State Space Models (SSM), explicit state caching, and dynamic memory precision can improve throughput and memory efficiency compared to st...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308151705.043_20260308_151729
|
Run `python benchmark.py`.
|
03-29 08:01 | Success | - | |
|
exp_self.20260308151832.044_20260308_151858
|
**Innovation:** CPU-GPU State Streamer (CGSS)
README.md: **Innovation:** CPU-GPU State Streamer (CGSS). **Objective:** Benchmark the viability of offloading SSM (State Space Model) history states to CPU pinned memory to process sequences longer than GPU VRAM normally allows. Problem Stat...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308152053.045_20260308_152129
|
Benchmark: Delta-State Cache Compression (DSCC)
README.md: Benchmark: Delta-State Cache Compression (DSCC). Overview: This benchmark implements and tests the **Delta-State Cache Compression (DSCC)** hypothesis for State Space Models (SSMs), specifically targeting Mamba-like architectures. T...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308152219.046_20260308_152331
|
README.md: run `python benchmark.py`.
|
03-29 08:01 | Success | - | |
|
exp_self.20260308152356.047_20260308_152425
|
Sketch-Based SSM History Compression
README.md: Sketch-Based SSM History Compression. Innovation Summary: This benchmark validates a novel approach to decoupling context length from VRAM usage in State Space Models (SSMs). By treating the SSM's hidden state trajectory as a stream...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308153834.048_20260308_153903
|
Student Hypothesis Benchmark: SSM + Cache Co-design
README.md: Student Hypothesis Benchmark: SSM + Cache Co-design. Hypothesis: We hypothesize that a co-design combining **State Space Models (SSM)**, **Caching mechanisms**, and **Memory optimization (Dynamic Precision)** will significantly impr...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308154606.049_20260308_154627
|
Sketch-Preconditioned SSM State
Overview: This benchmark implements a **Sketch-Preconditioned State Space Model (SSM)**. The core hypothesis is that the hidden state $h$ in standard recurrent architectures (like Mamba) is often redundant or low-rank. Instead of maintaining...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308154740.050_20260308_154808
|
Benchmark: Asynchronous CPU State Streaming for SSMs
README.md: Benchmark: Asynchronous CPU State Streaming for SSMs. Overview: This benchmark evaluates a **CPU Offload Strategy** for State Space Models (SSMs). Specifically, it tests the hypothesis that offloading the recurrent state accumulatio...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308155358.052_20260308_155601
|
Here is the design for a runnable benchmark based on the hypothesis of **SSM + Cache Co-design with Dynamic Precision**.
Since the original architectural output was empty, this benchmark implements a representative synthetic experiment. It compares a baseline Float32 SSM implementation against an "Optimized" version that utilizes Dynamic Precision (simulating...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308155643.053_20260308_155717
|
Innovation: Token-Entropy Dynamic Precision for SSMs
README.md: Innovation: Token-Entropy Dynamic Precision for SSMs. Hypothesis: Not all tokens require high-precision state updates in State Space Models (SSMs). High-entropy tokens (rare words carrying high information) require FP16 stability to...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308160924.054_20260308_160958
|
KV-State Hybrid Cache Benchmark
README.md: KV-State Hybrid Cache Benchmark. This repository contains a minimal, runnable benchmark for the **KV-State Hybrid Cache** architecture innovation. Hypothesis: Standard Transformers rely on growing KV-Caches, which consume massive VR...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308161105.055_20260308_161141
|
Here are the requested files.
README.md
|
03-29 08:01 | Success | - | |
|
exp_self.20260308163619.001_20260308_163648
|
Frequency-Domain State Compression Benchmark
README.md: This repository contains a benchmark for **Frequency-Domain State Compression**, a novel technique to optimize memory usage in State Space Models (SSMs). The Innovation: SSMs maintain a large internal state tensor that scales with...
|
03-29 08:01 | Pending | - | |
|
exp_self.20260308165917.001_20260308_170017
|
Entropy-Adaptive State Quantization (EASQ)
README.md: Entropy-Adaptive State Quantization (EASQ). This benchmark tests the hypothesis that State Space Model (SSM) hidden states can be dynamically quantized to `float8` without significant performance degradation when the model's predic...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308170232.002_20260308_170309
|
Tiered State Streaming (TSS) Benchmark
README.md: Tiered State Streaming (TSS) Benchmark. Overview: This benchmark implements **Tiered State Streaming (TSS)**, a technique designed to overcome VRAM limitations in State Space Models (SSMs) like Mamba. The Innovation: Standard SSMs ma...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308170554.003_20260308_170642
|
Innovation: Entropy-Gated State Quantization
README.md: Innovation: Entropy-Gated State Quantization. **Title:** Entropy-Gated State Quantization for SSMs. **Techniques:** ssm, dynamic_precision, memory. Hypothesis: The recurrent state in Selective State Space Models (SSMs) like Mamba cont...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308171551.001_20260308_171622
|
Student hypothesis: ssm + cache co-design
Paper ID: self.20260308171551.001 - Hypothesis: Combining ssm + cache + memory will improve throughput or memory efficiency without breaking 8GB execution. - Plan: Create a compact comparative benchmark against a simple baseline, measure VR...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308171721.002_20260308_171800
|
Linear-Associative State Injection (LASI) Benchmark
README.md: Linear-Associative State Injection (LASI) Benchmark. This repository contains a minimal, runnable benchmark demonstrating the **Linear-Associative State Injection (LASI)** concept. Overview: Standard State Space Models (SSMs), like...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308171902.003_20260308_171929
|
Dynamic Entropy State Reset
README.md: Dynamic Entropy State Reset. **Innovation:** Dynamic Entropy State Reset (SSM). **Hypothesis:** High entropy in output logits indicates a transition or noise. Using this as a trigger to reset the SSM state will improve stability and...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308172220.004_20260308_172302
|
Section 1: README.md
Install: `pip install torch numpy`. Run: `python benchmark.py`.
|
03-29 08:01 | Success | - | |
|
exp_self.20260308172406.005_20260308_172438
|
Per-Matrix Dynamic Precision
Paper ID: self.20260308172406.005 - Hypothesis: The projection matrices (B, C) are more robust to quantization than the state transition matrix (A). Applying aggressive 4-bit quantization only to B/C yields speedups with minimal accuracy lo...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308172546.006_20260308_172621
|
GLA-2: Hybrid Linear-SSM Gate Benchmark
README.md: GLA-2: Hybrid Linear-SSM Gate Benchmark. This repository implements a benchmark for the **GLA-2 (Gated Linear-Attention 2)** architecture. This innovation tests the hypothesis that a lightweight, learned gating mechanism can optima...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308172925.007_20260308_172954
|
Student hypothesis: ssm + cache co-design
Paper ID: self.20260308172925.007 - Hypothesis: Combining ssm + cache + memory will improve throughput or memory efficiency without breaking 8GB execution. - Plan: Create a compact comparative benchmark against a simple baseline, measure VR...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308173100.008_20260308_173150
|
Hierarchical State Space Partitioning (HSSP)
Paper ID: self.20260308173100.008 - Hypothesis: The SSM state vector can be segmented into a short-term active window (GPU) and a long-term compressed history (CPU). Transferring only the delta every N steps will maintain perplexity while r...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308173513.009_20260308_173548
|
Zero-Copy Memory-Mapped State Streaming for SSMs
README.md: Zero-Copy Memory-Mapped State Streaming for SSMs. This repository provides a runnable benchmark for **Zero-Copy Memory-Mapped State Streaming**. The Innovation: Standard State Space Models (SSMs) require maintaining recurrent states...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308174849.010_20260308_174914
|
Dormant State Offloading (DSO) Benchmark
README.md: Dormant State Offloading (DSO) Benchmark. **Innovation:** Dormant State Offloading (DSO). **Category:** Memory Optimization, SSM/Cache Management. Hypothesis: State Space Models (SSMs) and Transformers processing long contexts (128k+)...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308175017.011_20260308_175052
|
Cross-Layer State Distillation (CLSD) Benchmark
README.md: Cross-Layer State Distillation (CLSD) Benchmark. Overview: This benchmark evaluates the **Cross-Layer State Distillation (CLSD)** hypothesis. The core idea is to replace a "Deep" stack of sequential State Space Model (SSM) layers wi...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308175435.012_20260308_175501
|
Asynchronous Host-Device State Ring Buffer
README.md: Asynchronous Host-Device State Ring Buffer. Hypothesis: By maintaining a sliding window of 'active' states on GPU and 'dormant' states in pageable/pinned CPU memory, we can theoretically infer infinite context lengths on 8GB GPUs, b...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308175629.013_20260308_175701
|
Sparse Associative State Injection (SASI)
Paper ID: self.20260308175629.013 - Hypothesis: Injecting a k-NN retrieved vector from a running history cache into the SSM input will improve performance on long-context needle-in-haystack tasks without re-training. - Plan: Implement a CPU...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308175823.014_20260308_175854
|
Prompt-Gated Temporal Decay (PGTD) Benchmark
README.md: Prompt-Gated Temporal Decay (PGTD) Benchmark. This benchmark evaluates the **Prompt-Gated Temporal Decay (PGTD)** innovation against a standard SSM baseline. Hypothesis: Static SSMs often forget early context due to fixed decay rate...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308180140.015_20260308_180204
|
Entropy-Adaptive State Quantization Benchmark
README.md: Entropy-Adaptive State Quantization Benchmark. This benchmark evaluates a novel **Dynamic Precision State Space Model (SSM)** wrapper. The core hypothesis is that memory bandwidth and compute can be optimized by adjusting the numer...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308180341.016_20260308_180415
|
Sparse Associative State (SAS) Benchmark
README.md: Sparse Associative State (SAS) Benchmark. This benchmark validates the **Sparse Associative State (SAS)** hypothesis, which proposes that dense State Space Model (SSM) states can be optimized for long-context tasks by offloading "d...
|
03-29 08:01 | Success | - | |
|
exp_self.20260308180540.017_20260308_180605
|
Benchmark: SSM + Cache + Dynamic Precision Co-design
README.md: Benchmark: SSM + Cache + Dynamic Precision Co-design. This benchmark investigates the hypothesis that integrating **State Space Models (SSM)**, optimized **Caching** strategies, and **Dynamic Precision** (AMP) can yield better memo...
|
03-29 08:01 | Success | - | |
|
exp_pytrain.20260329075911.001_20260329_075929
|
Dynamic Entry Point Dispatcher
This benchmark tests the efficiency and robustness of a dynamic plugin system using Python's `typing.Protocol`. The design simulates an entry-point based architecture where concrete classes are registered, validated against a structural int...
|
03-29 08:00 | Success | - | |
|
exp_pytrain.20260327104617.001_20260327_104619
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-27 10:46 | Success | - | |
|
exp_pytrain.20260326135218.064_20260326_135239
|
Python Skill Fallback
Title: Dynamic Plugin Loader with Protocol Enforcement - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-26 13:52 | Success | - | |
|
exp_pytrain.20260326132907.063_20260326_132939
|
Python Skill Fallback
Title: Dynamic ZipApp Construction and Runtime Type Verification - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-26 13:29 | Success | - | |
|
exp_pytrain.20260326130903.062_20260326_130924
|
Generic Plugin Registry with PEP 695 Syntax
Overview: This benchmark demonstrates the use of **PEP 695 Type Parameter Syntax** (available in Python 3.12+) to create a robust, type-safe Generic Plugin Registry. Key Features: 1. **Type Parameters (PEP 695)**: Uses the new `class ClassNam...
|
03-26 13:09 | Success | - | |
|
exp_pytrain.20260326124943.061_20260326_125019
|
Python Skill Fallback
Title: Dynamic Type-Safe Plugin Loader - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-26 12:50 | Success | - | |
|
exp_pytrain.20260326122844.060_20260326_122920
|
Python Skill Fallback
Title: Strictly-Typed Generic Registry for Distributed Configs - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-26 12:29 | Success | - | |
|
exp_pytrain.20260326120655.059_20260326_120725
|
Python Skill Fallback
Title: Generic Component Registry with Runtime Validation - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-26 12:07 | Success | - | |
|
exp_pytrain.20260326114330.058_20260326_114355
|
Strict-Typed Virtual Module Loader
This coding drill validates the ability to programmatically construct Python modules in memory using the `types` and `importlib` standard libraries, while enforcing strict behavioral contracts using `typing.Protocol`. Overview: The script im...
|
03-26 11:44 | Success | - | |
|
exp_pytrain.20260326112128.057_20260326_112159
|
Type-Safe Dynamic Plugin Loader Benchmark
This benchmark tests a Python environment's ability to dynamically generate a package structure, load modules at runtime using `importlib`, and strictly validate their interfaces using modern static typing features (`typing.Protocol` and `@...
|
03-26 11:22 | Success | - | |
|
exp_pytrain.20260326105908.056_20260326_105930
|
Protocol-Based Extensible Data Ingestion Framework
This coding drill focuses on advanced type hinting features in Python, specifically `typing.Protocol` for structural subtyping and `typing.Generic` for creating reusable, type-safe components. Objective: Implement a generic data ingestion an...
|
03-26 10:59 | Success | - | |
|
exp_pytrain.20260326103622.055_20260326_103655
|
Python Skill Fallback
Title: Type-Safe Generic Event Dispatcher - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-26 10:36 | Success | - | |
|
exp_pytrain.20260326101135.054_20260326_101209
|
Python Skill Fallback
Title: Strict Generic Registry for Extensible Packages - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-26 10:12 | Success | - | |
|
exp_pytrain.20260326094816.053_20260326_094840
|
Robust Plugin Loader with Runtime Type Validation
README.md: Robust Plugin Loader with Runtime Type Validation. Objective: This benchmark tests your ability to construct a secure, dynamic plugin loading system using Python's standard library. The system must enforce strict interface contracts...
|
03-26 09:48 | Success | - | |
|
exp_pytrain.20260326092327.052_20260326_092413
|
Python Skill Fallback
Title: Strictly Typed Dynamic Component Registry - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-26 09:24 | Success | - | |
|
exp_pytrain.20260326085929.051_20260326_090005
|
Type-Safe Data Serializer and CLI Tool
This benchmark implements a robust, type-safe serialization library and command-line interface within a single Python file. Overview: The `benchmark.py` script serves a dual purpose: 1. **Library**: It acts as an importable module providing...
|
03-26 09:00 | Success | - | |
|
exp_pytrain.20260326083042.050_20260326_083124
|
Dynamic Plugin Loader with Type Safety
Hypothesis: A robust system relies on strict interfaces and dynamic discovery mechanisms rather than hard-coded dependencies. By combining `typing.Protocol` with `importlib`, developers can create extensible architectures that fail predictab...
|
03-26 08:31 | Success | - | |
|
exp_pytrain.20260326075836.049_20260326_075921
|
Dynamic Plugin Packaging and Type Verification
This benchmark tests a Python system's ability to dynamically generate, package, and verify source code at runtime. Scenario: The system must act as an autonomous plugin manager. It defines a strict **Protocol** (`DataProcessor`) that expect...
|
03-26 07:59 | Success | - | |
|
exp_pytrain.20260326073144.048_20260326_073209
|
Python Skill Fallback
Title: Runtime-Validated Package Scaffolder with Modern Generics - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-26 07:32 | Success | - | |
|
exp_pytrain.20260326070337.047_20260326_070425
|
Type-Safe Dynamic Module Loader Benchmark
Overview: This coding drill tests the ability to construct a robust, zero-dependency plugin architecture using Python's standard library. The focus is on strict interface enforcement using `typing.Protocol` and the dynamic loading of modules...
|
03-26 07:04 | Success | - | |
|
exp_pytrain.20260326063939.046_20260326_064013
|
Dynamic Plugin Registry with Virtual Package Simulation
Overview: This benchmark tests your ability to construct a robust, type-safe plugin architecture similar to those found in high-performance ML frameworks like `vLLM` or `Diffusers`. It requires creating a virtual package namespace at runtime...
|
03-26 06:40 | Success | - | |
|
exp_pytrain.20260326061608.045_20260326_061642
|
PEP 621 Metadata Validator and Version Syncer
Overview: This benchmark implements a robust, static analysis tool to ensure build integrity by synchronizing version information between a package's source code (`__init__.py`) and its build metadata (`pyproject.toml`). The Hypothesis: Confi...
|
03-26 06:16 | Success | - | |
|
exp_pytrain.20260326053307.044_20260326_053344
|
Dynamic Type-Safe Plugin Registry Benchmark
This benchmark validates the implementation of a dynamic plugin system that combines runtime module discovery with static type checking using Python's `typing.Protocol`. Objective: The goal is to implement a `PluginRegistry` that can: 1. Dyn...
|
03-26 05:33 | Success | - | |
|
exp_pytrain.20260326050255.043_20260326_050325
|
Typed Plugin System with Package Simulation
Overview: This benchmark challenges the implementation of a robust, type-safe plugin architecture within a single Python file. It simulates a micro-package environment using standard library features, focusing on `typing.Protocol` for struct...
|
03-26 05:03 | Success | - | |
|
exp_pytrain.20260326043055.042_20260326_043149
|
Generic Entity Repository with PEP 695 Syntax
This benchmark tests the implementation of a generic in-memory repository using Python 3.12+ features. It validates the use of **PEP 695 Type Parameter Syntax** (introducing type parameters using square brackets) and the new `type` statemen...
|
03-26 04:31 | Success | - | |
|
exp_pytrain.20260326035513.041_20260326_035556
|
Python Skill Fallback
Title: Dynamic Module Loader with Structural Subtyping - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-26 03:55 | Success | - | |
|
exp_pytrain.20260326031951.040_20260326_032043
|
Benchmark: Typed Plugin Registry with Strict Packaging Hygiene
This coding drill evaluates your ability to design a robust, modular library architecture within a single file. You must leverage Python's advanced typing features (Protocols, Generics) to enforce interface contracts and implement strict pa...
|
03-26 03:20 | Success | - | |
|
exp_pytrain.20260326023831.039_20260326_023924
|
Strictly Typed Component Registry with CLI Simulation
This benchmark tests the ability to construct a zero-dependency, type-safe plugin registry and command-line interface (CLI) dispatcher, mimicking the architectural patterns found in major ML libraries like Hugging Face Transformers. Problem...
|
03-26 02:39 | Success | - | |
|
exp_pytrain.20260326020413.038_20260326_020438
|
Dynamic Type-Checked Plugin Loader
Objective: Design a Python system that bridges static type safety with dynamic runtime execution. The goal is to define a strictly typed generic interface using `typing.Protocol` and `TypeVar`, programmatically generate a Python package in a...
|
03-26 02:04 | Success | - | |
|
exp_pytrain.20260326012417.037_20260326_012508
|
Python Skill Fallback
Title: Runtime Type-Checked Package Loader - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-26 01:25 | Success | - | |
|
exp_pytrain.20260326005036.036_20260326_005159
|
Metadata-Aware Plugin Loader
This benchmark challenges you to implement a robust, type-safe plugin architecture using Python's standard library. The system must dynamically discover a "third-party" plugin package using `importlib.metadata` and verify its compliance wit...
|
03-26 00:52 | Success | - | |
|
exp_pytrain.20260326001549.035_20260326_001633
|
PEP 695 Generic Pipeline Processor Benchmark
This benchmark tests your ability to utilize modern Python 3.12+ type hinting features (PEP 695) to build a robust, type-safe data processing pipeline. It validates the new Type Parameter Syntax for classes and type aliases, eliminating the...
|
03-26 00:16 | Success | - | |
|
exp_pytrain.20260325234909.034_20260325_234936
|
Strictly-Typed Modular Configuration Registry
This benchmark evaluates the implementation of a robust, library-grade configuration system using Python's advanced type hinting features. The solution must simulate a core component of a large-scale application (similar to LitGPT), enforci...
|
03-25 23:49 | Success | - | |
|
exp_pytrain.20260325232603.033_20260325_232640
|
Strict Configuration Validator Benchmark
This benchmark evaluates a high-performance, zero-dependency configuration validation engine designed for production-grade Python applications. It utilizes advanced metaprogramming with `typing` and `dataclasses` to enforce strict schema co...
|
03-25 23:26 | Success | - | |
|
exp_pytrain.20260325230200.032_20260325_230228
|
Generic Plugin Registry with Protocol-Based Constraints
Description: This benchmark implements a robust, modular plugin registry system using Python's `typing.Protocol` and `typing.TypeVar`. It mimics architectural patterns found in large-scale frameworks (like Hugging Face Transformers or Diffus...
|
03-25 23:02 | Success | - | |
|
exp_pytrain.20260325223351.031_20260325_223418
|
Python Skill Fallback
Title: Protocol-Based Dynamic Plugin Loader - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-25 22:34 | Success | - | |
|
exp_pytrain.20260325220212.030_20260325_220242
|
Python Skill Fallback
Title: Metadata-Aware Secure Source Archiver - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-25 22:02 | Success | - | |
|
exp_pytrain.20260325213137.029_20260325_213231
|
Static Package Metadata and Type-Strictness Verifier
This benchmark implements a CLI verification tool designed to statically analyze Python package structures. It enforces code quality standards by parsing Abstract Syntax Trees (AST) without executing the target code, ensuring safety and sid...
|
03-25 21:32 | Success | - | |
|
exp_pytrain.20260325203954.028_20260325_204021
|
Benchmark: PEP 695 Generic Plugin Registry
This benchmark evaluates the implementation of a type-safe plugin registry system using **Python 3.12+ Type Parameter Syntax (PEP 695)**. Objectives: 1. **Modern Syntax**: Utilize the new class-based type parameter syntax (e.g., `class Regis...
|
03-25 20:40 | Success | - | |
|
exp_pytrain.20260325201424.027_20260325_201454
|
Strictly Typed Autograd System with Protocol Contracts
Design Brief: This coding drill validates the hypothesis that an autonomous system can produce robust, maintainable code by implementing a simplified Automatic Differentiation (autograd) engine. The implementation must leverage Python's type...
|
03-25 20:14 | Success | - | |
|
exp_pytrain.20260325195258.026_20260325_195320
|
Type-Safe Configuration & Dynamic Plugin Dispatcher Benchmark
Overview: This benchmark evaluates the ability to construct a robust, modular Python architecture using the standard library (`typing`, `dataclasses`, `importlib`). It simulates a simplified Machine Learning inference framework where the exe...
|
03-25 19:53 | Success | - | |
|
exp_pytrain.20260325193156.025_20260325_193221
|
Python Skill Fallback
Title: Typed Event Dispatcher with Module Hygiene - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-25 19:32 | Success | - | |
|
exp_pytrain.20260325191221.024_20260325_191246
|
Strictly Typed Modular Log Processor
Overview: This coding drill benchmark evaluates your ability to construct a robust, multi-module Python package that strictly enforces type safety and adheres to PEP 8 standards. Objective: Create a Python package named `logtools` containing:...
|
03-25 19:12 | Success | - | |
|
exp_pytrain.20260325185336.023_20260325_185406
|
In-Memory Zip Loader with Protocol Enforcement
This coding drill demonstrates a robust method for creating, packaging, and enforcing strict structural typing (Protocol) for Python plugins dynamically loaded from a Zip archive, without persisting files to disk (using temporary files). Ob...
|
03-25 18:54 | Success | - | |
|
exp_pytrain.20260325183432.022_20260325_183453
|
PEP 695 Generic Registry & Introspection Benchmark
This benchmark evaluates the implementation of a thread-safe generic registry utilizing **PEP 695 Type Parameter Syntax** (introduced in Python 3.12). It validates the reduction of boilerplate code and verifies module introspection capabili...
|
03-25 18:34 | Success | - | |
|
exp_pytrain.20260325181145.021_20260325_181208
|
Structural Plugin Loader Benchmark
This benchmark evaluates the ability to construct a robust, decoupled plugin architecture using Python's `importlib` for dynamic discovery and `typing.Protocol` for structural interface validation. Objective: Create a standalone system that...
|
03-25 18:12 | Success | - | |
|
exp_pytrain.20260325175112.020_20260325_175144
|
Dynamic Backend Registry with Runtime Type Verification
This benchmark demonstrates the creation of a robust, modular plugin system using Python's standard library. It simulates a high-performance computing environment (similar to ML frameworks like PyTorch or Lightning) where backend implementa...
|
03-25 17:51 | Success | - | |
|
exp_pytrain.20260325173055.019_20260325_173125
|
Dynamic Namespace Package Injection & Runtime Type Verification
Overview This benchmark tests the ability to implement a robust, runtime-safe plugin loader using Python's standard library. The solution must dynamically create a namespace package from a string source, inject it into the runtime path, and...
|
03-25 17:31 | Success | - | |
|
exp_pytrain.20260325170955.018_20260325_171018
|
Dynamic Plugin Inspector with Type Guarantees
This benchmark tests the ability to write a robust, type-safe Python utility for runtime package introspection. It utilizes `importlib.metadata` to inspect installed distributions and enforces strict data structures using `typing.TypedDict`...
|
03-25 17:10 | Success | - | |
|
exp_pytrain.20260325164801.017_20260325_164829
|
Type-Safe Plugin Loader with Runtime Validation
This benchmark evaluates the ability to design a robust dynamic plugin system using Python's standard library. Objective: Create a `PluginLoader` class that dynamically discovers, imports, and validates Python modules from a temporary direct...
|
03-25 16:48 | Success | - | |
|
exp_pytrain.20260325162708.016_20260325_162736
|
Python Skill Fallback
Title: Generic Typed Pipeline and CLI Interface - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-25 16:27 | Success | - | |
|
exp_pytrain.20260325160606.015_20260325_160638
|
Modern Generic Plugin Registry - PEP 695 Benchmark
This benchmark validates the hypothesis that utilizing **PEP 695 Type Parameter Syntax** significantly reduces the boilerplate associated with defining generic containers while enforcing stricter interface adherence via **PEP 484 Protocols*...
|
03-25 16:06 | Success | - | |
|
exp_pytrain.20260325154529.014_20260325_154559
|
Dynamic 'Plugin' Registry with Type-Safe Packaging
This benchmark evaluates a Python engineer's ability to implement a modular, type-safe plugin system using advanced standard library features. Objective: Construct a runtime environment that dynamically discovers, loads, and validates "plugi...
|
03-25 15:46 | Success | - | |
|
exp_pytrain.20260325152529.013_20260325_152555
|
Dynamic Module Construction & Type Validation Benchmark
This benchmark evaluates the ability to construct Python modules dynamically at runtime using `types.ModuleType` and `sys.modules`, and to rigorously validate the generated components against strict `typing.Protocol` definitions. Scenario: T...
|
03-25 15:25 | Success | - | |
|
exp_pytrain.20260325150542.012_20260325_150609
|
Dynamic Component Registry and Generic Loader
This benchmark demonstrates the construction of a robust plugin architecture using Python's standard library. It mimics the `AutoModel` pattern found in major ML frameworks (like Hugging Face Transformers) by leveraging `inspect` and `typin...
|
03-25 15:06 | Success | - | |
|
exp_pytrain.20260325144750.011_20260325_144752
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-25 14:47 | Success | - | |
|
exp_pytrain.20260325144141.010_20260325_144142
|
Python Skill Fallback
Title: Python reliability drill: typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-25 14:41 | Success | - | |
|
exp_pytrain.20260325142735.009_20260325_142804
|
Python Skill Fallback
Title: Runtime Type-Safe Plugin Loader - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-25 14:28 | Success | - | |
|
exp_pytrain.20260325140547.008_20260325_140630
|
Python Skill Fallback
Title: Robust PEP 440 Version Resolver with Generic Constraints - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-25 14:06 | Success | - | |
|
exp_pytrain.20260325134544.007_20260325_134607
|
Strictly-Typed Configuration Resolver Benchmark
This benchmark validates a Python module (`benchmark.py`) that implements a strict configuration schema for tensor initialization using Python's `typing` module. Goals: 1. **Structure**: Implement a module compliant with packaging standards...
|
03-25 13:46 | Success | - | |
|
exp_pytrain.20260325132404.006_20260325_132436
|
Python Skill Fallback
Title: Typed Plugin Registry with Semantic Versioning - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-25 13:24 | Success | - | |
|
exp_pytrain.20260325130330.005_20260325_130352
|
Strictly Typed Dependency Injection Container Benchmark
This benchmark implements a robust Dependency Injection (DI) container using Python's standard library. It demonstrates the use of `typing.Protocol` for interface definition and `inspect.signature` for automatic dependency resolution (auto-...
|
03-25 13:03 | Success | - | |
|
exp_pytrain.20260325123738.004_20260325_123810
|
Strictly Typed Plugin System CLI
This benchmark demonstrates the implementation of a strictly typed, architectural CLI using Python's `typing.Protocol`, `TypedDict`, and `argparse`. It simulates a plugin system where components are decoupled via structural subtyping (proto...
|
03-25 12:38 | Success | - | |
|
exp_pytrain.20260325121326.003_20260325_121424
|
Robust Async Micro-Service Skeleton Benchmark
This benchmark validates the implementation of a robust, asynchronous Python micro-service skeleton. It tests the developer's ability to structure a Python application simulating a package layout, utilizing strict type hints (`typing.Protoc...
|
03-25 12:14 | Success | - | |
|
exp_pytrain.20260325114709.002_20260325_114736
|
Python Skill Fallback
Title: PEP 695 Type-Safe Command Dispatcher - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-25 11:47 | Success | - | |
|
exp_pytrain.20260325112102.001_20260325_112127
|
Python Skill Fallback
Title: Strictly Typed Dynamic Plugin Loader - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-25 11:21 | Success | - | |
|
exp_pytrain.20260325104914.001_20260325_104946
|
Python Skill Fallback
Title: Structural Subtyping for Package Entry Points - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-25 10:49 | Success | - | |
|
exp_pytrain.20260324102734.004_20260324_102819
|
Strictly Typed Event Dispatcher Module
Overview: This coding drill benchmarks a strictly typed, modular Event Dispatcher system designed with Python's `typing.Protocol` and `typing.Generic` features. Architecture: The solution implements a **Type-Safe Observer Pattern**. 1. **`Eve...
|
03-24 10:28 | Success | - | |
|
exp_pytrain.20260324095427.003_20260324_095516
|
Protocol-Based Dynamic Extension Loader
Objective: This benchmark tests a Python system's ability to simulate a robust, heterogeneous plugin architecture. It demonstrates the creation of a strict type-safe interface using `typing.Protocol`, dynamic discovery of modules using `impo...
|
03-24 09:55 | Success | - | |
|
exp_pytrain.20260324092754.002_20260324_092822
|
Python Skill Fallback
Title: Modern Generic Result Monad & Module API Design - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-24 09:28 | Success | - | |
|
exp_pytrain.20260324091447.001_20260324_091555
|
Python Skill Fallback
Title: Dynamic Entrypoint Loader with Structural Typing - Focus: typing.Protocol, typing.runtime_checkable, typing.Annotated, importlib, packaging - Note: Generated fallback due to unavailable model output.
|
03-24 09:15 | Success | - | |
|
exp_pytrain.20260318102029.002_20260318_102057
|
Generic Result Wrapper with PEP 695
This benchmark demonstrates the implementation of a robust, Rust-like `Result` type utilizing Python 3.12's PEP 695 Type Parameter Syntax. Features: - **PEP 695 Syntax**: Uses the new `class MyClass[T]:` syntax, removing the need for explici...
|
03-18 10:21 | Success | - | |
|
exp_pytrain.20260318095243.001_20260318_095341
|
Strictly-Typed Modular Data Pipeline Benchmark
This benchmark evaluates a Python implementation of a modular data processing pipeline. The architecture prioritizes **Structural Subtyping (Protocols)** over nominal inheritance, ensuring that components are interchangeable based on their...
|
03-18 09:53 | Success | - | |
|
exp_pytrain.20260316152436.002_20260316_152457
|
Type-Safe Generic Cache (PEP 695)
This benchmark tests your ability to utilize **PEP 695 Type Parameter Syntax** (introduced in Python 3.12). The Challenge: Modern Python allows you to define generic classes using the syntax `class MyClass[T]:`, removing the need for `TypeVa...
|
03-16 15:25 | Success | - | |
|
exp_pytrain.20260316150232.001_20260316_150252
|
MiniPlugin: Strictly Typed Modular Plugin System
Overview: This benchmark demonstrates the implementation of a robust, single-file Python package named `MiniPlugin`. It showcases advanced Python features including Generic Protocols, TypeVars, and strict runtime type checking enforcement wi...
|
03-16 15:02 | Success | - | |
|
exp_pytrain.20260316142805.005_20260316_142858
|
Dynamic Namespace Packaging and Runtime Protocol Verification
This benchmark tests an autonomous agent's ability to programmatically construct a Python namespace package on a virtual file system, perform dynamic module loading using `importlib`, and enforce runtime interface contracts using `typing.Pr...
|
03-16 14:29 | Success | - | |
|
exp_pytrain.20260316140324.004_20260316_140411
|
Python Skill Fallback
Title: Type-Safe Plugin System with Packaging Hygiene - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-16 14:04 | Success | - | |
|
exp_pytrain.20260316134142.003_20260316_134220
|
Python Skill Fallback
Title: Strictly Typed Modular Data Pipeline - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-16 13:42 | Success | - | |
|
exp_pytrain.20260316131743.002_20260316_131835
|
Generic Versioned Registry using PEP 695
This benchmark tests the implementation of a type-safe, generic registry for versioned software artifacts using Python 3.12's Type Parameter Syntax (PEP 695). Objectives: 1. Demonstrate the reduction of boilerplate code using the new generic...
|
03-16 13:18 | Success | - | |
|
exp_pytrain.20260316124809.001_20260316_124836
|
Dynamic Package Construction and Protocol Validation
Overview: This benchmark evaluates the system's ability to programmatically construct Python package structures at runtime and validate type safety using `typing.Protocol`. Tasks: 1. **Protocol Definition**: Define a `DataPlugin` protocol req...
|
03-16 12:48 | Success | - | |
|
exp_pytrain.20260316122337.002_20260316_122404
|
Generic Repository Pattern with PEP 695 Type Parameters
Overview: This benchmark validates the implementation of a **Generic Repository Pattern** utilizing the **PEP 695 Type Parameter Syntax** introduced in Python 3.12. The objective is to demonstrate a clean, maintainable architecture by levera...
|
03-16 12:24 | Success | - | |
|
exp_pytrain.20260316115901.001_20260316_115932
|
Python Skill Fallback
Title: Implementation of a Strictly-Typed In-Memory Package Registry - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-16 11:59 | Success | - | |
|
exp_pytrain.20260316100558.004_20260316_100640
|
Python Skill Fallback
Title: Strictly-Typed Dynamic Plugin Loader - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-16 10:06 | Success | - | |
|
exp_pytrain.20260316093508.003_20260316_093529
|
Strict Zip-App Bundler with Runtime Type Validation
This benchmark tests the ability to engineer a robust code packaging pipeline. The script implements a `StrictBundler` class that enforces code quality standards by inspecting Python source files, ensuring type hint coverage using the `typi...
|
03-16 09:35 | Success | - | |
|
exp_pytrain.20260316090922.002_20260316_090959
|
Python Skill Fallback
Title: PEP 695 Generic Plugin Registry with Importlib Introspection - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-16 09:10 | Success | - | |
|
exp_pytrain.20260316084207.001_20260316_084229
|
Python Skill Fallback
Title: Strictly Typed Plugin System with Entry-Point Simulation - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-16 08:42 | Success | - | |
|
exp_pytrain.20260315163655.006_20260315_163719
|
Dynamic Type-Checked Plugin Loader Benchmark
Overview: This benchmark validates the robustness of a modular autonomous system component by simulating the dynamic loading of a computation engine (plugin). It enforces strict **Protocol** compliance using Python's `typing` module and vali...
|
03-15 16:37 | Pending | - | |
|
exp_self.20260315162309.008_20260315_162330
|
Self-directed benchmark: SSM Strategy Stress Test
This repository contains a micro-benchmark designed to evaluate the efficacy of a **Disciplined Memory Policy** within State Space Models (SSMs). Hypothesis: Applying an SSM with a disciplined memory policy (fixed-size state recurrence) sign...
|
03-15 16:34 | Success | - | |
|
exp_self.20260315162013.007_20260315_162046
|
SSM Strategy Stress Test Benchmark
This benchmark evaluates the performance impact of applying a disciplined memory policy to State Space Models (SSMs) when operating under strict VRAM constraints (8GB). Hypothesis: Applying SSMs with a disciplined memory policy (chunked recu...
|
03-15 16:20 | Success | - | |
|
exp_pytrain.20260315161708.005_20260315_161732
|
Type-Safe Dynamic Plugin Loader
Objective: This benchmark evaluates the implementation of a robust, type-safe plugin discovery system using Python's standard library. It tests proficiency in dynamic code loading (`importlib`) and Structural Subtyping (`typing.Protocol`)....
|
03-15 16:17 | Success | - | |
|
exp_self.20260315161500.006_20260315_161525
|
Self-Directed Benchmark: SSM Strategy Stress Test
This benchmark evaluates the memory efficiency and throughput of a State Space Model (SSM) strategy compared to a standard quadratic attention mechanism under constrained resources. The hypothesis posits that a disciplined memory policy (co...
|
03-15 16:15 | Success | - | |
|
exp_self.20260315161215.005_20260315_161243
|
Benchmark: SSM Strategy Stress Test
This benchmark evaluates the hypothesis that a disciplined memory policy within a State Space Model (SSM) architecture improves throughput and reduces VRAM footprint compared to a naive accumulation baseline. Requirements: - Python 3.8+ - Py...
|
03-15 16:12 | Success | - | |
|
exp_pytrain.20260315160916.004_20260315_160958
|
Python Skill Fallback
Title: Typed CSV Data Pipeline Module - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-15 16:10 | Success | - | |
|
exp_self.20260315160603.004_20260315_160637
|
Self-directed benchmark: SSM Strategy Stress Test
This benchmark evaluates a "Memory-Disciplined" State Space Model (SSM) strategy against a standard naive implementation. The hypothesis is that an SSM approach, which explicitly manages state history rather than materializing the entire at...
|
03-15 16:07 | Success | - | |
|
exp_self.20260315160247.003_20260315_160337
|
Benchmark: SSM Strategy Stress Test
This repository contains a lightweight, runnable benchmark designed to test the hypothesis that applying State Space Models (SSMs) with a disciplined memory policy improves throughput and efficiency under strict VRAM constraints (<8GB). Hyp...
|
03-15 16:03 | Success | - | |
|
exp_pytrain.20260315155952.003_20260315_160025
|
Strictly-Typed Plugin Registry with Runtime Validation
This coding drill implements a strictly-typed Plugin System using Python's `typing.Protocol` and the `@runtime_checkable` decorator. Unlike traditional Abstract Base Classes (ABCs) that rely on inheritance, this approach uses Structural Sub...
|
03-15 16:00 | Success | - | |
|
exp_self.20260315155616.002_20260315_155643
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260315155616.002 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
03-15 15:56 | Success | - | |
|
exp_pytrain.20260315155252.002_20260315_155322
|
Modern Generic Stack with Module Encapsulation
Objective: This benchmark evaluates the implementation of a Python 3.12 generic stack class utilizing the new **PEP 695 Type Parameter Syntax**. It tests adherence to modern module packaging standards, including strict API definition via `__...
|
03-15 15:53 | Success | - | |
|
exp_self.20260315154412.001_20260315_154439
|
Self-directed benchmark: SSM strategy stress test
Overview: This benchmark evaluates the memory efficiency and throughput of a State Space Model (SSM) strategy compared to a standard Transformer baseline. The innovation hypothesis is that an SSM with a disciplined memory policy (using recur...
|
03-15 15:51 | Success | - | |
|
exp_pytrain.20260315154100.001_20260315_154127
|
Dynamic Package Builder with Runtime Type Verification
**Hypothesis:** An autonomous coding system can utilize Python's standard library to programmatically construct a valid package namespace and enforce strict type safety (Generics and Protocols) at runtime without relying on external static...
|
03-15 15:41 | Success | - | |
|
exp_self.20260315153603.032_20260315_153636
|
SSM Strategy Stress Test: Memory Disciplined Benchmark
This benchmark evaluates the performance of a State Space Model (SSM) inference strategy under strict memory constraints (simulating an 8GB VRAM environment). Hypothesis: Applying an SSM with a disciplined memory policy (chunking + precision...
|
03-15 15:36 | Success | - | |
|
exp_pytrain.20260315153303.017_20260315_153337
|
Dynamic Extension Loader with Protocol Validation
Problem Statement: Modern Python plugin architectures require a mechanism to load code at runtime (dynamic packaging) while guaranteeing that the loaded code adheres to specific contracts (typing/protocols). Without strict runtime validation...
|
03-15 15:33 | Success | - | |
|
exp_self.20260315153058.031_20260315_153118
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260315153058.031 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
03-15 15:31 | Success | - | |
|
exp_self.20260315152802.030_20260315_152828
|
Self-directed benchmark: ssm strategy stress test
This repository contains a benchmark designed to test the hypothesis that applying State Space Models (SSMs) with a disciplined memory policy improves throughput under 8GB VRAM constraints compared to standard Transformer Attention mechanis...
|
03-15 15:28 | Success | - | |
|
exp_pytrain.20260315152518.016_20260315_152545
|
Python Skill Fallback
Title: Strictly Typed CSV Data Ingestion Module - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-15 15:25 | Success | - | |
|
exp_self.20260315152311.029_20260315_152337
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260315152311.029 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
03-15 15:23 | Success | - | |
|
exp_self.20260315152039.028_20260315_152105
|
SSM Strategy Stress Test Benchmark
This benchmark evaluates the impact of a "disciplined memory policy" on State Space Model (SSM) inference throughput under tight VRAM constraints. Hypothesis: Applying an SSM recurrence strategy with explicit chunking and state management ma...
|
03-15 15:21 | Success | - | |
|
exp_pytrain.20260315151735.015_20260315_151806
|
Python Skill Fallback
Title: Typed Configuration Factory using PEP 695 - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-15 15:18 | Success | - | |
|
exp_self.20260315151528.027_20260315_151559
|
Self-Directed Benchmark: SSM Strategy Stress Test
This benchmark evaluates the hypothesis that a disciplined memory policy (specifically chunked processing and dynamic precision) applied to State Space Models (SSM) improves throughput under strict memory constraints (target < 8GB VRAM). Hy...
|
03-15 15:16 | Success | - | |
|
exp_self.20260315151223.026_20260315_151247
|
SSM Strategy Stress Test
This benchmark evaluates the hypothesis that applying State Space Models (SSMs) with a disciplined memory policy improves throughput under constrained VRAM environments (8GB limit). It contrasts a standard Attention-based block against a si...
|
03-15 15:13 | Success | - | |
|
exp_pytrain.20260315150912.014_20260315_150938
|
Type-Safe Component Registry and Dependency Resolver Benchmark
This drill evaluates the developer's ability to construct a robust, zero-dependency component loader using Python's advanced typing features. Objective: Design a generic `PluginRegistry` system that manages component lifecycle and dependenci...
|
03-15 15:09 | Success | - | |
|
exp_self.20260315150704.025_20260315_150732
|
Self-directed benchmark: SSM Strategy Stress Test
Overview: This benchmark evaluates the performance efficiency of a State Space Model (SSM) strategy compared to a standard Transformer baseline under constrained memory conditions (targeting <8GB VRAM). Hypothesis: Applying SSM with a discipl...
|
03-15 15:07 | Success | - | |
|
exp_self.20260315150402.024_20260315_150434
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260315150402.024 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
03-15 15:04 | Success | - | |
|
exp_pytrain.20260315150114.013_20260315_150146
|
Python Skill Fallback
Title: Generic Plugin Registry with Dynamic Imports - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-15 15:01 | Success | - | |
|
exp_self.20260315145907.023_20260315_145945
|
Benchmark: SSM Strategy Stress Test
This benchmark evaluates the hypothesis that applying State Space Models (SSMs) with a disciplined memory policy improves throughput under 8GB VRAM constraints compared to standard attention mechanisms. Hypothesis: SSMs maintain a fixed-size...
|
03-15 14:59 | Success | - | |
|
exp_self.20260315145606.022_20260315_145641
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260315145606.022 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
03-15 14:56 | Success | - | |
|
exp_pytrain.20260315145305.012_20260315_145336
|
Strictly-Typed Dynamic Plugin Registry
Overview: This benchmark demonstrates the implementation of a robust, type-safe plugin architecture using Python's standard `typing` module. It mirrors architectural patterns found in major ML libraries (like Hugging Face Transformers) to en...
|
03-15 14:53 | Success | - | |
|
exp_self.20260315145048.021_20260315_145120
|
SSM Strategy Stress Test
This benchmark evaluates the hypothesis that **applying State Space Models (SSM) with a disciplined memory policy improves throughput under 8GB VRAM constraints** compared to standard dense architectures. Methodology: We compare two modes of...
|
03-15 14:51 | Success | - | |
|
exp_self.20260315144746.020_20260315_144808
|
Self-Directed Benchmark: SSM Strategy Stress Test
Overview: This benchmark evaluates the hypothesis that applying State Space Model (SSM) strategies, specifically **constant-state memory management** combined with **dynamic precision**, improves throughput and stability under strict 8GB VRAM...
|
03-15 14:48 | Success | - | |
|
exp_pytrain.20260315144432.011_20260315_144456
|
Strictly-Typed Dynamic Plugin Loader
Overview: This coding drill evaluates the ability to synthesize Python's advanced typing features (Protocols, Generics, Type Guards) with standard library packaging tools (`importlib`). The goal is to create a robust, runtime-extensible arch...
|
03-15 14:45 | Success | - | |
|
exp_self.20260315144128.019_20260315_144154
|
SSM Strategy Stress Test
This benchmark evaluates the hypothesis that **State Space Models (SSMs)** combined with a **disciplined memory policy** (dynamic precision and caching) deliver superior throughput compared to standard Transformer-style architectures when o...
|
03-15 14:42 | Success | - | |
|
exp_pytrain.20260315143821.010_20260315_143851
|
Python Skill Fallback
Title: Strictly Typed ZipApp Generator - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-15 14:38 | Success | - | |
|
exp_self.20260315143610.018_20260315_143638
|
SSM Strategy Stress Test Benchmark
Overview: This benchmark evaluates the hypothesis that a Selective State Space Model (SSM) strategy, combined with a disciplined memory policy, improves throughput under constrained VRAM conditions (e.g., 8GB) compared to a standard Transfor...
|
03-15 14:36 | Success | - | |
|
exp_self.20260315143335.017_20260315_143358
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260315143335.017 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
03-15 14:34 | Success | - | |
|
exp_pytrain.20260315143036.009_20260315_143101
|
Runtime-Verified Plugin Loader
Design Brief: This benchmark demonstrates a zero-dependency plugin architecture using Python's `typing.Protocol` for structural subtyping. It simulates a packaging system by programmatically creating virtual modules using `types.ModuleType`,...
|
03-15 14:31 | Success | - | |
|
exp_self.20260315142813.016_20260315_142846
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260315142813.016 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
03-15 14:28 | Success | - | |
|
exp_self.20260315142506.015_20260315_142532
|
README: SSM Strategy Stress Test Benchmark
This benchmark evaluates the hypothesis that applying an SSM (State Space Model) strategy with a disciplined memory policy significantly improves throughput (tokens/sec) and reduces VRAM usage compared to a naive baseline when operating und...
|
03-15 14:25 | Success | - | |
|
exp_pytrain.20260315142212.008_20260315_142239
|
Generic Storage Package with Protocol Enforcement
This coding drill verifies the ability to design a Python package structure that adheres to modern packaging standards (`src` layout, `pyproject.toml`) and utilizes advanced typing features (`Protocol`, `Generic`, `TypeVar`) to enforce stri...
|
03-15 14:22 | Success | - | |
|
exp_self.20260315141931.014_20260315_141954
|
Self-directed Benchmark: SSM Strategy Stress Test
Overview: This benchmark evaluates the hypothesis that applying State Space Model (SSM) strategies with disciplined memory policies (recurrent state management and dynamic precision) improves inference throughput under strict VRAM constraint...
|
03-15 14:20 | Success | - | |
|
exp_self.20260315141701.013_20260315_141724
|
Self-Directed Benchmark: SSM Strategy Stress Test
Innovation Overview: This benchmark tests the hypothesis that applying a State Space Model (SSM) approach with a disciplined memory policy improves throughput under strict 8GB VRAM constraints compared to traditional Attention-based caching...
|
03-15 14:17 | Success | - | |
|
exp_pytrain.20260315141342.007_20260315_141415
|
Type-Generic Plugin Registry with Protocol Enforcement
This benchmark demonstrates the construction of a robust, modular plugin system using Python's `typing` module. It enforces structural interfaces via `Protocol` and manages algorithm components using a type-safe `Generic` registry. Features...
|
03-15 14:14 | Success | - | |
|
exp_self.20260315140035.012_20260315_140105
|
Self-directed benchmark: SSM strategy stress test
Overview: This benchmark evaluates the hypothesis that applying **State Space Models (SSMs)** with a disciplined memory policy and dynamic precision improves throughput under tight **8GB VRAM constraints**. It compares a standard Attention-b...
|
03-15 14:11 | Success | - | |
|
exp_self.20260315135754.011_20260315_135817
|
SSM Strategy Stress Test
This benchmark compares the memory footprint and throughput of a standard Transformer-style Attention mechanism against a State Space Model (SSM) implementation. **Hypothesis:** The SSM approach, utilizing a disciplined recurrent memory pol...
|
03-15 13:58 | Success | - | |
|
exp_pytrain.20260315135459.006_20260315_135524
|
Dynamic Plugin Loader with Runtime Type Verification
Objective: This benchmark evaluates a system's ability to dynamically construct a Python package environment at runtime, load arbitrary code modules, and strictly enforce interface compliance using Python's `typing.Protocol`. Scenario: The sc...
|
03-15 13:55 | Success | - | |
|
exp_self.20260315135309.010_20260315_135331
|
SSM Strategy Stress Test Benchmark
Overview: This benchmark evaluates the hypothesis that applying a State Space Model (SSM) strategy with a disciplined memory policy improves throughput under 8GB VRAM constraints compared to a baseline implementation. Hypothesis: By leveragin...
|
03-15 13:53 | Success | - | |
|
exp_self.20260315135021.009_20260315_135046
|
Self-Directed Benchmark: SSM Strategy Stress Test
Overview: This benchmark evaluates the hypothesis that **applying a Selective State Space Model (SSM) with a disciplined memory policy improves throughput under 8GB VRAM constraints** compared to a standard Transformer baseline. The Innovati...
|
03-15 13:50 | Success | - | |
|
exp_pytrain.20260315134740.005_20260315_134814
|
Python Skill Fallback
Title: Robust Semantic Versioning & Constraint Resolver - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-15 13:48 | Success | - | |
|
exp_self.20260315134512.008_20260315_134537
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260315134512.008 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
03-15 13:45 | Success | - | |
|
exp_self.20260315134222.007_20260315_134247
|
SSM Strategy Stress Test
This benchmark evaluates the hypothesis that State Space Model (SSM) architectures significantly reduce VRAM usage compared to standard Transformers when processing long sequences under strict memory constraints. Setup: We compare two approa...
|
03-15 13:42 | Success | - | |
|
exp_pytrain.20260315133948.004_20260315_134011
|
Strictly-Typed Modular Log Analyzer
Overview: This benchmark evaluates the implementation of a `log_analyzer` module that serves as both a reusable library and a standalone script. The design enforces strict static typing (`mypy --strict`), explicit public APIs (`__all__`), an...
|
03-15 13:40 | Success | - | |
|
exp_self.20260315133740.006_20260315_133804
|
Self-Directed Benchmark: SSM Strategy Stress Test
This benchmark evaluates the hypothesis that applying **State Space Models (SSM)** with a **disciplined memory policy** significantly improves inference throughput under constrained VRAM (8GB limit). The Innovation: The proposed strategy com...
|
03-15 13:38 | Success | - | |
|
exp_self.20260315133448.005_20260315_133518
|
Self-directed benchmark: SSM Strategy Stress Test
This benchmark evaluates the hypothesis that applying State Space Models (SSM) with a disciplined memory policy (specifically dynamic precision and activation checkpointing) improves throughput and fits within strict VRAM constraints (8GB)...
|
03-15 13:35 | Success | - | |
|
exp_pytrain.20260315133200.003_20260315_133238
|
Coding Drill: Strictly Typed Dynamic Plugin Loader
Hypothesis: An autonomous coding system can robustly integrate external functionality by simulating a package environment and enforcing structural subtyping (Protocols) to validate plugin interfaces before execution, thereby preventing runti...
|
03-15 13:32 | Success | - | |
|
exp_self.20260315133000.004_20260315_133027
|
Self-Directed Benchmark: SSM Strategy Stress Test
Overview: This benchmark evaluates the hypothesis that applying State Space Model (SSM) architectures with a disciplined memory policy improves inference throughput under strict 8GB VRAM constraints compared to standard attention-based basel...
|
03-15 13:30 | Success | - | |
|
exp_self.20260315132656.003_20260315_132729
|
Self-directed Benchmark: SSM Strategy Stress Test
This benchmark evaluates the hypothesis that applying State Space Models (SSMs) with a disciplined memory policy (specifically, chunked state management and caching) improves throughput under constrained VRAM environments (<8GB). It compare...
|
03-15 13:27 | Success | - | |
|
exp_pytrain.20260315132425.002_20260315_132448
|
PEP 695 Generic Dependency Resolver
Overview: This benchmark validates the implementation of a directed acyclic graph (DAG) dependency resolver using **Python 3.12+ Type Parameter Syntax** (PEP 695). The goal is to demonstrate the reduction of boilerplate code by utilizing the...
|
03-15 13:24 | Success | - | |
|
exp_self.20260315132149.002_20260315_132213
|
SSM Strategy Stress Test Benchmark
This repository contains a benchmark designed to test the hypothesis that applying **State Space Model (SSM)** strategies with a disciplined memory policy and dynamic precision improves throughput under strict **8GB VRAM** constraints. Hypo...
|
03-15 13:22 | Success | - | |
|
exp_self.20260315131817.001_20260315_131856
|
Self-directed Benchmark: SSM Strategy Stress Test
Overview: This benchmark evaluates the performance of a State Space Model (SSM) simulation under strict memory constraints (8GB limit). It tests the hypothesis that applying **Dynamic Precision** (Float16) and a disciplined **Cache/Memory Po...
|
03-15 13:18 | Success | - | |
|
exp_pytrain.20260315131524.001_20260315_131548
|
Typing-Driven Dynamic Plugin Loader
This benchmark validates a Python architecture that enforces strict interface contracts at runtime using `typing.Protocol` and `importlib`. Objective: The goal is to simulate a modular plugin system where code is discovered dynamically from...
|
03-15 13:15 | Success | - | |
|
exp_self.20260315131222.014_20260315_131254
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260315131222.014 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
03-15 13:12 | Success | - | |
|
exp_pytrain.20260315130947.009_20260315_131008
|
Python Skill Fallback
Title: Protocol-Based Plugin Loader with ImportLib Simulation - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-15 13:10 | Success | - | |
|
exp_self.20260315130713.013_20260315_130739
|
Self-Directed SSM Strategy Stress Test
This benchmark evaluates the hypothesis that a disciplined memory policy within a State Space Model (SSM) architecture improves throughput under constrained VRAM (8GB). Methodology: We compare two variants of a recurrent SSM block: 1. **Abla...
|
03-15 13:07 | Success | - | |
|
exp_self.20260315130411.012_20260315_130439
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260315130411.012 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
03-15 13:04 | Success | - | |
|
exp_pytrain.20260315130147.008_20260315_130208
|
Dynamic Package Loader with PEP 695 Type Constraints
This benchmark demonstrates the integration of modern Python type hinting (PEP 695) with runtime dynamic module loading. It simulates a plugin architecture where a temporary Python package is constructed programmatically, loaded via `import...
|
03-15 13:02 | Success | - | |
|
exp_self.20260315125940.011_20260315_130006
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260315125940.011 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
03-15 13:00 | Success | - | |
|
exp_self.20260315125635.010_20260315_125702
|
Self-directed Benchmark: SSM Strategy Stress Test
This benchmark evaluates the hypothesis that applying State Space Model (SSM) strategies with disciplined memory policies significantly improves throughput and efficiency under strict 8GB VRAM constraints compared to standard full-context c...
|
03-15 12:57 | Success | - | |
|
exp_pytrain.20260315125347.007_20260315_125417
|
Dynamic Plugin Registry with Runtime Type Checking
This benchmark demonstrates a robust, dependency-free plugin architecture using Python's standard library. It leverages `typing.Protocol` for structural subtyping (duck typing with static and runtime verification) and `importlib` for dynami...
|
03-15 12:54 | Success | - | |
|
exp_self.20260315125051.009_20260315_125123
|
Self-directed benchmark: ssm strategy stress test
This benchmark evaluates the hypothesis that applying State Space Model (SSM) memory principles—specifically a disciplined memory policy and dynamic precision—improves inference throughput under strict 8GB VRAM constraints compared to a sta...
|
03-15 12:51 | Success | - | |
|
exp_pytrain.20260315124749.006_20260315_124816
|
Modular Configuration Registry Benchmark
This benchmark tests the implementation of a robust, type-safe configuration management system using Python's standard library. The task requires the creation of a `config_registry` system that enforces strict typing via `typing.Protocol` a...
|
03-15 12:48 | Success | - | |
|
exp_self.20260315124526.008_20260315_124548
|
Self-directed benchmark: ssm strategy stress test
Overview: This benchmark evaluates the hypothesis that applying State Space Model (SSM) strategies with a disciplined memory policy improves throughput under strict 8GB VRAM constraints. It compares a standard Transformer-style KV-Cache appr...
|
03-15 12:45 | Success | - | |
|
exp_self.20260315124234.007_20260315_124254
|
Self-directed benchmark: ssm strategy stress test
This benchmark evaluates the impact of a disciplined memory policy and mixed precision on a State Space Model (SSM) simulation. Hypothesis: Applying SSM inference with chunked processing and dynamic precision (FP16) significantly reduces VRA...
|
03-15 12:43 | Success | - | |
|
exp_pytrain.20260315123917.005_20260315_123943
|
Strict Metadata Validator and Dependency Resolver
This project implements a lightweight package manager simulation in Python, focusing on strict type enforcement and robust dependency resolution. Features: - **Strict Typing**: Uses `typing.TypedDict` to enforce the structure of package meta...
|
03-15 12:39 | Success | - | |
|
exp_self.20260315123655.006_20260315_123720
|
SSM Strategy Stress Test: Memory vs. Throughput
This benchmark evaluates the hypothesis that applying State Space Model (SSM) techniques with a disciplined memory policy (specifically, chunked recurrence vs. unrolled convolution) improves throughput under constrained memory (8GB VRAM tar...
|
03-15 12:37 | Success | - | |
|
exp_self.20260315123337.005_20260315_123407
|
SSM Strategy Stress Test Benchmark
This benchmark evaluates the performance of a Selective State Space Model (SSM) implementation under different memory and precision policies. It compares a baseline floating-point implementation against an optimized variant that leverages d...
|
03-15 12:34 | Success | - | |
|
exp_pytrain.20260315123026.004_20260315_123100
|
Python Skill Fallback
Title: Strictly Typed Modular CLI Pipeline - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-15 12:31 | Success | - | |
|
exp_self.20260315122735.004_20260315_122759
|
SSM Strategy Stress Test
This benchmark evaluates the hypothesis that a State Space Model (SSM) strategy with a disciplined memory policy improves throughput under 8GB VRAM constraints compared to standard attention mechanisms. **Setup:** We compare a standard Self...
|
03-15 12:28 | Success | - | |
|
exp_pytrain.20260315122408.003_20260315_122445
|
Robust Dynamic Plugin Loader with Structural Subtyping
This benchmark demonstrates a robust plugin system architecture using Python's standard library. The goal is to simulate an autonomous system that: 1. **Dynamically generates** a temporary package structure on disk using `tempfile` and `pat...
|
03-15 12:24 | Success | - | |
|
exp_self.20260315122206.003_20260315_122229
|
Self-directed benchmark: ssm strategy stress test
Objective: This benchmark evaluates the hypothesis that applying State Space Model (SSM) strategies with disciplined memory policies and dynamic precision can improve throughput under constrained memory environments (8GB VRAM target). It com...
|
03-15 12:22 | Success | - | |
|
exp_self.20260315121902.002_20260315_121929
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260315121902.002 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
03-15 12:19 | Success | - | |
|
exp_pytrain.20260315121544.002_20260315_121612
|
Generic Plugin Registry Benchmark using PEP 695
This benchmark tests the implementation of a generic plugin registry utilizing Python 3.12's Type Parameter Syntax (PEP 695). It aims to reduce boilerplate associated with `typing.Generic` while maintaining strict type safety and runtime be...
|
03-15 12:16 | Success | - | |
|
exp_self.20260315121302.001_20260315_121335
|
SSM Strategy Stress Test Benchmark
This benchmark evaluates the performance and memory efficiency of an optimized State Space Model (SSM) implementation against a standard Transformer baseline. The focus is on a "disciplined memory policy," utilizing techniques like key-valu...
|
03-15 12:13 | Success | - | |
|
exp_pytrain.20260315120853.001_20260315_120933
|
Benchmark: Strict pyproject.toml Validator with TypedDict
This benchmark evaluates a custom, recursive runtime validation engine for complex nested data structures (simulating `pyproject.toml` PEP 518/621 standards) using Python's standard `typing` module. It specifically tests the introspection o...
|
03-15 12:09 | Success | - | |
|
exp_pytrain.20260315120346.006_20260315_120427
|
Dynamic Plugin Loader with Protocol Constraints
This coding drill validates a hypothesis about autonomous systems leveraging Python's `typing.Protocol` for structural subtyping and `importlib` for runtime module discovery. Objective: Create a robust, dependency-free plugin architecture ca...
|
03-15 12:06 | Pending | - | |
|
exp_self.20260315120123.010_20260315_120146
|
SSM Strategy Stress Test
This benchmark evaluates the performance characteristics of a State Space Model (SSM) implementation under strict memory constraints. It simulates the inference throughput and VRAM usage of two configurations: 1. **Baseline**: Standard exec...
|
03-15 12:01 | Success | - | |
|
exp_self.20260315115742.009_20260315_115813
|
Benchmark: SSM Strategy Stress Test
This benchmark evaluates the hypothesis that a State Space Model (SSM) strategy, combined with disciplined memory policies and dynamic precision, maintains higher throughput than standard quadratic-attention mechanisms under strict 8GB VRAM...
|
03-15 11:58 | Success | - | |
|
exp_pytrain.20260315115445.005_20260315_115508
|
Strict-Typed Dynamic Plugin Loader
Overview: This benchmark evaluates the ability to construct a robust, extensible plugin architecture using Python's standard `importlib` for dynamic module discovery and `typing.Protocol` for strict interface enforcement. Problem Statement: T...
|
03-15 11:55 | Success | - | |
|
exp_self.20260315115245.008_20260315_115311
|
Self-directed benchmark: SSM Strategy Stress Test
Overview: This benchmark evaluates the hypothesis that applying State Space Model (SSM) strategies with a disciplined memory policy (including dynamic precision) improves throughput while adhering to strict VRAM constraints (< 8GB). It compa...
|
03-15 11:53 | Success | - | |
|
exp_self.20260315115021.007_20260315_115042
|
SSM Strategy Stress Test: Memory vs. Throughput
Overview: This benchmark evaluates the performance impact of a disciplined memory policy on State Space Models (SSMs). It compares a **Baseline (Ablated)** configuration against an **Optimized (Innovation)** configuration that leverages dyna...
|
03-15 11:50 | Success | - | |
|
exp_pytrain.20260315114715.004_20260315_114738
|
Strictly Typed Modular Plugin System
**Benchmark ID:** `strict_typing_plugin_system` **Hypothesis:** An autonomous coding system can effectively utilize Python's packaging conventions and advanced static typing features to build a robust, extensible data processing framework w...
|
03-15 11:47 | Success | - | |
|
exp_self.20260315114503.006_20260315_114531
|
Self-directed benchmark: SSM Strategy Stress Test
Overview: This benchmark evaluates the hypothesis that applying a **State Space Model (SSM)** with a disciplined memory policy and dynamic precision (bfloat16) significantly improves inference throughput and reduces VRAM footprint compared t...
|
03-15 11:45 | Success | - | |
|
exp_self.20260315114147.005_20260315_114215
|
Benchmark: SSM Strategy Stress Test
This benchmark evaluates the performance of a standard Transformer architecture (Baseline) against a State Space Model (SSM) simulation (Innovation) under constrained memory conditions. Hypothesis: Applying an SSM strategy with disciplined m...
|
03-15 11:42 | Success | - | |
|
exp_pytrain.20260315113919.003_20260315_113938
|
Python Skill Fallback
Title: Typed Async Plugin Loader - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-15 11:39 | Success | - | |
|
exp_self.20260315113637.004_20260315_113657
|
SSM Strategy Stress Test Benchmark
This repository contains a minimal benchmark designed to evaluate the hypothesis that State Space Model (SSM) architectures with disciplined memory policies provide superior throughput and memory efficiency compared to standard Attention-ba...
|
03-15 11:37 | Success | - | |
|
exp_self.20260315113343.003_20260315_113401
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260315113343.003 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
03-15 11:34 | Success | - | |
|
exp_pytrain.20260315113100.002_20260315_113121
|
PEP 695 Generic Container Benchmark
This benchmark evaluates your ability to implement modern Python 3.12+ features, specifically PEP 695 (Type Parameter Syntax), within a robust, package-ready structure. Problem Statement: Design a thread-safe generic key-value cache named `S...
|
03-15 11:31 | Success | - | |
|
exp_self.20260315112837.002_20260315_112859
|
Self-directed Benchmark: SSM Strategy Stress Test
This benchmark evaluates the hypothesis that a **State Space Model (SSM)** strategy, utilizing a disciplined memory policy, significantly improves inference throughput compared to a standard Transformer baseline under strict 8GB VRAM constr...
|
03-15 11:29 | Success | - | |
|
exp_self.20260315112528.001_20260315_112558
|
Self-directed benchmark: ssm strategy stress test
Hypothesis: Applying SSM (State Space Model) with a disciplined memory policy improves throughput and efficiency under 8GB VRAM constraints compared to standard attention-based architectures. Plan: 1. **Environment**: PyTorch script runnable...
|
03-15 11:26 | Success | - | |
|
exp_pytrain.20260315112235.001_20260315_112312
|
Python Skill Fallback
Title: Runtime Type-Checked Plugin Loader - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-15 11:23 | Success | - | |
|
exp_self.20260315103305.015_20260315_103342
|
SSM Strategy Stress Test
This benchmark evaluates the hypothesis that applying State Space Model (SSM) architectures with a **disciplined memory policy** (specifically gradient checkpointing and chunked state management) improves throughput under strict 8GB VRAM co...
|
03-15 10:33 | Pending | - | |
|
exp_pytrain.20260315102855.012_20260315_102929
|
Python Skill Fallback
Title: Type-Safe Plugin Architecture with Versioning Metadata - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-15 10:29 | Success | - | |
|
exp_self.20260315102550.014_20260315_102631
|
Self-Directed Benchmark: SSM Strategy Stress Test
Overview: This benchmark evaluates the efficiency of State Space Model (SSM) strategies against standard Transformer-based attention mechanisms. Specifically, it tests the hypothesis that applying an SSM strategy with a disciplined memory p...
|
03-15 10:26 | Success | - | |
|
exp_pytrain.20260315102230.011_20260315_102303
|
Python Skill Fallback
Title: Type-Safe CLI Application Architecture - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-15 10:23 | Success | - | |
|
exp_self.20260315101504.013_20260315_101533
|
SSM Strategy Stress Test
This benchmark evaluates the memory efficiency and throughput of a **State Space Model (SSM)** strategy compared to a standard **Attention-based Transformer** baseline under constrained memory conditions. Hypothesis: Applying SSM with a disc...
|
03-15 10:20 | Success | - | |
|
exp_self.20260315101155.012_20260315_101220
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260315101155.012 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
03-15 10:12 | Success | - | |
|
exp_pytrain.20260315100812.010_20260315_100848
|
Python Skill Fallback
Title: Dynamic ZipApp Construction with Protocol Enforcement - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-15 10:08 | Success | - | |
|
exp_self.20260315100514.011_20260315_100540
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260315100514.011 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
03-15 10:05 | Success | - | |
|
exp_pytrain.20260315100155.009_20260315_100223
|
Typed Configuration Package with CLI Interface
Overview: This benchmark evaluates a system's ability to generate a Python script that implements a robust configuration management module. The script must utilize advanced typing features (`typing.TypedDict`) for schema definition and `argp...
|
03-15 10:02 | Success | - | |
|
exp_self.20260315095850.010_20260315_095926
|
SSM Strategy Stress Test
This benchmark evaluates the hypothesis that applying a Selective State Space Model (SSM) with a disciplined memory policy improves inference throughput under strict 8GB VRAM constraints compared to standard Transformer Attention mechanisms...
|
03-15 09:59 | Success | - | |
|
exp_pytrain.20260315095459.008_20260315_095600
|
Generic Dependency Resolver Benchmark
Overview: This benchmark tests the implementation of a robust generic dependency resolver using Python's standard library type system features (PEP 484, PEP 695 concepts). Objective: Implement a resolver that can process package dependencies,...
|
03-15 09:56 | Success | - | |
|
exp_self.20260315095145.009_20260315_095220
|
SSM Strategy Stress Test
Overview: This benchmark evaluates the hypothesis that applying State Space Model (SSM) architectures with a disciplined memory policy improves throughput and reduces VRAM overhead compared to standard attention-based mechanisms under constr...
|
03-15 09:52 | Success | - | |
|
exp_pytrain.20260315094754.007_20260315_094849
|
Dynamic Plugin Architecture with Type Safety
This benchmark verifies the ability to dynamically scaffold a Python package structure in a runtime environment, utilizing Python's `typing` module to enforce structural subtyping (Protocol) and `importlib` to load the generated code. Objec...
|
03-15 09:48 | Success | - | |
|
exp_self.20260315094459.008_20260315_094530
|
Benchmark: SSM Strategy Stress Test
Overview: This benchmark evaluates the performance characteristics of a State Space Model (SSM) strategy against a standard dense baseline. The hypothesis is that applying an SSM with a disciplined memory policy (chunked inference and state...
|
03-15 09:45 | Success | - | |
|
exp_self.20260315094112.007_20260315_094201
|
This benchmark is designed to evaluate the hypothesis that an SSM-based architecture, when coupled with a disciplined me...
The implementation simulates a standard Transformer layer (Baseline) against a Recurrent SSM layer (Innovation). Self-Directed Benchmark: SSM Strategy Stress Test. Overview: This benchmark validates the memory efficiency and throughput of a S...
|
03-15 09:42 | Success | - | |
|
exp_pytrain.20260315093814.006_20260315_093844
|
Python Skill Fallback
Title: Type-Safe Plugin Registry with Dependency Constraints - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-15 09:38 | Success | - | |
|
exp_self.20260315093521.006_20260315_093559
|
Self-directed benchmark: SSM Strategy Stress Test
This repository contains a runnable benchmark designed to test the hypothesis: *Applying SSM (State Space Model) logic with a disciplined memory policy improves throughput under 8GB constraints.* Objective: To compare the VRAM usage and infe...
|
03-15 09:36 | Success | - | |
|
exp_pytrain.20260315093057.005_20260315_093203
|
Strict Dependency Resolver Engine
Overview: This benchmark tests the ability to implement a core component of package management systems: the Dependency Resolver. The goal is to construct a robust, type-safe engine that determines a valid installation plan given a set of pac...
|
03-15 09:32 | Success | - | |
|
exp_self.20260315092818.005_20260315_092859
|
Self-directed Benchmark: SSM Strategy Stress Test
Hypothesis: Applying State Space Models (SSM) with a disciplined memory policy and dynamic precision improves throughput under 8GB VRAM constraints compared to standard dense attention mechanisms. Benchmark Plan: We compare a standard Transfo...
|
03-15 09:29 | Success | - | |
|
exp_pytrain.20260315092413.004_20260315_092459
|
Python Skill Fallback
Title: Dynamic Plugin Loader with Structural Typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-15 09:25 | Success | - | |
|
exp_self.20260315092128.004_20260315_092202
|
SSM Strategy Stress Test
This benchmark evaluates the impact of a disciplined memory management policy (chunked recurrent processing) on State Space Model (SSM) workloads under tight VRAM constraints. **Hypothesis** Applying an SSM with a disciplined memory policy...
|
03-15 09:22 | Success | - | |
|
exp_pytrain.20260315091747.003_20260315_091824
|
Type-Safe Plugin Registry with Async Dispatch
This benchmark implements a modular task runner using Python's standard library to demonstrate a clean separation of interface definition, implementation registration, and asynchronous execution. Design Brief: Modern software architecture re...
|
03-15 09:18 | Success | - | |
|
exp_self.20260315091355.003_20260315_091503
|
Self-directed Benchmark: SSM Strategy Stress Test
Hypothesis: Applying an SSM (State Space Model) strategy with a disciplined memory policy improves throughput (tokens/sec) and reduces VRAM footprint compared to a naive implementation under 8GB VRAM constraints. Abstract: This benchmark test...
|
03-15 09:15 | Success | - | |
|
exp_self.20260315091056.002_20260315_091124
|
Self-directed benchmark: SSM Strategy Stress Test
Overview: This benchmark evaluates the hypothesis that applying State Space Model (SSM) strategies with a disciplined memory policy improves throughput and reduces VRAM usage compared to standard attention-based baselines. The test compares...
|
03-15 09:11 | Success | - | |
|
exp_pytrain.20260315090730.002_20260315_090806
|
PEP 695 Generic Registry Benchmark
This benchmark tests the implementation of a generic type-safe registry using Python 3.12's PEP 695 syntax. It verifies syntax correctness, type parameter scoping, and runtime behavior while measuring throughput. Requirements: - Python 3.12...
|
03-15 09:08 | Success | - | |
|
exp_self.20260315090448.001_20260315_090531
|
Self-Directed Benchmark: SSM Strategy Stress Test
This benchmark evaluates the memory efficiency and throughput of a State Space Model (SSM) strategy versus a standard dense attention baseline. Hypothesis: Applying an SSM approach with a disciplined memory policy (fixed state recurrence) ma...
|
03-15 09:05 | Success | - | |
|
exp_pytrain.20260315090036.001_20260315_090129
|
Benchmark: Structural Typing and Dynamic Plugin Loader
Overview: This coding drill evaluates the ability to design a robust, type-safe plugin architecture using Python's standard library. The benchmark focuses on **Structural Typing** (using `typing.Protocol` and `@runtime_checkable`) and **Pack...
|
03-15 09:01 | Success | - | |
|
exp_pytrain.20260315084527.016_20260315_084555
|
Python Skill Fallback
Title: Type-Validated Dynamic Plugin Loader - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-15 08:45 | Success | - | |
|
exp_self.20260315084217.018_20260315_084254
|
SSM Strategy Stress Test
Overview: This benchmark evaluates the performance of State Space Model (SSM) inference under constrained memory conditions (8GB VRAM limit). It compares two modes: 1. **Baseline (Ablated)**: Uses standard memory handling and full precision...
|
03-15 08:43 | Success | - | |
|
exp_pytrain.20260315083851.015_20260315_083924
|
Typed Observable State Container
This benchmark implements a "mini-package" within a single file to demonstrate robust state management using Python's advanced typing features. Design Hypothesis: Explicit use of Python's `typing` system (Generics and Protocols) enforces str...
|
03-15 08:39 | Success | - | |
|
exp_self.20260315083622.017_20260315_083656
|
SSM Strategy Stress Test
This benchmark evaluates the efficiency of a State Space Model (SSM) implementation under constrained VRAM (8GB limit). It contrasts a naive implementation against a memory-disciplined variant that utilizes dynamic chunking and cache optimi...
|
03-15 08:37 | Success | - | |
|
exp_pytrain.20260315083130.014_20260315_083203
|
Python Skill Fallback
Title: Strictly Typed Module with Dynamic Protocol Resolution - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-15 08:32 | Success | - | |
|
exp_self.20260315082827.016_20260315_082853
|
SSM Strategy Stress Test
This benchmark evaluates the performance impact of applying a disciplined memory policy to State Space Model (SSM) operations under constrained VRAM environments (target: < 8GB). Hypothesis: Applying SSM architectures with a disciplined memo...
|
03-15 08:28 | Success | - | |
|
exp_pytrain.20260315082426.013_20260315_082518
|
Dynamic Plugin Loader with Runtime Type Enforcement
Overview: This drill challenges you to build an extensible system that simulates a lightweight inference engine plugin architecture. You must implement a `PluginRegistry` that dynamically discovers, loads, and validates Python modules from t...
|
03-15 08:25 | Success | - | |
|
exp_self.20260315082119.015_20260315_082212
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260315082119.015 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
03-15 08:22 | Success | - | |
|
exp_pytrain.20260315081801.012_20260315_081833
|
Strictly-Typed Virtual Component Loader
This benchmark tests the ability to construct a robust, dependency-free loader mechanism simulating Python packaging entry points (e.g., `package.module:Class`). It utilizes advanced typing features (`typing.Protocol`, `typing.Type`, Generi...
|
03-15 08:18 | Success | - | |
|
exp_self.20260315081329.014_20260315_081409
|
Self-directed benchmark: ssm strategy stress test
Overview: This benchmark evaluates the hypothesis that applying State Space Model (SSM) strategies with disciplined memory policies improves throughput (tokens/sec) while maintaining lower VRAM usage compared to standard Transformer attentio...
|
03-15 08:15 | Success | - | |
|
exp_pytrain.20260315080952.011_20260315_081034
|
Runtime-Checked Dynamic Plugin Loader
This benchmark tests the ability to construct a robust, type-safe plugin architecture using Python's standard library. Objective: Implement a `PluginLoader` system that: 1. Dynamically discovers Python modules in a target directory. 2. Inspe...
|
03-15 08:10 | Success | - | |
|
exp_self.20260315080722.013_20260315_080749
|
Self-directed benchmark: SSM Strategy Stress Test
This benchmark evaluates the hypothesis that applying State Space Models (SSMs) with a disciplined memory policy improves throughput under strict VRAM constraints (simulating an 8GB environment) compared to standard Attention-based architec...
|
03-15 08:07 | Success | - | |
|
exp_self.20260315080437.012_20260315_080502
|
SSM Strategy Stress Test
This benchmark evaluates a State Space Model (SSM) based strategy against a standard Transformer attention baseline. The specific hypothesis is that the linear complexity of an SSM architecture (simulated here via a performant PyTorch appro...
|
03-15 08:05 | Success | - | |
|
exp_pytrain.20260315080138.010_20260315_080203
|
Protocol-Based Dynamic Extension Loader
This benchmark tests the ability to design a robust, type-safe plugin system using Python's `typing.Protocol` for structural subtyping and `importlib` for runtime discovery. It simulates a package environment to verify strict interface adhe...
|
03-15 08:02 | Success | - | |
|
exp_self.20260315075809.011_20260315_075839
|
Self-directed Benchmark: SSM Strategy Stress Test
Overview: This benchmark investigates the hypothesis that applying State Space Models (SSM) with a disciplined memory policy (specifically selective activation caching and dynamic precision) improves throughput under strict 8GB VRAM constrai...
|
03-15 07:59 | Success | - | |
|
exp_self.20260315075521.010_20260315_075553
|
This benchmark compares a naive State Space Model (SSM) implementation against an optimized variant employing mixed prec...
SSM Strategy Stress Test Benchmark: This repository contains a benchmark designed to test the hypothesis that applying SSM architectures with disciplined memory policies improves throughput under strict hardware constraints (8GB VR...
|
03-15 07:55 | Success | - | |
|
exp_pytrain.20260315075236.009_20260315_075300
|
Dynamic Protocol-Based Plugin Loader Benchmark
This benchmark evaluates the ability of an autonomous agent to design a modular plugin system using Python's `typing.Protocol` for structural subtyping and `importlib` for dynamic runtime loading. Objective: Create a self-contained script `b...
|
03-15 07:53 | Success | - | |
|
exp_self.20260315074958.009_20260315_075033
|
SSM Strategy Stress Test
This benchmark evaluates the hypothesis that **State Space Models (SSM)** with a disciplined memory policy improve throughput under strict VRAM constraints (8GB) compared to standard quadratic-attention mechanisms. Methodology: We compare tw...
|
03-15 07:50 | Success | - | |
|
exp_pytrain.20260315074550.008_20260315_074611
|
Python Skill Fallback
Title: Strictly Typed Pyproject Metadata Validator - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-15 07:46 | Success | - | |
|
exp_self.20260315074250.008_20260315_074318
|
Self-directed benchmark: SSM Strategy Stress Test
Overview: This benchmark evaluates the hypothesis that a State Space Model (SSM) utilizing a disciplined memory policy (specifically, state truncation and selective checkpointing) achieves higher throughput and lower VRAM consumption compare...
|
03-15 07:43 | Success | - | |
|
exp_pytrain.20260315073933.007_20260315_074004
|
Strictly Typed Dynamic Module Registry
This coding drill benchmarks your ability to construct a robust, type-safe internal registry system that simulates a Python package's modular architecture. Overview: The goal is to create a script `benchmark.py` that simulates a mini-package...
|
03-15 07:40 | Success | - | |
|
exp_self.20260315073627.007_20260315_073702
|
SSM Strategy Stress Test
This benchmark evaluates the hypothesis that applying State Space Models (SSMs) with a disciplined memory policy (specifically, chunking and dynamic precision) improves inference throughput and reduces VRAM usage compared to standard attent...
|
03-15 07:37 | Success | - | |
|
exp_pytrain.20260315073247.006_20260315_073324
|
Dynamic Component Registry with Runtime Protocol Validation
Overview: This benchmark evaluates the ability of a Python script to dynamically construct a library architecture, emulate a plugin system using `importlib`, and enforce strict runtime type validation using `typing.Protocol`. Objective: Creat...
|
03-15 07:33 | Success | - | |
|
exp_self.20260315072951.006_20260315_073026
|
SSM Strategy Stress Test
This benchmark evaluates the hypothesis that applying State Space Model (SSM) strategies with a disciplined memory policy improves throughput under 8GB VRAM constraints compared to a standard dense (ablated) baseline. Methodology: The benchm...
|
03-15 07:30 | Success | - | |
|
exp_self.20260315072611.005_20260315_072636
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260315072611.005 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
03-15 07:26 | Success | - | |
|
exp_pytrain.20260315072242.005_20260315_072333
|
Python Skill Fallback
Title: Dynamic Module Loader with Runtime Type Enforcement - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-15 07:23 | Success | - | |
|
exp_self.20260315072006.004_20260315_072045
|
SSM Strategy Stress Test Benchmark
This benchmark evaluates the hypothesis that a State Space Model (SSM) approach, utilizing a disciplined memory policy (recurrent state management), yields superior throughput and lower VRAM consumption compared to a standard Attention-base...
|
03-15 07:20 | Success | - | |
|
exp_pytrain.20260315071617.004_20260315_071640
|
Typed Plugin Registry and CLI Dispatcher Benchmark
This benchmark evaluates a lightweight, modular Python framework that enforces strict interface contracts using `typing.Protocol` and runtime type checking. The system dynamically loads and executes "plugins" based on a defined structure, e...
|
03-15 07:16 | Success | - | |
|
exp_self.20260315071258.003_20260315_071344
|
Self-directed benchmark: SSM Strategy Stress Test
This benchmark evaluates the efficiency of State Space Models (SSM) compared to standard Transformer architectures under strict memory constraints (8GB VRAM limit). Overview: The benchmark compares two implementations of a sequence processin...
|
03-15 07:13 | Success | - | |
|
exp_pytrain.20260315070922.003_20260315_070949
|
Robust Dynamic Plugin Loader with Type Safety
This benchmark demonstrates a modular package architecture simulation using Python's standard library. It focuses on structural subtyping (`typing.Protocol`) and runtime validation (`inspect`, `isinstance`) to create a robust plugin system...
|
03-15 07:09 | Success | - | |
|
exp_self.20260315070615.002_20260315_070656
|
Self-Directed Benchmark: SSM Strategy Stress Test
This repository contains a runnable benchmark designed to test the hypothesis that **applying State Space Model (SSM) strategies with a disciplined memory policy improves throughput under constrained VRAM (8GB)**. Overview: The benchmark sim...
|
03-15 07:07 | Success | - | |
|
exp_pytrain.20260315070205.002_20260315_070308
|
Modern Generic Utilities with PEP 695
This benchmark verifies the implementation of modern Python generic types using PEP 695 Type Parameter Syntax (introduced in Python 3.12) within a strictly hygienic module structure. Goal: To ensure the coding system can: 1. Define generic c...
|
03-15 07:03 | Success | - | |
|
exp_self.20260315065841.001_20260315_065909
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260315065841.001 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
03-15 06:59 | Success | - | |
|
exp_pytrain.20260315065531.001_20260315_065605
|
Strictly Typed PyProject.toml Generator Benchmark
This benchmark evaluates a Python script's ability to leverage advanced type hinting features (`dataclasses`, `typing.Protocol`, and `Literal`) to construct a strictly typed domain model for Python project metadata (PEP 621). The objective...
|
03-15 06:56 | Success | - | |
|
exp_pytrain.20260315065100.008_20260315_065145
|
Strictly Typed Dependency Injection Container Benchmark
This benchmark evaluates a modern Dependency Injection (DI) implementation in pure Python. It leverages **PEP 695** (Type Parameter Syntax) to eliminate boilerplate associated with `typing.Generic` and `TypeVar`. The design utilizes `typing...
|
03-15 06:51 | Success | - | |
|
exp_self.20260315064827.009_20260315_064906
|
SSM Strategy Stress Test Benchmark
This repository contains a minimal benchmark designed to test the hypothesis that a State Space Model (SSM) utilizing a disciplined memory policy (specifically dynamic precision and efficient state caching) achieves higher throughput under...
|
03-15 06:49 | Success | - | |
|
exp_self.20260315064521.008_20260315_064553
|
SSM Strategy Stress Test Benchmark
Overview: This benchmark evaluates the hypothesis that a State Space Model (SSM) strategy, utilizing a disciplined memory policy, improves throughput and reduces VRAM usage compared to a standard full-context attention baseline under tight m...
|
03-15 06:46 | Success | - | |
|
exp_pytrain.20260315064216.007_20260315_064244
|
Strictly-Typed Dynamic Plugin Loader
Overview: This coding drill validates the hypothesis that structural subtyping (via `typing.Protocol`) combined with dynamic module generation (`types.ModuleType`) creates a robust, type-safe plugin system without sacrificing the flexibility...
|
03-15 06:42 | Success | - | |
|
exp_self.20260315063844.007_20260315_063939
|
Self-directed benchmark: SSM Strategy Stress Test
This benchmark evaluates the hypothesis that a **disciplined State Space Model (SSM)** memory policy yields superior throughput and memory efficiency compared to standard attention mechanisms under strict 8GB VRAM constraints. Hypothesis: Ap...
|
03-15 06:40 | Success | - | |
|
exp_pytrain.20260315063602.006_20260315_063617
|
Robust Dynamic Plugin Loader with Type Safety
Objective: This benchmark simulates a robust plugin loading system similar to those found in large-scale inference libraries (like `vllm` or `diffusers`). It tests the ability to define strict interfaces using Python's `typing.Protocol`, pro...
|
03-15 06:36 | Success | - | |
|
exp_self.20260315063318.006_20260315_063400
|
Benchmark: SSM Strategy Stress Test
This repository contains a lightweight benchmark designed to evaluate the efficiency of Selective State Space Models (SSM) versus standard Transformer architectures under strict memory constraints (8GB VRAM). Hypothesis: Applying SSM archite...
|
03-15 06:34 | Success | - | |
|
exp_pytrain.20260315062924.005_20260315_062957
|
Dynamic Plugin Loader with Protocol Validation
**Objective** Evaluate the performance and correctness of a dynamic plugin architecture built on Python's `importlib` and structural subtyping via `typing.Protocol`. **Hypothesis** Using `typing.Protocol` allows for a robust plugin system w...
|
03-15 06:30 | Success | - | |
|
exp_self.20260315062651.005_20260315_062723
|
SSM Strategy Stress Test
Overview: This benchmark evaluates the hypothesis that applying State Space Models (SSMs) with a disciplined memory policy (chunked processing) improves throughput and manages VRAM more effectively under strict 8GB constraints compared to a...
|
03-15 06:27 | Success | - | |
|
exp_pytrain.20260315061300.004_20260315_061347
|
Strictly Typed Modular Data Pipeline Benchmark
This document outlines the specifications for a self-validating Python coding drill focused on creating a strictly typed, modular data pipeline. Objective: The goal is to implement a `pipeline.py` style module contained within `benchmark.py`...
|
03-15 06:23 | Success | - | |
|
exp_self.20260315060926.004_20260315_061009
|
Self-directed benchmark: SSM strategy stress test
This repository contains a minimal benchmark designed to test the hypothesis that applying State Space Model (SSM) strategies with disciplined memory policies improves throughput under constrained VRAM environments (specifically 8GB). Conte...
|
03-15 06:10 | Success | - | |
|
exp_self.20260315060553.003_20260315_060626
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260315060553.003 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
03-15 06:06 | Success | - | |
|
exp_pytrain.20260315060216.003_20260315_060246
|
Python Skill Fallback
Title: Robust Async Plugin Loader with Structural Subtyping - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-15 06:02 | Success | - | |
|
exp_self.20260315055921.002_20260315_060009
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260315055921.002 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
03-15 06:00 | Success | - | |
|
exp_pytrain.20260315055518.002_20260315_055617
|
PEP 695 Generic Result Monad Implementation
Overview: This benchmark implements a generic `Result[T, E]` Monad (a container for success or failure states) utilizing Python 3.12's Type Parameter Syntax (PEP 695). This syntax removes the boilerplate of importing `Generic` and `TypeVar`...
|
03-15 05:56 | Success | - | |
|
exp_self.20260315055209.001_20260315_055258
|
SSM Strategy Stress Test Benchmark
This repository contains a minimal benchmark designed to test the hypothesis that a State Space Model (SSM) utilizing a disciplined memory policy (specifically, chunked computation) achieves higher throughput and lower VRAM usage compared t...
|
03-15 05:53 | Success | - | |
|
exp_pytrain.20260315054745.001_20260315_054818
|
Type-Safe Plugin Loader Benchmark
This benchmark verifies the implementation of a dynamic plugin loader that enforces structural subtyping (Protocols) at runtime. Objective: Implement `ExtensionLoader.load(spec, protocol)` which: 1. Parses a string specification `module:attr...
|
03-15 05:48 | Success | - | |
|
exp_self.20260314211910.042_20260314_211934
|
Self-directed benchmark: SSM Strategy Stress Test
Hypothesis: Applying SSM (State Space Model) architectures with a disciplined memory policy (specifically gradient checkpointing and selective state retention) improves throughput under 8GB VRAM constraints compared to standard eager executi...
|
03-14 21:19 | Pending | - | |
|
exp_self.20260314211641.041_20260314_211703
|
README: SSM Strategy Stress Test
Objective: This benchmark validates the hypothesis that applying State Space Model (SSM) inference strategies with disciplined memory management significantly improves throughput (tokens/sec) while maintaining lower VRAM footprints compared...
|
03-14 21:17 | Success | - | |
|
exp_pytrain.20260314211425.022_20260314_211445
|
PEP 695 Generic CLI Manager
This benchmark tests the ability to write modern, type-safe Python code utilizing **PEP 695** (Type Parameter Syntax) introduced in Python 3.12. It combines this new syntax with standard library packaging conventions to create a robust CLI...
|
03-14 21:14 | Success | - | |
|
exp_self.20260314210834.040_20260314_210859
|
SSM Strategy Stress Test
Overview: This benchmark evaluates the **Hypothesis**: applying SSM (State Space Model) with a disciplined memory policy improves throughput under 8GB VRAM constraints compared to standard attention-based caching mechanisms. Concept: We compa...
|
03-14 21:13 | Success | - | |
|
exp_pytrain.20260314210615.021_20260314_210641
|
Type-Safe Dynamic Plugin Loader
This benchmark tests the ability to construct a mock Python package structure in memory, dynamically discover and load a plugin using `importlib`, and enforce strict adherence to `typing.Protocol` interfaces at runtime. Instructions: 1. Ensu...
|
03-14 21:06 | Success | - | |
|
exp_self.20260314210436.039_20260314_210455
|
Self-directed benchmark: SSM strategy stress test
This repository contains a benchmark designed to test the hypothesis that applying State Space Models (SSMs) with a disciplined memory policy and dynamic precision improves throughput under constrained VRAM environments (8GB). Methodology: T...
|
03-14 21:05 | Success | - | |
|
exp_self.20260314210210.038_20260314_210243
|
SSM Strategy Stress Test Benchmark
This benchmark evaluates the performance of a State Space Model (SSM) inference implementation under strict memory constraints (8GB). It compares a **Baseline** implementation (naive memory management, standard precision) against an **Optim...
|
03-14 21:02 | Success | - | |
|
exp_pytrain.20260314205947.020_20260314_210012
|
Python Skill Fallback
Title: Strictly-Typed Artifact Persistence System - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-14 21:00 | Success | - | |
|
exp_self.20260314205801.037_20260314_205827
|
SSM Strategy Stress Test Benchmark
This benchmark evaluates the hypothesis that applying State Space Model (SSM) strategies with a disciplined memory policy improves throughput under constrained VRAM (8GB limit). Objective: Compare a standard Transformer architecture (Baselin...
|
03-14 20:58 | Success | - | |
|
exp_self.20260314205536.036_20260314_205609
|
Self-Directed Benchmark: SSM Strategy Stress Test
Overview: This benchmark evaluates the memory efficiency and throughput of a State Space Model (SSM) strategy compared to a standard Transformer baseline under constrained memory conditions (8GB VRAM). Hypothesis: Applying SSMs with a discipl...
|
03-14 20:56 | Success | - | |
|
exp_pytrain.20260314205334.019_20260314_205352
|
Python Skill Fallback
Title: Generic Plugin Registry and Module Encapsulation - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-14 20:53 | Success | - | |
|
exp_self.20260314204343.035_20260314_204405
|
SSM Strategy Stress Test: Memory & Throughput
This benchmark evaluates the hypothesis that a State Space Model (SSM) simulation, operating with a disciplined memory policy, provides superior throughput and lower VRAM footprint compared to a standard Transformer attention baseline under...
|
03-14 20:52 | Success | - | |
|
exp_self.20260314204048.034_20260314_204112
|
SSM Strategy Stress Test Benchmark
This benchmark evaluates the hypothesis that a State Space Model (SSM) approach with disciplined memory management yields superior throughput and lower VRAM usage compared to standard Attention-based mechanisms under constrained memory (8GB...
|
03-14 20:41 | Success | - | |
|
exp_pytrain.20260314203818.018_20260314_203839
|
Dynamic Plugin Loader with Protocol Validation
Overview: This coding drill benchmarks your ability to construct a flexible, robust plugin architecture using Python's standard library. The task involves dynamic module discovery using `importlib` and structural interface enforcement using...
|
03-14 20:38 | Success | - | |
|
exp_self.20260314203558.033_20260314_203647
|
Self-directed benchmark: ssm strategy stress test
Overview: This benchmark evaluates the effectiveness of memory optimization strategies for State Space Models (SSMs) under constrained VRAM conditions (8GB). It compares a baseline SSM implementation with memory policy optimizations against...
|
03-14 20:36 | Success | - | |
|
exp_self.20260314203326.032_20260314_203403
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260314203326.032 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
03-14 20:34 | Success | - | |
|
exp_pytrain.20260314203039.017_20260314_203121
|
Python Skill Fallback
Title: Strictly Typed Data Pipeline with Packaging Standards - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-14 20:31 | Success | - | |
|
exp_self.20260314202755.031_20260314_202816
|
Self-Directed Benchmark: SSM Strategy Stress Test
This benchmark evaluates the hypothesis that **State Space Models (SSMs)** with a disciplined memory policy provide superior throughput and lower VRAM usage compared to standard Transformer attention mechanisms under strict memory constrain...
|
03-14 20:28 | Success | - | |
|
exp_self.20260314202539.030_20260314_202606
|
SSM Strategy Stress Test
This benchmark evaluates the hypothesis that applying a **State Space Model (SSM)** strategy with a disciplined memory policy (specifically, chunked inference and dynamic precision) significantly improves throughput and reduces VRAM pressur...
|
03-14 20:26 | Success | - | |
|
exp_pytrain.20260314202319.016_20260314_202340
|
Python Skill Fallback
Title: Dynamic Module Loader with Runtime Type Verification - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-14 20:23 | Success | - | |
|
exp_self.20260314202053.029_20260314_202112
|
SSM Strategy Stress Test
This benchmark evaluates a synthetic State Space Model (SSM) inference strategy against a standard Transformer-style KV-Cache approach. Hypothesis: Applying an SSM-inspired disciplined memory policy (fixed state size + dynamic precision) imp...
|
03-14 20:21 | Success | - | |
|
exp_self.20260314201752.028_20260314_201818
|
Self-directed benchmark: SSM Strategy Stress Test
This benchmark evaluates the hypothesis that applying State Space Models (SSM) with a disciplined memory policy improves inference throughput under strict 8GB VRAM constraints compared to standard attention-based accumulation. Context: Tradi...
|
03-14 20:18 | Success | - | |
|
exp_pytrain.20260314201530.015_20260314_201553
|
Generic Package Resource Loader using PEP 695
This benchmark tests the implementation of a type-safe generic resource loader using Python 3.12's PEP 695 Type Parameter Syntax. It verifies the ability to define a generic class `ResourceDecoder[T]` and utilize `importlib.resources` to re...
|
03-14 20:15 | Success | - | |
|
exp_self.20260314201313.027_20260314_201336
|
This benchmark evaluates the hypothesis that applying State Space Model (SSM) strategies with a disciplined memory polic...
Overview: The benchmark compares a standard Transformer-based architecture (Baseline) against a linear-complexity SSM-inspired architecture (Innovation). * **Baseline (Attention):** Utilizes standard `nn.MultiheadAttention`. This mechanism s...
|
03-14 20:14 | Success | - | |
|
exp_self.20260314201040.026_20260314_201101
|
SSM Strategy Stress Test
This benchmark evaluates the performance characteristics of a State Space Model (SSM) implementation against a standard Transformer Attention baseline. The goal is to verify the hypothesis that an SSM architecture, when combined with a disc...
|
03-14 20:11 | Success | - | |
|
exp_pytrain.20260314200825.014_20260314_200847
|
Benchmark: Robust Package Structure Validator
Objective: This benchmark tests the ability to write a robust, type-safe Python tool using only the standard library (`typing`, `pathlib`, `contextlib`). The task is to simulate a Python package generation process and implement a validation...
|
03-14 20:08 | Success | - | |
|
exp_self.20260314200628.025_20260314_200701
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260314200628.025 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
03-14 20:07 | Success | - | |
|
exp_self.20260314200342.024_20260314_200403
|
Self-Directed Benchmark: SSM Strategy Stress Test
Overview: This benchmark is designed to test the hypothesis that applying a State Space Model (SSM) with a disciplined memory policy improves throughput under 8GB VRAM constraints. It implements a synthetic Diagonal State Space Model (DSSM)...
|
03-14 20:04 | Success | - | |
|
exp_pytrain.20260314200107.013_20260314_200130
|
Strictly Typed Plugin Architecture with Dynamic Registry
This benchmark evaluates the design and implementation of a strictly typed plugin system using Python's `typing.Protocol`. The script simulates a computational engine package structure, defining a `Kernel` interface and enforcing strict typ...
|
03-14 20:01 | Success | - | |
|
exp_self.20260314195908.023_20260314_195939
|
Self-directed benchmark: SSM Strategy Stress Test
This benchmark evaluates the hypothesis that applying Selective State Space Models (SSM) with a disciplined memory policy (dynamic precision) improves throughput under 8GB VRAM constraints compared to standard attention mechanisms. Experime...
|
03-14 19:59 | Success | - | |
|
exp_self.20260314195623.022_20260314_195647
|
This benchmark compares a standard Attention-based Transformer block against a simulated State Space Model (SSM) archite...
1. README.md: SSM Strategy Stress Test. Objective: To verify the hypothesis that applying SSM (State Space Model) architectures with a disciplined memory policy significantly improves throughput (tokens/sec) and reduces VRAM usage compared to...
|
03-14 19:56 | Success | - | |
|
exp_pytrain.20260314195414.012_20260314_195434
|
Python Skill Fallback
Title: Strictly Typed Auto-Registry System - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-14 19:54 | Success | - | |
|
exp_self.20260314195229.021_20260314_195251
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260314195229.021 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
03-14 19:52 | Success | - | |
|
exp_self.20260314194939.020_20260314_195012
|
Entropy-Based State Stagnation
This benchmark tests the hypothesis that during fluent text generation (characterized by low entropy/uncertainty in the next-token prediction), the internal state of a State Space Model (SSM) remains relatively constant. By monitoring the e...
|
03-14 19:50 | Success | - | |
|
exp_pytrain.20260314194757.011_20260314_194816
|
Runtime-Verified Plugin Loader via Protocols
This benchmark demonstrates a robust mechanism for loading and verifying Python plugins dynamically using **Structural Subtyping (Protocols)** rather than traditional Inheritance (ABCs). Context: In plugin architectures, developers often nee...
|
03-14 19:48 | Success | - | |
|
exp_self.20260314194557.019_20260314_194637
|
Frequency-Modulated State Layers (FMSL)
Paper ID: self.20260314194557.019 - Hypothesis: Semantic processing happens early. We can use FP16 for state updates in the first 50% of layers and FP8 for the last 50%. This 'frequency modulation' of precision saves VRAM bandwidth during t...
|
03-14 19:46 | Success | - | |
|
exp_self.20260314194408.018_20260314_194434
|
Adaptive SSM-Attention Router Benchmark
This benchmark validates the "Adaptive SSM-Attention Router" hypothesis: that a learned router can identify "hard" tokens requiring global attention and route "easy" tokens to a linear SSM path, resulting in sub-linear KV Cache memory scali...
|
03-14 19:44 | Success | - | |
|
exp_self.20260314194213.017_20260314_194240
|
Tiered Precision State Cache (TPSC)
Paper ID: self.20260314194213.017 - Hypothesis: State-Space models rely on a recurrent hidden state. The influence of distant tokens on the current gradient is mathematically bounded. By tiering the cache (FP32 for active, FP16 for history)...
|
03-14 19:42 | Success | - | |
|
exp_pytrain.20260314194019.010_20260314_194046
|
Dynamic Async Plugin Loader with Type Safety
This benchmark tests proficiency in Python's `asyncio`, `typing` protocols, and dynamic module loading. The system constructs a temporary plugin package structure on disk, writes an asynchronous class implementation, and loads it using stan...
|
03-14 19:40 | Success | - | |
|
exp_self.20260314193812.016_20260314_193846
|
DEDP Benchmark: Dynamic Precision for SSMs
This repository contains a minimal, runnable benchmark for **Delta-Encoded Dynamic Precision (DEDP)**. Hypothesis: Small changes in the recurrent state (low delta) can be safely stored in INT8, while large changes (high delta) require FP16 t...
|
03-14 19:38 | Success | - | |
|
exp_self.20260314193546.015_20260314_193625
|
Temporal Decay Quantization (TDQ)
Paper ID: self.20260314193546.015 - Hypothesis: Older history in the recurrent state is less critical for immediate next-token prediction than recent history. We can quantize the 'tail' of the state history to 4-bit or 8-bit while keeping t...
|
03-14 19:36 | Success | - | |
|
exp_pytrain.20260314193340.009_20260314_193406
|
Strictly Typed Modular Entry Point
This coding drill validates your ability to design a strictly typed, modular Python application structure within a single script. It simulates package distribution metadata (`__version__`, `__all__`), defines a `Protocol` for interface enfo...
|
03-14 19:34 | Success | - | |
|
exp_self.20260314193119.014_20260314_193159
|
CPU-Offloaded State Streaming with Prefetch
Paper ID: self.20260314193119.014 - Hypothesis: Existing CPU offloading is sync/blocking. By creating a 'background thread' that predicts the next required state window and prefetches it to GPU VRAM *before* the SSM scan reaches it, we can...
|
03-14 19:32 | Success | - | |
|
exp_self.20260314192907.013_20260314_192935
|
Contextual LoRA Switching via State Clustering
This benchmark tests the hypothesis that an SSM's internal state can serve as a highly efficient signal for routing specialized domain experts (LoRA adapters). The Innovation: Traditional LLMs use static weights or computationally expensive...
|
03-14 19:29 | Success | - | |
|
exp_pytrain.20260314192717.008_20260314_192740
|
Typed Dependency Graph Resolver
Overview: This benchmark evaluates the implementation of a robust package dependency resolver using modern Python static typing features (`Protocol`, `Generic`, `dataclasses`) and standard library packaging tools (`tomllib`). Objective: The...
|
03-14 19:27 | Success | - | |
|
exp_self.20260314192515.012_20260314_192540
|
Task-Gated Semantic State Pruning
Paper ID: self.20260314192515.012 - Hypothesis: Not all history is useful for the next token prediction. By using a lightweight 'Gate' (similar to a gating mechanism in LSTMs but applied to the state dimension) driven by the current embeddi...
|
03-14 19:25 | Success | - | |
|
exp_self.20260314192234.011_20260314_192334
|
Time-Aware Tiered Precision (TATP) for SSM States
Paper ID: self.20260314192234.011 - Hypothesis: Recent history in an SSM is more sensitive to precision than ancient history. By storing t-1 states in FP16, t-10 in INT8, and t-50 in INT4, we can fit longer contexts on 8GB GPUs. - Plan: Mod...
|
03-14 19:23 | Success | - | |
|
exp_pytrain.20260314192035.007_20260314_192101
|
Strictly-Typed Model Registry and Configuration Loader
Overview: This benchmark demonstrates a robust, type-safe implementation of a Model Registry and Configuration Loader, inspired by the architecture of modern LLM frameworks like PyTorch and LitGPT. The Hypothesis: Explicitly defining interfac...
|
03-14 19:21 | Success | - | |
|
exp_self.20260314191817.010_20260314_191849
|
Entropy-Based Dynamic State Quantization
README.md: This benchmark explores **Entropy-Based Dynamic State Quantization** for State Space Models (SSMs). Hypothesis: We hypothesize that the "cognitive load" of an SSM, measured by the entropy of its hidden state $h_t$, fluctuates durin...
|
03-14 19:18 | Success | - | |
|
exp_self.20260314191621.009_20260314_191644
|
Variance-Gated Dynamic State Precision Benchmark
Overview: This benchmark tests the **Variance-Gated Dynamic State Precision** hypothesis. It posits that not all states in a State Space Model (SSM) require high precision (FP16). By monitoring the variance of the hidden state during inferen...
|
03-14 19:16 | Success | - | |
|
exp_pytrain.20260314191445.006_20260314_191501
|
Robust Dynamic Plugin Loader Benchmark
Objective: This benchmark evaluates the ability of an autonomous system to design a secure, extensible architecture using Python's standard library. Specifically, it tests the dynamic loading of Python modules (plugins) from a temporary file...
|
03-14 19:15 | Success | - | |
|
exp_self.20260314191213.008_20260314_191238
|
Tiered-Precision SSM State Cache
Paper ID: self.20260314191213.008 - Hypothesis: A tiered precision scheme (Hot=FP16, Cold=INT4) will double the effective context window of an SSM with negligible perplexity increase. - Plan: Implement a ring-buffer for the SSM state. Quant...
|
03-14 19:12 | Success | - | |
|
exp_self.20260314191014.007_20260314_191041
|
Latent State Injection for RAG
Overview: This benchmark evaluates **Latent State Injection**, a novel approach to Retrieval-Augmented Generation (RAG) using State Space Models (SSMs). The Innovation: Standard RAG systems retrieve raw text chunks, concatenate them with the...
|
03-14 19:10 | Success | - | |
|
exp_pytrain.20260314190817.005_20260314_190842
|
Python Skill Fallback
Title: Dynamic Package Construction and Loader - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-14 19:08 | Success | - | |
|
exp_self.20260314190614.006_20260314_190647
|
Tiered-Precision State Cache for Mamba
Overview: This benchmark evaluates a **Tiered-Precision State Cache** designed for State Space Models (SSMs) like Mamba. **The Problem:** Long-context SSMs must maintain a massive hidden state (`h_t`) that grows or updates with every token....
|
03-14 19:06 | Success | - | |
|
exp_gh_huggingface_transformers_20260314_190423
|
Hugging Face Transformers Efficiency Benchmark
This benchmark evaluates the performance of the `transformers` library, focusing on efficient inference strategies for Large Language Models (LLMs). It highlights the library's optimization capabilities, specifically **KV-Caching** for gene...
|
03-14 19:04 | Success | - | |
|
exp_pytrain.20260314190202.004_20260314_190224
|
Type-Safe Virtual Package Manager Benchmark
This benchmark tests the ability to write a robust, type-safe CLI application using Python's standard library. The candidate must implement a virtual package manager that handles dependencies, immutability, and argument parsing according to...
|
03-14 19:02 | Success | - | |
|
exp_self.20260314185958.005_20260314_190030
|
Speculative RAG Skipping
Paper ID: self.20260314185958.005 - Hypothesis: If the SSM state has low entropy (high confidence) regarding the next token, the answer is likely 'in memory'. If entropy spikes, we trigger RAG. This creates a 'Just-In-Time' retrieval system...
|
03-14 19:00 | Success | - | |
|
exp_self.20260314185715.004_20260314_185800
|
Sparse Attention Routing for SSM Recall
This benchmark evaluates a hybrid architecture designed to solve the "Needle-in-a-Haystack" retrieval problem often faced by State Space Models (SSMs) like Mamba. Hypothesis: While SSMs excel at efficient reasoning over long sequences (low e...
|
03-14 18:58 | Success | - | |
|
exp_pytrain.20260314185436.003_20260314_185501
|
Robust Dynamic Plugin Loader with Protocol Enforcement
This coding benchmark tests the ability to construct a robust, type-safe plugin architecture using Python's standard library. It focuses on combining `typing.Protocol` for interface definition and `importlib` for runtime module loading to c...
|
03-14 18:55 | Success | - | |
|
exp_self.20260314185233.003_20260314_185308
|
Tiered SSM State Cache Benchmark
Innovation: This benchmark tests a **Tiered SSM State Cache** mechanism. **Hypothesis**: Offloading older SSM states to system RAM (at FP16) while keeping active states in GPU VRAM (at FP8) will allow for effectively infinite context windows...
|
03-14 18:53 | Success | - | |
|
exp_self.20260314184953.002_20260314_185025
|
Delta-State Compression for Long Context
This benchmark implements a simulation of State Space Model (SSM) state caching to verify the **Delta-State Compression** hypothesis. Hypothesis: SSM states evolve smoothly over time (governed by decay factors like $A \bar{H}$). Therefore, s...
|
03-14 18:50 | Success | - | |
|
exp_pytrain.20260314184735.002_20260314_184807
|
PEP 695 Generic Repository Implementation Benchmark
Overview: This coding drill verifies your ability to utilize **PEP 695 Type Parameter Syntax** introduced in Python 3.12. The Challenge: You must implement a generic in-memory Repository within `benchmark.py`. The implementation is strictly c...
|
03-14 18:48 | Success | - | |
|
exp_self.20260314184516.001_20260314_184550
|
CPU-Offloaded Tiered State Cache
Paper ID: self.20260314184516.001 - Hypothesis: Distant states in an SSM have diminishing impact on the immediate next token. Quantizing and moving them to system RAM frees up GPU VRAM, allowing for significantly longer context windows with...
|
03-14 18:45 | Success | - | |
|
exp_2603.12254v1_20260314_184330
|
This benchmark implements a synthetic simulation of the AutoGaze architecture to compare a standard ViT (Baseline) again...
AutoGaze Efficiency Benchmark: This repository contains a synthetic benchmark designed to evaluate the efficiency claims of **AutoGaze** (Attend Before Attention). It simulates the heavy computational load of processing long, high-resolution...
|
03-14 18:43 | Success | - | |
|
exp_pytrain.20260314184102.001_20260314_184126
|
Type-Safe Local Package Validator
A Python coding drill benchmark designed to test your ability to create robust, type-safe package management tools. Objective: Create a CLI script `validate_and_install.py` (simulated within `benchmark.py`) that verifies a local library's ty...
|
03-14 18:41 | Success | - | |
|
exp_self.20260314183733.004_20260314_183757
|
Tiered SSM State Cache Benchmark
This benchmark tests the hypothesis that offloading older SSM (State Space Model) states to system RAM while keeping active states in GPU VRAM allows for effectively infinite context windows on consumer hardware. Benchmark Details: The code...
|
03-14 18:38 | Success | - | |
|
exp_pytrain.20260314183557.018_20260314_183618
|
Dynamic Package Construction and Type-Safety Verification
This benchmark tests an autonomous system's ability to programmatically scaffold a Python project structure, generate strictly typed source code, and perform runtime verification against a defined `Protocol`. Objectives: 1. **Filesystem Oper...
|
03-14 18:36 | Success | - | |
|
exp_self.20260314183316.003_20260314_183346
|
SSM State Recycling Benchmark
This benchmark tests the hypothesis that maintaining the SSM (State Space Model) hidden state across tool execution boundaries improves efficiency (tokens/sec) and reduces context re-processing overhead. **The Innovation:** Standard LLM wor...
|
03-14 18:33 | Success | - | |
|
exp_self.20260314183014.002_20260314_183104
|
Dynamic Precision State Skipping Benchmark
This benchmark evaluates the "Dynamic Precision State Skipping" hypothesis for Mamba-style State Space Models (SSMs). The core idea is that during fluent generation (low entropy), the state changes slowly, allowing for lower precision (INT4...
|
03-14 18:31 | Success | - | |
|
exp_pytrain.20260314182827.017_20260314_182856
|
Python Skill Fallback
Title: Strictly Typed Modular Data Processor - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-14 18:28 | Success | - | |
|
exp_gh_Dao-AILab_flash-attention_20260314_182610
|
Flash Attention Benchmark
This benchmark evaluates the performance and memory efficiency of Flash Attention compared to standard attention mechanisms in transformer models. What is Flash Attention? Flash Attention is a fast and memory-efficient exact attention algor...
|
03-14 18:26 | Success | - | |
|
exp_hf_2603.08258_20260314_182342
|
WaDi: Weight Direction-aware Distillation for One-step Image Synthesis
Paper ID: hf_2603.08258 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
03-14 18:23 | Success | - | |
|
exp_pytrain.20260314182119.016_20260314_182137
|
Runtime Type-Checked Dynamic Plugin Loader
This benchmark evaluates the capability of a Python system to simulate a packaging environment by programmatically generating a Python module, persisting it to disk, and dynamically importing it using `importlib` machinery. The core challen...
|
03-14 18:21 | Success | - | |
|
exp_gh_vllm-project_vllm_20260314_180402
|
vLLM Inference Benchmark
This benchmark evaluates the performance of **vLLM**, a high-throughput and memory-efficient inference engine for Large Language Models (LLMs). vLLM introduces **PagedAttention**, an algorithm that optimizes memory management for the KV cac...
|
03-14 18:19 | Success | - | |
|
exp_pytrain.20260314180137.015_20260314_180205
|
Dynamic Type-Safe Package Generator Benchmark
Overview: This benchmark evaluates a system's ability to programmatically construct a valid Python package structure on the filesystem, populate it with source code adhering to modern typing standards (specifically PEP 695 Type Parameter Syn...
|
03-14 18:02 | Success | - | |
|
exp_self.20260314175900.001_20260314_175926
|
Adaptive Tool-State Quantization (ATSQ) Benchmark
This repository contains a runnable benchmark for the "Adaptive Tool-State Quantization" innovation. It tests the hypothesis that selectively applying 4-bit quantization to a State Space Model's (SSM) hidden state *only* during tool-use tra...
|
03-14 17:59 | Success | - | |
|
exp_hf_2603.10604_20260314_175718
|
HyPER-GAN Benchmark
This benchmark evaluates the real-time inference capabilities of the HyPER-GAN architecture simulation, focusing on memory efficiency and patch throughput. Key Metrics * **VRAM_USAGE**: Peak GPU memory consumed during the patch-enhancement...
|
03-14 17:57 | Success | - | |
|
exp_pytrain.20260314175435.014_20260314_175521
|
Strictly-Typed Plugin Loader with Entry Point Simulation
Overview: This benchmark tests a developer's ability to implement a robust plugin architecture using Python's standard library. Specifically, it evaluates the use of `typing.Protocol` for defining structural interfaces (`SupportsProcess`) an...
|
03-14 17:55 | Success | - | |
|
exp_2308.04657v1_20260314_175319
|
Benchmarking Token Reduction in Vision Transformers (ViTs)
**Architecture:** Investigates token reduction in Vision Transformers (ViTs) across 10 methods, contrasting dynamic pruning against fixed spatial patterns. **Memory Footprint:** Token pruning reduces sequence length within self-attention la...
|
03-14 17:53 | Success | - | |
|
exp_2308.01045v2_20260314_175232
|
Benchmark for Dynamic Token Pruning (DToP) in Vision Transformers
**Architecture:** Introduces Dynamic Token Pruning (DToP) for plain Vision Transformers (ViTs). It employs a multi-stage architecture with auxiliary classifiers to grade token difficulty. Instead of dropping tokens (which harms dense output...
|
03-14 17:52 | Success | - | |
|
exp_2409.08464v2_20260314_175146
|
This benchmark evaluates the **VLTP (Vision Language Guided Token Pruning)** framework, specifically investigating the h...
**Architecture:** VLTP inserts a trainable "pruning decoder" into the ViT pipeline. This module fuses image tokens with Vision-Language guidance (from an MLLM) to predict token relevance. Only tokens identified as pertinent to the specific...
|
03-14 17:51 | Success | - | |
|
exp_2512.14332v1_20260314_175050
|
Step-Tagging Framework Benchmark
**Architecture:** The paper proposes "Step-Tagging," a framework utilizing a lightweight, auxiliary sentence-classifier alongside the host Language Reasoning Model (LRM). It introduces "ReasonType," a specific taxonomy for categorizing reas...
|
03-14 17:50 | Success | - | |
|
exp_2504.01690v2_20260314_175010
|
Backfill Candidate 2504.01690v2
**Architecture:** Adapts TopK token pruning to ViT-based audio encoders (AudioMAE, AST) processing Mel-spectrograms. **Memory & Speed:** Achieves a 30-40% reduction in Multiply-Accumulate (MAC) operations with <1% accuracy drop. Reducing to...
|
03-14 17:50 | Success | - | |
|
exp_pytrain.20260314174817.013_20260314_174837
|
Strictly-Typed Modular Configuration System
Overview: This benchmark challenges the developer to construct a robust, modular configuration loader and inference engine simulator using Python's advanced type-hinting capabilities. The goal is to enforce strict interface contracts using `...
|
03-14 17:48 | Success | - | |
|
exp_2505.21375v2_20260314_174704
|
Backfill Candidate 2505.21375v2
**Architecture:** Built on the LLaVA framework, specifically modified for remote sensing (RS). It introduces **Background Token Pruning** and **Anchored Token Selection** to address the "token explosion" typical in ultra-high-res inputs. Th...
|
03-14 17:47 | Success | - | |
|
exp_2302.06015v3_20260314_174626
|
Benchmark: Token Sparsification in Shallow ViTs
**Summary for ARES 8GB Roadmap** **Architecture:** The paper provides a theoretical framework for a **shallow ViT** architecture, specifically a single self-attention layer followed by a 2-layer MLP. **Memory Footprint & Inference Speed:**...
|
03-14 17:46 | Success | - | |
|
exp_2506.07138v1_20260314_174543
|
Spatial Token Fusion (STF) Benchmark
**Architecture:** Proposes **Spatial Token Fusion (STF)** to merge adjacent spatial tokens, drastically shortening the visual sequence. It is augmented by **Multi-Block Token Fusion (MBTF)**, which injects multi-granularity features to pres...
|
03-14 17:45 | Success | - | |
|
exp_2307.13770v1_20260314_174457
|
Backfill Candidate 2307.13770v1
**Architecture** E^2VPT implements a dual-prompt strategy to freeze backbone weights. It introduces learnable visual tokens at the input layer and injects learnable Key-Value (KV) pairs directly into the self-attention mechanisms of transfo...
|
03-14 17:45 | Success | - | |
|
exp_2307.10780v2_20260314_174404
|
Benchmark: Learned Threshold Masking Pruning (LTMP) on ViT
**Architecture:** LTMP integrates learned threshold masking modules into Vision Transformers (ViTs). These modules dynamically route tokens—deciding between merging (similarity-based grouping) or pruning (dropping)—to optimize sequence leng...
|
03-14 17:44 | Success | - | |
|
exp_pytrain.20260314174209.012_20260314_174232
|
Python Skill Fallback
Title: Robust Typed Dependency Container - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-14 17:42 | Success | - | |
|
exp_2402.02554v2_20260314_174057
|
DeSparsify: Adversarial DoS Benchmark for Vision Transformers
**Paper:** DeSparsify: Adversarial Attack Against Token Sparsification Mechanisms in Vision Transformers **Summary for ARES 8GB Roadmap:** * **Architecture:** Targets Vision Transformers (ViTs) utilizing dynamic token sparsification mechani...
|
03-14 17:41 | Success | - | |
|
exp_2409.10197v2_20260314_174005
|
Benchmark: FitPrune - Training-Free Visual Token Pruning for MLLMs
**Architecture:** FitPrune is a training-free, statistical pruning method for MLLMs (e.g., LLaVA). Instead of dynamic evaluation, it generates a static "pruning recipe" by analyzing attention map distributions on a small calibration batch....
|
03-14 17:40 | Success | - | |
|
exp_2505.15816v1_20260314_173916
|
Benchmark: ProxyV Vision Token Bypass
**Architecture** ProxyV introduces lightweight "proxy vision tokens" into the LLM backbone. While original vision tokens are preserved to prevent information loss, the proxy tokens handle the heavy lifting (Self-Attention and FFNs). Origina...
|
03-14 17:39 | Success | - | |
|
exp_2510.07974v2_20260314_173826
|
Adaptive World Model Benchmark
**Architecture:** Proposes a wrapper mechanism ("Adaptive World Model") that constructs a dynamic textual world model to track entity states and timelines. It monitors the LLM’s reasoning trajectory for specific "confusion indicators" (e.g....
|
03-14 17:38 | Success | - | |
|
exp_hf_2603.06854_20260314_173739
|
Backfill Candidate hf_2603.06854
**Architecture** Proposes an inference-time activation steering mechanism to mitigate "text dominance" in Large Audio-Language Models (LALMs). It utilizes mechanistic interpretability to identify specific "audio-specialist" attention heads...
|
03-14 17:37 | Success | - | |
|
exp_pytrain.20260314173532.011_20260314_173558
|
AST-Driven Type-Aware ZipApp Builder
Overview: This benchmark tests an autonomous coding system's ability to leverage Python's `ast` module for static analysis and the `zipfile` module for packaging. The task is to implement a `StrictZipAppBuilder` class that enforces a "strict...
|
03-14 17:36 | Success | - | |
|
exp_2511.20683v1_20260314_172837
|
README: Dynamic Template Selection (DTS) Router Benchmark
**Architecture:** Proposes a lightweight **MLP router** for Dynamic Template Selection (DTS) to classify query complexity and map inputs to optimized response templates. This contrasts with a heavier fine-tuned RoBERTa baseline. **Memory Fo...
|
03-14 17:35 | Success | - | |
|
exp_2307.02321v2_20260314_172749
|
Backfill Candidate 2307.02321v2
**Architecture:** MSViT proposes a dynamic mixed-scale tokenization scheme using a lightweight, conditional gating mechanism. This module selects optimal token scales per image region, functioning as a preprocessing layer that is agnostic t...
|
03-14 17:27 | Success | - | |
|
exp_2403.14047v2_20260314_172650
|
Backfill Candidate 2403.14047v2
**Architecture:** Proposes a hybrid pruning approach combining static structured block pruning (weights) with dynamic token pruning (input-dependent). A specialized training algorithm recovers accuracy, while the hardware design utilizes mu...
|
03-14 17:26 | Success | - | |
|
exp_2408.17062v1_20260314_172555
|
Benchmark: VoMix for Vision Transformers (ViT)
**Analysis for ARES 8GB Roadmap: VoMix** * **Architecture:** A plug-and-play, parameter-free module inserted between ViT blocks. It uses a "Vote" mechanism (layer-wise similarity voting) to identify redundant tokens and a "Mix" operation to...
|
03-14 17:26 | Success | - | |
|
exp_pytrain.20260314172353.010_20260314_172416
|
Asynchronous Dependency Resolution Engine
This benchmark tests your ability to build a robust, type-safe Python application using the standard library. The task is to implement a simplified package dependency resolver that utilizes `asyncio` for concurrent I/O operations and strict...
|
03-14 17:24 | Success | - | |
|
exp_2407.10756v2_20260314_172237
|
GTPT Token Pruning Efficiency Benchmark
**Architecture** GTPT is a coarse-to-fine Transformer designed for efficient human pose estimation. It dynamically introduces keypoints and processes them via "Multi-Head Group Attention" (MHGA). To optimize efficiency, the architecture gro...
|
03-14 17:22 | Success | - | |
|
exp_2507.08806v1_20260314_172153
|
Benchmark: Structure-Aware Pruning for KV Cache Optimization
**Architecture:** Proposes "Structure-Aware Pruning," an inference-time method that injects temporary "end-of-thinking" instructions. It analyzes attention patterns relative to these markers to identify and evict low-contributing reasoning...
|
03-14 17:21 | Success | - | |
|
exp_2506.07077v1_20260314_172103
|
Dual-Priv Pruning: Visual Token Optimization Benchmark
**Architecture:** Dual-Priv Pruning targets Multimodal LLMs (MLLMs) by combining two distinct mechanisms: (1) **Visual Token Pruning**, which reduces input dimensionality by discarding redundant visual information, and (2) **Gradient-Update...
|
03-14 17:21 | Success | - | |
|
exp_2505.22411v2_20260314_172025
|
Backfill Candidate 2505.22411v2
**Architecture** "Manifold Steering" is an inference-time intervention, not a structural change. It identifies a low-dimensional manifold within the model's activation space responsible for redundant deliberation loops. By projecting steeri...
|
03-14 17:20 | Success | - | |
|
exp_2505.19536v3_20260314_171938
|
This repository contains a synthetic benchmark to evaluate the efficacy of the FlowCut optimization strategy for Large V...
**Architecture** FlowCut is an information-flow-aware pruning framework for LVLMs. Unlike static methods relying on single-layer attention, FlowCut tracks progressive token interactions across layers using the CLS token as a relay. This dyn...
|
03-14 17:19 | Success | - | |
|
exp_pytrain.20260314171739.009_20260314_171803
|
Dynamic Package Inspector
This benchmark evaluates an autonomous agent's ability to programmatically inspect, validate, and introspect local Python packages using only the Python Standard Library. Objective: Create a robust script that defines a function `analyze_pac...
|
03-14 17:18 | Success | - | |
|
exp_2505.17020v2_20260314_171633
|
CrossLMM Architecture Benchmark
**Architecture:** CrossLMM decouples long video sequences via a dual cross-attention mechanism. It first applies aggressive pooling to pretrained visual encoder outputs. Within the LLM layers, it utilizes a Visual-to-Visual cross-attention...
|
03-14 17:16 | Success | - | |
|
exp_2505.12509v2_20260314_171550
|
Benchmark: Proxy Framework Efficiency (Backfill 2505.12509v2)
**Architecture:** Introduces a **Proxy Framework** that trains smaller, efficient models to approximate the decision boundaries of large "oracle" LLMs. It employs a **"screen-and-apply"** statistical mechanism to verify local alignment betw...
|
03-14 17:15 | Success | - | |
|
exp_2505.10118v2_20260314_171512
|
Multi-Objective Balanced Covering (MoB) Benchmark
**Architecture:** Multi-Objective Balanced Covering (MoB). This method formulates visual token pruning as a bi-objective covering problem. It balances prompt alignment and visual preservation using Hausdorff distance bounds and $\epsilon$-c...
|
03-14 17:15 | Success | - | |
|
exp_2504.10854v1_20260314_171434
|
Backfill Candidate 2504.10854v1
**Summary for ARES 8GB Roadmap: LVLM_CSP** * **Architecture:** LVLM_CSP is a **training-free** inference accelerator designed for LVLMs performing reasoning segmentation. It utilizes a three-stage pipeline: 1. **Clustering:** Performs coars...
|
03-14 17:14 | Success | - | |
|
exp_2504.04653v2_20260314_171358
|
Backfill Candidate 2504.04653v2
**Architecture:** LEO-MINI introduces two core components: **Conditional Token Reduction (CoTR)** and a **Mixture of Multi-Modal Experts (MMoE)**. CoTR compresses long visual sequences into compact sets using cross-attention between visual...
|
03-14 17:14 | Success | - | |
|
exp_2503.23459v1_20260314_171313
|
Backfill Candidate 2503.23459v1
**Architecture:** Proposes "RL4EViT," replacing static pruning heuristics with Multi-Agent Proximal Policy Optimization (MAPPO). Token pruning is formulated as a Markov Game where individual agents (tokens) make collaborative, layer-wise de...
|
03-14 17:13 | Success | - | |
|
exp_pytrain.20260314171133.008_20260314_171149
|
Strictly Typed Dependency Graph Inspector
Objective: Design and implement a robust, type-safe CLI utility script named `pkg_inspector.py` (simulated within the benchmark logic) that analyzes the current Python runtime environment. The solution must demonstrate proficiency with moder...
|
03-14 17:11 | Success | - | |
|
exp_2511.12267v1_20260314_170022
|
Backfill Candidate 2511.12267v1: Active Perception Benchmark
**Architecture:** ZoomEarth introduces an "active perception" framework that processes Ultra-High-Resolution (UHR) images via an adaptive cropping-zooming mechanism. Instead of passively feeding the entire image into a Vision-Language Model...
|
03-14 17:10 | Success | - | |
|
exp_pytrain.20260314165555.007_20260314_165748
|
Type-Safe Modular Plugin System
This benchmark evaluates the ability to dynamically construct a Python package structure that leverages advanced typing features (`typing.Protocol`, `typing.Generic`) to enforce interface compliance without external dependencies. Objective...
|
03-14 16:57 | Success | - | |
|
exp_2511.10081v1_20260314_165148
|
Benchmark for GridPrune (Backfill Candidate 2511.10081v1)
**Architecture** GridPrune replaces standard global Top-K pruning with a two-stage "guide-globally, select-locally" strategy. It uses text-conditional guidance to dynamically allocate token quotas across spatial grids before performing loca...
|
03-14 16:51 | Success | - | |
|
exp_2510.24214v1_20260314_165108
|
SCOPE: Set-Coverage Oriented Visual Token Pruning Benchmark
**Architecture:** SCOPE introduces a visual token pruning strategy for Multimodal LLMs (specifically LLaVA-1.5 and Next) designed to operate prior to the main transformer blocks. Instead of relying solely on attention-based saliency, SCOPE...
|
03-14 16:51 | Success | - | |
|
exp_2510.17205v1_20260314_165015
|
Backfill Candidate 2510.17205v1
**Architecture & Dynamics** VisiPruner leverages a discovered "three-stage" cross-modal fusion process: visual tokens act as passive attention sinks in shallow layers, drive abrupt fusion in middle layers, and are discarded in deep layers....
|
03-14 16:50 | Success | - | |
|
exp_2303.08685v2_20260314_164917
|
STViT Benchmark Suite
**Architecture:** STViT replaces standard dense patch tokens with sparse "semantic tokens" acting as cluster centers. Initialized via spatial pooling and refined through attention, these tokens compress global or local information. It suppo...
|
03-14 16:49 | Success | - | |
|
exp_pytrain.20260314164528.006_20260314_164707
|
Type-Safe Dynamic Component Registry
Overview: This benchmark tests the ability to construct a robust, dependency-free component registry using Python's standard library. The design mirrors patterns found in high-performance ML frameworks like Hugging Face Diffusers and vLLM. F...
|
03-14 16:47 | Success | - | |
|
exp_2403.17411v1_20260314_163254
|
PCToolkit (2403.17411v1) Benchmark
**Architecture:** PCToolkit proposes a modular, unified framework designed as a plug-and-play solution for LLMs. It integrates various cutting-edge prompt compression algorithms into a single interface, abstracting the complexity of differe...
|
03-14 16:42 | Success | - | |
|
exp_2511.21477v1_20260314_163212
|
Backfill Candidate 2511.21477v1
**Architecture** The proposed method introduces a frequency-aware token reduction module within the self-attention mechanism. It partitions tokens into high-frequency (detail-oriented) and low-frequency (structural/background) groups. High-...
|
03-14 16:32 | Success | - | |
|
exp_2401.01470v2_20260314_163133
|
Backfill Candidate 2401.01470v2
**Architecture** TPC-ViT introduces a Token Propagation Controller (TPC) module to optimize token lifecycle management. Unlike static pruning methods, TPC employs a probabilistic approach using "pause" (reduction) and "restart" (reuse) dist...
|
03-14 16:31 | Success | - | |
|
exp_2511.16449v3_20260314_163041
|
Benchmark for VLA-Pruner: Dual-Level Token Pruning for VLAs
**Architecture:** VLA-Pruner is a plug-and-play module designed for Vision-Language-Action (VLA) models. It introduces a dual-level pruning strategy that deviates from standard VLM methods by considering action execution. It calculates toke...
|
03-14 16:30 | Success | - | |
|
exp_2504.04024v1_20260314_163004
|
WiCo (Window Concatenation) Optimization Benchmark
**Architecture:** Utilizes a sliding window to concatenate spatially adjacent visual tokens. To prevent detail loss, the last layers of the vision encoder are fine-tuned to align features within windows. The "WiCo+" variant further decompos...
|
03-14 16:30 | Success | - | |
|
exp_pytrain.20260314162814.005_20260314_162831
|
Python Skill Fallback
Title: Strictly-Typed Plugin System - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-14 16:28 | Success | - | |
|
exp_2512.13438v1_20260314_162736
|
Backfill Candidate 2512.13438v1
**Architecture:** UIFormer optimizes LLM agents by synthesizing UI transformation programs via a Domain-Specific Language (DSL). It utilizes constraint-based optimization and iterative LLM refinement to compress complex UI trees into semant...
|
03-14 16:27 | Success | - | |
|
exp_2510.08483v1_20260314_162608
|
DeepPrune Architecture Benchmark
**Architecture:** DeepPrune introduces a specialized "Judge" model (trained via focal loss) to evaluate partial Chain-of-Thought traces. It uses an online greedy clustering algorithm to dynamically prune redundant reasoning paths before gen...
|
03-14 16:26 | Success | - | |
|
exp_2505.16122v3_20260314_162339
|
Plan-and-Budget (P&B) Inference Benchmark
**Architecture** Introduces **Plan-and-Budget (P&B)**, a model-agnostic, test-time framework that decomposes complex queries into sub-questions. A controller dynamically allocates token budgets based on estimated uncertainty, solving the "o...
|
03-14 16:25 | Success | - | |
|
exp_pytrain.20260314162148.004_20260314_162209
|
Python Skill Fallback
Title: Type-Safe Plugin Loader with Protocol Validation - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-14 16:22 | Success | - | |
|
exp_2504.17996v1_20260314_162043
|
Backfill Candidate 2504.17996v1
**Architecture** LVTP is a "plug-and-play" progressive token pruning wrapper for Vision Transformers (ViTs). It introduces a dynamic scoring mechanism that fuses multi-scale Tsallis entropy with low-level visual features (specifically edge...
|
03-14 16:20 | Success | - | |
|
exp_2512.17920v1_20260314_161954
|
Backfill Candidate 2512.17920v1
**Paper Focus:** Evaluation of LLM instruction-following robustness under **prompt compression**. * **Architecture:** No new model proposed. Evaluates 9 frontier LLMs, finding reasoning models are 27.5% more robust to compression than effic...
|
03-14 16:19 | Success | - | |
|
exp_2511.20439v1_20260314_161851
|
OC-VTP Benchmark
**Architecture:** OC-VTP introduces a lightweight, plug-and-play pruner module positioned upstream of the LLM backbone. It utilizes a small, pre-trained network to select "object-centric" vision tokens by minimizing the reconstruction error...
|
03-14 16:19 | Success | - | |
|
exp_2505.00019v1_20260314_161805
|
Backfill Candidate 2505.00019v1
**Architecture:** This study evaluates six distinct prompt compression algorithms (e.g., structural pruning, token summarization) designed to preprocess inputs before feeding them to the LLM, rather than modifying the model weights themselv...
|
03-14 16:18 | Success | - | |
|
exp_hf_2603.10178_20260314_161721
|
Backfill Candidate hf_2603.10178
**Architecture:** ExeVRM is an 8B parameter Vision-Language Model (VLM) fine-tuned on the ExeVR-53k dataset to classify computer-use task success from video keyframes. Its key innovation is **spatiotemporal token pruning**, a mechanism that...
|
03-14 16:17 | Success | - | |
|
exp_pytrain.20260314161523.003_20260314_161545
|
Python Skill Fallback
Title: Runtime Module Loader with Protocol Enforcement - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-14 16:15 | Success | - | |
|
exp_2512.07580v2_20260314_161243
|
Backfill Candidate 2512.07580v2: Information Horizon Benchmark
**Architecture:** Identifies an "information horizon" in VLLMs where visual token salience vanishes (typically beyond layer 20). The paper proves that in deep layers, token information becomes uniform, rendering complex, attention-based pru...
|
03-14 16:14 | Success | - | |
|
exp_2511.14293v1_20260314_161143
|
Benchmark for Segmentwise Pruning in Audio-Language Models
**Architecture** The paper proposes **segmentwise pruning**, a token selection strategy tailored for Audio-Language Models (ALMs). Unlike generic vision approaches, this method accounts for the **time dimension** of audio, pruning irrelevan...
|
03-14 16:11 | Success | - | |
|
exp_2505.08058v2_20260314_161102
|
Semantic Hypernym Compression Benchmark
**Architecture:** Introduces a pre-processing text compression engine that utilizes word-level semantic constriction. It replaces specific nouns with their **hypernyms** (broader category terms) to drastically shorten sequences, relying on...
|
03-14 16:11 | Success | - | |
|
exp_pytrain.20260314160913.002_20260314_160930
|
Python Reliability Drill: PEP 695 Type Parameter Syntax
Overview: This drill validates your ability to utilize modern Python typing features introduced in PEP 695 (Type Parameter Syntax). You must implement a generic data processing pipeline that handles various data types strictly, using the new...
|
03-14 16:09 | Success | - | |
|
exp_2510.11588v1_20260314_155755
|
Benchmark: CAP-CPT Inference Efficiency vs. RAG
**Architecture** Introduces **CAP-CPT** (Category-Aware Policy Continued Pretraining), a training pipeline that moves policy knowledge from the context window into model weights. It parses policy documents into categories (factual, behavior...
|
03-14 16:08 | Success | - | |
|
exp_2409.10994v3_20260314_155716
|
TRIM: Token Reduction for Efficient VLM Inference
**Architecture:** TRIM proposes a token-pruning strategy situated between the vision encoder and the LLM. It utilizes CLIP similarity metrics to identify and retain salient visual features while discarding redundant tokens, mimicking human...
|
03-14 15:57 | Success | - | |
|
exp_2511.12281v2_20260314_155636
|
Backfill Candidate 2511.12281v2
**Architecture:** Cmprsr repurposes **Qwen3-4B** via Supervised Fine-Tuning (SFT) and Group Relative Policy Optimization (GRPO). It performs abstractive, token-level compression, specifically optimizing for semantic retention and strict adh...
|
03-14 15:56 | Success | - | |
|
exp_pytrain.20260314155435.001_20260314_155506
|
Structural Subtyping & Dynamic Plugin Loader Benchmark
This project demonstrates a robust, type-safe plugin architecture using Python's advanced type hinting features and dynamic module loading. Architecture: 1. **Protocol Definition (`DataHandler`)**: We utilize `typing.Protocol` combined with...
|
03-14 15:55 | Success | - | |
|
exp_2511.12281v2_20260314_152959
|
Benchmark: Cmprsr (Qwen3-4B) Memory & Compression Efficiency
**Architecture:** Cmprsr repurposes **Qwen3-4B** via Supervised Fine-Tuning (SFT) and Group Relative Policy Optimization (GRPO). It performs abstractive, token-level compression, specifically optimizing for semantic retention and strict adh...
|
03-14 15:36 | Pending | - | |
|
exp_2504.14692v1_20260314_152911
|
OmniV-Med: Scaling Medical Vision-Language Model for Universal Visual Understanding
**Architecture:** Utilizes a unified **rotary position-adaptive encoder** to handle 2D, 3D, and video inputs within a single model, eliminating the architectural overhead and VRAM cost of maintaining separate modality-specific towers. **Mem...
|
03-14 15:29 | Success | - | |
|
exp_pytrain.20260314152704.051_20260314_152727
|
Strictly Typed Modular Pipeline
This benchmark evaluates a Python implementation of a strictly typed data processing pipeline. The system leverages Python's `typing.Protocol`, `typing.TypeVar`, and `typing.Generic` modules to enforce structural subtyping and data integrit...
|
03-14 15:27 | Success | - | |
|
exp_2505.11707v1_20260314_152540
|
Benchmark: Structure-then-Detail Token Merging (SDTM)
**Architecture** SDTM is a post-training token merging technique for Diffusion Transformers (DiT). It exploits "structure-then-detail" denoising priors to identify and prune redundant tokens that the attention mechanism ignores. The archite...
|
03-14 15:25 | Success | - | |
|
exp_2505.22654v3_20260314_152443
|
VScan: Rethinking Visual Token Reduction for Efficient Large Vision-Language Models
**Architecture:** VScan proposes a two-stage visual token reduction framework to handle LVLM bottlenecks: 1. **Encoding Stage:** Implements token merging via complementary global and local scans. 2. **LLM Stage:** Introduces pruning at inte...
|
03-14 15:24 | Success | - | |
|
exp_2403.02991v1_20260314_152357
|
MADTP: Multimodal Alignment-Guided Dynamic Token Pruning for Accelerating Vision-Language Transformer
**Architecture:** MADTP introduces two plug-in modules for Vision-Language Transformers (VLTs): 1. **MAG (Multi-modality Alignment Guidance):** Aligns semantic features across modalities before pruning to ensure tokens critical to both visi...
|
03-14 15:24 | Success | - | |
|
exp_2403.10030v3_20260314_152309
|
Multi-criteria Token Fusion with One-step-ahead Attention for Efficient Vision Transformers
**Architecture** Proposes **Multi-criteria Token Fusion (MCTF)** to reduce the quadratic complexity of Vision Transformers. Instead of standard pruning, MCTF fuses tokens based on similarity, informativeness, and cluster size. It utilizes "...
|
03-14 15:23 | Success | - | |
|
exp_pytrain.20260314152106.050_20260314_152126
|
Python Skill Fallback
Title: Strictly Typed CLI Plugin Loader - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-14 15:21 | Success | - | |
|
exp_2510.10528v2_20260314_151932
|
Merlin's Whisper Benchmark
**Architecture:** Whisper is not a model architecture but a black-box prompting framework. It functions as an inference wrapper that iteratively refines input prompts to persuade LLMs to generate concise responses, bypassing the verbose Cha...
|
03-14 15:19 | Success | - | |
|
exp_2511.06283v2_20260314_151841
|
TinyChemVL: Efficient Chemical Vision-Language Benchmarking
**Architecture:** TinyChemVL is a 4B parameter Vision-Language Model (VLM) optimized for chemical reasoning. It employs a visual token reduction mechanism to filter non-informative backgrounds, focusing processing power on molecular structu...
|
03-14 15:18 | Success | - | |
|
exp_2503.20540v1_20260314_151753
|
Beyond Intermediate States: Explaining Visual Redundancy through Language
**Summary for ARES 8GB Roadmap** * **Architecture:** Proposes a "Dual-Perspective" pruning mechanism. Instead of relying on intermediate attention maps, it defines redundancy by analyzing textual output variations against visual input pertu...
|
03-14 15:18 | Success | - | |
|
exp_2505.12359v1_20260314_151710
|
Benchmark for STAR: Stage-Wise Attention-Guided Token Reduction
**Architecture:** STAR is a training-free, plug-and-play framework for Large Vision-Language Models (LVLMs) that utilizes a two-stage token reduction strategy. It performs **early-stage pruning** based on **visual self-attention** to remove...
|
03-14 15:17 | Success | - | |
|
exp_pytrain.20260314151449.049_20260314_151513
|
Python Skill Fallback
Title: Robust Dynamic Plugin Loader with Runtime Protocol Enforcement - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-14 15:15 | Success | - | |
|
exp_2505.19217v1_20260314_151331
|
The Overthinker's DIET: Benchmarking Efficiency & Performance
**Architecture** DIET is a training framework, not a structural modification, utilizing Reinforcement Learning (RL) to optimize the efficiency-performance trade-off. It employs "Advantage Weighting" to stabilize group-normalized RL (specifi...
|
03-14 15:13 | Success | - | |
|
exp_2506.00307v2_20260314_151245
|
Lossless Token Sequence Compression Benchmark
**Paper:** Lossless Token Sequence Compression via Meta-Tokens **Architecture:** Proposes a task-agnostic, lossless compression algorithm similar to LZ77. It identifies repeated subsequences within the input context and replaces them with u...
|
03-14 15:12 | Success | - | |
|
exp_2408.12742v1_20260314_151158
|
TReX: Reusing Vision Transformer's Attention for Efficient Xbar-based Computing
**Architecture** TReX proposes a hardware-algorithm co-design for Xbar-based In-Memory Computing (IMC). It optimizes Vision Transformers (ViTs) by strategically **reusing attention maps** from earlier encoder layers in later layers. This by...
|
03-14 15:12 | Success | - | |
|
exp_2402.16058v1_20260314_151048
|
Gist-COCO Efficiency Benchmark
**Architecture:** Gist-COCO utilizes a trainable "plugin" encoder to compress lengthy input prompts into a small set of "gist" tokens. Crucially, it employs a "gist verbalization" mechanism to translate these compressed representations back...
|
03-14 15:11 | Success | - | |
|
exp_pytrain.20260314150845.048_20260314_150911
|
Generic Plugin Loader with Entry Point Simulation
Overview: This coding drill validates a developer's ability to implement a robust, type-safe plugin system using modern Python 3.12 features. The benchmark simulates a simplified packaging environment where "plugins" are discovered and loade...
|
03-14 15:09 | Success | - | |
|
exp_2304.00341v1_20260314_150724
|
JacobiNeRF Memory & Speed Benchmark
**Architecture** JacobiNeRF utilizes a standard NeRF backbone but augments the training process with a second-order regularization objective. It explicitly aligns the Jacobians of correlated scene points to model mutual information, rather...
|
03-14 15:07 | Success | - | |
|
exp_2510.09085v1_20260314_150640
|
FLToP CTC: Frame-Level Token Pruning via Relative Threshold for Efficient and Memory-Saving Decoding on Diverse Platform...
**Architecture:** FLToP CTC optimizes the *decoding* stage of CTC-based ASR models (e.g., wav2vec 2.0). Rather than exhaustive token computation, it implements frame-level token pruning guided by a relative probability threshold to dynamica...
|
03-14 15:06 | Success | - | |
|
exp_2511.03929v2_20260314_150554
|
Benchmark for NVIDIA Nemotron Nano V2 VL
**Architecture:** Utilizes a hybrid **Mamba-Transformer** backbone (successor to the 8B Llama-3.1 variant) optimized for multimodal inputs (text, documents, video). It incorporates innovative **token reduction techniques** to manage long-co...
|
03-14 15:06 | Success | - | |
|
exp_2511.22235v2_20260314_150509
|
Training High-Level Schedulers with Execution-Feedback Reinforcement Learning for Long-Horizon GUI Automation
**Architecture** Proposes the **Coordinator-Executor-State Tracker (CES)** framework to decouple high-level reasoning from execution. The system utilizes a **Coordinator** for planning, a **State Tracker** for context compression/history ma...
|
03-14 15:05 | Success | - | |
|
exp_2512.02700v4_20260314_150426
|
Benchmark Design for VLM-Pruner
**VLM-Pruner** optimizes VLMs for memory-constrained hardware via a **training-free** token pruning mechanism. * **Architecture:** Introduces a "Centrifugal" pruning paradigm and a **Buffering for Spatial Sparsity (BSS)** criterion. This ba...
|
03-14 15:04 | Success | - | |
|
exp_pytrain.20260314150214.047_20260314_150238
|
Type-Safe Generic Registry Benchmark
This benchmark evaluates the implementation of a robust, type-safe plugin registry system using Python's advanced type hinting features (`typing.Protocol`, `typing.Generic`, and `runtime_checkable`). Objective: Create a generic `Registry` cl...
|
03-14 15:02 | Success | - | |
|
exp_2505.20100v1_20260314_150041
|
AdaTP: Attention-Debiased Token Pruning for Video Large Language Models
**Architecture** AdaTP is a training-free token pruning pipeline for Video LLMs. It addresses the redundancy in visual tokens by correcting two specific biases in standard attention scores: global bias (over-focusing on temporal sequence en...
|
03-14 15:00 | Success | - | |
|
exp_2506.12707v1_20260314_145947
|
SecurityLingua Benchmark
**Architecture:** Utilizes a dual-stage pipeline comprising a lightweight "Intent Compressor" and the Target LLM. The compressor extracts the true intent (detecting malicious payloads) and injects this analysis into the system prompt, while...
|
03-14 15:00 | Success | - | |
|
exp_2506.16369v2_20260314_145859
|
Prompt-based Dynamic Token Pruning (PrATo) Benchmark
**Architecture** PrATo introduces a dynamic token pruning layer for Vision Transformers (ViTs). It utilizes a spatial prompt to generate a prior that ranks tokens by relevance. Low-relevance tokens are down-weighted and excluded from proces...
|
03-14 14:59 | Success | - | |
|
exp_2407.02043v1_20260314_145809
|
Concise and Precise Context Compression Benchmark
**Summary for ARES 8GB Roadmap** This paper introduces a context compression framework designed to reduce the memory overhead of API documentation for tool-using LLMs. **Architecture:** The approach utilizes a dual-strategy mechanism. **Sel...
|
03-14 14:58 | Success | - | |
|
exp_2407.15504v2_20260314_145727
|
Fundamental Limits of Prompt Compression: A Rate-Distortion Framework for Black-Box Language Models
**Architecture:** Formalizes token-level prompt compression via a Rate-Distortion (R-D) framework, deriving theoretical performance limits using Linear Programming (LP). **Memory Footprint:** Significantly reduces input token counts, direct...
|
03-14 14:57 | Success | - | |
|
exp_pytrain.20260314145504.046_20260314_145537
|
Extensible Command Registry with Protocol Enforcement
Overview: This coding drill demonstrates a robust, modular command-line framework built entirely with the Python standard library. It simulates advanced packaging concepts such as namespace separation and entry-point discovery using `typing....
|
03-14 14:55 | Success | - | |
|
exp_2407.19410v1_20260314_144335
|
AdaCoder Benchmark Suite
**Architecture:** A lightweight wrapper for Visual Programmatic Models (e.g., ViperGPT). It uses a question-type classifier to retrieve task-specific, compressed "pre-prompts" containing only relevant API definitions, filtering out unnecess...
|
03-14 14:53 | Success | - | |
|
exp_2510.22963v3_20260314_144249
|
Benchmark for CompressionAttack: Semantic Drift and Performance Evaluation
**Architecture:** Focuses on the **prompt compression** module within LLM agent pipelines. Introduces **CompressionAttack**, which exploits compression layers via **HardCom** (discrete adversarial edits) and **SoftCom** (latent-space pertur...
|
03-14 14:42 | Success | - | |
|
exp_2511.15098v1_20260314_144207
|
README: Benchmarking Visual Token Redundancy in dMLLMs
**Analysis: Visual Token Redundancy in Discrete Diffusion MLLMs** This paper investigates optimization strategies for **discrete diffusion-based Multimodal LLMs (dMLLMs)** to address the computational overhead of full-sequence attention dur...
|
03-14 14:42 | Success | - | |
|
exp_2511.19928v1_20260314_144124
|
Benchmark: Context-Aware Token Pruning and Discriminative Attention (CPDATrack)
**Architecture:** CPDATrack optimizes one-stream Vision Transformer (ViT) trackers via two key mechanisms: 1) A learnable **Token Pruning Module** positioned between encoder layers that estimates target probabilities and discards low-probab...
|
03-14 14:41 | Success | - | |
|
exp_2505.23617v2_20260314_144030
|
Grounded Video Tokenization (TrajViT) Benchmark
**Architecture:** TrajViT replaces standard space-time patches with **panoptic sub-object trajectories**, generating a single token per semantic object track rather than per grid block. **Memory & Speed:** Achieves **10x token reduction** a...
|
03-14 14:40 | Success | - | |
|
exp_pytrain.20260314143815.045_20260314_143836
|
Robust Dynamic Plugin Loader with Runtime Type Enforcement
Overview: This coding drill focuses on advanced Python metaprogramming, specifically dynamic module loading and structural subtyping (protocols). The objective is to build a self-contained system that acts as a strict plugin loader, verifyin...
|
03-14 14:38 | Success | - | |
|
exp_2408.03094v1_20260314_142646
|
Benchmark: 500xCompressor Efficiency Simulation
**Architecture:** 500xCompressor is a lightweight encoder (pretrained on Arxiv) that compresses long text sequences into single special tokens. It uniquely relies on Key-Value (KV) preservation rather than embeddings to maintain semantic in...
|
03-14 14:36 | Success | - | |
|
exp_2409.01227v3_20260314_142559
|
Prompt Compression with Context-Aware Sentence Encoding for Fast and Improved LLM Inference
**Architecture** Proposes Context-Aware Prompt Compression (CPC), utilizing a contrastive learning-based sentence encoder. The model scores sentence relevance against the specific query, filtering irrelevant data at the sentence level rathe...
|
03-14 14:26 | Success | - | |
|
exp_2308.08758v3_20260314_142507
|
Benchmarking Discrete Prompt Compression with Reinforcement Learning (PCRL)
**Architecture:** PCRL introduces a lightweight, RL-trained policy network that performs discrete token-level editing (deletion/substitution) on prompts. It treats the LLM as a black-box environment, requiring no gradient access or labeled...
|
03-14 14:25 | Success | - | |
|
exp_pytrain.20260314142317.044_20260314_142335
|
Robust Typed Plugin Loader
A coding drill benchmark focusing on Python's `typing` module, specifically `Protocol` and `runtime_checkable`, combined with dynamic module loading using `importlib`. Objective: Implement a robust runtime plugin loader that: 1. Dynamically...
|
03-14 14:23 | Success | - | |
|
exp_2406.18294v2_20260314_141159
|
Benchmark: Hierarchical Context Pruning (HCP)
**Architecture:** Hierarchical Context Pruning (HCP) is a context-management strategy, not a model weight modification. It parses repositories into a function-level dependency graph. The architecture retains topological file dependencies an...
|
03-14 14:22 | Success | - | |
|
exp_2510.14393v1_20260314_141100
|
Benchmark for Low Power Vision Transformer Accelerator
**Architecture** Shifts optimization focus from self-attention to the Feed-Forward Network (FFN), identified as the bottleneck for short-token Vision Transformers. Implements algorithm-hardware co-design using dynamic token pruning and repl...
|
03-14 14:11 | Success | - | |
|
exp_pytrain.20260314140855.043_20260314_140911
|
Typed Module Scaffolder & Validator
Objective: This benchmark tests your ability to construct robust Python filesystem utilities using modern type annotations. You will implement a lightweight package scaffolder that leverages `typing.TypedDict` for metadata definitions and `p...
|
03-14 14:09 | Success | - | |
|
exp_2510.27135v1_20260314_140730
|
Benchmark Design: E-MMDiT Efficiency Analysis
**Architecture:** E-MMDiT is a 304M parameter Multimodal Diffusion Transformer (MMDiT) optimized for token efficiency. It employs a highly compressive visual tokenizer and a multi-path compression module to reduce sequence length. Key innov...
|
03-14 14:07 | Success | - | |
|
exp_2511.08128v1_20260314_140647
|
Sentence-Anchored Gist Compression for Long-Context LLMs
**Architecture:** Introduces "Sentence-Anchored Gist Compression," utilizing learned compression tokens integrated into pre-trained LLMs via fine-tuning. **Memory Footprint:** Significantly reduces KV cache storage and memory bandwidth. Val...
|
03-14 14:06 | Success | - | |
|
exp_2511.15244v2_20260314_140607
|
Context Cascade Compression (C3) Benchmark
**Architecture:** C3 utilizes a cascaded design: a small "Compressor" LLM encodes long contexts into fixed-length latent vectors (e.g., 32–64 tokens), which a large "Decoder" LLM subsequently processes for generation. **Memory Footprint:**...
|
03-14 14:06 | Success | - | |
|
exp_2512.12560v1_20260314_140525
|
StreamingAssistant: Efficient Visual Token Pruning for Accelerating Online Video Understanding
**Architecture:** StreamingAssistant optimizes Multimodal LLMs for video via a token pruning framework. It introduces the MSSAVT metric to evaluate spatial redundancy and employs a "masked pruning strategy" to remove mutually unadjacent tok...
|
03-14 14:05 | Success | - | |
|
exp_2504.04787v1_20260314_140437
|
Dynamic Vision Mamba (DyVM) Efficiency Benchmark
**Architecture:** DyVM optimizes Mamba vision backbones by addressing spatial redundancy via **Dynamic Token Merging** (rearranging pruned sequences before SSM layers to prevent training-inference mismatch) and **Dynamic Block Skipping** (s...
|
03-14 14:04 | Success | - | |
|
exp_pytrain.20260314140220.042_20260314_140244
|
Generic Data Container Refactoring using PEP 695
This benchmark evaluates a refactoring of a generic data container utilizing **PEP 695** (Type Parameter Syntax). The primary goal is to eliminate boilerplate code associated with `typing.TypeVar` and `typing.Generic`, improving namespace m...
|
03-14 14:02 | Success | - | |
|
exp_2504.08966v1_20260314_135055
|
Benchmark for PACT (Pruning and Clustering-Based Token Reduction)
**Architecture:** PACT optimizes Visual Language Models (VLMs) by deploying a dual-strategy token reduction module at early LLM layers. It utilizes a novel, attention-free importance metric for pruning irrelevant tokens and applies Distance...
|
03-14 14:00 | Success | - | |
|
exp_2504.11004v1_20260314_135010
|
Dynamic Compressing Prompts for Efficient Inference of Large Language Models
**Architecture** LLM-DCP utilizes a reinforcement learning framework where a lightweight policy network (DCP-Agent) treats prompt compression as a Markov Decision Process (MDP). The agent sequentially evaluates and prunes tokens based on a...
|
03-14 13:50 | Success | - | |
|
exp_pytrain.20260314134747.041_20260314_134821
|
Type-Safe Entrypoint Dispatcher
Overview: This coding drill demonstrates a robust, type-safe command dispatcher implemented with the Python standard library. It leverages `typing.TypedDict` for configuration schema definition and `typing.Protocol` for structural interface enforc...
|
03-14 13:48 | Success | - | |
|
exp_2505.17827v2_20260314_134613
|
Not All Tokens Are What You Need In Thinking
**Architecture:** Introduces **Conditional Token Selection (CTS)**, a token-level compression framework. It utilizes conditional importance scoring to identify and prune non-essential reasoning tokens, training models to generate compressed...
|
03-14 13:46 | Success | - | |
|
exp_2505.21233v2_20260314_134517
|
Benchmark for CROP: Contextual Region-Oriented Visual Token Pruning
**Architecture** CROP introduces a query-driven localization module to identify relevant image regions, followed by a two-stage pruning strategy. It offers Pre-LLM Compression (PLC) for adaptive spatial downsampling and Inner-LLM Pruning (I...
|
03-14 13:45 | Success | - | |
|
exp_2505.22038v2_20260314_134426
|
Balanced Token Pruning (BTP) Benchmark
**Architecture:** Balanced Token Pruning (BTP) is a plug-and-play inference strategy for LVLMs that optimizes vision token reduction. It utilizes a multi-stage approach with a small calibration set to balance local output consistency agains...
|
03-14 13:44 | Success | - | |
|
exp_2506.05709v1_20260314_134347
|
Token Transforming: A Unified and Training-Free Token Compression Framework for Vision Transformer Acceleration
**Architecture** Proposes a "Token Transforming" framework that unifies token pruning and merging into an explicit matrix transformation operation. By generalizing token reduction as a many-to-many mapping, it preserves more information tha...
|
03-14 13:43 | Success | - | |
|
exp_2506.10967v2_20260314_134300
|
Benchmark: CDPruner for Visual Token Pruning
**Architecture** CDPruner replaces standard attention or similarity-based pruning with a Determinantal Point Process (DPP) algorithm. It calculates "conditional diversity" to select a subset of visual tokens that are both representative of...
|
03-14 13:43 | Success | - | |
|
exp_pytrain.20260314134051.040_20260314_134111
|
Strictly Typed Plugin Registry with Dynamic Module Discovery
Overview: This benchmark tests the ability to construct a robust, type-safe plugin system using Python's standard library. It simulates a simplified architecture similar to PyTorch or LitGPT, where model architectures are registered dynamica...
|
03-14 13:41 | Success | - | |
|
exp_2407.14057v1_20260314_132917
|
Benchmark Design: LazyLLM Simulation
**Architecture:** LazyLLM introduces dynamic token pruning within the attention mechanism. Unlike static pruning, it re-evaluates token importance at each generation step, skipping KV cache computation for tokens deemed irrelevant to the im...
|
03-14 13:39 | Success | - | |
|
exp_2408.08604v5_20260314_132808
|
Benchmark for Bi-Directional Deep Contextual Video Compression (DCVC-B)
**Paper:** Bi-Directional Deep Contextual Video Compression (DCVC-B) **Architecture:** DCVC-B replaces traditional hybrid coding with a deep learning framework optimized for B-frames. It utilizes a bi-directional motion difference context p...
|
03-14 13:28 | Success | - | |
|
exp_2409.01179v3_20260314_132719
|
Recoverable Compression Benchmark
**Architecture:** A training-free, plug-and-play module for Large Multimodal Models (LMMs). It utilizes cross-modal similarity between the textual prompt and visual feature maps to dynamically recover semantically relevant visual tokens whi...
|
03-14 13:27 | Success | - | |
|
exp_pytrain.20260314132426.039_20260314_132504
|
Python Skill Fallback
Title: Generic Model Registry with Strict Typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-14 13:25 | Success | - | |
|
exp_2401.04975v1_20260314_132231
|
HaltingVT Benchmark
**Architecture:** HaltingVT modifies Joint Space-Time Video Transformers by introducing a "Glimpser" module that performs adaptive, layer-wise token pruning. It dynamically removes redundant spatial-temporal tokens—specifically targeting mi...
|
03-14 13:22 | Success | - | |
|
exp_2303.06522v1_20260314_132128
|
Token Sparsification for Faster Medical Image Segmentation
**Architecture:** Proposes a Sparse-Completion-Dense (SCD) pipeline to enable token sparsification for segmentation. The method employs **Soft-topK Token Pruning (STP)** using a lightweight sub-network for differentiable token selection. It...
|
03-14 13:21 | Success | - | |
|
exp_2510.16092v1_20260314_132035
|
Compressing Many-Shots in In-Context Learning
**Architecture:** Introduces **MemCom**, a layer-wise compression technique for In-Context Learning (ICL). Unlike standard prompt pruning, MemCom utilizes a dedicated compressor network to generate "soft-token" summaries at **every transfor...
|
03-14 13:20 | Success | - | |
|
exp_2511.10488v1_20260314_131935
|
SPOT: Sparsification Benchmark
**Architecture:** SPOT introduces lightweight relevance predictors into standard Vision Transformer (ViT) blocks. These modules analyze token embeddings and inter-layer attention dynamics to identify and prune redundant tokens *prior* to th...
|
03-14 13:19 | Success | - | |
|
exp_pytrain.20260314131714.038_20260314_131737
|
Robust Type-Safe Plugin Registry with Runtime Discovery
Overview: This benchmark implements a modular plugin architecture in pure Python. It demonstrates the utility of Python's `typing.Protocol` for defining structural interfaces (subtyping) and `inspect` for runtime discovery and registration o...
|
03-14 13:17 | Success | - | |
|
exp_2504.17040v2_20260314_131534
|
DyMU: Dynamic Merging and Virtual Unmerging Benchmark
**Architecture:** DyMU optimizes VLMs via two training-free modules: Dynamic Token Merging (DToMe) and Virtual Token Unmerging (VTU). DToMe prunes redundant ViT tokens based on image complexity, while VTU reconstructs attention masks for th...
|
03-14 13:15 | Success | - | |
|
exp_2303.14526v1_20260314_131433
|
Benchmark: Selective Structured State-Spaces (S5) for Video
**Architecture:** S5 (Selective Structured State-Space) improves upon the S4 architecture by introducing a **lightweight mask generator**. This module adaptively prunes redundant image tokens, avoiding the quadratic complexity of dense self...
|
03-14 13:14 | Success | - | |
|
exp_2511.18920v1_20260314_131343
|
EventSTU: Event-Guided Efficient Spatio-Temporal Understanding for Video Large Language Models
**Architecture:** EventSTU is a training-free framework for Video LLMs that optimizes spatio-temporal processing. It utilizes **simulated events** (pixel changes between frames) to guide a **coarse-to-fine keyframe sampling** strategy (temp...
|
03-14 13:13 | Success | - | |
|
exp_2512.03643v1_20260314_131246
|
Optical Context Compression Is Just (Bad) Autoencoding
**Architecture:** The study benchmarks DeepSeek-OCR’s Vision Encoder against two lightweight alternatives: parameter-free Mean Pooling and a learned Hierarchical Encoder. **Memory Footprint & Speed:** Vision encoders introduce significant p...
|
03-14 13:12 | Success | - | |
|
exp_pytrain.20260314131022.037_20260314_131048
|
Python Skill Fallback
Title: Typed Package Scaffolder & Import Manager - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-14 13:10 | Success | - | |
|
exp_2503.23959v2_20260314_130909
|
Local-Aware Token Pruning (ALTP) Benchmark
**Summary for ARES 8GB Roadmap** **Architecture:** ALTP (Adaptive Local-Aware Token Pruning) accelerates Grounded Conversation Generation models (e.g., GLaMM, OMG-LLaVA) by integrating two lightweight modules: Detail Density Capture (DDC) a...
|
03-14 13:09 | Success | - | |
|
exp_2504.02438v5_20260314_130819
|
Benchmarking ViLAMP: Hierarchical Differential Distillation
**Architecture:** ViLAMP introduces "Differential Distillation," a hierarchical method treating video tokens with "mixed precision." It isolates task-relevant keyframes for full-patch processing while compressing non-keyframes to query-sali...
|
03-14 13:08 | Success | - | |
|
exp_2505.18051v3_20260314_130720
|
LookWhere? Efficient Visual Recognition Benchmark
**Architecture:** Introduces a dual-branch adaptive system comprising a low-resolution **Selector** (identifies ROIs) and a high-resolution **Extractor** (processes only relevant patches). This decouples "where to look" from "what to see,"...
|
03-14 13:07 | Success | - | |
|
exp_2511.16943v2_20260314_130632
|
RASTP: Representation-Aware Semantic Token Pruning for Generative Recommendation with Semantic Identifiers
**Architecture** RASTP introduces a dynamic token pruning layer for Generative Recommendation systems. To handle the bloat caused by long Semantic Identifiers (SIDs), it calculates a composite importance score combining **Semantic Saliency*...
|
03-14 13:06 | Success | - | |
|
exp_pytrain.20260314130359.036_20260314_130424
|
Typed Module Loader & Validator
Overview: This benchmark demonstrates a robust, autonomous system for safely loading and validating third-party Python modules at runtime. It simulates a package installation process where code is dynamically generated, written to disk, and...
|
03-14 13:04 | Success | - | |
|
exp_2504.08934v1_20260314_125239
|
This benchmark evaluates the **GistPool** methodology against standard **Average Pooling** for Long Context In-Context C...
**Architecture:** GistPool is an in-context compression technique designed for decoder-only transformers. It addresses the information loss and capacity limitations of previous "Gisting" methods by integrating average pooling principles to...
|
03-14 13:02 | Success | - | |
|
exp_2504.12778v1_20260314_125129
|
Towards Lossless Token Pruning in Late-Interaction Retrieval Models
**Architecture:** Modifies **Late Interaction (ColBERT)** training using regularization losses to force non-essential token embeddings to zero, enabling lossless static pruning. **Memory Footprint:** **Critical for 8GB VRAM.** Reduces index...
|
03-14 12:51 | Success | - | |
|
exp_2504.16574v1_20260314_125024
|
PIS: Prompt Importance Sampling Benchmark
**PIS Architecture:** The paper proposes a dual-level compression framework utilizing a lightweight 9-layer Reinforcement Learning (RL) agent coupled with "Russian Roulette" semantic sampling. It quantifies token saliency using the target L...
|
03-14 12:50 | Success | - | |
|
exp_pytrain.20260314124800.035_20260314_124836
|
Python Skill Fallback
Title: Dynamic Generic Plugin Pipeline - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-14 12:48 | Success | - | |
|
exp_2504.21263v1_20260314_124621
|
Embracing Collaboration Over Competition: Condensing Multiple Prompts for Visual In-Context Learning
**Architecture:** Condenser is a lightweight, trainable external plugin for Visual In-Context Learning (VICL). Instead of selecting a single prompt or ensembling, it performs "prompt condensation," fusing fine-grained context from multiple...
|
03-14 12:46 | Success | - | |
|
exp_2505.11471v1_20260314_124400
|
CRISP: Efficiency Benchmark Simulation
**Architecture:** CRISP modifies Multi-Vector retrieval (specifically ColBERT-style) by integrating clustering objectives directly into the end-to-end training loop. It learns to prune "noisy" tokens, creating representations that are inher...
|
03-14 12:44 | Success | - | |
|
exp_pytrain.20260314124121.034_20260314_124143
|
Dynamic Plugin Loader with Runtime Type Verification
This benchmark evaluates the ability to implement a robust dynamic plugin loading system. It tests the candidate's proficiency with the `importlib` library, `typing.Protocol` for structural subtyping, and file system management using `pathl...
|
03-14 12:41 | Success | - | |
|
exp_2505.13975v3_20260314_123932
|
DRP: Distilled Reasoning Pruning with Skill-aware Step Decomposition for Efficient Large Reasoning Models
**Architecture:** DRP utilizes a hybrid teacher-student framework. A teacher model performs skill-aware step decomposition to prune verbose reasoning chains. These compact paths are distilled into a student model via standard Supervised Fin...
|
03-14 12:39 | Success | - | |
|
exp_2505.18757v2_20260314_123838
|
ToDRE: Effective Visual Token Pruning via Token Diversity and Task Relevance
**ToDRE** is a training-free, two-stage framework for efficient Large Vision-Language Model (LVLM) inference. * **Architecture:** 1. **Token Diversity (Post-Encoder):** Uses a greedy max-sum diversification algorithm to select representativ...
|
03-14 12:38 | Success | - | |
|
exp_2506.04997v1_20260314_123741
|
Benchmark Proposal: Light-ColPali/ColQwen2 (Token Merging)
**Architecture:** Introduces **Light-ColPali/ColQwen2**, an optimization of late-interaction visual document retrievers (VDR) based on ColBERT-style architecture. **Indexing & Strategy:** Rejects token pruning (due to the loss of query-agno...
|
03-14 12:37 | Success | - | |
|
exp_2407.05941v4_20260314_123656
|
Pruning One More Token is Enough: Leveraging Latency-Workload Non-Linearities for Vision Transformers on the Edge
**Architecture:** Introduces a training-free token pruning schedule for Vision Transformers (ViTs) that exploits non-linear latency-workload correlations specific to edge hardware. **Memory Footprint:** Significantly reduces activation memo...
|
03-14 12:36 | Success | - | |
|
exp_pytrain.20260314123421.033_20260314_123457
|
Python Reliability Drill: Typing & Packaging Benchmark
This benchmark evaluates the robustness of a pure-Python "Inference Engine" simulation, focusing on strict type enforcement (`typing`), package metadata handling (`packaging`), and deterministic resource telemetry. It mocks the behavior of...
|
03-14 12:35 | Success | - | |
|
exp_2407.08892v1_20260314_122306
|
Benchmark: Prompt Compression Methods for Long Context
**Summary for ARES 8GB Roadmap** This study evaluates three prompt compression paradigms—extractive, abstractive, and token pruning—to mitigate the high memory and compute costs of long-context inference. * **Architecture:** A comparative a...
|
03-14 12:33 | Success | - | |
|
exp_2408.00274v1_20260314_122207
|
QUITO: Accelerating Long-Context Reasoning through Query-Guided Context Compression
**Architecture:** QUITO is a lightweight, plug-in attention compressor for RAG pipelines. It computes the attention distribution of a "trigger token" (the query) over retrieved context tokens to identify and retain relevant information. **R...
|
03-14 12:22 | Success | - | |
|
exp_2408.10497v3_20260314_122110
|
QUITO-X: A New Perspective on Context Compression from the Information Bottleneck Theory
**QUITO-X** optimizes long-context handling for 8GB VRAM constraints by applying Information Bottleneck (IB) theory to compress prompts based on query relevance. * **Architecture:** Replaces standard self-information metrics with a **cross-...
|
03-14 12:21 | Success | - | |
|
exp_pytrain.20260314121825.032_20260314_121900
|
Type-Safe Plugin Registry Benchmark
Overview: This benchmark evaluates the implementation of a robust, type-safe plugin registry system in Python. It leverages Python's `typing.Protocol` to enforce structural subtyping (duck typing) at registration time, ensuring that all regi...
|
03-14 12:19 | Success | - | |
|
exp_2409.14364v4_20260314_121649
|
Position IDs Matter: An Enhanced Position Layout for Efficient Context Compression in Large Language Models
**Enhanced Position Layout (EPL)** improves context compression via position ID manipulation. * **Architecture:** Modifies the position indices of special "gist" or compression tokens to minimize the distance to source context tokens, prese...
|
03-14 12:16 | Success | - | |
|
exp_2402.18700v2_20260314_121458
|
Benchmark: Natural Language Prompt Encapsulation (Nano-Capsulator)
**Paper:** Learning to Compress Prompt in Natural Language Formats (Nano-Capsulator) * **Architecture:** Proposes a reinforcement learning framework that distills long prompts into dense "Capsule Prompts" in natural language. It utilizes a...
|
03-14 12:15 | Success | - | |
|
exp_2309.15755v2_20260314_121358
|
CAIT: Triple-Win Compression towards High Accuracy, Fast Inference, and Favorable Transferability For ViTs
**Architecture** CAIT proposes a dual-strategy compression pipeline for Vision Transformers (ViTs). It integrates **Asymmetric Token Merging (ATME)**, which merges neighboring tokens to reduce sequence length while strictly preserving spati...
|
03-14 12:14 | Success | - | |
|
exp_pytrain.20260314121101.031_20260314_121124
|
Typed Micro-Package Architecture Benchmark
This benchmark evaluates a candidate's ability to structure a Python script as a robust, installable micro-package. It focuses on strict static typing using `typing.Protocol` and proper namespace management using `__all__`. Benchmark Detail...
|
03-14 12:11 | Success | - | |
|
exp_2309.16738v3_20260314_120919
|
ELIP: Efficient Discriminative Language-Image Pre-training with Fewer Vision Tokens
**Paper:** ELIP: Efficient Discriminative Language-Image Pre-training with Fewer Vision Tokens **Architecture:** ELIP proposes a trainable-parameter-free token pruning and merging mechanism for Vision Transformers (ViT) within Language-Imag...
|
03-14 12:09 | Success | - | |
|
exp_2504.18579v4_20260314_120826
|
Sparsity Forcing: Reinforcing Token Sparsity of MLLMs
**Architecture** Introduces *Sparsity Forcing*, a Reinforcement Learning (RL) post-training framework for Multimodal LLMs (specifically Qwen2-VL/2.5-VL). It does not alter model weights but optimizes token selection by contrasting inference...
|
03-14 12:08 | Success | - | |
|
exp_2512.00647v2_20260314_120739
|
MambaScope: Coarse-to-Fine Scoping for Efficient Vision Mamba
**Summary for ARES 8GB Roadmap** * **Architecture:** MambaScope proposes an adaptive "coarse-to-fine" wrapper for Vision Mamba (Vim). It replaces static high-resolution processing with a dynamic pipeline. The model initially processes the i...
|
03-14 12:07 | Success | - | |
|
exp_2510.18043v1_20260314_120633
|
CompactPrompt: A Unified Pipeline for Prompt Data Compression in LLM Workflows
**Architecture:** CompactPrompt is a model-agnostic preprocessing pipeline. It utilizes "hard" prompt pruning via self-information scoring and dependency-based phrase grouping, paired with "soft" file-level compression (n-gram abbreviation...
|
03-14 12:06 | Success | - | |
|
exp_pytrain.20260314120413.030_20260314_120435
|
Dynamic Type-Safe Plugin Loader Benchmark
This coding drill benchmarks your ability to construct a robust, runtime-validated plugin system using Python's standard library. You must implement a mechanism that dynamically discovers code modules within a temporary package structure, v...
|
03-14 12:04 | Success | - | |
|
exp_2511.18691v1_20260314_120238
|
EVCC: Enhanced Vision Transformer-ConvNeXt-CoAtNet Fusion Benchmark
**Architecture:** EVCC is a multi-branch hybrid fusing ViT, ConvNeXt, and CoAtNet via a dynamic router gate and gated bidirectional cross-attention. Its primary efficiency mechanism is adaptive token pruning, which preserves information whi...
|
03-14 12:02 | Success | - | |
|
exp_2512.08169v1_20260314_120119
|
Information-Dense Reasoning for Efficient and Auditable Security Alert Triage
**Architecture:** Hybrid cloud-edge framework (AIDR) employing a lightweight cloud router to dispatch alerts to specialized on-premise "expert" models for reasoning generation. **Memory Footprint:** Optimized for constrained environments. I...
|
03-14 12:01 | Success | - | |
|
exp_2512.10324v1_20260314_120027
|
Benchmark for EchoingPixels: Cross-Modal Adaptive Token Reduction
**Architecture:** EchoingPixels optimizes Audio-Visual LLMs via the **Cross-Modal Semantic Sieve (CS2)**. Instead of unimodal pruning, CS2 merges audio and video tokens into a single pool, using cross-modal co-attention to dynamically selec...
|
03-14 12:00 | Success | - | |
|
exp_pytrain.20260314115746.029_20260314_115807
|
Strict Package Metadata Inspector
This coding drill validates your ability to use the Python standard library for system introspection and strict type safety. **Objective:** Create a robust script `meta_inspector.py` (implemented within `benchmark.py`) that inspects installed Py...
|
03-14 11:58 | Success | - | |
|
exp_2512.14244v4_20260314_115615
|
EDU-based Context Compressor: Benchmark
**Architecture:** Proposes a two-stage "structure-then-select" pipeline. First, *LingoEDU* parses linear text into a structural relation tree of Elementary Discourse Units (EDUs) anchored to source indices to prevent hallucinations. Second,...
|
03-14 11:56 | Success | - | |
|
exp_2503.20384v2_20260314_115533
|
Benchmark for MoLe-VLA: Dynamic Layer-skipping VLA
**Architecture:** MoLe-VLA transforms static LLM inference into a dynamic "Mixture-of-Layers" framework. A **Spatial-Temporal Aware Router (STAR)** selectively activates specific LLM layers based on the robot's current state, treating layer...
|
03-14 11:55 | Success | - | |
|
exp_2504.16786v1_20260314_115448
|
MOOSComp: Improving Lightweight Long-Context Compressor via Mitigating Over-Smoothing and Incorporating Outlier Scores
**MOOSComp Analysis for ARES 8GB Roadmap** * **Architecture:** Utilizes a lightweight BERT-based encoder for token classification. It mitigates over-smoothing via an inter-class cosine similarity loss during training and incorporates outlie...
|
03-14 11:54 | Success | - | |
|
exp_2505.12215v2_20260314_115404
|
GMSA Context Compression Benchmark
**Architecture:** GMSA is an encoder-decoder framework designed to compress long-context inputs into a compact sequence of "soft tokens." It utilizes **Group Merging** to ensure uniform semantic aggregation and **Layer Semantic Alignment (L...
|
03-14 11:54 | Success | - | |
|
exp_2403.15388v6_20260314_115326
|
LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models
**Architecture:** PruMerge inserts a lightweight optimization module between the visual encoder (e.g., CLIP) and the LLM. It utilizes a two-stage strategy: **Pruning** discards redundant visual tokens based on attention sparsity between the...
|
03-14 11:53 | Success | - | |
|
exp_pytrain.20260314115105.028_20260314_115138
|
Environment Metadata Auditor with PEP 695 Generics
This drill verifies the ability to inspect the Python runtime environment using standard library tools (`importlib.metadata`) and modern typing features introduced in Python 3.12 (PEP 695 Type Parameter Syntax). **Objective:** Create a script `b...
|
03-14 11:51 | Success | - | |
|
exp_2510.08907v4_20260314_113950
|
Semantic-Anchor Compression (SAC) Benchmark
**Architecture:** Proposes Semantic-Anchor Compression (SAC), eliminating the need for autoencoding-based training. The method selects specific "anchor" tokens from the input context and aggregates information from the entire text into thei...
|
03-14 11:49 | Success | - | |
|
exp_2512.01949v1_20260314_113902
|
Script: Graph-Structured and Query-Conditioned Semantic Token Pruning for Multimodal Large Language Models
**Architecture:** Script proposes a plug-and-play, training-free pipeline featuring two core modules: a graph-structured pruning module (to remove spatial redundancy) and a query-conditioned semantic pruning module (to retain task-relevant...
|
03-14 11:39 | Success | - | |
|
exp_2505.15774v1_20260314_113818
|
Hybrid Context Compression (HyCo2) Benchmark
**Paper:** Beyond Hard and Soft: Hybrid Context Compression for Balancing Local and Global Information Retention **Architecture:** HyCo2 introduces a dual-module context compressor. It utilizes a **hybrid adapter** to refine global semantic...
|
03-14 11:38 | Success | - | |
|
exp_pytrain.20260314113600.027_20260314_113620
|
Robust Dynamic Plugin Loader with Protocol Validation
**Overview:** This coding drill benchmark tests your ability to design a robust, type-safe plugin architecture using only the Python Standard Library. It simulates an environment where code must be loaded dynamically at runtime from temporary fi...
|
03-14 11:36 | Success | - | |
|
exp_2506.07851v2_20260314_113441
|
Learning to Focus (LeaF) Benchmark
**Paper:** Learning to Focus (LeaF) **Architecture:** LeaF is a **training-phase distillation framework** that utilizes a larger teacher model to perform gradient-based interventions. It identifies "confounding" tokens (distractors) in the...
|
03-14 11:34 | Success | - | |
|
exp_2408.11799v1_20260314_113339
|
Practical token pruning for foundation models in few-shot conversational virtual assistant systems
**Architecture:** Utilizes contrastive-pretrained Sentence Transformers for intent classification. The core innovation is a **Dynamic Token Pruning** mechanism implemented via a multi-task adaptation approach, allowing the model to skip pro...
|
03-14 11:33 | Success | - | |
|
exp_2409.13035v3_20260314_113249
|
TACO-RL: Task Aware Prompt Compression Optimization with Reinforcement Learning
**Architecture:** Utilizes a lightweight Transformer encoder (token classification policy) trained via the REINFORCE algorithm. Unlike task-agnostic pruning, it optimizes retention decisions using task-specific reward signals (e.g., ROUGE,...
|
03-14 11:32 | Success | - | |
|
exp_2505.18227v3_20260314_113156
|
Token Reduction Should Go Beyond Efficiency in Generative Models -- From Vision, Language to Multimodality
**Architecture:** Position paper proposing unified token reduction (pruning/merging) strategies across Vision, Language, and Multimodal Transformers. Reframes reduction as a core design principle for model alignment and stability, not just...
|
03-14 11:32 | Success | - | |
|
exp_pytrain.20260314112929.026_20260314_113006
|
Python Skill Fallback
Title: Type-Safe Component Registry with Dynamic Configuration - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-14 11:30 | Success | - | |
|
exp_2505.18227v3_20260314_112742
|
Benchmark Proposal: Semantic Token Reduction for Quality and Efficiency
**Architecture:** Position paper proposing unified token reduction (pruning/merging) strategies across Vision, Language, and Multimodal Transformers. Reframes reduction as a core design principle for model alignment and stability, not just...
|
03-14 11:27 | Success | - | |
|
exp_2511.18950v1_20260314_112654
|
Compressor-VLA: Instruction-Guided Visual Token Compression for Efficient Robotic Manipulation
**Architecture:** Compressor-VLA introduces a hybrid, instruction-conditioned compression framework. It utilizes two distinct modules: a Semantic Task Compressor (STC) for holistic context and a Spatial Refinement Compressor (SRC) for fine-g...
|
03-14 11:26 | Success | - | |
|
exp_2407.09014v3_20260314_112556
|
Benchmark: CompAct (Compressing Retrieved Documents Actively)
**Architecture:** Modular plug-in framework utilizing off-the-shelf dense retrievers (e.g., Contriever) and an iterative "Active Selector" policy network. Unlike static one-shot filters, it sequentially selects documents based on the evolvi...
|
03-14 11:26 | Success | - | |
|
exp_2510.09156v1_20260314_112516
|
Agentic-KGR: Co-evolutionary Knowledge Graph Construction through Multi-Agent Reinforcement Learning
**Architecture:** A multi-agent reinforcement learning (RL) framework designed to co-evolve LLMs with Knowledge Graphs (KGs), specifically integrating with **GraphRAG**. **Retrieval & Context:** * **Architecture:** GraphRAG. * **Indexing:**...
|
03-14 11:25 | Success | - | |
|
exp_pytrain.20260314112300.025_20260314_112327
|
Type-Safe Dynamic ZipApp Packager
This benchmark evaluates a system's ability to programmatically construct a Python application, perform static type checking to enforce interface compliance using `typing.Protocol`, and package the result into a standalone executable ZipApp...
|
03-14 11:23 | Success | - | |
|
exp_2511.09883v1_20260314_112139
|
HCC-3D: Hierarchical Compensatory Compression for 98% 3D Token Reduction in Vision-Language Models
**Architecture:** HCC-3D solves the 3D-VLM context bottleneck where dense point-cloud tokens overwhelm the LLM. It utilizes a two-stage compressor preceding the LLM: Global Structure Compression (GSC), which employs learnable queries to agg...
|
03-14 11:21 | Success | - | |
|
exp_2601.02365v1_20260314_112051
|
FUSE: Failure-aware Usage of Subagent Evidence
**Architecture:** FUSE replaces raw image prompting with a **Grounded Design Representation (GDR)**, a compact JSON schema encoding canvas elements, styles, and structure. It utilizes a **subagent architecture** where tasks are routed to sp...
|
03-14 11:20 | Success | - | |
|
exp_2511.14582v1_20260314_112006
|
OmniZip: Audio-Guided Dynamic Token Compression Benchmark
**Architecture:** OmniZip is a training-free middleware framework for Omnimodal LLMs. It optimizes inference by using audio modality as an anchor to guide video token compression. The architecture calculates "audio retention scores" to ident...
|
03-14 11:20 | Success | - | |
|
exp_2511.19718v1_20260314_111913
|
Benchmark: Structural Reparameterization for Efficient Vision Transformers
**Architecture:** Proposes a structural reparameterization technique that trains parallel multi-branch ViT blocks (spanning FFN and MHSA) which are mathematically consolidated into a single-path architecture for deployment. **Memory & Speed...
|
03-14 11:19 | Success | - | |
|
exp_pytrain.20260314111631.024_20260314_111702
|
Python Skill Fallback
Title: Generic Pipeline CLI Engine - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-14 11:17 | Success | - | |
|
exp_2504.03165v3_20260314_111446
|
Benchmark for EDC2-RAG: Efficient Dynamic Clustering for RAG
**Architecture:** EDC2-RAG is a post-retrieval optimization layer. It utilizes dynamic clustering (grouping retrieved chunks by semantic similarity) to identify and remove redundancy and noise before sending context to the LLM. **Retrieval...
|
03-14 11:14 | Success | - | |
|
exp_2505.07861v3_20260314_111333
|
Benchmark: Caprese - Scalable LLM Reasoning Acceleration
**Paper:** Scalable LLM Reasoning Acceleration with Low-rank Distillation (Caprese) **Architecture:** Proposes low-rank distillation applied to feedforward (FFN) layers to recover math reasoning capabilities lost during quantization or prun...
|
03-14 11:13 | Success | - | |
|
exp_2505.13506v1_20260314_111220
|
EcoSafeRAG: Efficient Security through Context Analysis in Retrieval-Augmented Generation
**Architecture:** A plug-and-play security module using "bait-guided" context diversity detection and sentence-level processing to filter corpus poisoning without relying on LLM internal knowledge. **Retrieval Strategy:** Functions as a pos...
|
03-14 11:12 | Success | - | |
|
exp_pytrain.20260314111001.023_20260314_111029
|
Benchmark: Asynchronous Plugin Loader with Strict Protocol Enforcement
**Overview:** This benchmark tests the ability to construct a robust, in-memory plugin architecture using Python's standard library. It combines `typing.Protocol` for strict interface definition and `asyncio` for concurrent execution to simulate...
|
03-14 11:10 | Success | - | |
|
exp_2505.21334v3_20260314_110619
|
HoliTom: Holistic Token Merging Benchmark
**Architecture:** HoliTom introduces a training-free, dual-stage framework combining "Outer-LLM" and "Inner-LLM" token merging. 1. **Outer-LLM:** Performs global redundancy-aware temporal segmentation and spatio-temporal merging to handle l...
|
03-14 11:08 | Success | - | |
|
exp_2506.12723v3_20260314_110527
|
SP-VLA: A Joint Model Scheduling and Token Pruning Approach for VLA Model Acceleration
**Architecture:** SP-VLA accelerates Vision-Language-Action models through joint model scheduling and token pruning. It introduces a dynamic scheduler that classifies actions as "deliberative" (requiring full VLA) or "intuitive" (offloaded...
|
03-14 11:05 | Success | - | |
|
exp_2407.20485v2_20260314_110431
|
A2SF: Accumulative Attention Scoring with Forgetting Factor
**Architecture:** A2SF refines KV cache eviction logic in decoder-only models. It addresses the bias inherent in causal masking (where older tokens accumulate artificially high attention scores) by introducing a "Forgetting Factor" ($\gamma...
|
03-14 11:04 | Success | - | |
|
exp_2401.07469v1_20260314_110319
|
SUReID Benchmark
**Architecture:** SUReID utilizes a Vision Transformer backbone featuring **Hierarchical Token Sparsification (HTS)**. HTS dynamically prunes redundant and occluded tokens prior to the self-attention layer, effectively streamlining feature...
|
03-14 11:03 | Success | - | |
|
exp_pytrain.20260314110052.022_20260314_110125
|
Python Skill Fallback
Title: PEP 695 Generic Result Monad Implementation - Focus: Typing, Packaging - Note: Generated fallback due to unavailable model output.
|
03-14 11:01 | Success | - | |
|
exp_2510.18866v4_20260314_104850
|
LightMem Benchmark
**Architecture:** LightMem implements a three-stage memory pipeline inspired by human cognition: **Sensory Memory** (rapid filtering and topic-based compression), **Short-Term Memory** (topic-aware consolidation and summarization), and **Lo...
|
03-14 10:58 | Success | - | |
|
exp_2511.12428v1_20260314_104755
|
RedVTP: Training-Free Acceleration of Diffusion Vision-Language Models Inference via Masked Token-Guided Visual Token Pr...
**Architecture:** RedVTP targets Diffusion Vision-Language Models (DVLMs) like LLaDA-V and LaViDa. It introduces a training-free, response-driven strategy to prune redundant visual tokens during parallel decoding. **Memory Footprint:** Sign...
|
03-14 10:47 | Success | - | |
|
exp_2503.23455v1_20260314_104702
|
Efficient Token Compression for Vision Transformer with Spatial Information Preserved
**Architecture:** Introduces "Prune and Merge," a layer-wise compression module for Vision Transformers (ViTs). It integrates trainable merge and reconstruct matrices with shortcut connections to aggregate spatial information while discardi...
|
03-14 10:47 | Success | - | |
|
exp_2506.05096v4_20260314_104557
|
Astraea: Token-wise Acceleration Benchmark
**Architecture:** Introduces a plug-in acceleration framework for Video Diffusion Transformers (vDiTs) centered on a lightweight token selection mechanism and a memory-efficient, GPU-compatible sparse attention strategy. **Optimization Stra...
|
03-14 10:46 | Success | - | |
|
exp_pytrain.20260314104338.021_20260314_104404
|
Self-Validating Entry-Point Loader Benchmark
**Overview:** This benchmark tests a developer's ability to construct a robust runtime plugin loader using Python's standard-library `typing` and `importlib` modules. It simulates a micro-kernel architecture where functionality is discovered dynamical...
|
03-14 10:44 | Success | - | |
|
exp_2406.20092v2_20260314_104132
|
Visual Context Compression Benchmark
**Architecture:** Proposes a **Visual Context Compressor** to prune redundant visual tokens. This is integrated using **LLaVolta**, a staged training scheme that progressively increases compression (heavy to light) to maintain visual semant...
|
03-14 10:41 | Success | - | |
|
exp_2409.11182v1_20260314_104038
|
Video Token Sparsification (VTS) Benchmark
**Architecture:** VTS integrates a lightweight CNN-based proposal network to preprocess video inputs. It adaptively selects key frames and prunes redundant visual tokens to minimize the context window passed to the multimodal LLM. **Memory...
|
03-14 10:40 | Success | - | |
|
exp_2510.19183v1_20260314_103920
|
PruneHal: Multi-modal LLM Hallucination Mitigation Benchmark
**Architecture:** PruneHal targets multimodal LLMs (MLLMs) by introducing adaptive KV cache pruning specifically for visual tokens. It identifies that redundant visual tokens dilute attention, causing hallucinations. The architecture dynami...
|
03-14 10:39 | Success | - | |
|
exp_pytrain.20260314103540.020_20260314_103631
|
Dynamic Kernel Dispatcher Benchmark
This benchmark implements a robust, type-safe kernel registration and dispatch system. It mimics the architecture of high-performance libraries (like PyTorch or FlashAttention) where specific computational kernels are dynamically registered...
|
03-14 10:36 | Success | - | |
|
exp_2510.20797v1_20260314_103321
|
Simple Context Compression: Mean-Pooling and Multi-Ratio Training
**Architecture:** Proposes a **Mean-Pooling** compressor for **soft context compression** within RAG pipelines. This replaces the heavier "compression-tokens" architecture by averaging embeddings. It employs **multi-ratio training**, enabli...
|
03-14 10:33 | Success | - | |
|
exp_2511.08003v2_20260314_103218
|
SharpV Benchmark
**SharpV Summary for ARES 8GB Roadmap** **Architecture:** SharpV introduces a two-stage pruning framework to mitigate VideoLLM quadratic complexity. It first performs spatial-temporal adaptive token pruning (removing redundant frames/patche...
|
03-14 10:32 | Success | - | |
|
exp_2511.17129v2_20260314_103119
|
Benchmark: LLM2Comp Context Compression Efficiency
**Architecture:** LLM2Comp adapts causal LLMs via a **context compression pretext task**. The model splits into a Compressor and a Predictor, learning to generate fixed-size **"memory tokens"** that represent the full context for sequence p...
|
03-14 10:31 | Success | - | |
|
exp_pytrain.20260314102808.019_20260314_102832
|
Type-Guarded Plugin Loader with Semantic Versioning
**Overview:** This benchmark tests the ability to construct a robust, type-safe plugin system using only the Python standard library. It simulates an environment where "Backend" models must be loaded dynamically based on strict interface complia...
|
03-14 10:28 | Success | - | |
|
exp_2511.18832v1_20260314_101648
|
This benchmark evaluates the performance impact of the "Concept than Document" context compression strategy.
**Architecture:** Unsupervised **AMR (Abstract Meaning Representation)** graph compression framework. **RAG Details:** * **Retrieval Strategy:** Post-retrieval semantic filtering. It parses retrieved documents into AMR graphs to extract sem...
|
03-14 10:26 | Success | - | |
|
exp_2512.04550v1_20260314_101550
|
AdmTree: Context Compression Benchmark
**Architecture:** AdmTree implements a semantic binary tree for hierarchical context compression. Input is dynamically segmented based on information density, with variable-length segments converted into "gist tokens" at leaf nodes. A lightw...
|
03-14 10:15 | Success | - | |
|
exp_pytrain.20260314101323.018_20260314_101402
|
Strictly-Typed Event Dispatcher with Protocol Constraints
This benchmark tests your ability to design a robust, type-safe event system using Python's advanced type hinting features (`Protocol`, `Generic`, `TypeVar`). The goal is to create a generic `EventBus` that enforces structural subtyping (du...
|
03-14 10:14 | Success | - | |
|
exp_2512.13956v2_20260314_101048
|
Benchmark: AOI vs. Standard LLM Agent
**Architecture:** AOI proposes a multi-agent framework integrating three specialized agents with an LLM-based **Context Compressor**. It features a three-layer memory hierarchy (Working, Episodic, Semantic) and a dynamic task scheduler for...
|
03-14 10:10 | Success | - | |
|
exp_2505.18458v3_20260314_100946
|
LLM x DATA: KV-Cache Management Benchmark
**Paper:** A Survey of LLM $\times$ DATA **Architecture & Feasibility:** This is a broad survey (DATA4LLM) proposing a paradigm where inference is treated as a data-serving problem. It does not introduce a specific model architecture but re...
|
03-14 10:09 | Success | - | |
|
exp_2406.19251v1_20260314_100753
|
AutoRAG-HP: Automatic Online Hyper-Parameter Tuning for Retrieval-Augmented Generation
**Architecture:** AutoRAG-HP implements a two-level Hierarchical Multi-Armed Bandit (Hier-MAB) to automate RAG hyperparameter tuning online. **RAG Specifics:** Optimizes dense retrieval pipelines by dynamically adjusting *top-k* document co...
|
03-14 10:07 | Success | - | |
|
exp_pytrain.20260314100532.017_20260314_100603
|
Type-Safe Dynamic Plugin Registry
This coding drill demonstrates how to architect a modular, type-safe application by programmatically generating Python packages and enforcing runtime interface contracts. **Overview:** The `benchmark.py` script performs the following complex ope...
|
03-14 10:06 | Success | - | |
|
exp_2510.12856v1_20260314_100322
|
Efficient Adaptive Transformer (EAT) Benchmark
**Architecture:** EAT integrates progressive token pruning, sparse attention, and dynamic early exiting into a unified 6-layer encoder (DistilBERT-based) designed for input-adaptive computation. **Memory Footprint:** While token pruning and...
|
03-14 10:03 | Success | - | |
|
exp_2510.17197v1_20260314_100206
|
ZSPAPrune: Zero-Shot Prompt-Aware Token Pruning for Vision-Language Models
**Architecture:** ZSPAPrune introduces a zero-shot, hierarchical token pruning strategy for Vision-Language Models (VLMs). It operates in two stages: 1. **Prompt-Guided Selection:** Identifies visual tokens with high attentional relevance to...
|
03-14 10:02 | Success | - | |
|
exp_2510.18234v1_20260314_100057
|
DeepSeek-OCR: Optical Compression Benchmark
**Architecture:** Hybrid system utilizing `DeepEncoder` (compression engine) and a `DeepSeek3B-MoE-A570M` decoder. It maps dense text and high-resolution images into "optical 2D maps" represented as sparse vision tokens. **Memory Footprint:...
|
03-14 10:01 | Success | - | |
|
exp_pytrain.20260314095803.016_20260314_095838
|
Structural Subtyping and Dynamic Module Discovery
This benchmark tests the implementation of a flexible plugin architecture using Python's `typing.Protocol` for structural subtyping and runtime discovery mechanisms. **Objective:** Create a single-file Python script that: 1. **Defines a Protocol...
|
03-14 09:58 | Success | - | |
|
exp_2511.02650v2_20260314_095618
|
Can Visual Input Be Compressed? A Visual Token Compression Benchmark for Large Multimodal Models
**Architecture:** Introduces **UniPruneBench**, a standardized benchmark for evaluating **visual token pruning** (and merging) strategies in LMMs (LLaVA, InternVL, Qwen2.5-VL). **Memory Footprint:** Focuses on reducing the massive token seq...
|
03-14 09:56 | Success | - | |
|
exp_2511.11139v2_20260314_095508
|
Speech-Aware Long Context Pruning and Integration for Contextualized Automatic Speech Recognition
**Architecture:** The paper proposes **SAP$^{2}$**, a dual-stage framework utilizing **Speech-Driven Attention-based Pooling (SDAP)**. This module dynamically compresses long textual context (e.g., presentation slides) into dense embeddings...
|
03-14 09:55 | Success | - | |
|
exp_2505.20698v1_20260314_095359
|
Sparsified State-Space Models (Simba) Benchmark
**Architecture:** Simba proposes a sparsified Mamba (SSM) architecture using hierarchical token pruning. It retains dense processing in lower layers to capture local features while aggressively pruning tokens in upper layers to establish "h...
|
03-14 09:54 | Success | - | |
|
exp_pytrain.20260314095037.015_20260314_095100
|
Strictly Typed Generic Result Container Module Benchmark
This benchmark tests the creation and usage of a strictly typed `Result[T, E]` monad container. It enforces proper encapsulation using `__all__`, utilizes `typing.Generic` and `dataclasses`, and validates the contract safety provided by PEP...
|
03-14 09:51 | Success | - | |
|
exp_2506.11886v1_20260314_094858
|
Beyond Homogeneous Attention: Memory-Efficient LLMs via Fourier-Approximated KV Cache
**Architecture:** FourierAttention is a training-free framework optimizing the KV cache by exploiting the heterogeneous roles of attention heads. It maintains local context in lower dimensions while compressing long-range dependencies in upp...
|
03-14 09:49 | Success | - | |
|
exp_2506.13166v1_20260314_094730
|
GreedyPrune: Retenting Critical Visual Token Set for Large Vision Language Models
**Architecture:** GreedyPrune is a training-free, plug-and-play visual token pruning module. It formalizes token selection as a combinatorial optimization problem, utilizing a greedy algorithm to jointly maximize semantic saliency (importanc...
|
03-14 09:47 | Success | - | |
|
exp_2407.12077v1_20260314_094618
|
GoldFinch: High Performance RWKV/Transformer Hybrid with Linear Pre-Fill and Extreme KV-Cache Compression
**Architecture:** GoldFinch is a hybrid stacking an enhanced RWKV-6 ("Finch") base with a novel "GOLD" Transformer top. It combines RNN recurrence with linear attention mechanisms to balance efficient state management with high-performance...
|
03-14 09:46 | Success | - | |
|
exp_pytrain.20260314094329.014_20260314_094405
|
Python Skill Fallback
Title: Strictly Typed Package Scaffolder - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-14 09:44 | Success | - | |
|
exp_2403.08312v3_20260314_094055
|
StreamingDialogue: Prolonged Dialogue Learning via Long Context Compression with Minimal Losses
**Architecture:** StreamingDialogue compresses long dialogue histories into "conversational attention sinks" located at End-of-Utterance (EoU) tokens. It replaces dense full-context attention with a compressed representation, utilizing Shor...
|
03-14 09:40 | Success | - | |
|
exp_2510.07293v1_20260314_093915
|
AudioMarathon: A Comprehensive Benchmark for Long-Context Audio Understanding and Efficiency in Audio LLMs
**AudioMarathon Benchmark Analysis** * **Architecture & Scope:** This is a benchmark paper evaluating Large Audio Language Models (LALMs) on long-form audio (90s–300s). It exposes the limitations of standard Transformer attention ($O(N^2)$)...
|
03-14 09:39 | Success | - | |
|
exp_2401.03462v3_20260314_093830
|
Long Context Compression with Activation Beacon
**Architecture:** Introduces a "plug-in" module that directly compresses Keys and Values (KV) activations at every transformer layer. Unlike soft prompt methods, it uses a progressive, fine-grained workflow where compression is trained via...
|
03-14 09:38 | Success | - | |
|
exp_pytrain.20260314093515.013_20260314_093615
|
Strict-Typed Kernel API Design Benchmark
**Objective:** This benchmark validates the implementation of a robust, strictly typed kernel API design using Python's type hinting system (`typing.Protocol`, `typing.Generic`, `typing.TypeVar`) and module encapsulation (`__all__`). Design Brie...
|
03-14 09:36 | Success | - | |
|
exp_2510.18269v1_20260314_093331
|
StreamingTOM: Streaming Token Compression for Efficient Video Understanding
**Architecture:** StreamingTOM is a training-free, two-stage framework for streaming video LLMs. It decouples efficiency into: 1. **Causal Temporal Reduction (Pre-LLM):** Enforces a fixed visual budget per frame by selecting tokens based on...
|
03-14 09:33 | Success | - | |
|
exp_2510.22101v1_20260314_093212
|
Efficient SLM Semantic Search Benchmark
**Summary for ARES 8GB Roadmap** * **Architecture:** Decoder-only SLM tailored for semantic search. * **Memory Footprint:** Structural pruning reduces model size by 40%, while context compression techniques decrease input sequence length by...
|
03-14 09:32 | Success | - | |
|
exp_2504.04514v2_20260314_093123
|
Saliency-driven Dynamic Token Pruning for Large Language Models
**Architecture:** SDTP integrates a lightweight saliency-driven prediction module into LLM layers to estimate token importance via hidden states. It employs hierarchical pruning to dynamically discard redundant tokens layer-by-layer. **Memo...
|
03-14 09:31 | Success | - | |
|
exp_pytrain.20260314092842.012_20260314_092911
|
Generic Plugin Registry with Dynamic Module Loading
This benchmark evaluates an implementation of a robust, type-safe plugin architecture using Python's `typing` module and standard library introspection tools. **Overview:** The script implements a `PluginRegistry` generic class capable of storin...
|
03-14 09:29 | Success | - | |
|
exp_2506.02850v2_20260314_092706
|
METok: Multi-Stage Event-based Token Compression Benchmark
**Architecture:** METok is a training-free, three-stage token compression pipeline for Video LLMs: 1. **Event-aware Compression:** Reduces redundancy during vision encoding. 2. **Hierarchical Pruning:** Filters tokens during the prefill stag...
|
03-14 09:27 | Success | - | |
|
exp_2506.05167v2_20260314_092611
|
ECoRAG: Evidentiality-guided Compression Benchmark
**Architecture:** ECoRAG proposes an **iterative retrieval** framework. It utilizes an **evidentiality-guided compression** module that functions as a semantic filter/reranker, processing retrieved chunks to retain only information strictly...
|
03-14 09:26 | Success | - | |
|
exp_2506.11092v2_20260314_092516
|
Dynamic Context Tuning for Retrieval-Augmented Generation: Enhancing Multi-Turn Planning and Tool Adaptation
**Architecture:** DCT is a lightweight RAG wrapper featuring an attention-based context cache and a LoRA-based retrieval router to handle dynamic tools and multi-turn history. **Retrieval & Context:** * **Retrieval Architecture:** Uses LoRA...
|
03-14 09:25 | Success | - | |
|
exp_2407.09252v3_20260314_092410
|
Context Embeddings for Efficient Answer Generation in RAG
**Architecture:** COCOM proposes a compression module that encodes retrieved documents into a fixed set of Context Embeddings, bypassing the processing of long text sequences during decoding. **RAG Specifics:** * **Retrieval Strategy:** Ope...
|
03-14 09:24 | Success | - | |
|
exp_pytrain.20260314092059.011_20260314_092203
|
AST-Based Type Compliance Checker Benchmark
This benchmark defines a task for an autonomous coding agent to create a static analysis tool named `pkg_typing_guard.py`. The tool must recursively scan a given directory, identify valid Python packages (directories containing `__init__.py...
|
03-14 09:22 | Success | - | |
|
exp_2408.05933v1_20260314_092024
|
Optimizing RAG Techniques for Automotive Industry PDF Chatbots: A Case Study with Locally Deployed Ollama Models
**Architecture & Feasibility:** This paper proposes a **Self-RAG agent** architecture using **LangGraph** and **Ollama**, designed for local, low-resource environments. It is highly feasible for **8GB VRAM** roadmaps, leveraging Ollama’s qu...
|
03-14 09:20 | Success | - | |
|
exp_2409.10593v3_20260314_091855
|
CSKV: Training-Efficient Channel Shrinking for KV Cache in Long-Context Scenarios
**Architecture:** CSKV targets KV cache redundancy via channel-level low-rank decomposition on Key/Value projection layers. It utilizes a hybrid "bi-branch" cache: a sliding window preserves full-precision local context, while the global hi...
|
03-14 09:18 | Success | - | |
|
exp_2512.00504v1_20260314_091749
|
G-KV: Decoding-Time KV Cache Eviction with Global Attention
**Architecture:** G-KV introduces a decoding-time KV eviction mechanism utilizing a global scoring function. It combines local attention patterns with historical importance metrics to accurately identify and prune redundant tokens. To count...
|
03-14 09:17 | Success | - | |
|
exp_2512.11920v1_20260314_091636
|
CXL-SpecKV: A Disaggregated FPGA Speculative KV-Cache for Datacenter LLM Serving
**CXL-SpecKV** targets the memory bandwidth bottleneck of LLM serving by disaggregating Key-Value (KV) caches from GPU VRAM. * **Architecture:** Uses Compute Express Link (CXL) to offload KV storage to remote FPGA memory, decoupling memory...
|
03-14 09:16 | Success | - | |
|
exp_pytrain.20260314091415.010_20260314_091438
|
Python Skill Fallback
Title: AsyncIO Data Pipeline with Strict Typing and Module Structure - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-14 09:14 | Success | - | |
|
exp_2503.23367v3_20260314_091258
|
FastVAR: Linear Visual Autoregressive Modeling via Cached Token Pruning
**Architecture:** FastVAR is a post-training acceleration framework for Visual Autoregressive (VAR) models. It introduces a "cached token pruning" strategy that identifies converged tokens during the final (large-scale) generation step. Ins...
|
03-14 09:13 | Success | - | |
|
exp_2504.00557v1_20260314_091155
|
README: Efficient LLaMA-3.2-Vision Benchmark
**Architecture** Targets cross-attention-based LVLMs (specifically LLaMA-3.2-Vision). Unlike prior methods focused on self-attention, this approach exploits sparsity in cross-attention maps to identify and prune redundant visual features di...
|
03-14 09:11 | Success | - | |
|
exp_2505.15394v1_20260314_091057
|
Reranking with Compressed Document Representation
**Architecture & RAG:** Proposes a pipeline utilizing a first-stage retriever, a document compressor, and a distilled 1B-parameter reranker. Instead of processing raw text, the reranker consumes fixed-size embedding representations of docum...
|
03-14 09:11 | Success | - | |
|
exp_pytrain.20260314090712.009_20260314_090812
|
Robust Distribution Metadata Inspector
A Python CLI tool and coding drill benchmark designed to introspect environment packaging metadata using the standard library. This tool enforces strict type safety and gracefully handles missing or corrupt package data. **Features** * **Zero D...
|
03-14 09:08 | Success | - | |
|
exp_2407.01527v2_20260314_085525
|
KV Cache Compression Benchmark
**Paper:** KV Cache Compression, But What Must We Give in Return? A Comprehensive Benchmark of Long Context Capable Approaches **Summary for ARES 8GB Roadmap:** This study provides a critical benchmark for long-context inference strategies,...
|
03-14 09:05 | Success | - | |
|
exp_2403.12968v2_20260314_085423
|
This benchmark evaluates the efficiency of the LLMLingua-2 methodology, which employs a small Transformer encoder (simul...
**Architecture:** Replaces unidirectional entropy-based models (LLaMA-7B) with a bidirectional **Transformer Encoder** (e.g., XLM-RoBERTa-large). Formulates compression as a **token classification** problem, using data distillation to train...
|
03-14 08:54 | Success | - | |
|
exp_2307.06945v4_20260314_085314
|
ICAE Efficiency Benchmark
**Architecture:** Introduces the In-context Autoencoder (ICAE), a lightweight wrapper (~1% parameter overhead) for Llama models. It utilizes a two-stage training pipeline (autoencoding + instruction tuning) to compress long contexts into de...
|
03-14 08:53 | Success | - | |
|
exp_pytrain.20260314084951.008_20260314_085030
|
Python Skill Fallback
Title: Metadata-Aware Typed Plugin Loader - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-14 08:50 | Success | - | |
|
exp_2308.14508v2_20260314_083802
|
LongBench: Long Context Understanding Benchmark
**Paper Type:** Benchmark / Evaluation Study. **Relevance to ARES 8GB:** LongBench standardizes evaluation for long-context understanding across 21 datasets (avg. length 6,711 words). While it proposes no new architecture, it offers critica...
|
03-14 08:48 | Success | - | |
|
exp_2510.13799v1_20260314_083655
|
BRIEF-Pro: Universal Context Compression with Short-to-Long Synthesis for Fast and Accurate Multi-Hop Reasoning
**Architecture:** BRIEF-Pro is a lightweight, universal compressor model utilizing "short-to-long synthesis" to perform abstractive summarization of retrieved documents, specifically trained to handle contexts exceeding 10k words. **RAG Imp...
|
03-14 08:36 | Success | - | |
|
exp_2510.20535v1_20260314_083608
|
Benchmark: ARC-Encoder Efficiency Simulation
**Architecture:** ARC-Encoder is a standalone compression model mapping $N$ text tokens to $N/x$ continuous vectors ($x \in \{4, 8\}$). These vectors replace standard token embeddings at the input layer of a frozen decoder LLM. **Memory Foo...
|
03-14 08:36 | Success | - | |
|
exp_2512.12701v1_20260314_083506
|
Efficient Vision-Language Reasoning via Adaptive Token Pruning
**Architecture** ATP introduces a lightweight gating module at the vision-language interface. It dynamically prunes visual tokens by ranking them via a hybrid importance score (combining ViT intra-modal attention and CLIP text-image similar...
|
03-14 08:35 | Success | - | |
|
exp_pytrain.20260314083214.007_20260314_083251
|
Dynamic Plugin Registry Benchmark
This benchmark evaluates an autonomous agent's ability to construct a robust, extensible plugin system using the Python standard library. It specifically targets the combination of `typing.Protocol` for Structural Subtyping (Duck Typing wit...
|
03-14 08:32 | Success | - | |
|
exp_2505.23277v2_20260314_083034
|
Sentinel: Decoding Context Utilization via Attention Probing for Efficient LLM Context Compression
**Sentinel** optimizes RAG inference by treating context compression as an **attention-decoding task**. * **Architecture:** Uses a lightweight **0.5B proxy model** with a trained "readout" module to probe the frozen target LLM's attention p...
|
03-14 08:30 | Success | - | |
|
exp_2407.08454v2_20260314_082827
|
Benchmark for Adaptive KV Cache Merging (KVMerger)
**Paper:** *Model Tells You Where to Merge (KVMerger)* * **Architecture:** KVMerger optimizes the Transformer attention mechanism by compressing the KV cache. It utilizes a **Merging Set Identification** algorithm to group tokens based on i...
|
03-14 08:28 | Success | - | |
|
exp_2409.01579v1_20260314_082729
|
AdaComp: Adaptive Context Compression Benchmark
**Architecture:** AdaComp augments standard Dense Retrieval pipelines (Retriever $\to$ LLM) with a lightweight **rate predictor**. This small auxiliary model (typically a distilled BERT or MLP) performs extractive compression, filtering the...
|
03-14 08:27 | Success | - | |
|
exp_pytrain.20260314082428.006_20260314_082520
|
Python Skill Fallback
Title: Generic Pipeline Engine with Dynamic Virtual Packaging - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-14 08:25 | Success | - | |
|
exp_2510.16439v4_20260314_081227
|
FrugalPrompt Benchmark
**Architecture:** FrugalPrompt is a prompt compression framework using token attribution methods (specifically GlobEnc and DecompX). It operates as a preprocessing layer that scores input tokens for semantic salience and retains only the to...
|
03-14 08:22 | Success | - | |
|
exp_pytrain.20260314080948.005_20260314_081032
|
StrictTypeRegistry: Protocol-Based Plugin System
**Overview** This benchmark evaluates the implementation of a robust, structural subtyping-based plugin manager using Python's standard `typing.Protocol`. The goal is to enforce strict interface adherence without relying on external meta-pr...
|
03-14 08:10 | Success | - | |
|
exp_2511.13223v1_20260314_080802
|
This benchmark is designed to simulate the inference stage of a Reasoning LLM. It compares the computational cost (VRAM...
**TokenSqueeze** optimizes reasoning LLMs (e.g., DeepSeek-R1) by training them to generate concise Chain-of-Thought (CoT) traces, addressing the high memory and latency costs of long reasoning sequences. * **Architecture:** A two-stage trai...
|
03-14 08:08 | Success | - | |
|
exp_2511.17885v1_20260314_080701
|
FastMMoE: Accelerating Multimodal LLMs Benchmark
**Architecture:** FastMMoE is a training-free accelerator for MoE-based Multimodal LLMs (e.g., DeepSeek-VL2). It optimizes inference through **Routing-Aware Token Pruning**, which clusters and removes visual tokens sharing high routing prob...
|
03-14 08:07 | Success | - | |
|
exp_2409.00855v1_20260314_080552
|
LanguaShrink: Reducing Token Overhead with Psycholinguistics
**Architecture:** LanguaShrink proposes a task-agnostic compression framework utilizing psycholinguistic principles (the Ebbinghaus memory curve) and Part-of-Speech (POS) tagging to score token importance. It employs a chunk-based algorithm...
|
03-14 08:05 | Success | - | |
|
exp_pytrain.20260314080234.004_20260314_080310
|
Strictly-Typed Pipeline with Namespace Hygiene
This benchmark evaluates a candidate's ability to construct a robust, modular data processing pipeline using advanced Python type hinting features and strict namespace controls. **Objectives** 1. **Type Safety**: Define strict `Protocol` interf...
|
03-14 08:03 | Success | - | |
|
exp_2510.10448v1_20260314_080101
|
RECON: Reasoning with Condensation for Efficient Retrieval-Augmented Generation
**Architecture & Retrieval:** RECON modifies the standard RAG pipeline by inserting a **learned condenser module** between retrieval and generation. Utilizing the *Search-R1* framework, it employs a distillation-trained summarizer to compre...
|
03-14 08:01 | Success | - | |
|
exp_2511.06029v3_20260314_075936
|
This benchmark evaluates the **Lethe** framework, focusing on its Layer- and Time-Adaptive KV Cache Pruning for LLMs. It...
**Architecture:** Lethe introduces a dynamic KV cache management framework with two distinct dimensions of adaptivity: 1. **Spatial (Layer-wise):** Allocates token pruning budgets individually per layer based on estimated attention redundan...
|
03-14 08:00 | Success | - | |
|
exp_2511.12869v2_20260314_075848
|
On the Fundamental Limits of LLMs at Scale
**Architecture & Memory:** This paper provides a theoretical proof that LLM scaling is fundamentally bounded by computability and information theory. It characterizes "context compression" as a geometric limit, proving that effective contex...
|
03-14 07:58 | Success | - | |
|
exp_pytrain.20260314075558.003_20260314_075635
|
Generic Plugin Loader with Runtime Type Enforcement
This benchmark demonstrates a robust, modular architecture for discovering and loading Python plugins dynamically at runtime. It leverages `importlib` for filesystem-based discovery and `typing.Protocol` for structural subtyping (duck typin...
|
03-14 07:56 | Success | - | |
|
exp_2511.18936v1_20260314_075430
|
SWAN: Sparse Winnowed Attention for Reduced Inference Memory via Decompression-Free KV-Cache Compression
**Architecture:** SWAN introduces a fine-tuning-free framework utilizing an offline orthogonal matrix to rotate and prune the KV-cache. It augments this sparse data with a small, fixed-size dense buffer to maintain retrieval accuracy. **Mem...
|
03-14 07:54 | Success | - | |
|
exp_2505.08261v1_20260314_075317
|
Enhancing Cache-Augmented Generation (CAG) with Adaptive Contextual Compression
**Architecture:** Proposes a **Hybrid CAG-RAG Framework** utilizing **Adaptive Contextual Compression (ACC)**. The system preloads static knowledge into the context window (CAG) but activates **selective retrieval** for dynamic or missing i...
|
03-14 07:53 | Success | - | |
|
exp_2505.18092v2_20260314_075232
|
QwenLong-CPRS Benchmark Suite
**Architecture:** QwenLong-CPRS is a compression framework featuring **Bidirectional Reasoning Layers** and **Token Critics** (using LM heads) to perform dynamic, natural language-guided context pruning. It utilizes **Window-Parallel Infere...
|
03-14 07:52 | Success | - | |
|
exp_2407.21118v2_20260314_075147
|
Palu: Compressing KV-Cache with Low-Rank Projection
**Architecture:** Palu targets hidden-dimension redundancy by decomposing projection matrices into low-rank components. It caches compressed Key/Value states and reconstructs full tensors on-the-fly during attention. The framework utilizes...
|
03-14 07:51 | Success | - | |
|
exp_pytrain.20260314074858.002_20260314_074929
|
Python Skill Fallback
Title: Modern Generic Data Structures with PEP 695 - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-14 07:49 | Success | - | |
|
exp_2402.18096v1_20260314_074714
|
Benchmark: Mixed-Precision KV Cache (MiKV) Simulation
**Architecture:** MiKV proposes an importance-aware mixed-precision quantization scheme. Instead of discarding "unimportant" tokens, the architecture retains the full KV context but stores high-importance pairs in high precision (e.g., FP16...
|
03-14 07:47 | Success | - | |
|
exp_2505.23416v2_20260314_074555
|
KVzip Benchmark Suite
**Architecture:** KVzip is a query-agnostic eviction method that compresses KV caches based on a **reconstruction proxy**. It quantifies token importance by using the underlying LLM to reconstruct the original context from the KV cache; tok...
|
03-14 07:46 | Success | - | |
|
exp_2408.15491v1_20260314_074448
|
Instruction-Aware Contextual Compression Benchmark
**Architecture:** Introduces **Instruction-Aware Contextual Compression**, a lightweight filter module designed to sit between the retriever and the LLM. It uses the instruction prompt to identify and prune irrelevant segments from retrieve...
|
03-14 07:45 | Success | - | |
|
exp_cr_10.1145_3759441.3759448_20260314_074406
|
EMPIRIC: Exploring Missing Pieces in KV Cache Compression for Reducing Computation, Storage, and Latency in Long-Context...
**Architecture:** An oracle-based framework extending RocketKV, analyzing intrinsic attention head patterns to define theoretical bounds for optimal KV cache eviction. **Memory Footprint:** Significantly reduces VRAM usage by validating agg...
|
03-14 07:44 | Success | - | |
|
exp_pytrain.20260314074150.001_20260314_074239
|
Python Skill Fallback
Title: Strictly Typed Configuration Loader - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-14 07:42 | Success | - | |
|
exp_cr_10.1145_3759441.3759448_20260314_073316
|
Benchmark: EMPIRIC KV Cache Compression
**Architecture:** An oracle-based framework extending RocketKV, analyzing intrinsic attention head patterns to define theoretical bounds for optimal KV cache eviction. **Memory Footprint:** Significantly reduces VRAM usage by validating agg...
|
03-14 07:33 | Pending | - | |
|
exp_2506.08373v3_20260314_073210
|
Draft-based Approximate Inference for LLMs
**Architecture:** Introduces a draft-based framework using a small auxiliary model (e.g., 1-3B) to perform lookahead importance estimation for a larger target model. It proposes **SpecKV** (KV cache eviction), **SpecPC** (prompt token pruni...
|
03-14 07:32 | Success | - | |
|
exp_pytrain.20260314072923.005_20260314_072958
|
Typed Package Bootstrapper
**Overview** This benchmark evaluates a Python system's ability to synthesize a standard-compliant Python project structure. It rigorously validates metadata configuration using `typing.TypedDict` schemas before generating filesystem artifacts....
|
03-14 07:30 | Success | - | |
|
exp_pytrain.20260314065457.004_20260314_065525
|
Strictly-Typed Dynamic Package Loader Benchmark
**Objective** This benchmark evaluates an autonomous agent's ability to programmatically construct a valid Python package structure on the filesystem, utilize the `importlib` standard library for dynamic module loading, and enforce strict runti...
|
03-14 06:55 | Success | - | |
|
exp_pytrain.20260314064115.003_20260314_064149
|
Python Skill Fallback
Title: Strictly-Typed CLI Dispatcher with ParamSpec - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-14 06:41 | Success | - | |
|
exp_pytrain.20260314063413.002_20260314_063448
|
PEP 695 Generic Package Scaffolder
This coding drill benchmarks the developer experience and code robustness improvements offered by **PEP 695 Type Parameter Syntax** (introduced in Python 3.12). **Hypothesis** Adopting PEP 695 syntax (using square brackets for generics and `typ...
|
03-14 06:34 | Success | - | |
|
exp_pytrain.20260314062740.001_20260314_062802
|
Python Skill Fallback
Title: Robust Plugin Loader with Strict Type Safety - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-14 06:28 | Success | - | |
|
exp_2506.16636v1_20260313_105745
|
This benchmark evaluates the performance of Masked Autoregressive Flows (MAF) utilizing the Latent Noise Injection (LNI)...
**Architecture** The method relies on **Masked Autoregressive Flows (MAF)**. Rather than standard generative sampling, it proposes a "Latent Noise Injection" (LNI) technique: encoding specific observed data points into the latent space, app...
|
03-13 10:57 | Success | - | |
|
exp_pytrain.20260313105503.016_20260313_105531
|
Robust Dynamic Plugin Registry with importlib
**Overview** This drill demonstrates the construction of a modular, type-safe plugin loader using Python's standard library. It bridges the gap between dynamic runtime imports and static type checking by leveraging `typing.Protocol` for structu...
|
03-13 10:55 | Success | - | |
|
exp_2506.16584v1_20260313_105421
|
Benchmark: Semantic Stability on Constrained Hardware
**Architecture & Methodology** This paper does not propose a new model architecture. Instead, it introduces a **Variance Decomposition Framework**, an evaluation methodology designed to measure semantic grounding. It assesses whether an LLM...
|
03-13 10:54 | Success | - | |
|
exp_oa_W4412056540_20260313_105243
|
Backfill Candidate oa_W4412056540
This paper analyzes the shift to data-centric AI, identifying key bottlenecks for embedded and real-time systems relevant to the ARES 8GB roadmap. **Architecture & Memory:** The authors argue that while training faces data scarcity, inferen...
|
03-13 10:52 | Success | - | |
|
exp_hf_2603.09400_20260313_105158
|
Backfill Candidate hf_2603.09400
**Architecture:** StateFactory utilizes an LLM to transform unstructured observations into **factorized, hierarchical object-attribute structures**. Instead of discriminative training, it computes rewards as semantic similarity between the...
|
03-13 10:52 | Success | - | |
|
exp_2309.16859v1_20260313_105059
|
Benchmark: Identity-Conditioned HyperNeRF (Backfill Candidate 2309.16859v1)
**Architecture:** Utilizes an identity-conditioned hypernetwork to generate NeRF weights, learning a volumetric latent space of facial geometry and appearance from a low-res multi-view dataset. **Memory Footprint:** **High Risk.** While the...
|
03-13 10:51 | Success | - | |
|
exp_cr_10.1515_jiip-2022-0050_20260313_105015
|
Multi-Fidelity Bayesian Inference Benchmark
**Architecture** Proposes a multi-fidelity framework combining a low-fidelity Deep Neural Network (DNN) surrogate with a high-fidelity physical model for Bayesian inference on elastic properties. The DNN handles the bulk of the prior distri...
|
03-13 10:50 | Success | - | |
|
exp_pytrain.20260313104750.015_20260313_104834
|
Python Skill Fallback
Title: PEP 695 Generic API with Public Interface Control - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-13 10:48 | Success | - | |
|
exp_2403.18096v1_20260313_104620
|
Benchmark: Cascade Temporal Filtering (Backfill Candidate 2403.18096v1)
**Summary for ARES 8GB Roadmap** **Architecture:** The paper proposes a "cascade temporal filtering" method using dual-time dimensions (isochronal and chronological) to distinguish short- and long-term human activity. Crucially, it function...
|
03-13 10:46 | Success | - | |
|
exp_2409.14586v1_20260313_104534
|
Backfill Candidate 2409.14586v1
**Architecture:** Introduces a single `[RESET]` token to the vocabulary. Training (SFT/DPO) conditions the model to emit this token to abort unsafe continuations and restart generation, effectively adding a "self-correct" loop without struc...
|
03-13 10:45 | Success | - | |
|
exp_2409.14538v1_20260313_104439
|
Benchmark: HMDC (Heterogeneous Multi-model Dataset Condensation)
**Architecture:** HMDC proposes a framework for generating model-agnostic condensed datasets by utilizing multiple heterogeneous architectures simultaneously. To resolve conflicts between diverse models, it introduces a Gradient Balance Mod...
|
03-13 10:44 | Success | - | |
|
exp_oa_W4403322739_20260313_104306
|
This benchmark evaluates the inference performance (memory footprint and generation speed) of a standard Transformer-bas...
This survey evaluates generative LLM architectures (specifically GPT and Llama series) and their inference performance across diverse hardware platforms (CPU, GPU, FPGA, ASIC, PIM). * **Architecture:** Focuses on standard Transformer-based...
|
03-13 10:43 | Success | - | |
|
exp_pytrain.20260313104107.014_20260313_104135
|
Self-Validating Plugin Registry with Strict Typing
**Overview** This benchmark demonstrates the implementation of a type-safe, modular plugin architecture using Python's standard library. It leverages `typing.Protocol` for structural subtyping and `importlib` for dynamic runtime introspection a...
|
03-13 10:41 | Success | - | |
|
exp_cr_10.58414_scientifictemper.2025.16.2.03_20260313_103952
|
MRMGKTL Benchmark
**Analysis for ARES 8GB Roadmap** * **Architecture:** The MRMGKTL model combines a standard Transformer encoder with a Gaussian Kernel classifier. Crucially, it utilizes a pre-processing pipeline involving Sokal–Michener’s multivariate reli...
|
03-13 10:39 | Success | - | |
|
exp_2506.16594v2_20260313_103754
|
Benchmark: Efficient Local Biomedical Inference
This paper is a **scoping review**, not a technical architecture proposal. Consequently, it provides **no specific data** regarding model architecture, memory footprint, or inference speed required for the ARES 8GB roadmap. * **Architecture...
|
03-13 10:39 | Success | - | |
|
exp_2506.16575v1_20260313_103712
|
Benchmark for Elo-Based Harmful Content Detection Workflow
**Paper Summary: Elo Rating System for Harmful Content Detection** **Architecture:** The paper proposes an inference workflow utilizing an Elo rating system to rank and select optimal LLM responses for detecting harmful content (microaggres...
|
03-13 10:37 | Success | - | |
|
exp_pytrain.20260313103445.013_20260313_103506
|
Strictly Typed Backend Registry with Runtime Validation
This benchmark demonstrates a robust, pluggable architecture simulation using Python's `typing.Protocol` for structural subtyping. It implements a `KernelRegistry` that enforces strict type checking at registration time, ensuring that only...
|
03-13 10:35 | Success | - | |
|
exp_2506.16571v2_20260313_102932
|
Benchmark: Visualization Rationale Extraction
**Paper Analysis:** *Capturing Visualization Design Rationale* This paper introduces a methodology and dataset for extracting visualization design rationales from student notebooks, creating a corpus of Question-Answer-Rationale triples usi...
|
03-13 10:33 | Success | - | |
|
exp_pytrain.20260313102718.012_20260313_102745
|
Dynamic Type-Safe Component Loader
**Overview** This benchmark implements a robust, self-contained plugin architecture using Python's standard library. It demonstrates advanced use of `importlib` for dynamic module loading from arbitrary file paths and `typing.Protocol` for stru...
|
03-13 10:27 | Success | - | |
|
exp_cr_10.1038_s41698-025-01103-4_20260313_102301
|
LLM-AIx Pipeline Benchmark: Local Privacy-Preserving Extraction
**Summary: LLM-AIx Pipeline for Oncology** * **Architecture:** The paper outlines **LLM-AIx**, a software protocol acting as a wrapper for open-source, privacy-preserving LLMs. It is designed to extract structured clinical data (e.g., TNM s...
|
03-13 10:25 | Success | - | |
|
exp_2512.14954v1_20260313_102220
|
Backfill Candidate 2512.14954v1
**Summary for ARES 8GB Roadmap** **Architecture:** Proposes a probabilistic framework to align teacher and student probability spaces across distinct tokenizers. By exploiting the recursive structure of Byte-Pair Encoding (BPE), it enables...
|
03-13 10:22 | Success | - | |
|
exp_hf_2603.09221_20260313_102122
|
Test-Time Control (TTC) Layer Benchmark
**Architecture** The paper introduces the **Test-Time Control (TTC) layer**, an adapter that integrates finite-horizon LQR planning into pretrained LLMs. Instead of relying solely on associative recall, the architecture projects future late...
|
03-13 10:21 | Success | - | |
|
exp_hf_2603.08942_20260313_102018
|
Benchmark: BiCLIP (Geometric Domain Alignment)
**Architecture** BiCLIP functions as a lightweight wrapper for frozen Vision-Language Models (VLMs). It operates on the principle of "domain canonicalization," learning a structured geometric transformation matrix to align image-text featur...
|
03-13 10:20 | Success | - | |
|
exp_pytrain.20260313101806.011_20260313_101833
|
Dynamic Protocol Validator & Package Generator
This benchmark validates a candidate's ability to bridge static type definitions with dynamic code execution. It simulates a plugin system where Python code is generated on-the-fly, written to the filesystem, and loaded dynamically using `i...
|
03-13 10:18 | Success | - | |
|
exp_2303.10944v3_20260313_101631
|
Benchmark: Pix2SG Architecture Evaluation
**Architecture:** Pix2SG utilizes a **standard Transformer Encoder-Decoder** architecture. It treats Scene Graph Generation (SGG) as an autoregressive sequence-to-sequence task, converting image patches directly into a sequence of (subject,...
|
03-13 10:16 | Success | - | |
|
exp_2309.16175v1_20260313_101535
|
Backfill Candidate 2309.16175v1
**Summary for ARES 8GB Roadmap:** This paper details a **data-centric training pipeline** for biomedical QA (COVID-19), focusing on weak supervision and augmentation rather than inference architecture optimization. * **Architecture:** Stand...
|
03-13 10:15 | Success | - | |
|
exp_cr_10.60027_ijsasr.2025.7518_20260313_101450
|
Benchmark: Blended Learning Curriculum Simulation
**Assessment: Irrelevant to Inference Roadmap** This document is a **pedagogical study**, not a technical AI paper. It evaluates the efficacy of a blended learning curriculum for library science students at Zhoukou Normal Unive...
|
03-13 10:14 | Success | - | |
|
exp_2506.16593v1_20260313_101407
|
ARES 8GB Roadmap: Physical System Identification Benchmark
**Summary for ARES 8GB Roadmap** **Focus:** Physical System Identification & Uncertainty Quantification (Classical/Model-based, not Deep Learning). * **Architecture:** Proposes a lightweight mathematical "transfer function" linking velocity...
|
03-13 10:14 | Success | - | |
|
exp_pytrain.20260313101122.010_20260313_101156
|
Typed Asynchronous Plugin Architecture
**Overview** This benchmark demonstrates a robust, extensible plugin system using **Structural Subtyping (Protocol)** and **Asynchronous I/O (asyncio)**. **Features** * **Protocol Enforcement**: Uses `typing.Protocol` to define the `Plugin` interfa...
|
03-13 10:12 | Success | - | |
|
exp_2304.00320v1_20260313_095955
|
Benchmark: Backfill Candidate 2304.00320v1 (SGD as SDE)
**Architecture:** Theoretical analysis of training dynamics, not a network design. Proposes modeling SGD as a Stochastic Differential Equation (SDE) with two diffusion terms (mini-batch sampling and unbiased label noise). **Memory Footprint...
|
03-13 10:09 | Success | - | |
|
exp_2309.16849v2_20260313_095842
|
Benchmark: Shifted Non-Local Search (SNLS) vs. Standard Attention
**Architecture:** Proposes **Shifted Non-Local Search (SNLS)**, a hybrid space-time attention mechanism. It predicts global offsets for long-range motion and refines them via a corrective local grid search. This acts as a drop-in replacemen...
|
03-13 09:58 | Success | - | |
|
exp_pytrain.20260313095549.009_20260313_095627
|
Type-Safe Dynamic Extension Loader
This benchmark validates the hypothesis that Python's `typing.Protocol` combined with `importlib` can be used to create a robust, zero-dependency plugin architecture. **Objective** To design a runtime system that: 1. Defines a strict structural...
|
03-13 09:56 | Success | - | |
|
exp_2403.18148v1_20260313_094810
|
Benchmark Design: Feasibility of Local Empathic Models
**Paper Type:** Behavioral Evaluation (Not an architectural proposal). **Summary:** This study compares empathic response generation in existing LLMs (GPT-4 Turbo, Llama 2, Mistral) against human benchmarks. It does not introduce new archit...
|
03-13 09:53 | Success | - | |
|
exp_2403.18125v1_20260313_094724
|
Benchmark for Digital Newcomer Queries
**Relevance:** Low (Data Resource). **Assessment:** This paper proposes a dataset of "digital newcomer" queries to study LLM robustness against non-standard language. It does **not** present a model architecture or optimization technique. *...
|
03-13 09:47 | Success | - | |
|
exp_cr_10.3390_s24072091_20260313_094645
|
Benchmark: Lightweight BNN for Structural Health Monitoring (SHM)
**Paper Analysis: BNNs for Structural Health Monitoring (SHM)** **Architecture:** The paper proposes a **Bayesian Neural Network (BNN)** utilizing probabilistic inference to predict structural displacement. It operates within a "dual-drive"...
|
03-13 09:46 | Success | - | |
|
exp_pytrain.20260313094430.008_20260313_094503
|
Dynamic Typed Plugin Loader with PEP 695
This benchmark verifies the hypothesis that combining dynamic module loading (`importlib`) with modern type parameter syntax (PEP 695) results in a robust, performant, and extensible plugin architecture. **Hypothesis** Dynamic generation and ex...
|
03-13 09:45 | Success | - | |
|
exp_cr_10.36724_2072-8735-2024-18-3-41-49_20260313_094308
|
Backfill Candidate cr_10.36724_2072-8735-2024-18-3-41-49
**Status: Irrelevant** This paper addresses **telecommunications protocols** (specifically queueing theory and traffic shaping for high-throughput satellites), not Deep Learning. * **Architecture:** N/A. The paper proposes a mathematical pr...
|
03-13 09:43 | Success | - | |
|
exp_cr_10.1609_aaai.v38i16.29810_20260313_094122
|
Backfill Benchmark: Dynamic Layerwise Token Dropping
**Architecture:** Framework-level intervention. Introduces "efficient data sampling" (curriculum learning) and "random layerwise token dropping" to optimize training data routing. It does not modify the underlying model architecture (e.g.,...
|
03-13 09:41 | Success | - | |
|
exp_pytrain.20260313093752.007_20260313_093830
|
Generic Component Pipeline Builder
This benchmark evaluates the creation of a modular, type-safe data processing pipeline using Python's standard library. The goal is to design a framework that separates core logic from concrete implementations, leveraging advanced typing fe...
|
03-13 09:38 | Success | - | |
|
exp_2409.14516v1_20260313_093231
|
Benchmark: Local Feasibility of Phi-3-mini for Geospatial Planning
**Assessment:** This paper evaluates GPT-4 and Phi-3-mini for geospatial and transportation planning tasks. * **Architecture:** The study contrasts the proprietary GPT-4 against Phi-3-mini, a lightweight transformer architecture optimized f...
|
03-13 09:36 | Success | - | |
|
exp_2506.16628v1_20260313_093155
|
Benchmark: Offline-LLM to Rule-Based Pipeline
**Architecture:** Hybrid offline design. LLMs are utilized exclusively during the development phase to generate rules, identify relevant text snippets, and extract keywords. The production system is a traditional rule-based NLP pipeline (Re...
|
03-13 09:31 | Success | - | |
|
exp_cr_10.3390_s25185786_20260313_093109
|
Benchmark for MFT-Net (Tactile Sensing Architecture)
**Architecture** The paper proposes MFT-Net, a hybrid architecture that integrates a Convolutional Neural Network (CNN) for local feature extraction with a Transformer module for global dependency modeling. It utilizes Squeeze-and-Excitatio...
|
03-13 09:31 | Success | - | |
|
exp_pytrain.20260313092922.006_20260313_092949
|
Python Skill Fallback
Title: Strictly-Typed Component Registry with Dynamic Import Mechanics - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-13 09:29 | Success | - | |
|
exp_2512.14961v3_20260313_092842
|
Benchmark: Hybrid Trimodal Fusion (Backfill 2512.14961v3)
**Architecture:** Utilizes a hybrid trimodal framework (face, voice, motion) with independent encoders feeding into a cross-attention and gated fusion module. It employs a single classification head with a confidence-weighted strategy to dy...
|
03-13 09:28 | Success | - | |
|
exp_cr_10.1609_aaai.v37i4.25597_20260313_092728
|
Efficient Dual-Encoder CLIP with Visual Prompting
**Architecture & Retrieval Strategy:** This paper proposes a **dual-encoder** architecture fine-tuning a **frozen CLIP** backbone. The retrieval mechanism converts the reference image into a **learnable visual prompt** which is prefixed to...
|
03-13 09:27 | Success | - | |
|
exp_2506.12724v1_20260313_092646
|
Dynamic Modality Scheduling (DMS) Benchmark
**Architecture:** Dynamic Modality Scheduling (DMS) is a model-agnostic wrapper for Multimodal LLMs (e.g., LLaVA, BLIP-2). It uses a scheduler to weigh modality contributions based on three signals: predictive entropy (confidence), Monte Ca...
|
03-13 09:26 | Success | - | |
|
exp_2304.00387v1_20260313_092545
|
Benchmark for HaLP (Hallucinating Latent Positives)
**Architecture:** Introduces a lightweight augmentation-free contrastive learning framework. The HaLP module hallucinates synthetic positive samples directly in the latent space using a closed-form solver, replacing the need for complex geo...
|
03-13 09:25 | Success | - | |
|
exp_2404.00057v1_20260313_092455
|
Backfill Candidate 2404.00057v1
**Architecture:** Proposes a **cloud-centric** OS architecture integrating LLMs via declarative interfaces and self-adaptive kernels. The system prioritizes personalized intelligence by decoupling the decision-making layer from local hardwa...
|
03-13 09:24 | Success | - | |
|
exp_pytrain.20260313092254.005_20260313_092314
|
Generic Plugin Registry with Protocol Enforcement
This benchmark tests the implementation of a modular, type-safe plugin system using Python's `typing.Protocol`, `typing.TypeVar`, and `typing.Generic`. Objectives: 1. **Structural Subtyping**: Define a strict interface using `Protocol` that...
|
03-13 09:23 | Success | - | |
|
exp_cr_10.3390_en18184924_20260313_091823
|
Hybrid Monte Carlo & Clustering Time-Series Forecasting
**Architecture:** The proposed model is a hybrid statistical system combining Monte Carlo filters for state estimation with a clustering algorithm (likely K-Means or similar) for outlier removal and forecasting. It is not a neural network o...
|
03-13 09:21 | Success | - | |
|
exp_cr_10.36676_jrps.v15.i3.1520_20260313_091726
|
Benchmark: Content-Based Image Retrieval (CBIR) with Lightweight Feature Extraction
**Paper Type:** Literature Survey (Not a specific implementation). * **Architecture:** Analyzes Deep Learning feature extractors (CNNs/ViTs) and handcrafted features. No specific architecture proposed for deployment. * **Retrieval Architect...
|
03-13 09:17 | Success | - | |
|
exp_cr_10.17588_2072-2672.2023.3.062-067_20260313_091651
|
Innovation Benchmark: Classical HVAC State-Space Control
**Assessment:** Reject for ARES Roadmap. This paper concerns physical control theory (HVAC), not AI workloads. * **Architecture:** Classical (State-Space & Transfer Functions). The "model" consists of differential equations derived from the...
|
03-13 09:16 | Success | - | |
|
exp_cr_10.3390_agronomy14040673_20260313_091607
|
Backfill Candidate cr_10.3390_agronomy14040673
**Architecture:** Hybrid framework combining a Densely Connected CNN for multilevel local feature extraction with a Transformer module for global context capture. A Cycle-GAN is utilized for training data augmentation but is excluded during...
|
03-13 09:16 | Success | - | |
|
exp_pytrain.20260313091410.004_20260313_091433
|
Python Skill Fallback
Title: Strictly Typed ZipApp Packager - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-13 09:14 | Success | - | |
|
exp_cr_10.24425_jppr.2024.151253_20260313_091249
|
Backfill Candidate cr_10.24425_jppr.2024.151253
**Architecture:** Modifies the YOLOv5m baseline by integrating a Swin Transformer (Swin-T) module into the backbone network. It also utilizes K-means++ for anchor optimization and Efficient IoU (EIoU) loss to improve bounding box regression...
|
03-13 09:12 | Success | - | |
|
exp_2506.16597v1_20260313_091202
|
Backfill Candidate 2506.16597v1
**Paper:** Exoplanet Classification through Vision Transformers with Temporal Image Analysis **Architecture:** The proposed pipeline converts 1D Kepler light curves into 2D Recurrence Plots (RPs) or Gramian Angular Fields (GAFs) to serve as...
|
03-13 09:12 | Success | - | |
|
exp_cr_10.3390_rs17183200_20260313_091118
|
TransMambaCNN Architecture Benchmark
**Architecture:** TransMambaCNN utilizes a dual-branch topology to fuse global and local spatiotemporal features. The global branch replaces standard self-attention with a **Convolutional State-Space Module (C-SSM)**, combining an Attentive...
|
03-13 09:11 | Success | - | |
|
exp_2512.14908v5_20260313_091038
|
Backfill Candidate 2512.14908v5
**Architecture:** ATLAS is a propagation-free framework replacing message passing with multi-resolution community features. It utilizes modularity-guided search to identify optimal community scales, projects these structures into embeddings...
|
03-13 09:10 | Success | - | |
|
exp_2303.10699v1_20260313_090945
|
Backfill Candidate 2303.10699v1
**Architecture:** This paper introduces a dataset augmentation strategy (FVQA 2.0) for Fact-based VQA, addressing model vulnerability to imbalanced Knowledge Graph (KG) distributions. The underlying architecture employs a **Dual-Encoder** s...
|
03-13 09:09 | Success | - | |
|
exp_pytrain.20260313090742.003_20260313_090810
|
Type-Introspective Package Manifestor
Overview: This benchmark validates the hypothesis that Python's standard library `typing` and `inspect` modules are sufficient to build robust, type-safe packaging utilities without external dependencies. Objective: Implement a lightweight pa...
|
03-13 09:08 | Success | - | |
|
exp_2506.17336v3_20260313_090606
|
Backfill Candidate 2506.17336v3
**Architecture:** Hybrid system splitting computation between a remote strong LLM (GPT-4o) for "Socratic CoT" query planning and a local **Llama-3.2-1B** for final response generation. **Retrieval Strategy:** Uses **Homomorphically Encrypte...
|
03-13 09:06 | Success | - | |
|
exp_2506.13467v1_20260313_090446
|
NeuroEmbed Bi-Encoder Benchmark
**NeuroEmbed** fine-tunes **PubMedBERT** for semantic retrieval of biomedical cohorts. * **Architecture:** Bi-encoder (PubMedBERT) fine-tuned on synthetically generated QA pairs derived from ontology-aligned metadata. * **Retrieval Strategy...
|
03-13 09:05 | Success | - | |
|
exp_2304.01222v1_20260313_090354
|
Benchmark: NeuroDAVIS (2304.01222v1)
**Architecture:** NeuroDAVIS employs an unsupervised deep neural network designed for dimensionality reduction. It extracts features non-linearly, theoretically preserving high-dimensional neighborhood relationships (local and global structu...
|
03-13 09:04 | Success | - | |
|
exp_2304.06724v1_20260313_090305
|
Backfill Candidate 2304.06724v1
**Assessment: High-Risk Vulnerability for Dynamic Architectures** **Architecture:** GradMDM targets **Dynamic Neural Networks (DNNs)**—models designed to skip layers or adapt width to save resources. The attack manipulates gradient directio...
|
03-13 09:03 | Success | - | |
|
exp_pytrain.20260313090040.002_20260313_090104
|
PEP 695 Generic Repository Benchmark
This benchmark tests the implementation of Python 3.12's PEP 695 Type Parameter Syntax within a single-file module structure. Features: * **PEP 695 Syntax**: Uses the new `class ClassName[T]:` and `type Alias[T] = ...` syntax. * **Module Enc...
|
03-13 09:01 | Success | - | |
|
exp_2309.16804v2_20260313_084913
|
Benchmark Candidate 2309.16804v2
**Architecture:** A pipeline fine-tuning an unspecified open-source model on synthetic dialogues derived from textbooks. The specific base architecture is redacted in this excerpt. **Memory Footprint:** No explicit VRAM usage is detailed. F...
|
03-13 08:59 | Success | - | |
|
exp_cr_10.1609_aaai.v38i12.29197_20260313_084834
|
FLAME Architecture Benchmark
**Architecture:** FLAME is a 60M parameter Transformer optimized specifically for Excel formulas. Key architectural differentiators include an Excel-specific tokenizer and domain-adapted pre-training objectives: masked span prediction and n...
|
03-13 08:48 | Success | - | |
|
exp_pytrain.20260313084613.001_20260313_084638
|
Type-Safe Virtual Package Builder Benchmark
Overview: This benchmark demonstrates the ability to construct a Python package entirely in memory, inject it into the runtime environment, and enforce strict type constraints using `typing.Protocol` and Generics. It simulates a build proces...
|
03-13 08:46 | Success | - | |
|
exp_cr_10.1609_aaai.v38i12.29197_20260313_083849
|
FLAME Architecture Benchmark
**Architecture:** FLAME is a 60M parameter Transformer optimized specifically for Excel formulas. Key architectural differentiators include an Excel-specific tokenizer and domain-adapted pre-training objectives: masked span prediction and n...
|
03-13 08:44 | Pending | - | |
|
exp_cr_10.1609_aaai.v38i12.29197_20260313_083809
|
Backfill Candidate cr_10.1609_aaai.v38i12.29197
**Architecture:** FLAME is a 60M parameter Transformer optimized specifically for Excel formulas. Key architectural differentiators include an Excel-specific tokenizer and domain-adapted pre-training objectives: masked span prediction and n...
|
03-13 08:38 | Success | - | |
|
exp_pytrain.20260313083547.003_20260313_083620
|
Robust Typed Plugin Loader with `importlib`
This benchmark tests the ability to design a flexible plugin architecture using Python's standard library. The solution must dynamically generate a module in a temporary filesystem context, load it using low-level import utilities, and vali...
|
03-13 08:36 | Success | - | |
|
exp_oa_W4404574673_20260313_083420
|
Backfill Candidate oa_W4404574673
**Analysis for ARES 8GB Roadmap** * **Architecture:** The survey reviews standard Transformer-based architectures and pre-training objectives. It identifies multilingual capabilities primarily as a result of data quality, diversity, and ali...
|
03-13 08:34 | Success | - | |
|
exp_2506.16655v1_20260313_083303
|
Arch-Router v1.0 Benchmark
**Architecture:** Arch-Router is a compact 1.5B parameter model functioning as a classifier. Instead of generating text, it maps user queries to specific domains (e.g., travel) or action types to select the most appropriate downstream model...
|
03-13 08:33 | Success | - | |
|
exp_2506.16596v3_20260313_083145
|
Cyc-like Knowledge Infrastructure Benchmark
This paper outlines a community-driven vision for a modern Cyc-like knowledge infrastructure to address LLM hallucinations and reasoning gaps. * **Architecture:** Proposes an "open engineering framework" integrating modular Knowledge Repres...
|
03-13 08:32 | Success | - | |
|
exp_pytrain.20260313082915.002_20260313_082954
|
Generic Event Dispatcher with PEP 695 Syntax
Overview: This benchmark provides a reference implementation of a thread-safe Generic Event Dispatcher using Python 3.12's **PEP 695 Type Parameter Syntax**. Hypothesis: Utilizing PEP 695 Type Parameter Syntax reduces generic type boilerplate...
|
03-13 08:29 | Success | - | |
|
exp_2512.14880v1_20260313_082625
|
Benchmark: Task Matrices for Efficient Model Specialization
**Architecture:** Introduces "Task Matrices"—linear transformations that map base model embeddings to specific finetuned states. This allows a single base model to simulate the behavior of multiple specialized models by applying distinct li...
|
03-13 08:27 | Success | - | |
|
exp_hf_2603.09555_20260313_082538
|
Backfill Candidate hf_2603.09555
**Architecture:** Proposes a compiler-first implementation of Mamba-2, leveraging XLA's fusion and tiling passes to handle state space duality (diagonal structures, chunkable recurrence). This eliminates the need for hand-written CUDA or Tr...
|
03-13 08:25 | Success | - | |
|
exp_2309.10945v1_20260313_082428
|
Benchmark: Pirá 2.0 Bilingual Scientific QA
**Paper:** Benchmarks for Pirá 2.0 **Type:** Dataset Release (No novel model architecture). **Summary:** This paper establishes baselines for the Pirá 2.0 dataset, a curated bilingual (English/Portuguese) resource for testing expert knowled...
|
03-13 08:24 | Success | - | |
|
exp_pytrain.20260313082208.001_20260313_082233
|
Strictly-Typed Dependency Resolver Benchmark
This benchmark evaluates the ability of an autonomous coding system to implement a robust package dependency resolver using Python's standard library. The solution requires a strict type system (simulating `mypy --strict` compliance), a bac...
|
03-13 08:22 | Success | - | |
|
exp_hf_2603.06854_20260313_072309
|
Benchmark: Audio-Text Text-Dominance Mitigation (Steering Overhead)
**Architecture:** Proposes an inference-time activation steering mechanism to mitigate "text dominance" in Large Audio-Language Models (LALMs). It utilizes mechanistic interpretability to identify specific "audio-specialist" attention heads...
|
03-13 07:23 | Pending | - | |
|
exp_hf_2603.10145_20260313_072159
|
Backfill Candidate hf_2603.10145
**Architecture:** The paper identifies the standard LM Head (projection from hidden dimension $D$ to vocabulary $V$) as a fundamental "gradient bottleneck." Due to the $D \ll V$ mismatch, the rank-$D$ layer acts as a severe compressor durin...
|
03-13 07:22 | Success | - | |
|
exp_2309.16812v1_20260313_072058
|
Benchmark for Semantic Layout-to-Image Diffusion
**Architecture:** Conditional Denoising Diffusion Probabilistic Model (DDPM) utilizing a U-Net backbone enhanced with adaptive normalization (likely SPADE-style) and self-attention mechanisms to integrate semantic layout conditioning. **Mem...
|
03-13 07:21 | Success | - | |
|
exp_pytrain.20260313071750.090_20260313_071827
|
Dynamic Typed Plugin Loader
Objective: This drill verifies the ability to construct a robust Python plugin architecture that merges strict static typing definitions (using `typing.Protocol`, `TypeVar`, and Generics) with dynamic runtime module gene...
|
03-13 07:18 | Success | - | |
|
exp_2403.18098v1_20260313_070552
|
Legal Entailment Benchmark (COLIEE Task 4)
**Analysis: GPTs and Language Barrier (COLIEE Task 4)** * **Architecture:** The paper evaluates generic "GPTs" (likely proprietary APIs or large base models) on a legal entailment task. No specific architectural modifications (e.g., pruning...
|
03-13 07:16 | Success | - | |
|
exp_pytrain.20260313070305.089_20260313_070333
|
Typed Dynamic Plugin Loader
This benchmark demonstrates a robust, extensible plugin architecture that leverages Python's `typing.Protocol` for interface safety and `importlib` for dynamic runtime module loading. Objective: To validate that dynamically loaded code—often...
|
03-13 07:03 | Success | - | |
|
exp_cr_10.3390_app14188526_20260313_070104
|
Backfill Candidate cr_10.3390_app14188526
**Summary for ARES 8GB Roadmap** * **Architecture:** The paper proposes a hybrid **Long Short-Term Memory (LSTM)** network integrated with a **Self-Attention Mechanism (SA-LSTM)**. This architecture weights specific time-steps in the input...
|
03-13 07:01 | Success | - | |
|
exp_2506.16592v1_20260313_070005
|
Benchmark for DenseNet121 Attention-Enhanced Hybrid (Candidate 2506.16592v1)
**Architecture:** Utilizes a hybrid design coupling a pre-trained DenseNet121 encoder with a multi-branch attention-enhanced decoder. The bottleneck employs Global Spatial Attention (GSA), Position Encoding, and Scaled Dot-Product Attention...
|
03-13 07:00 | Success | - | |
|
exp_cr_10.1145_3768167_20260313_065845
|
Backfill Candidate cr_10.1145_3768167
**Architecture:** The paper proposes a Graph-Transformer Network (GTN) acting as a surrogate model for circuit topology optimization. It encodes circuit physics specifically—voltage changes in loops and current flows—directly into graph embe...
|
03-13 06:59 | Success | - | |
|
exp_pytrain.20260313065531.088_20260313_065602
|
Generic Package Metadata Inspector
A Python coding drill designed to test proficiency with the standard library's `importlib.metadata` module and modern Generics. Objective: Implement a generic class `PackageMetadataInspector[T]` that performs introspection on installed Python...
|
03-13 06:56 | Success | - | |
|
exp_cr_10.3390_s25185805_20260313_065334
|
Benchmark for BLIP-2 Heterogeneous Input Fusion
**Architecture:** Uses a customized **BLIP-2** framework with a Q-Former to fuse heterogeneous inputs (visual frames, kinematic data) into low-dimensional embeddings representing "task demand" and "driving capability" within a shared latent...
|
03-13 06:53 | Success | - | |
|
exp_2303.16839v3_20260313_065232
|
Backfill Candidate 2303.16839v3
**Architecture:** A decoder-only multimodal model pairing a vision encoder with a unified text decoder. It utilizes a "two-pass" approach: the first pass extracts contrastive embeddings for retrieval, and the second pass performs autoregres...
|
03-13 06:52 | Success | - | |
|
exp_2303.16576v2_20260313_065106
|
Backfill Candidate 2303.16576v2
**Architecture:** WordStylist utilizes a Latent Diffusion Model (LDM) backbone, comprising a VAE for latent space compression and a U-Net denoiser. It conditions generation on writer style (via class indices) and text content, replacing adv...
|
03-13 06:51 | Success | - | |
|
exp_pytrain.20260313064740.087_20260313_064818
|
Python Skill Fallback
Title: Dynamic Module Loader with Structural Typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-13 06:48 | Success | - | |
|
exp_2303.15132v1_20260313_064556
|
Benchmark: Graph-based Label Propagation for ASR Rescoring
**Architecture:** Graph-based label propagation model operating on ASR N-best lists. Nodes represent hypotheses, and edges are weighted by cross-utterance acoustic similarity. This allows for collaborative rescoring, utilizing neighboring ut...
|
03-13 06:46 | Success | - | |
|
exp_cr_10.1609_aaai.v38i17.29885_20260313_064508
|
Benchmark for Contrastive Confidence Regularizer (CCR) in Dense Retrieval
**Architecture:** Dual-Encoder Dense Retrieval (Contrastive Learning). **Retrieval Specifics:** * **Retrieval Architecture:** Standard Dual-Encoder (bi-encoder) with vector similarity search. * **Training Strategy:** Introduces a "Contrasti...
|
03-13 06:45 | Success | - | |
|
exp_2507.00033v1_20260313_064344
|
Video LLM Context Optimization Benchmark
**Architecture:** Proposes a **Retrieval-Augmented Generation (RAG)** pipeline where a lightweight **text-to-video moment retrieval model** acts as a "selector." It retrieves top-$k$ relevant video segments based on the query before passing...
|
03-13 06:44 | Success | - | |
|
exp_2403.18134v1_20260313_064255
|
GTI Block Benchmark
**Architecture:** Proposes a **Graph Transformer Integration (GTI)** block for Multiple Instance Learning (MIL). It hybridizes a local **Graph Convolutional Network (GCN)** to model spatial relationships between neighboring tissue patches w...
|
03-13 06:43 | Success | - | |
|
exp_pytrain.20260313063952.086_20260313_064030
|
Dynamic Backend Resolution with Strict Typing and Metadata Checks
This benchmark implements a self-contained "backend dispatcher" mechanism often found in high-performance ML frameworks like vLLM or Diffusers. Overview: In production-grade inference engines, the system must dynamically select the most effi...
|
03-13 06:40 | Success | - | |
|
exp_2409.14557v3_20260313_063753
|
Backfill Candidate 2409.14557v3
**Architecture:** Proposes Exo-MDPs, decomposing state dynamics into independent stochastic (exogenous) and action-dependent deterministic (endogenous) components. Structurally equivalent to Linear Mixture MDPs, enabling linear function app...
|
03-13 06:37 | Success | - | |
|
exp_cr_10.1609_aaai.v38i21.30443_20260313_063657
|
Backfill Candidate cr_10.1609_aaai.v38i21.30443
**Summary for ARES 8GB Roadmap** * **Architecture:** This research proposes a **software-layer methodology** rather than a neural architecture. It utilizes existing Transformer-based models, relying on structured prompt engineering (context...
|
03-13 06:37 | Success | - | |
|
exp_cr_10.51519_journalisi.v7i1.1024_20260313_063618
|
Backfill Candidate cr_10.51519_journalisi.v7i1.1024
**Subject:** IT-Based Knowledge Sharing System with LLM Integration **Architecture:** Conceptual system architecture proposing the integration of Large Language Models (specifically ChatGPT) into university IT ticketing systems. The design...
|
03-13 06:36 | Success | - | |
|
exp_2506.16644v1_20260313_063517
|
This benchmark simulates the **SORE (Sentence-based Omission & Retrieval Engine)** architecture. It replaces an autoregr...
**Architecture:** SORE replaces autoregressive LLMs with a dual-stage pipeline utilizing multilingual sentence encoders and Approximate Nearest Neighbor (ANN) search. It identifies core content via metadata embeddings and filters extraneous...
|
03-13 06:35 | Success | - | |
|
exp_pytrain.20260313063309.085_20260313_063336
|
Type-Safe ZipApp Packager
Objective: Create a Python function `build_distribution` that programmatically generates a `.pyz` (ZipApp) executable from a dictionary of virtual source files. Constraints: - **Standard Library Only**: No external dependencies (e.g., no `myp...
|
03-13 06:33 | Success | - | |
|
exp_2506.16580v1_20260313_063149
|
Backfill Candidate 2506.16580v1
**Architecture:** Replaces standard encoder blocks with an **Emformer** (Efficient Memory Transformer) to enable chunk-based attention and streamable processing. The model utilizes a non-autoregressive decoder to parallelize output generati...
|
03-13 06:31 | Success | - | |
|
exp_oa_W4415031789_20260313_062953
|
Benchmark: T2I Architectures (Transformer vs. Mamba/SSM)
**Architecture:** Surveys 141 T2I works (2021–2024), categorizing them into Autoregressive, GAN, and Diffusion foundations. Highlights **Mamba** and Multimodality as emerging architectures for future performance gains, potentially offering...
|
03-13 06:30 | Success | - | |
|
exp_hf_2603.09906_20260313_062856
|
Benchmark: Reasoning Token Memory & Speed Overhead
**Architecture:** The paper analyzes standard autoregressive LLMs, identifying "reasoning" tokens as a dual-purpose mechanism: a computational buffer for latent processing and a semantic primer (factual priming) that retrieves inaccessible...
|
03-13 06:29 | Success | - | |
|
exp_pytrain.20260313062640.084_20260313_062705
|
Robust Typed CLI Utility with Protocol Abstraction
This benchmark evaluates a Python script's adherence to strict packaging standards and advanced static typing. The candidate script, `benchmark.py`, implements a mock `SystemExporter` utility. It demonstrates robustness by defining a `Stora...
|
03-13 06:27 | Success | - | |
|
exp_2303.17574v1_20260313_062548
|
Benchmark: Expert Weight Removal (EWR) on Flan-T5
**Architecture:** EWR is a training method for **Flan-T5** (Encoder-Decoder) models. It trains a "negative expert" on hallucinated responses and subtracts its weights from the base model, utilizing the **Fisher Information Matrix** to weigh...
|
03-13 06:26 | Success | - | |
|
exp_2309.08960v1_20260313_062352
|
Benchmark: ODSum Simulation (Retrieve-then-Summarize)
**Paper:** ODSum: New Benchmarks for Open Domain Multi-Document Summarization **Architecture:** Standard **retrieve-then-summarize** pipeline. The paper proposes a rule-based method to convert query-based datasets into Open Domain Multi-Doc...
|
03-13 06:24 | Success | - | |
|
exp_2309.08872v2_20260313_062257
|
Benchmark: Structural RAG vs. Naive Chunking (Candidate 2309.08872v2)
**Architecture:** A specialized RAG framework designed to handle document structure, routing queries to retrieve specific layout elements (tables, sections, pages) rather than treating the document as a flat text stream. **Retrieval Strateg...
|
03-13 06:23 | Success | - | |
|
exp_2403.14258v1_20260313_062142
|
Benchmark: Local TRIZ Contradiction Extraction (Llama 3 8B)
**Architecture:** Shifts from fine-tuned BERT-style discriminative classifiers to generative Prompt Engineering using **GPT-4** to extract complex TRIZ contradictions. **Memory & Speed:** The paper relies on API-based GPT-4, bypassing local...
|
03-13 06:22 | Success | - | |
|
exp_pytrain.20260313061914.083_20260313_061949
|
Dynamic Plugin Loader with Strict Type Validation
This benchmark evaluates the implementation of a robust, type-safe plugin architecture using Python's standard library. Problem Statement: The objective is to create a system where functionality (plugins) can be discovered and loaded dynamic...
|
03-13 06:19 | Success | - | |
|
exp_cr_10.1093_llc_fqaf082_20260313_061742
|
Backfill Candidate cr_10.1093_llc_fqaf082
**Architecture:** Fine-tuned CLIP (Contrastive Language-Image Pre-Training) model for cross-modal retrieval. **Retrieval Strategy:** Text-to-Image retrieval using visual feature embeddings (bypassing metadata). **Indexing:** Vector index of...
|
03-13 06:17 | Success | - | |
|
exp_2512.14448v1_20260313_061701
|
Backfill Candidate 2512.14448v1
This paper investigates **Reasoning-Style Poisoning (RSP)**, targeting **ReAct**, **Reflection**, and **Tree of Thoughts (ToT)** agent architectures. It employs **Generative Style Injection (GSI)** to rewrite **retrieved documents** with pa...
|
03-13 06:17 | Success | - | |
|
exp_cr_10.3390_electronics13183710_20260313_061614
|
Backfill Candidate cr_10.3390_electronics13183710
**Architecture:** Hybrid model utilizing multi-scale frequency decomposition. High-frequency data is processed via a Temporal GNN with an Adaptive Graph Learning module, while low-frequency data uses a Bidirectional Temporal Network, fused...
|
03-13 06:16 | Success | - | |
|
exp_cr_10.52783_jisem.v10i3.4744_20260313_061522
|
Backfill Candidate cr_10.52783_jisem.v10i3.4744
**Architecture:** The paper proposes a hybrid architecture combining an Enhanced Vision Transformer (EViT) with a Bidirectional LSTM (BiLSTM) for glaucoma detection. The EViT extracts global spatial features, while the BiLSTM processes sequ...
|
03-13 06:15 | Success | - | |
|
exp_pytrain.20260313061210.082_20260313_061311
|
Generic Plugin Loader with PEP 695
Overview: This benchmark evaluates a coding agent's ability to utilize modern Python 3.12+ syntax (PEP 695 Type Parameter Syntax) to define generic classes, while simultaneously demonstrating robust packaging practices by dynamically creatin...
|
03-13 06:13 | Success | - | |
|
exp_2506.16633v2_20260313_055245
|
Benchmark for SightSense (GeoGuess) Architecture
**Paper:** GeoGuess (SightSense) **Summary for ARES 8GB Roadmap:** * **Architecture:** Proposes **SightSense**, a multimodal framework processing **Street View panoramas**. It employs a **hierarchical visual encoder** to synthesize local de...
|
03-13 06:10 | Success | - | |
|
exp_hf_2603.10101_20260313_055142
|
Benchmark for CLIPO: Zero-Overhead RLVR Integration
**Architecture:** CLIPO modifies the RLVR training pipeline by integrating a contrastive learning objective into policy optimization. Instead of relying solely on sparse, final-answer rewards, it optimizes the model to distinguish between r...
|
03-13 05:51 | Success | - | |
|
exp_2303.16341v3_20260313_055028
|
This benchmark simulates the **S-ViLM (Structured Video-Language Modeling)** architecture, specifically focusing on the...
**Paper:** S-ViLM (Structured Video-Language Modeling) **Architecture:** S-ViLM utilizes a dual-stream Transformer (Video + Text). It deviates from global contrastive learning to implement **inter-clip spatial grounding** (aligning text to...
|
03-13 05:50 | Success | - | |
|
exp_pytrain.20260313054716.081_20260313_054752
|
Structural Subtyping Plugin Loader
This benchmark validates a robust Python plugin architecture based on structural subtyping using `typing.Protocol`. Hypothesis: Leveraging `typing.Protocol` combined with `importlib` enables the development of modular, extensible systems whe...
|
03-13 05:47 | Success | - | |
|
exp_2403.12894v2_20260313_054537
|
Backfill Candidate 2403.12894v2
**Architecture:** Tri-modal binding framework (CXR, ECG, Text) using text as a central anchor. It employs a dual-loss strategy: standard contrastive loss for modality-text pairs and a custom "Edge-Modality Contrastive Loss" to align dispara...
|
03-13 05:45 | Success | - | |
|
exp_2409.13997v1_20260313_054414
|
Backfill Candidate 2409.13997v1
**Architecture:** DriftNet utilizes a "representational drift" mechanism to navigate local loss landscape minima, dynamically retrieving relevant tasks to prevent catastrophic forgetting. It functions as a lifelong learning layer atop stand...
|
03-13 05:44 | Success | - | |
|
exp_pytrain.20260313054024.080_20260313_054105
|
Type-Safe Configuration Manager and Mock Plugin Registry
This benchmark evaluates a Python developer's ability to construct a robust core system typical of high-performance machine learning frameworks (like PyTorch or Lightning AI). The challenge involves creating a strictly typed configuration s...
|
03-13 05:41 | Success | - | |
|
exp_2409.14617v1_20260313_053831
|
Backfill Candidate 2409.14617v1
**Architecture:** Protein-Mamba replaces standard attention mechanisms with Mamba State Space Models (SSMs). It employs a two-stage pipeline: self-supervised pre-training on chemical structures followed by supervised fine-tuning. This shift...
|
03-13 05:38 | Success | - | |
|
exp_2409.14584v1_20260313_053703
|
Benchmark for Hybrid Entity Typing System (Candidate 2409.14584v1)
**Assessment for ARES 8GB Roadmap:** * **Architecture:** Hybrid system combining a fine-tuned Transformer-based text encoder (likely BERT/RoBERTa) with pre-computed network embeddings. Features a classification head over 136 semantic types....
|
03-13 05:37 | Success | - | |
|
exp_2303.16769v1_20260313_053553
|
Backfill Candidate 2303.16769v1
**Architecture:** Utilizes off-the-shelf Vision-Language Models (VLMs) like CLIP, introducing "Semantic Anchors" to fuse sketch features with textual semantic spaces. Trained via a novel Anchored Contrastive Loss to align sketch embeddings...
|
03-13 05:35 | Success | - | |
|
exp_pytrain.20260313053245.079_20260313_053336
|
Type-Safe Virtual Package Registry
Overview: This benchmark is designed to test an autonomous coding system's ability to simulate a complex package distribution and loading mechanism, akin to frameworks like Hugging Face Transformers or vLLM. The Challenge: The candidate must...
|
03-13 05:33 | Success | - | |
|
exp_2309.11206v2_20260313_052042
|
Retrieve-Rewrite-Answer RAG Benchmark
**Architecture:** Proposes a modular "Retrieve-Rewrite-Answer" RAG pipeline. Instead of injecting raw Knowledge Graph (KG) triples directly into the prompt, it inserts an intermediate generation step. This "Rewrite" stage converts graph tri...
|
03-13 05:30 | Success | - | |
|
exp_pytrain.20260313051821.078_20260313_051857
|
Dynamic Type-Safe Plugin Registry
This benchmark evaluates a Python script's ability to dynamically construct a modular plugin architecture using `typing.Protocol` for structural subtyping and `importlib` for runtime introspection. Objective: The script creates a strict `Dat...
|
03-13 05:19 | Success | - | |
|
exp_2309.16816v1_20260313_051652
|
PROSE: Physics-Informed Multimodal Transformers
**Architecture:** PROSE utilizes a multimodal Transformer architecture with feature fusion to simultaneously map parametric inputs to both numerical solution operators and symbolic mathematical expressions. **Memory Footprint:** **High Risk...
|
03-13 05:16 | Success | - | |
|
exp_2409.14607v2_20260313_051552
|
Backfill Candidate 2409.14607v2
**Architecture:** Proposes a "Patch Ranking" framework consisting of a lightweight predictor trained to approximate a greedy "Golden Ranking" of local patch tokens. The model prunes lower-ranked tokens and introduces learnable visual prompts...
|
03-13 05:15 | Success | - | |
|
exp_2409.14572v2_20260313_051435
|
Backfill Candidate 2409.14572v2
**Summary: Evaluating LLMs in Materials Science** This study evaluates standard LLM architectures (not novel ones) for materials science applications (Q&A and property prediction) using prompt engineering strategies like Chain-of-Thought an...
|
03-13 05:14 | Success | - | |
|
exp_pytrain.20260313051151.077_20260313_051229
|
Strict CLI Subcommand Dispatcher with Protocol-Based Registry
**Overview:** This benchmark evaluates the implementation of a lightweight, modular CLI tool using Python's standard library. It focuses on correct usage of `argparse` for subcommands and `typing.Protocol` for structural subtyping to ensure a pl...
|
03-13 05:12 | Success | - | |
|
exp_cr_10.2196_67967_20260313_051002
|
Backfill Candidate cr_10.2196_67967
**Architecture:** The study evaluates a fine-tuned `scispaCy` model against two domain-specific LLMs: **NYUTron** (110M parameters) and **GatorTron** (345M parameters). Both are highly optimized "tiny" architectures suitable for clinical NL...
|
03-13 05:10 | Success | - | |
|
exp_2506.16650v1_20260313_050904
|
Backfill Candidate 2506.16650v1
**Architecture:** Proposes a complex, multi-stage agentic workflow. It moves beyond simple code localization by integrating **execution semantics** for context retrieval and **generalized abstraction** for issue understanding. The core uses...
|
03-13 05:09 | Success | - | |
|
exp_2506.16586v1_20260313_050732
|
Benchmark: AI-Agent QA Workflow Simulation (Target: ARES 8GB Roadmap)
**Assessment:** This paper evaluates a *workflow* rather than a specific model architecture. It focuses on applying generic "state-of-the-art" LLMs to QA tasks. * **Architecture:** Utilizes AI-agents for automated test case generation, stat...
|
03-13 05:07 | Success | - | |
|
exp_pytrain.20260313050510.076_20260313_050537
|
Python Skill Fallback
Title: Runtime Type-Safe Plugin Loader - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-13 05:05 | Success | - | |
|
exp_2512.14896v1_20260313_050256
|
DrugRAG Efficiency Benchmark
**Architecture** DrugRAG is a model-agnostic, three-step Retrieval-Augmented Generation (RAG) pipeline. It functions as an external wrapper, retrieving structured drug knowledge to augment prompts without modifying the underlying LLM archit...
|
03-13 05:03 | Success | - | |
|
exp_hf_2603.10165_20260313_050147
|
Benchmark: OpenClaw-RL Policy Deployment
**Architecture:** OpenClaw-RL utilizes an asynchronous pipeline decoupling three components: the live serving policy, a Process Reward Model (PRM) for evaluative signals, and a Hindsight-Guided On-Policy Distillation (OPD) trainer for direc...
|
03-13 05:02 | Success | - | |
|
exp_hf_2603.08068_20260313_050056
|
ICRL: Iterative Curriculum Reinforcement Learning
**Architecture:** ICRL is a training methodology, not a novel inference architecture. It replaces standard SFT+RL pipelines with an RL-only approach, utilizing a "curriculum" where the model learns tool use via in-context examples that are...
|
03-13 05:01 | Success | - | |
|
exp_pytrain.20260313045743.075_20260313_045827
|
Type-Safe Dynamic Extension Loader
**Overview:** This coding drill validates the hypothesis that combining `typing.Protocol` with runtime `importlib` introspection enables the creation of robust, self-verifying plugin architectures. By defining explicit generic interfaces (Protoc...
|
03-13 04:58 | Success | - | |
|
exp_oa_W4377820925_20260313_045615
|
Backfill Candidate oa_W4377820925
**Paper Type:** General Taxonomy / Survey (Not a specific model architecture). **Summary:** This text outlines standard NLP workloads rather than a novel architecture. It defines **Autoregressive Language Models** as the core for text gener...
|
03-13 04:56 | Success | - | |
|
exp_cr_10.1609_aaai.v37i4.25603_20260313_045523
|
Backfill Candidate cr_10.1609_aaai.v37i4.25603
**Architecture:** Dense Retrieval (Contrastive Dual-Encoder). **Retrieval Strategy:** Unsupervised training via "Approximate Aggregated Positive," aggregating same-case evidence to serve as positive examples for queries. **Indexing/Chunking...
|
03-13 04:55 | Success | - | |
|
exp_2309.10506v1_20260313_045432
|
Table Retrieval Benchmark (Dual-Encoder Structural Aggregation)
**Architecture:** Proposes a dual-encoder dense retrieval framework. It decouples the processing of queries (syntactic representation) and tables (structural representation of headers and values), utilizing a specific "syntactical-to-struct...
|
03-13 04:54 | Success | - | |
|
exp_cr_10.1609_aaai.v38i8.28779_20260313_045334
|
Benchmark: TriSampler Enabled Compact Dense Retrieval
**Classification:** Training Optimization (Inference Architecture Agnostic). **Architecture & Retrieval:** Enhances standard **Dense Retrieval (Bi-Encoder)** models via a "quasi-triangular" negative sampling principle. It optimizes training...
|
03-13 04:53 | Success | - | |
|
exp_pytrain.20260313045017.074_20260313_045121
|
Type-Safe Plugin Registry with Semantic Versioning
This benchmark tests the implementation of a robust, type-driven plugin architecture using Python's standard library. It simulates a subset of a package manager's core logic, leveraging advanced typing constructs like `Protocols`, `Generics...
|
03-13 04:51 | Success | - | |
|
exp_cr_10.1142_s0129156425409179_20260313_043305
|
README: Vision Transformer Benchmark (Swin vs ViT)
**Architecture:** Dual-model vision framework utilizing Vision Transformers (ViT) and Swin Transformers for feature extraction, coupled with a spatial indexing strategy for rapid image retrieval. **Retrieval Strategy:** * **Retrieval Archit...
|
03-13 04:48 | Success | - | |
|
exp_pytrain.20260313042923.073_20260313_042951
|
README: Typed Plugin Architecture Benchmark
This benchmark evaluates a Python system's capability to dynamically construct a strictly typed namespace package at runtime. The test simulates a plugin architecture where a core interface (`Protocol`) is defined in a base module, implemen...
|
03-13 04:30 | Success | - | |
|
exp_cr_10.1609_aaai.v38i16.29755_20260313_042619
|
Benchmark: Soft-Prompt Augmented Dense Retrieval
**Architecture:** Standard Dense Retrieval (Bi-Encoder) augmented with learnable **soft tokens** prepended to inputs. These tokens explicitly decouple domain-specific knowledge and supervision signals, enabling zero-shot adaptation without...
|
03-13 04:27 | Success | - | |
|
exp_2506.16552v3_20260313_042452
|
Backfill Candidate 2506.16552v3
**Architecture:** Revela employs a standard dense dual-encoder architecture (Bi-Encoder). It integrates retriever optimization into Language Modeling (LM) training by using retriever-computed similarity scores to weight an in-batch cross-do...
|
03-13 04:24 | Success | - | |
|
exp_pytrain.20260313042055.072_20260313_042139
|
Strict Dataclass Mapper Implementation
This benchmark defines a robust, recursive object mapper (`hydrate`) using only the Python standard library. It validates primitive types, handles nested `dataclass` instances, and manages `Optional` fields. **Usage:** The module exposes two pub...
|
03-13 04:21 | Success | - | |
|
exp_2512.14870v1_20260313_041812
|
HERBench Memory & Fusion Benchmark
**HERBench** introduces a high-complexity VideoQA benchmark requiring the aggregation of at least three temporally separated visual cues. It utilizes a Minimum Required Frame-Set (MRFS) metric averaging 5.5 frames, significantly higher than...
|
03-13 04:18 | Success | - | |
|
exp_hf_2603.08754_20260313_041613
|
HCAPO "Hindsight Critique" Performance Benchmark
**Architecture:** HCAPO modifies the Group Relative Policy Optimization (GRPO) framework by repurposing the LLM as a post-hoc critic. It introduces a multi-scale advantage mechanism to refine step-level Q-values and correct misaligned basel...
|
03-13 04:16 | Success | - | |
|
exp_pytrain.20260313041221.071_20260313_041334
|
Python Skill Fallback
Title: Structural Typing for CLI Plugin Architecture - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-13 04:13 | Success | - | |
|
exp_2303.10395v1_20260313_040931
|
Graph-Guided Retrieval-Augmented Generation (RAG) Benchmark
**Architecture:** A Graph-Guided Retrieval-Augmented Generation (RAG) framework. It retrieves supporting facts from a textual knowledge base, converts them into a question-specific open Knowledge Graph (KG), and performs sequential reasonin...
|
03-13 04:09 | Success | - | |
|
exp_2309.12294v1_20260313_040736
|
Logical Form (LF) to Text: Dual-Stage Generate-and-Rerank Benchmark
**Architecture:** Proposes a dual-stage **Generate-and-Rerank** pipeline for Logical Form (LF) to text. A generator LLM creates $N$ diverse candidates, which a task-specific discriminative reranker scores based on semantic alignment and hum...
|
03-13 04:07 | Success | - | |
|
exp_pytrain.20260313040323.070_20260313_040441
|
Robust Async Plugin Dispatcher Benchmark
**Overview:** This benchmark evaluates a Python-based mini-framework designed for dynamically discovering and executing asynchronous tasks. It emphasizes strict type adherence using `typing.Protocol` and explicit namespace management via `__all_...
|
03-13 04:04 | Success | - | |
|
exp_2403.17359v2_20260313_040042
|
Backfill Candidate 2403.17359v2
**Architecture & RAG Specifics:** Chain-of-Action (CoA) is an **agentic RAG framework** utilizing a reasoning-retrieval loop. It decomposes queries into "Plug-and-Play" actions to fetch heterogeneous multimodal data. * **Retrieval:** Iterat...
|
03-13 04:00 | Success | - | |
|
exp_2512.13164v2_20260313_035850
|
Backfill Candidate 2512.13164v2
**Architecture:** CRAFTS is a Latent Diffusion Model (LDM) utilizing a dual-stage "Correlation-Regulated Alignment Framework" to minimize semantic drift. It integrates ControlNet for spatial conditioning via segmentation masks. **Memory Foo...
|
03-13 03:58 | Success | - | |
|
exp_2403.18058v2_20260313_035737
|
Backfill Candidate 2403.18058v2
**Architecture:** N/A (Data-centric). This paper introduces a high-quality Chinese instruction tuning dataset (COIG-CQIA) derived from real-world sources. It is designed to fine-tune existing open-source architectures (e.g., LLaMA, Baichuan...
|
03-13 03:57 | Success | - | |
|
exp_pytrain.20260313035409.069_20260313_035448
|
Dynamic Type-Safe Plugin Registry Benchmark
This benchmark tests the ability to construct a robust, type-safe plugin architecture using Python's standard library. It evaluates the implementation of dynamic module loading, runtime type checking using `typing.Protocol`, and filesystem...
|
03-13 03:54 | Success | - | |
|
exp_cr_10.3390_math12182941_20260313_035154
|
Backfill Candidate cr_10.3390_math12182941
**Architecture:** Proposes a weighted-average ensemble of five heterogeneous Arabic Transformers (AraBERT, MARBERT, AraELECTRA, AraGPT2, ARBERT). **Memory Footprint:** **Critical Bottleneck.** Concurrently loading five distinct encoder/deco...
|
03-13 03:51 | Success | - | |
|
exp_2506.16623v1_20260313_035042
|
Backfill Candidate 2506.16623v1
**Architecture** The framework utilizes a **frontier-based exploration strategy** guided by a Vision-Language Model (VLM). Instead of simple embedding similarity, it employs **dynamic history-augmented prompting**. The system injects a text...
|
03-13 03:50 | Success | - | |
|
exp_pytrain.20260313034627.068_20260313_034739
|
Robust Dynamic Plugin Loader Benchmark
This benchmark evaluates a Python implementation of a robust plugin architecture using `importlib` for dynamic discovery and `typing.Protocol` for structural subtyping. **Objective:** Create a system that: 1. Dynamically generates a temporary en...
|
03-13 03:47 | Success | - | |
|
exp_oa_W4415248384_20260313_034239
|
Benchmark: Transformer vs. Mamba (SSM) Efficiency on 8GB Constraints
**Subject:** Analysis of *A Comprehensive Survey of Large AI Models for Future Communications* This survey evaluates Large AI Models (LAMs) for 6G, reviewing **Transformers, Diffusion, and Mamba** architectures. Key takeaways for the ARES 8...
|
03-13 03:42 | Success | - | |
|
exp_hf_2603.09877_20260313_034113
|
Benchmark for InternVL-U Architecture Simulation
**Architecture:** InternVL-U utilizes a hybrid "decoupled" architecture, merging a Multimodal Large Language Model (MLLM) for understanding/reasoning with a specialized Multimodal Diffusion Transformer (MMDiT) head for visual generation and...
|
03-13 03:41 | Success | - | |
|
exp_2309.14735v2_20260313_034009
|
Backfill Candidate 2309.14735v2
**Paper Classification:** Comparative Survey / Evaluation (Not a new architecture proposal). * **Architecture:** Benchmarks existing "AILQA paradigms" against OpenAI GPT (API-based baseline). No specific local model architecture (e.g., Enco...
|
03-13 03:40 | Success | - | |
|
exp_pytrain.20260313033558.067_20260313_033723
|
Generic Data Pipeline with Protocol Registration
This benchmark evaluates an autonomous coding system's ability to architect a modular, type-safe data processing pipeline using Python's advanced `typing` features (`Protocol`, `Generic`, `TypeVar`) and packaging standards (`__all__`). Obje...
|
03-13 03:37 | Success | - | |
|
exp_2309.09070v1_20260313_033309
|
Legal QA Hybrid Retrieval Benchmark (L2R + PLM)
**Architecture:** Hybrid system combining classical statistical models and Pre-trained Language Models (PLMs) for legal domain QA. **Retrieval Architecture:** Employs a **Learning-to-Rank (L2R)** approach to consolidate features from variou...
|
03-13 03:33 | Success | - | |
|
exp_2309.08187v1_20260313_033118
|
Benchmark: Hybrid Retrieval with Encoded Summarization (2309.08187v1)
**Architecture:** Hybrid retrieval system combining lexical (sparse) and latent (dense) features via a deep neural phrase-scoring framework. **Retrieval Strategy:** **Encoded Summarization**. The method compresses full legal documents into...
|
03-13 03:31 | Success | - | |
|
exp_pytrain.20260313032651.066_20260313_032756
|
Strictly Typed Plugin System with Semantic Versioning
**Overview:** This benchmark validates the hypothesis that enforcing structural sub-typing using `typing.Protocol` and runtime `inspect` validation creates a more robust plugin architecture than implicit duck-typing. The `ComponentRegistry` dyna...
|
03-13 03:27 | Success | - | |
|
exp_2403.16702v1_20260313_031453
|
Bi-Encoder Code Search Benchmark (Dual-Encoder)
**Architecture & Feasibility:** The paper proposes a **Dual-Encoder (Bi-Encoder)** architecture using modality-agnostic contrastive pre-training to align natural language queries with code representations. This is highly feasible for 8GB VR...
|
03-13 03:25 | Success | - | |
|
exp_pytrain.20260313031141.065_20260313_031222
|
Dynamic Type-Safe Plugin Loader Benchmark
This coding drill evaluates the ability to implement a robust, type-safe plugin system using only the Python standard library. The focus is on dynamic module generation, structural subtyping (Protocols), and generic type safety. **Features:** -...
|
03-13 03:12 | Success | - | |
|
exp_2409.09010v1_20260313_030945
|
Backfill Candidate 2409.09010v1
**Architecture:** Hybrid Graph-Text RAG pipeline (Retrieve-then-Read). **Retrieval Architecture:** Dual-source extraction combining structured Knowledge Graphs (DBLP, SemOpenAlex) and unstructured text (Wikipedia). **Indexing/Chunking:** Ab...
|
03-13 03:09 | Success | - | |
|
exp_2512.13511v1_20260313_030733
|
TARA: Dual-Encoder Video-Text Retrieval Benchmark
**Architecture:** TARA adapts frozen MLLMs (e.g., LLaVA) into video-text embedding models by adding a trainable projection layer. It is trained exclusively on synthetic caption data, eliminating the need for real video datasets. **Retrieval...
|
03-13 03:07 | Success | - | |
|
exp_pytrain.20260313030319.064_20260313_030441
|
Strictly-Typed Data Pipeline CLI Benchmark
**Overview:** This benchmark defines a coding drill focused on **Strict Typing** and **Interface Segregation** using Python's `typing.Protocol` and `argparse`. The goal is to implement a text processing pipeline where components adhere to a stri...
|
03-13 03:04 | Success | - | |
|
exp_2512.13001v1_20260313_030054
|
Backfill Candidate 2512.13001v1
This paper validates the **superiority of Text Embedding Models (TEMs) over Large Language Models (LLMs)** for training-free cold-start recommendation (TFCSR). * **Architecture:** Benchmarks a **TEM-based retrieval approach** (bi-encoder ve...
|
03-13 03:00 | Success | - | |
|
exp_pytrain.20260313025506.063_20260313_025618
|
Structural Subtyping Dispatcher Benchmark
**Objective:** This benchmark evaluates the implementation of a robust CLI dispatcher using Python's `typing.Protocol` for structural subtyping. The architecture ensures that the core dispatcher remains agnostic to concrete command implementatio...
|
03-13 02:56 | Success | - | |
|
exp_2512.14856v2_20260313_025206
|
Backfill Candidate 2512.14856v2
**Architecture:** T5Gemma 2 repurposes the decoder-only Gemma 3 into an **encoder-decoder** architecture via UL2 adaptation, specifically optimized for multimodal and long-context tasks. **Memory Footprint:** The model prioritizes VRAM effi...
|
03-13 02:52 | Success | - | |
|
exp_cr_10.24252_literatify.v5i1.44458_20260313_025015
|
Vector Space Model (VSM) Benchmark
**Report: Literature Review on Vector Space Models (VSM)** **Type:** Literature Review (Traditional Information Retrieval) **Relevance:** Low (Non-Neural), but applicable to RAG preprocessing. * **Architecture:** Analyzes the classic **Vect...
|
03-13 02:50 | Success | - | |
|
exp_pytrain.20260313024535.062_20260313_024713
|
Modern Generic Cache with PEP 695 and Module Hygiene
**Objective:** This coding drill validates the implementation of a modern, thread-safe Least Recently Used (LRU) Cache utilizing **PEP 695 Type Parameter Syntax** (Python 3.12+) and strict module packaging standards. **Key Concepts:** * **PEP 695 (Ty...
|
03-13 02:47 | Success | - | |
|
exp_2403.18093v1_20260313_024223
|
Benchmark: 3-Stage Retrieval-Augmented Generation (RAG) Pipeline
**Architecture:** A sequential 3-stage pipeline: Sparse Retrieval (BM25) $\rightarrow$ Neural Re-ranking (BERT) $\rightarrow$ Generative Retrieval (LLM Prompting). **Memory Footprint:** Mixed. The BM25 and BERT stages are low-VRAM and feasi...
|
03-13 02:44 | Success | - | |
|
exp_pytrain.20260313023843.061_20260313_023939
|
Dynamic Plugin Loader with Protocol Validation
**Overview:** This coding drill demonstrates the use of Python's `importlib` and `typing.Protocol` to build a robust, dynamic plugin system. **Objective:** Construct a command-line script that acts as a plugin loader: 1. **Define Protocol**: Use `typ...
|
03-13 02:39 | Success | - | |
|
exp_hf_2603.08561_20260313_022704
|
RetroAgent Context-Memory Benchmark
**Architecture:** RetroAgent introduces an online RL framework utilizing "hindsight self-reflection" to generate dual intrinsic feedback: numerical rewards for tracking exploration and linguistic lessons stored in an explicit memory buffer....
|
03-13 02:37 | Success | - | |
|
exp_2403.16218v4_20260313_022530
|
This benchmark evaluates the efficacy of the "Coverage-Guided Iterative Generation" architecture described in the subjec...
**Architecture:** Iterative "Test-Analyze-Refine" loop. Uses a standard LLM coupled with a Python interpreter and coverage analyzer (e.g., `coverage.py`). It generates tests, executes them to identify uncovered lines/branches, and feeds the...
|
03-13 02:25 | Success | - | |
|
exp_2403.13468v1_20260313_022442
|
Backfill Candidate 2403.13468v1
**Architecture:** Uses a Mixture-of-Experts (MoE) framework comprising a neural gating network (trained on Wikipedia) and multiple specialized domain experts. **Retrieval Architecture:** Dense Bi-Encoder retrieval. The gating mechanism clas...
|
03-13 02:24 | Success | - | |
|
exp_pytrain.20260313022129.060_20260313_022242
|
Runtime Type-Safe Plugin Loader Benchmark
This benchmark tests the ability to construct a robust, type-safe plugin system using Python's standard library, mirroring the module discovery and registration patterns found in large-scale frameworks like PyTorch or LitGPT. **Objective:** Crea...
|
03-13 02:22 | Success | - | |
|
exp_2409.09717v1_20260313_020953
|
This benchmark focuses on the core bottleneck identified in the abstract: the multi-turn latency introduced by the "Expe...
**Architecture:** Embodied agent framework utilizing function-calling to interface with ATC simulators, augmented by a retrieval mechanism. **Retrieval Architecture:** "Experience Library" (Vector DB). **Strategy:** Stores synthesized knowl...
|
03-13 02:19 | Success | - | |
|
exp_2403.18105v2_20260313_020848
|
README: Educational LLM Tutoring Benchmark
**Assessment: Low Technical Relevance for ARES 8GB Roadmap** * **Architecture:** N/A. This is a survey paper reviewing existing educational applications (tutoring, adaptive learning) and datasets. It does not propose a new model architectur...
|
03-13 02:09 | Success | - | |
|
exp_2403.18063v2_20260313_020737
|
Heracles: High-Resolution Vision Model Benchmark
**Architecture** Heracles is a hybrid model combining a local SSM (using localized convolutions), a global SSM (leveraging a Hartley kernel), and an attention-based token interaction module. This design mitigates the instability of pure SSM...
|
03-13 02:07 | Success | - | |
|
exp_pytrain.20260313020419.059_20260313_020457
|
Typed Plugin Registry with Protocol Enforcement
This coding drill benchmarks a robust, dependency-injection style registry system built entirely with Python's standard library. It leverages structural sub-typing via `typing.Protocol` and Generics (`typing.TypeVar`) to ensure type safety...
|
03-13 02:05 | Success | - | |
|
exp_2303.16780v1_20260313_020242
|
Thistle VDB Benchmark
**Architecture & Retrieval Strategy:** Thistle is a **Rust-based vector database** designed for high-performance, local semantic search. It functions as the retrieval backbone for RAG systems, utilizing standard Approximate Nearest Neighbor...
|
03-13 02:02 | Success | - | |
|
exp_2303.16780v1_20260313_020126
|
Benchmark: Thistle Rust-Based VDB Integration
**Architecture & Retrieval Strategy:** Thistle is a **Rust-based vector database** designed for high-performance, local semantic search. It functions as the retrieval backbone for RAG systems, utilizing standard Approximate Nearest Neighbor...
|
03-13 02:01 | Success | - | |
|
exp_2309.12158v1_20260313_020019
|
Benchmark: Cross-Modal Audio-Sheet Music Retrieval (SSM Dual-Encoder)
**Paper Type:** Survey/Review on Cross-Modal Retrieval. **Architecture:** The paper evaluates **Cross-Modal Deep Learning** architectures, specifically **Dual-Encoders** (Siamese networks) that learn a **Joint Embedding Space** to link audi...
|
03-13 02:00 | Success | - | |
|
exp_pytrain.20260313015742.058_20260313_015822
|
Type-Safe Plugin Architecture Benchmark
This project implements a robust, type-safe plugin architecture using Python's `typing.Protocol` and Generics. It demonstrates structural subtyping (duck typing with static type hints) to enforce interface contracts without explicit inherit...
|
03-13 01:58 | Success | - | |
|
exp_2309.11087v6_20260313_015600
|
Backfill Candidate 2309.11087v6
**Architecture:** Reference-Free DNA Transformer encoder utilizing contrastive loss to project reads and reference fragments into a shared vector space. **Retrieval Strategy (RAG-oriented):** * **Architecture:** Approximate Nearest Neighbor...
|
03-13 01:56 | Success | - | |
|
exp_2403.12393v1_20260313_015437
|
Backfill Candidate 2403.12393v1
**Architecture:** Dr3 is an inference wrapper, not a standalone model. It adds a **Discriminator** module to detect off-topic answers and a **Corrector** loop that refines outputs backward (Re-Compose $\rightarrow$ Re-Solve $\rightarrow$ Re...
|
03-13 01:54 | Success | - | |
|
exp_2409.12959v2_20260313_015323
|
Benchmark: MMSearch-Engine Pipeline (Candidate 2409.12959v2)
**Assessment:** The paper introduces `MMSearch-Engine`, a retrieval-augmented generation (RAG) pipeline designed to empower Large Multimodal Models (LMMs) with search capabilities, plus the `MMSearch` benchmark. * **Architecture & RAG Strat...
|
03-13 01:53 | Success | - | |
|
exp_2409.08788v1_20260313_015243
|
Backfill Candidate 2409.08788v1
**Architecture:** A dual-stage pipeline consisting of a self-supervised ECG encoder (generating fixed-dimensional embeddings from raw time-series data) coupled with an off-the-shelf LLM for report synthesis and QA. **RAG Strategy:** * **Ret...
|
03-13 01:52 | Success | - | |
|
exp_pytrain.20260313014950.057_20260313_015042
|
Dynamic Package Construction and Type Verification
**Overview:** This benchmark evaluates an agent's ability to programmatically generate a valid Python package structure, write strictly typed Python code into it, and subsequently verify the structure and type correctness using reflection and dy...
|
03-13 01:50 | Success | - | |
|
exp_2403.18128v1_20260313_014814
|
Backfill Candidate 2403.18128v1
**Architecture:** HealthGAT utilizes a hierarchical Graph Attention Network (GAT) architecture. It transforms raw Electronic Health Records (EHR) into a graph structure, employing iterative refinement layers to update medical code embedding...
|
03-13 01:48 | Success | - | |
|
exp_2409.14556v2_20260313_014724
|
Backfill Candidate 2409.14556v2
**Architecture:** RACOON utilizes a Retrieval-Augmented Generation (RAG) pipeline, substituting standard vector retrieval with Knowledge Graph (KG) querying. It dynamically retrieves semantic context and constraints from the KG to augment t...
|
03-13 01:47 | Success | - | |
|
exp_hf_2603.04597_20260313_014616
|
Benchmark: GOLF (Group-level Natural Language Feedback)
**Paper Analysis: GOLF (Group-level Natural Language Feedback)** **Architecture:** GOLF introduces a unified RL framework that moves beyond scalar rewards by leveraging group-level natural language feedback. It aggregates two distinct sourc...
|
03-13 01:46 | Success | - | |
|
exp_2409.13920v1_20260313_014525
|
Backfill Candidate 2409.13920v1
**Architecture:** ByT5 (Byte-level Text-to-Text Transfer Transformer). An encoder-decoder model fine-tuned for Sanskrit morphology (segmentation, lemmatization, POS tagging). It processes raw bytes, eliminating the need for tokenizers and h...
|
03-13 01:45 | Success | - | |
|
exp_pytrain.20260313014313.056_20260313_014339
|
Dynamic Plugin Loader with Strict Protocol Validation
**Overview:** This benchmark evaluates a system's capability to dynamically construct a Python package ecosystem at runtime, load modules via `importlib`, and enforce strict structural typing using `typing.Protocol`. **Objective:** The `PluginManager...
|
03-13 01:43 | Success | - | |
|
exp_2506.15594v1_20260313_014100
|
Backfill Candidate 2506.15594v1
**WikiMixQA** is a **benchmark** evaluating **Visual RAG** capabilities, comprising 1,000 multimodal questions over tables and charts from 4,000 long Wikipedia pages. * **Retrieval Architecture:** The benchmark evaluates models in a "Retrie...
|
03-13 01:41 | Success | - | |
|
exp_2303.12998v1_20260313_013906
|
This benchmark evaluates the local feasibility of the candidate "Universal NFT Vector Database" (2303.12998v1). The orig...
**Architecture:** Modular, cloud-centered framework utilizing vector embeddings to represent NFTs (ERC-721) for similarity matching and duplicate detection. **Retrieval Specifics:** * **Architecture:** Universal NFT Vector Database. * **Ind...
|
03-13 01:39 | Success | - | |
|
exp_pytrain.20260313013540.055_20260313_013627
|
Generic Type-Safe Configuration Store
This benchmark evaluates the implementation of a generic, type-safe configuration store using modern Python 3.12+ features. **Features:** * **PEP 695 Support:** Uses the new type parameter syntax `class ConfigStore[T]:` for cleaner, more maintai...
|
03-13 01:36 | Success | - | |
|
exp_cr_10.1609_aaai.v38i20.30232_20260313_013156
|
RAG Legal QA Benchmark (8GB VRAM Constraint)
**Architecture:** An end-to-end **RAG ("retrieve-then-read")** pipeline designed for long-form French legal QA, utilizing the LLeQA dataset. **Retrieval Strategy:** The system retrieves "pertinent legal provisions" (statutory text) to groun...
|
03-13 01:33 | Success | - | |
|
exp_cr_10.3390_app14062613_20260313_013050
|
Sparse RAG Pipeline: CPU-Bound Lucene Simulation
**Architecture:** Sparse RAG pipeline utilizing Apache Lucene for indexing 26.5M PubMed articles. **Retrieval & Chunking:** Employs Query Likelihood with Dirichlet Smoothing (outperforming BM25) on full-text documents. **Reranking & Citatio...
|
03-13 01:30 | Success | - | |
|
exp_pytrain.20260313012805.054_20260313_012837
|
Strictly Typed PyProject Metadata Builder
This benchmark evaluates a Python engineer's ability to utilize advanced static typing constructs to define robust data structures for packaging configurations. **Overview:** Python's dynamic nature allows for flexibility, but in complex systems...
|
03-13 01:28 | Success | - | |
|
exp_cr_10.1167_tvst.14.9.18_20260313_012602
|
Ophthalmology RAG Benchmark
**Paper Summary: Advancing Question-Answering in Ophthalmology** This study benchmarks open-source LLMs (Llama 2, Mistral) against proprietary models (GPT-3.5/4) within a Retrieval-Augmented Generation (RAG) framework for ophthalmology. * *...
|
03-13 01:26 | Success | - | |
|
exp_2506.12733v1_20260313_012447
|
Learning to Fuse: Modality-Aware Adaptive Scheduling (MA-AFS)
**Architecture:** MA-AFS introduces a lightweight neural scheduler that dynamically modulates fusion weights for multimodal encoders (e.g., CLIP, BLIP). It predicts instance-specific weights based on visual/textual entropy and cross-modal a...
|
03-13 01:24 | Success | - | |
|
exp_cr_10.1128_jcm.01624-24_20260313_012354
|
Retrieval-augmented generation salvages poor performance from large language models in answering microbiology-specific m...
**Assessment:** This paper validates the core 8GB VRAM hypothesis: *Domain-specific RAG enables a 7B model (Llama-2) to significantly outperform GPT-4.* It demonstrates that retrieval quality is more critical than parameter count for specia...
|
03-13 01:23 | Success | - | |
|
exp_pytrain.20260313012118.053_20260313_012158
|
Dynamic Type-Validated Plugin Registry
**Overview:** This benchmark tests the ability to design a robust, type-safe plugin architecture using Python's standard library. The objective is to simulate an environment where "plugins" are dynamically created as isolated modules, discovered...
|
03-13 01:22 | Success | - | |
|
exp_2409.13483v1_20260313_011922
|
Speech-Based Open-Domain QA Benchmark
This paper proposes an **ASR-free Multimodal Dense Retriever** for spoken open-domain QA, bypassing the error-prone ASR transcription step. **Architecture:** Utilizes a **Dual-Encoder** setup: a frozen speech encoder (e.g., wav2vec 2.0) and...
|
03-13 01:19 | Success | - | |
|
exp_2403.11335v1_20260313_011742
|
ConvSDG: Session Data Generation for Conversational Search
**ConvSDG** is a data-centric training framework utilizing offline LLMs to generate synthetic multi-turn sessions, thereby improving **Conversational Dense Retrievers** (Bi-encoders). * **Retrieval Architecture:** Dense Bi-encoder (Query-Do...
|
03-13 01:17 | Success | - | |
|
exp_2403.11671v1_20260313_011644
|
HDLdebugger: Streamlining HDL debugging with Large Language Models
**Architecture:** HDLdebugger is a retrieval-augmented framework designed for Hardware Description Language (HDL) debugging. It integrates a reverse-engineering data generator, a search engine for context retrieval, and a fine-tuned Large L...
|
03-13 01:16 | Success | - | |
|
exp_pytrain.20260313011353.052_20260313_011413
|
Type-Safe Plugin Loader for Inference Models
**Overview:** This coding drill challenges you to construct a robust, framework-agnostic model loading system in Python. The goal is to implement a `ModelRegistry` that enforces strict contracts on "inference plugins" without requiring them to i...
|
03-13 01:14 | Success | - | |
|
exp_2403.17611v1_20260313_011211
|
DoTTeR Benchmark: Table-Text Retrieval Evaluation
**Architecture:** DoTTeR utilizes a **dense retrieval** framework augmented with a specialized **Rank-Aware Column Encoder**. It employs a false-positive detection model (during training) to denoise data and integrates table-level ranking i...
|
03-13 01:12 | Success | - | |
|
exp_2309.08469v2_20260313_011116
|
Silver Retriever Benchmark
**Architecture:** Silver Retriever utilizes a **Dense Bi-Encoder** architecture (query and passage encoded independently) based on a Polish BERT variant (likely HerBERT or similar), optimized for semantic vector matching. **Memory & Inferen...
|
03-13 01:11 | Success | - | |
|
exp_2309.08788v2_20260313_011037
|
BioinspiredLLM Benchmarking Suite
**Architecture & Feasibility:** BioinspiredLLM is an open-source autoregressive transformer fine-tuned on a corpus of ~1,000 peer-reviewed articles. **Critical Gap:** The abstract does not specify the base model parameter count (e.g., 7B vs...
|
03-13 01:10 | Success | - | |
|
exp_pytrain.20260313010650.051_20260313_010722
|
Stdlib ZipApp Builder with Protocol Enforcement
Overview: This benchmark tests the ability to programmatically construct a Python application using only the standard library. The task involves generating a virtual filesystem, enforcing a `typing.Protocol` interface for a data processing a...
|
03-13 01:07 | Success | - | |
|
exp_2512.14944v1_20260313_010442
|
Puzzle Curriculum GRPO (PC-GRPO) Benchmark
**Architecture & Methodology** PC-GRPO is a post-training reinforcement learning algorithm for VLMs (tested on Qwen-3B/7B). It eliminates external verifiers by using self-supervised "puzzle" environments (PatchFit, Rotation, Jigsaw) to gene...
|
03-13 01:04 | Success | - | |
|
exp_2512.11490v1_20260313_010337
|
VLM2GeoVec: Toward Universal Multimodal Embeddings for Remote Sensing
**Architecture:** Single-encoder Vision-Language Model (VLM) trained contrastively to embed interleaved inputs (images, text, bounding boxes, coordinates) into a unified vector space. **Retrieval Architecture:** **Single-encoder contrastive...
|
03-13 01:03 | Success | - | |
|
exp_2512.12818v1_20260313_010251
|
Hindsight: Agent Memory Benchmark
**Architecture:** Hindsight replaces standard vector retrieval with a structured "first-class" substrate comprising four logical networks (world facts, agent experiences, entity summaries, beliefs) and a recursive "reflection" layer that up...
|
03-13 01:03 | Success | - | |
|
exp_pytrain.20260313005911.050_20260313_010003
|
Python Skill Fallback
Title: Typed Asynchronous Data Ingestion Framework - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-13 01:00 | Success | - | |
|
exp_2506.14429v3_20260313_005718
|
LongLLaDA Benchmark
**Paper:** LongLLaDA: Unlocking Long Context Capabilities in Diffusion LLMs **Architecture:** Utilizes Diffusion LLMs (LLaDA) enhanced with NTK-aware interpolation (RoPE scaling) for context extrapolation. **Memory Footprint:** **High Poten...
|
03-13 00:57 | Success | - | |
|
exp_2506.15925v1_20260313_005605
|
This benchmark evaluates the "Reranking-based Generation" concept. It compares a standard Zero-Shot generation baseline...
**Architecture:** This paper proposes a **Reranking-based Generation** pipeline. It diverges from single-pass inference by first generating multiple summary candidates (e.g., via zero-shot sampling) and then employing a separate **LLM-based...
|
03-13 00:56 | Success | - | |
|
exp_cr_10.69978_rebicte.v11i.210_20260313_005456
|
Benchmark: Neural Network Indexing vs Classical B-Tree
**Architecture/Retrieval:** Proposes a **Learned Index Model**, replacing traditional structures (B-Trees, Hash) with a Neural Network that acts as a mapping function. The NN approximates the Cumulative Distribution Function (CDF) of data t...
|
03-13 00:55 | Success | - | |
|
exp_pytrain.20260313005152.049_20260313_005231
|
Dynamic Type-Checked Plugin Loader
Overview: This benchmark tests the ability to design a robust plugin architecture using Python's `importlib` for dynamic module loading and `typing.Protocol` for structural subtyping (duck typing with static-like hints). The Challenge: Imple...
|
03-13 00:52 | Success | - | |
|
exp_2409.11901v1_20260313_004956
|
LLMs + Persona-Plug = Personalized LLMs
**Architecture:** Proposes **Persona-Plug**, consisting of a frozen base LLM augmented by a lightweight, trainable **User Embedder**. This module aggregates all historical user contexts to generate a single, dense user-specific embedding ve...
|
03-13 00:50 | Success | - | |
|
exp_cr_10.3390_app14062506_20260313_004850
|
Sensor Data Retrieval Benchmark
**Architecture:** A dual-stage pipeline comprising: (1) an LLM-based ETL component that normalizes unstructured sensor data into FAIR-compliant formats (offline), and (2) a retrieval component that creates semantic embeddings of entire tabu...
|
03-13 00:49 | Success | - | |
|
exp_2403.17007v1_20260313_004753
|
DreamLIP Benchmark Simulation
**Architecture:** Standard dual-encoder (Vision Transformer + Text Transformer) utilizing a contrastive learning framework. It introduces a "grouping loss" and dynamic sub-caption sampling during training to align specific text chunks with...
|
03-13 00:48 | Success | - | |
|
exp_2403.17998v1_20260313_004708
|
T-MASS: Text Is MASS Benchmark
**Architecture:** T-MASS replaces static text embeddings with stochastic distributions ("text masses") within a joint text-video embedding space. It employs a **similarity-aware radius module** to dynamically scale the semantic range of the...
|
03-13 00:47 | Success | - | |
|
exp_pytrain.20260313004435.048_20260313_004509
|
Type-Safe Plugin Loader for Namespace Packages
This benchmark tests the ability to construct a robust, type-safe plugin architecture using Python's standard library. The focus is on leveraging `typing.Protocol` for interface definition, `typing.Generic` for container safety, and `import...
|
03-13 00:45 | Success | - | |
|
exp_2309.07610v1_20260313_004240
|
Feature Engineering in Learning-to-Rank for Community Question Answering Task
**Architecture:** A hybrid Learning-to-Rank (LTR) framework that fuses sparse lexical features (BM25, TF-IDF) with dense semantic features derived from a BERT encoder. It explicitly utilizes features extracted from both questions and answer...
|
03-13 00:42 | Success | - | |
|
exp_2309.10954v2_20260313_004141
|
In-Context Learning for Text Classification with Many Labels
**Architecture:** A retrieval-augmented ICL pipeline combining a **pre-trained dense retrieval model** with frozen LLMs (OPT, LLaMA). **RAG Specifics:** * **Retrieval Architecture:** Dense retrieval (bi-encoder). * **Strategy:** **Label Spa...
|
03-13 00:41 | Success | - | |
|
exp_2309.12669v1_20260313_004038
|
HRoT Benchmark
**Architecture & Retrieval Strategy:** HRoT is a prompt-engineering framework built on a **Retriever-Reader** pipeline. It employs a **Retrieval of Thought (RoT)** mechanism, effectively treating reasoning retrieval as a task to fetch spec...
|
03-13 00:41 | Success | - | |
|
exp_2309.14323v1_20260313_003944
|
Cluster Language Model Benchmark
**Architecture:** Proposes replacing global bi-encoders with **Cluster Language Models (CLMs)**. **Retrieval Strategy:** * **Indexing/Chunking:** Uses **K-Means** to cluster queries based on semantic similarity. * **Method:** Fine-tunes a d...
|
03-13 00:40 | Success | - | |
|
exp_pytrain.20260313003711.047_20260313_003745
|
Strict Generic Registry & Packaging Benchmark
This benchmark tests the ability to implement a robust, type-safe plugin registry using Python's advanced typing features (`Protocol`, `Generic`, `TypeVar`, `runtime_checkable`) within a simulated package structure (`__all__`). Drill Instru...
|
03-13 00:37 | Success | - | |
|
exp_2303.13009v1_20260313_003530
|
MELTR: Meta Loss Transformer for Learning to Fine-tune Video Foundation Models
**Architecture:** MELTR is a **training-phase** plug-in module utilizing a Transformer network and bi-level optimization (Approximate Implicit Differentiation) to dynamically combine multiple loss functions for fine-tuning video foundation...
|
03-13 00:35 | Success | - | |
|
exp_2303.14617v1_20260313_003433
|
Neural Graph Reasoning (NGDB) Benchmark
This paper proposes Neural Graph Databases (NGDB) for Complex Logical Query Answering (CLQA), shifting retrieval from structural indices to latent reasoning. * **Architecture:** NGDB separates into a **Neural Graph Storage** (Graph/Feature/...
|
03-13 00:34 | Success | - | |
|
exp_hf_2603.07392_20260313_003336
|
Can Large Language Models Keep Up? Benchmarking Online Adaptation to Continual Knowledge Streams
**Assessment: OAKS Benchmark on Continual Knowledge Streams** * **Architecture:** The paper introduces the **OAKS benchmark** to stress-test LLMs on evolving facts within streaming contexts. It evaluates 14 models, including base LLMs and *...
|
03-13 00:33 | Success | - | |
|
exp_2512.14865v1_20260313_003227
|
Audio MultiChallenge Benchmark
**Paper:** Audio MultiChallenge (Benchmark) **Architecture & Scope:** This paper introduces **Audio MultiChallenge**, a benchmark for End-to-End (E2E) Spoken Dialogue Systems (SDS) that process raw audio without intermediate transcription....
|
03-13 00:32 | Success | - | |
|
exp_pytrain.20260313002953.046_20260313_003035
|
Strict Module Interface Validator
Overview: This benchmark simulates the initialization routine of a high-performance library (like vLLM or Diffusers). It tests the engine's ability to strictly enforce interface compliance before allowing a module to be loaded into the activ...
|
03-13 00:30 | Success | - | |
|
exp_2512.14930v1_20260313_002809
|
RMPMAB Benchmark: High-Content Microscopy Simulation
**Architecture:** Proposes a Restless Multi-Process Multi-Armed Bandit (RMPMAB) framework. Instead of deep neural networks, it models imaging regions as ensembles of Markov chains to capture biological heterogeneity. It relies on scalable W...
|
03-13 00:28 | Success | - | |
|
exp_oa_W4404354530_20260313_002701
|
Small Language Model (SLM) Efficiency Benchmark
This survey establishes Small Language Models (SLMs) as the optimal solution for hardware-constrained inference (e.g., 8GB VRAM). It redefines SLMs by capability and resource suitability, distinguishing them from massive LLMs like Llama-3.1...
|
03-13 00:27 | Success | - | |
|
exp_cr_10.1609_aaai.v38i16.29765_20260313_002602
|
What Makes Quantization for Large Language Model Hard? An Empirical Study from the Lens of Perturbation
**Architecture:** Introduces a "perturbation lens" framework, analyzing quantization error as additive noise to weights and activations. This theory supports a non-uniform quantization scheme that adapts grid spacing to activation sensitivi...
|
03-13 00:26 | Success | - | |
|
exp_pytrain.20260313002319.045_20260313_002354
|
Strictly Typed Protocol & Resource Packager
This benchmark evaluates the implementation of a strictly-typed, dependency-free resource packager. It verifies the correct usage of modern Python typing constructs, specifically `Protocol`, `TypeGuard`, and `TypedDict`, while ensuring perf...
|
03-13 00:24 | Success | - | |
|
exp_2309.16783v2_20260313_002156
|
Photonic Image Segmentation Benchmark
**Summary: Photonic Accelerators for Image Segmentation** * **Architecture:** The paper evaluates image segmentation DNNs adapted for analog photonic chips. It identifies that specific architectures (likely those with noise-resilient struct...
|
03-13 00:22 | Success | - | |
|
exp_oa_W4416768581_20260313_002047
|
This benchmark implements a "Deep Research" agent architecture based on the systematic survey provided. It decomposes a...
**Paper:** Deep Research: A Systematic Survey **Assessment:** Conceptual Framework / Agentic Workflow **Architecture:** Proposes a "Deep Research" agentic framework with four components: **Query Planning**, **Information Acquisition** (tool...
|
03-13 00:21 | Success | - | |
|
exp_2512.10435v1_20260313_001955
|
SRAP: Semantic Reconstruction of Adversarial Plagiarism Benchmark
**Paper:** Semantic Reconstruction of Adversarial Plagiarism (SRAP) **Summary:** **Architecture & Retrieval Strategy** SRAP utilizes a two-stage pipeline: 1. **Anomaly Detection:** A fine-tuned SciBERT (domain-specific MLM) calculates token...
|
03-13 00:20 | Success | - | |
|
exp_2512.15766v1_20260313_001913
|
LOOPRAG: Enhancing Loop Transformation Optimization with Retrieval-Augmented Large Language Models
**Architecture:** LOOPRAG combines a Large Language Model (LLM) with a **parameter-driven retrieval system** and a **feedback-based iterative mechanism** that utilizes compilation and testing results for verification. **Retrieval Specifics:...
|
03-13 00:19 | Success | - | |
|
exp_pytrain.20260313001653.044_20260313_001734
|
Python Skill Fallback
Title: Structural Subtyping and Dynamic Module Loading - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-13 00:17 | Success | - | |
|
exp_2512.11509v2_20260313_001500
|
This repository provides a lightweight, reproducible benchmark designed to evaluate the computational trade-offs of thre...
**Paper Summary: Does Less Hallucination Mean Less Creativity?** This study benchmarks hallucination mitigation methods—**Chain of Verification (CoVe)**, **Decoding by Contrasting Layers (DoLa)**, and **RAG**—across LLaMA, Qwen, and Mistral...
|
03-13 00:15 | Success | - | |
|
exp_2512.12084v1_20260313_001359
|
FloodSQL-Bench
**FloodSQL-Bench** is a benchmark for evaluating Text-to-SQL systems on complex, multi-table geospatial queries involving spatial and hybrid joins within a flood management domain. * **Architecture:** It assesses RAG-enhanced LLMs rather th...
|
03-13 00:14 | Success | - | |
|
exp_2512.12281v1_20260313_001309
|
Cognitive-YOLO Architecture Synthesis Benchmark
**Architecture:** Cognitive-YOLO synthesizes YOLO-style object detection networks defined in a Neural Architecture Description Language (NADL), instantiated via a compiler. **RAG & Retrieval:** The LLM uses **RAG** to retrieve SOTA detectio...
|
03-13 00:13 | Success | - | |
|
exp_2512.12885v1_20260313_001224
|
SignRAG Pipeline Benchmark
**Architecture:** A dual-stage generative pipeline. An input image is captioned by a Vision Language Model (VLM). This text query retrieves candidates from a vector database, which a Large Language Model (LLM) synthesizes for final classifi...
|
03-13 00:12 | Success | - | |
|
exp_pytrain.20260313001008.043_20260313_001046
|
Asynchronous Type-Safe Asset Manifestor
Overview: This benchmark evaluates a Python CLI tool's ability to strictly enforce static typing using `typing.TypedDict` and `TypeAlias`, while correctly implementing `asyncio` for concurrent file processing. The Challenge: The script (`mani...
|
03-13 00:10 | Success | - | |
|
exp_2512.13059v1_20260313_000935
|
An Open and Reproducible Deep Research Agent for Long-Form Question Answering
**Architecture:** Iterative agentic workflow combining an LLM controller with a live Open Web Search API for retrieval, reasoning, and synthesis. **RAG Strategy:** * **Retrieval:** Live Web Search API (no static vector database). * **Indexi...
|
03-13 00:09 | Success | - | |
|
exp_2512.13237v1_20260313_000804
|
Learning to Retrieve with Weakened Labels: Robust Training under Label Noise
**Architecture & Training:** This paper introduces a training methodology—**Label Weakening**—for standard Neural Encoders (Bi-Encoders) and Cross-Encoder rerankers. Instead of relying on single, potentially erroneous hard labels, the appro...
|
03-13 00:08 | Success | - | |
|
exp_2601.10718v1_20260313_000722
|
HPV AI Agent System Benchmark
**Architecture:** ReAct Agent with **RAG** and multi-tool orchestration across five heterogeneous sources. Includes a secondary pipeline for automated report generation (sentiment/synthesis). **RAG Details:** * **Retrieval:** Vector databas...
|
03-13 00:07 | Success | - | |
|
exp_2512.13573v2_20260313_000636
|
MMhops-R1: Multimodal Multi-hop Reasoning Benchmark
**Architecture:** MMhops-R1 is a multimodal Retrieval-Augmented Generation (mRAG) framework utilizing Reinforcement Learning (RL) to autonomously plan reasoning paths, generate targeted queries, and synthesize multi-level information. **Ret...
|
03-13 00:06 | Success | - | |
|
exp_2512.14766v1_20260313_000556
|
GR-Agent: Adaptive Graph Reasoning Benchmark
**Architecture:** GR-Agent formalizes Knowledge Graph Question Answering (KGQA) as an agentic interaction loop, utilizing an LLM controller with access to specific graph reasoning tools. **Retrieval Strategy:** The **retrieval architecture*...
|
03-13 00:06 | Success | - | |
|
exp_pytrain.20260313000314.042_20260313_000354
|
Robust Generic Service Container using PEP 695
This coding drill benchmark verifies the implementation of a generic `ServiceContainer` class utilizing **PEP 695 Type Parameter Syntax** (available in Python 3.12+). Features: * **Modern Type Syntax**: Uses the new `class ClassName[T]:` syn...
|
03-13 00:03 | Success | - | |
|
exp_2512.14792v1_20260313_000135
|
IaC Generation with LLMs: An Error Taxonomy and A Study on Configuration Knowledge Injection
**Architecture & Retrieval Strategy:** This paper implements a **Graph RAG** framework designed to enhance IaC (Terraform) generation. The retrieval architecture evolves from Naive RAG to a Knowledge Graph (KG) approach. It employs **semant...
|
03-13 00:01 | Success | - | |
|
exp_cr_10.3390_info16090804_20260313_000046
|
Secure Multifaceted-RAG (SecMulti-RAG) Benchmark
**Paper:** Secure Multifaceted-RAG (SecMulti-RAG) **Architecture & Retrieval:** A hybrid RAG framework utilizing three knowledge sources: internal documents, pre-generated "Expert Knowledge" (static cache), and on-demand external LLM genera...
|
03-13 00:01 | Success | - | |
|
exp_2506.12494v2_20260313_000001
|
FlexRAG: A Flexible and Comprehensive Framework for Retrieval-Augmented Generation
**Architecture:** Modular framework supporting **text-based, multimodal, and network-based** retrieval architectures. **RAG Specs:** Abstracts the retrieval pipeline; **chunking and indexing strategies** are user-defined (pluggable) rather...
|
03-13 00:00 | Success | - | |
|
exp_2506.13743v1_20260312_235908
|
LTRR: Learning To Rank Retrievers for LLMs
**Paper:** LTRR: Learning To Rank Retrievers for LLMs **Architecture:** LTRR implements a **Query Routing** strategy using a Learning-to-Rank (LTR) model (specifically XGBoost) to dynamically select the optimal retriever from a heterogeneou...
|
03-12 23:59 | Success | - | |
|
exp_pytrain.20260312235706.041_20260312_235725
|
Type-Safe Plugin Dispatcher Benchmark
This project demonstrates a robust, modular plugin architecture using Python's `typing.Protocol` and the `@runtime_checkable` decorator. It simulates the behavior of Python packaging entry points (like `setup.py` entry points or `pyproject.tom...
|
03-12 23:57 | Success | - | |
|
exp_2506.14084v1_20260312_235522
|
Lightweight Relevance Grader in RAG
**Architecture:** Fine-tuned Llama-3.2-1B deployed as a binary relevance grader (classifier) within a RAG pipeline to filter documents post-retrieval. **Memory Footprint:** Extreme efficiency. At 1B parameters, the model requires ~2GB VRAM...
|
03-12 23:55 | Success | - | |
|
exp_2506.14516v2_20260312_235418
|
Benchmark for G-RAG: Generation-Retrieval-Augmented Generation
**Architecture:** A "Generation-Retrieval-Augmented Generation" (G-RAG) pipeline. **Retrieval & Reranking Strategy:** The system employs **HyDE** (Hypothetical Document Embeddings), where the LLM generates a synthetic answer to augment retr...
|
03-12 23:54 | Success | - | |
|
exp_2506.14529v1_20260312_235333
|
Automated Decision-Making on Networks with LLMs through Knowledge-Guided Evolution
**Architecture:** LLMNet is an agentic AutoML framework, not a standalone inference model. It employs LLM agents to iteratively design and refine GNN architectures via a knowledge-guided evolutionary process. **RAG & Retrieval:** Uses RAG t...
|
03-12 23:53 | Success | - | |
|
exp_cr_10.3390_math13050856_20260312_235237
|
Benchmark Design: RAG Hallucination Mitigation via Grounded Constraints
**Paper Type:** Comprehensive Survey. **Architecture:** Reviews standard RAG frameworks (Retriever + LLM), analyzing hallucination sources (confabulations) in both retrieval (missed top-k) and generation (ignoring context) sub-tasks. **RAG...
|
03-12 23:52 | Success | - | |
|
exp_pytrain.20260312235018.040_20260312_235045
|
Dynamic Package Injection and Protocol Verification
This benchmark tests the ability to generate Python package structures dynamically at runtime, inject them into the Python interpreter path, and enforce strict type compliance using `typing.Protocol`. Objective: 1. **Dynamic Packaging**: Pro...
|
03-12 23:50 | Success | - | |
|
exp_cr_10.1038_s41746-025-01536-y_20260312_234843
|
Evaluating LLMs vs. RAG in Neurology: Benchmark Suite
**Evaluation Scope:** Clinical performance comparison of Base LLMs vs. Retrieval-Augmented Generation (RAG) in neurology. **Architecture:** * **RAG Variants:** "Document-enabled" (static guidelines) and "Online-enabled" (live web search). *...
|
03-12 23:49 | Success | - | |
|
exp_cr_10.1007_s10278-025-01483-w_20260312_234803
|
Evaluation of a Retrieval-Augmented Generation-Powered Chatbot for Pre-CT Informed Consent: a Prospective Comparative St...
**Status: Technical specifications omitted.** This paper is a clinical outcome study, not an engineering report. Essential architectural details for the ARES 8GB roadmap are **not disclosed**: * **Architecture:** The underlying LLM (e.g., L...
|
03-12 23:48 | Success | - | |
|
exp_2410.00005v1_20260312_234714
|
Benchmark: Meta KDD Cup '24 Winning Solution (CRAG System)
**Architecture:** Hybrid RAG system combining unstructured web search with structured Knowledge Graph (KG) access via tool use. **Retrieval Strategy:** Uses a "regularized API set" where a tuned LLM generates specific API calls to query the...
|
03-12 23:47 | Success | - | |
|
exp_2409.09510v2_20260312_234624
|
Personalization Benchmark: RAG vs. PEFT
**Summary** This paper evaluates RAG versus Parameter-Efficient Fine-Tuning (PEFT) for privacy-preserving LLM personalization on the LaMP benchmark. **Architecture:** Contrasts standard RAG (prompt enrichment) against PEFT (likely LoRA/Adap...
|
03-12 23:46 | Success | - | |
|
exp_2409.09582v2_20260312_234541
|
NEVLP Benchmark Implementation
**Architecture:** NEVLP bridges a **frozen image encoder** and a **frozen LLM** using a trainable **Transformer connector**. It optimizes training via noise-adaptive learning (estimating noise probabilities) and concept-enhanced learning (i...
|
03-12 23:45 | Success | - | |
|
exp_pytrain.20260312234343.039_20260312_234409
|
Python Skill Fallback
Title: Type-Safe Dynamic Backend Loader - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-12 23:44 | Success | - | |
|
exp_2409.18986v2_20260312_234307
|
Lab-AI: Using Retrieval Augmentation to Enhance Language Models for Personalized Lab Test Interpretation in Clinical Med...
**Architecture & Feasibility:** Lab-AI utilizes a two-stage RAG pipeline: **Factor Retrieval** (identifying patient demographics) followed by **Normal Range Retrieval** (fetching conditional reference data), orchestrated via GPT-4-turbo. Th...
|
03-12 23:43 | Success | - | |
|
exp_2409.10825v5_20260312_234139
|
Benchmark: Bias Mitigation in LLM Recommendations
**Architecture:** Evaluates off-the-shelf LLMs (LLaMA, GPT, Gemini) for recommendation tasks; proposes a Retrieval-Augmented Generation (RAG) framework to mitigate algorithmic bias by retrieving diverse candidates to counteract skewed train...
|
03-12 23:41 | Success | - | |
|
exp_2409.11279v1_20260312_234058
|
P-RAG: Progressive Retrieval Augmented Generation Benchmark
**Architecture:** LLM-based agent utilizing an iterative, self-updating retrieval loop. **Retrieval Strategy:** Progressive RAG. Unlike static RAG, it accumulates "experiences" (historical interactions) into a dynamic database. It uses a gr...
|
03-12 23:41 | Success | - | |
|
exp_2409.12140v2_20260312_234014
|
MoRAG Benchmark: Evaluating Multi-Fusion Retrieval & SSM Optimization
**Architecture:** MoRAG augments motion diffusion models via a dual-module pipeline: an LLM for query normalization (spelling/rephrasing) and a multi-part retriever that performs spatial composition of part-specific motion features. **RAG S...
|
03-12 23:40 | Success | - | |
|
exp_2409.12519v3_20260312_233929
|
This repository contains the runnable benchmark code for the Multi-View Adaptive Contrastive Learning for Information Re...
**Architecture:** MACL-IRFL utilizes Graph Neural Networks (GNNs) combined with Adaptive Contrastive Learning. It generates embeddings by aggregating information from three specific graph views: report-code interaction, report-report simila...
|
03-12 23:39 | Success | - | |
|
exp_pytrain.20260312233713.038_20260312_233741
|
Python Skill Fallback
Title: Typed Configuration Package Module - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-12 23:37 | Success | - | |
|
exp_2409.12941v3_20260312_233633
|
Fact, Fetch, and Reason (FRAMES) Benchmark
**Paper Summary:** *Fact, Fetch, and Reason: A Unified Evaluation of Retrieval-Augmented Generation* **Focus:** Evaluation Benchmark (FRAMES) for Multi-hop RAG. * **Retrieval Architecture:** The paper proposes a **multi-step retrieval pipel...
|
03-12 23:36 | Success | - | |
|
exp_2409.13537v1_20260312_233514
|
ShizishanGPT: An Agricultural Large Language Model Integrating Tools and Resources
**Architecture** ShizishanGPT is a modular agent framework integrating a **Retrieval Augmented Generation (RAG)** pipeline with an Agricultural Knowledge Graph (KG) and external tool execution. It relies on a heavy GPT-4 backbone for generi...
|
03-12 23:35 | Success | - | |
|
exp_2409.14083v1_20260312_233420
|
SURf Benchmark Suite
**Architecture:** SURf is a self-refinement fine-tuning framework for LVLMs. It constructs training sets using positive (corrective) and negative (misleading) multimodal references to teach the model backbone how to selectively filter retri...
|
03-12 23:34 | Success | - | |
|
exp_2403.12582v1_20260312_233338
|
README: AlphaFin Benchmarking Suite
**Paper:** AlphaFin (Stock-Chain) **Architecture:** A retrieval-augmented generation (RAG) framework trained on the AlphaFin benchmark, combining real-time financial data with handwritten chain-of-thought (CoT) reasoning. **RAG Specifics:**...
|
03-12 23:33 | Success | - | |
|
exp_2404.10779v1_20260312_233240
|
Fine-Tuning LLM for Enterprise: Benchmark Suite
**Architecture:** Focuses on fine-tuning open-weight models (specifically LLaMA) on proprietary enterprise data (documentation and code) to surpass standard Retrieval-Augmented Generation (RAG) quality, arguing RAG is limited by vector data...
|
03-12 23:32 | Success | - | |
|
exp_pytrain.20260312233045.037_20260312_233104
|
Python Skill Fallback
Title: Runtime-Typed Plugin Loader with Dynamic Package Discovery - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-12 23:31 | Success | - | |
|
exp_2403.17428v2_20260312_232919
|
Aligning Large Language Models for Enhancing Psychiatric Interviews Through Symptom Delineation and Summarization: Pilot...
**Architecture:** Proposes a multi-stage pipeline: (1) Stressor Extraction (NER), (2) Symptom Section Identification (Span Detection), and (3) Summarization using extracted context. **RAG Strategy:** The paper explicitly states RAG showed *...
|
03-12 23:29 | Success | - | |
|
exp_2403.17645v3_20260312_232826
|
DANCER: Entity Description Augmented Named Entity Corrector for Automatic Speech Recognition
**Architecture:** DANCER proposes an Efficient Entity Description Augmented Masked Language Model (EDA-MLM) for post-ASR error correction. It replaces traditional phonetic edit-distance algorithms with a hybrid **dense retrieval + Masked La...
|
03-12 23:28 | Success | - | |
|
exp_2403.17848v1_20260312_232656
|
ArabicaQA Benchmark Suite
**Paper:** ArabicaQA: A Comprehensive Dataset for Arabic Question Answering **Summary for ARES 8GB Roadmap:** * **Architecture:** The paper introduces **AraDPR**, a **Dense Passage Retrieval (DPR)** model (Dual-encoder BERT-based) tailored...
|
03-12 23:27 | Success | - | |
|
exp_2309.11322v2_20260312_232615
|
Vector database management systems: Fundamental concepts, use-cases, and current challenges
**Architecture:** Narrative review of Vector Database Management Systems (VDBMS) designed for high-dimensional, sparse data. **RAG Specifics:** * **Retrieval Architecture:** Approximate Nearest Neighbor (ANN) similarity search. * **Indexing...
|
03-12 23:26 | Success | - | |
|
exp_pytrain.20260312232355.036_20260312_232431
|
Python Skill Fallback
Title: Dynamic Namespace Loader with Protocol Enforcement - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-12 23:24 | Success | - | |
|
exp_2309.12132v2_20260312_232208
|
Benchmark Design: GraphRAG vs. Vanilla LLM for Contract Review
**Architecture:** A tuning-free GraphRAG framework combining LLMs with a Nested Contract Knowledge Graph (NCKG). **Retrieval Strategy:** Utilizes **NCKG-based graph traversal** instead of vector chunking. The system indexes contract clauses...
|
03-12 23:22 | Success | - | |
|
exp_2309.15427v2_20260312_232118
|
Graph Neural Prompting (GNP) Benchmark
**Architecture:** GNP augments a frozen LLM with a trainable Graph Neural Network (GNN) encoder and a domain projector. It extracts embeddings from Knowledge Graph (KG) subgraphs and converts them into continuous "soft prompts" to guide the...
|
03-12 23:21 | Success | - | |
|
exp_2309.16035v3_20260312_232021
|
MKRAG Efficiency Benchmark
**Architecture:** Standard RAG pipeline coupling a retrieval encoder with a Vicuna-7B generator. Avoids fine-tuning, relying on prompt injection for domain adaptation. **Retrieval Strategy:** Extracts facts from the MedQA-SMILE dataset. Spe...
|
03-12 23:20 | Success | - | |
|
exp_2303.14369v1_20260312_231937
|
Benchmark Design for HBI (Hierarchical Banzhaf Interaction)
**Architecture:** Proposes Hierarchical Banzhaf Interaction (HBI), modeling video frames and text words as cooperative game players. It stacks token-merge modules to cluster inputs and compute fine-grained interactions at multiple semantic...
|
03-12 23:19 | Success | - | |
|
exp_pytrain.20260312231732.035_20260312_231759
|
Python Skill Fallback
Title: Generic Data Store & CLI Module - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-12 23:18 | Success | - | |
|
exp_2303.16145v1_20260312_230544
|
Benchmark: NeuralMind-UNICAMP mT5 CLIR Reranker
**Architecture:** Utilizes **mT5-XXL** (approx. 11B parameters) as a cross-lingual reranker within a two-stage retrieval pipeline. **Retrieval & Context:** * **1st Stage:** Sparse retrieval (BM25). * **2nd Stage:** mT5-XXL reranks query-doc...
|
03-12 23:15 | Success | - | |
|
exp_2304.01003v1_20260312_230449
|
QUADRo: Dataset and Models for QUestion-Answer Database Retrieval
**Paper:** QUADRo: Dataset and Models for QUestion-Answer Database Retrieval **Summary:** **Architecture:** A dual-stage Neural IR pipeline utilizing a **Bi-Encoder** for retrieval and a **Cross-Encoder** for reranking. The system encodes b...
|
03-12 23:04 | Success | - | |
|
exp_pytrain.20260312230140.034_20260312_230225
|
Python Skill Fallback
Title: Strictly-Typed Model Artifact Packager - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-12 23:02 | Success | - | |
|
exp_hf_2603.09229_20260312_225857
|
Flash-KMeans: Fast and Memory-Efficient Exact K-Means
**Architecture:** Flash-KMeans replaces standard GPU K-means stages with two kernel-level innovations. **FlashAssign** fuses distance computation with online argmin selection, bypassing intermediate memory writes. The **sort-inverse update*...
|
03-12 22:59 | Success | - | |
|
exp_hf_2603.10702_20260312_225727
|
UniCom: Unified Multimodal Modeling Benchmark
**Architecture** UniCom utilizes a **transfusion architecture** (superior to query-based designs) featuring an attention-based semantic compressor. It generates **compact, continuous semantic representations** by prioritizing channel reduct...
|
03-12 22:57 | Success | - | |
|
exp_cr_10.3390_sym17030471_20260312_225621
|
Benchmark: Improved Model-Free Adaptive Predictive Control (MFAPC)
**Verdict: Incompatible** This paper addresses **Control Theory** (Model-Free Adaptive Predictive Control), not Deep Learning. It focuses on networked cyber-physical systems under DoS attacks and does not describe a neural network architect...
|
03-12 22:56 | Success | - | |
|
exp_pytrain.20260312225328.033_20260312_225403
|
Strictly-Typed Kernel Registry Benchmark
Overview This benchmark simulates a high-performance kernel registration subsystem similar to those found in vLLM or PyTorch. It tests the hypothesis that enforcing strict `typing.Protocol` constraints at import-time reduces runtime errors...
|
03-12 22:54 | Success | - | |
|
exp_cr_10.1609_aaai.v38i17.29815_20260312_225059
|
Benchmark for Norm Tweaking in Low-Bit Quantization
**Architecture:** A plugin for existing Post-Training Quantization (PTQ) pipelines. It does not alter core Transformer blocks but modifies Layer Normalization weights. The method aligns the distribution of quantized activations with their f...
|
03-12 22:51 | Success | - | |
|
exp_2512.10596v1_20260312_224927
|
Benchmark: Beyond Pixels (T2T Retrieval)
**Architecture:** Proposes **TRSLLaVA**, a training-free framework converting cross-modal retrieval into **Text-to-Text (T2T)** matching. It replaces vision encoders with a VLM (LLaVA) to generate structured captions for images, aligning th...
|
03-12 22:49 | Success | - | |
|
exp_2512.14102v1_20260312_224836
|
Neurosymbolic Inference On Foundation Models For Remote Sensing Text-to-image Retrieval With Complex Queries
**Architecture:** Neurosymbolic framework (RUNE) combining Large Language Models (LLMs), object detectors, and First-Order Logic (FOL). It treats text-to-image retrieval as a symbolic reasoning task rather than implicit vector matching. **R...
|
03-12 22:48 | Success | - | |
|
exp_cr_10.14419_dzzstd42_20260312_224734
|
DNGR: Deep Neural Graph-Based Recommendation System for Scholarly Paper Retrieval
**Architecture:** DNGR couples Graph Neural Networks (GNNs) with SciBERT embeddings, processing a heterogeneous academic graph of citations, authors, and topics. **Retrieval & RAG Details:** * **Architecture:** Deep Neural Graph-based Recom...
|
03-12 22:47 | Success | - | |
|
exp_pytrain.20260312224454.032_20260312_224533
|
Type-Safe Plugin Registry for Model Configurations
This benchmark evaluates a Python coding system's ability to implement robust, type-safe package architecture using standard library features. The task is to construct a modular "model registry" system similar to those found in enterprise A...
|
03-12 22:45 | Success | - | |
|
exp_2506.14445v1_20260312_224302
|
Vela: Multimodal Embedding Benchmark
**Architecture:** Vela repurposes a frozen Voice Large Language Model (vLLM) as a dual-encoder to generate unified multimodal embeddings. It bridges the text-audio gap using prompt engineering and in-context learning, training exclusively o...
|
03-12 22:43 | Success | - | |
|
exp_2409.09721v2_20260312_224158
|
Finetuning CLIP to Reason about Pairwise Differences
**Architecture:** Standard CLIP dual-encoder (ViT + Text Transformer) finetuned via contrastive learning on synthetic LLM-generated data to align image embedding differences ($I_1 - I_2$) with text descriptions of differences. **Memory Foot...
|
03-12 22:42 | Success | - | |
|
exp_2403.15378v3_20260312_224059
|
Long-CLIP: Unlocking Long-Text Capability Benchmark
**Architecture:** Long-CLIP addresses CLIP’s 77-token limit via two efficient fine-tuning strategies: knowledge-preserved stretching of positional embeddings and primary component matching of features. This preserves the original latent spa...
|
03-12 22:41 | Success | - | |
|
exp_pytrain.20260312223822.031_20260312_223854
|
Strict Protocol Plugin Loader Benchmark
This benchmark tests the hypothesis that combining `typing.Protocol` with `importlib` allows for a robust, zero-dependency plugin system that validates interfaces at runtime without manual registration. Instructions 1. Save the code below a...
|
03-12 22:38 | Success | - | |
|
exp_2403.16265v1_20260312_223608
|
Benchmark: Graph-Augmented Patent Phrase Similarity
**Architecture:** Hybrid retrieval-augmented encoder combining a standard contextualized model (e.g., BERT) with a Graph Neural Network (GNN). **Retrieval Architecture:** **Citation-Graph Retrieval.** Instead of standard chunking, it constr...
|
03-12 22:36 | Success | - | |
|
exp_cr_10.1609_aaai.v38i8.28714_20260312_223443
|
UniGen: Unified Retrieval and QA Benchmark
**Architecture:** Dual-decoder Transformer (Shared Encoder + Generative Retrieval Decoder + QA Decoder). Utilizes LLM-generated connectors to bridge query-to-doc and doc-to-answer representations. **Retrieval Strategy:** **Generative Docume...
|
03-12 22:34 | Success | - | |
|
exp_2303.11313v3_20260312_223349
|
CLIP goes 3D: Leveraging Prompt Tuning for Language Grounded 3D Recognition
**Paper:** CLIP goes 3D (CG3D) **Architecture** Introduces a learnable 3D point cloud encoder aligned with frozen CLIP (Vision/Text) encoders. It uses contrastive loss on triplets of (Pointcloud, Rendered Image, Text). **Retrieval Strategy*...
|
03-12 22:33 | Success | - | |
|
exp_pytrain.20260312223032.030_20260312_223108
|
Python Skill Fallback
Title: Type-Annotated Async Fetcher with Package Structure - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-12 22:31 | Success | - | |
|
exp_2309.16889v2_20260312_222904
|
Superpixel Transformers for Efficient Semantic Segmentation
**Architecture:** The model replaces dense pixel processing with a Superpixel Transformer backbone. It utilizes local cross-attention to dynamically map pixels to a reduced set of "superpixel" tokens. Standard multi-head self-attention is t...
|
03-12 22:29 | Success | - | |
|
exp_2512.11506v2_20260312_222800
|
EmeraldMind Benchmark
**EmeraldMind Summary** * **Architecture:** A GraphRAG framework integrating a domain-specific Knowledge Graph (**EmeraldGraph**) with an LLM to verify claims against ESG reports. * **Memory Footprint:** **High Efficiency.** The heavy memor...
|
03-12 22:28 | Success | - | |
|
exp_2512.14744v1_20260312_222708
|
VERAFI: Verified Agentic Financial Intelligence through Neurosymbolic Policy Generation
**Architecture:** Neurosymbolic Agentic framework combining Dense Retrieval, Cross-Encoder Reranking, and automated reasoning policies (GAAP/SEC/Math validation). **RAG Specs:** Dense Retrieval + **Cross-Encoder Reranking**. No specific chu...
|
03-12 22:27 | Success | - | |
|
exp_pytrain.20260312222417.029_20260312_222454
|
Package Metadata & Type Coverage Verifier
This benchmark evaluates the ability to construct a static analysis tool using Python's standard library. The goal is to inspect a namespace (simulated by `globals()`) to verify packaging compliance (checking `__all__` integrity) and type c...
|
03-12 22:24 | Success | - | |
|
exp_2601.06039v1_20260312_222232
|
Operation Veja: VEJA Framework Benchmark
**Architecture:** None. This is a data curation framework, not a model architecture proposal. **Methodology:** Introduces the **VEJA** paradigm (Values, Experiences, Judgments, Abilities) to generate training data that fosters "deliberative...
|
03-12 22:22 | Success | - | |
|
exp_2512.12858v1_20260312_222122
|
Benchmark: Information-Consistent LM Recommendations (GRPO)
**Architecture:** Proposes a reinforcement learning framework utilizing Group Relative Policy Optimization (GRPO) to minimize output variance across semantically equivalent prompt groups. This is a model alignment/training technique, not a...
|
03-12 22:21 | Success | - | |
|
exp_pytrain.20260312221728.028_20260312_221831
|
Generic Plugin Registry with PEP 695 Type Parameters
Overview This benchmark evaluates the design and implementation of a strictly-typed Plugin Registry system utilizing **Python 3.12+** features, specifically **PEP 695 Type Parameter Syntax**. Features - **PEP 695 Syntax**: Uses the new `cla...
|
03-12 22:18 | Success | - | |
|
exp_2512.13074v1_20260312_221513
|
Benchmark: Symmetric Consistent Indexing (SCI) for Dense Retrieval
**Architecture:** SCI enhances standard **dual-tower dense retrieval** by addressing representational misalignment and training-inference inconsistency. **Retrieval Specs:** * **Indexing Strategy:** Implements **Dual-view indexing** to ensu...
|
03-12 22:15 | Success | - | |
|
exp_2512.14762v1_20260312_221355
|
Benchmark: Workflows vs Agents for Code Translation
**Architecture:** Compares fixed workflows against an **MCP-based agentic framework** for MATLAB-to-HDL syntax repair. The agent architecture dynamically selects tools rather than following a static chain. **Retrieval & Context:** Utilizes...
|
03-12 22:14 | Success | - | |
|
exp_2512.14179v1_20260312_221212
|
Benchmark: RAG Pipelines for Bengali Dialect Translation
**Validation:** This paper validates the ARES 8GB strategy, demonstrating that retrieval augmentation allows an 8B parameter model (Llama-3.1-8B) to outperform 120B-class models in low-resource translation. **Retrieval Architecture:** The s...
|
03-12 22:12 | Success | - | |
|
exp_pytrain.20260312220925.027_20260312_220959
|
Type-Safe Plugin Registry Factory
Overview This coding drill challenges you to implement a robust, generic Plugin Registry system in Python, inspired by the extensibility mechanisms found in frameworks like PyTorch and LitGPT. Objective Create a `PluginRegistry` class that...
|
03-12 22:10 | Success | - | |
|
exp_2512.14417v1_20260312_220657
|
PortAgent: LLM-driven Vehicle Dispatching Agent for Port Terminals
**Architecture:** Multi-agent framework (Virtual Expert Team) utilizing four specialized roles (Retriever, Modeler, Coder, Debugger) and a Reflexion-inspired self-correction loop to automate vehicle dispatching logic. **RAG Implementation:*...
|
03-12 22:07 | Success | - | |
|
exp_cr_10.64552_wipiec.v11i1.95_20260312_220532
|
MicroRAG Benchmark
**Architecture:** RAG-based framework targeting technical microarchitecture documentation (AURIX TriCore). **Memory Footprint:** The study validates 3B and 8B parameter models against a 72B baseline. An 8B model is highly suitable for 8GB V...
|
03-12 22:06 | Success | - | |
|
exp_pytrain.20260312220144.026_20260312_220237
|
Generic Pipeline Registry Benchmark
This benchmark evaluates a Python implementation of a modular processing pipeline using modern typing features (`typing.Protocol`, `typing.TypeVar`, `typing.Generic`). Key Concepts * **`ProcessingStep` Protocol**: Defines the contract for a...
|
03-12 22:02 | Success | - | |
|
exp_cr_10.3390_info16090766_20260312_214913
|
This repository contains the benchmarking code for the paper titled **"Retrieval-Augmented Generation vs. Baseline LLMs:...
**Analysis for ARES 8GB Roadmap:** * **Architecture:** Evaluates RAG-augmented performance against baselines for TinyLlama (1.1B), Mistral (7B), Llama 3.1 (8B), and Llama 1 (13B). * **RAG Specifics:** The abstract lacks technical specifics...
|
03-12 21:59 | Success | - | |
|
exp_cr_10.3390_info16090786_20260312_214825
|
Analysis of Large Language Models for Company Annual Reports Based on Retrieval-Augmented Generation
**Paper Type:** Evaluation Study (Proprietary Models) **Summary:** This paper assesses the performance of cloud-based LLMs (ChatGPT-4, Gemini) enhanced with Retrieval-Augmented Generation (RAG) for analyzing financial annual reports. * **Ar...
|
03-12 21:48 | Success | - | |
|
exp_cr_10.3390_computers14090382_20260312_214740
|
GraphTrace: A Modular Retrieval Framework Combining Knowledge Graphs and Large Language Models for Multi-Hop Question An...
**Architecture:** Modular Graph-based RAG utilizing a Knowledge Graph (KG) rather than vector stores. **Retrieval Strategy:** * **Indexing:** Structured entity relationships (domain-specific KG), bypassing traditional text chunking. * **Pro...
|
03-12 21:47 | Success | - | |
|
exp_cr_10.32996_jcsts.2025.7.9.56_20260312_214637
|
This benchmark simulates the performance characteristics of the described "Contextual Retrieval-Augmented Generation" ar...
**Architecture:** Serverless RAG pipeline utilizing **AWS Kendra** for retrieval and external **Claude API** for generation, orchestrated via API Gateway and Lambda. **Retrieval & Context:** * **Retrieval Architecture:** AWS Kendra (Managed...
|
03-12 21:46 | Success | - | |
|
exp_pytrain.20260312214356.025_20260312_214425
|
Robust Plug-in Loader with Runtime Protocol Verification
This benchmark evaluates the design of a type-safe, extensible plugin architecture using Python's `typing.Protocol` and `@runtime_checkable`. Overview The script implements a simulated data processing package. It defines a strict behavioral...
|
03-12 21:44 | Success | - | |
|
exp_cr_10.3390_electronics14183676_20260312_214201
|
Enhancing Clinical Named Entity Recognition via Fine-Tuned BERT and Dictionary-Infused Retrieval-Augmented Generation
**Architecture:** Two-stage pipeline. Stage 1 utilizes a fine-tuned BERT for clinical NER. Stage 2 employs a **Dictionary-Infused Retrieval-Augmented Generation (DiRAG)** module for terminology normalization, merging semantic retrieval with...
|
03-12 21:42 | Success | - | |
|
exp_cr_10.3390_biomimetics10090626_20260312_214043
|
Benchmark: Biomimicry Design Spiral RAG Framework
**Architecture:** A specialized, stage-specific RAG framework coupling a locally hosted **Llama 3.1** model with a domain-specific **AskNature corpus** (2,106 documents) to facilitate the Biomimicry Design Spiral (BSD). **RAG Specifics:** *...
|
03-12 21:40 | Success | - | |
|
exp_oa_W4410600121_20260312_213933
|
Document GraphRAG Benchmark
**Architecture:** Knowledge Graph-enhanced RAG (GraphRAG) leveraging document-intrinsic structure. **Retrieval & Indexing:** Uses **graph-based document structuring** and **keyword-based semantic linking**. It optimizes retrieval by tuning...
|
03-12 21:39 | Success | - | |
|
exp_pytrain.20260312213630.024_20260312_213706
|
Strictly-Typed Plugin Dispatcher Benchmark
Objective This benchmark evaluates a Python implementation of a modular plugin dispatcher. It validates the hypothesis that utilizing Structural Subtyping (via `typing.Protocol`) combined with Generics ensures strict adherence to component...
|
03-12 21:37 | Success | - | |
|
exp_2506.12637v2_20260312_213448
|
How Grounded is Wikipedia? A Study on Structured Evidential Support and Retrieval
**Assessment:** This is a study and dataset release (**PeopleProfiles**), not a novel model architecture. It evaluates the reliability of Wikipedia citations and the efficacy of retrieval systems in finding supporting evidence. **Retrieval...
|
03-12 21:34 | Success | - | |
|
exp_2506.12895v1_20260312_213342
|
Legal IR Performance Benchmark
**Architecture:** Comparative analysis of **Lexical (BM25)** vs. **Dense Retrieval** (Transformer-based Bi-encoders). **Retrieval Strategy:** Passage-level retrieval of legal decisions. **Key Findings:** Off-the-shelf dense models underperf...
|
03-12 21:33 | Success | - | |
|
exp_2506.14086v1_20260312_213214
|
InsertRank: Benchmark Suite
**InsertRank** employs a **Listwise Reranking** architecture. It integrates **BM25** lexical scores directly into the LLM prompt, allowing the model to reason over retrieval signals rather than just semantic text. * **Retrieval Architecture...
|
03-12 21:32 | Success | - | |
|
exp_pytrain.20260312212933.023_20260312_213008
|
Strict Type ZipApp Bundler
Overview This project provides a robust CLI tool that enforces strict typing within Python source files before bundling them into a portable ZipApp (`.pyz`) executable. Hypothesis An autonomous coding system demonstrates advanced capability...
|
03-12 21:30 | Success | - | |
|
exp_2506.14336v1_20260312_212905
|
AviationLLM Benchmark: RALA-DPO vs Base SFT
**Architecture:** RALA-DPO utilizes a **Qwen** base model, fine-tuned via **Direct Preference Optimization (DPO)** and enhanced with **Retrieval-Augmented Generation (RAG)**. **RAG Pipeline:** The abstract confirms RAG usage to mitigate hal...
|
03-12 21:29 | Success | - | |
|
exp_2506.14488v1_20260312_212747
|
Benchmark: Retrieval-Enhanced Aligned Diffusion (READ)
**Architecture:** READ integrates an SE(3)-equivariant diffusion model with a contrastively pre-trained graph encoder to align atom-level representations. **RAG Specifics:** * **Retrieval Architecture:** Graph-based retrieval using a pre-tr...
|
03-12 21:27 | Success | - | |
|
exp_2506.15241v1_20260312_212707
|
Research on Graph-Retrieval Augmented Generation Based on Historical Text Knowledge Graphs
**Architecture:** A GraphRAG framework combining Knowledge Graph (KG) retrieval with Chain-of-Thought (CoT) prompting. It utilizes a collaborative KG-LLM mechanism to improve entity alignment and reduce hallucinations in historical text ana...
|
03-12 21:27 | Success | - | |
|
exp_2506.15415v1_20260312_212617
|
Benchmark Design: Targeted Lexical Injection (TLI)
**Architecture:** The paper applies **Targeted Lexical Injection (TLI)** to Lugha-Llama-8B. This method uses **LoRA** to fine-tune embeddings specifically from **Layer 2** (identified as the peak alignment layer) using a contrastive objecti...
|
03-12 21:26 | Success | - | |
|
exp_pytrain.20260312212238.022_20260312_212341
|
Modern Generic Cache with PEP 695 Syntax
**Overview** This benchmark evaluates the implementation of a modern, type-safe in-memory cache using Python 3.12's PEP 695 Type Parameter Syntax. **Features** - **PEP 695 Syntax**: Utilizes the new `class MyClass[T]:` and `type MyAlias[T]...
|
03-12 21:23 | Success | - | |
|
exp_2506.15569v1_20260312_212050
|
SciVer: Evaluating Foundation Models for Multimodal Scientific Claim Verification
**Paper:** SciVer (Benchmark) **Category:** Evaluation & RAG Analysis **Architecture:** Focuses on **Multimodal LLMs** suitable for local inference, specifically **Llama-3.2-Vision** and **Qwen2.5-VL**. **RAG & Retrieval Strategy:** * **Arc...
|
03-12 21:20 | Success | - | |
|
exp_2506.15655v2_20260312_211955
|
cAST: Enhancing Code Retrieval-Augmented Generation with Structural Chunking via Abstract Syntax Tree
**Architecture:** cAST proposes a **structure-aware preprocessing pipeline** for Code RAG, replacing heuristic line-based splitting with **Abstract Syntax Tree (AST)** parsing. **Retrieval & Chunking Strategy:** * **Retrieval Architecture:*...
|
03-12 21:19 | Success | - | |
|
exp_2506.21596v2_20260312_211855
|
Evaluating Multimodal Large Language Models on Educational Textbook Question Answering
**Architecture:** Benchmarks LLaVA-1.5 and LLaMA 3.2-Vision (VLMs) on the CK12-QA dataset. **Retrieval Architecture:** Multimodal RAG pipeline providing lesson paragraphs and diagrams. (Indexing/chunking strategy and reranking methods are n...
|
03-12 21:18 | Success | - | |
|
exp_pytrain.20260312211539.021_20260312_211617
|
Generic Event Dispatcher with Protocol-Based Registration
Overview This benchmark demonstrates an autonomous coding system designing an extensible, loosely-coupled architecture using Python's advanced typing features. **Hypothesis:** An autonomous coding system can effectively design extensible, l...
|
03-12 21:16 | Success | - | |
|
exp_2506.15911v2_20260312_211239
|
Tibbe-AG: Islamic Medicine Response Validation Benchmark
**Architecture:** Evaluates 7B-class models (LLaMA-3, Mistral-7B, Qwen2-7B) within a multi-stage pipeline. The flow transitions from **Retrieval-Augmented Generation (RAG)** to a **Scientific Self-Critique Agent**, concluding with an **LLM-...
|
03-12 21:13 | Success | - | |
|
exp_2506.16172v1_20260312_211058
|
Benchmark: SGIC (Self-Guided Iterative Calibration) for RAG
**Architecture & RAG Strategy:** SGIC introduces an iterative wrapper around standard RAG, utilizing an **uncertainty estimator** to perform **reranking** based on document relevance and LLM confidence. It employs a **multi-round calibratio...
|
03-12 21:11 | Success | - | |
|
exp_2506.16411v2_20260312_210957
|
This repository contains a synthetic benchmark to evaluate the **"Noise Decomposition Framework"** for Long Context LLMs...
**Architecture:** Proposes a MapReduce-style "Multi-Agent Chunking" framework. It splits long inputs into fixed-size segments to minimize "Model Noise" (fidelity decay in long sequences) and aggregates partial results. **Memory Footprint:**...
|
03-12 21:10 | Success | - | |
|
exp_pytrain.20260312210717.020_20260312_210806
|
Type-Safe Dynamic Kernel Packager
This benchmark demonstrates the creation of a simulated AI kernel plugin system. It bridges static type definitions using `typing.Protocol` with dynamic runtime module loading via `zipfile`. Overview The script performs the following operat...
|
03-12 21:08 | Success | - | |
|
exp_cr_10.3390_a18030155_20260312_210516
|
This benchmark evaluates the computational efficiency of the Text-Guided Synthesis framework for colonoscopy data augmen...
**Architecture:** The framework employs **Stable Diffusion** fine-tuned with **DreamBooth Low-Rank Adaptation (LoRA)** for synthetic colonoscopy image generation. Downstream classification utilizes **Vision Transformers (ViT)** and **Effici...
|
03-12 21:05 | Success | - | |
|
exp_cr_10.48175_ijarsct-25189_20260312_210423
|
JobMatchr RAG Performance Benchmark
**Architecture & Retrieval:** JobMatchr is a web-based RAG system built on **Flask** and **LangChain**. It employs a **vector embedding** retrieval architecture and depends on the proprietary **Gemini-2.0-flash** API for generation. * **RAG...
|
03-12 21:04 | Success | - | |
|
exp_cr_10.2196_67677_20260312_210333
|
Improving Dietary Supplement Information Retrieval: Development of a Retrieval-Augmented Generation System With Large La...
**Architecture:** Knowledge Graph (KG) based RAG system utilizing a hybrid generator-retriever approach. The retrieval component extracts relevant subgraphs from the integrated Dietary Supplement Knowledgebase (iDISK2.0), containing 174k en...
|
03-12 21:03 | Success | - | |
|
exp_2409.08597v1_20260312_210227
|
LA-RAG: Enhancing LLM-based ASR Accuracy with Retrieval-Augmented Generation
**Architecture & RAG Design:** LA-RAG is a specialized RAG framework for LLM-based ASR that utilizes **speech-to-speech retrieval**. Instead of text chunks, it indexes **token-level speech datastores** (acoustic embeddings). It retrieves si...
|
03-12 21:02 | Success | - | |
|
exp_pytrain.20260312205933.019_20260312_210019
|
Strictly Typed Plugin Registry Benchmark
This benchmark tests the ability to implement a type-safe, dynamic plugin registry system using Python's standard library, mimicking patterns found in frameworks like Transformers or vLLM. It focuses on strict typing (`Protocol`, `Generic`,...
|
03-12 21:00 | Success | - | |
|
exp_2409.08820v2_20260312_205755
|
A RAG Approach for Generating Competency Questions in Ontology Engineering
**Summary:** This paper validates a RAG workflow for generating Competency Questions (CQs) for ontology engineering from scientific papers. * **Architecture:** Uses GPT-4 as the generator. The retrieval component ingests scientific text to...
|
03-12 20:57 | Success | - | |
|
exp_2409.09493v2_20260312_205701
|
Pentest Copilot: LLM-Augmented Reasoning Benchmark
**Architecture:** An agentic workflow ("Pentest Copilot") utilizing GPT-4-turbo with Chain of Thought (CoT) to automate penetration testing sub-tasks and interpret tool outputs. **RAG & Retrieval:** The abstract confirms RAG usage for hallu...
|
03-12 20:57 | Success | - | |
|
exp_2409.10102v1_20260312_205606
|
Trustworthiness in RAG: Lightweight Benchmark
**Summary:** **Type:** Survey Paper (Review of Existing Techniques). **Architecture/Memory/Speed:** N/A. This paper does not propose a new model architecture, nor does it address memory footprint or inference speed optimizations. **RAG Spec...
|
03-12 20:56 | Success | - | |
|
exp_2409.10173v3_20260312_205504
|
Benchmark for jina-embeddings-v3
**Architecture:** A 570M parameter transformer utilizing task-specific Low-Rank Adaptation (LoRA) adapters to specialize embeddings for distinct objectives (retrieval, clustering, classification). **Memory Footprint:** Exceptionally efficie...
|
03-12 20:55 | Success | - | |
|
exp_pytrain.20260312205305.018_20260312_205338
|
Python Skill Fallback
Title: Dynamic Plugin Loader with Runtime Type Verification - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-12 20:53 | Success | - | |
|
exp_2409.15364v1_20260312_205145
|
VERA: Validation and Enhancement for Retrieval Augmented systems
**Architecture:** VERA wraps standard RAG with a dual-stage LLM validator. An "evaluator-cum-enhancer" LLM pre-filters retrieved documents for relevance and redundancy, and a post-generator splits responses into atomic statements for fact-checking aga...
|
03-12 20:51 | Success | - | |
|
exp_2409.12558v2_20260312_205101
|
RAD-Bench: Evaluating Large Language Models Capabilities in Retrieval Augmented Dialogues
**Assessment:** RAD-Bench is a benchmark framework for evaluating **Search-Augmented Generation (SAG)** and **Retrieval-Augmented Generation (RAG)** in multi-turn dialogues. It measures **Retrieval Synthesis** (aggregating info) and **Retri...
|
03-12 20:51 | Success | - | |
|
exp_2409.12880v1_20260312_205007
|
E-commerce Product Title Translation RAG Benchmark
**Architecture:** Standard RAG pipeline coupling a dense retriever with a generative LLM. **RAG Specifics:** * **Retrieval Architecture:** Semantic search over a database of existing bilingual product titles. * **Indexing:** Stores "bilingu...
|
03-12 20:50 | Success | - | |
|
exp_2409.13902v1_20260312_204900
|
Ophthalmology RAG Benchmark
**Architecture:** Domain-specific RAG pipeline utilizing a 70,000-document ophthalmology corpus to augment LLM inference. **RAG Specifics:** * **Retrieval Strategy:** Top-10 document retrieval (k=10). * **Indexing/Chunking:** Unspecified in...
|
03-12 20:49 | Success | - | |
|
exp_pytrain.20260312204700.017_20260312_204727
|
Runtime Type-Validated Dynamic Plugin Loader
Overview This coding drill tests the integration of Python's dynamic module loading capabilities with the Structural Subtyping (Protocol) features introduced in recent Python versions. Goal Construct a self-contained runtime environment tha...
|
03-12 20:47 | Success | - | |
|
exp_2409.19006v2_20260312_204534
|
Towards Automated Patent Workflows: AI-Orchestrated Multi-Agent Framework for Intellectual Property Management and Analy...
**Architecture:** PatExpert utilizes a multi-agent orchestration model comprising a meta-agent, task-specific expert agents, and critique agents (Gold/Reward-LLM-as-a-Judge). **Retrieval (RAG):** Employs **Graph Retrieval-Augmented Generati...
|
03-12 20:45 | Success | - | |
|
exp_2409.14192v2_20260312_204429
|
Benchmark: Knowledge in Triples for Table QA
**Architecture:** A RAG framework that transforms semi-structured tables into (Subject, Predicate, Object) triples to feed the generator, bypassing the need for SQL/SPARQL parsing. **Retrieval Strategy:** * **Indexing/Chunking:** Data is ch...
|
03-12 20:44 | Success | - | |
|
exp_2403.10798v2_20260312_204330
|
Benchmarking Object Retrieval for Visual Question Answering (OR-OK-VQA)
**Architecture:** Proposes **OR-OK-VQA**, a Visual RAG framework replacing global image retrieval with **object-level retrieval**. It employs **Multi-scale Group Collaborative Embedding Learning (MS-GCEL)** to generate unsupervised embeddin...
|
03-12 20:43 | Success | - | |
|
exp_pytrain.20260312204031.016_20260312_204111
|
Dynamic Package Loader with Protocol Enforcement
This benchmark tests the ability to construct a robust runtime loader in Python using only the standard library. It simulates a micro-framework that dynamically generates a plugin architecture on the filesystem, loads these modules using `i...
|
03-12 20:41 | Success | - | |
|
exp_cr_10.69987_jacs.2024.40306_20260312_203844
|
Semantic Verifier for Post-hoc Answer Validation in Chat Platforms: Claim Decomposition, Evidence Retrieval, NLI, and Tr...
**Architecture:** Modular post-hoc verification pipeline consisting of Claim Decomposition, Evidence Retrieval, and NLI classification. **Retrieval Strategy:** Uses a "title-only evidence approximation." The system indexes Wikipedia page ti...
|
03-12 20:38 | Success | - | |
|
exp_2403.14952v1_20260312_203735
|
Evidence-Driven Retrieval Augmented Response Generation for Online Misinformation
**Architecture:** RARG utilizes a two-stage pipeline: (1) Evidence Collection via **retrieval and reranking** from a corpus of 1M+ academic articles, and (2) Response Generation using an **RLHF-aligned LLM** tuned to maximize evidence utili...
|
03-12 20:37 | Success | - | |
|
exp_cr_10.1609_aaai.v38i20.30590_20260312_203617
|
Select and Augment: Enhanced Dense Retrieval Knowledge Graph Augmentation (Abstract Reprint)
**Architecture:** A dual-component framework combining a Knowledge Graph (KG) embedding model with a trainable dense Retriever. Unlike static augmentation, this model performs multi-task optimization to select and align KG entities with dyn...
|
03-12 20:36 | Success | - | |
|
exp_pytrain.20260312203314.015_20260312_203351
|
Python Skill Fallback
Title: Generic CLI Toolkit with Type Parameter Syntax - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-12 20:33 | Success | - | |
|
exp_cr_10.1609_aaai.v38i8.28717_20260312_203125
|
Learning to Rank in Generative Retrieval (LTRGR) Benchmark
**Architecture:** LTRGR optimizes **Generative Retrieval** (typically T5-based Seq2Seq models) by introducing a **Learning-to-Rank (LTR)** training objective. It replaces standard maximum likelihood estimation with a **ListWise rank loss**,...
|
03-12 20:31 | Success | - | |
|
exp_cr_10.1609_aaai.v38i16.29728_20260312_203030
|
This benchmark implements a lightweight, self-contained evaluation harness inspired by the RGB (Retrieval-Augmented Gene...
**Paper:** Benchmarking Large Language Models in Retrieval-Augmented Generation (RGB) **Type:** RAG Evaluation & Robustness Analysis **Summary:** This paper introduces the **RGB benchmark**, isolating four critical RAG capabilities: noise r...
|
03-12 20:30 | Success | - | |
|
exp_2403.16435v1_20260312_202944
|
InstUPR Benchmark: Instruction-based Unsupervised Passage Reranking
**Architecture:** Unsupervised reranker leveraging instruction-tuned LLMs via prompt engineering. Utilizes **pairwise comparison** and a novel **soft score aggregation** mechanism to rank passages without task-specific fine-tuning. **Retrie...
|
03-12 20:29 | Success | - | |
|
exp_2403.17209v4_20260312_202855
|
Benchmark: Asset Administration Shell (AAS) Generation via Semantic Nodes
**Architecture:** Constructs a "semantic node" data structure to map raw technical datasheets into standardized Asset Administration Shells (AAS) for Industry 4.0. **RAG Implementation:** Utilizes Retrieval-Augmented Generation to ground te...
|
03-12 20:28 | Success | - | |
|
exp_pytrain.20260312202620.014_20260312_202650
|
Strictly Typed Config Package Builder
This benchmark evaluates your ability to programmatically generate a valid Python package structure and implement a robust configuration system using modern standard packaging and static type safety features. Objective Create a single execu...
|
03-12 20:26 | Success | - | |
|
exp_2309.07606v2_20260312_202532
|
Zero-shot Audio Topic Reranking Benchmark
**Architecture:** Dual-stage pipeline combining vector-based retrieval with zero-shot LLM reranking. **Retrieval Strategy:** Rapid search via video attribute embeddings. **Reranking Method:** Zero-shot LLM scoring to refine initial results....
|
03-12 20:25 | Success | - | |
|
exp_2309.12767v1_20260312_202406
|
Furthest Reasoning with Plan Assessment: Stable Reasoning Path with Retrieval-Augmented Large Language Models
**Architecture:** An iterative RAG framework coupling a generator LLM with a distinct, trainable "Plan Assessor" module. **RAG Specifics:** * **Architecture:** Iterative Retrieval. * **Strategy:** Uses "Furthest Reasoning," where the LLM re...
|
03-12 20:24 | Success | - | |
|
exp_2309.14805v1_20260312_202315
|
Fine-tuning and aligning question answering models for complex information extraction tasks
**Architecture:** Proposes a **fine-tuned Extractive Question Answering (QA)** architecture (specifically German encoder-based models) rather than generative LLMs. This approach focuses on span prediction to guarantee output grounding withi...
|
03-12 20:23 | Success | - | |
|
exp_2309.15088v1_20260312_202150
|
RankVicuna: Zero-Shot Listwise Document Reranking Benchmark
**RankVicuna** adapts the Vicuna-7B LLM for zero-shot listwise document reranking, achieving performance comparable to GPT-3.5. * **Architecture:** Listwise permutation generation. It acts as a second-stage reranker, ingesting a query and r...
|
03-12 20:22 | Success | - | |
|
exp_pytrain.20260312201913.013_20260312_201958
|
Strictly-Typed Plugin Registry with Dynamic Dependency Loading
**Overview:** This benchmark evaluates a developer's ability to construct a framework-agnostic plugin architecture using Python's advanced type system and standard library introspection tools. **Hypothesis:** An autonomous system can construct a robu...
|
03-12 20:20 | Success | - | |
|
exp_2303.12024v3_20260312_201753
|
Benchmark for cTBLS: Augmenting Large Language Models with Conversational Tables
**Architecture:** cTBLS is a 3-stage RAG pipeline: (1) Dense Retrieval (Transformer encoders) for table selection, (2) Coarse+Fine Ranking (shared encoder-decoder) for cell selection, and (3) LLM Generation (paper uses GPT-3.5). **Retrieval...
|
03-12 20:17 | Success | - | |
|
exp_2303.12501v1_20260312_201707
|
Text-to-Image Person Retrieval Benchmark
**Paper:** Cross-Modal Implicit Relation Reasoning and Aligning (IRRA) for Text-to-Image Person Retrieval **Architecture & Retrieval Focus:** IRRA proposes a **cross-modal encoder** architecture. Instead of treating modalities independently...
|
03-12 20:17 | Success | - | |
|
exp_2304.00241v1_20260312_201624
|
Benchmarking Bipartite Graph Convolutional Hashing (BGCH)
**Architecture:** End-to-End Bipartite Graph Convolutional Network (GCN) that generates compact binary hash codes. It utilizes adaptive convolution and latent feature dispersion to preserve structural information during binarization. **Retr...
|
03-12 20:16 | Success | - | |
|
exp_hf_2603.08075_20260312_201526
|
TALON: Test-time Adaptive Learning for On-the-Fly Category Discovery
**Architecture:** TALON replaces static hash-based quantization with a test-time adaptation framework featuring two core components: semantic-aware prototype updates (refining class representations) and stable test-time encoder updates (int...
|
03-12 20:15 | Success | - | |
|
exp_pytrain.20260312201241.012_20260312_201316
|
Typed Module Emulator with Semantic Versioning
This benchmark evaluates the capability of a Python environment to construct a standalone, typed library module that simulates strict software packaging practices. **Objective:** The candidate script, `benchmark.py`, must function as a self-cont...
|
03-12 20:13 | Success | - | |
|
exp_hf_2603.10913_20260312_200041
|
LLM2Vec-Gen: Generative Embeddings Benchmark
**Architecture:** LLM2Vec-Gen utilizes a **frozen LLM backbone** augmented with trainable special tokens appended to the input. Training involves optimizing these tokens using the LLM’s own completions and distillation signals from an unsup...
|
03-12 20:11 | Success | - | |
|
exp_2309.11049v2_20260312_195941
|
Localize, Retrieve and Fuse: A Generalized Framework for Free-Form Question Answering over Tables
**Architecture:** TAG-QA uses a three-stage pipeline: (1) **Table-to-Graph conversion** via Graph Neural Networks (GNN) to locate relevant cells; (2) **External Retrieval** fetching Wikipedia evidence; (3) **Fusion Generator** integrating b...
|
03-12 19:59 | Success | - | |
|
exp_2512.12938v1_20260312_195815
|
SPAR: Session-based Pipeline for Adaptive Retrieval
**Architecture:** SPAR proposes a two-stage **adaptive RAG** framework. It replaces monolithic vector databases with a lightweight static **Semantic Metadata Index** coupled with dynamically generated, **session-specific vector databases**....
|
03-12 19:58 | Success | - | |
|
exp_pytrain.20260312195537.011_20260312_195617
|
Generic CLI Execution Engine with Type-Safe Decorators
This benchmark demonstrates a robust, modular command-line interface system built entirely with the Python standard library. It leverages advanced typing features—specifically `typing.Protocol`, `typing.ParamSpec`, and `typing.Concatenate`—...
|
03-12 19:56 | Success | - | |
|
exp_2506.13607v1_20260312_194437
|
Tree-Based Text Retrieval via Hierarchical Clustering
**Architecture:** Replaces standard vector search with a **Hierarchical Clustering** retrieval architecture. **Indexing/Chunking:** Uses a **tree-based structure** where document chunks are organized into hierarchical clusters based on sema...
|
03-12 19:54 | Success | - | |
|
exp_cr_10.1609_aaai.v38i17.29947_20260312_194332
|
Fine-Grained Distillation for Long Document Retrieval Benchmark
**Architecture:** FGD enhances standard **dense bi-encoders** (retrievers) via a specific training-stage distillation loss. It addresses the "granular-mismatch" in long documents by aligning global representations across multiple granularit...
|
03-12 19:43 | Success | - | |
|
exp_cr_10.3390_math11122733_20260312_194233
|
Benchmark: Automotive Domain Retrieval-Based QA
**Summary for ARES 8GB Roadmap** This paper validates a **domain-adaptive encoder-retriever** for automotive QA using a **BERT-base** architecture fine-tuned via a pretraining-multitask framework. * **Architecture & Retrieval:** Standard **...
|
03-12 19:43 | Success | - | |
|
exp_pytrain.20260312194036.010_20260312_194120
|
Strictly Typed Asynchronous Package Architecture
This benchmark evaluates a developer's ability to structure a formally typed Python package using `asyncio`, `typing.Generic`, and proper packaging markers (`py.typed`). The script dynamically generates the required package structure, verif...
|
03-12 19:41 | Success | - | |
|
exp_hf_2603.09827_20260312_193938
|
MA-EgoQA: Multi-Agent Egocentric Video QA Benchmark
**Architecture:** EgoMAS proposes a RAG-style pipeline featuring a "shared memory" module to fuse multi-agent sensory data. It utilizes **agent-wise dynamic retrieval**, compressing video frames into feature embeddings via a vision encoder,...
|
03-12 19:39 | Success | - | |
|
exp_oa_W4415233873_20260312_193841
|
Healthcare RAG Performance Benchmark
**Architecture:** This survey classifies RAG into Naive, Advanced, and Modular frameworks. For 8GB constraints, **Naive RAG** is the primary viable candidate for local inference, as it follows a linear "retrieve-then-read" pipeline. **RAG S...
|
03-12 19:38 | Success | - | |
|
exp_oa_W4416955380_20260312_193759
|
Evaluating Faithfulness in Agentic RAG Systems for e-Governance Applications Using LLM-Based Judging Frameworks
**Paper:** Evaluating Faithfulness in Agentic RAG Systems for e-Governance Applications... **Summary:** This study proposes a **modular Agentic RAG** framework rather than a low-memory inference technique. It evaluates a hybrid retrieval ar...
|
03-12 19:38 | Success | - | |
|
exp_2512.10942v2_20260312_193652
|
VL-JEPA: Joint Embedding Predictive Architecture for Vision-language
**Architecture:** Replaces autoregressive token generation with a Joint Embedding Predictive Architecture (JEPA). The model predicts continuous text embeddings via a vision encoder and predictor, utilizing a lightweight text decoder only wh...
|
03-12 19:36 | Success | - | |
|
exp_2512.11614v2_20260312_193604
|
Merlin-Arthur RAG Benchmarking Suite
**Architecture:** Proposes a Merlin-Arthur (M/A) training protocol where a generator LLM ("Arthur") is trained using a helpful retriever ("Merlin") and an adversarial retriever ("Morgana"). **RAG Specifications:** * **Retrieval:** Utilizes...
|
03-12 19:36 | Success | - | |
|
exp_pytrain.20260312193408.009_20260312_193437
|
Dynamic Plugin Loader with Strict Protocol Typing
This benchmark tests the ability to construct a modular plugin architecture using Python's advanced `typing` features and the `importlib` system. **Overview:** The script programmatically creates a temporary package structure (`mock_package/`) c...
|
03-12 19:34 | Success | - | |
|
exp_2512.11997v1_20260312_193214
|
Benchmark: EnrichLog - Knowledge-Enriched Log Anomaly Detection
**Architecture:** EnrichLog is a training-free, entry-based anomaly detection framework utilizing a RAG pipeline to fuse raw logs with external knowledge. **Retrieval & Context Strategy:** * **Architecture:** Vector-based retrieval (dense e...
|
03-12 19:32 | Success | - | |
|
exp_2512.12694v1_20260312_193127
|
Hybrid RAG Benchmark
**Architecture:** Modular multilingual RAG pipeline utilizing **Hybrid Retrieval** to handle noisy OCR data. It combines semantic query expansion and multi-query fusion, aggregated via **Reciprocal Rank Fusion (RRF)** to stabilize recall ag...
|
03-12 19:31 | Success | - | |
|
exp_2602.22219v1_20260312_193031
|
Comparative Analysis of Neural Retriever-Reranker Pipelines for Retrieval-Augmented Generation over Knowledge Graphs in...
**Paper:** Comparative Analysis of Neural Retriever-Reranker Pipelines for Retrieval-Augmented Generation over Knowledge Graphs in E-commerce Applications **Summary for ARES 8GB Roadmap:** This study evaluates Retriever-Reranker pipelines f...
|
03-12 19:30 | Success | - | |
|
exp_2512.13632v1_20260312_192937
|
StutterFuse: Performance Benchmark
**Architecture:** StutterFuse is a **Retrieval-Augmented Classifier (RAC)** combining a **Conformer encoder** with a **Gated Mixture-of-Experts (MoE)**. It conditions acoustic features on a **non-parametric memory bank** of clinical example...
|
03-12 19:29 | Success | - | |
|
exp_pytrain.20260312192727.008_20260312_192800
|
Robust Namespace Package Loader with Structural Typing
This benchmark evaluates your ability to construct a scalable plugin architecture using modern Python typing features (PEP 544 Protocols) and the standard library's import system (`importlib`, `pkgutil`). **Objective:** Implement a `PluginLoader...
|
03-12 19:28 | Success | - | |
|
exp_2512.14313v1_20260312_192550
|
Dynamic Context Selection for Retrieval-Augmented Generation: Mitigating Distractors and Positional Bias
**Architecture & Retrieval Strategy** This paper replaces standard fixed top-$k$ retrieval with a **dynamic context selection** mechanism. The architecture introduces a lightweight **context-size classifier** (likely a BERT-style model) tha...
|
03-12 19:25 | Success | - | |
|
exp_cr_10.55606_jurritek.v4i3.6664_20260312_192458
|
This repository contains the benchmarking code for the UCIC Academic Service Chatbot based on the Retrieval-Augmented Ge...
**Paper:** Chatbot Layanan Akademik Calon Mahasiswa UCIC Menggunakan Metode RAG **Summary for 8GB Roadmap:** * **Architecture:** Standard Retrieval-Augmented Generation (RAG) pipeline orchestrated via LangChain. * **Retrieval:** **FAISS** (...
|
03-12 19:25 | Success | - | |
|
exp_cr_10.37432_jieph-confpro5-00265_20260312_192430
|
Enhancing Lassa fever health literacy through AI: Development and evaluation of a retrieval-augmented generation chatbot...
**Architecture**: Standard Retrieval-Augmented Generation (RAG) chatbot. **Retrieval Strategy**: Curated static documents (WHO, NCDC). *Specific indexing, chunking strategy, vector database, and reranking methods are not specified in the pr...
|
03-12 19:24 | Success | - | |
|
exp_2506.12483v1_20260312_192353
|
MALM: A Multi-Information Adapter for Large Language Models to Mitigate Hallucination
**Architecture:** MALM introduces a parameter-efficient adapter utilizing a **multilayered Graph Attention Network (GAT)**. It explicitly models the interdependencies between the original input, retrieved context, and parametric knowledge t...
|
03-12 19:23 | Success | - | |
|
exp_2506.14035v1_20260312_192313
|
SimpleDoc Benchmark
**Architecture:** Agentic multi-modal RAG framework utilizing a Vision Language Model (VLM) for both embedding and final reasoning. **Retrieval Architecture & Strategy:** * **Indexing/Chunking:** Pages are indexed as visual chunks using VLM...
|
03-12 19:23 | Success | - | |
|
exp_pytrain.20260312192112.007_20260312_192135
|
Typed Component Registry System
This project implements a robust, type-safe component registry pattern using Python's `typing` module. It demonstrates how to build a plugin architecture where the compiler and runtime enforce strict interface compliance, reducing attribute...
|
03-12 19:21 | Success | - | |
|
exp_2506.15001v1_20260312_191004
|
Memory Token Benchmark
**Architecture:** Introduces "Memory Tokens"—single, optimized embedding vectors that act as lossless, compressed keys. When prompted with this token, the LLM reconstructs the original text sequence (up to ~240 tokens) exactly without weigh...
|
03-12 19:20 | Success | - | |
|
exp_2506.16035v2_20260312_190911
|
Vision-Guided Chunking Benchmark
**Architecture:** Multimodal RAG utilizing Large Multimodal Models (LMMs) for document parsing instead of traditional text extractors. **Retrieval & Chunking:** **Vision-Guided Chunking**. The strategy processes PDFs in **configurable page...
|
03-12 19:09 | Success | - | |
|
exp_pytrain.20260312190702.006_20260312_190728
|
Runtime Type-Checked Plugin Loader
This benchmark demonstrates a robust, autonomous plugin architecture using Python's standard library. The system simulates a multi-module package hierarchy entirely in-memory using `types` and `importlib`, bypassing the need for physical fi...
|
03-12 19:07 | Success | - | |
|
exp_2506.16037v1_20260312_185540
|
Multi-Hop RAG Benchmark for LLaMA 3
**Architecture:** LLaMA 3 enhanced with a **Dense Retrieval Module** and multi-hop reasoning chains for complex, long-document QA. **RAG Specifics:** * **Retrieval Architecture:** **Dense Retrieval**. * **Optimization/Reranking:** Uses **Jo...
|
03-12 19:05 | Success | - | |
|
exp_pytrain.20260312185359.005_20260312_185418
|
Generic Service Registry & Dispatcher Benchmark
This benchmark evaluates the implementation of a robust, type-safe Service Registry using Python's standard `typing` module. The focus is on structural subtyping via `Protocol`, generics via `TypeVar`, and simulating proper packaging conven...
|
03-12 18:54 | Success | - | |
|
exp_cr_10.3390_ai6030050_20260312_185245
|
Benchmark: Multimodal RAG for Eurobarometer Data
**Architecture:** Modular framework integrating Retrieval-Augmented Generation (RAG) with Multimodal Large Language Models (MLLMs) to process Eurobarometer surveys (text + charts/images). **RAG Specifics:** * **Retrieval Architecture:** Mul...
|
03-12 18:52 | Success | - | |
|
exp_cr_10.71070_oaml.v5i1.141_20260312_185203
|
Retrieval-augmented generation for personalized physician recommendations in online medical services: model development...
**Architecture:** Standard dense RAG. The system uses embedding-based retrieval to match patient queries against a database of consultation records and physician profiles, followed by an LLM synthesizing the recommendation. **Retrieval:** E...
|
03-12 18:52 | Success | - | |
|
exp_oa_W4404390755_20260312_185120
|
LEGO-GraphRAG Benchmark
**Architecture:** LEGO-GraphRAG decomposes the GraphRAG pipeline into four modular stages: **Query Understanding**, **Retrieval**, **Subgraph Construction**, and **Response Synthesis**. **RAG Specifics:** * **Retrieval Architecture:** Modul...
|
03-12 18:51 | Success | - | |
|
exp_2409.08479v2_20260312_185042
|
Exploring Information Retrieval Landscapes: An Investigation of a Novel Evaluation Techniques and Comparative Document S...
**Assessment for ARES 8GB Roadmap** This paper focuses on optimizing RAG preprocessing pipelines rather than core inference architecture or VRAM management. **Retrieval & Chunking:** * **Architecture:** The study evaluates a standard RAG pi...
|
03-12 18:50 | Success | - | |
|
exp_2409.09281v2_20260312_184944
|
Benchmark: Language Models "Grok" to Copy
This paper is a theoretical study of **Transformer** internal dynamics, specifically regarding the formation of **Induction Heads**—the attention mechanism responsible for copying context, a prerequisite for **In-Context Learning (ICL)** an...
|
03-12 18:49 | Success | - | |
|
exp_pytrain.20260312184753.004_20260312_184813
|
Robust Plugin Loader with Runtime Type Checking
**Difficulty:** Intermediate **Focus:** Dynamic Packaging, Structural Typing (`typing.Protocol`), `importlib` **Time Limit:** 20 Seconds **Objective:** Implement a self-contained Python benchmark that simulates a plugin architecture. The system...
|
03-12 18:48 | Success | - | |
|
exp_2409.10955v2_20260312_184722
|
Investigating Context-Faithfulness in Large Language Models: The Roles of Memory Strength and Evidence Style
**Verdict:** Low-priority architectural integration, high-priority retrieval pipeline optimization. **Research Focus:** This is an empirical analysis of RAG behaviors rather than a new model architecture. It investigates how **Memory Streng...
|
03-12 18:47 | Success | - | |
|
exp_2409.11242v4_20260312_184603
|
Measuring and Enhancing Trustworthiness of LLMs in RAG through Grounded Attributions and Learning to Refuse
**Architecture & Memory:** Trust-Align is an alignment strategy designed for small, open-weight models (LLaMA 1-8B, Qwen 0.5-7B, Phi-3.5). It focuses on "Grounded Attributions" and "Learning to Refuse," ensuring outputs strictly adhere to r...
|
03-12 18:46 | Success | - | |
|
exp_2409.12812v3_20260312_184511
|
CoDrivingLLM Benchmark
**Architecture:** CoDrivingLLM utilizes a modular design separating semantic reasoning from physics. An **Environment Module** handles mathematical updates (vehicle kinematics), while a **CoT-based Reasoning Module** manages state perceptio...
|
03-12 18:45 | Success | - | |
|
exp_2409.13682v1_20260312_184426
|
ReMEmbR Benchmark: Long-Horizon Memory Retrieval
**Architecture:** ReMEmbR is a retrieval-augmented framework utilizing a dual-phase structure: a memory building phase and a querying phase. It uses a Vision-Language Model (VLM) to encode video frames and metadata into a memory bank, rathe...
|
03-12 18:44 | Success | - | |
|
exp_2409.13992v1_20260312_184330
|
SMART-RAG: Context Selection Benchmark
**Architecture:** SMART-RAG replaces standard top-k selection with **Determinantal Point Processes (DPPs)** to optimize for both relevance and diversity. **Retrieval & Budget:** Utilizes a **Retrieve-then-Select** strategy. It retrieves a l...
|
03-12 18:43 | Success | - | |
|
exp_pytrain.20260312184117.003_20260312_184144
|
Type-Safe Dynamic Package Registry Benchmark
This benchmark tests the robustness of a dynamic Python plugin system. It simulates an environment where functionality is extended at runtime by loading modules from the filesystem. The core challenge is to ensure that these dynamically loa...
|
03-12 18:41 | Success | - | |
|
exp_oa_W4399511665_20260312_183957
|
Multi-Head RAG: Solving Multi-Aspect Problems with LLMs
**Architecture:** MRAG modifies the *Retriever Component* by using the activations of each Transformer attention head as distinct retrieval keys, rather than a single aggregated embedding vector. **Retrieval & Indexing:** It utilizes a **mu...
|
03-12 18:40 | Success | - | |
|
exp_2403.14197v1_20260312_183843
|
Context Quality Matters in Training Fusion-in-Decoder for Extractive Open-Domain Question Answering
**Architecture:** Fusion-in-Decoder (FiD). **Retrieval Strategy:** Passage-level retrieval with multi-context concatenation. **Memory Footprint:** **Critical Constraint.** FiD encodes all retrieved passages simultaneously in the encoder. Th...
|
03-12 18:38 | Success | - | |
|
exp_2403.14374v1_20260312_183727
|
FIT-RAG: Black-Box RAG Benchmark
**Architecture:** FIT-RAG optimizes black-box RAG using a **Bi-label Document Scorer** (aligns retrieval with factual relevance rather than LLM preference), a **Self-knowledge Recognizer** (bypasses retrieval if the frozen LLM knows the ans...
|
03-12 18:38 | Success | - | |
|
exp_2403.15268v5_20260312_183652
|
Awakening Augmented Generation (AAG) Benchmark
**Architecture:** A non-retrieval framework designed to activate internal knowledge. It employs a Context Generator to synthesize a compressed "symbolic" document and a Hypernetwork to generate dynamic, query-specific adapters. These adapte...
|
03-12 18:36 | Success | - | |
|
exp_pytrain.20260312183421.002_20260312_183454
|
Dynamic Module Loader with PEP 695 Syntax
This benchmark tests the ability to implement a generic wrapper for runtime module loading using Python 3.12+'s Type Parameter Syntax (PEP 695). **Objective:** Create a script `dynamic_loader.py` that implements a generic class `ModuleLoader[T]`...
|
03-12 18:34 | Success | - | |
|
exp_2404.07221v2_20260312_182309
|
Benchmark: RAG Retrieval Enhancement on Financial Documents
This paper proposes a modular RAG optimization pipeline focused on financial document QA, aiming to fix retrieval errors rather than LLM limitations. * **Architecture:** Standard RAG with dense vector retrieval. * **Chunking:** "Sophisticat...
|
03-12 18:33 | Success | - | |
|
exp_2403.15729v3_20260312_182222
|
RAGS4EIC Summarization Benchmark
**RAGS4EIC** proposes a RAG-based agent for managing complex scientific documentation using a modular **LangChain** workflow. * **Architecture:** A two-stage pipeline: a comprehensive **Vector Database** for semantic retrieval and an LLM fo...
|
03-12 18:22 | Success | - | |
|
exp_cr_10.1609_aaai.v38i21.30577_20260312_182149
|
GEAR-Up: Generative AI and External Knowledge-Based Retrieval: Upgrading Scholarly Article Searches for Systematic Revie...
**Architecture:** KG-augmented query expansion pipeline. The system retrieves semantic context from a Knowledge Graph (KG) to enrich user queries before passing them to an LLM for translation and refinement. **Retrieval Strategy:** * **Retr...
|
03-12 18:21 | Success | - | |
|
exp_2309.13375v2_20260312_182119
|
Benchmark: Generative Retrieval with SEATER (Semantic Tree-Structured IDs)
**Paper:** SEATER (SEmAntic Tree-structured item identifiERs) **Architecture:** An **Encoder-Decoder Generative Retrieval** framework optimized for large-scale recommendations. It replaces traditional vector similarity search with autoregre...
|
03-12 18:21 | Success | - | |
|
exp_2309.15217v2_20260312_182034
|
Ragas: Automated Evaluation of Retrieval Augmented Generation
**Subject:** Ragas (Automated RAG Evaluation Framework) **Architecture:** An **LLM-as-a-Judge** framework. It utilizes prompt engineering to guide an LLM to score specific dimensions—*Context Precision* (retrieval quality), *Faithfulness* (...
|
03-12 18:20 | Success | - | |
|
exp_pytrain.20260312181849.001_20260312_181909
|
Dynamic Plugin Registry with Runtime Type Validation
This drill verifies the ability to design a robust, extensible plugin system using Python's standard library. Candidates must demonstrate proficiency with `typing.Protocol` for structural subtyping, `importlib` for dynamic code loading, and...
|
03-12 18:19 | Success | - | |
|
exp_pytrain.20260312140657.027_20260312_140723
|
Dynamic Type-Safe Plugin Loader with Runtime Validation
**Overview:** This benchmark demonstrates a robust, autonomous system for loading Python plugins dynamically from a simulated package distribution. It enforces strict type safety...
|
03-12 14:09 | Success | - | |
|
exp_oa_W7114889968_20260312_140058
|
RAG vs. Parametric Performance Benchmark
**Paper Type:** Systematic Literature Review (SLR) **Analysis Scope:** Synthesis of 128 studies (Jan 2020–May 2025) on Retrieval-Augmented Generation (RAG). **Architecture & Feasibility:** N/A (Survey Paper). This paper does not propose a s...
|
03-12 14:05 | Success | - | |
|
exp_pytrain.20260312135459.026_20260312_135525
|
Strict Configuration Dispatcher Benchmark
**Objective:** This benchmark evaluates an autonomous agent's ability to implement a "Configuration-to-Instance" dispatcher, a core pattern in high-performance machine learning frameworks (e.g....
|
03-12 13:56 | Success | - | |
|
exp_2512.12935v1_20260312_135132
|
Unified Interactive Multimodal Moment Retrieval - Benchmark
**Paper:** Unified Interactive Multimodal Moment Retrieval via Cascaded Embedding-Reranking and Temporal-Aware Score Fusion **Summary:** **Retrieval Architecture:** A cascaded dual-encoder system using **BEIT-3** and **SigLIP** for broad ca...
|
03-12 13:52 | Success | - | |
|
exp_2512.14554v4_20260312_134845
|
VLegal-Bench: Cognitively Grounded Benchmark for Vietnamese Legal Reasoning of Large Language Models
This paper introduces **VLegal-Bench**, a **benchmark** rather than a novel model architecture, designed to evaluate LLMs on Vietnamese legal reasoning using 10,450 expert-validated samples. * **Architecture & Feasibility:** The benchmark f...
|
03-12 13:49 | Success | - | |
|
exp_pytrain.20260312134313.025_20260312_134349
|
Typed CLI Dispatcher & Entry-Point Simulation
This benchmark demonstrates an advanced understanding of Python's type system and software architecture patterns, specifically focusing on creating a modular, extensible CLI framework...
|
03-12 13:44 | Success | - | |
|
exp_cr_10.5334_uproc.170_20260312_133920
|
Smart Decision-Making: The Role of Digital Twins, Retrieval-Augmented Generation-Enhanced AI, and Learning Analytics
**Architecture:** Proposes a macro-architecture integrating Learning Analytics (data mining), Digital Twins (simulation), and RAG-enhanced LLMs (synthesis) for higher-ed management. **RAG Specifics:** **Missing Technical Specs.** The abstra...
|
03-12 13:40 | Success | - | |
|
exp_pytrain.20260312133517.024_20260312_133557
|
Strict Package Metadata and Typing Inspector Benchmark
**Overview:** This benchmark evaluates a system's ability to generate a Python CLI tool that performs static analysis on a codebase. The tool, `pkg_inspector.py`, must verify packa...
|
03-12 13:36 | Success | - | |
|
exp_cr_10.3390_ai6090226_20260312_133329
|
Section 1: README.md
**Type:** Systematic Literature Review (SLR). **Architecture:** Synthesizes **Naïve**, **Advanced**, and **Modular** RAG architectures for clinical applications (diagnostics, EHR summarization, QA). **RAG Specifics:** As a survey, it aggreg...
|
03-12 13:34 | Success | - | |
|
exp_pytrain.20260312132648.023_20260312_132722
|
Dynamic Plugin Loader with Structural Subtyping
This benchmark demonstrates a robust, zero-dependency plugin architecture using Python's standard library. **Objective:** To simulate an autonomous coding system capable of: 1. **Defining Strict Interfaces:** Using `typing.Protocol` to enforce s...
|
03-12 13:28 | Success | - | |
|
exp_2506.13026v1_20260312_132337
|
Knowledge Graph Fusion with Large Language Models for Accurate, Explainable Manufacturing Process Planning
**Architecture:** ARKNESS is a GraphRAG framework fusing zero-shot Knowledge Graph (KG) construction with on-premise LLMs for CNC process planning. **Retrieval Strategy:** * **Indexing:** Converts heterogeneous documents into multi-relation...
|
03-12 13:24 | Success | - | |
|
exp_2506.15862v1_20260312_132201
|
MoR (Mixture of Retrievers) Benchmark
**Architecture & Memory** MoR proposes a lightweight gating network (0.8B parameters) to dynamically fuse outputs from heterogeneous retrievers. The architecture combines BM25 (Sparse), Dense Embeddings (Semantic), and specialized Human ret...
|
03-12 13:22 | Success | - | |
|
exp_pytrain.20260312131840.022_20260312_131907
|
Generic Plugin System Benchmark (PEP 695)
**Overview:** This benchmark evaluates the implementation of a **Generic Plugin System** using modern Python 3.12+ features. It specifically validates the usage of **PEP 695 Type Parameter...
|
03-12 13:19 | Success | - | |
|
exp_cr_10.3897_biss.8.136735_20260312_131443
|
Benchmark: LLM-Based Biodiversity Information Extraction
**Summary for ARES 8GB Roadmap** **Objective:** Automate the extraction of deep learning metadata (datasets, metrics, hyperparameters) from biodiversity literature to replace manual annotation. **RAG & Architecture:** * **Base Model:** Mixt...
|
03-12 13:16 | Success | - | |
|
exp_pytrain.20260312131038.021_20260312_131136
|
Python Skill Fallback
Title: Robust Dependency Graph Resolver using Structural Typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-12 13:11 | Success | - | |
|
exp_2409.11190v2_20260312_130720
|
SuperCoder2.0: Architecture Benchmark
**Architecture & RAG:** SuperCoder2.0 utilizes a multi-agent architecture with a **three-step hierarchical RAG** pipeline. 1. **Retrieval:** Uses a **Repository File Level Map** to identify candidate files. 2. **Chunking/Indexing:** Refines...
|
03-12 13:08 | Success | - | |
|
exp_2409.12468v3_20260312_130537
|
Familiarity-Aware Evidence Compression (FaviComp) Benchmark
**Paper:** FaviComp (Familiarity-Aware Evidence Compression) **Architecture:** FaviComp is a **training-free** compression module designed to sit between the retriever and the generator in a RAG pipeline. It utilizes the target generator’s...
|
03-12 13:06 | Success | - | |
|
exp_pytrain.20260312130336.020_20260312_130353
|
Dynamic Plugin Loader with Type-Safe Contracts Benchmark
This benchmark evaluates a Python system's ability to dynamically load code at runtime while strictly enforcing interface compliance using `typing.Protocol`. **Objective:** The g...
|
03-12 13:04 | Success | - | |
|
exp_2409.12682v2_20260312_130030
|
Retrieval-Augmented Test Generation Benchmark
**Summary for ARES 8GB Roadmap** **Architecture & RAG Strategy:** The paper evaluates a **Basic RAG** pipeline against a domain-specific **API-level RAG** approach. The retrieval architecture pulls from three external sources: API documenta...
|
03-12 13:01 | Success | - | |
|
exp_2409.14175v2_20260312_125902
|
QMOS: Enhancing LLMs for Telecommunication with Question Masked loss and Option Shuffling
**Architecture:** Fine-tunes efficient Small Language Models (SLMs), specifically **Phi-2** (2.7B) and **Falcon-7B**, within a RAG framework. Introduces **Question-Masked Loss** (masking query tokens to force context-to-option alignment) an...
|
03-12 12:59 | Success | - | |
|
exp_pytrain.20260312125609.019_20260312_125643
|
Type-Safe Plugin Registry and Configuration Loader Benchmark
**Overview:** This benchmark evaluates the capability of an autonomous coding system to implement core architectural patterns found in large-scale machine learning frameworks...
|
03-12 12:56 | Success | - | |
|
exp_2403.17759v1_20260312_125445
|
TWOLAR: a TWO-step LLM-Augmented distillation method for passage Reranking
**Architecture:** A two-step distillation pipeline training a lightweight BERT-based **Cross-Encoder** student to mimic the zero-shot reranking capabilities of a large LLM teacher. **RAG & Retrieval Strategy:** * **Retrieval:** Agnostic to...
|
03-12 12:55 | Success | - | |
|
exp_2512.12980v2_20260312_125254
|
Benchmark: Iceberg - Task-Centric Vector Similarity Search
This paper introduces **Iceberg**, a benchmark suite evaluating **Vector Similarity Search (VSS)** architectures based on downstream task utility rather than isolated recall-latency metrics. **Retrieval Architecture:** Focuses on **Approxim...
|
03-12 12:53 | Success | - | |
|
exp_2512.13771v1_20260312_125045
|
Semantic Grounding Index (SGI) Benchmark
**Architecture:** Introduces the Semantic Grounding Index (SGI), a geometric post-hoc detector analyzing angular distances on a hypersphere ($\mathbb{S}^{d-1}$). It identifies "semantic laziness" where responses remain proximate to question...
|
03-12 12:51 | Success | - | |
|
exp_pytrain.20260312124840.018_20260312_124909
|
README.md bash python benchmark.py
|
03-12 12:49 | Success | - | |
|
exp_cr_10.63887_jtie.2025.1.3.3_20260312_124710
|
Benchmark: LLM-RAG Patent Retrieval System
**Architecture & Retrieval** The paper proposes a cloud-centric RAG framework utilizing `gpt-3.5-turbo` for generation and an unspecified "high-efficiency vector retrieval engine" for semantic search. The pipeline consists of data preproces...
|
03-12 12:47 | Success | - | |
|
exp_oa_W4409588626_20260312_124607
|
Benchmark: Mamba-GraphRAG for Medical Reasoning
**Architecture & Retrieval:** This paper proposes a hybrid GraphRAG system using a Neo4j knowledge graph (storing UMLS entities) combined with a dense vector store (textbook embeddings). The retrieval architecture is dual-layered: it perfor...
|
03-12 12:46 | Success | - | |
|
exp_oa_W4410082953_20260312_124514
|
Investigation: Evidence-Based GraphRAG for USMLE Questions
**Architecture:** Hybrid GraphRAG utilizing **Neo4j** for symbolic reasoning (UMLS entities) and a vector store for semantic search (textbook embeddings). **Retrieval:** Dual-strategy indexing: graph-based entity mapping and dense retrieval...
|
03-12 12:45 | Success | - | |
|
exp_2506.16444v2_20260312_124429
|
REIS: In-Storage Processing Retrieval Benchmark
**Architecture:** REIS proposes an In-Storage Processing (ISP) architecture that offloads Approximate Nearest Neighbor Search (ANNS) retrieval computations directly to the SSD controller, minimizing data movement between storage and host. **RAG Sp...
|
03-12 12:44 | Success | - | |
|
exp_cr_10.3390_app14177995_20260312_124342
|
Personalized RAG System Benchmark
**Architecture & Retrieval Strategy** This paper implements a standard RAG pipeline using **hybrid retrieval**. It combines semantic search via `text-embedding-ada-002` with **keyword tagging** to organize documents into **context-based cat...
|
03-12 12:43 | Success | - | |
|
exp_pytrain.20260312124150.017_20260312_124217
|
**README.md**
|
03-12 12:42 | Success | - | |
|
exp_2409.09916v1_20260312_124107
|
SFR-RAG Benchmark Suite
**Architecture & RAG Design** SFR-RAG-9B is a dense, instruction-tuned decoder-only model optimized specifically for the **Reader/Generator** component of RAG. It does not define a specific internal retrieval architecture but is engineered...
|
03-12 12:41 | Success | - | |
|
exp_2309.10966v6_20260312_123950
|
MBR and QE Finetuning: Training-time Distillation of the Best and Most Expensive Decoding Methods
**Architecture:** Standard Transformer encoder-decoder. The authors propose "MBR Finetuning" and "QE Finetuning," training strategies that distill the knowledge of expensive decoding methods (Minimum Bayes' Risk decoding and Quality Estimat...
|
03-12 12:39 | Success | - | |
|
exp_2512.10787v2_20260312_123829
|
SEAL-RAG Benchmark
**Architecture:** SEAL-RAG is a training-free controller wrapping standard RAG components. It executes a **Search $\rightarrow$ Extract $\rightarrow$ Assess $\rightarrow$ Loop** cycle to perform multi-hop reasoning without expanding the con...
|
03-12 12:38 | Success | - | |
|
exp_cr_10.1609_aaai.v37i4.25598_20260312_123709
|
ConTextual Masked Auto-Encoder (CoT-MAE) Benchmark
**Architecture & Memory:** CoT-MAE utilizes an **asymmetric encoder-decoder** for pre-training but deploys **only the encoder** for inference. This structure is optimized to compress sentence semantics into dense vectors. Memory footprint i...
|
03-12 12:37 | Success | - | |
|
exp_pytrain.20260312123508.016_20260312_123543
|
Strictly Typed Plugin Registry Benchmark
README.md Strictly Typed Plugin Registry Benchmark This drill verifies the use of Python's `typing.Protocol` and `typing.Generic` to build a robust, loosely-coupled system suitable for a distributable library. Objective Candidates must impl...
|
03-12 12:35 | Success | - | |
|
exp_oa_W4415560266_20260312_123343
|
This benchmark evaluates the performance impact of the proposed **MCP-aware Re-ranking** mechanism integrated into a Ret...
**Architecture:** Hybrid multi-agent system utilizing RAG, an Agent Communication Protocol (ACP) for orchestration, and a Model Context Protocol (MCP) for context fusion. **Retrieval & Indexing:** Python prototype using a vector store for t...
|
03-12 12:33 | Success | - | |
|
exp_oa_W4416430905_20260312_123308
|
RAGSmith: A Framework for Finding the Optimal Composition of Retrieval-Augmented Generation Methods Across Datasets
**RAGSmith** employs a genetic search to optimize RAG pipelines over 46,080 configurations. * **Architecture:** The study identifies **Vector Retrieval + Post-Generation Reflection/Revision** as the optimal backbone. **Passage compression**...
|
03-12 12:33 | Success | - | |
|
exp_oa_W4416075695_20260312_123223
|
Benchmark: Retrieval-Augmented Generation (RAG) Performance
**Architecture:** Hybrid "retrieve-then-generate" framework combining parametric LLMs with external, non-parametric knowledge retrieval. **RAG Specifics:** As a comprehensive review, this paper outlines the general paradigm rather than a si...
|
03-12 12:32 | Success | - | |
|
exp_2512.10393v2_20260312_123140
|
BinSeek: Cross-Modal Retrieval for Stripped Binary Analysis
**Architecture:** BinSeek implements a **two-stage retrieval pipeline**: a dual-encoder (**BinSeek-Embedding**) for efficient high-recall retrieval, followed by a cross-encoder (**BinSeek-Reranker**) for context-aware refinement. **Retrieva...
|
03-12 12:31 | Success | - | |
|
exp_2512.10422v3_20260312_123035
|
Cooperative RAG (CoopRAG) Benchmark
**Architecture:** CoopRAG utilizes a dual-component system featuring a dense retriever and an LLM that iteratively exchange states. The retriever employs a "Contrasting Layers" mechanism to rank documents by comparing representations from e...
|
03-12 12:30 | Success | - | |
|
exp_pytrain.20260312122836.015_20260312_122907
|
README.md
|
03-12 12:29 | Success | - | |
|
exp_2512.12458v2_20260312_122715
|
Benchmark Design: Stability of Multi-Vector vs. Single-Vector Retrieval
**Architecture:** Theoretical analysis of **Multi-vector** (ColBERT-style), **Filtered**, and **Sparse** retrieval systems. **Key Findings:** * **Multi-vector:** Proves **Chamfer distance** preserves stability, while average pooling fails....
|
03-12 12:27 | Success | - | |
|
exp_oa_W4417313874_20260312_122609
|
Biomedical RAG Trilemma Benchmark
**Summary for ARES 8GB Roadmap:** This survey (2020–2025) classifies biomedical RAG into **naive**, **advanced**, and **modular** architectures, formalizing the "Biomedical RAG Trilemma" (trade-offs between reasoning depth, inference latenc...
|
03-12 12:26 | Success | - | |
|
exp_2512.13072v1_20260312_122527
|
Benchmark: Retrieval-Guided Continual Learning (RG-CL) for Medical VLMs
**Architecture:** Multimodal VLM framework integrating dynamic knowledge distillation with a **multi-modal, multi-layer RAG** system for Continual Learning (CL). **RAG Strategy:** Retrieves from a massive **18-million record PubMed-derived...
|
03-12 12:25 | Success | - | |
|
exp_2512.14465v2_20260312_122452
|
Context-Picker: Dynamic Context Selection Benchmark
**Architecture:** Replaces static Top-K retrieval with a **two-stage Reinforcement Learning (RL)** policy. It first maximizes recall of critical passages, then prunes redundancy to distill a minimal sufficient evidence set. **RAG Specifics:...
|
03-12 12:24 | Success | - | |
|
exp_2506.12981v2_20260312_122409
|
SymRAG: Neuro-Symbolic Retrieval Benchmark
**Architecture:** SymRAG introduces a neuro-symbolic RAG framework centered on an **adaptive query router**. This router assesses real-time query complexity and system load to dynamically dispatch requests to symbolic (rule-based), neural (...
|
03-12 12:24 | Success | - | |
|
exp_pytrain.20260312122203.014_20260312_122246
|
Python Skill Fallback
Title: Typed Plugin Registry for Model Architectures - Focus: typing.Protocol, typing.TypeVar, typing.Generic, abc, dataclasses, Runtime type checking simulation - Note: Generated fallback due to unavailable model output.
|
03-12 12:22 | Success | - | |
|
exp_2506.14412v2_20260312_122133
|
RAGtifier: Evaluating RAG Generation Approaches of State-of-the-Art RAG Systems for the SIGIR LiveRAG Competition
**Architecture:** Dense retrieval pipeline utilizing Pinecone vectors, a BGE cross-encoder reranker, and InstructRAG for flow control, terminating in a Falcon-3-10B generator. **Memory Footprint:** **High Risk.** Falcon-3-10B requires aggre...
|
03-12 12:21 | Success | - | |
|
exp_2506.15522v1_20260312_122001
|
Benchmark: Grounded LLM Inference & Verification
**Architecture:** Standard decoder LLMs augmented with internal reasoning traces. Optimized via **GRPO (Group Relative Policy Optimization)** using verifiable outcome-based rewards. No architectural changes for memory reduction. **Retrieval...
|
03-12 12:20 | Success | - | |
|
exp_oa_W4403815812_20260312_121905
|
QAEncoder Benchmark
**Architecture & Retrieval Strategy** QAEncoder is a **training-free** augmentation for dense retrieval (Dual-Encoder). It bridges the query-document gap by generating **Question-Expected Embeddings (QEE)**—estimating the center of a query...
|
03-12 12:19 | Success | - | |
|
exp_2409.10576v2_20260312_121819
|
Language Models and Retrieval Augmented Generation for Automated Structured Data Extraction from Diagnostic Reports
**Architecture & Feasibility:** Benchmarks open-weights models (Llama 3, medical fine-tunes) for structured clinical extraction. High implementation feasibility for local deployment. **Memory Footprint:** Crucially validates that **quantiza...
|
03-12 12:18 | Success | - | |
|
exp_2409.11353v3_20260312_121729
|
Here is the design for a small, runnable benchmark tailored to the THaMES innovation. This benchmark focuses on the effi...
**Architecture & Implementation:** THaMES is a modular framework applying In-Context Learning (ICL), Retrieval-Augmented Generation (RAG), and PEFT (LoRA) to mitigate hallucinations. It automates test generation and benchmarking. **RAG & Re...
|
03-12 12:17 | Success | - | |
|
exp_pytrain.20260312121538.013_20260312_121607
|
Strict-Typed Plugin Registry with Runtime Validation
README.md Strict-Typed Plugin Registry with Runtime Validation Overview This benchmark evaluates the design and implementation of a robust, type-safe plugin system in Python using `typing.Protocol`, `TypeGuard`, and strict packaging hygiene...
|
03-12 12:16 | Success | - | |
|
exp_2409.13385v2_20260312_121410
|
Benchmark: Contextual Compression in RAG
**Architecture:** This survey reviews **Contextual Compression** paradigms, integrating filtering and condensation modules between the retriever and LLM to process raw retrieved data. **Memory Footprint & Speed:** Compression reduces input...
|
03-12 12:14 | Success | - | |
|
exp_2403.12583v1_20260312_121333
|
Quantixar: High-performance Vector Data Management System
**Architecture & Retrieval:** Quantixar proposes a vector database architecture utilizing **HNSW (Hierarchical Navigable Small World)** indexing for Approximate Nearest Neighbor (ANN) search. To manage high-dimensional data, it implements a...
|
03-12 12:13 | Success | - | |
|
exp_2404.07220v2_20260312_121247
|
Blended RAG Benchmark
**Architecture & Retrieval Strategy:** Blended RAG proposes a **Hybrid Sparse-Dense Retrieval** architecture. It utilizes **Dense Vector indexes** (semantic search via bi-encoders) blended with **Sparse Encoder indexes** (lexical search) an...
|
03-12 12:13 | Success | - | |
|
exp_2309.11392v1_20260312_121149
|
This benchmark evaluates the performance of a Retrieval-Augmented Generation (RAG) verification pipeline, inspired by th...
**Architecture:** Hybrid Retrieval-Augmented Verification. **Retrieval Strategy:** Combines sparse and dense retrieval with neural rerankers on the MS MARCO V1 corpus. **Verification Methods:** 1. **Holistic:** Validates the entire generate...
|
03-12 12:12 | Success | - | |
|
exp_2310.01429v1_20260312_121109
|
ChatMap: Geospatial LLM Benchmark
**Architecture & Feasibility** ChatMap utilizes a **1B parameter student model** fine-tuned via distillation (using a larger teacher) to interpret OpenStreetMap (OSM) data. This is **highly feasible** for 8GB VRAM targets; the model require...
|
03-12 12:11 | Success | - | |
|
exp_pytrain.20260312120920.012_20260312_120945
|
Strictly Typed Configuration Module Benchmark
README.md **Title:** Strictly Typed Configuration Module Benchmark **Description:** This benchmark evaluates an autonomous coding system's ability to construct a robust, single-file Python module (`config_manager.py`). The module must enfor...
|
03-12 12:09 | Success | - | |
|
exp_2303.13416v1_20260312_120815
|
A Unified Framework for Learned Sparse Retrieval (LSR)
**Architecture:** Unified Learned Sparse Retrieval (LSR) framework using BERT-style encoders (e.g., Splade) to generate sparse lexical representations for inverted indices. **Retrieval Specifics:** * **Retrieval Architecture:** Inverted Ind...
|
03-12 12:08 | Success | - | |
|
exp_2512.12117v1_20260312_120729
|
Citation-Grounded Code Comprehension Benchmark
**Retrieval Architecture:** Hybrid RAG system combining BM25 (sparse), BGE (dense), and Neo4j graph retrieval. **Indexing & Context:** Indexing leverages code structure, specifically **import relationships**, to link cross-file dependencies...
|
03-12 12:07 | Success | - | |
|
exp_cr_10.24908_iqurcp19921_20260312_120640
|
Performing Automated Employment Law Case Analysis Using Large Language Models
**Architecture:** Comparative evaluation of Retrieval-Augmented Generation (RAG) strategies—specifically Vector Chunking, Graph RAG, and Full-Context ("No-processing")—for legal QA on the Sagaz dataset. **RAG Specifics:** * **Retrieval & In...
|
03-12 12:06 | Success | - | |
|
exp_2506.17288v1_20260312_120546
|
SlimRAG: Retrieval without Graphs via Entity-Aware Context Selection
**Architecture:** SlimRAG is a graph-free, entity-centric framework replacing Knowledge Graph (KG) construction with a lightweight "entity-to-chunk" table. **RAG Implementation:** * **Retrieval Architecture:** Entity-aware context selection...
|
03-12 12:05 | Success | - | |
|
exp_pytrain.20260312120304.011_20260312_120347
|
Strictly Typed Plugin Registry with Runtime Protocol Validation
Overview This benchmark evaluates the robustness of a Python plugin architecture utilizing `typing.Protocol` and `@runtime_checkable`. It simulates a system where modules must strictly adhere to a defined interface (`DataProcessor`) before...
|
03-12 12:03 | Success | - | |
|
exp_2409.10516v3_20260312_120126
|
**Architecture & Retrieval Strategy** RetrievalAttention offloads the Key-Value (KV) cache from GPU VRAM to CPU DRAM, replacing quadratic attention with a sparse, vector-retrieval mechanism. It constructs Approximate Nearest Neighbor Search...
|
03-12 12:01 | Success | - | |
|
exp_2403.11366v2_20260312_120006
|
JORA: JAX Tensor-Parallel LoRA Benchmark
**Architecture:** JORA utilizes a JAX-based framework featuring just-in-time (JIT) compilation and tensor-sharding (Tensor Parallelism) to enable distributed LoRA fine-tuning of Llama-2 models. **Memory Footprint:** Reduces per-GPU VRAM con...
|
03-12 12:00 | Success | - | |
|
exp_2304.00114v1_20260312_115913
|
Benchmark: Dense Sparse Retrieval (Efficiency Focus)
**Architecture:** The paper proposes replacing standard dense encoders (e.g., BERT) with sparse-activated language models (specifically Switch Transformers) within a **Bi-encoder** framework. It utilizes the **Tevatron** library for impleme...
|
03-12 11:59 | Success | - | |
|
exp_pytrain.20260312115633.010_20260312_115720
|
Strictly Typed Dynamic Plugin Loader Benchmark
README.md Strictly Typed Dynamic Plugin Loader Benchmark This benchmark tests a Python engineer's ability to bridge dynamic runtime code execution with static type safety. Problem Context In large-scale autonomous systems, plugins are often...
|
03-12 11:57 | Success | - | |
|
exp_2601.03262v1_20260312_115404
|
Benchmark: MLLM Roles in Visually Rich Document Retrieval (VRD)
**Summary:** This survey classifies MLLM roles for Visually Rich Document (VRD) retrieval into three architectures: 1. **Modality-Unifying Captioners:** MLLMs synthesize figures/tables into text. * *Retrieval Strategy:* Text-to-Text (compat...
|
03-12 11:55 | Success | - | |
|
exp_2506.12571v1_20260312_115229
|
DoTA-RAG Benchmark: Dynamic-of-Thought Aggregation
**Architecture:** DoTA-RAG implements a three-stage pipeline: query rewriting, dynamic routing to specialized sub-indexes, and multi-stage retrieval with ranking. **Retrieval Strategy:** The system utilizes a re-embedded FineWeb-10BT corpus...
|
03-12 11:52 | Success | - | |
|
exp_2506.14707v1_20260312_115140
|
HARMONY: A Scalable Distributed Vector Database for High-Throughput Approximate Nearest Neighbor Search
**Paper:** HARMONY (Scalable Distributed Vector DB) * **Architecture:** Distributed **Approximate Nearest Neighbor (ANN)** engine utilizing a **multi-granularity partition strategy**. This hybrid approach combines dimension-based and vector...
|
03-12 11:51 | Success | - | |
|
exp_2506.15246v1_20260312_115025
|
TopClustRAG Benchmark Suite
**Architecture:** TopClustRAG utilizes a **Hybrid Retrieval Architecture** (Sparse + Dense) followed by **K-Means clustering** to group semantically similar chunks. The system generates distinct, cluster-specific intermediate answers that a...
|
03-12 11:50 | Success | - | |
|
exp_pytrain.20260312114750.009_20260312_114826
|
Dynamic Package Construction and Runtime Protocol Verification
README.md Dynamic Package Construction and Runtime Protocol Verification This benchmark tests an autonomous agent's ability to programmatically generate Python code, construct a valid package structure on the disk, define strict interfaces...
|
03-12 11:48 | Success | - | |
|
exp_2506.15513v1_20260312_114609
|
RePCS: Retrieval-Path Contamination Scoring Benchmark
**Architecture:** RePCS is a model-agnostic diagnostic algorithm, not a new LLM. It detects memorization by calculating the Kullback-Leibler (KL) divergence between two output distributions: a parametric path (Query only) versus a retrieval...
|
03-12 11:46 | Success | - | |
|
exp_2303.13220v1_20260312_114522
|
Parameter-Efficient Sparse Retrievers and Rerankers using Adapters
**Architecture:** Inserts lightweight bottleneck Adapters into **SPLADE (Sparse Lexical and Expansion)**, keeping the heavy Pre-trained Language Model (PLM) frozen. Also applies adapters to rerankers, enabling knowledge transfer between ret...
|
03-12 11:45 | Success | - | |
|
exp_cr_10.1007_s11227-025-07118-9_20260312_114443
|
Benchmark: GPU-Centric Storage Optimization (ESPN vs. Baseline)
**Architecture & Retrieval Strategy:** This paper proposes a **GPU-centric retrieval architecture** using **GPUDirect Storage (GDS)** to bypass CPU bottlenecks, enabling direct SSD-to-GPU data transfer. It introduces **Embedding from Storag...
|
03-12 11:44 | Success | - | |
|
exp_2506.13589v3_20260312_114351
|
AdaVideoRAG Benchmark
**Architecture:** AdaVideoRAG introduces a lightweight **Intent Classifier** that dynamically routes queries to appropriate retrieval schemes (Naive, Visual, or Knowledge Graph) based on complexity, avoiding unnecessary processing for simpl...
|
03-12 11:43 | Success | - | |
|
exp_pytrain.20260312114131.008_20260312_114214
|
Robust Distribution Inspector
README.md Robust Distribution Inspector Overview The **Robust Distribution Inspector** is a command-line utility designed to inspect Python packages installed in the current environment. It demonstrates strict type usage using Python's `typ...
|
03-12 11:42 | Success | - | |
|
exp_oa_W4416322438_20260312_112829
|
Benchmark: RAG-Augmented LLM for Yunnan Arabica Coffee Cultivation
**Architecture & Retrieval:** This paper implements a **Retrieve–Rerank–Generate** pipeline. It employs **hybrid retrieval** (dense + sparse) fused by Reciprocal Rank Fusion (RRF) and **semantic-aware chunking** with stable identifiers (`do...
|
03-12 11:39 | Success | - | |
|
exp_2512.12284v3_20260312_112710
|
**V-Rex** targets streaming video LLMs on edge devices, specifically addressing memory bandwidth and compute bottlenecks inherent to continuous video processing. * **Retrieval Architecture:** Implements **Dynamic KV Cache Retrieval (ReSV)**...
|
03-12 11:27 | Success | - | |
|
exp_pytrain.20260312112513.007_20260312_112532
|
Python Skill Fallback
Title: Robust Generic Tensor Arithmetic Module - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-12 11:25 | Success | - | |
|
exp_2409.15355v5_20260312_112349
|
Benchmark: Block-Attention for Efficient Prefilling in RAG
**Architecture:** Block-Attention decouples context into independent passage blocks. Instead of sequential prefilling, KV states are computed in parallel. Crucially, it enables **KV state reuse**, allowing cached retrieval passages to be re...
|
03-12 11:23 | Success | - | |
|
exp_2403.13291v1_20260312_112222
|
Late-Interaction Retrieval & Token Pruning Benchmark
**Architecture:** Analyzes **Late-Interaction** models (ColBERT/COIL), which use multi-vector token embeddings and sum-of-max scoring rather than single-vector dense retrieval. **Memory Footprint:** Addresses the prohibitive storage cost of...
|
03-12 11:22 | Success | - | |
|
exp_2506.21593v1_20260312_112121
|
PentaRAG Benchmark Simulation
**Architecture:** PentaRAG implements a 5-layer cascading router that prioritizes speed: (1) Fixed KV Cache, (2) Semantic Cache, (3) Memory-Recall (exploiting LLM internal weights), (4) Adaptive Session Memory, and (5) Conventional Retrieva...
|
03-12 11:21 | Success | - | |
|
exp_2601.06037v4_20260312_112037
|
TeleMem: Building Long-Term and Multimodal Memory for Agentic AI
**Architecture & Retrieval:** TeleMem is a RAG-based memory system employing a structured writing pipeline (batching, retrieval, clustering, and consolidation) to maintain narrative user profiles. It integrates a multimodal memory module wi...
|
03-12 11:20 | Success | - | |
|
exp_pytrain.20260312111831.006_20260312_111859
|
Dynamic Backend Registry with Protocol Validation
README.md Dynamic Backend Registry with Protocol Validation Overview This benchmark tests the ability to design a robust, scalable plugin architecture similar to those found in high-performance Machine Learning libraries (e.g., vLLM, Diffus...
|
03-12 11:19 | Success | - | |
|
exp_2309.13335v2_20260312_111710
|
Model-enhanced Vector Index
**Architecture:** MEVI uses a differentiable hybrid architecture combining a Twin-Tower representation model with a Seq2Seq generator, bridged by a Residual Quantization (RQ) codebook. **Retrieval Strategy:** A two-stage "Generative-to-Dens...
|
03-12 11:17 | Success | - | |
|
exp_cr_10.54963_jic.v4i2.1706_20260312_111619
|
BERT and Beyond: A Comprehensive Survey of Natural Language Processing Techniques for Information Retrieval
**Paper Analysis: Survey (Taxonomy & Trends)** **Architecture:** Surveys **Dual-Encoder (Bi-Encoder)** BERT models for semantic retrieval and **Cross-Encoders** for reranking. Highlights **Hybrid Dense-Sparse** architectures (combining vect...
|
03-12 11:16 | Success | - | |
|
exp_2506.21601v2_20260312_111526
|
Hierarchical Patch Compression for ColPali (HPC-ColPali) Benchmark
**Architecture:** Extends **ColPali** (a VLM-based multi-vector retrieval architecture) with Hierarchical Patch Compression (HPC). * **Retrieval Strategy:** Utilizes patch-level embeddings. * **Indexing:** Optimized via **HNSW** indexing an...
|
03-12 11:15 | Success | - | |
|
exp_2304.01016v3_20260312_111433
|
Quick Dense Retrievers Consume KALE: Post Training Kullback Leibler Alignment of Embeddings for Asymmetrical dual encode...
**Architecture:** Asymmetrical Dual Encoders (Bi-Encoder). **Retrieval Strategy:** Dense Retrieval via Knowledge Distillation. KALE aligns the pruned query encoder's output distribution to the original teacher using Kullback-Leibler diverge...
|
03-12 11:14 | Success | - | |
|
exp_pytrain.20260312111138.005_20260312_111221
|
Dynamic Protocol-Based Plugin System Benchmark
README.md Dynamic Protocol-Based Plugin System Benchmark Objective This benchmark tests the ability to implement a robust plugin architecture using Python's standard library. The focus is on dynamic code loading from strings, runtime type s...
|
03-12 11:12 | Success | - | |
|
exp_pytrain.20260312103311.004_20260312_103345
|
Protocol-Enforced Virtual Package Importer
README.md Protocol-Enforced Virtual Package Importer Design Brief This coding drill benchmark tests the hypothesis that an autonomous system can construct a robust internal packaging mechanism by extending `sys.meta_path`. The system must i...
|
03-12 10:33 | Success | - | |
|
exp_pytrain.20260312101232.003_20260312_101306
|
Type-Safe Dynamic Package Generator & Importer
Overview This coding drill benchmarks your ability to use Python's standard library for **dynamic code generation** and **runtime module loading**. Unlike simple `eval()` or `exec()` calls, this exercise requires the creation of a valid, im...
|
03-12 10:13 | Success | - | |
|
exp_pytrain.20260312095303.002_20260312_095323
|
Benchmark: PEP 695 Generic Vault with Explicit Public API
README.md Benchmark: PEP 695 Generic Vault with Explicit Public API Description This coding drill verifies the implementation of a generic `Vault` class using Python 3.12+ syntax (PEP 695) and a strictly defined public interface using `__al...
|
03-12 09:53 | Success | - | |
|
exp_pytrain.20260312093112.001_20260312_093149
|
Generic Repository Package Construction Benchmark
README.md Generic Repository Package Construction Benchmark Overview This benchmark evaluates a Python system's ability to programmatically scaffold a Python package structure and utilize advanced typing features (specifically `Protocol` an...
|
03-12 09:31 | Success | - | |
|
exp_hf_2603.10757_20260312_092735
|
CodePercept: Code-Grounded Visual STEM Perception Benchmark
**Analysis for ARES 8GB Roadmap** **Architecture & Methodology** CodePercept proposes a "Code-as-Perception" paradigm, asserting that visual perception—not reasoning—is the bottleneck in STEM tasks. It introduces ICC-1M, a dataset of 1M Ima...
|
03-12 09:28 | Success | - | |
|
exp_2409.14515v1_20260312_092641
|
SPAQ-DL-SLAM: Towards Optimizing Deep Learning-based SLAM for Resource-Constrained Embedded Platforms
**Architecture:** SPAQ-DL-SLAM optimizes DROID-SLAM by applying 20% structured pruning (based on layer-wise sensitivity analysis) and 8-bit post-training static quantization (PTQ) to its deep learning modules. **Memory Footprint:** Achieves...
|
03-12 09:26 | Success | - | |
|
exp_pytrain.20260312092411.002_20260312_092435
|
README.md
|
03-12 09:24 | Success | - | |
|
exp_2309.16870v1_20260312_092235
|
LEF: Late-to-Early Temporal Fusion for LiDAR 3D Object Detection
**Architecture** LEF proposes a recurrent "late-to-early" fusion scheme that injects object-aware latent embeddings into the early stages of a pillar-based detector. It processes temporally aligned sparse pillar tokens using window-based at...
|
03-12 09:22 | Success | - | |
|
exp_2309.16870v1_20260312_092138
|
LEF: Late-to-Early Temporal Fusion Benchmark
**Architecture** LEF proposes a recurrent "late-to-early" fusion scheme that injects object-aware latent embeddings into the early stages of a pillar-based detector. It processes temporally aligned sparse pillar tokens using window-based at...
|
03-12 09:21 | Success | - | |
|
exp_2512.14879v1_20260312_092048
|
**README.md**
**Architecture:** Proposes Entropy-Reservoir Bregman Projection (ERBP), a theoretical framework for self-referential training. It addresses model collapse via information geometry rather than proposing a new hardware-efficient model archite...
|
03-12 09:20 | Success | - | |
|
exp_2512.14938v1_20260312_091949
|
**Architecture** The model utilizes a 5B parameter Diffusion Transformer (DiT) built upon Wan2.2. To manage long-form generation, it employs a sliding window mechanism with motion-frame context and a high-compression Video VAE. **Memory Foo...
|
03-12 09:19 | Success | - | |
|
exp_pytrain.20260312091738.001_20260312_091806
|
Runtime-Checked Plugin Architecture Drill
README.md Runtime-Checked Plugin Architecture Drill Overview This benchmark demonstrates an autonomous system constructing a robust Python package (`text_ops`) that leverages structural subtyping (Protocols) to define interfaces. It ensures...
|
03-12 09:18 | Success | - | |
|
exp_2409.14595v1_20260312_091509
|
**Architecture:** EchoAtt optimizes transformers by sharing attention matrices across layers with high similarity. It utilizes knowledge distillation to train a student model that selectively "echoes" (copies) attention computations from ea...
|
03-12 09:15 | Success | - | |
|
exp_pytrain.20260312091146.013_20260312_091233
|
Dynamic Protocol-Based Plugin Loader
This benchmark tests the hypothesis that utilizing structural subtyping (`typing.Protocol`) combined with dynamic module loading (`importlib`) creates a more flexible and maintainable architecture than traditional, rigid inheritance...
|
03-12 09:12 | Success | - | |
|
exp_oa_W4395065783_20260312_090948
|
This benchmark suite is designed to validate the core efficiency hypotheses presented in "A Survey on Efficient Inferenc...
This survey identifies three core architectural bottlenecks for LLM deployment: massive parameter counts, quadratic-complexity attention mechanisms, and auto-regressive decoding. It categorizes solutions into a three-tier taxonomy: 1. **Mem...
|
03-12 09:09 | Success | - | |
|
exp_hf_2603.08899_20260312_090837
|
ConFu: Contemplate the Future for Better Speculative Sampling
**Architecture:** ConFu optimizes speculative decoding by introducing "contemplate tokens" and soft prompts into the draft model. It employs a lightweight Mixture-of-Experts (MoE) layer to dynamically predict future context, reducing the er...
|
03-12 09:08 | Success | - | |
|
exp_hf_2603.10744_20260312_090743
|
**Architecture:** JiT is a **training-free** inference framework targeting spatial redundancy in Diffusion Transformers (DiT). It replaces full latent processing with a **spatially approximated generative ODE**, driven by a dynamically sele...
|
03-12 09:07 | Success | - | |
|
exp_hf_2603.10705_20260312_090644
|
Prism-Δ: Differential Subspace Steering for Prompt Highlighting in Large Language Models
**Architecture:** PRISM-Δ steers generation by decomposing the difference between positive and negative cross-covariance matrices to isolate discriminative directions. It utilizes continuous softplus weighting for attention heads—allowing w...
|
03-12 09:06 | Success | - | |
|
exp_pytrain.20260312090424.012_20260312_090457
|
Type-Safe Plugin Architecture with `importlib`
README.md Type-Safe Plugin Architecture with `importlib` This benchmark implements a zero-dependency plugin registry system inspired by HuggingFace Transformers. It demonstrates how to use Python's `typing` module (Generics, TypeVars) to en...
|
03-12 09:05 | Success | - | |
|
exp_2304.00280v1_20260312_090304
|
Benchmark: Progressive Channel-Shrinking Network (PCS)
**Architecture:** Introduces Progressive Channel-Shrinking (PCS) to replace unstable gating functions in salience-based pruning. It employs a Running Shrinking Policy (RSP) to transition from dynamic training to a **testing-static** pruning...
|
03-12 09:03 | Success | - | |
|
exp_2512.14925v2_20260312_090159
|
Multiscale Aggregated Hierarchical Attention (MAHA) Benchmark
**Architecture:** MAHA replaces standard MHSA with a hybrid dilated-convolutional transformer backbone. It utilizes learnable downsampling to partition inputs into hierarchical scales and aggregates attention maps using differentiable conve...
|
03-12 09:02 | Success | - | |
|
exp_2403.18159v2_20260312_090048
|
Benchmark for "Oh! We Freeze" (OV-Freeze)
**Architecture:** Introduces **ov-freeze**, a lightweight Quantization-Aware Knowledge Distillation (KD-QAT) technique. It stabilizes the training of 4-bit weight quantized LLMs by addressing gradient propagation vulnerabilities identified...
|
03-12 09:01 | Success | - | |
|
exp_pytrain.20260312085711.011_20260312_085758
|
Runtime Checkable Plugin Loader Benchmark
Overview: This benchmark tests the ability to implement a robust, dynamic plugin system using Python's standard library. It focuses on structural subtyping (P...
|
03-12 08:58 | Success | - | |
|
exp_2506.16600v2_20260312_085525
|
FLAME: Federated Fine-Tuning Benchmark
**FLAME** proposes a Sparse Mixture-of-Experts (SMoE) framework for federated LLM fine-tuning, designed to eliminate the performance degradation caused by compressing LoRA matrices on low-resource clients. * **Architecture:** Replaces stand...
|
03-12 08:55 | Success | - | |
|
exp_2506.16640v4_20260312_085418
|
Benchmark: Adaptive-Scalable Entmax (ASEntmax) Simulation
**Architecture** Proposes **Adaptive-Scalable Entmax (ASEntmax)**, a drop-in replacement for Softmax attention. It utilizes $\alpha$-entmax to assign exact zeros to irrelevant tokens, creating dynamically sparse attention maps. A learnable...
|
03-12 08:54 | Success | - | |
|
exp_oa_W4404313603_20260312_085338
|
Small Language Model (SLM) Benchmark, focusing on Dynamic Precision (Mixed Pre...
**Architecture:** Reviews compact transformer designs and Small Language Models (typically <7B parameters) optimized for edge environments. It highlights architectural trade-offs that maintain task performance while reducing parameter count...
|
03-12 08:53 | Success | - | |
|
exp_2309.16795v2_20260312_085243
|
Benchmark: Ultra-low-power Image Classification (Quartz SNN)
**Paper:** Ultra-low-power Image Classification on Neuromorphic Hardware (Quartz) **Architecture:** Proposes "Quartz," a temporal conversion method that translates stateless ANNs to Spiking Neural Networks (SNNs) using Time-To-First-Spike (...
|
03-12 08:53 | Success | - | |
|
exp_2304.00335v1_20260312_085153
|
Volumetric Attribute Compression Benchmark
**Architecture** Replaces RAHT’s piecewise constant functions with a **feedforward linear network** implementing higher-order B-spline bases. The core mechanism is a space-varying convolution (Geometric Attention) where weights are dynamica...
|
03-12 08:51 | Success | - | |
|
exp_pytrain.20260312084937.010_20260312_085018
|
Type-Safe Pipeline Package Benchmark
This benchmark evaluates a Python implementation of a modular, type-safe data processing pipeline. The implementation leverages advanced Python `typing` features, including Generics, Protocols,...
|
03-12 08:50 | Success | - | |
|
exp_oa_W4416386252_20260312_084802
|
Which Heads Matter for Reasoning? RL-Guided KV Cache Compression
**Architecture:** RLKV utilizes offline Reinforcement Learning to probe and identify specific attention heads critical for generative reasoning and Chain-of-Thought (CoT) stability. Unlike static pruning, it optimizes head selection by dire...
|
03-12 08:48 | Success | - | |
|
exp_hf_2603.09488_20260312_084623
|
Streaming Autoregressive Video Generation via Diagonal Distillation
**Architecture** Proposes **Diagonal Distillation**, an asymmetric autoregressive strategy. It allocates higher denoising steps to initial video chunks to establish high-fidelity features, while subsequent chunks use significantly fewer ste...
|
03-12 08:46 | Success | - | |
|
exp_2601.11557v1_20260312_084437
|
Benchmark: Information-Theoretic Binarization vs. Float32 ANN
**Architecture:** Replaces the standard "HNSW + float32" stack with **Maximally Informative Binarization (MIB)**. The system utilizes exhaustive search over 1-bit binary vectors using bitwise distance metrics and Information-Theoretic Scori...
|
03-12 08:45 | Success | - | |
|
exp_pytrain.20260312084136.009_20260312_084229
|
Strictly Typed Modular Data Processor
This benchmark evaluates the implementation of a data processing system using Python's structural subtyping features and strict module packaging hygiene. Overview: The drill requires the creation of a single-file module (`benchmark.py` which...
|
03-12 08:42 | Success | - | |
|
exp_hf_2603.02188_20260312_084023
|
Multi-Head Low-Rank Attention (MLRA) Benchmark
**Architecture** MLRA modifies Multi-Head Latent Attention (MLA) by replacing the non-partitionable single latent head with a multi-head latent structure. This allows the latent Key-Value states to be effectively sharded across GPUs. **Memo...
|
03-12 08:40 | Success | - | |
|
exp_oa_W4416557533_20260312_083858
|
Small Language Models (SLM) Efficiency Benchmark
**Architecture:** Survey of design frameworks and training methodologies for edge-compatible Small Language Models (SLMs). **Memory Footprint:** Focuses heavily on minimizing model size through optimization techniques, specifically pruning,...
|
03-12 08:39 | Success | - | |
|
exp_oa_W4415037605_20260312_083754
|
Hardware-Efficient Attention for Fast Decoding
**Summary for ARES 8GB Roadmap** * **Architecture:** Proposes **Grouped-Tied Attention (GTA)** and **Grouped Latent Attention (GLA)**. Both mechanisms optimize arithmetic intensity by reusing key-value states (GTA) or utilizing parallel-fri...
|
03-12 08:37 | Success | - | |
|
exp_pytrain.20260312083445.008_20260312_083532
|
Generic Plugin Registry with Semantic Versioning
This benchmark demonstrates a robust, self-contained module loader that simulates a mini packaging ecosystem. Objectives: 1. **PEP 695 Implementation**: Utilize Python 3.12+ Type Par...
|
03-12 08:35 | Success | - | |
|
exp_oa_W4415048600_20260312_082323
|
**Analysis for ARES 8GB Roadmap:** * **Architecture:** Prioritizes hybrid edge-cloud collaborative systems (e.g., EdgeShard) and microservices over monolithic designs. Suggests leveraging intelligent workload distribution to bypass local ha...
|
03-12 08:33 | Success | - | |
|
exp_cr_10.3389_frobt.2025.1518965_20260312_082215
|
A survey of model compression techniques: past, present, and future
This paper provides a comprehensive methodological framework for optimizing Large Language Models (LLMs) within the ARES 8GB hardware constraints. As a survey, it does not propose a specific architecture but evaluates compression techniques...
|
03-12 08:22 | Success | - | |
|
exp_oa_W4415098413_20260312_082141
|
Artificial Hippocampus Networks (AHN) Benchmark
**Architecture:** A hybrid framework combining a 32k sliding window attention buffer (short-term memory) with a learnable recurrent compressor (Artificial Hippocampus Network) for long-term memory. The AHN utilizes modern RNN architectures...
|
03-12 08:21 | Success | - | |
|
exp_pytrain.20260312081922.007_20260312_081950
|
Strictly Typed Generic Registry with Package Metadata
An autonomous coding system can effectively utilize Python's advanced type system (Protocols and Generics) to enforce interface safety while simultaneously adhering to library packaging standards (`__all__`, versioning) to ensure API stabil...
|
03-12 08:19 | Success | - | |
|
exp_2512.14946v1_20260312_081715
|
EVICPRESS: Joint KV-Cache Compression and Eviction for Efficient LLM Serving
**Summary for ARES 8GB Roadmap:** * **Architecture:** A multi-tier KV management system (GPU VRAM to CPU RAM) that jointly optimizes eviction and lossy compression. It utilizes a "unified utility function" to balance quality loss against la...
|
03-12 08:17 | Success | - | |
|
exp_oa_W4410363086_20260312_081534
|
Distributed & Multimodal LLM Benchmark
This survey advocates for distributed architectures—including data, model, and pipeline parallelism—to mitigate the memory and computational constraints of centralized Large Language Models (LLMs) and Multimodal LLMs (MLLMs). * **Architectu...
|
03-12 08:16 | Success | - | |
|
exp_oa_W4416458930_20260312_081349
|
On-Device Large Language Models: A Survey of Model Compression and System Optimization
This survey systematizes on-device LLM optimization (1-4B parameters) using the ALEM (Accuracy, Latency, Energy, Memory) protocol. * **Architecture:** Advocates for hybrid pipelines combining **quantization**, structured pruning with mergea...
|
03-12 08:14 | Success | - | |
|
exp_pytrain.20260312081042.006_20260312_081110
|
Strictly Typed Dynamic Plugin Loader with Validation
This benchmark demonstrates a robust, enterprise-grade plugin architecture using Python's standard library. It leverages `typing.Protocol` to enforce structural sub-typing (Stat...
|
03-12 08:11 | Success | - | |
|
exp_pytrain.20260312080135.005_20260312_080217
|
Dynamic Type-Safe Plugin Loader Benchmark
Overview: This benchmark evaluates the ability of a Python execution environment to implement a robust, type-safe plugin architecture using only the standard library. It tests the integrati...
|
03-12 08:02 | Success | - | |
|
exp_pytrain.20260312074012.004_20260312_074102
|
Python Skill Fallback
Title: Strictly Typed Plugin Loader with Public API Enforcement - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-12 07:41 | Success | - | |
|
exp_pytrain.20260312073002.003_20260312_073025
|
Typed Package Metadata Auditor
This benchmark evaluates the system's ability to generate robust, type-safe Python tooling using only the standard library. **Goal:** Create a self-contained script `benchmark.py` that acts as a pack...
|
03-12 07:30 | Success | - | |
|
exp_pytrain.20260312072152.002_20260312_072217
|
PEP 695 Generic Dependency Resolver Benchmark
This benchmark evaluates the implementation of a `DependencyGraph` using Python 3.12+'s PEP 695 Type Parameter Syntax. Requirements: - **Python Version**: 3.12 or higher (required for PEP 695 syntax). - **Dependencies**: None (Standard Libra...
|
03-12 07:22 | Success | - | |
|
exp_self.20260312071726.002_20260312_071812
|
Frequency-Modulated State Spaces (FMSS) Benchmark
This benchmark evaluates the **Frequency-Modulated State Spaces (FMSS)** innovation, which applies multi-rate signal processing concepts to State Space Models (SSMs). The Innovatio...
|
03-12 07:18 | Success | - | |
|
exp_self.20260312071539.001_20260312_071616
|
Entropy-Triggered State Snapshot (ETSS) Benchmark
This benchmark evaluates the **Entropy-Triggered State Snapshot (ETSS)** hypothesis. The core idea is that in Low Entropy contexts (e.g., repetitive code, templates), the internal state of a State Space Model (SSM) changes minimally. By cal...
|
03-12 07:16 | Success | - | |
|
exp_pytrain.20260312071407.001_20260312_071443
|
Strictly Typed Generic Pipeline Benchmark
This benchmark evaluates a Python engineer's ability to design a robust, type-safe data processing framework using Python's `typing` module. Architecture Overview: The solution implements a...
|
03-12 07:14 | Success | - | |
|
exp_pytrain.20260310062524.001_20260310_062551
|
Python Skill Fallback
Title: Strict-Type Dynamic Module Loader - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-10 06:25 | Success | - | |
|
exp_self.20260309152420.007_20260309_152446
|
Section 1: README.md
`python benchmark.py`
|
03-09 15:24 | Success | - | |
|
exp_pytrain.20260309152138.004_20260309_152200
|
No summary available yet.
|
03-09 15:22 | Success | - | |
|
exp_self.20260309151933.006_20260309_152002
|
Self-directed benchmark: SSM strategy stress test
Paper ID: self.20260309151933.006 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
03-09 15:20 | Success | - | |
|
exp_self.20260309151700.005_20260309_151725
|
SSM Strategy Stress Test
`python benchmark.py`
|
03-09 15:17 | Success | - | |
|
exp_pytrain.20260309151409.003_20260309_151434
|
`python benchmark.py` 3. The script will create a temporary directory structure, generate mock plugins, and attempt to load them. 4. It will verify that valid plugins are accepted and invalid ones are rejected based on the `Command`...
|
03-09 15:14 | Success | - | |
|
exp_self.20260309151226.004_20260309_151249
|
Self-directed benchmark: SSM strategy stress test
Paper ID: self.20260309151226.004 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
03-09 15:12 | Success | - | |
|
exp_self.20260309150928.003_20260309_150958
|
Self-directed benchmark: SSM strategy stress test
Overview: This benchmark evaluates the hypothesis that applying State Space Model (SSM) strategies with a disciplined memory policy improves throughput and reduces VRAM usage compar...
|
03-09 15:10 | Success | - | |
|
exp_pytrain.20260309150626.002_20260309_150719
|
Generic Package Manifest Validator using PEP 695
Overview: This benchmark evaluates the developer experience and runtime characteristics of Python 3.12's **PEP 695 Type Parameter Syntax** within the context of a generic package metadata validation system. Features: * **PEP 695 Implementatio...
|
03-09 15:07 | Success | - | |
|
exp_self.20260309150417.002_20260309_150447
|
Section 1: README.md
`python benchmark.py`
|
03-09 15:05 | Success | - | |
|
exp_self.20260309150111.001_20260309_150135
|
SSM Strategy Stress Test: Memory vs. Throughput
Overview: This benchmark evaluates the **"disciplined memory policy"** hypothesis for State Space Models (SSMs). The Innovation: We compare a standard **Transformer (Baseline)** agains...
|
03-09 15:01 | Success | - | |
|
exp_pytrain.20260309145820.001_20260309_145847
|
Strictly Typed Package Manifest Generator
This benchmark evaluates the creation of a strictly typed Python packaging utility using standard library type hinting (PEP 484) and `pyproject.toml` project metadata (PEP 621). Objective: The goal is to write a script `manifest_gen.py` that simulates a lightweight packa...
|
03-09 14:58 | Success | - | |
|
exp_pytrain.20260309145550.008_20260309_145612
|
Generic Typed Registry Library Implementation
This project implements a robust, type-safe registry component using Python 3.12's modern type parameter syntax (PEP 695). Features: - **Type Safety**: Uses `class Registry[T]:` syntax...
|
03-09 14:56 | Success | - | |
|
exp_self.20260309145401.013_20260309_145439
|
SSM Strategy Stress Test: Memory Policy Benchmark
Overview: This benchmark evaluates the **Innovation: Disciplined Memory Policy for State Space Models (SSM)**. The hypothesis is that applying strict memory management, specifically...
|
03-09 14:54 | Success | - | |
|
exp_self.20260309145113.012_20260309_145140
|
SSM Strategy Stress Test Benchmark
This benchmark evaluates the hypothesis that applying a State Space Model (SSM) strategy, specifically a disciplined memory policy based on chunking and state recurrence, improves throughput under...
|
03-09 14:51 | Success | - | |
|
exp_pytrain.20260309144803.007_20260309_144839
|
Python Skill Fallback
Title: Generic Registry with Dynamic CLI Dispatcher - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-09 14:48 | Success | - | |
|
exp_self.20260309144608.011_20260309_144643
|
SSM Strategy Stress Test Benchmark
This benchmark evaluates the memory efficiency and throughput of a Selective State Space Model (SSM) strategy versus a standard Attention-based baseline (Transformer) under constrained memory con...
|
03-09 14:46 | Success | - | |
|
exp_self.20260309144339.010_20260309_144406
|
SSM Strategy Stress Test: Memory vs Throughput
This benchmark evaluates the hypothesis that applying State Space Models (SSM) with a disciplined memory policy improves throughput under constrained VRAM (8GB) compared to standard a...
|
03-09 14:44 | Success | - | |
|
exp_pytrain.20260309144048.006_20260309_144120
|
Type-Safe Sliding Window KV Cache Implementation
This benchmark evaluates the ability to implement a robust, type-safe data structure using only the Python standard library, mimicking the core logic of Key-Value (KV) caches found...
|
03-09 14:41 | Success | - | |
|
exp_self.20260309143901.009_20260309_143929
|
Self-directed benchmark: SSM strategy stress test
Paper ID: self.20260309143901.009 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
03-09 14:39 | Success | - | |
|
exp_self.20260309143559.008_20260309_143623
|
SSM Strategy Stress Test: Dynamic Precision & Memory Policy
Overview: This benchmark evaluates the hypothesis that applying **State Space Models (SSM)** with a disciplined memory policy (specifically leveraging **Dynamic Precision*...
|
03-09 14:36 | Success | - | |
|
exp_pytrain.20260309143255.005_20260309_143336
|
Strictly-Typed Plugin Registry System
Design Brief: This benchmark evaluates a Python implementation of a modular **Plugin Registry** system. The system leverages Python's advanced typing features (specifically `typing.TypeVar`, `abc.ABC`, and `typing.Protocol`) to enforce compile...
|
03-09 14:33 | Success | - | |
|
exp_self.20260309143037.007_20260309_143124
|
Section 1: README.md
Install with `pip install torch transformers accelerate`, then run `python benchmark.py`. Output: `MODE: ablated_fp32 VRAM_USAGE: <value>MB TOKENS_PER_SEC: <value> RESULT: <status>` --- `MODE: optimized_bf16 VRAM_USAGE: <value>MB TOKENS_PER_SEC: <value> RESULT: <status...`
|
03-09 14:31 | Success | - | |
|
exp_self.20260309142742.006_20260309_142819
|
SSM Strategy Stress Test
Overview: This benchmark evaluates a **State Space Model (SSM)** workload under strict memory constraints (simulating an 8GB VRAM limit). It compares a standard baseline implementation against an **optimize...
|
03-09 14:28 | Success | - | |
|
exp_pytrain.20260309142438.004_20260309_142516
|
Benchmark: Strictly-Typed Configuration Abstraction Layer
Overview: This benchmark evaluates the design and implementation of a strictly-typed, generic configuration system in Python. It focuses on leveraging Python's `typing` modu...
|
03-09 14:25 | Success | - | |
|
exp_self.20260309142146.005_20260309_142225
|
Benchmark: SSM Strategy Stress Test
This benchmark evaluates the hypothesis that a State Space Model (SSM) strategy, specifically employing a disciplined memory policy (recurrent state caching) and dynamic precision, yields superi...
|
03-09 14:23 | Success | - | |
|
exp_self.20260309141907.004_20260309_141945
|
SSM Strategy Stress Test
**Innovation:** Disciplined SSM Memory Policy. This benchmark tests the hypothesis that applying State Space Models (SSMs) with a disciplined memory policy (chunking/caching + dynamic precision) improves in...
|
03-09 14:19 | Success | - | |
|
exp_pytrain.20260309141526.003_20260309_141640
|
Runtime-Verified Plugin Loader Benchmark
This benchmark evaluates your ability to construct a robust, modular plugin architecture using Python's standard library. The goal is to implement a plugin loader that utilizes structural subtyping (`typing.Protocol`) for runtime safety and...
|
03-09 14:16 | Success | - | |
|
exp_self.20260309141307.003_20260309_141335
|
Self-directed benchmark: SSM strategy stress test
Paper ID: self.20260309141307.003 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
03-09 14:13 | Success | - | |
|
exp_self.20260309141002.002_20260309_141049
|
SSM Strategy Stress Test
**Hypothesis**: Applying a State Space Model (SSM) approach with a disciplined memory policy (fixed state size) improves throughput compared to standard attention mechanis...
|
03-09 14:10 | Success | - | |
|
exp_pytrain.20260309140641.002_20260309_140737
|
PEP 695 Generic Plugin Loader Benchmark
Overview: This coding drill tests the implementation of Python 3.12's PEP 695 Type Parameter Syntax within the context of a dynamic plugin architecture. It demonstrates how the new syntax red...
|
03-09 14:07 | Success | - | |
|
exp_self.20260309140320.001_20260309_140409
|
Self-directed benchmark: SSM strategy stress test
Paper ID: self.20260309140320.001 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
03-09 14:04 | Success | - | |
|
exp_pytrain.20260309140012.001_20260309_140048
|
Strictly Typed Dependency Resolver Benchmark
Overview: This benchmark implements a minimal dependency resolution engine utilizing Python's advanced static typing features (`TypedDict`, `Protocol`, and `Generics`). It demonstrates h...
|
03-09 14:00 | Success | - | |
|
exp_pytrain.20260309135710.030_20260309_135745
|
Asynchronous Data Pipeline with Strict Typing
Overview: This coding drill evaluates your ability to construct a robust, IO-bound data processing pipeline using modern Python type hinting (PEP 484) and asynchronous programming primi...
|
03-09 13:57 | Success | - | |
|
exp_self.20260309135203.054_20260309_135230
|
Here is the runnable benchmark design.
This benchmark compares a standard Transformer-based approach (Baseline) against an SSM-inspired Linear Recurrent approach (Optimized) to test the hypothesis that disciplined memory policies improve throughput under con...
|
03-09 13:55 | Success | - | |
|
exp_pytrain.20260309134923.029_20260309_134943
|
Runtime Interface Compliance Validator using Importlib
Overview: This coding drill implements a robust plugin architecture validation system. It demonstrates how to use Python's `typing.Protocol` to enforce structural subtyping (du...
|
03-09 13:49 | Success | - | |
|
exp_self.20260309134723.053_20260309_134756
|
Here is the runnable benchmark design.
`pip install torch`, then `python benchmark.py`
|
03-09 13:48 | Success | - | |
|
exp_self.20260309134448.052_20260309_134516
|
Self-directed benchmark: SSM Strategy Stress Test
Hypothesis: Applying SSM (State Space Model) logic with a disciplined memory policy (specifically dynamic precision and selective state caching) improves inference throughput and re...
|
03-09 13:45 | Success | - | |
|
exp_pytrain.20260309134146.028_20260309_134238
|
Generic Resource Manager & ZipApp Packager Benchmark
This benchmark tests a developer's ability to leverage modern Python type hinting (PEP 695) to create strict, generic data structures, and then utilize standard library packaging tools (`zipapp`) to distribute them. Prerequisites: * **Python...
|
03-09 13:42 | Success | - | |
|
exp_self.20260309134004.051_20260309_134035
|
SSM Strategy Stress Test Benchmark
This benchmark evaluates the hypothesis that **SSM (State Space Model)** strategies, particularly those mimicking Mamba-style memory management, offer superior throughput and lower VRAM footprint...
|
03-09 13:40 | Success | - | |
|
exp_self.20260309133738.050_20260309_133813
|
SSM Strategy Stress Test Benchmark
This benchmark evaluates the memory efficiency and throughput of a State Space Model (SSM) strategy against a standard Attention-based (Transformer) mechanism under constrained VRAM conditions (s...
|
03-09 13:38 | Success | - | |
|
exp_pytrain.20260309133519.027_20260309_133540
|
Strictly-Typed Component Registry with Dynamic Imports
Overview: This coding drill demonstrates the creation of a robust, type-safe plugin architecture using Python's standard library. It leverages advanced `typing` features (Gener...
|
03-09 13:35 | Success | - | |
|
exp_self.20260309133324.049_20260309_133358
|
SSM Strategy Stress Test Benchmark
Overview: This benchmark evaluates the performance of a Selective State Space Model (SSM) implementation under constrained memory conditions (simulating an 8GB VRAM limit). It compares a **Standar...
|
03-09 13:34 | Success | - | |
|
exp_self.20260309133058.048_20260309_133119
|
README.md
|
03-09 13:31 | Success | - | |
|
exp_pytrain.20260309132844.026_20260309_132908
|
Typed Dependency Injection Container Benchmark
Overview: This benchmark tests the engineering capability to construct a robust, type-driven **Dependency Injection (DI) Container** from scratch using only the Python Standard Library...
|
03-09 13:29 | Success | - | |
|
exp_self.20260309132655.047_20260309_132727
|
SSM Strategy Stress Test: Linear vs. Quadratic Memory
**Hypothesis:** Applying SSM (State Space Model) logic with a disciplined memory policy (constant state size) improves throughput under 8GB constraints compared to s...
|
03-09 13:27 | Success | - | |
|
exp_self.20260309132423.046_20260309_132453
|
SSM Strategy Stress Test Benchmark
Design Rationale: This benchmark compares a standard Transformer Encoder (which relies on $O(N^2)$ Attention) against a custom State Space Model (SSM) implementation (which relies on $O(N)$ recurrence). * **Innovation:** The `SSM_Mamba` modu...
|
03-09 13:25 | Success | - | |
|
exp_pytrain.20260309132156.025_20260309_132216
|
Strictly Typed Plugin Registry with Runtime Validation
This benchmark evaluates a Python developer's ability to construct robust, maintainable plugin architectures using modern type hinting features (`typing.Protocol`, `typing.Generic`, `typing.TypeVar`) and runtime validation mechanisms. Overv...
|
03-09 13:22 | Success | - | |
|
exp_self.20260309132000.045_20260309_132037
|
**README.md**
`python benchmark.py`
|
03-09 13:20 | Success | - | |
|
exp_self.20260309131702.044_20260309_131738
|
Self-directed benchmark: SSM strategy stress test
Hypothesis: Applying an SSM (State Space Model) with a disciplined memory policy (fixed state size vs. growing KV cache) improves throughput and reduces VRAM pressure under 8GB constraints compared to standard attention mechanisms. Plan: This...
|
03-09 13:17 | Success | - | |
|
exp_pytrain.20260309131430.024_20260309_131453
|
Runtime-Verified Plugin Loader with Strict Typing
Overview: This benchmark tests the ability to construct a robust plugin system in Python using `typing.Protocol` and `runtime_checkable`. It simulates a high-assurance environment where stati...
|
03-09 13:14 | Success | - | |
|
exp_self.20260309131255.043_20260309_131326
|
SSM Memory Policy Stress Test: a synthetic SSM workload using pure...
**README.md** This section explains the purpose, setup, and interpretation of the benchmark. **benchmark.py** This section contains the runnable code. - It defines a simplified SSM block (Selective State Space). - It implements two modes: `...
|
03-09 13:13 | Success | - | |
|
exp_self.20260309131032.042_20260309_131059
|
This benchmark evaluates the hypothesis that a State Space Model (SSM) strategy with a disciplined memory policy improve...
This benchmark evaluates the hypothesis that a State Space Model (SSM) strategy with a disciplined memory policy improves throughput under 8GB VRAM constraints compared to a standard baseline (simulated via dense linear layers/sta...
|
03-09 13:11 | Success | - | |
|
exp_pytrain.20260309130821.023_20260309_130841
|
Strictly-Typed Dynamic Plugin Loader
**Topic:** `typing`, `packaging`, `importlib` **Overview:** This benchmark evaluates the ability to construct a robust dynamic module loading system using only the Python standard li...
|
03-09 13:08 | Success | - | |
|
exp_self.20260309130609.041_20260309_130640
|
README.md
|
03-09 13:06 | Success | - | |
|
exp_self.20260309130352.040_20260309_130419
|
This benchmark validates the hypothesis that applying **State Space Model (SSM)** strategies with a disciplined memory p...
This benchmark validates the hypothesis that applying **State Space Model (SSM)** strategies with a disciplined memory policy significantly improves throughput and reduces VRAM overhead compared to naive implementations under cons...
|
03-09 13:04 | Success | - | |
|
exp_pytrain.20260309130042.022_20260309_130123
|
Python Skill Fallback
Title: Dynamic Generic Plugin Loader with PEP 695 - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-09 13:01 | Success | - | |
|
exp_self.20260309124842.039_20260309_124908
|
`python benchmark.py`
|
03-09 12:59 | Success | - | |
|
exp_pytrain.20260309124621.021_20260309_124645
|
Strictly Typed Generic Data Pipeline Benchmark
Overview: This benchmark evaluates the implementation of a robust, modular data processing pipeline using Python's advanced typing features. It enforces strict standards regarding API...
|
03-09 12:46 | Success | - | |
|
exp_self.20260309124417.038_20260309_124454
|
SSM Strategy Stress Test: Dynamic Precision Benchmarking
Overview: This benchmark evaluates the performance impact of applying a **Dynamic Precision** memory policy to a State Space Model (SSM) architecture. It simulates a lightwei...
|
03-09 12:45 | Success | - | |
|
exp_self.20260309124047.037_20260309_124121
|
SSM Strategy Stress Test Benchmark
This benchmark evaluates the performance efficiency of State Space Models (SSM) compared to standard Attention mechanisms when processing long sequences under constrained memory (8GB VRAM target)...
|
03-09 12:42 | Success | - | |
|
exp_pytrain.20260309123836.020_20260309_123857
|
Benchmark: Typed Plugin Architecture for Model Registry
README.md Benchmark: Typed Plugin Architecture for Model Registry This benchmark demonstrates the implementation of a robust, type-safe plugin system often found in modern Machine Learning frameworks (like LitGPT or PyTorch). It enforces st...
|
03-09 12:39 | Success | - | |
|
exp_self.20260309122642.036_20260309_122715
|
Self-directed benchmark: ssm strategy stress test
README.md Self-directed benchmark: ssm strategy stress test Hypothesis Applying ssm with disciplined memory policy improves throughput under 8GB constraints. Plan Benchmark a standard caching mechanism (Baseline) against a fixed-state SSM-l...
|
03-09 12:37 | Success | - | |
|
exp_pytrain.20260309122420.019_20260309_122441
|
Dynamic Plugin Loader with Protocol Enforcement
README.md Title: Dynamic Plugin Loader with Protocol Enforcement Description Modern ML frameworks like HuggingFace Transformers rely on dynamic module loading to support hundreds of model architectures without hard-coding dependencies. This...
|
03-09 12:24 | Success | - | |
|
exp_self.20260309122202.035_20260309_122239
|
SSM Strategy Stress Test
README.md SSM Strategy Stress Test This benchmark evaluates the **SSM Strategy Stress Test**, focusing on the hypothesis that a disciplined memory policy combined with State Space Model (SSM) architectures improves throughput under strict 8...
|
03-09 12:23 | Success | - | |
|
exp_self.20260309121947.034_20260309_122013
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260309121947.034 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
03-09 12:20 | Success | - | |
|
exp_pytrain.20260309121712.018_20260309_121735
|
Dynamic Type-Verified Plugin Loader
README.md Dynamic Type-Verified Plugin Loader Overview This benchmark evaluates a Python system's ability to dynamically generate code, manage temporary package structures, and verify runtime type safety using the `typing` module. Problem D...
|
03-09 12:17 | Success | - | |
|
exp_self.20260309121540.033_20260309_121610
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260309121540.033 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
03-09 12:16 | Success | - | |
|
exp_self.20260309121312.032_20260309_121342
|
Here is the benchmark design for the SSM Strategy Stress Test, focusing on disciplined memory policies (specifically Dyn...
bash python benchmark.py
|
03-09 12:13 | Success | - | |
|
exp_pytrain.20260309121059.017_20260309_121122
|
Python Skill Fallback
Title: Runtime Type-Safe Dynamic Package Loader - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-09 12:11 | Success | - | |
|
exp_self.20260309120912.031_20260309_120939
|
SSM Strategy Stress Test: Memory vs. Throughput
README.md SSM Strategy Stress Test: Memory vs. Throughput **Innovation:** Selective State Space Model (SSM) vs. Standard Attention **Hypothesis:** Applying SSM with disciplined memory policy and dynamic precision improves throughput under 8...
|
03-09 12:09 | Success | - | |
|
exp_self.20260309120643.030_20260309_120713
|
Section 1: README.md
SSM Strategy Stress Test Overview This benchmark evaluates the hypothesis that **State Space Model (SSM)** strategies, specifically when combined with a disciplined memory policy and chunked recurrence, provide superior throughput under str...
|
03-09 12:07 | Success | - | |
|
exp_pytrain.20260309120424.016_20260309_120506
|
Strictly Typed Modular Data Pipeline
README.md Title: Strictly Typed Modular Data Pipeline Design Brief **Hypothesis**: Utilizing Python's type hinting system (specifically Protocols and Generics) combined with strict module encapsulation practices yields code that is signific...
|
03-09 12:05 | Success | - | |
|
exp_self.20260309120240.029_20260309_120304
|
SSM Strategy Stress Test
README.md SSM Strategy Stress Test This benchmark evaluates the performance impact of a **Disciplined Memory Policy** and **Dynamic Precision** on State Space Models (SSMs). Hypothesis Applying SSM architectures with disciplined memory mana...
|
03-09 12:03 | Success | - | |
|
exp_self.20260309120012.028_20260309_120041
|
SSM Strategy Stress Test: Memory vs. Throughput
README.md SSM Strategy Stress Test: Memory vs. Throughput Overview This benchmark evaluates the hypothesis that applying **State Space Model (SSM)** strategies with a disciplined memory policy (specifically chunked recurrence and dynamic pr...
|
03-09 12:00 | Success | - | |
|
exp_pytrain.20260309115748.015_20260309_115810
|
Strict Generic Resource Registry
README.md Strict Generic Resource Registry This coding drill benchmarks a robust, zero-dependency implementation of a `ResourceRegistry` leveraging **PEP 695 Type Parameter Syntax** (introduced in Python 3.12). Hypothesis Using PEP 695 synt...
|
03-09 11:58 | Success | - | |
|
exp_self.20260309115559.027_20260309_115632
|
Self-directed benchmark: SSM Strategy Stress Test
README.md Benchmark Overview This benchmark evaluates the efficiency of **State Space Models (SSM)** versus traditional Transformer-style Attention mechanisms when operating under strict hardware constraints (8GB VRAM). **The Innovation:**...
|
03-09 11:56 | Success | - | |
|
exp_self.20260309115325.026_20260309_115352
|
Benchmark: SSM Strategy Stress Test
README.md Benchmark: SSM Strategy Stress Test Overview This benchmark tests the hypothesis that applying a State Space Model (SSM) with a disciplined memory policy (specifically state caching) improves throughput under constrained VRAM (8GB...
|
03-09 11:54 | Success | - | |
|
exp_pytrain.20260309115133.014_20260309_115152
|
Python Skill Fallback
Title: Dynamic Recipe Loader with Structural Typing - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-09 11:51 | Success | - | |
|
exp_self.20260309114921.025_20260309_115002
|
bash python benchmark.py
|
03-09 11:50 | Success | - | |
|
exp_self.20260309114644.024_20260309_114714
|
Benchmark: SSM Strategy Stress Test
README.md Benchmark: SSM Strategy Stress Test Overview This benchmark evaluates the hypothesis that applying **State Space Models (SSM)** with a disciplined memory policy improves inference throughput and memory efficiency compared to stand...
|
03-09 11:47 | Success | - | |
|
exp_pytrain.20260309114426.013_20260309_114456
|
Strictly Typed Semantic Version Plugin Loader
README.md **Title:** Strictly Typed Semantic Version Plugin Loader **Description:** This benchmark evaluates the ability to write robust, strictly typed Python code using advanced standard library features. The objective is to implement a s...
|
03-09 11:44 | Success | - | |
|
exp_self.20260309114231.023_20260309_114313
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260309114231.023 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
03-09 11:43 | Success | - | |
|
exp_self.20260309113949.022_20260309_114021
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260309113949.022 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
03-09 11:40 | Success | - | |
|
exp_pytrain.20260309113709.012_20260309_113729
|
AutoFactory Pattern Implementation with Strict Typing
README.md AutoFactory Pattern Implementation with Strict Typing Overview This coding drill implements a robust, maintainable plugin architecture using Python's `__init_subclass__` hook and `typing.Protocol`. This design pattern mimics the r...
|
03-09 11:37 | Success | - | |
|
exp_self.20260309113434.021_20260309_113502
|
SSM Strategy Stress Test
README.md SSM Strategy Stress Test Overview This benchmark evaluates the **Disciplined Memory Policy** hypothesis for State Space Models (SSMs). It compares a naive SSM implementation (which retains extensive history/cache) against an optim...
|
03-09 11:35 | Success | - | |
|
exp_self.20260309113213.020_20260309_113238
|
bash python benchmark.py Expected Output The script will output VRAM usage in Megabytes (MB) and Tokens per Second (TPS) for both the Baseline and the SSM variant, followed by a verification summary.
|
03-09 11:32 | Success | - | |
|
exp_pytrain.20260309112909.011_20260309_112935
|
Dynamic Plugin Loader with Structural Subtyping
Overview This benchmark tests a developer's ability to implement a robust, type-safe plugin system using Python's standard library. It leverages **Structural Subtyping** (via `typing.Protocol`) to enforce interfaces without explicit inherit...
|
03-09 11:29 | Success | - | |
|
exp_self.20260309112736.019_20260309_112759
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260309112736.019 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
03-09 11:28 | Success | - | |
|
exp_self.20260309112510.018_20260309_112534
|
SSM Strategy Stress Test
README.md SSM Strategy Stress Test **Innovation:** Self-directed benchmark: ssm strategy stress test **Concept:** State Space Models (SSM), Memory Policy, Dynamic Precision Overview This benchmark evaluates the hypothesis that applying a di...
|
03-09 11:25 | Success | - | |
|
exp_pytrain.20260309112218.010_20260309_112255
|
Strictly Typed Dynamic Package Generator Benchmark
This benchmark tests the ability to programmatically construct a Python package structure containing strictly typed code. It verifies that the generated module can be imported dynamically and that its type hints are correctly introspected u...
|
03-09 11:22 | Success | - | |
|
exp_self.20260309110955.017_20260309_111020
|
Here is the runnable benchmark design.
README.md bash python benchmark.py
|
03-09 11:20 | Success | - | |
|
exp_self.20260309110659.016_20260309_110729
|
SSM Strategy Stress Test
README.md SSM Strategy Stress Test Overview This benchmark evaluates the hypothesis that applying a State Space Model (SSM) with a disciplined **chunked memory policy** significantly improves throughput and reduces VRAM pressure compare...
|
03-09 11:07 | Success | - | |
|
exp_pytrain.20260309110415.009_20260309_110442
|
Dynamic Type-Safe Plugin Loader
This benchmark tests the ability of a Python system to dynamically generate code, scaffold a file system structure, and perform runtime type validation using `typing.Protocol`. Context Modern Python applications often rely on plugin archite...
|
03-09 11:04 | Success | - | |
|
exp_self.20260309110228.015_20260309_110257
|
This benchmark evaluates the hypothesis that **SSM (State Space Model)** strategies with disciplined memory policies sig...
README.md This benchmark evaluates the hypothesis that **SSM (State Space Model)** strategies with disciplined memory policies significantly improve throughput and reduce VRAM overhead compared to standard attention mechanisms under long-co...
|
03-09 11:03 | Success | - | |
|
exp_self.20260309110018.014_20260309_110040
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260309110018.014 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
03-09 11:00 | Success | - | |
|
exp_pytrain.20260309105743.008_20260309_105807
|
Python Skill Fallback
Title: Strictly-Typed Dependency Resolver with PEP 695 - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-09 10:58 | Success | - | |
|
exp_self.20260309105537.013_20260309_105624
|
Self-directed benchmark: ssm strategy stress test
README.md This benchmark investigates the **hypothesis** that applying Selective State Space Models (SSM) with a disciplined memory policy and dynamic precision improves throughput and efficiency under strict memory constraints (8GB). **Bac...
|
03-09 10:56 | Success | - | |
|
exp_self.20260309105309.012_20260309_105337
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260309105309.012 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
03-09 10:53 | Success | - | |
|
exp_pytrain.20260309105048.007_20260309_105125
|
Python Skill Fallback
Title: Dynamic Plugin Registry with Runtime Type Validation - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-09 10:51 | Success | - | |
|
exp_self.20260309104839.011_20260309_104926
|
Self-directed benchmark: SSM Strategy Stress Test
README.md This benchmark evaluates the hypothesis that applying a State Space Model (SSM) with a disciplined memory policy (specifically, fixed-state recurrent processing) improves inference throughput and efficiency under tight 8GB VRAM co...
|
03-09 10:49 | Success | - | |
|
exp_self.20260309104554.010_20260309_104623
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260309104554.010 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
03-09 10:46 | Success | - | |
|
exp_pytrain.20260309104318.006_20260309_104400
|
Dynamic Type-Safe Plugin Loader with Auto-Discovery
README.md Dynamic Type-Safe Plugin Loader with Auto-Discovery This benchmark demonstrates a robust implementation of a dynamic plugin loading system using only the Python standard library. It simulates an environment similar to machine lear...
|
03-09 10:44 | Success | - | |
|
exp_self.20260309104129.009_20260309_104211
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260309104129.009 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
03-09 10:42 | Success | - | |
|
exp_self.20260309103839.008_20260309_103905
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260309103839.008 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
03-09 10:39 | Success | - | |
|
exp_pytrain.20260309103604.005_20260309_103630
|
Dynamic Module Loader with Strict Protocol Enforcement
README.md Dynamic Module Loader with Strict Protocol Enforcement Overview This coding drill evaluates the implementation of a robust plugin loading system in Python. It focuses on decoupling interface definition from implementation using `t...
|
03-09 10:36 | Success | - | |
|
exp_self.20260309103405.007_20260309_103439
|
SSM Strategy Stress Test: Memory vs. Throughput
README.md SSM Strategy Stress Test: Memory vs. Throughput Overview This benchmark evaluates the hypothesis that applying **State Space Models (SSM)** with a disciplined memory policy significantly improves throughput and reduces VRAM pressu...
|
03-09 10:34 | Success | - | |
|
exp_self.20260309103106.006_20260309_103151
|
**README.md**
Self-directed benchmark: SSM Strategy Stress Test This benchmark evaluates the performance of a State Space Model (SSM) architecture, specifically focusing on the impact of a disciplined memory policy and dynamic precision on throughput and...
|
03-09 10:31 | Success | - | |
|
exp_pytrain.20260309102743.004_20260309_102823
|
Benchmark: Dynamic Plugin Loader with Structural Subtyping
README.md Benchmark: Dynamic Plugin Loader with Structural Subtyping Overview This benchmark evaluates a Python architectural pattern combining dynamic code loading with structural subtyping (Protocols). The objective is to implement a robu...
|
03-09 10:28 | Success | - | |
|
exp_self.20260309102554.005_20260309_102619
|
README.md bash pip install torch python benchmark.py
|
03-09 10:26 | Success | - | |
|
exp_self.20260309102235.004_20260309_102312
|
SSM Strategy Stress Test: Benchmarking Memory Policy
README.md SSM Strategy Stress Test: Benchmarking Memory Policy Overview This benchmark evaluates the performance of **Selective State Space Models (SSM)** compared to traditional Transformer architectures. Specifically, it tests the hypothe...
|
03-09 10:24 | Success | - | |
|
exp_pytrain.20260309101934.003_20260309_102019
|
Strict Dynamic Plugin Loader with Runtime Protocol Validation
README.md Strict Dynamic Plugin Loader with Runtime Protocol Validation Overview This benchmark evaluates the design of a robust runtime plugin loader that simulates package structures using `types` and `sys` standard library modules. It en...
|
03-09 10:20 | Success | - | |
|
exp_self.20260309101708.003_20260309_101748
|
bash python benchmark.py
|
03-09 10:18 | Success | - | |
|
exp_self.20260309101438.002_20260309_101507
|
Self-directed benchmark: SSM Strategy Stress Test
README.md Self-directed benchmark: SSM Strategy Stress Test Hypothesis Applying a State Space Model (SSM) approach with a disciplined memory policy (simulating selective state retention and chunked processing) improves inference throughput...
|
03-09 10:15 | Success | - | |
|
exp_pytrain.20260309101133.002_20260309_101209
|
Typed Configuration Validator using PEP 695
README.md Typed Configuration Validator using PEP 695 This benchmark demonstrates the usage of Python 3.12's Type Parameter Syntax (PEP 695) to create a robust, zero-dependency configuration validation micro-library. Features - **Generic Cl...
|
03-09 10:12 | Success | - | |
|
exp_self.20260309100716.001_20260309_100754
|
Here is the runnable benchmark for the SSM strategy stress test.
bash python benchmark.py
|
03-09 10:10 | Success | - | |
|
exp_pytrain.20260309100256.001_20260309_100328
|
This benchmark evaluates the efficiency and robustness of a dynamic plugin loading system built using Python's `typing.P...
README.md This benchmark evaluates the efficiency and robustness of a dynamic plugin loading system built using Python's `typing.Protocol` for structural subtyping. **Objective:** The goal is to simulate a "plugin manager" that dynamically...
|
03-09 10:03 | Success | - | |
|
exp_self.20260309090324.030_20260309_090353
|
Self-Directed Benchmark: SSM Strategy Stress Test
README.md Self-Directed Benchmark: SSM Strategy Stress Test Overview This benchmark evaluates the hypothesis that applying State Space Models (SSMs) with a disciplined memory policy improves throughput under strict 8GB VRAM constraints. Hyp...
|
03-09 09:03 | Pending | - | |
|
exp_pytrain.20260309090036.017_20260309_090117
|
Typed ZipApp Generator
README.md Title: Typed ZipApp Generator Overview This benchmark evaluates a Python system's ability to dynamically generate, package, and verify a typed command-line application using only the standard library. Design Goals 1. **Dependency-...
|
03-09 09:01 | Success | - | |
|
exp_self.20260309085725.029_20260309_085851
|
Self-directed benchmark: ssm strategy stress test
README.md SSM Strategy Stress Test Benchmark Overview This benchmark evaluates the **SSM Strategy** against a standard **Attention Baseline** (Transformer) to validate the hypothesis: *applying ssm with disciplined memory policy improve...
|
03-09 08:58 | Success | - | |
|
exp_pytrain.20260309085410.016_20260309_085447
|
Typed Dependency Injection Container Benchmark
README.md Title: Typed Dependency Injection Container Benchmark Design Brief This benchmark validates the hypothesis that **Strict type hinting and Protocol-based design** allow for the creation of robust dependency injection (DI) mechanism...
|
03-09 08:54 | Success | - | |
|
exp_self.20260309085201.028_20260309_085236
|
Self-directed benchmark: ssm strategy stress test
Paper ID: self.20260309085201.028 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered...
|
03-09 08:52 | Success | - | |
|
exp_self.20260309084855.027_20260309_084930
|
Here is the design for the SSM Strategy Stress Test benchmark.
No summary available yet.
|
03-09 08:49 | Success | - | |
|
exp_pytrain.20260309084648.015_20260309_084711
|
Dynamic Module Injection and Strict Protocol Validation
README.md Dynamic Module Injection and Strict Protocol Validation Overview This benchmark evaluates a system's ability to simulate a Python packaging environment by dynamically generating, compiling, and injecting modules into `sys.modules`...
|
03-09 08:47 | Success | - | |
|
exp_self.20260309084435.026_20260309_084501
|
SSM Strategy Stress Test
README.md SSM Strategy Stress Test **Innovation:** Self-directed benchmark: ssm strategy stress test **Hypothesis:** Applying SSM with a disciplined memory policy improves throughput under 8GB constraints. Description This benchmark compare...
|
03-09 08:45 | Success | - | |
|
exp_self.20260309084211.025_20260309_084235
|
Section 1: README.md
SSM Strategy Stress Test This benchmark evaluates the hypothesis that **State Space Models (SSM)**, when combined with disciplined memory policies (specifically state reduction and dynamic precision), offer superior throughput and memory ef...
|
03-09 08:42 | Success | - | |
|
exp_pytrain.20260309083929.014_20260309_084005
|
Generic Plugin Registry & Factory Benchmark
README.md Generic Plugin Registry & Factory Benchmark Overview This benchmark simulates a core component of large-scale AI frameworks like LitGPT: a modular, type-safe plugin system. It challenges the implementation to utilize Python's adva...
|
03-09 08:40 | Success | - | |
|
exp_self.20260309083735.024_20260309_083813
|
Section 1: README.md
Section 2: benchmark.py
|
03-09 08:38 | Success | - | |
|
exp_self.20260309083456.023_20260309_083549
|
Logit-Gated State Skipping Benchmark
README.md Logit-Gated State Skipping Benchmark Overview This benchmark tests the **Logit-Gated State Skipping** hypothesis on a simplified State Space Model (SSM). The core idea is to reduce computational overhead by skipping the state upda...
|
03-09 08:35 | Success | - | |
|
exp_pytrain.20260309083231.013_20260309_083303
|
Virtual Package Dispatcher with Protocol Validation
README.md Virtual Package Dispatcher with Protocol Validation Design Brief **Hypothesis**: An autonomous coding system can simulate a complex package ecosystem by generating virtual modules in-memory, validating them against strict runtime...
|
03-09 08:33 | Success | - | |
|
exp_self.20260309082955.022_20260309_083030
|
Gated Linear Attention (GLA) to SSM Bridge: Innovation Benchmark
README.md Gated Linear Attention (GLA) to SSM Bridge: Innovation Benchmark Hypothesis Gated Linear Attention (GLA) and State Space Models (SSMs) share fundamental mathematical properties as linear recurrent systems. This benchmark tests the...
|
03-09 08:30 | Success | - | |
|
exp_self.20260309082749.021_20260309_082812
|
Benchmark: Delta-State Quantization (DSQ) for SSMs
README.md Benchmark: Delta-State Quantization (DSQ) for SSMs Overview This benchmark evaluates **Delta-State Quantization (DSQ)**, a technique designed to improve the efficiency of State Space Models (SSMs) like Mamba. **The Innovation:** S...
|
03-09 08:28 | Success | - | |
|
exp_pytrain.20260309082605.012_20260309_082643
|
README.md bash python benchmark.py
|
03-09 08:26 | Success | - | |
|
exp_oa_W7131910431_20260309_082329
|
SideQuest: Model-Driven KV Cache Management Benchmark
README.md SideQuest: Model-Driven KV Cache Management Benchmark This repository contains a benchmark designed to evaluate **SideQuest**, a novel approach to KV cache management for long-horizon agentic reasoning. Overview Large Language Mod...
|
03-09 08:23 | Success | - | |
|
exp_self.20260309082110.020_20260309_082157
|
This benchmark evaluates the efficacy of **Frequency-Domain State Compression** for State Space Models (SSMs).
README.md This benchmark evaluates the efficacy of **Frequency-Domain State Compression** for State Space Models (SSMs). Concept Standard SSMs (like Mamba) maintain a large hidden state vector $h_t$ that evolves over time. This hidden state...
|
03-09 08:22 | Success | - | |
|
exp_pytrain.20260309081908.011_20260309_081948
|
This benchmark verifies the ability to construct a robust, dynamic plugin loading system using Python's standard library...
README.md This benchmark verifies the ability to construct a robust, dynamic plugin loading system using Python's standard library. It tests the candidate's understanding of `importlib`, `typing.Protocol`, and exception handling within a fi...
|
03-09 08:19 | Success | - | |
|
exp_self.20260309081640.019_20260309_081731
|
Pinned-Window 4-bit State Streaming
Paper ID: self.20260309081640.019 - Hypothesis: Standard VRAM overflow crashes training. By implementing a ring-buffer in pinned CPU memory and syncing only the active state window in FP16 to GPU, we can train on infinite sequences. - Plan:...
|
03-09 08:17 | Success | - | |
|
exp_self.20260309081427.018_20260309_081459
|
CPU-Pinned State Recycle Cache Benchmark
README.md CPU-Pinned State Recycle Cache Benchmark This benchmark tests the **CPU-Pinned State Recycle Cache** innovation designed for SSM/Mamba architectures running on memory-constrained GPUs. The Innovation Standard SSM blocks maintain t...
|
03-09 08:15 | Success | - | |
|
exp_pytrain.20260309081208.010_20260309_081313
|
Modular Asynchronous Log Processor
README.md Modular Asynchronous Log Processor Overview This benchmark verifies the structural integrity, type safety, and performance of a modular asynchronous log processing system. It simulates a "drill" where a library component `async_pr...
|
03-09 08:13 | Success | - | |
|
exp_self.20260309081004.017_20260309_081034
|
Dynamic State Quantization for SSMs
README.md Dynamic State Quantization for SSMs Overview This benchmark evaluates a dynamic precision mechanism for State Space Models (SSMs). The innovation implements a "State Quantizer" that monitors the magnitude of state deltas ($\Delta...
|
03-09 08:10 | Success | - | |
|
exp_self.20260309080802.016_20260309_080831
|
Magnitude-Adaptive State Quantization (MASQ)
Paper ID: self.20260309080802.016 - Hypothesis: Using a Hebbian-like gating mechanism to detect 'high energy' state updates and keeping those in FP16, while quantizing 'low energy' updates to INT4, will preserve model stability. - Plan: Mod...
|
03-09 08:08 | Success | - | |
|
exp_pytrain.20260309080532.009_20260309_080607
|
Benchmark: Strictly Typed Dynamic Plugin Loader
README.md Benchmark: Strictly Typed Dynamic Plugin Loader Overview This benchmark evaluates the ability of a Python system to construct a robust, dependency-free plugin loading mechanism. It demonstrates the synergy between Python's `typing...
|
03-09 08:06 | Success | - | |
|
exp_self.20260309080313.015_20260309_080355
|
Section 1: README.md
Latency-Aware State Tiering (LAST) Benchmark Overview This benchmark evaluates the **Latency-Aware State Tiering (LAST)** hypothesis. The core idea is that in State Space Models (SSMs) or RNNs, not all hidden states in a large batch are act...
|
03-09 08:04 | Success | - | |
|
exp_self.20260309080044.014_20260309_080126
|
Associative State Injection (ASI) Layer Benchmark
README.md Associative State Injection (ASI) Layer Benchmark Overview This benchmark implements and evaluates the **Associative State Injection (ASI)** layer innovation. ASI augments standard State Space Models (SSMs) with a cross-attention...
|
03-09 08:01 | Success | - | |
|
exp_pytrain.20260309075851.008_20260309_075915
|
Strict Package Metadata Validator
README.md Strict Package Metadata Validator Overview This benchmark tests the implementation of a strict package metadata validator using Python's `typing` module (specifically `TypedDict`) and the `re` module for regex-based validation. Ob...
|
03-09 07:59 | Success | - | |
|
exp_self.20260309074140.013_20260309_074221
|
This benchmark implements **Adaptive Dimension-Wise State Quantization (ADWSQ)**.
README.md This benchmark implements **Adaptive Dimension-Wise State Quantization (ADWSQ)**. This experiment tests the hypothesis that high-variance dimensions in State Space Model (SSM) hidden states carry more information and thus require...
|
03-09 07:57 | Success | - | |
|
exp_self.20260309073923.012_20260309_074015
|
Per-Channel Dynamic State Precision (PC-DSP) Benchmark
This benchmark evaluates a novel optimization technique for State Space Models (SSMs) and RNNs, specifically targeting the memory footprint of the recurrent state cache. Hypothesis In sequence modeling, the hidden state acts as a memory. We...
|
03-09 07:40 | Success | - | |
|
exp_pytrain.20260309073720.007_20260309_073800
|
README.md bash python benchmark.py
|
03-09 07:38 | Success | - | |
|
exp_self.20260309073417.011_20260309_073526
|
Hybrid CPU-GPU State Streaming (HCGS) Benchmark
README.md Hybrid CPU-GPU State Streaming (HCGS) Benchmark Overview This benchmark validates the **Hybrid CPU-GPU State Streaming (HCGS)** hypothesis. It aims to demonstrate that by overlapping GPU computation of SSM (State Space Model) step...
|
03-09 07:35 | Success | - | |
|
exp_self.20260309073159.010_20260309_073250
|
Interpolated State Buffering (ISB)
Paper ID: self.20260309073159.010 - Hypothesis: SSM states change smoothly. We can compute the state every N steps, and for the intermediate steps, linearly interpolate between the last two checkpoints. This reduces memory bandwidth pressur...
|
03-09 07:32 | Success | - | |
|
exp_pytrain.20260309072934.006_20260309_072959
|
Benchmark: Dynamic Plugin Loader with Strict Type Verification
README.md Benchmark: Dynamic Plugin Loader with Strict Type Verification Hypothesis An autonomous system can robustly manage modular code architectures by implementing a custom dynamic import system. This system enforces interface complianc...
|
03-09 07:30 | Success | - | |
|
exp_self.20260309072744.009_20260309_072809
|
Recency-Biased Dynamic Precision (RBDP) Benchmark
This benchmark demonstrates the **Recency-Biased Dynamic Precision (RBDP)** innovation. It simulates a State Space Model (SSM) processing a long sequence. The core hypothesis is that recent SSM states require high precision (FP16), while ol...
|
03-09 07:28 | Success | - | |
|
exp_self.20260309072529.008_20260309_072601
|
Here is the runnable benchmark for the **Tiered State Precision (TSP)** innovation.
README.md Tiered State Precision (TSP) Benchmark **Hypothesis:** The SSM hidden state is non-uniform; the first half (recent history) requires FP16, while the second half (long-term history) can be quantized to FP8 without significant degra...
|
03-09 07:26 | Success | - | |
|
exp_pytrain.20260309072338.005_20260309_072358
|
Dynamic Module Injection with Strict Protocol Validation
README.md Dynamic Module Injection with Strict Protocol Validation This benchmark evaluates the capability of an autonomous coding system to implement a robust, modular plugin architecture using Python's standard library. The test focuses o...
|
03-09 07:24 | Success | - | |
|
exp_self.20260309072154.007_20260309_072223
|
Entropy-Modulated Spectral State Pruning (EMSSP)
README.md Entropy-Modulated Spectral State Pruning (EMSSP) Overview This benchmark implements the **EMSSP** innovation for State Space Models (SSMs). It tests the hypothesis that high-entropy tokens correspond to high-frequency components i...
|
03-09 07:22 | Success | - | |
|
exp_self.20260309071924.006_20260309_072002
|
Quantized Snapshot Recycling (QSR) Benchmark
README.md Quantized Snapshot Recycling (QSR) Benchmark This repository contains a micro-benchmark designed to validate the **Quantized Snapshot Recycling (QSR)** hypothesis. Hypothesis SSM (State Space Model) states are deterministic. B...
|
03-09 07:20 | Success | - | |
|
exp_pytrain.20260309071731.004_20260309_071806
|
Strictly Typed CLI Data Processor
README.md Strictly Typed CLI Data Processor This benchmark evaluates the ability to generate a robust, single-file Python CLI tool that enforces strict static typing using `typing` protocols and generics, while adhering to PEP 8 standards....
|
03-09 07:18 | Success | - | |
|
exp_self.20260309071516.005_20260309_071622
|
Entropy-Gated Spectral Cache (EGSC) Benchmark
README.md Entropy-Gated Spectral Cache (EGSC) Benchmark Overview This benchmark validates the **Entropy-Gated Spectral Cache (EGSC)** hypothesis. It posits that high-entropy states in a language model carry more information and require high...
|
03-09 07:16 | Success | - | |
|
exp_self.20260309071239.004_20260309_071333
|
Hybrid-Precision Asynchronous State Offloading (HP-ASO) Benchmark
README.md Hybrid-Precision Asynchronous State Offloading (HP-ASO) Benchmark Overview This benchmark evaluates **HP-ASO**, a memory management strategy designed to extend the context window of State Space Models (SSMs), such as Mamba. The co...
|
03-09 07:13 | Success | - | |
|
exp_pytrain.20260309071029.003_20260309_071109
|
Coding Drill Benchmark: Typed ZipApp Package Factory
README.md Coding Drill Benchmark: Typed ZipApp Package Factory Overview This benchmark evaluates an agent's ability to programmatically construct a Python package structure, enforce strict static typing using advanced standard library c...
|
03-09 07:11 | Success | - | |
|
exp_hf_2603.01666_20260309_070832
|
Beyond the Grid: Layout-Informed Multi-Vector Retrieval with Parsed Visual Document Representations
Paper ID: hf_2603.01666 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
03-09 07:08 | Success | - | |
|
exp_self.20260309070552.003_20260309_070635
|
Correction-Buffered State Streaming
Paper ID: self.20260309070552.003 - Hypothesis: We keep the main SSM state in 4-bit (on CPU or disk). We maintain a tiny (e.g., 1%) 8-bit 'correction cache' in VRAM that stores the error between the 4-bit approx and the true state. - Plan:...
|
03-09 07:06 | Success | - | |
|
exp_pytrain.20260309070422.002_20260309_070444
|
Type-Safe Dynamic Plugin Loader with PEP 695
Overview This coding drill demonstrates the implementation of a **Type-Safe Dynamic Plugin Loader** using **PEP 695 (Type Parameter Syntax)** introduced in Python 3.12. The objective is to modernize generic wrapper classes—commonly found in...
|
03-09 07:04 | Success | - | |
|
exp_self.20260309070151.002_20260309_070222
|
Entropy-Gated State Speculative Decoding
Paper ID: self.20260309070151.002 - Hypothesis: High-entropy tokens carry more information and require higher state fidelity. Low-entropy tokens (e.g., stop words) can be processed with 4-bit states. This dynamic switching will reduce ave...
|
03-09 07:02 | Success | - | |
|
exp_self.20260309065920.001_20260309_065956
|
Tiered Precision State Cache (TPSC) Benchmark Design
No summary available yet.
|
03-09 07:00 | Success | - | |
|
exp_pytrain.20260309065752.001_20260309_065814
|
Structurally Typed Dynamic Plugin Loader
README.md **Title:** Structurally Typed Dynamic Plugin Loader **Description:** This benchmark evaluates a system's ability to manage dynamic code loading and structural type validation without external dependencies. It tests the creation of...
|
03-09 06:58 | Success | - | |
|
exp_pytrain.20260309064248.002_20260309_064327
|
PEP 695 Generic Storage and Packaging Drill
README.md PEP 695 Generic Storage and Packaging Drill **Objective** This benchmark validates the implementation of Python 3.12+ `PEP 695` Type Parameter Syntax. It requires the creation of a generic class `Storage[T]` and a generic function...
|
03-09 06:43 | Success | - | |
|
exp_self.20260309064035.002_20260309_064116
|
bash python benchmark.py
|
03-09 06:41 | Success | - | |
|
exp_self.20260309063822.001_20260309_063908
|
ARES: SSM + Cache + Dynamic Precision Benchmark
README.md ARES: SSM + Cache + Dynamic Precision Benchmark This benchmark tests the hypothesis that combining **State Space Models (SSM)**, efficient **Caching**, and **Dynamic Precision** improves memory efficiency and throughput compared t...
|
03-09 06:39 | Success | - | |
|
exp_pytrain.20260309063620.001_20260309_063710
|
Protocol-Based Dynamic Plugin Registry
README.md Protocol-Based Dynamic Plugin Registry Overview This benchmark demonstrates a robust, structural subtyping-based plugin system using Python's `typing.Protocol`. Unlike traditional inheritance-based plugin architectures (Abstract B...
|
03-09 06:37 | Success | - | |
|
exp_pytrain.20260309062914.003_20260309_062946
|
Dynamic Plugin Loader with Runtime Type Validation
Overview This benchmark tests the ability to construct a flexible, type-safe plugin architecture using Python's standard library. It simulates a dynamic package environment where modules are created in-memory, loaded via `importlib`, and va...
|
03-09 06:29 | Success | - | |
|
exp_hf_2603.05438_20260309_062747
|
Planning in 8 Tokens: A Compact Discrete Tokenizer for Latent World Model
Paper ID: hf_2603.05438 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
03-09 06:27 | Success | - | |
|
exp_self.20260309062433.002_20260309_062523
|
Low-Rank Associative State Injection (LASI)
Paper ID: self.20260309062433.002 - Hypothesis: SSMs are theoretically limited to finite memory. By maintaining a small (low-rank) 'global context' matrix updated via linear attention (which is O(N) and fits in cache) and injecting it into...
|
03-09 06:25 | Success | - | |
|
exp_pytrain.20260309062235.002_20260309_062308
|
Generic Component Registry using PEP 695
This benchmark demonstrates the use of Python 3.12's **PEP 695 Type Parameter Syntax** to create a generic `ComponentRegistry` class. It validates that the new syntax reduces boilerplate (removing the need for explicit `Generic` inheritance...
|
03-09 06:23 | Success | - | |
|
exp_self.20260309062030.001_20260309_062105
|
Entropy-Gated Dynamic Precision (EGDP) for SSMs
README.md Entropy-Gated Dynamic Precision (EGDP) for SSMs Overview This benchmark evaluates the **Entropy-Gated Dynamic Precision (EGDP)** innovation applied to Mamba-style State Space Models (SSMs). Hypothesis Tokens with high entropy (hig...
|
03-09 06:21 | Success | - | |
|
exp_hf_2603.06331_20260309_061853
|
WorldCache: Benchmarking Heterogeneous Token Caching
README.md WorldCache: Benchmarking Heterogeneous Token Caching This benchmark demonstrates the performance gains of **WorldCache**, a framework designed to accelerate diffusion-based world models. The Innovation Standard diffusion models ap...
|
03-09 06:18 | Success | - | |
|
exp_pytrain.20260309061616.001_20260309_061655
|
Python Skill Fallback
Title: Type-Safe Dynamic Plugin Loader - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-09 06:17 | Success | - | |
|
exp_self.20260309025539.108_20260309_025736
|
Entropy-Gated Dynamic State Quantization (EG-DSQ)
README.md Entropy-Gated Dynamic State Quantization (EG-DSQ) Overview This benchmark evaluates the **Entropy-Gated Dynamic State Quantization (EG-DSQ)** innovation applied to a State Space Model (SSM). The Innovation Standard SSMs (like Mamb...
|
03-09 03:01 | Success | - | |
|
exp_pytrain.20260309025116.060_20260309_025205
|
Robust Dynamic Plugin System using Protocols and Importlib
README.md This benchmark evaluates a Python system's ability to dynamically construct a package structure, generate source code on-the-fly, and validate loaded modules against strict `typing.Protocol` interfaces. Objective To demonstrate ma...
|
03-09 02:52 | Success | - | |
|
exp_self.20260309024727.107_20260309_024843
|
Gated State Quantization (GSQ)
Paper ID: self.20260309024727.107 - Hypothesis: When the SSM gate is 'closed' (retaining old memory), the state is static and can be aggressively quantized (int8). When the gate is 'open' (absorbing new info), we temporarily switch to high...
|
03-09 02:48 | Success | - | |
|
exp_pytrain.20260309024255.059_20260309_024356
|
Benchmark: Auto-Registering Component System with Typed Configurations
README.md Benchmark: Auto-Registering Component System with Typed Configurations Objective This benchmark tests your ability to design a robust, declarative plugin architecture using advanced Python metaprogramming features and static type...
|
03-09 02:43 | Success | - | |
|
exp_pytrain.20260309023336.058_20260309_023534
|
Python Skill Fallback
Title: Dynamic Plugin Loader with Runtime Protocol Enforcement - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-09 02:35 | Success | - | |
|
exp_self.20260309022924.106_20260309_023049
|
Spectral State Cache (SSC) Benchmark
README.md Spectral State Cache (SSC) Benchmark This benchmark evaluates the **Spectral State Cache** innovation, which applies frequency-domain decomposition (DCT/FFT) to the recurrent states of State Space Models (SSMs). Hypothesis The rec...
|
03-09 02:30 | Success | - | |
|
exp_pytrain.20260309022456.057_20260309_022547
|
Benchmark: Strict Typed Plugin System with Namespace Control
README.md Benchmark: Strict Typed Plugin System with Namespace Control Objective This benchmark validates the implementation of a strictly typed, extensible plugin system using Python's `typing.Protocol` and explicit namespace management vi...
|
03-09 02:25 | Success | - | |
|
exp_self.20260309022045.105_20260309_022212
|
Asynchronous State Offloading (ASO) Benchmark
This repository contains a minimal, runnable benchmark designed to test the **Asynchronous State Offloading (ASO)** hypothesis. The Hypothesis In State Space Models (SSMs) like Mamba, managing the recurrent state during long-context generat...
|
03-09 02:22 | Success | - | |
|
exp_pytrain.20260309021757.056_20260309_021840
|
Strictly-Typed Dynamic Plugin Loader
Overview This benchmark evaluates the ability to construct a robust, dynamic plugin loading system using Python's standard library. It focuses on the combination of `importlib` for dynamic runtime loading and `typing.Protocol` for strict st...
|
03-09 02:18 | Success | - | |
|
exp_self.20260309021514.104_20260309_021552
|
Student hypothesis: dynamic_precision + ssm_mamba co-design
Paper ID: self.20260309021514.104 - Hypothesis: Combining dynamic_precision + ssm_mamba + memory will improve throughput or memory efficiency without breaking 8GB execution. - Plan: Create a compact comparative benchmark against a simple ba...
|
03-09 02:15 | Success | - | |
|
exp_self.20260309021223.103_20260309_021308
|
Benchmark: SSM + Cache Co-design with Dynamic Precision
README.md Benchmark: SSM + Cache Co-design with Dynamic Precision Hypothesis This benchmark explores the **Student Hypothesis**: Integrating State Space Models (SSM), efficient State Caching, and Dynamic Precision (Mixed Precision) in a co-...
|
03-09 02:13 | Success | - | |
|
exp_pytrain.20260309021023.055_20260309_021047
|
Benchmark: Type-Safe Plugin Registry with PEP 695
README.md Benchmark: Type-Safe Plugin Registry with PEP 695 Overview This benchmark validates the use of Python 3.12+'s PEP 695 Type Parameter Syntax to create a generic, type-safe Plugin Registry. It ensures that the new syntax reduces boi...
|
03-09 02:10 | Success | - | |
|
exp_self.20260309020620.102_20260309_020743
|
Section 1: README.md
bash pip install torch python benchmark.py
|
03-09 02:07 | Success | - | |
|
exp_self.20260309020325.101_20260309_020424
|
Asynchronous State Recycle Cache (ASRC) Benchmark
README.md Asynchronous State Recycle Cache (ASRC) Benchmark This repository contains a benchmark designed to test the **Asynchronous State Recycle Cache (ASRC)** innovation. The hypothesis is that by offloading SSM (State Space Model) state...
|
03-09 02:04 | Success | - | |
|
exp_pytrain.20260309020023.054_20260309_020117
|
README.md bash python benchmark.py
|
03-09 02:01 | Success | - | |
|
exp_self.20260309015635.100_20260309_015742
|
Innovation: Temporal Delta State Quantization
README.md Innovation: Temporal Delta State Quantization Overview This benchmark validates the **Temporal Delta State Quantization** technique applied to State Space Models (SSMs). **Hypothesis:** SSM states evolve smoothly over time. The di...
|
03-09 01:57 | Success | - | |
|
exp_pytrain.20260309015345.053_20260309_015434
|
Benchmark: Dynamic Backend Registry with Protocol Enforcement
README.md Benchmark: Dynamic Backend Registry with Protocol Enforcement **Title:** Dynamic Backend Registry with Protocol Enforcement **Focus:** `typing.Protocol`, `importlib`, dynamic plugin discovery. **Execution Time:** < 20 seconds. Obj...
|
03-09 01:54 | Success | - | |
|
exp_self.20260309015057.099_20260309_015157
|
Pinned-State Quantization Buffer (PSQB) Benchmark
README.md Pinned-State Quantization Buffer (PSQB) Benchmark This repository contains the benchmark code for the **Pinned-State Quantization Buffer (PSQB)** innovation. Hypothesis For State Space Models (SSMs) like Mamba, the recurrent state...
|
03-09 01:52 | Success | - | |
|
exp_self.20260309014812.098_20260309_014905
|
Spectral State Denoising (SSD) Benchmark
README.md Spectral State Denoising (SSD) Benchmark This benchmark evaluates the hypothesis that recurrent hidden states in State Space Models (SSMs) contain high-frequency noise that can be discarded to improve memory efficiency. The Innova...
|
03-09 01:49 | Success | - | |
|
exp_pytrain.20260309014459.052_20260309_014557
|
Strictly-Typed Component Registry System
Overview This benchmark demonstrates a strictly-typed `Registry` pattern implementation using Python's standard `typing` module. It mimics the behavior of modern ML frameworks (like Hugging Face Transformers or Diffusers) where components a...
|
03-09 01:46 | Success | - | |
|
exp_self.20260309014142.097_20260309_014230
|
Linear-Mamba Kernel Fusion (LMKF) Benchmark
README.md Linear-Mamba Kernel Fusion (LMKF) Benchmark Overview This benchmark validates the **Linear-Mamba Kernel Fusion (LMKF)** hypothesis: that a hybrid inference engine can switch between an optimized SSM (Mamba-style) execution path an...
|
03-09 01:42 | Success | - | |
|
exp_self.20260309013907.096_20260309_013955
|
Entropy-Gated Dynamic State Quantization Benchmark
README.md Entropy-Gated Dynamic State Quantization Benchmark This benchmark evaluates a novel optimization for State Space Models (SSMs) where the precision of the hidden state is dynamically adjusted based on the information entropy of the...
|
03-09 01:40 | Success | - | |
|
exp_pytrain.20260309013625.051_20260309_013708
|
README.md bash python benchmark.py
|
03-09 01:37 | Success | - | |
|
exp_self.20260309013215.095_20260309_013322
|
Entropy-Triggered CPU Offload (ETCO)
Overview This benchmark tests the **Entropy-Triggered CPU Offload (ETCO)** strategy applied to State Space Models (SSMs). The core hypothesis is that the internal state `h` of an SSM acts as a compressive history. During fluent generation (...
|
03-09 01:33 | Success | - | |
|
exp_pytrain.20260309012927.050_20260309_013026
|
Coding Drill: Asynchronous Typed Module Pattern
README.md Coding Drill: Asynchronous Typed Module Pattern Objective This benchmark evaluates the ability to design and verify a robust, single-file Python module that adheres to modern packaging and typing standards. The drill requires gene...
|
03-09 01:30 | Success | - | |
|
exp_self.20260309012628.094_20260309_012724
|
Frequency-Domain State Offloading for State Space Models (SSMs)
README.md This benchmark evaluates the **Frequency-Domain State Offloading** technique for State Space Models (SSMs). Concept Standard SSM implementations maintain a recurrent state tensor on the GPU to avoid slow PCIe transfers. This limit...
|
03-09 01:27 | Success | - | |
|
exp_self.20260309012325.093_20260309_012408
|
Entropy-Adaptive State Quantization (EASQ)
Paper ID: self.20260309012325.093 - Hypothesis: High-entropy inputs require full FP16 state precision to maintain gradients, while low-entropy inputs can safely use INT4 states, reducing VRAM pressure by 30%. - Plan: Implement a wrapper for...
|
03-09 01:24 | Success | - | |
|
exp_pytrain.20260309012015.049_20260309_012146
|
Dynamic Module Validator with TypeGuards
README.md Dynamic Module Validator with TypeGuards Overview This coding drill demonstrates a robust approach to runtime type safety in Python plugin systems. It simulates a scenario where an application must dynamically load a module from a...
|
03-09 01:21 | Success | - | |
|
exp_self.20260309011711.092_20260309_011815
|
Student hypothesis: ssm + cache + dynamic_precision
Student hypothesis: ssm + cache + dynamic_precision Hypothesis Combining `ssm` + `cache` + `dynamic_precision` will improve throughput or memory efficiency without breaking 8GB execution. Plan Create a compact comparative benchmark against...
|
03-09 01:18 | Success | - | |
|
exp_self.20260309011422.091_20260309_011508
|
Sliding-Window Linear SSM Bridge
Paper ID: self.20260309011422.091 - Hypothesis: SSMs fail at precise retrieval because of state compression. A sliding window attention layer (Linear Attention) applied to the raw recent tokens will boost retrieval accuracy without quadrati...
|
03-09 01:15 | Success | - | |
|
exp_pytrain.20260309011242.048_20260309_011314
|
README.md bash python benchmark.py
|
03-09 01:13 | Success | - | |
|
exp_self.20260309011015.090_20260309_011114
|
Salience-Adaptive Mixed-Precision States (SAMP-S)
Innovation This benchmark introduces **Salience-Adaptive Mixed-Precision States**, a compression technique for State Space Models (SSMs). Standard SSMs maintain large recurrent states (e.g., in Mamba architectures) entirely in FP16. We hypo...
|
03-09 01:11 | Success | - | |
|
exp_self.20260309010812.089_20260309_010843
|
Student hypothesis: ssm + cache co-design
Paper ID: self.20260309010812.089 - Hypothesis: Combining ssm + cache + dynamic_precision will improve throughput or memory efficiency without breaking 8GB execution. - Plan: Create a compact comparative benchmark against a simple baseline,...
|
03-09 01:08 | Success | - | |
|
exp_pytrain.20260309010534.047_20260309_010619
|
Dynamic Plugin Registry with Runtime Type Enforcement
README.md Title: Dynamic Plugin Registry with Runtime Type Enforcement Overview This benchmark tests a system's ability to create a robust, dynamic module loader that utilizes Python's `importlib` to discover user-defined packages within a...
|
03-09 01:06 | Success | - | |
|
exp_self.20260309010152.088_20260309_010330
|
Benchmark: Pipeline-Asynchronous State Offload (PASO)
README.md Benchmark: Pipeline-Asynchronous State Offload (PASO) Overview This benchmark tests the **PASO** innovation, designed to handle infinite-length context sequences on limited GPU VRAM (e.g., 8GB) by offloading SSM (State Space Model...
|
03-09 01:03 | Success | - | |
|
exp_pytrain.20260309005912.046_20260309_005954
|
Benchmark: Strictly-Typed Plugin Registry with Metadata Introspection
README.md Benchmark: Strictly-Typed Plugin Registry with Metadata Introspection Overview This benchmark tests the ability to implement a robust, type-safe plugin system using Python's standard library. The core hypothesis is that `typing.Pr...
|
03-09 00:59 | Success | - | |
|
exp_self.20260309005654.087_20260309_005730
|
Paged-Scan State Memory (PSSM) Benchmark
This benchmark demonstrates the **Paged-Scan State Memory (PSSM)** concept, an optimization designed to overcome GPU VRAM limitations when processing long-context sequences in State Space Models (SSMs) like Mamba. The Innovation: Paged-Scan...
|
03-09 00:57 | Success | - | |
|
exp_self.20260309005456.086_20260309_005542
|
Modular State Experts (MoE-State) Benchmark
README.md
|
03-09 00:55 | Success | - | |
|
exp_pytrain.20260309005252.045_20260309_005312
|
`mypy --strict benchmark.py`; `python benchmark.py`. *Expected:* `VERIFIED: PASSED` along with performance metrics. Acceptance Criteria - **Typing**: Implements `Plugin` Protocol and `PluginRegistry` using `typing.Protocol`, `typing...
|
03-09 00:53 | Success | - | |
|
exp_self.20260309005035.085_20260309_005135
|
Exponential Temporal Quantization (ETQ)
Paper ID: self.20260309005035.085 - Hypothesis: Recent state information requires FP16, but historical state (older than 1k tokens) can be stored in INT4 or FP8 without performance loss, exponentially decaying precision over time. - Plan: M...
|
03-09 00:51 | Success | - | |
|
exp_self.20260309004838.084_20260309_004923
|
Progressive-Precision State Quantization (PPSQ)
README.md Progressive-Precision State Quantization (PPSQ) Overview **PPSQ** is a memory optimization technique for State Space Models (SSMs) inspired by the concept of "Dynamic Precision". The core hypothesis is that the sensitivity of the...
|
03-09 00:49 | Success | - | |
|
exp_pytrain.20260309004605.044_20260309_004641
|
Python Skill Fallback
Title: Dynamic Typed CLI Dispatcher - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-09 00:46 | Success | - | |
|
exp_self.20260309004425.083_20260309_004458
|
Asynchronous CPU-Projected State
Paper ID: self.20260309004425.083 - Hypothesis: SSM states are low-bandwidth compared to weights. By maintaining a 'hot' state on GPU and a 'cold' history on CPU (pinned memory), we can process effectively infinite context lengths within 8G...
|
03-09 00:45 | Success | - | |
|
exp_self.20260309004248.082_20260309_004318
|
Variance-Gated Dynamic Quantization for SSMs
This repository contains a benchmark suite designed to validate the **Variance-Gated Dynamic Quantization** hypothesis. Hypothesis Channels within the State Space Model (SSM) state tensor exhibit varying temporal activity. By tracking the r...
|
03-09 00:43 | Success | - | |
|
exp_self.20260309004030.081_20260309_004118
|
Tiered State Offloading for Long Context
Paper ID: self.20260309004030.081 - Hypothesis: Segregating the SSM hidden state into a 'hot' GPU resident state (recent tokens) and a 'cold' CPU resident state (older tokens) will allow for longer contexts than VRAM alone permits, with acc...
|
03-09 00:41 | Success | - | |
|
exp_pytrain.20260309003850.043_20260309_003927
|
Python Skill Fallback
Title: Dynamic Type-Safe Plugin Loader - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-09 00:39 | Success | - | |
|
exp_self.20260309003642.080_20260309_003714
|
Logarithmic State Space Machine (LogSSM) Benchmark
README.md Logarithmic State Space Machine (LogSSM) Benchmark Overview This benchmark evaluates the **LogSSM** innovation, which hypothesizes that storing SSM (State Space Model) states in a Logarithmic Number System (LNS) using 8-bit intege...
|
03-09 00:37 | Success | - | |
|
exp_self.20260309003418.079_20260309_003451
|
Chronos-Decayed State Precision Benchmark
Section 1: README.md Section 2: benchmark.py
|
03-09 00:35 | Success | - | |
|
exp_pytrain.20260309003242.042_20260309_003303
|
Generic Data Packet Router with PEP 695 Syntax
This benchmark validates the implementation of a Generic Data Packet Router using Python 3.12's PEP 695 Type Parameter Syntax. Overview PEP 695 introduces a new, more concise syntax for declaring generics. This drill requires implementing a...
|
03-09 00:33 | Success | - | |
|
exp_self.20260309002920.078_20260309_003106
|
Dual-Resolution State Management (DRSM)
Paper ID: self.20260309002920.078 - Hypothesis: Splitting the SSM recurrent state into a 'hot' path (recent tokens) and 'cold' path (history) allows for aggressive compression of the history without significant performance degradation on lo...
|
03-09 00:31 | Success | - | |
|
exp_self.20260309002649.077_20260309_002735
|
Innovation Benchmark: SSM + Cache + Dynamic Precision
README.md Innovation Benchmark: SSM + Cache + Dynamic Precision Hypothesis Combining **SSM** (State Space Models), **Cache** (KV optimization), and **Dynamic Precision** (Mixed Precision/AMP) in a co-design architecture will improve through...
|
03-09 00:27 | Success | - | |
|
exp_pytrain.20260309002503.041_20260309_002528
|
Type-Safe Entry Point Resolver System
**README.md** Type-Safe Entry Point Resolver System Overview This benchmark demonstrates a robust, type-safe plugin loading mechanism using Python's standard library. It simulates a package manager's ability to discover, load, and validate...
|
03-09 00:25 | Success | - | |
|
exp_self.20260309000728.076_20260309_000814
|
Variance-Gated KV Cache Quantization (VGBKV)
README.md Variance-Gated KV Cache Quantization (VGBKV) Concept Modern LLMs are bottlenecked by the memory bandwidth required to read the growing KV Cache during inference. Standard KV caches store 16-bit (FP16/BF16) vectors for every token....
|
03-09 00:23 | Success | - | |
|
exp_pytrain.20260309000515.040_20260309_000551
|
Type-Safe 'Mini-Tensor' Library Benchmark
README.md Type-Safe 'Mini-Tensor' Library Benchmark Objective This benchmark evaluates a Python engineering system's ability to construct a modular, type-safe numerical library using **only the Python Standard Library**. The system must dem...
|
03-09 00:05 | Success | - | |
|
exp_self.20260309000216.075_20260309_000242
|
This benchmark investigates the hypothesis that combining **State Space Models (SSM)**, **Caching mechanisms**, and **Dy...
README.md This benchmark investigates the hypothesis that combining **State Space Models (SSM)**, **Caching mechanisms**, and **Dynamic Precision** can significantly improve throughput and memory efficiency compared to standard Transformer-...
|
03-09 00:02 | Success | - | |
|
exp_pytrain.20260308235900.039_20260308_235924
|
Extensible Type-Safe Plugin Registry
This benchmark demonstrates a robust, scalable architecture pattern often seen in production ML frameworks (like Hugging Face Transformers or Diffusers), implemented entirely with Python standard library features. Overview The system implem...
|
03-08 23:59 | Success | - | |
|
exp_self.20260308235554.074_20260308_235646
|
Asynchronous Delta-State Prefetching
Paper ID: self.20260308235554.074 - Hypothesis: Transferring the full state from CPU to GPU causes stalls. Transferring only the delta (updates) allows overlapping computation and data transfer (async), improving throughput for large-contex...
|
03-08 23:56 | Success | - | |
|
exp_self.20260308235409.073_20260308_235439
|
Linear-SSM Bridge Compression (LSBC)
Paper ID: self.20260308235409.073 - Hypothesis: SSMs struggle with 'recall' of very distant context. Passing the SSM state through a Linear Attention layer every N steps allows the model to 'attend' to its own history more efficiently than...
|
03-08 23:54 | Success | - | |
|
exp_pytrain.20260308235242.038_20260308_235301
|
Dynamic Type-Safe Plugin Loader
README.md Dynamic Type-Safe Plugin Loader Overview This benchmark tests the ability to dynamically construct a Python package on the file system and load it using the standard import machinery. It emphasizes strict typing using `typing.Prot...
|
03-08 23:53 | Success | - | |
|
exp_self.20260308235054.072_20260308_235126
|
Semantic LRU for SSM State Windows
Paper ID: self.20260308235054.072 - Hypothesis: In long-context conversations, recent tokens (LRU) are often filler. Replacing the state based on semantic similarity to the current query (e.g., cosine similarity of embeddings) will yield be...
|
03-08 23:51 | Success | - | |
|
exp_self.20260308234916.071_20260308_234940
|
Innovation: Entropy-Gated Host-Side State Streaming (EG-HS3)
README.md Innovation: Entropy-Gated Host-Side State Streaming (EG-HS3) Hypothesis High-entropy states in Selective State Space Models (SSMs) like Mamba carry unique information that is harder to compress but worth retaining in slower CPU me...
|
03-08 23:49 | Success | - | |
|
exp_self.20260308234724.070_20260308_234748
|
Host-Side Linear Memory Pool (HS-LMP) Benchmark
README.md Host-Side Linear Memory Pool (HS-LMP) Benchmark Overview This benchmark evaluates the **Host-Side Linear Memory Pool (HS-LMP)**, a technique designed to extend the effective context window of State Space Models (SSMs), such as Mam...
|
03-08 23:48 | Success | - | |
|
exp_pytrain.20260308234557.037_20260308_234628
|
Strictly Typed Environment Metadata Inspector
README.md Strictly Typed Environment Metadata Inspector Overview This coding drill validates the hypothesis that an autonomous coding system can bridge dynamic runtime introspection (packaging metadata) with static type safety (the `typing`...
|
03-08 23:46 | Success | - | |
|
exp_self.20260308234339.069_20260308_234411
|
Variance-Gated Bitwidth (VGB)
Paper ID: self.20260308234339.069 - Hypothesis: Not all state dimensions are equally important at all times. Dimensions with low variance (static memory) can be stored in FP8, while high-variance dimensions (active processing) require FP16....
|
03-08 23:44 | Success | - | |
|
exp_self.20260308234204.068_20260308_234237
|
Entropy-Adaptive State Tiering (EAST) Reloaded
README.md Entropy-Adaptive State Tiering (EAST) Reloaded Overview This benchmark implements **Entropy-Adaptive State Tiering (EAST)**, a memory optimization technique for State Space Models (SSMs) and Large Language Models (LLMs). The Hypot...
|
03-08 23:42 | Success | - | |
|
exp_pytrain.20260308233955.036_20260308_234021
|
Python Skill Fallback
Title: Dynamic Module Loader with Strict Protocol Enforcement - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-08 23:40 | Success | - | |
|
exp_self.20260308233749.067_20260308_233831
|
Semantic Bitwidth Allocation (SBA) Benchmark
README.md This repository contains a runnable benchmark for **Semantic Bitwidth Allocation (SBA)**, a novel technique designed to optimize memory bandwidth in State Space Models (SSMs) like Mamba. The Innovation Standard SSMs maintain a sta...
|
03-08 23:38 | Success | - | |
|
exp_self.20260308233431.066_20260308_233504
|
Token-Triggered Precision Decay (TTPD) Benchmark
README.md Token-Triggered Precision Decay (TTPD) Benchmark This repository contains a micro-benchmark designed to validate the **Token-Triggered Precision Decay (TTPD)** hypothesis. Hypothesis Recent tokens in a State Space Model (SSM) stat...
|
03-08 23:35 | Success | - | |
|
exp_pytrain.20260308233233.035_20260308_233258
|
Strictly Typed Generic Dispatcher with API Isolation
README.md Strictly Typed Generic Dispatcher with API Isolation Overview This coding drill verifies the implementation of a library-grade `EventBus[T]` using Python 3.12's Type Parameter Syntax (PEP 695). The goal is to demonstrate how moder...
|
03-08 23:33 | Success | - | |
|
exp_self.20260308233023.065_20260308_233055
|
TASP Benchmark: Token-Adaptive State Precision
README.md TASP Benchmark: Token-Adaptive State Precision This benchmark evaluates the **Token-Adaptive State Precision (TASP)** innovation for Mamba-style State Space Models (SSMs). Hypothesis Tokens with low entropy (e.g., punctuation,...
|
03-08 23:30 | Success | - | |
|
exp_self.20260308232737.064_20260308_232825
|
Bi-Precision State Streaming (BPSS) Benchmark Design
No summary available yet.
|
03-08 23:28 | Success | - | |
|
exp_pytrain.20260308232548.034_20260308_232614
|
Type-Safe Plugin Registry Benchmark
README.md Type-Safe Plugin Registry Benchmark This benchmark evaluates the implementation of a modular, type-safe command registry using Python's standard library type hinting features. Overview The design leverages `typing.Protocol` and `t...
|
03-08 23:26 | Success | - | |
|
exp_self.20260308232252.063_20260308_232358
|
Hybrid CPU-GPU State Streaming (H-CGS)
Paper ID: self.20260308232252.063 - Hypothesis: Decoupling the state update (fast, GPU) from the state storage (large, CPU) allows processing sequences 4x longer than GPU VRAM would normally allow with negligible latency penalty. - Plan: 1....
|
03-08 23:24 | Success | - | |
|
exp_2603.06577v1_20260308_232106
|
Section 1: README.md
`python benchmark.py`
|
03-08 23:21 | Success | - | |
|
exp_pytrain.20260308231824.033_20260308_231900
|
Type-Driven Plugin System Drill
README.md Type-Driven Plugin System Drill Overview This benchmark tests your ability to design a robust, type-safe Python library architecture using `typing.Protocol` and `typing.Generic`. The goal is to create a "Task Executor" system wher...
|
03-08 23:19 | Success | - | |
|
exp_self.20260308231603.062_20260308_231640
|
Pinned-State Swap Scheduler (PSSS) Benchmark
Benchmark Design Overview This benchmark tests the **Pinned-State Swap Scheduler (PSSS)** hypothesis. It simulates a workload consisting of alternating **SSM layers** (which rely on a large hidden state) and **MLP layers** (which are comput...
|
03-08 23:16 | Success | - | |
|
exp_self.20260308231322.061_20260308_231405
|
Delta-Indexed Semantic Cache (DISC)
Paper ID: self.20260308231322.061 - Hypothesis: Using the derivative of the SSM state as a query key into a compressed KV-cache will allow retrieval of relevant distant context with O(1) complexity, improving perplexity on long-context task...
|
03-08 23:14 | Success | - | |
|
exp_pytrain.20260308231143.032_20260308_231213
|
Python Skill Fallback
Title: Strict Type-Safe Plugin Registry - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-08 23:12 | Success | - | |
|
exp_self.20260308230918.060_20260308_231004
|
Associative State Patching (ASP) Benchmark
README.md Associative State Patching (ASP) Benchmark This benchmark evaluates the **Associative State Patching (ASP)** technique applied to State Space Models (SSMs). Hypothesis SSMs are prone to 'state drift' over long sequences. ASP maint...
|
03-08 23:10 | Success | - | |
|
exp_self.20260308230715.059_20260308_230748
|
Gated Linear-Attention State Bridge (GLA-Bridge)
Paper ID: self.20260308230715.059 - Hypothesis: SSMs struggle with exact recall. Gating the SSM state with a Linear Attention summary of the input history will allow the model to 'lookup' past tokens explicitly without an $O(N^2)$ cost. - P...
|
03-08 23:07 | Success | - | |
|
exp_pytrain.20260308230520.031_20260308_230555
|
Robust Generic Command Bus Implementation
README.md Robust Generic Command Bus Implementation This benchmark implements a production-ready **Command Bus** pattern using only the Python Standard Library. Architecture The design enforces strict decoupling between the **Request** (Com...
|
03-08 23:06 | Success | - | |
|
exp_self.20260308230315.058_20260308_230339
|
README.md `python benchmark.py`
|
03-08 23:03 | Success | - | |
|
exp_self.20260308230054.057_20260308_230129
|
CPU-Pinned Sparse Associative Memory (CPSAM)
Paper ID: self.20260308230054.057 - Hypothesis: The hidden state $H_t$ can be sparsified and stored in pinned CPU memory. A lightweight 'gate' on the GPU determines if the CPU state is needed, preventing full-GPU history storage. - Plan: Im...
|
03-08 23:01 | Success | - | |
|
exp_pytrain.20260308225819.030_20260308_225849
|
Generic Asynchronous Event Dispatcher Benchmark
README.md Generic Asynchronous Event Dispatcher Benchmark Overview This benchmark validates the design of a strictly typed, generic asynchronous event dispatcher using Python's standard library. It demonstrates the creation of a robust, tes...
|
03-08 22:59 | Success | - | |
|
exp_self.20260308225613.056_20260308_225643
|
Sparse Associative State Injection
Paper ID: self.20260308225613.056 - Hypothesis: Instead of a monolithic state vector, we maintain a sparse set of 'memory slots' updated by the SSM. During generation, we perform a sparse lookup (KNN) on these slots to inject relevant histo...
|
03-08 22:56 | Success | - | |
|
exp_self.20260308225403.055_20260308_225449
|
Semantic State Delta Caching (SSDC)
README.md Semantic State Delta Caching (SSDC) Innovation Semantic State Delta Caching (SSDC) improves the inference speed of State Space Models (SSMs) by caching internal state vectors based on input token hashes. Concept Traditional KV cac...
|
03-08 22:54 | Success | - | |
|
exp_pytrain.20260308225136.029_20260308_225212
|
Generic Type-Safe Event Dispatcher Benchmark
README.md Generic Type-Safe Event Dispatcher Benchmark Design Brief This benchmark demonstrates a modular, single-file Python package implementation that leverages advanced static typing features. It simulates a package structure using clas...
|
03-08 22:52 | Success | - | |
|
exp_2603.06576v1_20260308_225001
|
Section 1: README.md
`pip install torch`, then `python benchmark.py`
|
03-08 22:50 | Success | - | |
|
exp_self.20260308224720.054_20260308_224804
|
Hybrid KV-SSM Cache Injection
Overview This benchmark evaluates the **Hybrid KV-SSM Cache Injection** architecture. This innovation combines the long-range comprehension of State Space Models (SSMs) with the precise, factual recall of a sliding-window KV cache. The Inno...
|
03-08 22:48 | Success | - | |
|
exp_pytrain.20260308224510.028_20260308_224538
|
Robust Type-Safe Plugin Loader
README.md Robust Type-Safe Plugin Loader Overview This benchmark evaluates a developer's ability to construct a secure, extensible plugin architecture in Python using only the standard library. The task involves creating a `PluginManager` c...
|
03-08 22:45 | Success | - | |
|
exp_self.20260308224304.053_20260308_224342
|
Delta-State Accumulator with CPU Offload
README.md Delta-State Accumulator with CPU Offload Innovation Overview This benchmark evaluates a "Delta-State Accumulator" technique for Selective State Space Models (SSMs), specifically optimizing for GPU memory constraints. **Hypothesis:...
|
03-08 22:43 | Success | - | |
|
exp_self.20260308224109.052_20260308_224138
|
Heterogeneous State Tiering (HST) Benchmark
README.md Heterogeneous State Tiering (HST) Benchmark This repository contains a runnable benchmark for the **Heterogeneous State Tiering (HST)** proposal. Concept HST proposes an OS paging-inspired approach to State Space Model (SSM) memory m...
|
03-08 22:41 | Success | - | |
|
exp_pytrain.20260308223912.027_20260308_223936
|
Benchmark: Strictly Typed Dynamic Plugin Registry
README.md Benchmark: Strictly Typed Dynamic Plugin Registry This benchmark tests the ability to construct a robust, zero-dependency extension framework using Python's standard library. It simulates a "model packaging system" often found in...
|
03-08 22:39 | Success | - | |
|
exp_hf_2603.05888_20260308_223738
|
PixARMesh Benchmark
README.md PixARMesh Benchmark This benchmark evaluates the `PixARMesh` architecture for autoregressive 3D scene reconstruction. It specifically highlights the efficiency of using **State Space Models (SSM/Mamba)** for processing long sequen...
|
03-08 22:37 | Success | - | |
|
exp_self.20260308223444.051_20260308_223517
|
Entropy-Adaptive Precision State Machine Benchmark
This repository contains the implementation and benchmarking code for the **Entropy-Adaptive Precision State Machine**. Overview Traditional State Space Models (SSMs) and sequence models maintain state in full precision (FP32) regardless of...
|
03-08 22:35 | Success | - | |
|
exp_pytrain.20260308223301.026_20260308_223319
|
Typed Component Registry and Dynamic Loader
README.md Typed Component Registry and Dynamic Loader This benchmark demonstrates the implementation of a robust, type-safe plugin registry system using Python's standard library `typing` module. It mimics the extensibility patterns found i...
|
03-08 22:33 | Success | - | |
|
exp_self.20260308223042.050_20260308_223133
|
Delta State Quantization (DSQ) for Streaming
Paper ID: self.20260308223042.050 - Hypothesis: State changes ($h_t - h_{t-1}$) are sparser and lower magnitude than the state $h_t$. Storing the delta in 4-bit INT and the base state in 16-bit FP reduces memory bandwidth for state updates....
|
03-08 22:31 | Success | - | |
|
exp_self.20260308222831.049_20260308_222913
|
CPU-Pinned Historical State Buffer (CHSB)
README.md CPU-Pinned Historical State Buffer (CHSB) Innovation Summary Standard State Space Models (SSMs) like Mamba require maintaining a hidden state tensor that grows with sequence length. On GPU-constrained hardware (e.g., 8GB VRAM), th...
|
03-08 22:29 | Success | - | |
|
exp_pytrain.20260308222619.025_20260308_222652
|
README.md
|
03-08 22:26 | Success | - | |
|
exp_self.20260308222344.048_20260308_222413
|
Linear Attention Hybrid IO-Layer Benchmark
README.md Linear Attention Hybrid IO-Layer Benchmark This benchmark evaluates the **Hybrid IO-Layer**, a novel architecture combining the efficiency of State Space Models (SSMs) for long-term history with the precision of Linear Attention f...
|
03-08 22:24 | Success | - | |
|
exp_self.20260308222134.047_20260308_222201
|
SSM + Cache + Dynamic Precision Benchmark
README.md SSM + Cache + Dynamic Precision Benchmark This benchmark investigates the hypothesis that combining State Space Models (SSM), efficient caching mechanisms, and dynamic precision (Automatic Mixed Precision) can yield better memory...
|
03-08 22:22 | Success | - | |
|
exp_pytrain.20260308221857.024_20260308_221942
|
Strictly-Typed Plugin Pipeline Benchmark
README.md Strictly-Typed Plugin Pipeline Benchmark Overview This coding drill validates the implementation of a robust, strictly-typed data processing pipeline using Python's standard `typing` module. It demonstrates the use of `Protocol` f...
|
03-08 22:19 | Success | - | |
|
exp_self.20260308221650.046_20260308_221719
|
Sparse State History Retrieval (SSHR) Benchmark
This benchmark tests the hypothesis that offloading state history to a CPU-side KNN index (FAISS) and injecting the nearest neighbor into the current SSM step improves long-term retention without increasing the recurrent state size. Hypothe...
|
03-08 22:17 | Success | - | |
|
exp_self.20260308221355.045_20260308_221441
|
README.md `python benchmark.py`
|
03-08 22:14 | Success | - | |
|
exp_pytrain.20260308221201.023_20260308_221223
|
Python Skill Fallback
Title: Type-Safe Asynchronous Entry Point Dispatcher - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-08 22:12 | Success | - | |
|
exp_gh_obss_sahi_20260308_221028
|
obss/sahi
Paper ID: gh_obss_sahi - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark s...
|
03-08 22:10 | Success | - | |
|
exp_self.20260308220740.044_20260308_220836
|
Entropy-Adaptive State Tiering (EAST) Benchmark
README.md Entropy-Adaptive State Tiering (EAST) Benchmark This benchmark validates the **EAST** hypothesis: Low-entropy (stable/boring) states in State Space Models (SSMs) can be offloaded to CPU pinned memory without significantly degradin...
|
03-08 22:08 | Success | - | |
|
exp_pytrain.20260308220502.022_20260308_220549
|
PEP 695 Type Parameter Syntax & Module Hygiene
Overview This benchmark evaluates a developer's implementation of Python 3.12's PEP 695 Type Parameter Syntax and module hygiene standards. It verifies that the provided module uses the new generic syntax (e.g., `class MyClass[T]:`, `def fu...
|
03-08 22:05 | Success | - | |
|
exp_self.20260308220301.043_20260308_220336
|
Benchmark: SSM + Cache Co-Design with Dynamic Precision
README.md Benchmark: SSM + Cache Co-Design with Dynamic Precision Hypothesis Combining State Space Models (SSM), explicit State Caching, and Dynamic Precision (AMP) will yield higher throughput and lower VRAM usage compared to a standard Tr...
|
03-08 22:03 | Success | - | |
|
exp_hf_2603.06569_20260308_220047
|
`python benchmark.py`
|
03-08 22:00 | Success | - | |
|
exp_pytrain.20260308215722.021_20260308_215759
|
Dynamic Plugin Architecture with Strict Typing
README.md Dynamic Plugin Architecture with Strict Typing This benchmark tests the ability to implement a robust, dynamic plugin loading system using Python's standard library. It focuses on simulating a packaging workflow where package stru...
|
03-08 21:58 | Success | - | |
|
exp_self.20260308215449.042_20260308_215605
|
CPU-Pinned Sparse State Recycling
README.md CPU-Pinned Sparse State Recycling This benchmark implements and evaluates a memory-efficient State Space Model (SSM) inference technique designed to extend context windows beyond standard GPU VRAM limitations. Concept Standard SSM...
|
03-08 21:56 | Success | - | |
|
exp_self.20260308215147.041_20260308_215234
|
Entropy-Adaptive KV Cache Quantization
Paper ID: self.20260308215147.041 - Hypothesis: Tokens with low entropy (predictable) can be stored in 4-bit without loss, while high-entropy tokens require 8-bit. This adaptive method preserves coherence where it matters most. - Plan: Hook...
|
03-08 21:52 | Success | - | |
|
exp_pytrain.20260308214956.020_20260308_215027
|
Strictly-Typed Dynamic Package Loader
README.md Strictly-Typed Dynamic Package Loader Overview This coding drill tests your ability to dynamically generate Python packages, enforce strict static typing using Generics (`typing.Generic`), and validate package structure programmat...
|
03-08 21:50 | Success | - | |
|
exp_self.20260308214806.040_20260308_214831
|
`python benchmark.py`
|
03-08 21:48 | Success | - | |
|
exp_self.20260308214513.039_20260308_214539
|
Asynchronous CPU-Pinned State Ringbuffer for SSMs
README.md Asynchronous CPU-Pinned State Ringbuffer for SSMs This benchmark demonstrates a novel memory management technique for State-Space Models (SSMs), specifically targeting Mamba-style architectures. By exploiting the natural decay of...
|
03-08 21:46 | Success | - | |
|
exp_pytrain.20260308214254.019_20260308_214321
|
Design one runnable Python coding drill benchmark.
STRICT REQUIREMENT: Output two sections separated by '
|
03-08 21:43 | Success | - | |
|
exp_self.20260308213951.038_20260308_214037
|
Section 1: README.md
Section 2: benchmark.py README.md content: - Title, Hypothesis, Setup, Usage. benchmark.py content: - Import torch, time, gc. - Define constants. - Class `DRSPCache` implementing the tiered logic. - Class `StandardCache` for baseline. - `ru...
|
03-08 21:41 | Success | - | |
|
exp_self.20260308213821.037_20260308_213847
|
Hybrid Attention-SSM Corrector (HASC) Benchmark
Hybrid Attention-SSM Corrector (HASC) Benchmark Innovation The **Hybrid Attention-SSM Corrector (HASC)** enhances standard Selective State Space Models (SSMs) like Mamba by injecting a local attention vector into the state update mechanism....
|
03-08 21:38 | Success | - | |
|
exp_pytrain.20260308213637.018_20260308_213701
|
Strict Data Processor Module Design
README.md **Title:** Strict Data Processor Module Design **Description:** This benchmark evaluates the creation of a robust, reusable generic pipeline system using Python's standard typing utilities. The candidate must implement a `Pipeline...
|
03-08 21:37 | Success | - | |
|
exp_self.20260308213236.036_20260308_213311
|
Gradient-Modulated State Quantization (GMSQ)
README.md Gradient-Modulated State Quantization (GMSQ) **Innovation:** Dynamic Precision + SSM **Hypothesis:** Timesteps with high gradient magnitude require higher precision state retention, while 'flat' regions can survive 4-bit or 2-bit...
|
03-08 21:35 | Success | - | |
|
exp_self.20260308213037.035_20260308_213115
|
Semantic Partitioned State Space (SPSS) Benchmark
Section 1 contains the documentation. Section 2 contains the runnable Python benchmark. `python benchmark.py`
|
03-08 21:31 | Success | - | |
|
exp_pytrain.20260308212853.017_20260308_212910
|
README.md `python benchmark.py` Expected output: `Generating temporary package structure...`, `Loading module from tmp_pkg/processor.py...`, `Validating against StrictValidator protocol...`, `VRAM_USAGE: 0.00MB`, `TOKENS_PER_SEC: <calculated_value>`, `VERIFIED: PASSED`
|
03-08 21:29 | Success | - | |
|
exp_self.20260308212625.034_20260308_212702
|
Spectral State Compression (SSC) Benchmark
This benchmark evaluates the hypothesis that SSM hidden states can be compressed in the frequency domain (using FFT) to save memory with minimal degradation in model performance (perplexity). README.md `python benchmark.py`
|
03-08 21:27 | Success | - | |
|
exp_self.20260308212442.033_20260308_212510
|
Speculative State Offloading (SSO) Benchmark
README.md Speculative State Offloading (SSO) Benchmark This benchmark validates the **Speculative State Offloading (SSO)** hypothesis, which posits that state evolution in State Space Models (SSMs) is sufficiently smooth to be approximated...
|
03-08 21:25 | Success | - | |
|
exp_pytrain.20260308212245.016_20260308_212309
|
Typed Dependency Graph Resolver
README.md Typed Dependency Graph Resolver This benchmark evaluates the implementation of a robust `DependencyResolver` using Python's modern typing features. Objective Implement a dependency resolution algorithm that calculates a valid inst...
|
03-08 21:23 | Success | - | |
|
exp_self.20260308211846.032_20260308_211946
|
Entropy-Gated Token-Wise State Precision
README.md Entropy-Gated Token-Wise State Precision Overview This benchmark evaluates an optimization technique for State Space Models (SSMs) and Recurrent Architectures. It tests the hypothesis that not all tokens require full-precision (FP...
|
03-08 21:19 | Success | - | |
|
exp_pytrain.20260308211623.015_20260308_211656
|
Dynamic Package Loading with Structural Typing Validation
Overview This benchmark tests the ability to construct a robust Python plugin system. It demonstrates dynamic module discovery, loading from an arbitrary file system location, and structural interface validation using Python's `typing.Proto...
|
03-08 21:17 | Success | - | |
|
exp_hf_2603.06351_20260308_211436
|
Dynamic Chunking Diffusion Transformer
Paper ID: hf_2603.06351 - Hypothesis: Benchmark a simplified recovered baseline against an ablated variant. - Plan: Run the deterministic recovery benchmark and capture VRAM plus throughput telemetry. - Expected Signal: Recovered benchmark...
|
03-08 21:14 | Success | - | |
|
exp_self.20260308211115.031_20260308_211210
|
Innovation: Log-State Numerical Stability (LSNS)
README.md Innovation: Log-State Numerical Stability (LSNS) Overview This benchmark investigates the hypothesis that performing State Space Model (SSM) state updates in the logarithmic domain improves numerical fidelity on long sequences com...
|
03-08 21:12 | Success | - | |
|
exp_pytrain.20260308210854.014_20260308_210939
|
Drill: Strictly Typed Configuration Module with CLI Interface
Adhering to strict `typing` protocols (TypedDict, Protocol) and packaging standards (versioning, `__all__`, entry-point simulation) within a single script significantly reduces runtime errors and improves the maintainability of configuratio...
|
03-08 21:09 | Success | - | |
|
exp_self.20260308210647.030_20260308_210731
|
VRAM-Responsive State Eviction (VRSE) Benchmark
README.md VRAM-Responsive State Eviction (VRSE) Benchmark This repository contains a benchmark designed to test the **VRSE** innovation. Hypothesis Applying a cache policy (e.g., LRU) to the *batch* state dimension of State Space Models (SS...
|
03-08 21:07 | Success | - | |
|
exp_self.20260308210434.029_20260308_210514
|
README.md
|
03-08 21:05 | Success | - | |
|
exp_pytrain.20260308210227.013_20260308_210302
|
Python Skill Fallback
Title: Strict Entry Point Dispatcher - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-08 21:03 | Success | - | |
|
exp_self.20260308205807.028_20260308_205932
|
Temporal-Decay State Precision (TDSP) Benchmark
README.md Temporal-Decay State Precision (TDSP) Benchmark This benchmark evaluates the **TDSP** innovation, which hypothesizes that recent token history requires BF16 precision for gradient stability, while older history (state) can be main...
|
03-08 20:59 | Success | - | |
|
exp_pytrain.20260308205537.012_20260308_205616
|
Dynamic Module Packaging and Runtime Type Verification
Overview This benchmark tests the ability to construct Python packaging tooling from scratch using only the standard library. It validates a system's capability to perform file system operations, dynamic code generation, runtime module impo...
|
03-08 20:56 | Success | - | |
|
exp_self.20260308205343.027_20260308_205410
|
Contiguous-Buffer State Offload (CBSO) Benchmark
README.md Contiguous-Buffer State Offload (CBSO) Benchmark This benchmark evaluates the **CBSO** innovation, designed to mitigate device synchronization crashes and optimize VRAM usage in State Space Models (SSMs) like Mamba. The Innovation...
|
03-08 20:54 | Success | - | |
|
exp_hf_2603.06199_20260308_205152
|
FlashPrefill Benchmark
Overview This benchmark evaluates the performance characteristics of **FlashPrefill**, a framework designed for ultra-fast long-context prefilling. It compares the proposed method against a standard Dense Attention baseline. **Key Innovatio...
|
03-08 20:52 | Success | - | |
|
exp_pytrain.20260308204912.011_20260308_204941
|
Python Skill Fallback
Title: Structural Subtyping Plugin System - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-08 20:49 | Success | - | |
|
exp_self.20260308204701.026_20260308_204739
|
ARES Benchmark Prototype (SSM + Cache + Dynamic Precision)
README.md **Project:** ARES Benchmark Prototype (SSM + Cache + Dynamic Precision) **Description:** This benchmark investigates the hypothesis that a co-design of State Space Models (SSM), State Caching, and Dynamic Precision can yield super...
|
03-08 20:47 | Success | - | |
|
exp_self.20260308204439.025_20260308_204520
|
Variance-Based Dynamic State Precision Benchmark
This benchmark evaluates a novel optimization for State Space Models (SSMs), specifically targeting the Mamba architecture. The core hypothesis is that the hidden state within the SSM recurrence does not require uniform FP16 precision. By c...
|
03-08 20:45 | Success | - | |
|
exp_pytrain.20260308204222.010_20260308_204311
|
Type-Safe Async Worker Simulation Benchmark
README.md Type-Safe Async Worker Simulation Benchmark Objective This benchmark evaluates the ability to construct a production-ready Python module that adheres to strict software engineering standards. The goal is to create `async_worker.py...
|
03-08 20:43 | Success | - | |
|
exp_self.20260308202337.024_20260308_202444
|
Benchmark: Entropy-Thresholded Dynamic State Quantization (Mamba)
README.md Benchmark: Entropy-Thresholded Dynamic State Quantization (Mamba) This benchmark implements and tests an innovation applied to State Space Models (SSMs), specifically targeting the **Mamba** architecture. Hypothesis The hidden sta...
|
03-08 20:39 | Success | - | |
|
exp_self.20260308202042.023_20260308_202133
|
SSM-Guided KV Cache Eviction Benchmark
No summary available yet.
|
03-08 20:21 | Success | - | |
|
exp_pytrain.20260308201908.009_20260308_201925
|
Benchmark: Runtime-Checked Plugin Discovery System
README.md Benchmark: Runtime-Checked Plugin Discovery System Hypothesis An autonomous system can robustly implement a modular architecture by leveraging Python's `importlib` for dynamic code loading and `typing.Protocol` for structural subt...
|
03-08 20:19 | Success | - | |
|
exp_self.20260308201708.022_20260308_201749
|
Benchmark: Linear-Attention State Priming (LASP)
README.md Benchmark: Linear-Attention State Priming (LASP) Hypothesis Standard State Space Models (SSMs) like Mamba theoretically handle infinite context, but in practice, the recurrent hidden state $h_t$ acts as a lossy bottleneck. Informa...
|
03-08 20:18 | Success | - | |
|
exp_self.20260308201444.021_20260308_201539
|
Mixed-Precision State Segments Benchmark
This benchmark evaluates the "Mixed-Precision State Segments" hypothesis, specifically applied to Mamba-style State Space Models (SSMs). It aims to demonstrate that by profiling state gradients to identify sensitive dimensions, we can store...
|
03-08 20:15 | Success | - | |
|
exp_pytrain.20260308201205.008_20260308_201326
|
Python Skill Fallback
Title: PEP 440 Semantic Version Resolver & Validator - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-08 20:13 | Success | - | |
|
exp_self.20260308200958.020_20260308_201025
|
Hybrid Mamba-Linear Router (HMLR)
Paper ID: self.20260308200958.020 - Hypothesis: High-entropy tokens require the recall of Linear Attention, while low-entropy tokens are efficiently handled by SSM recurrence. A per-token router will lower VRAM usage (via SSM) while maintai...
|
03-08 20:10 | Success | - | |
|
exp_self.20260308200824.019_20260308_200855
|
LoRA-Dynamic State Expansion Benchmark
README.md This benchmark evaluates the **LoRA-Dynamic State Expansion** technique for efficient sequence modeling. Concept Standard State Space Models (SSMs) like Mamba maintain a large hidden state to handle long-range dependencies, leadin...
|
03-08 20:09 | Success | - | |
|
exp_self.20260308200635.018_20260308_200700
|
Entropy-Gated Sparse State
Paper ID: self.20260308200635.018 - Hypothesis: Not every token requires a full state update. For low-entropy tokens (stopwords, punctuation), we can skip updating 50% of the state dimensions (Top-K update) without degrading coherence. - Pl...
|
03-08 20:07 | Success | - | |
|
exp_pytrain.20260308200505.007_20260308_200531
|
Typed Modular Plugin Registry
README.md Typed Modular Plugin Registry This benchmark evaluates the design and performance of a robust, type-safe component registry using Python's `typing` module. It simulates a micro-kernel architecture where a central registry manages...
|
03-08 20:05 | Success | - | |
|
exp_self.20260308200236.017_20260308_200309
|
Entropy-Adaptive State Quantization (EASQ) Benchmark
README.md Entropy-Adaptive State Quantization (EASQ) Benchmark This repository contains a minimal, runnable benchmark for the **Entropy-Adaptive State Quantization (EASQ)** innovation. Hypothesis Tokens with low information entropy (e.g., p...
|
03-08 20:03 | Success | - | |
|
exp_self.20260308200055.016_20260308_200128
|
Student hypothesis: ssm + cache co-design
Paper ID: self.20260308200055.016 - Hypothesis: Combining ssm + cache + dynamic_precision will improve throughput or memory efficiency without breaking 8GB execution. - Plan: Create a compact comparative benchmark against a simple baseline,...
|
03-08 20:01 | Success | - | |
|
exp_pytrain.20260308195832.006_20260308_195854
|
Generic Plugin Registry with Protocol Constraints
Drill Overview This benchmark evaluates your ability to design robust, type-safe polymorphic architectures using Python's advanced type system (`typing.Protocol`, `typing.TypeVar`, `typing.Generic`), mirroring patterns found in high-perform...
|
03-08 19:59 | Success | - | |
|
exp_self.20260308195640.015_20260308_195718
|
Local Attention-SSM Error Correction Loop
README.md Local Attention-SSM Error Correction Loop Innovation This benchmark implements a **Local Attention-SSM Error Correction Loop**, a hybrid architecture combining State Space Models (SSMs) with local sliding-window attention. Hypothe...
|
03-08 19:57 | Success | - | |
|
exp_self.20260308195529.014_20260308_195547
|
Sink-Token State Initialization Benchmark
README.md Sink-Token State Initialization Benchmark This benchmark evaluates the **Sink-Token State Initialization** technique for State Space Models (SSMs). The Innovation Standard SSMs (like Mamba) initialize their recurrent state $h_0$ t...
|
03-08 19:55 | Success | - | |
|
exp_self.20260308195353.013_20260308_195421
|
Student hypothesis: ssm + cache co-design
Paper ID: self.20260308195353.013 - Hypothesis: Combining ssm + cache + dynamic_precision will improve throughput or memory efficiency without breaking 8GB execution. - Plan: Create a compact comparative benchmark against a simple baseline,...
|
03-08 19:54 | Success | - | |
|
exp_pytrain.20260308195213.005_20260308_195234
|
Strictly Typed Dynamic Plugin Registry
Overview This benchmark is a self-contained Python script designed to test a developer's ability to implement a robust, strictly-typed plugin architecture using Python's standard library `typing` module. It simulates a micro-packaging envir...
|
03-08 19:52 | Success | - | |
|
exp_self.20260308195013.012_20260308_195102
|
Gradient-Checkpointing State Streaming Benchmark
README.md Gradient-Checkpointing State Streaming Benchmark This benchmark validates the **Keyframe Caching** innovation, which applies gradient-checkpointing principles to State Space Model (SSM) inference. The Problem Standard SSM inferenc...
|
03-08 19:51 | Success | - | |
|
exp_self.20260308194845.011_20260308_194916
|
Recency-Stratified State Precision (RSSP)
Paper ID: self.20260308194845.011 - Hypothesis: SSM state vectors suffer primarily from quantization error in the immediate recurrence window; older history can be aggressively quantized to 4-bit or binary with minimal performance loss. - P...
|
03-08 19:49 | Success | - | |
|
exp_self.20260308194641.010_20260308_194718
|
SSM + Cache + Dynamic Precision Co-design Benchmark
README.md SSM + Cache + Dynamic Precision Co-design Benchmark Hypothesis Combining State Space Models (SSMs), State Caching, and Dynamic Precision (Mixed Precision) will significantly improve inference throughput and memory efficiency compa...
|
03-08 19:47 | Success | - | |
|
exp_pytrain.20260308194525.004_20260308_194547
|
Python Skill Fallback
Title: Dynamic Type-Safe Plugin Loader - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-08 19:45 | Success | - | |
|
exp_self.20260308194354.009_20260308_194421
|
**README.md**
No summary available yet.
|
03-08 19:44 | Success | - | |
|
exp_self.20260308194207.008_20260308_194230
|
Dynamic LoRA Injection for State Decay
README.md Dynamic LoRA Injection for State Decay Hypothesis In State Space Models (SSMs) like Mamba, the `dt` (delta time-step) parameter acts as a gate, controlling the balance between long-term history (global context) and immediate input...
|
03-08 19:42 | Success | - | |
|
exp_self.20260308194041.007_20260308_194103
|
Benchmark: Time-Decay Weighted State Cache for SSMs
README.md Benchmark: Time-Decay Weighted State Cache for SSMs Overview This benchmark evaluates the **Time-Decay Weighted State Cache** innovation. The hypothesis is that standard State Space Models (SSMs) suffer from unbounded state growth...
|
03-08 19:41 | Success | - | |
|
exp_pytrain.20260308193907.003_20260308_193926
|
Python Skill Fallback
Title: Asyncio-Driven Service Registry with Protocol Enforcement - Focus: typing, packaging - Note: Generated fallback due to unavailable model output.
|
03-08 19:39 | Success | - | |
|
exp_self.20260308193714.006_20260308_193749
|
Hybrid Linear-SSM State Fusion Benchmark
README.md Hybrid Linear-SSM State Fusion Benchmark This repository contains the implementation and benchmarking code for the **Hybrid Linear-SSM State Fusion** architecture. Concept The standard implementation of Linear Attention layers req...
|
03-08 19:37 | Success | - | |
|
exp_self.20260308193524.005_20260308_193555
|
Section 1: README.md
`pip install torch numpy scipy`; `python benchmark.py`; `[Baseline] VRAM_USAGE: 1200MB TOKENS_PER_SEC: 85.5`; `[DSRA] VRAM_USAGE: 950MB TOKENS_PER_SEC: 90.2`; `RESULT: DSRA reduces VRAM by X% and improves TPS by Y%.`
|
03-08 19:36 | Success | - | |
|
exp_self.20260308193327.004_20260308_193356
|
Benchmark: SSM + Cache + Dynamic Precision Co-design
README.md Benchmark: SSM + Cache + Dynamic Precision Co-design Hypothesis Combining **SSM** (State Space Models), **Cache** (state persistence), and **Dynamic Precision** (BF16/AMP) will significantly improve memory efficiency (VRAM) and th...
|
03-08 19:34 | Success | - | |
|
exp_pytrain.20260308193143.002_20260308_193207
|
Dynamic Package Construction with PEP 695 Generics
This benchmark evaluates a system's ability to programmatically generate a valid Python package structure and utilize modern typing features introduced in Python 3.12 (PEP 695). Overview The script attempts to: 1. Create a temporary file sy...
|
03-08 19:32 | Success | - | |
|
exp_self.20260308192941.003_20260308_193013
|
Channel-Wise Adaptive State Quantization (WASQ)
README.md Channel-Wise Adaptive State Quantization (WASQ) Overview This benchmark implements the **Channel-Wise Adaptive State Quantization (WASQ)** innovation for State Space Models (SSMs). It tests the hypothesis that allocating heterogen...
|
03-08 19:30 | Success | - | |
|
exp_self.20260308192753.002_20260308_192831
|
Low-Rank State Projection (LoRSP) Benchmark
README.md Low-Rank State Projection (LoRSP) Benchmark Innovation Description **Low-Rank State Projection (LoRSP)** is a technique designed to optimize the CPU offloading of State Space Model (SSM) hidden states. **The Problem:** In SSMs (li...
|
03-08 19:28 | Success | - | |
|
exp_self.20260308192546.001_20260308_192629
|
Adaptive-Resolution State Cache (ARSC) Benchmark
README.md Adaptive-Resolution State Cache (ARSC) Benchmark This repository contains a minimal, runnable benchmark for the **Adaptive-Resolution State Cache (ARSC)** innovation. Concept Standard State Space Models (SSMs) like Mamba maintain...
|
03-08 19:26 | Success | - | |
|
exp_pytrain.20260308192403.001_20260308_192439
|
Runtime-Validated Plugin Registry Benchmark
README.md Runtime-Validated Plugin Registry Benchmark This benchmark demonstrates a robust, loosely coupled plugin architecture using Python's `typing.Protocol` for structural subtyping and `importlib` for dynamic runtime loading. Objective...
|
03-08 19:24 | Success | - | |
|
exp_self.20260308190055.006_20260308_190124
|
Benchmark: CPU-Pinned State Swapping for Long Context
README.md Benchmark: CPU-Pinned State Swapping for Long Context Overview This benchmark tests the hypothesis that an SSM (State Space Model) can handle arbitrarily long sequences (100k+ tokens) on limited VRAM (8GB) by offloading the "cold"...
|
03-08 19:01 | Pending | - | |
|
exp_pytrain.20260308185926.003_20260308_185944
|
This benchmark verifies the implementation of a robust dynamic plugin loader using Python's standard library. It demonst...
README.md This benchmark verifies the implementation of a robust dynamic plugin loader using Python's standard library. It demonstrates structural sub-typing using `typing.Protocol` and runtime module discovery via `importlib`. Features 1....
|
03-08 18:59 | Success | - | |
|
exp_self.20260308184721.005_20260308_184759
|
Benchmark: Asynchronous State Prefetch Pipeline
README.md Benchmark: Asynchronous State Prefetch Pipeline **Innovation:** Asynchronous State Prefetch Pipeline **Concept:** Latency Hiding, Double Buffering, Pinned Memory **Target:** SSM / Mamba-like architectures with large context window...
|
03-08 18:58 | Success | - | |
|
exp_self.20260308184533.004_20260308_184604
|
This repository contains a synthetic benchmark designed to validate the hypothesis that **combining State Space Models (...
README.md This repository contains a synthetic benchmark designed to validate the hypothesis that **combining State Space Models (SSM), architectural caching optimizations, and dynamic precision techniques** yields superior memory efficienc...
|
03-08 18:46 | Success | - | |
|
exp_pytrain.20260308184348.002_20260308_184418
|
PEP 695 Generic Plugin Loader Benchmark
Overview This benchmark evaluates the use of **PEP 695 Type Parameter Syntax** (introduced in Python 3.12) to define a generic base class for a dynamic plugin architecture. The Hypothesis Using the new syntax `class Base[T]:` (instead of `c...
|
03-08 18:44 | Success | - | |
|
exp_self.20260308184213.003_20260308_184237
|
Associative State Retrieval (ASR) Benchmark
This benchmark tests the hypothesis that offloading SSM state history to CPU RAM and retrieving it via dot-product attention improves long-context fidelity without exploding GPU VRAM usage. Dependencies - Python 3.8+ - PyTorch 2.0+ - numpy...
|
03-08 18:42 | Success | - | |
|
exp_self.20260308184014.002_20260308_184045
|
Benchmark: Tiered Delta State Compression
README.md Benchmark: Tiered Delta State Compression Overview This benchmark evaluates the "Tiered Delta State Compression" technique. This innovation aims to enable processing of significantly longer sequences (2x length) on fixed hardware...
|
03-08 18:41 | Success | - | |
|
exp_self.20260308183811.001_20260308_183846
|
Innovation Benchmark: SSM + Cache + Dynamic Precision Co-design
README.md Innovation Benchmark: SSM + Cache + Dynamic Precision Co-design Hypothesis Combining **State Space Models (SSM)**, **Caching** (state persistence), and **Dynamic Precision** (Automatic Mixed Precision) will improve throughput and...
|
03-08 18:38 | Success | - | |
|
exp_pytrain.20260308183640.001_20260308_183711
|
Section 1: README.md
`python benchmark.py`
|
03-08 18:37 | Success | - |