New AI framework autonomously optimizes training data, architectures and algorithms — outperforming human baselines

AI R&D runs on a cycle of hypothesis, experiment, and analysis, and each step demands substantial manual engineering effort. That overhead caps how many ideas a research team can actually test, creating a bottleneck across training data, model architectures, and learning algorithms.

A new framework called ASI-EVOLVE, developed by researchers at the Generative Artificial Intelligence Research Lab (SII-GAIR), aims to solve this bottleneck. Designed as an agentic system for AI-for-AI research, it uses a continuous “learn-design-experiment-analyze” cycle to automate the optimization of the foundational AI stack.

In experiments, this self-improvement loop autonomously discovered designs that significantly outperformed state-of-the-art human baselines. The system generated novel language model architectures, improved pretraining data pipelines to boost benchmark scores by more than 18 points, and designed highly efficient reinforcement learning algorithms. 

For enterprise teams running repeated optimization cycles on their AI systems, the framework offers a path to reducing manual engineering overhead while matching or exceeding the performance of human-designed baselines.

The data and design bottleneck

Engineering teams can only explore a tiny fraction of the vast possible design space for AI models at any given time. Executing experimental workflows requires costly manual effort and frequent human intervention. And the insights gained from these expensive cycles are often siloed as individual intuition or experience, making it difficult to systematically preserve and transfer that knowledge to future projects or across different teams. These constraints fundamentally limit the pace and scale of AI innovation.

AI has made incredible strides in scientific discovery, ranging from specialized tools like AlphaFold solving discrete biological problems to agentic systems answering basic scientific questions. However, current frameworks still struggle with open-ended AI innovation and are mostly limited to narrow optimization within very specific constraints.

Advancing core AI capabilities is far more complex. It requires modifying large interdependent codebases, running compute-heavy experiments that consume tens to hundreds of GPU hours, and analyzing multi-dimensional feedback from training dynamics. 

“Existing frameworks have not yet demonstrated that AI can operate effectively in this regime in a unified way, nor that it can generate meaningful advances across the three foundational pillars of AI development rather than within a single narrowly scoped setting,” the researchers write.

How ASI-EVOLVE learns to research

To overcome the limitations of manual R&D, ASI-EVOLVE operates on a continuous loop of prior knowledge, hypothesis generation, experimentation, and refinement. The system retrieves relevant knowledge and historical experience from existing databases, designs a candidate program representing its next hypothesis, runs experiments to obtain evaluation signals, and distills the outcomes into reusable, human-readable lessons that feed back into its knowledge base.
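One iteration of that cycle can be sketched in Python. Everything below (`KnowledgeBase`, `research_loop`, the callback signatures) is an illustrative guess at the shape of the loop, not the paper's actual code:

```python
from dataclasses import dataclass, field

@dataclass
class Insight:
    """A reusable, human-readable lesson distilled from one experiment."""
    summary: str
    score: float

@dataclass
class KnowledgeBase:
    insights: list = field(default_factory=list)

    def retrieve(self):
        # Surface the lessons whose experiments scored best.
        return sorted(self.insights, key=lambda i: i.score, reverse=True)

def research_loop(propose, run_experiment, analyze, iterations=3):
    """One hypothetical shape of a learn-design-experiment-analyze cycle."""
    kb = KnowledgeBase()
    for _ in range(iterations):
        prior = kb.retrieve()                # learn from accumulated lessons
        candidate = propose(prior)           # design the next hypothesis
        result = run_experiment(candidate)   # obtain evaluation signals
        lesson = analyze(candidate, result)  # distill the outcome into a lesson
        kb.insights.append(lesson)           # feed it back into the base
    return kb
```

The point of the sketch is the feedback edge: each iteration's analysis becomes retrievable context for the next proposal, which is what lets insights compound instead of evaporating after each run.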

There are two key components that drive ASI-EVOLVE. The “Cognition Base” acts as the system’s foundational domain expertise. To speed up the search process, the system is pre-loaded with human knowledge, task-relevant heuristics, and known pitfalls extracted from existing literature. This steers the exploration toward promising directions right from the first iteration. 

The second component is the “Analyzer,” which tackles the complex, multi-dimensional feedback from the experiments. It processes raw training logs, benchmark results, and efficiency traces, distilling them into compact, actionable insights and causal analyses.

Several other complementary modules bring the framework together. A “Researcher” agent reviews prior knowledge from the cognition base and past experimental results to generate new hypotheses, either proposing localized code modifications or writing new programs. 

The “Engineer” component runs the actual experiments. Because AI training trials are incredibly costly, the Engineer is equipped with efficiency measures such as wall-clock limits and quick early-rejection tests that filter out flawed candidate programs before they consume excessive GPU hours. 
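A minimal sketch of what such guardrails could look like, assuming a candidate is a runnable program and `quick_check`/`full_run` are hypothetical callbacks (none of these names come from the framework itself):

```python
import time

def run_guarded(candidate, quick_check, full_run, wall_clock_limit_s=60.0):
    """Run a candidate only if it survives a cheap sanity check, and
    abandon it once it exceeds its wall-clock budget.

    quick_check: fast filter (e.g. a few training steps on tiny data).
    full_run: generator yielding intermediate metrics, so the loop can
    watch the clock between evaluation steps.
    """
    if not quick_check(candidate):
        return {"status": "rejected", "reason": "failed quick test"}

    start = time.monotonic()
    last_metrics = None
    for last_metrics in full_run(candidate):
        if time.monotonic() - start > wall_clock_limit_s:
            # Stop spending GPU hours; keep whatever signal we got.
            return {"status": "timeout", "metrics": last_metrics}
    return {"status": "completed", "metrics": last_metrics}
```

The design choice worth noting is that a rejected or timed-out run still returns structured information, so even failed candidates can be analyzed and turned into lessons.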

Finally, the “Database” serves as the system’s persistent memory, storing the code, research motivations, raw results, and the Analyzer’s final reports for every iteration, ensuring that insights compound systematically over time.
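Such persistent memory could be as simple as an append-only log of per-iteration records. The field names below are guesses at what one record might hold, not the framework's actual schema:

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class IterationRecord:
    """One iteration's full trace: code, motivation, results, analysis."""
    iteration: int
    code: str
    motivation: str
    raw_results: dict
    analyzer_report: str

def append_record(path, record):
    # One JSON object per line keeps every iteration easy to query later.
    with open(path, "a") as f:
        f.write(json.dumps(asdict(record)) + "\n")
```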

By unifying these components, ASI-EVOLVE ensures that an AI agent systematically learns from complex, real-world experimental feedback without requiring constant human intervention. 

While previous frameworks are designed to evolve candidate solutions, “ASI-EVOLVE evolves cognition itself,” the researchers write. “Accumulated experience and distilled insights are continuously stored and retrieved to inform future exploration, ensuring that the system grows not only in the quality of its solutions but in its capacity to reason about where to search next.”

ASI-EVOLVE in action

In their experiments, the researchers showed that ASI-EVOLVE can successfully improve data curation, model architectures, and learning algorithms to create better AI systems.

For real-world enterprise applications, high-quality data is a persistent bottleneck. When tasked with designing category-specific cleaning strategies for massive pretraining corpora, ASI-EVOLVE inspected data samples and diagnosed quality issues like HTML artifacts and formatting inconsistencies. The system autonomously formulated custom curation rules, discovering that systematic cleaning combined with domain-aware preservation rules is far more effective than aggressive filtering. 
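The contrast between systematic cleaning and domain-aware preservation can be illustrated with a toy rule set. The categories and regexes here are invented for illustration; they are not the curation rules the system actually discovered:

```python
import re

HTML_TAG = re.compile(r"<[^>]+>")

def clean_document(text, category):
    """Toy sketch: apply systematic fixes everywhere, but preserve
    structure that a given domain depends on rather than filtering
    aggressively."""
    cleaned = HTML_TAG.sub(" ", text)   # strip HTML artifacts uniformly
    if category == "code":
        # Preserve line breaks and indentation that code relies on.
        return "\n".join(line.rstrip() for line in cleaned.splitlines())
    # For prose, also collapse runs of whitespace into single spaces.
    return re.sub(r"\s+", " ", cleaned).strip()
```

The hypothetical `category` switch is the key idea: the same artifact removal runs everywhere, but what counts as "noise" versus "structure" differs per domain.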

In benchmark tests, 3B-parameter models trained on the AI-curated data saw an average score boost of nearly 4 points over models trained on raw data. The gains were highest in knowledge-intensive tasks, with performance increasing by over 18 points on Massive Multitask Language Understanding (MMLU), an LLM benchmark that covers tasks across STEM, humanities, and social sciences.

Beyond data, the system proved highly capable at neural architecture design. Across 1,773 autonomous exploration rounds, it generated 105 novel linear attention architectures that surpassed DeltaNet, a highly efficient human-designed baseline. To achieve these results, ASI-EVOLVE developed multi-scale routing mechanisms that dynamically adjust the model’s computational budget based on the specific content of the input.

Finally, in reinforcement learning algorithm design, ASI-EVOLVE discovered novel optimization mechanisms. It designed algorithms that outperformed the competitive GRPO baseline on complex mathematical reasoning benchmarks such as AMC32 and AIME24. One successful variant invented a “Budget-Constrained Dynamic Radius” that keeps model updates within a defined budget, effectively stabilizing training on noisy data.
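The article does not give the exact formulation, but the described behavior, keeping updates inside a shrinking movement budget, can be sketched as a toy clipping rule. All names and parameters below are illustrative reconstructions, not the discovered algorithm:

```python
import math

def budget_constrained_step(update, radius, spent, budget):
    """Toy sketch: clip each update vector to a dynamic radius that
    tightens as the cumulative movement budget is spent, so late,
    noisy updates move the model less."""
    remaining = max(budget - spent, 0.0)
    effective_radius = min(radius, remaining)
    norm = math.sqrt(sum(u * u for u in update))
    if norm > effective_radius:
        scale = effective_radius / max(norm, 1e-12)
        update = [u * scale for u in update]
        norm = effective_radius
    return update, spent + norm
```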

What this means for enterprise AI

Enterprise AI workflows constantly require optimizations to existing systems, from fine-tuning open-source models on proprietary data to making small changes to architectures and algorithms. The computational resources and engineering hours required for such efforts are usually immense and beyond the reach of most organizations. As a result, many are left running unoptimized versions of standard AI models.

The research team says the framework is designed so enterprises can integrate proprietary domain knowledge into the cognition repository and allow the autonomous loop to iterate on internal AI systems.

The research team has open-sourced the ASI-EVOLVE code, making the foundational framework available for developers and product builders.