Woofun AI reports that the Gaoling School of Artificial Intelligence at Renmin University of China has released DeNovoSWE, the first dataset focused on long-horizon repository-level code generation. The dataset contains 4,818 high-quality instances constructed via Divide & Conquer and Critic & Repair mechanisms, shifting focus from bug fixing to creating entire repositories from task documents.
Experiments show that training the Qwen3-30B-A3B-Instruct model on DeNovoSWE raised its success rate on BeyondSWE-Doc2Repo from 5.8% to 47.2% and on NL2RepoBench from 4.3% to 23.0%. The dataset targets the challenge of long-horizon planning by providing verifiable, leakage-free data for generating executable software architectures.
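To put the reported figures in perspective, the short Python sketch below computes the absolute gain (in percentage points) and the relative improvement for each benchmark. It uses only the four success rates quoted above; the `gains` helper and the `results` dictionary are illustrative names, not part of DeNovoSWE or either benchmark.

```python
# Success rates (percent) as quoted in the text: (before training, after training).
results = {
    "BeyondSWE-Doc2Repo": (5.8, 47.2),
    "NL2RepoBench": (4.3, 23.0),
}


def gains(before, after):
    """Return (gain in percentage points, ratio of after to before), rounded to 1 decimal."""
    return round(after - before, 1), round(after / before, 1)


for name, (before, after) in results.items():
    points, factor = gains(before, after)
    print(f"{name}: +{points} points, about {factor}x the baseline")
```

Under these numbers the gains are 41.4 points (roughly 8.1 times the baseline) on BeyondSWE-Doc2Repo and 18.7 points (roughly 5.3 times the baseline) on NL2RepoBench.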