What makes solving an 'open' math problem different from an IMO problem for AI?

International Mathematical Olympiad (IMO) problems, while difficult, are curated puzzles with known solutions. They test the skillful application of established techniques. An 'open' problem, such as those in the Erdős Conjectures database, has no known solution. Tackling one requires genuine exploration of uncharted mathematical territory, a literature review to establish what is already known, and potentially the creation of novel proof strategies or even new mathematical concepts. This shifts AI's role from powerful calculator to research collaborator.

How does Code2World's approach differ from traditional AI world models?

Traditional world models often predict the next visual state pixel by pixel, which is computationally heavy and prone to structural errors. Code2World reframes the task: instead of predicting pixels, it predicts the underlying code (such as HTML/CSS) that would render the next screen state. This representation is more efficient, structured by construction, and allows precise, verifiable predictions. An 8-billion-parameter model using this approach can rival much larger models such as GPT-5 on GUI interaction tasks.
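To make the "verifiable" point concrete, here is a minimal Python sketch, assuming the next screen is predicted as an HTML string. The helper names (TagCollector, structure_of, structural_match) and the element-by-element scoring rule are hypothetical illustrations, not Code2World's actual evaluation method. The idea is that two code-defined screens can be compared exactly, tag by tag and attribute by attribute, whereas two predicted images usually need an approximate perceptual metric.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple

# One element = (tag name, sorted attribute pairs)
Element = Tuple[str, Tuple[Tuple[str, str], ...]]


class TagCollector(HTMLParser):
    """Records every opening tag, with its attributes, in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.elements: List[Element] = []

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        # Sort attributes so their order in the markup does not matter;
        # valueless attributes (e.g. `disabled`) arrive with value None.
        normalized = tuple(sorted((name, value or "") for name, value in attrs))
        self.elements.append((tag, normalized))


def structure_of(html: str) -> List[Element]:
    """Parses an HTML string into its flat sequence of elements."""
    collector = TagCollector()
    collector.feed(html)
    collector.close()
    return collector.elements


def structural_match(predicted_html: str, actual_html: str) -> float:
    """Share of element positions where prediction and ground truth agree.

    Hypothetical scoring rule: a positional comparison, so one extra element
    early in the page shifts everything after it.
    """
    predicted = structure_of(predicted_html)
    actual = structure_of(actual_html)
    if not predicted and not actual:
        return 1.0  # two empty screens match trivially
    matches = sum(1 for p, a in zip(predicted, actual) if p == a)
    return matches / max(len(predicted), len(actual))


if __name__ == "__main__":
    predicted_next = '<form id="login"><input name="user"><input name="pass" type="password"></form>'
    # Ground truth contains one element the model failed to predict
    actual_next = predicted_next.replace("</form>", "<button>Sign in</button></form>")
    print(structural_match(predicted_next, predicted_next))  # 1.0
    print(structural_match(predicted_next, actual_next))     # 0.75
```

In the example, the model predicts a login form with two inputs, but the real next screen also contains a submit button, so three of four elements match and the score is 0.75. The missing element is identifiable by name rather than as a smudge of wrong pixels. The positional rule is deliberately simple; a tree edit distance over the parsed DOM would be more forgiving of insertions.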

What does 'autonomy grading' for AI research mean, and why is it important?

The DeepMind team proposed a framework to grade the level of autonomy and novelty in AI-assisted discoveries. This is crucial because it moves beyond a binary 'solved/not solved' metric. It assesses how much human guidance was needed (from full direction to mere problem posing) and whether the result was a novel theorem or a known result reached in a new way. This formalization acknowledges that AI's role in science is a spectrum and sets standards for crediting both human and machine contributions, which will be essential for academic integrity.
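The two axes described above can be captured in a small data structure. The Python sketch below is a hypothetical illustration: the level names, the intermediate "partial guidance" level, and the dominates/comparable helpers are assumptions made here, not the categories DeepMind actually published. It shows why a two-axis grade is richer than a binary solved/unsolved flag: grades form only a partial order, so some pairs of results cannot be ranked against each other at all.

```python
from dataclasses import dataclass
from enum import IntEnum


class Autonomy(IntEnum):
    """Hypothetical autonomy levels; higher means less human guidance."""
    FULL_HUMAN_DIRECTION = 0  # humans supplied the strategy, AI filled in steps
    PARTIAL_GUIDANCE = 1      # humans offered hints or intermediate lemmas
    PROBLEM_POSING_ONLY = 2   # humans only stated the problem


class Novelty(IntEnum):
    """Hypothetical novelty levels for the result itself."""
    KNOWN_RESULT_NEW_ROUTE = 0  # known theorem, reached by a new argument
    NEW_RESULT = 1              # a theorem not previously known


@dataclass(frozen=True)
class DiscoveryGrade:
    autonomy: Autonomy
    novelty: Novelty

    def dominates(self, other: "DiscoveryGrade") -> bool:
        """True if this grade is at least as high as `other` on both axes."""
        return self.autonomy >= other.autonomy and self.novelty >= other.novelty

    def comparable(self, other: "DiscoveryGrade") -> bool:
        """Grades are only partially ordered: some pairs cannot be ranked."""
        return self.dominates(other) or other.dominates(self)


if __name__ == "__main__":
    autonomous_new = DiscoveryGrade(Autonomy.PROBLEM_POSING_ONLY, Novelty.NEW_RESULT)
    guided_reproof = DiscoveryGrade(Autonomy.FULL_HUMAN_DIRECTION, Novelty.KNOWN_RESULT_NEW_ROUTE)
    autonomous_reproof = DiscoveryGrade(Autonomy.PROBLEM_POSING_ONLY, Novelty.KNOWN_RESULT_NEW_ROUTE)
    guided_new = DiscoveryGrade(Autonomy.PARTIAL_GUIDANCE, Novelty.NEW_RESULT)

    print(autonomous_new.dominates(guided_reproof))   # True
    print(autonomous_reproof.comparable(guided_new))  # False
```

Under this scheme, a fully autonomous new proof of a known result and a partially guided new theorem are simply incomparable: each is stronger on one axis. That is exactly the nuance a single solved/not-solved bit cannot express.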

Can the knowledge from models like VideoWorld 2 be applied to real robots?

Yes. A key breakthrough of VideoWorld 2 is its demonstrated transfer of knowledge from video to robots. The model learns physical concepts and control policies by watching vast amounts of video of humans performing handcrafting tasks (such as assembling objects). This learned 'understanding' of physics and manipulation can then be transferred to guide real robotic arms, improving task success rates by up to 70%. This bypasses the need for expensive, dangerous, and slow real-world robot trial-and-error training.

March 3, 2026 AI In-Depth Analysis

Beyond the Benchmark: How AI's Leap into Original Mathematical Discovery is Redefining Research

The week artificial intelligence stopped being a student and started being a scientist.


The history of artificial intelligence is punctuated by moments where a long-anticipated capability moves from theoretical promise to tangible reality. For decades, the notion of machines contributing to the fundamental expansion of human knowledge, particularly in the austere realm of pure mathematics, existed primarily in philosophy and science fiction. That era has conclusively ended. Google DeepMind's recent unveiling of its Aletheia agent, which autonomously solved four genuinely open problems from the prestigious Erdős Conjectures database, represents not merely an incremental improvement but a paradigm shift. AI is no longer just a tool for solving known puzzles; it has become an active collaborator in exploring the unknown.

This breakthrough, however, does not exist in a vacuum. It arrives alongside complementary revolutions in how AI perceives and interacts with the world. Structural world models like Code2World, which forgoes pixel prediction in favor of code generation, and VideoWorld 2, which transfers physical intuition from video to robotics, together paint a coherent picture. We are witnessing the emergence of a new class of AI systems: not just pattern recognizers, but reasoning explorers capable of operating in abstract, code-defined, and physical spaces with increasing autonomy. This analysis delves into the technical