Qwen-AgentWorld: Language World Models for General Agents

(arxiv.org)

71 points | by ilreb 5 hours ago

10 comments

adrian_b 27 minutes ago
The smaller of the two models is open weights and available on Huggingface:
https://huggingface.co/Qwen/Qwen-AgentWorld-35B-A3B
blurbleblurble 1 hour ago
This might be pretty big. One of my biggest frustrations with smaller models (especially MoE) is their failure to track workflow state at a high level. I'm constantly reminding them what we decided on or asking them to revisit, and reminding them eats context.
Seems like this might make that a lot less painful. And if not off the bat, with some minimal tuning or even just good prompting.
aliljet 32 minutes ago
The benchmarks here are confusing at best. Am I reading correctly that this model is essentially as good as or better than all frontier models right now?
- anana_ 14 minutes ago
  I believe the benchmark listed is about simulating the environment for the various tasks, rather than doing them. It seems that the point of this model is to generate sim data to improve other models with
- blourvim 25 minutes ago
  Benchmarks in general are a little iffy, the whole industry is going off of vibes anyways. Can't decide before trying it out
dippogriff 2 hours ago
I'm a fan of this direction. For me the most interesting use case for these world models isn't even training, it's verification. If this thing or some idealized version of it can actually reliably simulate state transitions, could you use it to verify an agent's execution path against hard constraints and replace/eclipse LLMs-as-a-judge?
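A minimal sketch of the verification idea above, assuming a trustworthy world model were available: replay the agent's actions through the model and test every predicted state against hard constraints. Everything here is hypothetical and for illustration only. `toy_world_model` is a hand-written stand-in, not Qwen-AgentWorld, and the state and action shapes are invented.

```python
from typing import Callable, Dict, List, Optional, Tuple

State = Dict[str, str]        # hypothetical state: path -> file contents
Action = Tuple[str, ...]      # hypothetical actions: ("write", path, text) or ("delete", path)
Constraint = Callable[[State], bool]
WorldModel = Callable[[State, Action], State]


def toy_world_model(state: State, action: Action) -> State:
    """Stand-in for a learned world model: predicts the next state."""
    predicted = dict(state)  # copy so the input state is left untouched
    if action[0] == "write":
        predicted[action[1]] = action[2]
    elif action[0] == "delete":
        predicted.pop(action[1], None)
    return predicted


def first_violation(
    world_model: WorldModel,
    state: State,
    actions: List[Action],
    constraints: List[Constraint],
) -> Optional[int]:
    """Return the index of the first action whose predicted state breaks a constraint."""
    for step, action in enumerate(actions):
        state = world_model(state, action)
        if not all(check(state) for check in constraints):
            return step
    return None


def keep_config(state: State) -> bool:
    """Hard constraint: config.yaml must never disappear."""
    return "config.yaml" in state


start = {"config.yaml": "a: 1"}
risky_plan = [("write", "notes.txt", "hi"), ("delete", "config.yaml")]
safe_plan = [("write", "notes.txt", "hi")]

print(first_violation(toy_world_model, start, risky_plan, [keep_config]))  # 1
print(first_violation(toy_world_model, start, safe_plan, [keep_config]))   # None
```

The check is only as reliable as the simulated transitions, which is the "if" in the comment above.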
ElenaDaibunny 31 minutes ago
10M trajectories, probably more of a data scale win than a world model breakthrough tbh
psc007 2 hours ago
ELI5? What is this compared to a regular LLM assistant model like the base Qwen?
- gavmor 2 hours ago
  A regular LLM acts as a "policy," mapping a current state to a specific action (states → actions). Their new LLM acts as a "world model," mapping a current state and a chosen action to a predicted future state ((states, actions) → subsequent states). Instead of deciding "what to do," its explicit objective is to predict the exact environment observation that will result from the interaction history and the agent's current action.
  I assumed at first that it was trained on synthetic data, but they actually went and deployed real physical hosts and virtual machines (e.g. Ubuntu, macOS, and Android) and browsers. They ran agentic systems on these continuously and recorded the actual, real-world interactions.
  So it's an LLM that infers the next state, or outcome, as structured data, e.g. literal HTML code, UI view hierarchies, or accessibility trees.
  - dmos62 9 minutes ago
    So, if I'm reading this correctly, whereas a regular LLM would, given a prompt to edit a file, infer a sed call, this "world" model infers the resulting contents of the file.
    - kakugawa 0 minutes ago
      Here's the demo: https://docs.qwenlm.ai/resources/mlu56_demo.html
      Here's the description of the world model prompt for the UI-based domains: "A precise GUI state simulator — given the current screen (as HTML) and a user action, predicts the exact next screen as a complete, self-contained HTML document." (You can click the world model prompt for the full prompt.)
      So the world model generates the current state (an html document), an agent tells it what action it wants to perform, then the world model generates the next state (another html document).
      The other domains are similar, but w/ domain-specific nuance.
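The policy vs. world-model distinction from this subthread can be sketched with the file-edit example. This is a toy under stated assumptions: both functions are hand-written stand-ins rather than LLM calls, and the names `Edit`, `toy_policy`, and `toy_world_model` are hypothetical.

```python
from typing import NamedTuple

State = str  # here, the state is simply the contents of a file


class Edit(NamedTuple):
    """Hypothetical action: replace `old` with `new`, akin to a sed substitution."""
    old: str
    new: str


def toy_policy(state: State) -> Edit:
    """Policy: state -> action. It decides what to do, not what happens."""
    return Edit(old="colour", new="color")


def toy_world_model(state: State, action: Edit) -> State:
    """World model: (state, action) -> predicted next state."""
    return state.replace(action.old, action.new)


before = "favourite colour\n"
action = toy_policy(before)                # the agent picks an edit
after = toy_world_model(before, action)    # the simulator predicts the file afterwards

print(action)        # Edit(old='colour', new='color')
print(repr(after))   # 'favourite color\n'
```

In the GUI domains described above, the state would be an HTML document and the action a user interaction, but the shape of the two functions stays the same.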
Tepix 3 hours ago
The labels of the very first chart (figure 1, bottom left) are obviously wrong, which casts doubt on the entire paper.
- dudisubekti 2 hours ago
  This label?
  > Figure 1: Overview of Qwen-AgentWorld. Top: Qwen-AgentWorld is a unified native language world model across seven domains. Bottom: We explore two complementary strategies for applying world modeling to enhance language agents (mainly using the 35B-A3B model as agent): Decouple and Unify , where the world model serves as the environment simulator and agent foundation model, respectively.
  Where is the mistake?
  - Tepix 1 hour ago
    The deltas are wrong.
    The bars above the label "Infinite Real-World Envs" show growth for example from approx 42 to 55 but the red label says "+7.1". It's wrong for all of them.
    - dudisubekti 4 minutes ago
      Ah I see. Yeah the graphics are probably AI-generated, and AIs do struggle with unit consistency in charts.
      (For another example, the charts in the August 2025 GPT-5 presentation)
    - yorwba 52 minutes ago
      According to Table 6, it's supposed to be 47.9 to 55.
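A quick arithmetic check of the numbers quoted in this subthread, using only the values mentioned here: roughly 42 and 55 as read off the drawn bars, and 47.9 as the baseline cited from Table 6. The `delta` helper is just for illustration.

```python
def delta(before: float, after: float) -> float:
    """Improvement in points, rounded to one decimal place."""
    return round(after - before, 1)


drawn_baseline = 42.0   # approximate bar height read off the chart
table_baseline = 47.9   # baseline quoted from Table 6
result = 55.0

print(delta(drawn_baseline, result))  # 13.0
print(delta(table_baseline, result))  # 7.1
```

So the "+7.1" label matches the Table 6 baseline, while the bar appears to be drawn about 6 points too low.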
verdverm 3 hours ago
35B model from the qwen-3.5 line
https://github.com/QwenLM/Qwen-AgentWorld
https://huggingface.co/Qwen/Qwen-AgentWorld-35B-A3B
- khimaros 2 hours ago
  unsloth, activate!