Can language models serve as text-based world simulators?

sbotzek

I tried for a few months getting ChatGPT4 to work with a MUD. In my experience it's not very good at that particular task.

One problem I ran into was its ability to logically connect rooms. In a MUD you navigate by going north, south, east, west, up, or down. Not every room lets you go any direction. And usually, if you go east, in your new room, you can go west. Rarely a level creator will make this not be true. ChatGPT4 was pretty bad at it though.

Another problem was descriptions. It might mention a mountain in the distance once. But then never again. So this giant landmark was described in a single room.

It was also difficult to get it to create a fair quantity of secrets in logical places. Lots of times it would just chain together multiple secrets in a single place. If you have more than one, you want to spread it around

And finally, room layout. It tended to not be very good at this. Lots of linear layouts. It didn't have an eye towards when details should be complex rooms and when it can just be a line in a description.

So it could do it, but it created levels that weren't very fun or particularly creative, even when it came to room descriptions.

gfosco

Not by themselves... but with a good program built around it, managing the actual state, and very careful prompting, yes. I've been thinking about this for a while, always have the desire to make a game.

xrd

I made a D&D game for my kids using llama. It went surprisingly well. Lots of potential here.

https://blog.katarismo.com/2023-05-26-i-m-a-dad-i-replaced-m...

sk11001

Recently I'm getting some strange behavior from ChatGPT (with GPT-4) where it outputs a code snippet in which it declares a variable, and then on the very next line it forgets the name of the variable and refers to it as something else. Or it refers to the variable but it makes a typo in the name.

If that's the behavior by one of the best models on a few lines of code, I'm not hopeful for a world simulator any time soon unless we see another big leap in model quality.

skybrian

Maybe they can, but they’re trained on conversation and stories (among other things) rather than simulations. These are more general than the log of a simulation run in how they use time. Stories can be chronological, but they can also fill in the details in any order using things like flashbacks. Or they can get really complicated with time travel stories.

So it seems like to understand a story well, an LLM would need a more sophisticated notion of time, where there is a history and events are slotted in as they’re learned from the narrative, and sometimes the whole history gets reinterpreted based on new information. (Plot twist!)

It would be fascinating if some mechanistic interpretability researcher figured out how they actually work with story time. Are there the rudiments of that kind of understanding yet?

leonardspeiser

I was able to get this to work with an LLM, but had to build some short term memory to keep awareness as you explored. The current site allows you to interact with an Oregon Trail or Zork like world, but you can specify any time in history and then it will create the stories and stay within that world. I also have it generate some images to go along with you for fun. https://grue.is/ (PS, I don't know how to code, so this is also proof that you can use an LLM to write all the software, here is my github if you are interested in learning more about that: https://github.com/lrspeiser/Grue.is)

ninetyninenine

The answer is yes.

https://aidungeon.com/

This actually came out before chatGPT and it floored me.

klaussilveira

This is neat. In theory, you could hook llama.cpp into into a GOAL-based planner (https://www.gamedevs.org/uploads/three-states-plan-ai-of-fea...) and have much better default bots navigating your nav mesh. Even better, if you record player actions as GOAL actions within the nav mesh, you can use that to fine tune the model. Or even feed it back in realtime so they learn the modus operandi of the player.

d13

Try it!

All you need to do is type this into llama 3 8B or 70B 16fp instruct:

“You are a text adventure game.”

Done.

anthk

Inform7 it's a beast and the resulting game can be run under a 486. No AI required.

guestbest

I think there is a MUD effort called LlamaTale which allows creation of this as telepresence. I’m trying to put a MUD together to try this out.

enterexit

Curious, can this objective be reworded as "How big a universe an LLM can serve to simulate?"

binary132

Can headlines written in the interrogative case ever be answered in the affirmative?

bbor

Technically “text-based world simulator” pretty much is an explanation of LLMs already. That’s why we all suddenly care about dumb chatbots —- they accidentally cracked the frame problem.

lyu07282

TL;DR: unfortunately the conclusion is, no they can't

> the best recorded performance is 59.9% on accurately simulating state transitions [...] after 10 steps, average simulation accuracy would reduce to less than 1%. Our results indicate that LLMs are not yet able to reliably act as text world simulators

firtoz

Would be interesting to see a version of this but incorporating tool usage features of the latest models.

w4ffl35

I've already built things like this. Cool paper but I build things, put them to the side, never write a paper. Sometimes tweet about them. Then I see a paper later about some similar thing and the crowd goes wild. Not a complaint. Idk what else to say.

Bluestein

Nethack FTW :)

carabiner

Too much data are locked up in non-public sources that still affect the world. You will not find complete engineering analysis reports for the Ford F-150 engine on the internet or full email exchanges for Trump's presidential campaign planning in Ohio. Yet these all influence us.

P_I_Staker

The answer is no