https://blog.katarismo.com/2023-05-26-i-m-a-dad-i-replaced-m...
If that's the behavior by one of the best models on a few lines of code, I'm not hopeful for a world simulator any time soon unless we see another big leap in model quality.
So it seems like to understand a story well, an LLM would need a more sophisticated notion of time, where there is a history and events are slotted in as they’re learned from the narrative, and sometimes the whole history gets reinterpreted based on new information. (Plot twist!)
It would be fascinating if some mechanistic interpretability researcher figured out how they actually work with story time. Are there the rudiments of that kind of understanding yet?
This actually came out before ChatGPT and it floored me.
All you need to do is type this into Llama 3 8B or 70B Instruct (FP16):
“You are a text adventure game.”
Done.
> the best recorded performance is 59.9% on accurately simulating state transitions [...] after 10 steps, average simulation accuracy would reduce to less than 1%. Our results indicate that LLMs are not yet able to reliably act as text world simulators
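For anyone wondering how 59.9% per step turns into "less than 1%" after 10 steps: it is just compounding. A rough sketch, assuming each step succeeds independently with the same probability (my simplification, not necessarily how the paper measures it; the function name is made up for illustration):

```python
def compounded_accuracy(per_step: float, steps: int) -> float:
    """Chance that every one of `steps` transitions is simulated correctly,
    assuming each step is independent with the same per-step accuracy."""
    return per_step ** steps

# 59.9% is the best per-step figure from the quote above.
acc = compounded_accuracy(0.599, 10)
print(f"{acc:.2%}")  # a bit under 0.6%, i.e. below the quoted 1%
```

So even the best per-step number leaves almost no chance of a fully correct 10-step rollout.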
One problem I ran into was its ability to logically connect rooms. In a MUD you navigate by going north, south, east, west, up, or down. Not every room lets you go in every direction. And usually, if you go east, you can go west from the new room to get back. Only rarely will a level creator break that rule. ChatGPT (GPT-4) was pretty bad at it, though.
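The "go east, then west gets you back" rule is easy to check mechanically on generated output. A minimal sketch, assuming the map is a plain dict of room name to exits (the format, the room names, and `find_one_way_exits` are all hypothetical, not from any real MUD codebase):

```python
# Each direction and the direction that should lead back.
OPPOSITE = {
    "north": "south", "south": "north",
    "east": "west", "west": "east",
    "up": "down", "down": "up",
}

def find_one_way_exits(rooms):
    """Return (room, direction, target) for every exit with no matching way back."""
    problems = []
    for name, exits in rooms.items():
        for direction, target in exits.items():
            back = OPPOSITE[direction]
            # The target room should have the opposite exit pointing at us.
            if rooms.get(target, {}).get(back) != name:
                problems.append((name, direction, target))
    return problems

# Made-up example map: the cellar has no exit leading back down to the kitchen.
rooms = {
    "hall": {"east": "kitchen"},
    "kitchen": {"west": "hall", "north": "cellar"},
    "cellar": {},
}
one_way = find_one_way_exits(rooms)
print(one_way)
```

Since a level creator occasionally wants a one-way exit on purpose, this is better treated as a list of things to review than as hard errors.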
Another problem was descriptions. It might mention a mountain in the distance once. But then never again. So this giant landmark was described in a single room.
It was also difficult to get it to create a fair quantity of secrets in logical places. Lots of times it would just chain together multiple secrets in a single place. If you have more than one, you want to spread them around.
And finally, room layout. It tended not to be very good at this: lots of linear layouts. It didn't have an eye for when a detail deserves its own room and when it can just be a line in a description.
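One crude way to put a number on "lots of linear layouts" is to count the rooms that actually branch. This is only a heuristic sketch under my own assumptions (same hypothetical dict-of-exits map format, and "three or more exits" as the definition of a branch), not a real measure of how fun a level is:

```python
def branching_rooms(rooms):
    """Return the names of rooms with three or more exits.

    A layout where this list is empty is a single line or loop of rooms.
    """
    return [name for name, exits in rooms.items() if len(exits) >= 3]

# Made-up corridor: every room has at most two exits, so nothing branches.
corridor = {
    "a": {"east": "b"},
    "b": {"west": "a", "east": "c"},
    "c": {"west": "b"},
}

# Made-up hub: one central room with three spokes.
hub = {
    "plaza": {"north": "inn", "east": "shop", "west": "gate"},
    "inn": {"south": "plaza"},
    "shop": {"west": "plaza"},
    "gate": {"east": "plaza"},
}

print(branching_rooms(corridor), branching_rooms(hub))
```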
So it could do it, but it created levels that weren't very fun or particularly creative, even when it came to room descriptions.