WEBVTT

00:00.000 --> 00:10.440
Hello everybody, I'm here to present a software project I've been working on called

00:10.440 --> 00:14.600
NobodyWho.

00:14.600 --> 00:16.160
This is the structure of the talk.

00:16.160 --> 00:20.160
First I'll give you a brief overview of what NobodyWho is, what it will do for you.

00:20.160 --> 00:24.440
Then I'll get into the motivation of why NobodyWho is a good idea, why we actually

00:24.440 --> 00:27.440
need this, and why we built it, how we built it.

00:27.440 --> 00:31.120
Then I'll get into the details of how it works, and how you can use it, some of the

00:31.120 --> 00:35.000
technology choices we made, and then at the end I'll get into some demos, I'll show you

00:35.000 --> 00:42.080
some games that people built with NobodyWho. So, NobodyWho, what is it?

00:42.080 --> 00:48.320
It is a plugin for Godot, natively compiled, written in Rust, for running local large language

00:48.320 --> 00:49.720
models.

00:49.720 --> 00:53.800
A common misconception that I hear all of the time when I present NobodyWho like this is

00:53.800 --> 00:56.920
people going, oh, so it'll write the GDScript code for you and make the game.

00:57.000 --> 00:58.440
This is not the point.

00:58.440 --> 01:04.040
The point of NobodyWho is to run the large language models at runtime to provide an enriched

01:04.040 --> 01:07.240
role-playing interactive fiction experience.

01:07.240 --> 01:10.800
That's sort of the core use case. There are lots of different applications of it, but the core

01:10.800 --> 01:16.600
use case is to have richer NPCs. It's 2025; let's not do multiple-choice dialogue

01:16.600 --> 01:20.160
in RPGs anymore.

01:20.160 --> 01:24.200
This is sort of what the stack looks like. In the middle there, just under my cursor,

01:24.200 --> 01:30.200
is the NobodyWho logo. We depend on llama.cpp; if you work with local LLMs, you're probably

01:30.200 --> 01:35.440
familiar with this library. It's really cool, it goes really fast, and it uses an open

01:35.440 --> 01:40.760
specification for the language models, so you can use all sorts of open-weights models, not

01:40.760 --> 01:42.800
just Meta's Llama.

01:42.800 --> 01:48.120
We do hardware acceleration with Vulkan, so it'll work on your GPU, no matter if it's an

01:48.120 --> 01:52.280
Nvidia or AMD or an iGPU, and if you don't have a GPU it'll go as fast as it can

01:52.280 --> 01:59.160
on the CPU, which is, in my experience, fast enough. It all works in Godot, and

01:59.160 --> 02:07.480
we sort of ship these pre-compiled binaries that are linked into the Godot game, and

02:07.480 --> 02:10.040
it's just one big fat binary.

02:10.040 --> 02:17.560
We ship for Linux and Mac and Windows, both on ARM and x86, and we have sort of an experimental

02:17.560 --> 02:21.800
Android build, it's kind of fun.

02:21.800 --> 02:22.800
So why?

02:22.800 --> 02:25.720
Now I'll try and convince you that this is actually a good idea, this is something

02:25.720 --> 02:32.320
that games developers could make their games better with.

02:32.320 --> 02:38.240
First of all, why would you want to use language models in games at all?

02:38.240 --> 02:42.880
And I think my core argument here is that in role-playing games the state of the art is sort

02:42.880 --> 02:50.920
of having these multiple choice dialog trees, and I think they kind of suck.

02:50.920 --> 02:54.540
If I'm being generous, I would say they can be very useful because you can force a character

02:54.540 --> 02:59.680
into taking specific decisions, but that also takes out a lot of the fun of role play.

02:59.680 --> 03:03.680
A comparison is you can sort of think of if you're playing Baldur's Gate 3, which has

03:03.680 --> 03:09.160
D&D rules versus playing actual D&D, you can be much more creative in actual D&D, because

03:09.160 --> 03:11.800
you could sort of bring more of yourself into it, you could bring more role playing

03:11.800 --> 03:12.800
into it.

03:12.800 --> 03:14.040
It's a very, very common experience.

03:14.040 --> 03:18.280
If you've ever played one of these RPGs with multiple choice dialogs, that you reach

03:18.280 --> 03:22.280
a point where you're like, oh, but none of these three options are actually what I want

03:22.280 --> 03:24.280
my character to do.

03:24.280 --> 03:30.560
But if you have free text input or voice, if you want that, then you can sort of say whatever

03:30.560 --> 03:35.000
you want, and the characters in the game will improvise along with you and sort of make things

03:35.000 --> 03:37.200
work.

03:37.200 --> 03:41.760
This is another case if you've ever played Skyrim, you are familiar with this dialogue option,

03:41.760 --> 03:44.520
like it sort of gets very, very repetitive.

03:44.520 --> 03:49.680
What we want is emergent gameplay, so you can role play in a universe for a long time,

03:49.680 --> 03:54.440
and with pre-written dialogue, it takes a lot of time from games writers, and even when

03:54.440 --> 03:59.520
it takes a long time from games writers, with AAA projects like this, it still ends up

03:59.520 --> 04:03.440
being sort of repetitive, and it doesn't feel that immersive, and we want more

04:03.440 --> 04:06.600
immersion in our games.

04:06.600 --> 04:10.440
There's sort of the other way around: I don't think we should ever be applying

04:10.440 --> 04:17.880
LLMs just for the sake of applying LLMs, but games are also one of the best places

04:17.880 --> 04:22.200
to apply LLMs.

04:22.200 --> 04:27.080
If you're thinking about harmful uses of LLMs, it's sort of these things where the

04:27.080 --> 04:30.120
risk of the LLM hallucinating can be very, very bad.

04:30.120 --> 04:34.920
If you're looking at like parole approvals for correctional facilities or screening job interviews,

04:34.960 --> 04:43.160
then you never want hallucinations, but in interactive fiction hallucinations are not that bad.

04:43.160 --> 04:50.600
I think there's actually a case to be made that you a lot of the time do want hallucinations.

04:50.600 --> 04:54.400
I'm thinking of like, if I walk up to a character that's standing on the corner of the street,

04:54.400 --> 04:57.400
and then I say, what did you have for breakfast, right?

04:57.400 --> 05:02.240
The character didn't have breakfast, and probably was standing on the corner of the street

05:02.320 --> 05:09.280
the whole time, but a sort of willing improvisational partner will say, oh, sure,

05:09.280 --> 05:13.040
yes, and then: oh, sure, for breakfast I had bacon and eggs, right?

05:13.040 --> 05:17.120
So we actually, a lot of the time, do want it to just go along with whatever

05:17.120 --> 05:20.320
you're presenting.

05:20.320 --> 05:24.400
Another case for why games are a good application of LLMs is that machines that

05:24.400 --> 05:29.760
are built for running video games have graphics cards.

05:29.760 --> 05:33.360
This is exactly the kind of hardware you want if you want to run local LLMs and run them

05:33.360 --> 05:34.880
fast.

05:34.880 --> 05:39.040
So when looking at like, oh, what's the compromise of local LLMs, are the machines

05:39.040 --> 05:39.840
good enough?

05:39.840 --> 05:44.320
The machines that are good enough are the machines that are built for running video games.

05:44.320 --> 05:48.480
So why do we want to do it local?

05:48.480 --> 05:52.400
A big one is it fits really well with the current market of game development.

05:52.400 --> 05:56.400
If you're sort of looking at how do games developers make money, what they do is they put

05:56.400 --> 05:59.680
their game on Steam, you pay 10 euros, you get the game, and you as a consumer sort of

05:59.760 --> 06:03.200
expect for this one-time payment to keep playing the game.

06:03.200 --> 06:07.440
You want to be able to play the game in a year or in 10 years, and if you're relying on

06:07.440 --> 06:14.160
sort of the easy thing to do, which is to use like GPT-4 and the OpenAI HTTP API,

06:14.160 --> 06:17.920
you can very quickly get something going, but you'll end up paying for usage,

06:17.920 --> 06:19.920
and this doesn't work that well.

06:19.920 --> 06:24.240
The games that do this, they typically do it by keeping the product sales business model,

06:24.240 --> 06:26.720
but then imposing a usage limit.

06:26.720 --> 06:32.000
And then, sort of, the game just shuts down after you've played it for five hours or something like this,

06:32.000 --> 06:36.960
because the game developers can only afford processing this many tokens for one sale.

06:36.960 --> 06:40.480
And that sucks. I want to be able to buy a game and then be able to keep running it.

06:41.520 --> 06:48.080
There's a privacy aspect. If I'm role-playing, I don't always want somebody to peek in, right?

06:48.080 --> 06:52.320
This is a thing that's familiar to pen-and-paper Dungeons and Dragons players, right?

06:52.320 --> 06:58.080
There's a lot of trust building in order to be able to improvise with other people.

06:58.080 --> 07:01.920
So it can be uncomfortable, this idea of other people peeking in on your conversations.

07:02.720 --> 07:06.960
Offline capabilities, just a great feature. When I go on the airplane home from Boston,

07:06.960 --> 07:09.600
I want to be able to take out my Steam Deck and play a few games.

07:09.600 --> 07:13.200
I can't play the games that require an online API.

07:13.200 --> 07:18.400
So this is a cool new feature that you can only get with LLMs if you're doing local LLMs.

07:18.880 --> 07:24.720
There's reliability. If you're depending on an external API to run your game,

07:24.720 --> 07:28.560
then that API can change. It's sort of a bunch of software that's out of your control.

07:29.360 --> 07:33.120
If you look at the history of OpenAI's APIs,

07:34.480 --> 07:37.200
they've sort of changed how the models respond over time.

07:37.200 --> 07:40.400
So there's something that the model might want to play along with one week,

07:40.400 --> 07:44.880
and in the next week it could just go like, oh, as an AI assistant, I cannot.

07:45.840 --> 07:51.120
And you don't want this. You want things to keep working in the same way as they did when you shipped your game.

07:52.800 --> 07:56.800
This one, I think, is really the big and important one.

07:57.520 --> 08:02.320
Games are art, games are culture. We want to be able to preserve this for future generations.

08:03.200 --> 08:06.320
Games archival: there are people working with games archival professionally,

08:06.320 --> 08:10.320
and they have a really, really hard time getting things running or keeping things running.

08:11.200 --> 08:14.240
Players of Rocket League will know that they deprecated Linux support a few years ago,

08:14.240 --> 08:17.920
and then like, oh, this game that I bought, it doesn't even run on my computer anymore.

08:17.920 --> 08:22.160
Well, it will, but I can't play with my friends because it requires a server for online play.

08:22.720 --> 08:27.440
So that kind of sucks. But with local LLMs, we can sort of make the type of game that I can

08:27.440 --> 08:31.040
put on my laptop, put my laptop in a vacuum chamber and pull it out a hundred years later,

08:31.040 --> 08:34.080
and it'll still work. So I think that's cool.

08:35.600 --> 08:41.280
Okay, so how are we building this? I'll get back to this slide that we saw before.

08:42.000 --> 08:48.560
This is llama.cpp with Vulkan acceleration, so we get good hardware support and it runs fast.

08:50.480 --> 08:54.400
You'll sort of notice, if you know Godot well, you'll know that's a C++ project,

08:54.400 --> 08:59.200
and llama.cpp is a C++ project as well, but then we wedged some Rust in there.

08:59.920 --> 09:06.640
That sort of gives us a lot more confidence in writing correct code and it runs fast like C++ does.

09:07.680 --> 09:10.720
Another thing that sort of makes us stick out a little bit in the ecosystem is that,

09:11.280 --> 09:14.400
looking upwards in the stack and downwards in the stack, everything is

09:14.400 --> 09:18.400
MIT-licensed. We copyleft-license NobodyWho,

09:20.320 --> 09:22.880
for reasons I don't have to explain to this audience.

09:24.560 --> 09:28.560
A thing that we notice a lot of the time with new people picking up this project is that they

09:29.760 --> 09:32.960
are going like, oh, copyleft license, that's scary because I want to make a proprietary game.

09:34.000 --> 09:38.480
So we have to be sort of very explicit in the README saying like, you can link into this

09:38.480 --> 09:42.880
copyleft dependency, you just can't make proprietary forks. It's not something that the

09:42.880 --> 09:44.960
game dev community is totally privy to.

09:47.280 --> 09:52.880
Yeah, so we're using llama.cpp, and llama.cpp exposes, or implements, an open

09:52.880 --> 09:57.280
specification for the model files, which means that we can use lots of different models.

09:57.280 --> 10:01.760
Whenever a new fancy model comes out, very quickly there's a GGUF, which is a file format,

10:03.520 --> 10:07.600
a quantization available for it, and you can sort of just pull these down from the internet.

10:07.680 --> 10:12.000
So we don't ship a specific model. We expect developers to sort of figure out what model

10:12.000 --> 10:17.280
suits their needs. So if you want to just run the state-of-the-art Gemma or DeepSeek models,

10:17.280 --> 10:21.680
that's fine. If you want to run something that's trained on a specific data set or doesn't come

10:21.680 --> 10:24.960
out of big tech and Silicon Valley for some reason, then you can go and use the

10:24.960 --> 10:29.120
Salamandra models from Barcelona University or the EleutherAI stuff or whatever you want.

10:29.120 --> 10:33.360
So we don't have a strong opinion on what the open-weights model should look like; you can sort of just

10:33.360 --> 10:38.640
use any model that's in this format or fine-tune your own thing if you want to. Let's get into

10:38.640 --> 10:45.280
the code. We expose a very, very simple API, and it looks sort of like this. I'll just go through

10:45.280 --> 10:51.280
all of it. We first configure it, like set the model; this is the GGUF file that you selected

10:51.280 --> 10:55.200
from the last slide. There's a prompt that says you're an evil wizard, you always curse whoever.

10:56.160 --> 11:01.120
Then you call say, which is the user saying something to the character, and then it'll sort of

11:01.120 --> 11:04.640
in a thread in the background generate a response, and if you're familiar with Godot idioms,

11:04.640 --> 11:09.360
you'll know about signals, which is sort of like a pub/sub type thing, so you can

11:10.080 --> 11:15.200
await the response. You can also wait for the tokens as they come out one at a time, which makes

11:15.200 --> 11:21.920
for a much better experience. We'll see this in the next couple of demos. We also support embeddings
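
NOTE
Editor's sketch, not part of the talk and not NobodyWho's actual API.
The pattern described above (call say, generate in a background thread,
then react to per-token and finished signals) can be imitated in plain
Python. Every name here (Signal, ToyChat, response_updated,
response_finished, demo) is a hypothetical stand-in, and the reply is
canned rather than produced by a model.
```python
import threading
from typing import Callable, List
class Signal:
    """Tiny publish/subscribe helper standing in for a Godot signal."""
    def __init__(self) -> None:
        self._handlers: List[Callable[[str], None]] = []
    def connect(self, handler: Callable[[str], None]) -> None:
        self._handlers.append(handler)
    def emit(self, value: str) -> None:
        for handler in self._handlers:
            handler(value)
class ToyChat:
    """Hypothetical chat object; it replays a canned reply token by token."""
    def __init__(self, system_prompt: str, canned_reply: str) -> None:
        self.system_prompt = system_prompt
        self._canned_reply = canned_reply
        self.response_updated = Signal()   # fires once per token
        self.response_finished = Signal()  # fires once with the full text
    def say(self, text: str) -> threading.Thread:
        # Generate on a background thread so the caller is not blocked.
        worker = threading.Thread(target=self._generate, args=(text,))
        worker.start()
        return worker
    def _generate(self, _user_text: str) -> None:
        tokens = self._canned_reply.split(" ")
        for i, token in enumerate(tokens):
            # Re-attach the space that split() removed, except before the first token.
            self.response_updated.emit(token if i == 0 else " " + token)
        self.response_finished.emit(self._canned_reply)
def demo() -> tuple:
    chat = ToyChat("You are an evil wizard.", "A curse upon you, traveller!")
    streamed: List[str] = []
    finished: List[str] = []
    chat.response_updated.connect(streamed.append)
    chat.response_finished.connect(finished.append)
    # A game would keep rendering frames here; join() is only for this demo.
    chat.say("Hello there").join()
    return streamed, finished
```
Streaming handlers let a UI print text as it arrives; the finished
handler is the place for logic that needs the whole reply.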

11:21.920 --> 11:28.000
by the way. I only have three minutes, so I won't talk too much about this. We support all

11:28.080 --> 11:32.080
of the sampling methods that llama.cpp provides, so you can do what you need to in terms

11:32.080 --> 11:38.480
of controlling the model. If you ever want to penalize some tokens compared to others, if you

11:38.480 --> 11:45.280
want greedy or Mirostat sampling or min-p, whatever, we can do all of this. This slide is too
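
NOTE
Editor's sketch, not part of the talk. To make the sampling vocabulary
concrete, here is a toy version of three ideas mentioned above:
penalizing tokens, greedy selection, and min-p filtering. The token
names, scores, and function names are invented for illustration; this
is not how NobodyWho or llama.cpp implement or expose their samplers.
```python
import math
import random
from typing import Dict
# Made-up raw scores (logits) for the next token.
LOGITS: Dict[str, float] = {"curse": 2.0, "hex": 1.5, "bless": 0.0, "hug": -3.0}
def softmax(logits: Dict[str, float]) -> Dict[str, float]:
    # Convert raw scores to probabilities; subtract the max for numerical stability.
    peak = max(logits.values())
    exps = {tok: math.exp(score - peak) for tok, score in logits.items()}
    total = sum(exps.values())
    return {tok: value / total for tok, value in exps.items()}
def penalize(logits: Dict[str, float], penalties: Dict[str, float]) -> Dict[str, float]:
    # Lower the score of selected tokens before probabilities are computed.
    return {tok: score - penalties.get(tok, 0.0) for tok, score in logits.items()}
def greedy(probs: Dict[str, float]) -> str:
    # Always pick the single most likely token.
    return max(probs, key=probs.get)
def min_p_filter(probs: Dict[str, float], min_p: float) -> Dict[str, float]:
    # Keep tokens whose probability is at least min_p times the top probability,
    # then renormalize so the kept probabilities sum to one.
    threshold = min_p * max(probs.values())
    kept = {tok: p for tok, p in probs.items() if p >= threshold}
    total = sum(kept.values())
    return {tok: p / total for tok, p in kept.items()}
def sample(probs: Dict[str, float], rng: random.Random) -> str:
    # Draw one token at random, weighted by probability.
    tokens = list(probs)
    return rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]
```
Greedy is deterministic; min-p keeps some variety while cutting off the
unlikely tail, which tends to matter for characters that should stay
in role.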

11:45.280 --> 11:56.080
early. Let's see some games. This game was made as a demo by a friend of mine. I like to think

11:56.160 --> 12:01.200
of it as sort of like Papers, Please in reverse. You're suspected of a crime. You're talking

12:01.200 --> 12:06.720
to an interrogator who is presenting you with evidence, and you sort of have to provide convincing

12:06.720 --> 12:11.280
explanations and alibis for each piece of evidence that they're presenting you with. This one is

12:11.280 --> 12:15.920
using two different model contexts, one that's sort of prompted to, in a structured way using

12:15.920 --> 12:22.160
JSON, assess the viability of your alibi, your explanation. And then

12:22.240 --> 12:26.560
there's another model context using the same model loaded into VRAM, which is doing the role

12:26.560 --> 12:33.040
play of the interrogator. And you can see there's sort of a credibility meter down there. I think

12:33.040 --> 12:44.000
that's really fun. Next one. This one is not a full game. It's a small demo I made to show that

12:44.000 --> 12:49.600
you can have the LLM interact back with your game code. This is like a potion shopkeeper. I named

12:49.680 --> 12:56.320
her Fubar. You can sort of ask her what potions she has for sale. What about that orange one? What

12:56.320 --> 13:00.480
about the blue one? What does it do? I need a strength potion. What would you recommend? And then

13:00.480 --> 13:08.720
you can also ask her like, okay, sure, let me get the strength potion. And it'll give you

13:08.720 --> 13:12.320
like a pop-up. And this is sort of showing that you can use this to trigger your actual game code.
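
NOTE
Editor's sketch, not part of the talk. One common way to let a model
trigger game code is to have it emit a small JSON "tool call" that the
game validates and dispatches. The JSON shape, the Shop class, and the
dispatch helper below are assumptions made for illustration; the talk
does not show how the demo or NobodyWho does this.
```python
import json
from typing import Callable, Dict, List
class Shop:
    """Hypothetical game state for a potion shop."""
    def __init__(self, stock: List[str]) -> None:
        self.stock = list(stock)
        self.sold: List[str] = []
    def sell_potion(self, name: str) -> str:
        # Real game code: update the inventory and report the outcome as text,
        # which could be fed back to the model as the tool result.
        if name not in self.stock:
            return f"no {name} potion in stock"
        self.stock.remove(name)
        self.sold.append(name)
        return f"sold {name} potion"
def dispatch(tool_call_json: str, tools: Dict[str, Callable[..., str]]) -> str:
    # Model output is untrusted text, so validate it before calling anything.
    try:
        call = json.loads(tool_call_json)
        tool = tools[call["name"]]
        return tool(**call["arguments"])
    except (json.JSONDecodeError, KeyError, TypeError) as err:
        return f"invalid tool call: {err!r}"
```
Only functions registered in the tools dictionary can be reached, so
the model can ask for a sale but cannot call arbitrary game code.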

13:12.320 --> 13:17.360
And then the strength potion disappears from the inventory. You can also do stuff like ask her like,

13:17.440 --> 13:20.640
what do they taste like? Or what did you have for breakfast? And she'll sort of play along.

13:24.640 --> 13:30.640
This next one was something that somebody made for Global Game Jam last week.

13:32.320 --> 13:36.960
And it's another detective game. This one you play sort of as the detective, you arrive at a

13:36.960 --> 13:40.800
space station that's mining helium-3. And there was an incident recently; you have to figure out

13:40.800 --> 13:45.600
what happened. And there's some undertones that, like, probably the corporate overlords who are running the

13:45.680 --> 13:50.320
station are trying to cover stuff up. And there's a bunch of different characters and they all

13:50.320 --> 13:55.680
have different tones of voice and like they have secrets that you can sort of tease out.

13:56.880 --> 14:02.960
And that's kind of fun. And, you can tell, all of these demos by the way are running on a laptop,

14:02.960 --> 14:09.440
right? So it doesn't require like insane gamer rigs or something. This is just a

14:09.440 --> 14:18.320
Framework 16. And the tokens come out reasonably fast. Ooh, one minute? That's good, because I think I have

14:18.320 --> 14:26.000
one more demo. This one is sort of a dungeon crawler. I really like the, well I really like the

14:26.000 --> 14:31.760
aesthetics of this game, but I really like the ambitions of this guy. Because there are some projects

14:31.760 --> 14:37.200
that are like LLM-powered dungeon crawlers. But a lot of them work like collaborative storytelling

14:37.280 --> 14:40.400
exercises. You can sort of go, and then I pull out my two double flame swords and

14:40.400 --> 14:44.960
decapitate ten ogres, and the game will just go, sure you do, and now they're all dead. This one's

14:44.960 --> 14:50.160
a bit more intelligent and works like an actual game. Like, the power of your sword will affect the

14:50.160 --> 14:54.400
probability of you actually dealing a certain amount of damage. And you could still be creative,

14:54.400 --> 14:59.040
like typing in the description and saying, okay, I use my shield to bash the barrel and make

14:59.040 --> 15:04.160
it fly in the direction of the skeleton. And then it, like, assesses: oh, is that true or false, is there actually

15:04.240 --> 15:08.480
a barrel in the room? So it actually works like a real game, but you get to bring a lot of

15:08.480 --> 15:17.440
your own creativity into it. That's it, I guess. Yes, you're out of time. Thank you, links.

15:22.560 --> 15:24.400
Is Daniel around, our next speaker?

