WEBVTT

00:00.000 --> 00:14.000
Hi, I'm going to present the talk "From Bookworm to Trixie", upgrading the Raspberry Pi

00:14.000 --> 00:15.000
graphics stack.

00:15.000 --> 00:21.000
I would like, at the beginning, to thank the organisers for making possible this year the

00:21.000 --> 00:24.000
graphics devroom; we are back again.

00:24.000 --> 00:29.000
Well, I'm Chema Casanova, people know me by this name.

00:29.000 --> 00:32.000
My real name is José María Casanova.

00:32.000 --> 00:36.000
I'm an open source developer at Igalia, on the graphics team.

00:36.000 --> 00:43.000
I've been collaborating with the Mesa community for more than nine years now,

00:43.000 --> 00:49.000
contributing to the Intel driver at the beginning, and for the last, I don't know, five or six

00:49.000 --> 00:52.000
years working on the Broadcom drivers for the Raspberry Pi Foundation.

00:52.000 --> 00:57.000
And I'm also one of the founders of Igalia.

00:57.000 --> 01:03.000
So if anybody is interested in how to create a company working on free and open source software,

01:03.000 --> 01:06.000
you can ask me questions.

01:06.000 --> 01:08.000
So what are we going to talk about?

01:08.000 --> 01:12.000
It has been two years since the last time we presented this talk.

01:12.000 --> 01:19.000
And recently we have launched the new version of the operating system that runs on Raspberry Pi

01:19.000 --> 01:20.000
devices.

01:20.000 --> 01:22.000
That is the image that people usually tend to use.

01:22.000 --> 01:26.000
And that's the moment when we get the updates to the users.

01:26.000 --> 01:31.000
So we are going to present what's new. We are going to do a small hardware overview

01:31.000 --> 01:34.000
to understand what we are going to focus on.

01:34.000 --> 01:40.000
That is mainly the work on graphics, on the Mesa side and the kernel side.

01:40.000 --> 01:45.000
I will focus on the most important changes, which are related to performance.

01:45.000 --> 01:50.000
We have improved the performance of the driver in some benchmarks.

01:50.000 --> 01:58.000
Well, the Raspberry Pi is an embedded system, and users usually run the stock system images.

01:58.000 --> 02:02.000
When we land some new feature or a new Mesa version,

02:02.000 --> 02:05.000
it's the moment we receive the feedback from users.

02:05.000 --> 02:09.000
Because power users usually like to have the latest version of the drivers,

02:09.000 --> 02:13.000
the latest version of the kernel, and tend to stay updated.

02:13.000 --> 02:15.000
So that is when we hear about it.

02:15.000 --> 02:17.000
This was the previous version,

02:17.000 --> 02:18.000
the Bookworm one.

02:18.000 --> 02:20.000
You can see the desktop.

02:20.000 --> 02:25.000
It has been almost the same since the previous version, visually.

02:25.000 --> 02:30.000
But under the hood, there have been a lot of changes.

02:30.000 --> 02:35.000
We were using Mutter as the compositor.

02:35.000 --> 02:41.000
In Bullseye we started having the first tests of using Wayland by default.

02:41.000 --> 02:47.000
Then in Bookworm we launched the desktop using Wayfire.

02:47.000 --> 02:52.000
And then we made it the default for all users.

02:52.000 --> 02:58.000
After one year, more or less, we changed to labwc.

02:58.000 --> 03:02.000
Because there were several issues on the older generations of the hardware.

03:02.000 --> 03:07.000
Firstly, the Pi 1 to 3 have some problems because of how the GPU manages the memory.

03:07.000 --> 03:15.000
So it was better to use software rendering for the compositor.

03:15.000 --> 03:20.000
Because if you ran out of memory while running the compositor, it crashed.

03:20.000 --> 03:25.000
But that doesn't happen if you're doing software rendering.

03:25.000 --> 03:31.000
But applications can still run hardware accelerated on the desktop.

03:31.000 --> 03:38.000
And we made an update in October 2024, including a new release of Mesa.

03:38.000 --> 03:41.000
So now the desktop is almost the same.

03:41.000 --> 03:46.000
We keep using labwc, as we already did at the end of the previous cycle.

03:46.000 --> 03:52.000
And there are some upgrades: we have a new control centre application.

03:52.000 --> 03:57.000
This was a request from users, to improve the configuration experience.

03:57.000 --> 04:00.000
The idea was to have all the settings together:

04:00.000 --> 04:03.000
How do you change the resolution of the screen?

04:03.000 --> 04:08.000
How do you set things up when you have different outputs?

04:08.000 --> 04:11.000
There are some improvements in the operating system,

04:11.000 --> 04:12.000
in how we manage the desktop.

04:12.000 --> 04:15.000
Now it is easier to go from a Lite image

04:15.000 --> 04:17.000
to a desktop image,

04:17.000 --> 04:18.000
and the other way around.

04:18.000 --> 04:20.000
It's just installing some packages and removing others.

04:20.000 --> 04:25.000
In the past, it was quite hard to move from one of the versions of the image to the other.

04:25.000 --> 04:27.000
Now that comes for free:

04:27.000 --> 04:30.000
you can start with a headless image,

04:30.000 --> 04:32.000
and moving to the desktop one is easy.

04:32.000 --> 04:38.000
The path was installing several packages and modifying several files that were already in the image.

04:38.000 --> 04:44.000
So, well, my presentation is going to focus on the Raspberry Pi 5.

04:44.000 --> 04:49.000
The analysis and the features that are enabled are going to be about that device.

04:49.000 --> 04:54.000
Almost all the features are available on the Raspberry Pi 4 generation also.

04:54.000 --> 04:55.000
As for the Pi 1, 2 and 3,

04:55.000 --> 05:02.000
I'm not going to touch on them, because they are quite stable and we haven't updated those drivers much,

05:02.000 --> 05:07.000
just solving some issues that came from previous Mesa versions.

05:07.000 --> 05:13.000
Those devices shipped with the previous version of the OS, Bookworm, which was launched in 2023.

05:13.000 --> 05:19.000
Well, this is the graphics stack for the Raspberry Pi.

05:19.000 --> 05:28.000
For the first generations, the kernel driver is the vc4 module under DRM.

05:28.000 --> 05:30.000
And the Mesa driver is also called vc4.

05:30.000 --> 05:32.000
It has support for OpenGL

05:32.000 --> 05:34.000
ES 2.0.

05:34.000 --> 05:42.000
When we get to the new generations, Raspberry Pi 4 and 5, on the kernel side the display driver is still vc4.

05:42.000 --> 05:44.000
But now it doesn't handle the rendering.

05:44.000 --> 05:49.000
The rendering is handled in DRM by the v3d kernel module.

05:49.000 --> 05:57.000
And in the case of the user space drivers, we have v3d, which is the OpenGL ES / OpenGL driver.

05:57.000 --> 06:02.000
And we now have a Vulkan driver, which is v3dv.

06:02.000 --> 06:05.000
So, well, what's new?

06:05.000 --> 06:12.000
Well, mainly we are keeping the same version of OpenGL from Bookworm to Trixie.

06:12.000 --> 06:16.000
But we have implemented a lot of extensions.

06:16.000 --> 06:23.000
There are several related to supporting the options for depth clamping.

06:23.000 --> 06:32.000
A lot of time was invested in profiling things to manage to get better performance for the users.

06:32.000 --> 06:37.000
So now we have timer queries, so you can check the time it takes to run a draw.

06:37.000 --> 06:43.000
Even if this has issues in the case of a tiler, because if you query the time between two draws,

06:43.000 --> 06:47.000
you could be splitting into separate jobs draws that were going to be together.

06:47.000 --> 06:50.000
But in essence it is really useful.

06:50.000 --> 06:58.000
The next thing we worked on was the implementation of texture barriers.

06:58.000 --> 07:05.000
That was a big performance improvement, because we realized the driver was flushing by default.

07:05.000 --> 07:06.000
And it was not needed.

07:06.000 --> 07:13.000
So now we are being less conservative about how to handle the reads of the textures.

07:13.000 --> 07:19.000
That implied a big performance gain in cases where flushing was not needed.

07:19.000 --> 07:27.000
And one of the main features that we were working on during this cycle was the implementation of dual-source blending.

07:27.000 --> 07:29.000
That is not supported by the hardware.

07:29.000 --> 07:32.000
So it is kind of emulated, but we are taking advantage of that.

07:32.000 --> 07:38.000
We have a lot of lowerings to implement some of the effects that it requires.

07:38.000 --> 07:49.000
And we took advantage of that also, because one of the requirements of OpenGL 3.1 was supporting render targets with certain 16-bit formats.

07:49.000 --> 07:52.000
And that was not supported by the hardware.

07:52.000 --> 07:59.000
So with this we managed to implement that, and we are now OpenGL 3.1 desktop conformant.

07:59.000 --> 08:05.000
We only have one thing pending to solve to claim that conformance.

08:05.000 --> 08:13.000
And, well, another piece of work was related to some requests from people using WebGPU with our hardware.

08:13.000 --> 08:23.000
That is, improving the robustness, because you are running untrusted applications and shaders.

08:23.000 --> 08:31.000
A browser is a good place for incorrect accesses to a buffer that modify things that you are not expecting.

08:31.000 --> 08:37.000
So browsers have robustness as a requirement to support that.

08:37.000 --> 08:45.000
Another interesting feature was framebuffer fetch, which allows you to read in your shader

08:45.000 --> 08:57.000
the data from the framebuffer, instead of using a sampler to get the information from the texture that you have attached to the framebuffer.

08:57.000 --> 09:04.000
And there were also some shader subgroup operations that we implemented.

09:04.000 --> 09:14.000
Well, on the Vulkan side, the big change really is that now we are exposing Vulkan 1.3, which is conformant.

09:14.000 --> 09:26.000
It has been conformant since 2024, but it is only now exposed to users, because of the timing of when Mesa was updated in the previous cycle.

09:26.000 --> 09:29.000
The release was branched just a couple of commits before,

09:29.000 --> 09:32.000
so we missed it by days at the time.

09:33.000 --> 09:45.000
So, well, the most important thing about having a new core version is that it allows applications to avoid checking "do I have that extension enabled?".

09:45.000 --> 09:48.000
You just assume that everything is already available.

09:49.000 --> 09:55.000
Implementing it required us to implement dynamic rendering and the synchronization2

09:55.000 --> 10:00.000
API operations, and extended dynamic state also.

10:00.000 --> 10:08.000
And some improvements that usually come in the maintenance extensions; we have a lot of them, each mixing several small improvements.

10:08.000 --> 10:13.000
We already implemented 24 extensions in this release cycle.

10:13.000 --> 10:20.000
I'm including all of them because they are new since the version that we are currently shipping in Bookworm.

10:20.000 --> 10:23.000
So, well, the ones related to dynamic state,

10:23.000 --> 10:26.000
and the same for depth clamp control.

10:27.000 --> 10:34.000
So, in this case, we are taking advantage of some extensions that are already implemented by other drivers,

10:34.000 --> 10:41.000
and that come almost for free for us, because there is generic code for some of these,

10:41.000 --> 10:47.000
developed as part of the common Mesa support.

10:47.000 --> 10:56.000
For example, a lot of parts of the window system integration extensions that are already part of the

10:56.000 --> 10:59.000
Mesa infrastructure.

10:59.000 --> 11:06.000
And different extensions that are related to, well, access to

11:06.000 --> 11:08.000
storage operations,

11:08.000 --> 11:12.000
and to visibility, which was already interesting for us,

11:12.000 --> 11:15.000
and to getting timing information; work

11:15.000 --> 11:19.000
developed for Vulkan that we were participating in.

11:19.000 --> 11:25.000
And well, the other thing that users are going to notice in some cases is the performance.

11:25.000 --> 11:28.000
If they are interested in running benchmarks,

11:28.000 --> 11:31.000
GFXBench is going to run faster nowadays,

11:31.000 --> 11:33.000
generating more frames.

11:34.000 --> 11:39.000
I'm going to show some of the more interesting improvements that we got in

11:39.000 --> 11:43.000
the period from Bookworm to Trixie, running on the Raspberry Pi 5.

11:43.000 --> 11:45.000
So, this chart has the

11:45.000 --> 11:46.000
different

11:46.000 --> 11:49.000
demos that GFXBench has.

11:49.000 --> 11:54.000
In blue we have the original Bookworm release running the benchmark.

11:54.000 --> 11:58.000
Yellow is the update that we did in the middle of the cycle, in October.

11:59.000 --> 12:06.000
And in red, the version that we already have in Trixie.

12:06.000 --> 12:08.000
So we see that, with this,

12:08.000 --> 12:10.000
now the performance improved in

12:10.000 --> 12:12.000
some of the benchmarks.

12:12.000 --> 12:17.000
In Manhattan there was an increase of more than 200%.

12:17.000 --> 12:18.000
That is good.

12:18.000 --> 12:20.000
I'm going to show

12:20.000 --> 12:22.000
the video of this demo.

12:22.000 --> 12:26.000
We did 60% better in a number of the benchmarks.

12:26.000 --> 12:53.000
[video demo playback; partially inaudible]

12:53.000 --> 12:58.000
There is the performance increase; now we are even a little bit better in Mesa main.

12:58.000 --> 13:02.000
But what is hard, once you have picked the low-hanging

13:02.000 --> 13:07.000
fruit in performance, is to keep getting better after that.

13:07.000 --> 13:13.000
So, well, most of the optimizations need to be focused on the fact that we have

13:13.000 --> 13:17.000
a tile-based renderer in our case.

13:17.000 --> 13:21.000
We did a lot of performance investment on improving the compiler,

13:22.000 --> 13:27.000
and we got around three or four percent of performance improvement from that.

13:27.000 --> 13:34.000
But most of the performance we got was from avoiding the load

13:34.000 --> 13:39.000
and store operations for the tiles. Because, for example, every time

13:39.000 --> 13:42.000
I have a job with multiple draw calls,

13:42.000 --> 13:47.000
and for some reason I need to flush it and create,

13:47.000 --> 13:51.000
for the same framebuffer, another job, and flush it.

13:51.000 --> 13:53.000
And I'm sending both to the GPU.

13:53.000 --> 13:58.000
For the first one, I need to load every tile and store every tile.

13:58.000 --> 14:00.000
And the same for the second one.

14:00.000 --> 14:04.000
If I can merge all of them into the same job, I'm avoiding

14:04.000 --> 14:06.000
one load and one store per tile.

14:06.000 --> 14:07.000
And you can see this happening:

14:07.000 --> 14:10.000
for example, if you force the driver so that on every draw call

14:10.000 --> 14:13.000
you flush, it's really, really slow.

14:13.000 --> 14:15.000
And you see that most of the time,

14:15.000 --> 14:18.000
it's not spent actually rendering.

14:18.000 --> 14:20.000
It's just loading and storing tiles.

14:20.000 --> 14:24.000
Because if you put two draw calls together, it takes the same time

14:24.000 --> 14:27.000
as one, just a little bit more.

14:27.000 --> 14:29.000
What were the main improvements?

14:29.000 --> 14:31.000
The first one was reducing the number of job flushes.

14:31.000 --> 14:34.000
As I commented, when we implemented texture barriers,

14:34.000 --> 14:38.000
we found we were too conservative,

14:38.000 --> 14:41.000
flushing jobs

14:41.000 --> 14:47.000
that were using previously used buffers or textures.

14:47.000 --> 14:51.000
The second one was a lot of compiler optimizations,

14:51.000 --> 14:58.000
reducing maybe 10% of the instructions.

14:58.000 --> 15:01.000
In the end, we are getting in this demo

15:01.000 --> 15:05.000
around 2.5% of performance improvement in execution.

15:05.000 --> 15:09.000
The other one was related to

15:09.000 --> 15:13.000
enabling the early fragment testing

15:13.000 --> 15:15.000
in the case of discard operations.

15:15.000 --> 15:19.000
By default, the driver was completely disabling early fragment tests there.

15:19.000 --> 15:22.000
But even in the case that your shader uses discard,

15:22.000 --> 15:25.000
if it doesn't update the depth or have side effects,

15:25.000 --> 15:26.000
you can enable that.

15:26.000 --> 15:28.000
So you need to check that condition,

15:28.000 --> 15:31.000
and enable this optimization that the hardware can do,

15:31.000 --> 15:36.000
and just avoid doing the rendering for the pixels

15:36.000 --> 15:40.000
that are behind what is already in the depth buffer.

15:40.000 --> 15:45.000
The last one was 11% in average,

15:45.000 --> 15:48.000
but it's 60% in the case of Manhattan.

15:48.000 --> 15:51.000
That is, avoiding loads and stores

15:51.000 --> 15:55.000
in the case where your draw calls

15:55.000 --> 15:57.000
have rasterization disabled.

15:57.000 --> 16:00.000
This usually happens when you use transform feedback.

16:00.000 --> 16:03.000
In that case, you are usually only interested

16:03.000 --> 16:10.000
in executing the geometry stages.

16:10.000 --> 16:12.000
So if you remove the rasterization completely,

16:12.000 --> 16:15.000
because the application asks for that,

16:15.000 --> 16:17.000
you are saving a lot of time.

16:17.000 --> 16:20.000
In the case of Manhattan,

16:20.000 --> 16:22.000
it is using a lot of transform feedback

16:22.000 --> 16:24.000
draws, all without rasterization.

16:24.000 --> 16:27.000
We can see here the different points I was commenting on

16:27.000 --> 16:31.000
and what each one implies,

16:31.000 --> 16:36.000
and we can see that in some benchmarks

16:36.000 --> 16:38.000
the performance gain is huge,

16:38.000 --> 16:40.000
because that's something that the benchmark is doing a lot,

16:40.000 --> 16:42.000
and in other cases,

16:42.000 --> 16:43.000
well, it doesn't have an effect,

16:43.000 --> 16:45.000
because they are not using discards,

16:45.000 --> 16:48.000
for example, or they are not using transform feedback in that way.

16:48.000 --> 16:50.000
So we are also seeing that,

16:50.000 --> 16:52.000
although we have seen here,

16:52.000 --> 16:54.000
200% in some demos,

16:54.000 --> 16:57.000
when we run the general traces that we already have,

16:57.000 --> 16:58.000
that are not part of the benchmark,

16:58.000 --> 17:01.000
we are seeing like 10% of performance improvement

17:01.000 --> 17:04.000
for the general case.

17:04.000 --> 17:08.000
We also did some instrumentation

17:08.000 --> 17:12.000
to make it possible to run Perfetto, that is,

17:12.000 --> 17:16.000
to know where you are losing the time on your GPU,

17:16.000 --> 17:18.000
when you are waiting for fences,

17:18.000 --> 17:21.000
and to know that the CPU is stalled

17:21.000 --> 17:26.000
because you are still waiting for the jobs being rendered on the GPU,

17:26.000 --> 17:32.000
so that helps a lot to understand what the problem in your application was.

17:32.000 --> 17:35.000
So this is a really useful facility.

17:35.000 --> 17:39.000
And the other part that we are proud of,

17:39.000 --> 17:41.000
among the improvements,

17:41.000 --> 17:44.000
which we showcased last year in this devroom,

17:44.000 --> 17:48.000
was the superpages support.

17:49.000 --> 17:52.000
Our hardware, from Raspberry Pi 4 and 5,

17:52.000 --> 17:57.000
has support in the memory management unit

17:57.000 --> 18:02.000
for big pages, which are 64 kilobytes,

18:02.000 --> 18:03.000
and superpages,

18:03.000 --> 18:04.000
which are one megabyte.

18:04.000 --> 18:06.000
This has some implications,

18:06.000 --> 18:09.000
because if you have one megabyte

18:09.000 --> 18:13.000
mapped with 4 KB pages, there is a lot of memory management overhead,

18:13.000 --> 18:18.000
and in that case you need many MMU cache entries for that memory;

18:18.000 --> 18:24.000
using for example superpages, you only need one pointer to the physical address

18:24.000 --> 18:26.000
for all of it,

18:26.000 --> 18:30.000
so you are saving cache entries and that makes things faster.

18:30.000 --> 18:34.000
We are using transparent huge pages for this

18:34.000 --> 18:38.000
in the implementation in the kernel,

18:38.000 --> 18:40.000
and with that

18:40.000 --> 18:42.000
we get a lot of benefit in some particular cases.

18:42.000 --> 18:43.000
In the general case,

18:43.000 --> 18:46.000
we are getting around 1% running the traces,

18:46.000 --> 18:49.000
but in the case of some emulators,

18:49.000 --> 18:51.000
I remember we showcased last year

18:51.000 --> 18:55.000
some PlayStation games running in an emulator,

18:55.000 --> 18:58.000
you see that it goes from not being able to play the game

18:58.000 --> 19:00.000
to being able to play the game,

19:00.000 --> 19:03.000
because they use a lot of resources.

19:08.000 --> 19:09.000
That's all,

19:09.000 --> 19:10.000
summary,

19:10.000 --> 19:14.000
users now are getting a better desktop,

19:14.000 --> 19:18.000
because we have a stable Wayland

19:18.000 --> 19:19.000
compositor,

19:19.000 --> 19:21.000
and it's working quite fine.

19:21.000 --> 19:24.000
There is still some investigation to do about it

19:24.000 --> 19:25.000
here.

19:25.000 --> 19:27.000
The major upgrade

19:27.000 --> 19:30.000
was having Vulkan 1.3 available to the users

19:30.000 --> 19:33.000
without building Mesa from source code.

19:33.000 --> 19:35.000
That is also nice:

19:35.000 --> 19:37.000
we have a lot of different extensions that

19:37.000 --> 19:41.000
developers can already use for OpenGL and Vulkan,

19:41.000 --> 19:44.000
and in the case of performance,

19:44.000 --> 19:49.000
we identified several interesting points that provide

19:49.000 --> 19:52.000
a huge performance gain.

19:52.000 --> 19:54.000
There were a lot of things,

19:54.000 --> 19:57.000
but I didn't include anything that was under 3%,

19:57.000 --> 20:03.000
but there are other things that make things faster also.

20:03.000 --> 20:04.000
For example,

20:04.000 --> 20:07.000
the kernel improvements that we did for superpages,

20:07.000 --> 20:12.000
which in particular cases are really nice to have.

20:12.000 --> 20:14.000
That's all, thank you very much.

20:14.000 --> 20:16.000
I hope you have some questions.

20:17.000 --> 20:22.000
Thank you.

20:22.000 --> 20:24.000
This might be a naive question,

20:24.000 --> 20:28.000
maybe because I haven't touched Mesa since 2020,

20:28.000 --> 20:29.000
but in Nokia

20:29.000 --> 20:34.000
we implemented a buddy allocator for anything smaller than 4K,

20:34.000 --> 20:36.000
so that smaller allocations

20:36.000 --> 20:38.000
would not waste a lot

20:38.000 --> 20:39.000
of memory and time,

20:39.000 --> 20:43.000
and we saw 10% performance on the hardware of that era.

20:44.000 --> 20:48.000
Yes, the comment was that back in Nokia times

20:48.000 --> 20:53.000
they were implementing an allocator to manage the small,

20:53.000 --> 20:54.000
tiny chunks,

20:54.000 --> 20:56.000
small tiny chunks of memory,

20:56.000 --> 20:59.000
and instead of spending a full page size for each of them,

20:59.000 --> 21:02.000
they were doing suballocation for that,

21:02.000 --> 21:08.000
and yes, that improved performance by 10%.

21:08.000 --> 21:12.000
No, Mesa is not doing that for you.

21:12.000 --> 21:15.000
No,

21:15.000 --> 21:16.000
this,

21:16.000 --> 21:19.000
this could just be an option; for example,

21:19.000 --> 21:21.000
when you are creating,

21:21.000 --> 21:22.000
I imagine,

21:22.000 --> 21:23.000
in some cases,

21:23.000 --> 21:24.000
we are creating a command list,

21:24.000 --> 21:26.000
to be submitted to the GPU.

21:26.000 --> 21:28.000
You could probably put more information there,

21:28.000 --> 21:30.000
to only do one submission.

21:30.000 --> 21:31.000
For example,

21:31.000 --> 21:34.000
putting the uniforms at the end of the buffer,

21:34.000 --> 21:37.000
and there are a lot of optimizations that could improve that.

21:37.000 --> 21:40.000
The only thing we are doing, mainly, is

21:40.000 --> 21:42.000
we have a buffer object

21:42.000 --> 21:43.000
cache,

21:43.000 --> 21:45.000
so when allocating a buffer,

21:45.000 --> 21:48.000
a previously used one is reused to avoid the overhead of calling the kernel;

21:48.000 --> 21:49.000
this is the main

21:49.000 --> 21:51.000
allocation optimization ourselves.

21:51.000 --> 21:53.000
Yes, and

21:53.000 --> 21:56.000
if you remove the cache for the BOs,

21:56.000 --> 21:59.000
you see that there is a performance difference.

21:59.000 --> 22:01.000
You say, well, could this

22:01.000 --> 22:03.000
be an issue? Maybe it is, maybe it isn't,

22:03.000 --> 22:06.000
but you see the performance drop there.

22:07.000 --> 22:12.000
Are you using llvmpipe as a source of ideas for optimizations,

22:12.000 --> 22:14.000
or is it too different?

22:14.000 --> 22:17.000
We are not using llvmpipe for that.

22:17.000 --> 22:18.000
The question is,

22:18.000 --> 22:22.000
are we using llvmpipe for new optimizations?

22:22.000 --> 22:25.000
Not in particular

22:25.000 --> 22:27.000
llvmpipe, but we are

22:27.000 --> 22:30.000
following the development of the common framework;

22:30.000 --> 22:32.000
for example, if there is a new

22:32.000 --> 22:35.000
pass that is doing some optimization,

22:35.000 --> 22:39.000
we see whether we can apply it,

22:39.000 --> 22:42.000
and we have a to-do list of things to check,

22:42.000 --> 22:43.000
to see if they are worth it.

22:43.000 --> 22:45.000
For example, last month

22:45.000 --> 22:47.000
there was, I don't remember exactly,

22:47.000 --> 22:50.000
a lowering that was taking advantage of

22:50.000 --> 22:53.000
certain operations.

22:53.000 --> 22:55.000
And we implemented that,

22:55.000 --> 22:56.000
and we were getting,

22:56.000 --> 22:57.000
in some of the benchmarks,

22:57.000 --> 22:58.000
like,

22:58.000 --> 22:59.000
a 2% improvement.

23:00.000 --> 23:01.000
So, you are

23:01.000 --> 23:05.000
also always seeing what the other drivers are doing,

23:05.000 --> 23:08.000
where they differ in the compiler,

23:08.000 --> 23:11.000
or just checking if moving some things around makes things better,

23:11.000 --> 23:14.000
because in some cases you see

23:14.000 --> 23:18.000
lowerings that were there at the beginning and now nobody is using them.

23:18.000 --> 23:20.000
Hmm.

23:20.000 --> 23:23.000
Can you speak to,

23:23.000 --> 23:24.000
or maybe explain why,

23:24.000 --> 23:25.000
descriptor indexing,

23:25.000 --> 23:27.000
do you think it's not supported yet?

23:27.000 --> 23:28.000
Why?

23:28.000 --> 23:29.000
I think

23:29.000 --> 23:30.000
it's the dynamic indexing

23:30.000 --> 23:32.000
of arrays of descriptors,

23:32.000 --> 23:33.000
or one of those things?

23:33.000 --> 23:34.000
Yes.

23:34.000 --> 23:36.000
The question is,

23:36.000 --> 23:40.000
why don't we have descriptor indexing implemented?

23:40.000 --> 23:44.000
I think that we need to rework

23:44.000 --> 23:46.000
how the descriptor sets are managed currently in the driver,

23:46.000 --> 23:49.000
so that implies some work.

23:49.000 --> 23:52.000
My colleague commented that,

23:52.000 --> 23:54.000
well, for this, maybe we need to reimplement that part,

23:54.000 --> 23:56.000
but we haven't managed to do it yet.

23:57.000 --> 23:58.000
It's something that we are

23:58.000 --> 24:00.000
interested in; for example, we are

24:00.000 --> 24:05.000
currently working on dynamic array indexing,

24:05.000 --> 24:06.000
right,

24:06.000 --> 24:08.000
that is requested for WebGL,

24:08.000 --> 24:10.000
because a lot of people are interested

24:10.000 --> 24:11.000
in playing with WebGPU,

24:11.000 --> 24:13.000
and we are looking into that.

24:13.000 --> 24:14.000
One of the solutions is,

24:14.000 --> 24:15.000
yes,

24:15.000 --> 24:17.000
that you need to rework this part of the driver

24:17.000 --> 24:18.000
to implement that,

24:18.000 --> 24:21.000
or maybe we use a different approach to manage it,

24:21.000 --> 24:22.000
you know;

24:22.000 --> 24:24.000
it's an extension that is requested a lot.

24:25.000 --> 24:26.000
So, we have,

24:26.000 --> 24:28.000
we have that in the queue.

24:28.000 --> 24:29.000
Thank you.

24:34.000 --> 24:36.000
Thank you very much.

