WEBVTT

00:00.000 --> 00:15.120
So I'm here because, well, there are too many people out there saying, hey, you should rewrite

00:15.120 --> 00:17.480
your project into Rust.

00:17.480 --> 00:24.720
And I am maintaining a project in C, which is like 25 years old, and I am saying no, and

00:24.720 --> 00:30.720
now I'm going to say why, or maybe why I'm not going to rewrite my project to Rust.

00:30.720 --> 00:38.920
I am not going to be as funny as the one before me, it's impossible.

00:38.920 --> 00:47.280
So if you are expecting something better, you can just go away; there is nothing to see here.

00:47.280 --> 00:52.160
Anyway, if you are not scared, yeah, the quest is simple.

00:52.160 --> 01:04.160
We have a many-years-old C code base, which was written in like 1998, 99 in ISO C, and so

01:04.160 --> 01:09.640
we are expecting to keep this because it runs, and rewriting this completely to

01:09.640 --> 01:15.120
something else means, like, introducing more bugs than fixing.

01:15.120 --> 01:22.120
Yes, we do have memory safety concerns, there are lots of people actually saying,

01:22.120 --> 01:32.280
anything written in C is bad because it's memory unsafe, and I am saying, well, it depends.

01:32.280 --> 01:41.680
If it hasn't crashed for like the last 10 years, I am asking you, where is it unsafe?

01:41.680 --> 01:47.520
On the other hand, we have been improving it ever since, and in the new code there are some problems,

01:47.520 --> 01:55.360
and we are fixing them continuously, but it's still a piece of software, which is rather stable.

01:55.360 --> 02:03.200
So I am asking why should I rewrite something to Rust when it's not broken?

02:03.200 --> 02:07.600
So the requirements are, I want to keep what works.

02:07.600 --> 02:15.720
I don't want to rewrite a whole parser written in C, which has been working for the last 25 years.

02:15.720 --> 02:24.360
I want to automate most of the refactoring I am going to do to make the future updates safe.

02:24.360 --> 02:30.880
I want to lead the developers the way they should code. It's obvious that you can shoot

02:30.880 --> 02:41.920
your own leg in the most creative ways in C, and I am not going to make you not do it.

02:41.960 --> 02:46.160
I am just going to make it harder for you.

02:46.160 --> 02:53.800
I want to allow all the foot guns you want to have, but it will be obvious that you are shooting your leg.

02:53.800 --> 03:02.800
And then on the code review, I'm going to look at it and say, no, that's not going there.

03:02.800 --> 03:10.120
So as I am saying here, it should be hard to write bad code, but I don't like making it impossible

03:10.120 --> 03:18.160
because making the bad code impossible throws out even some good code.

03:18.160 --> 03:24.840
So what I'm expecting to happen with unsafe code is, ideally, obviously, a build error.

03:24.840 --> 03:27.200
It's what Rust has.

03:27.200 --> 03:32.280
Or there can be a static analyzer error, which is something you can just put into your CI,

03:32.280 --> 03:37.440
and then you say, well, it's killed by the static analyzer, so I'm not going to,

03:37.480 --> 03:40.480
I'm not going to include your updates.

03:40.480 --> 03:46.320
It can fail on some unit test error, it can just stick out.

03:46.320 --> 03:56.960
If I see a typecast in plain sight in your changes, I say no, because you should not do typecasts,

03:56.960 --> 04:00.560
but more on that later.

04:00.640 --> 04:08.240
And one of those requirements is if some code is intended to be innocent,

04:08.240 --> 04:16.400
it must actually be innocent, which is a reason why I'm not going to rewrite it into C++.

04:16.400 --> 04:26.160
Because I don't want to study what the plus is doing, whether it's overloaded and where.

04:26.240 --> 04:34.960
So innocent code: the code which looks innocent must also be innocent.

04:34.960 --> 04:39.120
Now the controversial thing, how to do it?

04:39.120 --> 04:43.280
The easy part is, what should be the default?

04:43.280 --> 04:46.960
Basically, if you do something, do it locally.

04:46.960 --> 04:52.760
If you can do it locally, don't rely on anything which is global.

04:52.840 --> 04:55.160
You should put const everywhere.

04:55.160 --> 04:59.880
This is something which Rust has done, and it has done it well.

04:59.880 --> 05:07.240
Doing it the opposite way, making everything mutable marked as mutable,

05:07.240 --> 05:12.040
and everything const is just const by default.

05:12.040 --> 05:15.880
There are other things like functions that are not pure.

05:15.880 --> 05:20.840
I am not against things like side effects.

05:20.840 --> 05:26.680
These are legitimate, but it should have a reason.

05:26.680 --> 05:32.520
What I hate actually is like returning things in the arguments.

05:32.520 --> 05:37.800
And so, because it also obscures what you are doing,

05:37.800 --> 05:43.400
but what I am trying to say is, the code should say what it's doing.

05:43.400 --> 05:51.560
And if you try to write code which shows what you are doing,

05:51.560 --> 05:54.200
then you should be safe.
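
NOTE
Editor's sketch, not from the talk's slides: one way to avoid "returning things in the arguments".
The struct parse_result type and the parse_digits function are hypothetical names chosen for illustration.
The input is const, nothing global is touched, and the status and the value come back together by value.
```c
#include <assert.h>
#include <stddef.h>
/* Hypothetical result type: status and value are returned together. */
struct parse_result {
  int ok;          /* nonzero when parsing succeeded */
  unsigned value;  /* meaningful only when ok is nonzero */
};
/* No globals, no output arguments: the signature says everything the function does. */
static struct parse_result parse_digits(const char *const text)
{
  struct parse_result res = { .ok = 0, .value = 0 };
  if (text == NULL || text[0] == '\0')
    return res;
  for (size_t i = 0; text[i] != '\0'; i++) {
    if (text[i] < '0' || text[i] > '9')
      return res;
    res.value = res.value * 10u + (text[i] - '0');
  }
  res.ok = 1;
  return res;
}
```
A caller writes something like: struct parse_result r = parse_digits("1998"); and checks r.ok before using r.value.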

05:54.200 --> 06:02.600
If you are writing code which is just adhering to specific rules,

06:02.600 --> 06:06.280
it's just adhering to specific rules.

06:06.280 --> 06:11.880
And the last point here, no void pointers anywhere.

06:11.880 --> 06:17.640
If you see any void pointer, just replace it by something else.

06:17.640 --> 06:22.200
There is no single reason to use void pointers.

06:22.200 --> 06:26.360
Apart from things like the return value from the memory allocator.

06:26.360 --> 06:31.240
If you are allocating memory, yes, you get a void pointer, that's okay.

06:31.240 --> 06:35.000
But you should not put a void pointer as an argument

06:35.000 --> 06:41.400
for some object of some size. Not at all, never.

06:42.040 --> 06:45.080
But first we have to get rid of globals.

06:45.080 --> 06:47.160
It's typically a context.

06:47.160 --> 06:50.920
You don't have a reason to look at a global variable,

06:50.920 --> 06:55.000
which has how you should, for example, format time.

06:55.000 --> 06:57.800
It's typically a context.

06:57.800 --> 07:00.120
Or it's global information.

07:00.120 --> 07:04.760
It's something like what the page size is.

07:04.760 --> 07:08.520
Well, yes, then it's read-only global information,

07:08.520 --> 07:10.680
and you are not writing it.

07:10.680 --> 07:13.640
Well, yes, then there is really shared data,

07:13.640 --> 07:16.360
and you probably have to do some locked access.

07:16.360 --> 07:21.640
It should be explicit. More on that later.

07:21.640 --> 07:25.480
This is my favorite part.

07:25.480 --> 07:27.560
You should not use void pointers.

07:27.560 --> 07:30.600
And I will say it several times again.

07:30.600 --> 07:32.120
You can use unions.

07:32.120 --> 07:33.720
Unions are good.

07:33.720 --> 07:35.560
Unions are fine.

07:35.560 --> 07:41.720
And you can now, with C11 and 18 and 23 or 24,

07:41.720 --> 07:46.680
or what's that now, you can use anonymous structures

07:46.680 --> 07:49.000
inside unions, inside structures.

07:49.000 --> 07:52.840
Well, yes, it looks ugly in the beginning.

07:52.840 --> 07:59.880
But you can use a structure, put in there the type saying which piece

07:59.880 --> 08:03.960
of the union is there, and you can use a macro.
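
NOTE
Editor's sketch of the tagged union described here, using C11 anonymous unions and structures.
All names (struct val, VAL_MAKE_INT, VAL_GET_INT and so on) are hypothetical and not taken from the speaker's code.
The macros keep the tag and the payload in sync, so nobody has to reach for a void pointer.
```c
#include <assert.h>
/* The tag says which member of the union is live. */
enum val_type { VAL_INT, VAL_PREFIX };
struct val {
  enum val_type type;
  union {                  /* anonymous union (C11) */
    int i;
    struct {               /* anonymous struct inside the union (C11) */
      unsigned addr;
      unsigned char len;
    };
  };
};
/* Constructors set the tag together with the matching payload. */
#define VAL_MAKE_INT(x)       ((struct val){ .type = VAL_INT, .i = (x) })
#define VAL_MAKE_PREFIX(a, l) ((struct val){ .type = VAL_PREFIX, .addr = (a), .len = (l) })
/* Accessor: asserts the tag before touching the payload. */
#define VAL_GET_INT(v) (assert((v).type == VAL_INT), (v).i)
```
Reading an integer out of a prefix value then trips the assertion instead of silently reinterpreting the bytes.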

08:03.960 --> 08:10.840
And if you are against macros, don't write in C.

08:10.840 --> 08:14.360
C is a language with macros.

08:14.360 --> 08:17.240
And if the macro preprocessor, if the C preprocessor

08:17.240 --> 08:21.400
is not enough for you, hello M4.

08:21.400 --> 08:27.800
Is there anybody who has written some M4 code in like 10 years?

08:27.800 --> 08:32.280
Yes, I have written like a thousand lines of M4,

08:32.280 --> 08:35.720
and it has a big thick warning at the beginning.

08:35.720 --> 08:37.480
Do not read this code.

08:37.480 --> 08:40.440
Do not continue after this line.

08:40.440 --> 08:41.960
Unless you want to get brain damage.

08:45.240 --> 08:46.920
And I am not kidding.

08:46.920 --> 08:48.760
It actually is in the code.

08:48.760 --> 08:49.480
You can Google it.

08:49.480 --> 08:52.520
You can find it.

08:52.520 --> 08:54.520
You can have generated code.

08:54.520 --> 08:59.400
You can have a linked list type for every single thing

08:59.400 --> 09:02.120
you want to put into the linked list.

09:02.120 --> 09:04.760
So if you have one structure and these structures

09:04.760 --> 09:08.200
are put into the linked list, you have a linked list

09:08.200 --> 09:10.040
of these structures as a type.

09:10.040 --> 09:12.840
And then you have another structure and you put

09:12.840 --> 09:14.360
that structure into a linked list.

09:14.360 --> 09:18.440
You have a linked list of that structure as another type.

09:18.440 --> 09:20.920
You have a complete set of functions,

09:20.920 --> 09:25.960
manipulating these data structures for every single data type.
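
NOTE
Editor's sketch of "a linked list type for every single thing", generated by the C preprocessor.
DEFINE_LIST and the route and iface structures are hypothetical; the speaker's project may generate its lists differently.
Each element type is assumed to have its own typed next pointer.
```c
#include <assert.h>
#include <stddef.h>
/* One list type and one set of functions per element type. */
#define DEFINE_LIST(name, elem_type)                                   \
  struct name { elem_type *head; size_t count; };                      \
  static inline void name##_push(struct name *l, elem_type *e)         \
  { e->next = l->head; l->head = e; l->count++; }                      \
  static inline elem_type *name##_pop(struct name *l)                  \
  {                                                                    \
    elem_type *e = l->head;                                            \
    if (e) { l->head = e->next; e->next = NULL; l->count--; }          \
    return e;                                                          \
  }
/* Two unrelated element types. */
struct route { struct route *next; unsigned metric; };
struct iface { struct iface *next; int index; };
DEFINE_LIST(route_list, struct route)
DEFINE_LIST(iface_list, struct iface)
/* Passing a struct iface * to route_list_push is now a compiler diagnostic, not a runtime bug. */
```
The price is more generated code; the gain is that the compiler rejects mixing up the lists.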

09:25.960 --> 09:28.680
Yes, it is greedy.

09:28.680 --> 09:34.120
But we are in the times where the compiler time

09:34.120 --> 09:39.880
is actually much cheaper than the time you spend

09:39.880 --> 09:42.040
debugging these problems.

09:42.040 --> 09:47.320
So please make the compiler do what the compiler should do.

09:47.320 --> 09:52.120
And yes, this is kind of abusing the type system.

09:52.120 --> 09:53.640
Please do it.

09:53.640 --> 09:57.800
This is one of those things C can do.

09:57.800 --> 10:04.120
And please do not typecast plainly anywhere for any reason.

10:04.120 --> 10:08.600
If you want to typecast, you have to write a macro.

10:08.600 --> 10:13.320
And the macro says what you are intending by doing it,

10:13.320 --> 10:17.880
and not only that the macro should actually check

10:17.880 --> 10:22.200
that what you put in is intended.

10:22.200 --> 10:23.240
It is possible.

10:23.240 --> 10:27.720
And I don't have it in this presentation, but I have it in the code.

10:28.120 --> 10:32.760
It is possible to check that the specific pointer you are putting in

10:32.760 --> 10:37.000
is actually of the type you are expecting to be put in.

10:37.000 --> 10:40.840
Yes, it is possible to write type checking macros.

10:40.840 --> 10:46.440
And if it makes sense, you should do it.
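
NOTE
Editor's sketch of a type-checking cast macro. The talk says such macros exist in the speaker's code but does not show them,
so this is only one possible approach, built on C11 _Generic. ROUTE_FROM_NODE and both structures are hypothetical names.
The cast lives in one named place, and the macro refuses any argument that is not a struct node pointer.
```c
#include <assert.h>
#include <stddef.h>
struct node { struct node *next; };
struct route { unsigned metric; struct node n; };
/* Compiles only when p is a struct node *; any other type has no matching branch. */
#define ROUTE_FROM_NODE(p)                                        \
  _Generic((p), struct node *:                                    \
    (struct route *)((char *)(p) - offsetof(struct route, n)))
```
Calling ROUTE_FROM_NODE with a void pointer or a struct route pointer fails at build time, which is the intended outcome.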

10:46.440 --> 10:52.680
There are other things one should do like acquiring

10:52.680 --> 10:55.800
and managing locally acquired resources.

10:55.800 --> 11:00.360
Well, you acquire anything and then you return.

11:00.360 --> 11:02.120
Well, no, please.

11:02.120 --> 11:05.160
It should be... the explicit releasing

11:05.160 --> 11:06.280
makes some problems.

11:06.280 --> 11:09.400
So you can have some cleanup hooks.

11:09.400 --> 11:11.560
It is not a new feature.

11:11.560 --> 11:15.400
It is a thing not yet standardized in C because it

11:15.400 --> 11:21.640
got into some very nasty problems inside the committee.

11:21.640 --> 11:27.160
But every compiler is okay with cleanup hooks,

11:27.160 --> 11:33.960
which are executed when a variable gets out of scope.
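
NOTE
Editor's sketch of a cleanup hook, assuming the GCC and Clang cleanup variable attribute (an extension, not ISO C).
CLEANUP, release_buffer and first_byte_after_fill are hypothetical names; the counter exists only so the effect can be observed.
```c
#include <assert.h>
#include <stdlib.h>
static int buffers_released;  /* demonstration-only counter */
/* The hook receives a pointer to the variable that is leaving scope. */
static void release_buffer(char **buf)
{
  free(*buf);
  *buf = NULL;
  buffers_released++;
}
/* Convenience wrapper around the compiler extension. */
#define CLEANUP(fn) __attribute__((cleanup(fn)))
static int first_byte_after_fill(size_t len, char fill)
{
  CLEANUP(release_buffer) char *buf = malloc(len);
  if (buf == NULL || len == 0)
    return -1;        /* hook still runs; free(NULL) is a no-op */
  for (size_t i = 0; i < len; i++)
    buf[i] = fill;
  return buf[0];      /* hook runs after the return value is computed */
}
```
Every return path releases the buffer, so an early return cannot leak it.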

11:33.960 --> 11:35.800
You can have end of task hooks.

11:35.800 --> 11:40.600
Hooks that are run just before you enter the poll again.

11:40.600 --> 11:43.400
You can have different times for different allocation

11:43.400 --> 11:47.720
scopes and you just have to move the data from the local

11:47.720 --> 11:50.520
allocated resource to the global allocated resource,

11:50.520 --> 11:54.040
either basically by explicitly copying,

11:54.040 --> 12:02.280
to make it explicit that now you are writing something global.

12:02.280 --> 12:05.720
And yes, you should mark stack-allocated data,

12:05.720 --> 12:08.520
and alloca-allocated data is the worst one.

12:08.520 --> 12:10.200
At least please mark them.

12:10.200 --> 12:14.680
I don't see much, much of better things.

12:14.760 --> 12:16.680
Yeah, this is some example.

12:16.680 --> 12:21.720
This is how one can use an unlock macro.

12:21.720 --> 12:24.680
This basically is a function, it's a dummy function.

12:24.680 --> 12:28.360
It gets a size from some table.

12:28.360 --> 12:30.520
And the table has its lock inside.

12:30.520 --> 12:34.200
So to get the information you have to lock.

12:34.200 --> 12:39.560
So there is a macro doing some weird things

12:39.560 --> 12:47.720
with the locking. The thing to look at is the return

12:47.720 --> 12:51.160
inside the locking and the return is safe.

12:51.160 --> 12:56.760
You can return from the locked context because the lock

12:56.760 --> 12:59.080
is released automatically.

12:59.080 --> 13:05.560
So this means you can just let it go.

13:05.560 --> 13:09.560
The implementation is quite scary.

13:09.560 --> 13:12.520
One has to look through it and read through it.

13:12.520 --> 13:14.440
It takes some time in the beginning.

13:14.440 --> 13:21.640
When one actually gets the grasp of what it is doing,

13:21.640 --> 13:25.640
yes, then you can use it and you can be sure that it works.

13:25.640 --> 13:30.520
And there is, yes, this is one part, which basically

13:30.520 --> 13:35.080
abuses a for cycle to do the block,

13:35.080 --> 13:38.040
to do the block semantics.

13:38.040 --> 13:41.720
And there is also the cleanup function which is called

13:41.720 --> 13:44.760
when the block is left.

13:44.760 --> 13:50.680
And this is basically, you can see and here it is doing the

13:50.680 --> 13:53.080
object lock simple.

13:53.080 --> 13:59.400
And the unlocking is actually done in the cleanup function.

13:59.400 --> 14:03.800
And this is typically put in one place together in some

14:03.800 --> 14:07.560
header file.
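
NOTE
Editor's sketch of the technique described here: a for cycle abused for block semantics, plus a cleanup function that unlocks.
This is not the code from the slide. TABLE_LOCKED, struct table and the flag-based toy lock are hypothetical,
and the cleanup attribute is a GCC and Clang extension. A real implementation would use an actual mutex.
```c
#include <assert.h>
#include <stddef.h>
/* Toy lock: a flag instead of a real mutex, to keep the sketch self-contained. */
struct table { int locked; size_t size; };
static struct table *table_lock(struct table *t)
{
  assert(!t->locked);
  t->locked = 1;
  return t;
}
/* Cleanup hook: called with a pointer to the loop variable when the block is left. */
static void table_unlock(struct table **tp)
{
  (*tp)->locked = 0;
}
/* The for cycle runs its body exactly once; the cleanup attribute does the unlocking. */
#define TABLE_LOCKED(t, name)                                          \
  for (struct table *name __attribute__((cleanup(table_unlock))) =     \
         table_lock(t), *name##_once = name;                           \
       name##_once != NULL; name##_once = NULL)
static size_t table_get_size(struct table *t)
{
  TABLE_LOCKED(t, tl) {
    return tl->size;   /* returning from inside the locked block is safe */
  }
  return 0;            /* not reached; keeps compilers quiet */
}
```
The lock is released whether the block ends normally, by break, or by return.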

14:07.560 --> 14:10.360
Yes, then there are some memory allocation strategies.

14:10.360 --> 14:13.400
You should use what fits your project.

14:13.400 --> 14:14.200
Do not be bad to me.

14:19.080 --> 14:21.800
In BIRD, what we are using... yeah, I have not

14:21.800 --> 14:23.640
said that, but I'll return to that.

14:23.640 --> 14:27.080
In BIRD, we are using hierarchical pools.

14:27.080 --> 14:29.000
And the pools are keeping track of everything.

14:29.000 --> 14:35.720
So if you walk from a root place, you can traverse all the

14:35.720 --> 14:39.400
allocated memory and show what's allocated and what's

14:39.400 --> 14:40.040
where.

14:40.040 --> 14:41.560
There is a temporary allocation.

14:41.560 --> 14:43.400
It just gets freed at the end of the task.

14:43.400 --> 14:46.120
You do the tmp_alloc and then it gets freed.

14:46.120 --> 14:48.920
You don't have to worry.
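
NOTE
Editor's sketch of a hierarchical pool, to make the idea concrete. It is NOT BIRD's resource API:
pool_new, pool_alloc, pool_count and pool_free are hypothetical names and the design is simplified.
Every allocation is tracked by its pool, pools form a tree, and freeing a pool frees everything below it.
```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
/* Header in front of every allocation; the union keeps the payload suitably aligned. */
union block { union block *next; max_align_t align; };
struct pool {
  struct pool *parent, *children, *sibling;
  union block *blocks;
  size_t block_count;
};
static struct pool *pool_new(struct pool *parent)
{
  struct pool *p = calloc(1, sizeof *p);
  if (p == NULL)
    return NULL;
  p->parent = parent;
  if (parent != NULL) {
    p->sibling = parent->children;
    parent->children = p;
  }
  return p;
}
static void *pool_alloc(struct pool *p, size_t size)
{
  union block *b = malloc(sizeof *b + size);
  if (b == NULL)
    return NULL;
  b->next = p->blocks;
  p->blocks = b;
  p->block_count++;
  return b + 1;   /* payload starts right after the header */
}
/* Walk from any pool downward and count every live allocation. */
static size_t pool_count(const struct pool *p)
{
  size_t n = p->block_count;
  for (const struct pool *c = p->children; c != NULL; c = c->sibling)
    n += pool_count(c);
  return n;
}
/* Freeing a pool frees its blocks and, recursively, all of its children. */
static void pool_free(struct pool *p)
{
  while (p->children != NULL)
    pool_free(p->children);   /* each child unlinks itself below */
  while (p->blocks != NULL) {
    union block *b = p->blocks;
    p->blocks = b->next;
    free(b);
  }
  if (p->parent != NULL) {
    struct pool **link = &p->parent->children;
    while (*link != p)
      link = &(*link)->sibling;
    *link = p->sibling;
  }
  free(p);
}
```
A per-task pool created under the root and freed when the task ends gives the "you don't have to worry" behaviour for temporary data.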

14:48.920 --> 14:52.920
You can also get temporarily some global resources.

14:52.920 --> 14:54.520
And it's the same principle.

14:54.520 --> 14:58.440
You reference it and immediately schedule a release task

14:58.520 --> 15:01.240
to be done at the end of the task.

15:01.240 --> 15:03.720
So you know, it's safe to store.

15:03.720 --> 15:09.400
It's not safe to... yeah, I'll skip, skip, skip.

15:09.400 --> 15:11.240
Yes, you can see it in BIRD.

15:11.240 --> 15:13.480
You can see it in libucw.

15:13.480 --> 15:14.760
This is myself.

15:14.760 --> 15:15.320
Thank you.

