WEBVTT

00:00.000 --> 00:10.000
Jim will be talking about a programming language perspective on CRDTs

00:10.000 --> 00:14.000
and replication. It's super high level, super interesting. Let's go.

00:14.000 --> 00:16.000
Let's welcome him.

00:16.000 --> 00:23.000
So this talk comes from the scientific environment.

00:23.000 --> 00:27.000
I can give a perspective as a postdoctoral researcher at the university.

00:27.000 --> 00:32.000
So I'm bringing the programming language perspective.

00:32.000 --> 00:35.000
So we're talking about replication,

00:35.000 --> 00:37.000
and about these replicated data types.

00:37.000 --> 00:40.000
So I will be talking in two parts.

00:40.000 --> 00:44.000
First of all, we'll look at the problems that we face,

00:45.000 --> 00:48.000
and how we have attempted to solve them in our research team.

00:48.000 --> 00:52.000
And then I'll go a bit into my personal framework,

00:52.000 --> 00:55.000
in which I've been able to experiment with these things.

00:55.000 --> 00:59.000
So where do a lot of these problems actually come from?

00:59.000 --> 01:01.000
We want to make our systems robust.

01:01.000 --> 01:03.000
Well, to get that,

01:03.000 --> 01:05.000
we need a distributed system.

01:05.000 --> 01:07.000
A lot of these are peer-to-peer systems.

01:07.000 --> 01:10.000
We want them to be fault-tolerant and partition-tolerant,

01:10.000 --> 01:13.000
so we turn them into replicated systems, right?

01:13.000 --> 01:17.000
But, actually, in reality, things always fail, right?

01:17.000 --> 01:20.000
We always have to ensure that our system remains correct.

01:20.000 --> 01:22.000
If we lose something, what happens?

01:22.000 --> 01:24.000
Do we lose our data with it?

01:24.000 --> 01:27.000
What that means is that we have to replicate the system state,

01:27.000 --> 01:28.000
so that ideally every replica looks the same.

01:28.000 --> 01:31.000
So that's what we would like to have.

01:31.000 --> 01:34.000
But the reality, when the data is replicated,

01:34.000 --> 01:35.000
is not like this.

01:35.000 --> 01:36.000
The consistency is weaker.

01:36.000 --> 01:37.000
You don't just get it, right?

01:37.000 --> 01:40.000
Making the replicated setting strongly consistent

01:40.000 --> 01:42.000
is expensive in practice.

01:42.000 --> 01:44.000
It requires coordination.

01:44.000 --> 01:46.000
But it is not impossible.

01:46.000 --> 01:49.000
I'm going to pick out several of these problems,

01:49.000 --> 01:51.000
and how we approach them in our work.

01:51.000 --> 01:53.000
So the first problem is:

01:53.000 --> 01:56.000
selecting the right CRDT for your application.

01:56.000 --> 02:00.000
So, if you're building an application, you have to ask yourself:

02:00.000 --> 02:04.000
do we want strong consistency, where we synchronize all data?

02:04.000 --> 02:07.000
Do we want the strong guarantees of consistency?

02:07.000 --> 02:11.000
Or can we tolerate a bit of delay in our synchronization,

02:11.000 --> 02:14.000
with eventual consistency, where replicas may temporarily diverge

02:14.000 --> 02:16.000
but eventually converge?

02:16.000 --> 02:21.000
Or do we need just causal consistency, which is different again?

02:21.000 --> 02:22.000
[inaudible]

02:22.000 --> 02:24.000
Thank you very much.

02:24.000 --> 02:33.000
All right, I'll just recap the slide.

02:33.000 --> 02:35.000
Just for the online audience.

02:35.000 --> 02:38.000
So basically, selecting the right CRDT is difficult.

02:38.000 --> 02:41.000
We have to select the right consistency level.

02:41.000 --> 02:43.000
And based on that, we also have to think,

02:43.000 --> 02:46.000
what invariants in our application we have to guarantee.

02:46.000 --> 02:48.000
What does our application need?

02:48.000 --> 02:51.000
And often this is very specific to what you're going to do.

02:51.000 --> 02:53.000
So one of the things that we've been developing,

02:53.000 --> 02:56.000
and actually this was done by Kevin, who gave a presentation earlier,

02:56.000 --> 02:58.000
is that we let the computer decide.

02:58.000 --> 03:01.000
Basically, you encode, in a formal specification,

03:01.000 --> 03:04.000
in a high-level language, what you need,

03:04.000 --> 03:06.000
what the semantics of your data type have to be,

03:06.000 --> 03:09.000
what invariants you have to uphold,

03:09.000 --> 03:11.000
and we run computations on this.

03:11.000 --> 03:14.000
So basically, what happens is you give a Scala-like spec.

03:14.000 --> 03:18.000
We let Z3 run on this, together with some analysis tools.

03:18.000 --> 03:20.000
It gives you back a bunch of results.

03:20.000 --> 03:23.000
Based on this, we can automatically generate

03:23.000 --> 03:25.000
Scala code, JavaScript code, or other code,

03:25.000 --> 03:27.000
which will allow you to synchronize your data.

03:27.000 --> 03:31.000
We also have a proof that whatever data type

03:31.000 --> 03:34.000
the computer decides to generate out of this is correct.

03:34.000 --> 03:38.000
So a really cool part of this is that it allows you

03:38.000 --> 03:42.000
to offload a bit of the thinking about how to design correct

03:42.000 --> 03:46.000
CRDTs, replicated data types, or strongly consistent data types.

03:46.000 --> 03:49.000
Because not only can it verify and generate CRDTs,

03:49.000 --> 03:53.000
it can also tell you: for this replicated data type,

03:53.000 --> 03:56.000
you cannot have a CRDT, because we will never be able

03:56.000 --> 03:59.000
to uphold this invariant, and you will have to synchronize at this point.

03:59.000 --> 04:03.000
So this gives the developer a bit of flexibility,

04:03.000 --> 04:07.000
and lets them be more confident about what they have.

04:07.000 --> 04:09.000
So that's important.
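The convergence checks such a tool performs can be pictured in miniature. This is not the verification pipeline described above (that one uses a solver and a Scala-like spec); it is just a hand-rolled TypeScript property check on a hypothetical grow-only counter, showing the kind of merge properties (commutativity, associativity, idempotence) that make a state-based CRDT converge:

```typescript
// A grow-only counter state: one slot per replica.
type GCounter = number[];

// Merge takes the per-replica maximum; this is the join of the two states.
const merge = (a: GCounter, b: GCounter): GCounter =>
  a.map((v, i) => Math.max(v, b[i]));

const eq = (a: GCounter, b: GCounter): boolean =>
  a.every((v, i) => v === b[i]);

// Enumerate all small two-replica states and check the join properties.
const states: GCounter[] = [];
for (let x = 0; x < 3; x++)
  for (let y = 0; y < 3; y++) states.push([x, y]);

let ok = true;
for (const a of states)
  for (const b of states) {
    if (!eq(merge(a, b), merge(b, a))) ok = false;   // commutative
    if (!eq(merge(a, a), a)) ok = false;             // idempotent
    for (const c of states)
      if (!eq(merge(merge(a, b), c), merge(a, merge(b, c)))) ok = false; // associative
  }
```

A real verifier proves these properties symbolically for all states rather than testing a finite sample.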

04:09.000 --> 04:13.000
The second part is that in a replicated data type,

04:13.000 --> 04:16.000
you often have all these events happening concurrently, right?

04:16.000 --> 04:19.000
This is why you need CRDTs, or some other mechanism,

04:19.000 --> 04:20.000
to get convergence.

04:20.000 --> 04:23.000
But one of the ways that you implement a CRDT

04:23.000 --> 04:25.000
is by tracking causality, for example,

04:25.000 --> 04:27.000
tracking what links you have between events,

04:27.000 --> 04:30.000
what operations happened before other events.

04:30.000 --> 04:32.000
And so this actually leads to a problem,

04:32.000 --> 04:35.000
because you have to kind of track this metadata

04:35.000 --> 04:38.000
inside of your replicated state, so either as data,

04:38.000 --> 04:42.000
or often, when you are transmitting packets over the network,

04:42.000 --> 04:44.000
you want to attach logical timestamps

04:44.000 --> 04:47.000
to the data, and all of these data annotations

04:47.000 --> 04:49.000
might grow with the number of events in your system.

04:49.000 --> 04:53.000
So basically, we have a data management problem, right?

04:53.000 --> 04:57.000
So what we've been doing is looking at ways

04:57.000 --> 05:00.000
that we can have distributed garbage collection

05:00.000 --> 05:03.000
in CRDTs, that can happen in the background

05:03.000 --> 05:05.000
and makes them more efficient,

05:05.000 --> 05:07.000
removing metadata when it is no longer needed.

05:07.000 --> 05:09.000
And also Tomas here has been working

05:09.000 --> 05:12.000
on delta vector clocks, which allow us to have

05:12.000 --> 05:14.840
an efficient representation of causal timestamps

05:14.840 --> 05:18.440
which are replicated in a network.
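As a concrete picture of the causality metadata being discussed, a plain vector clock can be sketched as follows. This is a minimal illustration, not Flec's implementation; note that the map carries one entry per replica, which is exactly the growth problem that delta and compressed representations try to tame:

```typescript
// A vector clock: one logical counter per replica.
type VectorClock = Map<string, number>;

// Record a local event at the given replica (returns a fresh clock).
function tick(clock: VectorClock, replica: string): VectorClock {
  const next = new Map(clock);
  next.set(replica, (next.get(replica) ?? 0) + 1);
  return next;
}

// a happened before b iff every entry of a is <= b and at least one is strictly smaller.
function happenedBefore(a: VectorClock, b: VectorClock): boolean {
  let strictlySmaller = false;
  for (const [replica, count] of a) {
    if (count > (b.get(replica) ?? 0)) return false;
  }
  for (const [replica, count] of b) {
    if ((a.get(replica) ?? 0) < count) strictlySmaller = true;
  }
  return strictlySmaller;
}

// Two events are concurrent when neither happened before the other.
function concurrent(a: VectorClock, b: VectorClock): boolean {
  return !happenedBefore(a, b) && !happenedBefore(b, a);
}
```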

05:18.440 --> 05:22.440
Finally, designing a sensible CRDT is hard.

05:22.440 --> 05:25.240
So earlier, you saw that we can let

05:25.240 --> 05:27.760
the computer decide how to make CRDTs.

05:27.760 --> 05:29.920
But we also have to keep in mind that if we want

05:29.920 --> 05:34.200
to have a general set of CRDTs that people can directly use,

05:34.200 --> 05:35.720
we have to think a bit about the semantics

05:35.720 --> 05:36.960
that we give to the end user.

05:36.960 --> 05:39.760
Take, for example, a set.

05:39.760 --> 05:43.040
You cannot replicate it directly, because if you receive

05:43.040 --> 05:44.920
add or remove operations in different orders,

05:44.920 --> 05:46.280
you get different results.
05:46.280 --> 05:48.960
So this is why you have CRDTs that are, for example,

05:48.960 --> 05:53.400
add-wins CRDTs, remove-wins CRDTs, or last-writer-wins CRDTs.

05:53.400 --> 05:56.320
They allow you to force a certain ordering

05:56.320 --> 05:59.600
that will make your system converge, right?
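One of the orderings just mentioned, last-writer-wins, can be sketched in a few lines. This is a generic illustration, not code from the talk's framework: ties between concurrent writes are broken deterministically by a (timestamp, replica id) pair, so every replica resolves the conflict the same way:

```typescript
// State of a last-writer-wins register.
type LwwState<T> = { value: T; timestamp: number; replicaId: string };

// Deterministic total order: later timestamp wins; ties broken by replica id.
function mergeLww<T>(a: LwwState<T>, b: LwwState<T>): LwwState<T> {
  if (a.timestamp !== b.timestamp) return a.timestamp > b.timestamp ? a : b;
  return a.replicaId > b.replicaId ? a : b;
}

// A local write either initializes the register or merges against it.
function write<T>(
  state: LwwState<T> | null,
  value: T,
  timestamp: number,
  replicaId: string
): LwwState<T> {
  const incoming = { value, timestamp, replicaId };
  return state === null ? incoming : mergeLww(state, incoming);
}
```

Because the tie-break is deterministic, merging in either order yields the same winner, which is all convergence requires; whether the winner is the value the user meant is a separate question, as the next example shows.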

05:59.600 --> 06:02.000
But even if you pick one of these, implementing them

06:02.000 --> 06:04.720
can still lead to strange behavior.

06:04.720 --> 06:06.520
Imagine we have two replicas.

06:06.520 --> 06:08.960
And then when we add an item, and remove it,

06:08.960 --> 06:11.680
and we do exactly the same in another replica.

06:11.680 --> 06:13.640
Now, say we have add-wins semantics:

06:13.640 --> 06:15.160
in a concurrent context,

06:15.160 --> 06:17.800
we will let adds win over removes.

06:17.800 --> 06:20.560
That would mean that in this particular case,

06:20.560 --> 06:26.200
if we merge everything, we might end up with the item

06:26.200 --> 06:28.760
still in the set, because the concurrent adds

06:28.760 --> 06:30.640
would cancel out the removes.

06:30.640 --> 06:34.080
But sequentially, actually, we have removed the element.

06:34.080 --> 06:36.160
So depending on your interpretation,

06:36.160 --> 06:37.960
you can get to very strange results

06:37.960 --> 06:41.800
that might not make sense to the end user.
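The anomaly just described can be replayed in code: the same operation history, interpreted under two different "add-wins" rules, converges to two different answers. This is an illustrative sketch only; the operation encoding and both rules are simplified stand-ins, not the framework's implementation:

```typescript
type Clock = Map<string, number>;
type Op = {
  kind: "add" | "remove";
  elem: string;
  clock: Clock;           // vector timestamp of the operation
  tag?: string;           // adds: unique identifier
  observed?: Set<string>; // removes: add-tags seen locally when issued
};

// a happened before b in vector-clock order.
function before(a: Clock, b: Clock): boolean {
  let strict = false;
  for (const [r, n] of a) if (n > (b.get(r) ?? 0)) return false;
  for (const [r, n] of b) if ((a.get(r) ?? 0) < n) strict = true;
  return strict;
}

// Naive add-wins: any remove concurrent with an add of the same element is
// dropped entirely; surviving removes kill causally earlier adds.
function naiveAddWins(log: Op[]): Set<string> {
  const adds = log.filter(o => o.kind === "add");
  const removes = log.filter(o => o.kind === "remove").filter(rm =>
    !adds.some(ad => ad.elem === rm.elem &&
      !before(ad.clock, rm.clock) && !before(rm.clock, ad.clock)));
  const present = new Set<string>();
  for (const ad of adds)
    if (!removes.some(rm => rm.elem === ad.elem && before(ad.clock, rm.clock)))
      present.add(ad.elem);
  return present;
}

// Observed-remove: a remove cancels exactly the add-tags it observed.
function observedRemove(log: Op[]): Set<string> {
  const killed = new Set<string>();
  for (const o of log)
    if (o.kind === "remove") for (const t of o.observed ?? []) killed.add(t);
  const present = new Set<string>();
  for (const o of log)
    if (o.kind === "add" && !killed.has(o.tag!)) present.add(o.elem);
  return present;
}
```

On the history from the slide (each replica adds x then removes it, concurrently with the other), the naive rule keeps x in the merged set while the observed-remove rule drops it; both interpretations converge, but to different answers, and neither matches every user's sequential intuition.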

06:41.800 --> 06:44.440
So building on the automatic generation

06:44.440 --> 06:47.080
of what we saw in the first problem,

06:47.080 --> 06:49.480
we've been also working on an extension of this,

06:49.480 --> 06:52.880
where we are actually trying to create building blocks,

06:52.880 --> 06:55.520
different building blocks that we can provide

06:55.520 --> 06:57.640
to the end user to build replicated data types,

06:57.640 --> 07:01.480
where you first analyze, for example,

07:01.480 --> 07:04.800
the data type: you automatically detect

07:04.800 --> 07:06.280
the conflicting behaviors

07:06.280 --> 07:09.240
and the different properties of your sequential implementation.

07:09.240 --> 07:11.440
And then you propose to the user: look,

07:11.440 --> 07:14.720
I found five ways that I can produce a CRDT,

07:14.720 --> 07:16.920
and these will be their behaviors, right?

07:16.920 --> 07:19.880
And so we can give these components,

07:19.880 --> 07:22.520
and users can pick what fits best.

07:22.520 --> 07:24.480
In the first version that I showed,

07:24.480 --> 07:27.160
the computer will give you something that solves the problem.

07:27.160 --> 07:30.200
But here, we're actually trying to give the end user

07:30.200 --> 07:32.680
the ability to pick a bit and see what fits best

07:32.680 --> 07:33.960
to their application.

07:33.960 --> 07:37.000
And the reality is that what the computer thinks is best

07:37.000 --> 07:39.560
may not be the best fit, but at least we give them

07:39.560 --> 07:41.360
a way to envision this.

07:41.360 --> 07:43.080
So this work, excitingly enough,

07:43.080 --> 07:45.960
we're doing this also with two important figures

07:45.960 --> 07:47.920
who worked on the original CRDT paper:

07:47.920 --> 07:50.320
Carlos Baquero and Nuno Preguiça.

07:50.320 --> 07:53.120
So we are really getting important feedback from them

07:53.120 --> 07:55.640
as well on doing this correctly.

07:57.840 --> 08:00.560
Finally, on the understanding problem:

08:00.560 --> 08:02.320
well, understanding CRDTs is hard,

08:02.320 --> 08:03.360
but it doesn't have to be.

08:03.360 --> 08:05.600
If you're a master's student or a PhD student,

08:05.600 --> 08:07.840
you can go to the summer school on distributed

08:07.840 --> 08:12.000
and replicated environments, where you can get direct classes

08:12.000 --> 08:16.280
from some of the experts in the domain.

08:16.280 --> 08:18.720
So what I want to say about this first part

08:18.720 --> 08:23.640
is basically that, well, consistency can be easy

08:23.640 --> 08:25.440
if you have the right components.

08:25.440 --> 08:28.560
And so this is the research that we're doing at the university.

08:28.560 --> 08:31.400
We're looking at ways we can make CRDTs more efficient,

08:31.400 --> 08:34.400
and at ways we can target new settings.

08:34.400 --> 08:37.160
for example, imagine that we are in a system

08:37.160 --> 08:39.720
where we transmit packets over LoRa.

08:39.720 --> 08:41.640
We have very little bandwidth, right?

08:41.640 --> 08:45.080
We cannot just broadcast data at random

08:45.080 --> 08:48.080
so we really have to consider how we package our data,

08:48.080 --> 08:49.680
what representation we use,

08:49.680 --> 08:51.280
how our causality information is encoded,

08:51.280 --> 08:56.120
so that our data structures are as efficient as possible.

08:56.120 --> 08:58.080
So this finishes the first part,

08:58.080 --> 09:02.240
where I gave a bit of a view of the problems.

09:02.240 --> 09:03.960
And now I will go into Flec,

09:03.960 --> 09:08.120
which is my personal laboratory for experimenting with CRDTs.

09:08.120 --> 09:13.120
So I started my research on CRDTs actually back in 2018.

09:13.120 --> 09:17.480
And I was using a lot of Lua code to implement my CRDTs.

09:17.480 --> 09:19.160
I had a small little framework.

09:19.160 --> 09:22.080
And later, it actually evolved into a TypeScript-based

09:22.080 --> 09:25.040
implementation, because I wanted to use

09:25.040 --> 09:28.040
AssemblyScript to compile this to WebAssembly.

09:28.040 --> 09:31.400
And we were running this on embedded microcontrollers.

09:31.400 --> 09:35.160
In the end, we just focused on the TypeScript.

09:35.160 --> 09:36.000
And so what is this?

09:36.000 --> 09:38.400
What Flec actually is, is a small tool

09:38.400 --> 09:40.760
which gives you several libraries.

09:40.760 --> 09:43.600
Let's say a causal delivery middleware,

09:43.600 --> 09:45.840
and a CRDT library

09:45.840 --> 09:47.520
where you can implement state-based CRDTs

09:47.520 --> 09:48.960
and operation-based CRDTs.

09:48.960 --> 09:50.600
And we have this important layer called

09:50.600 --> 09:52.640
the pure operation-based CRDT layer.

09:52.640 --> 09:56.520
So pure operation-based CRDTs are a principled, non-ad-hoc way

09:56.520 --> 09:59.120
of designing operation-based CRDTs.

09:59.120 --> 10:01.360
And what's really nice is that we can reason easily

10:01.360 --> 10:05.960
about stuff like when we should drop certain metadata

10:05.960 --> 10:07.600
in the network.

10:07.600 --> 10:11.200
So this framework is built on something I call TSAT,

10:11.200 --> 10:15.200
which is actually an implementation of what we call

10:15.200 --> 10:18.400
ambient-oriented programming, which was popular

10:18.400 --> 10:22.320
around 2009, on top of TypeScript.

10:22.320 --> 10:24.000
So I'll give a bit of an example of that

10:24.000 --> 10:26.280
and show you how we use this.

10:26.280 --> 10:29.080
So the most basic example that I can show you

10:29.080 --> 10:32.160
with not Flec itself, but just TSAT,

10:32.160 --> 10:34.960
is that we're going to have a ping-pong application.

10:34.960 --> 10:37.280
So on one side, we can make a new actor.

10:37.280 --> 10:39.880
And we can export an object, a ping-pong object

10:39.880 --> 10:40.600
on the network.

10:40.600 --> 10:42.400
So here you see ping-pong.

10:42.400 --> 10:45.320
And so these nodes can be using WebRTC

10:45.320 --> 10:46.440
to connect peer-to-peer.

10:46.440 --> 10:50.240
You can swap the middleware layer out a bit.

10:50.240 --> 10:52.880
You could even add a Bluetooth layer if you wanted.

10:52.880 --> 10:55.400
But the idea is that you have a full mesh network

10:55.400 --> 10:58.600
where nodes can discover all objects exported by the other

10:58.600 --> 10:59.800
nodes.

10:59.800 --> 11:02.400
And so that's exactly what we do in this code.

11:02.400 --> 11:04.240
We're discovering, we have an actor,

11:04.240 --> 11:06.280
which is discovering a ping-pong object.

11:06.280 --> 11:08.440
And what we get here is a reference.

11:08.440 --> 11:10.400
And what's really cool is that this reference is actually

11:10.400 --> 11:11.800
a far reference.

11:11.800 --> 11:14.760
And any operation that would be originally synchronous

11:14.760 --> 11:16.800
on the other side is automatically

11:17.360 --> 11:19.400
translated into an asynchronous call.

11:19.400 --> 11:20.360
Even at the type level.

11:20.360 --> 11:22.240
So here you would see the reference,

11:22.240 --> 11:24.840
and you would get the full type of it.

11:24.840 --> 11:29.240
But let's say if the original sequential implementation

11:29.240 --> 11:32.720
would return a string, here we will get a promise of a string.

11:32.720 --> 11:39.000
So we can safely type and implement asynchronous applications.
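The far-reference idea just described can be sketched with a mapped type and a proxy. This is a hedged illustration, not TSAT's actual API: the names `Farify` and `farRef` are hypothetical, and the proxy here is a local stand-in for the real network middleware.

```typescript
// A remote object's methods become asynchronous at the type level:
// a method returning R is seen through the reference as returning Promise<R>.
type Farify<T> = {
  [K in keyof T]: T[K] extends (...args: infer A) => infer R
    ? (...args: A) => Promise<Awaited<R>>
    : never;
};

class PingPong {
  ping(): string {
    return "pong";
  }
}

// Local stand-in for the middleware: wrap an object so every method call
// goes through a promise, as a network proxy would.
function farRef<T extends object>(obj: T): Farify<T> {
  return new Proxy({} as Farify<T>, {
    get(_target, prop) {
      return (...args: unknown[]) =>
        Promise.resolve((obj as Record<string | symbol, any>)[prop](...args));
    },
  });
}

// ping() returns string locally, but Promise<string> through the far reference.
const remote = farRef(new PingPong());
```

A synchronous call through `remote.ping()` thus type-checks only as a promise, which is what makes asynchronous applications safe to write against remote objects.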

11:39.000 --> 11:41.480
So on top of this, we have actually implemented

11:41.480 --> 11:43.760
the CRDT layer.

11:43.760 --> 11:46.720
So the most basic example is that we can, for example,

11:46.720 --> 11:49.480
we want to have a simple shopping list, the most basic one.

11:49.480 --> 11:51.760
We say we want an add-wins set.

11:51.760 --> 11:55.480
And so we say this add-wins set should go online

11:55.480 --> 11:56.600
under the name "shopping list".

11:56.600 --> 12:00.280
So now anywhere in the network, I can discover this shopping list.

12:00.280 --> 12:01.720
And so that's what I do here.

12:01.720 --> 12:02.920
And discovered the shopping list.

12:02.920 --> 12:04.960
I set a callback on the shopping list

12:04.960 --> 12:08.320
to get updates whenever the underlying

12:08.320 --> 12:11.640
CRDT is updated.

12:11.640 --> 12:13.760
And so here you can see I add some items

12:13.760 --> 12:18.760
as you would do in a regular sequential data type.

12:18.760 --> 12:21.040
I forgot to go over my arrows, but OK.

12:21.040 --> 12:23.320
So the implementation of the add-wins set itself

12:23.320 --> 12:25.160
builds again on top of, let's say,

12:25.160 --> 12:27.520
our pure operation-based CRDT layer.

12:27.520 --> 12:29.640
And we kind of basically defined these rules,

12:29.640 --> 12:33.760
these redundancy rules, which actually map directly

12:33.760 --> 12:37.520
to their specification as you can find it in the literature.

12:37.520 --> 12:40.200
So one of the goals when I was building this framework

12:40.200 --> 12:43.840
was that I could easily experiment with changes

12:43.840 --> 12:48.160
to the framework and with new concepts for CRDTs.

12:48.160 --> 12:50.280
For example, I introduced redundancy relations

12:50.280 --> 12:52.720
that allow you to directly access data

12:52.720 --> 12:55.720
that is cached in the reliable causal broadcast layer,

12:55.720 --> 12:58.760
to allow you to have more efficient updates.

12:58.760 --> 13:02.000
And so that was my goal with building this little laboratory

13:02.000 --> 13:03.960
for myself.

13:03.960 --> 13:08.400
How do I make something that I can easily extend and play with?

13:08.400 --> 13:12.960
So this may be a bit unclear, so maybe I should give a visualization

13:12.960 --> 13:15.920
of what actually happens when we receive operations

13:15.920 --> 13:18.800
and what redundancy relations are used.

13:18.800 --> 13:23.520
So imagine that we have a CRDT, a pure operation-based CRDT,

13:23.520 --> 13:26.640
and internally we keep a log of the events that happened,

13:26.640 --> 13:28.800
only the ones that we need to.

13:28.800 --> 13:30.640
We actually compact this log regularly

13:30.640 --> 13:32.120
and we keep a compact state.

13:32.120 --> 13:34.040
But basically, whatever we keep in the log,

13:34.040 --> 13:36.000
is the frontier of operations which

13:36.000 --> 13:40.040
are not yet causally stable, for which we don't have the information

13:40.040 --> 13:42.600
whether they have been observed by all other replicas.

13:42.600 --> 13:47.760
So let's say we receive a remove(a) operation.

13:47.760 --> 13:53.000
What we'll first do is we'll check if anything in the log,

13:53.000 --> 13:56.000
the existing log of the CRDT, will become redundant, right?

13:56.000 --> 13:59.640
And we do that using the redundancy rules of the pure operation-based

13:59.640 --> 14:03.800
CRDT framework, well, our implementation of it.

14:03.800 --> 14:08.600
So basically, we say that any existing item is redundant.

14:08.600 --> 14:12.720
If either the newly arriving operation is a clear operation,

14:12.720 --> 14:16.160
or if the existing item has the same arguments.

14:16.160 --> 14:18.680
What's important to note is that all these operations

14:18.680 --> 14:22.880
are arriving in causal order, the underlying framework

14:22.880 --> 14:25.600
will ensure that they're causally ordered.

14:25.600 --> 14:30.800
So in this case, we can mark the add(a) as redundant.

14:30.800 --> 14:37.000
And then, before we put the element itself in the log,

14:37.000 --> 14:38.760
we will check whether we actually need to keep it.

14:38.760 --> 14:41.160
And in this case, for this data type, we actually

14:41.160 --> 14:43.560
do not need to keep removes,

14:43.560 --> 14:47.160
because the CRDT will converge without them.

14:47.160 --> 14:50.960
And so what happens then, in this naive version,

14:50.960 --> 14:55.360
is that we just iterate over the log

14:55.360 --> 14:58.440
and compute the state.

14:58.440 --> 15:02.960
And the resulting set here contains b and c.
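The log walkthrough above can be replayed in code. This is a simplified sketch, not the framework's actual implementation: the redundancy relations are hardcoded for an add-wins set, and causal delivery of operations is assumed to be handled by the middleware underneath.

```typescript
type PureOp = { kind: "add" | "remove" | "clear"; elem?: string };

class PureAddWinsSet {
  private log: PureOp[] = [];

  // Assumes ops arrive in causal order (a causal broadcast layer guarantees this).
  receive(op: PureOp): void {
    // Rule 1: existing log entries become redundant if the arriving op is a
    // clear, or if they target the same element as the arriving op.
    this.log = this.log.filter(
      entry => op.kind !== "clear" && entry.elem !== op.elem
    );
    // Rule 2: only adds need to be stored; an arriving remove has already
    // done its work by erasing the matching adds, so it is not kept.
    if (op.kind === "add") this.log.push(op);
  }

  // Naive read: derive the set directly from the surviving log entries.
  value(): Set<string> {
    return new Set(this.log.map(entry => entry.elem!));
  }
}
```

Replaying the slide's example, add(a), add(b), add(c), then remove(a), leaves a log, and hence a set, containing b and c.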

15:02.960 --> 15:05.360
Now, in the actual implementation, so in Flec,

15:05.360 --> 15:07.680
what we do is we keep this compacted state.

15:07.680 --> 15:09.560
And whenever operations are causally stable,

15:09.560 --> 15:11.800
we apply them to the sequential state to ensure

15:11.800 --> 15:15.320
that we have as little internal data usage as possible.

15:15.320 --> 15:16.520
And there are different techniques, actually,

15:16.520 --> 15:19.320
to clean this up and compact it.

15:19.320 --> 15:21.240
But so what was important for me

15:21.240 --> 15:23.160
was that we had these different layers that I could

15:23.160 --> 15:27.560
poke at and play with, and so I built this extensible implementation

15:27.560 --> 15:30.880
where we, at different levels, have different hooks

15:30.880 --> 15:31.560
that I can hook in.

15:31.560 --> 15:34.320
For example, in the TSAT core layer,

15:34.320 --> 15:37.920
I can hook into message sending, message receiving,

15:37.920 --> 15:40.400
the ID generation of your different nodes,

15:40.400 --> 15:43.080
when events are received, updated, et cetera.

15:43.080 --> 15:45.160
So it's really nice that I had these distinct parts,

15:45.160 --> 15:49.720
which helped me reason about CRDTs, replicas, and network messages

15:49.720 --> 15:52.360
in a flexible way.

15:52.360 --> 15:55.480
And that's where the name Flec actually also comes from.

15:55.480 --> 15:57.480
So for the CRDT layer,

15:57.480 --> 16:01.480
I could hook into when nodes would go online.

16:01.480 --> 16:06.160
I could hook into when operations were generated,

16:06.160 --> 16:08.600
when operations were received,

16:08.600 --> 16:11.720
when they were in the causal buffer,

16:11.720 --> 16:13.440
and when they were finally applied.

16:13.440 --> 16:14.440
So different layers.

16:14.440 --> 16:16.200
And because I have these hooks,

16:16.200 --> 16:17.720
I can now build on top of this.

16:17.720 --> 16:22.200
And that's what the pure operation-based layer

16:22.200 --> 16:24.040
and the RCB layer build on.

16:24.040 --> 16:27.280
So the RCB layer then adds this causal buffering.

16:27.280 --> 16:30.000
And the pure-op layer adds this machinery

16:30.000 --> 16:34.800
for dealing with all the redundancy relations.

16:34.800 --> 16:37.800
So actually, I went really fast through my presentation,

16:37.800 --> 16:39.040
and this is my last slide,

16:39.040 --> 16:40.960
So I'm giving you some buffer time.

16:40.960 --> 16:43.680
But so I will conclude with basically saying

16:43.680 --> 16:46.240
that designing and implementing replicated data type

16:46.240 --> 16:46.760
is hard.

16:46.760 --> 16:50.720
This is why you often will go to one-size-fits-all solutions

16:50.720 --> 16:53.200
that are out there, made by very smart people

16:53.200 --> 16:55.480
who have already thought about a lot of the problems.

16:55.480 --> 16:59.600
But sometimes your problem might not fit these

16:59.600 --> 17:01.160
one-size-fits-all solutions.

17:01.160 --> 17:05.640
So we've been working on getting computer-generated,

17:05.640 --> 17:08.560
provably correct designs out there.

17:08.560 --> 17:12.440
And this is important because we can optimize for efficiency.

17:12.440 --> 17:14.040
We can optimize for correctness.

17:14.040 --> 17:15.480
And we can also really try to make

17:15.480 --> 17:17.880
CRDTs whose semantics make sense, right?

17:17.880 --> 17:20.520
Because sometimes we want to build an application.

17:20.520 --> 17:24.600
And in reality, concurrency forces you

17:24.600 --> 17:26.320
to make compromises.

17:26.320 --> 17:28.760
So we want developers, at least,

17:28.760 --> 17:32.720
to understand what compromises they are making

17:32.720 --> 17:35.000
and what it means for the end user.

17:35.000 --> 17:37.040
So yeah, the core is we're trying to distill

17:37.040 --> 17:39.600
the core building blocks of CRDTs

17:39.600 --> 17:42.680
and let the computer do as much calculation as possible.

17:42.680 --> 17:44.080
That's the end of my presentation.

17:44.080 --> 17:45.280
And thank you for listening.

17:45.280 --> 17:54.160
Thank you, Jim.

17:54.160 --> 17:55.200
I'm sure we have questions.

17:55.200 --> 17:56.800
Yes, I was thinking about you.

17:56.800 --> 17:59.920
Yeah, I see it's getting closer.

17:59.920 --> 18:01.120
It might be a bit of a...

18:01.120 --> 18:02.280
Is it on?

18:02.280 --> 18:03.680
Yeah, it is.

18:03.680 --> 18:05.880
The remote here, you just press it.

18:05.880 --> 18:07.280
It might be a bit of a presumptuous question.

18:07.280 --> 18:09.720
But I was...

18:09.720 --> 18:12.160
My train of thinking is that this is still highly dependent

18:12.160 --> 18:16.360
on some element of time and date.

18:16.360 --> 18:19.040
Do you foresee, or do you imagine,

18:19.040 --> 18:21.640
some way of achieving CRDTs, you know,

18:21.640 --> 18:24.760
just consistency at all, without that?

18:24.760 --> 18:26.000
Without dealing with time and date?

18:26.000 --> 18:28.120
Without human data, without...

18:28.120 --> 18:29.600
Ah.

18:29.600 --> 18:31.160
Well, the thing is what's really important

18:31.160 --> 18:32.800
with developing CRDTs and applications

18:32.800 --> 18:36.000
is that you verify the correctness, right?

18:36.000 --> 18:38.400
And so the benefit of the tools that we develop

18:38.400 --> 18:41.160
is that you give the sequential specification,

18:41.160 --> 18:43.640
which is always based on the business logic

18:43.640 --> 18:45.760
of what your application needs, right?

18:45.760 --> 18:49.320
And so that is still something that has to come from a human,

18:49.320 --> 18:50.960
or maybe these days you could use an LLM,

18:50.960 --> 18:55.120
but I don't know how much you could trust it to do the correct thing.

18:55.120 --> 18:58.040
But you don't have to, or...

18:58.040 --> 19:01.600
you much less have to think about designing the correct CRDT,

19:01.600 --> 19:03.880
because the semantics will be automatically,

19:03.880 --> 19:06.640
that's the goal, verified by the computer for you.

19:06.640 --> 19:08.280
So you get something where you can see:

19:08.280 --> 19:12.560
look, we can offer you these designs based on your spec,

19:12.560 --> 19:15.960
and well, you decide if this fits your application.

19:15.960 --> 19:17.520
But at least you don't have to think about

19:17.520 --> 19:20.960
how to go into the details of it.

19:28.240 --> 19:29.080
Thanks, Jim.

19:29.080 --> 19:32.080
This is a really, really great body of work.

19:32.080 --> 19:35.520
And as a creator of a one-size-fits-all CRDT,

19:35.520 --> 19:38.280
I think it's an empirical question.

19:38.280 --> 19:40.280
We don't think it's the right approach.

19:40.280 --> 19:44.120
We think it's an expedient approach to try and test whether

19:44.120 --> 19:47.120
you can have a one-size-fits-most approach.

19:47.120 --> 19:53.520
And we don't think it's optimal, but it's useful.

19:53.520 --> 19:55.080
It's gotten pretty far.

19:55.080 --> 19:59.280
I, in a past life, did a lot of database stuff.

19:59.280 --> 20:02.600
And what I noticed is nobody actually understands how to use databases

20:02.600 --> 20:05.880
and the vast majority of postgres features

20:05.880 --> 20:10.200
are never used by anyone except consultants.

20:10.200 --> 20:12.320
I see someone from the Postgres community having a chuckle

20:12.320 --> 20:12.760
in front of me.

20:12.760 --> 20:13.640
It's very true, though.

20:13.640 --> 20:14.760
It's very true.

20:14.760 --> 20:17.400
And so I think historically, I would have been worried

20:17.400 --> 20:21.440
about this, but I'm curious now, it's been a few years since Flec.

20:21.440 --> 20:25.360
Has anyone been making any progress on type inference,

20:25.360 --> 20:28.040
static analysis, so that we don't get

20:28.040 --> 20:32.360
sort of either the laziest or most conservative interpretations,

20:32.360 --> 20:36.680
but could actually use PL techniques to find the weakest forms

20:36.680 --> 20:38.280
of consistency necessary.

20:38.280 --> 20:40.080
Does it feel like a promising direction?

20:40.080 --> 20:42.880
Yeah, so I maybe will go back to this slide.

20:46.600 --> 20:47.760
I went too far, sorry.

20:47.760 --> 20:48.680
Two back, sorry.

20:48.680 --> 20:50.480
No, let me see.

20:50.480 --> 20:52.440
Show navigator.

20:52.440 --> 20:54.600
Uh, I won't.

20:54.600 --> 20:55.320
Yeah, you're just fine.

20:55.320 --> 20:56.720
Yeah.

20:56.720 --> 20:59.080
So it's a very good question, right?

20:59.080 --> 21:01.760
Because there is no right solution for very many things.

21:01.760 --> 21:04.640
We're just trying to create something and see if it fits.

21:04.640 --> 21:08.160
So in this last paper that we were working on,

21:08.160 --> 21:11.440
we're trying to really analyze the building blocks of what

21:11.440 --> 21:13.280
makes a replicated data type tick,

21:13.280 --> 21:15.600
what the core essential parts are.

21:15.600 --> 21:18.440
And this is why we're also working together with, for example, Carlos Baquero,

21:18.440 --> 21:21.840
because they have a very good insights on what is important

21:21.840 --> 21:24.320
and what has been needed for them to come up with the concept

21:24.320 --> 21:26.560
of CRDTs in the first place.

21:26.560 --> 21:30.200
And so it's a very difficult problem.

21:30.200 --> 21:34.480
We're trying to analyze as much as possible given,

21:34.480 --> 21:38.840
let's say, some goal, some business goal,

21:38.840 --> 21:39.920
some application, right?

21:39.920 --> 21:42.880
What is the minimum we need so that

21:42.880 --> 21:47.240
you can automatically compose these components

21:47.240 --> 21:49.160
and get something else that's useful?

21:49.160 --> 21:52.400
So in the end, the one-size-fits-all solution is actually

21:52.400 --> 21:53.880
really good, because you get these components

21:53.880 --> 21:57.000
that you can play with and build on and easily iterate.

21:57.000 --> 21:58.920
So we're also trying to do that somehow,

21:58.920 --> 22:02.840
but guided by some algorithms that help you formulate

22:02.840 --> 22:05.560
this in a way that fits this approach, right?

22:05.560 --> 22:07.720
So are you still bringing this to PaPoC this year,

22:07.720 --> 22:09.560
or will we have to wait a little longer?

22:09.560 --> 22:12.440
So this paper is going to be submitted

22:12.440 --> 22:15.480
the initial version of this paper is going to be submitted

22:15.480 --> 22:17.320
to ECOOP.

22:17.320 --> 22:18.000
OK, great.

22:18.000 --> 22:20.760
Well, hopefully it's going to come out.

22:20.760 --> 22:22.800
Hopefully it gets accepted, let's say, on the first try.

22:22.800 --> 22:25.280
Because obviously it has to be verified for correctness.

22:25.280 --> 22:27.600
It's OK if it's a scoop for the show.

22:27.600 --> 22:31.560
Yeah, but we don't want to burn the paper immediately

22:31.560 --> 22:35.760
for sure; we'll try to get it published first and then we'll see.

22:35.760 --> 22:37.160
Well, thank you so much.

22:37.160 --> 22:39.080
Thank you for your question.

22:39.080 --> 22:41.080
Another question?

22:41.080 --> 22:41.920
Yes.

22:41.920 --> 22:42.920
Next question.

22:47.920 --> 22:48.920
Yeah, thanks.

22:48.920 --> 22:50.400
This was a great presentation.

22:50.400 --> 22:52.480
I think it was very information-dense.

22:52.480 --> 22:56.520
So I'm mainly wondering, do you have either these slides,

22:56.520 --> 22:59.280
or like links to the papers and all the work somewhere,

22:59.280 --> 23:04.520
or, I saw at the end, you have like a page for the faculty,

23:04.520 --> 23:08.400
like Disco; is it all on there?

23:08.400 --> 23:13.240
You can find more information at the end, indeed; let me go to the end.

23:13.240 --> 23:14.800
Animations make the navigation slow.

23:14.800 --> 23:17.240
But so basically, if you go here, that's where all

23:17.240 --> 23:18.920
our publications are listed as well.

23:18.920 --> 23:21.160
So some papers aren't listed there

23:21.160 --> 23:22.880
because they're still in progress,

23:22.880 --> 23:25.720
but you can find them through this site, probably,

23:25.720 --> 23:30.720
or you can look up my name at like the name of the...

23:32.720 --> 23:34.520
Ah, on the page of your group?

23:34.520 --> 23:36.520
Yeah, yes, as well.

23:36.520 --> 23:40.280
And also really great is the crdt.tech website,

23:40.280 --> 23:43.760
which lists very many resources on CRDTs

23:43.760 --> 23:47.640
from very prominent researchers.

23:47.640 --> 23:50.160
Well, not me, I'm not that prominent, but I mean,

23:50.240 --> 23:53.000
more famous researchers, like Martin Kleppmann.

23:53.000 --> 23:56.560
So crdt.tech is really a very good resource.

23:56.560 --> 23:57.560
Good, thanks.

24:03.760 --> 24:05.000
Yeah, I'm Willem.

24:05.000 --> 24:09.280
You described a system where, basically, you don't want the programmer

24:09.280 --> 24:12.280
to manually specify the CRDT, but you want it to be

24:12.280 --> 24:16.400
generated automatically to a large extent,

24:16.400 --> 24:18.680
just depending on the use case.

24:18.680 --> 24:23.840
and are there maybe invariants or properties of the use case

24:23.840 --> 24:27.480
that somehow need to be encoded to enable this derivation,

24:27.480 --> 24:31.520
and is there tooling to help in identifying that?

24:31.520 --> 24:33.720
So that's actually a really good question

24:33.720 --> 24:37.200
because that's exactly where the problem lies, right?

24:37.200 --> 24:40.400
Invariants, dealing with application-specific constraints.

24:40.400 --> 24:42.920
So if you don't have any constraints,

24:42.920 --> 24:45.040
you can easily build a CRDT.

24:45.040 --> 24:47.120
However, as soon as you add invariants to the mix,

24:47.120 --> 24:51.200
or you have data relations, it becomes much more complicated.
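[Editor's note: a minimal Python sketch of this point, my own illustration rather than the speaker's tooling. A grow-only counter CRDT merges trivially without constraints, but once you add an application invariant such as a non-negative balance, two replicas can each satisfy it locally while their combined effect violates it.]

```python
# Unconstrained case: a grow-only counter (G-Counter) CRDT.
# State maps replica id -> local count; merge is an entrywise max,
# so replicas converge regardless of delivery order.
def merge(a, b):
    return {r: max(a.get(r, 0), b.get(r, 0)) for r in set(a) | set(b)}

def value(state):
    return sum(state.values())

r1 = {"A": 3}          # replica A incremented 3 times
r2 = {"B": 2}          # replica B incremented 2 times
assert value(merge(r1, r2)) == 5
assert merge(r1, r2) == merge(r2, r1)   # order doesn't matter

# Constrained case: add the invariant "balance >= 0".
# Each replica checks the invariant locally before a withdrawal...
balance = 10
w1, w2 = 7, 7                 # two concurrent withdrawals
assert balance - w1 >= 0      # replica 1: locally fine
assert balance - w2 >= 0      # replica 2: locally fine
merged = balance - w1 - w2
# ...but the merged effect breaks it (merged == -4): no purely local
# check can prevent this, which is why invariants need coordination
# or smarter analysis.
assert merged < 0
```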

24:51.200 --> 24:55.400
And so because we have a solver that can understand this stuff,

24:55.400 --> 24:57.600
the goal is indeed that we can automatically

24:57.600 --> 25:01.200
generate CRDTs, which even have data dependencies

25:01.200 --> 25:04.440
among them and have to respect certain rules.

25:04.440 --> 25:06.080
Now, the first bit that we're going to publish

25:06.080 --> 25:08.960
doesn't have this invariant part yet, because it would

25:08.960 --> 25:13.120
be too large, but the goal is to have this in subsequent

25:13.120 --> 25:16.320
releases, and we've already been working on the solution there.

25:16.320 --> 25:20.320
And can you help the programmer to specify the invariant?

25:20.320 --> 25:23.600
So the specification for the invariant is basically

25:23.600 --> 25:27.320
you implement... so the spec is given in a Scala-like language,

25:27.320 --> 25:31.400
and there you can describe how your business rules should behave

25:31.400 --> 25:34.400
in contrast to the operations that are applied

25:34.400 --> 25:38.280
on the data type itself, the sequential one.

25:38.280 --> 25:40.640
So you're supposed to write there, basically,

25:40.640 --> 25:42.920
some of the invariants, but you do it in a high-level

25:42.920 --> 25:45.840
language, and not in something like Coq.
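[Editor's note: to make the idea of a high-level invariant spec concrete, here is a hedged Python sketch. All names are invented; the speaker's actual tool uses its own Scala-like language. The point is that the business rule is stated against the sequential data type, and a solver-backed tool would then check whether concurrent operations can violate it.]

```python
from dataclasses import dataclass

# The sequential data type: ordinary, non-replicated operations.
@dataclass(frozen=True)
class Account:
    balance: int

    def deposit(self, n: int) -> "Account":
        return Account(self.balance + n)

    def withdraw(self, n: int) -> "Account":
        return Account(self.balance - n)

# The business rule, written at the level of the sequential type,
# not in a proof assistant like Coq.
def invariant(acc: Account) -> bool:
    return acc.balance >= 0

# A tool could now enumerate pairs of operations and ask a solver
# whether applying them concurrently can break the rule. Here we
# just show that two withdrawals of 7 from a balance of 10 do:
acc = Account(10)
after = acc.withdraw(7).withdraw(7)
assert not invariant(after)   # violable => withdraw needs coordination
```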

25:45.840 --> 25:48.960
And is it, like, open source?

25:48.960 --> 25:50.040
Flake is open source.

25:50.040 --> 25:53.600
It's not like something you should use to build the application,

25:53.600 --> 25:56.280
because it's a research prototype, mainly made for myself,

25:56.280 --> 25:59.080
but it's open source, and you can find all source code

25:59.080 --> 26:02.320
on GitHub, which is linked from my pages as well.

