WEBVTT

00:00.000 --> 00:13.000
Hello everyone, so let's just start a little bit early with Michiel de Jong.

00:13.000 --> 00:14.000
Yes.

00:14.000 --> 00:18.000
Sorry, Local-First and the Ultimate Bookkeeping System.

00:18.000 --> 00:19.000
Welcome.

00:19.000 --> 00:21.000
Thank you.

00:24.000 --> 00:26.000
Is it running yet?

00:27.000 --> 00:35.000
Yeah, a little talk about some thoughts I've had in the last few years working with data and

00:35.000 --> 00:38.000
data portability and interoperability.

00:38.000 --> 00:43.000
Linked data and how you separate data from applications.

00:43.000 --> 00:47.000
It all started with, it started here at FOSDEM, actually, in 2011.

00:47.000 --> 00:51.000
I was working on apps that run in your browser, and some people told me,

00:51.000 --> 00:53.000
you should quit your day job and crowdfund that.

00:53.000 --> 00:59.000
And I crowdfunded it and I got NLnet support, and since then NLnet has been supporting me.

00:59.000 --> 01:03.000
I've been working a lot on the unhosted project, apps that run in your browser,

01:03.000 --> 01:07.000
and you connect your personal data store at runtime.

01:07.000 --> 01:09.000
We had a lot of fun.

01:09.000 --> 01:12.000
We were very young as you can see in this photo.

01:12.000 --> 01:16.000
We went to a village called Unhošť,

01:16.000 --> 01:20.000
near Prague, just because it was a village with a funny name.

01:20.000 --> 01:23.000
And we offered a bottle of Club-Mate to the mayor.

01:23.000 --> 01:25.000
That's the sort of thing that we were doing.

01:25.000 --> 01:29.000
I think I received 12 NLnet grants.

01:29.000 --> 01:34.000
So if you have an open source project, ask for an NLnet grant.

01:34.000 --> 01:37.000
It only costs you a few hours to apply.

01:37.000 --> 01:40.000
And maybe you can quit your day job like I did.

01:40.000 --> 01:45.000
So one of the things that I was looking at more recently,

01:45.000 --> 01:49.000
for synchronization of data among systems,

01:49.000 --> 01:53.000
so that we can liberate our data from the lock-in,

01:53.000 --> 01:57.000
is, for instance, how you could federate a GitHub issue tracker with a

01:57.000 --> 01:59.000
JIRA issue tracker.

01:59.000 --> 02:02.000
And both GitHub and JIRA have APIs.

02:02.000 --> 02:07.000
So you can run a little process that bridges between those APIs.

02:07.000 --> 02:09.000
And they send webhooks.

02:09.000 --> 02:13.000
So you can say, oh, if there's a new thing in JIRA, send me a webhook.

02:13.000 --> 02:15.000
And I'll do a POST to the API of GitHub.

02:15.000 --> 02:17.000
And the issue will get created there.

02:17.000 --> 02:19.000
And the other way around as well.

02:19.000 --> 02:21.000
So all the issues that are in JIRA are in GitHub.

02:21.000 --> 02:24.000
And if there's a comment, you sync that as well.

02:24.000 --> 02:25.000
But there's a problem.

02:25.000 --> 02:33.000
Because both these APIs use URLs as universal resource locators.

02:33.000 --> 02:35.000
But they use their own URLs.

02:35.000 --> 02:40.000
So this URL at the top, this is issue number 181.

02:40.000 --> 02:43.000
I did a lot of testing on this repo.

02:43.000 --> 02:46.000
And this is one that was synced from JIRA,

02:46.000 --> 02:48.000
but then assigned this identifier.

02:48.000 --> 02:51.000
You still have to map these identifiers between the systems.

02:51.000 --> 02:53.000
So even if you want to liberate the data,

02:53.000 --> 02:58.000
you need to make sure that the local identifiers stay in sync.

02:58.000 --> 03:01.000
Now, JIRA has custom fields in the API.

03:01.000 --> 03:06.000
So I could just, when I send an issue from the GitHub tracker to the JIRA tracker,

03:06.000 --> 03:10.000
I would just add a field saying, by the way, this is the GitHub URL.

03:10.000 --> 03:12.000
GitHub doesn't offer that.

03:12.000 --> 03:15.000
So what I did in the end, in this issue,

03:15.000 --> 03:18.000
you can actually look it up in GitHub.

03:18.000 --> 03:19.000
And I'm not sure if you can click Edit,

03:19.000 --> 03:22.000
but when I click Edit on this comment,

03:22.000 --> 03:23.000
I see this.

03:23.000 --> 03:26.000
There's an HTML comment inside the description

03:26.000 --> 03:30.000
from Bridgebot saying the identifier that it has in JIRA.

03:30.000 --> 03:33.000
And that's how you can link these things together.
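
NOTE
Editor's sketch of the annotation trick described above, not the bot's actual code.
It assumes a hypothetical marker format ("bridge-bot: <id>") and hypothetical
function names; the talk only says that the bot leaves an HTML comment in the
description carrying the identifier the issue has in JIRA.
```python
import re
# Hypothetical marker format (an assumption, not taken from the talk).
OPEN = "<!--"
CLOSE = "--" + ">"  # built in two pieces so the closing arrow never appears literally in this subtitle file
MARKER = re.compile(re.escape(OPEN) + r"\s*bridge-bot:\s*(\S+)\s*" + re.escape(CLOSE))
def annotate(body: str, foreign_id: str) -> str:
    """Append an invisible annotation carrying the other system's identifier."""
    if MARKER.search(body):
        return body  # already annotated: keep the existing mapping
    return f"{body}\n\n{OPEN} bridge-bot: {foreign_id} {CLOSE}"
def foreign_id_of(body: str):
    """Return the other system's identifier, or None if the issue was never synced."""
    match = MARKER.search(body)
    return match.group(1) if match else None
```
Because the mapping travels inside the issue body, any bot that reads the issue
can recover it without sharing a private lookup table.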

03:34.000 --> 03:38.000
So that's one of the basic things about the Federated Bookkeeping

03:38.000 --> 03:40.000
research that we did.

03:40.000 --> 03:42.000
How do you keep track of these identifiers?

03:42.000 --> 03:46.000
Another way we could have done it is just keep a table in this bot,

03:46.000 --> 03:49.000
just remember which issue goes with which.

03:49.000 --> 03:52.000
But then if there are multiple bots at the same time,

03:52.000 --> 03:55.000
they would not know about each other's actions.

03:55.000 --> 03:58.000
So by having the annotation right there in the data,

03:58.000 --> 04:00.000
you can keep track of those things.

04:00.000 --> 04:02.000
And when you do these little tricks,

04:02.000 --> 04:04.000
this is a very simple trick.

04:04.000 --> 04:08.000
But it really changes everything with regards to where the data lives.

04:08.000 --> 04:12.000
Because now this no longer is really a GitHub issue.

04:12.000 --> 04:16.000
It's just an issue in general that GitHub has a view on.

04:16.000 --> 04:18.000
And JIRA has another view on it.

04:18.000 --> 04:25.000
So there are many systems of record for this data.

04:25.000 --> 04:28.000
Well, in this case, there are two systems of record.

04:28.000 --> 04:33.000
And that is a fundamentally different view on software engineering

04:33.000 --> 04:36.000
than the web of data. In the web of data,

04:36.000 --> 04:39.000
there's only one system of record for each piece of data.

04:39.000 --> 04:44.000
And I've been trying to build interoperable software with that.

04:44.000 --> 04:48.000
And we always run into, yeah, but now you haven't synced it.

04:48.000 --> 04:53.000
So Solid is a personal data store with linked data.

04:53.000 --> 04:58.000
And local-first is the concept of having a local copy of your data

04:58.000 --> 04:59.000
that you edit locally.

04:59.000 --> 05:03.000
And then asynchronously, you synchronize with the personal data store.

05:03.000 --> 05:06.000
And those two don't match with each other,

05:06.000 --> 05:08.000
because in Solid everything has a URL.

05:08.000 --> 05:13.000
But you have a local data store that might be SQLite with a primary key,

05:13.000 --> 05:15.000
which is also an identifier for the same data.

05:15.000 --> 05:18.000
So you have two systems of record,

05:18.000 --> 05:22.000
and they both have their own identifiers for the same piece of data.

05:22.000 --> 05:27.000
So in each system of record, you have a copy of the data.

05:27.000 --> 05:31.000
You have many universal resource locators,

05:31.000 --> 05:34.000
and as you saw, yeah, a URL is a universal resource locator,

05:34.000 --> 05:36.000
with which you can always find it.

05:36.000 --> 05:40.000
But it's not a universal identifier of the data,

05:40.000 --> 05:45.000
because it's a JIRA-specific URL of the independent issue,

05:45.000 --> 05:47.000
or a GitHub specific one.

05:47.000 --> 05:52.000
So that took me years to realize that,

05:52.000 --> 05:54.000
and, you know, once you realize it, it feels simple,

05:54.000 --> 05:59.000
but it is a fundamentally different way to design software,

05:59.000 --> 06:00.000
I think.

06:00.000 --> 06:05.000
So another thing we worked on this year

06:05.000 --> 06:08.000
is Solid Data Modules, which are little pieces of code

06:08.000 --> 06:14.000
that help app developers to work with data in different data formats,

06:14.000 --> 06:17.000
because you know, maybe there was a bookmarking app already,

06:17.000 --> 06:19.000
and you're also creating a bookmarking app

06:19.000 --> 06:23.000
that works with Solid. Even though you both use RDF,

06:23.000 --> 06:25.000
you might be using different ontologies.

06:25.000 --> 06:28.000
So these code snippets, the Solid Data Modules,

06:28.000 --> 06:31.000
know all the vocabularies that people have used.

06:31.000 --> 06:32.000
They know, for instance,

06:32.000 --> 06:36.000
that your name on your WebID profile might be written down

06:36.000 --> 06:38.000
with the Friend of a Friend (FOAF) vocabulary,

06:38.000 --> 06:40.000
or maybe with the vCard vocabulary,

06:40.000 --> 06:41.000
but it extracts that.

06:41.000 --> 06:45.000
So these bits of code make these different apps

06:45.000 --> 06:46.000
interoperable.
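
NOTE
Editor's sketch of the vocabulary-aware lookup described above. It is not the
Solid Data Modules code: triples are modelled as plain (subject, predicate, object)
tuples, and read_name plus the preference order are assumptions for illustration.
The two predicate URIs are the FOAF name and vCard fn properties.
```python
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"
VCARD_FN = "http://www.w3.org/2006/vcard/ns#fn"
# Predicates to try, in order of preference (the order is an assumption).
NAME_PREDICATES = [FOAF_NAME, VCARD_FN]
def read_name(triples, web_id):
    """Return the first name found for web_id under any known vocabulary, else None."""
    for predicate in NAME_PREDICATES:
        for subject, pred, obj in triples:
            if subject == web_id and pred == predicate:
                return obj
    return None
```
An app calling read_name does not need to know which vocabulary the profile's
author chose; that knowledge lives in one shared place.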

06:46.000 --> 06:51.000
But you know, just like software developers like to

06:51.000 --> 06:54.000
pick their own data schemas when they start to develop an app,

06:54.000 --> 06:56.000
they also like to have their own code base,

06:56.000 --> 06:59.000
because it's not just different programming languages,

06:59.000 --> 07:02.000
but also different frameworks and different approaches.

07:02.000 --> 07:06.000
So in theory, you know, you could make that work.

07:06.000 --> 07:07.000
Like, let's all use the same code,

07:07.000 --> 07:10.000
and then we'll have happy software peace around the world.

07:10.000 --> 07:14.000
But in practice, we live in a polyglot world with

07:14.000 --> 07:18.000
many code bases and many different formats.

07:18.000 --> 07:23.000
And apart from many code bases, many data formats,

07:23.000 --> 07:27.000
and many identifiers for the same piece of data,

07:27.000 --> 07:30.000
the data that we work on in these apps,

07:30.000 --> 07:32.000
especially when they are local-first apps,

07:32.000 --> 07:34.000
will be versioned.

07:34.000 --> 07:37.000
So there are many versions even of the same document

07:37.000 --> 07:41.000
that you might get confused about.

07:41.000 --> 07:46.000
So the realization is that just like in RDF,

07:46.000 --> 07:50.000
you have different representations of the same triple,

07:50.000 --> 07:51.000
right?

07:51.000 --> 07:53.000
The data consists of triples in the RDF framework.

07:53.000 --> 07:55.000
You can write it down as JSON-LD,

07:55.000 --> 07:57.000
or as Turtle, it doesn't matter.

07:57.000 --> 08:02.000
You cannot say this triple is in Turtle.

08:02.000 --> 08:05.000
You can look at it in a Turtle representation,

08:05.000 --> 08:07.000
or in a JSON-LD representation.

08:07.000 --> 08:11.000
But none of them are really like the real representation of that data.

08:11.000 --> 08:14.000
And I think we should do that not only for the representation,

08:14.000 --> 08:18.000
but also for the locators and for the location,

08:18.000 --> 08:24.000
and for like different code bases accessing the same data.

08:24.000 --> 08:30.000
Just consider these all like different views on the same piece of data.

08:30.000 --> 08:32.000
Just like, you know,

08:32.000 --> 08:35.000
if you look at a truck from different sides, it looks different.

08:35.000 --> 08:37.000
But we don't see three trucks here.

08:37.000 --> 08:40.000
We just see three views of the same truck.

08:40.000 --> 08:45.000
And if you think of data that exists in software in that way,

08:45.000 --> 08:47.000
then you cannot really say like,

08:47.000 --> 08:50.000
this data is in that system,

08:50.000 --> 08:53.000
or this data is in that system and now I synchronize it.

08:53.000 --> 08:56.000
If the sync is fast enough,

08:56.000 --> 09:01.000
you have two-way sync with on-the-fly translation of data formats,

09:01.000 --> 09:05.000
then it's really, it becomes one system.

09:05.000 --> 09:09.000
So that's why I think the,

09:09.000 --> 09:11.000
how we always design software,

09:11.000 --> 09:13.000
like you say you have a database,

09:13.000 --> 09:15.000
and then you have an application on top of it,

09:15.000 --> 09:17.000
and then you have representation.

09:17.000 --> 09:20.000
That's a limited way to design software.

09:20.000 --> 09:23.000
It works well if you have VC funding and they say,

09:23.000 --> 09:26.000
well, you need to build a system where you're going to lock people in.

09:26.000 --> 09:28.000
But if you want to build open software,

09:28.000 --> 09:32.000
where your aim is to allow people to liberate their data,

09:32.000 --> 09:36.000
then you should become part of one big bookkeeping system.

09:36.000 --> 09:41.000
In the same way as the web is woven of different web servers,

09:41.000 --> 09:43.000
that all make part of the same web.

09:43.000 --> 09:47.000
You don't pay a lot of attention to whether a hyperlink goes to the same site,

09:47.000 --> 09:49.000
or a different site.

09:49.000 --> 09:52.000
Now, I showed how to do the local identifiers.

09:52.000 --> 09:54.000
That's what we did some research on.

09:54.000 --> 09:58.000
What we still haven't done a lot with

09:58.000 --> 10:03.000
is how to translate on the fly between these data formats.

10:03.000 --> 10:05.000
So we wrote these Solid Data Modules

10:05.000 --> 10:10.000
that read data in different vocabularies and translate it.

10:10.000 --> 10:15.000
And we also have documentation about which data formats are used.

10:15.000 --> 10:19.000
But something that is in between documentation and code

10:19.000 --> 10:23.000
is declarative code, such as shapes.

10:23.000 --> 10:26.000
In the Solid project we use data shapes that define,

10:26.000 --> 10:31.000
just like a JSON Schema does, what kind of fields could be there and what they mean.

10:31.000 --> 10:35.000
There's a project from Ink & Switch called the Cambria project,

10:35.000 --> 10:37.000
which is very promising.

10:37.000 --> 10:41.000
It worked for the small example that they gave,

10:41.000 --> 10:43.000
where they have lenses on the data.

10:43.000 --> 10:45.000
They do two-way translation,

10:45.000 --> 10:48.000
and they actually integrate with TypeScript generation.
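
NOTE
Editor's sketch of the lens idea mentioned above: a declarative description of a
schema difference that can be run in both directions. This is not Cambria's API;
RenameLens, apply_forward, apply_backward and the field names used with them are
hypothetical, and only the simplest kind of lens (a field rename) is shown.
```python
from dataclasses import dataclass
@dataclass(frozen=True)
class RenameLens:
    """A declarative, reversible field rename between two schemas."""
    source: str
    target: str
    def forward(self, doc: dict) -> dict:
        # Translate a document from the first schema to the second.
        return {(self.target if k == self.source else k): v for k, v in doc.items()}
    def backward(self, doc: dict) -> dict:
        # Translate a document from the second schema back to the first.
        return {(self.source if k == self.target else k): v for k, v in doc.items()}
def apply_forward(lenses, doc):
    for lens in lenses:
        doc = lens.forward(doc)
    return doc
def apply_backward(lenses, doc):
    # Undo the lenses in reverse order.
    for lens in reversed(lenses):
        doc = lens.backward(doc)
    return doc
```
Since the same lens list drives both directions, two apps with different field
names can each keep their own schema while editing the same record.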

10:48.000 --> 10:53.000
And the only thing is that they've worked on it for a few months

10:53.000 --> 10:55.000
and then they went on to work on other things.

10:55.000 --> 10:58.000
So I think we should evolve the Cambria project more,

10:58.000 --> 11:01.000
especially not just in the local-first community,

11:01.000 --> 11:06.000
but also towards interoperability and freedom of data.

11:06.000 --> 11:09.000
And I want to plug two more projects,

11:09.000 --> 11:13.000
from friends of mine. One is Aerogel from Noel,

11:13.000 --> 11:16.000
which is a way to view,

11:16.000 --> 11:18.000
and have a view on the data,

11:18.000 --> 11:20.000
and we were just talking here in the hallways

11:20.000 --> 11:23.000
about doing schema migrations in Aerogel.

11:23.000 --> 11:25.000
So I think that's super promising.

11:25.000 --> 11:27.000
Another one is NextGraph.

11:27.000 --> 11:29.000
I don't know if you saw the presentation yesterday,

11:29.000 --> 11:31.000
and what NextGraph does is it combines

11:31.000 --> 11:33.000
local-first with RDF.

11:33.000 --> 11:36.000
So it says, yeah, everything is triples,

11:36.000 --> 11:40.000
but the identifiers used in these triples are not URLs,

11:40.000 --> 11:44.000
but they are universal locators.

11:44.000 --> 11:49.000
So evolving this software with the ground architecture

11:49.000 --> 11:53.000
of saying there's only one bookkeeping system,

11:53.000 --> 11:58.000
every system of record is just a part of a bigger system,

11:58.000 --> 12:01.000
is what I call the ultimate bookkeeping system.

12:01.000 --> 12:05.000
And that's a bit tongue-in-cheek.

12:05.000 --> 12:09.000
Now we don't need any other bookkeeping system here.

12:09.000 --> 12:12.000
Here, bookkeeping is used in sort of a generic way.

12:12.000 --> 12:15.000
It could be, like, the data that accountants work with,

12:15.000 --> 12:18.000
but it could also be bookkeeping your calendars,

12:18.000 --> 12:20.000
calendars that are synced with each other,

12:20.000 --> 12:23.000
just part of the ultimate bookkeeping system.

12:23.000 --> 12:26.000
And yeah, that's what I want to be working on.

12:26.000 --> 12:28.000
So far, I only had the logo.

12:28.000 --> 12:30.000
It luckily already exists in Unicode,

12:30.000 --> 12:32.000
so you can type it lots of times.

12:32.000 --> 12:33.000
Thank you.

12:33.000 --> 12:43.000
If you have questions or, yeah,

12:43.000 --> 12:48.000
if there are any questions, we have a few minutes left.

12:48.000 --> 13:02.000
Thank you for the presentation.

13:02.000 --> 13:05.000
I love all the ideas.

13:05.000 --> 13:08.000
It seems like you have a lot of ideas for a lot of high-level concepts.

13:08.000 --> 13:13.000
Are there any, like, simple demo type apps that you've worked on?

13:13.000 --> 13:21.000
There's not so many really usable apps that I've worked on.

13:21.000 --> 13:25.000
There are small things, but they're really just demos.

13:25.000 --> 13:32.000
So yeah, there's, for instance, Litewrite, which is a text editing app.

13:32.000 --> 13:37.000
litewrite.net, and it allows you to use your own personal data store.

13:37.000 --> 13:42.000
I didn't write that, but some people in the unhosted project did.

13:42.000 --> 13:48.000
And so that is an actual app; at least five or six people use it for their real notes.

13:48.000 --> 13:53.000
And in the Solid project, there are a few demo apps.

13:53.000 --> 13:57.000
And I don't know if they're used that much, but they do work.

13:57.000 --> 14:02.000
I think Media Kraken is one where you can save your movies to your pod.

14:02.000 --> 14:07.000
But we haven't shown a second movies app that is compatible with it.

14:07.000 --> 14:11.000
We do have, there is a bookmarking app called Mark Book,

14:11.000 --> 14:18.000
and then there's another bookmarking app called Poddit, and those two are compatible with each other.

14:18.000 --> 14:21.000
So I wrote a little demo last week of how those,

14:21.000 --> 14:27.000
how you can switch between those applications through co-existence.

14:27.000 --> 14:32.000
They call it that, and that's an old term in data engineering.

14:32.000 --> 14:34.000
So co-existence is where you have two systems.

14:34.000 --> 14:37.000
When, for instance, one company buys another company.

14:37.000 --> 14:40.000
The company they buy has a calendar system.

14:40.000 --> 14:43.000
But they want to migrate it into the new calendar system.

14:43.000 --> 14:48.000
And first, they let the two systems co-exist together with two-way sync.

14:48.000 --> 14:52.000
And then at some point, when everybody has switched over, they switch off the old one.

14:52.000 --> 14:54.000
That's a really nice way to do a data migration.

14:54.000 --> 14:58.000
And I was talking about that with migration experts.

14:58.000 --> 15:03.000
They said, well, yeah, that's really too idealistic. In general, the real way to do it is you just

15:03.000 --> 15:05.000
migrate the data and say, okay, now you're on the new system.

15:05.000 --> 15:11.000
Also psychologically, because, you know, it's hard for people to then actually switch if there's no deadline.

15:12.000 --> 15:18.000
So these are the apps. And of course, local-first is, you know, the local-first movement.

15:18.000 --> 15:21.000
There are a lot of apps that actually get used.

15:21.000 --> 15:24.000
They use the local-first replication.

15:24.000 --> 15:30.000
But not so much yet, data portability between those apps.

15:30.000 --> 15:32.000
So that's what we are still working on.

15:32.000 --> 15:34.000
Thank you.

15:41.000 --> 15:45.000
Thank you.

