WEBVTT

00:00.000 --> 00:10.880
Thank you for coming to our talk. I am David Koňařík. This is Jan Černý. We're

00:10.880 --> 00:17.600
both students, master's students at the Charles University Faculty of Mathematics and Physics,

00:17.600 --> 00:23.560
which also does computer science, although it's not in the name. And we are the main

00:23.640 --> 00:29.000
maintainers, both code- and content-wise, of Matfyz Wiki, our student wiki that we built.

00:31.240 --> 00:38.040
Thank you very much for the introduction, David. Our faculty has a really long-running problem

00:38.040 --> 00:45.080
that finding information about anything was actually really hard. Everything was really fragmented,

00:45.160 --> 00:52.600
and not just in terms of sources of information. Our school itself is really fragmented. We have

00:52.600 --> 00:57.800
buildings all over Prague. We have multiple fields of study. We can study computer science, physics,

00:57.800 --> 01:04.280
linguistics, mathematics. Every building has its own organization and systems they use.

01:04.280 --> 01:13.960
And it's really easy to get lost in it. And so, the attempts at a solution in the past were multiple

01:14.040 --> 01:23.640
wikis, which mostly ended up unmaintained or broken or full of spam, forums, some social media groups,

01:23.640 --> 01:31.880
which ended up being too chaotic to actually search, with a bad signal-to-noise ratio

01:31.880 --> 01:38.120
and more spam than any content that could be useful. Then Discord servers popped up. There were many of

01:39.080 --> 01:48.280
them, but they are very hard to archive or search through the past. And there are tons of personal

01:48.280 --> 01:54.280
and organizational websites which are very often outdated or contain only

01:56.040 --> 02:03.320
incomplete information. So, our vision for a solution was to finally make one central place

02:03.320 --> 02:08.280
where you can find the information or find where you can find the information. Everyone can

02:08.280 --> 02:15.960
contribute to it, even anonymously. Content is organized and categorized, so it's easy to navigate

02:15.960 --> 02:22.440
and search. It would be very versatile and universal, so you can put text in it, media files,

02:23.000 --> 02:29.640
PDFs, study materials, photos and anything we need. And everything would be free and open source,

02:29.640 --> 02:36.600
so everyone can contribute even to the code of the solution itself. And that's how

02:36.600 --> 02:44.040
Matfyz Wiki was born. This is the front page and how it currently looks. As you can see,

02:44.040 --> 02:51.000
there are four main categories, which we'll check later. Currently, some statistics:

02:51.880 --> 02:57.320
we started by importing a lot of content from other dead platforms and replacing them.

02:58.280 --> 03:03.160
That generated more than 10,000 items.

03:04.840 --> 03:11.480
Now, about 950 pages have been edited since last year and are currently in the

03:12.520 --> 03:18.200
not archived but living part of the content, which is heavily edited by students and

03:18.520 --> 03:27.960
university-adjacent people. And since the start of this year, 140 items have been edited.

03:29.720 --> 03:36.760
This is the categorization. We'll briefly go through it, because it's in Czech, not translated.

03:36.760 --> 03:43.560
This one is about the school and how it works. This one is about studies, this one about student life

03:43.640 --> 03:50.760
and this is about general tips. School contains information about bureaucracy, how it's organized,

03:50.760 --> 03:56.920
campus information, information about buildings, where you can find it, the hierarchy of power,

03:56.920 --> 04:03.800
what you need to ask for and stuff like that. It's very useful for freshmen who come to the

04:04.520 --> 04:11.560
university and try to orient themselves in its system. Then Study, which is the biggest and most

04:11.640 --> 04:21.880
used category, where you can find information for specific subjects in school and study materials,

04:21.880 --> 04:28.200
past exams, tips, what you need, but even more meta tips like how to set up your LaTeX

04:28.200 --> 04:37.480
so it's good for your physics reports, and other tips like what content management system to use,

04:37.480 --> 04:42.440
knowledge management system to use, how to set up your editor and stuff like that.

04:42.440 --> 04:51.480
Then there is Student Life, which is mostly a snapshot of our culture, which is very

04:53.320 --> 05:00.760
strong. And we have a very strong sense of community and a long history. So this one is something

05:00.760 --> 05:06.920
that contains it and archives and preserves it, and there you can find information from other

05:06.920 --> 05:13.480
organizations that are part of the school and what is happening where. And the General Tips, which are

05:14.600 --> 05:23.000
much more general. There are tech tips that you can do in your dorm, how to set up any tips,

05:23.000 --> 05:29.560
what to choose for Linux distributions and stuff like that. The first content, as I already said, was

05:29.640 --> 05:37.560
imported. We imported from three main sources, all of which we replaced: the old wiki, which was a

05:37.560 --> 05:47.160
MediaWiki maintained by students, but it died and got a really large problem with bots,

05:47.160 --> 05:52.360
so it became basically unusable. There was another project called Student Guidebook, which was

05:52.440 --> 06:02.200
another MediaWiki, but under the administration of a specific group of people, so it couldn't be

06:02.200 --> 06:09.800
publicly edited by all students. It was very selective in what it contained. And the forum, which was

06:09.800 --> 06:16.840
an old phpBB forum, which was in its last years mostly bots talking to each other, trying to

06:16.840 --> 06:26.360
sell each other something. What do we do about moderation? Our faculty has a really strong sense of

06:26.360 --> 06:34.360
community and it's a very high trust environment. We allow anonymous edits and basically we have

06:34.360 --> 06:41.160
only three rules for the whole wiki: use common sense, every article must have exactly one

06:41.960 --> 06:48.840
H1 heading, and links to pages should have a depth of one. And theoretically it's moderated,

06:48.840 --> 06:57.000
but we never had any malicious actor. And all we did was maybe contact someone who used

06:57.000 --> 07:04.920
macros in a bad way, not how they're intended. But that's basically it. So one of the morals which you can

07:04.920 --> 07:12.600
take from this lecture is: if you are scared of moderation, if you are working in a specific community

07:13.160 --> 07:20.520
and there is some level of trust, you do not need to worry about it. It actually works really great.

07:21.400 --> 07:27.640
And now I will give space to David to tell more about the software side of Matfyz Wiki.

07:28.840 --> 07:34.680
Thank you, thank you. We decided to replace the MediaWiki system that the old wiki used,

07:34.680 --> 07:40.520
because it was a really old version, and we didn't want to move forward with that. And we

07:41.560 --> 07:50.280
made a decision to make a list of requirements for the new wiki. And we expected

07:50.280 --> 07:55.880
there to be like tens of different wiki projects that would fit. Well, we needed something that

07:55.880 --> 08:01.320
used Markdown because students already know Markdown from GitHub, from GitLab, from various other

08:01.880 --> 08:07.080
applications that the faculty uses that do use Markdown. We wanted something that could be

08:07.080 --> 08:15.720
extended with custom macros. So we could create plugins that would work to add maybe reusable

08:15.720 --> 08:21.000
components, or even something that could talk to a different server and pull data from it,

08:21.000 --> 08:27.400
without having to manually modify the source code of the parser or something like that.

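NOTE
A minimal sketch, not shown in the talk, of how custom macros can be added through a registry instead of by editing the Markdown parser. The <<Name(argument)>> syntax, the MACROS registry and the two example macros are assumptions made for illustration, not Matfyz Wiki's actual implementation.
```python
import re
from typing import Callable, Dict
# Hypothetical registry: macro name mapped to a function that renders its argument.
MACROS: Dict[str, Callable[[str], str]] = {}
def macro(name: str):
    """Register a rendering function under the given macro name."""
    def register(func: Callable[[str], str]) -> Callable[[str], str]:
        MACROS[name] = func
        return func
    return register
@macro("Upper")
def upper(arg: str) -> str:
    return arg.upper()
@macro("Room")
def room(arg: str) -> str:
    # A real macro could query another server here; this one only formats text.
    return f"[room {arg}]"
# Assumed macro syntax for this sketch: <<Name(argument)>>
MACRO_RE = re.compile(r"<<(\w+)\(([^)]*)\)>>")
def expand_macros(text: str) -> str:
    """Replace each known macro call with its output; leave unknown ones untouched."""
    def substitute(match: re.Match) -> str:
        func = MACROS.get(match.group(1))
        return func(match.group(2)) if func else match.group(0)
    return MACRO_RE.sub(substitute, text)
```
Adding a new macro then only means writing one decorated function; the expansion code stays unchanged.
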
08:28.200 --> 08:33.880
We wanted multiple input and output formats joined by a common AST. You may know this sort of

08:33.880 --> 08:39.720
design from Pandoc, and we used this to import the other wikis.

08:42.120 --> 08:47.160
And we wanted something that could use the university CAS system for authentication, because the

08:47.160 --> 08:52.200
secret of our moderation is that we only allow people with a university account

08:53.160 --> 08:58.680
access to the wiki, or write access to the wiki. If anyone else wants it, they have to write an email

08:58.680 --> 09:05.000
and be nice to us and explain why they want the access. There have been very few cases

09:05.000 --> 09:09.800
of that happening, so it's not a very big workload for us. And the last requirement: we wanted

09:09.800 --> 09:15.960
something that didn't require JavaScript for viewing. Sadly, after going through these requirements,

09:15.960 --> 09:22.600
we were left with only two wiki software options, MoinMoin and XWiki, with XWiki being

09:22.600 --> 09:28.920
really large and written in Java and its own scripting language. Partly, we picked MoinMoin

09:28.920 --> 09:35.160
because it was in Python and much smaller and more comprehensible for us as the primary

09:35.960 --> 09:42.440
maintainers of the wiki. MoinMoin used to be very popular, but then came Python 3 and MoinMoin

09:42.520 --> 09:51.560
version 2 and V2 is still in perpetual beta or even alpha. I think it works mostly pretty well,

09:51.560 --> 10:00.120
but it's a minority choice currently. Well, we didn't want to really modify it too much. We just

10:00.120 --> 10:07.240
wanted to add our own theme to make a bit of a brand statement with the wiki, to make it look like

10:07.320 --> 10:18.280
something new, and a few features here and there, maybe a bug fix or two. It worked well in our

10:18.280 --> 10:24.680
testing instance, but once we imported the old wiki with its tens of thousands of revisions, at

10:24.680 --> 10:31.880
first before we cut down all the spam and other bad stuff, it became really slow, like multiple

10:31.960 --> 10:39.080
seconds for a simple page load and tens of seconds for more demanding pages. The problem here

10:39.080 --> 10:46.760
is that MoinMoin 2 and also Moin 1 use the file system for storing everything, data, metadata,

10:48.040 --> 10:54.680
everything. MoinMoin 2 added an index using the Whoosh library, and

10:55.000 --> 11:02.680
Whoosh is mostly a full-text indexing library, so it was used for the search function, but MoinMoin

11:02.680 --> 11:09.640
2 used it for everything, including things like looking up articles by ID or looking up users.

11:11.640 --> 11:18.600
Somewhere along the line, Whoosh, which is now unmaintained, became really slow, because I can find

11:18.680 --> 11:24.920
benchmarks from maybe 2012 which say that on data sets bigger than ours, it's really fast,

11:25.640 --> 11:32.760
but currently, or rather two or three years ago when we debugged this, it was slower than

11:32.760 --> 11:40.520
grep on read and rebuilt almost the whole index on write, so we clearly needed to either fix it or

11:40.840 --> 11:50.120
replace it. Fixing it didn't get us anywhere, or rather our attempts to fix it, so we decided to

11:51.000 --> 11:57.080
move everything over to a Postgres database. First as just a secondary store, as a secondary

11:57.080 --> 12:05.000
index, replacing Whoosh, and now replacing the whole storage layer of the wiki, replacing it with

12:05.000 --> 12:13.000
SQLAlchemy and typed entities. And after three years of work on this, requests now take 50

12:13.000 --> 12:20.760
milliseconds. And you saw the size of our wiki, and I don't think it's small, I think it

12:20.760 --> 12:31.800
deserves the 50 milliseconds. We didn't just do this, the two of us. I think our greatest success

12:31.800 --> 12:41.720
is actually creating and organizing these student meetups, these student hackathons, you could say,

12:42.440 --> 12:48.600
where we got, last time I think it was like eight people programming, roughly, which is a pretty big number

12:48.600 --> 12:57.800
of people working on the wiki, first converting it to the Postgres storage, and then later adding

12:57.880 --> 13:08.840
various functions and making it more user-friendly. We got pizza. This was partly organized by

13:08.840 --> 13:16.200
our sort of partner group, which officially controls the wiki and then hands over the

13:16.200 --> 13:20.120
control to the both of us, but they bought us pizza, so thank you, Spolek Matfyzák.

13:20.200 --> 13:30.280
We didn't just change the storage, of course, that's a big thing, and I think moving from

13:30.280 --> 13:38.520
untyped dicts to typed classes holding the main data in the wiki is a big improvement,

13:39.160 --> 13:45.400
but we also added typing, basically, everywhere, because it's a big help when onboarding other people.

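NOTE
A minimal sketch, not shown in the talk, of the move from untyped dicts to typed classes. The Revision class, its fields and the metadata keys are hypothetical and do not reflect the real Matfyz Wiki schema.
```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Dict, Optional
@dataclass(frozen=True)
class Revision:
    """Typed stand-in for metadata that used to travel around as a plain dict."""
    item_id: str
    name: str
    author: Optional[str]  # None models an anonymous edit
    modified: datetime
def revision_from_meta(meta: Dict[str, Any]) -> Revision:
    """Convert an untyped metadata dict into a Revision, failing early on missing keys."""
    return Revision(
        item_id=str(meta["item_id"]),
        name=str(meta["name"]),
        author=meta.get("author"),
        modified=datetime.fromtimestamp(int(meta["mtime"]), tz=timezone.utc),
    )
```
With a class like this, a type checker or editor can flag a misspelled field name, which a dict lookup would only reveal at runtime.
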
13:45.480 --> 13:51.320
I think those eight or ten people who went to the hackathons couldn't have done what they did

13:51.320 --> 13:59.400
if the wiki had no type hints, as it did at the start. Postgres being a proper transactional

13:59.400 --> 14:07.160
database also really helps with integrity issues. We previously had at least once a month, I'd say,

14:07.160 --> 14:13.320
an issue with the index, the secondary index, and the primary file system storage desynced

14:14.280 --> 14:19.400
through something which we were not always able to trace back to its source problem.

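NOTE
A minimal sketch, not shown in the talk, of why keeping content and index in one transactional database prevents this kind of desync. It uses Python's built-in sqlite3 as a stand-in for Postgres, and the table and function names are hypothetical.
```python
import sqlite3
def open_store() -> sqlite3.Connection:
    """Create an in-memory database with a content table and a name index table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE content (item_id TEXT PRIMARY KEY, body TEXT NOT NULL)")
    conn.execute("CREATE TABLE name_index (name TEXT PRIMARY KEY, item_id TEXT NOT NULL)")
    return conn
def save_page(conn: sqlite3.Connection, item_id: str, name: str, body: str) -> None:
    """Write the page body and its index entry in one transaction."""
    # "with conn" commits if the block succeeds and rolls back if it raises.
    with conn:
        conn.execute("INSERT INTO content VALUES (?, ?)", (item_id, body))
        conn.execute("INSERT INTO name_index VALUES (?, ?)", (name, item_id))
```
If the index insert fails, for example because the page name is already taken, the content insert is rolled back too, so the two tables cannot drift apart.
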
14:20.840 --> 14:28.760
Now that storage and indexing are fully done in the same Postgres database, we haven't had such

14:28.760 --> 14:36.920
a problem yet. We rewrote the permission system; it's now simpler and more powerful,

14:37.560 --> 14:43.880
somewhat at the expense of user configurability, and I mean a user not as in the admin,

14:43.880 --> 14:53.320
but the person with an account on the wiki. We put in a more powerful Markdown parser with macro

14:53.320 --> 15:00.040
support, because previously MoinMoin had macro support, but only using its own custom markup

15:00.600 --> 15:09.720
language. Now, Markdown has full-fledged macro support, and we made various UI tweaks to our theme

15:09.720 --> 15:16.600
and to MoinMoin as a whole to make it a bit more friendly to new users who don't have the

15:16.600 --> 15:22.360
wiki training necessary to maybe know how to add attachments when attachments don't really exist.

15:22.680 --> 15:31.960
We also removed a lot of unmaintained features, and we consider that a feature, not a bug,

15:31.960 --> 15:37.800
because those features mostly didn't work, and they made everything much more complicated.

15:37.800 --> 15:43.800
So, during the process of converting and rewriting to Postgres, we axed a lot of them,

15:45.640 --> 15:51.000
and the wiki is now more maintainable as a result. We're still fixing some things,

15:51.000 --> 15:56.280
like, MoinMoin had a great suite of automated tests; most of those are now broken,

15:56.280 --> 16:04.360
because we just haven't had the time to rewrite them with the new codebase in mind,

16:05.160 --> 16:14.920
maybe one day. We have a few hopes for the future. Just about a week ago,

16:14.920 --> 16:20.040
we submitted our fork to the MoinMoin developers, who are still very diligently developing

16:20.040 --> 16:26.360
MoinMoin version 2, in the hope that they could use, if not the whole thing,

16:26.360 --> 16:35.480
then at least some of the parts of the code that we refactored. I'd also like to reuse some of their

16:35.480 --> 16:41.320
improvements that they've made in the three years since we forked. It'll take a lot of work,

16:41.320 --> 16:49.480
but maybe we can make it happen. I think we've demonstrated that using free and open source software

16:49.480 --> 16:58.600
in this kind of setting is a reality, and not just using, but actually improving it, without having

16:58.600 --> 17:09.160
to do any complicated things like pay for a maintenance contract or have great support from

17:10.040 --> 17:19.160
someone internally at Matfyz. I think the two of us managed it outside of most official

17:19.160 --> 17:28.920
processes. I think you don't really need moderation if what you're doing is giving select people

17:28.920 --> 17:37.400
access, and that's somewhat of a truism, but it turns out that select people can mean a whole

17:37.480 --> 17:43.480
university, so thousands upon thousands of people, and we hope that our software will be useful

17:43.480 --> 17:50.600
for other applications as well. Right now, it probably has a few warts in that regard. We don't

17:50.600 --> 17:56.120
really test it outside Matfyz Wiki and our development setups, our local development setups,

17:56.120 --> 18:03.480
but of the like 10 people who've come to the hackathons and contributed to the wiki,

18:03.800 --> 18:11.640
almost every one of them has found some sort of bug in our onboarding documentation and scripts,

18:13.400 --> 18:19.560
so hopefully all the bugs have been ironed out and you can, mostly problem-free, run your own

18:19.560 --> 18:31.240
instance of our wiki. Thank you very much for your attention. We hope that you check our code

18:31.240 --> 18:40.360
and you check the website, and we hope that we were able to inspire you to try something like

18:40.360 --> 18:46.920
this, because it seems like a big project, but it's now here, and we are very glad that we didn't

18:46.920 --> 18:53.080
get scared and we actually tried it. These are our contacts. Feel free to reach out to us if you have

18:53.080 --> 19:00.920
any questions afterwards, beyond the questions you can ask here right now. So thank you again

19:00.920 --> 19:03.080
for your attention.

19:31.000 --> 19:40.440
So, the onboarding process for new contributors, on the content side, I presume.

19:41.960 --> 19:55.640
That's something we have documentation for. We have a few pages on the wiki which show

19:55.640 --> 20:01.880
examples of the macros we use, examples of Markdown itself, examples of how pages look, and we

20:01.880 --> 20:08.920
sort of hope people figure it out by example, and if they don't, we tell them what they're doing wrong.

20:09.640 --> 20:14.840
Honza here keeps a tight watch on every page modification; every day he looks at the global

20:14.840 --> 20:22.040
history. So we haven't figured out anything great or awesome, it's just

20:23.000 --> 20:29.800
examples and hard work, plus we got advertisement through posters and social media posts by the faculty.

20:30.120 --> 20:53.480
I think we had, of all-time editors, it's 136 editors of all time,

20:53.480 --> 21:00.520
with most of them having made only a small number of edits, and there is a smaller, much more active group:

21:01.880 --> 21:09.960
there's like 80 people who've made five or more edits and 10 people who've made a hundred or

21:09.960 --> 21:11.080
more edits.

21:23.960 --> 21:32.040
How did the faculty react to the fact that apparently the information on their side is not sufficient,

21:32.040 --> 21:35.800
so that you had to make a wiki of your own?

21:35.800 --> 21:40.280
Well, how did the faculty react to their information being insufficient?

21:41.480 --> 21:48.200
They didn't. I think Matfyz is decentralized enough that there is no definitive answer to that

21:48.200 --> 21:55.560
question, but I think everybody in the faculty knows that the sort of distribution of information

21:55.560 --> 22:02.840
at Matfyz is, I wouldn't call it a sore point, it's very much, it's anarchy. So we have some central

22:02.840 --> 22:08.440
system which should contain loads of things and has the potential to contain loads of things but

22:08.440 --> 22:16.120
it's sort of ossified; it hasn't been properly developed for, I don't know, 10 years. It's on

22:16.120 --> 22:24.600
life support, and all the new systems have so far failed or not been delivered, so

22:24.600 --> 22:30.520
information is on people's personal pages hosted on various servers in various

22:32.040 --> 22:40.440
buildings or elsewhere or on personal domains. I think they all know that it's a bit of a mess.

22:40.440 --> 22:44.100
Thank you.

