WEBVTT

00:00.000 --> 00:10.080
Well, welcome, everyone, it's really nice to see how many people have turned up and joined

00:10.080 --> 00:12.680
online, probably.

00:12.680 --> 00:19.120
So my name is Lasto, and I would like to tell you about using Nix for reproducible

00:19.120 --> 00:24.120
bioinformatics workflows.

00:24.120 --> 00:26.520
I have a pretty unusual setup.

00:26.520 --> 00:34.360
As you can see, I have no R installed, no Python, and also none of the usual bioinformatics

00:34.360 --> 00:37.040
tools that you might have.

00:37.040 --> 00:41.880
So how do I get things done, other than asking my team to do it?

00:41.880 --> 00:52.360
Well, I just enter one of my work folders, and voila, I have R 4.5.1 available; I go into another

00:52.360 --> 00:58.280
folder, have a different version of R, and if I ever need to compare some sequences,

00:58.280 --> 01:02.680
I can just prepend a comma and BLAST away.

01:02.680 --> 01:06.480
It hasn't always been like that.

01:06.480 --> 01:13.560
Some years ago, I was a biologist looking for free software to do some statistics and plotting,

01:13.560 --> 01:18.320
to do it properly, so I found R and learned it.

01:18.320 --> 01:25.040
I wrote some scripts, of course, and I realized my scripts started breaking when I upgraded

01:25.040 --> 01:27.240
R or some packages.

01:27.240 --> 01:34.880
So I thought, okay, we need to control this somehow, so I found renv, which is great software

01:34.880 --> 01:38.120
to do so; more recently there is also rv.

01:38.120 --> 01:45.480
It controls R and R packages, but not anything else in your system.

01:45.480 --> 01:50.480
So every time I try to run it or restore it on a new computer, then I have to think a little

01:50.480 --> 01:54.120
bit, quite a bit.

01:54.120 --> 02:01.240
So then, for other workflows, I use Docker and CWL, so of course, the question comes naturally,

02:01.240 --> 02:06.400
why can't you just use Docker for your analysis workflows as well?

02:06.400 --> 02:11.480
And it turns out you can, and it actually works really nicely and reproducibly: Docker images

02:11.480 --> 02:13.920
are reproducible.

02:13.920 --> 02:17.040
The problem there, I think, is that it doesn't scale too well.

02:17.040 --> 02:22.320
And what I mean by that is, if you have just one workflow, it's fine.

02:22.320 --> 02:28.400
If you have multiple tasks that you are trying to do, then you cannot just cram everything

02:28.400 --> 02:30.720
into one Docker image, right?

02:30.720 --> 02:36.920
You need to build some kind of composition of Docker images with base, with R, Python, and

02:36.920 --> 02:43.040
then one for bulk RNA-seq, one for single cell, and one for flow cytometry, for example.

02:43.040 --> 02:49.200
So you can do this with the FROM statement, of course.

02:49.200 --> 02:54.000
But then you still end up with slightly more packages than you need for each of these.

02:54.000 --> 02:59.320
You cannot create an image for every single project that you have.

02:59.320 --> 03:06.520
And you also have installations of different things in many places in your tree.

03:06.520 --> 03:13.200
So I was still not satisfied with the granularity, and Docker is not really atomic in that

03:13.200 --> 03:14.680
sense.

03:14.680 --> 03:22.400
So I went further and I found Nix, which I will quickly introduce here.

03:22.400 --> 03:28.440
So it's a package manager that controls everything.

03:28.440 --> 03:34.120
And this is what really drew me to this, is that it's not only R, not only Python, it controls

03:34.120 --> 03:36.200
all the system dependencies and everything.

03:36.200 --> 03:41.400
You have packages for all kinds of software.

03:41.400 --> 03:49.000
You can build software truly reproducibly, and it's isolated, and it's atomic, so everything

03:49.000 --> 03:50.000
I needed.

03:50.000 --> 03:58.800
It did change my life completely, and the way I work, for the better.

03:58.800 --> 04:01.240
So how does it manage packages?

04:01.240 --> 04:09.600
It's a monorepo that has basically software recipes written in the Nix language.

04:09.600 --> 04:16.800
Each recipe has the source, all its dependencies defined, and how you build that particular

04:16.800 --> 04:18.120
software.

04:18.120 --> 04:21.920
And then of course all the dependencies are also contained in nixpkgs, so it's just packages

04:21.920 --> 04:26.880
all the way down to zero.

04:26.880 --> 04:30.520
And when you build something, it ends up in /nix/store, so it doesn't interfere

04:30.520 --> 04:33.080
with anything else on your system.

04:33.080 --> 04:39.040
It is also identified by a unique hash that is calculated based on the recipe itself,

04:39.040 --> 04:42.800
and the dependencies' recipes, recursively.

04:42.800 --> 04:48.720
So if anything changes, even an upstream package version changes, you get a new instance

04:48.720 --> 04:56.560
of your software, and these different variants can live next to each other peacefully.

04:56.560 --> 05:02.560
And that's what you saw: I could have many different versions of R on my computer.

05:02.560 --> 05:06.920
You don't have to build everything yourself, there's caching built in, so most of the time

05:06.920 --> 05:16.800
you end up just downloading the binary, but sometimes you can also build it yourself.

05:16.800 --> 05:24.880
It is huge: nixpkgs is the biggest repository out there, which is a bit of cheating,

05:24.880 --> 05:31.480
because it contains all the language-specific packages, including Bioconductor and

05:31.480 --> 05:38.120
CRAN, but as bioinformaticians, I think we don't mind too much about this.

05:38.120 --> 05:44.560
And these are all built with a single Nix expression, and I think about 80-90% of the

05:44.560 --> 05:47.360
packages work out of the box.

05:47.360 --> 05:52.720
Some need special attention because of how Nix is, so it's because of the immutable nature

05:52.800 --> 05:58.240
of Nix, or that you cannot download things during installation, or you need to point out some

05:58.240 --> 06:00.440
system dependencies.

06:00.440 --> 06:08.680
So for that, we have a pretty nice community; you are welcome to join the Matrix channel.

06:08.680 --> 06:16.120
Around R and Bioconductor release cycles, we get together and hack away at night to make

06:16.120 --> 06:23.600
those new packages work again, but the ones that are breaking, of course.

06:23.600 --> 06:28.680
And if you want something, if you are not happy with the release cycles, you want something

06:28.680 --> 06:34.120
bleeding edge, then you can check out Bruno Rodrigues's rstats-on-nix, where he has

06:34.120 --> 06:37.000
daily snapshots.

06:37.000 --> 06:40.560
Right.

06:40.560 --> 06:48.080
So if you are not yet into Nix, even after today's talk, you can also do this

06:48.080 --> 06:54.480
purely with R. There is the rix package for you. You can just use a function, define

06:54.480 --> 07:02.520
the R version, the packages that you need, and you can also have system packages, IDEs,

07:02.520 --> 07:04.440
you name it.

07:04.440 --> 07:10.160
You can also convert an existing renv environment into a Nix expression, and so this will just

07:10.160 --> 07:15.160
spit out a Nix expression for you. It's a file that you can then check in to Git,

07:15.160 --> 07:22.160
for example, and with nix-shell, you just enter and you have the environment that you

07:22.160 --> 07:26.160
defined.

07:26.160 --> 07:30.840
Yeah, I could talk a little bit about workflows, but I think for the sake of time, maybe

07:30.840 --> 07:36.920
I will just quickly mention, you can manage workflows, there are two initiatives that I know

07:36.920 --> 07:47.160
about, BioNix and rixpress, that you can try; they are similar to targets, for example,

07:47.160 --> 07:51.040
or Nextflow.

07:51.040 --> 07:55.640
But what I would like to talk a little bit more about is Docker, because I might have given

07:55.640 --> 08:01.800
the impression that Docker and Nix are somehow mutually exclusive, but they do work really

08:01.800 --> 08:03.680
well together, actually.

08:03.680 --> 08:13.400
And this you can do in two ways: you can either define a Docker image environment in a

08:13.400 --> 08:21.880
Nix expression like the one you see here, for example with buildLayeredImage, and then you add the software

08:21.880 --> 08:27.280
you want in your image, and then you will get a Docker image as a product.

08:27.280 --> 08:31.760
This is more reproducible than Dockerfiles, which are not reproducible.

08:31.760 --> 08:34.760
So I like to use it to make my Docker images.

08:34.760 --> 08:41.640
It also produces quite slim images, so there's less attack surface, right?

08:41.640 --> 08:47.440
Another way that I found useful: if, let's say, in your workplace you are tied to virtual

08:47.440 --> 08:55.080
machines or containers, and you must use them, then you can just install Nix on top.

08:55.080 --> 09:01.880
Now you have a single image to maintain, and then Nix can pull in the necessary dependencies

09:01.880 --> 09:08.080
for each of your tasks, right? For each task, you can define a shell.nix file,

09:08.080 --> 09:12.240
you define what you need for that task, and Nix will take care of it.

09:12.240 --> 09:18.120
So there's no need to maintain Docker images anymore.

09:18.120 --> 09:20.400
So what are the benefits of using it?

09:20.480 --> 09:28.280
I like the fact that I can rest assured, I'm going much faster than before, that you can

09:28.280 --> 09:35.880
rest assured that you will have the environment that you defined, and you will have it five years

09:35.880 --> 09:43.400
later, or 10 years later, you can also share your environment with your colleagues, and

09:43.400 --> 09:51.640
you can be sure that they can run your code, in CI/CD pipelines or containers equally.

09:51.640 --> 09:58.960
And that is because you can store your environment as part of your code and fire it up anywhere.

09:58.960 --> 10:04.840
I also enjoyed being part of the R packages maintainer community, so I've learned a lot

10:04.840 --> 10:11.600
from them as they are quite knowledgeable in the Nix space.

10:11.600 --> 10:19.760
What I can also say is that learning Nix is an experience, it's a journey;

10:19.760 --> 10:27.040
I can compare it to the experience when I first installed Linux, so you kind of realize

10:27.040 --> 10:32.720
there is something else out there, something very interesting and very cool, and it changes

10:32.720 --> 10:35.440
the way you think about software.

10:35.440 --> 10:44.600
So with that, thank you for your attention, and I'm looking forward to your questions.

10:44.600 --> 10:45.600
Thank you.

10:45.600 --> 10:50.800
We have time for two questions, but I would like to make a small comparison, because Nix

10:50.800 --> 10:57.440
is not the only player in the game, and we also have Guix, and afterwards later on we

10:57.440 --> 11:00.760
have a small lightning talk about Guix, so it would be nice to compare.

11:00.760 --> 11:04.200
We would have loved to have them together, but we couldn't.

11:04.200 --> 11:09.280
So, on to questions, and by the way, if the next speaker could get in line, so you can

11:09.280 --> 11:11.480
just prepare the next talk.

11:11.480 --> 11:12.480
Are there any questions?

11:12.480 --> 11:42.040
So the question is: what was the original use of Nix?

11:42.040 --> 11:47.040
Actually, I was surprised to find out that Nix is really old, so I think it's about

11:47.040 --> 11:55.320
20 years old now, which means it predates Docker, but it's never been that popular,

11:55.320 --> 11:59.880
I think now I see a bit of an uptick in popularity.

11:59.880 --> 12:06.280
I think the original purpose was reproducibility, so how to build software in

12:06.280 --> 12:11.840
a reproducible way, and actually you can use it not only for software, but you can

12:11.840 --> 12:20.160
also build any kind of digital product, let's say.

12:20.160 --> 12:26.960
But the original purpose, I think, was that. Am I correct?

12:26.960 --> 12:33.080
So you say that one of the problems of using Docker images is that you have to have an

12:33.080 --> 12:38.080
environment with many dependencies, many different tools, but what if you use, for

12:38.080 --> 12:43.040
instance, things like BioContainers and you try to optimize your processes? It's

12:43.040 --> 12:47.880
like what we're doing in nf-core, where we have a module that runs a single tool or a couple

12:47.880 --> 12:52.800
of tools, but not type to have everything, I see it in a block area in it, and then you

12:52.800 --> 12:56.400
don't rely in the community to have this image, I don't want anything in this make make

12:56.400 --> 13:01.240
is the same approach, so why didn't you see Nix one days already at Nix communities

13:01.240 --> 13:03.240
you've seen is the game?

13:03.240 --> 13:04.240
Okay.

13:04.240 --> 13:05.240
Can you repeat the question?

13:05.280 --> 13:06.240
Yeah, absolutely.

13:06.240 --> 13:16.360
So the question is, I'll try to summarize it, I think it was: why not use publicly

13:16.360 --> 13:24.960
available images for single workflow steps, right, containers that are defining environments

13:24.960 --> 13:27.040
for workflow steps.

13:27.040 --> 13:32.620
So I think as long as I was working with workflows, like for example, I don't know

13:32.620 --> 13:39.300
RNA-seq mapping, or read mapping, I would say, it was okay because then you can really

13:39.300 --> 13:46.780
just pass it on. Where I started to feel the need for Nix was actually more downstream

13:46.780 --> 13:54.580
analytics, so when you are using various R packages for different tasks, and yeah, you

13:54.580 --> 13:59.420
have to build the container with R in it, and then this set of packages and another container

13:59.420 --> 14:06.420
with another set of packages, and yeah, for each project you need just a slightly different

14:06.420 --> 14:19.900
environment, so that's where I started using it, but yes, and actually you can, I could

14:19.900 --> 14:29.260
see it working really well together with nf-core or Galaxy, if they would support Nix

14:29.260 --> 14:33.940
defined environments, right? In the meantime, we can build Docker containers with Nix

14:33.940 --> 14:34.940
as well.

