WEBVTT

00:00.000 --> 00:11.280
The break is over, so on to the next talk. We've been really fortunate to have a really

00:11.280 --> 00:20.240
nice spread of topics covered, so now we move on to, well, polytopes, or metabolic flux,

00:20.240 --> 00:37.040
and it's my pleasure to introduce [name unclear]. OK. Thanks a lot. So, today, I would

00:37.040 --> 00:44.920
like to introduce Dingo, which is a Python package for sampling on metabolic networks.

00:45.880 --> 00:53.680
OK. So, I will start with some intro. Probably most of you know it, but it's good

00:53.680 --> 00:59.240
background. So, probably you know that in our cells there are thousands

00:59.240 --> 01:06.280
of reactions in every cell. So, here we are interested in the input and the output of those

01:06.280 --> 01:14.040
reactions, which we call metabolites. And of course, you can model all these reactions and

01:14.040 --> 01:24.440
the metabolites that take part in these reactions as a network. OK. So, in those networks, we are

01:24.440 --> 01:31.960
interested in the flow rate, the rate of a reaction, the rate at which one reaction happens,

01:31.960 --> 01:41.480
which we call the flux. OK. So, as you see, the reactions are the arrows, and these v_i are the

01:41.640 --> 01:50.920
fluxes, all the flows. OK. So, you can also imagine that, from this whole set of flows, we can create

01:52.520 --> 02:01.960
a flux vector. OK. So, this can be modelled as a polytope, and this polytope is a feasible set,

02:01.960 --> 02:07.800
and the feasible set contains all the fluxes that balance the network. So, I'm

02:07.880 --> 02:15.560
skipping the linear algebra here, but you can model it as a linear system, actually linear inequalities,

02:15.560 --> 02:22.920
where all the feasible solutions of these inequalities are the fluxes that balance my network.

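NOTE
A minimal sketch of the feasible set described above, assuming a hypothetical
three-reaction toy network. S is the stoichiometric matrix (rows are metabolites,
columns are reactions), and the bounds LOWER and UPPER are made-up values. A flux
vector v is a steady state when S v = 0 and v respects the bounds. None of these
names come from Dingo.
```python
# Toy network (hypothetical): uptake -> A, A -> B, B -> export.
# Rows are metabolites (A, B); columns are reactions (v0, v1, v2).
S = [
    [1, -1, 0],   # A: produced by v0, consumed by v1
    [0, 1, -1],   # B: produced by v1, consumed by v2
]
LOWER = [0.0, 0.0, 0.0]      # assumed lower flux bounds
UPPER = [10.0, 10.0, 10.0]   # assumed upper flux bounds
def is_steady_state(v, tol=1e-9):
    """Return True if S v = 0 and LOWER <= v <= UPPER (within tol)."""
    balanced = all(abs(sum(s * x for s, x in zip(row, v))) <= tol for row in S)
    bounded = all(lo - tol <= x <= hi + tol for lo, x, hi in zip(LOWER, v, UPPER))
    return balanced and bounded
print(is_steady_state([2.0, 2.0, 2.0]))  # every metabolite is balanced
print(is_steady_state([2.0, 1.0, 1.0]))  # metabolite A accumulates
```
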
02:22.920 --> 02:31.000
So, I want to study this whole space. OK. Of course, networks in real life are like this. So,

02:31.000 --> 02:37.960
this is Recon 1, the human metabolic network. As far as I know, there are also two more

02:39.960 --> 02:47.240
versions, like Recon 2 or 3, that are even larger. OK. And this whole network will

02:47.240 --> 02:55.160
become for us a convex polytope. So, OK, imagine that box: this is a convex

02:55.160 --> 03:02.600
polytope in 3D space, but here you have as many dimensions as there are reactions. So, it

03:02.600 --> 03:11.320
can be thousands of dimensions. OK. And this polytope, the red one, represents the balance.

03:11.320 --> 03:19.560
So, the steady states. OK. Now, if we want to find, say, the optimal steady states

03:19.640 --> 03:26.520
with respect to, for example, some biomass objective function, then we have to do some optimization,

03:26.520 --> 03:32.920
which is simple linear optimization. This is the method called FBA. So, in the

03:32.920 --> 03:40.520
polytope, this means that you would like to find the blue thing, which is a facet. Let's say, if

03:40.520 --> 03:48.840
you have a box, it's one facet of the box. This is optimal according to the biomass. So,

03:48.840 --> 03:56.120
there, all the fluxes are optimal with respect to this objective. OK. So, this is one, let's say,

03:56.120 --> 04:06.680
biased way of studying this, because FBA will give you one vertex of this blue facet that corresponds

04:06.680 --> 04:14.760
to the optimal state. The other way is the unbiased one. So, we can do some sampling on this facet,

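NOTE
The linear optimization mentioned above can be written in the usual FBA form.
The symbols are assumptions and are not read from the slides: S is the
stoichiometric matrix, v the flux vector, c the objective vector (for example,
selecting the biomass reaction), and v_min, v_max the flux bounds. The set of
optimal solutions is a face of the polytope, and a solver typically returns one
vertex of it.
```latex
\begin{aligned}
\max_{v \in \mathbb{R}^n} \quad & c^\top v \\
\text{subject to} \quad & S v = 0, \\
& v_{\min} \le v \le v_{\max}
\end{aligned}
```
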
04:15.400 --> 04:22.200
and, let's say, this will cover all the possible states that are optimal.

04:23.160 --> 04:31.160
So, this is the last figure. So, imagine that you do some, in this case, uniform sampling on this facet,

04:31.160 --> 04:42.200
and then each point of the sample represents some optimal flux. OK. In order to do it,

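NOTE
To illustrate what sampling a polytope means, here is a minimal hit-and-run
sketch on a hypothetical 2D triangle {x : A x <= b}. Hit-and-run is swapped in
because it is short; it is not claimed to be the algorithm Dingo runs, and A, b
and the starting point are made-up values.
```python
import random
# Toy polytope {x : A x <= b}: the triangle x >= 0, y >= 0, x + y <= 1 (hypothetical).
A = [[-1.0, 0.0], [0.0, -1.0], [1.0, 1.0]]
b = [0.0, 0.0, 1.0]
def hit_and_run(x, n_samples, rng):
    """Draw n_samples points from the polytope, starting from an interior point x."""
    samples = []
    for _ in range(n_samples):
        d = [rng.gauss(0.0, 1.0) for _ in x]          # random direction
        t_min, t_max = float("-inf"), float("inf")
        for row, bi in zip(A, b):                     # clip the line x + t d by each half-space
            ad = sum(a * di for a, di in zip(row, d))
            slack = bi - sum(a * xi for a, xi in zip(row, x))
            if ad > 1e-12:
                t_max = min(t_max, slack / ad)
            elif ad < -1e-12:
                t_min = max(t_min, slack / ad)
        t = rng.uniform(t_min, t_max)                 # uniform point on the chord
        x = [xi + t * di for xi, di in zip(x, d)]
        samples.append(x)
    return samples
points = hit_and_run([0.25, 0.25], 2000, random.Random(0))
print(len(points))
```
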
04:42.200 --> 04:50.200
we built a package called Dingo. It's written in Python. It has several sampling

04:50.200 --> 04:59.480
and rounding algorithms, which are based on a C++ library called volesti. So, a library that

04:59.480 --> 05:07.160
we also maintain, and everything that needs performance is in C++. So, the Python part contains

05:07.800 --> 05:14.760
the bindings, and also some extra functions, like loading models from different standards,

05:15.560 --> 05:23.160
loading utilities, and doing some statistics, like computing copulas or joint distributions,

05:23.160 --> 05:30.280
and things like that. OK. So, yes, this is also published in Bioinformatics Advances, if you want to see

05:30.360 --> 05:40.120
the paper, and we are a small team working on this problem. So, let's see a use case:

05:41.960 --> 05:49.640
how you can use Dingo to study these fluxes. So, in the first line, you just import

05:51.880 --> 05:58.840
MetabolicNetwork and the sampler from Dingo, and then you create a model

05:58.920 --> 06:06.680
on the second line. This is a simple model of E. coli. So, this is quite small. If

06:06.680 --> 06:14.520
you run it on your computer, this will be really fast. Then we do FBA. This is,

06:14.520 --> 06:22.840
let's say, the biased method. This will give us one value, one flux value, that is optimal.

06:23.800 --> 06:29.800
And then we try to sample. So, we create a sampler, a PolytopeSampler, given the model,

06:30.360 --> 06:38.040
and the heavy, the computationally heavy, part is the line that

06:38.920 --> 06:45.640
computes steady states, like sampler generate steady states. So, there, ESS is a statistical thing that

06:45.640 --> 06:56.120
somehow tells you that we want 3,000 samples that are somehow statistically unbiased.

06:56.120 --> 07:01.240
So, we want to have 3,000 points that are uniform, let's say, on this

07:01.240 --> 07:06.840
facet, in some high-dimensional space. Okay, and then we plot the histogram. So, in the

07:06.840 --> 07:12.760
histogram, you can see the blue line is FBA, the information that FBA gives you. And the histogram

07:12.840 --> 07:19.560
is all the samples, the flux distribution we compute from the samples. And this is

07:20.600 --> 07:25.800
from one reaction in the network, and this is from another reaction. So, even from this,

07:27.000 --> 07:35.560
you can understand that FBA can give you different information. Okay, you can do it for

07:35.560 --> 07:41.240
all the reactions, and you can do it for different networks. Okay, now, the other thing that you can

07:41.320 --> 07:49.720
do is that you can also study the connection, the

07:49.720 --> 07:55.960
dependencies, between the reactions. And in order to do it, we use copulas, which is what you use when you

07:55.960 --> 08:03.240
study joint distributions. So, here, I select two different reactions from

08:03.240 --> 08:09.880
my network, ACONTa and PPC, and I compute the copula, again with sampling. So,

08:10.840 --> 08:16.200
then I have this graph. So, this means that when the flux in one

08:16.200 --> 08:22.840
reaction goes up, the other also goes up. It could also be different. So, this is a way

08:22.840 --> 08:31.560
to study the unbiased dependence between two different reactions

08:31.640 --> 08:43.560
in my network. Okay, and then one thing that you could also do is, okay, this is maybe a

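NOTE
A minimal sketch of an empirical copula between two reactions, assuming the
flux samples are already available as two lists. The samples here are
synthetic stand-ins, not Dingo output, and the function names are hypothetical.
Each flux is replaced by its scaled rank, which makes both marginals uniform,
and the pairs are then binned on a grid. Mass piling up on the diagonal means
the two fluxes rise together.
```python
import random
def ranks_to_unit(values):
    """Map each value to its rank scaled into (0, 1], giving a uniform marginal."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    u = [0.0] * len(values)
    for rank, i in enumerate(order, start=1):
        u[i] = rank / len(values)
    return u
def empirical_copula(x, y, cells=5):
    """Return a cells x cells grid of probability mass for the rank-transformed pairs."""
    ux, uy = ranks_to_unit(x), ranks_to_unit(y)
    grid = [[0.0] * cells for _ in range(cells)]
    for a, c in zip(ux, uy):
        i = min(int(a * cells), cells - 1)
        j = min(int(c * cells), cells - 1)
        grid[i][j] += 1.0 / len(x)
    return grid
# Synthetic stand-in for two flux samples that rise together.
rng = random.Random(1)
flux_a = [rng.uniform(0.0, 1.0) for _ in range(1000)]
flux_b = [a + rng.gauss(0.0, 0.1) for a in flux_a]
grid = empirical_copula(flux_a, flux_b)
diagonal_mass = sum(grid[k][k] for k in range(5))
print(round(diagonal_mass, 2))  # independent fluxes would give roughly 0.2
```
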
08:43.560 --> 08:50.440
long shot, that you can use this to do some targeting. What we did is that we took

08:50.440 --> 08:57.000
a paper from Renz et al., where they generate a host-virus network. They want to study

08:57.960 --> 09:05.640
the reactions that are related to COVID in order to create a drug. So here you have two objective

09:05.640 --> 09:10.680
functions. They model it having two objective functions, the human biomass and the virus growth.

09:11.800 --> 09:18.200
And then they compute the FBA. So, what we did is, we said, okay, instead of FBA,

09:18.200 --> 09:26.280
let's do some sampling. Okay, what they show with FBA is that

09:27.080 --> 09:31.640
most of the reactions have the same FBA value, but one reaction has a different FBA value. So what we do with

09:31.640 --> 09:39.080
sampling is that we look at the flux distribution for the host biomass

09:39.080 --> 09:45.400
and the virus growth rate. So one is human with COVID and the other is human without COVID. And we

09:45.400 --> 09:52.600
look: if the flux does not change, it means that this reaction probably does not have to

09:52.680 --> 09:58.200
do with COVID, or it's not a target that we want to study. But if we have something like this, two

09:58.200 --> 10:05.080
different fluxes, then this means that this reaction is doing something in my organism. So the question

10:05.080 --> 10:11.240
here that we could raise is: is it possible that sampling can give you more information

10:11.240 --> 10:21.080
than FBA? It seems that it certainly does, but can we use it for targeting? Okay, so this

10:21.080 --> 10:27.880
is my last slide. I would like to give some notes about the current and future work.

10:27.880 --> 10:34.440
We have some Google Summer of Code projects. Two of them were last year.

10:34.440 --> 10:41.080
One was sampling from the boundary. So now we sample from the inside, but it seems that

10:41.080 --> 10:46.520
it's interesting for some applications that you should sample from the boundary. So we had

10:46.600 --> 10:54.120
one student that implemented the boundary sampling, which is more, let's say, difficult to

10:54.120 --> 10:58.920
do. We don't have a lot of ways to sample from the boundary. Let's say that it's not

10:58.920 --> 11:04.920
a convex problem. And then there is another student that was doing statistical analysis.

11:04.920 --> 11:12.360
So if you remember the copula, you can imagine that every cell of this matrix is

11:12.360 --> 11:19.320
one copula. But instead of plotting the copula, we just have a Pearson correlation and we

11:19.320 --> 11:25.640
just draw the value here. So this is somehow the connections between

11:25.640 --> 11:34.200
all the reactions in a network. Yes, and this is in the package.

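NOTE
A minimal sketch of the reaction-by-reaction correlation matrix described
above, assuming the sample is a list of flux vectors with one coordinate per
reaction. The data is synthetic and the function names are hypothetical, not
part of Dingo. Entry (i, j) is the Pearson correlation between the sampled
fluxes of reactions i and j.
```python
import math
import random
def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (c - my) for a, c in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((c - my) ** 2 for c in y))
    return cov / (sx * sy)
def correlation_matrix(samples):
    """samples[k] is one sampled flux vector; entry (i, j) correlates reactions i and j."""
    columns = list(zip(*samples))
    return [[pearson(ci, cj) for cj in columns] for ci in columns]
# Synthetic flux samples: reaction 1 follows reaction 0, reaction 2 opposes it.
rng = random.Random(2)
samples = []
for _ in range(500):
    v0 = rng.uniform(0.0, 1.0)
    samples.append([v0, 2.0 * v0 + rng.gauss(0.0, 0.05), 1.0 - v0])
matrix = correlation_matrix(samples)
for row in matrix:
    print([round(value, 2) for value in row])
```
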
11:34.200 --> 11:41.320
You can use it. Yeah. So that's it. The repository is on GitHub. We have a Colab

11:41.400 --> 11:49.960
notebook where you can run all of these, and there is also a [unclear]

11:49.960 --> 11:59.720
package. That's all. Thanks a lot. And Olga, would you like to start

11:59.720 --> 12:10.920
coming to set up? Questions? So this was probably the first talk that we had with more

12:10.920 --> 12:23.240
equations. So I tried to remove them. It's very hard. I mean, just that this is either

12:23.240 --> 12:28.440
a sort of static solution versus numerical simulation to get a broader space

12:28.440 --> 12:36.120
of solutions. So what sort of compute would work better for the copula computation?

12:36.120 --> 12:43.320
So if you're doing all dimensions, all reactions against all reactions, is it parallel,

12:43.320 --> 12:50.120
is it fully parallelized? So it depends on the network. So for Recon 3D, which is the most

12:50.200 --> 12:57.080
advanced model, even sampling, one sampling, which means that you get one copula,

12:59.320 --> 13:05.960
used to take, okay, maybe days. So for the whole matrix, I don't know. Yeah.

13:07.960 --> 13:18.440
But, okay, yes, probably you can, because

13:18.520 --> 13:24.920
if you have the sample, then you can have the matrix. Yes. So the sample gives you,

13:24.920 --> 13:28.920
let's say that in the sample, you have a point that is 1,000-dimensional,

13:29.560 --> 13:34.600
and every coordinate corresponds to a reaction. So if you have the sample, you have everything.

13:35.720 --> 13:41.400
So let's say, yeah, in a day, it's not bad. All right. Thank you very much. Thanks.

