WEBVTT

00:00.000 --> 00:21.000
The risk of supply chain attacks was quite a popular topic in the media last year. Not only security experts

00:21.000 --> 00:27.000
were discussing the topic, but also politicians and society.

00:27.000 --> 00:37.000
Not only attacks on the physical infrastructure and on plans, but also on the digital side

00:37.000 --> 00:43.000
have been discussed. The most significant incident was the supply chain attack on the xz library,

00:44.000 --> 00:49.000
targeting the OpenSSH server to gain remote code execution.

00:49.000 --> 00:56.000
How we can detect this attack and similar ones will be the topic of my talk.

00:56.000 --> 01:01.000
Who am I? I am Tobias, a cyber security researcher from Germany.

01:01.000 --> 01:06.000
I am working at the Fraunhofer Institute for Applied and Integrated Security.

01:07.000 --> 01:16.000
My main research areas are critical analysis and offensive security, both mainly in the area of automotive

01:16.000 --> 01:22.000
and embedded. You can find me online on GitHub and on Matrix.

01:22.000 --> 01:30.000
The agenda of this talk will be as follows. I will start with a small background about supply chain attacks in general

01:30.000 --> 01:40.000
and in particular the xz attack. I will introduce you to build systems for C++ and the Debian package build system

01:40.000 --> 01:44.000
to better understand the attack vector and the mitigation.

01:44.000 --> 01:50.000
I will then introduce you to my concept called supply graph.

01:50.000 --> 01:59.000
There, we will trace the build process. We can then analyze the graph to hopefully detect the xz attack.

01:59.000 --> 02:06.000
At the end I will go into the limitations of my approach and into future work.

02:06.000 --> 02:13.000
The background of supply chain attacks can probably be best described by this famous XKCD comic.

02:13.000 --> 02:21.000
It shows that modern software does not stand on its own but is built on top of multiple dependencies.

02:21.000 --> 02:24.000
We can also call them the supply chain.

02:24.000 --> 02:36.000
In our example of the xz project, we have the OpenSSH server on top, and the xz library would be the small piece down there,

02:36.000 --> 02:40.000
which is the weakest point in the chain.

02:40.000 --> 02:45.000
If this is attacked, all other components are affected as well.

02:45.000 --> 02:53.000
The xz attack, as I mentioned, targeted the OpenSSH server to gain remote code execution.

02:53.000 --> 03:08.000
The attackers infiltrated the dependency, so they did not attack the OpenSSH project directly but its dependency: a supply chain attack.

03:08.000 --> 03:16.000
The attackers did not attack the source code of xz directly, so there was no trace in the source code.

03:16.000 --> 03:25.000
But they used the build system and the way the project is built to inject their malicious code.

03:25.000 --> 03:36.000
So they used the test vectors: they decrypted test files, which then revealed the malicious code, as I will show in a second.

03:37.000 --> 03:57.000
This attack was only detected by coincidence, because the logon time of the OpenSSH server was a little bit longer due to the attack. This raises the question of whether there are any undetected attacks running, which was the main motivation for my talk.

03:57.000 --> 04:08.000
Here you can see a schematic of the attack. On the one hand, we have the legitimate source code of the project, and on the other hand, we have the attacker-modified test vector.

04:08.000 --> 04:23.000
The source code is compiled to object files, and the test vector is extracted to reveal the malicious code. Both are linked together into the shared library and shipped to other projects.

04:23.000 --> 04:35.000
To better understand the whole process, we go into the build chain itself. For C++ we have a very diverse and complex universe.

04:35.000 --> 04:42.000
There are platform-specific build systems like Visual Studio on Windows or Xcode on Mac.

04:42.000 --> 04:46.000
The GNU Autotools have been quite popular on Linux for some time.

04:46.000 --> 04:55.000
There are Makefiles and CMake, to name only a few. Those build systems are often Turing-complete programming languages on their own.

04:55.000 --> 05:06.000
So you can basically write entire applications with them: you can do dependency checking, test case execution and whatnot.

05:07.000 --> 05:24.000
They often involve recursive file structures, and they include internal and external files. All this together makes a very complex and non-transparent system, and as we know, those tend to be prone to attacks.

05:24.000 --> 05:39.000
I will now go into a few details about the Debian package system. I chose Debian because it is quite common and has a huge community, but you can apply the same concept to other build systems as well.

05:39.000 --> 05:51.000
On the one hand, you have the upstream source code, and on the other hand, you have the Debian-specific additions. This can be metadata like the package name, version and so on.

05:51.000 --> 06:00.000
What is important to know is that the original source code is unmodified by Debian and is only extended with the separate Debian configuration.

06:00.000 --> 06:13.000
The build process then yields the build artifacts like the binary files, configuration files, icons, whatever, and those are all packed into the Debian package and then shipped to the customer.

06:13.000 --> 06:20.000
So when you apt install a package, you only get the binary files but not the source files.

06:21.000 --> 06:34.000
And now I will introduce you to my concept called supply graph. The main idea behind this is that for open source projects, all program parts should be available in source code.

06:34.000 --> 06:44.000
So that is a common interest we share in the open source community, and this is, as we have seen, the point of the xz attack.

06:44.000 --> 06:51.000
You cannot really verify it because there is a binary blob popping out of nowhere.

06:51.000 --> 07:04.000
So how can we then detect the attack? We need to trace the build process, then we can build the graph, and we can traverse the graph to find the attack.

07:04.000 --> 07:11.000
I will now go step by step to see how we can do this.

07:11.000 --> 07:23.000
To trace the build process, I used a tool called CodeChecker. It is originally a tool for static code analysis, but it can also capture the build process. In the default configuration,

07:23.000 --> 07:31.000
it only captures the compile commands, but we can instruct it to also capture the linking commands.

07:31.000 --> 07:41.000
It uses LD_PRELOAD, for those of you who are interested, and intercepts all the exec calls, tracing the execution of external commands.

07:41.000 --> 07:52.000
I modified it a little bit to also capture additional tools like the assembler, the archiver, and the copy and install commands, which are used as part of the build process.

07:52.000 --> 07:59.000
I also extended the compile commands trace file to add an output entry.

07:59.000 --> 08:09.000
We can see an example on the right. It shows a Clang compiler execution. It shows the directory where the command was executed,

08:09.000 --> 08:17.000
the file entry, which is the input file of the compilation step, and the output entry, which is the output file.

08:17.000 --> 08:25.000
So, thinking about the graph structure, the input file and output file are nodes, and the edge between those nodes is the compile command.

08:25.000 --> 08:33.000
We can do this not only for the compilation step but also for every step in the build process which gives us a graph structure.
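

NOTE
Editorial sketch, not from the talk's repository: a minimal way to turn traced build steps into the graph described above.
The entry layout (directory, command, file, output) follows the speaker's description; the file names and commands are made-up assumptions.
```python
from collections import defaultdict
# Hypothetical trace entries: compilation-database style, plus the "output" key mentioned in the talk.
TRACE = [
    {"directory": "/build", "command": "clang -c a.c -o a.o", "file": "a.c", "output": "a.o"},
    {"directory": "/build", "command": "clang -c b.c -o b.o", "file": "b.c", "output": "b.o"},
    {"directory": "/build", "command": "clang a.o b.o -o app", "file": "a.o", "output": "app"},
    {"directory": "/build", "command": "clang a.o b.o -o app", "file": "b.o", "output": "app"},
]
def build_graph(entries):
    """Return {output file: [(input file, command), ...]}: files are nodes, commands are edges."""
    graph = defaultdict(list)
    for entry in entries:
        graph[entry["output"]].append((entry["file"], entry["command"]))
    return dict(graph)
GRAPH = build_graph(TRACE)
for output, edges in GRAPH.items():
    for source, command in edges:
        print(f"{source} => {output}  [{command}]")
```
Keying the graph by output file makes it easy to walk from a final artifact back towards its sources.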

08:33.000 --> 08:37.000
An example graph structure can be seen here.

08:37.000 --> 08:43.000
We have the source files on top. They are then built into object files.

08:43.000 --> 08:49.000
The object files are linked into, for example, an application and a shared library.

08:49.000 --> 08:55.000
Both are then packed into the Debian package.

08:55.000 --> 09:08.000
Now we can take this graph. We can capture the graph of the vulnerable xz version, turn it upside down, and see if we can find the attack.

09:08.000 --> 09:21.000
We start with the Debian package of the library. Among other files, it contains the vulnerable shared object file.

09:21.000 --> 09:39.000
This contains, for example, the CRC32 fast and the CRC64 fast object files and an additional CRC64 fast file, which is located in the .libs folder, which is a hidden folder on Linux.

09:39.000 --> 09:43.000
So this is already somewhat suspicious.

09:44.000 --> 09:52.000
If we trace it further, we can see that for the first two object files we can reconstruct the corresponding source files.

09:52.000 --> 10:01.000
But for the third object file there is no corresponding source file and indeed this is the malicious code of the attackers.
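

NOTE
Editorial sketch, not the speaker's implementation: one way the check described above could look on a reversed build graph.
The package and library names below are hypothetical placeholders; only the three object files mirror the example from the talk.
```python
# Hypothetical reversed build graph: each artifact maps to the files it was built from.
BUILT_FROM = {
    "pkg.deb": ["lib.so"],
    "lib.so": ["crc32_fast.o", "crc64_fast.o", ".libs/crc64_fast.o"],
    "crc32_fast.o": ["crc32_fast.c"],
    "crc64_fast.o": ["crc64_fast.c"],
}
def objects_without_source(graph, root):
    """Walk from the package towards the sources and collect object files with no recorded inputs."""
    suspicious, stack, seen = [], [root], set()
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        inputs = graph.get(node, [])
        if node.endswith(".o") and not inputs:
            suspicious.append(node)
        stack.extend(inputs)
    return suspicious
SUSPICIOUS = objects_without_source(BUILT_FROM, "pkg.deb")
print(SUSPICIOUS)
```
An object file that no traced command produced from a source file is exactly the kind of node flagged here.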

10:01.000 --> 10:07.000
So this way we can successfully detect the xz attack.

10:07.000 --> 10:16.000
The whole graph of the Debian package of xz-utils is much larger than what we saw in the small excerpt.

10:16.000 --> 10:29.000
So this is a picture of the whole graph. On the top layer we can see all the source files, in the middle layer we can see the object files and the shared libraries, and on the bottom we can see the Debian package.

10:29.000 --> 10:41.000
And in the top right corner, if we look closely, we can see the malicious binaries.

10:41.000 --> 10:47.000
So you can find all this work on GitHub as a proof of concept.

10:47.000 --> 10:58.000
It also includes a Docker container with not only the xz example but also OpenSSH and OpenSSL, so you can play with it.

10:58.000 --> 11:01.000
And modify it as you like.

11:01.000 --> 11:04.000
There are a few limitations to my approach.

11:04.000 --> 11:17.000
First, tracing the build system is unreliable. As we have seen, it is quite complex: it involves different commands, and we need to parse the commands to get input and output nodes.

11:17.000 --> 11:20.000
So this is error-prone.

11:21.000 --> 11:30.000
There are also legitimate anomalies in the tree, with code generation as an example.

11:30.000 --> 11:36.000
I saw that OpenSSL uses some Perl scripts to generate highly optimized assembly code.

11:36.000 --> 11:43.000
So this assembly code was not part of the original archive, which is suspicious at first glance.

11:43.000 --> 11:53.000
We also have code generation for protocols, for example to generate encoders and decoders.

11:53.000 --> 12:05.000
If we can overcome those limitations, we could scale this up to all Debian packages on a regular basis to detect any attacks.

12:05.000 --> 12:16.000
We could also implement other detection mechanisms on the graph; finding all source files for the object files is only one example.

12:16.000 --> 12:30.000
I also implemented a check that all source files are part of the original tarball, but you can imagine any other mechanism and implement it on the graph.
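

NOTE
Editorial sketch, assuming the source nodes of the graph are plain relative paths: a minimal version of the tarball check mentioned above.
The in-memory archive and all file names are hypothetical stand-ins for the upstream tarball.
```python
import io
import tarfile
def make_tarball(names):
    """Build a small in-memory tarball of empty files (stand-in for the upstream archive)."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        for name in names:
            archive.addfile(tarfile.TarInfo(name))
    buffer.seek(0)
    return buffer
def sources_missing_from_tarball(source_nodes, tarball):
    """Return the graph's source files that are not members of the tarball, sorted by name."""
    with tarfile.open(fileobj=tarball, mode="r") as archive:
        members = set(archive.getnames())
    return sorted(set(source_nodes) - members)
UPSTREAM = make_tarball(["src/crc32_fast.c", "src/crc64_fast.c"])
MISSING = sources_missing_from_tarball(
    ["src/crc32_fast.c", "src/crc64_fast.c", "src/generated.S"], UPSTREAM
)
print(MISSING)
```
Files reported here are not necessarily malicious; generated code, as discussed in the limitations, shows up the same way.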

12:30.000 --> 12:38.000
With this, in the end, we can hopefully prevent attacks on the build system in general, or at least make it harder to hide them.

12:38.000 --> 12:47.000
All of this, of course, is trying to cope with this whole complex build system.

12:47.000 --> 12:56.000
So my vision would be that, in the end, we can create a descriptive build system where we have a strict rule set of what is possible and what is not.

12:56.000 --> 13:10.000
So we can statically analyze certain requirements against the build system, so that we can prevent all suspicious work in the first place.

13:10.000 --> 13:19.000
As an acknowledgment, this work was funded by the German Federal Ministry of Education and Research as part of the attacker project.

13:19.000 --> 13:28.000
That's it from my talk. If you have any questions or ideas, you can find me around at FOSDEM or online. Thank you.
