Christmas is just around the corner, and it has gotten quite busy in the Christmas village. It has become a big mess, and everything is full of ELFs. With all the coming and going, it is really easy to lose track of the ELFs. The Council of ELFs is rather on the conservative side and would like to keep track of its members. So this year a registration office was established, where all ELFs that come and go have to register. This booth reports directly to the Council.
Today, we will look at a part of the kernel interface that is very important but also rather opaque and weird: netlink(7). In essence, netlink is a general-purpose communication layer for transferring data between the kernel and user space. So, unlike yesterday's UNIX domain sockets, where both ends were processes, netlink connects a process to a kernel subsystem.
Today, the most common use of netlink is configuring the Linux networking subsystem. Every time you add a new route, add a firewall rule, or change your IP address, a user process connects to the kernel via netlink and sends a message to change the routing information, append the firewall rule, or modify the device's IP address. You can also see netlink in action if you try to change the network configuration without the necessary permissions:
$ id
uid=1000(user) gid=1000(user) ....
$ ip a add 123.0.0.1/23 dev lo :(
RTNETLINK answers: Operation not permitted
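To make "a user process sends a message to the kernel" concrete, here is a minimal sketch. It assumes Linux and Python 3.9+, whose `socket` module exposes `AF_NETLINK` and `NETLINK_ROUTE`. Adding an address needs privileges, so the sketch instead sends a read-only `RTM_GETLINK` dump request and collects the interface indices from the `RTM_NEWLINK` replies. The helper names are hypothetical. The numeric constants and struct layouts are taken from `linux/netlink.h` and `linux/rtnetlink.h`.

```python
import os
import socket
import struct

# Constants from <linux/netlink.h> and <linux/rtnetlink.h>.
NLMSG_ERROR = 2
NLMSG_DONE = 3
NLM_F_REQUEST = 0x001
NLM_F_DUMP = 0x300                    # NLM_F_ROOT | NLM_F_MATCH
RTM_NEWLINK = 16
RTM_GETLINK = 18

NLMSGHDR = struct.Struct("=IHHII")    # len, type, flags, seq, pid (port id)
IFINFOMSG = struct.Struct("=BxHiII")  # family, pad, type, index, flags, change


def build_getlink_request(seq: int) -> bytes:
    """Build an RTM_GETLINK dump request for all interfaces (AF_UNSPEC)."""
    payload = IFINFOMSG.pack(socket.AF_UNSPEC, 0, 0, 0, 0)
    header = NLMSGHDR.pack(NLMSGHDR.size + len(payload), RTM_GETLINK,
                           NLM_F_REQUEST | NLM_F_DUMP, seq, 0)
    return header + payload


def list_link_indices() -> list[int]:
    """Ask the kernel for all links and return their interface indices."""
    with socket.socket(socket.AF_NETLINK, socket.SOCK_RAW,
                       socket.NETLINK_ROUTE) as sock:
        sock.bind((0, 0))                                  # kernel picks our port id
        sock.sendto(build_getlink_request(seq=1), (0, 0))  # port id 0 = the kernel
        indices = []
        while True:
            data = sock.recv(65536)
            offset = 0
            # One datagram may carry several netlink messages.
            while offset + NLMSGHDR.size <= len(data):
                length, msg_type, _, _, _ = NLMSGHDR.unpack_from(data, offset)
                if length < NLMSGHDR.size:
                    raise ValueError("malformed netlink message")
                if msg_type == NLMSG_DONE:
                    return indices
                if msg_type == NLMSG_ERROR:
                    (err,) = struct.unpack_from("=i", data, offset + NLMSGHDR.size)
                    if err:  # err == 0 would just be an acknowledgement
                        raise OSError(-err, os.strerror(-err))
                if msg_type == RTM_NEWLINK:
                    _, _, index, _, _ = IFINFOMSG.unpack_from(
                        data, offset + NLMSGHDR.size)
                    indices.append(index)
                offset += (length + 3) & ~3                # NLMSG_ALIGN: 4 bytes
```

On a Linux machine, `list_link_indices()` should return the indices of all interfaces in the current network namespace (loopback is typically `1`). Unlike `ip a add`, this read-only dump does not need special privileges.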
But netlink is not only for sending configuration messages to the kernel; we can also receive data from the kernel on a netlink channel. For example, with NETLINK_NETFILTER we can passively monitor the network state of our kernel. Oleg Kutkov has a wonderful blog entry on this, including an example.
While netlink is a powerful interface, it is in essence a networking protocol, and it even has an RFC (RFC 3549) that describes it. Therefore, working with netlink feels really awkward when you consider that you are just trying to talk to your local kernel. In particular, the messages that we send to the kernel are prepared as if we were sending them to a machine on the other side of the world. In my opinion, this is more than weird. But it's the reality.
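The "network packet" feel is easiest to see in the framing. Every message starts with a 16-byte `struct nlmsghdr` carrying a length, a type, flags, a sequence number, and a port id. Messages are padded to 4-byte boundaries so that several of them can share one datagram. A minimal sketch of that framing, assuming the header layout from `linux/netlink.h` (the helper names are made up for illustration):

```python
import struct
from typing import Iterator, NamedTuple

# struct nlmsghdr: nlmsg_len, nlmsg_type, nlmsg_flags, nlmsg_seq, nlmsg_pid
NLMSGHDR = struct.Struct("=IHHII")
NLMSG_ALIGNTO = 4


def nlmsg_align(length: int) -> int:
    """Round a length up to the next multiple of 4, like NLMSG_ALIGN()."""
    return (length + NLMSG_ALIGNTO - 1) & ~(NLMSG_ALIGNTO - 1)


class NlMsg(NamedTuple):
    msg_type: int
    flags: int
    seq: int
    pid: int
    payload: bytes


def nlmsg_pack(msg: NlMsg) -> bytes:
    """Serialize one message: header + payload, zero-padded to 4 bytes.

    nlmsg_len covers header and payload but not the trailing padding.
    """
    length = NLMSGHDR.size + len(msg.payload)
    raw = NLMSGHDR.pack(length, msg.msg_type, msg.flags, msg.seq, msg.pid)
    return (raw + msg.payload).ljust(nlmsg_align(length), b"\0")


def nlmsg_iter(buf: bytes) -> Iterator[NlMsg]:
    """Split a received datagram into its individual netlink messages."""
    offset = 0
    while offset + NLMSGHDR.size <= len(buf):
        length, msg_type, flags, seq, pid = NLMSGHDR.unpack_from(buf, offset)
        if length < NLMSGHDR.size or offset + length > len(buf):
            raise ValueError("truncated or malformed netlink message")
        payload = buf[offset + NLMSGHDR.size:offset + length]
        yield NlMsg(msg_type, flags, seq, pid, payload)
        offset += nlmsg_align(length)
```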
cn_proc
Coming back to our original problem, we want to build that registration booth. More specifically, we want to register with the kernel so that it informs us about every fork(2), execve(2), and exit(2) in the whole system. We want to have an overview of everything! And guess what, there is an interface for exactly this task! What a coincidence!
Let's look at a special kind of netlink socket: the netlink connector. This abstraction set out to make communication between user space and a kernel module easy: kernel modules can register a callback for a certain cb_id with cn_add_callback. At the moment, there are eleven predefined, well-known callback slots (registered connector types). Given that the connector was introduced in 2004, we cannot say that it has gained widespread adoption in the kernel.
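On the wire, the connector adds its own small header after the netlink header. This header is a `struct cn_msg` whose `struct cb_id` (`idx`, `val`) selects which registered callback receives the data. Below is a minimal sketch of packing and unpacking that header, assuming the layout from `linux/connector.h`. `CN_IDX_PROC`/`CN_VAL_PROC` are the ids used by `cn_proc`; the helper names are hypothetical.

```python
import struct
from typing import NamedTuple

# struct cn_msg { struct cb_id { u32 idx, val; } id; u32 seq, ack; u16 len, flags; u8 data[]; }
CN_MSG = struct.Struct("=IIIIHH")
CN_IDX_PROC = 0x1
CN_VAL_PROC = 0x1


class CnMsg(NamedTuple):
    idx: int      # cb_id.idx: which connector
    val: int      # cb_id.val: sub-id within that connector
    seq: int
    ack: int
    flags: int
    data: bytes


def cn_msg_pack(msg: CnMsg) -> bytes:
    """Prefix the payload with a cn_msg header; `len` is the payload size."""
    return CN_MSG.pack(msg.idx, msg.val, msg.seq, msg.ack,
                       len(msg.data), msg.flags) + msg.data


def cn_msg_unpack(buf: bytes) -> CnMsg:
    """Parse a cn_msg header and return it together with its payload."""
    idx, val, seq, ack, length, flags = CN_MSG.unpack_from(buf)
    data = buf[CN_MSG.size:CN_MSG.size + length]
    if len(data) != length:
        raise ValueError("cn_msg payload is truncated")
    return CnMsg(idx, val, seq, ack, flags, data)
```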
Nevertheless, with cn_proc, there is a connector to monitor process-state changes in the whole system. However, as it is netlink, it is quite tricky to get this beast to work.
The central idea is to create a new SOCK_DGRAM/NETLINK_CONNECTOR socket and bind it to CN_IDX_PROC. On this socket, we will receive our fork/exec/exit messages. However, we first have to enable these messages by sending a message to cn_proc: the message is an enum proc_cn_mcast_op, wrapped in a struct cn_msg, wrapped in a netlink message with a struct nlmsghdr header, which we write to the socket.
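Put together, the three layers add up to a 40-byte message: 16 bytes of `nlmsghdr`, 20 bytes of `cn_msg`, and 4 bytes for the enum. Below is a minimal sketch of the subscription, assuming the layouts and constants from `linux/netlink.h`, `linux/connector.h`, and `linux/cn_proc.h`. `NETLINK_CONNECTOR` is defined by hand because Python's `socket` module may not export it. Following the widely copied examples, the sketch uses `NLMSG_DONE` as the message type. The function names are hypothetical. The kernel is expected to reject this subscription from unprivileged processes, so run it as root.

```python
import os
import socket
import struct

NETLINK_CONNECTOR = 11          # <linux/netlink.h>
NLMSG_DONE = 3                  # message type used by common cn_proc examples
CN_IDX_PROC = 0x1               # <linux/connector.h>
CN_VAL_PROC = 0x1
PROC_CN_MCAST_LISTEN = 1        # enum proc_cn_mcast_op in <linux/cn_proc.h>

NLMSGHDR = struct.Struct("=IHHII")   # len, type, flags, seq, pid
CN_MSG = struct.Struct("=IIIIHH")    # idx, val, seq, ack, len, flags
MCAST_OP = struct.Struct("=I")       # the enum is 4 bytes wide


def build_mcast_msg(op: int, pid: int) -> bytes:
    """nlmsghdr( cn_msg( proc_cn_mcast_op ) ) as a single datagram."""
    op_bytes = MCAST_OP.pack(op)
    cn = CN_MSG.pack(CN_IDX_PROC, CN_VAL_PROC, 0, 0, len(op_bytes), 0) + op_bytes
    hdr = NLMSGHDR.pack(NLMSGHDR.size + len(cn), NLMSG_DONE, 0, 0, pid)
    return hdr + cn


def open_proc_connector() -> socket.socket:
    """Open a connector socket, join the cn_proc group, and enable events."""
    sock = socket.socket(socket.AF_NETLINK, socket.SOCK_DGRAM, NETLINK_CONNECTOR)
    try:
        sock.bind((os.getpid(), CN_IDX_PROC))   # (port id, multicast groups)
        sock.send(build_mcast_msg(PROC_CN_MCAST_LISTEN, os.getpid()))
    except OSError:
        sock.close()
        raise
    return sock
```

After `sock = open_proc_connector()`, each `sock.recv(4096)` should return one datagram. Its payload after the 16-byte netlink header and the 20-byte `cn_msg` header is a `struct proc_event`.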
And really, if you cannot figure this out on your own, don't be ashamed. The documentation for this interface is less than stellar, so I might as well link directly to the solution on StackOverflow, since you will search for it anyway. But perhaps, by crafting your own solution and looking at cn_proc.c in the kernel, you will gain some understanding of what is happening there. One starting point is the call site of proc_fork_connector, which generates the fork() message.
Also, as an example, the following happens on my machine when I run ./cn_proc and type sleep in another terminal:
$ ./cn_proc
.....
fork(): /bin/zsh (1451658, 1451658) -> (1451730, 1451730)
exec(): /bin/sleep (1451730, 1451730)
exit(): (1451730, 1451730) -> rc=0
We clearly see that my shell (zsh) forks itself in order to directly exec() /bin/sleep, which exits after some time. Please note that the tuples in my output are the task-group id (tgid) and the actual thread id (tid). As all involved processes are single-threaded, the tgid (that is, the process id) equals the tid.
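The payload behind the `cn_msg` header is a `struct proc_event`. It consists of a 16-byte header (`what`, `cpu`, `timestamp_ns`) followed by a union whose layout depends on `what`. The minimal sketch below decodes the three event types from the output above and prints them in a similar format. It assumes the layout and event constants from `linux/cn_proc.h`, and the function name is hypothetical. The executable path (e.g. `/bin/zsh`) is not part of the event; a tool would have to look it up separately, for instance via `/proc/<pid>/exe`, which is omitted here. Note that the kernel calls the thread id `pid` and the process id `tgid`.

```python
import struct
from typing import Optional

PROC_EVENT_FORK = 0x00000001
PROC_EVENT_EXEC = 0x00000002
PROC_EVENT_EXIT = 0x80000000

EVENT_HDR = struct.Struct("=IIQ")   # what, cpu, timestamp_ns
FORK = struct.Struct("=iiii")       # parent_pid, parent_tgid, child_pid, child_tgid
EXEC = struct.Struct("=ii")         # process_pid, process_tgid
EXIT = struct.Struct("=iiII")       # process_pid, process_tgid, exit_code, exit_signal


def format_proc_event(payload: bytes) -> Optional[str]:
    """Decode a struct proc_event into a one-line summary.

    Tuples are printed as (tgid, tid). Returns None for event types
    this sketch does not handle (UID, GID, COMM, ...).
    """
    what, _cpu, _timestamp = EVENT_HDR.unpack_from(payload)
    body = EVENT_HDR.size
    if what == PROC_EVENT_FORK:
        ppid, ptgid, cpid, ctgid = FORK.unpack_from(payload, body)
        return f"fork(): ({ptgid}, {ppid}) -> ({ctgid}, {cpid})"
    if what == PROC_EVENT_EXEC:
        pid, tgid = EXEC.unpack_from(payload, body)
        return f"exec(): ({tgid}, {pid})"
    if what == PROC_EVENT_EXIT:
        pid, tgid, status, _signal = EXIT.unpack_from(payload, body)
        # exit_code uses the wait(2) encoding: exit status in bits 8-15,
        # terminating signal in the low 7 bits.
        if status & 0x7F:
            return f"exit(): ({tgid}, {pid}) -> signal={status & 0x7F}"
        return f"exit(): ({tgid}, {pid}) -> rc={status >> 8}"
    return None
```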
Another curiosity that I encountered while preparing today's post is that cn_proc is a stateful interface. Once you enable the event stream with PROC_CN_MCAST_LISTEN, it stays enabled, and messages are generated even if nobody listens to them. And if you forget to send PROC_CN_MCAST_LISTEN the next time, the events will arrive anyway. Also, there is no safe way to disable the interface by sending PROC_CN_MCAST_IGNORE. The subsystem simply uses atomic_dec() to decrement the global counter proc_event_num_listeners, which can then underflow, for example when IGNORE is sent more often than LISTEN! All in all, this interface is broken, and I don't see how anybody could fix it. But, hey, perhaps this makes it a good match for netlink ;-)
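The two problems above both follow from a single global counter. The following is a toy model, not kernel code, and the class name is hypothetical. It mirrors two details of cn_proc.c as described above: LISTEN and IGNORE only increment or decrement one system-wide `proc_event_num_listeners`, and event functions such as `proc_fork_connector` return early while that counter is below 1. Kernels newer than this post may behave differently.

```python
class ProcConnectorModel:
    """Toy model of cn_proc's global listener counter (not kernel code)."""

    def __init__(self) -> None:
        self.num_listeners = 0      # stands in for proc_event_num_listeners

    def listen(self) -> None:
        """PROC_CN_MCAST_LISTEN: atomic_inc() on the global counter."""
        self.num_listeners += 1

    def ignore(self) -> None:
        """PROC_CN_MCAST_IGNORE: atomic_dec(), with no check who subscribed."""
        self.num_listeners -= 1

    def events_enabled(self) -> bool:
        """Events are only generated while the counter is at least 1."""
        return self.num_listeners >= 1
```

Usage: if process A sends LISTEN and exits without IGNORE, `events_enabled()` stays true for nobody. If another process sends IGNORE twice without ever listening, the counter drops to -1, and even a fresh LISTEN only brings it back to 0, so A stops receiving events.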
Last modified: 2023-12-01 15:52:28.137643, Permalink: /p/advent-22-netlink
Technische Universität Braunschweig
Universitätsplatz 2
38106 Braunschweig
P. O. Box: 38092 Braunschweig
GERMANY
Phone: +49 (0) 531 391-0