A program can only execute when all the resources it requires are available. Simple calculations only need the CPU, but more complex operations may need several resources. For example, if your program wants to parse a file in the next step, it requires not only CPU time for the parsing: the file contents must also be loaded into memory, because CPU instructions only work on memory (and registers). We could therefore draw a dependency graph between resources (CPUs, I/O requests, network messages) and pieces of code. Only when all of its dependencies (predecessors) are available or have finished can a piece of code actually execute.
Often, we express such dependencies with synchronous system calls, which block the thread continuation until the system call has completed:
funcA();
read(fd, buf, 4096);
funcB(buf);
In this trivial example, the program issues the read(2) system call to express two things: (1) please read up to 4096 bytes from the file descriptor fd into buf; (2) continue the thread, which will then call funcB(), only after the data has been placed in the buffer. So, synchronous system calls promise that the invoked functionality has completed before we continue.
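The "continue only after completion" promise can be observed with a small Python sketch, where os.read() is a thin wrapper around read(2). The pipe, the helper thread, the name blocking_read_demo, and the delay are assumptions made for this illustration; they are not part of the exercise.

```python
import os
import threading
import time

def blocking_read_demo(delay=0.2):
    """Return (data, seconds the caller spent blocked in read)."""
    rfd, wfd = os.pipe()

    def late_writer():
        time.sleep(delay)            # the data only becomes available later
        os.write(wfd, b"hello")
        os.close(wfd)

    writer = threading.Thread(target=late_writer)
    writer.start()
    start = time.monotonic()
    data = os.read(rfd, 4096)        # blocks until the pipe has data
    blocked = time.monotonic() - start
    writer.join()
    os.close(rfd)
    return data, blocked

if __name__ == "__main__":
    data, blocked = blocking_read_demo()
    print(f"read {data!r} after blocking for about {blocked:.2f}s")
```

The caller spends roughly the writer's delay inside os.read(): the code after the call never sees an unfilled buffer.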
For the second part, if the data is not yet available, read() will block the thread and exclude it from scheduling until the data has arrived. However, with read() we can only wait for data to become available on a single file descriptor. Sometimes this is not enough: for example, if you are a server process that wants to handle multiple network clients simultaneously, you do not know which client socket will provide data first. Consequently, you do not know on which socket you should invoke read() first. One solution is to spawn a thread for each client that blocks on the client socket, but this approach comes at a high cost for servers with many clients (many threads, frequent thread switches).
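For comparison, the thread-per-client approach could look like the following Python sketch. Pipes stand in for client sockets, and serve_with_threads() and handle() are hypothetical names chosen for this illustration.

```python
import os
import threading

def serve_with_threads(client_fds, handle):
    """Start one thread per descriptor; each blocks in read() on its own fd."""
    def worker(fd):
        while True:
            data = os.read(fd, 4096)     # blocks only this thread
            if not data:                 # empty read: the peer closed its end
                break
            handle(fd, data)

    threads = [threading.Thread(target=worker, args=(fd,)) for fd in client_fds]
    for t in threads:
        t.start()
    return threads

if __name__ == "__main__":
    pipes = [os.pipe() for _ in range(2)]
    threads = serve_with_threads([r for r, _ in pipes],
                                 lambda fd, data: print(fd, data))
    os.write(pipes[1][1], b"second client speaks first")
    os.write(pipes[0][1], b"first client speaks later")
    for _, w in pipes:
        os.close(w)
    for t in threads:
        t.join()
```

It does not matter which client sends first, but every additional client costs one more thread.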
Since Unix did not have proper thread support for a long time, its developers came up with the select(2) system call. With this blocking syscall, the calling thread expresses: please block my execution until one of the given file descriptors becomes "ready". Becoming ready means that a read()/write() call would surely not block. With this, waiting for the next request in a multi-client server becomes easier: just "select" all client sockets and handle all sockets that became ready:
while True:
    select(client_connections)
    for fd in client_connections:
        if fd.is_ready():
            request = fd.read()
            handle(request)
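A runnable variant of this loop can be sketched with Python's select module, whose select.select() returns the ready descriptors as lists instead of marking them in place. The serve() and handle() names and the use of pipes instead of sockets are assumptions made for this illustration.

```python
import os
import select

def serve(client_fds, handle):
    """Handle data from all descriptors until every one of them reached EOF."""
    open_fds = set(client_fds)
    while open_fds:
        # Block until at least one descriptor is readable.
        ready, _, _ = select.select(list(open_fds), [], [])
        for fd in ready:
            data = os.read(fd, 4096)     # does not block: fd was reported ready
            if data:
                handle(fd, data)
            else:                        # empty read: EOF, stop watching this fd
                open_fds.remove(fd)

if __name__ == "__main__":
    pipes = [os.pipe() for _ in range(2)]
    os.write(pipes[1][1], b"request from client 1")
    os.write(pipes[0][1], b"request from client 0")
    for _, w in pipes:
        os.close(w)
    serve([r for r, _ in pipes], lambda fd, data: print(fd, data))
```

A single thread serves all descriptors here. End-of-file also makes a descriptor "ready", which is why the loop has to check for an empty read.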
While this looks quite easy at first sight, it opens a whole new box of problems, as we are now in the realm of event-based programming. For example, not only read() can block; write() can block as well if the client cannot receive the answer fast enough. However, for today, we will not think too much about this and only look at the read() side.
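The write() side can be made visible with a short Python sketch: a pipe is filled until the kernel buffer is full, at which point select() no longer reports it as writable. The helper names are hypothetical, and the non-blocking mode is used only so that the demonstration itself does not hang; the behaviour described is what I would expect on Linux.

```python
import os
import select

def is_writable(fd):
    """True if select() reports that a write on fd would not block right now."""
    _, writable, _ = select.select([], [fd], [], 0)   # timeout 0: only poll
    return fd in writable

def fill_pipe(wfd, chunk=b"x" * 4096):
    """Write to a non-blocking pipe until the kernel buffer is full."""
    os.set_blocking(wfd, False)
    total = 0
    try:
        while True:
            total += os.write(wfd, chunk)
    except BlockingIOError:          # a blocking descriptor would stall here
        return total

def drain(rfd, nbytes):
    """Read exactly nbytes from rfd, like a client finally catching up."""
    while nbytes > 0:
        nbytes -= len(os.read(rfd, min(nbytes, 65536)))

if __name__ == "__main__":
    rfd, wfd = os.pipe()
    print("empty pipe writable:", is_writable(wfd))
    written = fill_pipe(wfd)
    print(f"after {written} bytes writable:", is_writable(wfd))
    drain(rfd, written)
    print("after draining writable:", is_writable(wfd))
```

A server that wants to avoid stalling on a slow client would therefore also pass its outgoing descriptors to select() in the write set.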
For more details on using select(2), please look at its man page and also at the tutorial man page for select (select_tut(2)), as we have no intention of reiterating that here.
Within the kernel, select() is reduced to the poll(2) infrastructure. Actually, poll() is the newer system call, but we will, for now, stick to select(). At its core, do_select() is rather simple: it iterates over the given file-descriptor set, which is a fixed-size bit mask, gets the struct file object for each descriptor, and polls it with the vfs_poll() function, which finally calls the concrete file_operations.poll operation.
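The iteration over the bit mask can be modelled in a few lines of Python. This is a strongly simplified sketch and not the kernel code: the real do_select() handles three sets (read, write, except), sleeps and retries when nothing is ready, and honours a timeout. The poll_one callback is a hypothetical stand-in for vfs_poll().

```python
def fd_set(mask, fd):
    """Return mask with the bit for fd set (models FD_SET)."""
    return mask | (1 << fd)

def fd_isset(mask, fd):
    """True if the bit for fd is set in mask (models FD_ISSET)."""
    return bool((mask >> fd) & 1)

def do_select_sketch(nfds, readfds, poll_one):
    """One pass over the bit mask: collect every requested fd that is ready."""
    ready = 0
    for fd in range(nfds):
        # Only descriptors the caller asked about are polled at all.
        if fd_isset(readfds, fd) and poll_one(fd):
            ready = fd_set(ready, fd)
    return ready

if __name__ == "__main__":
    wanted = fd_set(fd_set(0, 3), 5)                 # watch fds 3 and 5
    result = do_select_sketch(6, wanted, lambda fd: fd in (4, 5))
    print([fd for fd in range(6) if fd_isset(result, fd)])
```

In the example, descriptor 4 is ready but was not requested, and descriptor 3 was requested but is not ready, so only descriptor 5 ends up in the result mask.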
With today's program, we'll put select() to some use. You'll write a program that acts as a pipe multiplexer filter: after spawning N filter processes, it reads from its standard input and copies its input to the filter processes. On the filters' output side, our program uses select() to demultiplex the filters' stdout descriptors into its own stdout. An example output looks like this:
$ make run
seq 1 100 | ./select "grep 1[1-3]" "grep [1-3]2"
[grep 1[1-3]] Started filter as pid 384425
[grep [1-3]2] Started filter as pid 384426
[grep 1[1-3]] 11
[grep 1[1-3]] 12
[grep 1[1-3]] 13
[grep [1-3]2] 12
[grep [1-3]2] 22
[grep [1-3]2] 32
[grep 1[1-3]] filter exited. exitcode=0
[grep [1-3]2] filter exited. exitcode=0
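To make the expected data flow concrete, here is a compact Python sketch of such a multiplexer. It is not the C solution of the exercise: run_filters() is a hypothetical name, subprocess.Popen takes the place of posix_spawn(3), and the input is passed as a bytes object instead of being read from standard input.

```python
import os
import select
import shlex
import subprocess
import sys
import threading

def run_filters(commands, data, out=sys.stdout):
    """Feed `data` (bytes) to every filter command and label their output."""
    procs = []
    for cmd in commands:
        # shlex.split avoids a shell, so "[1-3]" is not treated as a glob.
        p = subprocess.Popen(shlex.split(cmd), stdin=subprocess.PIPE,
                             stdout=subprocess.PIPE)
        out.write(f"[{cmd}] Started filter as pid {p.pid}\n")
        procs.append(p)

    def feed():
        # stdin side: plain blocking writes in a separate thread.
        for p in procs:
            try:
                p.stdin.write(data)
                p.stdin.close()
            except BrokenPipeError:      # the filter exited before reading all
                pass

    feeder = threading.Thread(target=feed)
    feeder.start()

    labels = {p.stdout.fileno(): cmd for p, cmd in zip(procs, commands)}
    pending = {fd: b"" for fd in labels}     # unfinished line per descriptor
    while pending:
        ready, _, _ = select.select(list(pending), [], [])
        for fd in ready:
            chunk = os.read(fd, 4096)
            *lines, rest = (pending[fd] + chunk).split(b"\n")
            for line in lines:
                out.write(f"[{labels[fd]}] {line.decode(errors='replace')}\n")
            if chunk:
                pending[fd] = rest           # keep the partial last line
            else:                            # EOF: the filter closed its stdout
                if rest:
                    out.write(f"[{labels[fd]}] {rest.decode(errors='replace')}\n")
                del pending[fd]

    feeder.join()
    for p, cmd in zip(procs, commands):
        out.write(f"[{cmd}] filter exited. exitcode={p.wait()}\n")
        p.stdout.close()

if __name__ == "__main__":
    numbers = "".join(f"{i}\n" for i in range(1, 101)).encode()
    run_filters(["grep 1[1-3]", "grep [1-3]2"], numbers)
```

The writer thread keeps the stdin side simple, while the select() loop on the stdout side ensures that no filter stalls because its output pipe is full.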
The start_proc() function uses posix_spawn(3) to start a filter process. Use it, but also read the man page of posix_spawn() and be thankful that you do not have to do it yourself with fork(2) and exec(2).
For the stdin side, you better use a thread that uses blocking system calls to multiplex the data into proc[i].stdin.
Write a drain_proc() function that reads data from a process and prefixes it properly with the given label.
Test your program with select() and only a single filter process before you proceed.
Before each call to select, you have to reinitialize the readfd file-descriptor set.
Last modified: 2023-12-01 15:52:27.805320, Permalink: /p/advent-07-select