Institute of Operating Systems and Computer Networks

Select a Gift

☃️

Git-Repository: Template Solution Solution-Diff (Solution is posted at 18:00 CET)
Workload: 110 lines of code
Important System-Calls: select(2)
Recommended Reads:

Dec. 1: The cat on the tip of the iceberg file 45 lines [open(2), pread(2), close(2)]
Dec. 2: Clone a Chimera! process 45 lines [clone(2), fork(2), getpid(2), gettid(2)]

A program can only execute once all the resources it requires are available. Simple calculations only require a CPU, but more complex operations might need several resources. For example, if your program wants to parse a file in the next step, it needs not only CPU time for the parsing; the file contents must also be loaded into memory, as CPU instructions only work on memory (and registers). So, we could draw a dependency graph between resources (CPUs, I/O requests, network messages) and pieces of code. A piece of code can execute only once all of its dependencies (predecessors) are available or have finished.

Often, we express such dependencies with synchronous system calls, which block the calling thread until the system call has completed:

funcA();
read(fd, buf, 4096);
funcB(buf);

In this trivial example, the program issues the read(2) system call to express two things: (1) please read up to 4096 bytes from the file descriptor fd into buf. (2) continue the thread, which will then call funcB(), only after the data has been placed in the buffer. So, synchronous system calls promise that the invoked functionality has completed before we continue.
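As a minimal sketch of this blocking behavior, the following helper wraps read(2) and only returns once the call has completed. The name read_retry() is hypothetical and not part of the exercise template. It also illustrates two details that the three-line example glosses over: read() may return fewer bytes than requested, and it can be interrupted by a signal (EINTR), in which case this sketch simply retries.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper (not from the exercise template): blocks until
 * read(2) has completed on `fd` and returns its result. The return value
 * is the number of bytes placed in `buf` (possibly fewer than `len`),
 * 0 on end-of-file, or -1 on error. */
ssize_t read_retry(int fd, void *buf, size_t len)
{
    ssize_t n;
    do {
        /* Blocks the calling thread while no data is available. */
        n = read(fd, buf, len);
    } while (n < 0 && errno == EINTR); /* interrupted by a signal: retry */
    return n;
}
```

Code that follows a call to read_retry() can rely on buf holding exactly the returned number of bytes, which is the promise described above.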

For the second part, if the data is not yet available, read() will block the thread and exclude it from scheduling until the data has arrived. However, with read() we can only wait for data to become available on a single file descriptor. Sometimes this is not enough: For example, if you are a server process that wants to handle multiple network clients simultaneously, you do not know which client socket will provide data first. Therefore, you do not know on which socket you should invoke read() first. One solution is to spawn a thread for each client that blocks on its client socket, but this approach comes with high costs for servers with many clients (many threads, frequent thread switches).

Since Unix did not have proper thread support for a long time, the developers came up with the select(2) system call. With this blocking syscall, the calling thread expresses: please block my execution until one of the given file descriptors becomes "ready". Becoming ready means that a read()/write() call would surely not block. With this, waiting for the next request in a multi-client server becomes easier: just "select" all client sockets and handle all sockets that became ready:

while True:
    select(client_connections)
    for fd in client_connections:
        if fd.is_ready():
            request = fd.read()
            handle(request)

While this looks quite easy at first sight, it opens a whole new box of problems, as we are now in the realm of event-based programming. For example, not only read() can block, but write() can also block if the client cannot receive the answer fast enough. However, for today, we will not think too much about this and only look at the read() side.
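In C, the waiting step of the pseudocode above could be sketched as follows. The helper name wait_readable() and its interface are assumptions made for illustration; the sketch also assumes that every descriptor is smaller than FD_SETSIZE and that at least one descriptor is passed (with none, the call would block forever).

```c
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical helper: blocks until at least one of the `n` descriptors in
 * `fds` is readable. On return, ready[i] is 1 if fds[i] is readable and 0
 * otherwise. Returns the number of ready descriptors, or -1 on error.
 * Assumes every fds[i] is below FD_SETSIZE. */
int wait_readable(const int *fds, int n, int *ready)
{
    fd_set readfds;
    int maxfd = -1;

    /* select() overwrites the set with its result, so the set is
     * rebuilt from scratch before every call. */
    FD_ZERO(&readfds);
    for (int i = 0; i < n; i++) {
        FD_SET(fds[i], &readfds);
        if (fds[i] > maxfd)
            maxfd = fds[i];
    }

    /* First argument: highest descriptor plus one.
     * NULL timeout: block until something becomes ready. */
    int rc = select(maxfd + 1, &readfds, NULL, NULL, NULL);
    if (rc < 0)
        return -1;

    for (int i = 0; i < n; i++)
        ready[i] = FD_ISSET(fds[i], &readfds) ? 1 : 0;
    return rc;
}
```

A caller would invoke wait_readable() in a loop and call read() only on those descriptors whose ready entry is 1.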

For more details on using select(2), please look at its man page and also at the tutorial man page, select_tut(2), as we have no intention of reiterating that material here.

Within the kernel, select() is reduced to the poll(2) infrastructure. Actually, poll() is the newer system call, but we will, for now, stick to select(). At its core, do_select() is rather simple: It iterates over the given file-descriptor set, which is a fixed-size bit mask, gets the struct file object for each descriptor, and polls it with the vfs_poll() function, which finally calls the concrete file_operations.poll operation.
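To make the scan described above more tangible, here is a deliberately simplified user-space model of it. This is not kernel code: the types and names (model_file, model_scan(), MODEL_NFDS) are invented for illustration, the descriptor set is shrunk to a single 64-bit mask, and the model performs only one pass, without any of the sleeping and wake-up logic that a real implementation needs when nothing is ready yet.

```c
#include <stdint.h>

#define MODEL_NFDS 64 /* hypothetical: one bit per descriptor */

/* Toy stand-in for struct file: every "file" brings its own poll
 * operation, similar in spirit to file_operations.poll. */
struct model_file {
    int (*poll)(const struct model_file *f); /* nonzero if readable */
    int pending;                             /* bytes waiting to be read */
};

/* Example poll operation: readable as soon as data is pending. */
int model_poll_pending(const struct model_file *f)
{
    return f->pending > 0;
}

/* One pass over the requested descriptors. `wanted` has bit fd set for
 * every descriptor of interest; the result has bit fd set for every
 * requested descriptor whose poll operation reports readiness. */
uint64_t model_scan(struct model_file *const *table, uint64_t wanted)
{
    uint64_t ready = 0;
    for (int fd = 0; fd < MODEL_NFDS; fd++) {
        if (!(wanted & (UINT64_C(1) << fd)))
            continue;                          /* not requested */
        const struct model_file *f = table[fd]; /* look up the file object */
        if (f && f->poll(f))                    /* dispatch to its poll op */
            ready |= UINT64_C(1) << fd;
    }
    return ready;
}
```

The point of the model is the indirection: the scanning loop knows nothing about pipes, sockets, or terminals, because each file object supplies its own readiness check.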

Task

With today's program we'll put select() to some use. You'll write a program that acts as a pipe multiplexer: After spawning N filter processes, it reads from its standard input and copies that input to every filter process. On the filters' output side, our program uses select() to merge the filters' stdout descriptors into its own stdout. An example output looks like this:

$ make run
seq 1 100 | ./select "grep 1[1-3]" "grep [1-3]2"
[grep 1[1-3]] Started filter as pid 384425
[grep [1-3]2] Started filter as pid 384426
[grep 1[1-3]] 11
[grep 1[1-3]] 12
[grep 1[1-3]] 13
[grep [1-3]2] 12
[grep [1-3]2] 22
[grep [1-3]2] 32
[grep 1[1-3]] filter exited. exitcode=0
[grep [1-3]2] filter exited. exitcode=0
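For orientation, starting one filter with both of its standard streams connected to pipes could look roughly like the sketch below. The function spawn_filter() is hypothetical and is not the start_proc() from the template; running the command through /bin/sh -c is an assumption, as is the expectation that the parent's own descriptors 0 and 1 are open. Error handling for the file-action calls is omitted for brevity.

```c
#include <spawn.h>
#include <sys/types.h>
#include <unistd.h>

extern char **environ;

/* Hypothetical sketch (not the template's start_proc()): runs `cmd` via
 * /bin/sh -c with its stdin and stdout connected to two fresh pipes.
 * On success returns 0 and stores the child's pid, the write end for the
 * child's stdin, and the read end for the child's stdout. */
int spawn_filter(const char *cmd, pid_t *pid, int *to_child, int *from_child)
{
    int in[2], out[2];
    if (pipe(in) < 0)
        return -1;
    if (pipe(out) < 0) {
        close(in[0]);
        close(in[1]);
        return -1;
    }

    posix_spawn_file_actions_t fa;
    posix_spawn_file_actions_init(&fa);
    /* In the child: pipe ends become stdin/stdout ... */
    posix_spawn_file_actions_adddup2(&fa, in[0], STDIN_FILENO);
    posix_spawn_file_actions_adddup2(&fa, out[1], STDOUT_FILENO);
    /* ... and the original pipe descriptors are closed again. */
    posix_spawn_file_actions_addclose(&fa, in[0]);
    posix_spawn_file_actions_addclose(&fa, in[1]);
    posix_spawn_file_actions_addclose(&fa, out[0]);
    posix_spawn_file_actions_addclose(&fa, out[1]);

    char *argv[] = { "sh", "-c", (char *)cmd, NULL };
    int rc = posix_spawn(pid, "/bin/sh", &fa, NULL, argv, environ);
    posix_spawn_file_actions_destroy(&fa);

    /* The parent keeps only its own ends of the two pipes. */
    close(in[0]);
    close(out[1]);
    if (rc != 0) {
        close(in[1]);
        close(out[0]);
        return -1;
    }
    *to_child = in[1];
    *from_child = out[0];
    return 0;
}
```

Note that with several filters, the parent's pipe ends for earlier filters would be inherited by later children unless they are opened with O_CLOEXEC or closed explicitly; the sketch ignores this.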

Hints

The given start_proc() function uses posix_spawn(3) to start a filter process. Use it, but also read the man page of posix_spawn() and be thankful that you do not have to do it yourself with fork(2) and exec(2).
On the stdin side, it is best to use a thread that uses blocking system calls to copy the data into each proc[i].stdin.
Implement a drain_proc() function that reads data from a process and prefixes it properly with the given label.
Test everything without select() and with only a single filter process before you proceed.
Before each select() call, you have to reinitialize the readfd file-descriptor set, because select() modifies it in place.
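The label prefixing mentioned in the hints could be approached along the following lines. This is only a sketch under simplifying assumptions: drain_once() is a hypothetical name, it performs a single read() per call, and it assumes that a line is never split across two reads, which a complete solution would have to handle.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical sketch: performs one read() on `fd` and writes every line
 * of the received chunk to `out`, prefixed with "[label] ". Returns the
 * number of bytes read, 0 on end-of-file, or -1 on error.
 * Simplification: lines split across two reads are not reassembled. */
ssize_t drain_once(int fd, const char *label, FILE *out)
{
    char buf[4096];
    ssize_t n = read(fd, buf, sizeof buf);
    if (n <= 0)
        return n;

    const char *p = buf;
    const char *end = buf + n;
    while (p < end) {
        /* Find the end of the current line within the chunk. */
        const char *nl = memchr(p, '\n', (size_t)(end - p));
        size_t len = nl ? (size_t)(nl - p) : (size_t)(end - p);
        fprintf(out, "[%s] %.*s\n", label, (int)len, p);
        p += len + (nl ? 1 : 0); /* skip the newline if there was one */
    }
    fflush(out);
    return n;
}
```

Calling such a function only for descriptors that select() reported as ready keeps the read() from blocking, and a return value of 0 signals that the filter has closed its stdout.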

Last modified: 2023-12-01 15:52:27.805320, Last author: , Permalink: /p/advent-07-select
