ELKS Processes

( work in progress while I try to understand it )
( also, much of the following is incomplete in ELKS )
( add info on fork and exec )

The Process Table

Each process is represented by a record in task[], an array of struct task_struct, declared in <linuxmt/sched.h>.

The currently-executing task is pointed to by current.

The most important members of struct task_struct are:

t_regs;
t_kstack;
A complete stored register set and a kernel stack, for context switching between processes.

Q. Why have separate kernel and user stacks?
A. Because when kernel code executes it assumes SS = DS = its own data segment. Many later processors switch to a separate system stack pointer in hardware; the 8086 doesn't, so the switch has to be done in software.

Q. Why do we have a separate kernel stack per task?
A. Because a context switch can occur in kernel code, if it calls schedule() - although preemptive timeslicing does not occur when kernel code is executing, nor during interrupts.

Q. When does timeslicing occur?
A. At the moment, on return from an interrupt, if we were in userland when the interrupt occurred. In Linux, it can also occur when returning from a system call.

state;
The task's state. This can be one of:

Values of task[].state
TASK_RUNNING
The task is eligible to run. It might not actually be running because of timeslicing, but it will continue when its turn comes again.
TASK_INTERRUPTIBLE
TASK_UNINTERRUPTIBLE
The task is "sleeping", and won't wake up until another process puts it back to TASK_RUNNING, usually as a result of some external condition changing. The difference between the two is that a TASK_INTERRUPTIBLE task will also be woken up if a signal arrives.
TASK_STOPPED
A process is stopped when it receives one of certain signals (SIGSTOP, SIGTSTP, SIGTTIN or SIGTTOU). It is restarted by sending it a SIGCONT.
TASK_UNUSED
Indicates an empty slot in the task[] table.
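
To make the distinction between the states concrete, here is a minimal sketch in plain C. The enum values and the two helper functions (is_runnable, signal_wakes) are made up for illustration; the real constants are defined in <linuxmt/sched.h> and may well have different values.

```c
#include <stdbool.h>

/* Hypothetical values for illustration only; see <linuxmt/sched.h>
   for the real definitions. */
enum task_state {
    TASK_RUNNING,
    TASK_INTERRUPTIBLE,
    TASK_UNINTERRUPTIBLE,
    TASK_STOPPED,
    TASK_UNUSED
};

/* Only a TASK_RUNNING task is eligible to be given the processor. */
static bool is_runnable(enum task_state s)
{
    return s == TASK_RUNNING;
}

/* An arriving signal wakes an interruptible sleeper, but not an
   uninterruptible one. */
static bool signal_wakes(enum task_state s)
{
    return s == TASK_INTERRUPTIBLE;
}
```

So signal_wakes(TASK_UNINTERRUPTIBLE) is false: such a task stays asleep until whatever it is waiting for explicitly puts it back to TASK_RUNNING.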

pid, ppid;
The task's process id (a number from 1 to 32767), and its parent's pid. pid=0 is reserved for the 'idle' process task[0], which is run when there's nothing else to do. pids are allocated cyclically, so when a process dies it's likely to be a long time before the same pid is reused.
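
Cyclic allocation can be sketched as follows. This is a user-space model, not the ELKS code: get_pid, pid_in_use, task_pid[], last_pid and MAX_TASKS are all names and sizes I have assumed. The idle task is not modelled, so a stored pid of 0 simply marks a free slot.

```c
#define MAX_PID   32767
#define MAX_TASKS 16            /* hypothetical table size */

static int task_pid[MAX_TASKS]; /* model: 0 means "slot free" */
static int last_pid = 0;        /* most recently allocated pid */

/* Is this pid currently held by some task in the model table? */
static int pid_in_use(int pid)
{
    int i;
    for (i = 0; i < MAX_TASKS; i++)
        if (task_pid[i] == pid)
            return 1;
    return 0;
}

/* Hand out the next free pid after last_pid, wrapping from
   MAX_PID back to 1.  Returns -1 if every pid is taken. */
static int get_pid(void)
{
    int tries;
    for (tries = 0; tries < MAX_PID; tries++) {
        if (++last_pid > MAX_PID)
            last_pid = 1;
        if (!pid_in_use(last_pid))
            return last_pid;
    }
    return -1;
}
```

Because the search always starts just after the last pid handed out, a pid freed by a dying process is not considered again until the counter has wrapped all the way round.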

pgrp;
A process group number. "Process groups are used for distribution of signals, and by terminals to arbitrate requests for their input: processes that have the same process group as the terminal are foreground and may read, while others will block with a signal if they attempt to read" [taken from man 2 setpgid]

session;
A session number. As far as I can tell, the main idea is that the kernel is a bit more relaxed about permissions between tasks in the same session (e.g. they can send signals to each other). The session number is the pid of the 'session leader' which receives a SIGHUP and a SIGCONT when its controlling tty goes away (e.g. modem drops carrier)

If someone can give a more lucid and/or accurate description of the above please submit it!

uid, euid, suid;
gid, egid, sgid;
The task's user and group ids (real, effective and saved). The effective uid/gid gives the task's actual privileges at the moment, which may be different to the real id of the user who started the task (e.g. for a suid program). The saved ids let a process give up privileges yet reclaim them later.

groups[NGROUPS];
An array of supplementary groups, which are used when accessing files. The user is permitted group access if any of these gids matches the file's gid. See function in_group_p in kernel/sys.c
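
The group test amounts to a linear search. The following is a simplified model and not a copy of in_group_p: the struct creds type, the in_group name, the NOGROUP end marker, the value of NGROUPS and the decision to check egid before the supplementary list are all assumptions made for the sketch.

```c
typedef unsigned short gid_model_t;  /* stand-in for the kernel's gid type */

#define NGROUPS 8                    /* hypothetical size */
#define NOGROUP 0xFFFF               /* assumed end-of-list marker */

struct creds {
    gid_model_t egid;
    gid_model_t groups[NGROUPS];
};

/* Return 1 if the task counts as a member of group grp, else 0. */
static int in_group(const struct creds *c, gid_model_t grp)
{
    int i;

    if (grp == c->egid)              /* assumption: egid is checked first */
        return 1;
    for (i = 0; i < NGROUPS; i++) {
        if (c->groups[i] == NOGROUP) /* end of the list */
            break;
        if (c->groups[i] == grp)
            return 1;
    }
    return 0;
}
```

A file-permission check would call this with the file's gid and apply the file's group permission bits if it returns 1.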

files;
The task's files, containing an array of struct file objects, indexed by the fd number which open() returns. Also includes a bitmap indicating which ones must be closed when the task uses exec() to become another program.
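
As an illustration of how an fd-indexed array and a close-on-exec bitmap work together, here is a small model. The type and function names (files_model, file_model, set_cloexec, do_exec_close) and the value of NR_OPEN are invented for the sketch; the real layout is in the kernel headers.

```c
#include <stddef.h>

#define NR_OPEN 20                   /* hypothetical per-task fd limit */

struct file_model { int dummy; };    /* stand-in for struct file */

struct files_model {
    unsigned long close_on_exec;     /* bit n set: close fd n on exec */
    struct file_model *fd[NR_OPEN];  /* NULL means fd is not open */
};

/* Mark fd to be closed at the next exec. */
static void set_cloexec(struct files_model *f, int fd)
{
    f->close_on_exec |= 1UL << fd;
}

/* What exec would do: drop every open fd whose bit is set, then
   clear the bitmap.  Returns the number of fds closed. */
static int do_exec_close(struct files_model *f)
{
    int fd, closed = 0;

    for (fd = 0; fd < NR_OPEN; fd++) {
        if ((f->close_on_exec & (1UL << fd)) && f->fd[fd] != NULL) {
            f->fd[fd] = NULL;
            closed++;
        }
    }
    f->close_on_exec = 0;
    return closed;
}
```

File descriptors whose bit is clear survive the exec, which is how a shell passes stdin, stdout and stderr on to the programs it starts.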

fs;
Master filesystem information: the inode of the root of the file tree (which may be a subtree of the full file tree, if you want to restrict where the task can roam), the inode of its current working directory, and the 'umask' which gives the default permissions when creating a file (actually, each '1' bit indicates a permission NOT granted).
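
The umask arithmetic is just a bitwise AND with the complement of the mask. A one-function sketch (apply_umask is a name I have made up):

```c
/* Permission bits a new file actually gets: whatever was requested,
   minus every bit that is set in the umask. */
static unsigned int apply_umask(unsigned int requested, unsigned int umask)
{
    return requested & ~umask & 0777;
}
```

For example, a program asking for mode 0666 under a umask of 022 ends up with 0644: the group and other write bits are masked off.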

t_count, t_priority;
Process priority information, used to decide when this process has had more than its fair share of processor time

sig;
signal, blocked;
Signal information: pointers to handlers for each possible signal which might be received, and bitmaps of signals outstanding and signals blocked (i.e. which the process doesn't want to receive at the moment)
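
The two bitmaps combine in the obvious way: a signal can be delivered if it is outstanding and not blocked. In the sketch below, the bit numbering (signal n lives in bit n-1), the type sigset_model_t and the helper names are my assumptions, not taken from the ELKS source.

```c
typedef unsigned long sigset_model_t;

/* Assumed layout: signal number n is represented by bit n-1. */
#define SIG_BIT(n) (1UL << ((n) - 1))

/* Signals that are outstanding and not currently blocked. */
static sigset_model_t deliverable(sigset_model_t pending,
                                  sigset_model_t blocked)
{
    return pending & ~blocked;
}

/* Lowest-numbered deliverable signal, or 0 if there is none. */
static int next_signal(sigset_model_t pending, sigset_model_t blocked)
{
    sigset_model_t d = deliverable(pending, blocked);
    int n;

    for (n = 1; n <= 32; n++)
        if (d & SIG_BIT(n))
            return n;
    return 0;
}
```

A blocked signal stays set in the outstanding bitmap, so it is delivered later, once the process unblocks it.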

Context Switching

Switching between processes is performed by the function schedule() in kernel/sched.c. The actual context switch is remarkably simple:
	save_regs(current);
	current = &task[curnum];	/* choose a new task */
	load_regs(current);
save_regs saves the exact state which the kernel was in when it called this function. load_regs doesn't return to that point (because the program flow would cause load_regs to be run again, ad infinitum) - rather it pops enough items off the stack to return to whoever called schedule().
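
The "choose a new task" step in the fragment above could look something like the following round-robin search. This is only a sketch of one plausible policy; I am not claiming it is what ELKS does. The names task_model, init_tasks and pick_next, the enum values and MAX_TASKS are all hypothetical.

```c
#define MAX_TASKS 8     /* hypothetical table size */

/* Hypothetical values; the real ones are in <linuxmt/sched.h>. */
enum { TASK_RUNNING, TASK_INTERRUPTIBLE, TASK_UNINTERRUPTIBLE,
       TASK_STOPPED, TASK_UNUSED };

struct task_model { int state; };

static struct task_model task[MAX_TASKS];
static int curnum;      /* index of the current task */

/* Mark every slot in the model table as empty. */
static void init_tasks(void)
{
    int i;
    for (i = 0; i < MAX_TASKS; i++)
        task[i].state = TASK_UNUSED;
    curnum = 0;
}

/* Starting just after the current task, return the index of the
   first task that is TASK_RUNNING.  Slot 0 is the idle task: it is
   skipped during the search and returned only if nothing else can
   run. */
static int pick_next(void)
{
    int i, n = curnum;

    for (i = 0; i < MAX_TASKS; i++) {
        n = (n + 1) % MAX_TASKS;
        if (n != 0 && task[n].state == TASK_RUNNING)
            return n;
    }
    return 0;
}
```

Note that the search wraps all the way round to the current task itself, so a lone runnable process simply gets picked again.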

schedule() can be called whenever the kernel wants to 'give up' its current timeslice - normally it would have set current->state=TASK_(UN)INTERRUPTIBLE first, otherwise it will be rescheduled at the next opportunity.

schedule() is also called at the end of the timer interrupt routine if the global flag need_resched is set (see arch/i86/kernel/irqtab.c). This is a bit hair-raising: the user's process is left hanging mid-timer-interrupt while another one continues! But when its turn comes again, the return from the timer interrupt completes.

need_resched is set if a process has used up its allotted time, which should also be calculated in the timer interrupt routine [not yet implemented].

Q. What if schedule() is called from a timer interrupt while the kernel is already executing schedule()?
A. This can't happen; the timer interrupt won't check need_resched unless user-level code is executing.
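
Here is a sketch of how the timer tick and the need_resched check might fit together. Remember that the time accounting is described above as not yet implemented, so this is a guess at one reasonable scheme, not the ELKS code: the functions timer_tick and should_switch, and the idea of reloading t_count from t_priority, are assumptions.

```c
struct task_model {
    int t_count;     /* ticks left in the current timeslice */
    int t_priority;  /* ticks granted per timeslice */
};

static int need_resched;

/* Called on every timer tick for the task that was running.  When
   the slice runs out, request a reschedule and (assumption) start a
   fresh slice for the task's next turn. */
static void timer_tick(struct task_model *cur)
{
    if (cur->t_count > 0)
        cur->t_count--;
    if (cur->t_count == 0) {
        need_resched = 1;
        cur->t_count = cur->t_priority;
    }
}

/* Decision made on the way out of the interrupt: switch only if
   user-level code was interrupted and a reschedule was requested.
   Returns 1 if schedule() should be called. */
static int should_switch(int was_in_user_mode)
{
    if (!was_in_user_mode || !need_resched)
        return 0;
    need_resched = 0;
    return 1;
}
```

If the tick arrives while kernel code is running, need_resched stays set and is acted on at the next interrupt that finds the processor in user mode.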

Note that hardware interrupt handlers are NOT allowed to call schedule! This means they can't sleep - they must run to completion and return.

Wait Queues

A common thing a process must do is to sleep until a resource it needs becomes available (such as buffer space, or input from disk or terminal). Functions in kernel/sleepwake.c are provided for this purpose.

The process which needs a resource calls sleep_on(q). sleep_on adds this process to the "wait queue" q, sets its state = TASK_UNINTERRUPTIBLE, and calls schedule(). This causes the process to sleep.

Later, another process which makes the resource available calls wake_up(q), which sets state=TASK_RUNNING for the sleeping process. This allows it to run; it continues at the point it got to in the sleep_on function (i.e. just after the call to schedule), where it removes itself from q and returns. Because q is actually a linked list, many processes can be asleep waiting for the same resource.

The functions that manipulate the wait queues protect themselves from being timesliced, by disabling interrupts. This prevents corruption of the queue.

Cunningly, no malloc-style storage allocation is needed to create objects for a wait queue. sleep_on defines a local variable of type struct wait_queue - i.e. it is allocated on the kernel stack for that process. When sleep_on returns, which is when we don't need it any more, it vanishes. If multiple tasks are sleeping for the same resource, q will be a linked list of objects, one on each kernel stack.
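
The stack-allocation trick can be shown with a small user-space model. It is a sketch under several assumptions: the list here is a simple singly-linked one, interrupt disabling is left out, and because there is no real scheduler the call to schedule() is replaced by a caller-supplied function (yield). The helpers add_wait, remove_wait and demo_yield are hypothetical names.

```c
#include <stddef.h>

enum { TASK_RUNNING, TASK_UNINTERRUPTIBLE };   /* hypothetical values */

struct task_model { int state; };

struct wait_queue {
    struct task_model *task;
    struct wait_queue *next;
};

/* Push entry w onto the front of queue *q. */
static void add_wait(struct wait_queue **q, struct wait_queue *w)
{
    w->next = *q;
    *q = w;
}

/* Unlink entry w from queue *q, if it is there. */
static void remove_wait(struct wait_queue **q, struct wait_queue *w)
{
    while (*q != NULL) {
        if (*q == w) {
            *q = w->next;
            return;
        }
        q = &(*q)->next;
    }
}

/* Make every task on the queue runnable again.  Each sleeper takes
   itself off the queue when it resumes. */
static void wake_up(struct wait_queue **q)
{
    struct wait_queue *w;
    for (w = *q; w != NULL; w = w->next)
        w->task->state = TASK_RUNNING;
}

/* Model of sleep_on.  The queue entry is a local variable, so it
   lives on the sleeping task's own stack and disappears on return. */
static void sleep_on(struct wait_queue **q, struct task_model *me,
                     void (*yield)(void))
{
    struct wait_queue wait;

    wait.task = me;
    add_wait(q, &wait);
    me->state = TASK_UNINTERRUPTIBLE;
    yield();                 /* in the kernel this is schedule() */
    remove_wait(q, &wait);
}

/* Demo stand-in for "another process runs and frees the resource". */
static struct wait_queue *demo_q;
static int demo_len_seen;    /* queue length observed while asleep */

static void demo_yield(void)
{
    struct wait_queue *w;

    demo_len_seen = 0;
    for (w = demo_q; w != NULL; w = w->next)
        demo_len_seen++;
    wake_up(&demo_q);
}
```

Calling sleep_on(&demo_q, &t, demo_yield) puts t on the queue, lets the stand-in "other process" see one sleeper and wake it, and returns with the queue empty again and no storage left behind.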


This document may be freely distributed as long as this copyright notice is kept intact and any changes or additions are marked with your name
Copyright © Brian Candler 1996

Last updated: 1 September 1996