Web based School

18 – What Is a Process?

What Happens When You Execute a Command?

Forking a Process
Running a Command

Looking at Process
Visiting the Shell Again

Processing a Command

Checking the Aliases and Built-Ins
Make a New Process with fork
Start a New Command with exec
An Example

Executing in the Background

An Example

Kinks in Your Pipeline
A Special Process Called Daemon

init
inetd
cron

Summary

18 – What Is a Process?

This chapter introduces the concept of processes and how you use UNIX to interact with them.

What Happens When You Execute a Command?

When you execute a program on your UNIX system, the system creates a special environment for that program. This environment contains everything needed for the system to run the program as if no other program were running on the system.

Forking a Process

Each process has a process context, which is everything that is unique about the state of the program you are currently running. The process context includes the following:

The text (program instructions) being run
The memory used by the program being run
The current working directory
The files that are open and positions in the files
Resource limits
Access control information
Others—various low-level information
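Several of the items in this list can be inspected from inside a running program. The following is a minimal sketch, assuming Python and its standard os and resource modules (the chapter itself names no programming language). The function name process_context is hypothetical and the selection of fields is illustrative, not a complete description of the context.

```python
import os
import resource

def process_context():
    """Collect a few parts of the current process's context (illustrative only)."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return {
        "pid": os.getpid(),               # this process's identifier
        "ppid": os.getppid(),             # identifier of the parent process
        "cwd": os.getcwd(),               # current working directory
        "uid": os.getuid(),               # access control: real user ID
        "gid": os.getgid(),               # access control: real group ID
        "max_open_files": (soft, hard),   # one resource limit (soft, hard)
    }

if __name__ == "__main__":
    for name, value in process_context().items():
        print(f"{name}: {value}")
```

Running the script twice shows a different pid each time, because each run is a new process with its own context.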

Every time you execute a program the UNIX system does a fork, which performs a series of operations to create a process context and then execute your program in that context. The steps include the following:

Allocate a slot in the process table, a list of currently running programs kept by UNIX. UNIX creates the illusion of multiple programs running simultaneously by switching quickly between active processes in the process table. This allocation can fail for a number of reasons, including these:

You have exceeded your per-user process limit, the maximum number of processes your UNIX system will allow you to run.
The system runs out of open process slots. The UNIX kernel stores information about currently running processes in a table of processes. When this table runs out of room for new entries, you are unable to fork a new process.
UNIX has run out of memory and does not have room for the text and data of the new process.

Assign a unique process identifier (PID) to the process. This identifier can be used to examine and control the process later.
Copy the context of the parent, the process that requested the spawning of the new process.
Return the new PID to the parent process. This enables the parent process to examine or control the process directly.
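The steps above can be observed with a single fork call. This is a minimal sketch in Python, assuming a UNIX-like system where os.fork is available; the function name fork_child and the exit code 7 are arbitrary choices for illustration. In the parent, fork returns the PID of the new child; in the child, it returns 0.

```python
import os

def fork_child(exit_code=7):
    """Fork a child that exits at once; return (child_pid, child_exit_code)."""
    pid = os.fork()
    if pid == 0:
        # Child: a copy of the parent's context with its own, new PID.
        os._exit(exit_code)
    # Parent: fork returned the child's PID, which we can use to wait for it.
    _, status = os.waitpid(pid, 0)
    code = os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1
    return pid, code

if __name__ == "__main__":
    child_pid, code = fork_child()
    print(f"parent {os.getpid()} forked child {child_pid}, exit code {code}")
```

The child calls os._exit rather than returning, so it never runs the parent's remaining code.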

After the fork is complete, UNIX runs your program. One of the differences between UNIX and many other operating systems is that UNIX performs this two-step procedure to run a program. The first step is to create a new process that's just like the parent. The second is to execute a different program. This procedure allows interesting variations. (See the section "A Special Process Called Daemon.")

Running a Command

When you enter ls to look at the contents of your current working directory, UNIX does a series of things to create an environment for ls and then run it:

The shell has UNIX perform a fork. This creates a new process that the shell will use to run the ls program.
The new process, which is still a copy of the shell, has UNIX perform an exec of the ls program.
The ls program is loaded into the new process context, replacing the text and data of the copied shell. The original shell is not affected.
The ls program performs its task, listing the contents of the current directory.
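The fork-then-exec sequence above can be sketched as follows. This is an illustration in Python, assuming a UNIX-like system; run_command is a hypothetical helper, and the exit status 127 for a command that cannot be executed follows common shell convention rather than anything stated in this chapter.

```python
import os

def run_command(argv):
    """Fork, exec argv in the child, wait in the parent; return the exit code."""
    pid = os.fork()
    if pid == 0:
        try:
            # Replace the child's program with the requested one.
            # On success, execvp does not return.
            os.execvp(argv[0], argv)
        except OSError:
            pass
        finally:
            os._exit(127)  # reached only if exec failed
    # Parent (playing the role of the shell): wait for the child to finish.
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1

if __name__ == "__main__":
    run_command(["ls"])
```

Here the script plays the role of the shell: it keeps running after ls finishes because only the child's program was replaced.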

Looking at Process

Because processes are so important to getting things done, UNIX has several commands that enable you to examine processes and modify their state. The most frequently used command is ps, which prints out the process status for processes running on your system. Each system has a slightly different version of the ps command, but there are two main variants, the System V version and the Berkeley version, covered in this section. Different versions of ps do similar things, but have somewhat different output and are controlled using different options. The X/Open Portability Guide makes an attempt to standardize somewhat on output of the ps command. The ps command is covered in more detail in Chapter 19, "Administrative Processes."

On a System V or XPG-compliant system, you can examine all the processes you are running by entering ps -f and you will get output such as the following:

$ ps -f
     UID   PID  PPID  C    STIME TTY      TIME COMMAND
    root 14931   136  0 08:37:48 ttys0    0:00 rlogind
  sartin 14932 14931  0 08:37:50 ttys0    0:00 -sh
  sartin 15339 14932  7 16:32:29 ttys0    0:00 ps -f
$

NOTE: After the first line, which is the header, each line of output describes the status of a single process. The UID column shows the owner of the process. The PID column shows the process ID. The PPID column shows the process ID of the parent process (the process that executed the fork). The C column shows recent processor utilization, which the system uses for scheduling. The STIME column is the time the process began executing. The TTY column shows the terminal the process is attached to. The TIME column is the amount of computer time the process has used. The COMMAND column shows the command line that was executed.

Look at this example and you can see that root (the system administration user) is running rlogind as process 14931. This process is a special kind of administrative program, called a daemon (daemons are described in the section "A Special Process Called Daemon"). This particular daemon is responsible for managing a connection from rlogin, which is described in Chapter 8, "Getting Around the Network." As you can see from the next line, there is a process called -sh, which is a Bourne shell. The shell has rlogind as its parent because the daemon did a fork to run the login shell. Similarly, there is a ps -f command that has the shell as its parent.
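The parent and child relationships described here can be traced mechanically from the PID and PPID columns. The sketch below, in Python, parses the sample listing from this section (copied into the SAMPLE string) and follows the PPID links upward. The helper names parse_ps and ancestry are hypothetical, and the parser assumes every column except COMMAND is a single word, which holds for this sample but not for every ps variant.

```python
SAMPLE = """\
     UID   PID  PPID  C    STIME TTY      TIME COMMAND
    root 14931   136  0 08:37:48 ttys0    0:00 rlogind
  sartin 14932 14931  0 08:37:50 ttys0    0:00 -sh
  sartin 15339 14932  7 16:32:29 ttys0    0:00 ps -f
"""

def parse_ps(text):
    """Turn ps -f style output into a list of dicts keyed by column name."""
    lines = text.strip().splitlines()
    header = lines[0].split()
    rows = []
    for line in lines[1:]:
        # Split into as many fields as there are columns; the last field
        # (COMMAND) keeps its embedded spaces.
        fields = line.split(None, len(header) - 1)
        row = dict(zip(header, fields))
        row["PID"] = int(row["PID"])
        row["PPID"] = int(row["PPID"])
        rows.append(row)
    return rows

def ancestry(rows, pid):
    """Follow PPID links from pid upward, as far as the listing allows."""
    by_pid = {row["PID"]: row for row in rows}
    chain = []
    while pid in by_pid:
        chain.append(by_pid[pid]["COMMAND"])
        pid = by_pid[pid]["PPID"]
    return chain

if __name__ == "__main__":
    print(" <- ".join(ancestry(parse_ps(SAMPLE), 15339)))
```

The chain stops at rlogind because its parent, PID 136, does not appear in the listing.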

TIP: The leading hyphen on the -sh in the output of ps means that the shell is executing as a login shell, which does certain special processing that other instances of the shell do not. See the chapter on your shell for more information on login shells.

Visiting the Shell Again

Earlier in this chapter you learned that the shell creates a new process for each command you execute. This section covers in a bit more detail how the shell creates and manages processes.

Processing a Command

When you type a command to your shell user interface, the shell performs a series of tasks to process the command. Although the steps may seem a bit cumbersome at first, they create an environment that is highly flexible.

Checking the Aliases and Built-Ins

The first thing the shell does is alias and built-in processing to see if your command is one of the shell's internally implemented functions. Each shell implements a number of functions internally either because external implementation would be difficult (for example, while loops) or because internal implementation is a big performance win (for example, echo in some shells). One reason the built-in commands are easier is that they can operate directly in the shell process rather than forcing the shell to create a new process to run a different command. That new command would not have access to the shell's memory.
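A command that changes the shell's own state shows why some commands must be built in. In the Python sketch below, a child process changes its working directory, yet the parent's directory is unchanged, because each process has its own context. The function chdir_in_child is a hypothetical name used only for this illustration.

```python
import os

def chdir_in_child(path="/"):
    """Change directory in a forked child; return (parent_before, child_cwd, parent_after)."""
    before = os.getcwd()
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:
        status = 1
        try:
            os.close(r)
            os.chdir(path)                      # affects only the child
            os.write(w, os.getcwd().encode())   # report the child's new directory
            status = 0
        finally:
            os._exit(status)
    os.close(w)
    child_cwd = os.read(r, 4096).decode()
    os.close(r)
    os.waitpid(pid, 0)
    return before, child_cwd, os.getcwd()

if __name__ == "__main__":
    before, child, after = chdir_in_child("/")
    print(f"parent before: {before}, child: {child}, parent after: {after}")
```

If a command such as cd ran as a separate process, it would change only its own directory and then exit, leaving the shell where it was.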

Make a New Process with fork

If the command you typed is not a built-in command (for example, if you entered ps), the shell performs a fork to create a new process. Your UNIX system allocates the necessary resources. The new copy of the shell then modifies its process environment to configure it correctly for the command to be executed. This includes setting up any input or output redirection you requested (including command pipelines) and creating a new background process group if you executed the command in the background.

Start a New Command with exec

Finally, the forked shell performs an exec to execute the program that you requested. The program replaces the forked copy of the shell, but your original shell is still running.

An Example

The following happens when you enter ps -f > t1 followed by cat t1:

$ ps -f > t1
$ cat t1
     UID   PID  PPID  C    STIME TTY      TIME COMMAND
    root 14931   136  0 08:37:48 ttys0    0:00 rlogind
  sartin 14932 14931  0 08:37:50 ttys0    0:00 -sh
  sartin 15339 14932  7 16:32:29 ttys0    0:00 ps -f
$

UNIX performs the following steps to execute ps -f > t1:

Shell command processing. The login shell (PID 14932 in this example) performs variable substitution and examines the command line to determine that ps is not a built-in or an alias.
fork/wait. The login shell (PID 14932) forks a new process (PID 15339). This new process is an exact copy of the login shell. It has the same open files, the same user ID, and a copy of the memory, and it is executing the same code. Because the command was not executed in the background, the login shell (14932) will execute a wait to wait for the new child (15339) to complete.
setup. The new shell (PID 15339) performs the operations it needs to do in order to prepare for the new program. In this case, it redirects the standard output to a file in the current directory named t1, overwriting the file (if it existed).
exec. The new shell (PID 15339) asks the UNIX system to exec the ps command with -f as its argument. UNIX throws away the memory from the shell and loads the ps command code and data into the process memory. The ps command will run and write its output to the standard output, which has been redirected to the file t1.
wait ends. When the ps command is done executing, the login shell (PID 14932) receives notification and will prompt the user for more input.
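The fork, setup, exec, and wait steps above can be sketched in Python as follows. This assumes a UNIX-like system; run_redirected is a hypothetical helper, and the exit status 127 on failure is a common shell convention rather than something this chapter specifies. The setup step happens in the child, between fork and exec, so the parent's standard output is untouched.

```python
import os

def run_redirected(argv, path):
    """Run argv with standard output redirected to path; return the exit code."""
    pid = os.fork()
    if pid == 0:
        try:
            # setup: open (or truncate) the file and make it file descriptor 1.
            fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o666)
            if fd != 1:
                os.dup2(fd, 1)
                os.close(fd)
            # exec: replace the child's program; does not return on success.
            os.execvp(argv[0], argv)
        except OSError:
            pass
        finally:
            os._exit(127)  # reached only if setup or exec failed
    # wait: the parent blocks until the child completes.
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1

if __name__ == "__main__":
    run_redirected(["ps", "-f"], "t1")
    with open("t1") as listing:
        print(listing.read(), end="")
```

The program being run needs no knowledge of the redirection; it simply writes to its standard output, which already points at the file.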

Executing in the Background

You can tell your shell to execute commands in the background, which tells the shell not to wait for the command to complete. This enables you to run programs without having to wait for them to complete.

TIP: For long-running commands that are not interactive, you can run the command in the background and continue to do work while it executes. Use the nohup command to make sure the process is not terminated by a hangup when you log out; if the command's output would go to your terminal, nohup redirects it to a file called nohup.out. For example, to run a make in the background, enter nohup make all &. When the make terminates, you can read nohup.out to check the output.

An Example

This example is almost the same as the previous example. The only difference is that the ps command is executed in the background. The following happens when you are using the Bourne shell and enter ps -f > t1 & followed by cat t1:

$ ps -f > t1 &
15445
$ cat t1
     UID   PID  PPID  C    STIME TTY      TIME COMMAND
    root 14931   136  1 08:37:48 ttys0    0:00 rlogind
  sartin 14932 14931  0 08:37:50 ttys0    0:00 -sh
  sartin 15445 14932  8 17:31:14 ttys0    0:00 ps -f
$

WARNING: Do not depend on the output of a background process until you know the process has completed. If the command is still running when you examine the output, you may see incomplete output.

UNIX performs the following steps to execute ps -f > t1 &:

Shell command processing. The login shell (PID 14932 in this example) performs variable substitution and examines the command line to determine that ps is not a built-in or an alias.
fork. The login shell (PID 14932) forks a new process (PID 15445). This new process is an exact copy of the login shell. It has the same open files, the same user ID, and a copy of the memory, and it is executing the same code. Because the command was executed in the background, the login shell (14932) does not wait for the child; it immediately prompts you for input, so you can run a new command right away. Because your background command may still be running, you should not depend on its output until you know the process has completed.
setup. The new shell (PID 15445) performs the operations it needs to do in order to prepare for the new program. In this case, it redirects the standard output to a file in the current directory named t1, overwriting the file (if it existed).
exec. The new shell (PID 15445) asks the UNIX system to exec the ps command with -f as its argument. UNIX throws away the memory from the shell and loads the ps command code and data into the process memory. The ps command will run and write its output to the standard output, which has been redirected to the file t1.
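The only difference from the foreground case is that the parent does not wait right after the fork. A minimal Python sketch, assuming a UNIX-like system, with hypothetical helper names start_background and wait_for:

```python
import os

def start_background(argv):
    """Fork and exec argv without waiting; return the child's PID."""
    pid = os.fork()
    if pid == 0:
        try:
            os.execvp(argv[0], argv)
        except OSError:
            pass
        finally:
            os._exit(127)  # reached only if exec failed
    return pid  # the parent continues immediately

def wait_for(pid):
    """Block until the given child has finished; return its exit code."""
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1

if __name__ == "__main__":
    pid = start_background(["sleep", "1"])
    print(f"started {pid}; the parent is free to do other work")
    print(f"child finished with exit code {wait_for(pid)}")
```

Calling wait_for before reading the child's output is one way to follow the warning above about depending on the output of a background process.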

Kinks in Your Pipeline

One of the things the fork/exec model enables is creating command pipelines, a series of commands with the output of one command as the input for the next. This powerful notion is one of the major advantages of UNIX over some other systems. See Chapter 1, "Operating Systems."

Creating a pipeline is similar to creating an ordinary command. The difference is in how output is redirected. In the ordinary case, the shell performs some simple I/O redirection before executing the program. In the pipeline case, the shell will instead connect the standard output of one command as the standard input of another.

If you enter ps -f | grep sartin you might get output such as the following:

$ ps -f | grep sartin
  sartin 14932 14931  1 08:37:50 ttys0    0:00 -sh
  sartin 15424 14932  1 17:15:02 ttys0    0:00 grep sartin
  sartin 15425 15424  7 17:15:02 ttys0    0:00 ps -f
$

NOTE: Some shells perform these tasks in slightly different orders. This example illustrates what one version of the Bourne shell does. Variations are relatively minor and involve the details of which process does the extra fork calls.

In order to get this output, the shell went through the following series of steps:

fork (1). The login shell (PID 14932) forks a new process (15424) to execute the pipeline. This subprocess (15424) creates a pipe and redirects its standard input to read from that pipe. The login shell (14932) then waits for the pipeline execution to complete.
fork (2). The shell subprocess (15424) forks another new process (15425) to help execute the pipeline. This new subprocess (15425) connects its standard output to the pipe that its parent (15424) is using for input.
exec (1). The first subprocess (15424) executes the grep program.
exec (2). The second subprocess (15425) executes the ps program.
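The pipe connection can be sketched in Python as follows. This is one of the variations mentioned in the note above, not the exact Bourne shell ordering: here the parent forks both children itself, and it also captures the final output through a second pipe so the result can be inspected. The function run_pipeline is a hypothetical helper, and a UNIX-like system is assumed.

```python
import os

def _exec_child(argv, stdin_fd, stdout_fd, to_close):
    """In a forked child: wire up stdin/stdout, close spare pipe ends, exec."""
    try:
        if stdin_fd is not None:
            os.dup2(stdin_fd, 0)
        if stdout_fd is not None:
            os.dup2(stdout_fd, 1)
        for fd in to_close:
            os.close(fd)
        os.execvp(argv[0], argv)   # does not return on success
    except OSError:
        pass
    finally:
        os._exit(127)              # reached only on failure

def run_pipeline(producer, consumer):
    """Run 'producer | consumer' and return the consumer's output as bytes."""
    link_r, link_w = os.pipe()     # producer stdout -> consumer stdin
    out_r, out_w = os.pipe()       # consumer stdout -> parent
    pipe_fds = (link_r, link_w, out_r, out_w)

    producer_pid = os.fork()
    if producer_pid == 0:
        _exec_child(producer, None, link_w, pipe_fds)

    consumer_pid = os.fork()
    if consumer_pid == 0:
        _exec_child(consumer, link_r, out_w, pipe_fds)

    # Parent: close the ends it does not use, or the readers never see EOF.
    for fd in (link_r, link_w, out_w):
        os.close(fd)
    chunks = []
    while True:
        data = os.read(out_r, 4096)
        if not data:
            break
        chunks.append(data)
    os.close(out_r)
    os.waitpid(producer_pid, 0)
    os.waitpid(consumer_pid, 0)
    return b"".join(chunks)

if __name__ == "__main__":
    print(run_pipeline(["ps", "-f"], ["grep", "ps"]).decode(), end="")
```

Closing unused pipe ends matters: a reader sees end-of-file only after every copy of the write end has been closed.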

Avoiding the Background with GUI

With the advent of graphical user interfaces (GUIs) on UNIX, you do not need to use background processes to be able to run multiple programs at once. Instead, you can run each command either from a graphical interface or from its own terminal window. This can be very resource intensive, so don't try to do too many things at once.

A Special Process Called Daemon

As you learned in Chapter 1, many of the features that are sometimes implemented as part of the kernel, the core of the operating system, are not in the UNIX kernel. Instead, many of these features are implemented using special processes called daemons. A daemon is a process that detaches itself from the terminal and runs, disconnected, in the background, waiting for requests and responding to them. Many system functions are commonly performed by daemons, such as the sendmail daemon, which handles mail, and the NNTP daemon, which handles USENET news. Many other daemons may exist on your system; check the documentation for more information.
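One common way for a process to detach from its terminal is to fork and have the child start a new session with setsid. The Python sketch below shows only that one step and is not a complete daemon; real daemons usually do more, such as changing directory and closing inherited files. The function spawn_detached is a hypothetical name, and a UNIX-like system is assumed.

```python
import os

def spawn_detached():
    """Fork a child that starts its own session; return (child_pid, child_session_id)."""
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:
        status = 1
        try:
            os.close(r)
            os.setsid()   # new session: the child no longer has a controlling terminal
            os.write(w, str(os.getsid(0)).encode())
            status = 0
        finally:
            os._exit(status)
    os.close(w)
    child_sid = int(os.read(r, 64).decode())
    os.close(r)
    os.waitpid(pid, 0)
    return pid, child_sid

if __name__ == "__main__":
    pid, sid = spawn_detached()
    print(f"child {pid} ran in session {sid}; parent session is {os.getsid(0)}")
```

The child's session ID equals its own PID because it became the leader of the new session, which is separate from the session of the terminal that started it.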

Generally only system administrators need to know about most daemons, but there are three daemons that are important and widespread enough that you should probably have a minimal understanding of what they do; they are init, inetd, and cron.

init

In a way the init program is the "super daemon." It takes over the basic running of the system when the kernel has finished the boot process. It is responsible for the following:

Running scripts that change state. Every time the system administrator switches the system to a new state, init runs any programs needed to update the system to the new state.
Managing terminals. Each physical terminal is monitored by a program called getty; it is the job of init to keep getty properly running and shut it down when the system administrator disables logins. On systems with GUI consoles, init may be responsible for keeping the graphical login program running.
Reaping processes. When a process terminates, UNIX keeps some status information around until the parent process reads that information. Sometimes the parent process terminates before the child. Sometimes the parent process terminates without reading the status of the child. Any time either of these happens, UNIX makes init the parent process of the orphaned or zombie process, and init must read the process status so that UNIX can reuse the process slot. Occasionally init fails to release zombies, but this is unusual on most recent UNIX-based systems.
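The status information that lingers after a child terminates can be observed from the parent. In this Python sketch, a child exits immediately; its process table entry remains until the parent reads its status with waitpid, after which the slot is released. The names entry_exists and zombie_demo are hypothetical, a UNIX-like system is assumed, and the check sends signal 0, which tests for existence without delivering anything.

```python
import os

def entry_exists(pid):
    """Return True if a process table entry for pid still exists."""
    try:
        os.kill(pid, 0)   # signal 0: existence check only
        return True
    except ProcessLookupError:
        return False

def zombie_demo():
    """Return (entry exists before wait, entry exists after wait) for an exited child."""
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:
        os._exit(0)       # child terminates at once; its pipe ends are closed
    os.close(w)
    os.read(r, 1)         # returns at end-of-file, once the child has closed its write end
    os.close(r)
    before = entry_exists(pid)   # child has finished (or is finishing) but is not yet reaped
    os.waitpid(pid, 0)           # the parent reads the status, releasing the slot
    after = entry_exists(pid)
    return before, after

if __name__ == "__main__":
    print(zombie_demo())
```

When a parent never performs this wait and then exits, init inherits the child and does the reaping instead.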

For further information on init, see Chapter 34, "Starting Up and Shutting Down."

inetd

A second powerful daemon is the inetd program common to many UNIX machines (including those based on BSD and many that are based on System V). The inetd process is the network "super server" daemon. It is responsible for starting network services that do not have their own stand-alone daemons. For example, inetd usually takes care of incoming rlogin, telnet, and ftp connections. (See Chapter 8, "Getting Around the Network.")

For further information on inetd, see Chapter 37, "Networking."

cron

Another common daemon is the cron program, which is responsible for running repetitive tasks on a regular schedule. It is a perfect tool for running system administration tasks such as backup and system logfile maintenance. It can also be useful for ordinary users to schedule regular tasks including calendar reminders and report generation. For more information, see Chapter 20, "Scheduling Processes."

Summary

In this chapter you have learned what a UNIX process is, how your interaction with UNIX starts and stops processes, and a little bit about a few special processes called daemons. A process is an entire execution environment for your computer program; it is almost like having a separate computer that executes your program. UNIX switches quickly between processes to give the illusion that they are all running simultaneously. You start a process any time you run a command or pipeline in the shell. You can even start a process in the background and perform other tasks while it is executing. Several processes called daemons run on your system to perform special tasks and supply services that some operating systems supply in the kernel.