Unix Free Tutorial

Web based School

  • 19 — Administering Processes


      You use processes on UNIX every time you want to get something done. Each command (that isn't built into your shell) you run will start one or more new processes to perform your desired task. To get the most benefit out of your UNIX machine you need to learn how to monitor and control the processes that are running on it. You will need to know how to make large, but not time-critical, tasks take less of your CPU time. You will need to learn how to shut down programs that have gone astray. You will need to learn how to improve the performance of your machine.

      Monitoring Processes—ps and time

      The first step in controlling processes in UNIX is to learn how to monitor them. By using the process-monitoring commands in UNIX, you will be able to find what programs are using your CPU time, find jobs that are not completing, and generally explore what is happening to your machine.

      What Is ps?

      The first command you should learn about is the ps command, which prints out the process status for some or all of the processes running on your UNIX machine.

      There are two distinctly different versions of ps: the SYSV version and the BSD version. Your machine might have either one or both of the ps commands. If you are running on a machine that is mostly based on Berkeley UNIX, try looking in /usr/5bin for the SYSV version of ps. If you are running on a machine that is mostly based on System V UNIX, try looking in /usr/ucb for the BSD version of ps. Check your manuals and the output of your ps program to figure out which one you have. You may want to read the introductions to both SYSV and BSD ps output since some systems either combine features of both (for example, AIX) or have both versions (for example, Solaris 2.3, which has SYSV /usr/bin/ps and BSD /usr/ucb/ps).

      Introduction to SYSV ps Output

      If you are using SYSV, you should read this section to learn about the meaning of the various fields output by ps.

      Look at what happens when you enter ps:

      $ ps
         PID TTY      TIME COMD
        1400 pts/5    0:01 sh
        1405 pts/5    0:00 ps
      $

      The PID field gives you the process identifier, which uniquely identifies a particular process. The TTY field tells what terminal the process is using. It shows ? if the process has no controlling terminal, and it may say console if the process is on the system console. The terminal listed may be a pseudo terminal, which is how UNIX handles terminal-like connections from a GUI or over the network. Pseudo terminal names often begin with pt (or just p, if your system uses very short names). The TIME field tells how much CPU time the process has used. The COMD field (sometimes labelled CMD or COMMAND) tells what command the process is running.

      Now look at what happens when you enter ps -f:

      $ ps -f
           UID   PID  PPID  C    STIME TTY      TIME COMD
        sartin  1400  1398 80 18:31:32 pts/5    0:01 -sh
        sartin  1406  1400 25 18:34:33 pts/5    0:00 ps -f
      $

      The UID field tells which user ID owns the process. Your login name should appear here. The PPID field tells the process identifier of the parent of the process; notice that the PPID of ps -f is the same as the PID of -sh. The C field is process-utilization information used by the scheduler. The STIME is the time the process started.
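
      The parent and child link that PPID records can also be checked from a script. The following is a minimal sketch, assuming your ps accepts the POSIX -o and -p options (some of the older ps versions described in this chapter do not); the variable name parent is only illustrative.

```shell
# $$ expands to the PID of the current shell.
# "-o ppid=" asks ps for only the PPID column, with no header line.
parent=$(ps -o ppid= -p "$$" | tr -d ' ')

echo "this shell is PID $$ and its parent is PID $parent"

# POSIX shells also record their parent's PID in $PPID,
# so the two values should agree.
if [ "$parent" = "$PPID" ]; then
    echo "ps and \$PPID agree"
fi
```

      If the two values differ, the most likely cause is that the original parent has exited and the shell has been adopted by another process.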

      Next, look at what happens when you enter ps -l:

      $ ps -l
       F S   UID   PID  PPID  C PRI NI     ADDR     SZ    WCHAN TTY      TIME COMD
       8 S   343  1400  1398 80   1 20 fc315000    125 fc491870 pts/5    0:01 sh
       8 O   343  1407  1400 11   1 20 fc491800    114          pts/5    0:00 ps
      $

      Note that the UID is printed out numerically this time. The PRI field is the priority of the process; a lower number means more priority to the scheduler. The NI field is the nice value. See the section "Prioritizing Processes" for more information on the scheduler and nice values. The SZ field shows the process size. The WCHAN field tells what event, if any, the process is waiting for. Interpretation of the WCHAN field is specific to your system.

      On some SYSV systems with real-time scheduling additions, you may see output such as the following if you enter ps -c:

      $ ps -c
         PID  CLS PRI TTY      TIME COMD
        1400   TS  62 pts/5    0:01 sh
        1409   TS  62 pts/5    0:00 ps
      $

      The CLS field tells the scheduling class of the process; TS means time sharing and is what you will usually see. You may also see SYS for system processes and RT for real-time processes.

      On some SYSV systems running the Fair Share Scheduler, you may see output such as the following if you enter ps -f:

      $ ps -f
           UID    FSID   PID  PPID  C    STIME TTY      TIME COMMAND
        sartin rddiver 18735 18734  1  Mar 12  ttys0    0:01 -ksh
        sartin rddiver 19021 18735  1 18:47:37 ttys0    0:01 xdivesim
        sartin rddiver 19037 18735  4 18:52:58 ttys0    0:00 ps -f
          root default 18734   136  0  Mar 12  ttys0    0:01 rlogind
      $

      The extra FSID field tells the fair share group for the process.

      Introduction to BSD ps Output

      If you are using BSD, you should read this section to learn about the meaning of the various fields output by ps.

      Look at what happens when you enter ps:

      $ ps
        PID TT STAT  TIME COMMAND
      22711 c0 T     0:00 rlogin brat
      22712 c0 T     0:00 rlogin brat
      23121 c0 R     0:00 ps
      $

      The PID field gives you the process identifier, which uniquely identifies a particular process. The TT field tells what terminal the process is using. It shows ? if the process has no controlling terminal, and it may say co if the process is on the system console. The terminal listed may be a pseudo terminal. The STAT field shows the process state. Check your manual entry for ps to learn more about state. The TIME field tells how much CPU time the process has used. The COMMAND field tells what command the process is running. Normally, the COMMAND field lists the command arguments stored in the process itself. On some systems, these arguments can be overwritten by the process. If you use the c option, the real command name will be given, but not the arguments.


      NOTE: The BSD ps command predates standard UNIX option processing. It does not take hyphens to introduce options. On systems where one ps acts like either SYSV or BSD (e.g., AIX ps), the absence of the hyphen is what makes it run in BSD mode.

      Look at what happens when you enter ps l:

      $ ps l
             F UID   PID  PPID CP PRI NI  SZ  RSS WCHAN    STAT TT  TIME COMMAND
      20408020 343 22711 22631  0  25  0  48    0          TW   c0  0:00 rlogin brat
          8000 343 22712 22711  0   1  0  48    0 socket   TW   c0  0:00 rlogin brat
      20000001 343 23122 22631 19  29  0 200  400          R    c0  0:00 ps l
      $

      The F field gives a series of flags that tell you about the current state of the process. Check your system manuals for information on interpreting this field. The UID field tells the user ID that owns the process; in this listing it is printed numerically rather than as a login name. The PPID field tells the process identifier of the parent of the process; notice that the PPID of the second rlogin is the same as the PID of the first, its parent process. The CP field is process-utilization information used by the scheduler. The PRI field is the priority of the process; a lower number means more priority to the scheduler. The NI field is the nice value. See the section "Prioritizing Processes" for more information on the scheduler and nice values. The SZ field shows the process size. The RSS field shows the resident set size, which is the actual amount of computer memory occupied by the process. The WCHAN field tells what event, if any, the process is waiting for. Interpretation of the WCHAN field is specific to your system.

      Look at what happens when you enter ps u:

      $ ps u
      USER       PID %CPU %MEM   SZ  RSS TT STAT START  TIME COMMAND
      sartin   23127  0.0  1.6  200  416 c0 R    19:25   0:00 ps u
      sartin   22712  0.0  0.0   48    0 c0 TW   18:40   0:00 rlogin brat
      sartin   22711  0.0  0.0   48    0 c0 TW   18:40   0:00 rlogin brat
      $

      The %CPU and %MEM fields tell the percentage of CPU time and system memory the process is using. The START field tells when the process started.

      Look at what happens when you enter ps v:

      $ ps v
        PID TT STAT  TIME SL RE PAGEIN SIZE  RSS   LIM %CPU %MEM COMMAND
      23126 c0 R     0:00  0  0      0  200  420    xx  0.0  1.6 ps
      $

      The SL field tells how long the process has been sleeping, waiting for an event to occur. The RE field tells how long the process has been resident in memory. The PAGEIN field tells the number of disk input operations caused by the process, to read in pages that were not already resident in memory. The LIM field tells the soft limit on memory used.

      Checking on Your Processes with ps

      This section gives a few handy ways to examine the states of certain processes you might care about. Short examples are given using the SYSV and BSD versions of ps.

      Everything You Own

      Viewing all processes that you own can be useful in looking for jobs that you accidentally left running or to see everything you are doing so you can control it. On SYSV, you type ps -u userid to see everything owned by a particular user. Try ps -u $LOGNAME to see everything you own:

      $ ps -u $LOGNAME
          PID TTY      TIME COMMAND
        18743 ttys0    0:01 ksh
        19250 ttys0    0:00 ps
      $

      On BSD, the default is for ps to show everything you own:

      $ ps l
             F UID   PID  PPID CP PRI NI  SZ  RSS WCHAN    STAT TT  TIME COMMAND
      20088201 343   835   834  1  15  0  32  176 kernelma S    p0  0:00 -ksh TERM=vt
      20000001 343   861   835 25  31  0 204  440          R    p0  0:00 ps l
      20088001 343   857   856  0   3  0  32  344 Heapbase S    p1  0:00 -ksh HOME=/t
      $

      Specific Processes

      Looking at the current status of a particular process can be useful to track the progress (or lack thereof) of a single command you have running. On SYSV you type ps -pPID ... to see a specific process:

      $ ps -p19057
          PID TTY      TIME COMMAND
        19057 ttys3    0:00 ksh
      $

      On BSD, if the last argument to ps is a number, it is used as a PID:

      $ ps l22712
             F UID   PID  PPID CP PRI NI  SZ  RSS WCHAN    STAT TT  TIME COMMAND
          8000 343 22712 22711  0   1  0  48    0 socket   TW   c0  0:00 rlogin brat
      $

      Specific Process Groups

      Looking at the status of a process group (see the section "Job Control and Process Groups") can be useful in tracking a particular job you run. On SYSV you can use ps -gPGID to see a particular process group:

      $ ps -lg19080
        F S   UID   PID  PPID  C PRI NI     ADDR   SZ    WCHAN TTY      TIME COMD
        1 S   343 19080 19057  0 158 24   710340   51   39f040 ttys3    0:58 fin_analysis
        1 S   343 19100 19080  0 168 24   71f2c0   87 7ffe6000 ttys3    2:16 fin_marketval
      $

      On BSD, there is no standard way to see a particular process group, but the output of ps j gives much useful information:

      $ ps j
       PPID   PID  PGID   SID TT TPGID  STAT   UID  TIME COMMAND
        834   835   835   835 p0   904   SOE   198  0:00 -ksh TERM=vt100 HOME=/u/sart
        835   880   880   835 p0   904   TWE   198  0:00 vi
        835   881   881   835 p0   904   TWE   198  0:00 vi t1.sh
        835   896   896   835 p0   904   IWE   198  0:00 ksh t2.sh _ /usr/local/bin/k
        896   897   896   835 p0   904   IWE   198  0:00 task_a
        896   898   896   835 p0   904   IWE   198  0:00 task_b
        835   904   904   835 p0   904    RE   198  0:00 ps j
      $

      Note the PGID field for PIDs 896 through 898, which are all part of one shell script and therefore share the process group 896. Note also the TPGID field, which is the same for all processes and identifies the process group that currently owns the terminal.

      Specific Terminal

      Looking at the status of a particular terminal can be a useful way to filter processes started from a particular login, either from a terminal or over the network. On SYSV use ps -t termid to see processes running from a particular terminal or pseudo terminal (see your system documentation to determine the correct values for termid):

      $ ps -fts3
           UID   PID  PPID  C    STIME TTY      TIME COMMAND
          root 19056   136  0 19:21:00 ttys3    0:00 rlogind
        sartin 19080 19057  0 19:23:53 ttys3    1:01 fin_analysis
        sartin 19057 19056  0 19:21:01 ttys3    0:00 -ksh
        sartin 19100 19080  0 19:33:53 ttys3    3:43 fin_marketval
        sartin 19082 19057  0 19:23:58 ttys3    0:00 vi 19unxor.adj
      $

      On BSD use ps t termid to see processes running from a particular terminal or pseudo terminal (see your system documentation to determine the correct values for termid):

      $ ps utp5
      USER       PID  %CPU %MEM   SZ  RSS TT STAT  TIME COMMAND
      sartin    2058   0.0  0.9  286      p5 R     0:00 -sh (sh)
      sartin    2060   0.0  2.7   53      p5 R     0:00 vi 19unxor.adj
      $

      Specific User

      Looking at processes run by a particular user can be useful for the system administrator to track what is being run by others and to deal with "runaway" processes. On SYSV enter ps -u userid to see everything owned by a particular user:

      $ ps -fusartin
           UID   PID  PPID  C    STIME TTY      TIME COMMAND
        sartin 18743 18735  0  Mar 12  ttys0    0:31 collect_stats
        sartin 19065 19057  1 19:21:04 ttys3    0:00 vi 19unxor.adj
        sartin 19057 19056  0 19:21:01 ttys3    0:00 -ksh
        sartin 18735 18734  0  Mar 12  ttys0    0:00 -ksh
        sartin 19066 18743  8 19:21:12 ttys0    0:00 ps -fusartin
      $

      On BSD, there is no simple, standard way to see processes owned by a particular user other than yourself.

      Checking on a Process with time

      The time command prints out the real, user, and system time spent by a command (in ksh, the built-in time command will time a pipeline as well). The real time is the amount of clock time it took from starting the command until it completed; this includes time spent waiting for input, output, or other events. The user time is the amount of CPU time used by the code of the process. The system time is the amount of CPU time the UNIX kernel spent doing things for the process. The time command prints real, user, and sys times on separate lines (BSD time may print them all on one line). Both csh and ksh have built-in versions of time that have slightly different output formats. The csh built-in time command prints user time, system time, clock time, percent usage, and some I/O statistics all on one line. The ksh built-in time command prints real, user, and sys time on separate lines, but uses a slightly different format for the times than the standalone time command does:

      % time ./doio
      9.470u 0.160s 0:09.56 100.7% 0+99k 0+0io 0pf+0w
      % ksh
      $ time ./doio
      real    0m9.73s
      user    0m9.63s
      sys     0m0.10s
      $ sh
      $ time ./doio
      real        9.8
      user        9.5
      sys         0.1
      $
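
      The difference between real time and CPU time is easiest to see with a command that only waits. The following is a rough sketch rather than a replacement for time: it assumes your date command accepts +%s (widely supported, but not strictly POSIX) and uses the shell's times built-in; the variable names are illustrative.

```shell
start=$(date +%s)        # seconds since the epoch
sleep 2                  # waits on the clock but does almost no computing
end=$(date +%s)
elapsed=$((end - start))

echo "real time: about ${elapsed}s"

# times prints the accumulated user and system CPU time for the shell
# and for its children; both stay near zero here because sleep only waits.
times
```

      A CPU-bound command would show the opposite pattern, with user time close to real time, as in the ./doio output above.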

      Background and Foreground Processes

      So far, you have seen examples and descriptions of a user typing a command, watching as it executes, possibly interacting during its execution, and eventually seeing it complete. This is the default way your interactive shell executes processes. Using only this order of events means your shell executes a single process at a time. This single process is running in the foreground. Shells are able to keep track of more than one process at a time. In this type of environment, one process at most can be in the foreground; all the other processes are running in the background. This allows you to do multiple things at once from a single screen or window. You can think of the foreground and the background as two separate places where your interactive shell keeps processes. The foreground holds a single process, and you may interact with this process. The background holds many processes, but you cannot interact with these processes.

      Foreground Processing

      Running a process in the foreground is very common—it is the default way your shell executes a process. If you want to write a letter using the vi editor, you enter the command vi letter and type away. After you enter the vi command, your shell starts the vi process in the foreground so you can write your letter. In order for you to enter information interactively, your process must be in the foreground. When you exit the editor, you are terminating the process. After your foreground process terminates, but not before, the shell prompts you for the next command.

      This mode of execution is necessary for all processes that need your interactions. It would be impossible for the computer to write the letter you want without your input. Mind reading is not currently a means of input, so you commonly type, use your mouse, and even sometimes speak the words. But not all processes need your input—they are designed to be able to get all the necessary input via other ways. They may be designed to get input from the computer system, from other processes, or from the file system.

      Still, such processes may be designed to give you information. Status information could be reported periodically, and usually the process results are displayed at a certain point. If you wish to see this information as it is reported, the process must be running in the foreground.

      Where Is the Background and Why Should You Go There?

      Sometimes a program you run doesn't need you to enter any information or view any results. If this is the case, there is no reason you need to wait for it to complete before doing something else. UNIX shells provide a way for you to execute more than one process at a time from a single terminal. The way you do this is to run one or more processes in the background. The background is where your shell keeps all processes other than the one you are interacting with (your foreground process). You cannot give input to a process via standard input while it is in the background—you can give input via standard input only to a process in the foreground.

      The most common reason to put a process in the background is to allow you to do something else interactively without waiting for the process to complete. For example, you may need to run a calculation program that goes through a very large database, computing a complicated financial analysis of your data and then printing a report; this may take several minutes (or hours). You don't need to input any data because your database has all the necessary information. You don't need to see the report on your screen since it is so big you would rather have it saved in a file and/or printed on your laser printer. So when you execute this program, you specify that the input should come from your database (redirection of standard input) and the report should be sent to a file (redirection of standard output). At the end of the command you add the special background symbol, &. This symbol tells your shell to execute the given command in the background. Refer to the following example scenario.

      $ fin_analysis < fin_database > fin_report &
      [1]   123
      $ date
      Sat Mar 12 13:25:17 CST 1994
      $ tetris
      $ date
      Sat Mar 12 15:44:21 CST 1994
      [1] +  Done             fin_analysis < fin_database > fin_report &
      $

      After starting your program on its way (in the background), the shell prints a prompt and awaits your next command. You may continue doing work (executing commands) while the calculation program runs in the background. When the background process terminates (all your calculations are complete), your shell may print a termination message on your screen, followed by a prompt.
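
      The same pattern works inside a script, where $! gives the PID of the most recent background command and wait collects its exit status. This is a minimal sketch: the sleep and echo pair stands in for a long-running program such as fin_analysis, and the file and variable names are hypothetical.

```shell
report=${TMPDIR:-/tmp}/fin_report.$$     # hypothetical output file

# A stand-in for a long-running job, started in the background with &.
( sleep 2; echo "analysis complete" ) > "$report" &
job_pid=$!                               # PID of the last background command

echo "job $job_pid started; the shell is free for other commands"
date

wait "$job_pid"                          # block until the background job finishes
job_status=$?
result=$(cat "$report")
rm -f "$report"
echo "job $job_pid finished with status $job_status: $result"
```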

      Job Control

      Some shells (the C shell, csh, and the Korn shell, ksh, are two) have increased ability to manipulate multiple processes from a single interactive shell. Although graphical interfaces have since added the ability to use multiple windows (each with its own interactive shell) from one display, job control still provides a useful function.

      First you need to understand the shell's concept of a job. A job is an executed command line. Recall the discussion of processes created during execution of a command. For many command lines (for example, pipelines of several commands), several processes are created in order to carry out the execution. The whole collection of processes that are created to carry out this command line belong to the same process group. By grouping the processes together into an identifiable unit, the shell allows you to perform operations on the entire job, giving you job control.

      Job control allows you to do the following:

      • Move processes back and forth between the foreground and background

      • Suspend and resume process execution

      Each job or process group has a controlling terminal. This is the terminal (or window) from which you executed the command. Your terminal can only have one foreground process (group) at a time. A shell that implements job control will move processes between the foreground and the background.

      The details of job control use are covered in the section "Job Control and Process Groups."

      Signaling Processes

      When a process is executing, UNIX provides a way to send a limited set of messages to this process: It sends a signal. UNIX defines a set of signals, each of which has a special meaning. Then the user, or other processes that are also executing, can send a specific signal to a process. This process may ignore some signals, and it may pay attention to others. As a nonprogramming user, you should know about the following subset of signals. The first group is important for processes, no matter what shell you are using. The second group applies if your shell supports job control.

      General Process Control Signals

      HUP     Detection of terminal hangup or controlling process death
      INT     Interactive attention signal; the INTR control character generates this
      KILL    Termination; a process cannot ignore or block this
      QUIT    Interactive termination; the QUIT control character generates this
      TERM    Termination; a process may ignore or block this

      Job Control Process Control Signals

      CONT    Continue a stopped process; a process cannot ignore or block this
      STOP    Stop a process; a process cannot ignore or block this
      TSTP    Interactive stop; the SUSP control character generates this
      TTIN    Background job attempted a read; the process group is suspended
      TTOU    Background job attempted a write; the process group is suspended

      The default action for all the general process control signals is abnormal process termination. A process can choose to ignore any of these signals except KILL. There is no easy way for you to tell which processes are ignoring which signals. But if you need to terminate a process, the KILL signal cannot be ignored and can be used as a last resort.
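
      Ignoring a signal can be demonstrated with the shell's trap built-in, where an empty action means "ignore". In this sketch a background child ignores HUP, survives it, and is then removed with KILL. The variable names are illustrative, and kill -0 is used only to test whether the process still exists.

```shell
# A child that ignores HUP: an empty trap action means "ignore this signal".
( trap '' HUP; while :; do sleep 1; done ) &
pid=$!
sleep 1                          # give the child time to install its trap

kill -HUP "$pid"                 # ignored by the child
sleep 1
if kill -0 "$pid" 2>/dev/null; then
    survived_hup=yes
else
    survived_hup=no
fi
echo "still running after HUP: $survived_hup"

kill -KILL "$pid"                # cannot be ignored
wait "$pid"
status=$?
echo "wait status after KILL: $status"
```

      In most Bourne-style shells the wait status for a child ended by a signal is greater than 128.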

      The default action for the job control process control signals is suspending process execution, except for the CONT signal which defaults to resuming process execution. Once again, a process may choose to ignore most of these signals. The CONT signal cannot be ignored, so you can always continue a suspended process. The STOP signal will always suspend a process because it cannot be ignored.

      Except for KILL and STOP, a process may catch a signal. This means that it can accept the signal and do something other than the default action. For example, a process may choose to catch a TERM signal, do some special processing, and finally either terminate or continue as it wishes. Catching a signal allows the process to decide which action to take. If the process does not catch a signal and is not ignoring the signal, the default action results.
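
      In a shell script, catching a signal is also done with trap. The sketch below is one possible illustration, not the only way to structure a handler: a background child catches TERM, records a message in a hypothetical temporary file, and then exits on its own terms with status 0.

```shell
out=${TMPDIR:-/tmp}/trap_demo.$$         # hypothetical scratch file

(
    # Catch TERM: do some cleanup work, then exit normally.
    trap 'echo "caught TERM, cleaning up" > "$out"; exit 0' TERM
    while :; do sleep 1; done
) &
child=$!
sleep 1                                  # let the child install its handler

kill -TERM "$child"
wait "$child"
status=$?                                # 0, because the handler chose to exit 0
message=$(cat "$out")
rm -f "$out"
echo "child said: $message (exit status $status)"
```

      Had the child not caught TERM, the default action would have terminated it abnormally and the wait status would have been nonzero.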

      Killing Processes

      At some time or other, you will run a command and subsequently find out that you need to terminate it. You may have entered the wrong command, you may have entered the right command but at the wrong time, or you may be stuck in a program and can't figure out how to exit.

      If you want to terminate your foreground process, the quickest thing to try is your interrupt control character. This is usually set to Ctrl+C, but make sure by looking at your stty -a output. The interrupt control character sends an INT signal to the process. It is possible for a program to ignore the INT signal, so this does not always terminate the process. A second alternative is to use your quit character (often Ctrl+\, set using stty quit char), which will send a QUIT signal. A process can ignore the QUIT signal as well. If your shell supports job control (for example, the C or Korn shell), you can suspend the process and then use the kill command. Once again, your process can ignore the suspend request. If you don't have job control or if none of these attempts works, you need to find another window, terminal, or screen where you can access your computer. From this other shell you can use the ps command along with the kill command to terminate the process. To terminate a process that is executing in the background, you can use the shell that is in the foreground on your terminal.

      The kill Command

      The kill command is not as nasty as it sounds. It is the way that you can send a signal to an executing process (see the section "Signaling Processes"). A common use of the kill command is to terminate a process, but it can also be used to suspend or continue a process.

      To send a signal to a process, you must either be the owner of the process (that is, it was started via one of your shells) or you must be logged in as root.

      See the section "Job Control and Process Groups" for information on how to use special features of the kill command for managing jobs.

      Finding What to Kill Using ps

      To send a signal to a process via the kill command, you need to somehow identify the particular process. Two commands can help you with this: the ps command and the jobs command. All UNIX systems support some version of the ps command, but the jobs command is found in job control shells only. (See the section "Job Control and Process Groups" for details on job control and the jobs command.)

      The ps command shows system process information for your computer. The processes listed can be owned by you or other users, depending on the options you specify on the ps command. Normally, if you want to terminate a process, you are the owner. It is possible for the superuser (root) to terminate any processes, but non-root users may only terminate their own processes. This helps secure a system from mistakes as well as from abuse.

      Terminating a process can be a three-step process: first you should check the list of processes with ps. See the section "Monitoring Processes" if you're not sure how to do this. The output of ps should contain the process identifier of each process. Make sure you look for the PID column and not the PPID column. The PPID is the process ID for the parent process. Terminating the parent process could cause many other processes to terminate as well.

      Second, you can send a signal to the process via the kill command. The kill command takes the PID as one argument; this identifies which process you want to terminate. The kill command also takes an optional argument, which is the signal you wish to send. The default signal (if you do not specify one) is the TERM signal. There are several signals that all attempt to terminate a process. Whichever one you choose, you may specify it by its name (for example, TERM) or by a number. The name is preferable because the signal names are standardized. The numbers may vary from system to system. To terminate a process with PID 2345, you might try kill -HUP 2345. This sends the HUP signal to process 2345.

      Third, you should check the process list to see if the process terminated. Remember that processes can ignore most signals. If you specified a signal that the process ignored, the process will continue to execute. If this happens, try again with a different signal.
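
      The three steps can be sketched in a script as follows. This is an illustration under a few assumptions: sleep 30 stands in for the process you want to stop, kill -0 (which tests for existence without sending a signal) stands in for reading ps output, and wait is usable only because the target is a child of this shell.

```shell
sleep 30 &                       # stand-in for a process you want to stop
pid=$!

# Step 1: confirm the process exists.
kill -0 "$pid" && echo "process $pid is running"

# Step 2: send a signal by name.
kill -TERM "$pid"
wait "$pid"                      # collect the exit status of our own child
status=$?

# Step 3: check again.
if kill -0 "$pid" 2>/dev/null; then
    gone=no
else
    gone=yes
fi
echo "terminated: $gone (wait status $status)"
```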


      TIP: If you have a CPU-intensive job running in the background and you want to get some work done without killing the job, try using kill -STOP PID. This will force the job to be suspended, freeing up CPU time for your more immediate tasks. When you are ready for the job to run again, try kill -CONT PID.
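
      The tip above can be tried safely with a harmless job. In this sketch, sleep 3 is only a stand-in for a CPU-intensive program, and the variable names are illustrative.

```shell
sleep 3 &                        # stand-in for a CPU-intensive background job
pid=$!

kill -STOP "$pid"                # suspend the job
echo "job $pid suspended"
sleep 1                          # do your more urgent work here
kill -CONT "$pid"                # let the job run again
echo "job $pid resumed"

wait "$pid"
status=$?
echo "job finished with status $status"
```

      Because STOP and CONT do not terminate the process, the job still completes normally once it is resumed.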

      Determining Which Signal to Send

      The sure way to make a process terminate is to send it the KILL signal. So why not just send this signal and be done with it? Well, the KILL signal is important as a last resort, but it is not a very clean way to cause process termination. A process cannot ignore or catch the KILL signal, so it has no chance to terminate gracefully. If a process is allowed to catch the incoming signal, it has an opportunity to do some cleaning up or other processing prior to termination.

      Try starting with the TERM signal. If your interrupt control character did not work, the INT signal probably won't either, but it is probably a reasonable thing to try next anyway. A common signal that many processes catch and then cleanly terminate is the HUP signal, so trying HUP next is a good idea. If you would like a core image of the process (for use with a debugging tool), the QUIT signal causes this to happen. If your process isn't exiting at this point, it might be nice to have the core image for the application developer to do debugging. If none of these signals caused the process to terminate, you can fall back on the KILL signal; the process cannot catch or ignore this signal.


      NOTE: If your process is hung up waiting for certain events (such as a network file server that is not responding), not even kill will have any visible effect immediately. As long as your process isn't using CPU time, you can probably stop worrying about it. The hung process will abort if the event ever occurs (for example, the file server responds or the request times out), but it might not go away until the next time you reboot.

      If you need a list of the available signals, the -l option to the kill command will display this list. You can also check the kill and signal man pages for descriptions of each signal. The signals described in this section are the standard signals, but some systems may have additional supported signals. Always check the manual for your system to be sure.
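
      Besides listing names, the POSIX form of kill -l accepts a signal number, or the exit status of a process that was ended by a signal, and prints the matching name. The following sketch assumes your shell's kill follows that behavior; sleep 30 is again just a stand-in process.

```shell
kill -l                          # list the signal names this kill knows

sleep 30 &
pid=$!
kill -TERM "$pid"
wait "$pid"
status=$?                        # reflects the signal that ended the child

# Translate the wait status back into a signal name.
signame=$(kill -l "$status")
echo "wait status $status corresponds to signal $signame"
```

      This is a convenient way to avoid relying on signal numbers, which may vary from system to system.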

      The dokill Script: An Example

      Look at the dokill script as an example of how to kill a process reasonably and reliably:

      #!/bin/sh
      # TERM, HUP and INT could possibly come in a different order
      # TERM is first because it is what kill does by default
      # INT is next since it is a typical way to let users quit a program
      # HUP is next since many programs will make a recovery file
      # QUIT is next since it can be caught and often generates a core dump
      # KILL is the last resort since it can't be caught, blocked or ignored
      for sig in TERM INT HUP QUIT KILL
      do
              dosleep=0
              # "$@" keeps each PID argument as a separate word
              for pid in "$@"
              do
                      # kill -0 checks if the process still exists; the error
                      # message for a process that is already gone is discarded
                      if kill -0 "$pid" 2>/dev/null
                      then
                              # Attempt to kill the process using the current signal
                              kill -"$sig" "$pid"
                              dosleep=1
                      fi
              done
              # Here we sleep if we tried to kill anything.
              # This gives the process(es) a chance to gracefully exit
              # before dokill escalates to the next signal
              if [ "$dosleep" -eq 1 ]
              then
                      sleep 1
              fi
      done

      This script uses the list of signals suggested in the section "Determining Which Signal to Send." For each signal in the suggested list, dokill sends the signal to any processes remaining in its list of processes to kill. After sending a signal, dokill sleeps for one second to give the other processes a chance to catch the signal and shut down cleanly. The last signal in the list is KILL, which will shut down any process that is not blocked waiting for a high-priority kernel event. If kill -KILL does not shut down your process, you may have a kernel problem. Check your system documentation and the WCHAN field of ps to find out which event blocked the process.

      Logging Out with Background Processes

      After you start executing processes in the background, you may forget or lose track of what processes you have running. You can always check on your processes by using the ps command (see the section "Monitoring Processes"). Occasionally, you will try to exit from your shell when you have processes running in the background. By default, UNIX tries to terminate any background or stopped jobs you have when you log out. UNIX does this by sending a HUP signal to all of your child processes.


      NOTE: As a safeguard, job control shells (such as csh and ksh) issue a warning instead of allowing you to log out. The message will be similar to "You have stopped (running) jobs." If you immediately enter exit again, the shell will allow you to log out without warning. But, beware! The background processes are terminated immediately. If you don't want these background processes to be terminated, you must wait until they have completed before exiting. There is no way to log out while keeping the processes alive unless you plan ahead.

      Using nohup

      Some of the commands you use may take so long to complete that you may not be able to (or want to) stay logged in until they finish. To keep such a command running after you log out, you can use the nohup command. The word nohup simply precedes your normal command on the command line. Using nohup runs the command so that it ignores certain signals, which allows you to log out and leave the process running. As you log out, all your existing processes (those processes with your terminal as the controlling terminal) are sent the HUP signal. Since a process started with nohup ignores this signal, you can log out and the process will not terminate. If you have a nohup process in the background as you attempt to log out, your shell may warn you on your first exit command and require an immediate second exit in order to actually log out. (If yours is a shell that does job control, such as ksh or csh, see the section "Job Control and Process Groups.")
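
The effect of nohup can be checked without actually logging out by sending the HUP signal by hand. The sketch below is only an illustration: sleep 30 stands in for a long-running command, the log file name is invented, and the script assumes the external nohup program rather than a shell built-in. It relies on the fact that kill -l, given an exit status, prints the name of the signal that ended the process.

```sh
#!/bin/sh
# sleep 30 stands in for a long-running command.
log=${TMPDIR:-/tmp}/nohup_demo.$$
nohup sleep 30 > "$log" 2>&1 &
pid=$!

sleep 1                 # give nohup time to start the command
kill -HUP "$pid"        # simulate the hangup sent at logout
sleep 1
kill -TERM "$pid"       # now end the job with a signal it does not ignore
wait "$pid"
status=$?

# kill -l with an exit status prints the name of the terminating signal.
killed_by=$(kill -l "$status")
echo "job was ended by: $killed_by"
rm -f "$log"
```

If the job reports that it was ended by TERM, the earlier HUP was ignored. Without nohup on the command line, the same script would report HUP instead.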


      NOTE: There are several varieties of the nohup command. The SYSV nohup executable arranges for the command to ignore HUP and QUIT signals but does nothing regarding the TERM signal. If the output is going to standard out, it is redirected to the file nohup.out (or alternately to $HOME/nohup.out if you can't write to the first).

      The C shell has a built-in nohup command. It arranges for the command to ignore TERM signals. (In C shell, background commands automatically ignore the HUP signal.) It does not redirect output to the file nohup.out.

      Your system or shell may have a slight variation on the exact signals ignored and whether the nice value is changed when you use nohup.

      Prioritizing Processes

      Part of administering your processes is controlling how much CPU time they use and how important each process is relative to the others. UNIX supplies some fairly simple ways to monitor and control CPU usage of your process. This section describes how to use UNIX nice values to control your process CPU usage. By setting nice values for large jobs that aren't time critical, you can make your system more usable for other jobs that need to be done now.

      What Is a Priority?

      The UNIX kernel manages the scheduling of all processes on the system in an attempt to share the limited CPU resource fairly. Because UNIX has grown as a general purpose time-sharing system, the mechanism the scheduler uses tries to favor interactive processes over long-running, CPU-intensive processes so that users perceive good system response. UNIX always schedules the process that is ready to run (not waiting for I/O or an event) with the lowest numerical priority (that is, lower numbers are more important). If two processes with the same priority are ready, the scheduler will schedule the process that has been waiting the longest. If your process is CPU intensive, the kernel will automatically change your process priority based on how much CPU time your process is using. This gives preference to interactive applications that don't use lots of CPU time.


      NOTE: Low PRI means high priority. You may find it a bit confusing that lower numbers for priority mean "higher" priority. Try thinking of the scheduler starting out at priority zero and seeing if any processes at that priority are ready. If not, the scheduler tries priority 1, and so on.

      To see how the UNIX scheduler works, look at the example in Table 19.1. In this example, three processes are each running long computations, and no other processes are trying to run. Each of the three processes will execute for a time slice and then let one of the other processes execute. Note that each process gets an equal share of the CPU. If you run an interactive process, such as a ps, while these three processes are running, you will get priority to run.

        Table 19.1. Scheduling three CPU-intensive processes.

      Process 1    Process 2    Process 3
      ---------    ---------    ---------
      Running      Waiting      Waiting
      Waiting      Running      Waiting
      Waiting      Waiting      Running
      Running      Waiting      Waiting
      Waiting      Running      Waiting
      Waiting      Waiting      Running

      Being Nice

      One of the factors the kernel uses in determining a process priority is the nice value, a user-controlled value that indicates how "nice" you want a process to be to other processes. Traditionally, nice values range from 0 to 39 and default to 20. Only root can lower a nice value. All other users can only make processes more nice than they were.
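
The nice command sets the nice value of a command when it is started. The sketch below shows the effect, under one loud assumption: that your nice prints the current nice value when it is run with no arguments. The GNU version does this, but other versions may insist on a command. Many current systems also report nice values on a scale of -20 to 19 with a default of 0, rather than the traditional 0 to 39 described here, so the numbers you see may differ.

```sh
#!/bin/sh
# Assumption: this system's nice prints the current nice value when run
# with no arguments (GNU coreutils does); others may require a command.
base=$(nice)

# Ask the same question from a process started with its nice value
# raised by 5.
nicer=$(nice -n 5 nice)

echo "current nice value: $base"
echo "after nice -n 5:    $nicer"
```

In everyday use, you would put the command to be slowed down in place of the second nice, for example nice -n 5 longjob &, where longjob is a placeholder for your own program.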

      To see how the UNIX scheduler works with nice, look at the example in Table 19.2. In this example, three processes are each running long computations and no other processes are trying to run. This time, Process 3 was run with a nice value of 30. Each of the three processes will execute for a time slice and then let one of the other processes execute. However, in this case, Process 3 gets a smaller share of the CPU because the kernel uses the nice value in calculating the priority. Once again, if you run an interactive process, like a ps, while these three processes are running, you will get priority to run.

        Table 19.2. Scheduling three CPU-intensive processes, one nicely.

      Process 1    Process 2    Process 3, Nice Process
      ---------    ---------    -----------------------
      Running      Waiting      Waiting
      Waiting      Running      Waiting
      Waiting      Waiting      Running
      Running      Waiting      Waiting
      Waiting      Running      Waiting
      Running      Waiting      Waiting
      Waiting      Waiting      Running
      Waiting      Running      Waiting
      Running      Waiting      Waiting
      Waiting      Running      Waiting

      Using renice on a Running Process

      BSD introduced the ability to change the nice value of other processes that are owned by you. The renice command gives you access to this capability. If you run a job and then decide it should be running with lower priority, you can use renice to do that.


      CAUTION: Not all systems have the renice command. Most systems based on BSD have it. Some systems that are not based on BSD have added renice. The renice command on your system may take slightly different arguments than in the examples here. Check your system documentation to see if you have renice and what arguments it takes.

      On BSD-based systems, the renice command takes arguments in this manner:

      renice priority [ [-p] pid ... ] [ -g pgrp ... ] [ -u userid ... ]

      The priority is the new nice value desired for the processes to be changed. The -p option (the default) takes a list of process identifiers; you should get these from ps or by saving the PID of each background task you start. The -g option takes a list of process groups; if you are using a shell that does job control, you should get these from the PID of each background task you start or by using ps and taking the PID of the process whose PPID is your shell's PID. The -u option takes a list of user IDs; unless you have appropriate privileges (usually only if you are root), you will be able to change only your own processes. If you want to make all of your current processes nicer, you can use renice priority -u yourusername. Remember that this will affect your login shell! This means that any command you start after renice will have lower priority.

      Here is an example of using renice on a single process. You start a long job (called longjob) and then realize you have an important job (called impjob) to run. After you start impjob, you can do a ps to see that longjob is PID 27662. Then you run renice 20 27662 to make longjob have a lower priority. If you immediately run ps l (try ps -l on a SYSV system that has renice), you will see that longjob has a higher nice value (see the NI column). If you wait a bit and do another ps l, you should notice that impjob is getting more CPU time (see the TIME column).

      $ longjob &
      
      27662
      
      $ impjob &
      
      28687
      
      $ ps l
      
           F S UID   PID  PPID   C PRI NI ADDR  SZ  RSS   WCHAN    TTY  TIME CMD
      
      240801 S 343 24076 29195   0  60 20 4231  88  268          pts/4  0:00 -sh 
      
      240001 R 343 26398 24076   4  62 20 4e52 108  204          pts/4  0:00 ps l 
      
      241001 R 343 27662 24076  52  86 20 49d0  32   40          pts/4  0:03 longjob 
      
      241001 R 343 28687 24076  52  86 20 256b  32   40          pts/4  0:00 impjob 
      
      $ renice 20 27662
      
      27662: old priority 0, new priority 20
      
      $ ps l
      
           F S UID   PID  PPID   C PRI NI ADDR  SZ  RSS   WCHAN    TTY  TIME CMD
      
      240001 R 343 18017 24076   3  61 20 60b8 108  204          pts/4  0:00 ps l 
      
      240801 S 343 24076 29195   0  60 20 4231  88  268          pts/4  0:00 -sh 
      
      241001 R 343 27662 24076  32  96 40 49d0  32   40          pts/4  0:09 longjob 
      
      241001 R 343 28687 24076  52  86 20 256b  32   40          pts/4  0:07 impjob 
      
      $ # Wait a bit
      
      $ ps l
      
           F S UID   PID  PPID   C PRI NI ADDR  SZ  RSS   WCHAN    TTY  TIME CMD
      
      240801 S 343 24076 29195   0  60 20 4231  88  268          pts/4  0:00 -sh 
      
      241001 R 343 27662 24076  74 117 40 49d0  32   40          pts/4  0:31 longjob 
      
      241001 R 343 28687 24076 115 117 20 256b  32   40          pts/4  0:41 impjob 
      
      240001 R 343 29821 24076   4  62 20 4ff2 108  204          pts/4  0:00 ps l 
      
      $ 

      Some jobs you run may start multiple processes, but renice -p will affect only one of them. One way to get around this is to use ps to find all of the processes and list each one to renice -p. If you are using a job control shell (for example, Korn shell or C shell), you may be able to use renice -g. In the following example, longjob spawns several sub-processes to help do more work (see the output of the first ps l). Notice that if you use renice -p you affect only the parent process's nice value (see the output of the second ps l). If you are using a shell that does job control, your background process should have been put in its own process group with a process group ID the same as its process ID. Try renice 20 -g PID and see if it works. Notice in the output of the third ps l that all of the children of longjob have had their nice values changed.

      $ longjob &
      
      [1]     27823
      
      $ ps l
      
           F S UID   PID  PPID   C PRI NI ADDR  SZ  RSS   WCHAN    TTY  TIME CMD
      
        1001 R 343 21938 27823  27  77 24 328e  56   20          pts/5  0:01 longjob 
      
        1001 R 343 26545 27823  26  77 24 601a  48   20          pts/5  0:01 longjob 
      
      201001 R 343 27823 27973  26  77 24 1647  56   20          pts/5  0:01 longjob 
      
      200801 S 343 27973 24078   0  60 20 6838 104  384          pts/5  0:00 -ksh 
      
        1001 R 343 28336 27823  26  77 24 7f1e  40   20          pts/5  0:01 longjob 
      
      200001 R 343 29877 27973   4  62 20 4ff2 108  204          pts/5  0:00 ps l 
      
      $ renice 20 -p 27823
      
      27823: old priority 4, new priority 20
      
      $ ps l
      
           F S UID   PID  PPID   C PRI NI ADDR  SZ  RSS   WCHAN    TTY  TIME CMD
      
        1001 R 343 21938 27823  24  76 24 328e  56   20          pts/5  0:04 longjob 
      
        1001 R 343 26545 27823  24  76 24 601a  48   20          pts/5  0:04 longjob 
      
      201001 R 343 27823 27973  11  85 40 1647  56   20          pts/5  0:04 longjob 
      
      200801 S 343 27973 24078   0  60 20 6838 104  384          pts/5  0:00 -ksh 
      
        1001 R 343 28336 27823  24  76 24 7f1e  40   20          pts/5  0:04 longjob 
      
      200001 R 343 29699 27973   4  62 20 4ff2 108  204          pts/5  0:00 ps l 
      
      $ renice 20 -g 27823
      
      27823: old priority 4, new priority 20
      
      $ ps l
      
           F S UID   PID  PPID   C PRI NI ADDR  SZ  RSS   WCHAN    TTY  TIME CMD
      
        1001 R 343 21938 27823  39  99 40 328e  56   20          pts/5  0:06 longjob 
      
        1001 R 343 26545 27823  38  99 40 601a  48   20          pts/5  0:06 longjob 
      
      201001 R 343 27823 27973  38  99 40 1647  56   20          pts/5  0:05 longjob 
      
      200801 S 343 27973 24078   0  60 20 6838 104  384          pts/5  0:00 -ksh 
      
        1001 R 343 28336 27823  38  99 40 7f1e  40   20          pts/5  0:06 longjob 
      
      200001 R 343 29719 27973   4  62 20 705d 108  204          pts/5  0:00 ps l 
      
      $ 

      Job Control and Process Groups

      Job control is a BSD UNIX addition that is used by some shells. Both C shell and Korn shell support job control. In order to support job control, these shells use the concept of process groups. Each time you enter a command or pipeline from the command line, your shell creates a process group. The process group is simply the collection of all the processes that are executed as a result of that command. For simple commands, this could be a single process. For pipelines, the process group could contain many processes. Either way, the shell keeps track of the processes as one unit by identifying a process group ID. This ID will be the PID of one of the processes in the group.

      If you run a process group in the background or suspend its execution, it is referred to as a job. A small integer value, the job number, is associated with this process group. The shell prints out a message with these two identifiers at the time when you perform the background operation. A process group and a job are almost the same thing. The one distinction you might care about is that every command line results in a process group (and therefore a process group identifier); a job identifier is assigned only when a process group is suspended or put into the background.

      Given process groups and job IDs, the shells have added new commands that operate on the job (or process group) as a whole. Further, existing commands (such as kill) are modified to take advantage of this concept. The two shells (C shell and Korn shell) have very minor differences from one another, but for the most part the job control commands in each are the same.

      Using the jobs Command

      The jobs command will show you the list of all of your shell's jobs that are either suspended or executing in the background. The list of jobs will look similar to this:

      [1]   Stopped              vi mydoc.txt
      
      [2] - Running              fin_analysis < fin_database > fin_report &
      
      [3] + Stopped (tty output) summarize_log &

      Each line corresponds to a single process group, and the integer at the start is its job number. You can use the job number as an argument to the kill command by prefixing the job number with a percent (%) sign. To send a signal to the process vi mydoc.txt, you could enter kill %1. Since you did not specify the signal you wanted to send to the process, the default signal, TERM, is sent. This notation is just a convenience for you since you can do the same thing via kill and the PID. The real power of job control comes with the ability to manipulate jobs between the foreground and the background.

      The shell also keeps the concept of current and previous jobs. On the output of the jobs command you will notice a + next to the current job and a - next to the previous job. If you have more than two jobs, the remaining jobs have no particular distinction. Again, this notation is mainly a convenience for you. In some job control commands, if you do not specify a job (or PID) number, the current job is taken by default. Keep in mind that your current job is different from your foreground process group. A job is either suspended or in the background.

      The following are various ways to reference a job:

      %n          The job whose job number, as reported by jobs, is n
      %+          Your current job
      %%          Your current job
      %-          Your previous job
      %string     The job whose command line begins with string
      %?string    The job whose command line contains string

      Putting Jobs in the Foreground

      After executing a process group in the background, you may decide for some reason that you would like it to execute in the foreground. With non-job control shells, after executing a command line in the background (via the & symbol), it stays in the background until it completes or is terminated (for example, if you send a terminate signal to it via kill). With a job control shell, the fg command will move the specified job into the foreground. The fg command will take either a job number preceded by a percent (%) sign or a PID as an argument. If neither is given, the current job is taken as the default.

      The result of the fg command is that the specified job executes as your foreground process. Remember that you can have only one foreground process at a time. To move the vi mydoc.txt job into the foreground, you could enter fg %1.

      Suspending Process Groups

      To suspend an executing process group, you need to send a suspend signal to the process. There are two ways to do this: (1) use the suspend control character on your foreground process, or (2) send a suspend signal via the kill command.

      The suspend control character, commonly Ctrl+Z, is used to send a suspend signal to the foreground process. Your shell may be configured with a different suspend control character, so be sure to find out your own configuration by running the stty -a command. (Refer to the section "Working on the System" for information on control characters.) After you have executed a command in the foreground, you simply press Ctrl+Z (or whatever your suspend control character is) to suspend the running process. The result is that the process is suspended from execution. When this happens, your shell prints a message giving the job number and process group ID for that job. You can subsequently use the fg or bg commands to manipulate this process.
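
The second method, sending the suspend signal with kill, also works from a script, where there is no keyboard to press Ctrl+Z on. The sketch below is one way to see the effect. It uses the STOP signal to suspend a stand-in job and the standard CONT signal to resume it. The job itself (a loop that appends a line to a file every second) and the file name are invented for the illustration.

```sh
#!/bin/sh
# A stand-in job that appends one line per second to a file.
ticks=${TMPDIR:-/tmp}/ticks.$$
: > "$ticks"
(
        while :
        do
                echo tick >> "$ticks"
                sleep 1
        done
) &
pid=$!

count() {
        # $(( ... )) strips any padding that wc puts around the number
        echo $(($(wc -l < "$ticks")))
}

sleep 1
kill -STOP "$pid"       # suspend the job
before=$(count)
sleep 2
during=$(count)         # unchanged: a stopped job makes no progress
kill -CONT "$pid"       # resume it
sleep 2
after=$(count)          # larger: the job is running again

kill -TERM "$pid"
wait "$pid" 2>/dev/null
rm -f "$ticks"
echo "before=$before during=$during after=$after"
```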

      Putting Jobs in the Background

      The bg command puts the specified jobs into the background and resumes their execution. The common way to use this command is after suspending a job with the suspend control character. After a job is put in the background, it will continue executing until it completes (or attempts input or output from the terminal). You can then manipulate it via fg or kill.

      An example may help you see the power of these commands when used together:

      $ long_job
      
      ^Z[1] + Stopped                  long_job
      
      $ important_job 1
      
      $ jobs
      
      [1] + Stopped                  long_job
      
      $ bg
      
      [1]     long_job&
      
      $ important_job 2
      
      $ kill -STOP %1
      
      [1] + Stopped (signal)         long_job
      
      $ important_job 3
      
      $ fg %1
      
      long_job

      If you don't have a long_job, try using sleep 100. If you don't have an important_job, try using echo. This example shows how you can use job control to move jobs between the foreground and background and suspend, then later resume, jobs that might be taking computer resources that you need.

      Using wait to Wait for Background Jobs

      The wait command built into most shells (including all the shells discussed in this guide) will wait for completion of all background processes or a specific background process. Usually, wait is used in scripts, but occasionally you may want to use it interactively to wait for a particularly important background job or to pause until all of your current background jobs complete so you will not load the system with your next job. The command wait will wait for all background jobs. The command wait pid will wait for a particular PID. If you are using a job control shell, you can use a job identifier instead of a PID:

      $ job1 &
      
      [1]     20233
      
      $ job2 &
      
      [2]     20234
      
      $ job3 &
      
      [3]     20235
      
      $ job4 &
      
      [4]     20237
      
      $ wait %1
      
      $ wait 20234
      
      $ wait
      
      [4] +  Done                    job4 &
      
      [3] +  Done                    job3 &
      
      $ jobs
      
      $
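
In a script, wait is most useful when you also want to know whether each background job succeeded, because wait pid returns the exit status of that job. The following is a minimal sketch. The two subshells are stand-ins for real jobs, and the second is made to fail with status 3 purely for illustration.

```sh
#!/bin/sh
# Two stand-in background jobs; the second one fails with status 3.
( sleep 1; exit 0 ) &
pid1=$!
( sleep 1; exit 3 ) &
pid2=$!

# wait PID returns the exit status of that particular job.
wait "$pid1"
status1=$?
wait "$pid2"
status2=$?

echo "job 1 exited with $status1, job 2 exited with $status2"
```

Saving $! right after each job starts is what lets the script wait for the jobs individually instead of for all of them at once.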

      Using csh notify to Learn About Changes Sooner

      Most interactive use of wait in csh can be replaced by notify. The notify command tells csh not to wait until issuing a new prompt before telling you about the completion of all or some background jobs. The command notify will tell csh to give asynchronous notification of job completion. The command notify jobid will tell csh to give asynchronous notification for a particular job. For example:

      % sleep 30 &
      
      [1] 20237
      
      % sleep 10 &
      
      [2] 20238
      
      % notify %2
      
      % 
      
      [2]   Done                 sleep 10
      
      jobs
      
      [1]  +Running              sleep 30
      
      %

      When you do this example, don't type anything after hitting return to enter notify %2. The notification appears as soon as job 2 finishes.

      My System Is Slow—Performance Tuning

      UNIX offers several tools that can be useful in finding performance problem areas. This section covers using ps and sar to look for processes which are causing problems and system bottlenecks which need to be resolved. Your system may have more performance analysis tools; check your system documentation.

      Monitoring the System with ps

      If your system is having performance problems, you may want to terminate or suspend some of the large or CPU-intensive processes to let your system run more effectively. You can use ps to locate some of these processes.


      NOTE: Many UNIX systems have or can run a program called top, which displays the current heavy users of system CPU resources.

      On a SYSV system, you can use ps -fe or ps -le to look at all processes and examine the list to look for those processes which are using lots of CPU or memory. Try running ps twice in a row to look for processes with rapidly increasing TIME:

      $ ps -le
      
       F S   UID   PID  PPID  C PRI NI     ADDR     SZ    WCHAN TTY      TIME COMD
      
      19 T     0     0     0 80   0 SY f808c4bc      0          ?        0:20 sched
      
       8 S     0     1     0241   1 20 fc1c2000     43 fc1c21c4 ?        0:02 init
      
      19 S     0     2     0  1   0 SY fc13c800      0 f80897a0 ?        0:00 pageout
      
      19 S     0     3     0 80   0 SY fc13c000      0 f8089e4e ?        0:06 fsflush
      
       8 S     0   204   120 35   1 20 fc311000    265 f808fb60 ?        0:00 in.rlogi
      
       8 S     0   179     1 29   1 20 fc3b2800    196 fc16554e ?        0:00 sac
      
       8 S     0   136     1 29   1 20 fc36d000    353 f808fb60 ?        0:00 automoun
      
       8 S     0   103     1 80   1 20 fc32e800    326 f808fb60 ?        0:01 rpcbind
      
       8 S     0   109     1 52   1 20 fc333800    294 f808fb60 ?        0:01 ypbind
      
       8 S     0   120     1154   1 20 fc349800    289 f808fb60 ?        0:01 inetd
      
       8 S     0   111     1 20   1 20 fc34b800    294 f808fb60 ?        0:00 kerbd
      
       8 S     0   105     1  3   1 20 fc335800    223 f808fb60 ?        0:00 keyserv
      
       8 S     0   123     1 80   1 20 fc348000    332 f808fb60 ?        0:19 statd
      
       8 S     0   125     1 65   1 20 fc353800    395 f808fb60 ?        0:01 lockd
      
       8 S     0   159   151 15   1 20 fc39d000    239 f808fb60 ?        0:00 lpNet
      
       8 S   343   151     1 61   1 20 fc399000    891 f808fb60 ?        0:00 bigproc
      
       8 S     0   143     1 18   1 20 fc30c000    259 fc308b4e ?        0:00 cron
      
       8 S     0   160     1 17   1 20 fc3a0800    329 fc22de4e ?        0:00 sendmail
      
       8 O   343   210   206  9   1 20 fc314000    114          pts/0    0:00 ps
      
       8 S     0   167     1 80   1 20 fc3b4800    310 f808fb60 ?        0:12 syslogd
      
       8 S     0   181     1 29   1 20 fc3b8800    213 f808fb60 console  0:00 ttymon
      
       8 S   343   206   204 80   1 20 fc30e800    125 fc314070 pts/0    0:00 sh
      
       8 S   343   208   204 80   1 20 fc30e800    212          pts/0    0:46 busyproc
      
       8 S     0   184   179 44   1 20 fc3b6800    208 f808fb60 ?        0:00 listen
      
       8 S     0   185   179 38   1 20 fc3b3000    221 fc3b31c4 ?        0:00 ttymon
      
      $ 

      Note that bigproc has a rather large value for SZ and that busyproc has a lot of TIME.
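
Comparing two listings by eye gets tedious on a busy system, so the comparison can be scripted. The sketch below is one possible approach, not a standard tool: the function cpu_growth and the snapshot file names are invented here. It expects each snapshot to hold lines of the form "PID TIME COMMAND". The first snapshot reuses three processes from the listing above, and the second contains hypothetical later values.

```sh
#!/bin/sh
# cpu_growth OLD NEW compares two snapshots, each with lines of the form
# "PID TIME COMMAND", and prints "PID SECONDS COMMAND" for every process
# whose TIME increased between them.
cpu_growth() {
        awk '
                # Convert MM:SS or HH:MM:SS to seconds.
                function secs(t,    parts, n, i, s) {
                        n = split(t, parts, ":")
                        s = 0
                        for (i = 1; i <= n; i++)
                                s = s * 60 + parts[i]
                        return s
                }
                NR == FNR { old[$1] = secs($2); next }
                ($1 in old) && secs($2) > old[$1] {
                        print $1, secs($2) - old[$1], $3
                }
        ' "$1" "$2"
}

old=${TMPDIR:-/tmp}/ps_old.$$
new=${TMPDIR:-/tmp}/ps_new.$$

# First snapshot: PID, TIME and command taken from the listing above.
cat > "$old" <<'EOF'
151 0:00 bigproc
206 0:00 sh
208 0:46 busyproc
EOF

# Second snapshot: hypothetical values a few seconds later.
cat > "$new" <<'EOF'
151 0:00 bigproc
206 0:00 sh
208 0:58 busyproc
EOF

growth=$(cpu_growth "$old" "$new")
echo "$growth"
rm -f "$old" "$new"
```

On a system whose ps accepts the -o option, real snapshots could probably be produced with something like ps -e -o pid= -o time= -o comm=, but check your manual first, since the available options differ between SYSV and BSD versions of ps.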

      On a BSD system, you can use ps xau to look at all processes and examine the %CPU and %MEM fields for processes with high CPU and memory usage:

      % ps xau
      
      USER       PID %CPU %MEM   SZ  RSS TT STAT START  TIME COMMAND
      
      sartin    1014 88.7  0.9   32  192 p0 R    15:46   0:19 busyproc
      
      root         1  0.0  0.0   52    0 ?  IW   Mar 12  0:00 /sbin/init -
      
      root         2  0.0  0.0    0    0 ?  D    Mar 12  0:00 pagedaemon
      
      root        93  0.0  0.0  100    0 ?  IW   Mar 12  0:00 /usr/lib/sendmail -bd -q
      
      root        54  0.0  0.0   68    0 ?  IW   Mar 12  0:02 portmap
      
      root       300  0.0  0.0   48    0 ?  IW   Mar 12  0:00 rpc.rquotad
      
      root        59  0.0  0.0   40    0 ?  IW   Mar 12  0:00 keyserv
      
      sartin     980  0.0  1.5  268  336 p0 S    15:33   0:00 -sh (tcsh)
      
      root        74  0.0  0.0   16    0 ?  I    Mar 12  0:00  (biod)
      
      root        85  0.0  0.0   60    0 ?  IW   Mar 12  0:00 syslogd
      
      root       111  0.0  0.0   28    0 ?  I    Mar 12  0:00  (nfsd)
      
      root       117  0.0  0.1   16   28 ?  S    Mar 12 17:03 /usr/bin/screenblank
      
      root       127  0.0  0.0   12    8 ?  S    Mar 12 11:07 update
      
      root       130  0.0  0.0   56    0 ?  IW   Mar 12  0:00 cron
      
      root       122  0.0  3.3  740  748 ?  S    Mar 12  0:05 bigproc
      
      root       136  0.0  0.0   56    0 ?  IW   Mar 12  0:00 inetd
      
      sartin    1016  0.0  2.0  204  444 p0 R    15:46   0:00 ps xau
      
      root       140  0.0  0.0   52    0 ?  IW   Mar 12  0:00 /usr/lib/lpd
      
      root       834  0.0  0.2   44   44 ?  S    15:03   0:03 in.telnetd
      
      root       146  0.0  0.0   40    0 co IW   Mar 12  0:00 - std.9600 console (gett
      
      sartin     835  0.0  0.0   32    0 p0 IW   15:03   0:01 -ksh TERM=vt100 HOME=/ti
      
      root      1011  0.0  0.9   24  204 ?  S    15:45   0:00 in.comsat
      
      root         0  0.0  0.0    0    0 ?  D    Mar 12  0:01 swapper
      
      %

      Note that busyproc has 88.7 percent CPU usage and that bigproc has higher than average memory usage, but still only 3.3 percent.

      By using ps to examine the running processes, you can keep track of what is happening on your system and catch runaway processes or memory hogs.

      Monitoring the System with sar

      The sar command can be used to generate a System Activity Report covering things such as CPU usage, buffer activity, disk usage, TTY activity, system calls, swapping activity, file access calls, queue length, and system table and message/semaphore activity. If you run sar [-ubdycwaqvmA] [-o file] interval [num_samples], sar will print summaries a total of num_samples times every interval seconds and then stop. If num_samples is not supplied, sar will run until interrupted. With sar -o file the output will go in binary format to file and can be read using sar -f file. If you run sar [-ubdycwaqvmA] [-s time] [-e time] [-i sec] [-f file], the input will be read from a binary file (default is where the system command sa1 puts its output), with -s and -e giving the start and end times of the period to report.


      NOTE: The sar command is the user interface to the System Activity Report. Your system administrator can configure your system to do continual activity reporting (using sa1 and other commands). See your system documentation on sar for more information.


      CAUTION: If you are on a BSD system, you may not have sar. Try checking out the vmstat and iostat commands for some similar information.

      The command sar -u 5 5 will print CPU usage statistics:

      $ sar -u 5 5
      
      HP-UX cnidaria A.09.00 C 9000/837    03/14/94
      
      16:18:53    %usr    %sys    %wio   %idle
      
      16:18:58       0       0       0     100
      
      16:19:03      58      28       1      13
      
      16:19:08      84      16       0       0
      
      16:19:13      57      11      31       0
      
      16:19:18       0       6      94       0
      
      Average       40      12      25      23
      
      $
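
      Output like this is easy to filter with awk. As a rough sketch, the hypothetical function low_idle below prints every sample whose last field (%idle) falls below a limit you supply. It assumes the column layout shown above, with a timestamp first and %idle last; the sample output is replayed from a here-document so the example runs without sar.

```shell
# Hypothetical helper: print the timestamp of every sar -u sample whose
# %idle (last field) is below the limit given as the first argument.
low_idle() {
    awk -v limit="$1" '
        # Sample lines start with an HH:MM:SS timestamp and end in a number.
        # This skips the banner, the column headings, and the Average line.
        $1 ~ /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]$/ && $NF ~ /^[0-9.]+$/ && $NF + 0 < limit {
            print $1, "idle=" $NF
        }
    '
}

# On a live system (assuming sar is installed): sar -u 5 5 | low_idle 10
low_idle 10 <<'EOF'
HP-UX cnidaria A.09.00 C 9000/837    03/14/94
16:18:53    %usr    %sys    %wio   %idle
16:18:58       0       0       0     100
16:19:03      58      28       1      13
16:19:08      84      16       0       0
16:19:13      57      11      31       0
16:19:18       0       6      94       0
Average       40      12      25      23
EOF
```

      With a limit of 10, the three samples from 16:19:08 through 16:19:18 are reported, since each shows 0 percent idle.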

      The column headings %usr, %sys, %wio, and %idle report the percentage of time spent, respectively, running user processes, running in system mode, waiting for I/O, and idling (doing nothing). The command sar -b will print buffer activity:

      $ sar -b 5 5
      
      HP-UX cnidaria A.09.00 C 9000/837    03/14/94
      
      16:19:34 bread/s lread/s %rcache bwrit/s lwrit/s %wcache pread/s pwrit/s
      
      16:19:39       0       5      96       0       0       0       0       0
      
      16:19:44       2    2809     100     174    2081      92       0       0
      
      16:19:49       1    1456     100      83     950      91       0       0
      
      16:19:54       4    1598     100      71    1267      94       0       0
      
      16:19:59       3    1374     100      92    1055      91       0       0
      
      Average        2    1449     100      84    1071      92       0       0
      
      $
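
      The %rcache and %wcache figures can be checked by hand. The sketch below assumes the definition usually given in sar documentation, namely that the hit ratio is 1 minus the physical transfer rate divided by the logical access rate, expressed as a percentage. The function name hit_ratio is hypothetical.

```shell
# Hypothetical helper: hit ratio as a rounded percentage, given the
# physical transfer rate (bread/s or bwrit/s) and the logical access
# rate (lread/s or lwrit/s).
hit_ratio() {
    awk -v phys="$1" -v logical="$2" 'BEGIN {
        if (logical == 0) { print 0; exit }   # no accesses to measure
        printf "%.0f\n", (1 - phys / logical) * 100
    }'
}

hit_ratio 174 2081   # bwrit/s and lwrit/s from the 16:19:44 sample
hit_ratio 84 1071    # bwrit/s and lwrit/s from the Average line
```

      Both calls print 92, matching the %wcache column. Because sar prints rounded rates, recomputing from the displayed figures will not always agree exactly with the percentage sar reports, especially at low rates.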

      The bread/s and bwrit/s columns report transfers between the system buffers and disk (or other block) devices. The lread/s and lwrit/s columns report accesses of system buffers. The %rcache and %wcache columns report cache hit ratios. The UNIX kernel attempts to keep copies of buffers around in memory so that it can satisfy a disk read request without having to read the disk. For example, if one process writes block 5 of your disk and shortly after that another process writes different data to block 5, your system will save one write if it kept the first data cached rather than writing it to disk immediately. High cache hit ratios are good because they mean your system is able to avoid reading from or writing to the disk when it isn't necessary. The pread/s and pwrit/s columns report raw transfers. Raw transfers are transfers that don't use the file system at all. You will usually see raw transfers when using tar to read or write a tape or when using fsck to repair a file system. The command sar -d will print activity for each block device (disk or tape drive):

      $ sar -d 5 2
      
      HP-UX cnidaria A.09.00 C 9000/837    03/14/94
      
      16:41:16  device   %busy   avque   r+w/s  blks/s  avwait  avserv
      
      16:41:21 disc3-0        3     1.4       2      14     8.8    21.2
      
      16:41:26 disc3-0       70   105.8      55     867  1328.6    12.7
      
      Average  disc3-0       37   101.0      28     441  1291.5    12.9
      
      $
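
      A report like this can be scanned for overloaded devices. The following sketch uses a hypothetical function, busy_disks, that prints the samples where the third field (%busy) exceeds a threshold. It assumes every line carries a timestamp, as in the single-disk sample above; on a system with several disks the layout may differ.

```shell
# Hypothetical helper: list samples where a device was busier than the
# given percentage. Fields follow the layout shown above:
# time, device, %busy, avque, r+w/s, blks/s, avwait, avserv.
busy_disks() {
    awk -v limit="$1" '
        $1 ~ /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]$/ && $3 ~ /^[0-9.]+$/ && $3 + 0 > limit {
            print $1, $2, "busy=" $3 "%", "avwait=" $7 "ms"
        }
    '
}

# On a live system (assuming sar is installed): sar -d 5 2 | busy_disks 50
busy_disks 50 <<'EOF'
16:41:16  device   %busy   avque   r+w/s  blks/s  avwait  avserv
16:41:21 disc3-0        3     1.4       2      14     8.8    21.2
16:41:26 disc3-0       70   105.8      55     867  1328.6    12.7
Average  disc3-0       37   101.0      28     441  1291.5    12.9
EOF
```

      Only the 16:41:26 sample is printed, where disc3-0 was 70 percent busy.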

      The device column will report your system-specific disk name. The %busy and avque columns report the percentage of time the device was busy servicing requests and the average number of requests outstanding. The r+w/s and blks/s columns report the number of transfers per second and the number of 512-byte blocks transferred per second. The avwait and avserv columns report the average time in milliseconds that transfer requests wait in the queue and the average time for a request to be serviced. The command sar -y will report TTY activity:

      $ sar -y 10 4
      
      HP-UX cnidaria A.09.00 C 9000/837    03/14/94
      
      16:43:12 rawch/s canch/s outch/s rcvin/s xmtin/s mdmin/s
      
      16:43:22     424     420     458       0       0       0
      
      16:43:32     595     596    1469       0       0       0
      
      16:43:42     678     674    1542       0       0       0
      
      16:43:52     736     743     755       0       0       0
      
      Average      608     608    1056       0       0       0
      
      $ 

      The rawch/s, canch/s, and outch/s columns report the raw input character rate, the rate of input characters handled by canonical processing, and the output character rate. The rcvin/s, xmtin/s, and mdmin/s columns report the receive, transmit, and modem interrupt rates. The command sar -c will report system call activity:

      $ sar -c 5 5
      
      HP-UX cnidaria A.09.00 C 9000/837    03/14/94
      
      16:50:33 scall/s  sread/s  swrit/s   fork/s   exec/s  rchar/s  wchar/s
      
      16:50:38    1094       15     1016     0.60     0.60  16938189  1047142
      
      16:50:43     592        8      540     0.20     0.20  9033318   590234
      
      16:50:48     641        9      602     0.00     0.00  10007142   613376
      
      16:50:53     735       14      766     0.20     0.20  11245978   507494
      
      16:50:58     547       16      359     0.00     0.00  7215923   605594
      
      Average      722       12      657     0.20     0.20  10887960   672768
      
      $ 
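
      Dividing the character rates by the call rates gives a rough idea of how much data each read or write call moves. The sketch below does this with shell integer arithmetic; the function name bytes_per_call is hypothetical, and the results are approximate because sar prints rounded rates.

```shell
# Hypothetical helper: average bytes moved per call, using integer division.
bytes_per_call() {
    chars_per_sec=$1
    calls_per_sec=$2
    if [ "$calls_per_sec" -eq 0 ]; then
        echo 0            # avoid dividing by zero on an idle interval
    else
        echo $(( chars_per_sec / calls_per_sec ))
    fi
}

bytes_per_call 10887960 12    # rchar/s and sread/s from the Average line
bytes_per_call 672768 657     # wchar/s and swrit/s from the Average line
```

      For the sample above this prints 907330 and 1024, which suggests that the workload was issuing a few very large reads and many 1024-byte writes.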

      The scall/s column reports the total number of system calls per second. The sread/s, swrit/s, fork/s, and exec/s columns report the number of read, write, fork, and exec system calls per second. The rchar/s and wchar/s columns report the number of characters read and written per second by system calls. The command sar -w reports system-swapping activity:

      $ sar -w 5 5
      
      HP-UX cnidaria A.09.00 C 9000/837    03/14/94
      
      16:51:40 swpin/s bswin/s swpot/s bswot/s pswch/s
      
      16:51:45    0.00     0.0    0.00     0.0      24
      
      16:51:50    0.00     0.0    0.00     0.0      49
      
      16:51:55    0.00     0.0    0.00     0.0       5
      
      16:52:00    0.00     0.0    0.00     0.0      67
      
      16:52:05    0.00     0.0    0.00     0.0      42
      
      Average     0.00     0.0    0.00     0.0      37
      
      $ 

      The swpin/s, bswin/s, swpot/s, and bswot/s columns report the number of transfers and the number of 512-byte blocks per second for swapins and swapouts. The pswch/s column reports the number of process context switches per second. The command sar -a reports system file access activity:

      $ sar -a 5 5
      
      HP-UX cnidaria A.09.00 C 9000/837    03/14/94
      
      16:52:31  iget/s namei/s dirbk/s
      
      16:52:36       0       1       0
      
      16:52:41      65      79       4
      
      16:52:46     495     561      23
      
      16:52:51     487     572      30
      
      16:52:56     726     828      36
      
      Average      354     408      18
      
      $ 

      The columns report the number of calls per second to the file access system routine named in each heading. The command sar -q reports run queue activity:

      $ sar -q 5 5
      
      HP-UX cnidaria A.09.00 C 9000/837    03/14/94
      
      16:53:15 runq-sz %runocc swpq-sz %swpocc
      
      16:53:20     1.0      80                
      
      16:53:25     1.5      80                
      
      16:53:30     2.0     100                
      
      16:53:35     1.4     100                
      
      16:53:40     1.6     100                
      
      Average      1.5      92                
      
      $ 

      The runq-sz and %runocc columns report the average length of the run queue when occupied and the percentage of time it was occupied. The run queue is the list of processes that are ready to use the CPU (not waiting for I/O or other events). The swpq-sz and %swpocc columns report the average length of the swap queue when occupied and the percentage of time it was occupied. The swap queue is the list of processes that are ready to use the CPU, but are completely swapped out of memory and can't use the CPU until they are swapped into memory. These columns may not appear (or may be empty, as in the example above, or appear with 0 values) on systems without swapping. The command sar -v reports status of various system tables:

      $ sar -v
      
      HP-UX cnidaria A.09.00 C 9000/837    03/14/94
      
      13:12:54 text-sz  ov  proc-sz  ov  inod-sz  ov  file-sz  ov 
      
      13:13:02   N/A   N/A  48/276   0  114/356   0  121/600   0
      
      13:20:00   N/A   N/A  51/276   0  111/356   0  128/600   0
      
      13:40:00   N/A   N/A  51/276   0   95/356   0  128/600   0
      
      14:00:01   N/A   N/A  51/276   0  108/356   0  128/600   0
      
      14:20:01   N/A   N/A  51/276   0   94/356   0  128/600   0
      
      14:40:01   N/A   N/A  51/276   0   94/356   0  128/600   0
      
      15:00:01   N/A   N/A  48/276   0  106/356   0  124/600   0
      
      15:20:01   N/A   N/A  48/276   0   91/356   0  124/600   0
      
      15:40:01   N/A   N/A  48/276   0   91/356   0  124/600   0
      
      16:00:00   N/A   N/A  54/276   0  213/356   0  135/600   0
      
      16:20:00   N/A   N/A  49/276   0  113/356   0  119/600   0
      
      16:40:00   N/A   N/A  47/276   0   84/356   0  118/600   0
      
      17:00:01   N/A   N/A  47/276   0   99/356   0  118/600   0
      
      $ 
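
      Each entries/size pair is easier to compare when expressed as a percentage of the table in use. The following sketch splits a pair with shell parameter expansion and computes a whole-number percentage. The function name table_pct is hypothetical, and entries such as N/A are rejected rather than computed.

```shell
# Hypothetical helper: turn an entries/size pair such as 213/356 into a
# whole-number percentage using shell parameter expansion.
table_pct() {
    used=${1%/*}      # text before the slash
    size=${1#*/}      # text after the slash
    for n in "$used" "$size"; do
        case $n in
            ''|*[!0-9]*) return 1 ;;   # skip N/A and other non-numeric entries
        esac
    done
    [ "$size" -gt 0 ] || return 1
    echo $(( used * 100 / size ))
}

table_pct 213/356    # inod-sz at 16:00:00, the fullest inode table sample
table_pct 54/276     # proc-sz at 16:00:00
```

      For the sample above this prints 59 and 19, so even at its busiest the inode table was a little under 60 percent full.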

      Each table-sz column (text-sz, proc-sz, inod-sz, and file-sz in the example above) reports the entries/size of a particular system table, that is, the number of entries in use and the total number available. The ov column following each one reports overflows of that table. The tables for SYSV (from SVID3) are proc, inod, file, and lock.

      UNIX SVR4 (SVID3) includes a program synchronization mechanism using semaphores, which control access to critical resources. A process generally acquires a semaphore, performs a critical action, and releases the semaphore. No other process can acquire a semaphore that is already in use until it is released. The command sar -m reports message and semaphore activity:

      $ sar -m 6 5
      
      HP-UX cnidaria A.09.00 C 9000/837    03/14/94
      
      17:00:22   msg/s  sema/s
      
      17:00:28    4.50    0.00
      
      17:00:34    4.50    0.00
      
      17:00:40    4.50    0.00
      
      17:00:46    4.50    0.00
      
      17:00:52    4.50    0.00
      
      Average     4.50    0.00
      
      $ 

      The columns msg/s and sema/s report message and semaphore primitives per second.
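
      The acquire, act, release pattern used with semaphores can be illustrated in the shell. The sketch below is an analogy only: it uses a lock directory in place of a real SYSV semaphore, relying on the fact that mkdir either creates the directory or fails. The function names and the lock path are hypothetical, and activity of this kind does not appear in the sema/s column.

```shell
# Analogy only: a lock directory stands in for a binary semaphore.
# mkdir either creates the directory or fails, so only one caller wins.
sem_acquire() { mkdir "$1" 2>/dev/null; }
sem_release() { rmdir "$1"; }

lock=${TMPDIR:-/tmp}/demo.sem.$$     # hypothetical lock path

if sem_acquire "$lock"; then
    echo "acquired: performing the critical action"
    # A second attempt fails while the lock is held.
    sem_acquire "$lock" || echo "second acquire refused while held"
    sem_release "$lock"
    echo "released"
fi
```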

      Summary

      In this chapter, you have learned how to use the UNIX commands ps, time, and sar to examine the state of your processes and your system. You have learned about foreground and background jobs and how to use the job control features of UNIX and your shell (csh or ksh) to control foreground and background jobs. You have learned to use the nice and renice commands to limit the CPU impact of your jobs. You have learned to use the kill command to suspend or terminate jobs that are using too much of the available system resources. Applying this knowledge to your daily use of UNIX will help you and your system be efficient at getting tasks completed.
