Chapter 13
Process, String, and Mathematical Functions
Today's lesson describes three groups of built-in Perl functions:
- The functions that manipulate processes and programs that
are currently running
- The functions that perform mathematical operations
- The functions that manipulate character strings
 |
Many of the functions described today use features of the UNIX operating system. If you are using Perl on a machine that is not running UNIX, some of these functions might not be defined or might behave differently.
Check the documentation supplied with your version of Perl for details on which functions are supported or emulated on your machine.
|
Perl provides a wide range of functions that manipulate both the
program currently being executed and other programs (also called
processes) running on your machine. These functions are divided
into four groups:
- Functions that start additional processes
- Functions that stop the current program or another process
- Functions that control the execution of a program or process
- Functions that manipulate processes or programs but don't
fit into any of the preceding categories
The following sections describe these four groups of process-
and program-manipulation functions.
Several built-in functions provide different ways of creating
processes: eval, system, fork, pipe,
exec, and syscall. These functions are described
in the following subsections.
The eval Function
The eval function treats a character string as an executable
Perl program.
The syntax for the eval function is
eval (string);
Here, string is the character string that is to become
a Perl program.
For example, these two lines of code:
$print = "print (\"hello, world\\n\");";
eval ($print);
print the following message on your screen:
hello, world
The character string passed to eval can be a character-string
constant or any expression that has a value which is a character
string. In this example, the following string is assigned to $print,
which is then passed to eval:
print ("hello, world\n");
The eval function uses the special system variable $@
to indicate whether the Perl program contained in the character
string has executed properly. If no error has occurred, $@
contains the null string. If an error has been detected, $@
contains the text of the message.
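As a minimal sketch of how $@ behaves, the following fragment evaluates one string that succeeds and one that calls die. The variable names are invented for this illustration.

```perl
use strict;
use warnings;

# Evaluate a string that runs without error.
my $ok_result = eval('1 + 2;');
my $ok_error  = $@;     # the null string, because no error occurred

# Evaluate a string that fails at run time.
my $bad_result = eval('die ("something went wrong\n");');
my $bad_error  = $@;    # the text of the error message

print("first eval returned $ok_result\n");
print("second eval failed with: $bad_error");
```

Copying $@ into another variable right after each call is a cautious habit, because the next call to eval resets it.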
The subprogram executed by eval affects the program that
called it; for example, any variables that are changed by the
subprogram remain changed in the main program. Listing 13.1 provides
a simple example of this.
Listing 13.1. A program that illustrates the behavior of eval.
1: #!/usr/local/bin/perl
2:
3: $myvar = 1;
4: eval ("print (\"hi!\\n\"); \$myvar = 2;");
5: print ("the value of \$myvar is $myvar\n");
$ program13_1
hi!
the value of $myvar is 2
$

The call to eval in line 4
first executes the statement
print ("hi!\n");
Then it executes the following assignment, which assigns 2
to $myvar:
$myvar = 2;
The value of $myvar remains 2 in the main program,
which means that line 5 prints the value 2. (The backslash
preceding the $ in $myvar ensures that the Perl
interpreter does not substitute the value of $myvar for
the name before passing it to eval.)
| NOTE |
If you like, you can leave off the final semicolon in the character string passed to eval, as follows:
eval ("print (\"hi!\\n\"); \$myvar = 2");
As before, this prints hi! and assigns 2 to $myvar.
|
The eval function has one very useful property: If the
subprogram executed by eval encounters a fatal error,
the main program does not halt. Instead, the subprogram terminates,
copies the error message into the system variable $@,
and returns to the main program.
This feature is very useful if you are moving a Perl program from
one machine to another and you are not sure whether the new machine
contains a built-in function you need. For example, Listing 13.2
tests whether the tell function is implemented.
Listing 13.2. A program that uses eval
to test whether a function is implemented.
1: #!/usr/local/bin/perl
2:
3: open (MYFILE, "file1") || die ("Can't open file1");
4: eval ("\$start = tell(MYFILE);");
5: if ($@ eq "") {
6: print ("The tell function is defined.\n");
7: } else {
8: print ("The tell function is not defined!\n");
9: }
$ program13_2
The tell function is defined.
$

The call to eval in line 4
creates a subprogram that calls the function tell. If
tell is defined, the subprogram assigns the location
of the next line (which, in this case, is the first line) to read
to the scalar variable $start. If tell is not
defined, the subprogram places the error message in $@.
Line 5 checks whether $@ is the null string. If $@
is empty, the subprogram in line 4 executed without generating
an error, which means that the tell function is implemented.
(Because assignments performed in the subprogram remain in effect
in the main program, the main program can call seek using
the value in $start, if desired.) If $@ is not
empty, the program assumes that tell is not defined,
and it prints a message proclaiming that fact. (This program is
assuming that the only reason the subprogram could fail is because
tell is not defined. This is a reasonable assumption,
because you know that the file referenced by MYFILE has
been successfully opened.)
 |
Although eval is very useful, it is best to use it only for small programs. If you need to generate a larger program, it might be better to write the program to a file and call system to execute it. (The system function is
described in the following section.)
Because statements executed by eval affect the program that calls it, the behavior of complicated programs might become difficult to track if eval is used to excess.
|
The system Function
You have seen examples of the system function in earlier
lessons.
The syntax for the system function is
system (list);
This function is passed a list as follows: The first element of
the list contains the name of a program to execute, and the other
elements are arguments to be passed to the program.
When system is called, it starts a process that runs
the program and waits until the process terminates. When the process
terminates, the error code is shifted left eight bits, and the
resulting value becomes system's return value. Listing
13.3 is a simple example of a program that calls system.

Listing 13.3. A program that calls system.
1: #!/usr/local/bin/perl
2:
3: @proglist = ("echo", "hello, world!");
4: system(@proglist);
$ program13_3
hello, world!
$

In this program, the call to system
executes the UNIX program echo, which displays its arguments.
The argument passed to echo is hello, world!.
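Because the return value is the exit code shifted left eight bits, you can recover the code by shifting the value right eight bits. The sketch below assumes a UNIX-like system on which the sh shell is available; the command exit 3 is used only because it produces a known exit code.

```perl
use strict;
use warnings;

# Run a command whose exit code is known in advance (3).
my $status = system("sh", "-c", "exit 3");

# Undo the eight-bit shift to recover the exit code.
my $exit_code = $status >> 8;

print("system returned $status, so the exit code is $exit_code\n");
```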
| TIP |
When you start another program using system, output data might be mixed, out of sequence, or duplicated.
To get around this problem, set the system variable $|, defined for each file, to 1. The following is an example:
select (STDOUT);
$| = 1;
select (STDERR);
$| = 1;
When $| is set to 1, no buffer is defined for that file, and output is written out right away. This ensures that the output behaves properly when system is called.
See "Redirecting One File to Another" in Chapter 12, "Working with the File System," for more information on select and $|.
|
The fork Function
The fork function creates two copies of your program:
the parent process and the child process. These copies execute
simultaneously.
The syntax for the fork function is
procid = fork();
fork returns zero to the child process and a nonzero
value to the parent process. This nonzero value is the process
ID of the child process. (A process ID is an integer that
enables the system to distinguish this process from the other
processes currently running on the machine.)
The return value from fork enables you to determine which
process is the child process and which is the parent. For example:
$retval = fork();
if ($retval == 0) {
# this is the child process
exit; # this terminates the child process
} else {
# this is the parent process
}
If fork is unable to execute, the return value is a special
undefined value for which you can test by using the defined
function. (For more information on defined, see Chapter 14,
"Scalar-Conversion and List-Manipulation Functions.")
To terminate a child process created by fork, use the
built-in function exit, which is described later in today's
lesson.
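The following sketch, which assumes a UNIX-like system on which fork is available, combines these points: it tests the return value with defined, lets the child terminate with exit, and continues in the parent.

```perl
use strict;
use warnings;

my $retval = fork();
if (!defined($retval)) {
    # fork was unable to execute
    die("Unable to fork: $!\n");
} elsif ($retval == 0) {
    # this is the child process
    exit(0);
}

# only the parent process reaches this point
print("child process ID is $retval\n");
```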
 |
Be careful when you use the fork function. The following are a few examples of what can go wrong:
- If both copies of the program execute calls to print or any other output-generating function, the output from one copy might be mixed with the output from the other copy. There is no way to guarantee that output from one copy will appear
before output from the other, unless you force one process to wait for the other.
- If you use fork in a loop, the program might wind up generating many copies of itself. This can affect the performance of your system (or crash it completely).
- Your child process might wind up executing code that your parent process is supposed to execute, or vice versa.
|
The pipe Function
The pipe function is designed to be used in conjunction
with the fork function. It provides a way for the child
and parent processes to communicate.
The syntax for the pipe function is
pipe (infile, outfile);
pipe requires two arguments, each of which is a file
variable that is not currently in use (in this case, infile
and outfile). After pipe has been called, information
sent via the outfile file variable can be read using
the infile file variable. In effect, the output from
outfile is piped to infile.
To use pipe with fork, do the following:
- Call pipe.
- Call fork to split the program into parent and child
processes.
- Have one of the processes close infile, and have
the other close outfile.
The process in which outfile is still open can now send
data to the process in which infile is still open. (The
child can send data to the parent, or vice versa, depending on
which process closes infile and which closes outfile.)
Listing 13.4 shows how pipe works. It uses fork
to create a parent and child process. The parent process reads
a line of input, which it passes to the child process. The child
process then prints it.
Listing 13.4. A program that uses fork
and pipe.
1: #!/usr/local/bin/perl
2:
3: pipe (INPUT, OUTPUT);
4: $retval = fork();
5: if ($retval != 0) {
6: # this is the parent process
7: close (INPUT);
8: print ("Enter a line of input:\n");
9: $line = <STDIN>;
10: print OUTPUT ($line);
11: } else {
12: # this is the child process
13: close (OUTPUT);
14: $line = <INPUT>;
15: print ($line);
16: exit (0);
17: }
$ program13_4
Enter a line of input:
Here is a test line
Here is a test line
$

Line 3 defines the file variables INPUT
and OUTPUT. Data sent to OUTPUT can now be read
from INPUT.
Line 4 splits the program into a parent process and a child process.
Line 5 then determines which process is which.
The parent process executes lines 7-10. Because the parent process
is sending data through OUTPUT, it has no need to access
INPUT; therefore, line 7 closes INPUT.
Lines 8 and 9 obtain a line of data from the standard input file.
Line 10 then sends this line of data to the child process via
the file variable OUTPUT.
The child process executes lines 13-16. Because the child process
is receiving data through INPUT, it does not need access
to OUTPUT; therefore, line 13 closes OUTPUT.
Line 14 reads data from INPUT. Because data from OUTPUT
is piped to INPUT, the program waits until the data is
actually sent before continuing with line 15.
Line 16 uses exit to terminate the child process. This
also automatically closes INPUT.
Note that the <INPUT> operator behaves like any
other operator that reads input (such as, for instance, <STDIN>).
If there is no more data to read, INPUT is assumed to
be at the "end of file," and <INPUT>
returns the null string.
 |
Traffic through the file variables specified by pipe can flow in only one direction. You cannot have a process both send and receive on the same pipe.
If you need to establish two-way communication, you can open two pipes, one in each direction.
|
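Two-way communication with two pipes might be sketched as follows. The file variable names (FROM_PARENT, TO_CHILD, FROM_CHILD, and TO_PARENT) are invented for this example, and the sketch assumes a system on which fork is available. The parent sends a line to the child, and the child sends back an uppercase copy.

```perl
use strict;
use warnings;

pipe(FROM_PARENT, TO_CHILD)  || die("Can't create first pipe: $!\n");
pipe(FROM_CHILD,  TO_PARENT) || die("Can't create second pipe: $!\n");

my $retval = fork();
die("Unable to fork: $!\n") unless defined($retval);

if ($retval == 0) {
    # child: reads on the first pipe, writes on the second
    close(TO_CHILD);
    close(FROM_CHILD);
    my $line = <FROM_PARENT>;
    print TO_PARENT (uc($line));
    close(TO_PARENT);
    exit(0);
}

# parent: writes on the first pipe, reads on the second
close(FROM_PARENT);
close(TO_PARENT);
print TO_CHILD ("hello from the parent\n");
close(TO_CHILD);            # flushes the data and signals end of file
my $reply = <FROM_CHILD>;
close(FROM_CHILD);
print("child replied: $reply");
```

Each process closes the two ends it does not use, just as in Listing 13.4, so that each pipe carries traffic in one direction only.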
The exec Function
The exec function is similar to the system function,
except that it terminates the current program before starting
the new one.
The syntax for the exec function is
exec (list);
This function is passed a list as follows: The first element of
the list contains the name of a program to execute, and the other
elements are arguments to be passed to the program.
For example, the following statement terminates the Perl program
and starts the command mail dave:
exec ("mail dave");
Like system, exec accepts additional arguments
that are assumed to be passed to the command being invoked. For
example, the following statement executes the command vi file1:
exec ("vi", "file1");
You can specify the name that the system is to use as the program
name by placing the program to run in braces before the list, as follows:
exec {"mail"} "maildave", "dave";
Here, the command mail dave is invoked, but the program
name is set to maildave. (This affects the value of the
system variable $0, which contains the name of the running
program. It also affects the value of argv[0] if the
program to be invoked was originally written in C.)
exec often is used in conjunction with fork:
when fork splits into two processes, the child process
starts another program using exec.
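A minimal sketch of this fork-and-exec pattern follows. It assumes a UNIX-like system that provides the echo program, and it uses waitpid (described later in this lesson) so that the parent can collect the exit code.

```perl
use strict;
use warnings;

my $retval = fork();
die("Unable to fork: $!\n") unless defined($retval);

if ($retval == 0) {
    # child: replace this copy of the program with echo
    exec("echo", "hello from the new program");
    die("Unable to exec: $!\n");    # reached only if exec fails
}

# parent: wait for the new program to finish
waitpid($retval, 0);
my $exit_code = $? >> 8;
print("echo finished with exit code $exit_code\n");
```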
 |
exec has the same output-buffering problems as system. See the description of system, earlier in today's lesson, for a description of these problems and how to deal with them.
|
The syscall Function
The syscall function invokes a system call directly.
The syntax for the syscall function is
syscall (list);
syscall expects a list as its argument. The first element
of the list identifies the system call to invoke (normally one of
the &SYS_ subroutines defined in syscall.ph, such as &SYS_write),
and the remaining elements are arguments to be passed to the call.
If an argument in the list passed to syscall is a numeric
value, it is converted to a C integer (type int). Otherwise,
a pointer to the string value is passed. See the syscall
UNIX manual page or the Perl documentation for more details.
| NOTE |
The Perl header file syscall.ph must be included in order to use syscall:
require ("syscall.ph");
For more information on require, see Chapter 20, "Miscellaneous Features of Perl."
|
The following sections describe the functions that terminate either
the currently executing program or a process running elsewhere
on the system: die, warn, exit, and
kill.
The die and warn Functions
The die and warn functions provide a way for
programs to pass urgent messages back to the user who is running
them.
The die function terminates the program and prints an
error message on the standard error file.
The syntax for the die function is
die (message);
message is the error message to be displayed.
For example, the call
die ("Cannot open input file\n");
prints the following message and then exits:
Cannot open input file
die can accept a list as its argument, in which case
all elements of the list are printed.
@diemsg = ("I'm about ", "to die\n");
die (@diemsg);
This prints out the following message and then exits:
I'm about to die
If the last argument passed to die ends with a newline
character, the error message is printed as is. If the last argument
to die does not end with a newline character, the program
filename and line number are printed, along with the line number
of the input file (if applicable). For example, if line 6 of the
file myprog is
die ("Cannot open input file");
the message it prints is
Cannot open input file at myprog line 6.
The warn function, like die, prints a message
on the standard error file.
The syntax for the warn function is
warn (message);
As with die, message is the message to be displayed.
warn, unlike die, does not terminate. For example,
the statement
warn ("Input file is empty");
sends the following message to the standard error file, and then
continues executing:
Input file is empty at myprog line 76.
If the string passed to warn is terminated by a newline
character, the warning message is printed as is. For example,
the statement
warn("Danger! Danger!\n");
sends
Danger! Danger!
to the standard error file.
| NOTE |
If eval is used to invoke a program that calls die, the error message printed by die is not printed; instead, the error message is assigned to the system variable $@.
|
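The two behaviors of die can be observed safely by calling it inside eval and examining $@ afterward. In this sketch the variable names are invented for illustration; the location that die appends refers to the evaluated string rather than to a program file.

```perl
use strict;
use warnings;

# Message ends with a newline: it is stored as is.
eval('die ("Cannot open input file\n");');
my $with_newline = $@;

# No trailing newline: the location is appended.
eval('die ("Cannot open input file");');
my $without_newline = $@;

print("with newline:    $with_newline");
print("without newline: $without_newline");
```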
The exit Function
The exit function terminates a program.
If you like, you can specify a return code to be passed to the
system by passing exit an argument using the following
syntax:
exit (retcode);
retcode is the return code you want to pass.
For example, the following statement terminates the program with
a return code of 2:
exit(2);
The kill Function
The kill function enables you to send a signal to a list
of processes.
The syntax for invoking the kill function is
kill (signal, proclist);
In this case, signal is the numeric signal to send. (For
example, a signal of 9 kills the listed processes.) proclist
is a list of process IDs (such as the child process ID returned
by fork).
signal also can be a signal name enclosed in quotes,
as in "INT".
For more details on the signals you can send, refer to the kill
UNIX manual page.
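As a hedged illustration, the following sketch creates a child process with fork and then sends it signal 9. It assumes a UNIX-like system. The expression $? & 127, which extracts the number of the signal that ended the child from the wait status, is a detail not covered in this lesson.

```perl
use strict;
use warnings;

my $procid = fork();
die("Unable to fork: $!\n") unless defined($procid);

if ($procid == 0) {
    # child: do nothing for a long time
    sleep(60);
    exit(0);
}

# parent: send signal 9 to the child, then collect it
my $signaled = kill(9, $procid);    # number of processes signaled
waitpid($procid, 0);
my $signal_number = $? & 127;       # signal that ended the child
print("signaled $signaled process; child ended by signal $signal_number\n");
```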
The sleep, wait, and waitpid functions
delay the execution of a particular program or process.
The sleep Function
The sleep function suspends the program for a specified
number of seconds.
The syntax for the sleep function is
sleep (time);
time is the number of seconds to suspend program execution.
The function returns the number of seconds that the program was
actually stopped.
For example, the following statement puts the program to sleep
for five seconds:
sleep (5);
The wait and waitpid Functions
The wait function suspends execution and waits for a
child process to terminate (such as a process created by fork).
The wait function requires no arguments:
procid = wait();
When a child process terminates, wait returns the process
ID, procid, of the process that has terminated. If no
child processes exist, wait returns -1.
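One way to use this return value is to call wait in a loop until it returns -1, which collects every child. The sketch below assumes fork is available; the hash %exit_code and the exit codes 1, 2, and 3 are invented for the example.

```perl
use strict;
use warnings;

my %exit_code;
foreach my $code (1, 2, 3) {
    my $procid = fork();
    die("Unable to fork: $!\n") unless defined($procid);
    exit($code) if ($procid == 0);    # each child exits at once
}

# parent: collect children until wait reports that none are left
while ((my $procid = wait()) != -1) {
    $exit_code{$procid} = $? >> 8;
}
my $collected = scalar(keys(%exit_code));
print("collected $collected child processes\n");
```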
The waitpid function waits for a particular child process.
The syntax for the waitpid function is
waitpid (procid, waitflag);
procid is the process ID of the process to wait for,
and waitflag is a special wait flag (as defined by the
waitpid or wait4 manual page). A waitflag of 0
requests a normal (blocking) wait. waitpid returns the process
ID of the child if the process is found and has terminated, and it
returns -1 if the child process does not exist.
Listing 13.5 shows how waitpid can be used to control
process execution.
Listing 13.5. A program that uses waitpid.
1: #!/usr/local/bin/perl
2:
3: $procid = fork();
4: if ($procid == 0) {
5: # this is the child process
6: print ("this line is printed first\n");
7: exit(0);
8: } else {
9: # this is the parent process
10: waitpid ($procid, 0);
11: print ("this line is printed last\n");
12: }
$ program13_5
this line is printed first
this line is printed last
$

Line 3 splits the program into a parent
process and a child process. The parent process is returned the
process ID of the child process, which is stored in $procid.
Lines 6 and 7 are executed by the child process. Line 6 prints
the following line:
this line is printed first
Line 7 then calls exit, which terminates the child process.
Lines 10 and 11 are executed by the parent process. Line 10 calls
waitpid and passes it the ID of the child process; therefore,
the parent process waits until the child process terminates before
continuing. This means that line 11, which prints the second line,
is guaranteed to be executed after the first line is printed.
As you can see, waitpid can be used to force the order of
execution of processes.
| NOTE |
For more information on the possible values that can be passed as waitflag, examine the file wait.ph, which is available from the same place you retrieved your copy of Perl. (It might already be on your system.) You can find out more
also by investigating the waitpid and wait4 manual pages.
|
The caller, chroot, local, and times
functions perform various process and program-related actions.
The caller Function
The caller function returns the name and the line number
of the program that called the currently executing subroutine.
The syntax for the caller function is
subinfo = caller();
caller returns a three-element list, subinfo,
consisting of the following:
- The name of the package from which the subroutine was called
- The name of the file from which the subroutine was called
- The line number of the subroutine call
This routine is used by the Perl debugger, which you'll learn
about in Chapter 21, "The Perl Debugger." For more information
on packages, refer to Chapter 20, "Miscellaneous Features of
Perl."
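A short sketch of caller in use follows. The subroutine name where_called is made up for this example; it simply formats the three values that caller returns.

```perl
use strict;
use warnings;

sub where_called {
    my ($package, $file, $line) = caller();
    return "package $package, file $file, line $line";
}

my $subinfo = where_called();
print("where_called was called from $subinfo\n");
```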
The chroot Function
The chroot function duplicates the functionality of the
chroot function call.
The syntax for the chroot function is
chroot (dir);
dir is the new root directory.
In the following example, the specified directory becomes the
root directory for the program:
chroot ("/u/jqpublic");
For more information, refer to the chroot manual page.
The local Function
The local function was introduced in Chapter 9, "Using
Subroutines." It declares that a copy of a named variable
is to be defined for a subroutine. (Refer to that chapter for examples
that use local inside a subroutine.)
local can be used also to define a copy of a variable
for use inside a statement block (a collection of statements
enclosed in braces), as follows:
if ($var == 14) {
local ($localvar);
# stuff goes here
}
This defines a local copy of the variable $localvar for
use inside the statement block. Any other copies of $localvar
that exist are not affected by the changes to this local copy.
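The effect can be seen in the following sketch, in which a subroutine reads $localvar both while the local copy exists and after the block has ended. The subroutine name current_value is invented, and the our declaration is a Perl 5 feature used here only so that the example runs under use strict.

```perl
use strict;
use warnings;

our $localvar = "outer value";

sub current_value {
    return $localvar;
}

my ($inside, $outside);
{
    local($localvar) = "inner value";
    $inside = current_value();     # sees the local copy
}
$outside = current_value();        # the original value is back

print("inside the block:  $inside\n");
print("outside the block: $outside\n");
```

Subroutines called from inside the block see the local copy, and the original value becomes visible again as soon as the block ends.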
 |
DON'T use local inside a loop, as in this example:
while ($var <= 14) {
local ($myvar);
# stuff goes here
}
Here, a new copy of $myvar is defined each time the loop iterates. This is probably not what you want.
|
The times Function
The times function returns the amount of job time consumed
by this program and any child processes of this program.
The syntax for the times function is
timelist = times
As you can see, times accepts no arguments. It returns
timelist, a list consisting of the following four floating-point
numbers:
- The user time consumed by this program
- The system time consumed by this program
- The user time consumed by the child processes, if they exist
- The system time consumed by the child processes, if they exist
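For instance, a program might report its own processor usage as in the sketch below. The loop exists only to consume some time, and the values printed depend on your machine.

```perl
use strict;
use warnings;

# Do a little work so that some processor time is consumed.
my $total = 0;
$total += sqrt($_) foreach (1 .. 100000);

my ($user, $system, $child_user, $child_system) = times;
printf("user %.2f, system %.2f, child user %.2f, child system %.2f\n",
       $user, $system, $child_user, $child_system);
```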
Perl provides functions that perform the standard trigonometric
operations, plus some other useful mathematical operations. The
following sections describe these functions: sin, cos,
atan2, sqrt, exp, log, abs,
rand, and srand.
The sin and cos functions are passed a scalar
value and return the sine and cosine, respectively, of the value.
The syntax of the sin and cos functions is
retval = sin (value);
retval = cos (value);
value is a placeholder here. It can be the value stored
in a scalar variable or the result of an expression; it is assumed
to be in radians. See the following section, "The atan2
Function," to find out how to convert from radians to degrees.
The atan2 function calculates and returns the arctangent
of one value divided by another, in the range -π to π.
The syntax of the atan2 function is
retval = atan2 (value1, value2);
If value1 and value2 are equal and positive, retval
is the value of π divided by 4.
Listing 13.6 shows how you can use this to convert from degrees
to radians.
Listing 13.6. A program that contains a subroutine that converts
from degrees to radians.
1: #!/usr/local/bin/perl
2:
3: $rad90 = &degrees_to_radians(90);
4: $sin90 = sin($rad90);
5: $cos90 = cos($rad90);
6: print ("90 degrees:\nsine is $sin90\ncosine is $cos90\n");
7:
8: sub degrees_to_radians {
9: local ($degrees) = @_;
10: local ($radians);
11:
12: $radians = atan2(1,1) * $degrees / 45;
13: }
$ program13_6
90 degrees:
sine is 1
cosine is 6.1230317691118962911e-17
$

The subroutine degrees_to_radians
converts from degrees to radians by multiplying by π divided by
180. Because atan2(1,1) returns π divided by 4, all the
subroutine needs to do after that is divide by 45 to obtain the
number of radians.
In the main body of the program, line 3 converts 90 degrees to
the equivalent value in radians (π divided by 2). Line 4 then
passes this value to sin, and line 5 passes it to cos.
(The cosine is printed as a very small number rather than exactly
0 because of floating-point round-off error.)
| NOTE |
The trigonometric operations provided here are sufficient to enable you to perform the other important trigonometric operations. For example, to obtain the tangent of a value, obtain the sine and cosine of the value by calling sin and
cos, and then divide the sine by the cosine.
|
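A tangent subroutine built this way might look like the following sketch. The subroutine name tangent is made up for the example, and the value of π is obtained from atan2 as in Listing 13.6.

```perl
use strict;
use warnings;

sub tangent {
    my ($radians) = @_;
    return sin($radians) / cos($radians);
}

my $pi    = 4 * atan2(1, 1);     # atan2(1,1) is pi divided by 4
my $tan45 = tangent($pi / 4);    # 45 degrees
print("tangent of 45 degrees is approximately $tan45\n");
```

Note that the division fails for angles whose cosine is exactly zero, so a more careful version would check for that case first.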
The sqrt function returns the square root of the value
it is passed.
The syntax for the sqrt function is
retval = sqrt (value);
value can be any non-negative number.
The exp function returns the number e ** value,
where e is the standard mathematical constant (the base
for the natural logarithm) and value is the argument
passed to exp.
The syntax for the exp function is
retval = exp (value);
To retrieve e itself, pass exp the value 1.
The log function takes a value and returns the natural
(base e) logarithm of the value.
The syntax for the log function is
retval = log (value);
The log function undoes exp; the expression
$var = log (exp ($var));
always leaves $var with the value it started with (if
you factor in round-off error).
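The sketch below exercises exp and log together. The subroutine log_base is invented for this example; it relies on the standard change-of-base identity (divide the natural logarithm of the value by the natural logarithm of the base), which is not covered in this lesson.

```perl
use strict;
use warnings;

my $e = exp(1);                       # the constant e

# Logarithm to another base: divide by the natural log of the base.
sub log_base {
    my ($base, $value) = @_;
    return log($value) / log($base);
}

my $log10_of_1000 = log_base(10, 1000);
my $round_trip    = log(exp(2.5));    # log undoes exp
print("e is approximately $e\n");
print("log base 10 of 1000 is approximately $log10_of_1000\n");
```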
The abs function returns the absolute value of a number.
This is defined as follows: if a value is less than zero, abs
negates it and returns the result.
$result = abs(-3.5); # returns 3.5
Otherwise, the result is identical to the value:
$result = abs(3.5); # returns 3.5
$result = abs(0); # returns 0
The syntax for the abs function is
retval = abs (value);
value can be any number.
| NOTE |
abs is not defined in Perl 4.
|
The rand and srand functions enable Perl programs
to generate random numbers.
The rand function is passed a positive numeric value and generates
a random floating-point number that is greater than or equal to 0
and less than the value.
The syntax for the rand function is
retval = rand (num);
num is the value passed to rand, and
retval is a random floating-point number between 0 and
num.
For example, the following statement generates a number between
0 and 10 and returns it in $retval:
$retval = rand (10);
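srand initializes the random-number generator used by
rand. This ensures that the random numbers generated
differ from one run of the program to the next. (In Perl 4 and
early versions of Perl 5, if you do not call srand, you'll
get the same set of random numbers each time. Perl 5.004 and later
call srand automatically the first time rand is used.)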
The syntax for the srand function is
srand (value);
srand accepts an integer value as an argument; if no
argument is supplied, srand calls the time function
and uses its return value as the random-number seed.
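As an illustration, the following sketch simulates rolling a six-sided die. It uses the int function (not covered in this lesson) to discard the fractional part, and it passes the arbitrary seed 42 to srand twice to show that the same seed reproduces the same sequence. The subroutine name roll_dice is invented.

```perl
use strict;
use warnings;

# Roll a six-sided die several times.
sub roll_dice {
    my ($count) = @_;
    return map { int(rand(6)) + 1 } (1 .. $count);
}

srand(42);
my @first = roll_dice(5);
srand(42);                   # same seed, so the same sequence
my @second = roll_dice(5);
print("first:  @first\nsecond: @second\n");
```

A fixed seed is useful for testing; for unpredictable results, seed from a changing value such as those listed in the note that follows.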
For an example that uses rand and srand, see
the section titled "Returning a Value from a Subroutine"
in Chapter 9.
| NOTE |
The following values and functions return numbers that can make useful random-number seeds:
- The system variable $$ contains the process ID of the current program. (See Chapter 17, "System Variables," for more information on $$.)
- time returns the current time value.
- Many of the functions described in Chapter 15, "System Functions," return useful values. For example, getppid returns the process ID of the program's parent process.
For best results, combine two or more of these using the | (bitwise OR) operator.
|
This section describes the built-in Perl functions that manipulate
character strings. These functions enable you to do the following:
- Search for a substring in a character string
- Create a string
- Replace a substring within a string
The index function provides a way of indicating the location
of a substring in a string.
The syntax for the index function is
position = index (string, substring);
string is the character string to search in, and substring
is the character string being searched for. position
returns the number of characters skipped before substring
is located; if substring is not found, position
is set to -1.
Listing 13.7 is a program that uses index to locate a
substring in a string.
Listing 13.7. A program that uses the index
function.
1: #!/usr/local/bin/perl
2:
3: $input = <STDIN>;
4: $position = index($input, "the");
5: if ($position >= 0) {
6: print ("pattern found at position $position\n");
7: } else {
8: print ("pattern not found\n");
9: }
$ program13_7
Here is the input line I have typed.
pattern found at position 8
$

This program searches for the first
occurrence of the word the. If it is found, the program
prints the location of the pattern; if it is not found, the program
prints pattern not found.
You can use the index function to find more than one
copy of a substring in a string. To do this, pass a third argument
to index, which tells it how many characters to skip
before starting to search. For example:
$position = index($line, "foo", 5);
This call to index skips five characters before starting
to search for foo in the string stored in $line.
As before, if index finds the substring, it returns the
total number of characters skipped (including the number specified
by the third argument to index). If index does
not find the substring in the portion of the string that it searches,
it returns -1.
This feature of index enables you to find all occurrences
of a substring in a string. Listing 13.8 is a modified version
of Listing 13.7 that searches for all occurrences of the
in an input line.
Listing 13.8. A program that uses index
to search a line repeatedly.
1: #!/usr/local/bin/perl
2:
3: $input = <STDIN>;
4: $position = $found = 0;
5: while (1) {
6: $position = index($input, "the", $position);
7: last if ($position == -1);
8: if ($found == 0) {
9: $found = 1;
10: print ("pattern found - characters skipped:");
11: }
12: print (" $position");
13: $position++;
14: }
15: if ($found == 0) {
16: print ("pattern not found\n");
17: } else {
18: print ("\n");
19: }
$ program13_8
Here is the test line containing the words.
pattern found - characters skipped: 8 33
$

Line 6 of this program calls index.
Because the initial value of $position is 0,
the first call to index starts searching from the beginning
of the string. Eight characters are skipped before the first
occurrence of the is found; this means that $position
is assigned 8.
Line 7 tests whether a match has been found by comparing $position
with -1, which is the value index returns when
it does not find the string for which it is looking. Because a
match has been found, the loop continues to execute.
When the loop iterates again, line 6 calls index again.
This time, index skips nine characters before beginning
the search again, which ensures that the previously found occurrence
of the is skipped. A total of 33 characters are skipped before
the is found again. Once again, the loop continues, because
the conditional expression in line 7 is false.
On the final iteration of the loop, line 6 calls index
and skips 34 characters before starting the search. This time,
the is not found, index returns -1,
and the conditional expression in line 7 is true. At this point,
the loop terminates.
| NOTE |
To extract a substring found by index, use the substr function, which is described later in today's lesson.
|
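The loop in Listing 13.8 can also be packaged as a subroutine that returns every position at once. The following is a sketch only; the subroutine name find_all is invented, and the test string is the input line used with Listing 13.8.

```perl
use strict;
use warnings;

sub find_all {
    my ($string, $substring) = @_;
    my @positions;
    my $position = index($string, $substring);
    while ($position != -1) {
        push(@positions, $position);
        # resume the search one character past the last match
        $position = index($string, $substring, $position + 1);
    }
    return @positions;
}

my @found = find_all("Here is the test line containing the words.", "the");
print("@found\n");    # the same positions Listing 13.8 reports: 8 33
```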
The rindex function is similar to the index
function. The only difference is that rindex starts searching
from the right end of the string, not the left.
The syntax for the rindex function is
position = rindex (string, substring);
This syntax is identical to the syntax for index. string
is the character string to search in, and substring is
the character string being searched for. position returns
the number of characters skipped before substring is
located; if substring is not found, position
is set to -1.
The following is an example:
$string = "Here is the test line containing the words.";
$position = rindex($string, "the");
In this example, rindex finds the second occurrence of
the. As with index, rindex returns
the number of characters between the left end of the string and
the location of the found substring. In this case, 33 characters
are skipped, and $position is assigned 33.
You can specify a third argument to rindex, indicating
the maximum number of characters that can be skipped. For example,
if you want rindex to find the first occurrence of the
in the preceding example, you can call it as follows:
$string = "Here is the test line containing the words.";
$position = rindex($string, "the", 32);
Here, the second occurrence of the cannot be matched,
because it is to the right of the specified limit of 32 skipped
characters. rindex, therefore, finds the first occurrence
of the. Because there are eight characters between the
beginning of the string and the occurrence, $position
is assigned 8.
Like index, rindex returns -1 if it
cannot find the string it is looking for.
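The third argument also makes it possible to walk through every occurrence from right to left. The loop below is one possible sketch: each call limits the search to positions to the left of the previous match. The array @positions is an illustrative name, and the check for position 0 is there because a limit below zero would be treated as zero and the same match would be found again.

```perl
use strict;
use warnings;

my $string = "Here is the test line containing the words.";
my @positions;

# Start with an unrestricted search from the right end of the string.
my $position = rindex($string, "the");
while ($position != -1) {
    push(@positions, $position);
    # A match at the very start leaves nothing further to the left.
    last if ($position == 0);
    # Only allow matches that begin to the left of the one just found.
    $position = rindex($string, "the", $position - 1);
}
print("found at: @positions\n");
```

For this string, the loop records 33 first and then 8, which are the same two positions discussed above.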
The length function returns the number of characters
contained in a character string.
The syntax for the length function is
num = length (string);
string is the character string for which you want to
determine the length, and num is the returned length.
Here is an example using length:
$string = "Here is a string";
$strlen = length($string);
In this example, length determines that the string
in $string is 16 characters long, and it assigns 16
to $strlen.
Listing 13.9 is a program that calculates the average word length
used in an input file. (This is sometimes used to determine the
"complexity" of the text.) Numbers are skipped.
Listing 13.9. A program that demonstrates the use of length.
1: #!/usr/local/bin/perl
2:
3: $wordcount = $charcount = 0;
4: while ($line = <STDIN>) {
5:     @words = split(/\s+/, $line);
6:     foreach $word (@words) {
7:         next if ($word =~ /^\d+\.?\d*$/);
8:         $word =~ s/[,.;:]$//;
9:         $wordcount += 1;
10:        $charcount += length($word);
11:    }
12: }
13: print ("Average word length: ", $charcount / $wordcount, "\n");
$ program13_9
Here is the test input.
Here is the last line.
^D
Average word length: 3.5
$

This program reads a line of input at
a time from the standard input file, breaking the input line into
words. Line 7 tests whether the word is a number, and skips it
if it is. Line 8 strips any trailing punctuation character from
the word, which ensures that the punctuation is not counted as
part of the word length.
Line 10 calls length to retrieve the number of characters
in the word. This number is added to $charcount, which
contains the total number of characters in all of the words that
have been read so far. To determine the average word length of
the file, line 13 takes this value and divides it by the number
of words in the file, which is stored in $wordcount.
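The same calculation can be packaged as a subroutine, which also makes it easy to guard against input that contains no words (the division would otherwise fail). This is only a sketch: the name average_word_length is hypothetical, split(' ', ...) is used so that leading white space does not produce an empty word, and the number test uses \d* after the optional decimal point so that single-digit numbers are skipped too.

```perl
use strict;
use warnings;

# Returns the average word length for a list of input lines,
# or 0 if the lines contain no words at all.
sub average_word_length {
    my @lines = @_;
    my ($wordcount, $charcount) = (0, 0);
    foreach my $line (@lines) {
        foreach my $word (split(' ', $line)) {
            next if ($word =~ /^\d+\.?\d*$/);   # skip numbers
            $word =~ s/[,.;:]$//;               # drop trailing punctuation
            $wordcount += 1;
            $charcount += length($word);
        }
    }
    return $wordcount ? $charcount / $wordcount : 0;
}

my @input = ("Here is the test input.\n", "Here is the last line.\n");
print("Average word length: ", average_word_length(@input), "\n");
```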
The tr function provides another way of determining the
length of a character string, in conjunction with the built-in
system variable $_.
The syntax for the tr function is
tr/sourcelist/replacelist/
sourcelist is the list of characters to replace, and
replacelist is the list of characters to replace with.
(For details, see the following listing and the explanation provided
with it.)
Listing 13.10 shows how tr works.
Listing 13.10. A program that uses tr to retrieve the length of a string.
1: #!/usr/local/bin/perl
2:
3: $string = "here is a string";
4: $_ = $string;
5: $length = tr/a-zA-Z /a-zA-Z /;
6: print ("the string is $length characters long\n");
$ program13_10
the string is 16 characters long
$

Line 3 of this program assigns the string here is a string
to the scalar variable $string. Line 4 copies this string
into the built-in scalar variable $_.
Line 5 exploits two features of the tr operator that
have not yet been discussed:
- If the value to be translated is not explicitly specified
by means of the =~ operator, tr assumes that
the value is stored in $_.
- tr returns the number of characters translated.
In line 5, both the search pattern (the set of characters to look
for) and the replacement pattern (the characters to replace them
with) are the same. This pattern, /a-zA-Z /, tells tr
to search for all lowercase letters, uppercase letters, and blank
spaces, and then replace them with themselves. This pattern matches
every character in the string, which means that every character
is being translated.
Because every character is being translated, the number of characters
translated is equivalent to the length of the string. This string
length is assigned to the scalar variable $length.
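Keep in mind that this technique gives the length only when the search list covers every character in the string. The following sketch, which uses a made-up string containing a # and a digit, shows the count falling short of the value returned by length. It also leaves the replacement list empty, in which case tr replaces each character with itself and simply counts.

```perl
use strict;
use warnings;

my $string = "here is string #2";
$_ = $string;

# Count letters and blanks; '#' and '2' are not in the search list.
my $counted = tr/a-zA-Z //;
my $length = length($string);

print("tr counted $counted of $length characters\n");
```

Here tr counts 15 characters, but the string is 17 characters long, so length remains the dependable way to measure a string.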
tr can be used also to count the number of occurrences
of a specific character, as shown in Listing 13.11.
Listing 13.11. A program that uses tr to count the occurrences of specific characters.
1: #!/usr/local/bin/perl
2:
3: $punctuation = $blanks = $total = 0;
4: while ($input = <STDIN>) {
5:     chop ($input);
6:     $total += length($input);
7:     $_ = $input;
8:     $punctuation += tr/,:;.-/,:;.-/;
9:     $blanks += tr/ / /;
10: }
11: print ("In this file, there are:\n");
12: print ("\t$punctuation punctuation characters,\n");
13: print ("\t$blanks blank characters,\n");
14: print ("\t", $total - $punctuation - $blanks);
15: print (" other characters.\n");
$ program13_11
Here is a line of input.
This line, another line, contains punctuation.
^D
In this file, there are:
        4 punctuation characters,
        10 blank characters,
        56 other characters.
$

This program uses the scalar variable
$total and the built-in function length to count
the total number of characters in the input file (excluding the
trailing newline characters, which are removed by the call to
chop in line 5).
Lines 8 and 9 use tr to count the number of occurrences
of particular characters. Line 8 replaces all punctuation characters
with themselves; the number of replacements performed, and hence
the number of punctuation characters found, is added to the total
stored in $punctuation. Similarly, line 9 replaces all
blanks with themselves and adds the number of blanks found to
the total stored in $blanks. In both cases, tr
operates on the contents of the scalar variable $_, because
the =~ operator has not been used to specify another
value to translate.
Line 14 uses $total, $punctuation, and $blanks
to calculate the total number of characters that are not blank
and not punctuation.
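If you would rather not copy each line into $_, the =~ operator can point tr at any scalar variable. The lines below are a small sketch of that alternative, using one of the input lines shown above; the variable $other is an illustrative name.

```perl
use strict;
use warnings;

my $input = "This line, another line, contains punctuation.";

# Bind tr to $input with =~ instead of relying on $_.
my $punctuation = ($input =~ tr/,:;.-//);
my $blanks = ($input =~ tr/ //);
my $other = length($input) - $punctuation - $blanks;

print("$punctuation punctuation, $blanks blank, $other other\n");
```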
| NOTE |
Many other functions and operators accept $_ as the default variable on which to work. For example, lines 4-7 of this program also can be written as follows:
while (<STDIN>) {
chop();
$total += length();
For more information on $_, refer to Chapter 17, "System Variables."
|
The pos function, defined only in Perl 5, returns the
location at which the last pattern match in a string ended, which
is where the next match will start. It is ideal for use when repeated
pattern matches are specified using the g (global) pattern-matching option.
The syntax for the pos function is
offset = pos(string);
string is the string whose pattern is being matched.
offset is the number of characters already matched or
skipped.
Listing 13.12 illustrates the use of pos.
Listing 13.12. A program that uses pos to display pattern match positions.
1: #!/usr/local/bin/perl
2:
3: $string = "Mississippi";
4: while ($string =~ /i/g) {
5:     $position = pos($string);
6:     print("matched at position $position\n");
7: }
$ program13_12
matched at position 2
matched at position 5
matched at position 8
matched at position 11
$

This program loops every time an i
in Mississippi is matched. The number displayed by line
6 is the number of characters to skip to reach the point at which
pattern matching resumes. For example, the first i is
the second character in the string, so the second pattern search
starts at position 2.
| NOTE |
You can also use pos to change the position at which pattern matching is to resume. To do this, put the call to pos on the left side of an assignment:
pos($string) = 5;
This tells the Perl interpreter to start the next pattern search with the sixth character in the string. (To restart searching from the beginning, use 0.)
|
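Because pos reports where a match ended, you can subtract the length of the matched text to find where it began. The sketch below does this for the pattern ss and then assigns to pos to resume a different search partway through the string. The names @starts and $resumed are chosen for this example.

```perl
use strict;
use warnings;

my $string = "Mississippi";
my @starts;

# pos points just past each match, so back up by the match length.
while ($string =~ /(ss)/g) {
    push(@starts, pos($string) - length($1));
}
print("ss starts at: @starts\n");

# Resume searching at the sixth character (offset 5).
pos($string) = 5;
my $resumed = -1;
if ($string =~ /i/g) {
    $resumed = pos($string);
}
print("next search resumes at $resumed\n");
```

The stored position belongs to the string rather than to a particular pattern, which is why the search for i honors the position assigned after the search for ss has finished.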
The substr function lets you assign a part of a character
string to a scalar variable (or to a component of an array variable).
The syntax for calls to the substr function is
substr (expr, skipchars, length)
expr is the character string from which a substring is
to be copied; this character string can be the value stored in
a variable or the value resulting from the evaluation of an expression.
skipchars is the number of characters to skip before
starting copying. length is the number of characters
to copy; length can be omitted, in which case the rest
of the string is copied.
Listing 13.13 provides a simple example of substr.
Listing 13.13. A program that demonstrates the use of substr.
1: #!/usr/local/bin/perl
2:
3: $string = "This is a sample character string";
4: $sub1 = substr ($string, 10, 6);
5: $sub2 = substr ($string, 17);
6: print ("\$sub1 is \"$sub1\"\n\$sub2 is \"$sub2\"\n");
$ program13_13
$sub1 is "sample"
$sub2 is "character string"
$

Line 4 calls substr, which
copies a portion of the string stored in $string. This
call specifies that ten characters are to be skipped before copying
starts, and that a total of six characters are to be copied. This
means that the substring sample is copied and stored
in $sub1.
Line 5 is another call to substr. Here, 17 characters
are skipped. Because the length field is omitted, substr
copies the remaining characters in the string. This means that
the substring character string is copied and stored in
$sub2.
Note that lines 4 and 5 do not change the contents of $string.
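One common use of substr is pulling fixed-width fields out of a record. The following sketch assumes a hypothetical eight-character date record laid out as four digits of year, two of month, and two of day; the layout and the variable names are inventions for this example.

```perl
use strict;
use warnings;

my $record = "19960413";

my $year  = substr($record, 0, 4);   # skip 0 characters, copy 4
my $month = substr($record, 4, 2);   # skip 4 characters, copy 2
my $day   = substr($record, 6);      # skip 6 characters, copy the rest

print("year $year, month $month, day $day\n");
```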
String Insertion Using substr
In Listing 13.13, which you've just seen, calls to substr
appear to the right of the assignment operator =. This
means that the return value from substr-the extracted
substring-is assigned to the variable appearing to the left of
the =.
Calls to substr can appear also on the left of the assignment
operator =. In this case, the portion of the string specified
by substr is replaced by the value appearing to
the right of the assignment operator.
The syntax for these calls to substr is basically the
same as before:
substr (expr, skipchars, length) = newval;
Here, expr must be something that can be assigned to-for
example, a scalar variable or an element of an array variable.
skipchars represents the number of characters to skip
before beginning the overwriting operation; this number cannot be
greater than the length of the string. length is the number of
characters to be replaced by the overwriting operation. If length
is not specified, the remainder of the string is replaced.
newval is the string that replaces the substring specified
by skipchars and length. If newval
is larger than length, the character string automatically
grows to hold it, and the rest of the string is pushed aside (but
not overwritten). If newval is smaller than length,
the character string automatically shrinks. Basically, everything
appears where it is supposed to without you having to worry about
it.
| NOTE |
By the way, things that can be assigned to are sometimes known as lvalues, because they appear to the left of assignment statements (the l in lvalue stands for "left"). Things that appear to the right of assignment statements are, similarly, called rvalues.
This guide does not use the terms lvalue and rvalue, but you might find that knowing them will prove useful when you read other guides on programming languages.
|
Listing 13.14 is an example of a program that uses substr
to replace portions of a string.
Listing 13.14. A program that replaces parts of a string using substr.
1: #!/usr/local/bin/perl
2:
3: $string = "Here is a sample character string";
4: substr($string, 0, 4) = "This";
5: substr($string, 8, 1) = "the";
6: substr($string, 19) = "string";
7: substr($string, -1, 1) = "g.";
8: substr($string, 0, 0) = "Behold! ";
9: print ("$string\n");
$ program13_14
Behold! This is the sample string.
$

This program illustrates the many ways
you can use substr to replace portions of a string.
The call to substr in line 4 specifies that no characters
are to be skipped before overwriting, and that four characters
in the original string are to be overwritten. This means that
the substring Here is replaced by This, and
that the following is the new value of the string stored in $string:
This is a sample character string
Similarly, the call to substr in line 5 specifies that
eight characters are to be skipped and one character is to be
replaced. This means that the word a is replaced by the.
Now, $string contains the following:
This is the sample character string
Note that the character string is now larger than the original,
because the new substring, the, is larger than the substring
it replaced.
Line 6 is an example of a call to substr that shrinks
the string. Here, 19 characters are skipped, and the rest of the
string is replaced by the substring string (because no
length field has been specified). Now, the following
is the value stored in $string:
This is the sample string
In line 7, the call to substr is passed -1 in
the skipchars field and is passed 1 in the length
field. This tells substr to replace the last character
of the string with the substring g. (g followed
by a period). $string now contains
This is the sample string.
| NOTE |
If substr is passed a skipchars value of -n, where n is a positive integer, substr skips to n characters from the right end of the string. For example, the following call replaces the last two characters in $string with the string hello:
substr($string, -2, 2) = "hello";
|
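Negative skipchars values work when extracting as well as when replacing. The short sketch below counts from the right end of a sample string in both ways; the string and variable names are chosen for this illustration.

```perl
use strict;
use warnings;

my $string = "sample string";

# Extract: start 6 characters from the right end and copy the rest.
my $last_word = substr($string, -6);
# Extract: start 13 characters from the right end and copy 6.
my $first_word = substr($string, -13, 6);

# Replace: overwrite the last 6 characters with a shorter word.
substr($string, -6, 6) = "text";

print("$first_word / $last_word / $string\n");
```

After the assignment, $string contains sample text, which is two characters shorter than before.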
Finally, line 8 specifies that no characters are to be skipped
and no characters are to be replaced. This means that the substring
"Behold! " (including a trailing space) is
added to the front of the existing string and that $string
now contains the following:
Behold! This is the sample string.
Line 9 prints this final value of $string.
| TIP |
If you are a C programmer and are used to manipulating strings using pointers, note that substr with a length field of 1 can be used to simulate pointer-like behavior in Perl.
For example, you can simulate the C statement
ch = *str++;
as follows in Perl:
$char = substr($str, $offset++, 1);
You'll need to define a counter variable (such as $offset) to keep track of where you are in the string. However, this is no more of a chore than remembering to initialize your C pointer variable.
You can simulate the following C statement:
*str++ = ch;
by assigning values using substr in the same way:
substr($str, $offset++, 1) = $char;
You shouldn't use substr in this way unless you really have to. Perl supplies more powerful and useful tools, such as pattern matching and substitution, to get the job done more efficiently.
|
The study function is a special function that tells the
Perl interpreter that the specified scalar variable is about to
be searched many times.
The syntax for the study function is
study (scalar);
scalar is the scalar variable to be "studied."
The Perl interpreter takes the value stored in the specified scalar
variable and represents it in an internal format that allows faster
access.
For example:
study ($myvar);
Here, the value stored in the scalar variable $myvar
is about to be repeatedly searched.
You can call study for only one scalar variable at a
time. Previous calls to study are superseded if study
is called again.
| TIP |
To check whether study actually makes your program more efficient, use the function times, which returns the user and system CPU times used by a program or program fragment. (times is discussed earlier today.)
|
Perl 5 provides functions that perform case conversion on strings.
These are
- The lc function, which converts a string to lowercase
- The uc function, which converts a string to uppercase
- The lcfirst function, which converts the first character
of a string to lowercase
- The ucfirst function, which converts the first character
of a string to uppercase
The lc and uc Functions
The syntax for the lc and uc functions is
retval = lc(string);
retval = uc(string);
string is the string to be converted. retval
is a copy of the string, converted to either lowercase or uppercase:
$lower = lc("aBcDe"); # $lower is assigned "abcde"
$upper = uc("aBcDe"); # $upper is assigned "ABCDE"
The lcfirst and ucfirst Functions
The syntax for the lcfirst and ucfirst functions
is
retval = lcfirst(string);
retval = ucfirst(string);
string is the string whose first character is to be converted.
retval is a copy of the string, with the first character
converted to either lowercase or uppercase:
$lower = lcfirst("HELLO"); # $lower is assigned "hELLO"
$upper = ucfirst("hello"); # $upper is assigned "Hello"
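These functions can be combined. As a small sketch, the loop below tidies a list of inconsistently capitalized names by lowercasing each one with lc and then capitalizing its first letter with ucfirst. The list contents and the array name @tidy are made up for this example.

```perl
use strict;
use warnings;

my @names = ("aLICE", "BOB", "carol");
my @tidy;

foreach my $name (@names) {
    # Lowercase everything first, then capitalize the first character.
    push(@tidy, ucfirst(lc($name)));
}
print("@tidy\n");
```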
The quotemeta function, defined only in Perl 5, places
a backslash character in front of any non-word character in a
string. The following statements are equivalent:
$string = quotemeta($string);
$string =~ s/(\W)/\\$1/g;
The syntax for quotemeta is
newstring = quotemeta(oldstring);
oldstring is the string to be converted. newstring
is the string with backslashes added.
quotemeta is useful when a string is to be used in a
subsequent pattern-matching operation. It ensures that there are
no characters in the string which are to be treated as special
pattern-matching characters.
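To see why this matters, consider a string containing a period, which is a special character in patterns. In the sketch below (the strings and variable names are illustrative), the unquoted string matches abc because the period matches any character, but the quoted version matches only the literal text a.c.

```perl
use strict;
use warnings;

my $literal = "a.c";
my $text = "abc or a.c";

my $quoted = quotemeta($literal);   # the period is now preceded by a backslash

# Without quoting, "." matches any character, so "abc" is found first.
my ($loose) = ($text =~ /($literal)/);
# With quoting, only the literal text "a.c" can match.
my ($exact) = ($text =~ /($quoted)/);

print("pattern $quoted: loose match \"$loose\", exact match \"$exact\"\n");
```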
The join function has been used many times in this guide.
It takes the elements of a list and converts them into a single
character string.
The syntax for the join function is
join (joinstr, list);
joinstr is the character string that is to be used to
glue the elements of list together.
For example:
@list = ("Here", "is", "a", "list");
$newstr = join ("::", @list);
After join is called, the value stored in $newstr
becomes the following string:
Here::is::a::list
The join string, :: in this case, appears between each
pair of joined elements. The most common join string is a single
blank space; however, you can use any value as the join string,
including the value resulting from an expression.
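join is the natural partner of split. As a sketch, the following lines build a colon-separated record from a list and then split it back into its fields. The field values are invented for this example, and the round trip works only because none of the fields contains a colon.

```perl
use strict;
use warnings;

my @fields = ("dave", "x", "1001", "/home/dave");

# Glue the fields together with a colon between each pair.
my $line = join(":", @fields);
# Break the record apart again at each colon.
my @again = split(/:/, $line);

print("$line\n");
print(scalar(@again), " fields recovered\n");
```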
The sprintf function behaves like the printf
function described in Chapter 11, "Formatting Your Output,"
except that the formatted string is returned by the function instead
of being written to a file. This enables you to assign the string
to another variable.
The syntax for the sprintf function is
sprintf (string, fields);
string is the character string to print, and fields
is a list of values to substitute into the string.
Listing 13.15 is an example that uses sprintf to build
a string.
Listing 13.15. A program that uses sprintf.
1: #!/usr/local/bin/perl
2:
3: $num = 26;
4: $outstr = sprintf("%d = %x hexadecimal or %o octal\n",
5: $num, $num, $num);
6: print ($outstr);
$ program13_15
26 = 1a hexadecimal or 32 octal
$

Lines 4 and 5 take three copies of the
value stored in $num and include them as part of a string.
The field specifiers %d, %x, and %o
indicate how the values are to be formatted.
%d    Indicates an integer displayed in the usual decimal (base-10) format
%x    Indicates an integer displayed in hexadecimal (base-16) format
%o    Indicates an integer displayed in octal (base-8) format
The created string is returned by sprintf. Once it has
been created, it behaves just like any other Perl character string;
in particular, it can be assigned to a scalar variable, as in
this example. Here, the string containing the three copies of
$num is assigned to the scalar variable $outstr.
Line 6 then prints this string.
| NOTE |
For more information on field specifiers or on how printf works, refer to Chapter 11, which lists the field specifiers defined and provides a description of the syntax of printf.
|
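Because sprintf returns its result, it is handy for building fixed-width text that is stored or combined before being printed. The sketch below assumes the width and zero-padding modifiers that printf accepts (%02d, %-8s, and %5d); the values and variable names are made up for this example.

```perl
use strict;
use warnings;

my ($hours, $minutes, $seconds) = (9, 5, 7);

# %02d pads each number to two digits with a leading zero.
my $clock = sprintf("%02d:%02d:%02d", $hours, $minutes, $seconds);

# %-8s left-justifies a string in 8 columns; %5d right-justifies in 5.
my $row = sprintf("%-8s%5d", "total", 42);

print("$clock\n");
print("$row\n");
```

Here $clock contains 09:05:07, and $row is 13 characters wide regardless of the values substituted into it (provided they fit in their columns).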
Today, you learned about three types of built-in Perl functions:
functions that handle process and program control, functions that
perform mathematical operations, and functions that manipulate
strings.
With the process- and program-control functions, you can start
new processes, stop the current program or other processes, or
temporarily halt the current program. You also can create a pipe
that sends data from one of your created processes to another.
With the functions that perform mathematical operations, you can
obtain the sine, cosine, and arctangent of a value. You also can
calculate the natural logarithm and square root of a value, or
use the value as an exponent of base e.
You also can generate random numbers and define the seed to use
when generating the numbers.
Functions that search character strings include index,
which searches for a substring starting from the left of a string,
and rindex, which searches for a substring starting from
the right of a string. You can retrieve the length of a character
string using length. By using the translate operator
tr in conjunction with the system variable $_,
you can count the number of occurrences of a particular character
or set of characters in a string. The pos function enables
you to determine or set the current pattern-matching location
in a string.
The function substr enables you to extract a substring
from a string and use it in an expression or assignment statement.
substr also can be used to replace a portion of a string
or append to the front or back end of the string.
The lc and uc functions convert strings to lowercase
or uppercase. To convert the first letter of a string to lowercase
or uppercase, use lcfirst or ucfirst.
quotemeta places a backslash in front of every non-word
character in a string.
You can create new character strings using join and sprintf.
join creates a string by joining elements of a list,
and sprintf builds a string using field specifiers that
specify the string format.
| Q: | How does Perl generate random numbers?
| A: | Basically, by performing arithmetic operations using very
large numbers. If the numbers for these arithmetic operations
are carefully chosen, a sequence of "pseudo-random"
numbers can be generated by repeating the set of arithmetic operations
and returning their results.
The random-number seed provided by srand supplies
the initial value for one of the numbers used in the set of arithmetic
operations. This ensures that the sequence of pseudo-random numbers
starts with a different result each time. |
| Q: | What programs can be called using system?
| A: | Any program that you can run from your terminal can be
run using system. |
| Q: | How many processes can a program create using fork?
| A: | Perl provides no limit on how many processes can be created
at a time. However, the performance of your system will be adversely
affected if you generate too many processes at once.
In particular, programs that call fork and wind
up in an infinite loop are sometimes called fork bombs, because
they generate thousands of processes and grind your machine to
an effective halt. (Your system administrator will not be pleased
with you if you do this!) |
| Q: | How can I send signals to a process without killing it?
| A: | The kill function actually can send any signal
supported by your machine to any running process (that you can
access).
Refer to the UNIX system documentation for details on the
signals you can send and what their names are. |
| Q: | What is the difference between the %d and %ld
format specifiers in sprintf?
| A: | %ld defines a "long integer." It refers to the largest number of bits that your local machine can use
to store an integer. (This is often 32 bits.) %d, on
the other hand, is equivalent to your machine's standard integer
format. On some machines, %ld and %d are equivalent.
If you are not sure how many bits your machine uses to store
integers, or you know you are going to be dealing with large numbers,
it's safer to use %ld. (The same holds true for all other
integer formats, such as %lx and %lo.) |
| Q: | What is the difference between the %c and %s
format specifiers in sprintf?
| A: | %c undoes the effect of the ord function.
It converts a scalar value into the equivalent ASCII character.
(Its behavior is similar to that of the chr function
in Pascal.)
%s treats a scalar value as a character string and
inserts it into the string at the place specified. |
The Workshop provides quiz questions to help you solidify your
understanding of the material covered and exercises to give you
experience in using what you've learned. Try to understand the
quiz and exercise answers before you go on to tomorrow's lesson.
- What do these functions do?
a. srand
b. pipe
c. atan2
d. sleep
e. gmtime
- Explain the differences between fork, system,
and exec.
- Explain the differences between wait and waitpid.
- How can you obtain the value of π (pi)?
- How can you obtain the value of the mathematical constant
e?
- What sprintf specifiers produce the following?
a. A hexadecimal number
b. An octal number
c. A floating-point number in exponential
format
d. A floating-point number in standard
(fixed) format
- If the scalar variable $string contains abcdefgh,
what do the following calls return?
a. substr ($string, 0, 3);
b. substr ($string, 4);
c. substr ($string, -2, 2);
d. substr ($string, 2, 0);
- Assume $string contains the value abcdabcd.
What value is returned by each of the following calls?
a. index ($string, "bc");
b. index ($string, "bcde");
c. index ($string, "bc", 1);
d. index ($string, "cd", 3);
e. rindex ($string, "bc");
- Assume $string contains the value abcdabcd\n
(the last character being a trailing newline character). What
is returned in $retval by the following?
a. $_ = $string; $retval = tr/ab/ab/;
b. $retval = length ($string);
- Write a program that uses fork and waitpid
to generate a total of three processes (including the program).
Have each process print a line, and have the lines appear in a
specified order.
- Write a program that reads input from a file named temp
and writes it to the standard output file. Write another program
that reads input from the standard output file, writes it to temp,
and uses exec to call the first program.
- Write a program that prints the natural logarithm of the integers
between 1 and 100.
- Write a program that computes the sum of the numbers from
1 to 10 ** n for values of n from 1 to 6. For
each computed value, use times to calculate the amount
of time each computation takes. Print these calculation times.
- Write a program that reads an integer value and prints the
sine, cosine, and tangent of the value. Assume that the input
value is in degrees.
- BUG BUSTER: What is wrong with the following program?
#!/usr/local/bin/perl
print ("Here is a line of output. ");
system ("w");
print ("Here is the rest of the line.\n");
- Write a program that uses index to print out the
locations of the letters a, e, i, o,
and u in an input line.
- Write a program that uses rindex to do the same thing
as the one in Exercise 1.
- Write a program that uses substr to do the same thing
as the one in Exercise 1. (Hint: This will require many calls
to substr!)
- Write a program that uses tr to count all the occurrences
of a, e, i, o, and u
in an input line.
- Write a program that reads a number. If the number is a floating-point
value, print it in exponential and fixed-point form. If the number
is an integer, print it in decimal, octal, and hexadecimal form.
(Hint: Recall that printf and sprintf use the
same field specifiers.)
- BUG BUSTER: What is wrong with the following program?
#!/usr/local/bin/perl
$mystring = <STDIN>;
$lastfound = length ($mystring);
while ($lastfound != -1) {
$lastfound = index($mystring, "xyz", $lastfound);
}

|