Web based School

- 15 -
Streamlining SQL Statements for Improved Performance

Streamlining SQL statements is as much a part of application performance as database designing and tuning. No matter how fine-tuned the database or how sound the database structure, you will not receive timely query results that are acceptable to you, or even worse, the customer, if you don't follow some basic guidelines. Trust us, if the customer is not satisfied, then you can bet your boss won't be satisfied either.

Objectives

You already know about the major components of the relational database language of SQL and how to communicate with the database; now it's time to apply your knowledge to real-life performance concerns. The objective of Chapter 15 is to recommend methods for improving the performance of, or streamlining, an SQL statement. By the end of today, you should

Understand the concept of streamlining your SQL code
Understand the differences between batch loads and transactional processing and their effects on database performance
Be able to manipulate the conditions in your query to expedite data retrieval
Be familiar with some underlying elements that affect the tuning of the entire database

Here's an analogy to help you understand the phrase streamline an SQL statement: The objective of competitive swimmers is to complete an event in as little time as possible without being disqualified. The swimmers must have an acceptable technique, be able to torpedo themselves through the water, and use all their physical resources as effectively as possible. With each stroke and breath they take, competitive swimmers remain streamlined and move through the water with very little resistance.

Look at your SQL query the same way. You should always know exactly what you want to accomplish and then strive to follow the path of least resistance. The more time you spend planning, the less time you'll have to spend revising later. Your goal should always be to retrieve accurate data and to do so in as little time as possible. An end user waiting on a slow query is like a hungry diner impatiently awaiting a tardy meal. Although you can write most queries in several ways, the arrangement of the components within the query is the factor that makes the difference of seconds, minutes, and sometimes hours when you execute the query. Streamlining SQL is the process of finding the optimal arrangement of the elements within your query.

In addition to streamlining your SQL statement, you should also consider several other factors when trying to improve general database performance, for example, concurrent user transactions that occur within a database, indexing of tables, and deep-down database tuning.

NOTE: Today's examples use Personal Oracle7 and tools that are available with the Oracle7.3 relational database management system. The concepts discussed toChapter are not restricted to Oracle; they may be applied to other relational database management systems.

Make Your SQL Statements Readable

Even though readability doesn't affect the actual performance of SQL statements, good programming practice calls for readable code. Readability is especially important if you have multiple conditions in the WHERE clause. Anyone reading the clause should be able to determine whether the tables are being joined properly and should be able to understand the order of the conditions.

Try to read this statement:

SQL> SELECT EMPLOYEE_TBL.EMPLOYEE_ID,

EMPLOYEE_TBL.NAME,EMPLOYEE_PAY_TBL.SALARY,EMPLOYEE_PAY_TBL.HIRE_DATE
  2  FROM EMPLOYEE_TBL, EMPLOYEE_PAY_TBL
  3  WHERE EMPLOYEE_TBL.EMPLOYEE_ID = EMPLOYEE_PAY_TBL.EMPLOYEE_ID AND
  4  EMPLOYEE_PAY_TBL.SALARY > 30000 OR (EMPLOYEE_PAY_TBL.SALARY BETWEEN 25000
  5  AND 30000 AND EMPLOYEE_PAY_TBL.HIRE_DATE < SYSDATE - 365);

Here's the same query reformatted to enhance readability:

SQL> SELECT E.EMPLOYEE_ID, E.NAME, P.SALARY, P.HIRE_DATE
  2  FROM EMPLOYEE_TBL E,
  3       EMPLOYEE_PAY_TBL P
  4  WHERE E.EMPLOYEE_ID = P.EMPLOYEE_ID
  5    AND P.SALARY > 30000
  6     OR (P.SALARY BETWEEN 25000 AND 30000
  7    AND P.HIRE_DATE < SYSDATE - 365);

NOTE: Notice the use of table aliases in the preceding query. EMPLOYEE_TBL in line 2 has been assigned the alias E, and EMPLOYEE_PAY_TBL in line 3 has been assigned the alias P. You can see that in lines 4, 5, 6, and 7, the E and P stand for the full table names. Aliases require much less typing than spelling out the full table name, and even more important, queries that use aliases are more organized and easier to read than queries that are cluttered with unnecessarily long full table names.

The two queries are identical, but the second one is obviously much easier to read. It is very structured; that is, the logical components of the query have been separated by carriage returns and consistent spacing. You can quickly see what is being selected (the SELECT clause), what tables are being accessed (the FROM clause), and what conditions need to be met (the WHERE clause).

The Full-Table Scan

A full-table scan occurs when the database server reads every record in a table in order to execute an SQL statement. Full-table scans are normally an issue when dealing with queries or the SELECT statement. However, a full-table scan can also come into play when dealing with updates and deletes. A full-table scan occurs when the columns in the WHERE clause do not have an index associated with them. A full-table scan is like reading a guide from cover to cover, trying to find a keyword. Most often, you will opt to use the index.

You can avoid a full-table scan by creating an index on columns that are used as conditions in the WHERE clause of an SQL statement. Indexes provide a direct path to the data the same way an index in a guide refers the reader to a page number. Adding an index speeds up data access.

Although programmers usually frown upon full-table scans, they are sometimes appropriate. For example:

You are selecting most of the rows from a table.
You are updating every row in a table.
The tables are small.

In the first two cases an index would be inefficient because the database server would have to refer to the index, read the table, refer to the index again, read the table again, and so on. On the other hand, indexes are most efficient when the data you are accessing is a small percentage, usually no more than 10 to 15 percent, of the total data contained within the table.

In addition, indexes are best used on large tables. You should always consider table size when you are designing tables and indexes. Properly indexing tables involves familiarity with the data, knowing which columns will be referenced most, and may require experimentation to see which indexes work best.

NOTE: When speaking of a "large table," large is a relative term. A table that is extremely large to one individual may be minute to another. The size of a table is relative to the size of other tables in the database, to the disk space available, to the number of disks available, and simple common sense. Obviously, a 2GB table is large, whereas a 16KB table is small. In a database environment where the average table size is 100MB, a 500MB table may be considered massive.

Adding a New Index

You will often find situations in which an SQL statement is running for an unreasonable amount of time, although the performance of other statements seems to be acceptable; for example, when conditions for data retrieval change or when table structures change.

We have also seen this type of slowdown when a new screen or window has been added to a front-end application. One of the first things to do when you begin to troubleshoot is to find out whether the target table has an index. In most of the cases we have seen, the target table has an index, but one of the new conditions in the WHERE clause may lack an index. Looking at the WHERE clause of the SQL statement, we have asked, Should we add another index? The answer may be yes if:

The most restrictive condition(s) returns less than 10 percent of the rows in a table.
The most restrictive condition(s) will be used often in an SQL statement.
Condition(s) on columns with an index will return unique values.
Columns are often referenced in the ORDER BY and GROUP BY clauses.

Composite indexes may also be used. A composite index is an index on two or more columns in a table. These indexes can be more efficient than single-column indexes if the indexed columns are often used together as conditions in the WHERE clause of an SQL statement. If the indexed columns are used separately as well as together, especially in other queries, single-column indexes may be more appropriate. Use your judgment and run tests on your data to see which type of index best suits your database.

Arrangement of Elements in a Query

The best arrangement of elements within your query, particularly in the WHERE clause, really depends on the order of the processing steps in a specific implementation. The arrangement of conditions depends on the columns that are indexed, as well as on which condition will retrieve the fewest records.

You do not have to use a column that is indexed in the WHERE clause, but it is obviously more beneficial to do so. Try to narrow down the results of the SQL statement by using an index that returns the fewest number of rows. The condition that returns the fewest records in a table is said to be the most restrictive condition. As a general statement, you should place the most restrictive conditions last in the WHERE clause. (Oracle's query optimizer reads a WHERE clause from the bottom up, so in a sense, you would be placing the most restrictive condition first.)

When the optimizer reads the most restrictive condition first, it is able to narrow down the first set of results before proceeding to the next condition. The next condition, instead of looking at the whole table, should look at the subset that was selected by the most selective condition. Ultimately, data is retrieved faster. The most selective condition may be unclear in complex queries with multiple conditions, subqueries, calculations, and several combinations of the AND, OR, and LIKE.

TIP: Always check your database documentation to see how SQL statements are processed in your implementation.

The following test is one of many we have run to measure the difference of elapsed time between two uniquely arranged queries with the same content. These examples use Oracle7.3 relational database management system. Remember, the optimizer in this implementation reads the WHERE clause from the bottom up.

Before creating the SELECT statement, we selected distinct row counts on each condition that we planned to use. Here are the values selected for each condition:

Condition	Distinct Values
`calc_ytd = '-2109490.8'`	13,000 +
`dt_stmp = '01-SEP-96'`	15
`output_cd = '001'`	13
`activity_cd = 'IN'`	10
`status_cd = 'A'`	4
`function_cd = '060'`	6

NOTE: The most restrictive condition is also the condition with the most distinct values.

The next example places the most restrictive conditions first in the WHERE clause:

INPUT:

SQL> SET TIMING ON
  2  SELECT COUNT(*)
  3  FROM FACT_TABLE
  4  WHERE CALC_YTD = '-2109490.8'
  5    AND DT_STMP = '01-SEP-96'
  6    AND OUTPUT_CD = '001'
  7    AND ACTIVITY_CD = 'IN'
  8    AND STATUS_CD = 'A'
  9    AND FUNCTION_CD = '060';

OUTPUT:

COUNT(*)
--------
       8
1 row selected.
Elapsed:  00:00:15.37

This example places the most restrictive conditions last in the WHERE clause:

INPUT/OUTPUT:

SQL> SET TIMING ON
  2  SELECT COUNT(*)
  3  FROM FACT_TABLE
  4  WHERE FUNCTION_CD = '060'
  5    AND STATUS_CD = 'A'
  6    AND ACTIVITY_CD = 'IN'
  7    AND OUTPUT_CD = '001'
  8    AND DT_STMP = '01-SEP-96'
  9    AND CALC_YTD = '-2109490.8';

COUNT(*)
--------
       8
1 row selected.
Elapsed:  00:00:01.80

ANALYSIS:

Notice the difference in elapsed time. Simply changing the order of conditions according to the given table statistics, the second query ran almost 14 seconds faster than the first one. Imagine the difference on a poorly structured query that runs for three hours!

Procedures

For queries that are executed on a regular basis, try to use procedures. A procedure is a potentially large group of SQL statements. (Refer to Chapter 13, "Advanced SQL Topics.")

Procedures are compiled by the database engine and then executed. Unlike an SQL statement, the database engine need not optimize the procedure before it is executed. Procedures, as opposed to numerous individual queries, may be easier for the user to maintain and more efficient for the database.

Avoiding OR

Avoid using the logical operator OR in a query if possible. OR inevitably slows down nearly any query against a table of substantial size. We find that IN is generally much quicker than OR. This advice certainly doesn't agree with documentation stating that optimizers convert IN arguments to OR conditions. Nevertheless, here is an example of a query using multiple ORs:

INPUT:

SQL> SELECT *
  2  FROM FACT_TABLE
  3  WHERE STATUS_CD = 'A'
  4     OR STATUS_CD = 'B'
  5     OR STATUS_CD = 'C'
  6     OR STATUS_CD = 'D'
  7     OR STATUS_CD = 'E'
  8     OR STATUS_CD = 'F'
  9  ORDER BY STATUS_CD;

Here is the same query using SUBSTR and IN:

INPUT:

SQL> SELECT *
  2  FROM FACT_TABLE
  3  WHERE STATUS_CD IN ('A','B','C','D','E','F')
  4  ORDER BY STATUS_CD;

ANALYSIS:

Try testing something similar for yourself. Although guides are excellent sources for standards and direction, you will find it is often useful to come to your own conclusions on certain things, such as performance.

Here is another example using SUBSTR and IN. Notice that the first query combines LIKE with OR.

INPUT:

SQL> SELECT *
  2  FROM FACT_TABLE
  3  WHERE PROD_CD LIKE 'AB%'
  4     OR PROD_CD LIKE 'AC%'
  5     OR PROD_CD LIKE 'BB%'
  6     OR PROD_CD LIKE 'BC%'
  7     OR PROD_CD LIKE 'CC%'
  8  ORDER BY PROD_CD;

SQL> SELECT *
  2  FROM FACT_TABLE
  3  WHERE SUBSTR(PROD_CD,1,2) IN ('AB','AC','BB','BC','CC')
  4  ORDER BY PROD_CD;

ANALYSIS:

The second example not only avoids the OR but also eliminates the combination of the OR and LIKE operators. You may want to try this example to see what the real-time performance difference is for your data.

OLAP Versus OLTP

When tuning a database, you must first determine what the database is being used for. An online analytical processing (OLAP) database is a system whose function is to provide query capabilities to the end user for statistical and general informational purposes. The data retrieved in this type of environment is often used for statistical reports that aid in the corporate decision-making process. These types of systems are also referred to as decision support systems (DSS). An online transactional processing (OLTP) database is a system whose main function is to provide an environment for end-user input and may also involve queries against day-to-Chapter information. OLTP systems are used to manipulate information within the database on a daily basis. Data warehouses and DSSs get their data from online transactional databases and sometimes from other OLAP systems.

OLTP Tuning

A transactional database is a delicate system that is heavily accessed in the form of transactions and queries against day-to-Chapter information. However, an OLTP does not usually require a vast sort area, at least not to the extent to which it is required in an OLAP environment. Most OLTP transactions are quick and do not involve much sorting.

One of the biggest issues in a transactional database is rollback segments. The amount and size of rollback segments heavily depend on how many users are concurrently accessing the database, as well as the amount of work in each transaction. The best approach is to have several rollback segments in a transactional environment.

Another concern in a transactional environment is the integrity of the transaction logs, which are written to after each transaction. These logs exist for the sole purpose of recovery. Therefore, each SQL implementation needs a way to back up the logs for use in a "point in time recovery." SQL Server uses dump devices; Oracle uses a database mode known as ARCHIVELOG mode. Transaction logs also involve a performance consideration because backing up logs requires additional overhead.

OLAP Tuning

Tuning OLAP systems, such as a data warehouse or decision support system, is much different from tuning a transaction database. Normally, more space is needed for sorting.

Because the purpose of this type of system is to retrieve useful decision-making data, you can expect many complex queries, which normally involve grouping and sorting of data. Compared to a transactional database, OLAP systems typically take more space for the sort area but less space for the rollback area.

Most transactions in an OLAP system take place as part of a batch process. Instead of having several rollback areas for user input, you may resort to one large rollback area for the loads, which can be taken offline during daily activity to reduce overhead.

Batch Loads Versus Transactional Processing

A major factor in the performance of a database and SQL statements is the type of processing that takes place within a database. One type of processing is OLTP, discussed earlier today. When we talk about transactional processing, we are going to refer to two types: user input and batch loads.

Regular user input usually consists of SQL statements such as INSERT, UPDATE, and DELETE. These types of transactions are often performed by the end user, or the customer. End users are normally using a front-end application such as PowerBuilder to interface with the database, and therefore they seldom issue visible SQL statements. Nevertheless, the SQL code has already been generated for the user by the front-end application.

Your main focus when optimizing the performance of a database should be the end-user transactions. After all, "no customer" equates to "no database," which in turn means that you are out of a job. Always try to keep your customers happy, even though their expectations of system/database performance may sometimes be unreasonable. One consideration with end-user input is the number of concurrent users. The more concurrent database users you have, the greater the possibilities of performance degradation.

What is a batch load? A batch load performs heaps of transactions against the database at once. For example, suppose you are archiving last year's data into a massive history table. You may need to insert thousands, or even millions, of rows of data into your history table. You probably wouldn't want to do this task manually, so you are likely to create a batch job or script to automate the process. (Numerous techniques are available for loading data in a batch.) Batch loads are notorious for taxing system and database resources. These database resources may include table access, system catalog access, the database rollback segment, and sort area space; system resources may include available CPU and shared memory. Many other factors are involved, depending on your operating system and database server.

Both end-user transactions and batch loads are necessary for most databases to be successful, but your system could experience serious performance problems if these two types of processing lock horns. Therefore, you should know the difference between them and keep them segregated as much as possible. For example, you would not want to load massive amounts of data into the database when user activity is high. The database response may already be slow because of the number of concurrent users. Always try to run batch loads when user activity is at a minimum. Many shops reserve times in the evenings or early morning to load data in batch to avoid interfering with daily processing.

You should always plan the timing for massive batch loads, being careful to avoid scheduling them when the database is expected to be available for normal use. Figure 15.1 depicts heavy batch updates running concurrently with several user processes, all contending for system resources.

Figure 15.1.
System resource contention.

As you can see, many processes are contending for system resources. The heavy batch updates that are being done throw a monkey wrench into the equation. Instead of the system resources being dispersed somewhat evenly among the users, the batch updates appear to be hogging them. This situation is just the beginning of resource contention. As the batch transactions proceed, the user processes may eventually be forced out of the picture. This condition is not a good way of doing business. Even if the system has only one user, significant contention for that user could occur.

Another problem with batch processes is that the process may hold locks on a table that a user is trying to access. If there is a lock on a table, the user will be refused access until the lock is freed by the batch process, which could be hours. Batch processes should take place when system resources are at their best if possible. Don't make the users' transactions compete with batch. Nobody wins that game.

Optimizing Data Loads by Dropping Indexes

One way to expedite batch updates is by dropping indexes. Imagine the history table with many thousands of rows. That history table is also likely to have one or more indexes. When you think of an index, you normally think of faster table access, but in the case of batch loads, you can benefit by dropping the index(es).

When you load data into a table with an index, you can usually expect a great deal of index use, especially if you are updating a high percentage of rows in the table. Look at it this way. If you are studying a guide and highlighting key points for future reference, you may find it quicker to browse through the guide from beginning to end rather than using the index to locate your key points. (Using the index would be efficient if you were highlighting only a small portion of the guide.)

To maximize the efficiency of batch loads/updates that affect a high percentage of rows in a table, you can take these three basic steps to disable an index:

1. Drop the appropriate index(es).

2. Load/update the table's data.

3. Rebuild the table's index.

A Frequent `COMMIT` Keeps the DBA Away

When performing batch transactions, you must know how often to perform a "commit." As you learned on Chapter 11, "Controlling Transactions," a COMMIT statement finalizes a transaction. A COMMIT saves a transaction or writes any changes to the applicable table(s). Behind the scenes, however, much more is going on. Some areas in the database are reserved to store completed transactions before the changes are actually written to the target table. Oracle calls these areas rollback segments. When you issue a COMMIT statement, transactions associated with your SQL session in the rollback segment are updated in the target table. After the update takes place, the contents of the rollback segment are removed. A ROLLBACK command, on the other hand, clears the contents of the rollback segment without updating the target table.

As you can guess, if you never issue a COMMIT or ROLLBACK command, transactions keep building within the rollback segments. Subsequently, if the data you are loading is greater in size than the available space in the rollback segments, the database will essentially come to a halt and ban further transactional activity. Not issuing COMMIT commands is a common programming pitfall; regular COMMITs help to ensure stable performance of the entire database system.

The management of rollback segments is a complex and vital database administrator (DBA) responsibility because transactions dynamically affect the rollback segments, and in turn, affect the overall performance of the database as well as individual SQL statements. So when you are loading large amounts of data, be sure to issue the COMMIT command on a regular basis. Check with your DBA for advice on how often to commit during batch transactions. (See Figure 15.2.)

Figure 15.2.
The rollback area.

As you can see in Figure 15.2, when a user performs a transaction, the changes are retained in the rollback area.

Rebuilding Tables and Indexes in a Dynamic Environment

The term dynamic database environment refers to a large database that is in a constant state of change. The changes that we are referring to are frequent batch updates and continual daily transactional processing. Dynamic databases usually entail heavy OLTP systems, but can also refer to DSSs or data warehouses, depending upon the volume and frequency of data loads.

The result of constant high-volume changes to a database is growth, which in turn yields fragmentation. Fragmentation can easily get out of hand if growth is not managed properly. Oracle allocates an initial extent to tables when they are created. When data is loaded and fills the table's initial extent, a next extent, which is also allocated when the table is created, is taken.

Sizing tables and indexes is essentially a DBA function and can drastically affect SQL statement performance. The first step in growth management is to be proactive. Allow room for tables to grow from Chapter one, within reason. Also plan to defragment the database on a regular basis, even if doing so means developing a weekly routine. Here are the basic conceptual steps involved in defragmenting tables and indexes in a relational database management system:

1. Get a good backup of the table(s) and/or index(es).

2. Drop the table(s) and/or index(es).

3. Rebuild the table(s) and/or index(es) with new space allocation.

4. Restore the data into the newly built table(s).

5. Re-create the index(es) if necessary.

6. Reestablish user/role permissions on the table if necessary.

7. Save the backup of your table until you are absolutely sure that the new table was built successfully. If you choose to discard the backup of the original table, you should first make a backup of the new table after the data has been fully restored.

WARNING: Never get rid of the backup of your table until you are sure that the new table was built successfully.

The following example demonstrates a practical use of a mailing list table in an Oracle database environment.

INPUT:

CREATE TABLE MAILING_TBL_BKUP AS
SELECT * FROM MAILING_TBL;

OUTPUT:

Table Created.

INPUT/OUTPUT:

drop table mailing_tbl;

Table Dropped.

CREATE TABLE MAILING_TBL
    (
    INDIVIDUAL_ID        VARCHAR2(12)     NOT NULL,
    INDIVIDUAL_NAME    VARCHAR2(30)    NOT NULL,
    ADDRESS            VARCHAR(40)    NOT NULL,
    CITY            VARCHAR(25)    NOT NULL,
    STATE            VARCHAR(2)    NOT NULL,
    ZIP_CODE        VARCHAR(9)    NOT NULL,
    )
    TABLESPACE TABLESPACE_NAME
    STORAGE    (    INITIAL        NEW_SIZE,
            NEXT        NEW_SIZE    );

Table created.

INSERT INTO MAILING_TBL
select * from mailing_tbl_bkup;

93,451 rows inserted.

CREATE INDEX MAILING_IDX ON MAILING TABLE
    (
    INDIVIDUAL_ID
    )
    TABLESPACE TABLESPACE_NAME
    STORAGE    (    INITIAL        NEW_SIZE,
            NEXT        NEW_SIZE    );

Index Created.

grant select on mailing_tbl to public;

Grant Succeeded.

drop table mailing_tbl_bkup;

Table Dropped.

ANALYSIS:

Rebuilding tables and indexes that have grown enables you to optimize storage, which improves overall performance. Remember to drop the backup table only after you have verified that the new table has been created successfully. Also keep in mind that you can achieve the same results with other methods. Check the options that are available to you in your database documentation.

Tuning the Database

Tuning a database is the process of fine-tuning the database server's performance. As a newcomer to SQL, you probably will not be exposed to database tuning unless you are a new DBA or a DBA moving into a relational database environment. Whether you will be managing a database or using SQL in applications or programming, you will benefit by knowing something about the database-tuning process. The key to the success of any database is for all parties to work together. Some general tips for tuning a database follow.

Minimize the overall size required for the database.

It's good to allow room for growth when designing a database, but don't go overboard. Don't tie up resources that you may need to accommodate database growth.

Experiment with the user process's time-slice variable.

This variable controls the amount of time the database server's scheduler allocates to each user's process.

Optimize the network packet size used by applications.

The larger the amount of data sent over the network, the larger the network packet size should be. Consult your database and network documentation for more details.

Store transaction logs on separate hard disks.

For each transaction that takes place, the server must write the changes to the transaction logs. If you store these log files on the same disk as you store data, you could create a performance bottleneck. (See Figure 15.3.)

Stripe extremely large tables across multiple disks.

If concurrent users are accessing a large table that is spread over multiple disks, there is much less chance of having to wait for system resources. (See Figure 15.3.)

Store database sort area, system catalog area, and rollback areas on separate hard disks.

These are all areas in the database that most users access frequently. By spreading these areas over multiple disk drives, you are maximizing the use of system resources. (See Figure 15.3.)

Add CPUs.

This system administrator function can drastically improve database performance. Adding CPUs can speed up data processing for obvious reasons. If you have multiple CPUs on a machine, then you may be able to implement parallel processing strategies. See your database documentation for more information on parallel processing, if it is available with your implementation.

Add memory.

Generally, the more the better.

Store tables and indexes on separate hard disks.

You should store indexes and their related tables on separate disk drives when- ever possible. This arrangement enables the table to be read at the same time the index is being referenced on another disk. The capability to store objects on multiple disks may depend on how many disks are connected to a controller. (See Figure 15.3.)

Figure 15.3 shows a simple example of how you might segregate the major areas of your database.

Figure 15.3.
Using available disks to enhance performance.

The scenario in Figure 15.3 uses four devices: disk01 through disk04. The objective when spreading your heavy database areas and objects is to keep areas of high use away from each another.

Disk01-- The system catalog stores information about tables, indexes, users, statistics, database files, sizing, growth information, and other pertinent data that is often accessed by a high percentage of transactions.
Disk02--Transaction logs are updated every time a change is made to a table (insert, update, or delete). Transaction logs are a grand factor in an online transactional database. They are not of great concern in a read-only environment, such as a data warehouse or DSS.
Disk03--Rollback segments are also significant in a transactional environment. However, if there is little transactional activity (insert, update, delete), rollback segments will not be heavily used.
Disk04-- The database's sort area, on the other hand, is used as a temporary area for SQL statement processing when sorting data, as in a GROUP BY or ORDER BY clause. Sort areas are typically an issue in a data warehouse or DSS. However, the use of sort areas should also be considered in a transactional environment.

TIP: Also note how the application tables and indexes have been placed on each disk. Tables and indexes should be spread as much as possible.

Notice that in Figure 15.3 the tables and indexes are stored on different devices. You can also see how a "Big Table" or index may be striped across two or more devices. This technique splits the table into smaller segments that can be accessed simultaneously. Striping a table or index across multiple devices is a way to control fragmentation. In this scenario, tables may be read while their corresponding indexes are being referenced, which increases the speed of overall data access.

This example is really quite simple. Depending on the function, size, and system-related issues of your database, you may find a similar method for optimizing system resources that works better. In a perfect world where money is no obstacle, the best configuration is to have a separate disk for each major database entity, including large tables and indexes.

NOTE: The DBA and system administrator should work together to balance database space allocation and optimize the memory that is available on the server.

Tuning a database very much depends on the specific database system you are using. Obviously, tuning a database entails much more than just preparing queries and letting them fly. On the other hand, you won't get much reward for tuning a database when the application SQL is not fine-tuned itself. Professionals who tune databases for a living often specialize on one database product and learn as much as they possibly can about its features and idiosyncrasies. Although database tuning is often looked upon as a painful task, it can provide very lucrative employment for the people who truly understand it.

Performance Obstacles

We have already mentioned some of the countless possible pitfalls that can hinder the general performance of a database. These are typically general bottlenecks that involve system-level maintenance, database maintenance, and management of SQL statement processing.

This section summarizes the most common obstacles in system performance and database response time.

Not making use of available devices on the server--A company purchases multiple disk drives for a reason. If you do not use them accordingly by spreading apart the vital database components, you are limiting the performance capabilities. Maximizing the use of system resources is just as important as maximizing the use of the database server capabilities.
Not performing frequent COMMITs--Failing to use periodic COMMITs or ROLLBACKs during heavy batch loads will ultimately result in database bottlenecks.
Allowing batch loads to interfere with daily processing--Running batch loads during times when the database is expected to be available will cause problems for everybody. The batch process will be in a perpetual battle with end users for system resources.
Being careless when creating SQL statements--Carelessly creating complex SQL statements will more than likely contribute to substandard response time.

TIP: You can use various methods to optimize the structure of an SQL statement, depending upon the steps taken by the database server during SQL statement processing.

Running batch loads with table indexes--You could end up with a batch load that runs all Chapter and all night, as opposed to a batch load that finishes within a few hours. Indexes slow down batch loads that are accessing a high percentage of the rows in a table.
Having too many concurrent users for allocated memory--As the number of concurrent database and system users grows, you may need to allocate more memory for the shared process. See your system administrator.
Creating indexes on columns with few unique values--Indexing on a column such as GENDER, which has only two unique values, is not very efficient. Instead, try to index columns that will return a low percentage of rows in a query.
Creating indexes on small tables--By the time the index is referenced and the data read, a full-table scan could have been accomplished.
Not managing system resources efficiently--Poor management of system resources can result from wasted space during database initialization, table creation, uncontrolled fragmentation, and irregular system/database maintenance.
Not sizing tables and indexes properly--Poor estimates for tables and indexes that grow tremendously in a large database environment can lead to serious fragmentation problems, which if not tended to, will snowball into more serious problems.

Built-In Tuning Tools

Check with your DBA or database vendor to determine what tools are available to you for performance measuring and tuning. You can use performance-tuning tools to identify deficiencies in the data access path; in addition, these tools can sometimes suggest changes to improve the performance of a particular SQL statement.

Oracle has two popular tools for managing SQL statement performance. These tools are explain plan and tkprof. The explain plan tool identifies the access path that will be taken when the SQL statement is executed. tkprof measures the performance by time elapsed during each phase of SQL statement processing. Oracle Corporation also provides other tools that help with SQL statement and database analysis, but the two mentioned here are the most popular. If you want to simply measure the elapsed time of a query in Oracle, you can use the SQL*Plus command SET TIMING ON.

SET TIMING ON and other SET commands are covered in more depth on Chapter 20, "SQL*Plus."

Sybase's SQL Server has diagnostic tools for SQL statements. These options are in the form of SET commands that you can add to your SQL statements. (These commands are similar to Oracle's SET commands). Some common commands are SET SHOWPLAN ON, SET STATISTIC IO ON, and SET STATISTICS TIME ON. These SET commands display output concerning the steps performed in a query, the number of reads and writes required to perform the query, and general statement-parsing information. SQL Server SET commands are covered on Day 19, "Transact-SQL: An Introduction."

Summary

Two major elements of streamlining, or tuning, directly affect the performance of SQL statements: application tuning and database tuning. Each has its own role, but one cannot be optimally tuned without the other. The first step toward success is for the technical team and system engineers to work together to balance resources and take full advantage of the database features that aid in improving performance. Many of these features are built into the database software provided by the vendor.

Application developers must know the data. The key to an optimal database design is thorough knowledge of the application's data. Developers and production programmers must know when to use indexes, when to add another index, and when to allow batch jobs to run. Always plan batch loads and keep batch processing separate from daily transactional processing.

Databases can be tuned to improve the performance of individual applications that access them. Database administrators must be concerned with the daily operation and performance of the database. In addition to the meticulous tuning that occurs behind the scenes, the DBA can usually offer creative suggestions for accessing data more efficiently, such as manipulating indexes or reconstructing an SQL statement. The DBA should also be familiar with the tools that are readily available with the database software to measure performance and provide suggestions for statement tweaking.

Q&A

Q If I streamline my SQL statement, how much of a gain in performance should I expect?

A Performance gain depends on the size of your tables, whether or not columns in the table are indexed, and other relative factors. In a very large database, a complex query that runs for hours can sometimes be cut to minutes. In the case of transactional processing, streamlining an SQL statement can save important seconds for the end user.

Q How do I coordinate my batch loads or updates?

A Check with the database administrator and, of course, with management when scheduling a batch load or update. If you are a system engineer, you probably will not know everything that is going on within the database.

Q How often should I commit my batch transactions?

A Check with the DBA for advice. The DBA will need to know approximately how much data you are inserting, updating, or deleting. The frequency of COMMIT statements should also take into account other batch loads occurring simultaneously with other database activities.

Q Should I stripe all of my tables?

A Striping offers performance benefits only for large tables and/or for tables that are heavily accessed on a regular basis.

Workshop

The Workshop provides quiz questions to help solidify your understanding of the material covered, as well as exercises to provide you with experience in using what you have learned. Try to answer the quiz and exercise questions before checking the answers in Appendix F, "Answers to Quizzes and Exercises."

Quiz

1. What does streamline an SQL statement mean?

2. Should tables and their corresponding indexes reside on the same disk?

3. Why is the arrangement of conditions in an SQL statement important?

4. What happens during a full-table scan?

5. How can you avoid a full-table scan?

6. What are some common hindrances of general performance?

Exercises

1. Make the following SQL statement more readable.

SELECT EMPLOYEE.LAST_NAME, EMPLOYEE.FIRST_NAME, EMPLOYEE.MIDDLE_NAME,
EMPLOYEE.ADDRESS, EMPLOYEE.PHONE_NUMBER, PAYROLL.SALARY, PAYROLL.POSITION,
EMPLOYEE.SSN, PAYROLL.START_DATE FROM EMPLOYEE, PAYROLL WHERE
EMPLOYEE.SSN = PAYROLL.SSN AND EMPLOYEE.LAST_NAME LIKE 'S%' AND
PAYROLL.SALARY > 20000;

2. Rearrange the conditions in the following query to optimize data retrieval time. Use the following statistics (on the tables in their entirety) to determine the order of the conditions:

593 individuals have the last name SMITH.

712 individuals live in INDIANAPOLIS.

3,492 individuals are MALE.

1,233 individuals earn a salary >= 30,000.

5,009 individuals are single.

Individual_id is the primary key for both tables.

SELECT M.INDIVIDUAL_NAME, M.ADDRESS, M.CITY, M.STATE, M.ZIP_CODE,
       S.SEX, S.MARITAL_STATUS, S.SALARY
FROM MAILING_TBL M,
     INDIVIDUAL_STAT_TBL S
WHERE M.NAME LIKE 'SMITH%'
  AND M.CITY = 'INDIANAPOLIS'
  AND S.SEX = 'MALE'
  AND S.SALARY >= 30000
  AND S.MARITAL_STATUS = 'S'
  AND M.INDIVIDUAL_ID = S.INDIVIDUAL_ID;
--------------