Web based School

What Is an RDBMS?

The Relational Database Model

Hierarchical and Network Models
Elements of the Relational Model

Relational Data Structures
Key Values and Referential Integrity
Relational Algebra

RDBMS Components

The RDBMS Kernel
The Data Dictionary

Nonprocedural Data Access (SQL)
Summary

1

What Is an RDBMS?

In recent years, database management systems (DBMS) have established themselves as the primary means of data storage for information systems ranging from large commercial transaction processing applications to PC-based desktop applications. At the heart of most of today's information systems is a relational database management system (RDBMS). RDBMSs have been the workhorse for data management operations for over a decade and continue to evolve and mature, providing sophisticated storage, retrieval, and distribution functions to enterprise-wide data processing and information management systems. Compared to the file systems, relational database management systems provide organizations with the capability to easily integrate and leverage the massive amounts of operational data into meaningful information systems. The evolution of high-powered database engines such as Oracle7 has fostered the development of advanced "enabling" technologies including client/server, data warehousing, and online analytical processing, all of which comprise the core of today's state-of-the-art information management systems.

Examine the components of the term relational database management system. First, a database is an integrated collection of related data. Given a specific data item, the structure of a database facilitates the access to data related to it, such as a student and all of his registered courses or an employee and his dependents. Next, a relational database is a type of database based in the relational model; non-relational databases commonly use a hierarchical, network, or object-oriented model as their basis. Finally, a relational database management system is the software that manages a relational database. These systems come in several varieties, ranging from single-user desktop systems to full-featured, global, enterprise-wide systems, such as Oracle7.

This chapter discusses the basic elements of a relational database management system, the relational database, and the software systems that manage it. Also included is a discussion of nonprocedural data access. If you are a new user to relational database technology, you'll have to change your thinking somewhat when it comes to referencing data nonprocedurally.

The Relational Database Model

Most of the database management systems used by commercial applications today are based on one of three basic models: the hierarchical model, the network model, or the relational model. The following sections describe the various differences and similarities of the models.

Hierarchical and Network Models

The first commercially available database management systems were of the CODASYL type, and many of them are still in use with mainframe-based, COBOL applications. Both network and hierarchical databases are quite complex in that they rely on the use of permanent internal pointers to relate records to each other. For example, in an accounts payable application, a vendor record might contain a physical pointer in its record structure that points to purchase order records. Each purchase order record in turn contains pointers to purchase order line item records.

The process of inserting, updating, and deleting records using these types of databases requires synchronization of the pointers, a task that must be performed by the application. As you might imagine, this pointer maintenance requires a significant amount of application code (usually written in COBOL) that at times can be quite cumbersome.

Elements of the Relational Model

Relational databases rely on the actual attribute values as opposed to internal pointers to link records. Instead of using an internal pointer from the vendor record to purchase order records, you would link the purchase order record to the vendor record using a common attribute from each record, such as the vendor identification number.

Although the concepts of academic theory underlying the relational model are somewhat complex, you should be familiar with are some basic concepts and terminology. Essentially, there are three basic components of the relational model: relational data structures, constraints that govern the organization of the data structures, and operations that are performed on the data structures.

Relational Data Structures

The relational model supports a single, "logical" structure called a relation, a two-dimensional data structure commonly called a table in the "physical" database. Attributes represent the atomic data elements that are related by the relation. For example, the Customer relation might contain such attributes about a customer as the customer number, customer name, region, credit status, and so on.

In relational database design literature, you might see a relation denoted as Relation(attribute1, attribute2, . . .) with the name of the relation followed by the attribute list enclosed in parentheses.

Customer(Customer_ID, Customer_Name, Region, . . .)

The actual data values for the attributes of a relation are stored in tuples, or rows, of the table. It is not necessary for a relation to have rows in order to be a relation; even if no data exists for the relation, the relation remains defined with its set of attributes. Figure 1.1 illustrates the basic elements of the Customer relation.

Figure 1.1. The basic components of a relation.

Key Values and Referential Integrity

Attributes are grouped with other attributes based on their dependency on a primary key value. A primary key is an attribute or group of attributes that uniquely identifies a row in a table. A table has only one primary key, and as a rule, every table has one. Because primary key values are used as identifiers, they cannot be null. Using the conventional notation for relations (shown in the note), an attribute is underlined to indicate that it is the primary key of the relation. If a primary key consists of several attributes, each attribute is underlined.

You can have additional attributes in a relation with values that you define as unique to the relation. Unlike primary keys, unique keys can contain null values. In practice, unique keys are used to prevent duplication in the table rather than identify rows. Consider a relation that contains the attribute, United States Social Security Number (SSN). In some rows, this attribute may be null in since not every person has a SSN; however for a row that contains a non-null value for the SSN attribute, the value must be unique to the relation.

Linking one relation to another typically involves an attribute that is common to both relations. The common attributes are usually a primary key from one table and a foreign key from the other. Referential integrity rules dictate that foreign key values in one relation reference the primary key values in another relation. Foreign keys might also reference the primary key of the same relation. Figure 1.2 illustrates two foreign key relationships.

Figure 1.2. Foreign keys that reference a primary key in another table as well as a primary key in the same table.

Many database design tools use underlines to denote primary keys in diagram and report views. Depending on the tool that you use, you might also see (PK) and (FK) next to attributes to denote primary and foreign keys, respectively.

You will notice frequent references to the DEPT and EMP sample tables throughout Oracle documentation. You can find these tables along with other sample database objects in Oracle's standard demonstration account. You usually access the account with the username/password combination scott/tiger.

You typically design a relational database using the rules of normalization that dictate which attributes belong in which relations. There are five levels (or forms) of normalization to which a data model can comply. Of the five, most database designs minimally conform to the third normal form. This form serves to alleviate redundancy in the data model, requiring each atomic data element to appear once in the data model and be dependent on one and only one primary key. Employing a normalized data model protects against insert, update, and delete anomalies that can arise as a result of incorrectly defined relations.

Relational Algebra

The relational model defines the operations that are permitted on a relation or group of relations. There are unary and binary relational operators, each of which result in another relation. You should find these operations somewhat intuitive and very similar to those used with set operations. Table 1.1 describes the seven operators used to manipulate relational structures. Binary operator types indicate that the operation uses two relations as operands; unary operators require a single relation as an operand.

Table 1.1. Algebra operations of the relational model.

*Operation*	*Type*	*Resulting Relation*
Union	Binary	Rows from the two relations are combined, eliminating duplicate rows.
Intersection	Binary	Rows common to two relations.
Difference	Binary	Rows that exist in the first relation but not in the second.
Projection	Unary	Rows that contain some of the columns from the source relation.
Selection	Unary	Rows from the source relation that meet query criteria.
Product	Binary	Concatenation of every row in one relation with every row in another.
Join	Binary	Concatenation of rows from one relation and related rows from another.

The source relations used by UNION, INTERSECTION, and DIFFERENCE must have attribute lists that match in number and data type.

RDBMS Components

Two important pieces of an RDBMS architecture are the kernel, which is the software, and the data dictionary, which consists of the system-level data structures used by the kernel to manage the database.

The RDBMS Kernel

You might think of an RDBMS as an operating system (or set of subsystems), designed specifically for controlling data access; its primary functions are storing, retrieving, and securing data. Like an operating system, Oracle7 manages and controls access to a given set of resources for concurrent database users. The subsystems of an RDBMS closely resemble those of a host operating system and tightly integrate with the host's services for machine-level access to resources such as memory, CPU, devices, and file structures. An RDBMS such as Oracle7 maintains its own list of authorized users and their associated privileges; manages memory caches and paging; controls locking for concurrent resource usage; dispatches and schedules user requests; and manages space usage within its tablespace structures. Figure 1.3 illustrates the primary subsystems of the Oracle7 kernel that manage the database.

Figure 1.3. An RDBMS and its multiple subsystems.

The Data Dictionary

A fundamental difference between an RDBMS and other database and file systems is in the way that they access data. A RDBMS enables you to reference physical data in a more abstract, logical fashion, providing ease and flexibility in developing application code. Programs using an RDBMS access data through a database engine, creating independence from the actual data source and insulating applications from the details of the underlying physical data structures. Rather than accessing a customer number as bytes 1 through 10 of the customer record, an application simply refers to the attribute Customer Number. The RDBMS takes care of where the field is stored in the database. Consider the amount of programming modifications that you must make if you change a record structure in a file system-based application. For example, if you move the customer number from bytes 1 through 10 to bytes 11 through 20 to accommodate an additional field, all the programs that use the customer number would require modification. However, using an RDBMS, the application code would continue to reference the attribute by name rather than by record position, alleviating the need for any modifications.

This data independence is possible because of the RDBMS's data dictionary. The data dictionary stores meta-data (data about data) for all the objects that reside in the database. Oracle7's data dictionary is a set of tables and database objects that is stored in a special area of the database and maintained exclusively by the Oracle7 kernel. As shown in Figure 1.4, requests to read or update the database are processed by the Oracle7 kernel using the information in the data dictionary. The information in the data dictionary validates the existence of the objects, provides access to them, and maps the actual physical storage location.

Figure 1.4. Access to application data through the Oracle7 Kernel and Data Dictionary.

Not only does the RDBMS take care of locating data, it also determines an optimal access path to store or retrieve the data. Oracle7 uses sophisticated algorithms that enable you to retrieve information either for the best response for the first set of rows, or for total throughput of all rows to be retrieved.

Nonprocedural Data Access (SQL)

An RDBMS differentiates itself with its capability to process a set of data; other file systems and database models process data in a record-by-record fashion. You communicate with an RDBMS using Structured Query Language (SQL, pronounced sequel). SQL is a nonprocedural language that is designed specifically for data access operations on normalized relational database structures. The primary difference between SQL and other conventional programming languages is that SQL statements specify what data operations should be performed rather than how to perform them. For example, consider a procedure to give a salary increase to a particular department for each employee who had not received a raise within the past six months. The code segments in Figure 1.5 illustrate the solution to the problem using both procedural and nonprocedural methods.

Figure 1.5. SQL programming versus traditional procedural programming methods.

Although the example in Figure 5.1 illustrates a simplistic scenario, consider a more complex application and the amount of programming that is alleviated by using SQL for data access. By reducing the amount of programming required for data access, the costs to develop and maintain the data access portions of an application are also reduced.

Summary

This chapter describes two aspects of a relational database management system: the relational database model and the database management system. The relational model defines relations, which are the underlying database structures; constraints, which are the rules that govern their relationships to one another; and the relational algebra operations that you can perform on relations. Relational database management systems work on sets of data and employ many of the concepts of basic set theory.

A full-featured management system for a relational database is a sophisticated, complex piece of software that functions very much like an operating system. One of the reasons that Oracle has been so successful and widely used is that it has been able to implement the same "logical" database operating system on a variety of host operating systems. User access to objects in the database is controlled by the RDBMS kernel and the meta-data stored in the data dictionary. Applications never access the data in the actual operating system data files directly; instead, all access is provided through the RDBMS.

Access to RDBMS data is accomplished through nonprocedural requests using SQL. Compared to conventional file system access, SQL provides "set-at-a-time" as opposed to "row-by-row" processing. The language elements and usage of SQL are covered in Chapter 4.