Perhaps you have been given the largest DEC Alpha computer known to mankind to run your NT Server with a huge disk farm, the maximum memory configuration, and only 10 users who want to print a few simple reports each day. Now back to the real world. Most of the servers typically encountered (whether UNIX, VMS, or NT) start out with plenty of capacity, but people soon see what can be done with them and start using them more and more. Over the past several decades, one trend is very clear: applications continue to demand more processing capacity, more memory, and more disk space as time progresses. Be cheerful; this trend helps to keep us computer gurus employed.
The downside of this trend is that you have to be ready for the increased load that your users will be putting on your server and deal with performance problems that come up. Perhaps one of the features you liked when you first read about Windows NT was that line in the marketing materials about Windows NT being a self-tuning operating system. Yes, it is true that Windows NT takes much better care of itself than systems such as VMS and UNIX. Especially in older versions of UNIX, many of the parameters were poorly documented (if at all). The whole system could be brought to its knees if one of these parameters was not adjusted properly for the load applied to your specific system.
Having hired some of the folks who designed one of the best of the previous generation of operating systems (VMS, which was revolutionary for its day), Microsoft tried to take a logical scheme for allocating resources and make it one better with algorithms that help divert resources to where they are most needed. Windows NT does an admirable job (especially considering that it has not been out for all that many years) at getting the most out of your existing computer systems running NT. However, there are a few real-world considerations that you need to factor in against your joy at having a self-tuning operating system:
Your hardware has real limits. You can get only so much processing out of a 486/66 computer, no matter how well tuned it is. The same applies to all your disk drives, memory areas, and other hardware items.
You can configure your system poorly so that no amount of automated tuning will help. An example of this would be a file server that has ten disk drives, only one of which contains information of interest to the users and is therefore the one that bears all the load for data transfer operations.
You can have bad applications that either you write or you buy from a vendor. A relatively simple function (such as a device driver or financial database) can put a huge load on the system. You might not have much control over this as the system administrator, but you might want to coach your developers or influence your users who are purchasing applications.
Your server can be limited by the loading and performance of your network. Many experienced computer systems professionals still find all those wires, concentrators, and protocols to be a dark mystery. However, that mystery has a significant impact on the overall performance of client/server applications and network operating system services (such as file sharing and printing). Therefore, you need to be conversant on what the capabilities of these networks are and when they are affecting your performance.
This chapter is devoted to arming you with the knowledge of when you need to intervene to help keep your system running at peak performance. This process starts with an introduction to the components of your hardware and operating system that relate to performance. It then introduces the monitoring tools that are used to help you figure out if your system is operating well and, if not, where the problems lie. Next, this chapter covers the various steps that can be taken to alleviate common problems. Finally, there is a discussion of capacity planning, which helps you ensure that you have the resources you need before you need them.
One final note is that there are entire guides devoted to this topic (such as in the resource kit). The goal of this chapter is not to capture all the wisdom of these large volumes in a matter of a few pages. Instead, my goal is to present material that helps you understand the basics of the tuning, optimization, and capacity-planning processes. That, combined with an understanding of the most common problems, will equip you to handle the vast majority of NT Server installations out there. Finally, don't forget that not all the tuning responsibility lies at the operating system administrator level. Applications, especially complex databases and three-tier client/server applications, can humble the most powerful computer systems if they are not tuned properly. Therefore, system tuning might involve obtaining support from database administrators or application administrators to ensure that their applications are making efficient use of your system resources.
The Challenge of Performance Management
A common theme in most businesses today goes something like "more for less." Because you want to keep the salary levels of information systems professionals such as yourself high, it is preferable to cut costs in other areas. This is where performance management comes into play. Because you have little control over the prices charged by your hardware and software vendors, your job is to acquire tools that help you get the job done at a reasonable cost and then use those tools effectively. Performance management addresses the effective use of the tools you have been given.
Although performance management has its benefits, it can also be a challenge. Just as you should not open the hood of your car and start adjusting the carburetor if you are not familiar with engines, you cannot start tweaking the operating system until you understand both the basics of your hardware platform and the operating system itself. In the next section, this challenge is taken on, covering the basics of your typical PC architecture. NT is delivered on other computer platforms, but this is not a guide on comparative computer architectures, and the general concepts of the Intel world are not that alien to the other environments. Typically, they change the arrangement of the components and the names of the busses.
Once you are familiar with the basics of your computer and operating system, the next challenge is to know when to adjust things. Just as you could completely mess up a car that is working perfectly by making adjustments when they are not needed, you need to know when to make adjustments to your Windows NT Server. Coupled with knowing when to make the adjustment is the knowledge of what to adjust. Both of these subjects are answered by the performance monitoring utilities provided by NT.
Once you detect a problem, you need to know what has to be done about it. Sometimes you can make adjustments that fix the problem without having to buy additional hardware or software. You can become really popular with management if you can implement such cost-effective solutions. However, part of the challenge is knowing when to tune and when to buy. Shops that contract out for labor usually better appreciate the cost of an hour of a person's time. You don't win anything by spending a large number of hours (such as rewriting a large application for efficiency with a team of people) to fix a problem that could be solved just as easily by purchasing a $300 disk drive that splits the input/output load.
The final section in this chapter discusses capacity planning and routine performance monitoring. The basic philosophy behind this section is that knowing how to solve a problem is good. However, knowing how to avoid the problem altogether is even better. This section discusses how you can set up a program to keep an eye on the load on your systems. Based on this data, you can anticipate when your hardware capacity will be exceeded or when an application will grow to the point where it needs to be moved to another server.
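To make the idea of anticipating a capacity limit concrete, here is a minimal sketch in Python. It fits a straight line to equally spaced utilization samples and estimates how many more periods pass before the trend reaches a limit. The function name, the sample figures, and the assumption of steady linear growth are all illustrative; real loads rarely grow this neatly.

```python
def periods_until_limit(samples, limit):
    """Fit a least-squares line to equally spaced samples and return how
    many periods after the last sample the trend reaches `limit`.
    Returns None if the trend is flat or falling."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(samples) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, samples))
    slope = sxy / sxx
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    crossing = (limit - intercept) / slope  # sample index where the line hits the limit
    return max(0.0, crossing - (n - 1))

# Hypothetical monthly disk usage, in percent of total capacity.
monthly_disk_used_pct = [40, 45, 50, 55, 60]
months_left = periods_until_limit(monthly_disk_used_pct, 90)
print(months_left)
```

A projection like this is only as good as the samples behind it, which is why collecting the data routinely matters more than the arithmetic.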
Windows NT Components and Performance
This section is the starting point for my performance management discussions. My goal is to cover the typical hardware and software found on Intel-based servers running Windows NT. These concepts can easily be extended to the other hardware platforms on which NT runs. The first challenge when working in the PC server world is the number of vendors out there. This is good when it comes to keeping pressure on the vendors to innovate and offer products at a reasonable price. However, it does complicate things for administrators who have to keep up with these innovations and keep the systems running. It is not just a matter of dealing with the different transfer rates of different types of hard disk drives. It involves having disk drives that have completely different data transfer architectures and capabilities. The industry is committed to ensuring that you don't get bored with a lack of new technologies to keep up on.
Intel-Based PC Architecture
So what is the Intel-based PC architecture? Figure 19.1 presents a sample of such an architecture. It starts with a processor chip made by Intel or one of the companies that produces chips compatible with those made by Intel (Cyrix, AMD, and so on). This chip dictates several things about the architecture; it defines the interface between the processor chip, cache memory, and main random access memory. Intel numbers its chips with numbers ending in 86 (such as 80386 and 80486), although the later generations have been marketed by fancier names, such as Pentium. Whatever you call them, they define the hardware standard for Intel-based PCs and drive how Microsoft builds the versions of Windows NT that are designed to run on these processors. For purposes of this discussion, all the internals of these Central Processing Unit (or CPU) chips are skipped, because they don't contain anything you need to worry about related to tuning.
Two of your most precious resources are connected to the processor chip by the highest-speed bus in your computer (which is 32 bits wide in the processors that run Windows NT 4.0). The first of these resources is the random access memory (RAM). This is the main memory in your computer, which enables you to hold the instructions that make up the programs you want to execute. The next resource, the second-level cache, is designed to make things a little faster. Conceptually, it is similar to random access memory, although it is faster. The goal is that if the operating system and processor guess correctly as to which instruction or data element needs to be retrieved next and draw it into this cache, you can speed up the operation of the computer. This becomes important when you purchase systems whose specifications mention cache. Typically, computers that have more cache for the same processor speed operate slightly faster.
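The benefit of cache can be approximated with a simple weighted average. The Python sketch below is a textbook simplification, not a measurement of any particular machine; the function name, the timings, and the hit ratios are made-up values chosen only to show the shape of the effect.

```python
def effective_access_ns(hit_ratio, cache_ns, ram_ns):
    """Average access time when a fraction `hit_ratio` of requests is
    served from cache and the remainder from main memory."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * cache_ns + (1.0 - hit_ratio) * ram_ns

# Assumed timings for illustration only: 15 ns cache, 70 ns RAM.
for ratio in (0.0, 0.5, 0.9):
    print(ratio, effective_access_ns(ratio, 15, 70))
```

The better the guesses about what is needed next, the higher the hit ratio and the closer the average comes to the speed of the cache itself.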
Because Windows NT operates on a variety of computer platforms, there is one thing related to the central processing units that needs to be discussed. The Intel 80x86 family of computers is what is known as a complex instruction set computer (CISC). This means that each instruction actually performs relatively complex tasks. This is in contrast to the reduced instruction set computers (RISC), such as the DEC Alpha, which process relatively simple tasks with each instruction. This becomes important when you consider the processing speed ratings that computer makers always throw at you. RISC computers almost always have higher speed ratings than CISC computers that perform the same amount of useful processing per unit of time. Also, because the Intel family continues to evolve, later generations of Intel processors usually out-perform older processors that have similar speed ratings, assuming that the operating system is designed to take advantage of these features. Windows NT is one of the few operating systems in the PC world that exploits many of the advanced features of recent Intel chips such as the Pentium.
Now back to the discussion of the computer architecture. So far, this chapter has discussed the three main components (CPU, cache, and RAM) that are attached to the processor's wonderful, high-speed data transfer bus. To keep this bus available for the important transfers between memory and the CPU, lower-importance communications are split off onto other busses within the computer. This was not the case in the early PC days. What happened here is that the slow stream of data from the various peripherals impeded the communications of the more critical components. Designers learned from this and placed these other devices on their own busses.
Many of the modern PCs that you would be considering for server configuration actually have two different types of supporting data transfer bus. The EISA (Extended Industry Standard Architecture) bus has been around for a while. Because of this, the majority of expansion cards on the market today support this architecture. It comes in two flavors: the original ISA form uses an 8-bit or 16-bit wide data transfer bus, and the extended form uses a 32-bit wide data transfer bus. This architecture was designed when processors and peripheral devices were much slower than they are today.
In recent years there have been several attempts to specify a new standard data transfer bus for the industry. The one that seems to be assuming this role is the PCI bus. This bus uses a 32-bit wide data path. Some other technical design considerations also enable it to transfer data at much higher rates than the older EISA bus. What this means to you is that if your server has a PCI bus and you can find a peripheral (such as a SCSI disk drive controller card) that supports this structure, you get higher levels of performance. Be aware, though, that most of these cards cost more than their EISA counterparts. Also, some peripherals (floppy disk drives, for example) cannot use the higher performance capabilities of the PCI bus. Most of the servers I have configured recently have a mix of EISA and PCI slots. You typically guard the PCI slots for needs such as 100 megabit per second network cards that need the speed of the PCI bus.
So where does that discussion leave you? You have a processor chip that executes the instructions that allow the computer to process information. You have two forms of very high speed data and instruction storage in the RAM and cache that have a direct line to the CPU. You then have controllers that link the high speed processor and memory bus to slower peripherals. The lower speed data transfer buses come in several different flavors. The key is that you have to match the data transfer bus used in your machine with the type of card you want to add to your machine.
Now that you are comfortable with those basic concepts, there are a few other wrinkles you should be aware of before you jump in and start measuring your system performance in preparation for a tuning run. Some of the cards you attach to the EISA, PCI, or other lower speed data transfer buses are actually controllers for a tertiary bus in the computer. The most common example of this is hard disk drive controllers. Of course, there are several different standards for these controllers (SCSI, Fast-Wide SCSI, IDE, and others) that you have to get used to and that offer different performance characteristics. I cover a few of these just so you understand some of the differences.
IDE stands for Integrated Drive Electronics, which simply means that a lot of the logic circuits are on the drive itself and therefore provide a smarter drive that burdens the system less and is capable of supporting higher transfer rates than the older drives (MFM, RLL and ESDI). These controllers can typically support two drives, which are referred to as master and slave (if one is a boot drive, it is typically the master). They are typically slower than SCSI controllers and support fewer drives per controller. This is by far the most popular drive architecture in today's PCs.
SCSI stands for Small Computer System Interface and was first commonly used in the UNIX world. As such, it was designed for a slightly higher transfer speed and also supported more devices (typically seven peripherals on a single controller). You can usually have multiple SCSI controllers in a system if you have really big disk needs. SCSI components cost a bit more, and the device drivers are a bit harder to come by, but many administrators prefer a SCSI bus for servers. You can also buy SCSI tape drives (my favorite is the 4 millimeter DAT tape drive), optical drives and a few other peripherals, such as scanners. The SCSI bus is an architecture designed for the more demanding data transfer requirements that used to occur only in UNIX workstations and servers but today are being levied on Windows NT Servers.
Other components also have their standards. Modems are often referred to as Hayes or US Robotics compatible. However, for the most part, these devices do their own thing, and it typically has a lesser impact on the performance of the computer system. One area that does have an impact on the system performance is the video subsystem. Several graphics cards take high-level graphics commands and work the display details out on the graphics card itself using on-board processors and special high-speed graphics memory. NT can be set up to interface directly with these cards at a high level, or it can be asked to work out the display details using the main CPU, which increases the load on the system.
For those of you who have already exceeded your limits on hardware-related information, be brave. There is only one more hardware topic to discuss before moving on to the software world. The final topic is the components themselves. Memory chips have speed ratings, but because the data transfer rate is synchronized by the processor bus, all you have to worry about is that your memory chips are good enough for the clock speed of your computer. The performance effects that are most commonly considered are those associated with disk drives. You might have a fast bus, but if the disk drive is slow, your overall data transfer rate is reduced. Therefore, if you have a system that needs a lot of performance, you need to check out the performance characteristics of your peripherals as well.
I'm sure you found that discussion of server hardware components absolutely fascinating, but where does that leave you? Computer systems are a collection of hardware components that interact to perform the tasks assigned to them. Figure 19.2 shows a hierarchy of these components. These levels need to act together to perform services for the user, such as accessing data on a hard disk drive. Each of these components can be purchased from several vendors. Each of these products has varying levels of price and performance. The difficulty is finding that correct blend of price and performance to meet your needs.
The Operating System and Its Interaction with Hardware
This section addresses the operating system and how it interacts with all this hardware. Chapter 2, Building Blocks of Windows NT, provides a more detailed discussion of the NT architecture. Think of the way you set up a new computer. Typically, you set up all the hardware first and then you install your operating system. If your hardware doesn't work (the machine is dead), it really doesn't matter what the operating system can do. This is a good way to think of the interaction of the operating system with the hardware. The operating system is designed to interface with the various hardware devices to perform some useful processing.
The first interface between the operating system and your computer is pretty well defined by Intel (or your hardware vendor, such as DEC) and Microsoft. The operating system contains a lot of compiled software that is written in the native language of the CPU and related processors (a bunch of 1s and 0s). The low-level interface to the computer chips is segregated into a small set of code that is specific to the particular computer chip used (this helps NT port between Intel, DEC Alpha, and other host platforms).
Each of the other hardware devices (expansion cards, SCSI controllers, and disk drives) responds to a series of 1s and 0s. The challenge here is that different devices use a different set of codes to control their devices. For example, you would transmit a specific binary number to eject the tape on a 4 millimeter DAT tape drive and another binary number to eject a CD-ROM from its drive. Walking through any major computer store and seeing the huge variety of hardware available gives me a headache just thinking about all the possibilities for controlling commands.
In the old DOS world, each application was pretty much on its own and had to worry about what codes were sent to each of the peripherals. That made the lives of the people who wrote DOS easier, but created great headaches for application developers. Realizing this problem, Windows (before Windows 95 and Windows NT) introduced the concept of device drivers. This enabled you to write your application to a standard interface with the operating system (part of the application programming interface, or API, for that operating system). It was then the operating system's job to translate what you wanted to have done into the appropriate low-level codes specific to a particular piece of hardware. The section of the operating system that took care of these duties was the device driver (sometimes referred to as printer drivers for printers).
It is now no longer your problem to deal with the hardware details as an application designer. You have a few other things to think about, though. Because different hardware devices have different capabilities (for example, some printers can handle graphics and others are text-only), the device driver has to be able to signal the applications when it is asked to do something that the attached device cannot handle. Also, because these device drivers are bits of software, some are better written than others. Some of these device drivers can get very complex and might have bugs in them. This leads to upgrades and fixes to device drivers that you have to keep up on. Finally, although Windows NT comes with a wide variety of device drivers (both Microsoft and the peripheral vendors want to make it easy for you to attach their products to your Windows NT machine), other devices might not have drivers on the NT distribution disks. New hardware or hardware from smaller vendors requires device driver disks (or CDs) for you to work with them. An excellent source of the latest device drivers is the Internet Web pages provided by Microsoft and most computer equipment vendors. The key to remember here is that you need to have a Windows NT compatible (not Windows 3.1 or Windows 95) device driver for all peripherals that you will be attaching to your NT system. If you are in doubt as to whether a peripheral is supported under NT, contact the peripheral vendor or look at the NT Hardware Compatibility List on the Microsoft Web page (www.microsoft.com or CD-ROMs if you lack Web access).
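The device driver idea can be sketched in a few lines of Python. This is a conceptual model only and not how NT drivers are actually written; the class names, the numeric command codes, and the UnsupportedOperation exception are all hypothetical, invented purely for illustration.

```python
class UnsupportedOperation(Exception):
    """Signals that the attached device cannot do what was asked."""

class DeviceDriver:
    """Standard interface the application sees."""
    def __init__(self):
        self.sent = []  # stands in for codes written to the hardware

    def eject(self):
        # Default behaviour: the device has no removable media.
        raise UnsupportedOperation(f"{type(self).__name__} cannot eject")

class DatTapeDriver(DeviceDriver):
    def eject(self):
        self.sent.append(0x1B)  # made-up device-specific code

class CdRomDriver(DeviceDriver):
    def eject(self):
        self.sent.append(0xA7)  # a different made-up code

class FixedDiskDriver(DeviceDriver):
    pass  # inherits the "unsupported" behaviour

def eject_media(driver):
    """Application code: one call, no knowledge of device codes."""
    try:
        driver.eject()
        return True
    except UnsupportedOperation:
        return False

for drv in (DatTapeDriver(), CdRomDriver(), FixedDiskDriver()):
    print(type(drv).__name__, eject_media(drv), drv.sent)
```

The application calls the same function for every device; only the driver knows which code goes out, and the driver is also what reports that a request cannot be honored.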
Now that you have eliminated the low-level interfaces to all that hardware, you have a basis on which you can build operating system services that help applications run under Windows NT. Using the same logic that drove the developers to build device drivers, there came to be several common functions that almost every application uses. Microsoft has always been sensitive to the fact that you want a lot of great applications to run on your operating system so that it is useful to people and they will buy it. Rather than have the application developers spend a lot of time writing this code for every application, Microsoft engineers decided to build this functionality into the operating system itself. Examples of some of these services include print queue processing (killing print jobs and keeping track of which job is to be sent to the printer next) and TCP/IP networking drivers. This actually has the side benefit of enabling Microsoft engineers who are specialists in these components and the operating system itself to write these services and drivers. This usually results in more powerful and efficient processes than would be written by your typical application programmer who has to worry about the screen layout and all those reports that accounting wants to have done by Friday.
Finally, at the top of the processing hierarchy are the end-user applications themselves. One could consider a computer as next to useless if it lacks applications that help people get their jobs done. Remember, UNIX talks about its openness, Macintosh talks about its superior interface that has existed for several years, and the OS/2 folks usually talk about the technical superiority of the internals of their operating system. All Microsoft has going for it is a few thousand killer applications that people like. One thing you have to remember as a system administrator now that you are at the top of the performance hierarchy is that you still have the opportunity to snatch defeat from the jaws of victory. You can have the best hardware, the best device drivers, and the most perfectly tuned Windows NT system in the world, but if your applications are poorly written, your users will suffer from poor system performance.
Figure 19.3 provides a summary of what this chapter has covered so far in a convenient, graphical form. Obviously, this drawing does not depict the interaction of all the components. The operating system interfaces with the CPU, and some peripherals actually interact with one another. What it does show is that several components need to work together to provide good performance as seen by the end users. This can be a complex job, but fortunately Windows NT provides you with the tools that help you manage this more easily than most other server operating systems. That is the subject of the rest of this chapter.
You might be wondering how to figure out all the options and have any idea whether this complex array of components was functioning at normal efficiency. There are articles that go through the theory behind components and try to calculate values for various configurations. However, because hardware technology continues to march on, and any configuration you might read about is quite possibly out of date by the time you read it, many prefer a more empirical approach.
Unlike most other operating systems, Windows NT is designed to handle most of the details of the interaction of its components with one another. You can control a few things, but I wouldn't recommend it unless you had a really unusual situation and a really serious need. You might do things that the designers never intended. Therefore, I start by accepting my hardware and operating system configuration. From there, I look for metrics that measure the overall performance of the resources that typically have problems and that I have some form of control over. For purposes of this chapter, I treat the vast complexity of computers running Windows NT as having five components that can be monitored and tuned:
CPU processing capacity
Random access memory and virtual memory capacity
Input/output capacity
Network transmission capacity
Application efficiency
CPU Processing Capacity
The first component specifies the amount of processing work that the central processing unit of your computer can handle in a given period of time. Although this might depend on the clock speed, the type of instruction set it processes, the efficiency with which the operating system uses the CPU's resources, and any number of other factors, I want to take it as a simple limit that is fixed for a given CPU and operating system. You should be aware that different CPUs exhibit different processing capacities based on the type of work presented to them. Some computers handle integer and text processing efficiently, whereas others are designed as numeric computation machines that handle complex floating-point (with decimals) calculations. Although this is fixed for a given family of processor (such as the Intel Pentium), you might want to consider this when you are selecting your hardware architecture.
RAM and Virtual Memory Capacity
The next component is RAM and virtual memory. Applications designers know that RAM can be read very quickly and has a direct connection to the CPU through the processor and memory bus. They are under pressure to make their applications do more and, of course, respond more quickly to the user's commands. Therefore, they continue to find new ways to put more information (data, application components, and so on) into memory, where it can be gotten much more quickly than if it were stuck on a hard disk drive or a floppy disk. Gone are the days of my first PC, which I thought was impressive with 256KB of memory. Modern NT Servers usually start at 24MB of RAM and go up from there.
With Windows NT and most other operating systems, you actually have two types of memory to deal with. RAM corresponds to chips physically inside your machine that store information. Early computer operating systems generated errors (or halted completely) when they ran out of physical memory. To get around this problem, operating system developers started to use virtual memory. Virtual memory combines the physical memory in the RAM chips with some storage space on one or more disk drives to simulate an environment that has much more memory than you could afford if you had to buy RAM chips. These operating systems have special operating system processes that figure out what sections of physical memory are less likely to be needed (usually by a combination of memory area attributes and a least recently used algorithm) and then transfer this data from RAM to the page file on disk that contains the swapped-out sections of memory. If the swapped-out data is needed again, the operating system processes transfer it back into RAM for processing. Obviously, because disk transfers are much slower than RAM transfers and you might be in a position of having to swap something out to make room and then transfer data into RAM, this can significantly slow down applications that are trying to perform useful work for your users.
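The least recently used idea can be illustrated with a toy simulation. The Python sketch below models pure LRU replacement only; it ignores the memory area attributes that a real operating system also weighs, and it makes no claim to match the algorithm NT actually uses. The page reference sequence and the frame counts are invented.

```python
from collections import OrderedDict

def count_page_faults(references, frames):
    """Simulate pure LRU page replacement and return the number of
    faults, that is, the times a page had to be brought in from disk."""
    if frames < 1:
        raise ValueError("need at least one frame")
    resident = OrderedDict()  # least recently used page is first
    faults = 0
    for page in references:
        if page in resident:
            resident.move_to_end(page)  # now the most recently used
        else:
            faults += 1
            if len(resident) >= frames:
                resident.popitem(last=False)  # evict the LRU page
            resident[page] = True
    return faults

# Hypothetical sequence of page numbers touched by running programs.
refs = [1, 2, 3, 1, 4, 2, 5, 1, 2, 3]
for frames in (2, 3, 4):
    print(frames, count_page_faults(refs, frames))
```

Running the same reference sequence with more frames produces fewer faults, which is the toy-model version of why adding RAM relieves a system that is paging heavily.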
Some use of virtual memory is harmless. Some operating system components are loaded and rarely used. Also, some applications have sections of memory that get loaded but are never used. The problem is when the system becomes so memory-bound that it has to swap something out, then swap something in, before it can do any processing for the users. Several memory areas are designated as "do not swap" to the operating system; these cause you to have even less space for applications than you might think. If you are using a modern database management system such as SQL Server or Oracle, be ready to have large sections of memory taken up in shared memory pools. Databases use memory areas to store transactions that are pending, record entries to their log file for later transfer to disk, and cache records that have been retrieved on the hope that they will be the next ones asked for by the users. They are probably the most demanding applications on most servers.
Input/Output Capacity
The next component to discuss is the input/output capacity of the computer. For today at least, you can simplify this to be the input/output capacity of your disk drives and controllers. Someday, not too long from now, you might be worrying about saturating your PCI bus with complex audio/video traffic. However, for now, I concentrate on disk drives. Typically, three components are involved with getting data from the high-capacity disk drives to the RAM memory where the CPU can use it. The first is the secondary data transfer bus controller, usually a PCI or EISA controller. The next is the actual disk drive controller, which is most often an IDE controller, with more reasonably priced SCSI showing up as time progresses. Finally, there is the disk drive itself. I discuss ways to measure the overall transfer capacity of the disk drive systems as if they were individual drives directly connected to the processor bus. This simplification treats the controllers and drives as a single entity and makes measurement and management easier. You still need to keep in the back of your mind the possibility that you might saturate the capacity of your disk drive controller even though you have not saturated the capacity of any of the individual disk drives attached to it.
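The caution about saturating a controller before any single drive can be checked with simple arithmetic. The Python sketch below assumes that drive loads simply add up on the shared controller; the function name and the throughput figures are invented placeholders, not ratings for any real controller or drive.

```python
def find_bottlenecks(controller_capacity, drive_capacity, drive_loads):
    """Return the names of saturated components. All figures use the
    same unit, for example megabytes per second."""
    saturated = [
        f"drive {i}"
        for i, load in enumerate(drive_loads)
        if load >= drive_capacity
    ]
    # The controller carries the combined traffic of every drive on it.
    if sum(drive_loads) >= controller_capacity:
        saturated.append("controller")
    return saturated

# Hypothetical: the controller handles 10 MB/s and each drive 4 MB/s.
print(find_bottlenecks(10, 4, [3.5, 3.5, 3.5]))
```

In this made-up case no drive has reached its own limit, yet the three together exceed what the controller can carry.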
Network Transmission Capacity
A very important component in many Windows NT Servers is the network transmission capabilities. Windows NT Server is a network-based computing environment. This is basically data input and output using a card similar to those used to attach disk drives, right? I separated my discussion of network I/O from other I/O for several reasons. First, the technologies are completely different and you have to get used to a different set of terminology. Also, you usually have to deal with a different group of engineers when trying to resolve problems. Second, unless you are also the network administrator, you typically do not completely own the network transmission system. Instead, you are just one of many users of the network. This means that you have to determine if you are the cause of the network bottleneck or if you are merely the victim of it. Finally, you end up using a different set of monitoring utilities when dealing with networks.
Network cards are an often overlooked component in a server. The networking world is somewhat deceptive because they quote transmission rates for the type of network you are using (for example, 10 million bits per second on Ethernet). This might lead you to think that it doesn't matter which network interface card you choose because the transmission rate is fixed anyway. Unfortunately, that is not the case. If you ran different network cards through performance tests, you would find that some are much more capable than others at getting information onto and off of a particular computer. Because most network cards on the market are designed for workstations that individually have relatively light network loads, you might have to look around to find a network interface card that is designed to meet the much more demanding need of a server that is continuously responding to network requests from several workstations.
Application Efficiency
Finally, I want to re-emphasize the idea of application efficiency as an important part of overall performance. Perhaps this is because I come from the perspective of the database administrator, where you continuously run across developers who are complaining that "the database is too slow." I can't tell you how many times I have looked at their software only to find that when they want ten numbers, they put the program in a loop and make ten calls to the database over a heavily loaded network, as opposed to one call that brings back all the data at once. Perhaps they are issuing queries without using the indexes that are designed to make such queries run faster. By changing a few words around in the query, they could take a 30-minute query and turn it into a 10-second query. (I'm not kidding; I saw this done on a very large data warehouse many times.) Unfortunately, this is one of those things that requires experience and common sense. You have to judge whether the application is taking a long time because it is really complex (for example, a fluid flow computation with a system that has 10,000 degrees of freedom) or because it is poorly tuned (they want to retrieve 10 simple numbers and it is taking 30 seconds). Of course, you need to be certain before you point the finger at someone else. That is what the Performance Monitoring Utilities section is all about. It gives you scientific data to show that you are doing your job properly before you ask for more equipment or tell people to rewrite their applications.
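To make the "ten calls versus one call" point concrete, here is a minimal sketch in Python using the standard sqlite3 module and an in-memory database. The table name, column names, and data are made up for illustration; a real SQL Server or Oracle application would use its own driver, and the cost of each extra call would be a network round trip rather than a local function call.

```python
import sqlite3

# In-memory database with a hypothetical "readings" table (illustrative only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (id INTEGER PRIMARY KEY, value REAL)")
conn.executemany(
    "INSERT INTO readings (id, value) VALUES (?, ?)",
    [(i, i * 1.5) for i in range(1, 101)],
)

wanted_ids = list(range(1, 11))  # the "ten numbers" the application needs


def fetch_one_at_a_time(connection, ids):
    """Ten separate calls: one query per value requested."""
    values = []
    for row_id in ids:
        row = connection.execute(
            "SELECT value FROM readings WHERE id = ?", (row_id,)
        ).fetchone()
        values.append(row[0])
    return values


def fetch_in_one_call(connection, ids):
    """One call: a single query that brings back every requested value."""
    placeholders = ",".join("?" for _ in ids)
    rows = connection.execute(
        f"SELECT id, value FROM readings WHERE id IN ({placeholders}) ORDER BY id",
        ids,
    ).fetchall()
    return [value for _, value in rows]


slow = fetch_one_at_a_time(conn, wanted_ids)
fast = fetch_in_one_call(conn, wanted_ids)
print(slow == fast)  # both approaches return the same data
```

Both functions return identical results; the difference is only in how many times the application has to go to the database to get them.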
In this section, I have tried to give an overview of how the hardware, software, and operating system interact to give an overall level of computer service to the users. The difficulty in this process is that a huge variety of hardware components, device drivers, and applications all come together to affect the performance of your Windows NT Server. I don't believe it is worthwhile to calculate out numbers that indicate the capacity of your system. Instead, I believe in measuring certain key performance indicators that are tracked by Windows NT. Over time, you will develop a baseline of what numbers are associated with good system performance and what are associated with poor performance. I have also simplified the vast array of components in the system to a list of five key components that you will want to routinely monitor. These are the ones that will cause your most common problems.
Self-Tuning and Windows NT
As I mentioned earlier, Windows NT is a self-tuning operating system, within limits. Don't get me wrong. Having worked with UNIX, I appreciate all the self-tuning that Windows NT does for you. What I want to ensure is that you don't think that NT can tune itself on a 486/33 to support thousands of users. With that caveat out of the way, I want to explore some of the things that Windows NT does to keep itself in tune for you and how you can affect this process. Once you understand what NT is doing to help itself, you will be in a better position to interpret monitoring results and therefore know when your intervention is required.
First, there are several things you do not want Windows NT to try to do for you. Moving files between disks to level the load might cause certain applications to fail when they cannot find their files. That is something you have to do for yourself. You also don't want NT to decide whether background services that have not been used for a while should be shut down. You might have users who need to use those services and applications and who cannot start the services automatically if they are currently shut down.
There are also some things Windows NT cannot do. It cannot change the jumper settings on your expansion cards and disk drives to reconfigure your system to be more efficient. It cannot rearrange cabling. Thankfully, it also cannot issue a purchase order to buy additional memory or CPU upgrades. Basically anything that requires human hands is still beyond the reach of Windows NT self-tuning.
So what does Windows NT have control over? Basically, it boils down to how Windows NT uses memory to improve its performance. There are several games it plays to try to hold the information you will probably want to work with next in memory. To do this it sets aside memory areas for disk caches and other needs. It also tries to adapt itself to your demonstrated processing needs. When changes in your needs are detected, it reallocates memory to try to best meet your needs. This is why you almost always find almost all your system memory used on a Windows NT Server even when you are not especially busy. Windows NT senses that there is a lot of memory available and it tries to apply it to best meet your anticipated needs rather than just having it sit around unused.
Another thing Windows NT can do automatically for you is adjust its virtual memory space. If your existing page file space (pagefile.sys) is not large enough to handle your needs, it can go after other space that is available on your disks to get additional temporary paging space. The downside of this is that the new page file space is probably not located next to the default page files on disk. Therefore, when you are writing to or reading from these page files, you will probably have extra delays as the heads on the disk drive move between the various files. Although disk drive transmission rates are relatively slow when it comes to paging, the physical movement of the arms that hold the heads is even slower, so it should be avoided whenever possible.
Although I'm sure that self-tuning is the topic of many of the white papers available on the Microsoft Web page or Technet CD, the previous page is good enough for these purposes. Although I spent only a page on the subject, do not underestimate the power of self-tuning. In UNIX, if you had to go in to tune the kernel, you were faced with a series of parameters that were ill-documented and had strange, unexpected effects on one another. I have more than once done more harm than good when changing UNIX kernel parameters to get applications such as databases going. However, self-tuning is not a magic silver bullet that solves all your problems. The next section starts the discussion of how you can tell when your system is having problems that self-tuning cannot solve and how to identify what the specific problem is.
Performance Monitoring Utilities
One of the more interesting things about coming to Windows NT after having worked on several other types of server platform is the fact that several useful, graphical administrative tools come as part of the operating system itself. Although UNIX and other such operating systems come with tools that can get the job done if you are fluent in them, they are neither friendly nor powerful. My main focus in this section is the built-in utility known as Performance Monitor. This tool enables you to monitor most of the operating system performance parameters that you could possibly be interested in. It supports a flexible, real-time interface and also enables you to store data in a log file for future retrieval and review.
However, before I get too far into the tools used to measure performance, a few topics related to monitoring need to be covered. Although you could put your ear to your computer's cabinet to see if you hear the disk drives clicking a lot, it is much easier if the operating system and hardware work together to measure activities of interest for you. Windows NT can monitor all the critical activities of your system and then some. Your job is to wade through all the possible things that can be monitored to determine which ones are most likely to give you the information you need.
How does Windows NT measure activities of interest? The first concept you have to get used to is that of an object. Examples of objects in Windows NT performance would be the (central) processor or logical disk drives. Associated with each of these objects is a series of counters. Each of these counters measures a different activity for that object, such as bytes total per second and bytes read per second. That brings me to a good point about counters. To be useful, counters have to measure something that is useful for indicating a load on the system. For example, six million bytes read does not tell you very much. If it were six million bytes per second, that would be a significant load. If it were six million bytes read since the system was last rebooted two months ago, it would be insignificant. Therefore, most of your counters that show activity are usually rated per second or as a percentage of total capacity or usage (as in percent processor time devoted to user tasks). A few items, such as number of items in a queue, have meaning in and of themselves (keeping the queue small is generally a good idea).
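The "six million bytes" example can be worked through in a few lines of Python. This is only a sketch of the arithmetic; the function name is my own and is not part of any Windows NT interface.

```python
def per_second_rate(first_value, second_value, seconds_between):
    """Turn two readings of a running total into a per-second rate."""
    if seconds_between <= 0:
        raise ValueError("seconds_between must be positive")
    return (second_value - first_value) / seconds_between


SECONDS_IN_TWO_MONTHS = 60 * 24 * 60 * 60  # roughly 60 days

# Six million bytes read during a single second is a significant load.
one_second = per_second_rate(0, 6_000_000, 1)

# The same six million bytes spread over two months is barely a trickle.
two_months = per_second_rate(0, 6_000_000, SECONDS_IN_TWO_MONTHS)

print(one_second)
print(round(two_months, 2))
```

The raw total is the same in both cases; only the rate tells you whether the disk was busy.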
Another important concept when working with counters is that you have to measure them over an appropriate time interval. A graph showing every instant in time would produce an enormous amount of data very quickly. To control this, measurement programs such as Performance Monitor average the values over an amount of time you specify. You have to be somewhat careful when you specify the time interval. For example, if you average the data over a day, you can see long-term trends on the increase in usage of your system when you compare the various days. However, you would not notice the fact that the system is on its knees from 8 to 9 am and from 1 to 2 pm. This is what your users will notice; therefore, you need to be able to measure over a more reasonable interval, such as several minutes. I'll show you later how to turn monitoring on and off automatically, so you aren't collecting a lot of relatively useless data when no one is using your server.
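Here is a small Python sketch of why the averaging interval matters. The hourly processor figures are invented for illustration: the server is nearly idle except for the 8 am and 1 pm hours.

```python
from statistics import mean

# Hypothetical hourly average processor use (percent) for one day.
hourly_cpu = [10.0] * 24
hourly_cpu[8] = 95.0   # 8 to 9 am
hourly_cpu[13] = 95.0  # 1 to 2 pm

# Averaged over the whole day, the server looks lightly loaded.
daily_average = mean(hourly_cpu)

# Looked at hour by hour, the two trouble spots stand out.
busy_hours = [hour for hour, value in enumerate(hourly_cpu) if value > 80.0]

print(round(daily_average, 1))
print(busy_hours)
```

The daily figure comes out at about 17 percent, which hides exactly the two hours your users will complain about.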
Another important point about monitoring is that you need to monitor the system without influencing the data. For example, imagine setting up several dozen instances of Performance Monitor to measure all the parameters that could possibly be needed for later analysis using a time interval of 1 second and logging all this information to a single disk drive. The data you collected by this process would be heavily influenced by the load placed on the system to collect, process, and store the information related to monitoring. Your counters for the number of processes running, threads, and data transfer to that logging disk might actually reflect only the load of the monitoring application and not show any data about the use of the system by other applications.
A final term you need to get comfortable with that relates to Windows NT performance monitoring is that of an instance. Most of the objects monitored by Performance Monitor have multiple instances. An example of this is when you try to monitor the logical disk object: the system needs to know which of your logical disks (such as the C drive) you want to monitor. As you see later, Performance Monitor provides you with a list of available instances for the objects that have them.
In summary, monitoring under Windows NT is provided by a built-in, graphical utility known as Performance Monitor. This tool monitors many different types of objects (for example, processors). Each object has several possible attributes that might need to be measured. Windows NT refers to these attributes of the objects as counters (for example, percent processor time). Finally, there are more than one of many of the objects in your system. When there are multiple items of a given object, you need to tell Performance Monitor which instance of that object you want to have measured and for which of the counters. Figure 19.4 illustrates these concepts.
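One way to keep the computer, object, counter, and instance ideas straight is to picture each thing you monitor as a small record. The Python sketch below is my own illustration, not something Performance Monitor exposes; the class name and the textual path layout (double backslash, computer, object, instance in parentheses, counter) are assumptions chosen for readability.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CounterSelection:
    """One monitored item: which computer, object, counter, and instance."""

    computer: str
    obj: str
    counter: str
    instance: Optional[str] = None  # None for objects without instances

    def path(self) -> str:
        # Illustrative layout only: \\computer\object(instance)\counter
        instance_part = f"({self.instance})" if self.instance is not None else ""
        return f"\\\\{self.computer}\\{self.obj}{instance_part}\\{self.counter}"


# SERVER1 is a hypothetical computer name.
selection = CounterSelection("SERVER1", "Processor", "% Processor Time", "0")
print(selection.path())
```

Two selections that differ only in the instance (for example, processor 0 and processor 1) are separate lines on a chart, which is why Performance Monitor asks for all four pieces.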
As I alluded to earlier in my discussion, Windows NT provides you with a large number of counters to select from. I found the following objects, with the number of counters associated with them in parentheses:
Logical Disk (21)
NetBEUI Resource (3)
Network Interface (17)
NWLink IPX (39)
NWLink NetBIOS (39)
NWLink SPX (39)
Objects (Events, Mutexes, Processes, Sections, Semaphores, and Threads)
Paging File (2)
Physical Disk (19)
RAS Port (17)
RAS Total (18)
Server Work Queue (17)
Now on to one of the really interesting features of Performance Monitor. Much as the Event Log can be used by applications to give you one place to look at for things that have happened (information and problems) on your system, applications can be interfaced with Performance Monitor to give you one place to look for performance information. One of the things I found useful is that the monitoring options appear on the list of objects only when you have the appropriate applications running. For example, the five Performance Monitor objects associated with SQL Server and the nine Performance Monitor objects associated with Exchange Server are listed only when you have those servers running. This does not prevent you from running Performance Monitor in multiple windows to monitor things one at a time. It does enable you to put the usage data for an application on the same graph as critical operating system parameters to see how the application is interacting with the operating system. Think of it as being able to put the blame on a particular application for bringing your server to its knees.
This brings me to my first bit of advice related to using the Performance Monitor built into Windows NT: Try it out. It can monitor numerous things, but the interface is fairly simple. It is not like the registry editor, where you are doing something risky. You might waste a little time playing with this utility when everything is going well on your system. Being comfortable with selecting the various counters and graphical options could pay you back in the future when things are not going well and you are under pressure to solve the problems quickly.
With all this performance theory and terminology out of the way, it is time to actually look at Performance Monitor and see how it can be used to get you the data you need. Figure 19.5 shows you how to access Performance Monitor in the Start menu hierarchy. As you can see, it is conveniently located with all the other administrative tools that you will come to depend upon to keep your server going.
What does Performance Monitor do for you? Figure 19.6 shows my favorite means of displaying data: the line graph. It is a flexible utility that has other ways to capture data, but for now you can learn a lot about the tool using this format. Following is a description of the basic interface:
Across the top is a traditional Windows menubar that enables you to access all your basic processing options.
Underneath the menu is a tool bar that enables you to pick the most commonly used options with a single mouse click. The first four buttons enable you to select the type of display to be presented: chart view, alert view, log view, and report view. The next three buttons enable you to add, modify, or delete a particular counter from your list via a pop-up dialog box. The next button enables you to take a snapshot of performance data (for when you are not collecting data at predefined intervals). The next to last button enables you to write a comment (guidemark) to the log file at a particular time that might help jog your memory later when you are reviewing the data. The final button displays the Display Options dialog to enable you to set things up to your liking.
The majority of the display is consumed with the display of performance data. I will go over the various display formats shortly.
Before leaving the basics of the Performance Monitor display, I wanted to cover a few options that might appeal to some of you. Those of you who are comfortable with several windows or toolbars being open on your desktop at a given time so you can see everything that is happening can use a few control keys to minimize the Performance Monitor display. Figure 19.7 shows a Performance Monitor graph of CPU processor time use in a nice, graph-only window. To do this, I hit the following keys: Ctrl+M (toggles the menu on and off); Ctrl+S (toggles the status line on and off); and Ctrl+T (toggles the toolbar on and off). You can hit Ctrl+P to toggle a setting that keeps the Performance Monitor display on top of other windows on your desktop. All you have to do is set up Performance Monitor to monitor the items you want, trim off the menu, toolbar, and status window, then size the window and move it to where you want it.
When you first start Performance Monitor, you notice that it is not doing anything. With the large number of monitoring options, Microsoft was not willing to be presumptuous and assume which counters should be monitored by default. Therefore, you have to go in and add the options, counters, and instances you want to get the system going. It is not at all difficult to accomplish this task. The first thing you do is click the plus sign icon on the toolbar. You are presented with the Add to Chart dialog box, shown in Figure 19.8. From here, all you have to do is select the options you want:
Computer: This box enables you to select which computer you want to monitor. This is a really nice feature because it lets you sit at your desk and use your terminal to monitor the world (at least that part of the world to which you have privileges). The selection box at the right helps you locate the computers that are available for you to monitor, or you can do it the old-fashioned way by typing a double backslash followed by the computer name.
Object: This is a drop-down listbox that enables you to select which object you want to monitor. The object selected drives the legal values that are displayed in the next two controls.
Counter: This is a scrollable list that enables you to choose which of the counters associated with your selected object you want to monitor.
Instance: To the right of the Object and Counter controls is a scrollable list of the instances of the object you have selected (which disk drive).
Add: To add the specified object counter instance you have entered to the list of counters monitored, click the Add button.
Explain: To get a more detailed explanation of the counter you are selecting, click the Explain button and you get some text at the bottom of the dialog box (see Figure 19.9). This text could be a little more detailed and easy to understand, but it can be useful in certain circumstances.
The Add to Chart dialog box with the Explain option selected.
Cancel: When you are finished, click the Cancel button before you click the Add button, or click Done after you click the Add button. This takes you back to the main display so that you can see what is happening.
You should note that this same dialog box is used for all the types of monitoring (chart, alert, log, and report) that are supported by Performance Monitor. The last word in the title bar changes to correspond to the display option with which you are working.
Finally, you see Color, Scale, Width, and Style controls at the bottom of this dialog box. Performance Monitor automatically cycles through a predefined color and line pattern list as you add different counters, but you might want to take control of this decision to suit your artistic sense. These buttons let you do that.
Considerations for Charts
Here are a few considerations for laying out this chart. So far I have presented relatively simple graphs to illustrate my points. However, now consider Figure 19.10, which I intentionally made complex to illustrate a few new points. First, you might find it hard to distinguish which line corresponds to which counter. There are several lines, and they are all crossing one another. Very few people, even those comfortable with graph reading, can follow more than a few lines. You might want to keep this in mind when you take your wonderful charts before management to prove a point. Also, you might find it especially difficult to read the graphs presented here. That is because Microsoft uses different colors and patterns for the lines to help you pick out which line corresponds to which data element. Although there are shading differences between the various colors and there is the line pattern, you might want to keep the number of lines on your graph especially small if you are presenting it in black and white.
It can take some time to customize this display with the counters you want to monitor, the display options, and the colors that appeal to your sense of beauty. It would be a real pain if you had to go through this process each time you wanted to start up Performance Monitor. The boys from Redmond have not let you down. They have provided the Save Chart Settings option under the File menu in Performance Monitor (see Figure 19.11) for you to save your settings for the chart. You can also save all the settings for your charts, option menu selections, and so forth in a workspace file using the Save Workspace menu option. These features can be quite powerful. You can actually create a series of Performance Monitor settings files in advance when everything is working well. You can even record data for these key parameters that correspond to times when the system is working well. Then, when a crisis comes up, you can quickly set up to run performance charts that can be compared with the values for the system when it was running well. This up-front preparation can really pay off when everyone is running around screaming at you, so you might want to put it on your list of things to do.
If you develop applications in addition to administering an NT system, one of the things you are probably aware of is that report generation and presentation utilities are the ones that generate the most debate and interest in the user community. You can get away with a wide variety of data entry forms, as long as they are functional and efficient. However, when it comes to reports and presentation utilities, you will have users arguing with each other about which columns to put on the report, what order the columns have to be displayed in, and sometimes how you calculate the values in the columns. I have seen people go at it for hours arguing such little details as the font used and what the heading of the document should be. You are in the position of being the consumer of a data presentation utility (Performance Monitor), and Microsoft knows that everyone has slightly different tastes. Therefore, they give you several options for presenting the data. Not only are there basic format options such as alert reports versus the graphical charts I have been discussing, but they even let you get into the details as to how the data on a chart is collected and presented. The Chart Options dialog box is shown in Figure 19.12.
This dialog box enables you to set the following presentation options:
Legend: This checkbox enables you to specify whether to take up part of the screen display with a legend (the section where you match a line color and pattern to a specified computer, object, counter, and instance). Unless this is absolutely obvious to anyone walking up to the display, I recommend keeping the legend available at all times.
Value Bar: This checkbox controls whether the Last, Average, Minimum, Maximum, and Graph Time data displays are presented just below the graph itself (see Figure 19.12, which has this option selected).
Vertical Grid: This checkbox controls whether vertical lines are drawn in the middle of the graph to help you see what the times corresponding to a given data point are. (Both horizontal and vertical lines were selected for the graph displayed in Figure 19.12.)
Horizontal Grid: This checkbox controls whether horizontal lines are drawn in the middle of the graph to help you see what the values of a given data point are.
Vertical Labels: This checkbox controls whether the scale of values for the vertical line is displayed.
Gallery: This option contains radio buttons to select either Graph (a line graph where all the data points are represented as dots connected together with a line) or Histogram (a bar chart where each data point is represented as a vertical bar whose height corresponds to the value being measured).
Vertical Maximum: This edit box enables you to control whether you display the whole range of values possible as determined by NT (0 to 100 percent) or you focus on a narrower range of data (such as 0 to 50 percent). This can be useful in situations where you want to see finer variations in data that is confined to a narrower range.
Update Time: This section enables you to specify whether you want Periodic Update or Manual Update. Periodic Update is the default, in which Performance Monitor automatically collects a data point for the parameters being monitored at the interval (in seconds) specified in the edit box. Manual Update collects data manually, by clicking the Performance Monitor toolbar item that looks like a camera, by selecting the Options menu and Update Now, or by pressing Ctrl+U.
By now you should be impressed with the wide array of charting options Performance Monitor provides. You are probably a little uncertain as to which of the many options you will want to use in your environment. I give my list of favorite counters to monitor later in this section. Also, as I mentioned before, there is no substitute for sitting down and actually playing with Performance Monitor to get comfortable with it and see how you like the environment to be set up. What I want to cover now are the other data collection and presentation options provided by Performance Monitor. You have so far experienced only one of the four possible presentation formats. The good news is that although the format of the displays is different, the thinking behind how you set things up is the same, so you should be able to adapt quickly to the other three presentation formats.
The Alert View in Performance Monitor
Figure 19.13 presents the next display option Performance Monitor provides: the alert view. The concept behind this is really quite simple. Imagine that you want to keep an eye on several parameters on several servers to detect any problems that come up. The problem with the chart view is that the vertical line keeps overwriting the values so that you see a fixed time interval and lose historical information. You probably don't want to have to sit in front of your terminal 24 hours a day waiting for a problem value to show up on your chart, either. Later, I explain the log view, which lets you capture each piece of data and save it in a file for future reference. However, if you want to have a fairly reasonable time interval that enables you to detect response time problems for your users (for example, 60 seconds), you will generate a mountain of data in a relatively short period of time.
When you look at the data that would be captured in a Performance Monitor log file, the one thing that hits you is that the vast majority of the data collected is of no interest to you. It shows times when the system is performing well and there are no problems. Disk drives are relatively inexpensive these days, so it isn't that much of a burden to store the data. However, your time is valuable, and it would take you quite some time to wade through all the numbers to find those that might indicate problems. People are generally not very good at reading through long lists of numbers containing different types of data to find certain values. Computers, on the other hand, are quite good at this task. The alert view combines savings on disk space and the task of sorting through values that are of no interest into one utility. It lets you record data only when the values reach certain critical values that you tell Performance Monitor are of interest to you.
To set up monitoring with the alert view in Performance Monitor, you first select this view from the toolbar (the icon with the log guide with an exclamation mark on it). Next, you click the plus sign icon to add counters to be monitored. The Add to Alert dialog box you get (see Figure 19.14) has the basic controls that you used to specify the counters under the graph view (Computer, Object, Counter, and Instance). The key data that is needed to make the alert view work is entered at the bottom of the screen. The Alert If control enables you to specify when the alert record is written. You specify whether you are interested in values that are either under or over the value you enter in the edit box. The Run Program on Alert control gives you the option to run a program when these out-of-range values are encountered. You can specify whether this program is to be run every time the alert condition is encountered or only the first time it is encountered.
The alert view can be really useful if you want to run it over a long period of time (or continuously whenever the server is running). It collects only data that is of interest to you. You can specify values that indicate both extremely high load and extremely low load. You can even kick off a program to take corrective action or enable more detailed monitoring when you run across an unusual condition. You still have to make time to view the data, but at least you get to avoid reading anything but potentially interesting events.
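The decision logic behind an alert is simple enough to sketch in Python. This is only my model of the behavior described above (over or under a value, run a program the first time or every time); the class and field names are hypothetical and the sample values are invented.

```python
from dataclasses import dataclass, field


@dataclass
class AlertRule:
    """A single alert condition on one counter."""

    counter: str
    threshold: float
    over: bool = True             # True: alert if over; False: alert if under
    run_every_time: bool = False  # False: run the program on the first alert only
    fired: bool = field(default=False, init=False)

    def check(self, value):
        """Return (alert_recorded, run_program) for one sampled value."""
        if self.over:
            triggered = value > self.threshold
        else:
            triggered = value < self.threshold
        if not triggered:
            return False, False
        run_program = self.run_every_time or not self.fired
        self.fired = True
        return True, run_program


rule = AlertRule("% Processor Time", threshold=90.0, over=True)
samples = [40.0, 92.0, 95.0, 30.0, 97.0]

records = []
for sample in samples:
    recorded, run_program = rule.check(sample)
    if recorded:  # only out-of-range values are written down
        records.append((sample, run_program))

print(records)
```

Of the five samples, only the three above 90 are recorded, and the program would be started only for the first of them.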
The Log View in Performance Monitor
The next view provided by Performance Monitor is the log view. The principle behind this is simple: you write all the data to a log file. Later, you can open this log file using Performance Monitor and view the results or export the results to a data file that can be imported into a spreadsheet like Microsoft Excel for further processing, or you could even write your own program to read this file and massage the data. The log view is relatively simple to work with. First, you need to specify the parameters that you want to log using the plus sign toolbar icon. Next, you have to specify the file that is to contain the logging data and how often the data points are to be collected. Figure 19.15 shows the Log Options dialog box, which captures this information.
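If you do decide to write your own program to massage exported data, it does not have to be elaborate. The Python sketch below assumes a simple comma-separated layout with a Time column followed by one column per counter; that layout, the counter names, and the sample values are my assumptions for illustration, so check the file your own export produces before relying on it.

```python
import csv
import io
from statistics import mean

# Made-up sample in an assumed layout: a time column, then one column per counter.
exported = """Time,% Processor Time,Pages/sec
08:00:00,35.0,4.0
08:01:00,92.0,60.0
08:02:00,88.0,55.0
08:03:00,20.0,2.0
"""


def summarize(csv_text):
    """Return the minimum, maximum, and average of each counter column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = [name for name in reader.fieldnames if name != "Time"]
    values = {name: [] for name in columns}
    for row in reader:
        for name in columns:
            values[name].append(float(row[name]))
    return {
        name: {"min": min(v), "max": max(v), "avg": mean(v)}
        for name, v in values.items()
    }


summary = summarize(exported)
for name, stats in summary.items():
    print(name, stats)
```

To read a real file instead of the embedded sample, you would pass the contents of that file to summarize().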
The Log Options dialog box contains your standard file selection controls. You can add to an existing file or make up a new filename. You can even keep all your log files in a separate directory (or on a network drive) so that it is easy to find them. Once you have specified the filename, your only other real decision is the update interval. Again, you have to be careful, because if you choose a 10-second interval, you will generate over 8,600 data points per counter selected per day. All that's left is to click the Start Log button to get the process going. The Log Options dialog is also used to stop the logging process. Note that you need to keep Performance Monitor running to continue to collect data to this log file. I will show you later how to run Performance Monitor as a service in the background.
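The arithmetic behind the update interval warning is easy to check before you commit to a setting. The following Python sketch computes how many data points a given interval produces per day; the function name points_per_day is made up for this illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def points_per_day(interval_seconds, counters=1):
    """Number of samples written per day for a given update interval."""
    if interval_seconds <= 0:
        raise ValueError("interval must be positive")
    return int(SECONDS_PER_DAY // interval_seconds) * counters


print(points_per_day(10))      # one counter sampled every 10 seconds
print(points_per_day(60, 20))  # twenty counters sampled every minute
```

A 10-second interval gives 8,640 points per counter per day, which matches the figure quoted above.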
The Report View in Performance Monitor
Now for the final view provided by Performance Monitor: the report view. It is actually a very simple concept, as is shown in Figure 19.16. You specify a set of counters that you want to monitor, and it gives you a screen that lists the parameter and the value observed in the last interval. This is one view where manual data collection could be useful. Imagine a situation where you have an application turned off and everything is running fine. You collect a set of values for this normal time and save them to disk. You can then start up the application that seems to be causing the problem and capture a new set of data to see what the differences are. This is yet another tool to add to your arsenal for those times when problems arise.
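The before-and-after comparison described above amounts to lining up two sets of counter values and seeing which ones moved the most. A minimal Python sketch follows, assuming you have copied each snapshot into a dictionary of counter name to value; that layout and the function name compare_snapshots are assumptions for the example, not a Performance Monitor file format.

```python
def compare_snapshots(baseline, problem):
    """List counters present in both snapshots, largest relative change first."""
    rows = []
    for name in baseline.keys() & problem.keys():
        before, after = baseline[name], problem[name]
        change = after - before
        if before:
            relative = abs(change) / abs(before)
        else:
            # A counter that was zero and is now nonzero has changed without bound.
            relative = float("inf") if change else 0.0
        rows.append((name, before, after, relative))
    # Sort by relative change (descending), then by name for a stable order.
    rows.sort(key=lambda row: (-row[3], row[0]))
    return rows
```

The counters at the top of the returned list are the first ones worth a closer look when the suspect application is running.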
There is one more interesting feature of Performance Monitor before moving on. Performance Monitor has lots of nice display capabilities built into its graph view. However, the log option is useful for collecting data over a long period of time when you can't sit at your terminal and review the graphs before they are overwritten. A menu selection on the Options menu helps you with this situation. The Data From option brings up a dialog box (see Figure 19.17) that lets you select what you are graphing. The normal setting is to graph current activity (the counters that you have currently selected to monitor). You have the option of selecting one of the log files you built to be the subject of the graph for later review.
So far I have covered the four views Performance Monitor provides that can be used to display data in different formats. One of these views should be what makes the most sense for you when you need to review data. I also covered how to set these monitors up and alter their display to your taste. You need to remember that if you set up to monitor a certain set of parameters in one view, these settings do not carry over to the other views. (For example, if you are monitoring logical disk transfer rates in the graph view, it does not mean that you can access this data and settings from within the report view.)
Other Performance Tools
What I want to cover next is a series of tools you might want to call into play when a performance problem is suspected. Performance Monitor is the main tool designed into the Windows NT environment to support performance monitoring, and it is quite powerful. It can measure far more data items than you would ever want to review (unless you are really bored). However, I would like you to consider data that you can obtain from the following tools that support your performance monitoring activities:
Event Viewer
Virtual Memory display in the Control Panel System applet
Tasking display in the Control Panel System applet
The Services applet in Control Panel
The Server applet in Control Panel
The Windows NT Diagnostics utility in the Administrative Tools program group
First on this list is Event Viewer. This tool was basically designed to support auditing in Windows NT and is described in more detail in Chapter 24, Auditing Windows NT Server. The Event Viewer audit trail records occurrences of certain situations that have been defined as being of interest for future review. This includes both problems (as in the case of a service that fails to start up or dies) and information messages (such as startups and shutdowns of the system). The Event Viewer can also record activities related to applications (such as SQL Server and Exchange Server) that have been designed to record such information. This time-stamped record can be very valuable when you are trying to figure out what was happening with your system when a problem occurred.
Imagine that users complained that the system was extremely slow around 12:45 today. I didn't have Performance Monitor running at the time, so I don't have any hard data related to key performance counters. I do, however, have the subjective data point from the users that performance was poor at a certain time of the day. The first thing I would do is look in the Event Viewer to see if there was anything special happening about that time.
Figure 19.18 shows the event viewer from my system. Looking at this, I can see that several print jobs were queued around the time that the users reported the performance problem. I might suspect that someone had queued a series of complex screen print jobs to that printer that placed a high demand on the system (which is what I was doing). I would then be able to run some tests and see if I could replicate the problem and then take corrective action to prevent it from happening again. The best thing about Event Viewer is that it is sitting there waiting for you to read it. You don't have to go out of your way to set it up.
An Event Viewer record used for performance monitoring.
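If you have copied the event records out of Event Viewer, narrowing them down to the time of a reported slowdown is a simple filter. The Python sketch below assumes each event is a (timestamp, source, message) tuple; that layout and the function name events_near are hypothetical and chosen only for this illustration.

```python
from datetime import datetime, timedelta


def events_near(events, reported_at, window_minutes=15):
    """Return events whose timestamp falls within the window around reported_at."""
    window = timedelta(minutes=window_minutes)
    return [event for event in events if abs(event[0] - reported_at) <= window]


# Made-up sample records for demonstration only.
events = [
    (datetime(1997, 3, 4, 9, 0), "Service Control Manager", "service started"),
    (datetime(1997, 3, 4, 12, 40), "Print", "job queued"),
    (datetime(1997, 3, 4, 12, 58), "Print", "job queued"),
]
for event in events_near(events, datetime(1997, 3, 4, 12, 45)):
    print(event)
```

The window size is a judgment call; users' recollections of when a slowdown happened are rarely exact, so a generous window is usually the safer choice.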
Virtual Memory Display Applet
Next on my list of other tools that might come in handy is the Virtual Memory display in the Control Panel System applet. Figure 19.19 shows this display for my computer. Although this tool does not tell you why a particular application was slow this afternoon, it is one place to visit when you want to know how your paging files are set up. There are also some interesting bits of data at the bottom about the space taken up by drivers and the size of the registry. The display at the top can be useful because if you are observing a fair amount of paging and also a disk drive that is heavily loaded, you might want to check which disk drives contain paging files. Paging might cause the disks that contain the paging files to become heavily burdened.
Another interesting mini-tool you might want to use as a reference is the one that controls tasking priorities for Windows NT. This is one place where you get to influence those self-tuning algorithms. Windows NT Workstation is optimized toward providing a good response to user actions (in the foreground). Windows NT Server is optimized toward sacrificing some console responsiveness in return for doing a good job taking care of networked users through background services, such as the network services, print services, and database management system services. Figure 19.20 shows the Tasking dialog on the System applet accessed from the Control Panel. As you can see, you have three levels of control, ranging from heavily supporting foreground applications to a balance between foreground and background responsiveness. If you ever have to do a significant amount of processing on the system console of a less powerful Windows NT Server, you will become convinced that it is not balanced but instead is ignoring you in favor of those users who are connected through the network. Perhaps that is just my opinion or I am getting spoiled from using relatively good equipment.
Another tool you might want to bring to bear on performance problems is the Services applet in the Control Panel (see Figure 19.21). This is the best place to look to see which services are actually running at the current moment. If you are experiencing performance problems, you might be able to find a few services that you are running that you no longer need. This is an ideal way to get more performance from your system without increasing your costs. Another possibility is that you have a series of services that you use only at certain times of the day. You can build a batch file that uses the net start and net stop commands to start and stop the services so that they run only when they are needed. This is shown in the following example, which stops an Oracle database instance service to allow a backup to be completed. (There are also a few calls to shut down the Oracle database and then reopen it in this script, but this is not a tutorial on Oracle.) The key to remember is that the Services applet is where you can go to see which services are running.
net stop "OracleServiceORCL" >> c:\scripts\backup.log
ntbackup backup c: /d"Daily Backup" /b /l"C:\daily_bk.log" /tape:0
net start "OracleServiceORCL" >> c:\scripts\backup.log
Next on my list of useful utilities is the Server applet (not to be confused with the Services applet) in the Control Panel (see Figure 19.22). This is a quick place to check several parameters that directly impact performance without setting up Performance Monitoring settings files. The items that are of interest to me are as follows:
The number of users who currently have sessions on your server. You can click the Users button to get a list of these users. If you have an intermittent problem, it might be useful to see who is always logged on when the problem occurs. You can then explore what they are doing that might be causing the problem. This might seem a bit tedious, but it might be the only way to figure out those problems that crop up and then disappear before you have time to set up formal monitoring routines.
The number of files that are locked. Certain applications are very interested in maintaining data integrity. They therefore lock up their data files while a thread is in the process of updating that file. The problem is that other user applications that need to access those files might be stuck waiting for the first application to release the data file. This period of being stuck seems the same to an end user as a grossly overloaded CPU. It is just time sitting there looking at the hourglass cursor.
The other indicators might be interesting in certain cases. You might want to explore them when you get some time.
Windows NT Diagnostics Applet
My final item on the list of other useful monitoring utilities is the Windows NT Diagnostics panel (see Figure 19.23). This is not quite up to the functionality of the Windows 95 device control panels that let you see all those nasty internal parameters such as IRQs, memory addresses, and so forth that affect your system. If you have any questions about these parameters, this is one place to find the data.
With these utilities, you have a really good shot at tracking down most of the common problems that might rear their ugly heads. Before I leave this subject, I wanted to bring out a few final points that you should consider related to monitoring utilities:
The disk monitoring utilities require a bit of overhead to monitor them (1.5 percent is the number you typically see). Microsoft has therefore decided that it is best to leave disk monitoring turned off by default. To get those disk-related counters going, you have to run a utility from the command line (diskperf -y) and then reboot the computer whose disk drives you want to monitor. After that, you have data for those counters associated with the logical and physical disk drives.
You should keep in mind that monitoring an excessive number of parameters can weigh down your system. Most of the time it is beneficial to start monitoring for general problems and then activate additional counters that relate to specific areas when the general counters indicate potential problems.
You have the option of monitoring remote NT computers with Performance Monitor. This enables you to use your personal workstation to keep an eye on several servers. When set up in this manner, you have to look in only one place to check the performance of all the computers on your network.
Time-Based Performance Monitoring
Finally, you do not have to collect performance data all the time. In many organizations, you find very little processing during the evening or night hours. Why waste disk space collecting data from numerous counters when you know they are probably close to zero? The answer to this is a neat little utility included in the Windows NT Resource Kit that enables you to start performance monitoring in the background using settings files that you have previously saved to disk. You can configure multiple settings files to take averages every minute during the normal work day and collect averages every half hour at night when you are running fairly long batch processing jobs whose load varies little over time. The service to accomplish this function is datalog.exe. You turn this on and off at various times using the at command-line (and batch file) utility of NT and the monitor.exe command-line utility. The following example would start the monitoring utility at 8:00 and stop it at 11:59 (the morning data run):
at 8:00 "monitor START"
at 11:59 "monitor STOP"
The Resource Kit provides more details about using the automated monitoring service. You still have to look at the log files to see what is happening with your system, but at least you don't have to be around to start and stop the monitoring process. It also has the convenience of letting you build and check out settings files interactively using Performance Monitor. This feature, combined with all the other monitoring functions discussed earlier, gives you a really powerful set of tools to track what is going on with your NT system.
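The idea of sampling every minute during the work day and every half hour at night can be expressed as a small lookup. The Python sketch below is only an illustration of that schedule, not part of the Resource Kit tools; the 8:00 to 18:00 work day and the function name are assumptions you would adjust for your site.

```python
from datetime import time


def sampling_interval_seconds(now, day_start=time(8, 0), day_end=time(18, 0)):
    """Return 60 seconds during the work day and 1,800 seconds otherwise."""
    in_work_day = day_start <= now < day_end
    return 60 if in_work_day else 30 * 60


print(sampling_interval_seconds(time(9, 30)))  # work day
print(sampling_interval_seconds(time(23, 0)))  # overnight batch window
```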
Good news. My lengthy theoretical discussion of hardware, the Windows NT operating system components and the performance monitoring utilities is complete. I know that many of you would have preferred a simple list explaining "if you observe this, do that." If it were that simple, everyone would be doing it, and that would lower the pay scale. So I had to give you an understanding of what you were controlling. I also had to present all the tools you can use to help you when a problem comes up, as well as the basics of how to use them. I urge you to get comfortable with these tools before you actually have to use them to solve a problem with everyone breathing down your neck.
A Starting Set of Counters
There are many possible things you can look at and several tools you can use. Many of these counters are designed for very special situations and yield little useful information about 99 percent of the servers that are actually in operation. My next task is to present what I would consider to be a good set of counters to monitor to give you a feel for the overall health of your system. If one of these counters starts to indicate a potential problem, you can call into play the other counters related to this object to track down the specifics of the problem.
Going back to my earlier discussion, I simplified the numerous hardware and software systems on your computer into four basic areas: CPU, memory, input/output (primarily hard disk drives), and networking. Based on this model, I offer the following counters as a starting point for your monitoring efforts:
Processor: percent processor time. If your processor is running continuously at near 100 percent load, you are probably nearing the limit of processing capacity for your computer. If you have multiprocessor systems, remember to look at the loads on all the processor instances. It might turn out that your system and applications are beating up on one processor while the rest remain idle (splitting the load across multiple processors is one of the areas where Windows NT is still being improved). Remember that processing naturally comes in surges (an application waits for a while for data to load from disk, then suddenly starts off in a burst of activity), so you have to consider the load measured over a reasonable period (10 seconds or more).
Memory: pages/second. Some paging between physical and virtual memory is normal and happens even in periods of light loading. What you have to be sensitive about is when this paging becomes higher than normal and your system is spending all its time transferring information between the paging files and physical memory instead of servicing user processing needs. A number of 5 or less in this category is generally considered to be acceptable, but it varies between systems.
Physical disks: percent time. Applications tend to like to have all their files located together so they can be found and segregated from the files created by other applications. This leads to a problem: when an application is running, especially when it is supporting multiple users, it tends to place a high load on the disk drive where the file is located, whereas other disk drives might be sitting idle. Because disk drives are one of the slowest components in most applications, this is an area where you can build up a bottleneck where applications sit and wait for data transfers for one user to be completed before the other users can access their data. In databases, this is usually the single biggest tuning activity that can improve performance. General wisdom usually states that if a drive has over 90 percent disk time, it is very busy and might be a candidate for a bit of load reduction.
Physical disks: queue length. Another important indicator designed to catch the fact that several user processes are sitting idle while waiting to be granted access to a disk drive is the queue length. A good rule of thumb is that if the queue length is greater than 2, you should be investigating ways to lighten the load on this disk drive.
Server: bytes total/second. The final component you need to keep an eye on is network transmissions. This is a tough one because there are so many monitors for each of the different protocols. Perhaps this is a sign that very few people truly understand networking to the point where they are one with the network and instantly know of problems. The server total bytes per second counter provides a good overall picture of when the network and/or network interface card is struggling to keep up with the load. Anything near the rated limit for the type of network that you are using (10 million bits per second for Ethernet and either 4 or 16 million bits per second for token ring) is an indication that the network itself is overloaded. Because the counter reports bytes, multiply it by 8 before comparing it against these bit rates. This can be somewhat deceptive because you might be connected to a local network that can transmit signals at the speeds listed earlier, but you might be connected to other local networks by lines as slow as 56 thousand bits per second that might be your true bottleneck. Finally, your network might be within its rated transmission capacity, but your network card might not be. This varies between cards, so you have to get a feeling for the limit of your particular system.
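The rules of thumb in this list can be collected into one quick check. The Python sketch below is only an illustration: the function name check_health is made up, and the cutoffs used for "near 100 percent" processor load (95) and "near the rated limit" for the network (90 percent) are assumptions, because the text gives no exact figures for those two.

```python
def check_health(cpu_pct, pages_per_sec, disk_time_pct, disk_queue,
                 net_bytes_per_sec, rated_bits_per_sec,
                 cpu_near_full=95.0, net_near_full=0.9):
    """Return the areas whose starting counters exceed the rules of thumb."""
    suspects = []
    if cpu_pct >= cpu_near_full:      # assumed cutoff for "near 100 percent"
        suspects.append("processor")
    if pages_per_sec > 5:             # 5 or less is generally acceptable
        suspects.append("memory")
    if disk_time_pct > 90:            # over 90 percent disk time is very busy
        suspects.append("disk time")
    if disk_queue > 2:                # queue length greater than 2
        suspects.append("disk queue")
    # The counter is in bytes; network ratings are in bits per second.
    if net_bytes_per_sec * 8 >= net_near_full * rated_bits_per_sec:
        suspects.append("network")
    return suspects


# A 16 million bit per second token ring carrying 1.9 million bytes per second.
print(check_health(99, 12, 95, 4, 1_900_000, 16_000_000))
```

Treat the output as a list of places to start looking, not as a diagnosis; the limits that matter are the ones you measure on your own system.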
These rules of thumb are fine for the industry in general, but they do not reflect the unique characteristics of your particular configuration. You really need to run your performance monitoring utilities when your system is running well to get a baseline as to what counter values are associated with normal performance. This is especially important for some of the counters that are not in the preceding list; it is difficult to prescribe a rule of thumb for them because they are very dependent on your hardware. For example, a fast disk drive on a fast controller and bus can read many more bytes per second than a slow disk drive on a slower controller. It might be helpful to run Performance Monitor with several of these other counters that make sense to you and write down what the normal values are.
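Writing down normal values can be taken one small step further by recording how much each counter varies, so that you can tell an unusual reading from ordinary fluctuation. The Python sketch below uses the mean and standard deviation for this; the choice of three standard deviations as the cutoff, the function names, and the dictionary layout are all assumptions made for the example.

```python
from statistics import mean, stdev


def build_baseline(samples):
    """Map each counter name to (mean, standard deviation) of its normal readings.

    Each counter needs at least two readings for the deviation to be defined.
    """
    return {name: (mean(values), stdev(values)) for name, values in samples.items()}


def unusual(baseline, current, k=3.0):
    """Return counters whose current value is more than k deviations from normal."""
    flagged = []
    for name, value in current.items():
        if name not in baseline:
            continue
        average, deviation = baseline[name]
        if abs(value - average) > k * deviation:
            flagged.append(name)
    return flagged
```

You would build the baseline from readings taken while the system is running well, then pass in later readings to see which counters have drifted.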
You can take the preceding discussion one step further and use a utility found in the Resource Kit to simulate loads on your system to determine what counter values correspond to the limits of your system. This utility is relatively simple to operate. Your goal is to start with a moderate load and increase the load until performance is seriously affected. You record the values of the counters of interest as you increase the load and then mark those that correspond to the effective maximum values. This takes a fair amount of time, and you do not want to impact your users; however, if you are running a particularly large or demanding NT environment, you might want to have this data to ensure that you line up resources long before the system reaches its limits.
One final point is that each application has a slightly different effect on the system. Databases, for example, tend to stress memory utilization, input/output capabilities, and the system's capability to process simple (integer) calculations. Scientific and engineering applications tend to emphasize raw computing power, especially calculations that involve floating point arithmetic. It is useful for you to have an understanding of what each of the applications on your server does so that you can get a feel for which of the applications and therefore which of the users might be causing your load to increase. Remember, a disk queue counter tells you that a certain disk drive is overloaded. However, it is your job to figure out which applications and users are placing that load on the drive and to determine a way to improve their performance.
Common Problems: Detection and Correction
Unfortunately, your new-found knowledge of how to analyze the source of a Windows NT performance problem is not enough. You also need to know how to pinpoint the problem and fix it to get performance back up to standard. Armed with the magnificent array of data you collected from Performance Monitor, you should be able to have the system back to peak efficiency in a matter of minutes. Although you are well on the way to returning your system to health, you might still have a few hurdles to jump over along the way:
Problems can tend to hide one another. For example, you might have a disk drive that is being used excessively. Unless you check all your indications and think the problem through carefully, you might miss the fact that this particular problem was caused by a high degree of paging that is occurring to a drive that also contains most of your database files. This is really indicative of a problem with both memory and the way you have your disk drives laid out. Therefore, it is always wise to check all the information available to you.
You must also know what is running on your system when problems are observed in order to know what to adjust. This can be tough on systems that are processing a large number of users and a variety of applications. Just knowing that a disk drive is overloaded is not enough. You have to know which application is causing the problem so that you know what adjustments to make. (It could take a very long time if you moved files around one by one until you found the one that reduced the load to acceptable values.)
There are differing opinions as to what is satisfactory response time. People who are used to working with simple PC applications usually consider instantaneous response time to be at least adequate. If you are searching for data in a database that contains a few hundred gigabytes of information, you have to be a little more tolerant as to how long it takes to search the data for your query. This is something that should be agreed to by your organization, preferably before the applications are built, so that you can obtain adequate processing resources. It can make the administrator's job extremely difficult if you have to spend all your time chasing after performance problems that you can do nothing about because users' expectations are unreasonable.
You also have to be sensitive to the cost/performance trade-offs that have to be made. Although extra speed is a good idea in any system, it might not be worth devoting your efforts for several months to cut a second or two off user response time on a particular application. Most administrators have plenty of other work to do. Another trade-off that has to be made is whether it is cheaper to spend $400 on an extra 8MB of RAM or have a contractor come in at $50 per hour and spend a month tuning the system (at a cost of around $8,000). Technical people are often not trained to think about these considerations, but they probably should be.
Finally, you should sometimes consider nontechnical solutions to your problems before you buy equipment or spend a lot of your time on tuning. One of the classic loading problems on servers is the fact that they experience peak loads for the first hour of the business day and the first hour after lunch. This is relatively easy to explain, because most people come in, get their coffee, and log onto the e-mail system to check their mail. The same thing happens after lunch, and possibly just before the end of the business day (which also might experience peak printing loads as everyone finishes off what they have been working on and prints it before they go home). Sometimes an e-mail (or memo) to your users explaining the situation and offering them greatly improved response times if they alter their habits a little can be enough to solve the problem without any drastic measures or purchases.
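The hardware-versus-labor trade-off mentioned in the list above is simple arithmetic, and it is worth running the numbers before committing to either path. The Python sketch below reproduces the figures from the text; treating "a month" as 4 weeks of 40 hours is an assumption, and the function names are made up for the illustration.

```python
def contractor_cost(hourly_rate, weeks, hours_per_week=40):
    """Total labor cost, assuming a fixed number of billed hours per week."""
    return hourly_rate * weeks * hours_per_week


def cheaper_option(hardware_cost, labor_cost):
    """Name the less expensive of the two options."""
    return "hardware" if hardware_cost <= labor_cost else "tuning"


labor = contractor_cost(50, 4)     # $50 per hour for roughly a month
print(labor)                       # 8000
print(cheaper_option(400, labor))  # hardware
```

The comparison says nothing about whether the extra memory would actually cure the problem, so it complements the analysis rather than replacing it.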
Performance Tuning Methodology
It is now time to go over the analysis and tuning process itself to see what steps are involved. My basic approach to performance tuning starts with the basic performance data collected by the tools discussed in this chapter. I combine this with knowledge of what is happening on the system (what jobs are running, is this a special time such as monthly closing of the books, and so on).
The next step is to combine the data you have gathered with your knowledge of your hardware platform and the Windows NT system. Your job is to come up with possible causes of the problem you are seeing. This is where it is especially helpful to have collected a baseline of performance data when the system is working well so that you can see relatively slight changes in the performance indicators, which might lead you to the problem. Sometimes it is useful to start by figuring out what is working correctly and narrowing down the list of things that could be causing the problem. It depends somewhat on how your brain thinks.
Once you have a list of possible causes to the problem, you might next want to collect a bit more data to confirm your theories. Also, the general performance data might not be sufficient to narrow down the problem. You might have to run additional performance counters while the system is running to get the data necessary to make a decision. This can be a challenge because users often want an immediate solution and might not understand when you take time to investigate things a bit further. Another real problem area is intermittent problems. Perhaps you are using the alert feature of Performance Monitor and come in one morning to find that the system had two nasty performance spikes yesterday. How do you re-create the circumstances that led to those problems? You might have to run extra monitors for days or even weeks until the problem crops up again.
Once you have enough data, you need to come up with likely causes for the problem. Looking ahead to the next step, where you try to fix the problem, it is to your benefit to try to keep the list of possible causes to those that are reasonable. It is also useful to rank the items on your list in order of likelihood that they are the cause of the problem. Because your time is valuable, it would be nice if you solved the problem with your first or second try. It also keeps your users much happier.
Once you have done all the analysis you consider reasonable, it is time to actually solve the problem. Here is where you need to apply knowledge of your system to know what has to be done to solve a given problem. There are often alternative solutions to a problem. You have to use engineering judgment and experience to come up with the best solution for your environment. An important principle in scientific experiments that you need to apply to your problem-solving efforts is to change only one thing at a time on your system. If you move all the files on every one of your disk drives around, you might solve the first problem you found but create several more in the process.
So much for the basic problem-solving method. What I want to devote the rest of this section to is a list of common problems that come up on servers and some alternative solutions you might want to consider. Please remember that thousands of things could go wrong in a modern computer. The list I am presenting could never cover every possible problem and solution. I am just listing a few of the more common solutions to stimulate your creative processes when you are working on your problems.
Typical Performance Problems
CPU capacity problems are the first area I want to consider. This is an area where you have to be somewhat careful in your analysis. Adding an extra disk drive is becoming a relatively insignificant expense that will probably be needed in the future anyway, with the growth in data storage needs and application sizes. However, you are talking serious expense and effort to upgrade to a higher capacity CPU in most cases. Some servers have the capability of just plugging in additional processors, but most do not. Therefore, you had better be absolutely sure you have exhausted all other possibilities before you bring up buying a new CPU as a solution. Some things to consider when you think you might have a CPU problem include the following:
Is the load placed on the CPU reasonable for the applications that are being run? This might be tough to assess for individuals who are not experienced in the computer industry. However, if you have a 150MHz Pentium processor that is being overloaded by running an e-mail system for 10 users, I would say that you have a problem with the application or other tuning parameters; this is not a load that would typically require a more powerful processor.
Do you have any new or uncommon hardware devices in your computer? A poorly written device driver can waste a lot of CPU time and weigh down the processor. You can check to see if the vendor has seen this problem before and possibly has an updated device driver that is kinder to the CPU.
Do you have monitoring data from your capacity planning efforts (which I discuss later) that backs up your claim that the user's demand has grown to the point where a new processor is needed? Have any new applications been loaded on the system that might need tuning or be the cause of the problem that forces you to upgrade the CPU?
The next area of concern is memory. One of the main complaints PC users have about Windows NT is that it demands what seems to them to be a large amount of memory. As I discussed earlier, accessing information stored in memory is so much faster than accessing data located on disk drives that operating system and application designers will likely continue to write software that requires greater memory to perform more complex tasks and produce reasonable response times. Another point to remember is that many Windows NT self-tuning activities are centered around allocating memory space to help improve system performance. A few thoughts related to memory problems include the following:
Are your applications that use shared memory areas properly tuned? An Oracle database can be tuned by the DBA to use only a few megabytes of memory or almost a hundred megabytes (I did that once on a very large data warehouse). You should check to see that these applications are properly tuned (neither too much nor too little memory used) before you go out and purchase memory. It is also useful to check whether you have any unnecessary services running. With all these automated installation utilities, it is hard to remember whether you have cleared out old application services when new ones are installed. It doesn't hurt to scan through the list of services in the Control Panel just to be sure.
What is the configuration of your memory expansion slots? Typically you have a relatively small number of memory expansion slots in a computer. Also, hardware vendors often tie pairs of these memory slots together and require you to install the chips in this bank (as they call it) in matched pairs. If you are expecting to continue to expand the use of your server and its memory requirements, you might want to consider getting slightly larger memory chips now so that you don't have to remove older, smaller chips in the future to make room for chips that are large enough to support your needs.
Next on the list of problem areas are input/output problems related to disk drives. Working with a lot of database systems, I find this to be the most common problem. The good news is that with current disk drive capacities and prices, it is also one of the easiest to solve. Almost all the servers I have worked with have their disk requirements grow every year, so it isn't much of a risk that you will never use additional disk capacity. The risk you take is that you will spend extra money this year on a disk drive that has half the capacity and twice the cost of next year's model. Some considerations when you have a disk input/output problem include the following:
One of the easiest solutions to an overworked disk situation in a system with multiple disk drives is to move files around to balance the load. The basic process is to figure out which files are accessed frequently (or accessed at the same time by a given application) and put them on separate disk drives. The biggest tuning recommendation for an Oracle database that is performing poorly due to input/output is to place the tables and indexes on separate disk drives. You have to be sensitive to how the disk drives are connected together when you are doing this. You can split the load between several disk drives so that each disk drive is well below its data transfer capacity, but you might run into problems when you exceed the transfer capacity of the disk controller card to which these drives are attached. On very large disk farms with multiple levels of disk controllers, someone has to balance the load across controllers (and controllers of controllers) in addition to worrying about the load on individual disk drives.
You will run into situations where your application load or other factors will not enable you to split the load across several disk drives (for example, all the input/output activity is centered around a single file). In this case, you might have to consider buying faster disk drives and controllers (such as fast-wide SCSI or even electrostatic disk drives) to handle these busy files and reallocate the other disk drives to other purposes (such as holding all the performance data you are going to have to log onto this heavily loaded system). You could also implement disk striping to split single data files across multiple disk drives.
Depending on which of the NT file systems (FAT, NTFS, and so on) you have used on a given disk drive, you might run into a problem known as fragmentation. Back at the dawn of time in the computer world, all data in a file had to be located in a set of contiguous blocks on a disk drive. People had a lot of problems dealing with this as they tried to write applications such as word processors, where they would write a little bit on one document, do some other things, and then write a little bit more. The disk blocks at the end of the word processing document would get filled by other work, and it would take a lot of rearrangement of files to make space for the new, larger document. The solution to this was to allow files to be split into multiple sections on the disk drive, with file access utilities that are smart enough to put all the pieces together when the user accesses the file. This can be a problem in many file systems when you consider the fact that disk drives can transfer data much more quickly than they can move the mechanical arms that carry the read/write heads. Therefore, a file that is scattered all over the disk drive (known as a fragmented file) is much slower to access than one where all the data is located in one chunk. There is some debate over whether NTFS suffers from fragmentation, but utilities are available that can defragment different types of file systems, such as the Diskeeper product from Executive Software.
An interesting situation relates to the fact that disk services have a lower priority than printing services. If you are really daring, you can go into the registry under the HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\LanmanServer\Parameters key and raise the priority of the server from its default priority of 1 to 2 (add a ThreadPriority value of type DWORD with a value of 2).
An alternative that is supported on many different types of disk drives enables you to scatter a single file across multiple disk drives. You can rely on either hardware or software to let you tie together sections of several disk drives and treat them as if they were a single logical disk drive. In a two-disk pair, for example, the first logical block would be on the first physical disk drive; the second logical block would be on the second physical disk drive; the third logical block would be back on the first disk drive; and so on. This technique, known as striping, is actually just one form of a technology known as RAID (redundant arrays of inexpensive disks) that can be used to improve performance and reliability.
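The round-robin placement used by striping can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the function name locate_block is hypothetical, blocks and disks are numbered from zero, and each stripe unit is a single block (real stripe sets usually group many blocks into each unit).

```python
from typing import Tuple


def locate_block(logical_block: int, num_disks: int) -> Tuple[int, int]:
    """Map a logical block number to (disk index, block offset on that disk).

    Hypothetical helper for illustration: assumes one block per stripe unit
    and zero-based numbering for both blocks and disks.
    """
    if num_disks < 1:
        raise ValueError("a stripe set needs at least one disk")
    if logical_block < 0:
        raise ValueError("logical block numbers start at 0")
    disk = logical_block % num_disks      # rotate through the disks in turn
    offset = logical_block // num_disks   # how far down that disk the block sits
    return disk, offset


if __name__ == "__main__":
    # Show where the first six logical blocks land on a three-disk stripe set.
    for block in range(6):
        disk, offset = locate_block(block, num_disks=3)
        print(f"logical block {block} -> disk {disk}, offset {offset}")
```

Because consecutive logical blocks land on different physical drives, a large sequential read can keep several drives busy at once, which is where the performance gain comes from.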
The final area in which you might encounter problems is the network. I have always found this to be a much tougher area to troubleshoot, because I do not own it all. Even the network administrators can have troubles because servers or workstations can cause the problems as often as a basic lack of network capacity or failed equipment. One of the keys to being able to troubleshoot a network effectively is to have a drawing of how things are laid out. This is difficult because it changes regularly and most people do not have access to these drawings, if anyone has even bothered to make one. A few thoughts on the area of network problem-solving are as follows:
The best hope for quick and efficient solutions for network problems is to isolate the problem to a particular section of the network or even a particular machine. This often involves going around to friends on the various network segments and seeing how their systems are performing. Perhaps you have some of the advanced network monitoring technologies that will help you in this process. One of the most common problems on a wide area network is the limited transmission capacities of the links to remote sites. Although these network links might not be your problem to solve, you are often the first one to hear about it ("why is your server so slow today?") and you have to come up with some data to prove that it is not your problem and that someone else has to solve the problem.
If the problem is actually with your server, you might try to reduce the number of network protocols that have to be monitored by the server. It is easy to just check every protocol when installing the server so that you don't have to worry about it in the future, but this can become a burden later.
You might also want to rearrange the binding order for the various protocols to emphasize those that are more important (for example, TCP/IP for those database transactions that everyone wants to speed up) at the cost of increasing response time for those services that need speed less (the print jobs where the printout can always get to the printer faster than the user can).
If all else fails, talk to network experts and see if they can recommend a network card that has better throughput than your current card. There is a wide variation in performance, with most PC cards designed for the relatively simple transmission requirements of workstations, not the more demanding needs of servers. It is less expensive than replacing your network with a faster network or altering the topology to provide better routing.
That is probably sufficient for this introductory discussion of performance problem-solving. Try not to feel overwhelmed if this is your first introduction to the subject. It is a rather complex art that comes easier as you get some experience. It is not easy, and there are people who specialize in solving the more complex problems. It is often a challenge just to keep up with all the available technology options. Windows NT will probably also continue to evolve to meet some of the new challenges that are out there, including Internet/intranet access and multimedia initiatives such as PC video. It could almost be a full-time job just to keep up with all the application programming interfaces that Microsoft releases to developers these days.
Many administrators find themselves on small local area networks that don't suffer from many performance problems. Others are attached to complex, global networks that place high demands on all parts of the system. The network monitoring utilities built into Windows NT are designed to meet the simpler needs of the smaller networks. They are designed to provide a reasonable amount of information to determine whether the problem lies with NT Server. They can even help isolate the protocol and possibly the application causing the problem.
More powerful tools can monitor the traffic levels and types of the various segments of the network itself. Many of them even run under Windows NT and are marketed by vendors such as the big network hardware vendors, Hewlett-Packard, and others. They provide convenient graphical interfaces and graphical displays of loads. They are definitely beyond the scope of this introductory discussion. You might want to keep them in mind for those times when you have proved that your server is providing adequate network transmission services but a problem somewhere in your corporate network is causing user response times to be unacceptable. Who knows, your company might already have this equipment located in one of those dark corners of the data center known only to the true network gurus.
Simple Network Management Protocol (SNMP)
The Simple Network Management Protocol (SNMP) started out with the goal of monitoring network-related information. However, it had the blessing of being a published standard as opposed to a proprietary tool that locked you into a particular network equipment vendor's hardware. I mention it here for two reasons. First, the use of this protocol has been expanded to monitor the functioning of devices on the network such as Windows NT Servers. You can install an SNMP agent to allow remote devices to find out how your server is doing. A second reason is that there are packages that run under Windows NT and are designed to collect the performance results from other devices on the network. One of your workstations or servers might be used to serve as a central network system monitoring computer.
Capacity Planning and Routine Performance Monitoring
Some people spend their entire lives reacting to what others are doing to them. Others prefer to be the ones making the plans and causing others to react to what they are doing. When you get a call from a user saying that the system is out of disk space or it takes 10 minutes to execute a simple database transaction, you are put in the position of reacting to external problems. This tends to be a high-pressure situation where you have to work quickly to restore service. If you have to purchase equipment, you might have great problems getting a purchase order through your procurement system in less than a decade (government employees are allowed to laugh at that last comment). You might get called in the middle of the night to come in and solve the problem, and you might have to stay at work for 36 hours straight. This is not a fun situation.
Although these reactive situations cannot always be avoided (and some people actually seem to prefer crisis management), I prefer to avoid them whenever possible. The best way to avoid crises is to keep a close eye on things on a routine basis and try my best to plan for the future. All the performance monitoring techniques presented in this chapter can be run at almost any time. You could be running some basic performance monitoring jobs right now. The key to planning is some solid data that shows trends over a significant period of time combined with information on any changes planned in the environment.
This process has been the subject of entire guides. There are several detailed, scientific methodologies for calculating out resource needs and performance requirements. For the purposes of this guide, I just want to present a few basic concepts that I have found useful and that can be applied with a minimal amount of effort by administrators who take care of small workgroup servers or large data centers of servers:
If your program to track performance and capacity requirements takes too much time to complete, you are probably going to have trouble keeping up with it. I would recommend automating the data collection using Performance Monitor log files either started with the at command or run as a service. Make it one of your routine tasks to collect averages or maximum/minimum values from these log files and place them in a spreadsheet, database, or logbook.
You should present this data to management, users, and so forth on a routine basis. People tend to react poorly if you suddenly drop on them that they need to buy some new disk drives next month (which is the end of the fiscal year and all the budget for such things is already spent). You might want to tie your capacity planning into the budgeting cycle for your organization.
I like to use a spreadsheet to store the data so that I can easily construct a graph showing load over time. People tend to get lost in a long series of numbers but can easily see that line of disk utilization progressing relentlessly toward its limits.
Knowing what has happened and using it as a prediction for the future is often not enough. For example, based on past trends, you might have enough processor capacity to last for two years. However, you also know that your development group is developing five new applications for your server that are to be rolled out this summer. You need to find a way to estimate (even a rough estimate is better than none at all) the impact of these planned changes on your growth curves to determine when you will run out of a critical component in server performance.
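The trend-plus-planned-changes idea in the points above can be sketched as a small Python calculation. This is a rough illustration, not a formal capacity planning methodology: it assumes the samples are evenly spaced (say, one monthly average per entry), that growth is roughly linear, and that planned changes can be lumped into a single one-time addition. The function names fit_line and periods_until_limit and the sample numbers are hypothetical.

```python
from typing import List, Optional, Tuple


def fit_line(values: List[float]) -> Tuple[float, float]:
    """Least-squares slope and intercept for samples taken at periods 0, 1, 2, ..."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples to see a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def periods_until_limit(values: List[float], limit: float,
                        planned_extra: float = 0.0) -> Optional[float]:
    """Periods after the last sample until the trend line reaches the limit.

    planned_extra is a rough, one-time estimate of the load that known
    upcoming changes (such as new applications) will add.
    Returns 0.0 if the limit is already reached, or None if the trend is
    flat or falling and the limit is never reached.
    """
    slope, intercept = fit_line(values)
    current = intercept + slope * (len(values) - 1) + planned_extra
    if current >= limit:
        return 0.0
    if slope <= 0:
        return None
    return (limit - current) / slope


if __name__ == "__main__":
    # Hypothetical monthly averages of disk utilization, in percent.
    disk_pct = [40.0, 45.0, 50.0, 55.0]
    print("Months left on trend alone:", periods_until_limit(disk_pct, 90.0))
    print("Months left with new applications:",
          periods_until_limit(disk_pct, 90.0, planned_extra=10.0))
```

Even a crude estimate for planned_extra changes the answer noticeably, which is the argument for asking the development group about its plans before presenting the projection.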
I think most people would consider this a somewhat challenging chapter to read. It was a challenging chapter to write. It was a balancing act between presenting too much information (there are entire guides devoted to Windows NT tuning) and too little. My goals for this chapter included the following:
Provide information on the basics of server hardware as they affect performance.
Provide an overview of the components of the Windows NT Server operating system that affect performance, including the self-tuning features.
Discuss the monitoring tools that are provided with Windows NT.
Go over the basic methodology used to troubleshoot a problem.
Review some of the more common problems and their solutions.
Present a few extra credit topics that you should be aware of, such as network monitoring, SNMP, and capacity planning.
I would recommend getting in there and trying out the performance monitoring tools that are available under Windows NT. You will be more ready to solve problems when they come up and be able to worry less about your ability to determine exactly what is going on with your system. You might also consider taking a performance baseline every now and then so that you have a set of numbers that represent your server when it is working well. You can compare the numbers generated when the server is performing poorly to this baseline to see what has changed. Finally, you should at least try to store the performance monitoring data that you collect every so often so that you have data that you can use in case someone asks you to project when you will need equipment upgrades, and so on.
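The baseline comparison described above can be sketched as follows. This is a minimal Python illustration that assumes you have already copied counter averages from your log files into two dictionaries keyed by counter name. The function name compare_to_baseline, the 25 percent threshold, and the sample values are all hypothetical choices for the example.

```python
from typing import Dict, List, Tuple


def compare_to_baseline(baseline: Dict[str, float],
                        current: Dict[str, float],
                        threshold_pct: float = 25.0
                        ) -> List[Tuple[str, float, float, float]]:
    """List counters whose current value differs from the baseline by more
    than threshold_pct percent, largest change first.

    Each entry is (counter name, baseline value, current value, percent change).
    Counters missing from the current data or with a zero baseline are skipped,
    because a percentage change cannot be computed for them.
    """
    changes = []
    for name, base in baseline.items():
        if name not in current or base == 0:
            continue
        pct = (current[name] - base) / base * 100.0
        if abs(pct) > threshold_pct:
            changes.append((name, base, current[name], pct))
    changes.sort(key=lambda entry: abs(entry[3]), reverse=True)
    return changes


if __name__ == "__main__":
    # Hypothetical averages: one set from a good day, one from a slow day.
    good_day = {"% Processor Time": 40.0, "Pages/sec": 10.0, "% Disk Time": 30.0}
    slow_day = {"% Processor Time": 44.0, "Pages/sec": 35.0, "% Disk Time": 60.0}
    for name, base, now, pct in compare_to_baseline(good_day, slow_day):
        print(f"{name}: {base} -> {now} ({pct:+.0f}%)")
```

Sorting by the size of the change puts the most suspicious counter at the top, which is usually the right place to start looking.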