The Win32 API offers a set of new functions and concepts for accessing and managing disk files. This is in addition to the low-level and stream I/O functions that are available as part of the C and C++ run-time libraries. This chapter reviews all forms of file handling that are available to 32-bit Windows applications.
Figure 14.1 illustrates the relationship between DOS/UNIX-style "low-level" I/O, C/C++ style stream I/O, and the Win32 file I/O functions.
Win32 applications should preferably use Win32 file I/O operations, which provide full access to Win32 security features and other attributes and also enable asynchronous, or overlapped, input and output operations.
A typical file is a collection of data stored on nonvolatile media, such as a magnetic disk. Files are organized into file systems. File systems implement a particular scheme for storing files on physical media and for representing various file attributes such as filenames, permissions, and ownership information.
File system information can be obtained by calling the function GetVolumeInformation. Information about the nature of the storage device can be obtained by a call to GetDriveType.
Windows NT recognizes four file systems. File Allocation Table (FAT) file systems are compatible with earlier versions of DOS. High Performance File System (HPFS) is the file system used by the OS/2 operating system. New Technology File System (NTFS) is the "native" file system of Windows NT. Finally, an extension of the FAT file system, the Protected Mode FAT file system, supports long filenames on volumes otherwise compatible with earlier versions of MS-DOS.
From an application's point of view, the major difference between the various file systems is the support for special attributes. For example, NTFS volumes support the concept of file ownership and security attributes, which are unavailable in the case of FAT file systems.
NOTE: Windows 95 only recognizes FAT and Protected Mode FAT file systems.
Windows supports file sharing across a network. Network file systems may appear under local drive letters through network redirection. Alternatively, applications may access files across a network using UNC (Universal Naming Convention) names, such as \\server\vol1\myfile.txt. Different networks may or may not support long filenames.
Starting with Version 3.51, Windows NT supports per-file compression on NTFS volumes. Windows 95, on the other hand, supports DriveSpace compression of FAT volumes. Unfortunately, the two compression mechanisms are not compatible; at present, only uncompressed FAT volumes can be accessed by both Windows 95 and Windows NT.
In 32-bit Windows, an open file is treated as an operating system object. It is referenced through a Win32 handle; this is not to be confused with the DOS/UNIX style "file handles," which are basically integers assigned by the operating system to represent open files.
Because a file is a kernel object, in addition to file system operations, many other handle-based operations are also possible. For example, it is possible to use the WaitForSingleObject function on a file handle opened for console I/O.
A file object is created by a call to the CreateFile function. This function can be used to both create a new file and open an existing file. The function name may appear to be a misnomer unless you realize that what the function creates is the file object, which represents either a new or an existing file on the storage device.
Parameters to this function specify the access mode (read or write), file sharing mode, security attributes, creation flags, file attributes, and an optional file that serves as an attribute template.
For example, to open the file C:\README.TXT for reading, one would issue the following call to CreateFile:
The first parameter is a filename. Applications can also use the UNC name. The length of the filename is limited to the value of the MAX_PATH constant. Under Windows NT, this can be circumvented by prepending "\\?\" to the path and calling the wide version of CreateFile, CreateFileW. The prefix "\\?\" tells the operating system not to parse the path name.
Another parameter that deserves special interest is the fourth parameter; this is of type LPSECURITY_ATTRIBUTES. Through this parameter, applications request security attributes for the new file object and may also specify the security attributes for newly created files. However, for this parameter to have any effect, it must be supported by the operating system and the file system. In other words, unless the file is on an NTFS volume and the operating system is Windows NT, advanced security features will not be available. Nevertheless, one member of the SECURITY_ATTRIBUTES structure, the bInheritHandle member, is still useful; it controls whether a handle to the object is inherited by child processes.
Win32 applications should not use the OpenFile function for opening files; this function is provided only for compatibility with 16-bit Windows.
An open file object can be closed by calling the CloseHandle function.
Input and output are accomplished with the help of the ReadFile and WriteFile functions. Need I say more? Listing 14.1 contains a simple program (compile with cl filecopy.c), which copies the contents of one file to another using the Win32 functions CreateFile, ReadFile, and WriteFile.
Listing 14.1. Copying a file using Win32 file functions.
For random-access files, Win32 provides the SetFilePointer function to position the file pointer before reading or writing. The file pointer is a 64-bit value that determines the position of the next read or write operation within the file. The SetFilePointer function fails if it is called with a handle to a device that cannot perform seek operations, such as the console or a communication port.
SetFilePointer can also be used to retrieve the current value of the file pointer. Call this function as follows:
A recurring problem when programming interactive applications that perform file I/O is the issue of responsiveness. Typical file system calls are blocking calls; for example, a call to scanf may not return until there are enough characters in the operating system's input buffer to complete the call. This is rarely a problem with fast, hard disk-based file systems; however, when the input operation is performed, for example, on a communication port, the problem becomes much more acute.
In 32-bit Windows, there are several solutions to this problem. An obvious solution is to use multiple threads; a dedicated thread may perform the input function and remain blocked indefinitely, without affecting the responsiveness of the application's user interface, managed by another thread. A simple communication program using the multithreaded approach is demonstrated in Listing 14.2. This program can be compiled from the command line by typing cl commthrd.c. (I return to the subject of using the console and communication ports in more detail later in this chapter.)
NOTE: Under Windows 95, overlapped I/O operations cannot be used on disk files.
Listing 14.2. Simple communication program using multiple threads.
This program uses the COM2 port for communications. In order to test it, you should have a modem attached to that port. If your modem is attached to a different port, change the port name in the second CreateFile call and recompile the application.
After opening the communication port COM2 for reading and writing and the console for input, the program proceeds with initializing the port. As part of the initialization, it sets up an infinite time-out for reading; it also initializes communications through a DCB structure. (Yes, the seemingly superfluous GetCommState/SetCommState pair of calls is actually necessary.) After initializing the communication port, the program creates a secondary thread, which opens the console for writing. The purpose of the secondary thread is to perform input on the communication port in a loop, while the primary thread does the same on the console. Whenever the primary thread receives a character on the console, it outputs that character on the communication port. Whenever the secondary thread receives a character from the communication port, it writes that character to the console.
The loops are terminated when a Ctrl+X character is received from the keyboard. The primary thread sets the bDoRun variable (notice that it is declared volatile) to FALSE. It also calls PurgeComm to ensure that any pending ReadFile calls in the secondary thread are interrupted so the secondary thread can terminate its execution.
While using multiple threads is always a viable option for 32-bit applications, it may not always be the most convenient solution. Another approach available to 32-bit applications is the use of overlapped I/O. Overlapped I/O enables an application to initiate an I/O operation in a nonblocking fashion. For example, if an application uses the ReadFile function for overlapped input, the function will return even if the input operation has not yet been completed. Once the operation is complete, the application can retrieve results using the GetOverlappedResult function. Applications can also use the ReadFileEx and WriteFileEx functions for overlapped I/O operations.
Overlapped input can also be used in combination with a synchronization event. Processes can use the synchronization event to receive notification when the I/O operation has been completed. Using events and the WaitForMultipleObjects function, it is possible to wait for input on several input devices at once. This is exactly the technique demonstrated by the second version of this simple communication program, shown in Listing 14.3. This program can also be compiled by a simple command-line instruction, cl commovio.c.
Listing 14.3. Simple communication program using overlapped I/O.
As before, if your modem is not attached to COM2, it may be necessary to change the port name in the second CreateFile call and recompile the program before using it.
Like its multithreaded counterpart, this program also begins by opening the console and the communication port and initializing the port. Part of the port initialization is a call to the SetCommMask function, which enables read event notifications for that port.
The communication port is opened with the FILE_FLAG_OVERLAPPED attribute. This enables overlapped I/O operations. When the program's main loop is entered, a call is made to ReadFile, passing to it a pointer to an OVERLAPPED structure.
ReadFile may return data immediately if data is available on the port. If not, ReadFile still returns, but signals an error; GetLastError can be used to check for the error code ERROR_IO_PENDING. (For simplicity, this part has been left out from the code; we just assume that any ReadFile error indicates pending input.)
The heart of this program is the call to WaitForMultipleObjects. The function waits on two objects: an event object that was specified as part of the OVERLAPPED structure used in reading the communication port and the console input object. In the case of the latter, it is not necessary to use overlapped I/O; the console object has its own signaled state indicating that data is waiting in the console's input buffer.
When WaitForMultipleObjects returns, it indicates that data arrived either on the console or on the communication port. It is the subsequent switch statement that distinguishes between the two. Retrieving the console event requires code that is a bit tricky. Unfortunately, a simple ReadFile would not suffice as it leaves the key up event in the console's input buffer, leaving the console object in a signaled state. A subsequent ReadFile would then result in a blocking read, waiting until a key is pressed once again. Because of this behavior, it was necessary to use low-level console functions to retrieve (and discard) all console events, so when WaitForMultipleObjects is called again, the console object would no longer be signaled, at least not until the user presses a key again.
Figure 14.1 makes it obvious that the term "low-level I/O" is somewhat of a misnomer for file descriptor-based I/O operations. Indeed, this term is a relic, a leftover from DOS and UNIX; although Windows NT provides these functions for compatibility with those operating systems, they are effectively implemented using CreateFile, ReadFile, and WriteFile.
A file descriptor is an integer identifying an open file. A file descriptor is obtained when an application uses the _open or _creat functions. Note that throughout the run-time library documentation, file descriptors are often referred to as file handles; once again, this is not to be confused with Win32 handles for file objects. A handle returned by CreateFile and a file descriptor obtained by calling _open are not compatible.
A file can be opened for low-level I/O using the _open function. A new file can be created for low-level I/O using _creat. Both of these functions have wide character versions, _wopen and _wcreat, which accept a Unicode filename string under Windows NT.
Reading and writing can be performed by calling the _read or _write functions. Seeking within the file is accomplished by calling _lseek. The current position within the file can be retrieved by calling _tell.
The contents of any buffers maintained by Windows can be committed to disk by calling _commit. The file can be closed by calling _close. The _eof function can be used to test for an end-of-file condition. All low-level I/O functions use the errno global variable to indicate other error conditions.
The names of all these functions begin with an underscore to indicate that they are not part of the standard ANSI function library. However, for programs that may use the old names of these functions, Microsoft provides the oldnames.lib library.
C programs that use stream I/O utilize the FILE structure and related family of functions. A file is opened for stream I/O by calling the fopen function. This function, if successful, returns a pointer to a FILE structure, which can be used in subsequent operations, such as calls to fscanf, fprintf, fread, fwrite, fseek, ftell, or fclose. The Visual C++ run-time library supports all standard stream I/O functions as well as several Microsoft-specific functions.
For applications that mix calls to stream I/O and low-level I/O functions, the _fileno function can be used to obtain a file descriptor for a given stream (identified by a FILE pointer). The _fdopen function can be used to open a stream and associate it with a file descriptor that identifies a previously opened file.
Applications can also access the standard input, standard output, and standard error through the predefined streams stdin, stdout, and stderr.
The base class of all iostream classes is the ios class. Normally, applications do not derive classes from ios directly. Instead, they use one of the derived classes, istream or ostream.
Variants of the istream class include istrstream (operates on an array of characters stored in memory), ifstream (operates on a disk file), and istream_withassign (a variant of istream that enables assignments to work).
Variants of ostream include ostrstream (stream output to a character array), ofstream (stream output to a file), and ostream_withassign (variant of ostream that enables assignment).
The predefined stream object cin, representing standard input, is of type istream_withassign. The predefined objects cout, cerr, and clog, which represent standard output and standard error, are of type ostream_withassign.
The class iostream combines the functionality of the istream and ostream classes. Derived classes include fstream (for file I/O), strstream (for stream I/O on a character array), and stdiostream (for standard I/O).
All ios-derived objects make use of the streambuf class or the derived classes filebuf, stdiobuf, or strstreambuf for I/O buffering.
In addition to handling disk files, the Win32 file management routines can be used to handle many other types of devices. These include the console, communication ports, named pipes, and mailslots. Functions such as ReadFile or WriteFile may also accept socket handles created by the WinSock functions socket or accept, depending on the WinSock implementation.
In the next section, we discuss console and communication port I/O.
Win32 applications can use the CreateFile, ReadFile, and WriteFile functions to perform console input and output. Consoles provide an interface for character-based applications.
Unless its input or output is redirected, an application inherits file handles to the console that can be obtained by calling the function GetStdHandle. However, if the application's standard handles are redirected, GetStdHandle returns the redirected handles. In this case, applications can open the console explicitly by using the special filenames CONIN$ and CONOUT$.
When opening the console for input or output, make sure to specify the FILE_SHARE_READ or FILE_SHARE_WRITE sharing mode, respectively. Also, use the OPEN_EXISTING creation mode. For example:
In order to be able to perform certain operations on a console opened for reading (such as flushing the input buffer or setting the console mode), it may be necessary to open the console for reading and writing (GENERIC_READ | GENERIC_WRITE).
By default, the console is opened for line-oriented input. The SetConsoleMode function can be used to change the input and output mode. For example, to set the console to raw input mode (every character is returned immediately, no control character processing takes place) use the following call:
The ReadFile function can be used to read keyboard input from the console. However, it is recommended that applications use ReadConsole instead where applicable; unlike ReadFile, ReadConsole can handle both ASCII and Unicode input on Windows NT.
To write to the console, the WriteFile function can be used. WriteConsole has the same functionality but it can also handle Unicode output on Windows NT, and thus it is the preferred function for console output.
What I have explained so far may give the impression that using the console is just a glorified way of accomplishing what can easily be done using standard C/C++ library functions. This is not so. A console has many capabilities other than providing a facility for keyboard input and character output.
In addition to character input and output, a console also handles mouse input and provides some window management functions for the console window. The low-level console input functions ReadConsoleInput and PeekConsoleInput can be used to retrieve information about keyboard, mouse, and window events. On the output side, the low-level output function WriteConsoleOutputAttribute can be used to write text and background color attributes to the console.
Graphical Windows applications do not by default have access to a console. These applications can explicitly create a console by calling the AllocConsole function. A process can detach itself from its console by calling the FreeConsole function.
The console window's title and position can be controlled using the SetConsoleTitle and SetConsoleWindowInfo functions, respectively.
Communication ports are opened and used via the CreateFile, ReadFile, and WriteFile functions. That said, there are several other functions that applications must use to set up the communication ports and control their behavior.
The basic setup of a communication port takes place using a DCB, or device control block structure. Members of this structure specify the baud rate, parity, data and stop bits, handshaking, and other aspects of port behavior. The current settings can be obtained using the GetCommState function and can be set by calling the SetCommState function. A helper function, BuildCommDCB, can be used to fill in parts of this structure on the basis of a string formatted in the style of an MS-DOS MODE command.
Read and write time-out behavior is controlled through the COMMTIMEOUTS structure. Current time-outs can be retrieved by calling GetCommTimeouts and set by calling SetCommTimeouts. The helper function BuildCommDCBAndTimeouts can be used to fill in both a DCB structure and a COMMTIMEOUTS structure using command strings.
The size of the input and output buffers can be controlled by calling the SetupComm function.
The WaitCommEvent function can be used to wait for a specific event to occur on the communication port.
The SetCommBreak function places the communication line in a break state. The state can be cleared by calling the ClearCommBreak function.
The ClearCommError function can be used to clear an error condition. This function also reports the status of the communication device.
The PurgeComm function can be used to clear any I/O buffers associated with the communication port, and to interrupt pending I/O operations.
The TransmitCommChar function transmits a character on the communication port ahead of any pending data in the output buffer. This function can be used, for example, to transmit interrupt characters such as Ctrl+C.
The communication port can also be opened for overlapped I/O operations. An event mask, which controls which events set the state of the event object (specified as part of the OVERLAPPED structure), can be set using the SetCommMask function. The current event mask can be retrieved by calling the GetCommMask function.
Low-level access to port functions is provided by the functions EscapeCommFunction and DeviceIoControl.
Further information about the communication port and its status can be obtained by calling the GetCommProperties and GetCommModemStatus functions.
Windows 95 also offers the function CommConfigDialog, which displays a driver-specific configuration dialog for the specified communication port.
Win32 applications can access files through three distinct sets of file management functions: the stream I/O and low-level I/O functions, both part of the C/C++ run-time libraries, and the Win32 file management functions.
A file is a collection of data on a storage device, such as a disk. Files are organized on the device into file systems. Windows NT recognizes the DOS FAT, protected-mode FAT, NTFS, and HPFS file systems. Windows 95, in contrast, can only deal with FAT and protected-mode FAT file systems. Windows NT supports file-level compression on NTFS file systems; Windows 95, in turn, supports volume compression through DriveSpace. On NTFS volumes, Windows NT also supports advanced security features.
In addition to disks, files may also be accessed on CD-ROM and remote volumes across the network.
The Win32 file management functions treat an open file as an operating system object, referred to by a handle. At the heart of Win32 file management are the functions CreateFile, ReadFile, and WriteFile. Through file handles, I/O operations can be performed synchronously and asynchronously; for the latter, applications can use the technique of overlapped I/O. Overlapped I/O enables an application to regain control after an I/O call, even before the I/O operation is finished, and do something else until it is notified of the operation's completion.
Win32 file management routines can also be used to perform I/O on the standard input, standard output, and standard error. Handles to these can be obtained by calling the GetStdHandle function.
Applications can also access files through DOS/UNIX style low-level I/O functions. The names of these functions are preceded by an underscore (for example, _open, _read) to indicate that they are not part of the standard ANSI library. However, applications can be linked with the oldnames.lib library file if the use of the old names without underscores is desired.
In addition to low-level I/O, applications can also use stream I/O. This includes C functions such as fopen, fprintf, fscanf, or fclose; it also includes the C++ iostream classes.
The Win32 file management functions can also be used to access special devices. They include the console, communication ports, named pipes and mailslots, and sockets opened by calls to socket or accept. Several special functions provide fine control over console and communication port I/O.