The gen on the C and C++ language bindings to the DOS API

Every operating system API has C/C++ language bindings, which make that API accessible to programs written in the C and C++ languages. The OS/2 system API has the <os2.h> header and the os2386.lib link library, for example. The Win32 API has the <windows.h> header and a whole bunch of link libraries such as kernel32.lib, user32.lib, gdi32.lib, and advapi32.lib. The POSIX API has <unistd.h>, <sys/stat.h>, <sys/socket.h>, and a whole bunch of other headers, and the libc link library.

For the DOS API, the C and C++ language bindings comprise the <dos.h>, <io.h>, <direct.h>, and <conio.h> headers, and a link library of wrapper and shim functions that is usually rolled into the implementation's all-in-one "runtime library".

These were supplied by pretty much all DOS-targeting implementations of the C and C++ languages, from Watcom C/C++, through Turbo C/C++ and Microsoft C, to Borland C/C++ for DOS.

There were essentially two classes of functions provided by the C/C++ language bindings: direct wrappers for the DOS INT 0x21 API itself, which simply took their function parameters and stuck them into the appropriate processor registers before invoking INT 0x21; and "shim" functions layered on top of the DOS API, which did further processing to provide semantics that DOS itself did not, such as POSIX-style permission flags and "text mode" files (more on which later).
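To make the first class concrete, here is a toy model of a direct wrapper, in the style of dos_read(), for INT 0x21/AH=0x3F (I/O handle in BX, byte count in CX, buffer address in DS:DX, result in AX, with the carry flag signalling an error). It is a sketch and not any vendor's actual code: struct model_regs, model_int21(), and model_dos_read() are names invented for this illustration, and the interrupt is faked with an ordinary function so that the sketch compiles and runs outside DOS.

```c
#include <string.h>

/* Toy stand-in for the processor registers that INT 0x21 uses.
   Every model_ name here is hypothetical. */
struct model_regs {
    unsigned short ax;   /* AH = function number on entry; result on exit */
    unsigned short bx;   /* I/O handle */
    unsigned short cx;   /* maximum number of bytes to read */
    void *ds_dx;         /* stands in for the DS:DX buffer pointer */
    int carry;           /* carry flag: set on exit if the call failed */
};

/* Fake INT 0x21 handler, so that the sketch runs anywhere. It services
   only AH=0x3F, and only for handle 0, by "reading" from a fixed string. */
static void model_int21(struct model_regs *r)
{
    static const char data[] = "hello";
    unsigned short n = r->cx;

    if ((r->ax >> 8) != 0x3F) {
        r->carry = 1;
        r->ax = 1;       /* DOS error 1: invalid function number */
        return;
    }
    if (r->bx != 0) {
        r->carry = 1;
        r->ax = 6;       /* DOS error 6: invalid handle */
        return;
    }
    if (n > sizeof data - 1)
        n = (unsigned short)(sizeof data - 1);
    memcpy(r->ds_dx, data, n);
    r->carry = 0;
    r->ax = n;           /* number of bytes actually read */
}

/* Direct wrapper: load the registers, invoke the interrupt, unload the
   result. Returns 0 on success, otherwise the DOS error code. */
unsigned model_dos_read(int handle, void *buf, unsigned count, unsigned *nread)
{
    struct model_regs r;

    r.ax = 0x3F00;                    /* AH = 0x3F: read from file or device */
    r.bx = (unsigned short)handle;
    r.cx = (unsigned short)count;
    r.ds_dx = buf;
    r.carry = 0;
    model_int21(&r);
    if (r.carry)
        return r.ax;                  /* AX holds the error code */
    *nread = r.ax;                    /* AX holds the byte count */
    return 0;
}
```

A real wrapper would issue the software interrupt itself, typically through inline assembly or a library helper. The point here is the shape of the function: nothing happens to the data on its way in or out.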

The headers

The individual headers provide access to different portions of the DOS API:

<io.h>

File I/O functionality, including:

plus a whole load of shims, more on which later.

<direct.h>

Directory manipulation functionality, including:

<conio.h>

"Console" I/O functionality, including:

<dos.h>

Pretty much everything else, such as (to pick a few examples):

Resemblance to the POSIX API

The C/C++ language bindings to the DOS API were, and still are, often conflated with the POSIX API C language bindings, but they are in fact a wholly different API, that just happens to resemble the POSIX API on a dark night if one squints heavily.

Sometimes, this resemblance was intentional. <io.h>, for example, also declared a whole raft of supposedly POSIX-alike functions, such as open(), chmod(), read(), write(), lseek(), and close(). These were shim functions, which internally called the DOS API but placed some mappings atop it, changing POSIX permission flags into DOS file attributes (where possible), and implementing the handling of character 26 and of CR+LF sequences in O_TEXT mode files (which, contrary to popular belief, are not functions of the DOS API itself).
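As a rough illustration of the first of those mappings, the sketch below reduces a POSIX-style permission mode to a DOS attribute byte. DOS has no owner, group, or other permissions to map onto; the only attribute that corresponds to anything in the mode is the read-only bit (0x01). The function name mode_to_dos_attr() and the MODEL_S_IWRITE constant are invented for this sketch, and real runtime libraries differ in their details.

```c
/* Hypothetical sketch of how a chmod()-style shim might map a POSIX-style
   permission mode onto a DOS file attribute byte. Not taken from any real
   runtime library. */

#define MODEL_S_IWRITE   0200   /* owner write permission bit (assumed value) */
#define DOS_ATTR_RDONLY  0x01   /* DOS read-only attribute bit */

/* Returns old_attr with the read-only bit adjusted to match the mode.
   Every other permission bit in the mode is silently dropped, since DOS
   has nowhere to store it; the other attribute bits are preserved. */
unsigned char mode_to_dos_attr(unsigned mode, unsigned char old_attr)
{
    if (mode & MODEL_S_IWRITE)
        return (unsigned char)(old_attr & ~DOS_ATTR_RDONLY);
    return (unsigned char)(old_attr | DOS_ATTR_RDONLY);
}
```

Under this model, modes 0644 and 0600 are indistinguishable once stored, which is what "where possible" amounts to in practice.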

Sometimes this resemblance was a simple consequence of the fact that the DOS API and the POSIX API work in almost exactly the same way. INT 0x21/AH=0x3F, for example, has almost exactly the semantics of the POSIX API read() function from <unistd.h>: it is given a buffer pointer, an I/O handle, and a maximum size, and it reads up to that number of bytes from the handle directly into the buffer as-is, without any processing of them, and returns either an error code or the number of bytes read. Thus dos_read() from <dos.h> closely resembles the POSIX read() function.

Sometimes, there was a distinct difference, and not a resemblance at all. The most widely-known such difference is the DOS API mkdir() function from <direct.h>, which takes one argument, the string to pass to INT 0x21/AH=0x39. The POSIX API mkdir() function from <sys/stat.h> takes two arguments. And of course, as mentioned, the shim functions in <io.h> that were layered on top of the DOS API itself added a whole load of "text file" processing, neither native to DOS itself nor the same as the POSIX semantics, such as special handling for character 26 and modification of CR+LF sequences. Thus, and ironically, functions like the read() shim from <io.h> were far less similar in operation to the POSIX read() function (from <unistd.h>) than the underlying DOS API dos_read() function from <dos.h> was.
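The "text file" processing can be sketched as a filter that a read() shim applies to the bytes that DOS hands back. What follows is a simplified model under stated assumptions, not any particular library's implementation: text_mode_filter() is a hypothetical name, and the sketch ignores the awkward case where a CR is the last byte of one read and its LF arrives in the next.

```c
#include <stddef.h>

/* Hypothetical sketch of the post-processing a text-mode read() shim
   performs on raw bytes already read from DOS.

   Filters the n bytes in buf in place: drops the CR of each CR+LF pair,
   and stops at the first character 26 (Ctrl-Z), treating it as the end of
   the file. Returns the number of bytes left at the start of buf.

   Simplification: a CR in the final position is kept as-is, even though
   the matching LF might turn up in the next read. */
size_t text_mode_filter(char *buf, size_t n)
{
    size_t in, out = 0;

    for (in = 0; in < n; ++in) {
        if (buf[in] == 26)
            break;          /* Ctrl-Z: logical end of file */
        if (buf[in] == '\r' && in + 1 < n && buf[in + 1] == '\n')
            continue;       /* skip the CR; the LF is copied on the next pass */
        buf[out++] = buf[in];
    }
    return out;
}
```

The count that comes back can be smaller than the count that DOS reported, and anything after the character 26 is never seen by the caller. Neither of those is true of dos_read() or of POSIX read().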

Borland and conio

Originally, the C/C++ language bindings to the DOS API were as above. Then along came Borland.

Borland had to be faster than Microsoft. Its compiler had to compile faster. And programs compiled with it had to run faster. So Borland changed all of the <conio.h> functions. Instead of calling the console I/O API that the operating system actually provided, and being simple wrappers for the DOS API itself, Borland's versions of the functions bypassed DOS and either called the low-level device-specific machine firmware API, or talked to the console hardware directly, peeking and poking video RAM.

kbhit() turned into a firmware call. putch() wrote directly to VRAM and came to know about text window boundaries, scrolling flags, and colours. getche() became putch(getch()). And a whole load of new functions such as window() and gotoxy() were added.

As a consequence of this, it became a Frequently Given Answer to point out that with a Microsoft-compiled program using <conio.h>, one could redirect the standard input and standard output of the program and it would work properly, because the DOS API that the <conio.h> functions called was of course aware of I/O handle redirection; whereas with a Borland-compiled program using <conio.h>, redirecting the standard input and standard output of the program simply wouldn't have any effect.

This was particularly galling to people who wanted to run Borland-compiled programs remotely, on BBSes that they were connected to via terminal emulators. A program that prompted the user to press a key and then called getch() would work for BBS use if compiled with the Microsoft compiler, since the BBS software could redirect the DOS I/O handles through the serial device and DOS would handle the redirected console I/O in the normal fashion. But the same program if compiled with the Borland compiler would not work for BBS use, since getch() would talk to the firmware directly for keyboard access, rather than go through the redirectable DOS API functions.

Watcom's <conio.h> library followed Borland's, and the same Frequently Given Answer applied to Watcom-compiled programs. Thanks to Borland, the popular wisdom surrounding getch() and its companions changed to the extent that people eventually regarded them as highly hardware-specific, even though they had started off as simple wrappers for DOS API functions that could be redirected and would work with files and with most DOS character devices.

Underscores

There are various spellings of the DOS API C/C++ language binding function names. In part, these came about because of a confusion as to what implementors of the C and C++ languages should do with library functions that they supplied as standard, but that weren't part of the ISO C and C++ standard libraries. Originally, the function names were unadorned, as above. Later on, the popular belief that everything that was "non-ANSI" should be prefixed with an underscore took hold, and DOS C/C++ implementors renamed their functions to names such as _getche() and _dos_findfirst(). But because by that time there was a significant codebase using the former function names, DOS C/C++ implementations ended up with both forms in their headers, rather making a mockery of the reasons that the underscore convention was supposedly introduced in the first place. (Of course, nowadays, people appreciate far more that an operating system API's language bindings are in essence little different from any other application-mode programming library, and are not necessarily required to be specially marked with underscores.)

Relics

The C/C++ language bindings to the DOS system API are also available on compilers that don't target DOS. This is mainly to provide some form of source-compatible upgrade path for applications being ported from MS/PC/DR-DOS to the platforms that the compilers target. In these circumstances all of the functions are shims, layered on top of the native operating system API. On OS/2-targeting implementations, for example, the <conio.h> functions are layered on top of the 16-bit OS/2 VIO/KBD API and the <dos.h> functions are layered on top of the OS/2 Control Program API (i.e. the various DosXYZ() API functions).

The mapping from the DOS API shims to the actual operating system API is usually quite imperfect. For example: on OS/2 and Win32, directory searches have to be closed lest one leak handles. But this is not true for the DOS system API. The DOS API C language bindings only have _dos_findfirst() and _dos_findnext(). As a consequence of this, there's usually either a bodge in the library to attempt to reduce search handle leakage heuristically (as was the approach taken by Borland C/C++ for OS/2) or an API extension providing a new _dos_findclose() function that ported DOS code has to be modified to call (as is the approach taken by Watcom C/C++).
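The leak is easy to reproduce in a toy model. The sketch below is not the OS/2 or Win32 API, nor any vendor's shim: the model_ functions and the limit of four search handles are invented for illustration. They mimic an operating system in which every directory search holds a handle until it is explicitly closed, with model_findclose() playing the part of the _dos_findclose() extension.

```c
/* Toy model of an operating system where directory searches consume
   handles. All names and the handle limit are hypothetical. */

#define MODEL_MAX_SEARCHES 4

static int model_open_searches;   /* search handles currently allocated */

struct model_find {
    int active;      /* nonzero while this search holds a handle */
    int remaining;   /* entries not yet returned */
};

/* Starts a search over a pretend directory of nfiles entries.
   Returns 0 on success, nonzero if there is nothing to find or if no
   search handle is available. */
int model_findfirst(struct model_find *f, int nfiles)
{
    if (nfiles < 1 || model_open_searches >= MODEL_MAX_SEARCHES)
        return 1;
    ++model_open_searches;
    f->active = 1;
    f->remaining = nfiles - 1;
    return 0;
}

/* Advances the search. Returns nonzero when there are no more entries.
   Running out of entries does NOT release the handle. */
int model_findnext(struct model_find *f)
{
    if (!f->active || f->remaining == 0)
        return 1;
    --f->remaining;
    return 0;
}

/* The extension that ported code must call to give the handle back. */
int model_findclose(struct model_find *f)
{
    if (!f->active)
        return 1;
    f->active = 0;
    --model_open_searches;
    return 0;
}
```

DOS code that was written with only the first two functions in mind runs out of handles on the fifth search in this model, however thoroughly it walked the first four. A heuristic bodge has to guess when to do what model_findclose() does, for example when model_findnext() reports that there are no more entries, and a program that abandons a search part-way through defeats that guess.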

Interestingly, this is not a problem confined to C/C++ compilers providing compatibility shim functions for porting DOS programs. It exists in DOS emulators, too. As Microsoft KnowledgeBase article 195930 notes, the Virtual DOS Machine subsystem on Windows NT, NTVDM, has exactly the same problem. It has to map DOS API calls made by DOS programs running within the VDM into Win32 API calls. But it has no way to know when a DOS program has finished with a directory search, unless the entire program terminates, of course. So it gradually leaks directory search handles as it calls FindFirstFile() without later calling FindClose(), until the DOS program eventually exits.


© Copyright 2010 Jonathan de Boyne Pollard. "Moral" rights asserted.
Permission is hereby granted to copy and to distribute this web page in its original, unmodified form as long as its last modification datestamp is preserved.