Capturing console I/O in Win32

You've come to this page because you've asked a question similar to

I am writing a Win32 program to run on Windows NT (versions 4 to 10), and I want to capture the output sent to the console of a DOS or a Win32 console application that I spawn from my program, and control its console input. How do I do this ?

This is the Frequently Given Answer to that question.

(This used to be the same as the answer given, formatted slightly differently, on J.M. Hart's Critical Comments page.)

How to do this

You used not to be able to do this. As of 2018, you now can, albeit only in Windows NT 10. There is an updated console I/O server available with a way to attach to its back-end, at long last.

You make some pipes to connect things up, then you call CreatePseudoConsole() to create a "pseudo-console" and connect the pipes to it, and then you start your console programs with a special flag that attaches your pseudo-console to their consoles. You send an ECMA-48 formatted stream of input down one pipe, which the "pseudo-console" turns into normal Win32 console input events for mouse and keyboard inserted into the console's input buffer. And you receive an ECMA-48 formatted stream of output, describing changes to the Win32 screen buffer of your console and cursor updates.
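
As a minimal sketch of those three steps in C (not Microsoft's own sample code), the following creates the pipes, the pseudo-console, and the child process. The names captured_console and spawn_captured are made up for illustration, the 80 by 25 size is an arbitrary assumption, and error handling is kept to the bare minimum. The non-Windows branch is merely a stub so that the file still compiles elsewhere.

```c
#ifdef _WIN32
#include <windows.h>  /* needs a Windows 10 (1809 or later) SDK for the pseudo-console API */

/* Hypothetical bundle of everything the monitoring program must keep hold of. */
typedef struct {
    HPCON pcon;               /* the pseudo-console itself */
    HANDLE to_console;        /* write ECMA-48 input here */
    HANDLE from_console;      /* read ECMA-48 output here */
    PROCESS_INFORMATION proc; /* the spawned console program */
} captured_console;

/* Start cmdline attached to a fresh pseudo-console; 0 on success, -1 on failure. */
int spawn_captured(wchar_t *cmdline, captured_console *cc)
{
    HANDLE in_rd = NULL, in_wr = NULL, out_rd = NULL, out_wr = NULL;
    STARTUPINFOEXW si;
    SIZE_T bytes = 0;
    COORD size = { 80, 25 };  /* assumed size, columns by rows */
    HRESULT hr;
    int ok = 0;

    ZeroMemory(cc, sizeof *cc);
    ZeroMemory(&si, sizeof si);
    si.StartupInfo.cb = sizeof si;

    /* Step 1: the pipes. */
    if (!CreatePipe(&in_rd, &in_wr, NULL, 0)) return -1;
    if (!CreatePipe(&out_rd, &out_wr, NULL, 0)) {
        CloseHandle(in_rd);
        CloseHandle(in_wr);
        return -1;
    }

    /* Step 2: the pseudo-console, which takes the far ends of both pipes. */
    hr = CreatePseudoConsole(size, in_rd, out_wr, 0, &cc->pcon);
    if (SUCCEEDED(hr)) {
        /* Step 3: the "special flag" is an attribute list plus EXTENDED_STARTUPINFO_PRESENT. */
        InitializeProcThreadAttributeList(NULL, 1, 0, &bytes);  /* sizing call; fails by design */
        si.lpAttributeList = (LPPROC_THREAD_ATTRIBUTE_LIST)HeapAlloc(GetProcessHeap(), 0, bytes);
        if (si.lpAttributeList &&
            InitializeProcThreadAttributeList(si.lpAttributeList, 1, 0, &bytes)) {
            ok = UpdateProcThreadAttribute(si.lpAttributeList, 0,
                                           PROC_THREAD_ATTRIBUTE_PSEUDOCONSOLE,
                                           cc->pcon, sizeof(HPCON), NULL, NULL)
              && CreateProcessW(NULL, cmdline, NULL, NULL, FALSE,
                                EXTENDED_STARTUPINFO_PRESENT, NULL, NULL,
                                &si.StartupInfo, &cc->proc);
            DeleteProcThreadAttributeList(si.lpAttributeList);
        }
        if (si.lpAttributeList) HeapFree(GetProcessHeap(), 0, si.lpAttributeList);
    }

    /* The pseudo-console keeps its own references to its ends of the pipes. */
    CloseHandle(in_rd);
    CloseHandle(out_wr);
    if (!ok) {
        CloseHandle(in_wr);
        CloseHandle(out_rd);
        if (SUCCEEDED(hr)) ClosePseudoConsole(cc->pcon);
        ZeroMemory(cc, sizeof *cc);
        return -1;
    }
    cc->to_console = in_wr;
    cc->from_console = out_rd;
    return 0;
}
#else
#include <stddef.h>  /* wchar_t */

typedef struct { int unused; } captured_console;

/* There is no pseudo-console API here: this stub only reports failure. */
int spawn_captured(wchar_t *cmdline, captured_console *cc)
{
    (void)cmdline;
    (void)cc;
    return -1;
}
#endif
```

After a successful call, one would write ECMA-48 input to to_console with WriteFile(), read ECMA-48 output from from_console with ReadFile(), and eventually call ClosePseudoConsole() and close the remaining handles.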

Microsoft has a more detailed description.

As an aside: there is no distinction between DOS and Win32 console applications here. On Windows NT, the case of DOS applications is actually a subset of the more general case of "console mode" applications, because DOS applications run (on those 32-bit versions of Windows NT that still support this) as coroutines within a Win32 process (NTVDM) that translates their I/O to Win32 console I/O equivalents.

Why this used to be impossible

Because of a programming tradition on Microsoft operating systems that goes all of the way back to the earliest versions of MS-DOS, there was no one single way to capture the output or control the input of a textual application.

There are two classes of console I/O applications. The important difference between the two is whether they read from and write to their standard input and standard output in "glass TTY" fashion using ReadFile() and WriteFile() (what Win32 terms high-level console I/O), or whether they use "random access" APIs such as WriteConsoleOutput() (what Win32 terms low-level console I/O).

This is a paradigm that existed throughout the 1980s, gradually evolving into the Windows console I/O paradigm that has existed from the 1990s onwards. DOS programs that use INT 21h to read from and write to a CON device are using high-level console I/O; and DOS programs that use INT 10h or that write directly to video memory are using low-level console I/O. OS/2 also had this distinction, between DosRead() and DosWrite() from/to a CON device and using the VIO/KBD/MOU subsystems.

To capture the output and control the input of programs that use low-level console I/O, one sat in a loop whilst the child process was executing, continuously monitoring the contents of the console screen buffer using ReadConsoleOutput() and sending any keystrokes using WriteConsoleInput().

There were several problems with this design.
- One minor problem was that it didn't cope at all well with Win32 programs that take full advantage of the features of the Win32 low-level console paradigm and use alternate console screen buffers for their output.
- A more major problem was that because it used polling (Win32 not providing any handy event mechanism to latch on to so that the monitor could know when the buffer had been modified, screen buffers not being synchronizable objects in Win32) it was always going to be both unreliable and expensive. It was unreliable in that, depending on the relative scheduling priorities of the various processes, something which varies dynamically with system load, the child program could well generate reams of output that the monitoring process would simply miss because its console monitoring thread was not scheduled to run often enough. It was expensive in that if the child process happened not to generate any output for a while, the monitoring process consumed CPU time needlessly.

To capture the output and control the input of programs that use high-level console I/O, one redirected their standard input and standard output through pipes, and read from and wrote to the other end of the pipes in the monitoring process.

The advantage of this method was that one didn't need to worry about missing any output, since this approach didn't use polling. But, conversely, this method had a problem of its own, in that it wouldn't capture any output generated by low-level console I/O, and that programs that use low-level console I/O for input would bypass the redirection entirely. Alas, all too many DOS programs (and some Win32 programs) fall into this category.

Cygwin substituted pipes for consoles, with mechanisms that hid this from Win32 applications that were compiled against a special Cygwin library, and had compatibility problems with non-Cygwin programs because of it.

It was difficult to combine the two mechanisms into one for capturing output, and practically impossible to combine them for controlling input. So really one had to know ahead of time the type of textual application that was going to be run, i.e. whether it was going to be using high-level or low-level console I/O, and select the appropriate mechanism accordingly.

The solutions

The perfect solution that copes with both sorts of programs simultaneously is for Win32 to provide functionality akin to what in the UNIX world is known as a pseudo-TTY. Win32 needed to provide some means for a monitoring process to hook into the "back" of a console instead of the "front" as seen in normal use. The monitor process writes data to the back of the console and the console turns those data into keystrokes that applications reading from the front of the console, by either means, see as input. All output written to the console, by either means, is translated into a single encoded bytestream that the monitor process can then read from the back of the console, in sequence, without needing to poll, and without missing any data.
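
To make "writes data to the back of the console" a little more concrete, here is a small sketch in C of turning a keystroke into the bytes that a monitor process might write. The monitor_key enumeration and encode_key() are invented for this illustration. The byte sequences are the ones conventionally sent by DEC VT-style terminals with cursor keys in normal mode, which is assumed here to be what the back of the console expects.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical key names; not part of any Win32 API. */
typedef enum {
    KEY_UP, KEY_DOWN, KEY_RIGHT, KEY_LEFT, KEY_ENTER, KEY_CTRL_C
} monitor_key;

/* Copy the input bytes for key into buf (capacity cap).
   Returns the number of bytes, or 0 if the key is unknown or does not fit. */
size_t encode_key(monitor_key key, char *buf, size_t cap)
{
    const char *seq;
    size_t len;

    switch (key) {
    case KEY_UP:     seq = "\x1b[A"; break;  /* CSI A */
    case KEY_DOWN:   seq = "\x1b[B"; break;  /* CSI B */
    case KEY_RIGHT:  seq = "\x1b[C"; break;  /* CSI C */
    case KEY_LEFT:   seq = "\x1b[D"; break;  /* CSI D */
    case KEY_ENTER:  seq = "\r";     break;  /* carriage return */
    case KEY_CTRL_C: seq = "\x03";   break;  /* ETX */
    default:         return 0;
    }
    len = strlen(seq);
    if (len > cap) return 0;
    memcpy(buf, seq, len);
    return len;
}
```

The resulting bytes would simply be written down the input pipe; it is the console's job, not the monitor's, to turn them into input events for whichever style of application is reading.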

At last, after almost 20 years of this Frequently Given Answer being given, on this WWW page and otherwise, Windows has such a mechanism.

Microsoft actually addressed this twice. The first time was with Interix, which was the second (and oft-forgotten) POSIX Subsystem for Windows NT, originally written by Softway Systems, which Microsoft then took over. This had a full terminal I/O control sequence processor (and its own Interix terminal type that you can still find in terminfo/termcap databases) that performed its I/O against consoles. It even had real pseudo-terminals, with fully-fledged line disciplines and signal handling. The mechanism was, however, buried within the POSIX Subsystem itself, which used internal mechanisms between it and CSRSS.EXE to share consoles with the Win32 Subsystem. It was not accessible to Win32 programs, and couldn't be used by a Win32 program to capture the console I/O of a Win32 program.

The simplest design for giving this capability to Win32 programs would have been to make console screen buffers synchronizable objects, as console input buffers already were (albeit by some handle fudging involving a hidden event object). To that, one would add some form of "damage rectangle" recording what parts of the screen buffer had been written to (since the last damage rectangle was collected) and API functionality to retrieve it. Programs that wanted to capture console I/O could have waited on the screen buffer for updates with WaitForMultipleObjects() and used the existing API functions to read out the screen buffer contents from the "damage" area.
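
The "damage rectangle" bookkeeping is simple enough to sketch in a few lines of C. This is purely an illustration of the idea: Win32 never provided such an API, and the names damage_rect, damage_add(), and damage_collect() are invented here.

```c
#include <stdbool.h>

/* Hypothetical damage rectangle, in inclusive cell coordinates. */
typedef struct {
    bool dirty;  /* false means nothing has been written since the last collection */
    int left, top, right, bottom;
} damage_rect;

/* Record that the cell at (col, row) has been written to. */
void damage_add(damage_rect *d, int col, int row)
{
    if (!d->dirty) {
        d->dirty = true;
        d->left = d->right = col;
        d->top = d->bottom = row;
        return;
    }
    if (col < d->left)   d->left = col;
    if (col > d->right)  d->right = col;
    if (row < d->top)    d->top = row;
    if (row > d->bottom) d->bottom = row;
}

/* Hand back the damage accumulated so far and start afresh. */
damage_rect damage_collect(damage_rect *d)
{
    damage_rect collected = *d;
    d->dirty = false;
    return collected;
}
```

In the design described above, the console would have called something like damage_add() on every write and signalled the screen buffer object, and the capturing program, once woken, would have collected the rectangle and read just that area.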

The mechanism that Microsoft produced the second time around is more ambitious than that, albeit not yet (as of 2020) as functional as Interix was when it comes to control sequence processing. It includes I/O translation mechanisms on both the front-ends and the back-ends of consoles, resulting in a four-way translation graph. As aforementioned, the application capturing console I/O attaches to the back end with bidirectional translation and the target applications whose 1980s-style terminal I/O is being captured attach to the front with bidirectional translation (new 1990s console I/O applications not needing these I/O translation layers).

It does the damage area processing internally. The downside of this is that what comes out of the "pseudo-console" is not an exact duplicate of what an old 1980s terminal I/O application may be generating, because that first gets turned into changes to the screen buffer, which then in turn get turned back into ECMA-48 control sequences. This process is not necessarily a 1:1 mapping, as the "pseudo-console"'s update processing may decide on a different encoding of the updates. (Working out how to turn updates to a cell matrix into a stream of old terminal I/O control sequences can be done in several ways, some vastly less efficient than others, especially if comparatively more recent terminal I/O mechanisms like Indexed or Direct RGB colour are employed on the being-captured-application side.)
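
To see why the mapping need not be 1:1, consider a deliberately naive encoder, sketched here in C. It is not how the pseudo-console actually works; encode_updates() and its flat row-major screen layout are assumptions made for this example. It compares two snapshots of a cell matrix and emits a cursor position (CUP) sequence followed by the new text for every run of changed cells.

```c
#include <stdio.h>
#include <string.h>

/* Compare two rows-by-cols character matrices (row-major, printable cells assumed)
   and write ECMA-48 output describing the changes into out (capacity cap).
   Returns the number of bytes written, not counting the terminating NUL;
   stops early, leaving a shorter valid string, if out runs out of room. */
size_t encode_updates(const char *before, const char *after,
                      int rows, int cols, char *out, size_t cap)
{
    size_t used = 0;

    if (cap == 0) return 0;
    for (int r = 0; r < rows; ++r) {
        const char *b = before + (size_t)r * cols;
        const char *a = after + (size_t)r * cols;
        int c = 0;
        while (c < cols) {
            if (b[c] == a[c]) { ++c; continue; }
            /* Find the end of this run of changed cells. */
            int end = c;
            while (end < cols && b[end] != a[end]) ++end;
            /* CUP is 1-based: ESC [ row ; column H */
            char head[32];
            int n = snprintf(head, sizeof head, "\x1b[%d;%dH", r + 1, c + 1);
            size_t run = (size_t)(end - c);
            if (n < 0 || used + (size_t)n + run + 1 > cap) {
                out[used] = '\0';
                return used;
            }
            memcpy(out + used, head, (size_t)n);
            used += (size_t)n;
            memcpy(out + used, a + c, run);
            used += run;
            c = end;
        }
    }
    out[used] = '\0';
    return used;
}
```

An application that overwrote "world   " with "WORLD ok" in a single write would come out of this encoder as two separately positioned runs, because the unchanged space in the middle is skipped. A different encoder could equally well emit one positioned run of eight cells. Both describe the same screen, and neither matches what the application originally sent.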

Old 1970s and 1980s applications

The new console I/O paradigm that developed over the course of the 1980s was preceded by the old terminal I/O paradigm of the 1970s and very early 1980s. As a bonus, but not strictly connected to capturing console I/O, old terminal I/O applications from back then are also now catered for.

There is a second ECMA-48 translation layer at the front end of a Win32 console that allows the console applications themselves to also speak ECMA-48 input and output. This is a giant step backwards, though, from the new console I/O paradigm of the 1990s back to the old terminal I/O paradigm of the 1970s and very early 1980s.

Moreover, it has a number of problems. For one thing, the "terminal I/O" mode flag is global to the console rather than local to each application using it, resulting in the same sort of conflicts between old terminal I/O and new console I/O applications that one sees when setting shared file descriptors to non-blocking I/O mode on POSIX operating systems. For another, there's no terminfo/termcap equivalent and no TERM environment variable convention in Win32; your application has to hardwire control sequences, or adopt some third-party terminfo/termcap convention.
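
Because the flag is shared, an application that insists on switching it on would do well to put things back afterwards. A minimal sketch in C, covering only the output side, might look like the following. The helper names terminal_mode_begin() and terminal_mode_end() are invented for this example, and the non-Windows branch is just a stub so that the file compiles elsewhere.

```c
#ifdef _WIN32
#include <windows.h>

#ifndef ENABLE_VIRTUAL_TERMINAL_PROCESSING
#define ENABLE_VIRTUAL_TERMINAL_PROCESSING 0x0004  /* absent from older SDK headers */
#endif

/* Switch on control sequence processing for standard output, remembering
   the previous mode in *saved_mode. Returns 0 on success, -1 on failure
   (for example when standard output is not a console). */
int terminal_mode_begin(unsigned long *saved_mode)
{
    HANDLE out = GetStdHandle(STD_OUTPUT_HANDLE);
    DWORD mode;

    if (out == INVALID_HANDLE_VALUE || !GetConsoleMode(out, &mode)) return -1;
    *saved_mode = mode;
    return SetConsoleMode(out, mode | ENABLE_VIRTUAL_TERMINAL_PROCESSING) ? 0 : -1;
}

/* Restore the mode that was in force before terminal_mode_begin(). */
void terminal_mode_end(unsigned long saved_mode)
{
    SetConsoleMode(GetStdHandle(STD_OUTPUT_HANDLE), (DWORD)saved_mode);
}
#else
/* No Win32 console here: these stubs only report failure. */
int terminal_mode_begin(unsigned long *saved_mode)
{
    (void)saved_mode;
    return -1;
}

void terminal_mode_end(unsigned long saved_mode)
{
    (void)saved_mode;
}
#endif
```

Restoring the saved mode does nothing for any other program that writes to the same console in the meantime, which is exactly the conflict described above.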

If your application has already come forward into the 1990s, it might be better to stay there. ☺
