console-decode-ecma48 — filter to decode ECMA-48 sequences
console-decode-ecma48 [--input] [--no-cancel] [--no-7bit] [--control-strings] [files...]
console-decode-ecma48 reads a sequence of characters, which it treats as UTF-8 encoded ECMA-48 sequences. For each control sequence, it prints a human-readable decoded form.
It understands several of the DEC private control sequences, including those for function key inputs. It also understands the function key control sequences for Interix, SCO console, and rxvt terminals.
If no files are supplied, console-decode-ecma48 reads from its standard input until EOF.
Otherwise it processes each named file in turn.
In both cases it writes to its standard output.
Malformed escape and control sequences, whether interrupted by CSI or ESC or cancelled by CAN, are discarded and not decoded, just as most terminals ignore them. The --no-cancel command-line option disables cancellation by CAN, allowing that character to be passed through as a control character.
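The discarding rule can be illustrated with a minimal Python sketch. It is not the program's own code: the function name is hypothetical, only CSI control sequences are tracked, and the --no-cancel behaviour is deliberately left out.

```python
ESC, CAN, CSI = "\x1b", "\x18", "\x9b"

def complete_sequences(text):
    """Return the bodies of CSI sequences that reach their final character.

    Illustrative sketch only: a sequence that is interrupted by ESC or
    CSI, or cancelled by CAN, is dropped without being reported.
    """
    done = []
    body = None        # None means "not inside a control sequence"
    after_esc = False
    for ch in text:
        if after_esc:
            after_esc = False
            if ch == "[":          # ESC [ is the 7-bit form of CSI
                body = ""
                continue
            # Other escape sequences are outside the scope of this sketch.
        if ch == ESC:
            body = None            # interrupts any sequence in progress
            after_esc = True
        elif ch == CSI:
            body = ""              # interrupts, then starts afresh
        elif ch == CAN:
            body = None            # cancels any sequence in progress
        elif body is not None:
            if "\x20" <= ch <= "\x3f":      # parameter and intermediate characters
                body += ch
            elif "\x40" <= ch <= "\x7e":    # final character completes the sequence
                done.append(body + ch)
                body = None
    return done
```

With this sketch, `complete_sequences("\x1b[1;\x1b[3A")` reports only the second sequence, because the first is interrupted by ESC before it reaches a final character.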
The --no-7bit command-line option disables all ECMA-48 7-bit code extensions except for CSI. The remaining 7-bit code extensions are then treated as escape sequences rather than as aliases for C1 control characters.
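For reference, an ECMA-48 7-bit code extension is ESC followed by a character in the range 0x40 to 0x5F, standing for the C1 control character 0x40 higher. The sketch below models that aliasing, with a `no_7bit` flag that imitates the documented effect of --no-7bit; the function and parameter names are hypothetical and this is not the program's own code.

```python
def c1_alias(second_char, no_7bit=False):
    """Return the C1 code point aliased by ESC + second_char, or None.

    Sketch only. With no_7bit set, only ESC [ (CSI) is still treated as
    an alias, which is the effect described for --no-7bit.
    """
    code = ord(second_char)
    if not 0x40 <= code <= 0x5F:
        return None                 # not a 7-bit code extension at all
    if no_7bit and second_char != "[":
        return None                 # left as a plain escape sequence
    return code + 0x40

assert c1_alias("[") == 0x9B        # CSI
assert c1_alias("P") == 0x90        # DCS
assert c1_alias("P", no_7bit=True) is None
```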
The --control-strings command-line option enables ECMA-48 control strings introduced by DCS, OSC, PM, APC, and SOS. Because control strings must be terminated by ST, and most outputs do not generate them properly, processing of control strings is normally disabled and these characters are simply discarded.
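The dependence on ST can be sketched as follows. The example uses the 8-bit C1 forms of the introducers and a hypothetical function name; the BEL-terminated OSC in the assertions is an assumed example of improper termination, not something this manual states.

```python
# C1 code points of the control string introducers and of ST.
OPENERS = {"\x90": "DCS", "\x98": "SOS", "\x9d": "OSC", "\x9e": "PM", "\x9f": "APC"}
ST = "\x9c"

def read_control_string(text):
    """Return (name, payload, rest) for a control string at the start of text.

    Sketch only. Returns None when text does not start with an introducer
    or when no ST follows, which is the case that makes control string
    processing risky: without ST there is no way to tell where the string ends.
    """
    name = OPENERS.get(text[:1])
    if name is None:
        return None
    end = text.find(ST, 1)
    if end < 0:
        return None
    return name, text[1:end], text[end + 1:]

assert read_control_string("\x9d0;title\x9cabc") == ("OSC", "0;title", "abc")
assert read_control_string("\x9d0;title\x07abc") is None   # no ST ever arrives
```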
The --input command-line option causes console-decode-ecma48 to process data as if they are terminal input rather than as if they are terminal output (ECMA-48 applying to both).
It disambiguates certain ambiguous control sequences as function key inputs rather than as output actions.
For example, the parameters to CUP (CSI H) have different meanings as input (repeat count and modifiers) to those as output (row and column position).
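The two readings of the CUP parameters can be sketched like this. The function names are hypothetical, and the sketch only labels the two parameters; it does not attempt to decode the modifier value, whose encoding is not described here.

```python
def parse_params(text):
    """Split '5;3' into [5, 3]; an omitted parameter becomes None."""
    if text == "":
        return []
    return [int(p) if p else None for p in text.split(";")]

def describe_cup(param_text, as_input):
    """Label the first two CUP parameters for input or output (sketch only)."""
    params = parse_params(param_text) + [None, None]
    first, second = params[0], params[1]
    if as_input:
        # With --input: a key press, not a cursor movement.
        return {"repeat count": first, "modifiers": second}
    # Default: terminal output, positioning the cursor.
    return {"row": first, "column": second}

assert describe_cup("5;3", as_input=False) == {"row": 5, "column": 3}
assert describe_cup("1;2", as_input=True) == {"repeat count": 1, "modifiers": 2}
```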
When used together, the --no-7bit and --input command-line options select a non-ECMA-48 decoding of 2-character escape sequences, including the ones that would otherwise be ECMA-48 7-bit code extensions. These escape sequences are decoded as accelerators.
console-decode-ecma48 always follows the new ECMA-48:1986 rule for zero-value parameters, which superseded the old ECMA-48:1976 rule. Under the 1986 rule, a parameter value of zero means zero, rather than standing in for the parameter's default value.
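The difference between the two rules can be sketched as below. The function and its `zero_means_default` flag are hypothetical, and the description of the old rule (zero standing in for the default) is an assumption made for the sake of the comparison. The program itself has no such switch and always behaves as if the flag were false.

```python
def effective_parameter(raw, default, zero_means_default=False):
    """Resolve one control sequence parameter (illustrative sketch).

    raw is the parameter text as received; an empty string means the
    parameter was omitted, and always yields the default.
    """
    if raw == "":
        return default
    value = int(raw)
    if value == 0 and zero_means_default:
        return default      # assumed old-rule behaviour, shown for contrast
    return value            # 1986 rule: zero means zero

assert effective_parameter("", 1) == 1
assert effective_parameter("0", 1) == 0
assert effective_parameter("0", 1, zero_means_default=True) == 1
```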
ECMA-48 control codes, escape sequences, and control sequences are printed using their standard abbreviations. For example, "CSI" is "Control Sequence Introducer", documented in §8.3.16 of ECMA-48:1991, and "CUU" is "CUrsor Up", documented in §8.3.22. For a full list of codes, see the standard.
DEC private extensions are printed using the abbreviations from the DEC VT Programmer References (q.v.) from DECALN to DECXMT. DECFNK in particular is further decoded and printed as the name of the relevant function, cursor, or editing key, where that is known. Otherwise it is printed as DECFNK and the number of the unknown key. Modifiers are decoded and added as prefixes.
SCO console, Interix, and rxvt function keys are always printed as the name of the function key, because such sequences cannot be anything other than function keys, unlike the more general-purpose DECFNK control sequence. SCO console function key control sequences technically range from F1 to F48 and do not have modifiers, but they are printed using the key chords with modifiers that one would use for them in practice on an IBM PC/AT keyboard.
Other keyboard input control sequences are printed using the names of the ECMA-48 control sequences that are used for them. The conventions for these come from the SCO console and from DEC VTs and are:
⇱ Home (per the fact that the CUrsor Position control sequence, when output with no parameters, positions the cursor in the home position)
⇲ End
Centre (i.e. the key in the middle of the cursor-key cross shape)
Up Arrow
Down Arrow
Left Arrow
Right Arrow
⇧ Shift+⇥ Tab
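As an aid to reading the list above, the following sketch pairs each key with an ECMA-48 control sequence. Apart from CUP for Home, the pairings are not given in this manual: they are assumptions based on the usual SCO console and xterm-style key encodings (CSI followed by the final character shown), and the table and function names are hypothetical.

```python
# Assumed pairing of CSI final characters with ECMA-48 abbreviations and keys.
# Only the CUP/Home pairing is stated in the manual; the rest are assumptions.
INPUT_KEYS = {
    "H": ("CUP", "Home"),
    "F": ("CPL", "End"),
    "E": ("CNL", "Centre"),
    "A": ("CUU", "Up Arrow"),
    "B": ("CUD", "Down Arrow"),
    "D": ("CUB", "Left Arrow"),
    "C": ("CUF", "Right Arrow"),
    "Z": ("CBT", "Shift+Tab"),
}

def key_for(final_char):
    """Return (abbreviation, key name) for a CSI final character, or None."""
    return INPUT_KEYS.get(final_char)

assert key_for("H") == ("CUP", "Home")
assert key_for("q") is None
```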
console-decode-ecma48 uses the C/C++ runtime library for output, and therefore has the normal C/C++ stream buffering semantics. For input, however, it uses the read(2) system call directly. To duplicate the behaviour of the GNU and BSD C/C++ runtime libraries, it explicitly flushes the output buffer whenever it is about to read more input and that input is the beginning of a new line.
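The flushing rule can be sketched in Python as follows. This is an analogy rather than the program's own code: the function name is hypothetical, `os.read` stands in for read(2), and "beginning of a new line" is taken to mean that the previous chunk of input ended with a newline (or that nothing has been read yet).

```python
import codecs
import os

def copy_with_flush(in_fd, out, chunk_size=4096):
    """Copy UTF-8 text from a file descriptor to a buffered text stream.

    Sketch only: the output is flushed before each read that starts at
    the beginning of a line, so that buffered output appears before the
    read blocks waiting for more input.
    """
    decoder = codecs.getincrementaldecoder("utf-8")(errors="replace")
    at_line_start = True
    while True:
        if at_line_start:
            out.flush()
        data = os.read(in_fd, chunk_size)
        if not data:                      # EOF
            break
        out.write(decoder.decode(data))
        at_line_start = data.endswith(b"\n")
    out.write(decoder.decode(b"", final=True))
    out.flush()
```

A typical call would be `copy_with_flush(0, sys.stdout)` after importing `sys`, which copies standard input to standard output.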
Control Functions for Coded Character Sets [ECMA-48] 5th edition. 1991. ECMA International.
Information technology — Open Document Architecture (ODA) and Interchange Format: Character content architectures [ISO/IEC 8613-6:1994] International Organization for Standardization.
VT520/VT525 Video Terminal Programmer Information [EK-VT520-RM] July 1994. Digital.
VT420 Programmer Reference Manual [EK-VT420-RM-002] February 1992. Digital.