Function naming conventions, commonly known as name decoration and more commonly but less politely known as name mangling, are one of the two aspects that combine to form the nebulous C and C++ concept of "linkage". The other is the calling convention used to call a function.
Function naming conventions are often, but erroneously, seen as entirely parallel to, and indeed often a mere part of, function calling conventions. But this is not in fact the case. Although the same mechanisms — extension keywords, linkage specifiers, and so forth — control both naming conventions and calling conventions, a particular calling convention does not imply a particular naming convention. This is not least because commonly a calling convention is associated with one naming convention by one compiler and with a different naming convention by another compiler.
Whilst calling conventions break down into three categories, according to whether they are platform-defined, architecture-defined, or compiler-defined, naming conventions break down according to how much extra information is being encoded into a name. For that is, in essence, what a naming convention does. It encodes extra information into the name of a function — as presented in object files to the static linker and as presented in executable program images to the dynamic linker — that allows multiple variations of what, in source code, is a single function name.
This is a basic maxim to bear in mind when thinking about naming conventions:
Naming conventions encode non-name information that, were the linker and object module format smarter, wouldn't have to be included in the name at all.
The four major types of naming convention are:
Underscore affixes: Underscore ('_') characters are affixed, in various ways, to the source code name to yield the linker-visible name.
Simple parameter size encodings: The (usually) decimal encoding of the size of the parameter list accepted by the function is affixed to the source code name to yield the linker-visible name.
Full encodings of the function type and namespace: An encoding of the function's type (in the C++ sense — i.e. the parameter type list that is involved in function overloading) and namespace is affixed to the source code name to yield the linker-visible name.
Names for nameless functions: A name is constructed for a function that doesn't actually have a name as such in the source code of the program.
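As a rough illustration of the second type in the list above, the following sketch affixes a decimal parameter-list size to a source code name. The leading underscore and the '@' separator mimic the `_name@N` form commonly associated with the 32-bit stdcall calling convention on Win32. The helper name `decorate_with_size` is hypothetical and belongs to no real toolchain.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper, for illustration only: builds "_name@N", where N is
 * the decimal size in bytes of the parameter list.
 * Returns 0 on success, -1 if the output buffer is too small. */
int decorate_with_size(char *out, size_t cap, const char *name,
                       unsigned param_bytes)
{
    int n = snprintf(out, cap, "_%s@%u", name, param_bytes);
    return (n >= 0 && (size_t)n < cap) ? 0 : -1;
}
```

Under these assumptions, a function `func` taking two 4-byte parameters comes out as `_func@8`, whereas one taking three comes out as `_func@12`. A caller and a callee that disagree about the parameter list therefore no longer share a linker-visible name.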
Underscore affix conventions come in several forms: prepending an underscore, appending an underscore, and both prepending and appending an underscore. The most common of these is prepending an underscore.
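The three forms can be sketched as a single string transformation. This is a minimal illustration, not the logic of any particular compiler; the names `affix_style` and `underscore_affix` are hypothetical.

```c
#include <stddef.h>
#include <stdio.h>

/* The three underscore affix forms described above (hypothetical names). */
enum affix_style { AFFIX_PREFIX, AFFIX_SUFFIX, AFFIX_BOTH };

/* Writes the linker-visible form of `name` into `out`.
 * Returns 0 on success, -1 if the output buffer is too small. */
int underscore_affix(char *out, size_t cap, const char *name,
                     enum affix_style style)
{
    const char *pre  = (style == AFFIX_SUFFIX) ? "" : "_";
    const char *post = (style == AFFIX_PREFIX) ? "" : "_";
    int n = snprintf(out, cap, "%s%s%s", pre, name, post);
    return (n >= 0 && (size_t)n < cap) ? 0 : -1;
}
```

For a source code name `main`, the three styles yield `_main`, `main_`, and `_main_` respectively.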
Where the underscore prefix naming convention originated is unclear. However, the best available information about its early incarnations comes from John Levine, who writes in xyr book Linkers and Loaders:
At the time that UNIX was rewritten in C in about 1974, its authors already had extensive assembler language libraries, and it was easier to mangle the names of new C and C-compatible code than to go back and fix all the existing code. Now, 20 years later, the assembler code has all been rewritten five times, and UNIX C compilers, particularly ones that create COFF and ELF object files, no longer prepend the underscore.
The purpose of the convention, as M. Levine also explains, is to encode a particular non-name datum: what language the function was written in, assembly language or the C language. The purpose of the naming convention is to intentionally make it more difficult to write an assembly language routine that invades the namespace of C language functions. This convention comes about in programming environments where assembly language essentially has a supporting rôle to the C language, providing hidden support routines for operations (such as floating point calculations, stack probes, and so forth) that are essentially invisible to the C programmer.
Interestingly, this convention was soon applied backwards. In many commercial C (and C++) compilers targeting PC/MS/DR-DOS, Win16, Win32, and OS/2 over the years, the convention became that the support routines, written in assembly language, were callable directly from the C language. In order to retain the distinction between library functions written in assembly language and applications program functions written in the C language, underscores were now prefixed to the names of the assembly language library functions. This resulted in the rather bizarre circumstance that the C language function names had one underscore prefixed to them, and the functions written in assembly language had two. What is more, this had to be done manually in the assembly language code, whereas the C compilers prefixed the single underscore automatically. In many ways, this was directly fighting the opposite convention, as implemented out of the box by the assemblers and C compilers themselves.
To ask for the reasons behind this naming convention is in fact to miss the point that it wasn't ever a universal convention at all, not even on PC/MS/DR-DOS, whatever is believed by the people who point to its desuetude in modern Unix toolchains as some supposed advantage that those toolchains have over toolchains for IBM/Microsoft operating systems. The reality is that not everything distinguished applications program functions from support library functions in this way in the first place, even on IBM/Microsoft operating systems. In Microsoft PDS BASIC for DOS, for example, the support library functions, used by the compiler to implement all sorts of things from string manipulation to array use, were given names that couldn't be "pronounced" in the BASIC language, by incorporating the '$' character.
Appending an underscore is one of the naming conventions that is employed by Watcom C/C++. It uses the prepend-underscore and append-underscore naming conventions to encode a non-name datum: whether the function was compiled in register-based mode or in stack-based mode. The idea is to make it difficult to accidentally link code compiled in one mode against code compiled in the other mode.
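The protective effect can be sketched with a toy resolver. It assumes, purely for illustration, that a library compiled in one mode exports `open_` whilst a caller compiled in the other mode references `_open`. Which affix Watcom C/C++ assigns to which mode is not asserted here, and `resolve_symbol` is a hypothetical stand-in for a real linker's symbol lookup.

```c
#include <stddef.h>
#include <string.h>

/* Toy stand-in for a linker's symbol resolution: returns the index of the
 * definition whose linker-visible name matches `wanted` exactly, or -1 if
 * the reference is unresolved. */
int resolve_symbol(const char *const *defined, size_t count,
                   const char *wanted)
{
    for (size_t i = 0; i < count; ++i) {
        if (strcmp(defined[i], wanted) == 0)
            return (int)i;
    }
    return -1;
}
```

With definitions `open_` and `close_`, a reference to `open_` resolves but a reference to `_open` does not. The mismatch surfaces as an unresolved symbol at link time rather than as a misbehaving call at run time.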
These two modes only apply to the compiler's default calling convention, the Watcall calling convention, and hence the naming conventions only truly make this distinction properly, and thus only truly prevent such accidental linking, for Watcall functions. They don't prevent accidentally cross-linking code using the stack-based Watcall calling convention and code using the cdecl calling convention, for example, even though accidentally linking the twain is just as much of an error as accidentally linking register-based
Watcall code against stack-base