The HPFS2 filesystem format

This is the documentation for a filesystem format known as "HPFS2". This filesystem format is intended to be public and free.

The raison d'être of HPFS2

There is currently no filesystem format that both contains the extensive repertoire of features required by modern operating systems and is free of patent and trade secret encumbrances. FAT is encumbered by patents held by IBM and by Microsoft. Microsoft does not document NTFS at all. Neither IBM nor Microsoft documents HPFS, and it suffers from a few unfortunate misfeatures, such as holding metadata in directory entries that should properly be held in f-nodes. EXT2/EXT3 still retain some of the more clunky warts of Unix filesystem formats, such as block maps and fixed-position fixed-length inode tables. And LEAN fails to learn from the design mistakes of the past (see the LEAN mistakes footnote).

The basic structure of HPFS2

The basic data structure in HPFS2 is the FileNode. FileNodes have attributes (such as a last modification date) and indexes.

Indexes map keys to 64-bit numbers. There are two defined types of index in HPFS2.

The first type of index is a file-sector index. The key in a file-sector index is a simple integer, conceptually a file-relative sector number. The value that it maps to is a volume-relative sector number. File-sector indexes are thus used by FileNodes that correspond to ordinary data files. The index maps from the sector offset within the file to the actual sector in the volume that holds the file data.

The second type of index is a name index. The key in a name index is a Unicode string, conceptually the name of a directory entry. The value that it maps to is the number of an entry in the FileNode table. Name indexes are thus used by FileNodes that correspond to directories. The index maps from the name of a directory entry to the FileNode of the object that the entry names.

Everything in HPFS2 is organized using a FileNode. The free space file is organized as a FileNode that has a file-sector index, with the data sectors that it points to comprising the free space. The bad sector list is another FileNode with a file-sector index, with the data sectors that it points to being the bad sectors on the volume. Even the FileNode table is organized as a FileNode, with the data sectors pointed to by its file-sector index being the FileNodes themselves.

HPFS2 allocates space in sectors. No specific sector size is required, but HPFS2 is discouraged on devices where the sector size is less than 512 bytes. Furthermore: HPFS2 should not be used on devices where the sector size is less than the sizes of the headers of any of the whole-sector data structures.

There are three main whole-sector data structures in HPFS2: the FileNode, the FSIndexNode, and the NameIndexNode. Each comprises an integral number of disc sectors, and each begins with a 64-bit magic number. The purpose of this is to allow volume repair utilities in the worst case, when no other form of recovery is possible, to scan the entire volume, sector by sector, and to rebuild the volume from all of the nodes that it finds.

HPFS2 does not have the concept of clusters. Clusters are used for two purposes (see the Clusters footnote): reducing the number of bits used in address pointers, and grouping I/O transactions into units larger than 1 sector. Neither applies to HPFS2. HPFS2 employs an extent-based allocation scheme, and encourages the use of an allocation policy that minimizes the number of extents required by a file. Because an individual address pointer is not required for each individual sector in a file, the width of address pointers is not a pressing concern. Filesystem drivers that wish to cluster individual sector reads and writes can read and write entire extents (or fractions thereof) if they so choose, so I/O transactions can be grouped as actually appropriate, instead of only into a fixed cluster size.
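As a rough illustration of why per-sector pointers are unnecessary, the following C sketch resolves a file-relative sector number through a sorted list of extents. The `struct extent` layout and the `map_file_sector` function are hypothetical: this section does not define the on-disc format of FSIndexNodes, so the sketch operates on extents that are assumed to have already been decoded from the B+-tree into a sorted, non-overlapping array.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoded extent: file-relative sectors
   [file_start, file_start + length) live at volume-relative sectors
   [volume_start, volume_start + length). */
struct extent {
    uint64_t file_start;
    uint64_t volume_start;
    uint64_t length;
};

/* Binary search over extents sorted by file_start. Returns 1 and stores
   the volume-relative sector in *out if file_sector is mapped, or 0 if
   it falls in a sparse "hole". */
int map_file_sector(const struct extent *ext, size_t count,
                    uint64_t file_sector, uint64_t *out)
{
    size_t lo = 0, hi = count;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (file_sector < ext[mid].file_start)
            hi = mid;                       /* target is left of this extent */
        else if (file_sector - ext[mid].file_start >= ext[mid].length)
            lo = mid + 1;                   /* target is right of this extent */
        else {
            *out = ext[mid].volume_start + (file_sector - ext[mid].file_start);
            return 1;
        }
    }
    return 0;
}
```

A file stored in a handful of large extents needs only a handful of entries, however many sectors it spans.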

The volume boot record

The one fixed-position data structure in HPFS2 is the volume boot record, located in sector 0, the first sector of the volume. The VBR contains a BIOS parameter block, and optionally also contains boot code. The operation and location of the boot code are operating-system specific and architecture-specific, and beyond the scope of this specification.

Volume format utilities are responsible for initializing the BPBs in HPFS2 VBRs. They may employ version 4.0 BPBs, version 7.0 BPBs, or version 7.0bis BPBs. (Version 7.0bis BPBs are recommended, with version 7.0 BPBs being a second preference and version 4.0 BPBs being a third.) They may not, however, employ Windows NT BPBs or plain version 3.4 BPBs, because such BPBs contain no filesystem type information, making it impossible to reliably distinguish HPFS2 format volumes from NTFS format volumes and FAT format volumes.

Volume boot code and filesystem drivers must detect the type of BPB that is in use by inspecting the BPB signature and filesystem type fields. The filesystem type for HPFS2 is "HPFS2   ".

HPFS2 volume boot code may employ only the following BPB fields:

BPB fields that volume boot code is permitted to use
Field                           Notes
sector size                     Only to be used if the machine firmware does not supply the actual sector size.
sectors per track               Only to be used if the machine firmware does not provide a logical block DASD I/O API.
number of heads                 Only to be used if the machine firmware does not provide a logical block DASD I/O API.
number of "hidden" sectors      Used to translate volume-relative sector numbers into disc-relative sector numbers.
signature bytes and filesystem type fields
root directory start pointer    Version 7.0 BPBs only. Contains the volume-relative sector number of the Master FileNode.
superblock pointer              Version 7.0bis BPBs only. Contains the volume-relative sector number of the Master FileNode.

In particular, volume boot code may not use the value in the BPB drive number field, but must instead use the drive number that the machine firmware supplies to it.

Filesystem drivers and filesystem maintenance utilities may employ only the following BPB fields:

BPB fields that filesystem drivers are permitted to use
Field                           Notes
signature bytes and filesystem type fields
root directory start pointer    Version 7.0 BPBs only. Contains the volume-relative sector number of the Master FileNode.
superblock pointer              Version 7.0bis BPBs only. Contains the volume-relative sector number of the Master FileNode.
"chkdsk" flags
volume label
volume serial number

In particular, filesystem drivers may not use the value in the BPB sector size field, or the BPB fields used for LBA→CHS translation and for volume-relative→disc-relative translation. Operating systems are required to provide a logical block I/O API to filesystem drivers, to provide filesystem drivers with the actual block size of the device, and to translate volume-relative sector numbers into disc-relative sector numbers. (Filesystem maintenance utilities that run directly on top of the machine firmware, and that do not employ an operating system, must likewise use equivalent facilities provided by the machine firmware.)

The following BPB fields must be ignored by HPFS2 boot code and filesystem drivers, but are required to contain specific values for the benefit of poorly written tools that believe that only the FAT filesystem format exists:

BPB fields that are unused but must be initialized to certain values
Field                                     Notes
number of sectors per allocation unit     Must be 1.
number of "reserved" sectors              Must be at least 1.
number of FATs                            Must be 0.
sectors per FAT                           Both the 16-bit and 32-bit fields must be 0.
number of records in the root directory   Must be 0.

All other BPB fields are ignored and should be initialized to 0.

If neither a version 7.0bis nor a version 7.0 BPB is employed, volume boot code and filesystem drivers must assume a fixed-position Master FileNode located at sector number 64 of the volume. It is strongly recommended, however, that filesystem format utilities use version 7.0 or version 7.0bis BPBs.
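The following C sketch shows how a filesystem driver might locate the Master FileNode from the contents of sector 0. The byte offsets are assumptions borrowed from the familiar FAT32 layout (for version 7.0 BPBs) and FAT12/FAT16 layout (for version 4.0 BPBs), since this document does not itself give offsets; version 7.0bis BPBs are not handled, little-endian byte order is assumed, and `locate_master_filenode` is a hypothetical name.

```c
#include <stdint.h>
#include <string.h>

#define HPFS2_FSTYPE "HPFS2   "         /* 8 octets, space-padded */
#define HPFS2_DEFAULT_MFN_SECTOR 64u    /* fixed position when no pointer field exists */

static uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Returns the volume-relative sector of the Master FileNode, or -1 if
   sector 0 does not carry a recognizable HPFS2 BPB. */
int64_t locate_master_filenode(const uint8_t vbr[512])
{
    /* Assumed version 7.0 layout: signature 0x29 at 0x42, filesystem
       type at 0x52, 32-bit root directory start pointer at 0x2C. */
    if (vbr[0x42] == 0x29 && memcmp(vbr + 0x52, HPFS2_FSTYPE, 8) == 0)
        return le32(vbr + 0x2C);
    /* Assumed version 4.0 layout: signature 0x29 at 0x26, filesystem
       type at 0x36; there is no pointer field, so use sector 64. */
    if (vbr[0x26] == 0x29 && memcmp(vbr + 0x36, HPFS2_FSTYPE, 8) == 0)
        return HPFS2_DEFAULT_MFN_SECTOR;
    return -1;
}
```

Checking for a version 7.0 BPB first mirrors the preference order given above.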

Neither volume boot code nor filesystem drivers may use any of the three sectors-in-volume fields. The free space file is the sole determiner of what usable sectors exist in a volume. Filesystem repair utilities should employ external means, outside of the on-volume data structures, for determining the number of sectors in the volume.

FileNodes

FileNodes comprise a fixed header and a variable metadata area. All FileNodes in a volume are the same size, which is an integral multiple (usually 1) of the volume sector size. The sizes of FileNodes are determined at format time. Format utilities are required to ensure that FileNodes are at least 512 bytes long.

Structure of a FileNode header
Offset (octets)   Description
0x0000 64-bit magic number
0x0008 16-bit flags word
        Bit      Description
        15 – 1   Reserved (see the Reserved fields footnote)
        0        If set to 1, this node is allocated. If set to 0, this node is free, and no other fields beyond this one should be considered valid.
0x000A 16-bit offset to first metadata record

The offset is from the start of the FileNode, and is essentially the length of the FileNode header. This is intended to allow the FileNode header structure to be expanded in future revisions of the file format whilst preserving backwards compatibility. This must be a multiple of 4 octets.

0x000C 16-bit offset to first free metadata record

The offset is from the start of the FileNode. All space in the metadata area from this point up until the end of the FileNode is available for metadata records. All space from the first metadata record up to but not including this point is used by metadata records.

Filesystem repair utilities must ensure that this does not exceed the FileNode length.

0x000E 2 octets reserved (see the Reserved fields footnote; present for the purpose of aligning fields with natural word boundaries)
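A C rendering of the header, with compile-time checks that the field offsets match the table, might look like the sketch below. The structure and function names are hypothetical, little-endian byte order is assumed, and the magic number is not checked because this section does not give its value. The plausibility checks are one reading of the constraints stated above, not a complete repair procedure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FILENODE_ALLOCATED 0x0001u    /* bit 0 of the flags word */

/* On-disc FileNode header, 16 octets. All fields are naturally aligned,
   so common ABIs lay this out without padding. */
struct filenode_header {
    uint64_t magic;          /* 0x0000 */
    uint16_t flags;          /* 0x0008 */
    uint16_t first_record;   /* 0x000A: offset to first metadata record */
    uint16_t first_free;     /* 0x000C: offset to first free metadata record */
    uint16_t reserved;       /* 0x000E */
};

static_assert(offsetof(struct filenode_header, flags) == 0x08, "flags");
static_assert(offsetof(struct filenode_header, first_record) == 0x0A, "first_record");
static_assert(offsetof(struct filenode_header, first_free) == 0x0C, "first_free");
static_assert(sizeof(struct filenode_header) == 0x10, "header size");

/* Returns 1 if the header is self-consistent for a node of node_len octets. */
int filenode_header_plausible(const struct filenode_header *h, uint32_t node_len)
{
    if (!(h->flags & FILENODE_ALLOCATED))
        return 1;                     /* free node: no other field is valid */
    return (size_t)h->first_record >= sizeof *h
        && h->first_record % 4 == 0
        && h->first_record <= h->first_free
        && h->first_free <= node_len;
}
```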

FileNode metadata

The actual metadata in a FileNode are held in metadata records. Metadata records are variant records that begin with a type and a length field. A FileNode may contain at most one metadata record of any given type. If a FileNode contains more than one metadata record of any given type, filesystem drivers should use the first record, and filesystem repair utilities should discard the second and subsequent records.
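One way to walk the chain of metadata records, honouring the first-record-wins rule, is sketched below in C. The function name is hypothetical, the FileNode is assumed to have been read into memory in full, and little-endian byte order is assumed. A return value of 0 can safely mean "not found", since offset 0 always lies inside the FileNode header.

```c
#include <stddef.h>
#include <stdint.h>

/* Little-endian 16-bit read (byte order assumed). */
static uint16_t le16(const uint8_t *p) { return (uint16_t)(p[0] | p[1] << 8); }

/* Returns the offset of the first metadata record of `type` in a FileNode
   image of node_len octets, or 0 if there is none. Later duplicates of
   the same type are never returned. */
size_t find_metadata_record(const uint8_t *node, size_t node_len, uint8_t type)
{
    size_t off = le16(node + 0x0A);   /* offset to first metadata record */
    size_t end = le16(node + 0x0C);   /* offset to first free metadata record */
    if (end > node_len)
        return 0;                     /* header inconsistent with node size */
    while (off + 4 <= end) {
        size_t len = le16(node + off + 2);   /* offset to next record = length */
        if (len < 4 || off + len > end)
            return 0;                 /* broken chain: give up */
        if (node[off] == type)
            return off;
        off += len;
    }
    return 0;
}
```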

Not all FileNodes will have metadata records of all types. The FileNode for the free space file need not have unnamed or named attributes, for example.

Metadata records of different types are not mutually exclusive. A FileNode is permitted to have all three of a name index, a file-sector index, and a symbolic link metadata record, for example.

Filesystem drivers must preserve unaltered any metadata records of unknown type.

Structure of a metadata record's header
Offset (octets)   Description
0x0000 8-bit metadata record type
        Type    Description
        0x00    Unnamed attributes
        0x01    Named attributes
        0x02    File-sector index
        0x03    Name index
        0x04    Ownership
        0x05    Access control
        0x06    Symbolic link data
        0x07    Master FileNode data
        Other   Reserved for future use
0x0001 1 octet reserved (see the Reserved fields footnote; present for the purpose of aligning fields with natural word boundaries)
0x0002 16-bit offset to the next metadata record

This is the offset from the start of this record, and so is the length of this record.

When reading a record, its length may be shorter than the total length of the fields defined for a metadata record of the given type. (This happens when the creator of the record used an older version of this filesystem format specification than the reader of the record.) In this case, filesystem drivers and maintenance utilities are required to act as if all of the missing octets contain the value 0.

When reading a record, its length may be greater than the total length of the fields defined for a metadata record of the given type. (This happens when the creator of the record used a newer version of this filesystem format specification than the reader of the record.) In this case, filesystem drivers and maintenance utilities are required to treat all additional octets, beyond the fields that they know about, as reserved (see the Reserved fields footnote).
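The zero-fill rule for short records can be implemented by bounds-checking every octet of a field against the record length, as in this hypothetical C helper (little-endian byte order assumed). Field offsets are relative to the start of the record, so they include the 4-octet header. Octets beyond the fields a reader knows about are simply never requested, which covers the read side of treating them as reserved.

```c
#include <stdint.h>

/* Reads a 16-bit field at field_off within a metadata record. Any octet
   at or beyond the record's stored length reads as 0. */
uint16_t record_field16(const uint8_t *rec, uint16_t field_off)
{
    uint16_t len = (uint16_t)(rec[2] | rec[3] << 8);  /* record length */
    uint16_t v = 0;
    for (int i = 0; i < 2; i++) {
        uint32_t at = (uint32_t)field_off + (uint32_t)i;
        if (at < len)                                  /* octet present */
            v |= (uint16_t)(rec[at] << (8 * i));
    }
    return v;
}
```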

Unnamed attributes metadata records

The unnamed attributes comprise the conventional, fixed length, unnamed file/directory attributes required by operating systems such as OS/2 and Unix. If a FileNode lacks this metadata record, filesystem drivers must report to operating systems as if the record were present and all fields contained the value zero. (Several of the system FileNodes normally lack this metadata record, for example. But, on the gripping hand, they are not normally accessible directly by operating system APIs.)

Several operating systems include other values in the set of unnamed attributes that they themselves ascribe to files/directories. Filesystem drivers do not derive these from the unnamed attributes record, but from other metadata records. The allocated space value is derived from a FileNode's file-sector index. The EA length value is derived from a FileNode's named attributes record.

Structure of an unnamed attributes metadata record (sans header)
Offset (octets)   Description
0x0004 16-bit DOS type and flags
        Bit      Description
        15       Reserved (see the Reserved fields footnote)
        14       Offline
        13       Not content indexed
        12 – 6   Reserved (see the Reserved fields footnote)
        5        Archive
        4 – 3    Reserved (see the Reserved fields footnote)
        2        System
        1        Hidden
        0        Read-only

This is identical to the OS/2 and MS/PC-DOS attribute word, except for its omission of the 'D' and 'V' attributes.

The DOS/Windows/OS/2 file flags presented by filesystem drivers to operating systems are a combination of this field and (read-only) values calculated from other metadata. Specifically:

  • The (reported) 'D' bit is set to 0 unless the FileNode has a name index and does not have either a file-sector index or a symbolic link metadata record.

  • The (reported) 'V' bit is always set to 0. Volume labels are never presented to operating systems as directory entries.

  • The (reported) 'RP' ("reparse point") bit is set to 0 unless the FileNode has a symbolic link metadata record and does not have either a file-sector index or a name index.
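The derivation of the reported flags can be sketched in C as follows. `struct record_presence` and `reported_dos_flags` are hypothetical names; 0x0010 and 0x0400 are the conventional DOS/Windows values of the directory and reparse point attributes. Reserved bits of the stored word are masked off, and the remaining stored bits are passed through at their on-disc positions; mapping them onto a particular operating system's own bit values is outside the scope of this sketch.

```c
#include <stdint.h>

#define ATTR_DIRECTORY      0x0010u   /* reported 'D' */
#define ATTR_REPARSE_POINT  0x0400u   /* reported 'RP' */
#define STORED_DEFINED_BITS 0x6027u   /* stored bits 14, 13, 5, 2, 1, 0 */

/* Which of the relevant metadata records a FileNode carries. */
struct record_presence {
    int has_fs_index;    /* type 0x02 */
    int has_name_index;  /* type 0x03 */
    int has_symlink;     /* type 0x06 */
};

uint16_t reported_dos_flags(uint16_t stored, struct record_presence p)
{
    uint16_t flags = stored & STORED_DEFINED_BITS;  /* ignore reserved bits */
    if (p.has_name_index && !p.has_fs_index && !p.has_symlink)
        flags |= ATTR_DIRECTORY;
    if (p.has_symlink && !p.has_fs_index && !p.has_name_index)
        flags |= ATTR_REPARSE_POINT;
    return flags;                    /* 'V' is never reported */
}
```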

The Windows "Temporary", "Sparse", "Compressed", and "Encrypted" flags have no on-disc equivalents. Sparse file support is always enabled for all files in HPFS2. Whether a file is temporary or not has no relevance to the on-disc data structures. Compression and encryption will be controlled by other metadata records in future versions of this specification.

Filesystem drivers for operating systems that do not have the concept of file flags should initialize this field to 0x0000 when creating new unnamed attributes records, and preserve the current value when modifying existing records.

0x0006 16-bit POSIX permissions
        Bit       Description
        15 – 12   Reserved (see the Reserved fields footnote)
        11        Set-UID
        10        Set-GID
        9         Sticky
        8 – 6     RWX permissions for owner
        5 – 3     RWX permissions for group
        2 – 0     RWX permissions for world

Filesystem drivers for operating systems that do not have the concept of POSIX permissions should initialize this field to either 0700 or 0500 when creating new unnamed attributes records, and preserve the current value when modifying existing records. (If the operating system supports OS/2/Windows/DOS file flags, 0500 should be used if the 'R' bit in the creation file flags is set to 1, and 0700 if it is set to 0.)
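The choice of initial permissions reduces to a single test, sketched here in C with a hypothetical function name. Where the operating system has no DOS-style flags, this sketch picks 0700, which is one of the two values the text allows.

```c
#include <stdint.h>

#define DOS_ATTR_READONLY 0x0001u   /* 'R' bit of the DOS flags word */

/* Initial POSIX permissions for a new unnamed attributes record created
   by a driver for an operating system without POSIX permissions. */
uint16_t default_posix_permissions(int os_has_dos_flags, uint16_t creation_flags)
{
    if (os_has_dos_flags && (creation_flags & DOS_ATTR_READONLY))
        return 0500;                /* owner r-x, nothing for group/world */
    return 0700;                    /* owner rwx, nothing for group/world */
}
```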

0x0008 8-bit POSIX type
        Value   Description
        0x00    Report as regular file
        0x01    Block device
        0x02    Character device
        0x03    FIFO
        0x04    Unix-domain socket
        Other   Reserved for future use. Report as regular file.

The POSIX type presented by filesystem drivers to operating systems is a combination of this field and other metadata records. A file-sector index record implies a regular file. A name index implies a directory. And a symbolic link metadata record implies a symbolic link. Those records take precedence over this field. Only if a FileNode does not possess any of those metadata records should the value in this field be consulted.

Type 0x00 may thus seem redundant. It is not. Its purpose is to force a FileNode to be reported as a regular file, even if that FileNode lacks all of the aforementioned metadata records.

Filesystem drivers for operating systems that do not have the concept of POSIX file types should initialize this field to 0x00 when creating new unnamed attributes records, and preserve the current value when modifying existing records.
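The precedence rules can be sketched in C as below. The enumeration and function names are hypothetical. The ordering among the three metadata records (file-sector index, then name index, then symbolic link) is taken from the sections on file-sector index and name index metadata records later in this document.

```c
#include <stdint.h>

enum reported_type {
    RT_REGULAR, RT_DIRECTORY, RT_SYMLINK,
    RT_BLOCK_DEV, RT_CHAR_DEV, RT_FIFO, RT_SOCKET
};

enum reported_type reported_posix_type(int has_fs_index, int has_name_index,
                                       int has_symlink, uint8_t posix_type)
{
    /* Metadata records take precedence over the stored field. */
    if (has_fs_index)   return RT_REGULAR;
    if (has_name_index) return RT_DIRECTORY;
    if (has_symlink)    return RT_SYMLINK;
    switch (posix_type) {
    case 0x01: return RT_BLOCK_DEV;
    case 0x02: return RT_CHAR_DEV;
    case 0x03: return RT_FIFO;
    case 0x04: return RT_SOCKET;
    default:   return RT_REGULAR;   /* 0x00 and reserved values */
    }
}
```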

0x000A 6 octets reserved (see the Reserved fields footnote; present for the purpose of aligning fields with natural word boundaries)
0x0010 64-bit creation timestamp, expressed as 100ns intervals since the Windows NT Epoch
0x0018 64-bit last modification timestamp, expressed as 100ns intervals since the Windows NT Epoch
0x0020 64-bit last access timestamp, expressed as 100ns intervals since the Windows NT Epoch
0x0028 64-bit last metadata change timestamp, expressed as 100ns intervals since the Windows NT Epoch
0x0030 64-bit end of file position
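Timestamps can be converted to POSIX time by subtracting the distance between the two epochs, 11644473600 seconds (1601-01-01 to 1970-01-01), expressed in 100ns units. A minimal C sketch, with a hypothetical function name, that rejects timestamps before 1970 for simplicity:

```c
#include <stdint.h>

/* 11644473600 s between 1601-01-01 and 1970-01-01, in 100 ns units. */
#define NT_TO_UNIX_EPOCH_100NS 116444736000000000ULL

/* Converts an on-disc timestamp to POSIX seconds and nanoseconds.
   Returns 0 on success, -1 for timestamps before 1970. */
int nt_to_unix(uint64_t nt, int64_t *sec, int32_t *nsec)
{
    if (nt < NT_TO_UNIX_EPOCH_100NS)
        return -1;
    uint64_t since = nt - NT_TO_UNIX_EPOCH_100NS;   /* 100 ns ticks since 1970 */
    *sec = (int64_t)(since / 10000000u);
    *nsec = (int32_t)(since % 10000000u) * 100;
    return 0;
}
```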

Named attributes metadata records

The named attributes comprise up to 64KiB of arbitrary name and value pairs, where both name and value are strings of uninterpreted octets, and where no more than one pair may have any given name. Filesystem drivers for operating systems (such as OS/2 and Windows NT) that employ the concept of extended attributes store those attributes in this metadata record.

Structure of a named attributes metadata record (sans header)
Offset (octets)   Description
0x0004

File-sector index metadata records

File-sector index metadata records are pointers to B+-trees that map file-relative sector numbers to volume-relative sector numbers. On operating systems that can only treat filesystem objects as files, directories, or symbolic links, and not as a combination of two or more, a file-sector index takes precedence over a name index and a symbolic link metadata record. Any FileNode with a file-sector index is reported by filesystem drivers as a file to such operating systems.

Structure of a file-sector index metadata record (sans header)
Offset (octets)   Description
0x0004 4 octets reserved (see the Reserved fields footnote; present for the purpose of aligning fields with natural word boundaries)
0x0008 64-bit sector number of the FSIndexNode that comprises the root of the B+-tree

Name index metadata records

Name index metadata records are pointers to B-trees that map names to 64-bit numbers. On operating systems that can only treat filesystem objects as files, directories, or symbolic links, and not as a combination of two or more, a name index takes precedence over a symbolic link metadata record but is subordinate to a file-sector index. Any FileNode with no file-sector index but with a name index is reported by filesystem drivers as a directory to such operating systems.

Structure of a name index metadata record (sans header)
Offset (octets)   Description
0x0004 4 octets reserved (see the Reserved fields footnote; present for the purpose of aligning fields with natural word boundaries)
0x0008 64-bit sector number of the NameIndexNode that comprises the root of the B-tree

Ownership metadata records

Ownership metadata records comprise ownership information that is reported to Windows NT, Unix, and other similar operating systems.

Structure of an ownership metadata record (sans header)
Offset (octets)   Description
0x0004 12 octets reserved (see the Reserved fields footnote; present for the purpose of aligning fields with natural word boundaries)
0x0010 128-bit security identifier representing the object owner's primary user ID
0x0020 128-bit security identifier representing the object owner's primary group ID

Access control metadata records

Access control metadata records comprise the access control lists that are reported to Windows NT, Unix, and other similar operating systems.

Structure of an access control metadata record (sans header)
Offset (octets)   Description
0x0004 4 octets reserved (see the Reserved fields footnote; present for the purpose of aligning fields with natural word boundaries)
0x0008 64-bit sector number of the first ACLNode in the access control list

Symbolic link metadata records

Symbolic link metadata records comprise symbolic link data that are reported to Unix and other similar operating systems as symbolic link data, and to Windows NT and other similar operating systems as reparse point data.

Structure of a symbolic link metadata record (sans header)
Offset (octets)   Description
0x0004 Name data for the link target

Name data are stored as UTF-16 strings and are not NUL-terminated. (The length is derived from the metadata record length.) There are no restrictions imposed by the on-disc data structures as to what characters are legal in symbolic link target names. Such restrictions are imposed by individual operating systems.
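Since there is no terminator, the number of UTF-16 code units is the record length less the 4-octet header, divided by 2. The C sketch below copies the target out on that basis; the function name is hypothetical, little-endian UTF-16 is assumed, and treating an odd payload length as malformed is this sketch's own choice.

```c
#include <stddef.h>
#include <stdint.h>

/* Copies the link target of a symbolic link metadata record into out.
   Returns the number of UTF-16 code units, or (size_t)-1 if the record
   is malformed or out_cap is too small. */
size_t symlink_target(const uint8_t *rec, uint16_t *out, size_t out_cap)
{
    uint16_t len = (uint16_t)(rec[2] | rec[3] << 8);   /* record length */
    if (len < 4 || (len - 4) % 2 != 0)
        return (size_t)-1;
    size_t units = (size_t)(len - 4) / 2;
    if (units > out_cap)
        return (size_t)-1;
    for (size_t i = 0; i < units; i++)                   /* name data from 0x0004 */
        out[i] = (uint16_t)(rec[4 + 2 * i] | rec[5 + 2 * i] << 8);
    return units;
}
```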

Master FileNode metadata records

The Master FileNode metadata record has no meaning for any other FileNode apart from the Master FileNode.

Structure of a Master FileNode metadata record (sans header)
Offset (octets)   Description
0x0004 16-bit number of sectors in a FileNode
0x0008 16-bit number of sectors in a NameIndexNode
0x000C 16-bit number of sectors in an FSIndexNode

Special requirements are placed upon the location of the Master FileNode metadata record within the Master FileNode itself. Because volume boot code, filesystem drivers, and filesystem maintenance utilities do not know the size of FileNodes until they have read this metadata record from the Master FileNode, the Master FileNode metadata record must be the first metadata record in the node, ensuring that it occurs within the first sector of the FileNode. Thus code can read the first sector of the Master FileNode, determine the length of FileNodes from the metadata record, and then proceed to read the rest of the Master FileNode.
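The bootstrap step described above can be sketched in C: given the first sector of the Master FileNode, it checks that the first metadata record is of type 0x07 and returns the sectors-per-FileNode value. The function name is hypothetical and little-endian byte order is assumed; a record too short to hold the field yields 0, in line with the zero-fill rule for metadata records.

```c
#include <stdint.h>

static uint16_t le16(const uint8_t *p) { return (uint16_t)(p[0] | p[1] << 8); }

/* Returns the number of sectors per FileNode, or 0 if the Master FileNode
   metadata record is not the first record or the value is unavailable. */
uint16_t master_sectors_per_filenode(const uint8_t *sector, uint32_t sector_size)
{
    uint32_t rec = le16(sector + 0x0A);          /* first metadata record */
    if (rec + 6 > sector_size || sector[rec] != 0x07)
        return 0;
    uint16_t rec_len = le16(sector + rec + 2);
    if (rec_len < 6)
        return 0;                                /* field absent: reads as 0 */
    return le16(sector + rec + 4);               /* field at record offset 0x0004 */
}
```

If the result is n, the caller reads the remaining n - 1 sectors of the Master FileNode before doing anything else with it.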

Filesystem format utilities must choose sectors-per-node values that ensure that FileNodes, FSIndexNodes, and NameIndexNodes are at least 512 octets long. The recommended sizes are 1 sector for FileNodes and FSIndexNodes, and 4 sectors for NameIndexNodes.

Name indexes

Name indexes are B-tree structures. Each node in the tree is stored on disc as a NameIndexNode structure, which comprises an integral number of sectors. The root node of a tree is pointed to by a name index metadata record in a FileNode.

A NameIndexNode structure comprises a fixed-length header followed by zero or more name index entries.

Structure of a name index entry
Offset (octets)   Description
0x0000 16-bit length

The length is constrained by the end of the containing NameIndexNode.

0x0002 16-bit flags word
        Bit      Description
        15 – 1   Reserved (see the Reserved fields footnote)
        0        If set to 1, this is an internal entry in the tree and the downlink field is valid. If set to 0, this is a leaf entry in the tree and the downlink field is not valid.
0x0004 64-bit FileNode table index
0x000C 64-bit downlink to a child NameIndexNode
0x0014 Name data

Name data are stored as UTF-16 strings and are not NUL-terminated. There are no restrictions imposed by the on-disc data structures as to what characters are legal in directory entry names. Such restrictions are imposed by individual operating systems.

Comparison for B-tree ordering is case-sensitive, and the original case of an entry is preserved. (How operating systems that do not support case-sensitive filename matching implement case-insensitive lookup semantics is up to each individual operating system. One approach is, after checking for an exact match, to generate both all-uppercase and all-lowercase versions of a name and to scan all directory entries that lie lexically between the two.)
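The uppercase/lowercase range approach can be sketched in C as below. This is a simplification under stated assumptions: names are plain ASCII `char` strings rather than UTF-16, the directory is an already-sorted in-memory array rather than a B-tree, and ordering is assumed to be by code unit value, since this section does not specify the collation. Under that ordering every ASCII case variant of a name lies between its all-uppercase and all-lowercase forms, but unrelated names can fall in the same range, so each candidate is still compared case-insensitively. Function names are hypothetical.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Index of the first name >= key in a sorted array. */
static size_t lower_bound(const char *const *names, size_t n, const char *key)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (strcmp(names[mid], key) < 0) lo = mid + 1; else hi = mid;
    }
    return lo;
}

static int equal_ignoring_case(const char *a, const char *b)
{
    for (; *a && *b; a++, b++)
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
    return *a == *b;                      /* equal only if both ended */
}

/* Returns the index of a matching entry, or -1. ASCII only. */
long lookup_ignoring_case(const char *const *names, size_t n, const char *name)
{
    size_t i = lower_bound(names, n, name);
    if (i < n && strcmp(names[i], name) == 0)
        return (long)i;                   /* exact match first */

    char upper[256], lower[256];
    size_t len = strlen(name);
    if (len >= sizeof upper)
        return -1;
    for (size_t k = 0; k <= len; k++) {   /* also copies the terminator */
        upper[k] = (char)toupper((unsigned char)name[k]);
        lower[k] = (char)tolower((unsigned char)name[k]);
    }
    for (i = lower_bound(names, n, upper); i < n && strcmp(names[i], lower) <= 0; i++)
        if (equal_ignoring_case(names[i], name))
            return (long)i;
    return -1;
}
```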

No provision is made for separate 8.3 format ("short") names for directory entries. Where an operating system employs the concept of "short" names (which in practice only applies to OS/2's VDM support and to Windows NT) a filesystem driver can adopt one of two strategies:

  • It can decide to simply render all directory entries that don't fit into the 8.3 paradigm invisible when it is requested to look up an entry by an 8.3 name. (This strategy is appropriate for OS/2, where directory lookups resulting from VDM processes contain a special flag to indicate that only 8.3 names are required.) Making an object visible to 8.3-only processes is a simple matter of creating a second hard link to the object in the same directory with an 8.3 name. This can be dealt with at application level. However, deleting the object requires that both links be removed.

  • It can decide to create 8.3 names on the fly, using an appropriate algorithm to ensure uniqueness within a single directory. (This strategy is appropriate for Windows NT, which requires that all files have two names, a "long" name and a "short" name.)

Filesystem drivers for operating systems that do not have a "long"/"short" name dichotomy, such as Unix and TAU, need take no special measures.

The zero-length string, which sorts lexically before all other entries, is a special entry that denotes what filesystem drivers should present as the "." and ".." entries to operating systems. The contents of these entries are not stored in the on-disc data structures, since they are redundant. The FileNode table index in such an entry is that of the parent directory's FileNode.

The FileNode table

All FileNodes in a volume are organized into a table, the FileNode table. The FileNode pointers in directory entries (i.e. the 64-bit values in name indexes) are indices into this table.

The FileNode table is itself organized via a FileNode: the Master FileNode. The FileNode table is treated as a regular file. To locate a FileNode in the FileNode table from its table index, one multiplies the index value by the FileNode size in sectors to obtain a file-relative sector number, and then looks up the sector(s) comprising the target FileNode using the file-sector index in the Master FileNode.
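The arithmetic is a single multiplication, sketched here in C with an overflow check; the function name is hypothetical. The result is a file-relative sector number in the Master FileNode's file, which must still be mapped to volume-relative sectors through the Master FileNode's file-sector index.

```c
#include <stdint.h>

/* Computes the file-relative sector at which FileNode number `index`
   begins. Returns 0 on success, or -1 if the node size is zero or the
   product would overflow 64 bits. */
int filenode_table_sector(uint64_t index, uint16_t sectors_per_filenode,
                          uint64_t *out)
{
    if (sectors_per_filenode == 0 || index > UINT64_MAX / sectors_per_filenode)
        return -1;
    *out = index * sectors_per_filenode;
    return 0;
}
```

With the recommended 1-sector FileNodes, the table index and the file-relative sector number are the same number.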

When a volume is formatted, the format utility allocates space in the FileNode table for a handful of default FileNodes. Other FileNodes are added to the FileNode table in the same way that sectors are added to ordinary files. The FileNode table is permitted to be sparsely allocated. If FileNodes are deleted, filesystem drivers are permitted to create "holes" in the FileNode table where they once were.

FileNodes are allocated table indices no lower than 32. Table indices from 0 to 31 are reserved for predefined system FileNodes.

System-reserved FileNode table indices
Index    Description
0        The Master FileNode
1        The FileNode demarcating the volume's boot area
2        The FileNode comprising all bad sectors in the volume
3        The FileNode comprising the free space file
4        The root directory FileNode
5 – 31   Reserved for future use

The Master FileNode is only required to contain two metadata records: a file-sector index and a special metadata record for the Master FileNode. Other metadata record types do not apply to the Master FileNode. Filesystem drivers are not required to update them (such as updating timestamps in an unnamed attributes record) if they exist.

The root directory

The root directory is the only system FileNode that filesystem drivers are required to make accessible to applications softwares. Its name is whatever name is defined by the operating system for referring to the volume's root directory.

The root directory is the only directory where the entry for the zero-length string in its name index (i.e. the "."/".." entry) points to the directory's own FileNode. Exactly as for all other such entries, this reference is included in the FileNode's link count. Therefore the link count of the root directory FileNode is always at least 1.

The free space file

On HPFS2, all space in a volume is allocated to a FileNode. Even the boot area belongs to a FileNode. Free space, which doesn't belong to anything else, is the property of the free space file.

When a volume is formatted, the format utility creates a free space file that owns every sector in the volume, apart from sectors allocated to the system FileNodes created at format time. To extend a volume after formatting, it is simply necessary to extend the free space file to own the new sectors.

The free space file is required to be sparsely allocated. Moreover, it is required that its file-relative sector numbers map 1:1 to volume-relative sector numbers. Filesystem drivers and utilities must preserve this 1:1 mapping.

There are no doubled-up data structures on HPFS2. There's no index to a bitmap. The free space file's file-sector index is the free space list. If the free-space file has a file-relative sector N (i.e. that sector isn't in a "hole" caused by sparse allocation), then volume-relative sector N of the volume is free space. Checking the allocation state of a particular sector is a simple matter of looking up that sector number in the free-space file's file-sector index. Looking for the next available free sector after a given sector M involves looking for the non-sparse extent at or following file-relative sector M in the free-space file's file-sector index. Allocating space involves making the free-space file sparser. Freeing space is the reverse.
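Because of the 1:1 mapping, each extent of the free space file is just a run of free volume sectors. The C sketch below answers both queries described above by binary search over such runs. `struct free_run` and the function names are hypothetical, and the runs are assumed to have already been decoded from the free space file's file-sector index into a sorted, disjoint array.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoded run of free sectors; file-relative and
   volume-relative numbers coincide, so one start value suffices. */
struct free_run { uint64_t start, length; };

/* Index of the first run that ends after sector m. */
static size_t first_run_ending_after(const struct free_run *r, size_t n, uint64_t m)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (r[mid].start + r[mid].length <= m) lo = mid + 1; else hi = mid;
    }
    return lo;
}

/* 1 if sector m is free (not in a hole of the free space file), else 0. */
int sector_is_free(const struct free_run *r, size_t n, uint64_t m)
{
    size_t i = first_run_ending_after(r, n, m);
    return i < n && r[i].start <= m;
}

/* First free sector at or after m: returns 0 and stores it, or -1 if none. */
int next_free_sector(const struct free_run *r, size_t n, uint64_t m, uint64_t *out)
{
    size_t i = first_run_ending_after(r, n, m);
    if (i == n)
        return -1;
    *out = r[i].start > m ? r[i].start : m;
    return 0;
}
```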

The free space file FileNode is only required to contain one metadata record: a file-sector index. Other metadata record types do not apply to the free space file FileNode. Filesystem drivers are not required to update them (such as updating timestamps in an unnamed attributes record) if they exist.

The bad sector map

The bad sector map for a volume is organized via a system FileNode, the bad sector map FileNode. The bad sector map is treated as a regular file. It comprises a concatenation of all of the bad sectors in the volume. In other words: The file-sector index for the bad sector map maps file-relative sector numbers in the bad sector map file to the bad sectors on the disc. Just as the free-space file owns all free sectors, the bad sector map effectively owns all bad sectors.

When a volume is formatted, the format utility creates a bad sector map comprising all of the bad sectors that it is informed about, or that it finds itself. When a filesystem driver or a filesystem repair utility finds an additional bad sector, it simply appends it to the bad sector map FileNode as if it were appending that sector to an ordinary file. The bad sector map file is permitted to be sparsely allocated, but (on the presumption that bad sectors don't turn into good ones) there is no reason for it to be.

The bad sector map FileNode is only required to contain one metadata record: a file-sector index. Other metadata record types do not apply to the bad sector map FileNode. Filesystem drivers are not required to update them (such as updating timestamps in an unnamed attributes record) if they exist.

Filesystem maintenance utilities must not attempt to "defragment" the bad sector map file.

The boot area map

The boot area for a volume is demarcated and accessed via a system FileNode, the boot area map FileNode. The boot area map is treated as a regular file. It comprises a concatenation of all of the sectors in the volume's boot area. In other words: The file-sector index for the boot area map maps file-relative sector numbers to the sectors in the boot area.

The purpose of the boot area map file is twofold. It ensures that the sectors comprising the volume's boot area are considered to be "in use" by a FileNode, thus eliminating the need for filesystem drivers and repair utilities to treat them as special cases. It also provides a simple means for applications softwares to read and write the volume's boot area, treating it as an ordinary file, without the need to grant users the access rights to read and write the raw sectors of the volume as a whole.

When a volume is formatted, the format utility creates a boot area map comprising the whole of the volume's boot area, including the volume boot record. The boot area map file has a fixed size thereafter, and is not expected to be expanded or shrunk in normal operation.

The boot area map file is permitted to be sparsely allocated. If the boot area is discontiguous, the boot area map file must be sparsely allocated. (The boot area map file is not a view of the entire volume. Sectors outside of the volume's boot area must appear as "holes" in the boot area map file.) The format utility should create the file-sector index for the boot area map FileNode such that there is a 1:1 mapping between file-relative sector numbers in the boot area map file and volume-relative sector numbers in the volume.

Filesystem drivers are not required to make the boot area map file accessible as a regular file to applications softwares, although doing so makes the writing of operating system installation and upgrade tools easier. If a filesystem driver does make the file accessible as a regular file to applications softwares, then normal writes to the file that fill in the "holes" will very probably not preserve the 1:1 mapping between file-relative sector numbers and volume-relative sector numbers. (Filesystem drivers may, but are not required to, employ a special sector allocation policy specifically for this FileNode.) If filesystem repair utilities discover this invariant to be broken, they should restore it.

The boot area map FileNode is only required to contain one metadata record: a file-sector index. It may also, however, contain metadata records for ownership, access control, unnamed attributes, and named attributes. Filesystem drivers are required to respect and to update these just as they would for an ordinary file. (Thus the boot area can have an owner, access controls that prevent unauthorized access to or modification of the volume boot code, and even extended attributes such as a description EA or a modification history EA.) Other metadata record types do not apply to the boot area map FileNode.

Filesystem maintenance utilities must not attempt to "defragment" the boot area map file.

Sector allocation policy

This specification imposes no sector allocation policy upon filesystem drivers. Filesystem drivers are even free to employ the (extremely poor) allocation policy of simply allocating the lowest numbered available sector from the free-space file, should they so choose. However, the following sector allocation policy is recommended:

Footnotes

LEAN mistakes: This is not the place to go into the design mistakes of LEAN at length. But for starters: version 1 of the specification had linked lists of clusters, i.e. FATs in all but name, and version 2 of the specification has cylinder groups.

Reserved fields: Reserved fields in structures are set to 0 when creating a record, preserved with their current values when modifying the record, and ignored when reading the record.

Clusters: A third use for clusters is sometimes suggested: portability across devices with different physical sector sizes. Microsoft gives this as the justification for NTFS being designed to use clusters, for example. This portability is illusory, for two simple reasons:


© Copyright 2006,2009 Jonathan de Boyne Pollard. "Moral" rights asserted.
Permission is hereby granted to copy and to distribute this web page in its original, unmodified form as long as its last modification datestamp information is preserved.