Some of what is said about IRQ levels is wrong.

You've come to this page because you've made an erroneous claim about IRQ levels. These are the Frequently Given Answers to such claims.

The myth about the speed of access to 8259s

This myth actually originates with Microsoft, and is an instance of an explanation so dumbed down that it is actually wrong. It has been copied and parroted by many people without correct attribution over the years, so it may appear that it originates with other people. But it is Microsoft's sales pitch for everyone to switch to local APICs whence it originates. The other people are simply parroting Microsoft without actual understanding of the hardware involved.

8259 Hardware Is Slow

When the operating system raises or lowers IRQL, a new mask is written into the 8259 that enables only the interrupts allowed at this IRQL. Therefore, raising or lowering IRQL causes either two "out" instructions or software simulation of one sort or another. Each of these I/O instructions causes bus cycles that must make it all the way to the South Bridge and back.

Yes, 8259 hardware is located on the PCI-to-ISA/PCI-to-LPC bridge. But that only makes it as "slow" as an I/O space bus cycle to any other PCI device. The dual 8259s aren't actual ISA devices located on the ISA, or LPC, bus. They are implemented within the PCI-to-ISA bridge on pretty much all Intel and VIA chipsets, and are effectively ordinary PCI I/O space devices. They aren't even distinct devices. They are simply several more I/O space registers of the PCI-to-ISA/PCI-to-LPC bridge device itself.

The problem with 8259s isn't that they are slow. It's that they don't actually work in the way that everyone naïvely thinks them to work. That naïve belief is another myth.

The myth about the way that the 8259 Interrupt Mask Register works, also known as the myth about "Lazy IRQL".

Even Mark Russinovich believes this myth. Here's what Windows Internals says on the matter:

HALs that use a PIC implement a performance optimization, called lazy IRQL, that avoids PIC accesses. When the IRQL is raised, the HAL notes the new IRQL internally instead of changing the interrupt mask. If a lower-priority interrupt subsequently occurs, the HAL sets the interrupt mask to the settings appropriate for the first interrupt and postpones the lower-priority interrupt until the IRQL is lowered. Thus, if no lower-priority interrupts occur while the IRQL is raised, the HAL doesn't need to modify the PIC.

This "Lazy IRQL" myth is predicated on a model of an 8259's operation that simply isn't true. One can find this model repeated all over the place, and it's how most people think an 8259 works. Here's a web diarist (who, you'll note, elsewhere on the same diary entry plagiarized Microsoft without attribution) propounding the myth:

The IMR. This register lets the programmer disable or "mask" individual interrupts so that the PIC doesn't interrupt the processor when the corresponding interrupt is signaled. For an interrupt to be disabled, its corresponding bit in the IMR must be 1. To be enabled, its bit must be 0. Interrupts can be enabled or disabled by the programmer by reading the IMR, setting or clearing the appropriate bits, then writing the new value back to the IMR.

Here's Open Systems Resources, Incorporated parroting the same idea:

Each one of the IRQs is individually maskable, meaning it can be programmatically disabled via the 8259's Interrupt Mask Register (IMR). If an IRQ is masked, any device that is connected to that IRQ's requests for interrupts are ignored.

This is not how 8259s work, and there is explicit wording in Intel's original datasheet in several places warning the reader that 8259s don't work this way. Designers are warned not to depend on the INT signal from the 8259 to go inactive for any specific period of time; and systems programmers are warned that various things are not affected by the IMR.

The simple truth is that the IMR operates upon the interrupt request lines feeding into the IRR. It doesn't operate upon the generation of the INT output caused by the IRR. If an interrupt line signals an interrupt, and an IRR bit is set to 1 as a result, then setting the associated IMR bit to 1 has no effect. The INT signal from the chip remains active, and is not deactivated by the interrupt request being "masked out". The IMR prevents new interrupts from reaching the IRR, but it does not mask out interrupt requests that have already been signalled and recorded in the IRR. (One can read the source code for the Bochs virtual machine emulation of a dual-8259 system and notice this. Writing to the IMR doesn't turn interrupts off that have already been set on. This is not in fact an error in Bochs, as it may seem. The real 8259 hardware actually works this way.)

So the IMR cannot be used as an interrupt priority register as most people naïvely think it can be used. Once an interrupt has been raised, one cannot "raise the priority", by setting bits to 1 in the mask register, to stop the CPU from receiving it. Once the IRR has a 1 bit, the INT signal to the CPU goes active, and the only route for it to go inactive is for the CPU to perform an Interrupt Acknowledge bus cycle, thereby receiving the interrupt and executing its handler.
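
The behaviour described above can be sketched as a toy model in C. This is only an illustration of the description on this page, not a register-accurate emulation of an 8259: the In-Service Register is left out, all of the names are hypothetical, and a higher line number is assumed to mean a higher priority.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model (hypothetical names): the IMR gates requests on their way
   INTO the IRR, not the INT output that the IRR drives. */
struct pic_model {
    uint8_t imr;  /* Interrupt Mask Register: 1 = request line masked */
    uint8_t irr;  /* Interrupt Request Register: 1 = request latched  */
};

/* A device raises its request line. A masked line never reaches the IRR. */
void pic_request(struct pic_model *pic, unsigned line)
{
    assert(line < 8);
    uint8_t bit = (uint8_t)(1u << line);
    if ((pic->imr & bit) == 0)
        pic->irr |= bit;
}

/* The operating system writes a new mask. Latched IRR bits are untouched. */
void pic_write_imr(struct pic_model *pic, uint8_t mask)
{
    pic->imr = mask;
}

/* INT stays asserted for as long as any request is latched in the IRR. */
bool pic_int_asserted(const struct pic_model *pic)
{
    return pic->irr != 0;
}

/* Interrupt acknowledge cycle: the only thing in this model that clears an
   IRR bit. Returns the line delivered, or -1 if nothing was latched.
   Assumption: a higher line number means a higher priority. */
int pic_acknowledge(struct pic_model *pic)
{
    for (int line = 7; line >= 0; line--) {
        uint8_t bit = (uint8_t)(1u << line);
        if (pic->irr & bit) {
            pic->irr &= (uint8_t)~bit;
            return line;
        }
    }
    return -1;
}
```

In this model, calling pic_request() for line 0 and then pic_write_imr() with bit 0 set leaves pic_int_asserted() true. Only pic_acknowledge() makes it false again, after which further requests on line 0 are ignored until the mask bit is cleared.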

So the myth about "lazy IRQL" is predicated upon the idea, that is wrongly incorporated into many people's mental model of an 8259, that the IMR reflects the current IRQ level. In fact it does not. The IMR does not reflect the IRQL and never has. There is no new "Lazy IRQL" mechanism. The IMR reflects which interrupt requests the operating system wants to be temporarily silenced. One might think that that's saying the same thing. But it is not. There is a subtle but crucial difference.

The difference lies in a race condition. Suppose that one were using the IMR as an IRQ level register. One would "raise IRQL" by masking out more interrupt requests and one would "lower IRQL" by unmasking them again. But this does not work. If two interrupts are requested simultaneously (for the sake of exposition let us suppose that they are interrupt requests #0 and #4) then IRR bits 0 and 4 are set. (We are also supposing, for the sake of exposition, a simple IRQ number to priority mapping in which a higher IRQ number means a higher priority. 8259s can map priorities in several ways. They don't change the nature of the race condition, though, so we use a simple mapping to avoid complicating the explanation.) The 8259 signals INT to the processor, which issues an acknowledge bus cycle and executes the IRQ #4 handler, whatever that is. The IRQ #4 handler is of course using the IMR as its IRQ level register, so the first thing that it does is "raise the IRQ level" to mask out IRQs 4, 3, 2, 1, and 0. Unfortunately, bit 0 is already set in the IRR, and no manipulation of the IMR changes that. As soon as the IRQ #4 handler issues an END-OF-INTERRUPT to the 8259, the 8259 will assert the INT signal to the CPU again, for IRQ #0, even though the "IRQ level" in the IMR is supposedly at level #4, masking out IRQ #0.

Worse, most operating systems issue EOIs immediately, because they don't want the interrupt priority semantics that 8259s and their In-Service Register priority mechanism enforce. Thus what results, from the operating system's perspective, when it uses the IMR in this naïve and incorrect manner, is that the IRQ #4 handler is triggered, it raises IRQL to mask out IRQ #0, it issues an EOI to turn off the 8259's ISR priority semantics, and immediately the supposedly masked IRQ #0 occurs.
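
The race can be replayed step by step in a small C simulation. It is a sketch under the assumptions of the exposition above (a higher IRQ number means a higher priority, and the IMR gates only the inputs to the IRR), with a simplified In-Service Register added. The structure and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct pic8259 { uint8_t imr, irr, isr; };

/* Index of the highest set bit, or -1 if no bit is set. */
int highest_bit(uint8_t bits)
{
    for (int n = 7; n >= 0; n--)
        if (bits & (1u << n))
            return n;
    return -1;
}

/* A masked request line never reaches the IRR. */
void pic_request(struct pic8259 *p, unsigned line)
{
    if ((p->imr & (1u << line)) == 0)
        p->irr |= (uint8_t)(1u << line);
}

/* INT is asserted when a latched request outranks everything in service. */
bool pic_int(const struct pic8259 *p)
{
    return highest_bit(p->irr) > highest_bit(p->isr);
}

/* Interrupt acknowledge: move the winning request from the IRR to the ISR. */
int pic_ack(struct pic8259 *p)
{
    int n = highest_bit(p->irr);
    if (n >= 0) {
        p->irr &= (uint8_t)~(1u << n);
        p->isr |= (uint8_t)(1u << n);
    }
    return n;
}

/* Non-specific EOI: clear the highest-priority in-service bit. */
void pic_eoi(struct pic8259 *p)
{
    int n = highest_bit(p->isr);
    if (n >= 0)
        p->isr &= (uint8_t)~(1u << n);
}

/* Replays the scenario from the text. Returns the line that interrupts the
   CPU after the IRQ #4 handler's EOI, or -1 if INT stays inactive. */
int race_demo(void)
{
    struct pic8259 pic = {0, 0, 0};
    pic_request(&pic, 0);            /* IRQ #0 and IRQ #4 arrive together  */
    pic_request(&pic, 4);
    int first = pic_ack(&pic);       /* the CPU acknowledges; IRQ #4 wins  */
    assert(first == 4);
    pic.imr = 0x1F;                  /* handler "raises IRQL": masks 4..0  */
    assert(!pic_int(&pic));          /* ISR bit 4 still holds IRQ #0 back  */
    pic_eoi(&pic);                   /* early EOI removes that protection  */
    return pic_int(&pic) ? pic_ack(&pic) : -1;
}
```

In this model race_demo() returns 0: the supposedly masked IRQ #0 is delivered as soon as the EOI has been issued, because its IRR bit was latched before the mask was written.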

The "lazy IRQL" idea comes from a misunderstanding of how (the 8259 HAL in) x86 Windows NT uses the Interrupt Mask Register(s) of the 8259(s). The writers of x86 Windows NT are some of the few people in the world who have read the Intel 8259 datasheets and understood how they really operate. What x86 Windows NT actually does is implement the "IRQ level register", of the abstract CPU that the rest of the kernel talks to, entirely in software. It's just a location in memory, a field in a per-CPU data structure, that stores the current IRQ level of that CPU. Any IRQ can occur at any IRQ level, because IRQ levels have no hardware existence. What happens instead is fourfold:

Raising the IRQ level is simplicity itself. One simply checks that one is, indeed, actually raising the IRQ level (it being a fatal error for a device driver to say that it is raising the IRQ level but actually ending up lowering it) and stores the new level in the per-CPU data structure field.
If an interrupt occurs which is of greater priority than the current IRQ level, stored in the data structure, it is handled as normal. The IRQ level is raised, the 8259s are sent an EOI to clear the In-Service Register bit (and prevent an unwanted hardware priority mechanism from kicking in), and the handlers in the KINTERRUPT object chain are invoked.
If an interrupt occurs which is of lesser priority than, or equal priority to, the current IRQ level, its interrupt handler is still executed but it is deferred, early on in the handler. The relevant bit in the IMR is set in order to prevent this from happening again, and a software bit for the CPU records that an interrupt at this level is pending. It has been acknowledged in hardware (the EOI being sent as normal to stop the ISR priority mechanism from kicking in), and masked from further occurrence. The operating system has to re-raise the interrupt itself at a later point. Fortunately, the x86 architecture provides an easy way to do this: the INT instruction. A deferred hardware interrupt becomes a software interrupt.
Lowering the IRQ level is the mirror image of raising the IRQ level, with an addition. It checks that it is really lowering the value, and updates the per-CPU data structure field. It also checks whether it is lowering the level below the point where deferred, pending, interrupts would (had this been a real hardware mechanism) have become unmasked in hardware. For any that have, it simply executes an appropriate INT instruction to re-raise the interrupt, which triggers the CPU's interrupt handler as if an actual interrupt cycle had occurred on the system bus. (It turns APCs and DPCs into real interrupts with this mechanism too, principally because it is simple to do so. This deferral mechanism is turning hardware interrupts into software interrupts anyway, so it's a minor matter to add in to the mix some more interrupts that never existed in hardware in the first place, because the processor architecture had no equivalent for them.) The IMR bits for the deferred interrupts are set back to 0, to allow further interrupt requests with this level to hit the Interrupt Request Register once more, now that the kernel is not deferring such interrupt requests any more.
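
The four steps above can be sketched in C as follows. This is a simplified model and not the actual HAL code. It assumes that each IRQ line number is also its IRQ level, it stands in for the INT instruction with an ordinary function call, and it only notes in comments where the EOI would be sent. Every name in it is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_LEVEL 8

struct cpu_model {
    unsigned irql;         /* the "IRQ level register": just memory        */
    uint8_t  pending;      /* deferred interrupts, one bit per level       */
    uint8_t  imr;          /* stand-in for the 8259 Interrupt Mask Register */
    unsigned handled[16];  /* log of the levels whose handlers have run    */
    unsigned n_handled;
};

void lower_irql(struct cpu_model *cpu, unsigned new_level);

/* Step 1: raising only checks the direction and stores the new level. */
unsigned raise_irql(struct cpu_model *cpu, unsigned new_level)
{
    assert(new_level >= cpu->irql);
    unsigned old = cpu->irql;
    cpu->irql = new_level;
    return old;
}

/* Runs the handler chain for one level at that level, then drops back. */
void run_handlers(struct cpu_model *cpu, unsigned level)
{
    unsigned old = raise_irql(cpu, level);
    if (cpu->n_handled < 16)
        cpu->handled[cpu->n_handled++] = level;
    lower_irql(cpu, old);
}

/* Steps 2 and 3: entered for every hardware interrupt, at any IRQ level. */
void on_interrupt(struct cpu_model *cpu, unsigned level)
{
    assert(level < MAX_LEVEL);
    /* In both branches the real handler would send an EOI at this point. */
    if (level > cpu->irql) {
        run_handlers(cpu, level);                 /* handled as normal */
    } else {
        cpu->imr     |= (uint8_t)(1u << level);   /* silence repeats   */
        cpu->pending |= (uint8_t)(1u << level);   /* remember it       */
    }
}

/* Step 4: lowering re-raises whatever was deferred above the new level. */
void lower_irql(struct cpu_model *cpu, unsigned new_level)
{
    assert(new_level <= cpu->irql);
    cpu->irql = new_level;
    for (int level = MAX_LEVEL - 1; level > (int)new_level; level--) {
        uint8_t bit = (uint8_t)(1u << level);
        if (cpu->pending & bit) {
            cpu->pending &= (uint8_t)~bit;
            cpu->imr     &= (uint8_t)~bit;        /* unmask the line    */
            run_handlers(cpu, (unsigned)level);   /* stands in for INT n */
        }
    }
}
```

For example, starting from a zeroed struct cpu_model, raising to level 5 and then calling on_interrupt() for level 3 only sets the pending and mask bits. Calling on_interrupt() for level 6 runs that handler at once. Lowering back to level 0 afterwards runs the deferred level 3 handler and clears its mask bit.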

The important distinction to bear in mind here is that the IMR is not an "IRQ level register". It instead masks out the interrupts that the 8259 hardware has triggered in the CPU at too high an IRQ level for them to be processed. It's not an "IRQ level" register. It's a "pending lesser/equal-priority interrupts already deferred once" register. In other words: If no lesser/equal priority interrupt happens to be raised during a higher priority IRQ level period, the IMR isn't touched at all. It only needs to mask further (superfluous) assertions of those IRQs that have had to be deferred, and that will be raised as software interrupts once the IRQ level is once again low enough.

The myth about the distinction between "software" and "hardware" interrupts.

Because x86 PCs with 8259 interrupt controllers are widespread, and because such processor/system architectures have no notion of DPC or APC hardware interrupts, a myth has evolved that IRQ levels are somehow in two classes:

DISPATCH_LEVEL and APC_LEVEL are software IRQLs and the higher IRQ levels are hardware IRQLs.

As noted, the idea that so-called "hardware" IRQLs map to hardware is a myth, and in fact the so-called "hardware" interrupt requests are often raised via software interrupts, namely the INT instruction, because they've had to be acknowledged in hardware but deferred in software because the current IRQ level prohibits them.

The simple truth is that there is no real distinction between "hardware" and "software" IRQ levels. This is because such a mental model is missing one important fact: The whole IRQ level model is that of an abstract CPU in the first place. Saying that some interrupts are "hardware" and some are "software" misses the fact that in the CPU abstraction that is presented, all interrupts are hardware, and it is simply an accident of implementation which ones have to be implemented as software mechanisms under the covers. As far as the abstraction to which one, as a kernel or device driver programmer, programs is concerned, the abstract CPU has an IRQ level register and all IRQs are hardware interrupt requests, some of which are triggered by the act of queueing up APCs and DPCs and some of which are triggered asynchronously by devices.

As noted, on x86 systems with 8259s, it is in fact the case that all of the abstract hardware interrupts are potentially software interrupts under the covers, because such systems have no workable mechanisms for implementing everything as hardware. Conversely, and even less well known, on x86 systems with local APICs it is in fact the case that all of the abstract hardware interrupts, even the ones mistakenly labelled "software" interrupts, are implemented using real hardware interrupt mechanisms.

This is because x86 local APICs have two things:

They have a Task Priority Register in the Local APIC, which, unlike the Interrupt Mask Register in an 8259, can work like an "IRQ level register". The TPR in a Local APIC doesn't suffer from the problems of an 8259 IMR. Whereas an 8259 IMR masks the inputs to the IRR, and doesn't affect the output of the IRR to the interrupt priority and INT signal assertion logic, the TPR in a Local APIC does operate upon the interrupt priority and INT signal assertion logic, and can mask out already-raised interrupts.
They have a way to generate hardware interrupt requests, as if they had come over the interrupt bus from an I/O APIC, directed at the current CPU. These are called "self-interrupts", and are issued by programming the interrupt vector number and a "destination shorthand" of "self" into the local APIC's Interrupt Command Register.

So on a Windows NT system with a non-8259 HAL (e.g. HALAPIC.DLL, HALAACPI.DLL, and so forth) not only is the IRQL register not implemented in software, but APC and DPC interrupts are implemented in hardware, too, and are not software interrupts.
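
As a rough illustration of the two local APIC facilities, here is a C sketch that programs a stand-in for the APIC register page. The register offsets (0x080 for the Task Priority Register, 0x300 for the low half of the Interrupt Command Register) and the bit positions follow Intel's documented xAPIC layout. The function names are hypothetical, and this is not how any particular HAL is written.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define APIC_TPR                0x080u       /* Task Priority Register       */
#define APIC_ICR_LOW            0x300u       /* Interrupt Command Register   */
#define APIC_ICR_LEVEL_ASSERT   (1u << 14)   /* set for fixed-mode requests  */
#define APIC_ICR_SHORTHAND_SELF (1u << 18)   /* destination shorthand "self" */

/* Writes one 32-bit register. Local APIC registers sit on 16-byte
   boundaries within a 4 KiB page. */
void apic_write(volatile uint32_t *apic, uint32_t offset, uint32_t value)
{
    assert(offset % 16 == 0 && offset < 4096);
    apic[offset / 4] = value;
}

/* Sets the task priority class, which lives in bits 7:4 of the TPR. */
void apic_set_priority_class(volatile uint32_t *apic, unsigned cls)
{
    apic_write(apic, APIC_TPR, (uint32_t)(cls & 0xFu) << 4);
}

/* Builds the ICR value for a fixed-delivery interrupt aimed at this CPU. */
uint32_t apic_self_ipi_command(uint8_t vector)
{
    return APIC_ICR_SHORTHAND_SELF | APIC_ICR_LEVEL_ASSERT | vector;
}

/* Requests a hardware interrupt on the current CPU ("self-interrupt"). */
void apic_self_interrupt(volatile uint32_t *apic, uint8_t vector)
{
    apic_write(apic, APIC_ICR_LOW, apic_self_ipi_command(vector));
}

/* A vector is held back while its priority class (vector bits 7:4) does
   not exceed the class in the TPR, even if it has already been requested. */
bool apic_tpr_holds_back(uint32_t tpr, uint8_t vector)
{
    return (uint32_t)(vector >> 4) <= ((tpr >> 4) & 0xFu);
}
```

On real hardware the apic pointer would be the kernel's mapping of the local APIC register page. In a test it can simply be an array of 1024 uint32_t values. Mapping each IRQ level onto a range of vectors, so that a single TPR write masks everything at or below that level, is the design choice that lets the TPR behave as an "IRQ level register".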

© Copyright 2011 Jonathan de Boyne Pollard. "Moral" rights asserted.
Permission is hereby granted to copy and to distribute this web page in its original, unmodified form as long as its last modification datestamp is preserved.