Dissecting the POP SS Vulnerability
- The newly uncovered POP SS vulnerability takes advantage of a widespread misconception about behaviour of pop ss or mov ss instructions resulting in exceptions when the instruction immediately following is an interrupt.
- It is a privilege escalation, and as a result it assumes that the attacker has some level of control over a userland process (ring 3). The attack has the potential to upgrade their privilege level to ring 0 giving them complete control of the target system.
- By jailing the guest OS using a hypervisor which operates at a hardware-enforced layer below ring 0 on the machine, PCs protected by Bromium are immune to threats of this nature.
Today we are dissecting the pop ss or mov ss vulnerability. To understand the attack, you must first be familiar with CPU interrupts and exceptions.
CPU interrupts and exceptions
Interrupts and exceptions are used by the CPU to interrupt the currently executing program in order to handle unscheduled events such as the user pressing a key or moving the mouse. Exceptions also interrupt the currently running process, but in response to an event (usually an error) resulting from the currently executing instruction. The need for these two CPU features stems from the fact that these events are not predetermined and must be dealt with as and when they occur, rather than according to a predefined schedule.
When an interrupt or exception is triggered, it is down to the OS to decide how to handle it. There are operating system routines for each type of interrupt which can be triggered by the CPU to determine, from the current state information, how the instruction should be dealt with. The
pop ss vulnerability takes advantage of a bug in OS routines for dealing with interrupts caused by unexpected #db (debug) exceptions triggered in the kernel.
The root of the vulnerability
Under normal circumstances, CPU exceptions are triggered immediately upon retirement of the instruction that caused them. However, with
pop ss and
mov ss instructions this is deemed unsafe since they are used as part of sequence such as the following to switch stacks:
mov ss, [rax]
mov rsp, rbp
If mov ss, [rax] causes an exception, it would be handled with an invalid stack pointer since the stack segment offset has been changed but the new stack pointer has not yet been set. Consequently, the design decision was made to trigger the exception on retirement of the instruction immediately following the offending instruction to allow the stack pointer to be set correctly.
Whilst this solved the issue of exception handling with an invalid stack pointer, it had the unforeseen side effect of creating an unexpected state for the OS if the next instruction was itself an interrupt such as in the following case:
mov ss, [rax]
In this scenario, due to the context switch caused by int 0x1 which is executed before the exception handler, the handler will be triggered from ring 0 despite being caused by ring 3 code. Since OS developers have been operating under the assumption that ring 0 code exclusively will be responsible for exceptions triggered within the kernel, this causes them to potentially mishandle this edge case whereby an exception is triggered from within the kernel by a userland process.
The paper published describes one way in which the attacker can use this scenario to manipulate kernel memory in unintended ways by tricking the kernel into operating on userland data structures when handling the exception. In the kernel, the gs segment register is used as a pointer to various kernel related data structures. Following a system call, the kernel calls swapgs to set the gs register to the kernel specific value, and, upon exit from the kernel, swapgs is called again to return it to its original userland value. However, in our case, an exception was triggered before swapgs could be executed by the kernel so it is still using a user defined gs value upon triggering the exception from ring 0.
In handling the interrupt, vulnerable OSs determine whether swapgs needs to be called based on the location which the interrupt was fired from. If the exception was triggered from the kernel, the OS makes the (incorrect) assumption that, swapgs has already been called when context was first switched to the kernel, so it attempts to handle it without executing this instruction first. As a result the exception is handled using a user defined gs register value creating the opportunity to corrupt kernel memory in a manner which allows for arbitrary code execution.
Bromium VMs are immune to escape using this kind of attack since user memory along with kernel memory is jailed within the hypervisor and specific to each instance. As a result, nothing is gained even when an attacker gains complete control of the kernel within a VM – there is no sensitive information to steal and no way for an attacker to propagate or persist.
There are certain kinds of hypervisors that are potentially vulnerable to this attack including Xen legacy PV VMs. This is because the architecture runs the hypervisor at ring 0, whilst the kernel and userspace both operate as ring 3 processes communicating with one another via the hypervisor. As a result, if the hypervisor mishandles an exception, this allows for the attacker to obtain ring 0 on the physical machine effectively escaping the VM.
The Bromium hypervisor is protected from this threat since it operates at a hardware-enforced layer which sits behind ring 0 for all the VMs on a system.
Learn more about the Bromium Secure Platform.