An overview of Intel SGX

Intel Software Guard Extensions (SGX) is a set of processor instructions which leverage trusted hardware to enhance application security by allowing developers to isolate sensitive information using encrypted containers.

In this post I’m going to introduce Intel SGX, describe the basic architecture and functionality and also position SGX in the context of related technologies such as ARM TrustZone. Finally, I’ll give a high-level overview of the literature on SGX and discuss what I think the future might look like.

What is SGX?

SGX belongs to a relatively new group of technologies concerned with providing a secure environment for application code and data, also known as a Trusted Execution Environment (TEE) or trusted computing. SGX allows an application to instantiate an isolated container, known as an enclave, which provides confidentiality and integrity to its contents. Each SGX enclave can be thought of as a `reverse sandbox’ which protects the application code and data from a compromised host. Critically, this means that it’s possible to protect selected code and data from disclosure or modification, even when all of the privileged software (OS, hypervisor, firmware, etc.) is potentially compromised.

Intel SGX Architecture Diagram
The Intel SGX architecture showing an enclave in each of two applications. A compromised host operating system, hypervisor, firmware or BIOS is unable to access or modify the enclave contents. (Image credit to Intel)

Since SGX enclaves provide a TEE on an otherwise untrusted platform, with integrity and confidentiality guarantees for selected code and data, SGX enables secure computing on someone else’s computer.

For example, SGX can be used for Digital Rights Management (DRM) and to prevent reverse engineering: an enclave securely provisions a key pair, all sensitive content is encrypted under the enclave’s public key, and the trusted CPU decrypts and executes the content inside the enclave. The `killer app’ for SGX, however, is enabling sensitive computation and data to be outsourced to the cloud. In particular, SGX can be used to build applications for processing confidential medical images, for privacy-preserving machine learning as a service, to enhance the security and privacy of the Tor anonymity network and to outsource databases that guarantee confidentiality and integrity despite a malicious cloud provider, administrator, operating system and so on.

Trusted Hardware and Attestation

For SGX to provide meaningful confidentiality and integrity assurances, the source of sensitive code and data must be able to determine that an enclave is being executed on trusted hardware. How can we be sure that a malicious OS or hypervisor hasn’t emulated the hardware and is planning to decrypt and read the contents of the enclave?

Like its predecessors, the Trusted Platform Module (TPM) and Intel Trusted Execution Technology (TXT), SGX uses hardware secrets and an attestation protocol to enable remote authentication of genuine enclaves and their contents. Unlike TPM and TXT, which attest to the entire system software stack and the virtual machine code respectively, the attestations in SGX concern only the contents of a particular enclave and therefore result in a much smaller Trusted Computing Base (TCB). In other words, SGX hardware-based attestation provides a guarantee that “this is the right application executing on the right platform”.

A diagram showing the Intel SGX attestation protocol key-hierarchy.
The Intel SGX attestation protocol key-hierarchy. (Image credit to Intel)

Each SGX-compatible Intel processor contains two unique 128-bit secret keys that are permanently fused into the chip during manufacture. There is a root provisioning key which is retained by Intel and a seal key which never leaves the chip. The keys used for attestation and for protecting data are all derived using both keys and so cannot be computed by either Intel or any other third party.
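As a rough illustration (and only an illustration – the real derivation is performed by dedicated hardware with its own key-derivation scheme), the sketch below shows the idea of deriving keys from both fused secrets so that neither secret alone is sufficient. All names and labels here are mine, not Intel’s:

```python
# Conceptual sketch only: derive keys from BOTH fused secrets, so that knowing
# either one in isolation is not enough. This is NOT the real SGX key-derivation
# circuit; the function names, labels and key sizes are illustrative.
import hashlib
import hmac

def derive_key(provisioning_secret: bytes, seal_secret: bytes, purpose: bytes) -> bytes:
    """Derive a 128-bit key bound to both hardware secrets and a purpose label."""
    # Mix both secrets: Intel retains the provisioning secret but never sees the
    # seal secret, so neither Intel nor a third party can reproduce the result.
    root = hmac.new(provisioning_secret, seal_secret, hashlib.sha256).digest()
    return hmac.new(root, purpose, hashlib.sha256).digest()[:16]

prov = bytes(16)          # placeholder for the Intel-retained provisioning key
seal = bytes(range(16))   # placeholder for the on-die seal key
attestation_key = derive_key(prov, seal, b"attestation")
sealing_key = derive_key(prov, seal, b"sealing")
```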

SGX Architecture

In more detail, the SGX architecture is as follows. SGX reserves a contiguous region of DRAM called the Processor Reserved Memory (PRM), which stores enclave code and data and cannot be accessed by any other software or peripheral. Within the PRM is the Enclave Page Cache (EPC), where enclave code and data are stored in 4KB pages.

SGX memory architecture diagram
The SGX memory architecture. (Image credit to Costan and Devadas)

The initial code and data in an enclave are first loaded by the untrusted system. In particular, data is copied from outside the PRM into EPC pages, which are then assigned to the enclave. The Enclave Page Cache Metadata (EPCM) ensures that each page in the EPC is assigned to only a single enclave.

As the enclave is loaded into the EPC by the untrusted system its contents are cryptographically hashed. Once the enclave is fully loaded into the EPC, the CPU marks the enclave as initialised and the finalised hash becomes the enclave’s measurement hash.
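Conceptually, the measurement is an incremental hash over the pages and their positions as they are added to the enclave. The sketch below conveys the idea only; the real MRENCLAVE value is a SHA-256 digest over a structured log of the ECREATE, EADD and EEXTEND operations rather than over raw pages as shown here:

```python
# Conceptual sketch of an enclave measurement: hash each loaded page together
# with where it lands, so the final digest commits to the enclave's initial
# contents and layout. Not the exact MRENCLAVE computation.
import hashlib

def measure_enclave(pages):
    """pages: iterable of (offset, 4096-byte content) tuples loaded by the untrusted OS."""
    h = hashlib.sha256()
    for offset, content in pages:
        assert len(content) == 4096
        h.update(offset.to_bytes(8, "little"))  # commit to where the page is placed
        h.update(content)                        # commit to what the page contains
    return h.digest()  # fixed once the CPU marks the enclave as initialised

example_pages = [(0x0000, bytes(4096)), (0x1000, b"\x90" * 4096)]
print(measure_enclave(example_pages).hex())
```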

Once initialised, the enclave can begin a software attestation process which proves, either to another local enclave or to a remote verifier, that the enclave software is both unmodified and hosted inside a container that is protected by trusted hardware. Whilst local attestation is based on the enclave’s measurement hash and its certificate-based identities, remote attestation requires heavy-weight cryptographic primitives which are too complex to be implemented in CPU hardware. To solve this problem, SGX remote attestation relies on a special-purpose, privileged enclave called the Quoting Enclave (QE).

Diagram of the Intel SGX attestation architecture.
SGX attestation is based on the provisioning and seal keys that are fused into the CPU hardware during manufacture. (Image credit to Costan and Devadas)

The QE is issued by Intel and contains the signing functionality for remote attestation in SGX. During remote attestation, the attested enclave first uses local attestation to prove to the QE that it is running the correct software. Next, the QE uses the Intel Enhanced Privacy ID (EPID) attestation scheme to prove to the remote party that the attested enclave is running trusted software on trusted hardware. EPID is a scheme for Direct Anonymous Attestation (DAA) with improved revocation capabilities. Like DAA, EPID is based on zero-knowledge proofs and allows signers to anonymously prove group membership.

In SGX, the EPID signature computed by the QE is derived from the seal and provisioning secrets that are unique to the trusted hardware, a key received from Intel’s provisioning service and the local attestation of the attested enclave. Remote SGX attestation succeeds when the remote party receives a valid EPID signature, computed by the QE, on the local attestation report output by the attested enclave.
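The verifier’s side of this can be summarised by the following sketch. The report fields are simplified and `verify_epid_signature` is a hypothetical stand-in for real EPID group-signature verification (which, in practice, was typically delegated to Intel’s Attestation Service):

```python
# Sketch of the remote verifier's decision, following the flow described above.
# The structures and verify_epid_signature() are hypothetical stand-ins, not a real API.
from dataclasses import dataclass

@dataclass
class AttestationReport:
    measurement: bytes      # hash of the attested enclave's initial contents
    report_data: bytes      # e.g. a public key the enclave binds to itself
    epid_signature: bytes   # produced by the Quoting Enclave

def verify_epid_signature(message: bytes, signature: bytes, group_public_key: bytes) -> bool:
    raise NotImplementedError("stand-in for EPID group-signature verification")

def attestation_succeeds(report: AttestationReport,
                         expected_measurement: bytes,
                         epid_group_public_key: bytes) -> bool:
    # 1. The enclave must be running exactly the software the verifier expects.
    if report.measurement != expected_measurement:
        return False
    # 2. The QE's EPID signature over the report must verify, showing the report
    #    was produced on genuine SGX hardware without identifying the CPU.
    signed_message = report.measurement + report.report_data
    return verify_epid_signature(signed_message, report.epid_signature, epid_group_public_key)
```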

Once remote attestation has succeeded, the enclave can be provisioned with secrets by a remote client and can then be expected to execute sensitive code and operate on confidential data without risking exposure or modification by the untrusted parts of the system.

Related Technologies

Earlier in this post I mentioned TPM and Intel TXT, predecessors to SGX. While these technologies are also based on trusted hardware, they address a different problem and are not directly comparable to SGX. The TPM is an auxiliary, tamper-resistant chip that provides a measured boot process based on cryptographically hashing the code that is run at each stage. For example, the BIOS firmware is verified first, followed by the boot loader, the OS and then the kernel modules. A TPM can be used to verify an arbitrary system state provided that the list of acceptable measurement hashes can keep pace with the provision of new software versions. Similarly to SGX, TPM supports a software attestation model based on DAA or EPID and can be used to anonymously attest to the presence of trusted hardware and software to a remote verifier. Unlike SGX, the TPM trusts all of the software on the system and does not provide any isolation between applications.

Intel TXT uses the TPM chip and attestation model but reduces the secure container to a virtual machine supported by an appropriate CPU. Unlike the TPM, TXT does provide software isolation by ensuring that the virtual machine has exclusive control over the entire system while it is active. Unlike SGX, TXT does not protect the contents of the isolated container from the rest of the system and its code and data are not encrypted.

ARM TrustZone is similar to SGX in that it provides a mechanism for using trusted hardware to ensure isolation between different software components. In TrustZone, which is a collection of IP cores for use in proprietary integrated circuit designs, a system’s resources are partitioned into a secure world and a normal world. Similarly to SGX enclaves, code and data in the TrustZone secure world are protected by hardware from untrusted software located in the normal world. Beyond this, the precise security properties of TrustZone are largely dependent on the specific configuration of IP cores selected by the circuit designer and cannot be directly compared.

Literature

Some high-level and development-oriented details on SGX are provided by Intel in their tutorial slides and developer guide, respectively. They also have a page dedicated to showcasing academic research papers that either propose new applications, theory or attacks relating to SGX.

A thorough, academic explanation and evaluation of SGX from 2016 is given by Costan and Devadas in their paper, Intel SGX Explained. Their paper identifies a number of gaps in the security guarantees provided by SGX, highlights missing information in the publicly available documentation that prevents further analysis, and concludes that

“… a security-conscious software developer cannot in good conscience rely on SGX for secure remote computation.”

The main criticism of SGX given by Costan and Devadas, and also by Costan et al. in a later publication, is that while the SGX threat model protects against all direct attacks, it excludes side-channel attacks, including those that can be performed in software alone.

As might be expected from this discovery, there are a number of side-channel attacks on Intel SGX in the literature. Foreshadow, for example, is a practical software-only architectural attack on SGX which abuses speculative execution to reliably leak plaintext enclave secrets from the CPU cache. Similarly, Götzfried et al. present a practical root-level cache-timing attack that is able to extract an AES secret key from an SGX enclave in less than 10 seconds.

Can we do better?

SGX addresses an important problem in an efficient way, but its present security model and lack of transparency prevent it from delivering strong security guarantees in all practical settings. Sanctum is an open-source set of hardware security extensions that provably provides the intended security guarantees of SGX in a security model which includes all known software side-channel attacks.

Whether using SGX or Sanctum, the provision of provably secure software containers presents an opportunity to develop new application and system architectures that are provably secure against all known direct and side-channel attacks.

Dismantling the AUT64 Automotive Cipher, Part II

This is the second (rather lengthy) part of a two-part series on the AUT64 automotive cipher – you can read the first part here. AUT64 is a proprietary 64-bit block cipher used by a number of major automotive manufacturers. We reverse engineered the design from an immobiliser firmware image and exposed a number of weaknesses in the design.

Since posting the first part I have presented the work we did on AUT64 at CHES 2018. The conference attracted over 600 attendees and featured a great side-channel analysis hackathon which I will write about soon.

My presentation was recorded and has been put on YouTube (see below) – speaking of which, I’m really thankful when conference organisers do this. I find it very helpful to be able to watch the presentation and read the paper.

In the remainder of this post I’m going to summarise the key cryptographic weaknesses which we found in AUT64.

Briefly,

  • The AUT64 Feistel network compression function is weak.
  • In eight-round AUT64, the input to the compression function can be precisely controlled for a subset of the message space.
  • The AUT64 substitution-permutation network is weak.
  • AUT64 has certain weak keys.

The AUT64 compression function is weak (and introduces certain weak keys)

The AUT64 compression function

The AUT64 compression function takes as input a 64-bit bitstring, a 32-bit compression function key \(k_G\) and a round number \(r\) between 0 and 23. The output is 8 bits. The function depends on three key-independent lookup tables \(T_U, T_L\) and \(T_{\text{offset}}\). The first two lookup tables, \(T_U\) and \(T_L\), prescribe a nibble-wise permutation of the compression function key \(k_G\) which is unique for each round. For each of the 16 nibbles in the input, a round-dependent nibble from \(k_G\) is concatenated and the resulting byte is input to \(T_{\text{offset}}\).

The AUT64 lookup table T_offset

\(T_{\text{offset}}\) is an 8×8 lookup table which substitutes each nibble in the input according to the key \(k_G\) and the round number \(r\). The way the function is implemented means that the key nibble selects a row and the input nibble selects a column. One of the first things we noticed was that the first row and the first column contain only zeroes. In practice, this means that the compression function key should not contain any zero-valued nibbles, otherwise bijectivity (which is relied upon in at least one remote keyless entry application) is lost and the cipher is weakened. The resulting key size is reduced from 32 bits to \(\log_2\left(15^8\right)\approx 31.25\) bits.

(Un)Fortunately, there’s a much more significant weakness in this design. In each round, for every byte in the input, two values from \(T_{\text{offset}}\) are output and XORed into the registers \(gu\) and \(gl\). Where \(i\) is the input byte index, \(X^\prime_i\) is the \(i\)’th input byte, \(uk, lk\) are the upper and lower key nibbles selected by \(T_U\) and \(T_L\) respectively, and \(un, ln\) extract the upper and lower nibbles of a byte:

\[ gu = \bigoplus^{7}_{i = 0} T_{\text{offset}}\Big[uk(k_G, r, i)\parallel un(X^\prime_i) \Big] \]

\[ gl = \bigoplus^{7}_{i = 0} T_{\text{offset}}\Big[lk(k_G, r, i)\parallel ln(X^\prime_i) \Big] \]

Significantly, the XOR sum is a commutative operation – the order of the operands does not change the result. Combining this with the fact that each round key is a permutation of the same underlying compression function key, we learn that if we fix all the input nibbles to the same value then the round output will always be the same fixed value, regardless of the round. The bijective design of \(T_{\text{offset}}\) further indicates that if we do this for all 256 byte values then we will get a different fixed output byte for each input.
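To make this concrete, here is a toy reproduction of the property in Python. The tables and key below are made up (the real \(T_U\), \(T_L\) and \(T_{\text{offset}}\) come from the AUT64 specification); the point is only that, with every input nibble equal, the XOR sum cannot depend on the round’s key permutation:

```python
# Toy demonstration: with all input nibbles equal, the XOR sum over a permuted
# key cannot depend on the permutation. Tables and key are random stand-ins.
import random

random.seed(1)
# Row 0 and column 0 all zero, remaining rows bijective on the non-zero nibbles.
T_OFFSET = [[0] * 16] + [[0] + random.sample(range(1, 16), 15) for _ in range(15)]
KEY_NIBBLES = [random.randint(1, 15) for _ in range(8)]        # k_G as 8 non-zero nibbles
ROUND_PERMS = [random.sample(range(8), 8) for _ in range(24)]  # stand-in key schedule

def g_upper(input_byte: int, rnd: int) -> int:
    """XOR-accumulate T_offset[key nibble][upper input nibble] over all 8 byte
    positions, assuming every byte of the input equals input_byte."""
    upper = input_byte >> 4
    acc = 0
    for i in range(8):
        key_nibble = KEY_NIBBLES[ROUND_PERMS[rnd][i]]
        acc ^= T_OFFSET[key_nibble][upper]
    return acc

# The same output in every one of the 24 rounds: the permutation has no effect.
print({g_upper(0xAB, rnd) for rnd in range(24)})
```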

In eight-round AUT64, the input to the compression function can be precisely controlled for a subset of the message space.

Now we seek to exploit the compression function weakness we just described. To do this we first need to have (somewhat) chosen-plaintext access to the compression function.

Fortunately, the input to the compression function is just the output of the byte-wise permutation \(R\) in the Feistel network design.

The AUT64 Feistel network design

So provided that each input byte \(x_0\ldots x_7\) has the same value, the permuted input \(x^\prime_0\ldots x^\prime_7\) will be unchanged. Great – this means we know and control the input to the compression function \(G\) for 256 different chosen plaintexts. Better yet, these are exactly the form of inputs which exploit the AUT64 compression function weakness.

Next, we need to be able to discern the output from the compression function in the ciphertext output by AUT64.

The AUT64 substitution-permutation network is weak

The AUT64 round function \(F\) is as follows

The AUT64 round function F

We’ve shown we can control \(x^\prime_0 \ldots x^\prime_7\) for an interesting subset of messages and that these inputs will cause the output from the compression function \(G\) to be distinguishable from random. What follows \(G\) is a simple substitution-permutation network comprising a single S-box \(S\) applied both at the input and output and a bitwise permutation \(\sigma_{\text{bit}}\) applied in between.

Rather than a single 8×8 S-Box, \(S\) is formed by applying a single 4×4 S-Box, which is defined by the 64-bit key part \(k_\tau\), to each nibble of its input. This means that if the upper and lower 4 bits output from \(G\) have the same value, the output of \(S\) will retain this symmetry and will also be a byte whose upper and lower 4 bits are equal.
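A tiny sketch makes the symmetry property obvious (the S-box below is arbitrary; in AUT64 it is defined by \(k_\tau\)):

```python
# Applying the same 4-bit S-box to both nibbles maps symmetric bytes
# (upper nibble == lower nibble) to symmetric bytes.
import random

random.seed(2)
SBOX4 = random.sample(range(16), 16)  # stand-in for the keyed 4x4 S-box

def S(byte: int) -> int:
    return (SBOX4[byte >> 4] << 4) | SBOX4[byte & 0xF]

symmetric_inputs = [n * 0x11 for n in range(16)]  # 0x00, 0x11, ..., 0xFF
assert all((S(b) >> 4) == (S(b) & 0xF) for b in symmetric_inputs)
```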

This gives us the final piece of our puzzle. We can cause each of the 256 chosen inputs \(0x00,\ldots, 0x00\) to \(0xFF,\ldots , 0xFF\) to be input to \(G\); we know the output for each will be a unique byte; and we know that, of these, the sixteen symmetric bytes \(0x00, 0x11, \ldots , 0xFF\) will also have symmetric outputs carried through the substitution-permutation network into the final ciphertext.

Breaking semantic security

Semantic security

We can now break semantic security (shown above) as follows. We target distinguishing the first round of encryption as this is when we can fully control the input to the compression function with our chosen plaintexts.

We create the set \(\mathbb{P}\) of all 256 plaintexts with the property that every byte has the same value. \[ \mathbb{P} = \Big \{n^8 : n\in \{0,\ldots,255\} \Big \} \]

Then we encrypt every plaintext in \(\mathbb{P}\) using 8-round AUT64 and save the resulting set of ciphertexts \(\mathbb{C}\). Our goal is to distinguish the output from the first round by looking at each of the eight byte positions over all 256 ciphertexts.

Next, we create 8 new sets \(\mathbb{C}_0 \ldots\mathbb{C}_7\): set \(\mathbb{C}_j\) contains the byte at position \(j\) from each of the 256 ciphertexts in \(\mathbb{C}\). The output from the first round will be contained within a single set because the byte permutation \(R\), which mixes the output from each round, is fixed. The output from the first round will also be uniformly distributed over the byte-value space, i.e. the set corresponding to the first round will contain exactly one of each byte value \(0, \ldots, 255\)! Semantic security is now broken with an advantage of 1, as we can always distinguish the output of 8-round AUT64 from a random permutation.
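For completeness, here is the distinguisher as a short Python sketch, assuming a black-box `aut64_encrypt(plaintext)` oracle for the eight-round cipher (not provided here):

```python
# Distinguisher sketch: find the ciphertext byte position written by the first
# round. aut64_encrypt is assumed to map an 8-byte plaintext to an 8-byte ciphertext.
def find_first_round_position(aut64_encrypt) -> int:
    # P: all 256 plaintexts whose 8 bytes share a single value.
    plaintexts = [bytes([n] * 8) for n in range(256)]
    ciphertexts = [aut64_encrypt(p) for p in plaintexts]

    for position in range(8):
        column = {c[position] for c in ciphertexts}
        # The position written by the first round contains exactly one of each
        # byte value 0..255; a random permutation of 64-bit blocks essentially
        # never produces such a column.
        if column == set(range(256)):
            return position
    raise ValueError("no distinguishing byte position found")
```

The position returned here is exactly the index \(i\) used in the key-recovery attack below.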

Breaking AUT64

Now we’ve broken semantic security, we’d also like to break the cipher more pragmatically.

The nominal key size of AUT64 is 120 bits, composed of three parts: a 32-bit compression function key \(k_G\), a 64-bit S-Box key \(k_\tau\) and a 24-bit permutation key \(k_\sigma\). The compression function key is a simple bit-string, the S-Box key defines a 4×4 S-Box and the permutation key defines an 8-element permutation.

The maximum entropy of an AUT64 key is therefore \(2^{32}\times 16!\times 8!\approx 91.55\) bits – too much for us to brute force.
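(A quick sanity check of that arithmetic:)

```python
# 2^32 compression-function keys, 16! possible 4x4 S-boxes, 8! byte permutations.
from math import factorial, log2

entropy_bits = 32 + log2(factorial(16)) + log2(factorial(8))
print(round(entropy_bits, 2))  # ~91.55
```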

In our paper we describe a purely cryptanalytic attack which can recover an AUT64 key in approximately \(2^{49.6}\) encryptions!

AUT64 divide and conquer attack

First, encrypt \[ \mathbb{P}_1 = \Big \{n^8 : n\in \{0,\ldots,255\} \Big \} \] to get \(\mathbb{C}_0 \ldots\mathbb{C}_7\) (as described above), then identify the set \(\mathbb{C}_i\) which is equal to the set of all byte values \(\mathbb{Z}_{256}\); its index \(i\) is the byte position corresponding to the first round of encryption.

Once the position \(i\) is known, construct the second set of chosen plaintexts \[\mathbb{P}_2 = \Bigg \{ \Big(n\ll (64-8\times i)\Big) : n\in \{0,\dots,15\} \Bigg \}\] such that the nibble counter \(n\in\{0,\ldots , 15\}\) is placed in the lower nibble of byte position \(i\).
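As a sketch (following the prose description rather than the shift expression, whose exact byte-indexing convention I won’t pin down here), constructing \(\mathbb{P}_2\) looks like:

```python
# Build P_2: place the nibble counter n in the lower nibble of byte position i,
# leaving every other nibble zero. Byte indexing is assumed to be 0..7 across the block.
def build_p2(i: int):
    plaintexts = []
    for n in range(16):
        block = bytearray(8)
        block[i] = n  # n occupies the lower nibble; the upper nibble is zero
        plaintexts.append(bytes(block))
    return plaintexts
```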

Encrypting \(\mathbb{P}_2\) will cause the output of the compression function \(G\) to depend (in the first round) on only one nibble from the substitution table \(T_{\text{offset}}\). This is because for 15 of the 16 input nibbles, the value 0 will be selected from \(T_{\text{offset}}\) and XORed into the output registers. Now all that remains (see the paper for the specifics of the remaining key sizes) is to brute-force the 15 possible key-nibble values, the \(\approx 2^{37.3}\) possible S-Box values and the \(\approx 2^{8.4}\) possible permutation values. The resulting attack takes just \(15\times 2^{37.3} \times 2^{8.4} \approx \mathbf{2^{49.6}}\) encryptions to recover the full 120-bit AUT64 key!

I found reverse engineering and breaking AUT64 very exciting, particularly once it became practical. I’ve since (for predominantly diplomatic reasons) moved on to working on PKI solutions in the V2X domain so this may be the last cryptanalysis post I make for a while, although I did recently make an interesting observation about the DES round function I would like to afford some time to.

— Happy new year!

Dismantling the AUT64 Automotive Cipher, Part 1

I recently got my first academic paper published! This was a big moment for me and it feels great to have my hard work validated and preserved in the literature.

My paper was published in the IACR Transactions on Cryptographic Hardware and Embedded Systems 2018, at which I’ll be presenting the work in September later this year 🙂

In this post I’ll give a very brief overview of the AUT64 cipher which we reverse engineered from an immobiliser box firmware. In a follow-up post I’ll go into more detail about the cryptographic and implementational weaknesses which we found (read the paper for spoilers!).

My PhD supervisor Flavio Garcia has a long history of exposing weak cryptography in embedded applications, perhaps most prominently the MIFARE Classic smartcard which, at one time, comprised 70% of the wireless smartcard market and was widely used for public transport payment schemes such as the Oyster Card in London and the OV-chipkaart in the Netherlands. It is also widely used for access control in office and government buildings. The attacks on MIFARE Classic allowed the recovery of all secret keys, typically in less than a second.

AUT64 is an automotive block cipher which was used by a number of automotive manufacturers including the Volkswagen Group and Mazda. AUT64 has been used both for traditional vehicle immobiliser systems (which prevent hot-wiring) and for the more recent remote keyless entry systems which allow the doors of the vehicle to be unlocked remotely. AUT64 was first identified in an earlier paper by Garcia et al., which showed how the majority of Volkswagen Group vehicles built between 1995 and 2016 were undermined not only by significant cryptographic issues but also by frankly laughably inadequate key management practices, in which global master keys were used across entire product families.

In the paper, I reverse engineer an AUT64 immobiliser system, find a number of cryptographic and key management weaknesses and then combine them to show how the system is critically flawed.

A common wisdom in cryptography is that `security by obscurity’ doesn’t work. The details of AUT64 were proprietary and obscured away beneath patents, datasheets and compiled implementations. My first task was to take the datasheets, the patent and a firmware dump from a vehicle immobiliser box and to work out exactly how the cipher operated.

I reverse engineered the firmware using IDA. Once the correct processor type had been identified it was relatively easy to find the routines of interest; it took a lot (lot lot..) longer to discern all the logic.

At a high level, this is what I found.

AUT64 is a 64-bit block cipher which uses a 120-bit key. This means it takes an input message of eight bytes \(x_0\ldots x_7\) and outputs a ciphertext of eight bytes. This specific type of block cipher is called a Feistel construction: the cipher is iterated in rounds which gradually modify the output until it is (ideally!) indistinguishable from a random string. In each round, a byte permutation \(\sigma_{byte}\) is applied to the input and then the Feistel function \(F\) is applied to the permuted bytes \(x^\prime_0\ldots x^\prime_7\).

The Feistel function \(F\) comprises a compression function \(G\), an S-Box \(S\) and a bit-wise permutation \(\sigma_{\text{bit}}\) applied as follows

The compression function takes as input the permuted eight bytes \(x^\prime_0\ldots x^\prime_7\) and outputs just a single byte. The substitution-permutation network composed from the S-Box, the bit-wise permutation and then the S-Box again provides the final output byte \(x^{\prime\prime}_7\).
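In (hypothetical) code, one round therefore looks roughly like the sketch below, with the individual primitives left as stand-ins:

```python
# High-level sketch of one AUT64 round as described above. The primitives
# (byte permutation, compression function G, S-box S, bit permutation) are
# passed in as stand-ins rather than implemented here.
def aut64_round(state: bytes, round_no: int,
                permute_bytes, G, S, permute_bits) -> bytes:
    """state: 8 bytes. Returns the 8-byte state after one round."""
    x = permute_bytes(state)            # sigma_byte applied to x_0..x_7
    g = G(x, round_no)                  # compression: 8 bytes -> 1 byte
    new_byte = S(permute_bits(S(g)))    # S, then sigma_bit, then S again
    return x[:7] + bytes([new_byte])    # only the final byte is replaced
```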

Finally, the compression function \(G\) is composed as follows

The compression function operates nibble-wise (on 4-bit values) and uses three key-independent look-up tables \(T_U, T_L\) and \(T_{\text{offset}}\): the first two define the key schedule and the third provides the diffusion.

What makes AUT64 particularly interesting is that a great deal of the construction is dependent on the key. A 120-bit AUT64 key \(K\) is composed of three parts: a 32-bit compression function key \(k_G\), a 64-bit S-Box key \(k_\tau\) and a 24-bit permutation key \(k_\sigma\).

The compression function key is a simple bit-string, the S-Box key defines a \(4×4\) S-Box and the permutation key defines an 8-element permutation. The maximum entropy of an AUT64 key is therefore \(2^{32}\times 16! \times 8!\approx 91.55\) bits. Unlike some of the earlier work on weak proprietary cryptography, this is far too large to be exhaustively searched.

A second interesting property of AUT64 is that it is an unbalanced Feistel construction. Consequently, in each round of AUT64 just one byte of the ciphertext is changed; the other bytes are only rearranged. This is in contrast to the traditional Feistel construction (e.g. DES), in which half of the ciphertext is changed in each round. This means we need to apply at least eight rounds of AUT64 before the ciphertext could ever hope to be statistically indistinguishable from the plaintext. A comprehensive taxonomy of Feistel networks by Schneier et al. can be found here.

To summarise, AUT64 is a 64-bit block cipher with an unbalanced Feistel-network type construction. The key has around 92 bits of entropy, at least eight rounds are required, and the cryptographic properties of the cipher are key-dependent to an unusually large degree.

That’s all for now. In the follow up post on this paper I’ll highlight the key cryptographic and implementational weaknesses which let us break AUT64!