ECW 2023: kaleidoscope (write-up)

This year again, Thalium took part in organizing the European Cyber Week CTF qualifiers, as well as designing several challenges in our core competencies.

kaleidoscope was a hard reverse engineering challenge, with a focus on Windows-specific mechanisms and obfuscation. 14 contestants managed to solve it. You can download the challenge files here (password: thalium).

In this write-up, I will explain my thoughts as the author of the challenge, and describe two different approaches that could be leveraged to untangle its inner workings.

Introduction

The challenge’s description was the following:

Challenge description

It refers to the concept of weird machine, which can be thought of as the set of unintended states that can be reached in a program (e.g. by leveraging a vulnerability such as memory corruption, and « breaking out » of the program’s initial specification). This will prove important later.

We are given an executable file (kaleidoscope.exe) and an unknown binary file (wtf.bin). Running the former yields:

$ ./kaleidoscope.exe
Usage: C:\work\ctf\ecw\2023\kaleidoscope\kaleidoscope.exe program.bin

Let’s pass wtf.bin as argument:

$ ./kaleidoscope.exe wtf.bin
   __        __    _    __
  / /_____ _/ /__ (_)__/ /__  __________  ___  ___
 /  '_/ _ `/ / -_) / _  / _ \(_-< __/ _ \/ _ \/ -_)
/_/\_\\_,_/_/\__/_/\_,_/\___/___|__/\___/ .__/\__/
                                       /_/

Enter the password: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
Uhh... no
Terminated

CTF players with some experience in reverse engineering will most likely recognize what seems to be VM-based obfuscation, wtf.bin being the emulated program. Here is its hexdump:

00000000  e3 0c a9 f1 b1 61 79 86 2d e9 ee 5a d6 a6 8f 63  |ã.©ñ±ay.-éîZÖ¦.c|
00000010  55 01 00 00 20 20 20 5f 5f 20 20 20 20 20 20 20  |U...   __       |
00000020  20 5f 5f 20 20 20 20 5f 20 20 20 20 5f 5f 20 20  | __    _    __  |
00000030  20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20  |                |
00000040  20 20 20 20 20 20 20 0a 20 20 2f 20 2f 5f 5f 5f  |       .  / /___|
00000050  5f 5f 20 5f 2f 20 2f 5f 5f 20 28 5f 29 5f 5f 2f  |__ _/ /__ (_)__/|
00000060  20 2f 5f 5f 20 20 5f 5f 5f 5f 5f 5f 5f 5f 5f 5f  | /__  __________|
00000070  20 20 5f 5f 5f 20 20 5f 5f 5f 20 0a 20 2f 20 20  |  ___  ___ . /  |
00000080  27 5f 2f 20 5f 20 60 2f 20 2f 20 2d 5f 29 20 2f  |'_/ _ `/ / -_) /|
00000090  20 5f 20 20 2f 20 5f 20 5c 28 5f 2d 3c 20 5f 5f  | _  / _ \(_-< __|
000000a0  2f 20 5f 20 5c 2f 20 5f 20 5c 2f 20 2d 5f 29 0a  |/ _ \/ _ \/ -_).|
000000b0  2f 5f 2f 5c 5f 5c 5c 5f 2c 5f 2f 5f 2f 5c 5f 5f  |/_/\_\\_,_/_/\__|
000000c0  2f 5f 2f 5c 5f 2c 5f 2f 5c 5f 5f 5f 2f 5f 5f 5f  |/_/\_,_/\___/___|
000000d0  7c 5f 5f 2f 5c 5f 5f 5f 2f 20 2e 5f 5f 2f 5c 5f  ||__/\___/ .__/\_|
000000e0  5f 2f 20 0a 20 20 20 20 20 20 20 20 20 20 20 20  |_/ .            |
000000f0  20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20  |                |
00000100  20 20 20 20 20 20 20 20 20 20 20 2f 5f 2f 20 20  |           /_/  |
00000110  20 20 20 20 20 20 20 0a 0a 45 6e 74 65 72 20 74  |       ..Enter t|
00000120  68 65 20 70 61 73 73 77 6f 72 64 3a 20 4e 69 63  |he password: Nic|
00000130  65 20 6f 6e 65 21 0a 55 68 68 2e 2e 2e 20 6e 6f  |e one!.Uhh... no|
00000140  0a 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  |................|
00000150  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  |................|
00000160  00 01 00 00 00 01 00 00 00 ef c2 18 d2 13 aa 11  |.........ïÂ.Ò.ª.|
00000170  02 2b 1e 89 4d 87 07 a7 92 e6 41 5d 9a dc fc 1c  |.+..M..§.æA].Üü.|
00000180  50 fd ba 01 f0 1c 4b 14 ed 44 4d ca ec af 03 13  |Pýº.ð.K.íDMÊì¯..|
00000190  c3 25 8c ee 5b 24 27 3a 63 c3 be 1d 8a 77 7d f1  |Ã%.î[$':cþ..w}ñ|
000001a0  80 0a 2f 7d c5 ca ee be a8 5a 48 dd e5 3b 3a 98  |../}ÅÊZHÝå;:.|
000001b0  5b 9a 91 88 42 81 0c 7f 88 10 7c 18 93 03 4d a6  |[...B.....|...M¦|
000001c0  52 7b fa 9c f4 5a 0d 5f a1 6a c8 f0 f1 e7 ca c6  |R{ú.ôZ._¡jÈðñçÊÆ|
000001d0  40 6d 9d 66 e2 a9 1b b4 df 3d c1 41 f1 8e cb 45  |@m.fâ©.´ß=ÁAñ.ËE|
000001e0  ba 00 b5 a8 6b 8c 63 ba 22 03 9f 64 63 89 2e 25  |º.µ¨k.cº"..dc..%|
000001f0  48 cf 23 dd de f3 15 ca 5c 22 32 2e 6c a5 5e 32  |HÏ#ÝÞó.Ê\"2.l¥^2|
00000200  78 91 c4 26 5a ad 90 61 12 0b 3f 15 68 c0 b3 82  |x.Ä&Z..a..?.hÀ³.|
00000210  dd 21 ad fe 02 f7 d5 d7 9a ce de d9 41 41 bb 2c  |Ý!.þ.÷Õ×.ÎÞÙAA»,|
00000220  3e 5c b5 4e 82 ed 10 1e 38 d0 51 aa ac 7d 4f 31  |>\µN.í..8ÐQª¬}O1|
00000230  26 38 9a 9a 2b f4 4d 66 33 94 97 7d 70 7c 99 6c  |&8..+ôMf3..}p|.l|
00000240  ba 13 4d a7 79 be df 79 f7 3a bc 97 99 8f 7a 72  |º.M§y¾ßy÷:¼...zr|
00000250  40 ae 86 dc fe ed aa 75 67 f3 ee d3 b8 70 03 32  |@®.ÜþíªugóîÓ¸p.2|
00000260  0d 5d 91 c9 47 ea 8e dd 5f 7c 34 0c d5 77 99 f9  |.].ÉGê.Ý_|4.Õw.ù|
00000270  ed 11 eb 8b 37 29 3f 6f 99 1e 8d 3c 85 3b 53 e0  |í.ë.7)?o...<.;Sà|
00000280  a2 e3 1c 9f 45 c7 e0 05 50 85 6e 99 16 16 c0 c8  |¢ã..EÇà.P.n...ÀÈ|
00000290  cb 23 bd 3d b8 f7 0e f6 d4 0e 3e 8f 52 46 6a 13  |Ë#½=¸÷.öÔ.>.RFj.|
000002a0  44 2d 6f 96 bf 3a f9 ab 46 59 c1 76 a3 76 a8 fb  |D-o.¿:ù«FYÁv£v¨û|
000002b0  05 d4 ef 13 56 27 54 44 f8 03 f0 07 96 29 5e 33  |.Ôï.V'TDø.ð..)^3|
000002c0  95 be 3a 06 20 fe 68 a5 24 d0 b5 43 07 cb e9 5f  |.¾:. þh¥$еC.Ëé_|
000002d0  7f 93 b9 b0 d9 1b 0f c6 da d9 2b ff 0e f0 8f de  |..¹°Ù..ÆÚÙ+ÿ.ð.Þ|
000002e0  b9 27 b5 0e 4d 63 0a 57 12 75 2a e1 56 2d ea 41  |¹'µ.Mc.W.u*áV-êA|
000002f0  b5 26 6d 6b 19 ad eb 51 26 77 f5 e7 67 92 9c 2d  |µ&mk..ëQ&wõçg..-|
00000300  f8 7b 49 9b ed 27 05 79 cb 5a be 24 2a 14 0f 02  |ø{I.í'.yËZ¾$*...|
00000310  91 46 7d bd a0 b1 78 0a 4a 7e 60 07 cf ec 3e 2e  |.F}½ ±x.J~`.Ïì>.|
00000320  a7 f0 3d a2 46 72 56 75 23 1e 73 c3 d0 27 5a 8f  |§ð=¢FrVu#.sÃÐ'Z.|
00000330  fe 82 dc e6 ac ed 4d 1d 4d 45 74 1b 14 8e 67 ef  |þ.Üæ¬íM.MEt...gï|
00000340  aa e4 a8 5f db a3 ac c6 95 2f 9b 70 d3 d5 57 50  |ªä¨_Û£¬Æ./.pÓÕWP|
00000350  45 a9 e5 f6 16 58 70 7f be 9f f3 24 df 75 18 9c  |E©åö.Xp.¾.ó$ßu..|
00000360  60 11 d2 d0 df a5 02 57 ad 64 ce 23 56 31 b4 fd  |`.ÒÐߥ.W.dÎ#V1´ý|
00000370  63 5d 3f a9 af 7c 2e 26 12 24 a3 2e 77 60 0e aa  |c]?©¯|.&.$£.w`.ª|
00000380  47 91 cf 66 29 a4 3a f8 33 9a 76 80 ef a6 3a 7b  |G.Ïf)¤:ø3.v.ï¦:{|
00000390  a5 c8 f4 d4 b7 31 e6 13 df c1 c4 8c f6 f6 61 56  |¥ÈôÔ·1æ.ßÁÄ.ööaV|
000003a0  91 dd d0 97 b1 a5 b4 b2 4f 15 ad e8 cc 44 df 58  |.ÝÐ.±¥´²O..èÌDßX|
000003b0  ee c7 2a e2 64 86 b7 83 1e 35 3a f6 db 79 3f 5d  |îÇ*âd.·..5:öÛy?]|
000003c0  a3 64 18 16 09 fc 5d 82 ba 4f 0c 10 79 79 f6 9f  |£d...ü].ºO..yyö.|
000003d0  af fe 6b d0 2c e0 b0 01 17 c4 44 b3 6b 83 80 e0  |¯þkÐ,à°..ÄD³k..à|
000003e0  ce 95 5c d8 2b 89 ba fc b3 26 b7 d9 34 64 c9 71  |Î.\Ø+.ºü³&·Ù4dÉq|
000003f0  ac bb 7b 40 66 9d 5e 73 4c cc 73 a0 78 37 4a 2c  |¬»{@f.^sLÌs x7J,|
00000400  9c                                               |.|

We can already point out a few relevant elements:

  • the file starts with 16 seemingly random bytes;
  • then, it is followed by a section that contains program data, including strings;
  • finally, the main part of the file, which would be the code, has a high entropy (7.7), hinting towards compression or encryption.

Other than that, there is not much more to uncover in pure black-box, so let’s reverse the host binary.

Reversing the VM host

The main function opens the argument file, and parses it:

FileA = CreateFileA(filename_argv, GENERIC_READ, 0, 0i64, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, 0i64);
if ( FileA == (HANDLE)-1i64 )
{
    printf("Could not open file\n");
    return 1;
}
if ( ReadFile(FileA, &Key, 0x10u, &NumberOfBytesRead, 0i64)
    && NumberOfBytesRead == 16
    && ReadFile(FileA, &EntryPoint, 4u, &NumberOfBytesRead, 0i64)
    && NumberOfBytesRead == 4
    && (Program = malloc(0x10000ui64)) != 0i64
    && (memset(Program, 0, 0x10000ui64), ReadFile(FileA, Program, 0x10000u, &NumberOfBytesRead, 0i64))
    && NumberOfBytesRead )
{
    CloseHandle(FileA);
    if ( CreateThread(0i64, 0i64, CryptoWorkerMain, 0i64, 0, &CryptoWorkerTid) )
    {
        ProgramInfo.EntryPoint = EntryPoint;
        ProgramInfo.Program = Program;
        v9 = CreateThread(0i64, 0i64, VMWorkerMain, &ProgramInfo, 0, &VMWorkerTid);
        if ( v9 )
        {
            WaitForSingleObject(v9, 0xFFFFFFFF);
            puts("Terminated");
            return 0;
        }
    }
    printf("CreateThread failed (%lu)\n", GetLastError());
}

A 16-byte key is first read, then the offset of the entry point in the emulated program is read (4 bytes). Some space for the program is allocated on the heap, and the rest of the file (up to 0x10000 bytes) is read into it. This includes both the data/strings section and the encrypted code section.

After that, two threads are created:

  • one dedicated to cryptographic operations, which we shall call the crypto worker;
  • one dedicated to running the virtualized program, which we shall call the VM worker. It takes the program and the entry point as parameters in a structure.
CryptoWorkerThreadMainThreadVMWorkerThread

The VM worker initializes a context for the virtual machine, and then enters a loop in which it fetches the instructions and handles them.

void __fastcall __noreturn VMWorkerMain(ProgramInfo *ProgramInfo)
{
  int EntryPoint; // ebx
  unsigned int Ins; // eax
  VmContext VmCtx; // [rsp+20h] [rbp-59h] BYREF

  EntryPoint = ProgramInfo->EntryPoint;
  VmCtx.Stack = 0i64;
  VmCtx.Program = ProgramInfo->Program;
  VmCtx.SyscallHandlers[5] = SyscallUnimplemented;
  VmCtx.SyscallHandlers[6] = SyscallUnimplemented;
  memset(VmCtx.Registers, 0, 72);
  VmCtx.SyscallHandlers[7] = SyscallUnimplemented;
  VmCtx.SyscallHandlers[0] = sub_10D0;
  VmCtx.SyscallHandlers[1] = sub_1120;
  VmCtx.SyscallHandlers[2] = sub_1170;
  VmCtx.SyscallHandlers[3] = sub_1190;
  VmCtx.SyscallHandlers[4] = sub_1280;
  VmCtx.Stack = (DWORD *)calloc(0x2000ui64, 4ui64);
  if ( VmCtx.Stack )
  {
    VmCtx.Registers[15] = EntryPoint;
    VmCtx.Registers[13] = 0x2000;
    hKernel32 = LoadLibraryA("kernel32.dll");
    while ( VmCtx.Registers[15] + 3 < 0x10000 )
    {
      Ins = VM_FetchInstruction(&VmCtx);
      VM_HandleInstruction(&VmCtx, Ins);
    }
  }
  ExitProcess(1u);
}

Further analysis of the handling logic shows that the VM works with 32-bit registers, a stack, and up to 8 syscalls (we can see some of the subs call fread/fwrite for I/O).

The context structure stores the values for 16 different registers, some special ones like 0xF or 0xD being dedicated to the program counter and stack pointer.

The VM_FetchInstruction function reads a 4-byte instruction from the current program counter, but then does something more unexpected:

DWORD __fastcall VM_FetchInstruction(VmContext *VmCtx)
{
  LPARAM PC; // rax MAPDST
  WPARAM Ins; // rdi
  struct tagMSG Msg; // [rsp+30h] [rbp-48h] BYREF

  PC = VmCtx->Registers[15];
  if ( (unsigned int)PC >= 0x10000 )
  {
    puts("Program overflow");
    ExitProcess(1u);
  }
  PC = VmCtx->Registers[15];
  Ins = *(unsigned int *)&VmCtx->Program[PC];
  while ( !PostThreadMessageA(CryptoWorkerTid, 0x1337u, Ins, PC) )
    Sleep(100u);
  while ( !PeekMessageA(&Msg, HWND_MESSAGE|0x2, 0, 0, 1u) || Msg.message != 0x1338 )
    Sleep(0xAu);
  return Msg.wParam;
}

It uses the PostThreadMessageA function from Windows API, which posts a message to a message queue of a specific thread (here, the crypto worker thread).

This function is often used to send window events, but one can define custom messages by choosing a message identifier in the private range, between 0x0400 (WM_USER) and 0xFFFF. Here, we can see that two message types exist: 0x1337 and 0x1338.

The VM worker thread sends the information of the fetched instruction (Ins) and the current program counter (PC) to the crypto worker thread through the 0x1337 message type. Then, a 0x1338 message is received from the crypto worker thread (thanks to PeekMessageA), which contains the decrypted instruction that will be sent over to VM_HandleInstruction.

For this part, I was inspired by the notion of Instruction Set Randomization, a theoretical mitigation against code injection attacks (which can be implemented at both hardware or software level). Here, instructions are encrypted and decrypted on-the-fly through the crypto worker thread.

Let’s take a look at the crypto worker:

void __fastcall __noreturn CryptoWorkerMain(__int64 lpThreadParameter)
{
  unsigned __int64 v1; // rbx
  struct tagMSG Msg; // [rsp+38h] [rbp-40h] BYREF

  SetSeed(lpThreadParameter, 0xBAD1DEAu);
  while ( 1 )
  {
    if ( PeekMessageA(&Msg, HWND_MESSAGE|0x2, 0, 0, 1u) && Msg.message == 0x1337 )
    {
      v1 = LODWORD(Msg.wParam) ^ (unsigned __int64)(unsigned int)TEA_Encrypt(Msg.lParam);
      while ( !PostThreadMessageA(VMWorkerTid, 0x1338u, v1, 0i64) )
        Sleep(0xAu);
    }
    else
    {
      Sleep(0xAu);
    }
  }
}

The SetSeed function sets a global variable Seed to 0xBAD1DEA. Then, every time an encrypted instruction arrives, it is decrypted by xoring it with TEA_Encrypt(PC).

DWORD __stdcall TEA_Encrypt(DWORD v0)
{
  unsigned int v1; // eax
  int sum; // r9d
  __int64 rounds; // r10

  v1 = Seed;
  sum = 0;
  rounds = 32i64;
  do
  {
    sum -= 0x61C88647;
    v0 += (sum + v1) ^ (Key0 + 16 * v1) ^ (Key1 + (v1 >> 5));
    v1 += (sum + v0) ^ (Key2 + 16 * v0) ^ (Key3 + (v0 >> 5));
    --rounds;
  }
  while ( rounds );
  return v0 ^ v1;
}

The TEA_Encrypt function performs TEA encryption (Tiny Encryption Algorithm) with PC and Seed as inputs. The 128-bit key is the one that was read from the header of the file.

Therefore, the encrypted instructions basically depend on their offset in the code (PC) and a 32-bit (constant) seed.

The VM_HandleInstruction function, which eventually handles the decrypted instructions, dispatches them to different handlers. It’s a pretty classic VM: some arithmetic logic instructions, branching, stack stuff, syscalls…

SKeeeydCryptoWorkerPoEDsnetccTrrhyyr0p0pextxta1eP1ed3dC3dM33e7o8osppsccaoogddeeeAVMWorkerProgramRubnytiencsotdreuction

At this point, players can opt for two main routes.

The first is the more static route, in which you keep on reversing the handling logic, specify all the instructions, and write a static disassembler for the bytecode (which also involves decrypting the instructions).

The second is the dynamic route: one may be tempted to instrument the binary in order to automatically dump all the decrypted instructions that are executed.

Both routes are valid and worth taking a look at, because both have their own twists and touch on different skills. Let’s dive into them!

The static route

Through a more thorough analysis of the functions involved in instruction handling, we are able to grasp a finer understanding of the virtual machine.

The sixteen registers are subdivided like the following:

  • R0 -> RB: general purpose
  • RC: frame pointer (FP)
  • RD: stack pointer (SP)
  • RE: link register (LR)
  • RF: program counter (PC)

Regarding the calling convention, arguments are stored in R0, R1, R2, R3 and the stack. Return values are stored in RA.

The calling convention for syscalls is slightly different; maximum 4 arguments, stored in R2, R3, R4 and R5. A few syscalls are defined:

  • SYSCALL_READ (0x0)
  • SYSCALL_WRITE (0x1)
  • SYSCALL_EXIT (0x2)
  • SYSCALL_SETPROCESSMITIGATIONPOLICY (0x3)
  • SYSCALL_ISDEBUGGERPRESENT (0x4)

We will talk about the last two syscalls in the section dedicated to the dynamic route.

Instructions (32-bits) are encoded as the four following bytes: [Opcode][Param][Dst][Src]. For ALU/MOV operations, the source can be a register, a memory location or an immediate. Likewise, destination can be a register or a memory location. The Param byte specifies the different cases with its four last bits (∳b_3∳, ∳b_2∳, ∳b_1∳, ∳b_0∳):

casemeaningcomment
∳b_3 b_2 = 00∳dst = reg
∳b_3 b_2 = 01∳dst = mem1 additional DWORD for the address
∳b_3 b_2 = 10∳invalid
∳b_3 b_2 = 11∳invalid
∳b_1 b_0 = 00∳src = reg
∳b_1 b_0 = 01∳src = mem1 additional DWORD for the address
∳b_1 b_0 = 10∳src = imm1 additional DWORD for the immediate
∳b_1 b_0 = 11∳invalid

For memory accesses, an additional bit (∳b_4∳) specifies whether we’re working at DWORD level or BYTE level. For instance, mov reg, dword ptr [addr] would require ∳b_4 = 0∳, and mov reg, byte ptr [addr] would require ∳b_4 = 1∳. In the case of a byte read access, it also allows to save 4 bytes by encoding the immediate in the Src byte.

As for jumps/calls, these can be relative or absolute depending on ∳b_0∳. If ∳b_0 = 0∳, it’s a relative jump encoded by the WORD [Dst][Src] (relative offset to PC at the moment of the jump). If ∳b_0 = 1∳, it’s an absolute jump, and the address is encoded in an additional DWORD.

These are all the opcodes:

instructionopcode
ADD0x80
SUB0x81
MUL0x82
DIV0x83
MOD0x84
CMP0x85
AND0x90
OR0x91
XOR0x92
MOV0xA0
JMP0xB0
JEQ0xB1
JNE0xB2
JGT0xB3
JGE0xB4
JLT0xB5
JLE0xB6
CALL0xC0
RET0xC1
PUSH0xD0
POP0xD1
SYSCALL0xE0

With all this knowledge, we can now decrypt the instructions and implement a basic disassembler. This can yield something like the following:

.welcome  b"   __        __    _    __                         \n  / /_____ _/ /__ (_)__/ /__  __________  ___  ___ \n /  '_/ _ `/ / -_) / _  / _ \\(_-< __/ _ \\/ _ \\/ -_)\n/_/\\_\\\\_,_/_/\\__/_/\\_,_/\\___/___|__/\\___/ .__/\\__/ \n                                       /_/         \n\n"
.enterpassword  b"Enter the password: "
.good  b"Nice one!\n"
.nope  b"Uhh... no\n"

.flag   b"\x00" * 32

.unk1  b"\x01\x00\x00\x00" 
.unk2  b"\x01\x00\x00\x00"

jmp entrypoint

fun1:
mov R2, flag
mov R3, 32
syscall 0  ; read
ret

fun2:
mov R5, ??????
mov R6, ??????
sub R5, 0x10c0
ret

fun3:
syscall 4
cmp RA, 1
jne fun3_ret
syscall 2
fun3_ret:
ret

fun4:
mov R2, 2
mov R3, unk1
mov R4, 4
syscall 3
mov R2, 8
mov R3, unk2
mov R4, 4
syscall 3
ret

entrypoint:
call fun3
call fun4
mov R2, welcome
mov R3, $welcome
syscall 1
mov R2, enterpassword
mov R3, $enterpassword
syscall 1
call fun1
call fun3
call fun2
mov R0, R5
mov R1, R6
add R0, 0x1f30
mov R2, 0x644e7750
syscall 8

We will ignore what fun3 and fun4 do for now. The program outputs the welcome and enter password strings, before calling fun1 which reads a 32-byte flag.

It then calls fun2, which does something weird that wasn’t disassembled properly… and finally, after the last syscall, the disassembly is total garbage (not shown here), as if the program were obfuscated or the instructions not decrypted correctly.

The first weird thing that happens is this:

mov R5, ??????
mov R6, ??????
sub R5, 0x10c0
ret

Let’s look at the exact bytecode for the mov instructions:

A00005FE
A00006FF

According to the handling logic, a mov Rp, Rq instruction should be encoded as A0 00 0p 0q. But here, we have FE and FF bytes instead!

Indeed, there is a vulnerability in the virtual machine that allows to cause an underflow in register numbers. Here’s an extract of the actual source code of the challenge:

DWORD VM_LoadSrcOrDst(VMContext* VM, CHAR SrcOrDst, UCHAR Type, UCHAR IsByte) {

	DWORD Addr;

	switch (Type) {

	// Register
	case 0:
		if (SrcOrDst < N_REGISTERS) {
			return VM->Registers[SrcOrDst];
		}
		ExitProcess(EXIT_FAILURE);

It is correctly verified that the register number (SrcOrDst) be lower than a certain threshold. However, this variable is a signed CHAR, which means that a negative byte can be used.

The VMContext structure is the following:

typedef struct VMContext {
	BYTE* Program;
	VM_SYSCALL_FN* SyscallHandlers[N_SYSCALLS];
	DWORD Registers[N_REGISTERS];
	UCHAR Flag;
	DWORD* Stack;
} VMContext;

Consequently, we are able to underflow into the SyscallHandlers table. The fun2 function actually does something like this:

mov R5, R(-2)
mov R6, R(-1)
sub R5, 0x10c0
ret

The R(-2) and R(-1) registers represent the last two DWORDs of the SyscallHandlers table. Therefore, the emulated program is effectively leaking the last function pointer in the table!

&RegisRteegriss[t-e2r]sPVVVVVVVVRRrMMMMMMMM02o________gSSSSSSSSryyyyyyyyassssssssmccccccccaaaaaaaallllllllllllllllRWESIUUUerxesnnnaiitDiiidttPemmmerbpppoulllcgeeeegmmmseeeesrnnnMPtttReireee1ttedddcis.geanttionPolicySyscallHandlerstable

This function pointer happens to be the function VM_SyscallUnimplemented (at 0x10C0), which is a default filler for undefined syscalls. The fun2 function thus computes the base address of kaleidoscope.exe.

The emulated program does not stop there, and does a second weird thing:

mov R0, R5
mov R1, R6
add R0, 0x1f30
mov R2, 0x644e7750
syscall 8

The 8th syscall does not exist! There’s a second vulnerability, a bit similar to the previous one. Here is another extract of the source code:

case OP_SYSCALL:
    if (Param > N_SYSCALLS)
        ExitProcess(EXIT_FAILURE);
    VM->Registers[RA] = VM->SyscallHandlers[Param](
        VM,
        VM->Registers[R2],
        VM->Registers[R3],
        VM->Registers[R4],
        VM->Registers[R5]
    );
    VM->Registers[PC] += 4;
    break;

The Param > N_SYSCALLS condition gives room for an off-by-one. There are only 8 handlers in the SyscallHandlers table, but we can call a handler that is located one QWORD right after.

As we saw before, right next to this table sit the registers. Therefore, an attacker can call an arbitrary function pointer formed by the two first registers R0 and R1.

SysScyaslclaHlalnHdalnedrlse[r8s]PVVVVVVVVRRrMMMMMMMM02o________gSSSSSSSSryyyyyyyyassssssssmccccccccaaaaaaaallllllllllllllllRWESIUUUerxesnnnaiitDiiidttPemmmerbpppoulllcgeeeegmmmseeeesrnnnMPtttReireee1ttedddcis.geanttionPolicyRegisters

Since the syscalling convention is given by registers R2, R3, R4 and R5, we can also potentially specify arbitrary arguments to the call through these registers (if the target function’s calling convention is compatible).

The target function for the call is computed with add R0, 0x1f30. At offset 0x1f30 lies the function SetSeed:

void __fastcall SetSeed(__int64 a1, unsigned int a2)
{
  Seed = a2;
}

As discussed earlier, this function simply modifies the global Seed variable used during the instruction decryption mechanism. Its calling convention happens to match the one from the syscall handlers, and a2 (edx) contains the value of the R2 register, which is at the attacker’s control.

Therefore, the attacker manages to change the seed to the value of R2: 0x644e7750 (‘PwNd’)!

From this moment onwards, all instructions will see their decryption modified, hence the garbage. To summarize, the emulated program auto-exploited a chain of bugs in the host in order to obfuscate itself.

Note: this concept was partly inspired by hgarrereyn’s breach challenge from DiceCTF 2022.

We can now modify our disassembler to use the new decryption seed after a certain offset. This allows to decrypt the rest of the program, which implements the flag checking logic.

mov RA, 0

mov R8, [flag0]
mov R9, R8
mod R8, 76335
mod R9, 120026
xor R8, 27241
xor R9, 68181
or RA, R8
or RA, R9

mov R8, [flag4]
mov R9, R8
mod R8, 110125
mod R9, 108506
xor R8, 92563
xor R9, 45127
or RA, R8
or RA, R9

mov R8, [flag8]
mov R9, R8
mod R8, 98851
mod R9, 115966
xor R8, 42287
xor R9, 46766
or RA, R8
or RA, R9

mov R8, [flag12]
mov R9, R8
mod R8, 98564
mod R9, 114611
xor R8, 96512
xor R9, 58270
or RA, R8
or RA, R9

cmp RA, 0
jne fail

mov R8, [flag16]
mov R9, R8
mod R8, 69129
mod R9, 88694
xor R8, 14900
xor R9, 82705
or RA, R8
or RA, R9

mov R8, [flag20]
mov R9, R8
mod R8, 109460
mod R9, 68883
xor R8, 24419
xor R9, 41368
or RA, R8
or RA, R9

mov R8, [flag24]
mov R9, R8
mod R8, 82595
mod R9, 65709
xor R8, 40736
xor R9, 58761
or RA, R8
or RA, R9

mov R8, [flag28]
mov R9, R8
mod R8, 130942
mod R9, 71079
xor R8, 22132
xor R9, 26140
or RA, R8
or RA, R9

cmp RA, 0
jne fail

mov R2, good
mov R3, $good
syscall 1  ; write
syscall 2  ; exit

fail:
mov R2, nope
mov R3, $nope
syscall 1  ; write
syscall 2  ; exit

The dynamic route

When the challenge was first designed, we found a little bypass (or shortcut): by instrumenting the binary to dump all instructions after they’re decrypted, we could totally circumvent the “key” point of the challenge, which is understanding the obfuscation.

Indeed, we would be able to dump all the executed instructions, including the flag checking logic, while staying totally oblivious to the vulnerability that was exploited right before our eyes and the seed modification that happened.

Therefore, I wanted to even out the difficulty a bit between the two approaches, by making this route more complex (not significantly much, but still throwing a little spanner in the works).

In order to do that, I implemented a few protections and mitigations, namely anti-debug and anti-hooking tricks.

The host binary leverages the IsDebuggerPresent and SetProcessMitigationPolicy functions from the Windows API to:

  • check whether a debugger is present;
  • enable the ProcessDynamicCodePolicy policy (ACG), thwarting most hooking tools that allow RWX trampolines (e.g. frida, detours);
  • enable the ProcessSignaturePolicy policy (CIG), preventing non-Microsoft-signed DLLs from being loaded into the binary.

However, these calls are not directly seen in the binary; they are performed from inside the VM, through two “proxy” syscalls (remember fun3 and fun4?) and some GetProcAddress. One should identify these syscalls in the host and patch these to cancel out their effect.

Once the mitigations are overcome, a tiny frida hook can easily dump all instructions post-decryption:

Interceptor.attach(Module.findBaseAddress("kaleidoscope.exe").add(0x1190), {
    onLeave: (retVal) => {
        console.log(retVal.toString(16));
        return retVal;
    }
});

The VM instruction handling logic still needs to be reversed a little bit to understand the instructions (but only those of interest).

Conclusion

The flag checking logic is unmistakably not the most intricate part of the challenge.

Each DWORD from the flag must satisfy constraints as following:

∳∳ \left\{ \: \begin{aligned} \text{flag}_i = x_{i,1} \mod{n_{i,1}} \\ \text{flag}_i = x_{i,2} \mod{n_{i,2}} \end{aligned} \right. ∳∳

We can get rid of these using the Chinese remainder theorem (or simply z3…).

This yields the following flag: ECW{w31rdtu4l_m4ch1n3s_g00d_fun}!

I hope you enjoyed this challenge. I wanted to bring together different elements: a little bit of Windows reversing, a little bit of inter-thread communication, a little bit of crypto… and tie it all together with an auto-exploiting VM, which I have seen a few times during CTFs.

Most experienced reverse players have already deobfuscated plenty of VMs and might find it repetitive, but these ones come with a little twist, which I think is interesting.

During the CTF, many players seem to have opted for the dynamic route, and rightfully so: in hindsight, this was clearly the easiest route. I hoped people would figure out the bugs in the VM, but very few actually did.

I should have made debugging harder, or pushed the exploitation concept even further (e.g. getting code execution).

Some contestants noticed uncanny stuff going on that miraculously changed the TEA seed, and just dumped the new seed. Others didn’t even realize anything was happening and managed to directly dump the flag constraints (though sometimes through more creative approaches, such as Qiling emulation or modifying Wine).

Anyway, there were a few slip-ups with this challenge and it was eventually not as hard as I intended, but overall I believe it was still relevant and of decent difficulty.

You can find the sources of the challenge over on Github if you’re interested!