picoCTF 2014: Baleful (re200) Part 2
Welcome to the second part of the Baleful writeup. In the first part, we started reverse engineering Baleful and figuring out how it worked. We eventually deduced that it was a VM that ran an embedded byte code program. By stepping through the VM in GDB, we were able to learn how the registers and stack worked in the VM. Now that we have this knowledge, it's easy to decipher the rest of the instructions. Let's get to it!
We know that the stack pointer is located in [ebp-0x38], so other instructions using that would also be doing something with the stack. Opcodes 0x1e and 0x1f both do, meaning it'd be good to take a look at them. First let's see 0x1e:
.text:08049B92 loc_8049B92: ; CODE XREF: sub_804898B+C0 j
.text:08049B92 ; DATA XREF: .rodata:vm_instrs o
.text:08049B92 mov eax, [ebp+ipos] ; jumptable 08048A4B case 30
.text:08049B95 add eax, 1
.text:08049B98 movzx eax, byte_804C0C0[eax]
.text:08049B9F movsx eax, al
.text:08049BA2 mov [ebp+var_C], eax
.text:08049BA5 cmp [ebp+var_C], 0
.text:08049BA9 jz short loc_8049BC1
.text:08049BAB mov eax, [ebp+ipos]
.text:08049BAE add eax, 2
.text:08049BB1 add eax, offset byte_804C0C0
.text:08049BB6 mov eax, [eax]
.text:08049BB8 mov [ebp+var_24], eax
.text:08049BBB add [ebp+ipos], 6
.text:08049BBF jmp short loc_8049BDF
.text:08049BC1 ; ---------------------------------------------------------------------------
.text:08049BC1
.text:08049BC1 loc_8049BC1: ; CODE XREF: sub_804898B+121E j
.text:08049BC1 mov eax, [ebp+ipos]
.text:08049BC4 add eax, 2
.text:08049BC7 movzx eax, byte_804C0C0[eax]
.text:08049BCE movsx eax, al
.text:08049BD1 mov eax, [ebp+eax*4+regs]
.text:08049BD8 mov [ebp+var_24], eax
.text:08049BDB add [ebp+ipos], 3
.text:08049BDF
.text:08049BDF loc_8049BDF: ; CODE XREF: sub_804898B+1234 j
.text:08049BDF mov eax, [ebp+stack]
.text:08049BE2 sub eax, 4
.text:08049BE5 mov [ebp+stack], eax
.text:08049BE8 mov eax, [ebp+stack]
.text:08049BEB lea edx, byte_804C0C0[eax]
.text:08049BF1 mov eax, [ebp+var_24]
.text:08049BF4 mov [edx], eax
.text:08049BF6 jmp short loc_8049C67
This function reads the second byte of the instruction, and takes one of two paths depending on the value. If it's 0, it goes to 0x08049bc1, otherwise, it ends up branching to 0x08049bdf. Both cases are nearly identical: the stack pointer is decremented by 4 and a value gets written onto the stack. This is quite clearly a PUSH instruction. But why two different cases? Well, case 0 loads the value from a register specified in the instruction, while case 1 puts a constant on the stack. This PUSH instruction can use either a register or a constant as a data source.
Now let's look at 0x1f, which is much simpler:
.text:08049BF8 loc_8049BF8: ; CODE XREF: sub_804898B+C0 j
.text:08049BF8 ; DATA XREF: .rodata:vm_instrs o
.text:08049BF8 mov eax, [ebp+ipos] ; jumptable 08048A4B case 31
.text:08049BFB add eax, 1
.text:08049BFE movzx eax, byte_804C0C0[eax]
.text:08049C05 movsx eax, al
.text:08049C08 mov [ebp+var_24], eax
.text:08049C0B mov eax, [ebp+stack]
.text:08049C0E add eax, offset byte_804C0C0
.text:08049C13 mov edx, [eax]
.text:08049C15 mov eax, [ebp+var_24]
.text:08049C18 mov [ebp+eax*4+regs], edx
.text:08049C1F mov eax, [ebp+stack]
.text:08049C22 add eax, 4
.text:08049C25 mov [ebp+stack], eax
.text:08049C28 add [ebp+ipos], 2
.text:08049C2C jmp short loc_8049C67
The second byte of the instruction is loaded into EAX and stored temporarily in [ebp-0x24]. Then we read a 4-byte value from the current position of the stack into EDX. Next, we use the value stored in [ebp-0x24] as a register number. This register number determines which register we write EDX into. Finally, the stack pointer is incremented by 4-bytes. This is clearly a POP instruction. We specify a destination register, and this instruction puts the current stack value inside it before incrementing the stack pointer.
What instruction should we look at next? It might be a good idea to take a second look at MOV. Originally, it looked far too intimidating to statically analyze, so we just stepped through it in GDB. Now we know more about the VM, and can probably figure the rest out.
.text:08049A02 mov_8049A02: ; CODE XREF: sub_804898B+C0 j
.text:08049A02 ; DATA XREF: .rodata:vm_instrs o
.text:08049A02 mov eax, [ebp+ipos] ; jumptable 08048A4B case 24
.text:08049A05 add eax, 1
.text:08049A08 movzx eax, byte_804C0C0[eax]
.text:08049A0F movsx eax, al
.text:08049A12 mov [ebp+var_C], eax
.text:08049A15 mov eax, [ebp+var_C]
.text:08049A18 test eax, eax
.text:08049A1A jz short loc_8049A23
.text:08049A1C cmp eax, 1
.text:08049A1F jz short loc_8049A57
.text:08049A21 jmp short loc_8049A81
.text:08049A23 ; ---------------------------------------------------------------------------
.text:08049A23
.text:08049A23 loc_8049A23: ; CODE XREF: sub_804898B+108F j
.text:08049A23 mov eax, [ebp+ipos]
.text:08049A26 add eax, 2
.text:08049A29 movzx eax, byte_804C0C0[eax]
.text:08049A30 movsx eax, al
.text:08049A33 mov edx, [ebp+ipos]
.text:08049A36 add edx, 3
.text:08049A39 movzx edx, byte_804C0C0[edx]
.text:08049A40 movsx edx, dl
.text:08049A43 mov edx, [ebp+edx*4+regs]
.text:08049A4A mov [ebp+eax*4+regs], edx
.text:08049A51 add [ebp+ipos], 4
.text:08049A55 jmp short loc_8049A81
.text:08049A57 ; ---------------------------------------------------------------------------
.text:08049A57
.text:08049A57 loc_8049A57: ; CODE XREF: sub_804898B+1094 j
.text:08049A57 mov eax, [ebp+ipos]
.text:08049A5A add eax, 2
.text:08049A5D movzx eax, byte_804C0C0[eax]
.text:08049A64 movsx eax, al
.text:08049A67 mov edx, [ebp+ipos]
.text:08049A6A add edx, 3
.text:08049A6D add edx, offset byte_804C0C0
.text:08049A73 mov edx, [edx]
.text:08049A75 mov [ebp+eax*4+regs], edx
.text:08049A7C add [ebp+ipos], 7
.text:08049A80 nop
.text:08049A81
.text:08049A81 loc_8049A81: ; CODE XREF: sub_804898B+1096 j
.text:08049A81 ; sub_804898B+10CA j
.text:08049A81 jmp loc_8049C67
Just like before, it's taking the second byte and using it to determine which case to enter. Let's see what each case does. Case 0 reads the third and fourth bytes of the instruction, then uses them as register numbers. It takes the value of the register specified by the fourth byte and puts it in the register specified by the third byte. Looks like case 0 is a register-register MOV. What about case 1? It also uses the third byte as a destination register, but it instead puts a constant (included in the instruction into that register). Similarly to PUSH, it has two cases: one for a register source and one for a constant source. The destination is always a register.
We know that 0x0f is the CALL instruction. But there are also several other branching instructions we ought to take a look at.
.text:080496D1 loc_80496D1: ; CODE XREF: sub_804898B+C0 j
.text:080496D1 ; DATA XREF: .rodata:vm_instrs o
.text:080496D1 mov eax, [ebp+ipos] ; jumptable 08048A4B case 14
.text:080496D4 add eax, 1
.text:080496D7 add eax, offset byte_804C0C0
.text:080496DC mov eax, [eax]
.text:080496DE mov [ebp+var_10], eax
.text:080496E1 mov eax, [ebp+var_10]
.text:080496E4 mov [ebp+ipos], eax
.text:080496E7 jmp loc_8049C67
This reads a new instruction pointer from the opcode data and puts it in ipos. However, the stack is not involved, meaning this is just a normal, unconditional jump. What other kinds of jumps are there?
.text:080496EC loc_80496EC: ; CODE XREF: sub_804898B+C0 j
.text:080496EC ; DATA XREF: .rodata:vm_instrs o
.text:080496EC mov eax, [ebp+ipos] ; jumptable 08048A4B case 16
.text:080496EF add eax, 1
.text:080496F2 add eax, offset byte_804C0C0
.text:080496F7 mov eax, [eax]
.text:080496F9 mov [ebp+var_10], eax
.text:080496FC cmp [ebp+var_28], 0
.text:08049700 jz short loc_804970A
.text:08049702 mov eax, [ebp+ipos]
.text:08049705 add eax, 5
.text:08049708 jmp short loc_804970D
.text:0804970A ; ---------------------------------------------------------------------------
.text:0804970A
.text:0804970A loc_804970A: ; CODE XREF: sub_804898B+D75 j
.text:0804970A mov eax, [ebp+var_10]
.text:0804970D
.text:0804970D loc_804970D: ; CODE XREF: sub_804898B+D7D j
.text:0804970D mov [ebp+ipos], eax
.text:08049710 jmp loc_8049C67
This does almost the same thing as JMP, but only modifies ipos if [ebp-0x28] is 0. Recall that on x86, the jz instruction is equivalent to jump-if-equal-to. It subtracts the two comparison operands, and if the result is 0 (indicating equality), it jumps to them. What's here is very similar, so maybe [ebp-0x28] holds the result of that subtraction. That would make it a condition register. If we look further down, many opcodes also check the condition register in similar ways, but with different types of jumps.
If there are multiple jumps only branching if a comparison holds true, there must be some instructions that perform those comparisons. Both 0x16 and 0x17 modify the condition register, so they're good candidates. We'll take a look at 0x16 first:
.text:080497E2 loc_80497E2: ; CODE XREF: sub_804898B+C0 j
.text:080497E2 ; DATA XREF: .rodata:vm_instrs o
.text:080497E2 mov eax, [ebp+ipos] ; jumptable 08048A4B case 22
.text:080497E5 add eax, 1
.text:080497E8 movzx eax, byte_804C0C0[eax]
.text:080497EF movsx eax, al
.text:080497F2 mov [ebp+var_C], eax
.text:080497F5 mov eax, [ebp+var_C]
.text:080497F8 cmp eax, 1
.text:080497FB jz short loc_804985B
.text:080497FD cmp eax, 1
.text:08049800 jg short loc_804980B
.text:08049802 test eax, eax
.text:08049804 jz short loc_804981E
.text:08049806 jmp loc_80498E0
.text:0804980B ; ---------------------------------------------------------------------------
.text:0804980B
.text:0804980B loc_804980B: ; CODE XREF: sub_804898B+E75 j
.text:0804980B cmp eax, 2
.text:0804980E jz short loc_804988B
.text:08049810 cmp eax, 4
.text:08049813 jz loc_80498BB
.text:08049819 jmp loc_80498E0
.text:0804981E ; ---------------------------------------------------------------------------
.text:0804981E
.text:0804981E loc_804981E: ; CODE XREF: sub_804898B+E79 j
.text:0804981E mov eax, [ebp+ipos]
.text:08049821 add eax, 2
.text:08049824 movzx eax, byte_804C0C0[eax]
.text:0804982B movsx eax, al
.text:0804982E mov eax, [ebp+eax*4+regs]
.text:08049835 mov [ebp+var_20], eax
.text:08049838 mov eax, [ebp+ipos]
.text:0804983B add eax, 3
.text:0804983E movzx eax, byte_804C0C0[eax]
.text:08049845 movsx eax, al
.text:08049848 mov eax, [ebp+eax*4+regs]
.text:0804984F mov [ebp+var_1C], eax
.text:08049852 add [ebp+ipos], 4
.text:08049856 jmp loc_80498E0
.text:0804985B ; ---------------------------------------------------------------------------
.text:0804985B
.text:0804985B loc_804985B: ; CODE XREF: sub_804898B+E70 j
.text:0804985B mov eax, [ebp+ipos]
.text:0804985E add eax, 2
.text:08049861 movzx eax, byte_804C0C0[eax]
.text:08049868 movsx eax, al
.text:0804986B mov eax, [ebp+eax*4+regs]
.text:08049872 mov [ebp+var_20], eax
.text:08049875 mov eax, [ebp+ipos]
.text:08049878 add eax, 3
.text:0804987B add eax, offset byte_804C0C0
.text:08049880 mov eax, [eax]
.text:08049882 mov [ebp+var_1C], eax
.text:08049885 add [ebp+ipos], 7
.text:08049889 jmp short loc_80498E0
.text:0804988B ; ---------------------------------------------------------------------------
.text:0804988B
.text:0804988B loc_804988B: ; CODE XREF: sub_804898B+E83 j
.text:0804988B mov eax, [ebp+ipos]
.text:0804988E add eax, 2
.text:08049891 add eax, offset byte_804C0C0
.text:08049896 mov eax, [eax]
.text:08049898 mov [ebp+var_20], eax
.text:0804989B mov eax, [ebp+ipos]
.text:0804989E add eax, 6
.text:080498A1 movzx eax, byte_804C0C0[eax]
.text:080498A8 movsx eax, al
.text:080498AB mov eax, [ebp+eax*4+regs]
.text:080498B2 mov [ebp+var_1C], eax
.text:080498B5 add [ebp+ipos], 7
.text:080498B9 jmp short loc_80498E0
.text:080498BB ; ---------------------------------------------------------------------------
.text:080498BB
.text:080498BB loc_80498BB: ; CODE XREF: sub_804898B+E88 j
.text:080498BB mov eax, [ebp+ipos]
.text:080498BE add eax, 2
.text:080498C1 add eax, offset byte_804C0C0
.text:080498C6 mov eax, [eax]
.text:080498C8 mov [ebp+var_20], eax
.text:080498CB mov eax, [ebp+ipos]
.text:080498CE add eax, 6
.text:080498D1 add eax, offset byte_804C0C0
.text:080498D6 mov eax, [eax]
.text:080498D8 mov [ebp+var_1C], eax
.text:080498DB add [ebp+ipos], 0Ah
.text:080498DF nop
.text:080498E0
.text:080498E0 loc_80498E0: ; CODE XREF: sub_804898B+E7B j
.text:080498E0 ; sub_804898B+E8E j ...
.text:080498E0 mov eax, [ebp+var_1C]
.text:080498E3 mov edx, [ebp+var_20]
.text:080498E6 and eax, edx
.text:080498E8 mov [ebp+cr], eax
.text:080498EB jmp loc_8049C67
It looks like an incredibly complex instruction to begin with. There are multiple code paths depending on the second byte, but now even more than ever. However, we do know that all code paths ultimately lead to 0x080498e0. That local routine does a bitwise AND of two arguments and puts the result in CR. This is exactly what the x86 TEST instruction does, so we're probably looking at the same thing. As for all the cases, it looks daunting but just requires careful investigation. The cases just specify the format of the two source operands:
- 0 - Register-register
- 1 - Register-constant
- 2 - Constant-register
- 4 - Constant-constant
Knowing how that big set of cases works makes practically every remaining function easy to reverse engineer. 0x17 is pretty much the same, but it just performs a subtraction at the end rather than AND, which is what CMP does. Almost all of the arithmetic instructions also work like this. IDIV, the division instruction, has a bit of weirdness, though. It needs two destination registers, one for the quotient and one for the remainder. This initially tripped me up a little.
There are, however, two other weird instructions, 0x1b and 0x1c. They don't immediately look like anything in the x86 instruction set, so we'll have to investigate them more closely.
.text:08049AEC loc_8049AEC: ; CODE XREF: sub_804898B+C0 j
.text:08049AEC ; DATA XREF: .rodata:vm_instrs o
.text:08049AEC mov eax, [ebp+ipos] ; jumptable 08048A4B case 27
.text:08049AEF add eax, 1
.text:08049AF2 movzx eax, byte_804C0C0[eax]
.text:08049AF9 movsx eax, al
.text:08049AFC mov [ebp+var_24], eax
.text:08049AFF mov eax, [ebp+ipos]
.text:08049B02 add eax, 2
.text:08049B05 movzx eax, byte_804C0C0[eax]
.text:08049B0C movsx eax, al
.text:08049B0F mov [ebp+var_20], eax
.text:08049B12 add [ebp+ipos], 3
.text:08049B16 mov eax, [ebp+var_20]
.text:08049B19 mov eax, [ebp+eax*4+regs]
.text:08049B20 add eax, offset byte_804C0C0
.text:08049B25 mov edx, [eax]
.text:08049B27 mov eax, [ebp+var_24]
.text:08049B2A mov [ebp+eax*4+regs], edx
.text:08049B31 mov eax, [ebp+var_24]
.text:08049B34 mov eax, [ebp+eax*4+regs]
.text:08049B3B mov [ebp+cr], eax
.text:08049B3E jmp loc_8049C67
.text:08049B43 ; ---------------------------------------------------------------------------
.text:08049B43
.text:08049B43 loc_8049B43: ; CODE XREF: sub_804898B+C0 j
.text:08049B43 ; DATA XREF: .rodata:vm_instrs o
.text:08049B43 mov eax, [ebp+ipos] ; jumptable 08048A4B case 28
.text:08049B46 add eax, 1
.text:08049B49 movzx eax, byte_804C0C0[eax]
.text:08049B50 movsx eax, al
.text:08049B53 mov [ebp+var_24], eax
.text:08049B56 mov eax, [ebp+ipos]
.text:08049B59 add eax, 2
.text:08049B5C movzx eax, byte_804C0C0[eax]
.text:08049B63 movsx eax, al
.text:08049B66 mov [ebp+var_20], eax
.text:08049B69 add [ebp+ipos], 3
.text:08049B6D mov eax, [ebp+var_24]
.text:08049B70 mov eax, [ebp+eax*4+regs]
.text:08049B77 add eax, offset byte_804C0C0
.text:08049B7C mov edx, [ebp+var_20]
.text:08049B7F mov edx, [ebp+edx*4+regs]
.text:08049B86 mov [eax], edx
.text:08049B88 mov eax, [eax]
.text:08049B8A mov [ebp+cr], eax
.text:08049B8D jmp loc_8049C67
Both are similar, let's approach 0x1b first. The second and third bytes are both used for register accesses. It reads the data of the register specified by the third byte. This data is then used as an offset into the bytecode area, and the data at that offset is read. Finally, it writes the data into the register specified by the second byte. It might not be obvious at first, but this is just a memory read instruction. It reads memory from the address specified in the source register, and puts that data in the destination register. 0x1c does the same thing in reverse. I decided to call them mem2reg and reg2mem, respectively. They're really just RISC-like load and store instructions.
We could continue and reverse engineer all of the instructions, but it would get really repetitive. The process is basically the same for all of the VM instructions. If we manage to reverse engineer all 33 instructions, we get this list:
- 0x00: nop
- 0x01: ret
- 0x02: add (3 arguments; destination and two source)
- 0x03: sub (3 arguments; destination and two source)
- 0x04: imul (3 arguments; destination and two source)
- 0x05: idiv (4 arguments; quotient/remainder destinations and two source)
- 0x06: xor (3 arguments; destination and two source)
- 0x07: neg (2 arguments; destination and source)
- 0x08: not (2 arguments; destination and source)
- 0x09: and (3 arguments; destination and two source)
- 0x0a: or (3 arguments; destination and two source)
- 0x0b: logicnot (2 arguments; destination and source)
- 0x0c: shl (3 arguments; destination and two source)
- 0x0d: sar (3 arguments; destination and two source)
- 0x0e: jmp (1 argument; offset to go to)
- 0x0f: call (1 argument; offset to go to)
- 0x10: jz (1 argument; offset to go to)
- 0x11: js (1 argument; offset to go to)
- 0x12: jle (1 argument; offset to go to)
- 0x13: jg (1 argument; offset to go to)
- 0x14: jns (1 argument; offset to go to)
- 0x15: jnz (1 argument; offset to go to)
- 0x16: test (2 arguments; two operands)
- 0x17: cmp (2 arguments; two operands)
- 0x18: mov (2 arguments; destination and source)
- 0x19: inc (1 argument; is both destination and source)
- 0x1a: dec (1 argument; is both destination and source)
- 0x1b: mem2reg (2 arguments; destination and source registers)
- 0x1c: reg2mem (2 arguments; destination and source registers)
- 0x1d: hlt
- 0x1e: push (1 argument; source)
- 0x1f: pop (1 argument; destination register)
- 0x20: io (1 argument; function code)
With all these instructions figured out, we can write a short Python script that disassembles the Baleful bytecode. I won't bother documenting that here; writing the disassembler is a fairly easy and boring task. Something more interesting, though, is how we get access to the bytecode in the first place.
Remember back in part 1, when I suspected that the bytecode was being encrypted or packed? I said this because the data looked random, and taking a new look at the VM with our recent discoveries in mind reveals that it's not valid bytecode at all. In fact, we can prove the hypothesis correct by comparing what I dumped from 0x0804db39 in memory to what's in the binary:
They're not even remotely similar. Something is definitely happening to the bytecode. It's likely that what's in memory at the time is valid, so let's just dump that to a file from GDB:
If we look at memdump.bin, it's much different. Rather than random garbage, we get something that appears to be proper bytecode. If we try disassembling it, it results in something sane, which is also a good sign. Now for all our reverse engineering efforts, we're rewarded with...another binary to reverse engineer. Oh well, c'est la vie (not really).
The ASM code is interesting. It actually has many elements of RISC architectures, like the larger number of registers and dedicated memory access instructions. The ASM generated is quite big, though not as big is Baleful. Let's start at the beginning of the bytecode.
This function, right at the beginning is interesting. Starting at address 0x103a, it reads 4-bytes from memory, XORs them against a constant bitmask, and writes them back to the same spot. It doesn't step until it hits 0x1edb. The purpose of this code is easily apparent: it's decrypting the rest of the bytecode! This is solid proof that the decryption hypothesis was correct. After the decryption, it branches to 0x103a. which itself just goes to 0x1bc0.
(Before going there, though, 0x103f has a bunch of I/O instructions, each one for a specific I/O function in the VM. It's helpful, but not strictly necessary, to give the ones we recognize names. In general, it's a good idea to break up blocks of code separated by RET instructions into separate functions.)
0x1bc0 appears to be the main function of the bytecode program. It starts out with a call to 0x1a6d, which is just this code sequence:
It's a series of instructions all doing the same thing: loading a byte into r0 and calling 0x103f. 0x103f contains the call to print_char using the I/O instruction, so what's getting loaded into r0 is then printed. If we treat all of these bytes as ASCII characters and try to reconstruct the message, it turns out to be "Please enter your password: ", meaning 0x1a6d prints the password prompt.
Once control returns to main, main loads 0x04 into r0, 0x1e into r1, and calls 0x1080. What does 0x1080 do?
It multiplies r1 (which is 0x1e) by r0 (which is 0x04), adds 8 to it, and stores it in r0. That value becomes an argument to I/O function 0x11. The function then ensures that the I/O function didn't return 0 and writes the block size to the beginning of the returned memory. Finally, it increments the address by 8 and returns it. Although I didn't know for sure what 0x1080 was doing, it reminded me a lot of malloc. It makes a request to the VM which returns a memory address, and then writes some extra data to the beginning of the buffer (which the user does not see), like heap block headers. I assumed it was something like malloc(block_size, num_blocks), which turned out to be correct. You can also confirm this by looking at Baleful, but aren't we tired of that by this point?
So main is allocating an array of 0x1e 4-byte entries. We don't know what it's for at this point. The returned buffer is saved in r11. main then loads 0 into r9 and jumps to 0x1d66.
We actually start out in the second instruction of an existing function, which is interesting. Anyway, it first checks r9 to make sure it hasn't exceeded 0x1e. If it hasn't exceeded 0x1e, it jumps to 0x1bfb, which we'll look at in a second. If it has exceeded 0x1e, it makes a call to 0x104b, which calls input_char using the I/O instruction. Then it checks the value of the character, and if it's anything besides 0xa (newline, indicating end of input), it prints "Sorry, wrong password!" and terminates. What this appears to be is a password length check. 0x1e is likely the password length, and if we enter more than 0x1e characters (not counting newline), the password is automatically wrong.
That's only if we fail the check, though. If the check succeeds, the function jumps to 0x1bfb, which is shown below.
Recall that r9 contains 0 at this point, and r11 has the 0x1e*4-byte array we allocated. It uses r9 as an index into the array and puts that in r10. Then it makes a call to 0x104b, which we already know reads a character from stdin. This character is returned in r0, and gets written to the array spot that we put in r10. So the buffer we allocated is being used to store the password that the user entered. After storing it in the buffer, it once again checks if the character is a newline. In this case, receiving a newline means that the password is less than 0x1e characters, further supporting the idea that 0x1e is the password length. Finally, it goes back to 0x1d5e, which increments r9 by one and repeats the loop.
We've now identified the loop which reads the password from the user. Now what happens after reading it? Assuming the length is correct, control proceeds to 0x1ebe:
Which simply puts the password buffer in r0 and calls 0x12a9. Since we already have the password, the only thing left to do is check it. That's probably what 0x12a9 is for.
0x12a9 starts out by allocating two new buffers. Both have an unknown purpose, though one of them is the same length as the password buffer, so it's probably related. Afterwards, we just have a bunch of code which fills the buffers up. It's a huge amount of code, so I'm not listing it all here. I'll just put some C code that shows the buffer contents:
After the buffers are filled up, we finally get to the meat of the function, which actually checks the password. Both buffers are involved in an interesting way.
r1, r2, and r3 are set to 0 before jumping to 0x1860. It seems r2 is being used to keep track of the current index, since once it gets past 0x1e, the check ends. r3 seems to be a status indicator as to whether the check succeeded. If it's anything put 0, it prints the "Sorry, wrong password!" message, but if it is 0, it prints "Congratulations!" Until r2 hits 0x1e, it's looping through 0x17d9, which has to be what's actually checking the contents of the password.
It uses the index to grab the current character of the password and put it in r4. Then it does something weird: idiv r0, r3, r2, 4. idiv is one of the more complicated instructions done by the VM. It has two different destination registers, one for the quotient of division and one for the remainder. r0, the quotient, gets trashed immediately afterwards, so the program only cares about the remainder in r3. r3 would then equal index % 4.
Why is it computing the index modulo 4? Immediately after this division, it uses index % 4 as an index into that 16-byte buffer. Remember that the buffer contained 4 entries, so this remainder division is making sure we don't go past that buffer. In essence, it rotates through that 16-byte buffer depending on our index in the password. What's read from that buffer is stored in r3.
Finally, it XORs the character we entered with the value read from the 16-byte buffer. It then compares it against the corresponding byte in the other buffer, the one that's the same size as the password buffer. The loop continues either way, but if they're not equal, it ends up putting 1 in r3. This, as we found above, causes the check to ultimately fail and print the "Sorry, wrong password!" message. All the checks have to succeed for the "Congratulations!" message to print.
We now have everything we need to reconstruct the password. buf1 contains the password character we need, XORed with the correct mask for that character's index. XORs can be easily inverted by XORing again with the same mask, so we just need to do that for every character in buf1 and we'll get back the original password. Once this is done, we end up getting the flag packers_and_vms_and_xors_oh_my!
This probably concludes the picoCTF 2014 writeups I'll be doing on this blog. Thanks to the picoCTF team for an amazing set of challenges, especially Baleful, which was probably my favorite of them all. You guys are amazing, and I can't wait to play picoCTF again next year!
Remember back in part 1, when I suspected that the bytecode was being encrypted or packed? I said this because the data looked random, and taking a new look at the VM with our recent discoveries in mind reveals that it's not valid bytecode at all. In fact, we can prove the hypothesis correct by comparing what I dumped from 0x0804db39 in memory to what's in the binary:
(gdb) x/16xb 0x0804db39
0x804db39: 0x18 0x01 0x00 0x6c 0x00 0x00 0x00 0x0f
0x804db41: 0x3f 0x10 0x00 0x00 0x18 0x01 0x00 0x65
(gdb)
.data:0804DB39 db 19h
.data:0804DB3A db 43h ; C
.data:0804DB3B db 0CFh ; -
.data:0804DB3C db 18h
.data:0804DB3D db 1
.data:0804DB3E db 42h ; B
.data:0804DB3F db 0CFh ; -
.data:0804DB40 db 7Bh ; {
.data:0804DB41 db 3Eh ; >
.data:0804DB42 db 52h ; R
.data:0804DB43 db 0CFh ; -
.data:0804DB44 db 74h ; t
.data:0804DB45 db 19h
.data:0804DB46 db 43h ; C
.data:0804DB47 db 0CFh ; -
.data:0804DB48 db 11h
They're not even remotely similar. Something is definitely happening to the bytecode. It's likely that what's in memory at the time is valid, so let's just dump that to a file from GDB:
(gdb) dump memory memdump.bin 0x0804c0c0 0x0804e000
If we look at memdump.bin, it's much different. Rather than random garbage, we get something that appears to be proper bytecode. If we try disassembling it, it results in something sane, which is also a good sign. Now for all our reverse engineering efforts, we're rewarded with...another binary to reverse engineer. Oh well, c'est la vie (not really).
The ASM code is interesting. It actually has many elements of RISC architectures, like the larger number of registers and dedicated memory access instructions. The ASM generated is quite big, though not as big is Baleful. Let's start at the beginning of the bytecode.
0x1000: mov r0, 0x103a
0x1007: mov r1, 0x1edb
0x100e: mov r3, r0
0x1012: mov r5, 0x174cf42
0x1019: mem2reg r4, r3
0x101c: xor r4, r4, r5
0x1021: reg2mem r3, r4
0x1024: add r3, r3, 0x4
0x102c: cmp r1, r3
0x1030: jns 0x1019
0x1035: jmp 0x103a
This function, right at the beginning is interesting. Starting at address 0x103a, it reads 4-bytes from memory, XORs them against a constant bitmask, and writes them back to the same spot. It doesn't step until it hits 0x1edb. The purpose of this code is easily apparent: it's decrypting the rest of the bytecode! This is solid proof that the decryption hypothesis was correct. After the decryption, it branches to 0x103a. which itself just goes to 0x1bc0.
(Before going there, though, 0x103f has a bunch of I/O instructions, each one for a specific I/O function in the VM. It's helpful, but not strictly necessary, to give the ones we recognize names. In general, it's a good idea to break up blocks of code separated by RET instructions into separate functions.)
0x1bc0 appears to be the main function of the bytecode program. It starts out with a call to 0x1a6d, which is just this code sequence:
0x1a6d: mov r0, 0x50
0x1a74: call 0x103f
0x1a79: mov r0, 0x6c
0x1a80: call 0x103f
0x1a85: mov r0, 0x65
0x1a8c: call 0x103f
0x1a91: mov r0, 0x61
0x1a98: call 0x103f
0x1a9d: mov r0, 0x73
0x1aa4: call 0x103f
0x1aa9: mov r0, 0x65
0x1ab0: call 0x103f
0x1ab5: mov r0, 0x20
0x1abc: call 0x103f
0x1ac1: mov r0, 0x65
0x1ac8: call 0x103f
0x1acd: mov r0, 0x6e
0x1ad4: call 0x103f
0x1ad9: mov r0, 0x74
0x1ae0: call 0x103f
0x1ae5: mov r0, 0x65
0x1aec: call 0x103f
0x1af1: mov r0, 0x72
0x1af8: call 0x103f
0x1afd: mov r0, 0x20
0x1b04: call 0x103f
0x1b09: mov r0, 0x79
0x1b10: call 0x103f
0x1b15: mov r0, 0x6f
0x1b1c: call 0x103f
0x1b21: mov r0, 0x75
0x1b28: call 0x103f
0x1b2d: mov r0, 0x72
0x1b34: call 0x103f
0x1b39: mov r0, 0x20
0x1b40: call 0x103f
0x1b45: mov r0, 0x70
0x1b4c: call 0x103f
0x1b51: mov r0, 0x61
0x1b58: call 0x103f
0x1b5d: mov r0, 0x73
0x1b64: call 0x103f
0x1b69: mov r0, 0x73
0x1b70: call 0x103f
0x1b75: mov r0, 0x77
0x1b7c: call 0x103f
0x1b81: mov r0, 0x6f
0x1b88: call 0x103f
0x1b8d: mov r0, 0x72
0x1b94: call 0x103f
0x1b99: mov r0, 0x64
0x1ba0: call 0x103f
0x1ba5: mov r0, 0x3a
0x1bac: call 0x103f
0x1bb1: mov r0, 0x20
0x1bb8: call 0x103f
0x1bbd: ret
0x1bbe: nop
0x1bbf: nop
It's a series of instructions all doing the same thing: loading a byte into r0 and calling 0x103f. 0x103f contains the call to print_char using the I/O instruction, so what's getting loaded into r0 is then printed. If we treat all of these bytes as ASCII characters and try to reconstruct the message, it turns out to be "Please enter your password: ", meaning 0x1a6d prints the password prompt.
Once control returns to main, main loads 0x04 into r0, 0x1e into r1, and calls 0x1080. What does 0x1080 do?
0x1080: push r30
0x1083: push r29
0x1086: mov r30, r1
0x108a: imul r0, r30, r0
0x108f: add r0, r0, 0x8
0x1097: mov r29, r0
0x109b: io 17
0x109d: test r0, r0
0x10a1: jz 0x10c2
0x10a6: mov r4, r29
0x10aa: sar r4, r4, 0x3
0x10b2: reg2mem r0, r4
0x10b5: add r0, r0, 0x8
0x10bd: pop r29
0x10bf: pop r30
0x10c1: ret
0x10c2: nop
It multiplies r1 (which is 0x1e) by r0 (which is 0x04), adds 8 to it, and stores it in r0. That value becomes an argument to I/O function 0x11. The function then ensures that the I/O function didn't return 0 and writes the block size to the beginning of the returned memory. Finally, it increments the address by 8 and returns it. Although I didn't know for sure what 0x1080 was doing, it reminded me a lot of malloc. It makes a request to the VM which returns a memory address, and then writes some extra data to the beginning of the buffer (which the user does not see), like heap block headers. I assumed it was something like malloc(block_size, num_blocks), which turned out to be correct. You can also confirm this by looking at Baleful, but aren't we tired of that by this point?
So main is allocating an array of 0x1e 4-byte entries. We don't know what it's for at this point. The returned buffer is saved in r11. main then loads 0 into r9 and jumps to 0x1d66.
0x1d5e: add r9, r9, 0x1
0x1d66: mov r30, r9
0x1d6a: mov r0, 0x1e
0x1d71: cmp r30, r0
0x1d75: js 0x1bfb
0x1d7a: call 0x104b
0x1d7f: mov r1, r0
0x1d83: mov r30, r1
0x1d87: mov r0, 0xa
0x1d8e: cmp r30, r0
0x1d92: jnz 0x1d9c
0x1d97: jmp 0x1ebe
We actually start out in the second instruction of an existing function, which is interesting. Anyway, it first checks r9 to make sure it hasn't exceeded 0x1e. If it hasn't exceeded 0x1e, it jumps to 0x1bfb, which we'll look at in a second. If it has exceeded 0x1e, it makes a call to 0x104b, which calls input_char using the I/O instruction. Then it checks the value of the character, and if it's anything besides 0xa (newline, indicating end of input), it prints "Sorry, wrong password!" and terminates. What this appears to be is a password length check. 0x1e is likely the password length, and if we enter more than 0x1e characters (not counting newline), the password is automatically wrong.
That's only if we fail the check, though. If the check succeeds, the function jumps to 0x1bfb, which is shown below.
0x1bfb: mov r30, r9
0x1bff: imul r30, r30, 0x4
0x1c07: mov r0, r11
0x1c0b: add r30, r30, r0
0x1c10: mov r10, r30
0x1c14: call 0x104b
0x1c19: mov r1, r0
0x1c1d: reg2mem r10, r1
0x1c20: mem2reg r1, r10
0x1c23: mov r30, r1
0x1c27: mov r0, 0xa
0x1c2e: cmp r30, r0
0x1c32: jz 0x1c3c
0x1c37: jmp 0x1d5e
Recall that r9 contains 0 at this point, and r11 has the 0x1e*4-byte array we allocated. It uses r9 as an index into the array and puts that in r10. Then it makes a call to 0x104b, which we already know reads a character from stdin. This character is returned in r0, and gets written to the array spot that we put in r10. So the buffer we allocated is being used to store the password that the user entered. After storing it in the buffer, it once again checks if the character is a newline. In this case, receiving a newline means that the password is less than 0x1e characters, further supporting the idea that 0x1e is the password length. Finally, it goes back to 0x1d5e, which increments r9 by one and repeats the loop.
We've now identified the loop which reads the password from the user. Now what happens after reading it? Assuming the length is correct, control proceeds to 0x1ebe:
0x1ebe: mov r0, r11
0x1ec2: call 0x12a9
Which simply puts the password buffer in r0 and calls 0x12a9. Since we already have the password, the only thing left to do is check it. That's probably what 0x12a9 is for.
0x12a9: push r9
0x12ac: push r10
0x12af: mov r10, r0 # r10 = Password buffer
0x12b3: mov r1, 0x1e
0x12ba: mov r0, 0x4
0x12c1: call 0x1080
0x12c6: mov r9, r0 # r9 = Secondary password buffer?
0x12ca: mov r1, 0x4
0x12d1: mov r0, 0x4
0x12d8: call 0x1080
0x12dd: mov r5, r0 # r5 = r0 = Unknown 16-byte buffer
0x12a9 starts out by allocating two new buffers. Both have an unknown purpose, though one of them is the same length as the password buffer, so it's probably related. Afterwards, we just have a bunch of code which fills the buffers up. It's a huge amount of code, so I'm not listing it all here. I'll just put some C code that shows the buffer contents:
u32 buf1[30] = {0x8d, 0x6f, 0x00, 0x24, 0x98, 0x7c, 0x10, 0x10, 0x9c, 0x60, 0x07, 0x10, 0x8b, 0x63, 0x10, 0x9c, 0x60, 0x07, 0x10, 0x85, 0x61, 0x11, 0x3c, 0xa2, 0x61, 0x0b, 0x10, 0x90, 0x77};
u32 buf2[4] = {0xfd, 0x0e, 0x63, 0x4f};
After the buffers are filled up, we finally get to the meat of the function, which actually checks the password. Both buffers are involved in an interesting way.
true_check:
0x17b5: mov r3, 0x0
0x17bc: jmp 0x17c6
0x17c1: jmp 0x1878
0x17c6: mov r1, 0x0
0x17cd: mov r2, 0x0
0x17d4: jmp 0x1860
actual_check:
0x17d9: mov r30, r2
0x17dd: imul r30, r30, 0x4
0x17e5: mov r0, r10 # r0 = Password buffer
0x17e9: add r30, r30, r0
0x17ee: mov r3, r30
0x17f2: mem2reg r4, r3 # r4 = password[index]
0x17f5: idiv r0, r3, r2, 4 # r3 = index % 4
0x17fd: nop
0x17fe: mov r30, r3
0x1802: imul r30, r30, 0x4
0x180a: mov r0, r5 # r0 = 16-byte buffer
0x180e: add r30, r30, r0
0x1813: mov r3, r30
0x1817: mem2reg r3, r3 # r3 = buf2[index % 4]
0x181a: xor r4, r4, r3 # r4 = password[index] ^ buf2[index % 4]
0x181f: mov r30, r2
0x1823: imul r30, r30, 0x4
0x182b: mov r0, r9 # r0 = Other buffer
0x182f: add r30, r30, r0
0x1834: mov r3, r30
0x1838: mem2reg r3, r3 # r3 = buf1[index]
0x183b: mov r30, r4
0x183f: mov r0, r3
0x1843: cmp r30, r0 # password[index] ^ buf2[index % 4] == buf1[index]
0x1847: jnz 0x1851
0x184c: jmp 0x1858
0x1851: mov r1, 0x1
0x1858: add r2, r2, 0x1
0x1860: mov r3, r1
0x1864: mov r30, r2
0x1868: mov r0, 0x1e
0x186f: cmp r30, r0
0x1873: js 0x17d9
end_of_check:
0x1878: test r3, r3
0x187c: jnz 0x1952 (check_failed)
r1, r2, and r3 are set to 0 before jumping to 0x1860. It seems r2 is being used to keep track of the current index, since once it gets past 0x1e, the check ends. r3 seems to be a status indicator as to whether the check succeeded. If it's anything put 0, it prints the "Sorry, wrong password!" message, but if it is 0, it prints "Congratulations!" Until r2 hits 0x1e, it's looping through 0x17d9, which has to be what's actually checking the contents of the password.
It uses the index to grab the current character of the password and put it in r4. Then it does something weird: idiv r0, r3, r2, 4. idiv is one of the more complicated instructions done by the VM. It has two different destination registers, one for the quotient of division and one for the remainder. r0, the quotient, gets trashed immediately afterwards, so the program only cares about the remainder in r3. r3 would then equal index % 4.
Why is it computing the index modulo 4? Immediately after this division, it uses index % 4 as an index into that 16-byte buffer. Remember that the buffer contained 4 entries, so this remainder division is making sure we don't go past that buffer. In essence, it rotates through that 16-byte buffer depending on our index in the password. What's read from that buffer is stored in r3.
Finally, it XORs the character we entered with the value read from the 16-byte buffer. It then compares it against the corresponding byte in the other buffer, the one that's the same size as the password buffer. The loop continues either way, but if they're not equal, it ends up putting 1 in r3. This, as we found above, causes the check to ultimately fail and print the "Sorry, wrong password!" message. All the checks have to succeed for the "Congratulations!" message to print.
We now have everything we need to reconstruct the password. buf1 contains the password character we need, XORed with the correct mask for that character's index. XORs can be easily inverted by XORing again with the same mask, so we just need to do that for every character in buf1 and we'll get back the original password. Once this is done, we end up getting the flag packers_and_vms_and_xors_oh_my!
This probably concludes the picoCTF 2014 writeups I'll be doing on this blog. Thanks to the picoCTF team for an amazing set of challenges, especially Baleful, which was probably my favorite of them all. You guys are amazing, and I can't wait to play picoCTF again next year!