Thursday, December 11, 2014

picoCTF 2014: Baleful (re200) Part 2

Welcome to the second part of the Baleful writeup. In the first part, we started reverse engineering Baleful and figuring out how it worked. We eventually deduced that it was a VM that ran an embedded byte code program. By stepping through the VM in GDB, we were able to learn how the registers and stack worked in the VM. Now that we have this knowledge, it's easy to decipher the rest of the instructions. Let's get to it!

We know that the stack pointer is located in [ebp-0x38], so other instructions using that would also be doing something with the stack. Opcodes 0x1e and 0x1f both do, meaning it'd be good to take a look at them. First let's see 0x1e:

 .text:08049B92 loc_8049B92:              ; CODE XREF: sub_804898B+C0 j  
 .text:08049B92                     ; DATA XREF: .rodata:vm_instrs o  
 .text:08049B92         mov   eax, [ebp+ipos] ; jumptable 08048A4B case 30  
 .text:08049B95         add   eax, 1  
 .text:08049B98         movzx  eax, byte_804C0C0[eax]  
 .text:08049B9F         movsx  eax, al  
 .text:08049BA2         mov   [ebp+var_C], eax  
 .text:08049BA5         cmp   [ebp+var_C], 0  
 .text:08049BA9         jz   short loc_8049BC1  
 .text:08049BAB         mov   eax, [ebp+ipos]  
 .text:08049BAE         add   eax, 2  
 .text:08049BB1         add   eax, offset byte_804C0C0  
 .text:08049BB6         mov   eax, [eax]  
 .text:08049BB8         mov   [ebp+var_24], eax  
 .text:08049BBB         add   [ebp+ipos], 6  
 .text:08049BBF         jmp   short loc_8049BDF  
 .text:08049BC1 ; ---------------------------------------------------------------------------  
 .text:08049BC1  
 .text:08049BC1 loc_8049BC1:              ; CODE XREF: sub_804898B+121E j  
 .text:08049BC1         mov   eax, [ebp+ipos]  
 .text:08049BC4         add   eax, 2  
 .text:08049BC7         movzx  eax, byte_804C0C0[eax]  
 .text:08049BCE         movsx  eax, al  
 .text:08049BD1         mov   eax, [ebp+eax*4+regs]  
 .text:08049BD8         mov   [ebp+var_24], eax  
 .text:08049BDB         add   [ebp+ipos], 3  
 .text:08049BDF  
 .text:08049BDF loc_8049BDF:              ; CODE XREF: sub_804898B+1234 j  
 .text:08049BDF         mov   eax, [ebp+stack]  
 .text:08049BE2         sub   eax, 4  
 .text:08049BE5         mov   [ebp+stack], eax  
 .text:08049BE8         mov   eax, [ebp+stack]  
 .text:08049BEB         lea   edx, byte_804C0C0[eax]  
 .text:08049BF1         mov   eax, [ebp+var_24]  
 .text:08049BF4         mov   [edx], eax  
 .text:08049BF6         jmp   short loc_8049C67  

This function reads the second byte of the instruction, and takes one of two paths depending on the value. If it's 0, it goes to 0x08049bc1, otherwise, it ends up branching to 0x08049bdf. Both cases are nearly identical: the stack pointer is decremented by 4 and a value gets written onto the stack. This is quite clearly a PUSH instruction. But why two different cases? Well, case 0 loads the value from a register specified in the instruction, while case 1 puts a constant on the stack. This PUSH instruction can use either a register or a constant as a data source.

Now let's look at 0x1f, which is much simpler:

 .text:08049BF8 loc_8049BF8:              ; CODE XREF: sub_804898B+C0 j  
 .text:08049BF8                     ; DATA XREF: .rodata:vm_instrs o  
 .text:08049BF8         mov   eax, [ebp+ipos] ; jumptable 08048A4B case 31  
 .text:08049BFB         add   eax, 1  
 .text:08049BFE         movzx  eax, byte_804C0C0[eax]  
 .text:08049C05         movsx  eax, al  
 .text:08049C08         mov   [ebp+var_24], eax  
 .text:08049C0B         mov   eax, [ebp+stack]  
 .text:08049C0E         add   eax, offset byte_804C0C0  
 .text:08049C13         mov   edx, [eax]  
 .text:08049C15         mov   eax, [ebp+var_24]  
 .text:08049C18         mov   [ebp+eax*4+regs], edx  
 .text:08049C1F         mov   eax, [ebp+stack]  
 .text:08049C22         add   eax, 4  
 .text:08049C25         mov   [ebp+stack], eax  
 .text:08049C28         add   [ebp+ipos], 2  
 .text:08049C2C         jmp   short loc_8049C67  

The second byte of the instruction is loaded into EAX and stored temporarily in [ebp-0x24]. Then we read a 4-byte value from the current position of the stack into EDX. Next, we use the value stored in [ebp-0x24] as a register number. This register number determines which register we write EDX into. Finally, the stack pointer is incremented by 4-bytes. This is clearly a POP instruction. We specify a destination register, and this instruction puts the current stack value inside it before incrementing the stack pointer.

What instruction should we look at next? It might be a good idea to take a second look at MOV. Originally, it looked far too intimidating to statically analyze, so we just stepped through it in GDB. Now we know more about the VM, and can probably figure the rest out.

 .text:08049A02 mov_8049A02:              ; CODE XREF: sub_804898B+C0 j  
 .text:08049A02                     ; DATA XREF: .rodata:vm_instrs o  
 .text:08049A02         mov   eax, [ebp+ipos] ; jumptable 08048A4B case 24  
 .text:08049A05         add   eax, 1  
 .text:08049A08         movzx  eax, byte_804C0C0[eax]  
 .text:08049A0F         movsx  eax, al  
 .text:08049A12         mov   [ebp+var_C], eax  
 .text:08049A15         mov   eax, [ebp+var_C]  
 .text:08049A18         test  eax, eax  
 .text:08049A1A         jz   short loc_8049A23  
 .text:08049A1C         cmp   eax, 1  
 .text:08049A1F         jz   short loc_8049A57  
 .text:08049A21         jmp   short loc_8049A81  
 .text:08049A23 ; ---------------------------------------------------------------------------  
 .text:08049A23  
 .text:08049A23 loc_8049A23:              ; CODE XREF: sub_804898B+108F j  
 .text:08049A23         mov   eax, [ebp+ipos]  
 .text:08049A26         add   eax, 2  
 .text:08049A29         movzx  eax, byte_804C0C0[eax]  
 .text:08049A30         movsx  eax, al  
 .text:08049A33         mov   edx, [ebp+ipos]  
 .text:08049A36         add   edx, 3  
 .text:08049A39         movzx  edx, byte_804C0C0[edx]  
 .text:08049A40         movsx  edx, dl  
 .text:08049A43         mov   edx, [ebp+edx*4+regs]  
 .text:08049A4A         mov   [ebp+eax*4+regs], edx  
 .text:08049A51         add   [ebp+ipos], 4  
 .text:08049A55         jmp   short loc_8049A81  
 .text:08049A57 ; ---------------------------------------------------------------------------  
 .text:08049A57  
 .text:08049A57 loc_8049A57:              ; CODE XREF: sub_804898B+1094 j  
 .text:08049A57         mov   eax, [ebp+ipos]  
 .text:08049A5A         add   eax, 2  
 .text:08049A5D         movzx  eax, byte_804C0C0[eax]  
 .text:08049A64         movsx  eax, al  
 .text:08049A67         mov   edx, [ebp+ipos]  
 .text:08049A6A         add   edx, 3  
 .text:08049A6D         add   edx, offset byte_804C0C0  
 .text:08049A73         mov   edx, [edx]  
 .text:08049A75         mov   [ebp+eax*4+regs], edx  
 .text:08049A7C         add   [ebp+ipos], 7  
 .text:08049A80         nop  
 .text:08049A81  
 .text:08049A81 loc_8049A81:              ; CODE XREF: sub_804898B+1096 j  
 .text:08049A81                     ; sub_804898B+10CA j  
 .text:08049A81         jmp   loc_8049C67  

Just like before, it's taking the second byte and using it to determine which case to enter. Let's see what each case does. Case 0 reads the third and fourth bytes of the instruction, then uses them as register numbers. It takes the value of the register specified by the fourth byte and puts it in the register specified by the third byte. Looks like case 0 is a register-register MOV. What about case 1? It also uses the third byte as a destination register, but it instead puts a constant (included in the instruction into that register). Similarly to PUSH, it has two cases: one for a register source and one for a constant source. The destination is always a register.

We know that 0x0f is the CALL instruction. But there are also several other branching instructions we ought to take a look at.

 .text:080496D1 loc_80496D1:              ; CODE XREF: sub_804898B+C0 j  
 .text:080496D1                     ; DATA XREF: .rodata:vm_instrs o  
 .text:080496D1         mov   eax, [ebp+ipos] ; jumptable 08048A4B case 14  
 .text:080496D4         add   eax, 1  
 .text:080496D7         add   eax, offset byte_804C0C0  
 .text:080496DC         mov   eax, [eax]  
 .text:080496DE         mov   [ebp+var_10], eax  
 .text:080496E1         mov   eax, [ebp+var_10]  
 .text:080496E4         mov   [ebp+ipos], eax  
 .text:080496E7         jmp   loc_8049C67  

This reads a new instruction pointer from the opcode data and puts it in ipos. However, the stack is not involved, meaning this is just a normal, unconditional jump. What other kinds of jumps are there?

 .text:080496EC loc_80496EC:              ; CODE XREF: sub_804898B+C0 j  
 .text:080496EC                     ; DATA XREF: .rodata:vm_instrs o  
 .text:080496EC         mov   eax, [ebp+ipos] ; jumptable 08048A4B case 16  
 .text:080496EF         add   eax, 1  
 .text:080496F2         add   eax, offset byte_804C0C0  
 .text:080496F7         mov   eax, [eax]  
 .text:080496F9         mov   [ebp+var_10], eax  
 .text:080496FC         cmp   [ebp+var_28], 0  
 .text:08049700         jz   short loc_804970A  
 .text:08049702         mov   eax, [ebp+ipos]  
 .text:08049705         add   eax, 5  
 .text:08049708         jmp   short loc_804970D  
 .text:0804970A ; ---------------------------------------------------------------------------  
 .text:0804970A  
 .text:0804970A loc_804970A:              ; CODE XREF: sub_804898B+D75 j  
 .text:0804970A         mov   eax, [ebp+var_10]  
 .text:0804970D  
 .text:0804970D loc_804970D:              ; CODE XREF: sub_804898B+D7D j  
 .text:0804970D         mov   [ebp+ipos], eax  
 .text:08049710         jmp   loc_8049C67

This does almost the same thing as JMP, but only modifies ipos if [ebp-0x28] is 0. Recall that on x86, the jz instruction is equivalent to jump-if-equal-to. It subtracts the two comparison operands, and if the result is 0 (indicating equality), it jumps to them. What's here is very similar, so maybe [ebp-0x28] holds the result of that subtraction. That would make it a condition register. If we look further down, many opcodes also check the condition register in similar ways, but with different types of jumps.

If there are multiple jumps only branching if a comparison holds true, there must be some instructions that perform those comparisons. Both 0x16 and 0x17 modify the condition register, so they're good candidates. We'll take a look at 0x16 first:

 .text:080497E2 loc_80497E2:              ; CODE XREF: sub_804898B+C0 j  
 .text:080497E2                     ; DATA XREF: .rodata:vm_instrs o  
 .text:080497E2         mov   eax, [ebp+ipos] ; jumptable 08048A4B case 22  
 .text:080497E5         add   eax, 1  
 .text:080497E8         movzx  eax, byte_804C0C0[eax]  
 .text:080497EF         movsx  eax, al  
 .text:080497F2         mov   [ebp+var_C], eax  
 .text:080497F5         mov   eax, [ebp+var_C]  
 .text:080497F8         cmp   eax, 1  
 .text:080497FB         jz   short loc_804985B  
 .text:080497FD         cmp   eax, 1  
 .text:08049800         jg   short loc_804980B  
 .text:08049802         test  eax, eax  
 .text:08049804         jz   short loc_804981E  
 .text:08049806         jmp   loc_80498E0  
 .text:0804980B ; ---------------------------------------------------------------------------  
 .text:0804980B  
 .text:0804980B loc_804980B:              ; CODE XREF: sub_804898B+E75 j  
 .text:0804980B         cmp   eax, 2  
 .text:0804980E         jz   short loc_804988B  
 .text:08049810         cmp   eax, 4  
 .text:08049813         jz   loc_80498BB  
 .text:08049819         jmp   loc_80498E0  
 .text:0804981E ; ---------------------------------------------------------------------------  
 .text:0804981E  
 .text:0804981E loc_804981E:              ; CODE XREF: sub_804898B+E79 j  
 .text:0804981E         mov   eax, [ebp+ipos]  
 .text:08049821         add   eax, 2  
 .text:08049824         movzx  eax, byte_804C0C0[eax]  
 .text:0804982B         movsx  eax, al  
 .text:0804982E         mov   eax, [ebp+eax*4+regs]  
 .text:08049835         mov   [ebp+var_20], eax  
 .text:08049838         mov   eax, [ebp+ipos]  
 .text:0804983B         add   eax, 3  
 .text:0804983E         movzx  eax, byte_804C0C0[eax]  
 .text:08049845         movsx  eax, al  
 .text:08049848         mov   eax, [ebp+eax*4+regs]  
 .text:0804984F         mov   [ebp+var_1C], eax  
 .text:08049852         add   [ebp+ipos], 4  
 .text:08049856         jmp   loc_80498E0  
 .text:0804985B ; ---------------------------------------------------------------------------  
 .text:0804985B  
 .text:0804985B loc_804985B:              ; CODE XREF: sub_804898B+E70 j  
 .text:0804985B         mov   eax, [ebp+ipos]  
 .text:0804985E         add   eax, 2  
 .text:08049861         movzx  eax, byte_804C0C0[eax]  
 .text:08049868         movsx  eax, al  
 .text:0804986B         mov   eax, [ebp+eax*4+regs]  
 .text:08049872         mov   [ebp+var_20], eax  
 .text:08049875         mov   eax, [ebp+ipos]  
 .text:08049878         add   eax, 3  
 .text:0804987B         add   eax, offset byte_804C0C0  
 .text:08049880         mov   eax, [eax]  
 .text:08049882         mov   [ebp+var_1C], eax  
 .text:08049885         add   [ebp+ipos], 7  
 .text:08049889         jmp   short loc_80498E0  
 .text:0804988B ; ---------------------------------------------------------------------------  
 .text:0804988B  
 .text:0804988B loc_804988B:              ; CODE XREF: sub_804898B+E83 j  
 .text:0804988B         mov   eax, [ebp+ipos]  
 .text:0804988E         add   eax, 2  
 .text:08049891         add   eax, offset byte_804C0C0  
 .text:08049896         mov   eax, [eax]  
 .text:08049898         mov   [ebp+var_20], eax  
 .text:0804989B         mov   eax, [ebp+ipos]  
 .text:0804989E         add   eax, 6  
 .text:080498A1         movzx  eax, byte_804C0C0[eax]  
 .text:080498A8         movsx  eax, al  
 .text:080498AB         mov   eax, [ebp+eax*4+regs]  
 .text:080498B2         mov   [ebp+var_1C], eax  
 .text:080498B5         add   [ebp+ipos], 7  
 .text:080498B9         jmp   short loc_80498E0  
 .text:080498BB ; ---------------------------------------------------------------------------  
 .text:080498BB  
 .text:080498BB loc_80498BB:              ; CODE XREF: sub_804898B+E88 j  
 .text:080498BB         mov   eax, [ebp+ipos]  
 .text:080498BE         add   eax, 2  
 .text:080498C1         add   eax, offset byte_804C0C0  
 .text:080498C6         mov   eax, [eax]  
 .text:080498C8         mov   [ebp+var_20], eax  
 .text:080498CB         mov   eax, [ebp+ipos]  
 .text:080498CE         add   eax, 6  
 .text:080498D1         add   eax, offset byte_804C0C0  
 .text:080498D6         mov   eax, [eax]  
 .text:080498D8         mov   [ebp+var_1C], eax  
 .text:080498DB         add   [ebp+ipos], 0Ah  
 .text:080498DF         nop  
 .text:080498E0  
 .text:080498E0 loc_80498E0:              ; CODE XREF: sub_804898B+E7B j  
 .text:080498E0                     ; sub_804898B+E8E j ...  
 .text:080498E0         mov   eax, [ebp+var_1C]  
 .text:080498E3         mov   edx, [ebp+var_20]  
 .text:080498E6         and   eax, edx  
 .text:080498E8         mov   [ebp+cr], eax  
 .text:080498EB         jmp   loc_8049C67  

It looks like an incredibly complex instruction to begin with. There are multiple code paths depending on the second byte, but now even more than ever. However, we do know that all code paths ultimately lead to 0x080498e0. That local routine does a bitwise AND of two arguments and puts the result in CR. This is exactly what the x86 TEST instruction does, so we're probably looking at the same thing. As for all the cases, it looks daunting but just requires careful investigation. The cases just specify the format of the two source operands:
  • 0 - Register-register
  • 1 - Register-constant
  • 2 - Constant-register
  • 4 - Constant-constant
Knowing how that big set of cases works makes practically every remaining function easy to reverse engineer. 0x17 is pretty much the same, but it just performs a subtraction at the end rather than AND, which is what CMP does. Almost all of the arithmetic instructions also work like this. IDIV, the division instruction, has a bit of weirdness, though. It needs two destination registers, one for the quotient and one for the remainder. This initially tripped me up a little.

There are, however, two other weird instructions, 0x1b and 0x1c. They don't immediately look like anything in the x86 instruction set, so we'll have to investigate them more closely.

 .text:08049AEC loc_8049AEC:              ; CODE XREF: sub_804898B+C0 j  
 .text:08049AEC                     ; DATA XREF: .rodata:vm_instrs o  
 .text:08049AEC         mov   eax, [ebp+ipos] ; jumptable 08048A4B case 27  
 .text:08049AEF         add   eax, 1  
 .text:08049AF2         movzx  eax, byte_804C0C0[eax]  
 .text:08049AF9         movsx  eax, al  
 .text:08049AFC         mov   [ebp+var_24], eax  
 .text:08049AFF         mov   eax, [ebp+ipos]  
 .text:08049B02         add   eax, 2  
 .text:08049B05         movzx  eax, byte_804C0C0[eax]  
 .text:08049B0C         movsx  eax, al  
 .text:08049B0F         mov   [ebp+var_20], eax  
 .text:08049B12         add   [ebp+ipos], 3  
 .text:08049B16         mov   eax, [ebp+var_20]  
 .text:08049B19         mov   eax, [ebp+eax*4+regs]  
 .text:08049B20         add   eax, offset byte_804C0C0  
 .text:08049B25         mov   edx, [eax]  
 .text:08049B27         mov   eax, [ebp+var_24]  
 .text:08049B2A         mov   [ebp+eax*4+regs], edx  
 .text:08049B31         mov   eax, [ebp+var_24]  
 .text:08049B34         mov   eax, [ebp+eax*4+regs]  
 .text:08049B3B         mov   [ebp+cr], eax  
 .text:08049B3E         jmp   loc_8049C67  
 .text:08049B43 ; ---------------------------------------------------------------------------  
 .text:08049B43  
 .text:08049B43 loc_8049B43:              ; CODE XREF: sub_804898B+C0 j  
 .text:08049B43                     ; DATA XREF: .rodata:vm_instrs o  
 .text:08049B43         mov   eax, [ebp+ipos] ; jumptable 08048A4B case 28  
 .text:08049B46         add   eax, 1  
 .text:08049B49         movzx  eax, byte_804C0C0[eax]  
 .text:08049B50         movsx  eax, al  
 .text:08049B53         mov   [ebp+var_24], eax  
 .text:08049B56         mov   eax, [ebp+ipos]  
 .text:08049B59         add   eax, 2  
 .text:08049B5C         movzx  eax, byte_804C0C0[eax]  
 .text:08049B63         movsx  eax, al  
 .text:08049B66         mov   [ebp+var_20], eax  
 .text:08049B69         add   [ebp+ipos], 3  
 .text:08049B6D         mov   eax, [ebp+var_24]  
 .text:08049B70         mov   eax, [ebp+eax*4+regs]  
 .text:08049B77         add   eax, offset byte_804C0C0  
 .text:08049B7C         mov   edx, [ebp+var_20]  
 .text:08049B7F         mov   edx, [ebp+edx*4+regs]  
 .text:08049B86         mov   [eax], edx  
 .text:08049B88         mov   eax, [eax]  
 .text:08049B8A         mov   [ebp+cr], eax  
 .text:08049B8D         jmp   loc_8049C67  

Both are similar, let's approach 0x1b first. The second and third bytes are both used for register accesses. It reads the data of the register specified by the third byte. This data is then used as an offset into the bytecode area, and the data at that offset is read. Finally, it writes the data into the register specified by the second byte. It might not be obvious at first, but this is just a memory read instruction. It reads memory from the address specified in the source register, and puts that data in the destination register. 0x1c does the same thing in reverse. I decided to call them mem2reg and reg2mem, respectively. They're really just RISC-like load and store instructions.

We could continue and reverse engineer all of the instructions, but it would get really repetitive. The process is basically the same for all of the VM instructions. If we manage to reverse engineer all 33 instructions, we get this list:
  • 0x00: nop
  • 0x01: ret
  • 0x02: add (3 arguments; destination and two source)
  • 0x03: sub (3 arguments; destination and two source)
  • 0x04: imul (3 arguments; destination and two source)
  • 0x05: idiv (4 arguments; quotient/remainder destinations and two source)
  • 0x06: xor (3 arguments; destination and two source)
  • 0x07: neg (2 arguments; destination and source)
  • 0x08: not (2 arguments; destination and source)
  • 0x09: and (3 arguments; destination and two source)
  • 0x0a: or (3 arguments; destination and two source)
  • 0x0b: logicnot (2 arguments; destination and source)
  • 0x0c: shl (3 arguments; destination and two source)
  • 0x0d: sar (3 arguments; destination and two source)
  • 0x0e: jmp (1 argument; offset to go to)
  • 0x0f: call (1 argument; offset to go to)
  • 0x10: jz (1 argument; offset to go to)
  • 0x11: js (1 argument; offset to go to)
  • 0x12: jle (1 argument; offset to go to)
  • 0x13: jg (1 argument; offset to go to)
  • 0x14: jns (1 argument; offset to go to)
  • 0x15: jnz (1 argument; offset to go to)
  • 0x16: test (2 arguments; two operands)
  • 0x17: cmp (2 arguments; two operands)
  • 0x18: mov (2 arguments; destination and source)
  • 0x19: inc (1 argument; is both destination and source)
  • 0x1a: dec (1 argument; is both destination and source)
  • 0x1b: mem2reg (2 arguments; destination and source registers)
  • 0x1c: reg2mem (2 arguments; destination and source registers)
  • 0x1d: hlt
  • 0x1e: push (1 argument; source)
  • 0x1f: pop (1 argument; destination register)
  • 0x20: io (1 argument; function code)
With all these instructions figured out, we can write a short Python script that disassembles the Baleful bytecode. I won't bother documenting that here; writing the disassembler is a fairly easy and boring task. Something more interesting, though, is how we get access to the bytecode in the first place.

Remember back in part 1, when I suspected that the bytecode was being encrypted or packed? I said this because the data looked random, and taking a new look at the VM with our recent discoveries in mind reveals that it's not valid bytecode at all. In fact, we can prove the hypothesis correct by comparing what I dumped from 0x0804db39 in memory to what's in the binary:

 (gdb) x/16xb 0x0804db39                        
  0x804db39:  0x18 0x01 0x00 0x6c 0x00 0x00 0x00 0x0f           
  0x804db41:  0x3f 0x10 0x00 0x00 0x18 0x01 0x00 0x65           
  (gdb)  

 .data:0804DB39         db 19h  
 .data:0804DB3A         db 43h ; C  
 .data:0804DB3B         db 0CFh ; -  
 .data:0804DB3C         db 18h  
 .data:0804DB3D         db  1  
 .data:0804DB3E         db 42h ; B  
 .data:0804DB3F         db 0CFh ; -  
 .data:0804DB40         db 7Bh ; {  
 .data:0804DB41         db 3Eh ; >  
 .data:0804DB42         db 52h ; R  
 .data:0804DB43         db 0CFh ; -  
 .data:0804DB44         db 74h ; t  
 .data:0804DB45         db 19h  
 .data:0804DB46         db 43h ; C  
 .data:0804DB47         db 0CFh ; -  
 .data:0804DB48         db 11h  

They're not even remotely similar. Something is definitely happening to the bytecode. It's likely that what's in memory at the time is valid, so let's just dump that to a file from GDB:

 (gdb) dump memory memdump.bin 0x0804c0c0 0x0804e000  

If we look at memdump.bin, it's much different. Rather than random garbage, we get something that appears to be proper bytecode. If we try disassembling it, it results in something sane, which is also a good sign. Now for all our reverse engineering efforts, we're rewarded with...another binary to reverse engineer. Oh well, c'est la vie (not really).

The ASM code is interesting. It actually has many elements of RISC architectures, like the larger number of registers and dedicated memory access instructions. The ASM generated is quite big, though not as big is Baleful. Let's start at the beginning of the bytecode.

 0x1000: mov r0, 0x103a  
 0x1007: mov r1, 0x1edb  
 0x100e: mov r3, r0  
 0x1012: mov r5, 0x174cf42  
 0x1019: mem2reg r4, r3  
 0x101c: xor r4, r4, r5  
 0x1021: reg2mem r3, r4  
 0x1024: add r3, r3, 0x4  
 0x102c: cmp r1, r3  
 0x1030: jns 0x1019  
 0x1035: jmp 0x103a  

This function, right at the beginning is interesting. Starting at address 0x103a, it reads 4-bytes from memory, XORs them against a constant bitmask, and writes them back to the same spot. It doesn't step until it hits 0x1edb. The purpose of this code is easily apparent: it's decrypting the rest of the bytecode! This is solid proof that the decryption hypothesis was correct. After the decryption, it branches to 0x103a. which itself just goes to 0x1bc0.

(Before going there, though, 0x103f has a bunch of I/O instructions, each one for a specific I/O function in the VM. It's helpful, but not strictly necessary, to give the ones we recognize names. In general, it's a good idea to break up blocks of code separated by RET instructions into separate functions.)

0x1bc0 appears to be the main function of the bytecode program. It starts out with a call to 0x1a6d, which is just this code sequence:

 0x1a6d: mov r0, 0x50  
 0x1a74: call 0x103f  
 0x1a79: mov r0, 0x6c  
 0x1a80: call 0x103f  
 0x1a85: mov r0, 0x65  
 0x1a8c: call 0x103f  
 0x1a91: mov r0, 0x61  
 0x1a98: call 0x103f  
 0x1a9d: mov r0, 0x73  
 0x1aa4: call 0x103f  
 0x1aa9: mov r0, 0x65  
 0x1ab0: call 0x103f  
 0x1ab5: mov r0, 0x20  
 0x1abc: call 0x103f  
 0x1ac1: mov r0, 0x65  
 0x1ac8: call 0x103f  
 0x1acd: mov r0, 0x6e  
 0x1ad4: call 0x103f  
 0x1ad9: mov r0, 0x74  
 0x1ae0: call 0x103f  
 0x1ae5: mov r0, 0x65  
 0x1aec: call 0x103f  
 0x1af1: mov r0, 0x72  
 0x1af8: call 0x103f  
 0x1afd: mov r0, 0x20  
 0x1b04: call 0x103f  
 0x1b09: mov r0, 0x79  
 0x1b10: call 0x103f  
 0x1b15: mov r0, 0x6f  
 0x1b1c: call 0x103f  
 0x1b21: mov r0, 0x75  
 0x1b28: call 0x103f  
 0x1b2d: mov r0, 0x72  
 0x1b34: call 0x103f  
 0x1b39: mov r0, 0x20  
 0x1b40: call 0x103f  
 0x1b45: mov r0, 0x70  
 0x1b4c: call 0x103f  
 0x1b51: mov r0, 0x61  
 0x1b58: call 0x103f  
 0x1b5d: mov r0, 0x73  
 0x1b64: call 0x103f  
 0x1b69: mov r0, 0x73  
 0x1b70: call 0x103f  
 0x1b75: mov r0, 0x77  
 0x1b7c: call 0x103f  
 0x1b81: mov r0, 0x6f  
 0x1b88: call 0x103f  
 0x1b8d: mov r0, 0x72  
 0x1b94: call 0x103f  
 0x1b99: mov r0, 0x64  
 0x1ba0: call 0x103f  
 0x1ba5: mov r0, 0x3a  
 0x1bac: call 0x103f  
 0x1bb1: mov r0, 0x20  
 0x1bb8: call 0x103f  
 0x1bbd: ret  
 0x1bbe: nop  
 0x1bbf: nop  

It's a series of instructions all doing the same thing: loading a byte into r0 and calling 0x103f. 0x103f contains the call to print_char using the I/O instruction, so what's getting loaded into r0 is then printed. If we treat all of these bytes as ASCII characters and try to reconstruct the message, it turns out to be "Please enter your password: ", meaning 0x1a6d prints the password prompt.

Once control returns to main, main loads 0x04 into r0, 0x1e into r1, and calls 0x1080. What does 0x1080 do?

 0x1080: push r30  
 0x1083: push r29  
 0x1086: mov r30, r1  
 0x108a: imul r0, r30, r0  
 0x108f: add r0, r0, 0x8  
 0x1097: mov r29, r0  
 0x109b: io 17  
 0x109d: test r0, r0  
 0x10a1: jz 0x10c2  
 0x10a6: mov r4, r29  
 0x10aa: sar r4, r4, 0x3  
 0x10b2: reg2mem r0, r4  
 0x10b5: add r0, r0, 0x8  
 0x10bd: pop r29  
 0x10bf: pop r30  
 0x10c1: ret  
 0x10c2: nop  

It multiplies r1 (which is 0x1e) by r0 (which is 0x04), adds 8 to it, and stores it in r0. That value becomes an argument to I/O function 0x11. The function then ensures that the I/O function didn't return 0 and writes the block size to the beginning of the returned memory. Finally, it increments the address by 8 and returns it. Although I didn't know for sure what 0x1080 was doing, it reminded me a lot of malloc. It makes a request to the VM which returns a memory address, and then writes some extra data to the beginning of the buffer (which the user does not see), like heap block headers. I assumed it was something like malloc(block_size, num_blocks), which turned out to be correct. You can also confirm this by looking at Baleful, but aren't we tired of that by this point?

So main is allocating an array of 0x1e 4-byte entries. We don't know what it's for at this point. The returned buffer is saved in r11. main then loads 0 into r9 and jumps to 0x1d66.

 0x1d5e: add r9, r9, 0x1  
 0x1d66: mov r30, r9  
 0x1d6a: mov r0, 0x1e  
 0x1d71: cmp r30, r0  
 0x1d75: js 0x1bfb  
 0x1d7a: call 0x104b
 0x1d7f: mov  r1, r0
 0x1d83: mov  r30, r1
 0x1d87: mov  r0, 0xa
 0x1d8e: cmp r30, r0
 0x1d92: jnz 0x1d9c
 0x1d97: jmp 0x1ebe  

We actually start out in the second instruction of an existing function, which is interesting. Anyway, it first checks r9 to make sure it hasn't exceeded 0x1e. If it hasn't exceeded 0x1e, it jumps to 0x1bfb, which we'll look at in a second. If it has exceeded 0x1e, it makes a call to 0x104b, which calls input_char using the I/O instruction. Then it checks the value of the character, and if it's anything besides 0xa (newline, indicating end of input), it prints "Sorry, wrong password!" and terminates. What this appears to be is a password length check. 0x1e is likely the password length, and if we enter more than 0x1e characters (not counting newline), the password is automatically wrong.

That's only if we fail the check, though. If the check succeeds, the function jumps to 0x1bfb, which is shown below.

 0x1bfb: mov r30, r9  
 0x1bff: imul r30, r30, 0x4  
 0x1c07: mov r0, r11  
 0x1c0b: add r30, r30, r0  
 0x1c10: mov r10, r30  
 0x1c14: call 0x104b  
 0x1c19: mov r1, r0  
 0x1c1d: reg2mem r10, r1  
 0x1c20: mem2reg r1, r10  
 0x1c23: mov r30, r1  
 0x1c27: mov r0, 0xa  
 0x1c2e: cmp r30, r0  
 0x1c32: jz 0x1c3c  
 0x1c37: jmp 0x1d5e  

Recall that r9 contains 0 at this point, and r11 has the 0x1e*4-byte array we allocated. It uses r9 as an index into the array and puts that in r10. Then it makes a call to 0x104b, which we already know reads a character from stdin. This character is returned in r0, and gets written to the array spot that we put in r10. So the buffer we allocated is being used to store the password that the user entered. After storing it in the buffer, it once again checks if the character is a newline. In this case, receiving a newline means that the password is less than 0x1e characters, further supporting the idea that 0x1e is the password length. Finally, it goes back to 0x1d5e, which increments r9 by one and repeats the loop.

We've now identified the loop which reads the password from the user. Now what happens after reading it? Assuming the length is correct, control proceeds to 0x1ebe:

 0x1ebe: mov r0, r11  
 0x1ec2: call 0x12a9  

Which simply puts the password buffer in r0 and calls 0x12a9. Since we already have the password, the only thing left to do is check it. That's probably what 0x12a9 is for.

 0x12a9: push r9  
 0x12ac: push r10  
 0x12af: mov r10, r0               # r10 = Password buffer  
 0x12b3: mov r1, 0x1e  
 0x12ba: mov r0, 0x4  
 0x12c1: call 0x1080  
 0x12c6: mov r9, r0                # r9 = Secondary password buffer?  
 0x12ca: mov r1, 0x4  
 0x12d1: mov r0, 0x4  
 0x12d8: call 0x1080  
 0x12dd: mov r5, r0                # r5 = r0 = Unknown 16-byte buffer  

0x12a9 starts out by allocating two new buffers. Both have an unknown purpose, though one of them is the same length as the password buffer, so it's probably related. Afterwards, we just have a bunch of code which fills the buffers up. It's a huge amount of code, so I'm not listing it all here. I'll just put some C code that shows the buffer contents:

 u32 buf1[30] = {0x8d, 0x6f, 0x00, 0x24, 0x98, 0x7c, 0x10, 0x10, 0x9c, 0x60, 0x07, 0x10, 0x8b, 0x63, 0x10, 0x9c, 0x60, 0x07, 0x10, 0x85, 0x61, 0x11, 0x3c, 0xa2, 0x61, 0x0b, 0x10, 0x90, 0x77};  
 u32 buf2[4] = {0xfd, 0x0e, 0x63, 0x4f};  

After the buffers are filled up, we finally get to the meat of the function, which actually checks the password. Both buffers are involved in an interesting way.

 true_check:  
 0x17b5: mov r3, 0x0  
 0x17bc: jmp 0x17c6  
 0x17c1: jmp 0x1878  
 0x17c6: mov r1, 0x0  
 0x17cd: mov r2, 0x0  
 0x17d4: jmp 0x1860
  
 actual_check:  
 0x17d9: mov r30, r2  
 0x17dd: imul r30, r30, 0x4  
 0x17e5: mov r0, r10               # r0 = Password buffer  
 0x17e9: add r30, r30, r0  
 0x17ee: mov r3, r30  
 0x17f2: mem2reg r4, r3               # r4 = password[index]  
 0x17f5: idiv r0, r3, r2, 4           # r3 = index % 4  
 0x17fd: nop  
 0x17fe: mov r30, r3  
 0x1802: imul r30, r30, 0x4  
 0x180a: mov r0, r5                    # r0 = 16-byte buffer  
 0x180e: add r30, r30, r0  
 0x1813: mov r3, r30  
 0x1817: mem2reg r3, r3               # r3 = buf2[index % 4]  
 0x181a: xor r4, r4, r3               # r4 = password[index] ^ buf2[index % 4]  
 0x181f: mov r30, r2  
 0x1823: imul r30, r30, 0x4  
 0x182b: mov r0, r9                    # r0 = Other buffer  
 0x182f: add r30, r30, r0  
 0x1834: mov r3, r30  
 0x1838: mem2reg r3, r3               # r3 = buf1[index]  
 0x183b: mov r30, r4  
 0x183f: mov r0, r3  
 0x1843: cmp r30, r0                    # password[index] ^ buf2[index % 4] == buf1[index]  
 0x1847: jnz 0x1851  
 0x184c: jmp 0x1858  
 0x1851: mov r1, 0x1  
 0x1858: add r2, r2, 0x1
  
 0x1860: mov r3, r1  
 0x1864: mov r30, r2  
 0x1868: mov r0, 0x1e  
 0x186f: cmp r30, r0  
 0x1873: js 0x17d9
  
 end_of_check:  
 0x1878: test r3, r3  
 0x187c: jnz 0x1952 (check_failed)  

r1, r2, and r3 are set to 0 before jumping to 0x1860. It seems r2 is being used to keep track of the current index, since once it gets past 0x1e, the check ends. r3 seems to be a status indicator as to whether the check succeeded. If it's anything put 0, it prints the "Sorry, wrong password!" message, but if it is 0, it prints "Congratulations!" Until r2 hits 0x1e, it's looping through 0x17d9, which has to be what's actually checking the contents of the password.

It uses the index to grab the current character of the password and put it in r4. Then it does something weird: idiv r0, r3, r2, 4. idiv is one of the more complicated instructions done by the VM. It has two different destination registers, one for the quotient of division and one for the remainder. r0, the quotient, gets trashed immediately afterwards, so the program only cares about the remainder in r3. r3 would then equal index % 4.

Why is it computing the index modulo 4? Immediately after this division, it uses index % 4 as an index into that 16-byte buffer. Remember that the buffer contained 4 entries, so this remainder division is making sure we don't go past that buffer. In essence, it rotates through that 16-byte buffer depending on our index in the password. What's read from that buffer is stored in r3.

Finally, it XORs the character we entered with the value read from the 16-byte buffer. It then compares it against the corresponding byte in the other buffer, the one that's the same size as the password buffer. The loop continues either way, but if they're not equal, it ends up putting 1 in r3. This, as we found above, causes the check to ultimately fail and print the "Sorry, wrong password!" message. All the checks have to succeed for the "Congratulations!" message to print.

We now have everything we need to reconstruct the password. buf1 contains the password character we need, XORed with the correct mask for that character's index. XORs can be easily inverted by XORing again with the same mask, so we just need to do that for every character in buf1 and we'll get back the original password. Once this is done, we end up getting the flag packers_and_vms_and_xors_oh_my!

This probably concludes the picoCTF 2014 writeups I'll be doing on this blog. Thanks to the picoCTF team for an amazing set of challenges, especially Baleful, which was probably my favorite of them all. You guys are amazing, and I can't wait to play picoCTF again next year!

0 Comments:

Post a Comment

Subscribe to Post Comments [Atom]

<< Home