Friday, December 5, 2014

picoCTF 2014: Baleful (re200) Part 1

Baleful is the last of the five 200-point master challenges, and the final challenge in picoCTF. It gives us very little information to start off with: just a "twisted" binary and instructions to get it to accept a password. Since we're only given a binary, there's definitely a reverse engineering element, and as in most reversing challenges, the password is probably the flag. Let's jump in!

What happens if we execute Baleful? As expected, there's a password prompt which we have to get past:

 pico59150@shell:~$ ./baleful                                         
 Please enter your password: test                                       
 Sorry, wrong password!                                            

The only obvious course of action is disassembling Baleful. Before we try to disassemble the binary, it's a good idea to get some basic information about it. Let's try seeing what sections it has:

 pico59150@shell:~$ readelf -S baleful  
 There are no sections in this file.  

Well, that's certainly odd. This is an ELF file with no sections, yet we can still run it, which seems pretty suspicious. Viewing it in a hex editor turns up a few odd things. There appears to be another ELF header after the normal one, and the string "UPX" appears repeatedly. There are a few other recognizable strings, but not many. One of them, however, is quite revealing:

 Info: This file is packed with the UPX executable packer $  
 $Id: UPX 3.91 Copyright (C) 1996-2013 the UPX Team. All Rights Reserved.  
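Both clues can be checked without a hex editor: the missing section headers and the UPX marker. Below is a minimal Python sketch. It assumes a well-formed 32- or 64-bit ELF header, and it assumes the packer left the "UPX!" magic in the file. Stock UPX does this, but a modified packer might not. The helper name `inspect_elf` is hypothetical.

```python
import struct

def inspect_elf(data: bytes) -> dict:
    """Report the section-header count and any UPX markers in an ELF image."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # e_ident[EI_CLASS] (byte 4): 1 = 32-bit, 2 = 64-bit
    # e_ident[EI_DATA] (byte 5): 1 = little-endian, 2 = big-endian
    is_64 = data[4] == 2
    endian = "<" if data[5] == 1 else ">"
    # e_shnum sits at offset 0x30 in an ELF32 header and 0x3C in an ELF64 header
    shnum_offset = 0x3C if is_64 else 0x30
    (shnum,) = struct.unpack_from(endian + "H", data, shnum_offset)
    return {
        # A count of 0 is what makes `readelf -S` report no sections
        # (ignoring the rare extended-numbering case)
        "section_count": shnum,
        # Assumed UPX magic: stock UPX writes it, a modified packer might not
        "upx_magic": b"UPX!" in data,
        # The banner string quoted above
        "upx_banner": b"packed with the UPX executable packer" in data,
    }
```

Run it as `inspect_elf(open("baleful", "rb").read())`. On a binary like this one, you would expect a section count of 0 and at least one UPX flag set. It only corroborates what the hex editor already showed; `upx -d` is still what does the real work.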

So it appears this file is packed with UPX, a common packer for executables. An executable packer takes a program and compresses it while still allowing it to run normally. The packed file contains some stub code that decompresses the rest of the executable at run time. Malware often uses packers to hinder analysis, but UPX is mainly meant to decrease file size. It provides essentially no obfuscation here, since UPX itself can unpack the file. Let's get UPX and do that:

 pico59150@shell:~$ ./upx -d baleful  
             Ultimate Packer for eXecutables  
              Copyright (C) 1996 - 2013  
 UPX 3.91w    Markus Oberhumer, Laszlo Molnar & John Reiser  Sep 30th 2013  
     File size     Ratio   Format   Name  
   --------------------  ------  -----------  -----------  
   148104 <-   6752  4.56% netbsd/elf386 baleful  
 Unpacked 1 file.  

Now we have Baleful in a form that's much easier to reverse engineer. Load it into your preferred disassembler (I use IDA) and take a look. A natural first step is to search for the messages the program prints, but they don't appear anywhere in the executable. Where could they be, then? A better starting point is to learn how I/O is done in the first place. The PLT (procedure linkage table) contains printf(), fputc(), and fgetc(), and quite a few places in the code reference them.

 .text:0804867C sub_804867C   proc near        ; CODE XREF: sub_804898B+12C9 p  
 .text:0804867C                     ; DATA XREF: .data:off_804C060 o  
 .text:0804867C arg_0      = dword ptr 8  
 .text:0804867C         push  ebp  
 .text:0804867D         mov   ebp, esp  
 .text:0804867F         sub   esp, 18h  
 .text:08048682         mov   edx, ds:stderr  
 .text:08048688         mov   eax, [ebp+arg_0]  
 .text:0804868B         mov   eax, [eax]  
 .text:0804868D         mov   [esp+4], edx  ; stream  
 .text:08048691         mov   [esp], eax   ; c  
 .text:08048694         call  _fputc  
 .text:08048699         mov   eax, ds:stderr  
 .text:0804869E         mov   [esp], eax   ; stream  
 .text:080486A1         call  _fflush  
 .text:080486A6         mov   eax, [ebp+arg_0]  
 .text:080486A9         mov   eax, [eax]  
 .text:080486AB         leave  
 .text:080486AC         retn  
 .text:080486AC sub_804867C   endp  

This function takes a single argument, a pointer to a character (stored as a 4-byte int), and prints that character to stderr. It then calls fflush to make sure the character is actually printed, and returns the character. Let's call this print_char in case we encounter it later. There's an analogous function for character input, which we'll call stdin_getc:

 .text:080486FB sub_80486FB   proc near        ; DATA XREF: .data:0804C070 o  
 .text:080486FB arg_0      = dword ptr 8  
 .text:080486FB         push  ebp  
 .text:080486FC         mov   ebp, esp  
 .text:080486FE         sub   esp, 18h  
 .text:08048701         mov   eax, [ebp+arg_0]  
 .text:08048704         mov   [esp], eax  
 .text:08048707         call  sub_80485F4  
 .text:0804870C         mov   eax, ds:stdin  
 .text:08048711         mov   [esp], eax   ; stream  
 .text:08048714         call  _fgetc  
 .text:08048719         leave  
 .text:0804871A         retn  
 .text:0804871A sub_80486FB   endp  

0x080485F4 is a small function that checks if we've reached EOF in stdin, and raises a signal if we have. We can also find some more I/O functions which don't appear to be used. Here's our final list of all I/O functions:
  • 0x0804867C (print_char) - Prints a single character to stderr
  • 0x080486AD (print_dec) - Prints decimal numbers as strings
  • 0x080486D4 (print_hex) - Prints hexadecimal numbers as strings
  • 0x080487A9 (print_float) - Prints floating-point numbers as strings
  • 0x080486FB (stdin_getc) - Reads a single character from stdin and returns it
  • 0x0804871B (input_dec) - Reads a decimal number from stdin and returns it
  • 0x0804874E (input_hex) - Reads a hexadecimal number from stdin and returns it
  • 0x080487D8 (input_float) - Reads a floating-point number from stdin and returns it
All of these functions deal with basic text I/O. Interestingly enough, they're also all referenced by a table of functions at 0x0804C060. I call it io_ops since all the known functions in it are centered around that purpose:

 .data:0804C060 io_ops     dd offset print_char  ; DATA XREF: sub_804898B+12B9 r  
 .data:0804C064         dd offset print_dec  
 .data:0804C068         dd offset print_hex  
 .data:0804C06C         dd offset print_float  
 .data:0804C070         dd offset stdin_getc  
 .data:0804C074         dd offset input_dec  
 .data:0804C078         dd offset input_hex  
 .data:0804C07C         dd offset input_float  
 .data:0804C080         dd offset sub_8048619  
 .data:0804C084         dd offset sub_8048813  
 .data:0804C088         dd offset sub_8048834  
 .data:0804C08C         dd offset sub_804887B  
 .data:0804C090         dd offset sub_80488B6  
 .data:0804C094         dd offset sub_80488F1  
 .data:0804C098         dd offset sub_804892C  
 .data:0804C09C         dd offset sub_8048660  
 .data:0804C0A0         dd offset sub_804866A  
 .data:0804C0A4         dd offset sub_8048967  
 .data:0804C0A8         align 20h  

Is print_char used by Baleful to print the messages? That's not incredibly efficient, but would help obfuscate the program. We can find out by placing a GDB breakpoint on print_char and seeing what happens:

 pico59150@shell:~$ gdb baleful                                        
 GNU gdb (Ubuntu 7.7-0ubuntu3.1) 7.7                                     
 Copyright (C) 2014 Free Software Foundation, Inc.                              
 License GPLv3+: GNU GPL version 3 or later <>                
 This is free software: you are free to change and redistribute it.                      
 There is NO WARRANTY, to the extent permitted by law. Type "show copying"                  
 and "show warranty" for details.                                       
 This GDB was configured as "x86_64-linux-gnu".                                
 Type "show configuration" for configuration details.                             
 For bug reporting instructions, please see:                                 
 Find the GDB manual and other documentation resources online at:                       
 For help, type "help".                                            
 Type "apropos word" to search for commands related to "word"...                       
 Reading symbols from baleful...(no debugging symbols found)...done.                     
 (gdb) b *0x0804867C                                             
 Breakpoint 1 at 0x804867c                                          
 (gdb) run                                                  
 Starting program: /home_users/pico59150/baleful                               
 Breakpoint 1, 0x0804867c in ?? ()                                      
 (gdb) cont                                                  
 Breakpoint 1, 0x0804867c in ?? ()                                      
 (gdb) cont                                                  
 Breakpoint 1, 0x0804867c in ?? ()                                      
 (gdb) cont                                                  
 Breakpoint 1, 0x0804867c in ?? ()                                      
 (gdb) cont                                                  
 Breakpoint 1, 0x0804867c in ?? ()                                      
 (gdb) cont                                                  
 Breakpoint 1, 0x0804867c in ?? ()                                      
 (gdb) cont                                                  
 Breakpoint 1, 0x0804867c in ?? ()                                      

Looks like that hypothesis is correct. Each time we continue past the breakpoint, one more character of the password prompt ("Please enter your password") gets printed. Whatever function is calling print_char is probably responsible for printing the message. Let's see where we were called from. Since we stopped on the very first instruction of print_char, the return address sits at the top of the stack, at [esp]:

  (gdb) info registers                         
  eax   0xffffd5c4  -10812                    
  ecx   0xf7fc988c  -134440820                   
  edx   0x804867c  134514300                   
  ebx   0xffffd690  -10608                    
  esp   0xffffd5ac  0xffffd5ac                   
  ebp   0xffffd678  0xffffd678                   
  esi   0x0  0                       
  edi   0xffffd70c  -10484                    
  eip   0x804867c  0x804867c                   
  eflags   0x212 [ AF IF ]                     
  cs    0x23  35                       
  ss    0x2b  43                       
  ds    0x2b  43                       
  es    0x2b  43                       
  fs    0x0  0                       
  gs    0x63  99                       
  (gdb) x 0xffffd5ac                         
  0xffffd5ac:  0x08049c56                       

The return address is 0x08049c56. Let's view the code in the vicinity of that:

 .text:08049C2E loc_8049C2E:              ; CODE XREF: sub_804898B+C0 j  
 .text:08049C2E                     ; DATA XREF: .rodata:off_8049DD4 o  
 .text:08049C2E         mov   eax, [ebp+var_34] ; jumptable 08048A4B case 32  
 .text:08049C31         add   eax, 1  
 .text:08049C34         movzx  eax, byte_804C0C0[eax]  
 .text:08049C3B         movsx  eax, al  
 .text:08049C3E         mov   [ebp+var_24], eax  
 .text:08049C41         mov   eax, [ebp+var_24]  
 .text:08049C44         mov   edx, io_ops[eax*4]  
 .text:08049C4B         lea   eax, [ebp+var_B4]  
 .text:08049C51         mov   [esp], eax  
 .text:08049C54         call  edx ; print_char  
 .text:08049C56         mov   [ebp+var_B4], eax  
 .text:08049C5C         add   [ebp+var_34], 2  
 .text:08049C60         jmp   short loc_8049C67  
 .text:08049C62 ; ---------------------------------------------------------------------------  
 .text:08049C62 loc_8049C62:              ; CODE XREF: sub_804898B+B3 j  
 .text:08049C62                     ; sub_804898B+C0 j  
 .text:08049C62                     ; DATA XREF: ...  
 .text:08049C62         add   [ebp+var_34], 1 ; jumptable 08048A4B default case  
 .text:08049C66         nop  
 .text:08049C67 loc_8049C67:              ; CODE XREF: sub_804898B+9D j  
 .text:08049C67                     ; sub_804898B+C6 j ...  
 .text:08049C67         mov   eax, [ebp+var_34]  
 .text:08049C6A         add   eax, offset byte_804C0C0  
 .text:08049C6F         movzx  eax, byte ptr [eax]  
 .text:08049C72         cmp   al, 1Dh  
 .text:08049C74         jnz   loc_8048A2D  
 .text:08049C7A         mov   eax, [ebp+var_B4]  
 .text:08049C80 locret_8049C80:             ; CODE XREF: sub_804898B+E4 j  
 .text:08049C80         leave  
 .text:08049C81         retn  
 .text:08049C81 sub_804898B   endp  

The `call edx` at 0x08049C54 is where print_char was actually invoked. Let's look back a bit to see where we came from. This is case 32 in some unknown jumptable. The first thing it does is read a 4-byte value from [ebp-0x34]. This value is used as an offset into a memory area at 0x804C0C0; specifically, this code reads the byte at 0x804C0C0+offset+1. From this we can deduce that the offset points to some data structure, and this code takes its second byte. That byte is used as an index into io_ops, from which a function pointer is read and then called (at 0x08049C54). The argument passed to the function is a pointer to the local at [ebp-0xb4], and the return value is stored back there afterwards.
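To make the behavior of this case concrete, here is a rough Python model of it. This is a sketch under assumptions. `code` stands in for the bytes at 0x0804C0C0, and `cell` is a one-element list playing the role of the local at [ebp-0xb4]. `io_ops` is a hypothetical dict of Python callables standing in for the table at 0x0804C060; only print_char and stdin_getc are modeled. The names `op_io` and `cell` are made up for illustration.

```python
import sys

def print_char(cell):
    """Model of print_char: write cell[0] as a character to stderr and return it."""
    sys.stderr.write(chr(cell[0]))
    sys.stderr.flush()
    return cell[0]

def stdin_getc(cell):
    """Model of stdin_getc: read one character from stdin (the EOF signal is not modeled)."""
    ch = sys.stdin.read(1)
    return ord(ch) if ch else -1

# Keys 0 and 4 match the io_ops slots at 0x0804C060 and 0x0804C070; other slots are omitted.
io_ops = {0: print_char, 4: stdin_getc}

def op_io(code, ipos, cell, table=io_ops):
    """Execute the I/O instruction at code[ipos] and return the new ipos."""
    index = code[ipos + 1]          # second byte: which io_ops entry to call
    if index >= 0x80:               # movsx sign-extends the operand byte
        index -= 0x100
    cell[0] = table[index](cell)    # handler gets a pointer-like cell; result stored back
    return ipos + 2                 # matches `add [ebp+var_34], 2` at 0x08049C5C
```

For example, `op_io(bytes([0x20, 0x00]), 0, [ord("P")])` would print "P" to stderr and return 2.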

Once the I/O function returns, this code increments the offset in [ebp-0x34] by 2 and jumps to 0x08049c67. 0x08049c67 reads the byte at the new offset and compares it to 0x1d. If it is 0x1d, the enclosing function (sub_804898B) returns; otherwise, it jumps to 0x08048a2d. It's not exactly clear what the function is doing at this point, so let's see what happens at 0x08048a2d:

 .text:08048A2D loc_8048A2D:              ; CODE XREF: sub_804898B+12E9 j  
 .text:08048A2D         mov   eax, [ebp+var_34]  
 .text:08048A30         add   eax, offset byte_804C0C0  
 .text:08048A35         movzx  eax, byte ptr [eax]  
 .text:08048A38         movsx  eax, al  
 .text:08048A3B         cmp   eax, 20h    ; switch 33 cases  
 .text:08048A3E         ja   loc_8049C62   ; jumptable 08048A4B default case  
 .text:08048A44         mov   eax, ds:off_8049DD4[eax*4]  
 .text:08048A4B         jmp   eax       ; switch jump  

Looks like 0x08048a2d is the jumptable dispatcher. It once again uses [ebp-0x34] as an offset into 0x0804c0c0, and a pattern is starting to emerge. It takes the first byte at that offset and uses it as an index into the jumptable; any value above 0x20 goes to the default case. Recall that 0x8049c2e is a jumptable case, so it is jumped to directly from here. That case looked at the second byte at the offset and used it as a parameter. So the data pointed to by [ebp-0x34] always starts with a jumptable index, followed by some case-specific data.
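The control flow between 0x08049C67 (the loop check) and 0x08048A2D (the dispatcher) can be summarized in a few lines of Python. This is a minimal sketch, not a reconstruction of sub_804898B. `handlers` is a hypothetical dict mapping opcodes to functions that take `(code, ipos)` and return the new ipos. The real handlers also touch other state, which is not modeled here.

```python
HALT = 0x1D        # the byte the loop check at 0x08049C67 compares against
MAX_OPCODE = 0x20  # `cmp eax, 20h / ja default` at 0x08048A3B

def run_vm(code: bytes, ipos: int, handlers: dict) -> int:
    """Fetch-dispatch loop; returns the ipos at which the VM stopped."""
    # The function enters at the loop check, so a leading 0x1D halts immediately.
    while code[ipos] != HALT:
        opcode = code[ipos]
        if opcode > MAX_OPCODE:
            # Default case 0x08049C62: bytes above 0x20 (including 0x80-0xFF, which
            # movsx makes negative and the unsigned `ja` treats as huge) just skip 1 byte.
            ipos += 1
            continue
        if opcode not in handlers:
            raise NotImplementedError(f"opcode {opcode:#x} at ipos {ipos:#x}")
        ipos = handlers[opcode](code, ipos)
    return ipos
```

Because the loop test runs before every dispatch, the jumptable entry for 0x1d is never reached through this loop, even though it exists.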

What is at 0x0804c0c0 anyway? As it turns out, there's absolutely nothing but zeroes for the first 0x1000 bytes. Then there are some bytes which appear normal, though their purpose isn't yet known. But as we get to 0x0804D0F0, the data starts to lose any noticeable patterns and appears to be fairly random. It looks like there's some sort of encryption or packing going on. We'll get back to that much later.
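The move from zeroes to structured bytes to "fairly random" data was spotted by eye here. A more mechanical alternative is per-block Shannon entropy. This technique is swapped in for illustration and is not what was used above. Encrypted or compressed data tends toward 8 bits per byte, while zero padding scores 0. The function names and the 256-byte block size below are assumptions.

```python
import math
from collections import Counter

def shannon_entropy(block: bytes) -> float:
    """Bits of entropy per byte: 0.0 for constant data, up to 8.0 for uniform noise."""
    if not block:
        return 0.0
    n = len(block)
    return -sum((c / n) * math.log2(c / n) for c in Counter(block).values())

def entropy_map(data: bytes, base: int, block_size: int = 256):
    """Yield (address, entropy) per block so a jump in randomness stands out."""
    for off in range(0, len(data), block_size):
        yield base + off, shannon_entropy(data[off:off + block_size])
```

One way to get the input is to dump the region from a running process with GDB's `dump binary memory` command. You would then print `entropy_map(data, 0x0804C0C0)` and look for where the values climb toward 8.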

Now, it still wasn't completely clear what I was dealing with, but I began to have a hunch that this was a bytecode VM. The theory makes sense: it has an offset into some data area, it uses the first byte at that offset to choose one of many cases, each case can read additional data from that offset, and it always advances the offset after it finishes. Recall that 0x08049c2e, the case that called all the I/O functions, used only the second byte at the offset. Then it incremented the offset by 2 when it finished and went back to the main dispatcher. If the VM theory is correct, Baleful is advancing an instruction pointer and dispatching the next instruction.

The VM theory was actually quite plausible, so I decided to run with it. If it was true, that meant that everything in the 0x0804c0c0 area was a bytecode program that actually did everything. The I/O meta-function at 0x08049c2e would just be an instruction called by the bytecode program to communicate with the outside world. As obfuscation mechanisms go, it's a fairly good one. The new goal should be understanding enough of the VM to write a disassembler and reverse engineer the bytecode program.

 .rodata:08049DD4 vm_instrs    dd offset loc_8048A4D  ; DATA XREF: sub_804898B+B9 r  
 .rodata:08049DD4         dd offset loc_8048A56  ; jump table for switch statement  
 .rodata:08049DD4         dd offset loc_8048A8F  
 .rodata:08049DD4         dd offset loc_8048BC4  
 .rodata:08049DD4         dd offset loc_8048CF9  
 .rodata:08049DD4         dd offset loc_8048E2F  
 .rodata:08049DD4         dd offset loc_8048F91  
 .rodata:08049DD4         dd offset loc_80495F5  
 .rodata:08049DD4         dd offset loc_8049649  
 .rodata:08049DD4         dd offset loc_80490C6  
 .rodata:08049DD4         dd offset loc_80491FB  
 .rodata:08049DD4         dd offset loc_804959E  
 .rodata:08049DD4         dd offset loc_8049330  
 .rodata:08049DD4         dd offset loc_8049467  
 .rodata:08049DD4         dd offset loc_80496D1  
 .rodata:08049DD4         dd offset loc_804969D  
 .rodata:08049DD4         dd offset loc_80496EC  
 .rodata:08049DD4         dd offset loc_8049715  
 .rodata:08049DD4         dd offset loc_804973E  
 .rodata:08049DD4         dd offset loc_8049767  
 .rodata:08049DD4         dd offset loc_8049790  
 .rodata:08049DD4         dd offset loc_80497B9  
 .rodata:08049DD4         dd offset loc_80497E2  
 .rodata:08049DD4         dd offset loc_80498F0  
 .rodata:08049DD4         dd offset loc_8049A02  
 .rodata:08049DD4         dd offset loc_8049A86  
 .rodata:08049DD4         dd offset loc_8049AB9  
 .rodata:08049DD4         dd offset loc_8049AEC  
 .rodata:08049DD4         dd offset loc_8049B43  
 .rodata:08049DD4         dd offset loc_8049C62  
 .rodata:08049DD4         dd offset loc_8049B92  
 .rodata:08049DD4         dd offset loc_8049BF8  
 .rodata:08049DD4         dd offset io_8049C2E  

There's a huge, intimidating jumptable staring us in the face, and of its 33 instructions, we've identified only one (the I/O instruction, opcode 0x20). Let's see which ones are easy enough to identify right away.

 .text:08048A4D loc_8048A4D:              ; DATA XREF: .rodata:vm_instrs o  
 .text:08048A4D         add   [ebp+ipos], 1 ; jumptable 08048A4B case 0  
 .text:08048A51         jmp   loc_8049C67  

This case does absolutely nothing but increment the instruction pointer (which I now call ipos). I'm willing to bet it's the equivalent of NOP, which exists on basically every CPU architecture. That suggests the bytecode is modeled on some sort of assembly language. Two instructions down, 31 to go. What else can we identify?

Well, for me, pretty much nothing at all. Opcode 0x1d is fairly easy to identify as the VM termination instruction. Its jumptable entry is just the default case at 0x08049C62, but the loop check at 0x08049C67 exits the VM whenever the next opcode is 0x1d. That's not much help, though. Every other handler just seemed incomprehensible from a static analysis perspective, using a bunch of local variables whose meaning I didn't know. So I decided to go back to GDB and trace the execution path of the program after the I/O instruction (opcode 0x20).
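Even three known opcodes are enough to start a disassembler skeleton that can be grown as more opcodes are understood. The sketch below is a linear sweep. It assumes only what has been established so far: NOP (0x00, 1 byte), the halt byte 0x1d, the I/O instruction (0x20, 2 bytes), and the default case for bytes above 0x20. The I/O names come from the io_ops table. `KNOWN`, `IO_NAMES`, and `disassemble` are hypothetical names. The sweep stops at the first opcode whose length is unknown, rather than guessing.

```python
# Opcodes understood so far: mnemonic and total length in bytes (opcode included).
KNOWN = {
    0x00: ("nop", 1),
    0x1D: ("halt", 1),
    0x20: ("io", 2),
}

# First eight io_ops entries, in table order (0x0804C060 onward).
IO_NAMES = ["print_char", "print_dec", "print_hex", "print_float",
            "stdin_getc", "input_dec", "input_hex", "input_float"]

def disassemble(code: bytes, start: int, count: int) -> list:
    """Linear sweep from start, emitting at most count lines."""
    lines = []
    ipos = start
    while len(lines) < count and ipos < len(code):
        op = code[ipos]
        if op > 0x20:
            # Default case: the VM skips the byte, so the sweep does too.
            lines.append(f"{ipos:04x}: db {op:#04x}   ; default case, skipped")
            ipos += 1
            continue
        if op not in KNOWN:
            lines.append(f"{ipos:04x}: op_{op:02x} ???   ; length unknown, sweep stops")
            break
        name, length = KNOWN[op]
        if ipos + length > len(code):
            lines.append(f"{ipos:04x}: {name} (truncated)")
            break
        if op == 0x20:
            idx = code[ipos + 1]
            target = IO_NAMES[idx] if idx < len(IO_NAMES) else f"io_ops[{idx}]"
            lines.append(f"{ipos:04x}: io {target}")
        else:
            lines.append(f"{ipos:04x}: {name}")
        ipos += length
    return lines
```

Each newly understood opcode then becomes one more `KNOWN` entry, plus operand decoding where needed.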

Let's restart the program and set two breakpoints: one on the main VM loop and one inside the I/O instruction. We want to start tracing only after the first I/O call, though, so we set the dispatcher breakpoint once the I/O breakpoint has been hit:

 pico59150@shell:~$ gdb baleful                                        
 Reading symbols from baleful...(no debugging symbols found)...done.                     
 (gdb) b *0x08049C54                                             
 Breakpoint 1 at 0x8049c54                                          
 (gdb) run                                                  
 Starting program: /home_users/pico59150/baleful                               
 Breakpoint 1, 0x08049c54 in ?? ()                                      
 (gdb) b *0x08048A2D                                             
 Breakpoint 2 at 0x8048a2d                                          

We want to see what instructions are being executed, so let's view the opcode every time we hit the main dispatcher:

 (gdb) cont                                                  
 Breakpoint 2, 0x08048a2d in ?? ()                                      
 (gdb) si                                                   
 0x08048a30 in ?? ()                                             
 (gdb) info registers                                             
 eax      0x1041  4161                                         
 ecx      0xf7fc988c    -134440820                                  
 edx      0x0   0                                          
 ebx      0xffffd690    -10608                                    
 esp      0xffffd5b0    0xffffd5b0                                  
 ebp      0xffffd678    0xffffd678                                  
 esi      0x0   0                                          
 edi      0xffffd70c    -10484                                    
 eip      0x8048a30    0x8048a30                                  
 eflags     0x297  [ CF PF AF SF IF ]                                  
 cs       0x23   35                                          
 ss       0x2b   43                                          
 ds       0x2b   43                                          
 es       0x2b   43                                          
 fs       0x0   0                                          
 gs       0x63   99                                          
 (gdb) si                                                   
 0x08048a35 in ?? ()                                             
 (gdb) si                                                   
 0x08048a38 in ?? ()                                             
 (gdb) si                                                   
 0x08048a3b in ?? ()                                             
 (gdb) info registers                                             
 eax      0x1   1                                          
 ecx      0xf7fc988c    -134440820                                  
 edx      0x0   0                                          
 ebx      0xffffd690    -10608                                    
 esp      0xffffd5b0    0xffffd5b0                                  
 ebp      0xffffd678    0xffffd678                                  
 esi      0x0   0                                          
 edi      0xffffd70c    -10484                                    
 eip      0x8048a3b    0x8048a3b                                  
 eflags     0x202  [ IF ]                                        
 cs       0x23   35                                          
 ss       0x2b   43                                          
 ds       0x2b   43                                          
 es       0x2b   43                                          
 fs       0x0   0                                          
 gs       0x63   99                                          

Offset is 0x1041, opcode is 0x1. Let's keep doing this for a while.

 (gdb) cont                                                  
 Breakpoint 2, 0x08048a2d in ?? ()                                         
 (gdb) si                                                   
 0x08048a30 in ?? ()                                             
 (gdb) info registers                                             
 eax      0x1a79  6777                                         
 ecx      0xf7fc988c    -134440820                                  
 edx      0x0   0                                          
 ebx      0xffffd690    -10608                                    
 esp      0xffffd5b0    0xffffd5b0                                  
 ebp      0xffffd678    0xffffd678                                  
 esi      0x0   0                                          
 edi      0xffffd70c    -10484                                    
 eip      0x8048a30    0x8048a30                                  
 eflags     0x293  [ CF AF SF IF ]                                   
 cs       0x23   35                                          
 ss       0x2b   43                                          
 ds       0x2b   43                                          
 es       0x2b   43                                          
 fs       0x0   0                                          
 gs       0x63   99                                          
 (gdb) si                                                   
 0x08048a35 in ?? ()                                             
 (gdb) si                                                   
 0x08048a38 in ?? ()                                             
 (gdb) si                                                   
 0x08048a3b in ?? ()                                             
 (gdb) info registers                                             
 eax      0x18   24                                          
 ecx      0xf7fc988c    -134440820                                  
 edx      0x0   0                                          
 ebx      0xffffd690    -10608                                    
 esp      0xffffd5b0    0xffffd5b0                                  
 ebp      0xffffd678    0xffffd678                                  
 esi      0x0   0                                          
 edi      0xffffd70c    -10484                                    
 eip      0x8048a3b    0x8048a3b                                  
 eflags     0x206  [ PF IF ]                                      
 cs       0x23   35                                          
 ss       0x2b   43                                          
 ds       0x2b   43                                          
 es       0x2b   43                                          
 fs       0x0   0                                          
 gs       0x63   99                                          
 (gdb) cont                                                  
 Breakpoint 2, 0x08048a2d in ?? ()                                      
 (gdb) si                                                   
 0x08048a30 in ?? ()                                             
 (gdb) info registers                                             
 eax      0x1a80  6784                                         
 ecx      0xf7fc988c    -134440820                                  
 edx      0x6c   108                                         
 ebx      0xffffd690    -10608                                    
 esp      0xffffd5b0    0xffffd5b0                                  
 ebp      0xffffd678    0xffffd678                                  
 esi      0x0   0                                          
 edi      0xffffd70c    -10484                                    
 eip      0x8048a30    0x8048a30                                  
 eflags     0x283  [ CF SF IF ]                                     
 cs       0x23   35                                          
 ss       0x2b   43                                          
 ds       0x2b   43                                          
 es       0x2b   43                                          
 fs       0x0   0                                          
 gs       0x63   99                                          
 (gdb) si                                                   
 0x08048a35 in ?? ()                                             
 (gdb) si                                                   
 0x08048a38 in ?? ()                                             
 (gdb) si                                                   
 0x08048a3b in ?? ()                                             
 (gdb) info registers                                             
 eax      0xf   15                                          
 ecx      0xf7fc988c    -134440820                                  
 edx      0x6c   108                                         
 ebx      0xffffd690    -10608                                    
 esp      0xffffd5b0    0xffffd5b0                                  
 ebp      0xffffd678    0xffffd678                                  
 esi      0x0   0                                          
 edi      0xffffd70c    -10484                                    
 eip      0x8048a3b    0x8048a3b                                  
 eflags     0x202  [ IF ]                                        
 cs       0x23   35                                          
 ss       0x2b   43                                          
 ds       0x2b   43                                          
 es       0x2b   43                                          
 fs       0x0   0                                          
 gs       0x63   99                                          
 (gdb) cont                                                  
 Breakpoint 2, 0x08048a2d in ?? ()                                      
 (gdb) si                                                   
 0x08048a30 in ?? ()                                             
 (gdb) info registers                                             
 eax      0x103f  4159                                         
 ecx      0xf7fc988c    -134440820                                  
 edx      0x1a85  6789                                         
 ebx      0xffffd690    -10608                                    
 esp      0xffffd5b0    0xffffd5b0                                  
 ebp      0xffffd678    0xffffd678                                  
 esi      0x0   0                                          
 edi      0xffffd70c    -10484                                    
 eip      0x8048a30    0x8048a30                                  
 eflags     0x216  [ PF AF IF ]                                     
 cs       0x23   35                                          
 ss       0x2b   43                                          
 ds       0x2b   43                                          
 es       0x2b   43                                          
 fs       0x0   0                                          
 gs       0x63   99                                          
 (gdb) si                                                   
 0x08048a35 in ?? ()                                             
 (gdb) si                                                   
 0x08048a38 in ?? ()                                             
 (gdb) si                                                   
 0x08048a3b in ?? ()                                             
 (gdb) info registers                                             
 eax      0x20   32                                          
 ecx      0xf7fc988c    -134440820                                  
 edx      0x1a85  6789                                         
 ebx      0xffffd690    -10608                                    
 esp      0xffffd5b0    0xffffd5b0                                  
 ebp      0xffffd678    0xffffd678                                  
 esi      0x0   0                                          
 edi      0xffffd70c    -10484                                    
 eip      0x8048a3b    0x8048a3b                                  
 eflags     0x206  [ PF IF ]                                      
 cs       0x23   35                                          
 ss       0x2b   43                                          
 ds       0x2b   43                                          
 es       0x2b   43                                          
 fs       0x0   0                                          
 gs       0x63   99                                          

Let's summarize what we see here. It's important, because what I found here was the big breakthrough that let me quickly figure out the rest of the VM. After leaving the I/O instruction, we're at offset 0x1041, so we can infer that the I/O instruction itself was at 0x103f and is 2 bytes long. The opcode at 0x1041 is 0x1. After running opcode 0x1, we go back to the main VM loop, but the offset has suddenly jumped to 0x1a79. The most likely explanation is that opcode 0x1 modified ipos. Opcode 0x18 runs next, and it seems to carry a lot of associated data, since ipos afterwards is 0x1a80, 7 bytes later. The opcode at 0x1a80 is 0xf. After that, ipos has changed back to 0x103f, and the VM is about to run the I/O instruction again. Since 0xf is the only instruction that ran in between, it is the only candidate for what modified ipos.
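The length-versus-jump reasoning above can be made mechanical. Below is a minimal Python sketch that walks the ipos values we observed and labels each step as either a fall-through (revealing the instruction's length) or a jump. The 16-byte cutoff and the names `trace` and `classify` are our own assumptions for illustration, not anything taken from the binary.

```python
# A minimal sketch: infer instruction lengths from the ipos values we
# observed while single-stepping. The 16-byte cutoff is our own
# assumption, not something taken from the binary.
MAX_LEN = 16

# (ipos, opcode) pairs in the order we saw them; the I/O opcode byte
# itself isn't identified here, so it is just labelled "I/O".
trace = [
    (0x103F, "I/O"),
    (0x1041, 0x01),
    (0x1A79, 0x18),
    (0x1A80, 0x0F),
    (0x103F, "I/O"),
]


def classify(trace):
    """Label each step as a fall-through (with its length) or a jump."""
    results = []
    for (pos, op), (nxt, _) in zip(trace, trace[1:]):
        delta = nxt - pos
        if 0 < delta < MAX_LEN:
            # Small forward step: assume we simply fell through.
            results.append((op, "length", delta))
        else:
            # Large or backward step: assume the opcode changed ipos.
            results.append((op, "jump", nxt))
    return results


for op, kind, value in classify(trace):
    name = op if isinstance(op, str) else f"0x{op:02x}"
    print(f"{name:>4}  {kind:<6} {value:#x}")
```

Run against the trace, it reports the I/O instruction as 2 bytes and 0x18 as 7 bytes, while flagging the steps after 0x1 and 0xf as jumps, which is exactly the pattern that singles those two opcodes out.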

If we allow the program to continue and run the I/O instruction, now the "l" character gets printed. This is the second character in the password prompt. So the I/O instruction runs, then opcode 0x1 runs and ipos completely changes as a result. At the new ipos, we run opcode 0x18 and 0xf. As a result of executing opcode 0xf, ipos goes back to what it was before and runs the I/O instruction again. But we have new data being passed to it! Which instruction did that? Assuming that each opcode does one specific thing, 0x18 is the only possibility.

The way this is arranged reminded me of returning from a function, getting some new data, and then calling that same function again. That would make 0x1 RET and 0xf CALL. And since 0x18 sets up the new character to be printed, it's probably similar to MOV. As it turns out, figuring out just that much is enormously helpful for working out all 33 opcodes. RET and CALL deal with the stack, and MOV deals with registers, so together they shed a lot of light on the VM.

Let's look at 0x1 and 0xf first, which we believe are RET and CALL respectively:

 .text:08048A56 ret_8048A56:              ; CODE XREF: sub_804898B+C0 j  
 .text:08048A56                     ; DATA XREF: .rodata:vm_instrs o  
 .text:08048A56         mov   eax, [ebp+var_38] ; jumptable 08048A4B case 1  
 .text:08048A59         add   eax, offset byte_804C0C0  
 .text:08048A5E         mov   eax, [eax]  
 .text:08048A60         mov   [ebp+var_14], eax  
 .text:08048A63         cmp   [ebp+var_14], 0  
 .text:08048A67         jnz   short loc_8048A74  
 .text:08048A69         mov   eax, [ebp+var_B4]  
 .text:08048A6F         jmp   locret_8049C80  
 .text:08048A74 ; ---------------------------------------------------------------------------  
 .text:08048A74 loc_8048A74:              ; CODE XREF: sub_804898B+DC j  
 .text:08048A74         mov   eax, [ebp+var_38]  
 .text:08048A77         add   eax, 4  
 .text:08048A7A         mov   [ebp+var_38], eax  
 .text:08048A7D         mov   eax, [ebp+var_14]  
 .text:08048A80         mov   [ebp+ipos], eax  
 .text:08048A83         mov   [ebp+var_28], 0  
 .text:08048A8A         jmp   loc_8049C67  

 .text:0804969D call_804969D:              ; CODE XREF: sub_804898B+C0 j  
 .text:0804969D                     ; DATA XREF: .rodata:vm_instrs o  
 .text:0804969D         mov   eax, [ebp+ipos] ; jumptable 08048A4B case 15  
 .text:080496A0         add   eax, 1  
 .text:080496A3         add   eax, offset byte_804C0C0  
 .text:080496A8         mov   eax, [eax]  
 .text:080496AA         mov   [ebp+var_10], eax  
 .text:080496AD         mov   eax, [ebp+var_38]  
 .text:080496B0         sub   eax, 4  
 .text:080496B3         mov   [ebp+var_38], eax  
 .text:080496B6         mov   eax, [ebp+var_38]  
 .text:080496B9         add   eax, offset byte_804C0C0  
 .text:080496BE         mov   edx, [ebp+ipos]  
 .text:080496C1         add   edx, 5  
 .text:080496C4         mov   [eax], edx  
 .text:080496C6         mov   eax, [ebp+var_10]  
 .text:080496C9         mov   [ebp+ipos], eax  
 .text:080496CC         jmp   loc_8049C67  

We can see that both of these handlers use [ebp-0x38]. RET uses it as an offset into the 0x0804c0c0 area and reads a 4-byte value from that offset. If that value is 0, the handler leaves the VM loop and returns the value in [ebp-0xb4]. Otherwise, it increments the offset in [ebp-0x38] by 4 bytes and stores the value it read in ipos. CALL does the opposite. It first reads the 4-byte operand that follows the opcode, which is presumably the offset we want to jump to. It then decrements [ebp-0x38] by 4 bytes and writes ipos + 5, which points to the instruction after the 5-byte CALL, to that new offset in the 0x0804c0c0 area. Finally, it puts the operand it read into ipos. These handlers make it obvious that 0x1 and 0xf are indeed RET and CALL, and furthermore that [ebp-0x38] is a stack pointer offset. Very useful to know.
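Here is a small Python model of the two handlers as we currently understand them. It assumes, as the listings suggest, that the stack shares the memory area at byte_804C0C0 with the bytecode and that `sp` plays the role of [ebp-0x38]. The class name `VMStack`, the memory size and the starting `sp` value are hypothetical, chosen only so the sketch runs.

```python
import struct


class VMStack:
    """Sketch of the CALL (0xf) and RET (0x1) handlers.

    Assumes the stack lives in the same memory area as the bytecode
    (byte_804C0C0) and that `sp` stands in for [ebp-0x38]. The initial
    sp value used below is hypothetical.
    """

    def __init__(self, mem: bytearray, sp: int):
        self.mem = mem
        self.sp = sp

    def read32(self, off: int) -> int:
        # Little-endian 4-byte load, like `mov eax, [eax]`
        return struct.unpack_from("<I", self.mem, off)[0]

    def call(self, ipos: int) -> int:
        target = self.read32(ipos + 1)   # operand follows the opcode byte
        self.sp -= 4                     # stack grows downwards
        struct.pack_into("<I", self.mem, self.sp, ipos + 5)  # return address
        return target                    # new ipos

    def ret(self):
        addr = self.read32(self.sp)
        if addr == 0:
            return None                  # the native handler exits the VM here
        self.sp += 4
        return addr                      # new ipos


mem = bytearray(0x2000)                  # size is arbitrary for the sketch
mem[0x1A80:0x1A85] = bytes([0x0F, 0x3F, 0x10, 0x00, 0x00])  # CALL 0x103f
stack = VMStack(mem, sp=0x1F00)          # hypothetical starting sp
new_ipos = stack.call(0x1A80)            # jumps to 0x103f, pushes 0x1a85
back = stack.ret()                       # pops 0x1a85
```

Incidentally, edx in the second register dump above is 0x1a85, which equals 0x1a80 + 5. That may be left over from the CALL handler's `add edx, 5`, though we haven't confirmed it.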

Now we want to look at MOV to see how it works, but its handler looks very complex at first glance. Let's again use GDB to trace through it at the point where the "l" character gets loaded. The case for opcode 0x18, which we believe is MOV, is at 0x08049A02. Let's see which part of it we end up branching to.

 .text:08049A02 loc_8049A02:              ; CODE XREF: sub_804898B+C0 j  
 .text:08049A02                     ; DATA XREF: .rodata:vm_instrs o  
 .text:08049A02         mov   eax, [ebp+ipos] ; jumptable 08048A4B case 24  
 .text:08049A05         add   eax, 1  
 .text:08049A08         movzx  eax, byte_804C0C0[eax]  
 .text:08049A0F         movsx  eax, al  
 .text:08049A12         mov   [ebp+var_C], eax  
 .text:08049A15         mov   eax, [ebp+var_C]  
 .text:08049A18         test  eax, eax  
 .text:08049A1A         jz   short loc_8049A23  
 .text:08049A1C         cmp   eax, 1  
 .text:08049A1F         jz   short loc_8049A57  
 .text:08049A21         jmp   short loc_8049A81  
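To make the branching concrete, here is a small Python sketch of how the case-24 handler picks a path. The names `signed_byte`, `mov_branch`, `MOV_CASES` and `MOV_DEFAULT` are ours; the addresses come from the listing. Note that the selector is read with `movzx` and then `movsx`, so it is effectively a signed byte: 0xff would become -1 and take the fallback branch.

```python
def signed_byte(b: int) -> int:
    """Mimic `movzx eax, byte ...` followed by `movsx eax, al`."""
    return b - 0x100 if b >= 0x80 else b


# Branch targets of the case-24 (opcode 0x18) handler, read off the listing.
MOV_CASES = {0: 0x08049A23, 1: 0x08049A57}
MOV_DEFAULT = 0x08049A81


def mov_branch(bytecode: bytes, ipos: int) -> int:
    """Return the native address the handler would branch to."""
    selector = signed_byte(bytecode[ipos + 1])
    return MOV_CASES.get(selector, MOV_DEFAULT)


print(hex(mov_branch(bytes([0x18, 0x01, 0x00]), 0)))
```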

There are three possible branches, but the third (to 0x08049A81) is taken for any selector other than 0 or 1, so let's set breakpoints on the first two and see which one we hit.

 (gdb) break *0x8049A23                                            
 Breakpoint 3 at 0x8049a23                                          
 (gdb) break *0x8049A57                                            
 Breakpoint 4 at 0x8049a57                                          
 (gdb) cont                                                  
 Breakpoint 4, 0x08049a57 in ?? ()                                      

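Alright, we end up at 0x08049a57. We should first dump the data ipos points to. ipos is stored at [ebp-0x34], and adding its value to the base of the bytecode area, 0x0804c0c0, gives the address to dump: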

 (gdb) x $ebp-0x34                                              
 0xffffd644:   0x00001a79                                          
 (gdb) x/16xb 0x0804db39                                           
 0x804db39:   0x18  0x01  0x00  0x6c  0x00  0x00  0x00  0x0f                 
 0x804db41:   0x3f  0x10  0x00  0x00  0x18  0x01  0x00  0x65                 

The first seven bytes, 18 01 00 6c 00 00 00, are the actual MOV instruction. Here we see 0x18, the opcode; 0x01, which is used to decide the case; a 0x00 byte; and the 4-byte value 0x6c (little endian). 0x6c is simply the ASCII code for the letter "l", so this is almost certainly loading the next character to print. The five bytes after it, 0f 3f 10 00 00, are the CALL back to the I/O instruction at 0x103f. Now that we know the data, how will the handler play out?
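As a sanity check on that reading, here is a minimal Python decoder for the 16 dumped bytes. It only knows the two shapes seen so far (0x18 with selector 1 as 7 bytes, 0xf as 5 bytes) and stops at anything else or at a truncated instruction; `decode`, `bytecode_addr` and `BYTECODE_BASE` are hypothetical names for illustration, not part of any real tool.

```python
import struct

BYTECODE_BASE = 0x0804C0C0  # byte_804C0C0 in the listing


def bytecode_addr(ipos: int) -> int:
    """Translate a VM offset into the native address to dump in GDB."""
    return BYTECODE_BASE + ipos


dump = bytes([0x18, 0x01, 0x00, 0x6C, 0x00, 0x00, 0x00, 0x0F,
              0x3F, 0x10, 0x00, 0x00, 0x18, 0x01, 0x00, 0x65])


def decode(buf: bytes):
    """Decode only the two instruction shapes identified so far."""
    out, i = [], 0
    while i < len(buf):
        op = buf[i]
        if op == 0x18 and i + 7 <= len(buf) and buf[i + 1] == 0x01:
            third = buf[i + 2]
            imm = struct.unpack_from("<I", buf, i + 3)[0]
            out.append(("MOV", third, imm))
            i += 7
        elif op == 0x0F and i + 5 <= len(buf):
            out.append(("CALL", struct.unpack_from("<I", buf, i + 1)[0]))
            i += 5
        else:
            break  # unknown opcode or truncated instruction
    return out


print(hex(bytecode_addr(0x1A79)))
for insn in decode(dump):
    if insn[0] == "MOV":
        print(f"MOV {insn[1]:#04x}, {insn[2]:#x} ({chr(insn[2])!r})")
    else:
        print(f"CALL {insn[1]:#x}")
```

The decoder stops at offset 12 because the dump cuts the next MOV short, but its first visible operand byte, 0x65, is "e", the third character of "Please".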

 .text:08049A57 loc_8049A57:              ; CODE XREF: sub_804898B+1094 j  
 .text:08049A57         mov   eax, [ebp+ipos]  
 .text:08049A5A         add   eax, 2  
 .text:08049A5D         movzx  eax, byte_804C0C0[eax]  
 .text:08049A64         movsx  eax, al  
 .text:08049A67         mov   edx, [ebp+ipos]  
 .text:08049A6A         add   edx, 3  
 .text:08049A6D         add   edx, offset byte_804C0C0  
 .text:08049A73         mov   edx, [edx]  
 .text:08049A75         mov   [ebp+eax*4+var_B4], edx  
 .text:08049A7C         add   [ebp+ipos], 7  
 .text:08049A80         nop  

It reads the third byte of our instruction, which is 0x00. Then it reads the 4-byte value after that, which is 0x6c. It stores that value in [ebp-0xb4+eax*4], which is just [ebp-0xb4] in this case, and finally advances ipos by 7, the length of this form of MOV. Recall that [ebp-0xb4] contained the argument given to print_char. However, it turns out that ebp-0xb4 is not just the address of one 4-byte value, but the start of an entire array of them. Depending on the third byte, the MOV instruction can write to any index of this array. MOV usually loads values into registers, so it would appear that ebp-0xb4 is the register area for the VM bytecode. And if ebp-0xb4 is a register area, the I/O opcode always passes the value of the first register as an argument to whichever I/O function is called.
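Putting the pieces together, here is a Python sketch of the case-1 MOV path operating on a modeled register file. The class `VM`, its method names and `NUM_REGS` are hypothetical; in particular, the register count is a placeholder, since this part hasn't established how many registers exist.

```python
import struct


class VM:
    """Hypothetical model of the register file at [ebp-0xb4].

    NUM_REGS is a placeholder; the real register count isn't
    established at this point in the analysis.
    """

    NUM_REGS = 8

    def __init__(self, bytecode: bytes):
        self.code = bytecode
        self.regs = [0] * self.NUM_REGS   # 4-byte slots at ebp-0xb4 + 4*i
        self.ipos = 0

    def mov_case1(self) -> None:
        """Opcode 0x18 with selector byte 1 (the 0x08049A57 path)."""
        # Third byte, sign-extended like `movsx eax, al`
        reg = struct.unpack_from("<b", self.code, self.ipos + 2)[0]
        # 4-byte little-endian immediate that follows it
        imm = struct.unpack_from("<I", self.code, self.ipos + 3)[0]
        # The native handler does no bounds check; this sketch adds one.
        if not 0 <= reg < len(self.regs):
            raise IndexError(f"register index {reg} out of range")
        self.regs[reg] = imm              # mov [ebp+eax*4+var_B4], edx
        self.ipos += 7                    # add [ebp+ipos], 7

    def io_argument(self) -> int:
        """The I/O opcode appears to take its argument from register 0."""
        return self.regs[0]


vm = VM(bytes([0x18, 0x01, 0x00, 0x6C, 0x00, 0x00, 0x00]))
vm.mov_case1()
print(chr(vm.io_argument()), hex(vm.ipos))
```

The explicit range check is a deliberate departure from the native code: a negative index in Python would silently wrap to the end of the list, whereas the real handler would simply write below the array.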

Now we've figured out a lot more about how data is accessed inside the VM. Knowing how both the registers and the stack are implemented makes it much easier to figure out the rest of the instructions. Since this article is already quite long, we'll pick this back up in part 2, where we'll figure out the rest of the instructions, write a disassembler for the bytecode, and then reverse engineer the bytecode to get the flag.

