picoCTF 2014: Baleful (re200) Part 1
Baleful is the last of the five 200-point master challenges, and the final challenge in picoCTF. It gives us very little to start with: just a "twisted" binary and the task of getting it to accept a password. Since we're only given a binary, there's definitely a reverse engineering element, and as in most reversing challenges, the password is probably the flag. Let's jump in!
What happens if we execute Baleful? As expected, there's a password prompt which we have to get past:
pico59150@shell:~$ ./baleful
Please enter your password: test
Sorry, wrong password!
pico59150@shell:~$
The only obvious course of action is disassembling Baleful. Before we try to disassemble the binary, it's a good idea to get some basic information about it. Let's try seeing what sections it has:
pico59150@shell:~$ readelf -S baleful
There are no sections in this file.
Well, that's certainly odd: an ELF file with no sections, yet we can still run it. (The loader only needs the program headers, so this is possible, but ordinary compiler output normally has sections.) That seems pretty suspicious. If we view it in a hex editor, a few other things stand out. There appears to be a second ELF header after the normal one, and the string "UPX" appears over and over. There are only a few other recognizable strings, but one of them is quite revealing:
Info: This file is packed with the UPX executable packer http://upx.sf.net $
$Id: UPX 3.91 Copyright (C) 1996-2013 the UPX Team. All Rights Reserved.
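Both oddities can also be checked from a script instead of readelf and a hex editor. The sketch below is a minimal illustration, assuming a 32-bit little-endian ELF like this one; the helper names `elf32_counts` and `looks_upx_packed` are made up for this write-up, and searching for the `UPX!` magic is only a heuristic.

```python
import struct
from typing import Tuple

# ELF32 header (52 bytes): e_ident[16], e_type, e_machine, e_version, e_entry,
# e_phoff, e_shoff, e_flags, e_ehsize, e_phentsize, e_phnum, e_shentsize,
# e_shnum, e_shstrndx. "<" assumes little-endian, as on i386.
ELF32_HEADER = struct.Struct("<16sHHIIIIIHHHHHH")

def elf32_counts(data: bytes) -> Tuple[int, int]:
    """Return (program header count, section header count) from an ELF32 header."""
    if data[:4] != b"\x7fELF" or data[4] != 1:  # EI_CLASS == 1 means ELFCLASS32
        raise ValueError("not a 32-bit ELF file")
    fields = ELF32_HEADER.unpack_from(data, 0)
    return fields[10], fields[12]  # e_phnum, e_shnum

def looks_upx_packed(data: bytes) -> bool:
    """Heuristic: look for UPX's "UPX!" magic or its copyright banner."""
    return b"UPX!" in data or b"packed with the UPX executable packer" in data
```

Reading the packed binary's bytes and passing them to `elf32_counts` should report at least one program header (the file runs) and zero section headers, which is what readelf complained about.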
So it appears this file is packed with UPX, a common executable packer. A packer compresses a program while still allowing it to run normally: the packed file contains a small stub that decompresses the rest of the executable at run time. Malware often uses packers to hinder analysis, but UPX offers little protection here, since the UPX tool itself can unpack the file. Let's get UPX and do that:
pico59150@shell:~$ ./upx -d baleful
Ultimate Packer for eXecutables
Copyright (C) 1996 - 2013
UPX 3.91w Markus Oberhumer, Laszlo Molnar & John Reiser Sep 30th 2013
File size Ratio Format Name
-------------------- ------ ----------- -----------
148104 <- 6752 4.56% netbsd/elf386 baleful
Unpacked 1 file.
Now we have Baleful in a form that's much easier to reverse engineer. Load it into your preferred disassembler (I use IDA) and take a look. A natural first step is to search for the messages the program prints, but they don't appear anywhere in the executable. Where could they be, then? To find out, it helps to learn how the program does I/O in the first place. The PLT (procedure linkage table) contains printf(), fputc(), and fgetc(), and quite a few functions reference them.
.text:0804867C sub_804867C proc near ; CODE XREF: sub_804898B+12C9 p
.text:0804867C ; DATA XREF: .data:off_804C060 o
.text:0804867C
.text:0804867C arg_0 = dword ptr 8
.text:0804867C
.text:0804867C push ebp
.text:0804867D mov ebp, esp
.text:0804867F sub esp, 18h
.text:08048682 mov edx, ds:stderr
.text:08048688 mov eax, [ebp+arg_0]
.text:0804868B mov eax, [eax]
.text:0804868D mov [esp+4], edx ; stream
.text:08048691 mov [esp], eax ; c
.text:08048694 call _fputc
.text:08048699 mov eax, ds:stderr
.text:0804869E mov [esp], eax ; stream
.text:080486A1 call _fflush
.text:080486A6 mov eax, [ebp+arg_0]
.text:080486A9 mov eax, [eax]
.text:080486AB leave
.text:080486AC retn
.text:080486AC sub_804867C endp
This function takes a single argument, a pointer to a character stored as a 4-byte value, and prints that character to stderr. It then calls fflush to make sure it's actually printed, and returns the character. Let's call this print_char in case we encounter it later. There's an analogous function for character input, which we'll call stdin_getc:
.text:080486FB sub_80486FB proc near ; DATA XREF: .data:0804C070 o
.text:080486FB
.text:080486FB arg_0 = dword ptr 8
.text:080486FB
.text:080486FB push ebp
.text:080486FC mov ebp, esp
.text:080486FE sub esp, 18h
.text:08048701 mov eax, [ebp+arg_0]
.text:08048704 mov [esp], eax
.text:08048707 call sub_80485F4
.text:0804870C mov eax, ds:stdin
.text:08048711 mov [esp], eax ; stream
.text:08048714 call _fgetc
.text:08048719 leave
.text:0804871A retn
.text:0804871A sub_80486FB endp
0x080485F4 is a small function that checks if we've reached EOF in stdin, and raises a signal if we have. We can also find some more I/O functions which don't appear to be used. Here's our final list of all I/O functions:
- 0x0804867C (print_char) - Prints a single character to stderr
- 0x080486AD (print_dec) - Prints decimal numbers as strings
- 0x080486D4 (print_hex) - Prints hexadecimal numbers as strings
- 0x080487A9 (print_float) - Prints floating-point numbers as strings
- 0x080486FB (stdin_getc) - Reads a single character from stdin and returns it
- 0x0804871B (input_dec) - Reads a decimal number from stdin and returns it
- 0x0804874E (input_hex) - Reads a hexadecimal number from stdin and returns it
- 0x080487D8 (input_float) - Reads a floating-point number from stdin and returns it
All of these functions deal with basic text I/O. Interestingly enough, they're also all referenced by a table of functions at 0x0804C060. I call it io_ops since all the known functions in it are centered around that purpose:
.data:0804C060 io_ops dd offset print_char ; DATA XREF: sub_804898B+12B9 r
.data:0804C064 dd offset print_dec
.data:0804C068 dd offset print_hex
.data:0804C06C dd offset print_float
.data:0804C070 dd offset stdin_getc
.data:0804C074 dd offset input_dec
.data:0804C078 dd offset input_hex
.data:0804C07C dd offset input_float
.data:0804C080 dd offset sub_8048619
.data:0804C084 dd offset sub_8048813
.data:0804C088 dd offset sub_8048834
.data:0804C08C dd offset sub_804887B
.data:0804C090 dd offset sub_80488B6
.data:0804C094 dd offset sub_80488F1
.data:0804C098 dd offset sub_804892C
.data:0804C09C dd offset sub_8048660
.data:0804C0A0 dd offset sub_804866A
.data:0804C0A4 dd offset sub_8048967
.data:0804C0A8 align 20h
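To make the calling convention concrete, here is a small Python model of the table and the two routines whose listings we've seen. It is a sketch under stated assumptions: the "pointer to a 4-byte cell" is modelled as a Python list whose element 0 plays the role of `*ptr`, an `EOFError` stands in for the signal raised at end of input, and the entries we haven't identified are left as `None`.

```python
import io
from typing import Callable, List, Optional, TextIO

# Each io_ops entry receives a pointer to a 4-byte cell; the caller stores the
# return value back into that cell. A Python list stands in for the pointer,
# with cell[0] playing the role of *ptr (an illustrative assumption).
IoFn = Callable[[List[int], TextIO, TextIO], int]

def print_char(cell: List[int], out: TextIO, inp: TextIO) -> int:
    """Model of sub_804867C: fputc(*ptr, stderr); fflush(stderr); return *ptr."""
    out.write(chr(cell[0] & 0xFF))  # fputc() writes its argument as an unsigned char
    out.flush()                     # mirrors the fflush(stderr) call
    return cell[0]

def stdin_getc(cell: List[int], out: TextIO, inp: TextIO) -> int:
    """Model of sub_80486FB: EOF check (sub_80485F4), then fgetc(stdin)."""
    ch = inp.read(1)
    if ch == "":
        # The binary raises a signal at EOF; an exception stands in for it here.
        raise EOFError("stdin exhausted")
    return ord(ch)

# Same order as the table at 0x0804C060; None marks entries not modelled here.
IO_OPS: List[Optional[IoFn]] = [
    print_char,  # 0: print_char
    None,        # 1: print_dec
    None,        # 2: print_hex
    None,        # 3: print_float
    stdin_getc,  # 4: stdin_getc
    None,        # 5: input_dec
    None,        # 6: input_hex
    None,        # 7: input_float
] + [None] * 10  # 8..17: sub_8048619 ... sub_8048967, purpose unknown

def demo(text: str) -> str:
    """Print text one character per call through io_ops entry 0."""
    out = io.StringIO()
    fn = IO_OPS[0]
    assert fn is not None
    for ch in text:
        fn([ord(ch)], out, io.StringIO())
    return out.getvalue()
```

The table index is all a caller needs: entry 0 prints, entry 4 reads, and the value always travels through the same cell.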
Does Baleful use print_char to print its messages? Printing one character per call isn't efficient, but it would help obfuscate the program. We can find out by placing a GDB breakpoint on print_char and seeing what happens:
pico59150@shell:~$ gdb baleful
GNU gdb (Ubuntu 7.7-0ubuntu3.1) 7.7
Copyright (C) 2014 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from baleful...(no debugging symbols found)...done.
(gdb) b *0x0804867C
Breakpoint 1 at 0x804867c
(gdb) run
Starting program: /home_users/pico59150/baleful
Breakpoint 1, 0x0804867c in ?? ()
(gdb) cont
Continuing.
P
Breakpoint 1, 0x0804867c in ?? ()
(gdb) cont
Continuing.
l
Breakpoint 1, 0x0804867c in ?? ()
(gdb) cont
Continuing.
e
Breakpoint 1, 0x0804867c in ?? ()
(gdb) cont
Continuing.
a
Breakpoint 1, 0x0804867c in ?? ()
(gdb) cont
Continuing.
s
Breakpoint 1, 0x0804867c in ?? ()
(gdb) cont
Continuing.
e
Breakpoint 1, 0x0804867c in ?? ()
(gdb)
Looks like that hypothesis is correct: each time print_char is hit, one more character of the password prompt ("Please enter your password") is printed. Whatever function calls print_char is probably responsible for printing the message. Let's see where we were called from by viewing the return address on the stack; since the breakpoint is on the function's first instruction, esp points right at it:
(gdb) info registers
eax 0xffffd5c4 -10812
ecx 0xf7fc988c -134440820
edx 0x804867c 134514300
ebx 0xffffd690 -10608
esp 0xffffd5ac 0xffffd5ac
ebp 0xffffd678 0xffffd678
esi 0x0 0
edi 0xffffd70c -10484
eip 0x804867c 0x804867c
eflags 0x212 [ AF IF ]
cs 0x23 35
ss 0x2b 43
ds 0x2b 43
es 0x2b 43
fs 0x0 0
gs 0x63 99
(gdb) x 0xffffd5ac
0xffffd5ac: 0x08049c56
(gdb)
The return address is 0x08049c56. Let's view the code in the vicinity of that:
.text:08049C2E loc_8049C2E: ; CODE XREF: sub_804898B+C0 j
.text:08049C2E ; DATA XREF: .rodata:off_8049DD4 o
.text:08049C2E mov eax, [ebp+var_34] ; jumptable 08048A4B case 32
.text:08049C31 add eax, 1
.text:08049C34 movzx eax, byte_804C0C0[eax]
.text:08049C3B movsx eax, al
.text:08049C3E mov [ebp+var_24], eax
.text:08049C41 mov eax, [ebp+var_24]
.text:08049C44 mov edx, io_ops[eax*4]
.text:08049C4B lea eax, [ebp+var_B4]
.text:08049C51 mov [esp], eax
.text:08049C54 call edx ; print_char
.text:08049C56 mov [ebp+var_B4], eax
.text:08049C5C add [ebp+var_34], 2
.text:08049C60 jmp short loc_8049C67
.text:08049C62 ; ---------------------------------------------------------------------------
.text:08049C62
.text:08049C62 loc_8049C62: ; CODE XREF: sub_804898B+B3 j
.text:08049C62 ; sub_804898B+C0 j
.text:08049C62 ; DATA XREF: ...
.text:08049C62 add [ebp+var_34], 1 ; jumptable 08048A4B default case
.text:08049C66 nop
.text:08049C67
.text:08049C67 loc_8049C67: ; CODE XREF: sub_804898B+9D j
.text:08049C67 ; sub_804898B+C6 j ...
.text:08049C67 mov eax, [ebp+var_34]
.text:08049C6A add eax, offset byte_804C0C0
.text:08049C6F movzx eax, byte ptr [eax]
.text:08049C72 cmp al, 1Dh
.text:08049C74 jnz loc_8048A2D
.text:08049C7A mov eax, [ebp+var_B4]
.text:08049C80
.text:08049C80 locret_8049C80: ; CODE XREF: sub_804898B+E4 j
.text:08049C80 leave
.text:08049C81 retn
.text:08049C81 sub_804898B endp
The call edx at 0x08049C54 is where the actual call took place. Let's look back a bit to see how we got there. This is case 32 in some unknown jumptable. The first thing it does is read a 4-byte value from [ebp-0x34]. This value is used as an offset into a memory area at 0x804C0C0, and the case reads the byte at 0x804C0C0+offset+1. From this we can deduce that the offset points to some data structure, and this case takes its second byte. That byte is used as an index into io_ops, from which a function pointer is read and then called. The function receives a pointer to [ebp-0xb4], and its return value is stored back there afterwards.
Once the I/O function has returned, the case increments the offset in [ebp-0x34] by 2 and jumps to 0x08049c67. 0x08049c67 reads the byte at the new offset and compares it to 0x1d. If it is 0x1d, sub_804898B returns with the value in [ebp-0xb4]; otherwise, it jumps to 0x08048a2d. It's not exactly clear what the function is doing at this point, so let's see what happens at 0x08048a2d:
.text:08048A2D loc_8048A2D: ; CODE XREF: sub_804898B+12E9 j
.text:08048A2D mov eax, [ebp+var_34]
.text:08048A30 add eax, offset byte_804C0C0
.text:08048A35 movzx eax, byte ptr [eax]
.text:08048A38 movsx eax, al
.text:08048A3B cmp eax, 20h ; switch 33 cases
.text:08048A3E ja loc_8049C62 ; jumptable 08048A4B default case
.text:08048A44 mov eax, ds:off_8049DD4[eax*4]
.text:08048A4B jmp eax ; switch jump
Looks like 0x08048a2d is the jumptable dispatcher. It once again uses [ebp-0x34] as an offset into 0x0804c0c0, a pattern that's starting to emerge. It takes the first byte at that offset and, if it is no greater than 0x20, uses it as an index into the jumptable; anything larger goes to the default case. Recall that 0x8049c2e is a jumptable case, so it is reached directly from here. It looked at the second byte at the offset and used that as a parameter. So the data pointed to by [ebp-0x34] always starts with a jumptable index, followed by some case-specific data.
What is at 0x0804c0c0 anyway? As it turns out, there's absolutely nothing but zeroes for the first 0x1000 bytes. Then there are some bytes which appear normal, though their purpose isn't yet known. But as we get to 0x0804D0F0, the data starts to lose any noticeable patterns and appears to be fairly random. It looks like there's some sort of encryption or packing going on. We'll get back to that much later.
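One common way to make "appears fairly random" concrete (not necessarily how it was judged here) is to compute the Shannon entropy of each block of a memory dump: zero-filled data scores 0 bits per byte, while compressed or encrypted data tends toward 8. A minimal sketch; the function names and the 0x100-byte block size are arbitrary choices.

```python
import math
from collections import Counter
from typing import List, Tuple

def shannon_entropy(block: bytes) -> float:
    """Bits of entropy per byte: 0.0 for constant data, up to 8.0 for uniform bytes."""
    if not block:
        return 0.0
    n = len(block)
    return 0.0 - sum((c / n) * math.log2(c / n) for c in Counter(block).values())

def entropy_map(data: bytes, base: int, block_size: int = 0x100) -> List[Tuple[int, float]]:
    """(address, entropy) for each block of a memory dump that starts at base."""
    return [(base + i, shannon_entropy(data[i:i + block_size]))
            for i in range(0, len(data), block_size)]
```

The dump itself could come from gdb's `dump binary memory` command. A run of blocks close to 8 bits per byte starting around 0x0804D0F0 would be consistent with the encrypted- or packed-looking data described above.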
Now, it still wasn't completely clear what I was dealing with, but I began to suspect that this was a bytecode VM. The theory makes sense: there's an offset into some data area, the first byte at that offset selects one of many cases, each case can read additional data from that offset, and the offset always moves on after a case finishes. Recall that 0x08049c2e, the case that calls the I/O functions, used only the second byte at the offset. Then it incremented the offset by 2 and went back to the main dispatcher. If the VM theory is correct, Baleful is advancing an instruction pointer and dispatching the next instruction.
The VM theory was actually quite plausible, so I decided to run with it. If it was true, that meant that everything in the 0x0804c0c0 area was a bytecode program that actually did everything. The I/O meta-function at 0x08049c2e would just be an instruction called by the bytecode program to communicate with the outside world. As obfuscation mechanisms go, it's a fairly good one. The new goal should be understanding enough of the VM to write a disassembler and reverse engineer the bytecode program.
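Under the VM theory, the dispatcher, the I/O case and the 0x1d check form a classic fetch-and-dispatch loop. The sketch below models just that loop with the one case understood so far. The names are hypothetical, the list `cells` stands for the 4-byte slots at ebp-0xb4, and opcodes that aren't modelled raise an error rather than being guessed at.

```python
from typing import Callable, Dict, List

HALT = 0x1D        # loc_8049C67 leaves the loop when the next opcode is 0x1d
MAX_OPCODE = 0x20  # loc_8048A2D: "cmp eax, 20h / ja default"

# A case receives the bytecode, the current offset and the cells at ebp-0xb4,
# and returns the next offset (every case ends by updating ipos).
Case = Callable[[bytes, int, List[int]], int]

def make_io_case(io_ops: List[Callable[[List[int]], int]]) -> Case:
    def io_case(code: bytes, ipos: int, cells: List[int]) -> int:
        fn = io_ops[code[ipos + 1]]  # second byte indexes io_ops
        cells[0] = fn(cells)         # pointer to [ebp-0xb4] in, result stored back
        return ipos + 2              # opcode byte + index byte
    return io_case

def run(code: bytes, ipos: int, cases: Dict[int, Case], cells: List[int]) -> int:
    """Fetch/dispatch loop; returns the offset of the 0x1d that stopped it."""
    while code[ipos] != HALT:
        op = code[ipos]
        if op > MAX_OPCODE:
            ipos += 1                # default case (loc_8049C62) skips one byte
        elif op in cases:
            ipos = cases[op](code, ipos, cells)
        else:
            raise NotImplementedError(f"opcode {op:#04x} not modelled")
    return ipos
```

Each further opcode we identify becomes one more entry in the `cases` dictionary; the loop itself shouldn't need to change.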
.rodata:08049DD4 vm_instrs dd offset loc_8048A4D ; DATA XREF: sub_804898B+B9 r
.rodata:08049DD4 dd offset loc_8048A56 ; jump table for switch statement
.rodata:08049DD4 dd offset loc_8048A8F
.rodata:08049DD4 dd offset loc_8048BC4
.rodata:08049DD4 dd offset loc_8048CF9
.rodata:08049DD4 dd offset loc_8048E2F
.rodata:08049DD4 dd offset loc_8048F91
.rodata:08049DD4 dd offset loc_80495F5
.rodata:08049DD4 dd offset loc_8049649
.rodata:08049DD4 dd offset loc_80490C6
.rodata:08049DD4 dd offset loc_80491FB
.rodata:08049DD4 dd offset loc_804959E
.rodata:08049DD4 dd offset loc_8049330
.rodata:08049DD4 dd offset loc_8049467
.rodata:08049DD4 dd offset loc_80496D1
.rodata:08049DD4 dd offset loc_804969D
.rodata:08049DD4 dd offset loc_80496EC
.rodata:08049DD4 dd offset loc_8049715
.rodata:08049DD4 dd offset loc_804973E
.rodata:08049DD4 dd offset loc_8049767
.rodata:08049DD4 dd offset loc_8049790
.rodata:08049DD4 dd offset loc_80497B9
.rodata:08049DD4 dd offset loc_80497E2
.rodata:08049DD4 dd offset loc_80498F0
.rodata:08049DD4 dd offset loc_8049A02
.rodata:08049DD4 dd offset loc_8049A86
.rodata:08049DD4 dd offset loc_8049AB9
.rodata:08049DD4 dd offset loc_8049AEC
.rodata:08049DD4 dd offset loc_8049B43
.rodata:08049DD4 dd offset loc_8049C62
.rodata:08049DD4 dd offset loc_8049B92
.rodata:08049DD4 dd offset loc_8049BF8
.rodata:08049DD4 dd offset io_8049C2E
There's a huge, intimidating jumptable staring us in the face, and of its 33 instructions, we understand only one. Let's see which ones are easy enough to identify right away.
.text:08048A4D loc_8048A4D: ; DATA XREF: .rodata:vm_instrs o
.text:08048A4D add [ebp+ipos], 1 ; jumptable 08048A4B case 0
.text:08048A51 jmp loc_8049C67
A case that does absolutely nothing but increment the instruction pointer (which I now call ipos). I'm willing to bet this is the equivalent of the NOP found on basically every CPU architecture. This is probably some sort of assembly-language bytecode, then. Two instructions down, 31 to go. What else can we identify?
Well, for me, pretty much nothing at all. 0x08049C62, which implements opcode 0x1d, is fairly easy to identify as the VM termination instruction, but that's not much help. Every other case seemed incomprehensible from a static analysis perspective, using a bunch of local variables whose meaning I didn't know. So I decided to go back to GDB and trace the execution path of the program after the I/O dispatcher (opcode 0x20).
Let's restart the program and set two breakpoints: one inside the I/O dispatcher and one on the main VM loop. We want to start tracing after the first I/O call, though, so we only set the second breakpoint once the first one has been hit:
pico59150@shell:~$ gdb baleful
Reading symbols from baleful...(no debugging symbols found)...done.
(gdb) b *0x08049C54
Breakpoint 1 at 0x8049c54
(gdb) run
Starting program: /home_users/pico59150/baleful
Breakpoint 1, 0x08049c54 in ?? ()
(gdb) b *0x08048A2D
Breakpoint 2 at 0x8048a2d
(gdb)
We want to see what instructions are being executed, so let's view the opcode every time we hit the main dispatcher:
(gdb) cont
Continuing.
P
Breakpoint 2, 0x08048a2d in ?? ()
(gdb) si
0x08048a30 in ?? ()
(gdb) info registers
eax 0x1041 4161
ecx 0xf7fc988c -134440820
edx 0x0 0
ebx 0xffffd690 -10608
esp 0xffffd5b0 0xffffd5b0
ebp 0xffffd678 0xffffd678
esi 0x0 0
edi 0xffffd70c -10484
eip 0x8048a30 0x8048a30
eflags 0x297 [ CF PF AF SF IF ]
cs 0x23 35
ss 0x2b 43
ds 0x2b 43
es 0x2b 43
fs 0x0 0
gs 0x63 99
(gdb) si
0x08048a35 in ?? ()
(gdb) si
0x08048a38 in ?? ()
(gdb) si
0x08048a3b in ?? ()
(gdb) info registers
eax 0x1 1
ecx 0xf7fc988c -134440820
edx 0x0 0
ebx 0xffffd690 -10608
esp 0xffffd5b0 0xffffd5b0
ebp 0xffffd678 0xffffd678
esi 0x0 0
edi 0xffffd70c -10484
eip 0x8048a3b 0x8048a3b
eflags 0x202 [ IF ]
cs 0x23 35
ss 0x2b 43
ds 0x2b 43
es 0x2b 43
fs 0x0 0
gs 0x63 99
(gdb)
Offset is 0x1041, opcode is 0x1. Let's keep doing this for a while.
(gdb) cont
Continuing.
Breakpoint 2, 0x08048a2d in ?? ()
(gdb) si
0x08048a30 in ?? ()
(gdb) info registers
eax 0x1a79 6777
ecx 0xf7fc988c -134440820
edx 0x0 0
ebx 0xffffd690 -10608
esp 0xffffd5b0 0xffffd5b0
ebp 0xffffd678 0xffffd678
esi 0x0 0
edi 0xffffd70c -10484
eip 0x8048a30 0x8048a30
eflags 0x293 [ CF AF SF IF ]
cs 0x23 35
ss 0x2b 43
ds 0x2b 43
es 0x2b 43
fs 0x0 0
gs 0x63 99
(gdb) si
0x08048a35 in ?? ()
(gdb) si
0x08048a38 in ?? ()
(gdb) si
0x08048a3b in ?? ()
(gdb) info registers
eax 0x18 24
ecx 0xf7fc988c -134440820
edx 0x0 0
ebx 0xffffd690 -10608
esp 0xffffd5b0 0xffffd5b0
ebp 0xffffd678 0xffffd678
esi 0x0 0
edi 0xffffd70c -10484
eip 0x8048a3b 0x8048a3b
eflags 0x206 [ PF IF ]
cs 0x23 35
ss 0x2b 43
ds 0x2b 43
es 0x2b 43
fs 0x0 0
gs 0x63 99
(gdb) cont
Continuing.
Breakpoint 2, 0x08048a2d in ?? ()
(gdb) si
0x08048a30 in ?? ()
(gdb) info registers
eax 0x1a80 6784
ecx 0xf7fc988c -134440820
edx 0x6c 108
ebx 0xffffd690 -10608
esp 0xffffd5b0 0xffffd5b0
ebp 0xffffd678 0xffffd678
esi 0x0 0
edi 0xffffd70c -10484
eip 0x8048a30 0x8048a30
eflags 0x283 [ CF SF IF ]
cs 0x23 35
ss 0x2b 43
ds 0x2b 43
es 0x2b 43
fs 0x0 0
gs 0x63 99
(gdb) si
0x08048a35 in ?? ()
(gdb) si
0x08048a38 in ?? ()
(gdb) si
0x08048a3b in ?? ()
(gdb) info registers
eax 0xf 15
ecx 0xf7fc988c -134440820
edx 0x6c 108
ebx 0xffffd690 -10608
esp 0xffffd5b0 0xffffd5b0
ebp 0xffffd678 0xffffd678
esi 0x0 0
edi 0xffffd70c -10484
eip 0x8048a3b 0x8048a3b
eflags 0x202 [ IF ]
cs 0x23 35
ss 0x2b 43
ds 0x2b 43
es 0x2b 43
fs 0x0 0
gs 0x63 99
(gdb) cont
Continuing.
Breakpoint 2, 0x08048a2d in ?? ()
(gdb) si
0x08048a30 in ?? ()
(gdb) info registers
eax 0x103f 4159
ecx 0xf7fc988c -134440820
edx 0x1a85 6789
ebx 0xffffd690 -10608
esp 0xffffd5b0 0xffffd5b0
ebp 0xffffd678 0xffffd678
esi 0x0 0
edi 0xffffd70c -10484
eip 0x8048a30 0x8048a30
eflags 0x216 [ PF AF IF ]
cs 0x23 35
ss 0x2b 43
ds 0x2b 43
es 0x2b 43
fs 0x0 0
gs 0x63 99
(gdb) si
0x08048a35 in ?? ()
(gdb) si
0x08048a38 in ?? ()
(gdb) si
0x08048a3b in ?? ()
(gdb) info registers
eax 0x20 32
ecx 0xf7fc988c -134440820
edx 0x1a85 6789
ebx 0xffffd690 -10608
esp 0xffffd5b0 0xffffd5b0
ebp 0xffffd678 0xffffd678
esi 0x0 0
edi 0xffffd70c -10484
eip 0x8048a3b 0x8048a3b
eflags 0x206 [ PF IF ]
cs 0x23 35
ss 0x2b 43
ds 0x2b 43
es 0x2b 43
fs 0x0 0
gs 0x63 99
(gdb)
Let's summarize what we see here. It's pretty important, since this was the big breakthrough that let me quickly figure out the rest of the VM. After leaving the I/O instruction, we're at offset 0x1041. Since the I/O case advances the offset by 2, the I/O instruction must have been at 0x103f. The opcode at 0x1041 is 0x1. After running opcode 0x1, we go back to the main VM loop, but the offset has suddenly jumped to 0x1a79. The most likely explanation is that opcode 0x1 modified ipos. Opcode 0x18 runs next; it seems to carry a lot of associated data, since ipos afterwards is 0x1a80, seven bytes further on. The opcode at 0x1a80 is 0xf. Afterwards, ipos has changed back to 0x103f, about to run the I/O instruction again, so 0xf is the only candidate for what modified ipos.
If we allow the program to continue and run the I/O instruction, the "l" character now gets printed. This is the second character in the password prompt. So the I/O instruction runs, then opcode 0x1 runs and completely changes ipos. At the new ipos, we run opcodes 0x18 and 0xf. Executing opcode 0xf sends ipos back to where it was before, and the I/O instruction runs again. But we have new data being passed to it! Which instruction did that? Assuming that each opcode does one specific thing, 0x18 is the only possibility.
The way this is arranged made me think of returning from a function, getting some new data, and then going back to that same function. That would make 0x1 RET and 0xf CALL. And since 0x18 gets the new character to be printed, it's probably similar to MOV. As it turns out, simply figuring that out is amazingly helpful towards getting all 33 opcodes figured out. RET and CALL deal with the stack, and MOV deals with registers, so they'll shed a lot of light on the VM.
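The length reasoning from this trace can be mechanised: given (ipos, opcode) pairs read off at the dispatcher breakpoint, a small forward step gives the length of the opcode just executed, and anything else hints at a control-flow change. A minimal sketch; `infer_lengths`, the 16-byte cut-off and the use of None for "changed ipos" are illustrative choices, and the first entry (0x103f) is the inferred location of the first I/O instruction rather than a breakpoint hit.

```python
from typing import Dict, List, Optional, Tuple

def infer_lengths(trace: List[Tuple[int, int]], max_len: int = 16) -> Dict[int, Optional[int]]:
    """Guess each opcode's length from consecutive (ipos, opcode) samples.

    A forward step of 1..max_len bytes is read as the length of the opcode just
    executed; anything else (a backward step or a long jump) is read as a
    control-flow transfer and recorded as None. max_len is an arbitrary cut-off.
    Only the first observation of each opcode is kept.
    """
    lengths: Dict[int, Optional[int]] = {}
    for (ipos, op), (next_ipos, _) in zip(trace, trace[1:]):
        step = next_ipos - ipos
        lengths.setdefault(op, step if 0 < step <= max_len else None)
    return lengths

# (ipos, opcode) pairs from the session above.
TRACE = [(0x103F, 0x20), (0x1041, 0x01), (0x1A79, 0x18), (0x1A80, 0x0F), (0x103F, 0x20)]
```

On this trace it gives 2 bytes for 0x20, 7 bytes for 0x18, and flags 0x1 and 0xf as the opcodes that moved ipos, the same conclusion reached by hand.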
Let's look at both 0x1 and 0xf first, which we believe are RET and CALL:
.text:08048A56 ret_8048A56: ; CODE XREF: sub_804898B+C0 j
.text:08048A56 ; DATA XREF: .rodata:vm_instrs o
.text:08048A56 mov eax, [ebp+var_38] ; jumptable 08048A4B case 1
.text:08048A59 add eax, offset byte_804C0C0
.text:08048A5E mov eax, [eax]
.text:08048A60 mov [ebp+var_14], eax
.text:08048A63 cmp [ebp+var_14], 0
.text:08048A67 jnz short loc_8048A74
.text:08048A69 mov eax, [ebp+var_B4]
.text:08048A6F jmp locret_8049C80
.text:08048A74 ; ---------------------------------------------------------------------------
.text:08048A74
.text:08048A74 loc_8048A74: ; CODE XREF: sub_804898B+DC j
.text:08048A74 mov eax, [ebp+var_38]
.text:08048A77 add eax, 4
.text:08048A7A mov [ebp+var_38], eax
.text:08048A7D mov eax, [ebp+var_14]
.text:08048A80 mov [ebp+ipos], eax
.text:08048A83 mov [ebp+var_28], 0
.text:08048A8A jmp loc_8049C67
.text:0804969D call_804969D: ; CODE XREF: sub_804898B+C0 j
.text:0804969D ; DATA XREF: .rodata:vm_instrs o
.text:0804969D mov eax, [ebp+ipos] ; jumptable 08048A4B case 15
.text:080496A0 add eax, 1
.text:080496A3 add eax, offset byte_804C0C0
.text:080496A8 mov eax, [eax]
.text:080496AA mov [ebp+var_10], eax
.text:080496AD mov eax, [ebp+var_38]
.text:080496B0 sub eax, 4
.text:080496B3 mov [ebp+var_38], eax
.text:080496B6 mov eax, [ebp+var_38]
.text:080496B9 add eax, offset byte_804C0C0
.text:080496BE mov edx, [ebp+ipos]
.text:080496C1 add edx, 5
.text:080496C4 mov [eax], edx
.text:080496C6 mov eax, [ebp+var_10]
.text:080496C9 mov [ebp+ipos], eax
.text:080496CC jmp loc_8049C67
We can see that both of these functions use [ebp-0x38]. RET uses it as an offset into the 0x0804c0c0 area and reads a 4-byte value from that offset. If that value is 0, it leaves the VM entirely, returning the value in [ebp-0xb4]; otherwise it increments the offset in [ebp-0x38] by 4 bytes and stores the value it read in ipos. CALL does the opposite. It first reads a 4-byte operand from the bytecode instruction, which is presumably the offset we want to jump to. It then decrements [ebp-0x38] by 4 bytes and stores ipos + 5 at the new offset, which points to the instruction after the 5-byte CALL. Finally, it puts the operand into ipos. These functions make it obvious that 0x1 and 0xf are indeed RET and CALL, and furthermore, that [ebp-0x38] is a stack pointer offset. Very useful to know.
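The two handlers translate almost line for line into a small model. In this sketch, `mem` stands for the area at 0x0804c0c0 and `sp` for the offset kept in [ebp-0x38]; the function names are hypothetical, and the VM's real initial stack offset isn't known at this point.

```python
import struct
from typing import Optional, Tuple

def vm_call(mem: bytearray, ipos: int, sp: int) -> Tuple[int, int]:
    """Opcode 0x0f: push ipos + 5, then jump to the imm32 after the opcode."""
    (target,) = struct.unpack_from("<I", mem, ipos + 1)
    sp -= 4                                    # [ebp-0x38] -= 4
    struct.pack_into("<I", mem, sp, ipos + 5)  # 1 opcode byte + 4 operand bytes
    return target, sp

def vm_ret(mem: bytearray, sp: int) -> Tuple[Optional[int], int]:
    """Opcode 0x01: pop the return offset; a popped 0 ends the VM (None)."""
    (target,) = struct.unpack_from("<I", mem, sp)
    if target == 0:
        return None, sp                        # jmp locret_8049C80, sp untouched
    return target, sp + 4
```

For example, with the bytes 0x0f 0x3f 0x10 0x00 0x00 at offset 0x1a80, `vm_call` jumps to 0x103f and pushes 0x1a85. That matches the edx value (ipos + 5, left over from the CALL handler) in the register dump taken right after the CALL.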
Now we want to look at MOV to see how it works, but the function looks very complex at first glance. Let's again use GDB to trace through it, at the point where the "l" character gets loaded. Opcode 0x18, which is MOV, has its case at 0x08049A02. Let's see which part of the case we end up branching to.
.text:08049A02 loc_8049A02: ; CODE XREF: sub_804898B+C0 j
.text:08049A02 ; DATA XREF: .rodata:vm_instrs o
.text:08049A02 mov eax, [ebp+ipos] ; jumptable 08048A4B case 24
.text:08049A05 add eax, 1
.text:08049A08 movzx eax, byte_804C0C0[eax]
.text:08049A0F movsx eax, al
.text:08049A12 mov [ebp+var_C], eax
.text:08049A15 mov eax, [ebp+var_C]
.text:08049A18 test eax, eax
.text:08049A1A jz short loc_8049A23
.text:08049A1C cmp eax, 1
.text:08049A1F jz short loc_8049A57
.text:08049A21 jmp short loc_8049A81
There are three branches we could end up in. Let's set breakpoints on the first two and see which one we hit:
(gdb) break *0x8049A23
Breakpoint 3 at 0x8049a23
(gdb) break *0x8049A57
Breakpoint 4 at 0x8049a57
(gdb) cont
Continuing.
Breakpoint 4, 0x08049a57 in ?? ()
(gdb)
Alright, we end up getting to 0x08049a57. We should first dump the data ipos points to:
(gdb) x $ebp-0x34
0xffffd644: 0x00001a79
(gdb) x/16xb 0x0804db39
0x804db39: 0x18 0x01 0x00 0x6c 0x00 0x00 0x00 0x0f
0x804db41: 0x3f 0x10 0x00 0x00 0x18 0x01 0x00 0x65
(gdb)
The first seven bytes (0x18 0x01 0x00 0x6c 0x00 0x00 0x00) are the actual MOV instruction. Here we see 0x18, the opcode; 0x01, which is used to decide the case; a 0x00 byte; and the 4-byte value 0x6c (little endian). 0x6c is simply the ASCII code for the letter "l", so this is almost certainly loading the next character to print. The next five bytes, 0x0f 0x3f 0x10 0x00 0x00, are the CALL to 0x103f that we saw in the trace. Now that we know the data, how will the function play out?
.text:08049A57 loc_8049A57: ; CODE XREF: sub_804898B+1094 j
.text:08049A57 mov eax, [ebp+ipos]
.text:08049A5A add eax, 2
.text:08049A5D movzx eax, byte_804C0C0[eax]
.text:08049A64 movsx eax, al
.text:08049A67 mov edx, [ebp+ipos]
.text:08049A6A add edx, 3
.text:08049A6D add edx, offset byte_804C0C0
.text:08049A73 mov edx, [edx]
.text:08049A75 mov [ebp+eax*4+var_B4], edx
.text:08049A7C add [ebp+ipos], 7
.text:08049A80 nop
It loads the third byte of our instruction, which is 0x00. Then it loads the 4-byte value after that, which is 0x6c. Finally, it stores that value into [ebp-0xb4+eax*4], which is just [ebp-0xb4] in this case. Recall that the I/O instruction passes a pointer to [ebp-0xb4] to print_char. However, it turns out that ebp-0xb4 is not just the address of one 4-byte value, but the start of an entire array of them. Depending on the third byte, the MOV instruction can write to any index of this array. MOV usually loads values into registers, so it would appear that ebp-0xb4 is a register area for the VM bytecode. And if ebp-0xb4 is a register area, the I/O opcode always passes a pointer to the first register to whichever I/O function it calls, and stores the return value back into that register.
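The mode-1 MOV path can be checked against the bytes dumped earlier. This sketch models only that path; `exec_mov_imm`, `vm_addr` and the size of the register list are hypothetical (the real number of VM registers isn't known yet), and the other two branches of case 24 are not modelled.

```python
import struct
from typing import List

VM_BASE = 0x0804C0C0  # byte_804C0C0: VM offsets are relative to this address

# The 16 bytes printed by "x/16xb 0x0804db39" above (VM offset 0x1a79).
DUMP = bytes([0x18, 0x01, 0x00, 0x6C, 0x00, 0x00, 0x00, 0x0F,
              0x3F, 0x10, 0x00, 0x00, 0x18, 0x01, 0x00, 0x65])

def vm_addr(offset: int) -> int:
    """Translate a VM offset (the value kept in ipos) into a process address."""
    return VM_BASE + offset

def exec_mov_imm(code: bytes, ipos: int, regs: List[int]) -> int:
    """Mode-1 form of opcode 0x18 (loc_8049A57): regs[code[ipos+2]] = imm32."""
    if code[ipos] != 0x18 or code[ipos + 1] != 1:
        raise ValueError("only the mode-1 form of opcode 0x18 is modelled")
    reg = code[ipos + 2]                               # third byte: register index
    (imm,) = struct.unpack_from("<I", code, ipos + 3)  # little-endian dword
    regs[reg] = imm                                    # [ebp-0xb4 + reg*4] = imm
    return ipos + 7                                    # "add [ebp+ipos], 7"
```

Executing it on the dump at offset 0 puts 0x6c ("l") into register 0 and advances ipos by 7, landing exactly on the CALL at offset 7.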
Now we've figured out a lot more about how data is accessed inside the VM. Knowing about both the register and stack implementations makes it very easy to figure out the rest of the instructions. Since this article is already quite long, we'll pick this back up in part 2. In the second part of this write-up, we'll figure out the rest of the instructions, write a disassembler for the bytecode, and then reverse engineer the bytecode to get the flag.