HSF 2016: Isengard (re500)
Isengard is a 500 point reverse engineering problem. We're given a binary to look at, but little other direction. Let's see what we can find.
First, it's a good idea to try running the program:
$ ./isengard
starting stub ...
.___ .___ ____
| | ______ ____ ____ _________ _______ __| _/ ___ _/_ |
| |/ ___// __ \ / \ / ___\__ \\_ __ \/ __ | \ \/ /| |
| |\___ \\ ___/| | \/ /_/ > __ \| | \/ /_/ | \ / | |
|___/____ >\___ >___| /\___ (____ /__| \____ | \_/ |___|
\/ \/ \//_____/ \/ \/
============================================================================================================
============================================================================================================
============================================================================================================
\\\\\\ They are taking the Hobbits! \\\\\\
\\\\\\ Find them and bring them back! \\\\\\
\\\\\\ https://www.youtube.com/watch?v=z9Uz1icjwrM \\\\\\
\\\\\\ Password: youshallnotpass \\\\\\
============================================================================================================
============================================================================================================
============================================================================================================
Password:
There's an intro screen, a password given to us ("youshallnotpass"), and then a password prompt. If we try entering the password it tells us, the program appears to hang waiting for input. Pressing Enter enough times eventually causes it to terminate with the message "woaw".
That doesn't tell us much, so let's start disassembling Isengard:
.text:0BADCC46 main proc near ; CODE XREF: start+8 p
.text:0BADCC46
.text:0BADCC46 var_D0 = dword ptr -0D0h
.text:0BADCC46 var_CC = dword ptr -0CCh
.text:0BADCC46 var_C8 = dword ptr -0C8h
.text:0BADCC46 var_C4 = dword ptr -0C4h
.text:0BADCC46 var_C0 = dword ptr -0C0h
.text:0BADCC46 var_BC = dword ptr -0BCh
.text:0BADCC46 var_AC = dword ptr -0ACh
.text:0BADCC46 var_A8 = byte ptr -0A8h
.text:0BADCC46 var_A0 = byte ptr -0A0h
.text:0BADCC46 var_80 = dword ptr -80h
.text:0BADCC46 arg_0 = dword ptr 8
.text:0BADCC46
.text:0BADCC46 push ebp
.text:0BADCC47 mov ebp, esp
.text:0BADCC49 push edi
.text:0BADCC4A push esi
.text:0BADCC4B push ebx
.text:0BADCC4C sub esp, 0D8h
.text:0BADCC52 mov eax, dword_BAFB484
.text:0BADCC57 mov [ebp+var_AC], 0
.text:0BADCC61 push offset aStartingStub__ ; "starting stub ...\n"
.text:0BADCC66 mov [ebp+var_BC], eax
.text:0BADCC6C call sub_BADC25A
main() begins by passing "starting stub ...\n" to sub_BADC25A. Since that's the first message printed when we start Isengard, we can deduce that sub_BADC25A is printf().
Looking further into main(), it's not immediately apparent what's going on at first. However, the use of strings like /dev/urandom jumps out. What's happening there?
.text:0BADCD54 push 0
.text:0BADCD56 push offset aDevUrandom ; "/dev/urandom"
.text:0BADCD5B call sub_BADC18E
.text:0BADCD60 add esp, 10h
.text:0BADCD63 test eax, eax
.text:0BADCD65 mov edi, eax
.text:0BADCD67 js short try_random
.text:0BADCD69 lea edx, [ebp+var_A0]
.text:0BADCD6F push eax
.text:0BADCD70 push 20h
.text:0BADCD72 push edx
.text:0BADCD73 push edi
.text:0BADCD74 mov [ebp+var_C0], edx
.text:0BADCD7A call sub_BADC187
Translated into pseudocode, it basically looks like:
ret = sub_BADC18E("/dev/urandom", 0)
sub_BADC187(ret, &var_A0, 0x20)
This seems like an open() followed by a read(). If we actually look into the implementations of those two subroutines, that's confirmed to be the case. Both of them set up the correct arguments and invoke int 0x80 (the Linux syscall instruction). So we've found open() and read() statically linked into Isengard.
In fact, several adjacent subroutines are also statically linked POSIX calls. We can identify them using the Linux syscall table and name them, which will make later reversing easier. Having named the Unix API functions, what new insights do we gain?
Near the middle of main() is a strange call to mmap():
.text:0BADCE77 push 0 ; int
.text:0BADCE79 push 0FFFFFFFFh ; fd
.text:0BADCE7B mov [ebp+var_D0], edx
.text:0BADCE81 or ecx, 12h
.text:0BADCE84 push ecx ; flags
.text:0BADCE85 push 7 ; prot
.text:0BADCE87 mov ecx, eax
.text:0BADCE89 mov esi, [ebx+0Ch]
.text:0BADCE8C and ecx, 0FFFh
.text:0BADCE92 and eax, 0FFFFF000h
.text:0BADCE97 lea ecx, [esi+ecx+0FFFh]
.text:0BADCE9E and ecx, 0FFFFF000h
.text:0BADCEA4 push ecx ; len
.text:0BADCEA5 push eax ; start
.text:0BADCEA6 call mmap
Why is mmap() being used? Curiously, the prot value is 7, meaning that we're mapping pages that are readable, writable, and executable. It seems as if we're generating executable code on the fly to run later. For now, it's not entirely clear what role that plays. We'll come back to it later.
We also know that Isengard initially asks us for a password. If we enter "youshallnotpass", as it tells us to, we go into that weird input loop where hitting Enter enough times exits the program. If we enter anything else, it prints "Integrity error. Good password ?" and quits.
So entering an invalid password causes a data integrity error. That implies the password is being used to decode some kind of data. In that case, it's a good idea to see what it does with the password. Looking for references to read(), we can find the password input function as sub_BADCA35. Let's call it authenticate().
authenticate() does a lot, and without context, it's hard to understand. But we can discern the key parts:
- The "Password: " prompt is printed in loc_BADCA72
- Each character of the password is read in a loop starting at loc_BADCAB9
- After the password is read, some unknown processing occurs in loc_BADCAEC and loc_BADCB5E. For whatever reason, the former has a printf("Curve25519 key exchange failed !\n") call. This implies that the password is used in some kind of crypto algorithm.
- loc_BADCB5E compares the results of this processing against a static 0x20 byte array. It prints the integrity error if they're not equal. Otherwise, the function returns normally.
Thus, authenticate() gets a password, runs it through some crypto-related operations, and compares it against a known correct value. Building off our earlier guess, it seems like authenticate() is using the password to decrypt some data. If that's the case, the code after authenticate() likely does something with that data.
authenticate() is called from main(). In fact, it's invoked just before the mmap() section. So mmap(), which we know is creating executable code, appears to be generating this code from the data we decrypted. What exactly happens between authenticate() and mmap()?
.text:0BADCE20 loc_BADCE20: ; CODE XREF: main+1C6 j
.text:0BADCE20 call authenticate
.text:0BADCE25 mov eax, [ebp+var_BC]
.text:0BADCE2B mov [ebp+var_C0], 0
.text:0BADCE35 mov [ebp+var_C8], 0
.text:0BADCE3F mov edx, [eax+1Ch]
.text:0BADCE42 add edx, dword_BAFB484
.text:0BADCE48 movzx eax, word ptr [eax+2Ch]
.text:0BADCE4C lea ebx, [edx+8]
.text:0BADCE4F mov [ebp+var_CC], eax
.text:0BADCE55
.text:0BADCE55 loc_BADCE55: ; CODE XREF: main+2A8 j
.text:0BADCE55 mov edi, [ebp+var_CC]
.text:0BADCE5B cmp [ebp+var_C8], edi
.text:0BADCE61 jge loc_BADCEF3
.text:0BADCE67 cmp dword ptr [ebx-8], 1
.text:0BADCE6B jnz short loc_BADCEE5
.text:0BADCE6D mov eax, [ebx]
.text:0BADCE6F push ecx
.text:0BADCE70 push ecx
.text:0BADCE71 mov ecx, ds:dword_BAFB58C
mmap()'s purpose is to create a memory region, at a specific address with a specific length. When mmap() is called, EAX holds the address and ECX holds the length. Working backwards through the above code sample, we see that these values ultimately originate from var_BC. It seems like an in-memory structure is being parsed to get the mmap() parameters. What is this structure?
Let's use GDB to figure that out:
(gdb) b *0x0BADCE25
Breakpoint 1 at 0xbadce25
(gdb) r
Starting program: /home/user/isengard
[...REDACTED FOR BREVITY...]
Password:
Breakpoint 1, 0x0badce25 in ?? ()
(gdb) si
0x0badce2b in ?? ()
(gdb) info registers
eax 0xda8072c 229115692
ecx 0x0 0
edx 0xffffd3b8 -11336
ebx 0xffffd650 -10672
esp 0xffffd5f8 0xffffd5f8
ebp 0xffffd6d0 0xffffd6d0
esi 0xffffd628 -10712
edi 0x0 0
eip 0xbadce2b 0xbadce2b
eflags 0x286 [ PF SF IF ]
cs 0x23 35
ss 0x2b 43
ds 0x2b 43
es 0x2b 43
fs 0x0 0
gs 0x0 0
(gdb) x $eax
0xda8072c: 0x464c457f
Converting that value into ASCII, we get "\x7FELF". This is the magic number for an ELF file! It seems that there's an ELF file in memory, being loaded with mmap(). Isengard's purpose is hiding a real binary inside of itself.
Using GDB's dump command, we can dump the ELF into a file. Running the file command confirms it is a valid ELF. Now we can begin reverse engineering the apparent meat of the problem:
.text:080486E7 main proc near ; DATA XREF: start+17 o
.text:080486E7
.text:080486E7 var_10 = dword ptr -10h
.text:080486E7 var_C = dword ptr -0Ch
.text:080486E7 var_4 = dword ptr -4
.text:080486E7
.text:080486E7 lea ecx, [esp+4]
.text:080486EB and esp, 0FFFFFFF0h
.text:080486EE push dword ptr [ecx-4]
.text:080486F1 push ebp
.text:080486F2 mov ebp, esp
.text:080486F4 push ecx
.text:080486F5 sub esp, 14h
.text:080486F8 mov [ebp+var_10], 0
.text:080486FF jmp short loc_804870A
.text:08048701 ; ---------------------------------------------------------------------------
.text:08048701
.text:08048701 loc_8048701: ; CODE XREF: main+29 j
.text:08048701 call sub_804862A
.text:08048706 add [ebp+var_10], 1
.text:0804870A
.text:0804870A loc_804870A: ; CODE XREF: main+18 j
.text:0804870A mov eax, [ebp+var_10]
.text:0804870D cmp eax, 25h
.text:08048710 jbe short loc_8048701
.text:08048712 mov [ebp+var_C], 1
.text:08048719 mov [ebp+var_10], 0
.text:08048720 jmp short loc_8048753
It begins by calling sub_804862A in a loop 0x26 times. sub_804862A's main section is as follows:
.text:0804863E push 2 ; nbytes
.text:08048640 lea eax, [ebp+buf]
.text:08048643 push eax ; buf
.text:08048644 push 0 ; fd
.text:08048646 call _read
.text:0804864B add esp, 10h
.text:0804864E mov eax, ds:dword_804A510
.text:08048653 test eax, eax
.text:08048655 jnz short loc_8048660
.text:08048657 mov [ebp+var_18], 0
.text:0804865E jmp short loc_804866F
.text:08048660 ; ---------------------------------------------------------------------------
.text:08048660
.text:08048660 loc_8048660: ; CODE XREF: readchar+2B j
.text:08048660 mov eax, ds:dword_804A510
.text:08048665 mov eax, ds:dword_804A520[eax*4]
.text:0804866C mov [ebp+var_18], eax
.text:0804866F
.text:0804866F loc_804866F: ; CODE XREF: readchar+34 j
.text:0804866F sub esp, 4
.text:08048672 push 1
.text:08048674 lea eax, [ebp+buf]
.text:08048677 push eax
.text:08048678 push [ebp+var_18]
.text:0804867B call sub_804855B
.text:08048680 add esp, 10h
.text:08048683 mov [ebp+var_14], eax
.text:08048686 mov eax, ds:dword_804A510
.text:0804868B mov edx, [ebp+var_14]
.text:0804868E mov ds:dword_804A520[eax*4], edx
.text:08048695 mov eax, ds:dword_804A510
.text:0804869A add eax, 1
.text:0804869D mov ds:dword_804A510, eax
Essentially, it does this:
- Reads 2 bytes of keyboard input into a stack buffer. (This is why, after entering "youshallnotpass" as the password, we had to hit Enter a bunch of times to quit)
- Checks the value of dword_804A510. If it's 0, var_18 is set to 0. Otherwise, dword_804A510 is used as an index into the dword_804A520 array, and that fetched value becomes var_18.
- Calls sub_804855B(var_18, buf, 1)
- Uses dword_804A510 is used as an index into the dword_804A520 array, storing the return value of sub_804855B() there
- Increments dword_804A510
We can deduce what a lot of these variables and routines are for. This overall subroutine is used to read keyboard input, process it into an int, and store it in the dword_804A520 array. Each time this subroutine is called (0x26 times in total), dword_804A510 (the index into the array) is incremented.
Pseudocode representing what's going on would look like this:
# outbuf is the dword_804A520 array
# outlen is dword_804A510
read(0, buf, 2)
var = 0
if (outlen > 0) var = outbuf[outlen]
outbuf[outlen] = sub_804855B(var, buf, 1)
outlen++
Presumably, this processed keyboard input that we're collecting will be used later in main(). But we're missing two key things before we can understand this routine:
- What does sub_804855B() do?
- How is outbuf[outlen] being referenced before any processed characters have been written there? When outlen=1, we're setting var to outbuf[1], but outbuf[1] is filled in with sub_804855B()'s output later.
sub_804855B() does the following:
.text:0804855B obfuscate proc near ; CODE XREF: readchar+51 p
.text:0804855B
.text:0804855B buf = dword ptr -4
.text:0804855B key = dword ptr 8
.text:0804855B arg_4 = dword ptr 0Ch
.text:0804855B len = dword ptr 10h
.text:0804855B
.text:0804855B push ebp
.text:0804855C mov ebp, esp
.text:0804855E sub esp, 10h
.text:08048561 mov eax, [ebp+arg_4]
.text:08048564 mov [ebp+buf], eax
.text:08048567 not [ebp+key]
.text:0804856A jmp short loc_8048593
.text:0804856C ; ---------------------------------------------------------------------------
.text:0804856C
.text:0804856C loc_804856C: ; CODE XREF: obfuscate+43 j
.text:0804856C mov eax, [ebp+buf]
.text:0804856F lea edx, [eax+1]
.text:08048572 mov [ebp+buf], edx
.text:08048575 movzx eax, byte ptr [eax]
.text:08048578 movzx eax, al
.text:0804857B xor eax, [ebp+key]
.text:0804857E movzx eax, al
.text:08048581 mov eax, dword_804A060[eax*4]
.text:08048588 mov edx, [ebp+key]
.text:0804858B shr edx, 8
.text:0804858E xor eax, edx
.text:08048590 mov [ebp+key], eax
.text:08048593
.text:08048593 loc_8048593: ; CODE XREF: obfuscate+F j
.text:08048593 mov eax, [ebp+len]
.text:08048596 lea edx, [eax-1]
.text:08048599 mov [ebp+len], edx
.text:0804859C test eax, eax
.text:0804859E jnz short loc_804856C
.text:080485A0 mov eax, [ebp+key]
.text:080485A3 not eax
.text:080485A5 leave
.text:080485A6 retn
.text:080485A6 obfuscate endp
key is var_18 from the calling function. buf is the buffer containing keyboard input. And len is 1. Based on that, we're only using the first character of keyboard input. Presumably, then, the second is meant to be a newline.
Overall, this is a simple algorithm that processes an ASCII character into an integer. It achieves this by looking up a value in dword_804A060, which we'll call srcbuf. Pseudocode for this function will come later.
We know almost everything about how individual input characters are read. But how are we using spots in outbuf that aren't yet initialized? As it happens, outbuf has a special preinit routine:
.text:080485E6 sub_80485E6 proc near ; CODE XREF: sub_8048790+44 p
.text:080485E6 ; DATA XREF: .init_array:08049F04 o
.text:080485E6
.text:080485E6 var_8 = dword ptr -8
.text:080485E6 var_4 = dword ptr -4
.text:080485E6
.text:080485E6 push ebp
.text:080485E7 mov ebp, esp
.text:080485E9 sub esp, 10h
.text:080485EC mov [ebp+var_4], 0CAF3B33Bh
.text:080485F3 mov [ebp+var_8], 0
.text:080485FA jmp short loc_8048619
.text:080485FC ; ---------------------------------------------------------------------------
.text:080485FC
.text:080485FC loc_80485FC: ; CODE XREF: sub_80485E6+3B j
.text:080485FC mov eax, [ebp+var_8]
.text:080485FF mov eax, ds:outbuf[eax*4]
.text:08048606 xor eax, [ebp+var_4]
.text:08048609 mov edx, eax
.text:0804860B mov eax, [ebp+var_8]
.text:0804860E mov ds:outbuf[eax*4], edx
.text:08048615 add [ebp+var_8], 1
.text:08048619
.text:08048619 loc_8048619: ; CODE XREF: sub_80485E6+14 j
.text:08048619 mov eax, [ebp+var_8]
.text:0804861C cmp eax, 97h
.text:08048621 jbe short loc_80485FC
.text:08048623 mov eax, 0
.text:08048628 leave
.text:08048629 retn
.text:08048629 sub_80485E6 endp
Since outbuf is filled with zeroes before its initialization, due to being in the BSS, all this does is set the first 0x97 spots to 0xCAF3B33B.
Now that the character input and processing is figured out, what does main() do with outbuf?
.text:08048722 do_cmp: ; CODE XREF: main+72 j
.text:08048722 cmp [ebp+var_C], 0
.text:08048726 jz short not_equal
.text:08048728 mov eax, [ebp+n]
.text:0804872B mov edx, ds:outbuf[eax*4]
.text:08048732 mov eax, [ebp+n]
.text:08048735 mov eax, cmpbuf[eax*4]
.text:0804873C cmp edx, eax
.text:0804873E jnz short not_equal
.text:08048740 mov eax, 1
.text:08048745 jmp short cmp_loop
.text:08048747 ; ---------------------------------------------------------------------------
.text:08048747
.text:08048747 not_equal: ; CODE XREF: main+3F j
.text:08048747 ; main+57 j
.text:08048747 mov eax, 0
.text:0804874C
.text:0804874C cmp_loop: ; CODE XREF: main+5E j
.text:0804874C mov [ebp+var_C], eax
.text:0804874F add [ebp+n], 1
.text:08048753
.text:08048753 loc_8048753: ; CODE XREF: main+39 j
.text:08048753 mov eax, [ebp+n]
.text:08048756 cmp eax, 25h
.text:08048759 jbe short do_cmp
.text:0804875B cmp [ebp+var_C], 0
.text:0804875F jnz short great_8048773
.text:08048761 sub esp, 0Ch
.text:08048764 push offset format ; "woaw"
.text:08048769 call _printf
.text:0804876E add esp, 10h
.text:08048771 jmp short loc_8048783
.text:08048773 ; ---------------------------------------------------------------------------
.text:08048773
.text:08048773 great_8048773: ; CODE XREF: main+78 j
.text:08048773 sub esp, 0Ch
.text:08048776 push offset aGreat ; "great"
.text:0804877B call _printf
.text:08048780 add esp, 10h
Pretty simple: it compares all 0x26 ints of outbuf to the corresponding ints in another array, cmpbuf. If they all match, it prints "great"; otherwise, it prints "woaw". Presumably, we want "great".
All of the reverse engineering work can be transformed into a pseudocode algorithm. I wrote this paste during the competition to serve that purpose.
Reversing the algorithm to get the flag is quite simple. We just need to make this equation match for every i'th character:
cmpbuf[i] = ~(srcbuf[(uint8_t)(ch ^ ~key)] ^ (~key >> 8))
Daunting as this may seem, we know the key for every single i. In fact, it's always either 0 (for the first character) or 0xCAF3B33B (for all subsequent ones). Calculating ch for every cmpbuf[i], and putting it all together, we get the solution: flag{oh_man_such_many_ant3_d00b00gers}