Monday, October 10, 2016

HSF 2016: Isengard (re500)

Isengard is a 500 point reverse engineering problem. We're given a binary to look at, but little other direction. Let's see what we can find.

First, it's a good idea to try running the program:

$ ./isengard
starting stub ...
                    .___                                             .___        ____
                    |   | ______ ____   ____    _________ _______  __| _/ ___  _/_   |
                    |   |/  ___// __ \ /    \  / ___\__  \\_  __ \/ __ |  \  \/ /|   |
                    |   |\___ \\  ___/|   |  \/ /_/  > __ \|  | \/ /_/ |   \   / |   |
                    |___/____  >\___  >___|  /\___  (____  /__|  \____ |    \_/  |___|
                             \/     \/     \//_____/     \/           \/
                                                                                            
============================================================================================================
============================================================================================================
============================================================================================================

        \\\\\\          They are taking the Hobbits!                    \\\\\\
            \\\\\\          Find them and bring them back!                  \\\\\\
                \\\\\\          https://www.youtube.com/watch?v=z9Uz1icjwrM     \\\\\\
                    \\\\\\              Password: youshallnotpass                   \\\\\\

============================================================================================================
============================================================================================================
============================================================================================================
Password:

There's an intro screen, a password given to us ("youshallnotpass"), and then a password prompt. If we try entering the password it tells us, the program appears to hang waiting for input. Pressing Enter enough times eventually causes it to terminate with the message "woaw".

That doesn't tell us much, so let's start disassembling Isengard:

.text:0BADCC46 main            proc near               ; CODE XREF: start+8 p
.text:0BADCC46
.text:0BADCC46 var_D0          = dword ptr -0D0h
.text:0BADCC46 var_CC          = dword ptr -0CCh
.text:0BADCC46 var_C8          = dword ptr -0C8h
.text:0BADCC46 var_C4          = dword ptr -0C4h
.text:0BADCC46 var_C0          = dword ptr -0C0h
.text:0BADCC46 var_BC          = dword ptr -0BCh
.text:0BADCC46 var_AC          = dword ptr -0ACh
.text:0BADCC46 var_A8          = byte ptr -0A8h
.text:0BADCC46 var_A0          = byte ptr -0A0h
.text:0BADCC46 var_80          = dword ptr -80h
.text:0BADCC46 arg_0           = dword ptr  8
.text:0BADCC46
.text:0BADCC46                 push    ebp
.text:0BADCC47                 mov     ebp, esp
.text:0BADCC49                 push    edi
.text:0BADCC4A                 push    esi
.text:0BADCC4B                 push    ebx
.text:0BADCC4C                 sub     esp, 0D8h
.text:0BADCC52                 mov     eax, dword_BAFB484
.text:0BADCC57                 mov     [ebp+var_AC], 0
.text:0BADCC61                 push    offset aStartingStub__ ; "starting stub ...\n"
.text:0BADCC66                 mov     [ebp+var_BC], eax
.text:0BADCC6C                 call    sub_BADC25A

main() begins by passing "starting stub ...\n" to sub_BADC25A. Since that's the first message printed when we start Isengard, we can deduce that sub_BADC25A is printf().

Looking further into main(), it's not immediately apparent what's going on at first. However, the use of strings like /dev/urandom jumps out. What's happening there?

.text:0BADCD54                 push    0
.text:0BADCD56                 push    offset aDevUrandom ; "/dev/urandom"
.text:0BADCD5B                 call    sub_BADC18E
.text:0BADCD60                 add     esp, 10h
.text:0BADCD63                 test    eax, eax
.text:0BADCD65                 mov     edi, eax
.text:0BADCD67                 js      short try_random
.text:0BADCD69                 lea     edx, [ebp+var_A0]
.text:0BADCD6F                 push    eax
.text:0BADCD70                 push    20h
.text:0BADCD72                 push    edx
.text:0BADCD73                 push    edi
.text:0BADCD74                 mov     [ebp+var_C0], edx
.text:0BADCD7A                 call    sub_BADC187

Translated into pseudocode, it basically looks like:

ret = sub_BADC18E("/dev/urandom", 0)
sub_BADC187(ret, &var_A0, 0x20)

This seems like an open() followed by a read(). If we actually look into the implementations of those two subroutines, that's confirmed to be the case. Both of them set up the correct arguments and invoke int 0x80 (the Linux syscall instruction). So we've found open() and read() statically linked into Isengard.

In fact, several adjacent subroutines are also statically linked POSIX calls. We can identify them using the Linux syscall table and name them, which will make later reversing easier. Having named the Unix API functions, what new insights do we gain?

Near the middle of main() is a strange call to mmap():

.text:0BADCE77                 push    0               ; int
.text:0BADCE79                 push    0FFFFFFFFh      ; fd
.text:0BADCE7B                 mov     [ebp+var_D0], edx
.text:0BADCE81                 or      ecx, 12h
.text:0BADCE84                 push    ecx             ; flags
.text:0BADCE85                 push    7               ; prot
.text:0BADCE87                 mov     ecx, eax
.text:0BADCE89                 mov     esi, [ebx+0Ch]
.text:0BADCE8C                 and     ecx, 0FFFh
.text:0BADCE92                 and     eax, 0FFFFF000h
.text:0BADCE97                 lea     ecx, [esi+ecx+0FFFh]
.text:0BADCE9E                 and     ecx, 0FFFFF000h
.text:0BADCEA4                 push    ecx             ; len
.text:0BADCEA5                 push    eax             ; start
.text:0BADCEA6                 call    mmap

Why is mmap() being used? Curiously, the prot value is 7, meaning that we're mapping pages that are readable, writable, and executable. It seems as if we're generating executable code on the fly to run later. For now, it's not entirely clear what role that plays. We'll come back to it later.

We also know that Isengard initially asks us for a password. If we enter "youshallnotpass", as it tells us to, we go into that weird input loop where hitting Enter enough times exits the program. If we enter anything else, it prints "Integrity error. Good password ?" and quits.

So entering an invalid password causes a data integrity error. That implies the password is being used to decode some kind of data. In that case, it's a good idea to see what it does with the password. Looking for references to read(), we can find the password input function as sub_BADCA35. Let's call it authenticate().

authenticate() does a lot, and without context, it's hard to understand. But we can discern the key parts:

  • The "Password: " prompt is printed in loc_BADCA72
  • Each character of the password is read in a loop starting at loc_BADCAB9
  • After the password is read, some unknown processing occurs in loc_BADCAEC and loc_BADCB5E. For whatever reason, the former has a printf("Curve25519 key exchange failed !\n") call. This implies that the password is used in some kind of crypto algorithm.
  • loc_BADCB5E compares the results of this processing against a static 0x20 byte array. It prints the integrity error if they're not equal. Otherwise, the function returns normally.
Thus, authenticate() gets a password, runs it through some crypto-related operations, and compares it against a known correct value. Building off our earlier guess, it seems like authenticate() is using the password to decrypt some data. If that's the case, the code after authenticate() likely does something with that data.

authenticate() is called from main(). In fact, it's invoked just before the mmap() section. So mmap(), which we know is creating executable code, appears to be generating this code from the data we decrypted. What exactly happens between authenticate() and mmap()?

.text:0BADCE20 loc_BADCE20:                            ; CODE XREF: main+1C6 j
.text:0BADCE20                 call    authenticate
.text:0BADCE25                 mov     eax, [ebp+var_BC]
.text:0BADCE2B                 mov     [ebp+var_C0], 0
.text:0BADCE35                 mov     [ebp+var_C8], 0
.text:0BADCE3F                 mov     edx, [eax+1Ch]
.text:0BADCE42                 add     edx, dword_BAFB484
.text:0BADCE48                 movzx   eax, word ptr [eax+2Ch]
.text:0BADCE4C                 lea     ebx, [edx+8]
.text:0BADCE4F                 mov     [ebp+var_CC], eax
.text:0BADCE55
.text:0BADCE55 loc_BADCE55:                            ; CODE XREF: main+2A8 j
.text:0BADCE55                 mov     edi, [ebp+var_CC]
.text:0BADCE5B                 cmp     [ebp+var_C8], edi
.text:0BADCE61                 jge     loc_BADCEF3
.text:0BADCE67                 cmp     dword ptr [ebx-8], 1
.text:0BADCE6B                 jnz     short loc_BADCEE5
.text:0BADCE6D                 mov     eax, [ebx]
.text:0BADCE6F                 push    ecx
.text:0BADCE70                 push    ecx
.text:0BADCE71                 mov     ecx, ds:dword_BAFB58C

mmap()'s purpose is to create a memory region, at a specific address with a specific length. When mmap() is called, EAX holds the address and ECX holds the length. Working backwards through the above code sample, we see that these values ultimately originate from var_BC. It seems like an in-memory structure is being parsed to get the mmap() parameters. What is this structure?

Let's use GDB to figure that out:

(gdb) b *0x0BADCE25
Breakpoint 1 at 0xbadce25
(gdb) r
Starting program: /home/user/isengard
[...REDACTED FOR BREVITY...]
Password:

Breakpoint 1, 0x0badce25 in ?? ()
(gdb) si
0x0badce2b in ?? ()
(gdb) info registers
eax            0xda8072c        229115692
ecx            0x0      0
edx            0xffffd3b8       -11336
ebx            0xffffd650       -10672
esp            0xffffd5f8       0xffffd5f8
ebp            0xffffd6d0       0xffffd6d0
esi            0xffffd628       -10712
edi            0x0      0
eip            0xbadce2b        0xbadce2b
eflags         0x286    [ PF SF IF ]
cs             0x23     35
ss             0x2b     43
ds             0x2b     43
es             0x2b     43
fs             0x0      0
gs             0x0      0
(gdb) x $eax
0xda8072c:      0x464c457f

Converting that value into ASCII, we get "\x7FELF". This is the magic number for an ELF file! It seems that there's an ELF file in memory, being loaded with mmap(). Isengard's purpose is hiding a real binary inside of itself.


Using GDB's dump command, we can dump the ELF into a file. Running the file command confirms it is a valid ELF. Now we can begin reverse engineering the apparent meat of the problem:

.text:080486E7 main            proc near               ; DATA XREF: start+17 o
.text:080486E7
.text:080486E7 var_10          = dword ptr -10h
.text:080486E7 var_C           = dword ptr -0Ch
.text:080486E7 var_4           = dword ptr -4
.text:080486E7
.text:080486E7                 lea     ecx, [esp+4]
.text:080486EB                 and     esp, 0FFFFFFF0h
.text:080486EE                 push    dword ptr [ecx-4]
.text:080486F1                 push    ebp
.text:080486F2                 mov     ebp, esp
.text:080486F4                 push    ecx
.text:080486F5                 sub     esp, 14h
.text:080486F8                 mov     [ebp+var_10], 0
.text:080486FF                 jmp     short loc_804870A
.text:08048701 ; ---------------------------------------------------------------------------
.text:08048701
.text:08048701 loc_8048701:                            ; CODE XREF: main+29 j
.text:08048701                 call    sub_804862A
.text:08048706                 add     [ebp+var_10], 1
.text:0804870A
.text:0804870A loc_804870A:                            ; CODE XREF: main+18 j
.text:0804870A                 mov     eax, [ebp+var_10]
.text:0804870D                 cmp     eax, 25h
.text:08048710                 jbe     short loc_8048701
.text:08048712                 mov     [ebp+var_C], 1
.text:08048719                 mov     [ebp+var_10], 0
.text:08048720                 jmp     short loc_8048753

It begins by calling sub_804862A in a loop 0x26 times. sub_804862A's main section is as follows:

.text:0804863E                 push    2               ; nbytes
.text:08048640                 lea     eax, [ebp+buf]
.text:08048643                 push    eax             ; buf
.text:08048644                 push    0               ; fd
.text:08048646                 call    _read
.text:0804864B                 add     esp, 10h
.text:0804864E                 mov     eax, ds:dword_804A510
.text:08048653                 test    eax, eax
.text:08048655                 jnz     short loc_8048660
.text:08048657                 mov     [ebp+var_18], 0
.text:0804865E                 jmp     short loc_804866F
.text:08048660 ; ---------------------------------------------------------------------------
.text:08048660
.text:08048660 loc_8048660:                            ; CODE XREF: readchar+2B j
.text:08048660                 mov     eax, ds:dword_804A510
.text:08048665                 mov     eax, ds:dword_804A520[eax*4]
.text:0804866C                 mov     [ebp+var_18], eax
.text:0804866F
.text:0804866F loc_804866F:                            ; CODE XREF: readchar+34 j
.text:0804866F                 sub     esp, 4
.text:08048672                 push    1
.text:08048674                 lea     eax, [ebp+buf]
.text:08048677                 push    eax
.text:08048678                 push    [ebp+var_18]
.text:0804867B                 call    sub_804855B
.text:08048680                 add     esp, 10h
.text:08048683                 mov     [ebp+var_14], eax
.text:08048686                 mov     eax, ds:dword_804A510
.text:0804868B                 mov     edx, [ebp+var_14]
.text:0804868E                 mov     ds:dword_804A520[eax*4], edx
.text:08048695                 mov     eax, ds:dword_804A510
.text:0804869A                 add     eax, 1
.text:0804869D                 mov     ds:dword_804A510, eax

Essentially, it does this:

  • Reads 2 bytes of keyboard input into a stack buffer. (This is why, after entering "youshallnotpass" as the password, we had to hit Enter a bunch of times to quit)
  • Checks the value of dword_804A510. If it's 0, var_18 is set to 0. Otherwise, dword_804A510 is used as an index into the dword_804A520 array, and that fetched value becomes var_18.
  • Calls sub_804855B(var_18, buf, 1)
  • Uses dword_804A510 is used as an index into the dword_804A520 array, storing the return value of sub_804855B() there
  • Increments dword_804A510

We can deduce what a lot of these variables and routines are for. This overall subroutine is used to read keyboard input, process it into an int, and store it in the dword_804A520 array. Each time this subroutine is called (0x26 times in total), dword_804A510 (the index into the array) is incremented.

Pseudocode representing what's going on would look like this:

# outbuf is the dword_804A520 array
# outlen is dword_804A510

read(0, buf, 2)

var = 0
if (outlen > 0) var = outbuf[outlen]

outbuf[outlen] = sub_804855B(var, buf, 1)
outlen++

Presumably, this processed keyboard input that we're collecting will be used later in main(). But we're missing two key things before we can understand this routine:

  • What does sub_804855B() do?
  • How is outbuf[outlen] being referenced before any processed characters have been written there? When outlen=1, we're setting var to outbuf[1], but outbuf[1] is filled in with sub_804855B()'s output later.

sub_804855B() does the following:

.text:0804855B obfuscate       proc near               ; CODE XREF: readchar+51 p
.text:0804855B
.text:0804855B buf             = dword ptr -4
.text:0804855B key             = dword ptr  8
.text:0804855B arg_4           = dword ptr  0Ch
.text:0804855B len             = dword ptr  10h
.text:0804855B
.text:0804855B                 push    ebp
.text:0804855C                 mov     ebp, esp
.text:0804855E                 sub     esp, 10h
.text:08048561                 mov     eax, [ebp+arg_4]
.text:08048564                 mov     [ebp+buf], eax
.text:08048567                 not     [ebp+key]
.text:0804856A                 jmp     short loc_8048593
.text:0804856C ; ---------------------------------------------------------------------------
.text:0804856C
.text:0804856C loc_804856C:                            ; CODE XREF: obfuscate+43 j
.text:0804856C                 mov     eax, [ebp+buf]
.text:0804856F                 lea     edx, [eax+1]
.text:08048572                 mov     [ebp+buf], edx
.text:08048575                 movzx   eax, byte ptr [eax]
.text:08048578                 movzx   eax, al
.text:0804857B                 xor     eax, [ebp+key]
.text:0804857E                 movzx   eax, al
.text:08048581                 mov     eax, dword_804A060[eax*4]
.text:08048588                 mov     edx, [ebp+key]
.text:0804858B                 shr     edx, 8
.text:0804858E                 xor     eax, edx
.text:08048590                 mov     [ebp+key], eax
.text:08048593
.text:08048593 loc_8048593:                            ; CODE XREF: obfuscate+F j
.text:08048593                 mov     eax, [ebp+len]
.text:08048596                 lea     edx, [eax-1]
.text:08048599                 mov     [ebp+len], edx
.text:0804859C                 test    eax, eax
.text:0804859E                 jnz     short loc_804856C
.text:080485A0                 mov     eax, [ebp+key]
.text:080485A3                 not     eax
.text:080485A5                 leave
.text:080485A6                 retn
.text:080485A6 obfuscate       endp

key is var_18 from the calling function. buf is the buffer containing keyboard input. And len is 1. Based on that, we're only using the first character of keyboard input. Presumably, then, the second is meant to be a newline.

Overall, this is a simple algorithm that processes an ASCII character into an integer. It achieves this by looking up a value in dword_804A060, which we'll call srcbuf. Pseudocode for this function will come later.

We know almost everything about how individual input characters are read. But how are we using spots in outbuf that aren't yet initialized? As it happens, outbuf has a special preinit routine:

.text:080485E6 sub_80485E6     proc near               ; CODE XREF: sub_8048790+44 p
.text:080485E6                                         ; DATA XREF: .init_array:08049F04 o
.text:080485E6
.text:080485E6 var_8           = dword ptr -8
.text:080485E6 var_4           = dword ptr -4
.text:080485E6
.text:080485E6                 push    ebp
.text:080485E7                 mov     ebp, esp
.text:080485E9                 sub     esp, 10h
.text:080485EC                 mov     [ebp+var_4], 0CAF3B33Bh
.text:080485F3                 mov     [ebp+var_8], 0
.text:080485FA                 jmp     short loc_8048619
.text:080485FC ; ---------------------------------------------------------------------------
.text:080485FC
.text:080485FC loc_80485FC:                            ; CODE XREF: sub_80485E6+3B j
.text:080485FC                 mov     eax, [ebp+var_8]
.text:080485FF                 mov     eax, ds:outbuf[eax*4]
.text:08048606                 xor     eax, [ebp+var_4]
.text:08048609                 mov     edx, eax
.text:0804860B                 mov     eax, [ebp+var_8]
.text:0804860E                 mov     ds:outbuf[eax*4], edx
.text:08048615                 add     [ebp+var_8], 1
.text:08048619
.text:08048619 loc_8048619:                            ; CODE XREF: sub_80485E6+14 j
.text:08048619                 mov     eax, [ebp+var_8]
.text:0804861C                 cmp     eax, 97h
.text:08048621                 jbe     short loc_80485FC
.text:08048623                 mov     eax, 0
.text:08048628                 leave
.text:08048629                 retn
.text:08048629 sub_80485E6     endp

Since outbuf is filled with zeroes before its initialization, due to being in the BSS, all this does is set the first 0x97 spots to 0xCAF3B33B.

Now that the character input and processing is figured out, what does main() do with outbuf?

.text:08048722 do_cmp:                                 ; CODE XREF: main+72 j
.text:08048722                 cmp     [ebp+var_C], 0
.text:08048726                 jz      short not_equal
.text:08048728                 mov     eax, [ebp+n]
.text:0804872B                 mov     edx, ds:outbuf[eax*4]
.text:08048732                 mov     eax, [ebp+n]
.text:08048735                 mov     eax, cmpbuf[eax*4]
.text:0804873C                 cmp     edx, eax
.text:0804873E                 jnz     short not_equal
.text:08048740                 mov     eax, 1
.text:08048745                 jmp     short cmp_loop
.text:08048747 ; ---------------------------------------------------------------------------
.text:08048747
.text:08048747 not_equal:                              ; CODE XREF: main+3F j
.text:08048747                                         ; main+57 j
.text:08048747                 mov     eax, 0
.text:0804874C
.text:0804874C cmp_loop:                               ; CODE XREF: main+5E j
.text:0804874C                 mov     [ebp+var_C], eax
.text:0804874F                 add     [ebp+n], 1
.text:08048753
.text:08048753 loc_8048753:                            ; CODE XREF: main+39 j
.text:08048753                 mov     eax, [ebp+n]
.text:08048756                 cmp     eax, 25h
.text:08048759                 jbe     short do_cmp
.text:0804875B                 cmp     [ebp+var_C], 0
.text:0804875F                 jnz     short great_8048773
.text:08048761                 sub     esp, 0Ch
.text:08048764                 push    offset format   ; "woaw"
.text:08048769                 call    _printf
.text:0804876E                 add     esp, 10h
.text:08048771                 jmp     short loc_8048783
.text:08048773 ; ---------------------------------------------------------------------------
.text:08048773
.text:08048773 great_8048773:                          ; CODE XREF: main+78 j
.text:08048773                 sub     esp, 0Ch
.text:08048776                 push    offset aGreat   ; "great"
.text:0804877B                 call    _printf
.text:08048780                 add     esp, 10h

Pretty simple: it compares all 0x26 ints of outbuf to the corresponding ints in another array, cmpbuf. If they all match, it prints "great"; otherwise, it prints "woaw". Presumably, we want "great".

All of the reverse engineering work can be transformed into a pseudocode algorithm. I wrote this paste during the competition to serve that purpose.

Reversing the algorithm to get the flag is quite simple. We just need to make this equation match for every i'th character:

cmpbuf[i] = ~(srcbuf[(uint8_t)(ch ^ ~key)] ^ (~key >> 8))

Daunting as this may seem, we know the key for every single i. In fact, it's always either 0 (for the first character) or 0xCAF3B33B (for all subsequent ones). Calculating ch for every cmpbuf[i], and putting it all together, we get the solution: flag{oh_man_such_many_ant3_d00b00gers}