A Brief Analysis of libprotobuf-mutator

What are protocol buffers?

Protocol buffers are Google’s language-neutral, platform-neutral, extensible mechanism for serializing structured data – think XML, but smaller, faster, and simpler. You define how you want your data to be structured once, then you can use special generated source code to easily write and read your structured data to and from a variety of data streams and using a variety of languages.

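This is the introduction given on Google's official site. protobuf generates code automatically for reading and writing structured data. A simple protobuf file can look like this: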

message Person {
  required string name = 1;
  required int32 id = 2;
  optional string email = 3;
}

For the full syntax, see Google's proto2 and proto3 documentation. For a C++ example of using protobuf, see the Protocol Buffer Basics: C++ document. The basic steps are:

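  1. Write a protobuf file that describes the data structure
  2. Use protoc to generate code automatically (several languages are supported: C++, Java, and others)
  3. Use the generated files to parse or write the data structure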
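
To make "serializing structured data" a bit more concrete, here is a minimal hand-rolled sketch of the bytes the protobuf wire format uses for the Person message above (field 1 is a length-delimited string, field 2 is a varint). AppendVarint and EncodePerson are hypothetical helper names for illustration only; real code would call the serialization methods of the class generated by protoc rather than encode anything by hand.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: append v as a base-128 varint (7 payload bits per
// byte, high bit set on every byte except the last one).
void AppendVarint(uint64_t v, std::string* out) {
  while (v >= 0x80) {
    out->push_back(static_cast<char>((v & 0x7F) | 0x80));
    v >>= 7;
  }
  out->push_back(static_cast<char>(v));
}

// Hypothetical helper: hand-encode Person{name, id} in wire-format order.
// Each field starts with a tag byte: (field_number << 3) | wire_type.
std::string EncodePerson(const std::string& name, int32_t id) {
  std::string out;
  AppendVarint((1u << 3) | 2u, &out);  // field 1, wire type 2 (length-delimited)
  AppendVarint(name.size(), &out);     // string length
  out += name;                         // string bytes
  AppendVarint((2u << 3) | 0u, &out);  // field 2, wire type 0 (varint)
  // int32 values are sign-extended to 64 bits before varint encoding.
  AppendVarint(static_cast<uint64_t>(static_cast<int64_t>(id)), &out);
  return out;
}
```

With name "a" and id 150 this yields the six bytes 0A 01 61 10 96 01, which is why the format is more compact than a textual encoding such as XML.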

libprotobuf-mutator

Build and install

Reference: https://github.com/google/libprotobuf-mutator/blob/master/README.md

git clone https://github.com/google/libprotobuf-mutator.git
cd libprotobuf-mutator
mkdir build
cd build
cmake .. -GNinja -DCMAKE_C_COMPILER=clang -DLIB_PROTO_MUTATOR_DOWNLOAD_PROTOBUF=ON -DCMAKE_CXX_COMPILER=clang++ -DCMAKE_BUILD_TYPE=Debug
https_proxy=http://127.0.0.1:3128 ninja check

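Because some sources have to be downloaded from Google, ninja check needs to run behind a proxy. The build then failed because libxml2.a could not be found. Checking the build parameters showed that libxml2's configure step needs an extra option to build the static library: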

diff --git a/cmake/external/libxml2.cmake b/cmake/external/libxml2.cmake
index c00ace2..a944fab 100644
--- a/cmake/external/libxml2.cmake
+++ b/cmake/external/libxml2.cmake
@@ -38,6 +38,7 @@ ExternalProject_Add(${LIBXML2_TARGET}
UPDATE_COMMAND ""
CONFIGURE_COMMAND ${LIBXML2_SRC_DIR}/autogen.sh --without-python
--prefix=${LIBXML2_INSTALL_DIR}
+ --enable-static=yes
CC=${CMAKE_C_COMPILER}
CXX=${CMAKE_CXX_COMPILER}
CFLAGS=${LIBXML2_CFLAGS}

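With this change, all checks run by ninja check pass. By default ninja install installs to /usr/local. Since the library will be used with AFL++ later, rerun cmake with the following command to set a different install prefix: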

cmake .. -GNinja -DCMAKE_C_COMPILER=clang -DLIB_PROTO_MUTATOR_DOWNLOAD_PROTOBUF=ON \
-DCMAKE_INSTALL_PREFIX=/home/henices/code/AFL+/external/libprotobuf-mutator -DCMAKE_CXX_COMPILER=clang++ -DCMAKE_BUILD_TYPE=Debug

libprotobuf-mutator implementation

libfuzzer_macro.h

#define DEFINE_TEST_ONE_PROTO_INPUT_IMPL(use_binary, Proto) \
  extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) { \
    using protobuf_mutator::libfuzzer::LoadProtoInput; \
    Proto input; \
    if (LoadProtoInput(use_binary, data, size, &input)) \
      TestOneProtoInput(input); \
    return 0; \
  }

// Defines custom mutator, crossover and test functions using default
// serialization format. Default is text.
#define DEFINE_PROTO_FUZZER(arg) DEFINE_TEXT_PROTO_FUZZER(arg)
// Defines custom mutator, crossover and test functions using text
// serialization. This format is more convenient to read.
#define DEFINE_TEXT_PROTO_FUZZER(arg) DEFINE_PROTO_FUZZER_IMPL(false, arg)
// Defines custom mutator, crossover and test functions using binary
// serialization. This makes mutations faster. However often test function is
// significantly slower than mutator, so fuzzing rate may stay unchanged.
#define DEFINE_BINARY_PROTO_FUZZER(arg) DEFINE_PROTO_FUZZER_IMPL(true, arg)

#define DEFINE_PROTO_FUZZER_IMPL(use_binary, arg) \
  static void TestOneProtoInput(arg); \
  using FuzzerProtoType = std::remove_const<std::remove_reference< \
      std::function<decltype(TestOneProtoInput)>::argument_type>::type>::type; \
  DEFINE_CUSTOM_PROTO_MUTATOR_IMPL(use_binary, FuzzerProtoType) \
  DEFINE_CUSTOM_PROTO_CROSSOVER_IMPL(use_binary, FuzzerProtoType) \
  DEFINE_TEST_ONE_PROTO_INPUT_IMPL(use_binary, FuzzerProtoType) \
  DEFINE_POST_PROCESS_PROTO_MUTATION_IMPL(FuzzerProtoType) \
  static void TestOneProtoInput(arg)

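The expansion and call chain is DEFINE_PROTO_FUZZER -> DEFINE_TEXT_PROTO_FUZZER -> DEFINE_PROTO_FUZZER_IMPL -> DEFINE_TEST_ONE_PROTO_INPUT_IMPL -> LLVMFuzzerTestOneInput -> TestOneProtoInput.
In the end the macros still define LLVMFuzzerTestOneInput, libFuzzer's entry point. Using macros saves a lot of boilerplate, which is very convenient. It also makes clear that the fuzz target function libFuzzer calls must have the parameter types of LLVMFuzzerTestOneInput: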

// fuzz_target.cc
extern "C" int LLVMFuzzerTestOneInput(const uint8_t *Data, size_t Size) {
  DoSomethingInterestingWithMyAPI(Data, Size);
  return 0;  // Non-zero return values are reserved for future use.
}
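
The macro trick in libfuzzer_macro.h can be hard to read, so here is a stripped-down toy version of the same pattern that compiles on its own. Everything prefixed with Toy (ToyMsg, ToyLoadInput, ToyArgOf, DEFINE_TOY_FUZZER, ToyFuzzerTestOneInput) is a hypothetical name made up for this sketch, not part of libprotobuf-mutator, and the entry point is deliberately not called LLVMFuzzerTestOneInput so that it cannot clash with a real fuzzer build. It only shows how one macro can declare the user's function, deduce the message type from its parameter, emit a raw-bytes entry point, and then let the braces after the macro become the user's function body.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <type_traits>

// Hypothetical stand-in for a generated protobuf message type.
struct ToyMsg { std::string text; };

// Stand-in for LoadProtoInput: "parsing" succeeds only for non-empty input.
inline bool ToyLoadInput(const uint8_t* data, size_t size, ToyMsg* out) {
  if (size == 0) return false;
  out->text.assign(reinterpret_cast<const char*>(data), size);
  return true;
}

// Extracts the argument type of a function type void(A), with reference
// and const removed (const ToyMsg& becomes ToyMsg).
template <class F> struct ToyArgOf;
template <class A> struct ToyArgOf<void(A)> {
  using type = typename std::remove_const<
      typename std::remove_reference<A>::type>::type;
};

// Declares the user's function, emits the raw-bytes entry point, and ends
// with the user's function header so the following { ... } is its body.
#define DEFINE_TOY_FUZZER(arg) \
  static void ToyTestOneInput(arg); \
  using ToyInputType = ToyArgOf<decltype(ToyTestOneInput)>::type; \
  extern "C" int ToyFuzzerTestOneInput(const uint8_t* data, size_t size) { \
    ToyInputType input; \
    if (ToyLoadInput(data, size, &input)) ToyTestOneInput(input); \
    return 0; \
  } \
  static void ToyTestOneInput(arg)

static int g_toy_calls = 0;  // counts how often the user body ran

DEFINE_TOY_FUZZER(const ToyMsg& msg) {
  if (!msg.text.empty()) ++g_toy_calls;
}
```

Calling ToyFuzzerTestOneInput with two bytes runs the user body once; calling it with an empty buffer skips the body because loading fails, mirroring the if (LoadProtoInput(...)) guard in the real macro.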

libfuzzer_macro.cc

bool LoadProtoInput(bool binary, const uint8_t* data, size_t size,
                    protobuf::Message* input) {
  if (GetCache()->LoadIfSame(data, size, input)) return true;
  auto result = binary ? ParseBinaryMessage(data, size, input)
                       : ParseTextMessage(data, size, input);
  if (!result) return false;
  GetMutator()->Seed(size);
  GetMutator()->Fix(input);
  return true;
}

LoadProtoInput returns true or false. If parsing succeeds, TestOneProtoInput is called. Using the DEFINE_PROTO_FUZZER macro really amounts to writing the body of TestOneProtoInput.

libprotobuf-mutator_fuzzing_learning

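This is a write-up by a fellow researcher on GitHub about learning to fuzz with libprotobuf-mutator. It is well written overall, but it contains a few errors; the fixes needed while working through it are recorded below.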

Simple protobuf example

Article: https://github.com/bruce30262/libprotobuf-mutator_fuzzing_learning/tree/master/1_simple_protobuf

First write a simple protobuf file, test.proto

syntax = "proto2";

message TEST {
  required uint32 a = 1;
  required string b = 2;
}

Compile the protobuf file with protoc

mkdir genfiles  
protoc ./test.proto --cpp_out=./genfiles

Two files are generated automatically

ls ./genfiles
test.pb.cc test.pb.h

Write a test program, test_proto.cc

#include "test.pb.h"

#include <bits/stdc++.h>

using std::cin;
using std::cout;
using std::endl;

int main(int argc, char *argv[])
{
  TEST t;
  t.set_a(101);
  t.set_b("testtest");
  cout << t.a() << endl;
  cout << t.b() << endl;
  return 0;
}

Write a Makefile to build test_proto.cc

CXX=clang++
PB_SRC=test.pb.cc

PROTOBUF_DIR=$(HOME)/code/libprotobuf-mutator/build/external.protobuf/
PROTOBUF_LIB=$(PROTOBUF_DIR)/lib/libprotobufd.a
INC=-I$(PROTOBUF_DIR)/include

test_proto: test_proto.cc $(PB_SRC)
	$(CXX) -o $@ $^ $(PROTOBUF_LIB) $(INC)

.PHONY: clean
clean:
	rm test_proto

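Running make fails with ./test.pb.h:17:2: error: This file was generated by an older version of protoc which is. The libprotobuf-mutator build used -DLIB_PROTO_MUTATOR_DOWNLOAD_PROTOBUF=ON, which downloaded a newer protobuf, so its headers no longer match the files produced by the system protoc. The fix is to regenerate test.pb.cc and test.pb.h with the downloaded protoc: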

~/code/libprotobuf-mutator/build/external.protobuf/bin/protoc ./test.proto --cpp_out=./genfiles

This time make succeeds. The command actually executed is

clang++ -o test_proto test_proto.cc test.pb.cc /home/henices/code/libprotobuf-mutator/build/external.protobuf//lib/libprotobufd.a \
-I/home/henices/code/libprotobuf-mutator/build/external.protobuf//include

Running ./test_proto prints

101
testtest
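
Going back to the version error hit earlier in this section: generated headers of that protobuf era carry preprocessor guards that compare the protoc version baked into test.pb.h against version macros from the installed protobuf headers, and stop the build with #error on a mismatch. The sketch below models that comparison as a constexpr function. The function name, parameter names, and version numbers are all hypothetical, and the exact macros and thresholds in a real test.pb.h may differ.

```cpp
// Hypothetical model of the guard in a generated header. The real header
// performs comparisons like these with #if and #error.
//   headers_version:    version of the installed protobuf headers
//   headers_min_protoc: oldest protoc those headers still accept
//   protoc_version:     version of the protoc that generated the file
constexpr bool GeneratedCodeMatchesHeaders(int headers_version,
                                           int headers_min_protoc,
                                           int protoc_version) {
  // Too old: protoc is below the minimum the headers accept.
  // Too new: protoc is ahead of the installed headers.
  return protoc_version >= headers_min_protoc &&
         protoc_version <= headers_version;
}

// Matching versions pass; a system protoc older than the downloaded headers
// fails, which corresponds to the "older version of protoc" error.
static_assert(GeneratedCodeMatchesHeaders(3021000, 3021000, 3021000), "match");
static_assert(!GeneratedCodeMatchesHeaders(3021000, 3021000, 3012000), "too old");
```

This is why regenerating the files with the protoc that sits next to the downloaded headers makes the error go away: both sides of the comparison then come from the same protobuf release.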

Combine libprotobuf-mutator with libfuzzer

Code: https://github.com/bruce30262/libprotobuf-mutator_fuzzing_learning/tree/master/2_libprotobuf_libfuzzer

harness.cc

#include <stdint.h>
#include <stddef.h>

extern "C" int FuzzTEST(const uint8_t *data, size_t size) {
  if(data[0] == '\x01') {
    __builtin_trap();
  }
  return 0;
}

FuzzTEST is the target function we want to test.

lpm_libfuzz.cc

#include "libprotobuf-mutator/src/libfuzzer/libfuzzer_macro.h"
#include "test.pb.h"

#include <bits/stdc++.h>

using std::cin;
using std::cout;
using std::endl;

std::string ProtoToData(const TEST &test_proto) {
  std::stringstream all;
  const auto &aa = test_proto.a();
  const auto &bb = test_proto.b();
  all.write((const char*)&aa, sizeof(aa));
  if(bb.size() != 0) {
    all.write(bb.c_str(), bb.size());
  }

  std::string res = all.str();
  if (bb.size() != 0 && res.size() != 0) {
    // set PROTO_FUZZER_DUMP_PATH env to dump the serialized protobuf
    if (const char *dump_path = getenv("PROTO_FUZZER_DUMP_PATH")) {
      std::ofstream of(dump_path);
      of.write(res.data(), res.size());
    }
  }
  return res;
}

extern "C" int FuzzTEST(const uint8_t* data, size_t size); // our customized fuzzing function

DEFINE_PROTO_FUZZER(const TEST &test_proto) {
  auto s = ProtoToData(test_proto); // convert protobuf to raw data
  FuzzTEST((const uint8_t*)s.data(), s.size()); // fuzz the function
}

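Including libfuzzer_macro.h at the top of the file makes the macros available; DEFINE_PROTO_FUZZER is the key fuzzer entry point. libprotobuf-mutator already takes care of the structure-aware mutation, so what remains to be written is a function that converts the protobuf message into the data type the target needs, such as ProtoToData above.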
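
To see what ProtoToData hands to the target without building protobuf, here is a small stand-alone sketch of the same conversion on a plain struct. ToyTest, ToyToData, and ToyWouldTrap are hypothetical names used only for this illustration; the real code works on the generated TEST class. The point it tries to show is that field a is written as raw bytes in host byte order, so which value of a puts 0x01 into the first byte depends on endianness.

```cpp
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical plain-struct stand-in for the generated TEST message.
struct ToyTest {
  uint32_t a;
  std::string b;
};

// Mirrors ProtoToData: raw bytes of a in host byte order, followed by b.
std::string ToyToData(const ToyTest& t) {
  std::string out(sizeof(t.a), '\0');
  std::memcpy(&out[0], &t.a, sizeof(t.a));
  out += t.b;
  return out;
}

// Same condition as the harness, but returns a flag instead of trapping.
bool ToyWouldTrap(const std::string& raw) {
  return !raw.empty() && raw[0] == '\x01';
}
```

On a little-endian machine, a == 1 already makes the first byte 0x01, so the mutator only has to find one small integer to reach __builtin_trap. Because the first four bytes always come from a, the contents of b never influence the check in this harness.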

Building the code

TARGET=lpm_libfuzz
CXX=clang++
CXXFLAGS=-g -fsanitize=fuzzer,address
PB_SRC=test.pb.cc

PROTOBUF_DIR=$(HOME)/code/libprotobuf-mutator/build/external.protobuf/
LPM_DIR=$(HOME)/code/AFL+/external/libprotobuf-mutator
PROTOBUF_LIB=$(PROTOBUF_DIR)/lib/libprotobufd.a
LPM_LIB=$(LPM_DIR)/lib/libprotobuf-mutator-libfuzzer.a $(LPM_DIR)/lib/libprotobuf-mutator.a
INC=-I$(PROTOBUF_DIR)/include -I$(LPM_DIR)/include
DFUZZ=-DLLVMFuzzerTestOneInput=FuzzTEST

all: $(TARGET)

# for testing libprotobuf + libfuzzer
# compile harness first
# then link lpm_libfuzz with harness.o & static libraries
harness.o: harness.cc
	$(CXX) $(CXXFLAGS) -c $(DFUZZ) $<

$(TARGET): harness.o $(TARGET).cc
	$(CXX) $(CXXFLAGS) -o $@ $^ $(PB_SRC) $(LPM_LIB) $(PROTOBUF_LIB) $(INC) # $(LPM_LIB) must be placed before $(PROTOBUF_LIB)

.PHONY: clean
clean:
	rm $(TARGET) *.o

make fails because a header cannot be found

/home/henices/code/AFL+/external/libprotobuf-mutator/include/libprotobuf-mutator/src/libfuzzer/libfuzzer_macro.h:24:10: fatal error: 'port/protobuf.h' file not found
#include "port/protobuf.h"
^~~~~~~~~~~~~~~~~
1 error generated.

port/protobuf.h comes from https://github.com/google/libprotobuf-mutator/blob/master/port/protobuf.h. Add the libprotobuf-mutator source directory to the include path in the Makefile:

TARGET=lpm_libfuzz
CXX=clang++
CXXFLAGS=-g -fsanitize=fuzzer,address
PB_SRC=test.pb.cc

PROTOBUF_DIR=$(HOME)/code/libprotobuf-mutator/build/external.protobuf/
LPM_DIR=$(HOME)/code/AFL+/external/libprotobuf-mutator
PROTOBUF_LIB=$(PROTOBUF_DIR)/lib/libprotobufd.a
LPM_LIB=$(LPM_DIR)/lib/libprotobuf-mutator-libfuzzer.a $(LPM_DIR)/lib/libprotobuf-mutator.a
INC=-I$(PROTOBUF_DIR)/include -I$(HOME)/code/libprotobuf-mutator/ -I$(LPM_DIR)/include
DFUZZ=-DLLVMFuzzerTestOneInput=FuzzTEST

all: $(TARGET)

# for testing libprotobuf + libfuzzer
# compile harness first
# then link lpm_libfuzz with harness.o & static libraries
harness.o: harness.cc
	$(CXX) $(CXXFLAGS) -c $(DFUZZ) $<

$(TARGET): harness.o $(TARGET).cc
	$(CXX) $(CXXFLAGS) -o $@ $^ $(PB_SRC) $(LPM_LIB) $(PROTOBUF_LIB) $(INC) # $(LPM_LIB) must be placed before $(PROTOBUF_LIB)

.PHONY: clean
clean:
	rm $(TARGET) *.o

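With the modified Makefile the build succeeds. Following the analysis above, DFUZZ=-DLLVMFuzzerTestOneInput=FuzzTEST is not needed: harness.cc here already names its function FuzzTEST, so the macro has nothing to rename. Deleting that line still builds fine.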
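
For readers unfamiliar with this use of -D, the sketch below shows what such a flag does to a source file that does contain the token being redefined. ToyEntryPoint and ToyRenamedEntry are hypothetical names; the #define line stands in for passing -DToyEntryPoint=ToyRenamedEntry on the compiler command line.

```cpp
#include <cstddef>
#include <cstdint>

// Same effect as compiling with -DToyEntryPoint=ToyRenamedEntry
// (both names are hypothetical).
#define ToyEntryPoint ToyRenamedEntry

// The source text says ToyEntryPoint, but after preprocessing the compiler
// sees a definition of ToyRenamedEntry.
extern "C" int ToyEntryPoint(const uint8_t* data, size_t size) {
  return (size > 0 && data[0] == 0x01) ? 1 : 0;
}

#undef ToyEntryPoint
```

After this, only the symbol ToyRenamedEntry exists. A flag like -DLLVMFuzzerTestOneInput=FuzzTEST therefore only matters for a source file that spells out LLVMFuzzerTestOneInput, which the harness.cc shown here does not.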

Running lpm_libfuzz works as expected

➜  ./lpm_libfuzz                                                                                                                                                                                          
INFO: found LLVMFuzzerCustomMutator (0x758a80). Disabling -len_control by default.
INFO: Running with entropic power schedule (0xFF, 100).
INFO: Seed: 331712324
INFO: Loaded 1 modules (434 inline 8-bit counters): 434 [0xa056b8, 0xa0586a),
INFO: Loaded 1 PC tables (434 PCs): 434 [0x939398,0x93aeb8),
INFO: -max_len is not provided; libFuzzer will not generate inputs larger than 4096 bytes
INFO: A corpus is not provided, starting from an empty corpus
#2 INITED cov: 82 ft: 83 corp: 1/1b exec/s: 0 rss: 35Mb
NEW_FUNC[1/41]: 0x75abb0 in TEST::~TEST() /tmp/genfiles/test.pb.cc:116
NEW_FUNC[2/41]: 0x75ca10 in google::protobuf::UnknownFieldSet* google::protobuf::internal::InternalMetadata::mutable_unknown_fields<google::protobuf::UnknownFieldSet>() /home/henices/code/libprotobuf-mutator/build/external.protobuf//include/google/protobuf/metadata_lite.h:117
#3 NEW cov: 135 ft: 159 corp: 2/12b lim: 4096 exec/s: 0 rss: 36Mb L: 11/11 MS: 1 CustomCrossOver-
#4 NEW cov: 138 ft: 162 corp: 3/201b lim: 4096 exec/s: 0 rss: 36Mb L: 189/189 MS: 2 InsertRepeatedBytes-Custom-
#38 REDUCE cov: 138 ft: 162 corp: 3/127b lim: 4096 exec/s: 0 rss: 37Mb L: 115/115 MS: 5 CustomCrossOver-CustomCrossOver-Custom-InsertRepeatedBytes-Custom-
#60 REDUCE cov: 138 ft: 162 corp: 3/124b lim: 4096 exec/s: 0 rss: 37Mb L: 112/112 MS: 4 CustomCrossOver-ShuffleBytes-ChangeByte-Custom-
#98 REDUCE cov: 138 ft: 162 corp: 3/56b lim: 4096 exec/s: 0 rss: 37Mb L: 44/44 MS: 5 CrossOver-Custom-Custom-CMP-Custom- DE: "~\xff\xff\xff\xff\xff\xff\xff"-
#144 REDUCE cov: 138 ft: 162 corp: 3/53b lim: 4096 exec/s: 0 rss: 37Mb L: 41/41 MS: 2 CrossOver-Custom-
#182 REDUCE cov: 138 ft: 162 corp: 3/25b lim: 4096 exec/s: 0 rss: 37Mb L: 13/13 MS: 5 ChangeBit-Custom-Custom-InsertByte-Custom-
#324 REDUCE cov: 138 ft: 162 corp: 3/24b lim: 4096 exec/s: 0 rss: 38Mb L: 12/12 MS: 3 CustomCrossOver-EraseBytes-Custom-

Handling input from AFL++ in our custom mutator

https://github.com/bruce30262/libprotobuf-mutator_fuzzing_learning/tree/master/5_libprotobuf_aflpp_custom_mutator_input

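This part combines libprotobuf-mutator with AFL++ through AFL++'s custom mutator interface. One thing worth noting is that in this example libprotobuf-mutator has to be compiled with -fPIC, because its static libraries are linked into a shared object.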

cmake .. -GNinja -DCMAKE_C_COMPILER=clang -DLIB_PROTO_MUTATOR_DOWNLOAD_PROTOBUF=ON \
-DCMAKE_INSTALL_PREFIX=/home/henices/code/AFL+/external/libprotobuf-mutator \
-DCMAKE_CXX_COMPILER=clang++ -DCMAKE_BUILD_TYPE=Debug \
-DCMAKE_C_FLAGS="-fPIC" -DCMAKE_CXX_FLAGS="-fPIC"

The main steps are already described clearly in readme.md.

  • lpm_aflpp_custom_mutator_input.cc is the AFL++ custom mutator shared library
    • Parses the input data (testcase buffer) and converts it into a TEST protobuf message
    • Mutates the TEST protobuf message with libprotobuf-mutator
    • Registers a PostProcessor to process the mutated TEST protobuf message (optional step)
extern "C" size_t afl_custom_fuzz(MyMutator *mutator, // return value from afl_custom_init
                                  uint8_t *buf, size_t buf_size, // input data to be mutated
                                  uint8_t **out_buf, // output buffer
                                  uint8_t *add_buf, size_t add_buf_size, // add_buf can be NULL
                                  size_t max_size) {
  // This function can be named either "afl_custom_fuzz" or "afl_custom_mutator"
  // A simple test shows that "buf" will be the content of the current test case
  // "add_buf" will be the next test case ( from AFL++'s input queue )

  TEST input;
  // parse input data to TEST
  // Notice that input data should be a serialized protobuf data
  // Check ./in/ii and test_protobuf_serializer for more detail
  bool parse_ok = input.ParseFromArray(buf, buf_size);
  if(!parse_ok) {
    // Invalid serialize protobuf data. Don't mutate.
    // Return a dummy buffer. Also mutated_size = 0
    static uint8_t *dummy = new uint8_t[10]; // dummy buffer with no data
    *out_buf = dummy;
    return 0;
  }
  // mutate the protobuf
  mutator->Mutate(&input, max_size);

  // Convert protobuf to raw data
  const TEST *p = &input;
  std::string s = ProtoToData(*p);
  // Copy to a new buffer ( mutated_out )
  size_t mutated_size = s.size() <= max_size ? s.size() : max_size; // check if raw data's size is larger than max_size
  uint8_t *mutated_out = new uint8_t[mutated_size+1];
  memcpy(mutated_out, s.c_str(), mutated_size); // copy the mutated data
  // Assign the mutated data and return mutated_size
  *out_buf = mutated_out;
  return mutated_size;
}

mutator->Mutate(&input, max_size); is the core line that does the real work.
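
One detail of the function above: it allocates a fresh buffer with new on every call, and nothing in the code shown ever frees it, which is harmless in a demo but adds up over a long fuzzing run. A possible alternative, sketched below, is to keep one reusable buffer in the mutator state. ToyMutatorState and ToyEmit are hypothetical names, and the sketch assumes the fuzzer only reads *out_buf until the next call into the mutator; it is not the tutorial's code.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical per-mutator state that owns the output buffer.
struct ToyMutatorState {
  std::vector<uint8_t> out;
};

// Clamp the converted data to max_size and expose it through *out_buf.
// The vector is reused on every call, so no allocation is leaked.
size_t ToyEmit(ToyMutatorState* st, const std::string& s, size_t max_size,
               uint8_t** out_buf) {
  const size_t n = std::min(s.size(), max_size);
  st->out.assign(s.begin(), s.begin() + static_cast<std::ptrdiff_t>(n));
  *out_buf = st->out.data();
  return n;
}
```

In the real mutator this would sit where mutated_out is allocated, with the vector stored as a member of MyMutator.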

  • lpm_aflpp_custom_mutator_input.h defines MyMutator, which inherits from protobuf_mutator::Mutator and can therefore use libprotobuf-mutator's Mutate method
#include "libprotobuf-mutator/src/mutator.h"
#include "test.pb.h"

#include <bits/stdc++.h>

class MyMutator : public protobuf_mutator::Mutator {
};
  • test_proto_serializer.cc

    • Generates one serialized TEST protobuf message, which can serve as the initial testcase for fuzzing
  • vuln.c, the vulnerable test program

#include <stdio.h>
#include <string.h>
#include <math.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
  char str[100]={};
  read(0, str, 100);
  int *ptr = NULL;
  if( str[0] == '\x02' || str[0] == '\xe8') {
    *ptr = 123;
  }
  return 0;
}

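The vulnerable program is simple: it crashes as soon as the first byte is 0xe8 or 0x02. libprotobuf-mutator's mutation is not very efficient in this example, so a PostProcessor is needed to steer the mutation.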
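
To illustrate what such a post-processing step might look like, here is a toy sketch on a plain struct. ToyTestMsg and ToyPostProcess are hypothetical names; the real hook is registered on the mutator and operates on the TEST message, and its exact signature should be checked against the libprotobuf-mutator headers. The sketch assumes the callback receives the mutated message plus a seed, and that the converted data starts with the low byte of a (little-endian host), so forcing a to 0x02 or 0xe8 puts one of the two crashing values into the first byte.

```cpp
#include <cstdint>
#include <string>

// Hypothetical plain-struct stand-in for the generated TEST message.
struct ToyTestMsg {
  uint32_t a;
  std::string b;
};

// Hypothetical post-processing step: after the generic mutation, overwrite
// field a with one of the values the target compares against, chosen by
// the seed. Field b is left exactly as the mutator produced it.
void ToyPostProcess(ToyTestMsg* msg, unsigned int seed) {
  static const uint32_t kInteresting[] = {0x02, 0xe8};
  msg->a = kInteresting[seed % 2];
}
```

Constraining a field this way trades mutation coverage for speed: every generated input now hits the interesting comparison, at the cost of never exploring other values of a.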

  • Makefile
TARGET=lpm_aflpp_custom_mutator_input
CXX=clang++-11
AFLCC=$(HOME)/AFLplusplus/afl-gcc
PB_SRC=test.pb.cc

PROTOBUF_DIR=$(HOME)/libprotobuf-mutator/build/external.protobuf
PROTOBUF_LIB=$(PROTOBUF_DIR)/lib/libprotobufd.a

LPM_DIR=$(HOME)/libprotobuf-mutator
LPM_LIB=$(LPM_DIR)/build/src/libfuzzer/libprotobuf-mutator-libfuzzer.a $(LPM_DIR)/build/src/libprotobuf-mutator.a

INC=-I$(PROTOBUF_DIR)/include -I$(LPM_DIR)

all: $(TARGET).so

$(TARGET).so: $(TARGET).cc $(PB_SRC)
	$(CXX) -fPIC -c $^ $(INC)
	$(CXX) -shared -Wall -O3 -o $@ *.o $(LPM_LIB) $(PROTOBUF_LIB)

vuln: vuln.c
	$(AFLCC) -o $@ $^

test_proto_serializer: test_proto_serializer.cc $(PB_SRC)
	$(CXX) -o $@ $^ $(PROTOBUF_LIB) $(INC)

.PHONY: clean
clean:
	rm *.so *.o vuln test_proto_serializer

The Makefile has a flaw: this section has nothing to do with libFuzzer, so there is no need to link libprotobuf-mutator-libfuzzer.a.

References

https://developers.google.com/protocol-buffers/docs/cpptutorial
https://github.com/google/fuzzing/blob/master/docs/structure-aware-fuzzing.md
https://github.com/google/libprotobuf-mutator/
https://llvm.org/docs/LibFuzzer.html
https://github.com/bruce30262/libprotobuf-mutator_fuzzing_learning


A Brief Analysis of libprotobuf-mutator
http://usmacd.com/cn/libprotobuf-mutator/
Author: henices
Posted on: September 6, 2023