A Brief Analysis of onefuzz

Deploying the agent

unzip onefuzz-deployment-$VERSION.zip
pip install -r requirements.txt
./deploy.py $REGION $RESOURCE_GROUP_NAME $ONEFUZZ_INSTANCE_NAME $CONTACT_EMAIL_ADDRESS

After logging in with the Azure CLI, run the commands above to deploy the agent on Azure.

An Azure subscription is required, and charges may apply.

Installing the onefuzz CLI

wget https://github.com/microsoft/onefuzz/releases/download/1.0.0/onefuzz-1.0.0-py3-none-any.whl
wget https://github.com/microsoft/onefuzz/releases/download/1.0.0/onefuzztypes-1.0.0-py3-none-any.whl
pip install ./onefuzz*.whl

Running a fuzz job

onefuzz template libfuzzer basic my-project my-target build-1 my-pool --target_exe fuzz.exe

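Supported platforms

  1. The Python CLI client requires Python 3.7+
  2. Supported Azure OS images are Windows 10 and Ubuntu Server 18.04
  3. libFuzzer is supported with LLVM 8+ (Windows, Linux x86/x64) and MSVC 16.8+ (with ASAN support)

Supported fuzzing tools

onefuzz integrates several fuzzing tools: AFL, AFL++, libFuzzer, and radamsa

What OneFuzz mainly does

OneFuzz runs fuzzing on Microsoft's Azure cloud platform. It provides a Python interface that operates Azure resources remotely to run fuzz jobs.

API service: api-service
agent: agent

Project status

onefuzz is primarily a fuzzing framework. The project is not very mature and lags well behind Google's ClusterFuzz. The fuzzing process simply invokes the fuzzing tools and does not handle special cases. The documentation is incomplete, and I could not find the actual code for the part I was most interested in, the integration of MSVC with libFuzzer and ASAN. It is also tightly coupled to Microsoft's Azure, which makes it less convenient to use. I will keep following the project's progress.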

Useful links

https://github.com/microsoft/onefuzz/blob/main/docs/getting-started.md
https://github.com/microsoft/onefuzz/blob/main/docs/supported-platforms.md

Online demo

launching-job

Visualizing LLVM coverage

The tool provided by Google

Google provides a tool: https://cs.chromium.org/chromium/src/tools/code_coverage/coverage.py

$ gn gen out/coverage \
--args='use_clang_coverage=true is_component_build=false dcheck_always_on=true'
$ python tools/code_coverage/coverage.py \
crypto_unittests url_unittests \
-b out/coverage -o out/report \
-c 'out/coverage/crypto_unittests' \
-c 'out/coverage/url_unittests --gtest_filter=URLParser.PathURL' \
-f url/ -f crypto/

What some of the options mean:

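-b path to the build directory containing the coverage targets
-o path for the output report
-c command line to run when collecting coverage
-f filter; only show coverage for the given paths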

workflow

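In practice, coverage.py turned out to be not very convenient. Going step by step is more reliable.

(0) Build

In the chromium project, use_clang_coverage=true and is_component_build=false can be used directly.
For a project other than chromium, the flags have to be specified by hand. In the skia project, for example, you can write: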

args.gn

cc = "/home/henices/clang7/bin/clang"
cxx = "/home/henices/clang7/bin/clang++"
extra_cflags = [ "-fprofile-instr-generate", "-fcoverage-mapping" ]
extra_ldflags = [ "-fprofile-instr-generate", "-fcoverage-mapping" ]

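For your own project, add these two flags when compiling with clang: -fprofile-instr-generate -fcoverage-mapping

(1) Generating the raw profile files

Use export LLVM_PROFILE_FILE="out/report/target.%4m.profraw" to limit the number of profraw files.

%p process ID
%h hostname
%Nm number of profraw files to generate (profiles are merged into a pool of N files)

Write a loop that runs every sample once. timeout 10 sets the timeout for each run.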

for i in path/*; do timeout 10 target "$i"; done

profraw files are then generated under out/report. If none appear, something went wrong with the build in the previous step.
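
The replay loop above can be wrapped in a small helper. This is only a sketch: run_corpus and its arguments are hypothetical names, not part of any tool mentioned here, and it assumes the target takes the sample path as its only argument.

```shell
#!/bin/sh
# run_corpus TARGET CORPUS_DIR [TIMEOUT_SECONDS]
# Replays every regular file in CORPUS_DIR against TARGET and prints
# how many samples were run. Hypothetical helper for illustration only.
run_corpus() {
    target=$1
    corpus=$2
    limit=${3:-10}
    count=0
    for sample in "$corpus"/*; do
        # Skip directories and the unexpanded pattern of an empty directory.
        [ -f "$sample" ] || continue
        # A crash or a timeout on one sample must not stop the loop.
        timeout "$limit" "$target" "$sample" >/dev/null 2>&1 || true
        count=$((count + 1))
    done
    echo "$count"
}
```

With LLVM_PROFILE_FILE exported as described above, a call such as `run_corpus out/coverage/target samples` (both paths are placeholders) replays the samples and prints the number of files processed.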

(2) Generating the indexed profile

/home/henices/clang7/bin/llvm-profdata merge -j=1 -sparse -o out/report/coverage.profdata out/report/*.profraw

-sparse greatly reduces the size of the generated profdata file
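
Before merging, it can help to confirm that raw profiles exist at all, since an empty directory usually points to a build without the instrumentation flags. A minimal sketch, assuming llvm-profdata is on PATH; merge_profiles is a hypothetical helper name:

```shell
#!/bin/sh
# merge_profiles PROFRAW_DIR OUTPUT_PROFDATA
# Merges all *.profraw files in PROFRAW_DIR into one indexed profile.
# Returns 1 without calling llvm-profdata when no raw profile is found.
merge_profiles() {
    dir=$1
    out=$2
    # Reuse the function's positional parameters as the list of raw profiles.
    set -- "$dir"/*.profraw
    if [ ! -e "$1" ]; then
        echo "no .profraw files in $dir; check the instrumentation flags" >&2
        return 1
    fi
    llvm-profdata merge -sparse -o "$out" "$@"
}
```

For the layout used here, the call would be `merge_profiles out/report out/report/coverage.profdata`.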

(3) Generating the coverage report

/home/henices/clang7/bin/llvm-cov show -output-dir=out/report -format=html \
-Xdemangler c++filt -Xdemangler -n -instr-profile=out/report/coverage.profdata \
-object=out/coverage/target

Open out/report/index.html to see the detailed HTML report, which is very nice.

llvm-coverge-html

Reference links

A Brief Analysis of libprotobuf-mutator

What are protocol buffers?

Protocol buffers are Google’s language-neutral, platform-neutral, extensible mechanism for serializing structured data – think XML, but smaller, faster, and simpler. You define how you want your data to be structured once, then you can use special generated source code to easily write and read your structured data to and from a variety of data streams and using a variety of languages.

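This passage is the introduction from Google's official site. protobuf can generate code automatically for reading and writing structured data. A simple protobuf file can look like this: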

message Person {
  required string name = 1;
  required int32 id = 2;
  optional string email = 3;
}

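For the exact syntax, see Google's proto2 and proto3 documentation. For an example of using protobuf from C++, see the Protocol Buffer Basics: C++ document. The basic steps are:

  1. Write a protobuf file that describes the data structure
  2. Use protoc to generate code automatically (several languages are supported, such as C++ and Java)
  3. Use the generated files to parse or write the data structure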

libprotobuf-mutator

Building and installing

Reference: https://github.com/google/libprotobuf-mutator/blob/master/README.md

git clone https://github.com/google/libprotobuf-mutator.git
cd libprotobuf-mutator
mkdir build
cd build
cmake .. -GNinja -DCMAKE_C_COMPILER=clang -DLIB_PROTO_MUTATOR_DOWNLOAD_PROTOBUF=ON -DCMAKE_CXX_COMPILER=clang++ -DCMAKE_BUILD_TYPE=Debug
https_proxy=http://127.0.0.1:3128 ninja check

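Because some sources have to be downloaded from Google, ninja check has to run behind a proxy. The build then failed because libxml2.a could not be found. After checking the build options, it turned out that the option for building the static library had to be added: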

diff --git a/cmake/external/libxml2.cmake b/cmake/external/libxml2.cmake
index c00ace2..a944fab 100644
--- a/cmake/external/libxml2.cmake
+++ b/cmake/external/libxml2.cmake
@@ -38,6 +38,7 @@ ExternalProject_Add(${LIBXML2_TARGET}
UPDATE_COMMAND ""
CONFIGURE_COMMAND ${LIBXML2_SRC_DIR}/autogen.sh --without-python
--prefix=${LIBXML2_INSTALL_DIR}
+ --enable-static=yes
CC=${CMAKE_C_COMPILER}
CXX=${CMAKE_CXX_COMPILER}
CFLAGS=${LIBXML2_CFLAGS}

After this change, all checks run by ninja check pass. By default, ninja install installs to /usr/local. Because the library will later be used with AFL+, cmake has to be re-run with the following command:

cmake .. -GNinja -DCMAKE_C_COMPILER=clang -DLIB_PROTO_MUTATOR_DOWNLOAD_PROTOBUF=ON \
-DCMAKE_INSTALL_PREFIX=/home/henices/code/AFL+/external/libprotobuf-mutator -DCMAKE_CXX_COMPILER=clang++ -DCMAKE_BUILD_TYPE=Debug

How libprotobuf-mutator is implemented

libfuzzer_macro.h

#define DEFINE_TEST_ONE_PROTO_INPUT_IMPL(use_binary, Proto) \
  extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) { \
    using protobuf_mutator::libfuzzer::LoadProtoInput; \
    Proto input; \
    if (LoadProtoInput(use_binary, data, size, &input)) \
      TestOneProtoInput(input); \
    return 0; \
  }

// Defines custom mutator, crossover and test functions using default
// serialization format. Default is text.
#define DEFINE_PROTO_FUZZER(arg) DEFINE_TEXT_PROTO_FUZZER(arg)
// Defines custom mutator, crossover and test functions using text
// serialization. This format is more convenient to read.
#define DEFINE_TEXT_PROTO_FUZZER(arg) DEFINE_PROTO_FUZZER_IMPL(false, arg)
// Defines custom mutator, crossover and test functions using binary
// serialization. This makes mutations faster. However often test function is
// significantly slower than mutator, so fuzzing rate may stay unchanged.
#define DEFINE_BINARY_PROTO_FUZZER(arg) DEFINE_PROTO_FUZZER_IMPL(true, arg)

#define DEFINE_PROTO_FUZZER_IMPL(use_binary, arg) \
  static void TestOneProtoInput(arg); \
  using FuzzerProtoType = std::remove_const<std::remove_reference< \
      std::function<decltype(TestOneProtoInput)>::argument_type>::type>::type; \
  DEFINE_CUSTOM_PROTO_MUTATOR_IMPL(use_binary, FuzzerProtoType) \
  DEFINE_CUSTOM_PROTO_CROSSOVER_IMPL(use_binary, FuzzerProtoType) \
  DEFINE_TEST_ONE_PROTO_INPUT_IMPL(use_binary, FuzzerProtoType) \
  DEFINE_POST_PROCESS_PROTO_MUTATION_IMPL(FuzzerProtoType) \
  static void TestOneProtoInput(arg)

The call path is DEFINE_PROTO_FUZZER -> DEFINE_TEXT_PROTO_FUZZER -> DEFINE_PROTO_FUZZER_IMPL -> DEFINE_TEST_ONE_PROTO_INPUT_IMPL -> LLVMFuzzerTestOneInput -> TestOneProtoInput
In the end this still implements LLVMFuzzerTestOneInput, libFuzzer's entry point. Using macros saves a lot of code and is very convenient. It also shows that the fuzz target function must take the same parameter types as LLVMFuzzerTestOneInput

// fuzz_target.cc
extern "C" int LLVMFuzzerTestOneInput(const uint8_t *Data, size_t Size) {
  DoSomethingInterestingWithMyAPI(Data, Size);
  return 0;  // Non-zero return values are reserved for future use.
}

libfuzzer_macro.cc

bool LoadProtoInput(bool binary, const uint8_t* data, size_t size,
                    protobuf::Message* input) {
  if (GetCache()->LoadIfSame(data, size, input)) return true;
  auto result = binary ? ParseBinaryMessage(data, size, input)
                       : ParseTextMessage(data, size, input);
  if (!result) return false;
  GetMutator()->Seed(size);
  GetMutator()->Fix(input);
  return true;
}

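LoadProtoInput returns true or false. If parsing succeeds, TestOneProtoInput is called. The DEFINE_PROTO_FUZZER macro is really just a way of writing the implementation of TestOneProtoInput.

libprotobuf-mutator_fuzzing_learning

This is a write-up by a fellow researcher on GitHub about learning to fuzz with libprotobuf-mutator. It is well written overall, but it contains some mistakes; the fixes made while working through it are recorded here.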

Simple protobuf example

Article: https://github.com/bruce30262/libprotobuf-mutator_fuzzing_learning/tree/master/1_simple_protobuf

First write a simple protobuf file, test.proto

syntax = "proto2";

message TEST {
  required uint32 a = 1;
  required string b = 2;
}

Compile the protobuf file with protoc

mkdir genfiles
protoc ./test.proto --cpp_out=./genfiles

Two files are generated automatically

ls ./genfiles
test.pb.cc test.pb.h

Write a test program, test_proto.cc

#include "test.pb.h"

#include <bits/stdc++.h>

using std::cin;
using std::cout;
using std::endl;

int main(int argc, char *argv[])
{
  TEST t;
  t.set_a(101);
  t.set_b("testtest");
  cout << t.a() << endl;
  cout << t.b() << endl;
  return 0;
}

Write a Makefile to build test_proto.cc

CXX=clang++
PB_SRC=test.pb.cc

PROTOBUF_DIR=$(HOME)/code/libprotobuf-mutator/build/external.protobuf/
PROTOBUF_LIB=$(PROTOBUF_DIR)/lib/libprotobufd.a
INC=-I$(PROTOBUF_DIR)/include

test_proto: test_proto.cc $(PB_SRC)
	$(CXX) -o $@ $^ $(PROTOBUF_LIB) $(INC)

.PHONY: clean
clean:
	rm test_proto

Running make fails with ./test.pb.h:17:2: error: This file was generated by an older version of protoc which is. Because the build used -DLIB_PROTO_MUTATOR_DOWNLOAD_PROTOBUF=ON, a newer version of protobuf was downloaded, which causes this error. The fix is to regenerate test.pb.cc and test.pb.h with the downloaded protoc

~/code/libprotobuf-mutator/build/external.protobuf/bin/protoc ./test.proto --cpp_out=./genfiles

This time make succeeds. The command actually executed is

clang++ -o test_proto test_proto.cc test.pb.cc /home/henices/code/libprotobuf-mutator/build/external.protobuf//lib/libprotobufd.a \
-I/home/henices/code/libprotobuf-mutator/build/external.protobuf//include

Running ./test_proto prints the following

101
testtest

Combine libprotobuf-mutator with libfuzzer

Code: https://github.com/bruce30262/libprotobuf-mutator_fuzzing_learning/tree/master/2_libprotobuf_libfuzzer

harness.cc

#include <stdint.h>
#include <stddef.h>

extern "C" int FuzzTEST(const uint8_t *data, size_t size) {
  // Check size first so that an empty input is not read out of bounds.
  if (size > 0 && data[0] == '\x01') {
    __builtin_trap();
  }
  return 0;
}

FuzzTEST is the target function we want to test.

lpm_libfuzz.cc

#include "libprotobuf-mutator/src/libfuzzer/libfuzzer_macro.h"
#include "test.pb.h"

#include <bits/stdc++.h>

using std::cin;
using std::cout;
using std::endl;

std::string ProtoToData(const TEST &test_proto) {
  std::stringstream all;
  const auto &aa = test_proto.a();
  const auto &bb = test_proto.b();
  all.write((const char*)&aa, sizeof(aa));
  if(bb.size() != 0) {
    all.write(bb.c_str(), bb.size());
  }

  std::string res = all.str();
  if (bb.size() != 0 && res.size() != 0) {
    // set PROTO_FUZZER_DUMP_PATH env to dump the serialized protobuf
    if (const char *dump_path = getenv("PROTO_FUZZER_DUMP_PATH")) {
      std::ofstream of(dump_path);
      of.write(res.data(), res.size());
    }
  }
  return res;
}

extern "C" int FuzzTEST(const uint8_t* data, size_t size); // our customized fuzzing function

DEFINE_PROTO_FUZZER(const TEST &test_proto) {
  auto s = ProtoToData(test_proto); // convert protobuf to raw data
  FuzzTEST((const uint8_t*)s.data(), s.size()); // fuzz the function
}

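Including libfuzzer_macro.h at the very top of the file makes its macros available in the rest of the code. DEFINE_PROTO_FUZZER is the key fuzzer entry point. libprotobuf-mutator already takes care of the structured mutation; what remains to be implemented is a function that converts the protobuf into the data type the target needs, such as ProtoToData above

Building the code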

TARGET=lpm_libfuzz
CXX=clang++
CXXFLAGS=-g -fsanitize=fuzzer,address
PB_SRC=test.pb.cc

PROTOBUF_DIR=$(HOME)/code/libprotobuf-mutator/build/external.protobuf/
LPM_DIR=$(HOME)/code/AFL+/external/libprotobuf-mutator
PROTOBUF_LIB=$(PROTOBUF_DIR)/lib/libprotobufd.a
LPM_LIB=$(LPM_DIR)/lib/libprotobuf-mutator-libfuzzer.a $(LPM_DIR)/lib/libprotobuf-mutator.a
INC=-I$(PROTOBUF_DIR)/include -I$(LPM_DIR)/include
DFUZZ=-DLLVMFuzzerTestOneInput=FuzzTEST

all: $(TARGET)

# for testing libprotobuf + libfuzzer
# compile harness first
# then link lpm_libfuzz with harness.o & static libraries
harness.o: harness.cc
	$(CXX) $(CXXFLAGS) -c $(DFUZZ) $<

$(TARGET): harness.o $(TARGET).cc
	$(CXX) $(CXXFLAGS) -o $@ $^ $(PB_SRC) $(LPM_LIB) $(PROTOBUF_LIB) $(INC) # $(LPM_LIB) must be placed before $(PROTOBUF_LIB)

.PHONY: clean
clean:
	rm $(TARGET) *.o

make fails because a header file cannot be found

/home/henices/code/AFL+/external/libprotobuf-mutator/include/libprotobuf-mutator/src/libfuzzer/libfuzzer_macro.h:24:10: fatal error: 'port/protobuf.h' file not found
#include "port/protobuf.h"
^~~~~~~~~~~~~~~~~
1 error generated.

port/protobuf.h comes from https://github.com/google/libprotobuf-mutator/blob/master/port/protobuf.h, so the source tree has to be added to the include path. Modify the Makefile as follows

TARGET=lpm_libfuzz
CXX=clang++
CXXFLAGS=-g -fsanitize=fuzzer,address
PB_SRC=test.pb.cc

PROTOBUF_DIR=$(HOME)/code/libprotobuf-mutator/build/external.protobuf/
LPM_DIR=$(HOME)/code/AFL+/external/libprotobuf-mutator
PROTOBUF_LIB=$(PROTOBUF_DIR)/lib/libprotobufd.a
LPM_LIB=$(LPM_DIR)/lib/libprotobuf-mutator-libfuzzer.a $(LPM_DIR)/lib/libprotobuf-mutator.a
INC=-I$(PROTOBUF_DIR)/include -I$(HOME)/code/libprotobuf-mutator/ -I$(LPM_DIR)/include
DFUZZ=-DLLVMFuzzerTestOneInput=FuzzTEST

all: $(TARGET)

# for testing libprotobuf + libfuzzer
# compile harness first
# then link lpm_libfuzz with harness.o & static libraries
harness.o: harness.cc
	$(CXX) $(CXXFLAGS) -c $(DFUZZ) $<

$(TARGET): harness.o $(TARGET).cc
	$(CXX) $(CXXFLAGS) -o $@ $^ $(PB_SRC) $(LPM_LIB) $(PROTOBUF_LIB) $(INC) # $(LPM_LIB) must be placed before $(PROTOBUF_LIB)

.PHONY: clean
clean:
	rm $(TARGET) *.o

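After modifying the Makefile, the build succeeds. As the analysis above shows, DFUZZ=-DLLVMFuzzerTestOneInput=FuzzTEST does not need to be defined: harness.cc already names its function FuzzTEST, so the build still succeeds with that line removed.

Running lpm_libfuzz works as expected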

➜  ./lpm_libfuzz
INFO: found LLVMFuzzerCustomMutator (0x758a80). Disabling -len_control by default.
INFO: Running with entropic power schedule (0xFF, 100).
INFO: Seed: 331712324
INFO: Loaded 1 modules (434 inline 8-bit counters): 434 [0xa056b8, 0xa0586a),
INFO: Loaded 1 PC tables (434 PCs): 434 [0x939398,0x93aeb8),
INFO: -max_len is not provided; libFuzzer will not generate inputs larger than 4096 bytes
INFO: A corpus is not provided, starting from an empty corpus
#2 INITED cov: 82 ft: 83 corp: 1/1b exec/s: 0 rss: 35Mb
NEW_FUNC[1/41]: 0x75abb0 in TEST::~TEST() /tmp/genfiles/test.pb.cc:116
NEW_FUNC[2/41]: 0x75ca10 in google::protobuf::UnknownFieldSet* google::protobuf::internal::InternalMetadata::mutable_unknown_fields<google::protobuf::UnknownFieldSet>() /home/henices/code/libprotobuf-mutator/build/external.protobuf//include/google/protobuf/metadata_lite.h:117
#3 NEW cov: 135 ft: 159 corp: 2/12b lim: 4096 exec/s: 0 rss: 36Mb L: 11/11 MS: 1 CustomCrossOver-
#4 NEW cov: 138 ft: 162 corp: 3/201b lim: 4096 exec/s: 0 rss: 36Mb L: 189/189 MS: 2 InsertRepeatedBytes-Custom-
#38 REDUCE cov: 138 ft: 162 corp: 3/127b lim: 4096 exec/s: 0 rss: 37Mb L: 115/115 MS: 5 CustomCrossOver-CustomCrossOver-Custom-InsertRepeatedBytes-Custom-
#60 REDUCE cov: 138 ft: 162 corp: 3/124b lim: 4096 exec/s: 0 rss: 37Mb L: 112/112 MS: 4 CustomCrossOver-ShuffleBytes-ChangeByte-Custom-
#98 REDUCE cov: 138 ft: 162 corp: 3/56b lim: 4096 exec/s: 0 rss: 37Mb L: 44/44 MS: 5 CrossOver-Custom-Custom-CMP-Custom- DE: "~\xff\xff\xff\xff\xff\xff\xff"-
#144 REDUCE cov: 138 ft: 162 corp: 3/53b lim: 4096 exec/s: 0 rss: 37Mb L: 41/41 MS: 2 CrossOver-Custom-
#182 REDUCE cov: 138 ft: 162 corp: 3/25b lim: 4096 exec/s: 0 rss: 37Mb L: 13/13 MS: 5 ChangeBit-Custom-Custom-InsertByte-Custom-
#324 REDUCE cov: 138 ft: 162 corp: 3/24b lim: 4096 exec/s: 0 rss: 38Mb L: 12/12 MS: 3 CustomCrossOver-EraseBytes-Custom-

Handling input from AFL++ in our custom mutator

https://github.com/bruce30262/libprotobuf-mutator_fuzzing_learning/tree/master/5_libprotobuf_aflpp_custom_mutator_input

The main point is combining libprotobuf-mutator with AFL++ through AFL++'s custom mutator. It is worth noting that in this example libprotobuf-mutator has to be compiled with -fPIC.

cmake .. -GNinja -DCMAKE_C_COMPILER=clang -DLIB_PROTO_MUTATOR_DOWNLOAD_PROTOBUF=ON \
-DCMAKE_INSTALL_PREFIX=/home/henices/code/AFL+/external/libprotobuf-mutator \
-DCMAKE_CXX_COMPILER=clang++ -DCMAKE_BUILD_TYPE=Debug \
-DCMAKE_C_FLAGS="-fPIC" -DCMAKE_CXX_FLAGS="-fPIC"

The main steps are already explained fairly clearly in readme.md.

  • lpm_aflpp_custom_mutator_input.cc is the AFL++ custom mutator shared library
    • Parses the input data (testcase buffer) and converts it into a TEST protobuf message
    • Mutates the TEST protobuf message with libprotobuf-mutator
    • Registers a PostProcessor to process the mutated TEST protobuf message (optional)
extern "C" size_t afl_custom_fuzz(MyMutator *mutator, // return value from afl_custom_init
                                  uint8_t *buf, size_t buf_size, // input data to be mutated
                                  uint8_t **out_buf, // output buffer
                                  uint8_t *add_buf, size_t add_buf_size, // add_buf can be NULL
                                  size_t max_size) {
  // This function can be named either "afl_custom_fuzz" or "afl_custom_mutator"
  // A simple test shows that "buf" will be the content of the current test case
  // "add_buf" will be the next test case ( from AFL++'s input queue )

  TEST input;
  // parse input data to TEST
  // Notice that input data should be a serialized protobuf data
  // Check ./in/ii and test_protobuf_serializer for more detail
  bool parse_ok = input.ParseFromArray(buf, buf_size);
  if(!parse_ok) {
    // Invalid serialize protobuf data. Don't mutate.
    // Return a dummy buffer. Also mutated_size = 0
    static uint8_t *dummy = new uint8_t[10]; // dummy buffer with no data
    *out_buf = dummy;
    return 0;
  }
  // mutate the protobuf
  mutator->Mutate(&input, max_size);

  // Convert protobuf to raw data
  const TEST *p = &input;
  std::string s = ProtoToData(*p);
  // Copy to a new buffer ( mutated_out )
  size_t mutated_size = s.size() <= max_size ? s.size() : max_size; // check if raw data's size is larger than max_size
  uint8_t *mutated_out = new uint8_t[mutated_size+1];
  memcpy(mutated_out, s.c_str(), mutated_size); // copy the mutated data
  // Assign the mutated data and return mutated_size
  *out_buf = mutated_out;
  return mutated_size;
}

mutator->Mutate(&input, max_size); is the core code that does the real work

  • lpm_aflpp_custom_mutator_input.h defines MyMutator, which inherits from protobuf_mutator::Mutator and can therefore use libprotobuf-mutator's Mutate method
#include "libprotobuf-mutator/src/mutator.h"
#include "test.pb.h"

#include <bits/stdc++.h>

class MyMutator : public protobuf_mutator::Mutator {
};
  • test_proto_serializer.cc

    • Generates a serialized TEST protobuf message that can be used as the initial testcase for fuzzing
  • vuln.c, the vulnerable test program

#include <stdio.h>
#include <string.h>
#include <math.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
  char str[100]={};
  read(0, str, 100);
  int *ptr = NULL;
  if( str[0] == '\x02' || str[0] == '\xe8') {
    *ptr = 123;
  }
  return 0;
}

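The vulnerable test program is simple: it crashes as soon as the first byte is 0xe8 or 0x02. libprotobuf-mutator's mutation is not very efficient in this example, so a PostProcessor is needed to improve it.

  • Makefile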
TARGET=lpm_aflpp_custom_mutator_input
CXX=clang++-11
AFLCC=$(HOME)/AFLplusplus/afl-gcc
PB_SRC=test.pb.cc

PROTOBUF_DIR=$(HOME)/libprotobuf-mutator/build/external.protobuf
PROTOBUF_LIB=$(PROTOBUF_DIR)/lib/libprotobufd.a

LPM_DIR=$(HOME)/libprotobuf-mutator
LPM_LIB=$(LPM_DIR)/build/src/libfuzzer/libprotobuf-mutator-libfuzzer.a $(LPM_DIR)/build/src/libprotobuf-mutator.a

INC=-I$(PROTOBUF_DIR)/include -I$(LPM_DIR)

all: $(TARGET).so

$(TARGET).so: $(TARGET).cc $(PB_SRC)
	$(CXX) -fPIC -c $^ $(INC)
	$(CXX) -shared -Wall -O3 -o $@ *.o $(LPM_LIB) $(PROTOBUF_LIB)

vuln: vuln.c
	$(AFLCC) -o $@ $^

test_proto_serializer: test_proto_serializer.cc $(PB_SRC)
	$(CXX) -o $@ $^ $(PROTOBUF_LIB) $(INC)

.PHONY: clean
clean:
	rm *.so *.o vuln test_proto_serializer

The Makefile has a flaw: this section has nothing to do with libFuzzer, so there is no need to link libprotobuf-mutator-libfuzzer.a

References

https://developers.google.com/protocol-buffers/docs/cpptutorial
https://github.com/google/fuzzing/blob/master/docs/structure-aware-fuzzing.md
https://github.com/google/libprotobuf-mutator/
https://llvm.org/docs/LibFuzzer.html
https://github.com/bruce30262/libprotobuf-mutator_fuzzing_learning

Keynote - How Do You Actually Find Bugs?

https://www.youtube.com/watch?v=7Ysy6iA2sqA&ab_channel=OffensiveCon

  • Temperament
    • Curiosity
    • Detail-oriented
    • Ability to deal with failure and continual evidence that you’re wrong
  • Learn how to deal with failure
    • Two projects (can be unrelated, or different parts of the same)
      • Learn to recognize when you have hit a wall and have become unproductive
      • Switch to your secondary project
    • Consider having a development project as your secondary project
      • Do an achievable, measurable task
      • Regain a sense of achievement
  • Moving on - Knowing when to quit
    • You will return to it in the future
  • Motivation - Remaining Eager
    • The more you’re curious about how a technology works or how an algorithm achieves its goal, the less monotonous code review is
  • Motivation - bug patching
    • bugs being patched is frustrating
      • … but they are evidence that you were on the right track!
  • Confidence
    • Research is a daunting field to enter
    • Some security researchers you respect had the same self-doubt coming in, and have recurrences from time to time
    • Growth mindset: “I can’t do that … Yet”
  • Bias and assumptions
    • Common code reviewer biases
      • Everyone has looked at this already (many eyes make all bugs shallow)
      • Even if I found something, it will be unexploitable (Server side)
      • The X attack surface is not interesting now (eg, media parsing in browsers)
      • There are no more bugs in this
      • The protocol doesn’t allow you to do X
  • Auditing Process
    • Understanding the code
    • Documenting your findings
    • Identifying bias
    • Tooling
    • Revisit the code base
    • Analyze failures
  • Attempt to understand the code
    • A lot of people try to short-circuit this process
      • Reliance on tools
      • Fuzzers/static analyzers are a guide, not the whole process
    • Many of today’s vulnerabilities are complex, and require in-depth understanding of the codebase
    • The more you understand about how the program works, the better equipped you are to find bugs ( and exploit them)
    • The best way to understand how something works is to explain it to someone else
  • What I’m looking for
    • software risk = available attack surface * complexity
    • attack surface can be indirect
      • even mitigations are attack surface
    • often your initial perception of attack surface is naive
      • Hidden/non-obvious attack surfaces are the best
    • Complexity is plentiful
      • Feature driven (thanks w3c)
      • Legacy support
      • Often avoidable: the anomaly of cheap complexity
  • Borrowing ideas
    • Bugtracker / Diffs
      • Can show where a bug is
      • Can inspire new ideas: variants, same bugs in other codebase
      • Mean but viable: track commits by error-prone developers
    • Comments in the codebase
      • Describe things I’d never thought of
  • Document your findings
    • Get into the habit of documenting
      • ideas, bug candidates, idiosyncrasies, data structures, algorithms
    • Documenting failed ideas is as important as documenting successful ones
      • avoids repeating the idea sometime later
    • Long term view: I’m going to revisit this later
  • Revisit code bases and failed bugs
    • code bases are not static
      • code rewritten
      • Features added/changed
    • Environment is not static
  • Analyze your failures
    • if someone succeeds where you didn’t, have a look at what they found
    • Try to figure out: why did I miss it?
    • Is this a one-off or teachable trick/blind spot
    • Can you improve on that?
    • Is there a pattern in those failures?
    • Failure is an opportunity to learn

Notes on using halfempty

https://github.com/googleprojectzero/halfempty

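A fast, parallel test case minimization tool from @taviso of Google Project Zero.
Note that halfempty can only minimize test cases that crash the target program. If a test case does not crash the target, a coverage-based tool such as afl-tmin is still needed to minimize it.

halfempty passes the test case to the test script through a pipe. If the program under test only accepts a file path as an argument, some tricks are needed; the README mentions this, but not very clearly.

Take upx as an example. Its command-line help is as follows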

root@fuzzing-5:/mnt/disk/halfempty# ./upx.out_x86-64
Ultimate Packer for eXecutables
Copyright (C) 1996 - 2020
UPX git-d7ba31+ Markus Oberhumer, Laszlo Molnar & John Reiser Jan 23rd 2020

Usage: upx.out_x86-64 [-123456789dlthVL] [-qvfk] [-o file] file..

Commands:
-1 compress faster -9 compress better
-d decompress -l list compressed file
-t test compressed file -V display version number
-h give more help -L display software license
Options:
-q be quiet -v be verbose
-oFILE write output to 'FILE'
-f force compression of suspicious files
-k keep backup files
file.. executables to (de)compress

Type 'upx.out_x86-64 --help' for more detailed help.

UPX comes with ABSOLUTELY NO WARRANTY; for details visit https://upx.github.io

upx only accepts file paths as arguments, so it is run like this: ./upx.out_x86-64 crash.upx

Following the example in the README, write the test script test.sh

#!/bin/sh

./upx.out_x86-64 $1

if test $? -eq 139; then
    exit 0
else
    exit 1
fi

Running it fails, because halfempty sends the test case on stdin and $1 is empty

./halfempty ./test.sh  crash.upx
╭│ │ ── halfempty ───────────────────────────────────────────────── v0.30 ──
╰│ 16│ A fast, parallel testcase minimization tool
╰───╯ ───────────────────────────────────────────────────────── by @taviso ──

Input file "crash.upx" is now 19088 bytes, starting strategy "bisect"...
Verifying the original input executes successfully... (skip with --noverify)
** Message: This program expected `./test1.sh` to return successfully
** Message: for the original input (i.e. exitcode zero).
** Message: Try it yourself to verify it's working.
** Message: Use a command like: `cat crash.upx | ./test.sh || echo failed`

** (halfempty:2477): WARNING **: Strategy "bisect" failed, cannot continue.

The correct approach is to use a temporary file. Because halfempty runs in parallel, a different temporary file must be used each time.

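#!/bin/sh

# Save the test case from stdin to a unique temporary file for this invocation.
tempfile=$(mktemp) && cat > "${tempfile}"

./upx.out_x86-64 "${tempfile}"

# 139 = 128 + 11, the exit status of a process killed by SIGSEGV.
if test $? -eq 139; then
    exit 0
else
    exit 1
fi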
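
The temporary-file approach can also be written as a function that removes the file after each run, so that parallel runs do not leave files behind in the temp directory. This is a sketch, not taken from the halfempty README; check_crash is a hypothetical name, and it assumes the target takes the file path as its only argument.

```shell
#!/bin/sh
# check_crash TARGET
# Reads the test case from stdin (halfempty feeds it through a pipe), stores
# it in a unique temporary file, and runs TARGET with that path as argument.
# Returns 0 only when TARGET exits with status 139 (killed by SIGSEGV).
check_crash() {
    target=$1
    tmp=$(mktemp) || return 2
    cat > "$tmp"
    status=0
    "$target" "$tmp" >/dev/null 2>&1 || status=$?
    rm -f "$tmp"    # clean up so parallel runs do not pile up temp files
    [ "$status" -eq 139 ]
}
```

Used as the last command of test.sh, for example `check_crash ./upx.out_x86-64`, its return value becomes the exit status that halfempty sees.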

The output after running looks roughly like this

root@fuzzing-5:/mnt/disk/halfempty# ./halfempty  ./test.sh  crash.upx 
╭│ │ ── halfempty ───────────────────────────────────────────────── v0.30 ──
╰│ 16│ A fast, parallel testcase minimization tool
╰───╯ ───────────────────────────────────────────────────────── by @taviso ──

Input file "crash.upx" is now 19088 bytes, starting strategy "bisect"...
Verifying the original input executes successfully... (skip with --noverify)
The original input file succeeded after 0.0 seconds.
New finalized size: 19088 (depth=2) real=0.0s, user=0.0s, speedup=~-0.0s
treesize=6654, height=6376, unproc=0, real=4.4^C user=19.3s, speedup=~14.9s

It now runs correctly.