<2023-11-20 Mon> linux systems memory

A trek through zpoline (part 1)

So, the HN front page led me to an interesting project: bpftime. It's a userspace eBPF runtime for hooking syscalls and uprobes! Think about that for a second. As complicated as it is, it builds on two underlying technologies: one to track userspace functions and another for syscalls.

The second one is something called zpoline, from a paper at USENIX ATC '23. The aim of this work is to find an efficient, fast and complete solution to syscall hooking, which it does entirely in userspace using binary rewriting. I was simply blown away by the simplicity and excellence of this work; this is the most exciting paper I have read in a while. In fact, the paper won the best paper award, which is well deserved.

In this blog, I am going to take a trek through this paper. The aim is not to explain the paper itself. You can read the original paper, which may be one of the simplest papers in existence if you have the prerequisite knowledge. This is not a teardown or walkthrough either, since unfortunately I don't have the time to pursue a full implementation. Instead, I will simply call out interesting aspects of the work, point out parts of the code and do some minimal hands-on work in the same space as the paper. I aim to over-explain in this blog, so that beginners to these concepts can follow along.

1. Examining the syscall instruction

One of the core pillars of this paper is that a syscall is a single instruction in the binary, which takes up just 2 bytes. The challenge addressed in this paper is making efficient use of these 2 bytes: naive approaches to replacing the syscall instruction take up more than 8 bytes.

I actually didn't realize that a syscall was itself an instruction in the instruction set. Sure enough, looking it up on https://www.felixcloutier.com/x86/ does show it. But I want to see it for myself.

Let us start with a simple C program. Since the paper mentions read as the first syscall (number 0), I wanted to find it in a real binary.

#include<stdio.h>
#include<unistd.h>

void main() {
  char c;
  c = getc(stdin);
  printf("Value: %c\n", c);
}

Great, you can compile it using gcc:

$ gcc simple.c

and run it as usual with ./a.out. At this point, you have a simple program that works as expected: it reads one character and prints it back.

If you examine the binary using:

$ objdump -d a.out | less

What is the problem? You will not find the "syscall" instruction anywhere, since we are simply calling the standard C library function "getc", whose code lives in the shared C library rather than in our binary.

In the disassembled binary, searching for main gives you a section like the following:

0000000000400616 <main>:
  400616:       55                      push   %rbp
  400617:       48 89 e5                mov    %rsp,%rbp
  40061a:       48 83 ec 10             sub    $0x10,%rsp
  40061e:       48 8b 05 0b 0a 20 00    mov    0x200a0b(%rip),%rax        # 601030 <stdin@@GLIBC_2.2.5>
  400625:       48 89 c7                mov    %rax,%rdi
  400628:       e8 f3 fe ff ff          callq  400520 <getc@plt>
  40062d:       88 45 ff                mov    %al,-0x1(%rbp)
  400630:       0f be 45 ff             movsbl -0x1(%rbp),%eax
  400634:       89 c6                   mov    %eax,%esi
  400636:       bf e8 06 40 00          mov    $0x4006e8,%edi
  40063b:       b8 00 00 00 00          mov    $0x0,%eax
  400640:       e8 cb fe ff ff          callq  400510 <printf@plt>

The "callq" lines for getc (and printf) jump to other addresses in the same binary, in a separate section called .plt. Let us look at them:

Disassembly of section .plt:

0000000000400500 <.plt>:
  400500:       ff 35 02 0b 20 00       pushq  0x200b02(%rip)        # 601008 <_GLOBAL_OFFSET_TABLE_+0x8>
  400506:       ff 25 04 0b 20 00       jmpq   *0x200b04(%rip)        # 601010 <_GLOBAL_OFFSET_TABLE_+0x10>
  40050c:       0f 1f 40 00             nopl   0x0(%rax)

0000000000400510 <printf@plt>:
  400510:       ff 25 02 0b 20 00       jmpq   *0x200b02(%rip)        # 601018 <printf@GLIBC_2.2.5>
  400516:       68 00 00 00 00          pushq  $0x0
  40051b:       e9 e0 ff ff ff          jmpq   400500 <.plt>

0000000000400520 <getc@plt>:
  400520:       ff 25 fa 0a 20 00       jmpq   *0x200afa(%rip)        # 601020 <getc@GLIBC_2.2.5>
  400526:       68 01 00 00 00          pushq  $0x1
  40052b:       e9 d0 ff ff ff          jmpq   400500 <.plt>

PLT stands for Procedure Linkage Table. Together with the GOT (Global Offset Table), it allows external functions (in this case from the standard library) to be linked dynamically, at run time. Understandably, we will never get to our syscall this way: the PLT stubs just jump into the shared library.

Let us do 2 things:

  1. Switch to the read library call instead of getc. Why? Just to remove one layer of indirection.
  2. Statically compile our program. This is not normally done with C programs, but it is the easiest way to get the code of our dependencies into our own binary, where we can actually inspect it.

For the first, we simply replace the getc line with the following:

read(0, &c, 1);

Note: this read is not the read system call itself. It is a C standard library function that wraps the system call. In the man pages, section 2 is devoted to system calls and section 3 to library functions. So running man 2 read or man 3 read gets you the corresponding pages (on Linux, the section 3 page for read is the POSIX one, 3p, which is only present if the POSIX man pages are installed).

Second, we need to statically compile the C program. For this, you first need the static version of the C library. On RHEL, you can install it with:

$ sudo dnf install glibc-static

Then, you can compile the program as follows:

$ gcc -static simple.c

You will notice that the resulting binary is much larger than before. Disassemble it as usual; the output will also be quite large, with all sorts of unneeded functions included. Not neat, but it works for us.

Now, let's follow main.

0000000000400ac5 <main>:
  400ac5:       55                      push   %rbp
  400ac6:       48 89 e5                mov    %rsp,%rbp
  400ac9:       48 83 ec 10             sub    $0x10,%rsp
  400acd:       48 8d 45 ff             lea    -0x1(%rbp),%rax
  400ad1:       ba 01 00 00 00          mov    $0x1,%edx
  400ad6:       48 89 c6                mov    %rax,%rsi
  400ad9:       bf 00 00 00 00          mov    $0x0,%edi
  400ade:       e8 ad cb 03 00          callq  43d690 <__libc_read>
  400ae3:       0f b6 45 ff             movzbl -0x1(%rbp),%eax
  400ae7:       0f be c0                movsbl %al,%eax
  400aea:       89 c6                   mov    %eax,%esi
  400aec:       bf 50 e9 47 00          mov    $0x47e950,%edi
  400af1:       b8 00 00 00 00          mov    $0x0,%eax
  400af6:       e8 a5 81 00 00          callq  408ca0 <_IO_printf>
  400afb:       90                      nop
  400afc:       c9                      leaveq
  400afd:       c3                      retq
  400afe:       66 90                   xchg   %ax,%ax

Pretty similar to last time, except there is no PLT indirection now. This time, the call goes straight to the symbol __libc_read, whose code is now part of our binary, and getc is no longer in the picture.

000000000043d690 <__libc_read>:
  43d690:       f3 0f 1e fa             endbr64
  43d694:       8b 05 f6 d1 26 00       mov    0x26d1f6(%rip),%eax        # 6aa890 <__libc_multiple_threads>
  43d69a:       85 c0                   test   %eax,%eax
  43d69c:       75 12                   jne    43d6b0 <__libc_read+0x20>
  43d69e:       31 c0                   xor    %eax,%eax
  43d6a0:       0f 05                   syscall
  43d6a2:       48 3d 00 f0 ff ff       cmp    $0xfffffffffffff000,%rax
  43d6a8:       77 56                   ja     43d700 <__libc_read+0x70>
  43d6aa:       c3                      retq
  43d6ab:       0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
  43d6b0:       41 54                   push   %r12
  43d6b2:       49 89 d4                mov    %rdx,%r12
  43d6b5:       55                      push   %rbp
  43d6b6:       48 89 f5                mov    %rsi,%rbp
  43d6b9:       53                      push   %rbx
  43d6ba:       89 fb                   mov    %edi,%ebx
  43d6bc:       48 83 ec 10             sub    $0x10,%rsp
  43d6c0:       e8 4b 3a 02 00          callq  461110 <__libc_enable_asynccancel>
  43d6c5:       4c 89 e2                mov    %r12,%rdx
  43d6c8:       48 89 ee                mov    %rbp,%rsi
  43d6cb:       89 df                   mov    %ebx,%edi
  43d6cd:       41 89 c0                mov    %eax,%r8d
  43d6d0:       31 c0                   xor    %eax,%eax
  43d6d2:       0f 05                   syscall
  43d6d4:       48 3d 00 f0 ff ff       cmp    $0xfffffffffffff000,%rax
  43d6da:       77 38                   ja     43d714 <__libc_read+0x84>
  43d6dc:       44 89 c7                mov    %r8d,%edi
  43d6df:       48 89 44 24 08          mov    %rax,0x8(%rsp)

  ... lines truncated

A number of things are happening here, which I don't fully understand, but the first few lines directly show what we came for.

...
43d69e:       31 c0                   xor    %eax,%eax
43d6a0:       0f 05                   syscall
...

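XOR'ing eax with itself sets it to 0, the syscall number for read. The arguments (fd 0, the buffer address and the count 1) are still in rdi, rsi and rdx, where main put them, which is also where the x86-64 syscall convention expects them. Then the syscall instruction, encoded in binary as the 2 bytes 0f 05, is executed. Exactly like the paper says.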

Now, the paper says that read is syscall 0, which is what we see here. We can check all the syscall numbers by looking at the "sys/syscall.h" header file. That file includes other headers, and a few redirects later, we land on /usr/include/asm/unistd_64.h on my system. It looks like the following:

#ifndef _ASM_X86_UNISTD_64_H
#define _ASM_X86_UNISTD_64_H 1

#define __NR_read 0
#define __NR_write 1
#define __NR_open 2
#define __NR_close 3
#define __NR_stat 4
#define __NR_fstat 5
#define __NR_lstat 6
...

There you go: syscall numbers as promised, running up to 439 in my case.

Note: You may have heard of the software interrupt int 0x80 as a way of making syscalls. That is the legacy 32-bit (i386) interface; on x86-64, the native mechanism is the syscall instruction we see here.

2. LD_PRELOAD Library Overriding

I am going to be brief here and simply point to an existing article on this topic: https://www.baeldung.com/linux/ld_preload-trick-what-is.

The point is, you can have the dynamic loader load your own shared library into a program before the libraries it normally uses, and thereby override functions the program would otherwise get from those libraries. Let us see a simple example, lifted from this Stack Overflow Q&A: https://stackoverflow.com/questions/6083337/overriding-malloc-using-the-ld-preload-mechanism

Create a "malloc.c" that looks like this:

#define _GNU_SOURCE  // needed for RTLD_NEXT constant, see man dlsym
#include <stdio.h>
#include <dlfcn.h>   // gets you dlsym

static void* (*real_malloc)(size_t)=NULL;

static void malloc_init(void)
{
    real_malloc = dlsym(RTLD_NEXT, "malloc");  // lookup the real malloc
    if (NULL == real_malloc) {
        fprintf(stderr, "Error in `dlsym`: %s\n", dlerror());
    }
}

void *malloc(size_t size)
{
    if(real_malloc==NULL) {
        malloc_init();
    }

    void *p = real_malloc(size); // call the real malloc
    fprintf(stderr, "malloc(%zu) = %p\n", size, p);  // %zu is the format for size_t
    return p;
}

Let us compile this into a shared library:

$ gcc -shared -fPIC malloc.c -o libmalloc.so

Now, running the following:

$ LD_PRELOAD=`pwd`/libmalloc.so ls

will run ls while also printing all the malloc calls.

See also: man 8 ld.so

3. Using the constructor attribute

In the previous example, we did something complicated: overriding an existing function. We (i.e., zpoline) don't need anything that complicated.

Instead, when we load libzpoline.so using LD_PRELOAD, we simply need to run a function that does the setup work, namely rewriting the loaded binary. This is done using the constructor attribute.

Take a look at the following.

#include <stdio.h>

__attribute__((constructor)) static void myinit(void)
{
    fprintf(stderr, "Haha\n");
}

We create a function myinit (marked static, since we don't need anyone else to call it) and set the constructor attribute. Now, when the library is loaded, as before using LD_PRELOAD, this function runs before the main program starts.

$ gcc -shared -fPIC -o libconstr.so constr.c
$ LD_PRELOAD=`pwd`/libconstr.so ls
Haha
<output of ls>

You can specify a priority for a constructor function between 101 and 65535 (0 to 100 are reserved for the implementation), using for example the syntax __attribute__((constructor(105))). Constructors with a smaller priority number run first.

Full documentation for attributes may be found here: https://gcc.gnu.org/onlinedocs/gcc-8.4.0/gcc/Common-Function-Attributes.html. (Search for constructor on that page.)

See how zpoline uses this here: https://github.com/yasukata/zpoline/blob/0a349e65c102f8f9bdbbf6da0a52c4006589178b/main.c#L545

zpoline needs to go at the very end, so it uses the largest priority number possible, 0xffff (65535), which makes its constructor run last.

So, using the LD_PRELOAD mechanism, zpoline is injected into the application, and whatever it needs to do runs at the very beginning. Then, the application runs as normal (except for the changes we made, of course).

This injected function does 3 things:

  1. Set up the trampoline at the very beginning of the address space, at address 0.
  2. Rewrite all of the syscall instructions to redirect to the trampoline at zero (hence the name zpoline).
  3. Load the user defined hook function - the actual business logic to replace the original syscall.

4. Memory maps in the /proc file system

You probably know that the /proc filesystem is a virtual filesystem that exposes the state of the system and its processes as files. You can look up all sorts of things about a process with pid p by simply going to the directory /proc/p/.

zpoline uses /proc/self/maps to find out how its own process's memory is laid out. The function is here: https://github.com/yasukata/zpoline/blob/0a349e65c102f8f9bdbbf6da0a52c4006589178b/main.c#L336

Broadly, it does the following:

  1. "self" is used to refer to our own process. Remember, zpoline is now inside the target process.
  2. Lookup the memory maps.
  3. Filter out some that we don't want to touch, like the stack.
  4. Keep only the ones that have the "executable" bit set. Syscall instructions can only be executed from executable memory.
  5. Run those regions through a disassembler to locate the syscall instructions exactly (instead of blindly rewriting every matching byte pair). This is a bit like how we manually ran objdump and looked at the instructions.
  6. Go and replace the instruction at those locations.

Let us look at the memory map of cat itself (here, "self" is the cat process reading the file).

$ cat /proc/self/maps
55d6b9602000-55d6b960a000 r-xp 00000000 fd:02 805886940                  /usr/bin/cat
55d6b9809000-55d6b980a000 r--p 00007000 fd:02 805886940                  /usr/bin/cat
55d6b980a000-55d6b980b000 rw-p 00008000 fd:02 805886940                  /usr/bin/cat
55d6bae87000-55d6baea8000 rw-p 00000000 00:00 0                          [heap]
7f6ea0537000-7f6ead4ed000 r--p 00000000 fd:02 14146528                   /usr/lib/locale/locale-archive
7f6ead4ed000-7f6ead6a8000 r-xp 00000000 fd:02 1673184                    /usr/lib64/libc-2.28.so
7f6ead6a8000-7f6ead8a8000 ---p 001bb000 fd:02 1673184                    /usr/lib64/libc-2.28.so
7f6ead8a8000-7f6ead8ac000 r--p 001bb000 fd:02 1673184                    /usr/lib64/libc-2.28.so
7f6ead8ac000-7f6ead8ae000 rw-p 001bf000 fd:02 1673184                    /usr/lib64/libc-2.28.so
7f6ead8ae000-7f6ead8b2000 rw-p 00000000 00:00 0
7f6ead8b2000-7f6ead8e0000 r-xp 00000000 fd:02 5042440                    /usr/lib64/ld-2.28.so
7f6eadaa4000-7f6eadac9000 rw-p 00000000 00:00 0
7f6eadade000-7f6eadae0000 rw-p 00000000 00:00 0
7f6eadae0000-7f6eadae1000 r--p 0002e000 fd:02 5042440                    /usr/lib64/ld-2.28.so
7f6eadae1000-7f6eadae3000 rw-p 0002f000 fd:02 5042440                    /usr/lib64/ld-2.28.so
7ffdb0c8a000-7ffdb0cad000 rw-p 00000000 00:00 0                          [stack]
7ffdb0d0b000-7ffdb0d0f000 r--p 00000000 00:00 0                          [vvar]
7ffdb0d0f000-7ffdb0d11000 r-xp 00000000 00:00 0                          [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]

The first column is the virtual memory address range. The second shows the permission bits: r (read), w (write), x (execute), and p or s (private or shared). For example, you can see that the "stack" and "heap" are not executable. You can find more about these columns in man 5 proc.

5. Rewriting memory using the map

Let us write a dumb program to demonstrate the core functionality of the rewrite. zpoline takes the smarter route of disassembling the loaded code and searching it for the instructions to replace. Let us do something much simpler in this section.

Here is the program.

#define _DEFAULT_SOURCE   // for sbrk
#include<stdio.h>
#include<stdlib.h>
#include<string.h>
#include<stdint.h>
#include<inttypes.h>
#include<unistd.h>

void modify_heap(){
    FILE *fp = fopen("/proc/self/maps", "r");
    if (fp == NULL) {
        perror("fopen");
        exit(1);
    }

    char buf[4096];
    // go through the maps line by line
    while (fgets(buf, sizeof(buf), fp) != NULL) {
      // if this line has [heap] somewhere, we should be on the right entry
      if (strstr(buf, "[heap]") != NULL) {
        char addr[65] = { 0 };
        // extract out the first column ("from-to") from the proc map entry
        char *c = strtok(buf, " ");
        strncpy(addr, c, sizeof(addr) - 1);

        // replace the "-" in the from-to string with a null byte
        size_t k;
        for (k = 0; k < strlen(addr); k++) {
            if (addr[k] == '-') {
                addr[k] = '\0';
                break;
            }
        }

        // both addresses are in hex
        uintptr_t from = strtoull(&addr[0], NULL, 16);
        uintptr_t to = strtoull(&addr[k + 1], NULL, 16);

        printf("From: %" PRIuPTR ", To: %" PRIuPTR "\n", from, to);

        // fill the first 1000 bytes with the 4-byte pattern 2a 00 00 00
        for (char *loc = (char *)from; loc < (char *)from + 1000; loc += 4) {
            *loc = 0x2a;
            *(loc+1) = 0x00;
            *(loc+2) = 0x00;
            *(loc+3) = 0x00;
        }

        // we have done what we came to do - no need to look at other entries
        break;
      }
    }
    fclose(fp);
}

int main(){
  void *beg = sbrk(0);

  int *a = malloc(sizeof(int));
  *a = 10;
  printf("a = %d\n", *a);

  printf("beg = %" PRIuPTR "\n", (uintptr_t)beg);
  printf("addr a = %" PRIuPTR "\n", (uintptr_t)a);
  printf("delta = %td\n", (char*)a - (char*)beg);
  modify_heap();
  printf("a = %d\n", *a);

  getc(stdin);
  free(a);
}

At a high level, this is what we are doing:

  1. Using sbrk(0) to get the beginning of the heap (see man sbrk for more info; sbrk(increment) moves the program break, the end of the data segment, and sbrk(0) simply returns its current position). We then compare this with the malloc'ed address and check the delta. I had naively assumed that it would be 0, or close to it. That wasn't the case. It would be nice to understand exactly how malloc works and why this memory is being reserved, but we don't have the time for that now.
  2. Reading /proc/self/maps and finding the heap entry. This code is adapted from the zpoline implementation and simplified for our use case. In this particular case, we don't need the proc maps to modify the heap, since we already have the sbrk(0) value. But this is for illustration purposes.
  3. Finally, overwriting the memory with our own values. We brute-force overwrite the first 1000 bytes of the heap with a repeating 4-byte pattern: one 0x2a byte followed by three 0x00 bytes. What this does will become clear when you run the program.

If you run this program, you get:

$ ./a.out
a = 10
beg = 37306368
addr a = 37307040
delta = 672
From: 37306368, To: 37441536
a = 42
<waiting for input>

Let us see the output line by line:

  1. We create a variable a on the heap. Its value is initially 10.
  2. The beginning of the heap is at address 37306368 in this instance, according to sbrk.
  3. The int pointer a is allocated the address 37307040.
  4. The delta between these two is 672. (This is the part I can't explain for now.)
  5. The modify_heap function opens the proc maps and finds the heap entry. It extracts the from and to addresses. You can confirm that the from address is the same as what sbrk returned.
  6. The modify_heap function overwrites the first 1000 bytes of the heap with our custom values. (If the delta is larger than 1000 on your system, the next step won't work. Retry after changing the range in modify_heap.)
  7. Finally, we print the value of a, which is now 42. How?

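This is because 0x2a is 42 and my system is little endian: the 4 bytes 2a 00 00 00 at a's address, read as an int, put 0x2a in the least significant byte, giving 0x0000002a.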

$ lscpu | grep Endian
Byte Order:          Little Endian

See the diagrams in the Wikipedia article on endianness if they help you understand the byte filling.

Now, while the program is running, you can manually go and check /proc/<pid>/maps (get the pid using ps or pgrep).

There is one final thing here. If you Ctrl-C the program now, everything is fine. If you actually enter a character like "a" and hit Enter (since we used getc in the program), you will see the following:

munmap_chunk(): invalid pointer
Aborted (core dumped)

This is to be expected, since we messed up the start of the heap, including parts that malloc uses for its bookkeeping; free detects the corrupted metadata and aborts. It will be interesting to understand exactly how malloc uses the heap and to work around this, but that is a topic for another day.

6. Creating the trampoline: mmaping address 0

We saw in the previous section how to go about rewriting memory at arbitrary addresses. But the trampoline is not at an arbitrary address; it is at the bottom of the address space, right at 0.

The difference between these two cases is that the heap was already mapped, while address 0 is definitely not: Linux does not even allow mapping addresses below the value in /proc/sys/vm/mmap_min_addr without special privileges.

Since there are other problems with using address 0, as detailed in the paper, let us work with some other address. In this section, we will aim to:

  1. Use mmap to map an arbitrary address.
  2. Programmatically load in some executable code, directly as raw machine-code bytes.
  3. Run the loaded function.

As you can see, this is a super simplified version of the trampoline. In the real trampoline, the rewritten syscall instruction jumps into the loaded code automatically. In our case, we will call it manually.

6.1. Preparing our function

Let us start simple, with an add function.

int add(int a, int b) {
  return a + b;
}

int main() {
}

You might wonder, what is the point of this program? The point is that we shall compile it (without optimization, gcc's default) and disassemble it to get the binary instructions. In my case, they look like the following:

0000000000400536 <add>:
  400536:       55                      push   %rbp
  400537:       48 89 e5                mov    %rsp,%rbp
  40053a:       89 7d fc                mov    %edi,-0x4(%rbp)
  40053d:       89 75 f8                mov    %esi,-0x8(%rbp)
  400540:       8b 55 fc                mov    -0x4(%rbp),%edx
  400543:       8b 45 f8                mov    -0x8(%rbp),%eax
  400546:       01 d0                   add    %edx,%eax
  400548:       5d                      pop    %rbp
  400549:       c3                      retq

Keep this aside, it will be used later - to be loaded into our runner.

6.2. The main program

Here, we will write a simple version of the program which maps an arbitrary address, writes some binary data into it and then executes it.

#define _DEFAULT_SOURCE   // for MAP_ANONYMOUS
#include <sys/mman.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

// helper to wait for user input: consume everything up to the newline
void uwait() {
        printf("Press y to continue...\n");
        int c;
        while ( (c = getchar()) != '\n' && c != EOF ) { }
}

int (*myfun) (int a, int b);

int main() {
        unsigned char *mem;

        printf("Check maps now using: cat /proc/%d/maps\n", getpid());
        uwait();

        /* allocate 1 page at virtual address 0x32000
           (MAP_FIXED replaces anything already mapped there) */
        mem = mmap((void *)0x32000, 0x1000,
                        PROT_READ | PROT_WRITE | PROT_EXEC,
                        MAP_ANONYMOUS | MAP_PRIVATE | MAP_FIXED,
                        -1, 0);
        if (mem == MAP_FAILED) {
                perror("mmap");
                fprintf(stderr, "NOTE: the address must not be below /proc/sys/vm/mmap_min_addr\n");
                exit(1);
        }

        printf("Mmap done!\n");
        printf("Check maps now using: cat /proc/%d/maps\n", getpid());
        uwait();

        // hold pointer to starting point, since we will keep moving the mem pointer
        void *base = mem;

        // macro to help load data: write one byte and advance the pointer
#define W(d) (*mem++ = (d))

        // load binary in (the bytes of add() from the listing above)
        W(0x55);                       // push %rbp
        W(0x48); W(0x89); W(0xe5);     // mov  %rsp,%rbp
        W(0x89); W(0x7d); W(0xfc);     // mov  %edi,-0x4(%rbp)
        W(0x89); W(0x75); W(0xf8);     // mov  %esi,-0x8(%rbp)
        W(0x8b); W(0x55); W(0xfc);     // mov  -0x4(%rbp),%edx
        W(0x8b); W(0x45); W(0xf8);     // mov  -0x8(%rbp),%eax
        W(0x01); W(0xd0);              // add  %edx,%eax
        W(0x5d);                       // pop  %rbp
        W(0xc3);                       // retq

        printf("Loaded binary.\n");

        // execute the virtual function now
        myfun = (int (*)(int, int))base;

        printf("Sum is: %d\n", myfun(12, 30));
        return 0;
}

This is a bit long, so let us understand this step by step.

  1. The function uwait is just a helper to pause the program. This is useful for examining the proc maps between steps.
  2. We call mmap to map 1 page, 4 KB or 4096 bytes (0x1000 in hexadecimal), at the location 0x32000. Note that with MAP_FIXED, the start address must be aligned to the page size, or mmap fails with an error. On my system, 0 didn't work even with mmap_min_addr set to 0, but I will ignore that for the moment. The value 0x32000 is chosen so as not to clash with any existing entry in the maps.
  3. Once we have the memory, we manually fill it byte by byte with the binary data we got earlier from the compiled add function. To do this, we use a helper macro that writes to the current location "mem" and increments the pointer. Since we are hardcoding the binary, this will obviously only work on the same instruction set (x86-64 here). You can compile the add function on your system, look at the produced binary and change this if needed.
  4. We create a function pointer (with a matching signature) and point it to the base of this newly mapped memory. Finally, we call the function.

When you run it, you will see the following:

$ ./a.out
Check maps now using: cat /proc/1238745/maps
Press y to continue...
y
Mmap done!
Check maps now using: cat /proc/1238745/maps
Press y to continue...
y
Loaded binary.
Sum is: 42

Look at that! We calculated the sum of 12 and 30 by running them through our loaded binary. I keep wanting to call it code, but as you see, it is not really code. Maybe compiled code, or binary instructions.

If you examine the proc maps file at the 2 points where the program prompts you, you will see that 1 line differs: the first one, pointing to address 0x32000.

00032000-00033000 rwxp 00000000 00:00 0
00400000-00401000 r-xp 00000000 fd:02 268576133                          /home/chandergovind/Documents/Dabblings/zpoline/a.out
00600000-00601000 r--p 00000000 fd:02 268576133                          /home/chandergovind/Documents/Dabblings/zpoline/a.out
00601000-00602000 rw-p 00001000 fd:02 268576133                          /home/chandergovind/Documents/Dabblings/zpoline/a.out
009ad000-009ce000 rw-p 00000000 00:00 0                                  [heap]
7f86c864e000-7f86c8809000 r-xp 00000000 fd:02 1673184                    /usr/lib64/libc-2.28.so
...

All other lines are the same, obviously. Notice how the "read", "write" and "execute" protection bits are all set. If we didn't set the "x" bit, we wouldn't be able to execute the code like we just did; the call would crash with a segmentation fault instead.

7. The disasm library

Since this is a new topic to me, let us cover it separately in part 2.

8. dlmopen

The paper uses dlmopen to load the actual hook functions, which should NOT be rewritten. Let us do this in part 2 as well.

9. Creating a zpoline launcher

The paper mentions an alternative to LD_PRELOAD, needed for static binaries, though it is not in the repo.

Check it for yourself. Build the simple constr library and try to run it against a simple static binary like the one we built earlier. You will find that the library is never loaded: LD_PRELOAD is handled by the dynamic loader (ld.so), and a static binary does not use one.

Instead, we would need a different way to rewrite the given input program. Let us do this in part 3.

10. Conclusion

This was a long post. Hopefully, this gives you some useful information - I certainly learnt a number of things as I worked through zpoline.