Patch Integration Engine
PIE Internals

This document covers the inner workings of PIE at a technical level. An understanding of this document is by no means required to utilize PIE. For a broad description of the project see the document "Introduction". The reference documents describing the individual parts of PIE are also a helpful resource.

Inserting a Prepatch

The important question when considering the insertion of prepatches is not how to do it, but where to do it. There are numerous techniques for inserting code into a running process, and the one PIE uses is relatively simple. An ELF binary has two contiguous segments, text and data. Between these two segments lies a variable amount of padding, which ensures that the data segment starts on a separate page boundary. This padding is effectively part of the text segment and is therefore marked executable, so PIE can use the space for its prepatches.

The main problem with this technique lies in the fact that the padding has a variable length. This means that a binary could potentially not have enough padding to fit the prepatch. In this situation, the binary cannot be prepatched by this technique. Currently, PIE only supports the use of text padding for its prepatches, but multiple other techniques are being considered for future releases.
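The size check described above can be sketched as follows. This is a minimal illustration and not PIE's own code: it assumes a 4096-byte page size and takes the text segment's load address and in-memory size as plain integers. The function names are hypothetical.

```python
PAGE_SIZE = 4096  # assumption: page size of the target system


def text_padding(text_vaddr: int, text_memsz: int) -> int:
    """Bytes between the end of the text segment and the next page boundary."""
    text_end = text_vaddr + text_memsz
    # Distance from text_end up to the next multiple of PAGE_SIZE
    # (0 when the text segment already ends exactly on a boundary).
    return (-text_end) % PAGE_SIZE


def fits_in_padding(text_vaddr: int, text_memsz: int, prepatch_size: int) -> bool:
    """True if a prepatch of the given size fits in the text padding."""
    return prepatch_size <= text_padding(text_vaddr, text_memsz)
```

For a text segment loaded at 0x08048000 with a size of 0x1234 bytes, the padding runs from 0x08049234 to 0x0804a000, leaving 3532 bytes for prepatches.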

Finding a Library Function

Finding a library function is a fairly simple task because of the availability of the linkmap. The linkmap is a chain of structures describing the currently loaded shared objects. Each structure contains the base address at which its shared object is loaded. The linkmap can be found through the DEBUG entry (DT_DEBUG) of the dynamic section.

Once the linkmap has been found, the target function can easily be obtained. It is simply a matter of traversing the chain and looking at each shared object's symbol table for a matching entry.
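The traversal can be modelled in a few lines. The sketch below is an in-memory stand-in, not a reader of a real process: the LinkMapEntry type and its fields are hypothetical, and each symbol table is simplified to a mapping from symbol name to offset within the shared object.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class LinkMapEntry:
    """Hypothetical stand-in for one node of the linkmap chain."""
    name: str                      # path of the shared object
    base: int                      # base address the object is loaded at
    symtab: Dict[str, int] = field(default_factory=dict)  # name -> offset
    next: Optional["LinkMapEntry"] = None


def find_library_function(head: Optional[LinkMapEntry], symbol: str) -> Optional[int]:
    """Walk the chain and return the absolute address of the first match."""
    entry = head
    while entry is not None:
        offset = entry.symtab.get(symbol)
        if offset is not None:
            # Symbol values are relative to the object's load address.
            return entry.base + offset
        entry = entry.next
    return None
```

With a chain consisting of the main binary followed by a C library loaded at 0x40000000 that exports printf at offset 0x5000, a lookup of "printf" yields 0x40005000.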

Finding a Local Function

Local functions are those contained within the binary's own text section. There are two scenarios when trying to resolve functions in a text section: the binary either has debugging symbols, or the debugging symbols have been stripped.

The first scenario does not present a significant problem. The binary's sections can be traversed until the debugging symbol and string tables have been found. From there, it is simply a matter of finding a matching entry.

The second scenario, however, presents a challenge. Machine code has no need for function names; it uses relative addresses instead. A binary with no indication whatsoever of the names of the functions it contains is perfectly valid and can run without problems. Unfortunately, this makes finding the address of the function that is to be prepatched considerably more difficult. The solution is to use function fingerprints.

Fingerprints

Note: this section is out of date. An entirely new fingerprinting system has been developed that renders much of the technique discussed below obsolete.

The concept of function fingerprints is described in the document "Pfp Reference" and will not be covered here. This section instead describes, in technical detail, the function fingerprinting technique used.

There are two types of technique for fingerprinting a binary: opcode aware and opcode unaware. The difference is essentially that one technique understands the function of the hexadecimal opcodes it is examining, whereas the other does not. PIE uses a form of opcode unaware fingerprinting. It examines the following aspects of a function, forming a rank based on the result of each of the tests:

- The size from the prolog of f_n to the prolog of f_(n+1).
- The offset of f_n from the start of the text section.
- The number of calls to f_n from anywhere in the text section.
- The size of the space made available on the stack for variables.
- The count of each different byte of data within f_n.

The results of some of these tests are quite likely to change dramatically when the binary is compiled under different conditions. This explains the importance of getting the compilation environment of the debugging-enabled binary as close to that of the original as possible. A further fingerprinting technique using an opcode aware method is likely to be developed in the near future. The hope is that this will provide more accurate results over a larger difference in compilation environment.
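To make the idea of ranking on the five tests concrete, here is a rough sketch. The source does not describe how PIE weights or combines the tests, so the scoring below (an unweighted average of per-test similarity values) is purely an assumption, as are all of the names used.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass
class Fingerprint:
    size: int             # prolog of f_n to prolog of f_(n+1)
    offset: int           # offset of f_n from the start of the text section
    calls: int            # number of calls to f_n within the text section
    stack: int            # stack space reserved for variables
    byte_counts: Counter  # count of each different byte within f_n


def fingerprint(code: bytes, offset: int, calls: int, stack: int) -> Fingerprint:
    return Fingerprint(len(code), offset, calls, stack, Counter(code))


def closeness(a: int, b: int) -> float:
    """1.0 for equal non-negative values, falling towards 0.0 as they diverge."""
    if a == b:
        return 1.0
    return 1.0 - abs(a - b) / max(a, b)


def histogram_closeness(a: Counter, b: Counter) -> float:
    """Fraction of bytes the two functions have in common (multiset overlap)."""
    total = sum(a.values()) + sum(b.values())
    if total == 0:
        return 1.0
    return 2.0 * sum((a & b).values()) / total


def rank(known: Fingerprint, candidate: Fingerprint) -> float:
    """Assumed scoring: unweighted average of the five tests, 1.0 is identical."""
    tests = [
        closeness(known.size, candidate.size),
        closeness(known.offset, candidate.offset),
        closeness(known.calls, candidate.calls),
        closeness(known.stack, candidate.stack),
        histogram_closeness(known.byte_counts, candidate.byte_counts),
    ]
    return sum(tests) / len(tests)


def best_match(known: Fingerprint, candidates: List[Fingerprint]) -> Fingerprint:
    return max(candidates, key=lambda c: rank(known, c))
```

In this model the fingerprint taken from the debugging-enabled binary is compared against every function in the stripped binary, and the highest ranked candidate is taken as the match. None of the tests interpret the bytes as instructions, which is what makes the approach opcode unaware.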

Redirecting Functions

Just as there are two types of functions that can be found by PIE, there are two different ways of redirecting them. The redirection of functions is an integral part of prepatching a binary: it completes the process by redirecting the function to the prepatch.

Redirecting a library function is simply a matter of finding the function's location in the Global Offset Table and changing this entry to the address of the prepatch in memory. The GOT entry can be found by examining the Procedure Linkage Table entry for the function, which itself can be found by examining the binary's dynamic symbol tables.

Local functions require a slightly stickier technique to redirect. As mentioned above, binaries replace function names with relative addresses, and function calls are made with these relative addresses. In order to redirect all the calls to a function, the entire text section must be examined. The endpoint of every relative call is calculated, and those calls which end at the address of the target function are redirected to the address of the prepatch.
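The scan over the text section might look like the sketch below. It assumes 32-bit x86, where a relative call is the opcode byte 0xE8 followed by a signed 32-bit little-endian displacement measured from the address of the next instruction. The source does not name the architecture, so this is an assumption, and the function name is hypothetical. The scan is a naive byte search, so a 0xE8 byte that is really part of another instruction's operand could be mistaken for a call.

```python
import struct

CALL_REL32 = 0xE8  # assumption: x86 "call rel32" opcode
CALL_LEN = 5       # one opcode byte plus a 4-byte displacement


def redirect_calls(text: bytearray, text_base: int, target: int, prepatch: int) -> int:
    """Rewrite every relative call that ends at target to end at prepatch.

    text is the text section's bytes, text_base the address it is loaded at.
    Returns the number of calls that were redirected.
    """
    patched = 0
    i = 0
    while i + CALL_LEN <= len(text):
        if text[i] == CALL_REL32:
            (disp,) = struct.unpack_from("<i", text, i + 1)
            next_insn = text_base + i + CALL_LEN
            # The endpoint of a relative call is next instruction + displacement.
            if (next_insn + disp) & 0xFFFFFFFF == target:
                new_disp = (prepatch - next_insn) & 0xFFFFFFFF
                struct.pack_into("<I", text, i + 1, new_disp)
                patched += 1
                i += CALL_LEN
                continue
        i += 1
    return patched
```

Calls whose endpoint is any other address are left untouched, so only the target function is affected.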

Inserting Multiple Prepatches

Multiple prepatch files can be inserted into a single process because of a 32 bit identifier prepended to every inserted function. This identifier consists of a magic number in the high order 16 bits and the size of the function that follows it in the low order 16 bits. This allows for two things to occur: the discovery of any previous prepatch activity and the calculation of an adjusted entry point for any subsequent prepatch insertions.
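A minimal sketch of the identifier layout follows, together with the walk it makes possible. The magic value 0x5049 and the little-endian byte order are assumptions chosen for illustration, as the source does not give either, and the function names are hypothetical.

```python
import struct
from typing import Optional

MAGIC = 0x5049  # hypothetical magic number; the real value is not given


def make_identifier(size: int) -> int:
    """Pack the magic (high 16 bits) and the function size (low 16 bits)."""
    if not 0 <= size <= 0xFFFF:
        raise ValueError("function size must fit in 16 bits")
    return (MAGIC << 16) | size


def parse_identifier(word: int) -> Optional[int]:
    """Return the function size if word carries the magic, else None."""
    if (word >> 16) != MAGIC:
        return None
    return word & 0xFFFF


def next_free_offset(padding: bytes) -> int:
    """Skip over previously inserted prepatches and return the first free offset."""
    offset = 0
    while offset + 4 <= len(padding):
        (word,) = struct.unpack_from("<I", padding, offset)
        size = parse_identifier(word)
        if size is None:
            break  # no identifier here, so nothing has been inserted yet
        offset += 4 + size  # identifier plus the function that follows it
    return offset
```

A nonzero result from next_free_offset indicates earlier prepatch activity, and the value itself is the adjusted entry point for the next insertion. Because the size field is 16 bits wide, a single inserted function is limited to 65535 bytes in this layout.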


Copyright (C) 2004, Ben Hawkes