Rattle - an Ethereum EVM binary analysis framework

reCON Montreal 2018
Ryan Stortz
June 16, 2018

Abstract

The majority of smart contracts on the blockchain have no verified source code, but people still trust them to protect their cryptocurrency. These contracts should be auditable by a third party as they exist on the blockchain. EVM – the native code representation – was not designed with auditability in mind. EVM is implemented as a stack machine which makes it extremely difficult to identify and track variables.

Rattle is an EVM binary static analysis framework designed to work on deployed smart contracts. Rattle takes EVM byte strings, uses a flow-sensitive analysis to recover the original control flow graph, lifts the control flow graph into an SSA/infinite register form, and optimizes the SSA – removing DUPs, SWAPs, PUSHs, and POPs. The conversion from a stack machine to SSA form removes 60%+ of all EVM instructions and presents a much friendlier interface to those who wish to read the smart contracts they’re interacting with.

This talk presents Rattle, discusses its development process and design decisions, and explores binary auditing of Ethereum smart contracts.

Presentation

Hi, I’m Ryan Stortz and this is rattle.

Our agenda for this evening will start with a light dusting of ethereum smart contracts that Patrick didn’t cover. Followed by some more ethereum VM internals, and ending with a bunch of information on SSA lifting and optimization.

Don’t worry, Patrick and I coordinated a little bit so there shouldn’t be a ton of repeat information.

As I said earlier, I’m Ryan Stortz. I’m a principal security researcher at Trail of Bits in New York, I’ve been kicking around the industry for about 10 years and I’ve spent far too much time playing and running capture the flag competitions.

I’m also a huge Ferrari fan and I’m sad I wasn’t here last weekend for the race.

At CanSecWest earlier this year, a friend – Jay Little – and I presented a talk called ‘Blackhat Ethereum’. That talk was meant to introduce Ethereum and specifically binary EVM analysis to more traditional security people. In the first half of the talk we covered identifying and extracting contracts from the blockchain. In the second half, we demonstrated several ways to analyze binary contracts, how to identify vulnerabilities, and how to write an exploit and place it on the network. As a part of that talk, I briefly demo’d rattle but it wasn’t quite ready for prime time – this talk is about rattle – what it is, what it does, and how it does it.

Normally, this is where I’d introduce Ethereum Smart Contracts, but Patrick just spent an hour explaining them to you.

I’ll take the opportunity to explain exactly what an ICO Token is and how it works. Since it took me a while to grasp it conceptually.

So an Initial Coin Offering is the cybercurrency world’s equivalent of an Initial Public Offering – where shares in a company are first made available on a public exchange. When I first heard about ICOs, I had trouble understanding how they worked. To break it down:

  • ICOs are normally sales of an ERC20 token for Ethereum.
  • The ICO organizers normally sell the Ethereum to USD and run away with it to fund their start-up/product/etc.
  • ERC20 is a ABI-specification / interface / protocol
  • Contracts are expected to implement functions like transferFrom(address from, address to, uint tokens) which should move tokens from address from to address to.
  • But what does this mean? Well, it means they’re updating a giant hashmap of addresses to values with the new value. No token is ever created, it only lives in this persistent storage associated with the ERC20 contract.
  • To check your balance, you call a different ERC20 specified function ballanceOf(address owner).

Most users use MetaMask, a Chrome and Firefox plugin, to buy into these ICOs – which is possible because they all mostly implement ERC20. The exit scam isn’t normally encoded into the contract itself.

Now, I don’t want to explain ICOs to you all and have you jump into them all gung ho. A Satis Group report that came out in April tracked ICOs that raised more than $50m, of those:

  • 81% - fraudulent projects
  • 3.8% are successful
  • 1.6% are promising

Additionally, most of these ICOs are techincally securities and the Canadian Securities Administration or the SEC will come knocking if you run or possibly even if you invest in one.

The Ethereum VM is a harvard stack machine. Stack machines are pretty simple to implement. If you’re familiar with RPN calculators, that’s pretty much how they operate.

Programs written in EVM run every client as they validate the blockchain or mine new blocks – people call it a world “World computer” with “infinite” ram. Storage and execution length is limited by the gas charge of each instruction. For example, store operations are more expensive than add or subtract. These gas costs need to pay for the persistent storage and execution of these contracts until the end of time (or the end of cryptocurrency).

I argue that the EVM has 6 address spaces, making it quite the harvard architecture. There are different instructions and addressing areas for:

  • Code
  • Memory
  • Storage
  • Arguments
  • Return Arguments
  • Execution Stack

Not all opcodes are created equal. Some are more interesting than others. In my opinion, the important and security relevant opcodes are:

  • JUMPI, JUMP, RETURN
    • These instructions define control flow, meaning they’re interesting but not really security relevant!
  • REVERT, INVALID
    • Exception causing opcodes, these are for aborting contract execution (one returns all remaining gas the other consumes all the remaining gas)
  • CALLVALUE
    • The amount of ethereum in WEI sent to to the contract in that call
  • CALLDATASIZE, CALLDATALOAD
    • Arguments passed by the caller. Think argc and argv.
  • SSTORE, SLOAD
    • Manipulating persistant storage
  • CALL, CALLCODE, DELEGATECALL
    • Calling into other contracts. CALLCODE is deprecated and seriously broken. DELEGATECALL replaces it, but it’s still a huge code smell.
  • SELFDESTRUCT
    • Destroys the contract and returns all its value to the address passed as the first argument

If a contract doesn’t have a CALL, CALLCODE, DELEGATECALL, or SELFDESTRUCT in it, there’s no way to get its Ether out.

I have the contract loaded in Ethersplay here. I’m going to step through a few parts.

We start with PC zero, where we enter the dispatch function. The dispatch does a few things, first it sets up the free memory pointer. The value at memory offset 0x40 is the next available memory slot, in the initial case it’s 0x60. Next, it checks the size of the call data - aka the ethereum argc. If it’s zero, it branches to the default function - if there were no default function, it would branch to a revert instruction.

Next, it grabs the argument starting at offset 0. It masks it to 160 bits for some reason, then further masks that down to 4 bytes. These 4 bytes represent our function hashes. It then goes through each hash and compares it to the called hash. If it matches, it dispatches to the function. If none match, it falls through to the default function or reverts. Ethersplay has a couple built-in names here, from its list of known function hashes: kill() and balance(). Looking at kill, it appears to implement a SELFDESTRUCT, which kills the contract and sends all the remaining value to the address specified as an argument to the self destruct opcode. It does some loading from storage above, so it’s probably safe to assume the address is stored in storage.

Reversing this with Ethersplay is time consuming. Stack machines aren’t really conducive to reversing or analyzing. They’re super easy to implement but it really sucks tracing down all those dups, swaps, pushes, and pops. For this reason, I wrote rattle.

To discuss Rattle, we first have to introduce SSA form. In computer architecture, ISAs can be broken into a few groups: stack machines, register machines, register-file machines, and file machines. Almost every real CPU is a register-file machine, where there is a limited number of high speed registers and the ISA supports operations between registers and between registers and memory. The limited number of registers causes the compiler spill temporary values to some local storage (normally the stack) to relieve register pressure.

But what if we had an unlimited number of registers, then we could avoid scratch storage. That’s an infinite register machine.

Now, what if we made it so a register (or value) can only be assigned once. If we do this, values are versioned and locked in time. This also provides a bit of direct taint tracking (or data flow).

So, the goal starting was to convert EVM’s stack machine into SSA form. Rather than jumping right in, I tried to do a literature review to find other people lifting a stack machine to SSA and had a little luck. I was hoping to find a nice straight-forward algorithm, but I couldn’t find one. There’s a couple Java-related papers – java is also a stack machine – that did static analysis on java and used an ssa form, but no one published a good algorithm for it.

To identify the blocks, I used the superior recursive descent approach. I was being forward thinking, what if instructions jump into the middle of other instructions! We should support these malicious contracts.

Recursive descent is a worklist based algorithm. You start by adding the initial basic block to the worklist and you loop until the worklist is empty.

For each work loop, you take the work item off the list, disassemble until you reach a block-ending instruction like JUMP, JUMPI, STOP, RETURN, INVALID, REVERT, or SELFDESTRUCT.

  • If you hit a JUMP, create a block for the JUMP target, add it to the worklist, and then stop processing the block
  • If you hit a JUMPI, create two blocks, one for the fall through block and one for the true branch of the JUMP and add both to the worklist, and stop processing the block

The problem here is you have to trace the stack to identify jump targets. Additionally, not all the jump targets are in the current basic block, so you have to resolve parent stacks. To address this, I took the approach used in scripting langauges for scope resolution. If a DUP12 instruction is issued when the SP is only at 7, I look at the parent’s scope 5 from the end and use that value, if the parent stack isn’t deep enough, i go up the scope until the value is resolved.

While I was initially pretty happy with this approach, it turned out to be very problematic. For example, how to you resolve parent stacks if your CFG is incomplete? And there’s a bigger problem…

There are a couple of SSA algorithms that work on register-file machines that don’t require a separate CFG recovery pass, but we do.

This is the process:

  • Identify blocks
  • Link blocks
  • Trace stack
  • Wrap EVM Insns in SSA Insns
  • Skip DUP, SWAP, POP, PUSH – but perform their stack actions
  • ???
  • Profit!

But we have a problem, EVM doesn’t support internal subroutines. It has no local call or local return opcodes.

The Solidity developers get around this by pushing a return address onto the stack and – several blocks later – indirect jumping back. This is fine if only one function calls an internal function, but becomes a problem when two callers call into the same function.

If I trace function 1, I enter the inline function and mark it’s JUMP out on 44 as returning to #10.

If I then trace function 2, I never mark 44 as being able to jump to #30 because I only process basic blocks once.

Even if I did reprocess the blocks, it suggests that function 1 can flow to address 30, which is impossible.

Instead of going with recursive descent, I went with a linear sweep. Why? Well because the yellow paper actually defines control flow as only being able to go to valid jump targets and you can’t jump into misaligned instructions – and parity and geth enforce this.

So this time, we initially identify the blocks by a simpler algorithm:

  • If we encounter a terminating instruction end the block
  • If we encounter a JUMPDEST, start a block
  • If our terminator instruction was a JUMPI, start a block at PC + 1 and link it as the fall-through block.

Next, we trace the stack to link in any indirect jumps – but we had a similar problem as before. We would miss control flow due to inline function calls sharing blocks.

So, I kept ending up with unresolved control flow. I decided to give everything a re-think. This time armed with the fact that a block stack can never be greater than 32 entries deep. I started by identifying the blocks as before. I linked in fall-through blocks when I came across a JUMPI. Instead of resolving up to dominating blocks’ stacks, I pre-filled every basic block with 32 placeholder functions with metadata that includes their originating block and their initial SP. Then traced the instructions in each block and linked in resolvable jump targets – which meant jump targets where the target is pushed in the same block as the jump. At the end of the pass, I checked to see if I encountered a new edge. If I did, I throw a NewEdgeException and do it all over again with the new edge. I do this until the cfg converges. After this, my CFG is complete and my SSA instructions are half created. To finalize it I need to canonicalize them.

Let’s quickly walk through a SSA recovery example. On the left we have the EVM, in the middle we’ll have our EVM::SSA, and on the right is our recovery stack. Initially filled with placeholder values.

Our program counter is at the initial JUMPDEST, which we don’t lift we we continue to the next line.

Here we hit a CALLER instruction, which takes no arguments but returns the caller address of the transaction. For the return value we assign %0 and push it to the top of the recovery stack.

Next we have a DUP2, which takes the 2nd item on the execution stack and duplicates it.

The PUSH1 pushes a constant value. We temporarily lift the PUSH1 and assign it to %1. Later we’ll remove all the PUSH operations and propagate the constants.

Here we have an ADD, which takes two items off the execution stack as arguments. You can see it’s adding an unresolved value to our #0. The unresolved value will be resolved during CFG recovery, right now it just means that the value comes from a dominating basic block.

Jumping ahead a bit, we can see our EVM::SSA basic block being filled out further.

Jumping ahead again. During this stage of analysis, If I lift a jump and its target is decidable, I update the edges and it all starts over with more data. This allows me to have a better understanding of dominating basic blocks.

To finish lifting to SSA form, we need to resolve all these unresolved placeholder values. For this the algorithm is basically:

while function.dirty():
    function.resolve_phis()

Where the resolve phis function iterates over all the instructions and their arguments. If it identifies a placeholder value, it resolves it from the parent stack similar to the initial approach using the scripting engine style scope resolution chain.

There’s one difference though, if a block has more than one incoming edge, it creates a PHI instruction with as many arguments as incoming edges. I haven’t introduced it yet, but a PHI instruction is an unavoidable artifact of SSA form which basically says “whichever value was updated last”.

Removing stack operations isn’t the only thing we can do with SSA form. Since every value has exactly one version, we can trace all the “users” (or readers in my code) of a specific value version. If the value happens to be the result of a push and is constant, we can start implementing constant folding.

So that example from the tweet Patrick posted (on top) becomes (bottom)

ANDs with that constant indicate a uint160 type, or an address.

Additionally, we can now trivially graph variable definition and uses. Here’s an example from when I was debugging some impossible control flow bugs.

We can also recover memory locations by finding all MSTORE, MLOAD, and SHA3 instructions. And storage locations by tracing all SLOAD and SSTORE instructions.

We can identify external calls and checking to see if a smart contract can send ether.

Finally, we can trace loads from the first argument – which specifies the function hash – and comparisons against it to generically identify methods and split them off. Once we split them off we can recover their ABI by tracing further loads from calldataload. We can also check to see if the function is marked payable or not. It’s also trivial to identify the onlyOwner style modifiers once the functions have been seperated.

Demo

Rattle is in a private beta now. If you do smart contract stuff and would like early access I have a wufoo form. I’ll tweet this link out shortly, don’t worry about writing it down. We’ll be publishing it to github in the near future.

Update: Rattle is now available publicly on our GitHub.

Questions?