Language Reference

Grackle supports many, but not all, features of the LLVM intermediate language. We have focused our efforts on supporting LLVM code that is generated from idiomatic C via the clang compiler. NOTE that Grackle currently only supports bitcode produced by the 3.6.* versions of LLVM/clang.

Basic Use

Grackle expects LLVM code to be provided in the binary bitcode format used by LLVM tools, as a single linked file. For a simple C project, such a bitcode file can be produced and simulated using the following recipe:

clang -c -emit-llvm -g ${C_SOURCES}
llvm-link -o main.bc ${BC_OBJS}
grackle main.bc

Grackle expects to find a defined symbol named main that takes no arguments and returns an int. It will simulate the function to completion and use the return value of main as its own exit code.

Grackle executes an abbreviated C startup routine. Global variables will be allocated and set to their initial values, but other ordinary pre-main startup procedures – such as automatically running functions with __attribute__((constructor)) – will not be performed. Likewise, post-main shutdown procedures are not performed.
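
Because constructors are skipped, code that relies on one for setup may see uninitialized state under simulation. One possible workaround, sketched below, is to initialize lazily on first use so that the result does not depend on the constructor having run. The names squares, init_squares and square_of are hypothetical and exist only for this illustration.

```c
#include <stddef.h>

#define TABLE_SIZE 8

/* Hypothetical lookup table that a constructor would normally fill in. */
static int squares[TABLE_SIZE];
static int squares_ready = 0;

/* Fills the table; safe to call more than once. */
static void init_squares(void)
{
    for (size_t i = 0; i < TABLE_SIZE; i++) {
        squares[i] = (int)(i * i);
    }
    squares_ready = 1;
}

/* A native build runs this before main. According to the text above,
   Grackle does not, so nothing else should depend on it. */
__attribute__((constructor))
static void init_squares_ctor(void)
{
    init_squares();
}

/* Returns i*i from the table, or -1 if i is out of range.
   Initializes on first use in case the constructor never ran. */
int square_of(size_t i)
{
    if (!squares_ready) {
        init_squares();
    }
    return (i < TABLE_SIZE) ? squares[i] : -1;
}
```

A main of the expected shape, int main(void), could then simply return square_of(3) == 9 ? 0 : 1, and would behave the same whether or not the constructor was executed.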

Language Support Summary

The following LLVM language features are supported:

  • Function definitions, calls and returns via the standard C calling convention. This includes calls to varargs functions. (However, the instructions and intrinsics needed to implement varargs functions are not yet supported.)
  • Standard control flow, including jumps, conditional jumps, switches, calls and returns. Calls both to statically-known addresses and to function pointers are supported.
  • Machine-word integer types, including the usual collection of 8, 16, 32 and 64-bit signed and unsigned integer types. The usual signed, unsigned, bitwise and comparison operators are all implemented.
  • The basic floating point arithmetic types, float and double, including their usual arithmetic operators. However, note that all floating-point computation in Grackle is abstracted to infinite-precision rational arithmetic, and floating-point values are interpreted as real numbers when queries are made to solvers with an appropriate real-number theory.
  • C aggregate types (arrays, unions, structs) are fully supported.
  • Global-, heap- and stack-allocated memory. Grackle will strictly track allocation boundaries and lifetimes and will explicitly fail if an out-of-bounds memory access or illegal pointer arithmetic is performed.
  • If compiled with debugging symbols, the simulator will report source files and locations when printing errors.
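
The rational abstraction of floating point described above means that results which depend on rounding can differ between a native run and a simulation. The following is a minimal sketch of such a case; the function name add_then_subtract is hypothetical, and the native result assumes IEEE 754 doubles with the default round-to-nearest mode and no extended intermediate precision.

```c
/* Computes (a + b) - a. In exact real or rational arithmetic this is
   always b. With IEEE 754 doubles the addition may round first. */
double add_then_subtract(double a, double b)
{
    double sum = a + b;   /* rounding can happen here on a native build */
    return sum - a;
}

/* With a = 9007199254740992.0 (2^53) and b = 1.0, the exact sum 2^53 + 1
   is not representable as a double, so under the assumptions above a
   native build rounds it back to 2^53 and the function returns 0.0
   rather than 1.0. */
```

Under exact rational arithmetic the same call would evaluate to 1.0, so a property such as "the result equals b" may hold in simulation while failing on hardware. Code whose correctness depends on rounding behavior should therefore be checked by other means.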

The following LLVM features are unsupported, or only partially supported:

  • Vector types and SIMD instructions. Basic vector types and operations are supported, but somewhat inconsistently. Some operations which are intended to work componentwise on vector values are implemented only for scalar arguments.
  • Computed jumps/gotos. These are quite rare outside of very specific circumstances (code for interpreters and VMs), and supporting them would require nontrivial extensions to Grackle.
  • Limited support for converting pointers to/from integer values. Pointers in Grackle are more than just integer bit-patterns, and cannot be fully converted. There is limited support for casting to/from integers in order to support certain patterns of computed addresses that cannot be easily expressed as a getelementptr instruction. However, the operations that can be performed on pointers are limited to the following: subtracting two non-null pointers from the same live allocation to get an offset value; and adding an integer offset to a live non-null pointer. One cannot, for example, use the XOR trick for implementing doubly-linked lists with a single machine word because the full range of integer operations is not defined on pointers.
  • LLVM instructions and intrinsics for accessing varargs are not currently implemented.
  • LLVM instructions for throwing and catching exceptions are not supported; currently there are no plans to support exceptions. Likewise, the nonlocal control operators setjmp() and longjmp() are not supported, and probably will not be.
  • Most of the standard C library is not currently implemented, but individual pieces may be easy to support in the future. Many LLVM intrinsics are also not currently supported, but can be added on an as-needed basis.
  • Multithreaded code is not supported, including atomic memory access primitives and thread-local storage.
  • Inline assembly is not supported.
  • All function and parameter attributes are ignored.
  • Function prefix and prologue data are not supported.
  • Garbage-collection intrinsics are not supported.
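
To make the pointer/integer restrictions listed above concrete, the sketch below contrasts the two patterns described as supported with the XOR pattern described as unsupported. All function names are hypothetical, and whether a particular cast sequence is accepted depends on the LLVM code clang actually emits, so treat this as an illustration of the rule rather than a guarantee.

```c
#include <stddef.h>
#include <stdint.h>

/* Pattern described as supported: subtract two non-null pointers into
   the same live allocation to obtain an offset. */
size_t byte_offset(const void *base, const void *p)
{
    return (size_t)((uintptr_t)p - (uintptr_t)base);
}

/* Pattern described as supported: add an integer offset to a live,
   non-null pointer. */
void *advance(void *base, size_t nbytes)
{
    return (void *)((uintptr_t)base + nbytes);
}

/* Pattern described as unsupported: XOR of two pointer values, as used
   for the link field of an XOR-linked list. This works on a native
   build, but the text above says it cannot be simulated. */
uintptr_t xor_link(const void *prev, const void *next)
{
    return (uintptr_t)prev ^ (uintptr_t)next;
}

/* Small driver for the two supported patterns; returns 0 on success,
   so it could serve as the body of main. */
int check_offsets(void)
{
    int buf[4] = {10, 20, 30, 40};
    size_t off = byte_offset(buf, &buf[2]);
    int *last = advance(buf, 3 * sizeof(int));
    return (off == 2 * sizeof(int) && *last == 40) ? 0 : 1;
}
```

Where possible, plain array indexing or pointer arithmetic (for example buf + 3) is preferable to integer casts, since it stays within the address computations that getelementptr can express directly.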