OCaml

OCaml can be compiled in two very different ways, and the first job is to tell which one you are facing:

  • Bytecode (compiled with ocamlc, run by the ocamlrun virtual machine). file reports something like a /usr/bin/ocamlrun script executable. This is the easy case.
  • Native code (compiled with ocamlopt). file reports a normal ELF/PE binary. There is no decompiler, so you read assembly with knowledge of the OCaml ABI.
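
As a complement to file, the same triage can be sketched in a few lines of Python. This is a minimal sketch, not a definitive detector: it assumes the usual layout in which a bytecode executable ends with a 12-byte magic string beginning with Caml1999X, and the function name classify_bytes and its return labels are hypothetical. A binary that embeds its bytecode in some other way would fall into the second branch.

```python
BYTECODE_MAGIC_PREFIX = b"Caml1999X"  # assumed trailer magic, followed by a version suffix


def classify_bytes(data: bytes) -> str:
    """Guess how an OCaml program was built from its raw bytes (hypothetical helper)."""
    # A bytecode executable is expected to end with a 12-byte magic string.
    if data[-12:].startswith(BYTECODE_MAGIC_PREFIX):
        return "bytecode"
    # ELF and PE magic numbers: a native binary, or bytecode embedded some other way.
    if data[:4] == b"\x7fELF" or data[:2] == b"MZ":
        return "native or embedded"
    return "unknown"


def classify(path: str) -> str:
    """Same check, reading the file from disk."""
    with open(path, "rb") as f:
        return classify_bytes(f.read())
```

Calling classify("a.out") on the challenge file gives a first guess that can then be confirmed with file and ocamlobjinfo.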

Bytecode

Bytecode keeps a lot of information and is much friendlier to reverse.

  • ocamlobjinfo - lists the primitives, modules and (for .cmo/.cma) the symbols contained in a compiled object or bytecode executable.

    ocamlobjinfo a.out

  • The OCaml toolchain ships a bytecode disassembler (dumpobj, sometimes packaged as ocaml-dumpobj) that prints the VM instructions of a .cmo file or a bytecode executable. Closure and global names are often preserved, so the program logic is readable.

  • js_of_ocaml compiles an OCaml bytecode executable to JavaScript. Running it on the challenge binary turns it into readable JS, which is often the fastest way to understand the logic.

    js_of_ocaml a.out -o a.js

Native code

Native OCaml has no decompiler and its value representation is unusual, which is what makes it look strange in a disassembler. Keep these conventions in mind:

  • Tagged integers. Every value is one machine word. An immediate int n is stored as 2n + 1 (shifted left by one with the low bit set), so OCaml integers always look odd in a disassembler and are 63 bits wide on a 64-bit target. Arithmetic is full of shifts, lea, "or 1" and "n + n" patterns. The real value is (v - 1) / 2, which is the same as an arithmetic shift right by one.

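  • Boxed blocks. Anything that is not an immediate is a pointer to a heap block preceded by a header word encoding its size, GC color and a tag. The tag identifies the shape: 0 for tuples, records and ordinary arrays, 252 for strings, 253 for floats, 254 for float arrays, 247 for closures. Variant constructors that carry arguments are numbered from 0 in declaration order, so the tag tells you which constructor you are looking at. A constructor with no argument (like [] or a nullary variant) is just a tagged integer, numbered the same way: [], () and false all appear as 1, and true appears as 3.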

  • Closures and currying. Functions are blocks (tag 247) holding a code pointer plus the captured environment. Partial application and multi-argument calls go through the runtime helpers caml_curryN / caml_applyN.

  • Symbols. The runtime is full of caml_* helpers (caml_alloc, caml_call_gc, caml_apply2, …) and the program’s own functions keep fairly descriptive names of the form camlModulename__function_NNN. Following those symbols is usually enough to locate the interesting logic; the standard library appears as camlStdlib__*.
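
The tagging and header conventions above can be turned into a small helper for decoding constants seen in a disassembler or debugger. This is a minimal sketch under stated assumptions: a 64-bit target, and a standard header layout with the tag in the low 8 bits, the GC color in the next 2 bits and the size in words in the remaining upper bits (builds that reserve header bits for other purposes will differ). All function names are hypothetical.

```python
MASK64 = (1 << 64) - 1

# Tags named in the text above; any other tag is reported as "other".
TAG_NAMES = {
    0: "tuple/record/variant",
    247: "closure",
    252: "string",
    253: "float",
    254: "float array",
}


def is_immediate(word: int) -> bool:
    """Low bit set means a tagged integer, otherwise a pointer to a block."""
    return word & 1 == 1


def untag_int(word: int) -> int:
    """Recover the OCaml int from a 64-bit tagged word (arithmetic shift right by one)."""
    signed = word - (1 << 64) if word >> 63 else word
    return signed >> 1


def tag_int(n: int) -> int:
    """Encode an OCaml int as it appears in registers and memory: 2n + 1."""
    return ((n << 1) | 1) & MASK64


def decode_header(header: int) -> dict:
    """Split a block header into size in words, GC color and tag (assumed layout)."""
    tag = header & 0xFF
    return {
        "wosize": header >> 10,
        "color": (header >> 8) & 3,
        "tag": tag,
        "shape": TAG_NAMES.get(tag, "other"),
    }
```

For example, an immediate 0x15 in the disassembly is untag_int(0x15), i.e. the OCaml integer 10, and a header word of 0xCFC decodes to a 3-word block with tag 252, a string.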