OCaml
OCaml can be compiled two very different ways, and the first job is to tell which one you are facing:
- Bytecode (compiled with `ocamlc`, run by the `ocamlrun` virtual machine). `file` reports something like `a /usr/bin/ocamlrun script executable`. This is the easy case.
- Native code (compiled with `ocamlopt`). `file` reports a normal ELF/PE binary. There is no decompiler, so you read assembly with knowledge of the OCaml ABI.
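
The same triage can be scripted when `file` is not available. This is a minimal sketch, assuming that a bytecode executable ends with a 12-byte magic made of `Caml1999X` plus a three-digit version, as written by the bytecode linker; the function name `classify_ocaml_binary` is hypothetical. The trailer is checked first because a bytecode program with an embedded runtime can look like an ordinary ELF/PE file at the start.

```python
def classify_ocaml_binary(data: bytes) -> str:
    """Guess whether a file is OCaml bytecode or a regular native binary.

    Assumption: bytecode executables end with b"Caml1999X" + 3 ASCII digits.
    """
    trailer = data[-12:]
    if trailer.startswith(b"Caml1999X") and trailer[9:].isdigit():
        return "bytecode"
    # ELF and PE ("MZ") magics only tell us it is a native executable format;
    # whether it was produced by ocamlopt still has to be confirmed by symbols.
    if data.startswith(b"\x7fELF") or data.startswith(b"MZ"):
        return "native-or-other"
    return "unknown"


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], "rb") as fh:
        print(classify_ocaml_binary(fh.read()))
```

A result of `native-or-other` only says the file is an ELF/PE image; looking for `caml_*` symbols or strings is still needed to confirm it came from `ocamlopt`.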
Bytecode
Bytecode keeps a lot of information and is much friendlier to reverse.
- `ocamlobjinfo` lists the primitives, modules and (for `.cmo`/`.cma`) the symbols contained in a compiled object or bytecode executable: `ocamlobjinfo a.out`
- The OCaml toolchain ships a bytecode disassembler (`dumpobj`, sometimes packaged as `ocaml-dumpobj`) that prints the VM instructions of a `.cmo`/`.byte`/bytecode executable. Closure and global names are often preserved, so the program logic is readable.
- `js_of_ocaml` compiles an OCaml bytecode executable to JavaScript. Running it on the challenge binary turns it into readable JS, which is often the fastest way to understand the logic: `js_of_ocaml a.out -o a.js`
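
If none of these tools is installed, the layout of a bytecode executable can still be inspected by hand. The sketch below assumes the trailer layout used by the bytecode linker as I understand it: the file ends with a table of 8-byte entries (a 4-character section name and a big-endian 32-bit length), followed by a big-endian 32-bit section count and the 12-byte magic. The function name `list_bytecode_sections` is hypothetical, and the layout should be verified against the OCaml version in use.

```python
import struct

MAGIC_LEN = 12    # b"Caml1999X" + 3-digit version (assumed trailer magic)
TRAILER_LEN = 16  # 4-byte section count + 12-byte magic


def list_bytecode_sections(data: bytes) -> list[tuple[str, int]]:
    """Return (name, length) pairs from the section table at the end of the file."""
    if not data[-MAGIC_LEN:].startswith(b"Caml1999X"):
        raise ValueError("no OCaml bytecode trailer found")
    # The section count sits just before the magic, big-endian.
    (count,) = struct.unpack(">I", data[-TRAILER_LEN:-MAGIC_LEN])
    table_start = len(data) - TRAILER_LEN - 8 * count
    if table_start < 0:
        raise ValueError("section count does not fit in the file")
    sections = []
    for i in range(count):
        entry = data[table_start + 8 * i : table_start + 8 * i + 8]
        name, length = struct.unpack(">4sI", entry)
        sections.append((name.decode("ascii", "replace"), length))
    return sections
```

Section names such as `CODE`, `DATA` and `PRIM` are what one would expect to see; the section bodies are stored back to back immediately before the table, so the lengths are enough to carve each one out.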
Native code
Native OCaml has no decompiler and its value representation is unusual, which is what makes it look strange in a disassembler. Keep these conventions in mind:
- Tagged integers. Every value is one machine word. An immediate `int` is stored as `2n + 1` (shifted left by one with the low bit set), so OCaml integers always look odd and arithmetic is full of shifts, `lea`, `or 1` and `n+n` patterns. The real value is `(v - 1) / 2`, which is the same as an arithmetic shift right by one.
- Boxed blocks. Anything that is not an immediate is a pointer to a heap block preceded by a header word encoding its size in words, GC color and a tag. The tag identifies the shape: `0` for tuples, records and the first constructor with arguments (further constructors with arguments use `1`, `2`, …), `252` for strings, `253` for floats, `254` for float arrays, `247` for closures. A constructor with no argument (like `[]` or a nullary variant) is just a tagged integer.
- Closures and currying. Functions are blocks (tag `247`) holding a code pointer plus the captured environment. Partial application and multi-argument calls go through the runtime helpers `caml_curryN` / `caml_applyN`.
- Symbols. The runtime is full of `caml_*` helpers (`caml_alloc`, `caml_call_gc`, `caml_apply2`, …) and the program’s own functions keep fairly descriptive names of the form `camlModulename__function_NNN`. Following those symbols is usually enough to locate the interesting logic; the standard library appears as `camlStdlib__*`.
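
These conventions are easy to turn into small helpers for a disassembler script. The following is a minimal sketch, assuming a 64-bit target and the usual header layout (tag in the low 8 bits, color in the next 2 bits, size in words above that); all function names are hypothetical, and the symbol pattern only covers the `camlModulename__function_NNN` form described above.

```python
import re

WORD_BITS = 64  # assumption: 64-bit target


def tag_int(n: int, bits: int = WORD_BITS) -> int:
    """Encode an OCaml int as it appears in a register: 2n + 1."""
    return ((n << 1) | 1) & ((1 << bits) - 1)


def untag_int(word: int, bits: int = WORD_BITS) -> int:
    """Decode a tagged immediate: sign-extend, then arithmetic shift right by one."""
    if word & 1 == 0:
        raise ValueError("low bit is clear: this looks like a pointer, not an int")
    if word >= 1 << (bits - 1):
        word -= 1 << bits  # reinterpret the machine word as signed
    return word >> 1


# Tags named in the text; constructor tags start at 0.
TAG_NAMES = {247: "closure", 252: "string", 253: "float", 254: "float array"}


def decode_header(header: int) -> dict:
    """Split a block header word into size (in words), GC color and tag."""
    tag = header & 0xFF
    kind = TAG_NAMES.get(tag, "tuple/record/constructor" if tag < 246 else "other")
    return {"wosize": header >> 10, "color": (header >> 8) & 0x3, "tag": tag, "kind": kind}


SYMBOL_RE = re.compile(r"^caml(.+)__(.+?)_(\d+)$")


def split_symbol(symbol: str):
    """Split camlModulename__function_NNN into (module, function, id), or None."""
    m = SYMBOL_RE.match(symbol)
    if m is None:
        return None  # runtime helpers such as caml_alloc do not match
    return m.group(1), m.group(2), int(m.group(3))
```

For example, an immediate `0x55` seen in a comparison decodes with `untag_int(0x55)` to the source-level constant 42, and `split_symbol` can be mapped over a symbol table to group functions by module before reading any assembly.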