A Deep Dive into Application Binary Interfaces: Lessons learned from Nadya

Hello! My name is Jin-hwan Shin, and I’m developing Nadya at ENERZAi. (Nadya is a metaprogramming language independently developed by ENERZAi and plays a key role in the AI inference optimization engine Optimium.) In this post, I’d like to briefly share a problem we encountered while developing Nadya and how we resolved it.

Jinwhan Shin

February 26, 2025

While testing Nadya, we discovered that a segmentation fault occurred in the Nadya runtime library whenever we used aggregate types (struct types). Even though the code was correct at the IR level, once it was compiled into machine code and run, memory addresses became corrupted. Tracing the issue with a debugger at the instruction level, we found that on the C++ side these function parameters and return values were being passed as pointers, not by value. We then investigated why a value we passed was being received as a pointer in C++ and concluded that it was due to rules defined by the Application Binary Interface (ABI).

In this post, I’d like to share what I learned while solving this problem: what an ABI is and how LLVM handles it.

What is an ABI?

If you’re not a developer who works closely with hardware, the concept of an Application Binary Interface (ABI) might feel somewhat unfamiliar. Let’s start with a brief overview of what an ABI is.

Most developers frequently use an Application Programming Interface (API) in their work. An API is an interface that defines how a library designer intends for certain information to be passed, how specific tasks should be performed, and which classes or functions should be used to achieve those tasks.

Similarly, for compiler or hardware developers, an Application Binary Interface (ABI) is a set of rules defining things like the sizes of types such as int and long, how data represented by structs is laid out in memory, which registers can be used inside a function, and which registers should be used to pass parameters or receive return values.
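
To make these rules a little more concrete, here is a small sketch that asks the compiler what the current ABI decided for one struct. The struct Sample and the helper describeSample are made up for this illustration; the numbers it reports differ between targets, which is exactly what an ABI pins down.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical struct used only for this illustration.
struct Sample {
    char tag;    // 1 byte
    int  count;  // commonly 4 bytes, aligned to 4
    long total;  // commonly 4 or 8 bytes, depending on the ABI
};

// Summarizes the sizes and member offsets chosen by the ABI in effect.
std::string describeSample() {
    return "sizeof(int)=" + std::to_string(sizeof(int)) +
           " sizeof(long)=" + std::to_string(sizeof(long)) +
           " offsetof(count)=" + std::to_string(offsetof(Sample, count)) +
           " offsetof(total)=" + std::to_string(offsetof(Sample, total)) +
           " sizeof(Sample)=" + std::to_string(sizeof(Sample)) +
           " alignof(Sample)=" + std::to_string(alignof(Sample));
}

// Properties that hold under any ABI: each member sits at a multiple of its
// own alignment, and the struct size is a multiple of the struct alignment.
static_assert(offsetof(Sample, count) % alignof(int) == 0, "count is aligned");
static_assert(offsetof(Sample, total) % alignof(long) == 0, "total is aligned");
static_assert(sizeof(Sample) % alignof(Sample) == 0, "size is padded");
```

Printing describeSample() on two different platforms (for example, 64-bit Linux and 64-bit Windows) is a quick way to see where their ABIs disagree.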

Hence, the ABI is affected by the CPU’s instruction set architecture (ISA), the operating system (OS), and the compiler. In the past, ABIs varied significantly between compilers, but today, the differences in ABIs among compilers have been greatly reduced.

How is the ABI implemented in LLVM and Clang?

Since LLVM is a compiler infrastructure that must handle various target architectures and operating systems, it needs to support the different ABIs of each hardware and OS. Let’s take a look at how LLVM manages these different ABIs.

This post is based on the llvmorg-17.0.6 tag (commit id 6009708b4367171ccdbf4b5905cb6a803753fe18) of the LLVM repository (https://github.com/llvm/llvm-project). Since the codebase is vast, instead of inserting code directly, I’ve provided footnotes referencing file paths and line numbers.

In LLVM, there is a data structure called llvm::DataLayout¹ that describes information (endianness, alignment, primitive type sizes, etc.) which varies depending on the target architecture and OS.

llvm::DataLayout is represented as a string split by - (dashes), and the possible values are as follows:

e: Indicates that the target architecture is little-endian.
E: Indicates that the target architecture is big-endian.
iN: Describes the ABI alignment and preferred alignment for an N-bit integer. For example, i32:32:32.
fN: Describes the ABI alignment and preferred alignment for an N-bit floating-point type. For example, f64:64:64.
vN: Describes the ABI alignment and preferred alignment for an N-bit vector. For example, v256:256:256.
p: Describes the pointer size, alignment, and address space for the target architecture (for environments that divide memory into RAM/ROM/MMIO or GPU memory regions, etc., the pointer may need to reflect which memory space it is referencing by using a unique number). For example, p0:64:64:64.
n: Describes the native integer sizes that the target architecture supports. For example, if it supports both 32-bit and 64-bit registers: n32:64.
S: Describes the stack alignment. For example, S256.
F: Describes the alignment of function pointers. If prefixed with i, it indicates it is independent of the function code alignment (i.e., the code itself does not need to be aligned to this boundary). If prefixed with n, the function code must also be aligned to this boundary.
P: Describes the address space where function (code) is stored.
A: Describes the address space for stack memory.
G: Describes the address space for global (data) memory.
m: Describes the name-mangling rule. This applies a mangling rule suitable for the binary format or architecture. The possible values are:
    m:e: ELF – Binary format used by Linux/UNIX-like OS.
    m:l: GOFF – Binary format used by IBM’s z/OS.
    m:o: Mach-O – Binary format for macOS, iOS, and other Apple operating systems.
    m:m: MIPS – MIPS architecture.
    m:w: WinCOFF – Binary format used by Windows.
    m:x: WinCOFF-X86 – Windows binary format specialized for x86.
    m:a: XCOFF – Binary format used by IBM’s AIX OS.

// Data layout for macOS AMD64 (x86-64)
e-m:o-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128

// Data layout for Windows ARM64
e-m:w-p:64:64-i32:32-i64:64-i128:128-n32:64-S128

// Data layout for Hexagon
e-m:e-p:32:32:32-a:0-n16:32-i64:64:64-i32:32:32-i16:16:16-i1:8:8-f32:32:32-f64:64:64-v32:32:32-v64:64:64-v512:512:512-v1024:1024:1024-v2048:2048:2048
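
To get a feel for how such a string breaks down, here is a minimal sketch that splits it on dashes and pulls out two of the specifications described above. It is a toy reader written with the standard library only, not LLVM's own parser; the function names splitDataLayout, stackAlignmentBits, and nativeIntegerWidths are made up for this example.

```cpp
#include <cassert>
#include <cctype>
#include <sstream>
#include <string>
#include <vector>

// Split a data layout string into its dash-separated specifications.
std::vector<std::string> splitDataLayout(const std::string &layout) {
    std::vector<std::string> specs;
    std::stringstream stream(layout);
    std::string spec;
    while (std::getline(stream, spec, '-')) {
        if (!spec.empty()) specs.push_back(spec);
    }
    return specs;
}

// Stack alignment in bits from the "S<n>" specification.
// Returns 0 when the string carries no such specification.
unsigned stackAlignmentBits(const std::string &layout) {
    for (const std::string &spec : splitDataLayout(layout)) {
        if (spec.size() > 1 && spec[0] == 'S') {
            return static_cast<unsigned>(std::stoul(spec.substr(1)));
        }
    }
    return 0;
}

// Native integer widths from the "n<w1>:<w2>:..." specification.
std::vector<unsigned> nativeIntegerWidths(const std::string &layout) {
    std::vector<unsigned> widths;
    for (const std::string &spec : splitDataLayout(layout)) {
        // Require a digit after 'n' so that other specifications are skipped.
        if (spec.size() < 2 || spec[0] != 'n' ||
            !std::isdigit(static_cast<unsigned char>(spec[1]))) {
            continue;
        }
        std::stringstream parts(spec.substr(1));
        std::string width;
        while (std::getline(parts, width, ':')) {
            widths.push_back(static_cast<unsigned>(std::stoul(width)));
        }
    }
    return widths;
}
```

Applied to the macOS string above, stackAlignmentBits yields 128 and nativeIntegerWidths yields 8, 16, 32, and 64; the Hexagon string has no S specification, so the sketch reports 0 for it.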

Using this string defined in llvm::DataLayout, LLVM and Clang compute in advance the sizes and alignment information for the types supported by the target architecture and OS. Based on this information, LLVM then computes the size of structs and the alignment of member variables, handles call stack alignment, machine code alignment, and so on. The string is either predefined in llvm/lib/Target/<arch>/<arch>TargetMachine.cpp files or composed from llvm::Triple. You can also manually provide the string if needed, but if you use the data layout that LLVM provides, you can obtain it from the llvm::TargetMachine class. Below is a rough outline of how to do that.

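#include "llvm/MC/TargetRegistry.h"
#include "llvm/Target/TargetMachine.h"
#include "llvm/Target/TargetOptions.h"
#include <optional>
#include <string>

// Targets must be registered first (e.g. via the llvm::InitializeAll* functions in llvm/Support/TargetSelect.h).
std::string triple = "..."; // e.g. aarch64-linux-gnu, arm64-apple-darwin, x86_64-pc-windows-msvc
std::string error;
const llvm::Target *target = llvm::TargetRegistry::lookupTarget(triple, error);
if (!target) { /* handle the error; `error` describes what went wrong */ }

// cpu, featureStr, options, and RM can be left empty; they do not affect the resulting llvm::DataLayout.
llvm::TargetMachine *tm = target->createTargetMachine(triple,
                                                      /*cpu=*/"",
                                                      /*featureStr=*/"",
                                                      /*options=*/llvm::TargetOptions(),
                                                      /*RM=*/std::nullopt);
// Assumed final step: ask the target machine for its data layout.
llvm::DataLayout dataLayout = tm->createDataLayout();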
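
The struct size and member alignment computation mentioned earlier can be sketched in a few lines. This is a simplified model that assumes fields are laid out in declaration order with no packing and no bit-fields; it is not LLVM's implementation, and Field, Layout, alignTo, and layoutStruct are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Size and ABI alignment of one member, in bytes (alignment is a power of two).
struct Field {
    std::uint64_t size;
    std::uint64_t align;
};

// Result of laying out a struct: member offsets, total size, and alignment.
struct Layout {
    std::vector<std::uint64_t> offsets;
    std::uint64_t size = 0;
    std::uint64_t align = 1;
};

// Round value up to the next multiple of align.
std::uint64_t alignTo(std::uint64_t value, std::uint64_t align) {
    return (value + align - 1) / align * align;
}

Layout layoutStruct(const std::vector<Field> &fields) {
    Layout result;
    std::uint64_t offset = 0;
    for (const Field &field : fields) {
        offset = alignTo(offset, field.align);  // insert padding before the member
        result.offsets.push_back(offset);
        offset += field.size;
        result.align = std::max(result.align, field.align);
    }
    // Tail padding so that arrays of the struct keep every element aligned.
    result.size = alignTo(offset, result.align);
    return result;
}
```

For a char, a 4-byte int, and an 8-byte integer in that order, the sketch places the members at offsets 0, 4, and 8 and reports a size of 16 with an alignment of 8.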

How Clang handles calling conventions

While LLVM provides various features to ensure code generation adheres to the ABI, it doesn’t do everything on its own. For the calling convention in particular, LLVM only handles the basics (such as which parameters go in registers versus on the stack). Deciding which parameters are passed directly and which are passed indirectly (by pointer) is done by the front end, like Clang.² To do this, Clang defines a class called ABIInfo and stores target architecture- and OS-specific ABI information in the clang/lib/CodeGen/Targets folder so that it can generate code conforming to the ABI of the target environment.

The logic that determines how function parameters are passed is in the ABIInfo::computeInfo(...) member function³. This method is virtual, and each target architecture file in clang/lib/CodeGen/Targets provides its own detailed implementation. There are too many details to cover comprehensively here, since each target architecture and OS differs, but the key questions that determine whether a function’s parameter is passed in a register, on the stack (by value), or as a pointer (by reference) are roughly as follows:

Is the type’s size below some threshold that the ABI allows for register passing?
Is it a trivial class in C++ terms?
Have we exhausted all available registers for function parameter passing?
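
The three questions above can be turned into a toy decision function. Everything here is an assumption made for illustration: the ToyAbi knobs, the PassKind names, and the use of std::is_trivially_copyable as a stand-in for the triviality check are mine, and real targets apply far more detailed rules.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

enum class PassKind { Register, StackByValue, Indirect };

// Hypothetical knobs; real values come from each target's ABI document.
struct ToyAbi {
    std::size_t maxRegisterBytes;   // largest aggregate allowed in registers
    bool largeAggregatesByPointer;  // oversized aggregates: pointer or stack copy
};

template <typename T>
PassKind classify(const ToyAbi &abi, unsigned freeRegisters) {
    // Question 2: non-trivial C++ classes are handed over by address.
    if (!std::is_trivially_copyable<T>::value) return PassKind::Indirect;
    // Question 1: is the type too large for register passing?
    if (sizeof(T) > abi.maxRegisterBytes) {
        return abi.largeAggregatesByPointer ? PassKind::Indirect
                                            : PassKind::StackByValue;
    }
    // Question 3: are there any parameter registers left?
    if (freeRegisters == 0) return PassKind::StackByValue;
    return PassKind::Register;
}

// Sample types for trying the classifier.
struct Small { int x; int y; };
struct Big { double values[8]; };
struct Owner { int value; ~Owner() {} };  // user-provided destructor: not trivial
```

With a 16-byte threshold, Small goes in a register while registers remain, Big is passed indirectly, and Owner is passed indirectly regardless of its size.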

Using this information, Clang generates IR code in such a way that LLVM can produce machine code following the calling convention.

Looking at how Clang creates function declarations,⁴ you’ll see that Clang gathers information about arguments and return values by calling getTypes().arrangeGlobalDeclaration(), and then creates the llvm::FunctionType by calling getTypes().GetFunctionType()⁵. In CodeGenTypes::GetFunctionType(...), Clang checks the CGFunctionInfo for each argument. If the argument is meant to be passed directly (ABIArgInfo::Direct, ABIArgInfo::Extend), it uses the original llvm::Type corresponding to that parameter. If it needs to be passed indirectly (ABIArgInfo::Indirect, ABIArgInfo::IndirectAliased), that parameter type is changed to a pointer type when the function signature is built. When the return value is a large struct or something similar, it is also returned indirectly through a pointer, so an implicit function parameter is added for that purpose.⁶
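
The signature rewriting described above can be modeled in a few lines. This is a simplified sketch, not Clang's code: ArgKind mirrors the four ABIArgInfo kinds named in the text, while ArgInfo, isIndirect, and lowerSignature are hypothetical names, types are plain strings, and the hidden return pointer is always placed first for simplicity.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

enum class ArgKind { Direct, Extend, Indirect, IndirectAliased };

// A source-level type (spelled as IR text) plus how the ABI wants it passed.
struct ArgInfo {
    std::string irType;
    ArgKind kind;
};

bool isIndirect(ArgKind kind) {
    return kind == ArgKind::Indirect || kind == ArgKind::IndirectAliased;
}

// Build the lowered signature text, e.g. "void (ptr, i32, ptr)".
std::string lowerSignature(const ArgInfo &ret, const std::vector<ArgInfo> &params) {
    std::vector<std::string> lowered;
    std::string returnType = ret.irType;
    if (isIndirect(ret.kind)) {
        returnType = "void";       // the value comes back through memory
        lowered.push_back("ptr");  // implicit parameter for the return slot
    }
    for (const ArgInfo &param : params) {
        lowered.push_back(isIndirect(param.kind) ? "ptr" : param.irType);
    }
    std::string text = returnType + " (";
    for (std::size_t i = 0; i < lowered.size(); ++i) {
        if (i != 0) text += ", ";
        text += lowered[i];
    }
    return text + ")";
}
```

A function that returns a large struct and takes an i32 plus a large struct therefore ends up as "void (ptr, i32, ptr)" in this model.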

The logic for generating IR for function calls is in CodeGenFunction::EmitCall(...)⁷. First, Clang checks how the return value should be passed based on CGFunctionInfo. If it needs to be returned indirectly, Clang emits an alloca instruction to reserve stack memory for the return value⁸. If a function parameter needs to be passed indirectly, Clang allocates stack memory with an alloca instruction, stores the original value there, and then passes that memory address to the function. For parameters that must be passed directly, no additional work is needed, and the original value is passed as is.⁹ Next, Clang uses the prepared parameters to emit a call instruction (or an invoke instruction if exception handling is involved) to call the function.¹⁰ Finally, after the call, Clang handles the return value. If the function was transformed to return its value indirectly, Clang inserts code to read the return value from the stack memory allocated at the beginning.¹¹
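
Written out by hand in C++, the call sequence described above looks roughly like the following. This is an illustration of the idea rather than actual Clang output; Big, scale, scaleLowered, and callLowered are made-up names, and whether a struct of this size is really passed indirectly depends on the target ABI.

```cpp
#include <cassert>

// Assumed to be large enough that the ABI passes and returns it indirectly.
struct Big {
    long values[8];
};

// What the programmer writes: pass by value, return by value.
Big scale(Big input, long factor) {
    for (long &value : input.values) value *= factor;
    return input;
}

// Roughly what the callee looks like after lowering: both the result and
// the aggregate argument travel through pointers.
void scaleLowered(Big *result, Big *input, long factor) {
    for (long &value : input->values) value *= factor;
    *result = *input;
}

// A call site spelled out the way the lowered IR behaves.
long callLowered(const Big &original, long factor) {
    Big resultSlot;          // 1. stack slot for the indirect return value
    Big argCopy = original;  // 2. stack copy of the by-value argument
    scaleLowered(&resultSlot, &argCopy, factor);  // 3. pass the addresses
    return resultSlot.values[0];                  // 4. read the result back
}
```

Because the callee works on argCopy, the caller's original value is untouched, which is the by-value behavior the source code asked for.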

With these steps, Clang produces IR code that ensures machine code generated by LLVM adheres to the calling convention.

Lessons Learned

Through this process, Clang and LLVM compile C/C++ code into machine code that can be executed on a real CPU, while following the ABI of the target architecture and OS. This means the machine code produced will interoperate seamlessly with code produced by other compilers such as GCC or MSVC, and you can safely use external libraries that have already been compiled.

During Nadya’s development, we initially overlooked these details and spent several days struggling with unexplained memory bugs. However, it turned out to be a valuable lesson in understanding the rules our code follows when executed at the hardware level.

Because there’s so much ground to cover, I’ve had to condense the content significantly. But I hope this post provides some helpful insights for those who are curious.

<Footnote>

1. llvm/include/llvm/IR/DataLayout.h
2. LLVM also defines parameter attributes such as sret, byval, inalloca, and byref that mark a pointer parameter as carrying an indirectly passed value. The front end still has to create the pointer parameter itself; Clang attaches these attributes where the ABI calls for them (for example, sret on the hidden return-value pointer).
3. clang/lib/CodeGen/ABIInfo.h
4. clang/lib/CodeGen/CodeGenModule.cpp:3592
5. CodeGenTypes::GetFunctionType(...), clang/lib/CodeGen/CGCall.cpp:1619
6. clang/lib/CodeGen/CGCall.cpp:1662
7. clang/lib/CodeGen/CGCall.cpp:4905
8. clang/lib/CodeGen/CGCall.cpp:4972
9. clang/lib/CodeGen/CGCall.cpp:5005
10. clang/lib/CodeGen/CGCall.cpp:5544
11. clang/lib/CodeGen/CGCall.cpp:5690