Optimium 101 (9) — Ownership System of Nadya

Optimium

Solutions

Company

Resources

Contact

Select Language

Optimium

Solutions

Company

Resources

Contact

Select Language

Product

Optimium 101 (9) — Ownership System of Nadya

We will discuss how Nadya can safely manage memory without using garbage collector, and how it brings its guarantee to tensors. Furthermore, we will have discussion about why Nadya ownership system is designed, and why it is better for both programmers and compiler optimizations.

Jaewoo Kim

August 19, 2024

Hello, my name is Jaewoo Kim, and I am developing Nadya at ENERZAi. Today, we will deep-dive into technology under Nadya. We will discuss how Nadya can safely manage memory without using garbage collector, and how it brings its guarantee to tensors. Furthermore, we will have discussion about why Nadya ownership system is designed, and why it is better for both programmers and compiler optimizations.

Restrictions of previous memory management models

Programming languages can be categorized into two groups based on how they manage memory. Most popular group is “managed” language, where allocation and deallocation is done implicitly by garbage collector. Many languages including C#, Java, Python, and Javascript use this methodology. Garbage collectors kick in during the execution, and they gather all resources that are no longer used in the program. This is very convenient from the perspective of compiler, but program’s performance can be limited by the efficiency of the garbage collector.

On the other hand, “Unmanaged” programming languages require programmers to allocate and deallocate memory explicitly. C and C++ are great examples. These languages do not require garbage collectors since it is the programmer’s responsibility to manage memory safely. This makes programming complicated and demands greater effort and caution from the programmer.

While Most programming languages fall into the two groups mentioned above, Rust introduced a concept called ‘ownership’ to manage memory safely without relying on a garbage collector. In Rust, compiler statically analyzes how memory is used throughout the program, and which value owns which memory, and when it should be freed.

Inspired by Rust, we have decided to introduce enhanced ownership system for Nadya. Since Nadya has to be both extremely fast and memory-safe, this was the only viable option. We could not use garbage collector since it could lead to inconsistent program performance and memory usage. Furthermore, we did not want to place the burden on programmers to ensure that the program is free from memory leaks.

Nadya ownership system

Nadya extends both Rust and Descend, and provides more flexible, safe model for both CPUs and GPUs. At Nadya v0.3.0, Nadya introduced its own ownership system for managing data without using garbage collector. It uses similar ownership model used in Rust, where “binding” may own data and data can be borrowed from binding to gain access.

Before going further, let’s discuss some key concepts in the Nadya ownership system.

{
 let value = 10 // "value" is binding, and "10" is data
 ...
 // Lifetime of data "10" must end before here, since its owner binding "value" cannot escape the scope
}

Data (“10” in the example)

Data refers to any information that requires memory for storage. Memory here may be placed on stack, heap or even processor register if it fits.
Data may be copied, or moved to other memory. While “copy” keeps source data, while “move” invalidates source data

Binding (“value” in the example)

Binding is a reserved place where data can be stored, and its lifetime is limited to the duration of the scope in which it is defined. Binding may ‘own’ data.
If data is bound to certain binding, the lifetime of the data becomes same as that of the binding unless it is moved out of the binding.

Lifetime (implicitly attached to “data” by compiler)

Lifetime is a property of “data” that specifies how long the data remains valid within a program.
If data is no longer used, its lifetime ends immediately.
When the lifetime of certain data becomes invalid, the data is destructed.
The Nadya compiler includes lifetime inference, which determines the lifetime of every data used in the program and decides when the data should be destructed.

and here are the actions that may be done.

Copy

Copy creates an exact duplicate of the original data.
Copied data is not linked to the source data and has its own lifetime

let a = 10
let b = a // This is Copy!
print(a) // binding "a" is still alive (owns value)

2. Move

“Move” transfers “Data” from one location to another.
It does not create new lifetime.

let a = 10
let b = move(a) // This is Move!
print(a) // Error! binding "a" does not have any data (it has been moved out)

3. Borrow

“Borrow” creates loan from other data that is already bound to a binding.
Borrowed data is a stack address of the data that is bound to a binding
The lifetime of borrowed data cannot exceed the lifetime of the binding it is borrowed from.

let a = 10
let borrowA = &a
print(a) // prints 10
print(borrowA) // prints 10. Points to 'a'

Let’s take a look at some examples for further understanding.

module core::Vector

let owner = core::Vector::ones<i32>() // (1) Creates empty vector
let borrower = &owner // (2) Immutably borrows from "owner". May access 'owner' but may not modify it

// (3) Prints inner value (Note : borrower.get(0) returns borrow type)
print(borrower.get(0))
// Note : lifetime of data stored in vector is same as vector instance itself

// Lifetime of borrower ends here since it does not have further use.

let mutableBorrow = &mut owner // Mutably borrows from "owner". May access 'owner'

New vector type is created, and it is bound to owner. “owner” binding becomes the owner of the empty vector data.
“borrower” immutably borrows from “owner”. After this, “borrower” holds a reference to the borrowed data from “owner”, which is the stack address of “owner”

Note lifetime of “borrower” is bound to “owner”.

“get” method borrows the first element of the data.

Note that the lifetime of the returned data must be shorter than the lifetime of “owner”. In this case, lifetime of returned data ends immediately after the “print” function is executed.

Ownership system for tensors

Ownership system of Nadya was inspired by Rust, but extends capabilities, especially when dealing with tensors. In Nadya, tensor is a primitive data type that compiler can directly optimize, which plays key role in boosting Nadya’s performance. Nadya allows data to be partially borrowed from its owner using range analysis. The partially borrowed tensor is called Subview.

Using subview, programmers can use a tensor as if it were any other tensor. However, under the hood, it accesses the original data by performing index conversion operation that converts given indices to the indices in the original tensor.

Combining subview with ownership provides safety guarantees for tensor data. Part of owned tensor’s data can be borrowed by other binding, while ensuring there’s no violation on mutability and borrow rules, reducing the risk of data races and the need for synchronization barriers in parallel programming. According to Kopcke et al., researchers used this concept to prevent data race between different GPU execution resources. They expanded Rust’s borrow checker and adapted it to GPU programming to provide safety guarantees and simplify synchronization. In their model, partial view of the data can be borrowed and owned by certain execution resource, making data races between different execution resources easily detectable.

In Nadya, we expand this concept to general programming. We combined the concept introduced in Rust with the idea suggested by Kopcke et al. to apply these principles to general-purpose programming without sacrificing safety guarantees.

Let’s start with simple Nadya code.

let myTensor = tensor((2, 3), 0) // (1) Initialize zero tensor of shape (2, 3)
// (2) Borrow whole range of the tensor
let borrowedTensorSimple = &myTensor
// (3) Borrow tensor with range (0:2:1, 1:3:1)
// Can be thought of tensor with shape (2, 2)
let borrowedTensorWitRange = &myTensor[(0:2:1, 1:3:1)]

Tensor is created at (1), and is borrowed from (2) and (3). While (2) borrows every element of the tensor, (3) only borrows part of the tensor. Both creates tensor subview, which is a subset of the original tensor that can be considered as a tensor with constrained shape when indexing. In (3), the tensor subview can be treated as a tensor with the shape (2,2) since it is indexed with (0:2:1, 1:3:1). (Note. a:b:c means “from a to b, with step c, excluding index b”)

Nadya compiler tries to statically analyze dependency relationship between subview tensor and origin tensor. If this process is successful, Nadya compiler can pre-optimize indexing expressions between a subview tensor and the origin tensor, converting subview indexing to origin tensor indexing internally. If the Nadya compiler fails to do so (often due to an uncertain execution path), the indexing logic is dynamically computed at runtime.

In the example above, since the entire use-def chain from the borrowed tensor to the origin tensor is statically visible to the compiler, compiler can optimize load & store operation for borrowed tensor by directly accessing the origin tensor.

However, there are cases that source of the borrowed tensor is not visible in static analysis. One of the example is passing borrowed tensor to function.

let myTensor = tensor((2, 3), 0)
let borrowedTensor = &myTensor[(0:2:1, 1:3:1)]

myFunc(borrowedTensor)


fun myFunc(input : &tensor<i32, 2>){
 // How can we find the location in original tensor outside the function?
 let borrowedTensor[(1,)] 
 ...
}

In this example, borrowedTensor is passed as an argument to the function. In this case, compiler cannot assume where myFunc is defined. It can be in other module or translation unit, which is not visible in current translation unit. Therefore, resolving index to original myTensor from borrowed Tensor must be dynamically handled.

Therefore, Nadya internally expresses borrowed tensor as follows:

template<typename T, std::size_t rank>
struct Tensor{
 T* Data; // Pointer to the data where tensor is stored. Null if this tensor is borrowed
 int64_t Shape[rank]; // Shape of the tensor
 int64_t Stride[rank]; // Stride of the tensor
 int64_t Offset[rank]; // Index offsets
 Tensor<T>* ParantTensor; // Pointer to the stack address of parent tensor. Null if tensor is not borrow of other tensor
}

With this structure, since borrow checker guarantees that the parent tensor remains alive as long as a given tensor is borrowed, tensor can be used in anywhere in the program, even in different translation unit without sacrificing safety guarantees. However, performance may be reduced due to dynamic handling of index offsets and indirections. That’s why Nadya compiler tries to statically analyze indexing rules whenever possible.

Philosophy behind ownership system of Nadya

Ownership system of Nadya is critical to ensure safety while keeping programming experience simple and concise. Furthermore, it also helps compiler to safely reason compilation process.

One of Nadya’s main contributions to the programming language and compiler industry is its integration of optimizations — previously only possible in domain-specific languages — into general-purpose programming. We achieved this integration by defining the language’s own semantics, which are easy to reason about.

Why does Nadya focus on making safe assumptions from the beginning?

Recently, due to the rise of AI and machine learning technology, compiler engineers are focusing on accelerating AI and HPC workloads. For example, [2]Polygeist( william et al. ) worked on parsing C code into MLIR, converting MLIR to SCoP constructs with its own passes, and utilizing existing tools such as [3]CLoog and [4]Pluto (Bondhugula et al.). Although this demonstrates performance improvements with polyhedral optimization, its scope is constrained to loops with simple control flow and clear load-store dependencies. Moreover, these compilers lack higher-level information of the program. They can assume whether a loop can be transformed into other form, but they do not understand the loop’s purpose. To address this issue, there have been efforts to raise lower-level IR into high-level abstractions used by XLA or MLIR Linalg dialects. For example, [5]mlirSynth (Brauckmann et al.) introduced a code generator that raises low-level MLIR dialect, which may be generated by [2]Polygeist, to higher-level dialects that can be optimized by XLA or MLIR Linalg dialect passes. This approach has shown better performance compared to those lowered directly to LLVM IR or even those using polyhedral frameworks.

As previous research shows, capturing high-level context information is crucial for generating highly optimized code. When a compiler can make safer assumptions at a higher level, it can perform more effective optimizations. This is the reason that domain-specific compilers, such as XLA or MLIR-Linalg passes, often achieve better optimization than lower-level compilers like LLVM-opt or certain polyhedral compilers. Nadya aims to support programmers to achieve their goals in a safer and more effective manner, reducing the risk of errors and helping compilers to make safer assumptions about the program.

Optimium

Solutions

Company

Resources

ENERZAi

Business number: 246-86-01405