
Optimium 101(10)-Modeling compilable general programming language #1

Hello, this is Jaewoo Kim. I work as an engineer developing Nadya, a programming language specialized for HPC. In this article, I will cover the general language features Nadya uses to manage its data, and their basic semantics. After that, I will explain how these can be modeled in MLIR, with several examples. I hope this article helps anyone who wants to understand the architecture of Nadya and how a general-purpose programming language can be modeled using the MLIR infrastructure.

Jaewoo Kim

March 10, 2025

Implementing semantics for a general-purpose language

Modern programming languages follow several rules for handling data. Of course, each language has its own rules, but in general most of them are similar (at least for compiled languages such as C++, Rust, etc.).

Here are the operations that count.

Operations

  • copy : Deep-copies (or clones) a value, decoupling the new value from the source. A copied value can be treated completely separately from the original value.

  • move : Moves all contents of a value into another value and invalidates the source value.

  • borrow : Creates a loan of another l-value (stack value), which means the borrowed value is “owned” by the source value. Typically modeled as the stack address of the l-value.

  • access : De-borrows a borrowed value, returning the original l-value behind it. However, the l-value produced by an access operation is still considered borrowed from the original value. In other words, this value cannot be dropped independently of the owner it was accessed from.

  • assign : Replaces the value of a mutable let-binding with a new value. Assignment can only be done on l-values, since its semantics replace the value stored in stack memory. If the let-binding is immutable, trying to assign to it causes a compile error.

Copy

Copying a value creates a completely new instance with no relationship to the source. A copied value is considered “owned”, which means it can be deleted or moved freely without any restrictions. To be more specific, values that can be copied with a simple byte-copy are considered “trivially copyable”. Otherwise, the type is required to implement the “Copy” trait, which requires the user or the compiler (if the type is builtin) to provide a handler that copies the whole data.

let mut a = tensor((2i, 3i), 0.0f32)
let b = a // Copy is called here (default behavior of let binding the l-value is "Copy")
a[(0i, 1i)] <- 10.0f32 // modify a
print(toStr(b)) // b is not modified

Move

It would be beneficial to avoid deep copies as much as possible in a program. It is also useful to be able to make references to a value. In Nadya, this is achieved with the move and borrow semantics.

For example, if we move the tensor into another value as in the following example, the internal contents of value a are now possessed by value b. This process does not require a deep copy; a shallow copy is enough. Therefore, the nadyaCore.move operation is internally equivalent to an llvm.load operation, but it exists to abstract this at a higher level.

let a = tensor((2i, 3i), 0.0f32)
let b = move(a)
print(toStr(a)) // Error! (a is invalidated)

Borrow

let mut a = tensor((1i, 1i), 0.0f32) // create 2 dim tensor with shape (1i, 1i)
let mut b = &mut a // "let mut" means "b can be reassigned", while "&mut" means a can be modified via b.
b[(0i, 0i)] <- 10.0f32 // assign value 10.0f32 to b's first index
print(toStr(a)) // prints tensor((1i, 1i), 10.000000f32)

The lifetime of a borrowed variable such as b is bound to the lifetime of a. This means b cannot outlive the scope of a, and a cannot be destructed before b. This restriction is guaranteed by an internal verification mechanism implemented in the Nadya compiler.

When borrow is compiled down to the LLVM level, it is actually a no-op; it is simply represented as an opaque ptr pointing to the stack address of a. Since a borrowed value is a ptr to its owner value, the value being borrowed (the owner value) must be an l-value.

For instance, the following code is not valid in Nadya. (This is possible in Rust, but Rust also creates an l-value internally. In Nadya, we explicitly mark this as an error.)

let mut a = &10 // Wrong!

Access

let mut a = tensor((1i, 1i), 0.0f32) // create 2 dim tensor with shape (1i, 1i)
let mut b = &mut a // "let mut" means "b can be reassigned", while "&mut" means a can be modified via b.
let accessed = @b // '@' operator is used to access the borrowed value.

The access operation de-borrows a borrowed type and returns its source type. You can think of this as reading through a pointer. However, Nadya's lifetime-analysis algorithm still considers the de-borrowed value to depend on its source value. In the above example, the lifetime of accessed is still treated as borrowed from a, which means a will not get dropped before accessed is dropped.

Assign

let mut a = 10
a <- 20 // Replace value stored inside the stack memory of a to "20"

let mut tensorA = tensor((2i, 3i), 0) // i32 tensor
tensorA[(0i, 1i)] <- 1 // partial-assign the value. We model this as partially replacing the stack-allocated value

The assign operation replaces the value stored on the stack with a new value. The mut keyword in a let binding means the binding's stack memory can be completely or partially replaced with something else. (Don't confuse this with &mut on the right-hand side.)

Implementing general language features using MLIR

As its backend technology, Nadya uses MLIR, a compiler infrastructure that allows the definition and customization of multi-level IRs. MLIR is a truly powerful infrastructure for developing languages, since it can abstract complicated operations with IRs defined at different abstraction levels. It also ships with many default IRs (dialects) that can perform optimizations at different abstraction levels. However, while the MLIR infrastructure makes it possible to implement a general-purpose language, it lacks a default IR for modeling a general-purpose language directly.

To implement a general-purpose language, we should clarify the following concepts:

  • Which values are on stack, and which values are not?

  • How are the values related to each other in memory?

  • How do we copy & delete data types properly?

  • (For Nadya) How can optimization benefit from these higher-level concepts?

However, by default, MLIR lacks the ability to model these kinds of concepts, so we had to define them ourselves. Let me discuss how we did this, and propose the concept to the MLIR community for other potential language developers around the globe.

Separating l-values and r-values

The first thing we considered was the separation between l-values and r-values.

For example, look at this Nadya program.

// function receiving single 32bit integer, and returning 32bit integer
fun add(arg : i32) -> i32 {
 let a = 10
 let b = 20
 a + b + arg
}

Conceptually, a and b would be pushed onto stack memory, while 10 and 20 are temporary constants.

Here, values like 10 and 20 are called r-values, while a and b are called l-values.

The biggest difference between the two is lifetime. r-values are immediate values, which are destructed after use, while l-values live until the scope ends (the end of the add function in this example) so they can be used later in the same scope.

This means 10 and 20 should be pushed onto the stack, and the temporaries deleted immediately afterwards. (Let's not think about optimization for now.)

When executing this function, the program stack would look something like:

// stack-bottom
| arg : i32 // argument. It's pushed onto the stack since it's conceptually an l-value (it has a name assigned)
| 10 : i32 // "a" points to this value
| 20 : i32 // "b" points to this value
// stack-top (stack grows downwards)

When performing a + b + arg later, the compiler loads arg, a, and b from the stack as r-values and adds them together.

Actually, a simple case like this can be modeled easily using LLVM IR directly, since it provides llvm.alloca (allocate memory on the stack), llvm.store (store a value into allocated stack memory) and llvm.load (load a value from the stack). We call such types “trivially copyable”, meaning the value can be copied completely with byte manipulation (a simple memory copy).

For trivially copyable types, shallow copy and deep copy make no difference. (We call shallow copy “move” and deep copy “copy” in Nadya.)

Why do we need special modeling on top of low-level LLVM IR?

However, most general programming languages use more complicated types such as strings, user-defined structs, lists and arrays, etc. For example, in Nadya, a type may define custom implementations for the copy & drop operations.

This example shows simple data wrapper implemented in Nadya.

// filename : example.ndy
type DataWrapper with Copy | Drop
 public
  data : tensor<f32, 2>
  name : string

impl DataWrapper 
 trait Copy
  // Automatically called when value is copied
  fun copy(&self : Self) -> Self {
   print("Copy called on " + self.name)
   let new_tensor = self.data // deep-copy the tensor (In nadya, deep-copy is default behavior)
   DataWrapper(new_tensor, self.name)
  }
  
 trait Drop 
  // Automatically called when value is deleted
  fun drop(mut& self : Self) -> () {
   print("Drop called on " + self.name)
   // data is automatically deallocated by the wrapper after "drop"
  }

Internally, DataWrapper would look something like this in C++:

#include <cstddef>

struct Tensor {
 void* data;
 std::size_t rank;
 std::size_t* shape;
};

struct String {
 char* data;
 std::size_t length;
};

struct DataWrapper {
 Tensor data;
 String name;
};

In this case, shallow copy (move) and deep copy (copy) would differ. While deep copy is defined to copy the whole contents of the data, shallow copy only copies the pointer values themselves. If we had to write the copy & drop semantics of this struct directly in LLVM, it would be far too complicated.

Therefore, we developed our own model for this case, using the nadyaCore dialect.

The nadyaCore dialect provides the following operations:

  • push operation: pushes an r-value onto the stack and creates an l-value

  • copy operation: deep-copies an arbitrary type

  • drop operation: destructs an arbitrary type

  • move operation: shallow-copies a value. The lifetime of the source is invalidated, since the value is considered moved out.

  • borrow operation: creates an r-value that references another l-value

Now, let's see how it works by implementing the entry point of the program. (In Nadya, the entry point of an executable is nadya_main.)

// filename : example.ndy
attr[Extern]
fun nadya_main() -> i32 {
 // Define tensors
 let left = DataWrapper (tensor((2i, 3i), 1.0f32), "left")
 let right = DataWrapper (tensor((2i, 3i), 2.0f32), "right")
 
 // add two tensors. left and right have to be copied here since left and right are used later
 let added = left.data + right.data
 
 // print left, right and computation result
 print("left : " + toStr(left.data))
 print("right : " + toStr(right.data))
 print("result : " + toStr(added))
 
 0 // return code
 // Before exiting, left, right and added must be destructed (inserted automatically by compiler).
}

Executing this with ndc example.ndy -o example gives the following result (compiled without optimization):

Copy called on left
Drop called on left
Copy called on right
Drop called on right
Copy called on left
Drop called on left
Drop called on left
left : tensor<f32, 2>((2i, 3i, ), 1.000000f32, 1.000000f32, 1.000000f32, 1.000000f32, 1.000000f32, 1.000000f32)
Copy called on right
Drop called on right
Drop called on right
right : tensor<f32, 2>((2i, 3i, ), 2.000000f32, 2.000000f32, 2.000000f32, 2.000000f32, 2.000000f32, 2.000000f32)
result : tensor<f32, 2>((2i, 3i, ), 3.000000f32, 3.000000f32, 3.000000f32, 3.000000f32, 3.000000f32, 3.000000f32)

This is the abstracted MLIR output converted from the AST:

...
%left_pushed = nadyaCore.push %left // push value to the stack
%right_pushed = nadyaCore.push %right // push value to the stack

%left_copied = nadyaCore.copy %left_pushed // copy the value
%right_copied = nadyaCore.copy %right_pushed // copy the value
%added = nadyaCore.add %left_copied, %right_copied // add two values together
...
// Drop operations are inserted automatically by lifetime analysis (Data is destructed before scope ends)
nadyaCore.drop %left_copied
nadyaCore.drop %right_copied
nadyaCore.drop %left // Calls drop handler we defined above, and completely frees internal memory
nadyaCore.drop %right
nadyaCore.drop %added

As we can see, this concept can be abstracted quite easily using MLIR. Nadya uses this notion of l-values and r-values to handle data dependencies.

Modeling general programming semantics is not easy, but MLIR provides great infrastructure

As we explained, MLIR can be used to abstract general programming features, and it can express the semantics of the Nadya programming language in a simple way. With its builtin analysis tools and customizable pass pipelines, MLIR provides a strong infrastructure for building programming languages. Personally, I think that if people become more interested in extending MLIR for general purposes beyond optimizing AI workloads, the framework has a lot more potential.


ENERZAi

Copyright ⓒ ENERZAi Inc. All Rights Reserved

Business number: 246-86-01405

Email: contact@enerzai.com

Call: +82 (2) 883 1231

Address: 06140 27, Teheran-ro 27-gil, Gangnam-gu, Seoul, Republic of Korea
