Company

Resources

Company

Resources

Product

Optimium 101 (6) — Statically Typed Languages and Type Inference

Optimium 101 (6) — Statically Typed Languages and Type Inference

Optimium 101 (6) — Statically Typed Languages and Type Inference

This week, I will delve into a topic that we carefully considered while designing Nadya: the “type system in programming languages.” This article will provide a general overview of how programming languages are categorized by their type systems and the advantages and disadvantages of each. Let’s go!

Seungwuk Eun

2024년 5월 22일

Hello, I am Seungwuk Eun from ENERZAi, the developer of Nadya. Last week, we introduced Nadya, a programming language designed for high-performance computing. This week, I will delve into a topic that we carefully considered while designing Nadya: the “type system in programming languages.” This article will provide a general overview of how programming languages are categorized by their type systems and the advantages and disadvantages of each. Let’s go!

What is a Statically Typed Language?

A statically typed language is a language where the type of every value is determined at compile time, ensuring stability and safety. In contrast, a dynamically typed language determines the type of a value at runtime. Examples of statically typed languages include C/C++, C#, Java, and Go, while Python, JavaScript, and Ruby are well-known dynamically typed languages.

Statically Typed Language

Let’s look at an example in C code:

int main() {
  int integer = 0;
  const char *str = "String";
  integer = str;
  return 0;
}

In above code, the line integer = str will cause a compile-time error:

test.c:4:10: error: incompatible pointer to integer conversion assigning to 'int' from 'const char *' [-Wint-conversion]
    4 |         integer = str;
      |                 ^ ~~~
1 error generated

This is because C does not allow assigning a const char * type value to an int type variable. The compiler checks the types at compile time and stops the compilation if there is a type error, which is a key feature of statically typed languages.

Dynamically Typed Language

Now, let’s look at an example in Python:

a = 10
a = "string"
a = { "key" : "value", "another key" : 10 }
print(a)

Above code will run without any errors.

$ python test.py
{'key': 'value', 'another key': 10}

Even though the variable a initially holds an int, it can later hold a str and then a dict without any issues. It is because Python doesn’t restrict these kinds of codes. Meanwhile, what will happen to below code?

a = 10
a = "string"
a = { "key" : "value", "another key" : 10 }
print("run")
print(a.split('t'))

The split method is specific to the Python str type. If a was a "string" , it would have printed ['s', 'ring']. However, since a has been assigned dict type at the very end, the output will be as follows:

$ python test.py
run
Traceback (most recent call last):
  File "/home/test/test.py", line 5, in <module>

Since the dict type does not have the split attribute, it raises an error. However, you can see that the run output is still displayed. This means that the program does not know that a dict type value does not have a split method until it is actually run. This is a representative characteristic of dynamically typed languages, where type checking is not done until just before execution for syntactically correct programs.

Advantages and Disadvantages of Statically Typed Languages, and Why?

At a glance, Python programs feel more flexible and freer to write. On the other hand, C programs might feel less flexible due to types, and some programmers who mainly use dynamically typed languages may even feel repulsed.

The pros and cons of statically typed languages can be summarized as follows. Note that these pros and cons do not apply to all languages and all situations. They are general characteristics.

  • Advantages

  1. By checking types in advance, runtime errors can be reduced.

  2. Programs run faster.

  3. Understanding the program’s context is easier due to clear types.

  4. There is more information available for use in Integrated Development Environments (IDEs) (e.g., auto-completion).

  • Disadvantages

  1. Writing flexible code is difficult.

  2. Simple logic can become verbose due to types.

  3. The learning curve is steeper compared to dynamically typed languages.

Dynamically typed languages tend to have the exact opposite pros and cons compared to statically typed languages.

Others seem straight forward, but why would ‘Programs run faster’ with statically typed languages? Why are statically typed languages faster and dynamically typed languages slower?

Computers and Memory

To understand this, it is necessary to have a basic understanding of how computers work. The general process of a computer performing a computation is as follows:

  1. Look up memory and pass it to the register.

  2. Perform computations through the arithmetic logic unit (ALU) and pass it to the register.

  3. Pass the register’s value back to memory.

However, there are many omitted parts here. The full process, though lengthy, is as follows:

  1. Look up the memory at address A and pass the N-byte data to the register.

  2. Perform computations through the ALU with N-byte and M-byte registers and pass it to the K-byte register.

  3. Store the K-byte register’s value at address B.

All operations must be precisely defined for the computer to operate as described above and once these operations are expressed in programming language, we call it an assembly language.

Statically typed languages can directly pass this information to the machine code if they track the type of every value and how much memory each type occupies. Let’s take the following C program as an example:

struct Struct {
  int first;          // 4 byte
  const char *second; // 8 byte
  long double third;  // 8 byte
};

int main() {
 struct Struct myStruct = {
  .first = 10,
  .second = "Second",
  .third = 0.01L,
 };
 const char *str = myStruct.second;
 return 0;
}

In the above code, the Struct occupies a total of 20 bytes of memory, and you know where each member is located within the Struct. If the address of Struct is X, the starting address of the first member is X, the second member starts at X + 4, and the third member starts at X + 12. Since the types are also known, you know how many bytes of data to retrieve and where to retrieve it.

Thus, the machine code for myStruct.first is composed of code that retrieves the memory at the myStruct address with an offset of 4 for the second member.

However, dynamically typed languages cannot use this information. Since the type of a value cannot be known statically, the address and size of the data to be retrieved must be known at runtime. For example, even a simple addition code like this:

def add(a, b):
  return a + b

Therefore a Python interpreter, would execute above code as pseudo-code like this:

def int_add(left: int, right: int):
  ...

def string_add(left: str, right: str):
  ...
 
def float_add(left: float, right: float):
  ...

...

def add(a_value, a_attr, b_value, b_attr): 
  if has_attribute('add', a_attr, b_attr):
    function = get_attribute('add', a_attr) ## int_add, string_add ...  하나로 매핑
    function_call(function, a_value, b_value) 
  else:
    raise RuntimeError(...)

Statically typed languages generate a few lines of machine code, but dynamically typed languages must carry the type attribute along with the value and distribute it through a branch statement to code that can be executed as static machine code. This additional process takes more time than machine code, resulting in slower execution compared to compiled languages, sometimes by factors of tens, hundreds, or even thousands.

Furthermore, generating machine code that operates dynamically is also difficult and not very beneficial, which is why most dynamically typed languages like Python and Ruby adopt interpreter or virtual machine structures.

Type Inference

Statically typed languages need to know the type of every expression to check if they satisfy the language’s semantics before translating them into machine code. Ideally, it would be nice if users wrote all types like this:

int main() {
  int integer1 = 1 : int;
  int integer2 = 2 : int;
  return (integer1 : int + integer2 : int) : int;
}

However, if programmers have to write every program this way, they would spend a lot of time writing unnecessary types. Language designers reduce unnecessary type writing for user convenience, and the process of filling in types for expressions that are not typed in the compilation process is called type inference.

While it is impossible to completely omit type writing in statically typed languages, efforts to minimize type writing and make the code look simpler are already seen in many languages. Let’s look at a few examples.

Omitting Declaration/Return Types

C++’s auto or Rust's let declaration determines the variable type based on the right-hand side expression. Additionally, they allow omitting the return type of lambda expressions and functions.

/// C++ 17

auto /* int */ add(int x, int y) {
  return x + y;
}

int main() {
  auto lambda = []() /* -> int */ {
   return add(1, 2); 
  };
  auto /* int */ result = lambda();
  return result;
}
/// Rust 
fn main() {
    let f = |a : i32, b : i32| a + b; // (i32, i32) -> i32 
    println!("{}", f(1, 2)); // 3
}

Generic Types

In statically typed languages, each value has a single type. Sometimes, you need to write multiple pieces of code with only the types differing even though the behavior is the same. To handle such cases flexibly, many languages introduce generic types. Here are two equivalent pieces of C++ code:

#include <string>
 
 int add(int a, int b) {
   return a + b;
 }
 
 std::string add(std::string a, std::string b) {
   return a + b;
 }
 
 int main() {
   auto intAdd = add(1, 2);
   auto stringAdd = add("hello ", "world");
   return 0;
 }
#include <string>

template <typename T>
T add(T a, T b) {
  return a + b;
}

/* instantiated functions

int add_int(int a, int b) {
  return a + b;
}

std::string add_string(std::string a, std::string b) {
  return a + b;
}

*/

int main() {
  auto intAdd = add<int>(1, 2); /* add_int(1, 2) */
  auto stringAdd = add<std::string>("hello ", "world"); /* add_string("hello ", "world") */
}

A single piece of generic code is redefined into multiple functions based on user calls. The compilation of generic code varies by language, but generally, as in C++, the syntax tree is redefined at call time, and the generic types are filled in with concrete types, then type checking is performed again. Thus, sometimes generic code that had no problems may cause a type error when called.

#include <string>

template <typename T>
T add(T a, T b) {
  return a + b;
}

struct myStruct {
  int member;
};

int main() {
  auto intAdd = add<int>(1, 2); 
  auto stringAdd = add<std::string>("hello ", "world"); 
  auto myStructAdd = add<myStruct>(myStruct{1}, myStruct{2}); /// Type error!
}

Generic Type Inference

In some languages like C++ and Rust, even if the user does not explicitly specify the generic types, the language can infer them and access the appropriate functions or structures. The following generic code works without any issues:

#include <string>

template <typename T>
T add(T a, T b) {
  return a + b;
}

struct myStruct {
  int member;
};

int main() {
  auto intAdd = add(1, 2); // <int>
  auto floatAdd = add(1.0f, 2.0f); // <float>
  auto longdoubleAdd = add(1.0L, 2.0L); // <long double>
}

At call time, the argument types and the function’s name are used to infer the generic types and redefine the functions accordingly.

Functional Languages and Type Inference

Functional languages often implement static typing through powerful type inference algorithms. However, as shown in the example below, there are relatively few cases where you need to explicitly write types:

add a b = a + b

intAdd = f 1 2
floatAdd = f 1.0 2.0

main = do
    print intAdd

It is possible to limit type annotations to the extent that it might be mistaken for a dynamically typed language. You can even use generic types without explicit declarations. This is because the type theory used by functional languages differs significantly from other languages.

Generally, functional languages use the Hindley-Milner type system, which provides powerful type inference capabilities. Ideally, it is possible to omit variable types, function declaration types, and function return types, and use generic types without special declarations. When type inference proceeds, the code above is represented as the following pseudo-code:

Num => Num => Num f (a: Num) (b: Num) = (a: Num + b: Num) : Num

Int intAdd = (f: Int => Int => Int) (1: Int) (2: Int)
Float floatAdd = (f: Float => Float => Float) (1.0: Float) (2.0: Float)

main = do
    print (intAdd: Int)

Sharp-eyed readers will notice that the type of the add function changes every time it is used. Although it is declared as Num => Num => Num, it changes to Int => Int => Int when used with Int, and Float => Float => Float when used with Float. This phenomenon, where the type changes every time it is used after declaration, is called Let polymorphism. The most notable type inference algorithm in the Hindley-Milner type system, Algorithm W, is also based on Let polymorphism. Our programming language, Nadya, is designed as a statically typed language for high performance, and type inference is conducted based on Algorithm W to support functional paradigms. We plan to cover the details of this algorithm in a future post.

Conclusion

In this post, we briefly explored the characteristics of statically typed languages and various type inference techniques. Statically typed languages are fundamental to the operation of your computers, and we hope this short article has helped you understand them a bit better. Our Optimium team provides fast AI inference performance through our statically typed language, Nadya. We will continue to post about the algorithms and techniques implemented in Nadya, so please stay tuned!

Optimium

Optimium

Solutions

Resources

ENERZAi

Copyright ⓒ ENERZAi Inc. All Rights Reserved

사업자등록번호: 246-86-01405

이메일: contact@enerzai.com

연락처: +82 (2) 883 1231

주소: 대한민국 서울특별시 강남구 테헤란로27길 27

Optimium

Optimium

Solutions

Resources

ENERZAi

Copyright ⓒ ENERZAi Inc. All Rights Reserved

사업자등록번호: 246-86-01405

이메일: contact@enerzai.com

연락처: +82 (2) 883 1231

주소: 대한민국 서울특별시 강남구 테헤란로27길 27

Optimium

Optimium

Solutions

Resources

ENERZAi

Copyright ⓒ ENERZAi Inc. All Rights Reserved

사업자등록번호: 246-86-01405

이메일: contact@enerzai.com

연락처: +82 (2) 883 1231

주소: 대한민국 서울특별시 강남구 테헤란로27길 27