Product
This week, I will delve into a topic that we carefully considered while designing Nadya: the “type system in programming languages.” This article will provide a general overview of how programming languages are categorized by their type systems and the advantages and disadvantages of each. Let’s go!
Seungwuk Eun
2024년 5월 22일
Hello, I am Seungwuk Eun from ENERZAi, the developer of Nadya. Last week, we introduced Nadya, a programming language designed for high-performance computing. This week, I will delve into a topic that we carefully considered while designing Nadya: the “type system in programming languages.” This article will provide a general overview of how programming languages are categorized by their type systems and the advantages and disadvantages of each. Let’s go!
What is a Statically Typed Language?
A statically typed language is a language where the type of every value is determined at compile time, ensuring stability and safety. In contrast, a dynamically typed language determines the type of a value at runtime. Examples of statically typed languages include C/C++, C#, Java, and Go, while Python, JavaScript, and Ruby are well-known dynamically typed languages.
Statically Typed Language
Let’s look at an example in C code:
In above code, the line integer = str
will cause a compile-time error:
This is because C does not allow assigning a const char *
type value to an int
type variable. The compiler checks the types at compile time and stops the compilation if there is a type error, which is a key feature of statically typed languages.
Dynamically Typed Language
Now, let’s look at an example in Python:
Above code will run without any errors.
Even though the variable a
initially holds an int
, it can later hold a str
and then a dict
without any issues. It is because Python doesn’t restrict these kinds of codes. Meanwhile, what will happen to below code?
The split
method is specific to the Python str
type. If a
was a "string"
, it would have printed ['s', 'ring']
. However, since a
has been assigned dict
type at the very end, the output will be as follows:
Since the dict
type does not have the split
attribute, it raises an error. However, you can see that the run
output is still displayed. This means that the program does not know that a dict
type value does not have a split
method until it is actually run. This is a representative characteristic of dynamically typed languages, where type checking is not done until just before execution for syntactically correct programs.
Advantages and Disadvantages of Statically Typed Languages, and Why?
At a glance, Python programs feel more flexible and freer to write. On the other hand, C programs might feel less flexible due to types, and some programmers who mainly use dynamically typed languages may even feel repulsed.
The pros and cons of statically typed languages can be summarized as follows. Note that these pros and cons do not apply to all languages and all situations. They are general characteristics.
Advantages
By checking types in advance, runtime errors can be reduced.
Programs run faster.
Understanding the program’s context is easier due to clear types.
There is more information available for use in Integrated Development Environments (IDEs) (e.g., auto-completion).
Disadvantages
Writing flexible code is difficult.
Simple logic can become verbose due to types.
The learning curve is steeper compared to dynamically typed languages.
Dynamically typed languages tend to have the exact opposite pros and cons compared to statically typed languages.
Others seem straight forward, but why would ‘Programs run faster’ with statically typed languages? Why are statically typed languages faster and dynamically typed languages slower?
Computers and Memory
To understand this, it is necessary to have a basic understanding of how computers work. The general process of a computer performing a computation is as follows:
Look up memory and pass it to the register.
Perform computations through the arithmetic logic unit (ALU) and pass it to the register.
Pass the register’s value back to memory.
However, there are many omitted parts here. The full process, though lengthy, is as follows:
Look up the memory at address A and pass the N-byte data to the register.
Perform computations through the ALU with N-byte and M-byte registers and pass it to the K-byte register.
Store the K-byte register’s value at address B.
All operations must be precisely defined for the computer to operate as described above and once these operations are expressed in programming language, we call it an assembly language.
Statically typed languages can directly pass this information to the machine code if they track the type of every value and how much memory each type occupies. Let’s take the following C program as an example:
In the above code, the Struct
occupies a total of 20 bytes of memory, and you know where each member is located within the Struct
. If the address of Struct
is X
, the starting address of the first
member is X
, the second
member starts at X + 4
, and the third
member starts at X + 12
. Since the types are also known, you know how many bytes of data to retrieve and where to retrieve it.
Thus, the machine code for myStruct.first
is composed of code that retrieves the memory at the myStruct
address with an offset of 4
for the second
member.
However, dynamically typed languages cannot use this information. Since the type of a value cannot be known statically, the address and size of the data to be retrieved must be known at runtime. For example, even a simple addition code like this:
Therefore a Python interpreter, would execute above code as pseudo-code like this:
Statically typed languages generate a few lines of machine code, but dynamically typed languages must carry the type attribute along with the value and distribute it through a branch statement to code that can be executed as static machine code. This additional process takes more time than machine code, resulting in slower execution compared to compiled languages, sometimes by factors of tens, hundreds, or even thousands.
Furthermore, generating machine code that operates dynamically is also difficult and not very beneficial, which is why most dynamically typed languages like Python and Ruby adopt interpreter or virtual machine structures.
Type Inference
Statically typed languages need to know the type of every expression to check if they satisfy the language’s semantics before translating them into machine code. Ideally, it would be nice if users wrote all types like this:
However, if programmers have to write every program this way, they would spend a lot of time writing unnecessary types. Language designers reduce unnecessary type writing for user convenience, and the process of filling in types for expressions that are not typed in the compilation process is called type inference.
While it is impossible to completely omit type writing in statically typed languages, efforts to minimize type writing and make the code look simpler are already seen in many languages. Let’s look at a few examples.
Omitting Declaration/Return Types
C++’s auto
or Rust's let
declaration determines the variable type based on the right-hand side expression. Additionally, they allow omitting the return type of lambda expressions and functions.
Generic Types
In statically typed languages, each value has a single type. Sometimes, you need to write multiple pieces of code with only the types differing even though the behavior is the same. To handle such cases flexibly, many languages introduce generic types. Here are two equivalent pieces of C++ code:
A single piece of generic code is redefined into multiple functions based on user calls. The compilation of generic code varies by language, but generally, as in C++, the syntax tree is redefined at call time, and the generic types are filled in with concrete types, then type checking is performed again. Thus, sometimes generic code that had no problems may cause a type error when called.
Generic Type Inference
In some languages like C++ and Rust, even if the user does not explicitly specify the generic types, the language can infer them and access the appropriate functions or structures. The following generic code works without any issues:
At call time, the argument types and the function’s name are used to infer the generic types and redefine the functions accordingly.
Functional Languages and Type Inference
Functional languages often implement static typing through powerful type inference algorithms. However, as shown in the example below, there are relatively few cases where you need to explicitly write types:
It is possible to limit type annotations to the extent that it might be mistaken for a dynamically typed language. You can even use generic types without explicit declarations. This is because the type theory used by functional languages differs significantly from other languages.
Generally, functional languages use the Hindley-Milner type system, which provides powerful type inference capabilities. Ideally, it is possible to omit variable types, function declaration types, and function return types, and use generic types without special declarations. When type inference proceeds, the code above is represented as the following pseudo-code:
Sharp-eyed readers will notice that the type of the add
function changes every time it is used. Although it is declared as Num => Num => Num
, it changes to Int => Int => Int
when used with Int
, and Float => Float => Float
when used with Float
. This phenomenon, where the type changes every time it is used after declaration, is called Let polymorphism. The most notable type inference algorithm in the Hindley-Milner type system, Algorithm W, is also based on Let polymorphism. Our programming language, Nadya
, is designed as a statically typed language for high performance, and type inference is conducted based on Algorithm W to support functional paradigms. We plan to cover the details of this algorithm in a future post.
Conclusion
In this post, we briefly explored the characteristics of statically typed languages and various type inference techniques. Statically typed languages are fundamental to the operation of your computers, and we hope this short article has helped you understand them a bit better. Our Optimium team provides fast AI inference performance through our statically typed language, Nadya
. We will continue to post about the algorithms and techniques implemented in Nadya
, so please stay tuned!