language ideas
Programming languages are something I think about a lot. This is a list of design choices I find interesting and would like to implement some day. Treat this page as documentation for a hypothetical language. Ideas are stolen liberally from Zig, Odin, Jai, Go, and Rust.
guiding principles
A good language is comprehensive but not complex. It should provide a reasonable implementation for every common algorithm and data type, as well as many uncommon ones. Its use of concepts and keywords should be economical; each should perform a unique function that would not be possible in its absence. Language features should map closely onto the real-world mathematical and theoretical ideas they are downstream of. It should be fast, simple, and succinct without compromise.
primitives
Algebraic primitives are written as a single character followed by a data
size. They include whole (unsigned), integer (signed), real (floating point),
complex, and quaternion types. The components of complex and quaternion numbers
are reals, so a complex is twice, and a quaternion four times, the size of its
real component. Whole, integer, and real data sizes may be 8, 16, 32, 64, 128,
or platform defined. Reals must have a data size greater than 8. Algebraic
primitives may optionally specify that they are big or little endian by
appending `be` or `le` respectively.
The lexical primitives are glyphs, strings, and substrings. A glyph represents a single Unicode code-point, and is 32 bits wide. A string is an owned buffer of UTF-8 encoded text. Substrings are "fat pointers" to string buffers, with a specified length.
The logical primitives are booleans and bitflags. A boolean is 8 bits wide, and can be true or false. Bitflags are a compressed array of booleans, and may be 8, 16, 32, 64, or 128 bits wide.
The two special types are pointers and the empty type. The pointer type is wide enough to
fit a native pointer, and is intentionally distinct from `wsize` in order to support
the CHERI architecture (see this article).
The empty type exists only implicitly, and has a size of zero. It is primarily used in
combination with fallible types to represent the absence of a value. No variable or constant
can be of the empty type. It is represented as an empty block `{}`.
primitive:
    algebraic:
        whole: w8 w16 w32 w64 w128 wsize
        integer: i8 i16 i32 i64 i128 isize
        real: r16 r32 r64 r128
        complex: c32 c64 c128 c256
        quaternion: q64 q128 q256 q512
    lexical: glyph string substr
    logical: bool b8 b16 b32 b64 b128
    special: ptr {}
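For comparison, here is a rough Rust sketch of the lexical primitives. It assumes Rust's `char` (a 32-bit Unicode scalar value), `String`, and `&str` are close enough stand-ins for glyph, string, and substr. The function names are made up for illustration, and none of this is the hypothetical language's own syntax.

```rust
use std::mem::size_of;

/// Returns the sizes in bytes of (glyph, substring, native word).
fn lexical_sizes() -> (usize, usize, usize) {
    (size_of::<char>(), size_of::<&str>(), size_of::<usize>())
}

/// Borrows the first `n` glyphs of `text` without copying the buffer.
/// The result is a fat pointer: an address into `text` plus a length.
fn first_glyphs(text: &str, n: usize) -> &str {
    match text.char_indices().nth(n) {
        Some((byte_index, _)) => &text[..byte_index],
        None => text, // fewer than `n` glyphs: the whole string
    }
}
```

The substring is two words wide because it carries a length next to the address, which is what "fat pointer" means here.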
operators
math: + - / * %
logical: not and or xor
bitwise: ~ & | ^ << >>
comparative: < <= > >= == !=
variables
Variables and constants are strongly typed, but the type may be inferred or
explicit. Re-declared variables shadow the previous declaration for the current
scope. Variables are declared using `:=` and constants using `::`. Uninitialized
variables are zeroed by default. Multiple declaration and assignment are allowed.
// Implicit type
v1 := 10;
c1 :: true;
// Explicit type
v2 : i64 = 20;
c2 : bool : false;
v1 = 15; // Assignment
v3, v4 := 'π', 3.14159; // Multiple declaration
v3, v4 = 'τ', 6.28318; // Multiple assignment
{
    v2 := "Hello World"; // Name shadowing
    v4, v5 := 4.0, q128(1.0, 0.0, 0.0, 0.0);
}
// v2 == 20, v4 == 6.28318
conditionals
There are two kinds of conditionals: if-else statements and when statements. The when statement is similar to the switch statement in other languages. A when statement must exhaustively cover every case, or include a default case. Both kinds can be used as the right-hand side of an assignment.
n := 15;
sign := if n < 0 {
    print("n is negative");
    -1
} else {
    print("n is non-negative");
    1
}
s := ":)";
when s {
    ":(" -> {
        println("Sad face");
    },
    ":)" -> {
        println("Happy face");
    },
    ":|" -> {
        println("Neutral face");
    },
    default(val) -> {
        println("I don't recognize this face:");
        println(val);
    }
}
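The closest existing thing to conditionals used as values is probably Rust's `if` and `match` expressions. The sketch below is only an analogue, with hypothetical function names `sign` and `describe`. Rust's catch-all arm plays the role of the default case.

```rust
/// `if`/`else` used as the right-hand side of an assignment.
fn sign(n: i64) -> i64 {
    let sign = if n < 0 { -1 } else { 1 };
    sign
}

/// `match` must cover every case; the `other` arm is the default case
/// and binds the unmatched value, much like `default(val)`.
fn describe(face: &str) -> String {
    let description = match face {
        ":(" => String::from("Sad face"),
        ":)" => String::from("Happy face"),
        ":|" => String::from("Neutral face"),
        other => format!("I don't recognize this face: {}", other),
    };
    description
}
```

Removing the `other` arm makes the Rust version fail to compile, which is the exhaustiveness rule described above.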
blocks
Blocks are surrounded by `{}` and optionally evaluate to a value. A block
may be given a name using the `@` operator. By default, a block captures all
variables in the scope in which it is declared. A block can optionally capture
only specific named variables from the scope in which it is evaluated using the
capture syntax `[]`.
The `break` keyword is used to escape a block early. A block name can be
provided to break out of multiple nested blocks at once. If the escaped block
evaluates to a value, a value must be provided to `break`. If the final expression
within a block does not end in a semicolon, the block will evaluate to that
expression.
sum := 0;
for x : 0..100 @x_block {
    for y : 0..100 @y_block {
        for z : 0..100 @z_block {
            if x == 16 {
                break @x_block;
            }
            else if y == 64 and z == 32 {
                break @y_block;
            }
            sum += x + y + z;
        }
    }
}
outside := "foo";
sum2 := [^sum] { // Captures mutable reference to `sum`
    /* outside = "bar"; */ // ERROR: only `sum` is captured
    sum *= 2
} // == sum * 2
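Named blocks that evaluate to a value have a fairly direct counterpart in Rust's labeled blocks. The sketch below is an analogue only, and `first_pair_with_sum` is a made-up example. It shows `break` carrying a value out of two nested loops at once, and the final expression acting as the block's value when nothing breaks early.

```rust
/// Searches 0..10 x 0..10 for the first pair whose sum equals `target`.
fn first_pair_with_sum(target: u32) -> Option<(u32, u32)> {
    let found = 'search: {
        for x in 0..10 {
            for y in 0..10 {
                if x + y == target {
                    // Leaves both loops and the block, supplying its value.
                    break 'search Some((x, y));
                }
            }
        }
        None // final expression without a semicolon: the block's value
    };
    found
}
```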
functions
Functions and lambdas are not distinct types, and do not have distinct syntax. A function may be declared anywhere a variable may, including inside other functions or as a parameter. A function may have any number of ordered parameters, and may be variadic. Functions may have default parameters, and default parameters may refer to parameters which come before them. Multiple values may be returned from a function.
A function may accept any number of receiver types. A receiver type implements the function as a "method" on that type. Multiple receivers are written as a parenthesized list of values of the given types, and their order matters. Types can use methods defined by any receiver function within the current scope. Receiver types can be any type, including primitives.
Function overloading is valid so long as no two functions share both a name and call signature. Function names can also be shadowed just like variables.
A function body is a block. This means it can optionally capture local variables
from where it is called. Unlike regular blocks, function blocks do not capture
locals automatically. Functions implicitly return the last expression of their
body if it does not end in a semicolon, and can return early by using `break`
without a block name. Note that block captures and names are not part of
a function's signature.
// Receiver types come before parameters, and may be elided
fizzbuzz :: (number: wsize) -> wsize {
    sum := 0;
    if number == 0 {
        println("Input cannot be zero");
        break sum; // Early return
    }
    for i : 0..number {
        if (i % 3 == 0) and (i % 5 == 0) {
            println("fizzbuzz");
            sum += 1;
        }
        else if i % 3 == 0 {
            println("fizz");
        }
        else if i % 5 == 0 {
            println("buzz");
        }
    }
    sum // return the number of fizzbuzz's
}
f1 := fizzbuzz(15);
f2 := fizzbuzz(number: 30); // Parameters may be explicitly named
// With a single receiver
fizzbuzz :: (number: wsize)() -> wsize { fizzbuzz(number) }
f3 := 15.fizzbuzz();
// With multiple receivers
fizzbuzz :: (number1: wsize, number2: wsize)() -> wsize, wsize {
    number1.fizzbuzz(), number2.fizzbuzz()
}
f4, f5 := (15, 30).fizzbuzz();
// With capture
sum_locals :: () [f1: wsize, f2: wsize, f3: wsize, f4: wsize] {
    f1 + f2 + f3 + f4
}
f6 := sum_locals(); // Captures local variables implicitly
fallibility
A type which is followed by a question mark `?` is marked as fallible. A fallible
type may or may not contain a value, and must be checked before it is accessed.
All types implicitly convert to their fallible counterpart.
something : w64? = 5;
nothing : w64? = {};
// Conditional assignment
if num := something {
    print("Found a value");
} else {
    print("No value exists");
}
// Most boolean operations work on fallible types
maybe := something or nothing;
Functions may return a fallible type. Placing a question mark after a fallible value will immediately return from the current function if that value is empty.
divide :: (numerator: r64, divisor: r64) -> r64? {
    if divisor == 0.0 {
        break; // Failure case, returns the empty type
    }
    numerator / divisor
}
inverse_squared :: (val: r64) -> r64? {
    inv : r64 = divide(1.0, val)?;
    inv * inv
}
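Fallible types behave a lot like Rust's `Option`, so the same idea can be sketched in Rust for comparison. This is an analogue and not the hypothetical language itself. `first_present` is a made-up helper standing in for `something or nothing`, and unlike the design above, Rust needs an explicit `Some(...)` to wrap a value.

```rust
/// Returns `None` (the "empty" case) when the divisor is zero.
fn divide(numerator: f64, divisor: f64) -> Option<f64> {
    if divisor == 0.0 {
        return None;
    }
    Some(numerator / divisor)
}

/// `?` returns early from this function if `divide` produced no value.
fn inverse_squared(val: f64) -> Option<f64> {
    let inv = divide(1.0, val)?;
    Some(inv * inv)
}

/// Picks the first operand that holds a value, if either does.
fn first_present(a: Option<u64>, b: Option<u64>) -> Option<u64> {
    a.or(b)
}
```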
generators
Capturing variables within a function allows for generator functions.
A generator function can produce different values each time it is called.
Generators are often used to lazily evaluate an open-ended sequence, or to iterate
over a collection. The `:` operator inside a for loop expression will call
a function repeatedly until the function fails to return a value.
range :: (minimum: i64, maximum: i64) -> (() -> i64?) {
    current := minimum;
    () [minimum, maximum, current] {
        last := current;
        current += 1;
        if current <= maximum {
            break last;
        } else {
            break;
        }
    }
}
iterable := range(0, 10);
for i : iterable {
    print(i);
} // Prints 0-9
iterable := 0..10; // Equivalent to range(0, 10)
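The same generator pattern can be sketched in Rust with a closure that owns its captured state. This is an analogue under the assumption that "fails to return a value" corresponds to returning `None`. `collect_all` is a hypothetical helper that drives the generator the way the for loop does.

```rust
/// Returns a closure that yields minimum, minimum + 1, ... up to but not
/// including `maximum`, one value per call, then `None` once exhausted.
fn range(minimum: i64, maximum: i64) -> impl FnMut() -> Option<i64> {
    let mut current = minimum; // captured state, moved into the closure
    move || {
        if current < maximum {
            let last = current;
            current += 1;
            Some(last)
        } else {
            None
        }
    }
}

/// Calls the generator repeatedly until it fails to return a value.
fn collect_all(generator: impl FnMut() -> Option<i64>) -> Vec<i64> {
    std::iter::from_fn(generator).collect()
}
```

The standard library's `std::iter::from_fn` turns such a closure into an iterator and stops at the first `None`.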
move, clone, refer
By default, parameters and receivers are "moved" into the function body. This
means that they can no longer be accessed by the surrounding scope. A variable
may be cloned instead to replicate the classic C "pass by value" behavior. The
`clone` method is implemented on all primitives and may be implemented on any
user defined type.
Reference types are prefixed by `&` or `^`, for immutable and mutable
references respectively. Cloning a reference results in a clone of the
underlying value. Receiver arguments can be either type of reference. Values
will implicitly cast to their referenced equivalent for the purpose of method
calls, but not vice versa.
value : wsize = 10;
reference : &wsize = &2;
divide :: (numerator: wsize)(divisor: &wsize) {
    numerator / divisor
}
num1 := value.divide(reference); // `value` moves out of scope
num2 := num1.clone().divide(reference); // `num1` stays in scope
num3 := reference.clone().divide(&num1);
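Rust's ownership rules are the obvious precedent for this, so here is a small Rust sketch of the three options. It uses `String` because Rust integers are copied rather than moved, which is one place where the analogy breaks down. Rust's `&mut` corresponds to `^` above, and all the function names are hypothetical.

```rust
/// Takes ownership of `text`; the caller's variable is moved.
fn consume(text: String) -> usize {
    text.len()
}

/// Immutable reference: the caller keeps the value.
fn inspect(text: &str) -> usize {
    text.len()
}

/// Mutable reference: modifies the caller's value in place.
fn shout(text: &mut String) {
    text.push('!');
}

fn move_clone_refer() -> (usize, usize, usize) {
    let mut greeting = String::from("hello");
    let cloned_len = consume(greeting.clone()); // the clone moves, `greeting` stays in scope
    shout(&mut greeting);
    let borrowed_len = inspect(&greeting);
    let moved_len = consume(greeting); // `greeting` moves out of scope
    // inspect(&greeting); // would not compile: value was moved
    (cloned_len, borrowed_len, moved_len)
}
```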
aliased types
A type can be aliased, giving it a new name. The alias is treated as a completely different type, and will not implicitly cast to the original type.
my_string_alias :: string;
normal_string : string = "Hello World";
aliased_string : my_string_alias = "Goodbye World";
/* normal_string = aliased_string; */ // ERROR: incompatible types
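This kind of alias is closer to Rust's "newtype" pattern than to a C `typedef`. The sketch below is an analogue only, and `alias_len` and `into_string` are made-up helpers. It shows that the wrapped type has to be converted back explicitly.

```rust
/// A distinct type wrapping `String`; it never converts back implicitly.
struct MyStringAlias(String);

/// Only accepts the alias; passing a plain `String` is a compile error.
fn alias_len(text: &MyStringAlias) -> usize {
    text.0.len()
}

/// Conversion back to the original type has to be spelled out.
fn into_string(text: MyStringAlias) -> String {
    text.0
}
```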
structures
A structure is defined by aliasing the `struct` type. Structures may contain named or
unnamed fields, but not a mix of both. Like local variables, uninitialized struct
fields are zeroed by default.
// Struct with named fields
Position :: struct {
    x: r64, y: r64
};
origin : Position = {x: 0.0, y: 0.0};
// With unnamed fields
Color :: struct {
    r64, r64, r64
};
red : Color = {0.0, 0.0, 0.0};
// Accessing unnamed fields
red.0 = 1.0;
red.1 = 0.2;
red.2 = 0.2;
// red == {1.0, 0.2, 0.2};
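Named and unnamed fields correspond roughly to Rust's structs and tuple structs. In the sketch below the zeroing is opt-in through `Default`, which is a difference from the design above, where it happens automatically. `make_red` is a hypothetical helper.

```rust
/// Named fields.
#[derive(Debug, Default, PartialEq)]
struct Position {
    x: f64,
    y: f64,
}

/// Unnamed fields, accessed by index.
#[derive(Debug, Default, PartialEq)]
struct Color(f64, f64, f64);

/// Starts from an all-zero color and fills in each channel by index.
fn make_red() -> Color {
    let mut red = Color::default();
    red.0 = 1.0;
    red.1 = 0.2;
    red.2 = 0.2;
    red
}
```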
defer
Execution of a block can be deferred until the end of the current scope. Deferred blocks execute in the reverse of the order in which they were declared.
{
    println("First");
    defer { println("Fourth"); }
    defer { println("Third"); }
    println("Second");
}
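Rust has no `defer` keyword, but the same ordering can be imitated with a guard value whose destructor runs at the end of the scope. The sketch below is an analogue under that assumption. The `Defer` type and `run_order` function are made up for illustration, and the log stands in for the printed output.

```rust
use std::cell::RefCell;

/// Runs its closure when dropped, i.e. at the end of the enclosing scope.
struct Defer<F: FnMut()>(F);

impl<F: FnMut()> Drop for Defer<F> {
    fn drop(&mut self) {
        (self.0)();
    }
}

/// Records the order in which ordinary and deferred statements run.
fn run_order() -> Vec<&'static str> {
    let log: RefCell<Vec<&'static str>> = RefCell::new(Vec::new());
    {
        log.borrow_mut().push("First");
        let _fourth = Defer(|| log.borrow_mut().push("Fourth"));
        let _third = Defer(|| log.borrow_mut().push("Third"));
        log.borrow_mut().push("Second");
    } // guards are dropped here, in reverse order of declaration
    log.into_inner()
}
```

Rust drops local variables in reverse order of declaration, which gives the same last-in, first-out behavior as the deferred blocks.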
namespaces
All source files implicitly exist within their own namespace, which must be
named at the top of the file. An inner namespace can also be declared. Members
of a namespace are accessed using the dot `.` operator.
ns1 :: {
    foo :: () {
        println("foo");
    }
    ns2 :: {
        bar :: () {
            println("bar");
        }
    }
}
ns1.foo();
ns1.ns2.bar();
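Nested namespaces look much like Rust modules, with `::` in place of the dot operator and an explicit `pub` to make members reachable from outside. The sketch below is only an analogue. The functions return their names instead of printing them so the result is easy to check.

```rust
mod ns1 {
    pub fn foo() -> &'static str {
        "foo"
    }

    // An inner namespace declared inside the outer one.
    pub mod ns2 {
        pub fn bar() -> &'static str {
            "bar"
        }
    }
}
```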