language ideas

Programming languages are something I think about a lot. This is a list of design choices I find interesting and would like to implement some day. Treat this page as documentation for a hypothetical language. Ideas are stolen liberally from Zig, Odin, Jai, Go, and Rust.

guiding principles

A good language is comprehensive but not complex. It should provide a reasonable implementation for every common algorithm and data type, as well as many uncommon ones. Its use of concepts and keywords should be economical; each should perform a unique function not possible in its absence. Language features should map closely onto the real-world mathematical and theoretical ideas they derive from. It should be fast, simple, and succinct without compromise.

primitives

Algebraic primitives are written as a single character followed by a data size. They include whole (unsigned), integer (signed), real (floating point), complex, and quaternion types. The components of complex and quaternion numbers are reals. Primitive data sizes may be 8, 16, 32, 64, 128, or platform defined. Reals must have a data size greater than 8. Algebraic primitives may optionally specify that they are big or little endian by appending be or le respectively.
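
For a rough feel of what the be and le suffixes would mean in memory, here is a sketch in Rust rather than in the hypothetical language. Rust has no endian-tagged integer types, so the helper names `store_be`, `store_le`, and `load_be` are made up for illustration, and the spelling `w32be` / `w32le` is only my reading of the suffix rule above.

```rust
/// Byte layout that a hypothetical `w32be` would presumably have in memory.
fn store_be(value: u32) -> [u8; 4] {
    value.to_be_bytes() // most significant byte first
}

/// Byte layout that a hypothetical `w32le` would presumably have in memory.
fn store_le(value: u32) -> [u8; 4] {
    value.to_le_bytes() // least significant byte first
}

/// Reading the bytes back with the matching byte order recovers the value.
fn load_be(bytes: [u8; 4]) -> u32 {
    u32::from_be_bytes(bytes)
}
```

The same number produces mirrored byte sequences under the two orders, which is why tagging the type with its endianness could make wire formats and file headers explicit in a declaration.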

The lexical primitives are glyphs, strings, and substrings. A glyph represents a single Unicode code-point, and is 32 bits wide. A string is an owned buffer of UTF-8 encoded text. Substrings are "fat pointers" to string buffers, with a specified length.
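
Rust happens to have close analogues of all three lexical primitives, so a small Rust sketch can make the sizes concrete: `char` stands in for glyph, `String` for string, and `&str` for substr. The function names below are invented for this example and are not part of the design.

```rust
use std::mem::size_of;

/// Size in bytes of Rust's `char`, the closest analogue to `glyph`.
fn glyph_size() -> usize {
    size_of::<char>()
}

/// Size in bytes of a `&str`: a data pointer plus a length ("fat pointer").
fn substr_size() -> usize {
    size_of::<&str>()
}

/// Borrow the first `n` glyphs of a string buffer without copying it.
fn first_glyphs(text: &str, n: usize) -> &str {
    // Byte offset where glyph number `n` starts, or the end of the text.
    let end = text.char_indices().nth(n).map_or(text.len(), |(i, _)| i);
    &text[..end]
}
```

Because the buffer is UTF-8, a substring's length in bytes can differ from its length in glyphs: the first two glyphs of "πτ rocks" occupy four bytes.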

The logical primitives are booleans and bitflags. A boolean is 8 bits wide, and can be true or false. Bitflags are a compressed array of booleans, and may be 8, 16, 32, 64, or 128 bits wide.
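
One way to picture a bitflags value is as a single integer whose bits are indexed like an array of booleans. The Rust sketch below models a `b8` under that assumption; the wrapper type `B8` and its methods `get` and `with` are hypothetical names chosen for this example.

```rust
/// A hypothetical `b8`: eight booleans packed into one byte.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct B8(u8);

impl B8 {
    /// Read the boolean stored at `index` (0 through 7).
    fn get(self, index: u8) -> bool {
        assert!(index < 8, "a b8 holds only 8 flags");
        (self.0 >> index) & 1 == 1
    }

    /// Return a copy with the boolean at `index` set to `value`.
    fn with(self, index: u8, value: bool) -> B8 {
        assert!(index < 8, "a b8 holds only 8 flags");
        if value {
            B8(self.0 | (1 << index))
        } else {
            B8(self.0 & !(1 << index))
        }
    }
}
```

The packed form takes one byte where an array of eight 8-bit booleans would take eight, which is the compression the paragraph above refers to.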

The two special types are pointers and the empty type. The pointer type is wide enough to fit a native pointer, and is intentionally distinct from wsize in order to support the CHERI architecture (see this article). The empty type exists only implicitly, and has a size of zero. It is primarily used in combination with fallible types to represent the absence of a value. No variable or constant can be of the empty type. It is represented as an empty block {}.

primitive:
  algebraic:
    whole: w8 w16 w32 w64 w128 wsize
    integer: i8 i16 i32 i64 i128 isize
    real: r16 r32 r64 r128
    complex: c32 c64 c128 c256
    quaternion: q64 q128 q256 q512
  lexical: glyph string substr
  logical: bool b8 b16 b32 b64 b128
  special: ptr {}

operators

math: + - / * %
logical: not and or xor
bitwise: ~ & | ^ << >>
comparative: < <= > >= == !=

variables

Variables and constants are strongly typed, but the type may be inferred or explicit. Variables are declared with := and constants with ::. Re-declared variables shadow the previous declaration for the current scope. Uninitialized variables are zeroed by default. Multiple declaration and assignment is allowed.

// Implicit type
v1 := 10;
c1 :: true; 
// Explicit type
v2 : i64 = 20;
c2 : bool : false;

v1 = 15; // Assignment
v3, v4 := 'π', 3.14159; // Multiple declaration
v3, v4 = 'τ', 6.28318; // Multiple assignment
{
  v2 := "Hello World"; // Name shadowing
  v4, v5 := 4.0, q128(1.0, 0.0, 0.0, 0.0);
}
// v2 == 20, v4 == 6.28318

conditionals

There are two kinds of conditionals: if-else statements and when statements. The when statement is similar to the switch statement in other languages. A when statement must exhaustively cover every case or include a default case. Both kinds can be used as the right-hand side of an assignment.

n := 15;
sign := if n < 0 {
  print("n is negative");
  -1
} else {
  print("n is positive");
  1
}

s := ":)";
when s {
  ":(" -> {
    println("Sad face");
  },
  ":)" -> {
    println("Happy face");
  },
  ":|" -> {
    println("Neutral face");
  },
  default(val) -> {
    println("I don't recognize this face:");
    println(val);
  }
}

blocks

Blocks are surrounded by {} and optionally evaluate to some type. A block may be given a name using the @ operator. By default, a block captures all variables within the scope in which it is declared. A block can instead capture only specific named variables from the scope in which it is evaluated, using the capture syntax [].

The break keyword is used to escape a block early. A block name can be provided to break out of multiple nested blocks at once. If the escaped block evaluates to a type, a value must be provided to break. If the final expression within a block does not end in a semicolon, the block will evaluate to that expression.

sum := 0;
for x : 0..100 @x_block {
  for y : 0..100 @y_block {
    for z : 0..100 @z_block {
      if x == 16 {
        break @x_block;
      }
      else if y == 64 and z == 32 {
        break @y_block;
      }
      sum += x + y + z;
    }
  }
}

outside := "foo";
sum2 := [^sum] { // Captures mutable reference to `sum`
  /* outside = "bar"; */ // ERROR: only `sum` is captured
  sum *= 2
} // == sum * 2

functions

Functions and lambdas are not distinct types, and do not have distinct syntax. A function may be declared anywhere a variable may, including inside other functions or as a parameter. A function may have any number of ordered parameters, and may be variadic. Functions may have default parameters, and default parameters may refer to parameters which come before them. Multiple values may be returned from a function.

A function may accept any number of receiver types. A receiver type implements the function as a "method". Multiple receiver parameters refer to a parenthetical list of objects of the given types, which are order dependent. Types can use methods defined from any receiver function within the current scope. Receiver types can be any type, including primitives.

Function overloading is valid so long as no two functions share both a name and call signature. Function names can also be shadowed just like variables.

A function body is a block. This means it can optionally capture local variables from where it is called. Unlike regular blocks, function blocks do not capture locals automatically. Functions implicitly return the last expression of their body if it does not end in a semicolon, and can return early by using break without a block name. Note that block captures and names are not part of a function's signature.

// Receiver types come before parameters, and may be elided
fizzbuzz :: (number: wsize) -> wsize {
  sum := 0;
  if number == 0 {
    println("Input cannot be zero");
    break sum; // Early return
  }
  for i : 0..number {
    if (i % 3 == 0) and (i % 5 == 0) {
      println("fizzbuzz");
      sum += 1;
    }
    else if i % 3 == 0 {
      println("fizz");
    }
    else if i % 5 == 0 {
      println("buzz");
    }
  }
  sum // return the number of fizzbuzz's
}
f1 := fizzbuzz(15);
f2 := fizzbuzz(number: 30); // Parameters may be explicitly named

// With a single receiver
fizzbuzz :: (number: wsize)() -> wsize { fizzbuzz(number) }
f3 := 15.fizzbuzz();

// With multiple receivers
fizzbuzz :: (number1: wsize, number2: wsize)() -> wsize, wsize {
  number1.fizzbuzz(), number2.fizzbuzz()
}
f4, f5 := (15, 30).fizzbuzz();

// With capture
sum_locals :: () [f1: wsize, f2: wsize, f3: wsize, f4: wsize] {
  f1 + f2 + f3 + f4
}

f6 := sum_locals(); // f1 through f4 are captured from the calling scope, not passed as arguments

fallibility

A type which is followed by a question mark ? is marked as fallible. A fallible type may or may not contain a value, and must be checked before it is accessed. All types implicitly convert to their fallible counterpart.

something : w64? = 5;
nothing : w64? = {};

// Conditional assignment
if num := something {
  print("Found a value");  
} else {
  print("No value exists");
}

// Most boolean operations work on fallible types
maybe := something or nothing;

Functions may return a fallible type. Placing a question mark after a fallible value will immediately return from the current function if that value is empty.

divide :: (numerator: r64, divisor: r64) -> r64? {
  if divisor == 0.0 {
    break; // Failure case, returns the empty type
  }
  numerator / divisor
}

inverse_squared :: (val: r64) -> r64? {
  inv : r64 = divide(1.0, val)?;
  inv * inv
}
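
For comparison, fallible types behave much like `Option` in Rust, where the postfix `?` has the same early-return meaning. The sketch below mirrors the two functions above in Rust, assuming the failure case is a zero divisor; it shows the analogy and does not define the hypothetical language.

```rust
/// Counterpart of `divide`: `None` plays the role of the empty type.
fn divide(numerator: f64, divisor: f64) -> Option<f64> {
    if divisor == 0.0 {
        return None; // failure case
    }
    Some(numerator / divisor)
}

/// Counterpart of `inverse_squared`: `?` returns early when `divide` fails.
fn inverse_squared(val: f64) -> Option<f64> {
    let inv = divide(1.0, val)?;
    Some(inv * inv)
}
```

One difference is visible here: Rust requires wrapping the success value in `Some`, whereas the design above converts any value to its fallible counterpart implicitly.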

generators

Capturing variables within a function allows for generator functions. A generator function can produce different values each time it is called. They are often used to lazily evaluate an open-ended sequence, or to iterate over a collection. The : operator inside a for loop expression will call a function repeatedly until the function fails to return a value.

range :: (minimum: i64, maximum: i64) -> (() -> i64?) {
  current := minimum;
  () [minimum, maximum, current] {
    last := current;
    current += 1;
    if current <= maximum {
      break last;
    } else {
      break;
    }
  }
}

iterable := range(0, 10);
for i : iterable {
  print(i);
} // Prints 0-9

iterable := 0..10; // Equivalent to range(0, 10)
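
The same pattern can be sketched in Rust, where a closure that owns its counter is turned into an iterator with `std::iter::from_fn`. This is an analogy and not the hypothetical language; the helper `collect_all` is an invented name, and the body is rearranged slightly so the counter stops advancing once the sequence ends.

```rust
/// Counterpart of `range`: returns a closure that owns its counter.
fn range(minimum: i64, maximum: i64) -> impl FnMut() -> Option<i64> {
    let mut current = minimum;
    move || {
        if current < maximum {
            let last = current;
            current += 1;
            Some(last)
        } else {
            None // signals the end of the sequence
        }
    }
}

/// Drain a generator into a vector, much like `for i : iterable { ... }`.
fn collect_all(generator: impl FnMut() -> Option<i64>) -> Vec<i64> {
    // `from_fn` calls the closure repeatedly until it returns `None`.
    std::iter::from_fn(generator).collect()
}
```

Calling the closure by hand gives one value per call and then `None` forever after, which is the contract the for loop relies on.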

move, clone, refer

By default, parameters and receivers are "moved" into the function body. This means that they can no longer be accessed by the surrounding scope. A variable may be cloned instead to replicate the classic C "pass by value" behavior. The clone method is implemented on all primitives and may be implemented on any user defined type.

Referenced types are prefixed by & or ^, for immutable and mutable references respectively. Cloning a reference results in a clone of the underlying value. Receiver arguments can be either type of reference. Values will implicitly cast to their referenced equivalent for the purpose of method calls, but not vice versa.

value : wsize = 10;
reference : &wsize = &2;

divide :: (numerator: wsize)(divisor: &wsize) {
  numerator / divisor
}

num1 := value.divide(reference); // `value` moves out of scope
num2 := num1.clone().divide(reference); // `num1` stays in scope
num3 := reference.clone().divide(&num1);

aliased types

A type can be aliased, giving it a new name. The alias is treated as a completely different type, and will not implicitly cast to the original type.

my_string_alias :: string;
normal_string : string = "Hello World";
aliased_string : my_string_alias = "Goodbye World";
/* normal_string = aliased_string; */ // ERROR: incompatible types

structures

A structure is defined by aliasing the struct type. A structure may contain named or unnamed fields, but not a mix of both. Like local variables, uninitialized struct fields are zeroed by default.

// Struct with named fields
Position :: struct {
  x: r64, y: r64
};

origin : Position = {x: 0.0, y: 0.0};
// With unnamed fields
Color :: struct {
  r64, r64, r64
};
red : Color = {0.0, 0.0, 0.0};
// Accessing unnamed fields
red.0 = 1.0;
red.1 = 0.2;
red.2 = 0.2;
// red == {1.0, 0.2, 0.2};

defer

Execution of a block can be deferred until the end of the current scope. Deferred blocks execute in the reverse of the order in which they were declared.

{
  println("First");
  defer { println("Fourth"); }
  defer { println("Third"); }
  println("Second");
}
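
Rust has no defer keyword, but values are dropped in reverse declaration order at the end of a scope, so a small guard type can imitate the behavior. In the sketch below the `Defer` struct and the `deferred_order` function are hypothetical names, and messages are recorded in a log instead of printed so that the order can be inspected.

```rust
use std::cell::RefCell;

/// Hypothetical guard: records its message when it goes out of scope.
struct Defer<'a> {
    log: &'a RefCell<Vec<&'static str>>,
    message: &'static str,
}

impl Drop for Defer<'_> {
    fn drop(&mut self) {
        self.log.borrow_mut().push(self.message);
    }
}

/// Replays the example above and returns the messages in execution order.
fn deferred_order() -> Vec<&'static str> {
    let log = RefCell::new(Vec::new());
    {
        log.borrow_mut().push("First");
        let _fourth = Defer { log: &log, message: "Fourth" };
        let _third = Defer { log: &log, message: "Third" };
        log.borrow_mut().push("Second");
    } // both guards are dropped here, last declared first
    log.into_inner()
}
```

The guard declared last runs first, so the log reads First, Second, Third, Fourth, matching the deferred blocks in the example.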

namespaces

All source files implicitly exist within their own namespace, which must be named at the top of the file. An inner namespace can also be declared. Members of a namespace are accessed using the dot . operator.

ns1 :: {
  foo :: () {
    println("foo");
  }

  ns2 :: {
    bar :: () {
      println("bar");
    }
  }
}

ns1.foo();
ns1.ns2.bar();