Sum Types: Nominal vs. Structural

newt is strongly influenced by functional programming languages, so it’s no accident that many of the language constructs are copied or adapted from functional programming. No existing paradigm is sacrosanct, however, so even the implementation of classic FP constructs should be considered carefully.

One long-running design discussion centers on the implementation of sum types (i.e. tagged unions). This type-safe and succinct way of describing a value that is one of several variants is well aligned with the language's design goals. The common implementation of sum types identifies each variant with a tag, which is simply a name for that variant. This makes the sum type a nominal type, where equivalence is determined by name. Rust’s enums are a fine example of this nominal approach:

enum Message {
    Quit,
    ChangeColor(i32, i32, i32),
    Move { x: i32, y: i32 },
    Write(String),
}

This snippet defines a Message type that can be a Quit message with no associated data, a ChangeColor message that carries three integers representing an RGB color value, a Move message with associated Cartesian coordinates, or a Write message with an associated string value. Instances are created by naming the variant and supplying its data:

let msg: Message = Message::Move { x: 3, y: 4 };

Before a value of a sum type can be used, its variant must be matched by name:

match msg {
    Message::Quit => quit(),
    Message::ChangeColor(r, g, b) => change_color(r, g, b),
    Message::Move { x, y } => move_cursor(x, y),
    Message::Write(s) => println!("{}", s),
};

Performing computations with values of a sum type without first performing a match decomposition is usually a semantic error. For example, adding two values of type Message together is nonsensical, but adding two Message::Move values could be interpreted as vector addition.
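As a minimal sketch of that point, the following Rust program defines addition only for the case where both values are Move variants and returns None otherwise. The helper name add_moves is hypothetical and exists only for illustration; the Message definition repeats the one above, with derived traits added so the result can be printed and compared.

```rust
// Same Message type as above; the derives are added for printing and comparison.
#[allow(dead_code)]
#[derive(Debug, PartialEq)]
enum Message {
    Quit,
    ChangeColor(i32, i32, i32),
    Move { x: i32, y: i32 },
    Write(String),
}

// Hypothetical helper: treats two Move variants as vectors and adds them.
// Any other combination has no sensible sum, so it yields None.
fn add_moves(a: &Message, b: &Message) -> Option<Message> {
    match (a, b) {
        (Message::Move { x: ax, y: ay }, Message::Move { x: bx, y: by }) => {
            Some(Message::Move { x: ax + bx, y: ay + by })
        }
        _ => None,
    }
}

fn main() {
    let a = Message::Move { x: 3, y: 4 };
    let b = Message::Move { x: 1, y: -2 };
    println!("{:?}", add_moves(&a, &b)); // prints Some(Move { x: 4, y: 2 })
    println!("{:?}", add_moves(&a, &Message::Quit)); // prints None
}
```

The match on the pair (a, b) is what forces the decomposition: there is no way to reach the coordinates without first naming the variant.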

So far the construct is regular and coherent (although I strongly disagree with the use of the keyword enum), but the syntax feels a bit heavy for simple cases. For many simple sum types (such as the Message example from Rust), each variant carries a different type, so a variant can be matched by the structure of its type instead of by its name. In small examples, the gains in succinctness can be significant, as illustrated by the following hypothetical newt snippets:

# nominal approach
sum number {
        discrete:int
        | continuous:double
}
t:= number::discrete(7)

# structural approach
t:(int|double)= 7

This succinctness is offset by verbosity when specifying struct members or function return types:

# nominal
f:= (a:int) -> number { }
f:= (a:double) -> number { }

# structural
f:= (a:int) -> (int|double) { }
f:= (a:double) -> (int|double) { }

The verbosity increases with each additional structural variant. Type inference solves the double verbosity of function return types and variable types, but does not address the verbosity of multiple function definitions with the same return type.

Even so, the nominal typing still feels verbose. My current thinking is to require nominal typing, but allow type inference of the variant where possible. For example, the int variant can unambiguously be inferred as follows:

sum number {
        discrete:int
        | continuous:double
}
t:number= 7 # t will be of type number::discrete
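For comparison, a rough analog of this idea can be sketched in Rust today. This is not how newt would implement it; it only assumes that one conversion is declared per variant payload type, so the declared target type plus the type of the value pick the variant. The Number enum and its variant names mirror the newt snippet and are otherwise hypothetical.

```rust
// Rust analog of the newt `number` type above (names are illustrative).
#[derive(Debug, PartialEq)]
enum Number {
    Discrete(i32),
    Continuous(f64),
}

// One conversion per payload type. Because the payload types differ,
// the source type alone determines which variant is built.
impl From<i32> for Number {
    fn from(v: i32) -> Self {
        Number::Discrete(v)
    }
}

impl From<f64> for Number {
    fn from(v: f64) -> Self {
        Number::Continuous(v)
    }
}

fn main() {
    let t: Number = 7.into(); // variant chosen from the integer payload
    let u: Number = 7.5.into(); // variant chosen from the floating-point payload
    println!("{:?} {:?}", t, u); // prints Discrete(7) Continuous(7.5)
}
```

The sketch also shows where the approach stops working: if two variants shared a payload type, the two From implementations would conflict, which matches the "where possible" qualification above.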