Ideas for a type system

Pipe-calculus is a typeless core calculus, but a language based on it doesn't have to be. There are a few preliminary thoughts that guide the search for a suitable type system.

Types are not added directly to the core calculus but to the surface language.
The type system should naturally fit to the calculus (this is a subtle guideline that is elaborated below).
The type checking algorithm must be easy to implement in the typeless core language.

The last point comes from the hope that the surface language will be homoiconic, and the type system will not break this property.

Dynamic vs. static typing

In this article we will call the typeless core calculus the dynamic or runtime language.

A type annotation can be treated in two ways: either translating it to a runtime check or in advance prove that a runtime check is unnecessary and remove it from the program. This makes the distinction between dynamic and static typing.

With dynamic typing type checking and evaluation overlap. A dynamically typed program can be modeled as a type safe domain that is surrounded by an unsafe environment. Once the program has to receive an input value from the environment, its external representation has to be decoded and it must be proved that it is an element of the required type. On success the program proceeds in a type safe manner. On failure the program either rejects the input and proceeds as an alternative branch or its execution is aborted.

If two type safe domains share the same internal representation and type system, they can directly exchange values. In this case values are transferred together with type annotations (probably in an optimized form) and only type annotations have to be checked.

With static typing, the type of a program is checked before execution. After successful type checking, type annotations may be safely erased. At runtime the program can still receive input. The type checker has to ensure that on failure to parse the input the program either proceeds in an alternative branch or is aborted.

Linguistic analogy

In a linguistic model of dynamic typing every data type [math]\displaystyle{ T_i }[/math] where [math]\displaystyle{ 1 \le i \le n }[/math] is associated with a language [math]\displaystyle{ L_i }[/math]. The environment carries words of the language [math]\displaystyle{ \sum_{i=1}^{n} L_i }[/math]. When outputting a value [math]\displaystyle{ v : T_i }[/math], it is translated to an external representation in [math]\displaystyle{ L_i }[/math]. When inputting a value [math]\displaystyle{ v : T_i }[/math] from the environment, its external representation is parsed. A program that implements a function [math]\displaystyle{ f : T_i \to T_j }[/math] is modeled by a translator from [math]\displaystyle{ L_i }[/math] to [math]\displaystyle{ L_j }[/math].

Every data type [math]\displaystyle{ T_i }[/math] can be conservatively modeled by a generator [math]\displaystyle{ g_{L_i} }[/math] that generates all words of [math]\displaystyle{ L_i }[/math] and a recognizer [math]\displaystyle{ r_{L_i} }[/math] that recognizes exactly the words of [math]\displaystyle{ L_i }[/math]. Accordingly, every first order function [math]\displaystyle{ f : T_i → T_j }[/math] can be modeled by the composition of [math]\displaystyle{ r_{L_i} }[/math] and [math]\displaystyle{ g_{L_j} }[/math].

Algebraic sum types perfectly fit into this framework. For any type definition [math]\displaystyle{ T = c_1~A_{11}~...~A_{1n}~|~...~|~c_l~A_{l1}~...~A_{lm} }[/math] we can construct a corresponding unambiguous context-free grammar as follows.

For each distinct [math]\displaystyle{ T }[/math] and [math]\displaystyle{ A_{ij} }[/math] type names we choose distinct nonterminal symbols [math]\displaystyle{ \alpha(T) }[/math] and [math]\displaystyle{ \alpha(A_{ij}) }[/math] respectively.
For each [math]\displaystyle{ c_i }[/math] constructor symbol we choose a distinct [math]\displaystyle{ s(c_i) }[/math] terminal symbol. ^[1]
For each [math]\displaystyle{ c_i~A_{i1}~...~A_{io} }[/math] product type we create a production rule [math]\displaystyle{ \alpha(T) ::= s(c_i)\alpha(A_{i1})...\alpha(A_{io}) }[/math].

This method can be extended to multiple type definitions as long as we use the same mappings [math]\displaystyle{ s }[/math] and [math]\displaystyle{ \alpha }[/math] and constructor symbols are globally unique. The resulting grammar is in Greybach normal form. ^[2]

Notes

↑ Constructor symbols of products inside a sum type are unique by definition.
↑ Note that this representation of sum types does not require parentheses for disambiguation.

[1] Constructor symbols of products inside a sum type are unique by definition.

[2] Note that this representation of sum types does not require parentheses for disambiguation.

[1]

[2]