Pipe-calculus

Revision as of 16:35, 8 July 2023 by KalmanKeri (talk | contribs) (→‎Normal forms)
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)

Pipe-calculus is a universal model of computation closely related to process calculus, lambda calculus and formal grammars. One of its distinctive features is the inclusion of syntactic symbols, so it is easy to express program structure and complex data types using only the built-in constructs.

The name of the calculus refers to the pipe operator of Unix shells, and also alliterates with π-calculus, a popular process calculus, however they are not closely related.

Pure pipe-calculus is untyped. Construction of a type system is in the exploration phase, placing the focus on well-typed communication of programs.

Although the core calculus is sufficiently expressive to be used as a programming language, with added syntax sugar it can be the basis of various typed or untyped programming languages. Its characteristic features like I/O and stream orientation make pipe-calculus a suitable choice even for scripting languages.

Note: Pipe-calculus is a work in progress. This wiki page reflects the current status of the project and details may well change over time.

Introduction

Motivation

Development of pipe-calculus started with the idea that syntax should be a first class element of programming languages as well as formation rules are integral part of logical systems. A language like that is able to interpret complex input without using a parser library. Pipe-calculus is a minimal language with this capability.

On the practical side pipe-calculus is inspired by shell programming with text streams and stream processing tools. Indeed the idea of the pipe combinator comes from shell scripting.

An example

To get a taste of pipe-calculus, consider a program that adds two natural numbers encoded in unary numeral system. Suppose that we send both numbers over a stream as a series of 1s terminated by an end symbol. Our example language consists of the following expressions.

  • match s expects that the next symbol of the input stream is s. On failure the current branch of the program is aborted.
  • write s appends the symbol s to the output stream.
  • run x recursively runs a routine given its name.
  • p ; q runs p then runs q unless p fails.
  • p | q forks the execution of the program so that one branch runs p, another branch runs q, sharing the same input stream.

Start by writing a routine that copies a natural number from the input to the output.

passNat = match 1; write 1; run passNat | match end; write end

Now we can write the program that writes the sum of two numbers to the output.

addNat = match 1; write 1; run addNat | match end; run passNat

The program terminates after matching and writing out the second end symbol. It skips the first end that terminates the first number and copies everything else. Effectively it concatenates two sequences of 1s. If any other symbol occurs in the input stream, the program fails.

This language is obviously too weak. We can write simple filters, but we can't even check the equality of two numbers. In order to do it, we need a method that allows the first number to be accumulated and read back symbol by symbol as we read the second number. Pipe-calculus can be seen as a framework to study the required extensions that turn this tiny language into a practical programming language.

Connection with other fields

Pipe-calculus is presented as a process calculus. Although it significantly differs from typical process calculi, it still lends itself to this style of presentation. Its development is also inspired by higher-order logic programming, automata theory, type theory (up to the lambda-cube) and algebraic effects [1].

In certain cases the connection can be made more precise.

  • Programs written in pipe-calculus can recognize and generate words of formal languages encoded as terms.
  • Untyped lambda calculus can be translated to a subset of higher-order pipe-calculus (to be specified later).

Variants and computational power

The most basic variant of the calculus that lacks any scoped constructs is called zeroth-order as a reference to zeroth-order logic. Somewhat surprisingly, even the zeroth-order calculus has a limited computational power. In the higher-order calculus there are lexically scoped variables that can bind arbitrary terms. Intermediate variants can be created by extending the basic calculus with new constructs. Determining the computational power of specific variants of the calculus is of interest.

It is an observation and also a design goal that specific variants of the calculus can recognize specific classes of formal languages in the Chomsky hierarchy. For example it is assumed that the zeroth-order calculus with top level recursion can encode finite-state machines and as a consequence it can recognize regular languages.

Informal description

Terms

[math]\displaystyle{ \def\dot{.} \def\seq{\mathrel{;}} \def\alt{\mid} \def\pipe{\rhd} \def\pass{\textbf{pass}} \def\eq{ = ~} }[/math]Pipe-calculus terms are also called processes to emphasize their interactive, behavioral aspect. Each process has a conceptual input and output which are implicit in the rules. Output of one process can be connected to the input of another one. Some terms contain atoms which are unscoped and uninterpreted symbols. In grammatical context one can think of atoms as tokens.

The following terms are available in pipe-calculus.

  • synchronisation, where
    • negative literal prefixing [math]\displaystyle{ a^- \dot s }[/math] is a process that expects an atom [math]\displaystyle{ a }[/math] to be signalled on its input. If the expectation is fulfilled, the process proceeds as [math]\displaystyle{ s }[/math], otherwise it fails.
    • positive literal prefixing [math]\displaystyle{ a^+ \dot s }[/math] is a process that signals an atom [math]\displaystyle{ a }[/math] on its output before proceeding as [math]\displaystyle{ s }[/math].
  • communication, where
    • input prefixing [math]\displaystyle{ ?x \dot s }[/math] is a process that receives a term. On success, it proceeds as [math]\displaystyle{ s }[/math] binding the name [math]\displaystyle{ x }[/math] to the received term.
    • output prefixing [math]\displaystyle{ !s \dot t }[/math] is a process that sends a term [math]\displaystyle{ s }[/math] before proceeding as [math]\displaystyle{ t }[/math].
  • sequence [math]\displaystyle{ s \seq t }[/math] is a composite process that executes [math]\displaystyle{ s }[/math] before proceeding as [math]\displaystyle{ t }[/math] unless [math]\displaystyle{ s }[/math] fails.
  • choice [math]\displaystyle{ s \alt t }[/math] is a forking process where one branch executes [math]\displaystyle{ s }[/math] while the other branch executes [math]\displaystyle{ t }[/math]. Failure of one branch does not affect the other branch.
  • pipeline [math]\displaystyle{ s \pipe t }[/math] is a composite process that enforces unidirectional, left to right data flow between its components.
  • top [math]\displaystyle{ \top }[/math] is a process that is successfully terminated.
  • bottom [math]\displaystyle{ \bot }[/math] is also a terminated process. In certain contexts it denotes failure.

Prefixes

Prefixes are the basic units of behavior (they are called actions in process algebra). In pipe-calculus actions can only occur together with a continuation. Actions can be classified by polarity and kind.

Synchronisation Communication
Positive [math]\displaystyle{ a^+ . t }[/math] [math]\displaystyle{ !s . t }[/math]
Negative [math]\displaystyle{ a^- . t }[/math] [math]\displaystyle{ ?x . t }[/math]

Prefixes are the basic units of interaction. Prefixes of the same kind and opposite polarity can interact. Intuitively, positive prefixes are outbound, while negative ones are inbound. It is also useful to think of positive prefixes as data and negative prefixes as programs. In the zeroth-order calculus only synchronisation is present.

Sequence

Sequences bring the concept of temporal order to pipe-calculus. In [math]\displaystyle{ s \mathrel{;} t }[/math], [math]\displaystyle{ s }[/math] is executed before [math]\displaystyle{ t }[/math] (more precisely, the behavior of [math]\displaystyle{ s }[/math] is observed before the behavior of [math]\displaystyle{ t }[/math]). If [math]\displaystyle{ s }[/math] fails (it reduces to [math]\displaystyle{ \bot }[/math]), [math]\displaystyle{ t }[/math] is not executed at all and the sequence fails.

[math]\displaystyle{ \bot \mathrel{;} t = \bot }[/math]

If the left operand succeeds, the right operand follows.

[math]\displaystyle{ \top \mathrel{;} t = t }[/math]

In [math]\displaystyle{ s ; s }[/math], [math]\displaystyle{ s }[/math] is executed twice. In other words, sequence is not idempotent. Logically [math]\displaystyle{ ; }[/math] has common properties with non-commutative (shortcut) conjunction.

Prefixed forms also obey this temporal ordering: the behavior of the prefix is observed before the behavior of the continuation[2].

Choice

Choice is a forking process. [math]\displaystyle{ s \mid t }[/math] has two branches that exist side by side. Both [math]\displaystyle{ s }[/math] and [math]\displaystyle{ t }[/math] are ready for interaction, however they can't communicate with each other. Both branches participate in the same interaction but they evolve independently. This is different from systems where alternatives are exclusive or they are examined one by one. We can say that choice is laid out in space[3].

Rules

If one of the branches fail, all others remain active. In other words, failure is local to branches.

[math]\displaystyle{ s \mid \bot = s }[/math]

If one of the branches succeeds, the choice succeeds too, eliminating all remaining branches.

[math]\displaystyle{ s \mid \top = \top }[/math]

Identical branches are indistinguishable. In the following example, the behavior of [math]\displaystyle{ s }[/math] is observed only once.

[math]\displaystyle{ s \mid s = s }[/math]

Analogies

Practical concepts that share some properties with choice:

  • The Unix fork system call - as both forks inherit the same environment and the failure of one fork does not affect the other one.
  • Branches in a version control system - branches are often created to explore possible solutions and successful solutions are merged with the main branch.

In both analogies, branches may exist side by side. However branches of a choice are all equal, none of them is outstanding.

Exclusive choice

Given a set of distinct atoms [math]\displaystyle{ { a_1,...a_n } }[/math] and process terms [math]\displaystyle{ s_1,...s_n, t_1,...t_n }[/math], all subterms of [math]\displaystyle{ a_1^+ . s_1 \mid ... \mid a_n^+ . s_n \mid a_1^- . t_1 \mid ... \mid a_n^- . t_n }[/math] are exclusive choices, since all possible interaction with another process leaves at most one active branch. These are examples of synchronization-guarded choice. They allow for the encoding of sum types (with positive guards) and [math]\displaystyle{ case }[/math] expressions (with negative guards) in strongly typed functional languages.

Pipeline

After discussing sequence and choice, the pipe combinator is the first where interaction occurs. [math]\displaystyle{ s \rhd t }[/math] is a composite process that enforces unidirectional data flow between its components. Input of the composite process is forwarded to the input of [math]\displaystyle{ s }[/math], output of [math]\displaystyle{ s }[/math] is connected to the input of [math]\displaystyle{ t }[/math] and output of [math]\displaystyle{ t }[/math] is forwarded to the output of the composite process, according to the following diagram.

[math]\displaystyle{ \underset{s \rhd t}{\boxed{\to \boxed{s} \to \boxed{t} \to}} }[/math]

Compared to the parallel composition [math]\displaystyle{ \mid }[/math] in π-calculus, [math]\displaystyle{ \rhd }[/math] can be classified as unidirectional parallel composition.

Rules

If the operands are prefixed with matching literals, synchronisation occurs and both sides make progress.

[math]\displaystyle{ a^+ . s \rhd a^- . t = s \rhd t }[/math]

If the left operand is an output prefix and the right operand is an input prefix, communication occurs and both sides make progress. The term output by the left side is available in the continuation of the right side.

[math]\displaystyle{ !s . t \rhd ?x . u = t \rhd u[s/x] }[/math]

Arguably these are the key rules of pipe-calculus[4]. Any other combination of prefixes leads to failure. The following example fails because synchronisation cannot occur between different atoms.

[math]\displaystyle{ a^+ . s \rhd b^- . t = \bot }[/math]

If any of the arguments is a choice, each branch of the left side is paired with each branch of the right side.

[math]\displaystyle{ r \mid s \rhd t \mid u = (r \rhd t) \mid (r \rhd u) \mid (s \rhd t) \mid (s \rhd u) }[/math]

Pipe and Unix pipelines

A pipeline in Unix shell programming is a good analogy for the pipe combinator. They have remarkably similar properties. A shell pipeline connects programs that execute concurrently. In [math]\displaystyle{ P1 | P2 }[/math] it cannot occur that the output of [math]\displaystyle{ P1 }[/math] directly reaches the output of the pipeline. If [math]\displaystyle{ P1 }[/math] exits, [math]\displaystyle{ P2 }[/math] continues to run, but if [math]\displaystyle{ P2 }[/math] exits, [math]\displaystyle{ P1 }[/math] will exit too[5]. Although Unix processes can handle multiple input and output streams, only the standard input and output streams are passed on through the pipe.

Recognition and interpretation

One can also think of [math]\displaystyle{ s \rhd t }[/math] in a syntactic setting. In this case [math]\displaystyle{ s }[/math] represents input or source, while [math]\displaystyle{ t }[/math] plays the role of a recognizer, parser or interpreter.

[...]

Formal definition

Syntax

[math]\displaystyle{ \def\pos{\mathsf{pos}~} \def\neg{\mathsf{neg}~} \def\seq{\mathrel{;}} \def\alt{\mid} \def\pipe{\rhd} \def\eq{ = ~} \def\rule{ ::= ~} \begin{alignat}{3} & \mbox{Atoms}~ & a, b \rule & \mathsf{A} \mid \mathsf{B} \mid \mathsf{'foo} \mid \mathsf{'bar} \mid \mathsf{'+} \mid \mathsf{'*} \mid ... \\ & \mbox{Variables}~ & x, y \rule & \mathsf{a} \mid \mathsf{b} \mid ... \\ & \mbox{Prefixes}~ & p, q \rule & ~ a^+ & & \text{Positive literal (signal)} \\ & & \mid & ~ a^- & & \text{Negative literal (match)} \\ & & \mid & ~ ?x & & \text{Abstraction (input)} \\ & & \mid & ~ !s & & \text{Argument (output)} \\ & \mbox{Terms}~ & s, t \rule & ~ p . s & & \text{Prefix operator} \\ & & \mid & ~ s \seq t & & \text{Sequential composition} \\ & & \mid & ~ s \mathsf{|} t & & \text{Choice} \\ & & \mid & ~ s \pipe t & & \text{Pipe (interpretation)} \\ & & \mid & ~ \top & & \text{Termination, also success} \\ & & \mid & ~ \bot & & \text{Termination, also failure} \\ & & \mid & ~ x & & \text{Variable} \\ & & \mid & ~ (s) & & \text{Grouping} \\ \end{alignat} }[/math]

Operator summary
Symbol Associativity[6] Precedence
[math]\displaystyle{ \seq }[/math] right highest
[math]\displaystyle{ \mid }[/math] right
[math]\displaystyle{ \rhd }[/math] right lowest

Equational theory

Pipe-calculus can be described as a process algebra [7] [8] . The following structural laws are satisfied by all [math]\displaystyle{ s, t, u }[/math] pipe-calculus terms.

[math]\displaystyle{ \def\seq{\mathrel{;}} \def\alt{\mid} \def\pipe{\rhd} \def\eq{ = ~} }[/math]Sequential composition is associative with left neutral element [math]\displaystyle{ \top }[/math] and left absorbing element [math]\displaystyle{ \bot }[/math].

[math]\displaystyle{ \begin{align} (s \seq t) \seq u \eq & s \seq (t \seq u) \\ \top \seq t \eq & t \\ \bot \seq t \eq & \bot \\ \end{align} }[/math]

Choice is associative, commutative and idempotent with neutral element [math]\displaystyle{ \bot }[/math] and absorbing element [math]\displaystyle{ \top }[/math].

[math]\displaystyle{ \begin{align} (s \alt t) \alt u \eq & s \alt (t \alt u) \\ s \alt t \eq & t \alt s \\ s \alt s \eq & s \\ \top \alt t \eq & \top \\ \bot \alt t \eq & t \\ \end{align} }[/math]

Sequential composition is right distributive over choice.

[math]\displaystyle{ \begin{align} (s \alt t) \seq u \eq & s \seq u \alt t \seq u \end{align} }[/math]

Pipe has left neutral element [math]\displaystyle{ \bot }[/math] and right absorbing element [math]\displaystyle{ \top }[/math].

[math]\displaystyle{ \begin{align} \bot \pipe t \eq & t \\ s \pipe \top \eq & \top \\ \end{align} }[/math]

Todo: pipe is also associative for well behaving operands. A process whose branches are all of the same prefixed form are certainly well behaving.

Normal forms

Normal form of pipe-calculus terms is defined by a term rewriting system[8]. Rewrite rules can be applied to any matching subterm of a term. A term is in normal form if no rewrite rule applies to any of its subterms.

[math]\displaystyle{ \def\seq{\mathrel{;}} \def\alt{\mid} \def\pipe{\rhd} \def\in{\,?} \def\out{!} \def\rew{\mathrel{\Rightarrow}\,} }[/math] [math]\displaystyle{ \begin{alignat}{1} (s \seq t) \seq u \rew & s \seq (t \seq u) \\ (s \alt t) \seq u \rew & s \seq u \alt t \seq u \\ \top \seq s \rew & s \\ \bot \seq s \rew & \bot \\ \\ (s \alt t) \alt u \rew & s \alt (t \alt u) \\ a^+ . s \alt a^+ . t \rew & a^+ . (s \alt t) \\ a^- . s \alt a^- . t \rew & a^- . (s \alt t) \\ \bot \alt s \rew & s \\ s \alt \bot \rew & s \\ \top \alt s \rew & \top \\ s \alt \top \rew & \top \\ \end{alignat} }[/math]

(Todo: these rules don't implement idempotence of [math]\displaystyle{ | }[/math] because commutativity is left out from the rules so equal terms don't always have a chance to "meet" on sides of [math]\displaystyle{ | }[/math].)

While all other rules are syntax directed, rules for [math]\displaystyle{ \pipe }[/math] have to be attempted in the listed order for the following reasons.

  1. The processes [math]\displaystyle{ a^- . s \pipe a^+ . t }[/math] and [math]\displaystyle{ \in x . s \pipe \out t . u }[/math] behave differently depending on the order of rule application, even if their normal forms are identical.
  2. The process [math]\displaystyle{ r \alt s \pipe t \alt u }[/math] produces subresults in different order depending of the order of rule application. Explicit application order avoids nondeterminism in both cases.
  3. We need a default rule that fails in order to ensure that [math]\displaystyle{ \pipe }[/math] never occurs in normal forms[9].

[math]\displaystyle{ \begin{alignat}{2} a^+ . s \pipe a^- . t \rew & s \pipe t \\ \out s . t \pipe \in x . u \rew & t \pipe u[s/x] \\ s \alt t \pipe u \rew & (s \pipe u) \alt (t \pipe u) \\ s \pipe t \alt u \rew & (s \pipe t) \alt (s \pipe u) \\ \bot \pipe t \rew & t \\ s \pipe \top \rew & \top \\ s \pipe t \rew & \bot & \\ \end{alignat} }[/math]

Notice that finding a matching rewriting rule does not require checking the equality of terms, only of atoms.

Extensions

  • Recursion (μ-recursion, top level recursion, regular grammars)
  • Stack (context-free grammars?)
  • Ordered choice (PEG, backtracking, local context)
  • Intersection (commutative, idempotent conjunction)
  • Replication

Pipe-calculus and process calculus

Process calculi (or process algebras) is an approach for formally modeling concurrent systems. Process refers to the behaviour of a system. The word algebra denotes an algebraic approach in talking about behaviour, while the word calculus emphasizes the computational aspect of processes. [7] There are lots of implementations of this approach and none of them can be pointed to as the primary one.

A process calculus can be characterized by the atomic units of behaviour (actions, events) and formers of compound processes (operations, combinators). Implementations may include several one of both categories and they are often open for extensions.

Interaction and behaviour

It is a recurrent pattern in the literature that processes are treated in one of two settings: one can combine component processes and examine the evolution of the resulting process. In this setting component processes may interact with each other and together they produce a compound process. In the other setting a single process is examined by placing it into a (not strictly specified) environment which is able to interact with the process. In this case the emphasis is placed on the observable behaviour of the process that is the possible sequences of actions performed by the process.

Pipe-calculus as a process calculus

Pipe-calculus fits into this framework if we make the following observations.

  • Prefixes are atomic units of behaviour (actions).
  • Prefixes with sequential composition and choice form basic process algebra (BPA). [8]
  • The pipe operation can be seen as a restricted, unidirectional parallel composition.

There are also notable differences.

  • The most outstanding difference is the lack of channels. Pipe-calculus processes only have a conceptual input and output, however one can simulate channels with synchronisation where atoms stand for channel names.
  • Communication primitives in mainstream process calculi often inseparably unite two conceptual steps: the selection of a communication channel and the act of message sending. For example π-calculus [10] defines input prefixing [math]\displaystyle{ c\left(x\right).P }[/math] and output prefixing [math]\displaystyle{ \overline{c} \langle y \rangle.P }[/math] to denote receiving a name via channel [math]\displaystyle{ c }[/math] and sending a name via channel [math]\displaystyle{ c }[/math] respectively, before proceeding as [math]\displaystyle{ P }[/math]. Although one can introduce as convention synchronization where the communicated message is ignored, and broadcast where a group of processes are receiving messages on the same channel. In pipe-calculus these steps are separated by design.
  • In pipe-calculus there is no counterpart of name hiding that prevents interference between two groups of processes using the same channel name for communication. Since there is no general, bidirectional parallel composition, we don't need a hiding primitive. The pipe operation naturally restricts communication to a pipeline.
  • In pipe-calculus, if a process fails to synchronize, it is aborted. This is different from the usual implementation where a process that fails to synchronize does not evolve, it "keeps waiting" for a message, opening up the possibility for asynchronous communication.

Notes

  1. Matija Pretnar: An Introduction to Algebraic Effects and Handlers
  2. Wikipedia implies that prefixed forms are specialized forms of sequential composition: "In process calculi, the sequentialisation operator is usually integrated with input or output, or both." (Process calculus on Wikipedia)
  3. This suggestive phrasing is borrowed from Lennart Augustsson et al.: The Verse Calculus: a Core Calculus for Functional Logic Programming.
  4. Compare with reduction semantics of process calculi on Wikipedia.
  5. The process will exit next time when it tries to write to stdout, at least if the default SIGPIPE handler is not replaced by the user.
  6. Notice that left or right associativity is a grammatical property of a binary operator, which differs from the mathematical associativity of the respective operation. Even associative operations may be represented by left or right associative operators. See Operator associativity.
  7. 7.0 7.1 J.C.M. Baeten: A brief history of process algebra., Theoretical Computer Science 335 (2005) 131 – 146
  8. 8.0 8.1 8.2 Wan Fokkink: Introduction to Process Algebra, Springer-Verlag
  9. The default case allows for embedding synchronisation failure in the object language, so it can be captured by [math]\displaystyle{ \mid }[/math].
  10. Joachim Parrow: An Introduction to the π-Calculus, chapter to appear in Handbook of Process Algebra, ed. Bergstra, Ponse and Smolka, Elsevier