What is a dataflow programming language? Why use it? And are there any benefits to it?
In a control flow language, you have a stream of instructions which operate on external data. Conditional execution, jumps and procedure calls change the instruction stream to be executed. This could be seen as instructions flowing through data: instructions operate on registers, which are loaded with data by other instructions, and the data stays static unless the instruction stream moves it. A control flow "if" statement jumps to the correct branch in the instruction stream, but the data does not get moved.
In a dataflow language, you have a stream of data which is passed from instruction to instruction to be processed. Conditional execution, jumps and procedure calls route the data to different instructions. This could be seen as data flowing through otherwise static instructions like how electrical signals flow through circuits or water flows through pipes. A dataflow "if" statement would route the data to the correct branch.
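To make the contrast concrete, here is a small Python sketch. `control_flow_abs` branches in the usual way, while `switch` is a hypothetical routing node (not taken from any particular dataflow language) that sends each data token down one of two paths; the nodes on each path then process only the tokens routed to them.

```python
def control_flow_abs(x):
    # Control flow: the branch chooses which instructions run; x stays where it is.
    if x < 0:
        return -x
    return x


def switch(tokens, predicate):
    # Dataflow: a hypothetical "switch" node routes each token to one of two
    # output streams; each downstream node sees only the tokens sent to it.
    true_branch, false_branch = [], []
    for token in tokens:
        (true_branch if predicate(token) else false_branch).append(token)
    return true_branch, false_branch


negatives, non_negatives = switch([3, -1, 4, -5], lambda t: t < 0)
negated = [-t for t in negatives]   # node wired to the "true" output
passed = list(non_negatives)        # node wired to the "false" output
print(negated, passed)  # [1, 5] [3, 4]
```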
Some examples of dataflow features and languages:
- Spreadsheets are essentially dataflow
- Unix pipes
- Futures and promises are dataflow or dataflow-like constructs found in many modern languages
- The messaging in the actor model is dataflow
- Some languages have dataflow features:
  - Oz has dataflow variables
  - Groovy has GPars
  - Clojure has clojure.contrib.dataflow, Reagi and Javelin
Textual languages
- VHDL, Verilog and other hardware description languages are essentially dataflow
- ChucK
- Cuneiform
- Lustre, used in defence, aerospace and power plant industries
- Ptolemy II
- Nyquist
Visual languages
- LabVIEW
- Max/MSP
- Pure Data
- Reaktor
- SCADE, the graphical programming environment for Lustre
- SynthMaker and FlowStone
- vvvv
- Expecco
- Shake
- [BLOK]
- Quartz Composer
- AudioMulch
Products which embed a visual dataflow language:
- Blender
- Voreen
- Unreal Engine's Kismet
- ANKHOR FlowSheet
- Dynamo for Autodesk Revit
- LiveBlox
Dataflow programming languages focus on the state of the program and trigger operations in response to changes in that state. They are inherently parallel: each operation depends only on its inputs and executes once those inputs are available. Unlike a conventional program, where one operation follows the next in a fixed sequence, a dataflow program runs any operation whose inputs are ready, so there is no set order of execution.
Dataflow programming languages often use a large hashtable whose keys are the program's data and whose values are pointers to the operations that use that data. This makes multicore programs easier to create in a dataflow language, since each core needs only the hashtable to find work.
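The firing rule and the hashtable can be sketched together in Python. This is a sequential simulation under assumed names (`OPERATIONS`, `run`), and it keys the dict by each datum's name rather than by the data itself. It does not actually spread work across cores, but operations fired by the same arriving token (here `sum` and `product`) are independent of each other and could run in parallel.

```python
from collections import defaultdict

# Hypothetical program graph: (output name, input names, operation).
OPERATIONS = [
    ("sum", ("a", "b"), lambda a, b: a + b),
    ("product", ("a", "b"), lambda a, b: a * b),
    ("result", ("sum", "product"), lambda s, p: s - p),
]


def run(operations, initial_values):
    # Hashtable: datum name -> operations that consume that datum.
    waiting_on = defaultdict(list)
    for op in operations:
        for name in set(op[1]):
            waiting_on[name].append(op)

    values = {}
    fired = []                              # outputs in the order they fired
    pending = list(initial_values.items())  # arrived tokens not yet handled
    while pending:
        name, value = pending.pop(0)
        values[name] = value
        for out, inputs, fn in waiting_on[name]:
            # Fire an operation once, as soon as all of its inputs are present.
            if out not in fired and all(i in values for i in inputs):
                fired.append(out)
                pending.append((out, fn(*(values[i] for i in inputs))))
    return values, fired


values, fired = run(OPERATIONS, {"a": 3, "b": 4})
print(values["result"], fired)  # -5 ['sum', 'product', 'result']
```

No line of the program says "compute `sum`, then `product`, then `result`"; the order falls out of when each operation's inputs become available.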
A common example of a dataflow programming language is a spreadsheet program, in which columns of data are computed from other columns. When the data in one column changes, the data in the columns that depend on it will change with it. Although the spreadsheet is the most common example, most dataflow programming languages tend to be graphical.
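A toy version of that spreadsheet behaviour in Python, assuming a hypothetical `Sheet` class with formulas written as Python functions: each cell records which cells read it, and changing a cell pushes the change to those dependents. The sketch assumes the dependencies contain no cycles and recomputes naively; a real spreadsheet would evaluate cells in dependency order to avoid redundant recalculation.

```python
class Sheet:
    """A toy spreadsheet: cells hold constants or formulas over other cells."""

    def __init__(self):
        self.formulas = {}    # cell -> (function, list of input cells)
        self.values = {}      # cell -> current value
        self.dependents = {}  # cell -> cells whose formulas read it

    def set_value(self, cell, value):
        # Writing a constant replaces any formula the cell had.
        self.formulas.pop(cell, None)
        self.values[cell] = value
        self._propagate(cell)

    def set_formula(self, cell, fn, inputs):
        self.formulas[cell] = (fn, inputs)
        for i in inputs:
            self.dependents.setdefault(i, set()).add(cell)
        self._recompute(cell)

    def _recompute(self, cell):
        fn, inputs = self.formulas[cell]
        self.values[cell] = fn(*(self.values[i] for i in inputs))
        self._propagate(cell)

    def _propagate(self, cell):
        # Push the change to every cell whose formula reads this one.
        for dep in self.dependents.get(cell, ()):
            if dep in self.formulas:
                self._recompute(dep)


s = Sheet()
s.set_value("A1", 2)
s.set_value("A2", 3)
s.set_formula("B1", lambda a, b: a + b, ["A1", "A2"])  # B1 = A1 + A2
s.set_formula("C1", lambda b: b * 10, ["B1"])          # C1 = B1 * 10
s.set_value("A1", 5)  # the change ripples: B1 becomes 8, C1 becomes 80
print(s.values["B1"], s.values["C1"])  # 8 80
```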
One kind of dataflow programming is reactive programming. When this style of programming is used in a functional language, it's called functional reactive programming. An example of a functional reactive programming language for the web is Flapjax.
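The reactive idea can be illustrated with a minimal push-based `Signal` class in Python. This is not Flapjax's API (Flapjax targets JavaScript); `Signal` and `map` are hypothetical names chosen for the sketch. A derived signal is defined once, as a pure function of its source, and is updated automatically whenever the source changes.

```python
class Signal:
    """A time-varying value; derived signals update when their source changes."""

    def __init__(self, value):
        self.value = value
        self._subscribers = []

    def set(self, value):
        self.value = value
        for callback in self._subscribers:
            callback(value)

    def map(self, fn):
        # Build a new signal whose value is always fn(source value).
        derived = Signal(fn(self.value))
        self._subscribers.append(lambda v: derived.set(fn(v)))
        return derived


celsius = Signal(0)
fahrenheit = celsius.map(lambda c: c * 9 / 5 + 32)
label = fahrenheit.map(lambda f: f"{f:.1f} F")
celsius.set(100)
print(label.value)  # 212.0 F
```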
Also, anic is a dataflow language recently discussed on Hacker News.
Another example is Martlet from Oxford.
Dataflow programming languages propose isolating local behaviors in so-called "actors", which are supposed to run in parallel and exchange data through point-to-point channels. Unlike the von Neumann model of computers, there is no notion of a central memory (for either code or data).
These actors consume data tokens on their inputs and produce new data on their outputs.
This definition does not dictate how to run such programs in practice. However, the production and consumption of data need to be analyzed with care: for example, if an actor B does not consume data at the same rate as the actor A that produces it, a potentially unbounded memory (FIFO) is required between them. Other problems, such as deadlocks, can also arise.
In many cases, this analysis fails because the interleaving of the actors' internal behaviors is intractable (beyond the reach of today's formal methods).
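A rough Python sketch of two actors connected by a FIFO channel, using threads and `queue.Queue`. Bounding the channel (`maxsize=4` is an arbitrary choice) is one common way to avoid unbounded memory: a fast producer simply blocks until the slower consumer catches up. The trade-off is that blocking on full or empty channels is exactly what can deadlock a graph that contains cycles.

```python
import queue
import threading

DONE = object()  # sentinel marking the end of the token stream


def producer(out_channel, n):
    """Actor A: produces n tokens as fast as it can."""
    for i in range(n):
        out_channel.put(i)  # blocks while the channel is full
    out_channel.put(DONE)


def consumer(in_channel, results):
    """Actor B: consumes tokens one at a time and produces their squares."""
    while True:
        token = in_channel.get()  # blocks while the channel is empty
        if token is DONE:
            break
        results.append(token * token)


# Bounded FIFO: A is throttled to B's pace instead of growing memory without limit.
channel = queue.Queue(maxsize=4)
results = []
a = threading.Thread(target=producer, args=(channel, 10))
b = threading.Thread(target=consumer, args=(channel, results))
a.start()
b.start()
a.join()
b.join()
print(results)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```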
Despite these difficulties, dataflow programming languages remain attractive in many domains:
- For instance, to define reference models for video encoding: a pure C program won't do the job because it assumes that everything runs as a sequence of operations, which is not true of real hardware (pipelines, VLIW, multicores and VLSI). Maybe you could have a look at this recent PhD thesis. The CAL dataflow language is proposed as a unifying language for the next generation of reference video encoders/decoders.
- Mission-critical systems where safety is required: if you add some strong assumptions about the production and consumption of data, you get a language with strong potential for code generation, proofs, etc. (see synchronous languages).
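One well-known example of such assumptions is the synchronous dataflow (SDF) model, which is related to, but distinct from, synchronous languages such as Lustre: every actor produces and consumes a fixed number of tokens per firing. Those fixed rates let a compiler solve the balance equation r_A * p = r_B * c ahead of time and bound the buffer between actors. The Python sketch does this for a single edge from A (producing `produce` tokens per firing) to B (consuming `consume` tokens per firing); the function names are illustrative.

```python
from math import gcd


def repetitions(produce, consume):
    """Smallest firing counts (r_a, r_b) with r_a * produce == r_b * consume."""
    g = gcd(produce, consume)
    return consume // g, produce // g


def max_buffer(produce, consume):
    """Peak FIFO occupancy over one period if B fires as soon as it can."""
    r_a, _ = repetitions(produce, consume)
    tokens = peak = 0
    for _ in range(r_a):
        tokens += produce            # A fires
        peak = max(peak, tokens)
        while tokens >= consume:     # B fires whenever it has enough tokens
            tokens -= consume
    assert tokens == 0               # one full period leaves the buffer empty
    return peak


print(repetitions(2, 3), max_buffer(2, 3))  # (3, 2) 4
```

Because one period returns the buffer to its starting state, repeating the same schedule forever never needs more than the computed peak, which is the kind of static guarantee the answer above alludes to.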
Excel (and other spreadsheets) are essentially dataflow languages. Dataflow languages are a lot like functional programming languages, except that the values at the leaves of the whole program graph are not values at all, but variables (or value streams), so that when they change, the changes ripple and flow up the graph.
Mozart has support for dataflow-like synchronization, and it does have some commercial applications. You could also argue that make is a dataflow programming language.
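The `make` analogy can be sketched in Python with the standard library's `graphlib.TopologicalSorter` (Python 3.9+): targets form a dependency graph, and a change to one file flows to exactly the targets that depend on it, in dependency order. The build graph is hypothetical, and for simplicity the sketch takes an explicit set of changed files instead of comparing timestamps as make does.

```python
from graphlib import TopologicalSorter

# Hypothetical build graph, as in a Makefile: target -> prerequisites.
DEPENDS_ON = {
    "app": {"main.o", "util.o"},
    "main.o": {"main.c", "util.h"},
    "util.o": {"util.c", "util.h"},
}


def rebuild_order(depends_on, changed):
    """Targets to rebuild after the files in `changed` change, prerequisites first."""
    stale = set()
    order = []
    # static_order() yields every node only after all of its prerequisites.
    for node in TopologicalSorter(depends_on).static_order():
        if depends_on.get(node, set()) & (changed | stale):
            stale.add(node)
            order.append(node)
    return order


print(rebuild_order(DEPENDS_ON, {"main.c"}))  # ['main.o', 'app']
```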
It is actually quite an old concept: in the 1970s there was even a language and machine built for efficient dataflow programming and execution (the Manchester Dataflow Machine).
The great thing about it is its duality with lazy functional languages like Haskell. Because of this, if your processing steps are purely functional, and you have enough processing units to evaluate them and pass results around, you get maximum parallelism for free: automatically and without any programming effort!
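A small illustration of the idea in Python with `concurrent.futures`: two pure functions that share no data are submitted independently, and the combining step runs only once both results exist. The functions are placeholders. In CPython, threads do not run pure-Python computations on multiple cores at once because of the global interpreter lock; `ProcessPoolExecutor` is the usual alternative when real CPU parallelism matters.

```python
from concurrent.futures import ThreadPoolExecutor


def square(x):   # pure: depends only on its input
    return x * x


def double(x):   # pure, and independent of square
    return 2 * x


def combine(a, b):
    return a + b


with ThreadPoolExecutor() as pool:
    # square and double share no data, so they may run at the same time.
    fa = pool.submit(square, 7)
    fb = pool.submit(double, 7)
    # combine runs only once both of its inputs are available.
    result = combine(fa.result(), fb.result())

print(result)  # 63
```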
Many ETL tools are also in this realm. The dataflow tasks in MS SSIS are a good example; SSIS is a graphical tool.
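The same source, transform, destination shape can be written as a chain of Python generators, where each row flows through the stages one at a time, much like data through an SSIS data flow or a Unix pipe. The sample data and stage names are made up for illustration.

```python
import csv
import io

RAW = "name,amount\nalice,10\nbob,-3\ncarol,25\n"  # stand-in for a source file


def extract(text):
    """Source: yield one row dict at a time."""
    yield from csv.DictReader(io.StringIO(text))


def transform(rows):
    """Convert types and drop invalid rows, one row at a time."""
    for row in rows:
        amount = int(row["amount"])
        if amount >= 0:
            yield {"name": row["name"].title(), "amount": amount}


def load(rows, destination):
    """Sink: append each row to the destination."""
    for row in rows:
        destination.append(row)


table = []
load(transform(extract(RAW)), table)
print(table)  # [{'name': 'Alice', 'amount': 10}, {'name': 'Carol', 'amount': 25}]
```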
There are certain domains where dataflow programming just makes a lot more sense. Real-time media is one example, and two widely used graphical dataflow programming environments, Pure Data and Max/MSP, are both focused on real-time media programming. I suppose their visual nature also maps nicely onto dataflow programming.
You could try Cameleon (www.shinoe.org/cameleon), which seems simple to use. It's a graphical language for functional programming with a dataflow (workflow) approach.
It's written in C++, but it can call local or remote programs written in any programming language.
It has a multi-scale approach and seems to be Turing-complete (it is a Petri net extension).
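For readers unfamiliar with Petri nets, here is a minimal place/transition net simulator in Python. It models ordinary Petri nets only, not Cameleon's extension, whose details are not described here; the places and the single transition are hypothetical. A transition fires when its input places hold tokens, which is the same "fire when inputs are available" rule that drives dataflow execution.

```python
from collections import Counter

# Places hold token counts; each transition lists its input and output places.
INITIAL = {"raw": 2, "worker_free": 1, "done": 0}
TRANSITIONS = {
    "process": (["raw", "worker_free"], ["done", "worker_free"]),
}


def enabled(marking, inputs):
    # A transition may fire when every input place holds enough tokens.
    return all(marking[p] >= n for p, n in Counter(inputs).items())


def fire(marking, inputs, outputs):
    # Firing consumes one token per input arc and produces one per output arc.
    new = dict(marking)
    for p in inputs:
        new[p] -= 1
    for p in outputs:
        new[p] += 1
    return new


def simulate(marking, transitions, max_steps=100):
    # Fire the first enabled transition until none is enabled (or the step limit).
    for _ in range(max_steps):
        ready = [t for t, (ins, _) in transitions.items() if enabled(marking, ins)]
        if not ready:
            break
        ins, outs = transitions[ready[0]]
        marking = fire(marking, ins, outs)
    return marking


print(simulate(INITIAL, TRANSITIONS))  # {'raw': 0, 'worker_free': 1, 'done': 2}
```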