Gds Machines Overview

Gds Machines allow you to quickly define complex processing tasks using a graphical representation.

Business users typically define reports and requirements using flow diagrams. With Gds Machines, these diagrams can be executed by assembling building blocks together, the same way a child constructs a crane out of levers and pulleys.

(@@picture of spam filter)

Machines execute runtime decision and query processing, much like SQL in a database. A machine is built from a number of tiny, highly focused engines (themselves machines), each of which performs a single task, such as fetching values, searching for single scalars or intersecting two sets. Each machine can communicate with other machines only over a MachineWire, passing MachineDataNodes. This specialisation allows highly efficient algorithms to be implemented, and also makes multithreading and grid operation relatively easy. Machines are very much like electronic circuits, where each discrete chip performs a highly specialised function and the way in which the chips are arranged produces the overall outcome.
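The wiring model can be illustrated with a minimal sketch. The class names `DataNode`, `MachineWire` and `Machine` below are hypothetical stand-ins and are not the actual Gds API. The point is that each machine sees only the data nodes arriving on its input and emits data nodes on its output wire. It has no other knowledge of its neighbours.

```javascript
// Hypothetical sketch, not the Gds API. A data node wraps a single value.
class DataNode {
  constructor(value) {
    this.value = value;
  }
}

// A wire delivers each data node to every machine connected to its far end.
class MachineWire {
  constructor() {
    this.targets = [];
  }
  connect(machine) {
    this.targets.push(machine);
  }
  send(node) {
    for (const target of this.targets) target.receive(node);
  }
}

// A machine performs one task: it applies `step` to each incoming data node
// and forwards whatever nodes `step` returns on its output wire.
class Machine {
  constructor(step) {
    this.step = step;
    this.output = new MachineWire();
  }
  receive(node) {
    for (const out of this.step(node)) this.output.send(out);
  }
}

// Two tiny machines: one doubles numbers, one keeps only values above 5.
const doubler = new Machine((n) => [new DataNode(n.value * 2)]);
const above5 = new Machine((n) => (n.value > 5 ? [n] : []));
doubler.output.connect(above5);

// A collecting sink, used only so the result can be inspected.
const results = [];
above5.output.connect({ receive: (n) => results.push(n.value) });

[1, 2, 3, 4].forEach((v) => doubler.receive(new DataNode(v)));
// results is now [6, 8]
```

In this sketch, neither machine knows the other exists. Rearranging the wiring changes the overall outcome without touching either machine, much as rearranging chips on a circuit board would.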

Gds machines create a series of primitive steps that can be applied to data and processing. These primitive steps (or "chips") are similar to electronic integrated circuits: each chip performs a specialised task, but without much knowledge of the overall environment. These chips can be dynamically assembled into large processing machines, in much the same way as electronic components are assembled to create larger machines.

Gds machines often operate at a high level. There is a machine that reads CSV format files and produces output, others provide a range of filtering and sorting, while still others perform dictionary lookups.

Gds machines were primarily designed to allow competent but non-technical users to describe a process, such as creating a report, without having to learn a programming language. Because machines are dynamic, users can experiment with actual data and see results immediately. Because machines are high level, much of the technical detail is abstracted away, allowing reasonably simple graphical depictions of the process.

Example 1. The following machine takes an input file of "words" and checks each one against a second file, "dictionary", routing any word that is not in the dictionary to an output set. This output set is the set of "words" that are unknown.

(Diagram: GdsMachineFetch_CSV (words) and GdsMachineFetch_CSV (Dictionary) feed a Gds Machine FilterIn, whose output goes to a GdsMachineSet.)
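The logic of Example 1 can be sketched in a few lines. This shows what the filter machine computes, not how `GdsMachineFetch_CSV` or the filter machine are actually implemented. The in-memory arrays stand in for the two CSV files, and the hash-set lookup is one plausible technique, assumed here for illustration.

```javascript
// Stand-ins for the two CSV inputs (hypothetical data, one word per row).
const words = ["apple", "banana", "qwzx", "cherry", "blorp"];
const dictionary = ["apple", "banana", "cherry", "date"];

// Build a hash set from the dictionary once, so each word lookup is O(1) on average.
function unknownWords(wordList, dictionaryList) {
  const known = new Set(dictionaryList);
  // Route only the words missing from the dictionary to the output set.
  return new Set(wordList.filter((w) => !known.has(w)));
}

const unknown = unknownWords(words, dictionary);
// unknown contains "qwzx" and "blorp"
```

The dictionary is built into a set once and then probed once per word. This keeps the whole run roughly linear in the size of the two files, which may be one reason a dedicated machine of this kind can compete with a general-purpose database query.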

Example 2. The following machine might be used in real-time processing during a sale to apply a special discount. Using machines in this manner permits a virtually unlimited range of processing choices.

(Diagram: Filter, Product='Cola' -> Filter, Payment='Cash' -> Filter, Day='Tuesday' -> Continue processing, e.g. "apply discount")
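The filter chain in Example 2 can be sketched as three single-field tests applied in series. This is a minimal sketch that assumes a sale record is a plain object with `Product`, `Payment` and `Day` fields taken from the diagram. The `Price` field, the 10% discount and the function names are hypothetical.

```javascript
// Each hypothetical filter machine tests one field and either passes the record on or drops it.
function filterMachine(field, expected) {
  return (record) => record[field] === expected;
}

const discountChain = [
  filterMachine("Product", "Cola"),
  filterMachine("Payment", "Cash"),
  filterMachine("Day", "Tuesday"),
];

// A record reaches "Continue processing" only if every filter in the chain passes it.
function runChain(record, chain, onMatch) {
  if (chain.every((passes) => passes(record))) {
    onMatch(record);
  }
}

// Example downstream step: apply a discount (10% is an illustrative assumption),
// rounding the price to two decimal places.
const applyDiscount = (record) => {
  record.Price = Math.round(record.Price * 0.9 * 100) / 100;
};

const sale = { Product: "Cola", Payment: "Cash", Day: "Tuesday", Price: 2.0 };
runChain(sale, discountChain, applyDiscount);
// sale.Price is now 1.8
```

Swapping, adding or removing a filter changes the promotion without touching the other machines. This is the "unlimited range of processing choices" the example describes.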

Machines have the following characteristics:

  1. Machines are created in high-level languages to perform a specific function, such as calculating a formula.
  2. Machines are dynamically created and called via a machine executor.
  3. Operating machines can be very fast. The dictionary-searching example above will often run faster than the equivalent query in a database engine.
  4. Because each machine performs a single task with a controlled definition, input and output, machines can be tested automatically.
  5. The machine executor can abstract away network operation, so machines can run in a networked grid environment without knowing it.
  6. The machine executor can launch multiple alternative techniques to solve a problem, running them in parallel. This is especially useful for search-type queries, where different algorithms may excel with certain types of input patterns.
  7. A machine optimisation phase tries to replace common user machine designs with more optimal ones, for example replacing a "fetch"->"filter" pair with a "fetch-filter" machine where one exists.
  8. The description of machines (STEM and STEP blocks) defines the interconnections between machines, but not the realm of machines available. This allows machines to be translated to a range of different runtime languages, including JavaScript.
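The optimisation in item 7 can be sketched as a rewrite pass over a machine design. The sketch simplifies heavily: it assumes a design is a linear list of steps rather than a general wiring graph. The step shapes, the `availableChips` registry and `fuseFetchFilter` are all hypothetical.

```javascript
// Hypothetical registry of chip types available at runtime.
const availableChips = new Set(["fetch", "filter", "fetch-filter", "sort"]);

// Replace each adjacent "fetch" -> "filter" pair with a single fused
// "fetch-filter" step, but only when that fused chip is available.
function fuseFetchFilter(design, registry) {
  if (!registry.has("fetch-filter")) return design.slice();
  const result = [];
  for (let i = 0; i < design.length; i++) {
    const step = design[i];
    const next = design[i + 1];
    if (step.type === "fetch" && next && next.type === "filter") {
      result.push({ type: "fetch-filter", source: step.source, predicate: next.predicate });
      i++; // skip the filter step that was folded into the fused step
    } else {
      result.push(step);
    }
  }
  return result;
}

const design = [
  { type: "fetch", source: "products" },
  { type: "filter", predicate: "sales2012 > 0" },
  { type: "sort", key: "name" },
];
const optimised = fuseFetchFilter(design, availableChips);
// optimised.map((s) => s.type) is ["fetch-filter", "sort"]
```

The user's design is left unchanged, and the pass falls back to the original steps when no fused chip exists. The user's diagram therefore stays valid whatever optimisations the executor happens to find.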

Internals

Each operating machine has a "base" definition, which defines the actual code stream to execute, its input and output buses, and its constants. In OO programming terms, the operating machine is an instance of an object.

Machines are wired to each other via a bus. Multiple machines can share the same input bus; wiring is not limited to one-to-one links. In electronic terms, the fan-in is 1 and the fan-out is unlimited.
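The fan-in and fan-out rule can be sketched as a bus that has exactly one writer and any number of readers. `Bus`, `subscribe` and `write` are hypothetical names chosen for illustration. The sketch hands every reader the same data node object. Whether Gds copies nodes for each reader is not specified here.

```javascript
// Hypothetical bus: exactly one producer may write to it (fan-in of 1),
// and any number of consumers may subscribe (unlimited fan-out).
class Bus {
  constructor(producerName) {
    this.producer = producerName; // the single machine allowed to write
    this.subscribers = [];
  }
  subscribe(consumer) {
    this.subscribers.push(consumer);
  }
  write(writerName, node) {
    if (writerName !== this.producer) {
      throw new Error(`${writerName} cannot write to a bus owned by ${this.producer}`);
    }
    // Every subscriber receives the same data node.
    for (const consumer of this.subscribers) consumer(node);
  }
}

const bus = new Bus("fetchProducts");
const counts = { spell: 0, stats: 0 };
bus.subscribe(() => counts.spell++); // e.g. a spell-check machine
bus.subscribe(() => counts.stats++); // e.g. a statistics machine
bus.write("fetchProducts", { name: "Cola" });
// counts is { spell: 1, stats: 1 }
```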

Each bus carries "data-nodes", which are objects holding data from previous steps. Each machine is free to add, edit, delete, hold, etc. the data nodes. Individual machines can communicate only via data nodes; if some other type of communication is needed, combine the routines into a single uber-machine.

Machines do not need to buffer data or perform flow control explicitly. The machine executor selects machines for execution and manages the flow control between them.
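This division of labour can be sketched with a tiny round-robin executor that owns every input queue. `Executor`, `add`, `post` and `run` are hypothetical names. The queues here are unbounded. A real executor might instead bound them and pause upstream machines when a queue fills, and the machines themselves would not change.

```javascript
// Hypothetical executor: each machine has an input queue, and the executor,
// not the machines, decides which machine runs next.
class Executor {
  constructor() {
    this.machines = new Map(); // name -> { step, queue, outputs }
  }
  add(name, step, outputs = []) {
    this.machines.set(name, { step, queue: [], outputs });
  }
  post(name, node) {
    this.machines.get(name).queue.push(node);
  }
  // Round-robin: give each machine that has pending input one node per pass,
  // routing its results onto the input queues of its downstream machines.
  run() {
    let progressed = true;
    while (progressed) {
      progressed = false;
      for (const m of this.machines.values()) {
        if (m.queue.length === 0) continue;
        const node = m.queue.shift();
        for (const out of m.step(node)) {
          for (const target of m.outputs) this.post(target, out);
        }
        progressed = true;
      }
    }
  }
}

const ex = new Executor();
const seen = [];
ex.add("square", (n) => [n * n], ["collect"]);
ex.add("collect", (n) => {
  seen.push(n);
  return [];
});
[1, 2, 3].forEach((n) => ex.post("square", n));
ex.run();
// seen is [1, 4, 9]
```

Neither `square` nor `collect` contains any buffering or scheduling code. Each is a plain function from one data node to zero or more data nodes.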

DataNodes

Creation, stack, from machine only use DN, not external signals

Data Types

Predicates

Chips

A chip is a single purpose device that performs a tightly defined operation. In programming terms it is similar to a function or a program. A chip might exist to check an input word for correct spelling, or to calculate statistics on an array of numbers. How the chip works internally is left to the chip designer, but the interfaces are clearly defined.

Chips are primarily focused on high-level operations rather than, say, "add two numbers", but nothing stops a chip from performing any function. Chip designers should ensure that each chip does enough work that the overhead of passing datanodes in and out is not the main use of resources.

Once published, chips should be considered fixed in operation, as if they had been built from physical components. Revisions to chips are allowed but should be given a revision number. This allows systems that require high predictability to specify the exact revision they wish to use, while systems that want the latest revision will use the highest revision available at runtime. (What about different data tables, e.g. dictionaries?)
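The revision rule can be sketched as a lookup that honours a pinned revision and otherwise picks the highest one. The `catalogue` layout, the part numbers and `resolveRevision` are hypothetical. Following the part-number rule in the next paragraph, the sketch assumes that all revisions under one part number share a compatible interface.

```javascript
// Hypothetical chip catalogue: part number -> published revision numbers.
const catalogue = {
  "SPELL-CHECK": [1, 2, 3],
  "CSV-FETCH": [1],
};

// Resolve which revision to load. A pinned revision must exist exactly;
// without a pin, the highest published revision is used.
function resolveRevision(partNumber, pinnedRevision) {
  const revisions = catalogue[partNumber];
  if (!revisions || revisions.length === 0) {
    throw new Error(`Unknown chip part number: ${partNumber}`);
  }
  if (pinnedRevision !== undefined) {
    if (!revisions.includes(pinnedRevision)) {
      throw new Error(`${partNumber} has no revision ${pinnedRevision}`);
    }
    return pinnedRevision;
  }
  return Math.max(...revisions);
}

const predictable = resolveRevision("SPELL-CHECK", 2); // a system that pins revision 2
const latest = resolveRevision("SPELL-CHECK"); // a system that wants the latest
```

A pinned request for a missing revision fails loudly rather than silently falling back. That behaviour is what a system requiring high predictability would presumably want.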

If a chip interface is changed in such a way that breaks any current use then a new chip part number should be created.

Environment

An environment defines the overall parameters within which machines and chips operate. For example, a chip that fetches information from a database needs to know which database to use (in the absence of special signals selecting a specific database); this sort of information is held in the environment.
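The database example can be sketched as a chip that reads its default from the environment but lets an explicit signal override it. The `environment` object, the `signal` argument and `makeFetchChip` are hypothetical. The chip only reports which database it would query and does not connect to anything.

```javascript
// Hypothetical environment: shared settings that chips consult at run time.
const environment = {
  database: "sales_prod",
};

// A fetch chip uses the environment's database unless an explicit signal
// selects a different one.
function makeFetchChip(env) {
  return function fetchChip(query, signal = {}) {
    const database = signal.database ?? env.database;
    // A real chip would run the query here; this sketch just reports its target.
    return { database, query };
  };
}

const fetchProducts = makeFetchChip(environment);
const fromDefault = fetchProducts("SELECT name FROM products");
// fromDefault.database is "sales_prod"
```

Because the chip is bound to an environment rather than to a database name, the same machine can be moved between, say, test and production environments unchanged.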

Machines

A machine is a collection of one or more chips assembled in a structure to do something. Users create machines by piecing together chips, and machines run the overall process. A machine is similar to a program. Machines exist inside an environment, and an environment may run multiple machines at once.

For example, to spell check all product names in a database, I might create a machine consisting of a "fetch all product names" chip connected to a "check spelling" chip. This collection of two chips is now a machine capable of processing.

A machine can itself be packaged with fixed inputs and outputs to become a chip. Users then see only the chip, and the complexity inside it is hidden. For example, a business user may create a machine "Significant sales in 2012" and package it as a chip for other users. Those users then no longer need to know how significant sales are determined, merely that this chip will process or provide them in some fashion.
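Packaging can be sketched as function composition. In this sketch, a machine is a sequence of chips, and the assembled machine is itself used as a chip. To keep the sketch short, each chip processes a whole array of records rather than a stream of data nodes. The chip names and the 10,000 threshold are hypothetical.

```javascript
// Hypothetical chips: each takes an array of sale records and returns an array.
const onlyYear = (year) => (rows) => rows.filter((r) => r.year === year);
const aboveAmount = (min) => (rows) => rows.filter((r) => r.amount >= min);
const countOf = (rows) => [{ count: rows.length }];

// A machine runs its chips in order, feeding each chip's output to the next.
function machine(...chips) {
  return (rows) => chips.reduce((data, chip) => chip(data), rows);
}

// Packaging: the assembled machine is exposed as a single chip. Its users see
// only its input and output, not the chips inside.
const significantSales2012 = machine(onlyYear(2012), aboveAmount(10000));

// The packaged chip can then be wired into a larger machine like any other chip.
const countSignificant2012 = machine(significantSales2012, countOf);
```

Usage, with sample data: `countSignificant2012([{ year: 2012, amount: 15000 }, { year: 2011, amount: 20000 }])` returns `[{ count: 1 }]`. Only the first record is from 2012 with an amount of at least 10,000.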

Optimisation

TBS. Optimisation is performed externally by the machine executor. It may rearrange the defined wiring, change the processing order or make other changes. Chips can provide signals to aid optimisation but do not explicitly communicate with each other for this purpose. (Exception: some chips, such as database query chips, have IO buses designed to communicate with each other, but technically these are chips being wired into the machine.)

IPO vs Events

Chips primarily follow the Input-Process-Output (IPO) model. Chips are not currently designed to work at the user interface layer, which is not the primary or initial focus. Some chips, however, can generate output without any input, such as a timer chip.
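The difference between an IPO chip and a source chip such as a timer can be sketched as follows. Both function names and the `emit`/`done` callback style are hypothetical. The timer is limited to a fixed number of ticks so that the sketch terminates.

```javascript
// Hypothetical IPO chip: produces output only in response to input.
function upperCaseChip(input, emit) {
  emit(input.toUpperCase());
}

// Hypothetical source chip: produces output with no input at all. Here it is
// a timer that emits a tick count every `intervalMs`, stopping after `maxTicks`.
function timerChip(intervalMs, maxTicks, emit, done) {
  let ticks = 0;
  const handle = setInterval(() => {
    ticks++;
    emit(ticks);
    if (ticks === maxTicks) {
      clearInterval(handle);
      done();
    }
  }, intervalMs);
}
```

Usage: `upperCaseChip("cola", console.log)` prints `COLA` once, because it was given one input. `timerChip(1000, 3, console.log, () => {})` prints `1`, `2` and `3` about a second apart with no input at all.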

Distributed Processing

TBS. Chips should not assume that the chips sending them data nodes, or receiving data nodes from them, are resident locally; the machine executor may run chips in completely different environments.
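One way to make chips indifferent to location is to treat data nodes as plain data that survives serialisation. This minimal sketch assumes JSON as the wire format. That is an illustrative choice, not a documented Gds format, and plain JSON cannot carry values such as functions or class instances. `encodeNode`, `decodeNode` and `orderTotalChip` are hypothetical.

```javascript
// Hypothetical helpers: data nodes are plain data, so the executor can ship
// them to a chip running in another environment.
function encodeNode(node) {
  return JSON.stringify(node);
}

function decodeNode(text) {
  return JSON.parse(text);
}

// A chip written this way depends only on the node's contents, never on
// object identity, so it behaves the same whether the node was produced
// locally or received over the network.
function orderTotalChip(node) {
  return { ...node, totalCents: node.quantity * node.unitPriceCents };
}

const order = { product: "Cola", quantity: 3, unitPriceCents: 150 };
const local = orderTotalChip(order);
const remote = orderTotalChip(decodeNode(encodeNode(order)));
// local and remote both have totalCents 450
```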

Javascript

Machines can be defined in JavaScript inside a browser environment. This allows JavaScript functions to access server-side chips easily.

Example (stylised)

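	// Stylised: Machine, ChipListProducts and ChipCheckHaveSales are assumed
	// to be provided by the Gds JavaScript library.
	const m = new Machine();
	const p = new ChipListProducts(m);    // source chip: lists all products
	const s = new ChipCheckHaveSales(m);  // passes on products with sales in the required year
	s.SetRequiredYear(2012);
	p.WireOutput(s);                      // products flow from p into s

	// Sync processing: run the machine, then read s's output until EOF
	m.RunSync();
	while (!s.IsEOF()) {
		const record = s.GetOutput();
		// Process a single product record, which has sales in 2012
	}

	// Or async processing: SingleRecordCallbackFunction is called once per output record
	s.WireOutput(SingleRecordCallbackFunction);
	m.RunAsync();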

English Machines

TBS. Machines can be defined using the Chip (...) to convert English sentences and phrases into machines that can run inside an environment.

Samples: