GDS Technical Overview


GDS is a high-speed storage and retrieval engine, coupled with strict interface protocols and communications systems. It is ideally suited to rapid SQL-like processing in a distributed environment, and to serving reference data for web servers. Data can be fetched both directly, as in a classic database, and via "fuzzy" techniques that tolerate considerable variation in input formats. GDS also implements a system of universal domains, allowing standard databases and tables to "join" to large reference databases.

Performance: A GDS server predominantly uses a column-structured database internally, and uses row-structured and other techniques where they are more appropriate. The column store technique has been around for many years, but it is better suited to analytical processing than to classic OLTP workloads. GDS workloads are mostly read-only, so GDS servers use column processing techniques to achieve faster query response times than conventional databases. GDS also uses specialised query handlers for special-case data access, allowing highly customised algorithms where required.
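To illustrate the difference between the two layouts, here is a minimal Python sketch (not GDS code) that stores the same invented three-column table once as rows and once as columns. A query that filters on one column and sums another only has to touch those two column arrays in the columnar form. Python lists do not reproduce the cache and memory-bandwidth benefits that a real column store gets from contiguous storage; the example shows the data layout only.

```python
# Invented example table: (name, city, amount).
ROWS = [
    ("Smith", "London", 120),
    ("Jones", "Leeds", 75),
    ("Brown", "London", 300),
]

def row_store_total(rows, city):
    """Row layout: every whole record is visited to read two of its fields."""
    return sum(amount for _name, c, amount in rows if c == city)

def to_columns(rows):
    """Pivot row tuples into one list per column (a column store layout)."""
    names, cities, amounts = (list(col) for col in zip(*rows))
    return {"name": names, "city": cities, "amount": amounts}

def column_store_total(columns, city):
    """Column layout: only the 'city' and 'amount' arrays are scanned."""
    return sum(a for c, a in zip(columns["city"], columns["amount"]) if c == city)

cols = to_columns(ROWS)
print(row_store_total(ROWS, "London"))     # 420
print(column_store_total(cols, "London"))  # 420
```

Both functions give the same answer; the columnar version never reads the `name` column at all, which is where a column store saves memory traffic on wide tables.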

Multiple Databases: You can connect multiple databases and other data sources, each drawing from a different area of the business. Queries run transparently across multiple databases, either by going back to the source database or by being processed locally.
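As a hypothetical illustration of querying across databases, the sketch below uses two in-memory SQLite databases to stand in for separate business data sources. The aggregate is pushed back to each source database and the partial results are combined locally. The table and column names are invented, and this is not how GDS itself is configured.

```python
import sqlite3

def make_source(rows):
    """Create an in-memory database standing in for one business data source."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    return conn

def federated_total(sources, region):
    """Push the aggregate down to each source, then combine the partials locally."""
    total = 0
    for conn in sources:
        (partial,) = conn.execute(
            "SELECT COALESCE(SUM(amount), 0) FROM sales WHERE region = ?",
            (region,),
        ).fetchone()
        total += partial
    return total

north_office = make_source([("North", 10), ("South", 5)])
south_office = make_source([("North", 7), ("South", 20)])
print(federated_total([north_office, south_office], "North"))  # 17
```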

Grid Processing: To maximise throughput, GDS implements LAN-based grid processing. This allows you to use idle CPU and unused storage on other systems within your network. The grid client is designed to co-exist with other users on the node. Grid client programs operate within a simple framework and need not concern themselves with the technical details of GDS; they simply process small fragments of information. Customers can create their own grid clients if desired.
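The fragment-based model can be sketched as follows. This is a hypothetical illustration, not the GDS grid client API: `make_fragments`, `grid_client` and `run_job` are invented names, and local threads stand in for idle machines on the LAN.

```python
from concurrent.futures import ThreadPoolExecutor

def make_fragments(values, size):
    """Split a large job into small, independent fragments."""
    return [values[i:i + size] for i in range(0, len(values), size)]

def grid_client(fragment):
    """Hypothetical grid client: it sees only one fragment and returns a
    partial result, knowing nothing else about the server."""
    return sum(v * v for v in fragment)

def run_job(values, fragment_size=4, workers=3):
    fragments = make_fragments(values, fragment_size)
    # Threads stand in for idle machines on the LAN.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(grid_client, fragments))
    # The server merges the partial results.
    return sum(partials)

print(run_job(list(range(10))))  # 285
```

Because each fragment is independent, a client can be as simple as a single function, which is what lets customers write their own.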

Dynamic Data Retrieval: Using the internet as a database, a GDS server connects to other trusted GDS servers and downloads required data on demand. The data is pulled to your local server, which improves query performance and also means that the exact details of your queries remain private (your data is not sent outside your servers). This allows users to extend reports beyond the boundaries of their own data to include geographic, demographic or other reference data.
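A minimal sketch of pulling reference data to the local server, assuming a hypothetical remote dataset fetched by name (`uk_postcodes`, its contents and `fetch_from_trusted_server` are invented). It shows the privacy property described above: only the dataset name leaves the local server, while the query predicate is evaluated locally. The real GDS transfer mechanism is not described in this overview.

```python
# Invented stand-in for data held on a trusted remote GDS server.
REMOTE_DATASETS = {
    "uk_postcodes": [
        {"postcode": "LS1", "region": "Yorkshire"},
        {"postcode": "EC1", "region": "London"},
        {"postcode": "M1", "region": "North West"},
    ],
}

def fetch_from_trusted_server(name):
    """Hypothetical stand-in for a network download of a whole dataset."""
    return [dict(row) for row in REMOTE_DATASETS[name]]

class LocalReferenceCache:
    def __init__(self, fetch):
        self._fetch = fetch          # callable: dataset name -> list of rows
        self._cache = {}
        self.remote_requests = []    # everything the remote server was told

    def dataset(self, name):
        if name not in self._cache:
            self.remote_requests.append(name)  # only the dataset name leaves
            self._cache[name] = self._fetch(name)
        return self._cache[name]

    def query(self, name, predicate):
        # The filter runs locally, so the remote server never sees it.
        return [row for row in self.dataset(name) if predicate(row)]

cache = LocalReferenceCache(fetch_from_trusted_server)
london = cache.query("uk_postcodes", lambda r: r["region"] == "London")
cache.query("uk_postcodes", lambda r: r["postcode"].startswith("M"))
print(london)                 # [{'postcode': 'EC1', 'region': 'London'}]
print(cache.remote_requests)  # ['uk_postcodes']
```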

Universal Domains: Data in database columns often refers to a specific "domain" of knowledge. By tagging a field as belonging to a universal domain, you allow GDS to join it to external data via those values. Universal domain matching is flexible enough that you can even join via domains such as "person name".
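The sketch below shows one way a join on domain-tagged columns could work. The domain tags, the `person_name` normaliser and `domain_join` are assumptions made for illustration; the actual GDS matching rules are not given here and are likely more sophisticated than lower-casing names and ignoring word order.

```python
import re

# Hypothetical domain tags: column name -> universal domain.
ORDERS_DOMAINS = {"customer": "person_name"}
REFERENCE_DOMAINS = {"full_name": "person_name"}

def normalise_person_name(value):
    """Comparable key for a name: lower case, punctuation dropped,
    word order ignored (so 'Smith, John' matches 'JOHN SMITH')."""
    tokens = re.findall(r"[a-z]+", value.lower())
    return " ".join(sorted(tokens))

NORMALISERS = {"person_name": normalise_person_name}

def domain_join(left, left_domains, right, right_domains, domain):
    """Join two tables on whichever columns are tagged with `domain`."""
    left_col = next(c for c, d in left_domains.items() if d == domain)
    right_col = next(c for c, d in right_domains.items() if d == domain)
    norm = NORMALISERS[domain]
    index = {}
    for right_row in right:
        index.setdefault(norm(right_row[right_col]), []).append(right_row)
    return [
        {**left_row, **right_row}
        for left_row in left
        for right_row in index.get(norm(left_row[left_col]), [])
    ]

orders = [{"order": 1, "customer": "Smith, John"},
          {"order": 2, "customer": "Ann Lee"}]
people = [{"full_name": "JOHN SMITH", "town": "Leeds"}]
print(domain_join(orders, ORDERS_DOMAINS, people, REFERENCE_DOMAINS, "person_name"))
# [{'order': 1, 'customer': 'Smith, John', 'full_name': 'JOHN SMITH', 'town': 'Leeds'}]
```

Neither table needs to know the other's column names; the shared domain tag is what connects them.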

AJAX Friendly: GDS includes a number of features specifically to enable simple browser interaction, including XML for record data exchange, JavaScript objects, automatic tools and many examples.

Lightweight Protocols: A key requirement is for GDS to communicate with embedded controllers and small computing systems. The base data communication protocol is called GNAP, a fixed binary protocol that can be decoded rapidly on microcontrollers. Binary protocols do not suit everyone, so GNAP can easily be converted on the fly into highly structured XML for browsers.
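GNAP's actual wire format is not given in this overview, so the sketch below uses an invented 12-byte frame (message type, field id, record id, value) purely to show why a fixed binary layout is cheap to decode, and how a frame could be re-expressed as XML for a browser. Neither the layout nor the XML element names should be taken as GNAP's.

```python
import struct
import xml.etree.ElementTree as ET

# HYPOTHETICAL frame layout, not the real GNAP format.
# Big-endian: uint16 message type, uint16 field id, int32 record id, int32 value.
FRAME = struct.Struct(">HHii")

def encode(msg_type, field_id, record_id, value):
    return FRAME.pack(msg_type, field_id, record_id, value)

def decode(frame):
    # Fixed offsets mean decoding is a single unpack with no parsing,
    # which is what makes this kind of protocol suit microcontrollers.
    msg_type, field_id, record_id, value = FRAME.unpack(frame)
    return {"type": msg_type, "field": field_id, "record": record_id, "value": value}

def to_xml(frame):
    """Convert a binary frame into structured XML for a browser."""
    msg = decode(frame)
    root = ET.Element("gnap", type=str(msg["type"]))
    record = ET.SubElement(root, "record", id=str(msg["record"]))
    field = ET.SubElement(record, "field", id=str(msg["field"]))
    field.text = str(msg["value"])
    return ET.tostring(root, encoding="unicode")

frame = encode(1, 7, 42, -5)
print(len(frame))     # 12
print(to_xml(frame))  # <gnap type="1"><record id="42"><field id="7">-5</field></record></gnap>
```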

Minimal DBA Effort: GDS is predominantly a self-tuning system. It is designed to be dropped in, run and forgotten. There are some knobs and levers, but mostly it adapts to the workload, the environment and the type of data being processed. For example, the SQL command "create index" is typically treated as a hint to GDS about expected workload patterns (indexes are not always necessary in column store databases anyway).

What isn't GDS?

GDS is not a general-purpose database. It is optimised for data analysis and retrieval. While it does offer high-performance read/write access, you should continue to use classic OLTP databases for transactional workloads. The read/write database in GDS is better suited to distributed, eventually consistent data.

Environmental Impact

GDS servers use multiple threads and are designed to use all available resources when needed. A GDS process can easily use 100% of several cores for extended periods when it is precalculating or optimising resources. These threads run at a lower internal priority and should not impact normal system operation.

GDS runs slightly faster on non-virtual platforms, but it is still a good candidate for virtual servers, particularly where the virtual host is a much larger computer than would normally be dedicated to it. A GDS service will often sit idle for long periods.

When resolving some queries, GDS will sometimes start several different techniques and "drag race" them to see which is quicker.
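A drag race can be sketched with Python's `concurrent.futures`: submit every candidate technique, wait for the first to complete and use its answer. The two techniques here are toy stand-ins invented for the example, and the sleep simply makes one of them slow; this is not how GDS chooses or cancels its query techniques.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED

def full_scan(data, target):
    time.sleep(0.2)  # artificial delay standing in for a slow technique
    return "full_scan", target in data

def hash_lookup(data, target):
    return "hash_lookup", target in set(data)

def drag_race(techniques, *args):
    """Start every technique at once and return the first answer to arrive."""
    pool = ThreadPoolExecutor(max_workers=len(techniques))
    futures = [pool.submit(t, *args) for t in techniques]
    done, pending = wait(futures, return_when=FIRST_COMPLETED)
    for f in pending:
        f.cancel()  # only prevents work that has not started yet
    # Running Python threads cannot be killed, so a losing technique finishes
    # in the background; a real engine would need cooperative cancellation.
    pool.shutdown(wait=False)
    return next(iter(done)).result()

print(drag_race([full_scan, hash_lookup], list(range(1000)), 999))
# ('hash_lookup', True)
```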

To get maximum performance from GDS servers:

Select processors with large L1 and L2 caches.
Select systems with high main memory bandwidth. Gds often scans large blocks of memory and main memory bandwidth is often a resource constraint.
On virtual servers, ensure that the advertised number of cores is equal to or less than the actual number. GDS detects the number of cores and limits its CPU-bound activity to that number.
On virtual servers, ensure that the advertised CPU type is no newer than the actual CPU. GDS selects code paths based on the processor architecture, so this prevents GDS from choosing instructions that would only be emulated anyway.
Disk I/O paths should have low latency.
Network adapters should be fast enough for the load. If you are using grid processing, you may want to ensure that network segments are separated, because GDS sends UDP unicast and broadcast messages to cooperating clients.
Raw CPU speed is important, but memory bandwidth will often have more impact.
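The core-limiting behaviour in the list above can be sketched in Python as follows. This is an illustration, not GDS's own detection code, and `usable_cores` and `cpu_bound_workers` are hypothetical names. It also shows why an over-advertised core count on a virtual server matters: the worker limit can only be as accurate as the count the guest operating system reports.

```python
import os

def usable_cores():
    """Cores this process may run on. os.sched_getaffinity (Linux only)
    respects CPU pinning; other platforms fall back to os.cpu_count()."""
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    return os.cpu_count() or 1

def cpu_bound_workers(requested):
    """Never schedule more CPU-bound workers than there are cores,
    but always allow at least one."""
    return max(1, min(requested, usable_cores()))

print(cpu_bound_workers(64))  # at most the number of usable cores
```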