OneTick Map-Reduce is a Hadoop-based solution that combines OneTick’s analytical engine with the MapReduce computational model to perform distributed computations over large volumes of financial tick data. As a distributed tick data management system, OneTick’s internal architecture supports databases that are spread across multiple physical machines. This architecture, designed for distributed parallel processing, improves query performance because the typical OneTick query is easily parallelizable at logical boundaries (e.g. running the same query analytics across a large symbol universe), and each part can be processed on a separate physical machine.
On April 26, 2016, we had a very successful webinar on how OneTick’s large collection of built-in analytical functions and query design can easily leverage the Hadoop middleware framework for large-scale parallel processing. You can watch the recording at this link or click on the image:
OneTick Map-Reduce dynamically distributes data (stored in OneTick historical archives) and computation across the nodes of a cluster, using a combination of a distributed file system (HDFS) and the MapReduce computational framework.
- OneTick archives are stored on a distributed file system (e.g. HDFS with Amazon S3 as a backup). The distributed file system serves as an abstraction layer that provides shared access, while physically the data resides on different nodes of the cluster. It is also responsible for balancing disk utilization and minimizing network bandwidth usage.
- Hadoop’s MapReduce daemons are responsible for distributing the query across the nodes of the cluster, taking into account the locality of the queried data.
- The distributed OneTick query is an analytical process that semantically defines a user’s business function. OneTick query analytics are designed specifically for that purpose.
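To make the idea of locality-aware distribution more concrete, here is a minimal Python sketch. It is not Hadoop's actual scheduler or any OneTick API. The function `assign_tasks`, the partition names and the node names are all hypothetical, and the greedy "least loaded replica holder" rule is a simplifying assumption used only to illustrate sending the computation to where the data already lives.

```python
from collections import defaultdict


def assign_tasks(block_locations):
    """Assign each data block's query task to a node that already stores it.

    block_locations maps a block id to the list of nodes holding a replica.
    Returns a dict mapping each chosen node to the blocks it will process.
    (Hypothetical helper: a greedy stand-in for a real locality-aware scheduler.)
    """
    load = defaultdict(int)          # number of tasks assigned per node so far
    assignment = defaultdict(list)
    for block, replicas in sorted(block_locations.items()):
        # Only nodes holding a replica are candidates, so no data is moved.
        # Among them, pick the least loaded one (ties broken by node name).
        node = min(replicas, key=lambda n: (load[n], n))
        assignment[node].append(block)
        load[node] += 1
    return dict(assignment)


# Hypothetical layout: archive partitions (symbol/date) and their replica nodes.
BLOCKS = {
    "AAPL/20160425": ["node1", "node2"],
    "AAPL/20160426": ["node2", "node3"],
    "MSFT/20160425": ["node1", "node3"],
    "MSFT/20160426": ["node1", "node2"],
}

PLAN = assign_tasks(BLOCKS)
for node, blocks in sorted(PLAN.items()):
    print(node, blocks)
```

In this toy layout every task runs on a node that holds the partition it reads, which is the property the MapReduce daemons aim for when they consider data locality.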
OneTick provides a large collection of built-in analytical functions that are applied to streams of historical or real-time data. These functions, referred to as Event Processors (EPs), are a set of business and generic processors that are semantically assembled in a query and ultimately define the logical, time series result set of that query. Event Processors include aggregations, filters, transformers, joins and unions, statistical and finance-specific functions, order book management, sorting and ranking, and input and output functions.
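As a rough illustration of how processors can be assembled into a query, the sketch below chains a filter, a transformer and an aggregation over a small stream of ticks using plain Python generators. This is not the OneTick API: the `Tick` record, the function names and the sample data are all assumptions made for the example.

```python
from collections import namedtuple

# Hypothetical tick record; real OneTick ticks carry a richer schema.
Tick = namedtuple("Tick", "time symbol price size")


def filter_ep(ticks, predicate):
    """Filter: pass through only the ticks that satisfy the predicate."""
    for tick in ticks:
        if predicate(tick):
            yield tick


def notional_ep(ticks):
    """Transformer: pair each tick with its notional value (price * size)."""
    for tick in ticks:
        yield tick, tick.price * tick.size


def vwap_ep(ticks_with_notional):
    """Aggregation: volume-weighted average price over the whole stream."""
    notional = 0.0
    volume = 0
    for tick, value in ticks_with_notional:
        notional += value
        volume += tick.size
    return notional / volume if volume else None


TICKS = [
    Tick(1, "AAPL", 100.0, 200),
    Tick(2, "AAPL", 101.0, 50),   # dropped by the size filter below
    Tick(3, "AAPL", 102.0, 300),
]

# Assemble the "query": filter -> transformer -> aggregation.
large_trades = filter_ep(TICKS, lambda t: t.size >= 100)
VWAP = vwap_ep(notional_ep(large_trades))
print(VWAP)
```

Each stage consumes the stream produced by the previous one, so the order in which the processors are wired together is what defines the result of the query.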
The OneTick Map-Reduce design allows easy switching between different data representation and job dispatching models, with support for both an internal model and an external model. Users define their “map” and “reduce” operations in this restricted computational model, and the framework takes care of the parallelization.
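The division of labor between user-defined operations and the framework can be sketched in a few lines of Python. This is only a toy, single-machine stand-in under stated assumptions: `map_partition`, `reduce_symbol` and `run_job` are hypothetical names, a thread pool plays the role of the cluster, and the sample partitions are invented. The user writes only the map and reduce functions, while `run_job` handles the parallel execution and the grouping by key.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_partition(ticks):
    """Map (user-defined): per-symbol (volume, high price) for one partition."""
    partial = {}
    for symbol, price, size in ticks:
        volume, high = partial.get(symbol, (0, float("-inf")))
        partial[symbol] = (volume + size, max(high, price))
    return list(partial.items())


def reduce_symbol(partials):
    """Reduce (user-defined): merge the partial results for one symbol."""
    return (sum(v for v, _ in partials), max(h for _, h in partials))


def run_job(partitions):
    """Toy framework: run maps in parallel, group by symbol, then reduce."""
    with ThreadPoolExecutor() as pool:
        mapped = list(pool.map(map_partition, partitions))
    grouped = defaultdict(list)      # the "shuffle" step: group by key
    for pairs in mapped:
        for symbol, partial in pairs:
            grouped[symbol].append(partial)
    return {symbol: reduce_symbol(p) for symbol, p in grouped.items()}


# Hypothetical partitions, e.g. one per trading day, as (symbol, price, size).
PARTITIONS = [
    [("AAPL", 104.0, 100), ("MSFT", 51.5, 200), ("AAPL", 105.0, 50)],
    [("AAPL", 103.5, 300), ("MSFT", 52.0, 100)],
]

RESULT = run_job(PARTITIONS)
print(RESULT)
```

The map output is deliberately a mergeable partial (a running volume and a running high) rather than a final answer, so the reduce step can combine results from any number of partitions without seeing the raw ticks again.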
Once again, thanks for reading.
For an occasional opinion or commentary on technology in Capital Markets, you can follow me on Twitter, here.