Currently there are no plans to change the primary storage to an SQL DBMS. However one can post-process and import the cleaned tick data into SQL DBMS similar to how the 5min and daily tick data is generated from the file-based DBMS (tree/hold/(updater,ibupdater)). This is the approach I would recommend so long as you don't need as "real-time" access to the data from the data-feed.
regards,
-George
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
We considered and rejected storing the tick data in a SQL database when we first designed TREE, and I still think that was the right decision. Standard SQL just isn't very well suited for this problem. A SQL database won't have any tick-specific storage optimization --- if you store each tick as a separate row, it's going to be *at least* a factor of ten larger than the TREE files, probably a lot more depending on how much row overhead your preferred DB has. (You could buy back some of that by aggregating multiple ticks per row, assuming that your DB has a suitable array type, but that just makes the manipulations even more awkward.) And the real problem is that SQL just isn't designed to provide the operations you need. Or at least it isn't designed to do what we wanted TREE to do. The basic requirement for the automated trading work we were doing was "give me the ticks for symbol X, in time order, from time T1 to time T2". A standard SQL database can only do that if it has a suitable index and can scan the index in parallel with the table data. So that's even more disk space consumed, and even more I/O (nonsequential I/O at that) to do the basic task. Contrarily, what SQL can do well (eg, update individual rows) is mostly not very
useful for tick data.
Now it might be that for whatever you want to do, SQL actually offers some useful processing/analysis leverage. But you had better be perfectly clear on what it is that you are gaining in exchange for the space and time overheads you'll be adding.
I've got nothing against SQL: we used a SQL database in combination with the tick database when we were doing TREE, and I've spent my time since then working full-time on a SQL database engine. But IMHO it's not the right tool for this job.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Was wondering if you plan to allow for the use of a sql db rather than a file based db?
Currently there are no plans to change the primary storage to an SQL DBMS. However one can post-process and import the cleaned tick data into SQL DBMS similar to how the 5min and daily tick data is generated from the file-based DBMS (tree/hold/(updater,ibupdater)). This is the approach I would recommend so long as you don't need as "real-time" access to the data from the data-feed.
regards,
-George
We considered and rejected storing the tick data in a SQL database when we first designed TREE, and I still think that was the right decision. Standard SQL just isn't very well suited for this problem. A SQL database won't have any tick-specific storage optimization --- if you store each tick as a separate row, it's going to be *at least* a factor of ten larger than the TREE files, probably a lot more depending on how much row overhead your preferred DB has. (You could buy back some of that by aggregating multiple ticks per row, assuming that your DB has a suitable array type, but that just makes the manipulations even more awkward.) And the real problem is that SQL just isn't designed to provide the operations you need. Or at least it isn't designed to do what we wanted TREE to do. The basic requirement for the automated trading work we were doing was "give me the ticks for symbol X, in time order, from time T1 to time T2". A standard SQL database can only do that if it has a suitable index and can scan the index in parallel with the table data. So that's even more disk space consumed, and even more I/O (nonsequential I/O at that) to do the basic task. Contrarily, what SQL can do well (eg, update individual rows) is mostly not very
useful for tick data.
Now it might be that for whatever you want to do, SQL actually offers some useful processing/analysis leverage. But you had better be perfectly clear on what it is that you are gaining in exchange for the space and time overheads you'll be adding.
I've got nothing against SQL: we used a SQL database in combination with the tick database when we were doing TREE, and I've spent my time since then working full-time on a SQL database engine. But IMHO it's not the right tool for this job.