Home Log in to Edit

Authors:

Attachments

test_0.9237413925999745.jpg (34896 bytes)

PgBsh, PgNetted, PgStocked

PgStocked is Quote and Static Data server for financial instruments written in Postgres PLSQL, PLJava and Java. PgStocked consists of three modules:

PgBsh: a scripting engine for Java code, based on BeanShell2
PgNetted: a fully featured web scrapper with scheduler and async process queues, based on HtmlUnit and PgAgent
PgStocked: the quote and static data server, integrating a very small web server and talib

PgBsh

PgBsh stands for Postgres Beanshell 2, which is designed for a standalone use as well and lives like the other modules in his own schema. PgBsh is the very base of PgNetted and is responsible for data mining. With PgBsh you are able to run interpreted Java (not Java Script) code inside your Postgres Database. This opens your world to use any 3rd party Java library inside your database without doing any compile/create action. Check out this example:

postgres=# select (quick_eval).stdout from quick_eval(
postgres(# $$
postgres$#   import com.tictactec.ta.lib.*;
postgres$#
postgres$#   static void foo(){
postgres$#     print("Hello talib World: " + (new Core()).toString());
postgres$#   }
postgres$#
postgres$#   foo();
postgres$# $$);
                         stdout
--------------------------------------------------------
 Hello talib World: com.tictactec.ta.lib.Core@e235e9d\r+

PgNetted

As we have seen at PgBsh we can use interpreted Java code using any jar we want. The PgNetted web scrapper uses HtmlUnit and offers you a scriptable fully featured web browser including java script. So the scrapper code itself is in Java syntax but is called from within your database, stored as plain text in a table.

The classic finance.yahoo eod data scrapper looks like this:

select install_scrapper('yahoo_eod', $$
import java.sql.Timestamp;
import com.gargoylesoftware.htmlunit.*;
import com.gargoylesoftware.htmlunit.html.*;
import net.sourceforge.pgnetted.*;

Timestamp start = Timestamp.valueOf(SHARE.getOrArg("job_start_run", "1900-01-01 00:00:00").replaceAll("\\+\\d\\d$",""));
Timestamp now = new Timestamp( Calendar.getInstance().getTime().getTime() );
String symbol = SHARE.getOrArg("symbol", "MAN.DE");
String startDay = "" + start.getDate();
String startMonth = "" + start.getMonth();
String startYear = "" + (start.getYear() + 1900);
String endDay = "" + now.getDate();
String endMonth = "" + now.getMonth();
String endYear = "" + (now.getYear() + 1900);

String link = "http://ichart.finance.yahoo.com/table.csv"
  + "?s=" + symbol
  + "&a=" + startMonth
  + "&b=" + startDay  
  + "&c=" + startYear  
  + "&d=" + endMonth 
  + "&e=" + endDay 
  + "&f=" + endYear  
  + "&g=d&ignore=.csv";

WebClient wc = WebFactory.getWebClient(true);
TextPage page = wc.getPage(link);
String[] lines = page.getContent().split("\\n");
wc.closeAllWindows();

for (int i=0; i<lines.length; i++) {
  String[] columns = lines[i].split(",");
  SHARE.put(columns[0], columns);
}
$$, hstore('symbol', 'MAN.DE')); -- pass a default for missing parameter

-- execute scrapper:
select scrap4('yahoo_eod', 'symbol=>^DJI, 'job_start_run=>2011-01-01 00:00:00'::hstore, null);

If yahoo changes its URL, just change the line in the script an you are done. This makes scrapper maintenance pretty easy. A simple SQL script will fix that within a second, no compiler needed.

Since we are using Java, Beanshell and HtmlUnit any complex web scrapper is imagine able. In example filling forms or reading java script streams (real time data). Using data from different sources like files or JDBC data sources is also possible with ease.

By using PgAgent it is easy to schedule regular scrappers. And with the implemented process queue it is possible to do asynchronous (scrapper) tasks in parallel.

Once scrapping data is finished a trigger get fired and executes so called _subscribers _ (asynchronous without any affect to the current transaction). Subscribers are responsible to load data into PgStocked or reschedule other scrapping tasks.

PgStocked

Within PgStocked your data gets organised and prepared to data mine. The goal is not to simply store stupid eod data, the goal is to store anything! Since anything is hard do declare the data structure have to be flexible enough to do so. Lets imagine the following situation: You are developing a trading strategy based on candle sticks. So all you need is simple stupid bar data (open, high, low, close, volume). But now you want to use your strategy only on companies with a gaining EBITDA. What can you do? With PgStocked you do not have to care about data structure just organise a feed (scrapper) for your needed data and feed it. On another day you hear about using tweets to predict stock prices. You think this would be interesting and you would like to give it a try. What can you do? Prototyping in Excel? No, just organise a feed and propagate it to your PgStocked.

Once all the needed data is collected and stored, of course we want to do something cool with them :-) In PgStocked you can – as you could with PgNetted – use the Beanshell scripting engine. At this time an easy access to a Talib implementation is available too. Now with Java an Talib we can do pretty nice things like for example calculating am simple 20 day moving average over the prive movement (todays prince – yesterdays price):

set client_encoding to 'WIN1252';
select pgstocked.calc_indicators('`', '´'
 ,'select dte, close from pgstocked.quotes where fi_id = 3133 order by dte desc, tme desc'
 ,'New ("cool"); cool=Talib.calc("sma", `close(0)-close(-1)´, 20).get(0);
   Return("dte", "close", "cool");'
);
select * from tmp_resultset;

id  |         cool         |    dte     | close
----+----------------------+------------+-------
  0 | -0.00750000000000001 | 2011-10-07 |  1.66
  1 | -0.00700000000000001 | 2011-10-06 |  1.66
  2 |                -0.01 | 2011-10-05 |  1.65
  3 |                -0.01 | 2011-10-04 |  1.65
  4 | -0.00800000000000001 | 2011-10-03 |   1.7
  5 | -0.00750000000000001 | 2011-09-30 |   1.7
  6 | -0.00700000000000001 | 2011-09-29 |   1.7
  7 |                    0 | 2011-09-28 |  1.75
  8 |               0.0015 | 2011-09-27 |  1.77
  9 |               0.0025 | 2011-09-26 |   1.8
 10 |              -0.0025 | 2011-09-23 |  1.75
 11 |               0.0015 | 2011-09-22 |  1.79
 12 |               0.0035 | 2011-09-21 |   1.8
 13 |               0.0015 | 2011-09-20 |  1.81
 14 |                0.003 | 2011-09-19 |  1.81
 15 |               0.0045 | 2011-09-16 |  1.82
 16 |  0.00650000000000001 | 2011-09-15 |  1.83
 17 |  0.00750000000000001 | 2011-09-14 |  1.85
 18 |                 0.01 | 2011-09-13 |  1.87
 19 |  0.00700000000000001 | 2011-09-12 |  1.81
 20 |               0.0055 | 2011-09-09 |  1.81
 21 |               0.0025 | 2011-09-08 |   1.8
 22 |  0.00600000000000001 | 2011-09-07 |  1.85

The first two arguments define a open / close character to indicate that you want to loop through an array (day-by-day or better qoute-by-quote). This is evaluated first and finally represented as an array. Note that you need to pass an offset to your variable name so close(0) is t and close(-1) is t-1. You even can loop inside a quote-by-quote sub routine:

-- weighted average made by foot:

set client_encoding to 'WIN1252';
select pgstocked.calc_indicators('`', '´'
 ,'select close,dte from pgstocked.quotes where fi_id = 3133 order by dte desc, tme desc'
 ,'New("test"); test=`Double c=0; for(int i=0;i>-20;i--){c+=close(i)*(i+20);}; return c / 210;´;Return("close","test");'
);

And of corse you can combine any variant:

set client_encoding to 'WIN1252';
select pgstocked.calc_indicators('`', '´'
 ,'select close,dte from pgstocked.quotes where fi_id = 3133 order by dte desc, tme desc'
 ,'New("test"); 
   test=Talib.calc("sma", `Double c=0; for(int i=0;i>-20;i--){c+=close(i)*(i+20);}; return c / 210;´, 10).get(0);
   New("testDelta"); testDelta=`test(0)-close(0)´;
   Return("close","test", "testDelta");'
);

select * from tmp_resultset;

To get to the data from any other application PgStock implements a very, very small web server. The server directly accesses special granted functions in the database instead of doing anything on the file system.

By today you are also able to plot your data as a binary image (jpg):

select pgstocked.calc_indicators('`', '´'
 ,'select close,dte from pgstocked.quotes 
   where fi_id = 3133 and dte >= ''2010-10-01'' 
   order by dte desc, tme desc'
 ,'a=Talib.calc("sma", `close(0)-close(-1)´, 20).get(0);
   RGBColor red = new RGBColor(255,0,0);
   Plot("main","main","line", red, close);
   Plot();   
   Return("close", "a");'
);

And this is what you get

Until now you only have an tranding tool onyl solution or a very spread one. The disadvantage on the first one is that you can only use the data supported by your application. And the harm on the second is that you have to glue a lot. Everybody knows this situation: You want to download quote data with a simple tool like wget. Using finance.yahoo.com and cron this is no big deal . But soon you will discover that a quote for a single day is missing and the imagination of other days missing drives you crazy. So you start do adapt your downloading script. Now your script looks in your database first for the last available quote and uses this date as a parameter for your downloading script. A few months later you will be ended up in gluing scripts in various languages to do a rather simple job.

You get a modular and all in one place solution. You do not need to glue database, shell scripts and cron tasks anymore. To move your code simply dump your database and restore it, you are done, no need to care about missing libs or scripts and binaries or scheduling jobs. Develop on Windows and run on Linux without any cross compile issue. Develop nearly any index you want to, even index based once like valuing tweets. Access your data through a web browser from and for any application you like.

Discussion

Anonymous