Normally I use R for my statistical analyses. However, R has the well-known problem of struggling with very large data sets. Because I work with such data sets, I was looking for a free or open-source application that can handle them, and gretl seemed perfect for the job. In a small test I ran a simple OLS and a logit model on simulated data with 10 million rows; gretl did the job in a few seconds. However, I discovered that calculating a simple cross table (the xtab command) took very long, even with much smaller data sets. For example, with a data set of 500,000 rows, a cross table of two dummy variables took more than 5 minutes. I'm working on a MacBook Pro running OS X 10.8.4.
Kris
2014-01-14
I just found out that the problem occurs only when calculating a 2x2 cross table. I think this is because gretl then computes Fisher's exact test. From this point of view it would be great to have an option to suppress the calculation of Fisher's exact test.
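To see why a 2x2 table can be expensive, here is a minimal pure-Python sketch of a two-sided Fisher's exact test (this is an illustration, not gretl's actual implementation): the p-value is a sum of hypergeometric probabilities over every table with the same margins, and the size of that support grows with the total count N.

```python
# Illustrative sketch of a two-sided Fisher's exact test for the
# 2x2 table [[a, b], [c, d]] -- NOT gretl's actual code.
from math import lgamma, exp

def log_comb(n, k):
    # log of the binomial coefficient, stable for large n
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)

def fisher_exact_2x2(a, b, c, d):
    """Two-sided p-value for the table [[a, b], [c, d]]."""
    r1, r2 = a + b, c + d          # row margins
    c1 = a + c                     # first column margin
    n = r1 + r2
    log_denom = log_comb(n, c1)

    def p_table(x):                # hypergeometric P(X = x)
        return exp(log_comb(r1, x) + log_comb(r2, c1 - x) - log_denom)

    p_obs = p_table(a)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    # Sum every table at least as extreme as the observed one.
    # The loop runs over the whole support [lo, hi], which can have
    # hundreds of thousands of terms when N is 500,000.
    return sum(p_table(x) for x in range(lo, hi + 1)
               if p_table(x) <= p_obs + 1e-12)
```

With half a million observations the summation loop alone iterates on the order of the smallest margin, which explains the multi-minute run time reported above.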
Allin Cottrell
2014-01-14
Thanks for the report. Yes, the culprit was Fisher's Exact Test; we shouldn't
be attempting to calculate it for very big tables. After some experimentation
I have disabled that test for N > 1000. The fix is now in CVS and in the snapshots.
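The fix described above amounts to a simple size guard. A sketch of that idea, with hypothetical names (the actual gretl source is in C and may differ):

```python
# Illustrative guard matching the described fix: skip Fisher's exact
# test when the table's total N exceeds 1000. The constant name and
# function are hypothetical, not taken from gretl's source.
FISHER_MAX_N = 1000

def should_run_fisher(table):
    """table: a 2x2 list of counts, e.g. [[a, b], [c, d]]."""
    n = sum(sum(row) for row in table)
    return n <= FISHER_MAX_N
```

A small table still gets the exact test, while a 500,000-row data set skips it and the cross table prints quickly.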