gretl / Bugs / #240 Nerlove unbalanced

Sven Schreiber - 2020-12-30

Sorry, a question so that I can replicate this: I'm not sure what you mean by "the first 196 obs". The original dataset is balanced. Are you saying the final four years of the last firm in the panel should be removed?
(I had to remind myself that feature request ticket 103 is related, and links to Allin's document.)
thanks, sven

KTTK - 2020-12-30

Thank you for looking at this! I am sorry if I did not express myself clearly. The original Grunfeld data set is balanced (I refer to the one used in Baltagi's textbook; there are various versions floating around, one with 11 firms, but that is another story). It seems to be the one built into gretl. As I want to replicate the unbalanced one-way Nerlove estimator, I chose that data set and made it unbalanced by dropping the last 4 obs (the obs belonging to the last four years) of the 10th observational unit, yielding 196 obs (the first 196 obs in the order of the data as in Baltagi's textbook). So, as you assumed, yes.

Feature request ticket 103 is related, as it led to the implementation of the one-way unbalanced Nerlove estimator in gretl. However, it does not contain any information beyond what is in Allin's working paper (and the ticket is quite long...).

I am not proficient in C. I tried to look at gretl's implementation anyway. The part about weighted averages in gretl seems fine at first glance. I do not know how the fixed effects (the a's) are calculated by gretl (one would need to know more about the structure of the model objects). I suspect the fixed effects are already different. Gretl's implementation distinguishes between the balanced and the unbalanced case, although judging from the formulae I think this would not be necessary. The distinction might be a (computational) efficiency consideration, though.

Last edit: KTTK 2020-12-30

Sven Schreiber - 2020-12-30

To be honest I'm having trouble getting this unbalanced version of grunfeld right in gretl. Perhaps this (separate) problem might be a factor in the differences? How did you construct the unbalanced sample?
thanks, sven

- KTTK - 2020-12-30
  
  Oh, I just imported a CSV (or .xlsx, I cannot tell for sure anymore) from which I had deleted the last 4 obs manually. Please find it attached in gretl's format.
  
  I had the same thought about a strange data import, but other estimators run fine with the same data and the same way of importing, e.g. the one-way unbalanced FE model on its own. That model is also used in the one-way unbalanced Nerlove estimation, or rather only its fixed effects, i.e., the a's. From the stand-alone FE estimation, one can get the a's with "Save" -> "Per-unit constants", and these are exactly the ones used in my unbalanced Nerlove implementation (values as stated in the comments to my R code in the initial post).
  
  What always puzzles me a bit is that the status line at the bottom of gretl's main window says 1:01 - 10:20 for the data set, but I concluded that this is how gretl wants to display it (missing obs are counted for this presentation).
  
  Last edit: KTTK 2020-12-30
  
  Grunfeld_unbalanced_196.gdt
  
Riccardo "Jack" Lucchetti - 2020-12-30

Possibly unrelated, but while investigating the problem I found that the $ahat accessor is slightly buggy: if you run the following piece of code (meant to replicate KTTK's experiment), the values for unit 10 are all missing.

    open grunfeld.gdt
    smpl 1 196
    panel invest const value kstock
    series a = $ahat
- KTTK - 2020-12-30
  
  Interesting. Also note how the panel command picks up only 180 observations (all but the 10th unit). So the $ahat accessor's output is at least consistent with the estimated model.
  
  Last edit: KTTK 2020-12-30
  
  - Riccardo "Jack" Lucchetti - 2020-12-30
    
    Exactly. In the meantime, here's a gretl script replicating your R script:
    
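        set verbose off
        open grunfeld.gdt
        # blank out the dependent variable beyond observation 196
        invest = (t>196) ? NA : invest
        smpl full
        panel invest const value kstock
        series ahat = $ahat
        # group by firm, using only observations where ahat is available
        series tmpf = ok(ahat) ? firm : NA
        matrix tmp = aggregate(ahat, tmpf, mean)
        # column 3: per-firm mean of ahat; column 2: per-firm observation count
        matrix a = tmp[,3]
        matrix w = tmp[,2]/sumc(tmp[,2])
        scalar n = rows(a)
        # weighted between variance, scaled by n/(n-1)
        scalar KTTK_bet = sumc ( (a - sumc(a .* w)).^2 .* w) * n/(n-1)
        print KTTK_bet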
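For readers who want to check the arithmetic outside gretl, here is a minimal Python sketch of the same weighted between-variance calculation. It assumes, as in the script above, weights w_i = T_i / sum(T) and the n/(n-1) scaling. The function name and the input numbers are hypothetical placeholders, not the Grunfeld estimates.

```python
def weighted_between_variance(a, T):
    """Return n/(n-1) * sum_i w_i * (a_i - abar_w)^2, with w_i = T_i / sum(T)."""
    n = len(a)
    total = sum(T)
    w = [Ti / total for Ti in T]                    # weights proportional to group size
    abar_w = sum(ai * wi for ai, wi in zip(a, w))   # weighted mean of the per-unit constants
    ss = sum(wi * (ai - abar_w) ** 2 for ai, wi in zip(a, w))
    return ss * n / (n - 1)


# Hypothetical per-unit constants and group sizes (illustration only).
a = [1.0, 2.0, 4.0]
T = [20, 20, 16]
print(weighted_between_variance(a, T))
```

With equal group sizes the weights are all 1/n, and the result reduces to the ordinary sample variance of the a's.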
    
Allin Cottrell - 2020-12-30

Gretl gives the between variance you calculated via R if you append the --unbalanced flag along with --nerlove, as described in section 6 of the working paper. Otherwise the standard "uncorrected" Nerlove formula is used.

- KTTK - 2020-12-30
  
  That's it! Thank you for that hint! I only use gretl's GUI, so I had assumed the unbalanced variant was applied, since the GUI has a single Nerlove option (while there are two for unbalanced Swamy-Arora: à la Baltagi/Chang and à la Stata).
  
  Maybe this could be made a little clearer in the Gretl User's Guide, at the end of section 23.1? And/or a GUI option for unbalanced Nerlove could be introduced?
  
  - Sven Schreiber - 2020-12-30
    
    Granted that the GUI appears to be lagging behind the developments in the scripting area here -- but out of curiosity, in what real-world application are you actually using this specialized variant? It seems not many people pay so much attention to the details of the unbalanced implementation. cheers
    
    - KTTK - 2020-12-31
      
      Frankly: no real-world application. Just out of curiosity. And I realized recently that I had never looked at the numbers produced by the unbalanced Nerlove estimation after the fine working paper was published.
      
  - KTTK - 2020-12-31
    
    An observation/question. Just to be as niggling as possible. Please feel free not to comment if you do not want to go into that level of detail. I put it here for my own record as well.
    
    Not related to the Nerlove estimation per se but about the printed regression output:
    The mean theta printed by gretl for the 196-observation unbalanced Grunfeld data is the mean of 10 numbers (as there are 10 individuals), not their weighted mean (= the mean of 196 numbers, i.e. the 10 numbers extended to the total of 196 observations), no matter the estimator.
    Both approaches seem to be defensible for Nerlove. (The original Nerlove paper treats balanced data only, where both approaches yield the same result.)
    
    This reminded me of the Stata vs. Baltagi/Chang approach for Swamy-Arora estimation, which was the trigger for gretl working paper #4. So, extending the difference (compressed vs. full-length (Ti-weighted) data) to the printed mean of theta:
    For the Baltagi/Chang approach, the mean of 10 numbers is printed as the mean of theta in the estimation output. Within the Baltagi/Chang approach, wouldn't it be more consistent to have gretl print the weighted mean of 10 numbers (= mean of 196 numbers)?
    
    For comparison:
    Baltagi/Chang (1994) approach: mean of 196 numbers (=weighted mean of 10 numbers) for the Grunfeld one-way RE model: (weighted) mean theta 0.8598731
    
    mean of 10 numbers (no weighting), current gretl output for Baltagi/Chang: 0.859579
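The two ways of averaging theta can be sketched in Python as follows. The sketch assumes the usual random-effects quasi-demeaning factor theta_i = 1 - sqrt(s2_e / (s2_e + T_i * s2_u)). The variance components are made-up values, so the printed numbers will not match the gretl output quoted above. Only the group sizes (nine units with 20 observations, one with 16) come from this thread.

```python
import math


def thetas(s2_e, s2_u, T):
    """Per-unit quasi-demeaning factors for group sizes T (assumed formula)."""
    return [1.0 - math.sqrt(s2_e / (s2_e + Ti * s2_u)) for Ti in T]


def mean_theta(theta, T, weighted):
    """Plain mean over units, or T_i-weighted mean (= mean over all observations)."""
    if weighted:
        return sum(th * Ti for th, Ti in zip(theta, T)) / sum(T)
    return sum(theta) / len(theta)


T = [20] * 9 + [16]        # 196 observations in total
s2_e, s2_u = 1.0, 2.0      # hypothetical variance components
th = thetas(s2_e, s2_u, T)
print(mean_theta(th, T, weighted=False))   # mean of 10 numbers
print(mean_theta(th, T, weighted=True))    # mean of 196 numbers
```

Because theta_i grows with T_i and the short unit gets less weight in the second version, the weighted mean comes out slightly larger, which is the same direction as the difference between the two figures quoted above.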
    
    Last edit: KTTK 2020-12-31
    
Sven Schreiber - 2021-01-02

status: open --> closed
Sven Schreiber - 2021-01-02

OK, for the record:
There were some little bugs related to formal subsampling of such panel datasets when the sampling criterion led to unequal T_i lengths of the panel groups. Thanks to Allin these should now be fixed (in git and thus in the next release).
Next, the thing to note about gretl's handling of panels is that if it says "full sample" or "full range" that isn't intended to claim that the panel is balanced. As you have noticed, there may still be missings, and that is perfectly OK and will be taken into account when doing actual calculations or estimations.

So I'm closing this bug ticket now, not having forgotten that the nerlove/unbalanced combination is not fully reflected or available via the GUI. I'd suggest opening a new feature request ticket for that issue if a more pressing use case arises.

(Also you may add further comments here if you like, irrespective of the "closed" status of the ticket.)

- KTTK - 2021-01-15
  
  I am glad this report about a "bug" led to some fixes in the subsampling code! Thank you!
  
  Re your suggestions about opening another ticket:
  I don't know whether you would like to bring the unbalanced Nerlove RE model into the GUI or keep it for the experts on the command line. Either way, I suggest the Gretl User's Guide (intended for a non-expert audience as well, I reckon) could be clearer about this.
  
  Last edit: KTTK 2021-01-16
  
Sven Schreiber - 2021-01-15

I guess we could add something like this quote from Allin:

Gretl gives the between variance [you calculated via R] if you append the --unbalanced flag along with --nerlove, as described in section 6 of the working paper. Otherwise the standard "uncorrected" Nerlove formula is used.

Plus perhaps: "This is currently available as an expert option for scripting only."
