gretl / Feature Requests / #192 Multiple summary statistics for aggregate()

A cross-platform statistical package for econometric analysis

#192 Multiple summary statistics for aggregate()

Milestone: None

Status: closed

Owner: nobody

Labels: None

Priority: 4

Updated: 2023-12-16

Created: 2023-05-14

Creator: Artur Tarassow

Private: No

Hi all,

Currently, when using the aggregate() function, one can request grouped results for only a single statistic. The statistic is passed through the third parameter, funcname (of type string). Here is an example computing the mean:

set verbose off
open data4-10 -q

# series/list mode
list L = ENROLL CATHOL
matrix m = aggregate(L, REGION, mean)
print m

A useful feature would be to allow the third parameter, funcname, to be a string array, so that the user can request multiple statistics at once (each must return a scalar value for each distinct group).

Currently, this is quite cumbersome to do:

set verbose off
open data4-10 -q

# series/list mode
list L = ENROLL CATHOL
list groupby = REGION
matrix avg = aggregate(L, groupby, mean)
matrix std = aggregate(L, groupby, sd) # make sure you group by the same variables

matrix combine = avg ~ std[,2+nelem(groupby):]

# labels are not unique; appending the name of the requested statistic in brackets would be useful
strings column_labels = cnameget(avg) + cnameget(std)[2+nelem(groupby):]
cnameset(combine, column_labels)  

print combine
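The workaround above generalizes to any number of statistics. Below is a minimal Python sketch of that pattern. It is not hansl and not gretl's implementation. `aggregate_stub` is a hypothetical stand-in for aggregate() with one grouping variable. It mimics the layout the script relies on: the grouping column, one more column assumed here to hold the group count, then one column per variable. The hypothetical `multi_aggregate` calls it once per statistic and keeps the group and count columns from the first call only. It also appends the statistic name in brackets to each value label, as the script comment suggests. The data are regrouped on every call, just as in the hansl version.

```python
import statistics


def aggregate_stub(data, by, func):
    """Hypothetical stand-in for aggregate() with one grouping variable.

    Returns rows of [group, count, func(var_1), ..., func(var_k)] with the
    groups sorted, plus matching column labels.
    """
    rows = []
    for g in sorted(set(by)):
        idx = [i for i, b in enumerate(by) if b == g]
        rows.append([g, len(idx)] + [func([vals[i] for i in idx]) for vals in data.values()])
    return rows, ["group", "count"] + list(data)


def multi_aggregate(data, by, stats):
    """Hypothetical generalized workaround: one aggregate_stub() call per statistic.

    The group and count columns (the first 2 here, with one grouping
    variable) are kept from the first call only; every value label gets
    the statistic name appended in brackets so that labels are unique.
    """
    combined, labels = None, None
    for name, func in stats.items():
        rows, cols = aggregate_stub(data, by, func)  # regroups on every call
        value_labels = [f"{c} ({name})" for c in cols[2:]]
        if combined is None:
            combined, labels = rows, cols[:2] + value_labels
        else:
            # drop the repeated group and count columns before appending
            combined = [a + b[2:] for a, b in zip(combined, rows)]
            labels = labels + value_labels
    return combined, labels


# Toy data (made up for illustration): two groups of three observations.
data = {"ENROLL": [10, 20, 60, 5, 5, 20], "CATHOL": [1.0, 2.0, 6.0, 3.0, 3.0, 3.0]}
region = [1, 1, 1, 2, 2, 2]
stats = {"mean": statistics.mean, "median": statistics.median}

combined, labels = multi_aggregate(data, region, stats)
print(labels)
print(combined)
```

As in the hansl script, the value columns come out statistic by statistic: all means first, then all medians.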

Here is a pseudo-example of the proposed syntax, using series as data input:

list L = ENROLL CATHOL
strings stats = defarray("mean", "median", "sd")
matrix m = aggregate(L, REGION, stats)
print m

Best
Artur

Discussion

Sven Schreiber - 2023-05-23

Well, I find aggregate() already quite complex. In the list case, for example, one would then need a convention for the ordering of the result columns: would they be ordered by series/variable or by aggregation method?
I can see a potential argument relating to a speed gain, because the grouping wouldn't have to be done again and again. So the question, IMO, is whether the complexity is worth the gain. Is speed actually an issue?
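To make the trade-off concrete, here is a minimal Python sketch. All names are hypothetical, and this is not gretl code. It groups the data only once, then applies every requested statistic to each group's values. The hypothetical `order` argument shows the two column conventions in question. With `"by_variable"`, all statistics for one series sit next to each other. With `"by_statistic"`, all series for one statistic sit next to each other, which matches the layout produced by the hansl workaround in the request.

```python
import statistics


def group_once_aggregate(data, by, stats, order="by_variable"):
    """Hypothetical helper: group once, then apply every statistic per group.

    data:  dict mapping variable name -> list of observations
    by:    list of group values, one per observation
    stats: dict mapping statistic name -> function taking a list
    order: "by_variable" or "by_statistic" (column ordering convention)
    """
    # Single grouping pass: map each distinct group value to its row indices.
    members = {}
    for i, g in enumerate(by):
        members.setdefault(g, []).append(i)

    # Decide the column order once; it is the same for every group.
    if order == "by_variable":
        pairs = [(v, s) for v in data for s in stats]
    elif order == "by_statistic":
        pairs = [(v, s) for s in stats for v in data]
    else:
        raise ValueError(f"unknown order: {order!r}")

    labels = ["group", "count"] + [f"{v} ({s})" for v, s in pairs]
    rows = []
    for g in sorted(members):
        idx = members[g]
        # Extract each variable's values for this group once, reuse for all stats.
        slices = {v: [data[v][i] for i in idx] for v in data}
        rows.append([g, len(idx)] + [stats[s](slices[v]) for v, s in pairs])
    return rows, labels


# Toy data (made up for illustration): two groups of three observations.
data = {"ENROLL": [10, 20, 60, 5, 5, 20], "CATHOL": [1.0, 2.0, 6.0, 3.0, 3.0, 3.0]}
region = [1, 1, 1, 2, 2, 2]
stats = {"mean": statistics.mean, "median": statistics.median}

rows_v, labels_v = group_once_aggregate(data, region, stats, order="by_variable")
rows_s, labels_s = group_once_aggregate(data, region, stats, order="by_statistic")
print(labels_v)
print(rows_v)
```

Both orderings hold the same numbers, only permuted. The grouping pass happens once regardless of how many statistics are requested, which is where any speed gain would come from.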

Allin Cottrell - 2023-11-11

I agree with Sven: aggregate is complex enough already.

Sven Schreiber - 2023-12-16

status: open --> closed

Sven Schreiber - 2023-12-16

Allow me to close this - sorry, Artur, but it seems others are not enthusiastic about squeezing more stuff into aggregate().
