
#192 Multiple summary statistics for aggregate()

Milestone: None
Status: closed
Owner: nobody
Labels: None
Priority: 4
Updated: 2023-12-16
Created: 2023-05-14
Private: No

Hi all,

Currently, the aggregate() function can compute grouped values for only one statistic per call. The statistic is selected through the third parameter, funcname (of type string). Here is an example computing the mean:

set verbose off
open data4-10 -q

# series/list mode
list L = ENROLL CATHOL
matrix m = aggregate(L, REGION, mean)
print m

A useful feature would be to allow the third parameter, funcname, to be a string array, so that the user can request multiple statistics in a single call (each of which must return a scalar value for each distinct group).

Currently, this is quite cumbersome to do:

set verbose off
open data4-10 -q

# series/list mode
list L = ENROLL CATHOL
list groupby = REGION
matrix avg = aggregate(L, groupby, mean)
matrix std = aggregate(L, groupby, sd) # make sure you group by the same variables

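# the first nelem(groupby) columns hold the group values and the next one the count,
# so the statistics in std start at column 2+nelem(groupby)
matrix combine = avg ~ std[,2+nelem(groupby):]

# labels are not unique; appending the name of the requested statistic in parentheses would be useful
strings column_labels = cnameget(avg) + cnameget(std)[2+nelem(groupby):]
cnameset(combine, column_labels)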

print combine

Here is a pseudo-example of the proposed syntax, using a list of series as input:

list L = ENROLL CATHOL
strings stats = defarray("mean", "median", "sd")
matrix m = aggregate(L, REGION, stats)
print m
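Until something like this exists, the workaround above can be wrapped in a user-defined function so that a call close to the proposed one works today. This is a hypothetical sketch: the name aggregate_multi is made up, it assumes that aggregate() accepts the function name as a string variable (funcname is documented as a string), and it has not been checked against a particular gretl release. It orders the result columns by statistic (all series for the first statistic, then all series for the next one) and, unlike a native implementation, still regroups the data once per statistic.

```hansl
set verbose off
open data4-10 -q

# Hypothetical helper, not part of gretl: emulates a string-array funcname
# by calling aggregate() once per requested statistic (assumes funcname
# may be passed as a string variable).
function matrix aggregate_multi (list L, list by, strings stats)
    scalar k = nelem(by) + 1      # leading key columns: group value(s) plus count
    scalar ns = nelem(stats)
    # the key columns and the series names do not depend on the statistic
    matrix a = aggregate(L, by, stats[1])
    matrix ret = a[,1:k]
    strings cn = cnameget(a)
    strings labels = cn[1:k]
    scalar j0 = k + 1
    scalar nc = cols(a)
    loop i=1..ns
        if i > 1
            a = aggregate(L, by, stats[i])
        endif
        # append this statistic's columns and tag their labels with its name
        ret = ret ~ a[,j0:]
        loop j=j0..nc
            labels += sprintf("%s (%s)", cn[j], stats[i])
        endloop
    endloop
    cnameset(ret, labels)
    return ret
end function

list L = ENROLL CATHOL
list groupby = REGION
strings stats = defarray("mean", "median", "sd")
matrix m = aggregate_multi(L, groupby, stats)
print m
```

With one grouping variable, two series and three statistics, the result should have 2 + 2*3 = 8 columns, with labels such as "ENROLL (mean)" that resolve the uniqueness problem noted in the comment above.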

Best
Artur

Discussion

  • Sven Schreiber

    Sven Schreiber - 2023-05-23

    Well, I find aggregate() already quite complex. In the list case, for example, one would then need a convention for ordering the result columns: by series/variable or by aggregation method?
    I can see a potential argument for a speed gain, because the grouping would not have to be redone for each statistic. So the question, IMO, is whether the gain is worth the added complexity. Is speed actually an issue?
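To make the ordering question concrete: with k leading key columns (group values plus count), m series and s statistics, the column holding series j under statistic i is k + (i-1)*m + j if the results are ordered by statistic, and k + (j-1)*s + i if they are ordered by series. The hypothetical hansl helper below (the name to_series_major is made up for illustration, and the sketch has not been tested against a specific gretl version) converts a statistic-ordered result into series order by building exactly that index vector.

```hansl
# Hypothetical helper: reorder the value columns of a result laid out by
# statistic (all series for statistic 1, then for statistic 2, ...) so that
# all statistics for series 1 come first, then series 2, and so on.
# k = number of key columns (group values plus count).
function matrix to_series_major (matrix r, scalar k, scalar nser, scalar nstat)
    matrix idx = seq(1, k)          # key columns stay in front
    loop j=1..nser
        loop i=1..nstat
            # position of series j under statistic i in the input
            idx = idx ~ (k + (i-1)*nser + j)
        endloop
    endloop
    return r[,idx]
end function

# toy check: one key column, 2 series, 3 statistics
matrix r = seq(1, 7)
matrix out = to_series_major(r, 1, 2, 3)
print out
```

For the toy input the index vector is 1, 2, 4, 6, 3, 5, 7, so either convention is a simple column permutation of the other; the real cost of supporting both lies in documenting and maintaining the choice.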

     
    • Allin Cottrell

      Allin Cottrell - 2023-11-11

      I agree with Sven: aggregate is complex enough already.

       
  • Sven Schreiber

    Sven Schreiber - 2023-12-16
    • status: open --> closed
    • Group: -->
     
  • Sven Schreiber

    Sven Schreiber - 2023-12-16

    Allow me to close this. Sorry, Artur, but it seems others are not enthusiastic about squeezing more stuff into aggregate().

     
