I have created an aggregate function to calculate the 90th percentile of a data set:
CREATE AGGREGATE FUNCTION perc90(IN val DOUBLE, IN flag boolean, INOUT buffer DOUBLE ARRAY, INOUT counter INT)
RETURNS Double
CONTAINS SQL
BEGIN ATOMIC
DECLARE positionRank DOUBLE;
DECLARE sortedArray DOUBLE ARRAY;
IF flag THEN
SET positionRank = round(((0.90*counter)+0.5),0);
SET sortedArray = SORT_ARRAY(buffer);
RETURN sortedArray[positionRank-1];
ELSE
IF val IS NULL THEN RETURN NULL; END IF;
IF counter IS NULL THEN SET counter = 0; END IF;
SET counter = counter + 1;
IF counter = 1 THEN SET buffer = ARRAY[val];
ELSE SET buffer[counter] = val; END IF;
RETURN NULL;
END IF;
END
this worked fine with a small data set of around 300 records but when increasing the sample to a 150,000 I encountered the hard limit of 1024 for the array so I tried the following:
CREATE AGGREGATE FUNCTION perc90(IN val DOUBLE, IN flag boolean, INOUT buffer DOUBLE ARRAY[10000000], INOUT counter INT)
RETURNS Double
CONTAINS SQL
BEGIN ATOMIC
DECLARE positionRank DOUBLE;
DECLARE sortedArray DOUBLE ARRAY;
DECLARE trimedArray DOUBLE ARRAY;
IF flag THEN
SET positionRank = round(((0.9*counter)+0.5),0);
SET trimedArray = TRIM_ARRAY(buffer,(10000000 - counter));
SET sortedArray = SORT_ARRAY(trimedArray);
RETURN sortedArray[positionRank-1];
ELSE
IF val IS NULL THEN RETURN NULL; END IF;
IF counter IS NULL THEN SET counter = 0; END IF;
SET counter = counter + 1;
IF counter = 1 THEN SET buffer = ARRAY[val];
ELSE SET buffer[counter] = val; END IF;
RETURN NULL;
END IF;
END
and
CREATE AGGREGATE FUNCTION perc90(IN val DOUBLE, IN flag boolean, INOUT buffer DOUBLE ARRAY[10000000], INOUT counter INT)
RETURNS Double
CONTAINS SQL
BEGIN ATOMIC
DECLARE positionRank DOUBLE;
DECLARE sortedArray DOUBLE ARRAY;
DECLARE trimedArray DOUBLE ARRAY;
IF flag THEN
SET positionRank = round(((0.9*counter)+0.5),0);
SET sortedArray = SORT_ARRAY(buffer ASC NULLS LAST);
RETURN sortedArray[1];
ELSE
IF val IS NULL THEN RETURN NULL; END IF;
IF counter IS NULL THEN SET counter = 0; END IF;
SET counter = counter + 1;
IF counter = 1 THEN SET buffer = ARRAY[val];
ELSE SET buffer[counter] = val; END IF;
RETURN NULL;
END IF;
END
but I seem to keep getting a "data exception: array data, right truncation / Error Code: -3491 / State: 2202F" error.
I've tried looking for limits to these functions in the documentation but I couldn't find anything.
Am I doing something wrong or is it a limitation of the db?
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
I have created an aggregate function to calculate the 90th percentile of a data set:
this worked fine with a small data set of around 300 records but when increasing the sample to a 150,000 I encountered the hard limit of 1024 for the array so I tried the following:
and
but I seem to keep getting a "data exception: array data, right truncation / Error Code: -3491 / State: 2202F" error.
I've tried looking for limits to these functions in the documentation but I couldn't find anything.
Am I doing something wrong or is it a limitation of the db?
Have you tried specifying the size for the ARRAY variables? For example DECLARE sortedArray DOUBLE ARRAY.
that works thank you :) I can't believe I missed it.
I have it working but it takes around 20s for 150K rows which is not really great, the query:
is pretty much instant and that's doing the majority of what the aggregate function does I just need to be able to go to a direct row.