
collapse 2.1.0, released in March 2025, introduces a fast slicing function, an improved weighted quantile algorithm, and a few convenience features, and removes some legacy functions from the package.

Potentially breaking changes

  • Functions pwNobs, as.factor_GRP, as.factor_qG, is.GRP, is.qG, is.unlistable, is.categorical, is.Date, as.numeric_factor, as.character_factor, and Date_vars, which were renamed in v1.6.0 by either replacing '.' with '_' or using all lower-case letters, and have been deprecated since then, are now finally removed from the package.

  • num_vars() (and thus also cat_vars() and collap()) now uses a simpler C-level definition of numeric data types that is more in line with is.numeric(): is_numeric_C <- function(x) typeof(x) %in% c("integer", "double") && !inherits(x, c("factor", "Date", "POSIXct", "yearmon", "yearqtr")). The previous definition was: is_numeric_C_old <- function(x) typeof(x) %in% c("integer", "double") && (!is.object(x) || inherits(x, c("ts", "units", "integer64"))). Thus, the definition changed from including only certain classes to excluding the most important classes of non-numeric data. Thanks @maouw for flagging this (#727).
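
To make the effect of the num_vars() change concrete, the sketch below copies the two definitions quoted above into plain base R and applies them to a few inputs. The inputs and the names `inputs` and `res` are illustrative assumptions, not part of the release notes; the package itself implements the check in C.

```r
# New (v2.1.0) and previous definitions, as quoted in the note above
is_numeric_C <- function(x) typeof(x) %in% c("integer", "double") &&
  !inherits(x, c("factor", "Date", "POSIXct", "yearmon", "yearqtr"))
is_numeric_C_old <- function(x) typeof(x) %in% c("integer", "double") &&
  (!is.object(x) || inherits(x, c("ts", "units", "integer64")))

# Illustrative inputs (hypothetical, chosen to show where the rules differ)
inputs <- list(
  plain_double = c(1.5, 2.5),
  factor       = factor(c("a", "b")),
  date         = as.Date("2025-03-10"),
  time_series  = ts(1:4),
  difftime     = as.difftime(5, units = "mins")
)

# One row per input, one column per definition
res <- data.frame(
  old = vapply(inputs, is_numeric_C_old, logical(1)),
  new = vapply(inputs, is_numeric_C, logical(1))
)
print(res)
```

In this sketch only the difftime row differs: it is a classed double that the old whitelist rejected and the new blacklist accepts. Columns of that kind are the ones that may newly be picked up by num_vars() and numeric aggregation in collap().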

Bug Fixes

  • Fixed some issues using collapse and the tidyverse together, particularly regarding tidyverse methods for 'grouped_df' - thanks @NicChr (#645).

  • More consistent handling of zero-length inputs: fmean() and fmedian()/fnth() now also return them instead of returning NA (#628).
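
A quick way to check the zero-length behaviour described above is to look at the length of the result. This is a minimal sketch that assumes collapse >= 2.1.0 is installed; the variable name `x` is illustrative.

```r
library(collapse)  # assumption: version 2.1.0 or later

x <- numeric(0)    # zero-length input

# According to the note above, these should now be zero-length
# results rather than a single NA
length(fmean(x))
length(fmedian(x))
```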

Additions

  • Added function fslice(): a fast alternative to dplyr::slice_[head|tail|min|max] that also works with matrices. Thanks @alinacherkas for the proposal and initial implementation (#725).

  • Added function groupv() as the programmer's version of group(), or rather, groupv() is now identical to the former group(), and group() now supports multiple vectors as input, e.g. group(v1, v2). This is done for convenience and consistency with radixorder[v](). For backwards compatibility, group() also supports a single list as input.

  • join() has a new argument require allowing the user to generate messages or errors if the join operation is not successful enough:

    :::r
    join(df1, df2, require = list(x = 0.8, fail = "warning"))
    #> Warning: Matched 75.0% of records in table df1 (x), but 80.0% is required
    #> left join: df1[id1, id2] 3/4 (75%) <1:1st> df2[id1, id2] 3/4 (75%)
    #>   id1 id2 name age salary      dept
    #> 1   1   a John  35  60000        IT
    #> 2   1   b Jane  28     NA      <NA>
    #> 3   2   b  Bob  42  55000 Marketing
    #> 4   3   c Carl  50  70000     Sales

  • psmat() now has a fill argument to fill empty slots in matrix/array with other elements (default NULL/NA).
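
The group()/groupv() change described in this list can be sketched as follows. This assumes collapse >= 2.1.0 is installed; the vectors `v1` and `v2` and the result names are illustrative and not taken from the release notes.

```r
library(collapse)  # assumption: version 2.1.0 or later

# Illustrative data
v1 <- c("a", "b", "a", "c", "b")
v2 <- c(1L, 2L, 1L, 1L, 2L)

g_multi <- group(v1, v2)         # new: several vectors passed directly
g_list  <- group(list(v1, v2))   # single list, kept for backwards compatibility
g_prog  <- groupv(list(v1, v2))  # programmer's version (the former group())

# Per the note above, all three calls should yield the same group ids;
# as.integer() drops the class and attributes before comparing
identical(as.integer(g_multi), as.integer(g_prog))
identical(as.integer(g_list), as.integer(g_prog))
```

The split mirrors radixorder()/radixorderv(): the plain function takes vectors through `...` for interactive use, while the `v` variant takes one object, which is easier to call from other functions.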

Improvements

  • The weighted quantile algorithm in fquantile()/fnth() was improved to a more theoretically sound method following excellent notes by Matthew Kay. It now also supports quantile type 4, but it does not skip zero weights anymore, as the new algorithm makes it difficult to skip them 'on the fly'. Note that the existing collapse algorithm already had very good properties after a bug fix in v2.0.17, but the new algorithm is more exact and also faster.

  • The collapse arXiv article has been updated and significantly enhanced. It is an excellent resource to get an overview of the package.

Notes

  • On CRAN, the R dependency of collapse was raised to >= 4.1.0 so that the base pipe can be used in examples without generating a NOTE on R CMD check (another absolutely unnecessary restriction). The package itself only depends on R >= 3.5.0, and the DESCRIPTION file on GitHub/R-universe will continue to reflect this.
Source: README.md, updated 2025-03-10