Name | Modified | Size | Downloads / Week |
---|---|---|---|
collapse version 2.1.0 source code.tar.gz | 2025-03-10 | 12.7 MB | |
collapse version 2.1.0 source code.zip | 2025-03-10 | 13.0 MB | |
README.md | 2025-03-10 | 4.0 kB | |
Totals: 3 Items | 25.7 MB | 0 |
collapse 2.1.0, released in March 2025, introduces a fast slicing function, an improved weighted quantile algorithm, and a few convenience features, and removes some legacy functions from the package.
Potentially breaking changes
- Functions `pwNobs`, `as.factor_GRP`, `as.factor_qG`, `is.GRP`, `is.qG`, `is.unlistable`, `is.categorical`, `is.Date`, `as.numeric_factor`, `as.character_factor`, and `Date_vars`, which were renamed in v1.6.0 by either replacing '.' with '_' or using all lower-case letters, and deprecated since then, are now finally removed from the package.
- `num_vars()` (and thus also `cat_vars()` and `collap()`) were changed to a simpler C-definition of numeric data types which is more in line with `is.numeric()`: `is_numeric_C <- function(x) typeof(x) %in% c("integer", "double") && !inherits(x, c("factor", "Date", "POSIXct", "yearmon", "yearqtr"))`. The previous definition was: `is_numeric_C_old <- function(x) typeof(x) %in% c("integer", "double") && (!is.object(x) || inherits(x, c("ts", "units", "integer64")))`. Thus, the definition changed from including only certain classes to excluding the most important non-numeric classes. Thanks @maouw for flagging this (#727).
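To make the practical effect of this change concrete, the following sketch applies both definitions quoted above to a handful of vectors in base R. The input list and the class name `my_id` are illustrative assumptions and do not come from the release notes.

```r
# New definition (v2.1.0), as quoted in the notes above
is_numeric_C <- function(x) typeof(x) %in% c("integer", "double") &&
  !inherits(x, c("factor", "Date", "POSIXct", "yearmon", "yearqtr"))

# Previous definition, as quoted in the notes above
is_numeric_C_old <- function(x) typeof(x) %in% c("integer", "double") &&
  (!is.object(x) || inherits(x, c("ts", "units", "integer64")))

# Illustrative inputs (hypothetical, chosen to show where the two differ)
inputs <- list(
  plain_double = c(1.5, 2.5),
  factor       = factor(c("a", "b")),
  date         = as.Date("2025-03-10"),
  difftime     = as.difftime(c(1, 2), units = "hours"),
  custom_class = structure(1:3, class = "my_id")
)

# One row per input, one column per definition
cmp <- data.frame(
  old = vapply(inputs, is_numeric_C_old, logical(1)),
  new = vapply(inputs, is_numeric_C, logical(1))
)
print(cmp)
```

Under these definitions, classed objects that are stored as integers or doubles but are not on the exclusion list (here the `difftime` and `my_id` vectors) are treated as numeric by the new rule and were not by the old one, so code relying on `num_vars()` or `cat_vars()` may select different columns.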
Bug Fixes
- Fixed some issues using collapse and the tidyverse together, particularly regarding tidyverse methods for 'grouped_df' - thanks @NicChr (#645).
- More consistent handling of zero-length inputs - they are now also returned in `fmean()` and `fmedian()`/`fnth()` instead of returning `NA` (#628).
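A quick way to check the zero-length behaviour described above is sketched below. It assumes collapse >= 2.1.0 is installed; the variable names are illustrative.

```r
library(collapse)  # assumption: version >= 2.1.0

empty <- numeric(0)

# Per the note above, these should now return zero-length results, not NA
m  <- fmean(empty)
md <- fmedian(empty)
q  <- fnth(empty, 0.25)

# Report the length of each result
print(c(fmean = length(m), fmedian = length(md), fnth = length(q)))
```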
Additions
- Added function `fslice()`: a fast alternative to `dplyr::slice_[head|tail|min|max]` that also works with matrices. Thanks @alinacherkas for the proposal and initial implementation (#725).
- Added function `groupv()` as the programmer's version of `group()`, or rather, `groupv()` is now identical to the former `group()`, and `group()` now supports multiple vectors as input, e.g. `group(v1, v2)`. This is done for convenience and consistency with `radixorder[v]()`. For backwards compatibility, `group()` also supports a single list as input.
- `join()` has a new argument `require` allowing the user to generate messages or errors if the join operation is not successful enough:

```r
join(df1, df2, require = list(x = 0.8, fail = "warning"))
#> Warning: Matched 75.0% of records in table df1 (x), but 80.0% is required
#> left join: df1[id1, id2] 3/4 (75%) <1:1st> df2[id1, id2] 3/4 (75%)
#>   id1 id2 name age salary      dept
#> 1   1   a John  35  60000        IT
#> 2   1   b Jane  28     NA      <NA>
#> 3   2   b  Bob  42  55000 Marketing
#> 4   3   c Carl  50  70000     Sales
```

- `psmat()` now has a `fill` argument to fill empty slots in matrix/array with other elements (default `NULL`/`NA`).
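The two new grouping and slicing entry points can be tried out as in the sketch below. It assumes collapse >= 2.1.0 is installed; the vectors `v1` and `v2` are made up for illustration, and the `fslice()` call (grouping column passed unquoted, with `n`, `how`, and `order.by`) reflects my reading of the function's documentation rather than anything stated in these notes, so treat it as an assumption to verify against `?fslice`.

```r
library(collapse)  # assumption: version >= 2.1.0

# Hypothetical input vectors
v1 <- c("a", "b", "a", "b", "c")
v2 <- c(1L, 1L, 1L, 2L, 2L)

g_multi <- group(v1, v2)         # new: several vectors passed directly
g_list  <- group(list(v1, v2))   # single list, kept for backwards compatibility
g_prog  <- groupv(list(v1, v2))  # programmer's version (the former group())

# All three calls should assign the same group ids
print(as.integer(g_multi))

# Two rows with the highest mpg within each cyl group
# (comparable in spirit to dplyr::slice_max())
top2 <- fslice(mtcars, cyl, n = 2, how = "max", order.by = mpg)
print(top2[, c("cyl", "mpg")])
```

Keeping `groupv()` list-based means package code can pass a programmatically built list of columns, while interactive code can use the shorter `group(v1, v2)` form.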
Improvements
- The weighted quantile algorithm in `fquantile()`/`fnth()` was improved to a more theoretically sound method following excellent notes by Matthew Kay. It now also supports quantile type 4, but it does not skip zero weights anymore, as the new algorithm makes it difficult to skip them 'on the fly'. Note that the existing collapse algorithm already had very good properties after a bug fix in v2.0.17, but the new algorithm is more exact and also faster.
- The collapse arXiv article has been updated and significantly enhanced. It is an excellent resource to get an overview of the package.
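As a usage sketch for the weighted quantile improvement listed above, the calls below compute weighted quantiles with the default type and with the newly supported type 4. The data and weights are made up, collapse >= 2.1.0 is assumed, and no expected values are shown because they depend on the new algorithm.

```r
library(collapse)  # assumption: version >= 2.1.0

# Hypothetical data; the weight vector deliberately contains a zero
x <- c(1, 2, 3, 4, 5)
w <- c(1, 1, 0, 2, 1)
p <- c(0.25, 0.5, 0.75)

q7 <- fquantile(x, probs = p, w = w)            # default quantile type 7
q4 <- fquantile(x, probs = p, w = w, type = 4)  # type 4 with weights
wmed <- fnth(x, 0.5, w = w)                     # weighted median via fnth()

print(q7)
print(q4)
print(wmed)
```

Since zero weights are no longer skipped, it may be worth comparing results against a version of the data with zero-weight observations dropped beforehand if exact agreement with earlier releases matters.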
Notes
- On CRAN, the R dependency of collapse was changed to >= 4.1.0 so that the base pipe can be used in examples without generating a NOTE on R CMD check (another absolutely unnecessary restriction). The package itself depends on R >= 3.5.0, and the DESCRIPTION file on GitHub/R-universe will continue to reflect this.