Name | Modified | Size | Downloads / Week |
---|---|---|---|
README.md | 2023-01-23 | 6.6 kB | |
tidyr 1.3.0 source code.tar.gz | 2023-01-23 | 2.4 MB | |
tidyr 1.3.0 source code.zip | 2023-01-23 | 2.5 MB | |
Totals: 3 Items | | 5.0 MB | 0 |
New features
- New family of consistent string separating functions: `separate_wider_delim()`, `separate_wider_position()`, `separate_wider_regex()`, `separate_longer_delim()`, and `separate_longer_position()`. These functions are thorough refreshes of `separate()` and `extract()`, featuring improved performance, greater consistency, a polished API, and a new approach for handling problems. They use stringr and supersede `extract()`, `separate()`, and `separate_rows()` (#1304).
- `nest()` gains a `.by` argument which allows you to specify the columns to nest by (rather than the columns to nest, i.e. through `...`). Additionally, the `.key` argument is no longer deprecated, and is used whenever `...` isn't specified (#1458).
- `unnest_longer()` gains a `keep_empty` argument like `unnest()` (#1339).
- `pivot_longer()` gains a `cols_vary` argument for controlling the ordering of the output rows relative to their original row number (#1312).
- New datasets `who2`, `household`, `cms_patient_experience`, and `cms_patient_care` to demonstrate various tidying challenges (#1333).
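A minimal sketch of a few of the features listed above, assuming tidyr 1.3.0 or later is installed. The data frames and column names (`df`, `letter`, `number`, `g`, `v`) are invented for illustration and do not come from the release notes.

```r
library(tidyr)

# Hypothetical input: one string column holding two delimited pieces.
df <- tibble::tibble(id = 1:3, x = c("a-1", "b-2", "c-3"))

# Split `x` into two named columns at the "-" delimiter.
wide <- separate_wider_delim(df, x, delim = "-", names = c("letter", "number"))

# Split `x` into one row per piece instead.
long <- separate_longer_delim(df, x, delim = "-")

# nest(.by = ) names the grouping columns; everything else is nested
# into a list-column, named "data" unless `.key` says otherwise.
groups <- tibble::tibble(g = c("a", "a", "b"), v = 1:3)
nested <- nest(groups, .by = g)

# cols_vary = "slowest" keeps all rows from one input column together,
# rather than interleaving the columns row by row (the default, "fastest").
scores <- tibble::tibble(id = 1:2, x = c(1, 2), y = c(3, 4))
slowest <- pivot_longer(scores, c(x, y), cols_vary = "slowest")
```

Here `wide` has the columns `id`, `letter`, and `number`, while `long` has six rows, one per delimited piece.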
Breaking changes
- The `...` argument of both `pivot_longer()` and `pivot_wider()` has been moved to the front of the function signature, after the required arguments but before the optional ones. Additionally, `pivot_longer_spec()`, `pivot_wider_spec()`, `build_longer_spec()`, and `build_wider_spec()` have all gained `...` arguments in a similar location. This change allows us to more easily add new features to the pivoting functions without breaking existing CRAN packages and user scripts.

  `pivot_wider()` provides temporary backwards compatible support for the case of a single unnamed argument that previously was being positionally matched to `id_cols`. This one special case still works, but will throw a warning encouraging you to explicitly name the `id_cols` argument.

  To read more about this pattern, see Data, dots, details in the tidyverse design guide (#1350).
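As a rough illustration of the recommended calling style, the sketch below names `id_cols` explicitly instead of relying on positional matching. The data frame and its columns (`id`, `extra`, `key`, `val`) are made up for this example.

```r
library(tidyr)

# Hypothetical long-format data with one identifying column and one extra column.
df <- tibble::tibble(
  id    = c(1, 1, 2, 2),
  extra = c("p", "p", "q", "q"),
  key   = c("a", "b", "a", "b"),
  val   = 1:4
)

# Name every optional argument, including id_cols, so the call keeps
# working now that `...` sits directly after the required arguments.
wide <- pivot_wider(df, id_cols = id, names_from = key, values_from = val)
```

Because only `id` is listed in `id_cols`, the `extra` column is dropped and the result has the columns `id`, `a`, and `b`.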
Lifecycle changes
- All functions deprecated in tidyr 1.0 and 1.2 (the old lazyeval functions ending in `_` and various arguments to `unnest()`) now warn on every use. They will be made defunct in 2024 (#1406).
Rectangling
- `unnest_longer()` now consistently drops rows with either `NULL` or empty vectors (like `integer()`) by default. Set the new `keep_empty` argument to `TRUE` to retain them. Previously, `keep_empty = TRUE` was implicitly being used for `NULL`, while `keep_empty = FALSE` was being used for empty vectors, which was inconsistent with all other tidyr verbs with this argument (#1363).
- `unnest_longer()` now uses `""` in the index column for fully unnamed vectors. It also now consistently uses `NA` in the index column for empty vectors that are "kept" by `keep_empty = TRUE` (#1442).
- `unnest_wider()` now errors if any values being unnested are unnamed and `names_sep` is not provided (#1367).
- `unnest_wider()` now generates automatic names for partially unnamed vectors. Previously it only generated them for fully unnamed vectors, resulting in a strange mix of automatic names and name-repaired names (#1367).
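The rectangling changes above can be sketched as follows, assuming tidyr 1.3.0 or later. The list-columns `x` and `y` are hypothetical and chosen only to exercise the `NULL`, empty, and unnamed cases.

```r
library(tidyr)

# Hypothetical list-column containing a regular vector, NULL, and an empty vector.
df <- tibble::tibble(id = 1:3, x = list(1:2, NULL, integer()))

# Default: rows whose element is NULL or empty are dropped.
dropped <- unnest_longer(df, x)

# keep_empty = TRUE retains one row per NULL or empty element, filled with NA.
kept <- unnest_longer(df, x, keep_empty = TRUE)

# Unnamed vectors need names_sep so that unnest_wider() can build column names.
df2 <- tibble::tibble(id = 1:2, y = list(c(1, 2), c(3, 4)))
wide <- unnest_wider(df2, y, names_sep = "_")
```

In this sketch `dropped` has two rows, `kept` has four, and `wide` gains the automatically named columns `y_1` and `y_2`.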
Bug fixes and minor improvements
General
- Most tidyr functions now consistently disallow renaming during tidy-selection. Renaming was never meaningful in these functions, and previously either had no effect or caused problems (#1449, #1104).
- tidyr errors (including input validation) have been thoroughly reviewed and should generally be more likely to point you in the right direction (#1313, #1400).
- `uncount()` is now generic so implementations can be provided for objects other than data frames (@mgirlich, #1358).
- `uncount()` gains a `...` argument. It comes between the required and the optional arguments (@mgirlich, #1358).
- `nest()`, `complete()`, `expand()`, and `fill()` now document their support for grouped data frames created by `dplyr::group_by()` (#952).
- All built-in datasets are now standard tibbles (#1459).
- R >= 3.4.0 is now required, in line with the tidyverse standard of supporting the previous 5 minor releases of R.
- rlang >= 1.0.4 and vctrs >= 0.5.2 are now required (#1344, #1470).
- Removed dependency on ellipsis in favor of equivalent functions in rlang (#1314).
Nesting, packing, and chopping
- `unnest()`, `unchop()`, `unnest_longer()`, and `unnest_wider()` better handle lists with additional classes (#1327).
- `pack()`, `unpack()`, `chop()`, and `unchop()` all gain an `error_call` argument, which in turn improves some of the error calls shown in `nest()` and various `unnest()` adjacent functions (#1446).
- `chop()`, `unpack()`, and `unchop()` all gain `...`, which must be empty (#1447).
- `unpack()` does a better job of reporting column name duplication issues and gives better advice about how to resolve them using `names_sep`. This also improves errors from functions that use `unpack()`, like `unnest()` and `unnest_wider()` (#1425, #1367).
Pivoting
- `pivot_longer()` no longer supports interpreting `values_ptypes = list()` and `names_ptypes = list()` as `NULL`. An empty `list()` is now interpreted as a `<list>` prototype to apply to all columns, which is consistent with how any other 0-length value is interpreted (#1296).
- `pivot_longer(values_drop_na = TRUE)` is faster when there aren't any missing values to drop (#1392, @mgirlich).
- `pivot_longer()` is now more memory efficient due to the usage of `vctrs::vec_interleave()` (#1310, @mgirlich).
- `pivot_longer()` now throws a slightly better error message when `values_ptypes` or `names_ptypes` is provided and the coercion can't be made (#1364).
- `pivot_wider()` now throws a better error message when a column selected by `names_from` or `values_from` is also selected by `id_cols` (#1318).
- `pivot_wider()` is now faster when `names_sep` is provided (@mgirlich, #1426).
- `pivot_longer_spec()`, `pivot_wider_spec()`, `build_longer_spec()`, and `build_wider_spec()` all gain an `error_call` argument, resulting in better error reporting in `pivot_longer()` and `pivot_wider()` (#1408).
Missing values
- `fill()` now works correctly when there is a column named `.direction` in `data` (#1319, @tjmahr).
- `replace_na()` is faster when there aren't any missing values to replace (#1392, @mgirlich).
- The documentation of the `replace` argument of `replace_na()` now mentions that `replace` is always cast to the type of `data` (#1317).