[R-gregmisc-users] SF.net SVN: r-gregmisc: [1034] trunk/gdata/inst/doc
Brought to you by:
warnes
From: <gg...@us...> - 2006-11-30 21:36:44
|
Revision: 1034 http://svn.sourceforge.net/r-gregmisc/?rev=1034&view=rev Author: ggorjan Date: 2006-11-30 13:36:40 -0800 (Thu, 30 Nov 2006) Log Message: ----------- description of mapLevels methods Added Paths: ----------- trunk/gdata/inst/doc/mapLevels.pdf trunk/gdata/inst/doc/mapLevels.tex Added: trunk/gdata/inst/doc/mapLevels.pdf =================================================================== (Binary files differ) Property changes on: trunk/gdata/inst/doc/mapLevels.pdf ___________________________________________________________________ Name: svn:mime-type + application/octet-stream Added: trunk/gdata/inst/doc/mapLevels.tex =================================================================== --- trunk/gdata/inst/doc/mapLevels.tex (rev 0) +++ trunk/gdata/inst/doc/mapLevels.tex 2006-11-30 21:36:40 UTC (rev 1034) @@ -0,0 +1,255 @@ +\documentclass[a4paper]{report} +\usepackage{Rnews} +\usepackage[round]{natbib} +\bibliographystyle{abbrvnat} + +\begin{document} + +\begin{article} + +\title{Mapping levels of a factor} +\subtitle{The gdata package} +\author{by Gregor Gorjanc} + +\maketitle + +\section{Introduction} + +Factors use levels attribute to store information on mapping between +internal integer codes and character values i.e. levels. First level is +mapped to internal integer code 1 and so on. Although some users do not +like factors, their use is more efficient in terms of storage than for +character vectors. Additionally, there are many functions in base \R{} that +provide additional value for factors. Sometimes users need to work with +internal integer codes and mapping them back to factor, especially when +interfacing external programs. Mapping information is also of interest if +there are many factors that should have the same set of levels. This note +describes \code{mapLevels} function, which is an utility function for +mapping the levels of a factor in \pkg{gdata} \footnote{from version 2.3.1} +package \citep{WarnesGdata}. + +\section{Description with examples} + +Function \code{mapLevels()} is an (S3) generic function and works on +\code{factor} and \code{character} atomic classes. It also works on +\code{list} and \code{data.frame} objects with previously mentioned atomic +classes. Function \code{mapLevels} produces a so called ``map'' with names +and values. Names are levels, while values can be internal integer codes or +(possibly other) levels. This will be clarified later on. Class of this +``map'' is \code{levelsMap}, if \code{x} in \code{mapLevels()} was atomic +or \code{listLevelsMap} otherwise - for \code{list} and \code{data.frame} +classes. The following example shows the creation and printout of such a +``map''. + +\begin{smallverbatim} +> library(gdata) +> (fac <- factor(c("B", "A", "Z", "D"))) +[1] B A Z D +Levels: A B D Z +> (map <- mapLevels(x=fac)) +A B D Z +1 2 3 4 +\end{smallverbatim} + +If we have to work with internal integer codes, we can transform factor to +integer and still get ``back the original factor'' with ``map'' used as +argument in \code{mapLevels<-} function as shown bellow. \code{mapLevels<-} +is also an (S3) generic function and works on same classes as +\code{mapLevels} plus \code{integer} atomic class. + +\begin{smallverbatim} +> (int <- as.integer(fac)) +[1] 2 1 4 3 +> mapLevels(x=int) <- map +> int +[1] B A Z D +Levels: A B D Z +> identical(fac, int) +[1] TRUE +\end{smallverbatim} + +Internally ``map'' (\code{levelsMap} class) is a \code{list} (see bellow), +but its print method unlists it for ease of inspection. ``Map'' from +example has all components of length 1. This is not mandatory as +\code{mapLevels<-} function is only a wrapper around workhorse function +\code{levels<-} and the later can accept \code{list} with components of +various lengths. + +\begin{smallverbatim} +> str(map) +List of 4 + $ A: int 1 + $ B: int 2 + $ D: int 3 + $ Z: int 4 + - attr(*, "class")= chr "levelsMap" +\end{smallverbatim} + +Although not of primary importance, this ``map'' can also be used to remap +factor levels as shown bellow. Components ``later'' in the map take over +the ``previous'' ones. Since this is not optimal I would rather recommend +other approaches for ``remapping'' the levels of a \code{factor}, say +\code{recode} in \pkg{car} package \citep{FoxCar}. + +\begin{smallverbatim} +> map[[2]] <- as.integer(c(1, 2)) +> map +A B B D Z +1 1 2 3 4 +> int <- as.integer(fac) +> mapLevels(x=int) <- map +> int +[1] B B Z D +Levels: A B D Z +\end{smallverbatim} + +Up to now examples showed ``map'' with internal integer codes for values +and levels for names. I call this integer ``map''. On the other hand +character ``map'' uses levels for values and (possibly other) levels for +names. This feature is a bit odd at first sight, but can be used to easily +unify levels and internal integer codes across several factors. Imagine +you have a factor that is for some reason split into two factors \code{f1} +and \code{f2} and that each factor does not have all levels. This is not +uncommon situation. + +\begin{smallverbatim} +> (f1 <- factor(c("A", "D", "C"))) +[1] A D C +Levels: A C D +> (f2 <- factor(c("B", "D", "C"))) +[1] B D C +Levels: B C D +\end{smallverbatim} + +If we work with this factors, we need to be careful as they do not have the +same set of levels. This can be solved with appropriately specifying +\code{levels} argument in creation of factors i.e. \code{levels=c("A", "B", + "C", "D")} or with proper use of \code{levels<-} function. I say proper +as it is very tempting to use: + +\begin{smallverbatim} +> fTest <- f1 +> levels(fTest) <- c("A", "B", "C", "D") +> fTest +[1] A C B +Levels: A B C D +\end{smallverbatim} + +Above example extends set of levels, but also changes level of 2nd and 3rd +element in \code{fTest}! Proper use of \code{levels<-} (as shown in +\code{levels} help page) would be: + +\begin{smallverbatim} +> fTest <- f1 +> levels(fTest) <- list(A="A", B="B", C="C", D="D") +> fTest +[1] A D C +Levels: A B C D +\end{smallverbatim} + +Function \code{mapLevels} with character ``map'' can help us in such +scenarios to unify levels and internal integer codes across several +factors. Again the workhorse under this process is \code{levels<-} function +from base \R{}! Function \code{mapLevels<-} just controls the assignment of +(integer or character) ``map'' to \code{x}. Levels in \code{x} that match +``map'' values (internal integer codes or levels) are changed to ``map'' +names (possibly other levels) as shown in \code{levels} help page. Levels +that do not match are converted to \code{NA}. Integer ``map'' can be +applied to \code{integer} or \code{factor}, while character ``map'' can be +applied to \code{character} or \code{factor}. Result of \code{mapLevels<-} +is always a \code{factor} with possibly ``remapped'' levels. + +To get one joint character ``map'' for several factors, we need to +put factors in a \code{list} or \code{data.frame} and use arguments +\code{codes=FALSE} and \code{combine=TRUE}. Such map can then be used to +unify levels and internal integer codes. + +\begin{smallverbatim} +> (bigMap <- mapLevels(x=list(f1, f2), codes=FALSE, ++ combine=TRUE)) + A B C D +"A" "B" "C" "D" +> mapLevels(f1) <- bigMap +> mapLevels(f2) <- bigMap +> f1 +[1] A D C +Levels: A B C D +> f2 +[1] B D C +Levels: A B C D +> cbind(as.character(f1), as.integer(f1), ++ as.character(f2), as.integer(f2)) + [,1] [,2] [,3] [,4] +[1,] "A" "1" "B" "2" +[2,] "D" "4" "D" "4" +[3,] "C" "3" "C" "3" +\end{smallverbatim} + +If we do not specify \code{combine=TRUE} (which is the default behaviour) +and \code{x} is a \code{list} or \code{data.frame}, \code{mapLevels} +returns ``map'' of class \code{listLevelsMap}. This is internally a +\code{list} of ``maps'' (\code{levelsMap} objects). Both +\code{listLevelsMap} and \code{levelsMap} objects can be passed to +\code{mapLevels<-} for \code{list}/\code{data.frame}. Recycling occurs when +length of \code{listLevelsMap} is not the same as number of +components/columns of a \code{list}/\code{data.frame}. + +Additional convenience methods are also implemented to ease the work with +``maps'': + +\begin{itemize} + +\item \code{is.levelsMap}, \code{is.listLevelsMap}, \code{as.levelsMap} and + \code{as.listLevelsMap} for testing and coercion of user defined + ``maps'', + +\item \code{"["} for subsetting, + +\item \code{c} for combining \code{levelsMap} or \code{listLevelsMap} + objects; argument \code{recursive=TRUE} can be used to coerce + \code{listLevelsMap} to \code{levelsMap}, for example \code{c(llm1, llm2, + recursive=TRUE)} and + +\item \code{unique} and \code{sort} for \code{levelsMap}. + +\end{itemize} + +\section{Summary} + +Functions \code{mapLevels} and \code{mapLevels<-} can help users to map +internal integer codes to factor levels and unify levels as well as +internal integer codes among several factors. I welcome any comments or +suggestions. + +% \bibliography{refs} +\begin{thebibliography}{1} +\providecommand{\natexlab}[1]{#1} +\providecommand{\url}[1]{\texttt{#1}} +\expandafter\ifx\csname urlstyle\endcsname\relax + \providecommand{\doi}[1]{doi: #1}\else + \providecommand{\doi}{doi: \begingroup \urlstyle{rm}\Url}\fi + +\bibitem[Fox(2006)]{FoxCar} +J.~Fox. +\newblock \emph{car: Companion to Applied Regression}, 2006. +\newblock URL \url{http://socserv.socsci.mcmaster.ca/jfox/}. +\newblock R package version 1.1-1. + +\bibitem[Warnes.(2006)]{WarnesGdata} +G.~R. Warnes. +\newblock \emph{gdata: Various R programming tools for data manipulation}, + 2006. +\newblock URL + \url{http://cran.r-project.org/src/contrib/Descriptions/gdata.html}. +\newblock R package version 2.3.1. Includes R source code and/or documentation + contributed by Ben Bolker, Gregor Gorjanc and Thomas Lumley. + +\end{thebibliography} + +\address{Gregor Gorjanc\\ + University of Ljubljana, Slovenia\\ +\email{gre...@bf...}} + +\end{article} + +\end{document} This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site. |