Thread: [R-gregmisc-users] SF.net SVN: r-gregmisc: [1151] trunk/gdata/inst/doc/mapLevels.tex
From: <gg...@us...> - 2007-08-20 10:18:34
Revision: 1151
          http://r-gregmisc.svn.sourceforge.net/r-gregmisc/?rev=1151&view=rev
Author:   ggorjan
Date:     2007-08-20 03:18:30 -0700 (Mon, 20 Aug 2007)

Log Message:
-----------
clean

Removed Paths:
-------------
    trunk/gdata/inst/doc/mapLevels.tex

Deleted: trunk/gdata/inst/doc/mapLevels.tex
===================================================================
--- trunk/gdata/inst/doc/mapLevels.tex 2007-08-20 10:17:37 UTC (rev 1150)
+++ trunk/gdata/inst/doc/mapLevels.tex 2007-08-20 10:18:30 UTC (rev 1151)
@@ -1,329 +0,0 @@
-
-%\VignetteIndexEntry{Mapping levels of a factor}
-%\VignettePackage{gdata}
-%\VignetteKeywords{levels, factor, manip}
-
-\documentclass[a4paper]{report}
-\usepackage{Rnews}
-\usepackage[round]{natbib}
-\bibliographystyle{abbrvnat}
-
-\usepackage{Sweave}
-
-
-\begin{document}
-
-\begin{article}
-
-\title{Mapping levels of a factor}
-\subtitle{The \pkg{gdata} package}
-\author{by Gregor Gorjanc}
-
-\maketitle
-
-\section{Introduction}
-
-Factors use levels attribute to store information on mapping between
-internal integer codes and character values i.e. levels. First level is
-mapped to internal integer code 1 and so on. Although some users do not
-like factors, their use is more efficient in terms of storage than for
-character vectors. Additionally, there are many functions in base \R{} that
-provide additional value for factors. Sometimes users need to work with
-internal integer codes and mapping them back to factor, especially when
-interfacing external programs. Mapping information is also of interest if
-there are many factors that should have the same set of levels. This note
-describes \code{mapLevels} function, which is an utility function for
-mapping the levels of a factor in \pkg{gdata} \footnote{from version 2.3.1}
-package \citep{WarnesGdata}.
-
-\section{Description with examples}
-
-Function \code{mapLevels()} is an (S3) generic function and works on
-\code{factor} and \code{character} atomic classes. It also works on
-\code{list} and \code{data.frame} objects with previously mentioned atomic
-classes. Function \code{mapLevels} produces a so called ``map'' with names
-and values. Names are levels, while values can be internal integer codes or
-(possibly other) levels. This will be clarified later on. Class of this
-``map'' is \code{levelsMap}, if \code{x} in \code{mapLevels()} was atomic
-or \code{listLevelsMap} otherwise - for \code{list} and \code{data.frame}
-classes. The following example shows the creation and printout of such a
-``map''.
-
-\begin{Schunk}
-\begin{Sinput}
-> library(gdata)
-> (fac <- factor(c("B", "A", "Z", "D")))
-\end{Sinput}
-\begin{Soutput}
-[1] B A Z D
-Levels: A B D Z
-\end{Soutput}
-\begin{Sinput}
-> (map <- mapLevels(x=fac))
-\end{Sinput}
-\begin{Soutput}
-A B D Z
-1 2 3 4
-\end{Soutput}
-\end{Schunk}
-
-If we have to work with internal integer codes, we can transform factor to
-integer and still get ``back the original factor'' with ``map'' used as
-argument in \code{mapLevels<-} function as shown bellow. \code{mapLevels<-}
-is also an (S3) generic function and works on same classes as
-\code{mapLevels} plus \code{integer} atomic class.
-
-\begin{Schunk}
-\begin{Sinput}
-> (int <- as.integer(fac))
-\end{Sinput}
-\begin{Soutput}
-[1] 2 1 4 3
-\end{Soutput}
-\begin{Sinput}
-> mapLevels(x=int) <- map
-> int
-\end{Sinput}
-\begin{Soutput}
-[1] B A Z D
-Levels: A B D Z
-\end{Soutput}
-\begin{Sinput}
-> identical(fac, int)
-\end{Sinput}
-\begin{Soutput}
-[1] TRUE
-\end{Soutput}
-\end{Schunk}
-
-Internally ``map'' (\code{levelsMap} class) is a \code{list} (see bellow),
-but its print method unlists it for ease of inspection. ``Map'' from
-example has all components of length 1. This is not mandatory as
-\code{mapLevels<-} function is only a wrapper around workhorse function
-\code{levels<-} and the later can accept \code{list} with components of
-various lengths.
-
-\begin{Schunk}
-\begin{Sinput}
-> str(map)
-\end{Sinput}
-\begin{Soutput}
-List of 4
- $ A: int 1
- $ B: int 2
- $ D: int 3
- $ Z: int 4
- - attr(*, "class")= chr "levelsMap"
-\end{Soutput}
-\end{Schunk}
-
-Although not of primary importance, this ``map'' can also be used to remap
-factor levels as shown bellow. Components ``later'' in the map take over
-the ``previous'' ones. Since this is not optimal I would rather recommend
-other approaches for ``remapping'' the levels of a \code{factor}, say
-\code{recode} in \pkg{car} package \citep{FoxCar}.
-
-\begin{Schunk}
-\begin{Sinput}
-> map[[2]] <- as.integer(c(1, 2))
-> map
-\end{Sinput}
-\begin{Soutput}
-A B B D Z
-1 1 2 3 4
-\end{Soutput}
-\begin{Sinput}
-> int <- as.integer(fac)
-> mapLevels(x=int) <- map
-> int
-\end{Sinput}
-\begin{Soutput}
-[1] B B Z D
-Levels: A B D Z
-\end{Soutput}
-\end{Schunk}
-
-Up to now examples showed ``map'' with internal integer codes for values
-and levels for names. I call this integer ``map''. On the other hand
-character ``map'' uses levels for values and (possibly other) levels for
-names. This feature is a bit odd at first sight, but can be used to easily
-unify levels and internal integer codes across several factors. Imagine
-you have a factor that is for some reason split into two factors \code{f1}
-and \code{f2} and that each factor does not have all levels. This is not
-uncommon situation.
-
-\begin{Schunk}
-\begin{Sinput}
-> (f1 <- factor(c("A", "D", "C")))
-\end{Sinput}
-\begin{Soutput}
-[1] A D C
-Levels: A C D
-\end{Soutput}
-\begin{Sinput}
-> (f2 <- factor(c("B", "D", "C")))
-\end{Sinput}
-\begin{Soutput}
-[1] B D C
-Levels: B C D
-\end{Soutput}
-\end{Schunk}
-
-If we work with this factors, we need to be careful as they do not have the
-same set of levels. This can be solved with appropriately specifying
-\code{levels} argument in creation of factors i.e. \code{levels=c("A", "B",
- "C", "D")} or with proper use of \code{levels<-} function. I say proper
-as it is very tempting to use:
-
-\begin{Schunk}
-\begin{Sinput}
-> fTest <- f1
-> levels(fTest) <- c("A", "B", "C", "D")
-> fTest
-\end{Sinput}
-\begin{Soutput}
-[1] A C B
-Levels: A B C D
-\end{Soutput}
-\end{Schunk}
-
-Above example extends set of levels, but also changes level of 2nd and 3rd
-element in \code{fTest}! Proper use of \code{levels<-} (as shown in
-\code{levels} help page) would be:
-
-\begin{Schunk}
-\begin{Sinput}
-> fTest <- f1
-> levels(fTest) <- list(A="A", B="B",
-+ C="C", D="D")
-> fTest
-\end{Sinput}
-\begin{Soutput}
-[1] A D C
-Levels: A B C D
-\end{Soutput}
-\end{Schunk}
-
-Function \code{mapLevels} with character ``map'' can help us in such
-scenarios to unify levels and internal integer codes across several
-factors. Again the workhorse under this process is \code{levels<-} function
-from base \R{}! Function \code{mapLevels<-} just controls the assignment of
-(integer or character) ``map'' to \code{x}. Levels in \code{x} that match
-``map'' values (internal integer codes or levels) are changed to ``map''
-names (possibly other levels) as shown in \code{levels} help page. Levels
-that do not match are converted to \code{NA}. Integer ``map'' can be
-applied to \code{integer} or \code{factor}, while character ``map'' can be
-applied to \code{character} or \code{factor}. Result of \code{mapLevels<-}
-is always a \code{factor} with possibly ``remapped'' levels.
-
-To get one joint character ``map'' for several factors, we need to put
-factors in a \code{list} or \code{data.frame} and use arguments
-\code{codes=FALSE} and \code{combine=TRUE}. Such map can then be used to
-unify levels and internal integer codes.
-
-\begin{Schunk}
-\begin{Sinput}
-> (bigMap <- mapLevels(x=list(f1, f2),
-+ codes=FALSE,
-+ combine=TRUE))
-\end{Sinput}
-\begin{Soutput}
- A B C D
-"A" "B" "C" "D"
-\end{Soutput}
-\begin{Sinput}
-> mapLevels(f1) <- bigMap
-> mapLevels(f2) <- bigMap
-> f1
-\end{Sinput}
-\begin{Soutput}
-[1] A D C
-Levels: A B C D
-\end{Soutput}
-\begin{Sinput}
-> f2
-\end{Sinput}
-\begin{Soutput}
-[1] B D C
-Levels: A B C D
-\end{Soutput}
-\begin{Sinput}
-> cbind(as.character(f1), as.integer(f1),
-+ as.character(f2), as.integer(f2))
-\end{Sinput}
-\begin{Soutput}
- [,1] [,2] [,3] [,4]
-[1,] "A" "1" "B" "2"
-[2,] "D" "4" "D" "4"
-[3,] "C" "3" "C" "3"
-\end{Soutput}
-\end{Schunk}
-
-If we do not specify \code{combine=TRUE} (which is the default behaviour)
-and \code{x} is a \code{list} or \code{data.frame}, \code{mapLevels}
-returns ``map'' of class \code{listLevelsMap}. This is internally a
-\code{list} of ``maps'' (\code{levelsMap} objects). Both
-\code{listLevelsMap} and \code{levelsMap} objects can be passed to
-\code{mapLevels<-} for \code{list}/\code{data.frame}. Recycling occurs when
-length of \code{listLevelsMap} is not the same as number of
-components/columns of a \code{list}/\code{data.frame}.
-
-Additional convenience methods are also implemented to ease the work with
-``maps'':
-
-\begin{itemize}
-
-\item \code{is.levelsMap}, \code{is.listLevelsMap}, \code{as.levelsMap} and
- \code{as.listLevelsMap} for testing and coercion of user defined
- ``maps'',
-
-\item \code{"["} for subsetting,
-
-\item \code{c} for combining \code{levelsMap} or \code{listLevelsMap}
- objects; argument \code{recursive=TRUE} can be used to coerce
- \code{listLevelsMap} to \code{levelsMap}, for example \code{c(llm1, llm2,
- recursive=TRUE)} and
-
-\item \code{unique} and \code{sort} for \code{levelsMap}.
-
-\end{itemize}
-
-\section{Summary}
-
-Functions \code{mapLevels} and \code{mapLevels<-} can help users to map
-internal integer codes to factor levels and unify levels as well as
-internal integer codes among several factors. I welcome any comments or
-suggestions.
-
-% \bibliography{refs}
-\begin{thebibliography}{1}
-\providecommand{\natexlab}[1]{#1}
-\providecommand{\url}[1]{\texttt{#1}}
-\expandafter\ifx\csname urlstyle\endcsname\relax
- \providecommand{\doi}[1]{doi: #1}\else
- \providecommand{\doi}{doi: \begingroup \urlstyle{rm}\Url}\fi
-
-\bibitem[Fox(2006)]{FoxCar}
-J.~Fox.
-\newblock \emph{car: Companion to Applied Regression}, 2006.
-\newblock URL \url{http://socserv.socsci.mcmaster.ca/jfox/}.
-\newblock R package version 1.1-1.
-
-\bibitem[Warnes(2006)]{WarnesGdata}
-G.~R. Warnes.
-\newblock \emph{gdata: Various R programming tools for data manipulation},
- 2006.
-\newblock URL
- \url{http://cran.r-project.org/src/contrib/Descriptions/gdata.html}.
-\newblock R package version 2.3.1. Includes R source code and/or documentation
- contributed by Ben Bolker, Gregor Gorjanc and Thomas Lumley.
-
-\end{thebibliography}
-
-\address{Gregor Gorjanc\\
- University of Ljubljana, Slovenia\\
-\email{gre...@bf...}}
-
-\end{article}
-
-\end{document}