r-gregmisc-users Mailing List for R gregmisc package (Page 39)


Revision: 1034
          http://svn.sourceforge.net/r-gregmisc/?rev=1034&view=rev
Author:   ggorjan
Date:     2006-11-30 13:36:40 -0800 (Thu, 30 Nov 2006)

Log Message:
-----------
description of mapLevels methods

Added Paths:
-----------
    trunk/gdata/inst/doc/mapLevels.pdf
    trunk/gdata/inst/doc/mapLevels.tex

Added: trunk/gdata/inst/doc/mapLevels.pdf
===================================================================
(Binary files differ)


Property changes on: trunk/gdata/inst/doc/mapLevels.pdf
___________________________________________________________________
Name: svn:mime-type
   + application/octet-stream

Added: trunk/gdata/inst/doc/mapLevels.tex
===================================================================

--- trunk/gdata/inst/doc/mapLevels.tex	                        (rev 0)
+++ trunk/gdata/inst/doc/mapLevels.tex	2006-11-30 21:36:40 UTC (rev 1034)
@@ -0,0 +1,255 @@
+\documentclass[a4paper]{report}
+\usepackage{Rnews}
+\usepackage[round]{natbib}
+\bibliographystyle{abbrvnat}
+
+\begin{document}
+
+\begin{article}
+
+\title{Mapping levels of a factor}
+\subtitle{The gdata package}
+\author{by Gregor Gorjanc}
+
+\maketitle
+
+\section{Introduction}
+
+Factors use the \code{levels} attribute to store the mapping between
+internal integer codes and character values, i.e., levels. The first level
+is mapped to internal integer code 1, the second to code 2, and so on.
+Although some users dislike factors, they are usually more efficient in
+terms of storage than character vectors. Additionally, many functions in
+base \R{} provide extra functionality for factors. Sometimes users need to
+work with the internal integer codes and map them back to a factor,
+especially when interfacing with external programs. Mapping information is
+also of interest when several factors should share the same set of levels.
+This note describes the \code{mapLevels} function, a utility function for
+mapping the levels of a factor, available in the \pkg{gdata}
+package\footnote{From version 2.3.1 onwards.} \citep{WarnesGdata}.
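+
+The mapping between codes and levels can be inspected with base \R{}
+alone. The following minimal sketch does not use \pkg{gdata}, and the
+object name \code{fac} is only illustrative. It shows that indexing the
+levels with the internal integer codes recovers the character values.
+
+\begin{smallverbatim}
+## the i-th level corresponds to internal integer code i
+fac <- factor(c("B", "A", "Z", "D"))
+levels(fac)       # "A" "B" "D" "Z"
+as.integer(fac)   # 2 1 4 3
+## indexing the levels by the codes gives back the character values
+stopifnot(identical(levels(fac)[as.integer(fac)], as.character(fac)))
+\end{smallverbatim}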
+
+\section{Description with examples}
+
+The function \code{mapLevels()} is an (S3) generic function and works on
+the \code{factor} and \code{character} atomic classes. It also works on
+\code{list} and \code{data.frame} objects whose components are of these
+classes. The function \code{mapLevels} produces a so-called ``map'' with
+names and values. Names are levels, while values can be internal integer
+codes or (possibly other) levels; this will be clarified later on. The
+class of this ``map'' is \code{levelsMap} if \code{x} in \code{mapLevels()}
+is atomic, and \code{listLevelsMap} otherwise, i.e., for the \code{list}
+and \code{data.frame} classes. The following example shows the creation
+and printout of such a ``map''.
+
+\begin{smallverbatim}
+> library(gdata)
+> (fac <- factor(c("B", "A", "Z", "D")))
+[1] B A Z D
+Levels: A B D Z
+> (map <- mapLevels(x=fac))
+A B D Z
+1 2 3 4
+\end{smallverbatim}
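+
+Judging from the \code{str} output shown later, a ``map'' of class
+\code{levelsMap} is a named \code{list} of integer codes. Assuming that,
+a plain list with the same content can be built in base \R{}. This is only
+an illustration under that assumption, not how \code{mapLevels} is
+implemented, and \code{mapList} is an illustrative name.
+
+\begin{smallverbatim}
+fac <- factor(c("B", "A", "Z", "D"))
+## names are the levels, values the internal integer codes;
+## a plain list only, without the levelsMap class and its print method
+mapList <- setNames(as.list(seq_along(levels(fac))), levels(fac))
+unlist(mapList)   # A=1, B=2, D=3, Z=4, as in the printed map above
+\end{smallverbatim}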
+
+If we have to work with internal integer codes, we can convert the factor
+to integer and still get back the original factor by passing the ``map''
+to the \code{mapLevels<-} function, as shown below. \code{mapLevels<-} is
+also an (S3) generic function and works on the same classes as
+\code{mapLevels} plus the \code{integer} atomic class.
+
+\begin{smallverbatim}
+> (int <- as.integer(fac))
+[1] 2 1 4 3
+> mapLevels(x=int) <- map
+> int
+[1] B A Z D
+Levels: A B D Z
+> identical(fac, int)
+[1] TRUE
+\end{smallverbatim}
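+
+For comparison, the same round trip can be written in base \R{} when the
+levels are kept alongside the codes. This minimal sketch does not use
+\pkg{gdata}; it assumes that every code in \code{int} is a valid position
+in the stored levels, and \code{levs} and \code{back} are illustrative
+names.
+
+\begin{smallverbatim}
+fac <- factor(c("B", "A", "Z", "D"))
+int <- as.integer(fac)            # 2 1 4 3
+levs <- levels(fac)               # keep the levels next to the codes
+## rebuild the factor from the codes and the stored levels
+back <- factor(levs[int], levels = levs)
+stopifnot(identical(back, fac))
+\end{smallverbatim}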
+
+Internally, a ``map'' (class \code{levelsMap}) is a \code{list} (see
+below), but its print method unlists it for ease of inspection. All
+components of the ``map'' in the example have length 1. This is not
+mandatory, as \code{mapLevels<-} is only a wrapper around the workhorse
+function \code{levels<-}, and the latter accepts a \code{list} with
+components of various lengths.
+
+\begin{smallverbatim}
+> str(map)
+List of 4
+ $ A: int 1
+ $ B: int 2
+ $ D: int 3
+ $ Z: int 4
+ - attr(*, "class")= chr "levelsMap"
+\end{smallverbatim}
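+
+Since \code{levels<-} does the actual work, the effect of list components
+of different lengths can be seen directly in base \R{}. In this minimal
+sketch, the factor \code{f} and the new levels \code{X} and \code{Y} are
+only illustrative; the component \code{X} collects two old levels.
+
+\begin{smallverbatim}
+f <- factor(c("a", "b", "c", "a"))
+## names give the new levels, values the old levels mapped to them
+levels(f) <- list(X = c("a", "b"), Y = "c")
+f   # X X Y X, with levels X Y
+\end{smallverbatim}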
+
+Although this is not its primary purpose, a ``map'' can also be used to
+remap factor levels, as shown below. Components later in the ``map'' take
+precedence over earlier ones. Since this is not optimal, I would rather
+recommend other approaches for remapping the levels of a \code{factor},
+such as \code{recode} in the \pkg{car} package \citep{FoxCar}.
+
+\begin{smallverbatim}
+> map[[2]] <- as.integer(c(1, 2))
+> map
+A B B D Z
+1 1 2 3 4
+> int <- as.integer(fac)
+> mapLevels(x=int) <- map
+> int
+[1] B B Z D
+Levels: A B D Z
+\end{smallverbatim}
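+
+The precedence rule comes from \code{levels<-} itself. In the following
+base \R{} sketch (the names are only illustrative), the old level
+\code{"x"} appears in both components and the later component \code{B}
+wins. Note that \code{A} is kept as an empty level, just as \code{A} is
+kept in the output above.
+
+\begin{smallverbatim}
+f <- factor(c("x", "y"))
+## "x" is listed under both A and B; the later component B takes over
+levels(f) <- list(A = "x", B = c("x", "y"))
+f   # B B, with levels A B
+\end{smallverbatim}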
+
+So far, the examples have shown ``maps'' with internal integer codes as
+values and levels as names; I call this an integer ``map''. A character
+``map'', on the other hand, uses levels as values and (possibly other)
+levels as names. This feature looks a bit odd at first sight, but it can
+be used to unify levels and internal integer codes across several factors.
+Imagine a factor that has, for some reason, been split into two factors,
+\code{f1} and \code{f2}, neither of which contains all the levels. This is
+not an uncommon situation.
+
+\begin{smallverbatim}
+> (f1 <- factor(c("A", "D", "C")))
+[1] A D C
+Levels: A C D
+> (f2 <- factor(c("B", "D", "C")))
+[1] B D C
+Levels: B C D
+\end{smallverbatim}
+
+When working with these factors, we need to be careful, as they do not
+have the same set of levels. This can be solved by specifying the
+\code{levels} argument when creating the factors, i.e.,
+\code{levels=c("A", "B", "C", "D")}, or by proper use of the
+\code{levels<-} function. I say proper because it is very tempting to use:
+
+\begin{smallverbatim}
+> fTest <- f1
+> levels(fTest) <- c("A", "B", "C", "D")
+> fTest
+[1] A C B
+Levels: A B C D
+\end{smallverbatim}
+
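+The above example extends the set of levels, but it also changes the
+values of the 2nd and 3rd elements of \code{fTest}: a character vector
+replaces the old levels by position, so the old levels \code{A}, \code{C}
+and \code{D} are renamed to \code{A}, \code{B} and \code{C}, and \code{D}
+is appended as a new, unused level. The proper use of \code{levels<-} (as
+shown on the \code{levels} help page) is: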
+
+\begin{smallverbatim}
+> fTest <- f1
+> levels(fTest) <- list(A="A", B="B", C="C", D="D")
+> fTest
+[1] A D C
+Levels: A B C D
+\end{smallverbatim}
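+
+The first remedy mentioned above, specifying the \code{levels} argument
+when the factors are created, looks as follows in base \R{}. The names
+\code{allLevels}, \code{g1} and \code{g2} are illustrative and chosen so as
+not to overwrite \code{f1} and \code{f2}. With a common set of levels, the
+same level gets the same internal integer code in both factors.
+
+\begin{smallverbatim}
+allLevels <- c("A", "B", "C", "D")
+g1 <- factor(c("A", "D", "C"), levels = allLevels)
+g2 <- factor(c("B", "D", "C"), levels = allLevels)
+as.integer(g1)   # 1 4 3
+as.integer(g2)   # 2 4 3
+\end{smallverbatim}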
+
+In such scenarios, \code{mapLevels} with a character ``map'' can help us
+unify levels and internal integer codes across several factors. Again, the
+workhorse behind this process is the \code{levels<-} function from base
+\R{}; \code{mapLevels<-} merely controls the assignment of an (integer or
+character) ``map'' to \code{x}. Levels in \code{x} that match ``map''
+values (internal integer codes or levels) are changed to the corresponding
+``map'' names (possibly other levels), as described on the \code{levels}
+help page. Levels that do not match are converted to \code{NA}. An integer
+``map'' can be applied to an \code{integer} or a \code{factor}, while a
+character ``map'' can be applied to a \code{character} or a \code{factor}.
+The result of \code{mapLevels<-} is always a \code{factor}, possibly with
+``remapped'' levels.
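+
+The rule that unmatched levels become \code{NA} can likewise be checked
+with \code{levels<-} alone. In this minimal base \R{} sketch (the names
+are only illustrative), the old level \code{"C"} matches no value in the
+list.
+
+\begin{smallverbatim}
+f <- factor(c("A", "B", "C"))
+## "C" does not appear among the list values, so it is lost
+levels(f) <- list(A = "A", B = "B")
+f   # A B <NA>, with levels A B
+\end{smallverbatim}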
+
+To get one joint character ``map'' for several factors, we need to put the
+factors in a \code{list} or \code{data.frame} and use the arguments
+\code{codes=FALSE} and \code{combine=TRUE}. Such a ``map'' can then be
+used to unify levels and internal integer codes.
+
+\begin{smallverbatim}
+> (bigMap <- mapLevels(x=list(f1, f2), codes=FALSE,
++                      combine=TRUE))
+  A   B   C   D
+"A" "B" "C" "D"
+> mapLevels(f1) <- bigMap
+> mapLevels(f2) <- bigMap
+> f1
+[1] A D C
+Levels: A B C D
+> f2
+[1] B D C
+Levels: A B C D
+> cbind(as.character(f1), as.integer(f1),
++       as.character(f2), as.integer(f2))
+     [,1] [,2] [,3] [,4]
+[1,] "A"  "1"  "B"  "2"
+[2,] "D"  "4"  "D"  "4"
+[3,] "C"  "3"  "C"  "3"
+\end{smallverbatim}
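+
+Roughly the same result can be obtained by hand with \code{levels<-}, by
+building a character ``map'' from the sorted union of all levels. This is
+a base \R{} sketch, not the \pkg{gdata} implementation. It assumes that
+the joint levels should be sorted, as they are in \code{bigMap} above, and
+\code{charMap} is an illustrative name.
+
+\begin{smallverbatim}
+f1 <- factor(c("A", "D", "C"))
+f2 <- factor(c("B", "D", "C"))
+## sorted union of the levels of both factors
+allLevels <- sort(unique(c(levels(f1), levels(f2))))
+## character "map": names are new levels, values are old levels
+charMap <- setNames(as.list(allLevels), allLevels)
+levels(f1) <- charMap
+levels(f2) <- charMap
+cbind(as.integer(f1), as.integer(f2))   # rows: 1 2, 4 4, 3 3
+\end{smallverbatim}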
+
+If we do not specify \code{combine=TRUE} (not combining is the default
+behaviour) and \code{x} is a \code{list} or \code{data.frame},
+\code{mapLevels} returns a ``map'' of class \code{listLevelsMap}.
+Internally, this is a \code{list} of ``maps'' (\code{levelsMap} objects).
+Both \code{listLevelsMap} and \code{levelsMap} objects can be passed to
+\code{mapLevels<-} for a \code{list} or \code{data.frame}. Recycling
+occurs when the length of a \code{listLevelsMap} differs from the number of
+components (columns) of the \code{list} (\code{data.frame}).
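+
+The recycling rule can be imitated in base \R{} with a small loop over the
+components. The helper \code{applyMaps} below is hypothetical and not part
+of \pkg{gdata}; it only illustrates the idea, with list-style maps passed
+to \code{levels<-} and recycled over the columns of a \code{data.frame}.
+
+\begin{smallverbatim}
+## hypothetical helper: apply maps to the components of x,
+## recycling the maps when there are fewer maps than components
+applyMaps <- function(x, maps) {
+  for (i in seq_along(x)) {
+    levels(x[[i]]) <- maps[[(i - 1L) %% length(maps) + 1L]]
+  }
+  x
+}
+df <- data.frame(f1 = factor(c("A", "D", "C")),
+                 f2 = factor(c("B", "D", "C")))
+oneMap <- list(A = "A", B = "B", C = "C", D = "D")
+df <- applyMaps(df, list(oneMap))   # one map, recycled to both columns
+\end{smallverbatim}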
+
+Additional convenience methods are implemented to ease working with
+``maps'':
+
+\begin{itemize}
+
+\item \code{is.levelsMap}, \code{is.listLevelsMap}, \code{as.levelsMap} and
+  \code{as.listLevelsMap} for testing and coercion of user-defined
+  ``maps'',
+
+\item \code{"["} for subsetting,
+
+\item \code{c} for combining \code{levelsMap} or \code{listLevelsMap}
+  objects; the argument \code{recursive=TRUE} can be used to coerce
+  \code{listLevelsMap} to \code{levelsMap}, for example
+  \code{c(llm1, llm2, recursive=TRUE)}, and
+
+\item \code{unique} and \code{sort} for \code{levelsMap}.
+
+\end{itemize}
+
+\section{Summary}
+
+The functions \code{mapLevels} and \code{mapLevels<-} help users map
+internal integer codes to factor levels and unify both levels and internal
+integer codes across several factors. I welcome any comments or
+suggestions.
+
+% \bibliography{refs}
+\begin{thebibliography}{1}
+\providecommand{\natexlab}[1]{#1}
+\providecommand{\url}[1]{\texttt{#1}}
+\expandafter\ifx\csname urlstyle\endcsname\relax
+  \providecommand{\doi}[1]{doi: #1}\else
+  \providecommand{\doi}{doi: \begingroup \urlstyle{rm}\Url}\fi
+
+\bibitem[Fox(2006)]{FoxCar}
+J.~Fox.
+\newblock \emph{car: Companion to Applied Regression}, 2006.
+\newblock URL \url{http://socserv.socsci.mcmaster.ca/jfox/}.
+\newblock R package version 1.1-1.
+
+\bibitem[Warnes(2006)]{WarnesGdata}
+G.~R. Warnes.
+\newblock \emph{gdata: Various R programming tools for data manipulation},
+  2006.
+\newblock URL
+  \url{http://cran.r-project.org/src/contrib/Descriptions/gdata.html}.
+\newblock R package version 2.3.1. Includes R source code and/or documentation
+  contributed by Ben Bolker, Gregor Gorjanc and Thomas Lumley.
+
+\end{thebibliography}
+
+\address{Gregor Gorjanc\\
+  University of Ljubljana, Slovenia\\
+\email{gre...@bf...}}
+
+\end{article}
+
+\end{document}

r-gregmisc-users — Discussion list for R gregmisc packages (gplot, gmodels, gdata, gtools)