Thread: [R-gregmisc-users] SF.net SVN: r-gregmisc: [1151] trunk/gdata/inst/doc/mapLevels.tex

SourceForge Headquarters 225 Broadway Suite 1600 San Diego, CA 92101 +1 (858) 422-6466

Revision: 1151
          http://r-gregmisc.svn.sourceforge.net/r-gregmisc/?rev=1151&view=rev
Author:   ggorjan
Date:     2007-08-20 03:18:30 -0700 (Mon, 20 Aug 2007)

Log Message:
-----------
clean

Removed Paths:
-------------
    trunk/gdata/inst/doc/mapLevels.tex

Deleted: trunk/gdata/inst/doc/mapLevels.tex
===================================================================

--- trunk/gdata/inst/doc/mapLevels.tex	2007-08-20 10:17:37 UTC (rev 1150)
+++ trunk/gdata/inst/doc/mapLevels.tex	2007-08-20 10:18:30 UTC (rev 1151)
@@ -1,329 +0,0 @@
-
-%\VignetteIndexEntry{Mapping levels of a factor}
-%\VignettePackage{gdata}
-%\VignetteKeywords{levels, factor, manip}
-
-\documentclass[a4paper]{report}
-\usepackage{Rnews}
-\usepackage[round]{natbib}
-\bibliographystyle{abbrvnat}
-
-\usepackage{Sweave}
-
-
-\begin{document}
-
-\begin{article}
-
-\title{Mapping levels of a factor}
-\subtitle{The \pkg{gdata} package}
-\author{by Gregor Gorjanc}
-
-\maketitle
-
-\section{Introduction}
-
-Factors use levels attribute to store information on mapping between
-internal integer codes and character values i.e. levels. First level is
-mapped to internal integer code 1 and so on. Although some users do not
-like factors, their use is more efficient in terms of storage than for
-character vectors. Additionally, there are many functions in base \R{} that
-provide additional value for factors. Sometimes users need to work with
-internal integer codes and mapping them back to factor, especially when
-interfacing external programs. Mapping information is also of interest if
-there are many factors that should have the same set of levels. This note
-describes \code{mapLevels} function, which is an utility function for
-mapping the levels of a factor in \pkg{gdata} \footnote{from version 2.3.1}
-package \citep{WarnesGdata}.
-
-\section{Description with examples}
-
-Function \code{mapLevels()} is an (S3) generic function and works on
-\code{factor} and \code{character} atomic classes. It also works on
-\code{list} and \code{data.frame} objects with previously mentioned atomic
-classes. Function \code{mapLevels} produces a so called ``map'' with names
-and values. Names are levels, while values can be internal integer codes or
-(possibly other) levels. This will be clarified later on.  Class of this
-``map'' is \code{levelsMap}, if \code{x} in \code{mapLevels()} was atomic
-or \code{listLevelsMap} otherwise - for \code{list} and \code{data.frame}
-classes. The following example shows the creation and printout of such a
-``map''.
-
-\begin{Schunk}
-\begin{Sinput}
-> library(gdata)
-> (fac <- factor(c("B", "A", "Z", "D")))
-\end{Sinput}
-\begin{Soutput}
-[1] B A Z D
-Levels: A B D Z
-\end{Soutput}
-\begin{Sinput}
-> (map <- mapLevels(x=fac))
-\end{Sinput}
-\begin{Soutput}
-A B D Z 
-1 2 3 4 
-\end{Soutput}
-\end{Schunk}
-
-If we have to work with internal integer codes, we can transform factor to
-integer and still get ``back the original factor'' with ``map'' used as
-argument in \code{mapLevels<-} function as shown bellow. \code{mapLevels<-}
-is also an (S3) generic function and works on same classes as
-\code{mapLevels} plus \code{integer} atomic class.
-
-\begin{Schunk}
-\begin{Sinput}
-> (int <- as.integer(fac))
-\end{Sinput}
-\begin{Soutput}
-[1] 2 1 4 3
-\end{Soutput}
-\begin{Sinput}
-> mapLevels(x=int) <- map
-> int
-\end{Sinput}
-\begin{Soutput}
-[1] B A Z D
-Levels: A B D Z
-\end{Soutput}
-\begin{Sinput}
-> identical(fac, int)
-\end{Sinput}
-\begin{Soutput}
-[1] TRUE
-\end{Soutput}
-\end{Schunk}
-
-Internally ``map'' (\code{levelsMap} class) is a \code{list} (see bellow),
-but its print method unlists it for ease of inspection. ``Map'' from
-example has all components of length 1. This is not mandatory as
-\code{mapLevels<-} function is only a wrapper around workhorse function
-\code{levels<-} and the later can accept \code{list} with components of
-various lengths.
-
-\begin{Schunk}
-\begin{Sinput}
-> str(map)
-\end{Sinput}
-\begin{Soutput}
-List of 4
- $ A: int 1
- $ B: int 2
- $ D: int 3
- $ Z: int 4
- - attr(*, "class")= chr "levelsMap"
-\end{Soutput}
-\end{Schunk}
-
-Although not of primary importance, this ``map'' can also be used to remap
-factor levels as shown bellow.  Components ``later'' in the map take over
-the ``previous'' ones. Since this is not optimal I would rather recommend
-other approaches for ``remapping'' the levels of a \code{factor}, say
-\code{recode} in \pkg{car} package \citep{FoxCar}.
-
-\begin{Schunk}
-\begin{Sinput}
-> map[[2]] <- as.integer(c(1, 2))
-> map
-\end{Sinput}
-\begin{Soutput}
-A B B D Z 
-1 1 2 3 4 
-\end{Soutput}
-\begin{Sinput}
-> int <- as.integer(fac)
-> mapLevels(x=int) <- map
-> int
-\end{Sinput}
-\begin{Soutput}
-[1] B B Z D
-Levels: A B D Z
-\end{Soutput}
-\end{Schunk}
-
-Up to now examples showed ``map'' with internal integer codes for values
-and levels for names. I call this integer ``map''. On the other hand
-character ``map'' uses levels for values and (possibly other) levels for
-names. This feature is a bit odd at first sight, but can be used to easily
-unify levels and internal integer codes across several factors.  Imagine
-you have a factor that is for some reason split into two factors \code{f1}
-and \code{f2} and that each factor does not have all levels. This is not
-uncommon situation.
-
-\begin{Schunk}
-\begin{Sinput}
-> (f1 <- factor(c("A", "D", "C")))
-\end{Sinput}
-\begin{Soutput}
-[1] A D C
-Levels: A C D
-\end{Soutput}
-\begin{Sinput}
-> (f2 <- factor(c("B", "D", "C")))
-\end{Sinput}
-\begin{Soutput}
-[1] B D C
-Levels: B C D
-\end{Soutput}
-\end{Schunk}
-
-If we work with this factors, we need to be careful as they do not have the
-same set of levels. This can be solved with appropriately specifying
-\code{levels} argument in creation of factors i.e. \code{levels=c("A", "B",
-  "C", "D")} or with proper use of \code{levels<-} function. I say proper
-as it is very tempting to use:
-
-\begin{Schunk}
-\begin{Sinput}
-> fTest <- f1
-> levels(fTest) <- c("A", "B", "C", "D")
-> fTest
-\end{Sinput}
-\begin{Soutput}
-[1] A C B
-Levels: A B C D
-\end{Soutput}
-\end{Schunk}
-
-Above example extends set of levels, but also changes level of 2nd and 3rd
-element in \code{fTest}! Proper use of \code{levels<-} (as shown in
-\code{levels} help page) would be:
-
-\begin{Schunk}
-\begin{Sinput}
-> fTest <- f1
-> levels(fTest) <- list(A="A", B="B",
-+                       C="C", D="D")
-> fTest
-\end{Sinput}
-\begin{Soutput}
-[1] A D C
-Levels: A B C D
-\end{Soutput}
-\end{Schunk}
-
-Function \code{mapLevels} with character ``map'' can help us in such
-scenarios to unify levels and internal integer codes across several
-factors. Again the workhorse under this process is \code{levels<-} function
-from base \R{}! Function \code{mapLevels<-} just controls the assignment of
-(integer or character) ``map'' to \code{x}. Levels in \code{x} that match
-``map'' values (internal integer codes or levels) are changed to ``map''
-names (possibly other levels) as shown in \code{levels} help page. Levels
-that do not match are converted to \code{NA}. Integer ``map'' can be
-applied to \code{integer} or \code{factor}, while character ``map'' can be
-applied to \code{character} or \code{factor}. Result of \code{mapLevels<-}
-is always a \code{factor} with possibly ``remapped'' levels.
-
-To get one joint character ``map'' for several factors, we need to put
-factors in a \code{list} or \code{data.frame} and use arguments
-\code{codes=FALSE} and \code{combine=TRUE}. Such map can then be used to
-unify levels and internal integer codes.
-
-\begin{Schunk}
-\begin{Sinput}
-> (bigMap <- mapLevels(x=list(f1, f2),
-+                      codes=FALSE,
-+                      combine=TRUE))
-\end{Sinput}
-\begin{Soutput}
-  A   B   C   D 
-"A" "B" "C" "D" 
-\end{Soutput}
-\begin{Sinput}
-> mapLevels(f1) <- bigMap
-> mapLevels(f2) <- bigMap
-> f1
-\end{Sinput}
-\begin{Soutput}
-[1] A D C
-Levels: A B C D
-\end{Soutput}
-\begin{Sinput}
-> f2
-\end{Sinput}
-\begin{Soutput}
-[1] B D C
-Levels: A B C D
-\end{Soutput}
-\begin{Sinput}
-> cbind(as.character(f1), as.integer(f1),
-+       as.character(f2), as.integer(f2))
-\end{Sinput}
-\begin{Soutput}
-     [,1] [,2] [,3] [,4]
-[1,] "A"  "1"  "B"  "2" 
-[2,] "D"  "4"  "D"  "4" 
-[3,] "C"  "3"  "C"  "3" 
-\end{Soutput}
-\end{Schunk}
-
-If we do not specify \code{combine=TRUE} (which is the default behaviour)
-and \code{x} is a \code{list} or \code{data.frame}, \code{mapLevels}
-returns ``map'' of class \code{listLevelsMap}. This is internally a
-\code{list} of ``maps'' (\code{levelsMap} objects). Both
-\code{listLevelsMap} and \code{levelsMap} objects can be passed to
-\code{mapLevels<-} for \code{list}/\code{data.frame}. Recycling occurs when
-length of \code{listLevelsMap} is not the same as number of
-components/columns of a \code{list}/\code{data.frame}.
-
-Additional convenience methods are also implemented to ease the work with
-``maps'':
-
-\begin{itemize}
-
-\item \code{is.levelsMap}, \code{is.listLevelsMap}, \code{as.levelsMap} and
-  \code{as.listLevelsMap} for testing and coercion of user defined
-  ``maps'',
-
-\item \code{"["} for subsetting,
-
-\item \code{c} for combining \code{levelsMap} or \code{listLevelsMap}
-  objects; argument \code{recursive=TRUE} can be used to coerce
-  \code{listLevelsMap} to \code{levelsMap}, for example \code{c(llm1, llm2,
-    recursive=TRUE)} and
-
-\item \code{unique} and \code{sort} for \code{levelsMap}.
-
-\end{itemize}
-
-\section{Summary}
-
-Functions \code{mapLevels} and \code{mapLevels<-} can help users to map
-internal integer codes to factor levels and unify levels as well as
-internal integer codes among several factors. I welcome any comments or
-suggestions.
-
-% \bibliography{refs}
-\begin{thebibliography}{1}
-\providecommand{\natexlab}[1]{#1}
-\providecommand{\url}[1]{\texttt{#1}}
-\expandafter\ifx\csname urlstyle\endcsname\relax
-  \providecommand{\doi}[1]{doi: #1}\else
-  \providecommand{\doi}{doi: \begingroup \urlstyle{rm}\Url}\fi
-
-\bibitem[Fox(2006)]{FoxCar}
-J.~Fox.
-\newblock \emph{car: Companion to Applied Regression}, 2006.
-\newblock URL \url{http://socserv.socsci.mcmaster.ca/jfox/}.
-\newblock R package version 1.1-1.
-
-\bibitem[Warnes(2006)]{WarnesGdata}
-G.~R. Warnes.
-\newblock \emph{gdata: Various R programming tools for data manipulation},
-  2006.
-\newblock URL
-  \url{http://cran.r-project.org/src/contrib/Descriptions/gdata.html}.
-\newblock R package version 2.3.1. Includes R source code and/or documentation
-  contributed by Ben Bolker, Gregor Gorjanc and Thomas Lumley.
-
-\end{thebibliography}
-
-\address{Gregor Gorjanc\\
-  University of Ljubljana, Slovenia\\
-\email{gre...@bf...}}
-
-\end{article}
-
-\end{document}


This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site.




Thread: [R-gregmisc-users] SF.net SVN: r-gregmisc: [1151] trunk/gdata/inst/doc/mapLevels.tex

r-gregmisc-users