Revision: 3974
http://matplotlib.svn.sourceforge.net/matplotlib/?rev=3974&view=rev
Author: jdh2358
Date: 2007-10-20 14:57:26 -0700 (Sat, 20 Oct 2007)
Log Message:
-----------
finished working with files chapter
Modified Paths:
--------------
trunk/py4science/workbook/files_etc.tex
Modified: trunk/py4science/workbook/files_etc.tex
===================================================================
--- trunk/py4science/workbook/files_etc.tex 2007-10-20 21:32:59 UTC (rev 3973)
+++ trunk/py4science/workbook/files_etc.tex 2007-10-20 21:57:26 UTC (rev 3974)
@@ -202,3 +202,96 @@
since 2003}
\end{figure}
\par\end{center}
+
+
+\subsection{Loading and saving binary data}
+\label{sec:binary_data}
+
+ASCII is bloated and slow for working with large arrays, and so binary
+data should be used if performance is a consideration. To save an
+array \texttt{x} in binary form, you can use the numpy
+\texttt{tofile} method:
+
+\begin{lstlisting}
+In [16]: import numpy
+
+# create some random numbers
+In [17]: x = numpy.random.rand(5,2)
+
+In [19]: print x
+[[ 0.56331918 0.519582 ]
+ [ 0.22685429 0.18371135]
+ [ 0.19384767 0.27367054]
+ [ 0.35935445 0.95795884]
+ [ 0.37646642 0.14431089]]
+
+# save it to a data file in binary
+In [20]: x.tofile(file('myx.dat', 'wb'))
+
+# load it into a new array
+In [21]: y = numpy.fromfile(file('myx.dat', 'rb'))
+
+# the shape is not preserved, so we will have to reshape it
+In [22]: print y
+[ 0.56331918 0.519582 0.22685429 0.18371135 0.19384767
+0.27367054
+ 0.35935445 0.95795884 0.37646642 0.14431089]
+
+In [23]: y.shape
+Out[23]: (10,)
+
+# restore the original shape
+In [24]: y.shape = 5, 2
+
+In [25]: print y
+[[ 0.56331918 0.519582 ]
+ [ 0.22685429 0.18371135]
+ [ 0.19384767 0.27367054]
+ [ 0.35935445 0.95795884]
+ [ 0.37646642 0.14431089]]
+\end{lstlisting}
+
+The advantage of numpy \texttt{tofile} and \texttt{fromfile} over
+ASCII data is that the data storage is compact and the reads and
+writes are very fast. The drawback is that metadata like the array
+datatype and shape are not stored. In this format, just the raw binary
+numeric data is stored, so you will have to keep track of the data
+type and shape by other means. This is a good solution if you need to
+port binary data files between different packages, but if you know you
+will always be working in python, you can use the python pickle
+module to preserve all metadata (pickle also works with all standard
+python data types, but has the disadvantage that other programs and
+applications cannot easily read it):
+
+\begin{lstlisting}
+# create a 6,3 array of random integers
+In [36]: x = (256*numpy.random.rand(6,3)).astype(int)
+
+In [37]: print x
+[[173 38 2]
+ [243 207 155]
+ [127 62 140]
+ [ 46 29 98]
+ [ 0 46 156]
+ [ 20 177 36]]
+
+# use pickle to save the data to a file myint.dat
+In [38]: import cPickle
+
+In [39]: cPickle.dump(x, file('myint.dat', 'wb'))
+
+# load the data into a new array
+In [40]: y = cPickle.load(file('myint.dat', 'rb'))
+
+# the array type and shape are preserved
+In [41]: print y
+[[173 38 2]
+ [243 207 155]
+ [127 62 140]
+ [ 46 29 98]
+ [ 0 46 156]
+ [ 20 177 36]]
+\end{lstlisting}
+
+
+
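Note on the two approaches in this chapter: the contrast between raw binary files (no metadata) and pickle (metadata preserved) can be sketched as a single round trip. This is a minimal illustration, not part of the committed file. It assumes Python 3 and a current numpy, so it uses `pickle` and `open` rather than the `cPickle` and `file` calls in the listings above. The file names, the temporary directory, and the deterministic `arange` data are choices made only for this sketch.

```python
import os
import pickle
import tempfile

import numpy as np

# Use a temporary directory that is removed when the block exits.
with tempfile.TemporaryDirectory() as tmpdir:
    raw_path = os.path.join(tmpdir, "myx.dat")    # hypothetical file name
    pkl_path = os.path.join(tmpdir, "myint.pkl")  # hypothetical file name

    # Raw binary: only the bytes are written, so the dtype and shape
    # must be supplied again by hand when reading the file back.
    x = np.arange(10, dtype=np.float64).reshape(5, 2)
    x.tofile(raw_path)

    y = np.fromfile(raw_path, dtype=np.float64)  # comes back as a flat array
    flat_shape = y.shape
    y = y.reshape(5, 2)                          # restore the known shape
    raw_ok = np.array_equal(x, y)

    # Pickle: dtype and shape travel with the data.
    xi = np.arange(18, dtype=np.int32).reshape(6, 3)
    with open(pkl_path, "wb") as fh:
        pickle.dump(xi, fh)
    with open(pkl_path, "rb") as fh:
        yi = pickle.load(fh)
    pickle_ok = (
        np.array_equal(xi, yi)
        and yi.dtype == xi.dtype
        and yi.shape == xi.shape
    )

print(flat_shape, raw_ok, pickle_ok)
```

In the raw case the reader has to know `float64` and `(5, 2)` in advance. Reading with the wrong dtype would not raise an error, it would silently produce different numbers, which is why the chapter recommends tracking that information separately.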