[Py-howto-checkins] CVS: pyhowto/zodb links.tex,NONE,1.1 chatter.py,1.1,1.2 gfdl.tex,1.1,1.2 introduction.tex,1.1,1.2 prog-zodb.tex,1.1,1.2 zeo.tex,1.1,1.2 zodb.tex,1.1,1.2

Update of /cvsroot/py-howto/pyhowto/zodb
In directory slayer.i.sourceforge.net:/tmp/cvs-serv3293

Modified Files:
	chatter.py gfdl.tex introduction.tex prog-zodb.tex zeo.tex 
	zodb.tex 
Added Files:
	links.tex 
Log Message:
Add the text from my old ZODB/ZEO page (very roughly, so it still needs a 
    *lot* of editing)

--- NEW FILE ---
% links.tex
% Collection of relevant links

ZODB HOWTO, by Michel Pelletier:\\
Goes into slightly more detail about the rules for writing applications using the ZODB.
\\
\url{http://www.zope.org/Members/michel/HowTos/ZODB-How-To}

Introduction to the Zope Object Database, by Jim Fulton
\\
Goes into much greater detail, explaining advanced uses of the ZODB and 
how it's actually implemented.  A definitive reference, and highly recommended.
\\
\url{http://www.python.org/workshops/2000-01/proceedings/papers/fulton/zodb3.html}

Download link for ZEO \\
\url{http://www.zope.org/Products/ZEO/}

Index: chatter.py
===================================================================
RCS file: /cvsroot/py-howto/pyhowto/zodb/chatter.py,v
retrieving revision 1.1
retrieving revision 1.2
diff -C2 -r1.1 -r1.2
*** chatter.py	2000/11/08 23:55:45	1.1
--- chatter.py	2000/11/11 01:49:37	1.2
***************
*** 28,33 ****
          self.name = name

!         # Internal attribute: _messages holds all the chat messages.
!         
          self._messages = BTree.BTree()

--- 28,32 ----
          self.name = name

!         # Internal attribute: _messages holds all the chat messages.        
          self._messages = BTree.BTree()

Index: gfdl.tex
===================================================================
RCS file: /cvsroot/py-howto/pyhowto/zodb/gfdl.tex,v
retrieving revision 1.1
retrieving revision 1.2
diff -C2 -r1.1 -r1.2
*** gfdl.tex	2000/11/08 23:55:45	1.1
--- gfdl.tex	2000/11/11 01:49:37	1.2
***************
*** 1,3 ****
! % fdl.tex 
  % This file is a chapter.  It must be included in a larger document to work
  % properly.
--- 1,3 ----
! % gfdl.tex 
  % This file is a chapter.  It must be included in a larger document to work
  % properly.

Index: introduction.tex
===================================================================
RCS file: /cvsroot/py-howto/pyhowto/zodb/introduction.tex,v
retrieving revision 1.1
retrieving revision 1.2
diff -C2 -r1.1 -r1.2
*** introduction.tex	2000/11/08 23:55:45	1.1
--- introduction.tex	2000/11/11 01:49:37	1.2
***************
*** 17,21 ****
  programming languages provide facilities that automatically write
  objects to disk and read them in again when they're required by a
! running program.  

  It's certainly possible to build your own system for making Python
--- 17,21 ----
  programming languages provide facilities that automatically write
  objects to disk and read them in again when they're required by a
! running program, and the ZODB adds such facilities to Python.

  It's certainly possible to build your own system for making Python
***************
*** 35,66 ****

- 
  \subsection{OODBs vs. Relational DBs}

  Another way to look at it is that the ZODB is a Python-specific
! object-oriented database (OODB).  

  Relational databases (RDBs) are far more common than OODBs.
! Relational databases store information in tables; a table consists of 
  any number of rows, each row containing several columns of
! information.  

  Let's introduce the example that we'll be using through this document.
  The example comes from my day job working for the MEMS Exchange, in a
  greatly simplified version.  
- 
- XXX explain a bit more

! The job is to track process runs, which are lists of manufacturing  steps
! to be performed in a semiconductor fab.  A run is owned by a
  particular user, and has a name and assigned ID number.  Runs consist
  of a number of operations; an operation is a single step to be
  performed, such as depositing something on a wafer or etching
! something off it.  Operations may have parameters, which are
! additional information required to perform an operation.  For example,
! if you're depositing something on a wafer, you need to know two
! things: 1) what you're depositing, and 2) how much should be
! deposited.
! You might deposit 100 microns of silicon oxide, or 1 micron of copper.

  Mapping these structures to a relational database is straightforward:
--- 35,68 ----

  \subsection{OODBs vs. Relational DBs}

  Another way to look at it is that the ZODB is a Python-specific
! object-oriented database (OODB).  Commercial object databases for C++
! or Java often require that you jump through some hoops, using a
! special preprocessor or avoiding certain data types.  As we'll see,
! the ZODB has some hoops of its own, but in comparison the naturalness
! of the ZODB is astonishing.

  Relational databases (RDBs) are far more common than OODBs.
! Relational databases store information in tables; a table consists of
  any number of rows, each row containing several columns of
! information.

  Let's introduce the example that we'll be using through this document.
  The example comes from my day job working for the MEMS Exchange, in a
  greatly simplified version.  

! The job is to track process runs, which are lists of manufacturing
! steps to be performed in a semiconductor fab.  A run is owned by a
  particular user, and has a name and assigned ID number.  Runs consist
  of a number of operations; an operation is a single step to be
  performed, such as depositing something on a wafer or etching
! something off it.  
! 
! Operations may have parameters, which are additional information
! required to perform an operation.  For example, if you're depositing
! something on a wafer, you need to know two things: 1) what you're
! depositing, and 2) how much should be deposited.  You might deposit
! 100 microns of silicon oxide, or 1 micron of copper.

  Mapping these structures to a relational database is straightforward:
***************
*** 81,87 ****
  class Run:
      .run_id
! ... XXX finish

- If you were

  \subsection{What is ZEO?}
--- 83,92 ----
  class Run:
      .run_id
! ... XXX finish this code ...
! 
! 
! 
! XXX continue

  \subsection{What is ZEO?}

Index: prog-zodb.tex
===================================================================
RCS file: /cvsroot/py-howto/pyhowto/zodb/prog-zodb.tex,v
retrieving revision 1.1
retrieving revision 1.2
diff -C2 -r1.1 -r1.2
*** prog-zodb.tex	2000/11/08 23:55:45	1.1
--- prog-zodb.tex	2000/11/11 01:49:37	1.2
***************
*** 8,16 ****
  \section{ZODB Programming}

- \subsection{How ZODB Works}
- 
- XXX (ExtensionClass, dirty bits)
- 
- 
  \subsection{Installing ZODB}

--- 8,11 ----
***************
*** 59,64 ****
--- 54,240 ----
  problems, please let me know.

+ \subsection{How ZODB Works}

+ XXX (ExtensionClass, dirty bits)
+ 
+ The ZODB has three main interfaces: the
+ \class{Storage}, \class{DB}, and \class{Connection} classes.
+ 
+ \begin{itemize}
+  \item \class{Storage} classes are the lowest layer, and handle storing
+  and retrieving objects from some form of long-term storage.  A few
+  different types of Storage have been written, such as
+  \class{FileStorage}, which uses regular files, 
+  and \class{BerkeleyStorage}, which uses Sleepycat Software's
+  BerkeleyDB 2.7.  You could
+  write a new Storage that stored objects in a relational database or
+  Metakit file, for example, if you needed to ensure some property
+  useful to your application.  Two example storages,
+  \class{DemoStorage} and \class{MappingStorage}, are
+  available to use as models if you want to write a new Storage.
+ 
+  \item The \class{DB} class sits on top of a storage, and mediates the
+ interaction between several connections.  One \class{DB} instance
+ is created per process. 
+ 
+  \item Finally, the \class{Connection} class caches objects, and moves
+  them into and out of object storage.  A multi-threaded program can
+  open a separate \class{Connection} instance for each thread.
+ 
+ \end{itemize}
+ 
+ \subsection{Opening a ZODB}
+ 
+ Preparing to use a ZODB requires 3 steps: you have to open the
+ Storage, then create a DB instance that uses the Storage, and then get
+ a Connection from the DB instance.  All this is only a few lines of
+ code:
+ 
+ \begin{verbatim}
+ from ZODB import FileStorage, DB
+ 
+ storage = FileStorage.FileStorage('/tmp/test-filestorage.fs')
+ db = DB( storage )
+ conn = db.open()
+ \end{verbatim}
+ 
+ Note that you can use a completely different data storage mechanism
+ by changing the first line that opens a Storage; the
+ above example uses a \class{FileStorage}.  Soon you'll see how ZEO
+ uses this flexibility to good effect.
+ 
+ \subsection{Using a ZODB}
+ 
+ Making a Python class persistent is simple: the class only needs
+ to subclass \class{Persistent}, as shown in this
+ example:
+ 
+ \begin{verbatim}
+ import ZODB
+ from Persistence import Persistent
+ 
+ class User(Persistent):
+     pass
+ \end{verbatim}
+ 
+ (The apparently unnecessary \code{import ZODB} statement is
+ needed for the following \code{from...import} statement to work
+ correctly, since the ZODB code is doing some magical tricks with
+ importing.)
+ 
+ For simplicity, in the examples the \class{User} class will
+ simply be used as a holder for a bunch of attributes.  Normally the
+ class would define various methods that add functionality, but that
+ has no impact on the ZODB's treatment of the class.
+ 
+ The ZODB uses persistence by reachability; starting from
+ a set of root objects, all the attributes of those objects are made
+ persistent, whether they're simple Python data types or class instances.
+ 
+ As an example, we'll create a simple database of users that allows
+ retrieving a \class{User} object given the user's ID.  First, we
+ retrieve the primary root object of the ZODB; this object behaves like
+ a Python dictionary, so you can just add a new key/value pair for your
+ application's root object.  We'll insert a \class{BTree} object
+ that will contain all the \class{User} objects.  (The
+ \class{BTree} module is also included as part of Zope.)
+ 
+ \begin{verbatim}
+ dbroot = conn.root()
+ 
+ # Ensure that a 'userdb' key is present
+ if not dbroot.has_key('userdb'):
+     import BTree
+     dbroot['userdb'] = BTree.BTree()
+ 
+ userdb = dbroot['userdb']
+ \end{verbatim}
+ 
+ Inserting a new user is simple: create the \class{User}
+ object, fill it with data, insert it into the BTree, and commit this
+ transaction.
+ 
+ \begin{verbatim}
+ # Create new User instance
+ newuser = User() 
+ 
+ # Add whatever attributes you want to track
+ newuser.id = 'amk' 
+ newuser.first_name = 'Andrew'
+ newuser.last_name = 'Kuchling'
+ ...
+ 
+ # Add object to the BTree, keyed on the ID
+ userdb[ newuser.id ] = newuser
+ 
+ # Commit the change
+ get_transaction().commit()
+ \end{verbatim}
+ 
+ When you import the ZODB package, it adds a new function,
+ \function{get_transaction()}, to Python's collection of built-in
+ functions.  \function{get_transaction()} returns a transaction
+ object, which has two important methods: \method{commit()} and
+ \method{abort()}.  \method{commit()} writes out any modified objects
+ to disk, making the changes permanent, while
+ \method{abort()} rolls back any changes that have been made,
+ restoring the original state of the objects.  If you're familiar with
+ database transactional semantics, this is exactly what you'd expect.
+ 
+ Because the integration with Python is so complete, it's a lot like
+ having transactional semantics for your program's variables, and you
+ can experiment with transactions at the Python interpreter's prompt:
+ 
+ \begin{verbatim}
+ >>> newuser
+ <User instance at 81b1f40>
+ >>> newuser.first_name           # Print initial value
+ 'Andrew'         
+ >>> newuser.first_name = 'Bob'   # Change first name
+ >>> newuser.first_name           # Verify the change
+ 'Bob'
+ >>> get_transaction().abort()    # Abort transaction
+ >>> newuser.first_name           # The value has changed back
+ 'Andrew'
+ \end{verbatim}
+ 
+ 
  \subsection{Rules for Writing Persistent Classes}
+ 
+ The ZODB uses various Python hooks to catch attribute accesses,
+ which cover most of the ways of modifying an object, but not all of
+ them.  If you modify a \class{User} object by assigning to one of
+ its attributes, as in \code{userobj.first_name = 'Andrew'}, the
+ ZODB will mark the object as having been changed, and it'll be written
+ out on the following \method{commit()}.
+ 
+ The most common idiom that \emph{isn't} caught by the ZODB is
+ mutating a list or dictionary.  If \class{User} objects have an
+ attribute named \code{friends} containing a list, calling
+ \code{userobj.friends.append( otherUser )} doesn't mark
+ \code{userobj} as modified; from the ZODB's point of
+ view, \code{userobj.friends} was only read, and its value, which
+ happened to be an ordinary Python list, was returned.  The ZODB isn't
+ aware that the returned value was later modified.
+ 
+ This is one of the few quirks you'll have to remember when using
+ the ZODB; if you modify an attribute of an object in place, you have
+ to manually mark the object as having been modified, by 
+ setting the \code{_p_changed} attribute to true:
+ 
+ \begin{verbatim}
+ userobj.friends.append( otherUser )
+ userobj._p_changed = 1
+ \end{verbatim}
+ 
+ You can hide this implementation detail by not designing your
+ class's API to use direct attribute access; instead, you can use the
+ Java-style approach of accessor methods for everything, and then set
+ \code{_p_changed} within the accessor method.  For example, you
+ might forbid accessing the \code{friends} attribute directly,
+ and add a \method{get_friend_list()} accessor and an
+ \method{add_friend()} modifier method to the class.
+ Alternatively, you could use a ZODB-aware list or mapping type that
+ sets \code{_p_changed} for you; the ZODB includes a
+ \class{PersistentMapping} class, and I've contributed a
+ \class{PersistentList} class that may make it into a future
+ release.
+ 
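The accessor-method approach described in the persistent-classes text above can be sketched roughly as follows. This is only an illustration that runs without the ZODB installed: TrackedBase is a hypothetical stand-in that imitates the one behaviour discussed (assigning to an attribute flags the object as changed), and it is not the real Persistent class. The method names get_friend_list() and add_friend() come from the text; everything else is an assumption.

```python
class TrackedBase:
    """Hypothetical stand-in for Persistent (NOT the real ZODB class).

    It imitates one behaviour only: assigning to an attribute sets
    _p_changed, while merely reading an attribute does not.
    """

    def __setattr__(self, name, value):
        object.__setattr__(self, name, value)
        if name != '_p_changed':
            object.__setattr__(self, '_p_changed', 1)


class User(TrackedBase):
    def __init__(self):
        self._friends = []
        # Pretend the object has just been committed and is clean.
        self._p_changed = 0

    def get_friend_list(self):
        # Return a copy so callers can't mutate the list behind our back.
        return list(self._friends)

    def add_friend(self, other):
        # In-place mutation is invisible to the attribute hook,
        # so the modifier method flags the object itself.
        self._friends.append(other)
        self._p_changed = 1


u = User()

# Mutating the list directly only *reads* the attribute: no flag is set.
u._friends.append('direct')
flag_after_direct = u._p_changed

# Going through the modifier method sets the flag explicitly.
u.add_friend('via accessor')
flag_after_accessor = u._p_changed
```

With the real ZODB the same pattern would apply to a class deriving from Persistent; the point of the sketch is just that the flag has to be set by hand wherever a list or dictionary is modified in place.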

Index: zeo.tex
===================================================================
RCS file: /cvsroot/py-howto/pyhowto/zodb/zeo.tex,v
retrieving revision 1.1
retrieving revision 1.2
diff -C2 -r1.1 -r1.2
*** zeo.tex	2000/11/08 23:55:45	1.1
--- zeo.tex	2000/11/11 01:49:37	1.2
***************
*** 1,14 ****

  % ZEO
- %    How ZEO works (ClientStorage)
  %    Installing ZEO
  %    Configuring ZEO

  \section{ZEO}
- \subsection{How ZEO Works}
- XXX (ClientStorage)
-    
- \subsection{Installing ZEO}

  This package contains the Python code for ZEO, packaged to make it
--- 1,11 ----

  % ZEO
  %    Installing ZEO
+ %    How ZEO works (ClientStorage)
  %    Configuring ZEO

  \section{ZEO}

+ \subsection{Installing ZEO}

  This package contains the Python code for ZEO, packaged to make it
***************
*** 86,89 ****
  correctly.

! \subsection{Configuring ZEO}

--- 83,294 ----
  correctly.

! 
! \subsection{How ZEO Works}
! 
!  The ZODB, as I've described it so far, can only be used within a
! single Python process running on one machine.  ZEO, Zope
! Enterprise Objects, extends the ZODB machinery to provide access to
! objects over a network.  The name "Zope Enterprise Objects" is a bit
! misleading.  ZEO can be used to store Python objects, and access them
! in a distributed fashion, without Zope ever entering the picture;
! essentially the combination of ZEO and ZODB is a Python-specific
! object database.
! 
! ZEO consists of about 1400 lines of Python code.  The code is
! relatively small because it contains only code for a TCP/IP server,
! and for a new type of Storage, \class{ClientStorage}.
! \class{ClientStorage} doesn't use disk files at all; it simply
! makes remote procedure calls to the server, which then passes them on
! to a regular \class{Storage} class such as \class{FileStorage}.
! The following diagram lays out the system:
! 
! [ XXX insert diagram here ]
! 
! Any number of processes can create a \class{ClientStorage}
! instance, and any number of threads in each process can be using that
! instance.  \class{ClientStorage} aggressively caches objects
! locally, so, in order to avoid using stale data, the ZEO server sends
! an invalidate message to all the connected \class{ClientStorage}
! instances on every write operation.  The invalidate message contains
! the object ID for each object that's been modified, so the
! \class{ClientStorage} instances can delete the old data for the
! given object from their caches.
! 
! This design decision has some consequences you should be aware of.
! First, while ZEO isn't tied to Zope, it was first written for use with
! Zope, which stores HTML, images, and program code in the database.  As
! a result, reads from the database are \emph{far} more frequent than
! writes, and ZEO is therefore better suited for read-intensive
! applications.  If every \class{ClientStorage} is writing to the
! database very frequently, this will result in a storm of invalidate
! messages being sent, and this might take up more processing time than
! the actual database operations themselves. 
! 
! On the other hand, for applications that have few writes in
! comparison to the number of read accesses, this aggressive caching can
! be a major win.  Consider the job of writing a Slashdot-like
! discussion forum, where you want to divide the load among several Web
! servers.  If news items and postings are represented by objects, and
! accessed through ZEO, then the most heavily accessed objects -- the
! most recent or most popular postings -- will very quickly wind up in
! the caches of the \class{ClientStorage} instances on the
! front-end servers.  The back-end ZEO server will do relatively little
! work, only being called upon to return the occasional older posting
! that's requested, and to send the occasional invalidate message when a
! new posting is added.  The ZEO server isn't going to be contacted for
! every single request, so its workload will remain manageable.
! 
! \subsection{Configuring and Running a ZEO Server}

+ Once the code has been unpacked, the next step is to run a ZEO
+ server.  Go to the Zope installation directory and run this command:
+ 
+ \begin{verbatim}
+ python lib/python/ZEO/start.py -p 9672 /tmp/storage.fs
+ \end{verbatim}
+ 
+ This starts a ZEO server listening on TCP port 9672, and using a
+ \class{FileStorage} on top of the file
+ \file{/tmp/storage.fs}.  If you want to use a storage other than
+ \class{FileStorage}, you'll have to manually hack the code in
+ \file{start.py} to create an instance of a different class.
+ 
+ 
+ \subsection{Connecting to a ZEO Server}
+ 
+ Once a ZEO server is up and running, using it is just like using
+ ZODB with a more conventional disk-based storage.  The only difference
+ is that you'll create a \class{ClientStorage} instance instead of
+ a \class{FileStorage} instance:
+ 
+ \begin{verbatim}
+ from ZEO import ClientStorage
+ from ZODB import DB
+ 
+ storage = ClientStorage.ClientStorage( ('localhost', 9672) )
+ db = DB( storage )
+ conn = db.open()
+ \end{verbatim}
+ 
+ From this point onward, your ZODB-based code is happily unaware
+ that objects are being retrieved from a ZEO server, and not from the
+ local disk.
+ 
+ \subsection{Sample Application: chatter.py}
+ 
+ For an example application, we'll build a little chat application.
+ What's interesting is that none of the application's code deals with
+ network programming at all; instead, an object will hold chat
+ messages, and be magically shared between all the clients through ZEO.
+ I won't present the complete script here; you can download it
+ (\file{chatter.py}) or read the colourised source code
+ (\file{chatter.py.html}).  Only the
+ interesting portions of the code will be covered here.
+ 
+ The basic data structure is the \class{ChatSession} object,
+ which provides an \method{add_message()} method that adds a
+ message, and a \method{new_messages()} method that returns a list
+ of new messages that have accumulated since the last call to
+ \method{new_messages()}.  Internally, \class{ChatSession}
+ maintains a B-tree that uses the time as the key, and stores the
+ message as the corresponding value.
+ 
+ The constructor for \class{ChatSession} is simple; it just
+ creates an attribute containing a B-tree:
+ 
+ \begin{verbatim}
+ class ChatSession(Persistent):
+     def __init__(self, name):
+         self.name = name
+         # Internal attribute: _messages holds all the chat messages.        
+         self._messages = BTree.BTree()        
+ \end{verbatim}
+ 
+ \method{add_message()} has to add a message to the
+ \code{_messages} B-tree.  A complication is that it's possible
+ that some other client is trying to add a message at the same time;
+ when this happens, the client that commits first wins, and the second
+ client will get a \exception{ConflictError} exception when it tries
+ to commit.  For this application, \exception{ConflictError} isn't
+ serious but simply means that the operation has to be retried; other
+ applications might treat it as a fatal error.  The code uses
+ \code{try...except...else} inside a \code{while} loop,
+ breaking out of the loop when the commit works without raising an
+ exception.
+ 
+ \begin{verbatim}
+     def add_message(self, message):
+         """Add a message to the channel.
+         message -- text of the message to be added
+         """
+ 
+         while 1:
+             try:
+                 now = time.time()
+                 self._messages[ now ] = message
+                 get_transaction().commit()
+             except ConflictError:
+                 # Conflict occurred; this process should pause and
+                 # wait for a little bit, then try again.
+                 time.sleep(.2)
+             else:
+                 # No ConflictError exception raised, so break
+                 # out of the enclosing while loop.
+                 break
+         # end while
+ \end{verbatim}
+ 
+ \method{new_messages()} introduces the use of \textit{volatile}
+ attributes.  Attributes of a persistent object that begin with
+ \code{_v_} are considered volatile and are never stored in the
+ database.  \method{new_messages()} needs to store the last time
+ the method was called, but if the time was stored as a regular
+ attribute, its value would be committed to the database and shared
+ with all the other clients.  \method{new_messages()} would then
+ return the new messages accumulated since any other client called
+ \method{new_messages()}, which isn't what we want.
+ 
+ \begin{verbatim}
+     def new_messages(self):
+         "Return new messages."
+ 
+         # self._v_last_time is the time of the most recent message
+         # returned to the user of this class. 
+         if not hasattr(self, '_v_last_time'):
+             self._v_last_time = 0
+ 
+         new = []
+         T = self._v_last_time
+ 
+         for T2, message in self._messages.items():
+             if T2 > T:
+                 new.append( message )
+                 self._v_last_time = T2
+ 
+         return new
+ \end{verbatim}
+ 
+ This application is interesting because it uses ZEO to easily share a
+ data structure, using it more like a networking tool than a database.
+ I can foresee many interesting applications using ZEO in this way:
+ 
+ \begin{itemize}
+   \item With a Tkinter front-end, and a cleverer, more scalable data
+ structure, you could build a shared whiteboard using the same 
+ technique.
+ 
+   \item A shared chessboard object would make writing a networked chess
+   game easy.  
+ 
+   \item You could create a Python class containing a CD's title and
+   track information, and make a CD database containing many objects
+   available through a read-only ZEO server.
+ 
+ \item A program like Quicken could use a ZODB on the local disk to
+   store its data.  This avoids the need to write and maintain
+   specialized I/O code that reads in your objects and writes them out;
+   instead you can concentrate on the problem domain, writing objects
+   that represent cheques, stock portfolios, or whatever.
+ 
+ \end{itemize}
+ 
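The retry-on-conflict idiom used by add_message() in the zeo.tex text above can be tried out without a ZEO server using a sketch like the one below. The ConflictError class here is a hypothetical local stand-in for the ZODB exception of the same name, and commit_with_retry() and flaky_commit() are made-up helpers for illustration. Unlike the "while 1" loop in chatter.py, this version gives up after a fixed number of attempts, which is a deliberate change rather than what the script does.

```python
import time


class ConflictError(Exception):
    """Hypothetical stand-in for the ZODB exception of the same name."""


def commit_with_retry(commit, pause=0.0, max_attempts=5):
    """Call commit() until it succeeds; return the number of attempts.

    commit       -- any callable that may raise ConflictError
    pause        -- seconds to wait after a conflict before retrying
    max_attempts -- assumption: give up after this many tries
    """
    for attempt in range(1, max_attempts + 1):
        try:
            commit()
        except ConflictError:
            # Someone else committed first; wait a little and retry.
            time.sleep(pause)
        else:
            # No conflict, so the commit went through.
            return attempt
    raise ConflictError("gave up after %d attempts" % max_attempts)


# Simulate a commit that loses the race twice, then succeeds.
calls = []

def flaky_commit():
    calls.append(1)
    if len(calls) < 3:
        raise ConflictError()

attempts = commit_with_retry(flaky_commit)
```

In a real application the callable would redo the change and call get_transaction().commit(); whether a bounded or unbounded retry is appropriate depends on how the application wants to treat repeated conflicts.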

Index: zodb.tex
===================================================================
RCS file: /cvsroot/py-howto/pyhowto/zodb/zodb.tex,v
retrieving revision 1.1
retrieving revision 1.2
diff -C2 -r1.1 -r1.2
*** zodb.tex	2000/11/08 23:55:45	1.1
--- zodb.tex	2000/11/11 01:49:37	1.2
***************
*** 1,5 ****
  \documentclass{howto}

! \title{ZODB/ZEO HOWTO}
  \release{0.01}

--- 1,5 ----
  \documentclass{howto}

! \title{Programming with ZODB and ZEO}
  \release{0.01}

***************
*** 27,30 ****
--- 27,31 ----

  \appendix
+ \input links.tex
  \input gfdl.tex
