From: A.M. K. <aku...@us...> - 2000-11-11 01:49:40
Update of /cvsroot/py-howto/pyhowto/zodb
In directory slayer.i.sourceforge.net:/tmp/cvs-serv3293

Modified Files:
    chatter.py gfdl.tex introduction.tex prog-zodb.tex zeo.tex zodb.tex
Added Files:
    links.tex
Log Message:
Add the text from my old ZODB/ZEO page (very roughly, so it still needs a
*lot* of editing)

--- NEW FILE ---
% links.tex
% Collection of relevant links

ZODB HOWTO, by Michel Pelletier:\\
Goes into slightly more detail about the rules for writing applications
using the ZODB. \\
\url{http://www.zope.org/Members/michel/HowTos/ZODB-How-To}

Introduction to the Zope Object Database, by Jim Fulton \\
Goes into much greater detail, explaining advanced uses of the ZODB and
how it's actually implemented. A definitive reference, and highly
recommended. \\
\url{http://www.python.org/workshops/2000-01/proceedings/papers/fulton/zodb3.html}

Download link for ZEO \\
\url{http://www.zope.org/Products/ZEO/}

Index: chatter.py
===================================================================
RCS file: /cvsroot/py-howto/pyhowto/zodb/chatter.py,v
retrieving revision 1.1
retrieving revision 1.2
diff -C2 -r1.1 -r1.2
*** chatter.py 2000/11/08 23:55:45 1.1
--- chatter.py 2000/11/11 01:49:37 1.2
***************
*** 28,33 ****
          self.name = name

!         # Internal attribute: _messages holds all the chat messages.
!
          self._messages = BTree.BTree()

--- 28,32 ----
          self.name = name

!         # Internal attribute: _messages holds all the chat messages.
          self._messages = BTree.BTree()

Index: gfdl.tex
===================================================================
RCS file: /cvsroot/py-howto/pyhowto/zodb/gfdl.tex,v
retrieving revision 1.1
retrieving revision 1.2
diff -C2 -r1.1 -r1.2
*** gfdl.tex 2000/11/08 23:55:45 1.1
--- gfdl.tex 2000/11/11 01:49:37 1.2
***************
*** 1,3 ****
! % fdl.tex
  % This file is a chapter. It must be included in a larger document to work
  % properly.
--- 1,3 ----
! % gfdl.tex
  % This file is a chapter. It must be included in a larger document to work
  % properly.
Index: introduction.tex
===================================================================
RCS file: /cvsroot/py-howto/pyhowto/zodb/introduction.tex,v
retrieving revision 1.1
retrieving revision 1.2
diff -C2 -r1.1 -r1.2
*** introduction.tex 2000/11/08 23:55:45 1.1
--- introduction.tex 2000/11/11 01:49:37 1.2
***************
*** 17,21 ****
  programming languages provide facilities that automatically write
  objects to disk and read them in again when they're required by a
! running program.

  It's certainly possible to build your own system for making Python
--- 17,21 ----
  programming languages provide facilities that automatically write
  objects to disk and read them in again when they're required by a
! running program, and the ZODB adds such facilities to Python.

  It's certainly possible to build your own system for making Python
***************
*** 35,66 ****
-
  \subsection{OODBs vs. Relational DBs}

  Another way to look at it is that the ZODB is a Python-specific
! object-oriented database (OODB).

  Relational databases (RDBs) are far more common than OODBs.
! Relational databases store information in tables; a table consists of
  any number of rows, each row containing several columns of
! information.

  Let's introduce the example that we'll be using through this
  document. The example comes from my day job working for the MEMS
  Exchange, in a greatly simplified version.
-
- XXX explain a bit more

! The job is to track process runs, which are lists of manufacturing steps
! to be performed in a semiconductor fab. A run is owned by a
  particular user, and has a name and assigned ID number. Runs consist
  of a number of operations; an operation is a single step to be
  performed, such as depositing something on a wafer or etching
! something off it. Operations may have parameters, which are
! additional information required to perform an operation. For example,
! if you're depositing something on a wafer, you need to know two
! things: 1) what you're depositing, and 2) how much should be
! deposited.
! You might deposit 100 microns of silicon oxide, or 1 micron of copper.

  Mapping these structures to a relational database is straightforward:
--- 35,68 ----
  \subsection{OODBs vs. Relational DBs}

  Another way to look at it is that the ZODB is a Python-specific
! object-oriented database (OODB). Commercial object databases for C++
! or Java often require that you jump through some hoops, using a
! special preprocessor or avoiding certain data types. As we'll see,
! the ZODB has some hoops of its own, but in comparison the naturalness
! of the ZODB is astonishing.

  Relational databases (RDBs) are far more common than OODBs.
! Relational databases store information in tables; a table consists of
  any number of rows, each row containing several columns of
! information.

  Let's introduce the example that we'll be using through this
  document. The example comes from my day job working for the MEMS
  Exchange, in a greatly simplified version.

! The job is to track process runs, which are lists of manufacturing
! steps to be performed in a semiconductor fab. A run is owned by a
  particular user, and has a name and assigned ID number. Runs consist
  of a number of operations; an operation is a single step to be
  performed, such as depositing something on a wafer or etching
! something off it.
!
! Operations may have parameters, which are additional information
! required to perform an operation. For example, if you're depositing
! something on a wafer, you need to know two things: 1) what you're
! depositing, and 2) how much should be deposited. You might deposit
! 100 microns of silicon oxide, or 1 micron of copper.

  Mapping these structures to a relational database is straightforward:
***************
*** 81,87 ****
  class Run:
      .run_id
! ... XXX finish

- If you were

  \subsection{What is ZEO?}
--- 83,92 ----
  class Run:
      .run_id
! ... XXX finish this code ...
!
!
!
! XXX continue

  \subsection{What is ZEO?}
Index: prog-zodb.tex
===================================================================
RCS file: /cvsroot/py-howto/pyhowto/zodb/prog-zodb.tex,v
retrieving revision 1.1
retrieving revision 1.2
diff -C2 -r1.1 -r1.2
*** prog-zodb.tex 2000/11/08 23:55:45 1.1
--- prog-zodb.tex 2000/11/11 01:49:37 1.2
***************
*** 8,16 ****
  \section{ZODB Programming}

- \subsection{How ZODB Works}
-
- XXX (ExtensionClass, dirty bits)
-
-
  \subsection{Installing ZODB}

--- 8,11 ----
***************
*** 59,64 ****
--- 54,240 ----
  problems, please let me know.

+ \subsection{How ZODB Works}

+ XXX (ExtensionClass, dirty bits)
+
+ There are 3 main interfaces in the ZODB:
+ \class{Storage}, \class{DB}, and \class{Connection} classes.
+
+ \begin{itemize}
+ \item \class{Storage} classes are the lowest layer, and handle storing
+ and retrieving objects from some form of long-term storage. A few
+ different types of Storage have been written, such as
+ \class{FileStorage}, which uses regular files,
+ and \class{BerkeleyStorage}, which uses Sleepycat Software's
+ BerkeleyDB 2.7. You could
+ write a new Storage that stored objects in a relational database or
+ Metakit file, for example, if you needed to ensure some property
+ useful to your application. Two example storages,
+ \class{DemoStorage} and \class{MappingStorage}, are
+ available to use as models if you want to write a new Storage.
+
+ \item The \class{DB} class sits on top of a storage, and mediates the
+ interaction between several connections. One \class{DB} instance
+ is created per process.
+
+ \item Finally, the \class{Connection} class caches objects, and moves
+ them into and out of object storage. A multi-threaded program can
+ open a separate \class{Connection} instance for each thread.
+
+ \end{itemize}
+
+ \subsection{Opening a ZODB}
+
+ Preparing to use a ZODB requires 3 steps: you have to open the
+ Storage, then create a DB instance that uses the Storage, and then get
+ a Connection from the DB instance. All this is only a few lines of
+ code:
+
+ \begin{verbatim}
+ from ZODB import FileStorage, DB
+
+ storage = FileStorage.FileStorage('/tmp/test-filestorage.fs')
+ db = DB( storage )
+ conn = db.open()
+ \end{verbatim}
+
+ Note that you can use a completely different data storage mechanism
+ by changing the first line that opens a Storage; the
+ above example uses a \class{FileStorage}. Soon you'll see how ZEO
+ uses this flexibility to good effect.
+
+ \subsection{Using a ZODB}
+
+ Making a Python class persistent is quite simple; it simply needs
+ to subclass from the \class{Persistent} class, as shown in this
+ example:
+
+ \begin{verbatim}
+ import ZODB
+ from Persistence import Persistent
+
+ class User(Persistent):
+     pass
+ \end{verbatim}
+
+ (The apparently unnecessary \code{import ZODB} statement is
+ needed for the following \code{from...import} statement to work
+ correctly, since the ZODB code is doing some magical tricks with
+ importing.)
+
+ For simplicity, in the examples the \class{User} class will
+ simply be used as a holder for a bunch of attributes. Normally the
+ class would define various methods that add functionality, but that
+ has no impact on the ZODB's treatment of the class.
+
+ The ZODB uses persistence by reachability; starting from
+ a set of root objects, all the attributes of those objects are made
+ persistent, whether they're simple Python data types or class instances.
+
+ As an example, we'll create a simple database of users that allows
+ retrieving a \class{User} object given the user's ID. First, we
+ retrieve the primary root object of the ZODB; this object behaves like
+ a Python dictionary, so you can just add a new key/value pair for your
+ application's root object. We'll insert a \class{BTree} object
+ that will contain all the \class{User} objects. (The
+ \class{BTree} module is also included as part of Zope.)
+
+ \begin{verbatim}dbroot = conn.root()
+
+ # Ensure that a 'userdb' key is present
+ if not dbroot.has_key('userdb'):
+     import BTree
+     dbroot['userdb'] = BTree.BTree()
+
+ userdb = dbroot['userdb']
+ \end{verbatim}
+
+ Inserting a new user is simple: create the \class{User}
+ object, fill it with data, insert it into the BTree, and commit this
+ transaction.
+
+ \begin{verbatim}# Create new User instance
+ newuser = User()
+
+ # Add whatever attributes you want to track
+ newuser.id = 'amk'
+ newuser.first_name = 'Andrew' ; newuser.last_name = 'Kuchling'
+ ...
+
+ # Add object to the BTree, keyed on the ID
+ userdb[ newuser.id ] = newuser
+
+ # Commit the change
+ get_transaction().commit()
+ \end{verbatim}
+
+ When you import the ZODB package, it adds a new function,
+ \function{get_transaction()}, to Python's collection of built-in
+ functions. \function{get_transaction()} returns a transaction
+ object, which has two important methods: \method{commit()} and
+ \method{abort()}. \method{commit()} writes out any modified objects
+ to disk, making the changes permanent, while
+ \method{abort()} rolls back any changes that have been made,
+ restoring the original state of the objects. If you're familiar with
+ database transactional semantics, this is what you'd expect.
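The commit/abort behaviour described above can be imitated in a few lines of plain Python. What follows is only a toy sketch under stated assumptions, not ZODB code: the ToyTransaction class, its register() method and the deep-copy snapshot strategy are all invented for illustration, and the code targets a current Python 3 interpreter rather than the older syntax of the HOWTO's own examples. It shows the one idea that matters here: abort() puts each object's attributes back the way they were when the transaction started.

```python
import copy


class ToyTransaction:
    """Toy stand-in for a transaction object; NOT the ZODB implementation."""

    def __init__(self):
        # Maps id(obj) -> (obj, deep copy of its attributes at registration).
        self._snapshots = {}

    def register(self, obj):
        """Remember obj's state the first time it joins the transaction."""
        if id(obj) not in self._snapshots:
            self._snapshots[id(obj)] = (obj, copy.deepcopy(vars(obj)))

    def commit(self):
        """Keep the current state: simply forget the snapshots."""
        self._snapshots.clear()

    def abort(self):
        """Roll every registered object back to its snapshot."""
        for obj, saved in self._snapshots.values():
            vars(obj).clear()
            vars(obj).update(saved)
        self._snapshots.clear()


class User:
    pass


txn = ToyTransaction()
newuser = User()
newuser.first_name = 'Andrew'

txn.register(newuser)        # hypothetical explicit step, needed only in this toy
newuser.first_name = 'Bob'
after_change = newuser.first_name
txn.abort()
after_abort = newuser.first_name
print(after_change, after_abort)
```

The explicit register() call exists only because this toy has no hooks of its own; in the text above, the user never has to tell the transaction which objects were touched.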
+
+ Because the integration with Python is so complete, it's a lot like
+ having transactional semantics for your program's variables, and you
+ can experiment with transactions at the Python interpreter's prompt:
+
+ \begin{verbatim}>>> newuser
+ <User instance at 81b1f40>
+ >>> newuser.first_name           # Print initial value
+ 'Andrew'
+ >>> newuser.first_name = 'Bob'   # Change first name
+ >>> newuser.first_name           # Verify the change
+ 'Bob'
+ >>> get_transaction().abort()    # Abort transaction
+ >>> newuser.first_name           # The value has changed back
+ 'Andrew'
+ \end{verbatim}
+
+ \subsection{Rules for Writing Persistent Classes}
+
+ The ZODB uses various Python hooks to catch attribute accesses,
+ which cover most of the ways of modifying an object, but not all of
+ them. If you modify a \class{User} object by assigning to one of
+ its attributes, as in \code{userobj.first_name = 'Andrew'}, the
+ ZODB will mark the object as having been changed, and it'll be written
+ out on the following \method{commit()}.
+
+ The most common idiom that \emph{isn't} caught by the ZODB is
+ mutating a list or dictionary. If \class{User} objects have an
+ attribute named \code{friends} containing a list, calling
+ \code{userobj.friends.append( otherUser )} doesn't mark
+ \code{userobj} as modified; from the ZODB's point of
+ view, \code{userobj.friends} was only read, and its value, which
+ happened to be an ordinary Python list, was returned. The ZODB isn't
+ aware that the returned value was later modified.
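The blind spot described above is easy to reproduce without the ZODB. The sketch below is a toy model only: the Tracked class, its dirty flag and mark_clean() are hypothetical names, and the real ZODB hooks are implemented differently. It relies on nothing more than the fact that Python routes attribute assignment through __setattr__, while append() on a list fetched from an attribute never passes through that hook.

```python
class Tracked:
    """Toy base class with a change flag; NOT how the ZODB is implemented."""

    def __init__(self):
        object.__setattr__(self, 'dirty', False)

    def __setattr__(self, name, value):
        # Every attribute assignment passes through here and sets the flag.
        object.__setattr__(self, name, value)
        object.__setattr__(self, 'dirty', True)

    def mark_clean(self):
        # Stand-in for "a commit just happened".
        object.__setattr__(self, 'dirty', False)


userobj = Tracked()
userobj.friends = []
flag_after_assignment = userobj.dirty   # the assignment was seen

userobj.mark_clean()
userobj.friends.append('otherUser')     # only reads the attribute, then mutates the list
flag_after_append = userobj.dirty       # the append was not seen

print(flag_after_assignment, flag_after_append)
```

The list really does contain the new entry afterwards; it is only the owner object that has no idea anything changed.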
+
+ This is one of the few quirks you'll have to remember when using
+ the ZODB; if you modify an attribute of an object in place, you have
+ to manually mark the object as having been modified, by
+ setting the \code{_p_changed} attribute to true:
+
+ \begin{verbatim}
+ userobj.friends.append( otherUser )
+ userobj._p_changed = 1
+ \end{verbatim}
+
+ You can hide this implementation detail by not designing your
+ class's API to use direct attribute access; instead, you can use the
+ Java-style approach of accessor methods for everything, and then set
+ \code{_p_changed} within the accessor method. For example, you
+ might forbid accessing the \code{friends} attribute directly,
+ and add a \method{get_friend_list()} accessor and an
+ \method{add_friend()} modifier method to the class.
+ Alternatively, you could use a ZODB-aware list or mapping type that
+ sets \code{_p_changed} for you; the ZODB includes a
+ \class{PersistentMapping} class, and I've contributed a
+ \class{PersistentList} class that may make it into a future
+ release.
+
Index: zeo.tex
===================================================================
RCS file: /cvsroot/py-howto/pyhowto/zodb/zeo.tex,v
retrieving revision 1.1
retrieving revision 1.2
diff -C2 -r1.1 -r1.2
*** zeo.tex 2000/11/08 23:55:45 1.1
--- zeo.tex 2000/11/11 01:49:37 1.2
***************
*** 1,14 ****
  % ZEO
- % How ZEO works (ClientStorage)
  % Installing ZEO
  % Configuring ZEO

  \section{ZEO}

- \subsection{How ZEO Works}
- XXX (ClientStorage)
-
-
  \subsection{Installing ZEO}

  This package contains the Python code for ZEO, packaged to make it
--- 1,11 ----
  % ZEO
  % Installing ZEO
+ % How ZEO works (ClientStorage)
  % Configuring ZEO

  \section{ZEO}

+
  \subsection{Installing ZEO}

  This package contains the Python code for ZEO, packaged to make it
***************
*** 86,89 ****
  correctly.

! \subsection{Configuring ZEO}

--- 83,294 ----
  correctly.

!
! \subsection{How ZEO Works}
!
! The ZODB, as I've described it so far, can only be used within a
! single Python process running on one machine. ZEO, Zope
! Enterprise Objects, extends the ZODB machinery to provide access to
! objects over a network. The name "Zope Enterprise Objects" is a bit
! misleading. ZEO can be used to store Python objects, and access them
! in a distributed fashion, without Zope ever entering the picture;
! essentially the combination of ZEO and ZODB is a Python-specific
! object database.
!
! ZEO consists of about 1400 lines of Python code. The code is
! relatively small because it contains only code for a TCP/IP server,
! and for a new type of Storage, \class{ClientStorage}.
! \class{ClientStorage} doesn't use disk files at all; it simply
! makes remote procedure calls to the server, which then passes them on to
! a regular \class{Storage} class such as \class{FileStorage}.
! The following diagram lays out the system:
!
! [ XXX insert diagram here ]
!
! Any number of processes can create a \class{ClientStorage}
! instance, and any number of threads in each process can be using that
! instance. \class{ClientStorage} aggressively caches objects
! locally, so, in order to avoid using stale data, the ZEO server sends
! an invalidate message to all the connected \class{ClientStorage}
! instances on every write operation. The invalidate message contains
! the object ID for each object that's been modified, so the
! \class{ClientStorage} instances can delete the old data for the
! given object from their caches.
!
! This design decision has some consequences you should be aware of.
! First, while ZEO isn't tied to Zope, it was first written for use with
! Zope, which stores HTML, images, and program code in the database. As
! a result, reads from the database are \emph{far} more frequent than
! writes, and ZEO is therefore better suited for read-intensive
! applications. If every \class{ClientStorage} is writing to the
! database very frequently, this will result in a storm of invalidate
! messages being sent, and this might take up more processing time than
! the actual database operations themselves.
!
! On the other hand, for applications that have few writes in
! comparison to the number of read accesses, this aggressive caching can
! be a major win. Consider the job of writing a Slashdot-like
! discussion forum, where you want to divide the load among several Web
! servers. If news items and postings are represented by objects, and
! accessed through ZEO, then the most heavily accessed objects -- the
! most recent or most popular postings -- will very quickly wind up in
! the caches of the \class{ClientStorage} instances on the
! front-end servers. The back-end ZEO server will do relatively little
! work, only being called upon to return the occasional older posting
! that's requested, and to send the occasional invalidate message when a
! new posting is added. The ZEO server isn't going to be
! contacted for
! every single request, so its workload will remain manageable.
!
! \subsection{Configuring and Running a ZEO Server}

+ Once the code has been unpacked, the next step is to run a ZEO
+ server. Go to the Zope installation directory and run this command:
+
+ \begin{verbatim}python lib/python/ZEO/start.py -p 9672 /tmp/storage.fs
+ \end{verbatim}
+
+ This starts a ZEO server listening on TCP port 9672, and using a
+ \class{FileStorage} on top of the file
+ \file{/tmp/storage.fs}. If you want to use a storage other than
+ \class{FileStorage}, you'll have to manually hack the code in
+ \file{start.py} to create an instance of a different class.
+
+
+ \subsection{Connecting to a ZEO Server}
+
+ Once a ZEO server is up and running, using it is just like using
+ ZODB with a more conventional disk-based storage. The only difference
+ is that you'll create a \class{ClientStorage} instance instead of
+ a \class{FileStorage} instance:
+
+ \begin{verbatim}
+ from ZEO import ClientStorage
+ from ZODB import DB
+
+ storage = ClientStorage.ClientStorage( ('localhost', 9672) )
+ db = DB( storage )
+ conn = db.open()
+ \end{verbatim}
+
+ From this point onward, your ZODB-based code is happily unaware
+ that objects are being retrieved from a ZEO server, and not from the
+ local disk.
+
+ \subsection{Sample Application: chatter.py}
+
+ For an example application, we'll build a little chat application.
+ What's interesting is that none of the application's code deals with
+ network programming at all; instead, an object will hold chat
+ messages, and be magically shared between all the clients through ZEO.
+ I won't present the complete script here; you can download it
+ (\file{chatter.py}) or read the colourised source code
+ (\file{chatter.py.html}). Only the
+ interesting portions of the code will be covered here.
+
+ The basic data structure is the \class{ChatSession} object,
+ which provides an \method{add_message()} method that adds a
+ message, and a \method{new_messages()} method that returns a list
+ of new messages that have accumulated since the last call to
+ \method{new_messages()}. Internally, \class{ChatSession}
+ maintains a B-tree that uses the time as the key, and stores the
+ message as the corresponding value.
+
+ The constructor for \class{ChatSession} is pretty simple; it simply creates an attribute containing a B-tree:
+
+ \begin{verbatim}
+ class ChatSession(Persistent):
+     def __init__(self, name):
+         self.name = name
+         # Internal attribute: _messages holds all the chat messages.
+         self._messages = BTree.BTree()
+ \end{verbatim}
+
+ \method{add_message()} has to add a message to the
+ \code{_messages} B-tree. A complication is that it's possible
+ that some other client is trying to add a message at the same time;
+ when this happens, the client that commits first wins, and the second
+ client will get a \exception{ConflictError} exception when it tries
+ to commit. For this application, \exception{ConflictError} isn't
+ serious but simply means that the operation has to be retried; other
+ applications might treat it as a fatal error. The code uses
+ \code{try...except...else} inside a \code{while} loop,
+ breaking out of the loop when the commit works without raising an
+ exception.
+
+ \begin{verbatim}
+     def add_message(self, message):
+         """Add a message to the channel.
+         message -- text of the message to be added
+         """
+
+         while 1:
+             try:
+                 now = time.time()
+                 self._messages[ now ] = message
+                 get_transaction().commit()
+             except ConflictError:
+                 # Conflict occurred; this process should pause and
+                 # wait for a little bit, then try again.
+                 time.sleep(.2)
+                 pass
+             else:
+                 # No ConflictError exception raised, so break
+                 # out of the enclosing while loop.
+                 break
+         # end while
+ \end{verbatim}
+
+ \method{new_messages()} introduces the use of \textit{volatile}
+ attributes. Attributes of a persistent object that begin with
+ \code{_v_} are considered volatile and are never stored in the
+ database. \method{new_messages()} needs to store the last time
+ the method was called, but if the time was stored as a regular
+ attribute, its value would be committed to the database and shared
+ with all the other clients. \method{new_messages()} would then
+ return the new messages accumulated since any other client called
+ \method{new_messages()}, which isn't what we want.
+
+ \begin{verbatim}
+     def new_messages(self):
+         "Return new messages."
+
+         # self._v_last_time is the time of the most recent message
+         # returned to the user of this class.
+         if not hasattr(self, '_v_last_time'):
+             self._v_last_time = 0
+
+         new = []
+         T = self._v_last_time
+
+         for T2, message in self._messages.items():
+             if T2 > T:
+                 new.append( message )
+                 self._v_last_time = T2
+
+         return new
+ \end{verbatim}
+
+ This application is interesting because it uses ZEO to easily share a
+ data structure, using it more like a networking tool than a database.
+ I can foresee many interesting applications using ZEO in this way:
+
+ \begin{itemize}
+ \item With a Tkinter front-end, and a cleverer, more scalable data
+ structure, you could build a shared whiteboard using the same
+ technique.
+
+ \item A shared chessboard object would make writing a networked chess
+ game easy.
+
+ \item You could create a Python class containing a CD's title and
+ track information, and make a CD database containing many objects
+ available through a read-only ZEO server.
+
+ \item A program like Quicken could use a ZODB on the local disk to
+ store its data. This avoids the need to write and maintain
+ specialized I/O code that reads in your objects and writes them out;
+ instead you can concentrate on the problem domain, writing objects
+ that represent cheques, stock portfolios, or whatever.
+
+ \end{itemize}
+
Index: zodb.tex
===================================================================
RCS file: /cvsroot/py-howto/pyhowto/zodb/zodb.tex,v
retrieving revision 1.1
retrieving revision 1.2
diff -C2 -r1.1 -r1.2
*** zodb.tex 2000/11/08 23:55:45 1.1
--- zodb.tex 2000/11/11 01:49:37 1.2
***************
*** 1,5 ****
  \documentclass{howto}

! \title{ZODB/ZEO HOWTO}

  \release{0.01}
--- 1,5 ----
  \documentclass{howto}

! \title{Programming with ZODB and ZEO}

  \release{0.01}
***************
*** 27,30 ****
--- 27,31 ----
  \appendix

+ \input links.tex
  \input gfdl.tex
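The _v_ convention that the zeo.tex text relies on for new_messages() can be pictured with a small plain-Python sketch. It is only an illustration under stated assumptions: ToySession and stored_state() are invented names, the timestamp is an arbitrary example value, and the real ZODB decides what to store through its own machinery. The point is simply that attributes whose names start with _v_ stay on the in-memory object but are left out of whatever gets written to storage, so each client keeps its own private copy.

```python
class ToySession:
    """Toy object with one stored and one volatile attribute; NOT ZODB code."""

    def __init__(self, name):
        self.name = name            # ordinary attribute: would be stored
        self._v_last_time = 0       # volatile attribute: would be skipped


def stored_state(obj):
    """Return only the attributes a _v_-aware storage layer would write out."""
    return {key: value for key, value in vars(obj).items()
            if not key.startswith('_v_')}


session = ToySession('python')
session._v_last_time = 973907380.0   # arbitrary example timestamp
state = stored_state(session)
print(state)
```

The volatile value is still available on the live object after the state has been extracted, which is what lets each chat client remember its own last-read time.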