[Py-howto-checkins] CVS: pyhowto rexec.tex,1.12,1.13

Update of /cvsroot/py-howto/pyhowto
In directory sc8-pr-cvs1:/tmp/cvs-serv3716

Modified Files:
	rexec.tex 
Log Message:
Withdraw the rexec HOWTO

Index: rexec.tex
===================================================================
RCS file: /cvsroot/py-howto/pyhowto/rexec.tex,v
retrieving revision 1.12
retrieving revision 1.13
diff -C2 -r1.12 -r1.13
*** rexec.tex	26 Nov 2002 16:05:50 -0000	1.12
--- rexec.tex	17 Dec 2002 14:31:42 -0000	1.13
***************
*** 3,7 ****
  \title{Restricted Execution HOWTO}

! \release{2.0}

  \author{A.M. Kuchling}
--- 3,7 ----
  \title{Restricted Execution HOWTO}

! \release{2.1}

  \author{A.M. Kuchling}
***************
*** 14,559 ****
  \begin{abstract}
  \noindent
- Python provides a restricted execution mode for running untrusted code
- that will prevent the code from performing dangerous operations.  This
- HOWTO explains how to use restricted execution mode, and how to
- customize the restricted environment for your application.  It aims to
- provide a gentler introduction than the corresponding section in the
- Python Library Reference.

! This document is available from the Python HOWTO page at
! \url{http://www.python.org/doc/howto}.

  \end{abstract}

- \tableofcontents
- 
- \section{Basic use of \class{RExec}}
- 
- For some applications, it's desirable to execute chunks of Python code
- that come from an outside source.  The most obvious example is a Web
- browser such as Grail, which can download and execute applets written in
- Python.
- 
- An obvious danger of downloading and running code from anywhere is that
- someone might write a malicious applet that appears to be harmless, but
- silently erases files, makes copies of sensitive data, or gives the
- applet's author a back door into your system.  The solution is to run
- the code in a restricted environment, where it's prevented from
- performing any operations that could be used maliciously.  
- 
- Java does this by using the Java Virtual Machine, which executes Java
- bytecode.  The virtual machine, or VM, has complete control over the
- running applet, and any dangerous operations must go through the VM in
- order to be performed.  The VM can therefore trap suspicious activity,
- and stop the applet's execution, if a strict security policy is used, or
- ask the user if the operation should be permitted, if the policy is
- somewhat looser.
- 
- Python already has a virtual machine that executes Python byte codes, so
- creating a restricted execution environment simply requires sealing off
- dangerous built-in functions such as \code{open()}, and dangerous
- modules, such as the \code{socket} module.  This can be done by creating
- new namespaces, removing any dangerous functions, and forcing code to be
- executed in those namespaces.  While a simple idea, in practice it's
- fairly complicated to implement.  Luckily, the required features have
- been present in Python for a while, and it's already been implemented
- for you as a standard module.
- 
- Code for using a restricted execution environment is in the \file{rexec}
- module.  The base class is called \class{RExec}; in a later section of
- this HOWTO, we'll show you how to create your own subclasses of
- \class{RExec} to customize the functions and modules that are available.
- Here's the documentation for creating a new \class{RExec} instance:
- 
- \begin{funcdesc}{RExec}{[\var{hooks}], [\var{verbose}] }
- Returns a \class{RExec} instance.  The \var{verbose} parameter is a
- Boolean value, defaulting to false.  If true, the \class{RExec} instance
- will execute in verbose mode, which will print a debugging message when
- modules are imported, as if the \code{-v} option was given to the Python
- interpreter.
- 
- The \var{hooks} parameter can be an instance of the \code{RHooks}
- class, or of some subclass of \code{RHooks}; a default instance will
- be used if the parameter is omitted.  This is only required when
- creating particularly exotic restricted environments that import
- modules in new ways.  If you need to use this, you'll have to
- consult the source code (or Guido) for a complete picture of what's
- going on.   
- \end{funcdesc}
- 
- The \class{RExec} instance has \code{r_exec()}, \code{r_eval()}, and
- \code{r_execfile()} functions, which do the same thing as Python's
- built-in \code{exec()}, \code{eval()}, and \code{execfile()} functions,
- performing them in the restricted environment.  (There are also
- \code{s_exec()}, \code{s_eval()}, and \code{s_execfile()} methods which
- replace the restricted environment's standard input, output, and error
- files with \code{StringIO} objects that allow you to control the input
- and capture any output generated.)
- 
- Here's a sample usage of a restricted environment.  First, the
- \class{RExec} instance has to be created.
- 
- \begin{verbatim}
- r_env = rexec.RExec()
- \end{verbatim}
- 
- Now, we can execute code and evaluate expressions 
- in the environment:
- 
- \begin{verbatim}
- r_env.r_exec('import string')
- expr = 'string.upper("This is a test")'
- print r_env.r_eval( expr )  
- \end{verbatim}
- 
- The first line executes a statement, importing the \code{string} module.
- Since it's considered a safe module, the operation succeeds.  The second
- and third lines create a string containing an expression, and evaluate
- the expression in the restricted environment; it prints out \samp{THIS
- IS A TEST}, as you'd expect.
- 
- Unsafe operations trigger an exception.  For example:
- 
- \begin{verbatim}
- r_env.r_exec('import socket')
- \end{verbatim}
- 
- The previous line will cause an \code{ImportError} exception to be
- raised, with an associated string value that reads "untrusted dynamic
- module: _socket".  Trying to open a file for writing is also forbidden:
- 
- \begin{verbatim}
- r_env.r_exec('file = open("/tmp/a.out", "w")')
- \end{verbatim}
- 
- This will raise an \code{IOError} exception, with an associated string
- value that reads "can't open files for writing in restricted mode".  The
- restricted code can catch the exception in a \code{try...except} block
- and continue running; this is useful for writing code which works in
- both restricted and unrestricted mode.  Opening files for reading will
- work, however.
- 
- Exactly what restrictions does the base \class{RExec} impose?  It limits
- the modules that can be imported to the following safe list:
- 
- \begin{verbatim}
- audioop, array, binascii, cmath, errno, imageop,
- marshal, math, md5, operator, parser, regex, 
- pcre, rotor, select, strop, struct, time
- \end{verbatim}
- 
- In general, these are modules that can't affect anything outside of
- the executing code; they allow various forms of computation, but don't
- allow operations that change the filesystem or use network connections
- to other machines.  (The \code{pcre} module may be unfamiliar.  It's
- an internal module used by the \module{re} module, so restricted code
- can still use the \module{re} to perform regular expression matches.)
- 
- It also restricts the variables and functions that are available from
- the \code{sys} and \code{os} modules.  The \code{sys} module only
- contains the following symbols:
- 
- \begin{verbatim}
- ps1, ps2, copyright, version, platform, exit, maxint
- \end{verbatim}
- 
- The \code{os} module is reduced to the following functions:
- 
- \begin{verbatim}
- error, fstat, listdir, lstat, readlink,
- stat, times, uname, getpid, getppid,
- getcwd, getuid, getgid, geteuid, getegid
- \end{verbatim}
- 
- Note that restricted code has some read-only access to the filesystem
- via functions like \code{os.stat} and \code{os.readlink}; if you wish to
- forbid all access to the filesystem, these functions must be removed.
- 
- In restricted mode, there are various attributes of function and class
- objects that are no longer accessible: the \code{__dict__} attribute of
- class, instance and module objects; the \code{__self__} attribute of
- method objects; and most of the attributes of function objects, namely
- \code{func_code}, \code{func_defaults}, \code{func_doc},
- \code{func_globals}, and \code{func_name}.
- 
- The \code{__import__()} and \code{reload()} functions are replaced by
- versions which implement the above restrictions.  Finally, Python's
- usual \code{open()} function is removed and replaced by a restricted
- version that only allows opening files for reading.
- 
- To change any of these policies, whether to be stricter or looser, see
- the section below on customizing the restricted environment.
- 
- \section{Frequently Asked Questions}
- 
- \emph{How do I guard against denial-of-service attacks?  Or, how do I
- keep restricted code from consuming a lot of memory?}
- 
- Even if restricted code can't open sockets or write files, it can still
- cause problems by entering an infinite loop or consuming lots of memory;
- this is as easy as coding \code{while 1: pass} or \code{'a' *
- 12345678901}.  Unfortunately, there's no way at present to prevent
- restricted code from doing this.  The Python process may therefore
- encounter a \code{MemoryError} exception, loop forever, or be killed by
- the operating system.  
- 
- One solution would be to perform \code{os.fork()} to get a child process
- running the interpreter.  The child could then use the \code{resource}
- module to set limits on the amount of memory, stack space, and CPU time
- it can consume, and run the restricted code.  In the meantime, the
- parent process can set a timeout and wait for the child to return its
- results; if the child takes too long, the parent can conclude that the
- restricted code looped forever, and kill the child process.
- 
- \emph{If restricted code returns a class instance via \code{r_eval()},
- can that class instance do nasty things if unrestricted code calls its
- methods?}
- 
- You might be worried about the handling of values returned by
- \code{r_eval()}.  For example, let's say your program does this:
- 
- \begin{verbatim}
- value = r_env.r_eval( expression )
- print str(value)
- \end{verbatim}
- 
- If \code{value} is a class instance, and has a \code{__str__} method,
- that method will get called by the \code{str()} function.  Is it
- possible for the restricted code to return a class instance where the
- \code{__str__} function does something nasty?  Does this provide a way
- for restricted code to smuggle out code that gets run without
- restrictions?
- 
- The answer is no.  If restricted code returns a class instance, or a
- function, then, despite being called by unrestricted code, those
- functions will always be executed in the restricted environment.  You
- can see why if you follow this little exercise.  Run the interpreter in
- interactive mode, and create a sample class with a single method.
- 
- \begin{verbatim}
- >>> class C:
- ...   def f(self): print "Hi!"
- ... 
- \end{verbatim}
- 
- Now, look at the attributes of the unbound method \code{C.f}:
- 
- \begin{verbatim}
- >>> dir(C.f)
- ['__doc__', '__name__', 'im_class', 'im_func', 'im_self']
- \end{verbatim}
- 
- \code{im_func} is the attribute we're interested in; it contains the
- actual function for the method.  Look at the function's attributes using
- the \code{dir()} built-in function, and then look at the
- \code{func_globals} attribute.
- 
- \begin{verbatim}
- >>> dir(C.f.im_func)
- ['__doc__', '__name__', 'func_code', 'func_defaults', 'func_doc', 
-  'func_globals', 'func_name']
- >>> C.f.im_func.func_globals
- {'__doc__': None, '__name__': '__main__', 
-  '__builtins__': <module '__builtin__'>, 
-  'f': <function f at 1201a68b0>, 
-  'C': <class __main__.C at 1201b35e0>, 
-  'a': <__main__.C instance at 1201a6b10>}
- \end{verbatim}
- 
- See how the function contains attributes for its \code{__builtins__}
- module?  This means that, wherever it goes, the function will always use
- the same \code{__builtin__} module, namely the one provided by the
- restricted environment.  
- 
- This means that the function's module scope is limited to that of the
- restricted environment; it has no way to access any variables or
- methods in the unrestricted environment that is calling into the
- restricted environment.  
- 
- \begin{verbatim}
- r_env.r_exec('def f(): g()\n')
- f = r_env.r_eval('f')
- def g(): print "I'm unrestricted."
- \end{verbatim}
- 
- If you execute the \code{f()} function in the unrestricted module, it
- will fail with a \code{NameError} exception, because \code{f()} doesn't
- have access to the unrestricted namespace.  To make this work, you
- must insert \code{g} into the restricted namespace.  Be careful when
- doing this, since \code{g} will be executed without restrictions; you
- have to be sure that \code{g} is a function that can't be used to do
- any damage.  (Or is an instance with no methods that do anything
- dangerous.  Or is a module containing no dangerous functions.  You get
- the idea.)  
- 
- 
- \emph{What happens if restricted code raises an exception?}
- 
- The \module{rexec} module doesn't do anything special for exceptions
- raised by restricted code; they'll be propagated up the call stack
- until a \code{try...except} statement is found that catches it.  If
- no exception handler is found, the interpreter will print a traceback and exit, which
- is its usual behaviour.  To prevent untrusted code from terminating
- the program, you should surround calls to \code{r_exec()},
- \code{r_execfile()}, etc. with a \code{try...except} statement.
- 
- Python 1.5 introduced exceptions that could be classes; for more
- information about this new feature, consult
- \url{http://www.python.org/doc/essays/stdexceptions.html}.  
- Class-based exceptions present a problem; the separation between
- restricted and unrestricted namespaces may cause confusion.  Consider
- this example code, suggested by Jeff Rush.
- 
- t1.py:
- \begin{verbatim}
- # t1.py
- 
- from rexec import RHooks, RExec
- from t2 import MyException
- r= RExec( )
- 
- print 'MyException class:', repr(MyException)
- try:
-     r.r_execfile('t3.py')
- except MyException, args:
-     print 'Got MyException in t3.py'
- except:
-     print 'Missed MyException "%s" in t3.py' % repr(MyException)
- \end{verbatim}
- 
- t2.py
- \begin{verbatim}
- #t2.py
- 
- class MyException(Exception): pass
- def myfunc():
-     print 'Raising', `MyException`
-     raise MyException, 5
- 
- print 't2 module initialized'
- \end{verbatim}
- 
- t3.py:
- \begin{verbatim}
- #t3.py
- import sys
- from t2 import MyException, myfunc
- myfunc()
- \end{verbatim}
- 
- So, \file{t1.py} imports the \code{MyException} class from
- \file{t2.py}, and then executes some restricted code that also imports
- \file{t2.py} and raises \code{MyException}.  However, because of the
- separation between restricted and unrestricted code, \code{t2.py} is
- actually imported twice, once in each mode.  Therefore two distinct
- class objects are created for \code{MyException}, and the
- \code{except} statement doesn't catch the exception because it seems
- to be of the wrong class.
- 
- The solution is to modify \file{t1.py} to pluck the class object out
- of the restricted environment, instead of importing it.  The following
- code will do the job, if added to \code{t1.py}:
- 
- \begin{verbatim}
- module = r.add_module('__main__')
- mod_dict = module.__dict__
- MyException = mod_dict['MyException']
- \end{verbatim}
- 
- The first two lines simply get the dictionary for the \code{__main__}
- module; this is a usage pattern discussed above.  The last line simply
- gets the value corresponding to 'MyException', which will be the class
- object for \code{MyException}.
- 
- \section{Customizing The Restricted Environment}
- \label{sect-customizing}
- 
- \subsection{Inserting Variables}
- 
- While restricted code may be completely self-contained, it's common for
- it to require other data: perhaps a tuple listing various available
- plug-ins, or a dictionary mapping symbols to values.  For simple Python
- data types, such as numbers and strings, the natural solution is to
- insert variables into one of the namespaces used by the restricted
- environment, binding the desired variable name to the value.
- 
- Continuing from the examples above, you can get the dictionary
- corresponding to the restricted module named \code{module_name} with the
- following code:
- 
- \begin{verbatim}
- module = r_env.add_module(module_name)
- mod_dict = module.__dict__
- \end{verbatim}
- 
- Despite its name, the \code{add_module()} method actually only adds the
- module if it doesn't already exist; it returns the corresponding module
- object, whether or not the module had to be created.  
- 
- Most commonly, you'll insert variable bindings into the \code{__main__}
- or \code{__builtins__} module, so these will be the most frequent values
- of \code{module_name}.  
- 
- Once you have the module's dictionary, you need only insert a key/value
- pair for the desired variable name and value.  For example, to add a
- \code{username} variable:
- 
- \begin{verbatim}
- mod_dict['username'] = "Kate Bush"
- \end{verbatim}
- 
- Restricted code will then have access to this variable.
- 
- \subsection{Allowing Access to Unrestricted Objects}
- 
- Often, the code being executed will need access to various objects that
- exist outside the restricted environment.  For example, an applet should
- be able to read some attributes of the object representing the browser,
- or needs access to the \code{Tkinter} module to provide a GUI display.
- But the browser object, or the \code{Tkinter} module aren't safe, so
- what can be done?
- 
- The solution is in the \code{Bastion} module, which lets you create
- class instances that represent some other Python object, but deny access
- to certain sensitive attributes or methods.
- 
- \begin{funcdesc}{Bastion}{\var{object}, [\var{filter}], [\var{name}],
- [\var{class}] }
- 
- Return a \code{Bastion} instance protecting the class instance
- \var{object}.  Any attempt to access one of the object's attributes will
- have to be approved by the \var{filter} function; if the access is
- denied an \code{AttributeError} exception will be raised.
- 
- If present, \var{filter} must be a function that accepts a string
- containing an attribute name, and returns true if access to that
- attribute will be permitted; if \var{filter} returns false, the access
- is denied.  The default filter denies access to any function beginning
- with an underscore \samp{_}.  The bastion's string representation
- will be \code{<Bastion for \var{name}>} if a value for
- \var{name} is provided; otherwise, \code{repr(\var{object})} will be used.
- 
- \var{class}, if present, would be a subclass of \code{BastionClass};
- see the code in \file{bastion.py} for the details.  Overriding the
- default \code{BastionClass} will rarely be required.  
- \end{funcdesc}
- 
- So, to safely make an object available to restricted code, create a
- \code{Bastion} object protecting it, and insert the \code{Bastion}
- instance into the restricted environment's namespace. 
- 
- For example, the following code will create a bastion for an instance,
- named \code{S}, that simulates a dictionary.  We want restricted code to
- be able to set and retrieve values from \code{S}, but no other
- attributes or methods should be accessible.
- 
- \begin{verbatim}
- import Bastion
- maindict = r_env.modules['__main__'].__dict__
- maindict['S'] = Bastion.Bastion(SS, 
-           filter = lambda name: name in ['__getitem__', '__setitem__'] )
- \end{verbatim}
- 
- \subsection{Modifying Built-ins}
- 
- Often you'll wish to customize the restricted environment in various
- ways, most commonly by adding or subtracting variables or functions from
- the modules available.  At a more advanced level, you might wish to
- write replacements for existing functions; for example, a Web browser
- that executes Python applets would have an import function that allows
- retrieving modules via HTTP and importing them.
- 
- An easy way to add or remove functions is to create the \class{RExec}
- instance, get the namespace dictionary for the desired module, and add
- or delete the desired function.  For example, the \class{RExec} class
- provides a restricted \code{open()} that allows opening files for
- reading.  If you wish to disallow this, you can simply delete 'open'
- from the \class{RExec} instance's \code{__builtin__} module.
- 
- \begin{verbatim}
- module = r_env.add_module('__builtin__')
- mod_dict = module.__dict__
- del mod_dict['open']
- \end{verbatim}
- 
- (This isn't enough to prevent code from accessing the filesystem; 
- the \class{RExec} class also allows access 
- via some of the functions in the \code{posix} module, which is usually
- aliased to the \code{os} module.  See below for how to change this.)
- 
- This is fine if only a single function is being added or removed, but
- for more complicated changes, subclassing the \class{RExec} class is a
- better idea.
- 
- Subclassing can potentially be quite simple.  The \class{RExec} class
- defines some class attributes that are used to initialize the restricted
- versions of modules such as \code{os} and \code{sys}.  Changing the
- environment's policy then requires just changing the class attribute in
- your subclass.  For example, the default environment allows restricted
- code to use the \code{posix} module to get its process and group ID.  If
- you decide to disallow this, you can do it with the following custom
- class:
- 
- \begin{verbatim}
- class MyRExec(rexec.RExec):
-     ok_posix_names = ('error', 'fstat', 'listdir', 'lstat', 'readlink',
- 		      'stat', 'times', 'uname')
- \end{verbatim}
- 
- More elaborate customizations may require overriding one of the methods
- called to create the corresponding module.  The functions to be
- overridden are \code{make_builtin}, \code{make_main},
- \code{make_osname}, and \code{make_sys}.  The \code{r_import},
- \code{r_open}, and \code{r_reload} methods are made available to
- restricted code, so by overriding these functions, you can change the
- capabilities available.
- 
- For example, defining a new import function requires overriding 
- \code{r_import}:
- 
- \begin{verbatim}
- class MyRExec(rexec.RExec):
-     def r_import(self, mname, globals={}, locals={}, fromlist=[]):
-         raise ImportError, "No imports allowed--ever"      
- \end{verbatim}
- 
- Obviously, a less trivial function could import modules using HTTP, or do something else of interest.
- 
- \section{References}
- 
- See some of the papers on the Knowbot Programming Environment on
- CNRI's publications page: ``Knowbot programming: System support for
- mobile agents'', at
- \url{http://www.cnri.reston.va.us/home/koe/papers/iwooos-full.html}, and ``Using
- the Knowbot Operating Environment in a Wide-Area Network'', at
- \url{http://www.cnri.reston.va.us/home/koe/papers/mos.html}.
-   
- For information on Java's security model, consult the Java Security
- FAQ at \url{http://java.sun.com/sfaq/index.html}.
- 
- Perl supports similar features, via a software package called Penguin
- developed by Felix Gallo.
- Humberto Ortiz Zuazaga wrote a paper called "The Penguin Model for
- Secure Distributed Internet Scripting", at
- \url{http://www.hpcf.upr.edu/~humberto/documents/penguin-safe-scripting.html}.
- Thanks to Fred Drake for bringing it to my attention.
- 
- Work has also been done on Safe-Tcl; see ``The Safe-Tcl Security
- Model'', by Jacob Y. Levy, Laurent Demailly, John K. Ousterhout, and
- Brent B. Welch, in the Proceedings of the 1998 USENIX Annual Technical
- Conference.  Usenix members can access the paper online at
- \url{http://www.usenix.org/publications/library/proceedings/usenix98/levy.html}.
- 
- 
- The Janus project provides a secure environment for untrusted helper
- applications by trapping unsafe system calls.  The project page is
- \url{http://www.cs.berkeley.edu/~daw/janus/}.  Thanks to Paul Prescod
- for suggesting it.
- 
- Can you suggest other links, or some academic references, for this section?

  \section{Version History}

! Sep. 12, 1998: Minor revisions and added the reference to the Janus project.

  Feb. 26, 1998: First version.  Suggestions are welcome.
--- 14,32 ----
  \begin{abstract}
  \noindent

! Python provides a \module{rexec} module for running untrusted code.
! However, it's never been exhaustively audited for security and it
! hasn't been updated to take into account recent changes to Python such
! as new-style classes. Therefore, the
! \module{rexec} module should not be trusted.  To discourage use of 
! \module{rexec}, this HOWTO has been withdrawn.

  \end{abstract}

  \section{Version History}

! Sep. 12, 1998: Minor revisions and added the reference to the Janus
! project.

  Feb. 26, 1998: First version.  Suggestions are welcome.
***************
*** 564,567 ****
--- 37,42 ----
  Oct. 4, 2000: Checked with Python 2.0.  Minor rewrites and fixes made.
  Version number increased to 2.0.
+ 
+ Dec. 17, 2002: Withdrawn.

  \end{document}