From: A.M. K. <aku...@us...> - 2002-12-17 14:31:49
Update of /cvsroot/py-howto/pyhowto In directory sc8-pr-cvs1:/tmp/cvs-serv3716 Modified Files: rexec.tex Log Message: Withdraw the rexec HOWTO Index: rexec.tex =================================================================== RCS file: /cvsroot/py-howto/pyhowto/rexec.tex,v retrieving revision 1.12 retrieving revision 1.13 diff -C2 -r1.12 -r1.13 *** rexec.tex 26 Nov 2002 16:05:50 -0000 1.12 --- rexec.tex 17 Dec 2002 14:31:42 -0000 1.13 *************** *** 3,7 **** \title{Restricted Execution HOWTO} ! \release{2.0} \author{A.M. Kuchling} --- 3,7 ---- \title{Restricted Execution HOWTO} ! \release{2.1} \author{A.M. Kuchling} *************** *** 14,559 **** \begin{abstract} \noindent - Python provides a restricted execution mode for running untrusted code - that will prevent the code from performing dangerous operations. This - HOWTO explains how to use restricted execution mode, and how to - customize the restricted environment for your application. It aims to - provide a gentler introduction than the corresponding section in the - Python Library Reference. ! This document is available from the Python HOWTO page at ! \url{http://www.python.org/doc/howto}. \end{abstract} - \tableofcontents - - \section{Basic use of \class{RExec}} - - For some applications, it's desirable to execute chunks of Python code - that come from an outside source. The most obvious example is a Web - browser such as Grail, which can download and execute applets written in - Python. - - An obvious danger of downloading and running code from anywhere is that - someone might write a malicious applet that appears to be harmless, but - silently erases files, makes copies of sensitive data, or gives the - applet's author a back door into your system. The solution is to run - the code in a restricted environment, where it's prevented from - performing any operations that could be used maliciously. - - Java does this by using the Java Virtual Machine, which executes Java - bytecode. 
The virtual machine, or VM, has complete control over the - running applet, and any dangerous operations must go through the VM in - order to be performed. The VM can therefore trap suspicious activity, - and stop the applet's execution, if a strict security policy is used, or - ask the user if the operation should be permitted, if the policy is - somewhat looser. - - Python already has a virtual machine that executes Python byte codes, so - creating a restricted execution environment simply requires sealing off - dangerous built-in functions such as \code{open()}, and dangerous - modules, such as the \code{socket} module. This can be done by creating - new namespaces, removing any dangerous functions, and forcing code to be - executed in those namespaces. While a simple idea, in practice it's - fairly complicated to implement. Luckily, the required features have - been present in Python for a while, and it's already been implemented - for you as a standard module. - - Code for using a restricted execution environment is in the \file{rexec} - module. The base class is called \class{RExec}; in a later section of - this HOWTO, we'll show you how to create your own subclasses of - \class{RExec} to customize the functions and modules that are available. - Here's the documentation for creating a new \class{RExec} instance: - - \begin{funcdesc}{RExec}{[\var{hooks}], [\var{verbose}] } - Returns a \class{RExec} instance. The \var{verbose} parameter is a - Boolean value, defaulting to false. If true, the \class{RExec} instance - will execute in verbose mode, which will print a debugging message when - modules are imported, as if the \code{-v} option was given to the Python - interpreter. - - The \var{hooks} parameter can be an instance of the \code{RHooks} - class, or of some subclass of \code{RHooks}; a default instance will - be used if the parameter is omitted. 
This is only required when - creating particularly exotic restricted environments that import - modules in new ways. If you need to use this, you'll have to - consult the source code (or Guido) for a complete picture of what's - going on. - \end{funcdesc} - - The \class{RExec} instance has \code{r_exec()}, \code{r_eval()}, and - \code{r_execfile()} functions, which do the same thing as Python's - built-in \code{exec()}, \code{eval()}, and \code{execfile()} functions, - performing them in the restricted environment. (There are also - \code{s_exec()}, \code{s_eval()}, and \code{s_execfile()} methods which - replace the restricted environment's standard input, output, and error - files with \code{StringIO} objects that allow you to control the input - and capture any output generated.) - - Here's a sample usage of a restricted environment. First, the - \class{RExec} instance has to be created. - - \begin{verbatim} - r_env = rexec.RExec() - \end{verbatim} - - Now, we can execute code and evaluate expressions - in the environment: - - \begin{verbatim} - r_env.r_exec('import string') - expr = 'string.upper("This is a test")' - print r_env.r_eval( expr ) - \end{verbatim} - - The first line executes a statement, importing the \code{string} module. - Since it's considered a safe module, the operation succeeds. The second - and third lines create a string containing an expression, and evaluates - the expression in the restricted environment; it prints out \samp{THIS - IS A TEST}, as you'd expect. - - Unsafe operations trigger an exception. For example: - - \begin{verbatim} - r_env.r_exec('import socket') - \end{verbatim} - - The previous line will cause an \code{ImportError} exception to be - raised, with an associated string value that reads "untrusted dynamic - module: _socket". 
Trying to open a file for writing is also forbidden: - - \begin{verbatim} - r_env.r_exec('file = open("/tmp/a.out", "w")') - \end{verbatim} - - This will raise an \code{IOError} exception, with an assocated string - value that reads "can't open files for writing in restricted mode". The - restricted code can catch the exception in a \code{try...except} block - and continue running; this is useful for writing code which works in - both restricted and unrestricted mode. Opening files for reading will - work, however. - - Exactly what restrictions does the base \class{RExec} impose? It limits - the modules that can be imported to the following safe list: - - \begin{verbatim} - audioop, array, binascii, cmath, errno, imageop, - marshal, math, md5, operator, parser, regex, - pcre, rotor, select, strop, struct, time - \end{verbatim} - - In general, these are modules that can't affect anything outside of - the executing code; they allow various forms of computation, but don't - allow operations that change the filesystem or use network connections - to other machines. (The \code{pcre} module may be unfamiliar. It's - an internal module used by the \module{re} module, so restricted code - can still use the \module{re} to perform regular expression matches.) - - It also restricts the variables and functions that are available from - the \code{sys} and \code{os} modules. The \code{sys} module only - contains the following symbols: - - \begin{verbatim} - ps1, ps2, copyright, version, platform, exit, maxint - \end{verbatim} - - The \code{os} module is reduced to the following functions: - - \begin{verbatim} - error, fstat, listdir, lstat, readlink, - stat, times, uname, getpid, getppid, - getcwd, getuid, getgid, geteuid, getegid - \end{verbatim} - - Note that restricted code has some read-only access to the filesystem - via functions like \code{os.stat} and \code{os.readlink}; if you wish to - forbid all access to the filename, these functions must be removed. 
- - In restricted mode, there are various attributes of function and class - objects that are no longer accessible: the \code{__dict__} attribute of - class, instance and module objects; the \code{__self__} attribute of - method objects; and most of the attributes of function objects, namely - \code{func_code}, \code{func_defaults}, \code{func_doc}, - \code{func_globals}, and \code{func_name}. - - The \code{__import__()} and \code{reload()} functions are replaced by - versions which implement the above restrictions. Finally, Python's - usual \code{open()} function is removed and replaced by a restricted - version that only allows opening files for reading. - - To change any of these policies, whether to be stricter or looser, see - the section below on customizing the restricted environment. - - \section{Frequently Asked Questions} - - \emph{How do I guard against denial-of-service attacks? Or, how do I - keep restricted code from consuming a lot of memory?} - - Even if restricted code can't open sockets or write files, it can still - cause problems by entering an infinite loop or consuming lots of memory; - this is as easy as coding \code{while 1: pass} or \code{'a' * - 12345678901}. Unfortunately, there's no way at present to prevent - restricted code from doing this. The Python process may therefore - encounter a \code{MemoryError} exception, loop forever, or be killed by - the operating system. - - One solution would be to perform \code{os.fork()} to get a child process - running the interpreter. The child could then use the \code{resource} - module to set limits on the amount of memory, stack space, and CPU time - it can consume, and run the restricted code. In the meantime, the - parent process can set a timeout and wait for the child to return its - results; if the child takes too long, the parent can conclude that the - restricted code looped forever, and kill the child process. 
- - \emph{If restricted code returns a class instance via \code{r_eval()}, - can that class instance do nasty things if unrestricted code calls its - methods?} - - You might be worried about the handling of values returned by - \code{r_eval()}. For example, let's say your program does this: - - \begin{verbatim} - value = r_env.r_eval( expression ) - print str(value) - \end{verbatim} - - If \code{value} is a class instance, and has a \code{__str__} method, - that method will get called by the \code{str()} function. Is it - possible for the restricted code to return a class instance where the - \code{__str__} function does something nasty? Does this provide a way - for restricted code to smuggle out code that gets run without - restrictions? - - The answer is no. If restricted code returns a class instance, or a - function, then, despite being called by unrestricted code, those - functions will always be executed in the restricted environment. You - can see why if you follow this little exercise. Run the interpreter in - interactive mode, and create a sample class with a single method. - - \begin{verbatim} - >>> class C: - ... def f(self): print "Hi!" - ... - \end{verbatim} - - Now, look at the attributes of the unbound method \code{C.f}: - - \begin{verbatim} - >>> dir(C.f) - ['__doc__', '__name__', 'im_class', 'im_func', 'im_self'] - \end{verbatim} - - \code{im_func} is the attribute we're interested in; it contains the - actual function for the method. Look at the function's attributes using - the \code{dir()} built-in function, and then look at the - \code{func_globals} attribute. 
- - \begin{verbatim} - >>> dir(C.f.im_func) - ['__doc__', '__name__', 'func_code', 'func_defaults', 'func_doc', - 'func_globals', 'func_name'] - >>> C.f.im_func.func_globals - {'__doc__': None, '__name__': '__main__', - '__builtins__': <module '__builtin__'>, - 'f': <function f at 1201a68b0>, - 'C': <class __main__.C at 1201b35e0>, - 'a': <__main__.C instance at 1201a6b10>} - \end{verbatim} - - See how the function contains attributes for its \code{__builtins__} - module? This means that, wherever it goes, the function will always use - the same \code{__builtin__} module, namely the one provided by the - restricted environment. - - This means that the function's module scope is limited to that of the - restricted environment; it has no way to access any variables or - methods in the unrestricted environment that is calling into the - restricted environment. - - \begin{verbatim} - r_env.r_exec('def f(): g()\n') - f = r_env.r_eval('f') - def g(): print "I'm unrestricted." - \end{verbatim} - - If you execute the \code{f()} function in the unrestricted module, it - will fail with a \code{NameError} exception, because \code{f()} doesn't - have access to the unrestricted namespace. To make this work, you'd - must insert \code{g} into the restricted namespace. Be careful when - doing this, since \code{g} will be executed without restrictions; you - have to be sure that \code{g} is a function that can't be used to do - any damage. (Or is an instance with no methods that do anything - dangerous. Or is a module containing no dangerous functions. You get - the idea.) - - - \emph{What happens if restricted code raises an exception?} - - The \module{rexec} module doesn't do anything special for exceptions - raised by restricted code; they'll be propagated up the call stack - until a \code{try...except} statement is found that catches it. If - no exception handler is found, the interpreter will print a traceback and exit, which - is its usual behaviour. 
To prevent untrusted code from terminating - the program, you should surround calls to \code{r_exec()}, - \code{r_execfile()}, etc. with a \code{try...except} statement. - - Python 1.5 introduced exceptions that could be classes; for more - information about this new feature, consult - \url{http://www.python.org/doc/essays/stdexceptions.html}. - Class-based exceptions present a problem; the separation between - restricted and unrestricted namespaces may cause confusion. Consider - this example code, suggested by Jeff Rush. - - t1.py: - \begin{verbatim} - # t1.py - - from rexec import RHooks, RExec - from t2 import MyException - r= RExec( ) - - print 'MyException class:', repr(MyException) - try: - r.r_execfile('t3.py') - except MyException, args: - print 'Got MyException in t3.py' - except: - print 'Missed MyException "%s" in t3.py' % repr(MyException) - \end{verbatim} - - t2.py - \begin{verbatim} - #t2.py - - class MyException(Exception): pass - def myfunc(): - print 'Raising', `MyException` - raise MyException, 5 - - print 't2 module initialized' - \end{verbatim} - - t3.py: - \begin{verbatim} - #t3.py - import sys - from t2 import MyException, myfunc - myfunc() - \end{verbatim} - - So, \file{t1.py} imports the \code{MyException} class from - \file{t2.py}, and then executes some restricted code that also imports - \file{t2.py} and raises \code{MyException}. However, because of the - separation between restricted and unrestricted code, \code{t2.py} is - actually imported twice, once in each mode. Therefore two distinct - class objects are created for \code{MyException}, and the - \code{except} statement doesn't catch the exception because it seems - to be of the wrong class. - - The solution is to modify \file{t1.py} to pluck the class object out - of the restricted environment, instead of importing it. 
The following - code will do the job, if added to \code{t1.py}: - - \begin{verbatim} - module = r.add_module('__main__') - mod_dict = module.__dict__ - MyException = mod_dict['MyException'] - \end{verbatim} - - The first two lines simply get the dictionary for the \code{__main__} - module; this is a usage pattern discussed above. The last line simply - gets the value corresponding to 'MyException', which will be the class - object for \code{MyException}. - - \section{Customizing The Restricted Environment} - \label{sect-customizing} - - \subsection{Inserting Variables} - - While restricted code may be completely self-contained, it's common for - it to require other data: perhaps a tuple listing various available - plug-ins, or a dictionary mapping symbols to values. For simple Python - data types, such as numbers and strings, the natural solution is to - insert variables into one of the namespaces used by the restricted - environment, binding the desired variable name to the value. - - Continuing from the examples above, you can get the dictionary - corresponding to the restricted module named \code{module_name} with the - following code: - - \begin{verbatim} - module = r_env.add_module(module_name) - mod_dict = module.__dict__ - \end{verbatim} - - Despite its name, the \code{add_module()} method actually only adds the - module if it doesn't already exist; it returns the corresponding module - object, whether or not the module had to be created. - - Most commonly, you'll insert variable bindings into the \code{__main__} - or \code{__builtins__} module, so these will be the most frequent values - of \code{module_name}. - - Once you have the module's dictionary, you need only insert a key/value - pair for the desired variable name and value. For example, to add a - \code{username} variable: - - \begin{verbatim} - mod_dict['username'] = "Kate Bush" - \end{verbatim} - - Restricted code will then have access to this variable. 
- - \subsection{Allowing Access to Unrestricted Objects} - - Often, the code being executed will need access to various objects that - exist outside the restricted environment. For example, an applet should - be able to read some attributes of the object representing the browser, - or needs access to the \code{Tkinter} module to provide a GUI display. - But the browser object, or the \code{Tkinter} module aren't safe, so - what can be done? - - The solution is in the \code{Bastion} module, which lets you create - class instances that represent some other Python object, but deny access - to certain sensitive attributes or methods. - - \begin{funcdesc}{Bastion}{\var{object}, [\var{filter}], [\var{name}], - [\var{class}] } - - Return a \code{Bastion} instance protecting the class instance - \var{object}. Any attempt to access one of the object's attributes will - have to be approved by the \var{filter} function; if the access is - denied an \code{AttributeError} exception will be raised. - - If present, \var{filter} must be a function that accepts a string - containing an attribute name, and returns true if access to that - attribute will be permitted; if \var{filter} returns false, the access - is denied. The default filter denies access to any function beginning - with an underscore \samp{_}. The bastion's string representation - will be \code{<Bastion for \var{name}>} if a value for - \var{name} is provided; otherwise, \code{repr(\var{object})} will be used. - - \var{class}, if present, would be a subclass of \code{BastionClass}; - see the code in \file{bastion.py} for the details. Overriding the - default \code{BastionClass} will rarely be required. - \end{funcdesc} - - So, to safely make an object available to restricted code, create a - \code{Bastion} object protecting it, and insert the \code{Bastion} - instance into the restricted environment's namespace. 
- - For example, the following code will create a bastion for an instance, - named \code{S}, that simulates a dictionary. We want restricted code to - be able to set and retrieve values from \code{S}, but no other - attributes or methods should be accessible. - - \begin{verbatim} - import Bastion - maindict = r_env.modules['__main__'].__dict__ - maindict['S'] = Bastion.Bastion(SS, - filter = lambda name: name in ['__getitem__', '__setitem__'] ) - \end{verbatim} - - \subsection{Modifying Built-ins} - - Often you'll wish to customize the restricted environment in various - ways, most commonly by adding or subtracting variables or functions from - the modules available. At a more advanced level, you might wish to - write replacements for existing functions; for example, a Web browser - that executes Python applets would have an import function that allows - retrieving modules via HTTP and importing them. - - An easy way to add or remove functions is to create the \class{RExec} - instance, get the namespace dictionary for the desired module, and add - or delete the desired function. For example, the \class{RExec} class - provides a restricted \code{open()} that allows opening files for - reading. If you wish to disallow this, you can simply delete 'open' - from the \class{RExec} instance's \code{__builtin__} module. - - \begin{verbatim} - module = r_env.add_module('__builtin__') - mod_dict = module.__dict__ - del mod_dict['open'] - \end{verbatim} - - (This isn't enough to prevent code from accessing the filesystem; - the \class{RExec} class also allows access - via some of the functions in the \code{posix} module, which is usually - aliased to the \code{os} module. See below for how to change this.) - - This is fine if only a single function is being added or removed, but - for more complicated changes, subclassing the \class{RExec} class is a - better idea. - - Subclassing can potentially be quite simple. 
The \class{RExec} class - defines some class attributes that are used to initialize the restricted - versions of modules such as \code{os} and \code{sys}. Changing the - environment's policy then requires just changing the class attribute in - your subclass. For example, the default environment allows restricted - code to use the \code{posix} module to get its process and group ID. If - you decide to disallow this, you can do it with the following custom - class: - - \begin{verbatim} - class MyRExec(rexec.RExec): - ok_posix_names = ('error', 'fstat', 'listdir', 'lstat', 'readlink', - 'stat', 'times', 'uname') - \end{verbatim} - - More elaborate customizations may require overriding one of the methods - called to create the corresponding module. The functions to be - overridden are \code{make_builtin}, \code{make_main}, - \code{make_osname}, and \code{make_sys}. The \code{r_import}, - \code{r_open}, and \code{r_reload} methods are made available to - restricted code, so by overriding these functions, you can change the - capabilities available. - - For example, defining a new import function requires overriding - \code{r_import}: - - \begin{verbatim} - class MyRExec(rexec.RExec): - def r_import(self, mname, globals={}, locals={}, fromlist=[]): - raise ImportError, "No imports allowed--ever" - \end{verbatim} - - Obviously, a less trivial function could import modules using HTTP, or do something else of interest. - - \section{References} - - See some of the papers on the Knowbot Programming Environment on - CNRI's publications page: ``Knowbot programming: System support for - mobile agents'', at - \url{http://www.cnri.reston.va.us/home/koe/papers/iwooos-full.html}, and ``Using - the Knowbot Operating Environment in a Wide-Area Network'', at - \url{http://www.cnri.reston.va.us/home/koe/papers/mos.html}. - - For information on Java's security model, consult the Java Security - FAQ at \url{http://java.sun.com/sfaq/index.html}. 
- - Perl supports similar features, via a software package called Penguin - developed by Felix Gallo. - Humberto Ortiz Zuazaga wrote a paper called "The Penguin Model for - Secure Distributed Internet Scripting", at - \url{http://www.hpcf.upr.edu/~humberto/documents/penguin-safe-scripting.html}. - Thanks to Fred Drake for bringing it to my attention. - - Work has also been done on Safe-Tcl; see ``The Safe-Tcl Security - Model'', by Jacob Y. Levy, Laurent Demailly, John K. Ousterhout, and - Brent B. Welch, in the Proceedings of the 1998 USENIX Annual Technical - Conference. Usenix members can access the paper online at - \url{http://www.usenix.org/publications/library/proceedings/usenix98/levy.html}. - - - The Janus project provides a secure environment for untrusted helper - applications by trapping unsafe system calls. The project page is - \url{http://www.cs.berkeley.edu/~daw/janus/}. Thanks to Paul Prescod - for suggesting it. - - Can you suggest other links, or some academic references, for this section? \section{Version History} ! Sep. 12, 1998: Minor revisions and added the reference to the Janus project. Feb. 26, 1998: First version. Suggestions are welcome. --- 14,32 ---- \begin{abstract} \noindent ! Python provides a \module{rexec} module running untrusted code. ! However, it's never been exhaustively audited for security and it ! hasn't been updated to take into account recent changes to Python such ! as new-style classes. Therefore, the ! \module{rexec} module should not be trusted. To discourage use of ! \module{rexec}, this HOWTO has been withdrawn. \end{abstract} \section{Version History} ! Sep. 12, 1998: Minor revisions and added the reference to the Janus ! project. Feb. 26, 1998: First version. Suggestions are welcome. *************** *** 564,567 **** --- 37,42 ---- Oct. 4, 2000: Checked with Python 2.0. Minor rewrites and fixes made. Version number increased to 2.0. + + Dec. 17, 2002: Withdrawn. \end{document}