From: Tibs (T. J. I. <ti...@us...> - 2003-10-21 12:00:28
Update of /cvsroot/docutils/sandbox/tibs/pysource2
In directory sc8-pr-cvs1:/tmp/cvs-serv25780

Modified Files:
    package.py test_package.py
Added Files:
    log.txt temp.rst temp.txt transform.py
Log Message:
Start transforming trees.

--- NEW FILE: log.txt ---
=============================
Writing and testing pysource2
=============================

:Author: Tibs
:Contact: ti...@ti...
:Revision: $Revision: 1.1 $
:Date: $Date: 2003/10/21 10:36:28 $
:Copyright: This document has been placed in the public domain.

pysource2 is my attempt to rewrite the original pysource. pysource itself
was a proof-of-concept module to find docstrings within Python source files
and present them as (by default) HTML documentation, as described by the
Docutils project. Since it was written before the Docutils codebase became
stabilised around its current Reader/Writer patterns, it doesn't really
mesh well with the current approaches. Also, lots of the code is fairly
grotty anyway, and could do with a rewrite on principle - not least because
it is not well tested.

So, pysource2 is both that rewrite, and also an attempt on my part to learn
how to do test driven development.

Setting the path
================

I want to take my docutils stuff directly from the source directories, so
that I work with the latest CVS code, and don't have to keep installing
things. Thus I need to set the Python path to point to the source
directories::

    export PYTHONPATH=${PYTHONPATH}:${HOME}/docutils

Since I'm using Python 2.2.3, I also need the modules in the "extras"
directory::

    export PYTHONPATH=${PYTHONPATH}:${HOME}/docutils/extras

If I want access to the testing stuff, I also need the "test" directory::

    export PYTHONPATH=${PYTHONPATH}:${HOME}/docutils/test

NB: Relies upon the code in docutils/docutils/readers/python/moduleparser.py

Log
===

The "obvious" place to start is with packages - the previous pysource never
did quite get them right (for a start, it wouldn't cope with sub-packages).
Also, having a utility to report on packages, then on modules, and gradually
on to finer levels of detail, seems like a way of providing something useful
as soon as possible.

It looked over-complex to adopt the docutils test framework itself,
initially, especially since I am new both to unit testing *and* to test
driven development. So I am being less ambitious, and working with "pure"
unit tests - I reckon I'll learn more that way.

So, the first pass gives me package.py and test_package.py.

My first impression of (such a little bit of) development is that TDD does
indeed give one the feeling of reassurance I'd expected from my half-TDD
efforts in Java at LSL.

Initially, I was looking to detect a request for a package that didn't
exist, or wasn't a directory file, explicitly, with dedicated exceptions.
This felt rather over-complex, and indeed refactoring those tests out and
just catching a (non-explicit) OSError in the tests works well enough - in
reality, a user is not going to ask to parse a package that is not already
known to be an existent directory (heck, the "user" is probably a program
that's just figured out if the thing whose documentation is wanted is a file
or a directory), and if they do then OSError makes sense since it is what
one would normally get.

Questions
=========

* Should we attempt to parse files that don't end in ".py"? What about
  ".pyw"? What about Python files on Unix which have had their extension
  removed and been made executable?

* Should there be an option to produce a document for a directory of Python
  files that is not a package - e.g., a directory of useful scripts put
  together just to be on the UNIX path, or Python's own library?

TODO
====

* Add a method to Module to indicate if it has an Attribute called
  __docformat__, and if so, what its value is.

* That requires understanding how the testing for the moduleparser is
  organised and works, so I can add an appropriate test.

* At which stage should I incorporate Package (and NotPython) therein?
* Write a simple transform (first learn how!) to parse any Docstring
  contents in a module with __docformat__ equal to one of the
  reStructuredText indicators.

* Write another transform to turn the Pythonic doctree into a standard one.

* At which point, we'll have something useful, albeit not very powerful,
  so provide an appropriate command line interface for (at least) HTML
  output.

* Work out how to do attribute references, etc., in *this* context (I have
  no idea if the mechanisms from the original pysource will be any use).

--- NEW FILE: temp.rst ---
<document source="temp.txt">
    <comment xml:space="preserve">
        This is a simple reStructuredText file that represents what I would
    <comment xml:space="preserve">
        like the output of transforming my test Python code to be
    <section id="package-trivial-package" name="package trivial_package">
        <title>
            Package trivial_package
        <section id="module-trivial-package-init" name="module trivial_package.__init__">
            <title>
                Module trivial_package.__init__
            <block_quote>
                <pending>
                    .. internal attributes:
                         .transform: docutils.transforms.misc.ClassAttribute
                         .details:
                           class: 'docstring'
                           directive: 'class'
                <paragraph>
                    A simple docstring.
        <section id="module-trivial-package-file1" name="module trivial_package.file1">
            <title>
                Module trivial_package.file1
            <block_quote>
                <pending>
                    .. internal attributes:
                         .transform: docutils.transforms.misc.ClassAttribute
                         .details:
                           class: 'docstring'
                           directive: 'class'
                <paragraph>
                    This is the first example file. It
                    <emphasis>
                        does
                    use reStructuredText.
            <paragraph>
                Attributes:
            <bullet_list bullet="*">
                <list_item>
                    <paragraph>
                        __docformat__ = "reST" (line 5)
            <paragraph>
                Import: os (line 7)
            <section id="class-trivial-package-file1-fred" name="class trivial_package.file1.fred">
                <title>
                    Class trivial_package.file1.Fred
                <field_list>
                    <field>
                        <field_name>
                            line
                        <field_body>
                            <paragraph>
                                9
                <pending>
                    .. internal attributes:
                         .transform: docutils.transforms.misc.ClassAttribute
                         .details:
                           class: 'docstring'
                           directive: 'class'
                <paragraph>
                    An example class - it announces each instance as it is created.
                <section id="method-trivial-package-file1-fred-init" name="method trivial_package.file1.fred.__init__">
                    <title>
                        Method trivial_package.file1.Fred.__init__
                    <field_list>
                        <field>
                            <field_name>
                                line
                            <field_body>
                                <paragraph>
                                    13
                        <field>
                            <field_name>
                                parameters
                            <field_body>
                                <paragraph>
                                    self
        <section id="module-trivial-package-file2" name="module trivial_package.file2">
            <title>
                Module trivial_package.file2
            <block_quote>
                <pending>
                    .. internal attributes:
                         .transform: docutils.transforms.misc.ClassAttribute
                         .details:
                           class: 'docstring'
                           directive: 'class'
                <paragraph>
                    This module is
                    <emphasis>
                        not
                    using reStructuredText for its docstrings.
        <section id="file-trivial-package-not-python" name="file trivial_package.not_python">
            <title>
                File trivial_package.not_python
            <paragraph>
                (Not a Python module)
        <section id="package-trivial-package-sub-package" name="package trivial_package.sub_package">
            <title>
                Package trivial_package.sub_package
            <section id="module-trivial-package-sub-package-init" name="module trivial_package.sub_package.__init__">
                <title>
                    Module trivial_package.sub_package.__init__
                <paragraph>
                    (No documentation)

--- NEW FILE: temp.txt ---
.. This is a simple reStructuredText file that represents what I would
.. like the output of transforming my test Python code to be

=======================
Package trivial_package
=======================

Module trivial_package.__init__
===============================

  .. class:: docstring

  A simple docstring.

Module trivial_package.file1
============================

  .. class:: docstring

  This is the first example file. It *does* use reStructuredText.

Attributes:

* __docformat__ = "reST" (line 5)

Import: os (line 7)

Class trivial_package.file1.Fred
--------------------------------

:line: 9

.. class:: docstring

An example class - it announces each instance as it is created.

Method trivial_package.file1.Fred.__init__
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

:line: 13
:parameters: self

Module trivial_package.file2
============================

  .. class:: docstring

  This module is *not* using reStructuredText for its docstrings.

File trivial_package.not_python
===============================

(Not a Python module)

Package trivial_package.sub_package
===================================

Module trivial_package.sub_package.__init__
-------------------------------------------

(No documentation)

--- NEW FILE: transform.py ---
"""transform.py - create a docutils Document tree from a Package or Module tree
"""

from docutils.utils import new_document
import docutils.nodes as nodes

def make_document(tree):
    """Return a docutils Document tree constructed from this Python tree.

    The tree given must be either a Package or Module tree.
    """
    document = new_document("Package trivial_package")
    section = nodes.section(id="package-trivial-package",
                            name="package trivial_package")
    title = nodes.title(text="Package trivial_package")
    section.append(title)
    document.append(section)
    return document

Index: package.py
===================================================================
RCS file: /cvsroot/docutils/sandbox/tibs/pysource2/package.py,v
retrieving revision 1.2
retrieving revision 1.3
diff -u -d -r1.2 -r1.3
--- package.py 21 Jul 2003 19:05:55 -0000 1.2
+++ package.py 21 Oct 2003 10:36:28 -0000 1.3
@@ -2,15 +2,13 @@
 """
 
 import os
-from docutils.readers.python.moduleparser import Node
+from docutils.readers.python.moduleparser import Node, parse_module
 
 class NotAPackageException(Exception):
     pass
 
-class NoSuchDirectoryException(Exception):
-    pass
-
-
+
+# ----------------------------------------------------------------------
 class Package(Node):
     """This class represents a Python package.
 
@@ -18,6 +16,9 @@
     This may be extended/altered/expanded to include/disambiguate the
     name of the package, the "full" name of the package (e.g., if it is a
     sub-package) and the full path of the package, as needs indicate.
+
+    Note that a package must, by definition, include at least one module,
+    i.e., __init__.py (otherwise, it isn't a package).
     """
 
     def __init__(self, filename):
@@ -39,39 +40,48 @@
         return Node.attlist(self, filename=self.filename)
 
 
-def parse_package(package_path):
+
+# ----------------------------------------------------------------------
+class NotPython(Node):
+    """This class is used to represent a non-Python file.
+
+    @@@ If the file isn't Python, should we try for reStructuredText?
+    """
+
+    def __init__(self, filename):
+        """Initialise a NotPython instance.
+
+        @@@ Same caveats as Package.
+        """
+        # Hackery - the following two lines copied from Node itself.
+        self.children = []
+        self.lineno = None
+        self.filename = filename
+
+    def attlist(self):
+        return Node.attlist(self, filename=self.filename)
+
+
+# ----------------------------------------------------------------------
+def parse_package(package_path,ignore=None):
     """Parse a package for documentation purposes.
 
     `package_path` should be the system path of the package directory, which is
     not necessarily the same as the Python path...
-
-    Note that the final result is expected to return a Docutils tree, not
-    a string - but for the moment a string is easier.
     """
 
     package_path = os.path.normpath(package_path)
 
-    if not os.path.exists(package_path):
-        raise NoSuchDirectoryException,\
-              "Directory '%s' does not exist"%package_path
-
-    if not os.path.isdir(package_path):
-        raise NotAPackageException,\
-              "Directory '%s' is not a Python package"%package_path
-
     dir,file = os.path.split(package_path)
     if dir == "": dir = "."
     return parse_subpackage(dir,file)
 
-def parse_subpackage(package_path,subpackage,indent=""):
+def parse_subpackage(package_path,subpackage):
     """Parse a subpackage for documentation purposes.
 
     `package_path` should be the system path of the package directory,
     and `subpackage` is the (file) name of the subpackage therein. It
     is assumed that this is already known to be a directory.
-
-    The indentation is purely for debugging purposes, and should not
-    (of course) actually be used in the returned value.
     """
 
     sub_path = os.path.join(package_path,subpackage)
@@ -81,19 +91,58 @@
               "Directory '%s' is not a Python package"%sub_path
 
     node = Package(subpackage)
-    ###text = '%s<Package filename="%s">\n'%(indent,subpackage)
-    for file in files:
-        if os.path.isdir(os.path.join(sub_path,file)):
+    # Should we sort the files? Well, if we don't have them in a predictable
+    # order, it is harder to test the result(!), and also I believe that it
+    # is easier to use the output if there is some obvious ordering. Of course,
+    # the question then becomes whether packages and modules should be in the
+    # same sequence, or separated.
+    files.sort()
+
+    for filename in files:
+        fullpath = os.path.join(sub_path,filename)
+        if os.path.isdir(fullpath):
             try:
-                ###text += parse_subpackage(sub_path,file,indent+" ")
-                node.append(parse_subpackage(sub_path,file))
+                node.append(parse_subpackage(sub_path,filename))
             except NotAPackageException:
                 pass
-
-    ###return text
+        else:
+            node.append(parse_file(fullpath,filename))
     return node
 
 
+def parse_file(fullpath,filename):
+    """Parse a single file (which we hope is a Python file).
+
+    * `fullpath` is the full path of the file
+    * `filename` is the name we want to use for it in the docutils tree
+
+    Returns a docutils parse tree for said file.
+    """
+
+    # @@@ Should we worry about the extension of the file?
+    # Trying to use that to predict the contents can be a problem
+    # - we already know that we have to worry about ".pyw" as well
+    # as ".py", not to mention the possibility (e.g., on Unix) of
+    # having removed the extension in order to make an executable
+    # file "look" more like a Unix executable. On the whole, it's
+    # probably better to try to parse a file, and worry about it
+    # not parsing if/when that occurs.
+
+    module = open(fullpath)
+    try:
+        module_body = module.read()
+        try:
+            module_node = parse_module(module_body,filename)
+        except SyntaxError:
+            # OK - it wasn't Python - so what *should* we do with it?
+            module_node = NotPython(filename)
+        return module_node
+    finally:
+        module.close()
+
+
+
+# ----------------------------------------------------------------------
 if __name__ == "__main__":
     result = parse_package("trivial_package")
     print result

Index: test_package.py
===================================================================
RCS file: /cvsroot/docutils/sandbox/tibs/pysource2/test_package.py,v
retrieving revision 1.2
retrieving revision 1.3
diff -u -d -r1.2 -r1.3
--- test_package.py 21 Jul 2003 19:05:55 -0000 1.2
+++ test_package.py 21 Oct 2003 10:36:28 -0000 1.3
@@ -1,8 +1,9 @@
+#! /usr/bin/env python
 """test_package.py
 
 Unit tests for parsing packages for pysource.
 
-Initially, this is a standalone test, but ultimately it may be merge into the
+Initially, this is a standalone test, but ultimately it may be merged into the
 mechanisms used for the Docutils self-tests.
 
 :Author: Tibs
@@ -13,8 +14,9 @@
 """
 
 import unittest
-from package import parse_package, NotAPackageException, \
-     NoSuchDirectoryException
+
+from package import parse_package, NotAPackageException
+from transform import make_document
 
 class PackageTest(unittest.TestCase):
 
@@ -22,7 +24,7 @@
         """Not a package - no such directory.
         """
 
-        self.assertRaises(NoSuchDirectoryException,
+        self.assertRaises(OSError,
                           parse_package,
                           "no_such_directory")
 
@@ -30,7 +32,7 @@
         """Not a package - file is not a directory.
         """
 
-        self.assertRaises(NotAPackageException,
+        self.assertRaises(OSError,
                           parse_package,
                           "not_a_directory")
 
@@ -42,14 +44,78 @@
                           parse_package,
                           "not_a_package")
 
-    def testTrivialPackage(self):
-        """Trivial package(s) - only empty __init__.py files.
+    def testPackage(self):
+        """A package containing subpackage(s)
+
+        The directory is called "trivial_package" for historical reasons.
         """
 
-        self.assertEqual(str(parse_package("trivial_package")),
-                         """\
+        wanted_result = """\
 <Package filename="trivial_package">
-    <Package filename="sub_package">\n""")
+    <Module filename="__init__.py">
+        <Docstring>
+            A simple docstring.
+    <Module filename="file1.py">
+        <Docstring>
+            This is the first example file. It *does* use reStructuredText.
+        <Attribute lineno="5" name="__docformat__">
+            <Expression lineno="5">
+                "reST"
+        <Import lineno="7">
+            os
+        <Class lineno="9" name="Fred">
+            <Docstring lineno="9">
+                An example class - it announces each instance as it is created.
+            <Method lineno="13" name="__init__">
+                <ParameterList lineno="13">
+                    <Parameter lineno="13" name="self">
+    <Module filename="file2.py">
+        <Docstring>
+            This module is *not* using reStructuredText for its docstrings.
+    <NotPython filename="not_python">
+    <Package filename="sub_package">
+        <Module filename="__init__.py">\n"""
+
+        actual_result = str(parse_package("trivial_package"))
+
+        if wanted_result != actual_result:
+            print "+++++++++++++++++++++++++ WANT"
+            print wanted_result
+            print "+++++++++++++++++++++++++ GOT"
+            print actual_result
+            print "+++++++++++++++++++++++++"
+
+        self.assertEqual(actual_result,wanted_result)
+
+    def testFindDocstrings(self):
+        """
+        Find each docstring, and format it appropriately.
+        """
+
+        # @@@ For the moment, just wrap each docstrings innnards inside
+        # a literal block (which is what we want to do if the module/file
+        # does not indicate that docstrings are in reStructuredText).
+        wanted_result = """\
+<document source="Package trivial_package">
+    <section id="package-trivial-package" name="package trivial_package">
+        <title>
+            Package trivial_package
+"""
+
+        tree = parse_package("trivial_package")
+
+        document = make_document(tree)
+
+        actual_result = document.pformat()
+
+        if wanted_result != actual_result:
+            print "+++++++++++++++++++++++++ WANT"
+            print wanted_result
+            print "+++++++++++++++++++++++++ GOT"
+            print actual_result
+            print "+++++++++++++++++++++++++"
+
+        self.assertEqual(actual_result,wanted_result)
 
 
 if __name__ == "__main__":
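
The "parse it first, and worry about it not parsing if/when that occurs"
approach taken by parse_file, together with the sorted directory walk in
parse_subpackage, can be sketched roughly as below. This is a minimal
illustration only, not the committed code: it is written for Python 3, it
uses the standard library's ast.parse as a stand-in for docutils'
parse_module, and the names classify_file and walk_package are hypothetical.

```python
import ast
import os


def classify_file(fullpath):
    """Return "Module" if the file parses as Python, else "NotPython".

    Hypothetical helper: ast.parse stands in for docutils' parse_module.
    """
    with open(fullpath, "rb") as handle:
        source = handle.read()
    try:
        ast.parse(source, filename=fullpath)
    except (SyntaxError, ValueError):
        # ValueError is raised for null bytes on some Python 3 versions.
        return "NotPython"
    return "Module"


def walk_package(package_path):
    """Return a list of (relative name, kind) pairs for a package.

    Names are visited in sorted order so the result is predictable and
    easy to test. Directories without an __init__.py are skipped, in the
    same spirit as parse_subpackage ignoring NotAPackageException.
    """
    results = []
    for name in sorted(os.listdir(package_path)):
        fullpath = os.path.join(package_path, name)
        if os.path.isdir(fullpath):
            if os.path.exists(os.path.join(fullpath, "__init__.py")):
                results.append((name, "Package"))
                for child, kind in walk_package(fullpath):
                    results.append((name + "/" + child, kind))
        else:
            results.append((name, classify_file(fullpath)))
    return results
```

One consequence of deciding by parsing rather than by extension is that a
text file which happens to be syntactically valid Python (an empty file, or
a single bare word) is reported as a Module.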