|
From: Fuzzyman <fuz...@vo...> - 2006-05-23 15:13:13
|
Hello Robin, Thanks for your comments/work etc. I'll probably push out a 4.3.2 release soon with the minor bugfixes I've already done and then work on a 4.4.0 release. We can then decide how to integrate the new templating into that. The bazaar folks are concerned about ConfigObj performance and aim to create some patches to address that, hopefully they can be integrated into the same release. All the best, Fuzzyman http://www.voidspace.org.uk/python/index.shtml Robin Munn wrote: > On 5/23/06, Fuzzyman <fuz...@vo...> wrote: > >> Robin Munn wrote: >> >> > On 5/23/06, Robin Munn <rob...@gm...> wrote: >> > >> >> > Instead of reimplementing parts of string.Template, I'd copy it >> >> wholesale >> >> > from the stdlib string module to an additional module, and do a >> >> conditional >> >> > import, like this: >> >> > >> >> > try: >> >> > from string import Template >> >> > except ImportError: >> >> > from configobj.string import Template >> >> > >> >> > That way, when ConfigObj drops support for the older Python >> >> versions, the >> >> > additional module and the conditional import will be removed, and >> >> no code >> >> > duplication will remain. >> >> >> >> That was my first approach, but it didn't actually work. The problem >> >> was that string.Template expects you to pass it a single mapping >> >> (whether it's a dict or some other dict-like object, it doesn't >> >> matter), and doesn't have an approach for saying "If this mapping >> >> fails, try this other mapping". I suppose I could have caught the >> >> resulting KeyError and called string.Template.substitute() again with >> >> the second mapping, but I'm not sure it would have worked, and it >> >> would have looked quite a bit uglier than what I ended up with. >> >> >> >> OTOH, you have a very good point about using already tested and >> >> deployed code. So I'll reimplement the TemplateEngine class using >> >> string.Template, using the "call substitute() a second and third time >> >> if you get KeyError" approach, and see if it works out better. I'll >> >> post my results as a second patch if it works out. (Or even if, IMO, >> >> it doesn't). >> > >> > >> > Attempting this again, I'm reminded why I abandoned this approach when >> > I tried it yesterday. It turns out that because of the way >> > string.Template is written, I can't safely replicate ConfigObj's >> > current behavior. Here's why: >> > >> > 1) If ConfigObj doesn't find the value in any of the sections it >> > searches (the three DEFAULT subsections), it raises a >> > MissingInterpolationError. string.Template can do the same thing, >> > since its substitute() function will raise KeyError if the identifier >> > given isn't found in the mapping. (E.g., you tell string.Template to >> > interpolate "$foo" but don't have a key "foo" in the dict you hand >> > it). To chain the DEFAULT sections, I could just chain three calls to >> > string.Template.substitute(), only raising MissingInterpolationError >> > if the last one fails to find any values. So far so good. >> > >> > But: 2) ConfigObj also promises recursive interpolation, up to ten >> > levels deep. And here I run into a problem. Say I want to put the >> > value "$100" in my configuration file. I write it as "money = $$100", >> > since string.Template-style interpolation promises that a doubled >> > delimiter will be turned into a single delimiter in the output and >> > otherwise left untouched. Hence, "$$100" should become "$100". But >> > then the recursive interpolation kicks in, and the string "$100" is >> > sent around for another try -- where it fails, because "100" is not a >> > valid Python identifier. So I can't use substitute(); I need to use >> > safe_substitute(), which doesn't raise exceptions when it encounters >> > an invalid substitution, but simply leaves it alone. Using >> > safe_substitute() instead of substitute() will allow me to leave the >> > "$100" string alone. But wait -- if I'm using safe_substitute(), then >> > I'm also not raising an exception when I try to interpolate "$foo" >> > without a key "foo" in the values dictionary. Oops. >> > >> > I can't satisfy both 1) and 2) above by using the string.Template >> > class. So I need to write my own implementation, that can throw an >> > error when it sees "$foo" but will leave "$$100" alone. >> > >> > Actually, as I was working through this example, I just realized that >> > my patch still has a bug in it. It can deal with the template "$$100" >> > and produce "$100", and then leave that value alone -- but only >> > because "100" isn't a valid Python identifier and thus doesn't match >> > the template regexp anymore. But what if you really, really want to >> > put the word "$name" in one of the values of your config file? Doing >> > "$$name" won't work, because that interprets to "$name" and then a >> > MissingInterpolationError will be raised when TemplateEngine can't >> > find the key "name" in any of the sections it searches. >> > >> > What this patch really needs is a way to say "stop the interpolation >> > recursion now, we're through". The only approach I've come up with so >> > far is to say that if you encounter a $$ in your input, then stop >> > interpolating, because you've just turned that $$ into a $. And on the >> > next cycle you're going to start interpolating that value, when the >> > user has clearly communicated (by his escaping the delimiter) his >> > intent for that particular value *not* to be interpolated. >> > >> > That simply can't be done using string.Template, not even with >> > monkeypatching. What you would need is to be able to reach down inside >> > the string.Template.substitute() function's *helper* function and add >> > a line of code to say "If you find the 'escaped' group in this match >> > object, then don't just return the delimiter, but *also* set a flag to >> > let me know." That can't be done without resorting to bytecode hacks. >> > (Or perhaps, in Python 2.5, AST manipulation). And that's even worse >> > than rewriting string.Template myself. >> > >> > At the end of the day, there's no getting around it -- string.Template >> > is not designed for recursive interpolation. If we want to allow >> > recursive interpolation and get it right, we need to write the code >> > ourselves. Fortunately, it comes out pretty small. >> > >> Hmmm... on the other hand I'm not convinced recursive interpolation is >> used by anyone. I don't mind it being dropped for the string templating >> interpolation - *if* that makes it any easier. >> >> Fuzzyman >> http://www.voidspace.org.uk/python/index.shtml > > > Here's a revised version of the template-interpolation patch that > stops recursing the minute it encounters a doubled-up delimiter. It > added 14 lines of new code: 6 lines dealing with the "stop_recursion" > flag, and 8 lines of comments explaining why the flag might get set. > > I've got a pretty complete "foo.ini" file that exercises most of the > features in ConfigObj's interpolation by now. I should turn it into a > set of sensible test cases (basically, read in a config file and make > sure the interpolated options get their correct values and don't throw > exceptions). But I'm not sure my regular workload will stand any more > inattention today, so I'll have to put it off for now. Therefore, in > addition to the revised patch, I'm attaching the "foo.ini" file I've > been using for testing. If someone else wants to turn it into test > cases before I get around to it, be my guest. > > The one thing that I haven't included in my documentation patch, that > might possibly be surprising behavior to somebody who has an obscure > corner case, is the fact that recursion stops when it hits a $$. So if > someone's done something like: > > first = foo > second = $$bar > third = "$first and $second" > fourth = "$third and $second and $first" > fifth = $fourth > > When "$fourth" is interpolated, the result will be "$third and $second > and $first". That will yield "$first and $second and $$bar and foo". > Which gives "foo and $$bar and $bar and foo" -- and because a "$$" > just went past, interpolation stops right there. > > That might surprise someone, but I'm having a real hard time coming up > with reasons why anyone would want to do something twisted like this. > :-) So the "stop when you hit a $$" behavior should be fine for most > purposes. > >------------------------------------------------------------------------ > >=== docs/configobj.txt >================================================================== >--- docs/configobj.txt (revision 19655) >+++ docs/configobj.txt (local) >@@ -787,13 +787,13 @@ > * create_empty > * file_error > >-interpolate >-~~~~~~~~~~~ >+interpolation >+~~~~~~~~~~~~~ > > ConfigObj can perform string interpolation in a *similar* way to > ``ConfigParser``. See the interpolation_ section for full details. > >-If ``interpolate`` is set to ``False``, then interpolation is *not* done when >+If ``interpolation`` is set to ``False``, then interpolation is *not* done when > you fetch values. > > stringify >@@ -1943,9 +1943,29 @@ > ============= > > ConfigObj allows string interpolation *similar* to the way ``ConfigParser`` >+or ``string.Template`` work. The value of the ``interpolation`` attribute >+determines which style of interpolation you want to use. Valid values are >+"ConfigParser" or "Template" (case-insensitive, so "configparser" and >+"template" will also work). For backwards compatibility reasons, the value >+``True`` is also a valid value for the ``interpolation`` attribute, and >+will select ``ConfigParser``-style interpolation. At some undetermined point >+in the future, that default *may* change to ``Template``-style interpolation. > >-You specify a value to be substituted by including ``%(name)s`` in the value. >+For ``ConfigParser``-style interpolation, you specify a value to be >+substituted by including ``%(name)s`` in the value. > >+For ``Template``-style interpolation, you specify a value to be substituted >+by including ``${name}`` in the value. Alternately, if 'name' is a valid >+Python identifier (i.e., is composed of nothing but alphanumeric characters, >+plus the underscore character), then the braces are optional and the value >+can be written as ``$name``. >+ >+Note that ``ConfigParser``-style interpolation and ``Template``-style >+interpolation are mutually exclusive; you cannot have a configuration file >+that's a mix of one or the other. Pick one and stick to it. ``Template``-style >+interpolation is simpler to read and write by hand, and is recommended if >+you don't have a particular reason to use ``ConfigParser``-style. >+ > Interpolation checks first the 'DEFAULT' sub-section of the current section to > see if ``name`` is the key to a value. ('name' is case sensitive). > >=== pythonutils/configobj.py >================================================================== >--- pythonutils/configobj.py (revision 19655) >+++ pythonutils/configobj.py (local) >@@ -102,6 +102,7 @@ > __all__ = ( > '__version__', > 'DEFAULT_INDENT_TYPE', >+ 'DEFAULT_INTERPOLATION', > 'NUM_INDENT_SPACES', > 'MAX_INTERPOL_DEPTH', > 'ConfigObjError', >@@ -121,6 +122,7 @@ > 'flatten_errors', > ) > >+DEFAULT_INTERPOLATION = 'configparser' > DEFAULT_INDENT_TYPE = ' ' > NUM_INDENT_SPACES = 4 > MAX_INTERPOL_DEPTH = 10 >@@ -276,11 +278,114 @@ > """An error parsing in unrepr mode.""" > > >+class InterpolationEngine(object): >+ """ >+ A helper class to help perform string interpolation. >+ >+ This class is an abstract base class; its descendants perform >+ the actual work. >+ """ >+ >+ # compiled regexp to use in self.interpolate() >+ _KEYCRE = re.compile(r"%\(([^)]*)\)s") >+ >+ def __init__(self, section): >+ # the Section instance that "owns" this engine >+ self.section = section >+ # flag to tell the engine to stop looping through the recursion >+ # (can't use an exception for this, because some implementations >+ # such as TemplateEngine need to set this flag and still be able >+ # to return a value) >+ self.stop_recursion = False >+ >+ def interpolate(self, value): >+ depth = MAX_INTERPOL_DEPTH >+ # loop through this until it's done >+ self.stop_recursion = False >+ while depth: >+ depth -= 1 >+ if self._KEYCRE.search(value): >+ value = self._KEYCRE.sub(self._interpolation_replace, value) >+ else: >+ break >+ # self._interpolation_replace might have set this flag >+ if self.stop_recursion: >+ break >+ else: >+ raise InterpolationDepthError(value) >+ self.stop_recursion = False >+ return value >+ >+ def _fetch(self, name): >+ """Helper function to fetch values from owning section.""" >+ # switch off interpolation before we try and fetch anything ! >+ save_interp = self.section.main.interpolation >+ self.section.main.interpolation = False >+ # try the 'DEFAULT' member of owning section first >+ val = self.section.get('DEFAULT', {}).get(name) >+ # try the 'DEFAULT' member of owner's parent section next >+ if val is None: >+ val = self.section.parent.get('DEFAULT', {}).get(name) >+ # last, try the 'DEFAULT' member of the main section >+ if val is None: >+ val = self.section.main.get('DEFAULT', {}).get(name) >+ # restore interpolation to previous value before returning >+ self.section.main.interpolation = save_interp >+ if val is None: >+ raise MissingInterpolationOption(name) >+ return val >+ >+ def _interpolation_replace(self, match): >+ """Implementation-dependent helper function.""" >+ raise NotImplementedError >+ >+ >+class ConfigParserInterpolation(InterpolationEngine): >+ """Behaves like ConfigParser.""" >+ _KEYCRE = re.compile(r"%\(([^)]*)\)s") >+ >+ def _interpolation_replace(self, match): >+ s = match.group(1) >+ return self._fetch(s) >+ >+ >+class TemplateInterpolation(InterpolationEngine): >+ """Behaves like string.Template.""" >+ _delimiter = '$' >+ _KEYCRE = re.compile(r""" >+ \$(?: >+ (?P<escaped>\$) | # Two $ signs >+ (?P<named>[_a-z][_a-z0-9]*) | # $name format >+ {(?P<braced>[^}]*)} # ${name} format >+ ) >+ """, re.IGNORECASE | re.VERBOSE) >+ >+ def _interpolation_replace(self, match): >+ # Valid name (in or out of braces): fetch value from section >+ s = match.group('named') or match.group('braced') >+ if s is not None: >+ return self._fetch(s) >+ # Escaped delimiter (e.g., $$): return single delimiter >+ if match.group('escaped') is not None: >+ # Stop the recursive interpolation now, because the user intended >+ # this dollar sign to end up in the final result. If we go through >+ # one more level of recursion, we'll end up interpolating it. >+ self.stop_recursion = True >+ return self._delimiter >+ # Anything else: ignore completely, just return it unchanged >+ raise AssertionError, "Shouldn't get here -- invalid Template interpolation value" >+ return match.group() >+ >+interpolation_engines = { >+ 'configparser': ConfigParserInterpolation, >+ 'template': TemplateInterpolation, >+} >+ > class Section(dict): > """ > A dictionary-like object that represents a section in a config file. > >- It does string interpolation if the 'interpolate' attribute >+ It does string interpolation if the 'interpolation' attribute > of the 'main' object is set to True. > > Interpolation is tried first from the 'DEFAULT' section of this object, >@@ -293,8 +398,6 @@ > Iteration follows the order: scalars, then sections. > """ > >- _KEYCRE = re.compile(r"%\(([^)]*)\)s|.") >- > def __init__(self, parent, depth, main, indict=None, name=None): > """ > * parent is the section above >@@ -336,40 +439,27 @@ > self[entry] = indict[entry] > > def _interpolate(self, value): >- """Nicked from ConfigParser.""" >- depth = MAX_INTERPOL_DEPTH >- # loop through this until it's done >- while depth: >- depth -= 1 >- if value.find("%(") != -1: >- value = self._KEYCRE.sub(self._interpolation_replace, value) >+ try: >+ # do we already have an interpolation engine? >+ engine = self._interpolation_engine >+ except AttributeError: >+ # not yet: first time running _interpolate(), so pick the engine >+ name = self.main.interpolation >+ if name == True: # note that "if name:" would be incorrect here >+ # backwards-compatibility: interpolation=True means use default >+ name = DEFAULT_INTERPOLATION >+ name = name.lower() # so that "Template", "template", etc. all work >+ klass = interpolation_engines.get(name, None) >+ if klass is None: >+ # invalid value for self.main.interpolation >+ self.main.interpolation = False >+ return value > else: >- break >- else: >- raise InterpolationDepthError(value) >- return value >+ # save reference to engine so we don't have to do this again >+ engine = self._interpolation_engine = klass(self) >+ # let the engine do the actual work >+ return engine.interpolate(value) > >- def _interpolation_replace(self, match): >- """ """ >- s = match.group(1) >- if s is None: >- return match.group() >- else: >- # switch off interpolation before we try and fetch anything ! >- self.main.interpolation = False >- # try the 'DEFAULT' member of *this section* first >- val = self.get('DEFAULT', {}).get(s) >- # try the 'DEFAULT' member of the *parent section* next >- if val is None: >- val = self.parent.get('DEFAULT', {}).get(s) >- # last, try the 'DEFAULT' member of the *main section* >- if val is None: >- val = self.main.get('DEFAULT', {}).get(s) >- self.main.interpolation = True >- if val is None: >- raise MissingInterpolationOption(s) >- return val >- > def __getitem__(self, key): > """Fetch the item and do string interpolation.""" > val = dict.__getitem__(self, key) > > > > > > > |