SourceForge has been redesigned. Learn more.
Close

Pydoop version checker doesn't handle plus in version

Help
2013-04-22
2013-05-30
  • Jeremy G Kahn

    Jeremy G Kahn - 2013-04-22

    some older CDH versions use a + to delimit external and internal versions of Hadoop code, e.g. 0.20.2+320.

    This change supports handling that plus signature in pydoop.hadoop_utils.

    The patch is available on Github and will be pasted separately below.

     
  • Jeremy G Kahn

    Jeremy G Kahn - 2013-04-22

    Patch below:

    commit 0e12d80b6a47742722c1bea02177895bc953a363
    Author: Jeremy G. Kahn <jeremy@trochee.net>
    Date:   Tue Apr 16 14:28:31 2013 -0700
    
        Allow hadoop version to use a plus
    
        some older CDH versions use a + to delimit 'public' vs private versions
        of Hadoop code, e.g. 0.20.2+320.
    
        This change supports handling that signature in pydoop.hadoop_utils
    
    diff --git a/pydoop/hadoop_utils.py b/pydoop/hadoop_utils.py
    index 3d70495..a63cc0f 100644
    --- a/pydoop/hadoop_utils.py
    +++ b/pydoop/hadoop_utils.py
    @@ -63,7 +63,7 @@ class HadoopVersion(object):
       """
       def __init__(self, version_str):
         self.__str = version_str
    -    version = self.__str.split("-", 1)
    +    version = re.split(r"[-+]", self.__str, maxsplit=1)
         try:
    
     
  • Simone Leo

    Simone Leo - 2013-04-24

    Although we do not explicitly support older versions, adding this won't hurt. Thanks for the patch!

    Simone

     
  • Jeremy G Kahn

    Jeremy G Kahn - 2013-04-29

    Thanks. I know y'all don't explicitly support versions that old, but I'm working with a legacy Hadoop cluster that I can't upgrade (yet). With this patch in place, I find that I can get Pydoop working - without it, I am immediately blocked by a version assert from pydoop at compile time.

     
  • Jeremy G Kahn

    Jeremy G Kahn - 2013-05-09

    Do you have any plans to integrate this change into develop (or master)? I would like to avoid maintaining my own local copy of this patch, and I do not believe that it interferes with any other pydoop development.

     
  • Simone Leo

    Simone Leo - 2013-05-10

    Done:

    http://sourceforge.net/p/pydoop/code/ci/e922028adb6514c982caa40ee563853efe8efdff

    I had to add a few more lines to keep version objects consistent with the other ones. The "320" part in "0.20.2+320", for instance, is mapped to "(3, -1, 320)" (it's CDH3 and it's older than "cdh3u0" which is mapped to "(3, 0)"). I also updated the corresponding unit test.

    I chose not to add support for "0.20.2-CDH3B4" for now. I hope no one needs that :)

    Simone

     
  • Jeremy G Kahn

    Jeremy G Kahn - 2013-05-10

    These additional changes (using a negative number in the sort) break the resulting generated c code: I get pages of errors that look like this:

    src/_pipes_0_20_2_cdh_3_-1_320_main.cpp:8:1: error: invalid suffix "_320" on integer constant
    

    I understand the importance of keeping the version objects in line, but it's also important that the versions compile. Can we cope with a -1 minor version in a way that doesn't break the C code?

     
  • Simone Leo

    Simone Leo - 2013-05-13

    Fixed (hopefully) in commit 5164a98 (as I'm writing this, the web-based repository browser does not yet show this commit, but it's there if you checkout the develop branch via git).

    I tried to address all possible cases this time, while trying to leave things as consistent as possible. Thanks for your feedback and please let me know if the current version works for you

    Simone

     
  • Jeremy G Kahn

    Jeremy G Kahn - 2013-05-21

    This seems to work for me. Thanks for revising the version numbers.

    Any plans to issue a formal release with these changes? I'd like to manage these packages with an artifact repository and that requires a known version number.

     
  • Jeremy G Kahn

    Jeremy G Kahn - 2013-05-21

    hmm. no, this patch doesn't seem to work for me. I can't figure out why, but when I compile against develop, I have errors from the mappers which tell me it can't find HADOOP_CONF_DIR. When I compile against my original patch, it does find the appropriate pieces and functions.

    Still not happy with the revisions to this patch. I hope to abandon them all, of course, when my group upgrades to CDH4, but in the meantime I'm sticking to my revised patch until I can isolate what change broke my pydoop. :(

     
  • Simone Leo

    Simone Leo - 2013-05-30

    Hello,

    have you tried removing the old installation completely before installing the new version? You can also force the correct value of HADOOP_CONF_DIR by setting the environment variable explicitly.

    Simone

     
    Last edit: Simone Leo 2013-05-30
  • Jeremy G Kahn

    Jeremy G Kahn - 2013-05-30

    I have tried both of those things already; I think the version-munging creates some piece of generated C code with a different name than the engine expects - setting HADOOP_CONF_DIR should only be required when it cannot be computed from HADOOP_HOME; my previous compilation already sets HADOOP_HOME.

    We'll probably have to complete our adoption of CDH4 just to avoid this headache in the future; in the mean time I think we're going to have to stick with my homebrew patch.

     

Log in to post a comment.