Import spss bugfixes

Help
fsando
2007-12-13
2013-01-15
  • fsando

    fsando - 2007-12-13

    Hi
    I am now using RKWard regularly and I have found a couple of bugs in the import dialog regarding SPSS datasets. I have gone through the code and written some changes that should correct them. I am not sure what the best way to go about this process is.
    The bugs:
    1. Unchecking "Use value labels" causes read.spss to fall back to its default value, which happens to be TRUE instead of FALSE
    2. Unchecking "Edit" causes the data not to be imported at all

    There is also an issue with value labels not being shown in the RKWard editor; rather, the editor appears to use the "levels" attribute instead. I am working on a fix that will convert labels to levels, though there are some snags.
    Regards

     
    • Thomas Friedrichsmeier

      Hi,

      the best way to get these changes into RKWard is to write a message to rkward-devel at lists dot sf dot net. This makes it easier to follow up in replies. For representing the changes you made, use
      # diff -u oldfile newfile
      This allows us to see easily what the relevant changes are. If you're using an SVN version, "svn diff" will do the trick nicely (see http://rkward.sourceforge.net/wiki/index.php?title=RKWard_SVN).

      The issue of value labels / levels is a bit difficult, since R only supports labels / levels for factors, out-of-the-box. RKWard uses the levels attribute to be compatible with R in this respect, but does not insist on assigning levels only to factors. But let us know about your approach!

      Regards
      Thomas

       
    • fsando

      fsando - 2007-12-17

      I tried to post to rkward-devel but received this error

      550-Postmaster verification failed
      [...]
      550-Several RFCs state that you are required to have a postmaster
      550-mailbox for each mail domain. This host does not accept mail
      550-from domains whose servers reject the postmaster address.
      550 Sender verify failed (in reply to RCPT TO command)

      I don't know if it's just because I'm posting too soon after joining or if it's a setting on my mail server. If it persists I may post here until it is solved.
      Anyway, this is what I tried to post
      (the diff-file is pasted at the end):
      ==================================================================
      I believe there are a couple of errors in the code generated by the "Import SPSS file" dialog. I have created a patch that should correct this.

      I have attached the diff-file (svn diff)

      The two errors are:
      1) When "Use value labels" is deselected, the "use.value.labels" argument is omitted (i.e. left at its default). But the default is TRUE, so the generated code should say "use.value.labels=FALSE" explicitly.
      I made an else clause for when getRK_val ("use_labels") is FALSE.

      Perhaps it would be better to rely on the default and only add the argument when getRK_val ("use_labels") is FALSE, but I cannot easily determine the consequences of that, hence this approach.

      2) When "Edit" is deselected the assignment to globalenv is removed along with the "edit" statement. I moved the assignment outside the "doedit" condition.
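The logic of the two fixes can be sketched outside the plugin framework. What follows is a minimal Python stand-in for the PHP template, not the plugin code itself. The function names `build_labels_opt` and `build_import_code`, their parameters, and the exact shape of the generated `read.spss` line are assumptions made for illustration; only the argument names `use.value.labels`, `max.value.labels`, `trim.factor.names` and `to.data.frame`, the `<<-` assignment and the `rk.edit` call are taken from the patch.

```python
def build_labels_opt(use_labels, labels_limit, trim_labels):
    """Return the label-related argument string for the read.spss call."""
    if use_labels:
        opt = ", use.value.labels=TRUE"
        opt += ", max.value.labels=" + str(labels_limit)
    else:
        # Fix 1: state FALSE explicitly. Omitting the argument would fall
        # back to the default, which is TRUE.
        opt = ", use.value.labels=FALSE"
    if trim_labels:
        opt += ", trim.factor.names=TRUE"
    return opt


def build_import_code(path, obj, data_frame, use_labels,
                      labels_limit, trim_labels, doedit):
    """Return the generated R code as a list of lines (simplified layout)."""
    df_opt = ", to.data.frame=TRUE" if data_frame else ""
    labels_opt = build_labels_opt(use_labels, labels_limit, trim_labels)
    lines = [f'{obj} <- read.spss ("{path}"{df_opt}{labels_opt})']
    # Fix 2: the assignment to globalenv() no longer depends on "Edit".
    lines.append(f"{obj} <<- {obj}")
    if doedit and data_frame:
        lines.append(f"rk.edit ({obj})")
    return lines
```

Calling `build_import_code("survey.sav", "my.data", True, False, 1, False, False)` (a made-up file and object name) yields two lines: a `read.spss` call containing `use.value.labels=FALSE`, and the assignment, even though "Edit" is off.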

      I considered the problem with value labels and the way they enter RKWard via the levels attribute. I thought it would be relatively easy to do something about it, but it is not. The problem now is that you can't assign labels to non-positive values in the RKWard editor.

      I have this idea but don't know if it is at all doable: what about having an extra field for value labels (like the current one that refers to levels)? This field should have two columns, the first for the values and the second for their corresponding labels. A third column could be added that indicates missing values. I know there would be issues related to the way R handles data.frame etc. that are not easily solved.
      I am not a C programmer so, of course, it's easy to dream up all kinds of "brilliant" ideas, but anyway.
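To make the idea a little more concrete, here is a rough Python sketch of such a label table, kept separate from the data values. Everything in it is hypothetical: the names `ValueLabel`, `label_for` and `is_missing`, and the example codes. It only illustrates the three proposed columns and says nothing about how R or RKWard would actually store them.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ValueLabel:
    value: float           # column 1: raw value, may be zero or negative
    label: str             # column 2: the label shown for that value
    missing: bool = False  # column 3: treat this value as missing


# Example table with made-up survey codes.
LABELS = [
    ValueLabel(-1, "refused", missing=True),
    ValueLabel(0, "no"),
    ValueLabel(1, "yes"),
]


def label_for(value: float, table: List[ValueLabel]) -> Optional[str]:
    """Return the label for a value, or None if the value is unlabelled."""
    for row in table:
        if row.value == value:
            return row.label
    return None


def is_missing(value: float, table: List[ValueLabel]) -> bool:
    """True if the table marks this value as a missing-value code."""
    return any(row.value == value and row.missing for row in table)
```

Unlike levels, a table like this can label 0 or -1 directly, at the cost of being a structure that plain R functions know nothing about.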
      Kind regards
      Finn
      ===================================================================

      diff-file

      Index: rkward/plugins/00saveload/import/import_spss.php

      --- rkward/plugins/00saveload/import/import_spss.php    (revision 2280)
      +++ rkward/plugins/00saveload/import/import_spss.php    (working copy)
      @@ -17,17 +17,22 @@
      }
      <?    }
      }
      -
      +// when "Use value labels" is deselected, "use.value.labels" is removed - i.e. set to default but default=TRUE - should be FALSE
      function calculate () {
           if (getRK_val ("data_frame")) {
               $data_frame = true;
               $data_frame_opt = ", to.data.frame=TRUE";
           }
      -
      -    if (getRK_val ("use_labels")) {
      +//"Use value labels" selected, use.value.label=TRUE
      +    if (getRK_val ("use_labels")) {
               $labels_opt = ", use.value.labels=TRUE";
               $labels_opt .= ", max.value.labels=" . getRK_val ("labels_limit");
               if (getRK_val ("trim_labels")) $labels_opt .= ", trim.factor.names=TRUE";
      +    } else
      +//"Use value labels" deselected, use.value.label=FALSE
      +    {
      +        $labels_opt = ", use.value.labels=FALSE";
      +        if (getRK_val ("trim_labels")) $labels_opt .= ", trim.factor.names=TRUE";
           }

           $object = getRK_val ("saveto");
      @@ -54,10 +59,11 @@
               }
           }
      }
      +<? //assign imported data to globalenv moved outside the doedit condition, will always be executed ?>
      +<? echo ($object); ?> <<- <? echo ($object); ?>        # assign to globalenv()
      <?    }
           if (getRK_val ("doedit") && $data_frame) { ?>

      -<? echo ($object); ?> <<- <? echo ($object); ?>        # assign to globalenv()
      rk.edit (<? echo ($object); ?>)
      <?    }
      }
      @@ -65,3 +71,7 @@
      function printout () {
      }
      ?>

       
      • Thomas Friedrichsmeier

        Hi,

        About the mail problem: perhaps you should open a ticket in the main SourceForge request tracker about this. This does not appear to be a restriction of the mailing list software, but an earlier filter applied while accepting (or rather rejecting) the mail. (Though perhaps SF will tell you it's your mail provider's fault. If it really does not conform to such RFCs, that might cause problems with other mail servers as well.)

        Regarding the issues:
        1) Omitting the argument when it is at its default value is slightly nicer, in general. I'll need to check whether the default of use.value.labels has changed (I certainly hope not!), or whether it was a plain bug from the start.

        2) Ouch. Yes of course, you're right.

        I'll commit these changes as soon as I get around to checking 1).

        Regarding value labels vs. levels:
        I can see the problem. However, changing this would add complexity, some potential for confusion (*), and move us further away from "the R way" of handling data. So I'm not sure it's really worth it. Ideally, R itself would support value labels for arbitrary values, but I don't think this is likely to happen mid-term.

        (*): Basically those "arbitrary value" labels would only work as expected inside the data editor in RKWard. Making them work in all other contexts where you'd expect value labels to be printed (or missing values to be treated) would be next to impossible. So, at least until a convincing solution is found, I'd rather enforce the limitation of labelling only consecutive positive integers early on and consistently, instead of dealing with all the follow-up issues of trying to support "arbitrary value" labels. Keep the suggestions coming, though.
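As an illustration of what enforcing that limitation early might look like, the check below accepts a set of labelled values only if the distinct values are exactly 1, 2, ..., n. It assumes the sequence has to start at 1, as the integer codes of a factor do in R. This is a Python sketch with a hypothetical function name, `labellable_as_levels`, and is not RKWard code.

```python
def labellable_as_levels(values):
    """True if the distinct values are exactly the integers 1..n."""
    distinct = sorted(set(values))
    # Reject anything that is not a whole number (e.g. 1.5).
    if not all(float(v).is_integer() for v in distinct):
        return False
    # The codes must be consecutive and start at 1.
    return [int(v) for v in distinct] == list(range(1, len(distinct) + 1))
```

Under this check, codes such as 0/1 or -1/1, which are common in SPSS files, would be refused for labelling rather than half-supported.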

         
    • fsando

      fsando - 2007-12-18

      Hi Thomas
      Thanks for your answer.
      I expect the mail problem will go away or I'll have to do something about it.

      On the value label conundrum I absolutely agree with you. There is no obvious solution. A solid solution would indeed have to involve R-devel. But such things do happen, I think. I believe the data.frame format has evolved somewhat and methods have been adapted, so if the changes are well thought out they may make it into the central methods. I would really like to see data.frame improved and extended along the lines of SPSS, and even improved beyond what SPSS does. I think RKWard, with its ease of use and intuitive interface, has tremendous potential to become the standard statistical package on Linux.

       