accumulation of variables in .RData files

Status: Beta

Brought to you by: worden

#68 accumulation of variables in .RData files

Milestone: lalashan-yushan_site-specific

Status: open

Owner: nobody

Labels: None

Priority: 5

Updated: 2012-08-30

Created: 2011-02-24

Creator: Lee Worden

Private: No

via XPJ.

When you load A.RData into B.R and then save B.RData for use by C.R, B.RData contains all the variables from both A and B, not just the ones from B. The data from A can be very large, so successive .RData files can fill up disk space. The namespace can also become increasingly polluted by variable names that were used much earlier in the chain of dependencies. For example, C.R might innocently make use of a variable called "w" (or whatever), not realizing that "w" already has a value left over from its use as a loop counter in A.R. This can cause hard-to-diagnose bugs.
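To make the accumulation concrete, here is a minimal R sketch. It assumes each script saves its entire workspace, as `save.image()` does. The variables `big`, `w`, and `result` are hypothetical, and each script's workspace is simulated with its own environment (`ws_a`, `ws_b`, `ws_c`) so the whole chain can run in one session.

```r
# Temporary files standing in for A.RData and B.RData.
a_file <- tempfile(fileext = ".RData")
b_file <- tempfile(fileext = ".RData")

# "A.R": builds a large object and leaves a loop counter w behind.
ws_a <- new.env()
local({
  big <- rnorm(1e5)
  for (w in 1:3) big[w] <- 0
}, envir = ws_a)

# Save the whole workspace, as save.image() does for the global environment.
save(list = ls(envir = ws_a, all.names = TRUE), envir = ws_a, file = a_file)

# "B.R": loads A.RData, adds one new result, then saves its whole workspace.
ws_b <- new.env()
load(a_file, envir = ws_b)
local(result <- mean(big), envir = ws_b)
save(list = ls(envir = ws_b, all.names = TRUE), envir = ws_b, file = b_file)

# "C.R": loading B.RData also brings in everything A.R created.
ws_c <- new.env()
loaded <- load(b_file, envir = ws_c)
print(sort(loaded))  # A's "big" and "w" come along with B's "result"
ws_c$w               # 3, left over from the loop in A.R
```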

It's not clear how to implement this, but it might be preferable to save only the data created by B.R in B.RData.
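One way this might be done in plain R, sketched under assumptions (it is not a description of how the project's tools work): `load()` returns the names of the objects it restored, so B.R can record them and, at the end, save only the names that are new. The file names and variables are hypothetical, and B.R's workspace is again simulated with an environment. One caveat: an object that B.R inherits from A and then modifies keeps its inherited name, so this approach would leave it out.

```r
a_file <- tempfile(fileext = ".RData")
b_file <- tempfile(fileext = ".RData")

# Hypothetical A.RData: a data object plus a leftover loop counter.
local({
  big <- 1:10
  w <- 3L
  save(big, w, file = a_file)
})

# "B.R": remember which names came from upstream.
ws_b <- new.env()
inherited <- load(a_file, envir = ws_b)

# B.R's own work creates two new objects.
local({
  result <- sum(big)
  doubled <- result * 2
}, envir = ws_b)

# Save only the objects B.R created.
new_names <- setdiff(ls(envir = ws_b, all.names = TRUE), inherited)
save(list = new_names, envir = ws_b, file = b_file)
print(new_names)  # "doubled" "result"
```

In a real B.R working in the global environment, the same idea would be `inherited <- load("A.RData")` near the top and `save(list = setdiff(ls(), c(inherited, "inherited")), file = "B.RData")` at the end, excluding the bookkeeping variable itself.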

Discussion

Lee Worden - 2011-02-24

I wonder if there's a nice way to do this with "environments":
http://stat.ethz.ch/R-manual/R-devel/library/base/html/environment.html

For instance, load A.RData into an environment called "A" so that the variables are accessible but we can leave them out when saving B.RData.
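A minimal R sketch of this idea, assuming B.R otherwise works in the global environment. The file names and the variables `big`, `w`, and `result` are hypothetical. `load()` takes an `envir` argument, so A's objects can be placed in a separate environment `A` and reached explicitly as `A$big`:

```r
# Hypothetical stand-in for A.RData: a data object plus a leftover loop counter.
a_file <- tempfile(fileext = ".RData")
b_file <- tempfile(fileext = ".RData")
local({
  big <- 1:10
  w <- 3L
  save(big, w, file = a_file)
})

# In B.R: load A's variables into an environment named A,
# not into B's own workspace.
A <- new.env()
load(a_file, envir = A)

# A's values are still available, but only when asked for explicitly;
# A.R's loop counter w is not visible here unless written A$w.
result <- sum(A$big)

# Save B's own objects, leaving the environment A (and its contents) out.
save(list = setdiff(ls(), "A"), file = b_file)
```

Excluding "A" from the saved list matters: saving an environment object writes out its contents, so saving `A` would carry all of A's data into B.RData again. A related base-R option is `attach("A.RData", name = "A")`, which puts A's objects on the search path, readable by plain name, without placing them in the workspace that gets saved.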

Anonymous - 2011-02-24

OK. As JD suggested, here is a way to solve this:

When we have a small project, we don't worry about it.

But when we have a big project, we should use #rdsave to save variables in each R program; that will break the chain of dependence.

Jonathan Dushoff - 2011-02-24

I don't think what we already do is functionally different from using environments.

If we make B.Rout depend on A.RData, then B has access to all of the variables that A saved.

If C is downstream of B, we can:

Save everything we need in B, and make C depend only on B; or

Save only the new things in B, and make C depend on both A and B.

I now understand that Xingpeng is asking an additional question that I didn't get at first: what if B changes something from A? We should test and confirm that loading A and then B has the expected behaviour, i.e., that B overrides A when we want it to.
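A minimal R sketch of such a test, assuming B.R changes a variable `x` inherited from A.R and saves only its own copy (the file names and `x` are hypothetical). Because `load()` assigns into the target environment, an existing object with the same name is overwritten, so the file loaded last should win:

```r
a_file <- tempfile(fileext = ".RData")
b_file <- tempfile(fileext = ".RData")

# "A.R" saves x.
local({
  x <- 1
  save(x, file = a_file)
})

# "B.R" loads A.RData, changes x, and saves its modified copy.
local({
  load(a_file)
  x <- x + 1
  save(x, file = b_file)
})

# "C.R" depends on A and B and loads them in that order.
ws_c <- new.env()
load(a_file, envir = ws_c)
load(b_file, envir = ws_c)
ws_c$x    # 2: B's value overrides A's

# The opposite order silently restores A's original value.
ws_rev <- new.env()
load(b_file, envir = ws_rev)
load(a_file, envir = ws_rev)
ws_rev$x  # 1
```

So with the second option, where C depends on both A and B, the result depends on the files being loaded in dependency order.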

JD