via XPJ.
When you load A.RData into B.R and save B.RData for use by C.R, B.RData includes all the variables from both A and B, not just the ones from B. The data from A can be very large, so every subsequent .RData file carries it along and fills up disk space. Also, the namespace becomes increasingly polluted by variable names that were used much earlier in the chain of dependencies. For example, C.R might innocently use a variable called "w", not realizing that "w" still has a value from when it was used as a loop counter in A.R. This can cause hard-to-diagnose bugs.
It's not clear how to implement it, but it might be preferable if only data created by B.R were saved in B.RData.
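One possible way to do this in plain R, as a rough sketch only: record the variable names right after loading A, then save only the names that appear afterwards. The file names and the variables big_data, w and b_summary are made up for illustration, and A.R and B.R are simulated with environments so the example runs on its own.

```r
# Sketch: save only the objects that B creates, not the ones inherited from A.
tmp <- tempdir()
a_file <- file.path(tmp, "A.RData")
b_file <- file.path(tmp, "B.RData")

# Stand-in for what A.R would have saved (hypothetical variables).
a_env <- new.env()
a_env$big_data <- rnorm(1000)
a_env$w <- 10
save(list = ls(a_env), envir = a_env, file = a_file)

# Stand-in for B.R, run in its own environment.
b_env <- new.env()
load(a_file, envir = b_env)
inherited <- ls(b_env)                       # names that came from A

b_env$b_summary <- mean(b_env$big_data)      # B's own work

new_names <- setdiff(ls(b_env), inherited)   # names that B introduced
save(list = new_names, envir = b_env, file = b_file)
```

Note the limitation: a variable that came from A and was changed by B keeps its old name, so setdiff() does not pick it up and the changed value is not saved.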
Anonymous
I wonder if there's a nice way to do this with "environments":
http://stat.ethz.ch/R-manual/R-devel/library/base/html/environment.html
For instance, load A.RData into an environment called "A" so that the variables are accessible but we can leave them out when saving B.RData.
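Something like the following might work, as an untested-in-our-setup sketch. It relies on the envir argument of load(); the file names and the variables big_data, w and b_total are hypothetical, and A.R's output is simulated so the example is self-contained.

```r
# Sketch: load A.RData into its own environment so its variables
# are reachable as A$name but stay out of what B saves.
tmp <- tempdir()
a_file <- file.path(tmp, "A_env.RData")
b_file <- file.path(tmp, "B_env.RData")

# Stand-in for A.R's output (hypothetical variables).
a_out <- new.env()
a_out$big_data <- 1:1000
a_out$w <- 10
save(list = ls(a_out), envir = a_out, file = a_file)

# What B.R could do:
A <- new.env()
load(a_file, envir = A)        # A's variables live in A, not in the workspace

b_total <- sum(A$big_data)     # use A's data through the environment
save(b_total, file = b_file)   # B.RData holds only what B names here
```

With this layout, a leftover like "w" from A is only visible as A$w, so it cannot silently collide with a "w" used later.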
OK. As JD suggested, here is a way to solve this:
When we have a small project, we don't need to worry about it.
But when we have a big project, we should use #rdsave to save variables in each R program, which removes the chain of dependencies.
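I have not checked how #rdsave is implemented, so the following is only the plain-R version of the same idea, assuming the point is to name the variables to keep: call save() with an explicit list instead of saving the whole workspace. The variable and file names are invented for the example.

```r
# Sketch: save only explicitly named variables (base R, not #rdsave itself).
out_file <- file.path(tempdir(), "B_explicit.RData")

scratch <- 1:10                    # temporary working data
w <- 3                             # leftover helper variable
fit_summary <- mean(scratch) * w   # the one result downstream code needs

# Only fit_summary is written; scratch and w stay out of the file.
save(fit_summary, file = out_file)
```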
I don't think what we already do is functionally different from using environments.
If we make B.Rout depend on A.RData then B has access to all of the variables that A saved.
If C is downstream of B, we can either:
Save everything we need in B, and make C depend on B; or
Save only new things in B, and make C depend on A and B.
I now understand that Xingpeng is asking an additional question that I didn't get: what if B changes something from A? We should test and confirm that loading A and then B behaves as expected, with B's values overriding A's when we want them to.
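A minimal version of that test could look like this. It uses plain load() calls rather than our make rules, and the variables x and only_a are hypothetical; load() replaces existing objects of the same name, so whichever file is loaded last should win.

```r
# Sketch: check that loading A and then B lets B override A.
tmp <- tempdir()
a_file <- file.path(tmp, "A_order.RData")
b_file <- file.path(tmp, "B_order.RData")

# Stand-in for A.R's output.
a_env <- new.env()
a_env$x <- "from A"
a_env$only_a <- 1
save(list = ls(a_env), envir = a_env, file = a_file)

# Stand-in for B.R's output: B changed x and saved only that.
b_env <- new.env()
b_env$x <- "from B"
save(list = ls(b_env), envir = b_env, file = b_file)

# What C would see when it depends on A and B, loaded in that order.
c_env <- new.env()
load(a_file, envir = c_env)
load(b_file, envir = c_env)   # loaded last, so its x replaces A's

stopifnot(identical(c_env$x, "from B"))   # B overrides A
stopifnot(identical(c_env$only_a, 1))     # untouched A variables survive
```

If the load order were reversed, A's value of x would win instead, so the order in which the dependencies are loaded matters.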
JD