I just compiled 2.10.2 and after starting it I followed the suggestion to clean up diskspace. Wondering about the long runtime, I ran ps waux|grep rm and turned pale... The program was systematically "cleaning up" my entire home directory tree (with quite some success: 70G lost in unrelated directories). I'm somewhat clueless how this is possible.
Independent of a fix I suggest that the dialogbox should give a hint where it is about to clean up before actually starting an rm -rf job, and it ideally reports the files it currently removes (e.g. reporting the rm command in action).
The "legal" places I would think of are
~/.lives-dir
~/livestmp
but no other?
One suggestion for a potential cause (in case it does not remove everything in ~/ by default):
I removed the directory ~/livestmp (after my laptop diskspace "developed") and symlinked it to a directory of same name /tmp/livestmp before.
IMHO this should be a harmless change?
I just inspected the ~/.lives file.
I have been running into some crashes and a "disk full" on a prior version, and I believe that I have seen this file to be quite small (but not empty).
Right now it is much bigger and actually contains the answer, the file is attached.
The tempdir section points to my homedir instead of the livestmp link inside :-S
I don't know if the case of disk full could leave behind a damaged .lives file which has been "recovered" with defaults? In any case, the tmpdir should never point to the home dir.
Hello Guenter! First of all, my sincere apologies for the data loss. Under normal circumstances this should never happen. However, your analysis was correct. It seems likely that your ~/.lives file was damaged and the <tempdir> setting (which is actually the LiVES working directory; the name is kept for historical reasons) was not saved correctly (most probably due to the disk being full).
In normal circumstances, the tempdir setting should only be absent on a new install; in that case it is set to the home directory (since we need somewhere to write to until the user selects the working directory through the GUI). Alongside this, LiVES should also set <startup_phase> to -1, which then triggers the setup code in the GUI, allowing the user to select the working directory. It seems that in your case, the <tempdir> setting must have been absent, but the ~/.lives file was actually present, which prevented <startup_phase> from being set.
In order to avoid this situation in future, I will make the following changes:
if the <tempdir> entry is missing in ~/.lives, we still set it to the default (home directory), BUT in this case it will also set <startup_phase> to -1, even if the ~/.lives file is present. Hence it should be impossible to start up LiVES without going through the setup again.
we can't really prevent the user from selecting their home directory in the setup phase, since there may be genuine reasons for doing so, but LiVES will now show a warning if the user selects the home directory, informing them that this could lead to data loss and letting them go back and select or create another directory.
I believe this should be enough, since there would only be three ways to set the working directory to the home directory: via the GUI after clicking Yes on the warning dialog, by manually editing the ~/.lives file, or by passing the directory in as a startup option; each of these requires a conscious decision by the user.
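To make the proposed rule concrete, here is a minimal sketch in Python (LiVES itself is not written in Python, and this is not its actual code). It assumes the preferences have already been parsed into a dict; the function names and the dict layout are hypothetical, only the tempdir and startup_phase keys come from the discussion above.

```python
import os


def resolve_startup(prefs, home):
    """Return (workdir, startup_phase) for an already-parsed prefs dict.

    Hypothetical helper: if the tempdir entry is missing we still fall
    back to the home directory, but we force startup_phase to -1 so the
    setup dialog runs even though a config file exists.
    """
    workdir = prefs.get("tempdir")
    if not workdir:
        return home, -1
    return workdir, prefs.get("startup_phase", 0)


def needs_home_warning(candidate, home):
    """True if the chosen working directory is the home directory itself."""
    return os.path.realpath(candidate) == os.path.realpath(home)
```

With this, a damaged config that lost its tempdir entry can no longer lead to a silent start with the home directory as working directory; the user always passes through the setup (and the warning) first.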
I did consider your suggestion of showing in advance which files/directories would be removed; however, it would be quite difficult to implement and I am not sure how useful it would be in practice, so I am reluctant to do that.
Anyway I am interested in your thoughts on the proposed fix, and whether you think it is sufficient.
Thanks for the quick feedback! Unfortunately I'm still a bit busy to sort out >100k files ;)
I should mention that ext4magic is a great tool to recover files which you would normally consider lost. I think your suggestion does provide enough safety, because there are several dangerous points.
If a user really wants to trash his homedir with temporary files, he should actively set it and get a warning. A safe default would be ~/livestmp and in case it's missing upon first access the program can simply create it.
To prevent this kind of error (disk full might not be that rare due to the huge space requirements of video), modifications to the ~/.lives file should be first written to e.g. ~/.lives.tmp, and in case of success the old .lives will be removed and the new file renamed...cannot be truncated anymore.
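The write-then-rename idea can be sketched like this in Python. This is only an illustration of the general pattern, not how LiVES writes its file; the function name and the ".tmp" suffix are assumptions.

```python
import os


def save_config_atomically(path, text):
    """Write text to path via a temporary file and an atomic rename.

    If anything fails (for example the disk is full), the temporary file
    is removed and the existing file at path is left untouched.
    """
    tmp_path = path + ".tmp"  # hypothetical temp name next to the target
    try:
        with open(tmp_path, "w", encoding="utf-8") as f:
            f.write(text)
            f.flush()
            os.fsync(f.fileno())  # make sure the data really reached the disk
        os.replace(tmp_path, path)  # atomic when both are on one filesystem
    except OSError:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

The point of the fsync before the rename is that write errors such as "no space left on device" surface while the old file is still in place.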
I would also sleep much better if the cleanup command could be very specific. I don't use lives enough to tell if the files have a common prefix or suffix, but if they had, the rm command could just look for these. Only in this case use of the home directory seems safe (otherwise I think you should block the function)
Thanks, all good suggestions, I will look into implementing them right away.
Regarding your suggestion of creating a new file then renaming it, that is already how LiVES operates. So it appears that in your case the diskspace must have run out during the rename operation, which is indeed very unfortunate.
I guess a better solution might be: create the new file, copy rather than rename, and make sure the file sizes match (old and new); if the sizes match, delete the temp version, otherwise delete the .lives file and abort (i.e. forcing it to be recreated and the setup to be triggered on the following run).
There is another possibility which I had not considered, which is that the new (temp) file may be truncated. The actual file writing is done by a background perl process, so it would seem wise to check for errors during the write process. As a further check one could also calculate the expected size of the new file and then check that this matches what is in the file system, and abort if the two values do not match.
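As a rough illustration of the size check (written in Python rather than the perl used by the backend, with hypothetical function names), the expected size is computed from the data before writing and compared with what the file system reports afterwards:

```python
import os


def write_checked(path, text):
    """Write text to path; return True only if no error occurred and the
    size on disk equals the number of bytes we intended to write."""
    data = text.encode("utf-8")
    try:
        with open(path, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
    except OSError:
        return False
    return os.path.getsize(path) == len(data)


def looks_truncated(path, expected_size):
    """True if the file on disk does not have the expected size."""
    return os.path.getsize(path) != expected_size
```

A caller would abort (and keep the old config) whenever write_checked returns False.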
Again, my apologies for the data loss, and thank you for helping focus my attention on this serious issue.
Further investigation has revealed a couple of interesting things. Currently we return an error code if either the temp settings file cannot be opened or if the rename operation fails. This check was inadvertently omitted when setting a colour type preference, so this needs to be added (colour prefs were added pretty recently). On receipt of the error code, the front end will show a warning; however, I am going to change this to a retry / abort dialog. The same code will now also be triggered if the actual write operation fails.
So there are now two possibilities for what went wrong in your case:
1) a colour preference was amended, LiVES wrote the temp config file, but the rename failed, and the error was not handled in the front end.
2) writing the temp config file failed, resulting in a truncated file, which was then successfully renamed.
The changes I am making now (in addition to your suggestions) should ensure that neither of these situations occurs again.
You should also be aware that there are two diskspace levels: a warning level (default 2.5 GB) and a critical level (default 250 MB). At certain points, including if a command fails, LiVES will check the diskspace for the volume containing the working directory. If the warning level is reached, a warning will be shown, and if the user elects to continue then the warning level is automatically lowered (informing the user of the new value). If the critical level is reached then the user will see a different dialog which allows them to abort, retry or cancel. These levels can be seen and adjusted in Preferences, and it is also possible (though not recommended) to set the value to 0 to disable that particular check. We don't specifically check the home directory, but if your working directory was originally a subdirectory of home these checks should have alerted you to the fact.
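The two-level check can be pictured roughly as follows. This is a Python sketch under stated assumptions: the constant and function names are made up, and I am treating the defaults as binary units (GiB/MiB), which may not match what LiVES actually uses. A level of 0 disables that check, as described above.

```python
import shutil

WARN_BYTES = int(2.5 * 1024**3)  # assumed meaning of the 2.5 GB default
CRIT_BYTES = 250 * 1024**2       # assumed meaning of the 250 MB default


def classify_free_space(free_bytes, warn=WARN_BYTES, crit=CRIT_BYTES):
    """Return "critical", "warning" or "ok"; a level of 0 disables it."""
    if crit and free_bytes < crit:
        return "critical"
    if warn and free_bytes < warn:
        return "warning"
    return "ok"


def check_workdir(workdir):
    """Classify the free space on the volume containing workdir."""
    return classify_free_space(shutil.disk_usage(workdir).free)
```

Note that the check is made against the volume holding the working directory, so a working directory symlinked onto another volume is checked against that other volume.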
However, due to the particular circumstances in your case it is possible that the disk level was not checked in this particular instance. Adding in the temp file write checks and fixing the colour pref bug should ensure that the check is performed when a failure occurs. So in fact the user would receive two warnings, the first that updating the config file failed, and the second that the diskspace was below the critical level; in each case the user would have the opportunity to retry the operation or abort immediately from the program.
I think you mentioned that you did receive the critical warning and logically ran the diskspace cleaning tool; tragically in your case the ~/.lives file had previously been damaged and then recreated with the default setting, leading to the data loss.
I don't remember that I've seen such a warning. The initial one I might have missed because I might have started lives at <2.5G. My expectation of where the tmpdir should be (/local/livestmp or /tmp/livestmp) versus where it has been (~/livestmp) was different - a matter of permanent over-utilization of these tiny 2.5" disks...I languish for a 16TB drive ;-)
What happened next is that I tried to find options to change brightness/contrast/denoise and apparently I started a conversion by accident (was not really obvious from the GUI, maybe a preview?). I only noticed the ongoing CPU load and checked with ps to find some convert jobs which have probably blown up the memory rapidly while I've been reading the manual. That could be an explanation why I didn't see the critical warning before (an older version) hung.
If I ever changed some color prefs in lives I can't tell for sure, but persistent color shifts in handbrake caused me to look at lives... Too many circumstances? Yeah, that's Murphy...
BTW, for recovering lost files I can personally recommend ddrescue:
https://www.gnu.org/software/ddrescue/
It has worked very well for me in the past after a hard disk crash. It will link any orphaned inodes into /lost+found. The filenames will be missing, but you can list the fragments in order of descending size and try to play them through mplayer, vlc or another player. If you find a missing clip then you can rename it and move it back to its old location.
Last edit: Salsaman 2019-07-08
OK, just to update you I have been working on this all day. It proved to be a little more tricky than I imagined, since LiVES performs a lot of checks on startup. With the new failure mode it was detecting the (simulated since I was testing) config file error much earlier, then trying to show a warning. The problem being that in order to show the warning dialog it was first trying to create the GUI, and in order to create the GUI it needed to read in a lot of preferences, thus creating a kind of chicken and egg situation. To get around this I had to move some warning messages earlier in the code, before the GUI is created, and LiVES will show just a plain (unthemed) error dialog before exiting without trying to create the GUI.
The extra checks (mentioned above) should ensure that config file truncation never occurs.
Following your suggestion, the disk cleanup is now a lot less aggressive. Unfortunately, there is no prefix that we can check for since the sets are just saved using the name which the user supplies (normally this is not a problem since under normal circumstances we would be using a dedicated directory, the set name is also checked so that for example it cannot start with a "." or contain "..", a space, "*", or a path separator). However it was possible to tighten the scope so that subdirectories are only removed if either:
the subdirectory name is composed entirely of digits, with an int value > 65535.
or:
the subdirectory contains a file named "order" and does not contain either a subdirectory named "clips" or a subdirectory named "layouts"
In addition there are some files which will always be removed, namely:
"rfx. .smogplugin. .smogval. keep_layout
but these should be pretty specific to LiVES.
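The two subdirectory conditions above could be expressed roughly like this. It is a Python sketch for illustration only (the function name is hypothetical, and the always-removed files are left out); it only decides whether a subdirectory qualifies and deletes nothing.

```python
import os


def is_removable_subdir(path):
    """Return True if the directory at path matches the cleanup rules.

    Rule 1: the name consists only of digits and its value is > 65535.
    Rule 2: it contains a file named "order" and has neither a "clips"
            nor a "layouts" subdirectory.
    """
    name = os.path.basename(os.path.normpath(path))
    if name.isascii() and name.isdigit() and int(name) > 65535:
        return True
    has_order = os.path.isfile(os.path.join(path, "order"))
    has_clips = os.path.isdir(os.path.join(path, "clips"))
    has_layouts = os.path.isdir(os.path.join(path, "layouts"))
    return has_order and not (has_clips or has_layouts)
```

With rules like these, an ordinary directory such as ~/Documents would be left alone even if the working directory were wrongly set to the home directory.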
Regarding your final suggestion, which was to set the default working directory to a subdirectory of $HOME, that is a great idea; I think the best way would be for the frontend to obtain a unique directory name in $HOME using mkdtemp(), then pass the resulting string to the backend. In case the working directory is not set in the config file, the backend will use the supplied value and the front end process will note that the provided value was returned, and thus will know to delete it after the user selects the real working directory.
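For illustration, the same idea using Python's tempfile.mkdtemp (the C mkdtemp() works along the same lines). The function names and the "lives-" prefix are assumptions, not what the frontend actually uses.

```python
import os
import tempfile


def make_provisional_workdir(home):
    """Create a uniquely named, private directory inside home."""
    return tempfile.mkdtemp(prefix="lives-", dir=home)  # prefix is made up


def discard_provisional_workdir(provisional, chosen):
    """Remove the provisional directory once the user picked another one.

    os.rmdir only removes empty directories, so nothing is ever deleted
    recursively here. Returns True if the directory was removed.
    """
    if os.path.realpath(provisional) == os.path.realpath(chosen):
        return False  # the user decided to keep it
    try:
        os.rmdir(provisional)
        return True
    except OSError:
        return False
```

Using rmdir rather than a recursive delete means a mistake in this bookkeeping can at worst leave an empty directory behind.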
Last edit: Salsaman 2019-07-09
All changes done:
https://github.com/salsaman/LiVES/commit/a68ad304392102a1b383d24a0a75d5e078d47b28
https://github.com/salsaman/LiVES/commit/22ded0e9e8f8dcd976cd6ea2137062f5cf07ee35
Hi Guenter, I'm not sure if you are still monitoring this, but the current git version of LiVES has even more extensive safeguards to try to ensure nothing like this ever happens again.
If you recall, the root of the problem was that your config file was damaged, leading to the program unfortunately falling back to its default working directory, without prompting for it to be set.
Here is a summary of the safeguards in the upcoming release.
Firstly, on startup, the program will check for an existing config file. If this is not present, then it will attempt to recover from the backup config file, as I will explain shortly. Next, if the config file is there we try to read the version string from it. If this is not present then we assume the file is corrupt and it is renamed (for later forensics if needed). In this case we then retry the same procedure with the backup config file.
If we got a version string from the config or backup config, then the next step is to try to retrieve the working directory. If this succeeds then we assume all is OK and do a normal startup, although the user is informed if recovery was made from the backup config file.
If a version string or a working directory string are not present then the flag is set for a fresh install. There is no longer a default for the working directory, nothing is written to it until the user selects it. The config files are still created by default in $HOME, however it is possible to override this by starting the program with the -configdir option, in which case it is possible to avoid entirely writing anything to $HOME. Additionally, one can use the commandline option -workdir to override any setting in the config dir.
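The startup decision described above might be summarised like this. It is a simplified Python sketch: the config files are assumed to be already parsed into dicts (None if the file is missing), and the key names "version" and "workdir" are hypothetical. Renaming the corrupt file is left out.

```python
def choose_startup(config, backup):
    """Return (mode, workdir, source) for the two parsed config dicts.

    mode is "normal" or "fresh"; source tells which file was used, so
    the user can be informed when recovery came from the backup.
    """
    for source, prefs in (("config", config), ("backup", backup)):
        if prefs is None:
            continue  # file not present, try the next one
        if not prefs.get("version"):
            continue  # no version string: treated as corrupt
        workdir = prefs.get("workdir")
        if workdir:
            return ("normal", workdir, source)
        break  # version found but no working directory
    # There is no default working directory: the user must select one.
    return ("fresh", None, None)
```

The important property is the last line: there is no fallback path that ends with a working directory the user did not choose.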
Secondly, the process of updating the config file has been meticulously rewritten to ensure that it is almost impossible to end up with a corrupted config file. Before making any alterations, the file is backed up, and the size of the backup is compared with the original to make sure it is the same, otherwise we abort immediately. Alterations are then made in a third copy of the config file, and the size is checked after inserting every string to make sure it is as we expect. Only if the final size is OK, we then copy the new file over the original, again checking the size of the resulting file. If anything goes wrong with the copying process then we do the following: - if the config file still has its original size, do nothing, just abort. If the config file has a different size, try to delete it and then rename the new file to the config file (in case there was insufficient disk space for the copy operation), if anything goes wrong here, we also have the option of deleting the new config file, provided the original backup is still there. On the next startup, we'll check the config file, the new file, and the backup in that order.
In the normal case, after copying the new config file to the original file, the new config file would then replace the backup. This way we should always have at least 2 copies of the config file. You can even delete the config file and LiVES will happily continue with the backup file on the next restart.
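A much simplified Python sketch of that update sequence follows. The ".bak" and ".new" suffixes and the function names are assumptions, the per-string size check is reduced to a single check, and, like the description, it relies on sizes only.

```python
import os
import shutil


def copy_checked(src, dst):
    """Copy src to dst and fail if the sizes differ afterwards."""
    shutil.copyfile(src, dst)
    if os.path.getsize(dst) != os.path.getsize(src):
        raise OSError("size mismatch after copying to " + dst)


def update_config(path, new_text):
    backup, work = path + ".bak", path + ".new"  # hypothetical names
    copy_checked(path, backup)        # 1. back up first, abort on mismatch
    data = new_text.encode("utf-8")
    with open(work, "wb") as f:       # 2. build the new version separately
        f.write(data)
    if os.path.getsize(work) != len(data):
        os.remove(work)
        raise OSError("new config file has an unexpected size")
    original_size = os.path.getsize(path)
    try:
        copy_checked(work, path)      # 3. copy the new file over the original
    except OSError:
        if os.path.exists(path) and os.path.getsize(path) == original_size:
            raise                     # original untouched: just abort
        os.replace(work, path)        # original damaged: rename instead
        return
    os.replace(work, backup)          # 4. the new file becomes the backup
```

After a successful run both the config file and the backup hold the new contents, so deleting either one still leaves a usable copy.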
Finally, the diskspace cleanup has been extensively rewritten in the new version. If you are running the program from a terminal window, then you get a running commentary describing the decisions taken. Each subdirectory of the working directory is examined. The directory will only be removed if it meets some strict conditions. With a couple of specific exceptions, the directory name must be composed of 5 or more digits only, with a numerical value > 65535, or else it must contain a file named "order", and not contain specific marker files.
Anyway, if you are still following this then please be reassured that the issue is still being taken very seriously and every action possible is being taken to ensure such a tragedy does not occur again.