
Windows 10 1903 random system freezing while using VeraCrypt's system encryption

2019-09-27
2024-08-18
1 2 3 .. 22 > >> (Page 1 of 22)
  • Katarina Schubitz

    Hi,

    I am experiencing some weird behaviour with Windows 10 and VeraCrypt.
    After moving from Windows 7 to Windows 10 1809 (clean new installation, no upgrade), I started experiencing random system freezing (no BSOD, no entries in the Windows Event Viewer, just freezing).
    The sound (music, video audio, etc.) would still play in the background, but the screen was completely frozen and the keyboard/mouse events were not working, also the power button would not initiate a shutdown.
    Sometimes the mouse would still move and the Num lock key would still switch, but the OS would not react to anything, some time later they would also stop working, while the screen was still frozen.

    I've tried to pinpoint the problem, checking/switching every hardware component, removing any software that might be the problem, nothing helped. I also restored the old Windows 7 image to the same SSD and the problem was gone, so the problem was with Windows 10.

    So I restored the Windows 10 image (system drive not encrypted) and did an in-place upgrade to 1903, hoping it would fix things. Everything was working fine for some weeks, so I thought the problem was some faulty Windows 10 configuration/installation.
    Finally I re-encrypted the system partition with VeraCrypt and was happy that everything is finally working as intended.
    Until shortly after the "random" freezing reoccurred :(
    At first I did not connect the freezing with VeraCrypt at all, but I was able to pinpoint the problem to the system encryption of VeraCrypt.

    I was able to reproduce the problem doing this:
    Boot Windows 10 (system encrypted), then run prime95 (using "Large FFT", but anything is fine as long as it would use up almost all available memory).
    While running prime95, the memory (12 GB total) would be used up to about 90-95%.
    Just for clarification: At this point I can run prime95 for as long as I want without having the system crashing/freezing or anything. So no problem there at this point.
    Now I open Firefox (or Internet Explorer, etc.) and open YouTube.
    From there I select a random video and play it. Now I middle-click any random videos to open them in new tabs (opening 10-20 new tabs at once), which will cause the system memory to be used up to 100% and start swapping.
    The system swap is set to 10-20GB, more than enough. Now at some point the system either freezes directly, while the audio would still play in the background, or it would freeze after closing Firefox, etc.
    This happens every time I do this to stress the system while having the system partition encrypted.

    I then thought it was again some problem with Windows 10, so I decrypted the system drive and rebooted. After that I did these steps again and was surprised that I was not able to reproduce the freezing again.
    I can open 50+ tabs, which will cause the system to slow down a little due to extensive swapping, but everything will work, and after closing all tabs/Firefox everything goes back to normal speed, since the swapping is over.
    I did a lot more of these tests, trying to cause another freeze, to no avail. So I figured it must have to do with VeraCrypt's system encryption, maybe having some part of it being swapped or anything, I don't know.
    So I re-encrypted the system drive again and did the above mentioned procedures again...and the freezing was back! So I finally know what's causing this, yay! But unfortunately not how to solve it :(

    Any idea what exactly is causing this and how to fix it? Is there some option to give VeraCrypt some higher priority or stop it from swapping or counter whatever the real reason to the freezing is?
    This seems to only happen in Windows 10, since Windows 7 did not have the problem at all.
    Any help is highly appreciated!

    Katarina

     

    Last edit: Katarina Schubitz 2019-09-27
  • Katarina Schubitz

    I've done some more testing and the results are like I've written before. With an unencrypted system drive I can force swapping all I want, it does not freeze/deadlock the system. With encrypted system drive it locks up pretty fast when forcing swapping.
    I've also tried to disable all Windows 10 exploit protections, but the system still froze.

    Could it be possible that some part of VeraCrypt ends in swap (which is located on an encrypted drive), so the system ends up in a deadlock/freeze?
    Something like: VeraCrypt needs its data from memory (which was moved to swap) to perform encryption/decryption, but at the same time it needs to decrypt the data from swap first, thus not being able to access its own data anymore from swap?
    I really don't understand why the problem only occurs with an encrypted system drive.

    Any idea here?

    System is:
    Windows 10 Enterprise - Version 1903 - 64-bit
    Intel Xeon E5649
    12GB RAM
    256GB SSD

     
  • Mounir IDRASSI

    Mounir IDRASSI - 2019-09-27

    Hi Katarina,

    Thank you very much for this report and the extensive testing. This is a very interesting case of stress testing.

    It is clear that the issue is located in the VeraCrypt driver, since it is the one responsible for handling on-the-fly encryption/decryption of disk read/write operations.

    The VeraCrypt driver allocates all its memory from the non-paged memory pool, which is basically non-swappable memory. This is an important security requirement, as we don't want sensitive information (like keys and passwords) to be written to the disk. Unfortunately, the amount of non-paged memory available is small, so it must be consumed with care.

    From your description, it looks like on Windows 10 the VeraCrypt driver doesn't get the amount of non-paged memory that it requests when the system is low on memory, so the driver stops performing on-the-fly encryption/decryption of disk accesses, which in turn causes the system to freeze.

    Did you use VeraCrypt system encryption on Windows 7? I want to confirm that the problem doesn't occur on Windows 7 under forced swapping.

    I'm not sure if we can do anything about this issue. We already handle non-paged memory with great care, and it is not possible to use less since we allocate just what is needed for I/O operations. One thing that can be enhanced is that instead of allocating non-paged memory "on demand", we can allocate a large pool of non-paged memory at the initialization of the driver, with a size estimated so that it is enough for all I/O operations. That way, we are guaranteed to never run out of non-paged memory.

    I will give this a thought in the coming days but I don't think there will be anything for this in the upcoming 1.24 version.
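
The pre-allocation idea described above can be illustrated with a small user-space model. This is only a sketch under stated assumptions: it is plain Python, not kernel code, it is not taken from the VeraCrypt driver, and the class name `PreallocatedPool` and its methods are hypothetical. It shows the one property that matters here: all buffers are reserved at start-up, so handing one out later can never fail because the system has run out of memory in the meantime.

```python
from collections import deque
from typing import Deque, Optional


class PreallocatedPool:
    """User-space model of I/O buffers reserved once, at start-up."""

    def __init__(self, block_size: int, block_count: int) -> None:
        self.block_size = block_size
        # Every buffer is created here; acquire() never allocates new memory.
        self._free: Deque[bytearray] = deque(
            bytearray(block_size) for _ in range(block_count)
        )
        self._in_use = 0

    @property
    def free_blocks(self) -> int:
        """Number of reserved buffers that are currently unused."""
        return len(self._free)

    def acquire(self) -> Optional[bytearray]:
        """Hand out a reserved buffer, or None when the pool is exhausted."""
        if not self._free:
            # A real driver would have to queue the request here, not fail it.
            return None
        self._in_use += 1
        return self._free.popleft()

    def release(self, block: bytearray) -> None:
        """Wipe a buffer and return it to the pool."""
        if self._in_use == 0 or len(block) != self.block_size:
            raise ValueError("block does not belong to this pool")
        block[:] = bytes(self.block_size)  # clear possibly sensitive data
        self._in_use -= 1
        self._free.append(block)


if __name__ == "__main__":
    pool = PreallocatedPool(block_size=4096, block_count=2)
    first, second = pool.acquire(), pool.acquire()
    print(pool.acquire())           # the pool is empty at this point
    pool.release(first)
    print(pool.acquire() is first)  # the released buffer is handed out again
```

The trade-off is the one mentioned in the post: the pool size has to be estimated in advance, and an exhausted pool means requests must wait rather than allocate more.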

     
  • Philip Smith

    Philip Smith - 2019-09-27

    Have you tried a program like "WhoCrashed"? It reads the file that is created when the system crashes and gives an indication of what made it crash.

    https://www.resplendence.com/whocrashed_whatsnew

    It may provide more info to help fix the problem.

     
  • Katarina Schubitz

    Hi,

    thanks for the replies.

    @Philip:
    There is nothing written to disk, no crash dumps, no error logs, nothing, which indicates that there is a problem with accessing files on the encrypted hard drive (i.e. a non-functional VeraCrypt driver).
    The software "WhoCrashed" seems to analyze Windows crash dump files (which do not exist), so unfortunately it doesn't help in this case.

    @Mounir:
    I was using TrueCrypt for some years on Windows 7 and then switched to VeraCrypt about 9 months ago, without any problems.
    Since the Windows 7 support runs out in a few months, I was moving to Windows 10 and there the problem started.
    I haven't performed such intense stress testing under Windows 7, though I am sure I was hitting the memory limit from time to time, causing some extended swap usage.
    I could restore the old Win7 image and do some testing.

    Also I stumbled upon the following text about EncFS:
    "In addition, EncFS uses FUSE, which suffers from the fact that shared writable memory mappings must be entirely disabled in order to avoid deadlock on some page swap events."
    Not sure, if this is anything related to the VeraCrypt problem, just wanted to mention it.

    This is so annoying, since after every freeze (forced restart) Windows re-synchronizes the mirrored HDDs, which takes about 24 hours and leaves those drives in a very slow state for that time. Maybe I will try installing Windows 8.1 when I have some more time and do the stress testing to see if it is stable like Win7.

     

    Last edit: Katarina Schubitz 2019-09-28
  • Mounir IDRASSI

    Mounir IDRASSI - 2019-09-28

    @Katarina: The EncFS text is about Linux rather than Windows. Moreover, in VeraCrypt's case, we are always in Windows kernel space, so there is no frequent context switching.

    It would be helpful if you could do comparable tests on Windows 7 or Windows 8.1. This would give us an idea about the real factor behind this issue.

    Concerning my idea of pre-allocating a large non-paged memory pool, it is not trivial because it requires implementing our own memory manager instead of using the one provided by Windows and this is no small feat. I will think more about it later once other priorities are handled.

     
  • Katarina Schubitz

    Ok, I will do some more testing and report back. It will take time though.

     
  • Katarina Schubitz

    I've done some more testing and it seems to be a problem with Windows 10 (1903).
    I restored the Windows 7 image and did the stress testing mentioned above, it didn't freeze or crash, no matter how hard I pushed the system.

    So I restored Windows 10, booted into safe mode with networking and did the same stress testing again. Interestingly, the system did not freeze in that configuration.
    After that I wanted to rule out any 3rd party component as the cause and tested with a fresh Windows 10 installation (fully upgraded) with VeraCrypt only. The freezing happened there too.
    So after I ruled out 3rd party components, I compared device manager, services and running processes.
    I went ahead and disabled all devices that were not active in safe mode, but the freezing still happened.
    In the next step I compared the running services from safe mode with normal mode and there is a ton of services running in normal mode only, it's insane.

    Not sure yet, what exactly is causing the problems though. Without encrypted system drive everything is stable, even with extreme swapping / cpu usage.
    So it looks like there is some kind of incompatibility between VeraCrypt and some Windows service (or possibly some service+device combination).

    After all, it seems like the VC driver locks up (exact cause unknown yet), therefore no logs or memory dumps can be written to disk and no bluescreen happens, just freezing.
    It's so hard to actually pinpoint the problem, this is one of the cases where I would be really happy to actually have a bluescreen. Very annoying.

     
  • Mounir IDRASSI

    Mounir IDRASSI - 2019-09-30

    Thank you very much, Katarina, for this awesome analysis and for the tests! If all users were as helpful and dedicated as you, life would be much easier :-)

    I agree with your conclusion that there is a conflict between the VeraCrypt driver and some other service/driver. If we want to know more about it, we would need to connect a kernel debugger to the machine and wait for it to freeze in order to find out where the locking is happening. But this is something complex to put in place, as it requires development tools and a custom build of VeraCrypt.

    My hypothesis of VeraCrypt running out of non-paged memory is still the most probable one, and in this case there is unfortunately nothing we can do about it. As a driver that processes all disk I/O, we allocate and free many chunks of memory of different sizes at a high frequency, and if non-paged memory becomes scarce, its fragmentation becomes more problematic as it leads to allocation failures. And when allocation failures happen in VeraCrypt, I/O operations start to fail little by little until they are completely blocked.

    I don't think we can do anything further from here. Ideally, in the future I should implement a PoC of a custom non-paged memory manager in VeraCrypt that pre-allocates a large amount at startup to guard against such low-memory situations. Such an implementation is not trivial, but it would be interesting to have in the long term.
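
The fragmentation argument above (enough free memory in total, but no single free chunk large enough) can be shown with a toy model. This is a sketch only: a simple first-fit allocator in Python, with the hypothetical name `FirstFitArena`. It makes no claim about how the Windows non-paged pool is actually organized; it only demonstrates why an allocation can fail while plenty of memory is nominally free.

```python
from typing import Iterator, List, Optional, Tuple


class FirstFitArena:
    """Toy first-fit allocator over one contiguous arena of `size` units."""

    def __init__(self, size: int) -> None:
        self.size = size
        # Live allocations as (offset, length), kept sorted by offset.
        self._used: List[Tuple[int, int]] = []

    def _gaps(self) -> Iterator[Tuple[int, int]]:
        """Yield every free region as (offset, length), in address order."""
        cursor = 0
        for offset, length in self._used:
            if offset > cursor:
                yield cursor, offset - cursor
            cursor = offset + length
        if cursor < self.size:
            yield cursor, self.size - cursor

    def alloc(self, length: int) -> Optional[int]:
        """Return the offset of the first gap that fits, or None on failure."""
        for offset, gap in list(self._gaps()):
            if gap >= length:
                self._used.append((offset, length))
                self._used.sort()
                return offset
        return None

    def free(self, offset: int) -> None:
        """Release the allocation that starts at `offset`."""
        for index, (used_offset, _) in enumerate(self._used):
            if used_offset == offset:
                del self._used[index]
                return
        raise ValueError("no allocation starts at this offset")

    @property
    def total_free(self) -> int:
        return sum(gap for _, gap in self._gaps())

    @property
    def largest_free(self) -> int:
        return max((gap for _, gap in self._gaps()), default=0)


if __name__ == "__main__":
    arena = FirstFitArena(64)
    offsets = [arena.alloc(4) for _ in range(16)]  # fill the arena completely
    for offset in offsets[::2]:                    # free every other chunk
        arena.free(offset)
    print(arena.total_free, arena.largest_free)    # half is free, in small holes
    print(arena.alloc(8))                          # a larger request cannot be placed
```

In this model half of the arena is free after the loop, yet a request for 8 units fails because no hole is larger than 4 units.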

     
  • Katarina Schubitz

    Yes, I imagine how hard it can be to work through all the user requests / bug reports, while also reading through all the threads here etc., thank you for that!

    I have encrypted the system drive with Bitlocker and performed the same stress test. It does not freeze at all, no matter how hard I push the system resources.
    So it seems like the memory usage/handling of the Bitlocker software is not affected when swapping in Windows 10. In fact the system was pretty responsive, I was sort of surprised.
    But then again it's Microsoft's own creation, so they can optimize it to the last bit for their own Windows OS.

    Well, I really don't like Bitlocker that much, but for now it's still better than having no encryption at all. So I guess I will stick with it until a fixed VeraCrypt version is released, then it gets replaced faster than you can say "YAAAAY!" ;)

    Anyways, if I have some time, I might do some more tests with disabled services and devices (i.e. trying to match the configuration of safe mode) and see if it is stable, just to see if those are really the cause. I'm pretty sure there are some more differences between safe mode and normal mode.
    Unfortunately I need the system, so I can only experiment on weekends, and it takes time to back up and restore the whole SSD back and forth. It sounds to me like kernel debugging would be a good option. At least it would be nice to finally pinpoint the actual cause of the instability with VC's system encryption. I think I will look into this as well.

     

    Last edit: Katarina Schubitz 2019-10-01
  • aldren niere

    aldren niere - 2019-10-05

    This also happened to me, screen freezes and music from youtube was still playing.

    I installed VeraCrypt on Windows 10 (1809) and it was working smoothly, but when I installed the Windows 10 update (1903) it started freezing. It happens when opening multiple Visual Studio instances and many tabs in Chrome, but mostly it freezes when compiling my solution in Visual Studio.

     
    • Katarina Schubitz

      I also had 1809 installed, but only for 1-2 weeks, I didn't have those freezes there I think.
      Then I upgraded to 1903 and kept setting up the system, and shortly after the freezing started. At first I didn't know what was causing it, so I was going through the whole system (hardware, software, drivers, etc.) and some days later I figured out it was VeraCrypt's system drive encryption that was causing the freezing.

      Currently I have encrypted the system drive with Bitlocker, while having the other drives encrypted with VeraCrypt. No freezing so far, everything runs pretty smooth.

      So it sounds plausible that 1809 could be fine, while 1903 is not.
      I have also tested 1903 in Safe Mode and there was no freezing at all.

      So my guess would be either changes to memory allocation (but I'd guess then the system would also freeze in Safe Mode) or more likely some changes to Windows defender/kernel protection/etc. which leads to the described deadlock situation.

      I think someone needs to attach a kernel debugger to a Win10 1903 machine and perform the above mentioned stress test to get the system to freeze and see what is actually going on there.
      Not sure how exactly that would work though.

       
  • Bane Banington

    Bane Banington - 2019-10-25

    Hey, I've been suffering this issue for over a year and was starting to think it was a hardware issue!

    I reproduced it perfectly with Katarina Schubitz' method, it specifically happened when I closed down Prime95 and Firefox, perhaps it's something to do with the process of recovering/clearing memory? (sample size of 1 lol)

    Notably I'm using Windows 10 LTSC which IIRC is version 1809.

    I'm up for anything that's needed to fix this issue, custom kernel debuggers etc.

    On the less thorough side of things I've noticed this issue happens a lot more when I'm hosting a videogame server. It's not a definite thing but I can be free from a freeze for months then have it happen 3 times in one day if I'm hosting a game. Things like a Minecraft server or other co op games. Perhaps the same memory processes are happening when unloading portions of the game? Playing singleplayer or as a client doesn't seem to cause it as much so I'm unsure.

    In case it's of any use, I'll describe my experience with the crashes.

    Point of no return is once Task Manager reports the C: drive at 100% usage, but with no data being moved. All other drives drop to 0% usage and usually the network does the same. In this "quasi stable" state all open windows continue to "work" if left untouched, internet streamed media can either last for as long as it was cached before stopping or go on indefinitely, but voice chat (via Discord etc) always works both ways until a hard freeze. VLC continues to play music (small filesize cached I assume) but if it was a video it seems to only go as long as the RAM buffer allows. Interacting with Firefox and such works sometimes but can trigger a full hard freeze where I can only reset. Interacting with Task Manager at all causes the full system freeze, but it still displays info if left alone and is an active window.

    As soon as the C: drive tell is there the shutdown button on the case no longer works, even in the situation of the system being "quasi stable" and interactable, only a soft or hard reset works.

    Will of course be of little use but attached is a phone screenshot of my system after hard freezing, note the small grace period when C: drops to having no transfers going on until the actual freeze.

     

    Last edit: Bane Banington 2019-10-25
    • Katarina Schubitz

      Yes, I was also thinking about hardware problems or incompatibilities with Windows 10 in the first place. I am pretty sure there are more users experiencing this problem (at least from time to time, when swapping happens) but they don't connect it to VeraCrypt, but rather to hardware or Windows problems.

      I also had a few freezes after prime95 was stopped and Firefox was closed, when the system should go back to normal. I assume that it's more like a coincidence, depending on what exactly was swapped before, so the damage was already done at that point.

      For kernel debugging, one seems to need two devices. One where the freezing will happen (set up for remote debugging) and one that is remotely connected to the freezing machine. I have read about debugging with "windbg", but didn't get into details.

      Maybe Mounir can give instructions on what exactly to do? I'm also not sure if a special debugging version of VeraCrypt would be needed for that.

      Maybe the freezes can be reproduced in a VM for easier debugging? Maybe I'll test it when I have some spare time.

       
  • Bane Banington

    Bane Banington - 2019-12-28

    Is there anything we can do to help this get sorted?

     
  • DDD

    DDD - 2019-12-31

    I wrote a whole long thing here and then the website here said that I must provide content, even though I was logged in and now I have to rewrite the whole thing! ;(

    I'm thinking that it could be caused by "encrypt keys in RAM". When I chose to not encrypt the keys in RAM, the freezes stopped. The system would come close to freezing, would appear to freeze, but after a bit of not pushing the system too much harder, the system would come around to being responsive again. That's on win 1903, but perhaps I can check if that's the case also with win 7.

    I've noticed some file corruption here and there, where some files I had downloaded or torrented were all of a sudden back to zero and all that data was lost. Perhaps that is due to the thrashing? Memory to hard drive to memory to page file or swap file etc. Maybe that's what caused the freezing too? Memory being displaced by one or two bytes or something? Not being in the spot where it's supposed to be?

    I truly wish I was getting a blue screen, too, so at least it could be diagnosed a bit. Maybe it's win 1903, not sure.

    Windows 10 seems like a facade, anyways, like your system is not even yours anymore, where things that had been working fine, back with win 7 have to be reinvented. But wasn't there some talk of Windows having to inject code or something like that to mitigate the problems of Meltdown and Spectre? Maybe Windows was injecting code right into the Veracrypt driver's code?

     
  • Tylor Durden

    Tylor Durden - 2020-01-08

    There is an easier method to reproduce:
    - Use a tool to put a high load on the drive, e.g. h2testw https://www.heise.de/download/product/h2testw-50539 Originally it is intended for checking the real capacity of flash drives, but it is also useful for writing a test pattern to your SSD's free space. Open it, check "English" and enter the directory for writing test files and the amount, e.g. 300 000 MB
    - Start Prime95 and do a stress test, e.g. 12 threads and blend

    On Win10 1903 or 1909 with a VeraCrypt-encrypted system drive you get a freeze after a few minutes of writing data (around 100GB and 200GB in my case). It happens when the system begins to swap (I have 16GB RAM installed, so to reproduce, limit your RAM). I think it is at the point when the SIZE OF THE PAGEFILE CHANGES. Freezing also happens when you stop Prime95 and Windows releases RAM and pagefile. Written files are valid according to their checksum. (However, as I got frequent freezes while running a virtual machine with a high memory and file footprint, the freezes there caused massive file corruption.)

    Symptoms in Task Manager immediately before the freeze:
    - disk activity in benchmarking is around ~50%. In last 5 seconds 100% is reached, but the very last datapoint shows 0 KB/s reading and writing speed.
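
For readers who want to script the disk-load part of the steps above, the following is a rough stand-in written in Python. It is not h2testw and does not reproduce its test pattern; the function names and the `stress_NNNN.bin` file naming are made up for this sketch. It writes files of random data, records a SHA-256 digest for each, and can later report which files no longer match.

```python
import hashlib
import os
from pathlib import Path
from typing import Dict, List


def write_test_files(directory: Path, file_count: int, file_size: int,
                     chunk_size: int = 1 << 20) -> Dict[str, str]:
    """Write `file_count` files of random data; return {file name: SHA-256}."""
    directory.mkdir(parents=True, exist_ok=True)
    digests: Dict[str, str] = {}
    for index in range(file_count):
        path = directory / f"stress_{index:04d}.bin"
        sha = hashlib.sha256()
        remaining = file_size
        with open(path, "wb") as handle:
            while remaining > 0:
                chunk = os.urandom(min(chunk_size, remaining))
                handle.write(chunk)
                sha.update(chunk)
                remaining -= len(chunk)
            handle.flush()
            os.fsync(handle.fileno())  # ask the OS to push the data to the disk
        digests[path.name] = sha.hexdigest()
    return digests


def verify_test_files(directory: Path, digests: Dict[str, str],
                      chunk_size: int = 1 << 20) -> List[str]:
    """Return the names of files that are unreadable or whose digest differs."""
    damaged: List[str] = []
    for name, expected in digests.items():
        sha = hashlib.sha256()
        try:
            with open(directory / name, "rb") as handle:
                for chunk in iter(lambda: handle.read(chunk_size), b""):
                    sha.update(chunk)
        except OSError:
            damaged.append(name)
            continue
        if sha.hexdigest() != expected:
            damaged.append(name)
    return damaged
```

Two caveats apply. The digests live only in memory, so to check files after a forced reset they would have to be saved somewhere that survives the freeze, for example on another drive. Also, a verification run directly after writing may be served from the file cache rather than from the disk.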

     

    Last edit: Tylor Durden 2020-01-08
  • Tylor Durden

    Tylor Durden - 2020-01-08

    Things tried that are no solution

    1) The flaw is in Win10 memory management. So what is the biggest difference compared to Win7, which ran flawlessly? --> Memory compression
    So I disabled memory compression in an elevated PowerShell with "Get-MMAgent" and "Disable-MMAgent -mc", then rebooted
    --> This worsens the situation. You now get a freeze in Prime95 after seconds!
    2) Maybe the only factor is the pagefile? I preallocated a huge 24GB pagefile via system control panel. Now the freeze was not observed in the writing, but a few seconds after all writing operations had stopped.
    3) Findings of 2) may point to the SSD firmware. It's a Samsung 970 Evo. As with most premium devices, it has 2 "caches"
    --> 6GB of SLC cache in the spare region, not accessible from the OS
    --> up to 36GB of pseudo-SLC cache in free sectors. Note, these are free sectors within the (encrypted) system partition. So the firmware has to decide whether the sectors are not filled with data (TRIM?), and only 1 bit per cell is stored in these cells. If the drive goes idle, all information in the pseudo-SLC cache is copied to TLC memory (the firmware now using 3 bits per cell) and the former SLC cache is flushed.

    I know VeraCrypt is one level above the firmware, but could VeraCrypt interfere in this process in any way? Forum posts here about the implications of newer SSD features were never answered, e.g. over-provisioning that uses unencrypted unallocated space, and the dynamic change of system partition size that comes with the Samsung tools.

     
    • DDD

      DDD - 2020-01-10

      I was also thinking maybe it was something with the TRIM command - should I block those or allow those?
      But has anyone upgraded to Windows 1909 and also continues to have it freeze up?
      I also wonder if it has to do with the specific encryption algorithm.
      Serpent
      primary key size: 256 bits
      secondary key size (XTS mode): 256 bits
      block size: 128 bits
      PKCS-5 PRF: HMAC-SHA-512
      says that the veracrypt boot loader version is 1.24, but it's really 1.24-Update2
      Right now all the checkboxes under "driver configuration" are all unchecked. I did write a post beforehand, asking what the TRIM command is and what to check or uncheck (I think) but no response.

      Also, sometimes I get an error from qBittorrent saying that it can't write a file because of an I/O error. 4K Video Downloader has also said something like that. I don't have any Windows write cache enabled on the computer, so that everything is written immediately, because the computer has frozen a number of times and I figure it's better to write immediately. But performance suffers.

       
  • DavidXanatos

    DavidXanatos - 2020-01-08

    I quickly tried today to reproduce the issue in a VM, but I did not get the VM to lock up :/

    About SSDs: IMHO the VC driver cannot interfere with what the SSD is doing internally. It's a block device, no more, no less; all the firmware-internal layers of abstraction don't matter.

    Could you try completely disabling the page file and then run the tests that make your system lock up? Just to confirm that it is related to paging of memory and not something else.

     
    • Tylor Durden

      Tylor Durden - 2020-01-09

      A virtual machine is not identical to an actual machine in every use case:
      * Virtual RAM is not connected to physical RAM; VMware has its own memory manager to compress it for a reduced host memory footprint
      * Virtual disk space is mapped to files by a driver, so timing is different and access is filtered
      * Virtual disks use a virtual SCSI disk controller; this is a different protocol than the common NVMe for native SSDs

      VeraCrypt must be aware of the different protocols; it's not a simple block device, because it selectively blocks TRIM commands (and it would be nice to have an option to disable this filtering).
      "VeraCrypt does not block the trim operation on partitions that are within the key scope of system encryption" --> So on UEFI it blocks TRIM on other partitions on the system drive and on unallocated drive space. For this TRIM the protocols are different for ATA, SCSI (UNMAP) and NVMe (Deallocate command), and I am sure there are subtle differences between the implementations.

       
      • DavidXanatos

        DavidXanatos - 2020-01-09

        Timing is different, yes; everything else, no.
        In VMware you can select between SATA, SCSI, and NVMe interfaces for your virtual disk.

        The hypothesis in this thread is that we cannot get non-paged memory to complete reads when we need it, and this causes the deadlock. IMHO this should not be so specific an issue that it only happens on specific hardware. Hence it should also happen in VMs.

         
  • DDD

    DDD - 2020-01-18

    I think it might have to do with Firefox. I opened a file to make it crash, and it pretty consistently locks up the computer. But then Windows completely froze, then created a dump file, saying it was a blue screen, DRIVER_POWER_STATE_FAILURE. Anybody have any more information? VeraCrypt 1.24 (perhaps a beta build) and then 1.24 Update 2 didn't seem to be able to turn off my computer after a certain amount of time, on my Windows 7 machine. So, could it be a problem with the power driver? Just guessing. I am hoping somebody will come up with a fix. I know that in one of the VeraCrypt 1.24 versions, VeraCrypt turning off the computer after a few minutes was added.

     
  • Katarina Schubitz

    @DDD:
    "encrypt keys in RAM" did not even exist in VeraCrypt when I initially wrote the first post. Also, TRIM and Firefox have nothing to do with this issue.
    If you get a memory dump file or if a problem was logged, it was no freeze but a normal crash, caused by some other problem in your system.
    If a real freeze (aka deadlock) happens, the complete system is fully frozen in its last state. Disk writes/reads are no longer possible and therefore no memory dumps can be created at this point, also no logs, etc.

    @DavidXanatos:
    I also thought about trying to reproduce the freezing in a VM, so I could connect a kernel debugger to see what's going on. Unfortunately it didn't work, since the VM won't freeze like the host system would. It's simply not the same thing.

    @Tylor Durden:
    It actually seems to be related to some "new" feature of Windows 10, which causes this problem in combination with VeraCrypt's system drive encryption and swapping.
    I could not reproduce the freezing under Windows 7 for example and the problem only exists under Windows 10 in normal boot mode.
    When booting the affected Windows 10 machine in safe mode, no freezing happened, no matter how hard I tried. It will swap just fine and return to normal afterwards.

    I was trying to find out the relevant differences between safe mode and normal mode under Windows 10, but there was just too much stuff going on and I didn't have the time.

    To find out what's actually going on, one would need 2 physical machines, to start remote kernel debugging before forcing the test machine into freezing.

     
  • DavidXanatos

    DavidXanatos - 2020-01-19

    @Katarina Schubitz
    But it's very close to the real thing, so I would assume it must be a very specific combination of circumstances in which it happens. So presumably it will not even affect all physical machines, just specific models? Did anyone run tests on different hardware to see if it really happens on every physical machine or only on certain ones?

     