#7 Firefox 4 has problems with sqlite files

closed
Xeno86
None
9
2011-11-14
2011-04-11
aceman
No

It seems Firefox 4 (final) has problems with the sqlite files in its profile directory. It can't read the bookmarks and history file created with 3.6 (places.sqlite) and removes it. It then creates a new empty file, but can't append new history entries to it. Visiting any URL is very slow; it looks like FF is trying to update the file but fails repeatedly. The same problem occurs with cookies.sqlite. My guess is that there is a problem with the new WAL mode of sqlite used in FF4. The places.sqlite-wal and places.sqlite-shm files are always empty, which seems wrong. Also, the WAL documentation mentions that shared memory support is required from the VFS. Does win98 or kernelex provide this? Can it be added?

Discussion

  • aceman
    2011-04-11

    This problem actually makes Firefox 4 unusable.

     
  • aceman
    2011-04-11

    • priority: 5 --> 9
    • assigned_to: nobody --> xeno86
     
  • Felicitas Ann
    2011-07-25

    Confirming that this problem is related to SQLite3 using WAL with KernelEx.
    The following patch disables WAL in Firefox, causing it to run at full speed again. Consider this a hack to prove that WAL+KernelEx is the problem here. We should look into what exactly is causing the problem in KernelEx (is shared memory implemented at all?).

    diff -Naur firefox-5.0.source/mozilla-release/netwerk/cookie/nsCookieService.cpp firefox-5.0.sourcep//mozilla-release/netwerk/cookie/nsCookieService.cpp
    --- firefox-5.0.source/mozilla-release/netwerk/cookie/nsCookieService.cpp 2011-06-15 23:57:48.000000000 +0200
    +++ firefox-5.0.sourcep//mozilla-release/netwerk/cookie/nsCookieService.cpp 2011-07-24 22:42:46.011744600 +0200
    @@ -1015,7 +1015,7 @@
    // Use write-ahead-logging for performance. We cap the autocheckpoint limit at
    // 16 pages (around 500KB).
    mDefaultDBState->dbConn->ExecuteSimpleSQL(NS_LITERAL_CSTRING(
    - "PRAGMA journal_mode = WAL"));
    + "PRAGMA journal_mode = DELETE"));
    mDefaultDBState->dbConn->ExecuteSimpleSQL(NS_LITERAL_CSTRING(
    "PRAGMA wal_autocheckpoint = 16"));

    diff -Naur firefox-5.0.source/mozilla-release/toolkit/components/places/nsNavHistory.cpp firefox-5.0.sourcep//mozilla-release/toolkit/components/places/nsNavHistory.cpp
    --- firefox-5.0.source/mozilla-release/toolkit/components/places/nsNavHistory.cpp 2011-06-15 23:57:54.000000000 +0200
    +++ firefox-5.0.sourcep//mozilla-release/toolkit/components/places/nsNavHistory.cpp 2011-07-24 22:42:44.057632900 +0200
    @@ -750,7 +750,7 @@

    // Be sure to set journal mode after page_size. WAL would prevent the change
    // otherwise.
    - if (NS_SUCCEEDED(SetJournalMode(JOURNAL_WAL))) {
    + if (NS_SUCCEEDED(SetJournalMode(JOURNAL_DELETE))) {
    // Set the WAL journal size limit. We want it to be small, since in
    // synchronous = NORMAL mode a crash could cause loss of all the
    // transactions in the journal. For added safety we will also force

     
  • Felicitas Ann
    2011-08-24

    Apparently, there are 2 distinct bugs preventing Firefox 4+ from running properly:
    1. UnlockFileEx unimplemented
    2. Unexpected behavior of locked files in conjunction with file mapping

    UnlockFileEx unimplemented
    When using WAL mode, SQLite locks and unlocks regions via LockFileEx/UnlockFileEx to determine whether the “*-wal” file is in use. As UnlockFileEx is not supported on Win9x, and KernelEx doesn’t implement it either, the call does nothing, causing the next LockFileEx to fail. This leads SQLite, and ultimately Firefox, to think that the file is in use. One of them (I haven’t figured out which one exactly, but I think it’s Firefox) will then try to open the file over and over again, leading to massive slowdowns.
    This bug is easy to fix. Attached is an UnlockFileEx implementation, which maps the request to 9x’s UnlockFile.
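
For illustration, a minimal sketch of how such a forwarding shim could look. This is not the attached implementation. It assumes only the documented Win32 parameter layout: UnlockFileEx takes the region's start offset from the OVERLAPPED structure, while UnlockFile takes it as two plain arguments. `Dword`, `Handle`, `Overlapped`, `UnlockFileFn` and `UnlockFileExShim` are hypothetical stand-ins so the block compiles without windows.h; a real shim would use the Win32 types and call UnlockFile directly.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins so the sketch compiles without <windows.h>.
using Dword = std::uint32_t;   // plays the role of DWORD
using Handle = void*;          // plays the role of HANDLE
struct Overlapped {            // only the two OVERLAPPED fields used here
    Dword Offset;              // low 32 bits of the region's start offset
    Dword OffsetHigh;          // high 32 bits of the region's start offset
};

// Same parameter order as Win32 UnlockFile:
// (file, offsetLow, offsetHigh, bytesLow, bytesHigh).
using UnlockFileFn = bool (*)(Handle, Dword, Dword, Dword, Dword);

// Forwards an UnlockFileEx-style request to an UnlockFile-style callee.
// The callee is passed in so the translation can be exercised anywhere.
bool UnlockFileExShim(UnlockFileFn unlockFile, Handle file, Dword reserved,
                      Dword bytesLow, Dword bytesHigh, const Overlapped* ov) {
    // UnlockFileEx requires the reserved argument to be zero and
    // needs the OVERLAPPED structure to learn the start offset.
    if (reserved != 0 || ov == nullptr) {
        return false;
    }
    return unlockFile(file, ov->Offset, ov->OffsetHigh, bytesLow, bytesHigh);
}
```

With SQLite's lock at offset 120, the callee would receive offset (120, 0) and length (1, 0).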

    Unexpected behavior of locked files in conjunction with file mapping
    With the patch applied, you will notice that Firefox just crashes.
    After opening the “*-wal” file, SQLite locks two 1-byte regions, at offsets 120 and 128, for internal coordination. It then tries to write to offset 0, where its header resides, through a file mapping. The crash happens right here.
    When using file mappings, locks apply to entire pages, which are 4096 bytes in size. So locking offsets 120 and 128 locks the entire range from 0 to 4095.
    You may know that there are 2 handles involved when establishing file mappings: the file handle returned by CreateFile, and the file mapping handle returned by CreateFileMapping.
    When locking files, full access is still possible through the file handle that originally created the lock. This is the case for both Win9x and WinNT.
    However, there seems to be a difference when accessing a locked region through a file mapping. On NT systems (tested with WinXP SP3) it just works. On 9x, it does not.
    So locking a file, creating a file mapping handle for that file, using this handle to create the actual mapping in the process’ address space, and finally accessing the file through that mapping does not work on 9x, but does on NT.
    Further investigation revealed that accessing locked regions works if the file mapping is created and every page is written to once before anything is locked. We can’t depend on that behavior, however, as access will be restricted again as described above if the page gets swapped out (I think).
    Attached is a test case.
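
The page arithmetic behind this can be sketched as follows. This only models the observation described above (a byte-range lock blocks mapped access to every 4096-byte page it touches on 9x); it is not the attached test case, and `LockedPages` and `WriteBlocked` are hypothetical helper names.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint64_t kPageSize = 4096;  // page size stated in the report

// Inclusive byte range [first, last].
struct ByteRange {
    std::uint64_t first;
    std::uint64_t last;
};

// Expands a byte-range lock (length must be > 0) to the whole pages it touches.
ByteRange LockedPages(std::uint64_t offset, std::uint64_t length) {
    const std::uint64_t firstPage = offset / kPageSize;
    const std::uint64_t lastPage = (offset + length - 1) / kPageSize;
    return {firstPage * kPageSize, lastPage * kPageSize + kPageSize - 1};
}

// True if a write at writeOffset through a mapping lands on a page
// covered by the given lock, i.e. the case that fails on 9x.
bool WriteBlocked(std::uint64_t lockOffset, std::uint64_t lockLength,
                  std::uint64_t writeOffset) {
    const ByteRange pages = LockedPages(lockOffset, lockLength);
    return writeOffset >= pages.first && writeOffset <= pages.last;
}
```

Under this model, the locks at offsets 120 and 128 both cover bytes 0 to 4095, so a write at offset 0 is blocked while a write at offset 4096 is not.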

    I don’t have any idea on how to fix that problem. I can only think of a few workarounds:
    1. Ignore LockFileEx/UnlockFileEx calls altogether. This will most probably affect several applications.
    2. Ignore lock requests at offsets 120 and 128 (maybe with some additional checks to see whether SQLite issued them, such as checking for the -wal suffix?). The impact will be smaller, but still there. It defeats the whole purpose of locking these regions, though (same issue as with 1).
    3. Ignore the bug, and continue using a Firefox build with WAL disabled. This will not help other applications using SQLite. But I think there’s a good chance of getting a “WAL doesn’t work -> fall back to old mode (DELETE)” patch incorporated into Firefox.

    Also, there is the possibility of emulating the desired behavior, at least to some extent. We can keep track of all file locks, and unlock all regions that are in the way when establishing the actual file mapping. Getting this right will add a lot of complexity to the code. Furthermore, it’s not perfect, as applications may depend on locks still being in place. But I think this is as close as we can get without changing 9x’s kernel. Emulating NT’s behavior using the “write first, lock afterwards” quirk described above is a bad idea in my opinion, as applications might crash if too much physical memory is in use.
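
A rough sketch of the bookkeeping such an emulation would need, assuming locks are recorded per file as (offset, length) pairs and that a lock blocks every 4096-byte page it touches. `LockTable`, `LockRegion` and `Blocking` are hypothetical names; the actual unlocking, the re-locking afterwards, and any thread safety are left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

constexpr std::uint64_t kPageBytes = 4096;  // page size stated in the report

struct LockRegion {
    std::uint64_t offset;
    std::uint64_t length;  // assumed to be > 0
};

// Tracks the byte-range locks currently held on one file.
class LockTable {
public:
    void Add(std::uint64_t offset, std::uint64_t length) {
        locks_.push_back({offset, length});
    }

    // Removes one matching lock; returns false if none was recorded.
    bool Remove(std::uint64_t offset, std::uint64_t length) {
        auto it = std::find_if(locks_.begin(), locks_.end(),
                               [&](const LockRegion& r) {
                                   return r.offset == offset && r.length == length;
                               });
        if (it == locks_.end()) return false;
        locks_.erase(it);
        return true;
    }

    // Locks whose page-rounded extent overlaps the view
    // [viewOffset, viewOffset + viewLength); these are the ones that
    // would have to be released before mapping that view.
    std::vector<LockRegion> Blocking(std::uint64_t viewOffset,
                                     std::uint64_t viewLength) const {
        std::vector<LockRegion> out;
        const std::uint64_t viewEnd = viewOffset + viewLength;
        for (const LockRegion& r : locks_) {
            const std::uint64_t start = r.offset / kPageBytes * kPageBytes;
            const std::uint64_t end =  // exclusive end of the last page touched
                (r.offset + r.length - 1) / kPageBytes * kPageBytes + kPageBytes;
            if (start < viewEnd && viewOffset < end) out.push_back(r);
        }
        return out;
    }

private:
    std::vector<LockRegion> locks_;
};
```

With the two SQLite locks at offsets 120 and 128 recorded, `Blocking(0, 4096)` returns both of them, while a view covering only the second page returns none.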

    So, any suggestions/ideas? Maybe there’s a nifty way around the problem or something? I’m not that into 9x yet.

    Test output on Windows XP SP3:
    Test 1: Lock first, don't touch, map
    - Writing to file mapping, offset 120 and 128 locked, offset 0: Passed
    - Writing to file mapping, offset 120 and 128 locked, offset 4096: Passed
    Test 2: Lock first, touch, map
    - Touching offset 0: Passed
    - Touching offset 4096: Passed
    - Writing to file mapping, offset 120 and 128 locked, offset 0: Passed
    - Writing to file mapping, offset 120 and 128 locked, offset 4096: Passed
    Test 3: Map first, don't touch, lock
    - Writing to file mapping, offset 120 and 128 locked, offset 0: Passed
    - Writing to file mapping, offset 120 and 128 locked, offset 4096: Passed
    Test 4: Map first, touch, lock
    - Touching offset 0: Passed
    - Touching offset 4096: Passed
    - Writing to file mapping, offset 120 and 128 locked, offset 0: Passed
    - Writing to file mapping, offset 120 and 128 locked, offset 4096: Passed

    Test output on Windows 98 FE/Gold (4.10.1998) with KernelEx 4.5.1 and the UnlockFileEx patch:
    Test 1: Lock first, don't touch, map
    - Writing to file mapping, offset 120 and 128 locked, offset 0: Failed
    - Writing to file mapping, offset 120 and 128 locked, offset 4096: Passed
    Test 2: Lock first, touch, map
    - Touching offset 0: Failed
    - Touching offset 4096: Passed
    - Writing to file mapping, offset 120 and 128 locked, offset 0: Failed
    - Writing to file mapping, offset 120 and 128 locked, offset 4096: Passed
    Test 3: Map first, don't touch, lock
    - Writing to file mapping, offset 120 and 128 locked, offset 0: Failed
    - Writing to file mapping, offset 120 and 128 locked, offset 4096: Passed
    Test 4: Map first, touch, lock
    - Touching offset 0: Passed
    - Touching offset 4096: Passed
    - Writing to file mapping, offset 120 and 128 locked, offset 0: Passed
    - Writing to file mapping, offset 120 and 128 locked, offset 4096: Passed

    I skipped Test 5. It demonstrates how we could use 9x quirks to simulate NT’s behavior, and in which cases we can’t depend on it. See the test source code for details.

    UnlockFileEx: http://files.xi-intersection.de/download/pQUmtY0uc4rV0OTY1MvAVx9RGwORop4C
    Tests: http://files.xi-intersection.de/download/8U6k7AtEAQTSJp6UHw6SNQpnwUwrpOJT
    (...how can I add attachments?)

     
  • Xeno86
    2011-11-14

    Fixed as of KernelEx v4.5.2

     
  • Xeno86
    2011-11-14

    • status: open --> closed