Re: occasional "Warning: lost track of 460840 pulled message(s)"
mailbox synchronizer
Brought to you by:
ossi
|
From: Uwe Kleine-K. <uwe...@kl...> - 2021-09-07 19:58:49
|
Hello, On 8/4/21 9:01 PM, Oswald Buddenhagen wrote: > On Wed, Aug 04, 2021 at 06:20:33PM +0200, Uwe Kleine-König wrote: >> A bigger context of the logs of an older run can be found at >> https://www.kleine-koenig.org/~uwe/syncmail.log >> > ok, that's "interesting": the job not only takes lots of wall time, it > also consumes huge amounts of cpu time. something is seriously fishy in > that setup. first to check would be whether it's ssh or mbsync (and if > it's the latter, whether it's openssl or something else). It currently happens again :-\ strace looks as follows: ... poll([{fd=4, events=POLLIN|POLLOUT}], 1, 2000) = 1 ([{fd=4, revents=POLLIN}]) write(4, "\27\3\3\0F\250\304\334\237W\235\257\307\276\256fiy\374[\3057\"V\25\332\370V\366<Q\321"..., 75) = -1 EAGAIN (Resource temporarily unavailable) read(4, "\27\3\3\0\365", 5) = 5 read(4, "%{&\27\326UVl\2657F\207\331\305i\356\200ot\37\36se\312i_\272\310\0\33\352\211"..., 245) = 245 lseek(3, 0, SEEK_SET) = 0 write(3, "1591221302\n4727926\n", 19) = 19 ftruncate(3, 19) = 0 fdatasync(3) = 0 openat(AT_FDCWD, "/home/uwe/Maildir-work/.lists.linux-arm-kernel/tmp/1631042587.213028_341577.taurus,U=4727926:2,", O_WRONLY|O_CREAT|O_EXCL, 0600) = 8 write(8, "Return-path: <linux-arm-kernel-b"..., 7718) = 7718 fsync(8) = 0 close(8) = 0 utime("/home/uwe/Maildir-work/.lists.linux-arm-kernel/tmp/1631042587.213028_341577.taurus,U=4727926:2,", {actime=1562768095 /* 2019-07-10T16:14:55+0200 */, modtime=1562768095 /* 2019-07-10T16:14:55+0200 */}) = 0 rename("/home/uwe/Maildir-work/.lists.linux-arm-kernel/tmp/1631042587.213028_341577.taurus,U=4727926:2,", "/home/uwe/Maildir-work/.lists.linux-arm-kernel/new/1631042587.213028_341577.taurus,U=4727926:2,") = 0 write(7, "> 4728198 0 4727926\n", 20) = 20 lseek(3, 0, SEEK_SET) = 0 write(3, "1591221302\n4727927\n", 19) = 19 ftruncate(3, 19) = 0 fdatasync(3) = 0 openat(AT_FDCWD, "/home/uwe/Maildir-work/.lists.linux-arm-kernel/tmp/1631042587.213028_341578.taurus,U=4727927:2,", O_WRONLY|O_CREAT|O_EXCL, 0600) = 8 write(8, "Return-path: <linux-arm-kernel-b"..., 7409) = 7409 fsync(8) = 0 close(8) = 0 utime("/home/uwe/Maildir-work/.lists.linux-arm-kernel/tmp/1631042587.213028_341578.taurus,U=4727927:2,", {actime=1562661698 /* 2019-07-09T10:41:38+0200 */, modtime=1562661698 /* 2019-07-09T10:41:38+0200 */}) = 0 rename("/home/uwe/Maildir-work/.lists.linux-arm-kernel/tmp/1631042587.213028_341578.taurus,U=4727927:2,", "/home/uwe/Maildir-work/.lists.linux-arm-kernel/new/1631042587.213028_341578.taurus,U=4727927:2,") = 0 write(7, "> 4728199 0 4727927\n", 20) = 20 lseek(3, 0, SEEK_SET) = 0 write(3, "1591221302\n4727928\n", 19) = 19 ftruncate(3, 19) = 0 fdatasync(3) = 0 openat(AT_FDCWD, "/home/uwe/Maildir-work/.lists.linux-arm-kernel/tmp/1631042587.213028_341579.taurus,U=4727928:2,", O_WRONLY|O_CREAT|O_EXCL, 0600) = 8 write(8, "Return-path: <linux-arm-kernel-b"..., 10959) = 10959 fsync(8) = 0 close(8) = 0 utime("/home/uwe/Maildir-work/.lists.linux-arm-kernel/tmp/1631042587.213028_341579.taurus,U=4727928:2,", {actime=1562341587 /* 2019-07-05T17:46:27+0200 */, modtime=1562341587 /* 2019-07-05T17:46:27+0200 */}) = 0 rename("/home/uwe/Maildir-work/.lists.linux-arm-kernel/tmp/1631042587.213028_341579.taurus,U=4727928:2,", "/home/uwe/Maildir-work/.lists.linux-arm-kernel/new/1631042587.213028_341579.taurus,U=4727928:2,") = 0 write(7, "> 4728200 0 4727928\n", 20) = 20 lseek(3, 0, SEEK_SET) = 0 write(3, "1591221302\n4727929\n", 19) = 19 ftruncate(3, 19) = 0 fdatasync(3) = 0 openat(AT_FDCWD, "/home/uwe/Maildir-work/.lists.linux-arm-kernel/tmp/1631042587.213028_341580.taurus,U=4727929:2,", O_WRONLY|O_CREAT|O_EXCL, 0600) = 8 write(8, "Return-path: <linux-arm-kernel-b"..., 10563) = 10563 fsync(8) = 0 close(8) = 0 utime("/home/uwe/Maildir-work/.lists.linux-arm-kernel/tmp/1631042587.213028_341580.taurus,U=4727929:2,", {actime=1562736076 /* 2019-07-10T07:21:16+0200 */, modtime=1562736076 /* 2019-07-10T07:21:16+0200 */}) = 0 rename("/home/uwe/Maildir-work/.lists.linux-arm-kernel/tmp/1631042587.213028_341580.taurus,U=4727929:2,", "/home/uwe/Maildir-work/.lists.linux-arm-kernel/new/1631042587.213028_341580.taurus,U=4727929:2,") = 0 write(7, "> 4728201 0 4727929\n", 20) = 20 poll([{fd=4, events=POLLIN|POLLOUT}], 1, 2000) = 1 ([{fd=4, revents=POLLIN}]) write(4, "\27\3\3\0F\250\304\334\237W\235\257\307\276\256fiy\374[\3057\"V\25\332\370V\366<Q\321"..., 75) = -1 EAGAIN (Resource temporarily unavailable) read(4, "\27\3\3\20\21", 5) = 5 read(4, "\276\352\253\355\3733\237vz\340\177\330\254\307\366\356\243\2355em\323\272*\312\24\247\276\333\0\334\216"..., 4113) = 4113 poll([{fd=4, events=POLLIN|POLLOUT}], 1, 2000) = 1 ([{fd=4, revents=POLLIN}]) write(4, "\27\3\3\0F\250\304\334\237W\235\257\307\276\256fiy\374[\3057\"V\25\332\370V\366<Q\321"..., 75) = -1 EAGAIN (Resource temporarily unavailable) read(4, "\27\3\3\2J", 5) = 5 read(4, "Qq\17\341\271\205\353\n\210\236\371H\340\235\r\241)S\251p\257\205\36Z\245K\30\222\347\346\224 "..., 586) = 586 poll([{fd=4, events=POLLIN|POLLOUT}], 1, 2000) = 1 ([{fd=4, revents=POLLIN}]) write(4, "\27\3\3\0F\250\304\334\237W\235\257\307\276\256fiy\374[\3057\"V\25\332\370V\366<Q\321"..., 75) = -1 EAGAIN (Resource temporarily unavailable) read(4, "\27\3\3\vX", 5) = 5 read(4, "\320\32\217HM\226y\362\17\5\320\363\2\35z\365\320Bl\6+\305dZ^\371<\373\200zU\334"..., 2904) = 2904 poll([{fd=4, events=POLLIN|POLLOUT}], 1, 2000) = 1 ([{fd=4, revents=POLLIN}]) write(4, "\27\3\3\0F\250\304\334\237W\235\257\307\276\256fiy\374[\3057\"V\25\332\370V\366<Q\321"..., 75) = -1 EAGAIN (Resource temporarily unavailable) read(4, "\27\3\3\20\21", 5) = 5 read(4, "\225\363\341W\213\327\345$n&d\2\27m\265\rR`\260D\356g4ERz\31330\305\323\263"..., 4113) = 4113 poll([{fd=4, events=POLLIN|POLLOUT}], 1, 2000) = 1 ([{fd=4, revents=POLLIN}]) write(4, "\27\3\3\0F\250\304\334\237W\235\257\307\276\256fiy\374[\3057\"V\25\332\370V\366<Q\321"..., 75) = -1 EAGAIN (Resource temporarily unavailable) read(4, "\27\3\3\4\312", 5) = 5 read(4, "\26\31D_\21Y\201\17\34\373\274\263[]\260\245\243\351*u\235\376\2633/\300\225w\304\274\16U"..., 1226) = 1226 poll([{fd=4, events=POLLIN|POLLOUT}], 1, 2000) = 1 ([{fd=4, revents=POLLIN}]) write(4, "\27\3\3\0F\250\304\334\237W\235\257\307\276\256fiy\374[\3057\"V\25\332\370V\366<Q\321"..., 75) = -1 EAGAIN (Resource temporarily unavailable) read(4, "\27\3\3\t\220", 5) = 5 read(4, "CG\272s%\337P .[\251\362|p\371?F\340\37\3708\211[\360(\24C\217\372\341\200\274"..., 2448) = 2448 poll([{fd=4, events=POLLIN|POLLOUT}], 1, 2000) = 1 ([{fd=4, revents=POLLIN}]) write(4, "\27\3\3\0F\250\304\334\237W\235\257\307\276\256fiy\374[\3057\"V\25\332\370V\366<Q\321"..., 75) = -1 EAGAIN (Resource temporarily unavailable) read(4, "\27\3\3\20\21", 5) = 5 read(4, "l z\34\22\221):X7\227\323\5\3702_\246:p\322\302w$\300m\22\242\251\35\213W\360"..., 4113) = 4113 poll([{fd=4, events=POLLIN|POLLOUT}], 1, 2000) = 1 ([{fd=4, revents=POLLIN}]) write(4, "\27\3\3\0F\250\304\334\237W\235\257\307\276\256fiy\374[\3057\"V\25\332\370V\366<Q\321"..., 75) = -1 EAGAIN (Resource temporarily unavailable) read(4, "\27\3\3\6\222", 5) = 5 read(4, "A\207`\364\26\213US\225\360\340\231\31y]\300V\257S\7\202\311\304\25\\\324\315FSc\206\350"..., 1682) = 1682 poll([{fd=4, events=POLLIN|POLLOUT}], 1, 2000) = 1 ([{fd=4, revents=POLLIN}]) write(4, "\27\3\3\0F\250\304\334\237W\235\257\307\276\256fiy\374[\3057\"V\25\332\370V\366<Q\321"..., 75) = -1 EAGAIN (Resource temporarily unavailable) read(4, "\27\3\3\7\226", 5) = 5 read(4, "\332\331\236\353\fx\1\331\261\314T\t\332\203\0\337/\3460\207\307w!\320\3430H\20\232t\304\276"..., 1942) = 1942 poll([{fd=4, events=POLLIN|POLLOUT}], 1, 2000) = 1 ([{fd=4, revents=POLLIN}]) write(4, "\27\3\3\0F\250\304\334\237W\235\257\307\276\256fiy\374[\3057\"V\25\332\370V\366<Q\321"..., 75) = -1 EAGAIN (Resource temporarily unavailable) read(4, "\27\3\3\20\21", 5) = 5 read(4, "\25\353*\200\241\234\0332y\274S#|\325\307\254;\4\244\3772\343i9\232K\213\6K\332?O"..., 4113) = 4113 poll([{fd=4, events=POLLIN|POLLOUT}], 1, 2000) = 1 ([{fd=4, revents=POLLIN}]) write(4, "\27\3\3\0F\250\304\334\237W\235\257\307\276\256fiy\374[\3057\"V\25\332\370V\366<Q\321"..., 75) = -1 EAGAIN (Resource temporarily unavailable) read(4, "\27\3\3\10\214", 5) = 5 read(4, "\206\251\203\10R2{\231B%#\274\342~\22\34\342\241Y\25\206\206Y\227B\272\ncA\277\3033"..., 2188) = 2188 poll([{fd=4, events=POLLIN|POLLOUT}], 1, 2000) = 1 ([{fd=4, revents=POLLIN}]) write(4, "\27\3\3\0F\250\304\334\237W\235\257\307\276\256fiy\374[\3057\"V\25\332\370V\366<Q\321"..., 75) = -1 EAGAIN (Resource temporarily unavailable) read(4, "\27\3\3\20\21", 5) = 5 read(4, "\340\360\366\345\34\222B]l\355V\215\26\317a\7\357E\2143i\356\"\273\335G2\0\263\353K\225"..., 4113) = 4113 poll([{fd=4, events=POLLIN|POLLOUT}], 1, 2000) = 1 ([{fd=4, revents=POLLIN}]) write(4, "\27\3\3\0F\250\304\334\237W\235\257\307\276\256fiy\374[\3057\"V\25\332\370V\366<Q\321"..., 75) = -1 EAGAIN (Resource temporarily unavailable) read(4, "\27\3\3\20\21", 5) = 5 read(4, "n\366\203\331\365\211\203)\266\3272\223\3^\212\3}^\371Z\256\3731_\334C\343\7`\233-\22"..., 4113) = 4113 poll([{fd=4, events=POLLIN|POLLOUT}], 1, 2000) = 1 ([{fd=4, revents=POLLIN}]) write(4, "\27\3\3\0F\250\304\334\237W\235\257\307\276\256fiy\374[\3057\"V\25\332\370V\366<Q\321"..., 75) = -1 EAGAIN (Resource temporarily unavailable) read(4, "\27\3\3\20\21", 5) = 5 read(4, "[\364\317\22\205\276\321\247\215~\236\372\266~B>\21\350\215u\302\347\31\315\356Q8Y\305l\262\237"..., 4113) = 4113 poll([{fd=4, events=POLLIN|POLLOUT}], 1, 2000) = 1 ([{fd=4, revents=POLLIN}]) write(4, "\27\3\3\0F\250\304\334\237W\235\257\307\276\256fiy\374[\3057\"V\25\332\370V\366<Q\321"..., 75) = -1 EAGAIN (Resource temporarily unavailable) read(4, "\27\3\3\20\21", 5) = 5 read(4, "\232#Y\27\351\243\363e=\17\267\356\365CCxar\373\230\322\331\24\315X\322zj\206O\262\257"..., 4113) = 4113 poll([{fd=4, events=POLLIN|POLLOUT}], 1, 2000) = 1 ([{fd=4, revents=POLLIN}]) write(4, "\27\3\3\0F\250\304\334\237W\235\257\307\276\256fiy\374[\3057\"V\25\332\370V\366<Q\321"..., 75) = -1 EAGAIN (Resource temporarily unavailable) read(4, "\27\3\3\2A", 5) = 5 read(4, "\236\345\357\333}\37\374\33\347\355\252(\261P\36\30\210\226f\221\347\360\265xj\215\361\222s6BJ"..., 577) = 577 poll([{fd=4, events=POLLIN|POLLOUT}], 1, 2000) = 1 ([{fd=4, revents=POLLIN}]) write(4, "\27\3\3\0F\250\304\334\237W\235\257\307\276\256fiy\374[\3057\"V\25\332\370V\366<Q\321"..., 75) = -1 EAGAIN (Resource temporarily unavailable) read(4, "\27\3\3\0\365", 5) = 5 read(4, "e+\231s\344,M\tN\344\10\360l\276\365\33AS?\366o\310o\310>l\361l\214P\f\25"..., 245) = 245 lseek(3, 0, SEEK_SET) = 0 write(3, "1591221302\n4727930\n", 19) = 19 ftruncate(3, 19) = 0 fdatasync(3) = 0 openat(AT_FDCWD, "/home/uwe/Maildir-work/.lists.linux-arm-kernel/tmp/1631042587.213028_341581.taurus,U=4727930:2,", O_WRONLY|O_CREAT|O_EXCL, 0600) = 8 write(8, "Return-path: <linux-arm-kernel-b"..., 7344) = 7344 fsync(8) = 0 close(8) = 0 utime("/home/uwe/Maildir-work/.lists.linux-arm-kernel/tmp/1631042587.213028_341581.taurus,U=4727930:2,", {actime=1562341670 /* 2019-07-05T17:47:50+0200 */, modtime=1562341670 /* 2019-07-05T17:47:50+0200 */}) = 0 rename("/home/uwe/Maildir-work/.lists.linux-arm-kernel/tmp/1631042587.213028_341581.taurus,U=4727930:2,", "/home/uwe/Maildir-work/.lists.linux-arm-kernel/new/1631042587.213028_341581.taurus,U=4727930:2,") = 0 write(7, "> 4728202 0 4727930\n", 20) = 20 lseek(3, 0, SEEK_SET) = 0 write(3, "1591221302\n4727931\n", 19) = 19 ftruncate(3, 19) = 0 fdatasync(3) = 0 openat(AT_FDCWD, "/home/uwe/Maildir-work/.lists.linux-arm-kernel/tmp/1631042587.213028_341582.taurus,U=4727931:2,", O_WRONLY|O_CREAT|O_EXCL, 0600) = 8 write(8, "Return-path: <linux-arm-kernel-b"..., 7517) = 7517 fsync(8) = 0 close(8) = 0 utime("/home/uwe/Maildir-work/.lists.linux-arm-kernel/tmp/1631042587.213028_341582.taurus,U=4727931:2,", {actime=1562337704 /* 2019-07-05T16:41:44+0200 */, modtime=1562337704 /* 2019-07-05T16:41:44+0200 */}) = 0 rename("/home/uwe/Maildir-work/.lists.linux-arm-kernel/tmp/1631042587.213028_341582.taurus,U=4727931:2,", "/home/uwe/Maildir-work/.lists.linux-arm-kernel/new/1631042587.213028_341582.taurus,U=4727931:2,") = 0 write(7, "> 4728203 0 4727931\n", 20) = 20 lseek(3, 0, SEEK_SET) = 0 write(3, "1591221302\n4727932\n", 19) = 19 ftruncate(3, 19) = 0 fdatasync(3) = 0 openat(AT_FDCWD, "/home/uwe/Maildir-work/.lists.linux-arm-kernel/tmp/1631042587.213028_341583.taurus,U=4727932:2,", O_WRONLY|O_CREAT|O_EXCL, 0600) = 8 write(8, "Return-path: <linux-arm-kernel-b"..., 7470) = 7470 ... The process has the following mapping of fd numbers to files: lr-x------ 1 uwe uwe 64 Sep 7 21:27 0 -> /dev/null lrwx------ 1 uwe uwe 64 Sep 7 21:27 1 -> 'socket:[669143]' lrwx------ 1 uwe uwe 64 Sep 7 21:27 2 -> 'socket:[669143]' lrwx------ 1 uwe uwe 64 Sep 7 21:27 3 -> /home/uwe/Maildir-work/.lists.linux-arm-kernel/.uidvalidity lrwx------ 1 uwe uwe 64 Sep 7 21:27 4 -> 'socket:[669937]' l-wx------ 1 uwe uwe 64 Sep 7 21:27 5 -> /home/uwe/Maildir-work/.lists.linux-arm-kernel/.mbsyncstate.lock l-wx------ 1 uwe uwe 64 Sep 7 21:27 6 -> /home/uwe/Maildir-work/.lists.linux-arm-kernel/.mbsyncstate.new l-wx------ 1 uwe uwe 64 Sep 7 21:27 7 -> /home/uwe/Maildir-work/.lists.linux-arm-kernel/.mbsyncstate.journal l-wx------ 1 uwe uwe 64 Sep 7 21:27 8 -> '/home/uwe/Maildir-work/.lists.linux-arm-kernel/tmp/1631042850.213028_347408.taurus,U=4733757:2,' Maybe it's the server's fault if it scrambled all UIDs? >> I saw mbsync write to one of its files at 100% yesterday while the ssh >> tunnel was already gone. Currently the ssh tunnel is up. >> This was before I suspended the machine, and some time after wakeup >> today, the loss of 460840 messages was reported from that process. I added -V to the invokation of mbsync and the log currently has: Sep 07 16:55:41 taurus syncmail-work[212854]: Opening master box lists/linux-arm-kernel... Sep 07 16:55:41 taurus syncmail-work[212854]: Opening slave box lists/linux-arm-kernel... Sep 07 16:55:42 taurus syncmail-work[212854]: Loading master... Sep 07 16:55:42 taurus syncmail-work[212854]: Loading slave... Sep 07 16:55:45 taurus syncmail-work[212854]: slave: 4386047 messages, 513211 recent Sep 07 16:55:52 taurus syncmail-work[212854]: master: 523665 messages, 2 recent Sep 07 16:55:52 taurus syncmail-work[212854]: Synchronizing... [...] Sep 07 17:02:41 taurus syncmail-work[213028]: Opening master box lists/linux-arm-kernel... Sep 07 17:02:41 taurus syncmail-work[213028]: Opening slave box lists/linux-arm-kernel... Sep 07 17:02:42 taurus syncmail-work[213028]: Loading master... Sep 07 17:02:42 taurus syncmail-work[213028]: Loading slave... Sep 07 17:02:44 taurus syncmail-work[213028]: slave: 4386049 messages, 513213 recent Sep 07 17:03:02 taurus syncmail-work[213028]: master: 523670 messages, 523670 recent Sep 07 17:03:02 taurus syncmail-work[213028]: Synchronizing... the first part is from the previous run which completed successfully a bit later. Now it's 21:38 local time here and mbsync still works on lists/linux-arm-kernel/ . > ok, now i want to see the -D logs from an event like that, bracketed by > timestamps. I was lazy and didn't save -D logs. >> Would it make sense to put my maildir into a git repo (including the >> mbsync files) and commit after each run? >> > that's an interesting idea. in particular, this could be used to verify > that notmuch doesn't mess up anything in the synced folders. Also this I didn't implement yet ... >> Is there a more suitable place where I can read about how mbsync >> tracks mail than the source code to understand the content of these >> files? > > nope. everything is in sync.c. load_state() is probably most helpful, as > it also provides insight into the journal's meaning. ok. It seems I have to do some homework now ... Best regards Uwe |