I found another glitch, this one present in 0.9.9 as well as the latest CVS version.
The example mbox file at http://servlets.com/mstor/imagined.mbox demonstrates it. If you run mstor's getCount() against it, you'll see 294 messages. If you run this grep command:
grep '^From ' imagined.mbox | wc -l
then you get 293.
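For anyone without grep to hand, the same cross-check can be sketched in plain Java. This is only an illustration: the class and method names (FromLineCount, countFromLines) are made up here and are not part of mstor. It counts lines that begin with "From ", which is what the grep pattern above matches.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not part of mstor: counts lines that start with "From ".
class FromLineCount {

    static int countFromLines(String text) {
        // A match at the very start of the file has no preceding newline.
        int count = text.startsWith("From ") ? 1 : 0;
        // Every other match must come directly after a newline.
        for (int i = text.indexOf("\nFrom "); i >= 0; i = text.indexOf("\nFrom ", i + 1)) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // ISO-8859-1 maps every byte to one char, so unusual bytes in the
        // mbox cannot make decoding fail.
        byte[] raw = Files.readAllBytes(Paths.get(args[0]));
        String text = new String(raw, StandardCharsets.ISO_8859_1);
        System.out.println(countFromLines(text));
    }
}
```

Run it with the mbox path as the only argument and compare the printed number with the message count mstor reports.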
During message processing, one message in the list has no headers and no content; you could call it the "phantom message". If you delete the first or the second entry in the mbox file, the problem goes away. If you delete the third, the bug is still there.
-jh-
Hi Jason,
Apologies for taking so long to respond to this. I've been tied up with other projects and haven't had time to work on mstor for a while.
Anyway, I think I found the problem that was creating the phantom messages. When calculating message positions, the end of the previous buffer is preserved in case it contains part of a "From_" pattern. Because I was preserving enough to contain an entire "From_" pattern, when the pattern occurs right at the end of the previous buffer it gets matched twice (once in the previous buffer and once in the current one). Sounds highly unlikely, but it looks like that is what was happening here. I now add one to the buffer offset and the problem has gone away.
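To make the double match concrete, here is a rough sketch of that kind of chunked scan. It is not the actual mstor code: OverlapScan, countMatches, chunkSize and carry are names invented for this illustration, and it searches for a plain "From " substring rather than a real line-anchored "From_" pattern. The carry parameter stands for how many characters of the previous buffer are preserved.

```java
// Hypothetical sketch of a chunked pattern scan; not mstor's implementation.
class OverlapScan {

    // Counts matches of pattern in data, scanning chunkSize characters at a
    // time and re-scanning the last `carry` characters of the previous chunk.
    static int countMatches(String data, String pattern, int chunkSize, int carry) {
        int count = 0;
        for (int start = 0; start < data.length(); start += chunkSize) {
            int end = Math.min(start + chunkSize, data.length());
            // The window is the preserved tail plus the current chunk.
            int from = Math.max(0, start - carry);
            for (int i = from; i + pattern.length() <= end; i++) {
                if (data.startsWith(pattern, i)) {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String data = "abcFrom xyz"; // "From " ends exactly at offset 8
        String pattern = "From ";
        // Preserving a full pattern length re-scans a complete match.
        System.out.println(countMatches(data, pattern, 8, pattern.length()));     // prints 2
        // Preserving one character fewer can only hold a partial match.
        System.out.println(countMatches(data, pattern, 8, pattern.length() - 1)); // prints 1
    }
}
```

Preserving pattern length minus one characters is enough to catch a pattern that straddles two buffers, but too short to hold a complete match that was already counted. In this sketch that plays the same role as moving the buffer offset forward by one.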
Anyway, just thought I'd share that. :)
regards,
ben