#22 Too many headers marked read

closed
nobody
None
5
2003-12-23
2003-12-21
No

"nget" does a bad thing if you refresh the headers in a
group with one process while downloading with another.
At completion of the download, all headers that match
the regexp are marked read, not just the ones that
matched at the start of the download (that is, the only
ones which were attempted by that process).

Say, for instance, you're downloading "cool.thing",
with "nget -G some.group -r cool.thing", but at the
time you commence the download, all the parts haven't
been uploaded yet. A few hours later, you're still
downloading -- it's a long download -- but you figure
the parts should have arrived by now, so you do an
"nget -g some.group" in another window. Also in that
other window, "nget -G some.group -Tr cool.thing" shows
that, indeed, the rest of the parts have arrived. After
the first command completes, you recall it in your
shell to get the rest, but it retrieves nothing! You
have to use "-D" or "-U" to see the remaining parts.
Same story if "episode 2" comes in while episode 1 is
downloading (but you didn't specify "episode.1" because
it was the only one, at the time).
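
To make the report concrete, here is a toy model of the difference between the two behaviours. This is only a sketch, not nget's actual code: the dict standing in for the header cache, the made-up message-ids, and the helper name matching_ids are all assumptions for illustration.

```python
import re

def matching_ids(cache, pattern):
    """Return the message-ids whose subject matches the regexp."""
    rx = re.compile(pattern)
    return {mid for mid, subject in cache.items() if rx.search(subject)}

# Hypothetical header cache at the moment the download starts.
cache = {
    "<1@example>": "cool.thing part 1",
    "<2@example>": "cool.thing part 2",
}

# What the downloading process actually attempts.
snapshot = matching_ids(cache, r"cool\.thing")

# Meanwhile, another process refreshes the headers and a new part shows up.
cache["<3@example>"] = "cool.thing part 3"

# Reported behaviour: re-match at completion, so the new part is marked read too.
marked_reported = matching_ids(cache, r"cool\.thing")

# Expected behaviour: mark only what was matched at the start.
marked_expected = snapshot

print(sorted(marked_reported - marked_expected))
```

The last line prints whatever was marked read without ever being attempted, which in this model is the part that arrived mid-download.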

A work-around is to interrupt the first process, then
recall the command. Interrupted nget processes don't
mark anything read, not even the articles they've
already downloaded. (I think that's a bug, too.)

It would be nice if nget could mark every multi-part
file read as it's downloaded, but maybe that would be
too much zipping and unzipping. (Note: we wouldn't want
to mark read single parts of incomplete multi-part
files!) An alternative would be to keep a file of "in
progress" downloads, and merge the information into the
group cache at completion *or* *interrupt*. Other nget
processes could also note the existence of the
in-progress file, and adjust their in-memory cache
accordingly.
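
A rough sketch of the "in progress" file idea, in Python for brevity. Everything here is an assumption for illustration: the one-JSON-list-per-line file format, the function names, and the plain set of message-ids standing in for the group cache. It does no locking, which a real version would need (especially on NFS).

```python
import json
import os
import tempfile

def record_completed(progress_path, message_ids):
    """Append the message-ids of one fully downloaded multi-part file."""
    with open(progress_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(sorted(message_ids)) + "\n")

def load_in_progress(progress_path):
    """Read every message-id recorded so far; empty set if the file is absent."""
    done = set()
    if not os.path.exists(progress_path):
        return done
    with open(progress_path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                done.update(json.loads(line))
    return done

def merge_into_cache(read_ids, progress_path):
    """At completion or interrupt: fold the in-progress ids into the read set."""
    merged = set(read_ids) | load_in_progress(progress_path)
    if os.path.exists(progress_path):
        os.remove(progress_path)
    return merged

if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "in_progress")
    record_completed(path, ["<1@example>", "<2@example>"])
    # A second process could call load_in_progress(path) here to adjust
    # its in-memory view before the first one finishes.
    print(sorted(merge_into_cache(set(), path)))
```

Only complete multi-part files get recorded, so single parts of an incomplete file are never marked read.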

Discussion

  • Logged In: YES
    user_id=803104

    Ah, I intended to make this a bug, not an RFE. Oh well.

     
  • Logged In: YES
    user_id=65253

    I haven't had an opportunity to test this yet, but I can't
    see how it would be possible. At the time of the first
    retrieve starting, nget doesn't know what the message-ids
    of the new posts are, so it couldn't possibly mark them as
    retrieved unless it reread the cache to get the new
    message-ids. Maybe you are just seeing dupe-file checking?
    Does using -dF instead of -D cause the new files to appear?

    Also, interrupting does save the current state. (You should
    see a message such as "term_handler: signal 2, shutting
    down.", and if you have debug turned on in your ngetrc, some
    stuff about saving mid_info). Are you using ^C to interrupt
    it, or something else?

     
  • Logged In: YES
    user_id=803104

    Hmm. This may be pilot error. My home directory is on NFS,
    and I'm not building against liblockfile. I'll update and
    add liblockfile to the mix, and try again, and let you know.

     
    • status: open --> closed
     
  • Logged In: YES
    user_id=803104

    OK, I updated from Sunday's CVS, and rebuilt against
    liblockfile 1.05, and you're right that interrupted
    downloads mark the multi-part binaries that were completed.

    Yesterday, I ran into indications that my article cache was
    corrupted, but that could have been done by the nget
    pre-liblockfile. (nget complained about corrupted lines in
    mid_info, and the complaint persisted even after deleting
    the mid_info cache and marking all the articles read, but it
    finally went away after flushing one of the crappier servers
    from the article cache.)

    Better just close this. Never mind. :-)