#52 SHA checking large files

open
nobody
None
5
2004-08-02
2004-08-02
SirKilljoy
No

I mainly share interesting 700 MB files using Waste, and
found that without SHA checking on resume, files
occasionally became corrupt while transferring.
Raising the SHA checking limit from the default 32 MB
to 700 MB (or larger) solved this resume error, except
that the source and destination machines would hang and
time out if the SHA check needed to read the file for any
length of time (for example, if the source file was across
a local network).

To solve the problem for my local Waste group, we
created a modified version of Waste that changes the
SHA check from reading every single byte to only
checking every so often, for example every 10 MB:

Snippet from
void XferRecv::onGotMsg(C_FileSendReply *reply)

while (l>0) {
  unsigned char buf[8192];
  unsigned int a=sizeof(buf);
  if (a > l) a=l;
  // Read the next block (at most 8kb) to feed to the SHA context
#ifdef XFER_WIN32_FILEIO
  DWORD d;
  // Request only a bytes, so the last read cannot run past l
  if (!ReadFile(m_houtfile,buf,a,&d,NULL)) d=0;
  a=d;
#else
  a=fread(buf,1,a,m_outfile);
#endif
  if (!a) break;
  // Jump 10mb before doing the next 8kb SHA check
#ifdef XFER_WIN32_FILEIO
  if(SetFilePointer(m_houtfile,
      10*1024*1024,NULL,FILE_CURRENT)
      == 0xFFFFFFFF)
    break;
#else
  if(fseek(m_outfile,10*1024*1024,SEEK_CUR))
    break;
#endif
  context.add(buf,a);
  l-=a;
}
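
To make the trade-off of the sampled check concrete, here is a minimal sketch (not the Waste code) that works on an in-memory buffer instead of a file. It assumes 64-bit FNV-1a as a stand-in for the SHA context so the example stays self-contained, and the names Fnv1a, sampled_hash, block and stride are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-in for the SHA context (assumption): 64-bit FNV-1a, NOT SHA.
struct Fnv1a {
    std::uint64_t state = 14695981039346656037ULL;
    void add(const unsigned char* p, std::size_t n) {
        for (std::size_t i = 0; i < n; ++i) {
            state ^= p[i];
            state *= 1099511628211ULL;
        }
    }
};

// Hash `block` bytes, skip `stride` bytes, and repeat until the data runs out.
// With stride == 0 every byte is hashed, like the unmodified check.
std::uint64_t sampled_hash(const std::vector<unsigned char>& data,
                           std::size_t block, std::size_t stride) {
    assert(block > 0);
    Fnv1a ctx;
    std::size_t pos = 0;
    while (pos < data.size()) {
        std::size_t n = data.size() - pos;
        if (n > block) n = block;
        ctx.add(data.data() + pos, n);
        pos += n;
        if (stride > data.size() - pos) break;  // next sample would start past the end
        pos += stride;
    }
    return ctx.state;
}
```

With block = 8 and stride = 24, a flipped byte at offset 16 falls inside a skipped region and leaves the sampled digest unchanged, while the same flip changes the full digest (stride = 0). So the sampled check is faster but can only detect corruption that happens to land in a sampled block.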

As ever, this was a temporary change on my part, and I
would love to see a proper change to Waste that solves
this rather important error, as a p2p program that can't
transfer files correctly is pointless even if it is encrypted.

(I would be happy to offer my services as a Win32 C++
programmer toward a solution.)

SirKilljoy
Movie Rich, Time Poor & Happiness Wasted.

Discussion

  • Chris

    Chris - 2004-09-27

    Logged In: YES
    user_id=573676

    I assume that in order for this to work you must have done
    a similar thing inside XferSend::XferSend ??

    Help clarify the situation for me. I can understand that if
    the sender takes longer than the connection timeout to hash
    the file then the send will never take place. Have you also
    observed cases where the sender completes in time, the file
    is sent, and then there is a problem on the receiving end?
    Some significant delay would be expected when verifying the
    hash, but does it cause any other nasty side effects?

    One way of lessening the impact of this problem is to keep
    file hashes in the file DB so that it wouldn't need to be
    calculated on send. I will likely implement this as it can
    be useful in other ways (searching), but it certainly won't
    solve any problems you're seeing in terms of sends for files
    that have not yet been hashed or on the verify at the
    receiving end.

    Hashing the files definitely seems necessary, though maybe
    MD5 would be faster, if not more secure. If we can minimize
    side effects (timeouts) by doing it asynchronously, that
    would definitely help. We should also try to hunt down what
    might be causing the file corruption to begin with, as that
    seems like a separate problem.
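
The idea of keeping file hashes in the file DB could look roughly like the sketch below. This is only an illustration under assumptions: an in-memory map stands in for the file DB, an entry is treated as stale when the file's size or modification time changes, and the names FileStamp, CachedHash, HashCache and the compute callback are hypothetical, not part of Waste.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <string>

// What we remember about a file to decide whether its digest is still valid.
struct FileStamp {
    std::uint64_t size;
    std::int64_t mtime;
};

struct CachedHash {
    FileStamp stamp;
    std::string digest;
};

class HashCache {
public:
    // Returns the stored digest if size and mtime are unchanged;
    // otherwise calls `compute` (the slow full-file hash) and stores the result.
    std::string get(const std::string& path, const FileStamp& stamp,
                    const std::function<std::string()>& compute) {
        assert(compute && "compute callback must be set");
        auto it = entries_.find(path);
        if (it != entries_.end() &&
            it->second.stamp.size == stamp.size &&
            it->second.stamp.mtime == stamp.mtime) {
            return it->second.digest;  // cache hit: no file read needed
        }
        CachedHash fresh{stamp, compute()};
        entries_[path] = fresh;
        return fresh.digest;
    }

private:
    std::map<std::string, CachedHash> entries_;
};
```

On a send, the hash would then only be recomputed the first time a file is seen or after it changes. As noted above, this does nothing for the verify on the receiving end, which still has to read the downloaded file.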

     
  • SirKilljoy

    SirKilljoy - 2004-11-20

    Logged In: YES
    user_id=1063206

    Q: I assume that in order for this to work you must have done
    a similar thing inside XferSend::XferSend ??

    A: Yes I did.

    Q: Have you also observed cases where the sender completes
    in time, the file is sent, and then there is a problem on the
    receiving end?

    A1: With the modification I made, yes. My code would
    occasionally report a file mismatch after the receiver had
    received 100% of the file. Independent tests showed the
    files to be identical between sender and receiver. Within my
    local group of Waste users, we now ignore this error message.
    A2: Running standard Waste, we never saw an error on the
    receiver's end. Part of this is due to the fact that we all used
    local drives as our Waste download folder, so hash
    checking of the incomplete download file would complete in 20
    seconds or so.

    Q: Keeping the hash value of every source file.

    A: I am pleased that you're considering a solution. For my two
    cents, it sounds like a good solution (except for users who have
    their download folder located somewhere with slow access,
    which in my quick survey no users do).

    If you need any help, I'm happy to offer my services,
    especially in the area of upgrading the code around the source
    filelist db. I could, if I can be allowed to sound so bold, think
    of other improvements, esp. concerning AVI files, such as bitrate,
    codec and resolution.

     
  • Nobody/Anonymous

    Logged In: NO

    What I'd like to see is something similar to some of the newer p2p applications, which split the file into pieces and do a SHA check on the pieces as they come in. This modification would allow Waste to download pieces from more than one source, as well as allow SHA checks on pieces of data while it's being downloaded, which should resolve the bug above.
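
A rough sketch of the per-piece idea, assuming fixed-size pieces and an in-memory buffer rather than a real transfer. Again 64-bit FNV-1a stands in for SHA to keep the example self-contained, and hash_bytes, piece_digests and bad_pieces are hypothetical names, not anything in Waste.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-in for SHA (assumption): 64-bit FNV-1a over one piece.
std::uint64_t hash_bytes(const unsigned char* p, std::size_t n) {
    std::uint64_t h = 14695981039346656037ULL;
    for (std::size_t i = 0; i < n; ++i) {
        h ^= p[i];
        h *= 1099511628211ULL;
    }
    return h;
}

// Sender side: one digest per fixed-size piece (the last piece may be shorter).
std::vector<std::uint64_t> piece_digests(const std::vector<unsigned char>& data,
                                         std::size_t piece_size) {
    assert(piece_size > 0);
    std::vector<std::uint64_t> out;
    for (std::size_t pos = 0; pos < data.size(); pos += piece_size) {
        std::size_t n = data.size() - pos;
        if (n > piece_size) n = piece_size;
        out.push_back(hash_bytes(data.data() + pos, n));
    }
    return out;
}

// Receiver side: indices of pieces that are missing or do not match,
// i.e. the only pieces that would need to be fetched again.
std::vector<std::size_t> bad_pieces(const std::vector<unsigned char>& received,
                                    std::size_t piece_size,
                                    const std::vector<std::uint64_t>& expected) {
    std::vector<std::uint64_t> got = piece_digests(received, piece_size);
    std::vector<std::size_t> bad;
    for (std::size_t i = 0; i < expected.size(); ++i) {
        if (i >= got.size() || got[i] != expected[i]) bad.push_back(i);
    }
    return bad;
}
```

Each check only has to read one piece, so no single hash pass runs long enough to hit the connection timeout, and a corrupt resume would cost one piece instead of the whole file.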

     
