
OpenCBM

Tracker: Bugs

5 Rare reliability problems/turbo transfer hangups - ID: 1070076
Last Update: Comment added ( strik )

Scripted test procedures that were run over a long
period turned up some rare transfer hangups. They seem
to occur only when one of the turbo protocols is used
(parallel, serial1 or serial2, with or without warp).

Test scenario/test command:
d64copy -w -tparallel -s1 -e1 8 junk.d64

The hangup occurs after the transfer itself has finished
(100%), but the summary message with the number of
blocks transferred is never printed. Instead the
command hangs while the floppy drive appears to be
idle. The command can be aborted by pressing
CTRL-C. After such an abort, the driver responds to
subsequent commands without any problems.
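
A hang like this can be caught automatically by running the command under a time limit. The following is only a minimal sketch of such a watchdog, not the script used for these tests: the function name run_with_watchdog and the time limits are assumptions, and killing the process on timeout is not the same as the CTRL-C abort described above.

```python
import subprocess
import sys


def run_with_watchdog(cmd, timeout_s):
    """Run cmd and return "ok", "failed", or "hang" if it exceeds timeout_s."""
    try:
        result = subprocess.run(
            cmd,
            timeout=timeout_s,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child and waits for it before re-raising.
        return "hang"
    return "ok" if result.returncode == 0 else "failed"


# Stand-in demonstration that needs no drive: a child that sleeps past the limit.
sleeper = [sys.executable, "-c", "import time; time.sleep(5)"]
print(run_with_watchdog(sleeper, timeout_s=1))
```

For the real test, cmd would be the command from this report, for example ["d64copy", "-w", "-tparallel", "-s1", "-e1", "8", "junk.d64"], with a limit well above the normal transfer time.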

By inserting some debugging fprintf calls into the
close functions of all protocol implementations, I could
see that the turbo transfer protocol shutdown sometimes
fails.

The attached logfile of a test run of the command
above shows these debugging messages, together with my
assumptions about the real cause of this bug.


Womo


Submitted: Nobody/Anonymous ( nobody ) - 2004-11-20 17:07
Priority: 5
Status: Closed
Resolution: Accepted
Assigned: Spiro Trikaliotis
Category: d64copy
Group: v0.1.0
Visibility: Public


Comments ( 10 )




Date: 2005-05-16 15:07
Sender: strik (Project Admin)

Logged In: YES
user_id=1059994

I'm sorry, I mixed up the versions. It is already fixed in 0.1.0a-8.
This will be merged into 0.1.0.20.

-- Spiro.


Date: 2005-05-16 15:00
Sender: strik (Project Admin)

Logged In: YES
user_id=1059994

Ok, this is fixed with 0.1.0.20 (and 0.1.0a-9). Thanks to
"svs_fire" for testing the patch.

-- Spiro.


Date: 2005-03-09 08:30
Sender: strik (Project Admin)

Logged In: YES
user_id=1059994

Reopened this bug.

This problem occurs not only with d64copy but also with
cbmcopy. It has to be fixed there, too.

-- Spiro.


Date: 2004-12-07 20:01
Sender: strik (Project Admin)

Logged In: YES
user_id=1059994

Hello,

thanks for analysing this so deeply. I will check whether to
put your fix (or a variant of it) into CVS.

For the time being, I'm setting this to "pending".

- Spiro.


Date: 2004-12-07 07:55
Sender: nobody

Logged In: NO

A new test run without any additional Debug messages enabled
worked flawlessly. For this test I used the second fix
variant (minimal code changes) sent around on 20041206 via
private mail.

The status of this bug item can be set to "Closed" now.


Womo



Date: 2004-12-06 08:32
Sender: nobody

Logged In: NO

My fix seems to work for the parallel, serial1 and serial2
protocols. A 10-hour test no longer shows any problems.
This test was run with Debug messages enabled within the protocol
implementations, so one more final test without them is needed to be
absolutely sure.

The fix was sent to you by private mail to discuss the different
implementation possibilities.

Womo



Date: 2004-12-05 09:44
Sender: nobody

Logged In: NO

The rare protocol shutdown problems with the serial1 and
serial2 turbo transfers have been identified. I put Debug
statements from Spiro's logging facility into the relevant
sources (s1.c, s2.c) and found the following in
last night's test script run:

As with the parallel protocol (pp.c), when the turbo
protocol is shut down (by sending two zeros as T/S to the turbo
loader), the very last Floppy->PC handshake (waiting for
Data to become 1) is sometimes missed. This results in an
endless wait on the PC side.

As with the parallel turbo handler, the fix would be to omit
that last handshake when shutting down the protocol.
Instead of applying the same fix to the s1 and s2 protocols,
I'll try to find a better fix that does not duplicate lots
of code in the close_disk function: something
similar to Michael's proposal, but without the
additional parameter.
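
To illustrate the idea only, here is a small Python model of the shutdown. It is not the code from pp.c, s1.c or s2.c: the names shutdown_protocol, wait_for_line, send_byte and read_data_line are made up for this sketch, and the bounded wait is an extra assumption that is not part of the fix discussed here. The model sends the two zero bytes and then either skips the final handshake or waits for it with a timeout instead of forever.

```python
import time


def wait_for_line(read_line, expected, timeout_s, poll_s=0.001):
    """Poll read_line() until it returns expected; return False on timeout."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if read_line() == expected:
            return True
        time.sleep(poll_s)
    return False


def shutdown_protocol(send_byte, read_data_line,
                      final_handshake=False, timeout_s=0.5):
    """Send track 0 / sector 0 to end the turbo session (hypothetical model)."""
    send_byte(0)  # track
    send_byte(0)  # sector
    if not final_handshake:
        # Variant described above: do not wait for the drive at all.
        return True
    # Alternative assumption: wait for Data == 1, but give up after timeout_s.
    return wait_for_line(read_data_line, 1, timeout_s)


# Simulated drive that never raises Data, i.e. the missed handshake.
sent = []
print(shutdown_protocol(sent.append, lambda: 0))
print(shutdown_protocol(sent.append, lambda: 0,
                        final_handshake=True, timeout_s=0.05))
```

With the handshake omitted the shutdown returns at once; with the bounded wait a missed handshake is reported as a failure instead of blocking the PC side.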


Womo



Date: 2004-12-04 09:38
Sender: nobody

Logged In: NO

New test results for the remaining protocols (besides the
parallel one, which is fixed in my version 0.1.0-plWM014):

Each of the following test groups was run for at least 2
hours. Each test in a group means transferring track 1 from a disk
image to an external floppy disk, transferring the same
track 1 back into another disk image, and then doing a byte
comparison of both disk images.
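
The byte comparison step could look roughly like the following sketch. The function name first_difference and the file names are assumptions for illustration, not taken from the actual test script.

```python
import os
import tempfile


def first_difference(path_a, path_b):
    """Return the offset of the first differing byte, or None if identical."""
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        a, b = fa.read(), fb.read()
    for offset, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return offset
    if len(a) != len(b):
        # One file is a prefix of the other; they differ where the shorter ends.
        return min(len(a), len(b))
    return None


# Stand-in demonstration with two small temporary files instead of real images.
with tempfile.TemporaryDirectory() as tmp:
    original = os.path.join(tmp, "original.d64")
    readback = os.path.join(tmp, "readback.d64")
    with open(original, "wb") as f:
        f.write(bytes(512))
    with open(readback, "wb") as f:
        f.write(bytes(300) + b"\x01" + bytes(211))
    print(first_difference(original, readback))
```

Reporting the offset rather than just pass/fail makes it easier to see which block of the image went wrong.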

498 tests with "-w -ts2" : 6 WatchDog triggers
410 tests with "-ts2" : no WatchDog triggers
323 tests with "-w -ts1" : 4 WatchDog triggers
322 tests with "-ts2" : no WatchDog triggers
184 tests with "-to" : no WatchDog triggers
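
For a rough feeling of how rare the hangups are, the trigger rate per group can be worked out from the numbers above. This is only a small calculation sketch; the rows are copied exactly as listed, including the option strings.

```python
# (options, number of tests, WatchDog triggers), copied from the list above
results = [
    ("-w -ts2", 498, 6),
    ("-ts2", 410, 0),
    ("-w -ts1", 323, 4),
    ("-ts2", 322, 0),
    ("-to", 184, 0),
]


def trigger_rate(tests, triggers):
    """Fraction of tests in a group that ended with a WatchDog trigger."""
    return triggers / tests


for options, tests, triggers in results:
    rate = trigger_rate(tests, triggers)
    print(f"{options:8} {triggers:2}/{tests} = {rate:.2%}")

total_tests = sum(tests for _, tests, _ in results)
total_triggers = sum(triggers for _, _, triggers in results)
print(f"total    {total_triggers}/{total_tests}")
```

Both warp groups come out at a little over 1% of the tests.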

Further results:
* WatchDog was only triggered when reading from
an external disk into a disk image; writing seems to be
stable (no hangups)
* WatchDog was only triggered _after_ the end of a
transfer, which means the turbo protocol shutdown seems
to fail under rare circumstances (as with the parallel
protocol)
* WatchDog triggers were only observed for warp
mode transfers

In test number 01594, running a "-to" transfer, a read
error was observed that could not be repaired. Therefore
the subsequent comparison failed. Maybe the preceding disk
image write already had a problem. Since verify
is not used in my tests, I can't say for sure.

The full logfile as well as an excerpt from it are at:
http://d81.de/shared/0100-plWM014_Testlog-20041203-001.zip


Womo



Date: 2004-11-30 21:43
Sender: nobody

Logged In: NO

With versions 0.0.12.5 and 0.1.0 using the parallel transfer
protocol, the problem could be reproduced again. A fix for it
was already discussed in personal mails.
The parallel protocol shutdown hangup, at least, can be
declared solved for cbm4win. I don't know
how it works out under cbm4linux.

Currently these shutdown problems cannot be reproduced with
any of the other turbo protocols (serial1 or serial2). I'll
do some more tests to find possible problems and perhaps a
workaround for them, but I think this bug item can be closed
until I really find a problem with the other protocols.

Womo



Date: 2004-11-26 07:36
Sender: nobody

Logged In: NO

I'm currently not able to reproduce this problem. With
versions 0.0.12.3 and 0.0.12.5 I ran some "real user task"
tests, that is, transferring a whole disk image rather than
only a single track of a disk. This means that I don't
have as many Turbo protocol startups and shutdowns within
the same test time as before, which could be another reason
why I'm not observing the problem at the moment. In the future
I'll repeat the special Turbo protocol startup/shutdown
tests to see whether the problem still exists.

Womo







Attached File ( 1 )

Filename                 Description
d64copy-w-tparallel.log  Logfile of the parallel warp test run

Changes ( 8 )

Field              Old Value                        Date              By
close_date         2004-12-07 20:01                 2005-05-16 15:00  strik
status_id          Open                             2005-05-16 15:00  strik
status_id          Closed                           2005-03-09 08:30  strik
artifact_group_id  v0.0.12                          2004-12-07 20:01  strik
close_date         -                                2004-12-07 20:01  strik
status_id          Open                             2004-12-07 20:01  strik
resolution_id      None                             2004-12-07 20:01  strik
File Added         109501: d64copy-w-tparallel.log  2004-11-20 17:07  nobody