#135 QNX demo disk crashes with new pit

maybe_some_day
closed
5
2012-10-15
2001-09-03
No

The QNX demo floppy can be downloaded for free at
http://www.qnx.com/iat/download/network.html

I'm using current CVS on 9/3/2001. With the new PIT
model, the QNX demo disk is reporting segmentation
faults and filesystem failures during the boot process.


QNX Demodisk Loader v1.5
[then a percent done going from 0% to 100%]
starting initial
../disk-COMMON/bin/ramdisk.demo terminated (SIGSEGV) at
10A1:0000BFD3.
ization
[a few seconds pause]
Timed out waiting for '/'.
[Then it continues to the first install screen. Press
space (or just wait) and it moves on to the second
screen, asking what kind of keyboard you have. I chose
F1 for North America.]

Progress Indicator[
Could not open '/image/image1.z', No such file or
directory.

Couldn't start '/bin/Dev -n8'.
Spawn of (No such file or directory).
File not present

Couldn't start '/bin/Net -S -d3 -n401'.
Spawn of (No such file or directory).
File not present
[then comes the screen which says that you don't have a
NIC. Press space or wait]

Progress indicator [ ]
Couldn't start '/bin/emu387'.
Spawn of (No such file or directory).
File not present

Couldn't start '/bin/Socklet -p1 qnxdemo'.
Spawn of (No such file or directory).
File not present

Couldn't start '/bin/phlib_s11'.

terminated (SIGSEGV) at 10A1:0000C006.

The only log message from the PIT code is:
00078501111e[ ] Unknown behavior when latching during 2-part read.
00078501111e[ ] This message will not be repeated.

If you instead do
configure --disable-new-pit; make
and boot the QNX demo disk, it works. There are no
SIGSEGV or "Could not open" messages, it loads the OS,
goes into graphics mode, etc. If you revert to the CVS
version from 2001-08-14 it also works.

And, strangest of all, if I compile the current CVS
with the new PIT and enable the debugger, IT ALSO
WORKS! This makes it hard to get an instruction trace
and pinpoint exactly where it fails.

I'll post more if I find any more clues.

Summary of what I've tried so far:
with current cvs:
configure --enable-new-pit: fails
configure --disable-new-pit: works
configure --enable-new-pit --enable-debugger: works
configure --disable-new-pit --enable-debugger: works

I tried them again with a cvs version from 8/14, with
the same results:
configure --enable-new-pit: fails
configure --disable-new-pit: works
configure --enable-new-pit --enable-debugger: works

Discussion

  • Greg Alexander

    Greg Alexander - 2001-09-04

    Logged In: YES
    user_id=125806

    Even stranger still, it works if ips=500000 but not if
    ips=1000000. You might check to make sure you aren't
    masking the ips problem when you enable the debugger. I
    have some guesses as to what this might be, but they're just
    guesses for now.

    The big difference between ips=500000 and ips=1000000 is
    that at ips=500000 we only call clock_all(2) instead of
    clock_all(1) because of a problem with the timing model of
    bochs. This could conceivably mask an interrupt caused by
    a data/control word write, which could explain the problem.

    Try running the old pit with TIMER_DELTA in devices.cc set
    to 1 (warning: this will be hideously slow, but is helpful
    as a test.)
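
    The ips arithmetic above can be sketched roughly as follows. This is
    only an illustration, not Bochs code: it assumes the PIT model is
    treated as ticking at 1 MHz (a real 8254 runs at about 1.193182 MHz)
    and that the argument to clock_all() is the number of PIT ticks to
    advance per instruction. PIT_HZ and ticks_per_instruction are
    hypothetical names.

```python
from fractions import Fraction

# Assumption: the emulated PIT is modeled as ticking at 1 MHz.
PIT_HZ = 1_000_000


def ticks_per_instruction(ips: int) -> Fraction:
    """PIT ticks that elapse per emulated instruction at a given ips."""
    if ips <= 0:
        raise ValueError("ips must be positive")
    return Fraction(PIT_HZ, ips)


# Under this assumption, ips=500000 advances the timer 2 ticks per
# instruction and ips=1000000 advances it 1 tick per instruction.
for ips in (500_000, 1_000_000):
    print(f"ips={ips}: {ticks_per_instruction(ips)} tick(s) per instruction")
```

    With more than one tick per instruction, the timer can pass through
    a state without the CPU ever observing it, which is the kind of
    masking described above.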

     
  • Greg Alexander

    Greg Alexander - 2001-09-04

    Logged In: YES
    user_id=125806

    Okay, I think I sort of understand the problem, and it's a
    QNX "feature," not a bochs problem.

    It goes a little something like this:

    With a low ips, certain interrupts can be masked off.
    Specifically, those resulting from a control word write
    might get masked. This is a bochs bug (fundamentally a
    problem with the timing model), but is actually helping
    here.

    With ips=1000000 you get 1 instruction per timer tick, and
    the timer is an accurate model. However, this doesn't
    accurately represent any real x86 processor, which brings us
    to:

    With ips=4000000 you model the slowest x86 processor ever
    built, the 4MHz 8088 (which has somehow mysteriously had
    Pentium architecture changes added :) .) This allows you
    4 instructions per timer tick, and fixes the current problem
    by allowing a write to the control word followed by writes
    of the initial count before the timer gets to register an
    interrupt. (And thus QNX works.)

    There may be some minor fixes I can do to make the 1MHz
    behavior a little cleaner, but fundamentally the problem is
    that there is no 1MHz x86, so there may be code that won't
    run on an imaginary one.

    There also may be a PIC problem exposed here, but that's
    not my area of expertise.
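
    A toy calculation of the hypothesis in this comment (not confirmed
    behavior, and not Bochs code): it assumes a 1 MHz PIT tick, one port
    write per instruction with the three writes (control word, then the
    two initial count bytes) issued back to back, and the first write
    landing exactly on a tick boundary. ticks_during_programming is a
    hypothetical helper.

```python
def ticks_during_programming(ips: int, writes: int = 3,
                             pit_hz: int = 1_000_000) -> int:
    """Whole PIT ticks that elapse between the first and last write.

    Assumptions (for illustration only): one write per instruction,
    writes are consecutive, and the first write is tick-aligned.
    """
    if ips <= 0 or writes < 1:
        raise ValueError("ips must be positive and writes at least 1")
    instructions_between = writes - 1
    return (instructions_between * pit_hz) // ips


for ips in (500_000, 1_000_000, 4_000_000):
    print(f"ips={ips}: {ticks_during_programming(ips)} tick(s) elapse")
```

    Under these assumptions the timer ticks in the middle of the
    programming sequence at ips=1000000, while at ips=4000000 all three
    writes complete before the next tick.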

     
  • Greg Alexander

    Greg Alexander - 2001-09-04

    Logged In: YES
    user_id=125806

    This gets stranger by the minute.

    I found a bug in the latching code, but that's not the
    problem you're seeing here. At both 500KHz and 1MHz there
    are 2 writes to the PIT, but at 4MHz there are 3 writes.
    This makes no sense at all, and doesn't explain why 500KHz
    is working but 1MHz isn't (so far as I can tell both should
    be broken.) I sure wish I knew what was causing that
    SIGSEGV.

    Oh, and the masked interrupt problem I was talking about
    exists only at ips values less than 500000.

    I have a feeling this is a PIC problem (or a PIC model
    problem) due to the
    [PIC ] IRQ lowest command 0xc2
    line, but I'm not sure.

     
  • Bryce Denney

    Bryce Denney - 2001-09-04

    Logged In: YES
    user_id=185114

    If I can find some combination of settings that causes the
    problem with the debugger enabled, I can get instruction
    traces at the point that the simulations diverge.

    That assumes that the old PIT and new PIT actually put up
    their interrupt at exactly the same time. If not, the
    instruction traces will diverge as soon as the first PIT
    interrupt occurs.

     
  • Bryce Denney

    Bryce Denney - 2001-09-28

    Logged In: YES
    user_id=185114

    Now that bug #465262 (timed events not consistent) is fixed,
    I hope that it will be possible to debug this. Before,
    there were some significant differences in how the
    simulation time got updated when the debugger was on or off,
    and even if tracing was on or off.

     
  • Bryce Denney

    Bryce Denney - 2001-12-06

    Logged In: YES
    user_id=185114

    Postpone until a later version.

     
  • Anonymous

    Anonymous - 2002-04-11

    Logged In: YES
    user_id=93674

    What's the status on this bug? Is it still present in 1.4?

     
  • Greg Alexander

    Greg Alexander - 2002-04-11

    Logged In: YES
    user_id=125806

    This bug still exists. I don't think it's a PIT problem,
    though I could be mistaken. The problem is that bochs
    naturally runs at unreasonably slow clock speeds in relation
    to modern x86 processors, so there isn't a clear fix. I
    believe the same problem exists on linux 2.?0? series
    kernels.

     
  • Volker Ruppert

    Volker Ruppert - 2006-10-17

    Logged In: YES
    user_id=376477

    This problem seems to be fixed in the current version of
    Bochs. In older releases (2.0.2, 2.1.1) QNX crashed at low
    ips values (e.g. 1000000), but now it works here, even at
    low values.

     
