2006-12-15 06:42:53 PST
"It kills the program and presents the user with an ugly error message. Software protection gives the program a chance to recover and when the code can't recover it can usually terminate properly with an informative error message."
The error in the program does not have to present the user with an ugly error message. Well-formed applications, especially at the operating system level, could incorporate higher-level concepts, such as exception throwing. As your code executes, it could call an OS API that registers where the exception handler is (a call the compiler inserts automatically to handle scoping). When anything errant happens, the appropriate exception is thrown and handled politely.
The advantage of using this model is that the code and data are assumed to be correct, so nothing has to be double-checked. The application handles an error only when one actually occurs at a given point in time, and only then does it take the extra steps to handle it.
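As a rough sketch of that model in portable C, the standard setjmp/longjmp pair can stand in for the handler registration that the compiler and OS would normally take care of. The names current_handler, throw_error, divide and safe_divide are made up for this illustration, and the explicit zero check stands in for the trap the CPU itself would raise.

```c
#include <setjmp.h>
#include <stddef.h>
#include <stdlib.h>

enum { ERR_NONE = 0, ERR_DIVIDE_BY_ZERO = 1 };

/* Hypothetical stand-in for the handler location the OS would be told about. */
static jmp_buf *current_handler = NULL;

/* "Throw": unwind to the most recently registered handler. */
static void throw_error(int code)
{
    if (current_handler == NULL)
        abort();                      /* nobody registered a handler */
    longjmp(*current_handler, code);
}

/* Normal path: assumes its inputs are good. The check below stands in
   for the divide fault that real hardware would raise on its own. */
static int divide(int a, int b)
{
    if (b == 0)
        throw_error(ERR_DIVIDE_BY_ZERO);
    return a / b;
}

/* Registers a handler for this scope, runs the work, and restores the
   previous handler. Returns ERR_NONE on success or the error code. */
int safe_divide(int a, int b, int *result)
{
    jmp_buf handler;
    jmp_buf *previous = current_handler;
    int status;

    current_handler = &handler;
    if (setjmp(handler) == 0) {
        *result = divide(a, b);       /* no extra steps when nothing goes wrong */
        status = ERR_NONE;
    } else {
        status = ERR_DIVIDE_BY_ZERO;  /* reached only after a throw */
    }
    current_handler = previous;
    return status;
}
```

Calling safe_divide(10, 2, &r) stores 5 in r and returns ERR_NONE; calling safe_divide(1, 0, &r) leaves r alone and returns ERR_DIVIDE_BY_ZERO. The error-handling branch costs nothing until an error actually occurs.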
"So loading a new page descriptor on a context switch is free?"
It is not free, but when an error occurs it is not slow, relatively speaking, as the entire context switch is handled in hundreds of clock cycles. On a CPU operating at 3.0 GHz, for example, each cycle takes about a third of a nanosecond, so that equates to a minuscule amount of time.
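To put a number on "minuscule", here is the arithmetic as a small C sketch. The function names are made up for this illustration, and the cycle count you feed in is only a ballpark figure, not a measurement.

```c
#include <stdint.h>

/* Converts a cycle count to nanoseconds for a clock given in GHz.
   1 GHz is one cycle per nanosecond, so ns = cycles / GHz. */
double cycles_to_ns(uint64_t cycles, double clock_ghz)
{
    return (double)cycles / clock_ghz;
}

/* How many events of that cost fit into one second (1e9 ns). */
double events_per_second(uint64_t cycles_per_event, double clock_ghz)
{
    return 1e9 / cycles_to_ns(cycles_per_event, clock_ghz);
}
```

With an illustrative 300 cycles at 3.0 GHz, cycles_to_ns gives 100 ns, and events_per_second gives 10 million such switches per second.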
"If transitioning from user space to kernel space is free then why do most operating systems move graphics, networking and other speed critical drivers into kernel space?"
Because they have freer rein over the system at PL0. Plus, that code is core, trusted code and should be executed at PL0 in a well-designed system.
"If passing data between processes is free then why do they use copy-on-write and other hacks when passing data?"
Because most OSes cannot know in advance the type(s) of software that will be executing on them. As such, they code for the worst case. The copying idea is designed to leave the errant code in a pristine state, so that there is no chance of corruption.
However, since you are the author of your OS, you could include code which would not require a duplication of the data space, but rather a separate LDT or GDT selector which gets mapped into that space for the duration of the debugging session.
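For a concrete picture of what such an alias involves, here is a sketch in C that packs a protected-mode segment descriptor and builds a selector for it. The bit layout is the standard x86 descriptor format; the function names are made up for this illustration, and actually installing the entry in the GDT or LDT is not shown.

```c
#include <stdint.h>

/* Packs base, 20-bit limit, access byte and 4-bit flags into the
   8-byte x86 protected-mode segment descriptor layout. */
uint64_t make_descriptor(uint32_t base, uint32_t limit,
                         uint8_t access, uint8_t flags)
{
    uint64_t d = 0;

    d |= (uint64_t)(limit & 0xFFFFu);              /* limit 15:0   -> bits 0-15  */
    d |= (uint64_t)(base & 0xFFFFu) << 16;         /* base 15:0    -> bits 16-31 */
    d |= (uint64_t)((base >> 16) & 0xFFu) << 32;   /* base 23:16   -> bits 32-39 */
    d |= (uint64_t)access << 40;                   /* P, DPL, S, type            */
    d |= (uint64_t)((limit >> 16) & 0xFu) << 48;   /* limit 19:16  -> bits 48-51 */
    d |= (uint64_t)(flags & 0xFu) << 52;           /* G, D/B, L, AVL             */
    d |= (uint64_t)((base >> 24) & 0xFFu) << 56;   /* base 31:24   -> bits 56-63 */
    return d;
}

/* Builds a selector: table index, TI bit (0 = GDT, 1 = LDT), and RPL. */
uint16_t make_selector(uint16_t index, unsigned use_ldt, unsigned rpl)
{
    return (uint16_t)((index << 3) | ((use_ldt & 1u) << 2) | (rpl & 3u));
}
```

A debugger alias would reuse the base and limit of the debuggee's data segment with its own access byte (0x92 is a present, PL0, read/write data segment), so both views cover the same memory and nothing is copied. As a sanity check, make_descriptor(0, 0xFFFFF, 0x92, 0xC) yields the familiar flat data descriptor 0x00CF92000000FFFF.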
"Hardware protection creates barriers that make it more difficult to write modular systems."
I disagree with that statement because hardware protection mechanisms can be created which do not create barriers. However, their use in that regard is not widely found in most OSes for whatever reason(s) (which I do not understand, because the core architecture itself, x86, can handle it).
"Moving code into the kernel to make it faster also adds security and instability problems because the hardware protection does not work when kernel code crashes."
Hardware protection still works when kernel code crashes. You can have multiple PL0 apps which are stalled in debugging. It all depends on how you set up your task management and how you isolate critical code. I have actually used my own debugger to debug problems with my debugger code. :) I had to set up some special traps and features to enable/disable blocks of code where I knew there was a problem, but it works. The hardware provides everything you need, and does so without the obfuscation layer imposed by most mainstream OSes.
If you're interested in learning how it can be done, I would be happy to show you. Note also that I'm not debating you here. I just have some expertise in this area, having spent so many years developing my OS and learning that the methodologies used by other mainstream OSes are inconsistent with the fundamental principles and philosophies inherent in the raw x86 architecture. Their software-enforced protection mechanisms operate to such an extent that they obfuscate the underlying hardware mechanisms, the very ones which make it easy to deal with all sorts of traps and everything else.
I'll await your reply (in email, hopefully).
- Rick