Scott Franco - 2019-04-05

Version 0.4 of the pascaline.docx language standard document sets in cement three rules I have considered since before the turn of the century with IP Pascal. Two of them are new rules. The other is a rule that goes back to the 1995 compiler, one that I have tried to get rid of several times, failed, and have now decided was a good idea all along.

The three rules have a profound impact on how Pascaline works. In fact, these are fundamental principles discussed in other languages outside of the Pascal family (as well as in other compilers within it, for example FPC).

The rules are:

  1. "uses" specifications merge absolutely. Any used module has its global namespace merged with the using module, and any duplicates are flagged as errors. Another way to say that is that uses makes the user and the used module behave as if they were compiled together in the same source file.
  2. Used modules don't import their used modules as well. I.e., if module A uses module B, which uses C, A does not get merged with C as well. Module A must specifically use C if it needs definitions in C.
  3. There cannot be loops in uses or joins sequences. If module A uses or joins B, which uses or joins A, that's an error. If A uses/joins B, which uses/joins C, ad infinitum, and a module X further down the chain references any of the modules before it, it's an error.
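
As a rough illustration of rules 1 and 2, here is a minimal sketch in Python. It is only a model, not Pascaline syntax or the compiler's implementation: the MODULES table, its module and symbol names, and the visible_names helper are all hypothetical. Each module merges the names of the modules it uses directly, any duplicate is a fatal error, and nothing is pulled in from the modules that a used module itself uses.

```python
# Hypothetical model: each module has its own global names plus the list of
# modules it names in "uses". None of this is real Pascaline syntax.
MODULES = {
    "A": {"names": {"run"}, "uses": ["B"]},
    "B": {"names": {"helper", "buf"}, "uses": ["C"]},
    "C": {"names": {"ctype"}, "uses": []},
}


def visible_names(module, modules=MODULES):
    """Return the namespace of `module` after merging its direct uses."""
    merged = set(modules[module]["names"])
    for used in modules[module]["uses"]:       # rule 2: direct uses only
        clash = merged & modules[used]["names"]
        if clash:                              # rule 1: duplicates are errors
            raise NameError(
                f"{module}: duplicate names from {used}: {sorted(clash)}"
            )
        merged |= modules[used]["names"]
    return merged


print(sorted(visible_names("A")))  # ['buf', 'helper', 'run']
```

In this model A sees the names of B but not "ctype" from C. To get it, A would have to list C in its own uses, which is exactly what rule 2 asks for.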

OK, so why is that a big deal? #1 is a rule that has existed since 1995 with IP Pascal, and exists in other languages as well, for example UCSD, Borland, as well as languages outside Pascal. It's the simplest way to treat modularity. It also creates a lot of difficulty with managing namespaces. If you use a module, you get its names. All of them. And then you can't use those names yourself. This can get really annoying when you start getting a lot of different modules that use each other. Not only are the names in conflict, but if someone changes a module you use, any names added there can cause your module to fail.

There have been a lot of "solutions" to this problem over the years. ISO 10206 proposes the ability to select which names get imported from a particular module. This is seen in other languages as well. There is even the ability to "rename" imports to avoid conflicts. These solutions work, but, IMHO, they fix one mess by creating a bigger mess. Trying to track down what program objects are used where by name gets harder and harder, and the ability to rename imports makes it far worse.

The "right" solution to this issue is the joins statement and qualidents. With joins, if you want objects from an external module, but you don't feel like bringing that entire module into your own intimate namespace, you don't have to. You join the module and then use qualidents to reference it. It's a little more work (you need to qualident all external references), but it clarifies rather than obfuscates the code, and it solves the problem.

mymodule.dothis

Makes it clear what you are doing and where you are doing it, that is, procedure "dothis" from module "mymodule". In fact, everyone should use joins. So why does the uses statement still exist?

Pretty much this comes down to the strings module and other modules of that kind, like services. There are modules in the Pascaline set that are just essential to getting anything done, and you don't want to have to qualident them constantly. They are stable (i.e., they don't add new names often) and worth keeping their names reserved. Plus, if you don't like having them muck up your namespace, you can join them instead of using them.

Over the years I have considered other solutions to the issues with uses. For example, why should definitions from used modules have more importance than system layer definitions? You can overlay a system definition like ord() or integer, but you can't overlay a uses name. It's an error. I considered several times having used module namespaces exist in a scoping layer below the using module, which would have made them possible to overlay. I guess the best rationale for why this should NOT be so is that you specifically used, and therefore intended to include, names from that module. System definitions don't have that property. For example, the Pascaline definition of the "string" type is obviously Pascaline-specific, and ISO 7185 programs don't know about it. [1]

In short, the decision with Pascaline is that joins is the way to handle naming conflicts, uses still "has a use", and that simpler is better in this case. The old rule stands.

For #2, this is a new rule with Pascaline/P6. IP Pascal didn't work that way. If you had a uses chain of A uses B uses C, etc., you got everything: A, B and C all merged into your program deck. This created a longstanding issue with creating complex system support module decks. It's a problem actually shared by languages like C, which only differentiates between local (static) definitions and global ones, and merges its entire global deck. It's a problem that C++ created namespaces for. Joins of course solves that as well, but this improvement to Pascaline has been a long time coming.

Having uses only work one level deep from the point of view of the using module does create issues. If module A uses module B, and module A needs to know about, say, some type definitions in module C that module B used in its parameter lists, module A needs to use module C as well. Rule #2 does not really change the way that works, but it requires module A to specifically reference module C in the uses list. It works that way for joins as well.

This is not just a Pascaline issue. In C, with a module structure driven entirely by includes, it's considered "bad form" to let an included module include other modules for you. It's a rule enforced by several code checkers.

Rule #3 is perhaps the hardest one to swallow, and it finally made it into the latest round of changes. The reason it is hard is that IP Pascal and other programs I have had for decades in fact rely on circular references. The easiest case to see why is deeply nested error recovery. If program module A references utility module B, and module B encounters a fatal error, how does it bail out to module A? ISO 7185 Pascal says "use a goto" (indeed, that's why Pascal had gotos, for such deeply nested bailouts, and why UCSD not implementing interprocedural gotos was a mistake). However, using goto to get back to module A from module B is >>> impossible <<<. Trust me, it is. You need stack context to make interprocedural gotos work, and you just don't have that across separately compiled modules.

I had exactly that discussion with Tony Hethrington of Prospero when I was evaluating their ISO 10206 compiler. He acknowledged it was a problem, but he believed it should be solved with structured exceptions, which neither ISO 10206 nor IP Pascal implemented at the time. Years later, I have to admit he was completely correct. Exception handlers create their own context recovery at runtime and thus completely solve the issue.

This is not the only issue with a lack of circular references, however. Both the IP Pascal compiler and its companion assembler have this issue. Both being modular for adaptation to different processors, they are divided into machine dependent parts and machine independent parts, and they call each other. The machine dependent part typically has the root procedure to run the program, and calls back and forth to the machine independent part to get work done that is common to all processors. Further, the IP compiler has data structures with machine dependent fields like registers and other items that are specific to that machine. These things are handled with circular references, albeit carefully constructed ones.

The real bottom line is that ISO 7185 Pascal was never designed to scale to large multimodule programs. Pascaline fixes that. Thus the rule can finally be put into Pascaline: no loops.

To be fair, many languages mention that inclusion loops are at least a bad idea. When I read such things in language descriptions, and they don't absolutely forbid it, I smile. That tends to indicate they found out you have to use it on occasion. It's also not straightforward to detect such a loop. You can't just forbid reading the same used/joined file twice. A module A that uses/joins B and C, where C also uses/joins B, is a valid program. The right way to handle rereading modules is to just not do it. You read B already, you have its definitions. You are done.
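
The difference between a loop and a harmless repeat reference can be sketched as follows. This is a minimal Python model under assumptions of my own, not the way any Pascaline compiler is known to do it: check_no_loops is a hypothetical name, and the dependency table simply maps each module to the modules it uses or joins. A module that has been fully read is skipped on any later reference, while a module that is still being read when it is referenced again marks a loop.

```python
def check_no_loops(deps):
    """Raise ValueError if the uses/joins graph in `deps` contains a loop.

    `deps` maps a module name to the list of modules it uses or joins.
    Returns the sorted list of modules read when there is no loop.
    """
    done = set()        # modules fully read; never read again
    in_progress = []    # the current uses/joins chain, outermost first

    def read(module):
        if module in done:
            return      # already have its definitions, nothing to do
        if module in in_progress:
            loop = in_progress[in_progress.index(module):] + [module]
            raise ValueError("loop: " + " -> ".join(loop))
        in_progress.append(module)
        for dep in deps.get(module, []):
            read(dep)
        in_progress.pop()
        done.add(module)

    for module in deps:
        read(module)
    return sorted(done)


# A uses/joins B and C, and C also uses/joins B: valid, B is read once.
print(check_no_loops({"A": ["B", "C"], "B": [], "C": ["B"]}))  # ['A', 'B', 'C']
```

Calling check_no_loops({"A": ["B"], "B": ["A"]}) raises ValueError with the message "loop: A -> B -> A", because A is still in progress when B refers back to it.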

[1] The idea of nesting the scope for used namespaces has come up several times over the years. The fundamental problem with it is that with a complex series of uses, it creates as big a mess as import qualifications. Each module can have its own set of names with different meanings as used modules overlay each other's namespaces. Far worse, since each module determines its own uses order, the mess could be different for each module as the namespaces overlay in different order! This is insanity. Complex namespace management is better left to joins and qualidents. Uses does what it should, which is announce conflicts immediately and fatally.
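
The order dependence described in [1] can be shown with a small Python model of the rejected scheme. Everything here is hypothetical: the overlay helper, the module names B and C, and the assumption that a later used module silently wins over an earlier one.

```python
def overlay(uses_order, exports):
    """Resolve names when later used modules overlay earlier ones.

    Models the rejected nested-scope scheme, not what Pascaline does.
    """
    scope = {}
    for module in uses_order:
        for name in exports[module]:
            scope[name] = module    # the later module silently wins
    return scope


exports = {"B": ["open", "close"], "C": ["open"]}

# Two modules that list the same uses in a different order disagree
# about what "open" means, and neither gets an error.
print(overlay(["B", "C"], exports)["open"])  # C
print(overlay(["C", "B"], exports)["open"])  # B
```

Under the rule that was actually adopted, both orders would instead report the duplicate "open" as an error.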

 

Last edit: Scott Franco 2022-11-05