Pointers on Pointers

2017-04-11
  • Yves Cloutier

    Yves Cloutier - 2017-04-11

    One of my interests is...programming languages and how they are implemented.

    Most books on compiler construction rely on pointers to memory or pointers to data structures.

    In the absence of pointers proper in Unicon and Icon, I'm wondering what strategy one would use when attempting to build a compiler in either language?

    In addition, I know there is an Icon version of flex and yacc, but if one were to do lexical analysis and parsing from scratch, what would be the best practices in a goal-oriented language? Do the constructs available in Icon/Unicon allow new and innovative ways of doing things that other languages can't offer?

    There are many resources on how to implement scanning and parsing for C-like languages, for Pascal (and variants like Oberon), and even on how to do it using functional languages like Haskell, ML, F#, OCaml, etc.

    I'd be interested in how an expression-based, goal-oriented language might offer new solutions and bring fresh innovations to this domain.

    Would love to hear your thoughts.

    yc

     
    • Bruce Rennie

      Bruce Rennie - 2017-04-11

      Good morning Yves,

      To start with, please look at the Implementation Book in the Unicon
      distribution. Clinton and Don have put a great deal of work into getting
      it up to date. It is a great resource for starting in on Unicon/Icon for
      both the optimising compiler and the interpreter.

      You have two distinct things to look at with regard to Unicon/Icon. The
      first is the language itself. The second is the virtual machine.

      You have stated an interest in creating a compiler for some language
      using Unicon/Icon. You could start with the Unicon compiler itself (in
      the source distribution) as well as the Parser classes (in the source
      distribution). You can also look at ibpag (lexer/parser in the IPL
      source distribution). Alternatively, you can download the cocori source
      (CoCo/R rewritten in Icon).

      At some point, I can discuss with you my attempts at PEG/LR/LL parser
      generators written in Unicon/Icon. I have a number of in-progress
      projects using each of these techniques to write lexer/parser generators
      in Unicon/Icon.

      regards

      Bruce Rennie

       
    • Clinton Jeffery

      Clinton Jeffery - 2017-04-12

      Hi Yves,

      I think you have to distinguish between pointers used in the implementation language (Unicon, like Java, has no explicit pointer type, but has various forms of reference semantics) and pointers used in the execution model of the target language for which the compiler must generate code. The target generated code in most compilers must have a pointer model because generated code is usually lower level and closer to the machine, and that is independent of whether the implementation language has a pointer data type. So before I would try to answer your question, I would ask: what compiler construct (if you mean pointers in the implementation language) or what target language machine construct (if you mean pointers in the generated code) are you wanting suggestions on?

      Lexical analysis from scratch in Icon or Unicon would presumably make good use of string scanning and/or the new SNOBOL pattern matching facilities. The Unicon lexical analyzer was written from scratch and is a giant string scan; you can see it in unicon/uni/unicon/unilex.icn. Best practices minimize I/O subsystem interaction, and to be honest, you usually don't need to do a lot of backtracking in lexical analysis.
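
      To make the string-scanning idea concrete, here is a minimal sketch of a toy tokenizer. It is an illustration only: the procedure name tokens and the token labels INT, ID and OP are made up for this example and say nothing about how unilex.icn is actually organized. Note that it never needs to backtrack; each branch either matches at the current position or fails and lets the next one try.

      ```icon
      # Toy tokenizer using string scanning. The token labels (INT, ID, OP)
      # are hypothetical and do not reflect how unilex.icn is organized.
      procedure tokens(s)
         local toks
         toks := []
         s ? {
            while not pos(0) do {
               tab(many(' \t'))        # skip blanks; a failed match here is harmless
               if pos(0) then break    # only trailing blanks were left
               if put(toks, "INT:" || tab(many(&digits))) then next
               if put(toks, "ID:" || tab(many(&letters))) then next
               put(toks, "OP:" || move(1))   # anything else: one-character operator
            }
         }
         return toks
      end

      procedure main()
         # Writes ID:count, OP:+, INT:42, OP:*, ID:x, one per line.
         every write(!tokens("count + 42 * x"))
      end
      ```

      The whole input is scanned in memory after it has been read, which fits the advice above about keeping I/O interaction to a minimum.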

       
  • Yves Cloutier

    Yves Cloutier - 2017-04-12

    @Bruce: Your experiments with parsers seem interesting! Please do share once you feel satisfied with your work.

    @Clint: Yes, I think you are right: there is a difference between the need for pointers in the implementation language and in the target one. In fact, I was re-reading the "Core Unicon" chapter of your book last night, which clearly explains how different datatypes are passed to and manipulated by procedures.

    I think I was brainwashed by C, C++ and D, where pointers are typically used in creating trees and AST nodes. But essentially, if Unicon records are passed to procedures by reference, then there is no need to have explicit pointers in the implementation language.
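
    A minimal sketch of what that looks like, assuming a toy expression language: the record names Num and BinOp and the procedures eval and negate are invented for this example and are not taken from the Unicon sources. Tree nodes are plain records, child links are ordinary fields, and a procedure that modifies a node changes the caller's tree.

    ```icon
    # Hypothetical AST node types for a toy expression language.
    record Num(value)
    record BinOp(op, left, right)

    # Evaluate a tree by following record fields; no explicit pointers needed.
    procedure eval(node)
       local l, r
       case type(node) of {
          "Num":   return node.value
          "BinOp": {
             l := eval(node.left)
             r := eval(node.right)
             return case node.op of {
                "+": l + r
                "*": l * r
             }
          }
       }
    end

    # Modifies the record it receives; the caller sees the change because
    # the record is shared by reference rather than copied.
    procedure negate(node)
       node.value := -node.value
    end

    procedure main()
       local tree
       tree := BinOp("+", Num(2), BinOp("*", Num(3), Num(4)))
       write(eval(tree))     # writes 14
       negate(tree.left)     # changes the Num(2) node in place
       write(eval(tree))     # writes 10
    end
    ```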

    I think in some cases you become so used to doing a lot of dirty work in other languages that you think to yourself "ok, how do I do this in Unicon", when actually everything you need is already baked into the language. It will take some time to "switch gears" and think at a much higher level than what you're used to in certain situations.

    Thanks for the pointer to unilex. I'll be sure to have a look at that.

     
