xml-dev messages on XSDComp

  • Curt Arnold

    Curt Arnold - 2000-04-22

    "Arnold, Curt" wrote:
    > You are invited to join a project to an open-source XSLT-based
    > compiler for XML Schema.

    I welcome Curt's initiative. From his careful comments on the XML
    drafts, he seems a person with the technical awareness to keep
    something like this on track.

    And the news on the RELAX mailing list is also very heartening.

    On the issue of XML Schemas being compiled into Schematron, I am
    interested in knowing if anyone can come up with content models that
    could not be validated by a Schematron schema automatically generated
    from an XML Schema so that, for every unique particle in an element
    complex type:
        1) an assertion statement is created for all the allowed successors of
    that particle within that parent
        2) an assertion statement is created for all the allowed predecessors
    of that particle within that parent
        3) an assertion statement is created giving the effective minOccurs of
    that element within the whole type
        4) an assertion statement is created giving the effective maxOccurs of
    that element within the whole type.
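The four rules above can be sketched in a few lines. This is only an illustration under stated assumptions, not the XSDComp design: the function name `derive_rules`, the `END` marker, and the representation of a content model as a flat list of `(name, minOccurs, maxOccurs)` tuples are all hypothetical, and the sketch handles only flat sequences with bounded maxOccurs (no choice groups, no nesting, no "unbounded").

```python
from collections import defaultdict

END = None  # hypothetical marker meaning "no sibling on this side"


def derive_rules(particles):
    """Derive neighbour sets and effective occurrence bounds per element name.

    particles: list of (name, min_occurs, max_occurs) in sequence order.
    Returns (successors, predecessors, bounds), each keyed by element name.
    """
    successors = defaultdict(set)
    predecessors = defaultdict(set)
    bounds = defaultdict(lambda: [0, 0])

    def reachable(start, step):
        # Names that may sit immediately beside a particle, scanning from
        # index `start` in direction `step` and skipping past optional
        # particles (min_occurs == 0).
        names = set()
        i = start
        while 0 <= i < len(particles):
            name, lo, _ = particles[i]
            names.add(name)
            if lo > 0:
                return names
            i += step
        names.add(END)  # ran off the edge: the particle may be first/last
        return names

    for i, (name, lo, hi) in enumerate(particles):
        if hi > 1:
            # A repeatable particle can be its own neighbour.
            successors[name].add(name)
            predecessors[name].add(name)
        successors[name] |= reachable(i + 1, +1)    # rule 1
        predecessors[name] |= reachable(i - 1, -1)  # rule 2
        bounds[name][0] += lo                       # rule 3
        bounds[name][1] += hi                       # rule 4

    return (dict(successors), dict(predecessors),
            {n: tuple(b) for n, b in bounds.items()})
```

Note that the results are merged by element name, so two particles with the same name in different positions share one set of assertions.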

    These simply derivable rules seem to validate most of the constraints
    of most content models. But I can see a family of cases where these don't
    capture everything: that is the case of particles repeated in different
    contexts.  For example, a content model like
        ( a{3-4}, b, a{5-6} )
    would by the above translation rules have the assertions
    <rule context="x/a">
      <assert test="following-sibling::b or following-sibling::a or ..."
      >Allowed successors...</assert>
      <assert test="preceding-sibling::b or preceding-sibling::a or position()=1"
      >Allowed predecessors...</assert>
    </rule>
    <rule context="x/b">
      <assert test="..."
      >Allowed successors...</assert>
      <assert test="..."
      >Allowed predecessors...</assert>
    </rule>
    <rule context="x">
      <assert test="count(a) &gt; 7 and count(a) &lt; 11"
      >(max and min on a)</assert>
      <assert test="count(b) = 1"
      >(max and min on b)</assert>
    </rule>
    but that corresponds to a slightly weaker content model:
       ( a,  (( b, a{7-10})
         | ( a,  (( b, a{6-9})
          | ( a,  (( b, a{5-8})
           | ( a,  (( b, a{4-7})
            | ( a,  (( b, a{3-6})
              | ( a,  (( b, a{2-5})
                | ( a,   b, a{1-4})
       ))))))))))))
    that is:
        ( a{1-7}, b, a{1-9} ) where a>7 and a<11
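The gap between the original model and the assertion-based one can be checked by brute force. The following is a rough sketch, not the actual Schematron semantics: it assumes each successor/predecessor test is meant to constrain the immediate sibling, and it assumes that b must be both preceded and followed by a (the tests for x/b are missing from the message as archived). The names `passes_assertions` and `false_accepts` are hypothetical.

```python
import re
from itertools import product

# The original content model ( a{3-4}, b, a{5-6} ), with the children of x
# written as a string of one-letter element names.
GRAMMAR = re.compile(r"a{3,4}ba{5,6}")

# Assumed allowed immediate neighbours per element name; None = no sibling.
SUCCESSORS = {"a": {"a", "b", None}, "b": {"a"}}
PREDECESSORS = {"a": {"a", "b", None}, "b": {"a"}}
# Effective occurrence bounds within x: count(a) in 8..10, count(b) = 1.
COUNT_BOUNDS = {"a": (8, 10), "b": (1, 1)}


def passes_assertions(children):
    """True if a child sequence satisfies all assumed assertions."""
    for i, name in enumerate(children):
        nxt = children[i + 1] if i + 1 < len(children) else None
        prev = children[i - 1] if i > 0 else None
        if nxt not in SUCCESSORS[name] or prev not in PREDECESSORS[name]:
            return False
    return all(lo <= children.count(n) <= hi
               for n, (lo, hi) in COUNT_BOUNDS.items())


def false_accepts(max_len=11):
    """Sequences the assertions accept but the original grammar rejects."""
    found = []
    for length in range(1, max_len + 1):
        for combo in product("ab", repeat=length):
            s = "".join(combo)
            if passes_assertions(s) and not GRAMMAR.fullmatch(s):
                found.append(s)
    return found
```

For example, under these assumptions `passes_assertions("abaaaaaaa")` holds (one a, then b, then seven a), although the original model requires at least three a before b.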

    If we can find any convenient way to represent these kinds of grouping
    constraints (and other similar ones) then it is possible that the
    approach based on assertions on two-step path models is more powerful
    than grammars (for modeling constraints, which is only one of the things
    that a schema language can be for: a schema language can also allow
    naming of structures present according to some analytical paradigm such
    as "type" or "pattern"). (Of course, if allowed an infinite number of
    subcontexts within an assert, that would give us a better purchase (in
    the mountaineering sense) but I am trying to resist that if possible.)

    If anyone has any ideas or inspiration on this (especially formal
    approaches) then please feel free to email me or to continue this
    discussion on Curt's mail list, XSDComp (where it seems to fit in).

    Rick Jelliffe

    • Curt Arnold

      Curt Arnold - 2000-04-22

      Thanks for the vote of confidence.  I'm not sure how
      all the pieces will fit together, but I've always
      liked puzzles.

      The first series of transforms leads from the XML
      Schema language to what I'll call a package of Schema
      "bytecode" (don't worry it is still XML).  The
      "bytecode" will be roughly approximate to the objects
      that a validating parser would build out when
      processing a schema prior to validating a document.
      (I have some vague ideas of what it is going to look
      like).  The "bytecode" will have every piece of
      information relevant to document validation that was
      in the original schema document.

      What you do with this "bytecode" is then fairly open.

      It could be embedded in a document or referenced as a
      resource to be used in the validation of the document.

      One product from the bytecode could be the source for
      an application-specific validator object for the parser
      of your choice in the language of your choice.  For
      example, a purchase order validator in Java for

      The generated code could be a lossless interpretation
      of the schema constraints.  However, you could also
      generate a DTD from the "bytecode" which would, of
      course, lose those features that can't enforce in a

      In the same light, if schematron or XSLT in general is
      not capable of enforcing some constraint, it could
      still be useful to generate schematron or XSLT code
      for what it can enforce.

      My initial impression is that RELAX could be converted
      into "bytecode" without loss and "bytecode" could be
      converted into RELAX but constraints could be lost.

      Given the recent performance of xml-dev and the
      possibly small set of subscribers that would be
      interested in an in-depth discussion of these issues, I
      think that having them on xml-dev is not optimal.

      I'd suggest that anyone interested in further
      discussion of this visit the XSDComp Open
      Discussion Forum.

      It is not necessary to register with SourceForge to
      view the Forum; however, you will need to if you want
      to post or "monitor" the forum.  Monitoring the forum
      will result in any posted message being automatically
      emailed to you.  Posting to the forum is through HTML
      pages.  I think the forum will be a better venue than
      the XSDComp mailing list.

    • Curt Arnold

      Curt Arnold - 2000-04-22

      Hi Curt,

      I went to SourceForge and tried to login but got bounced with
      "Cannot find server or DNS Error"
      Internet Explorer

      I suspect it is just busy or ??? Normally SourceForge works fine.

      Anyway, be that as it may, I thought I would just fall back and mail you
      directly and cross-post to the sml-dev list, whose membership
      needs to be aware of your efforts.

      In your message below the

      "bytecode" (don't worry it is still XML).

      that you are proposing sounds to me a lot like what the folks at
      SML-DEV are calling MinXML for which they have produced
      several parsers recently (the last few weeks). You owe it
      to yourself to take a look at SML-DEV and decide for yourself.
      Also SML-DEV has definitely attracted some serious youthful
      talent who are disenchanted with the W3C, OASIS rule by
      committee, etc., and are just plowing ahead with the process of
      simplification and development of both standards and code via
      a much more Open Source collaborative approach. SML-DEV is one
      of the places where the action is in XML right now.

      See: http://www.egroups.com/group/sml-dev

      At any rate good luck. You have picked a tough project but
      obviously an important one. All of us need now to pitch in
      and help you.

      Bob La Quey

      • Curt Arnold

        Curt Arnold - 2000-04-23

        >I was running into the same problem at work which was
        >due to not having my security proxy set up in my
        >browser.  If you are accessing through a proxy server,
        >you might try putting in an explicit setting for
        >your security proxy (typically same as your http
        >proxy, port 80 on some IP address).

        Hmm ... I am using a Linux Router Project box which I
        rolled myself. It seems to work on almost every site
        I go to but maybe something is different here.

        >I definitely see the potential for some synergy with
        >the SML-DEV community and the more energy the better.
        >My understanding of MinXML is that it is a profile of
        >XML syntax (sans attributes, DTD's, etc).  Schema
        >bytecode is an XML document (which potentially could
        >conform to MinXML and probably to Common XML) that
        >contains the distilled essence of a schema.

        It might be a very good use case for MinXML. We really should
        give it a try.

        One of my strongly held beliefs is that by simplifying the core
        XML technology we will gain large benefits in all of the
        superstructure we erect upon that core. IMHO this is the most
        important potential benefit of efforts like MinXML.

        >I think declarative validation is an incredibly important
        >area for XML (and I'm definitely not alone on that).
        >But I think that we've constantly focused on
        >validation that a document is consistent with its own
        >declaration, instead of the more important validation
        >that a document is consistent with what a particular
        >application can consume.



        >How I see Schema bytecode interacting with a MinXML
        >parser is that a MinXML parser could be initialized to
        >validate parsed documents against a validation package
        >determined by what the application wants to accept.
        >This could either be accomplished by compiling the
        >schema bytecode into source code or by a generic
        >"bytecode" interpreter in the MinXML parser.

