From: John A. K. <joh...@us...> - 2005-08-28 18:55:38
Update of /cvsroot/archive-access/archive-access/src/docs/warc
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv17998

Modified Files:
  warc_file_format.html warc_file_format.txt warc_file_format.xml
Log Message:
added complete in-line description of ANVL

Index: warc_file_format.html
===================================================================
RCS file: /cvsroot/archive-access/archive-access/src/docs/warc/warc_file_format.html,v
retrieving revision 1.6
retrieving revision 1.7
diff -C2 -d -r1.6 -r1.7
*** warc_file_format.html 26 Aug 2005 23:19:18 -0000 1.6
--- warc_file_format.html 28 Aug 2005 18:55:30 -0000 1.7
***************
*** 363,371 ****
  warc-file = 1*warc-record
  warc-record = header block CRLF CRLF
! header = header-line CRLF anvl-fields
  block = *OCTET
  </pre>
  <p>Elements of this grammar are further specified and explained in
! sections that follow (and in the case of <span class="emph">anvl-fields</span>, also a separate document).
  </p>
  <p>The record <span class="emph">header-line</span> is a
--- 363,371 ----
  warc-file = 1*warc-record
  warc-record = header block CRLF CRLF
! header = header-line CRLF *anvl-field CRLF
  block = *OCTET
  </pre>
  <p>Elements of this grammar are further specified and explained in
! sections that follow.
  </p>
  <p>The record <span class="emph">header-line</span> is a
***************
*** 385,401 ****
  been written.
  </p>
! <p>After the <span class="emph">header-line</span> come any number of
! named fields in a line-oriented syntax called <a class="info" href="#ANVL">ANVL<span> (</span><span class="info">Kunze, J., Kahle, B., Masanes, J., and G. Mohr, “A Name-Value Language,” .</span><span>)</span></a> [ANVL] that is very similar to that of email
! headers <a class="info" href="#RFC0822">[RFC0822]<span> (</span><span class="info">Crocker, D., “Standard for the format of ARPA Internet text messages,” August 1982.</span><span>)</span></a>. Its format can be roughly summarized
! as the following:
  </p><pre>
! anvl-fields = *line CRLF
! line = (field / other-anvl) CRLF
! field = <field per RFC0822>
! other-anvl = <see ANVL>
  </pre>
! <p>This document defines a number of named fields which may appear in
! the <span class="emph">anvl-fields</span> area of the header. Note that
! the smallest possible <span class="emph">anvl-fields</span> is a single CRLF,
  indicating no named fields.
  </p>
--- 385,411 ----
  been written.
  </p>
! <p>After the <span class="emph">header-line</span> come zero or more
! named <a class="info" href="#ANVL">ANVL<span> (</span><span class="info">Kunze, J., Kahle, B., Masanes, J., and G. Mohr, “A Name-Value Language,” .</span><span>)</span></a> [ANVL] fields in a line-oriented syntax
! very similar to that of email headers <a class="info" href="#RFC0822">[RFC0822]<span> (</span><span class="info">Crocker, D., “Standard for the format of ARPA Internet text messages,” August 1982.</span><span>)</span></a> but with
! unrestricted "text" values (none of its 13 reserved special characters).
! The precise format is as follows:
  </p><pre>
! anvl-field = field-name ":" [ field-body ] CRLF
! field-name = 1*<any CHAR, excluding control-chars and ":">
! field-body = text [CRLF LWSP-char field-body]
! text = 1*<any UTF-8 character, including bare
! CR and bare LF, but NOT including CRLF>
! ; (Octal, Decimal.)
! CHAR = <any ASCII/UTF-8 character> ; (0-177, 0.-127.)
! CR = <ASCII CR, carriage return> ; ( 15, 13.)
! LF = <ASCII LF, linefeed> ; ( 12, 10.)
! SPACE = <ASCII SP, space> ; ( 40, 32.)
! HTAB = <ASCII HT, horizontal-tab> ; ( 11, 9.)
! CRLF = CR LF
! LWSP-char = SPACE / HTAB ; semantics = SPACE
  </pre>
! <p>This document defines a number of named fields that may appear as
! an <span class="emph">anvl-field</span>. Note that the smallest
! possible <span class="emph">anvl-fields</span> is a single CRLF,
  indicating no named fields.
  </p>
***************
*** 632,636 ****
  </p>
  <p>Named parameters after the header-line, if any, follow the
! line-oriented syntax called <a class="info" href="#ANVL">ANVL<span> (</span><span class="info">Kunze, J., Kahle, B., Masanes, J., and G. Mohr, “A Name-Value Language,” .</span><span>)</span></a> [ANVL].
  Normally, named parameters are optional and their order is insignificant, however, specific record types require that certain named parameters
--- 642,647 ----
  </p>
  <p>Named parameters after the header-line, if any, follow the
! line-oriented syntax defined previously (also know as
! <a class="info" href="#ANVL">ANVL<span> (</span><span class="info">Kunze, J., Kahle, B., Masanes, J., and G. Mohr, “A Name-Value Language,” .</span><span>)</span></a> [ANVL]).
  Normally, named parameters are optional and their order is insignificant, however, specific record types require that certain named parameters

Index: warc_file_format.xml
===================================================================
RCS file: /cvsroot/archive-access/archive-access/src/docs/warc/warc_file_format.xml,v
retrieving revision 1.10
retrieving revision 1.11
diff -C2 -d -r1.10 -r1.11
*** warc_file_format.xml 26 Aug 2005 23:19:18 -0000 1.10
--- warc_file_format.xml 28 Aug 2005 18:55:30 -0000 1.11
***************
*** 203,207 ****
  warc-file = 1*warc-record
  warc-record = header block CRLF CRLF
! header = header-line CRLF anvl-fields
  block = *OCTET
  </artwork>
--- 203,207 ----
  warc-file = 1*warc-record
  warc-record = header block CRLF CRLF
! header = header-line CRLF *anvl-field CRLF
  block = *OCTET
  </artwork>
***************
*** 209,214 ****
  <t>Elements of this grammar are further specified and explained in
! sections that follow (and in the case of <spanx
! style="emph">anvl-fields</spanx>, also a separate document).</t>
  <t>The record <spanx style="emph">header-line</spanx> is a
--- 209,213 ----
  <t>Elements of this grammar are further specified and explained in
! sections that follow.</t>
  <t>The record <spanx style="emph">header-line</spanx> is a
***************
*** 233,254 ****
  been written.</t>
! <t>After the <spanx style="emph">header-line</spanx> come any number of
! named fields in a line-oriented syntax called <xref
! target="ANVL">ANVL</xref> that is very similar to that of email
! headers <xref target="RFC0822" />. Its format can be roughly summarized
! as the following:</t>
  <figure>
  <artwork>
! anvl-fields = *line CRLF
! line = (field / other-anvl) CRLF
! field = <field per RFC0822>
! other-anvl = <see ANVL>
  </artwork>
  </figure>
! <t>This document defines a number of named fields which may appear in
! the <spanx style="emph">anvl-fields</spanx> area of the header. Note that
! the smallest possible <spanx style="emph">anvl-fields</spanx> is a single CRLF,
  indicating no named fields.</t>
--- 232,262 ----
  been written.</t>
! <t>After the <spanx style="emph">header-line</spanx> come zero or more
! named <xref target="ANVL">ANVL</xref> fields in a line-oriented syntax
! very similar to that of email headers <xref target="RFC0822" /> but with
! unrestricted "text" values (none of its 13 reserved special characters).
! The precise format is as follows:</t>
  <figure>
  <artwork>
! anvl-field = field-name ":" [ field-body ] CRLF
! field-name = 1*<any CHAR, excluding control-chars and ":">
! field-body = text [CRLF LWSP-char field-body]
! text = 1*<any UTF-8 character, including bare
! CR and bare LF, but NOT including CRLF>
! ; (Octal, Decimal.)
! CHAR = <any ASCII/UTF-8 character> ; (0-177, 0.-127.)
! CR = <ASCII CR, carriage return> ; ( 15, 13.)
! LF = <ASCII LF, linefeed> ; ( 12, 10.)
! SPACE = <ASCII SP, space> ; ( 40, 32.)
! HTAB = <ASCII HT, horizontal-tab> ; ( 11, 9.)
! CRLF = CR LF
! LWSP-char = SPACE / HTAB ; semantics = SPACE
  </artwork>
  </figure>
! <t>This document defines a number of named fields that may appear as
! an <spanx style="emph">anvl-field</spanx>. Note that the smallest
! possible <spanx style="emph">anvl-fields</spanx> is a single CRLF,
  indicating no named fields.</t>
***************
*** 488,492 ****
  <t>Named parameters after the header-line, if any, follow the
! line-oriented syntax called <xref target="ANVL">ANVL</xref>.
  Normally, named parameters are optional and their order is insignificant, however, specific record types require that certain named parameters
--- 496,501 ----
  <t>Named parameters after the header-line, if any, follow the
! line-oriented syntax defined previously (also know as
! <xref target="ANVL">ANVL</xref>).
  Normally, named parameters are optional and their order is insignificant, however, specific record types require that certain named parameters

Index: warc_file_format.txt
===================================================================
RCS file: /cvsroot/archive-access/archive-access/src/docs/warc/warc_file_format.txt,v
retrieving revision 1.5
retrieving revision 1.6
diff -C2 -d -r1.5 -r1.6
*** warc_file_format.txt 26 Aug 2005 23:19:18 -0000 1.5
--- warc_file_format.txt 28 Aug 2005 18:55:30 -0000 1.6
***************
*** 293,302 ****
  warc-file = 1*warc-record
  warc-record = header block CRLF CRLF
! header = header-line CRLF anvl-fields
  block = *OCTET
  Elements of this grammar are further specified and explained in
! sections that follow (and in the case of _anvl-fields_, also a
! separate document).
  The record _header-line_ is a newline-terminated sequence of
--- 293,301 ----
  warc-file = 1*warc-record
  warc-record = header block CRLF CRLF
! header = header-line CRLF *anvl-field CRLF
  block = *OCTET
  Elements of this grammar are further specified and explained in
! sections that follow.
  The record _header-line_ is a newline-terminated sequence of
***************
*** 314,333 ****
  completely known after the record content _block_ has been written.
! After the _header-line_ come any number of named fields in a line-
! oriented syntax called ANVL [ANVL] that is very similar to that of
! email headers [RFC0822]. Its format can be roughly summarized as the
! following:
!
! anvl-fields = *line CRLF
! line = (field / other-anvl) CRLF
! field = <field per RFC0822>
! other-anvl = <see ANVL>
!
! This document defines a number of named fields which may appear in
! the _anvl-fields_ area of the header. Note that the smallest
! possible _anvl-fields_ is a single CRLF, indicating no named fields.
! Following the headers comes the content _block_, if any, which may
! contain arbitrary binary data, up through the remaining number of
--- 314,333 ----
  completely known after the record content _block_ has been written.
! After the _header-line_ come zero or more named ANVL [ANVL] fields in
! a line-oriented syntax very similar to that of email headers
! [RFC0822] but with unrestricted "text" values (none of its 13
! reserved special characters). The precise format is as follows:
! anvl-field = field-name ":" [ field-body ] CRLF
! field-name = 1*<any CHAR, excluding control-chars and ":">
! field-body = text [CRLF LWSP-char field-body]
! text = 1*<any UTF-8 character, including bare
! CR and bare LF, but NOT including CRLF>
! ; (Octal, Decimal.)
! CHAR = <any ASCII/UTF-8 character> ; (0-177, 0.-127.)
! CR = <ASCII CR, carriage return> ; ( 15, 13.)
! LF = <ASCII LF, linefeed> ; ( 12, 10.)
! SPACE = <ASCII SP, space> ; ( 40, 32.)
! HTAB = <ASCII HT, horizontal-tab> ; ( 11, 9.)
! CRLF = CR LF
***************
*** 338,341 ****
--- 338,349 ----
+ LWSP-char = SPACE / HTAB ; semantics = SPACE
+
+ This document defines a number of named fields that may appear as an
+ _anvl-field_. Note that the smallest possible _anvl-fields_ is a
+ single CRLF, indicating no named fields.
+
+ Following the headers comes the content _block_, if any, which may
+ contain arbitrary binary data, up through the remaining number of
  octets as specified in the previously-given _data-length_ parameter.
  Finally come two CRLF newlines, not counted in the declared record
***************
*** 381,392 ****
-
-
-
-
-
-
-
-
  Kunze, et al. Expires January 2, 2006 [Page 7]
--- 389,392 ----
***************
*** 658,668 ****
  Named parameters after the header-line, if any, follow the line-
! oriented syntax called ANVL [ANVL]. Normally, named parameters are
! optional and their order is insignificant, however, specific record
! types require that certain named parameters be present (and future
! extensions may have ordering requirements). If there are no named
! parameters present, the entire WARC record header is the line of
! positional parameters followed by one blank line (two consecutive
! newlines).
--- 658,668 ----
  Named parameters after the header-line, if any, follow the line-
! oriented syntax defined previously (also know as ANVL [ANVL]).
! Normally, named parameters are optional and their order is
! insignificant, however, specific record types require that certain
! named parameters be present (and future extensions may have ordering
! requirements). If there are no named parameters present, the entire
! WARC record header is the line of positional parameters followed by
! one blank line (two consecutive newlines).
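The header grammar introduced in this commit (`header = header-line CRLF *anvl-field CRLF`) is small enough to parse by hand. The sketch below only illustrates that grammar and is not code from the archive-access project. The function name `parse_header` is hypothetical. The header-line is returned unparsed because its positional parameters are not shown in this diff. Two behaviors are assumptions rather than rules stated in the grammar. First, whitespace after the ":" is stripped, following the usual email-header convention. Second, a folded `CRLF LWSP-char` is joined with a single space, which is one reading of `semantics = SPACE`. Only CRLF ends a line, so a bare CR or bare LF stays inside a value, as the `text` rule requires.

```python
from typing import List, Tuple

CRLF = b"\r\n"


def _valid_field_name(name: str) -> bool:
    # field-name = 1*<any CHAR, excluding control-chars and ":">
    return bool(name) and all(
        ch != ":" and ord(ch) >= 0x20 and ord(ch) != 0x7F for ch in name
    )


def parse_header(data: bytes) -> Tuple[str, List[Tuple[str, str]], int]:
    """Hypothetical parser for: header = header-line CRLF *anvl-field CRLF.

    Returns (header_line, fields, block_offset), where block_offset is the
    index of the first octet after the blank line that closes the header.
    """
    end = data.find(CRLF)
    if end < 0:
        raise ValueError("header-line is not terminated by CRLF")
    header_line = data[:end].decode("utf-8")  # left unparsed in this sketch
    pos = end + len(CRLF)
    fields: List[Tuple[str, str]] = []

    while True:
        # Only CRLF ends a line; bare CR or bare LF remain part of the text.
        end = data.find(CRLF, pos)
        if end < 0:
            raise ValueError("header is not terminated by an empty line")
        line = data[pos:end].decode("utf-8")
        pos = end + len(CRLF)

        if line == "":
            # The lone CRLF closing the (possibly empty) list of fields.
            return header_line, fields, pos

        if line[0] in " \t":
            # Continuation: CRLF LWSP-char field-body.
            # Assumption: the LWSP-char acts as one SPACE joining the parts.
            if not fields:
                raise ValueError("continuation line before any anvl-field")
            name, value = fields[-1]
            joined = (value + " " + line[1:]) if value else line[1:]
            fields[-1] = (name, joined)
            continue

        name, sep, value = line.partition(":")
        if not sep or not _valid_field_name(name):
            raise ValueError(f"malformed anvl-field: {line!r}")
        # Assumption: whitespace right after ":" is not part of the value.
        fields.append((name, value.lstrip(" \t")))
```

For example, `parse_header(b"HL\r\n\r\n")` returns `("HL", [], 6)`. This is the smallest possible field list, a single CRLF. The returned offset is where the content _block_ would begin. The sketch works on bytes rather than decoded text because the block may hold arbitrary binary data that should not be decoded.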