From: Michael S. <sta...@us...> - 2005-09-15 22:18:15

Update of /cvsroot/archive-access/archive-access/src/docs/warc
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv32472

Modified Files:
    warc_file_format.html warc_file_format.txt warc_file_format.xml
Log Message:
* warc_file_format.xml
    Added Appendix C of collection ABNF (Needs work still).
* warc_file_format.html
* warc_file_format.txt
    Generated from warc_file_format.xml

Index: warc_file_format.html
===================================================================
RCS file: /cvsroot/archive-access/archive-access/src/docs/warc/warc_file_format.html,v
retrieving revision 1.7
retrieving revision 1.8
diff -C2 -d -r1.7 -r1.8
*** warc_file_format.html  28 Aug 2005 18:55:30 -0000  1.7
--- warc_file_format.html  15 Sep 2005 22:18:05 -0000  1.8
***************
*** 263,266 ****
--- 263,268 ----
  <a href="#anchor40">Appendix B.8.</a>
  Example of 'continuation' Record<br />
+ <a href="#anchor41">Appendix C.</a>
+ Collected BNF for WARC<br />
  <a href="#rfc.references1">14.</a>
  References<br />
***************
*** 1531,1534 ****
--- 1533,1579 ----
  the set, the one with the "Segment-Number: 1" named field.
  </p>
+ <a name="anchor41"></a><br /><hr />
+ <table summary="layout" cellpadding="0" cellspacing="2" class="bug" align="right"><tr><td class="bug"><a href="#toc" class="link2"> TOC </a></td></tr></table>
+ <a name="rfc.section.C"></a><h3>Appendix C.  Collected BNF for WARC</h3>
+ <pre>
+ warc-file = 1*warc-record
+ warc-record = header block CRLF CRLF
+ header = header-line CRLF *anvl-field CRLF
+ block = *OCTET
+
+ header-line = warc-id tsp data-length tsp record-type tsp
+               subject-uri tsp creation-date tsp
+               content-type tsp record-id
+ tsp = 1*WSP
+
+ warc-id = "warc/" DIGIT "." DIGIT
+ data-length = 1*DIGIT
+ record-type = "warcinfo" / "response" / "request" / "metadata" /
+               "revisit" / "conversion" / "continuation" /
+               future-type
+ future-type = 1*VCHAR
+ subject-uri = uri
+ uri = <'URI' per RFC3986>
+ creation-date = timestamp
+ timestamp = <date per below>
+ content-type = type "/" subtype
+ type = <'type' per RFC2045>
+ subtype = <'subtype' per RFC2045>
+ record-id = uri
+
+ anvl-field = field-name ":" [ field-body ] CRLF
+ field-name = 1*<any CHAR, excluding control-chars and ":">
+ field-body = text [CRLF LWSP-char field-body]
+ text = 1*<any UTF-8 character, including bare
+        CR and bare LF, but NOT including CRLF>
+ ; (Octal, Decimal.)
+ CHAR = <any ASCII/UTF-8 character> ; (0-177, 0.-127.)
+ CR = <ASCII CR, carriage return> ; ( 15, 13.)
+ LF = <ASCII LF, linefeed> ; ( 12, 10.)
+ SPACE = <ASCII SP, space> ; ( 40, 32.)
+ HTAB = <ASCII HT, horizontal-tab> ; ( 11, 9.)
+ CRLF = CR LF
+ LWSP-char = SPACE / HTAB ; semantics = SPACE
+ </pre>
  <a name="rfc.references1"></a><br /><hr />
  <table summary="layout" cellpadding="0" cellspacing="2" class="bug" align="right"><tr><td class="bug"><a href="#toc" class="link2"> TOC </a></td></tr></table>

Index: warc_file_format.xml
===================================================================
RCS file: /cvsroot/archive-access/archive-access/src/docs/warc/warc_file_format.xml,v
retrieving revision 1.11
retrieving revision 1.12
diff -C2 -d -r1.11 -r1.12
*** warc_file_format.xml  28 Aug 2005 18:55:30 -0000  1.11
--- warc_file_format.xml  15 Sep 2005 22:18:06 -0000  1.12
***************
*** 17,21 ****
  <!ENTITY rfc2540 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.2540.xml'>
  <!ENTITY rfc4027 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.4027.xml'>
- 
  ]>
  <?rfc symrefs="yes"?>
--- 17,20 ----
***************
*** 1397,1401 ****
--- 1396,1452 ----
  </appendix>
+ </appendix>
+
+ <appendix title="Collected BNF for WARC">
+ <!--
+ TODO: Bring in the definitions for OCTET, etc., from RFC2234.
+ TODO: Whats the slash mean?  Others have |.
+ TODO: Timestamp, mimetype.
+ TODO: The dot after in ANVL zero?
+ TODO: Do all abnf as entity includes so not repeated.
+ -->
+ <figure>
+ <artwork>
+ warc-file = 1*warc-record
+ warc-record = header block CRLF CRLF
+ header = header-line CRLF *anvl-field CRLF
+ block = *OCTET
+
+ header-line = warc-id tsp data-length tsp record-type tsp
+               subject-uri tsp creation-date tsp
+               content-type tsp record-id
+ tsp = 1*WSP
+
+ warc-id = "warc/" DIGIT "." DIGIT
+ data-length = 1*DIGIT
+ record-type = "warcinfo" / "response" / "request" / "metadata" /
+               "revisit" / "conversion" / "continuation" /
+               future-type
+ future-type = 1*VCHAR
+ subject-uri = uri
+ uri = <'URI' per RFC3986>
+ creation-date = timestamp
+ timestamp = <date per below>
+ content-type = type "/" subtype
+ type = <'type' per RFC2045>
+ subtype = <'subtype' per RFC2045>
+ record-id = uri
+
+ anvl-field = field-name ":" [ field-body ] CRLF
+ field-name = 1*<any CHAR, excluding control-chars and ":">
+ field-body = text [CRLF LWSP-char field-body]
+ text = 1*<any UTF-8 character, including bare
+        CR and bare LF, but NOT including CRLF>
+ ; (Octal, Decimal.)
+ CHAR = <any ASCII/UTF-8 character> ; (0-177, 0.-127.)
+ CR = <ASCII CR, carriage return> ; ( 15, 13.)
+ LF = <ASCII LF, linefeed> ; ( 12, 10.)
+ SPACE = <ASCII SP, space> ; ( 40, 32.)
+ HTAB = <ASCII HT, horizontal-tab> ; ( 11, 9.)
+ CRLF = CR LF
+ LWSP-char = SPACE / HTAB ; semantics = SPACE
+ </artwork>
+ </figure>
  </appendix>

Index: warc_file_format.txt
===================================================================
RCS file: /cvsroot/archive-access/archive-access/src/docs/warc/warc_file_format.txt,v
retrieving revision 1.6
retrieving revision 1.7
diff -C2 -d -r1.6 -r1.7
*** warc_file_format.txt  28 Aug 2005 18:55:30 -0000  1.6
--- warc_file_format.txt  15 Sep 2005 22:18:06 -0000  1.7
***************
*** 157,164 ****
  Appendix B.7.  Example of 'conversion' Record . . . . . . . . . . . 32
  Appendix B.8.  Example of 'continuation' Record . . . . . . . . . . 32
! 14.  References . . . . . . . . . . . . . . . . . . . . . . . . . . 33
! Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 35
! Intellectual Property and Copyright Statements . . . . . . . . . . 36
! 
--- 157,164 ----
  Appendix B.7.  Example of 'conversion' Record . . . . . . . . . . . 32
  Appendix B.8.  Example of 'continuation' Record . . . . . . . . . . 32
! Appendix C.  Collected BNF for WARC . . . . . . . . . . . . . . . 34
! 14.  References . . . . . . . . . . . . . . . . . . . . . . . . . . 34
! Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 37
! Intellectual Property and Copyright Statements . . . . . . . . . . 38
***************
*** 1812,1815 ****
--- 1812,1895 ----
  set, the one with the "Segment-Number: 1" named field.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ Kunze, et al.          Expires January 2, 2006               [Page 33]
+
+ Internet-Draft        WARC File Format, 0.8revB              July 2005
+
+
+ Appendix C.  Collected BNF for WARC
+
+ warc-file = 1*warc-record
+ warc-record = header block CRLF CRLF
+ header = header-line CRLF *anvl-field CRLF
+ block = *OCTET
+
+ header-line = warc-id tsp data-length tsp record-type tsp
+               subject-uri tsp creation-date tsp
+               content-type tsp record-id
+ tsp = 1*WSP
+
+ warc-id = "warc/" DIGIT "." DIGIT
+ data-length = 1*DIGIT
+ record-type = "warcinfo" / "response" / "request" / "metadata" /
+               "revisit" / "conversion" / "continuation" /
+               future-type
+ future-type = 1*VCHAR
+ subject-uri = uri
+ uri = <'URI' per RFC3986>
+ creation-date = timestamp
+ timestamp = <date per below>
+ content-type = type "/" subtype
+ type = <'type' per RFC2045>
+ subtype = <'subtype' per RFC2045>
+ record-id = uri
+
+ anvl-field = field-name ":" [ field-body ] CRLF
+ field-name = 1*<any CHAR, excluding control-chars and ":">
+ field-body = text [CRLF LWSP-char field-body]
+ text = 1*<any UTF-8 character, including bare
+        CR and bare LF, but NOT including CRLF>
+ ; (Octal, Decimal.)
+ CHAR = <any ASCII/UTF-8 character> ; (0-177, 0.-127.)
+ CR = <ASCII CR, carriage return> ; ( 15, 13.)
+ LF = <ASCII LF, linefeed> ; ( 12, 10.)
+ SPACE = <ASCII SP, space> ; ( 40, 32.)
+ HTAB = <ASCII HT, horizontal-tab> ; ( 11, 9.)
+ CRLF = CR LF
+ LWSP-char = SPACE / HTAB ; semantics = SPACE
+
+
  14.  References
***************
*** 1818,1821 ****
--- 1898,1909 ----
  [ARC]  Burner, M. and B. Kahle, "The ARC File Format",
+
+
+
+ Kunze, et al.          Expires January 2, 2006               [Page 34]
+
+ Internet-Draft        WARC File Format, 0.8revB              July 2005
+
+
  September 1996.
***************
*** 1842,1853 ****
  [RFC1884]  Hinden, R. and S. Deering, "IP Version 6 Addressing
- 
- 
- 
- Kunze, et al.          Expires January 2, 2006               [Page 33]
- 
- Internet-Draft        WARC File Format, 0.8revB              July 2005
- 
- 
  Architecture", RFC 1884, December 1995.
--- 1930,1933 ----
***************
*** 1874,1877 ****
--- 1954,1965 ----
  [RFC2540]  Eastlake, D., "Detached Domain Name System (DNS)
+
+
+
+ Kunze, et al.          Expires January 2, 2006               [Page 35]
+
+ Internet-Draft        WARC File Format, 0.8revB              July 2005
+
+
  Information", RFC 2540, March 1999.
***************
*** 1901,1905 ****
! Kunze, et al.          Expires January 2, 2006               [Page 34]
  Internet-Draft        WARC File Format, 0.8revB              July 2005
--- 1989,2017 ----
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
! Kunze, et al.          Expires January 2, 2006               [Page 36]
  Internet-Draft        WARC File Format, 0.8revB              July 2005
***************
*** 1957,1961 ****
! Kunze, et al.          Expires January 2, 2006               [Page 35]
  Internet-Draft        WARC File Format, 0.8revB              July 2005
--- 2069,2073 ----
! Kunze, et al.          Expires January 2, 2006               [Page 37]
  Internet-Draft        WARC File Format, 0.8revB              July 2005
***************
*** 2013,2016 ****
! Kunze, et al.          Expires January 2, 2006               [Page 36]
--- 2125,2128 ----
! Kunze, et al.          Expires January 2, 2006               [Page 38]
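To make the collected grammar concrete, here is a sketch of a single record laid out per the header-line production. Every value is invented for illustration (version, data-length, URI, date, and record-id), the ANVL field is a made-up example of the anvl-field shape, and the block is elided:

    warc/0.8 1321 response http://example.com/ 20050915221805 text/html uuid:c951b2a0-1fe6-11da-8344-cb5b254522a8
    Checksum: md5:9e107d9d372bb6826bd81d3542a419d6

    [block: *OCTET, e.g. the raw HTTP response]

The seven header-line fields are separated by tsp (1*WSP); each anvl-field sits on its own CRLF-terminated line, a further CRLF closes the header, and the record is terminated by CRLF CRLF after the block.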
From: Michael S. <sta...@us...> - 2005-09-15 21:57:35

Update of /cvsroot/archive-access/archive-access/projects/nutch/xdocs/oswir
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv25329

Added Files:
    2005-oswir-wacsearch.ppt
Log Message:
* 2005-oswir-wacsearch.ppt
    Slides.

--- NEW FILE: 2005-oswir-wacsearch.ppt ---
(This appears to be a binary file; contents omitted.)
From: Michael S. <sta...@us...> - 2005-09-15 18:23:06

Update of /cvsroot/archive-access/archive-access/projects/nutch/bin
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv9560/bin

Modified Files:
    nutch
Log Message:
* bin/nutch
    Call the nutchwax merge.
* src/java/org/archive/access/nutch/NutchwaxIndexMerger.java
    Adds being able to pass dir of segments (For Dan).

Index: nutch
===================================================================
RCS file: /cvsroot/archive-access/archive-access/projects/nutch/bin/nutch,v
retrieving revision 1.1
retrieving revision 1.2
diff -C2 -d -r1.1 -r1.2
*** nutch  5 Sep 2005 20:04:31 -0000  1.1
--- nutch  15 Sep 2005 18:22:53 -0000  1.2
***************
*** 145,149 ****
    CLASS=org.apache.nutch.indexer.IndexSegment
  elif [ "$COMMAND" = "merge" ] ; then
!   CLASS=org.apache.nutch.indexer.IndexMerger
  elif [ "$COMMAND" = "dedup" ] ; then
    CLASS=org.apache.nutch.indexer.DeleteDuplicates
--- 145,152 ----
    CLASS=org.apache.nutch.indexer.IndexSegment
  elif [ "$COMMAND" = "merge" ] ; then
!   # Use the nutchwax merger. It adds being able to take a dir of segments.
!   # TODO: Make this a subclass rather than a copy. Looks like I can. But
!   # am in a bit of a hurry at the moment.
!   CLASS=org.archive.access.nutch.NutchwaxIndexMerger
  elif [ "$COMMAND" = "dedup" ] ; then
    CLASS=org.apache.nutch.indexer.DeleteDuplicates
***************
*** 153,159 ****
    CLASS=org.apache.nutch.tools.UpdateSegmentsFromDb
  elif [ "$COMMAND" = "mergesegs" ] ; then
!   # Copy over the nutchwax version of segment merge.
!   # It will work w/ segments made by nutchwax. Also
!   # does not do a merge.
    CLASS=org.archive.access.nutch.NutchwaxSegmentMergeTool
  elif [ "$COMMAND" = "readdb" ] ; then
--- 156,161 ----
    CLASS=org.apache.nutch.tools.UpdateSegmentsFromDb
  elif [ "$COMMAND" = "mergesegs" ] ; then
!   # Use the merge from nutchwax. It doesn't expect content to be in place
!   # and it disables deduping.
    CLASS=org.archive.access.nutch.NutchwaxSegmentMergeTool
  elif [ "$COMMAND" = "readdb" ] ; then
From: Michael S. <sta...@us...> - 2005-09-15 18:23:01

Update of /cvsroot/archive-access/archive-access/projects/nutch/src/java/org/archive/access/nutch
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv9560/src/java/org/archive/access/nutch

Added Files:
    NutchwaxIndexMerger.java
Log Message:
* bin/nutch
    Call the nutchwax merge.
* src/java/org/archive/access/nutch/NutchwaxIndexMerger.java
    Adds being able to pass dir of segments (For Dan).

--- NEW FILE: NutchwaxIndexMerger.java ---
/**
 * Copyright 2005 The Apache Software Foundation
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package org.archive.access.nutch;

import java.io.*;
import java.text.*;
import java.util.*;
import java.util.logging.*;

import org.apache.nutch.fs.*;
import org.apache.nutch.util.*;
import org.apache.nutch.indexer.*;

import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.index.IndexWriter;

/*************************************************************************
 * NutchwaxIndexMerger creates an index for the output corresponding to a
 * single fetcher run.
 *
 * Based on the nutch IndexMerger.  Adds being able to pass a directory that
 * holds Segments.  St.Ack on 09/14/2005.
 *
 * @author Doug Cutting
 * @author Mike Cafarella
 *************************************************************************/
public class NutchwaxIndexMerger {
  public static final Logger LOG =
    LogFormatter.getLogger("org.apache.nutch.indexer.NutchwaxIndexMerger");

  public static final String DONE_NAME = "merge.done";

  private int MERGE_FACTOR = NutchConf.get().getInt("indexer.mergeFactor",
    IndexWriter.DEFAULT_MERGE_FACTOR);
  private int MIN_MERGE_DOCS = NutchConf.get().getInt("indexer.minMergeDocs",
    IndexWriter.DEFAULT_MIN_MERGE_DOCS);
  private int MAX_MERGE_DOCS = NutchConf.get().getInt("indexer.maxMergeDocs",
    IndexWriter.DEFAULT_MAX_MERGE_DOCS);
  private int TERM_INDEX_INTERVAL =
    NutchConf.get().getInt("indexer.termIndexInterval",
      IndexWriter.DEFAULT_TERM_INDEX_INTERVAL);

  private NutchFileSystem nfs;
  private File outputIndex;
  private File localWorkingDir;
  private File[] segments;

  /**
   * Merge all of the segments given
   */
  public NutchwaxIndexMerger(NutchFileSystem nfs, File[] segments,
      File outputIndex, File localWorkingDir) throws IOException {
    this.nfs = nfs;
    this.segments = segments;
    this.outputIndex = outputIndex;
    this.localWorkingDir = localWorkingDir;
  }

  /**
   * Load all input segment indices, then add to the single output index
   */
  public void merge() throws IOException {
    //
    // Open local copies of NFS indices
    //
    Directory[] dirs = new Directory[segments.length];
    File[] localSegments = new File[segments.length];
    for (int i = 0; i < segments.length; i++) {
      File tmpFile = new File(localWorkingDir, "indexmerge-" +
        new SimpleDateFormat("yyyMMddHHmmss").format(
          new Date(System.currentTimeMillis())));
      localSegments[i] =
        nfs.startLocalInput(new File(segments[i], "index"), tmpFile);
      dirs[i] = FSDirectory.getDirectory(localSegments[i], false);
    }

    //
    // Get local output target
    //
    File tmpLocalOutput = new File(localWorkingDir, "merge-output");
    File localOutput = nfs.startLocalOutput(outputIndex, tmpLocalOutput);

    //
    // Merge indices
    //
    IndexWriter writer = new IndexWriter(localOutput, null, true);
    writer.mergeFactor = MERGE_FACTOR;
    writer.minMergeDocs = MIN_MERGE_DOCS;
    writer.maxMergeDocs = MAX_MERGE_DOCS;
    writer.setTermIndexInterval(TERM_INDEX_INTERVAL);
    writer.infoStream = LogFormatter.getLogStream(LOG, Level.FINE);
    writer.setUseCompoundFile(false);
    writer.setSimilarity(new NutchSimilarity());
    writer.addIndexes(dirs);
    writer.close();

    //
    // Put target back
    //
    nfs.completeLocalOutput(outputIndex, tmpLocalOutput);

    //
    // Delete all local inputs, if necessary
    //
    for (int i = 0; i < localSegments.length; i++) {
      nfs.completeLocalInput(localSegments[i]);
    }
    localWorkingDir.delete();
  }

  /**
   * Create an index for the input files in the named directory.
   */
  public static void main(String[] args) throws Exception {
    String usage = "NutchwaxIndexMerger (-local | -ndfs <nameserver:port>) [-workingdir <workingdir>] outputIndex (-dir <input_segments_dir> | segments...)";
    if (args.length < 2) {
      System.err.println("Usage: " + usage);
      return;
    }

    //
    // Parse args, read all segment directories to be processed
    //
    NutchFileSystem nfs = NutchFileSystem.parseArgs(args, 0);
    try {
      File workingDir = new File(new File("").getCanonicalPath());
      Vector segments = new Vector();
      int i = 0;
      if ("-workingdir".equals(args[i])) {
        i++;
        workingDir = new File(new File(args[i++]).getCanonicalPath());
      }
      File outputIndex = new File(args[i++]);
      if (args[i].equals("-dir")) {
        // We've been passed a directory to look into.
        i++; // Move past the '-dir'
        File dir = new File(args[i]);
        File [] segs = dir.listFiles(new FileFilter() {
          public boolean accept(final File f) {
            // Only accept directories.  Assume all dirs are segments.
            return f.isDirectory();
          }
        });
        for (int j = 0; j < segs.length; j++) {
          segments.add(segs[j]);
        }
      } else {
        for (; i < args.length; i++) {
          if (args[i] != null) {
            segments.add(new File(args[i]));
          }
        }
      }

      workingDir = new File(workingDir, "indexmerger-workingdir");

      //
      // Merge the indices
      //
      File[] segmentFiles =
        (File[]) segments.toArray(new File[segments.size()]);
      LOG.info("merging segment indexes to: " + outputIndex);
      if (workingDir.exists()) {
        FileUtil.fullyDelete(workingDir);
      }
      workingDir.mkdirs();
      NutchwaxIndexMerger merger =
        new NutchwaxIndexMerger(nfs, segmentFiles, outputIndex, workingDir);
      merger.merge();
      LOG.info("done merging");
      FileUtil.fullyDelete(workingDir);
    } finally {
      nfs.close();
    }
  }
}
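Given the usage string above, and with bin/nutch now routing the merge command to this class, a run over a whole directory of segments might look like this (paths invented for illustration):

    bin/nutch merge -local /2/katrina/merged-index -dir /2/katrina/nutch-data/segments

The -dir form picks up every subdirectory as a segment, which is the point of this variant: you don't have to list each segment on the command line.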
From: Michael S. <sta...@us...> - 2005-09-15 18:23:01

Update of /cvsroot/archive-access/archive-access/projects/nutch/xdocs
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv9560/xdocs

Added Files:
    2005-oswir-wacsearch.sxi
Log Message:
* bin/nutch
    Call the nutchwax merge.
* src/java/org/archive/access/nutch/NutchwaxIndexMerger.java
    Adds being able to pass dir of segments (For Dan).

--- NEW FILE: 2005-oswir-wacsearch.sxi ---
(This appears to be a binary file; contents omitted.)
From: stack <st...@ar...> - 2005-09-14 21:52:42

Lukas Matejka wrote:

> i downloaded new version of nutch from cvs and i think that script
> indexarcs.sh still doesn't work well.
>
> (in previous version i had to use absolute paths and no links in directories)

Links should be fine. Works for me.

> with relative paths same result...
>
> in dir archive are symlinks to arcs.

The below looks like it's not finding any arcs in /home/nwa/nutchwax/archive.
Are there files with a '.arc.gz' ending in /home/nwa/nutchwax/archive? We're
just skipping through the segmenting step w/o indexing anything. We then get
to the update-from-db step, but no segments were created at the indexing
stage.

St.Ack

> ./bin/indexarcs.sh -s /home/nwa/nutchwax/archive -d /home/nwa/nutchwax/data -c test
> St zář 14 23:12:36 CEST 2005 Checking environment variables.
> St zář 14 23:12:36 CEST 2005 Cleaning up all /home/nwa/nutchwax/data content.
> St zář 14 23:12:36 CEST 2005 Creating new queue, and segments.
> St zář 14 23:12:36 CEST 2005 Started segmenting.
> St zář 14 23:12:36 CEST 2005 Started build of link database.
> 050914 231237 parsing file:/home/nwa/nutchwax/conf/nutch-default.xml
> 050914 231238 parsing file:/home/nwa/nutchwax/conf/nutch-site.xml
> 050914 231238 No FS indicated, using default:local
> 050914 231238 Created webdb at LocalFS,/home/nwa/nutchwax/data/db
> 050914 231239 parsing file:/home/nwa/nutchwax/conf/nutch-default.xml
> 050914 231240 parsing file:/home/nwa/nutchwax/conf/nutch-site.xml
> 050914 231240 No FS indicated, using default:local
> 050914 231240 Updating /home/nwa/nutchwax/data/db
> 050914 231240 Updating for /home/nwa/nutchwax/data/segments/*
> Exception in thread "main"
> java.io.FileNotFoundException: /home/nwa/nutchwax/data/segments/*/fetcher/data
>     at org.apache.nutch.fs.LocalFileSystem.open(LocalFileSystem.java:93)
>     at org.apache.nutch.io.SequenceFile$Reader.<init>(SequenceFile.java:194)
>     at org.apache.nutch.io.SequenceFile$Reader.<init>(SequenceFile.java:187)
>     at org.apache.nutch.io.MapFile$Reader.<init>(MapFile.java:190)
>     at org.apache.nutch.io.MapFile$Reader.<init>(MapFile.java:179)
>     at org.apache.nutch.io.ArrayFile$Reader.<init>(ArrayFile.java:50)
>     at org.apache.nutch.tools.UpdateDatabaseTool.updateForSegment(UpdateDatabaseTool.java:92)
>     at org.apache.nutch.tools.UpdateDatabaseTool.main(UpdateDatabaseTool.java:366)
> 050914 231242 parsing file:/home/nwa/nutchwax/conf/nutch-default.xml
>
> l.
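A quick way to answer the question above, i.e. whether anything under the archive directory actually matches the script's default '*.arc.gz' name filter, symlinks included, is a one-liner like this (a sketch using the path from the report, not part of the scripts):

    find /home/nwa/nutchwax/archive/ -name '*.arc.gz'

If the symlinks were created without the .arc.gz suffix in their names, nothing matches, the segmenting step has no input, and the update-from-db step then fails on the empty segments directory exactly as in the log quoted above.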
From: Lukas M. <mat...@ce...> - 2005-09-14 21:17:04

i downloaded new version of nutch from cvs and i think that script
indexarcs.sh still doesn't work well.

(in previous version i had to use absolute paths and no links in directories)

with relative paths same result...

in dir archive are symlinks to arcs.

./bin/indexarcs.sh -s /home/nwa/nutchwax/archive -d /home/nwa/nutchwax/data -c test
St zář 14 23:12:36 CEST 2005 Checking environment variables.
St zář 14 23:12:36 CEST 2005 Cleaning up all /home/nwa/nutchwax/data content.
St zář 14 23:12:36 CEST 2005 Creating new queue, and segments.
St zář 14 23:12:36 CEST 2005 Started segmenting.
St zář 14 23:12:36 CEST 2005 Started build of link database.
050914 231237 parsing file:/home/nwa/nutchwax/conf/nutch-default.xml
050914 231238 parsing file:/home/nwa/nutchwax/conf/nutch-site.xml
050914 231238 No FS indicated, using default:local
050914 231238 Created webdb at LocalFS,/home/nwa/nutchwax/data/db
050914 231239 parsing file:/home/nwa/nutchwax/conf/nutch-default.xml
050914 231240 parsing file:/home/nwa/nutchwax/conf/nutch-site.xml
050914 231240 No FS indicated, using default:local
050914 231240 Updating /home/nwa/nutchwax/data/db
050914 231240 Updating for /home/nwa/nutchwax/data/segments/*
Exception in thread "main"
java.io.FileNotFoundException: /home/nwa/nutchwax/data/segments/*/fetcher/data
    at org.apache.nutch.fs.LocalFileSystem.open(LocalFileSystem.java:93)
    at org.apache.nutch.io.SequenceFile$Reader.<init>(SequenceFile.java:194)
    at org.apache.nutch.io.SequenceFile$Reader.<init>(SequenceFile.java:187)
    at org.apache.nutch.io.MapFile$Reader.<init>(MapFile.java:190)
    at org.apache.nutch.io.MapFile$Reader.<init>(MapFile.java:179)
    at org.apache.nutch.io.ArrayFile$Reader.<init>(ArrayFile.java:50)
    at org.apache.nutch.tools.UpdateDatabaseTool.updateForSegment(UpdateDatabaseTool.java:92)
    at org.apache.nutch.tools.UpdateDatabaseTool.main(UpdateDatabaseTool.java:366)
050914 231242 parsing file:/home/nwa/nutchwax/conf/nutch-default.xml

l.
From: Michael S. <sta...@us...> - 2005-09-09 23:43:18

Update of /cvsroot/archive-access/archive-access/projects/nutch/xdocs/iwaw
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv5861/iwaw

Added Files:
    figure1.jpg figure2.jpg iwaw-wacsearch-tables.doc iwaw-wacsearch.doc
    iwaw-wacsearch.pdf
Log Message:
* faq.fml
    Point to new location.
* iwaw/figure1.jpg iwaw/figure2.jpg
* iwaw/iwaw-wacsearch-tables.doc iwaw/iwaw-wacsearch.doc
* iwaw/iwaw-wacsearch.pdf oswir/wacs-oswir.pdf
* oswir/wacs-oswir3.doc
    Added submitted versions of papers.
* google_ratzinger.jpg nutch_ratzinger.jpg oswir.html
* wacs-oswir.doc wacs-oswir.pdf web-collection-search.html
* web-collection-search2.doc
    Replaced by above final versions.

--- NEW FILE: figure1.jpg ---
(This appears to be a binary file; contents omitted.)

--- NEW FILE: iwaw-wacsearch.doc ---
(This appears to be a binary file; contents omitted.)

--- NEW FILE: figure2.jpg ---
(This appears to be a binary file; contents omitted.)

--- NEW FILE: iwaw-wacsearch.pdf ---
(This appears to be a binary file; contents omitted.)

--- NEW FILE: iwaw-wacsearch-tables.doc ---
(This appears to be a binary file; contents omitted.)
From: Michael S. <sta...@us...> - 2005-09-09 23:43:18

Update of /cvsroot/archive-access/archive-access/projects/nutch/xdocs
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv5861

Modified Files:
    faq.fml
Removed Files:
    google_ratzinger.jpg nutch_ratzinger.jpg oswir.html wacs-oswir.doc
    wacs-oswir.pdf web-collection-search.html web-collection-search2.doc
Log Message:
* faq.fml
    Point to new location.
* iwaw/figure1.jpg iwaw/figure2.jpg
* iwaw/iwaw-wacsearch-tables.doc iwaw/iwaw-wacsearch.doc
* iwaw/iwaw-wacsearch.pdf oswir/wacs-oswir.pdf
* oswir/wacs-oswir3.doc
    Added submitted versions of papers.
* google_ratzinger.jpg nutch_ratzinger.jpg oswir.html
* wacs-oswir.doc wacs-oswir.pdf web-collection-search.html
* web-collection-search2.doc
    Replaced by above final versions.

--- wacs-oswir.pdf DELETED ---

--- web-collection-search2.doc DELETED ---

--- nutch_ratzinger.jpg DELETED ---

Index: faq.fml
===================================================================
RCS file: /cvsroot/archive-access/archive-access/projects/nutch/xdocs/faq.fml,v
retrieving revision 1.6
retrieving revision 1.7
diff -C2 -d -r1.6 -r1.7
*** faq.fml  2 Aug 2005 18:34:02 -0000  1.6
--- faq.fml  9 Sep 2005 23:43:10 -0000  1.7
***************
*** 15,20 ****
  are known issues running against large collections).
  </p>
! <p>See <a href="web-collection-search.html">Full Text Searching of
! Web Archive Collections Using Nutch</a> for a fuller treatment of
  the problems this project addresses.</p>
  </answer>
--- 15,20 ----
  are known issues running against large collections).
  </p>
! <p>See <a href="iwaw/iwaw-wacsearch.pdf">Full Text Search of
! Web Archive Collections</a> for a fuller treatment of
  the problems this project addresses.</p>
  </answer>

--- web-collection-search.html DELETED ---

--- oswir.html DELETED ---

--- google_ratzinger.jpg DELETED ---

--- wacs-oswir.doc DELETED ---
From: Michael S. <sta...@us...> - 2005-09-09 23:43:18

Update of /cvsroot/archive-access/archive-access/projects/nutch/xdocs/oswir
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv5861/oswir

Added Files:
    wacs-oswir.pdf wacs-oswir3.doc
Log Message:
* faq.fml
    Point to new location.
* iwaw/figure1.jpg iwaw/figure2.jpg
* iwaw/iwaw-wacsearch-tables.doc iwaw/iwaw-wacsearch.doc
* iwaw/iwaw-wacsearch.pdf oswir/wacs-oswir.pdf
* oswir/wacs-oswir3.doc
    Added submitted versions of papers.
* google_ratzinger.jpg nutch_ratzinger.jpg oswir.html
* wacs-oswir.doc wacs-oswir.pdf web-collection-search.html
* web-collection-search2.doc
    Replaced by above final versions.

--- NEW FILE: wacs-oswir3.doc ---
(This appears to be a binary file; contents omitted.)

--- NEW FILE: wacs-oswir.pdf ---
(This appears to be a binary file; contents omitted.)
From: Michael S. <sta...@us...> - 2005-09-09 23:39:20

Update of /cvsroot/archive-access/archive-access/projects/nutch/xdocs/oswir
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv5281/oswir

Log Message:
Directory /cvsroot/archive-access/archive-access/projects/nutch/xdocs/oswir
added to the repository
From: Michael S. <sta...@us...> - 2005-09-09 23:32:56

Update of /cvsroot/archive-access/archive-access/projects/nutch/xdocs/iwaw
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv3982/iwaw

Log Message:
Directory /cvsroot/archive-access/archive-access/projects/nutch/xdocs/iwaw
added to the repository
From: Michael S. <sta...@us...> - 2005-09-08 17:48:56

Update of /cvsroot/archive-access/archive-access/projects/nutch/xdocs
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv23127

Added Files:
    steps_indexing_katrina.txt
Log Message:
* steps_indexing_katrina.txt
    Added. Notes on how I did indexing of Katrina.

--- NEW FILE: steps_indexing_katrina.txt ---
$Id: steps_indexing_katrina.txt,v 1.1 2005/09/08 17:48:48 stack-sf Exp $

Two crawls of Hurricane Katrina: 00 and 01. Will start by indexing part
of 00.

Here are all of the backup hosts w/ katrina crawl 00 ARCs on them:

    $ ~webcrawl/crawl-arc-cfg/db-arc-info \
        -like HURRICANE-KATRINA-2005-00%arc.gz | \
        awk '{print $2$4}' | grep -e -bu | sort | uniq
    crawldata0034a-bu.archive.org/1
    crawldata0035a-bu.archive.org/3
    crawldata0036a-bu.archive.org/0
    crawldata0037a-bu.archive.org/0

Now to mount these hosts. Here's a little script to do it:

    #!/bin/sh
    # Pass name of file that lists hosts and name of collection to use as
    # dir under /mnt.
    if [ $# != 2 ]
    then
        echo "Usage: $0 HOSTS_FILE DIR_UNDER_MNT"
        exit 1
    fi
    for i in `cat $1`
    do
        mntpoint="/mnt/$2/$i"
        mkdir -p $mntpoint
        dev=`echo $i|sed -n -e 's/\//:\//p'`
        mount -t nfs -o ro,rsize=8192,wsize=8192,intr,nfsvers=2 $dev $mntpoint
    done

Counting ARCs:

    $ ~webcrawl/crawl-arc-cfg/db-arc-info \
        -like HURRICANE-KATRINA-2005-00%arc.gz | \
        awk '{print $2 " " $6}' | grep -e -bu | uniq | wc -l

There are 1010 in crawl 00 (uniq'ing, there are 1008).

Here is how I got a list of all files sorted:

    $ ~webcrawl/crawl-arc-cfg/db-arc-info \
        -like HURRICANE-KATRINA-2005-00%arc.gz | \
        awk '{print $2 " " $6}' | grep -e -bu | \
        awk '{print $2}' | sort | uniq > 00arcs.txt

I'll do first 100 for now (One segment).

    $ head -100 00arcs.txt > 00arcs.0-99.txt

I then made a directory to hold symlinks to the first 100:

    $ mkdir 00arcs.0-99
    $ for i in `cat ../00arcs.0-99.txt`; do find /mnt/katrina/ -type f \
        -name $i -exec ln -s {} \; ; done

Don't forget to edit the parse-ext plugin.xml so it points to the pdf
parser wrapper script.

I ran the indexing like this:

    $ nohup ./bin/indexarcs.sh -c katrina -s ~/katrina/00arcs.0-99/ \
        -d /2/katrina/nutch-data &> /2/katrina/indexing`date +%FT%H:%M`.log \
        < /dev/null &
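For concreteness, if the host list above were saved to a file and the inline script to a file of its own (both names invented here), the mount step would be:

    $ ./mount-bu-hosts.sh katrina-00-hosts.txt katrina

This mounts each crawldata host read-only under /mnt/katrina/, which matches the /mnt/katrina/ tree the later symlink-building find command walks.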
From: Michael S. <sta...@us...> - 2005-09-07 23:05:41

Update of /cvsroot/archive-access/archive-access/projects/nutch/src/web
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv15852/src/web

Modified Files:
    search.jsp
Log Message:
* src/web/search.jsp
    Fix for 'On Search results, hit range is not updating properly as you
    move to next page'. We weren't passing the 'start' to the rss servlet.

Index: search.jsp
===================================================================
RCS file: /cvsroot/archive-access/archive-access/projects/nutch/src/web/search.jsp,v
retrieving revision 1.19
retrieving revision 1.20
diff -C2 -d -r1.19 -r1.20
*** search.jsp  7 Sep 2005 15:51:59 -0000  1.19
--- search.jsp  7 Sep 2005 23:05:29 -0000  1.20
***************
*** 92,96 ****
  String rss = request.getContextPath() + "/opensearch?query=" +
!   htmlQueryString + "&hitsPerDup=" + hitsPerDup+params;

  %><!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
--- 92,98 ----
  String rss = request.getContextPath() + "/opensearch?query=" +
!   htmlQueryString + "&hitsPerDup=" + hitsPerDup +
!   ((start != 0)? "&start=" + start: "") + params;
!
  %><!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
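With the fix, the RSS link for a later results page carries the offset through to the opensearch servlet; for example (context path, query, and values invented):

    /nutchwax/opensearch?query=katrina&hitsPerDup=2&start=10

Before the fix the start parameter was dropped from this URL, so the servlet always reported the first page's hit range no matter which page was being viewed.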
From: Michael S. <sta...@us...> - 2005-09-07 15:52:08

Update of /cvsroot/archive-access/archive-access/projects/nutch/src/web
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv6397/src/web

Modified Files:
    search.jsp
Log Message:
Fix for '[ 1281697 ] searching czech words not working'. Patch from Lukas.
* src/web/search.jsp
    Convert parameter string to utf-8.

Index: search.jsp
===================================================================
RCS file: /cvsroot/archive-access/archive-access/projects/nutch/src/web/search.jsp,v
retrieving revision 1.18
retrieving revision 1.19
diff -C2 -d -r1.18 -r1.19
*** search.jsp  17 Aug 2005 21:47:24 -0000  1.18
--- search.jsp  7 Sep 2005 15:51:59 -0000  1.19
***************
*** 35,38 ****
--- 35,42 ----
    queryString = "";
  }
+ // Why do we have to do this? We've set the character encoding for the
+ // request above with request.setCharacterEncoding? But Lukas and Oskar
+ // say this works.
+ queryString = new String(queryString.getBytes("ISO-8859-1"), "UTF-8");
  String htmlQueryString = Entities.encode(queryString);
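The comment asks why the re-decode is needed. A likely answer: servlet containers of that era commonly decoded GET query strings as ISO-8859-1 regardless of setCharacterEncoding (in Tomcat this was governed by the connector's URIEncoding setting), so each UTF-8 byte came back as one wrong Latin-1 character. Because ISO-8859-1 maps characters 0-255 one-to-one back to bytes, the round-trip recovers the original text. A standalone sketch of the trick (class name and sample word are mine, not from the patch):

    import java.io.UnsupportedEncodingException;

    public class RecodeDemo {
        public static void main(String[] args)
                throws UnsupportedEncodingException {
            // Bytes the browser actually sent for a Czech word, UTF-8 encoded.
            byte[] wire = "žluťoučký".getBytes("UTF-8");
            // What getParameter() hands back when the container decoded those
            // bytes as ISO-8859-1: one (wrong) char per byte.
            String mangled = new String(wire, "ISO-8859-1");
            // The patch's fix: re-encode as ISO-8859-1 to get the original
            // bytes back, then decode them as the UTF-8 they really are.
            String fixed = new String(mangled.getBytes("ISO-8859-1"), "UTF-8");
            System.out.println(fixed); // prints: žluťoučký
        }
    }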
From: <mat...@ce...> - 2005-09-07 11:19:33

_____________________________________________________________
> From: st...@ar...
> To: mat...@ce...
> CC: arc...@li...
> Date: 07.09.2005 00:33
> Subject: Re: [Archive-access-cvs] searching special characters
>
> Lukas Matejka wrote:
>
> > Searching of czech word doesn't work in WERA and in NutchWax too.
> > i put on...
> > https://sourceforge.net/tracker/index.php?func=detail&aid=1281697&group_id=118427&atid=681137
> >
> > I fixed this problem in previous version of WERA(NWA) by changing file
> > ParameterUtils.java (which i send to St.ack). Maybe it would help. (i hope:))
>
> Where's ParameterUtils Lukas? Is it in the ARC Retriever?
> St.Ack

I checked it and ParameterUtils was used by nwa (WERA), but the problem is
the same. I changed file search.jsp and it works
(harvester.nkp.cz:8080/nutchwax), just a conversion from ISO-8859-1 to
UTF-8:

String parameter = request.getParameter("query");
if (parameter == null)
    parameter = "";
String queryString = new String(parameter.getBytes("ISO-8859-1"), "UTF-8");

-lm
From: stack <st...@ar...> - 2005-09-06 22:16:49

Lukas Matejka wrote:

> Searching of czech word doesn't work in WERA and in NutchWax too.
> i put on...
> https://sourceforge.net/tracker/index.php?func=detail&aid=1281697&group_id=118427&atid=681137
>
> I fixed this problem in previous version of WERA(NWA) by changing file
> ParameterUtils.java (which i send to St.ack). Maybe it would help. (i hope:))

Where's ParameterUtils Lukas? Is it in the ARC Retriever?

St.Ack

> -lm
From: Michael S. <sta...@us...> - 2005-09-05 20:04:39

Update of /cvsroot/archive-access/archive-access/projects/nutch/bin
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv13610/bin

Added Files:
    nutch
Log Message:
* bin/nutch
    Version that calls the nutchwax merge segments tool instead of native
    nutch's.

--- NEW FILE: nutch ---
#!/bin/sh
#
# The Nutch command script
#
# Environment Variables
#
#   NUTCH_JAVA_HOME The java implementation to use.  Overrides JAVA_HOME.
#
#   NUTCH_HEAPSIZE  The maximum amount of heap to use, in MB.
#                   Default is 1000.
#
#   NUTCH_OPTS      Extra Java runtime options.
#

# resolve links - $0 may be a softlink
THIS="$0"
while [ -h "$THIS" ]; do
  ls=`ls -ld "$THIS"`
  link=`expr "$ls" : '.*-> \(.*\)$'`
  if expr "$link" : '.*/.*' > /dev/null; then
    THIS="$link"
  else
    THIS=`dirname "$THIS"`/"$link"
  fi
done

# if no args specified, show usage
if [ $# = 0 ]; then
  echo "Usage: nutch COMMAND"
  echo "where COMMAND is one of:"
  echo "  crawl        one-step crawler for intranets"
  echo "  admin        database administration, including creation"
  echo "  inject       inject new urls into the database"
  echo "  generate     generate new segments to fetch"
  echo "  fetchlist    print the fetchlist of a segment"
  echo "  fetch        fetch a segment's pages"
  echo "  parse        parse a segment's pages"
  echo "  index        run the indexer on a segment's fetcher output"
  echo "  merge        merge several segment indexes"
  echo "  dedup        remove duplicates from a set of segment indexes"
  echo "  updatedb     update db from segments after fetching"
  echo "  updatesegs   update segments with link data from the db"
  echo "  mergesegs    merge multiple segments into a single segment"
  echo "  readdb       examine arbitrary fields of the database"
  echo "  analyze      adjust database link-analysis scoring"
  echo "  prune        prune segment index(es) of unwanted content"
  echo "  segread      read, fix and dump segment data"
  echo "  segslice     append, join and slice segment data"
  echo "  server       run a search server"
  echo "  namenode     run the NDFS namenode"
  echo "  datanode     run an NDFS datanode"
  echo "  ndfs         run an NDFS admin client"
  echo "  jobtracker   run the MapReduce job Tracker node"
  echo "  tasktracker  run a MapReduce task Tracker node"
  echo " or"
  echo "  CLASSNAME    run the class named CLASSNAME"
  echo "Most commands print help when invoked w/o parameters."
  exit 1
fi

# get arguments
COMMAND=$1
shift

# some directories
THIS_DIR=`dirname "$THIS"`
NUTCH_HOME=`cd "$THIS_DIR/.." ; pwd`

# some Java parameters
if [ "$NUTCH_JAVA_HOME" != "" ]; then
  echo "run java in $NUTCH_JAVA_HOME"
  JAVA_HOME=$NUTCH_JAVA_HOME
fi

if [ "$JAVA_HOME" = "" ]; then
  echo "Error: JAVA_HOME is not set."
  exit 1
fi

JAVA=$JAVA_HOME/bin/java
JAVA_HEAP_MAX=-Xmx1000m

# check envvars which might override default args
if [ "$NUTCH_HEAPSIZE" != "" ]; then
  echo "run with heapsize $NUTCH_HEAPSIZE"
  JAVA_HEAP_MAX="-Xmx""$NUTCH_HEAPSIZE""m"
  echo $JAVA_HEAP_MAX
fi

# CLASSPATH initially contains $NUTCH_CONF_DIR, or defaults to $NUTCH_HOME/conf
CLASSPATH=${NUTCH_CONF_DIR:=$NUTCH_HOME/conf}

# for developers, add Nutch classes to CLASSPATH
if [ -d "$NUTCH_HOME/build/classes" ]; then
  CLASSPATH=${CLASSPATH}:$NUTCH_HOME/build/classes
fi
if [ -d "$NUTCH_HOME/build/plugins" ]; then
  CLASSPATH=${CLASSPATH}:$NUTCH_HOME/build
fi
if [ -d "$NUTCH_HOME/build/test/classes" ]; then
  CLASSPATH=${CLASSPATH}:$NUTCH_HOME/build/test/classes
fi

# so that filenames w/ spaces are handled correctly in loops below
IFS=

# for releases, add Nutch jar to CLASSPATH
for f in $NUTCH_HOME/nutch-*.jar; do
  CLASSPATH=${CLASSPATH}:$f;
done

# add plugins to classpath
if [ -d "$NUTCH_HOME/plugins" ]; then
  CLASSPATH=${CLASSPATH}:$NUTCH_HOME
fi

# add libs to CLASSPATH
for f in $NUTCH_HOME/lib/*.jar; do
  CLASSPATH=${CLASSPATH}:$f;
done
for f in $NUTCH_HOME/lib/jettyext/*.jar; do
  CLASSPATH=${CLASSPATH}:$f;
done

# restore ordinary behaviour
unset IFS

# figure out which class to run
if [ "$COMMAND" = "crawl" ] ; then
  CLASS=org.apache.nutch.tools.CrawlTool
elif [ "$COMMAND" = "admin" ] ; then
  CLASS=org.apache.nutch.tools.WebDBAdminTool
elif [ "$COMMAND" = "inject" ] ; then
  CLASS=org.apache.nutch.db.WebDBInjector
elif [ "$COMMAND" = "generate" ] ; then
  CLASS=org.apache.nutch.tools.FetchListTool
elif [ "$COMMAND" = "fetchlist" ] ; then
  CLASS=org.apache.nutch.pagedb.FetchListEntry
elif [ "$COMMAND" = "fetch" ] ; then
  CLASS=org.apache.nutch.fetcher.Fetcher
elif [ "$COMMAND" = "parse" ] ; then
  CLASS=org.apache.nutch.tools.ParseSegment
elif [ "$COMMAND" = "index" ] ; then
  CLASS=org.apache.nutch.indexer.IndexSegment
elif [ "$COMMAND" = "merge" ] ; then
  CLASS=org.apache.nutch.indexer.IndexMerger
elif [ "$COMMAND" = "dedup" ] ; then
  CLASS=org.apache.nutch.indexer.DeleteDuplicates
elif [ "$COMMAND" = "updatedb" ] ; then
  CLASS=org.apache.nutch.tools.UpdateDatabaseTool
elif [ "$COMMAND" = "updatesegs" ] ; then
  CLASS=org.apache.nutch.tools.UpdateSegmentsFromDb
elif [ "$COMMAND" = "mergesegs" ] ; then
  # Copy over the nutchwax version of segment merge.
  # It will work w/ segments made by nutchwax. Also
  # does not do a merge.
  CLASS=org.archive.access.nutch.NutchwaxSegmentMergeTool
elif [ "$COMMAND" = "readdb" ] ; then
  CLASS=org.apache.nutch.db.WebDBReader
elif [ "$COMMAND" = "prune" ] ; then
  CLASS=org.apache.nutch.tools.PruneIndexTool
elif [ "$COMMAND" = "segread" ] ; then
  CLASS=org.apache.nutch.segment.SegmentReader
elif [ "$COMMAND" = "segslice" ] ; then
  CLASS=org.apache.nutch.segment.SegmentSlicer
elif [ "$COMMAND" = "analyze" ] ; then
  CLASS=org.apache.nutch.tools.LinkAnalysisTool
elif [ "$COMMAND" = "server" ] ; then
  CLASS='org.apache.nutch.searcher.DistributedSearch$Server'
elif [ "$COMMAND" = "namenode" ] ; then
  CLASS='org.apache.nutch.ndfs.NDFS$NameNode'
elif [ "$COMMAND" = "datanode" ] ; then
  CLASS='org.apache.nutch.ndfs.NDFS$DataNode'
elif [ "$COMMAND" = "ndfs" ] ; then
  CLASS=org.apache.nutch.fs.TestClient
elif [ "$COMMAND" = "jobtracker" ] ; then
  CLASS=org.apache.nutch.mapReduce.JobTracker
elif [ "$COMMAND" = "tasktracker" ] ; then
  CLASS=org.apache.nutch.mapReduce.TaskTracker
else
  CLASS=$COMMAND
fi

# cygwin path translation
if expr `uname` : 'CYGWIN*' > /dev/null; then
  CLASSPATH=`cygpath -p -w "$CLASSPATH"`
fi

# run it
exec "$JAVA" $JAVA_HEAP_MAX $NUTCH_OPTS -classpath "$CLASSPATH" $CLASS "$@"
From: Lukas M. <mat...@ce...> - 2005-09-04 20:48:55

Searching of czech word doesn't work in WERA and in NutchWax too.
i put on...
https://sourceforge.net/tracker/index.php?func=detail&aid=1281697&group_id=118427&atid=681137

I fixed this problem in previous version of WERA(NWA) by changing file
ParameterUtils.java (which i send to St.ack). Maybe it would help. (i hope:))

-lm
From: Michael S. <sta...@us...> - 2005-09-02 01:08:34

Update of /cvsroot/archive-access/archive-access/projects/nutch/bin
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv25769/bin

Modified Files:
    arcs2segs.sh indexarcs.sh
Log Message:
Make mergesegs work with our segments by providing our own version of
SegmentMergeTool and our own version of nutch script that invokes our tool
instead of standard nutch's.
* .classpath
    Changed the nutch jar to refer to 0.7 release.
* maven.xml
    Copy over the nutch bins first then ours. Overwrite. This way our
    version of nutch script sits on top of theirs.
* project.properties
* project.xml
    Reference lucene.
* bin/arcs2segs.sh
* bin/indexarcs.sh
    Add in setting of logging level.

Index: arcs2segs.sh
===================================================================
RCS file: /cvsroot/archive-access/archive-access/projects/nutch/bin/arcs2segs.sh,v
retrieving revision 1.4
retrieving revision 1.5
diff -C2 -d -r1.4 -r1.5
*** arcs2segs.sh  9 Aug 2005 01:00:25 -0000  1.4
--- arcs2segs.sh  2 Sep 2005 01:08:18 -0000  1.5
***************
*** 2,17 ****

  # Check that we got right arguments.
! usage="$0 DIR_OF_ARCS DIR_FOR_SEGMENTS COLLECTION_NAME [#ARCS]"
! if [ $# -lt 3 ]
  then
      echo $usage
      exit 1
  fi
! if [ $# -gt 4 ]
  then
      echo $usage
      exit 1
  fi
! queue=$1
  if [ ! -d $queue ]
  then
--- 2,18 ----

  # Check that we got right arguments.
! usage="$0 LOG_LEVEL DIR_OF_ARCS DIR_FOR_SEGMENTS COLLECTION_NAME [#ARCS]"
! if [ $# -lt 4 ]
  then
      echo $usage
      exit 1
  fi
! if [ $# -gt 5 ]
  then
      echo $usage
      exit 1
  fi
! level=$1
! queue=$2
  if [ ! -d $queue ]
  then
***************
*** 20,29 ****
      exit 1
  fi
! segments=$2
! collection_name=$3
  arc_count=100
! if [ ! -z "$4" ]
  then
!     arc_count="$4"
  fi
  if [ ! -d $segments ]
--- 21,30 ----
      exit 1
  fi
! segments=$3
! collection_name=$4
  arc_count=100
! if [ ! -z "$5" ]
  then
!     arc_count="$5"
  fi
  if [ ! -d $segments ]
***************
*** 42,46 ****
  fi
  seg=$segments/${hostname_prefix}`/bin/date +%F-%H%M%S`
! $arc2seg $seg $collection_name $arcs
  mkdir -p $seg/arcs
  mv $arcs $seg/arcs
--- 43,47 ----
  fi
  seg=$segments/${hostname_prefix}`/bin/date +%F-%H%M%S`
! $arc2seg -logLevel ${level} $seg $collection_name $arcs
  mkdir -p $seg/arcs
  mv $arcs $seg/arcs

Index: indexarcs.sh
===================================================================
RCS file: /cvsroot/archive-access/archive-access/projects/nutch/bin/indexarcs.sh,v
retrieving revision 1.9
retrieving revision 1.10
diff -C2 -d -r1.9 -r1.10
*** indexarcs.sh  9 Aug 2005 01:00:25 -0000  1.9
--- indexarcs.sh  2 Sep 2005 01:08:18 -0000  1.10
***************
*** 56,59 ****
--- 56,60 ----
      echo " (Does not turn-off cmdline checking). Optional."
      echo " -a How many arcs to do per segment. Default is 100."
+     echo " -l Java logging level. Default: info. Options: info, warning, etc."
      echo "This runs through all steps nutch indexing ARCs so their content is"
      echo "searchable by nutch. This script is for use against small collections"
***************
*** 143,147 ****
          return
      fi
!     ${BASEDIR}/bin/arcs2segs.sh ${DATADIR}/queue/ \
          ${DATADIR}/segments ${COLLECTION_NAME} ${arcs_per_segment}
  }
--- 144,148 ----
          return
      fi
!     ${BASEDIR}/bin/arcs2segs.sh ${level} ${DATADIR}/queue/ \
          ${DATADIR}/segments ${COLLECTION_NAME} ${arcs_per_segment}
  }
***************
*** 198,204 ****
  noop=
  expert=
  arcname_filter="*.arc.gz"
  arcs_per_segment=100
! while getopts "hnte:m:s:d:c:f:a:" opt
  do
      if [ "$opt" = "?" ]
--- 199,206 ----
  noop=
  expert=
+ level="info"
  arcname_filter="*.arc.gz"
  arcs_per_segment=100
! while getopts "hnte:m:s:d:c:f:a:l:" opt
  do
      if [ "$opt" = "?" ]
***************
*** 216,224 ****
          's')
              ARCSDIR=${OPTARG}
!             if [ ! -e ${arcsdir} ]
              then
                  echo "ERROR: ${arcsdir} does not exist."
                  usage
              fi
              ;;
          'd')
--- 218,230 ----
          's')
              ARCSDIR=${OPTARG}
!             if [ ! -e ${ARCSDIR} ]
              then
                  echo "ERROR: ${arcsdir} does not exist."
                  usage
              fi
+             if [ `dirname ${ARCSDIR}` = '.' ]
+             then
+                 ARCSDIR=`pwd`/`basename ${ARCSDIR}`
+             fi
              ;;
          'd')
***************
*** 229,232 ****
--- 235,242 ----
                  usage
              fi
+             if [ `dirname ${DATADIR}` = '.' ]
+             then
+                 DATADIR=`pwd`/`basename ${DATADIR}`
+             fi
              ;;
          'c')
***************
*** 249,252 ****
--- 259,266 ----
              arcs_per_segment=${OPTARG}
              ;;
+         'l')
+             # Java logging level.
+             level=${OPTARG}
+             ;;
          *)
              usage
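With the new -l flag wired through, a quieter indexing run might look like this (reusing the paths from the earlier report; a sketch, not from the commit):

    ./bin/indexarcs.sh -l warning -s /home/nwa/nutchwax/archive \
        -d /home/nwa/nutchwax/data -c test

The level string is handed to arcs2segs.sh as its first argument and then on to the arc2seg tool's -logLevel option, so it governs the Java-side logging, not the shell scripts' own echo output.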
From: Michael S. <sta...@us...> - 2005-09-02 01:08:34

Update of /cvsroot/archive-access/archive-access/projects/nutch
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv25769

Modified Files:
    .classpath maven.xml project.properties project.xml
Log Message:
Make mergesegs work with our segments by providing our own version of
SegmentMergeTool and our own version of nutch script that invokes our tool
instead of standard nutch's.
* .classpath
    Changed the nutch jar to refer to 0.7 release.
* maven.xml
    Copy over the nutch bins first then ours. Overwrite. This way our
    version of nutch script sits on top of theirs.
* project.properties
* project.xml
    Reference lucene.
* bin/arcs2segs.sh
* bin/indexarcs.sh
    Add in setting of logging level.

Index: maven.xml
===================================================================
RCS file: /cvsroot/archive-access/archive-access/projects/nutch/maven.xml,v
retrieving revision 1.8
retrieving revision 1.9
diff -C2 -d -r1.8 -r1.9
*** maven.xml  29 Jul 2005 22:12:23 -0000  1.8
--- maven.xml  2 Sep 2005 01:08:18 -0000  1.9
***************
*** 91,100 ****
      file="${basedir}/conf/nutch-site.xml.all" filtering="true" />
! <!--Fill the bin dir.-->
! <copy todir="${maven.dist.bin.assembly.dir}/bin" filtering="true">
!     <fileset dir="${basedir}/bin">
          <include name="*"/>
      </fileset>
!     <fileset dir="${nutch.dir}/bin">
          <include name="*"/>
      </fileset>
--- 91,103 ----
      file="${basedir}/conf/nutch-site.xml.all" filtering="true" />
! <!--Fill the bin dir. Fill from nutch first. Then from nutchwax
! because we want to overwrite the nutch script with the nutchwax
! version.-->
! <copy todir="${maven.dist.bin.assembly.dir}/bin"
!     filtering="true" overwrite="true" >
!     <fileset dir="${nutch.dir}/bin">
          <include name="*"/>
      </fileset>
!     <fileset dir="${basedir}/bin">
          <include name="*"/>
      </fileset>

Index: project.properties
===================================================================
RCS file: /cvsroot/archive-access/archive-access/projects/nutch/project.properties,v
retrieving revision 1.12
retrieving revision 1.13
diff -C2 -d -r1.12 -r1.13
*** project.properties  19 Aug 2005 21:38:36 -0000  1.12
--- project.properties  2 Sep 2005 01:08:18 -0000  1.13
***************
*** 18,22 ****
  # Local jars to add to classpath.
  maven.jar.override = on
! maven.jar.corenutch = ${basedir}/nutch/build/nutch-0.7-dev.jar
  maven.jar.arc = ${basedir}/lib/arc-1.5.1-200508191341.jar
  maven.jar.servlet-api = ${basedir}/nutch/lib/servlet-api.jar
--- 18,23 ----
  # Local jars to add to classpath.
  maven.jar.override = on
! maven.jar.corenutch = ${basedir}/nutch/build/nutch-0.7.jar
! maven.jar.lucene = ${basedir}/nutch/lib/lucene-1.9-rc1-dev.jar
  maven.jar.arc = ${basedir}/lib/arc-1.5.1-200508191341.jar
  maven.jar.servlet-api = ${basedir}/nutch/lib/servlet-api.jar

Index: project.xml
===================================================================
RCS file: /cvsroot/archive-access/archive-access/projects/nutch/project.xml,v
retrieving revision 1.18
retrieving revision 1.19
diff -C2 -d -r1.18 -r1.19
*** project.xml  2 Aug 2005 22:11:08 -0000  1.18
--- project.xml  2 Sep 2005 01:08:18 -0000  1.19
***************
*** 174,177 ****
--- 174,189 ----
      </dependency>
      <dependency>
+       <id>lucene</id>
+       <version>1_9-rc1-dev</version>
+       <url>http://nutch.org/</url>
+       <properties>
+         <war.bundle>true</war.bundle>
+         <description>Search library from nutch.
+         </description>
+         <license>Apache 2.0
+         http://www.apache.org/licenses/LICENSE-2.0</license>
+       </properties>
+     </dependency>
+     <dependency>
        <id>servlet-api</id>
        <version>2.3</version>

Index: .classpath
===================================================================
RCS file: /cvsroot/archive-access/archive-access/projects/nutch/.classpath,v
retrieving revision 1.10
retrieving revision 1.11
diff -C2 -d -r1.10 -r1.11
*** .classpath  19 Aug 2005 21:38:35 -0000  1.10
--- .classpath  2 Sep 2005 01:08:18 -0000  1.11
***************
*** 7,11 ****
  <classpathentry kind="con" path="org.eclipse.jdt.launching.JRE_CONTAINER"/>
  <classpathentry kind="lib" path="nutch/lib/lucene-1.9-rc1-dev.jar"/>
! <classpathentry kind="lib" path="nutch/build/nutch-0.7-dev.jar"/>
  <classpathentry kind="lib" path="lib/arc-1.5.1-200508191341.jar"/>
  <classpathentry kind="lib" path="lib/commons-httpclient-3.0-alpha2.jar"/>
--- 7,11 ----
  <classpathentry kind="con" path="org.eclipse.jdt.launching.JRE_CONTAINER"/>
  <classpathentry kind="lib" path="nutch/lib/lucene-1.9-rc1-dev.jar"/>
! <classpathentry kind="lib" path="nutch/build/nutch-0.7.jar"/>
  <classpathentry kind="lib" path="lib/arc-1.5.1-200508191341.jar"/>
  <classpathentry kind="lib" path="lib/commons-httpclient-3.0-alpha2.jar"/>
***************
*** 14,17 ****
--- 14,19 ----
  <classpathentry kind="lib" path="nutch/lib/servlet-api.jar"/>
  <classpathentry sourcepath="ECLIPSE_HOME/plugins/org.eclipse.jdt.source_3.1.0/src/org.junit_3.8.1/junitsrc.zip" kind="var" path="JUNIT_HOME/junit.jar"/>
+ <classpathentry kind="lib" path="nutch/conf"/>
+ <classpathentry kind="lib" path="nutch/lib/jakarta-oro-2.0.7.jar"/>
  <classpathentry kind="output" path="target"/>
  </classpath>
From: Michael S. <sta...@us...> - 2005-09-02 01:08:34

Update of /cvsroot/archive-access/archive-access/projects/nutch/src/java/org/archive/access/nutch
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv25769/src/java/org/archive/access/nutch

Added Files:
    NutchwaxSegmentMergeTool.java
Log Message:
Make mergesegs work with our segments by providing our own version of
SegmentMergeTool and our own version of nutch script that invokes our tool
instead of standard nutch's.
* .classpath
    Changed the nutch jar to refer to 0.7 release.
* maven.xml
    Copy over the nutch bins first then ours. Overwrite. This way our
    version of nutch script sits on top of theirs.
* project.properties
* project.xml
    Reference lucene.
* bin/arcs2segs.sh
* bin/indexarcs.sh
    Add in setting of logging level.

--- NEW FILE: NutchwaxSegmentMergeTool.java ---
/**
 * Copyright 2005 The Apache Software Foundation
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package org.archive.access.nutch;

import java.io.File;
import java.io.FileFilter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Vector;
import java.util.logging.Logger;

import org.apache.nutch.fetcher.FetcherOutput;
import org.apache.nutch.indexer.IndexSegment;
import org.apache.nutch.io.MD5Hash;
import org.apache.nutch.fs.*;
import org.apache.nutch.parse.ParseData;
import org.apache.nutch.parse.ParseText;
import org.apache.nutch.protocol.Content;
import org.apache.nutch.segment.SegmentReader;
import org.apache.nutch.segment.SegmentWriter;
import org.apache.nutch.util.LogFormatter;
import org.apache.nutch.util.NutchConf;

import org.apache.lucene.analysis.WhitespaceAnalyzer;
import org.apache.lucene.document.DateField;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermDocs;
import org.apache.lucene.index.TermEnum;

/**
 * This class cleans up accumulated segments data, and merges them into a
 * single (or optionally multiple) segment(s), with no duplicates in it.
 *
 * <p>
 * COPIED FROM NUTCH SO I CAN PUT IN PLACE ALTERNATE TOOLS.
 * St.Ack
 * <p>
 * There are no prerequisites for its correct operation except for a set of
 * already fetched segments (they don't have to contain parsed content, only
 * fetcher output is required). This tool does not use DeleteDuplicates, but
 * creates its own "master" index of all pages in all segments. Then it walks
 * sequentially through this index and picks up only most recent versions of
 * pages for every unique value of url or hash.
 * </p>
 * <p>If some of the input segments are corrupted, this tool will attempt to
 * repair them, using
 * {@link org.apache.nutch.segment.SegmentReader#fixSegment(NutchFileSystem, File, boolean, boolean, boolean, boolean)} method.</p>
 * <p>Output segment can be optionally split on the fly into several segments
 * of fixed length.</p>
 * <p>
 * The newly created segment(s) can be then optionally indexed, so that it can
 * be either merged with more new segments, or used for searching as it is.
 * </p>
 * <p>
 * Old segments may be optionally removed, because all needed data has already
 * been copied to the new merged segment. NOTE: this tool will remove also all
 * corrupted input segments, which are not useable anyway - however, this
 * option may be dangerous if you inadvertently included non-segment
 * directories as input...</p>
 * <p>
 * You may want to run SegmentMergeTool instead of following the manual
 * procedures, with all options turned on, i.e. to merge segments into the
 * output segment(s), index it, and then delete the original segments data.
 * </p>
 *
 * @author Andrzej Bialecki <ab...@ge...>
 */
public class NutchwaxSegmentMergeTool implements Runnable {
  public static final Logger LOG =
    LogFormatter.getLogger("org.apache.nutch.tools.NutchwaxSegmentMergeTool");

  /** Log progress update every LOG_STEP items. */
  public static int LOG_STEP = 20000;
  /** Temporary de-dup index size. Larger indexes tend to slow down indexing.
   * Too many indexes slow down the subsequent index merging. It's a tradeoff
   * value... */
  public static int INDEX_SIZE = 250000;
  public static int INDEX_MERGE_FACTOR = 30;
  public static int INDEX_MIN_MERGE_DOCS = 100;

  private boolean boostByLinkCount =
    NutchConf.get().getBoolean("indexer.boost.by.link.count", false);
  private float scorePower =
    NutchConf.get().getFloat("indexer.score.power", 0.5f);
  private NutchFileSystem nfs = null;
  private File[] segments = null;
  private int stage = SegmentMergeStatus.STAGE_OPENING;
  private long totalRecords = 0L;
  private long processedRecords = 0L;
  private long start = 0L;
  private long maxCount = Long.MAX_VALUE;
  private File output = null;
  private List segdirs = null;
  private List allsegdirs = null;
  private boolean runIndexer = false;
  private boolean delSegs = false;
  private HashMap readers = new HashMap();

  /**
   * Create a NutchwaxSegmentMergeTool.
   * @param nfs filesystem
   * @param segments list of input segments
   * @param output output directory, where output segments will be created
   * @param maxCount maximum number of records per output segment. If this
   * value is 0, then the default value {@link Long#MAX_VALUE} is used.
   * @param runIndexer run indexer on output segment(s)
   * @param delSegs delete input segments when finished
   * @throws Exception
   */
  public NutchwaxSegmentMergeTool(NutchFileSystem nfs, File[] segments,
      File output, long maxCount, boolean runIndexer, boolean delSegs)
      throws Exception {
    this.nfs = nfs;
    this.segments = segments;
    this.runIndexer = runIndexer;
    this.delSegs = delSegs;
    if (maxCount > 0) this.maxCount = maxCount;
    allsegdirs = Arrays.asList(segments);
    this.output = output;
    if (nfs.exists(output)) {
      if (!nfs.isDirectory(output))
        throw new Exception("Output is not a directory: " + output);
    } else nfs.mkdirs(output);
  }

  public static class SegmentMergeStatus {
    public static final int STAGE_OPENING = 0;
    public static final int STAGE_MASTERIDX = 1;
    public static final int STAGE_MERGEIDX = 2;
    public static final int STAGE_DEDUP = 3;
    public static final int STAGE_WRITING = 4;
    public static final int STAGE_INDEXING = 5;
    public static final int STAGE_DELETING = 6;
    public static final String[] stages = {
      "opening input segments",
      "creating master index",
      "merging sub-indexes",
      "deduplicating",
      "writing output segment(s)",
      "indexing output segment(s)",
      "deleting input segments"
    };
    public int stage;
    public File[] inputSegments;
    public long startTime, curTime;
    public long totalRecords;
    public long processedRecords;

    public SegmentMergeStatus() {};

    public SegmentMergeStatus(int stage, File[] inputSegments, long startTime,
        long totalRecords, long processedRecords) {
      this.stage = stage;
      this.inputSegments = inputSegments;
      this.startTime = startTime;
      this.curTime = System.currentTimeMillis();
      this.totalRecords = totalRecords;
      this.processedRecords = processedRecords;
    }
  }

  public SegmentMergeStatus getStatus() {
    SegmentMergeStatus status = new SegmentMergeStatus(stage, segments, start,
      totalRecords, processedRecords);
    return status;
  }

  /** Run the tool, periodically reporting progress. */
  public void run() {
    start = System.currentTimeMillis();
    stage = SegmentMergeStatus.STAGE_OPENING;
    long delta;
    LOG.info("* Opening " + allsegdirs.size() + " segments:");
    try {
      segdirs = new ArrayList();
      // open all segments
      for (int i = 0; i < allsegdirs.size(); i++) {
        File dir = (File) allsegdirs.get(i);
        SegmentReader sr = null;
        try {
          // try to autofix it if corrupted...
          sr = new SegmentReader(nfs, dir, false, true, true, false);
        } catch (Exception e) {
          // this segment is hosed beyond repair, don't use it
          continue;
        }
        segdirs.add(dir);
        totalRecords += sr.size;
        LOG.info(" - segment " + dir.getName() + ": " + sr.size +
          " records.");
        readers.put(dir.getName(), sr);
      }
      long total = totalRecords;
      LOG.info("* TOTAL " + total + " input records in " + segdirs.size() +
        " segments.");
      LOG.info("* Creating master index...");
      stage = SegmentMergeStatus.STAGE_MASTERIDX;
      // XXX Note that Lucene indexes don't work with NutchFileSystem for now.
      // XXX For now always assume LocalFileSystem here...
      Vector masters = new Vector();
      File fsmtIndexDir = new File(output, ".fastmerge_index");
      File masterDir = new File(fsmtIndexDir, "0");
      if (!masterDir.mkdirs()) {
        LOG.severe("Could not create a master index dir: " + masterDir);
        return;
      }
      masters.add(masterDir);
      IndexWriter iw =
        new IndexWriter(masterDir, new WhitespaceAnalyzer(), true);
      iw.setUseCompoundFile(false);
      iw.mergeFactor = INDEX_MERGE_FACTOR;
      iw.minMergeDocs = INDEX_MIN_MERGE_DOCS;
      long s1 = System.currentTimeMillis();
      Iterator it = readers.values().iterator();
      processedRecords = 0L;
      delta = System.currentTimeMillis();
      while (it.hasNext()) {
        SegmentReader sr = (SegmentReader) it.next();
        String name = sr.segmentDir.getName();
        FetcherOutput fo = new FetcherOutput();
        for (long i = 0; i < sr.size; i++) {
          try {
            if (!sr.get(i, fo, null, null, null)) break;

            Document doc = new Document();

            // compute boost
            float boost = IndexSegment.calculateBoost(
              fo.getFetchListEntry().getPage().getScore(),
              scorePower, boostByLinkCount, fo.getAnchors().length);
            doc.add(new Field("sd", name + "|" + i, true, false, false));
            doc.add(new Field("uh",
              MD5Hash.digest(fo.getUrl().toString()).toString(),
              true, true, false));
            doc.add(new Field("ch", fo.getMD5Hash().toString(),
              true, true, false));
            doc.add(new Field("time",
              DateField.timeToString(fo.getFetchDate()),
              true, false, false));
            doc.add(new Field("score", boost + "", true, false, false));
            doc.add(new Field("ul", fo.getUrl().toString().length() + "",
              true, false, false));
            iw.addDocument(doc);
            processedRecords++;
            if (processedRecords > 0 && (processedRecords % LOG_STEP == 0)) {
              LOG.info(" Processed " + processedRecords + " records (" +
                (float)(LOG_STEP * 1000)/(float)(System.currentTimeMillis() - delta) +
                " rec/s)");
              delta = System.currentTimeMillis();
            }
            if (processedRecords > 0 && (processedRecords % INDEX_SIZE == 0)) {
              iw.optimize();
              iw.close();
              LOG.info(" - creating next subindex...");
              masterDir = new File(fsmtIndexDir, "" + masters.size());
              if (!masterDir.mkdirs()) {
                LOG.severe("Could not create a master index dir: " +
                  masterDir);
                return;
              }
              masters.add(masterDir);
              iw = new IndexWriter(masterDir, new WhitespaceAnalyzer(), true);
              iw.setUseCompoundFile(false);
              iw.mergeFactor = INDEX_MERGE_FACTOR;
              iw.minMergeDocs = INDEX_MIN_MERGE_DOCS;
            }
          } catch (Throwable t) {
            // we can assume the data is invalid from now on - break here
            LOG.info(" - segment " + name + " truncated to " + (i + 1) +
              " records");
            break;
          }
        }
      }
      iw.optimize();
      LOG.info("* Creating index took " + (System.currentTimeMillis() - s1) +
        " ms");
      s1 = System.currentTimeMillis();
      // merge all other indexes using the latest IndexWriter (still open):
      if (masters.size() > 1) {
        LOG.info(" - merging subindexes...");
        stage = SegmentMergeStatus.STAGE_MERGEIDX;
        IndexReader[] ireaders = new IndexReader[masters.size() - 1];
        for (int i = 0; i < masters.size() - 1; i++)
          ireaders[i] = IndexReader.open((File)masters.get(i));
        iw.addIndexes(ireaders);
        for (int i = 0; i < masters.size() - 1; i++) {
          ireaders[i].close();
          FileUtil.fullyDelete((File)masters.get(i));
        }
      }
      iw.close();
      LOG.info("* Optimizing index took " + (System.currentTimeMillis() - s1) +
        " ms");
      LOG.info("* Skipping deduplicate step...");
      // LOG.info("* Removing duplicate entries...");
      // stage = SegmentMergeStatus.STAGE_DEDUP;
      IndexReader ir = IndexReader.open(masterDir);
      // int i = 0;
      // long cnt = 0L;
      // processedRecords = 0L;
      // s1 = System.currentTimeMillis();
      // delta = s1;
      // TermEnum te = ir.terms();
      // while(te.next()) {
      //   Term t = te.term();
      //   if (t == null) continue;
      //   if
(!(t.field().equals("ch") || t.field().equals("uh"))) continue; // cnt++; // processedRecords = cnt / 2; // if (cnt > 0 && (cnt % (LOG_STEP * 2) == 0)) { // LOG.info(" Processed " + processedRecords + " records (" + // (float)(LOG_STEP * 1000)/(float)(System.currentTimeMillis() - delta) + " rec/s)"); // delta = System.currentTimeMillis(); // } // // Enumerate all docs with the same URL hash or content hash // TermDocs td = ir.termDocs(t); // if (td == null) continue; // if (t.field().equals("uh")) { // // Keep only the latest version of the document with // // the same url hash. Note: even if the content // // hash is identical, other metadata may be different, so even // // in this case it makes sense to keep the latest version. // int id = -1; // String time = null; // Document doc = null; // while (td.next()) { // int docid = td.doc(); // if (!ir.isDeleted(docid)) { // doc = ir.document(docid); // if (time == null) { // time = doc.get("time"); // id = docid; // continue; // } // String dtime = doc.get("time"); // // "time" is a DateField, and can be compared lexicographically // if (dtime.compareTo(time) > 0) { // if (id != -1) { // ir.delete(id); // } // time = dtime; // id = docid; // } else { // ir.delete(docid); // } // } // } // } else if (t.field().equals("ch")) { // // Keep only the version of the document with // // the highest score, and then with the shortest url. // int id = -1; // int ul = 0; // float score = 0.0f; // Document doc = null; // while (td.next()) { // int docid = td.doc(); // if (!ir.isDeleted(docid)) { // doc = ir.document(docid); // if (ul == 0) { // try { // ul = Integer.parseInt(doc.get("ul")); // score = Float.parseFloat(doc.get("score")); // } catch (Exception e) {}; // id = docid; // continue; // } // int dul = 0; // float dscore = 0.0f; // try { // dul = Integer.parseInt(doc.get("ul")); // dscore = Float.parseFloat(doc.get("score")); // } catch (Exception e) {}; // int cmp = Float.compare(dscore, score); // if (cmp == 0) { // // equal scores, select the one with shortest url // if (dul < ul) { // if (id != -1) { // ir.delete(id); // } // ul = dul; // id = docid; // } else { // ir.delete(docid); // } // } else if (cmp < 0) { // ir.delete(docid); // } else { // if (id != -1) { // ir.delete(id); // } // ul = dul; // id = docid; // } // } // } // } // } // // // // keep the IndexReader open... 
// // // // LOG.info("* Deduplicating took " + (System.currentTimeMillis() - s1) + " ms"); stage = SegmentMergeStatus.STAGE_WRITING; processedRecords = 0L; Vector outDirs = new Vector(); File outDir = new File(output, SegmentWriter.getNewSegmentName()); outDirs.add(outDir); LOG.info("* Merging all segments into " + output.getName()); s1 = System.currentTimeMillis(); delta = s1; nfs.mkdirs(outDir); SegmentWriter sw = new SegmentWriter(nfs, outDir, false, true, false, true, true); LOG.fine(" - opening first output segment in " + outDir.getName()); FetcherOutput fo = new FetcherOutput(); Content co = new Content(); ParseText pt = new ParseText(); ParseData pd = new ParseData(); int outputCnt = 0; for (int n = 0; n < ir.maxDoc(); n++) { if (ir.isDeleted(n)) { //System.out.println("-del"); continue; } Document doc = ir.document(n); String segDoc = doc.get("sd"); int idx = segDoc.indexOf('|'); String segName = segDoc.substring(0, idx); String docName = segDoc.substring(idx + 1); SegmentReader sr = (SegmentReader) readers.get(segName); long docid; try { docid = Long.parseLong(docName); } catch (Exception e) { continue; } try { // get data from the reader sr.get(docid, fo, co, pt, pd); } catch (Throwable thr) { // don't break the loop, because only one of the segments // may be corrupted... LOG.fine(" - corrupt record no. " + docid + " in segment " + sr.segmentDir.getName() + " - skipping."); continue; } sw.append(fo, co, pt, pd); outputCnt++; processedRecords++; if (processedRecords > 0 && (processedRecords % LOG_STEP == 0)) { LOG.info(" Processed " + processedRecords + " records (" + (float)(LOG_STEP * 1000)/(float)(System.currentTimeMillis() - delta) + " rec/s)"); delta = System.currentTimeMillis(); } if (processedRecords % maxCount == 0) { sw.close(); outDir = new File(output, SegmentWriter.getNewSegmentName()); LOG.fine(" - starting next output segment in " + outDir.getName()); nfs.mkdirs(outDir); sw = new SegmentWriter(nfs, outDir, true); outDirs.add(outDir); } } LOG.info("* Merging took " + (System.currentTimeMillis() - s1) + " ms"); ir.close(); sw.close(); FileUtil.fullyDelete(fsmtIndexDir); for (Iterator iter = readers.keySet().iterator(); iter.hasNext();) { SegmentReader sr = (SegmentReader) readers.get(iter.next()); sr.close(); } if (runIndexer) { stage = SegmentMergeStatus.STAGE_INDEXING; totalRecords = outDirs.size(); processedRecords = 0L; LOG.info("* Creating new segment index(es)..."); File workingDir = new File(output, "indexsegment-workingdir"); for (int k = 0; k < outDirs.size(); k++) { processedRecords++; if (workingDir.exists()) { FileUtil.fullyDelete(workingDir); } IndexSegment indexer = new IndexSegment(nfs, Integer.MAX_VALUE, (File)outDirs.get(k), workingDir); indexer.indexPages(); FileUtil.fullyDelete(workingDir); } } if (delSegs) { // This deletes also all corrupt segments, which are // unusable anyway stage = SegmentMergeStatus.STAGE_DELETING; totalRecords = allsegdirs.size(); processedRecords = 0L; LOG.info("* Deleting old segments..."); for (int k = 0; k < allsegdirs.size(); k++) { processedRecords++; FileUtil.fullyDelete((File) allsegdirs.get(k)); } } delta = System.currentTimeMillis() - start; float eps = (float) total / (float) (delta / 1000); LOG.info("Finished NutchwaxSegmentMergeTool: INPUT: " + total + " -> OUTPUT: " + outputCnt + " entries in " + ((float) delta / 1000f) + " s (" + eps + " entries/sec)."); } catch (Exception e) { e.printStackTrace(); LOG.severe(e.getMessage()); } } public static void main(String[] args) throws Exception { if (args.length < 1) { 
System.err.println("Too few arguments.\n"); usage(); System.exit(-1); } NutchFileSystem nfs = NutchFileSystem.parseArgs(args, 0); boolean runIndexer = false; boolean delSegs = false; long maxCount = Long.MAX_VALUE; String segDir = null; File output = null; Vector dirs = new Vector(); for (int i = 0; i < args.length; i++) { if (args[i] == null) continue; if (args[i].equals("-o")) { if (args.length > i + 1) { output = new File(args[++i]); continue; } else { LOG.severe("Required value of '-o' argument missing.\n"); usage(); return; } } else if (args[i].equals("-i")) { runIndexer = true; } else if (args[i].equals("-cm")) { LOG.warning("'-cm' option obsolete - ignored."); } else if (args[i].equals("-max")) { String cnt = args[++i]; try { maxCount = Long.parseLong(cnt); } catch (Exception e) { LOG.warning("Invalid count '" + cnt + "', setting to Long.MAX_VALUE."); } } else if (args[i].equals("-ds")) { delSegs = true; } else if (args[i].equals("-dir")) { segDir = args[++i]; } else dirs.add(new File(args[i])); } if (segDir != null) { File sDir = new File(segDir); if (!sDir.exists() || !sDir.isDirectory()) { LOG.warning("Invalid path: " + sDir); } else { File[] files = sDir.listFiles(new FileFilter() { public boolean accept(File f) { return f.isDirectory(); } }); if (files != null && files.length > 0) { for (int i = 0; i < files.length; i++) dirs.add(files[i]); } } } if (dirs.size() == 0) { LOG.severe("No input segments."); return; } if (output == null) output = ((File)dirs.get(0)).getParentFile(); NutchwaxSegmentMergeTool st = new NutchwaxSegmentMergeTool(nfs, (File[])dirs.toArray(new File[0]), output, maxCount, runIndexer, delSegs); st.run(); } private static void usage() { System.err.println("NutchwaxSegmentMergeTool (-local | -nfs ...) (-dir <input_segments_dir> | seg1 seg2 ...) [-o <output_segments_dir>] [-max count] [-i] [-ds]"); System.err.println("\t-dir <input_segments_dir>\tpath to directory containing input segments"); System.err.println("\tseg1 seg2 seg3\t\tindividual paths to input segments"); System.err.println("\t-o <output_segment_dir>\t(optional) path to directory which will\n\t\t\t\tcontain output segment(s).\n\t\t\tNOTE: If not present, the original segments path will be used."); System.err.println("\t-max count\t(optional) output multiple segments, each with maximum 'count' entries"); System.err.println("\t-i\t\t(optional) index the output segment when finished merging."); System.err.println("\t-ds\t\t(optional) delete the original input segments when finished."); System.err.println(); } } |
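The tool above is normally launched through the NutchWAX copy of the nutch script mentioned in the log message, which dispatches to a fully-qualified class name. As a rough sketch of a run matching the usage() text — the segment paths here are hypothetical, and the assumption that bin/nutch can dispatch to this class is mine, not something the commit states:

    % ./bin/nutch org.archive.access.nutch.NutchwaxSegmentMergeTool -local \
        -dir ${HOME}/nutch-data/segments \
        -o ${HOME}/nutch-data/segments-merged -i

Per the option descriptions in usage(), this would merge every segment directory found under -dir into new segment(s) under -o and index the result (-i); the input segments would be left in place because -ds is not given.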
From: Michael S. <sta...@us...> - 2005-09-01 21:22:18
|
Update of /cvsroot/archive-access/archive-access/projects/nutch/xdocs
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv11377/xdocs

Modified Files:
	requirements.xml
Log Message:
* xdocs/requirements.xml
    Removed nutch. Belongs in src doc.

Index: requirements.xml
===================================================================
RCS file: /cvsroot/archive-access/archive-access/projects/nutch/xdocs/requirements.xml,v
retrieving revision 1.4
retrieving revision 1.5
diff -C2 -d -r1.4 -r1.5
*** requirements.xml	1 Sep 2005 21:19:27 -0000	1.4
--- requirements.xml	1 Sep 2005 21:22:09 -0000	1.5
***************
*** 31,38 ****
  </section>
  <section name="Build from src Requirements" >
- <subsection name="Nutch">
- <p>Nutch 0.7 src
- </p>
- </subsection>
  <subsection name="Ant">
  <p>Tested working with version 1.6.2.
--- 31,34 ----
|
From: Michael S. <sta...@us...> - 2005-09-01 21:19:37
|
Update of /cvsroot/archive-access/archive-access/projects/nutch/xdocs
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv9693/xdocs

Modified Files:
	requirements.xml
Log Message:
* xdocs/requirements.xml
    Added nutch 0.7 to requirements.

Index: requirements.xml
===================================================================
RCS file: /cvsroot/archive-access/archive-access/projects/nutch/xdocs/requirements.xml,v
retrieving revision 1.3
retrieving revision 1.4
diff -C2 -d -r1.3 -r1.4
*** requirements.xml	29 Jul 2005 22:12:23 -0000	1.3
--- requirements.xml	1 Sep 2005 21:19:27 -0000	1.4
***************
*** 8,24 ****
  <body>
! <section name="System Runtime Requirements">
  <subsection name="JAVA">
  <p>Tested working with SUN v1.5.0_01 and 1.4.2_03.
  </p>
  </subsection>
! <subsection name="Ant">
! <p>Tested working with version 1.6.2.
! </p>
! </subsection>
  <subsection name="Tomcat">
  <p>Tested working with version 5.0.28.
  </p>
  </subsection>
  <subsection name="xpdf: pdftotext">
  <p>If parsing PDFs, you'll need <a href="http://www.foolabs.com/xpdf/">xpdf</a>
--- 8,23 ----
  <body>
! <section name="Runtime Requirements">
  <subsection name="JAVA">
  <p>Tested working with SUN v1.5.0_01 and 1.4.2_03.
  </p>
  </subsection>
!
! <subsection name="Tomcat">
  <p>Tested working with version 5.0.28.
  </p>
  </subsection>
+
  <subsection name="xpdf: pdftotext">
  <p>If parsing PDFs, you'll need <a href="http://www.foolabs.com/xpdf/">xpdf</a>
***************
*** 31,34 ****
--- 30,47 ----
  </subsection>
  </section>
+ <section name="Build from src Requirements" >
+ <subsection name="Nutch">
+ <p>Nutch 0.7 src
+ </p>
+ </subsection>
+ <subsection name="Ant">
+ <p>Tested working with version 1.6.2.
+ </p>
+ </subsection>
+ <subsection name="Maven">
+ <p>If you want to build distributions and the website, you'll need Maven.
+ </p>
+ </subsection>
+ </section>
  </body>
|
From: Michael S. <sta...@us...> - 2005-09-01 20:58:32
|
Update of /cvsroot/archive-access/archive-access/projects/nutch/xdocs
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv26986/xdocs

Modified Files:
	gettingstarted.xml
Log Message:
* xdocs/gettingstarted.xml
    Edits from Sverre.

Index: gettingstarted.xml
===================================================================
RCS file: /cvsroot/archive-access/archive-access/projects/nutch/xdocs/gettingstarted.xml,v
retrieving revision 1.8
retrieving revision 1.9
diff -C2 -d -r1.8 -r1.9
*** gettingstarted.xml	29 Jul 2005 22:12:23 -0000	1.8
--- gettingstarted.xml	1 Sep 2005 20:58:24 -0000	1.9
***************
*** 28,33 ****
  indexing step. It takes a bunch of options. To do the most basic indexing
  operation, point it at a few ARC files and let it run:
! <pre>% ./bin/indexarcs -s ${HOME}/arcs/ -d ${HOME}/nutch-data</pre>
! This will build an index for you in <code>${HOME}/nutch-data</code>.
  </p>
  <p>
--- 28,36 ----
  indexing step. It takes a bunch of options. To do the most basic indexing
  operation, point it at a few ARC files and let it run:
! <pre>% ./bin/indexarcs.sh -s ${HOME}/arcs/ -d ${HOME}/nutch-data -c COLLECTION_NAME</pre>
! This will build an index for you in <code>${HOME}/nutch-data</code> ('-c'
! names the collection the indexed content will belong to; the optional '-n'
! flag says do not run the deduplication step, which is necessary if you are
! using nutchwax with wera).
  </p>
  <p>
|
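Combining the flags described in the parenthetical above, a run intended for a WERA-backed installation might look like the following sketch; the collection name is a placeholder, and the exact spelling of the dedup-skipping flag should be confirmed against bin/indexarcs.sh, since the example command in the diff does not show it:

    % ./bin/indexarcs.sh -s ${HOME}/arcs/ -d ${HOME}/nutch-data -c mycollection -n

Here '-c mycollection' names the collection the indexed documents will belong to, and '-n' skips the deduplication step, which the text above says is necessary when using nutchwax with wera.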