DelphiDabbler CodeSnip
File Format Documentation

Main Database Update Data Stream

Introduction

The Database Update Data Stream is a stream of data received from the CodeSnip Database Update web service that is used to update the local copy of the main database.

The stream is plain text and consists of a concatenation of text files from the online database along with some housekeeping information. The text files are recreated in the main database directory.

Encoding

The data stream is received from the web server as single- or multi-byte ANSI encoded text. The encoding must be such that characters from the ASCII character set occupy one byte each. Therefore encodings that use two bytes for such characters, such as UTF-16, cannot be used.

The actual encoding used is determined by the web server and should be specified in the HTTP headers. If the HTTP headers do not specify the encoding then ISO-8859-1 is assumed.
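The fallback rule above can be sketched in a few lines of Python. This is only an illustration, not CodeSnip's own code (CodeSnip is written in Delphi). The function name `charset_from_content_type` is hypothetical, and the sketch assumes the encoding arrives as the `charset` parameter of a `Content-Type` header.

```python
from email.message import Message

# Default named in this document for responses that do not state an encoding.
DEFAULT_CHARSET = "ISO-8859-1"


def charset_from_content_type(content_type):
    """Return the charset named in a Content-Type header value.

    Hypothetical helper: falls back to ISO-8859-1 when the header is
    missing or carries no charset parameter.
    """
    if not content_type:
        return DEFAULT_CHARSET
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() returns the charset in lower case, or None.
    return msg.get_content_charset() or DEFAULT_CHARSET
```

A caller would still need to reject encodings such as UTF-16, which do not store ASCII characters in one byte each.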

The encoding used for the files recreated in the main database directory is UTF-8 with byte order mark.

Data is converted between several formats on its journey from the web server to the final database file. See the appendix for details.

Stream Format

The stream contains structured plain text comprising both numeric and string information. Variable length strings are preceded by numeric values that indicate the length of the following string in bytes. Numeric values are encoded as hexadecimal characters. The format is as follows:

FileCount
Number of files encoded in the data stream. 16 bit integer encoded as four hex digits. Maximum number of files is 32,767.

Followed by FileCount file records of:

Name
Name of file without path information. AnsiString preceded by its size in bytes as a 16 bit integer encoded as four hex digits.
UnixDate
File's modification date (GMT) in Unix format. Int64 encoded as 16 hex digits.
Content
File contents.
For web service version 5 this is an AnsiString preceded by its size in bytes as a 16 bit integer encoded as four hex digits. File size is limited to 32 kB.
For web service version 6 this is an AnsiString preceded by its size in bytes as a 32 bit integer encoded as eight hex digits. The file size limit is raised to 2 GB.
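
As an illustration of the layout described above, the following Python sketch reads a stream held in memory as bytes. It is not CodeSnip's implementation. The names `parse_update_stream`, `_read_exact` and `_read_hex` are hypothetical, and the sketch assumes that dates are non-negative and that the caller already knows the stream's encoding and the web service version.

```python
import io
from datetime import datetime, timezone


def _read_exact(stream, count):
    """Read exactly `count` bytes or fail if the stream is truncated."""
    data = stream.read(count)
    if len(data) != count:
        raise ValueError("unexpected end of stream")
    return data


def _read_hex(stream, digits):
    """Read a number stored as `digits` hexadecimal characters."""
    return int(_read_exact(stream, digits).decode("ascii"), 16)


def parse_update_stream(data, encoding="iso-8859-1", version=6):
    """Return a list of (name, modified, content) tuples.

    Assumption: `version` is 5 or 6 and selects the width of the
    content length field (four or eight hex digits).
    """
    stream = io.BytesIO(data)
    content_len_digits = 4 if version == 5 else 8
    files = []
    for _ in range(_read_hex(stream, 4)):  # FileCount
        # Name: length in bytes (four hex digits), then the bytes.
        name = _read_exact(stream, _read_hex(stream, 4)).decode(encoding)
        # UnixDate: seconds since 1970-01-01 GMT as 16 hex digits.
        modified = datetime.fromtimestamp(_read_hex(stream, 16), tz=timezone.utc)
        # Content: length in bytes, then the bytes.
        size = _read_hex(stream, content_len_digits)
        content = _read_exact(stream, size).decode(encoding)
        files.append((name, modified, content))
    return files
```

Lengths count bytes, not characters, so the sketch slices the byte stream first and decodes each string afterwards. For example, a version 6 stream holding one three-byte file named `a.dat` could be built and parsed like this:

```python
sample = b"0001" + b"0005" + b"a.dat" + b"0" * 16 + b"00000003" + b"abc"
files = parse_update_stream(sample, version=6)
```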

Appendix: Description of Data Encoding Conversions

The following flowchart shows the various encodings used for downloaded data on its journey from the web server to the main database file.

Text is sent from the web server using a single- or multi-byte ANSI encoding.
The encoding used is sent in the HTTP header.
ANSI text stream
CodeSnip's HTTP handling code automatically converts ANSI text stream into Unicode string using encoding specified in HTTP header.
Unicode string
Database download manager code converts Unicode string back into ANSI text stream with same encoding in which it was sent from web server.
ANSI text stream
File updater interprets the information stored in the formatted ANSI text stream and gets the contents of each file, converting them to Unicode.
Unicode string
File writer finally writes each file as UTF-8 with a BOM.
UTF-8 stream
UTF-8 text files.
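
The chain of conversions above can be mimicked in Python as a rough sketch. The three function names are hypothetical stand-ins for CodeSnip's HTTP handling code, download manager and file writer. The sketch leaves out the step that splits the stream into individual files, and assumes the charset is one that Python can round-trip.

```python
def http_layer_decode(raw, charset):
    """ANSI text stream -> Unicode string, using the HTTP header's charset."""
    return raw.decode(charset)


def download_manager_encode(text, charset):
    """Unicode string -> ANSI text stream in the encoding the server used."""
    return text.encode(charset)


def write_database_file(path, content_bytes, charset):
    """Decode one file's ANSI content and write it as UTF-8 with a BOM."""
    text = content_bytes.decode(charset)
    # "utf-8-sig" makes Python emit the byte order mark EF BB BF first.
    # newline="" stops Python from translating line endings on write.
    with open(path, "w", encoding="utf-8-sig", newline="") as f:
        f.write(text)
```

Re-encoding with the original charset restores the byte stream that the server sent. The restored stream matters because the length fields in the stream format count bytes rather than characters.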