Menu

FileFormat_Implemented

silex6

File format

In this database software stack, all data related to a schema is stored in a single file. The file is a set of 8Kb pages that may contains data.

Each 8kb page starts with a 4 bytes signature used to identify / validate the type of the page.

The file pages may be the following:

  • a data page. The header of each data page contains a changeset number + collection number + page number, that is the virtual address of the page.
    This header part is followed by a content section that can be used to store data
  • a LOB page has similar structure as a data page, but is intended to be used for BLOB storage.
  • an index node of the master B-Tree index. This index is used to retrieve data pages by their virtual address.
    The index is used to map page virtual address (changeset+collection+page) to actual page location
  • a bitmap page contains an array of 35k bits. Nth bit is '1' means the Nth page is in use.
    A bitmap cover the usage of ~300 MB. The software will add a bitmap on the beginning of each 300 MB section.
  • an empty page. has a signature to indicate the page has been deleted and can be re-used
  • each change set has a header page with current status of the changeset

each page has a serial number, used to track changes and implement incremental backup. A new serial number is generated each time a page has been updated.

The virtual address of a page consist of the following:

  • changeset number. Non-zero for shadow copies
  • collection id. The virtual file this page belongs to
  • page number. Page virtual location in the virtual file. Note page numbers do not need to be contiguous

The data and LOB pages may actually contains the following:

  • record and fields - stored as a collection of key-value sets, in a virtual file. Records may be clustered = aligned to a fixed-size slots - where the slot is chosen based on the data to store. Page number 0 contains the header
  • B-Tree indexes. stored in a virtual file as a set of nodes bound together in a tree structure. page number 0 contains the header and a link to the top of the tree. The links stored in a B-Tree are virtual addresses of either a B-Tree node, or a record
  • LOB data. a virtual file may contains a set of BLOBs (large binary data). Where each blob may exceed the 8kb size of the page. In this case the BLOB is stored as a set of chunks in different pages of a virtual file

The content section of each data or LOB page also start with a 4 bytes signature

Banker's algorithm

In order to avoid deadlocks and conflicting changes that led to broken database structure, the file layer implements a 2-phase updates - the banker's algorithm:

  • during the first phase, the software simulate the data change, and record the scope - the set of pages that may need to be updated
  • Then the system lock all the pages in the scope, and actually apply the change. Nothing is changed before all the pages gain an exclusive lock

Backup and restore

The backup process will sequentially load each page from the database file. If the loaded page is a data page or BLOB, then the backup process will copy the content section of the page to the backup file. Bitmaps and master B-Tree index pages will not be backed-up, as not needed - these structures will be rebuilt / updated during the restore operation. The incremental backup process consider the page serial number. No change of the serial number = no change on the page.

The restore process will load data from the backup file and populate the file with the content. Full restore first create a blank database file. The loaded content is added to a change set. The indexes will be updated during commit.

See also

File format specification


Related

Wiki: File_format
Wiki: General_architecture

MongoDB Logo MongoDB