dev-blocklist-format-p2b

jre-phoenix

The PeerGuardian Binary Lists (P2B) Format

The P2B format was created to significantly reduce the bandwidth required for transferring PeerGuardian lists. The new format typically results in files 50% smaller than an identical P2P text list. Because it is a binary format and not easily modified without specialized software, it is not recommended for general use; it is intended for transfers where bandwidth is an issue.

This page describes the binary list formats used by PeerGuardian 2 (Windows), should you wish to develop an application that uses the same lists.

The header

P2B Header

An eight byte header exists at the start of every P2B list to identify a file and to let a parser know which version it is.

Type Description
int32 Always -1 (0xFFFFFFFF)
char[3] Magic Number. Always 'P2B'
uint8 Version Number. Can currently be 1, 2, or 3.
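
As a rough illustration of the table above, the header could be checked like this in Python. The function name `parse_p2b_header` is made up for this example and is not part of any PeerGuardian library.

```python
import struct

# int32 marker, char[3] magic, uint8 version: 4 + 3 + 1 = 8 bytes.
# ">" selects big-endian with no padding; the marker is all 0xFF bytes,
# so its byte order does not actually matter.
P2B_HEADER = struct.Struct(">i3sB")


def parse_p2b_header(data: bytes) -> int:
    """Return the P2B version number, or raise ValueError for a bad header."""
    if len(data) < P2B_HEADER.size:
        raise ValueError("file too short for a P2B header")
    marker, magic, version = P2B_HEADER.unpack_from(data, 0)
    if marker != -1 or magic != b"P2B":
        raise ValueError("not a P2B file")
    return version
```

A parser would typically call this first and then dispatch on the returned version.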

Versions

Version 1

P2Bv1 can be thought of as a direct binary mapping of the original P2P format. Directly after the header comes a series of IP ranges:

P2Bv1 IP Range

Type Description
string Range label, a zero-terminated C string encoded in ISO-8859-1.
uint32 Starting IP, in network byte order.
uint32 Ending IP, in network byte order.

Version 2

P2Bv2 is identical in format to version 1, except all strings are encoded in UTF-8 for better internationalization.
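
A minimal sketch of reading version 1 and version 2 lists, which differ only in the label encoding. It assumes the ranges simply run to the end of the file, since the format stores no range count; `read_ranges_v1_v2` is a hypothetical name.

```python
import struct


def read_ranges_v1_v2(data: bytes, version: int, offset: int = 8):
    """Return a list of (label, start_ip, end_ip) tuples.

    `offset` defaults to 8 to skip the P2B header.
    Assumption: ranges continue until the end of the data.
    """
    encoding = "iso-8859-1" if version == 1 else "utf-8"
    ranges = []
    while offset < len(data):
        # The label is a zero-terminated string; index() raises
        # ValueError if the terminator is missing.
        end = data.index(b"\x00", offset)
        label = data[offset:end].decode(encoding)
        # Two uint32 values in network (big-endian) byte order.
        start_ip, end_ip = struct.unpack_from(">II", data, end + 1)
        ranges.append((label, start_ip, end_ip))
        offset = end + 1 + 8
    return ranges
```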

Version 3

P2Bv3 was made with the realization that many ranges use the same name. In many cases it produces smaller lists than version 2, and compresses a little better.

P2Bv3 Format

Type Description
uint32 The number of labels that follow, in network byte order.
n strings n zero-terminated C strings which define the range labels. All strings are encoded in UTF-8.
uint32 The number of ranges that follow, in network byte order.
n ranges The IP ranges.

P2Bv3 IP Range

Type Description
uint32 The index of the label associated with this range, in network byte order.
uint32 Starting IP, in network byte order.
uint32 Ending IP, in network byte order.
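
The two tables above could be read along these lines. This is only a sketch; `read_p2b_v3` is a name invented for the example, and it resolves each label index to its string while reading.

```python
import struct


def read_p2b_v3(data: bytes, offset: int = 8):
    """Return a list of (label, start_ip, end_ip) tuples from a P2Bv3 body.

    `offset` defaults to 8 to skip the P2B header.
    """
    # Label table: a count followed by that many zero-terminated UTF-8 strings.
    (label_count,) = struct.unpack_from(">I", data, offset)
    offset += 4
    labels = []
    for _ in range(label_count):
        end = data.index(b"\x00", offset)
        labels.append(data[offset:end].decode("utf-8"))
        offset = end + 1

    # Range table: a count followed by (label index, start, end) triples.
    (range_count,) = struct.unpack_from(">I", data, offset)
    offset += 4
    ranges = []
    for _ in range(range_count):
        index, start_ip, end_ip = struct.unpack_from(">III", data, offset)
        ranges.append((labels[index], start_ip, end_ip))
        offset += 12
    return ranges
```

Because each range stores a 4-byte index instead of a full string, a label shared by many ranges is written only once.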

Version 4 Draft

Warning: P2Bv4 development is not yet finished.

P2Bv4 is being made to address the issue of IPv6. In addition to IPv6, it allows a variable number of fields for each range, metadata about the list, and support for downloading only changes instead of entire lists while updating. P2Bv4 should also be smaller due to support for CIDR notation and a dynamic range format. P2Bv4 also carries a built-in cryptographic signature for verification.

P2Bv4 Format

Type Description
n chunks n chunks.
uint8 Always 0, indicating the end of chunks.
64 bytes A SHA-512 hash of all the previous data in the file.
uint32 The size of the following digital signature, in network byte order. 0 is a valid size if no digital signature is associated with this file.
n bytes An ECDSA-521 signature of the previous hash.
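
A rough sketch of checking the trailing hash. Because the format is still a draft, this rests on assumptions: that "all the previous data" means every byte from the start of the file up to the hash (including the P2B header and the terminating 0 byte), and that the caller already knows where the hash starts from walking the chunks. `check_trailer` is a hypothetical name, and signature verification is left out because it needs a third-party cryptography library.

```python
import hashlib
import struct


def check_trailer(data: bytes, hash_offset: int):
    """Return (hash_ok, signature_bytes) for a P2Bv4 file.

    Assumption: the SHA-512 hash covers data[0:hash_offset].
    The signature is returned unverified; it is empty when its size is 0.
    """
    stored = data[hash_offset:hash_offset + 64]
    computed = hashlib.sha512(data[:hash_offset]).digest()
    # uint32 signature size in network byte order follows the 64-byte hash.
    (sig_size,) = struct.unpack_from(">I", data, hash_offset + 64)
    sig_start = hash_offset + 68
    signature = data[sig_start:sig_start + sig_size]
    return stored == computed, signature
```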

P2Bv4 is broken down into chunks. Applications can safely ignore chunks they don't recognize. A chunk type may appear more than once.

P2Bv4 Chunk Header

Type Description
uint8 The type of chunk this is. Currently this can be:
1. Metadata
2. Update information
3. Strings
4. Diff IPs
5. IP ranges
uint32 The size of the chunk, including this header.
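
Since every chunk carries its own size, a reader can skip chunk types it does not understand. The sketch below assumes the size is stored in network byte order like the other integers (the draft does not say so explicitly) and that chunks start right after the 8-byte P2B header; `iter_chunks` is a hypothetical name.

```python
import struct

# uint8 chunk type + uint32 chunk size = 5 bytes.
# Assumption: the size is big-endian (network byte order).
CHUNK_HEADER = struct.Struct(">BI")


def iter_chunks(data: bytes, offset: int = 8):
    """Yield (chunk_type, payload) pairs until the terminating 0 byte."""
    while True:
        if offset >= len(data):
            raise ValueError("missing end-of-chunks marker")
        if data[offset] == 0:
            return  # end of chunks; the hash and signature follow
        chunk_type, size = CHUNK_HEADER.unpack_from(data, offset)
        if size < CHUNK_HEADER.size:
            raise ValueError("chunk size smaller than its own header")
        # The size includes the header, so the payload is the remainder.
        yield chunk_type, data[offset + CHUNK_HEADER.size:offset + size]
        offset += size
```

A caller would handle the chunk types it knows and simply continue past the rest, for example `for kind, payload in iter_chunks(data): ...`.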

Metadata chunk

Type Description
n*2 strings n*2 zero-terminated C strings, which are n key/value pairs that define metadata about the list. These can be the application which made it, the date it was made, etc. All strings are encoded in UTF-8.
uint8 Always 0, an empty string designating the end of the list metadata.
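
One possible way to turn a metadata chunk payload (the bytes after the chunk header) into a dictionary. `parse_metadata` is a hypothetical name. Note that an empty key or value cannot be told apart from the terminator in this layout, so the sketch stops at the first empty string.

```python
def parse_metadata(payload: bytes) -> dict:
    """Decode alternating key/value strings into a dict."""
    strings = []
    offset = 0
    while True:
        end = payload.index(b"\x00", offset)
        if end == offset:
            break  # empty string: end of the metadata list
        strings.append(payload[offset:end].decode("utf-8"))
        offset = end + 1
    if len(strings) % 2 != 0:
        raise ValueError("metadata key without a value")
    # Even positions are keys, odd positions are their values.
    return dict(zip(strings[0::2], strings[1::2]))
```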

Update information chunk

Type Description
uint16 Minimum interval, in minutes, at which applications should auto-update the list. If 0, the applications should supply a reasonable value.
uint64 A UNIX timestamp of when this list was created.
string A URL to be used for pulling down updates.
uint32 A counter value for diffs. If 0, this list is not diffable. Any other value should be used while updating lists. Specifically, if the counter value is 10 and the URL is http://phoenixlabs.org/lists/p2p.7z, updaters should attempt to download diffs at p2p.11.7z, p2p.12.7z, p2p.13.7z, and so on, until they get a 404.
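
The diff naming rule can be sketched as a generator of candidate URLs. `diff_urls` is a hypothetical helper; it assumes the update URL ends in a file extension and leaves the actual downloading, and stopping at the first 404, to the caller.

```python
def diff_urls(base_url: str, counter: int):
    """Yield diff URLs for counter+1, counter+2, ... without end.

    Yields nothing when the counter is 0 (the list is not diffable).
    """
    if counter == 0:
        return
    # Split "…/p2p.7z" into "…/p2p" and "7z" at the last dot.
    stem, _, extension = base_url.rpartition(".")
    number = counter + 1
    while True:
        yield f"{stem}.{number}.{extension}"
        number += 1
```

With the counter 10 and the URL from the table, the first value produced is http://phoenixlabs.org/lists/p2p.11.7z.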

Strings chunk

Type Description
n strings n zero-terminated C strings which can be used for labels or any other range fields. All strings are encoded in UTF-8.
uint8 Always 0, an empty string designating the end of strings.

Note that if there are multiple string chunks, they are cumulative for indexes.
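
To illustrate the cumulative indexing: if the first strings chunk holds two strings, the first string of the second chunk has index 2. A minimal sketch, with `collect_strings` as a made-up name and each payload being the chunk contents after the chunk header:

```python
def collect_strings(string_chunk_payloads):
    """Build one string table from several strings chunks, in file order."""
    table = []
    for payload in string_chunk_payloads:
        offset = 0
        while True:
            end = payload.index(b"\x00", offset)
            if end == offset:
                break  # empty string: end of this chunk's strings
            table.append(payload[offset:end].decode("utf-8"))
            offset = end + 1
    return table
```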

Diff IPs chunk

Type Description
uint8 The type of IP this holds. Currently this can be:
IPv4 IPs
IPv6 IPs
uint32 The number of IPs that follow, in network byte order.
n IPs IP addresses to remove from the previous version, in network byte order. Only the starting IP of ranges needs to be specified.

The diff IPs chunk is only used in diffs. It specifies the starting IPs of ranges to remove from the previous version of the list.
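
A sketch of reading a diff IPs chunk and applying it to a list already in memory. The numeric type codes are not given above, so the mapping 1 = IPv4 and 2 = IPv6 used here is purely an assumption, as are the names `parse_diff_ips` and `remove_ranges`.

```python
import struct

# ASSUMPTION: type code 1 means IPv4 (4 bytes), 2 means IPv6 (16 bytes).
ADDRESS_SIZES = {1: 4, 2: 16}


def parse_diff_ips(payload: bytes):
    """Return the starting addresses (as ints) listed in a diff IPs chunk."""
    ip_type, count = struct.unpack_from(">BI", payload, 0)
    size = ADDRESS_SIZES[ip_type]
    if len(payload) < 5 + count * size:
        raise ValueError("diff IPs chunk is truncated")
    starts = []
    offset = 5
    for _ in range(count):
        starts.append(int.from_bytes(payload[offset:offset + size], "big"))
        offset += size
    return starts


def remove_ranges(ranges, starts):
    """Drop every (start, end, label) range whose start address is listed."""
    doomed = set(starts)
    return [r for r in ranges if r[0] not in doomed]
```

An updater would apply the removals first and then add the ranges from the IP ranges chunk of the same diff.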

IP ranges chunk

Type Description
n range descriptors n descriptors showing what and in which order range data comes in.
uint8 Always 0, a byte designating the end of the range metadata.
uint32 The number of ranges that follow, in network byte order.
n ranges The ranges. If this is a diff, this consists of the ranges added or changed since the last version.

P2Bv4 Range Descriptor

Type Description
uint8 The type of value this range field holds. Can currently be:
uint8
uint16, in network byte order.
uint32, in network byte order.
uint64, in network byte order.
uint128, in network byte order.
zero-terminated C string in UTF-8.
blob: uint32 prefix in network byte order specifies the length, and the blob data follows.
uint32 The size of the field in bytes, in network byte order. Should be 0 for variable-sized types like strings and blobs.
string A short zero-terminated ASCII string describing what this range field contains. Currently, only the following are recognized:
label: If an integer is used, it is a string index. Otherwise must be a string.
startaddr: The starting address of an IP range. Must be a uint32 for IPv4, or uint128 for IPv6.
endaddr: The ending address of an IP range. Must be a uint32 for IPv4, or uint128 for IPv6. Cannot be used with cidrbits.
cidrbits: The CIDR bitmask length. Must be an integer. Cannot be used with endaddr.
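
When a range uses cidrbits instead of endaddr, a reader has to derive the ending address itself. The sketch below assumes "CIDR bitmask length" means the usual prefix length (for example 16 in a /16); `cidr_to_range` is a hypothetical helper.

```python
def cidr_to_range(startaddr: int, cidrbits: int, width: int = 32):
    """Return (first, last) addresses covered by a CIDR block.

    `width` is 32 for IPv4 (uint32) or 128 for IPv6 (uint128).
    """
    if not 0 <= cidrbits <= width:
        raise ValueError("cidrbits out of range for this address width")
    host_mask = (1 << (width - cidrbits)) - 1
    first = startaddr & ~host_mask  # clear the host bits
    last = first | host_mask        # set all host bits
    return first, last
```

For instance, `cidr_to_range(0xC0A80000, 16)` covers 192.168.0.0 through 192.168.255.255.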

Developing with P2B

A cross-platform C++ library for working with the P2B format is freely available here.

The future

In the future, when bandwidth is cheaper and the information is deemed useful, you may wish to include range metadata (category, geographical region, description, etc.) within a P2B list. If you wish to do so, please coordinate with me (phrosty@gmail.com) to ensure it is done compatibly.

See also

PeerGuardian Text Lists (P2P) Format
eMule Text Lists (DAT) Format

Based upon the old PhoenixLabs Wiki page

