From: <ul...@us...> - 2010-06-26 11:10:41
|
Revision: 13 http://adc.svn.sourceforge.net/adc/?rev=13&view=rev Author: ullner Date: 2010-06-26 11:10:34 +0000 (Sat, 26 Jun 2010) Log Message: ----------- Adding BLOM Removing 'optional' from UCMD. Modified Paths: -------------- trunk/ADC-EXT.txt Modified: trunk/ADC-EXT.txt =================================================================== --- trunk/ADC-EXT.txt 2010-04-04 14:24:32 UTC (rev 12) +++ trunk/ADC-EXT.txt 2010-06-26 11:10:34 UTC (rev 13) @@ -1,6 +1,6 @@ -= ADC Extensions += ADC Extensions Fredrik Ullner <ul...@gm...> -1.0.2, April 2010 +1.0.3, June 2010 == Abstract These are the official extensions to ADC. This document is based on the @@ -25,6 +25,10 @@ === Version 1.0.2 * Added UCMD extension +=== Version 1.0.3 +* Removed 'optional' keywords from UCMD +* Added BLOM extension + == Extensions === TIGR - Tiger tree hash support @@ -246,6 +250,7 @@ userCID | User CID userSID | User SID userXX | One for each flag on the user sent; for example, userI4 and userNI +line:info | Prompts the user for input where 'info' is the displayed text description for the user input ___ File parameters @@ -260,14 +265,6 @@ hubXX | One for each flag of the hub; for example, hubNI and hubVE ___ -The following tables specify the keywords that are optional. - -User parameters -[separator="|"] -``_ -line:info | Prompts the user for input where 'info' is the displayed text description for the user input -___ - ==== Example `_ ICMD ADCH++/Hub{backslash}smanagement/Register{backslash}snick TTHMSG{backslash}s+regnick{backslash}s%[userNI]{backslash}s%[line:Password{backslash}s(leave{backslash}sempty{backslash}sto{backslash}sun-reg)]{backslash}s%[line:Level{backslash}s(facultative;{backslash}sdefaults{backslash}sto{backslash}syour{backslash}sown{backslash}slevel{backslash}sminus{backslash}sone)]{backslash}n CT2 @@ -277,4 +274,76 @@ ICMD ADCH++/Info TTHMSG{backslash}s+info{backslash}n CT1 ___ + +=== BLOM + +Bloom filters allow the hub to filter certain searches using bitmap that represents the hashes of the files in the users share. BLOM is an extension that allows hub software to create a map (bloom filter) of the shared files on the hub, but with minimal effort, e.g. the hub doesn't keep a list of files, but a filter that never produces false negatives but only possible false positives. This can potentially save bandwidth and effort on the client side. When the user updates the share, the client must send an INF containing the flag SF. The hub may at any time request that the client sends an updated version of its bloom filter by sending a GET command to the client. The client will then respond using SND and send the bloom filter binary data. + +==== Legend + +|===== +|b |Number of bits used for file hashes +|n |Number of files in the user's share +|m |Size of the bloom filter in bits +|k |Number of sub-hashes constructed from the file hash +|h |Number of bits to use for each sub-hash +|p |Propability of a false positive +|===== + +The hub chooses k, h and m. + +==== Restrictions + +[width="15%"] +|===== +|k * h < = b +|h < = 64 +|2^h > m +|m mod 64 == 0 +|===== + +==== Probability + +|===== +|p == (1 - (1 - 1 / m)^(k * n))^k |False positives +|p == 0 |False negatives +|===== + +==== Protocol changes +Signal BLOM in SUP. + +For the SND type, adds H as message type. + +For the GET type, adds I as message type. + +For the GET type, adds "blom" as type. + +For the GET type, "/" shall be used as namespace. + +For the GET type, 0 (zero) shall be used as start position. + +For the GET type, m / 8 shall be used as byte amount. + +Updates GET with the following flags; + +[width="15%"] +|===== +|BK |Specify 'k'. +|BH |Specify 'h'. +|===== + +==== Algorithm +The client constructs the bloom filter by creating a bit array of m bits set to 0 (zero). For each file it then sets to "1" k positions constructed from the file hash. Seeing the file hash as a stream of bits (starting from the lowest bit of the first byte, ending at the highest bit of the last byte), the client should use h bits starting at the first bit of the first byte to create an integer and apply modulo m to get the position in the bit array, then redo the process k times advancing the starting position by h each time. + +Once the hub has received the bloom filter bit array, for each search command it processes, if it contains a hash search term, it can skip broadcasting the search to a particular client if at least one of the k bits in that clients bit array is "0", calculating positions as the client does when setting bits to "1". The hub has to evaluate the filter for each client that it has a bloom filter for, for each search. + +==== Probability calculations +p = (1 - (1 - 1 / m)^(k * n))^k, thus p becomes smaller as m grows and larger as n grows. Larger m means more bits to transfer but also fewer false positives. The optimum value for k given m and n is (m / n) * ln 2. The largest k supported by a hash of a certain size is b / h, so if the hub wants the smallest p possible, it should choose the smallest possible h which gives the largest k, and then calculate m = k * n/ln 2, checking that the resulting m < 2^h. 2^h should much be larger than m (at least 3-4 times), because of how the modulo operator works. Also, with m grows the required bandwidth to transfer the bloom filter, so the hub may wish to cap m. In that case, it should still choose k according to m / n * ln 2, but send an h as big as possible to alleviate biasing problems. + +==== Sample implementations +===== Tiger +For TTH roots, b is 192 and a reasonable value for h is 24, giving a maximum k = 8 which means that m = 8 * n / ln 2 ≈ 11.5 * n. The required bandwidth then becomes 11.5 * n / 8 bytes, so approximately 1.44 bytes per file in the users share. For 20000 files, m should then be 230016 (taking into account the modulo 64 requirement), giving a p = 0.004, in other words ~99.6% of all searches that don't match a users share would not be sent, saving valuable upload bandwidth for the hub. The client calculates i valid positions, if x is an array of bytes containing the hash of the file, on a little-endian machine, by doing pos = x[0+i*h/8] | (x[1+i*h/8] << 8) | (x[2+i*h/8] << 16) for i = [0;k). This is of course a special case where h % 8 = 0, the actual algorithm has to take into consideration values for h where the integer crosses byte boundaries. + +For test vectors, see the ADC wiki talk page; http://www.adcportal.com/wiki/index.php/Talk:BLOM + // vim: set syntax=asciidoc: This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site. |