CarvPath annotations are a (native) file-path-compatible way of annotating the location of a data entity within a larger entity. The basic concepts and annotation formats are relatively simple. Only when files on a file system become extremely fragmented does a little complication sneak into the picture.
A basic fragment
A basic fragment in a CarvPath annotation consists of an offset and a size separated by a plus sign '+'. The offset designates the position within the parent entity where the fragment starts. The size defines the size of the fragment. Please note that older versions of LibCarvPath used the colon instead of the plus sign. The colon notation has been abandoned because colons caused problems when annotations were used from MS-Windows systems over an SMB share.
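As a minimal sketch of the notation (this is not LibCarvPath's API; the `parse_fragment` name is hypothetical), a basic fragment can be split into its two numbers like this:

```python
import re


def parse_fragment(token: str) -> tuple[int, int]:
    """Split a basic '<offset>+<size>' fragment into (offset, size)."""
    match = re.fullmatch(r"([0-9]+)\+([0-9]+)", token)
    if match is None:
        raise ValueError(f"not a basic fragment: {token!r}")
    return int(match.group(1)), int(match.group(2))


offset, size = parse_fragment("4096+53248")
print(offset, size)  # 4096 53248
```

The abandoned colon form, such as 4096:53248, is rejected by this sketch with a ValueError.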
A sparse fragment
Many file systems support the concept of sparse files. A sparse file can contain regular fragments, but will also contain so-called sparse fragments. A sparse file is basically a file for which a size was set at creation, but in which not all locations were ever actually written. This means that there are holes in the file: sections that were never written to, and thus sections that cannot be found on the parent entity. If a program working with the original file system accessed such a hole, it would get back all zeros. So we need a way to represent these sparse fragments within files on a file system. CarvPath annotates a sparse fragment by prefixing its size with a capital 'S'.
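The difference between the two fragment kinds can be illustrated with a small reader. This is a sketch only: `read_fragment` is a hypothetical helper, and an in-memory `io.BytesIO` object stands in for the parent entity.

```python
import io


def read_fragment(parent, token: str) -> bytes:
    """Return the bytes that a single fragment token refers to."""
    if token.startswith("S"):
        # Sparse fragment: nothing is stored on the parent entity,
        # so a reader only ever sees zero bytes.
        return bytes(int(token[1:]))
    offset_text, size_text = token.split("+")
    parent.seek(int(offset_text))
    return parent.read(int(size_text))


image = io.BytesIO(b"abcdefghij")
print(read_fragment(image, "2+3"))  # b'cde'
print(read_fragment(image, "S4"))   # b'\x00\x00\x00\x00'
```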
A fragmented data entity
While many files are written to disk as a single contiguous run of blocks, there are situations where the file system is not able to do so. In those cases the resulting file will be fragmented, and a single entity will thus be made up of multiple fragments. To allow multiple fragments of a single entity to be annotated, CarvPath annotation defines the underscore '_' as the separator character between two fragments.
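Putting the three rules together (plus sign, capital 'S', underscore), a whole entity annotation can be parsed with a few lines. The sketch below is illustrative and not taken from LibCarvPath; `Fragment` and `parse_entity` are hypothetical names.

```python
from typing import NamedTuple, Optional


class Fragment(NamedTuple):
    offset: Optional[int]  # None marks a sparse fragment
    size: int


def parse_entity(carvpath: str) -> list[Fragment]:
    """Parse one nesting level of a carvpath into its fragments."""
    fragments = []
    for token in carvpath.split("_"):
        if token.startswith("S"):
            fragments.append(Fragment(None, int(token[1:])))
        else:
            offset_text, size_text = token.split("+")
            fragments.append(Fragment(int(offset_text), int(size_text)))
    return fragments


entity = parse_entity("4096+53248_258048+28672_S1047552")
print(len(entity))                  # 3
print(sum(f.size for f in entity))  # 1129472
```

The size of the entity is simply the sum of the fragment sizes, sparse fragments included.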
Most fragmented files will consist of a relatively limited number of fragments. There are, however, cases such as mailboxes or log files that are notorious for being extremely fragmented. Systems that were used extensively with peer-to-peer file sharing applications also tend to be filled with extremely fragmented movie files. While this does not pose a problem for raw CarvPath annotations, the length of these annotations does pose a problem for usage within the user space file system CarvFS. On Linux the total length of a path can be up to four K-bytes, and a single file or directory name can be no longer than a single K-byte. When we expose CarvFS to Windows over an SMB share, things become even more serious. The maximum path size for the basic Windows file system API is only 256 bytes. That is not for a single file name, but for the whole path. To overcome this problem, LibCarvPath can run in one of three modes:
- Raw carvpath mode (infinite token size)
- Linux mode (max token size of 1000 bytes)
- MS-Windows compatibility mode (max token size of 160 bytes)
If a token exceeds the maximum token size, LibCarvPath will make use of a longtoken db. Rather than returning the raw carvpath, LibCarvPath will return the SHA1 digest of the raw carvpath prefixed with a capital 'D'. The raw carvpath is then stored in the longtoken db, and if any tool running under the same uid requests to parse the longtoken entity, the SHA1 entry will be looked up in this database first. A longtoken annotation thus consists of a capital 'D' followed by the lowercase hexadecimal representation of the SHA1 digest of the raw carvpath.
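A minimal sketch of the longtoken idea follows. It makes several assumptions that are not confirmed by LibCarvPath itself: a plain Python dict stands in for the longtoken db, the digest is taken over the ASCII bytes of the raw carvpath, a token exactly at the limit is still allowed, and the names `shorten` and `expand` are hypothetical.

```python
import hashlib

# Token size limits per mode, as listed above (None means unlimited).
MAX_TOKEN_SIZE = {"raw": None, "linux": 1000, "windows": 160}


def shorten(carvpath: str, mode: str, longtoken_db: dict) -> str:
    """Return the carvpath itself, or a 'D' + SHA1 longtoken if it is too long."""
    limit = MAX_TOKEN_SIZE[mode]
    if limit is None or len(carvpath) <= limit:
        return carvpath
    token = "D" + hashlib.sha1(carvpath.encode("ascii")).hexdigest()
    longtoken_db[token] = carvpath  # remember the raw carvpath for later lookups
    return token


def expand(token: str, longtoken_db: dict) -> str:
    """Translate a longtoken back into the raw carvpath it stands for."""
    return longtoken_db[token] if token.startswith("D") else token


db = {}
raw = "_".join(f"{i * 8192}+4096" for i in range(40))  # 40 separate fragments
token = shorten(raw, "windows", db)
print(len(token))                # 41
print(expand(token, db) == raw)  # True
```

A SHA1 digest is 40 hexadecimal characters, so a longtoken is always 41 characters long and fits comfortably within even the MS-Windows compatibility limit.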
One of the most important features of CarvPath annotations is that they can be used in a nested way. That is, you can for example create a carvpath entity annotation for a carved file, using the carvpath annotation of the block of unallocated data that this file was carved from as a prefix. As a nesting separator, CarvPath annotation uses the operating system's native path separator for file system access. This means that on Unix-like operating systems the slash '/' character is used, while on MS-Windows the backslash '\' character is used.
The LibCarvPath library will always try to present its user with the shortest possible representation of a carvpath. To that end, LibCarvPath uses two forms of flattening. First of all, while creating a carvpath entity from fragments, LibCarvPath will automatically merge adjoining fragments, that is, fragments where one starts exactly where the previous one ends. So instead of 0+4096_4096+4096_8192+4096, LibCarvPath will return the single-level flattened representation 0+12288.
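Single-level flattening can be sketched as follows. This is an illustration of the idea rather than LibCarvPath's implementation; the function names are hypothetical, and sparse fragments are left out to keep the example short.

```python
def merge_adjacent(fragments):
    """Merge (offset, size) fragments that directly follow each other."""
    merged = []
    for offset, size in fragments:
        if merged and merged[-1][0] + merged[-1][1] == offset:
            # This fragment starts exactly where the previous one ends.
            prev_offset, prev_size = merged[-1]
            merged[-1] = (prev_offset, prev_size + size)
        else:
            merged.append((offset, size))
    return merged


def flatten_single_level(carvpath: str) -> str:
    """Flatten one nesting level that contains only regular fragments."""
    fragments = []
    for token in carvpath.split("_"):
        offset_text, size_text = token.split("+")
        fragments.append((int(offset_text), int(size_text)))
    return "_".join(f"{o}+{s}" for o, s in merge_adjacent(fragments))


print(flatten_single_level("0+4096_4096+4096_8192+4096"))  # 0+12288
print(flatten_single_level("0+4096_8192+4096"))            # 0+4096_8192+4096
```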
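In addition to single-level flattening, LibCarvPath will also do multi-level flattening. That is, it will flatten each CarvPath to the CarvPath level that it considers its root. This means for example that 118784+17179869184/4096+53248_258048+28672_S1047552 will flatten to 122880+53248_376832+28672_S1047552: each child offset is shifted by the start of the parent fragment (118784 + 4096 = 122880 and 118784 + 258048 = 376832), while the sparse fragment stays as it is.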
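The following sketch shows one way multi-level flattening could be done. It is an assumption-laden illustration, not LibCarvPath's algorithm: the names are hypothetical, the Unix '/' separator is assumed, there is no check that a child stays within its parent, and adjoining fragments in the result are not merged afterwards.

```python
def parse(level: str):
    """Parse one nesting level into (offset, size) pairs; offset None is sparse."""
    fragments = []
    for token in level.split("_"):
        if token.startswith("S"):
            fragments.append((None, int(token[1:])))
        else:
            offset_text, size_text = token.split("+")
            fragments.append((int(offset_text), int(size_text)))
    return fragments


def resolve(parent, child):
    """Express the child's fragments as offsets in the parent's own parent."""
    out = []
    for c_off, c_size in child:
        if c_off is None:  # a sparse fragment stays sparse
            out.append((None, c_size))
            continue
        start = 0  # where the current parent fragment begins inside the parent
        for p_off, p_size in parent:
            lo = max(c_off, start)
            hi = min(c_off + c_size, start + p_size)
            if lo < hi:  # the child fragment overlaps this parent fragment
                new_off = None if p_off is None else p_off + (lo - start)
                out.append((new_off, hi - lo))
            start += p_size
    return out


def flatten(carvpath: str, sep: str = "/") -> str:
    levels = [parse(level) for level in carvpath.split(sep)]
    result = levels[0]
    for child in levels[1:]:
        result = resolve(result, child)
    return "_".join(f"S{s}" if o is None else f"{o}+{s}" for o, s in result)


print(flatten("118784+17179869184/4096+53248_258048+28672_S1047552"))
# 122880+53248_376832+28672_S1047552
```

When the parent itself is fragmented, a single child fragment may be split over several parent fragments. In this sketch, flatten("0+100_200+100/50+100") gives 50+50_200+50.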
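While not part of the CarvPath annotation scheme, it is important to note that CarvFS uses a simple extension of CarvPath annotations to distinguish between nesting levels (pseudo directories) and pseudo files. The basic concept is that any raw carvpath is interpreted by CarvFS as a pseudo directory. In order to get at a carvpath as a pseudo file, the carvpath needs to be suffixed with the '.crv' extension. For example, /mnt/carvfs/42e8a6da8794ed8c29f48577e0422fef/CarvFS/122880+53248_376832+28672_S1047552 is a pseudo directory, and /mnt/carvfs/42e8a6da8794ed8c29f48577e0422fef/CarvFS/122880+53248_376832+28672_S1047552.crv is the corresponding pseudo file.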
CarvFS-aware tools should use the '/CarvFS' and the '.crv' sections in the path string as an indication that the path may be on CarvFS. A CarvFS-aware tool in this example may choose to use /mnt/carvfs/42e8a6da8794ed8c29f48577e0422fef/CarvFS as root path, using the size of /mnt/carvfs/42e8a6da8794ed8c29f48577e0422fef/CarvFS.crv as top size. Alternatively, it may use /mnt/carvfs/42e8a6da8794ed8c29f48577e0422fef/CarvFS/122880+53248_376832+28672_S1047552 as root path, using the size of /mnt/carvfs/42e8a6da8794ed8c29f48577e0422fef/CarvFS/122880+53248_376832+28672_S1047552.crv as top size.
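A CarvFS-aware tool might apply this path convention roughly as follows. This is a sketch under assumptions: the helper names `carvfs_root` and `pseudo_file_for` are hypothetical, Unix path separators are assumed, and on a real mount the top size would be obtained with something like os.path.getsize on the '.crv' pseudo file, which is not done here.

```python
from typing import Optional


def carvfs_root(pseudo_file: str) -> Optional[str]:
    """Return the pseudo directory belonging to a '.crv' pseudo file.

    Returns None when the path does not look like a CarvFS pseudo file.
    """
    if "/CarvFS" not in pseudo_file or not pseudo_file.endswith(".crv"):
        return None
    return pseudo_file[: -len(".crv")]


def pseudo_file_for(root: str, carvpath: str) -> str:
    """Build the pseudo file path of a carvpath nested under a root path."""
    return f"{root}/{carvpath}.crv"


top = "/mnt/carvfs/42e8a6da8794ed8c29f48577e0422fef/CarvFS.crv"
root = carvfs_root(top)
print(root)
# /mnt/carvfs/42e8a6da8794ed8c29f48577e0422fef/CarvFS
print(pseudo_file_for(root, "122880+53248_376832+28672_S1047552"))
# /mnt/carvfs/42e8a6da8794ed8c29f48577e0422fef/CarvFS/122880+53248_376832+28672_S1047552.crv
print(carvfs_root("/home/user/image.dd"))  # None
```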