
How are pages stored in GDBM format?

Help
Heiner
2006-08-10
2012-10-11
  • Heiner
    2006-08-10

    I can no longer access my phpWiki pages using the
    wiki itself, but still have the GDBM files used
    to store them.

    I tried to read them using the code below, but
    only get the page names with some additional
    bytes, which probably are pointers to the actual
    data. Could somebody shed some light on this?

    I'd like to have the program print the complete
    text of all pages to standard output.

    This is what I have now (compile with
    "gcc -o readpages readpages.c -lgdbm"):


    /* readpages.c */

    #include <errno.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <gdbm.h>

    int main(int argc, char *argv[])
    {
        int i;
        GDBM_FILE dbf;
        datum key, next;

        if (argc < 2) {
            fprintf(stderr, "%s: usage: %s filename\n",
                    argv[0], argv[0]);
            return 1;
        }

        for (i = 1; i < argc; i++) {
            if (!(dbf = gdbm_open(argv[i], 0, GDBM_READER, 0, 0))) {
                fprintf(stderr, "gdbm_open: %s gdbm_errno=%d, errno=%d\n",
                        gdbm_strerror(gdbm_errno), gdbm_errno, errno);
                return 1;
            }

            /* gdbm_firstkey/gdbm_nextkey return malloc'd keys:
               fetch the successor first, then free the current key. */
            for (key = gdbm_firstkey(dbf); key.dptr; key = next) {
                printf("%d bytes: <%.*s>\n", key.dsize, key.dsize, key.dptr);
                next = gdbm_nextkey(dbf, key);
                free(key.dptr);
            }

            gdbm_close(dbf);
        }

        return 0;
    }

     
    • Reini Urban
      2006-08-12

      You can also use "dumpgdbm" to dump the data :)
      It comes with gdbm.

      Please see the PHP source code in lib/WikiDB/backend/dbaBase.php:

      • Tables:
        • page
          • Index: pagename
          • Value: latestversion . ':' . flags . ':' . serialized hash of page meta data
          • Currently flags = 1 if the latest version has empty content.
        • version
          • Index: version:pagename
          • Value: serialized hash of revision meta data, including the quasi-meta-data %content
        • links
          • Index: 'o' . pagename
            Value: serialized list of pages (names) which pagename links to.
          • Index: 'i' . pagename
            Value: serialized list of pages which link to pagename.

      Each table uses a unique key prefix to keep the tables apart:
      page (p), version (v), links (l).

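      So the page record for HomePage is stored under the key "pHomePage", and each revision (which holds the text you need) under "v" . version . ":" . pagename, e.g. "v1:HomePage" for version 1. The latest version number is the first field of the page record.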
      The values are PHP-serialized, so you have to unserialize them.

      This is best done in PHP, using the existing library.