Journal file size, can it be compacted?

  • Ola Bildtsen - 2012-09-12

    We have a journal file that for some reason has grown unreasonably large (it's around 96GB at the moment).  We suspect that at some point a process added a whole bunch of triples and that those were subsequently removed from the bigdata store.  But the journal file size remains large.  Is it possible to compact/shrink/optimize a journal file?  We're using the NanoSparqlServer version of Bigdata.

  • Bryan Thompson - 2012-09-12

    Ola,

    The RWStore reserves space when it needs to grow the file.  The allocations are spread throughout each new extent, and the addresses of those allocations are recorded in the B+Tree nodes.  Because those addresses are fixed in the index structure, it is not possible to "compact" the file in place.

    However, there is a CompactJournalUtility that can be used to generate a new Journal from an existing Journal.  It will only copy the most recent committed state into the new journal.  It will do this for all indices on the source journal.
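    For example, the invocation might look something like this (a sketch only: the package and the arguments of CompactJournalUtility should be verified against your bigdata release):

    # Sketch: assumes the utility takes a source journal and a target journal,
    # and that the bigdata jar is on the classpath.
    java -cp bigdata.jar com.bigdata.journal.CompactJournalUtility \
        bigdata.jnl bigdata-compact.jnl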

    There is also an "Export/Import" utility described on the wiki (https://sourceforge.net/apps/mediawiki/bigdata/index.php?title=DataMigration#Export)
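    The export step from that page might be run along these lines (again a sketch: the properties file and output directory are illustrative placeholders, and the exact arguments are documented on the wiki):

    # Sketch: ExportKB per the wiki page above; arguments are placeholders.
    java -cp bigdata.jar com.bigdata.rdf.sail.ExportKB \
        RWStore.properties /tmp/export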

    Neither of those utilities can be run while the NanoSparqlServer is running.

    Thanks,
    Bryan

  • Ola Bildtsen - 2012-09-12

    Thanks, Bryan.  I tried the export/import method - the export worked great, and I have a data.xml.gz file in TriX format.  Presumably I should then be able to import that into a NanoSparqlServer instance with the following:

    curl -X POST -H "Content-Type: application/trix" -d @data.xml http://localhost:8282/bigdata/sparql
    

    This looks like it's working, in that it takes a long time on a 36GB input file.  When the curl command finally completes, I get no response back and no errors in the logs.  But there is nothing in the triplestore: a simple query returns no results, and an ESTCARD query returns 0 for rangeCount.

  • Bryan Thompson - 2012-09-13

    Ola,

    Can you replicate your problem and file a ticket including a sample data file and the command (as above) that you are using?

    Are you sure that the @data.xml file was appropriately formatted?  Per the curl man page, -d defaults to ASCII mode and expects the file contents to be URL-encoded.  Maybe you should be using --data-binary?  I am not sure; I don't use curl that much.
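    If it helps, the binary-safe variant of your command would look like this (untested on my end):

    # --data-binary posts the file verbatim; -d strips newlines and expects
    # URL-encoded form data.
    curl -X POST -H "Content-Type: application/trix" \
        --data-binary @data.xml http://localhost:8282/bigdata/sparql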

    Thanks,
    Bryan

  • Bryan Thompson - 2012-09-13

    We've added a listener to support incremental feedback on SPARQL UPDATE operations.  However, we have not yet integrated this into the REST response for HTTP POSTs of SPARQL UPDATE.  Currently, the HTTP status code indicates whether or not the operation was successful.  However, if we start writing on the response in order to provide incremental feedback on the UPDATE progress, then the response can be committed (at the servlet level, that means it has flushed the first buffer to the client).  Once the response is committed, we cannot change the status code.

    I think that we will probably deal with this by offering a URL query parameter that can be used to request incremental updates on the progress of SPARQL UPDATE operations.  When specified, you will see a 200 (OK) status code, and the server will write an XHTML response document that details the progress of the UPDATE request.  That way we will not break the expectations of RESTful clients that expect the status code to indicate the outcome of the request (success or failure).
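    As a purely hypothetical example (nothing is implemented yet, and the parameter name here is made up for illustration), a client might opt in like this:

    # Hypothetical: "monitor" is an illustrative parameter name only.
    curl -X POST -H "Content-Type: application/sparql-update" \
        --data-binary @update.rq \
        "http://localhost:8282/bigdata/sparql?monitor=true"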

    Bryan

  • Ola Bildtsen - 2012-09-13

    Unfortunately, I cannot post the sample data I was using, as it contains proprietary information.  I'm assuming the file is properly formatted, since I didn't modify it after doing the export (ExportKB).  So unless you think the bigdata export may have a problem, it's safe to say the data is OK.

  • Bryan Thompson - 2012-09-13

    Ola,

    I do not need your data.  But please post some data that demonstrates the problem.  Also, note that the curl documentation says it expects the file contents to be URL-encoded, and the export definitely does not produce that.  I suggest that you try this on a small file using a binary encoding.
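    For example, a quick end-to-end test might look like this (a sketch: test.xml stands in for a small TriX sample, and the endpoint matches the one you posted):

    # Post a small file with a binary-safe upload...
    curl -X POST -H "Content-Type: application/trix" \
        --data-binary @test.xml http://localhost:8282/bigdata/sparql
    # ...then check whether any triples actually landed (fast range count).
    curl "http://localhost:8282/bigdata/sparql?ESTCARD"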

    Bryan

  • Ola Bildtsen - 2012-09-13

    Thanks, Bryan - I will try that before posting an issue.
