Currently webseer can report in three formats: a line-delimited format (mostly for console output), a character-delimited format (like CSV), and WEKA's format for analysis. For large-scale crawls, however, a more flexible format would work much better. The line-delimited format is easy to write but hard to read back. The character-delimited format is easy to read, but requires a lot of rewriting whenever new attributes are discovered. WEKA's format has the same problem, though its sparse variant comes close to what I'm thinking of. I propose keeping the attribute list separate from the data and reporting the feature values sparsely (perhaps keyed by attribute index rather than by attribute name). This would let us add attributes in the middle of writing while still allowing the attribute header to be read quickly. It will require somewhat more complex file opening/locking and packaging.
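
To make the proposal concrete, here is a rough sketch of what I have in mind. It is written in Python for brevity and is not webseer code: the names `SparseReportWriter` and `read_records`, the one-name-per-line attribute stream, and the JSON encoding of each record are all assumptions for illustration. It also ignores the file locking and packaging mentioned above.

```python
import io
import json


class SparseReportWriter:
    """Hypothetical writer: attribute names and sparse records go to separate streams."""

    def __init__(self, attr_out, data_out):
        self.attr_out = attr_out   # one JSON-encoded attribute name per line; line order = index
        self.data_out = data_out   # one JSON list of [index, value] pairs per line
        self.index_of = {}         # attribute name -> index

    def _index(self, name):
        # A newly discovered attribute is appended to the attribute stream.
        # Data lines already written stay valid and are never rewritten.
        if name not in self.index_of:
            self.index_of[name] = len(self.index_of)
            self.attr_out.write(json.dumps(name) + "\n")
        return self.index_of[name]

    def write_record(self, features):
        # Only the attributes present in this record are written (sparse).
        pairs = sorted((self._index(name), value) for name, value in features.items())
        self.data_out.write(json.dumps(pairs) + "\n")


def read_records(attr_in, data_in):
    """Read the attribute list first, then expand each sparse record to a dict."""
    names = [json.loads(line) for line in attr_in if line.strip()]
    for line in data_in:
        if line.strip():
            yield {names[i]: value for i, value in json.loads(line)}


# Example: the second record introduces a new attribute mid-stream.
attrs, data = io.StringIO(), io.StringIO()
writer = SparseReportWriter(attrs, data)
writer.write_record({"url": "http://example.com/", "status": 200})
writer.write_record({"url": "http://example.com/a", "title": "A"})
```

In this sketch the attribute stream is append-only and small, so a reader can load the whole header without scanning the data. With real files, the two streams would still need to be kept consistent under concurrent writers, which is where the extra opening/locking work comes in.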