[ https://jira.duraspace.org/browse/DS-1059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=25399#comment-25399 ]
Mark H. Wood commented on DS-1059:
From #duraspace, 20-Jun-2012:
(16:04:52) tdonohue: Next up is DS-1059
(16:04:53) kompewter: [ https://jira.duraspace.org/browse/DS-1059 ] - [#DS-1059] Statistics utilities should be filters - DuraSpace JIRA
(16:05:11) tdonohue: mhwood, do you need any support on this or feedback?
(16:05:36) mhwood: No, just this nudge to get it moving, since nobody has any negative comments.
(16:05:52) tdonohue: It sounds like a great idea to me (haven't looked at the code though)
(16:06:36) tdonohue: ok, we'll keep moving along then to the next ticket, DS-1061
(16:06:39) helix84: +1 to the idea, although I haven't seen the code
(16:06:54) sands: same.
(16:07:23) ***tdonohue notes last two comments were for DS-1059 (just for future clarity)
(16:07:59) mdiggory: I am amenable to better handling of the logs in 1061
(16:08:15) mdiggory: sorry now I'm creating confusion... I meant 1059
(16:09:22) mdiggory: for 1059, it would also be nice if the calculated Solr records were written to disk rather than having to re-troll the original logs if the Solr core needs to be restored.
[I think that's all.]
> Statistics utilities should be filters
> Key: DS-1059
> URL: https://jira.duraspace.org/browse/DS-1059
> Project: DSpace
> Issue Type: New Feature
> Components: Solr
> Reporter: Mark H. Wood
> Assignee: Mark H. Wood
> Fix For: 3.0
> Attachments: ApacheLogRobotsProcessor.patch, ClassicDSpaceLogConverter.patch, StatisticsImporter.patch
> Log files on large, busy sites may be enormous. This can make it difficult to find enough storage when (re)loading statistical cases from logs. Very large files are also likely to have been compressed by the sysadmin to save storage, which means they have to be decompressed before being fed to the utilities, requiring even more temporary storage.
> One should be able to operate these utilities in a pipeline so that intermediate storage is eliminated. That is, they should be able to operate as filters: read standard input, write standard output.
> Attached patches (as they come) will add this *optional* behavior. If the -i or -o switch is omitted, or is given the value "-" or "", the utility reads standard input or writes standard output, respectively. Any other value works as before: it names a file.
> One can then do things like 'bunzip2 < logs/gigundo.log | bin/dspace stats-log-converter | bin/dspace stats-log-importer' without any additional storage required beyond what Solr will use.
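The switch handling described above can be sketched roughly as follows. This is only an illustration in Python, not the code in the attached patches (the DSpace utilities themselves are Java); the names open_or_std and run_filter, and the transform parameter, are hypothetical and stand in for whatever per-line work a real utility does.

```python
import sys
from contextlib import contextmanager


@contextmanager
def open_or_std(name, mode="r"):
    """Yield a text stream for `name`.

    Hypothetical helper: None, "" or "-" selects standard input (read mode)
    or standard output (write mode); any other value names a file.
    """
    if name in (None, "", "-"):
        # The standard streams are borrowed, so they are not closed here.
        yield sys.stdout if "w" in mode else sys.stdin
    else:
        with open(name, mode, encoding="utf-8") as stream:
            yield stream


def run_filter(in_name=None, out_name=None, transform=lambda line: line):
    """Copy input to output one line at a time, applying `transform`.

    Only one line is held in memory at a time, so no intermediate
    file is needed when the input arrives through a pipe.
    """
    with open_or_std(in_name, "r") as src, open_or_std(out_name, "w") as dst:
        for line in src:
            dst.write(transform(line))
```

Called with no arguments, run_filter() behaves as a filter in a pipeline; called with two file names, it behaves like the existing file-to-file mode.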
This message is automatically generated by JIRA.