From: SourceForge.net <no...@so...> - 2006-04-15 21:04:17
Feature Requests item #1471020, was opened at 2006-04-15 14:04
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=684781&aid=1471020&group_id=119724

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.

Category: textIndexer
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Martin Haye (mhaye)
Assigned to: Martin Haye (mhaye)
Summary: Index MARC records

Initial Comment:
We are running an experiment in which we're trying to index about 10 million
meta-data records. They're in MARC format, in large files containing about a
million records per file.

We attempted to convert each record to a separate MODS XML file, but that many
files tended to blow up the filesystem, and the textIndexer also used a lot of
memory processing the file list.

It would be nice to have the indexer process the records directly: convert each
one to MARCXML using the Marc4j library, then pass the MARCXML through a series
of prefilters that produce MODS and, finally, indexable meta-data.
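
For illustration only, here is a minimal sketch of the flow the request describes: pull one record at a time out of a large MARC file, convert it to XML, and run it through a chain of prefilters, so that neither a million small files nor a huge file list is ever needed. It is not XTF code. The class name `MarcPipelineSketch` and its methods are hypothetical. It assumes Java 11 or later, that the prefilters are XSLT stylesheets, and that the MARC-to-MARCXML step is supplied by the caller as a function (this is where Marc4j would plug in; the sketch itself does not call Marc4j). The only MARC fact it relies on is that the first five bytes of each record's 24-byte leader give the record length as ASCII digits.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.StringReader;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical sketch; none of these names come from XTF or Marc4j.
class MarcPipelineSketch {
    static final int LEADER_LENGTH = 24;

    /** Reads the next raw MARC record from the stream, or returns null at end of file. */
    static byte[] nextRawRecord(InputStream in) throws IOException {
        byte[] lengthField = in.readNBytes(5);
        if (lengthField.length == 0) return null;            // clean end of file
        if (lengthField.length < 5) throw new EOFException("Truncated record leader");
        int length;
        try {
            length = Integer.parseInt(new String(lengthField, StandardCharsets.US_ASCII));
        } catch (NumberFormatException e) {
            throw new IOException("Record length is not numeric", e);
        }
        if (length < LEADER_LENGTH) throw new IOException("Record shorter than its leader");
        byte[] record = new byte[length];
        System.arraycopy(lengthField, 0, record, 0, 5);
        if (in.readNBytes(record, 5, length - 5) < length - 5) {
            throw new EOFException("Truncated record");
        }
        return record;
    }

    /** Compiles one XSLT prefilter so it can be reused for every record. */
    static Templates compile(String xslt) throws TransformerException {
        return TransformerFactory.newInstance()
                .newTemplates(new StreamSource(new StringReader(xslt)));
    }

    /** Runs the XML through each prefilter in order and returns the final output. */
    static String applyPrefilters(String xml, List<Templates> prefilters)
            throws TransformerException {
        String current = xml;
        for (Templates prefilter : prefilters) {
            StringWriter out = new StringWriter();
            prefilter.newTransformer().transform(
                    new StreamSource(new StringReader(current)), new StreamResult(out));
            current = out.toString();
        }
        return current;
    }

    /**
     * Streams every record through conversion, the prefilter chain, and the
     * indexer callback. Only one record is held in memory at a time.
     * Returns the number of records processed.
     */
    static int indexAll(InputStream marcFile,
                        Function<byte[], String> toMarcXml,   // assumed Marc4j-backed in practice
                        List<Templates> prefilters,
                        Consumer<String> indexer)
            throws IOException, TransformerException {
        int count = 0;
        byte[] raw;
        while ((raw = nextRawRecord(marcFile)) != null) {
            indexer.accept(applyPrefilters(toMarcXml.apply(raw), prefilters));
            count++;
        }
        return count;
    }
}
```

A caller would open one of the large MARC files as a buffered stream, compile the MARCXML-to-MODS and MODS-to-meta-data stylesheets once with `compile`, and pass them to `indexAll` along with the conversion function and a callback that hands each result to the indexer.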