Thanks for the replies, everyone. I'm working with a user who wants us
to ingest one image, with associated radar data, every 5 minutes for
several years into the future, so over 100,000 records per year. We can
automate the process with a nested parent/child metadata hierarchy; I'm
thinking of a separate GN instance for each year, and harvesting from
those. I would bundle the images together if they were small enough,
but the total comes out to 315 MB/day.
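(For scale: one record every 5 minutes is 288 a day, or about 105,000
a year, and 315 MB/day works out to roughly 115 GB a year.)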
Does anyone have experience they would like to share with catalogs this
large?
On 02/16/2012 04:50 AM, heikki wrote:
> The TooManyClauses error happens because of the way certain queries are
> constructed; it's not directly the number of documents that are in the
> index. See e.g. this explanation.
> maxClauseCount is set to 16384 in GeoNetwork, which seems a bit arbitrary
> to me; it could be set to Integer.MAX_VALUE (which is 2^31 - 1).
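> For what it's worth, raising it programmatically is a one-liner; a
> minimal, untested sketch against the Lucene 3.x-era API:
>
>     import org.apache.lucene.search.BooleanQuery;
>
>     // The limit is a static, JVM-wide setting, so it has to be
>     // raised before any offending query is built.
>     BooleanQuery.setMaxClauseCount(Integer.MAX_VALUE);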
> Also, we're using TermRangeQuery for date ranges, where it would
> probably be better to use NumericRangeQuery.
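> To illustrate the difference, a rough sketch (Lucene 3.x; the field
> name is made up for the example):
>
>     import org.apache.lucene.search.NumericRangeQuery;
>     import org.apache.lucene.search.TermRangeQuery;
>
>     // A term range can rewrite to one boolean clause per matching
>     // term, which is what trips maxClauseCount on a big index.
>     TermRangeQuery byTerm = new TermRangeQuery(
>         "dateStamp", "2010-01-01", "2012-01-01", true, false);
>
>     // A numeric range walks a precomputed trie instead, keeping the
>     // clause count small; the field must have been indexed with
>     // NumericField for it to match anything.
>     NumericRangeQuery<Long> byNumber = NumericRangeQuery.newLongRange(
>         "dateStamp", 20100101L, 20120101L, true, false);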
> Apart from range queries, I would expect that a catalog with 100,000s or
> millions of records would still be fine -- though I have no performance
> data on such sizes. If anyone does, please let us know.
> Kind regards,
> Heikki Doeleman
> On Thu, Feb 16, 2012 at 2:20 PM, Victor Epitropou
> <vepitrop@...> wrote:
> Well, I first investigated this aspect a few days ago, when somebody
> posted a complaint about a maxClauseCount parameter in the Lucene
> index (forwarding post):
> On Tue, Feb 14, 2012 at 4:32 PM, <kieransun@...> wrote:
> > Hello GeoNetwork users,
> > we are searching for time ranges in about 54500 datasets inside
> > the CSW and get the following error:
> > Raised exception while searching metadata :
> > org.apache.lucene.search.BooleanQuery$TooManyClauses: maxClauseCount
> > is set to 16384
> > How can we raise it?
> > Kind regards,
> > Kieran
> After that, just googling for "maxClauseCount" brought up an
> interesting backstory. Apparently it is always hardcoded to some
> "high enough" value (which, depending on whom you ask, might be 16000,
> 32000 or 64000 clauses, etc.), and such values are of course proven
> too low sooner or later. Fortunately, it can be changed through
> configuration. That's the only obvious hard and fast limit I could
> identify that is due purely to an arbitrary constraint. For the
> rest... I suppose interesting things could happen if someone exceeds a
> total of 2^31 records (it might cause 32-bit integer overflows in
> certain software modules).
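> To spell out that last point, a tiny self-contained Java illustration
> (nothing GeoNetwork-specific; Lucene document numbers are plain Java
> ints, as far as I know):
>
>     public class OverflowDemo {
>         public static void main(String[] args) {
>             // 2^31 - 1 is the ceiling for a Java int; one past it
>             // wraps around to a large negative number.
>             int maxDocs = Integer.MAX_VALUE;   // 2147483647
>             System.out.println(maxDocs + 1);   // prints -2147483648
>         }
>     }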
International Arctic Research Center
University of Alaska Fairbanks