From: Karl O. P. <ko...@me...> - 2015-04-21 19:31:02
On Tue, 21 Apr 2015 12:50:32 -0500
Mara Kim <mar...@va...> wrote:
> Wow. There really are features that are more than 2 billion long?
>
> Not sure about the present, but it's only a matter of time. The
> largest sequenced genome is currently *Pinus taeda* at 23.18 Gbp over
> 12 chromosomes. In the best case scenario where these are
> distributed equally over all chromosomes, that gives a chromosome
> size of 1.93 Gbp, which is uncomfortably close. *Paris japonica* is
> known to have a genome size of ~150 Gbp.
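
As a rough check of the figures quoted above, here is a minimal Python sketch. It assumes feature offsets are stored in a signed 32-bit integer column (maximum 2,147,483,647) and uses the best-case even split across chromosomes described above; the variable names are illustrative only.

```python
# Largest value a signed 32-bit integer can hold
# (assumption: offsets are stored in such a column).
INT32_MAX = 2**31 - 1  # 2,147,483,647

genome_bp = 23.18e9  # Pinus taeda genome size quoted above, in base pairs
chromosomes = 12

# Best case: bases spread evenly across all chromosomes.
mean_chromosome_bp = genome_bp / chromosomes

# Fraction of the 32-bit range still unused by such a chromosome.
headroom = 1 - mean_chromosome_bp / INT32_MAX

print(f"mean chromosome: {mean_chromosome_bp / 1e9:.2f} Gbp")
print(f"int32 limit:     {INT32_MAX / 1e9:.2f} Gbp")
print(f"headroom:        {headroom:.1%}")
```

Any uneven distribution pushes the largest chromosome closer to the limit than this average suggests.
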
The bad news is that having featureloc be bigint could really
affect performance, as, potentially, does moving some of the
keys to bigint.
Step 1 would then be to measure just what the performance impact
is. No point in making a fuss if it does not matter.
My back-of-the-pants feeling is that it will matter, especially
as you start to get into billions of rows.
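
Before running real benchmarks, the raw storage side can be sized with a back-of-the-envelope sketch like the one below. It assumes 4-byte integers and 8-byte bigints, treats two widened columns as a stand-in for a start and an end offset, and uses hypothetical row counts. It ignores alignment padding, row headers, and indexes, so it is not a substitute for measuring.

```python
INT_BYTES = 4     # size of a 32-bit integer value
BIGINT_BYTES = 8  # size of a 64-bit integer value


def extra_bytes(rows: int, widened_columns: int) -> int:
    """Raw extra bytes from widening columns from 4 to 8 bytes per row."""
    return rows * widened_columns * (BIGINT_BYTES - INT_BYTES)


# Hypothetical table sizes; two columns stand in for a start and end offset.
for rows in (10**6, 10**8, 10**9):
    gib = extra_bytes(rows, widened_columns=2) / 2**30
    print(f"{rows:>13,} rows: +{gib:.2f} GiB")
```

The growth is linear in row count, so the raw cost only becomes noticeable at the billion-row scale; whether it translates into slower queries still has to be measured.
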
If it does matter then perhaps whether to use bigints could
be an install-time choice. At least in some cases.
I can't see a lot of current chado users ever wanting
bigints for feature offsets, as critical as it might be
for some. (We are "feature oriented" here and
wouldn't want to see our storage needs balloon, or
increase the amount of time it takes to process
the storage.)
Karl <ko...@me...>
Free Software: "You don't pay back, you pay forward."
-- Robert A. Heinlein