Re: [Gmod-schema] Loading chromosome sequences greater than 1G bp long.

Thank you both for answering my questions.

I don't mind using a GMOD build of Postgres if it's well maintained. Or, at the least, should we urge the Postgres developers to raise that limit in a future release?

Chun-Huai
________________________________
From: Scott Cain <sc...@sc...>
Sent: Thursday, May 11, 2023 9:54 PM
To: Joe Carlson <jwc...@lb...>
Cc: Cheng, Chun-Huai <chu...@ws...>; gmo...@li... <gmo...@li...>
Subject: Re: [Gmod-schema] Loading chromosome sequences greater than 1G bp long.

I was kind of remembering that you wrote about this. I wonder if we could/should compile our own Postgres server to bump up that limit. At first blush, that sounds like a terrible idea but I could be convinced.

On May 11, 2023, at 4:14 PM, Joe Carlson <jwc...@lb...> wrote:

Hello,

It cannot be done. Even though the documentation for the TEXT type in PostgreSQL says it is 'variable unlimited length', there is an internal limit on the size of an allocated buffer. It's in the documentation for the character data types: https://www.postgresql.org/docs/current/datatype-character.html

In any case, the longest possible character string that can be stored is about 1 GB.

I recently ran into this problem myself and tried various ways around the limit by storing chunks and trying to concatenate them with plpgsql functions, but that did not work (https://www.postgresql.org/message-id/80025ECD-44A6-454F-A4F9-784474B84952%40lbl.gov).

You need to store residues as chunked pieces in a separate table and rely on your middleware code to split the pieces on ingest and concatenate them on output.
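As a rough illustration of that split-on-ingest, concatenate-on-output idea, here is a minimal Python sketch. The function names, the 100 Mbp chunk size, and the idea of a (rank, chunk) row layout are all assumptions for illustration; none of them is part of Chado or Tripal, and the database reads and writes are left out entirely.

```python
from typing import Iterable, Iterator, Tuple

# Assumption: 100 Mbp per stored row, chosen only to stay well under the ~1 GB limit.
CHUNK_SIZE = 100_000_000


def split_residues(residues: str, chunk_size: int = CHUNK_SIZE) -> Iterator[Tuple[int, str]]:
    """Yield (rank, chunk) pairs; rank is the 0-based position of the chunk."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for rank, start in enumerate(range(0, len(residues), chunk_size)):
        yield rank, residues[start:start + chunk_size]


def join_residues(chunks: Iterable[Tuple[int, str]]) -> str:
    """Rebuild the full sequence from (rank, chunk) pairs in any order."""
    ordered = sorted(chunks, key=lambda pair: pair[0])
    return "".join(chunk for _, chunk in ordered)
```

In this sketch each (rank, chunk) pair would become one row in a hypothetical side table keyed by the feature it belongs to, and the join happens in the application rather than in the database, so no single value handled by Postgres approaches the limit.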

Joe Carlson

On May 9, 2023, at 10:22 AM, Cheng, Chun-Huai via Gmod-schema <gmo...@li...> wrote:

Hi,

We have a big genome with large chromosomes, each of which is longer than 1 Gbp. We're having trouble loading them into the 'feature' table. We've tried the Tripal FASTA loader and a custom script, but both failed with an error (in the Postgres v12 log: ERROR:  invalid memory alloc request size 1161290884). Is there any way we can load the sequences for this genome into Chado?

Here's some length information about the genome:

chr1     1853204363
chr2     1709916750
chr3     1527935595
chr4     1588398909
chr5     1297479159
chr6     1379031673
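
For a sense of scale, the short Python sketch below compares these lengths with the allocation limit and counts how many pieces each chromosome would need. It assumes the limit in question is PostgreSQL's MaxAllocSize of 0x3fffffff bytes (1 GB minus 1) and one byte per base; the 100 Mbp chunk size is an arbitrary illustrative choice.

```python
# Assumption: the limit is PostgreSQL's MaxAllocSize, 0x3fffffff bytes (1 GB - 1).
MAX_ALLOC_BYTES = 0x3FFFFFFF

# Assumption: an arbitrary illustrative chunk size of 100 Mbp.
CHUNK_SIZE = 100_000_000

# Chromosome lengths in bp, copied from the list above.
LENGTHS = {
    "chr1": 1853204363,
    "chr2": 1709916750,
    "chr3": 1527935595,
    "chr4": 1588398909,
    "chr5": 1297479159,
    "chr6": 1379031673,
}


def chunks_needed(length: int, chunk_size: int = CHUNK_SIZE) -> int:
    """Number of chunks needed to hold `length` bases (integer ceiling division)."""
    return -(-length // chunk_size)


for name, length in LENGTHS.items():
    over = length > MAX_ALLOC_BYTES  # one byte per base is assumed
    print(f"{name}\t{length}\tover limit: {over}\tchunks: {chunks_needed(length)}")
```

Under those assumptions every chromosome listed is over the limit on its own, so each one would have to be stored in pieces.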

Thank you very much for your help,

Chun-Huai Cheng
_______________________________________________
Gmod-schema mailing list
Gmo...@li...
https://lists.sourceforge.net/lists/listinfo/gmod-schema