Re: [Gmod-schema] Loading chromosome sequences greater than 1G bp long.


Hello,

It cannot be done. Even though the PostgreSQL documentation describes the TEXT type as 'variable unlimited length', there is an internal limit on the size of an allocated buffer. It is stated in the documentation for the character data types (https://www.postgresql.org/docs/current/datatype-character.html):

> In any case, the longest possible character string that can be stored is about 1 GB.
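As a quick sanity check, the chromosome lengths from the original message (quoted further down) can be compared against that ceiling. This is only a sketch: `MAX_FIELD_BYTES` and `exceeds_limit` are illustrative names, the limit is taken as 1 GB minus one byte, and one byte per base is assumed.

```python
# Assumed ceiling for a single PostgreSQL value: 1 GB minus one byte.
MAX_FIELD_BYTES = 2**30 - 1

# Lengths in bp, copied from the original message in this thread.
chrom_lengths = {
    "chr1": 1853204363,
    "chr2": 1709916750,
    "chr3": 1527935595,
    "chr4": 1588398909,
    "chr5": 1297479159,
    "chr6": 1379031673,
}


def exceeds_limit(length_bp, bytes_per_base=1):
    """Return True if a sequence of this length cannot fit in one value."""
    return length_bp * bytes_per_base > MAX_FIELD_BYTES


# Map each chromosome name to whether it is over the assumed ceiling.
over_limit = {name: exceeds_limit(n) for name, n in chrom_lengths.items()}
for name, too_big in over_limit.items():
    print(name, "too large" if too_big else "fits")
```

Under these assumptions every chromosome in the list is over the ceiling, so none of them can go into a single text value.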

I recently ran into this problem myself and tried various ways around the limit, storing chunks and trying to concatenate them with plpgsql functions, but that did not work (https://www.postgresql.org/message-id/80025ECD-44A6-454F-A4F9-784474B84952%40lbl.gov).

You need to store the residues as chunked pieces in a separate table and rely on your middleware code to split the pieces on ingest and concatenate them on output.
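A minimal sketch of that split-and-concatenate approach, assuming a hypothetical table `feature_residue_chunk` (not part of the Chado schema) keyed by feature id and chunk order. It uses an in-memory SQLite database purely as a stand-in so the example runs on its own; against PostgreSQL the same logic would go through your usual driver. The chunk size is tiny here for demonstration; in practice it would be something well under the 1 GB limit.

```python
import sqlite3

CHUNK_SIZE = 8  # demonstration value only; choose something far below 1 GB


def split_residues(residues, chunk_size=CHUNK_SIZE):
    """Yield (chunk_rank, piece) pairs covering the sequence in order."""
    for rank, start in enumerate(range(0, len(residues), chunk_size)):
        yield rank, residues[start:start + chunk_size]


def store_residues(conn, feature_id, residues, chunk_size=CHUNK_SIZE):
    """Split on ingest: write one row per chunk."""
    conn.executemany(
        "INSERT INTO feature_residue_chunk (feature_id, chunk_rank, residues) "
        "VALUES (?, ?, ?)",
        ((feature_id, rank, piece)
         for rank, piece in split_residues(residues, chunk_size)),
    )


def load_residues(conn, feature_id):
    """Concatenate on output: read the chunks back in rank order."""
    rows = conn.execute(
        "SELECT residues FROM feature_residue_chunk "
        "WHERE feature_id = ? ORDER BY chunk_rank",
        (feature_id,),
    )
    return "".join(piece for (piece,) in rows)


# Hypothetical chunk table; the names are illustrative, not Chado's.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE feature_residue_chunk ("
    " feature_id INTEGER NOT NULL,"
    " chunk_rank INTEGER NOT NULL,"
    " residues TEXT NOT NULL,"
    " PRIMARY KEY (feature_id, chunk_rank))"
)

sequence = "ACGTACGTTTGACCA"
store_residues(conn, 1, sequence)
roundtrip = load_residues(conn, 1)
```

The concatenation happens in the client, not in the database, so no single value on the server side ever has to hold the whole chromosome.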

Joe Carlson

> On May 9, 2023, at 10:22 AM, Cheng, Chun-Huai via Gmod-schema <gmo...@li...> wrote:
> 
> Hi,
> 
> We have a big genome with large chromosomes, each of which is longer than 1G bp. We're having trouble loading them into the 'feature' table. We've tried the Tripal FASTA loader and a custom script, but both failed with an error (in the Postgres v12 log: ERROR:  invalid memory alloc request size 1161290884). Is there any way we can load the sequences for the genome into Chado?
> 
> Here's some length information about the genome:
> 
> chr1     1853204363
> chr2     1709916750
> chr3     1527935595
> chr4     1588398909
> chr5     1297479159
> chr6     1379031673
> 
> 
> Thank you very much for your help,
> 
> Chun-Huai Cheng