From: Cheng, Chun-H. <chu...@ws...> - 2023-05-12 16:04:56
Thank you both for answering my questions. I don't mind using the GMOD version of Postgres if it's well maintained. Or should we at least urge the Postgres developers to raise that limit in a future release?

Chun-Huai

________________________________
From: Scott Cain <sc...@sc...>
Sent: Thursday, May 11, 2023 9:54 PM
To: Joe Carlson <jwc...@lb...>
Cc: Cheng, Chun-Huai <chu...@ws...>; gmo...@li... <gmo...@li...>
Subject: Re: [Gmod-schema] Loading chromosome sequences greater than 1G bp long.

I was kind of remembering that you wrote about this. I wonder if we could/should compile our own Postgres server to bump up that limit. At first blush, that sounds like a terrible idea, but I could be convinced.

On May 11, 2023, at 4:14 PM, Joe Carlson <jwc...@lb...> wrote:

Hello,

It cannot be done. Even though the PostgreSQL documentation describes the TEXT type as having "variable unlimited length", there is an internal limit on the size of an allocated buffer. It's in the documentation for the character data types (https://www.postgresql.org/docs/current/datatype-character.html):

> In any case, the longest possible character string that can be stored is about 1 GB.

I recently ran into this problem myself and tried various ways around the limit, storing chunks and trying to concatenate them with plpgsql functions, but that did not work (https://www.postgresql.org/message-id/80025ECD-44A6-454F-A4F9-784474B84952%40lbl.gov). You need to store residues as chunked pieces in a separate table and rely on your middleware code to split the pieces on ingest and concatenate them on output.
Joe Carlson

On May 9, 2023, at 10:22 AM, Cheng, Chun-Huai via Gmod-schema <gmo...@li...> wrote:

Hi,

We have a big genome with large chromosomes, each of which is longer than 1G bp. We're having trouble loading them into the 'feature' table. We've tried the Tripal FASTA loader and a custom script, but both failed with an error (in the Postgres v12 log: ERROR: invalid memory alloc request size 1161290884). Is there any way we can load the sequences for this genome into Chado? Here's some length information about the genome:

chr1  1853204363
chr2  1709916750
chr3  1527935595
chr4  1588398909
chr5  1297479159
chr6  1379031673

Thank you very much for your help,
Chun-Huai Cheng

_______________________________________________
Gmod-schema mailing list
Gmo...@li...
https://lists.sourceforge.net/lists/listinfo/gmod-schema
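Joe's suggestion (store residues as chunks in a separate table, split them on ingest, and concatenate them in middleware on output) might look roughly like the sketch below. It is only an illustration under stated assumptions: the `feature_residue_chunk` table, its columns, and all function names are hypothetical and not part of the Chado schema, and Python's built-in `sqlite3` stands in for PostgreSQL so the example runs without a server (with a PostgreSQL driver such as psycopg the placeholder style is `%s` rather than `?`). Reassembly happens in the client because concatenating the chunks inside the database would again produce a single value over the ~1 GB limit.

```python
from __future__ import annotations

import sqlite3

# Hypothetical default chunk size: far below PostgreSQL's ~1 GB per-value limit.
DEFAULT_CHUNK_SIZE = 1_000_000


def create_schema(conn: sqlite3.Connection) -> None:
    """Create a hypothetical side table holding fixed-size residue chunks."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS feature_residue_chunk ("
        " feature_id INTEGER NOT NULL,"
        " chunk_index INTEGER NOT NULL,"
        " residues TEXT NOT NULL,"
        " PRIMARY KEY (feature_id, chunk_index))"
    )


def split_residues(residues: str, chunk_size: int = DEFAULT_CHUNK_SIZE) -> list[str]:
    """Split a sequence into consecutive pieces of at most chunk_size residues."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [residues[i:i + chunk_size] for i in range(0, len(residues), chunk_size)]


def store_sequence(conn: sqlite3.Connection, feature_id: int, residues: str,
                   chunk_size: int = DEFAULT_CHUNK_SIZE) -> int:
    """Insert one row per chunk (ingest side); returns the number of chunks."""
    rows = [(feature_id, idx, piece)
            for idx, piece in enumerate(split_residues(residues, chunk_size))]
    conn.executemany(
        "INSERT INTO feature_residue_chunk (feature_id, chunk_index, residues) "
        "VALUES (?, ?, ?)",
        rows,
    )
    conn.commit()
    return len(rows)


def load_sequence(conn: sqlite3.Connection, feature_id: int) -> str:
    """Concatenate all chunks in order in the client (output side)."""
    cur = conn.execute(
        "SELECT residues FROM feature_residue_chunk "
        "WHERE feature_id = ? ORDER BY chunk_index",
        (feature_id,),
    )
    return "".join(row[0] for row in cur)


def fetch_region(conn: sqlite3.Connection, feature_id: int, start: int, end: int,
                 chunk_size: int = DEFAULT_CHUNK_SIZE) -> str:
    """Return residues[start:end] (0-based, end-exclusive), reading only the
    chunks that overlap the region. chunk_size must match the one used on ingest."""
    if not 0 <= start <= end:
        raise ValueError("require 0 <= start <= end")
    if start == end:
        return ""
    first = start // chunk_size
    last = (end - 1) // chunk_size
    cur = conn.execute(
        "SELECT residues FROM feature_residue_chunk "
        "WHERE feature_id = ? AND chunk_index BETWEEN ? AND ? "
        "ORDER BY chunk_index",
        (feature_id, first, last),
    )
    joined = "".join(row[0] for row in cur)
    offset = start - first * chunk_size
    return joined[offset:offset + (end - start)]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    seq = "ACGTACGTAC" * 3 + "GGG"  # toy 33-residue sequence
    n = store_sequence(conn, 1, seq, chunk_size=10)  # tiny chunks for readability
    print(n, load_sequence(conn, 1) == seq, fetch_region(conn, 1, 8, 25, chunk_size=10))
```

Fixed-size chunks keep region lookups cheap: a subsequence query touches only the rows that overlap it, so typical sequence requests never have to materialize a whole chromosome. At the hypothetical 1,000,000-residue chunk size, chr1 (1,853,204,363 bp) would be stored as 1,854 rows.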