Thread: [jgrapht-users] Query about scalability using JGraphT
From: Lokeya V. <lo...@gm...> - 2007-04-03 03:26:42
Hi, I am looking for an appropriate Java graph library to use for my project. I am working on an information retrieval project where I deal with a huge collection of documents. I will represent the document groups as nodes and use edges to connect them. My graph will keep growing as I try to find more connections among documents. So is JGraphT scalable for such a case? Kindly let me know.
From: Trevor H. <tr...@vo...> - 2007-04-03 04:40:29
On Apr 2, 2007, at 8:26 PM, Lokeya Venkatachalam wrote:

> I am doing a project in information retrieval where I deal with
> huge collection of documents.

How huge is huge?

> So is JGraphT scalable for such case?

It depends on what operations you're doing. I haven't done any tests, but I'm sure JGraphT is as scalable as any other Java-based graph library.

Trevor
From: Lokeya V. <lo...@gm...> - 2007-04-05 17:24:14
We are dealing with millions of documents. Each document is less than 2 KB in size.

On 4/3/07, Trevor Harmon <tr...@vo...> wrote:
> How huge is huge?
>
> It depends on what operations you're doing. I haven't done any tests,
> but I'm sure JGraphT is as scalable as any other Java-based graph
> library. [...]
From: Trevor H. <tr...@vo...> - 2007-04-05 20:15:22
On Apr 5, 2007, at 10:24 AM, Lokeya Venkatachalam wrote:

> We are dealing with millions of documents. Each document is less
> than 2 KB size.

How about generating a graph of millions of 2 KB vertices? (Use one of the classes in org.jgrapht.generate for this.) Then you can run some simple tests and see what kind of performance you get.

Trevor
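Such a smoke test can be sketched without JGraphT on the classpath at all, using a plain adjacency map as a stand-in for the graph a class from org.jgrapht.generate would produce. This is a hypothetical probe (the class name `GraphMemoryProbe` and the ring topology are my choices, not from the thread); the heap number it reports is only a lower bound on what a full graph library would need, but it tells you quickly whether a million vertices is even in range.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Rough scalability smoke test: build a million lightweight integer
// vertices with a ring of edges and see how much heap the structure
// itself consumes. A plain-Java stand-in for a JGraphT generator run.
public class GraphMemoryProbe {

    // Build a ring graph of n vertices: each vertex v has a single
    // out-neighbor (v + 1) mod n.
    static Map<Integer, List<Integer>> buildRing(int n) {
        Map<Integer, List<Integer>> adjacency = new HashMap<>(n * 2);
        for (int v = 0; v < n; v++) {
            List<Integer> neighbors = new ArrayList<>(1);
            neighbors.add((v + 1) % n);
            adjacency.put(v, neighbors);
        }
        return adjacency;
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        System.gc();
        long before = rt.totalMemory() - rt.freeMemory();

        Map<Integer, List<Integer>> graph = buildRing(1_000_000);

        System.gc();
        long after = rt.totalMemory() - rt.freeMemory();
        System.out.printf("vertices: %d, approx heap used: %d MB%n",
                graph.size(), (after - before) / (1024 * 1024));
    }
}
```

If the stand-in already strains the heap, the real graph, which also carries edge objects and index structures, certainly will.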
From: Lokeya V. <lo...@gm...> - 2007-04-11 00:06:55
I will do that testing and see, but I am just curious whether anyone has done this before. To be more precise: will it scale to represent 1 million documents, where each document is 1 KB in size, which is around 1 GB of data in the form of nodes?

On 4/5/07, Trevor Harmon <tr...@vo...> wrote:
> How about generating a graph of millions of 2 KB vertices? (Use one
> of the classes in org.jgrapht.generate for this.) Then you can run
> some simple tests and see what kind of performance you get. [...]
From: Randall R S. <rs...@so...> - 2007-04-11 00:57:25
On Tuesday 10 April 2007 17:06, Lokeya Venkatachalam wrote:

> I will do that testing and see. But I am just curious if anyone has
> done this before. To be more precise, will it scale to represent
> 1 million documents, where each document is 1 KB in size, which is
> around 1 GB of data in the form of nodes.

The amount of internal storage associated with each node is not a factor in the performance of the JGraphT code. Performance depends only on the size of the node and arc sets, the graph's topology, and which operations you invoke. 32-bit Java Virtual Machines can comfortably manage 1 GB heaps. Naturally, you must have the full gigabyte of physical RAM available to the JVM.

Randall Schulz
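For reference, heap size is controlled by standard JVM options at launch; the jar name and main class below are placeholders, but `-Xms`/`-Xmx` are the usual HotSpot flags:

```shell
# Reserve a 1 GB maximum heap, pre-sized to avoid resize pauses.
# (jgrapht.jar and MyGraphApp are hypothetical names.)
java -Xms1024m -Xmx1024m -cp jgrapht.jar:. MyGraphApp
```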
From: Aaron H. <jgr...@li...> - 2007-04-11 00:53:48
Hi Lokeya,

On Apr 10, 2007, at 8:06 PM, Lokeya Venkatachalam wrote:

> I will do that testing and see. But I am just curious if anyone has
> done this before.

I've done lots of work with about 500,000 vertices, which is a bit ungainly but certainly tractable.

> To be more precise, will it scale to represent 1 million document ,
> where each document is 1KB in size which is around 1GB of data in
> form of nodes.

Unless you're actively using the content of the documents as you traverse the nodes (certainly a possibility), it would seem wise to me to keep the node objects very lightweight, referring to a backing store of files or database rows as necessary. In the project I referred to above, we chose the latter route, with the nodes initially containing just id numbers, and optionally caching values from the database as needed.

Hope that helps,
Aaron
--
Aaron Harnly
Center for Computational Learning Systems
Columbia University
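The lightweight-node pattern described above can be sketched roughly as follows. The class name `DocNode` and the stub `Map` standing in for the file system or database table are illustrative choices, not details from the thread:

```java
import java.util.HashMap;
import java.util.Map;

// Lightweight graph vertex: stores only an id (plus an optional cached
// value); the document text lives in a backing store and is fetched on
// demand. The store here is a stub Map standing in for files or
// database rows.
public class DocNode {
    private final int id;
    private String cachedContent;              // lazily filled
    private final Map<Integer, String> store;  // stand-in backing store

    public DocNode(int id, Map<Integer, String> store) {
        this.id = id;
        this.store = store;
    }

    public int id() { return id; }

    // Fetch the document body only when actually needed during traversal.
    public String content() {
        if (cachedContent == null) {
            cachedContent = store.get(id);
        }
        return cachedContent;
    }

    // Equality on id alone keeps hashing cheap for graph lookups.
    @Override public boolean equals(Object o) {
        return o instanceof DocNode && ((DocNode) o).id == id;
    }
    @Override public int hashCode() { return id; }

    public static void main(String[] args) {
        Map<Integer, String> store = new HashMap<>();
        store.put(42, "document body here");
        DocNode node = new DocNode(42, store);
        System.out.println(node.id() + " -> " + node.content());
    }
}
```

With nodes this small, a million of them cost on the order of tens of megabytes of heap, leaving the gigabyte of document text out of memory until a traversal actually needs it.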
From: Lokeya V. <lo...@gm...> - 2007-04-11 01:23:52
Thanks a lot for your reply; this gives me a really good idea. I would like to know the size of a node/vertex in the 500,000-vertex graph you mentioned. My project is related to information processing, to be more specific query processing with an option for user feedback, where users can take the initial graph and ask for more relations among documents, so the graph will grow. I understand that the ability to hold the graph depends mostly on RAM, but I just want to make sure that if I really have a good amount of memory, JGraphT doesn't have any issue with respect to the number of nodes/edges.

On 4/10/07, Aaron Harnly <jgr...@li...> wrote:
> Unless you're actively using the content of the documents as you
> traverse the nodes (certainly a possibility), it would seem wise to
> me to keep the node objects very lightweight, referring to a backing
> store of files or database rows as necessary. [...]