Thread: [jgrapht-users] Query about scalability using JGraphT
From: Lokeya V. <lo...@gm...> - 2007-04-03 03:26:42
Hi, I am looking for an appropriate Java graph library to use for my project. I am working on an information retrieval project where I deal with a huge collection of documents. I will represent the document groups as nodes and use edges to connect them. My graph will keep growing as I try to find more connections among documents. So is JGraphT scalable for such a case? Kindly let me know.
From: Trevor H. <tr...@vo...> - 2007-04-03 04:40:29
On Apr 2, 2007, at 8:26 PM, Lokeya Venkatachalam wrote:

> I am doing a project in information retrieval where I deal with
> huge collection of documents.

How huge is huge?

> So is JGraphT scalable for such case?

It depends on what operations you're doing. I haven't done any tests, but I'm sure JGraphT is as scalable as any other Java-based graph library.

Trevor
From: Lokeya V. <lo...@gm...> - 2007-04-05 17:24:14
We are dealing with millions of documents. Each document is less than 2 KB in size.

On 4/3/07, Trevor Harmon <tr...@vo...> wrote:
> How huge is huge?
>
> It depends on what operations you're doing. I haven't done any tests,
> but I'm sure JGraphT is as scalable as any other Java-based graph
> library. [...]
From: Trevor H. <tr...@vo...> - 2007-04-05 20:15:22
On Apr 5, 2007, at 10:24 AM, Lokeya Venkatachalam wrote:

> We are dealing with millions of documents. Each document is less
> than 2 KB size.

How about generating a graph of millions of 2 KB vertices? (Use one of the classes in org.jgrapht.generate for this.) Then you can run some simple tests and see what kind of performance you get.

Trevor
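Such a smoke test can be sketched without JGraphT on the classpath at all, using a plain adjacency map as a stand-in for the graph a class from org.jgrapht.generate would produce. This is a hypothetical probe (the class name `GraphMemoryProbe` and the ring topology are my choices, not from the thread); the heap number it reports is only a lower bound on what a full graph library would need, but it tells you quickly whether a million vertices is even in range.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Rough scalability smoke test: build a million lightweight integer
// vertices with a ring of edges and see how much heap the structure
// itself consumes. A plain-Java stand-in for a JGraphT generator run.
public class GraphMemoryProbe {

    // Build a ring graph of n vertices: each vertex v has a single
    // out-neighbor (v + 1) mod n.
    static Map<Integer, List<Integer>> buildRing(int n) {
        Map<Integer, List<Integer>> adjacency = new HashMap<>(n * 2);
        for (int v = 0; v < n; v++) {
            List<Integer> neighbors = new ArrayList<>(1);
            neighbors.add((v + 1) % n);
            adjacency.put(v, neighbors);
        }
        return adjacency;
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        System.gc();
        long before = rt.totalMemory() - rt.freeMemory();

        Map<Integer, List<Integer>> graph = buildRing(1_000_000);

        System.gc();
        long after = rt.totalMemory() - rt.freeMemory();
        System.out.printf("vertices: %d, approx heap used: %d MB%n",
                graph.size(), (after - before) / (1024 * 1024));
    }
}
```

If the stand-in already strains the heap, the real graph, which also carries edge objects and index structures, certainly will.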
From: Lokeya V. <lo...@gm...> - 2007-04-11 00:06:55
I will do that testing and see, but I am just curious whether anyone has done this before. To be more precise: will it scale to represent 1 million documents, where each document is 1 KB in size, which is around 1 GB of data in the form of nodes?

On 4/5/07, Trevor Harmon <tr...@vo...> wrote:
> How about generating a graph of millions of 2 KB vertices? (Use one
> of the classes in org.jgrapht.generate for this.) Then you can run
> some simple tests and see what kind of performance you get. [...]
From: Randall R S. <rs...@so...> - 2007-04-11 00:57:25
On Tuesday 10 April 2007 17:06, Lokeya Venkatachalam wrote:

> I will do that testing and see. But I am just curious if anyone has
> done this before. To be more precise, will it scale to represent
> 1 million documents, where each document is 1 KB in size, which is
> around 1 GB of data in the form of nodes.

The amount of internal storage associated with each node is not a factor in the performance of the JGraphT code. Performance depends only on the size of the node and arc sets, the graph's topology, and which operations you invoke. 32-bit Java Virtual Machines can comfortably manage 1 GB heaps. Naturally, you must have the full gigabyte of physical RAM available to the JVM.

Randall Schulz
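For reference, heap size is controlled by standard JVM options at launch; the jar name and main class below are placeholders, but `-Xms`/`-Xmx` are the usual HotSpot flags:

```shell
# Reserve a 1 GB maximum heap, pre-sized to avoid resize pauses.
# (jgrapht.jar and MyGraphApp are hypothetical names.)
java -Xms1024m -Xmx1024m -cp jgrapht.jar:. MyGraphApp
```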
From: Aaron H. <jgr...@li...> - 2007-04-11 00:53:48
Hi Lokeya,

On Apr 10, 2007, at 8:06 PM, Lokeya Venkatachalam wrote:

> I will do that testing and see. But I am just curious if anyone has
> done this before.

I've done lots of work with about 500,000 vertices, which is a bit ungainly but certainly tractable.

> To be more precise, will it scale to represent 1 million document ,
> where each document is 1KB in size which is around 1GB of data in
> form of nodes.

Unless you're actively using the content of the documents as you traverse the nodes (certainly a possibility), it would seem wise to me to keep the node objects very lightweight, referring to a backing store of files or database rows as necessary. In the project I referred to above, we chose the latter route, with the nodes initially containing just id numbers, and optionally caching values from the database as needed.

Hope that helps,
Aaron
--
Aaron Harnly
Center for Computational Learning Systems
Columbia University
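The lightweight-node pattern described above can be sketched roughly as follows. The class name `DocNode` and the stub `Map` standing in for the file system or database table are illustrative choices, not details from the thread:

```java
import java.util.HashMap;
import java.util.Map;

// Lightweight graph vertex: stores only an id (plus an optional cached
// value); the document text lives in a backing store and is fetched on
// demand. The store here is a stub Map standing in for files or
// database rows.
public class DocNode {
    private final int id;
    private String cachedContent;              // lazily filled
    private final Map<Integer, String> store;  // stand-in backing store

    public DocNode(int id, Map<Integer, String> store) {
        this.id = id;
        this.store = store;
    }

    public int id() { return id; }

    // Fetch the document body only when actually needed during traversal.
    public String content() {
        if (cachedContent == null) {
            cachedContent = store.get(id);
        }
        return cachedContent;
    }

    // Equality on id alone keeps hashing cheap for graph lookups.
    @Override public boolean equals(Object o) {
        return o instanceof DocNode && ((DocNode) o).id == id;
    }
    @Override public int hashCode() { return id; }

    public static void main(String[] args) {
        Map<Integer, String> store = new HashMap<>();
        store.put(42, "document body here");
        DocNode node = new DocNode(42, store);
        System.out.println(node.id() + " -> " + node.content());
    }
}
```

With nodes this small, a million of them cost on the order of tens of megabytes of heap, leaving the gigabyte of document text out of memory until a traversal actually needs it.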
From: Lokeya V. <lo...@gm...> - 2007-04-11 01:23:52
Thanks a lot for your reply; this gives me a really good idea. I would like to know the size of a node/vertex in the 500,000-vertex graph you mentioned. My project is related to information processing, to be more specific query processing with an option for user feedback, where users can take the initial graph and ask for more relations among documents, so the graph will grow. I understand that the ability to hold the graph depends mostly on RAM, but I just want to make sure that if I really have a good amount of memory, JGraphT doesn't have any issue with respect to the number of nodes/edges.

On 4/10/07, Aaron Harnly <jgr...@li...> wrote:
> Unless you're actively using the content of the documents as you
> traverse the nodes (certainly a possibility), it would seem wise to
> me to keep the node objects very lightweight, referring to a backing
> store of files or database rows as necessary. [...]