From: Sandy M. <sa...@sa...> - 2003-07-07 22:24:40
Geoff,
Thanks for having a think about this. I think our problems are related
to htmerge, but given that I had to make changes to get the code to
compile/run on 64-bit Solaris, I was concerned that there may be other
64-bit issues. Our word.db files are growing beyond 2GB
(uncompressed, due to zlib errors we encountered earlier). When we merge
large databases, searches on words that return results from the
pre-merge databases return no results from the merged database. We only
seemed to encounter these issues once we got into the >4GB memory area.
The reason we are using htmerge is that we need a composite index of
content available from both within and external to the university, and
also to allow us to spider faster. If you can suggest why we may be
having problems, we'd be really grateful.
Thanks
Sandy
p.s.
We are using 3.2.0b4-20030126 and are compiling statically.
To make it compile for 64-bit with gcc, I made the changes below.
(I am not a C/C++ developer, so these changes are largely based on
hunches.)
I changed a line in include/htconfig.h from:
/* Define this to the type of the third argument of getpeername() */
#define GETPEERNAME_LENGTH_T size_t
to:
/* Define this to the type of the third argument of getpeername() */
#define GETPEERNAME_LENGTH_T socklen_t
and in htlib/String.cc (to resolve a segmentation error), from:
void String::copy_data_from(const char *s, int len, int dest_offset)
{
memcpy(Data + dest_offset, s, len);
}
to:
void String::copy_data_from(const char *s, size_t len, size_t dest_offset)
{
memcpy(Data + dest_offset, s, len);
}
On Monday, July 7, 2003, at 10:10 pm, Geoff Hutchison wrote:
>> Can anyone comment on what would be required to make the htdig 3.2b4
>> code 64 bit clean ?
>
> I'm not familiar with issues on Solaris, but ht://Dig has long been
> "clean" on Alpha systems. So it should be 64-bit clean as-is. The
> Berkeley DB code is most definitely 64-bit clean and handles databases
> up to 4TB if you've got the hardware for it.
>
>> I am merging large databases, and the htmerge requires >4Gb memory so
>> I have had to compile using -m64 (Solaris 8, gcc 3.2.1) but had to
>> make some small changes in the code to make it compile, and to avoid
>> a segmentation error.
>
> OK, what changes did you make exactly? What snapshot did you use? Are
> you attempting to compile/use shared libraries?
>
>> I am indexing > 400,000 pages - any thoughts on whether htdig 3.3 can
>> scale to this ?
>
> I'm curious why this is causing problems. I know people who have
> 32-bit systems that easily handle 400,000+ pages. How big are your
> databases exactly? Are your problems limited to htmerge? (In which
> case, I likely know the problem, and it's not due to 64-bit
> addressing.)
>
> -Geoff
>
> --
> -Geoff Hutchison
> Williams Students Online
> http://wso.williams.edu/