From: Geoff H. <ghu...@us...> - 2002-09-22 07:13:50
|
STATUS of ht://Dig branch 3-2-x
RELEASES:
3.2.0b4: In progress
(mifluz merge essentially finished, contact Geoff for patch to test)
3.2.0b3: Released: 22 Feb 2001.
3.2.0b2: Released: 11 Apr 2000.
3.2.0b1: Released: 4 Feb 2000.
SHOWSTOPPERS:
KNOWN BUGS:
* Odd behavior with $(MODIFIED) and scores not working with
wordlist_compress set, but working fine without wordlist_compress.
(The date is definitely stored correctly, even with compression on,
so this must be some sort of weird htsearch bug.)
* Not all htsearch input parameters are handled properly: PR#648. Use a
consistent mapping of input -> config -> template for all inputs where
it makes sense to do so (everything but "config" and "words"?).
* If exact isn't specified in the search_algorithms, $(WORDS) is not set
correctly: PR#650. (The documentation for 3.2.0b1 is updated, but can
we fix this?)
* META descriptions are somehow added to the database as FLAG_TITLE,
not FLAG_DESCRIPTION. (PR#859)
PENDING PATCHES (available but need work):
* Additional support for Win32.
* Memory improvements to htmerge. (Backed out b/c htword API changed.)
NEEDED FEATURES:
* Field-restricted searching.
* Return all URLs.
* Handle noindex_start & noindex_end as string lists.
TESTING:
* httools programs:
(htload a test file, check a few characteristics, htdump and compare)
* Turn on URL parser test as part of test suite.
* htsearch phrase support tests
* Tests for new config file parser
* Duplicate document detection while indexing
* Major revisions to ExternalParser.cc, including fork/exec instead of popen,
argument handling for parser/converter, allowing binary output from an
external converter.
* ExternalTransport needs testing of changes similar to ExternalParser.
DOCUMENTATION:
* List of supported platforms/compilers is ancient.
* Add thorough documentation on htsearch restrict/exclude behavior
(including '|' and regex).
* Document all of htsearch's mappings of input parameters to config attributes
to template variables. (Relates to PR#648.) Also make sure these config
attributes are all documented in defaults.cc, even if they're only set by
input parameters and never in the config file.
* Split attrs.html into categories for faster loading.
* require.html is not updated to list new features and disk space
requirements of 3.2.x (e.g. phrase searching, regex matching,
external parsers and transport methods, database compression.)
* TODO.html has not been updated for current TODO list and completions.
OTHER ISSUES:
* Can htsearch actually search while an index is being created?
(Does Loic's new database code make this work?)
* The code needs a security audit, esp. htsearch
* URL.cc tries to parse malformed URLs (which causes further problems)
(It should probably just set everything to empty) This relates to
PR#348.
|
|
From: Gilles D. <gr...@sc...> - 2002-09-21 02:45:29
|
Hi again, folks. Another bug I discovered in htdig while I was
experimenting with different approaches to indexing the Geocrawler
archives was that its removal of double slashes in URL::normalizePath()
may cause problems. Here's the code in question:
//
// Furthermore, get rid of "//". This could also cause loops
//
while ((i = _path.indexOf("//")) >= 0 && i < pathend)
{
String newPath;
newPath << _path.sub(0, i).get();
newPath << _path.sub(i + 1).get();
_path = newPath;
pathend = _path.indexOf('?');
if (pathend < 0)
pathend = _path.length();
}
The problem with this is that it assumes the path refers to a standard
hierarchical filesystem where a null path component is taken as the same
as a ".". That assumption can break down when the path is processed by a
script rather than by the filesystem. (There was a rant about a similar
bug in URL handling in Office XP in Woody's Office Watch some time ago,
just before Office XP's release.) The easy fix I can think of would be
to prefix the while loop above with this:
if (config.Boolean("remove_double_slash", 1))
and set that attribute to true by default in htcommon/defaults.cc.
Setting it to false in your htdig.conf would turn off this feature when
it causes problems. It's still a good feature to have in most cases of
normalizing conventional filesystem paths. The other approach I thought
of, which would take more work, would be to have a StringList attribute
called remove_url_path_part or something of the sort, which would define
all the substrings to be stripped out. The complication is that not all
the stuff that URL::normalizePath() strips out is simple substrings. I'm
leaning to the simpler fix. Thoughts?
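Purely as an illustration of that simpler fix, the guarded loop might end
up looking like this (assuming the proposed remove_double_slash attribute,
with its default added to htcommon/defaults.cc as described above):

//
// Get rid of "//" only if the (proposed) remove_double_slash
// attribute is set; some scripts treat a null path component
// as significant, so this needs to be possible to turn off.
//
if (config.Boolean("remove_double_slash", 1))
{
    while ((i = _path.indexOf("//")) >= 0 && i < pathend)
    {
        String newPath;
        newPath << _path.sub(0, i).get();
        newPath << _path.sub(i + 1).get();
        _path = newPath;
        pathend = _path.indexOf('?');
        if (pathend < 0)
            pathend = _path.length();
    }
}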
The way I stumbled into this problem with Geocrawler was when I used a
start_url like
http://www.geocrawler.com/archives/3/8822/2002/8/
to index the Aug 2002 htdig-general archives. Normally, the path component
after the month is a starting document number for a page of 50 messages in
reverse chronological order. E.g., for August,
http://www.geocrawler.com/archives/3/8822/2002/8/0/
is the last 50 messages of the month,
http://www.geocrawler.com/archives/3/8822/2002/8/50/
is the next 50, and so on, leading to URLs for the individual messages
like this:
http://www.geocrawler.com/archives/3/8822/2002/8/100/9269993/
But if you omit the starting document number, as in the first URL above,
Geocrawler generates URLs for messages with a null starting number, as
http://www.geocrawler.com/archives/3/8822/2002/8//9269993/
which work fine until htdig removes one of the slashes, at which point
Geocrawler tries a non-existent starting number, rather than recognising
the last number as a message ID (because the position in the path is
wrong), so you'd end up indexing a lot of "No Results Found" pages.
The workaround in this case was easy enough - I just used a starting
number of 0/ at the end of the start_url, and all was good. I also
used url_rewrite_rules (and later an external converter script which
also had other "cleanups") to normalize the starting number in all
message URLs to 0, so that you wouldn't get the same message indexed
under multiple start numbers. However, not all such situations will
have such an easy workaround.
--
Gilles R. Detillieux E-mail: <gr...@sc...>
Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba Winnipeg, MB R3E 3J7 (Canada)
|
|
From: Gilles D. <gr...@sc...> - 2002-09-20 22:54:36
|
Hi, folks. I've been giving some thought to how htmerge -m works when it
finds the same URL in both databases. Right now, it just tosses out the
older docdb record and keeps the newer one. However, it occurs to me that
this could cause a loss of information that's collected during a full dig,
if you then merge in a more recent partial dig. A full dig would likely
harvest more link descriptions for a given URL, and a higher backlink
count, than would a partial dig. So, if the record from the partial dig is
more recent, it will clobber the more complete information from the
corresponding record in the full dig.

It seems to me that htmerge should look at both DocumentRef records and
take the higher backlink count, as well as combining all the link
description text (weeding out duplicates, presumably). I guess it would
then also need to generate new wordlist entries for any new description
words for the new DocID. Does this make sense?

This occurred to me as I was thinking about how htdig handles HTTP
redirects. In that case, it transfers all the old pre-redirect
descriptions to the new redirected URL's DocumentRef. It also takes the
smaller of the two hop counts, but it doesn't take the larger of the two
backlink counts, which strikes me as a bit of a bug there too. Am I wrong?
--
Gilles R. Detillieux E-mail: <gr...@sc...>
Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba Winnipeg, MB R3E 3J7 (Canada)
|
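To make the proposed merge rule concrete, here is a small self-contained
model of it; the DocInfo struct below is only a stand-in for illustration,
not the real DocumentRef class in htcommon/DocumentRef.cc:

#include <set>
#include <string>
#include <algorithm>

// Stand-in for the fields of interest; the real DocumentRef carries far
// more state and has a different interface.
struct DocInfo
{
    int backlink_count;
    int hop_count;
    std::set<std::string> descriptions;   // link description text
};

// Proposed htmerge behavior: instead of discarding the older record,
// keep the richer information from both.
static void
merge_doc_info(DocInfo &newer, const DocInfo &older)
{
    // Take the higher backlink count (a full dig usually saw more links).
    newer.backlink_count = std::max(newer.backlink_count, older.backlink_count);

    // Take the smaller hop count, as the redirect-handling code already does.
    newer.hop_count = std::min(newer.hop_count, older.hop_count);

    // Combine link descriptions, weeding out duplicates; any genuinely new
    // description words would still need wordlist entries generated for
    // the surviving DocID.
    newer.descriptions.insert(older.descriptions.begin(), older.descriptions.end());
}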
|
From: Geoff H. <ghu...@ws...> - 2002-09-20 01:41:08
|
On Thu, 19 Sep 2002, Neal Richter wrote:
> We would also be able to avoid any dynamic resizing of the LOCATION
> Value-field in BDB by making it a fixed width.
>
> Ex: Let's say this LOCATION-value is 'Full' @ 32 characters. Further
> locations of 'affect' in doc 400 get new rows

As I said before, this is probably a good idea. But it's going to take
some work to get the right balance. Since the keys need to be unique, you
have to introduce at least some "padding" by putting a row field in the
key:

word // doc id // row

OK, now you have a fixed-length record for:

(location // field // anchor) (location // field // anchor) ...

So the trick will be to find:
a) A short field length for "row" to minimize overhead.
b) The "right" fixed-length record for a "row."

(a) is partially offset by the reduction in the BDB control structures if
you cut down on the number of keys. But you don't want to make it too
small, since you don't know how many words will be in long documents.
Fortunately, we can make guesses. E.g., I just did counting from Project
Gutenberg's text versions of:

"Adventures of Huckleberry Finn" (563KB)
  most frequent: "and" 6138 times
"20,000 Leagues Under the Sea" (567KB)
  most frequent: "the" 7469 times
King James Bible (Old & New Test.) (4240KB)
  most freq.: "the" 62162 times, "and" 38611 times, "of" 34506 times

(b) is trickier. For short rows, you'll waste space since you've reserved
this record you aren't using. But the shorter it is, the fewer keys you
can condense. So the question is how often we'd waste space (and how much)
on short rows, versus how much we (might) regain from BDB control
structures. Experimentation will be needed.

Neal, do you think we can actually save bits across just the key/record
pair? It seems like you'll need to add bits for the row location.

-Geoff
|
|
From: Geoff H. <ghu...@ws...> - 2002-09-20 00:46:56
|
On Thu, 19 Sep 2002, Gilles Detillieux wrote:
> Sounds like an excellent idea to me. I'm rather surprised they didn't do
> that in mifluz already (or is something like this in the newer code?).

OK, so to make this clear, there's a difference between mifluz (which is
more backend) and the way we use it. We can set whatever key specification
we want to mifluz. So we currently use a key specification like:

word // doc id // location // flags

Now we should also remember that Loic was essentially the *only* mifluz
developer.

> I'm kind of being conservative. Based on the total lack of
> recent progress in mifluz, the very quiet mailing list, and the much
> smaller user base I have some worries about just how good the new mifluz
> code is.

Keep in mind that the version of htword/mifluz we're using is 0.13. I know
for *certain* there are bugs in it, and reading the ChangeLog for mifluz,
I know they're not just in the compression code. Keep in mind that there
are some fairly decent regression tests for mifluz. I don't see inactivity
on development as necessarily indicative of instability! (Otherwise, I'd
really worry about bugs whenever I use LaTeX.)

The problems with the merge were due to differences in the mifluz
*interfaces* (and my lack of free time to do the merge) as well as Loic's
disappearance. I haven't had to make significant changes to the mifluz
code, which does pass regression tests. I performed the merge outside the
CVS tree so that we _can_ get testing of reliability. But this is
completely distinct from how to handle _our_ key/record implementation.

-Geoff
|
|
From: Neal R. <ne...@ri...> - 2002-09-19 23:22:01
|
On Thu, 19 Sep 2002, Gilles Detillieux wrote:
> According to Neal Richter:
> > 1. Add a new config verb to let users use zlib WordDB-page compression.
> > This would be an option to let users who run into this error:
> >
> > FATAL ERROR:Compressor::get_vals invalid comptype
> > FATAL ERROR at file:WordBitCompress.cc line:827 !!!
> >
> > If you look into the db/mp_cmpr.c code (Loic's Compressed BDB page code)
> > you'll find these two functions:
> > CDB___memp_cmpr_inflate(..)
> > CDB___memp_cmpr_deflate(...)
> ...
> > Merging Loic's latest mifluz is supposed to fix this problem (Geoff
> > and I have been working on this), but so far the merge is fairly complex
> > and needs much more work and long term testing. This is a decent
> > solution.
>
> Sounds reasonable as an interim solution. I wonder, though, if
> it wouldn't be a quicker/easier fix to backport just the inflate and
> deflate code from the latest mifluz package to the existing 3.2.0b4 code.
> Would that fix this particular problem without all the headaches of
> merging in all the latest mifluz code?

I tried to do just that (independent of Geoff). Unfortunately Loic
basically reimplemented and restructured so much code that it's very hard
to divide the merge.

> > 2. The inverted index is not very efficient in general.
> >
> > The current scheme:
> >
> > WORD DOCID LOCATION
> > affect 323 43
> > affect 323 53
> ...
> > A more efficient inverted system
> >
> > affect 323 43, 53
> ...
> > If the fixed width Location field was around 256 characters, this would
> > allow roughly 40-50 1,2,3 & 4 digit location codes... likely meaning that
> > the vast majority of the time a second row is not needed. For large
> > documents, this would change but still be much more efficient.
> >
> > Eh? Feedback?
>
> Sounds like an excellent idea to me. I'm rather surprised they didn't do
> that in mifluz already (or is something like this in the newer code?).
> This does mean a deviation from the mifluz code base, but it seems
> that's inevitable anyway, given the efforts to crowbar the latest code
> into ht://Dig, and the lack of support from the mifluz developers.

I'm kind of being conservative. Based on the total lack of recent progress
in mifluz, the very quiet mailing list, and the much smaller user base, I
have some worries about just how good the new mifluz code is. I would like
to see parallel development for a while till we get the mifluz-merge tree
VERY solid. Maybe even finish 3.2 without the merge and start the merge in
3.3?

> I guess it also means making the change twice - once in the current
> ht://Dig code and again after the mifluz code merge. Or is all this
> at a level that can be done with minimal changes after the merge?

Probably both, but the second port should be pretty straightforward.

--
Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
Office: 406-522-1485
|
|
From: Geoff H. <ghu...@ws...> - 2002-09-19 23:14:09
|
On Thu, 19 Sep 2002, Neal Richter wrote:
> Merging Loic's latest mifluz is supposed to fix this problem (Geoff
> and I have been working on this), but so far the merge is fairly complex
> and needs much more work and long term testing. This is a decent
> interim solution.

Obviously I'm more concerned with the mifluz merge and figuring out the
lousy performance. But if you've seen that switching to zlib or the newer
codec seems to solve the database bugs, then I'm happy with this as an
interim solution. We could use this for a 3.2.0b4 (which we need) and then
work on the mifluz merge for 3.2.0b5.

> WORD DOCID LOCATION
> affect 323 43

So first off, I should point out that it's not quite as bad as this. Loic
and I worked on "key compression," which means that the database doesn't
actually store multiple keys when they're only slightly different. There's
also a rationale behind this system--it was faster to keep all these keys
than to change the length of the records:

> affect 323 43, 53
> affect 336 14, 148, 155

> Value-field in BDB by making it a fixed width.
>
> Ex: Let's say this LOCATION-value is 'Full' @ 32 characters. Further
> locations of 'affect' in doc 400 get new rows

OK, so having a fixed width and multiple rows may be a reasonable idea,
but your description isn't very workable. For one, the keys need to be
unique. So you'd want something like:

Key: WORD DOCID ROW
Record: Location/Flags/Anchor designation list

The key would be to come up with a compact binary representation of these.
Using characters to store integers is a bit inefficient. :-) More on that
later, perhaps.

-Geoff
|
|
From: Gilles D. <gr...@sc...> - 2002-09-19 21:19:27
|
According to Neal Richter:
> 1. Add a new config verb to let users use zlib WordDB-page compression.
> This would be an option to let users who run into this error:
>
> FATAL ERROR:Compressor::get_vals invalid comptype
> FATAL ERROR at file:WordBitCompress.cc line:827 !!!
>
> If you look into the db/mp_cmpr.c code (Loic's Compressed BDB page code)
> you'll find these two functions:
> CDB___memp_cmpr_inflate(..)
> CDB___memp_cmpr_deflate(...)
...
> Merging Loic's latest mifluz is supposed to fix this problem (Geoff
> and I have been working on this), but so far the merge is fairly complex
> and needs much more work and long term testing. This is a decent
> solution.

Sounds reasonable as an interim solution. I wonder, though, if it wouldn't
be a quicker/easier fix to backport just the inflate and deflate code from
the latest mifluz package to the existing 3.2.0b4 code. Would that fix
this particular problem without all the headaches of merging in all the
latest mifluz code?

> 2. The inverted index is not very efficient in general.
>
> The current scheme:
>
> WORD DOCID LOCATION
> affect 323 43
> affect 323 53
...
> A more efficient inverted system
>
> affect 323 43, 53
...
> If the fixed width Location field was around 256 characters, this would
> allow roughly 40-50 1,2,3 & 4 digit location codes... likely meaning that
> the vast majority of the time a second row is not needed. For large
> documents, this would change but still be much more efficient.
>
> Eh? Feedback?

Sounds like an excellent idea to me. I'm rather surprised they didn't do
that in mifluz already (or is something like this in the newer code?).
This does mean a deviation from the mifluz code base, but it seems that's
inevitable anyway, given the efforts to crowbar the latest code into
ht://Dig, and the lack of support from the mifluz developers.

I guess it also means making the change twice - once in the current
ht://Dig code and again after the mifluz code merge. Or is all this at a
level that can be done with minimal changes after the merge?

--
Gilles R. Detillieux E-mail: <gr...@sc...>
Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba Winnipeg, MB R3E 3J7 (Canada)
|
|
From: Neal R. <ne...@ri...> - 2002-09-19 20:17:57
|
Hey all,
I've got two proposals here for the WordDB:
1. Add a new config verb to let users use zlib WordDB-page compression.
This would be an option to let users who run into this error:
FATAL ERROR:Compressor::get_vals invalid comptype
FATAL ERROR at file:WordBitCompress.cc line:827 !!!
If you look into the db/mp_cmpr.c code (Loic's Compressed BDB page code)
you'll find these two functions:
CDB___memp_cmpr_inflate(..)
CDB___memp_cmpr_deflate(...)
They are drop-in zlib-based replacements for the
(*cmpr_info->uncompress) & (*cmpr_info->compress) function-pointer calls.
Yes, the compression isn't as good as the ad-hoc bit-stream compression
in WordDBCompress, WordBitCompress, and WordDBPage. The advantage is that
it's fairly bulletproof (zlib) and better than turning off WordDB
compression altogether with the 'wordlist_compress' config verb.
Merging Loic's latest mifluz is supposed to fix this problem (Geoff
and I have been working on this), but so far the merge is fairly complex
and needs much more work and long term testing. This is a decent
solution.
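For illustration only, the zlib calls involved look roughly like the
sketch below; the buffer handling and the actual hook signatures in
db/mp_cmpr.c differ, so treat this strictly as a sketch of the idea:

#include <zlib.h>

// Sketch: zlib-based page compression of the kind the
// CDB___memp_cmpr_inflate/deflate replacements provide.
static int
page_deflate(const unsigned char *page, unsigned long page_len,
             unsigned char *out, unsigned long *out_len)
{
    // *out_len must hold the capacity of 'out' on entry; compress2()
    // updates it to the compressed size.
    return compress2(out, out_len, page, page_len, Z_BEST_SPEED) == Z_OK ? 0 : -1;
}

static int
page_inflate(const unsigned char *comp, unsigned long comp_len,
             unsigned char *out, unsigned long *out_len)
{
    // *out_len must hold the (known) uncompressed page size on entry.
    return uncompress(out, out_len, comp, comp_len) == Z_OK ? 0 : -1;
}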
2. The inverted index is not very efficient in general.
The current scheme:
WORD DOCID LOCATION
affect 323 43
affect 323 53
affect 336 14
affect 336 148
affect 336 155
affect 351 43
affect 358 370
affect 399 51
affect 400 10
affect 400 86
affect 400 95
affect 400 139
affect 400 215
affect 400 222
affect 400 229
A more efficient inverted system
affect 323 43, 53
affect 336 14, 148, 155
affect 351 43
affect 358 370
affect 399 51
affect 400 10, 86, 95, 139, 215, 222, 229
We would need to augment the WordDB and associated classes to support the
value parsing..
We would also be able to avoid any dynamic resizing of the LOCATION
Value-field in BDB by making it a fixed width.
Ex: Let's say this LOCATION-value is 'Full' @ 32 characters. Further
locations of 'affect' in doc 400 get new rows
affect 400 10, 86, 95, 139, 215, 222, 229
affect 400 300, 322, 395, 439, 516
The objects would keep track of the field lengths and create new rows as
needed.
If the fixed width Location field was around 256 characters, this would
allow roughly 40-50 1,2,3 & 4 digit location codes... likely meaning that
the vast majority of the time a second row is not needed. For large
documents, this would change but still be much more efficient.
Eh? Feedback?
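To make the fixed-width idea concrete, one possible packed key/record
layout might look like the sketch below; the field names and widths are
illustrative guesses, not anything in the ht://Dig or mifluz code:

#include <stdint.h>

// Sketch of a fixed-width key/record pair for the proposed scheme.
// A "row" field keeps keys unique when one document needs more than
// one record's worth of locations.
struct WordKey
{
    char     word[24];   // word text, NUL-padded
    uint32_t docid;      // document ID
    uint16_t row;        // 0, 1, 2, ... for overflow rows
};

struct WordRecord
{
    uint8_t  count;          // how many location slots are in use
    uint32_t location[12];   // fixed number of location slots per row
    uint8_t  flags[12];      // per-location flags (title, heading, ...)
};

With a dozen slots per row, most words in a typical document would fit in
a single record, and only very frequent words in long documents would
spill into extra rows.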
EXTRA NOTE: Memory leak detection:
I also wanted to make the developers aware (if you aren't already) of
Valgrind. It's a nice open-source memory error checking tool.
In general you use it like this:
valgrind htdig xxx xxx xxx
It seems to be pretty comparable to both Purify and Insure. It's not
going to get you as good a result as compile- or link-time code
instrumentation, but it's better than nothing at all. Interestingly,
Insure ships a program called 'Chaperon' that you use in the same way as
Valgrind on debug binaries. I haven't looked into it in detail, but my
guess is that both build on the native memory debugging facilities of
glibc.
Valgrind Home Page
http://developer.kde.org/~sewardj/
KDE GUI frontend to Valgrind
http://www.weidendorfers.de/kcachegrind/index.html
--
Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
Office: 406-522-1485
|
|
From: Geoff H. <ghu...@ws...> - 2002-09-18 15:11:20
|
On Thu, 12 Sep 2002, Brian White wrote:
> Well, all except the "until they have granularity" bit -
> what does that mean?

In theory, we can do multi-write to the databases by locking *parts* of
the database, rather than the full file. This would be through the
Berkeley DB code and would obviously make file locking obsolete.

> 1) What do we do about "dangling" locks? ( where
> lock files get left behind due to a crash or some
> other kind of bug)

I think we'd want some sort of timeout. Using PIDs wouldn't help the
typical problem, which is running multiple programs on the same file.
Yes, out-of-sync clocks are a problem, but if we pick a long enough
timeout and document the behavior (so sysadmins can kill the lockfiles
if there are problems) it should be OK.

> 2) If locks have to be valid across machines, the
> TMPDIR isn't going to work unless it is explicitly
> put into a common area?

True. Of course the rundig scripts typically set TMPDIR to the database
directory, but you're right that we can't count on this.

> 3) Also - do locks need to work between different
> programs? What if they have different
> permissions
> Another solution is to use the process id - but that

No, I don't think we want locks to be tied to the PID. The same PID is
already prevented from opening the same file twice--you'd need to have
two copies of DocumentDB, etc.

--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/
|
|
From: Stefan S. <Tal...@in...> - 2002-09-18 08:54:24
|
On 18.09.2002 1:33 Uhr, Jim Cole <gre...@yg...> wrote:
> Going back to the basics, how are you indexing? Are you using
> the rundig script? Are you aware that by default rundig always
> calls htdig with the -i option, causing the current databases
> to be deleted before it starts indexing?
>
> I apologize in advance if I am suggesting the obvious to you
> here; but a couple symptoms you describe make it sound like the
> databases are being deleted and then rebuilt from scratch.

Holy DIG! I feel soooo stupid now! Of course I used rundig, but since I
called it with -a, I figured it should not reindex the whole site.
Looking at the rundig script, of course there is this tiny little "-i"
flag in there. Bummer.

I apologize "deeply bend" to all of you for not seeing this. Usually I am
quite good at first looking at the most basic solution for a problem
(like connecting the power plug to a printer first before debugging the
printer driver) ;-)

Sorry again!
--
Stefan Seiz <http://www.StefanSeiz.com>
Spamto: <bi...@im...>
|
|
From: Jim C. <gre...@yg...> - 2002-09-17 23:34:03
|
Stefan Seiz's bits of Thu, 12 Sep 2002 translated to:
>I ran into a fairly nasty problem running htdig on:
>Mac OS X Server 10.1.2 Server (5P68) [ Mac OS X 10.1.2 (5P48)

I am also unable to duplicate the problem. I am currently using Mac OS X
10.2 (6C115).

>I realized htdig is always REINDEXING my complete site no matter if a
>document's last modification date matched the server's last mod date or not.
>Tracking down things, I found that ref->DocTime() ALWAYS returns 0 even if
>the given url has a mod date (m:XXXXXXX) value in the database (verified
>with htdump).

Going back to the basics, how are you indexing? Are you using the rundig
script? Are you aware that by default rundig always calls htdig with the
-i option, causing the current databases to be deleted before it starts
indexing?

I apologize in advance if I am suggesting the obvious to you here; but a
couple symptoms you describe make it sound like the databases are being
deleted and then rebuilt from scratch.

Jim
|
|
From: Gilles D. <gr...@sc...> - 2002-09-17 22:58:27
|
According to Lachlan Andrew:
> Here is a patch (url-format.patch) to allow the format of
> external_protocol URLs to be <protocol>:<path>, rather than
> <protocol>://<host>/<path>. It seems to work in the cases
> I've tested, but I'm not sure how to try it on the test
> suite, so I hope I haven't broken anything else... Please
> let me know if it needs more work.
>
> The patch is relative to 3.2.0b4-20020616. Let me know if
> you need it against a more recent snapshot.
>
> The HTML table for the description of the output expected
> from the external transport was also poorly formatted in
> this snapshot, so I've included another patch
> (return-field-table.patch) to fix that, if it hasn't
> already been done. If you apply this patch, do so first.

Thanks for the patches. I've gotten rid of the rowspans in the table in
external_protocols' description, as per your 2nd patch. For the 1st one,
I don't have time to test it myself, but I'd like to wait for
comments/testing from other developers if any are forthcoming.

To answer some of the questions you ask in the code of your patch, the
.get() after a sub seemed to be needed with some compilers, to avoid
warnings. For whatever reason, we couldn't just assign the result of a
.sub() to another String, even though .sub() is supposed to return a
String. The .get() gives us a (char *) from that, and everything seems to
work well that way.

On line 324 of URL.cc, you say "(should also check the slashes are
actually there...)". I agree, the code should do this. The way the slash
count is encoded in the Dictionary entries is kludgy, but I think there
are similar kludges elsewhere in the code. It's also self-contained in one
method of the URL class, so I don't have a problem with it.

Without actually testing it yet, that's about all I can suggest.
--
Gilles R. Detillieux E-mail: <gr...@sc...>
Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba Winnipeg, MB R3E 3J7 (Canada)
|
|
From: Gilles D. <gr...@sc...> - 2002-09-17 20:55:31
|
According to Stefan Seiz:
> On 17.9.2002 20:21 Uhr, Geoff Hutchison <ghu...@ws...> wrote:
> > 2) Where do you determine that ref->DocTime() is returning 0?
> > I ask, in part because htdump is going to access this as well:
> > fprintf(fl, "\tm:%d", (int) ref->DocTime());
> I added a trace print to Retriever.cc inside the Retriever::parse_url(URLRef
> &urlRef) routine like so:
>
> --- snip ---
> if (ref)
> {
> //
> // We already have an entry for this document in our database.
> // This means we can get the document ID and last modification
> // time from there.
> //
> current_id = ref->DocID();
> date = ref->DocTime();
> if (debug > 2)
> {
> cout << "\nDOC MATCHED DB!!! \n" << endl;
> cout << "DocTime Date is: " << date << endl;
> }
> --- snap ---
I think you might need a cast up there, i.e.:
cout << "DocTime Date is: " << (int) date << endl;
I don't know if there's a (ostream) << (time_t) operator defined, and it
might not be automatically casting date to (int) on its own. See if that
makes a difference.
> > 3) Are you sure the server is returning a Last-Modified header for files?
> Yes, I snooped the wire ;-)
>
> > 4) Does the server properly handle the If-Modified-Since header?
> > (To see that this header is sent, check in Document.cc line 525 or so for
> > the output sent by htdig.)
> It's Apache 1.3.26, so I guess it should. But I think htdig only sends the
> If-Modified-Since header if it finds a date for a URL in the current
> database, and as I assume that doesn't happen, the If-Modified-Since
> header never makes its way out.
>
> Here's an example url from my htdump file to prove a date is in there:
>
> 0 u:http://www.CENSORED.com/YADDA.html t:CENSORED a:0
> m:873819058 s:280 H: CENSORED h: l:1031854616
> L:0 b:1 c:0 g:0 e: n: S: d: A:
Well, that "m:" value is definitely non-zero, and definitely not as large
as the current time, so it does seem to be getting, parsing, and storing
Last-Modified header dates. But in your reply to my message on [htdig],
you said...
According to Stefan Seiz:
> On 17.9.2002 20:23 Uhr, Gilles Detillieux <gr...@sc...> wrote:
> > If that's the part you suspect is failing, then you should be able
> > to confirm that by running htdig -vvv. Look for the messages where
> > it outputs the Last-Modified header, and then says something like
> > "Converted ... to ...", which shows the original and regenerated date
> > string after parsing. If the second one is wrong, then you are right in
> > that the problem is somewhere in the parsing. In that case, try adding
> > trace prints in the parsedate() function in htdig/Document.cc (minimal
> > programming skills required, just look at how other debug output is done).
>
> I already tested this but unfortunately I don't get any Dates output when
> running with -vvv
That doesn't add up. If you are indeed running htdig version 3.1.6
with -vvv, then it MUST be showing the Last-Modified headers if your
server is returning them. Are you sure you're running a vanilla 3.1.6
installation, and not some severely modified variant of this, or another
version altogether? Can you show us a complete excerpt of htdig -vvv
output for one file, from one "Retrieval command for ..." message to
the next?
If you're not seeing those either, is it possible you're retrieving
via local_urls? In this case, there's no date parsing involved, as
htdig gets the modtime as a time_t already from the local filesystem.
But you did say you snooped the wire, so I'd guess this isn't the case.
> > If the second date string is fine, it could be a problem related to
> > refetching this info from the database, or some memory leak somewhere.
> > I thought you mentioned that an htdump showed correct, non-zero modtimes.
> > Such a problem would be harder to track down.
>
> Yes, datestamps (seconds since epoch) are in the dumped file.
>
> I'll add debug prints (already did some and always got a date of 0) to the
> files you mentioned and report back.
>
> Could you tell me which subroutine is responsible for parsing the timestamp
> from the local database (I guess reading and parsing that one is the
> problem)?
It's already parsed by the time it gets into the database. It starts out
as a date string from the server, in RFC850 or RFC1123 format, and the
parsedate() function in htdig/Document.cc converts that to a time_t, i.e.
a 32-bit integer representing seconds since Jan 1, 1970, 00:00:00 GMT.
It goes through some encoding and decoding as it gets stored in the
database (see DocumentRef::Serialize() and DocumentRef::Deserialize() in
htcommon/DocumentRef.cc), but then it goes through those same routines
when you get the number via htdump. It doesn't get converted back into
a date string until htsearch processes it, using strftime().
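As a standalone illustration of that conversion (not the actual
parsedate() code, which also handles the RFC850 and asctime() formats and
platforms without timegm()), the RFC1123 case looks roughly like this:

#include <time.h>
#include <string.h>

// Rough sketch only: convert an RFC1123 Last-Modified value such as
// "Tue, 17 Sep 2002 19:09:02 GMT" into a time_t.
static time_t
rfc1123_to_time_t(const char *s)
{
    struct tm tm;
    memset(&tm, 0, sizeof(tm));
    if (strptime(s, "%a, %d %b %Y %H:%M:%S GMT", &tm) == NULL)
        return (time_t) 0;
    return timegm(&tm);   // interpret the broken-down fields as UTC
}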
--
Gilles R. Detillieux E-mail: <gr...@sc...>
Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba Winnipeg, MB R3E 3J7 (Canada)
|
|
From: Stefan S. <tal...@in...> - 2002-09-17 19:09:02
|
On 17.9.2002 20:21 Uhr, Geoff Hutchison <ghu...@ws...> wrote:
> At least at the moment, I cannot reproduce this, which is why I have not
> responded sooner. So let me at least ask a few questions which might help:
>
> 1) Do you see actual, formatted dates in htsearch results? (These
> obviously need to do the same access.)
No, but I guess this is due to my templates being set so they don't display
any:
<strong><a href="$&(URL)">$&(TITLE)</a></strong> $(STARSLEFT)<br>
<ul>$(EXCERPT)</ul>
I'll try adding dates there and see what happens.
> 2) Where do you determine that ref->DocTime() is returning 0?
> I ask, in part because htdump is going to access this as well:
> fprintf(fl, "\tm:%d", (int) ref->DocTime());
I added a trace print to Retriever.cc inside the Retriever::parse_url(URLRef
&urlRef) routine like so:
--- snip ---
if (ref)
{
//
// We already have an entry for this document in our database.
// This means we can get the document ID and last modification
// time from there.
//
current_id = ref->DocID();
date = ref->DocTime();
if (debug > 2)
{
cout << "\nDOC MATCHED DB!!! \n" << endl;
cout << "DocTime Date is: " << date << endl;
}
--- snap ---
> 3) Are you sure the server is returning a Last-Modified header for files?
Yes, I snooped the wire ;-)
> 4) Does the server properly handle the If-Modified-Since header?
> (To see that this header is sent, check in Document.cc line 525 or so for
> the output sent by htdig.)
It's Apache 1.3.26, so I guess it should. But I think htdig only sends the
If-Modified-Since header if it finds a date for a URL in the current
database, and as I assume that doesn't happen, the If-Modified-Since
header never makes its way out.
Here's an example url from my htdump file to prove a date is in there:
0 u:http://www.CENSORED.com/YADDA.html t:CENSORED a:0
m:873819058 s:280 H: CENSORED h: l:1031854616
L:0 b:1 c:0 g:0 e: n: S: d: A:
--
<http://www.StefanSeiz.com>
Spamto: <bi...@im...>
|
|
From: Geoff H. <ghu...@ws...> - 2002-09-17 18:19:44
|
> I realized htdig is always REINDEXING my complete site no matter if a
> document's last modification date matched the server's last mod date or not.
> Tracking down things, I found that ref->DocTime() ALWAYS returns 0 even if
> the given url has a mod date (m:XXXXXXX) value in the database (verified
> with htdump).
> I guess, the problem lies in the conversion of the m:xxxxxx (seconds since
> epoch) value maybe somewhere in mktime.c or such???
At least at the moment, I cannot reproduce this, which is why I have not
responded sooner. So let me at least ask a few questions which might help:
1) Do you see actual, formatted dates in htsearch results? (These
obviously need to do the same access.)
2) Where do you determine that ref->DocTime() is returning 0?
I ask, in part because htdump is going to access this as well:
fprintf(fl, "\tm:%d", (int) ref->DocTime());
3) Are you sure the server is returning a Last-Modified header for files?
4) Does the server properly handle the If-Modified-Since header?
(To see that this header is sent, check in Document.cc line 525 or so for
the output sent by htdig.)
-Geoff
|
|
From: Lachlan A. <lh...@ee...> - 2002-09-16 00:59:30
|
Greetings Gilles,

Here is a patch (url-format.patch) to allow the format of
external_protocol URLs to be <protocol>:<path>, rather than
<protocol>://<host>/<path>. It seems to work in the cases I've tested,
but I'm not sure how to try it on the test suite, so I hope I haven't
broken anything else... Please let me know if it needs more work.

The patch is relative to 3.2.0b4-20020616. Let me know if you need it
against a more recent snapshot.

The HTML table for the description of the output expected from the
external transport was also poorly formatted in this snapshot, so I've
included another patch (return-field-table.patch) to fix that, if it
hasn't already been done. If you apply this patch, do so first.

Thanks for your help and advice :)

Lachlan

On Thu, 12 Sep 2002 02:50, Gilles Detillieux wrote:
> Would you be
> willing to implement it? If you can provide patches, I
> can make sure they make it into the CVS tree.
--
Lachlan Andrew Phone: +613 8344-3816 Fax: +613 8344-6678
Dept of Electrical and Electronic Engg CRICOS Provider Code
University of Melbourne, Victoria, 3010 AUSTRALIA 00116K
|
|
From: Geoff H. <ghu...@us...> - 2002-09-15 07:13:51
|
STATUS of ht://Dig branch 3-2-x
RELEASES:
3.2.0b4: In progress
(mifluz merge essentially finished, contact Geoff for patch to test)
3.2.0b3: Released: 22 Feb 2001.
3.2.0b2: Released: 11 Apr 2000.
3.2.0b1: Released: 4 Feb 2000.
SHOWSTOPPERS:
KNOWN BUGS:
* Odd behavior with $(MODIFIED) and scores not working with
wordlist_compress set, but working fine without wordlist_compress.
(The date is definitely stored correctly, even with compression on,
so this must be some sort of weird htsearch bug.)
* Not all htsearch input parameters are handled properly: PR#648. Use a
consistent mapping of input -> config -> template for all inputs where
it makes sense to do so (everything but "config" and "words"?).
* If exact isn't specified in the search_algorithms, $(WORDS) is not set
correctly: PR#650. (The documentation for 3.2.0b1 is updated, but can
we fix this?)
* META descriptions are somehow added to the database as FLAG_TITLE,
not FLAG_DESCRIPTION. (PR#859)
PENDING PATCHES (available but need work):
* Additional support for Win32.
* Memory improvements to htmerge. (Backed out b/c htword API changed.)
NEEDED FEATURES:
* Field-restricted searching.
* Return all URLs.
* Handle noindex_start & noindex_end as string lists.
TESTING:
* httools programs:
(htload a test file, check a few characteristics, htdump and compare)
* Turn on URL parser test as part of test suite.
* htsearch phrase support tests
* Tests for new config file parser
* Duplicate document detection while indexing
* Major revisions to ExternalParser.cc, including fork/exec instead of popen,
argument handling for parser/converter, allowing binary output from an
external converter.
* ExternalTransport needs testing of changes similar to ExternalParser.
DOCUMENTATION:
* List of supported platforms/compilers is ancient.
* Add thorough documentation on htsearch restrict/exclude behavior
(including '|' and regex).
* Document all of htsearch's mappings of input parameters to config attributes
to template variables. (Relates to PR#648.) Also make sure these config
attributes are all documented in defaults.cc, even if they're only set by
input parameters and never in the config file.
* Split attrs.html into categories for faster loading.
* require.html is not updated to list new features and disk space
requirements of 3.2.x (e.g. phrase searching, regex matching,
external parsers and transport methods, database compression.)
* TODO.html has not been updated for current TODO list and completions.
OTHER ISSUES:
* Can htsearch actually search while an index is being created?
(Does Loic's new database code make this work?)
* The code needs a security audit, esp. htsearch
* URL.cc tries to parse malformed URLs (which causes further problems)
(It should probably just set everything to empty) This relates to
PR#348.
|
|
From: Stefan S. <tal...@in...> - 2002-09-12 19:45:15
|
Hi,
sorry for the crosspost to both htdig lists, I wasn't sure which one
suited best.
I ran into a fairly nasty problem running htdig on:
Mac OS X Server 10.1.2 Server (5P68) [ Mac OS X 10.1.2 (5P48)
I realized htdig is always REINDEXING my complete site no matter if a
document's last modification date matched the server's last mod date or not.
Tracking down things, I found that ref->DocTime() ALWAYS returns 0 even if
the given url has a mod date (m:XXXXXXX) value in the database (verified
with htdump).
I must confess, I am not a programmer and can only find my way in c source
to track down the problem and put some debugging output in there - no more
no less.
I guess, the problem lies in the conversion of the m:xxxxxx (seconds since
epoch) value maybe somewhere in mktime.c or such???
Does anyone have any tips on how to fix this?
Here some parts which might be important to know from config.cache on my
platform:
ac_cv_func_localtime_r=${ac_cv_func_localtime_r=no}
ac_cv_func_timegm=${ac_cv_func_timegm=no}
ac_cv_header_sys_time_h=${ac_cv_header_sys_time_h=yes}
ac_cv_header_time=${ac_cv_header_time=yes}
ac_cv_struct_tm=${ac_cv_struct_tm=time.h}
I'd appreciate any tips, as I really can't afford to always kind of start
from scratch when indexing my site, as it consumes too much bandwidth and
time (lots of large PDF files).
Thanks a lot.
--
<http://www.StefanSeiz.com>
Spamto: <bi...@im...>
|
|
From: Lachlan A. <lh...@ee...> - 2002-09-12 10:16:32
|
On Thu, 12 Sep 2002 02:50, Gilles Detillieux wrote:
> According to Lachlan Andrew:
> > I was thinking of having a list of
> > known services, specifying the number of leading
> > slashes:
> That sounds like a great idea to me. Would you be
> willing to implement it? If you can provide patches, I
> can make sure they make it into the CVS tree.

I'll gladly give it a go, and let you know how I get on...

Cheers,
Lachlan
--
Lachlan Andrew Phone: +613 8344-3816 Fax: +613 8344-6678
Dept of Electrical and Electronic Engg CRICOS Provider Code
University of Melbourne, Victoria, 3010 AUSTRALIA 00116K
|
|
From: Brian W. <bw...@st...> - 2002-09-12 08:42:03
|
At 12:46 12/09/2002, Geoff Hutchison wrote:
>On Thu, 12 Sep 2002, Brian White wrote:
>
> > the "Yes" and "No" responses as expected ) but I have
> > no idea how to propagate my AC_DEFINE down to the
> > C++ code.
>
>Run "autoheader" and it will update the include/htconfig.h.in , which
>becomes htconfig.h which is #included in the code. You should get your
>DEFINE set, and then you can #ifdef, #ifndef, etc.
Cool. That worked.
> > Are there any problems with doing this? Are there
> > any conventions I should be following that I am not?
>
>Personally, if I don't have a system-level file-locking system, I'd
>implement one in user code, say with a .lock file in the same directory
>(if writable) or using TMPDIR and something like
>
>$TMPDIR/path-to-file.lock
>
>for locking /path/to/file.
>
>I've actually felt this might not be a bad addition to the htdig code
>anyway, since the databases really should be locked against multi-write
>until they have granularity.
>
>Does that make any sense?
Well, all except the "until they have granularity" bit -
what does that mean?
Well, I can write something for this if you like.
Issues include
1) What do we do about "dangling" locks? ( where
lock files get left behind due to a crash or some
other kind of bug)
Do it specify a timeout time ( which requires accurate
clocks if it has to work across machines ) and how is
it specified? Or could we use unix PIDs ( which only
work
2) If locks have to be valid across machines, the
TMPDIR isn't going to work unless it is explicitly
put into a common area?
3) Also - do locks need to work between different
programs? What if they have different
permissions - how do we
The more I think about it, actually, the more I think
using TMPDIR is a bad idea. Anything that needs to create
a lock which matters is going to be doing it for a write
file, which generally will give them write access to
the directory.
Either that or locks ALWAYS go in TMPDIR ( which still leaves
you with issue 2)
Brian
One solution is to specify a timeout of some kind - though
that kind of depends on being confident that all processes
that use the lockfile have the same time reference, whether
that is because they all run on the same machine, or they
run on machines that are at least close to all having the
same time.
Another solution is to use the process id - but that
only works on unix, and you either need to be running
on the same machine or have the ability to access the
other machine's process list. There is also the lurking
danger caused by the fact that PIDs are recycled.
Any thoughts?
2) Writing to TMPDIR will probably only work for
processes running on the same machine.
>-Geoff
-------------------------
Brian White
Step Two Designs Pty Ltd
Knowledge Management Consultancy, SGML & XML
Phone: +612-93197901
Web: http://www.steptwo.com.au/
Email: bw...@st...
Content Management Requirements Toolkit
112 CMS requirements, ready to cut-and-paste
|
|
From: Geoff H. <ghu...@ws...> - 2002-09-12 04:32:13
|
On Wed, 11 Sep 2002, Geoff Hutchison wrote:
> AFAICT, some of the htword/ destructors seem to set _config to NULL, which
> then kills the global HtConfiguration::config pointer. This is mostly a
> problem in htsearch currently, where things hit Display and go splat.

No, I was just wrong. This seems to work, so I'll clean things up a bit
and release another snapshot soon (probably tomorrow night or Friday).

-Geoff
|
|
From: Jim C. <gre...@yg...> - 2002-09-12 03:49:33
|
Brian White's bits of Thu, 12 Sep 2002 translated to:
>I have been playing with my Adjustable Logging patch.
>I now have it working for 3.2.0b3, but I am stuck on

If possible, it might be more useful to create a patch against a b4
snapshot. It is usually recommended that people don't use the old b3
release anymore, so the patch might be of limited usefulness in that
sense.

Jim
|
|
From: Geoff H. <ghu...@ws...> - 2002-09-12 02:54:57
|
> Is there any reason we can't move to using the global
>
> _config = HtConfiguration::config();

In the merge, I'm trying to go on the rule of "least possible changes,"
which is still quite a lot, even in the ht://Dig code (rather than db/
or htword/).

AFAICT, some of the htword/ destructors seem to set _config to NULL,
which then kills the global HtConfiguration::config pointer. This is
mostly a problem in htsearch currently, where things hit Display and go
splat. Then again, I've been hunting after the same problem in Display
when the *copy* of _config gets hammered and htsearch segfaults, so who
can say it's worse...

-Geoff
|
|
From: Geoff H. <ghu...@ws...> - 2002-09-12 02:46:23
|
On Thu, 12 Sep 2002, Brian White wrote: > the "Yes" and "No" responses as expected ) but I have > no idea how to propogate my AC_DEFINE down to the > C++ code. Run "autoheader" and it will update the include/htconfig.h.in , which becomes htconfig.h which is #included in the code. You should get your DEFINE set, and then you can #ifdef, #ifndef, etc. > Are there any problems with doing this? Are there > any conventions I should be following that I am not? Personally, if I don't have a system-level file-locking system, I'd implement one in user code, say with a .lock file in the same directory (if writable) or using TMPDIR and something like $TMPDIR/path-to-file.lock for locking /path/to/file. I've actually felt this might not be a bad addition to the htdig code anyway, since the databases really should be locked against multi-write until they have granularity. Does that make any sense? -Geoff |
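Purely as a sketch of the .lock-file idea discussed above (not code from
the htdig tree; the helper names and the stale-lock policy are left as
assumptions), an exclusively-created lock file next to the database could
look like this:

#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>
#include <errno.h>

// Sketch: advisory lock via an exclusively-created lock file, e.g.
// "db.docdb.lock" next to db.docdb. Returns the open fd on success,
// -1 if someone else already holds the lock.
static int
acquire_lockfile(const char *db_path, char *lockname, size_t len)
{
    snprintf(lockname, len, "%s.lock", db_path);
    // O_CREAT|O_EXCL makes creation atomic; a stale file left behind by
    // a crash would need the timeout/cleanup policy discussed above.
    int fd = open(lockname, O_CREAT | O_EXCL | O_WRONLY, 0644);
    if (fd < 0 && errno == EEXIST)
        return -1;              // lock already held (or stale)
    return fd;
}

static void
release_lockfile(int fd, const char *lockname)
{
    close(fd);
    unlink(lockname);
}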