From: Wolfgang <mue...@fr...> - 2002-01-04 19:52:51
Hi, I am maintainer of the GNU Image Finding Tool (GIFT) and active in the Fer-de-Lance project, which has been in (not very loud, but behind-the-scenes-active) existence since April last year. Within this project we work towards integration of searching services into the desktop. I am mailing our list and a couple of other developer lists because I think I have found an architecture that provides security while maintaining most of the advantages of daemon-based search engine architectures. I think this architecture and its associated tricks are flexible enough to encompass different search engines, so this mail is not about Medusa vs. htdig vs. GIFT, but rather about how to work together to solve our common security problems for desktop integration of our engines. And, of course, I would be happy to get some suggestions for improvement and/or some developer time. I would be less happy if someone finds a fundamental flaw, but even that would be better than wasting my time trying to develop this stuff further. Now let's go into more detail.

GOAL: The goal is to provide search services to the desktop user. These search services should encompass not only web-visible URLs, but all files accessible to the user as well as http/ftp/etc. accessible items.

ISSUES: The first issue is -privacy-: the system should not tell us locations of files that we cannot read otherwise. For example: looking for some correspondence with the health insurance, we do not want to learn that our colleague wrote three letters last month that match our search. The second is -memory consumption-: all indexes for similarity searching use memory which is either proportional to the size of each indexed file, or quite big to begin with. We do not want plenty of users each rolling their own index; we want one index, otherwise we are likely to spend a multiple of our useful disk space on indexes.

SUGGESTION: Use a daemon and make sure that authentication is good. :-) Too easy? Of course the problem lies in providing the authentication. What I suggest is to run a daemon which creates for each user U a Unix domain socket which is readable *and* writable *only* by this one user U (and root, of course). All instructions to the indexing daemon, e.g.

- add item to index
- delete item from index
- move item within index (new URL for same file)
- block item/subdirectory/pattern (don't index *.o files, for example)
- process query

would go through the socket. By knowing which socket received the request, we automatically know the user, and then we just have to check, for each result item, whether it can be read by the user who issued the query. Of course we give back only the readable items. We can create the sockets as user "nemo", and then chown them using a very small script running under root. So we would be root for a couple of seconds on startup; afterwards everything would happen as a user (nemo) who has write rights on one directory tree which is unreadable for everyone else. So there is no issue of a big indexing program running under root for days and days in a row.

Adding an item is a (small) issue. We probably have to pipe the uuencoded (or something equivalent) binary through the socket in order to have it indexed on the other side. However, I guess the efficiency overhead is small compared to the indexing cost. Things become a trifle more complex for adding items which are found on the web. Somebody indexing a web page should probably indicate who else (group, all) is allowed to know that somebody has indexed that page. If several users publish a URL, the least restrictive rights are taken into account.

WHAT'S THERE? WHAT'S NEEDED? Basically, I have tried out the socket stuff with a small test program. Works. Now I am starting to integrate that with the GIFT (which involves cleaning up some of my internet socket code). What's still needed is the filter that stores which URLs are indexed under which owner, and with which rights. On each query the GIFT can ask this filter whether a list of URLs can be given out as a query result. Currently, I would like to base this filter on MySQL. When that filter is in place, writing a Medusa plugin for the GIFT would be easy. I just finished a primitive htdig GIFT plugin which soon goes to CVS, so that one just needs some fleshing out.

CONCLUSION: I hope to have convinced you that we can get a secure, yet memory-efficient indexing solution for the desktop relatively easily. If this has already been done, please tell me where. If my mail is a stupid suggestion, please tell me that, too. However, if you would like to participate in the coding and design effort, or simply to share your opinion, please do not hesitate to subscribe to the fer-de-lance-development list.

Cheers, Wolfgang
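The per-user socket trick Wolfgang describes can be sketched in a few lines of Python. This is a hypothetical illustration, not GIFT code: the `create_user_socket` name is invented, and the chown-as-root step is optional here since it requires root privileges, exactly as the mail notes.

```python
import os
import socket
import stat
import tempfile

def create_user_socket(sock_dir, username, uid=None):
    """Create a Unix domain socket readable/writable only by its owner.

    Hypothetical helper sketching the architecture from the mail: one
    socket per user, mode 0600, optionally chown'ed to that user by a
    small root-privileged step (pass uid only when running as root).
    """
    path = os.path.join(sock_dir, username)
    srv = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    # Restrict permissions at creation time so there is no window in
    # which other users could connect.
    old_umask = os.umask(0o177)
    try:
        srv.bind(path)
    finally:
        os.umask(old_umask)
    os.chmod(path, 0o600)          # rw for the owner only
    if uid is not None:
        os.chown(path, uid, -1)    # the "very small script running under root"
    srv.listen(1)
    return srv, path
```

The daemon then knows the requesting user purely from which socket the request arrived on, so no further authentication protocol is needed.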
From: Neal R. <ne...@ri...> - 2002-01-04 19:24:37
Hello again, I've got a couple more questions:

1. Is there any need to rebuild the index from scratch periodically? Some commercial search engines use incremental indexing and recommend that when the incremental portion of the index gets to a given size (say 20%), the entire index is rebuilt.

2. Is it possible to turn stemming off for particular languages at run time? We have our own stemming tools (Porter algorithm).

3. (Unicode) Is the index (the core of the index code) capable of doing multibyte searching? For example, if a fully escaped version of a Japanese or other multibyte document was indexed, and then searched with a properly escaped query, would valid matches occur? (Exclude any UI or upper-level code here.)

4. (2 GB limit) Some of the archives will be a million+ documents in size, with an average length exceeding 2K. Other than using XFS or JFS, is the solution in this case to use multiple index files?

5. Is there a way to add a 'field' to the index? I.e., multiple documents share a source-id, and a query is given to return the documents with that source-id. This could be accomplished implicitly by modifying the source-id to be some special alphanumeric token (DJ23KJD823), but this has a small probability of giving false-positive search results.

Thanks for your help!

--
Neal Richter
Knowledge Base Developer
Right Now Technologies, Inc.
Customer Service for Every Web Site
From: Greg L. <gr...@md...> - 2002-01-04 16:14:09
I've been experimenting with searching multiple databases from one search form via the pipe in 3.2 (11/23 build):

<input type="checkbox" name="config" value="aom1|aom2|aom2a|aom3|aom4|aom5|esr">

(htmerge is not working at the moment). The searches appear to be working correctly. One unusual result that I have noticed is that pages which are included in multiple databases will appear multiple times in the output, e.g.:

1850 Constitutional Convention - Archives of Maryland Online
http://www.mdarchives.state.md.us/megafile/msa/speccol/sc2900/sc2908/html/convention1850.html
01/02/02, 8161 bytes

1850 Constitutional Convention - Archives of Maryland Online
http://www.mdarchives.state.md.us/megafile/msa/speccol/sc2900/sc2908/html/convention1850.html
01/02/02, 8161 bytes

Would it be possible to remove duplicate pages from the search results before they are output to the html page? This is obviously something that htmerge would do if the databases were to be combined into one.

Gregory Lepore
Webmaster, State of Maryland
Supervisor, Archives of Maryland Online
410-260-6425
From: Geoff H. <ghu...@ws...> - 2002-01-04 15:12:45
At 3:36 PM -0600 1/2/02, Gilles Detillieux wrote:
>I'd recommend you check the Red Hat errata for 7.1 and make sure you
>install all the recommended update RPMS for it - especially any compiler
>or library updates, as it would seem there's likely a problem there.

The default compiler for RH 7.1 should be OK and I don't remember any updates that could cause this problem. (Remember that I've been testing the code on RH 7.0, 7.1, 7.2... recently.)

Gabriele, if you'd like to access my workstation, which is currently running 7.2, we can see if we can confirm this on my machine with your files.

-Geoff
From: Bradley M. H. <bh...@ja...> - 2002-01-04 14:31:31
The mirror at www.jack-of-all-trades.net will no longer be available. This is effective with the sending of this message.

Bradley M. Handy
Programmer/Analyst
Spring Arbor University
--www.arbor.edu
--mailto:bh...@ar...
From: Geoff H. <ghu...@ws...> - 2002-01-03 22:45:10
On Thu, 3 Jan 2002, Neal Richter wrote:
> Any tips/input from anyone on the best way to make a shared
> library containing htdig & libraries like (libfuzzy.a libcommon.a libht.a
> etc)? [It looks straightforward..]

If you want a shared library, use 3.2--it builds shared libraries on many platforms.

> We're looking at using htdig as an information archive search
> engine. We use PHP for our web apps, and would want to call the functions
> of htsearch from a PHP page.

Have you taken a look at the various PHP wrappers for htsearch? There are several in the contrib/ directory on the website: <http://www.htdig.org/contrib/> and <http://www.htdig.org/files/contrib/>

> 2. What would be involved in getting 3.2.x further down the road of
> Unicode? We are in the process of completing a move of a very large 500K+
> line project to be UTF-8. Who is the point man for htdig/Unicode issues?
> We have the resources to assist htdig in moving to Unicode.

We don't have one at the moment. Those of us who are active developers have our hands full with other projects and don't have the interest at the moment to work on Unicode. My feeling is that the first step towards Unicode would be to get the htword/mifluz backend synched with the mifluz CVS tree, which should be more useful for Unicode applications. (It's also less bug-prone from what I can tell.) Since others seem interested in this possibility, perhaps I should spin the 3-2-x branch back to the mainline and fork a branch for synching with mifluz.

While I don't think any of us intended to see Unicode in 3.2, if you have the resources to make this happen, it certainly won't be turned away. Welcome!

--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/
From: Neal R. <ne...@ri...> - 2002-01-03 22:37:37
Hello, I have two questions:

1. Any tips/input from anyone on the best way to make a shared library containing htdig & libraries like (libfuzzy.a libcommon.a libht.a etc)? [It looks straightforward..] We're looking at using htdig as an information archive search engine. We use PHP for our web apps, and would want to call the functions of htsearch from a PHP page. We're evaluating htdig 3.1.5 for this. We may switch to 3.2 when it's no longer Beta.

2. What would be involved in getting 3.2.x further down the road of Unicode? We are in the process of completing a move of a very large 500K+ line project to be UTF-8. Who is the point man for htdig/Unicode issues? We are a commercial software company and rely heavily on open source projects (Linux, MySQL, PHP, Apache) to make the business go. We have the resources to assist htdig in moving to Unicode. In addition, we have a variety of software tools (Purify, Insure, etc.). We use these tools as part of our software cycle and would release ALL fixes back to htdig, as well as any improvements to the core of htdig.

Thank you.

--
Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
From: Gilles D. <gr...@sc...> - 2002-01-03 22:09:17
According to Gabriele Bartolini:
> I have a reasonable request. First of all, it is not hard to do and I
> think I could do it quickly. It is just that I don't wanna do it without
> your opinion.
>
> I think it could be handy to configure the digger to accept a language
> or a list of languages as the HTTP says for the 'Accept-Language' directive.
>
> What do you think of issuing a new configuration attribute called
> 'accept_language'? By default it is empty, but it could be set for instance
> to 'it' or 'en' or also 'it, en, en-us' and so on.
>
> So:
>
> Proposed attribute name: 'accept_language'
> Program: htdig
> Type: string
> Default: empty
> Example: en-us, en, it
> Description: - I will work it out with the best english I can produce! - :-)
>
> I wanna modify the HtHTTP code to handle this and put it into
> ht://Check and, as long as this class is now synchronised with the ht://Dig
> one, I would like to do it for our dear search engine too. What do you think?
>
> I think this configuration attribute could have a URL and Server scope
> as well.

Sure, if you're willing to implement this, I give it my +1. However, shouldn't the attribute type be "string list"? Also, for consistency with other string list attributes, it might make sense if the commas weren't needed, e.g. "accept_language: en-us en it"

--
Gilles R. Detillieux              E-mail: <gr...@sc...>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930
From: Gilles D. <gr...@sc...> - 2002-01-02 21:45:39
According to Geoff Hutchison:
> On Fri, 21 Dec 2001, J. op den Brouw wrote:
> > wondering why the htdig web site links the developer link to
> > sourceforge and not to /dev anymore. And why /dev is still around
> > on the maindocs....
>
> Ah, I forgot about all of this. When we moved to SF, we obviously needed
> to get rid of dev.htdig.org and move to /dev at the very least. But for
> the few documents, I didn't know if it was better just to host them in the
> "Documentation" section of the SF project page.
>
> I don't think it matters much one way or the other, though granted having
> them in the Documentation section means they won't be mirrored.
>
> So if people think we should move back towards having a full "developer
> site," we should do it. Whatever will be the most effective.

Well, given our experience over the past year, I'd think that the more we rely on our own pages and the less we rely on SF pages, the more effective our site will be. I'd prefer we have our own developer site, with links to any SF services on which we rely, but relying on SF no more than necessary. As long as they don't start to clamp down on our web space, I think that's the better way to go.

--
Gilles R. Detillieux              E-mail: <gr...@sc...>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930
From: Gilles D. <gr...@sc...> - 2002-01-02 21:36:26
According to Gabriele Bartolini:
> as you know I am back at my workplace. I am trying to test ht://Dig
> 3.1.x (with the current CVS rep) with the italian dictionaries. When trying
> to run 'htfuzzy endings' I got this error. I tried to track it down, but I
> think it has something to do with the Berkeley DB code (put function).
>
> I have used the italian dictionaries for ISpell and set the endings
> affix and dictionary configuration attributes.
>
> Anyway here is the backtrace. I hope you could understand more. I have
> been playing with the debugger for hours, without coming to anything.

Well, the backtrace tells me where it's failing, but not why. The createRoot function is really pretty simple code, so it's not clear to me how it can do anything that would cause the db_put() function to crash like this. Very weird. Even weirder is the additional information you provided in your SourceForge tracker bug report...

> However, it dumps a core even with the default files in english.
...
> My system characteristics are: Linux Kernel 2.4.2-2,
> gcc version 2.96 20000731 (Red Hat Linux 7.1 2.96-81).

Well, obviously we haven't been able to reproduce the problem on our systems. If it's failing with the standard english.0 and english.aff files, there's definitely something very wrong with your system. I'd recommend you check the Red Hat errata for 7.1 and make sure you install all the recommended update RPMS for it - especially any compiler or library updates, as it would seem there's likely a problem there. Also note that in Red Hat 7.x, the KDE packages use htdig 3.2.0b3 for indexing the KDE docs, so make sure you're not running into some sort of conflict between 3.1 & 3.2 databases.

--
Gilles R. Detillieux              E-mail: <gr...@sc...>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930
From: Gilles D. <gr...@sc...> - 2002-01-02 18:27:43
According to Gabriele Bartolini:
> Ciao guys,
>
> After ages, I can swear I was missing you! :-)
...
> I hope everything is fine and I have not made any mistake. So guys,
> please give me the *welcome back*, I need encouragements! :-)

Welcome back!!! And a very happy new year to you. It's great to have you back. 3.2 development had almost reached a standstill, so it's great that you can now debug the HTTP and cookies support. Once the finishing touches are done on 3.1.6, maybe we can get 3.2 back on the road.

--
Gilles R. Detillieux              E-mail: <gr...@sc...>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930
From: Gabriele B. <an...@us...> - 2001-12-30 21:52:26
Ciao guys,

I have a reasonable request. First of all, it is not hard to do and I think I could do it quickly. It is just that I don't wanna do it without your opinion.

I think it could be handy to configure the digger to accept a language or a list of languages, as HTTP allows with the 'Accept-Language' directive.

What do you think of issuing a new configuration attribute called 'accept_language'? By default it is empty, but it could be set for instance to 'it' or 'en' or also 'it, en, en-us' and so on. So:

Proposed attribute name: 'accept_language'
Program: htdig
Type: string
Default: empty
Example: en-us, en, it
Description: - I will work it out with the best English I can produce! - :-)

I wanna modify the HtHTTP code to handle this and put it into ht://Check and, as this class is now synchronised with the ht://Dig one, I would like to do it for our dear search engine too. What do you think? I think this configuration attribute could have a URL and Server scope as well.

Ciao and Happy New Year's! ;-)
-Gabriele

--
Gabriele Bartolini - Web Programmer
Current Location: Prato, Tuscany, Italy
an...@us... | http://www.prato.linux.it/~gbartolini | ICQ#129221447
> find bin/laden -name osama -exec rm {} \;
From: Gabriele B. <an...@ti...> - 2001-12-29 12:25:58
> Hi Gabriele,
> thanks for your fix! The crash is gone, but it doesn't seem to work yet.

Yeah, you are right. The only problem with the Cookies code was that I had not yet written the management of the 'domain' field. Now I have added code for the control of the subdomains, as the specification says (I mean, at least 2 subdomains for the 7 top-level domains - com, org, etc. - and 3 for the others). I had not enough time to test it, even though now it seems to work for the script you sent. I also put the same in ht://Check and it works fine as well.

I hope that with some massive testing, the Cookie support may be considered *achieved*. I had promised myself to do this before but ... you know ... :-)

Let me know if there is some problem, and let's cross our fingers.

Ciao
-Gabriele
From: Justin D. <ju...@ar...> - 2001-12-28 23:26:30
I don't know whether anybody is interested, but I patched htsearch to override restrict to search only the virtual host. It's pretty simplistic, but should work in most cases. Using this patch you should be able to have several virtual hosts share a common htdig install/db while maintaining separate search domains. Feel free to contact me about anything relating to this patch.

--
Justin Day
ju...@ar...
From: Daniel N. <dan...@t-...> - 2001-12-28 16:11:00
On Monday 17 December 2001 21:11, you wrote:

> >It seems _cookie_jar is never properly defined (it's only set to 0 at
> >the top). What is the state of cookie support? I assume this crash is a
> >known problem? Is this easy to fix? If so, I might try...
>
> Fixed. Thanx!!! :-)

Hi Gabriele,

thanks for your fix! The crash is gone, but it doesn't seem to work yet. Using the attached testcase, htdig never reaches cookie3.pl. With a cookie-enabled browser, cookie3.pl can be reached; a browser with cookies disabled only reaches cookie2.pl (start surfing at start.pl).

You can test this by copying the files to your cgi-bin and adapting at least the "domain" in start.pl.

Regards
 Daniel

--
http://www.danielnaber.de
From: Geoff H. <ghu...@ws...> - 2001-12-27 16:25:54
On Wed, 26 Dec 2001, Greg Lepore wrote:
> The second error was on two very large collections that I was merging -
> Might this have something to do with the 2GB file size limit?

Probably. If you're running on a system with no such limit (Solaris pops to mind), then there's something else amiss. The databases have been tested well over the 2GB limit by Loic and there aren't direct problems in the ht://Dig and mifluz code.

> Actually, I ran around 10 merges before I got the first error above, and

Hmm. There are certainly bugs in the database code that ht://Dig 3.2 is using. This doesn't seem like it's going to be resolved until I can get some time to merge the ht://Dig and mifluz CVS trees again. Loic seems to ignore messages from me.

> System is RedHat 7.2

On Intel, I assume.

--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/
From: Greg L. <gr...@md...> - 2001-12-26 15:35:54
I have been using the 12/9/01 build of 3.2 since the 10th. I have encountered three segmentation faults that have me stopped. One occurred in htdig and the other two in htmerge. The ends of the htmerge errors are as follows:

htmerge: Duplicate, URL: 0x8712b18 ignoring merging copy
WordDB: CDB___memp_cmpr_read: expected DB_CMPR_FIRST flag set at pgno = 50435
WordDB: PANIC: Invalid argument
WordDBCursor::Get(17) failed DB_RUNRECOVERY: Fatal error, run database recovery
Segmentation fault
[root@search2 bin]#

and

WordDBCursor::Get(17) failed Cannot allocate memory
Segmentation fault

The second error was on two very large collections that I was merging:

-rw-r--r--  1 root  root  622M Dec 20 13:49 db.excerpts
-rw-r--r--  1 root  root  246M Dec 20 13:49 db.words.db

and

-rw-r--r--  1 root  root  1.5G Dec 20 14:07 db.excerpts
-rw-r--r--  1 root  root  381M Dec 20 11:03 db.words.db

Might this have something to do with the 2GB file size limit? Actually, I ran around 10 merges before I got the first error above, and then all subsequent merges failed. For the second error above, five previous merges were successful. I ran gdb on htmerge but only got:

#0  0x00000000 in ?? ()

The htdig error only has:

1627:1005:3:http://www.mdarchives.state.md.us/megafile/msa/coagser/c2200/c2204/html/landrecordsrho123.html: +Segmentation fault

System is RedHat 7.2. Any ideas?

Gregory Lepore
Webmaster, State of Maryland
Supervisor, Archives of Maryland Online
410-260-6425
From: Geoff H. <ghu...@ws...> - 2001-12-22 20:00:47
Gee, you'd think they could manage to spell my username right considering a) they just contacted me about something else and b) I have put in a support request about this.

And without the error messages from the Apache error log, I haven't a clue what's broken. It works from the command line just fine.

-Geoff

On Fri, 21 Dec 2001, Andrew Scherpbier wrote:
> -------- Original Message --------
> Subject: Hello
> Date: Thu, 20 Dec 2001 17:19:30 -0800
> From: Patrick McGovern <pmc...@va...>
> To: lo...@us..., gh...@us..., sch...@us..., gr...@us...
>
> Hi,
>
> I wanted to give you guys a heads up that your search on http://www.htdig.org/ is broken.
>
> The error message that results points people to email st...@sf... (standard server error message). We have been receiving about 8 emails a day for the past week.
>
> Pat-
>
> --
> Patrick McGovern           Tel: 510-687-7080
> Director, SourceForge.net  pmc...@va...

--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/
From: Geoff H. <ghu...@ws...> - 2001-12-22 20:00:44
On Fri, 21 Dec 2001, J. op den Brouw wrote:
> wondering why the htdig web site links the developer link to
> sourceforge and not to /dev anymore. And why /dev is still around
> on the maindocs....

Ah, I forgot about all of this. When we moved to SF, we obviously needed to get rid of dev.htdig.org and move to /dev at the very least. But for the few documents, I didn't know if it was better just to host them in the "Documentation" section of the SF project page.

I don't think it matters much one way or the other, though granted having them in the Documentation section means they won't be mirrored.

So if people think we should move back towards having a full "developer site," we should do it. Whatever will be the most effective.

-Geoff
From: J. op d. B. <MSQ...@st...> - 2001-12-21 13:12:35
Hi, wondering why the htdig web site links the developer link to sourceforge and not to /dev anymore. And why /dev is still around on the maindocs....

--jesse

--------------------------------------------------------------------
J. op den Brouw                      Johanna Westerdijkplein 75
Haagse Hogeschool                    2521 EN DEN HAAG
Faculty of Engineering               Netherlands
Electrical Engineering               +31 70 4458936
-------------------- J.E...@st... --------------------
Linux - because reboots are for hardware changes
From: Gabriele B. <an...@ti...> - 2001-12-21 11:36:15
Ciao guys,

as you know I am back at my workplace. I am trying to test ht://Dig 3.1.x (with the current CVS rep) with the italian dictionaries. When trying to run 'htfuzzy endings' I got this error. I tried to track it down, but I think it has something to do with the Berkeley DB code (put function).

I have used the italian dictionaries for ISpell and set the endings affix and dictionary configuration attributes.

Anyway, here is the backtrace. I hope you can understand more. I have been playing with the debugger for hours, without coming to anything.

Thanks a lot! Ciao
-Gabriele

(gdb) backtrace
#0  0x08073573 in __bam_cmp () at ../../htdig/htlib/StringMatch.cc:567
#1  0x0807a45c in __bam_search () at ../../htdig/htlib/StringMatch.cc:567
#2  0x080758a6 in __bam_c_search () at ../../htdig/htlib/StringMatch.cc:567
#3  0x08074b45 in __bam_c_put () at ../../htdig/htlib/StringMatch.cc:567
#4  0x0805b1b4 in __db_put () at ../../htdig/htlib/StringMatch.cc:567
#5  0x08050079 in DB2_db::Put (this=0x809d2c0, key=@0xbffff3f0, data=@0xbffff400) at ../../htdig/htlib/DB2_db.cc:337
#6  0x0804ac44 in Endings::createRoot (this=0x80916c8, rules=@0xbffff890, word2root=0x8096970 "/tmp/word2root.db", root2word=0x8096998 "/tmp/root2word.db", dictFile=0x8097f20 "/opt/htdig-3.1-dev/web/common/it/ispell/italian.words") at ../../htdig/htfuzzy/EndingsDB.cc:185
#7  0x0804a562 in Endings::createDB (this=0x80916c8, config=@0x80910e0) at ../../htdig/htlib/htString.h:44
#8  0x0804d588 in main (ac=2, av=0xbffffa1c) at ../../htdig/htfuzzy/htfuzzy.cc:230
#9  0x400b8177 in __libc_start_main (main=0x804cff8 <main>, argc=2, ubp_av=0xbffffa1c, init=0x8049820 <_init>, fini=0x80851d8 <_fini>, rtld_fini=0x4000e184 <_dl_fini>, stack_end=0xbffffa0c) at ../sysdeps/generic/libc-start.c:129

Ciao
-Gabriele
From: Gilles D. <gr...@sc...> - 2001-12-20 18:59:07
According to Gabriele Bartolini:
> Ciao Gilles,
>
> as you probably noticed, I changed the CONFIG.in and the configure.in
> files for the with-'____' series of parameters regarding the some directory
> and file customisation.
>
> I hope you don't mind if I did it without 'warning' you. I have to get
> back on track!!! :-)
>
> Ciao
> -Gabriele

Yes, I noticed the CVS update log messages. I'll consider your e-mail as the warning, even though it was too late to say anything about it. I think your changes are good ones, which should resolve a few complaints about 3.1.x's lack of configurability. Unfortunately, they also mean I need to fix the RPM .spec file in contrib accordingly, before 3.1.6 can be released. (I had been thinking of other changes to it anyway.)

--
Gilles R. Detillieux              E-mail: <gr...@sc...>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930
From: Gilles D. <gr...@sc...> - 2001-12-20 18:55:10
According to Geoff Hutchison:
> Hi Gilles,
>
> I seem to remember that you were planning on putting in your mktime() fix
> for the 3.1.6 release--this was the message you sent on Nov. 23. I don't
> see it in the current CVS version, but perhaps I somehow missed a
> revision?

Well, I meant to do it soon after that posting, but I never got around to it then and it slipped my mind. Thanks for the reminder. I was also looking at what it would take to completely get rid of all the strptime, timegm and mktime usage, and it turns out that htdig/Retriever.cc still uses strptime and timegm (or mytimegm) to parse DC date headers in Retriever::got_time(). Also, htsearch uses mktime for the startyear and endyear handling. So, I may have addressed the most problematic issue, but not everything. I'm now wondering if I should tackle the Retriever and Display code, or leave it at that.

> I'd like to clean up any remaining issues over the next few weeks so we
> can release 3.1.6 ASAP. (I have the time, so I'm taking care of the
> regex/rx issue, the release notes, the maindocs merges, the getpeername
> issue, and code into htmerge to catch empty databases to throw up more
> useful help text.)

OK, just give me a chance to get my last few tweaks and tests in before the final release. I probably won't be around much between tomorrow and Jan. 2, so maybe a mid-January release date would be ideal.

--
Gilles R. Detillieux              E-mail: <gr...@sc...>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930
From: Geoff H. <ghu...@ws...> - 2001-12-20 07:36:48
On Thu, 20 Dec 2001 htd...@li... wrote:
>     NET_SIZE_T l;
>
> NET_SIZE_T is effectively size_t for all SVR4 systems. It's hard-coded
> for other platforms in src/include/ap_config.h.

(In short, Apache decided this (void *) issue was strange too.)

I'd suggest changing the "can't determine" default to size_t in this case. I can't think of another platform that does this sort of strangeness. So the test looks like this in configure.in (I'm not going to commit to CVS now, I'm going to sleep):

...
    extern "C" int getpeername(int, $sock_t *, $getpeername_length_t *);
    $sock_t s;
    $getpeername_length_t l;
], [
    getpeername(0, &s, &l);
], [ac_found=yes ; break 2],[ac_found=no])
    done
done
if test "$ac_found" = no
then
    AC_MSG_WARN([can't determine, using size_t])
    AC_DEFINE_UNQUOTED(GETPEERNAME_LENGTH_T, size_t)
else

--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/