
full text index for sqlite3

File Release Notes and Changelog

Release Name: 0.2.1

Notes:

Hello,

I wrote a small full text search engine for sqlite3: http://ft3.sourceforge.net

Basic ideas are as follows:
---------------------------
* use a separate sqlite3 file to store the full text index information
* store everything in sqlite3 (this is not the fastest strategy, in particular for the inverted index)
* don't worry too much about disk space constraints (ft3 uses roughly 8x more space than the original data)
* be reasonably efficient (indexing the 500k small documents of the English part of dmoz.org takes 2 hours on my desktop PC; a search with frequent words takes around 3 to 5 seconds on a single IDE disk without cache, and less than 0.5 seconds when the data is cached)

Supported basics:
-----------------
* for words: stemming, metaphone, stopwords, dictionary, some statistics
* for scores: TF-IDF, proximity
* special parsing for URLs
* special parsing for topics (if you have some)
* classic web query syntax (supports "" and : among its operators)
* easy searching from a PHP module
* configuration stored inside sqlite3
* language detection (currently not plugged in)

Important missing things:
-------------------------
* trigger support (indexing is currently not incremental: ft3_indexer reindexes the full table)
* ICU integration
* the SQL for computing the co-occurrence table is slow/buggy
* ft3 is not integrated within sqlite3, but it may be in the future if I understand enough of the sqlite3 internals

How to get:
-----------
http://ft3.sourceforge.net

Software:
---------
The code is in C++ with some external dependencies (google sparsehash, libedit, and of course sqlite3). It currently works under 32-bit Linux and Cygwin, tested with g++-3.4.4. Quality is beta. It is released under a BSD license.

I look forward to your feedback. In particular, any way to optimize performance at search time is of interest.

Pierre.


Changes:

2005-10-28  Pierre PA. Aubert  <pierreaubert@yahoo.fr>

	* remove ft3_compact.cpp and ft3_compact.hpp
	* download.sh: creation
	* doc/install.ssi: add a note about download.sh

2005-10-27  Pierre PA. Aubert  <pierreaubert@yahoo.fr>

	* src/ft3_system.hpp.in: rename ft3_autoconfig to ft3_system; people
	got confused with config files
	* src/ft3_behaviour.hpp:
	* src/ft3_behaviour.cpp: rename ft3_config to ft3_behaviour
	* doc/install.ssi: simplify installation

2005-10-25  Pierre PA. Aubert  <pierreaubert@yahoo.fr>

	* src/ft3_smartp.cpp: now beta quality
	* src/ft3_scanner_url.l: same for URLs
	* src/ft3_scanner_text.l: add a text parser in lex; this is MUCH faster

2005-10-16  Pierre PA. Aubert  <pierreaubert@yahoo.fr>

	* configure.ac: update for new libs, some cleanup
	* move external libraries into contrib/ to simplify installation on
	other computers

2005-10-14  Pierre PA. Aubert  <pierreaubert@yahoo.fr>

	* configure.ac: set up version 0.2

2005-10-11  Pierre PA. Aubert  <pierreaubert@yahoo.fr>

	* src/t_bag.cpp (main): add a test program for memleaks in bag_t
	* sqlite3xx/t_stmt2.c: add a test program for transactions and statements
	* sqlite3xx/t_stmt.c: add a test program for memleak detection
	* sqlite3xx/sqlite3xx_count.hpp (class Count): add an example
	* remove docscounter and wordcounter from Word to save some memory when
	indexing large databases. This data is recomputed later in the indexer
	process from disk. It is a bit slower and can be reactivated by a
	#define in ft3_word.hpp
	* src/ft3_indexer.cpp (main): fix a simple bug for argc==2

2005-10-04  Pierre PA. Aubert  <pierreaubert@yahoo.fr>

	* doc/userguide.ssi: document the previous function
	* src/ft3_functions.cpp (sqlite3_levenhstein): add an interface to the
	Levenshtein distance between 2 strings

2005-10-02  Pierre PA. Aubert  <pierreaubert@yahoo.fr>

	* sync with sourceforge, doc cleanup, comments cleanup

2005-10-01  Pierre PA. Aubert  <pierreaubert@yahoo.fr>

	* src/ft3_language.cpp (class Language): bug fix, the returned lang
	contained a [
	* src/ft3_searcher.cpp: update to sqlite schema v12
	* src/ft3_bow.* (class BagOfWords): use sqlite3cxx => loc / 2

2005-09-30  Pierre PA. Aubert  <pierreaubert@yahoo.fr>

	* creation of lib sqlite3xx: a simple C++ wrapper for the sqlite3 C API;
	add Transaction, Statement and Count classes
	* demo/Makefile.am: creation
	* demo/dmoz2sqlite.cpp: add an importer for dmoz dump files; not very
	fast, but enough to do the job, and memory consumption is very low

2005-09-27  Pierre PA. Aubert  <pierreaubert@yahoo.fr>

	* src/ft3_match.i: add wrappers for SWIG, targeting PHP and Ruby
	interfaces; currently not under CVS

2005-09-21  Pierre PA. Aubert  <pierreaubert@yahoo.fr>

	* src/ft3_searcher.cpp: add an interface to libedit, which is a BSD
	rewrite of GNU readline

2005-09-20  Pierre PA. Aubert  <pierreaubert@yahoo.fr>

	* src/ft3_smartp.cpp: add a basic profiler; will need to move to a
	C++ hash_map
	* src/ft3_searcher.cpp (main): add a callback to select *
	* doc/install.ssi: update documentation
	* src/ft3_smartp.cpp: use count distinct in query
	* update to sqlite3 version 3.2.6

2005-09-19  Pierre PA. Aubert  <pierreaubert@yahoo.fr>

	* src/ft3_match.hpp (class Match): add a string interface alongside the
	char* interface

2005-09-16  Pierre PA. Aubert  <pierreaubert@yahoo.fr>

	* src/ft3_config.cpp (checked): validate status of IndexConfig
	* src/ft3_document.hpp: split ft3_index_XXX functions from ft3_indexer
	* src/ft3_bow.hpp: split bag of words class from ft3_indexer

2005-09-15  Pierre PA. Aubert  <pierreaubert@yahoo.fr>

	* configure.ac: upgrade to new lib version; note the API change in
	ft3_search.hpp

2005-09-14  Pierre PA. Aubert  <pierreaubert@yahoo.fr>

	* src/ft3_smartp.hpp: split from ft3_search; this is a first simple
	version with fragments extracted from the sql database. Due to the
	current hashing for ft3_scores, it is too slow to be viable

2005-09-09  Pierre PA. Aubert  <pierreaubert@yahoo.fr>

	* src/Makefile.am (t_stringsplit_LDADD): add missing dependencies
	* configure.ac: add AC_PROG_AWK
	* doc/html.awk: add a missing \ before ) in regexp
	* doc/Makefile.am: add a conditional AWK

2005-09-08  Pierre PA. Aubert  <pierreaubert@yahoo.fr>

	* sql/v10tov11.sqlite: SQL script to go from schema v10 to schema v11
	* sql/ft3.sqlite: goes to version 11

2005-09-03  Pierre PA. Aubert  <pierreaubert@yahoo.fr>

	* Makefile.am: correct dist target
	* successfully uploaded to sourceforge.net
	* contrib/stemming-0.3/data/Makefile.am (LM): remove drents

2005-09-02  Pierre PA. Aubert  <pierreaubert/at\yahoo/dot\fr>

	* doc/html.awk: add path variable
	* doc/Makefile.am: add generation of HTML files and installation
	* src/ft3_indexer.cpp (index_url): add [[:space:]] as splitter
	* src/t_ascii.cpp (main): add a test case
	* src/t_stringsplit.cpp (main): add a test case

2005-09-01  Pierre PA. Aubert  <pierreaubert/at\yahoo/dot\fr>

	* doc/ft3.css: creation from ICU doc css file
	* src/ft3_indexer.cpp: simulate the CLUSTER INDEX functionality by
	sorting the table
	(main): call a VACUUM FULL after a run

2005-08-12  Pierre F. Aubert  <pierre.aubert/at\free/dot\fr>

	* doc/userguide.ssi: creation

2005-08-11  Pierre F. Aubert  <pierre.aubert/at\free/dot\fr>

	* doc/install.ssi: creation

2005-08-04  Pierre PA. Aubert  <pierreaubert/at\yahoo/dot\fr>

	* create a doc directory with basic documentation
	* configure.ac: cleaning, add some comments

2005-07-27  Pierre PA. Aubert  <pierreaubert/at\yahoo/dot\fr>

	* move small bits of encoding handling into ft3_matcher.cpp and .hpp
	* src/ft3_wordstorage.hpp: add missing namespace ft3

2005-07-26  Pierre PA. Aubert  <pierreaubert/at\yahoo/dot\fr>

	* creation of changelog