File Release Notes and Changelog
Release Name: 0.2.1
Notes:
Hello,
I wrote a small full text search engine for sqlite3.
http://ft3.sourceforge.net
Basic ideas are as follows:
---------------------------
* use another sqlite3 file for storing the full text
index information
* store everything in sqlite3 (this is not the fastest
strategy, in particular for the inverted index)
* don't care too much about disk space constraints
(ft3 uses roughly 8x more space than the initial data)
* be reasonably efficient (indexing the 500k small
documents in the English part of dmoz.org takes 2 hours
on my desktop PC; a search with frequent words takes
around 3 to 5 seconds on 1 IDE disk without cache, and
less than 0.5 second if the data is in cache)
Support basic things:
---------------------
* for words: stemming, metaphone, stopwords,
dictionary, some statistics
* for scores: TFIDF, proximity
* special parsing for urls
* special parsing for topics (if you have some)
* classical web syntax for queries (supports +/-, "" and :)
* make it easy to search from a PHP module
* configuration stored inside sqlite3
* language detection (currently unplugged)
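
For readers unfamiliar with the TFIDF score listed above, here is a
minimal C++11 sketch of the textbook formula. It is not taken from the
ft3 sources: the names Document, document_frequency and tfidf are
hypothetical, and ft3 may well weight or normalise terms differently.

```cpp
#include <cmath>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical type: a document is a list of already-normalised words.
using Document = std::vector<std::string>;

// Number of documents in the corpus that contain `word` at least once.
std::size_t document_frequency(const std::vector<Document>& corpus,
                               const std::string& word) {
    std::size_t n = 0;
    for (const Document& doc : corpus) {
        for (const std::string& w : doc) {
            if (w == word) { ++n; break; }  // count each document once
        }
    }
    return n;
}

// Textbook TF-IDF for one word in one document:
//   tf  = occurrences of word in the document / document length
//   idf = log(number of documents / document frequency)
// Returns 0 when the word does not occur in the document.
double tfidf(const std::vector<Document>& corpus, std::size_t doc_index,
             const std::string& word) {
    const Document& doc = corpus.at(doc_index);
    std::size_t count = 0;
    for (const std::string& w : doc) {
        if (w == word) ++count;
    }
    if (count == 0) return 0.0;
    // The word occurs in this document, so the document frequency is >= 1.
    const std::size_t df = document_frequency(corpus, word);
    const double tf = static_cast<double>(count) / static_cast<double>(doc.size());
    const double idf = std::log(static_cast<double>(corpus.size()) /
                                static_cast<double>(df));
    return tf * idf;
}
```

With this definition a word that appears in every document scores 0,
and a word that is rare across the corpus scores higher than a common
one with the same in-document count. A proximity score would be
combined with this value separately.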
Important Missing things:
-------------------------
* Trigger support (currently not an incremental
process, ft3_indexer reindexes the full table)
* ICU integration
* SQL for computing the co-occurrence table is slow/buggy
* It is not integrated within sqlite3, but it may be in
the future if I understand enough of the sqlite3 internals.
How to get:
-----------
http://ft3.sourceforge.net
Software:
---------
Code is in C++ with some external dependencies
(google sparsehash, libedit, and of course sqlite3). It
currently works under 32-bit Linux and Cygwin, tested with
g++-3.4.4. Quality is beta.
It is released under a BSD license.
I look forward to your feedback. In particular, any way to
optimize performance at search time is of interest.
Pierre.
Changes:
2005-10-28 Pierre PA. Aubert <pierreaubert@yahoo.fr>
* remove ft3_compact.cpp and ft3_compact.hpp
* download.sh: creation
* doc/install.ssi: add a note to download.sh
2005-10-27 Pierre PA. Aubert <pierreaubert@yahoo.fr>
* src/ft3_system.hpp.in: rename ft3_autoconfig to ft3_system;
people confused it with config files
* src/ft3_behaviour.hpp:
* src/ft3_behaviour.cpp: rename ft3_config to ft3_behaviour
* doc/install.ssi: simplification of installation
2005-10-25 Pierre PA. Aubert <pierreaubert@yahoo.fr>
* src/ft3_smartp.cpp: now beta quality
* src/ft3_scanner_url.l: same for urls
* src/ft3_scanner_text.l: add a text parser written in lex;
this is MUCH faster
2005-10-16 Pierre PA. Aubert <pierreaubert@yahoo.fr>
* configure.ac: update for new libs, some cleanup
* mv external libraries into contrib/ in order to simplify install
on other computers
2005-10-14 Pierre PA. Aubert <pierreaubert@yahoo.fr>
* configure.ac: setup to version 0.2
2005-10-11 Pierre PA. Aubert <pierreaubert@yahoo.fr>
* src/t_bag.cpp (main): add a test prog for memleak detection in bag_t
* sqlite3xx/t_stmt2.c: add a test prog for transaction and
statement
* sqlite3xx/t_stmt.c: add a test prog for memleak detection
* sqlite3xx/sqlite3xx_count.hpp (class Count): add an example
* remove docscounter and wordcounter from Word to save some memory
when indexing a large database. These data are recomputed later in
the indexer process from disk. It is a bit slower and can be
reactivated by a #define in ft3_word.hpp
* src/ft3_indexer.cpp (main): fix a simple bug for argc==2
2005-10-04 Pierre PA. Aubert <pierreaubert@yahoo.fr>
* doc/userguide.ssi: add previous function
* src/ft3_functions.cpp (sqlite3_levenhstein): add an interface to
the Levenshtein distance between 2 strings
2005-10-02 Pierre PA. Aubert <pierreaubert@yahoo.fr>
* sync with sourceforge, doc cleanup, comments cleanup
2005-10-01 Pierre PA. Aubert <pierreaubert@yahoo.fr>
* src/ft3_language.cpp (class Language): bug fix, the returned
lang contained a [
* src/ft3_searcher.cpp: update to sqlite schema v12
* src/ft3_bow.* (class BagOfWords): use sqlite3cxx => loc / 2
2005-09-30 Pierre PA. Aubert <pierreaubert@yahoo.fr>
* creation of lib sqlite3xx: a simple C++ wrapper for the sqlite3 C
API: add Transaction, Statement and Count classes
* demo/Makefile.am: creation
* demo/dmoz2sqlite.cpp: add an importer for the dmoz dump file;
not very fast, but enough to do the job, and memory consumption is
very low
2005-09-27 Pierre PA. Aubert <pierreaubert@yahoo.fr>
* src/ft3_match.i: add wrappers for SWIG, target PHP and RUBY
interfaces; currently not under CVS
2005-09-21 Pierre PA. Aubert <pierreaubert@yahoo.fr>
* src/ft3_searcher.cpp: add an interface to libedit which is a BSD
rewrite of GNU readline
2005-09-20 Pierre PA. Aubert <pierreaubert@yahoo.fr>
* src/ft3_smartp.cpp: add a basic profiler, will need to go to a
C++ hash_map
* src/ft3_searcher.cpp (main): add a callback to select *
* doc/install.ssi: update documentation
* src/ft3_smartp.cpp: use count distinct in query
* update to sqlite3 version 3.2.6
2005-09-19 Pierre PA. Aubert <pierreaubert@yahoo.fr>
* src/ft3_match.hpp (class Match): add a string interface to the
char* interface
2005-09-16 Pierre PA. Aubert <pierreaubert@yahoo.fr>
* src/ft3_config.cpp (checked): validate status of IndexConfig
* src/ft3_document.hpp: split ft3_index_XXX functions from ft3_indexer
* src/ft3_bow.hpp: split bag of words class from ft3_indexer
2005-09-15 Pierre PA. Aubert <pierreaubert@yahoo.fr>
* configure.ac: upgrade to new lib version
notice API change into ft3_search.hpp
2005-09-14 Pierre PA. Aubert <pierreaubert@yahoo.fr>
* src/ft3_smartp.hpp: split from ft3_search;
this is a first simple version with fragments extracted from
the SQL database. Due to the current hashing for ft3_scores, this is
too slow to be viable
2005-09-09 Pierre PA. Aubert <pierreaubert@yahoo.fr>
* src/Makefile.am (t_stringsplit_LDADD): add missing dependencies
* configure.ac: add AC_PROG_AWK
* doc/html.awk: add a missing \ before ) in regexp
* doc/Makefile.am: add a conditional AWK
2005-09-08 Pierre PA. Aubert <pierreaubert@yahoo.fr>
* sql/v10tov11.sqlite: sql script to go from schema v10 to schema
v11
* sql/ft3.sqlite: goes to version 11
2005-09-03 Pierre PA. Aubert <pierreaubert@yahoo.fr>
* Makefile.am: correct dist target
* successfully uploaded to sourceforge.net
* contrib/stemming-0.3/data/Makefile.am (LM): remove drents
2005-09-02 Pierre PA. Aubert <pierreaubert/at\yahoo/dot\fr>
* doc/html.awk: add path variable
* doc/Makefile.am: add generation of HTML files and installation
* src/ft3_indexer.cpp (index_url): add [[:space:]] as splitter
* src/t_ascii.cpp (main): add a test case
* src/t_stringsplit.cpp (main): add a test case
2005-09-01 Pierre PA. Aubert <pierreaubert/at\yahoo/dot\fr>
* doc/ft3.css: creation from ICU doc css file
* src/ft3_indexer.cpp: simulate the CLUSTER INDEX functionality
by sorting the table
(main): call a VACUUM FULL after a run
2005-08-12 Pierre F. Aubert <pierre.aubert/at\free/dot\fr>
* doc/userguide.ssi: creation
2005-08-11 Pierre F. Aubert <pierre.aubert/at\free/dot\fr>
* doc/install.ssi: creation
2005-08-04 Pierre PA. Aubert <pierreaubert/at\yahoo/dot\fr>
* create a doc directory with basic documentation
* configure.ac: cleaning, add some comments
2005-07-27 Pierre PA. Aubert <pierreaubert/at\yahoo/dot\fr>
* mv small bits of encoding handling into ft3_matcher.cpp and .hpp
* src/ft3_wordstorage.hpp: add missing namespace ft3
2005-07-26 Pierre PA. Aubert <pierreaubert/at\yahoo/dot\fr>
* creation of changelog