From: Jos v. d. O. <jo...@va...> - 2011-06-02 09:41:07

On Saturday, May 28, 2011 17:01:19 Bill Ross wrote:
> Been struggling with this for quite some time. No net search results, so I
> am doing something unique and/or stupid.
>
> clucene-core-0.9.21b, strigi-0.7.2
>
> Q: Is this the right version combo for KDE-4.6.1?

0.7.5 was just released and has many fixes. If you are using git to get the
code, you can get it now. If you want a tarball, you have to wait a few hours
before it hits the website.

> Q: Has anyone gotten the kde-buildrc script to cross-compile, or any hints
> on how to do so?

I cannot answer that. I know some people use scratchbox, which does
cross-compilation, but I do not know of any other ways.

> Q: Are there some CLucene config flags I may have missed to build the
> classes 'lucene::analysis::standard' and 'lucene::store::FSDirectory'?

These classes are essential; they should never be left out.

> Q: Any clues how to proceed?

I would suggest upgrading to 0.7.5, and if there is still a problem
compiling, let us know.

> [...]
From: Bill R. <ro...@sy...> - 2011-05-28 15:01:32

Been struggling with this for quite some time. No net search results, so I am
doing something unique and/or stupid.

clucene-core-0.9.21b, strigi-0.7.2

Q: Is this the right version combo for KDE-4.6.1?
Q: Has anyone gotten the kde-buildrc script to cross-compile, or any hints on
how to do so?
Q: Are there some CLucene config flags I may have missed to build the classes
'lucene::analysis::standard' and 'lucene::store::FSDirectory'?
Q: Any clues how to proceed?

Code:

cd /home/rossb/OpenWrt/backfire/build_dir/target-i386_glibc-2.6.1/strigi-0.7.2/src/luceneindexer && \
/home/rossb/OpenWrt/backfire/staging_dir/toolchain-i386_gcc-4.1.2_glibc-2.6.1/usr/bin/i486-openwrt-linux-gnu-g++ \
  -Dclucene_EXPORTS -D_REENTRANT -DQT_GUI_LIB -DQT_CORE_LIB -DHAVE_CONFIG_H \
  -DQT_NO_DEBUG -O2 -pipe -march=i486 -funit-at-a-time -fhonour-copts \
  -D_UNICODE -D_CL_HAVE_GCC_ATOMIC_FUNCTIONS=1 \
  -I/home/rossb/OpenWrt/backfire/build_dir/target-i386_glibc-2.6.1/strigi-0.7.2/src/streamanalyzer \
  -I/home/rossb/OpenWrt/backfire/staging_dir/target-i386_glibc-2.6.1/usr/include/CLucene \
  -I/home/rossb/OpenWrt/backfire/staging_dir/target-i386_glibc-2.6.1/usr/include/CLucene/analysis/standard \
  -I/home/rossb/OpenWrt/backfire/staging_dir/target-i386_glibc-2.6.1/usr/include \
  -I/home/rossb/OpenWrt/backfire/staging_dir/target-i386_glibc-2.6.1/include \
  -I/home/rossb/OpenWrt/backfire/staging_dir/toolchain-i386_gcc-4.1.2_glibc-2.6.1/usr/include \
  -Wnon-virtual-dtor -Wno-long-long -ansi -Wundef -Wcast-align \
  -Wchar-subscripts -Wall -W -Wpointer-arith -Wformat-security -fno-check-new \
  -fno-common -fexceptions -Wno-unused-parameter -fvisibility=hidden \
  -fvisibility=default -DNDEBUG -fPIC \
  -I/home/rossb/OpenWrt/backfire/build_dir/target-i386_glibc-2.6.1/strigi-0.7.2/src/luceneindexer \
  -I/home/rossb/OpenWrt/backfire/build_dir/target-i386_glibc-2.6.1/qt-kde-2011-04-06/include \
  -I/home/rossb/OpenWrt/backfire/build_dir/target-i386_glibc-2.6.1/qt-kde-2011-04-06/include/QtGui \
  -I/home/rossb/OpenWrt/backfire/staging_dir/target-i386_glibc-2.6.1/usr/include/QtCore \
  -I/home/rossb/OpenWrt/backfire/build_dir/target-i386_glibc-2.6.1/strigi-0.7.2 \
  -I/home/rossb/OpenWrt/backfire/build_dir/target-i386_glibc-2.6.1/strigi-0.7.2/src/streamanalyzer \
  -I/home/rossb/OpenWrt/backfire/build_dir/target-i386_glibc-2.6.1/strigi-0.7.2/src/streams \
  -I/home/rossb/OpenWrt/backfire/build_dir/target-i386_glibc-2.6.1/strigi-0.7.2/src/streams/strigi \
  -I/home/rossb/OpenWrt/backfire/staging_dir/target-i386_glibc-2.6.1/usr/include \
  -I/home/rossb/OpenWrt/backfire/staging_dir/toolchain-i386_gcc-4.1.2_glibc-2.6.1/usr/include \
  -fPIC -o CMakeFiles/clucene.dir/cluceneindexmanager.cpp.o \
  -c /home/rossb/OpenWrt/backfire/build_dir/target-i386_glibc-2.6.1/strigi-0.7.2/src/luceneindexer/cluceneindexmanager.cpp

Errors:

/home/rossb/OpenWrt/backfire/build_dir/target-i386_glibc-2.6.1/strigi-0.7.2/src/luceneindexer/cluceneindexmanager.cpp:45: error: 'lucene::analysis::standard' has not been declared
/home/rossb/OpenWrt/backfire/build_dir/target-i386_glibc-2.6.1/strigi-0.7.2/src/luceneindexer/cluceneindexmanager.cpp:46: error: 'lucene::store::FSDirectory' has not been declared
etc.

I did the obvious of including StandardAnalyzer.h in cluceneindexmanager.cpp
and adding the include path. No change.

Thanks.
From: Egon W. <ego...@gm...> - 2011-05-14 14:52:01

Hi Raphael,

On Fri, Apr 15, 2011 at 5:19 AM, Raphael Kubo da Costa <ku...@gm...> wrote:
> I see there are many bug reports around that have not been answered,
> there has been no release in quite some time (the kdepim folks are
> already resorting to hacks to make the next version work with strigi
> 0.7.2) and the mailing list has seen very little traffic recently.

There was some recent discussion about that.

> Please don't see this as a troll mail, I am really interested in knowing
> whether there is anyone acting as a project leader, if the bug reports
> are being looked at (crash reports are not the only kind of bug there)
> and if there is anyone involved in making a new release.

I guess people have been busy with life... :/

> My knowledge of strigi's internals are close to zero, but I can
> certainly help with buildsystem stuff and getting a new release out of
> the door.
>
> Is there anybody out there? :)

I am still here, and one of the people who worked on a chemistry extension of
Strigi, together with Alexander Goncearenco, who worked on that in a GSoC
project... I have very little time for this project, and Alexander picked up
a PhD in Norway and also had little time... I think the core team has had
similar constraints...

Regarding the bugs... these often are from plugins, rather than the core
jstreams platform... the world is full of weird crap data, requiring Strigi
to be super robust... but also against 'use'... recently there were bug
reports caused by incomplete files sent to Strigi for analysis, causing
plugins to crash because fields of metadata ended in mid-air... :/

I'm sure this is not quite the answer you were hoping for, but I'd like to
personally thank you for your patches and interest in the project...

Egon

--
Dr E.L. Willighagen
Postdoctoral Researcher
Institutet för miljömedicin
Karolinska Institutet (http://ki.se/imm)
Homepage: http://egonw.github.com/
LinkedIn: http://se.linkedin.com/in/egonw
Blog: http://chem-bla-ics.blogspot.com/
PubList: http://www.citeulike.org/user/egonw/tag/papers
From: Raphael K. da C. <ku...@gm...> - 2011-04-15 03:19:28

Hi there,

Recently I've been fixing some bugs and updating stuff (mostly
buildsystem-related code) in each of the submodules, and have synced the
top-level strigi module to newer submodule revisions.

I see there are many bug reports around that have not been answered, there
has been no release in quite some time (the kdepim folks are already
resorting to hacks to make the next version work with strigi 0.7.2), and the
mailing list has seen very little traffic recently.

Please don't see this as a troll mail; I am really interested in knowing
whether there is anyone acting as a project leader, whether the bug reports
are being looked at (crash reports are not the only kind of bug there), and
whether there is anyone involved in making a new release.

My knowledge of strigi's internals is close to zero, but I can certainly help
with buildsystem stuff and getting a new release out of the door.

Is there anybody out there? :)
From: Evgeny E. <phr...@gm...> - 2011-02-27 21:53:08

First of all, I need to apologize for the neglect. I do plan to work more on
bug fixes; I couldn't get much done in the last 4 months or so :( More
comments below.

On Saturday 26 February 2011 18:02:07 Karsten König wrote:
> [...]
> So we came up with running a separate process and letting that die in the
> worst case, and just throwing the exception in our watcher thread. This is
> the way the tracker guys do it, they told me in #tracker.
>
> Btw, I think we should still fix all the segfaulting parsers, but
> segfaulting the users of KFileMetaInfo is bad bad bad and this needs
> better handling.

The problem is not even the Strigi analyzers, since we can eventually fix
them all. The problem is third-party analyzers, or even the libraries they
use (as happened with tiff). The only flawless solution is indeed a managed
language or a separate process. One more issue we will probably encounter in
the future is hanging analyzers.

--
Evgeny
From: Peter P. <pet...@gm...> - 2011-02-27 20:24:09

On Sunday 27 February 2011 20:17:48 Karsten König wrote:
> --- Comment #4 from Peter Penz <peter penz19 gmail com> 2011-02-27 20:11:33 ---
> Git commit d1254968eac95f4d9d8c47552e27f0cc40e6aed9 by Peter Penz.
> Committed on 27/02/2011 at 20:01.
> Pushed by ppenz into branch 'master'.
>
> KFileMetaDataProvider: Don't crash if a Strigi analyzer crashes
>
> Some Strigi analyzers are still unstable and crash under certain
> circumstances. This patch moves the reading of the meta-data into a custom
> process instead of doing it in a thread, which assures that a crashing
> Strigi analyzer does not result in crashing Dolphin or Konqueror when
> hovering some items.
>
> Wooo you are awesome sir!
> The patch looks small and consistent, will you backport it into the 4.6
> branch?

The patch works quite well, but still needs some fine-tuning (e.g. the tags
meta-data are currently not shown). All in all I'd say it's still too big and
risky a change to backport to 4.6. We lived 3 years with this kind of
instability; waiting another 5 months should be somehow acceptable ;-)

> I understand this is fixed on the KDE side so you are closing the bugs, but
> they are still bugs in strigi so they shouldn't get lost =/

Most of them are already part of the strigi bug tracker, so I hope they will
not get forgotten (for the rest I'm quite sure they are duplicates of
reported bugs).

> Thanks for the awesome work!

You're welcome :-)

> Karsten
> [...]
From: Karsten K. <re...@gm...> - 2011-02-27 19:17:55

--- Comment #4 from Peter Penz <peter penz19 gmail com> 2011-02-27 20:11:33 ---
Git commit d1254968eac95f4d9d8c47552e27f0cc40e6aed9 by Peter Penz.
Committed on 27/02/2011 at 20:01.
Pushed by ppenz into branch 'master'.

KFileMetaDataProvider: Don't crash if a Strigi analyzer crashes

Some Strigi analyzers are still unstable and crash under certain
circumstances. This patch moves the reading of the meta-data into a custom
process instead of doing it in a thread, which assures that a crashing Strigi
analyzer does not result in crashing Dolphin or Konqueror when hovering some
items.

Wooo you are awesome sir! The patch looks small and consistent; will you
backport it into the 4.6 branch?

I understand this is fixed on the KDE side so you are closing the bugs, but
they are still bugs in strigi, so they shouldn't get lost =/

Thanks for the awesome work!

Karsten

Am Freitag, 25. Februar 2011, 18:36:07 schrieb Peter Penz:
> Hi,
>
> Dolphin still suffers from some quite unstable strigi analyzers that might
> crash. I've tried to forward strigi bugs reported at bugs.kde.org to the
> strigi bug tracker at
> http://sourceforge.net/tracker/?group_id=171000&atid=856302 but I have the
> impression that most of the analyzers are unmaintained and that there is
> very little activity on the strigi bug tracker.
>
> As there is still quite a huge list of open strigi issues reported at
> bugs.kde.org (see [1]), I wanted to ask whether it makes sense at all to
> forward those reports to the strigi bug tracker?
>
> When KDE 4.0 got released three years ago I was sure that it was only a
> matter of time until the strigi analyzers got stable, but in the meantime
> I'm not so convinced about this anymore. I've tried to fix some analyzer
> crashes myself, and I've also contacted some authors of analyzers directly,
> but in the meantime I'm thinking about maintaining a black-list of crashy
> strigi analyzers in KFileMetaInfo as an ugly workaround...
>
> Before investing a lot of work into forwarding the bugs below, and before
> implementing such a black-list, I'm writing this mail in the hope that my
> impression is completely wrong and that it still is only a matter of time
> until those crashes get fixed :-)
>
> Cheers,
> Peter
>
> [1]
> https://bugs.kde.org/show_bug.cgi?id=257964
> https://bugs.kde.org/show_bug.cgi?id=258715
> https://bugs.kde.org/show_bug.cgi?id=262299
> https://bugs.kde.org/show_bug.cgi?id=263468
> https://bugs.kde.org/show_bug.cgi?id=263502
> https://bugs.kde.org/show_bug.cgi?id=264254
> https://bugs.kde.org/show_bug.cgi?id=234799
> https://bugs.kde.org/show_bug.cgi?id=251462
> https://bugs.kde.org/show_bug.cgi?id=251701
> https://bugs.kde.org/show_bug.cgi?id=258918
> https://bugs.kde.org/show_bug.cgi?id=192377
> https://bugs.kde.org/show_bug.cgi?id=195564
> https://bugs.kde.org/show_bug.cgi?id=199368
> https://bugs.kde.org/show_bug.cgi?id=210841
> https://bugs.kde.org/show_bug.cgi?id=245376
> https://bugs.kde.org/show_bug.cgi?id=246461
> https://bugs.kde.org/show_bug.cgi?id=246901
> https://bugs.kde.org/show_bug.cgi?id=249150
> https://bugs.kde.org/show_bug.cgi?id=261952
> https://bugs.kde.org/show_bug.cgi?id=249876
> https://bugs.kde.org/show_bug.cgi?id=179376
> https://bugs.kde.org/show_bug.cgi?id=179417
> https://bugs.kde.org/show_bug.cgi?id=179420
> https://bugs.kde.org/show_bug.cgi?id=181591
> https://bugs.kde.org/show_bug.cgi?id=183269
> https://bugs.kde.org/show_bug.cgi?id=183722
> https://bugs.kde.org/show_bug.cgi?id=185667
> https://bugs.kde.org/show_bug.cgi?id=188596
> https://bugs.kde.org/show_bug.cgi?id=191864
> https://bugs.kde.org/show_bug.cgi?id=193112
> https://bugs.kde.org/show_bug.cgi?id=205813
> https://bugs.kde.org/show_bug.cgi?id=244621
> https://bugs.kde.org/show_bug.cgi?id=245451
> https://bugs.kde.org/show_bug.cgi?id=248214
> https://bugs.kde.org/show_bug.cgi?id=249655
> https://bugs.kde.org/show_bug.cgi?id=249876
> https://bugs.kde.org/show_bug.cgi?id=265549
> https://bugs.kde.org/show_bug.cgi?id=267079
>
> _______________________________________________
> Strigi-devel mailing list
> Str...@li...
> https://lists.sourceforge.net/lists/listinfo/strigi-devel
From: Karsten K. <re...@gm...> - 2011-02-26 16:02:12

Am Freitag, 25. Februar 2011, 21:13:46 schrieb Peter Penz:
> On Friday 25 February 2011 20:42:34 Egon Willighagen wrote:
> > On Fri, Feb 25, 2011 at 8:28 PM, Egon Willighagen <ego...@gm...> wrote:
> > > Some bug reports do, in fact, ask for the files that cause the test;
> > > IMHO that is crucial here, and should be part of the bug report. Some
> > > bug reporters actually do report that, and that is very useful.
> >
> > For example, bug report #258715 mentions a file, so I just quickly ran
> > xmlindexer on it (1 and 8 threads), and commented in the bug report:
> >
> > "I ran Strigi on the mentioned .zip file (it will go into the .zip,
> > and thus still test the jars) using xmlindexer, and get no crash:
> >
> > $ xmlindexer querydsl-jpa-2.0.5-full-deps.zip
> >
> > Tons of XML output and an undefined symbol ("xmlindexer: symbol lookup
> > error: /usr/lib/libldap_r-2.4.so.2: undefined symbol:
> > ldap_int_tls_destroy, version OPENLDAP_2.4_2"), but no crash.
> >
> > As such, I wonder if it is really the streamanalyzers that are buggy,
> > because then I would have gotten the crash too. KDE 4.4.5, Strigi
> > 0.7.2 on Debian Squeeze 32 bit.
>
> I'm not familiar with the xmlindexer. KFileMetaInfo uses a feature in
> Strigi that is not used by xmlindexer, I think: the streams are limited to
> 64 KB, and that is something where some analyzers crash. They do some
> pointer arithmetic which goes beyond those 64 KB without checking whether
> they have gotten such a range of memory.

That 64 KB limit is only the worst offender which pointed out the lack of
input checking, but finding this as the reason for the dolphin crash was
really a pita.

> > From my point of view, unit tests for analyzers with the files attached
> > to bug reports would be mandatory to prevent issues like this.
> >
> > I also tried it in threaded mode, again without a crash:
> >
> > $ xmlindexer querydsl-jpa-2.0.5-full-deps.zip
> >
> > Can those who do get the crash perhaps try to run Strigi on those
> > files with xmlindexer too?"
>
> Some people are really helpful and we could of course ask them. But I think
> we must be happy if people report bugs at all and are able to attach the
> document; the testing must really be done by the developers, I think ;-)
>
> Peter

+1, people do report the bugs and are responsive; the strigi bugs on
bugs.kde.org are often well documented already.

The problem is, even with better input-range checks this stuff might blow up
in our faces; a corrupt file (maybe even with malicious intent) can be
crafted very easily. I asked in #strigi if we could do something about that.
I didn't know you can't properly catch segfaults in C++, especially not
platform-independently, so that idea doesn't work. So we came up with running
a separate process and letting that die in the worst case, and just throwing
the exception in our watcher thread. This is the way the tracker guys do it,
they told me in #tracker.

The problem is this is very hard to do, and Jos is (obviously) lacking time
here. Btw, I think we should still fix all the segfaulting parsers, but
segfaulting the users of KFileMetaInfo is bad bad bad and this needs better
handling.

Cheers,
Karsten
From: Egon W. <ego...@gm...> - 2011-02-25 20:42:54

On Fri, Feb 25, 2011 at 9:34 PM, Peter Penz <pet...@gm...> wrote:
> I did not investigate each of those bug reports yet, so there might be
> reports where the strigi analyzers got fixed. Still, we have too many
> reports for 4.6 where we get crashes, and that was the intention of
> starting this thread: does it make sense at all to file bug reports in the
> strigi bug tracker if nobody seems to care to close the many open bugs
> there (or at least give an answer that it takes a while to fix this)?

It does not surprise me that sending an unfinished stream causes trouble, but
I would need to ask Jos (he's not on IRC tonight) if this is the proper way
to do this... From a design perspective, I would say it is not... making the
analyzers robust against arbitrary stream stops is pretty nasty, particularly
because they can happen in the middle of a field... What I could imagine is
that you could pass jstreams a max number of bytes to analyze, or just a bit
more if it needs to finish fields... then one would not have to complicate
the code with checks for every byte it is trying to read, which would, I
guess, cause bug reports about it being too slow :) But this discussion is
really out of my league...

> > I also found one report out of the some ten-ish I looked at that was
> > already marked as fixed :/
> >
> > Strigi doesn't seem to do all that bad... not the streamanalyzers
> > anyway.
>
> It is not my intention to give the impression that strigi is bad. All I
> want to know is whether there is still some active developer community
> around that is concerned with fixing crashes and takes care that the bug
> tracker is maintained.

The IRC channel is typically populated by some ten people, but I am one of
them... I also noted that one bug was in an analyzer done by Amarok...

> > So, it seems to me to be either something platform-dependent,
> > or perhaps in the way Dolphin calls the streamanalyzers...
>
> Dolphin does not call the stream analyzers directly; this is done by
> KFileMetaInfo, which was adjusted to use Strigi by Jos himself. I doubt
> that Jos uses Strigi in a wrong way ;-)

True, but I don't think Jos has control over, or wrote, all analyzers... like
perhaps the Amarok one.

> > I am also puzzled by several of your bug reports, because they are
> > about copying files... is Dolphin in fact extracting metadata of files
> > that it is copying?
>
> No, it does not use Strigi for copying itself. But if a file gets
> overwritten, a dialog pops up which shows the meta-data of both files ->
> Strigi gets used...

Ah, OK, that does make a lot of sense :)

Egon

--
Dr E.L. Willighagen
Postdoctoral Researcher
Institutet för miljömedicin
Karolinska Institutet
Homepage: http://egonw.github.com/
LinkedIn: http://se.linkedin.com/in/egonw
Blog: http://chem-bla-ics.blogspot.com/
PubList: http://www.citeulike.org/user/egonw/tag/papers
From: Peter P. <pet...@gm...> - 2011-02-25 20:34:44
|
On Friday 25 February 2011 21:17:44 Egon Willighagen wrote: > On Fri, Feb 25, 2011 at 6:36 PM, Peter Penz <pet...@gm...> wrote: > > Before investing a lot of work in forwarding the bugs below, and before implementing such a black-list, I'm writing this mail in the hope that my impression is completely wrong and that it still is only a matter of time until those crashes get fixed :-) > > I tested four files now, the first files I could find listed in bug > reports. None make Strigi crash on my machine. None even make Dolphin > crash on my machine. I did not investigate each of those bug reports yet, so there might be reports where the strigi-analyzers got fixed. Still we have too many reports for 4.6 where we get crashes, and that was the intention of starting this thread: does it make sense at all to file bug reports in the strigi bug tracker if nobody seems to care to close the many open bugs there (or at least give an answer that it takes a while to fix this)? > I also found one report out of the ten or so I looked at, that was > already marked as fixed :/ > > Strigi doesn't seem to do all that bad... not the streamanalyzers > anyway. It is not my intention to make the impression that strigi is bad. All I want to know is whether there is still some active developer community around that is concerned with fixing crashes and takes care that the bug tracker is maintained. > So, it seems to me to be either something platform-dependent, > or perhaps in the way Dolphin calls the streamanalyzers... Dolphin does not call the stream analyzers directly; this is done by KFileMetaInfo, which has been adjusted to use Strigi by Jos himself. I doubt that Jos uses Strigi in a wrong way ;-) > I am also puzzled by several of your bug reports, because they are > about copying files... is Dolphin in fact extracting metadata of files > that it is copying? No, it does not use Strigi for the copying itself. 
But if a file gets overwritten a dialog pops up which shows the meta-data of both files -> Strigi gets used... > Egon > > |
From: Egon W. <ego...@gm...> - 2011-02-25 20:31:07
|
On Fri, Feb 25, 2011 at 9:13 PM, Peter Penz <pet...@gm...> wrote: > I'm not familiar with the xmlindexer. KFileMetaInfo uses a feature in Strigi that is not used by xmlindexer, I think: the streams are limited to 64 KB, and that is something where some analyzers crash. They do pointer arithmetic that goes beyond those 64 KB without checking whether they were actually given that much data. Yeah, that helps... $ split -b64K oha-common-lib-2.0-SNAPSHOT.jar $ xmlindexer xaa > /tmp/foo xmlindexer: /build/buildd-strigi_0.7.2-1+b1-i386-km54Pe/strigi-0.7.2/src/streamanalyzer/analysisresult.cpp:134: Strigi::AnalysisResult::Private::Private(const std::string&, const char*, time_t, Strigi::AnalysisResult&, Strigi::AnalysisResult&): Assertion `m_path.size() > m_parent->p->m_path.size()+1' failed. Aborted :/ Egon -- Dr E.L. Willighagen Postdoctoral Researcher Institutet för miljömedicin Karolinska Institutet Homepage: http://egonw.github.com/ LinkedIn: http://se.linkedin.com/in/egonw Blog: http://chem-bla-ics.blogspot.com/ PubList: http://www.citeulike.org/user/egonw/tag/papers |
From: Egon W. <ego...@gm...> - 2011-02-25 20:18:11
|
On Fri, Feb 25, 2011 at 6:36 PM, Peter Penz <pet...@gm...> wrote: > Before investing a lot of work in forwarding the bugs below, and before implementing such a black-list, I'm writing this mail in the hope that my impression is completely wrong and that it still is only a matter of time until those crashes get fixed :-) I tested four files now, the first files I could find listed in bug reports. None make Strigi crash on my machine. None even make Dolphin crash on my machine. I also found one report out of the ten or so I looked at, that was already marked as fixed :/ Strigi doesn't seem to do all that bad... not the streamanalyzers anyway. So, it seems to me to be either something platform-dependent, or perhaps in the way Dolphin calls the streamanalyzers... I am also puzzled by several of your bug reports, because they are about copying files... is Dolphin in fact extracting metadata of files that it is copying? Egon -- Dr E.L. Willighagen Postdoctoral Researcher Institutet för miljömedicin Karolinska Institutet Homepage: http://egonw.github.com/ LinkedIn: http://se.linkedin.com/in/egonw Blog: http://chem-bla-ics.blogspot.com/ PubList: http://www.citeulike.org/user/egonw/tag/papers |
From: Peter P. <pet...@gm...> - 2011-02-25 20:14:12
|
Hi Egon, Thanks for your fast reply. On Friday 25 February 2011 20:28:38 Egon Willighagen wrote: [...] > I am not an active Strigi developer, but wrote strigi plugins in the > past for chemistry files. File formats are difficult to parse, in > chemistry, and surely in other domains too. The problem is that there > is much software around that does not follow standards, which makes > extracting information a hard problem. I agree that parsing is not easy and it is sometimes very tricky to consider all the strange things that are part of some file formats. I wrote several parsers some years ago when working for another company... But I'm really convinced that writing parsers in a way that they at least don't crash when getting invalid/corrupt/strange files as input is not that hard. > As such, the list is really not that bad in my opinion, and as a Java > developer it would not have mattered much, as you would not have > gotten a segfault for this... I wonder if properly catching these > crashes in C++ is really not possible at all... > > But, that's not what I want to reply about. As said, input is crufty > and you cannot compensate for all. Compensating is not a must-have, but at least not crashing is really a must, and not that difficult (at least when not doing crazy unchecked pointer arithmetic like I've seen in a few analyzers...) > That must never cause segfaults, of > course. Now fixing these issues is tricky, as these segfaults are > relatively rare. I am not good at reading C++ stacktraces, and I note a > lot of threading complicating the output... > > Some bug reports do, in fact, ask for the files that cause the crash; > IMHO that is crucial here, and should be part of the bug report. Some > bug reporters actually do report that, and that is very useful. Yes. > Now, a simple way to reproduce these crashes and get a cleaner > stacktrace would be to use the xmlindexer instead of dolphin, and > that's what I'd like to provide as feedback here. 
Please ask the bug > reporters that can reproduce the bug and know the file, to report what > xmlindexer does on those files. I fully agree, but we did this already. We attached files to the bug reports or referred from the strigi bug tracker to the corresponding kde bug report where the file is attached. Still no reply for many, many reports... (not all of course, there has also been support from a few people). So the reporting does not seem to be the issue - it is fine for me if the maintainer of an analyzer tells me that he is currently busy and cannot fix this bug during the next months. But getting no answer at all for so many issues is quite concerning... :-( > The least that would do is give stacktraces without threading getting in the way. > > Egon > > |
From: Peter P. <pet...@gm...> - 2011-02-25 20:13:58
|
On Friday 25 February 2011 20:42:34 Egon Willighagen wrote: > On Fri, Feb 25, 2011 at 8:28 PM, Egon Willighagen > <ego...@gm...> wrote: > > Some bug reports do, in fact, ask for the files that cause the crash; > > IMHO that is crucial here, and should be part of the bug report. Some > > bug reporters actually do report that, and that is very useful. > > For example, bug report #258715 mentions a file, so I just quickly ran > xmlindexer on it (1 and 8 threads), and commented in the bug report: > > "I ran Strigi on the mentioned .zip file (it will go into the .zip, > and thus still test the jars) using xmlindexer, and get no crash: > > $ xmlindexer querydsl-jpa-2.0.5-full-deps.zip > > Tons of XML output and an undefined symbol ("xmlindexer: symbol lookup > error: /usr/lib/libldap_r-2.4.so.2: undefined symbol: > ldap_int_tls_destroy, version OPENLDAP_2.4_2"), but no crash. > > As such, I wonder if it is really the streamanalyzers that are buggy, > because then I would have gotten the crash too. KDE 4.4.5, Strigi > 0.7.2 on Debian Squeeze 32 bit. I'm not familiar with the xmlindexer. KFileMetaInfo uses a feature in Strigi that is not used by xmlindexer, I think: the streams are limited to 64 KB, and that is something where some analyzers crash. They do pointer arithmetic that goes beyond those 64 KB without checking whether they were actually given that much data. From my point of view, unit tests for analyzers using the files attached to bug reports would be mandatory to prevent issues like this. > I also tried it in threaded mode, again without crash: > > $ xmlindexer querydsl-jpa-2.0.5-full-deps.zip > > Can those who do get the crash, perhaps try to run Strigi on those > files with xmlindexer too?" Some people are really helpful and we could of course ask them. But I think we must be happy if people report bugs at all and are able to attach the document; the testing must really be done by the developers, I think ;-) Peter > Egon > > |
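The pointer arithmetic Peter describes is the classic failure mode when a stream is truncated at 64 KB: a read walks past the end of the buffer. A minimal sketch of the kind of bounds check each such read needs (hypothetical helper name, not Strigi's actual API):

```cpp
#include <cstddef>
#include <cstdint>

// readLE32: read a little-endian 32-bit value from 'buf' only if all four
// bytes fit inside 'size'. Returns false instead of reading past the end,
// which is exactly what a 64 KB-truncated stream would otherwise provoke.
// (Hypothetical helper for illustration, not part of Strigi.)
bool readLE32(const unsigned char* buf, std::size_t size,
              std::size_t offset, std::uint32_t* out) {
    if (offset > size || size - offset < 4) {
        return false; // the read would go beyond the buffer
    }
    *out = (std::uint32_t)buf[offset]
         | ((std::uint32_t)buf[offset + 1] << 8)
         | ((std::uint32_t)buf[offset + 2] << 16)
         | ((std::uint32_t)buf[offset + 3] << 24);
    return true;
}
```

With every read guarded this way, a truncated stream degrades into a "could not parse" result instead of a segfault.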
From: Egon W. <ego...@gm...> - 2011-02-25 19:43:02
|
On Fri, Feb 25, 2011 at 8:28 PM, Egon Willighagen <ego...@gm...> wrote: > Some bug reports do, in fact, ask for the files that cause the crash; > IMHO that is crucial here, and should be part of the bug report. Some > bug reporters actually do report that, and that is very useful. For example, bug report #258715 mentions a file, so I just quickly ran xmlindexer on it (1 and 8 threads), and commented in the bug report: "I ran Strigi on the mentioned .zip file (it will go into the .zip, and thus still test the jars) using xmlindexer, and get no crash: $ xmlindexer querydsl-jpa-2.0.5-full-deps.zip Tons of XML output and an undefined symbol ("xmlindexer: symbol lookup error: /usr/lib/libldap_r-2.4.so.2: undefined symbol: ldap_int_tls_destroy, version OPENLDAP_2.4_2"), but no crash. As such, I wonder if it is really the streamanalyzers that are buggy, because then I would have gotten the crash too. KDE 4.4.5, Strigi 0.7.2 on Debian Squeeze 32 bit. I also tried it in threaded mode, again without crash: $ xmlindexer querydsl-jpa-2.0.5-full-deps.zip Can those who do get the crash, perhaps try to run Strigi on those files with xmlindexer too?" Egon -- Dr E.L. Willighagen Postdoctoral Researcher Institutet för miljömedicin Karolinska Institutet Homepage: http://egonw.github.com/ LinkedIn: http://se.linkedin.com/in/egonw Blog: http://chem-bla-ics.blogspot.com/ PubList: http://www.citeulike.org/user/egonw/tag/papers |
From: Egon W. <ego...@gm...> - 2011-02-25 19:29:06
|
Hi Peter, On Fri, Feb 25, 2011 at 6:36 PM, Peter Penz <pet...@gm...> wrote: > As there is still a quite huge list of open strigi-issues reported at bugs.kde.org (see [1]) I wanted to ask whether it makes sense at all to forward those reports to the strigi bug tracker? I am not an active Strigi developer, but wrote strigi plugins in the past for chemistry files. File formats are difficult to parse, in chemistry, and surely in other domains too. The problem is that there is much software around that does not follow standards, which makes extracting information a hard problem. As such, the list is really not that bad in my opinion, and as a Java developer it would not have mattered much, as you would not have gotten a segfault for this... I wonder if properly catching these crashes in C++ is really not possible at all... But, that's not what I want to reply about. As said, input is crufty and you cannot compensate for all. That must never cause segfaults, of course. Now fixing these issues is tricky, as these segfaults are relatively rare. I am not good at reading C++ stacktraces, and I note a lot of threading complicating the output... Some bug reports do, in fact, ask for the files that cause the crash; IMHO that is crucial here, and should be part of the bug report. Some bug reporters actually do report that, and that is very useful. Now, a simple way to reproduce these crashes and get a cleaner stacktrace would be to use the xmlindexer instead of dolphin, and that's what I'd like to provide as feedback here. Please ask the bug reporters that can reproduce the bug and know the file, to report what xmlindexer does on those files. The least that would do is give stacktraces without threading getting in the way. Egon -- Dr E.L. 
Willighagen Postdoctoral Researcher Institutet för miljömedicin Karolinska Institutet Homepage: http://egonw.github.com/ LinkedIn: http://se.linkedin.com/in/egonw Blog: http://chem-bla-ics.blogspot.com/ PubList: http://www.citeulike.org/user/egonw/tag/papers |
From: <ty...@cc...> - 2011-02-22 20:02:22
|
Hi Sylvain, Since no one from the Strigi team seems to have time to answer, I'll say how it looks to me. By looking at https://projects.kde.org/projects/kdesupport/strigi/libstreamanalyzer/repository/revisions/master/entry/plugins/endplugins/jpegendanalyzer.cpp it seems that the JPG analyzer does not index the "keywords" or "taglist" fields. Shouldn't be too difficult to fix. I guess you can file a bug report. Kind regards, Tuukka Quoting Sylvain ZUCCA <syl...@gm...>: > Hello, > > No answer at this time from the nepomuk mailing list, but isn't it a strigi job? > > Regards > > ---------- Forwarded message ---------- > From: Sylvain ZUCCA <syl...@gm...> > Date: 2011/2/12 > Subject: Photos with keywords and Tags list > To: ne...@kd... > > > Hi, > > I wonder why nepomuk does not index keywords and Tags List contained in some > photos I've imported in my picture folder > > I show you an example here: http://i.imgur.com/Mphjw.jpg > > If I search for "Mimine" in Dolphin, there is no result > > I use Kde 4.6 on Mandriva Cooker (and otherwise nepomuk works fine) > > > > > > -- > Sylvain > |
From: Sylvain Z. <syl...@gm...> - 2011-02-15 06:56:01
|
Hello, No answer at this time from the nepomuk mailing list, but isn't it a strigi job? Regards ---------- Forwarded message ---------- From: Sylvain ZUCCA <syl...@gm...> Date: 2011/2/12 Subject: Photos with keywords and Tags list To: ne...@kd... Hi, I wonder why nepomuk does not index keywords and Tags List contained in some photos I've imported in my picture folder I show you an example here: http://i.imgur.com/Mphjw.jpg If I search for "Mimine" in Dolphin, there is no result I use Kde 4.6 on Mandriva Cooker (and otherwise nepomuk works fine) -- Sylvain |
From: Karsten K. <re...@gm...> - 2011-02-04 15:46:55
|
On Friday, 28 January 2011, 11:58:17 Karsten König wrote: > Hi, > > the constructor of oleinputstream has trouble validating the filesize, > especially when KFileMetaInfo limits the filesize to 64k. > kde bugreport: https://bugs.kde.org/attachment.cgi?bugid=251701 > > Until now the code tries guessing the maximum blockid and turns this into > an expected filesize. I replaced that guessing with the calculation of a > minimum and maximum size, assuming that the last block allocation > table doesn't only point at free blocks. So there still is a possible > error window of 64k at the end of the file, but it should be more robust > than the current state. I am off for the weekend but I'll add a section to > read the last part of the bat and thus get the real filesize and check if > this fits with the size we read from the file. Ok, I added this: it reads the last block of the block allocation table and checks in reverse order for the first non-free block; that block has to be included in the filesize. > Is this patch acceptable or am I overlooking something important? > > Cheers, > Karsten Please somebody check this stuff: it is crashing KDE applications from 4.6 that use KFileMetaInfo on a big ole compound document format file. This stuff gets shipped to users who will always see a crash thanks to the oleinputstream constructor lacking proper input checking. Here is a fixed link, nobody complained about the old one... https://bugs.kde.org/show_bug.cgi?id=251701&action=View Bye, Karsten |
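The reverse scan Karsten describes, finding the last non-free entry in the final block-allocation-table (BAT) block to bound the real file size, can be sketched as follows. This is an editor's illustration with hypothetical names, not the submitted patch; 0xFFFFFFFF is the free-sector sentinel in the OLE2 compound-document format:

```cpp
#include <cstdint>
#include <vector>

// Free-sector sentinel in the OLE2 block allocation table.
const std::uint32_t kFreeSector = 0xFFFFFFFFu;

// Scan a BAT block in reverse order for the first entry that is not free.
// Every block up to that index must exist in the file, so the index gives
// a lower bound on the number of blocks the file really contains.
// Returns -1 if the whole block consists of free entries.
// (Hypothetical helper for illustration, not the actual patch.)
int lastUsedEntry(const std::vector<std::uint32_t>& batBlock) {
    for (int i = static_cast<int>(batBlock.size()) - 1; i >= 0; --i) {
        if (batBlock[i] != kFreeSector) {
            return i;
        }
    }
    return -1;
}
```

A constructor could then compare the file size it was handed (possibly a 64 KB truncation) against the bound derived this way and fail cleanly instead of reading past the end.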
From: Karsten K. <re...@gm...> - 2011-01-28 10:57:52
|
Hi, the constructor of oleinputstream has trouble validating the filesize, especially when KFileMetaInfo limits the filesize to 64k. kde bugreport: https://bugs.kde.org/attachment.cgi?bugid=251701 Until now the code tries guessing the maximum blockid and turns this into an expected filesize. I replaced that guessing with the calculation of a minimum and maximum size, assuming that the last block allocation table doesn't only point at free blocks. So there still is a possible error window of 64k at the end of the file, but it should be more robust than the current state. I am off for the weekend but I'll add a section to read the last part of the bat and thus get the real filesize and check if this fits with the size we read from the file. Is this patch acceptable or am I overlooking something important? Cheers, Karsten |
From: Tuukka V. <ty...@cc...> - 2011-01-27 15:54:44
|
Hi Evgeny, Have you had time to look at the analyzer? Cheers, Tuukka On Tuesday 21 December 2010 15:26:12 Evgeny Egorochkin wrote: > While I didn't have a chance to test this yet, this is an amazing > contribution. Analyzers of this complexity don't arrive by email all that > often :-) > > Thanks! > > On Tuesday 21 December 2010 14:37:36 Tuukka Verho wrote: > > * indexes PDF metadata (https://bugs.kde.org/show_bug.cgi?id=234069). > > Note: the Author field is not indexed because I don't know what > > ontology to use. It rarely occurs in files, though. > > Some PDFs also embed XMP metadata (unless I'm confusing something, but as far > as I remember, there's also a PDF-specific simpler metadata block). Not too > big a deal for me to fix. > > > * removes hyphenation > > One more nasty issue that I encounter often is that in some files words are > split in parts, that is, they are output using 2 or more text output > commands, which is really bad. Of course, PDF allows you to put the whole > page char by char in random order because PDF is essentially more like a > vector graphics format than a document format. But this particular case is > really nasty and does happen. I assume you didn't encounter or handle it > but I wonder if you have any idea how hard it will be to fix this also. > > > Performance: on my machine (Intel(R) Core(TM)2 Quad CPU Q9400 @ 2.66GHz) > > it parses my collection of 340 scientific papers in 35 seconds. Also, > > it parses the PDF reference (30MB, 1300 pages, in plain text 2.5MB) in > > 13 seconds. > > > > The text command cache can of course grow quite large, because often the > > file needs to be parsed entirely before even the very first string can > > be > > converted to UTF. With normal documents this is no problem, but in case > > of large files I added a hard-coded limit of 2MB after which it falls > > back to the ASCII mode for the strings it cannot convert and empties > > the cache. 
The memory taken by dictionaries can approach 1 megabyte for > > some very large files, but I haven't seen it exceed that. > > I guess we can be even more flexible than this by using analyzer > configuration to specify limits, but then again you don't really have to > worry about this. It's just a little bit of polish that should be applied > to several other analyzers also. |
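The hyphenation removal mentioned in the thread above amounts to joining a word that was split across a line break in the extracted text. A minimal sketch of that idea (hypothetical helper, not the submitted analyzer; a real PDF analyzer would need care not to join genuine hyphenated compounds broken at a line end):

```cpp
#include <cstddef>
#include <string>

// Join words hyphenated across line breaks: "analy-\nzer" -> "analyzer".
// Deliberately naive: every hyphen that ends a line is dropped, so genuine
// hyphenated compounds wrapped at a line end get joined too.
// (Hypothetical helper for illustration.)
std::string dehyphenate(const std::string& text) {
    std::string out;
    out.reserve(text.size());
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (text[i] == '-' && i + 1 < text.size() && text[i + 1] == '\n') {
            ++i; // skip the hyphen and the newline, joining the two halves
        } else {
            out += text[i];
        }
    }
    return out;
}
```

The harder case Evgeny raises, single words emitted as two or more separate text-showing commands, cannot be solved this way; it needs position information from the page layout.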
From: Erik W. <eri...@iq...> - 2011-01-11 08:06:34
|
On Sunday 09-01-2011 09:00:47 you wrote: > Personally I tend to check out separate subprojects and compile/install > them in this order: libstreams, libstreamanalyzer, strigiutils. There > are no tricks you need to know to do this afaik. Thanks for the feedback. Are you installing the stuff onto your development machine? Or are you using some kind of VM to test the new version? I'm worried that my development desktop will not work properly if I compile a test version on that machine. -- So long... Erik |
From: Evgeny E. <phr...@gm...> - 2011-01-09 11:09:05
|
On Thursday 30 December 2010 12:20:35 Erik Wasser wrote: > after the transfer from svn to the single git repositories I'm asking > myself how to compile the 5 different projects. > > Compiling the 'libstreamanalyzer' project results in the following error > message: > > In file included from > /home/ewasser/src/external/libstreamanalyzer/lib/analysisresult.cpp:22:0: > /home/ewasser/src/external/libstreamanalyzer/lib/config.h:153:33: fatal > error: strigi/strigiconfig.h: No such file or directory > compilation terminated. > > Of course I know that I can find the file in the 'libstreams' project > directory but I have to fiddle with some cmake options to add this include > directory. And doing that will probably just bring me to the next > error message. B-) > > So what's the best practice to compile the stuff as "one" project? > Compiling out of the svn directory worked like a charm. Personally I tend to check out separate subprojects and compile/install them in this order: libstreams, libstreamanalyzer, strigiutils. There are no tricks you need to know to do this afaik. -- Evgeny |
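The install order Evgeny gives (libstreams first, then libstreamanalyzer, then strigiutils, each installed into a prefix the next one can find) can be scripted. This is an editor's sketch with placeholder paths, not an official build script; DRY_RUN=1 (the default here) only prints the cmake commands instead of running them:

```shell
#!/bin/sh
# Sketch: build and install the split Strigi repositories in dependency
# order. PREFIX and SRC are placeholders -- adjust to your checkouts.
PREFIX="${PREFIX:-$HOME/strigi-install}"
SRC="${SRC:-$PWD}"
DRY_RUN="${DRY_RUN:-1}"
ORDER=""
# run: echo the command in dry-run mode, otherwise execute it.
run() { if [ "$DRY_RUN" = "1" ]; then echo "$@"; else "$@" || exit 1; fi; }
for proj in libstreams libstreamanalyzer strigiutils; do
    ORDER="$ORDER $proj"
    run cmake -S "$SRC/$proj" -B "$SRC/$proj/build" \
        -DCMAKE_INSTALL_PREFIX="$PREFIX" -DCMAKE_PREFIX_PATH="$PREFIX"
    run cmake --build "$SRC/$proj/build" --target install
done
```

Passing the same prefix as both CMAKE_INSTALL_PREFIX and CMAKE_PREFIX_PATH is what lets libstreamanalyzer find the strigi/strigiconfig.h that libstreams installed.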
From: Evgeny E. <phr...@gm...> - 2011-01-08 18:22:49
|
On Friday 07 January 2011 15:16:26 Erik Wasser wrote: > On Monday 27-12-2010 23:41:12 Erik Wasser wrote: > > See the attachment. > > > > CHANGELOG: > > - bugfix: using the wrong buffer offset for the 'createdField' of the > > ID3v1 tags > > - added: removing useless spaces from ID3v1 tags > > Some kind of feedback on this patch would be very nice. B-) Sorry, I'm very busy atm. I don't ignore or lose emails in general though, especially if they contain patches :) -- Evgeny |