carrot2-developers Mailing List for Carrot2
Brought to you by: dawidweiss, stachoo
This list is closed, nobody may subscribe to it.
Messages per month:

| Year | Jan | Feb | Mar | Apr | May | Jun | Jul | Aug | Sep | Oct | Nov | Dec |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2003 | | | | | | | 4 | 1 | 6 | 8 | 11 | 5 |
| 2004 | 1 | 8 | 3 | 6 | 3 | 3 | 7 | 17 | 2 | 10 | 4 | |
| 2005 | 9 | 11 | 7 | 6 | 9 | 7 | | 12 | | | | 4 |
| 2006 | 20 | 16 | 4 | 2 | | 2 | 1 | 6 | 21 | 37 | 28 | 5 |
| 2007 | 7 | 5 | 31 | 109 | 31 | 84 | 70 | 81 | 42 | 53 | 31 | 26 |
| 2008 | 44 | 72 | 171 | 76 | 174 | 98 | 235 | 152 | 91 | 145 | 143 | 34 |
| 2009 | 89 | 148 | 189 | 81 | 154 | 40 | 148 | 98 | 83 | 186 | 67 | 131 |
| 2010 | 118 | 85 | 106 | 71 | 54 | 72 | 65 | 126 | 79 | 75 | 24 | 23 |
| 2011 | 37 | 37 | 163 | 167 | 34 | 87 | 128 | 69 | 93 | 28 | 58 | 55 |
| 2012 | 38 | 19 | 58 | 25 | 47 | 88 | 43 | 60 | 35 | 68 | 46 | 9 |
| 2013 | 34 | 24 | 44 | 53 | 44 | 51 | 87 | 27 | 26 | 55 | 13 | 14 |
| 2014 | 13 | 34 | 11 | 22 | 34 | 15 | 35 | 15 | 15 | 10 | 22 | 4 |
| 2015 | 15 | 47 | 36 | 2 | 23 | 1 | 11 | 19 | 5 | 37 | | |
| 2016 | | 12 | 25 | 3 | 18 | 11 | 2 | 42 | 7 | 5 | 28 | 3 |
| 2017 | | 8 | 4 | | 5 | | 24 | 2 | 6 | 4 | | |
| 2018 | 13 | 9 | 6 | 3 | 1 | | | | | | | |
| 2019 | 1 | | | | | | | | 1 | | | 1 |
| 2020 | | | | 2 | | | | | | | | |
From: Dawid W. <daw...@gm...> - 2019-09-20 11:40:49

Dear All,

This is a point release that:

1) replaces simple-xml with a "safe" version that prevents potential XXE attacks. If you're running the DCS with uncontrolled content, please upgrade immediately.
2) upgrades Jackson to its latest version (2.9.9).
3) marks the end of support for Java 1.7. Java 1.8 is required from now on for the Carrot2 3.x branch.

Release notes: http://project.carrot2.org/release-3.16.2-notes.html
Download: http://get.carrot2.org
JIRA issues: https://issues.carrot2.org/projects/CARROT/versions/14320

Dawid Weiss, Stanislaw Osinski
Carrot Search, in...@ca...
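
The release addressed the XXE risk by switching to a patched simple-xml build; the sketch below is not that fix. It is a minimal, general illustration of the equivalent hardening, assuming you parse untrusted XML (for example, documents posted to a DCS) with the JDK's built-in DOM parser. The class `SafeXml` is a hypothetical name. Refusing DOCTYPE declarations is the usual blunt defence, because external entities cannot be declared without one.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical helper (not the Carrot2 fix): JDK DOM parsing that rejects DOCTYPEs and thus XXE payloads. */
final class SafeXml {

    static DocumentBuilder newSafeBuilder() throws Exception {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        // Refuse any <!DOCTYPE ...>; without one, no external entity can be declared.
        f.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        f.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        f.setXIncludeAware(false);
        f.setExpandEntityReferences(false);
        DocumentBuilder b = f.newDocumentBuilder();
        // DefaultHandler rethrows fatal errors instead of printing them to stderr.
        b.setErrorHandler(new DefaultHandler());
        return b;
    }

    /** True if the input parses under the hardened settings; false if parsing fails for any reason. */
    static boolean accepts(String xml) {
        try {
            newSafeBuilder().parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String benign = "<searchresult><query>salsa</query></searchresult>";
        String xxe = "<?xml version=\"1.0\"?><!DOCTYPE r [<!ENTITY x SYSTEM \"file:///etc/passwd\">]><r>&x;</r>";
        System.out.println("benign accepted: " + accepts(benign)); // true
        System.out.println("XXE accepted: " + accepts(xxe));       // false
    }
}
```

Rejecting DOCTYPEs outright is stricter than only disabling external entities, but it is simpler to get right, and documents such as search results rarely need a DTD.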
From: Dawid W. <daw...@gm...> - 2019-01-15 10:40:34

Dear All,

It's been quite a while, but we're finally getting back to cleaning up some old cruft in Carrot2 that has become obsolete over the years. To this end, we're releasing 3.16.1, which is very likely the last release from the "old" line of development. It contains the following minor improvements:

- updates the FoamTree and Circles visualizations,
- moves the Bing API from V5 to V7.

Release notes: http://project.carrot2.org/release-3.16.1-notes.html
Download: http://get.carrot2.org
JIRA issues: https://goo.gl/tgfbs2

Version 4.0.0 should clean up the API and bring external applications up to date with modern technology.

Stay tuned,
Dawid Weiss, Stanislaw Osinski
Carrot Search, in...@ca...
From: Dawid W. <daw...@gm...> - 2018-05-23 10:52:41

Dear All,

We're pleased to announce Carrot2 3.16.0. This release:

- addresses several incompatibilities with newer Java versions, both at build level and at runtime,
- updates third-party dependencies (to their latest Java 1.7-compatible versions): Jackson XML, Lucene (5.5.5), Velocity (2.0),
- brings Workbench bug fixes and workarounds (newer Ubuntu releases, odd characters in the installation path, Java 1.9 compatibility),
- switches the ETools document source's feed URL to HTTPS,
- adds a .NET document constructor that accepts a custom document ID.

Release notes: http://project.carrot2.org/release-3.16.0-notes.html
Download: http://get.carrot2.org
JIRA issues: https://goo.gl/JHvQx9

Thanks!
Dawid Weiss, Stanislaw Osinski
Carrot Search, in...@ca...
From: Lonnie C. <lo...@ou...> - 2018-04-12 16:41:38

Hello All,

I hope your day is going well. Lately I have been looking for a good open source metasearch engine for a project. Having only a little familiarity with Carrot2, I had mentally filed it under clustering applications, but I recently came across it on the Internet as a possibly viable metasearch engine, and I would like to ask for more information on this if someone is able to help.

What I have in mind is a small metasearch engine to which I can add more external engines to be searched, with the results returned by Carrot2. I also think that Carrot2 can be set up as an independent server, so that it could be sent a query and then return JSON results. Is this also true?

Thanks in advance.
Lonnie
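
Carrot2 clusters whatever documents it is handed; per Dawid's 2018-03-11 reply, the public demo takes its results from a single source. If you aggregate several engines yourself, something has to merge their result lists before clustering. The following is a hypothetical sketch of one simple policy (round-robin interleaving by rank, de-duplicated by a crudely normalized URL). `ResultMerger` and its methods are invented for illustration, and real result records would carry titles and snippets as well as URLs.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical metasearch merge step: round-robin interleaving of per-engine rankings with URL de-duplication. */
final class ResultMerger {

    /**
     * Interleaves one ranked URL list per engine rank by rank (engine 1's #1,
     * engine 2's #1, ..., engine 1's #2, ...), skipping URLs whose normalized
     * form was already taken, until {@code limit} results are collected.
     */
    static List<String> interleaveUnique(List<List<String>> rankedUrlsPerEngine, int limit) {
        List<String> merged = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        int deepest = rankedUrlsPerEngine.stream().mapToInt(List::size).max().orElse(0);
        for (int rank = 0; rank < deepest && merged.size() < limit; rank++) {
            for (List<String> engine : rankedUrlsPerEngine) {
                if (merged.size() >= limit) break;
                if (rank < engine.size() && seen.add(normalize(engine.get(rank)))) {
                    merged.add(engine.get(rank));
                }
            }
        }
        return merged;
    }

    /** Crude key: ignores case, the http(s) scheme, a leading "www." and a trailing slash. */
    static String normalize(String url) {
        String u = url.toLowerCase(Locale.ROOT)
                .replaceFirst("^https?://", "")
                .replaceFirst("^www\\.", "");
        return u.endsWith("/") ? u.substring(0, u.length() - 1) : u;
    }

    public static void main(String[] args) {
        List<String> engineA = List.of("http://x.com/1", "http://x.com/2", "http://x.com/3");
        List<String> engineB = List.of("http://y.com/1", "https://www.x.com/1/");
        System.out.println(interleaveUnique(List.of(engineA, engineB), 10));
        // [http://x.com/1, http://y.com/1, http://x.com/2, http://x.com/3]
    }
}
```

Round-robin interleaving gives every engine equal weight at each rank; score-based fusion would need comparable scores, which different engines rarely provide.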
From: Dawid W. <daw...@gm...> - 2018-04-04 19:28:06

Please see "IP banned", here: http://project.carrot2.org/faq.html#ipbanned

Dawid
From: Bogdan A. <av....@ya...> - 2018-04-04 17:09:27

Hello,

When I try to access the Carrot2 online version, I receive the following message on the screen:

org.carrot2.source.etools.IpBannedException: org.apache.http.client.HttpResponseException: Forbidden

How can I solve this problem?

Thanks,
Bogdan
From: Dawid W. <daw...@gm...> - 2018-03-11 17:50:56

Hi Lonnie,

We use a single-source meta-search engine via a friendly agreement with Comcepta AG; see this FAQ entry, which links to Comcepta's web page if you'd like to use their services:

http://project.carrot2.org/faq.html#ipbanned

Dawid
From: Lonnie C. <lo...@ou...> - 2018-03-11 16:54:23

Hello Dawid,

I hope you are doing well today. I have a question about Carrot2 that I hope you can answer for me.

My search engine project is moving towards a Java basis, with the frontend using the Angular framework. I already wanted to use Carrot2 for clustering once the system is operational, but while working on some things, a few questions about Carrot2 came up.

I had looked at your demo (https://project.carrot2.org/index.html) and was playing around with it, and I started to wonder: does this Carrot2 demo collect results from different search engines, like a metasearch engine, and then cluster them, or do all of the results come from a single source? I am looking for the best solution for a metasearch server that can be called via an API with a query and return results from many different search engines, but I did not remember how your demo obtains its results. I think you may have mentioned getting them from a single source, but I'm not completely sure.

Cheers and have a great day,
Lonnie
From: Dawid W. <daw...@gm...> - 2018-03-11 16:34:15

> I am going to send this essay to official email of the Carrot2 company and it's correctness is important to me.

Please, don't. We are the same people in the open source project and behind the commercial company, so don't duplicate.

> Hitherto, it works fine But here after When i Use Carrot2 workbench to search on Lucene indexed files, it shows all Persian stop words so that i think Lucene didn't remove stop words!

Lucene did ignore stop words during indexing, but the content of your text fields is stored in full. Carrot2 does not use Lucene indexes; it takes those stored fields and tokenizes them again (using its own pipeline). That's why you see stop words on the Carrot2 side.

> I read an article in your online docs about adding extra stop words into stop words list, so I Added those of mine into stopwords.en under carrot2-workbench-3.15.1\workspace directory, and ran Carrot2 workbench again but it shows stop words in its results!

I don't know why this is the case, but we don't really support Persian [1]. It could be that the English analyzer is not splitting things correctly.

> However i added stop words both in Lucene Indexer and Carrot2 stopwords.en, it still shows stop words.
> How do i remove stop words from search results in this case?

Please take a look at the Java API examples in Carrot2 (not the Workbench) and feed the text of your input documents to Carrot2 directly. You'd need to hack the project to use the Persian analyzer/tokenizer from Lucene instead of the Carrot2 default one. All this requires programming skills (and some patience); we won't be able to guide you step by step, unfortunately.

Dawid

[1] https://github.com/carrot2/carrot2/blob/master/core/carrot2-core/src/org/carrot2/core/LanguageCode.java
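
Dawid's point is that Lucene's analyzer only shapes the inverted index; the stored field text, which is what Carrot2 re-tokenizes, still contains every stop word. One blunt workaround, assuming you feed documents to Carrot2 through the Java API yourself, is to strip stop words from the stored text before handing it over. This is not the approach Dawid suggests (plugging Lucene's Persian analyzer into Carrot2's pipeline). `StopWordPrefilter` is a hypothetical helper, and its regex split is a rough stand-in for a real Persian tokenizer.

```java
import java.util.Locale;
import java.util.Set;
import java.util.StringJoiner;
import java.util.regex.Pattern;

/**
 * Hypothetical workaround: remove stop words from a document's stored text
 * before it is given to Carrot2, since Carrot2 re-tokenizes stored fields
 * and never sees what Lucene's analyzer dropped at index time.
 */
final class StopWordPrefilter {
    // Separators: anything that is not a letter, combining mark, decimal digit
    // or ZERO WIDTH NON-JOINER (U+200C, which occurs inside Persian words).
    private static final Pattern SEPARATORS = Pattern.compile("[^\\p{L}\\p{M}\\p{Nd}\\u200C]+");

    /** Returns the kept tokens joined by single spaces; punctuation is dropped. */
    static String strip(String text, Set<String> stopWords) {
        StringJoiner kept = new StringJoiner(" ");
        for (String token : SEPARATORS.split(text)) {
            if (!token.isEmpty() && !stopWords.contains(token.toLowerCase(Locale.ROOT))) {
                kept.add(token);
            }
        }
        return kept.toString();
    }

    public static void main(String[] args) {
        Set<String> persianStopWords = Set.of("از", "این", "است");
        System.out.println(strip("این کتاب از کتابخانه است.", persianStopWords));
        // prints: کتاب کتابخانه
    }
}
```

Because words on either side of a removed stop word become adjacent, algorithms that build labels from phrases, such as Lingo, may then produce labels that read unnaturally; the analyzer route avoids that.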
From: Mo. P. <mo....@gm...> - 2018-03-10 11:28:50

I am going to send this essay to the official email address of the Carrot2 company, and its correctness is important to me. I would highly appreciate your comments on it.

Hi there,

I want to search some files indexed by Lucene using the Carrot2 Workbench. In the Carrot2 Workbench, I set Lucene as the source and Lingo as the clustering algorithm on the indexed files. StandardAnalyzer is used in Lucene with some added Persian stop words. This is some of my code in the Lucene indexer:

    Set<String> set = new HashSet<String>(Arrays.asList(""));
    CharArraySet stopSet = CharArraySet.copy(set);
    System.out.println(stopSet.size());
    File file = new File("stopwords.fa"); //Persian Stop Words List
    List<String> lines = new ArrayList<>();
    BufferedReader br = new BufferedReader(new FileReader(file));
    for (String line; (line = br.readLine()) != null;) {
        stopSet.add(line.trim());
    }
    analyzer = new StandardAnalyzer(stopSet);
    System.out.println(" # of stop words in Standard Analyzer : " + analyzer.getStopwordSet().size());
    config = new IndexWriterConfig(analyzer);
    index = FSDirectory.open(Paths.get("index-dir"));
    writoo = new IndexWriter(index, config);
    writoo.deleteAll();

I tested it and I'm sure it works as expected, because when I search for a Persian stop word in the Lucene results, it returns no hits. This is some more of my code, used to search in Lucene:

    Indexer indexer1 = new Indexer();
    indexer1.indexer();
    ...
    indexer1.query("از");
    indexer1.searcher();
    indexer1.display(ag);

Output:

    run:
    # of Messages: 17913
    # of stop words in Standard Analyzer : 320
    Found 0 hits.
    BUILD SUCCESSFUL (total time: 2 seconds)

So far, it works fine. But when I use the Carrot2 Workbench to search the Lucene-indexed files, it shows all the Persian stop words, so I think Lucene didn't remove the stop words!

I read an article in your online docs about adding extra stop words to the stop word list, so I added mine to stopwords.en under the carrot2-workbench-3.15.1\workspace directory and ran the Carrot2 Workbench again, but it still shows stop words in its results! Even though I added the stop words both in the Lucene indexer and in Carrot2's stopwords.en, they still appear. How do I remove stop words from the search results in this case?

Thanks
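
Separately from the Carrot2 question, the stop word loader in this message has a few rough edges: `new FileReader(file)` decodes with the platform default charset (UTF-8 by default only from Java 18 onwards), which can silently garble a Persian word list on other setups; the set is seeded with an empty string via `Arrays.asList("")`; the `BufferedReader` is never closed; and `lines` is unused. A minimal sketch of a stricter loader, assuming the file is UTF-8 with one word per line (treating `#` lines as comments is also an assumption); `StopWordFile` is a hypothetical name.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical, stricter replacement for the FileReader-based stop word loop. */
final class StopWordFile {

    /** Reads the whole file explicitly as UTF-8 (Files closes the file for us). */
    static Set<String> load(Path file) throws IOException {
        return parse(Files.readAllLines(file, StandardCharsets.UTF_8));
    }

    /** Trims each line, drops a stray BOM, blanks and '#' comments, so no empty "stop word" sneaks in. */
    static Set<String> parse(List<String> lines) {
        Set<String> words = new LinkedHashSet<>();
        for (String line : lines) {
            String w = line.replace("\uFEFF", "").strip();
            if (!w.isEmpty() && !w.startsWith("#")) {
                words.add(w);
            }
        }
        return words;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("stopwords", ".fa");
        Files.write(tmp, List.of("از", "", "# comment", "  و  ", "از"), StandardCharsets.UTF_8);
        System.out.println(load(tmp).size()); // 2
        Files.delete(tmp);
    }
}
```

The resulting `Set<String>` could then be turned into Lucene's stop set, for example with `new CharArraySet(words, false)` passed to `new StandardAnalyzer(...)`; verify that constructor against the Lucene version you use. This only affects indexing; as Dawid's reply explains, it does not change what Carrot2 sees.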
From: Mo. P. <mo....@gm...> - 2018-03-07 10:34:47

Hi there,

I am using Lucene as the source and Lingo as the clustering algorithm on indexed files. StandardAnalyzer is used in Lucene with some extra Persian stop words. This is some of my code in the Lucene indexer:

    Set<String> set = new HashSet<String>(Arrays.asList(""));
    CharArraySet stopSet = CharArraySet.copy(set);
    System.out.println(stopSet.size());
    File file = new File("stopwords.fa"); //Persian Stop Words List
    List<String> lines = new ArrayList<>();
    BufferedReader br = new BufferedReader(new FileReader(file));
    for (String line; (line = br.readLine()) != null;) {
        stopSet.add(line.trim());
    }
    analyzer = new StandardAnalyzer(stopSet);
    System.out.println(" # of stop words in Standard Analyzer : " + analyzer.getStopwordSet().size());
    config = new IndexWriterConfig(analyzer);
    index = FSDirectory.open(Paths.get("index-dir"));
    writoo = new IndexWriter(index, config);
    writoo.deleteAll();

I think it works as expected, because when I search for a Persian stop word in the Lucene index file, it returns no hits. This is some more of my code, used to search in Lucene:

    Indexer indexer1 = new Indexer();
    indexer1.indexer();
    ...
    indexer1.query("از");
    indexer1.searcher();
    indexer1.display(ag);

Output:

    run:
    # of Messages: 17913
    # of stop words in Standard Analyzer : 320
    Found 0 hits.
    BUILD SUCCESSFUL (total time: 2 seconds)

So far it works fine. But when I use Carrot2 to search the Lucene-indexed files, it shows all the Persian stop words, so I think Lucene didn't remove the stop words! I added those stop words to *stopwords.en* under the *carrot2-workbench-3.15.1\workspace* directory and ran the Carrot2 Workbench again, but it still showed stop words in its results. How do I remove stop words from the search results in this case?

Thanks
From: Mo. P. <mo....@gm...> - 2018-03-07 09:00:42

Hi there,

I asked my question on StackOverflow.com: https://stackoverflow.com/questions/49095981/carrot2-doesnt-show-clusters-all-containing-specific-word-on-search

I hope to have your answer soon.

Thanks
From: Stanislaw O. <st...@gm...> - 2018-02-27 10:12:42

Hi Aziz,

You need to force the Workbench to reload lexical resources; see here: http://doc.carrot2.org/#section.lexical-resources.in-workbench

Stanislaw
From: Aziz M. <a....@hs...> - 2018-02-21 16:23:28

That checkbox seems to have been the key to all my issues - thanks!
From: Dawid W. <daw...@gm...> - 2018-02-19 23:23:05

The result is served from the cache. You need to select the "reload resources" checkbox in the user interface or restart the Workbench for changes to be picked up.

Dawid
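
The symptom Aziz saw, identical output after replacing stopwords.en, is what any load-once cache produces. The toy model below is not Carrot2's implementation: `LexicalResourceCache` and its `reloadResources` flag are hypothetical names that merely mirror the Workbench checkbox Dawid mentions, and an in-memory map stands in for the files on disk.

```java
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/**
 * Toy model (not Carrot2 code) of the behaviour described above: a resource
 * is loaded once and then served from a cache, so later edits to the file
 * are only seen after an explicit reload or a restart.
 */
final class LexicalResourceCache {
    private final Map<Path, Set<String>> cache = new ConcurrentHashMap<>();
    private final Function<Path, Set<String>> loader;

    LexicalResourceCache(Function<Path, Set<String>> loader) {
        this.loader = loader;
    }

    /** Returns the cached words; with reloadResources=true the entry is dropped and loaded again. */
    Set<String> get(Path file, boolean reloadResources) {
        if (reloadResources) {
            cache.remove(file);
        }
        return cache.computeIfAbsent(file, loader);
    }

    public static void main(String[] args) {
        // In-memory stand-in for the workspace directory on disk.
        Map<Path, Set<String>> disk = new HashMap<>();
        Path stopwords = Path.of("workspace", "stopwords.en");
        disk.put(stopwords, Set.of("the"));

        LexicalResourceCache cache = new LexicalResourceCache(disk::get);
        System.out.println(cache.get(stopwords, false)); // [the]

        disk.put(stopwords, Set.of("about"));            // "edit" the file
        System.out.println(cache.get(stopwords, false)); // still [the]: served from the cache
        System.out.println(cache.get(stopwords, true));  // [about]: reloaded
    }
}
```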
From: Aziz M. <a....@hs...> - 2018-02-17 21:10:47

Hello all,

I've been using the Carrot2 Workbench lately and tried using the SMART stopwords list:

http://ftp.gnome.org/mirror/archive/ftp.sunet.se/pub/databases/full-text/smart/english.stop

I saved this into the /xxx/workspace/ directory as stopwords.en (keeping a backup of the original stopwords.en file for safety). I then re-ran my XML file in the Workbench, expecting it to produce different results now that a new stopwords list was being used.

To my surprise, the new Workbench output is identical to the output one gets from using the default stopwords.en file!

Either my sample data is "stopword-proof" (i.e. no amount of stopwords can change the outcome because of how unique it is, if that's even possible?) or I must be doing something wrong in the Workbench.

Is using a custom stopwords list really as simple as overwriting the original stopwords.en file? Or do I need to change a setting in the GUI? Or is there a problem with the format of the list I've downloaded (given that the default stopwords.en file looks like it is regex-like in format)?

Any thoughts would be most appreciated.

Regards,
A
From: alain_desilets <ala...@nr...> - 2018-02-15 23:41:38

I have the same need, so +1 for that.
From: Lonnie C. <lo...@ou...> - 2018-02-02 10:12:03

Oops, I mistakenly wrote "David"; it should have been "Dawid".

Sorry,
Lonnie
From: Lonnie C. <lo...@ou...> - 2018-02-02 10:10:06

Hi David,

Thanks again; it looks like Carrot2 may be a great addition to the project.

As soon as we get a couple of other parts of the search engine in place for the container-based approach, I will look at setting up a DCS container for some testing.

I appreciate you taking the time to discuss it with me and will keep the community up to date on the progress.

Cheers and have a great weekend,
Lonnie
From: Dawid W. <daw...@gm...> - 2018-02-02 07:54:45

> My project is using Python, but it seems from what I have briefly read that
> Carrot2 can be setup to run as a type of external server such that it can be
> sent data and called from non-Java languages via REST.

You need the DCS (Document Clustering Server). It acts as an HTTP/REST service and you can query it from Python. No Python example ships with the DCS, but it should be trivial to write one (even based on the Ruby example that is in there).

> I am wondering if Carrot2 could be setup as a Docker container service that
> I could call from my Python code to cluster and return the data and

Probably. I don't have that much experience with Docker, but it's a command-line script and a Java service, so I'm sure it can be set up in a dockerized environment.

> If this were possible then scaling could be done just by adding
> more Carrot2 docker containers.

Again: very likely yes. The DCS is stateless, so you could run an HTTP load-balancing proxy in front of your DCS services and manage that.

> If Carrot2 is setup as an external service and called by non-Java code like
> my Python code, then does it support have multiple different instances of a
> non-Java application calling a REST service concurrently?

That's the point of a service exposed via HTTP/REST, unless I don't understand your question.

D.
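
Because the DCS is a stateless HTTP service, a client only needs to serialize documents, POST them, and (if several instances run) spread requests across them. The sketch below is in Java, but the same calls translate directly to Python. Everything DCS-specific in it is an assumption to check against your DCS version's documentation and bundled examples: the endpoint path (`/dcs/rest` on port 8080), the form parameters (`dcs.c2stream`, `dcs.algorithm` with the `lingo` id, `dcs.output.format`, `dcs.clusters.only`), the XML document layout, and that a URL-encoded POST is accepted (switch to multipart/form-data if it is not). The host names `dcs-1` and `dcs-2` are placeholders.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;

/** Sketch of a client for stateless DCS instances; endpoint and dcs.* parameter names are assumed. */
final class DcsClientSketch {
    static final class Doc {
        final String title, snippet, url;
        Doc(String title, String snippet, String url) {
            this.title = title; this.snippet = snippet; this.url = url;
        }
    }

    private final List<URI> endpoints;
    private final AtomicInteger next = new AtomicInteger();
    private final HttpClient http = HttpClient.newHttpClient();

    DcsClientSketch(List<URI> endpoints) {
        if (endpoints.isEmpty()) throw new IllegalArgumentException("no DCS endpoints");
        this.endpoints = List.copyOf(endpoints);
    }

    /** Round-robin over instances; thread-safe, so concurrent callers can share one client. */
    URI nextEndpoint() {
        return endpoints.get(Math.floorMod(next.getAndIncrement(), endpoints.size()));
    }

    static String xml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;").replace("\"", "&quot;");
    }

    /** Serializes documents into the assumed "c2stream" XML layout. */
    static String toC2Stream(String query, List<Doc> docs) {
        StringBuilder sb = new StringBuilder("<searchresult>\n  <query>").append(xml(query)).append("</query>\n");
        for (int i = 0; i < docs.size(); i++) {
            Doc d = docs.get(i);
            sb.append("  <document id=\"").append(i).append("\"><title>").append(xml(d.title))
              .append("</title><url>").append(xml(d.url))
              .append("</url><snippet>").append(xml(d.snippet)).append("</snippet></document>\n");
        }
        return sb.append("</searchresult>\n").toString();
    }

    static String formBody(Map<String, String> fields) {
        return fields.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8) + "="
                        + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    /** POSTs the documents to the next instance and returns the raw response body (JSON requested). */
    String cluster(String query, List<Doc> docs) throws IOException, InterruptedException {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("dcs.c2stream", toC2Stream(query, docs));
        fields.put("dcs.algorithm", "lingo");
        fields.put("dcs.output.format", "JSON");
        fields.put("dcs.clusters.only", "true");
        HttpRequest request = HttpRequest.newBuilder(nextEndpoint())
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(formBody(fields)))
                .build();
        HttpResponse<String> response = http.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("DCS returned HTTP " + response.statusCode());
        }
        return response.body();
    }

    public static void main(String[] args) {
        DcsClientSketch client = new DcsClientSketch(List.of(
                URI.create("http://dcs-1:8080/dcs/rest"), URI.create("http://dcs-2:8080/dcs/rest")));
        System.out.println(client.nextEndpoint()); // http://dcs-1:8080/dcs/rest
        System.out.println(client.nextEndpoint()); // http://dcs-2:8080/dcs/rest
        System.out.print(toC2Stream("salsa", List.of(new Doc("Salsa & mambo", "Cuban <roots>", "http://example.com/1"))));
        // client.cluster(...) would perform the actual HTTP call against a running DCS.
    }
}
```

Client-side round-robin is only a stand-in for the load-balancing proxy Dawid suggests; a proxy also handles health checks and lets clients keep a single URL. Since `nextEndpoint()` uses an `AtomicInteger` and a built `HttpClient` can send many requests, one client instance can be shared by concurrent callers.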
From: Lonnie C. <lo...@ou...> - 2018-02-01 17:30:33

Hi David,

Thanks for your information. I am exploring Carrot2 in more detail now to see if it could work for my project.

My project is using Python, but it seems from what I have briefly read that Carrot2 can be set up to run as a kind of external server, so that it can be sent data and called from non-Java languages via REST. I am wondering if Carrot2 could be set up as a Docker container service that I could call from my Python code to cluster the data and return it along with the clusters. If this were possible, then scaling could be done just by adding more Carrot2 Docker containers.

This brings me to an additional question. If Carrot2 is set up as an external service and called by non-Java code like my Python code, does it support multiple different instances of a non-Java application calling the REST service concurrently?

Thanks, and I really think that I will be able to use Carrot2 in my project.

Cheers,
Lonnie
From: Dawid W. <daw...@gm...> - 2018-01-30 08:17:20
|
The source code and binary-build links are on the same website and on GitHub:

http://project.carrot2.org/download.html
https://github.com/carrot2/carrot2/

Please note that the "search engine" frontend is merely a demo, backed by an external search results provider (Comcepta), whom you need to contact in order to get larger per-IP request limits (or buy API access from Bing Search).

Dawid
From: Lonnie C. <lo...@ou...> - 2018-01-30 00:32:42
|
Greetings All,

I am researching search engines and metasearch engines and have come across the Carrot2 clustering engine, which looks very promising for a project I am currently designing.

In particular, the Carrot2 home page (http://project.carrot2.org/index.html) links to a live demo (http://search.carrot2.org/stable/search?q=salsa) that I find extremely interesting.

I would very much like to locate the source code and binaries so that I can set up a local "live demo" to experiment with on my Ubuntu 16.04 system.

Being new to Carrot2, I am still trying to get a feel for how things are laid out.

Any information or suggestions would be greatly appreciated.

Thanks,
Lonnie
From: Dawid W. <daw...@gm...> - 2018-01-24 13:00:27
|
Definitely there:

https://repo1.maven.org/maven2/org/carrot2/elasticsearch-carrot2/5.6.2/
From: Désilets, A. <Ala...@nr...> - 2018-01-24 12:56:26
|
> These are Sonatype's pre-release repositories; once they're synced with Maven Central they're wiped, I think. The release is here:
> https://repo1.maven.org/maven2/org/carrot2/elasticsearch-carrot2/5.6.2/elasticsearch-carrot2-5.6.2.zip
> Dawid

Thanks Dawid. I was able to download it from that URL.

However, in https://repo1.maven.org/maven2/org/carrot2/elasticsearch-carrot2/ I don't see 5.6.2; the most recent version listed is 5.5.2.

For now, I'll just add 5.6.2 to my local .m2 repository.

Alain
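The release URL in this thread follows Maven's standard repository layout, which is also the layout of the local `~/.m2/repository` mentioned above: the groupId's dots become directories, followed by `artifactId/version/` and the file `artifactId-version.ext`. A small sketch of that mapping for the coordinates in this thread (groupId `org.carrot2`, artifactId `elasticsearch-carrot2`, version `5.6.2`, extension `zip`); the helper names are hypothetical.

```python
from pathlib import Path

# Maven Central's base URL for the Maven 2 repository layout.
CENTRAL = "https://repo1.maven.org/maven2"


def artifact_relpath(group_id, artifact_id, version, ext="jar"):
    """Relative path of an artifact inside a Maven 2-layout repository."""
    parts = group_id.split(".") + [artifact_id, version, f"{artifact_id}-{version}.{ext}"]
    return "/".join(parts)


def central_url(group_id, artifact_id, version, ext="jar"):
    """Download URL of the artifact on Maven Central."""
    return f"{CENTRAL}/{artifact_relpath(group_id, artifact_id, version, ext)}"


def local_repo_path(group_id, artifact_id, version, ext="jar", repo=None):
    """Where the same file sits in a local repository (default ~/.m2/repository)."""
    base = Path(repo) if repo else Path.home() / ".m2" / "repository"
    return base.joinpath(*artifact_relpath(group_id, artifact_id, version, ext).split("/"))
```

For example, `central_url("org.carrot2", "elasticsearch-carrot2", "5.6.2", ext="zip")` reproduces the release URL quoted above. Rather than copying the file by hand, the usual way to place it in the local repository is `mvn install:install-file -Dfile=elasticsearch-carrot2-5.6.2.zip -DgroupId=org.carrot2 -DartifactId=elasticsearch-carrot2 -Dversion=5.6.2 -Dpackaging=zip`, which writes it to the path `local_repo_path` computes.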