Thread: [Semanticscuttle-devel] google custom searches limited to main page of sites?
From: Chris L. <ch...@ch...> - 2011-02-11 05:34:06
I was wondering if the Google Custom Search (gsearch/) is intended to limit searches to the main page of the sites listed? I have SemanticScuttle installed at: http://clinki.es/gsearch/ and this appears to be the behavior... for example, search for 'David' and this should return many pages, but returns only one:
http://www.google.com/cse?cref=http://clinki.es/gsearch/context.php&q=david&sa=Search&siteurl=clinki.es/gsearch/

If this is intended behavior-- and I could understand that being the case-- is there either a) an option to change this or b) a way to use SemanticScuttle to drive a GCSE in this way?

Thanks!

c
--
Chris Lott
From: Christian W. <cw...@cw...> - 2011-02-11 07:43:14
Hi Chris,

> I was wondering if the Google Custom Search (gsearch/) is intended to
> limit searches to the main page of the sites listed? I have
> SemanticScuttle installed at: http://clinki.es/gsearch/ and this
> appears to be the behavior... for example, search for 'David' and this
> should return many pages, but returns only one:
> http://www.google.com/cse?cref=http://clinki.es/gsearch/context.php&q=david&sa=Search&siteurl=clinki.es/gsearch/
>
> If this is intended behavior-- and I could understand that being the
> case-- is there either a) an option to change this or b) a way to use
> SemanticScuttle to drive a GCSE in this way?

What do you mean by "main page"? All pages exported by http://clinki.es/api/export_gcs.php should be found by Google. This script is linked from http://clinki.es/gsearch/

--
Regards/Mit freundlichen Grüßen
Christian Weiske

-= Geeking around in the name of science since 1982 =-
From: Chris L. <ch...@ch...> - 2011-02-19 17:36:28
What I mean is that, as I understand it, a GCSE can be set up to search all of a site that is listed as one of the custom sites, not just the listed page. For instance, if I bookmark http://chrislott.org/, the custom search generated by SemanticScuttle currently returns results only for the front page of chrislott.org, not results from all chrislott.org pages. I assume this is a setting somewhere, since when I create a custom search engine myself and enter http://chrislott.org/ as one of the sites to be searched, it finds pages from the whole domain.

c
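The distinction described above, between searching only the bookmarked page and searching the whole site, can be pictured with a small sketch. It assumes a simplified matching rule in which a pattern ending in `*` covers every URL sharing that prefix, while a pattern without `*` covers only that exact URL. The `matches` helper and the two sub-page URLs are hypothetical; this is neither Google's matching code nor the output of SemanticScuttle's export script.

```python
def matches(pattern: str, url: str) -> bool:
    """Simplified model: a trailing '*' means prefix match, otherwise exact match."""
    if pattern.endswith("*"):
        return url.startswith(pattern[:-1])
    return url == pattern


# The first URL is the bookmark from the thread; the other two are
# hypothetical sub-pages used only for illustration.
urls = [
    "http://chrislott.org/",
    "http://chrislott.org/about/",
    "http://chrislott.org/2011/02/some-post/",
]

# Exact pattern: only the front page is covered.
exact = [u for u in urls if matches("http://chrislott.org/", u)]

# Wildcard pattern: every page under the site is covered.
whole_site = [u for u in urls if matches("http://chrislott.org/*", u)]

print(len(exact), "page(s) covered by the exact pattern")
print(len(whole_site), "page(s) covered by the wildcard pattern")
```

If the exported site list contains exact URLs rather than wildcard patterns, that would be one possible explanation for the single result, but the thread does not confirm this.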
From: Eric C <eco...@el...> - 2011-02-19 17:43:59
Google has limits on the number of URLs with variables it will add to its index...

http://google.com/folder1/folder2/folder3/folder4/file.html is much more likely to be indexed than http://google.com/index.php?var1=1&var2=2
From: Chris L. <ch...@ch...> - 2011-02-19 19:14:40
That's not the problem at all, though. For instance: in my Scuttle bookmarks I have bookmarked http://chrislott.org/ -- the Google Search set up by Scuttle looks like this:
http://www.google.com/cse?cref=http://clinki.es/gsearch/context.php&q=david&sa=Search&siteurl=clinki.es/gsearch/index.php

and returns one result.

I set up a CSE on my own with http://chrislott.org/ as a site:
http://www.google.com/cse/home?cx=005906874594202320794:pd1c3rhiyv8&hl=en

and perform the same search and get the expected results (many).

c
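The two URLs in the message above identify their search engine in different ways, which is easy to miss in the raw strings. The sketch below only splits the query strings using Python's standard library; the helper name `cse_params` is made up for illustration. Reading `cref` as "a specification file fetched from the SemanticScuttle install" and `cx` as "an engine stored at Google" is an interpretation of the parameter names, not something the thread states.

```python
from urllib.parse import parse_qs, urlsplit


def cse_params(url: str) -> dict:
    """Return the query parameters of a URL as a dict of single values."""
    query = urlsplit(url).query
    return {key: values[0] for key, values in parse_qs(query).items()}


# Search generated by SemanticScuttle (URL taken from the thread).
scuttle_url = (
    "http://www.google.com/cse?cref=http://clinki.es/gsearch/context.php"
    "&q=david&sa=Search&siteurl=clinki.es/gsearch/index.php"
)

# Search engine created by hand (URL taken from the thread).
manual_url = "http://www.google.com/cse/home?cx=005906874594202320794:pd1c3rhiyv8&hl=en"

scuttle = cse_params(scuttle_url)
manual = cse_params(manual_url)

print("SemanticScuttle search uses:", sorted(scuttle))
print("Hand-made search uses:", sorted(manual))
```

Since the first search depends on the file named in `cref`, anything wrong with http://clinki.es/gsearch/context.php would affect only that search and not the hand-made one.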
From: Christian W. <cw...@cw...> - 2011-05-14 08:28:52
Hello Chris,

> That's not the problem at all, though. For instance: in my Scuttle
> bookmarks I have bookmarked http://chrislott.org/ -- the Google Search
> set up by Scuttle looks like this:
> http://www.google.com/cse?cref=http://clinki.es/gsearch/context.php&q=david&sa=Search&siteurl=clinki.es/gsearch/index.php
>
> and returns one result.
>
> I set up a CSE on my own with http://chrislott.org/ as a site:
> http://www.google.com/cse/home?cx=005906874594202320794:pd1c3rhiyv8&hl=en
>
> and perform the same search and get the expected results (many)

I think I found the problem: the Google search engine description is not valid XML. Try visiting http://clinki.es/gsearch/context.php and you will see comment markers around the leading XML declaration.

Edit www/gsearch/context.php and change
    <!--?xml version="1.0" encoding="UTF-8" ?-->
to
    <?xml version="1.0" encoding="UTF-8" ?>

Also log in as admin and click the "refresh at google" link on the gsearch page.

--
Regards/Mit freundlichen Grüßen
Christian Weiske

-=≡ Geeking around in the name of science since 1982 ≡=-
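The edit described above can be sketched as a small text transformation: if the output starts with a commented-out XML declaration, put the real declaration back. This is only an illustration in Python and not part of SemanticScuttle. The function name `restore_xml_declaration` and the `<Example/>` root element are hypothetical, and the actual fix is still the manual edit to www/gsearch/context.php.

```python
import re
import xml.etree.ElementTree as ET

# Matches a declaration wrapped in comment markers at the very start of the text,
# e.g. <!--?xml version="1.0" encoding="UTF-8" ?-->
COMMENTED_DECL = re.compile(r'^\s*<!--\s*(\?xml\b.*?\?)\s*-->')


def restore_xml_declaration(text: str) -> str:
    """Turn a leading '<!--?xml ... ?-->' back into '<?xml ... ?>'.

    Text that does not start with a commented-out declaration is returned unchanged.
    """
    return COMMENTED_DECL.sub(lambda m: "<" + m.group(1) + ">", text, count=1)


# Hypothetical stand-in for the output of context.php; only the first line
# is taken from the thread.
broken = '<!--?xml version="1.0" encoding="UTF-8" ?-->\n<Example/>'
fixed = restore_xml_declaration(broken)

# The repaired text parses, and the declaration is back on the first line.
root = ET.fromstring(fixed.encode("utf-8"))
print(fixed.splitlines()[0])
print(root.tag)
```

Running the function on text that already starts with a proper declaration leaves it untouched, so it is safe to apply more than once.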