From: Andrea M. <a.m...@ci...> - 2013-10-06 11:40:19
I suggest a few more exclusions: http://opac.braidense.it/robots.txt

---
Andrea Marchitelli
Information and Knowledge Management Services
CINECA - Rome office
Via dei Tizii, 6 - 00185 Roma, Italy
tel. +39 0644486625 - cell. +39 340 4027156 - http://www.cineca.it

2013/10/6 Eoghan Ó Carragáin <eog...@gm...>
> Hi Nathan,
> I don't see any problems with that. I have a few additional options in
> ours:
>
> User-agent: *
> Disallow: /Search/Results
> Disallow: /Author/Home
> Disallow: /Record/*/Details
> Disallow: /Record/*/Export
> Disallow: /Record/*/Cite
> Disallow: /Record/*/Email
> Disallow: /Record/*/Holdings
> Sitemap: http://catalogue.nli.ie/sitemapIndex.xml
>
> ... and I will be adding /AJAX now too. If you're experiencing load issues,
> the most important one for you to add would probably be /Author/Home, as
> Luke recently pointed out.
>
> Eoghan
>
> On 4 October 2013 17:19, Nathan Tallman <nta...@gm...> wrote:
>> Thanks, Filipe! Yes, we are on 1.3. I do have sitemap.php and sitemap.ini
>> set up, and they do an excellent job of getting records into search engines.
>> The issue I have is that odd pages are also indexed, such as page 4 of a
>> set of search results. It's this type of page that I'm trying to prevent,
>> as it leads to user confusion.
>>
>> Thanks, Demian, for the AJAX call line. I should have realized it was so
>> simple!
>>
>> Here's what I'm thinking about putting into robots.txt (catalog is our
>> VuFind root):
>>
>> Disallow: /catalog/Search/
>> Disallow: /catalog/Record/*/UserComments
>> Disallow: /catalog/Record/*/Details
>> Disallow: /catalog/AJAX
>> Disallow: /catalog/AJAX/
>>
>> Does anyone see any problems that I'm not?
>>
>> Thanks,
>> Nathan
>>
>> On Fri, Oct 4, 2013 at 11:51 AM, Filipe MS Bento (UA) <fs...@ua...> wrote:
>>
>>> Hi Nathan, hello all!
>>>
>>> I truly hope I understood you correctly: in VuFind 1.x, which I believe is
>>> the version you have (right?), there is a utility (util/sitemap.php) that
>>> generates a series of sitemap XML files ready to be harvested by Google via
>>> Google Webmaster Tools: https://www.google.com/webmasters/tools/
>>>
>>> Just configure web/conf/sitemap.ini and you are ready to go (or rather,
>>> ready to run util/sitemap.php), then feed GWT with /sitemaps/sitemapIndex.xml
>>> <http://iia.web.ua.pt/sitemaps/sitemapIndex.xml> (generated with
>>> fileLocation = /var/www/html/VFSiteMaps/, which maps to /sitemaps/ in my
>>> test install).
>>>
>>> I have another one, http://iia.web.ua.pt/sitemap_interface.xml, to
>>> index the “static” pages of VuFind.
>>>
>>> I do not have a robots.txt: with the XMLs above well configured, Google
>>> will “behave”… ;) or at least it seems so.
>>>
>>> All the best / have a great weekend,
>>>
>>> Filipe
>>>
>>> Filipe MS Bento
>>> Computer Science Specialist, University of Aveiro, Portugal
>>> Chairman of USE.pt Management Board (Portuguese Ex Libris UG, hosted by
>>> The Portuguese Parliament, http://www.USEpt.org)
>>> fs...@ua...
>>> 351234370200
>>> fi...@gm...
>>> http://about.filipebento.pt
>>>
>>> From: Nathan Tallman [mailto:nta...@gm...]
>>> Sent: 4 October 2013 16:30
>>> To: vufind-tech
>>> Subject: [VuFind-Tech] Robots.txt
>>>
>>> Hi VuFinders,
>>>
>>> Just a quick survey: can people please respond with what they are
>>> disallowing via robots.txt? What's the sweet spot that allows Google (et
>>> al.) to crawl records without getting too many search results,
>>> description/comment/tag pages, etc. into their indexes?
>>>
>>> Thanks!
>>> Nathan
>>
>> _______________________________________________
>> Vufind-tech mailing list
>> Vuf...@li...
>> https://lists.sourceforge.net/lists/listinfo/vufind-tech
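Both Eoghan's and Nathan's lists rely on `*` wildcards inside paths (e.g. `/Record/*/Details`). Google honours these, and RFC 9309 now standardises `*` and a trailing `$`. Python's `urllib.robotparser`, however, only does plain prefix matching. To check which VuFind URLs a proposed list would actually block, the following is a minimal sketch under stated assumptions. The helper names `pattern_to_regex` and `is_disallowed` and the example.org URLs are hypothetical, invented for illustration. The sketch handles Disallow lines only and ignores Allow lines and per-user-agent groups.

```python
import re
from urllib.parse import urlsplit


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters and a trailing '$' anchors the end
    (RFC 9309 style); everything else is literal. Rules are prefix
    matches, so the regex is only anchored at the start via re.match.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal segment, then join the segments with '.*' per wildcard.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_disallowed(url: str, disallow_rules: list[str]) -> bool:
    """Return True if the URL's path (plus query) matches any Disallow rule.

    Assumption: Allow rules and user-agent groups are ignored, and an
    empty Disallow value (meaning "allow everything") is skipped.
    """
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    return any(pattern_to_regex(rule).match(path) for rule in disallow_rules if rule)


# Nathan's proposed rules, with "catalog" as the VuFind root.
RULES = [
    "/catalog/Search/",
    "/catalog/Record/*/UserComments",
    "/catalog/Record/*/Details",
    "/catalog/AJAX",
]

if __name__ == "__main__":
    for url in [
        "https://example.org/catalog/Search/Results?lookfor=maps&page=4",
        "https://example.org/catalog/Record/12345",
        "https://example.org/catalog/Record/12345/Details",
        "https://example.org/catalog/AJAX/JSON?method=getItemStatuses",
    ]:
        print(is_disallowed(url, RULES), url)
```

In this sketch, search-result pages, record Details tabs and AJAX calls are blocked, while the bare record page stays crawlable. Because matching is by prefix, `Disallow: /catalog/AJAX` already covers everything under `/catalog/AJAX/`. The extra `/catalog/AJAX/` line in the proposal should therefore be harmless but redundant.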