Hi Hugo, good morning.

I guess your VuFind server has permission to access the PDF files linked in the 856 fields, right?

I ask because I'm getting a 403 when trying to access them.
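One way to check this from the indexing server itself is a small script that reports the HTTP status for each linked file. This is only a sketch: the function name `http_status` and the `opener` parameter are made up for illustration, and it sends a HEAD request, which some servers answer differently from the GET that a full text extractor would issue.

```python
import urllib.error
import urllib.request


def http_status(url, opener=urllib.request.urlopen, timeout=10):
    """Return the HTTP status code the server sends for a HEAD request to url."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx responses; the status code is still
        # exactly the piece of information we are after.
        return err.code
```

Run as the same user that runs the import, `http_status("http://library.eada.edu/proyectos/199837409.pdf")` returning 403 would mean the server refuses the indexing host as well, so there would be nothing for the full text tool to read.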

 

You might as well try removing the “pdf” filter:

   fulltext = custom, getFulltext(856u)

and see if it catches anything…

 

All the best (saludos),

 

Filipe

Filipe MS Bento
Computer Science Specialist, University of Aveiro, Portugal
Chairman of USE.pt Management Board (Portuguese Ex Libris UG, hosted by The Portuguese Parliament, http://www.USEpt.org)
fsb@ua.pt | 351234370200 | http://about.filipebento.pt


From: Hugo Agud [mailto:hagud@orex.es]
Sent: 4 December 2013 09:35
To: vufind-tech
Subject: [VuFind-Tech] problemas with full text indexing

 

Hi

 

Some months ago we configured VuFind and Tika and everything worked fine. Now we are resuming the project and we have found that VuFind is not indexing the full text; we suspect that it does not know that we want to index the full text.

Here is what we have done. Any idea how we can identify the problem? It seems it is not reading marc_local.properties; does that make sense?

 

1. File --> -rw-rw-r-- 1 ubuntu ubuntu 3832 Dec  4 08:54 marc_local.properties

 

# Uncomment the following line to index full text from URLs found in your MARC
# records.  All sorts of document formats are supported (PDF, Word Doc, ASCII
# text, HTML, etc.) The first parameter is a fieldspec showing which fields to use
# for URL retrieval.  The second parameter is optional -- if included, only files
# matching the specified suffix will be indexed.  Note that this functionality
# depends on a full text tool being installed on your system.  See the wiki for
# details:
#       http://vufind.org/wiki/importing_records#indexing_full_text
fulltext = custom, getFulltext(856u, pdf)
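To rule out the possibility that the importer is reading a different copy of marc_local.properties, or that the `fulltext` line is still commented out in the copy it does read, the active settings of each candidate file can be listed. The sketch below is a simplified reader written for this thread (the name `active_properties` is hypothetical); it ignores Java properties features such as `:` separators and line continuations.

```python
def active_properties(text):
    """Return {key: value} for the uncommented `key = value` lines in text."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and the two comment styles of .properties files.
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props
```

Calling `active_properties(open(path).read()).get("fulltext")` on every marc_local.properties found under the VuFind installation shows which copies actually enable `getFulltext`. Which copy SolrMarc picks up depends on the local setup, so the path here is an assumption to check against your own installation.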

 

2. We have also configured the file fulltext.ini

-rw-r--r-- 1 www-data www-data 1325 Dec  4 09:16 fulltext.ini

 

[General]
parser = Tika

; Aperture is a Java tool for extracting full text from documents.  It is not
; included with VuFind by default, but it can be downloaded here:
;       http://aperture.sourceforge.net/
; VuFind's Aperture code was tested with version 1.5.0 of the package.
;[Aperture]
; Once you have installed Aperture, uncomment one of the following two lines
; and fill in the appropriate path to take advantage of it.
;webcrawler = "/usr/local/aperture/bin/webcrawler.sh"   ; Linux
;webcrawler = "c:\aperture\bin\webcrawler.bat"          ; Windows

; Tika is another Java tool for extracting fulltext from documents It is not
; included with VuFind by default, but it can be downloaded here:
;       http://tika.apache.org/download.html
; VuFind's Tika code was tested with version 1.2 of Tika.
[Tika]
; Download the jar file and fill in the appropriate path to use it.
path = "/home/ubuntu/tika-1.4/tika-app/target/tika-app-1.4.jar"
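Since the jar lives under /home/ubuntu while fulltext.ini belongs to www-data, it may be worth confirming that the configured jar exists and is readable by the user who runs the import. A minimal sketch, assuming the file parses as ordinary INI; `tika_jar_from_ini` and `jar_is_readable` are names invented for this example, not part of VuFind.

```python
import configparser
import os


def tika_jar_from_ini(ini_text):
    """Return (parser_name, jar_path) from fulltext.ini-style text."""
    config = configparser.ConfigParser(interpolation=None)
    config.read_string(ini_text)
    parser_name = config.get("General", "parser", fallback=None)
    jar_path = config.get("Tika", "path", fallback=None)
    if jar_path is not None:
        # configparser keeps the surrounding double quotes, so drop them.
        jar_path = jar_path.strip('"')
    return parser_name, jar_path


def jar_is_readable(jar_path):
    """True if the jar exists as a file and the current user may read it."""
    return os.path.isfile(jar_path) and os.access(jar_path, os.R_OK)
```

With the file shown above, `tika_jar_from_ini` should give `("Tika", "/home/ubuntu/tika-1.4/tika-app/target/tika-app-1.4.jar")`; if `jar_is_readable` then returns False for that path, the extraction step has no tool to call.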

 

 

 

3. We imported the MARC records with 856 fields, but we can't see any reference to full text indexing. When we ran the test in 2012, there was a message here reporting the full text indexing.

 

Dec 04, 09:29:59 /usr/lib/jvm/default-java/bin/java -Xms512m -Xmx512m -Duser.timezone=UTC -Dsolr.core.name=biblio  -jar /usr/local/vufind2/import/SolrMarc.jar /usr/local/vufind2/local/import/import.properties /home/ubuntu/proyectos.mrc
 INFO [main] (MarcImporter.java:851) - Starting SolrMarc indexing.
 INFO [main] (Utils.java:339) - Opening file: /usr/local/vufind2/local/import/import.properties
 INFO [main] (MarcImporter.java:784) -  Connecting to remote Solr server at URL http://localhost:8080/solr/biblio/update
 INFO [main] (MarcHandler.java:371) - Attempting to open data file: /home/ubuntu/proyectos.mrc
 INFO [main] (MarcImporter.java:318) - Added record 1 read from file: 21056
 INFO [main] (MarcImporter.java:318) - Added record 2 read from file: 21057
 INFO [main] (MarcImporter.java:617) -  Adding 2 of 2 documents to index
 INFO [main] (MarcImporter.java:618) -  Deleting 0 documents from index
 INFO [main] (MarcImporter.java:491) - Calling commit (with optimize set to false)
 INFO [main] (MarcImporter.java:503) - Done with the commit, closing Solr
 INFO [main] (MarcImporter.java:506) - Setting Solr closed flag
 INFO [main] (MarcImporter.java:627) - Finished indexing in 0:11.00
 INFO [main] (MarcImporter.java:636) - Indexed 2 at a rate of about 0.0 per sec
 INFO [main] (MarcImporter.java:637) - Deleted 0 records
 INFO [Thread-1] (MarcImporter.java:566) - Starting Shutdown hook
 INFO [Thread-1] (MarcImporter.java:585) - Finished Shutdown hook
ubuntu@pazpar2:/usr/local/vufind2$
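Because the log does not say either way whether text was extracted, one option is to ask Solr directly whether the two records carry any terms in the `fulltext` field. The sketch below only builds the query URL. It assumes the core seen in the log (http://localhost:8080/solr/biblio) answers on the standard `/select` handler and that `fulltext` is an indexed field, so treat both as things to verify. The helper names are hypothetical.

```python
from urllib.parse import urlencode


def fulltext_query(record_id=None):
    """Solr query matching documents that have at least one fulltext term."""
    query = "fulltext:[* TO *]"
    if record_id is not None:
        query = 'id:"{}" AND {}'.format(record_id, query)
    return query


def fulltext_count_url(record_id=None, core_url="http://localhost:8080/solr/biblio"):
    """Build a /select URL that returns only the match count (rows=0)."""
    params = {"q": fulltext_query(record_id), "rows": 0, "wt": "json"}
    return core_url + "/select?" + urlencode(params)
```

Opening `fulltext_count_url("21056")` in a browser on the server and reading `numFound` in the response would show whether that record was indexed with full text (1) or without it (0).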

 

 

Here is the MARC record:

00247nam a22000977a 4500008004100000245001400041300001300055856005200068942001200120999001700132^^131204b        xxu||||| |||| 00| 0 eng d^^  ^_a199837409^^  ^_aproyecto^^  ^_uhttp://library.eada.edu/proyectos/199837409.pdf^^  ^_2ddc^_cBK^^  ^_c21056^_d21056^^^]
00247nam a22000977a 4500008004100000245001400041300001300055856005200068942001200120999001700132^^131204b        xxu||||| |||| 00| 0 eng d^^  ^_a199838150^^  ^_aproyecto^^  ^_uhttp://library.eada.edu/proyectos/199838150.pdf^^  ^_2ddc^_cBK^^  ^_c21057^_d21057^^
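To double-check what the `856u` fieldspec would see, the `$u` values can be pulled out of a dump like the one above. This is a rough sketch for the caret notation used in this message only (`^_` as subfield delimiter, `^^` as field terminator), not a real MARC parser, and it collects every `$u` regardless of the field it sits in. The suffix test is a plain `endswith`, which is an assumption about how the `pdf` filter behaves.

```python
import re

# Subfield $u runs up to the next subfield delimiter or field terminator.
SUBFIELD_U = re.compile(r"\^_u(.*?)(?=\^_|\^\^|$)")


def urls_from_dump(dump):
    """Return every $u value found in a caret-notation MARC dump."""
    return [value.strip() for value in SUBFIELD_U.findall(dump)]


def has_suffix(url, suffix):
    """True if the URL ends with the given suffix (for example "pdf")."""
    return url.endswith(suffix)
```

Both URLs in the records end in `pdf`, so under that assumption the suffix filter alone would not explain the missing full text.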

 

 

Any ideas?

 

 

 

--

Hugo Agud - Orex Digital
www.orex.es
Director
Passatge de la Llançadera, 3 · 08338 Premià de Dalt - Tel: 93 539 40 70
hagud@orex.es · http://www.orex.es/

 
