Hi

Some months ago we configured VuFind and Tika and they worked fine. Now we are resuming the project and have found that VuFind is not indexing the full text. We suspect that it does not know we want to index the full text.

Below is what we have done. Does anyone have an idea how we can identify the problem? It seems that marc_local.properties is not being read; does that make sense?

1. File --> -rw-rw-r-- 1 ubuntu ubuntu 3832 Dec  4 08:54 marc_local.properties

# Uncomment the following line to index full text from URLs found in your MARC

# records.  All sorts of document formats are supported (PDF, Word Doc, ASCII

# text, HTML, etc.) The first parameter is a fieldspec showing which fields to use

# for URL retrieval.  The second parameter is optional -- if included, only files

# matching the specified suffix will be indexed.  Note that this functionality

# depends on a full text tool being installed on your system.  See the wiki for

# details:

#       http://vufind.org/wiki/importing_records#indexing_full_text

fulltext = custom, getFulltext(856u, pdf)
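
To rule out a commented-out or misspelled line, a small check like the following could be run against the contents of the file. This is only a sketch: the function name `active_property` is made up for illustration, and it handles only simple `key = value` lines, not the full Java properties syntax (continuation lines, `:` separators).

```python
def active_property(text, key):
    """Return the value of the last uncommented `key = value` line, or None."""
    value = None
    for line in text.splitlines():
        stripped = line.strip()
        # Skip blank lines and properties-style comments.
        if not stripped or stripped.startswith(("#", "!")):
            continue
        name, sep, rest = stripped.partition("=")
        if sep and name.strip() == key:
            value = rest.strip()
    return value


# Sample text modelled on the lines quoted above (hypothetical excerpt).
sample = """\
# Uncomment the following line to index full text from URLs
#fulltext = custom, getFulltext(856u)
fulltext = custom, getFulltext(856u, pdf)
"""
print(active_property(sample, "fulltext"))
```

If this returns None for the real file, the setting is not active in the file that was checked; if it returns the expected value, the next thing to verify is whether the importer is actually loading that file.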


2. We have also configured the file fulltext.ini

-rw-r--r-- 1 www-data www-data 1325 Dec  4 09:16 fulltext.ini


[General]

parser = Tika


; Aperture is a Java tool for extracting full text from documents.  It is not

; included with VuFind by default, but it can be downloaded here:

;       http://aperture.sourceforge.net/

; VuFind's Aperture code was tested with version 1.5.0 of the package.

;[Aperture]

; Once you have installed Aperture, uncomment one of the following two lines

; and fill in the appropriate path to take advantage of it.

;webcrawler = "/usr/local/aperture/bin/webcrawler.sh"   ; Linux

;webcrawler = "c:\aperture\bin\webcrawler.bat"          ; Windows


; Tika is another Java tool for extracting fulltext from documents It is not

; included with VuFind by default, but it can be downloaded here:

;       http://tika.apache.org/download.html

; VuFind's Tika code was tested with version 1.2 of Tika.

[Tika]

; Download the jar file and fill in the appropriate path to use it.

path = "/home/ubuntu/tika-1.4/tika-app/target/tika-app-1.4.jar"
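
Independently of VuFind, the Tika jar itself can be checked. The sketch below only verifies that the jar exists at the configured path and prints a command that could be run by hand. It assumes the Tika app jar accepts a `--text` option followed by a file or URL; please confirm this with the jar's `--help` output for the installed version. The helper name `tika_text_command` is hypothetical.

```python
import os


def tika_text_command(jar_path, source):
    """Build a command asking the Tika app jar for the plain text of `source`."""
    # Assumption: the Tika app supports `--text`; check `--help` to confirm.
    return ["java", "-jar", jar_path, "--text", source]


# Path and URL taken from the configuration and MARC record in this message.
jar = "/home/ubuntu/tika-1.4/tika-app/target/tika-app-1.4.jar"
url = "http://library.eada.edu/proyectos/199837409.pdf"

print("jar exists:", os.path.isfile(jar))
print(" ".join(tika_text_command(jar, url)))
```

If the jar is missing, or the printed command produces no text when run manually, the problem is on the Tika side rather than in the import configuration.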




3. We imported the MARC records with 856 fields, but we cannot see any reference to full text indexing in the output. When we ran the test in 2012, a message appeared here reporting the full text indexing.


Dec 04, 09:29:59 /usr/lib/jvm/default-java/bin/java -Xms512m -Xmx512m -Duser.timezone=UTC -Dsolr.core.name=biblio  -jar /usr/local/vufind2/import/SolrMarc.jar /usr/local/vufind2/local/import/import.properties /home/ubuntu/proyectos.mrc

 INFO [main] (MarcImporter.java:851) - Starting SolrMarc indexing.

 INFO [main] (Utils.java:339) - Opening file: /usr/local/vufind2/local/import/import.properties

 INFO [main] (MarcImporter.java:784) -  Connecting to remote Solr server at URL http://localhost:8080/solr/biblio/update

 INFO [main] (MarcHandler.java:371) - Attempting to open data file: /home/ubuntu/proyectos.mrc

 INFO [main] (MarcImporter.java:318) - Added record 1 read from file: 21056

 INFO [main] (MarcImporter.java:318) - Added record 2 read from file: 21057

 INFO [main] (MarcImporter.java:617) -  Adding 2 of 2 documents to index

 INFO [main] (MarcImporter.java:618) -  Deleting 0 documents from index

 INFO [main] (MarcImporter.java:491) - Calling commit (with optimize set to false)

 INFO [main] (MarcImporter.java:503) - Done with the commit, closing Solr

 INFO [main] (MarcImporter.java:506) - Setting Solr closed flag

 INFO [main] (MarcImporter.java:627) - Finished indexing in 0:11.00

 INFO [main] (MarcImporter.java:636) - Indexed 2 at a rate of about 0.0 per sec

 INFO [main] (MarcImporter.java:637) - Deleted 0 records

 INFO [Thread-1] (MarcImporter.java:566) - Starting Shutdown hook

 INFO [Thread-1] (MarcImporter.java:585) - Finished Shutdown hook

ubuntu@pazpar2:/usr/local/vufind2



Here are the MARC records:

00247nam a22000977a 4500008004100000245001400041300001300055856005200068942001200120999001700132^^131204b        xxu||||| |||| 00| 0 eng d^^  ^_a199837409^^  ^_aproyecto^^  ^_uhttp://library.eada.edu/proyectos/199837409.pdf^^  ^_2ddc^_cBK^^  ^_c21056^_d21056^^^]00247nam a22000977a 4500008004100000245001400041300001300055856005200068942001200120999001700132^^131204b        xxu||||| |||| 00| 0 eng d^^  ^_a199838150^^  ^_aproyecto^^  ^_uhttp://library.eada.edu/proyectos/199838150.pdf^^  ^_2ddc^_cBK^^  ^_c21057^_d21057^^



Any ideas?




--

Hugo Agud - Orex Digital 

www.orex.es


Director

Passatge de la Llançadera, 3 · 08338 Premià de Dalt - Tel: 93 539 40 70   hagud@orex.es · http://www.orex.es/

 
