Thread: [Carrot2-cvs] carrot2/components/inputs/adapter-nutch/src/com/dawidweiss/carrot/input/nutch NutchToC
From: <daw...@us...> - 2004-02-06 18:19:42
Update of /cvsroot/carrot2/carrot2/components/inputs/adapter-nutch/src/com/dawidweiss/carrot/input/nutch
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv4867/components/inputs/adapter-nutch/src/com/dawidweiss/carrot/input/nutch

Modified Files:
    NutchToCarrot2Servlet.java
Removed Files:
    XMLSerializerHelper.java
Log Message:
[new], component: global
Added a new ANT task to manage JAR interdependencies in the project.
Now you can easily collect all JARs that a component requires, plus only
these JARs are used at compile-time. Try using these targets on build
files of components:
'ant show.dependencies' (shows all required components and JAR files),
'ant collect.dependencies' (copies the required JARs to the
distribution.dir folder).

[refactoring], component: global
Build files have changed, so ANT1.6 is now a requirement.

Index: NutchToCarrot2Servlet.java
===================================================================
RCS file: /cvsroot/carrot2/carrot2/components/inputs/adapter-nutch/src/com/dawidweiss/carrot/input/nutch/NutchToCarrot2Servlet.java,v
retrieving revision 1.1
retrieving revision 1.2
diff -C2 -d -r1.1 -r1.2
*** NutchToCarrot2Servlet.java 4 Feb 2004 20:32:14 -0000 1.1
--- NutchToCarrot2Servlet.java 6 Feb 2004 18:16:26 -0000 1.2
***************
*** 41,44 ****
--- 41,46 ----
  import org.w3c.dom.NodeList;

+ import com.dawidweiss.carrot.util.XMLSerializerHelper;
+
  /**
   * Nutch search engine adapter that accepts queries in
***************
*** 75,84 ****
      private int defaultResultsNumber = DEFAULT_REQUESTED_RESULTS;

-     /**
-      * A private instance of the serializer (small performance
-      * gain over static method invocation).
-      */
-     private final XMLSerializerHelper xmlSerializer =
-         new XMLSerializerHelper();

      /**
--- 77,80 ----
***************
*** 219,222 ****
--- 215,222 ----
          }

+         // the serializer is not thread-safe, but could also be pooled, just
+         // just as the objects above.
+         XMLSerializerHelper xmlSerializer = XMLSerializerHelper.getInstance();
+
          // Perform a Nutch search with the acquired query.
          NutchBean nutchBean = NutchBean.get(
***************
*** 238,243 ****
              out.write(Integer.toString(requestedResults));
              out.write("\">");
!             out.write(
!                 xmlSerializer.escapeElementEntities(queryBuffer.toString()));
              out.write("</query>\n\n");

--- 238,242 ----
              out.write(Integer.toString(requestedResults));
              out.write("\">");
!             xmlSerializer.writeValidXmlText(out, queryBuffer.toString(), false);
              out.write("</query>\n\n");

***************
*** 245,249 ****
              Summarizer summarizer = new Summarizer();
              for (int i = 0; i < length; i++) {
-                 Hit hit = show[i];
                  HitDetails detail = details[i];
                  String title = detail.getValue("title");
--- 244,247 ----
***************
*** 261,265 ****
                  // emit the title.
                  out.write("<title>");
!                 out.write( xmlSerializer.escapeElementEntities(title));
                  out.write("</title>\n");

--- 259,263 ----
                  // emit the title.
                  out.write("<title>");
!                 xmlSerializer.writeValidXmlText(out, title, false);
                  out.write("</title>\n");

***************
*** 271,278 ****

                  // emit the summary (if exists)
!                 // extract summaries. We can't use the same method as
!                 // in Nutch's search.jsp -- getSummary(details, query); -- because
!                 // it returns encoded HTML entities and we want to emit them
!                 // as UTF-8
                  byte [] content = nutchBean.getContent(details[i]);
                  Summary summary = summarizer.getSummary(
--- 269,275 ----

                  // emit the summary (if exists)
!                 //
!                 // THIS IS CURRENTLY BROKEN (OUTPUTS RAW PAGE CONTENTS). No API
!                 // in Nutch to access the text of the page.
                  byte [] content = nutchBean.getContent(details[i]);
                  Summary summary = summarizer.getSummary(
***************
*** 283,287 ****
                  out.write("<snippet>");
                  for (int j=0;j<fragments.length;j++) {
!                     out.write( xmlSerializer.escapeElementEntities(fragments[j].getText()));
                  }
                  out.write("</snippet>\n");
--- 280,284 ----
                  out.write("<snippet>");
                  for (int j=0;j<fragments.length;j++) {
!                     xmlSerializer.writeValidXmlText(out, fragments[j].getText(), false);
                  }
                  out.write("</snippet>\n");

--- XMLSerializerHelper.java DELETED ---
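
For readers without the Carrot2 sources at hand, here is a rough sketch of the pattern the diff moves to: escaping text while streaming it to a Writer, rather than building an escaped String and passing it to out.write(). The class and method names below (XmlTextSketch, writeEscapedText) are hypothetical and are not the actual com.dawidweiss.carrot.util.XMLSerializerHelper API. The meaning of the third boolean argument to writeValidXmlText is not stated in this message, so it is not modelled here. Dropping characters that are illegal in XML 1.0 is likewise an assumption about what "valid XML text" means.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;

/** Hypothetical stand-in for illustration only; not the Carrot2 helper. */
final class XmlTextSketch {
    private XmlTextSketch() {}

    /**
     * Writes text to out as XML element content: escapes markup
     * characters and silently drops code points that XML 1.0 forbids.
     */
    static void writeEscapedText(Writer out, String text) throws IOException {
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            switch (cp) {
                case '<': out.write("&lt;"); break;
                case '>': out.write("&gt;"); break;
                case '&': out.write("&amp;"); break;
                default:
                    if (isLegalXmlChar(cp)) {
                        out.write(Character.toChars(cp));
                    }
            }
        }
    }

    /** The Char production from the XML 1.0 specification. */
    static boolean isLegalXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    public static void main(String[] args) throws IOException {
        StringWriter out = new StringWriter();
        out.write("<title>");
        // The trailing U+0001 control character is dropped, not escaped.
        writeEscapedText(out, "Tom & Jerry <live>\u0001");
        out.write("</title>");
        System.out.println(out); // <title>Tom &amp; Jerry &lt;live&gt;</title>
    }
}
```

Writing straight to the Writer avoids allocating an intermediate escaped String for every title and snippet fragment, which is presumably part of the motivation for the change, though the log message does not say so.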