From: Jacob M. <jac...@gm...> - 2010-04-07 17:37:11
I am bringing this thread over to the devel list, since that seems like the most appropriate place for this part of the discussion. I thought about it some more and came up with a slight variation on the last approach, primarily to satisfy my desire not to accidentally trample anything. I was able to track down AnalyzerConfig, and that is where I started my work, so it is good to know I was following that part of the code correctly. I spent a few hours on it and banged together the following changes, which I am posting for initial review and feedback/suggestions from the developers.

org.exist.indexing.lucene.AnalyzerConfig has the following changes.

Added:

    private final static String PARAM_CHILDREN = "params";
    private final static String PARAM_TYPE_ATTRIBUTE = "type";

    private static List<Object> parseParamChildren(NodeList children){
        List<Object> p = new ArrayList<Object>(children.getLength());
        for(int idx = children.getLength() - 1; idx >= 0; --idx){
            p.add(0, parseParamChild(children.item(idx)));
        }
        return p;
    }

    private static Object parseParamChild(Node child) {
        String type = child.getNodeName();
        String subtype = child.getAttributes().getNamedItem(PARAM_TYPE_ATTRIBUTE).getNodeValue();
        if(type.equals("class")){
            try {
                return Class.forName(child.getNodeValue());
            } catch (ClassNotFoundException ex) {
                //TODO
            }
        } else if (type.equals("string")){
            return child.getNodeValue();
        } else if (type.equals("int")){
            return Integer.parseInt(child.getNodeValue());
        } else if (type.equals("char")){
            return child.getNodeValue().toCharArray();
        } else if (type.equals("float")){
            return Float.parseFloat(child.getNodeValue());
        } else if (type.equals("double")){
            return Double.parseDouble(child.getNodeValue());
        } else if (type.equals("array")){
            if (subtype.equals("string")){
                return parseParamChildren(child.getChildNodes()).toArray(new String[0]);
            } else if (subtype.equals("int")){
                return parseParamChildren(child.getChildNodes()).toArray(new Integer[0]);
            } else if (subtype.equals("float")){
                return parseParamChildren(child.getChildNodes()).toArray(new Float[0]);
            } else if (subtype.equals("double")){
                return parseParamChildren(child.getChildNodes()).toArray(new Double[0]);
            } else if (subtype.equals("map")) {
                return (Map[]) parseParamChildren(child.getChildNodes()).toArray();
            } else if (subtype.equals("object")) {
                return parseParamChildren(child.getChildNodes()).toArray();
            }
        } else if (type.equals("map")){
            Map map = new HashMap<String, Object>();
            NodeList children = child.getChildNodes();
            for(int idx = children.getLength() - 1; idx >= 0; --idx){
                Node c = children.item(idx);
                map.put(c.getAttributes().getNamedItem("key").getNodeValue(), parseParamChild(c));
            }
            return map;
        } else {
            try {
                return Class.forName(subtype).getConstructor(new Class[]{String.class}).newInstance(new Object[]{child.getNodeValue()});
            } catch (ClassNotFoundException ex) {
                //TODO
            } catch (NoSuchMethodException ex) {
                //TODO
            } catch (SecurityException ex) {
                //TODO
            } catch (InstantiationException ex) {
                //TODO
            } catch (IllegalAccessException ex) {
                //TODO
            } catch (IllegalArgumentException ex) {
                //TODO
            } catch (InvocationTargetException ex) {
                //TODO
            }
        }
        return new Object();
    }

    //--------------------------------------------------------------------------------------

I also added/modified the configureAnalyzer method:

    protected static Analyzer configureAnalyzer(Element config) throws DatabaseConfigurationException {
        String className = config.getAttribute(CLASS_ATTRIBUTE);
        List<Object> params = parseParamChildren(config.getElementsByTagName(PARAM_CHILDREN).item(0).getChildNodes());
        Class[] signature = new Class[params.size()];
        Object[] values = new Object[params.size()];
        int idx = 0;
        for(Object p : params){
            signature[idx] = p.getClass();
            values[idx] = p;
            ++idx;
        }
        if (className != null && className.length() != 0) {
            try {
                Class<?> clazz = Class.forName(className);
                if (!Analyzer.class.isAssignableFrom(clazz))
                    throw new DatabaseConfigurationException("Lucene index: analyzer class has to be" +
                        " a subclass of " + Analyzer.class.getName());
                if (signature.length > 0){
                    try {
                        return (Analyzer) clazz.getConstructor(signature).newInstance(values);
                    } catch (NoSuchMethodException ex) {
                        //TODO
                    } catch (SecurityException ex) {
                        //TODO
                    } catch (IllegalArgumentException ex) {
                        //TODO
                    } catch (InvocationTargetException ex) {
                        //TODO
                    }
                }
                return (Analyzer) clazz.newInstance();
                ///--- snip ---- rest is unmodified

What this gets me, I believe, is the following:

    <analyzer class="org.apache.lucene.analysis.standard.StandardAnalyzer">
        <params>
            <string>Something</string>
        </params>
    </analyzer>

This configuration hunts for a constructor that accepts a single String and passes "Something" to it. The way I have written the parser code, there is predefined "shorthand" for string, int, float, double, char and class. You can actually use any fully qualified Java type that has a constructor taking a single String to accept the value being passed in. <int> is just shorthand for <object type="java.lang.Integer">, for instance.

In addition, I have allowed arrays of the common base types, as well as map. I plan to adjust map to allow specifying the concrete implementation type, just in case, and to add a list type. At present maps are recursive, but I don't think the arrays are; i.e., I can't construct a String[][] with this method that I know of.

This ties configuration of the analyzer into a namespace under the analyzer tag, so it doesn't intrude on other things that might be added later (params right now, but easily renamed if desired... maybe init or configure?), and it allows for some pretty simple configuration of even complex objects.
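Since the posted code is untested, here is a minimal, self-contained sketch of the element-to-value mapping described above, for the scalar shorthand types only. It is not the eXist code: the class name ParamParseSketch and the helpers parseParams, parseParam and parseXml are hypothetical, and trimming the element text is an assumption. It shows two DOM details the parser has to cope with: indented XML yields whitespace-only text nodes between the elements, and getNodeValue() is null for element nodes, so the sketch reads the value with getTextContent().

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical sketch, not the eXist AnalyzerConfig implementation.
public class ParamParseSketch {

    // Converts the element children of <params> into typed values, in document order.
    public static List<Object> parseParams(Element params) {
        List<Object> values = new ArrayList<Object>();
        NodeList children = params.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            // Indented XML has whitespace-only text nodes between elements; skip them.
            if (child.getNodeType() != Node.ELEMENT_NODE) {
                continue;
            }
            values.add(parseParam((Element) child));
        }
        return values;
    }

    // Maps one shorthand element (<string>, <int>, <float>, <double>) to a value.
    public static Object parseParam(Element param) {
        String type = param.getNodeName();
        // getTextContent() returns the text inside the element;
        // getNodeValue() would be null for an element node.
        String text = param.getTextContent().trim(); // trimming is an assumption
        if (type.equals("string")) {
            return text;
        } else if (type.equals("int")) {
            return Integer.valueOf(text);
        } else if (type.equals("float")) {
            return Float.valueOf(text);
        } else if (type.equals("double")) {
            return Double.valueOf(text);
        }
        throw new IllegalArgumentException("Unsupported parameter type: " + type);
    }

    // Demo helper: parses an XML string and returns its root element.
    public static Element parseXml(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml))).getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Element params = parseXml(
                "<params>\n  <int name=\"min\">2</int>\n  <float name=\"boost\">.5</float>\n"
                + "  <string>Something</string>\n</params>");
        for (Object value : parseParams(params)) {
            System.out.println(value.getClass().getSimpleName() + ": " + value);
        }
    }
}
```

The sketch throws on an unknown element name instead of returning a placeholder object, so a typo in the configuration surfaces early; whether that is the right behaviour for the real parser is a design choice left open here.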
Let's say I have an analyzer that takes 2 ints, a float, and a String array:

    <analyzer class="analyzers.SuperAnalyzer">
        <params>
            <int name="min">2</int>
            <int name="max">10</int>
            <float name="boost">.5</float>
            <array name="stoplist" type="string">
                <string>an</string>
                <string>the</string>
                <string>then</string>
                <string>test</string>
            </array>
        </params>
    </analyzer>

I added name in there; it is completely unused/ignored, but it would help someone else understand roughly what the config entries map to, so the configuration is easier to tune. Note this still has the mandatory condition that the order of the nodes in the XML exactly matches the constructor signature, or the constructor will not be found.

If people think this is a good approach, I can work on making a few of those changes; most are fairly straightforward. The code I currently have does compile (I can't remember its exact version/date), but I have NOT tested it at all, even just to see if it parses without crashing. I was more interested in feedback on the approach before I start fiddling with testing-related stuff.

Jacob Myers
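One detail of the signature lookup in configureAnalyzer is worth illustrating. Class.getConstructor matches the declared parameter types exactly, and p.getClass() on a parsed <int> value yields Integer.class, so a constructor declared with primitive int or float parameters would not be found that way. Below is a minimal sketch of a more forgiving lookup, under the assumption that scanning the public constructors is acceptable. ConstructorMatchSketch, DemoAnalyzer, findConstructor and instantiate are hypothetical names, and DemoAnalyzer only stands in for something like the SuperAnalyzer example.

```java
import java.lang.reflect.Constructor;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not part of eXist or Lucene.
public class ConstructorMatchSketch {

    // Primitive parameter types and the wrapper classes that parsed values arrive as.
    private static final Map<Class<?>, Class<?>> WRAPPERS = new HashMap<Class<?>, Class<?>>();
    static {
        WRAPPERS.put(boolean.class, Boolean.class);
        WRAPPERS.put(byte.class, Byte.class);
        WRAPPERS.put(char.class, Character.class);
        WRAPPERS.put(short.class, Short.class);
        WRAPPERS.put(int.class, Integer.class);
        WRAPPERS.put(long.class, Long.class);
        WRAPPERS.put(float.class, Float.class);
        WRAPPERS.put(double.class, Double.class);
    }

    // Stand-in for an analyzer that takes 2 ints, a float, and a String array.
    public static class DemoAnalyzer {
        public final int min;
        public final int max;
        public final float boost;
        public final String[] stoplist;

        public DemoAnalyzer(int min, int max, float boost, String[] stoplist) {
            this.min = min;
            this.max = max;
            this.boost = boost;
            this.stoplist = stoplist;
        }
    }

    // Returns the first public constructor whose parameters accept the values
    // in order, treating a primitive parameter as its wrapper type; null if none.
    public static Constructor<?> findConstructor(Class<?> clazz, Object[] values) {
        for (Constructor<?> candidate : clazz.getConstructors()) {
            Class<?>[] types = candidate.getParameterTypes();
            if (types.length != values.length) {
                continue;
            }
            boolean matches = true;
            for (int i = 0; i < types.length && matches; i++) {
                Class<?> expected = types[i].isPrimitive() ? WRAPPERS.get(types[i]) : types[i];
                matches = expected.isInstance(values[i]);
            }
            if (matches) {
                return candidate;
            }
        }
        return null;
    }

    // Instantiates clazz with the given values, or throws if no constructor fits.
    public static Object instantiate(Class<?> clazz, Object[] values) {
        Constructor<?> ctor = findConstructor(clazz, values);
        if (ctor == null) {
            throw new IllegalArgumentException("No matching public constructor on " + clazz.getName());
        }
        try {
            // Reflection unboxes Integer/Float into the int/float parameters.
            return ctor.newInstance(values);
        } catch (Exception e) {
            throw new IllegalStateException("Could not instantiate " + clazz.getName(), e);
        }
    }

    public static void main(String[] args) {
        Object[] values = {2, 10, 0.5f, new String[] {"an", "the", "then", "test"}};
        DemoAnalyzer analyzer = (DemoAnalyzer) instantiate(DemoAnalyzer.class, values);
        System.out.println(analyzer.min + " " + analyzer.max + " " + analyzer.stoplist.length);
    }
}
```

Because the comparison uses isInstance, this lookup would also accept a HashMap value for a parameter declared as Map. The values still have to appear in constructor order, as in the approach above.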