From: Jacob M. <jac...@gm...> - 2010-04-22 19:24:10
I have gotten around to doing some more work on this. I submit the following diff file for anyone interested in giving it a test. I test-built this against trunk, but it should work with 1.4 since nothing seemed to change.

It works for most "reasonable" cases, I believe, and in the worst case it should be 100% backwards compatible, as it only kicks in when it sees an analyzer with a params child under it. I tried to throw reasonable errors and trap everything that might go wrong.

One important caveat: the way I constructed this, there is one significant problem I can't seem to avoid. I can't target constructors that take primitives. I.e., public constructor(int, int) isn't actually addressable in any way I can see, since I can't get a class for a primitive to match against a constructor. If anyone has more Java experience than me and can shed some light on this, I would love to try to fix it.

So only full-class types (Integer/Float/etc.) are supported. Lists and maps should work as well, and everything is recursive where possible. Similar to the primitive problem, arrays are supported but not recursively: no Integer[][], but Integer[] should do fine. I assume most analyzers aren't going to need this crazy level of customization, so it might not be an issue. If I could fix the problem with primitives, which I thought auto-boxing would have handled but which appears to fail, that would be great.

Feedback is welcome too.

Jacob Myers

On Wed, Apr 7, 2010 at 1:36 PM, Jacob Myers <jac...@gm...> wrote:
> I am bringing this thread over to devel list since that seems like the
> most appropriate location for this part of the discussion.
>
> I was thinking about it more and I came up with a slight variation on
> this last approach primarily to satisfy my desire not to accidentally
> trample anything. I was able to track down AnalyzerConfig and that is
> where I started my work, good to know I was following that part of the
> code correctly then.
> I gave it a few hours and banged together the
> following changes which I am posting for initial review from the
> developers for some feedback/suggestions.
>
> So org.exist.indexing.lucene.AnalyzerConfig has the following changes:
> Added:
>
>     private final static String PARAM_CHILDREN = "params";
>     private final static String PARAM_TYPE_ATTRIBUTE = "type";
>
>     // Parses the element children in document order. Whitespace text
>     // nodes between the elements are skipped.
>     private static List<Object> parseParamChildren(NodeList children) {
>         List<Object> p = new ArrayList<Object>(children.getLength());
>         for (int idx = children.getLength() - 1; idx >= 0; --idx) {
>             Node child = children.item(idx);
>             if (child.getNodeType() == Node.ELEMENT_NODE) {
>                 p.add(0, parseParamChild(child));
>             }
>         }
>         return p;
>     }
>
>     private static Object parseParamChild(Node child) {
>         String type = child.getNodeName();
>         // The type attribute is optional, so guard against a missing node.
>         Node typeAttr =
>             child.getAttributes().getNamedItem(PARAM_TYPE_ATTRIBUTE);
>         String subtype = typeAttr == null ? "" : typeAttr.getNodeValue();
>         // getNodeValue() is null for element nodes; read the text instead.
>         String value = child.getTextContent();
>
>         if (type.equals("class")) {
>             try {
>                 return Class.forName(value);
>             } catch (ClassNotFoundException ex) {
>                 //TODO
>             }
>         } else if (type.equals("string")) {
>             return value;
>         } else if (type.equals("int")) {
>             return Integer.parseInt(value);
>         } else if (type.equals("char")) {
>             return value.toCharArray();
>         } else if (type.equals("float")) {
>             return Float.parseFloat(value);
>         } else if (type.equals("double")) {
>             return Double.parseDouble(value);
>         } else if (type.equals("array")) {
>             if (subtype.equals("string")) {
>                 return parseParamChildren(child.getChildNodes())
>                     .toArray(new String[0]);
>             } else if (subtype.equals("int")) {
>                 return parseParamChildren(child.getChildNodes())
>                     .toArray(new Integer[0]);
>             } else if (subtype.equals("float")) {
>                 return parseParamChildren(child.getChildNodes())
>                     .toArray(new Float[0]);
>             } else if (subtype.equals("double")) {
>                 return parseParamChildren(child.getChildNodes())
>                     .toArray(new Double[0]);
>             } else if (subtype.equals("map")) {
>                 // A plain toArray() yields Object[], which cannot be
>                 // cast to Map[], so ask for a Map[] directly.
>                 return parseParamChildren(child.getChildNodes())
>                     .toArray(new Map[0]);
>             } else if (subtype.equals("object")) {
>                 return parseParamChildren(child.getChildNodes()).toArray();
>             }
>         } else if (type.equals("map")) {
>             Map<String, Object> map = new HashMap<String, Object>();
>
>             NodeList children = child.getChildNodes();
>             for (int idx = children.getLength() - 1; idx >= 0; --idx) {
>                 Node c = children.item(idx);
>                 if (c.getNodeType() != Node.ELEMENT_NODE) {
>                     continue; // skip whitespace text nodes
>                 }
>                 map.put(c.getAttributes().getNamedItem("key").getNodeValue(),
>                         parseParamChild(c));
>             }
>
>             return map;
>         } else {
>             try {
>                 return Class.forName(subtype)
>                     .getConstructor(new Class[]{String.class})
>                     .newInstance(new Object[]{value});
>             } catch (ClassNotFoundException ex) {
>                 //TODO
>             } catch (NoSuchMethodException ex) {
>                 //TODO
>             } catch (SecurityException ex) {
>                 //TODO
>             } catch (InstantiationException ex) {
>                 //TODO
>             } catch (IllegalAccessException ex) {
>                 //TODO
>             } catch (IllegalArgumentException ex) {
>                 //TODO
>             } catch (InvocationTargetException ex) {
>                 //TODO
>             }
>         }
>
>         return new Object();
>     }
> //--------------------------------------------------------------------------------------
>
> I also added/modified the configureAnalyzer method:
>
>     protected static Analyzer configureAnalyzer(Element config) throws
>             DatabaseConfigurationException {
>         String className = config.getAttribute(CLASS_ATTRIBUTE);
>
>         // Without a <params> child there are no constructor arguments,
>         // so the no-argument constructor below is used.
>         Node paramsNode =
>             config.getElementsByTagName(PARAM_CHILDREN).item(0);
>         List<Object> params = paramsNode == null
>             ? new ArrayList<Object>()
>             : parseParamChildren(paramsNode.getChildNodes());
>
>         Class[] signature = new Class[params.size()];
>         Object[] values = new Object[params.size()];
>         int idx = 0;
>         for (Object p : params) {
>             signature[idx] = p.getClass();
>             values[idx] = p;
>             ++idx;
>         }
>
>         if (className != null && className.length() != 0) {
>             try {
>                 Class<?> clazz = Class.forName(className);
>                 if (!Analyzer.class.isAssignableFrom(clazz))
>                     throw new DatabaseConfigurationException(
>                         "Lucene index: analyzer class has to be" +
>                         " a subclass of " + Analyzer.class.getName());
>                 if (signature.length > 0) {
>                     try {
>                         return (Analyzer)
>                             clazz.getConstructor(signature).newInstance(values);
>                     } catch (NoSuchMethodException ex) {
>                         //TODO
>                     } catch (SecurityException ex) {
>                         //TODO
>                     } catch (IllegalArgumentException ex) {
>                         //TODO
>                     } catch (InvocationTargetException ex) {
>                         //TODO
>                     }
>                 }
>                 return (Analyzer) clazz.newInstance();
> ///--- snip ---- rest is unmodified
>
>
> What this gets me, I believe, is the following:
>
> <analyzer class="org.apache.lucene.analysis.standard.StandardAnalyzer">
>     <params>
>         <string>Something</string>
>     </params>
> </analyzer>
>
> This configuration hunts for a constructor accepting a single string
> and passes in Something to it. The way I have written the parser code
> there is predefined "shorthand" for string, int, float, double, char
> and class. You can actually put in any full Java type that has a
> constructor taking a string to accept the value being passed in.
> <int> is just shorthand for <object type="java.lang.Integer">, for
> instance. In addition I have allowed arrays of the common base types
> as well as maps. I currently plan to adjust map to allow specifying
> the concrete implementation type, just in case, and to add a list
> type. At present maps are recursive, but I don't think the arrays
> are. I.e., I can't construct String[][] with this method that I know
> of.
>
> This ties configuration of the analyzer into a namespace under the
> analyzer tag so it doesn't intrude on other potential things to be
> added (params right now, but easily renamed if desired... maybe init
> or configure?) and allows for some pretty simple configuration of even
> complex objects.
> Let's say I have an analyzer that takes 2 ints, a
> float, and a String array:
>
> <analyzer class="analyzers.SuperAnalyzer">
>     <params>
>         <int name="min">2</int>
>         <int name="max">10</int>
>         <float name="boost">.5</float>
>         <array name="stoplist" type="string">
>             <string>an</string>
>             <string>the</string>
>             <string>then</string>
>             <string>test</string>
>         </array>
>     </params>
> </analyzer>
>
> I added name in there, though it is completely unused/ignored, but it
> would help when trying to understand roughly what the config entries
> were mapping to, so it is easier for someone else to tune. Note this
> still has the mandatory condition that the order of the nodes in the
> XML exactly match the constructor signature, or the constructor will
> not be found.
>
> If people think this is a good approach I can work on making a few of
> those changes; most are fairly straightforward. The code I currently
> have does compile (can't remember its exact version/date) but I have
> NOT tested it at all, even just to see if it parses without crashing.
> I was more interested in the approach and feedback before I start
> fiddling with testing-related stuff.
>
> Jacob Myers
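
On the primitive-constructor problem raised at the top of this message: Java does expose Class objects for primitives (int.class, or equivalently Integer.TYPE), so a constructor such as (int, int) can be looked up with getConstructor(int.class, int.class). The lookup fails in the posted code because p.getClass() on a parsed value always yields the wrapper class (Integer.class), and getConstructor wants an exact match. One possible workaround, sketched below, is to scan the public constructors and accept a wrapper value wherever the matching primitive is declared. This is only a minimal sketch under stated assumptions: the class PrimitiveConstructorLookup, its methods accepts and instantiate, and the SuperAnalyzer stand-in are hypothetical names for illustration and are not part of eXist or of the diff.

```java
import java.lang.reflect.Constructor;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not part of eXist or of the diff.
public class PrimitiveConstructorLookup {

    // Maps each wrapper class to its primitive counterpart, e.g. Integer -> int.
    private static final Map<Class<?>, Class<?>> PRIMITIVES =
            new HashMap<Class<?>, Class<?>>();
    static {
        PRIMITIVES.put(Boolean.class, boolean.class);
        PRIMITIVES.put(Byte.class, byte.class);
        PRIMITIVES.put(Character.class, char.class);
        PRIMITIVES.put(Short.class, short.class);
        PRIMITIVES.put(Integer.class, int.class);
        PRIMITIVES.put(Long.class, long.class);
        PRIMITIVES.put(Float.class, float.class);
        PRIMITIVES.put(Double.class, double.class);
    }

    // True if value can be passed for a parameter declared with the given type.
    // Primitive widening (an Integer for a long parameter) is not handled here.
    public static boolean accepts(Class<?> declared, Object value) {
        if (value == null) {
            return !declared.isPrimitive();
        }
        if (declared.isPrimitive()) {
            return declared == PRIMITIVES.get(value.getClass());
        }
        return declared.isInstance(value);
    }

    // Scans the public constructors instead of calling getConstructor(signature),
    // so a boxed Integer value can match a declared int parameter.
    public static Object instantiate(Class<?> clazz, List<?> values) throws Exception {
        for (Constructor<?> candidate : clazz.getConstructors()) {
            Class<?>[] declared = candidate.getParameterTypes();
            if (declared.length != values.size()) {
                continue;
            }
            boolean matches = true;
            for (int i = 0; i < declared.length && matches; i++) {
                matches = accepts(declared[i], values.get(i));
            }
            if (matches) {
                // Reflection unboxes wrapper values for primitive parameters.
                return candidate.newInstance(values.toArray());
            }
        }
        throw new NoSuchMethodException(
                "No matching public constructor on " + clazz.getName());
    }

    // Stand-in for the SuperAnalyzer example; a real one would extend Analyzer.
    public static class SuperAnalyzer {
        public final int min;
        public final int max;
        public final float boost;
        public final String[] stoplist;

        public SuperAnalyzer(int min, int max, float boost, String[] stoplist) {
            this.min = min;
            this.max = max;
            this.boost = boost;
            this.stoplist = stoplist;
        }
    }

    public static void main(String[] args) throws Exception {
        // The values arrive boxed, exactly as the XML parser would produce them.
        List<Object> params =
                Arrays.<Object>asList(2, 10, 0.5f, new String[] {"an", "the"});
        SuperAnalyzer a = (SuperAnalyzer) instantiate(SuperAnalyzer.class, params);
        System.out.println(a.min + " " + a.max + " " + a.boost + " " + a.stoplist.length);
    }
}
```

The trade-off in this sketch is that the first matching constructor wins, so a class declaring both (int) and (Integer) constructors would be resolved in whatever order getConstructors() returns them. The requirement that the XML node order match the constructor signature still applies.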