From: Jacob M. <jac...@gm...> - 2010-04-07 17:37:11
|
I am bringing this thread over to the devel list, since that seems like the most appropriate place for this part of the discussion.

I was thinking about it more and came up with a slight variation on this last approach, primarily to satisfy my desire not to accidentally trample anything. I was able to track down AnalyzerConfig and that is where I started my work, so it is good to know I was following that part of the code correctly. I spent a few hours on it and put together the following changes, which I am posting for initial review and feedback/suggestions from the developers.

So org.exist.indexing.lucene.AnalyzerConfig has the following changes:

Added:

    private final static String PARAM_CHILDREN = "params";
    private final static String PARAM_TYPE_ATTRIBUTE = "type";

    private static List<Object> parseParamChildren(NodeList children){
        List<Object> p = new ArrayList<Object>(children.getLength());
        for(int idx = children.getLength() - 1; idx >= 0; --idx){
            p.add(0, parseParamChild(children.item(idx)));
        }
        return p;
    }

    private static Object parseParamChild(Node child) {
        String type = child.getNodeName();
        String subtype = child.getAttributes().getNamedItem(PARAM_TYPE_ATTRIBUTE).getNodeValue();

        if(type.equals("class")){
            try {
                return Class.forName(child.getNodeValue());
            } catch (ClassNotFoundException ex) {
                //TODO
            }
        } else if (type.equals("string")){
            return child.getNodeValue();
        } else if (type.equals("int")){
            return Integer.parseInt(child.getNodeValue());
        } else if (type.equals("char")){
            return child.getNodeValue().toCharArray();
        } else if (type.equals("float")){
            return Float.parseFloat(child.getNodeValue());
        } else if (type.equals("double")){
            return Double.parseDouble(child.getNodeValue());
        } else if (type.equals("array")){
            if (subtype.equals("string")){
                return parseParamChildren(child.getChildNodes()).toArray(new String[0]);
            } else if (subtype.equals("int")){
                return parseParamChildren(child.getChildNodes()).toArray(new Integer[0]);
            } else if (subtype.equals("float")){
                return parseParamChildren(child.getChildNodes()).toArray(new Float[0]);
            } else if (subtype.equals("double")){
                return parseParamChildren(child.getChildNodes()).toArray(new Double[0]);
            } else if (subtype.equals("map")) {
                return (Map[]) parseParamChildren(child.getChildNodes()).toArray();
            } else if (subtype.equals("object")) {
                return parseParamChildren(child.getChildNodes()).toArray();
            }
        } else if (type.equals("map")){
            Map map = new HashMap<String, Object>();

            NodeList children = child.getChildNodes();
            for(int idx = children.getLength() - 1; idx >= 0; --idx){
                Node c = children.item(idx);
                map.put(c.getAttributes().getNamedItem("key").getNodeValue(), parseParamChild(c));
            }

            return map;
        } else {
            try {
                return Class.forName(subtype).getConstructor(new Class[]{String.class}).newInstance(new Object[]{child.getNodeValue()});
            } catch (ClassNotFoundException ex) {
                //TODO
            } catch (NoSuchMethodException ex) {
                //TODO
            } catch (SecurityException ex) {
                //TODO
            } catch (InstantiationException ex) {
                //TODO
            } catch (IllegalAccessException ex) {
                //TODO
            } catch (IllegalArgumentException ex) {
                //TODO
            } catch (InvocationTargetException ex) {
                //TODO
            }
        }

        return new Object();
    }
    //--------------------------------------------------------------------------------------

I also added/modified the configureAnalyzer method:

    protected static Analyzer configureAnalyzer(Element config) throws DatabaseConfigurationException {
        String className = config.getAttribute(CLASS_ATTRIBUTE);

        List<Object> params = parseParamChildren(config.getElementsByTagName(PARAM_CHILDREN).item(0).getChildNodes());

        Class[] signature = new Class[params.size()];
        Object[] values = new Object[params.size()];
        int idx = 0;
        for(Object p : params){
            signature[idx] = p.getClass();
            values[idx] = p;
            ++idx;
        }

        if (className != null && className.length() != 0) {
            try {
                Class<?> clazz = Class.forName(className);
                if (!Analyzer.class.isAssignableFrom(clazz))
                    throw new DatabaseConfigurationException("Lucene index: analyzer class has to be" +
                        " a subclass of " + Analyzer.class.getName());
                if (signature.length > 0){
                    try {
                        return (Analyzer) clazz.getConstructor(signature).newInstance(values);
                    } catch (NoSuchMethodException ex) {
                        //TODO
                    } catch (SecurityException ex) {
                        //TODO
                    } catch (IllegalArgumentException ex) {
                        //TODO
                    } catch (InvocationTargetException ex) {
                        //TODO
                    }
                }
                return (Analyzer) clazz.newInstance();
    ///--- snip ---- rest is unmodified

What this gets me, I believe, is the following:

    <analyzer class="org.apache.lucene.analysis.standard.StandardAnalyzer">
        <params>
            <string>Something</string>
        </params>
    </analyzer>

This configuration looks for a constructor that accepts a single String and passes "Something" to it. The way I have written the parser code, there is predefined "shorthand" for string, int, float, double, char and class. You can actually use any fully qualified Java type that has a constructor accepting a String to receive the value being passed in; <int> is just shorthand for <object type="java.lang.Integer">, for instance. In addition, I have allowed primitive arrays of the common base types as well as map. I plan to adjust map to allow specifying the concrete type to instantiate, just in case, and to add a list type. At present maps are recursive, but I don't think the primitive arrays are, i.e. I can't construct String[][] with this method that I know of.

This ties the configuration of the analyzer into a namespace under the analyzer tag, so it doesn't intrude on other things that might be added later (params right now, but easily renamed if desired... maybe init or configure?), and it allows for some pretty simple configuration of even complex objects.
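
For anyone reviewing the parser code above, three DOM details are worth keeping in mind: Node.getNodeValue() returns null for element nodes (the text lives in child text nodes, or in getTextContent()), getChildNodes() also returns the whitespace text nodes between elements, and getNamedItem("type") returns null when the attribute is absent. The following is only a minimal sketch of reading the scalar shorthand tags with those points handled. It is not the posted patch; the class ParamParsingSketch and the helpers parseScalars and parseXml are hypothetical names, and only string, int and float are covered.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical illustration class; not part of eXist or of the posted patch.
final class ParamParsingSketch {

    /** Converts the element children of a params element to Java values, in document order. */
    static List<Object> parseScalars(Element params) {
        List<Object> values = new ArrayList<>();
        NodeList children = params.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() != Node.ELEMENT_NODE) {
                continue; // skip whitespace text nodes and comments
            }
            // getTextContent() returns the element's text; getNodeValue() would be null here.
            String text = child.getTextContent();
            String type = child.getNodeName();
            if (type.equals("string")) {
                values.add(text);
            } else if (type.equals("int")) {
                values.add(Integer.valueOf(text.trim()));
            } else if (type.equals("float")) {
                values.add(Float.valueOf(text.trim()));
            } else {
                throw new IllegalArgumentException("Unsupported param type: " + type);
            }
        }
        return values;
    }

    /** Demo helper: parses an XML string and returns its root element. */
    static Element parseXml(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Element params = parseXml(
                "<params>\n  <string>Something</string>\n  <int>2</int>\n</params>");
        System.out.println(parseScalars(params));
    }
}
```

Iterating forward and appending keeps the values in document order, which matters because the constructor lookup depends on that order.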
Let's say I have an analyzer that takes 2 ints, a float, and a String array:

    <analyzer class="analyzers.SuperAnalyzer">
        <params>
            <int name="min">2</int>
            <int name="max">10</int>
            <float name="boost">.5</float>
            <array name="stoplist" type="string">
                <string>an</string>
                <string>the</string>
                <string>then</string>
                <string>test</string>
            </array>
        </params>
    </analyzer>

I added name in there; it is completely unused/ignored, but it helps in understanding roughly what the config entries map to, so the configuration is easier for someone else to tune. Note that this still has the mandatory condition that the order of the nodes in the XML exactly matches the constructor signature, or the constructor will not be found.

If people think this is a good approach I can work on making a few of those changes; most are fairly straightforward. The code I currently have does compile (I can't remember its exact version/date), but I have NOT tested it at all, not even to see whether it parses without crashing. I was more interested in feedback on the approach before I start on testing.

Jacob Myers
|
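
The ordering condition follows from how the lookup is built: the signature is the runtime class of each parsed value, in order, and Class.getConstructor only returns a constructor whose parameter types equal that array exactly. A small sketch of that behaviour is below. The nested SuperAnalyzer is a hypothetical stand-in (not a real Lucene analyzer), declared with wrapper types because parsed values are boxed objects; signatureOf and hasExactConstructor are illustrative names as well.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration class; shows exact-signature constructor lookup.
final class SignatureOrderSketch {

    /** Stand-in for the SuperAnalyzer of the example; takes wrapper types, not primitives. */
    public static final class SuperAnalyzer {
        public final Integer min;
        public final Integer max;
        public final Float boost;
        public final String[] stoplist;

        public SuperAnalyzer(Integer min, Integer max, Float boost, String[] stoplist) {
            this.min = min;
            this.max = max;
            this.boost = boost;
            this.stoplist = stoplist;
        }
    }

    /** Builds the signature from the runtime class of each value, in list order. */
    static Class<?>[] signatureOf(List<Object> values) {
        Class<?>[] signature = new Class<?>[values.size()];
        for (int i = 0; i < values.size(); i++) {
            signature[i] = values.get(i).getClass();
        }
        return signature;
    }

    /** True if clazz has a public constructor whose parameter types equal the signature exactly. */
    static boolean hasExactConstructor(Class<?> clazz, List<Object> values) {
        try {
            clazz.getConstructor(signatureOf(values));
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] stoplist = {"an", "the", "then", "test"};
        List<Object> inOrder = Arrays.<Object>asList(2, 10, 0.5f, stoplist);
        List<Object> swapped = Arrays.<Object>asList(0.5f, 2, 10, stoplist);
        System.out.println(hasExactConstructor(SuperAnalyzer.class, inOrder)); // true
        System.out.println(hasExactConstructor(SuperAnalyzer.class, swapped)); // false
    }
}
```

The same values in a different order produce a different signature, so the lookup fails with NoSuchMethodException rather than reordering anything.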
From: Jacob M. <jac...@gm...> - 2010-04-22 19:24:10
Attachments:
AnalyzerConfig.java.diff
|
I have gotten around to doing some more work on this, and I am submitting the attached diff file for anyone interested in giving it a test. I test-built this against trunk, but it should work with 1.4 since nothing seems to have changed there. I believe it works for most "reasonable" cases, and in the worst case it should be 100% backwards compatible, as it only kicks in when it sees an analyzer with a params child under it. I tried to throw reasonable errors and to trap everything that might go wrong.

One important caveat: with the way I constructed this, there is one significant problem I can't seem to avoid. I can't target constructors that take primitives, i.e. public constructor(int, int) isn't actually addressable in any way I can see, since I can't get a class for a primitive to match against a constructor. If anyone has more Java experience than me and can shed some light on this, I would love to try to fix it.

So only full class types (Integer/Float/etc.) are supported. Lists/maps should work as well, and everything is recursive where possible. As with the primitive problem, arrays are supported but not recursively, so no Integer[][], but Integer[] should do fine.

I assume most analyzers aren't going to need this crazy level of customization, so it might not be an issue. If I could fix the problem with primitives, which I thought auto-boxing would have handled but which appears to fail, that would be great.

Feedback is welcome too.

Jacob Myers
|
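
On the primitives question: Java does expose a Class object for each primitive type (int.class, which is the same object as Integer.TYPE), so clazz.getConstructor(int.class, int.class) is a valid lookup, and Constructor.newInstance unwraps Integer arguments for primitive parameters. Auto-boxing does not help the lookup itself because it is a compile-time conversion, while getConstructor compares Class objects exactly and p.getClass() on a boxed value always yields the wrapper class. One possible way around this, sketched below under the assumption that scanning the public constructors is acceptable, is to test each parameter type against the value instead of building an exact signature. The names PrimitiveConstructorSketch, Range, accepts and construct are hypothetical; the sketch takes the first matching constructor and does not attempt widening conversions or overload ranking.

```java
import java.lang.reflect.Constructor;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration class; not part of the attached diff.
final class PrimitiveConstructorSketch {

    /** Maps each primitive type to its wrapper class. */
    private static final Map<Class<?>, Class<?>> WRAPPERS = new HashMap<>();
    static {
        WRAPPERS.put(boolean.class, Boolean.class);
        WRAPPERS.put(byte.class, Byte.class);
        WRAPPERS.put(char.class, Character.class);
        WRAPPERS.put(short.class, Short.class);
        WRAPPERS.put(int.class, Integer.class);
        WRAPPERS.put(long.class, Long.class);
        WRAPPERS.put(float.class, Float.class);
        WRAPPERS.put(double.class, Double.class);
    }

    /** Stand-in target whose only constructor takes primitives. */
    public static final class Range {
        public final int min;
        public final int max;

        public Range(int min, int max) {
            this.min = min;
            this.max = max;
        }
    }

    /** True if value can be passed for a parameter of the given type. */
    static boolean accepts(Class<?> paramType, Object value) {
        if (paramType.isPrimitive()) {
            // A primitive parameter needs a non-null instance of its wrapper.
            return value != null && WRAPPERS.get(paramType).isInstance(value);
        }
        return value == null || paramType.isInstance(value);
    }

    /** Instantiates clazz through the first public constructor that accepts all values. */
    static <T> T construct(Class<T> clazz, Object... values) {
        for (Constructor<?> ctor : clazz.getConstructors()) {
            Class<?>[] types = ctor.getParameterTypes();
            if (types.length != values.length) {
                continue;
            }
            boolean match = true;
            for (int i = 0; i < types.length && match; i++) {
                match = accepts(types[i], values[i]);
            }
            if (match) {
                try {
                    // newInstance unboxes wrapper values for primitive parameters.
                    return clazz.cast(ctor.newInstance(values));
                } catch (ReflectiveOperationException e) {
                    throw new IllegalStateException("Constructor call failed", e);
                }
            }
        }
        throw new IllegalArgumentException("No matching public constructor on " + clazz.getName());
    }

    public static void main(String[] args) {
        Range r = construct(Range.class, 2, 10);
        System.out.println(r.min + ".." + r.max); // 2..10
    }
}
```

Because isInstance also accepts subclasses and interface implementations, the same scan would match a constructor declared with Map or List against a HashMap or ArrayList value, which an exact-signature lookup cannot do.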