Support conventional regular expression
The current XPath search does not support
conventional regular expressions.
For example, if the user would like to search for hard-coded
static IP addresses, a regular expression would
look like
\b(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b
This kind of search on string literals would give
users great power to extend XPath rules.
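To make the request concrete, here is a minimal Java sketch of what such a search would do with a string literal. It uses java.util.regex as a stand-in for whatever engine PMD ends up with, and it escapes the dots as `\.` so that they match only a literal dot. The class and method names (`IpLiteralCheck`, `containsIpAddress`) are made up for illustration; this is not PMD code.

```java
import java.util.regex.Pattern;

class IpLiteralCheck {
    // One octet in the range 0-255, same alternation as in the request.
    private static final String OCTET = "(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)";

    // Four octets separated by literal dots, bounded by word boundaries.
    private static final Pattern IPV4 = Pattern.compile(
        "\\b" + OCTET + "\\." + OCTET + "\\." + OCTET + "\\." + OCTET + "\\b");

    // True if the literal text contains something shaped like an IPv4 address.
    static boolean containsIpAddress(String literal) {
        return IPV4.matcher(literal).find();
    }

    public static void main(String[] args) {
        System.out.println(containsIpAddress("\"10.16.20.32\"")); // prints true
        System.out.println(containsIpAddress("\"hello world\"")); // prints false
    }
}
```

The check uses `find()` rather than a whole-string match because the image of a string literal includes its surrounding quotes.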
Logged In: YES
user_id=5159
Hi Siva -
Thanks for the idea! It looks like we could do this via a
Jaxen extension function:
http://jaxen.org/extensions.html
Unless there's an easier way to do it... I'm not sure. I'll
ask about it on the Jaxen user's list...
Yours,
tom
Logged In: YES
user_id=5159
Hi Siva -
OK, looks like an extension function is the way to go:
http://archive.jaxen.codehaus.org/user/msg00883.html
Let's see - do we want to use the JDK 1.4 regex package? Or
some third-party package - Jakarta ORO or some such?
Yours,
tom
Logged In: YES
user_id=226817
A couple of things to watch out for:
127.0.0.1 (loopback address) is probably OK to see in
source code
Subnet masks might look like IP addresses. I'm not sure
which ones are common. Maybe 255.255.255.0
Don't forget IPv6 addresses (though you might not want to
grab those in your first pass at the problem)
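One way to honor these caveats is to filter the matches after the regular expression fires. The sketch below assumes an allow-list holding only the two values named above (127.0.0.1 and 255.255.255.0); a real rule would probably want it configurable. `HardCodedIpFilter` and `isSuspicious` are hypothetical names, and IPv6 is deliberately left out.

```java
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HardCodedIpFilter {
    private static final String OCTET = "(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)";
    private static final Pattern IPV4 =
        Pattern.compile("\\b" + OCTET + "(?:\\." + OCTET + "){3}\\b");

    // Assumed allow-list: loopback and one common subnet mask.
    private static final Set<String> ALLOWED = Set.of("127.0.0.1", "255.255.255.0");

    // True if the literal contains an IPv4-shaped value that is not allow-listed.
    static boolean isSuspicious(String literal) {
        Matcher m = IPV4.matcher(literal);
        while (m.find()) {
            if (!ALLOWED.contains(m.group())) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(isSuspicious("\"10.16.20.32\""));   // prints true
        System.out.println(isSuspicious("\"127.0.0.1\""));     // prints false
        System.out.println(isSuspicious("\"255.255.255.0\"")); // prints false
    }
}
```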
Logged In: YES
user_id=154590
Here was my attempt to do some work on it...
The Source: =========================
import java.util.regex.Pattern;

// Placeholder pattern and input; ".*" accepts any line.
String pattern = ".*";
String line = "xxx.xxx.xxx.xxx";
if (Pattern.matches(pattern, line)) {
    System.out.println(line + " matches \"" + pattern + "\"");
} else {
    System.out.println("NO MATCH");
}
=====================================
To Trigger this logic: ===========================
have a new Rule Class called xpath-regex which will work
based on the following NODE structure of properties
<properties>
<property name="xpath-regex">
<value>
<![CDATA[
//LocalVariableDeclaration/VariableDeclarator/VariableInitializer/Expression/PrimaryExpression/PrimaryPrefix/Literal
]]>
</value>
<regex>
<![CDATA[
\b(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b
]]>
</regex>
</property>
</properties>
==================================================
This is another package I thought we could look at
http://jregex.sourceforge.net/
Logged In: YES
user_id=154590
Right now the target I have in mind is only to provide for a
regular expression search on a "STRING LITERAL". I am trying
to use the IP Address concept to show this could be used.
This way we can allow the user to worry about what should or
should not be hard-coded in their source code :-)
How many times have we come across hard-coded values like these,
making the code un-portable and ugly :-)
In general I agree with Elliotte's comment about which IP
addresses would be allowed. We could include this information as
part of the Description and/or Sample code.
Logged In: YES
user_id=226817
YAGNI. If you have a use case or cases that really require
regular expressions, then let's consider it, but let's not
try to solve all problems in advance of actual user
requirements. This particular use case I think would be more
appropriately addressed without regular expressions.
Logged In: YES
user_id=154590
I understand... But the request for this feature is not
aimed at solving one particular case.
To provide a limited background:
We package PMD as part of our product Optimal Advisor
[http://www.compuware.com/products/optimalj/2911_ENG_HTML.htm]
In our current release we provide a UI wrapper that lets the user
build custom coding rules so that they can make productive use
of the tool. The UI is a simple
interface where the end user can fill out a form and have
the rule added to an existing ruleset.
After the current release of our product bundled with PMD,
we have had a few requests from our engineers in the field for
some extended rules.
One such request was: the need to be able to pass regular
expression as a parameter in string search.
================
The above is the reason I felt proper regular expression
support would give the customer the ability not only to
build custom rules using XPath, but also to extend the XPath
queries with regular expressions.
Logged In: YES
user_id=5159
OK, I've gotten Jakarta-ORO 2.0.8 and have started to fiddle
with it... seems doable...
Yours,
Tom
Logged In: YES
user_id=5159
It works! You can actually write XPath like:
//ClassOrInterfaceDeclaration[regexp( @Image, '/Foo/' )]
and it'll return the proper nodes. Good times. To use it,
you'll need to download some new jar files from here:
http://infoether.com/~tom/siva/pmd-3.4.jar
http://infoether.com/~tom/siva/jakarta-oro-2.0.8.jar
and then you can use PMD as usual; just make sure that oro
gets in the CLASSPATH.
I'll polish this up a bit, write some unit tests, and then
check it in... fun stuff!
Yours,
Tom
Logged In: YES
user_id=5159
Oops, there were some problems... I've uploaded a new
version, that should work fine. Here's a demo:
============================
$ cat Foo.java
public class Foo {
public class Fbb {}
public class F1o {}
}
$ ./pmd.sh Foo.java text scratchpad -debug
In JDK 1.4 mode
Loaded rule RegexTest
Processing /home/tom/pmd/pmd/bin/Foo.java
/home/tom/pmd/pmd/bin/Foo.java:1 regex test
/home/tom/pmd/pmd/bin/Foo.java:3 regex test
============================
and here's the test rule:
============================
<rule name="RegexTest" message="regex test" class="">
<description>
test
</description>
<properties>
<property name="xpath">
<value>
<![CDATA[
//ClassOrInterfaceDeclaration[regexp(@Image, '/F?o/')]
]]>
</value>
</property>
</properties>
<priority>3</priority>
<example>
</example>
</rule>
============================
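The reason Foo.java lines 1 and 3 are reported can be reproduced outside PMD. The sketch below uses java.util.regex rather than Jakarta ORO and assumes unanchored "contains" matching, which is what the demo output suggests. Since `F?o` makes the `F` optional, any class name containing an `o` matches. `RegexDemo` and `matching` are illustrative names only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class RegexDemo {
    // Returns the names in which the regex matches somewhere (unanchored).
    static List<String> matching(String regex, List<String> names) {
        Pattern p = Pattern.compile(regex);
        List<String> hits = new ArrayList<>();
        for (String name : names) {
            if (p.matcher(name).find()) {
                hits.add(name);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Class names from Foo.java, in declaration order.
        System.out.println(matching("F?o", List.of("Foo", "Fbb", "F1o"))); // prints [Foo, F1o]
    }
}
```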
Fun stuff!
Tom
Logged In: YES
user_id=5159
Hi Siva -
OK, this is checked in to CVS. In a nutshell, here's how to
use it:
//ClassOrInterfaceDeclaration[regexp(@Image, '/F?o/')]
And there's a dependency on jakarta-oro, so you'll need to
put that in your CLASSPATH to use this. You can get an
updated pmd-3.4.jar file here:
http://infoether.com/~tom/pmd-3.4.jar
that contains this new feature.
Thanks for the suggestion!
Yours,
Tom
Logged In: YES
user_id=154590
Tom:
Thanx a lot... That was a nice piece of work and a quick
turnaround...
I just tested it out with PMD/Designer and Viewer....
PMD and Designer are working fine, looks like the Viewer is
still missing some code...
Anyway... here is my rule and test case. Do you think anyone
would find this useful?
A nice thing to do?
<properties>
<property name="xpath">
<value>
<![CDATA[
//PrimaryExpression/PrimaryPrefix/Literal[regexp(@Image,'/(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)/')]
]]>
</value>
</property>
</properties>
<priority>3</priority>
<example>
<![CDATA[
public class ConnectTo {
    public void testMethod(Object obj) {
        //..........do lots of stuff
        String sIpAddress = "10.16.20.32";
        // do the work
        //..........do loads more stuff
    }
}
]]>
</example>
Thanx
Siva
Logged In: YES
user_id=5159
Hi Siva -
No problemo! By the way, I've modified it slightly so that
you don't have to wrap the regular expression in //, so now
you can just do this:
//ClassOrInterfaceDeclaration[regexp(@Image, 'F?o')]
vs this:
//ClassOrInterfaceDeclaration[regexp(@Image, '/F?o/')]
It'll save a few characters here and there, good times.
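For readers wondering how both forms could be accepted, here is a small sketch. The actual PMD change is not shown in this thread, so the helper below (`stripDelimiters` in a made-up `RegexArgument` class) is purely an assumption about one way to do it: drop a leading and trailing `/` when both are present, otherwise leave the argument alone.

```java
class RegexArgument {
    // Hypothetical helper: turns '/F?o/' into 'F?o' and leaves 'F?o' untouched.
    static String stripDelimiters(String arg) {
        if (arg.length() >= 2 && arg.startsWith("/") && arg.endsWith("/")) {
            return arg.substring(1, arg.length() - 1);
        }
        return arg;
    }

    public static void main(String[] args) {
        System.out.println(stripDelimiters("/F?o/")); // prints F?o
        System.out.println(stripDelimiters("F?o"));   // prints F?o
    }
}
```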
Oops, yup, I haven't updated the Viewer yet; will do.
Hm, you know, I'm not sure... maybe post it to the forums
and see if anyone has thoughts on it?
I'll almost certainly use it as an example in a blog entry :-)
Yours,
Tom
Logged In: YES
user_id=5159
Viewer is fixed now, new pmd-3.4 uploaded:
http://infoether.com/~tom/pmd-3.4.jar
Yours,
Tom
Logged In: YES
user_id=5159
FYI, I blogged this here:
http://tomcopeland.blogs.com/juniordeveloper/2005/12/using_regular_e.html
Yours,
Tom
Logged In: YES
user_id=154590
Actually my sample comes from the site
http://www.regular-expressions.info/
Logged In: YES
user_id=5159
Ah, OK, I'll tweak that blog entry then, thanks!
Tom
Logged In: YES
user_id=5159
Hi Siva -
One more change - Daniel Sheppard noted that this function
exists in XPath 2.0, where the spec calls it "matches":
http://tomcopeland.blogs.com/juniordeveloper/2005/12/using_regular_e.html
I think we should rename it from "regexp" to "matches" so as
to align with the spec... sound OK to you?
Yours,
Tom
Logged In: YES
user_id=5159
Hi Siva -
I went ahead and made this change and uploaded a new jar
file... so now this should work:
//ClassOrInterfaceDeclaration[matches(@Image, 'F?o')]
I'll go ahead and mark this 'pending',
Yours,
Tom
Logged In: YES
user_id=154590
Hi Tom-
Works fine and I have tested it with a couple of test cases..
regards
Siva
Logged In: YES
user_id=5159
Hi Siva -
Cool, thanks for the confirmation!
Yours,
Tom