File Date Author Commit
 examples 2013-07-16 dececco [r1] Initial commit, v1
 jars 2013-07-16 dececco [r1] Initial commit, v1
 scripts 2013-07-16 dececco [r1] Initial commit, v1
 src 2013-07-16 dececco [r1] Initial commit, v1
 COPYING 2013-07-16 dececco [r1] Initial commit, v1
 Manifest.txt 2013-07-16 dececco [r1] Initial commit, v1
 README.txt 2013-07-16 dececco [r1] Initial commit, v1
 build.xml 2013-07-16 dececco [r1] Initial commit, v1

Read Me

XMLFIND

xmlfind is a tool, written in Java, that extracts information from an XML file in a format
compatible with the other UNIX text tools, i.e. text lines split into fields.

Please note that the focus is on keeping the code small (currently 700 lines) and making extensions simple,
not on performance; it works fine on many small files, but certain command combinations can
easily have a cost that is combinatorial in the size of the XML tree. YMMV.

REQUISITES

You need Java installed, version 1.6 or later.

INSTALLATION

Check out the sources and compile them by running ant in the xmlfind directory.
Copy scripts/xmlfind to a directory included in your shell execution path,
such as ~/bin.

Edit the xmlfind script so that the XMLFINDDIR variable points to the directory
where you checked out the source tree.

USAGE

The xmlfind syntax is:

xmlfind [<commands>]* [ - | files ... ]

The defined commands are the following (a detailed explanation follows):


-path <path>
-match <path> <value>
-not-match <path> <value>
-element [ <tag> | <path> ]
-includes [ <tag> | <path> ]
-not-includes [ <tag> | <path> ]
-print <path>
-separator <separator>
-echo  <label>
-print-filename
-print-path

xmlfind applies the commands to all the files given as arguments; if the '-' argument is specified,
it takes the list of files to process from the standard input.

This is very convenient when you need to use the UNIX find command to
produce the list of files; for example, to process all the pom files of a project
you can run:

find <projectdir> -name "pom.xml" -print | xmlfind ... commands ...  -

The xmlfind commands work a bit like those of the find command, with the difference that they
refer to the nodes of the XML tree, and not to files in the filesystem.
The commands are pipelined: each command acts on a node of the XML tree, and passes one or more
nodes to the following commands for processing.

The first command acts on the root of the XML tree.

While executing, xmlfind builds up an output line non-deterministically; the line being built is printed each
time the last command on the command line is executed on a node.
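
The pipelining described above can be illustrated with a small sketch. It is not
xmlfind's code: the Command and Next interfaces and the echo, children and printText
helpers are hypothetical names, it uses the JDK's built-in DOM rather than JDOM,
and it needs Java 8 or later for the lambdas. It only shows how one output line can be
produced each time the last command has run on a node.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class PipelineSketch {

    // Hands a node and the fields collected so far to the rest of the pipeline.
    interface Next {
        void accept(Element node, List<String> fields);
    }

    // Hypothetical command shape; xmlfind's real classes may look different.
    interface Command {
        void apply(Element node, List<String> fields, Next next);
    }

    // Like -echo: append a fixed label to a copy of the line, then continue.
    static Command echo(String label) {
        return (node, fields, next) -> {
            List<String> extended = new ArrayList<>(fields);
            extended.add(label);
            next.accept(node, extended);
        };
    }

    // Pass every child element with the given tag to the next command.
    static Command children(String tag) {
        return (node, fields, next) -> {
            for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
                if (c instanceof Element && ((Element) c).getTagName().equals(tag)) {
                    next.accept((Element) c, fields);
                }
            }
        };
    }

    // Append the trimmed text content of the current node to a copy of the line.
    static Command printText() {
        return (node, fields, next) -> {
            List<String> extended = new ArrayList<>(fields);
            extended.add(node.getTextContent().trim());
            next.accept(node, extended);
        };
    }

    // Run command i on a node; once the last command has run, emit the line.
    static void run(List<Command> commands, int i, Element node,
                    List<String> fields, List<String> output) {
        if (i == commands.size()) {
            output.add(String.join(" ", fields));
            return;
        }
        commands.get(i).apply(node, fields, (n, f) -> run(commands, i + 1, n, f, output));
    }

    // Parse the XML text and run the whole pipeline from the root element.
    static List<String> process(String xml, List<Command> commands) throws Exception {
        Element root = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
        List<String> output = new ArrayList<>();
        run(commands, 0, root, new ArrayList<>(), output);
        return output;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<list><item>a</item><item>b</item></list>";
        List<Command> commands = Arrays.asList(echo("demo"), children("item"), printText());
        for (String line : process(xml, commands)) {
            System.out.println(line); // prints "demo a" then "demo b"
        }
    }
}
```

Each command works on a copy of the line, so a field added for one node never leaks into
the line built for a sibling node.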

To clarify, let us look at some commands.

-print-filename

Adds the name of the current file to the output line.

-path <path>

The -path command produces (passes to the next command) all the nodes that are reachable from the current node
by following the specified path.

For example, in a pom file, the command:

 -path project/dependencies/dependency

applied to the root of the file, will invoke the following commands on all the "dependency" nodes.
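
As a rough illustration of the path following described above, here is a minimal sketch
using the JDK's built-in DOM (not JDOM, and not xmlfind's own code). The class and method
names are hypothetical. It assumes, based on the examples in this file, that the first
path segment names the current node itself and that each later segment selects child
elements by tag name.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class PathSketch {

    // Parse an XML string and return its root element.
    static Element parseRoot(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
    }

    // Return every element reached from 'current' by following 'path'.
    // Assumption: the first path segment names the current node itself.
    static List<Element> follow(Element current, String path) {
        String[] segments = path.split("/");
        List<Element> frontier = new ArrayList<>();
        if (current.getTagName().equals(segments[0])) {
            frontier.add(current);
        }
        // Each further segment replaces the frontier with the matching children.
        for (int i = 1; i < segments.length; i++) {
            List<Element> next = new ArrayList<>();
            for (Element e : frontier) {
                for (Node c = e.getFirstChild(); c != null; c = c.getNextSibling()) {
                    if (c instanceof Element
                            && ((Element) c).getTagName().equals(segments[i])) {
                        next.add((Element) c);
                    }
                }
            }
            frontier = next;
        }
        return frontier;
    }

    public static void main(String[] args) throws Exception {
        String pom = "<project><dependencies>"
                + "<dependency><artifactId>junit</artifactId></dependency>"
                + "<dependency><artifactId>commons-httpclient</artifactId></dependency>"
                + "</dependencies></project>";
        for (Element dep : follow(parseRoot(pom), "project/dependencies/dependency")) {
            System.out.println(dep.getTextContent()); // junit, then commons-httpclient
        }
    }
}
```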

-print <path>

Adds the value of the path to the current output line.
The path is matched starting from the current node.
The path can end with an element name or an attribute name; if it ends with an attribute name,
its value is the value of the named attribute on the last node of the path;
otherwise it is the text content of the last element of the path.

For example, 

-print project/modelVersion 

applied to the root of a pom.xml file, will add to the output line the version number of the pom model used.

-print dependency/artifactId 

applied to a node of type 'dependency' will add to the output line its artifactId.

A more complex example:

xmlfind -print-filename -print project/modelVersion -path project/dependencies/dependency -print dependency/artifactId -

Run after a find looking for poms in a given directory, it may for example produce the following output:

./prototypeProxy/pom.xml 4.0.0 junit
./prototypeProxy/pom.xml 4.0.0 commons-httpclient
./prototypeProxy/pom.xml 4.0.0 selenium-framework

The output includes a line for every node found by the -path command, and a field for every print command, including -print-filename.
This output can, for example, be sorted on the third column to group all the files or subprojects that use a given dependency.

The other defined commands are, for now:

-element [ <tag> | <path> ]

Searches under the current node for all elements with the given tag (name); if a path is specified, an element must
match the last name in the path, and its ancestors must match the rest of the path.

For example, 

-element  dependencies/dependency

will match the dependency nodes of the project but also those of the plugins.
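
The difference from -path can be sketched as follows. This is only an illustration built on
the JDK's DOM, with hypothetical names, and not xmlfind's implementation: it collects
all descendants with the last tag of the path and keeps those whose parent chain matches the
remaining segments.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ElementSketch {

    // Parse an XML string and return its root element.
    static Element parseRoot(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
    }

    // True when the ancestors of 'e' match the path segments before the last one,
    // read from right to left (parent first).
    static boolean ancestorsMatch(Element e, String[] segments) {
        Node n = e.getParentNode();
        for (int i = segments.length - 2; i >= 0; i--) {
            if (!(n instanceof Element)
                    || !((Element) n).getTagName().equals(segments[i])) {
                return false;
            }
            n = n.getParentNode();
        }
        return true;
    }

    // All descendants of 'current' whose tag is the last segment of 'path'
    // and whose ancestors match the rest of the path.
    static List<Element> findElements(Element current, String path) {
        String[] segments = path.split("/");
        NodeList candidates = current.getElementsByTagName(segments[segments.length - 1]);
        List<Element> result = new ArrayList<>();
        for (int i = 0; i < candidates.getLength(); i++) {
            Element e = (Element) candidates.item(i);
            if (ancestorsMatch(e, segments)) {
                result.add(e);
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String pom = "<project>"
                + "<dependencies><dependency>a</dependency></dependencies>"
                + "<build><plugins><plugin>"
                + "<dependencies><dependency>b</dependency></dependencies>"
                + "</plugin></plugins></build>"
                + "</project>";
        // Finds both the project-level and the plugin-level dependency.
        for (Element e : findElements(parseRoot(pom), "dependencies/dependency")) {
            System.out.println(e.getTextContent()); // a, then b
        }
    }
}
```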


-separator <separator>

Changes the output field separator, for the following commands, from a blank to the string specified as argument.

For example:

find . -name "pom.xml" -print | xmlfind -print-filename -print project/modelVersion -separator ':' -element dependencies/dependency -print dependency/artifactId -

will print something like the following:

./project/pom.xml 4.0.0:junit
./project/pom.xml 4.0.0:commons-httpclient
./project/pom.xml 4.0.0:selenium-framework
.......

-up

Moves to the parent node; useful for searching for a sibling of the current node.

-echo <label>

Adds the given label to the output line; convenient, for example, when you want to aggregate the output of multiple
xmlfind executions and need to identify the source of each line.

-includes [ <tag> | <path> ]
-no-includes [ <tag> | <path> ]

This is a filtering command: if the element identified by the argument (same rules as for the -element command) is (not) present among the
descendants of the current node, the current node is processed further by the following commands; otherwise it is discarded.
This command is useful to select elements that have a certain structure.

-match <path> <value>
-no-match <path> <value>

This is a filtering command: it looks up the value of the path starting from the current node (using the rules of the -print command), and if
this value (with leading and trailing spaces trimmed) matches (or does not match) the provided value, the current node is processed further by the following
commands.
This command is useful to select a specific instance of an element (for example, identify all the pom files that use 
a given plugin).
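
The filtering decision of the match command and its negated form can be sketched like this.
The names are hypothetical and several points are assumptions, not facts about xmlfind: the
comparison is taken to be plain string equality, only element paths (no attributes) are
handled, only the first matching child is followed at each step, and a path that cannot be
followed is treated as "no match".

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class MatchSketch {

    // Parse an XML string and return its root element.
    static Element parseRoot(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
    }

    // First child element of 'parent' with the given tag, or null if there is none.
    static Element firstChild(Element parent, String tag) {
        for (Node c = parent.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c instanceof Element && ((Element) c).getTagName().equals(tag)) {
                return (Element) c;
            }
        }
        return null;
    }

    // Trimmed text found at 'path', or null when the path cannot be followed.
    // Assumption: the first path segment names the current node itself.
    static String valueAt(Element current, String path) {
        String[] segments = path.split("/");
        if (!current.getTagName().equals(segments[0])) {
            return null;
        }
        Element e = current;
        for (int i = 1; i < segments.length && e != null; i++) {
            e = firstChild(e, segments[i]);
        }
        return e == null ? null : e.getTextContent().trim();
    }

    // True when the current node should be passed on to the following commands.
    // Assumption: "match" means string equality after trimming.
    static boolean keep(Element current, String path, String expected, boolean negate) {
        boolean equal = expected.equals(valueAt(current, path));
        return negate ? !equal : equal;
    }

    public static void main(String[] args) throws Exception {
        Element plugin = parseRoot(
                "<plugin><artifactId>\n  maven-jar-plugin\n</artifactId></plugin>");
        System.out.println(keep(plugin, "plugin/artifactId", "maven-jar-plugin", false)); // true
        System.out.println(keep(plugin, "plugin/artifactId", "maven-jar-plugin", true));  // false
    }
}
```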

-print-path

Adds the path of the current node to the current output line.
So, for example, to find out where an element called 'name' is in the XML tree of a file, the following command can be helpful:

xmlfind -element name -print-path

EXTENSIONS

The xmlfind command set can be easily extended by adding a Java class for the new command; the commands act on a JDOM representation
of the XML file.

Commands that can be implemented are for example:

-up  step up to the parent 
-print-count print the number of children of the current node
... and so on ...
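
As an idea of how small the core of such a command can be, here is a sketch of the counting
step behind a -print-count command. It is only an illustration: it uses the JDK's DOM
instead of JDOM, the class and method names are hypothetical, and it assumes that
"children" means child elements (text nodes are not counted). How the class would be
registered with xmlfind is not shown, because that depends on the actual source tree.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class PrintCountSketch {

    // Parse an XML string and return its root element.
    static Element parseRoot(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
    }

    // Count the direct child elements of 'node'; text and comments are ignored.
    static int countChildElements(Element node) {
        int count = 0;
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c instanceof Element) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        Element deps = parseRoot(
                "<dependencies><dependency/><dependency/><dependency/></dependencies>");
        System.out.println(countChildElements(deps)); // prints 3
    }
}
```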

