Author: Robert Gaffney
Date: 07/16/2013
Copyright (c) 2013, Robert Gaffney
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
1. Redistributions of source code must retain the above copyright notice, this
list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR
ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
Brief Purpose: Use a user-created text file to generate XML lists of games using a
master XML list for use in HyperSpin (or other front-ends).
Background: HyperSpin (and possibly other front-ends) presents the user with a list
of games for each platform selected. This list is populated from an XML file.
HyperList (http://hyperlist.hyperspin-fe.com/) provides users with XML master
lists; however, these lists are usually extremely large, more cumbersome than
helpful for most users, and painful to edit line by line. Users can optionally
press a button to bring up a list of genres; selecting a genre pulls up all the
games associated with it, also populated from an XML list.
Again, these lists are difficult to edit by hand, where they exist at all.
Solution: This program allows the user to create a text file used to generate the
desired master game list and genre lists for each platform, as well as the XML
list of genres. Each line is a regular expression that is used to match against
the description tag of the games in the master XML file. Each match line in the
text file can match against multiple games. The user can also apply options to a
match line to further zero in on the specific matches desired. In addition to
plain match lines, the user can declare genres; the match lines listed under
each genre are used to populate that genre.
Implementation Details: For performance, as the master XML file is
parsed, each game's description has all special characters replaced by spaces
and each resulting word is placed in a treemap that links to the games containing
that keyword. This allows a greedy search in which each match line in the user
input file is tried only against games that share keywords
derived from the match line. The ---regex option forces the line to be matched
against every game in the master XML file.
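The keyword lookup described above can be sketched roughly as follows. This is
an illustration only, not the program's actual code: the function names are
hypothetical, a plain Python dict stands in for the treemap, and the sketch
assumes a game becomes a candidate when it shares at least one keyword with the
match line.

```python
import re
from collections import defaultdict

def keywords(text):
    """Replace every non-alphanumeric run with a space, then split into lowercase words."""
    return re.sub(r"[^0-9A-Za-z]+", " ", text).lower().split()

def build_index(descriptions):
    """Map each keyword to the set of game descriptions that contain it."""
    index = defaultdict(set)
    for descr in descriptions:
        for word in keywords(descr):
            index[word].add(descr)
    return index

def candidates(index, match_line):
    """Return the games sharing at least one keyword with the match line (assumed rule)."""
    found = set()
    for word in keywords(match_line):
        found |= index.get(word, set())
    return found

games = ["Star Wars", "Star Trek", "Pac-Man", "Ms. Pac-Man"]
index = build_index(games)
shortlist = candidates(index, "star")
```

Only the games in the shortlist would then be tested against the full regular
expression, which is where the time saving comes from.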
Matching the main regular expression of a match line against the description
tag value is case insensitive. Genre/Manufacturer option searches are case sensitive.
Options passed as arguments when running the program are passed down to genres
where applicable, but any option given on a genre line in the input file
overrides a conflicting command-line option. Similarly,
options in the genre are passed down to each match line under the genre, but are
overridden by any conflicting options on the match line.
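The precedence rule above amounts to layering three sets of options, with the
most specific layer winning. A minimal sketch, assuming options are held as
simple name/value pairs (the function name and the dict representation are
hypothetical, and the "where applicable" filtering is left out):

```python
def effective_options(run_opts, genre_opts, line_opts):
    """Layer the options: command line first, then genre line, then match line."""
    merged = dict(run_opts)      # least specific
    merged.update(genre_opts)    # genre line overrides command line
    merged.update(line_opts)     # match line overrides genre line
    return merged

run_opts = {"sort": "descr", "noClones": True}
genre_opts = {"sort": "year", "year": "1980-1984"}
line_opts = {"year": "1982"}
result = effective_options(run_opts, genre_opts, line_opts)
```

Here the genre's "sort year" replaces the command-line "sort descr", and the
match line's single year replaces the genre's year range.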
The program will output a file named the same as the master XML file input,
but with "new" added to the name. So "originalList.xml" would result in the
creation of "originalListnew.xml". Each genre listed in the input text file is
output to a file of that name. So +++Some Genre+++ in the input file results in
"Some Genre.xml" being output. Additionally, "Genres.xml" is output, containing
the list of all genres.
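The naming scheme can be illustrated with a short sketch. The helper names are
hypothetical; the behaviour is taken only from the two examples given above.

```python
import os
import re

def full_list_name(master_xml):
    """Insert "new" before the extension: originalList.xml -> originalListnew.xml."""
    root, ext = os.path.splitext(master_xml)
    return root + "new" + ext

def genre_file_name(genre_line):
    """Take the text between the +++ markers and append .xml."""
    name = re.match(r"\+\+\+(.+?)\+\+\+", genre_line.strip()).group(1)
    return name.strip() + ".xml"

full_name = full_list_name("originalList.xml")
genre_name = genre_file_name("+++Some Genre+++")
```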
If the same genre is listed twice (or more) in the input file, the lists of
matched games are merged into a single output file. This can be used to
advantage by applying different genre options to each listing. For example:
+++Genre of Star Games+++ ---manufacturer .*Sega.*
star
+++Genre of Star Games+++ ---year 1980-1984
star
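The merge of repeated genre declarations might look something like the sketch
below. It assumes each declaration has already produced its own list of matched
games; the function name and the placeholder game titles are made up for
illustration.

```python
def merge_genres(sections):
    """Combine the match results of every declaration that shares a genre name."""
    merged = {}
    for name, games in sections:
        # Later declarations of the same genre append to the same output list.
        merged.setdefault(name, []).extend(games)
    return merged

sections = [
    ("Genre of Star Games", ["Star Game A", "Star Game B"]),  # e.g. the manufacturer pass
    ("Genre of Star Games", ["Star Game C"]),                 # e.g. the year-range pass
    ("Other Genre", ["Game D"]),
]
output_files = merge_genres(sections)
```

Each key of the result corresponds to one output file, so the two passes end up
in a single "Genre of Star Games.xml".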
By default, a full list is created in addition to a file for each genre listed.
This can be overridden with command-line arguments listed below. The full list
is the compilation of every match line listed in the file, so it contains every
game in all the genre files. If you need a different full list than the
compilation of each genre, you can list additional matches before the first
genre declaration. Or you can run the program once with the -noFullList option
on one input file, then with the -noGenres option on another one.
By default, each output file is sorted by the game description. In this case,
duplicates are not listed more than once in the output file. The user has the
option to sort by other game attributes. In other sorting schemes, duplicates
are listed as they appear.
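The difference between the default sort and the other sort keys can be sketched
as below. The function name and the dict-per-game representation are
assumptions, as is the choice to keep the first of several duplicates.

```python
def sort_games(games, key="descr"):
    """Sort matched games; only the default description sort drops duplicates."""
    if key == "false":
        return list(games)                      # keep input file order
    if key == "descr":
        first_seen = {}
        for game in games:
            first_seen.setdefault(game["descr"], game)  # keep first duplicate (assumed)
        return sorted(first_seen.values(), key=lambda g: g["descr"])
    return sorted(games, key=lambda g: g[key])  # duplicates stay in the list

games = [
    {"descr": "Pac-Man", "year": "1980"},
    {"descr": "Asteroids", "year": "1979"},
    {"descr": "Pac-Man", "year": "1980"},
]
by_descr = sort_games(games)
by_year = sort_games(games, "year")
```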
When performing a match, if the matching engine does not find any keywords
in the regex that match against the keywords extracted from the game xml or
the match line fails to match a single game returned by the greedy keyword
search, the engine "falls back" to running the match line regex on every game
from the xml file. So something like "colou?r" wouldn't break the engine.
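A rough sketch of the fallback, under the same assumptions as any keyword-based
shortlist: names are hypothetical, the game titles are invented, and for brevity
the keywords are recomputed per game rather than looked up in an index.

```python
import re

def keywords(text):
    """Replace non-alphanumeric runs with spaces and split into lowercase words."""
    return re.sub(r"[^0-9A-Za-z]+", " ", text).lower().split()

def match_with_fallback(games, match_line):
    """Try the keyword shortlist first; if it yields nothing, scan every game."""
    pattern = re.compile(".*" + match_line + ".*", re.IGNORECASE)
    wanted = set(keywords(match_line))
    shortlist = [g for g in games if wanted & set(keywords(g))]
    hits = [g for g in shortlist if pattern.fullmatch(g)]
    if hits:
        return hits
    # Fallback: no keyword overlap or no shortlist match, so test all games.
    return [g for g in games if pattern.fullmatch(g)]

games = ["Color Quest", "Colour Quest", "Star Trek"]
found = match_with_fallback(games, "colou?r")
```

The match line "colou?r" yields the keywords "colou" and "r", neither of which
appears in any description, so the shortlist is empty and the full scan finds
both spellings.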
By default the given regex on the match line is surrounded by ".*", so that it
operates like a keyword match, not a full match. The ---strict option can be
used for a full match.
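The effect of the default wrapping and of ---strict can be shown with a small
sketch. The function name is hypothetical; the loose behaviour is taken from the
usage text (spaces replaced by ".*").

```python
import re

def compile_match_line(text, strict=False, loose=False):
    """Build the regex for a match line according to the strict/loose options."""
    if loose:
        text = text.replace(" ", ".*")   # ---loose: spaces become wildcards
    if not strict:
        text = ".*" + text + ".*"        # default: behave like a keyword match
    return re.compile(text, re.IGNORECASE)

default_hit = compile_match_line("fighter").fullmatch("Street Fighter II")
strict_hit = compile_match_line("fighter", strict=True).fullmatch("Street Fighter II")
loose_hit = compile_match_line("street ii", loose=True).fullmatch("Street Fighter II")
```

With the default wrapping "fighter" matches anywhere in the description; with
strict it must be the whole description, so the second call finds nothing.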
Anything in parentheses is removed from the game's description tag for the
purposes of matching. So "Game (World, set 2)" is searched as "GAME". This
prevents mismatches due to region, version, revision, set, hack, bootleg, etc.
I intend to improve this later on by adding an option to match region,
version/revision, set, bootleg/hack.
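Stripping the parenthesised parts could be done along these lines. This is a
sketch with a hypothetical function name; the upper-casing simply mirrors the
"GAME" example above.

```python
import re

def searchable(description):
    """Drop every parenthesised group, then normalise case for matching."""
    without_parens = re.sub(r"\s*\([^)]*\)", "", description)
    return without_parens.strip().upper()

result = searchable("Game (World, set 2)")
```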
Donations appreciated:
https://www.paypal.com/us/cgi-bin/webscr?cmd=_flow&SESSION=_FYwMKVEKmqcrBNVTyg171LeBlaQufXo7Epo-tsz2f8cF2V1QOl_BgAFf1e&dispatch=5885d80a13c0db1f8e263663d3faee8d4e181b3aff599f99a338772351021e7d
usage:
HSxmlMaker.exe path regexFilename xmlFilename
Run-level Options:
-noFullList Only create genre XML files
-noGenres Only create one XML file with all roms
Genre-level Options - passed to each genre unless overridden
-createMissing [false] If no XML line matches regex, new xml line is created
-noClones [false] Exclude Clones from output XML Files
-sort [name|descr|genre|manufacturer|year|rating|false] Output according to key (false for input file order)
-numbered [false|desc] Add ascending [or descending] nbr to game descr; ex. top 10
Match Line-level options - passed to each match line unless overridden
-noseq [false] Do not match sequels (has 0-9, incl roman numerals, in title)
-nomatchclone [false] Do not match clones
-loose [false] Replace all spaces with wildcard ".*"
-strict [false] Do not add leading and trailing wildcards ".*"
-any [false] Match if any of given keywords match
-unique [false] Find best match only
-regex [false] Process as regex against all Game Descriptions
-genre genre_regex|false Genre tag must match regex
-manufacturer manufacturer_regex|false Manufacturer tag must match regex
-year year_to_match|min_year-max_year|false Year tag must be = or in range
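As an illustration of the three forms accepted by the year option, a check could
be sketched like this. The function name is hypothetical, and the sketch assumes
the range is inclusive and that a non-numeric year tag never matches.

```python
def year_matches(option, year_tag):
    """Test a game's year tag against 'false', a single year, or 'min-max'."""
    if option == "false":
        return True                      # option switched off: accept any year
    try:
        year = int(year_tag)
        if "-" in option:
            low, high = option.split("-", 1)
            return int(low) <= year <= int(high)   # inclusive range (assumed)
        return year == int(option)
    except ValueError:
        return False                     # non-numeric year tag (assumed no match)

in_range = year_matches("1960-1982", "1980")
```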
Example input file:
+++Sega Only+++ ---manufacturer .*Sega.*
.* ---regex
+++Arcade+++
pac[ _-]?man
+++Retro+++ ---year 1960-1982
.* ---regex
+++Bullet Hell+++ ---genre Shoot.*
.* ---regex ---manufacturer .*Cave.*
+++Ambiguous Shooters+++ ---genre Shooter
star
pilot
battle
+++Originals Only+++ ---noseq
gradius
street fighter
+++Sort by Year+++ ---sort year
street fighter
+++Top 3+++ ---numbered ---sort false ---unique
ms.*pac[ _-]?man
asteroids
missile command
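To make the layout of the input file concrete, here is a minimal sketch of how
such a file could be read into genres, options and match lines. It is not the
program's actual parser: the names are hypothetical, and it assumes "---" never
appears inside a pattern and that an option with no value is a simple flag.

```python
import re

GENRE_LINE = re.compile(r"^\+\+\+(.+?)\+\+\+(.*)$")

def split_options(text):
    """Split 'pattern ---opt value ---flag' into (pattern, {opt: value, flag: True})."""
    head, *rest = text.split("---")
    options = {}
    for chunk in rest:
        name, _, value = chunk.strip().partition(" ")
        options[name] = value.strip() or True
    return head.strip(), options

def parse_input(lines):
    """Group match lines under the genre declared above them."""
    # The first section (genre None) holds match lines before any genre.
    sections = [{"genre": None, "options": {}, "matches": []}]
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        declared = GENRE_LINE.match(line)
        if declared:
            _, options = split_options(declared.group(2))
            sections.append({"genre": declared.group(1).strip(),
                             "options": options, "matches": []})
        else:
            sections[-1]["matches"].append(split_options(line))
    return sections

sample = [
    "+++Sega Only+++ ---manufacturer .*Sega.*",
    ".* ---regex",
    "+++Top 3+++ ---numbered ---sort false ---unique",
    "asteroids",
]
parsed = parse_input(sample)
```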