From: James H. <ja...@is...> - 2003-08-07 13:33:57
Hi everyone.

I'm going to be doing the modelling code, and while I was getting to know the project, I thought I'd take the liberty of cleaning up names.xml into proper XML with a DTD (document type definition) so it validates. I uploaded it to the patches area ( http://sourceforge.net/tracker/index.php?func=detail&aid=784360&group_id=67048&atid=516699 ), partly because I don't have CVS access and partly because I don't know the project well enough to make fairly major changes like that.

There are a few things I'd like to note:

1. XML is case- and whitespace-sensitive. The author of this file has done a good job here, so they obviously know this. Nonetheless I think it's worth pointing out in any introduction to XML, because most people don't expect it to be either (I guess because HTML isn't).

2. XML comments. XML comments go like this: <!-- this is a comment -->. Technically they're a little more involved than this (for one thing, "--" may not appear inside a comment), but everyone I know thinks that using complex comments is a bad idea. C and C++ comments are not valid.

3. Elements vs. attributes. There are no hard and fast rules on when to use attributes and when to use elements. (An element is a tag, <element>, and an attribute is a key-value pair in the element, <element key="value">.) I think it makes sense to use an attribute rather than an element only when it is a fundamental property of the element that is guaranteed to have only one value. For example, in an XML format for email, <to> would be a bad candidate for an attribute, as you often want to send an email to more than one person. On the other hand, in an XML format for bottles of wine, the vintage would be a good candidate: <wine vintage="1986">.

Here's a rundown of the things I changed in names.xml (roughly in order from top to bottom):

1. Put a DTD in. A document needs a DTD before it can be validated (without one it can only be checked for well-formedness). The DTD serves roughly the same purpose as a C/C++ header file.

2. Put a root element in. All XML documents require exactly one root element.
I decided to make this <database>, with an attribute called "type". This way we can write a tool that can operate on any of our XML files, check what type they are, and display them appropriately.

3. Separated the document out into two logical sections: a name database and country rules. There didn't seem to me to be any reason to define names inside continents. English names are English names, wherever you are. If America uses different names, then they should be called American names.

4. Removed the quantity attribute on group. I couldn't see any point to it.

Here are some things I'd like to do with names.xml, but didn't:

1. Improve the type-checking in the DTD. Most of the data isn't checked at all, and it could be. For example, <probability group="country"> could be checked to make sure that country is one that has been defined in <namegroups>.

2. Change the probability structures. At the moment they're very difficult to parse. However, as the document is just a place to put names and there are no tools that use it, this wasn't necessary.

3. Write a tool so that designers don't have to edit the XML manually. It's my opinion that the only people who ever see text files should be programmers; everyone else should have tools.

4. Add a namespace. All our XML should be put in our own namespace, eventually.

5. (And I'm an idiot for forgetting to do this) Put a version attribute in the <database> element.

If you're interested in learning more about XML, DTDs and so forth, there are good tutorials at www.w3schools.com. The parser I used to check names.xml is at http://www.stg.brown.edu/service/xmlvalid/.

james.
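
To illustrate the idea of a tool that reads the type attribute on <database>, and the reference check the DTD could be doing, here is a minimal Python sketch. Only <database type="...">, <namegroups>, <group> and <probability group="..."> come from the description above. The id attribute, the <name>, <countries> and <country> elements, the sample data and both function names are assumptions made up for the example, not the real names.xml layout. Python's xml.etree.ElementTree does not validate against a DTD, so the reference check is done by hand here; a validating parser could enforce the same thing through the ID/IDREF declarations.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document. The DTD is embedded as an internal subset;
# ElementTree reads past it but does not validate against it.
SAMPLE = """<?xml version="1.0"?>
<!DOCTYPE database [
  <!ELEMENT database (namegroups, countries)>
  <!ATTLIST database type CDATA #REQUIRED>
  <!ELEMENT namegroups (group+)>
  <!ELEMENT group (name+)>
  <!ATTLIST group id ID #REQUIRED>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT countries (country+)>
  <!ELEMENT country (probability+)>
  <!ATTLIST country name CDATA #REQUIRED>
  <!ELEMENT probability (#PCDATA)>
  <!ATTLIST probability group IDREF #REQUIRED>
]>
<!-- this is a comment -->
<database type="names">
  <namegroups>
    <group id="english"><name>Alice</name><name>Robert</name></group>
  </namegroups>
  <countries>
    <country name="England">
      <probability group="english">1.0</probability>
    </country>
  </countries>
</database>
"""


def database_type(xml_text):
    """Return the type attribute of the <database> root element."""
    root = ET.fromstring(xml_text)
    if root.tag != "database":
        raise ValueError(f"unexpected root element <{root.tag}>")
    return root.get("type")


def undefined_groups(xml_text):
    """Return group names used by <probability> but never defined by <group>."""
    root = ET.fromstring(xml_text)
    defined = {g.get("id") for g in root.iter("group")}
    used = {p.get("group") for p in root.iter("probability")}
    return sorted(used - defined)


if __name__ == "__main__":
    print(database_type(SAMPLE))
    print(undefined_groups(SAMPLE))
```

A generic tool could call database_type first and pick a display routine based on the result, then report anything returned by undefined_groups as an error.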