I am interested in adding some semantic web goodness to the SEO-friendly parts of this application.
The article.pl script generates an HTML page for a single article, and the articlerss.pl script generates a sitemap for the entire site. The links in the articlerss.pl output point to the pages generated by article.pl. This is all the SEO support the project currently has.
The way I see it, articlerss.pl should be enhanced to include RDF Site Summary modules, which just means adding new tags to the existing channels. Also, article.pl should be changed to generate HTML with the appropriate microformats. Yahoo has just announced that it will support hCard, hCalendar, hReview, hAtom, and XFN, along with vocabulary components from Dublin Core, Creative Commons, FOAF, GeoRSS, and MediaRSS.
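To make the microformat idea concrete, here is a minimal sketch of the kind of hAtom markup article.pl could emit for one article. It is written in Python purely for illustration (the project itself is Perl), and the function name, its parameters, and the example URL are assumptions, not part of the existing code. The class names (hentry, entry-title, entry-content, updated, author vcard, fn) and rel="bookmark" come from the hAtom and hCard microformats.

```python
from html import escape

def render_hatom_entry(title, url, author, updated_iso, body_html):
    """Wrap one article in hAtom markup (hypothetical helper, not in the project).

    body_html is assumed to be already-rendered, trusted HTML from the
    content markup parser, so it is inserted without escaping.
    """
    return (
        '<div class="hentry">\n'
        f'  <h2 class="entry-title"><a href="{escape(url)}" rel="bookmark">'
        f'{escape(title)}</a></h2>\n'
        # The author is expressed as a minimal embedded hCard.
        f'  <span class="author vcard"><span class="fn">{escape(author)}</span></span>\n'
        # The abbr title carries the machine-readable timestamp.
        f'  <abbr class="updated" title="{escape(updated_iso)}">{escape(updated_iso)}</abbr>\n'
        f'  <div class="entry-content">{body_html}</div>\n'
        '</div>'
    )

if __name__ == "__main__":
    print(render_hatom_entry(
        "Semantic web & SEO",
        "http://example.com/article.pl?id=1",  # made-up URL
        "Jane Doe",
        "2008-03-15T12:00:00Z",
        "<p>Article body goes here.</p>",
    ))
```

The point of the sketch is that hAtom only adds class and rel attributes to markup the page already produces, so the change to article.pl should be mostly a templating matter.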
I wish Google would come out with a statement on what it will support, since it is the 900-pound gorilla of the search space.
Which extensions I implement will depend on how widely search engines adopt these standards. After all, why bother with an attribute that no one is looking for?
Sitemap extensions will require more fields in article maintenance. Microformats will require changes to the content markup parser. Right now, the markup uses square brackets for links, curly braces for images, asterisks for bullet points, and plus signs for paragraphs. We will need to extend that to support whatever microformats are chosen.
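As a rough illustration of the parser work involved, the sketch below converts the four existing markers to HTML. It is in Python rather than the project's Perl, and the exact syntax is an assumption: the real parser's rules are not spelled out here, so this guesses that a link is written as [url text], an image as {url}, and that asterisks and plus signs appear at the start of a line. All function names are hypothetical.

```python
import re
from html import escape

# Assumed syntax: [url link text] for links, {url} for images.
LINK_RE = re.compile(r"\[([^\s\]]+)\s+([^\]]+)\]")
IMAGE_RE = re.compile(r"\{(\S+?)\}")

def render_inline(text):
    """Escape the text, then expand links and images within one line."""
    text = escape(text)
    text = LINK_RE.sub(r'<a href="\1">\2</a>', text)
    text = IMAGE_RE.sub(r'<img src="\1" alt="">', text)
    return text

def render(markup):
    """Render line-oriented markup: '*' starts a bullet, '+' a paragraph."""
    out, in_list = [], False
    for line in markup.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("*"):
            if not in_list:
                out.append("<ul>")
                in_list = True
            out.append(f"<li>{render_inline(line[1:].strip())}</li>")
            continue
        if in_list:
            out.append("</ul>")
            in_list = False
        if line.startswith("+"):
            line = line[1:].strip()
        out.append(f"<p>{render_inline(line)}</p>")
    if in_list:
        out.append("</ul>")
    return "\n".join(out)

if __name__ == "__main__":
    print(render("+See [http://example.com the site]\n*first\n*second"))
```

With a structure like this, supporting a microformat would likely mean adding one more marker and one more substitution rule, for example a rule that wraps a person's name in hCard classes. Which marker to use for that is still an open design question.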
The search facility (searcharticles.pl) will also need to be enhanced to support these new fields.
I would be very interested in your feedback on how you would like to see this project enhanced to support semantic web searching.
Proposed ERD
Logged In: YES
user_id=1576538
Originator: YES
I think we can cover the sitemap extensions simply by adding an Atom syndication page that uses the parsed keywords from the article table as the term attribute of the category tag. See the attached PNG, which is an ERD for the proposed new database. The wiki-esque formatting stays, but the different content types are separated out into new tables: lists, links, images, and paragraphs are each represented as such in the database. Tables are also added for hCard and FOAF content.
File Added: web2newsportalERD.png
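The Atom idea above can be sketched as follows: each article becomes an entry, and each parsed keyword becomes a category element with the keyword as its term attribute. This is Python for illustration only (the project is Perl); the function name, its parameters, and the sample values are assumptions, and the keywords are assumed to arrive as an already-split list.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
ET.register_namespace("", ATOM_NS)  # serialize Atom as the default namespace

def atom_entry(title, link, entry_id, updated, keywords):
    """Build one Atom entry element (hypothetical helper, not in the project)."""
    entry = ET.Element(f"{{{ATOM_NS}}}entry")
    ET.SubElement(entry, f"{{{ATOM_NS}}}title").text = title
    ET.SubElement(entry, f"{{{ATOM_NS}}}link", href=link)
    ET.SubElement(entry, f"{{{ATOM_NS}}}id").text = entry_id
    ET.SubElement(entry, f"{{{ATOM_NS}}}updated").text = updated
    # One category element per keyword, with the keyword as its term.
    for keyword in keywords:
        ET.SubElement(entry, f"{{{ATOM_NS}}}category", term=keyword)
    return entry

if __name__ == "__main__":
    entry = atom_entry(
        "Semantic web and SEO",
        "http://example.com/article.pl?id=1",  # made-up URL
        "http://example.com/article.pl?id=1",
        "2008-03-15T12:00:00Z",
        ["seo", "rdfa"],
    )
    print(ET.tostring(entry, encoding="unicode"))
```

A complete feed would wrap these entries in a feed element with its own title, id, and updated values.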
This might provide more information relevant to SEO.
http://googleblog.blogspot.com/2008/05/introduction-to-google-search-quality.html
Google has published a "tech talk" video promoting RDFa.
http://www.youtube.com/watch?v=mxE3FeOyS-E
Here is a tutorial.
http://www.w3.org/TR/xhtml-rdfa-primer/
Google has announced support for microformats and RDFa.
http://radar.oreilly.com/2009/05/google-announces-support-for-m.html
So far, they are focusing on reviews and people.
http://google.com/support/webmasters/bin/answer.py?hl=en&answer=99170
Here is some more in-depth information.
http://google.com/support/webmasters/bin/answer.py?hl=en&answer=146898
On this page, Google publishes some of the RDFa that it looks for.
http://googlewebmastercentral.blogspot.com/2009/05/introducing-rich-snippets.html
Also, this landing page for Yahoo SearchMonkey might be relevant.
http://developer.yahoo.com/searchmonkey/