From: Arno H. <aho...@in...> - 2000-11-07 11:06:52
|
I think we should rethink the wikiscore table.

While the metric used may be quite useful in a web-like environment, within a wiki it is questionable. The problem is that people usually sign their contributions with their WikiName, and thus users' wiki homepages get a very high wikiscore. And generally speaking, homepages are not that important. So the wikiscore metric fails.

For more info see MeatballWiki
http://www.usemod.com/cgi-bin/mb.pl?IndexingScheme
and look at MostReferencedPages / MostLinkedPages / ShortestPathPages.

ShortestPath seems to be the most interesting, but too expensive to compute (unless someone comes up with a good incremental algorithm).

Thus I suggest we change the following for 1.2.0:
* drop the wikiscore table
* reduce related pages to incoming/outgoing links, sorted by hitcount

Btw, some people find the terms "incoming/outgoing links" confusing. Is there a better way to describe these?

/Arno |
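A toy illustration of the failure mode Arno describes, assuming wikiscore behaves roughly like an inbound-link count (the page names and link data below are hypothetical, not from any real wiki):

```python
from collections import Counter

# Hypothetical link table: (from_page, to_page) pairs, as a wiki
# backend might store them. Because people sign their contributions,
# nearly every discussion page links to the author's homepage.
links = [
    ("WhyWikiWorks", "ArnoHollosi"),   # signature link
    ("ReleasePlan", "ArnoHollosi"),    # signature link
    ("FrontPage", "ArnoHollosi"),      # signature link
    ("FrontPage", "DesignPatterns"),
    ("WhyWikiWorks", "DesignPatterns"),
]

# A naive "wikiscore": rank pages by how many pages link to them.
score = Counter(to for _, to in links)
for page, n in score.most_common():
    print(page, n)
# The personal homepage tops the ranking purely because of
# signature links, even though the content pages matter more.
```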
From: Steve W. <sw...@wc...> - 2000-11-09 04:23:29
|
On Tue, 7 Nov 2000, Arno Hollosi wrote:

> I think we should rethink the wikiscore table.
>
> While the used metric may be quite useful in a web-like environment, within
> a wiki it is questionable. The problem is that people usually sign their
> contributions with their WikiName, and thus user's wiki-homepages get a
> very high wikiscore. And generally speaking homepages are not that
> important. So the wikiscore metric fails.
>
> For more info see MeatballWiki
> http://www.usemod.com/cgi-bin/mb.pl?IndexingScheme
> and look at MostReferencedPages / MostLinkedPages / ShortestPathPages

(The following started as a reply but becomes more and more pedantic as it goes, so I apologize for the tone... but sometimes you work these things out in your head as you write.)

This relates to the Semantic Web article on the O'Reilly Network (http://www.xml.com/pub/2000/11/01/semanticweb/index.html?wwwrrr_rss). The problem is that Wiki does not distinguish between pages: all pages are the same and have equal meaning, more or less. We are trying to make Wiki (the program, the machine) give meaning to pages created by humans. So we try graphing problems:

* how many pages does this page link to?
* how many pages link to this one?
* how many times has this page been edited?
* when was this page last edited?
* how long ago was this page created?
* how many times has this page been viewed?
* how many degrees of separation are there between this page and that page?
* how many link paths are there from this page to that page (if they do not link directly)?
* what is the shortest path from this page to that page?
* what pages have names similar to this one?

But the machine cannot know that ArnoHollosi is a personal page while WhyWikiWorks is a discussion, that DesignPatterns describes an abstract concept and DesignPatternsBook describes a textbook published in 1994. So these approaches (and several more listed by Nicolas Roberts) have shortcomings.
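Several of the graph questions above (degrees of separation, shortest path) reduce to breadth-first search over the link graph. A minimal sketch, using hypothetical link data rather than any real wiki's tables:

```python
from collections import deque

# Hypothetical wiki link graph: page -> pages it links to.
graph = {
    "FrontPage": ["DesignPatterns", "WhyWikiWorks"],
    "DesignPatterns": ["DesignPatternsBook"],
    "WhyWikiWorks": ["DesignPatterns"],
    "DesignPatternsBook": [],
}

def shortest_path(start, goal):
    """Breadth-first search; returns the shortest chain of links
    from start to goal, or None if no path exists."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in graph.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

print(shortest_path("FrontPage", "DesignPatternsBook"))
# -> ['FrontPage', 'DesignPatterns', 'DesignPatternsBook']
```

A single query like this is cheap; the expense Arno mentions comes from maintaining such answers for *all* page pairs as the wiki is edited, which is why an incremental algorithm would be needed.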
On c2.com they introduced CategoryDesignPatterns and TopicExtremeProgramming (the Category- and Topic- prefixes) to get around this problem; in the large, then, any Wiki needs a WikiLibrarian, someone who sorts, labels and classifies information. An Information Architect. With a Wiki, everyone who edits pages has to be a WikiLibrarian; i.e. it's a community effort.

So we come back to another problem (really, an interesting aspect/feature but also a usability problem) of a WikiWikiWeb: a lot of the organization of the information is by social contract. People agree to social conventions like adding a Category- link at the bottom of their page. I read an article on Lotus Notes recently that reinforced a perception I've had for the last year, ever since I read Jon Udell's "Practical Internet Groupware": it's really hard to get people to adopt new social conventions, or in the case of Notes, learn new ways of doing things. It took email a relatively long time to make it into the workplace because people were used to phones, voice mail, faxes, post-it notes and so on. (Granted, email penetrated the workplace pretty fast for a new technology, but then most businesses today still don't have email. My corner deli doesn't.)

So we can perhaps write a set of guidelines for using a Wiki, include it in pgsrc/, and trust the universe. We can provide a certain number of clues to the user through hitcount, wikiscore and so on. But I think our current model is limited to just that (and any groupware system, ultimately, is too). I remember reading Steven Levy's "Hackers: Heroes of the Computer Revolution," and one of the MIT hackers from the 1960's later remarks that he couldn't believe what they were trying to do on the hardware of the era; he felt they had been naive about what they could accomplish on a PDP-11. Perhaps we are looking at the limitations of a Wiki as well.

> ShortestPath seems to be most interesting but too expensive to compute.
> (unless someone comes up with a good incremental algorithm)

If I read the page correctly it's the Traveling Salesman problem :-)

> Thus I suggest we change the following for 1.2.0:
> * drop wikiscore table
> * related pages are reduced to incoming/outgoing links which are sorted by
> hitcount.

Now that I've read your code (finally!), and really understand what it's doing, I think we might want to keep it after all. All the count (the number in the parentheses) is, is:

* select the incoming links for this page and rank them by how many pages link to each
* select the outgoing links for this page and rank them by how many pages link to each

but then again, I might be confused once again. I confess that ever since you added these, I have to stop and reason out what it is they are doing... which leads to your next question:

> Btw, some people find the terms "incoming/outgoing links" confusing. Is
> there a better way to describe these?

They confuse me too :-) Let me see if I get it right:

----
Five pages that link to this one, that are themselves linked to the most by other pages:

Five pages this links to, ranked by how many pages link to those pages:

Five most popular pages that either link to this one or are linked to by this one:
----

I think the trouble is we don't know why ErikBagfors links to the PhpWiki page. We can guess, is all. I like the idea more and more of conventions like Category- and Topic-. No one can sort info like a human. We can alternately provide tools to try to help users be good WikiLibrarians, and then the machine can infer meaning from the metadata they provide. That has its own pitfalls, like trying to get everyone to use the same keywords for <META> tags (so then search engines can infer from that too). Rather than think about how to make PhpWiki compute meaningful numbers based on the data, I am going to doodle for a while and think about what information is going to help users find useful information in a Wiki.
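The ranking described above might be sketched like this; note this is a reconstruction of the idea under my reading, not PhpWiki's actual code, and the link data is hypothetical:

```python
from collections import Counter

# Hypothetical link table: (from_page, to_page) pairs.
links = [
    ("ErikBagfors", "PhpWiki"),
    ("FrontPage", "PhpWiki"),
    ("PhpWiki", "WikiWikiWeb"),
    ("PhpWiki", "ReleaseNotes"),
    ("FrontPage", "WikiWikiWeb"),
    ("ErikBagfors", "WikiWikiWeb"),
    ("FrontPage", "ErikBagfors"),
]

# Popularity of every page: how many pages link to it
# (the number shown in the parentheses).
popularity = Counter(to for _, to in links)

def incoming(page):
    """Pages linking to `page`, ranked by their own inbound-link count."""
    pages = {src for src, dst in links if dst == page}
    return sorted(pages, key=lambda p: -popularity[p])

def outgoing(page):
    """Pages `page` links to, ranked by how many pages link to them."""
    pages = {dst for src, dst in links if src == page}
    return sorted(pages, key=lambda p: -popularity[p])

print(incoming("PhpWiki"))   # -> ['ErikBagfors', 'FrontPage']
print(outgoing("PhpWiki"))   # -> ['WikiWikiWeb', 'ReleaseNotes']
```

In a SQL backend both queries would be a join of the link table against itself with a GROUP BY and ORDER BY on the count, which is cheap enough to keep.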
That's what this is all about, in the end. Right off, a more sophisticated search engine comes to mind.

The Meatball Wiki is quite interesting.

cheers
sw

...............................ooo0000ooo.................................
Hear FM quality freeform radio through the Internet: http://wcsb.org/
home page: www.wcsb.org/~swain |