Sounds like you are doing work very similar to mine. I'm also spending quite a bit of time on customization, upgrades, bug tracking, etc.
Our museum has in the range of 500,000 art objects. To say nothing of the number of artists, donors, materials used, locations, as well as all the data from OTHER museums that would be involved in data integration. PLUS all the relations between them.
I'm obviously nowhere near using all of those, and I split my data across multiple, purpose-oriented wikis. I've got wikis with 800, 1.5k, 26k, and 47k pages, and that's not even close to where I need to be. I'm currently running on VMs with 4 GB of RAM. On the larger wikis, the queries don't run so well, but at least I can view pages.
So when I talk about scaling, that's what I'm dealing with: potentially on the order of millions of pages. Or, at least, I need to know how high I can reasonably scale, and what hardware is involved.
If I knew that we could spend $X, and then actually be able to fully scale to millions of pages, with full querying capabilities, I could probably get that money, almost no matter how big X was. This institution finds plenty of money for less worthy projects.
But, if the performance degrades geometrically, then there's a hard limit to what we can do with any amount of money.
My problem is, I got my current servers for asking nicely, because they're VMs partitioned from a large resource. Without some better information on scaling, it's hard for me to get a real budget allocated to experiment.
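One rough way to get that information from the wikis already running, offered only as a sketch: time the same query on each wiki, then fit a power law (seconds = c * pages^k) to the measurements. The function names below are made up for illustration, and the power-law shape is itself an assumption; real SMW query cost may not follow it. An exponent k near 1 suggests roughly linear growth, which more hardware can absorb; an exponent well above 1 suggests the cost outruns the data.

```python
import math


def scaling_exponent(measurements):
    """Fit seconds = c * pages**k by least squares in log-log space.

    measurements: iterable of (page_count, query_seconds) pairs, all > 0.
    Returns (k, c). Hypothetical helper, not part of SMW or MediaWiki.
    """
    points = list(measurements)
    if len(points) < 2:
        raise ValueError("need at least two measurements")

    xs = [math.log(pages) for pages, _ in points]
    ys = [math.log(seconds) for _, seconds in points]
    n = len(points)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("page counts must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    k = sxy / sxx                       # slope = scaling exponent
    c = math.exp(mean_y - k * mean_x)   # intercept = constant factor
    return k, c


def predict_seconds(pages, k, c):
    """Extrapolate the fitted power law to a larger page count."""
    return c * pages ** k
```

Feed it one (page count, seconds) pair per wiki, measured with the same query on the same VM size, then call predict_seconds with the target page count. Four data points make for a very loose fit, so treat the extrapolation as a guide for what to test next rather than a number to put in a budget.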
In any case, I'm getting 3 more VMs shortly (probably need to buy the admin a six-pack of Chimay Blue), at which point I can play a bit more with configuration and performance. I'm NO sysadmin of any sort, just now learning how to tweak the MySQL server.
Which brings me to a question: say I've got two wikis of equal size, and two VMs, on the same physical hardware. Is it better to put both wikis on one machine and both DBs on the other, or to put each wiki with its DB on the same server? I think I just revealed the depth of my ignorance here, but oh well...
Any other easy tweaking advice would be really appreciated. I've played a bit with the index buffer size on MySQL. Any other ideas?
From: Laurent Alquier <firstname.lastname@example.org>
To: don undeen <email@example.com>
Sent: Thu, November 4, 2010 12:47:04 PM
Subject: Re: [SMW-devel] Interest in SMW in Museum/Cultural sector
I have been experimenting a lot with using SMW for Enterprise Data Integration in a production environment.
The good news is, like you, I am a convert to how flexible SMW is for data integration. At the moment, I am slowly pushing it for 1/ an enterprise semantic wikipedia and 2/ a gateway to author Linked Data without having to know anything about RDF and SPARQL (at least, as far as authors of content are concerned).
I have not yet reached the breaking point of SMW on our setup. We are using a 5-year-old server, with a load of about 50 unique visits a day, 4500 pages and about 25 active contributors. That's also good news: no need for top-of-the-line servers to get a decent amount of work done.
The bad news is that while SMW is indeed very cheap, it comes with the hidden cost of needing an almost dedicated resource to tune it, improve it and make it successful. I am glad that I can patch the code and mold it to the way I want it to work, but it still requires a lot of attention.
We are using highly customized skins and custom code to improve on existing extensions or simply to make them work in our environment. The result is a very streamlined experience for users, but a lot of work behind the scenes as we make it grow.
On Thu, Nov 4, 2010 at 12:21 PM, don undeen <firstname.lastname@example.org> wrote:
I just thought I'd give you an update on some presentations I did this past week in the museum/cultural sector on SMW, and the excellent response I've been receiving there.
The first talk I gave was at the Museum Computer Network Conference, MCN2010. The title of the talk was "Semantic Mediawiki for Easy Data Integration." My general thesis was that a combination of templates, forms, auto-page-creating properties, and the ExternalData extension can allow us to integrate data from a variety of sources, without doing any up-front database design or data mapping. The talk was short (5 minutes), so I don't have many slides. But after the talk there was a table session where I answered questions from those in the audience who were interested (they had 4 topics to choose from). My table was by far the most populated; I'd say about 30 people from museums and cultural institutions around the world were very interested, and engaged me with challenging questions regarding this approach.
Museums have a big job trying to communicate information with each other regarding their objects; a typical project is very task-specific (say, a particular website or research project), and involves massive data transfer, as well as data standards negotiation, target database design, data mapping, and vendor relationships. Lots of museums don't have the resources to do it at all. They really liked the idea that an SMW installation could be a low barrier-to-entry "sandbox" where they could put all this data together and see how well it integrates.
I didn't try to say "this is your data integration app," but rather "this is a tool that lets you SEE the data AS you're doing the planning."
The reason I played down the enterprise-level quality of SMW is that frankly I don't have good data on scalability. Museums have a LOT of data, and I don't have the resources here to really scale my wiki out that large. Plus I haven't resolved some bugs, usability, and UI issues; that may be my fault (not keeping up with all the upgrades), or rough edges in SMW.
Just letting you all know that the interest is there in this industry, for this problem space. If someone can demonstrate scalability and a streamlined user experience, they could really do good work with museums.
I'll be putting together a more detailed document on how I set all this up.
Also, I went to a smaller meeting in Washington DC, to discuss the development of a documentation management system for Museum Conservators. Another international group, with funding for development. In my own museum, I've been using SMW for rapid prototyping and data staging. I'll be using it in this project for data staging; but I don't feel that SMW is flexible enough in the UI to use it for app prototyping for this project. Also, there are a few conflicts between this software's projected workflow and what SMW supports. Again, maybe this will turn out not to be true. But in any case, I'll be putting the SMW software in front of these heavy-duty knowledge workers (conservators are actually scientists with heavy data analysis needs). I'll let you know what kind of feedback I get from them.
Again, I'll be putting together a doc (a different one, focused on prototyping) that describes this software setup in more detail.
Anyway, just thought I'd let you know that there's now this new group of people out there who are using the term "Semantic Wiki" with a gleam in their eyes. :) And hopefully my institution will be allocating more of my time to work with SMW.
Let me know if you've got any questions.
cheers all, and thanks for making me look good!
Metropolitan Museum of Art
Semediawiki-devel mailing list
- Laurent Alquier
http://www.linfa.net