On Sonntag, 17. August 2008, Michael Dale wrote:
> I do sql queries that do selects with lots of results across several
> custom tables. To pull in semantic data I would add the following to
> mediaWikis database abstraction query variables:
> (assuming I have a bunch of mv_index_page results as part of a complex
> join query and wanted property like "spoken_by", or "other properties"
> that I could add conditions on I would add the following (where
> 'spoken_by' is the $prop variable
> $vars = $prop.object_title as $prop;
> $from_tables.=' ' .
> LEFT JOIN 'smw_relations'
> as $prop
> ON (mv_mvd_index.mv_page_id= $prop . subject_id
> $prop . relation_title = '$prop'
> This is a LOT more difficult to do with the new database structure....
> Any advice how I would approach this with the new db structure?
Okay, let's see. It's surely not so obvious any more, since the new DB is not
optimised for joins with MW page IDs at all.
The most straightforward solution (with the most complex query as a result) is
to add joins with further tables as needed. smw_rels2 has a triple structure
s_id, p_id, o_id, and instead of using direct property string names or MW
page ids, these ids now can only be resolved using the table smw_ids. The
latter realises a mapping between the store's ids and MediaWiki page names,
namespaces, and possibly interwiki names. No MediaWiki page id is tracked
there, so if your mv_mvd_index table only holds ids, then you will need
another join with page to get from names to ids. Probably it would be faster
to hold titles+namespaces in mv_mvd_index right away.
This is the canonical modification to use a DB query as before, and again it
depends completely on the SMW internal DB layout. It has many more joins, but
things might be not too bad, given that all joins with smw_ids join on a
single numeric primary key.
Another option should be considered as well: if the number of objects in
either table is typically very small (e.g. I am not sure how few objects you
have in mv_mvd_index), then one can realise the left join in PHP, iterating
through the result. This is never as fast as a direct SQL query, but it is
the solution that is most robust and requires least maintenance, since one
could use SMW API calls to do the individual checks, yielding complete
None of these options are really convincing to me. I think a better way to go
would be to move the relevant data (the data that is used to build
mv_mvd_index) into SMW, storing it there with the help of (predefined) SMW
properties. This must work somehow, since SMW is fairly universal in its data
model. The work then would be in extending the storage methods to make sure
SMW contains all relevant data. After this, the query part would no longer be
direct SQL, but merely an SMW query to the (current) SMW store, whatever it
It depends on the details of the data needed for mv_mvd_index whether or not
this would work well. I have to see more details here.
Finally, it would also be possible to move the construction of the above left
join into the SMW storage implementation. It is possible to subclass SMW
stores to extend/overwrite any of their functions, and such a new store can
easily be shipped and set by extensions. This would enable the tightest
integration and it is certainly the preferred way when private SQL queries
are really unavoidable. It still is easier to maintain because the extension
can presume a certain SMW store to be used as a basis for its customisation,
and it is clear that users cannot simply change the store without breaking
something. Also, minor changes in SMW's storage implementation are simply
inherited by the subclassed store. For instance, one could overwrite/extend
the initialisation and update methods of SMW to ensure that all data
for "special" properties (like spoken_by) is duplicated into another internal
table that is easier to use in the above SQL statement. See the current Halo
1.2 release for an example of subclassing another store (including error
messages when an admin tries to use another unsupported store).
There is yet another solution that requires work on our side: we could extend
the query API of SMW stores so that they are able to put query results into a
temporary DB table (returning its name), which might then be used for further
joins. Optionally this table could use MW page ids or title/namespace pairs.
This solution would be good as a general path for SQL-level extensions of
SMW, but it might force some inefficient behaviour for some SMW stores. It
also precludes some join-order-optimisations that could be possible when
everything is done by a single query.
> This SF.Net email is sponsored by the Moblin Your Move Developer's
> challenge Build the coolest Linux based applications with Moblin SDK & win
> great prizes Grand prize is a trip for two to an Open Source event anywhere
> in the world http://moblin-contest.org/redirect.php?banner_id=100&url=/
> Semediawiki-devel mailing list
Semantic MediaWiki http://semantic-mediawiki.org