Hi,

Just a note on the docs.

[1] collects all the docs.
[2] is an ongoing doc on database tables.
[3] is a guide for migration to SQLStore3
[4] is a guide to using fixed properties (introduced in Store3)
[5] was used to collect ideas for Store3, but some of its content might be wrong about the implemented design.


On Tue, Oct 9, 2012 at 8:15 PM, Markus Krötzsch <markus@semantic-mediawiki.org> wrote:
On 08/10/12 15:44, Yaron Koren wrote:
> Hi,
>
> I'm finally trying out SMWSQLStore3, and figuring out how to get my
> extensions compatible with it, so I'm looking at the new table structure
> for the first time.
>
> First of all, is there any documentation about the design decisions that
> went into this new structure? Because it could be that this issue has
> been answered already.

Hi Yaron,

I don't think we have a comprehensive documentation yet (Nischay may
correct me though). You already have a good idea of certain things; I
will try to further clarify this below (I think there are some
misunderstandings).


I am not sure what documentation is being referred here. Most of the work done in Store3
is below the abstraction layer that class SMWStore provides. All extensions should tend to use this abstraction.
Most issues were internal i.e. changes 'in' the methods to update data and shouldn't be a much of a concern for
other extensions. However, you can ask for more documentation if needed and I will try to write more :)

 
>
> If not - basically, the way the new database structure seem to work is
> that there's a separate table for each special property.

That is not the case. You are right that the new structure supports
tables that are used for one property only. This has some advantages:

* The property is determined from the table; no need to store the
property in each row -- less storage space, smaller indexes, less memory
* Changes for that property can be written independently of changes for
other properties. For example, modification date changes on each edit,
yet we do not need to update other properties/tables on each edit --
reduced write activity
* The table is smaller and more specific than general-purpose
property-value tables. This gives MySQL (or any DB) a better chance for
guessing selectivity when doing join optimisations in queries. -- faster
query execution

There is a limit on how many tables a DBMS is happy with, but 30 tables
are not a problem. Reading all data for one subject can become slower if
the data resides in many tables, but reading data for a particular
property should be faster, esp. if the respective property is used a lot
(small tables fit into memory).

Now to your original question: it is not the case that all special
properties have their own tables. The two things are independent:
special properties can be stored in the common catch-all tables and
normal properties can have their own table. It is just that SMW by
default provides special tables for its own special properties. This
includes SF properties that are heavily used (this dependency between
SMW and SF has been there in similar form for many versions; properties
used by SF all the time [on every page build] always had their datatype
hardcoded in SMW).

There is a mechanism for users to create tables for "important"
properties at their own discretion. I am not sure if this is documented yet.


see [4]
 
> "Creation date"
> and "Modification date" both have one, as do "Has improper value for",
> and probably some others. Interestingly, there are also tables for the
> Semantic Forms special properties "Has default form" and "Has alternate
> form".
>
> This strikes me as strange design, for a few reasons:
>
> - I assume that this is done to speed up querying; but, as far as I
> know, many of SMW's special properties - like "Creation date" and the
> rest - are rarely queried on.

See above. Querying is only one aspect. Putting rarely used/rarely
updated data into its own table allows this data to be disregarded in
many cases, thus taking load off the rest of the DB.

>
> - This could lead to an explosion of database tables - it looks like
> there are about 30 in the new structure, which some might consider an
> explosion already, and if there are a bunch more special properties
> added for metadata (like "last author", "first author", etc.) the number
> could just keep growing indefinitely.

30 should not be a problem for any DBMS, and tables are not added
automatically when special properties are added.

>
> - It's understandable that Semantic Forms would get special handling,
> but still, having one extension handle things for another introduces
> dependency issues. What if Semantic Forms got other special properties?
> Or what if, say, "Has alternate form" were removed? It could potentially
> lead to compatibility problems.

We can think about how to reduce this dependency. SF could probably
control the property tables it wants to use by itself. However, I would
recommend having tables for all of its frequently used properties.

>
> - On that note, SF already has two special properties that don't have
> their own table - "Page has default form" and "Creates pages with form",
> both of which can apply to forms just like "Has default form" and "Has
> alternate form" do. Not that I'm suggesting that SMW should get two more
> tables for these, but on the other hand, I believe SF queries on these
> on a fairly regular basis.

Then it would probably be good to have extra tables.

>
> So what could be done instead? I can think of a few options:
>
> - Store all properties, special and otherwise, in the same set of
> tables, and make better use of indexing to speed up queries.

We already make better use of indexing anyway, to the extent that we
know how ;-). But MySQL is mainly using table-based selectivity
measures, so that join optimisation is not very good if everything is
stored in only a few tables.

The other motive for having many tables is to split frequently changing
data from infrequently changing data to allow for better update control
(fewer writes, smaller writes).

>
> - Create tables for general use by all special properties. In the new
> design, SMW has a separate table for each property type, but for special
> properties I think there are really only three types that have been
> used: "Page", "Date" and "String". So it could be that only three
> special property tables would be required. Again, indexing could be used
> here.

That would introduce a distinction between special properties and normal
properties on the database level. We do not think that this is justified
(as you argued, there can well be many "special" properties such as
"first author" that are not so different from normal properties; storing
them in the same table as "has type" would mix up requirements).

>
> - If Semantic Forms specifically is slowing things down with its
> querying, maybe that extension should change its own handling. It
> doesn't necessarily even need to use SMW to store information on "Has
> default form" and the rest - it could use the standard MediaWiki
> page_props table. SF's special properties are never queried on alongside
> regular properties, so I don't think that would have a negative impact
> on users. That could be fairly easily done (although the use of SMW
> would be preserved for backwards compatibility, probably with the aid of
> a LocalSettings.php setting). I don't know if that would improve
> performance at all, but it might be worth trying out.

I don't know of any problem with SF. It would of course be faster if you
would store all SF data for one page in one value, instead of storing
independent values for many properties. But then you would no longer be
able to query for the pages that use a certain form with #ask. One could
have both (store a blob with all information in one value and store each
value independently for queries), but I am not sure that this would be
worth the effort.

>
> - For that matter, maybe many of the other special properties could be
> stored using the page_props table instead. SMW has always been used to
> store metadata just because it was convenient to do - but if that
> storage is leading to significant performance problems, maybe page_props
> is the faster and (at least to the extent that it reduces table clutter)
> easier alternative.

I am not aware of performance problems related to special property reads
in SMW. If SMW reads many special property values, then they come from
many pages; no single page has very many special property values. So
using page_props would probably not change much in terms of overall read
counts.

The new system also has a way to recognize when a property has no value
for a certain subject without looking at its table. So the common case
that a property has no value (e.g., no "allowed values") could be
optimised further (tbd). 
Markus


------------------------------------------------------------------------------
Don't let slow site performance ruin your business. Deploy New Relic APM
Deploy New Relic app performance management and know exactly
what is happening inside your Ruby, Python, PHP, Java, and .NET app
Try New Relic at no cost today and get our sweet Data Nerd shirt too!
http://p.sf.net/sfu/newrelic-dev2dev
_______________________________________________
Semediawiki-devel mailing list
Semediawiki-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/semediawiki-devel


[1] http://semantic-mediawiki.org/wiki/Category:Developer_documentation
[2]http://semantic-mediawiki.org/wiki/Database_tables
[3]http://semantic-mediawiki.org/wiki/SQLStore3_migration
[4]http://semantic-mediawiki.org/wiki/Fixed_properties
[5]http://semantic-mediawiki.org/wiki/SQLStore_update


--
Cheers,

Nischay Nahata
nischayn22.in