Hi guys, i was reading SQL code (query.c), and TODO...
I think that mysqlfs developers don't know what is mysqlfs... is it a local filesystem or a network filesystem? (ok i know you know what's mysqlfs, but what's mysql idea? a network or a local database?)
It's very important question. a local filesystem can use shared memory (ext2), a network filesystem should use server based shared memory (smbfs, cifs, nfs). for local filesystem only we can use SQLITE, Berkley, others... for network filesystem we NEED a network database (mysql, oracle, postgres, firebird)... ok? mysql is fast because of MYISAM and fast protocol (we can use compress for a slow connection)
FOR SELECT OPTIMIZATIONS:
http://dev.mysql.com/doc/refman/5.0/en/select.html
When we need speed we could use SELECT optimizations...
for example at 'query.c':
all SELECT query, could use SELECT SQL_CACHE, all SELECT with small results can use SMALL_RESULT (inodes), all big result could use BIG_RESULT (data), if using mysql in local database we don't have network problems but using a network (TCP) we could use SQL_BUFFERED_RESULT (to allow read/write intensive and compressed protocol)
anothers optimizations... don't use NULL values, instead this use negative values, or fixed value (constants), for example for '/' we could use always -1 inode (this allow faster REPAIR TABLE, OPTIMIZE TABLE, CHECK TABLE queries, and allow small tables, since NULL is a `constant` value, we could use a CONSTANT value for each field type) for only unsigned column we can use a signed column and set negative values for NULL
Another interesting think... we should allow per ENGINE optimization... for example a using a INNODB optimizated query may be slow on MYISAM engine or an NDB or a ARIA or a PBXT, MEMORY, etc etc... what i sugest? MYISAM/ARIA (MYISAM with crash safe - see mariadb) or NDB (cluster filesystem!!!)
Some mysql (mariadb) tendencies...
MYISAM for not important tables (log/select intensive) (no crash safe, but can be used in filesystem with crash safe...) and table lock level
ARIA for important tables (with crash safe, can be used in ext2 filesystem) and page lock level
INNODB for important tables (with crash safe, can be used in devices without filesystem at partition or md (raid) devices) row lock level
NDB (cluster, distribuited, crash safe shared-nothing) very interesting... we HAVE A CLUSTER FILE SYSTEM!!!!
for FSCK code:
why not a
CHECK TABLE tables...
REPAIR TABLE tables...
for DELETE/PURGE code:
for every purge, or many deletes we SHOULD use OPTIMIZE TABLES tables...
a optimization but with 2 bytes per row (we are using 4096 block size), add at data_blocks table, the data_size field, at each UPDATE/INSERT we should set it as LENGTH(data), and at filesize we can use SUM(data_size) instead of: SELECT seq, data, LENGTH(data) FROM data_blocks, we could use SELECT seq,data,data_length
this is faster... (for a large number of files) since on each update/insert we can calculate it we will not have problem with non-deterministic values
we should but it in fsck too...
this could be translated to:
"UPDATE inodes SET size=("
"SELECT seq*%d + LENGTH(data) FROM data_blocks WHERE inode=%ld AND seq=("
"SELECT MAX(seq) FROM data_blocks WHERE inode=%ld"
")"
") "
"WHERE inode=%ld",
------
"UPDATE inodes SET size=("
"SELECT SUM(data_length) FROM data_blocks WHERE inode=%ld)"
") "
"WHERE inode=%ld",
another point....
since we use netowork system, TIME functions SHOULD be sincronized, we NEED a DATABASE SIDE DATE/TIME value for UPDATE atime, mtime, utime... Xtime inode values... (NOW() is a good function)
since we use network + database, we have two problems for block size... one is database harddisk (ssd) block size, most ssd use 4096 bytes, some use more, some less, but the main point here is: NETWORK block size, mysql have some variables at database side, i don't remember but i think it's network_buffer or somethink like it, maybe a ****per inode block size**** COULD be implemented?
some queries are very ugly and end with ';', ok it work...
delete from data_blocks where inode not in (select inode from inodes);
could be:
DELETE FROM data_blocks WHERE inode NOT IN (SELECT inode FROM inodes)
i don't understand this:
snprintf(sql, SQL_MAX, "select inode, sum(OCTET_LENGTH(data)) as size from data_blocks group by inode");
and after:
snprintf(sql, SQL_MAX, "update inodes set size=%ld where inode=%ld;", size, inode);
why not merge? and why use OCTET_LENGTH and LEGTH for data column?
check:
UPDATE inodes AS a SET size=( SELECT SUM(LENGTH(data)) FROM data_blocks WHERE inode=a.inode )
ok we can use data_length field... for example:
UPDATE data_blocks SET data_length=LENGTH(data_length);
UPDATE inodes AS a SET size=( SELECT SUM(data_length) FROM data_blocks WHERE inode=a.inode )
and why not LOCK tables when making a FSCK?
LOCK TABLE .... FOR WRITE
UNLOCK TABLES
I DIDN'T CHECKED LOCKS! we can lock a file?
what about using per row lock, page lock or table lock at mysql server side?
some updates should be transaction safe, for example update two data_block, should lock rows and after change it and unlock rows
for all engines- lock table, change data_block rows, unlock table
for row/page lock - we should check per engine lock:
check this INNODB locking:
SELECT counter_field FROM child_codes FOR UPDATE;
UPDATE child_codes SET counter_field = counter_field + 1;
http://dev.mysql.com/doc/refman/5.0/en/innodb-locking-reads.html
Note
Locking of rows for update using SELECT FOR UPDATE only applies when autocommit is disabled (either by beginning transaction with START TRANSACTION or by setting autocommit to 0. If autocommit is enabled, the rows matching the specification are not locked.
if we want implement a lock inside mysql, we should implement somethink like:
create table locks(
inode,
seq,
offset,
size,
connection_id,
uptime_at_lock_time,
LOCK_ID) ENGINE=memory or NDB or MYISAM (don't need to be fail safe if at fsck we TRUNCATE locks)
and at mysqlfs init function we could check SHOW PROCESSLIST check if connection_id is OK (a wathdog work better), and if uptime is bigger than current uptime, some UNIQUE lock time should be used too, maybe per mysqlfs mount point maybe LOCK_ID= option (per mysqlfs client lock)
the problem? with mysql we can't call another CLIENT with default mysql features (we can use plugin, ok dba don't like plugins...) but we could use a message table with a watchdog thread at mysqlfs side
create table watch_dog(
LOCK_ID,
expire_time
)
and a thread only to update it
UPDATE watch_dog SET expire_time=NOW() + some time... WHERE LOCK_ID=(mysqlfs option lock_id maybe MAC ADDRESS? a string or a int field?)
if the watch_dog isn't working we know that a lock is dead with:
LOCK TABLE locks...
DELETE FROM locks WHERE LOCK_ID IN (SELECT LOCK_ID FROM watch_dog WHERE expire_time<NOW())
SELECT COUNT(*) FROM locks WHERE inode=(INODE) AND seq IN (seqs to lock) LOCK_ID!=(CURRENT CONNECTION LOCK ID)
if COUNT(*)>0 we have a lock
if not we should lock it...
UNLOCK TABLES...
there's some ideas... but for a good and fast lock we could put it at data_blocks table... and a !=0 lock field with index...
nice?
some features are easy to implement, some not, but it's nice, not?
thanks guys, i'm using it at office for a cluster filesystem today (using NDB)! :D it works but it's not write intensive... just for reads
i could help at:
roberto at spadim dot com dot br
bye