From: Michal B. <mic...@ge...> - 2011-04-05 08:11:53
Hi Pedro!

It may happen that after a power failure the last changelog is broken. You need to find the last line of the changelog (usually changelog.0.mfs) and delete it. Generally speaking, after a power failure it is better to use the metadata files from the metalogger rather than running mfsmetarestore on the master server.

We have made some improvements to metarestore in the next development version; it is now more fail-proof against these kinds of errors. We added a '-b' option which forces writing the resulting metadata file up to the first encountered error (as "better such than none"). If you would like to test it, we can send you the metarestore sources before we officially publish 1.6.21.

Kind regards
Michał Borychowski
MooseFS Support Manager

Gemius S.A.
ul. Wołoska 7, 02-672 Warszawa
Budynek MARS, klatka D
Tel.: +4822 874-41-00
Fax : +4822 874-41-01

-----Original Message-----
From: Pedro Naranjo [mailto:pe...@st...]
Sent: Monday, March 21, 2011 11:20 PM
To: moo...@li...
Subject: [Moosefs-users] Power failure... unable to restore metadata.mfs

Hi there,

We ran another test today after losing about 3TB+ of data when we could not restore the metadata.mfs file. We were in the process of copying files over when the power was cut off. This led to an error when running the mfsmetarestore -a command, as follows:

3605: '|' expected

We really feel that MooseFS is the best solution we could find, but after failing to recover from something as real as a power failure, I really worry.

Please advise how to fix this problem.

Sincerely,

Pedro Naranjo / STL Technologies / Solutions Architect / 888.556.0774
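A minimal sketch of the last-line repair Michal describes, assuming the default metadata directory /var/lib/mfs (your DATA_PATH may differ); keep a backup before editing:

    # cd /var/lib/mfs
    # cp changelog.0.mfs changelog.0.mfs.bak   # keep a safety copy first
    # tail -n 1 changelog.0.mfs                # inspect the (possibly truncated) last entry
    # sed -i '$d' changelog.0.mfs              # delete the last line in place
    # mfsmetarestore -a                        # retry the automatic restore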
From: Michal B. <mic...@ge...> - 2011-04-05 07:49:10
Hi!

We tried to recreate this error but we couldn't. Could you run your metalogger under valgrind? Possibly it would write some interesting information. Or could you send us a core dump?

Kind regards
Michał Borychowski
MooseFS Support Manager
Gemius S.A.

-----Original Message-----
From: Boyko Yordanov [mailto:b.y...@ex...]
Sent: Sunday, March 20, 2011 1:19 PM
To: moo...@li...
Subject: [Moosefs-users] mfsmetalogger segfaults

Hello!

I've been using moosefs for a while. I have 3 metadata backup loggers running. I noticed that if I kill the mfsmaster process on the master node (simulating a power failure), mfsmetalogger crashes (segfault) on the metadata logger node. Here are the log entries:

Mar 20 11:45:35 server110 mfsmetalogger[6546]: metadata downloaded 72105B/0.009982s (7.224 MB/s)
Mar 20 11:45:35 server110 mfsmetalogger[6546]: changelog_0 downloaded 0B/0.000001s (0.000 MB/s)
Mar 20 11:45:35 server110 mfsmetalogger[6546]: changelog_1 downloaded 164193B/0.015491s (10.599 MB/s)
Mar 20 11:45:35 server110 mfsmetalogger[6546]: sessions downloaded 3050B/0.001501s (2.032 MB/s)
Mar 20 11:46:03 server110 mfsmetalogger[6546]: sessions downloaded 3050B/0.001497s (2.037 MB/s)
Mar 20 11:47:00 server110 mfsmetalogger[6546]: sessions downloaded 3050B/0.001246s (2.448 MB/s)
Mar 20 11:48:48 server110 mfsmetalogger[6546]: sessions downloaded 3050B/0.001009s (3.023 MB/s)
Mar 20 11:48:48 server110 mfsmetalogger[6546]: connection was reset by Master
Mar 20 11:49:00 server110 mfsmetalogger[6546]: connecting ...
Mar 20 11:49:00 server110 mfsmetalogger[6546]: connection failed, error: ECONNREFUSED (Connection refused)
Mar 20 11:49:05 server110 mfsmetalogger[6546]: connecting ...
Mar 20 11:49:05 server110 mfsmetalogger[6546]: connection failed, error: ECONNREFUSED (Connection refused)
Mar 20 11:49:06 server110 kernel: mfsmetalogger[6546]: segfault at 0000000000000060 rip 000000318c26119d rsp 00007fff2f368170 error 4

From another metadata logger:

Mar 20 13:33:00 server102 mfsmetalogger[5088]: sessions downloaded 3388B/0.000993s (3.412 MB/s)
Mar 20 13:34:00 server102 mfsmetalogger[5088]: sessions downloaded 3388B/0.001000s (3.388 MB/s)
Mar 20 13:35:00 server102 mfsmetalogger[5088]: sessions downloaded 3388B/0.001000s (3.388 MB/s)
Mar 20 13:35:48 server102 mfsmetalogger[5088]: connection was reset by Master
Mar 20 13:35:50 server102 mfsmetalogger[5088]: connecting ...
Mar 20 13:35:50 server102 mfsmetalogger[5088]: connection failed, error: ECONNREFUSED (Connection refused)
Mar 20 13:35:55 server102 mfsmetalogger[5088]: connecting ...
Mar 20 13:35:55 server102 mfsmetalogger[5088]: connection failed, error: ECONNREFUSED (Connection refused)
Mar 20 13:35:56 server102 kernel: mfsmetalogger[5088]: segfault at 0000000000000060 rip 0000003c6386119d rsp 00007fff7d13a7d0 error 4
Mar 20 13:37:23 server102 mfsmetalogger[12676]: set gid to 502
Mar 20 13:37:23 server102 mfsmetalogger[12676]: set uid to 502
Mar 20 13:37:23 server102 mfsmetalogger[12676]: connecting ...
Mar 20 13:37:23 server102 mfsmetalogger[12676]: open files limit: 5000
Mar 20 13:37:23 server102 mfsmetalogger[12676]: connected to Master
Mar 20 13:37:23 server102 mfsmetalogger[12676]: metadata downloaded 72113B/0.013963s (5.165 MB/s)
Mar 20 13:37:23 server102 mfsmetalogger[12676]: changelog_0 downloaded 981876B/0.086934s (11.294 MB/s)
Mar 20 13:37:23 server102 mfsmetalogger[12676]: changelog_1 downloaded 164193B/0.015978s (10.276 MB/s)
Mar 20 13:37:23 server102 mfsmetalogger[12676]: sessions downloaded 3388B/0.001993s (1.700 MB/s)
Mar 20 13:39:00 server102 mfsmetalogger[12676]: sessions downloaded 3388B/0.002965s (1.143 MB/s)
Mar 20 13:40:00 server102 mfsmetalogger[12676]: sessions downloaded 3388B/0.001986s (1.706 MB/s)
Mar 20 13:41:00 server102 mfsmetalogger[12676]: sessions downloaded 3388B/0.000991s (3.419 MB/s)
Mar 20 13:41:23 server102 mfsmetalogger[12676]: connection was reset by Master
Mar 20 13:41:25 server102 mfsmetalogger[12676]: connecting ...
Mar 20 13:41:25 server102 mfsmetalogger[12676]: connection failed, error: ECONNREFUSED (Connection refused)
Mar 20 13:41:30 server102 mfsmetalogger[12676]: connecting ...
Mar 20 13:41:30 server102 mfsmetalogger[12676]: connection failed, error: ECONNREFUSED (Connection refused)
Mar 20 13:41:31 server102 kernel: mfsmetalogger[12676]: segfault at 0000000000000060 rip 0000003c6386119d rsp 00007fff5207ee20 error 4

Both machines are running CentOS 5.5, x86_64, mfs-1.6.20-2; same for the master.

Also, not sure if related, but while running tests - killing the mfsmaster process and trying to restore from a metadata logger - sometimes I am unable to create the metadata.mfs data file, getting the following message:

[root@server102 mfs]# mfsmetarestore -a -d /var/lib/mfs
file 'metadata.mfs.back' not found - will try 'metadata_ml.mfs.back' instead
loading objects (files,directories,etc.) ... ok
loading names ... ok
loading deletion timestamps ... ok
checking filesystem consistency ... ok
loading chunks data ... ok
connecting files and chunks ... ok
hole in change files (entries from 791301 to 791305 are missing) - add more files

I wonder why these entries are missing. As the mfsmetalogger process crashes after the mfsmaster process is killed, can this be related? (BTW, I'm building the metadata.mfs file as suggested by Michal Borychowski in another email, regarding a bug in moosefs when using snapshots.)

I can't tell for sure, but I think that if I clear the /var/lib/mfs folder (delete all the logs/files) and then start mfsmetalogger clean, there are no issues when restoring metadata.mfs - all goes fine (at least for the 10 times I've tried so far). So the 'add more files' errors may be related to having old changelogs in /var/lib/mfs - can anyone confirm this?

Is anyone having similar issues?

Thanks a lot!
Boyko
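A sketch of the diagnostics Michal asks for. It assumes mfsmetalogger can be kept in the foreground with -d and is installed at /usr/sbin/mfsmetalogger (both assumptions - check your build), and that core dumps are enabled on the box:

    # ulimit -c unlimited                              # allow core dumps in this shell
    # valgrind --log-file=metalogger.vg /usr/sbin/mfsmetalogger -d
    # gdb /usr/sbin/mfsmetalogger core.<pid>           # inspect the core after a segfault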
From: Michal B. <mic...@ge...> - 2011-04-05 07:30:20
Hi!

Ad 1. Officially we do not support HP-UX, but the only reason for this is that we do not have machines with this platform for running tests. We'd be very happy to announce that MooseFS is HP-UX compatible.

Ad 2. For the moment, yes, the clients need to use FUSE. In the future some kind of conversion to NFS (probably NFSv4) should be created.

Ad 3. Some time ago we created a plugin for NFSv3, but it would need some improvements. I guess we could create something like this specially for you - please drop me an email if you are interested.

Kind regards
Michał Borychowski
MooseFS Support Manager
Gemius S.A.

> Hi, I want to ask you these questions, please answer me:
> 1. Does MooseFS support HP-UX IA64?
> 2. Must the client use FUSE?
> 3. Our clients must use HP-UX IA64 - what should I do?
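For reference, mounting through FUSE on a supported client is a one-liner (a sketch; "mfsmaster" is an assumed hostname resolving to your master server):

    # mfsmount /mnt/mfs -H mfsmaster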
From: Fyodor U. <uf...@uf...> - 2011-04-04 20:11:18
On 04/04/2011 10:51 PM, Fyodor Ustinov wrote:
> Hi.
>
> ceph osd pool set data size 1
> dd if=/dev/zero of=aaa bs=1024000 count=4000
> 4096000000 bytes (4.1 GB) copied, 31.3153 s, 131 MB/s
>
> ceph osd pool set data size 2
> 4096000000 bytes (4.1 GB) copied, 72.7146 s, 56.3 MB/s
>
> ceph osd pool set data size 3
> 4096000000 bytes (4.1 GB) copied, 136.263 s, 30.1 MB/s
>
> Why? I thought an increase in the number of copies should increase performance (or in the worst case not affect it).
>
> WBR,
> Fyodor.

Oops - not about moose. :) I'm testing ceph and moosefs simultaneously; this one is about ceph. :)

About moosefs - bonnie++ shows a dramatic slowdown on the rewrite test... :(
From: Fyodor U. <uf...@uf...> - 2011-04-04 20:11:18
Hi.

ceph osd pool set data size 1
dd if=/dev/zero of=aaa bs=1024000 count=4000
4096000000 bytes (4.1 GB) copied, 31.3153 s, 131 MB/s

ceph osd pool set data size 2
4096000000 bytes (4.1 GB) copied, 72.7146 s, 56.3 MB/s

ceph osd pool set data size 3
4096000000 bytes (4.1 GB) copied, 136.263 s, 30.1 MB/s

Why? I thought an increase in the number of copies should increase performance (or in the worst case not affect it).

WBR,
Fyodor.
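One plausible explanation (my reasoning, not from the thread): with synchronous replication the client must push every byte to all N replicas, so write throughput scales at best as 1/N of the single-copy rate, which roughly fits the figures above:

    # echo "scale=1; 131/2" | bc    # 65.5 MB/s predicted vs 56.3 measured (size 2)
    # echo "scale=1; 131/3" | bc    # 43.6 MB/s predicted vs 30.1 measured (size 3)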
From: g. <guj...@ge...> - 2011-04-02 02:27:32
Dears:

Thank you very much for your help. I solved the problem. Thanks, jose maria!!

2011-04-02

Best Regards!!

Juby Gu (古举标), Storage Engineer, Systems Support Group, Information Production Platform
Mobile: 13723406010  QQ: 190247054  Email: guj...@ge...
BGI (Beijing Genomics Institute)
Addr: No.1001, Floor 10, Main Building, Beishan Industrial Zone, Yantian District, Shenzhen, China
Post Code: 518083

From: jose maria
Sent: 2011-04-02 01:29:29
To: moosefs-users
Subject: Re: [Moosefs-users] help: mfsfileinfo: operation not permitted.

On Fri, 01-04-2011 at 18:22 +0800, 古举标 wrote:
> [...]

* Execute the commands on files, and use mfsdirinfo on a directory.
* /mnt/mfs is a local directory.
From: jose m. <let...@us...> - 2011-04-01 17:28:52
On Fri, 01-04-2011 at 18:22 +0800, 古举标 wrote:
> Dears:
>
> When I use the "mfsfileinfo" and "mfscheckfile" commands, an error occurs. Below is the error message; please help, thanks a lot.
> #/usr/local/mfs/bin/mfsfileinfo /mnt/mfs
> /mnt/mfs [0]: Operation not permitted.
> #/usr/local/mfs/bin/mfscheckfile /mnt/mfs
> /mnt/mfs [0]: Operation not permitted.
>
> #df -Th | grep mfs
> mfs#mfsmaster:9421 fuse 13G 2.4M 13G 1% /mnt/mfs
>
> #ll /mnt/
> drwxrwxrwx 2 root root 0 Mar 30 12:51 mfs
>
> OS: CentOS release 5.4, kernel 2.6.18-164.el5, 32-bit
> MooseFS version: mfs-1.6.13
> FUSE version: fuse-2.8.3

* Execute the commands on files, and use mfsdirinfo on a directory.
* /mnt/mfs is a local directory.
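To spell out jose maria's advice: mfsfileinfo and mfscheckfile take files inside the mount, while mfsdirinfo takes directories. For example (the file name below is hypothetical):

    # /usr/local/mfs/bin/mfsfileinfo /mnt/mfs/some/file.dat
    # /usr/local/mfs/bin/mfscheckfile /mnt/mfs/some/file.dat
    # /usr/local/mfs/bin/mfsdirinfo /mnt/mfs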
From: 古举标 <guj...@ge...> - 2011-04-01 10:23:11
Dears:

When I use the "mfsfileinfo" and "mfscheckfile" commands, an error occurs. Below is the error message; please help, thanks a lot.

#/usr/local/mfs/bin/mfsfileinfo /mnt/mfs
/mnt/mfs [0]: Operation not permitted.
#/usr/local/mfs/bin/mfscheckfile /mnt/mfs
/mnt/mfs [0]: Operation not permitted.

#df -Th | grep mfs
mfs#mfsmaster:9421 fuse 13G 2.4M 13G 1% /mnt/mfs

#ll /mnt/
drwxrwxrwx 2 root root 0 Mar 30 12:51 mfs

OS: CentOS release 5.4, kernel 2.6.18-164.el5, 32-bit
MooseFS version: mfs-1.6.13
FUSE version: fuse-2.8.3

Thanks again.

2011-03-31

Best Regards!!

Juby Gu (古举标), Storage Engineer, Systems Support Group, Information Production Platform
Mobile: 13723406010  QQ: 190247054  Email: guj...@ge...
BGI (Beijing Genomics Institute)
Addr: No.1001, Floor 10, Main Building, Beishan Industrial Zone, Yantian District, Shenzhen, China
Post Code: 518083
From: <da...@sq...> - 2011-03-31 11:13:14
It appears that both the master and the metalogger servers have gotten corrupted. When I try to run mfsmetarestore -a on the master I get the following:

hole in change files (entries from 10772493 to 624987257 are missing) - add more files

So I go to the metalogger and I get the same message, but with a different set of numbers. Is there a way to run a partial restore, so I can at least get to some of the data? Or is my data just gone entirely? (I still have the chunkservers, which are intact.)

# mfsmetarestore -v
version: 1.6.20

I've been searching for several days but haven't really been able to find much in relation to this. I know that it isn't related to the snapshot bug - I haven't used snapshots yet.

Thanks,
Dallin Jones
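A sketch of a manual restore, in case the automatic mode keeps failing: mfsmetarestore also accepts an explicit base file and changelog list, so changelogs collected from the master and every metalogger can be merged to close the hole (paths here are assumptions):

    # mfsmetarestore -m /var/lib/mfs/metadata.mfs.back \
                     -o /var/lib/mfs/metadata.mfs \
                     /var/lib/mfs/changelog*.mfs

The '-b' option Michal mentions elsewhere in this archive (added in 1.6.21 development) writes out whatever restores cleanly up to the first error, which is the closest thing to a partial restore.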
From: <ha...@si...> - 2011-03-29 02:03:32
Hello! Will you help me solve some problems?

We intend to use MooseFS in our production environment as the storage for our product service. I want to ask some questions, as follows:

Problem one: For MooseFS, if the storage is 500T with 3 million files, performing about 500G operations a day, how much memory does the metadata need?

Problem two: If we modify the management of the metadata namespace, for example from a hash to a B-tree, would it need a lot of work, and do you think it is feasible and reasonable?

Problem three: Does MFS support InfiniBand? If we modified it to add support, do you think it would need much work, and would it be reasonable?

And the last one: Does MooseFS support IBM AIX and HP-UX? Please tell me which OSes it supports and which it does not.

That's all, thanks a lot! I sincerely look forward to your reply!

Best regards!
Hanyw
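On problem one, a back-of-the-envelope estimate rather than an official figure: MooseFS documentation of this era suggested roughly 300 bytes of master RAM per filesystem object, so 3 million files is on the order of 1 GB:

    # echo $((3000000 * 300))
    900000000    # ~0.9 GB of master RAM for 3 million objects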
From: TianYuchuan(田玉川) <ti...@fo...> - 2011-03-25 06:10:01
Hi!

Thanks! The master server now has 15K SAS disks installed, and the problem is solved! I have updated MooseFS; the version is now mfs-1.6.20-2. After the update, CPU usage is 16%.

Another question, about MooseFS upgrades: I installed the mfs-1.6.20-2 version on a new server and started the master, chunkserver and client. The old master server was not stopped. The old master server was not connected to any chunkserver or client, yet its master process occupied 80% CPU; after I restarted the master service, CPU utilization dropped to 5%. Can the master not release the CPU?

-----Original Message-----
From: Michal Borychowski [mailto:mic...@ge...]
Sent: 2011-03-24 16:34
To: 'TianYuchuan(田玉川)'; 'Shen Guowen'
Cc: moo...@li...
Subject: RE: [Moosefs-users] To access data was very slowly, nearly 2 minutes. Oh my god!

Hi!

You have almost all RAM consumed. As you have 100 million files in the system, we suggest putting some extra RAM in the master server. It would also be advisable to put an SSD disk in the master server so that the hourly metadata dump takes less time.

Kind regards
Michał Borychowski
MooseFS Support Manager
Gemius S.A.

-----Original Message-----
From: TianYuchuan(田玉川) [mailto:ti...@fo...]
Sent: Thursday, March 17, 2011 10:03 AM
To: Shen Guowen
Cc: moo...@li...
Subject: [Moosefs-users] To access data was very slowly, nearly 2 minutes. Oh my god!

Hello

My MooseFS system is accessed very slowly; I have no idea why, please help me! Thanks!!!

Files number: 104964618; chunks number: 104963962. Master load is not high, but on the hour, every hour, data cannot be accessed for several minutes. In general, concurrency is small and data access is delayed by a few seconds.

My MooseFS system has nine chunkservers:

 #  host       ip             port  version  chunks    used     total    used%  (marked for removal: chunks / used / total / %)
 1  localhost  192.168.0.118  9422  1.6.19   23387618  3.6 TiB  4.5 TiB  79.72  0 / 0 B / 0 B / -
 2  localhost  192.168.0.119  9422  1.6.19   23246974  3.6 TiB  4.5 TiB  79.72  0 / 0 B / 0 B / -
 3  localhost  192.168.0.120  9422  1.6.19   23360333  3.6 TiB  4.5 TiB  79.72  0 / 0 B / 0 B / -
 4  localhost  192.168.0.121  9422  1.6.19   23192013  3.6 TiB  4.5 TiB  79.69  0 / 0 B / 0 B / -
 5  localhost  192.168.0.122  9422  1.6.19   23483418  3.6 TiB  4.5 TiB  79.70  0 / 0 B / 0 B / -
 6  localhost  192.168.0.123  9422  1.6.19   23308366  3.6 TiB  4.5 TiB  79.70  0 / 0 B / 0 B / -
 7  localhost  192.168.0.124  9422  1.6.19   23361992  3.6 TiB  4.5 TiB  79.69  0 / 0 B / 0 B / -
 8  localhost  192.168.0.125  9422  1.6.19   23300478  3.6 TiB  4.5 TiB  79.70  0 / 0 B / 0 B / -
 9  localhost  192.168.0.127  9422  1.6.19   23284897  3.5 TiB  4.5 TiB  78.72  0 / 0 B / 0 B / -

[root@localhost mfs]# free -m
             total       used       free     shared    buffers     cached
Mem:         48295      46127       2168          0         38       8204
-/+ buffers/cache:      37884      10411
Swap:            0          0          0

The CPU usage is 95%, peaking at 150%.

-----Original Message-----
From: Shen Guowen [mailto:sh...@ui...]
Sent: 2010-08-09 10:42
To: TianYuchuan(田玉川)
Cc: moo...@li...
Subject: Re: [Moosefs-users] mfs-master[10546]: CS(192.168.0.125) packet too long (115289537/50000000)

Don't worry! This is because some of your chunk servers are currently unreachable; the master server notices it and modifies the metadata of the files on those chunk servers, setting "allvalidcopies" to 0 in "struct chunk". When the master rescans the files (fs_test_files() in filesystem.c), it finds that the valid copy count is 0 and prints information to the syslog, as listed below.

However, the printing process is quite time-consuming, especially when the number of files is large. During this period the master ignores the chunk servers' connections (because it is inside a big loop testing files, and it is single-threaded to do this - maybe this is a pitfall). So although your chunk server is working correctly, it is useless for now (you can see the reconnection attempts in the chunk server's syslog). You can let the master finish printing; it will then reconnect with the chunk servers, notice the files are there, set "allvalidcopies" back to a correct value, and work normally. Or you can recompile the program with lines 5512 and 5482 in filesystem.c (mfs-1.6.15) commented out. That ignores the print messages and, of course, reduces the fs test time.

Below is from Michal:
-----------------------------------------------------------------------
We give you here some quick patches you can apply to the master server to improve its performance for that amount of files.

In matocsserv.c in mfsmaster you need to change this line:
    #define MaxPacketSize 50000000
into this:
    #define MaxPacketSize 500000000

We also suggest a change in filesystem.c in mfsmaster, in the "fs_test_files" function. Change this line:
    if ((uint32_t)(main_time())<=starttime+150) {
into:
    if ((uint32_t)(main_time())<=starttime+900) {
And also change this line:
    for (k=0 ; k<(NODEHASHSIZE/3600) && i<NODEHASHSIZE ; k++,i++) {
into this:
    for (k=0 ; k<(NODEHASHSIZE/14400) && i<NODEHASHSIZE ; k++,i++) {

You need to recompile the master server and start it again. The above changes should make the master server work more stably with a large number of files.

Another suggestion would be to create two MooseFS instances (e.g. 2 x 200 million files). One master server could also be the metalogger for the other system and vice versa.

Kind regards
Michał
-----------------------------------------------------------------------

Guowen Shen

On Sun, 2010-08-08 at 22:51 +0800, TianYuchuan(田玉川) wrote:
> hello, everyone!
> I have a big question, please help me, thank you very much.
> We intend to use moosefs in our production environment as the storage for our online photo service.
> We'll store about 200 million photo files.
> I've built one master server (48G mem), one metalogger server, and eight chunk servers (8*1T SATA). When I copy photo files to the moosefs system, at the start everything is good. But after I had copied 57 million files, the master machine's CPU was at 100%.
> I stopped the master using "/usr/local/mfs/sbin/mfsmaster -s", then started it again. But there was a big problem: the master had not read my files. These files are important to me; I am very anxious, please help me recover them, thanks.
>
> I got many error syslog entries from the master server:
>
> Aug 6 00:57:01 localhost mfsmaster[10546]: * currently unavailable file 41991323: 2668/2526212449954462668/176s.jpg
> Aug 6 00:57:01 localhost mfsmaster[10546]: currently unavailable chunk 00000000043CD358 (inode: 50379931 ; index: 0)
> Aug 6 00:57:01 localhost mfsmaster[10546]: * currently unavailable file 50379931: 2926/4294909215566102926/163b.jpg
> Aug 6 00:57:01 localhost mfsmaster[10546]: currently unavailable chunk 00000000002966C3 (inode: 48284 ; index: 0)
> Aug 6 00:57:01 localhost mfsmaster[10546]: * currently unavailable file 48284: bookdata/178/8533354296639220178/180b.jpg
> Aug 6 00:57:01 localhost mfsmaster[10546]: currently unavailable chunk 0000000000594726 (inode: 4242588 ; index: 0)
> Aug 6 00:57:01 localhost mfsmaster[10546]: * currently unavailable file 4242588: bookdata/6631/4300989258725036631/85s.jpg
> Aug 6 00:57:01 localhost mfsmaster[10546]: currently unavailable chunk 0000000000993541 (inode: 8436892 ; index: 0)
> Aug 6 00:57:01 localhost mfsmaster[10546]: * currently unavailable file 8436892: bookdata/7534/3147352338521267534/122b.jpg
> Aug 6 00:57:01 localhost mfsmaster[10546]: currently unavailable chunk 0000000000D906E6 (inode: 12631196 ; index: 0)
> Aug 6 00:57:01 localhost mfsmaster[10546]: * currently unavailable file 12631196: bookdata/8691/11879047433161548691/164s.jpg
> Aug 6 00:57:01 localhost mfsmaster[10546]: currently unavailable chunk 000000000118DC1E (inode: 16825500 ; index: 0)
> Aug 6 00:57:01 localhost mfsmaster[10546]: * currently unavailable file 16825500: bookdata/1232/17850056326363351232/166b.jpg
> Aug 6 00:57:01 localhost mfsmaster[10546]: currently unavailable chunk 0000000001681BC7 (inode: 21019804 ; index: 0)
> Aug 6 00:57:01 localhost mfsmaster[10546]: * currently unavailable file 21019804: bookdata/26/12779298489336140026/246s.jpg
> Aug 6 00:57:01 localhost mfsmaster[10546]: currently unavailable chunk 0000000001A804E1 (inode: 25214108 ; index: 0)
> Aug 6 00:57:01 localhost mfsmaster[10546]: * currently unavailable file 25214108: bookdata/3886/8729781571075193886/30s.jpg
> Aug 6 00:57:01 localhost mfsmaster[10546]: currently unavailable chunk 0000000001E7E826 (inode: 29408412 ; index: 0)
> Aug 6 00:57:01 localhost mfsmaster[10546]: * currently unavailable file 29408412: bookdata/4757/142868991575144757/316b.jpg
>
> Aug 7 23:56:36 localhost mfsmaster[10546]: CS(192.168.0.124) packet too long (115289537/50000000)
> Aug 7 23:56:36 localhost mfsmaster[10546]: chunkserver disconnected - ip: 192.168.0.124, port: 0, usedspace: 0 (0.00 GiB), totalspace: 0 (0.00 GiB)
> Aug 8 00:08:14 localhost mfsmaster[10546]: CS(192.168.0.127) packet too long (104113889/50000000)
> Aug 8 00:08:14 localhost mfsmaster[10546]: chunkserver disconnected - ip: 192.168.0.127, port: 0, usedspace: 0 (0.00 GiB), totalspace: 0 (0.00 GiB)
> Aug 8 00:21:03 localhost mfsmaster[10546]: CS(192.168.0.120) packet too long (117046565/50000000)
> Aug 8 00:21:03 localhost mfsmaster[10546]: chunkserver disconnected - ip: 192.168.0.120, port: 0, usedspace: 0 (0.00 GiB), totalspace: 0 (0.00 GiB)
>
> When I visited the mfscgi, the error was "Can't connect to MFS master (IP:127.0.0.1 ; PORT:9421)".
>
> Thanks all!
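A sketch of applying the quoted constant change before recompiling, assuming an unpacked source tree (the sed pattern is illustrative; editing matocsserv.c by hand works just as well):

    # cd mfs-1.6.x
    # sed -i 's/#define MaxPacketSize 50000000$/#define MaxPacketSize 500000000/' mfsmaster/matocsserv.c
    # ./configure && make && make install    # then restart the master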
From: Michal B. <mic...@ge...> - 2011-03-24 08:37:56
Hi Robert!

Do you use the 'mfsmakesnapshot' operation (i.e. do you have 'SNAPSHOT' entries in your changelogs)? If yes, you may have encountered an error I wrote about several days ago. If there is nothing secret in them, you may send metadata_ml.mfs.back together with changelog_ml.0.mfs and changelog_ml.1.mfs to my email address so that we can have a closer look at them.

Kind regards
Michał Borychowski
MooseFS Support Manager
Gemius S.A.

From: Robert Dye [mailto:ro...@in...]
Sent: Saturday, March 19, 2011 12:20 AM
To: moo...@li...
Subject: [Moosefs-users] FreeBSD mfsmetarestore - Operation Not Permitted

# mfsmetarestore -x -a /var/mfs/metadata_ml.mfs.back -d /var/mfs/
file 'metadata.mfs.back' not found - will try 'metadata_ml.mfs.back' instead
loading objects (files,directories,etc.) ... ok
loading names ... ok
loading deletion timestamps ... ok
checking filesystem consistency ... ok
loading chunks data ... ok
connecting files and chunks ... ok
found changelog file 1: /var/mfs/changelog_ml.0.mfs
found changelog file 2: /var/mfs/changelog_ml.1.mfs
change: 1300482060|FREEINODES():2036
154746233: error: 1 (Operation not permitted)

I am the root user - is this a bug?
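A quick way to answer Michal's question about SNAPSHOT entries, using the paths from Robert's output:

    # grep -l 'SNAPSHOT' /var/mfs/changelog_ml.*.mfs   # which changelogs contain snapshot operations
    # grep -c 'SNAPSHOT' /var/mfs/changelog_ml.0.mfs   # how many entries in one file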
From: Michal B. <mic...@ge...> - 2011-03-24 08:34:34
Hi!

You have almost all RAM consumed. As you have 100 million files in the system, we suggest putting some extra RAM in the master server. It would also be advisable to put an SSD disk in the master server so that the hourly metadata dump takes less time.

Kind regards
Michał Borychowski
MooseFS Support Manager
Gemius S.A.

-----Original Message-----
From: TianYuchuan(田玉川) [mailto:ti...@fo...]
Sent: Thursday, March 17, 2011 10:03 AM
To: Shen Guowen
Cc: moo...@li...
Subject: [Moosefs-users] To access data was very slowly, nearly 2 minutes. Oh my god!

Hello

My MooseFS system is accessed very slowly; I have no idea why, please help me! Thanks!!!

Files number: 104964618; chunks number: 104963962. Master load is not high, but on the hour, every hour, data cannot be accessed for several minutes. In general, concurrency is small and data access is delayed by a few seconds.

My MooseFS system has nine chunkservers. [...]

[root@localhost mfs]# free -m
             total       used       free     shared    buffers     cached
Mem:         48295      46127       2168          0         38       8204
-/+ buffers/cache:      37884      10411
Swap:            0          0          0

The CPU usage is 95%, peaking at 150%.

[...]
From: Thomas S H. <tha...@gm...> - 2011-03-23 16:07:11
Yep, that worked! Thanks!

2011/3/23 Michal Borychowski <mic...@ge...>
> So if you know something happened on the hardware side, just delete the broken chunks.
> [...]
From: Michal B. <mic...@ge...> - 2011-03-23 16:04:16
So if you know something happened on the hardware side, just delete the broken chunks.

Regards
Michal

From: Thomas S Hatch [mailto:tha...@gm...]
Sent: Wednesday, March 23, 2011 3:48 PM
To: Michal Borychowski
Cc: moosefs-users
Subject: Re: [Moosefs-users] Failing Chunkserver

Thanks Michal!

We were having some hardware issues on the node, and I suspect that this is a residual problem. I will give your suggestion a try!

> [...]
From: Thomas S H. <tha...@gm...> - 2011-03-23 14:48:32
Thanks Michal!

We were having some hardware issues on the node, and I suspect that this is a residual problem. I will give your suggestion a try!

2011/3/23 Michal Borychowski <mic...@ge...>
> Hi Thomas!
>
> You have bad chunk headers (but we don't know why). You can just erase the wrong chunks, or change (just for some time) these constants:
> [...]
From: Michal B. <mic...@ge...> - 2011-03-23 12:13:13
Hi Thomas!

You have bad chunk headers (but we don't know why). You can just erase the wrong chunks, or change (just for some time) these constants:

    #define LASTERRSIZE 3
    #define LASTERRTIME 60

to:

    #define LASTERRSIZE 10
    #define LASTERRTIME 1

in the mfschunkserver/hddspacemgr.c file, recompile the chunkserver and run it again. The chunkserver will then stop "unlinking" the disks and will remove the wrong chunks by itself.

Regards
-Michal

From: Thomas S Hatch [mailto:tha...@gm...]
Sent: Wednesday, March 23, 2011 6:10 AM
To: moosefs-users
Subject: [Moosefs-users] Failing Chunkserver

> [...]
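If the hardware cause is confirmed, deleting the chunks named in the log is straightforward. A sketch using the three files from Thomas's log (stop the chunkserver first; replicas elsewhere will be used to re-create the lost copies):

    # mfschunkserver stop
    # rm /mnt/moose1/11/chunk_00000000019A9511_00000001.mfs \
         /mnt/moose1/4A/chunk_0000000001EFFE4A_00000001.mfs \
         /mnt/moose1/83/chunk_0000000001776783_00000001.mfs
    # mfschunkserver start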
From: Thomas S H. <tha...@gm...> - 2011-03-23 05:10:35
|
I am having some trouble with a chunkserver: it errors out, then the chunkserver stops working and reports 0% on the mfs cgi page.

Here is the error in the logs:

2011-03-22T16:18:38+00:00 node10 mfschunkserver[25905]: set gid to 70003
2011-03-22T16:18:38+00:00 node10 mfschunkserver[25905]: set uid to 70003
2011-03-22T16:18:38+00:00 node10 mfschunkserver[25783]: closing 172.11.1.110:9422
2011-03-22T16:18:47+00:00 node10 mfschunkserver[25905]: main server module: listen on 172.11.1.110:9422
2011-03-22T16:18:48+00:00 node10 mfschunkserver[25905]: connecting ...
2011-03-22T16:18:48+00:00 node10 mfschunkserver[25905]: stats file has been loaded
2011-03-22T16:18:48+00:00 node10 mfschunkserver[25905]: open files limit: 10000
2011-03-22T16:18:48+00:00 node10 mfschunkserver[25905]: connected to Master
2011-03-22T16:18:52+00:00 node10 mfschunkserver[25905]: testing chunk: /mnt/moose1/11/chunk_00000000019A9511_00000001.mfs
2011-03-22T16:18:52+00:00 node10 mfschunkserver[25905]: chunk_readcrc: file:/mnt/moose1/11/chunk_00000000019A9511_00000001.mfs - wrong id/version in header (00000000019A9511_00000000)
2011-03-22T16:18:52+00:00 node10 mfschunkserver[25905]: hdd_io_begin: file:/mnt/moose1/11/chunk_00000000019A9511_00000001.mfs - read error: Unknown error
2011-03-22T16:19:02+00:00 node10 mfschunkserver[25905]: testing chunk: /mnt/moose1/4A/chunk_0000000001EFFE4A_00000001.mfs
2011-03-22T16:19:03+00:00 node10 mfschunkserver[25905]: chunk_readcrc: file:/mnt/moose1/4A/chunk_0000000001EFFE4A_00000001.mfs - wrong id/version in header (0000000001EFFE4A_00000000)
2011-03-22T16:19:03+00:00 node10 mfschunkserver[25905]: hdd_io_begin: file:/mnt/moose1/4A/chunk_0000000001EFFE4A_00000001.mfs - read error: Unknown error
2011-03-22T16:19:13+00:00 node10 mfschunkserver[25905]: testing chunk: /mnt/moose1/83/chunk_0000000001776783_00000001.mfs
2011-03-22T16:19:13+00:00 node10 mfschunkserver[25905]: chunk_readcrc: file:/mnt/moose1/83/chunk_0000000001776783_00000001.mfs - wrong id/version in header (0000000001776783_00000000)
2011-03-22T16:19:13+00:00 node10 mfschunkserver[25905]: hdd_io_begin: file:/mnt/moose1/83/chunk_0000000001776783_00000001.mfs - read error: Unknown error
2011-03-22T16:19:13+00:00 node10 mfschunkserver[25905]: 3 errors occurred in 60 seconds on folder: /mnt/moose1/
2011-03-22T16:19:15+00:00 node10 mfschunkserver[25905]: replicator: hdd_create status: 21

What do these errors mean? And what is the best way to recover?

If worse comes to worst we of course have replicated chunks, so we can format the chunkserver and start it back up, but I am very curious how best to approach the situation.

-Thomas S Hatch
|
From: Boyko Y. <b.y...@ex...> - 2011-03-21 22:41:41
|
Hi,

Not sure if it is related at all, but there is a known bug with mfsmetarestore; take a look here:

http://sourceforge.net/mailarchive/forum.php?thread_name=045c01cbe3e6%249a511760%24cef34620%24%40borychowski%40gemius.pl&forum_name=moosefs-users

You may want to try the suggested workaround.

Regards,
Boyko
|
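For what it's worth, the commonly suggested workaround for a changelog truncated by a power failure amounts to dropping the incomplete final line of the newest changelog (the line mfsmetarestore chokes on, as in the "3605: '|' expected" error above) and retrying. Here is a hedged Python sketch of that single step; the script name and changelog path are assumptions, and a .bak copy is kept first since discarding changelog data is inherently risky.

#!/usr/bin/env python3
# trim_changelog.py -- sketch only, not an official MooseFS tool.
# Removes the last (possibly truncated) line of a changelog so that
# mfsmetarestore can be retried. Always verify the .bak copy exists.
import shutil
import sys

def drop_last_line(path):
    shutil.copy2(path, path + ".bak")   # keep a backup before touching anything
    with open(path, "rb") as f:
        data = f.read()
    # Find the newline that ends the second-to-last record and cut there,
    # discarding the final (broken) line whether or not it ends in '\n'.
    cut = data.rstrip(b"\n").rfind(b"\n")
    with open(path, "wb") as f:
        f.write(data[:cut + 1] if cut != -1 else b"")

if __name__ == "__main__":
    # Usage: python3 trim_changelog.py /var/lib/mfs/changelog.0.mfs
    drop_last_line(sys.argv[1])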
From: Pedro N. <pe...@st...> - 2011-03-21 22:28:00
|
Hi there,

We ran another test today after losing about 3TB+ of data when we could not restore the metadata.mfs file. We were in the process of copying files over when the power was cut off. This led to an error when running the mfsmetarestore -a command, as follows:

3605: '|' expected

We really feel that MooseFS is the best solution we could find, but after not being able to recover from something as real as a power failure, I am genuinely worried.

Please advise as to how to fix this problem.

Sincerely,

Pedro Naranjo / STL Technologies / Solutions Architect / 888.556.0774
|
From: Thomas S H. <tha...@gm...> - 2011-03-21 15:27:53
|
Hi Pedro!

The problem I am running into is time and resources. I am in the middle of a number of other projects, and my test environment is currently in a state of "flux". I agree that this would be a great thing for MooseFS to come packaged with, but it should be a complete package, with ucarp failover scripts wrapped up into a simple cluster management daemon. I hope to have more time for this in a few weeks, but it keeps getting pushed back; I might not be able to get to it for over a month. I know that many people are interested in my MooseFS failover work, and it is a high priority. Any contributions would be appreciated; all the code is in place, I mostly just need to package it up.

P.S. Since this is a list of system admins, some of you might be interested in the project that has been requiring most of my time lately. It is called salt:

https://github.com/thatch45/salt

Salt is a remote execution platform. I am using it to replace func; it allows for very fast communication with servers and beats the heck out of using ssh for loops. I think it would also be very useful for people deploying MooseFS, since you often want to get information from, and execute commands on, many of your systems at once. I also have a blog post about it here:

http://red45.wordpress.com/2011/03/19/salt-0-6-0-released/

On Mon, Mar 21, 2011 at 9:08 AM, Pedro Naranjo <pe...@st...> wrote:
> Dear Thomas,
>
> Your contribution is very valuable. May I suggest that the MooseFS developers include it in the general download of the system? I have also become very concerned about losing data. We spent 3 days moving 3TB+ of data only to lose it all after simulating a power failure. Granted, we had not deployed the metaloggers yet, but nevertheless whatever we can use to keep the system as stable as possible is very important.
>
> Sincerely,
>
> Pedro Naranjo / STL Technologies / Solutions Architect / 888.556.0774
|
From: Thomas S H. <tha...@gm...> - 2011-03-21 14:51:31
|
I have been hammering away at mfs failover for quite some time and I am familiar with your problem.

What happens is that the mfsmetaloggers continue to stream updates from the mfsmaster even after a failover, but the mfsmetarestore command executed on the metadata on the new mfsmaster ends up creating a different "last change point" than what the other metaloggers see. This means that the mfsmetaloggers that did not become the new master have a bad set of metadata after your initial failover.

Since I wanted a completely clean and automated failover in my MooseFS deployment, I created a wrapper daemon that manages the mfsmetalogger. This daemon should be run on all metaloggers and the mfsmaster; it detects when a failover occurs and ensures that the mfsmetalogger is running on the right nodes and that the metadata being used is the correct metadata. If you want to use my mfsmetalogger manager, it is available here:

https://github.com/thatch45/mfs-failover/blob/master/daemon/metaman.py

It is written in python3 (my deployments default to python3), but let me know if you are interested in running it on python2 and I will make a python2 version.

I also have some ucarp scripts in that github project that can be used for managing failover automatically in conjunction with metaman, but I have not had the time and resources to finish packaging them up.

Let me know if you have any questions!

-Thomas S Hatch

On Mon, Mar 21, 2011 at 5:10 AM, Boyko Yordanov <b.y...@ex...> wrote:
> Hi list,
>
> I'm wondering how you guys are handling mfs master failover?
>
> In my tests mfsmetalogger seems quite unreliable - 2 days of testing showed a few cases where mfsmetarestore was unable to restore the metadata.mfs datafile, giving different errors like data mismatch, version mismatch, hole in change files (add more files), etc.
>
> Running 3 different metadata backup loggers; master and chunk servers are all running mfs-1.6.20-2 on centos 5.5 x86_64, filesystem type is ext3.
>
> I'm aware that some of you are running huge clusters with terabytes of data - I'm wondering how you trust your mfsmaster. Am I the only one concerned with eventual data loss on mfsmaster failover, when mfsmetarestore does not properly restore the metadata.mfs file from changelogs?
>
> Boyko
|
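To make the failure mode concrete, here is a deliberately simplified Python sketch of the recovery step such a wrapper daemon has to perform after a failover: archive the metalogger's now-divergent state and restart it so it resyncs from the new master. This is not metaman.py; the data directory, file-name prefixes, and the mfsmetalogger start/stop invocation are all assumptions against MooseFS 1.6.x.

#!/usr/bin/env python3
# Illustration of the post-failover resync step -- NOT metaman.py.
# After a failover, a metalogger's changelog_ml files predate the new
# master's restored "last change point", so the safe move is to set
# them aside and let a restarted metalogger resync from scratch.
import os
import shutil
import subprocess
import time

DATA_DIR = "/var/lib/mfs"   # assumed metalogger data directory

def resync_metalogger(data_dir=DATA_DIR):
    subprocess.call(["mfsmetalogger", "stop"])
    # Quarantine stale state instead of deleting it outright.
    backup = os.path.join(data_dir, "pre-failover-%d" % int(time.time()))
    os.makedirs(backup)
    for name in os.listdir(data_dir):
        if name.startswith(("changelog_ml", "metadata_ml", "sessions_ml")):
            shutil.move(os.path.join(data_dir, name), backup)
    subprocess.call(["mfsmetalogger", "start"])  # resyncs from the new master

if __name__ == "__main__":
    # In a real deployment this would be triggered by the ucarp
    # down/up scripts on each metalogger node once a failover is detected.
    resync_metalogger()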
From: Jofly <xh...@16...> - 2011-03-21 13:35:23
|
Dear Sir,

Thank you for sparing some time to read my letter. I am a college student from China, and my English is not good. I am writing to ask you some questions about the background of MooseFS.

First, did MooseFS originate in the United States, and when was it first made public?

Secondly, which company or person developed the software?

Thirdly, are there any famous IT companies using MooseFS, and can you tell me about some successful cases?

Thanks for your time again, and I am looking forward to your reply.
|