From: Anthony G. <agi...@fa...> - 2010-04-09 02:08:08
Scribe has moved to GitHub: http://github.com/facebook/scribe

Check out the wiki and the new mailing list for more information: http://groups.google.com/group/scribe-server

-Anthony

On Apr 8, 2010, at 6:54 PM, saikiran. gorthi wrote:

> Hey all,
> I'm a new developer for Scribe, but things look quite confusing. Can anyone help by suggesting some links or other material?
>
> I'm looking to develop on the Scribe threading model and logging framework. Any suggestions for this?
From: saikiran. g. <sai...@gm...> - 2010-04-09 01:54:34
Hey all,

I'm a new developer for Scribe, but things look quite confusing. Can anyone help by suggesting some links or other material?

I'm looking to develop on the Scribe threading model and logging framework. Any suggestions for this?
From: Anthony G. <agi...@fa...> - 2009-10-14 06:15:56
Scribe is changing hosting from SourceForge to GitHub. Come check out the new homepage for Scribe: http://github.com/facebook/scribe

This Scribe mailing list will be replaced by the new Scribe Discussion Group: http://groups.google.com/group/scribe-server/

Please update your bookmarks.
From: Anthony G. <agi...@fa...> - 2009-07-29 22:34:45
Tor,

I have a solution to this issue as well, which is checked in to Scribe trunk but not yet in the latest released version of Scribe. Try downloading the most recent revision of Scribe from svn and set the following flag for each store you have configured: must_succeed=yes

Let me know if this works for you. Your solution looks good as well.

-Anthony

On 7/28/09 11:56 PM, "Tor Myklebust" wrote:

I have a central scribe server and a bunch of peripheral scribe servers. The peripheral servers are configured with the central server as a primary (network) store and a file store as a secondary store. This makes the logging system tolerate link failures and a failure of the central server. The central server is configured with only file stores. (Specifically, I made the decision to have a primary file store and a secondary file store, but it happened that both were on the same partition after moving the service to a different machine.)

So, when the central server runs out of disk space, it (a) can't store messages to disk, so it silently discards them, and (b) continues replying to Log calls with OK rather than TRY_AGAIN. (a) alone isn't cool; it should leave the messages in the queue rather than throwing them away. When combined with (b), this all means that when the central server runs out of disk space, all log messages vanish into the ether rather than being buffered at the peripheral servers as they should (until someone fixes things or they too run out of disk space).

So I want to hack scribe so that when it can't store messages to disk for whatever reason, it leaves them in the queue instead of throwing them away, and it also rejects incoming messages until it's able to save messages to the store again.
I wrote the following (against a version of scribe from March):

Index: src/scribe_server.cpp
===================================================================
--- src/scribe_server.cpp	(revision 45848)
+++ src/scribe_server.cpp	(revision 45886)
@@ -269,11 +269,17 @@
   return true;
 }

+extern volatile int boned, peripheral;
+
 ResultCode scribeHandler::Log(const vector<LogEntry>& messages) {
   //LOG_OPER("received Log with <%d> messages", (int)messages.size());

-  if (throttleDeny(messages.size())) {
+  if (boned) {
+    return TRY_LATER;
+  }
+
+  if (!peripheral && throttleDeny(messages.size())) {
     incrementCounter("denied for rate");
     return TRY_LATER;
   }
Index: src/store.cpp
===================================================================
--- src/store.cpp	(revision 45848)
+++ src/store.cpp	(revision 45886)
@@ -539,7 +539,7 @@
 bool FileStore::handleMessages(boost::shared_ptr<logentry_vector_t> messages) {

-  if (!isOpen()) {
+  if (!isOpen() && !openInternal(true, NULL)) {
     LOG_OPER("[%s] File failed to open FileStore::handleMessages()",
              categoryHandled.c_str());
     return false;
   }
@@ -1247,6 +1247,8 @@
 }

+volatile int peripheral;
+
 NetworkStore::NetworkStore(const string& category, bool multi_category)
   : Store(category, "network", multi_category),
     useConnPool(false),
@@ -1254,6 +1256,7 @@
     remotePort(0),
     opened(false) {
   // we can't open the connection until we get configured
+  peripheral = true;

   // the bool for opened ensures that we don't make duplicate
   // close calls, which would screw up the connection pool's
Index: src/store_queue.cpp
===================================================================
--- src/store_queue.cpp	(revision 45848)
+++ src/store_queue.cpp	(revision 45886)
@@ -194,6 +194,8 @@
   return store->getType();
 }

+volatile int boned;
+
 void StoreQueue::threadMember() {
   LOG_OPER("store thread starting");
@@ -260,10 +262,10 @@
     pthread_mutex_unlock(&cmdMutex);

     // handle messages if stopping, enough time has passed, or queue is large
-    //
+    // and we're not boned.
     if (stop ||
         (this_loop - last_handle_messages > maxWriteInterval) ||
-        msgQueueSize >= targetWriteSize) {
+        msgQueueSize >= targetWriteSize && !boned) {

       if (msgQueueSize > 0) {
         boost::shared_ptr<logentry_vector_t> messages = msgQueue;
@@ -273,12 +275,23 @@
         pthread_mutex_unlock(&msgMutex);

         if (!store->handleMessages(messages)) {
-          // Store could not handle these messages, nothing else to do
+          /*// Store could not handle these messages, nothing else to do
           // other than to record them as being lost
           LOG_OPER("[%s] WARNING: Lost %lu messages!", categoryHandled.c_str(),
                    messages->size());
-          g_Handler->incrementCounter("lost", messages->size());
+          g_Handler->incrementCounter("lost", messages->size());*/
+          // lies; we can requeue them and set boned.
+          boned = 1;
+          pthread_mutex_lock(&msgMutex);
+          for (size_t i = 0; i < msgQueue->size(); i++)
+            messages->push_back((*msgQueue)[i]);
+          msgQueue = messages;
+          msgQueueSize = 0;
+          for (size_t i = 0; i < msgQueue->size(); i++)
+            msgQueueSize += (*msgQueue)[i]->message.size();
+          pthread_mutex_unlock(&msgMutex);
         }
+        else boned = 0;  // write succeeded; no longer boned.
         store->flush();
       } else {
         pthread_mutex_unlock(&msgMutex);

This looks to me like it works. I had to add the hack to FileStore because there seems to be no code path that reopens the file, at least not in as long as I could be bothered to wait.

I tested it by having a simple two-server configuration (one peripheral, one central). The central server is told to write to a 256MB ramdisk that contains a 60MB and a 192MB file; then I delete the 60MB file once scribe starts failing to write, then I delete the 192MB file once scribe fails to write again. I then give the peripheral server a bunch of messages that are the positive integers in increasing order followed by newlines, one number per message. No messages were lost, but a few were duplicated between one chunk and the next.
I know this isn't the right way to handle the problem --- we should make sure there's space for writing a message to disk before writing the actual message --- but am I going to run into strange problems if I use this patch?

(I see in trunk scribe that there's some logic for handling failed messages. I haven't tested it, though. And at a glance, I also don't understand why its resource consumption is bounded; I see no logic to tell clients to try later if we're making no progress on the queue. Or why mustSucceed should be false by default.)

_______________________________________________
Scribeserver-devel mailing list
Scr...@li...
https://lists.sourceforge.net/lists/listinfo/scribeserver-devel
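The requeue-and-reject behavior discussed in this thread can be illustrated in isolation. The sketch below is not Scribe code: the class and member names are invented, and it is deliberately single-threaded, whereas the real StoreQueue guards its queue with pthread mutexes. It only demonstrates the core idea: on a failed write, keep the batch queued and push back on clients with TRY_LATER until a write succeeds.

```cpp
#include <cassert>
#include <deque>
#include <string>
#include <vector>

// Result codes mirroring the Thrift enum used in the thread.
enum ResultCode { OK, TRY_LATER };

// Minimal single-threaded sketch of back-pressure on store failure.
class BackpressureQueue {
public:
    explicit BackpressureQueue(bool* storeHealthy)
        : storeHealthy_(storeHealthy), backlogged_(false) {}

    // Analogue of scribeHandler::Log: reject input while backlogged.
    ResultCode log(const std::vector<std::string>& messages) {
        if (backlogged_) return TRY_LATER;
        for (const auto& m : messages) queue_.push_back(m);
        return OK;
    }

    // Analogue of the store thread's write step: on failure, keep the
    // batch queued and set the backlogged flag instead of dropping it.
    void flush() {
        if (queue_.empty()) return;
        if (*storeHealthy_) {
            persisted_.insert(persisted_.end(), queue_.begin(), queue_.end());
            queue_.clear();
            backlogged_ = false;  // write succeeded; accept input again
        } else {
            backlogged_ = true;   // messages stay queued; clients must retry
        }
    }

    size_t queued() const { return queue_.size(); }
    size_t persisted() const { return persisted_.size(); }

private:
    bool* storeHealthy_;  // stand-in for "the file store can write"
    bool backlogged_;
    std::deque<std::string> queue_;
    std::vector<std::string> persisted_;
};
```

Toggling `storeHealthy` models the disk filling up and later being freed: while it is false, queued messages are retained and new `log` calls get TRY_LATER, so upstream buffer stores hold their data instead of losing it.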
From: Tor M. <tmy...@cs...> - 2009-07-29 06:56:17
I have a central scribe server and a bunch of peripheral scribe servers. The peripheral servers are configured with the central server as a primary (network) store and a file store as a secondary store. This makes the logging system tolerate link failures and a failure of the central server. The central server is configured with only file stores. (Specifically, I made the decision to have a primary file store and a secondary file store, but it happened that both were on the same partition after moving the service to a different machine.)

So, when the central server runs out of disk space, it (a) can't store messages to disk, so it silently discards them, and (b) continues replying to Log calls with OK rather than TRY_AGAIN. (a) alone isn't cool; it should leave the messages in the queue rather than throwing them away. When combined with (b), this all means that when the central server runs out of disk space, all log messages vanish into the ether rather than being buffered at the peripheral servers as they should (until someone fixes things or they too run out of disk space).

So I want to hack scribe so that when it can't store messages to disk for whatever reason, it leaves them in the queue instead of throwing them away, and it also rejects incoming messages until it's able to save messages to the store again.
I wrote the following (against a version of scribe from March):

Index: src/scribe_server.cpp
===================================================================
--- src/scribe_server.cpp	(revision 45848)
+++ src/scribe_server.cpp	(revision 45886)
@@ -269,11 +269,17 @@
   return true;
 }

+extern volatile int boned, peripheral;
+
 ResultCode scribeHandler::Log(const vector<LogEntry>& messages) {
   //LOG_OPER("received Log with <%d> messages", (int)messages.size());

-  if (throttleDeny(messages.size())) {
+  if (boned) {
+    return TRY_LATER;
+  }
+
+  if (!peripheral && throttleDeny(messages.size())) {
     incrementCounter("denied for rate");
     return TRY_LATER;
   }
Index: src/store.cpp
===================================================================
--- src/store.cpp	(revision 45848)
+++ src/store.cpp	(revision 45886)
@@ -539,7 +539,7 @@
 bool FileStore::handleMessages(boost::shared_ptr<logentry_vector_t> messages) {

-  if (!isOpen()) {
+  if (!isOpen() && !openInternal(true, NULL)) {
     LOG_OPER("[%s] File failed to open FileStore::handleMessages()",
              categoryHandled.c_str());
     return false;
   }
@@ -1247,6 +1247,8 @@
 }

+volatile int peripheral;
+
 NetworkStore::NetworkStore(const string& category, bool multi_category)
   : Store(category, "network", multi_category),
     useConnPool(false),
@@ -1254,6 +1256,7 @@
     remotePort(0),
     opened(false) {
   // we can't open the connection until we get configured
+  peripheral = true;

   // the bool for opened ensures that we don't make duplicate
   // close calls, which would screw up the connection pool's
Index: src/store_queue.cpp
===================================================================
--- src/store_queue.cpp	(revision 45848)
+++ src/store_queue.cpp	(revision 45886)
@@ -194,6 +194,8 @@
   return store->getType();
 }

+volatile int boned;
+
 void StoreQueue::threadMember() {
   LOG_OPER("store thread starting");
@@ -260,10 +262,10 @@
     pthread_mutex_unlock(&cmdMutex);

     // handle messages if stopping, enough time has passed, or queue is large
-    //
+    // and we're not boned.
     if (stop ||
         (this_loop - last_handle_messages > maxWriteInterval) ||
-        msgQueueSize >= targetWriteSize) {
+        msgQueueSize >= targetWriteSize && !boned) {

       if (msgQueueSize > 0) {
         boost::shared_ptr<logentry_vector_t> messages = msgQueue;
@@ -273,12 +275,23 @@
         pthread_mutex_unlock(&msgMutex);

         if (!store->handleMessages(messages)) {
-          // Store could not handle these messages, nothing else to do
+          /*// Store could not handle these messages, nothing else to do
           // other than to record them as being lost
           LOG_OPER("[%s] WARNING: Lost %lu messages!", categoryHandled.c_str(),
                    messages->size());
-          g_Handler->incrementCounter("lost", messages->size());
+          g_Handler->incrementCounter("lost", messages->size());*/
+          // lies; we can requeue them and set boned.
+          boned = 1;
+          pthread_mutex_lock(&msgMutex);
+          for (size_t i = 0; i < msgQueue->size(); i++)
+            messages->push_back((*msgQueue)[i]);
+          msgQueue = messages;
+          msgQueueSize = 0;
+          for (size_t i = 0; i < msgQueue->size(); i++)
+            msgQueueSize += (*msgQueue)[i]->message.size();
+          pthread_mutex_unlock(&msgMutex);
         }
+        else boned = 0;  // write succeeded; no longer boned.
         store->flush();
       } else {
         pthread_mutex_unlock(&msgMutex);

This looks to me like it works. I had to add the hack to FileStore because there seems to be no code path that reopens the file, at least not in as long as I could be bothered to wait.

I tested it by having a simple two-server configuration (one peripheral, one central). The central server is told to write to a 256MB ramdisk that contains a 60MB and a 192MB file; then I delete the 60MB file once scribe starts failing to write, then I delete the 192MB file once scribe fails to write again. I then give the peripheral server a bunch of messages that are the positive integers in increasing order followed by newlines, one number per message. No messages were lost, but a few were duplicated between one chunk and the next.
I know this isn't the right way to handle the problem --- we should make sure there's space for writing a message to disk before writing the actual message --- but am I going to run into strange problems if I use this patch?

(I see in trunk scribe that there's some logic for handling failed messages. I haven't tested it, though. And at a glance, I also don't understand why its resource consumption is bounded; I see no logic to tell clients to try later if we're making no progress on the queue. Or why mustSucceed should be false by default.)
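For readers unfamiliar with the topology Tor describes, a peripheral server's primary network store and secondary file store are normally wrapped in a buffer store. The fragment below is only an illustrative sketch: the hostname, port, and paths are made up, and the key names follow the example configs shipped with Scribe, so check them against your version.

```
# Peripheral scribe server: forward to the central server,
# buffering to local disk when the network store fails.
port=1463

<store>
category=default
type=buffer
retry_interval=30
retry_interval_range=10

<primary>
type=network
remote_host=central.example.com
remote_port=1463
</primary>

<secondary>
type=file
fs_type=std
file_path=/var/tmp/scribe_buffer
base_filename=default
max_size=3000000
</secondary>
</store>
```

With this arrangement, the failure Tor reports is exactly the dangerous case: the secondary only absorbs messages if the central server signals failure (TRY_LATER) instead of acknowledging writes it then drops.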
From: Anthony G. <an...@fa...> - 2008-12-08 21:20:00
Jonathan,

Please send me an email stating that you have the rights to this code and that you agree to distribute it under Scribe's license (which is the Apache License, version 2.0). Then attach your code as a unified diff ('svn diff' or 'diff -u') and we will review it. Thank you for the contribution.

-Anthony

On 12/4/08 7:41 PM, "Jonathan Cao" <jon...@ro...> wrote:

The current Scribe file store only supports hourly and daily log rotation. I made some code changes to support 5-minute log rotation. Is there any plan for a change along these lines? What is the procedure for contributing to the code?

Thanks,
Jonathan
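For contributors unfamiliar with the unified-diff format requested above, here is a minimal demonstration. The file names and the config key shown are made up for illustration; in a Subversion checkout of Scribe, `svn diff > my-change.patch` produces the same format directly.

```shell
# Make two throwaway versions of a hypothetical config file.
printf 'rotate_period=hourly\n' > store.conf.orig
printf 'rotate_period=300s\n'   > store.conf

# diff exits with status 1 when the files differ, so guard it with || true.
diff -u store.conf.orig store.conf > rotation.patch || true

# The patch shows removed lines prefixed with '-' and added lines with '+'.
cat rotation.patch
```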
From: Jonathan C. <jon...@ro...> - 2008-12-05 03:47:22
The current Scribe file store only supports hourly and daily log rotation. I made some code changes to support 5-minute log rotation. Is there any plan for a change along these lines? What is the procedure for contributing to the code?

Thanks,
Jonathan
From: Anthony G. <an...@fa...> - 2008-10-23 19:25:11
Welcome to the Scribe Developers mailing list. This list is for all developers of Scribe.