From: Anthony G. <agi...@fa...> - 2010-04-09 02:08:08
Scribe has moved to GitHub: http://github.com/facebook/scribe

Check out the wiki and the new mailing list for more information: http://groups.google.com/group/scribe-server

-Anthony

On Apr 8, 2010, at 6:54 PM, saikiran. gorthi wrote:

> Hey all,
> I'm a new developer for Scribe, but things look quite confusing. Can anyone help by suggesting some links or other material?
>
> I'm looking to develop on the Scribe threading model and logging framework. Any suggestions for this?
From: saikiran. g. <sai...@gm...> - 2010-04-09 01:54:34
Hey all,

I'm a new developer for Scribe, but things look quite confusing. Can anyone help by suggesting some links or other material?

I'm looking to develop on the Scribe threading model and logging framework. Any suggestions for this?
From: Anthony G. <agi...@fa...> - 2009-10-14 06:15:56
Scribe is changing hosting from SourceForge to GitHub. Come check out the new homepage for Scribe: http://github.com/facebook/scribe

This Scribe mailing list will be replaced by the new Scribe Discussion Group: http://groups.google.com/group/scribe-server/

Please update your bookmarks.
From: Anthony G. <agi...@fa...> - 2009-07-29 22:34:45
Tor,

I have a solution to this issue as well, which is checked in to Scribe trunk but not yet in the latest released version of Scribe. Try downloading the most recent revision of Scribe from svn and set the following flag for each store you have configured: must_succeed=yes

Let me know if this works for you. Your solution looks good as well.

-Anthony

On 7/28/09 11:56 PM, "Tor Myklebust" wrote:

I have a central scribe server and a bunch of peripheral scribe servers. The peripheral servers are configured with the central server as a primary (network) store and a file store as a secondary store. This makes the logging system tolerate link failures and a failure of the central server. The central server is configured with only file stores. (Specifically, I made the decision to have a primary file store and a secondary file store, but it happened that both were on the same partition after moving the service to a different machine.)

So, when the central server runs out of disk space, it (a) can't store messages to disk, so it silently discards them, and (b) continues replying to Log calls with OK rather than TRY_AGAIN. (a) alone isn't cool; it should leave the messages in the queue rather than throwing them away. When combined with (b), this all means that when the central server runs out of disk space, all log messages vanish into the ether rather than being buffered at the peripheral servers as they should (until someone fixes things or they too run out of disk space).

So I want to hack scribe so that when it can't store messages to disk for whatever reason, it leaves them in the queue instead of throwing them away, and it also rejects incoming messages until it's able to save messages to the store again.
I wrote the following (against a version of scribe from March):

Index: src/scribe_server.cpp
===================================================================
--- src/scribe_server.cpp	(revision 45848)
+++ src/scribe_server.cpp	(revision 45886)
@@ -269,11 +269,17 @@
   return true;
 }

+extern volatile int boned, peripheral;
+
 ResultCode scribeHandler::Log(const vector<LogEntry>& messages) {
   //LOG_OPER("received Log with <%d> messages", (int)messages.size());

-  if (throttleDeny(messages.size())) {
+  if (boned) {
+    return TRY_LATER;
+  }
+
+  if (!peripheral && throttleDeny(messages.size())) {
     incrementCounter("denied for rate");
     return TRY_LATER;
   }
Index: src/store.cpp
===================================================================
--- src/store.cpp	(revision 45848)
+++ src/store.cpp	(revision 45886)
@@ -539,7 +539,7 @@
 bool FileStore::handleMessages(boost::shared_ptr<logentry_vector_t> messages) {

-  if (!isOpen()) {
+  if (!isOpen() && !openInternal(true, NULL)) {
     LOG_OPER("[%s] File failed to open FileStore::handleMessages()",
              categoryHandled.c_str());
     return false;
   }
@@ -1247,6 +1247,8 @@
 }

+volatile int peripheral;
+
 NetworkStore::NetworkStore(const string& category, bool multi_category)
   : Store(category, "network", multi_category),
     useConnPool(false),
@@ -1254,6 +1256,7 @@
     remotePort(0),
     opened(false) {
   // we can't open the connection until we get configured
+  peripheral = true;

   // the bool for opened ensures that we don't make duplicate
   // close calls, which would screw up the connection pool's
Index: src/store_queue.cpp
===================================================================
--- src/store_queue.cpp	(revision 45848)
+++ src/store_queue.cpp	(revision 45886)
@@ -194,6 +194,8 @@
   return store->getType();
 }

+volatile int boned;
+
 void StoreQueue::threadMember() {
   LOG_OPER("store thread starting");
@@ -260,10 +262,10 @@
     pthread_mutex_unlock(&cmdMutex);

     // handle messages if stopping, enough time has passed, or queue is large
-    //
+    // and we're not boned.
     if (stop ||
         (this_loop - last_handle_messages > maxWriteInterval) ||
-        msgQueueSize >= targetWriteSize) {
+        msgQueueSize >= targetWriteSize && !boned) {

       if (msgQueueSize > 0) {
         boost::shared_ptr<logentry_vector_t> messages = msgQueue;
@@ -273,12 +275,23 @@
         pthread_mutex_unlock(&msgMutex);

         if (!store->handleMessages(messages)) {
-          // Store could not handle these messages, nothing else to do
+          /*// Store could not handle these messages, nothing else to do
           // other than to record them as being lost
           LOG_OPER("[%s] WARNING: Lost %lu messages!", categoryHandled.c_str(),
                    messages->size());
-          g_Handler->incrementCounter("lost", messages->size());
+          g_Handler->incrementCounter("lost", messages->size());*/
+          // lies; we can requeue them and set boned.
+          boned = 1;
+          pthread_mutex_lock(&msgMutex);
+          for (size_t i = 0; i < msgQueue->size(); i++)
+            messages->push_back((*msgQueue)[i]);
+          msgQueue = messages;
+          msgQueueSize = 0;
+          for (size_t i = 0; i < msgQueue->size(); i++)
+            msgQueueSize += (*msgQueue)[i]->message.size();
+          pthread_mutex_unlock(&msgMutex);
         }
+        else boned = 0;  // write succeeded; no longer boned.
         store->flush();
       } else {
         pthread_mutex_unlock(&msgMutex);

This looks to me like it works. I had to add the hack to FileStore because there seems to be no code path that reopens the file, at least not in as long as I could be bothered to wait.

I tested it by having a simple two-server configuration (one peripheral, one central). The central server is told to write to a 256MB ramdisk that contains a 60MB and a 192MB file; then I delete the 60MB file once scribe starts failing to write, then I delete the 192MB file once scribe fails to write again. I then give the peripheral server a bunch of messages that are the positive integers in increasing order followed by newlines, one number per message. No messages were lost, but a few were duplicated between one chunk and the next.
I know this isn't the right way to handle the problem --- we should make sure there's space for writing a message to disk before writing the actual message --- but am I going to run into strange problems if I use this patch?

(I see in trunk scribe that there's some logic for handling failed messages. I haven't tested it, though. And at a glance, I also don't understand why its resource consumption is bounded; I see no logic to tell clients to try later if we're making no progress on the queue. Or why mustSucceed should be false by default.)

_______________________________________________
Scribeserver-devel mailing list
Scr...@li...
https://lists.sourceforge.net/lists/listinfo/scribeserver-devel
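The requeue-and-reject behavior discussed in this thread can be illustrated in isolation. The sketch below is not Scribe code: the class and member names are invented, and it is deliberately single-threaded, whereas the real StoreQueue guards its queue with pthread mutexes. It only demonstrates the core idea: on a failed write, keep the batch queued and push back on clients with TRY_LATER until a write succeeds.

```cpp
#include <cassert>
#include <deque>
#include <string>
#include <vector>

// Result codes mirroring the Thrift enum used in the thread.
enum ResultCode { OK, TRY_LATER };

// Minimal single-threaded sketch of back-pressure on store failure.
class BackpressureQueue {
public:
    explicit BackpressureQueue(bool* storeHealthy)
        : storeHealthy_(storeHealthy), backlogged_(false) {}

    // Analogue of scribeHandler::Log: reject input while backlogged.
    ResultCode log(const std::vector<std::string>& messages) {
        if (backlogged_) return TRY_LATER;
        for (const auto& m : messages) queue_.push_back(m);
        return OK;
    }

    // Analogue of the store thread's write step: on failure, keep the
    // batch queued and set the backlogged flag instead of dropping it.
    void flush() {
        if (queue_.empty()) return;
        if (*storeHealthy_) {
            persisted_.insert(persisted_.end(), queue_.begin(), queue_.end());
            queue_.clear();
            backlogged_ = false;  // write succeeded; accept input again
        } else {
            backlogged_ = true;   // messages stay queued; clients must retry
        }
    }

    size_t queued() const { return queue_.size(); }
    size_t persisted() const { return persisted_.size(); }

private:
    bool* storeHealthy_;  // stand-in for "the file store can write"
    bool backlogged_;
    std::deque<std::string> queue_;
    std::vector<std::string> persisted_;
};
```

Toggling `storeHealthy` models the disk filling up and later being freed: while it is false, queued messages are retained and new `log` calls get TRY_LATER, so upstream buffer stores hold their data instead of losing it.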
From: Tor M. <tmy...@cs...> - 2009-07-29 06:56:17
I have a central scribe server and a bunch of peripheral scribe servers. The peripheral servers are configured with the central server as a primary (network) store and a file store as a secondary store. This makes the logging system tolerate link failures and a failure of the central server. The central server is configured with only file stores. (Specifically, I made the decision to have a primary file store and a secondary file store, but it happened that both were on the same partition after moving the service to a different machine.)

So, when the central server runs out of disk space, it (a) can't store messages to disk, so it silently discards them, and (b) continues replying to Log calls with OK rather than TRY_AGAIN. (a) alone isn't cool; it should leave the messages in the queue rather than throwing them away. When combined with (b), this all means that when the central server runs out of disk space, all log messages vanish into the ether rather than being buffered at the peripheral servers as they should (until someone fixes things or they too run out of disk space).

So I want to hack scribe so that when it can't store messages to disk for whatever reason, it leaves them in the queue instead of throwing them away, and it also rejects incoming messages until it's able to save messages to the store again.
I wrote the following (against a version of scribe from March):

Index: src/scribe_server.cpp
===================================================================
--- src/scribe_server.cpp	(revision 45848)
+++ src/scribe_server.cpp	(revision 45886)
@@ -269,11 +269,17 @@
   return true;
 }

+extern volatile int boned, peripheral;
+
 ResultCode scribeHandler::Log(const vector<LogEntry>& messages) {
   //LOG_OPER("received Log with <%d> messages", (int)messages.size());

-  if (throttleDeny(messages.size())) {
+  if (boned) {
+    return TRY_LATER;
+  }
+
+  if (!peripheral && throttleDeny(messages.size())) {
     incrementCounter("denied for rate");
     return TRY_LATER;
   }
Index: src/store.cpp
===================================================================
--- src/store.cpp	(revision 45848)
+++ src/store.cpp	(revision 45886)
@@ -539,7 +539,7 @@
 bool FileStore::handleMessages(boost::shared_ptr<logentry_vector_t> messages) {

-  if (!isOpen()) {
+  if (!isOpen() && !openInternal(true, NULL)) {
     LOG_OPER("[%s] File failed to open FileStore::handleMessages()",
              categoryHandled.c_str());
     return false;
   }
@@ -1247,6 +1247,8 @@
 }

+volatile int peripheral;
+
 NetworkStore::NetworkStore(const string& category, bool multi_category)
   : Store(category, "network", multi_category),
     useConnPool(false),
@@ -1254,6 +1256,7 @@
     remotePort(0),
     opened(false) {
   // we can't open the connection until we get configured
+  peripheral = true;

   // the bool for opened ensures that we don't make duplicate
   // close calls, which would screw up the connection pool's
Index: src/store_queue.cpp
===================================================================
--- src/store_queue.cpp	(revision 45848)
+++ src/store_queue.cpp	(revision 45886)
@@ -194,6 +194,8 @@
   return store->getType();
 }

+volatile int boned;
+
 void StoreQueue::threadMember() {
   LOG_OPER("store thread starting");
@@ -260,10 +262,10 @@
     pthread_mutex_unlock(&cmdMutex);

     // handle messages if stopping, enough time has passed, or queue is large
-    //
+    // and we're not boned.
     if (stop ||
         (this_loop - last_handle_messages > maxWriteInterval) ||
-        msgQueueSize >= targetWriteSize) {
+        msgQueueSize >= targetWriteSize && !boned) {

       if (msgQueueSize > 0) {
         boost::shared_ptr<logentry_vector_t> messages = msgQueue;
@@ -273,12 +275,23 @@
         pthread_mutex_unlock(&msgMutex);

         if (!store->handleMessages(messages)) {
-          // Store could not handle these messages, nothing else to do
+          /*// Store could not handle these messages, nothing else to do
           // other than to record them as being lost
           LOG_OPER("[%s] WARNING: Lost %lu messages!", categoryHandled.c_str(),
                    messages->size());
-          g_Handler->incrementCounter("lost", messages->size());
+          g_Handler->incrementCounter("lost", messages->size());*/
+          // lies; we can requeue them and set boned.
+          boned = 1;
+          pthread_mutex_lock(&msgMutex);
+          for (size_t i = 0; i < msgQueue->size(); i++)
+            messages->push_back((*msgQueue)[i]);
+          msgQueue = messages;
+          msgQueueSize = 0;
+          for (size_t i = 0; i < msgQueue->size(); i++)
+            msgQueueSize += (*msgQueue)[i]->message.size();
+          pthread_mutex_unlock(&msgMutex);
         }
+        else boned = 0;  // write succeeded; no longer boned.
         store->flush();
       } else {
         pthread_mutex_unlock(&msgMutex);

This looks to me like it works. I had to add the hack to FileStore because there seems to be no code path that reopens the file, at least not in as long as I could be bothered to wait.

I tested it by having a simple two-server configuration (one peripheral, one central). The central server is told to write to a 256MB ramdisk that contains a 60MB and a 192MB file; then I delete the 60MB file once scribe starts failing to write, then I delete the 192MB file once scribe fails to write again. I then give the peripheral server a bunch of messages that are the positive integers in increasing order followed by newlines, one number per message. No messages were lost, but a few were duplicated between one chunk and the next.
I know this isn't the right way to handle the problem --- we should make sure there's space for writing a message to disk before writing the actual message --- but am I going to run into strange problems if I use this patch?

(I see in trunk scribe that there's some logic for handling failed messages. I haven't tested it, though. And at a glance, I also don't understand why its resource consumption is bounded; I see no logic to tell clients to try later if we're making no progress on the queue. Or why mustSucceed should be false by default.)
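For readers unfamiliar with the topology Tor describes, a peripheral server's primary network store and secondary file store are normally wrapped in a buffer store. The fragment below is only an illustrative sketch: the hostname, port, and paths are made up, and the key names follow the example configs shipped with Scribe, so check them against your version.

```
# Peripheral scribe server: forward to the central server,
# buffering to local disk when the network store fails.
port=1463

<store>
category=default
type=buffer
retry_interval=30
retry_interval_range=10

<primary>
type=network
remote_host=central.example.com
remote_port=1463
</primary>

<secondary>
type=file
fs_type=std
file_path=/var/tmp/scribe_buffer
base_filename=default
max_size=3000000
</secondary>
</store>
```

With this arrangement, the failure Tor reports is exactly the dangerous case: the secondary only absorbs messages if the central server signals failure (TRY_LATER) instead of acknowledging writes it then drops.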
From: Anthony G. <an...@fa...> - 2008-12-08 21:20:00
Jonathan,

Please send me an email stating that you have the rights to this code and that you agree to distribute it under Scribe's license (which is the Apache License, version 2.0). Then attach your code as a unified diff ('svn diff' or 'diff -u') and we will review it. Thank you for the contribution.

-Anthony

On 12/4/08 7:41 PM, "Jonathan Cao" <jon...@ro...> wrote:

The current Scribe file store only supports hourly and daily log rotation. I made some code changes to support 5-minute log rotation. Is there any plan for a change along these lines? What is the procedure for contributing to the code?

Thanks,
Jonathan
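For contributors unfamiliar with the unified-diff format requested above, here is a minimal demonstration. The file names and the config key shown are made up for illustration; in a Subversion checkout of Scribe, `svn diff > my-change.patch` produces the same format directly.

```shell
# Make two throwaway versions of a hypothetical config file.
printf 'rotate_period=hourly\n' > store.conf.orig
printf 'rotate_period=300s\n'   > store.conf

# diff exits with status 1 when the files differ, so guard it with || true.
diff -u store.conf.orig store.conf > rotation.patch || true

# The patch shows removed lines prefixed with '-' and added lines with '+'.
cat rotation.patch
```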
From: Jonathan C. <jon...@ro...> - 2008-12-05 03:47:22
The current Scribe file store only supports hourly and daily log rotation. I made some code changes to support 5-minute log rotation. Is there any plan for a change along these lines? What is the procedure for contributing to the code?

Thanks,
Jonathan
From: Anthony G. <an...@fa...> - 2008-10-23 19:25:11
Welcome to the Scribe Developers mailing list. This list is for all developers of Scribe.