Hi,
We are evaluating Scribe for one of our requirements. The requirement is described below.
We are planning to build a log management system to manage logs coming from various application instances on multiple servers (potentially 50-100 servers).
The logs can come from JBoss AS, JBoss Portal, Tomcat, web servers, NetScaler, etc. The logs can be in any format; the log formats are not standardized.
To collect these logs in a centralized location (maybe Jackrabbit or the Hadoop file system), we are planning to use Scribe.
Given this scenario, I have the following queries:
1. Can Scribe be used successfully for the above scenario?
2. Is Scribe agnostic to log formats, or is there any dependency?
3. Instead of aggregating information from several Scribe agents into a single file, can I just use Scribe to copy the files into multiple directories in a central location? In a nutshell, is it possible for us to aggregate the data only at the application-instance level? If so, any high-level thoughts on how to achieve this would be really helpful.
If there are better ways of doing it, I would like to know about them as well.
I appreciate your help in answering these queries.
Thanks,
sS
1. Can Scribe be used successfully for the above scenario?
Sure. You can log any data you want with Scribe, and you can separate your logs using different category names. 50-100 servers should not be a problem, since Facebook manages over 10 TB/day of logs from over 10,000 servers.
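As a rough sanity check on those numbers, here is the arithmetic in Python. It takes the quoted figures at face value (they are lower bounds), uses decimal units, and assumes load is spread evenly across servers. That even spread is an illustrative simplification, not something measured.

```python
# Back-of-the-envelope check; decimal units (1 TB = 10**12 bytes) are an assumption.
TB = 10**12
SECONDS_PER_DAY = 24 * 60 * 60

facebook_bytes_per_day = 10 * TB   # "over 10 TB/day" from the post
facebook_servers = 10_000          # "over 10,000 servers" from the post

# Average volume per server per day if load were spread evenly.
per_server_per_day = facebook_bytes_per_day / facebook_servers

# The same per-server average applied to the upper end of the 50-100 range.
our_servers = 100
our_bytes_per_day = per_server_per_day * our_servers
our_bytes_per_second = our_bytes_per_day / SECONDS_PER_DAY

print(f"per server: {per_server_per_day / 10**9:.1f} GB/day")
print(f"100 servers: {our_bytes_per_day / 10**9:.0f} GB/day, "
      f"about {our_bytes_per_second / 10**6:.2f} MB/s sustained")
```

Your own per-server volume may be very different, so substitute measured figures before drawing conclusions.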
2. Is Scribe agnostic to log formats, or is there any dependency?
By default, Scribe ignores the format of your data. You just need to give each message a category name and then tell Scribe what to do with all messages of that category. (The only exception is if you configure Scribe to use a BucketStore to route messages based on a prefix of the message.)
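To illustrate, here is a minimal Python model of that idea; it is not Scribe's actual code. `route` picks a destination from the category alone, so payloads in any format pass through untouched. `bucket_for` models the BucketStore exception, where the message prefix before a delimiter is used as a key. The function names, the paths, the crc32 hash, and the use of bucket 0 for messages without a delimiter are all assumptions made for this sketch.

```python
import zlib


def route(category: str, message: bytes, handlers: dict) -> str:
    """Choose a destination from the category alone; the payload is never parsed."""
    return handlers.get(category, handlers["default"])


def bucket_for(message: bytes, delimiter: bytes, num_buckets: int) -> int:
    """Pick a bucket from the message prefix that precedes the delimiter.

    Returns 0 when the delimiter is missing; using 0 as the fallback
    bucket is a choice made for this sketch.
    """
    key, found, _rest = message.partition(delimiter)
    if not found:
        return 0
    # crc32 is a stand-in hash chosen for this sketch.
    return zlib.crc32(key) % num_buckets + 1


# Hypothetical destinations, for illustration only.
handlers = {"jboss": "/logs/jboss", "default": "/logs/other"}

# Payloads in unrelated formats route the same way: only the category is read.
print(route("jboss", b"2009-01-01 ERROR [org.jboss] boom", handlers))
print(route("netscaler", b'{"any": "format"}', handlers))

# Messages sharing the prefix "user42" land in the same bucket.
print(bucket_for(b"user42:clicked", b":", 4) == bucket_for(b"user42:viewed", b":", 4))
```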
3. Instead of aggregating information from several Scribe agents into a single file, can I just use Scribe to copy the files into multiple directories in a central location? In a nutshell, is it possible for us to aggregate the data only at the application-instance level? If so, any high-level thoughts on how to achieve this would be really helpful.
I’m not sure exactly what you are asking here, but you can configure Scribe to write to a different file for each message category. Then all your JBoss logs can be in one file, while your web server logs are in another file.
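In Scribe itself this is done through the store configuration, so no code is needed. Purely to show the outcome, here is a small Python sketch of "one file per category". The `append_message` helper, the `<category>.log` naming, and the temporary directory are all hypothetical and are not how Scribe names its files.

```python
import os
import tempfile


def append_message(base_dir: str, category: str, message: str) -> str:
    """Append one message to the file that belongs to its category."""
    path = os.path.join(base_dir, f"{category}.log")
    with open(path, "a", encoding="utf-8") as f:
        f.write(message.rstrip("\n") + "\n")
    return path


base_dir = tempfile.mkdtemp()  # stand-in for the central log directory
append_message(base_dir, "jboss", "JBoss started")
append_message(base_dir, "webserver", "GET /index.html 200")
append_message(base_dir, "jboss", "JBoss deployed app.war")

print(sorted(os.listdir(base_dir)))  # ['jboss.log', 'webserver.log']
```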
Let me know if this answers your questions.
-Anthony
Hi Anthony,
Thanks for your quick clarifications.
Let me try to elaborate on my third requirement.
Let's say I have two applications, APP1 and APP2.
Multiple instances of each of these applications are running on servers S1 and S2.
So my app instance setup is as follows:
On server S1 -> APP1-Instance1, APP1-Instance2, APP2-Instance2
On server S2 -> APP1-Instance3, APP2-Instance1, APP2-Instance3
Each APP instance produces logs (the logs may be recycled every day).
Also assume I have a central server C1 to which I need to stream the APP instance logs created by the above setup.
Now my requirement is to set up Scribe in such a way that
--> all logs corresponding to APP1-Instance1 (including the recycled log files) are copied to a folder such as /APP1/Instance1/<logfile name> on the centralized server
--> all logs corresponding to APP1-Instance2 (including the recycled log files) are copied to a folder such as /APP1/Instance2/<logfile name> on the centralized server
.
.
.
--> all logs corresponding to APP2-Instance3 (including the recycled log files) are copied to a folder such as /APP2/Instance3/<logfile name> on the centralized server
I hope this makes my requirement clear.
Also, I read that Scribe supports HDFS now. I was wondering whether it supports streaming data into a Jackrabbit file repository as well?
Thanks for your time.
Regards,
sS
You can accomplish what you want by logging all data from App1 Instance1 using a category name of "App1-Instance1". Then you can easily configure Scribe on the central server to write to a separate subdirectory for each new category it encounters. Take a look at the examples directory in the Scribe source for an example of how to configure this.
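Note that Scribe receives individual messages from clients rather than copying files, so something on each server has to read the application log and send its lines under the right category. Here is a rough Python sketch of that agent side, under stated assumptions: the log is recycled by truncating or replacing the file, polling is good enough, and `category_for` and `read_new_lines` are hypothetical helpers of my own. A real agent would hand each (category, line) pair to a Scribe client instead of printing it.

```python
import os
import tempfile


def category_for(app: str, instance: int) -> str:
    """Build a per-instance category name such as 'App1-Instance1'."""
    return f"{app}-Instance{instance}"


def read_new_lines(path: str, offset: int):
    """Return (lines, new_offset) for data appended since offset.

    If the file is now shorter than offset, assume it was recycled and
    start again from the beginning.
    """
    if os.path.getsize(path) < offset:
        offset = 0
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read()
    # Keep only complete lines; a partial trailing line waits for the next poll.
    end = data.rfind(b"\n") + 1
    lines = data[:end].decode("utf-8", errors="replace").splitlines()
    return lines, offset + end


log_path = os.path.join(tempfile.mkdtemp(), "server.log")
category = category_for("App1", 1)
offset = 0

with open(log_path, "wb") as f:
    f.write(b"first\nsecond\n")
lines, offset = read_new_lines(log_path, offset)
print([(category, line) for line in lines])

with open(log_path, "wb") as f:  # simulate a recycled (truncated) log
    f.write(b"third\n")
lines, offset = read_new_lines(log_path, offset)
print([(category, line) for line in lines])
```

A recycled file that grows past the old offset between two polls would be missed by this size check; tracking the inode as well is one way to cover that case.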
-Anthony
Thanks for the quick response.
Any idea about Scribe's support for streaming data into a Jackrabbit file repository?
Thanks,
sS
Sorry, I'm not familiar with Jackrabbit. Currently, Scribe would only be able to support it if you can mount a Jackrabbit file system with acceptable write performance.
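If you do manage to mount it, one way to get a first impression of write performance is a quick probe like the Python sketch below. The `measure_write_throughput` helper, the 8 MB test size, and the 64 KB chunk size are my own assumptions, and the temp directory is only a stand-in for the real mount point. A single sequential write is a crude measure and says nothing about many concurrent writers.

```python
import os
import tempfile
import time


def measure_write_throughput(directory: str,
                             total_bytes: int = 8 * 1024 * 1024,
                             chunk_size: int = 64 * 1024) -> float:
    """Write total_bytes to a temporary file in directory; return bytes per second."""
    chunk = b"x" * chunk_size
    written = 0
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            while written < total_bytes:
                f.write(chunk)
                written += chunk_size
            f.flush()
            os.fsync(f.fileno())  # include the time to push data to the device
        elapsed = time.perf_counter() - start
    finally:
        os.remove(path)  # clean up the probe file
    return written / max(elapsed, 1e-9)


mount_point = tempfile.gettempdir()  # replace with the mounted repository path
rate = measure_write_throughput(mount_point)
print(f"{rate / 10**6:.1f} MB/s")
```

Compare the result with the log volume you expect to receive on the central server.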
-Anthony