From: Ian H. <li...@ho...> - 2008-11-21 01:16:08
Joydeep Sen Sarma wrote:
> Dhruba can shed more light on Chukwa - I haven't looked at it myself.
>
> Does your mail imply that the log files are being tailed and written out to HDFS periodically (by an application outside Scribe)? If so, this is not so different from what we do (except there's more code to deal with catch-ups etc.).

We're not using Scribe at the moment (it and Chukwa weren't around when we started), but I am investigating moving to a more widely used solution.

And yes, we're using something called 'logtail' (http://www.drxyzzy.org/ntlog/logtail.c), which remembers where it was the last time it was run, does a seek() into the logfile, and continues from that point. There is a bit of log-rotation logic so it doesn't miss anything if it was stopped. This stream is then piped into a piece of code that runs on the webserver and writes it into HDFS, switching the filename every 15 minutes (making sure it is unique, etc.).

The downside is that we create files like crazy on HDFS: 4 x the number of webservers per hour, each about 30-180 MB (which is small, IMHO, for a file in HDFS) - hence why we are looking at doing something else ;-)
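For readers who want to reproduce this setup, a rough sketch of the offset-remembering tail plus 15-minute file bucketing described above is below. The state-file path, bucket layout and function names are illustrative assumptions, not the actual code in use; the real tail is the C logtail linked above, and the HDFS writer is a separate piece.

    import os
    import time

    STATE_FILE = "/var/run/weblog.offset"   # hypothetical location of the saved offset

    def tail_since_last_run(log_path):
        """Return bytes appended to log_path since the previous run,
        remembering the file offset between runs the way logtail does."""
        last_offset = 0
        if os.path.exists(STATE_FILE):
            with open(STATE_FILE) as f:
                last_offset = int(f.read().strip() or 0)

        # If the file shrank, assume it was rotated and start over from 0.
        if os.path.getsize(log_path) < last_offset:
            last_offset = 0

        with open(log_path, "rb") as f:
            f.seek(last_offset)              # continue from where the last run stopped
            data = f.read()

        with open(STATE_FILE, "w") as f:
            f.write(str(last_offset + len(data)))
        return data

    def bucket_filename(host, now=None):
        """Unique-per-host filename that switches every 15 minutes."""
        window = int((now or time.time()) // 900) * 900
        return "/logs/%s/access_%s" % (host, time.strftime("%Y%m%d_%H%M", time.gmtime(window)))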
From: Ian H. <li...@ho...> - 2008-11-21 01:01:10
Hi Joydeep,

We are currently just doing the naive approach: writing log files directly into Hadoop as individual files, rotating them every 15 minutes to avoid the append problem. We use logtail on the client side to decouple the system, and a map/reduce job then aggregates the info 10-30 minutes later.

I was wondering if you had seen the recent contribution to Hadoop called 'Chukwa', and what your thoughts were on it.

Personally I'm looking at Scribe (and Chukwa) for realtime logging and decision systems.
From: Joydeep S. S. <js...@fa...> - 2008-11-21 00:49:58
Dhruba can shed more light on Chukwa - I haven't looked at it myself.

Does your mail imply that the log files are being tailed and written out to HDFS periodically (by an application outside Scribe)? If so, this is not so different from what we do (except there's more code to deal with catch-ups etc.).
From: Joydeep S. S. <js...@fa...> - 2008-11-21 00:14:09
Hi folks,

I can shed some light on Scribe and HDFS/Hadoop integration at FB:

- When we (actually Avinash - who's leading the Cassandra project now) started out, we investigated writing log files from Scribe directly to HDFS (using the libhdfs C++ API). However, there were a few issues with this approach that steered us in a different direction:

  o HDFS uptime: there have been periods of sustained downtime and we can't rule that out in the future. There are many reasons - software upgrades being the most common. Buffering data in Scribe for such long periods didn't seem like a very good route.

  o Lack of append support in HDFS in the early days.

  o Desire to build loosely coupled systems (otherwise we would have to upgrade Scribe servers with a new libhdfs every time we had a software upgrade on HDFS).

  o Flexibility in transforming data while copying into HDFS (more on this later).

- Currently we have an rsync-like model to pull data from Scribe to HDFS:

  o Scribe writes data to NetApp filers. These filers are high-speed buffers for the most part.

  o We have 'copier' jobs that 'pull' data from Scribe output locations in these filers to HDFS. They maintain file offsets for copied data in a registry, so the jobs can be invoked periodically and copying can be continuous.

  o 'Copier' jobs can run in continuous mode, or can be invoked to copy (or re-copy) data from older dates (this can be important if incorrect data was logged or data shows up late).

  o 'Copier' jobs are map-only jobs in Hadoop - this means that we can increase the copy parallelism if required. For example, if we are falling behind, or HDFS was down for a long time and there's a lot of accumulated data, the copiers will dial up the parallelism (up to a maximum, so as not to trip up the filers completely).

- Data 'copied' into HDFS is eventually 'loaded' into Hive (this is our open-source data warehousing layer on top of Hadoop). Usually this loading is a nightly process, but in a small number of cases we load data at hourly granularity for semi-real-time applications. Application processing over Scribe log sets typically uses Hive QL.

- One interesting angle is 'close of books'. Scribe itself does not provide any hard guarantees on when all data for a given date will have been logged. However, several applications (especially revenue-sensitive ones) need a hard deadline ("invoke me when all data for a given day has been logged"). For such applications, the loading process typically waits until 2am or so at night (on day N+1) and then scans data from days N-1, N and N+1 to find all the relevant data for day N (using the unix timestamps that are typically logged with the data). This is the data that's loaded into date partition N for the relevant Hive table. Clearly the 2am boundary is arbitrary, and we will move towards more heuristic-based ways of determining when data for a given date is (almost) complete.

- We have instances of text, JSON and Thrift data sets logged via Scribe. For the case of Thrift (particularly when there's a Thrift file with heterogeneous records) we do some transforms in the copying process to make the subsequent loading easier. Thrift data also shows up in TFileTransport format - and this cannot be parallel-processed by Hadoop natively (although it wouldn't be so hard to arrange that as well) - so we always convert Thrift data into SequenceFiles as it's copied into Hadoop.

There are several pieces here that are not open-sourced and, depending on community interest, can be made available: the Scribe-to-HDFS copier code, for one. TFileTransport's Java implementation is also not open-sourced (since there is constant talk of superseding it with newer, better transports).

Please let us know if there are more questions and we would be happy to answer.

Joydeep

------ Forwarded Message
From: Johan Oskarsson <jo...@os...>
Date: Wed, 19 Nov 2008 09:36:38 -0800
To: <scr...@li...>
Subject: [Scribeserver-users] Scribe and Hadoop

I understand Scribe is being used to put logs into the Hadoop HDFS at Facebook. I'd love to hear more about how that works and how to replicate the setup you guys have.

/Johan

------ End of Forwarded Message
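To make the 'close of books' step concrete, the day-N selection over the N-1/N/N+1 scan boils down to a timestamp filter like the sketch below. The record layout and the way records reach the filter are illustrative assumptions, not the actual loader code.

    DAY = 86400  # seconds in a day

    def close_books(records, day_n_start):
        """Pick out the records that belong to calendar day N from a scan over
        the files copied for days N-1, N and N+1, keyed on the unix timestamp
        that is logged with each record."""
        day_n_end = day_n_start + DAY
        for ts, payload in records:
            if day_n_start <= ts < day_n_end:
                yield ts, payload

    # Example: records gathered from three days' worth of copied files.
    day_n = 1227052800  # 2008-11-19 00:00:00 UTC
    scanned = [
        (day_n - 10, "logged late, belongs to day N-1"),
        (day_n + 100, "belongs to day N"),
        (day_n + DAY + 5, "belongs to day N+1"),
    ]
    partition_n = list(close_books(scanned, day_n))  # keeps only the middle record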
From: Johan O. <jo...@os...> - 2008-11-19 18:04:32
I understand Scribe is being used to put logs into the Hadoop HDFS at Facebook. I'd love to hear more about how that works and how to replicate the setup you guys have.

/Johan
From: Anthony G. <an...@fa...> - 2008-11-11 20:57:30
Yes, when rotate_period is 'daily', the filename will include the date. How about you try it out?

-Anthony

On 11/11/08 9:56 AM, "Alex Loddengaard" <al...@cl...> wrote:
> I'm a little confused about rotate_period. Will setting this to daily change the filename format that Scribe creates? So instead of category_00001, I would see something like category_todaysdate_00001?
From: Alex L. <al...@cl...> - 2008-11-11 17:56:19
Great information, Anthony. Thanks!

I'm a little confused about rotate_period. Will setting this to daily change the filename format that Scribe creates? So instead of category_00001, I would see something like category_todaysdate_00001?

Thanks again.

Alex
From: Anthony G. <an...@fa...> - 2008-11-10 21:11:07
After writing to category_99999, Scribe will write to category_100000. You can test this out yourself by creating a file named category_99999 and then logging until it rotates to a new file. The file numbers go as high as INT_MAX.

Also, I recommend setting a File Store's rotate_period to "daily" so that every day the file number will reset with a new date. And you can configure the File Store's max_size to determine how often Scribe will write to a new file. By setting either of these configuration options, it should be easier to manage your log files.

-Anthony

On 11/9/08 3:27 PM, "Alex Loddengaard" <al...@cl...> wrote:
> Scribe outputs files of the form "category/category_XXXXX." What happens when "XXXXX" is "99999"?
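To illustrate, a minimal File Store stanza using these two options might look like the sketch below. The port, path and size are placeholders, and the full option list lives on the Scribe wiki rather than here.

    port=1463
    max_msg_per_second=2000000
    check_interval=3

    <store>
    category=default
    type=file
    fs_type=std
    file_path=/var/log/scribe
    # daily rotation resets the file number each day and puts the date in the filename
    rotate_period=daily
    # start a new file once the current one reaches roughly 100 MB
    max_size=100000000
    </store>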
From: Alex L. <al...@cl...> - 2008-11-09 22:27:48
Scribe outputs files of the form "category/category_XXXXX." What happens when "XXXXX" is "99999"?

Thanks.

Alex
From: Anthony G. <an...@fa...> - 2008-11-06 22:40:43
Alex,

See the Scribe wiki for more information about configuration options: http://scribeserver.wiki.sourceforge.net/

If your Scribe instance is using up a ton of memory, this is most likely due to too many simultaneous connections trying to log large amounts of data. Scribe is built using a Thrift nonblocking server, and Thrift will attempt to allocate a buffer for every open connection. (Thrift developers are currently working on a feature to help manage the number of Thrift connections, but this has not yet been released.)

The only way to reduce this memory usage is to either limit the number of simultaneous connections or reduce the amount of data sent in each call to Scribe. I'm not sure how you have configured Scribe, but turning on 'use_conn_pool=yes' may help. (You would need to use connection pooling on all the Scribe servers that are sending messages to the Scribe server that is using too much memory.) Feel free to send me a copy of the Scribe configuration(s) you are using and I will take a look.

-Anthony
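Connection pooling is set on the network store of each sending server; a sketch of such a stanza, with a placeholder hostname and port, would be roughly:

    <store>
    category=default
    type=network
    remote_host=central-scribe.example.com
    remote_port=1463
    # share a pooled connection to the downstream server instead of one per store
    use_conn_pool=yes
    </store>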
From: Alex L. <al...@cl...> - 2008-11-06 22:03:22
Hey Anthony,

Thanks for the info! It turns out that our downed nodes were a result of something entirely other than Scribe. Scribe had nothing to do with it, but at the time we thought it did.

Thanks again! This new wiki looks great.

Alex
From: Alex L. <al...@cl...> - 2008-10-31 20:51:00
First off, thanks for contributing Scribe! It's an awesome service, and we already have it up and running to collect Hadoop log messages.

Can you provide a list of configuration options along with descriptions for each? The example configurations are not entirely self-explanatory.

Also, what's a typical Scribe server's memory usage? A few of our nodes just went down due to swap and memory being entirely used. I blame this insane memory usage on a bad configuration file, but I was hoping someone else could comment on other possible reasons why Scribe would consume so much memory. I'm nearly 100% certain that no other process was consuming a large amount of memory.

Thanks!

Alex
From: Anthony G. <an...@fa...> - 2008-10-23 19:24:52
Welcome to the Scribe Users mailing list. This list is for all users of Scribe.