From: Guo, J. <ju...@eb...> - 2017-03-29 02:54:23
|
Hi,

I'm writing to ask whether Ganglia works well for a large-scale cluster (more than 4000 nodes).

According to the Ganglia documentation, Ganglia can scale to handle clusters with 2000 nodes, so many people are concerned about using it for a 4000-node production cluster. The documentation says:

"It has been used to link clusters across university campuses and around the world and can scale to handle clusters with 2000 nodes."

If the cluster is larger than 2000 nodes, say 4000 nodes, can Ganglia handle it properly?

To verify this, I created a 5000-node Ganglia deployment on top of a Docker cluster (10 machines). I put 500 nodes in each cluster, so there are 10 clusters, and these 10 clusters are all in the same grid. For each gmond, I use a script to generate 30 custom metrics (with gmetric).

Currently it works fine in the Docker-based test environment.

So, my question is whether Ganglia is suitable for a 4000-node cluster.

Thanks & Best Regards,
Jason Guo
|
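For anyone who wants to reproduce a similar load test, here is a minimal sketch of what such a per-node script could look like. It is not the script used in the test above. The metric names, value range, type and units are placeholders chosen for illustration; only the gmetric options `--name`, `--value`, `--type` and `--units` are taken from the real command-line tool.

```python
import random
import shutil
import subprocess

METRIC_COUNT = 30  # matches the 30 custom metrics per gmond mentioned above


def build_gmetric_commands(count=METRIC_COUNT, prefix="custom_metric", rng=None):
    """Return one gmetric argv list per synthetic metric.

    The prefix and the random 0..100 values are hypothetical; a real test
    would substitute whatever metrics it wants to simulate.
    """
    rng = rng or random.Random()
    commands = []
    for i in range(count):
        commands.append([
            "gmetric",
            "--name", f"{prefix}_{i:02d}",        # placeholder metric name
            "--value", str(rng.randint(0, 100)),  # placeholder value
            "--type", "uint32",
            "--units", "count",
        ])
    return commands


def publish(commands):
    """Run each command if gmetric is on PATH; otherwise print it (dry run)."""
    have_gmetric = shutil.which("gmetric") is not None
    for argv in commands:
        if have_gmetric:
            subprocess.run(argv, check=True)
        else:
            print(" ".join(argv))
```

Calling `publish(build_gmetric_commands())` periodically (for example from cron inside each container) would emit one batch of 30 metrics per run. Across 5000 nodes that comes to 150,000 custom metrics in the grid.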
From: Guo, J. <ju...@eb...> - 2017-03-30 08:30:46
|
Thanks Vladimir.

As you mentioned, FB had clusters with tens of thousands of nodes in a cluster.

How did they orchestrate these nodes? Here are some options I have in mind:

1. All the nodes share a few centralized gmonds, and all of them belong to a single cluster (the cluster concept in Ganglia).
2. All the nodes share a few centralized gmonds, each centralized gmond belongs to a different cluster, and a single gmetad polls data from these centralized gmonds.
3. There are multiple gmetads/grids, and these grids are then orchestrated by a centralized gmetad/grid.

Thanks & Best Regards,
Jason Guo

From: Vladimir Vuksan <vl...@ve...>
Date: Wednesday, March 29, 2017 at 20:09
To: "Guo, Jason" <ju...@eb...>, "gan...@li..." <gan...@li...>
Subject: Re: [Ganglia-developers] Does Ganglia work well for a large-scale cluster

Hi Jason,

it depends on the number of metrics and associated metadata in the cluster and on how busy gmetad is overall. It also depends on your hardware. At one point FB had clusters with tens of thousands of nodes in a cluster.

Try to keep your metrics lean, i.e. don't add any metric descriptions if you don't have to, so as to keep the XML payload small, and it should be fine.

Vladimir

_______________________________________________
Ganglia-developers mailing list
Gan...@li...
https://lists.sourceforge.net/lists/listinfo/ganglia-developers
|
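To make option 2 concrete, the sketch below renders the `data_source` lines a single gmetad would need in order to poll a few aggregating gmonds per cluster. The cluster names, host names and the choice of two aggregators per cluster are made up for illustration. The line format (`data_source "name" [interval] host:port ...`), the 15-second polling interval and gmond port 8649 are the usual gmetad.conf conventions, but check them against your own installation.

```python
def example_clusters(cluster_count=10, aggregators_per_cluster=2):
    """Build a hypothetical layout: N clusters, each with a few aggregator gmonds."""
    return {
        f"cluster{c:02d}": [
            f"agg{a}.cluster{c:02d}.example.com"  # placeholder host names
            for a in range(1, aggregators_per_cluster + 1)
        ]
        for c in range(1, cluster_count + 1)
    }


def render_data_sources(clusters, polling_interval=15, port=8649):
    """Render one gmetad.conf data_source line per cluster.

    gmetad tries the listed hosts in order, so listing more than one
    aggregator per cluster gives it a fallback if the first is down.
    """
    lines = []
    for name, hosts in clusters.items():
        endpoints = " ".join(f"{host}:{port}" for host in hosts)
        lines.append(f'data_source "{name}" {polling_interval} {endpoints}')
    return "\n".join(lines)
```

`print(render_data_sources(example_clusters()))` yields ten lines, one per 500-node cluster. For option 3, the same line format would presumably point at the child gmetads rather than at gmonds, but that layout is not sketched here.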
From: Vladimir V. <vl...@ve...> - 2017-03-30 13:23:09
|
Clusters are logical groupings of like hosts. This can be, e.g., per location (same data center), per app, or per function (DB, web, etc.). It really depends on how you view your environment. There is no right or wrong way to group it.

Vladimir
|
From: Anders B. <an...@ec...> - 2017-03-30 21:36:42
|
Also, if the size of the XML payload is the biggest concern (rather than the sheer amount of XDR traffic), then gzip compression would be a good idea:

    gzip_output = yes

See https://www.quantcast.com/blog/quantcast-open-source-diaries-ganglia-gzip/ for some background.

You might also want to look into using rrdcached:
https://github.com/ganglia/monitor-core/wiki/Integrating-Ganglia-with-rrdcached

/Anders
|
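As a rough way to see why both pieces of advice in this thread (lean metrics and gzip) matter, the sketch below builds a synthetic XML payload and compares its raw and gzip-compressed sizes. The XML is only a simplified imitation of what gmond emits, not the real schema, and the description text is invented. The numbers it produces indicate the trend, not real Ganglia payload sizes.

```python
import gzip


def synthetic_host_xml(host, metric_count=30, with_description=False):
    """Return a simplified, Ganglia-like XML fragment for one host (not the real schema)."""
    parts = [f'<HOST NAME="{host}">']
    for i in range(metric_count):
        parts.append(
            f'<METRIC NAME="custom_metric_{i:02d}" VAL="{i}" TYPE="uint32" UNITS="count">'
        )
        if with_description:
            # Hypothetical description text, standing in for optional metadata.
            parts.append(
                '<EXTRA_DATA><EXTRA_ELEMENT NAME="DESC" '
                'VAL="A long human-readable description of this metric"/></EXTRA_DATA>'
            )
        parts.append("</METRIC>")
    parts.append("</HOST>")
    return "".join(parts)


def payload_sizes(host_count=500, metric_count=30, with_description=False):
    """Return (raw_bytes, gzip_bytes) for one synthetic cluster payload."""
    xml = "".join(
        synthetic_host_xml(f"node{h:04d}", metric_count, with_description)
        for h in range(host_count)
    ).encode("utf-8")
    return len(xml), len(gzip.compress(xml))
```

Comparing `payload_sizes()` with `payload_sizes(with_description=True)` shows how much optional metadata inflates the uncompressed payload for a 500-node cluster. The second element of each tuple shows how well such repetitive XML compresses.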