CAT
CAT is a foundational component of the server project.
...It offers real-time dashboards showing throughput, response times, error rates, and service dependency graphs, which help operations and development teams collaborate on reliability issues. Beyond metrics, it supports tracing: context is propagated across RPC boundaries so that problems such as latency spikes or failed calls can be followed end to end. Teams can define alert rules and anomaly detection to be notified proactively. The system supports multiple data backends and ingestion pipelines, collecting data from JVM, C/C++, Python, and other ecosystems. With the collected data, CAT supports analysis of hotspots, trending anomalies, and capacity planning to drive continuous reliability improvements.
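To make the idea of propagating context across RPC boundaries concrete, here is a minimal, self-contained sketch in Python. It does not use CAT's client API or wire format: the header names, the `Span` record, the in-memory `COLLECTED` list, and the two example services are all hypothetical and exist only for illustration. The point it shows is that the caller passes its trace ID and span ID along with the call, so the callee's span can be linked back to the caller's.

```python
import time
import uuid
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical header names for this sketch; not CAT's actual wire format.
TRACE_ID = "x-trace-id"
PARENT_ID = "x-parent-span-id"


@dataclass
class Span:
    trace_id: str             # shared by every span in one end-to-end request
    span_id: str              # unique to this unit of work
    parent_id: Optional[str]  # span_id of the caller, None for the root span
    name: str
    status: str = "ok"
    duration_ms: float = 0.0


# Stands in for a reporting backend; a real client would ship spans elsewhere.
COLLECTED: List[Span] = []


def start_span(name: str, headers: Optional[Dict[str, str]] = None) -> Span:
    """Continue the caller's trace if headers carry one, else start a new trace."""
    headers = headers or {}
    return Span(
        trace_id=headers.get(TRACE_ID) or uuid.uuid4().hex,
        span_id=uuid.uuid4().hex[:16],
        parent_id=headers.get(PARENT_ID),
        name=name,
    )


def inject(span: Span) -> Dict[str, str]:
    """Build the headers a caller attaches to an outgoing RPC."""
    return {TRACE_ID: span.trace_id, PARENT_ID: span.span_id}


def finish(span: Span, started: float, error: Optional[Exception] = None) -> None:
    """Record duration and status, then hand the span to the collector."""
    span.duration_ms = (time.perf_counter() - started) * 1000.0
    if error is not None:
        span.status = "error"
    COLLECTED.append(span)


def inventory_service(headers: Dict[str, str]) -> None:
    """Hypothetical downstream service that always fails, to show a failed call."""
    span = start_span("inventory.check", headers)
    started = time.perf_counter()
    try:
        raise TimeoutError("inventory lookup timed out")
    except TimeoutError as exc:
        finish(span, started, exc)
        raise


def order_service() -> Span:
    """Hypothetical entry-point service that calls inventory_service."""
    span = start_span("order.create")
    started = time.perf_counter()
    try:
        # The dict returned by inject() plays the role of RPC metadata.
        inventory_service(inject(span))
        finish(span, started)
    except TimeoutError as exc:
        finish(span, started, exc)
    return span


if __name__ == "__main__":
    order_service()
    for s in COLLECTED:
        print(s.trace_id, s.parent_id, s.name, s.status, f"{s.duration_ms:.3f}ms")
```

Because both spans share one trace ID and the downstream span records the caller's span ID as its parent, a backend can reassemble the call chain and attribute the failure to `inventory.check` rather than only to the entry point.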