In today’s constantly connected, fast-paced business environment, a high-performing network is critical for all organizations. If network performance is less than optimal, it can impact user experience, revenue and even brand reputation.
Network key performance indicators (KPIs) are benchmarks by which optimal network performance is determined. Tracking network performance against key performance indicators helps network managers make proactive decisions to ensure agreed service levels are met.
KPIs offer measurable data that helps the network team make informed decisions about infrastructure upgrades, performance improvements and resource allocation, based on actual needs and usage trends.
KPIs are not only used to demonstrate network performance levels to other parties within the business. They are also an invaluable tool for network managers to measure and assess the quality and capability of the network.
When troubleshooting network degradation or outages, engineers rely on KPIs to determine the root cause of degradation, such as packet loss, saturation, bandwidth hogs, or interface and network device outages.
With the right monitoring solution like Statseeker, network engineers can quickly identify and address these issues, ensuring the network runs smoothly and service levels are consistently met. A tool that removes a lot of the manual work needed to find and compile this data means the network team spends less time searching for problems and more time making decisions and enacting network improvements.
But which metrics provide the essential information required for measuring network performance? While there is not a defined set of universal standards, the following five categories tend to be the most commonly used:
1. Device health
CPU, memory utilization, and component temperatures are the foundation for proactive device health monitoring. Network engineers should be able to obtain and report on these metrics (or similar) from every device on the network.
Device health metrics can help identify hardware or firmware issues, detect environmental changes, and monitor equipment performance. As CPU, memory and temperature metrics are the most common device health KPIs, setting alerts on these metrics means problems can be addressed before network users notice or report an issue.
For example, steady temperature readings in an air-conditioned data center form part of a device's regular pattern of behavior. If an overworked device then reports a rise in temperature, that reading can be identified as an anomaly, your team can be alerted in real time, and the issue can be fixed as quickly as possible.
With the right network monitoring tool in place, a baseline of regular device health behavior can be established. Even the smallest deviation from the baseline can then be identified using threshold rules and alerts and caught before it leads to a bigger issue.
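The baseline-and-threshold idea can be sketched in a few lines of Python. This is a minimal illustration, not how any particular monitoring product works: the function name `find_anomalies`, the 12-sample baseline window, the three-standard-deviation threshold and the temperature readings are all assumptions made for the example.

```python
from statistics import mean, stdev


def find_anomalies(readings, baseline_size=12, max_sigma=3.0):
    """Return (index, value) pairs that deviate from the baseline.

    The first `baseline_size` readings define normal behavior. Later
    readings are flagged when they sit more than `max_sigma` standard
    deviations away from the baseline mean.
    """
    baseline = readings[:baseline_size]
    centre = mean(baseline)
    limit = max_sigma * stdev(baseline)
    return [
        (i, value)
        for i, value in enumerate(readings[baseline_size:], start=baseline_size)
        if abs(value - centre) > limit
    ]


# Hypothetical inlet temperatures in degrees Celsius, one per poll.
temps = [22.0, 22.1, 21.9, 22.0, 22.2, 21.8, 22.0, 22.1, 21.9, 22.0, 22.1, 21.9,
         22.0, 22.1, 27.5, 22.0]
anomalies = find_anomalies(temps)
print(anomalies)  # the 27.5 reading at index 14 is flagged
```

A fixed baseline window keeps the example short. A real deployment would more likely use a rolling or time-of-day-aware baseline so that normal daily cycles are not reported as anomalies.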
2. Device availability
Ideally, the network team should be able to see the availability of every device on the network in as near to real time as possible. That means using a network performance monitoring solution which spots an outage within a minute of it occurring and generates an alert. If the network team only becomes aware that a router or a managed switch is not working when a user calls to complain, that is too late.
The more detailed the information engineers have about device availability, the better they can assess which devices experience the most outages.
A device which experiences a five-minute outage every day, for example, accumulates about 30 hours of downtime over a year. That is roughly four eight-hour business days of impact to the network team and the business.
Outage event data, retained in full (never averaged or rolled up), allows engineers to pinpoint the precise times devices were down, even going back months.
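To show why full-resolution outage records matter, the sketch below totals downtime and availability from a list of outage start and end times. The data is hypothetical (a five-minute outage at 09:00 every day of 2023), the function names are made up for the example, and the eight-hour business day is an assumption.

```python
from datetime import datetime, timedelta


def total_downtime(outages):
    """Sum the duration of (start, end) outage intervals."""
    return sum((end - start for start, end in outages), timedelta())


def availability_percent(outages, period_start, period_end):
    """Percentage of the reporting period during which the device was up."""
    period = period_end - period_start
    return 100.0 * (1 - total_downtime(outages) / period)


# Hypothetical record: one five-minute outage at 09:00 on each day of 2023.
year_start = datetime(2023, 1, 1)
year_end = datetime(2024, 1, 1)
outages = [
    (year_start + timedelta(days=d, hours=9),
     year_start + timedelta(days=d, hours=9, minutes=5))
    for d in range(365)
]

down = total_downtime(outages)
hours_down = down / timedelta(hours=1)
business_days = hours_down / 8  # assumes an eight-hour business day
availability = availability_percent(outages, year_start, year_end)
print(f"{hours_down:.1f} h down, {business_days:.1f} business days, "
      f"{availability:.3f}% available")
```

Because every outage keeps its own timestamps, the same list can also answer questions such as which hour of the day outages cluster in. That detail is lost once the data has been averaged.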
3. Latency and packet loss
Today, every industry depends heavily on high-quality voice and video communications. The performance of these services can be significantly affected by challenges such as latency, jitter, or packet loss.
Measuring network latency and packet loss together gives you an early warning of network problems. A sustained upward trend in either usually affects the user experience and the services delivered across the network, so it is a signal to investigate.
In environments where network latency is operation-critical, such as hospitals or high-frequency trading companies, being able to report on sub-millisecond round-trip time (RTT) latency is essential. Network engineers will also benefit from a solution which lets them measure latency to a single device from two (or more) separate locations, so they can see differences that impact the device and the user experience.
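As a rough illustration of how these figures relate, the sketch below summarizes a batch of probe results into packet loss, average RTT and jitter. The sample values and the function name `summarize_probes` are hypothetical. Jitter is approximated here as the mean absolute difference between consecutive received RTTs, which is a simplification and not a standardized formula.

```python
from statistics import mean


def summarize_probes(rtts_ms):
    """Summarize RTT samples in milliseconds; None marks a lost probe."""
    if not rtts_ms:
        raise ValueError("at least one probe result is required")
    received = [r for r in rtts_ms if r is not None]
    loss_pct = 100.0 * (len(rtts_ms) - len(received)) / len(rtts_ms)
    avg_rtt = mean(received) if received else None
    # Simplified jitter: mean absolute change between consecutive
    # received RTTs. Lost probes are skipped rather than penalized.
    diffs = [abs(b - a) for a, b in zip(received, received[1:])]
    jitter = mean(diffs) if diffs else 0.0
    return {"loss_pct": loss_pct, "avg_rtt_ms": avg_rtt, "jitter_ms": jitter}


# Hypothetical sub-millisecond RTT samples from ten probes, two of them lost.
samples = [0.5, 0.75, None, 0.5, 1.0, 0.75, None, 0.5, 0.5, 0.5]
summary = summarize_probes(samples)
print(summary)
```

Running the same summary against probes sent from two different locations to the same device gives a simple side-by-side comparison of the kind described above.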
4. Network interface availability and utilization
Common KPIs, such as volume of network traffic, can be measured using SNMP polling. SNMP polling also provides network engineers with visibility of the number of errors and discards per interface, inbound and outbound.
Interfaces can be polled for availability or inactivity. Measuring the availability and utilization of interfaces on a network gives engineers insights into whether these critical links are delivering the expected service level. Any downtime on an interface during core service hours could translate into revenue loss and a reduction in user productivity. Bandwidth data is a valuable measure to assess the capacity of the network and predict the need for capacity upgrades.
Traffic statistics, errors and discards polled from network interfaces combine to signal the health and performance of a link. Network engineers can use this data to highlight any potential degradation or saturation. If a link is being updated or needs further inspection, a live view of the traffic on that link is important.
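Utilization is usually derived from the difference between two polls of an interface octet counter, such as `ifHCInOctets` from the standard IF-MIB. The sketch below shows only that arithmetic and performs no SNMP requests itself. The function name, the counter values, the 60-second interval and the link speeds are hypothetical.

```python
COUNTER64_MAX = 2**64  # 64-bit counters such as ifHCInOctets wrap here
COUNTER32_MAX = 2**32  # 32-bit counters such as ifInOctets wrap here


def utilization_percent(prev_octets, curr_octets, interval_s, speed_bps,
                        counter_max=COUNTER64_MAX):
    """Link utilization between two polls of an octet counter."""
    # Counters wrap back to zero at counter_max. The modulo recovers the
    # true delta as long as the counter wrapped at most once.
    delta_octets = (curr_octets - prev_octets) % counter_max
    bits_per_second = delta_octets * 8 / interval_s
    return 100.0 * bits_per_second / speed_bps


# Hypothetical 1 Gbps link polled 60 seconds apart.
inbound = utilization_percent(1_000_000_000, 4_750_000_000, 60, 1_000_000_000)
print(f"inbound utilization: {inbound:.1f}%")

# Hypothetical 32-bit counter that wrapped between polls one second apart.
wrapped = utilization_percent(2**32 - 1000, 500, 1, 1_200_000,
                              counter_max=COUNTER32_MAX)
print(f"utilization across a counter wrap: {wrapped:.1f}%")
```

On fast links a 32-bit counter can wrap more than once between polls, which this calculation cannot detect. That is one reason 64-bit counters and short polling intervals are generally preferred.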
5. Device-specific metrics
Most SNMP-enabled network devices provide common key performance metrics. However, certain devices provide additional metrics for network functions which are as important for monitoring user experience as the universal data.
For example, additional metrics can help determine whether UPS units are charging correctly, ensure load balancers are effectively distributing connections across different pools, detect unusual spikes in VPN bandwidth that may be impacting user experience, or monitor firewall activity to identify potential security threats and ensure proper traffic filtering.
Being able to access and visualize this critical performance data in your NOC is key to ensuring the networking team can resolve incidents before they become outages. Ideally, metrics from multiple device types are brought together in a single dashboard, so engineers do not need to search through multiple tools to reach the right information.
Find out how Statseeker helps network teams deliver optimal network performance, or try Statseeker for yourself by accessing a cloud demo.