Throughout my career as an IT professional, I have set up, knocked down, broken, fixed, spilled blood over, and cried over network monitoring systems (NMS). From an IT infrastructure standpoint, justifying an NMS is easy: you want to know what happens in your environment, and when and where it happens, so that you can determine the "WHY" and "HOW."
This article focuses on the basics of an NMS rather than on a sales pitch for a specific product. With that said, here is what I'll be covering:
- Where to start
- What to monitor
- How to monitor and alarm
- When to send alarms
- Proactive vs. Reactive
Where to Start
Start by determining the needs of your organization. Where else would you start? Ask everyone who is responsible for network, server, and storage operations at your place of business. Once you have the list, look at your SLAs (or create them). Evaluate the response times and expectations required for each item in the "needs" list. Once you get this far, it's time to look into products, sit through demos, watch YouTube videos, meet with sales teams, and perform your own due diligence.
What to Monitor
So you built a list of the needs of your organization. Let's use that as a guide and continue adding to it. All of the data center assets should be the priority. CPU utilization, disk utilization, memory utilization, IOPS, basic ICMP checks, Windows services, and basic ports (80, 443, 22, etc.) are just a start. Every environment is different, and chances are there are a variety of items that need specific monitoring.
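As a rough illustration of the basic port checks listed above, here is a minimal Python sketch that tests whether a TCP port accepts connections. The function names `check_tcp_port` and `check_ports` and the two-second timeout are assumptions made for this example. A real NMS would have its own built-in equivalent.

```python
import socket


def check_tcp_port(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def check_ports(host, ports=(80, 443, 22)):
    """Check several ports on one host and return {port: is_open}."""
    return {port: check_tcp_port(host, port, timeout=2.0) for port in ports}
```

Calling `check_ports` with a hostname from your own "needs" list returns a dictionary mapping each port to whether it answered. A successful connection only shows that the port is listening, not that the service behind it is healthy.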
How to Monitor & Alarm
Now that you understand the needs of your organization and know what you're going to monitor, it's time to figure out how you're going to do this. A robust NMS will have tools built in that capture data for every monitor that is configured. Some NMS products require remote agents to be installed on devices; most, however, simply rely on data being sent to the monitoring host in some capacity. ICMP (ping), WMI, and SNMP are by far the easiest methods to start with for monitoring uptime in your organization.
When to Send Alarms
So your monitors are ready and you want to be notified when something goes down or is approaching a threshold. Determine how you will receive the alarms: typically via the NMS web interface, SMS (text message), or email. Then determine frequency and schedules (e.g., SMS after hours), and make sure the setup works for your team and meets the expectations/SLAs.
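To make the scheduling idea concrete, here is a small Python sketch of how alarm routing by time of day might look. The business hours (08:00 to 17:00, Monday through Friday), the severity names, and the channel names are all assumptions chosen for illustration, not settings from any particular product.

```python
from datetime import datetime, time

# Assumed business hours; adjust to match your own team and SLAs.
BUSINESS_START = time(8, 0)
BUSINESS_END = time(17, 0)


def is_after_hours(when):
    """Return True on weekends or outside the assumed business hours."""
    if when.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        return True
    return not (BUSINESS_START <= when.time() < BUSINESS_END)


def choose_channels(severity, when):
    """Pick notification channels for an alarm raised at the given time."""
    channels = ["web"]  # always visible in the NMS web interface
    if severity in ("warning", "critical"):
        channels.append("email")
    if severity == "critical" and is_after_hours(when):
        channels.append("sms")  # text the on-call person after hours
    return channels
```

With these assumed rules, a critical alarm at 23:00 on a weekday goes to the web interface, email, and SMS, while the same alarm at 10:00 skips the text message.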
Proactive vs. Reactive
One beauty of an NMS, and something that should be a selling point of every solution, is the ability to respond to issues before they happen, or at least before end users are impacted. Configuring specific and/or multiple thresholds on devices is key to being as proactive as possible in your environment. An example of a proactive threshold monitor would be an alarm on the C: drive of a Windows server at 90% utilization. Depending on the size of the drive, you should have enough time to log in and clean up items before the drive fills up and locks the server completely. Being reactive in IT is all too common, but with a robust NMS in your environment, you can be far more proactive.
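The multiple-threshold idea can be sketched in a few lines of Python. The 90% critical level comes from the example above. The 80% warning level and the function names are assumptions added for illustration.

```python
import shutil


def classify_usage(percent_used, warning=80.0, critical=90.0):
    """Map a utilization percentage to an alarm level."""
    if percent_used >= critical:
        return "critical"
    if percent_used >= warning:
        return "warning"
    return "ok"


def disk_percent_used(path):
    """Return the used percentage of the disk that holds the given path."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0  # avoid dividing by zero on unusual filesystems
    return usage.used / usage.total * 100.0
```

On a Windows server, `classify_usage(disk_percent_used("C:\\"))` would return one of the three levels. The warning level below the critical one is what buys you time to clean up before the drive fills.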
A decent NMS will have some kind of database that stores the values every monitor collects. This enables trending: you can view historical data, see how your environment behaves over time, ensure purge jobs are set up properly, etc.
Like trending, reporting is another byproduct of an NMS. It allows your team and/or managers to review your inventory, plan for future resources, review performance and work strategies, and much more.
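As a sketch of what that history store could look like, the following Python example uses the standard library's `sqlite3` module. The table layout and function names are hypothetical. The `purge_older_than` helper is my own addition showing a retention cleanup for the stored samples, which is not necessarily the kind of purge job meant above.

```python
import sqlite3
import time


def open_store(path=":memory:"):
    """Open (or create) a small sample store; the schema is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS samples ("
        "monitor TEXT NOT NULL, ts REAL NOT NULL, value REAL NOT NULL)"
    )
    return conn


def record(conn, monitor, value, ts=None):
    """Store one collected value, timestamped now unless ts is given."""
    stamp = time.time() if ts is None else ts
    conn.execute("INSERT INTO samples VALUES (?, ?, ?)", (monitor, stamp, value))
    conn.commit()


def average_since(conn, monitor, since_ts):
    """Return the average value for a monitor since a timestamp, or None."""
    row = conn.execute(
        "SELECT AVG(value) FROM samples WHERE monitor = ? AND ts >= ?",
        (monitor, since_ts),
    ).fetchone()
    return row[0]


def purge_older_than(conn, cutoff_ts):
    """Delete samples older than cutoff_ts and return how many were removed."""
    cur = conn.execute("DELETE FROM samples WHERE ts < ?", (cutoff_ts,))
    conn.commit()
    return cur.rowcount
```

Comparing `average_since` over the last day against the last month is a simple way to spot a trend, such as a disk that fills a little more each week.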
Using this as a guide, you should have a successful implementation of an NMS. I purposely left specifics out of the article, but if you have any questions, please contact me. Thank you and happy monitoring!