Written by: David Wright
This is one of several white papers written for Finisar. DWC was also responsible for their site architecture, much of the web content, and systems that decentralized content contribution from various departments.
Finisar NetWisdom and Capacity Planning
User demands for data are dynamic. At certain times of the day or month, user requests for data and applications can spike, putting stress on SAN resources. From end-of-month accounting to mid-morning data processing, business demands on the SAN can create bottlenecks – or even system failures – in otherwise healthy networks. Capacity planning is the process of anticipating these increased network loads and compensating for them before problems occur.
Network congestion is one of the most common causes of network brownouts, stalls, and shutdowns, and it is most often encountered when the load on the network hierarchy increases. Network administrators can now answer capacity questions before problems occur by using NetWisdom performance monitoring tools.
Problem scenarios are plentiful. “It is not uncommon to find an MS SQL cluster sharing storage ports with an email cluster,” explains Jon Hudson, Sr. SAN & Unix Architect for Finisar. “If MS SQL usage spikes, it might consume the entire bandwidth, prohibiting Exchange from accessing data.” Capacity planning helps ensure that when applications compete for bandwidth, the existing hardware can handle the load.
How do SAN administrators anticipate changing resource loads in order to plan effectively? Until recently, tracking the performance of server hardware resources (CPU, disk, memory, links, ports, and switches) was a time-consuming, complicated process, often involving every vendor of every hardware component in the system. Today, performance within heterogeneous systems can be tracked with NetWisdom from Finisar.
NetWisdom is a Storage Area Network (SAN) performance monitoring solution that gives system administrators the information they need for precise capacity planning. Available either as a dedicated hardware and software monitoring tool or as a software-only solution, NetWisdom enables SAN administrators to plan for spikes in demand for data and applications.
NetWisdom uses a three-tier architecture: a Probe that connects to the SAN data paths or runs through the SPAN port on a switch, the Portal software that collects data from the Probes, and the Views software that presents the data in a flexible graphical user interface. Briefly, the three tiers function as follows:
NetWisdom Probes are connected directly in-line, through a switch mirror (SPAN) port, or through Finisar’s Network Taps. Finisar also offers an alternative software-only “switch probe” that gathers statistics on traffic in and out of the fabric’s switches. Probes capture every transaction at the Initiator/Target/LUN level, providing detailed statistics on device health and performance.
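To make “statistics at the Initiator/Target/LUN level” concrete, here is a minimal sketch of how per-conversation health metrics could be derived from observed SCSI exchanges. It assumes a probe can timestamp each command frame and its matching status frame; the `ScsiExchange` record and `per_itl_stats` function are hypothetical illustrations, not NetWisdom’s data model. The latency measured here, from command to status, is often called exchange completion time.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class ScsiExchange:
    """One observed SCSI command and its status response (hypothetical record)."""
    initiator: str       # host port, e.g. an HBA WWPN
    target: str          # storage port WWPN
    lun: int
    cmd_time: float      # seconds: when the command frame was seen
    status_time: float   # seconds: when the status frame was seen
    nbytes: int          # data moved by the exchange


def per_itl_stats(exchanges):
    """Summarize I/O count, bytes, and completion time per Initiator/Target/LUN."""
    groups = defaultdict(list)
    for ex in exchanges:
        groups[(ex.initiator, ex.target, ex.lun)].append(ex)
    stats = {}
    for itl, exs in groups.items():
        ect = [ex.status_time - ex.cmd_time for ex in exs]  # completion times in seconds
        stats[itl] = {
            "ios": len(exs),
            "bytes": sum(ex.nbytes for ex in exs),
            "avg_ect_ms": mean(ect) * 1000.0,
            "max_ect_ms": max(ect) * 1000.0,
        }
    return stats


# Hypothetical exchanges seen on one link.
sample = [
    ScsiExchange("hostA", "array1", 0, 0.000, 0.004, 65536),
    ScsiExchange("hostA", "array1", 0, 0.010, 0.016, 65536),
    ScsiExchange("hostB", "array1", 3, 0.002, 0.030, 8192),
]
for itl, s in per_itl_stats(sample).items():
    print(itl, s)
```

Keying on the full Initiator/Target/LUN triple keeps each conversation separate, so a slow LUN on a busy port is not averaged away by healthy neighbors.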
The Portal is a self-managing database that gathers data from the Probes and stores it for viewing and analysis. It collects statistics and aggregates them over time according to user-defined schedules. It also allows alarms to be set so that specified actions are carried out when predefined thresholds are breached. Because the Portal uses MySQL as its data container, users who are comfortable working with MySQL can mine the collected data either with NetWisdom or with their own home-grown tools already customized to their environment.
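The Portal’s two core behaviors, scheduled aggregation and threshold alarms, can be sketched as follows. This is a minimal illustration under stated assumptions: samples arrive once per second as `(timestamp, {metric: value})` pairs, the schedule is a fixed interval, and an alarm’s action is any Python callable. `aggregate`, `check_alarms`, and `Alarm` are hypothetical names and do not reflect the Portal’s actual MySQL schema or alarm configuration.

```python
from dataclasses import dataclass
from typing import Callable


# Hypothetical alarm definition; the real Portal alarm configuration may differ.
@dataclass
class Alarm:
    metric: str                                # metric name, e.g. "link_util_pct"
    threshold: float                           # fire when the interval average exceeds this
    action: Callable[[str, int, float], None]  # called as action(metric, interval_start, value)


def aggregate(samples, interval_s):
    """Roll per-second samples [(t, {metric: value}), ...] into per-interval averages."""
    buckets = {}
    for t, metrics in samples:
        start = int(t // interval_s) * interval_s  # start of the interval containing t
        bucket = buckets.setdefault(start, {})
        for name, value in metrics.items():
            bucket.setdefault(name, []).append(value)
    return {
        start: {name: sum(vals) / len(vals) for name, vals in bucket.items()}
        for start, bucket in sorted(buckets.items())
    }


def check_alarms(rollup, alarms):
    """Invoke each alarm's action for every interval whose average breaches its threshold."""
    fired = []
    for start, metrics in rollup.items():
        for alarm in alarms:
            value = metrics.get(alarm.metric)
            if value is not None and value > alarm.threshold:
                alarm.action(alarm.metric, start, value)
                fired.append((start, alarm.metric))
    return fired


# Example: two minutes of hypothetical link utilization, jumping from 50% to 95%.
samples = [(t, {"link_util_pct": 50.0 if t < 60 else 95.0}) for t in range(120)]
rollup = aggregate(samples, interval_s=60)
alert = lambda metric, start, value: print(f"ALARM {metric}={value:.1f} at t={start}s")
check_alarms(rollup, [Alarm("link_util_pct", 80.0, alert)])
```

With a 60-second schedule, only the second interval averages above 80 percent, so a single alarm fires for the interval starting at t=60s.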
NetWisdom Views software (Windows- or Solaris-based) provides a powerful user interface for viewing, analyzing, and processing data collected by the Portal, yielding a consolidated picture of overall network traffic. Data can be viewed in a variety of formats, including tables, graphs, and charts. Views provides both real-time and historical data, along with event recording and customizable reporting.
NetWisdom can measure utilization across the entire SAN in real time, so SAN managers can account precisely for the capacity they use today and better estimate what they will need in the future. With historical statistics collection and the ability to trend performance and capacity over time, SAN administrators have accurate usage information and can purchase only the hardware they really need.
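Trending capacity over time usually means fitting a line (or curve) to historical peaks and projecting when a limit will be reached. The sketch below assumes that daily peak link utilization, in percent, can be pulled from the historical data; it uses an ordinary least-squares fit, a swapped-in technique chosen for simplicity rather than anything NetWisdom is documented to use. `linear_fit` and `crossing_day` are hypothetical helpers.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def crossing_day(days, peaks, limit):
    """Day on which the fitted trend reaches `limit`, or None if the trend is flat or falling."""
    slope, intercept = linear_fit(days, peaks)
    if slope <= 0:
        return None
    return (limit - intercept) / slope


# Hypothetical weekly peak utilization (%) of one link, measured on days 0..28.
days = [0, 7, 14, 21, 28]
peaks = [40.0, 44.0, 48.0, 52.0, 56.0]
print(crossing_day(days, peaks, limit=80.0))  # roughly day 70
```

Here the fitted slope is 4/7 percent per day from a 40 percent intercept, so the 80 percent limit is projected for about day 70. Real workloads are rarely this linear, so a projection like this is a starting point for a purchasing discussion, not a guarantee.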
Capacity planning with NetWisdom begins with a complete scan of the entire SAN for a given time increment. Running at full line rate, the NetWisdom Probe provides accurate real-time statistics on individual end-device conversations across switches, storage ports, links (physical cables), and LUNs. Administrators can then see which server or application may have generated the traffic volume and which hardware was affected. By selecting strategic links, SAN managers can collect detailed performance and event statistics for key servers or storage devices.
The NetWisdom Probe delivers real-time Fibre Channel and SCSI statistics to the NetWisdom Portal. The Portal gathers, aggregates, and records SCSI and link statistics every second; it also performs statistical calculations, generates alarms, and records storage-centric statistics for playback. Key statistics are aggregated for Initiators, Targets, and Target/LUNs, allowing the SAN manager to monitor access to a server or a device. With these device-level statistics, SAN managers can see exactly which devices are communicating through which link.
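The per-second rollup from individual conversations to Initiator, Target, and Target/LUN views can be sketched with `collections.Counter`. The input shape (one dictionary of byte counts per second, keyed by `(initiator, target, lun)`) and the host and array names are assumptions for illustration only.

```python
from collections import Counter


def rollups(itl_bytes):
    """Aggregate one second of (initiator, target, lun) -> bytes into three views."""
    by_initiator, by_target, by_target_lun = Counter(), Counter(), Counter()
    for (initiator, target, lun), nbytes in itl_bytes.items():
        by_initiator[initiator] += nbytes
        by_target[target] += nbytes
        by_target_lun[(target, lun)] += nbytes
    return by_initiator, by_target, by_target_lun


# Hypothetical one-second sample: an SQL host and a mail host sharing one array port.
one_second = {
    ("sql01", "array1", 0): 120_000_000,
    ("sql01", "array1", 2): 10_000_000,
    ("exch01", "array1", 1): 30_000_000,
}
by_initiator, by_target, by_target_lun = rollups(one_second)
print(by_initiator.most_common(1))   # busiest initiator in this second
print(by_target["array1"])           # total bytes through the array port
print(by_target_lun[("array1", 1)])  # bytes to LUN 1 only
```

In this made-up second, `sql01` accounts for most of the traffic through `array1`, which is the kind of answer a capacity planner needs before deciding which workload to move to another port.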
For analysis, NetWisdom Views provides a flexible graphical user interface that can monitor multiple links simultaneously. Multiple views of Portal metrics let the user analyze or debug protocol traffic at every level of the fabric, from link to device. Of the available views, the Graph View does the most to support capacity planning: it compares and combines metrics in real time, allowing trend-of-activity comparisons for device-level or link-level analysis. The Record and Playback feature of NetWisdom Views lets users compare performance and health at a later date. With good baseline data, users can notice, report, and act on very minor variations, giving the SAN manager a granularity not available with other products.
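Acting on “very minor variations” requires a rule for what counts as a variation. One simple, commonly used rule, assumed here and not taken from NetWisdom, is a z-score test against a baseline recording: flag any new sample more than `z_limit` standard deviations from the baseline mean. The metric (average response time in milliseconds), the sample values, and the `deviations` helper are hypothetical.

```python
from statistics import mean, stdev


def deviations(baseline, current, z_limit=3.0):
    """Return indices of `current` samples more than z_limit std devs from the baseline mean."""
    mu = mean(baseline)
    sigma = stdev(baseline)  # sample standard deviation; needs at least two baseline points
    if sigma == 0:
        # A perfectly flat baseline: any change at all counts as a variation.
        return [i for i, x in enumerate(current) if x != mu]
    return [i for i, x in enumerate(current) if abs(x - mu) / sigma > z_limit]


# Hypothetical average response times (ms): a recorded baseline and a later replay.
baseline = [4.8, 5.1, 5.0, 4.9, 5.2, 5.0]
current = [5.0, 5.1, 6.5, 4.9]
print(deviations(baseline, current))  # -> [2]
```

Because the baseline here is tight (a standard deviation of roughly 0.14 ms), a 1.5 ms jump stands out clearly. Lowering `z_limit` makes the check more sensitive at the cost of more false alarms.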
Key Issues For Capacity Planners
Networks commonly share as many hardware components as possible, such as multiple connections to a storage array. “Problems result when a surge in demand consumes the capacity of a given component,” explains Hudson. A 200 MB/s link, for example, might be shared by two servers and function correctly most of the time. But when an application on one of the servers demands the entire 200 MB/s pipe, or even just a larger share than usual, the second server may be frozen out. Users requesting applications or data from the second server will likely experience delays or errors. With the Fibre Channel protocol in particular, it’s first come, first served.
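The shared-link scenario above can be illustrated with a deliberately simplified model in which bandwidth is granted strictly in arrival order. This is an assumption for illustration: real Fibre Channel flow control works frame by frame using buffer-to-buffer credits, and `fcfs_allocate` is a hypothetical helper, not a model of any particular switch.

```python
def fcfs_allocate(capacity_mb_s, requests):
    """Grant bandwidth strictly in arrival order until the link is exhausted.

    requests: list of (server, demand_mb_s) pairs in arrival order.
    Returns {server: granted_mb_s}.
    """
    remaining = capacity_mb_s
    granted = {}
    for server, demand in requests:
        share = min(demand, remaining)  # later arrivals get only what is left
        granted[server] = share
        remaining -= share
    return granted


LINK_MB_S = 200  # the shared 200 MB/s link from the example

print(fcfs_allocate(LINK_MB_S, [("server1", 120), ("server2", 60)]))  # normal day: both fit
print(fcfs_allocate(LINK_MB_S, [("server1", 170), ("server2", 60)]))  # server2 squeezed to 30
print(fcfs_allocate(LINK_MB_S, [("server1", 200), ("server2", 60)]))  # server2 frozen out: 0
```

The second server’s demand never changes across the three cases; only the first server’s spike does, which is why the problem is invisible from the second server’s own statistics.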
Queue management is an important part of capacity planning. Unexpected delays, and even interruptions in data flow, can result when disk access bogs down because of overloaded queues. The maximum queue depth is the number of commands that can be outstanding to a single LUN (logical unit number) or target port.
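Whether a queue depth is too low can be estimated with Little’s law, which relates the average number of outstanding commands to throughput and response time. The sketch below applies it to a hypothetical workload; the IOPS, latency, per-LUN depth, and target port limit are illustrative assumptions, since actual defaults and limits vary by HBA, driver, and storage array.

```python
import math


def required_queue_depth(iops, latency_ms):
    """Little's law: outstanding commands = arrival rate (IO/s) x response time (s)."""
    return math.ceil(iops * latency_ms / 1000.0)


def max_iops(queue_depth, latency_ms):
    """Most IOPS a queue depth can sustain when each command takes latency_ms."""
    return queue_depth * 1000.0 / latency_ms


def port_oversubscribed(host_queue_depths, port_queue_limit):
    """True if the hosts together may queue more commands than the target port accepts."""
    return sum(host_queue_depths) > port_queue_limit


# Hypothetical workload: 8,000 IOPS at 5 ms average response time against one LUN.
print(required_queue_depth(8000, 5))       # 40 outstanding commands needed
print(max_iops(32, 5))                     # a depth of 32 caps throughput at 6400 IOPS
print(port_oversubscribed([32] * 5, 128))  # five hosts at depth 32 exceed a 128-command port
```

The two checks point in opposite directions: raising a host’s queue depth can relieve the first bottleneck while making target port oversubscription more likely, which is why both sides of the link need to be measured. The `max_iops` bound also assumes latency stays constant, while in practice response time tends to rise as queues fill.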
Often, the fix is as simple as raising the queue depth from a low factory default to a higher setting. Other problems can be more difficult, such as slow disk access caused by an overloaded storage controller, or storage servers sharing the same switch with application servers. “The problem can be hardware – multiple connections to a storage array, for example – or software,” explains Hudson. “When multiple software applications are running on a host, one application can steal resources from the others.”
Without dedicated monitoring, SAN administrators have no way of watching or tracking low-level errors. “Most operating systems cannot report on low-level errors because they are being handled at a lower part of the stack,” says Hudson. “Unfortunately, that information is critical for both error detection and capacity planning. NetWisdom gives administrators the specific LUN-level details required for this type of work.”
Conclusion: Capacity Planning with NetWisdom: From a “Dark” Art to a Science
“By providing real-time statistics covering individual end-device conversations and overall SAN performance, NetWisdom allows SAN managers to become proactive instead of reactive,” says Hudson. By identifying trends, managers can build an infrastructure that remains sustainable even during periods of high user demand. As network design tools such as Visio, Veritas SANPoint Control, and others become more prevalent for network modeling, it becomes even more critical that decisions and designs are based on real data, not assumptions. NetWisdom provides the specific data points that help transform capacity planning from a “dark” art into a precise science.