It was the winter of 1992. I had just begun to learn Computer Troubleshooting. It was fascinating to see the kind of intelligence built into the ubiquitous Personal Computer. It had a way to tell the Power User or the Support Engineer what was wrong with it. It used to spit out cryptic error codes like ‘201’ or ‘301’ to say whether something was wrong with the Programmable Peripheral Interface on its motherboard or with the UART on the Serial Port adapter attached to it through a Bus. But how would it communicate if the PC was not functional enough to let the Power On Self Test (POST) use the display card?
It had a way around this roadblock, too. It then used the simple speaker, more appropriately called a beeper given the polyphonic speakers now available on much smaller and cheaper devices. One beep indicated one failure and two beeps indicated another. A short beep indicated one kind of problem and a long beep quite another.
Over time, specific parts of the Personal Computer began to come equipped with embedded diagnostics. Hard drives were among the first subsystems to become intelligent. For instance, Compaq came up with Intellisafe drives: drives that could think and talk! An Intellisafe drive monitored several aspects of its operation, such as Spin Up time and Temperature, to determine whether it was moving away from health. When it noticed any of these performance parameters falling below acceptable levels, it notified the user that the drive was about to fail. Instead of having to respond to an emergency, the user now had ample time to back up the data and replace the drive under Pre-Failure Warranty. Subsequently, the approach was standardized as S.M.A.R.T. (Self-Monitoring, Analysis & Reporting Technology). IBM had preceded Compaq with its PFA (Predictive Failure Analysis).
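The pre-failure check described above can be pictured as a simple comparison of monitored parameters against acceptable levels. The sketch below is only an illustration under stated assumptions: the `Attribute` class, the attribute names, the "higher is healthier" scale and the numbers are all hypothetical, and do not reproduce how an Intellisafe or S.M.A.R.T. drive actually stores or evaluates its data.

```python
from dataclasses import dataclass


@dataclass
class Attribute:
    """One monitored drive parameter (hypothetical model, not a real drive format)."""
    name: str
    value: float      # current health score; assumed here that higher is healthier
    threshold: float  # assumed minimum acceptable score for this parameter


def pre_failure_warnings(attributes):
    """Return the names of parameters that have fallen below their acceptable level."""
    return [a.name for a in attributes if a.value < a.threshold]


# Made-up readings purely for illustration.
drive = [
    Attribute("spin_up_time", value=92, threshold=21),
    Attribute("temperature", value=18, threshold=25),
]

warnings = pre_failure_warnings(drive)
if warnings:
    print("Pre-failure warning:", ", ".join(warnings))
```

The point of the design is that the warning is raised while the drive is still working, which is what gives the user time to back up data before an actual failure.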
When heavily upgraded Personal Computers began to be used as Servers for File Storage or for Database applications, with several other computers on the network dependent on them, ‘Availability’ became a critical requirement. Concepts such as RAID and Redundant Power Supply had advanced the availability of individual parts by leaps and bounds. But what if the computing power itself suddenly became unavailable? Server Companies responded with Out-of-band management techniques, naming them Remote Insight (RI), integrated Lights Out (iLO) and so on.
This allowed control over the Servers from outside the Local Area Network when the Servers stopped responding to input from within the network. Without it, when the Server hung and a virtual console application on one of the network’s desktops could not be used to restart it, the only way to reboot the machine was to gain physical access to the Server Room. A very smart alternative for the administrator who was at home during off-hours, who was managing multiple sites remotely, or who had to manage servers in a hosted (and therefore physically secured) environment was to somehow gain access to the Server Hardware as if he were physically next to it. A specialized card plugged into one of the Server’s I/O slots, with a dial-up connection that the Administrator could reach from anywhere, allowed him such access.
Alongside Servers, networking devices such as Modems, Switches and Routers brought about greater sophistication in embedded diagnostics. The LEDs on the Modem or the Switch indicated whether the corresponding ports were working. A dumb terminal could be connected to the console port of the Switch or the Router to find out in detail how packets were being transmitted or dropped. With the emergence of networking, smart diagnostic technologies found their way from computing machines into connecting devices. However, the information churned out by the networking device continued to be cryptic and required a highly trained professional to decode it and fix the issue.
The increasing complexity of telecommunication devices and the spiraling need for highly trained support engineers compounded several problems in the support industry. The cut-throat competition among System Integrators in an industry already dependent on expensive knowledge workers resulted in slender margins getting ever thinner. The dependence on skilled professionals meant that quality suffered whenever an engineer was lost through attrition. To address the Revenue-Cost dynamics and the demand-supply equation, today’s products come with very intelligent diagnostics embedded within. These not only provide meaningful information about the error conditions but also step-by-step instructions to resolve them.
Cisco’s Smart Call Home (SCH) is a case in point, providing higher availability through proactive, fast issue resolution. The built-in diagnostics provide proactive, detailed, real-time alerts on core network devices, and even remediation recommendations based on Cisco’s proven practices. This helps identify and resolve issues quickly, conserving valuable staff time and improving network availability.
SCH provides increased operational efficiency through reduced troubleshooting time. When the problem is serious, it generates a Service Request to Cisco that is routed to the right team for the specific problem. Service Requests to Cisco include the relevant diagnostic and product information, so the user does not have to repeat information when engaging with a Cisco engineer. The figure above illustrates how a problem that earlier required 2 business days for resolution is now solved in under 6 hours.
Devices with SCH capability continuously monitor their own health using Generic OnLine Diagnostics (GOLD) technology and, using secure transmissions, automatically notify the user of potential issues even before a failure occurs. SCH gives the user the option of receiving proactive notifications of problems that are likely to be emerging issues, such as high temperature independent of any fan failure, or accumulating single-bit memory errors.
SCH also provides quick access to information. It delivers recommendations through email notifications and the SCH web portal. Security advisories, field notices and end-of-life notices are personalized for the user’s hardware and software inventory.
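The two emerging-issue examples above, high temperature regardless of fan status and accumulating single-bit memory errors, can be expressed as simple rules. This is a minimal sketch under stated assumptions: the function name, the limits and the message strings are hypothetical, and it does not represent the GOLD or SCH implementation or any Cisco API.

```python
def proactive_alerts(temperature_c, single_bit_errors,
                     temp_limit_c=70.0, error_limit=10):
    """Return a list of early-warning messages for one device reading.

    All limits are assumed values for illustration only.
    """
    alerts = []
    # Temperature is checked on its own, without consulting fan status,
    # so a hot device is flagged even if no fan has failed.
    if temperature_c > temp_limit_c:
        alerts.append("high temperature")
    # Correctable single-bit errors are harmless individually; a growing
    # count is treated here as a sign of an emerging memory problem.
    if single_bit_errors >= error_limit:
        alerts.append("accumulating single-bit memory errors")
    return alerts


# Made-up readings purely for illustration.
print(proactive_alerts(temperature_c=75.0, single_bit_errors=3))
print(proactive_alerts(temperature_c=40.0, single_bit_errors=12))
```

Neither condition is a failure yet, which is what makes the notification proactive rather than reactive.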
From the humble beginnings of POST and BIST (Built In Self Test) all the way to Smart Call Home, the world of Embedded Diagnostics has come a long, long way! Soon, it is going to be TOTAL Machine To Machine (M2M): diagnostics embedded locally or on a remote device, not just recommending a fix but also implementing it through automation. Such days are not far away!