After five years at an IBM ISV subsidiary and then five years in Bull Evidian’s R&D team, Bruno Farcy has managed Bull’s "System Software Development" R&D team since 2002.
He also represents Bull within the DMTF (Distributed Management Task Force) consortium.
Why a System Management solution?
The common denominator between a JOnAS-based application server, a SQL Server database solution on Windows, a Linux HPC cluster and a Web server farm is (of course!) Bull’s NovaScale server range, which is eminently capable of hosting these very diverse environments.
When a solution of this type is installed, the priority is always the smooth operation of the application concerned. To meet this requirement, configuration, monitoring and control tools are usually supplied with the solution. This is a must, but it is not always sufficient to ensure continuous availability. The application is closely tied to the status of its ecosystem: both the hardware and the host operating system. This is why a System Management solution is needed to monitor and control the host system’s physical and logical resources.
What is a System Management solution?
In the pyramid of systems administration products, a System Management solution must above all satisfy the demands of the pyramid’s lower levels: Platform Management and OS Management. This article does not tackle Enterprise Management, which covers the administration of applications, service quality and Business Process management. First, a System Management solution must offer a single, centralized, uniform and secure entry point for administering the system’s logical and physical components.
Next, ergonomics are fundamental: the solution must be suited to every phase of the product’s life cycle, namely installation, configuration and operation.
Finally, the impact on the system itself must be minimal. System Management must never interfere with the application, particularly when it comes to performance.
Security is also important. It involves authentication, role definition, certificates and encryption mechanisms that control access to system data and control functions.
Functions provided by System Management
The main function of a systems administration tool is monitoring, and more precisely, detecting errors in the system. The first requirement is therefore to be able to represent the entire system graphically.
It should be possible to organize this representation according to logical criteria (machines allocated to a particular project or application, grouped by operating system, etc.) and/or physical criteria (machines grouped by location, network, cluster, etc.). The monitoring function should make effective use of color, as shown in Figure 1.
Figure 1: Example of a graphical representation configured in NovaScale Master.
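Grouping by logical and physical criteria amounts to building several views over a single inventory. The minimal Python sketch below assumes that each machine record carries both logical attributes (project, OS) and physical ones (site, network). The `Machine` record, its fields and the machine names are hypothetical and do not describe NovaScale Master's internal model.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Machine:
    # Hypothetical inventory record: two logical and two physical attributes.
    name: str
    project: str
    os: str
    site: str
    network: str


def group_by(machines, attribute):
    """Return {attribute value: sorted machine names} for one view."""
    groups = defaultdict(list)
    for machine in machines:
        groups[getattr(machine, attribute)].append(machine.name)
    return {key: sorted(names) for key, names in groups.items()}


inventory = [
    Machine("ns-db1", "billing", "Windows", "London", "lan-a"),
    Machine("ns-web1", "portal", "Linux", "London", "lan-b"),
    Machine("ns-web2", "portal", "Linux", "Paris", "lan-b"),
]

logical_view = group_by(inventory, "project")  # logical grouping
physical_view = group_by(inventory, "site")    # physical grouping
```

Every machine appears in every view. A problem on `ns-web1` would therefore show up both under the `portal` project and under the `London` site.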
Status levels can be derived from several kinds of source:
- Discrete value indicators: OK or KO. For example, the indicators that signal the presence of a component, a process, or an intrusion
- Numeric indicators: these indicate the situation with respect to a threshold (or several associated thresholds) that defines a status. For example, the percentage of memory used, the temperature of a component, or the number of processes currently running
- Alarms: detection of an event, assessed in terms of its severity, enabling a status to be defined. For example, OS log events or an SNMP trap originating from a blade server housing.
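All three kinds of source can be reduced to a single status scale. The sketch below borrows the return-code convention of Nagios plugins (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN), since Nagios is the monitoring engine inside NovaScale Master. The thresholds, alarm severity names and display colors are illustrative assumptions, not NovaScale Master's actual settings.

```python
from enum import IntEnum


class Status(IntEnum):
    # Values follow the Nagios plugin return-code convention.
    OK = 0
    WARNING = 1
    CRITICAL = 2
    UNKNOWN = 3


# Hypothetical display colors for the graphical view.
COLORS = {Status.OK: "green", Status.WARNING: "yellow",
          Status.CRITICAL: "red", Status.UNKNOWN: "orange"}


def from_discrete(present):
    """Discrete indicator: present maps to OK, absent to CRITICAL."""
    return Status.OK if present else Status.CRITICAL


def from_numeric(value, warning, critical):
    """Numeric indicator compared with two ascending thresholds."""
    if value is None:
        return Status.UNKNOWN
    if value >= critical:
        return Status.CRITICAL
    if value >= warning:
        return Status.WARNING
    return Status.OK


# Hypothetical mapping from alarm severity to status.
SEVERITY_TO_STATUS = {"info": Status.OK, "minor": Status.WARNING,
                      "major": Status.CRITICAL, "critical": Status.CRITICAL}


def from_alarm(severity):
    """Alarm: the event's severity decides the status."""
    return SEVERITY_TO_STATUS.get(severity.lower(), Status.UNKNOWN)
```

For example, 87% memory use with thresholds of 80% and 95% gives `Status.WARNING`, which is drawn in yellow.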
To make monitoring easier and more systematic (an operator will rarely stay in front of the control workstation screen the whole time), status changes are linked to notification mechanisms that can be used by management frameworks such as Evidian OpenMaster, Tivoli, OpenView and others. These mechanisms warn a designated recipient (a person or an application) when a problem occurs.
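Linking notifications to *changes* in status, rather than to every check, keeps recipients from being flooded with repeats. The minimal sketch below leaves the transport (e-mail, an SNMP trap to a management framework, etc.) abstract as a callback. The class name and the assumption that never-seen items start as OK are hypothetical choices.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class StateChangeNotifier:
    """Forwards a message only when a monitored item's status changes."""
    send: Callable[[str], None]  # transport is left abstract (hypothetical)
    last: Dict[str, str] = field(default_factory=dict)

    def update(self, item: str, status: str) -> bool:
        # Items never seen before are assumed to have been OK.
        previous = self.last.get(item, "OK")
        self.last[item] = status
        if status != previous:
            self.send(f"{item}: {previous} -> {status}")
            return True
        return False


sent = []
notifier = StateChangeNotifier(sent.append)
notifier.update("ns-web1/memory", "OK")       # unchanged: nothing sent
notifier.update("ns-web1/memory", "WARNING")  # change: one message
notifier.update("ns-web1/memory", "WARNING")  # unchanged: nothing sent
```

Only one message is sent here, for the OK to WARNING transition. A return to OK would also trigger a message, so the recipient learns when the problem clears.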
Once a problem has been identified by a change in status, either as a result of a regularly-run query or because an alert has been received, the systems administrator will look for additional information in order to understand what has happened: the probable causes, the chronology, the background to the incident, etc. Two functions are useful here. The first is inventory information (machine type, disk capacity, OS type, number of processes, etc.; see Figure 2). The second is operating reports or summary reporting (status logs, numerical graphs and displays, etc.).
Figure 2: Example of an OS inventory displayed in NovaScale Master
Inventory information helps establish the background to a problem, while reporting makes it possible to quantify the problem over time (Since when? How many times? Did the problem appear suddenly or gradually?). Figure 3 shows an example of a numerical graph.
Figure 3: Example of a numerical graph produced by NovaScale Master
Reporting can also be used preventively to monitor system load and performance, so that future problems can be anticipated.
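A common way to use reporting data preventively is to extrapolate a trend and estimate when a threshold will be crossed. The sketch below assumes a simple least-squares linear fit. It is one illustrative technique, not a description of how NovaScale Master's reporting works, and the disk-usage samples are invented for the example.

```python
def linear_fit(samples):
    """Ordinary least-squares fit y = a + b*t over (t, y) samples."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in samples)
    if sxx == 0:
        raise ValueError("samples must span more than one instant")
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in samples)
    b = sxy / sxx
    a = mean_y - b * mean_t
    return a, b


def time_to_threshold(samples, threshold):
    """Time at which the fitted trend reaches threshold (None if not rising)."""
    a, b = linear_fit(samples)
    if b <= 0:
        return None
    return (threshold - a) / b


# Hypothetical daily samples: (day, % of disk used).
disk_usage = [(0, 50.0), (1, 52.0), (2, 54.0), (3, 56.0)]
full_at = time_to_threshold(disk_usage, 90.0)  # day 20.0 on this trend
```

A linear trend is deliberately crude. It gives early warning for steadily growing resources such as disk or log volumes, but it will not capture periodic or bursty load.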
Once the problem has been analyzed and understood, all that remains is to act on the system: at best to solve the problem, at worst to put a workaround in place. To do this, the System Management solution should provide Remote Control tools and/or access to such tools if they exist elsewhere.
This usually takes one of two forms: control GUIs (access to OS sessions, configuration tools, etc.) and scriptable commands, which can be integrated into batch jobs. For these functions to work really effectively, the openness of the solution is crucial. This means being able to integrate contextual calls to third-party tools, and to use open, recognized protocol standards such as SNMP, CIM/WBEM, WS-Management, IPMI, etc.
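As an example of a scriptable command built on an open standard, the sketch below wraps the Open Source `ipmitool` CLI to query or change a machine's power state over IPMI-on-LAN. It assumes that `ipmitool` is installed, that the BMC is reachable with the `lanplus` interface, and that the password is stored in a file. The BMC host names, user name and password file path are hypothetical. This is not NovaScale Master's own command set.

```python
import shlex
import subprocess

# Power sub-commands accepted by "ipmitool chassis power".
POWER_ACTIONS = {"status", "on", "off", "cycle", "reset", "soft"}


def ipmi_power_command(bmc_host, user, action, password_file):
    """Build the ipmitool argv for a chassis power request over IPMI-on-LAN."""
    if action not in POWER_ACTIONS:
        raise ValueError(f"unsupported power action: {action}")
    return ["ipmitool", "-I", "lanplus", "-H", bmc_host, "-U", user,
            "-f", password_file, "chassis", "power", action]


def run(argv, dry_run=True):
    """Return the shell-quoted command in dry-run mode, else run it."""
    if dry_run:
        return shlex.join(argv)
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return result.stdout.strip()


# Batch use over several (hypothetical) BMCs, in dry-run mode.
commands = [run(ipmi_power_command(host, "admin", "status", "/root/ipmi.pw"))
            for host in ["bmc-node01", "bmc-node02"]]
```

Passing the password through a file (`-f`) rather than on the command line keeps it out of process listings and batch logs.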
Most of these systems administration standards are implemented in Bull’s NovaScale Master solution supplied with the NovaScale range.
NovaScale Master: NovaScale System Management
Defining the topology of the NovaScale systems to be monitored is always the first step in integrating them into a System Management solution. To simplify this, NovaScale Master offers a simple topological model. It provides a basic scheme for grouping machines and for linking functions to system components (clusters, servers, disk bays, housings, etc.).
NovaScale Master covers all the System Management functions described and illustrated above, across Bull’s entire NovaScale Intensive and NovaScale Universal ranges.
These include monitoring, notification, reporting and inventory services, among others.
Figure 4: Example of the NovaScale Master console
As illustrated in Figure 4, it also provides an open, configurable Web console that acts as a single, uniform entry point for administering servers and all their peripherals.
NovaScale Master: an open architecture
Technically, NovaScale Master is a Web solution built on a three-tier architecture, as illustrated in Figure 5: the console, the server, and the target systems to be administered.
Figure 5: Three-tier architecture
The solution also brings together several Open Source tools, each with an excellent reputation in its own field (Nagios, SNMPTT, Webmin, MRTG, IPMItool, nmap, UltraVNC, Cygwin...), with Bull’s expertise in systems and administration.
To illustrate this combination, the Open Source Nagios tool acts as the monitoring server within NovaScale Master. The R&D team’s first challenge was to port it to Itanium® 2 and to Windows. Its ergonomics have also been reworked so that it integrates better with the NovaScale Master console, as illustrated in Figure 6.
Figure 6: Example of a Nagios page modified and integrated into the NovaScale Master console: Alerts Viewer
Additional features and tools developed by the R&D team, available separately from the Web console itself, provide the following functions:
- Notify Bull maintenance sites (if a support contract permits this)
- Incorporate inventory information derived from hardware and operating systems into the NovaScale Master console
- Provide a ‘Power control’ Web interface for computers
- Integrate, through simple configuration, any other SNMP agent provided with a peripheral or third-party tool
- Customize the system monitoring function
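Integrating another SNMP agent "through simple configuration" usually means mapping the agent's trap OIDs to event names and severities. The minimal sketch below assumes a configuration format loosely modeled on the `EVENT` lines of SNMPTT, one of the Open Source tools listed above. The parser itself is hypothetical: it handles only `EVENT` lines and is not SNMPTT. The two OIDs are the standard SNMPv2 `coldStart` and `linkDown` trap identifiers. The severities assigned to them are illustrative.

```python
import shlex

# Illustrative configuration, loosely modeled on SNMPTT "EVENT" lines:
#   EVENT <name> <trap OID> "<category>" <severity>
CONFIG = """
EVENT coldStart .1.3.6.1.6.3.1.1.5.1 "Status Events" Normal
EVENT linkDown .1.3.6.1.6.3.1.1.5.3 "Status Events" Critical
FORMAT Link down on interface $1
"""


def load_trap_map(config_text):
    """Build {trap OID: (event name, category, severity)} from EVENT lines."""
    trap_map = {}
    for line in config_text.splitlines():
        tokens = shlex.split(line)
        # Only well-formed EVENT lines are used; FORMAT, EXEC, etc. are skipped.
        if len(tokens) >= 5 and tokens[0] == "EVENT":
            _, name, oid, category, severity = tokens[:5]
            trap_map[oid.lstrip(".")] = (name, category, severity)
    return trap_map


def classify_trap(trap_map, oid):
    """Look up a received trap OID; unknown traps get an 'Unknown' severity."""
    return trap_map.get(oid.lstrip("."), ("unknown", "Unknown", "Unknown"))


traps = load_trap_map(CONFIG)
```

Adding support for a new peripheral then only requires adding lines to the configuration, not changing code.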
A series of configuration tools has been designed to centralize data and make the systems administrator’s job easier. Figure 7 shows the home page of the Web configuration application.
Figure 7: NovaScale Master Web configuration application
All these open technologies and tools help us meet not just generic requirements, but also the specific requirements of each customer.
Some examples of NovaScale Master in use
Bull is, naturally, one of NovaScale Master’s biggest users. For example, in the United Kingdom, Bull uses it to monitor a series of Windows servers and Ethernet switches at one of its main sites, in Hemel Hempstead just outside London. The solution proved invaluable when a neighboring oil depot exploded, preventing all access to the site. While the machine room remained inaccessible for several days, the applications kept running, thanks in particular to the Web-based remote monitoring and control facilities offered by NovaScale Master.
The monitoring function of the French Atomic Energy Authority’s TERA-10 HPC cluster is based on a NovaScale Master solution that administers more than 600 NovaScale Intensive computing nodes.
At telecoms operator SFR, the NovaScale machines and Linux OS that make up a Bull VoIP (Voice over IP) solution are administered by NovaScale Master.
Generally speaking, customers have successfully integrated NovaScale Master with the help of Bull teams, at both the early and final stages of their projects. Bull provides audit, monitoring, installation, configuration and (of course) technical support services for NovaScale Master.
R&D works closely with the service and support teams to develop and back up these services, and also responds to requests for product updates.
NovaScale Master and the future
NovaScale Master is being developed along three lines:
- The evolution of the NovaScale range and its peripherals (storage, KVM, OS, PowerSwitch, etc.)
- The integration of enhancements and new functions planned for and/or identified in our customers’ systems
- The incorporation of results from co-operative R&D projects, both internal to Bull (High Performance Computing, telecoms, Windows Competence Center, Storage, GCOS and Innovations) and with external partners (Intel, LSI, etc.).
Integration of the WS-Management (Web Services for Management) standard is due to be completed in 2007.
A new NovaScale range of ‘virtual’ servers has recently been launched alongside the ‘physical’ NovaScale range. Virtual machines (VMs) introduce some variations to the concepts behind System Management. For example, one (physical) machine can contain several others (virtual machines). This is why we are currently extending NovaScale Master to integrate these new ‘machines’. The challenge is to link the physical and virtual worlds, at least from the System Management point of view.
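One way to model a machine that contains other machines is a containment tree. In this tree, each node keeps its own status and also reports the worst status of everything it hosts. The minimal sketch below works under that assumption. The `Node` class, the three-level status scale and the machine names are hypothetical and do not describe the model being developed for NovaScale Master.

```python
from dataclasses import dataclass, field

# Hypothetical severity ranking used to pick the "worst" status.
SEVERITY = {"OK": 0, "WARNING": 1, "CRITICAL": 2}


@dataclass
class Node:
    name: str
    kind: str                # "physical" or "virtual" (label only)
    status: str = "OK"       # the node's own status
    children: list = field(default_factory=list)

    def host(self, child):
        """Attach a contained machine (e.g. a VM) and return it."""
        self.children.append(child)
        return child

    def rolled_up_status(self):
        """Worst status of this node and everything it contains."""
        statuses = [self.status] + [c.rolled_up_status() for c in self.children]
        return max(statuses, key=SEVERITY.__getitem__)

    def impacted_by(self):
        """Names of all contained machines affected if this node fails."""
        names = []
        for child in self.children:
            names.append(child.name)
            names.extend(child.impacted_by())
        return names


server = Node("ns-phys1", "physical")
vm_web = server.host(Node("vm-web", "virtual"))
vm_db = server.host(Node("vm-db", "virtual", status="WARNING"))
```

Keeping the node's own status separate from the rolled-up one distinguishes two cases. The physical server itself is healthy (`OK`), yet it is flagged `WARNING` because of a hosted VM. If the server failed, `impacted_by()` lists the VMs affected.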
To ensure that applications have a smooth and productive life, a System Management solution is essential to organize and administer their ecosystem. For Bull’s NovaScale range, this solution is called NovaScale Master.
This solution successfully combines the Open Source world with Bull’s systems expertise. Through a single ergonomic, interactive and secure Web entry point, it brings together all the monitoring, reporting, inventory and remote control functions for NovaScale servers and their operating systems. Above all, it is an open and scalable product, thanks to the standard technologies it uses. It meets generic administration needs, and also the specific and changing demands of each NovaScale customer. To achieve all this, Bull supports its NovaScale Master customers with the services, support and expertise it provides.
CIM/WBEM: Common Information Model/Web-Based Enterprise Management. XML/HTTP enterprise management protocol and flagship standard of the DMTF consortium (see below).
DMTF: Distributed Management Task Force, the standardization consortium for System Management solutions, which brings together practically all hardware, software and operating system vendors (HSVs, ISVs and OSVs). Further information can be obtained from: http://www.dmtf.org
GUI: Graphical User Interface.
IPMI: Intelligent Platform Management Interface. Platform Management protocol specified by a consortium co-founded by Intel, NEC and others.
KVM: Keyboard, Video, Mouse. Switching system that enables a keyboard, screen and mouse to be shared between several machines, without having to restart them every time the user switches from one machine to another.
LAN: Local Area Network.
SNMP: Simple Network Management Protocol. Remote or local systems administration protocol used on IP networks, originally designed for bridges and routers and now used much more widely.
System: Group of machines, peripherals, networks, OS, components etc, that require administration.
Bull documentation on NovaScale Master:
Documents and other information on NovaScale Master can be found at the following address: http://support.bull.com in the section entitled: Platforms/NovaScale Master