Integrating PCP into an Enterprise Management Strategy

  Tools
pmie pmieconf pmlogger pmlogconf

This chapter of the Performance Co-Pilot tutorial discusses the steps required to integrate PCP into the various management frameworks available. It takes into consideration the distributed nature of both the management frameworks and of the PCP tools, and how best to combine the functionality they offer.

For an explanation of Performance Co-Pilot terms and acronyms, consult the PCP glossary.


Points of Integration

The following sections briefly describe how PCP can be integrated, organized by the features that PCP and a typical management framework have in common.

Data

The PCP archive logging utility pmlogger (1) generates performance data files in the PCP archive format, which is specifically designed for low fetch latency when retrieving data from the archive for replay and when seeking to random time points in the archive.  This format is specific to the PCP tools and the PMAPI, so external interfaces are needed for non-PCP tools to read the data.

The LOGIMPORT (3) APIs provide a mechanism for importing data into a PCP archive. Tools built on these services are provided to import common sources of performance data such as spreadsheets, binary data from sar (1) and iostat (1) output. Other tools can easily be developed in C or Perl; see LOGIMPORT (3), pmiStart (3) and PCP::LogImport (3).

A number of PCP tools have been developed specifically to produce output that is easily incorporated into an external framework, database, or spreadsheet application. For example, the pmdumptext (1) and pmlogsummary (1) tools both provide options to output data in a time-stamped, tab-delimited or comma-separated form.  PCP also provides daily log rotation, merging and culling facilities for multiple collector hosts from a single monitor host, as well as the automated pmchart/cron/pmsnap performance graph image generation facility.
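
As an illustration of consuming time-stamped, tab-delimited output outside PCP, the following sketch averages one column with awk. The function name column_mean and the assumed layout (timestamp in column 1, one metric per later column) are hypothetical; the real layout depends on the options given to the exporting tool.

```shell
# column_mean N: read tab-delimited, time-stamped rows on stdin and print
# the mean of column N to two decimal places.  Cells that are not plain
# numbers (for example a header row or a placeholder for a missing value)
# are skipped.  Assumes column 1 holds the timestamp.
column_mean() {
    awk -F '\t' -v col="$1" '
        $col ~ /^-?[0-9]+(\.[0-9]+)?$/ { sum += $col; n++ }
        END {
            if (n > 0) printf "%.2f\n", sum / n
            else print "no data"
        }
    '
}
```

The function reads standard input, so the output of an export command can be piped straight into it, for example "... | column_mean 2".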

Because the potential consumers of this historical performance data are so varied, it is left to the reader to decide how best to incorporate the data into their own environment.

Note that all of the PCP tools are "timezone-aware" and can switch between the timezone of the monitoring machine and the timezone of the collector machine for which the archive was generated (this information is stored in the archive).  Also, PCP archives can be generated on a machine of one operating system version, architecture or byte-order, and replayed on a completely different machine.

Events

Among the more compelling reasons for making use of a management framework to administer an enterprise are the distributed monitoring and centralized analysis aspects.  All frameworks provide event monitoring facilities, of varying complexity.  Some provide simple point to point event generation, others have proxy event servers which allow events to be filtered and then potentially passed upstream to another event monitor.  To allow the framework to be extended, the frameworks will typically provide a mechanism for external applications to push their own events into the framework, and it is this feature which we wish to exploit in our PCP integration efforts.

Although PCP does provide a powerful inference engine in pmie (1) (as well as a far richer set of performance metrics than the more generic frameworks can provide, and low latency protocols designed specifically for transporting performance data quickly), no attempt is made to provide an event "sink" - some application which will display and filter performance events for the user.  From an enterprise management point of view, building an event monitoring facility specifically for performance events into PCP would be exactly the wrong thing to do - system administrators managing a wide array of different machines should expect to see all system-wide events arriving at a single point for their notification.

So, the integration point must be from pmie (1) - when it detects an abnormal performance situation, it must pass it on in the most appropriate manner possible. Unfortunately, the various event management frameworks have widely differing mechanisms for receiving events, so each framework must be handled separately in order to make best use of their event viewing and filtering capabilities.

The diagram shows a typical pmie (1) setup - rattle.melbourne.sgi.com running pmie, fetching performance data from a variety of sources, then evaluating its set of performance rules and generating events into whichever event "sinks" have been specified.  The pmieconf (1) utility is used to specify how to generate an event, which rules to use, how frequently to evaluate each rule, which hosts to monitor, and so on; it can be extended to allow new frameworks to be incorporated.

User Interface

A number of the enterprise management frameworks have the ability to provide closer integration between the tools themselves, for example starting PCP tools from a menu option of some of the framework's tools, or by installing additional on-line help for the performance events which PCP generates.  This level of integration is not attempted, and is not seen as providing much value in practice.  The pmieconf (1) utility is the definitive source of help text for the performance events generated by pmie (1) - it describes the rules and each of the customizable variables affecting the rules (including the "global" variables affecting all of the rules, such as where to send events when a performance event is generated).


CA/Unicenter TNG (Computer Associates)

Step-by-step - how to set up pmie (1) and ensure it can talk to the Unicenter TNG Framework:

  • Start CCI services on the monitor node, i.e. the node where pmie events should be propagated to (a Windows machine - "hugh" - in this example). For a Windows monitoring node, refer to the "Services" window from the "Control Panel".
  • Start CCI services on the monitored node, i.e. the node where pmie is running (an IRIX machine - "wobbly" - in this example).
                  wobbly# $CAIGLBL0000/bin/unicntrl start cci
                  wobbly# 
                  
  • Ensure the connection between the two nodes is active.  For UNIX machines:
                  wobbly# $CAIGLBL0000/cci/bin/rmt status
                  
                    Sysid  State                 Last Send Time  Last Receive Time
                  --------|---------------------|---------------|-----------------
                  hugh     ACTIVE                041099 17:54:37 041199 13:42:01
                  
                  wobbly#
                  
    For Windows machines:
                  C:\TNGFW\BIN>rmtcntrl status
                  SUCCESS: information returned
                  Sysid    State                 Last Send Time  Last Receive time
                  --------|---------------------|---------------|-----------------
                  HUGH     ACTIVE
                  WOBBLY   ACTIVE                Apr-10-99 17:58:25 Apr-10-99 17:58:25
                  
                  C:\TNGFW\BIN>
                  
  • Send a test event to the monitoring node from the node where pmie will be running:
                  wobbly# $CAIGLBL0000/bin/cawto -n hugh -g Performance -s wobbly test event
                  
  • To verify that the event is successfully received on a Windows NT monitoring host, use the Event Console, which lists events as they arrive.

    There are several tools which ship with the TNG Framework for examining and verifying the connection between two nodes - such as oprping (1) - refer to the Unicenter TNG documentation for full details.

  • Once the test event propagates successfully, enable pmie event generation into the TNG Framework:
                  wobbly# pmieconf modify global tngfw_action yes
                  wobbly# $PCP_RC_DIR/pmie start
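
The connection check in the steps above can be scripted rather than read by eye. The following is a minimal sketch, assuming the status table has the column layout shown in the sample output above (Sysid first, State second); the function name cci_link_active is hypothetical.

```shell
# cci_link_active SYSID: read a CCI status table on stdin (as printed by
# the "rmt status" or "rmtcntrl status" commands shown above) and succeed
# only if SYSID is listed with state ACTIVE.  The Sysid comparison ignores
# case, because the UNIX and Windows listings differ in capitalization.
cci_link_active() {
    awk -v want="$1" '
        toupper($1) == toupper(want) && $2 == "ACTIVE" { found = 1 }
        END { exit (found ? 0 : 1) }
    '
}

# Possible use on the monitored node (not run here):
#   $CAIGLBL0000/cci/bin/rmt status | cci_link_active hugh \
#       && echo "CCI link to hugh is active"
```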
                  
Configuration options - how to customize the setup for different environments:
  • The node to which events are sent is identified by the tngfw_node pmieconf variable, so to set up event propagation from pmie on wobbly to TNG on hugh as described above:
                  wobbly# pmieconf modify global tngfw_node hugh
                  wobbly# $PCP_RC_DIR/pmie start
                  
  • Other parameters associated with TNG events can also be specified - these include the color and category of each individual event, a group of events, or globally for all events:
                  wobbly# pmieconf modify global tngfw_color Yellow
                  wobbly# pmieconf modify global tngfw_category "PMIE Events"
                  wobbly# pmieconf modify cisco tngfw_color Red
                  wobbly# $PCP_RC_DIR/pmie start
                  


HP OpenView (Hewlett-Packard)

Step-by-step - how to set up pmie (1) and ensure it can talk to OpenView:

  • Start the OpenView pmd (1) daemon on the node which will receive pmie (1) events - "wobbly" in this example.
                  wobbly# service snmp start
                  wobbly# service ovnnm start
                  
  • Send a test event to the monitoring node from the node where pmie (1) will be running (OV_BIN is typically /usr/OV/bin):
                  rattle# cd $OV_BIN
                  rattle# ./ovevent -c "Status Events" -s Normal \
                  .1.3.6.1.4.1.11.2.17.1.0.58916872 \
                  .1.3.6.1.4.1.11.2.17.2.1.0 Integer 14 \
                  .1.3.6.1.4.1.11.2.17.2.2.0 OctetString "rattle" \
                  .1.3.6.1.4.1.11.2.17.2.4.0 OctetString "test event"
                  
  • To verify that the event is successfully received, run the OpenView event monitoring program (pictured here):
                  wobbly# cd $OV_BIN
                  wobbly# ./xnmevents &
                  
  • If this event propagates successfully, enable pmie event generation into OpenView:
                  rattle# pmieconf modify global ov_action yes
                  rattle# $PCP_RC_DIR/pmie start
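
When the test event has to be repeated from several hosts, the ovevent invocation from the steps above can be wrapped in a small function. This is only a sketch: the function name ov_test_event and the OVEVENT override variable are hypothetical, and the ovevent arguments are copied unchanged from the test step above.

```shell
# Location of the OpenView binaries; /usr/OV/bin is the typical value
# mentioned above.  OVEVENT is a hypothetical override, handy for dry runs.
OV_BIN=${OV_BIN:-/usr/OV/bin}
OVEVENT=${OVEVENT:-$OV_BIN/ovevent}

# ov_test_event SOURCE_HOST MESSAGE: send the same "Status Events" test
# event as in the step above, with the source host and message text
# supplied by the caller.
ov_test_event() {
    "$OVEVENT" -c "Status Events" -s Normal \
        .1.3.6.1.4.1.11.2.17.1.0.58916872 \
        .1.3.6.1.4.1.11.2.17.2.1.0 Integer 14 \
        .1.3.6.1.4.1.11.2.17.2.2.0 OctetString "$1" \
        .1.3.6.1.4.1.11.2.17.2.4.0 OctetString "$2"
}

# Possible use on the node where pmie will run (not run here):
#   ov_test_event rattle "test event"
```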
                  

The xnmevents (1) GUI connects to the OpenView pmd (1) daemon on the monitoring node, which supplies any new events that arrive while xnmevents is running. The image to the right shows the xnmevents main window. On receipt of a new pmie event, the "Threshold Events" toggle button changes color according to the event severity, indicating that there are new events - clicking on the toggle button brings up the viewer window (shown below).

The viewer lets you view, filter, and acknowledge events (once all events are acknowledged, the "Threshold Events" button becomes white - until the next event arrives).

Configuration options - using pmieconf (1) to customize the setup for different environments:
  • The node to which events are sent is identified by the ov_node pmieconf variable, so to set up event transfer from rattle to wobbly as described above:
                  rattle# pmieconf modify global ov_node wobbly
                  rattle# $PCP_RC_DIR/pmie start
                  
  • Other parameters associated with OpenView events can also be specified - these include the category and severity of each individual event, a group of events, or globally for all events:
                  rattle# pmieconf modify filesys.filling ov_severity Critical
                  rattle# pmieconf modify filesys.filling ov_category "Status Events"
                  rattle# pmieconf modify cisco ov_severity Major
                  rattle# $PCP_RC_DIR/pmie start
                  


EnlightenDSM (Enlighten Software Solutions)

Step-by-step - how to set up pmie (1) and ensure it can talk to EnlightenDSM:

  • Start the EnlightenDSM (1) daemons on the host being monitored by pmie (1) - "wobbly" in this example.
                  wobbly# /opt/enlighten/bin/start_enl_daemons -r
                  ...
                  start_emdd: Invoking /opt/emd/bin/emdd...
                  start_enl_daemons: Invoking /opt/enlighten/bin/pep...
                  start_enl_daemons: Invoking /opt/enlighten/bin/renld...
                  start_enl_daemons: Invoking /opt/enlighten/bin/AgentMon...
                  
  • Start up the Enlighten GUI to verify that events can be received. Go into the "Events" -> "Status Map" window:
                  wobbly# /opt/enlighten/bin/xenln &
                  
  • Send a test event to the Enlighten GUI, and verify that the scrolling event list at the bottom of the "Status Map" window updates after the event has been sent:
                  wobbly# /opt/enlighten/bin/EventsCli -n Test -u tps -v 12345 -s 5 -q
                  
  • If this event propagates successfully, enable pmie event generation into Enlighten:
                  wobbly# pmieconf modify global enln_action yes
                  wobbly# $PCP_RC_DIR/pmie start
                  
Configuration options - using pmieconf (1) to customize the setup:

The only configurable option of note for EnlightenDSM is the severity setting, which can be changed for individual events, for event groups, or globally for all events. Severity is a number between a low of 1 and a high of 5, and the default severity for pmie (1) events is 2:

              wobbly# pmieconf modify filesys.filling enln_severity 4
              wobbly# pmieconf modify cisco enln_severity 5
              wobbly# $PCP_RC_DIR/pmie start
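
If severities are set from a script, it may be worth checking the value against the 1 to 5 range described above before calling pmieconf. A minimal sketch, with a hypothetical function name:

```shell
# valid_enln_severity VALUE: succeed only if VALUE is a single digit from
# 1 to 5, the EnlightenDSM severity range described above.
valid_enln_severity() {
    case $1 in
        [1-5]) return 0 ;;
        *)     return 1 ;;
    esac
}

# Possible use (not run here):
#   sev=4
#   valid_enln_severity "$sev" \
#       && pmieconf modify filesys.filling enln_severity "$sev"
```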
              


Extending to Other Frameworks

All of the enterprise management frameworks we've come across provide some mechanism for generating events from outside the framework.  This is usually in the form of a stand-alone utility, but could also be an API, which packages the attributes associated with an event (these attributes are usually things like severity, source host, message text, etc) and sends this to the monitoring host.

Since pmie (1) supports running an arbitrary command upon detection of a performance event (in addition to its other native actions, such as writing an entry in the system log file), this is the hook we'll use to add support for additional frameworks.

Steps involved when integrating other frameworks with PCP:

  • find or create a utility to generate events into the framework;
  • familiarize yourself with the command line options for this utility, and decide which options are of interest from a performance-event perspective;
  • test the utility by hand - ensure that you can run the utility from the command line and that the framework's monitoring software successfully receives the event;
  • using the files in /var/pcp/config/pmieconf/global as a guide, create a pmieconf (1) file containing "global" variables for use in pmieconf - one of these is always an "action" variable, which allows the framework utility to be run when pmie (1) generates an event;
  • using pmieconf, ensure that the syntax of your created file is correct, and that the new variables are visible in the "global" group;
                  # pmieconf -r newfile list global
                  
  • finally, install the new file into the /var/pcp/config/pmieconf/global subdirectory and switch on your new framework action for all of the rules;
                  # pmieconf modify global new_action yes
                  # $PCP_RC_DIR/pmie start
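
The "find or create a utility" and "test by hand" steps above often end with a small wrapper around the framework's own event command. The following is a sketch under stated assumptions: fwevent and its -s, -h and -m options are placeholders rather than the interface of any real product, and the names FW_EVENT_CMD and send_framework_event are hypothetical.

```shell
# FW_EVENT_CMD stands in for the framework's event utility.  The default
# name "fwevent" is a placeholder; substitute the real command.
FW_EVENT_CMD=${FW_EVENT_CMD:-fwevent}

# send_framework_event SEVERITY HOST MESSAGE...
# Package the usual event attributes (severity, source host, message
# text) and hand them to the framework utility.  The option letters are
# assumptions and must be adapted to the real utility.
send_framework_event() {
    if [ "$#" -lt 3 ]; then
        echo "usage: send_framework_event severity host message..." >&2
        return 2
    fi
    severity=$1
    host=$2
    shift 2
    # Join the remaining words into a single message argument.
    "$FW_EVENT_CMD" -s "$severity" -h "$host" -m "$*"
}
```

Keeping the framework-specific options in one place means the new "action" variable only needs to run the wrapper, and the wrapper can be tested from the command line before pmie is involved.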
                  

If the framework is sufficiently common that other people may wish to use your new pmieconf (1) "action", feel free to share it and we'll incorporate it into a future PCP release.