PCP Introduction

The Performance Co-Pilot (PCP) is an open source toolkit designed for monitoring and managing system-level performance. Its services are distributed and scalable to accommodate the most complex system configurations and performance problems.

PCP supports many different platforms, including (but not limited to) Linux, MacOSX, FreeBSD, IRIX, Solaris and Windows (Win32). At a high level, PCP can be considered to contain two classes of software utility:

  • PCP Collectors. These are the parts of PCP that collect and extract performance data from various sources, e.g. the kernel or a database. These are available from http://pcp.io/.

  • PCP Monitors. These are the parts of PCP that display data collected from hosts (or archives) that have the PCP Collector installed. Many monitor tools are available as part of PCP. Other monitoring tools, such as pmchart, are available separately in layered packages that build on the core PCP functionality.

This document describes the high-level features and options common to most PCP utilities available on all platforms.

Overview

The PCP architecture is distributed in the sense that any PCP tool may be executing remotely. On the host (or hosts) being monitored, each domain of performance metrics (the kernel, a service layer, a database, a web server, an application, etc.) requires a Performance Metrics Domain Agent (PMDA), which is responsible for collecting performance measurements from that domain. All PMDAs are controlled by the Performance Metrics Collector Daemon (pmcd (1)) on the same host.

Client applications (the monitoring tools) connect to pmcd (1), which acts as a router for requests, by forwarding requests to the appropriate PMDA and returning the responses to the clients. Clients may also access performance data from a PCP archive (created using pmlogger (1)) for retrospective analysis.

Each tool or command is documented completely in its own reference page.

The following performance monitoring applications are primarily console based, are typically run directly from the command line, and are all part of the base PCP package.

  • pmstat - Outputs an ASCII high-level summary of system performance.
  • pmie - An inference engine that can evaluate predicate-action rules to raise alarms and automate system management tasks.
  • pminfo - Interrogate specific performance metrics and the meta data that describes them.
  • pmlogger - Generates PCP archives of performance metrics suitable for replay by most PCP tools.
  • pmval - Simple periodic reporting for some or all instances of a performance metric, with optional VCR time control.

Additional tools can be found in the layered PCP GUI package.

  • pmchart - Strip charts for arbitrary combinations of performance metrics.
  • pmdumptext - Produce ASCII reports for arbitrary combinations of performance metrics.

Common Command Line Arguments

There is a set of common command line arguments that are used consistently by most PCP tools.

-a archive

Performance metric information is retrospectively retrieved from the Performance Co-Pilot (PCP) archive, previously generated by pmlogger (1). The -a and -h options are mutually exclusive.

-a archive[,archive,..]

An alternate form of -a for applications that are able to handle multiple archives.

-h hostname

Unless directed to another host by the -h option, or to an archive by the -a option, the source of performance metrics will be the Performance Metrics Collector Daemon (PMCD) on the local host. The -a and -h options are mutually exclusive.

-s samples

The argument samples defines the number of samples to be retrieved and reported. If samples is 0 or -s is not specified, the application will sample and report continuously (in real time mode) or until the end of the PCP archive (in archive mode).

-z

Change the reporting timezone to the local timezone at the host that is the source of the performance metrics, as identified via either the -h or -a options.

-Z timezone

By default, applications report the time of day according to the local timezone on the system where the application is executed. The -Z option changes the timezone to timezone in the format of the environment variable TZ as described in environ (5).

Interval Specification and Alignment

Most PCP tools operate with periodic sampling or reporting, and the -t and -A options may be used to control the duration of the sample interval and the alignment of the sample times.

-t interval

Set the update or reporting interval.

The interval argument is specified as a sequence of one or more elements of the form number[units] where number is an integer or floating point constant (parsed using strtod (3)) and the optional units is one of: seconds, second, secs, sec, s, minutes, minute, mins, min, m, hours, hour, h, days, day and d. If the unit is empty, second is assumed.

In addition, the upper case (or mixed case) version of any of the above is also acceptable.

Spaces anywhere in the interval are ignored, so 4 days 6 hours 30 minutes, 4day6hour30min, 4d6h30m and 4d6.5h are all equivalent.

Multiple specifications are additive, e.g. ‘‘1hour 15mins 30secs’’ is interpreted as 3600+900+30 seconds.
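The interval rules above can be illustrated with a small sketch. This is not the PCP library parser; the function name parse_interval is hypothetical, and the sketch accepts only plain decimal numbers, whereas strtod (3) accepts further forms such as exponents.

```python
import re

# Seconds per unit; the unit spellings are the ones listed for -t above.
_UNIT_SECONDS = {}
for _names, _secs in (
    (("seconds", "second", "secs", "sec", "s"), 1.0),
    (("minutes", "minute", "mins", "min", "m"), 60.0),
    (("hours", "hour", "h"), 3600.0),
    (("days", "day", "d"), 86400.0),
):
    for _name in _names:
        _UNIT_SECONDS[_name] = _secs

# One element: a plain decimal number followed by an optional unit word.
_ELEMENT = re.compile(r"(\d+(?:\.\d*)?|\.\d+)([a-z]*)")


def parse_interval(spec):
    """Return the total number of seconds described by an interval string."""
    # Spaces are ignored anywhere, and upper or mixed case is accepted.
    text = "".join(spec.split()).lower()
    if not text:
        raise ValueError("empty interval")
    total, pos = 0.0, 0
    while pos < len(text):
        match = _ELEMENT.match(text, pos)
        if match is None:
            raise ValueError("bad interval element at: " + text[pos:])
        number, unit = match.groups()
        if unit and unit not in _UNIT_SECONDS:
            raise ValueError("unknown unit: " + unit)
        # An empty unit means seconds; elements are additive.
        total += float(number) * _UNIT_SECONDS[unit or "second"]
        pos = match.end()
    return total
```

With this sketch, parse_interval("4 days 6 hours 30 minutes") and parse_interval("4d6.5h") return the same number of seconds.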

-A align

By default samples are not necessarily aligned on any natural unit of time. The -A option may be used to force the initial sample to be aligned on the boundary of a natural time unit. For example -A 1sec, -A 30min and -A 1hour specify alignment on whole seconds, half hours and whole hours respectively.

The align argument follows the syntax for an interval argument described above for the -t option.

Note that alignment occurs by advancing the time as required, and that -A acts as a modifier to advance both the start of the time window (see the next section) and the origin time (if the -O option is specified).
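Alignment by advancing the time can be sketched as rounding up to the next multiple of the alignment interval. The function name align_forward is hypothetical, and the sketch assumes times are expressed as seconds counted from a point that is itself a natural boundary (for example local midnight); the real tools may account for the reporting timezone differently.

```python
import math


def align_forward(t, align_seconds):
    """Advance t to the next multiple of align_seconds (unchanged if already aligned)."""
    if align_seconds <= 0:
        raise ValueError("alignment must be positive")
    return math.ceil(t / align_seconds) * align_seconds
```

For instance, a time of 13:07:47 (47267 seconds after midnight) aligned with 30min advances to 13:30:00, and with 1hour advances to 14:00:00.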

Time Window Specification

Many PCP tools are designed to operate in some time window of interest, e.g. to define a termination time for real time monitoring or to define a start and end time within a PCP archive log.

In the absence of the -O and -A options to specify an initial sample time origin and time alignment (see above), the PCP application will retrieve the first sample at the start of the time window.

The following options may be used to specify a time window of interest.

-S starttime

By default the time window commences immediately in real time mode, or coincides with time at the start of the PCP archive log in archive mode. The -S option may be used to specify a later time for the start of the time window.

The starttime parameter may be given in one of three forms (interval is the same as for the -t option as described above, ctime is described below):

interval To specify an offset from the current time (in real time mode) or the beginning of a PCP archive (in archive mode) simply specify the interval of time as the argument. For example -S 30min will set the start of the time window to be exactly 30 minutes from now in real time mode, or exactly 30 minutes from the start of a PCP archive.

-interval To specify an offset from the end of a PCP archive log, prefix the interval argument with a minus sign. In this case, the start of the time window precedes the time at the end of archive by the given interval. For example -S -1hour will set the start of the time window to be exactly one hour before the time of the last sample in a PCP archive log.

@ctime To specify the calendar date and time (local time in the reporting timezone) for the start of the time window, use the ctime (3) syntax preceded by an at sign. For example -S '@ Mon Mar 4 13:07:47 1996'.

-T endtime

By default the end of the time window is unbounded (in real time mode) or aligned with the time at the end of a PCP archive log (in archive mode). The -T option may be used to specify an earlier time for the end of the time window.

The endtime parameter may be given in one of three forms (interval is the same as for the -t option as described above, ctime is described below):

interval To specify an offset from the start of the time window simply use the interval of time as the argument. For example -T 2h30m will set the end of the time window to be 2 hours and 30 minutes after the start of the time window.

-interval To specify an offset back from the time at the end of a PCP archive log, prefix the interval argument with a minus sign. For example -T -90m will set the end of the time window to be 90 minutes before the time of the last sample in a PCP archive log.

@ctime To specify the calendar date and time (local time in the reporting timezone) for the end of the time window, use the ctime (3) syntax preceded by an at sign. For example -T '@ Mon Mar 4 13:07:47 1996'.
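The interval and -interval forms of -S and -T in archive mode can be summarized with a short sketch. The function name time_window and the (seconds, from_end) pair used to represent each option are hypothetical conventions of this example; the @ctime form is not handled, and no clamping to the archive bounds is attempted.

```python
def time_window(archive_start, archive_end, start=None, end=None):
    """Return (window_start, window_end) in seconds for archive mode.

    start and end are None when -S or -T is not given; otherwise each is a
    (seconds, from_end) pair, where from_end is True for the -interval form.
    """
    window_start = archive_start
    if start is not None:
        seconds, from_end = start
        # -S interval counts from the archive start, -S -interval back from its end.
        window_start = archive_end - seconds if from_end else archive_start + seconds

    window_end = archive_end
    if end is not None:
        seconds, from_end = end
        # -T interval counts from the window start, -T -interval back from the archive end.
        window_end = archive_end - seconds if from_end else window_start + seconds

    return window_start, window_end
```

For a ten hour archive, -S 30min -T 2h30m corresponds to time_window(0, 36000, (1800, False), (9000, False)), which gives a window from 1800 to 10800 seconds.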

-O origin

By default samples are fetched from the start of the time window (see description of -S option) to the end of the time window (see description of -T option). The -O option allows the specification of an origin within the time window to be used as the initial sample time. This is useful for interactive use of a PCP tool with the pmtime (1) VCR replay facility.

The origin argument accepted by -O conforms to the same syntax and semantics as the endtime argument for the -T option.

For example -O -0 specifies that the initial position should be at the end of the time window; this is most useful when wishing to replay "backwards" within the time window.

The ctime argument for the -O, -S and -T options is based upon the calendar date and time format of ctime (3), but may be a fully specified time string like Mon Mar 4 13:07:47 1996 or a partially specified time like Mar 4 1996, Mar 4, Mar, 13:07:50 or 13:08.

For any missing low order fields, the default value of 0 is assumed for hours, minutes and seconds, 1 for day of the month and Jan for months. Hence, the following are equivalent: -S '@ Mar 1996' and -S '@ Mar 1 00:00:00 1996'.

If any high order fields are missing, they are filled in by starting with the year, month and day from the current time (real time mode) or the time at the beginning of the PCP archive log (archive mode) and advancing the time until it matches the fields that are specified. So, for example if the time window starts by default at "Mon Mar 4 13:07:47 1996", then -S @13:10 corresponds to 13:10:00 on Mon Mar 4, 1996, while -S @10:00 corresponds to 10:00:00 on Tue Mar 5, 1996 (note this is the following day).
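The rule for missing high order fields can be sketched for the simplest case, where only a time of day is given. The function name next_time_of_day is hypothetical, and the sketch deliberately ignores the other partial forms (such as Mar 4 or Mar) and any timezone handling.

```python
from datetime import datetime, timedelta


def next_time_of_day(window_start, hour, minute, second=0):
    """Return the first time at or after window_start that matches hour:minute:second."""
    # Start with the year, month and day of the window start ...
    candidate = window_start.replace(hour=hour, minute=minute,
                                     second=second, microsecond=0)
    # ... and advance by a day if that time has already passed.
    if candidate < window_start:
        candidate += timedelta(days=1)
    return candidate
```

Using the window start from the example above, next_time_of_day(datetime(1996, 3, 4, 13, 7, 47), 10, 0) yields 10:00:00 on 5 March 1996, the following day.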

For greater precision than afforded by ctime (3), the seconds component may be a floating point number.

Also the 12 hour clock (am/pm notation) is supported, so for example 13:07 and 1:07 pm are equivalent.

Performance Metrics - Names and Identifiers

The number of performance metric names supported by PCP in IRIX is of the order of a few thousand. There are fewer metrics on Linux, but still a considerable number. The PCP libraries and applications use an internal identification scheme that unambiguously associates a single integer with each known performance metric. This integer is known as the Performance Metric Identifier, or PMID. Although not a requirement, PMIDs tend to have global consistency across all systems, so a particular performance metric usually has the same PMID.

For all users and most applications, direct use of the PMIDs would be inappropriate (e.g. this would limit the range of accessible metrics, make the code hard to maintain, force the user interface to be particularly baroque, etc.). Hence a Performance Metrics Name Space (PMNS) is used to provide external names and a hierarchic classification for performance metrics. A PMNS is represented as a tree, with each node having a label and a pointer to either a PMID (for leaf nodes) or a set of descendent nodes in the PMNS (for non-leaf nodes).

A node label must begin with an alphabetic character, followed by zero or more characters drawn from the alphabetics, the digits and character '_' (underscore). For alphabetic characters in a node label, upper and lower case are distinguished.

By convention, the name of a performance metric is constructed by concatenation of the node labels on a path through the PMNS from the root node to a leaf node, with a ‘‘.’’ as a separator. The root node in the PMNS is unlabeled, so all names begin with the label associated with one of the descendent nodes below the root node of the PMNS, e.g. kernel.percpu.syscall. Typically (although this is not a requirement) there would be at most one name for each PMID in a PMNS. For example kernel.all.cpu.idle and disk.dev.read are the unique names for two distinct performance metrics, each with a unique PMID.

Groups of related PMIDs may be named by naming a non-leaf node in the PMNS tree, e.g. disk.
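The naming rules above can be made concrete with a small sketch that models a PMNS as nested dictionaries. The tree contents and the integer PMID values are invented for illustration only, the function names are hypothetical, and "alphabetic" is assumed to mean the ASCII letters.

```python
import re

# A node label: an alphabetic character, then alphabetics, digits or underscore.
_LABEL = re.compile(r"[A-Za-z][A-Za-z0-9_]*")

# Toy PMNS: dictionaries are non-leaf nodes, integers are made-up PMIDs.
PMNS = {
    "kernel": {"all": {"cpu": {"idle": 1001}}},
    "disk": {"dev": {"read": 2001, "write": 2002}},
}


def is_valid_metric_name(name):
    """Check that every dot-separated label in name obeys the label rules."""
    return all(_LABEL.fullmatch(label) for label in name.split("."))


def lookup(tree, name):
    """Follow the labels in name down from the (unlabeled) root node."""
    node = tree
    for label in name.split("."):
        node = node[label]
    return node


def leaf_names(node, prefix):
    """List the full names of all leaf nodes at or below node."""
    if not isinstance(node, dict):
        return [prefix]
    names = []
    for label, child in node.items():
        names.extend(leaf_names(child, prefix + "." + label))
    return names
```

In this toy tree, naming the non-leaf node disk selects both disk.dev.read and disk.dev.write, as leaf_names(lookup(PMNS, "disk"), "disk") shows.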

There may be PMIDs with no associated name in a PMNS; this is most likely to occur when specific PMIDs are not available in all systems, e.g. if ORACLE is not installed on a system, there is no good reason to pollute the PMNS with names for all of the ORACLE performance metrics.

Note also that there is no requirement for the PMNS to be the same on all systems. In practice, however, most applications would be developed against a stable PMNS that was assumed to be a subset of the PMNS on all systems. Indeed the PCP distribution includes a default local PMNS for just this purpose.

The default local PMNS is located at $PCP_VAR_DIR/pmns/root. However, the environment variable PMNS_DEFAULT may be set to the full pathname of a different PMNS, which will then be used as the default local PMNS.

Most applications do not use the local PMNS, but rather import parts of the PMNS as required from the same place that performance metrics are fetched, i.e. from pmcd (1) for live monitoring or from a PCP archive for retrospective monitoring.

To explore the PMNS use pminfo (1), or if the PCP GUI package is installed the New Chart and Metric Search windows within pmchart (1).

Performance Metric Specifications

In configuration files and (to a lesser extent) command line options, metric specifications adhere to the following syntax rules.

If the source of performance metrics is real time from pmcd (1) then the accepted syntax is

host:metric[instance1,instance2,...]

If the source of performance metrics is a PCP archive log then the accepted syntax is

archive/metric[instance1,instance2,...]

The host:, archive/ and [instance1,instance2,...] components are all optional.

The , delimiter in the list of instance names may be replaced by whitespace.

Special characters in instance names may be escaped by surrounding the name in double quotes or preceding the character with a backslash.

White space is ignored everywhere except within a quoted instance name.

An empty instance is silently ignored, and in particular ‘‘[]’’ is the same as no instance, while ‘‘[one,,,two]’’ is parsed as specifying just the two instances ‘‘one’’ and ‘‘two’’.
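The syntax rules above can be approximated by the following sketch. It is not the parser used by the PCP tools; the function names are hypothetical, the archive/ form is recognized simply by the presence of a slash before the metric name, and backslash escapes are assumed to work inside double quotes as well as outside them.

```python
def split_instances(text):
    """Split the text found between '[' and ']' into instance names."""
    names, current = [], []
    started = in_quotes = False
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text):
            current.append(text[i + 1])      # escaped character is taken literally
            started = True
            i += 2
            continue
        if ch == '"':
            in_quotes = not in_quotes        # quotes delimit, they are not part of the name
            started = True
        elif not in_quotes and (ch == "," or ch.isspace()):
            if started:                      # empty instances are silently ignored
                names.append("".join(current))
            current, started = [], False
        else:
            current.append(ch)
            started = True
        i += 1
    if started:
        names.append("".join(current))
    return names


def parse_metric_spec(spec):
    """Return (kind, source, metric, instances); kind is 'host', 'archive' or None."""
    spec = spec.strip()
    instances = []
    if spec.endswith("]") and "[" in spec:
        head, _, tail = spec.partition("[")
        instances = split_instances(tail[:-1])
        spec = head
    spec = "".join(spec.split())             # white space outside quotes is ignored
    if "/" in spec:
        source, _, metric = spec.rpartition("/")
        return "archive", source, metric, instances
    if ":" in spec:
        source, _, metric = spec.partition(":")
        return "host", source, metric, instances
    return None, None, spec, instances
```

For example, parse_metric_spec("myhost:disk.dev.read[sda, sdb]") separates the (hypothetical) host myhost, the metric disk.dev.read and the two instances sda and sdb.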

PMCD and Archive Versions

Since PCP version 2, version information has been associated with pmcd (1) and PCP archives. The version number is used in a number of ways, but most noticeably for the distributed PMNS (see pmns (5)). In PCP version 1, client applications would load the PMNS from the default PMNS file, but in PCP version 2 client applications extract the PMNS information from pmcd or a PCP archive. Thus in PCP version 2, the version number is used to determine whether the PMNS to use comes from the default local file or from the actual current source of the metrics.

Environment

In addition to the PCP run time environment and configuration variables described in the PCP Environment section below, the following environment variables apply to all installations.

PCP_STDERR

Many PCP tools support the environment variable PCP_STDERR, which can be used to control where error messages are sent. When unset, the default behavior is that ‘‘usage’’ messages and option parsing errors are reported on standard error, while other messages after initial startup are sent to the default destination for the tool, i.e. standard error for ASCII tools, or a dialog for GUI tools.

If PCP_STDERR is set to the literal value DISPLAY then all messages will be displayed in a dialog. This is used for any tools launched from a Desktop environment.

If PCP_STDERR is set to any other value, the value is assumed to be a filename, and all messages will be written there.
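The three cases for messages issued after initial startup can be sketched as a single decision function. The name message_destination and the returned (kind, filename) pairs are hypothetical; the sketch also treats an empty PCP_STDERR as unset, which is an assumption not stated above.

```python
def message_destination(environ, is_gui_tool=False):
    """Return (kind, filename) for post-startup messages, given an environment mapping."""
    value = environ.get("PCP_STDERR")
    if not value:
        # Unset: standard error for ASCII tools, a dialog for GUI tools.
        return ("dialog", None) if is_gui_tool else ("stderr", None)
    if value == "DISPLAY":
        return ("dialog", None)
    # Any other value is taken to be a filename.
    return ("file", value)
```

A caller would typically pass os.environ as the environ argument.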

PMCD_CONNECT_TIMEOUT

When attempting to connect to a remote pmcd (1) on a machine that is booting, the connection attempt could potentially block for a long time until the remote machine finishes its initialization.

Most PCP applications and some of the PCP library routines will abort and return an error if the connection has not been established after some specified interval has elapsed. The default interval is 5 seconds. This may be modified by setting PMCD_CONNECT_TIMEOUT in the environment to a real number of seconds for the desired timeout. This is most useful in cases where the remote host is at the end of a slow network, requiring longer latencies to establish the connection correctly.

PMCD_RECONNECT_TIMEOUT

When a monitor or client application loses a connection to a pmcd (1), the connection may be re-established by calling a service routine in the PCP library. However, attempts to reconnect are controlled by a back-off strategy to avoid flooding the network with reconnection requests. By default, the back-off delays are 5, 10, 20, 40 and 80 seconds for consecutive reconnection requests from a client (the last delay will be repeated for any further attempts after the fifth). Setting the environment variable PMCD_RECONNECT_TIMEOUT to a comma separated list of positive integers will re-define the back-off delays, e.g. setting PMCD_RECONNECT_TIMEOUT to ‘‘1,2’’ will back-off for 1 second, then attempt another connection request every 2 seconds thereafter.
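The back-off schedule described above can be sketched as an endless sequence of delays. The function name backoff_delays is hypothetical, and raising ValueError for a malformed list is a choice made for this example; how the PCP library treats a malformed value is not covered here.

```python
from itertools import chain, islice, repeat

# Default delays in seconds, as stated above.
DEFAULT_DELAYS = (5, 10, 20, 40, 80)


def backoff_delays(env_value=None):
    """Return an endless iterator of delays between reconnection attempts."""
    delays = DEFAULT_DELAYS
    if env_value:
        delays = tuple(int(field) for field in env_value.split(","))
        if any(delay <= 0 for delay in delays):
            raise ValueError("delays must be positive integers")
    # The last delay is repeated for all further attempts.
    return chain(delays, repeat(delays[-1]))


# First seven delays with the default schedule.
first_seven = list(islice(backoff_delays(), 7))
```

With PMCD_RECONNECT_TIMEOUT set to ‘‘1,2’’, the same function yields 1 followed by an unending run of 2.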

PMCD_WAIT_TIMEOUT

When pmcd (1) is started from $PCP_RC_DIR/pcp then the primary instance of pmlogger (1) will be started if the configuration flag pmlogger is chkconfig'ed on and pmcd is running and accepting connections.

The check on pmcd's readiness will wait up to PMCD_WAIT_TIMEOUT seconds. If pmcd has a long startup time (such as on a very large system), then PMCD_WAIT_TIMEOUT can be set to provide a maximum wait longer than the default 60 seconds.

PMNS_DEFAULT

If set, then this is interpreted as the full pathname to be used as the default local PMNS for pmLoadNameSpace (3). Otherwise, the default local PMNS is located at $PCP_VAR_DIR/pmns/root for base PCP installations.

PCP_COUNTER_WRAP

Many of the performance metrics exported from PCP agents have counter semantics, meaning they are expected to be monotonically increasing. Under some circumstances, one value of these metrics may be less than the previously fetched value. This can happen when a counter of finite precision overflows, or when the PCP agent has been reset or restarted, or when the PCP agent is exporting values from some underlying instrumentation that is subject to some asynchronous discontinuity.

The environment variable PCP_COUNTER_WRAP may be set to indicate that all such cases of a decreasing counter should be treated as a counter overflow, and hence the values are assumed to have wrapped once in the interval between consecutive samples. This ‘‘wrapping’’ behavior was the default in earlier PCP versions, but has been disabled by default in PCP releases from version 1.3 onwards.
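The effect of assuming a single wrap can be sketched as follows. The function name counter_delta is hypothetical, the counter width is passed in explicitly, and returning None when wrapping is not assumed simply stands for "no usable value" in this example; it is not a statement of what any particular PCP tool reports.

```python
def counter_delta(previous, current, bits=32, assume_wrap=False):
    """Return the increase between two samples of an unsigned counter."""
    delta = current - previous
    if delta >= 0:
        return delta
    if assume_wrap:
        # Treat the decrease as exactly one overflow of a counter `bits` wide.
        return delta + (1 << bits)
    # Without the wrap assumption the decrease cannot be interpreted.
    return None
```

For a 32-bit counter that moves from 4294967290 to 5, the wrap assumption gives an increase of 11.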

PMDA_PATH

The PMDA_PATH environment variable may be used to modify the search path used by pmcd (1) and pmNewContext (3) (for PM_CONTEXT_LOCAL contexts) when searching for a daemon or DSO PMDA. The syntax follows that for PATH in sh (1), i.e. a colon separated list of directories, and the default search path is ‘‘/var/pcp/lib:/usr/pcp/lib’’ (or ‘‘/var/lib/pcp/lib’’ on Linux, depending on the value of the $PCP_VAR_DIR environment variable).

PMCD_PORT

The TCP/IP port(s) used by pmcd (1) to create the socket for incoming connections and requests was historically 4321 and more recently the officially registered port 44321; in the current release, both port numbers are used by default as a transitional arrangement. This may be over-ridden by setting PMCD_PORT to a different port number, or a comma-separated list of port numbers. If a non-default port is used when pmcd (1) is started, then every monitoring application connecting to that pmcd (1) must also have PMCD_PORT set in their environment before attempting a connection.
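Interpreting PMCD_PORT can be sketched as below. The function name pmcd_ports is hypothetical, and the order in which the two default ports are listed is an assumption of this example.

```python
# Both default ports named above; the order here is an assumption.
DEFAULT_PORTS = (44321, 4321)


def pmcd_ports(environ):
    """Return the list of port numbers selected by an environment mapping."""
    value = environ.get("PMCD_PORT")
    if not value:
        return list(DEFAULT_PORTS)
    # A single port number or a comma-separated list of port numbers.
    return [int(field) for field in value.split(",")]
```

A client and the pmcd (1) it connects to would need to agree on the resulting list, which is why the variable must be set on both sides when a non-default port is used.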

The following environment variables are relevant to installations in which pmlogger (1), the PCP archive logger, is used.

PMLOGGER_PORT

The environment variable PMLOGGER_PORT may be used to change the base TCP/IP port number used by pmlogger (1) to create the socket to which pmlc (1) instances will try to connect. The default base port number is 4330. When used, PMLOGGER_PORT should be set in the environment before pmlogger (1) is executed.

If you have the PCP package installed, then the following environment variables are relevant to the Performance Metrics Domain Agents (PMDAs).

PMDA_LOCAL_PROC

If set, then a context established with the type of PM_CONTEXT_LOCAL will have access to the ‘‘proc’’ PMDA to retrieve performance metrics about individual processes.

PMDA_LOCAL_SAMPLE

If set, then a context established with the type of PM_CONTEXT_LOCAL will have access to the ‘‘sample’’ PMDA if this optional PMDA has been installed locally.

PMIECONF_PATH

If set, pmieconf (1) will form its pmieconf (5) specification (set of parameterized pmie (1) rules) using all valid pmieconf files found below each subdirectory in this colon-separated list of subdirectories. If not set, the default is $PCP_VAR_DIR/config/pmieconf.

Files

/etc/pcp.conf

Configuration file for the PCP runtime environment, see pcp.conf (5).

$PCP_RC_DIR/pcp

Script for starting and stopping pmcd (1).

$PCP_PMCDCONF_PATH

Control file for pmcd (1).

$PCP_PMCDOPTIONS_PATH

Command line options passed to pmcd (1) when it is started from $PCP_RC_DIR/pcp. All the command line option lines should start with a hyphen as the first character. This file can also contain environment variable settings of the form "VARIABLE=value".

$PCP_BINADM_DIR

Location of PCP utilities for collecting and maintaining PCP archives, PMDA help text, PMNS files etc.

$PCP_PMDAS_DIR

Parent directory of the installation directory for Dynamic Shared Object (DSO) PMDAs.

$PCP_RUN_DIR/pmcd.pid

If pmcd is running, this file contains an ASCII decimal representation of its process ID.

$PCP_LOG_DIR/pmcd

Default location of log files for pmcd (1), current directory for running PMDAs. Archives generated by pmlogger (1) are generally below $PCP_LOG_DIR/pmlogger .

$PCP_LOG_DIR/pmcd/pmcd.log

Diagnostic and status log for the current running pmcd (1) process. This is the first place to look when there are problems associated with pmcd.

$PCP_LOG_DIR/pmcd/pmcd.log.prev

Diagnostic and status log for the previous pmcd (1) instance.

$PCP_LOG_DIR/NOTICES

Log of pmcd (1) and PMDA starts, stops, additions and removals.

$PCP_VAR_DIR/config

Contains directories of configuration files for several PCP tools.

$PCP_VAR_DIR/config/pmcd/rc.local

Local script for controlling PCP boot, shutdown and restart actions.

$PCP_VAR_DIR/pmns/root

The ASCII pmns (5) exported by pmcd (1) by default. This PMNS is the superset of all other PMNS files installed in $PCP_VAR_DIR/pmns.

$PCP_LOG_DIR/NOTICES

In addition to recording pmcd (1) and PMDA activity, this file may be used to log alarms and notices from pmie (1) via pmpost (1).

$PCP_VAR_DIR/config/pmlogger/control

Control file for pmlogger (1) instances launched from $PCP_RC_DIR/pcp and/or managed by pmlogger_check (1) and pmlogger_daily (1) as part of a production PCP archive collection setup.

PCP Environment

Environment variables with the prefix PCP_ are used to parameterize the file and directory names used by PCP. On each installation, the file /etc/pcp.conf contains the local values for these variables. The $PCP_CONF variable may be used to specify an alternative configuration file, as described in pcp.conf (5).