PCP Quick Reference Guide

Introduction

Performance Co-Pilot (PCP) is an open source framework and toolkit for monitoring, analyzing, and responding to details of live and historical system performance. PCP has a fully distributed, plug-in based architecture making it particularly well suited to centralized analysis of complex environments and systems. Custom performance metrics can be added using the C, C++, Perl, and Python interfaces.

This page provides quick instructions on how to install and use PCP on a set of hosts, one of which (the monitor host) will be used to monitor and analyze itself and the other hosts (the collector hosts).

Installation

PCP is available on all recent distribution releases, including Debian, Fedora, RHEL, and Ubuntu. For earlier releases and other distributions, consider installing from source or checking auxiliary package repositories such as EPEL.

Installing Collector Hosts


To install basic PCP tools and services and enable collecting performance data on Fedora/RHEL, run:

# yum install pcp
# chkconfig pmcd on
# service pmcd start
# chkconfig pmlogger on
# service pmlogger start

To install basic PCP tools and services and enable collecting performance data on Debian/Ubuntu, run:

$ sudo apt-get install pcp
$ sudo update-rc.d pmcd defaults
$ sudo update-rc.d pmlogger defaults
$ sudo service pmcd restart
$ sudo service pmlogger restart

This enables the Performance Metrics Collector Daemon (pmcd(1)) on the host, which in turn controls the various Performance Metrics Domain Agents (PMDAs) and requests metrics from them on behalf of clients. The PMDAs provide the actual data from different components (domains) in the system, for example the Linux Kernel PMDA or the NFS Client PMDA. The default configuration includes over 1000 metrics with negligible overall overhead. For convenience, local PCP archive logging with pmlogger(1) is also enabled on the host.


To enable PMDAs which are not enabled by default, for example the Postfix PMDA, run the corresponding Install script:

# cd /var/lib/pcp/pmdas/postfix
# ./Install

The client tools contact local or remote PMCDs as needed. Communication with PMCD over the network uses TCP port 44321 by default.
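Before pointing client tools at a remote collector, it can help to confirm that the pmcd port is reachable from the monitor host. The following is a minimal sketch, assuming a netcat (nc) binary that supports -z and -w is installed. The function name pmcd_reachable is hypothetical and not part of PCP.

```shell
#!/bin/sh
# Hypothetical helper (not part of PCP): succeed if a TCP connection to the
# pmcd port on the given host can be opened.
# Assumes a netcat that supports -z (connect only) and -w (timeout in seconds).
pmcd_reachable() {
    host=${1:?usage: pmcd_reachable host [port]}
    port=${2:-44321}            # default pmcd port
    nc -z -w 3 "$host" "$port" >/dev/null 2>&1
}

# Example use (not run here):
#   pmcd_reachable acme.com && pminfo -h acme.com
```

If the check fails, a firewall between the hosts blocking TCP port 44321 is a likely cause worth ruling out first.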

Installing Monitor Host

The following additional packages can be optionally installed on the monitoring host to extend the set of monitoring tools from the base pcp package.


Install graphical analysis tools and documentation on Fedora/RHEL:

# yum install pcp-doc pcp-gui

Install graphical analysis tools and documentation on Debian/Ubuntu:

$ sudo apt-get install pcp-doc pcp-gui

To enable centralized archive log collection on the monitoring host, its pmlogger is configured to fetch performance metrics from collector hosts. Add each collector host to the pmlogger configuration file /etc/pcp/pmlogger/control and then restart the pmlogger service on the monitoring host.


Enable recording of metrics from remote host acme.com:

# echo acme.com n n PCP_LOG_DIR/pmlogger/acme.com -r -T24h10m -c config.acme.com >> /etc/pcp/pmlogger/control

# service pmlogger restart

Checks for remote log collection are done every half hour. You may also wish to run /usr/libexec/pcp/bin/pmlogger_check -V -C (on Fedora/RHEL) or /usr/lib/pcp/bin/pmlogger_check -V -C (on Debian/Ubuntu) manually; the service restart above runs this command internally.

Note that a default configuration file (config.acme.com above) will be generated if it does not already exist. This generation step is optional, since a custom configuration for each host can be provided instead. See the pmlogconf(1) manual page for details.
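With more than a few collector hosts, the control entries can be generated rather than typed by hand. The sketch below prints one line per host using the same fields as the acme.com entry above. The function name pmlogger_control_lines is hypothetical, and acme1.com and acme2.com are placeholder hosts.

```shell
#!/bin/sh
# Hypothetical helper: print one pmlogger control line per collector host,
# using the same fields as the acme.com entry shown earlier.
pmlogger_control_lines() {
    for host in "$@"; do
        printf '%s n n PCP_LOG_DIR/pmlogger/%s -r -T24h10m -c config.%s\n' \
            "$host" "$host" "$host"
    done
}

# Preview the lines before appending them to the control file:
pmlogger_control_lines acme1.com acme2.com

# To apply (as root, not run here):
#   pmlogger_control_lines acme1.com acme2.com >> /etc/pcp/pmlogger/control
#   service pmlogger restart
```

Previewing first makes it easy to spot a mistyped host name before it ends up in the control file.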

Dynamic Host Discovery

In dynamic environments, manually configuring every host is not feasible and may even be impossible. PCP Manager (pmmgr(1), from the pcp-manager package) can be used instead of directly invoking pmlogger and pmie to auto-discover and auto-configure new collector hosts.


To install the PMMGR daemon and begin monitoring either statically or dynamically configured hosts, run:

## Fedora/RHEL:
# yum install pcp-manager
# chkconfig pmmgr on
## Debian/Ubuntu:
$ sudo apt-get install pcp-manager
$ sudo update-rc.d pmmgr defaults
## Common:
# echo acme.com >> /etc/pcp/pmmgr/target-host
# echo avahi >> /etc/pcp/pmmgr/target-discovery
# echo probe=ip.addr.tup.le/netmask >> /etc/pcp/pmmgr/target-discovery
# service pmmgr restart
# find /var/log/pcp/pmmgr

Discover hosts running the PCP pmcd service on the local network:

$ pmfind -s pmcd

Installation Health Check

A basic health check of running services, network connectivity between hosts, and enabled PMDAs can be done as follows.


Check PCP services live on the remote host munch, and historically from a local archive for the host smash:

$ pcp -h munch
              Performance Co-Pilot configuration on munch:
                platform: SunOS munch 5.11 oi_151a8 i86pc
                hardware: 4 cpus, 3 disks, 4087MB RAM
                timezone: EST-10
                services: pmcd pmproxy
                    pmcd: Version 3.8.9-1, 3 agents
                    pmda: pmcd mmv solaris
                    pmie: /var/log/pcp/pmie/munch/pmie.log
              

$ pcp -a /var/log/pcp/pmlogger/smash/20140729
              Performance Co-Pilot configuration on smash:
                 archive: /var/log/pcp/pmlogger/smash/20140729
                platform: Linux smash 2.6.32-279.46.1.el6.x86_64 #1 SMP Mon May 19 16:16:00 EDT 2014 x86_64
                hardware: 8 cpus, 2 disks, 1 node, 23960MB RAM
                timezone: EST-10
                services: pmcd pmproxy pmwebd
                    pmcd: Version 3.9.8-1, 8 agents
                    pmda: pmcd proc xfs linux mmv nvidia dmcache postgresql
                pmlogger: primary logger: /var/log/pcp/pmlogger/smash/20140729.00.10
                    pmie: /var/log/pcp/pmie/smash/pmie.log
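The summary output above can also be checked from a script, for example to confirm that an expected PMDA is enabled. The following sketch extracts the agent names from the "pmda:" line. It assumes the list fits on a single line, as in the examples above, and the function name list_pmdas is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read `pcp` summary output on stdin and print the
# agent names from the "pmda:" line, one per line.
# Assumes the agent list is not wrapped onto continuation lines.
list_pmdas() {
    sed -n 's/^[[:space:]]*pmda:[[:space:]]*//p' | tr -s ' ' '\n'
}

# Example use (not run here): warn if the linux PMDA is missing on a host.
#   pcp -h acme.com | list_pmdas | grep -qx linux || echo "linux PMDA not enabled"
```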
              

System Level Performance Monitoring

PCP comes with a wide range of command line utilities for accessing live performance metrics via PMCDs or historical data from archive logs. The following examples illustrate some of the most useful cases; see the manual page of each command for additional information. In the examples below, -h <host> is always optional and defaults to the local host.

Monitoring Live Performance Metrics


Display all the enabled performance metrics on a host (use with -t to include a short description for each):

$ pminfo -h acme.com

Display detailed information about a performance metric and its current values:

$ pminfo -dfmtT disk.partitions.read -h acme.com

Monitor live disk write operations per partition at a two-second interval using fixed point notation (use -i instance to select only certain instances and -r for raw values):

$ pmval -t 2sec -f 3 disk.partitions.write -h acme.com

Monitor live CPU load, memory usage, and disk write operations per partition at a two-second interval using fixed-width columns:

$ pmdumptext -Xlimu -t 2sec 'kernel.all.load[1]' mem.util.used disk.partitions.write -h acme.com

Monitor the live process creation rate and free/used memory at a two-second interval, printing timestamps and reporting values in GB in CSV format:

$ pmrep -h acme.com -p -b GB -t 2sec -o csv kernel.all.sysfork mem.util.free mem.util.used
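CSV output such as that produced above can be post-processed with standard tools. The sketch below averages one column with awk. It assumes one sample per row, a comma delimiter, and unquoted numeric fields. The exact CSV layout may differ between pmrep versions, so check the header row before choosing a column. The function name csv_column_avg is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: average the numeric values in column $1 of CSV on stdin.
# Rows whose field is not a plain number (header, N/A, empty) are skipped.
csv_column_avg() {
    awk -F, -v col="$1" '
        $col ~ /^-?[0-9]+(\.[0-9]+)?$/ { sum += $col; n++ }
        END { if (n) printf "%.3f\n", sum / n; else print "no data" }
    '
}

# Example use (not run here): average of the second column over 30 samples.
#   pmrep -h acme.com -t 2sec -s 30 -o csv mem.util.used | csv_column_avg 2
```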

Monitor system metrics in a top-like window:

$ pcp atop

Monitor system metrics in a sar-like (System Activity Report) manner:

$ pcp atopsar

Monitor system metrics in a sar-like fashion at a two-second interval from two different hosts:

$ pmstat -t 2sec -h acme1.com -h acme2.com

Monitor system metrics in an iostat-like fashion at a two-second interval:

$ pmiostat -t 2sec -h acme.com

Monitor performance metrics from two different hosts in a GUI application with a default interval of two seconds. Use File->New Chart to select the metrics to include in a new view, and File->Open View to use a predefined view:

$ pmchart -t 2sec -h acme1.com -h acme2.com

Retrospective Performance Analysis

PCP archive logs are located under /var/log/pcp/pmlogger/hostname, and the archive names indicate the date they cover. Archives are self-contained and machine-independent, so they can be transferred to any machine for offline analysis.
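Because archive names follow the host and date, the name for a given day can be built in a script. This is a minimal sketch assuming the /var/log/pcp/pmlogger/hostname/YYYYMMDD layout described above. The function name archive_for is hypothetical, and an archive for the current day may still carry a time suffix (such as 20140729.00.10) rather than the plain date.

```shell
#!/bin/sh
# Hypothetical helper: print the archive base name for a host and date,
# following the /var/log/pcp/pmlogger/<hostname>/<YYYYMMDD> layout.
# The date defaults to today.
archive_for() {
    host=${1:?usage: archive_for host [YYYYMMDD]}
    day=${2:-$(date +%Y%m%d)}
    printf '/var/log/pcp/pmlogger/%s/%s\n' "$host" "$day"
}

archive_for acme.com 20140902

# Example use (not run here):
#   pmdumplog -l "$(archive_for acme.com 20140902)"
```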


Check the host and the time period an archive covers:

$ pmdumplog -l acme.com/20140902

Check PCP configuration at the time when an archive was created:

$ pcp -a acme.com/20140902

Display all enabled performance metrics at the time when an archive was created:

$ pminfo -a acme.com/20140902

Display detailed information about a performance metric at the time when an archive was created:

$ pminfo -df mem.freemem -a acme.com/20140902

Dump past disk write operations per partition in an archive using fixed point notation (use -i instance to select only certain instances and -r for raw values):

$ pmval -f 3 disk.partitions.write -a acme.com/20140902

Replay past disk write operations per partition in an archive at a two-second interval using fixed point notation between 9 AM and 10 AM (for full dates, use syntax like @"2014-08-20 14:00:00"):

$ pmval -d -t 2sec -f 3 disk.partitions.write -S @09:00 -T @10:00 -a acme.com/20140902

Calculate average values of performance metrics in an archive between 9 AM and 10 AM using table-like formatting, including the minimum and maximum values and the times at which they occurred:

$ pmlogsummary -HlfiImM -S @09:00 -T @10:00 acme.com/20140902 disk.partitions.write mem.freemem

Dump past CPU load, memory usage, and disk write operations per partition in an archive, averaged over 10-minute intervals, with fixed-width columns between 9 AM and 10 AM:

$ pmdumptext -Xlimu -t 10m -S @09:00 -T @10:00 'kernel.all.load[1]' 'mem.util.used' 'disk.partitions.write' -a acme.com/20140902

Dump past CPU load, memory usage, and disk write operations per partition in an archive between 9 AM and 10 AM, with an extended header, using MB, and without interpolation:

$ pmrep -a acme.com/20140902 -p -u -b MB -x -S @09:00 -T @10:00 kernel.all.load mem.util.used disk.partitions.write

Summarize differences in past performance metrics between two archives, comparing 2 AM to 3 AM in the first archive with 9 AM to 10 AM in the second archive (grep for '+' to quickly see values which were zero during the first period):

$ pmdiff -S @02:00 -T @03:00 -B @09:00 -E @10:00 acme.com/20140902 acme.com/20140901

Replay past system metrics in an archive in a top-like window starting at 9 AM:

$ pcp atop -b 09:00 -r acme.com/20140902
$ pcp -S @09:00 -a acme.com/20140902 atop

Dump past system metrics in an archive in a sar-like fashion, averaged over 10-minute intervals, between 9 AM and 10 AM:

$ pmstat -t 10m -S @09:00 -T @10:00 -a acme.com/20140902

Dump past system metrics in an archive in an iostat(1)-like fashion, averaged over one-hour intervals:

$ pmiostat -t 1h -a acme.com/20140902

Dump past system metrics in a free(1)-like fashion at a specific historical time offset:

$ pcp -a acme.com/20140902 -O @10:02 free

Replay performance metrics from an archive in a GUI application with a default interval of two seconds between 9 AM and 10 AM. Use File->New Chart to select the metrics to include in a new view, and File->Open View to use a predefined view:

$ pmchart -t 2sec -S @09:00 -T @10:00 -a acme.com/20140902

Merge several archives into a new combined archive (see the pmlogextract(1) manual page for how to write a configuration file that includes only certain metrics):

$ pmlogextract <archive1> <archive2> <newarchive>

Visualizing iostat and sar Data

iostat and sar data can be imported as PCP archives which then allows inspecting and visualizing the data with PCP tools. The iostat2pcp(1) importer is in the pcp-import-iostat2pcp package and the sar2pcp(1) importer is in the pcp-import-sar2pcp package.


Import iostat data to a new PCP archive and visualize it:

$ iostat -t -x 2 > iostat.out
$ iostat2pcp iostat.out iostat.pcp
$ pmchart -t 2sec -a iostat.pcp

Import sar data from an existing sar archive to a new PCP archive and visualize it (sar logs are under /var/log/sysstat on Debian/Ubuntu):

$ sar2pcp /var/log/sa/sa15 sar.pcp
$ pmchart -t 2sec -a sar.pcp
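When a whole directory of daily sar data files needs importing, the sar2pcp commands can be generated in a loop. The sketch below only prints the commands (a dry run) so they can be reviewed first. It assumes binary data files named sa followed by digits and paths without spaces. The function name sar_import_cmds and the output directory are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print (dry run) one sar2pcp command per daily sar
# data file found in a directory; each archive is named after its input file.
sar_import_cmds() {
    src=${1:?usage: sar_import_cmds sar_dir out_dir}
    dst=${2:?usage: sar_import_cmds sar_dir out_dir}
    for f in "$src"/sa[0-9]*; do
        [ -f "$f" ] || continue          # skip if the pattern matched nothing
        printf 'sar2pcp %s %s/%s.pcp\n' "$f" "$dst" "$(basename "$f")"
    done
}

# Review the commands, then pipe them to sh to run them (not run here):
#   sar_import_cmds /var/log/sa "$HOME/sar-archives"
#   sar_import_cmds /var/log/sa "$HOME/sar-archives" | sh
```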

Process Level Performance Monitoring

PCP provides details of each running process via the standard PCP interfaces and tools on the local host, but due to security and performance considerations most of the process-related information is not stored in archive logs by default. Also for security reasons, only root can access some details of other users' running processes.

Custom application instrumentation is possible with the Memory Mapped Value (MMV) PMDA.

Live and Retrospective Process Monitoring


Display all the available process related metrics:

$ pminfo proc

Monitor the number of open file descriptors of the process 1234:

$ pmval -t 2sec 'proc.fd.count[1234]'

Monitor the CPU time, memory usage (RSS), and the number of threads of the process 1234 (-h local: is a workaround needed for the time being):

$ pmdumptext -Xlimu -t 2sec 'proc.psinfo.utime[1234]' 'proc.memory.rss[1234]' 'proc.psinfo.threads[1234]' -h local:

Display all the available process related metrics in an archive:

$ pminfo proc -a acme.com/20140902

Display the number of running processes on 2014-08-20 14:00:

$ pmval -s 1 -S @"2014-08-20 14:00" proc.nprocs -a acme.com/20140820

Monitoring “Hot” Processes with Hotproc

It is also possible to monitor “hot” or “interesting” processes by name with PCP 3.10.5 or later, for example all processes whose command name is java or python. This monitoring of “hot” processes can also be enabled or disabled on the fly, based on certain criteria or from the command line. The metrics are available under the hotproc namespace.

Processes to be monitored constantly under the hotproc namespace can be configured in the file /var/lib/pcp/pmdas/proc/hotproc.conf; see the pmdaproc(1) manual page for details. This allows these processes to be monitored regardless of their PIDs and makes it easy to log the metrics.


Enable monitoring of all Java instances on the fly and display all the collected metrics:

# pmstore hotproc.control.config 'fname == "java"'
# pminfo -f hotproc
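To match several command names at once, the predicate string can be assembled in a script. The following sketch joins fname tests with ||, on the assumption that the hotproc predicate syntax accepts the || operator (check the pmdaproc(1) manual page for the supported grammar). The function name hotproc_predicate is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: build a hotproc predicate matching any of the given
# command names, e.g.  fname == "java" || fname == "python"
# Assumes the hotproc predicate syntax supports the || operator.
hotproc_predicate() {
    pred=
    for name in "$@"; do
        pred="${pred:+$pred || }fname == \"$name\""
    done
    printf '%s\n' "$pred"
}

hotproc_predicate java python

# Example use (as root, not run here):
#   pmstore hotproc.control.config "$(hotproc_predicate java python)"
```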

Application Instrumentation

Applications can be instrumented in the PCP world by using Memory Mapped Values (MMVs). pmdammv is a PMDA which exports application level performance metrics using memory mapped files. It offers an extremely low overhead instrumentation facility that is well-suited to long running, mission critical applications where it is desirable to have performance metrics and availability information permanently enabled.

Applications to be instrumented with MMV need to be PCP MMV aware. APIs are available for several languages, including C, C++, Perl, and Python. Java applications may use the separate Parfait class library to enable MMV.

Instrumentation of unaltered Java applications is a known feature request and is planned for a not-too-distant release.

See the Performance Co-Pilot Programmer's Guide PDF for more information about application instrumentation.

Derived Metrics

PCP provides a wide range of performance metrics, but in some cases the readily available metrics may not provide exactly what is needed. Derived metrics (see pmLoadDerivedConfig(3)) can extend the available metrics with new metrics defined by simple arithmetic expressions (see pmRegisterDerived(3)).

The following example illustrates how to define metrics corresponding to those displayed by sar -d, which PCP does not provide by default:


Create a file containing definitions of derived metrics and point PCP_DERIVED_CONFIG to it when running PCP utilities:

$ cat ./pcp-deriv-metrics.conf
disk.dev.avqsz = disk.dev.read_rawactive + disk.dev.write_rawactive
disk.dev.avrqsz = 2 * rate(disk.dev.total_bytes) / rate(disk.dev.total)
disk.dev.await = 1000 * (rate(disk.dev.read_rawactive) + rate(disk.dev.write_rawactive)) / rate(disk.dev.total)
$ export PCP_DERIVED_CONFIG=./pcp-deriv-metrics.conf
$ pmval -t 2sec -f 3 disk.dev.avqsz
$ pmval -t 2sec -f 3 disk.dev.avrqsz -h acme.com
$ pmval -t 2sec -f 3 disk.dev.await -a acme.com/20140902

Define a derived metric on the command line and monitor it with regular metrics:

$ pmrep -t 2sec -p -b MB -e "mem.util.allcache = mem.util.bufmem + mem.util.cached + mem.util.slab" mem.util.free mem.util.allcache mem.util.used

Performance Metrics Inference

The Performance Metrics Inference Engine (pmie(1)) can evaluate rules and generate alarms, run scripts, or automate system management tasks based on live or past performance metrics.


To enable and start PMIE on Fedora/RHEL:

# chkconfig pmie on
# service pmie start

To make sure PMIE is running on Debian/Ubuntu:

$ sudo update-rc.d pmie defaults
$ sudo service pmie restart

To enable the monitoring host to run PMIE for collector hosts, add each host to the /etc/pcp/pmie/control configuration file.


Enable monitoring of metrics from remote host acme.com:

# echo acme.com n PCP_LOG_DIR/pmie/acme.com -c config.acme.com >> /etc/pcp/pmie/control

# service pmie restart

Some examples in plain English describing what could be done with PMIE:

  • If the number of received IP packets exceeds a threshold, run a script to adjust firewall rules to limit the incoming traffic
  • If 3 out of 4 consecutive samples of disk operations, taken every minute, exceed a threshold between 9 AM and 5 PM, send an email and write a system log message
  • If all hosts in a group have CPU load over a threshold for more than 10 minutes, or they have more application processes running than a threshold limit, generate an alarm and run a script to tune the application

This example shows a PMIE script, checks its syntax, and runs it against an archive, printing a simple message if more than 5 GB of memory was in use between 9 AM and 10 AM, using a one-minute sampling interval:

$ cat pmie.ex
bloated = ( mem.util.used > 5 Gbyte )
                    -> print "%v memory used on %h!"

$ pmie -C pmie.ex
$ pmie -t 1min -c pmie.ex -S @09:00 -T @10:00 -a acme.com/20140820

PCP Web Services

Performance Metrics Web Daemon

The Performance Metrics Web Daemon (pmwebd(1)) is a front-end to both PMCD and PCP archives, providing a REST web service (over HTTP/JSON) suitable for use by web-based tools wishing to access performance data over HTTP. Custom applications can access all the available PCP information using this method, including custom metrics generated by custom PMDAs.


To install the PCP web service on Fedora/RHEL:

# yum install pcp-webapi
# chkconfig pmwebd on
# service pmwebd start

To install the PCP web service on Debian/Ubuntu:

$ sudo apt-get install pcp-webapi
$ sudo update-rc.d pmwebd defaults
$ sudo service pmwebd restart
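Once pmwebd is running, its REST interface can be exercised from the command line. The sketch below builds the request URLs and extracts the context identifier from a reply. It assumes the /pmapi/context and /pmapi/<context>/_fetch endpoints described in the PMWEBAPI(3) manual page and the default port 44323. The helper function names are hypothetical, and curl is assumed for the actual requests.

```shell
#!/bin/sh
# Hypothetical helpers for the pmwebd REST interface (default port 44323).
PMWEBD_URL=http://localhost:44323

# URL that creates a context for a host (hostspec parameter, see PMWEBAPI(3)).
context_url() {
    printf '%s/pmapi/context?hostspec=%s&polltimeout=60\n' "$PMWEBD_URL" "$1"
}

# URL that fetches current values for comma-separated metric names.
fetch_url() {
    printf '%s/pmapi/%s/_fetch?names=%s\n' "$PMWEBD_URL" "$1" "$2"
}

# Extract the numeric context id from a reply such as {"context": 1234}.
context_id() {
    sed -n 's/.*"context"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

# Example use (not run here; needs curl and a running pmwebd):
#   ctx=$(curl -s "$(context_url localhost)" | context_id)
#   curl -s "$(fetch_url "$ctx" kernel.all.load,mem.util.used)"
```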

User Web Interface for Performance Metrics

Several browser interfaces for accessing PCP performance metrics are also available. These web interfaces make PCP metrics available via your choice of Grafana or Graphite.

After installing the PCP web services daemon as described above, install the pcp-webjs package and then point a browser to http://localhost:44323.

Customizing and Extending PCP

PCP PMDAs offer a way for administrators and developers to customize and extend the default PCP installation. The pcp-libs-devel package contains all the needed development-related examples, headers, and libraries. New PMDAs can easily be added; below is a quick list of references for starting development:

  • Some examples exist below /var/lib/pcp/pmdas/; the simple, sample, and txmon PMDAs are easy to read.
    • The simple PMDA provides implementations in C, Perl and Python.
  • A simple command line monitor tool is /usr/share/pcp/demos/pmclient (C language).
  • Good initial Python monitor examples are /usr/libexec/pcp/bin/pcp-* (Fedora/RHEL) or /usr/lib/pcp/bin/pcp-* (Debian/Ubuntu).
    • Slightly more complex examples are the pcp-free, pmiostat, pmcollectl commands.
  • The applications in the pcp-webjs source tree are helpful when developing new web applications.

Additional Information