Performance Co-Pilot

A System Performance and Analysis Framework

pcp.io / @performancepcp

What is Performance Co-Pilot?

  • Framework for system-level performance analysis
  • Collection, monitoring, and analysis of system metrics
  • Uses a distributed architecture
  • Provides a full API (C, Python, Perl)
  • Easily extendible and flexible

Who uses PCP?

Netflix, Red Hat,

Fujitsu, IBM,

HPC sites, banks,

telcos, developers,

...

Architecture

  • pmcd - Performance Metrics Collector Daemon
  • pmdas - Performance Metrics Domain Agents
  • pmns - Performance Metrics Name Space
  • Clients
Architecture

Existing PMDAs

activemq, apache,

bonding, ds389,

gluster, postgresql,

perfevent, mysql,

postfix, named,

samba, linux,

redis, elasticsearch,

bcc, bpftrace,

...

Installation

On Fedora/RHEL based systems:

# dnf install pcp-zeroconf

On Debian/Ubuntu based systems:

# apt-get install pcp-zeroconf

Installation

# systemctl status pmcd

# pcp
Performance Co-Pilot configuration on marquez.int.rhx:

 platform: Linux marquez.int.rhx 5.13.4-0.rc6.git1.1.fc34.x86_64 
 hardware: 8 cpus, 4 disks, 1 node, 11742MB RAM
 timezone: CEST-2
 services: pmcd pmproxy
     pmcd: Version 5.3.3-1, 10 agents, 1 client
     pmda: root pmcd proc pmproxy xfs redis linux mmv openmetrics
							

Installation

Web interface needed?
# dnf install grafana-pcp redis
# systemctl enable pmproxy grafana-server redis
# systemctl start pmproxy grafana-server redis

Usage

pminfo

Displays information about performance metrics


# pminfo | head -n3
cgroup.blkio.all.io_merged.async
cgroup.blkio.all.io_merged.read
cgroup.blkio.all.io_merged.sync
							

# pminfo | wc -l
4460
							

pminfo


# pminfo -dtf mem.freemem

mem.freemem [free system memory metric from /proc/meminfo]
    Data Type: 64-bit unsigned int  InDom: PM_INDOM_NULL 0xffffff
    Semantics: instant  Units: Kbyte
    value 340832
							

# pminfo -dtTf network.interface.in.bytes

network.interface.in.bytes [network recv read bytes from /proc/net/dev per network interface]
    Data Type: 64-bit unsigned int  InDom: 60.3 0xf000003
    Semantics: counter  Units: byte
Help:
bytes column on the "Receive" side of /proc/net/dev (stats->rx_bytes counter in
rtnl_link_stats64)
    inst [3 or "lo"] value 1449301
    inst [4 or "virbr0-nic"] value 0
    inst [5 or "virbr0"] value 0
    inst [6 or "vnet0"] value 9087033220
    inst [7 or "em1"] value 9371300105
							

pmstore

PMDAs do not only pull information

PMDAs can also dynamically get data pushed to them


# pmstore hotproc.control.config 'fname = "java"'
							

pmval


# pmval mem.freemem

metric:    mem.freemem
host:      marquez.int.rhx
semantics: instantaneous value
units:     Kbyte
samples:   all
               371924
               371824
               371824
               371456
               371392
						

All metrics are autocompleted when using bash-completion

Other CLI tools

Implementing your favourite monitoring tool is simple

For example: pmiostat


# Timestamp              Device rrqm/s wrqm/s  r/s   w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await %util
Mon Apr 27 21:59:44 2015 sda       0.0    0.0  0.0   0.0   0.0   0.0     0.00     0.00   0.0     0.0     0.0   0.0
Mon Apr 27 21:59:44 2015 sdb       0.0    0.0  0.0   0.0   0.0   0.0     0.00     0.00   0.0     0.0     0.0   0.0
Mon Apr 27 21:59:44 2015 sdc       0.0    0.0  0.0   0.0   0.0   0.0     0.00     0.00   0.0     0.0     0.0   0.0
Mon Apr 27 21:59:44 2015 sdd       0.0    0.0  0.0  15.0   0.0 119.9     8.00     0.11   7.1     0.0     7.1   1.1
Mon Apr 27 21:59:45 2015 sda       0.0    0.0  0.0   0.0   0.0   0.0     0.00     0.00   0.0     0.0     0.0   0.0
Mon Apr 27 21:59:45 2015 sdb       0.0    0.0  0.0   0.0   0.0   0.0     0.00     0.00   0.0     0.0     0.0   0.0
Mon Apr 27 21:59:45 2015 sdc       0.0    0.0  0.0   0.0   0.0   0.0     0.00     0.00   0.0     0.0     0.0   0.0
Mon Apr 27 21:59:45 2015 sdd       0.0    0.0  0.0   2.0   0.0  51.8    26.00     0.01   3.0     0.0     3.0   0.3
							

pmchart

pmchart

Historical Data

Collect data over time via the pmlogger service.


# systemctl enable pmlogger
							

# systemctl start pmlogger
							

Configuration in /etc/pcp/pmlogger. Default collection interval is 1 minute.

Archives end up in /var/log/pcp/pmlogger/`hostname`.

Historical Data

Most tools can work with archive files directly via the -a <archive_file> option.


# pmval -a /var/log/pcp/pmlogger/root.acksyn.org/20150409 \
  mem.freemem -S "Apr 09 02:00" -T "Apr 09 03:00" -t20min
metric:    mem.freemem
archive:   /var/log/pcp/pmlogger/root.acksyn.org/20150409
host:      root.acksyn.org
start:     Thu Apr  9 02:00:00 2015
end:       Thu Apr  9 03:00:00 2015
semantics: instantaneous value
units:     Kbyte
samples:   4
interval:  600.00 sec
02:00:00.000               179540
02:20:00.000               173756
02:40:00.000               184852
03:00:00.000               186352
							

Historical Data

pmdiff can be used to find metrics that changed by a certain factor in one or two archives.


# /usr/bin/pmdiff -q 50 \
  /var/log/pcp/pmlogger/root.acksyn.org/20210408 \
  /var/log/pcp/pmlogger/root.acksyn.org/20210409  
Ratio Threshold: > 50.00 or < 0.020
       20210408        20210409   Ratio  Metric-Instance
      start-end       start-end
          0.001           0.107   >100   disk.partitions.blkread ["md125"]
          0.001           0.107   >100   disk.partitions.blkread ["dm-8"]
          0.001           0.087  87.00   disk.partitions.blkread ["sda3"]
          0.014           0.000    |-|   disk.partitions.blkread ["dm-3"]
          0.002           0.000    |-|   disk.partitions.read ["dm-3"]
          0.001           0.000    |-|   kernel.percpu.interrupts.PMI ["cpu1"]
          0.000           0.186    |+|   kernel.all.nusers
          0.000           0.019    |+|   disk.partitions.blkread ["sdb3"]
          0.000           0.013    |+|   disk.partitions.read ["md125"]
          0.000           0.013    |+|   disk.partitions.read ["dm-8"]
          0.000           0.004    |+|   disk.partitions.read ["sda3"]
          0.000           0.003    |+|   disk.partitions.blkread ["sdb1"]
          0.000           0.003    |+|   disk.partitions.blkread ["dm-2"]
							

Historical Data

There are many more tools that can be used with archives (beyond the live tools):

  • pmlogextract - reduce, extract, concatenate and merge archives
  • pmlogsummary - calculate averages of metrics and output summary
  • Convert from and to other metric formats: collectl2pcp, iostat2pcp, mrtg2pcp, sar2pcp, ganglia2pcp, pcp2graphite
  • pcp2pdf can be used to create a formatted PDF report out of an archive

Performance Metrics Inference Engine

  • An inference engine that can evaluate rules and generate alarms or automate system management tasks
  • Rules have actions attached to them. (e.g. generate an alarm, create a log entry, run a script)
  • Works against live PMCD or Archives

Performance Metrics Inference Engine


# systemctl enable pmie
							

# systemctl start pmie
							

If the number of IP packets received is more than 10 per second, send an email /etc/pcp/pmie/config.default:


Delta = 1sec;
network.ip.inreceives > 10 count/sec -> shell \
 "echo foo | mail -s 'nomoreyoutube' icanthazcats@acksyn.org";
							

man pmie

Containers

Use the --container option to specify if a metric is to be fetched from a container.

There is no need to install any PCP components inside the container.


# pminfo --fetch containers.name containers.state.running
containers.name
    inst [0 or "f4d3b90bea15..."] value "sharp_feynman"
    inst [1 or "d43eda0a7e7d..."] value "cranky_colden"
containers.state.running
    inst [0 or "f4d3b90bea15..."] value 1
    inst [1 or "d43eda0a7e7d..."] value 0

# pmprobe -I network.interface.up
network.interface.up 5 "p2p1" "wlp2s0" "lo" "docker0" "veth2"

# pmprobe -I --container sharp_feynman network.interface.up
network.interface.up 2 "lo" "eth0"
							

Grafana

grafana-pcp

PCP Vector

pcp-vector

PCP Redis

pcp-redis

PCP bpftrace

pcp-bpftrace

PCP bpftrace flame graphs

bpftrace-flame-graph

Questions?

pcp@groups.io
https://pcp.io
@performancepcp