Extending the Grafana integration

This project involves advancing the performance analysis capabilities provided by the PCP Grafana integration in grafana-pcp.

The student will work on:

  1. developing a 'reactive' dashboard driven by PCP metrics, using the scripted dashboards feature of Grafana
  2. implementing support for PCP 'derived metrics' in the Vector data source, as described in the PCP REST API documentation
  3. improving the 'metric search' process by creating a new page for live, full-text search on PCP metric names, labels and descriptions

Based on performance metrics from PCP and a predefined rule set, the 'reactive' dashboard should display only relevant metrics to the user. For example, if the CPU utilization is high, it should show an overview of CPU statistics and processes contributing load. The user should then be able to drill down into more specific areas of interest - e.g. present graphs about lock contention.

For each recognized performance issue, the user should be presented with a link explaining how to resolve it. In this way the dashboards guide users toward the possible root causes of performance problems.

As a starting point, an initial checklist schema shows a tree structure, rules, help text and links to further information.
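
The rule-driven behaviour described above can be sketched as a small tree of checklist nodes, each carrying a rule, help text and a link. This is only an illustration of the idea, not the checklist schema itself or grafana-pcp code: the class name, the metric keys (cpu.util, cpu.sys), the thresholds and the example.com links are all hypothetical, and a real implementation would be written in TypeScript inside Grafana.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Metrics = Dict[str, float]  # hypothetical: metric name -> latest normalized value


@dataclass
class ChecklistNode:
    """One entry in the checklist tree (all field names are illustrative)."""
    name: str
    rule: Callable[[Metrics], bool]   # returns True when this area needs attention
    help_text: str
    link: str
    children: List["ChecklistNode"] = field(default_factory=list)


def relevant_nodes(node: ChecklistNode, metrics: Metrics) -> List[str]:
    """Return the names of nodes whose rules fire.

    Children are only evaluated when their parent fires, which mirrors the
    drill-down from an overview into more specific areas.
    """
    if not node.rule(metrics):
        return []
    found = [node.name]
    for child in node.children:
        found.extend(relevant_nodes(child, metrics))
    return found


# Hypothetical rule set: thresholds and metric keys are assumptions.
lock_contention = ChecklistNode(
    name="Lock contention",
    rule=lambda m: m.get("cpu.sys", 0.0) > 0.30,
    help_text="A large share of CPU time is spent in the kernel.",
    link="https://example.com/help/lock-contention",
)
cpu = ChecklistNode(
    name="CPU",
    rule=lambda m: m.get("cpu.util", 0.0) > 0.85,
    help_text="CPU utilization is high.",
    link="https://example.com/help/cpu",
    children=[lock_contention],
)
```

With these example rules, a sample where both values are high selects the CPU overview and the lock contention panel, while a sample with low overall utilization selects nothing, regardless of the child rule.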

Expected results: The student will extend their TypeScript and React programming skills, and will gain insight into the semantics of the various forms of performance data available from systems and applications, as well as the visualization techniques appropriate to their analysis. They will also learn a great deal about the inner workings of Grafana, a popular open-source visualization tool.

Prerequisite knowledge: TypeScript, JavaScript and React programming, operating systems.

Skill level: Intermediate

Primary mentor: Andreas Gerstmayr <agerstmayr@redhat.com>, secondary mentor: Jason Koch <jkoch@netflix.com>

Interested students so far: 3


Improving the pbench integration

This project involves advancing the performance analysis capabilities provided by the PCP integration within the pbench benchmarking and analysis framework.

The student will work on:

  1. Modifying pbench to leverage pmlogger's remote collection capabilities
  2. Leveraging PCP's archive compression feature for efficient storage
  3. Enhancing pbench with the ability to enable live display of collected metrics via Grafana
  4. Modifying PCP to provide new pmcd agent metrics, and pmlogger and pmlogconf templates, tailored to recommended performance analysis data collection for target workloads (database, web server, computation, low-latency networking, etc.)

Expected results: The student will extend their Python and Bash programming skills, learn about extensible system benchmarking with pbench, learn how to export data to Redis for the grafana-pcp data source, work with the PCP pmlogger tool to efficiently collect data, and discover the kinds of operating system and application metrics PCP is capable of collecting to match various workloads.

Prerequisite knowledge: Python and Bash programming, operating systems.

Skill level: Intermediate

Primary mentor: Peter Portant <pportant@redhat.com>, secondary mentor: Nick Dokos <ndokos@redhat.com>

Interested students so far: 4


Timeseries query language extension

Performance Co-Pilot timeseries are series of time-stamped values gathered centrally from hosts making performance data available. This data can be gathered for many metrics, at high frequency, and from many hosts. It is potentially high-volume data, and querying it efficiently is a non-trivial problem.

The Performance Co-Pilot timeseries query language is designed to allow fast querying based on metric names and labels. A command line utility and a REST API are available from pmseries and the pmproxy daemon.

The following is a simple example query which extracts the five most recently sampled values for aggregate disk read throughput (disk.all.read metric) for two hosts:

disk.all.read { hostname: "app1.acme.com" || hostname: "app2.acme.com" } [count: 5]
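
As a rough sketch of how such a query might be assembled and sent to pmproxy, the helpers below build the expression shown above and a request URL for it. The helper names are hypothetical, and the /series/query path, the expr parameter and port 44322 are assumptions about a default pmproxy setup that should be checked against the PCP REST API documentation. No network request is made.

```python
from typing import List
from urllib.parse import urlencode


def build_expr(metric: str, hostnames: List[str], count: int) -> str:
    """Build a query like the example above: metric, hostname filter, sample count."""
    host_filter = " || ".join(f'hostname: "{h}"' for h in hostnames)
    return f"{metric} {{ {host_filter} }} [count: {count}]"


def series_query_url(expr: str, host: str = "localhost", port: int = 44322) -> str:
    """Return a URL for the (assumed) pmproxy series query endpoint.

    urlencode percent-encodes braces, quotes and other reserved characters
    so the expression survives as a single query parameter.
    """
    return f"http://{host}:{port}/series/query?" + urlencode({"expr": expr})


expr = build_expr("disk.all.read", ["app1.acme.com", "app2.acme.com"], 5)
url = series_query_url(expr)
```

The same expression string could equally be passed as an argument to the pmseries command line utility.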

Internally, the PCP query language makes use of the Redis distributed data store and its native timeseries features. The pmseries command line utility provides low-level access to the language.

This project will extend the existing query language with:

  1. statistical functions (sum, mean, average, standard deviation, histogram binning, top-N, N-th percentile)
  2. rate conversion function for counter metrics
  3. scale and unit conversion functions
  4. mathematical functions (abs, floor, log, sqrt, round)
  5. binary operators for numeric metrics (addition, subtraction, division, multiplication, exponentiation)
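
To make the intended semantics of a few of these functions concrete, here is a minimal sketch of rate conversion, N-th percentile and top-N over plain lists of samples. It is not the pmseries implementation (which is written in C); the function names, the nearest-rank percentile definition and the choice to skip counter resets are assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, counter value)


def rate(samples: Sequence[Sample]) -> List[Sample]:
    """Convert successive counter samples into per-second rates.

    Each output value is (v1 - v0) / (t1 - t0), stamped with the later time.
    Pairs with non-increasing time or a decreasing counter (reset or wrap)
    are skipped rather than guessed at.
    """
    out: List[Sample] = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        dt = t1 - t0
        if dt <= 0 or v1 < v0:
            continue
        out.append((t1, (v1 - v0) / dt))
    return out


def percentile(values: Sequence[float], p: int) -> float:
    """N-th percentile using the nearest-rank method (p is an integer in 1..100)."""
    if not values or not 0 < p <= 100:
        raise ValueError("need at least one value and 0 < p <= 100")
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


def top_n(values: Sequence[float], n: int) -> List[float]:
    """Return the n largest values, largest first."""
    return sorted(values, reverse=True)[:n]
```

For example, a counter sampled as 100, 200 and 500 at ten-second intervals yields rates of 10 and 30 per second.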

Expected results: The student will extend their C language programming skills, learn about the lex and yacc language parsing tools, performance analysis with PCP and the Redis distributed data store.

Prerequisite knowledge: C programming.

Skill level: Intermediate-Advanced

Primary mentor: Mark Goodwin <goodwinos@gmail.com>, secondary mentor: Nathan Scott <nathans@redhat.com>

Interested students so far: 3


Scaling timeseries ingest and querying

The Performance Co-Pilot approach to scalable, multi-host performance analysis builds on the Redis distributed data store and its native timeseries support. The pmseries utility and pmproxy daemon provide the tooling and APIs to support this. This project will improve scalability in these programs through:

  1. extending PCP instrumentation in pmproxy to expose latency and throughput metrics to analysis tools
  2. adding PCP functionality to make pmseries queries use Redis features for parallel query execution, with the aim of scaling PCP timeseries ingest and querying up to many thousands of nodes
  3. implementing compression of responses in pmproxy and evaluating impact on overall and individual response performance
  4. implementing and evaluating other performance improvement ideas based on profiling and analysis of the server under load
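
The evaluation in item 3 amounts to comparing response sizes (and timings) with and without compression. The sketch below shows the size side of that comparison using gzip on a synthetic JSON payload. It is an illustration only: pmproxy is written in C, the choice of gzip is an assumption, and the payload shape is invented rather than taken from the REST API.

```python
import gzip
import json


def compress_response(body: bytes, level: int = 6) -> bytes:
    """Compress a response body with gzip at the given compression level."""
    return gzip.compress(body, compresslevel=level)


def compression_ratio(body: bytes, level: int = 6) -> float:
    """Compressed size divided by original size (smaller is better)."""
    return len(compress_response(body, level)) / len(body)


# Synthetic payload: 500 similar records, loosely resembling timeseries values.
payload = json.dumps(
    [{"series": "abc123", "timestamp": 1000.0 + i, "value": str(i % 7)}
     for i in range(500)]
).encode("utf-8")

ratio = compression_ratio(payload)
```

Repetitive JSON such as this compresses well, but small individual responses may not, which is why both overall and per-response impact need measuring.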

Expected results: The student will extend their C language programming skills, learn about the Redis distributed data store (and Redis clustering in particular), gain deep familiarity with low-level Linux performance tuning tools such as perf and bpftrace, and learn to apply PCP tools to analyse complex distributed system performance problems.

Prerequisite knowledge: C programming, Linux experience.

Skill level: Intermediate-Advanced

Primary mentor: Nathan Scott <nathans@redhat.com>, secondary mentor: Mark Goodwin <goodwinos@gmail.com>

Interested students so far: 2