Site rules with pmieconf


Tools
              pmie
              pmval
              pmchart
              pmieconf
              pmdashping

This tutorial covers customization of pmie rules using pmieconf.  For an explanation of Performance Co-Pilot terms and acronyms, consult the PCP glossary.

It is advisable to first read the comprehensive introductory pmie tutorial before tackling this one.


Initial Setup

In this exercise we create a scenario which exhibits the sort of behaviour that might be of concern in a production environment.  We'll then use several PCP tools to detect, identify and understand the problem.


Simulate an "interesting" problem scenario:

              $ while true; do sleep 0; done &
              

Have a look at some of the effects it's having on the system:

              $ pmchart -t 0.5sec -c CPU &
              

Create a new chart showing the process context switch rate (kernel.all.pswitch), adding it to your existing display.

Important: the above test case can be quite intrusive on low processor count machines, so remember to terminate it when you've finished this tutorial, for example by bringing the loop to the foreground and interrupting it with Ctrl-C:

              $ jobs
              [1]- Running     while true; do sleep 0; done &
              [2]+ Running     pmchart -t 0.5sec -c CPU &
              $ fg %1
              

However, you should leave it running throughout all of the tests below.


Using pmieconf and pmie


Create your own pmie rules using pmieconf:

              $ pmieconf -f myrules
              pmieconf> disable all
              pmieconf> enable cpu.context_switch
              pmieconf> modify global delta "5 sec"
              pmieconf> modify global holdoff ""
              pmieconf> modify global syslog_action no
              pmieconf> modify global user_action yes
              pmieconf> quit
              
Determine what this command sequence has done by:
  • Inspecting the created file myrules
  • Making reference to the pmieconf man page
  • Exploring other pmieconf commands ("help" and "list" are useful in this context)
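
The inspection steps above can also be done without entering the interactive prompt. The sketch below assumes pmieconf accepts a single command on its command line, as the synopsis in its man page suggests; show_rule and the PMIECONF variable are hypothetical names introduced here for illustration.

```shell
# Hypothetical wrapper; PMIECONF can be overridden to point at another binary.
PMIECONF=${PMIECONF:-pmieconf}

# show_rule FILE RULE_OR_GROUP
# Ask pmieconf to list one rule (or group) as recorded in FILE.
# Assumption: "list" behaves the same on the command line as at the prompt.
show_rule() {
    "$PMIECONF" -f "$1" list "$2"
}
```

With the function defined, `show_rule myrules cpu.context_switch` should show the rule's current settings and `show_rule myrules global` the variables shared by all rules. Running `grep -n context_switch myrules` is a quick way to find the generated pmie expression in the file itself.
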
Run pmie with the rules generated by pmieconf, and check whether the alarm messages appear on standard output:

              $ pmie -c myrules
              
Terminate pmie and use the values reported by pmchart to determine the average context switch rate.  Then re-run pmieconf to adjust the threshold up or down (the 5000 shown below is only an example value) and so alter the behaviour of pmie. Re-run pmie.

              $ pmieconf -f myrules
              pmieconf> modify cpu.context_switch threshold 5000
              pmieconf> quit
              $ pmie -c myrules
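
If reading an average off the chart is awkward, the same figure can be estimated from pmval output. The helper below is a minimal sketch, not part of PCP: mean_of_samples is a hypothetical name, and it assumes pmval prints each sample of a single-valued metric alone on its own line, while header lines such as "metric:" and "samples:" carry more than one field.

```shell
# mean_of_samples: read pmval-style output on stdin and print the mean of
# the sampled values, rounded to the nearest whole number.
# Only lines holding exactly one numeric field are counted, so header
# lines and blank lines are skipped.
mean_of_samples() {
    awk '
        NF == 1 && $1 ~ /^[0-9]+(\.[0-9]*)?([eE][+-]?[0-9]+)?$/ {
            sum += $1
            n++
        }
        END {
            if (n > 0) printf "%.0f\n", sum / n
        }
    '
}
```

A possible use is `pmval -t 1sec -s 20 kernel.all.pswitch | mean_of_samples`, which samples the metric twenty times at one-second intervals. A threshold somewhat above or below the printed mean should make the rule stop or start firing.
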
              


Monitoring state with the shping PMDA

Installing pmdashping to record system state

The default shping configuration is $PCP_PMDAS_DIR/shping/sample.conf.  The comments in that file explain the syntax.

Create a new configuration file, say $PCP_PMDAS_DIR/shping/my.conf, containing a single line with a shell tag and command of the form:


                  no-pmie    test ! -f /tmp/no-pmie
              

Install pmdashping (as root):

              # cd $PCP_PMDAS_DIR/shping
              # ./Install
              
Accept the defaults, except for specifying your own configuration file (my.conf) and setting the cycle time to 5 (seconds). Don't worry about the timeout period, as timeouts are not going to happen in this configuration of the agent.

Monitoring pmdashping to observe system state

In one window, use pmval to monitor shping.status:

              $ pmval -t 5 shping.status

In another window, first create the file /tmp/no-pmie, wait ten seconds, and then remove the file. Observe what pmval reports in the other window. Terminate pmval.
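
To see why the metric changes, it helps to run the configured command by hand. The sketch below repeats the check from my.conf against a scratch file and prints the command's exit status at each step; probe and the scratch path are hypothetical, chosen so the demonstration does not disturb /tmp/no-pmie itself.

```shell
# probe PATH: run the same check that my.conf hands to shping and print
# the exit status of that check.
probe() {
    test ! -f "$1"
    echo $?
}

# Scratch file for the demonstration only (hypothetical name).
flag="${TMPDIR:-/tmp}/no-pmie-demo.$$"

probe "$flag"     # file absent: the check succeeds, prints 0
touch "$flag"
probe "$flag"     # file present: the check fails, prints 1
rm -f "$flag"
probe "$flag"     # file removed again: prints 0
```

The check succeeds while the file is absent and fails while it exists, which is the change that shping samples once per cycle and that pmval then displays.
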


Custom site rules with pmieconf

Using your editor of choice, edit the pmieconf output file created earlier, i.e. myrules. Append a new rule at the end (after the END GENERATED SECTION line) that is a copy of the cpu.context_switch rule.

In the new rule, change the message in the action so that it differs from the standard rule's, and make sure the threshold is low enough for the predicate to be true. Then add the following conjunct before the action line (the line containing ->) and save the file:

    && shping.status #'no-pmie' == 0
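
As a rough picture of the result, the sketch below appends a complete rule of this shape to a rules file. It is an illustration under assumptions, not the text pmieconf generates: the predicate is a simplified stand-in for the copied cpu.context_switch rule, the message is invented, and append_site_rule is a hypothetical helper. In the exercise, copy the real predicate from your own myrules.

```shell
# append_site_rule FILE THRESHOLD
# Append a hand-written pmie rule to FILE. The predicate is a simplified
# stand-in for the generated cpu.context_switch rule (an assumption);
# the shping conjunct is the one given in the tutorial text.
append_site_rule() {
    rules_file=$1
    threshold=$2    # context switches per second; choose a low value
    cat >> "$rules_file" <<EOF

// site rule: context switch alarm, gated by the shping 'no-pmie' check
some_host (
    kernel.all.pswitch > $threshold count/sec
)
&& shping.status #'no-pmie' == 0
-> print "site rule: high context switch rate";
EOF
}
```

Calling `append_site_rule myrules 10` after the END GENERATED SECTION line already exists places the rule in the hand-edited part of the file. Because the conjunct requires shping.status to be 0, the alarm should be raised only while /tmp/no-pmie is absent.
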

Re-run pmieconf to disable the standard rule:

              $ pmieconf -f myrules
              pmieconf> disable cpu.context_switch
              pmieconf> quit
              
Inspect the re-created file myrules. Check that your new rule is still there and that the standard rule has been removed.

Run pmie using myrules, and verify that your new alarm messages appear on standard output. In another window, create the file /tmp/no-pmie, wait a while, then remove the file.

Notice there may be some delay between the creation or removal of /tmp/no-pmie and the change in pmie behaviour.  Can you explain this?