Cloud Monitoring resources for Heat

Brief summary

The Rackspace Cloud Monitoring resources allow you to configure monitoring on resources that you create with Heat.

In this tutorial, you create a web server that is monitored with a web site check and a CPU check.

Pre-reading

The following introductory material should give you enough background to proceed with this tutorial.

Example template

Start by adding the top-level template sections:

heat_template_version: 2013-05-23

description: |
  Test template using Cloud Monitoring

resources:

Resources section

Add an OS::Nova::Server resource and configure it to install the Cloud Monitoring agent:

server:
  type: OS::Nova::Server
  properties:
    image: Ubuntu 14.04 LTS (Trusty Tahr) (PVHVM)
    flavor: 2 GB Performance
    name: { get_param: "OS::stack_name" }
    user_data_format: RAW
    config_drive: true
    user_data:
      str_replace:
        template: |
          #!/bin/bash
          echo "deb http://stable.packages.cloudmonitoring.rackspace.com/ubuntu-14.04-x86_64 cloudmonitoring main" > /etc/apt/sources.list.d/rackspace-monitoring-agent.list
          curl https://monitoring.api.rackspacecloud.com/pki/agent/linux.asc | sudo apt-key add -
          apt-get -y update
          apt-get -y install rackspace-monitoring-agent apache2
          echo "monitoring_token {{token}}" > /etc/rackspace-monitoring-agent.cfg
          service rackspace-monitoring-agent restart
        params:
          "{{token}}": { get_resource: token }
    metadata:
      rax-heat: { get_param: "OS::stack_id" }
      stack-name: { get_param: "OS::stack_name" }

It is possible to monitor one or more servers by creating a Rackspace::CloudMonitoring::Entity resource 1. Entities are automatically created for cloud servers, so we will refer to the server resource above as our Cloud Monitoring entity.

Add a Rackspace::CloudMonitoring::AgentToken resource that will create a token used by the monitoring agent to authenticate with the monitoring service:

token:
  type: Rackspace::CloudMonitoring::AgentToken
  properties:
    label: { get_param: "OS::stack_name" }

Add a Rackspace::CloudMonitoring::Check resource and configure it to check that the web service on the server entity is responsive:

webcheck:
  type: Rackspace::CloudMonitoring::Check
  properties:
    entity: { get_resource: server }
    type: remote.http
    details:
      url:
        str_replace:
          template: http://server_ip/
          params:
            server_ip: { get_attr: [ server, accessIPv4 ] }
    label: webcheck
    metadata:
      rax-heat: { get_param: "OS::stack_id" }
      stack-name: { get_param: "OS::stack_name" }
    period: 120
    timeout: 10
    monitoring_zones_poll:
    - Northern Virginia (IAD)
    - Chicago (ORD)
    target_hostname: { get_attr: [ server, accessIPv4 ] }
    target_receiver: IPv4

Add another Rackspace::CloudMonitoring::Check resource and configure it to check the server’s CPU resources via the monitoring agent:

cpucheck:
  type: Rackspace::CloudMonitoring::Check
  properties:
    entity: { get_resource: server }
    type: agent.cpu
    label: cpu_check
    details: {}
    metadata:
      rax-heat: { get_param: "OS::stack_id" }
      stack-name: { get_param: "OS::stack_name" }
    period: 30
    timeout: 10

The actual alarm criteria for the CPU check will be defined in the Rackspace::CloudMonitoring::Alarm resource below.

Add a Rackspace::CloudMonitoring::Notification resource that will send an email to admin@example.com whenever it is triggered:

email_notification_1:
  type: Rackspace::CloudMonitoring::Notification
  properties:
    label: email_ops_team
    type: email
    details:
      address: "admin@example.com"

Add a similar Rackspace::CloudMonitoring::Notification resource that will send an email to allclear@example.com whenever it is triggered:

email_notification_2:
  type: Rackspace::CloudMonitoring::Notification
  properties:
    label: email_ops_team_2
    type: email
    details:
      address: "allclear@example.com"

Add a Rackspace::CloudMonitoring::NotificationPlan resource to configure Cloud Monitoring to trigger the email_notification1 notification whenever an alarm enters the WARNING or CRITICAL state and email_notification2 whenever an alarm enters the OK state:

notify_ops_team:
  type: Rackspace::CloudMonitoring::NotificationPlan
  properties:
    label: { get_param: "OS::stack_name" }
    warning_state:
    - { get_resource: email_notification_1 }
    critical_state:
    - { get_resource: email_notification_1 }
    ok_state:
    - { get_resource: email_notification_2 }

Finally, add a Rackspace::CloudMonitoring::Alarm resource that will configure the agent to enter the WARNING state when CPU usage is over 85% for 5 seconds, the CRITICAL state when CPU usage is over 95% for 5 seconds, and the OK state otherwise:

alert_ops:
  type: Rackspace::CloudMonitoring::Alarm
  properties:
    label: test_cpu_alarm
    check: { get_resource: cpucheck }
    plan: { get_resource: notify_ops_team }
    criteria: |
      :set consecutiveCount=5
      if (metric['usage_average'] > 95) {
          return new AlarmStatus(CRITICAL, 'CPU usage is #{usage_average}%');
      }
      if (metric['usage_average'] > 85) {
          return new AlarmStatus(WARNING, 'CPU usage is #{usage_average}%');
      }
      return new AlarmStatus(OK);
    metadata:
      rax-heat: { get_param: "OS::stack_id" }
      stack-name: { get_param: "OS::stack_name" }

Full template

The following template is a combination of all of the snippets above. It will create a web server that is monitored with a web site check and a CPU check.

heat_template_version: 2013-05-23

description: |
  Test template using Cloud Monitoring

resources:

  server:
    type: OS::Nova::Server
    properties:
      image: Ubuntu 14.04 LTS (Trusty Tahr) (PVHVM)
      flavor: 2 GB Performance
      name: { get_param: "OS::stack_name" }
      user_data_format: RAW
      config_drive: true
      user_data:
        str_replace:
          template: |
            #!/bin/bash
            echo "deb http://stable.packages.cloudmonitoring.rackspace.com/ubuntu-14.04-x86_64 cloudmonitoring main" > /etc/apt/sources.list.d/rackspace-monitoring-agent.list
            curl https://monitoring.api.rackspacecloud.com/pki/agent/linux.asc | sudo apt-key add -
            apt-get -y update
            apt-get -y install rackspace-monitoring-agent apache2
            echo "monitoring_token {{token}}" > /etc/rackspace-monitoring-agent.cfg
            service rackspace-monitoring-agent restart
          params:
            "{{token}}": { get_resource: token }
      metadata:
        rax-heat: { get_param: "OS::stack_id" }
        stack-name: { get_param: "OS::stack_name" }

  token:
    type: Rackspace::CloudMonitoring::AgentToken
    properties:
      label: { get_param: "OS::stack_name" }

  webcheck:
    type: Rackspace::CloudMonitoring::Check
    properties:
      entity: { get_resource: server }
      type: remote.http
      details:
        url:
          str_replace:
            template: http://server_ip/
            params:
              server_ip: { get_attr: [ server, accessIPv4 ] }
      label: webcheck
      metadata:
        rax-heat: { get_param: "OS::stack_id" }
        stack-name: { get_param: "OS::stack_name" }
      period: 120
      timeout: 10
      monitoring_zones_poll:
      - Northern Virginia (IAD)
      - Chicago (ORD)
      target_hostname: { get_attr: [ server, accessIPv4 ] }
      target_receiver: IPv4

  cpucheck:
    type: Rackspace::CloudMonitoring::Check
    properties:
      entity: { get_resource: server }
      type: agent.cpu
      label: cpu_check
      details: {}
      metadata:
        rax-heat: { get_param: "OS::stack_id" }
        stack-name: { get_param: "OS::stack_name" }
      period: 30
      timeout: 10

  email_notification_1:
    type: Rackspace::CloudMonitoring::Notification
    properties:
      label: email_ops_team
      type: email
      details:
        address: "admin@example.com"

  email_notification_2:
    type: Rackspace::CloudMonitoring::Notification
    properties:
      label: email_ops_team_2
      type: email
      details:
        address: "allclear@example.com"

  notify_ops_team:
    type: Rackspace::CloudMonitoring::NotificationPlan
    properties:
      label: { get_param: "OS::stack_name" }
      warning_state:
      - { get_resource: email_notification_1 }
      critical_state:
      - { get_resource: email_notification_1 }
      ok_state:
      - { get_resource: email_notification_2 }

  alert_ops:
    type: Rackspace::CloudMonitoring::Alarm
    properties:
      label: test_cpu_alarm
      check: { get_resource: cpucheck }
      plan: { get_resource: notify_ops_team }
      criteria: |
        :set consecutiveCount=5
        if (metric['usage_average'] > 95) {
            return new AlarmStatus(CRITICAL, 'CPU usage is #{usage_average}%');
        }
        if (metric['usage_average'] > 85) {
            return new AlarmStatus(WARNING, 'CPU usage is #{usage_average}%');
        }
        return new AlarmStatus(OK);
      metadata:
        rax-heat: { get_param: "OS::stack_id" }
        stack-name: { get_param: "OS::stack_name" }

Reference documentation

1

The following is an example of a Rackspace::CloudMonitoring::Entity resource definition:

entity:
  type: Rackspace::CloudMonitoring::Entity
  properties:
    label: { get_param: "OS::stack_name" }
    metadata:
      rax-heat: { get_param: "OS::stack_id" }
      stack-name: { get_param: "OS::stack_name" }
    ip_addresses:
      web_server: { get_attr: [ server, accessIPv4 ] }