SaltStack issues

  • The function "state.apply" is running as PID <pid>

This means a previous state run is still active (or stuck) on the minion. If it does not clear on its own, restart the minion: service salt-minion restart
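
Alternatively, you can inspect and kill the stuck job directly using the standard saltutil functions (the jid is the one reported in the error message):

salt-call saltutil.running
salt-call saltutil.kill_job <jid>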

  • No matching sls found for ‘init’ in env ‘base’

Add a top.sls file in the directory where your main sls file is present (the root of your file_roots).

Create the file as follows:

base:
  'web*':
    - apache

If the sls is present in a subdirectory, e.g. elasticsearch/init.sls, then write the top.sls as:

base:
  '*':
    - elasticsearch.init
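
After the top file is in place, you can verify the assignment and apply it from a matching minion:

salt-call state.show_top
salt-call state.apply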
  • How to execute saltstack-formulas
    1. create file /srv/pillar/top.sls with content:
    base:
      '*':
        - salt
    2. create file /srv/pillar/salt.sls with content:
    salt:
      master:
        worker_threads: 2
        fileserver_backend:
          - roots
          - git
        gitfs_remotes:
          - git://github.com/saltstack-formulas/epel-formula.git
          - git://github.com/saltstack-formulas/git-formula.git
          - git://github.com/saltstack-formulas/nano-formula.git
          - git://github.com/saltstack-formulas/rabbitmq-formula.git
          - git://github.com/saltstack-formulas/remi-formula.git
          - git://github.com/saltstack-formulas/vim-formula.git
          - git://github.com/saltstack-formulas/salt-formula.git
          - git://github.com/saltstack-formulas/users-formula.git
        external_auth:
          pam:
            tiger:
              - .*
              - '@runner'
              - '@wheel'
        file_roots:
          base:
            - /srv/salt
        pillar_roots:
          base:
            - /srv/pillar
        halite:
          level: 'debug'
          server: 'gevent'
          host: '0.0.0.0'
          port: '8080'
          cors: False
          tls: True
          certpath: '/etc/pki/tls/certs/localhost.crt'
          keypath: '/etc/pki/tls/certs/localhost.key'
          pempath: '/etc/pki/tls/certs/localhost.pem'
      minion:
        master: localhost
    3. before you can use saltstack-formulas you need to make one change to /etc/salt/master and add the following config:
    fileserver_backend:
      - roots
      - git
    gitfs_remotes:
      - git://github.com/saltstack-formulas/salt-formula.git
    4. restart salt-master (e.g. service salt-master restart)
    5. run salt-call state.sls salt.master (verification commands below)
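    If the git remotes do not show up, it can help to force a fileserver update and list what the master sees (standard salt runners):
    salt-run fileserver.update
    salt-run fileserver.file_list backend=git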
  • The Salt Master has cached the public key for this node

Delete the existing key on the master:

salt-key -d <minion-id>

Then restart the minion and re-accept the key on the master:

salt-key -a <minion-id>
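
Afterwards you can confirm the key state and connectivity:

salt-key -L
salt '<minion-id>' test.ping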

  • If salt-cloud gives an error like the one below:

Missing dependency: ‘netaddr’. The openstack driver requires ‘netaddr’ to be installed.

Execute the command: yum install python-netaddr

Then verify that your provider is loaded: salt-cloud --list-providers

  • Remove dead minions' keys in Salt

salt-run manage.down removekeys=True
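
To preview which minions are considered down without deleting anything, run it without the flag first:

salt-run manage.down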


Yum: Operation too slow. Less than 1000 bytes/sec transferred the last 30 seconds

  • First thing to try is the usual
    yum clean all
  • You might be running 3rd party repositories and do not have yum-plugin-priorities installed.
    This could compromise your system, so please install and configure yum-plugin-priorities.
  • You could also try the following:

yum --disableplugin=fastestmirror update

  • minrate: This sets the low speed threshold in bytes per second. If the server is sending data slower than this for at least 'timeout' seconds, yum aborts the connection. The default is '1000'.

  timeout: Number of seconds to wait for a connection before timing out. Defaults to 30 seconds. This may be too short a time for extremely overloaded sites.


You can reduce minrate and/or increase timeout. Just add/edit these parameters in the [main] section of /etc/yum.conf. For example:

[main]
...
minrate=1
timeout=300

Diamond installation on centos 7

$ yum install make rpm-build python-configobj python-setuptools
$ git clone https://github.com/python-diamond/Diamond
$ cd Diamond
$ make buildrpm
Then use the package you built like this:

$ yum localinstall --nogpgcheck dist/diamond-4.0.449-0.noarch.rpm
$ cp /etc/diamond/{diamond.conf.example,diamond.conf}
$ $EDITOR /etc/diamond/diamond.conf
# Start Diamond service via service manager.

$ service diamond start

diamond-setup -C ElasticSearchCollector
diamond-setup -C NetworkCollector

Libvirt/KVM issues

  1. failed to connect socket to ‘/var/run/libvirt/libvirt-sock-ro’ no such file or directory

Execute: egrep '(vmx|svm)' /proc/cpuinfo

2. If the above command returns output showing vmx or svm, your hardware supports VT; otherwise it does not.

If VT is supported, install the virtualization packages:

yum install qemu-kvm qemu-img virt-manager libvirt libvirt-python libvirt-client virt-install virt-viewer
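
The socket error above usually means libvirtd is not running. After installing the packages, starting and enabling the daemon should create the socket; you can then verify with virsh:

systemctl start libvirtd
systemctl enable libvirtd
virsh list --all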

Elasticsearch Queries

  1. Create an index
curl -XPUT 'localhost:9200/twitter?pretty' -H 'Content-Type: application/json' -d'
{
  "settings" : {
    "index" : {
      "number_of_shards" : 3,
      "number_of_replicas" : 2
    }
  }
}
'
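
To confirm the settings took effect, read them back with the standard _settings API:

curl -XGET 'localhost:9200/twitter/_settings?pretty'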

2. Search

curl -XGET 'localhost:9200/sw/_search?pretty' -H 'Content-Type: application/json' -d'
{
  "query": { "match_all": {} },
  "_source": ["gender", "height"]
}
'
3. Creating an index and adding documents to it

curl -XPUT 'localhost:9200/my_index?pretty' -H 'Content-Type: application/json' -d'
{
  "mappings": {
    "my_type": {
      "properties": {
        "user": {
          "type": "nested"
        }
      }
    }
  }
}
'
curl -XPUT 'localhost:9200/my_index/my_type/1?pretty' -H 'Content-Type: application/json' -d'
{
  "group" : "fans",
  "user" : [
    {
      "first" : "John",
      "last" : "Smith"
    },
    {
      "first" : "Alice",
      "last" : "White"
    }
  ]
}
'

4. Must match (nested). Note that this query returns no hits: no single nested user object has both first "Alice" and last "Smith".

curl -XGET 'localhost:9200/my_index/_search?pretty' -H 'Content-Type: application/json' -d'
{
  "query": {
    "nested": {
      "path": "user",
      "query": {
        "bool": {
          "must": [
            { "match": { "user.first": "Alice" }},
            { "match": { "user.last": "Smith" }}
          ]
        }
      }
    }
  }
}
'

5. Highlight

curl -XGET 'localhost:9200/my_index/_search?pretty' -H 'Content-Type: application/json' -d'
{
  "query": {
    "nested": {
      "path": "user",
      "query": {
        "bool": {
          "must": [
            { "match": { "user.first": "Alice" }},
            { "match": { "user.last":  "White" }}
          ]
        }
      },
      "inner_hits": {
        "highlight": {
          "fields": {
            "user.first": {}
          }
        }
      }
    }
  }
}
'

6. To get all records:
curl -XGET 'localhost:9200/_search?size=100&pretty=true'

7. Match all


curl -XGET 'localhost:9200/foo/_search?size=NO_OF_RESULTS' -H 'Content-Type: application/json' -d '
{
  "query" : {
    "match_all" : {}
  }
}'

8. This example does a match_all and returns documents 11 through 20


curl -XGET 'localhost:9200/bank/_search?pretty' -H 'Content-Type: application/json' -d'
{
  "query": { "match_all": {} },
  "from": 10,
  "size": 10
}
'

9. This example does a match_all and sorts the results by account balance in descending order and returns the top 10 (default size) documents


curl -XGET 'localhost:9200/bank/_search?pretty' -H 'Content-Type: application/json' -d'
{
  "query": { "match_all": {} },
  "sort": { "balance": { "order": "desc" } }
}
'

10. This example shows how to return two fields, account_number and balance (inside of _source), from the search


curl -XGET 'localhost:9200/bank/_search?pretty' -H 'Content-Type: application/json' -d'
{
  "query": { "match_all": {} },
  "_source": ["account_number", "balance"]
}
'

11. This example returns the account numbered 20


curl -XGET 'localhost:9200/bank/_search?pretty' -H 'Content-Type: application/json' -d'
{
  "query": { "match": { "account_number": 20 } }
}
'

12. This example returns all accounts containing the term “mill” in the address


curl -XGET 'localhost:9200/bank/_search?pretty' -H 'Content-Type: application/json' -d'
{
  "query": { "match": { "address": "mill" } }
}
'

13. This example returns all accounts containing the term “mill” or “lane” in the address


curl -XGET 'localhost:9200/bank/_search?pretty' -H 'Content-Type: application/json' -d'
{
  "query": { "match": { "address": "mill lane" } }
}
'

14. This example is a variant of match (match_phrase) that returns all accounts containing the phrase “mill lane” in the address


curl -XGET 'localhost:9200/bank/_search?pretty' -H 'Content-Type: application/json' -d'
{
  "query": { "match_phrase": { "address": "mill lane" } }
}
'

15. This example composes two match queries and returns all accounts containing “mill” and “lane” in the address


curl -XGET 'localhost:9200/bank/_search?pretty' -H 'Content-Type: application/json' -d'
{
  "query": {
    "bool": {
      "must": [
        { "match": { "address": "mill" } },
        { "match": { "address": "lane" } }
      ]
    }
  }
}
'

16. In contrast, this example composes two match queries and returns all accounts containing “mill” or “lane” in the address


curl -XGET 'localhost:9200/bank/_search?pretty' -H 'Content-Type: application/json' -d'
{
  "query": {
    "bool": {
      "should": [
        { "match": { "address": "mill" } },
        { "match": { "address": "lane" } }
      ]
    }
  }
}
'

17. This example returns all accounts of anybody who is 40 years old but doesn’t live in ID

curl -XGET 'localhost:9200/bank/_search?pretty' -H 'Content-Type: application/json' -d'
{
  "query": {
    "bool": {
      "must": [
        { "match": { "age": "40" } }
      ],
      "must_not": [
        { "match": { "state": "ID" } }
      ]
    }
  }
}
'

18. This example uses a bool query to return all accounts with balances between 20000 and 30000


curl -XGET 'localhost:9200/bank/_search?pretty' -H 'Content-Type: application/json' -d'
{
  "query": {
    "bool": {
      "must": { "match_all": {} },
      "filter": {
        "range": {
          "balance": {
            "gte": 20000,
            "lte": 30000
          }
        }
      }
    }
  }
}
'

19. To start with, this example groups all the accounts by state, and then returns the top 10 (default) states sorted by count descending (also default)

curl -XGET 'localhost:9200/bank/_search?pretty' -H 'Content-Type: application/json' -d'
{
  "size": 0,
  "aggs": {
    "group_by_state": {
      "terms": {
        "field": "state.keyword"
      }
    }
  }
}
'

20. Building on the previous aggregation, let’s now sort on the average balance in descending order

curl -XGET 'localhost:9200/bank/_search?pretty' -H 'Content-Type: application/json' -d'
{
  "size": 0,
  "aggs": {
    "group_by_state": {
      "terms": {
        "field": "state.keyword",
        "order": {
          "average_balance": "desc"
        }
      },
      "aggs": {
        "average_balance": {
          "avg": {
            "field": "balance"
          }
        }
      }
    }
  }
}
'

21. This example demonstrates how we can group by age brackets (ages 20-29, 30-39, and 40-49), then by gender, and then finally get the average account balance, per age bracket, per gender

curl -XGET 'localhost:9200/bank/_search?pretty' -H 'Content-Type: application/json' -d'
{
  "size": 0,
  "aggs": {
    "group_by_age": {
      "range": {
        "field": "age",
        "ranges": [
          {
            "from": 20,
            "to": 30
          },
          {
            "from": 30,
            "to": 40
          },
          {
            "from": 40,
            "to": 50
          }
        ]
      },
      "aggs": {
        "group_by_gender": {
          "terms": {
            "field": "gender.keyword"
          },
          "aggs": {
            "average_balance": {
              "avg": {
                "field": "balance"
              }
            }
          }
        }
      }
    }
  }
}
'

22. Assuming the data consists of documents representing exam grades (between 0 and 100) of students, we can average their scores with:

curl -XPOST 'localhost:9200/exams/_search?size=0&pretty' -H 'Content-Type: application/json' -d'
{
    "aggs" : {
        "avg_grade" : { "avg" : { "field" : "grade" } }
    }
}
'

23. Multiply each grade by 1.2, then compute the average of the corrected values:

curl -XPOST 'localhost:9200/exams/_search?size=0&pretty' -H 'Content-Type: application/json' -d'
{
    "aggs" : {
        "avg_corrected_grade" : {
            "avg" : {
                "field" : "grade",
                "script" : {
                    "lang": "painless",
                    "inline": "_value * params.correction",
                    "params" : {
                        "correction" : 1.2
                    }
                }
            }
        }
    }
}
'

24. Using the missing parameter: documents without a value in the grade field are treated as if they had the value 10:

curl -XPOST 'localhost:9200/exams/_search?size=0&pretty' -H 'Content-Type: application/json' -d'
{
    "aggs" : {
        "grade_avg" : {
            "avg" : {
                "field" : "grade",
                "missing": 10
            }
        }
    }
}
'

25. Cardinality (approximate distinct-value count) of the balance field:

curl -XPOST 'localhost:9200/bank/_search?size=0&pretty' -H 'Content-Type: application/json' -d'
{
    "aggs" : {
        "type_count" : {
            "cardinality" : {
                "field" : "balance"
            }
        }
    }
}
'

26. Using an inline Painless script to combine the type and promoted values before counting distinct combinations:

curl -XPOST 'localhost:9200/bank/_search?size=0&pretty' -H 'Content-Type: application/json' -d'
{
    "aggs" : {
        "type_promoted_count" : {
            "cardinality" : {
                "script": {
                    "lang": "painless",
                    "inline": "doc[\u0027type\u0027].value + \u0027 \u0027 + doc[\u0027promoted\u0027].value"
                }
            }
        }
    }
}
'

27. Extended stats for balance

curl -XPOST 'localhost:9200/bank/_search?size=0&pretty' -H 'Content-Type: application/json' -d'
{
    "aggs" : {
        "grades_stats" : { "extended_stats" : { "field" : "balance" } }
    }
}
'

28. Geo-point mapping with geo_bounds and geo_centroid aggregations

curl -XPUT 'localhost:9200/museums' -H 'Content-Type: application/json' -d'
{
    "mappings": {
        "doc": {
            "properties": {
                "location": {
                    "type": "geo_point"
                }
            }
        }
    }
}
'

curl -XPOST 'localhost:9200/museums/doc/_bulk?refresh' -H 'Content-Type: application/json' -d'
{"index":{"_id":1}}
{"location": "52.374081,4.912350", "name": "NEMO Science Museum"}
{"index":{"_id":2}}
{"location": "52.369219,4.901618", "name": "Museum Het Rembrandthuis"}
{"index":{"_id":3}}
{"location": "52.371667,4.914722", "name": "Nederlands Scheepvaartmuseum"}
{"index":{"_id":4}}
{"location": "51.222900,4.405200", "name": "Letterenhuis"}
{"index":{"_id":5}}
{"location": "48.861111,2.336389", "name": "Musée du Louvre"}
{"index":{"_id":6}}
{"location": "48.860000,2.327000", "name": "Musée dOrsay"}'

curl -XPOST 'localhost:9200/museums/_search?size=0' -H 'Content-Type: application/json' -d'
{
    "query" : {
        "match" : { "name" : "musée" }
    },
    "aggs" : {
        "viewport" : {
            "geo_bounds" : {
                "field" : "location",
                "wrap_longitude" : true
            }
        }
    }
}
'

curl -XPOST 'localhost:9200/museums/_search?size=0' -H 'Content-Type: application/json' -d'
{
    "aggs" : {
        "centroid" : {
            "geo_centroid" : {
                "field" : "location"
            }
        }
    }
}
'

curl -XPOST 'localhost:9200/museums/_search?size=0' -H 'Content-Type: application/json' -d'
{
    "aggs" : {
        "cities" : {
            "terms" : { "field" : "city.keyword" },
            "aggs" : {
                "centroid" : {
                    "geo_centroid" : { "field" : "location" }
                }
            }
        }
    }
}
'

29. Max balance

curl -XPOST 'localhost:9200/bank/_search?size=0&pretty' -H 'Content-Type: application/json' -d'
{
    "aggs" : {
        "max_price" : { "max" : { "field" : "balance" } }
    }
}
'

30. Min price (this one queries a sales index):

curl -XPOST 'localhost:9200/sales/_search?size=0&pretty' -H 'Content-Type: application/json' -d'
{
    "aggs" : {
        "min_price" : { "min" : { "field" : "price" } }
    }
}
'

31. Percentiles (request body only; wrap it in a curl call as in the previous examples):

{
    "aggs" : {
        "load_time_outlier" : {
            "percentiles" : {
                "field" : "load_time"
            }
        }
    }
}

32. Percentile ranks: report the percentile at which specific values (here 25000 and 50000) fall within the balance field:

curl -XPOST 'localhost:9200/bank/account/_search?size=0&pretty' -H 'Content-Type: application/json' -d'
{
    "aggs": {
        "balance_outlier": {
            "percentile_ranks": {
                "field": "balance",
                "values": [25000, 50000],
                "keyed": false
            }
        }
    }
}
'

33. Sum of hat prices (request body only, as above):

{
    "aggs" : {
        "hat_prices" : { "sum" : { "field" : "price" } }
    }
}

34. Sort by call_duration in descending order

curl -u elastic:changeme -XGET 'localhost:9200/index-alias2-events-2015.01.01-00/_search?pretty' -H 'Content-Type: application/json' -d'
{
  "query": { "match_all": {} },
  "sort": { "call_duration": { "order": "desc" } }
}
'

The Ceres Database

Ceres is a time-series database format intended to replace Whisper as the default storage format for Graphite. In contrast with Whisper, Ceres is not a fixed-size database and is designed to better support sparse data of arbitrary fixed-size resolutions. This allows Graphite to distribute individual time-series across multiple servers or mounts.

Ceres is not actively developed at the moment. For alternatives to Whisper, look at alternative storage backends.

Storage Overview

Ceres databases are composed of a single tree, contained within a single path on disk, that stores all metrics as nodes in nested directories.

A Ceres node represents a single time-series metric and is composed of at least two data files: a slice to store all data points, and an arbitrary key-value metadata file. The minimum required metadata for a node is a 'timeStep', the finest resolution that can be used for writing. A Ceres node can, however, contain and read data with other, less precise resolutions in its underlying slice data.

Other metadata keys that may be set for compatibility with Graphite are 'retentions', 'xFilesFactor', and 'aggregationMethod'.

A Ceres slice contains the actual data points in a file. The only other information a slice holds is the timestamp of its oldest data point and its resolution, both of which are encoded in its filename in the format timestamp@resolution.

Data points in Ceres are stored on-disk as a contiguous list of big-endian double-precision floats. The timestamp of a datapoint is not stored with the value, rather it is calculated by using the timestamp of the slice plus the index offset of the value multiplied by the resolution.

The timestamp is the number of seconds since the UNIX Epoch (01-01-1970). The data value is parsed by the Python float() function and as such behaves in the same way for special strings such as 'inf'. Maximum and minimum values are determined by the Python interpreter’s allowable range for float values which can be found by executing:

python -c 'import sys; print sys.float_info'

Slices: Precision and Fragmentation

Ceres databases contain one or more slices, each with a specific data resolution and a timestamp marking the beginning of the slice. Slices are ordered from the most recent timestamp to the oldest. Data resolution is not considered when reading from a slice; it only matters when writing, where a slice with the finest precision configured for the node must exist.

Gaps in data are handled in Ceres by padding slices with null datapoints. If the slice gap however is too big, then a new slice is instead created. If a Ceres node accumulates too many slices, read performance can suffer. This can be caused by intermittently reported data. To mitigate slice fragmentation there is a tolerance for how much space can be wasted within a slice file to avoid creating a new one. That tolerance level is determined by 'MAX_SLICE_GAP', which is the number of consecutive null datapoints allowed in a slice file.

If set very low, Ceres wastes less disk space but becomes prone to the performance problems caused by slice fragmentation, which can be severe.

If set very high, Ceres wastes a bit more disk space. Each null datapoint wastes 8 bytes, but keep your filesystem's block size in mind. If you suffer slice fragmentation issues, increase this value or defragment your data more often. However, do not set it extremely high: if a large but allowed gap occurs, it has to be filled in, which means that instead of a simple 8-byte write to a new file we could end up doing an (8 * MAX_SLICE_GAP)-byte write to the latest slice.

Rollup Aggregation

Expected features such as roll-up aggregation and data expiration are not provided by Ceres itself, but instead are implemented as maintenance plugins.

A rollup plugin exists for Ceres that aggregates data points in a way similar to the behavior of Whisper archives: multiple data points are collapsed and written to a lower-precision slice, and data points outside the configured slice retentions are trimmed. By default, an average function is used; alternative methods can be chosen by changing the metadata.

Retrieval Behavior

When data is retrieved (scoped by a time range), the first slice that has data within the requested interval is used. If the time period overlaps a slice boundary, both slices are read and their values joined together. Any missing data between them is filled with null data points.

There is currently no support in Ceres for handling slices with mixed resolutions in the same way that is done with Whisper archives.

Database Format

CeresSlice  Data
Data        Point+

Data types in Python's struct format:

Point  !d

Metadata for Ceres is stored in JSON format:

{"retentions": [[30, 1440]], "timeStep": 30, "xFilesFactor": 0.5, "aggregationMethod": "average"}

Graphite and your data

Getting your data into Graphite is very flexible. There are three main methods for sending data to Graphite: Plaintext, Pickle, and AMQP.

It's worth noting that data sent to Graphite is actually sent to Carbon (or Carbon-Relay), which then manages the data. The Graphite web interface reads this data back out, either from cache or straight off disk.

Choosing the right transfer method for you is dependent on how you want to build your application or script to send data:

  • There are some tools and APIs which can help you get your data into Carbon.
  • For a singular script, or for test data, the plaintext protocol is the most straightforward method.
  • For sending large amounts of data, you’ll want to batch this data up and send it to Carbon’s pickle receiver.
  • Finally, Carbon can listen to a message bus, via AMQP.

The plaintext protocol

The plaintext protocol is the most straightforward protocol supported by Carbon.

The data sent must be in the following format: <metric path> <metric value> <metric timestamp>. Carbon will then help translate this line of text into a metric that the web interface and Whisper understand.

On Unix, the nc program (netcat) can be used to create a socket and send data to Carbon (by default, ‘plaintext’ runs on port 2003):

If you use the OpenBSD implementation of netcat, please follow this example:

PORT=2003
SERVER=graphite.your.org
echo "local.random.diceroll 4 `date +%s`" | nc -q0 ${SERVER} ${PORT}

The -q0 parameter instructs nc to close socket once data is sent. Without this option, some nc versions would keep the connection open.

If you use the GNU implementation of netcat, please follow this example:

PORT=2003
SERVER=graphite.your.org
echo "local.random.diceroll 4 `date +%s`" | nc -c ${SERVER} ${PORT}

The -c parameter instructs nc to close socket once data is sent. Without this option, nc will keep the connection open and won’t end.
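
If you prefer to send from code rather than nc, here is a minimal Python sketch of the same plaintext exchange (the server name is a placeholder; substitute your own Carbon host):

import socket
import time

SERVER = "graphite.your.org"  # placeholder: your carbon host
PORT = 2003                   # default plaintext receiver port

# Format: <metric path> <metric value> <metric timestamp>\n
line = "local.random.diceroll 4 %d\n" % int(time.time())

sock = socket.create_connection((SERVER, PORT))
try:
    sock.sendall(line.encode("ascii"))
finally:
    sock.close()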

The pickle protocol

The pickle protocol is a much more efficient take on the plaintext protocol, and supports sending batches of metrics to Carbon in one go.

The general idea is that the pickled data forms a list of multi-level tuples:

[(path, (timestamp, value)), ...]

Once you've formed a list of sufficient size (don't go too big!) and pickled it (if your client is running a more recent version of Python than your server, you may need to specify the protocol), send the data over a socket to Carbon's pickle receiver (by default, port 2004). You'll need to pack your pickled data into a packet containing a simple header:

payload = pickle.dumps(listOfMetricTuples, protocol=2)
header = struct.pack("!L", len(payload))
message = header + payload

You would then send the message object through a network socket.
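
Putting it together, a minimal self-contained Python sender sketch (the host and example metric names are assumptions, not part of the protocol):

import pickle
import socket
import struct
import time

CARBON_HOST = "graphite.your.org"  # assumption: your carbon-cache/relay host
CARBON_PICKLE_PORT = 2004          # default pickle receiver port

now = int(time.time())
listOfMetricTuples = [
    ("local.random.diceroll", (now, 4)),
    ("servers.web01.cpu_load", (now, 1.7)),
]

# One length-prefixed pickle packet, as described above.
payload = pickle.dumps(listOfMetricTuples, protocol=2)
header = struct.pack("!L", len(payload))
message = header + payload

sock = socket.create_connection((CARBON_HOST, CARBON_PICKLE_PORT))
try:
    sock.sendall(message)
finally:
    sock.close()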

Using AMQP

When AMQP_METRIC_NAME_IN_BODY is set to True in your carbon.conf file, the message body should be in the same format as the plaintext protocol, e.g. "local.random.diceroll 4 `date +%s`". When AMQP_METRIC_NAME_IN_BODY is set to False, you should omit 'local.random.diceroll' from the body.
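
For reference, the relevant settings live in carbon.conf; a sketch with example values (see carbon.conf.example for the full list):

ENABLE_AMQP = True
AMQP_HOST = localhost
AMQP_PORT = 5672
AMQP_VHOST = /
AMQP_USER = guest
AMQP_PASSWORD = guest
AMQP_EXCHANGE = graphite
AMQP_METRIC_NAME_IN_BODY = True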

Getting Your Data Into Graphite

The Basic Idea

Graphite is useful if you have some numeric values that change over time and you want to graph them. Basically you write a program to collect these numeric values which then sends them to graphite’s backend, Carbon.

Step 1 – Plan a Naming Hierarchy

Everything stored in graphite has a path with components delimited by dots. So for example, website.orbitz.bookings.air or something like that would represent the number of air bookings on orbitz. Before producing your data you need to decide what your naming scheme will be. In a path such as “foo.bar.baz”, each thing surrounded by dots is called a path component. So “foo” is a path component, as well as “bar”, etc.

Each path component should have a clear and well-defined purpose. Volatile path components should be kept as deep into the hierarchy as possible.

Step 2 – Configure your Data Retention

Graphite is built on fixed-size databases (see Whisper), so we have to configure in advance how much data we intend to store and at what level of precision. For instance, you could store your data with 1-minute precision (meaning you will have one data point for each minute) for, say, 2 hours. Additionally, you could store your data with 10-minute precision for 2 weeks, etc. The idea is that the storage cost is determined by the number of data points you want to store; the less fine your precision, the more time you can cover with fewer points. To determine the best retention configuration, you must answer all of the following questions.

  1. How often can you produce your data?
  2. What is the finest precision you will require?
  3. How far back will you need to look at that level of precision?
  4. What is the coarsest precision you can use?
  5. How far back would you ever need to see data? (yes it has to be finite, and determined ahead of time)

Once you have picked your naming scheme and answered all of the retention questions, you need to create a schema by creating/editing the /opt/graphite/conf/storage-schemas.conf file.

The format of the schemas file is easiest to demonstrate with an example. Let's say we've written a script to collect system load data from various servers; the naming scheme will be like so:

servers.HOSTNAME.METRIC

Where HOSTNAME will be the server’s hostname and METRIC will be something like cpu_load, mem_usage, open_files, etc. Also let’s say we want to store this data with minutely precision for 30 days, then at 15 minute precision for 10 years.
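
A sketch of the corresponding storage-schemas.conf entry (the section name is arbitrary; the pattern and retentions follow the standard schema syntax):

[server_metrics]
pattern = ^servers\.
retentions = 1m:30d,15m:10y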

For details of implementing your schema, see the Configuring Carbon document.

Basically, when carbon receives a metric, it determines where on the filesystem the whisper data file should be for that metric. If the data file does not exist, carbon knows it has to create it, but since whisper is a fixed size database, some parameters must be determined at the time of file creation (this is the reason we're making a schema). Carbon looks at the schemas file, and in order of priority (highest to lowest) looks for the first schema whose pattern matches the metric name. If no schema matches, the default schema (2 hours of minutely data) is used. Once the appropriate schema is determined, carbon uses the retention configuration for the schema to create the whisper data file appropriately.

Step 3 – Understanding the Graphite Message Format

Graphite understands messages with this format:

metric_path value timestamp\n

metric_path is the metric namespace that you want to populate.

value is the value that you want to assign to the metric at this time.

timestamp is the number of seconds since unix epoch time.

A simple example of doing this from the unix terminal would look like this:

echo "test.bash.stats 42 `date +%s`" | nc graphite.example.com 2003

There are many tools that interact with Graphite. See the Tools page for some choices of tools that may be used to feed Graphite.

What is Graphite?

What Graphite is and is not

Graphite does two things:

  1. Store numeric time-series data
  2. Render graphs of this data on demand

What Graphite does not do is collect data for you; however, there are some tools out there that know how to send data to Graphite. Even though it often requires a little code, sending data to Graphite is very simple.

About the project

Graphite is an enterprise-scale monitoring tool that runs well on cheap hardware. It was originally designed and written by Chris Davis at Orbitz in 2006 as a side project that ultimately grew to be a foundational monitoring tool. In 2008, Orbitz allowed Graphite to be released under the open source Apache 2.0 license. Since then Chris has continued to work on Graphite and has deployed it at other companies including Sears, where it serves as a pillar of the e-commerce monitoring system. Today many large companies use it.

The architecture in a nutshell

Graphite consists of 3 software components:

  1. carbon – a Twisted daemon that listens for time-series data
  2. whisper – a simple database library for storing time-series data (similar in design to RRD)
  3. graphite webapp – A Django webapp that renders graphs on-demand using Cairo

Feeding in your data is pretty easy; typically most of the effort is in collecting the data to begin with. As you send datapoints to Carbon, they become immediately available for graphing in the webapp. The webapp offers several ways to create and display graphs, including a simple URL API for rendering that makes it easy to embed graphs in other webpages.


What is Graphite?

Graphite is a highly scalable real-time graphing system. As a user, you write an application that collects numeric time-series data that you are interested in graphing, and send it to Graphite’s processing backend, carbon, which stores the data in Graphite’s specialized database. The data can then be visualized through graphite’s web interfaces.

The Carbon Daemons

When we talk about “Carbon” we mean one or more of various daemons that make up the storage backend of a Graphite installation. In simple installations, there is typically only one daemon, carbon-cache.py. As an installation grows, the carbon-relay.py and carbon-aggregator.py daemons can be introduced to distribute metrics load and perform custom aggregations, respectively.

All of the carbon daemons listen for time-series data and can accept it over a common set of protocols. However, they differ in what they do with the data once they receive it. This document gives a brief overview of what each daemon does and how you can use them to build a more sophisticated storage backend.

carbon-cache.py

carbon-cache.py accepts metrics over various protocols and writes them to disk as efficiently as possible. This requires caching metric values in RAM as they are received, and flushing them to disk on an interval using the underlying whisper library. It also provides a query service for in-memory metric datapoints, used by the Graphite webapp to retrieve “hot data”.

carbon-cache.py requires some basic configuration files to run:

carbon.conf
The [cache] section tells carbon-cache.py what ports (2003/2004/7002), protocols (newline delimited, pickle) and transports (TCP/UDP) to listen on.
storage-schemas.conf
Defines a retention policy for incoming metrics based on regex patterns. This policy is passed to whisper when the .wsp file is pre-allocated, and dictates how long data is stored for.
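
A sketch of the corresponding [cache] settings in carbon.conf (these reflect the default ports listed above):

[cache]
LINE_RECEIVER_INTERFACE = 0.0.0.0
LINE_RECEIVER_PORT = 2003
PICKLE_RECEIVER_PORT = 2004
CACHE_QUERY_PORT = 7002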

As the number of incoming metrics increases, one carbon-cache.py instance may not be enough to handle the I/O load. To scale out, simply run multiple carbon-cache.py instances (on one or more machines) behind a carbon-aggregator.py or carbon-relay.py.

carbon-relay.py

carbon-relay.py serves two distinct purposes: replication and sharding.

When running with RELAY_METHOD = rules, a carbon-relay.py instance can run in place of a carbon-cache.py server and relay all incoming metrics to multiple backend carbon-cache.py instances running on different ports or hosts.

In RELAY_METHOD = consistent-hashing mode, a DESTINATIONS setting defines a sharding strategy across multiple carbon-cache.py backends. The same consistent hashing list can be provided to the graphite webapp via CARBONLINK_HOSTS to spread reads across the multiple backends.

carbon-relay.py is configured via:

carbon.conf
The [relay] section defines listener host/ports and a RELAY_METHOD
relay-rules.conf
With RELAY_METHOD = rules set, pattern/servers tuples in this file define which metrics matching certain regex rules are forwarded to which hosts.
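
A sketch of relay-rules.conf (the hosts are examples; one section is typically marked default = true to catch unmatched metrics):

[example]
pattern = ^mydata\.foo\..+
destinations = 10.1.2.3:2004, 10.1.2.4:2004

[default]
default = true
destinations = 127.0.0.1:2004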

carbon-aggregator.py

carbon-aggregator.py can be run in front of carbon-cache.py to buffer metrics over time before reporting them into whisper. This is useful when granular reporting is not required, and can help reduce I/O load and whisper file sizes due to lower retention policies.

carbon-aggregator.py is configured via:

carbon.conf
The [aggregator] section defines listener and destination host/ports.
aggregation-rules.conf
Defines a time interval (in seconds) and aggregation function (sum or average) for incoming metrics matching a certain pattern. At the end of each interval, the values received are aggregated and published to carbon-cache.py as a single metric.
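
A rule takes the form output_template (frequency) = method input_pattern. For example, summing per-host request counters into a single metric every 60 seconds:

<env>.applications.<app>.all.requests (60) = sum <env>.applications.<app>.*.requests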

carbon-aggregator-cache.py

carbon-aggregator-cache.py combines both carbon-aggregator.py and carbon-cache.py. This is useful to reduce the resource and administration overhead of running both daemons.

carbon-aggregator-cache.py is configured via:

carbon.conf
The [aggregator-cache] section defines listener and destination host/ports.
relay-rules.conf
See carbon-relay.py section.
aggregation-rules.conf
See carbon-aggregator.py section.

The Whisper Database

Whisper is a fixed-size database, similar in design and purpose to RRD (round-robin-database). It provides fast, reliable storage of numeric data over time. Whisper allows for higher resolution (seconds per point) of recent data to degrade into lower resolutions for long-term retention of historical data.

Data Points

Data points in Whisper are stored on-disk as big-endian double-precision floats. Each value is paired with a timestamp in seconds since the UNIX Epoch (01-01-1970). The data value is parsed by the Python float() function and as such behaves in the same way for special strings such as 'inf'. Maximum and minimum values are determined by the Python interpreter’s allowable range for float values which can be found by executing:

python -c 'import sys; print sys.float_info'

Archives: Retention and Precision

Whisper databases contain one or more archives, each with a specific data resolution and retention (defined in number of points or max timestamp age). Archives are ordered from the highest-resolution and shortest retention archive to the lowest-resolution and longest retention period archive.

To support accurate aggregation from higher to lower resolution archives, the precision of a longer retention archive must be divisible by precision of next lower retention archive. For example, an archive with 1 data point every 60 seconds can have a lower-resolution archive following it with a resolution of 1 data point every 300 seconds because 60 cleanly divides 300. In contrast, a 180 second precision (3 minutes) could not be followed by a 600 second precision (10 minutes) because the ratio of points to be propagated from the first archive to the next would be 3 1/3 and Whisper will not do partial point interpolation.

The total retention time of the database is determined by the archive with the highest retention as the time period covered by each archive is overlapping (see Multi-Archive Storage and Retrieval Behavior). That is, a pair of archives with retentions of 1 month and 1 year will not provide 13 months of data storage as may be guessed. Instead, it will provide 1 year of storage – the length of its longest archive.

Rollup Aggregation

Whisper databases with more than a single archive need a strategy to collapse multiple data points when data is rolled up into a lower-precision archive. By default, an average function is used. Available aggregation methods are:

  • average
  • sum
  • last
  • max
  • min
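
The method (and xFilesFactor) applied to each metric is chosen by pattern in storage-aggregation.conf; a sketch:

[min]
pattern = \.min$
xFilesFactor = 0.1
aggregationMethod = min

[default_average]
pattern = .*
xFilesFactor = 0.5
aggregationMethod = average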

Multi-Archive Storage and Retrieval Behavior

When Whisper writes to a database with multiple archives, the incoming data point is written to all archives at once. The data point will be written to the highest resolution archive as-is, and will be aggregated by the configured aggregation method (see Rollup Aggregation) and placed into each of the higher-retention archives. If you need aggregation of the highest resolution points, please consider using carbon-aggregator for that purpose.

When data is retrieved (scoped by a time range), the first archive which can satisfy the entire time period is used. If the time period overlaps an archive boundary, the lower-resolution archive will be used. This allows for a simpler behavior while retrieving data as the data’s resolution is consistent through an entire returned series.