What is Graphite?

What Graphite is and is not

Graphite does two things:

  1. Store numeric time-series data
  2. Render graphs of this data on demand

What Graphite does not do is collect data for you; however, there are several tools out there that know how to send data to Graphite. Even though it often requires a little code, sending data to Graphite is very simple.
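For example, Carbon's plaintext protocol accepts one `metric-path value timestamp` line per datapoint over TCP (port 2003 by default). A minimal Python sketch, assuming a Carbon daemon is listening on localhost:

```python
import socket
import time

def format_metric(path, value, timestamp):
    # Carbon's plaintext protocol: one "path value timestamp" line per datapoint
    return "%s %s %d\n" % (path, value, int(timestamp))

def send_metric(path, value, host="127.0.0.1", port=2003):
    # Assumes a carbon-cache daemon listening on the default plaintext port
    line = format_metric(path, value, time.time())
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(line.encode("ascii"))
```

Any language that can open a TCP socket can feed Graphite the same way.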

About the project

Graphite is an enterprise-scale monitoring tool that runs well on cheap hardware. It was originally designed and written by Chris Davis at Orbitz in 2006 as a side project that ultimately grew into a foundational monitoring tool. In 2008, Orbitz allowed Graphite to be released under the open source Apache 2.0 license. Since then, Chris has continued to work on Graphite and has deployed it at other companies, including Sears, where it serves as a pillar of the e-commerce monitoring system. Today many large companies use it.

The architecture in a nutshell

Graphite consists of 3 software components:

  1. carbon – a Twisted daemon that listens for time-series data
  2. whisper – a simple database library for storing time-series data (similar in design to RRD)
  3. graphite webapp – A Django webapp that renders graphs on-demand using Cairo

Feeding in your data is pretty easy; typically most of the effort is in collecting the data to begin with. As you send datapoints to Carbon, they become immediately available for graphing in the webapp. The webapp offers several ways to create and display graphs, including a simple URL API for rendering that makes it easy to embed graphs in other web pages.
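As an illustration of that URL API, a graph is just a GET request to the /render endpoint. A small helper for building such URLs (the hostname is hypothetical):

```python
from urllib.parse import urlencode

def render_url(base, target, frm="-24hours", fmt="png"):
    # Build a Graphite /render URL: 'target' names the metric series,
    # 'from' selects the time range, 'format' picks png/svg/json/csv output.
    params = urlencode({"target": target, "from": frm, "format": fmt})
    return "%s/render?%s" % (base.rstrip("/"), params)
```

Dropping the resulting URL into an `<img>` tag is enough to embed a live graph in another page.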


What is Graphite?

Graphite is a highly scalable real-time graphing system. As a user, you write an application that collects the numeric time-series data you are interested in graphing and sends it to Graphite's processing backend, carbon, which stores the data in Graphite's specialized database. The data can then be visualized through Graphite's web interfaces.

The Carbon Daemons

When we talk about “Carbon” we mean one or more of various daemons that make up the storage backend of a Graphite installation. In simple installations, there is typically only one daemon, carbon-cache.py. As an installation grows, the carbon-relay.py and carbon-aggregator.py daemons can be introduced to distribute metrics load and perform custom aggregations, respectively.

All of the carbon daemons listen for time-series data and can accept it over a common set of protocols. However, they differ in what they do with the data once they receive it. This document gives a brief overview of what each daemon does and how you can use them to build a more sophisticated storage backend.


carbon-cache.py accepts metrics over various protocols and writes them to disk as efficiently as possible. This requires caching metric values in RAM as they are received, and flushing them to disk on an interval using the underlying whisper library. It also provides a query service for in-memory metric datapoints, used by the Graphite webapp to retrieve “hot data”.

carbon-cache.py requires some basic configuration files to run:

carbon.conf – the [cache] section tells carbon-cache.py which ports (2003/2004/7002), protocols (newline-delimited, pickle) and transports (TCP/UDP) to listen on.
storage-schemas.conf – defines a retention policy for incoming metrics based on regex patterns. This policy is passed to whisper when the .wsp file is pre-allocated, and dictates how long data is stored for.
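An illustrative storage-schemas.conf (the patterns and retentions here are examples, not defaults):

```
[carbon]
pattern = ^carbon\.
retentions = 60:90d

[default]
pattern = .*
retentions = 10s:6h,1m:7d,10m:5y
```

The first section whose regex matches an incoming metric name wins, so more specific patterns should come before the catch-all.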

As the number of incoming metrics increases, one carbon-cache.py instance may not be enough to handle the I/O load. To scale out, simply run multiple carbon-cache.py instances (on one or more machines) behind a carbon-aggregator.py or carbon-relay.py.


carbon-relay.py serves two distinct purposes: replication and sharding.

When running with RELAY_METHOD = rules, a carbon-relay.py instance can run in place of a carbon-cache.py server and relay all incoming metrics to multiple backend carbon-cache.py‘s running on different ports or hosts.

In RELAY_METHOD = consistent-hashing mode, a DESTINATIONS setting defines a sharding strategy across multiple carbon-cache.py backends. The same consistent hashing list can be provided to the graphite webapp via CARBONLINK_HOSTS to spread reads across the multiple backends.
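A sketch of what this can look like in carbon.conf (hosts, ports and instance names are illustrative):

```
[relay]
RELAY_METHOD = consistent-hashing
DESTINATIONS = 127.0.0.1:2004:a, 127.0.0.1:2104:b
```

In the webapp's local_settings.py, a matching CARBONLINK_HOSTS = ["127.0.0.1:7002:a", "127.0.0.1:7102:b"] spreads reads across the same instances.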

carbon-relay.py is configured via:

carbon.conf – the [relay] section defines listener host/ports and a RELAY_METHOD.
relay-rules.conf – with RELAY_METHOD = rules set, pattern/servers tuples in this file define which metrics matching certain regex rules are forwarded to which hosts.


carbon-aggregator.py can be run in front of carbon-cache.py to buffer metrics over time before reporting them into whisper. This is useful when granular reporting is not required, and can help reduce I/O load and whisper file sizes due to lower retention policies.

carbon-aggregator.py is configured via:

carbon.conf – the [aggregator] section defines listener and destination host/ports.
aggregation-rules.conf – defines a time interval (in seconds) and an aggregation function (sum or average) for incoming metrics matching a certain pattern. At the end of each interval, the values received are aggregated and published to carbon-cache.py as a single metric.
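An example aggregation-rules.conf entry, following the documented output_template (frequency) = method input_pattern form (the metric names are illustrative):

```
# Sum per-server request counts into a single 'all' metric every 60 seconds
<env>.applications.<app>.all.requests (60) = sum <env>.applications.<app>.*.requests
```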


carbon-aggregator-cache.py combines both carbon-aggregator.py and carbon-cache.py. This is useful to reduce the resource and administration overhead of running both daemons.

carbon-aggregator-cache.py is configured via:

carbon.conf – the [aggregator-cache] section defines listener and destination host/ports.
relay-rules.conf – see the carbon-relay.py section.
aggregation-rules.conf – see the carbon-aggregator.py section.

The Whisper Database

Whisper is a fixed-size database, similar in design and purpose to RRD (round-robin-database). It provides fast, reliable storage of numeric data over time. Whisper allows for higher resolution (seconds per point) of recent data to degrade into lower resolutions for long-term retention of historical data.

Data Points

Data points in Whisper are stored on-disk as big-endian double-precision floats. Each value is paired with a timestamp in seconds since the UNIX Epoch (01-01-1970). The data value is parsed by the Python float() function and as such behaves in the same way for special strings such as 'inf'. Maximum and minimum values are determined by the Python interpreter’s allowable range for float values which can be found by executing:

python -c 'import sys; print(sys.float_info)'
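The on-disk layout described above can be sketched with Python's struct module; "!Ld" pairs a 4-byte big-endian unsigned timestamp with an 8-byte big-endian double, giving a 12-byte record per datapoint:

```python
import struct

# One Whisper datapoint: big-endian ('!') unsigned 32-bit timestamp ('L')
# followed by a big-endian double-precision value ('d') -- 12 bytes total.
POINT_FORMAT = "!Ld"

def pack_point(timestamp, value):
    return struct.pack(POINT_FORMAT, timestamp, value)

def unpack_point(data):
    return struct.unpack(POINT_FORMAT, data)
```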

Archives: Retention and Precision

Whisper databases contain one or more archives, each with a specific data resolution and retention (defined in number of points or max timestamp age). Archives are ordered from the highest-resolution and shortest retention archive to the lowest-resolution and longest retention period archive.

To support accurate aggregation from higher- to lower-resolution archives, the precision of a longer-retention archive must be evenly divisible by the precision of the next shorter-retention archive. For example, an archive storing 1 data point every 60 seconds can be followed by a lower-resolution archive storing 1 data point every 300 seconds, because 60 cleanly divides 300. In contrast, a 180-second precision (3 minutes) could not be followed by a 600-second precision (10 minutes), because the ratio of points to be propagated from the first archive to the next would be 3 1/3, and Whisper will not do partial point interpolation.
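The divisibility rule can be checked mechanically. A small sketch, where each archive is a (seconds_per_point, points) pair ordered from highest to lowest resolution:

```python
def archives_compatible(archives):
    """Check Whisper's ordering/divisibility rules for a list of
    (seconds_per_point, points) archive definitions."""
    for (p1, _), (p2, _) in zip(archives, archives[1:]):
        if p2 <= p1:        # resolution must strictly decrease
            return False
        if p2 % p1 != 0:    # coarser precision must be a multiple of the finer one
            return False
    return True
```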

The total retention time of the database is determined by the archive with the highest retention as the time period covered by each archive is overlapping (see Multi-Archive Storage and Retrieval Behavior). That is, a pair of archives with retentions of 1 month and 1 year will not provide 13 months of data storage as may be guessed. Instead, it will provide 1 year of storage – the length of its longest archive.

Rollup Aggregation

Whisper databases with more than a single archive need a strategy to collapse multiple data points when the data rolls up into a lower-precision archive. By default, an average function is used. The available aggregation methods are:

  • average
  • sum
  • last
  • max
  • min

Multi-Archive Storage and Retrieval Behavior

When Whisper writes to a database with multiple archives, the incoming data point is written to all archives at once. The data point is written to the highest-resolution archive as-is, and is aggregated by the configured aggregation method (see Rollup Aggregation) before being placed into each of the longer-retention archives. If you need aggregation of the highest-resolution points, consider using carbon-aggregator for that purpose.

When data is retrieved (scoped by a time range), the first archive which can satisfy the entire time period is used. If the time period overlaps an archive boundary, the lower-resolution archive will be used. This allows for a simpler behavior while retrieving data as the data’s resolution is consistent through an entire returned series.
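This retrieval rule can be sketched as follows, with archives given as (seconds_per_point, retention_seconds) pairs from highest to lowest resolution:

```python
def choose_archive(archives, from_time, now):
    """Return the first (highest-resolution) archive whose retention covers
    the whole requested time range, as Whisper does on retrieval."""
    age = now - from_time
    for precision, retention in archives:
        if retention >= age:
            return (precision, retention)
    return archives[-1]  # range exceeds all retentions: use the longest archive
```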


Configuring Graphite on Ubuntu

Clone the source code:
git clone https://github.com/graphite-project/graphite-web.git
cd graphite-web
git checkout 0.9.x
cd ..
git clone https://github.com/graphite-project/carbon.git
cd carbon
git checkout 0.9.x
cd ..
git clone https://github.com/graphite-project/whisper.git
cd whisper
git checkout 0.9.x
cd ..

Configure whisper:
pushd whisper
sudo python setup.py install

Configure carbon:
pushd carbon
sudo python setup.py install
pushd /opt/graphite/conf/
sudo cp carbon.conf.example carbon.conf
sudo cp storage-schemas.conf.example storage-schemas.conf

storage-schemas.conf contains the schema definitions for Whisper files; the data retention policy is defined in this file. By default it retains everything for one day. Once Graphite is configured, changing this file won't resize existing Whisper files; use whisper-resize.py for that.
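For example, an existing file can be rewritten with new retentions like this (the path and retentions are illustrative):

```
whisper-resize.py /opt/graphite/storage/whisper/my/metric.wsp 10s:1d 1m:30d
```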

Configure Graphite:
pushd graphite-web
python check-dependencies.py
Install the unmet dependencies:
sudo apt-get install python-cairo python-django python-django-tagging python-memcache python-ldap

Configure the webapp:
pushd graphite-web
sudo python setup.py install

Configure Graphite webapp using Apache:
Install apache and mod_wsgi:
sudo apt-get install apache2 libapache2-mod-wsgi

Configure graphite virtual host:
sudo cp graphite-web/examples/example-graphite-vhost.conf /etc/apache2/sites-available/graphite-vhost.conf
sudo ln -s /etc/apache2/sites-available/graphite-vhost.conf /etc/apache2/sites-enabled/graphite-vhost.conf
sudo unlink /etc/apache2/sites-enabled/000-default.conf

Edit /etc/apache2/sites-available/graphite-vhost.conf and add
WSGISocketPrefix /var/run/apache2/wsgi
Edit /etc/apache2/sites-available/graphite-vhost.conf and add
<Directory /opt/graphite/conf/>
Options FollowSymLinks
AllowOverride none
Require all granted
</Directory>
Reload apache configurations:
sudo service apache2 reload

Sync the sqlite database for graphite-web:
cd /opt/graphite/webapp/graphite
cp local_settings.py.example local_settings.py
sudo python manage.py syncdb

While doing the db sync you will be asked to create a superuser for Graphite; create one and give it a password. You can turn on debugging for graphite-web by adding DEBUG = True to local_settings.py.
Change owner of graphite storage directory to a user through which apache is being run:
sudo chown -R www-data:www-data /opt/graphite/storage/

Configure nginx instead of apache:

Install necessary packages:
sudo apt-get install nginx php5-fpm uwsgi-plugin-python uwsgi
Configure nginx and uwsgi:
cd /opt/graphite/conf/
sudo cp graphite.wsgi.example wsgi.py
Create a file /etc/nginx/sites-available/graphite-vhost.conf and add following to it:
server {
    listen 8080;
    server_name graphite;
    root /opt/graphite/webapp;
    error_log /opt/graphite/storage/log/webapp/error.log error;
    access_log /opt/graphite/storage/log/webapp/access.log;

    location / {
        include uwsgi_params;
    }
}
Enable the nginx server
sudo ln -s /etc/nginx/sites-available/graphite-vhost.conf /etc/nginx/sites-enabled/graphite-vhost.conf
Create a file /etc/uwsgi/apps-available/graphite.ini and add following to it:
[uwsgi]
processes = 2
socket =
gid = www-data
uid = www-data
chdir = /opt/graphite/conf
module = wsgi:application
sudo ln -s /etc/uwsgi/apps-available/graphite.ini /etc/uwsgi/apps-enabled/graphite.ini

Restart services:
sudo /etc/init.d/uwsgi restart
sudo /etc/init.d/nginx restart

Start Carbon (the storage daemon):
cd /opt/graphite/
./bin/carbon-cache.py start


Access the graphite home page:
The Graphite homepage is available at http://<ip-of-graphite-host>/

Connect graphite to DSP-Core:

To connect your DSP-Core with Graphite, you need to install the carbon agent on your DSP-Core machine so that it can send data to your Graphite host, where it can be displayed in the web UI. Follow these steps to connect your DSP-Core with Graphite.
Install carbon over DSP-Core machine:
git clone https://github.com/graphite-project/carbon.git
cd carbon
git checkout 0.9.x

cd ..
pushd carbon
sudo python setup.py install
pushd /opt/graphite/conf/
sudo cp carbon.conf.example carbon.conf
sudo cp storage-schemas.conf.example storage-schemas.conf

Configure DSP-Core to send data to graphite:
You need to add following to /data1/deploy/dsp/current/dsp-core/conf/bootstrap.json
"carbon-uri": ["<IP-of-graphite-host>:2003"]

Linux ldconfig Command Examples

What is ldconfig?

ldconfig is used to create, update and remove symbolic links for the current shared libraries, based on the library directories listed in /etc/ld.so.conf.

3 ldconfig Examples

1. Display current libraries from the cache

This displays the list of directories and the libraries that are stored in the current cache. In the following example, it indicates that there are 916 libraries found in the cache file /etc/ld.so.cache, and it lists all of them below.

# ldconfig -p | head -5
916 libs found in cache `/etc/ld.so.cache'
	libzephyr.so.4 (libc6) => /usr/lib/libzephyr.so.4
	libzbar.so.0 (libc6) => /usr/lib/libzbar.so.0
	libz.so.1 (libc6) => /lib/libz.so.1
	libz.so (libc6) => /usr/lib/libz.so

2. Display libraries from every directory

This scans all the directories and prints each directory name along with the links that are created under it.

# ldconfig -v | head
	libGL.so.1 -> libGL.so.1.2
	liblouis.so.2 -> liblouis.so.2.2.0
	libasound_module_ctl_oss.so -> libasound_module_ctl_oss.so
	libasound_module_ctl_bluetooth.so -> libasound_module_ctl_bluetooth.so
	libasound_module_pcm_bluetooth.so -> libasound_module_pcm_bluetooth.so
	libasound_module_pcm_vdownmix.so -> libasound_module_pcm_vdownmix.so
	libasound_module_rate_speexrate.so -> libasound_module_rate_speexrate_medium.so

/etc/ld.so.conf has an include statement, which indicates that all the *.conf files under the /etc/ld.so.conf.d directory should also be considered.

# cat /etc/ld.so.conf
include /etc/ld.so.conf.d/*.conf

As you see below, there are multiple *.conf files located under this ld.so.conf.d directory. All of these files will be used.

# ls -1 /etc/ld.so.conf.d/

Sometimes when you do ldconfig -v, you might get the following error. This happens because some of the *.conf files under /etc/ld.so.conf.d refer to directories that don't exist.

/sbin/ldconfig.real: Can't stat /lib/i486-linux-gnu: No such file or directory
/sbin/ldconfig.real: Can't stat /usr/lib/i486-linux-gnu: No such file or directory
/sbin/ldconfig.real: Can't stat /lib/i686-linux-gnu: No such file or directory
/sbin/ldconfig.real: Can't stat /lib64: No such file or directory

Note: You can either ignore these error messages or remove those *.conf files from the /etc/ld.so.conf.d directory.

3. Inform System about the New Libraries

If you’ve installed a new program by compiling it from source, you might want to inform the system about the new libraries.

For example, let us assume that you've installed a program called dummy, which has all its libraries under the /opt/dummy/lib directory.

The following example updates the links using only the /opt/dummy/lib directory. It doesn't rebuild the links by processing the /etc/ld.so.conf file, and it doesn't rebuild the cache; it just updates the links.

# ldconfig -n /opt/dummy/lib

Instead of the above, you can also add /opt/dummy/lib to /etc/ld.so.conf and then run ldconfig.

# vi /etc/ld.so.conf

# ldconfig

Syntax and Options


ldconfig [OPTION...]
Short Option  Long Option         Description
-v            --verbose           Verbose mode. Prints the current version number, the name of each directory as it is scanned, and the links that are created.
-n                                Process only the directories specified on the command line. Doesn't process the regular /usr/lib and /lib directories, nor the directories specified in /etc/ld.so.conf. Implies -N.
-N                                Doesn't rebuild the cache. Unless -X is also specified, links are still updated.
-X                                Doesn't update the links. Unless -N is also specified, the cache is still rebuilt.
-f                                Use the specified config file instead of /etc/ld.so.conf.
-C                                Use the specified cache instead of /etc/ld.so.cache.
-r                                Change to and use the specified directory as the root directory.
-l                                Library mode: manually link individual libraries.
-p            --print-cache       Print the list of directories and candidate libraries stored in the current cache.
-c FORMAT     --format=FORMAT     Use FORMAT for the cache file. Valid values for FORMAT: old, new and compat. compat is the default value.
-i            --ignore-aux-cache  Ignore the auxiliary cache file.
-?            --help, --usage     Display help.
-V            --version           Display the version number.

Microservices at Netflix scale


Netflix took 7 years to completely transform to microservices. The traditional approach was that developers contributed to their individual jars/wars, which would go through the regular sprint iteration.


This approach created issues with both delivery velocity and reliability.

A change in any one service would ripple into the other services, which was very difficult to handle. This caused too many bugs, and the shared database was a single point of failure. A few years back the production database of Netflix got corrupted, and customers were shown an error page.


At Netflix they want their services to isolate single points of failure, and this is where Hystrix comes in. Hystrix is a latency and fault-tolerance library designed to isolate points of access to remote systems, services and 3rd-party libraries, stop cascading failures, and enable resilience in complex distributed systems where failure is inevitable.
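The core Hystrix idea can be sketched in a few lines of Python. Hystrix itself is a Java library with much richer semantics; this only illustrates fail-fast isolation with a fallback:

```python
import time

class CircuitBreaker:
    """Minimal circuit-breaker sketch: after too many consecutive failures,
    stop calling the remote dependency and serve the fallback until a
    cool-down period has passed."""

    def __init__(self, max_failures=3, reset_timeout=30.0):
        self.max_failures = max_failures
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at = None

    def call(self, func, fallback):
        if self.opened_at is not None:
            if time.time() - self.opened_at < self.reset_timeout:
                return fallback()      # circuit open: fail fast
            self.opened_at = None      # half-open: allow one trial call
            self.failures = 0
        try:
            result = func()
            self.failures = 0
            return result
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.time()
            return fallback()
```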

They test their system with the Fault Injection Testing (FIT) framework.


Layer and tier

Difference between a layer and a tier:

q1: web server architecture is a pipelined architecture.

q2: cloud architecture requires n-tier design architecture.

q3: SOA architecture: component architecture.

q4: the GoF book "Design Patterns".

q5: plugin: a type of factory pattern implemented to provide inversion of control for an object.

q6: a piece of code that runs on its own.

q7: GoF

q8: all hidden

q9: encapsulation ensures that the code does not have direct dependencies on externals that can be changed at will.

q10: cohesion determines layers.

q11: cannot be scaled easily.

q12: the number of things in a piece of code.

q13: favor component design over inheritance.

q14: persistence, caching and optimistic locking: cohesive code.

q15: instantiated separately and consumed; has a connection with business logic and comes as a service.

q16: sprint methodology (iterative).

q17: how implementation is done vs. how it was specified.


Many consumers share data; only some part of it is private.

Cache data locally.

A CDN can block traffic like a proxy.

DDoS: distributed denial-of-service attack.

A CDN can serve a cached copy of the whole site even if the origin is down.

New Relic serves its mobile SDKs through a CDN.