DOT language for graphs

2017-05-30

The GraphViz package is a set of tools for describing, automatically laying out, and manipulating graphs, which makes it valuable for analyzing and illustrating connections and relationships. Its core consists of the layout and rendering engines, which turn a small descriptive language (the graph file language, or DOT language) into neat visualizations in many common formats. This post gives a brief introduction to the DOT language and shows some applications that build on GraphViz.

GraphViz & DOT: Hello World

To start, let’s create a simple graph with GraphViz in the DOT language.

Create a plain text file with the following content and save it as hello-world.dot.

// File: hello-world.dot
// create a directed graph with two nodes
digraph {
    // a directed edge from "Hello" to "World"
    Hello -> World;
}

Then use the dot command to convert the DOT file into a picture. The dot command is part of the graphviz package, which can be installed through the package manager of most Linux distributions. (For example, use sudo apt install graphviz on Ubuntu or its derivatives.)

# create hello-world.dot.svg in the same folder as the source file
dot 'hello-world.dot' -Tsvg -O

By default, the graph in hello-world.dot.svg will look like the following one.

Making a simple graph this way is much easier than drawing it by hand, whether with a pencil on paper or with a mouse on the canvas of a drawing program.
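The same two steps can also be scripted. The following is a minimal Python sketch, not part of GraphViz itself: the helper names write_dot and render_svg are hypothetical, and the rendering step assumes that the dot executable is on the PATH (it is skipped otherwise).

```python
import shutil
import subprocess
import tempfile
from pathlib import Path
from typing import Optional

# The DOT source of the hello world graph shown above
HELLO_WORLD = """\
digraph {
    Hello -> World;
}
"""


def write_dot(folder: Path, name: str = "hello-world.dot") -> Path:
    """Save the DOT source as a plain text file and return its path."""
    path = folder / name
    path.write_text(HELLO_WORLD, encoding="utf-8")
    return path


def render_svg(dot_file: Path) -> Optional[Path]:
    """Run `dot -Tsvg -O` on the file; return None when dot is not installed."""
    if shutil.which("dot") is None:
        return None  # assumption: without Graphviz there is nothing to render
    subprocess.run(["dot", "-Tsvg", "-O", str(dot_file)], check=True)
    # -O derives the output name by appending the format to the input name
    return dot_file.with_name(dot_file.name + ".svg")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        source = write_dot(Path(tmp))
        print(render_svg(source))
```

The script prints the path of the generated SVG file, or None on a machine without GraphViz.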

Syntax

After the first example, let’s take a closer look at the DOT language by creating some simple graphs.

The syntax of the DOT language is quite simple, since all you need to fully describe a graph are the nodes (the vertices) and the edges between them. A typical DOT file therefore specifies exactly these entities.

// Specify the type and an optional name of the graph
// The type is either graph (undirected graph) or digraph (directed graph)
graph G {
    // list the attributes of the graph
    name=val;
    // list the nodes and their attributes
    // If the default attributes are sufficient, the declaration can be omitted
    A [name=val];
    B [name=val];
    // a "node" statement sets the default attributes for all nodes created after this line
    node [name=val];
    // a link between A and B (no direction, only valid in an undirected graph)
    // the attribute list is optional
    A -- B [name=val];
    // multiple connections can be chained together
    B -- C -- D;
    // groups of nodes can be connected to each other using braces
    {A, B} -- {C, D} -- {E, F};
    // a directed link from A to B (an arrow pointing to B, only valid in a directed graph)
    A -> B [name=val];
}
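Because a graph is fully described by its edges, DOT source is easy to generate from a program. The following Python sketch illustrates this under simple assumptions: the function names quote and to_dot are hypothetical, every node name is written as a double-quoted identifier, and no attributes are emitted.

```python
from typing import Iterable, Tuple


def quote(name: str) -> str:
    """Wrap a node name in double quotes, escaping embedded double quotes."""
    return '"' + name.replace('"', '\\"') + '"'


def to_dot(edges: Iterable[Tuple[str, str]], directed: bool = False, name: str = "G") -> str:
    """Return DOT source for a graph given as (source, target) pairs."""
    keyword = "digraph" if directed else "graph"
    # undirected graphs use "--", directed graphs use "->"
    operator = "->" if directed else "--"
    lines = [f"{keyword} {name} {{"]
    for source, target in edges:
        lines.append(f"    {quote(source)} {operator} {quote(target)};")
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_dot([("A", "B"), ("B", "C")]))
    print(to_dot([("Hello", "World")], directed=True))
```

The resulting string can be saved to a file and passed to the dot command like any hand-written graph.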

Configuration & Customization

This section lists some simple attributes used to adjust nodes and edges. For a complete list, refer to the official documentation.

Node properties

shape=name: the shape of the node; see the official list of node shapes for all options;
width=d and height=d: the size of the node in inches;
style=filled: fill the node; use style="" to clear the fill;
label=text: the label of the node; it can also be simple HTML-like markup for more flexible content;
color=c: the color of the outline of the shape or of the edge, like color="#0091cc" or color=red;
fontcolor=c: the color of the text;
fillcolor=c: the fill color of the node;

Edge properties

style=dotted|dashed|bold|solid|invis: the line style of an edge;
arrowhead=normal|vee|dot|odot|empty: the arrow style of a directed edge; see the official list of arrow shapes for all options;
label=text: the label of the edge;
color=c: the color of the edge;
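To see how these attributes end up in a DOT file, here is a small Python sketch that formats them as attribute lists. The helpers attr_list, node, edge and styled_graph are hypothetical names; the sketch assumes that every value can be written as a double-quoted string, so it does not cover HTML-like labels, which use angle brackets instead of quotes.

```python
def attr_list(**attrs: str) -> str:
    """Format keyword arguments as a DOT attribute list, e.g. [shape="box"]."""
    if not attrs:
        return ""
    pairs = ", ".join(f'{key}="{value}"' for key, value in attrs.items())
    return f" [{pairs}]"


def node(name: str, **attrs: str) -> str:
    """Return a node statement with optional attributes."""
    return f"    {name}{attr_list(**attrs)};"


def edge(source: str, target: str, **attrs: str) -> str:
    """Return a directed edge statement with optional attributes."""
    return f"    {source} -> {target}{attr_list(**attrs)};"


def styled_graph() -> str:
    """Build a two-node directed graph using attributes from the lists above."""
    lines = [
        "digraph {",
        node("A", shape="box", style="filled", fillcolor="#0091cc", fontcolor="white"),
        node("B", shape="ellipse", color="red"),
        edge("A", "B", style="dashed", arrowhead="vee", label="next"),
        "}",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(styled_graph())
```

Quoting every value keeps the helper simple, since values such as "#0091cc" are not valid unquoted identifiers in DOT.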

Layout and layout engine

Layout is the relative position of nodes in a graph.

Layout properties

Some common options for the graph can be used to adjust the layout.

size="x,y": the maximum size of the rendered graph in inches;
margin=f: the margin of the rendered graph, which is included in the size of the result (more like the padding property in CSS);
rankdir=LR|RL|BT|TB: draw left to right (LR), right to left (RL), bottom to top (BT) or top to bottom (TB, the default);
nodesep=f: the minimum separation between nodes.

Layout engine

So far, all of the graphs have been rendered with the dot command. GraphViz also provides several other layout engines. The common ones are listed in the following table; more engines may be available in your installation.

Name	Description
`dot`	drawing directed graphs
`neato`	drawing undirected graphs
`twopi`	radial layouts of graphs
`circo`	circular layout of graphs
`fdp`	drawing undirected graphs
`sfdp`	drawing large undirected graphs
`patchwork`	tree maps

The layout engine can be selected either by the name of the command or by the -K command-line option (the latter takes precedence). It can also be selected via the graph attribute layout (such as layout=dot).
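As an illustration of the -K option, the following Python sketch builds the argument list for rendering a file with a chosen engine. The function name layout_command is hypothetical, and the ENGINES tuple only mirrors the table above, so an installation may accept more names than this check allows.

```python
from typing import List

# Engine names copied from the table above; an installation may offer more
ENGINES = ("dot", "neato", "twopi", "circo", "fdp", "sfdp", "patchwork")


def layout_command(source: str, engine: str = "dot", fmt: str = "svg") -> List[str]:
    """Return the argument list that renders `source` with the given engine."""
    if engine not in ENGINES:
        raise ValueError(f"unknown layout engine: {engine}")
    # -K selects the engine, -T the output format, -O names the output after the input
    return ["dot", f"-K{engine}", f"-T{fmt}", "-O", source]


if __name__ == "__main__":
    print(" ".join(layout_command("hello-world.dot", engine="neato")))
```

The returned list can be passed to subprocess.run on a machine where GraphViz is installed.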

Application

In this section, let’s explore some tricks and applications of GraphViz.

Export to common image files

GraphViz is implemented with an open and extensible architecture, so it can render a graph to a variety of targets provided by plugins. Besides, as a command-line tool, its output can be piped to other utilities for further processing.

Traditionally, GraphViz supports the following types: ps (PostScript), svg/svgz (Scalable Vector Graphics), png/gif (bitmap files), etc. However, most GraphViz packages include several extensions that provide more export formats. To list the export targets supported by your installation, use the -T option with an invalid name:

dot -T\*
# may return something like the next line, which contains quite a long list
# Format: "*" not recognized. Use one of: bmp canon cmap cmapx cmapx_np dot eps fig gd gd2 gif gtk gv ico imap imap_np ismap jpe jpeg jpg pdf pic plain plain-ext png pov ps ps2 svg svgz tif tiff tk vml vmlz vrml wbmp x11 xdot xdot1.2 xdot1.4 xlib

For a single file type, there may also be several supported plugins, which can be selected by appending the variant name after a colon, like dot -Tpng:cairo:gd. Similarly, the list of supported variants can be retrieved by providing an invalid variant.

dot -Tpng:\*
# sample result
# Format: "png:*" not recognized. Use one of: png:cairo:cairo png:cairo:gdk png:cairo:gd png:gd:gd
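When a script needs the list of formats, the message shown above can be parsed directly. The Python sketch below does this for a message that has already been captured as a string; the function name parse_formats is hypothetical, and the sketch assumes the "Use one of:" wording from the sample output, which may differ between GraphViz versions.

```python
from typing import List


def parse_formats(message: str) -> List[str]:
    """Extract the names listed after "Use one of:" in the message of dot."""
    _, marker, tail = message.partition("Use one of:")
    if not marker:
        return []  # the message does not have the expected shape
    return tail.split()


if __name__ == "__main__":
    sample = (
        'Format: "png:*" not recognized. Use one of: '
        "png:cairo:cairo png:cairo:gdk png:cairo:gd png:gd:gd"
    )
    for name in parse_formats(sample):
        print(name)
```

Each returned name can be passed back to dot after the -T option.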

Jupyter Notebook

As mentioned in a previous post, there are several handy bindings of GraphViz in the Python world. This section illustrates an example of using the graphviz package to display a decision tree built by a tree model from the sklearn package (a machine learning package).

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_graphviz
from graphviz import Source
# use the famous iris data set as an illustration
iris = load_iris()
model = DecisionTreeClassifier()
model.fit(iris.data, iris.target)
# construct tree (the graph viz dot file from the model)
tree = export_graphviz(model, 
                       out_file=None, 
                       special_characters=True, 
                       feature_names=iris.feature_names, # specify the feature names (sepal length, sepal width, petal length and petal width)
                       class_names=iris.target_names # specify the target names (type of iris)
                       )
Source(tree)

The decision tree trained on the iris data set is rendered directly within the Jupyter notebook.

Actually, the variable tree is a plain string that contains the source of the DOT file. The source of the graph illustrated above is listed as follows (sklearn uses HTML-like labels to fill the content of the nodes):

digraph Tree {
    node [shape=box] ;
    0 [label=<petal width (cm) &le; 0.8<br/>gini = 0.6667<br/>samples = 150<br/>value = [50, 50, 50]<br/>class = setosa>] ;
    1 [label=<gini = 0.0<br/>samples = 50<br/>value = [50, 0, 0]<br/>class = setosa>] ;
    0 -> 1 [labeldistance=2.5, labelangle=45, headlabel="True"] ;
    2 [label=<petal width (cm) &le; 1.75<br/>gini = 0.5<br/>samples = 100<br/>value = [0, 50, 50]<br/>class = versicolor>] ;
    0 -> 2 [labeldistance=2.5, labelangle=-45, headlabel="False"] ;
    3 [label=<petal length (cm) &le; 4.95<br/>gini = 0.168<br/>samples = 54<br/>value = [0, 49, 5]<br/>class = versicolor>] ;
    2 -> 3 ;
    4 [label=<petal width (cm) &le; 1.65<br/>gini = 0.0408<br/>samples = 48<br/>value = [0, 47, 1]<br/>class = versicolor>] ;
    3 -> 4 ;
    5 [label=<gini = 0.0<br/>samples = 47<br/>value = [0, 47, 0]<br/>class = versicolor>] ;
    4 -> 5 ;
    6 [label=<gini = 0.0<br/>samples = 1<br/>value = [0, 0, 1]<br/>class = virginica>] ;
    4 -> 6 ;
    7 [label=<petal width (cm) &le; 1.55<br/>gini = 0.4444<br/>samples = 6<br/>value = [0, 2, 4]<br/>class = virginica>] ;
    3 -> 7 ;
    8 [label=<gini = 0.0<br/>samples = 3<br/>value = [0, 0, 3]<br/>class = virginica>] ;
    7 -> 8 ;
    9 [label=<petal length (cm) &le; 5.45<br/>gini = 0.4444<br/>samples = 3<br/>value = [0, 2, 1]<br/>class = versicolor>] ;
    7 -> 9 ;
    10 [label=<gini = 0.0<br/>samples = 2<br/>value = [0, 2, 0]<br/>class = versicolor>] ;
    9 -> 10 ;
    11 [label=<gini = 0.0<br/>samples = 1<br/>value = [0, 0, 1]<br/>class = virginica>] ;
    9 -> 11 ;
    12 [label=<petal length (cm) &le; 4.85<br/>gini = 0.0425<br/>samples = 46<br/>value = [0, 1, 45]<br/>class = virginica>] ;
    2 -> 12 ;
    13 [label=<sepal width (cm) &le; 3.1<br/>gini = 0.4444<br/>samples = 3<br/>value = [0, 1, 2]<br/>class = virginica>] ;
    12 -> 13 ;
    14 [label=<gini = 0.0<br/>samples = 2<br/>value = [0, 0, 2]<br/>class = virginica>] ;
    13 -> 14 ;
    15 [label=<gini = 0.0<br/>samples = 1<br/>value = [0, 1, 0]<br/>class = versicolor>] ;
    13 -> 15 ;
    16 [label=<gini = 0.0<br/>samples = 43<br/>value = [0, 0, 43]<br/>class = virginica>] ;
    12 -> 16 ;
}

Besides, the PyEDA package also uses the GraphViz engine to visualize binary decision diagrams (BDD) and reduced ordered BDDs (ROBDD). Thus a nice visualization can be displayed instantly in a Jupyter notebook when handling Boolean expressions, as well as their conversions and simplifications. The official documentation of PyEDA also provides a detailed tutorial.

Port to Web Platform

The modern web platform provides a rich and growing set of APIs that power a variety of fancy applications. One recently added API is WebAssembly, which exposes a more native interface that lets web applications use the computing power of the client. With the help of a compiler toolchain (such as emscripten), the layout engine of GraphViz can be compiled to WebAssembly and executed directly within the user's browser. A ready-to-use library named Viz.js is available on GitHub (mdaines/viz.js), which also provides an online demo at http://viz-js.com/.

The API exposed by Viz.js is quite simple and clear: it consumes the source of a graph and generates the image. However, this process may be somewhat time-consuming and will block the UI thread (the interaction of the web page), which is quite annoying, especially on mobile devices with limited computing power. A solution to this problem is to compute the layout on a dedicated thread, which can be implemented as a Web Worker.

I have also made a simple demo for displaying DFAs converted from regular expressions, which uses this approach to lay out the nodes automatically within a modern browser. You can refer to its source to see how Viz.js is integrated with a Web Worker and how all the code is packaged with rollup.js.

CC-BY-SA 4.0

The content of this post is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
