Graphing with R

I wanted to know who was using our DNS servers for recursive lookups from off campus. Gathering the data wasn’t hard, but the raw numbers were difficult to read, so I made a graph of them. Here’s how:

First I got all the data into a CSV file with a header line followed by one data point per line. It looks like this:


asn, queries, hosts, name
11888, 0.031, 0.006, Television Internacional S.A. de C.V.
5552, 0.006, 0.006, DIALNET-UK DIALnet PLC

The ASNs and names are pulled from Cymru, queries is the average number of queries per day from that netblock since we started monitoring, and hosts is the average number of unique IPs querying per day over the same period. Getting the data into this format is left as an exercise for the reader (you probably want to graph something completely different anyway).
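
As a quick sanity check on the format, the two sample rows above can be parsed straight from a string with read.table. This is only a sketch: the names csv_text and sample_dns are mine, and stringsAsFactors = FALSE is an addition so the names come back as plain strings on any R version.

```r
# The two sample rows from above, inlined so the snippet runs on its own.
csv_text <- "asn, queries, hosts, name
11888, 0.031, 0.006, Television Internacional S.A. de C.V.
5552, 0.006, 0.006, DIALNET-UK DIALnet PLC"

# sep = "," splits on commas; strip.white = TRUE drops the space after each comma.
sample_dns <- read.table(text = csv_text, sep = ",", header = TRUE,
                         strip.white = TRUE, stringsAsFactors = FALSE)

# Show the column names and types R inferred.
str(sample_dns)
```

A name that itself contains a comma or a stray quote character would likely confuse this parse, so such names would need quoting or cleaning first.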

Now the fun part: let’s define a scatterplot in R.


dns = read.table("/tmp/summary", sep=",", header = TRUE, strip.white = TRUE)
queries = dns$queries
hosts = dns$hosts
png(filename = "/home/stuartf/public_html/dns.png", width=800, height=600)
par(xaxs="r")
plot(hosts, queries, log="xy", pch=16, cex=.25)
text(hosts[hosts>50], queries[hosts>50], dns$name[hosts>50], cex=.75, adj=0)
text(hosts[queries>1000], queries[queries>1000], dns$name[queries>1000], cex=.75, adj=0)
dev.off()

The first line reads our CSV file into a data frame.
The next two lines pull the queries and hosts columns out of that data frame as plain vectors.
The fourth line tells R that I’d like to write the plot out as a PNG with the given dimensions.
The next line sets how the extent of the x axis is calculated: "r", which is also R’s default, pads the data range by 4 percent at each end before choosing tick positions; see R’s documentation for par.
Then we plot queries against hosts with logarithmic scales on both axes; pch sets the plotting character (16 is a filled circle) and cex scales it.
The two text lines label the points where more than 50 hosts or more than 1000 queries show up per day; adj=0 left-aligns each label at its point.
The last line closes the PNG device, which is what finishes writing the file.
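
One side effect of the two text lines is that a point with both more than 50 hosts and more than 1000 queries gets its label drawn twice. A possible variation, sketched here, builds a single logical mask and labels each point once. The example data frame and its values are invented purely so the snippet runs on its own, and the output goes to a temporary file rather than a real web directory.

```r
# Invented sample data, only to make the sketch runnable.
example <- data.frame(
  hosts   = c(2, 75, 120, 10),
  queries = c(40, 300, 5000, 2500),
  name    = c("Net A", "Net B", "Net C", "Net D"),
  stringsAsFactors = FALSE
)

# TRUE for points over either threshold, so each point is selected at most once.
busy <- example$hosts > 50 | example$queries > 1000

out_file <- tempfile(fileext = ".png")  # temporary path, an assumption for this sketch
png(filename = out_file, width = 800, height = 600)
plot(example$hosts, example$queries, log = "xy", pch = 16, cex = .25)
text(example$hosts[busy], example$queries[busy], example$name[busy],
     cex = .75, adj = 0)
invisible(dev.off())  # close the device so the file is finished
```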

The resulting graph is a bit messy and probably isn’t fit for publishing, but it gives me the data I need, and could be further tweaked for readability if necessary.
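
If I did want to tidy it up, a first pass might add axis labels and a title and nudge the labels off their points. The helper below is a sketch under my own assumptions: plot_dns_labeled, its arguments, the label wording, and the sample rows are all hypothetical and not part of the script I actually ran.

```r
# Hypothetical helper: same scatterplot, with axis labels, a title and offset labels.
# d is a data frame with hosts, queries and name columns; label is a logical vector
# saying which rows get a text label; file is the PNG path to write.
plot_dns_labeled <- function(d, label, file) {
  png(filename = file, width = 800, height = 600)
  on.exit(dev.off())  # close the device even if plotting fails
  plot(d$hosts, d$queries, log = "xy", pch = 16, cex = .25,
       xlab = "Unique hosts per day", ylab = "Queries per day",
       main = "Off-campus recursive DNS lookups by ASN")
  # pos = 4 draws each label to the right of its point; offset sets the gap.
  text(d$hosts[label], d$queries[label], d$name[label],
       cex = .75, pos = 4, offset = 0.3)
  invisible(file)
}

# Usage with invented rows.
demo <- data.frame(hosts = c(3, 90), queries = c(20, 4000),
                   name = c("Net A", "Net B"), stringsAsFactors = FALSE)
demo_file <- plot_dns_labeled(demo, demo$hosts > 50, tempfile(fileext = ".png"))
```

Passing the label mask in as an argument keeps the thresholds out of the helper, so they can be changed without touching the plotting code.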

[img_assist|fid=111|thumb=1|alt=scatterplot of dns queries & hosts per day|caption=A scatterplot made with R.]