Dawn of the Dev: 2012

Friday, 19 October 2012

EDS Killing Thunderbird

I’ve finally worked out why Thunderbird email has been almost unusable since upgrading my various Ubuntu machines to Precise Pangolin 12.04.

The symptoms were a very sluggish Thunderbird interface, leading to a segmentation fault. The crashes were intermittent and didn’t follow any obvious pattern relative to start-up time.

The EDS Contact Integration Add-on seems to have been responsible. EDS is the Evolution Data Server and although I don’t use Evolution directly, I have registered my various Google and Twitter accounts, which apparently get pulled in by EDS too.

I have quite a few Google contacts (more than 700), so I can only assume that the add-on is trying to process them all.

Anyway, the immediate problem is solved by disabling the add-on. I use a Google contact sync plug-in anyway, which works fine, so as far as I know I’m not missing out on anything.

Monday, 28 May 2012

Statistical Graphing with R

I have been collecting some metrics from the 8 agile teams that I work with and planned to create some nice visual reports to help identify software development practices that could be improved.

The data was initially in an Excel spreadsheet, so I used its built-in graphing capabilities, which had been good enough for previous projects. I quickly found problems that made me want to look for a better solution:

Difficult to create multiple graphs with aligned x-axes.
Every time I wanted to use the same graph with different data, I’d have to either replace the data or tweak every graph.

So I tried Gnumeric, which has some nice graphing features, but its lack of pivot tables put it out of the running. LibreOffice Calc was much the same as Excel. I tried some Linux graphical plotting applications, the nicest of which was QtiPlot, which solved the common x-axis problem, but its vector output was poor.

Eventually I looked at statistical computing environments and settled on the R Project, a programming language for statistical analysis and graphing. It solved all the problems I was having with GUI-based tools and in the process introduced me to a new way of exploring my data.

R has a shell, so you can load some data into a variable and then start playing with it. You can easily filter data, apply matrix transformations and feed data through mapping functions.
Once you have the data in the shape you’re interested in, you can run it through one of the built-in functions, which give you lots of standard graph types, or delve into the Comprehensive R Archive Network (CRAN), which is a massive library of user-contributed packages for graphing and analysis.
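As a rough sketch of that kind of interactive session, the snippet below builds a small made-up data frame, filters it, converts the values and maps a summary function over each group. The Team column and all the numbers are hypothetical; only the Original.Estimate column name comes from the script later in this post, and the seconds-to-days conversion is an assumption about the units.

```r
# Hypothetical sample data standing in for a real CSV export.
# Team names and values are invented for illustration.
estimates <- data.frame(
  Team = c("A", "A", "B", "B", "C"),
  Original.Estimate = c(28800, 57600, 0, 144000, 86400)
)

# Filter: keep only rows with a positive estimate.
positive <- estimates[estimates$Original.Estimate > 0, ]

# Transform: convert to days, assuming the raw values are seconds
# and a working day is 8 hours (28800 seconds).
positive$Days <- positive$Original.Estimate / 28800

# Map: apply mean() to each team's estimates.
mean_days <- sapply(split(positive$Days, positive$Team), mean)
print(mean_days)
```

Each line can be typed into the R shell on its own, so you can inspect the intermediate results before moving on.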

Once you have settled on the transformations and graphs you want, you can write a script that outputs to various file formats, including PDF, SVG and PostScript. At the end of each iteration, I run my script, which takes the latest data as a CSV file and outputs all the graphs I need for my report. When I think of new graphs I’d like to include, I add them to my script. It’s easy, reproducible, massively flexible and oh yes… it’s open source too.
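Writing a graph to a file might look something like the sketch below, which is not my actual report script. The values, the file names and the draw_histogram helper are all made up for illustration. It uses the pdf() and postscript() devices from base R; svg() is used the same way on R builds that have cairo support.

```r
# Hypothetical values; a real script would read them from the CSV file.
days <- c(1, 2, 2, 3, 5, 8, 10, 10, 20)

# Hypothetical helper: draws the same graph on whichever device is open.
draw_histogram <- function(values) {
  hist(values, breaks = 20, col = "steelblue",
       main = "Estimates", xlab = "Estimate (days)", ylab = "Frequency")
}

# Write into a temporary directory for this sketch.
out_dir <- tempdir()

# PDF output: open the device, draw, then close it to flush the file.
pdf_file <- file.path(out_dir, "estimates.pdf")
pdf(pdf_file, width = 8, height = 6)
draw_histogram(days)
dev.off()

# PostScript output follows the same open, draw, close pattern.
ps_file <- file.path(out_dir, "estimates.ps")
postscript(ps_file, width = 8, height = 6)
draw_histogram(days)
dev.off()
```

Keeping the drawing code in one function means every output format gets exactly the same graph.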

Here’s a quick distribution graph (took maybe five minutes)…

This distribution of story estimates (in man-days) shows that:

Even numbers are more popular than odd numbers.
9, 11, 17 and 19 are never chosen, presumably due to rounding up or down to 10 and 20.
There is a general preference for nice small stories.

All this was created with the following R script:


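# Load the exported data; the first row holds the column names.
data <- read.table("data.csv", header=TRUE, sep=",")

# Keep estimates above 0 and up to 576000, then divide by 28800 to get
# units (576000 / 28800 = 20). If the raw values are seconds, 28800 is
# one 8-hour day.
small_quotes <- data$Original.Estimate[
  data$Original.Estimate <= 576000 &
  data$Original.Estimate > 0] / 28800

# Plot the distribution; paste() already puts a space between its arguments.
hist(
  small_quotes, col=rainbow(40,0.5),
  main=paste("Histogram of", length(small_quotes),
    "Quote Sizes <= 20 units"),
  breaks=20, xlab="Estimate", ylab="Frequency"
)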
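Observations like the ones listed above the script can also be checked numerically rather than by eye. The sketch below uses a made-up vector of estimates in place of small_quotes, so the counts it prints say nothing about my real data.

```r
# Hypothetical estimates in days; a real check would use small_quotes.
days <- c(1, 2, 2, 3, 4, 4, 5, 8, 8, 10, 10, 12, 20)

# Round to whole days before counting.
whole_days <- round(days)

# Count how often each value occurs.
counts <- table(whole_days)
print(counts)

# Compare the number of even and odd estimates.
parity <- table(ifelse(whole_days %% 2 == 0, "even", "odd"))
print(parity)

# List the values between 1 and 20 that never occur.
never_chosen <- setdiff(1:20, whole_days)
print(never_chosen)
```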

Have fun setting your data free!

Monday, 20 February 2012

Learning Erlang Recursion

I’m learning the Erlang programming language and am currently working on the recursion exercise in the Erlang course.

The min and max functions seemed pretty straightforward:

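%% A single-element list is its own minimum.
min([X]) -> X;
min([X,Y]) when X > Y -> Y;
min([X,_]) -> X;
%% Longer lists: compare the head with the minimum of the tail,
%% using the built-in min/2.
min([H|T]) -> min(H,min(T)).

%% max/1 mirrors min/1 and relies on the built-in max/2.
max([X]) -> X;
max([X,Y]) when X > Y -> X;
max([_,Y]) -> Y;
max([H|T]) -> max(H,max(T)).

min_max(List) -> {min(List),max(List)}.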

But this is the best I could come up with for the Swedish Date problem: convert an Erlang date in the format {2001,02,03} to “010203” recursively.

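%% With no argument, format today's date.
swedish_date() -> swedish_date(date()).

%% Turn the {Year,Month,Day} tuple into a list and start with an empty result.
swedish_date(Date) -> swedish_date(tuple_to_list(Date), "").

%% Prefix each part with "0", then keep its last two characters,
%% so 2001 becomes "01" and 2 becomes "02".
swedish_date([H|T], DateString) ->
        List = "0" ++ integer_to_list(H),
        String = lists:sublist(List, length(List)-1, 2),
        swedish_date(T, DateString ++ String);
swedish_date([], DateString) -> DateString.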

Can anyone improve on that? I feel sure that recursion could help with the number of digits in each part of the tuple: 2001 → 01, 2 → 02, etc.