How are branches being used? A visualization

I really like the terminal for visualizations. Its restrictions give it the right balance of detail for rough visualizations: the fixed-width font, the ASCII character set, and the small overall grid all help. I wanted to see how our developers have used branches over the last 180 days. This script reads the Subversion log for the branches directory and prints one chart row per developer, marking each day on which that developer checked in. Note that a day without check-ins by anyone has no column in the chart.

The code is:
#!/bin/bash

date1=$(date -v -180d +'%Y-%m-%d')
date2=$(date -v -1d +'%Y-%m-%d')
svn='svn+ssh://.../branches'

svn log --revision "{$date1}:{$date2}" "$svn" | perl -e '
  my %D = (); # dates
  my %U = (); # users
  while( <> ) {
    if ( /\| (.*?) \| (\d\d\d\d-\d\d-\d\d) / ) {
      my $u = $1;
      my $d = $2;
      $D{$d} ||= {};
      $D{$d}->{$u} += 1;
      $U{$u} = 1;
    }
  }
  my @d = sort keys %D;
  my ( $l ) = reverse sort { $a <=> $b } map { length($_); } keys %U; # max user name length
  for my $u ( sort keys %U ) {
    printf( "%*s %s | ", -$l, $u, $d[0] );
    for my $d ( @d ) {
      print $D{$d}->{$u} ? "*" : "-";
    }
    printf( " | %s\n", $d[$#d] );
  }
'

# END
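
One portability note: date -v is the BSD/macOS form, so the two date lines above will likely fail with GNU date on most Linux systems. Here is a minimal sketch of a helper that tries the GNU form first and falls back to the BSD form. The function name days_ago is my own invention and not part of the original script.

```shell
#!/bin/bash

# days_ago N: print the date N days before today as YYYY-MM-DD.
# Tries the GNU date form first; if that fails, falls back to the
# BSD/macOS form used in the script above.
days_ago() {
  date -d "$1 days ago" +'%Y-%m-%d' 2>/dev/null || date -v "-${1}d" +'%Y-%m-%d'
}

date1=$(days_ago 180)
date2=$(days_ago 1)

# Show the revision range that would be passed to svn log.
echo "svn log --revision {$date1}:{$date2}"
```

The rest of the script can stay as it is, since it only uses the two date variables.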
Update: A failing of this script is that it only shows days with activity; days without activity by any developer are missing from the timeline. I replaced it with this one, which takes the number of days to chart as an optional argument (default 90) and adds a line marking Mondays:
#!/usr/bin/perl -w

use strict;
use DateTime;

my $svn_url='svn+ssh://.../branches';

my $stop = DateTime->today();
my $start = $stop->clone()->subtract( days => shift @ARGV || 90 );

my %W = (); # week days
my %D = (); # dates
my %U = (); # users

# initialize the dates and the weekdays hashes with days between the start and stop dates
for ( my $next = $start->clone(); $next->compare($stop) <= 0; $next->add( days => 1) ) {
 my $F = $next->strftime('%Y-%m-%d');
 $D{$F} = {};
 $W{$F} = $next->day_of_week();
}

# collect the svn usage data
if ( open( IN, "svn log --revision \{$start\}:\{$stop\} $svn_url |" ) ) { 
 while( <IN> ) {
  if ( /\| (.*?) \| (\d\d\d\d-\d\d-\d\d) / ) {
   my $u = $1;
   my $d = $2;
   if ( defined $D{$d} ) {
    $D{$d}->{$u} += 1;
    $U{$u} = 1;
   }
  }
 }
 close(IN);
}

my @d = sort keys %D;
my ( $l ) = reverse sort { $a <=> $b } map { length($_); } keys %U; # max user length

# output the chart
for my $u ( sort keys %U ) {
 printf( "%*s %s | ", -$l, $u, $d[0] );
 for my $d ( @d ) {
  print $D{$d}->{$u} ? "*" : "-";
 }
 printf( " | %s\n", $d[$#d] );
}

# output the line of mondays
printf( "%*s %*s | ", -$l, "", length($d[0]), "" );
for my $d ( @d ) {
 print $W{$d} == 1 ? "M" : " ";
}
printf( " |\n" );

# END

2 comments:

Matt Caron said...

Lolz SVN... 2005 called, they want their SCM back. :-) Seriously, git is soooo much better/faster.

Andrew Gilmartin said...

Yes, but you have to work in an organization that wants to use a distributed version control system. I don't work in one. With that said, an advantage of Subversion, and of any centralized version control system, is that a check-in automatically places the versioned files on another machine. So a lost drive or a mistaken rm -rf on the developer's machine does not lose days, if not weeks, of work.

The whole reason for the chart was to show my manager how poorly my fellow developers are using the existing VCS. Perhaps Git's speed would make them more likely to use the VCS, but I doubt it. Establishing and maintaining an engineering culture works better from the top than from the bottom. Sigh.