Update 27 Sep 2011
In our last meeting I took down these actions:
- Resurrect the NGMAST paper and use Enron + MIT-NOV – argue that MIT-NOV is not big enough on its own to draw general conclusions from
I have started another paper in the Clique SVN here: https://erdos.ucd.ie/svn/clique/papers/BubbleH – at the moment it's just a copy of the NGMAST paper.
- Write a thesis outline and get started on the background chapters – message dissemination, and the social networks side (from Milgram onwards, etc.)
I created a thesis document in the repository too: https://erdos.ucd.ie/svn/clique/papers/MattStabelerThesis – I have started to compile some notes into background chapters, based on my transfer report. I have also started a rough outline of chapters. The latest version in PDF is here.
- Speak to Conrad about finding a non-overlapping Hierarchical CFA
Following the feedback at the CLIQUE mini-workshop, I asked Fergal, Conrad and Aaron again about community finding algorithms. They again suggested the Blondel (Louvain) method and InfoMap.
- Get a decent sub-set of STUDIVZ dataset
To get a sub-set of nodes in the STUDIVZ dataset, I start by picking N nodes at random and including their immediate neighbours. I then repeat this neighbour expansion so that the sample contains every node within L hops of the seed nodes. Using 10 random seed nodes with a depth of 2 yields a network of around 3500 nodes (12% of all nodes); reducing this to 5 seed nodes gives ~1000 nodes (~4%). Going the other way, 100 seed nodes with a depth of 1 gives 14571 nodes, covering ~50% of the network. These figures vary depending on which nodes are initially selected at random.
Currently, I'm testing the setup with 5 seed nodes and a depth of 2, in the hope that the neighbourhoods of the seeds will overlap.
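The random-seed sampling described above is essentially a breadth-first expansion out to L hops. A minimal sketch, assuming the network is held as a plain dict mapping each node ID to its neighbours (this representation and the function names `l_hop_sample` / `random_seed_sample` are hypothetical, not the actual code used on STUDIVZ); for a directed graph only the listed out-links are followed:

```python
import random


def l_hop_sample(adj, seeds, depth):
    """Return every node within `depth` hops of any seed (breadth-first).

    `adj` is assumed to be a dict: node -> iterable of neighbour nodes.
    """
    visited = set(seeds)
    frontier = list(seeds)
    for _ in range(depth):
        next_frontier = []
        for node in frontier:
            for nbr in adj.get(node, ()):
                if nbr not in visited:  # each node is expanded at most once
                    visited.add(nbr)
                    next_frontier.append(nbr)
        frontier = next_frontier
    return visited


def random_seed_sample(adj, n_seeds, depth, rng=None):
    """Pick `n_seeds` nodes uniformly at random, then take their L-hop neighbourhood."""
    rng = rng or random.Random()
    seeds = rng.sample(list(adj), n_seeds)
    return l_hop_sample(adj, seeds, depth)
```

For the current setup this would be `random_seed_sample(adj, 5, 2)`, and `len(sample) / len(adj)` gives the fraction of the network covered. Passing a seeded `random.Random` makes a particular sample reproducible, which helps given how much the sample size varies with the initial choice of nodes.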
Conrad suggests choosing the seed nodes non-randomly: first reduce our set of nodes to those with high activity (either number of posts or total length of posts), then take the network within L hops of the remaining nodes.
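Conrad's activity-based alternative can be sketched the same way, with the seeds chosen by an activity threshold instead of at random. This assumes a hypothetical `posts_by_node` dict mapping node IDs to lists of post texts, and a simple cut-off `min_activity`; the threshold (rather than, say, taking the top k nodes) is an illustrative choice, not something decided yet:

```python
def activity(posts, measure="count"):
    """Hypothetical activity score: number of posts, or total characters posted."""
    if measure == "count":
        return len(posts)
    if measure == "length":
        return sum(len(text) for text in posts)
    raise ValueError(f"unknown activity measure: {measure!r}")


def active_seed_sample(adj, posts_by_node, min_activity, depth, measure="count"):
    """Keep only nodes with activity >= min_activity as seeds, then expand `depth` hops.

    Returns (seeds, sampled_nodes). `adj` is assumed to be a dict: node -> neighbours.
    """
    seeds = [
        n for n in adj
        if activity(posts_by_node.get(n, []), measure) >= min_activity
    ]
    visited, frontier = set(seeds), seeds
    for _ in range(depth):
        next_frontier = []
        for node in frontier:
            for nbr in adj.get(node, ()):
                if nbr not in visited:  # breadth-first, each node added once
                    visited.add(nbr)
                    next_frontier.append(nbr)
        frontier = next_frontier
    return seeds, visited
```

Unlike random seeding, this is deterministic for a given threshold, so the sample size can be tuned directly by raising or lowering `min_activity`.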


