Archive

Author Archive

The problem parents

June 22nd, 2011

After we submitted the paper to FindingNEMO2011, I started to work on the Social Sensing dataset, and on Pádraig’s suggestion to create a Random community finding algorithm. Initial tests showed a peculiar phenomenon: all runs of BUBBLERap with KCLIQUE, across multiple parameters, were coming up with exactly the same results. Now, this really shouldn’t happen, so I looked further.

Unfortunately, or fortunately, depending on how you look at it, it is not a bug in the code. It is the result of a decision we made a long while back, to force each community finding algorithm to output a top-level community including all nodes. This works well for BubbleH, which uses targeted community routing. But, and in hindsight this is obvious, BUBBLERap simply looks at community membership, so when a node meets another node they are ALWAYS in the same community, and therefore it routes using ONLY local ranking. This has the effect of making our implementation simply centrality-based routing.
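
To make the degenerate case concrete, the sketch below shows the standard BUBBLERap forwarding test in simplified form (the helper lookups are hypothetical stand-ins for the simulator's community and ranking oracles, not our real API). With a forced all-node parent community, the membership test is always true, so the global-rank phase is never used.

[expand title=”click to view sketch”]

/**
 * Minimal sketch of the BUBBLERap forwarding rule, to illustrate the
 * degenerate case above. The helper methods are hypothetical stand-ins.
 */
class BubbleRapSketch {
  boolean sharesCommunity(int node, int dest) { /* community membership lookup */ return true; }
  double localRank(int node, int dest)        { /* rank inside dest's community */ return 0; }
  double globalRank(int node)                 { /* rank over the whole network  */ return 0; }

  boolean shouldForward(int carrier, int encountered, int dest) {
    if (encountered == dest) return true;              // destination met
    boolean carrierIn = sharesCommunity(carrier, dest);
    boolean otherIn   = sharesCommunity(encountered, dest);
    if (otherIn) {
      // Local phase: inside the destination's community, climb the local ranking.
      return !carrierIn || localRank(encountered, dest) > localRank(carrier, dest);
    }
    if (!carrierIn) {
      // Global phase: neither node is in the destination's community yet,
      // so bubble the message up the global ranking.
      return globalRank(encountered) > globalRank(carrier);
    }
    return false;
  }
  // With a forced all-node parent community, sharesCommunity() is always true,
  // so the global phase is unreachable and routing collapses to the local
  // (centrality) comparison alone.
}

[/expand]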

I spent the last couple of days re-engineering the configuration, so that we can now get results for communities with a global parent, and without a global parent. Shown below are the results from the FindingNEMO11 paper, and below that the updated results, where we have used no global parent for BUBBLERap and a global parent for BubbleH.

Results used in FindingNEMO11 paper

Results for ALL CFAs and Datasets, using Global Parents only for BubbleH

The new results show a slightly different picture, in that the improvements in delivery ratio and latency are less pronounced. One interesting thing is that for InfoMap, which partitions the graph, the results are exactly the same. This is because there is no overlap, which causes both algorithms to work in the same way, i.e. BUBBLERap uses global centrality to route between nodes of different communities, and BubbleH uses the local centrality in the GLOBAL parent to route to nodes not in smaller communities. This means that BubbleH is really designed solely for overlapping community structure. With regard to the hierarchical clustering of HGCE, we see that it performs poorly for Cambridge, but roughly equally for the others. LinkClustering for BUBBLERap in InfoCom2005 and MIT-NOV performs very poorly on Delivered Hops. This could be down to it creating a large number of communities.

Global versus No Global parent for LinkClustering BUBBLERap on MIT November.

It seems the difference is very pronounced in LinkClustering, but it is not a problem with the results. The differences between the results for HGCE are the result of randomness (perturbations) in each run. This may result in withdrawing the paper from SMW11, which is very unfortunate.

My next step is to finish getting the Social Sensing dataset prepared (by testing multiple CFA parameters against it) and add it to the matrix plot. Then, I will finish doing the Random runs against all of the datasets, to see if the CFAs have any real effect on performance, or whether it is insignificant. (!)

Finally, if things go to plan, I will continue working on the twitter based dataset, and perhaps the new sensor dataset from St Andrews.

Categories: Uncategorized

Social Sensing Dataset

June 15th, 2011

I finally got around to extracting contact events from the social sensing study. I have made some notes on the Social Sensing Dataset page. The main images are shown below:

Social Sensing Dataset activity, 15mins, hourly, daily, 3 daily, and weekly. Lines showing the most active period

It seems most active from Oct 2008 onwards, and tails off, with the end of Nov 2008 being a good cut-off point. This means there is over a year of contact data.

Force directed connected time graph for SocialSensing dataset, showing mapping of users to nodes, with low weight edges removed. NOTE: Do not share this image - remove names/btmacs first

The connected time graph (i.e. edges weighted by the total length of time connected with the target node) gives a really good sense of the social structure. As I personally know most of these people (and I am on it myself), it is easy for me to see the relevance of the layout.
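
For reference, those edge weights are just the summed contact durations per node pair; a minimal sketch of the aggregation is below, assuming a simple list of contact events rather than the real parser's classes.

[expand title=”click to view sketch”]

import java.util.HashMap;
import java.util.Map;

// Sketch only: total connected time per undirected node pair, assuming each
// contact event carries (nodeA, nodeB, startSeconds, endSeconds).
class ConnectedTimeSketch {
  record Contact(int a, int b, long start, long end) {}

  static Map<String, Long> edgeWeights(Iterable<Contact> contacts) {
    Map<String, Long> weights = new HashMap<>();
    for (Contact c : contacts) {
      // Canonical key so (a,b) and (b,a) count towards the same edge.
      String key = Math.min(c.a(), c.b()) + "-" + Math.max(c.a(), c.b());
      weights.merge(key, c.end() - c.start(), Long::sum);
    }
    return weights;
  }
}

[/expand]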

I haven’t written code to look up locations based on cell towers/wifi points yet, but it might be interesting in the future if we end up using location.

Also, I need to work on getting BTMAC addresses for the three other people. The problem is that I know that at least one of these people changed phones during the study, which is why they are removed.

Extracting contact networks from less obvious datasets.

June 14th, 2011

There are three (or more) possible routes from this point: we could delve deeper into the merits of using community structure in routing, and try to categorise different networks based on the performance differences of routing in these diverse contact networks; we could concentrate on the mechanics of making the existing system fully distributed, and therefore open up the possibility of real-world deployment; or we could add a new aspect, and find out what location contributes when put into the mix. However, there is a problem with all of these: we don’t have the datasets to measure any of them.

Pádraig and I recently discussed relaxing the specification on datasets, to give us larger and more robust data to deal with.

Conrad has a dataset based on wall posts on a social networking site, which we could use as an analogue to a human contact network: a contact event occurs when a wall post is made by one person to another, and the ‘duration’ of the contact is based on the length of the wall post.

The Enron dataset has been used in the literature (Kleinberg vector clock paper) before, and we could adapt it in the same way, using the emails to build a network of contacts, with message length (if available) as an indicator of contact duration.
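
As a throwaway sketch of how a wall post or email could be turned into a synthetic contact event (the record fields and the seconds-per-character scaling are illustrative assumptions, not a decision we have made):

[expand title=”click to view sketch”]

// Sketch only: map a timestamped message (wall post / email) to a contact
// event whose duration is proportional to the message length.
class MessageToContactSketch {
  record Message(int from, int to, long timestamp, int length) {}
  record Contact(int a, int b, long start, long end) {}

  static Contact toContact(Message m, double secondsPerCharacter) {
    // Assumed scaling: duration grows with message length, minimum one second.
    long duration = Math.max(1, Math.round(m.length() * secondsPerCharacter));
    return new Contact(m.from(), m.to(), m.timestamp(), m.timestamp() + duration);
  }
}

[/expand]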

Pádraig also mentioned a blogging network, from which we could extract contacts from links, or trackbacks/pingbacks etc.(?)

Finally, the Sentiment project already collects a large number of tweets, along with location information about them. Talking to Anthony Brew, it looks like the coverage may be limited, but I think it is worth a look over the data to see if there is enough to build a reliable network from directed messages (@someone). The problem is that we will only have a percentage of the tweets from the Twitter API (apparently), but we could get around this by making use of whatever (rate limited) data we can get from Twitter about an interesting group of specific people, rather than just looking at everyone.

Categories: Datasets

Extending Datasets

May 31st, 2011

Pádraig had the idea to try to eliminate Delivery Ratio from our analysis; this way we could concentrate specifically on Cost and Latency. He suggested that we take the normal dataset and extend it to three times the length, repeating the events in the correct time ordering. Then we could run the simulations 10 times, each with a different start time in the original portion of the dataset; this would give each message a much better chance of being delivered over the three iterations of the dataset.

I implemented this by taking a slice of the MIT dataset, processing the events contained within it, and duplicating them, adding the length of the slice times the iteration to each timestamp (i.e. MIT-NOV, MIT-NOV+1 length, MIT-NOV+2 lengths). Any contact events that were active at the start or end time of the slice were truncated to start/end at the slice boundary. I verified the resulting dataset by counting the number of connections every day, and ensuring that there was a distinct repeating pattern. For each dataset, I split the duration between the start and end time of the original dataset into 10 equal chunks, and calculated the corresponding start time for each. For MIT and Cambridge this equates to roughly every three days, whereas in the InfoCom datasets there is a much smaller gap between start times (~14-20 hours).
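
A minimal sketch of that extension step is below, assuming contacts are simple (nodeA, nodeB, start, end) tuples rather than the simulator's real event classes; the second method computes the evenly spaced start times listed in the expand block that follows.

[expand title=”click to view sketch”]

import java.util.ArrayList;
import java.util.List;

// Sketch of the extension described above: truncate contacts to the slice,
// then replay the slice with offsets of 0, 1 and 2 slice-lengths.
class ExtendDatasetSketch {
  record Contact(int a, int b, long start, long end) {}

  static List<Contact> extend(List<Contact> original, long sliceStart, long sliceEnd, int copies) {
    long sliceLength = sliceEnd - sliceStart;
    List<Contact> extended = new ArrayList<>();
    for (int i = 0; i < copies; i++) {
      long offset = i * sliceLength;
      for (Contact c : original) {
        // Truncate contacts that are active across the slice boundaries.
        long start = Math.max(c.start(), sliceStart);
        long end = Math.min(c.end(), sliceEnd);
        if (end > start) {
          extended.add(new Contact(c.a(), c.b(), start + offset, end + offset));
        }
      }
    }
    return extended;
  }

  // n evenly spaced message start times within the original slice.
  static long[] startTimes(long sliceStart, long sliceEnd, int n) {
    long[] times = new long[n];
    for (int i = 0; i < n; i++) {
      times[i] = sliceStart + i * (sliceEnd - sliceStart) / n;
    }
    return times;
  }
}

[/expand]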

[expand title=”start times”]

mit-nov 2004-11-10_00:00:00 - 2004-12-11_00:00:00
2004-11-10_00:00:00
2004-11-13_02:24:00
2004-11-16_04:48:00
2004-11-19_07:12:00
2004-11-22_09:36:00
2004-11-25_12:00:00
2004-11-28_14:24:00
2004-12-01_16:48:00
2004-12-04_19:12:00
2004-12-07_21:36:00

infocom-2005 2005-03-06_00:00:00 - 2005-03-12_00:00:00
2005-03-06_00:00:00
2005-03-06_14:24:00
2005-03-07_04:48:00
2005-03-07_19:12:00
2005-03-08_09:36:00
2005-03-09_00:00:00
2005-03-09_14:24:00
2005-03-10_04:48:00
2005-03-10_19:12:00
2005-03-11_09:36:00

infocom-2006 2006-04-24_00:00:00 - 2006-05-02_00:00:00
2006-04-24_00:00:00
2006-04-24_19:12:00
2006-04-25_14:24:00
2006-04-26_09:36:00
2006-04-27_04:48:00
2006-04-28_00:00:00
2006-04-28_19:12:00
2006-04-29_14:24:00
2006-04-30_09:36:00
2006-05-01_04:48:00

cambridge 2005-10-28_00:00:00 - 2005-12-23_00:00:00
2005-10-28_00:00:00
2005-11-02_13:30:00
2005-11-08_04:00:00
2005-11-13_18:30:00
2005-11-19_09:00:00
2005-11-24_23:30:00
2005-11-30_14:00:00
2005-12-06_04:30:00
2005-12-11_19:00:00
2005-12-17_09:30:00

[/expand]

Running Simulations

We decided to remove the overhead of running the CFAs with multiple parameters by choosing a set of parameters for each that had performed well in previous tests. The list below shows the ones we chose:

[expand title=”CFA Parameters”]

MIT-NOV
	HGCE
	  Bubble
	  K	A	M	E	P	S	R	ST	MAP	Z
	  3	1.3	0	0.25	0.2	4	100	0.7	0.5	0.1

	  BubbleH
	  K	A	M	E	P	S	R	ST	MAP	Z
	  4	1.3	0	0.25	0.3	4	100	0.9	0.9	0.1
			# Communities AverageMembers
			# 21 15
			2 3 4 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 23 24 25 27 29 30 31 32 34 35 36 37 38 39 41 42 43 45 46 47 48 50 51 53 54 55 57 58 59 60 61 62 63 65 66 67 68 69 70 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 94 95 97 98 99 100 101 102 103
			6 8 14 16 18 30 41 46 53 59 61 63 78 79 86 88 91 95 101 102
			6 13 17 29 30 47 86 88 95 101
			10 25 62 65 74 83 102
			3 25 30 62 74
			4 23 30 45 57 91
			9 30 42 101
			19 27 35 37 39 81 89 102
			2 41 51 87 88 95 102 103
			15 41 43 51 67 69 87 88 95 100 103
			19 27 41 66 85 98
			15 43 46 58 61 67 69 78 79 83 85 90 95 97 100 103
			2 7 11 15 57 61 67 69 78 79 85 87 88 90 95 97 100 101 103
			6 7 8 10 11 14 15 16 17 21 30 31 34 35 38 41 43 46 55 58 61 69 75 78 79 80 83 85 86 88 90 91 94 95 97 101 102
			8 10 15 16 17 30 38 41 46 53 63 73 80 83 85 95 97 99 101 102
			10 14 18 25 35 46 65 83 86 88 90 102
			10 17 25 35 53 72 85 88
			10 25 53 72 74 78
			27 30 50 95 97 99
			4 12 19 20 23 24 27 30 32 36 37 39 41 45 48 50 54 59 60 68 70 76 77 81 82 84 89 98
			19 27 37 39 41 68 81 98

  KCLIQUE
	  BubbleH
	  K	MCT
	  3	0.01
			# k = 3
			# minimum_connected_ratio = 0.01
			# number_of_communities = 4
			# community_sizes = [5, 3, 45, 24]
			# average_size = 19.25
			103 2 67 100 87
			99 27 69
			6 7 8 10 11 13 14 15 16 17 18 21 29 30 31 34 35 38 41 43 44 46 53 55 58 59 61 63 67 73 75 78 79 80 83 85 86 88 90 91 95 97 100 101 102
			68 70 76 12 77 81 82 19 84 23 24 89 32 98 36 37 39 45 48 50 54 20 59 4 

	  Bubble
	  K	MCT
	  3		0.01	

  LinkClustering
	  BubbleH
	  ET
	  0
			# D_max: 0.591406681046
			# S_max: 0.596774193548
			# Communities: 79
			# Average size : 8
			# Nodes : 95 (seen 635 times)
			3 72
			51 73
			58 30 77 50 102 88 89 97 53 67 82 69 81 86 87 84 85 24 20 21 48 23 46 45 43 41 4 7 6 8 68 11 83 13 99 76 75 38 70 91 90 100 101 95 29 79 78 15 10 39 12 59 14 17 98 19 54 31 16 37 36 35 55 63 32 61 103
			25 72
			62 45
			74 65
			72 35
			77 19 74 20
			19 9 77
			25 84
			103 67 47
			3 30
			25 65
			44 73
			74 73
			9 2
			83 61 88 67 102 90 100 81 95 87 79 85 11 10 101 21 17 16 86 41 2 7 103
			25 74
			27 74
			91 10 17 16 30 88 41 61 62 102 83 101 86 79
			61 88 67 91 83 100 101 86 87 97 85 69 95 57 30 41 7 103 78 31 90
			3 74
			19 98 66 24
			76 64
			4 45 50 72
			102 63 30 33 101
			62 65
			24 20 48 19 32 37 77 98 89 73 54 59
			10 21 61 88 63 90 91 83 101 102 94 97 85 95 25 86 58 17 16 55 18 30 43 53 41 46 69 7 8 103 78
			1 83 30
			9 72
			98 36 77 76 89 70 68 20 84 24 39 12 59 48 23 19 54 82 45 37 50 4 32
			30 42 101
			18 84
			42 72
			90 28 103
			44 27
			25 3
			25 35
			25 62
			102 88 91 90 80 101 95 87 79 85 11 10 81 15 21 14 16 46 86 31 30 61 43 35 41 97 7 6 9 8 18
			60 81 30
			9 52
			52 101 15 16 30 41
			64 73
			39 27 19 37 50 36 98 89 4 68
			99 63 73 102 83 101 95 87 97 85 11 10 38 15 14 17 16 46 31 30 53 41 94 6 8
			73 72
			83 29 88 65 90 102 103 80 86 79 78 10 38 15 21 55 18 31 43 35 46 69 7
			61 63 66 67 102 83 100 101 95 85 10 14 16 46 41 55 69 7 8 78 90
			11 13 58 17 16 47 31 30 29 88 95 83 101 86
			23 34
			93 40
			9 57
			21 16 30 40 41 61 88 6 102 101 86 14
			27 66
			67 43 61 88 63 53 90 69 80 81 86 87 85 21 46 95 29 41 79 7 6 8 83 99 75 91 103 100 101 102 94 97 78 11 10 13 38 15 58 17 16 55 18 31 30 35 34 14
			9 42
			11 13 14 17 30 35 41 61 88 6 93 101 86 87 79
			73 66
			74 72
			3 53
			87 15 95 51 43 41 88 67 102 8 103 69 100
			7 21 43 99 75 64 95 90 101 86 79 78 10 58 55 18 31 30 29 88 6 8
			3 62
			10 88 72 17 69 85 87 53 78
			62 74
			62 28
			9 74
			77 42
			60 98 20 54 77 50
			2 51
			12 2
			91 10 101 14 31 30 53 34 99 61 74 102 83 41 95 78
			57 45 4 23
			9 25
			14 61 44 30 53 41 102 88 7 6 91 83 101 95 79
			11 83 99 61 88 63 90 102 103 100 81 69 94 97 85 91 10 27 15 21 14 16 55 95 31 30 35 41 80 79 101 6 8

	  Bubble
	  ET
		0.003

	InfoMap
			# InfoMap: Generated from {datasets/communities/EXTENDED_DATASETS/InfoMap/mit-nov/edge_list.dat.tree}
			# Generated on {Fri, 27 May 2011 11:03:51 +0100}
			# number_of_communities = 11
			# avg size = 9
			# nodes = 95
			19 20 24 32 12 54 98 76 89 50 77 36 70 48 68 4 84 45 37 82 59 23 39 81 60
			30 102 86 14 6 91 61 41 16 101 88 79 18 53 35 63 44 9 40 93 52 72 33 42
			15 97 85 100 80 83 38 73 1
			90 95 29 46 58 8 7 75 43 66 64 94
			67 103 87 2 51 57 47
			21 55 78 34
			11 17 13
			31 10
			69 27 99
			25 65 74 3
			62 28

InfoCom2005
  HGCE
	  Bubble
	  K	A	M	E	P	S	R	ST	MAP	Z
	  3	1	0	0.25	0.2	1	100	0.5	0.5	0.1
	  BubbleH
	  K	A	M	E	P	S	R	ST	MAP	Z
	  3	1	0	0.25	0.2	1	100	0.6	0.9	0.1

			# Communities AverageMembers
			# 2 27
			1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 32 33 34 35 36 37 38 39 40 41
			1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 32 33 34 35 36 37 38 39 40 41

  KCLIQUE
	  BubbleH
	  K	MCT
	  4	0.009

			# k = 4
			# minimum_connected_ratio = 0.009
			# number_of_communities = 1
			# community_sizes = [37]
			# average_size = 37.0
			1 2 3 4 5 6 7 8 10 11 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 32 33 34 35 37 38 39 40 41 

	  Bubble
	  K	MCT
	  2	0.11

  LinkClustering
	  bubbleH
	  ET
	  0.008
			# D_max: 0.398456726708
			# S_max: 0.368421052632
			# Communities: 15
			# Average size : 7
			# Nodes : 40 (seen 99 times)
			9 30
			21 41
			12 23
			11 25 26 35 15 41 13 14 24 10 39 27 20 21 22 16 33 32 37 29 40 34 1 19 3 5 4 7 6 8 17 18
			38 36
			10 39 15 22 33 37 36 34 1 13
			24 27 39 2 14 35 34
			24 25 13 15 9 5 37 29 34
			18 12
			2 23
			11 39 27 32 30 37 40 35 5 4 28
			26 22 23 33 8 6 18 34
			9 12
			39 38 40 22 19 32 35 4

	  bubble
	  ET
	  0.04

	InfoMap
			# InfoMap: Generated from {datasets/communities/EXTENDED_DATASETS/InfoMap/infocom-2005/edge_list.dat.tree}
			# Generated on {Fri, 27 May 2011 11:03:59 +0100}
			# number_of_communities = 3
			# avg size = 14
			# nodes = 41
			40 32 39 1 27 22 15 35 29 19 24 17 4 25 8 13 30 10 9 26 14 36 5 34 11 6 16 38 21 3 23 28 41 2 12 31
			33 20 37
			7 18

InfoCom2006
  HGCE
	  Bubble
	  K	A	M	E	P	S	R	ST	MAP	Z
	  3	1	0	0.25	0.2	1	100	0.7	0.5	0.2
			# Communities AverageMembers
			# 2 49
			22 23 24 25 26 27 28 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 84 85 86 87 88 89 90 91 92 93 94 95 96 98
			22 23 24 25 26 27 28 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 84 85 86 87 88 89 90 91 92 93 94 95 96 98

	  BubbleH
	  K	A	M	E	P	S	R	ST	MAP	Z
	  3	1	0	0.25	0.3	4	100	0.6	0.5	0.2

  KCLIQUE
	  BubbleH
	  K	MCT
	  3	0.02

			# k = 3
			# minimum_connected_ratio = 0.02
			# number_of_communities = 1
			# community_sizes = [58]
			# average_size = 58.0
			22 25 26 27 28 31 32 34 35 36 37 38 39 40 43 44 45 46 47 49 50 51 52 53 54 55 56 57 58 60 61 63 64 66 67 70 72 73 74 75 76 77 78 79 81 82 83 84 85 86 87 88 89 90 91 93 95 98 

	  Bubble
	  K	MCT
	  3	0.02

  LinkClustering
	  BubbleH
	  ET
	  0.01
			# D_max: 0.352182263797
			# S_max: 0.407407407407
			# Communities: 95
			# Average size : 4
			# Nodes : 78 (seen 387 times)
			86 35
			38 40 49 36 35
			44 56
			83 48
			24 62 27
			68 69 21
			22 67
			36 48
			30 59
			46 44
			56 71
			91 39
			92 48
			58 94
			83 40
			56 94 40
			59 43
			80 65
			56 40 48
			24 60 62 51 75 52 77 76 88 89
			46 39 86
			47 56 40
			77 58 44 45 51 43 41 60 75 89 73 71 70 64 80 92 95 93 74
			25 38 58 49 32 57 56 88 40 43 60 75 89 53 52 86
			68 48
			46 26 39
			26 80
			39 74 89 46 54 88 53 52
			26 56
			74 97
			56 29
			54 48
			26 97
			46 85
			60 25 75 22 57 51 78
			24 62
			54 41
			76 58 82 32 44 30 51 52
			38 22 49 56 40
			47 22
			87 65 72
			26 58 53 43
			31 65
			83 73 23
			24 80
			88 57 70 28 50 77 75 89 66 67 82 31 93 81 44 84 85
			42 48
			24 62 48
			69 21
			47 45 28 43 60 88 89 73 79 64 93 81 86 84 78
			54 22
			73 29
			56 73 40 49
			80 59
			45 28
			24 62 36
			68 65
			22 48
			65 96
			42 29
			86 85
			25 44
			71 41
			51 21
			63 38 59
			47 92
			25 67
			31 35 49
			29 43
			89 58 43 37 36 53 34
			39 30
			22 85
			97 85
			92 71
			74 64 73 68 51 45 86 43 52
			76 66 91 90 31 96 93 79 78
			59 52
			77 58 32 33 86 45 51 43
			25 65
			83 23
			46 39 59
			30 48
			33 48
			46 39
			48 51 53 76 89 64 66 82 73 80 84 78
			27 49 55 56 35 61 63 40 38
			25 26 74 54 57 75 52
			26 22
			68 47
			69 65
			61 27 51 43
			28 60 88 89 64 66 67 82 80 81 86 87 84 25 44 45 42 43 34 77 76 75 74 73 72 70 91 90 93 95 79 78 51 58 98 32 57 37 50 53 52 54 31
			27 48
			37 59 34

	  Bubble
	  ET
	  0.09  

	InfoMap
			# InfoMap: Generated from {datasets/communities/EXTENDED_DATASETS/InfoMap/infocom-2006/edge_list.dat.tree}
			# Generated on {Fri, 27 May 2011 11:04:18 +0100}
			# number_of_communities = 5
			# avg size = 16
			# nodes = 78
			78 93 76 66 60 74 64 82 77 81 45 72 84 88 28 43 89 90 44 53 51 67 79 87 37 70 75 86 91 56 34 52 73 48 58 40 95 41 32 80 71 50 31 54 26 57 85 47 92 98 36 24 30 33 59 42 62 96 65 23 29 83 94 97 21
			38 35 49 63 55 27 61
			25 22
			39 46
			68 69

Cambridge
  HGCE
	  BubbleH
	  K	A	M	E		P	S	R	ST	MAP	Z
	  5	1.6	0	0.25	0.2	4	100	0.6	0.5	0.1
			# Communities AverageMembers
			# 4 23
			1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
			1 2 3 4 5 6 7 9 10 11 13 14 15 17 18 19 20 21 23 25 28 29 30 32 33 34
			2 3 5 6 7 8 9 10 11 12 13 16 18 20 21 22 25 27 28 29 31 32 33 34 35 36
			1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 17 18 19 20 23 24 26 29 30 31 33 35

	  Bubble
	  K	A	M	E	P	S	R	ST	MAP	Z
	  5	1	0	0.25	0.3	4	100	0.9	0.9	0.2

  KCLIQUE
	  BubbleH
	  K	MCT
	  2	0.02
			# k = 2
			# minimum_connected_ratio = 0.02
			# number_of_communities = 3
			# community_sizes = [2, 2, 11]
			# average_size = 5.0
			1 4
			24 26
			32 34 3 36 7 16 21 22 25 27 28 

	  Bubble
	  K	MCT
	  2	0.04

  LinkClustering - bubbleH
	  ET
	  0.004
			# D_max: 0.75851421306
			# S_max: 0.5
			# Communities: 12
			# Average size : 5
			# Nodes : 34 (seen 55 times)
			11 25 27 21 22 16 32 28 36 34 3 12 7
			9 5 19 15 33
			2 34
			26 35
			3 6
			13 15 6 19 20
			9 13
			10 20 14 17 23 19 18 30 1 33 2 5 4 6 15
			2 35
			24 26
			11 21 22 29 7

	  Bubble
	  ET
	  0.003

	InfoMap
			# InfoMap: Generated from {datasets/communities/EXTENDED_DATASETS/InfoMap/cambridge/edge_list.dat.tree}
			# Generated on {Fri, 27 May 2011 11:04:24 +0100}
			# number_of_communities = 3
			# avg size = 12
			# nodes = 36
			34 21 36 22 25 32 28 3 7 16 27 12 11 29 8
			18 33 15 14 5 20 19 4 6 2 23 1 17 30 10 13 9 31
			26 35 24

[/expand]

I originally made the decision to use separate parameters for Bubble and BubbleH, but in hindsight this was not a good idea, as they cannot easily be compared.

When we discussed the results, we realised that KCLIQUE seems to perform well for latency with BubbleH, despite not having much hierarchical structure. Pádraig suggested I investigate this further, to find out what is happening. In the meantime I have re-run the simulations, using the best parameters for BubbleH – to level the playing field in terms of parameters. As is obvious from the above community structures, only HGCE produces communities with one all-encompassing community, and in the case of KCLIQUE for both InfoCom datasets, it produces only one community – this seems odd, as we would expect the best performance from a more complicated structure.

This resulted in the following community structure.

[expand title=”Community Structures”]

MIT-NOV
  HGCE
    comm_count	avg_member_count	min_nodes	max_nodes
    21	        15.80952381     	4	        87
  KCLIQUE
    comm_count	avg_member_count	min_nodes	max_nodes
    4	          20.25	            4	        46

  InfoMap
    comm_count	avg_member_count	min_nodes	max_nodes
    11          8.636363636     	2	        25

  LinkClustering
    comm_count	avg_member_count	min_nodes	max_nodes
    78	          8.141025641     2       	67

INFOCOM-2005
  HGCE
    comm_count	avg_member_count	min_nodes	max_nodes
    2	          40     	          40	      40

  KCLIQUE
    comm_count	avg_member_count	min_nodes	max_nodes
    1	          38     	          38	      38

  InfoMap
    comm_count	avg_member_count	min_nodes	max_nodes
    3	          13.66666667     	2	        36

  LinkClustering
    comm_count	avg_member_count	min_nodes	max_nodes
    14          7.071428571     	2	        32

INFOCOM-2006
  HGCE
    comm_count	avg_member_count	min_nodes	max_nodes
    2	          73     	          73	      73
  KCLIQUE
    comm_count	avg_member_count	min_nodes	max_nodes
    1	          59                59       	59
  InfoMap
    comm_count	avg_member_count	min_nodes	max_nodes
    5	          15.6            	2       	65
  LinkClustering
    comm_count	avg_member_count	min_nodes	max_nodes
    94          4.117021277     	2	        43

CAMBRIDGE
  HGCE
    comm_count	avg_member_count	min_nodes	max_nodes
    4	          28.75     	      23	      36

  KCLIQUE
    comm_count	avg_member_count	min_nodes	max_nodes
    3           6     	          3       	12

  InfoMap
    comm_count	avg_member_count	min_nodes	max_nodes
    3           12                3         18

  LinkClustering
    comm_count	avg_member_count	min_nodes	max_nodes
    11          5	                2	        15

[/expand]

Delivery Ratio, Cost, Latency and Delivered Hops for each CFA and for each Dataset, for BubbleRAP and BubbleH. (OLD PLOT - SEE BELOW FOR NEW PLOT)

In the results above, there seems to be something odd happening with LinkClustering for BUBBLERap: for Cost and Hops in MIT-NOV and InfoCom2005 it has much higher values than the other CFAs. Also, there appears to be something odd happening with Latency in the Cambridge dataset, with a maximum of ~1000 hours latency, which is about 40 days. The MIT dataset is roughly the same length, but doesn’t get anywhere near the same latency.

The two plots below show Delivery Ratio and Latency for BubbleH, HGCE for the Cambridge dataset.

Delivery Ratio for HGCE-BubbleH, on the Cambridge dataset

Latency for HGCE-BubbleH, on the Cambridge dataset

These seem to show that at the points where the new parts of the dataset are joined, there is a huge jump in deliveries of messages. It also seems that the datasets are truncated at the end. I will double check the start/end times, but this could be due to the original dataset having no events after a certain timestamp, so that in the last section the simulation stops when there are no events left to process. The huge spike in messages delivered explains the very high and very low extremes – messages that were sent in the last iteration of message sending get delivered very quickly. It seems odd, however, that the same is not true for the first iteration. It could be that, because the last set of messages have already started to be distributed, maybe only one hop away, these nodes are in a much better position to deliver the messages created at the start of the next section. This could be due to higher activity in the first hours of the dataset collection, perhaps even due to all the nodes being switched on in the same place at the same time. If this is the case, then we need to start the dataset slice slightly later, to avoid this. The activity traces for each dataset are shown below:

MIT-Nov Daily Connections

Cambridge Daily Connections

Infocom2005 Hourly Connections

Infocom2006 hourly connections

From these we can see that MIT has a good distribution of connectivity across the dataset, but the other three have only short spikes of activity. This in itself does not make a difference to the comparison of results for each dataset, as they are all relative, but it does go some way to explain the very high latency figures for Cambridge.

The issues with KCLIQUE

I looked back at the data for KCLIQUE in the NGMAST simulations, where there were multiple parameters to KCLIQUE, resulting in varying degrees of community structure. For all datasets, the delivery ratio for KCLIQUE seems to increase with member count, whereas community count peaks before delivery ratio really gets going.

Delivery Ratio, Cost, Community Count and Average Member count.

This makes sense, because for BubbleH, the more we know about members of the communities, the more messages we can direct using local ranking. And, because we haven’t included an overall community for KCLIQUE (or InfoMap*, or LinkClustering), we have no way to deliver messages to nodes not in a community, until a node holding a message comes into contact with the destination node. I’m not sure if we designed it this way or whether this is just a bug.

The next step is to fix this issue, by either:

  1. adjusting the BubbleH algorithm to use global rank to pass messages between nodes at the highest level, or
  2. forcing a global community in the other community finding algorithms (this will have the same effect as 1; a sketch follows the footnote below)

* InfoMap includes every node in the communities list, but does not include a global parent community
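
As a rough sketch of option 2, assuming communities are held as simple sets of node ids (not the simulator's real community classes), forcing a global parent just means appending one community containing every node to the CFA output:

[expand title=”click to view sketch”]

import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch of option 2: append a forced global parent community containing
// every node to whatever the CFA produced.
class ForceGlobalParentSketch {
  static List<Set<Integer>> withGlobalParent(List<Set<Integer>> communities, Set<Integer> allNodes) {
    List<Set<Integer>> result = new ArrayList<>(communities);
    Set<Integer> parent = new HashSet<>(allNodes);
    for (Set<Integer> community : communities) {
      parent.addAll(community);   // make sure no community member is missed
    }
    result.add(parent);
    return result;
  }
}

[/expand]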

Update:

I have now run all of the datasets with a forced global parent community; this seems to have evened the playing field somewhat:

Results where every CFA is forced to produce a global parent community including all nodes. (NOTE the change in axis labels from previous run)

The interesting thing is that there is very little difference in delivery ratio and cost using BUBBLERap.

Results for NGMAST11/NEMO

May 23rd, 2011

We have decided not to submit to the NGMAST11 conference, as the results we have were not really ready; I also found an error in the config setup for KCLIQUE BubbleH. I have therefore run all of the simulations from scratch, just to make sure it is all correct. In this re-run, I ran the HGCE algorithm far too many times for each dataset, so there is a very large number of results for the HGCE algorithm (~700 for each dataset). We plan to submit to the Finding NEMO workshop instead.

To visualise the results so far, we have plotted the best (see note below) results for BubbleRAP and BubbleH:

BUBBLERap and BubbleH compared for Delivery Ratio, Cost and Latency

To determine the ‘best’ run for algorithms with multiple parameters, we first filtered the results to the best 10 for latency, then ordered by delivery ratio followed by cost. This should mean that a good delivery ratio is still shown alongside a good delivery latency, but avoids results where an abnormally low latency is achieved at the expense of a very poor delivery ratio.
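
For clarity, that selection can be expressed as: keep the ten runs with the lowest latency, then take the one with the highest delivery ratio, breaking ties on cost. A sketch, assuming a simple Run record rather than our results tables:

[expand title=”click to view sketch”]

import java.util.Comparator;
import java.util.List;

// Sketch of the 'best run' selection described above; the Run record is an
// illustrative assumption, the real values come from the simulation results.
class BestRunSketch {
  record Run(String params, double deliveryRatio, double cost, double latency) {}

  static Run best(List<Run> runs) {
    Comparator<Run> byLatency = Comparator.comparingDouble(Run::latency);
    Comparator<Run> byRatioThenCost =
        Comparator.comparingDouble(Run::deliveryRatio).reversed()  // highest delivery ratio first
                  .thenComparingDouble(Run::cost);                 // then lowest cost
    return runs.stream()
        .sorted(byLatency)        // best (lowest) latency first
        .limit(10)                // keep the ten best-latency runs
        .min(byRatioThenCost)     // then pick by delivery ratio and cost
        .orElseThrow();
  }
}

[/expand]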

Initial results – HGCE, KCLIQUE, InfoMap and LinkClustering

May 7th, 2011

We ran each community finding algorithm over the datasets: MIT-NOV, Cambridge, InfoCom 2005, and InfoCom 2006. We used BubbleRAP, BubbleH, Unlimited Flood, Hold & Wait, and Prophet to measure the effect of the CFA on the dataset. The results below are grouped by dataset. Only the best results for BubbleH/BubbleRAP with KCLIQUE and HGCE are shown. Note: The local ranking used for InfoMap is based on the rank given by the InfoMap algorithm, NOT centrality as with BubbleRAP and BubbleH.

MIT-NOV

CAMBRIDGE

InfoCom 2005

InfoCom 2006

My next plan is to link statistics about community structure (number of communities, average members etc.) to the results.

Also, I want to find out if there is an existing measure of ‘overlappiness’. Does the amount of overlap in communities affect the results in any way? Also, what about the extent to which there is ‘nesting’ of communities – does this make a difference?

I also want to visualise the communities produced in some way, so that it is clearer what the network structure for the best results is based on.

This Excel file contains the combined results of each CFA for all 4 datasets, so we can compare the effect of community structure (number of communities, average member size) directly with the results for each simulation run.

Next steps

April 27th, 2011

In the last few days, I have been trying out new CFAs for use in the next paper for NGMAST11; the idea behind the paper is to explore different CFAs and datasets and see what happens. In order to do this, I have decided to first explore what the different CFAs do using the MIT-NOV dataset, for both BubbleRAP and BubbleH, i.e. use the communities found by each to drive both DTN algorithms. This gives a fair comparison. Then, I will perform the same tests on different datasets. I envisage problems for some CFAs, as the datasets we intend to use are in some cases small, and may not yield good community results, and therefore we will explore thresholding of edges before finding communities (depending on whether the CFA can accept weighted edges or not).

The following tables sum up the tasks involved, and will be completed as I get the work done:

[TABLE=2]

Some notes so far:

The undirected, weighted version of InfoMap gives its own ranking for nodes within communities, which can be used in BubbleRAP and BubbleH, so the two versions could be compared. The hierarchical version also does this, but only for the finest level community.

It might be worth writing a new community dataset loader for ContactSim/LocationSim so it can take in and calculate the hierarchy. For example, at the moment each community is defined by a list of nodes, but if we prefix this list with a parent id, it will implicitly specify the hierarchy. (This may be time consuming, but worthwhile down the line, as we can do away with HierarchyOracle and associated classes, and can work directly on communities.)
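
A quick sketch of what that loader could look like, assuming each line of the proposed format is "parentId node node ..." with a negative id marking a root community (this layout is part of the proposal above, not the current ContactSim format):

[expand title=”click to view sketch”]

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of the proposed parent-prefixed community loader. Community ids are
// implicitly the line numbers; a negative parent id marks a root community.
class HierarchyLoaderSketch {
  final Map<Integer, Integer> parentOf = new HashMap<>();        // communityId -> parentId
  final Map<Integer, List<Integer>> members = new HashMap<>();   // communityId -> member node ids

  void load(List<String> lines) {
    int communityId = 0;
    for (String line : lines) {
      String[] parts = line.trim().split("\\s+");
      parentOf.put(communityId, Integer.parseInt(parts[0]));     // first token is the parent id
      List<Integer> nodes = new ArrayList<>();
      for (int i = 1; i < parts.length; i++) {
        nodes.add(Integer.parseInt(parts[i]));                   // remaining tokens are members
      }
      members.put(communityId, nodes);
      communityId++;
    }
  }
}

[/expand]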

Results so far:

Community finding

InfoMap, which partitions the graph, is easy to visualise:

InfoMap clustering of MIT-NOV dataset - Colour of nodes indicates clustering (edges are coloured by connecting node and weighted by connected time)

The others are harder, however, as I could think of no easy way of visually indicating all the communities a node belongs to; the following shows LinkClustering.

LinkClustering communities on MIT-NOV where the size and colour of the node indicate the number of communities it belongs to

The communities found by HGCE are multiplied across the many parameters to the algorithm, so they are much harder to visualise in terms of nodes and edges; however, the plot below shows the effect on community size and number of communities when changing parameters to HGCE. The graph is ordered by MAP, ST and average nodes per community. Details about the parameters to HGCE are summarised here.

MIT NOV, HGCE multiple parameters

BubbleH and BubbleRAP

Categories: Datasets

BubbleH and SMW11

April 19th, 2011

Our paper for SMW11 was accepted, and now we will be able to include the results from the bug-fixed version of BubbleH, which performs very well. An addition that Pádraig suggested was to ignore level information in the hierarchy, and simply use community size. This has the benefit of making the algorithm simpler, while still incorporating hierarchical structure. I ran the two versions side by side and found that they perform almost identically, with ignoring levels being slightly better in some cases!

MIT-NOV Dataset, with levels included and not included for multiple parameters to H-GCE

This plot is very hard to read, but it is possible to see the similarities at a broad level. The best performing run was with H-GCE parameters of K=3, E=0.15, ST=0.9, MAP=0.9 and Z=2. Its structure seems to be relatively flat, with a large number of communities:

[expand title=”click to view”]

  0(90)
  |
   - 1(5)
  |
   - 2(5)
  |
   - 3(8)
  |
   - 4(8)
  |
   - 5(8)
  |
   - 6(4)
  |
   - 7(28)
  |
   - 8(27)
  |
   - 9(22)
  |
   - 10(17)
  |
   - 11(13)
  |
   - 12(20)
  |
   - 13(16)
  |
   - 14(26)
  |  |
  |   - 15(12)
  |
   - 16(7)
  |
   - 17(4)
  |
   - 18(4)
  |
   - 19(3)
  |
   - 20(8)
  |
   - 21(8)
  |
   - 22(3)
  |
   - 23(13)
  |
   - 24(8)
  |
   - 25(27)

[/expand]

This gives us the following graphic for the SMW11 paper.

MIT-NOV Dataset, for multiple parameters to H-GCE, compared to Bubble, PROPHET and Flood

The latency is shown below, which seems to follow the same trend as the previous version, but with BubbleH actually beating all but Flood at the end.

Latency in MIT-NOV dataset for Bubble, BubbleH, PROPHET and Flood.

This is a positive result, but whilst doing some work towards the next paper for NGMAST11, I realised that we should be doing runs for multiple parameters to K-Clique; however, for this paper we probably don’t need to worry so much. Also, for reference, the average value for BubbleH is included in the plot below.

bubbleh-bubble-prophet-flood-ignorelevelstrue-bestrun-withavg

We need to consider whether to evaluate multiple parameters to BubbleRAP, and see whether this affects the algorithm. Also, we need to consider whether the hierarchy is really making things better, or whether it is the sheer number of communities, because the best performing run has quite a flat structure.

The trouble with too many hops

April 12th, 2011

In the recent paper that we submitted to SMW2011, we showed that we could increase the delivery ratio using overlapping hierarchical clustering; however, we also identified that the cost involved was much higher than we expected, up to 245 hops in some cases. To find out what was happening, we checked the behaviour of the best performing parameters (GCEH-K3-E0.15-ST0.7-MAP0.5-Z0.2) by recording the route of each message, along with some hints as to why the message was being passed at that point. Hop Stats for GCEH-K3-E0.15-ST0.7-MAP0.5-Z0.2 (11MB uncompressed).

This is the current algorithm:

[expand title=”click to view algorithm”]

for (BubbleHMessage message : my_buffer) {
  /**
  * Identify smallest community that Self and Destination share =
  * Bridging Community
  * */

  int my_bridge_community = BubbleHierarchyOracle.bridgeCommunity(my_node.getID(), message.dest, my_properties);

  // if encountered node is destination
  if (process.getNode().getID() == message.dest) {
  // pass it on to them.
    destination_met++;
    toSend.add(message);
    message.setHopCode("DM");
  } else {
    int remote_bridge = BubbleHierarchyOracle.bridgeCommunity(process.getNode().getID(), message.dest, my_properties);
    if (BubbleHierarchyOracle.isBetter(remote_bridge, my_bridge_community)) {
      // if P is in community with message.dest, that is smaller
      // than BC
      // pass message
      message.setHopCode(remote_bridge +"-isBetter-" + my_bridge_community+ ":" + BubbleHierarchyOracle.whyIsBetter(remote_bridge, my_bridge_community));
      toSend.add(message);
      smaller_community_found++;
    } else if (process.getCommunities().contains(my_bridge_community)) {// if both nodes are in the bridge community
        // if the rank of the encountered node is higher, pass the message
      if (process.getlocalRank(my_bridge_community) > getlocalRank(my_bridge_community)) {
        // pass message
        toSend.add(message);
        message.setHopCode("RANKBC-" + my_bridge_community);
        bridge_community_ranking_message_passed++;
      }
    } else {
      // process is not destination, is not in a smaller community
      // with the destination, and is not in the bridge community,
      // therefore we do not pass the message to it
    }
  }

}

[/expand]

The isBetter function:

[expand title=”click to view isBetter function”]

public static boolean isBetter(int candidate_bridgeCommunity, int test_bridgeCommunity) {

  if(candidate_bridgeCommunity == -1 && test_bridgeCommunity == -1){
    // neither have a shared community
    return false;
  }else if((candidate_bridgeCommunity == -1) && (test_bridgeCommunity >= 0)){
    // candidate has no shared community with dest, but the current (test) node does
    return false;
  }else if((candidate_bridgeCommunity >= 0) && (test_bridgeCommunity == -1)){
    // candidate shares a community with dest, but the current (test) node does not
    return true;
  }else if(heirarachy.level(candidate_bridgeCommunity) > heirarachy.level(test_bridgeCommunity)){
      // candidate is deeper in the hierarchy (higher level)
      return true;
  }else if (heirarachy.members(candidate_bridgeCommunity).length < heirarachy.members(test_bridgeCommunity).length){
      // candidate has fewer members
      return true;
  }else{
    return false;
  }

}

[/expand]

Some information about the run:

Log first ran in simulator: 2004-12-09_23:57:24
Log Filename: [...]0.hop-stats.dat
Mesages: 10506
	Hops: 276313
	Avg: 26
	Max Hop Count: 245
Earliest Hop: Wed, 10 Nov 2004 00:00:00 +0000
Latest Hop: Thu, 09 Dec 2004 23:13:29 +0000

Distribution of path lengths for GCEH-K3-E0.15-ST0.7-MAP0.5-Z0.2 using MIT-NOV dataset

Community Structure:

[expand title=”click view community structure”]

 0(95)
  |
   - 1(95)
    |
     - 2(33)
    |
     - 3(30)
    |
     - 4(33)
    |
     - 5(31)
    |
     - 6(20)
    |  |
    |   - 8(7)
    |
     - 7(12)
    |
     - 9(15)
    |
     - 10(12)
    |  |
    |   - 11(5)
    |  |
    |   - 12(5)
    |
     - 13(4)
    |
     - 14(24)
    |
     - 15(15)
    |
     - 16(11)
    |
     - 17(49)
    |  |
    |   - 18(11)
    |
     - 19(41)
    |  |
    |   - 20(14)
    |
     - 21(4)
    |
     - 22(11)
    |
     - 23(5)
    |
     - 24(45)

[/expand]

Community Membership:

[expand title=”click to view”]

# options
# {'phi': 0.25, 'threads': 5, 'max_times_spoken_for': 1, 'minCliqueSize': 3, 'num_realizations': 100, 'epsilon': 0.14999999999999999, 'perturb_alpha': 0.20000000000000001, 'minCliqueEdgeWeight': 0.0, 'similarity_threshold': 0.69999999999999996, 'outfile': None, 'min_appearance_prop': 0.5, 'intNodeIDs': False, 'file': None, 'alpha': 1.0}
# ForceGlobalParent:true
# community_id-parent_id: members...n
0--1: 1 2 3 4 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 23 24 25 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 50 51 52 53 54 55 57 58 59 60 61 62 63 64 65 66 67 68 69 70 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 93 94 95 97 98 99 100 101 102 103
1-0: 1 2 3 4 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 23 24 25 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 50 51 52 53 54 55 57 58 59 60 61 62 63 64 65 66 67 68 69 70 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 93 94 95 97 98 99 100 101 102 103
2-1: 6 8 10 11 14 15 16 17 18 21 29 30 31 38 41 43 46 55 61 63 79 80 83 85 86 88 90 91 94 95 97 101 102
3-1: 6 10 11 14 15 16 17 18 21 30 31 38 41 43 55 61 69 79 80 83 85 86 88 90 91 94 95 97 101 102
4-1: 6 7 8 10 14 15 16 17 18 21 29 30 31 41 46 55 58 61 75 78 79 80 83 85 86 88 90 91 94 95 97 101 102
5-1: 6 10 11 13 14 15 16 17 18 27 30 31 35 41 46 53 61 63 69 78 79 83 85 86 88 91 95 97 99 101 102
6-1: 3 9 10 16 25 30 31 35 41 53 61 65 72 74 79 88 91 95 101 102
7-1: 6 10 14 18 25 31 62 65 83 86 88 102
8-6: 3 25 41 53 65 72 74
9-1: 6 14 16 30 41 44 53 61 73 79 83 91 95 101 102
10-1: 9 15 16 30 41 42 52 61 77 91 101 102
11-10: 9 30 42 77 101
12-10: 9 30 41 42 101
13-1: 1 30 83 101
14-1: 7 15 16 30 35 38 41 46 53 58 61 69 79 80 83 85 90 91 95 97 100 101 102 103
15-1: 10 16 30 41 53 61 73 74 79 83 91 95 99 101 102
16-1: 6 11 13 14 17 30 47 58 86 90 95
17-1: 1 2 3 7 8 10 15 21 25 27 29 31 34 38 42 43 46 47 51 52 55 57 58 61 64 65 66 67 69 72 73 74 75 78 79 80 83 85 87 88 90 93 94 95 97 98 99 100 103
18-17: 8 10 15 27 66 80 83 85 97 98 100
19-1: 1 2 6 7 8 14 15 18 21 29 33 34 38 40 43 46 47 51 55 57 58 64 66 67 72 73 75 78 80 83 85 86 87 88 90 94 95 97 100 102 103
20-19: 2 15 38 51 67 80 85 87 88 95 97 100 102 103
21-1: 28 58 90 103
22-1: 23 30 39 41 59 63 79 81 91 101 102
23-1: 45 69 72 88 99
24-1: 3 4 9 12 19 20 23 24 25 28 32 33 34 36 37 39 42 44 45 48 50 51 52 54 57 59 60 62 63 64 65 66 68 70 72 73 74 76 77 81 82 84 89 94 98

[/expand]

As expected, there is a high number of messages that arrive within a small number of hops, which is highly desirable; however, there is a long tail that extends up to 245 hops, which was not expected.

An annotated log file excerpt for message pair 85:3 (one of the longest paths) is shown below. Note that this message is live, which means it has not been delivered. Analysis of this shows a large number of hops between the same nodes, and the same communities.

Log file excerpt

Key:

n-isBetter-n:Reason = the first  (candidate) community is deemed
                      better than the last (test), for the reason given.
RANKBC-n  =           the rank of the two nodes in the given community n is
                      compared, and the encountered node scored higher
DM =                  destination met, so the message was passed

[expand title=”click to view log”]

Message ID: 8060
From 85 to 3
Status: live
Passed to node 85 at 2004-11-10 00:00:00 reason: INITIATED
Passed to node 16 at 2004-11-10 09:22:09 reason: 6-isBetter-17:BETTER-CANDIDATE-LESS-MEMBERS
Passed to node 102 at 2004-11-10 09:56:10 reason: RANKBC-6
Passed to node 41 at 2004-11-10 12:04:45 reason: 8-isBetter-6:BETTER-CANDIDATE-HAS-HIGHER-LEVEL-DEPTH
Passed to node 6 at 2004-11-10 14:36:56 reason: RANKBC-8
Passed to node 90 at 2004-11-10 14:40:55 reason: 17-isBetter-1:BETTER-CANDIDATE-HAS-HIGHER-LEVEL-DEPTH
Passed to node 95 at 2004-11-10 15:23:10 reason: 6-isBetter-17:BETTER-CANDIDATE-LESS-MEMBERS
Passed to node 30 at 2004-11-10 15:26:52 reason: RANKBC-6
Passed to node 102 at 2004-11-10 15:40:22 reason: RANKBC-6
Passed to node 41 at 2004-11-10 18:06:57 reason: 8-isBetter-6:BETTER-CANDIDATE-HAS-HIGHER-LEVEL-DEPTH
Passed to node 90 at 2004-11-10 18:19:24 reason: RANKBC-8
Passed to node 95 at 2004-11-11 09:47:52 reason: 6-isBetter-17:BETTER-CANDIDATE-LESS-MEMBERS
Passed to node 30 at 2004-11-11 11:46:14 reason: RANKBC-6
Passed to node 102 at 2004-11-11 12:00:38 reason: RANKBC-6
Passed to node 41 at 2004-11-11 12:55:31 reason: 8-isBetter-6:BETTER-CANDIDATE-HAS-HIGHER-LEVEL-DEPTH
Passed to node 30 at 2004-11-11 14:05:32 reason: RANKBC-8
Passed to node 41 at 2004-11-11 14:38:50 reason: 8-isBetter-6:BETTER-CANDIDATE-HAS-HIGHER-LEVEL-DEPTH
Passed to node 30 at 2004-11-11 15:11:56 reason: RANKBC-8
Passed to node 41 at 2004-11-11 16:37:36 reason: 8-isBetter-6:BETTER-CANDIDATE-HAS-HIGHER-LEVEL-DEPTH
Passed to node 30 at 2004-11-11 17:30:20 reason: RANKBC-8
Passed to node 41 at 2004-11-11 17:56:34 reason: 8-isBetter-6:BETTER-CANDIDATE-HAS-HIGHER-LEVEL-DEPTH
Passed to node 6 at 2004-11-11 18:18:04 reason: RANKBC-8
Passed to node 30 at 2004-11-15 09:20:19 reason: 6-isBetter-1:BETTER-CANDIDATE-HAS-HIGHER-LEVEL-DEPTH
Passed to node 101 at 2004-11-15 09:46:13 reason: RANKBC-6
Passed to node 41 at 2004-11-15 09:57:18 reason: 8-isBetter-6:BETTER-CANDIDATE-HAS-HIGHER-LEVEL-DEPTH
Passed to node 11 at 2004-11-15 12:47:19 reason: RANKBC-8
Passed to node 88 at 2004-11-15 13:41:22 reason: 6-isBetter-1:BETTER-CANDIDATE-HAS-HIGHER-LEVEL-DEPTH
Passed to node 38 at 2004-11-15 13:47:35 reason: RANKBC-6
Passed to node 79 at 2004-11-15 14:41:15 reason: 6-isBetter-17:BETTER-CANDIDATE-LESS-MEMBERS
Passed to node 91 at 2004-11-15 14:52:48 reason: RANKBC-6
Passed to node 101 at 2004-11-15 14:58:03 reason: RANKBC-6
Passed to node 102 at 2004-11-15 14:58:48 reason: RANKBC-6
Passed to node 41 at 2004-11-15 15:26:31 reason: 8-isBetter-6:BETTER-CANDIDATE-HAS-HIGHER-LEVEL-DEPTH
Passed to node 30 at 2004-11-15 16:04:25 reason: RANKBC-8
Passed to node 101 at 2004-11-15 16:08:44 reason: RANKBC-6
Passed to node 53 at 2004-11-15 16:14:16 reason: 8-isBetter-6:BETTER-CANDIDATE-HAS-HIGHER-LEVEL-DEPTH
Passed to node 13 at 2004-11-15 16:25:03 reason: RANKBC-8
Passed to node 91 at 2004-11-15 16:29:28 reason: 6-isBetter-1:BETTER-CANDIDATE-HAS-HIGHER-LEVEL-DEPTH
Passed to node 53 at 2004-11-15 16:30:40 reason: 8-isBetter-6:BETTER-CANDIDATE-HAS-HIGHER-LEVEL-DEPTH
Passed to node 30 at 2004-11-15 16:41:37 reason: RANKBC-8
Passed to node 41 at 2004-11-15 16:51:04 reason: 8-isBetter-6:BETTER-CANDIDATE-HAS-HIGHER-LEVEL-DEPTH
Passed to node 30 at 2004-11-15 17:31:42 reason: RANKBC-8
Passed to node 41 at 2004-11-15 18:13:21 reason: 8-isBetter-6:BETTER-CANDIDATE-HAS-HIGHER-LEVEL-DEPTH
Passed to node 30 at 2004-11-16 09:18:35 reason: RANKBC-8
Passed to node 41 at 2004-11-16 09:24:56 reason: 8-isBetter-6:BETTER-CANDIDATE-HAS-HIGHER-LEVEL-DEPTH
Passed to node 6 at 2004-11-16 11:01:28 reason: RANKBC-8
Passed to node 30 at 2004-11-16 11:01:30 reason: 6-isBetter-1:BETTER-CANDIDATE-HAS-HIGHER-LEVEL-DEPTH
Passed to node 101 at 2004-11-16 11:58:26 reason: RANKBC-6
Passed to node 102 at 2004-11-16 12:09:36 reason: RANKBC-6
Passed to node 41 at 2004-11-16 12:15:36 reason: 8-isBetter-6:BETTER-CANDIDATE-HAS-HIGHER-LEVEL-DEPTH
Passed to node 6 at 2004-11-16 13:35:34 reason: RANKBC-8
Passed to node 102 at 2004-11-16 14:12:41 reason: 6-isBetter-1:BETTER-CANDIDATE-HAS-HIGHER-LEVEL-DEPTH
Passed to node 41 at 2004-11-16 14:47:50 reason: 8-isBetter-6:BETTER-CANDIDATE-HAS-HIGHER-LEVEL-DEPTH
Passed to node 30 at 2004-11-16 14:49:36 reason: RANKBC-8
Passed to node 41 at 2004-11-16 14:49:53 reason: 8-isBetter-6:BETTER-CANDIDATE-HAS-HIGHER-LEVEL-DEPTH
Passed to node 30 at 2004-11-16 15:06:52 reason: RANKBC-8
Passed to node 101 at 2004-11-16 15:13:25 reason: RANKBC-6
Passed to node 53 at 2004-11-16 15:41:12 reason: 8-isBetter-6:BETTER-CANDIDATE-HAS-HIGHER-LEVEL-DEPTH
Passed to node 13 at 2004-11-16 15:50:43 reason: RANKBC-8
Passed to node 102 at 2004-11-16 15:55:57 reason: 6-isBetter-1:BETTER-CANDIDATE-HAS-HIGHER-LEVEL-DEPTH
Passed to node 53 at 2004-11-16 16:44:56 reason: 8-isBetter-6:BETTER-CANDIDATE-HAS-HIGHER-LEVEL-DEPTH
Passed to node 14 at 2004-11-16 16:46:28 reason: RANKBC-8
Passed to node 79 at 2004-11-16 16:52:06 reason: 6-isBetter-1:BETTER-CANDIDATE-HAS-HIGHER-LEVEL-DEPTH
Passed to node 61 at 2004-11-16 16:53:08 reason: RANKBC-6
Passed to node 58 at 2004-11-16 17:20:54 reason: RANKBC-6
Passed to node 31 at 2004-11-16 17:21:37 reason: 6-isBetter-17:BETTER-CANDIDATE-LESS-MEMBERS
Passed to node 90 at 2004-11-16 18:52:52 reason: RANKBC-6
Passed to node 101 at 2004-11-17 12:32:06 reason: 6-isBetter-17:BETTER-CANDIDATE-LESS-MEMBERS
Passed to node 41 at 2004-11-17 12:37:00 reason: 8-isBetter-6:BETTER-CANDIDATE-HAS-HIGHER-LEVEL-DEPTH
Passed to node 30 at 2004-11-17 12:51:31 reason: RANKBC-8
Passed to node 102 at 2004-11-17 12:53:32 reason: RANKBC-6
Passed to node 41 at 2004-11-17 12:59:28 reason: 8-isBetter-6:BETTER-CANDIDATE-HAS-HIGHER-LEVEL-DEPTH
Passed to node 17 at 2004-11-17 13:10:13 reason: RANKBC-8
... etc.

[/expand]

In the implementation, the way that the ‘best’ bridging community is found involves a choice when there are two or more candidates for the best BC for a node pair. This choice is naive, because it is made by simply picking the last one in the list.
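
To make that concrete, a sketch of the selection is below (assuming communities are simple member sets; this mirrors the behaviour described above, including the last-match tie-break, rather than being the exact BubbleHierarchyOracle code):

[expand title=”click to view sketch”]

import java.util.List;
import java.util.Set;

// Sketch of the bridging community selection: the smallest community shared
// by self and dest wins, and ties are broken by keeping the last candidate
// seen, which is the naive choice discussed above.
class BridgeCommunitySketch {
  record Community(int id, Set<Integer> members) {}

  static int bridgeCommunity(int self, int dest, List<Community> communities) {
    int best = -1;                       // -1 means no shared community
    int bestSize = Integer.MAX_VALUE;
    for (Community c : communities) {
      if (c.members().contains(self) && c.members().contains(dest)
          && c.members().size() <= bestSize) {   // '<=' keeps the last tied candidate
        best = c.id();
        bestSize = c.members().size();
      }
    }
    return best;
  }
}

[/expand]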

There seem to be a number of problems when considering whether a given best bridging community for one node is better than another. When making this choice, the depth of the community in the structure is compared; this is also a naive decision, as the hierarchy levels in the structure do not relate to the importance of the community: the depth is not governed by any metric saying how deep a community is, only that it has a parent, and perhaps some children. However, the size of the communities is still perhaps a good metric to use.

There will always be a bridging community for every node pair, as there is one global parent community; this means that the global community is treated like a local community. This should not be a big problem, as it pushes messages to the top when there is no better community to be found, and these messages should soon find a node with a better community than the global one.

Some possible solutions to this are:

  • compare all best communities for a node/message when comparing two nodes
  • store a best-so-far community with the message, and only pass it on to nodes with a better one.
  • stop using any local/global ranking, and consider only community best-ness

Pádraig suggested checking to see whether it is particular destination nodes causing the high hop counts; the graphic below illustrates the average hop lengths for each node, both when the node is the source and when the node is the destination. The data table, which also includes the standard deviation, is given below.

Average hop lengths by node where the node is the source and where the node is the destination

Data table:

[expand title=”click to view data”]

node avg_hops_when_src stddev avg_hops_when_dest stddev
1 5.65 3.06 165.62 56.67
2 36.99 48.48 9.68 12.99
3 24.58 35.35 177.15 90.16
4 31.98 40.29 14.3 18.24
5 1 0 1 0
6 36.5 48.97 8.59 6.47
7 30.65 50 6.27 10.11
8 30.64 51.51 17.74 13.27
9 40.73 49.72 49.09 34.73
10 28.68 50.77 25.77 21.38
11 29.25 51.39 5.19 3.42
12 28.47 37.03 13.25 17.61
13 28.65 48.69 9.17 9.79
14 29.51 48.44 5.63 4.58
15 29.37 49.2 25.79 23.91
16 28.68 50.36 9.35 7.35
17 28.81 50.53 8.09 10.76
18 24.65 35.73 82.75 35.73
19 26.43 49.7 12.36 16.71
20 28.99 47.09 14.4 17.78
21 26.54 49.97 6.3 9.82
22 1 0 1 0
23 31.04 46.51 24.57 30.6
24 24.81 44.58 13.21 17.19
25 22.88 40.02 107.42 63.14
26 1 0 1 0
27 19.05 33.76 18.77 12.32
28 14.76 31.24 45.16 26.61
29 37.24 46.37 6.75 9.76
30 29.38 49.53 5.64 3.51
31 30.43 49.31 18.72 10.58
32 33.48 46 12.8 17.24
33 39.87 50.71 4.82 3.3
34 33.17 40.86 5.78 3.49
35 33.63 45.75 5.06 2.24
36 23.25 40.12 15.01 18.54
37 35.37 45.88 14.89 18.24
38 27.52 43.52 9.3 9.72
39 25.62 39.24 46.45 29.91
40 24.88 36.34 5.75 3.27
41 32.17 52.09 9.75 5.69
42 38.6 48.48 111.61 61.11
43 29.83 51.16 6.34 9.82
44 14.51 16.61 76.79 37.2
45 37.45 43.78 29.45 24.56
46 27.56 48.06 6.54 9.87
47 28.09 43.68 22.54 19.48
48 20.84 33.7 13.55 17.73
49 1 0 1 0
50 18.75 34.44 13.67 17.81
51 28.18 50.56 6.25 4.93
52 41.13 49.35 64.24 34.58
53 30.95 47.38 15.41 13.87
54 31.66 46.03 13.83 17.66
55 28.63 44.23 6.44 9.57
56 1 0 1 0
57 31.55 41.85 5.34 3.2
58 27.45 50.7 22.09 22.22
59 27.46 44.1 26.02 27.82
60 10.72 9.25 30.58 26.47
61 30.2 49.97 13.91 18.41
62 33.79 44.06 109.91 57.24
63 30.76 48.56 14.25 16.67
64 20.71 22.38 5.73 3.49
65 20.37 34.53 111.34 63.53
66 37.36 45.57 127.11 65.15
67 38.19 48.7 9.25 13.14
68 25.67 37.47 15.05 18.31
69 29.79 45.81 7.61 4.27
70 21.41 33.17 16.02 18.61
71 1 0 1 0
72 22.07 27.05 24.52 20.85
73 30.64 49.46 72.68 66.96
74 20.09 30.47 93.49 55.14
75 28.4 50.19 6.65 9.8
76 27.27 37.04 14.08 17.77
77 15.32 29.31 31.28 38.38
78 29.72 47.43 6.76 9.76
79 23.72 42.65 10.85 9.49
80 29.97 52.24 31.66 25.39
81 34.55 49.01 22.03 26.42
82 27.24 40.62 16.61 18.63
83 28.57 32.77 99.87 50.57
84 33.09 45.61 15.61 18.58
85 27.16 48.18 8.73 8
86 24.51 44.49 4.67 3.37
87 27 50.14 9.69 13.32
88 30.65 52.13 28.11 44.69
89 21.62 35.71 14.28 17.76
90 27.7 48 10.19 9.92
91 28.15 47.49 8.95 8.27
92 1 0 1 0
93 27.7 37.74 34.67 35.09
94 36.59 47.23 3.75 1.49
95 29.91 49.99 23.63 20.39
96 1 0 1 0
97 37.44 49.69 9.7 6.36
98 18.9 35.19 17.85 19.89
99 31.82 49.52 32.99 20.77
100 23.79 39.27 34.01 17.46
101 29.15 50.93 9.07 8.7
102 34.7 47.82 6.42 5.27
103 37.67 48.72 60.97 33.82

[/expand]

It seems that in some cases a node might be very hard to get to, and therefore has a high number of hops; however, it could also be the case that, because the node is not seen very often, the messages just keep moving around different nodes. The way to see this would be to take measurements of the hop count (as above) at time intervals. However, it would be best to first fix the algorithm to see if the behaviour continues, then consider further analysis.

Problem Found/Solution

After a bit of bug tracking and debug output scouring, we were able to find two problems. The first, a simulator problem, meant that community files (numbered 0 to n) were being loaded in an incorrect order, so the internal numbering of communities did not match the file names (a sketch of the fix is shown after the listing):
[expand title=”view ordering”]

community.0.dat
community.1.dat
community.10.dat
community.11.dat
community.12.dat
community.13.dat
community.14.dat
community.15.dat
community.16.dat
community.17.dat
community.18.dat
community.19.dat
community.2.dat
community.20.dat
community.21.dat
community.22.dat
community.23.dat
community.24.dat
community.3.dat
community.4.dat
community.5.dat
community.6.dat
community.7.dat
community.8.dat
community.9.dat

[/expand]
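
The fix is simply to sort the community files by their numeric index rather than lexicographically; a sketch of that ordering, assuming the community.N.dat naming shown above:

[expand title=”click to view sketch”]

import java.io.File;
import java.util.Arrays;
import java.util.Comparator;

// Sketch of the ordering fix: sort community.N.dat files by the numeric N,
// so that community.2.dat loads before community.10.dat.
class CommunityFileOrderSketch {
  static File[] sortedCommunityFiles(File dir) {
    File[] files = dir.listFiles((d, name) -> name.matches("community\\.\\d+\\.dat"));
    Comparator<File> byIndex = Comparator.comparingInt(
        f -> Integer.parseInt(f.getName().split("\\.")[1]));   // the N in community.N.dat
    Arrays.sort(files, byIndex);
    return files;
  }
}

[/expand]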

The second problem was a little less subtle: it was caused by the behaviour of the isBetter function when considering the level of a community in the hierarchy and the number of members in a community. When an upper level community had a smaller membership than a lower level one, the message tended to flip-flop between communities. This was temporarily fixed by only considering the size of a community when the level was the same.
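
A sketch of that temporary fix, following the structure of the isBetter function shown earlier (not a verbatim copy of the fixed simulator code): the member-count comparison is only applied when the two communities sit at the same level.

[expand title=”click to view sketch”]

public static boolean isBetter(int candidate_bridgeCommunity, int test_bridgeCommunity) {

  if(candidate_bridgeCommunity == -1){
    // candidate shares no community with the destination
    return false;
  }else if(test_bridgeCommunity == -1){
    // candidate shares a community with the destination, but the current one does not
    return true;
  }else if(heirarachy.level(candidate_bridgeCommunity) > heirarachy.level(test_bridgeCommunity)){
    // candidate is deeper in the hierarchy
    return true;
  }else if(heirarachy.level(candidate_bridgeCommunity) == heirarachy.level(test_bridgeCommunity)
        && heirarachy.members(candidate_bridgeCommunity).length < heirarachy.members(test_bridgeCommunity).length){
    // same level: prefer the smaller community
    return true;
  }else{
    return false;
  }

}

[/expand]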

Results after this fix (for both BubbleRAP and BubbleH) are shown below:

However!

We have since decided to test what happens when we ignore the depth of a community, and only consider the member count; this might have the effect of flattening the hierarchy somewhat, but it simplifies the algorithm… watch this space for results.

InfoCom, Cambridge and Hong-Kong datasets

April 5th, 2011

I managed to get a copy of the two InfoCom datasets and the Cambridge dataset mentioned in the BubbleRAP paper; I could not find the Hong-Kong dataset. I parsed them into the correct format for LocationSim, and was able to generate edge lists for all the connections, and have visualised them below. Note: Node size represents betweenness centrality before edge removal. Edge width/colour represents connected time. Graphs below are for mobile nodes only; they do not include fixed nodes.

                 InfoCom05     InfoCom06     Cambridge
#iMotes           41              98             54
     Fixed        0               20             18
     Mobile       41              78             36
External          233             ?              ?
Duration         ~3 days        ~4 days         ~1 month (2 months in readme)

InfoCom 2006 edges removed where connected time less than 4 hours – (min = 1000ms, max = 72 hours). InfoCom2006 gephi file.

InfoCom 2005 edges removed where connected time less than 2.5 hours – (min = 1000ms, max = 46 hours). InfoCom2005 gephi file.

Cambridge with edges removed where connected time less than ~3 hours – (min = 1000ms, max = ~39 hours). Cambridge gephi file.

Cambridge Dataset with edges removed where connected time is less than ~3 hours - (min 1000ms max ~39 hours)

These graphs suggest that some community structure exists in the traces, and despite the duration of the InfoCom experiments being quite short, they may have more contact events due to the context of the experiment. (I wonder if the same patterns that appear in these small-area deployments would also appear in broad-scale deployments?)
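
For reference, the edge filtering used for these visualisations is just a threshold on total connected time; a minimal sketch, assuming the same pair-to-connected-seconds weight map as in the earlier connected-time sketch:

[expand title=”click to view sketch”]

import java.util.Map;

// Sketch only: drop edges whose total connected time falls below a threshold
// before visualising, matching the cut-offs quoted above.
class EdgeThresholdSketch {
  static void removeWeakEdges(Map<String, Long> edgeWeights, long minConnectedSeconds) {
    edgeWeights.values().removeIf(weight -> weight < minConnectedSeconds);
  }
}

[/expand]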

TODO:
Plot activity for duration of experiments.
Parse SocialSensing Dataset…..

Activity every 15 mins for InfoCom2005 dataset

Activity every 15 mins for InfoCom2006 dataset

Activity every 12 hours for Cambridge dataset

Categories: Datasets