
Archive for the ‘Ideas’ Category

Cell tower connections based on contact events

December 10th, 2010

Using the log of hop events, I generated a list of the cell towers in use at the time of each hop. From this I built a list of the cell towers that are linked in this way, with a count of the number of times they had been connected (linked_cell list txt, or linked_cells json version). From this I created a graph where the weight of each edge is that count.

connected-cell-towers

Edges with a weight (i.e. the number of times nodes passed a message and reported different cell towers) of less than 100 are removed for clarity; the colour of the edges shows the weight, increasing from blue to green.
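As a sketch, the linked-cell counting step might look like this in Python. The event tuples and tower names below are invented for illustration; the real hop log format will differ.

```python
from collections import Counter

# Each hop event records the cell towers reported by the sender and the
# receiver at the time the message was passed (illustrative data only).
hop_events = [
    ("tower_17", "tower_23"),
    ("tower_23", "tower_17"),
    ("tower_17", "tower_23"),
    ("tower_05", "tower_05"),  # both reported the same tower
]

edge_weights = Counter()
for a, b in hop_events:
    if a == b:
        continue  # same tower on both ends: no linked-cell edge
    # Sort so the edge is undirected: (a, b) and (b, a) are one edge.
    edge_weights[tuple(sorted((a, b)))] += 1

# Drop low-weight edges for clarity, as in the plot (the threshold there
# is 100; it is 2 here so the toy data survives).
filtered = {e: w for e, w in edge_weights.items() if w >= 2}
```

The `filtered` dict is then what gets handed to the graph tool, with weights as edge attributes.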

The tool that I have used to visualise these graphs is called the network workbench, and it reports the following about the graph:

Nodes: 84
Isolated nodes: 9
Node attributes present: label
Edges: 149

No self loops were discovered.
No parallel edges were discovered.
Edge attributes:
Did not detect any nonnumeric attributes
Numeric attributes:
min    max    mean
weight     1    2700    103.42953

This network seems to be valued.

Average total degree: 3.547619047619045
Average in degree: 1.7738095238095228
Average out degree: 1.7738095238095228
This graph is not weakly connected.
There are 14 weakly connected components. (9 isolates)
The largest connected component consists of 67 nodes.
This graph is not strongly connected.
There are 61 strongly connected components.
The largest strongly connected component consists of 24 nodes.

Density (disregarding weights): 0.02137
Additional Densities by Numeric Attribute
densities (weighted against standard max)
weight: 2.21041
densities (weighted against observed max)
weight: 0.00082

connected-cell-towers_circular

Edges are coloured linearly from white to black depending on weight, edges with a weight greater than 300 are labelled.

It seems that there is a group of towers that are strongly connected, by a large number of messages being passed with nodes reporting them as the cell towers used. One possible reason for this is that the places where the nodes are located are popular, and as such require more cell towers to cover the demand. These results are promising, because they mean that if we pre-process the data and ignore connections with a low weight, we can use groups of towers to give a notion of place. In other words, when a node reports a tower in a known cluster of towers, we can assume that it is in roughly the same place as any other node that also reports a tower in the same cluster.
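That pre-processing step can be sketched as thresholding the edges and treating each connected component of what remains as one "place". The tower names, weights and helper name below are illustrative, not taken from the real data.

```python
# Toy weighted edge list; weak links below the threshold are discarded.
edges = {
    ("t1", "t2"): 450,
    ("t2", "t3"): 320,
    ("t3", "t4"): 12,   # weak link: dropped by the threshold
    ("t5", "t6"): 800,
}
THRESHOLD = 100

# Adjacency over the strong edges only.
adj = {}
for (a, b), w in edges.items():
    if w < THRESHOLD:
        continue
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)

def cluster_of(tower):
    """All towers reachable from `tower` over strong edges (simple BFS)."""
    seen, frontier = {tower}, [tower]
    while frontier:
        current = frontier.pop()
        for n in adj.get(current, ()):
            if n not in seen:
                seen.add(n)
                frontier.append(n)
    return frozenset(seen)

# Two nodes reporting towers in the same cluster are "in the same place".
same_place = cluster_of("t1") == cluster_of("t3")
```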

Routing with location

November 3rd, 2010

I have started to think again about how to route using location. Ignoring contact traces for the moment, I considered what the simplest metric for routing might be. The simplest method, as I seem to remember discussing with Davide previously, is simply comparing knowledge about the set of places visited by each node.


For example, in the above scenario, where the numbered nodes are places visited by nodes and the lines build the network of places visited by each node (a line represents the route between two places; the direction of the arrows shows any ordering of places): if node A has a message for node C and, at some time, meets node B, it can decide whether to pass the message to node B or keep it, by comparing the intersection of each node's set of places with that of the destination node C. This is based on the assumption that the more locations a node visits in common with the destination node, the more likely it is to deliver the message.

So, in this case where:

[latex]A \cap B = (3,7) [/latex]
[latex]A \cap C = (3,7) [/latex]
[latex]B \cap C = (3,7,8) [/latex]

Node B has more places in common with C than node A does, so node A passes the message to B, because B is more likely to visit the same location as C than A is. This simplistic scheme may work for some situations, but it does not consider the case where two nodes never meet in the same places, and so how to bridge that gap. This is something to think about later on, but an initial guess is to use some other scheme to find a node that does know the destination node, and then use this simple scheme.
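The forwarding decision above can be sketched with Python sets. The place sets here are chosen to reproduce the intersections in the worked example (the figure itself is not shown), and `should_forward` is a name I have made up.

```python
# Sets of places visited by each node, constructed so that
# A ∩ C = {3, 7} and B ∩ C = {3, 7, 8}, as in the worked example.
places = {
    "A": {1, 2, 3, 7},
    "B": {3, 4, 7, 8},
    "C": {3, 6, 7, 8},
}

def should_forward(carrier, candidate, dest):
    """Hand the message over only if the candidate shares more visited
    places with the destination than the current carrier does."""
    return len(places[candidate] & places[dest]) > len(places[carrier] & places[dest])

# A carries a message for C and meets B: |B ∩ C| = 3 > |A ∩ C| = 2,
# so A forwards to B.
decision = should_forward("A", "B", "C")
```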

This scheme is probably not sophisticated enough to be efficient in large networks, and in previous thoughts about routing schemes for location (in which I was mostly using the term vector clocks) I considered a number of things that can be used to inform, or be considered in, a routing protocol:

  • Locations visited
  • The times locations are visited
    • The length of time spent at a location
    • The length of time between visits
  • The ordering of visits to locations
  • Common popular locations
    • Use of sinks/post offices
    • Use of data mules (proxy nodes that visit popular locations frequently)
    • The number of routes that pass through a place (degree)
  • The mobility of a node, e.g. the number of places it knows about
  • Prediction of future visits to locations based on historic knowledge
  • Sharing of knowledge between nodes

The edges in the graph could easily be labelled with the times at which nodes travelled between places, such that we can build a (historic?) vector clock of all traversals in the graph.
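A minimal sketch of that labelling, with an invented data structure and invented timestamps: per directed edge, keep the list of times it was traversed, which doubles as a crude historic record of movement through the graph.

```python
from collections import defaultdict

# (from_place, to_place) -> list of traversal timestamps.
traversals = defaultdict(list)

def record_traversal(src, dst, t):
    """Label the directed edge src -> dst with traversal time t."""
    traversals[(src, dst)].append(t)

# Toy history: a node moved 3 -> 7 twice, then 7 -> 8 once.
record_traversal(3, 7, 100)
record_traversal(3, 7, 250)
record_traversal(7, 8, 260)

# Derived view: the most recent traversal of each edge.
last_seen = {edge: max(ts) for edge, ts in traversals.items()}
```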

I have started to think that using location for routing may be helpful because knowing WHERE a meeting takes place might help in judging the importance of that meeting to the routing protocol, whereas when we only know that a meeting has taken place, but not in what context, it is hard to tell whether it is an important meeting. However, I haven't fully formed this idea.

Understanding ContactSim

November 3rd, 2010

In the last post, I mentioned Graham's simulator, ContactSim, which he describes as follows:

ContactSim is a discrete-time event simulator which is specifically designed to facilitate the playback of contact traces. It is flexible, allows simple addition of new datasets and swapping of existing datasets for the underlying contact patterns, and has a powerful XML configuration language.

I have managed to have a look through it, and with the help of Graham's appendices, notes and emails, I have got it working and started to understand how it works. Graham mentioned that he has some ideas about how I might adapt it to also consider location, and to automatically run the kind of simulations we have been thinking about.

Graham's Appendix A describes in more detail how the simulator works. He suggests adding a LocationUpdate event, which informs nodes (processes, in ContactSim terminology) of any changes in their location, and which can be used by the routing algorithms.
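What a LocationUpdate event might carry can be sketched as follows. This is a Python sketch only: the class and method names are my assumptions, not ContactSim's actual event interface, which I still need to study.

```python
# Hypothetical sketch of the suggested LocationUpdate event; the
# event/process API here is invented, not ContactSim's real one.
class Node:
    """A node (process) that tracks its own location history."""
    def __init__(self, node_id):
        self.node_id = node_id
        self.current_location = None
        self.visited = set()

class LocationUpdate:
    """Event informing a node of a change in its location."""
    def __init__(self, time, node_id, location):
        self.time = time          # simulation time of the update
        self.node_id = node_id    # node whose location changed
        self.location = location  # e.g. a cell-tower id or place cluster

    def apply(self, node):
        # Deliver the update so routing algorithms can use the new
        # location and the accumulated set of visited places.
        node.current_location = self.location
        node.visited.add(self.location)

n = Node("alice")
LocationUpdate(time=10, node_id="alice", location="tower_17").apply(n)
```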

As a test, I formatted the co-location data that we generated from the SocialSensing study into the CSV format used in the simulator (I need to get a bit more information from Graham about the correct way to do this, but for the time being I copied the formatting of the MIT Reality Mining data format that Graham used), and ran two algorithms over it: Persistence Based Routing and PRoPHET. Whilst I am still trying to work out what the results mean(!), I produced the two plots below:

delivery_ratio

This plot simply shows the delivery ratio of the PBR protocol on the MIT dataset with 103 users, versus on the Social Sensing Study data with 26 users. The two traces cover different time periods (both of 1 month), but some data at the end of the MIT dataset is clipped. However, it does show that the datasets are in some way comparable.
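For reference, delivery ratio is simply the fraction of created messages that reach their destination by the end of the trace, a standard DTN metric. The message counts in this sketch are invented, not results from the experiment.

```python
def delivery_ratio(messages_created, messages_delivered):
    """Fraction of created messages that were delivered."""
    if messages_created == 0:
        return 0.0  # no messages created: define the ratio as zero
    return messages_delivered / messages_created

# Illustrative numbers only.
ratio = delivery_ratio(messages_created=200, messages_delivered=150)
```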

I am continuing to investigate the software, and will speak with Graham (who is in Canada until Christmas) about how best to adapt it for our needs. It would seem wasteful not to adopt this software, which does such a similar thing, without first testing its feasibility. Another simulator mentioned by Graham is the ONE simulator, which I have not yet looked at.

What I also plan to do is find out whether the MIT dataset contains some notion of location, even if that is just cell tower data; that would make it easier to compare any future location-based DTN algorithm that we come up with against Graham's work, and any other work that uses this dataset.

Ideas about Entities, Contacts and Places

October 4th, 2010

To define a format for storing data about the interaction and movement of people, it might be useful to start with an abstract view of what the data means. This may help when making policies regarding how the data is manipulated.

There are some decisions to make regarding how to interpret different situations:

Entities

  1. We assume that an entity is defined solely by their mobile device.
  2. We assume that in most cases the entity carries that device with them.

Contact

  1. If two entities see each other (via bluetooth), we consider them to be able to communicate.
    • are there some time constraints?
  2. If only one entity can see another, this may still mean that they are able to communicate.
  3. If an entity sees an unknown device, does it still:
    • Attempt to communicate with it?
    • Consider it in its degree calculations?
    • Share data about it?

Places

  1. Entities calculate the meaningful places that they visit.
    • What algorithm do they use to calculate the place?
    • But when do they update this information?
  2. Do entities also track routes between places, and how do they calculate them?
    • What are the characteristics of a route?
  3. Do entities conform to a shared view of places and location, or do they keep their own knowledge of places and attempt to find overlaps with their peers?
Categories: Ideas

Meeting with KM, SB and DC about Social Sensing data 7th July 2010

July 7th, 2010

Met with Kevin McCarthy, Steven Bourke and Davide Cellai about what the Social Sensing study data could be used for. We are all interested in movement patterns and prediction, and decided that we should work together.

The data is currently stored in MongoDB, a document storage database, which is apparently very easy to query. The data amounts to approximately 200GB of separate files. Kevin assured us that we would be able to access it.

Steven suggested a number of sources of similar (location) data:

  • GeoLife, by Microsoft Research.
  • SimpleGeo
  • SpotRank by Skyhook

He also described how he collected location data from Gowalla for ~2000 users in Dublin. His master's thesis was about DTNs with sensors, and so his interests are in line with mine and DC's.

We agreed to meet next week to brainstorm some ideas worthy of collaboration.

Quick meeting with Paddy 6th May 2010

May 6th, 2010

Met with Paddy in his office and discussed updates so far.

Paddy wants to see something tangible by next week’s update – i.e. a worked example of how vector clocks will work in terms of location.

Also emphasised that I should not get sidetracked! (of course!)

Suggested storing temporal information (parallel vector clocks?) – e.g. a from–to interval, so that we can say things like: does this time overlap that one, is it contained within it, etc.

Also thought about how to define location – the bounding of the location gives something to experiment with: change the grid size – what's the computational impact of the size of the grid, and what is its relevance? e.g. too big and it becomes unrealistic.

Construct vector stamps – 5 separate paths across these locations, two or three of which allow dropping messages – run through and pick various vector clocks at various times, and show them. Then start generalising them.

From this we can draw out general things, e.g.: decisions made, what information is stored, what we put in vector clocks, what operators we need.

Then run a simulation and see if generalisation works. Then we can see if some things fall down, and come back and change things.

Should stick with ideas about location, not proximity yet.

Using this, it is then possible to write this up concisely.

Actions:

  • Generate a worked example/scenario
    • show examples of vector clocks at times
    • show the movements over time
  • Don’t get sidetracked!

Supervisor PhD Meeting 14 Apr 2010

April 15th, 2010

Had a meeting with Paddy for me to pitch my ideas.

This was the basis of my Pitch_to_Paddy

Mobility is NOT location… see Simon's paper

Discussed my ‘Hypothesis’

  • Human mobility patterns are predictable
  • Human proximity patterns are predictable
  • Knowledge of proximity and location makes opportunistic routing more efficient than proximity alone.
  • Proximity and mobility can be used independently to achieve the same efficiency in opportunistic networking.
  • Mobility or proximity can be discounted when planning opportunistic networking algorithms.
  • There are low complexity algorithms based on vector clocks that can be used for routing
  • Any given node will only need to communicate with other nodes that they know (friends), or want to know (friends of friends).
    • Paddy suggested this might be a bit like a DNS tree, which hands off knowledge
    • Also, experiments need to establish that a friend's or a friend-of-a-friend's knowledge is enough to route
    • Might establish that 3 degrees is more than enough
    • Trade-off = you can get better efficiency if you give more coverage of the network

Local Metrics

Using vector clocks –

Range – how many hops away – build knowledge about the network using information in the vector clocks – how do you do that? This is a PhD part.

How do we determine the important nodes? – the Aaron Quigley effect.

Nodes in your ball of radius = sphere of interest = friends, +1 hop = friends of friends, +1 = everyone else.

Dig out reviews on DTNs – especially patterns – but Paddy thinks that the notions of location and proximity have never been used, though the patterning structures (e.g. highly important update nodes) have. So I need to look at ways of discovering special nodes – how do they determine them? The location thing seems to be different – find a review.

Mobility, as opposed to location, gives you the prediction element.

Limit the range of forward projection.

Datasets

Email GW to see if he can get a public dataset. – sign over under NDA?

Email Barry Smith to see if he knows of any datasets we can use – Vodafone dataset?

Email Aaron Quigley – he will know if there is any publicly available – his masters student has access to a corporate travel dataset.

Also – see Simon's paper about location.

Look up Intel Placelab dataset

Email Knox to see what datasets he might have.

Progression

Paddy not sure where the Vector clocks fit in

Is it a novel implementation mechanism – i.e. am I going to use vector clocks in this thing?

I want to make a prototype – Paddy likes the idea – there are some hard questions and some novelty in there. This part is about understanding how to frame a solid hypothesis: reading reviews, building the experimental structure, breaking out a few bits – vector clocks, heuristics about making decisions, how it all fits together.

Ideas about locations

Not fully formed so need to think a bit more about it

Future

We can turn the VC thing on its head, and make it useful for proximity and location.

I want to build a prototype.

Need to be careful not to spend too much time comparing to existing things if they are not really related.

The important thing is: does it matter where you are when I pass you a message – as proximity and mobility are the same – do I pass it to you because I know you see that person, or because I know you will be in the same location as that person?

Nodes can predict their own positions and share this information with other nodes – Paddy suggested sharing based on the type of relationship – e.g. friends get detailed info, FOAFs get obfuscated location, others get much broader information.

Requests?

Does a node do calculations about other nodes, or does it ask the other node – can you get this to this person?

A little bit like agent systems?

You might have different approaches depending on who you are dealing with – e.g. a message to a friend goes through friends, a message to a FOAF goes otherwise, and for everyone else – can you get it to somewhere nearer than me – or something…

Then we can say we of course can encrypt this information.

Plan

Paddy felt that vector clocks etc. used to encode, e.g., double vector clocks for location and mobility, is a solid piece of work, and if it gets into a good conference, then it is my PhD.

It will need an Order of Computation section with a complexity analysis – i.e. is it N^N, Nlog(N), N^2, N^3 – don't go much beyond that – the algorithms need analysing at the end. It is well-travelled ground what the complexity of vector clocks is.

I want to nail down what the metrics in the system are, then implement CAR using these metrics, as well as coming up with my own algorithm.

Need to convince Paddy which algorithms/datasets to compare with; there needs to be a good rationale behind it.

Need to refine the contributions bit – but this will come with time

The hypothesis section is good – but I must refine it and remove the negative ones – it should keep the positive ones, and prove one thing true or false – pick the one we think is most likely.

Add another hypothesis that low complexity algorithms based on vector clocks (30:24) can be used.

Don't go down rabbit holes!!! Give Paddy weekly updates – every Friday – nag Paddy – if he has not responded by 10am Monday, Paddy will comment back.

Tighten up – look at Knox's hypothesis section – and write a half-page hypothesis introduction and a one/two line hypothesis at the end.

Aside:

Can we use the Barabási approach to generate a new dataset – almost reverse engineer their predictions, and try to get a dataset based on them?

e.g. a random graph of streets for Dublin, randomly place nodes in a simulator, and start to make locations as the predictions.

Dig out reviews on DTNs

Meeting with Paddy 9 March 2010

March 9th, 2010

Met with Paddy to discuss funding and an idea of direction in PhD

Paddy said that if his EU grants get signed off, he will guarantee me another year of funding from September; the only requirement will be a chunk of work resulting in a ~30 page document – something to do with autonomous software patching.

Also spoke to him about the idea of vector clocks and how we can use them for opportunistic networking – he wondered whether we could derive some utility for locations, perhaps based on the duration spent at a location, the number of nodes in that location, etc.
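One possible shape for such a location utility, assuming we normalise dwell time and popularity and mix them with weights. Everything here (function name, weights, normalising maxima) is invented for illustration; it is one way the idea could be made concrete, not a settled design.

```python
def location_utility(hours_spent, nodes_seen, max_hours, max_nodes,
                     w_time=0.5, w_nodes=0.5):
    """Normalised mix of dwell time and popularity for one location.

    The weights and the normalisation by observed maxima are arbitrary
    placeholders; a real utility would need tuning and justification.
    """
    return (w_time * hours_spent / max_hours
            + w_nodes * nodes_seen / max_nodes)

# A location occupied 8 of a possible 24 hours, with 20 of a possible
# 40 nodes seen there.
u = location_utility(hours_spent=8, nodes_seen=20, max_hours=24, max_nodes=40)
```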

He said that perhaps a good paper would be on techniques for analysing locations for use with opportunistic networking.

Paddy said that he wants me to aim to submit to the CfP on Social Based Routing in Mobile and Delay Tolerant Nets (the deadline is in June), to split the time up until then into 6 week chunks, and, for the first chunk, tell him what I will be doing – i.e. what problems I am solving – and, for the whole time, what my goals are.

Also spoke about my idea for using the Dublin Bus network for research and profit – he mentioned a guy in Trinity who has access to a load of transport data, and also suggested I dig out the email address of someone important at Dublin Bus and send them a formal email about getting information about the buses, CCing him (so it looks official).

He also mentioned the TextUS service based at DCU (which runs the Aircoach SMS payment service and the SMS parking scheme), who we could collaborate with to provide a TFL-type service for Dublin.

Presentation 10th Feb 2010

February 11th, 2010

Gave a presentation to Paddy, Davide, Neil Cowzer and Fergal Reid (Clique) about my quick and dirty analysis of the dataset that I have collected already.

Slides

The general consensus was that there were not really enough users, so there were some suggestions about where other datasets might be found: persuade a mobile phone company to give data about user movements, or mine Flickr/Twitter for geo-tagged photos/tweets and try to determine groups of people based on similar locations.

Also suggested that the GMA is good for visualising data but not greatly interesting; PH is interesting, as is SPD. BD is something that is useful as an application to gather data, but would need a very large engineering effort.

Paddy suggested that if we could make the data collection process very easy, then we could throw it out to the student population to start collecting data. Fergal said that in J2ME it would be very difficult, but by sticking to C++ it might work (for Nokia phones).

Also talked about getting ground truth for the data. Fergal suggested collecting accelerometer data too (so if someone asks how we verified a GPS trace, we can say that we correlated it with the accelerometer data). I also suggested tagging locations.

Determined the following actions:

  • Look for access to datasets with good location – 1 week
    • WaveLAN Dataset
    • HEAnet – chase Paddy – Eduroam
    • Mine location data from Flickr
  • Look at applying analysis to these datasets – specifically
    • Periodicity Hunting
    • Spatial Dependence of the Degree
  • See if we can construct overlay over these networks
    • e.g. drop nodes
      • Popular locations
      • popular people
      • Other?
      • Vector clocks might be the way to do it
  • Read up about vector clocks as suggested in the paper by Kleinberg, Watts and ???? at KDDOA
  • Speak to Graham about whether I can easily integrate this data into his code; if so, do it, otherwise think about implementing it separately (robustly!)

Also planned to meet Paddy again next week to go over these things and try to hammer out a better plan, then meet with this group again in three weeks to show where I have got to.

Davide also talked about churn in proximity patterns – it might be worth thinking about what this means (the example was when a person regularly sees other people, and after a while one of those people drops off the radar – what does this mean?)

Paddy said that, in his mind, the long-term goal is to be able to forward plan using knowledge of data that has passed (prediction).

Discussion with Davide about plots etc 4th Feb 2010

February 4th, 2010

Four types of data analysis:

General Mobility Analysis

We calculate the distance between locations at the start of every time period (e.g. 1 hour) and plot the number of times that particular distance is travelled (to some granularity) over some time period (1 week maybe).
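A sketch of this calculation, assuming lat/lon positions sampled hourly; the coordinates and the 1 km bin size are invented for illustration.

```python
import math
from collections import Counter

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in km."""
    r = 6371.0  # mean Earth radius, km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

# Toy hourly positions for one node: stationary for an hour, then a
# move of roughly 4.7 km north.
positions = [(53.344, -6.258), (53.344, -6.258), (53.386, -6.257)]

BIN_KM = 1.0  # granularity of the distance histogram
histogram = Counter()
for a, b in zip(positions, positions[1:]):
    d = haversine_km(*a, *b)
    histogram[round(d / BIN_KM) * BIN_KM] += 1  # bin to nearest km
```

Plotting `histogram` over a week of data gives the distance-travelled distribution described above.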

Periodicity Hunting

We measure the time spent at a location, and count the number of visits in a bounded time period (say a week), using the same timescale as above to bracket readings.

(People visit common locations frequently, or they visit some locations for a long period of time. Also think about the case where lots of people visit a common location infrequently/frequently.)
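The counting step can be sketched as follows, with invented stay records (location labels and hours are illustrative only).

```python
from collections import Counter

# (location, start_hour, end_hour) within one week (hours 0..168).
stays = [
    ("home", 0, 9),
    ("work", 10, 18),
    ("home", 19, 33),
    ("work", 34, 42),
]

# Visits per location in the window, and total time spent at each.
visit_counts = Counter(loc for loc, _, _ in stays)
time_spent = Counter()
for loc, start, end in stays:
    time_spent[loc] += end - start
```

Repeating this per week and comparing the counters across weeks is one way to hunt for periodicity.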

Spatial Dependence of the Degree

We count the number of devices seen in a given time period (same as above – e.g. 1 hour) at each location.

Buddy Discovery

We count the duration of the contacts between pairs (the user and the devices he can see), and also the location of the contacts, and try to see which devices are seen most often, and then which devices are seen at multiple locations (using the same time period as above – 1 hour slots over a week).
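This can be sketched with invented contact records; the "buddy" threshold values below are arbitrary placeholders, not part of the analysis as defined.

```python
from collections import defaultdict

# (peer_device, duration_minutes, location) contact records for one user.
contacts = [
    ("dev_b", 30, "office"),
    ("dev_b", 15, "cafe"),
    ("dev_c", 5, "office"),
]

total_duration = defaultdict(int)   # minutes in contact, per peer
locations_seen = defaultdict(set)   # distinct locations, per peer

for peer, minutes, place in contacts:
    total_duration[peer] += minutes
    locations_seen[peer].add(place)

# Candidate "buddies": peers seen long enough AND at more than one
# location (thresholds are made up for illustration).
buddies = [p for p in total_duration
           if total_duration[p] >= 20 and len(locations_seen[p]) > 1]
```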

Categories: discussions, Ideas, projects