
Whiteboard discussion with Davide 29 Oct 2010

September 30th, 2010

Had a discussion with Davide about what we had decided would be the best algorithm to use as the baseline scenario for routing.

We think the best approach is to use either Graham's periodic degree routing approach or the BUBBLE protocol. We sketched out Graham's algorithm, and realised that it is in fact very similar to the BUBBLE protocol, except that BUBBLE does not adapt the period over which degree is calculated, and Graham's protocol does not use communities.
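
To check my understanding of the common core, here is a quick sketch (my own illustrative Python, not Graham's actual code or the BUBBLE implementation) of degree-based forwarding, where a node's degree is the number of distinct peers met within a recent period:

```python
from collections import defaultdict

class DegreeTracker:
    """Tracks each node's degree as the number of distinct peers
    it has met within the last `period` seconds (the 'periodic degree')."""
    def __init__(self, period):
        self.period = period
        self.meetings = defaultdict(list)  # node -> [(peer, time), ...]

    def record_meeting(self, a, b, t):
        self.meetings[a].append((b, t))
        self.meetings[b].append((a, t))

    def degree(self, node, now):
        # Count distinct peers met within the window [now - period, now].
        recent = {peer for peer, t in self.meetings[node] if now - t <= self.period}
        return len(recent)

def should_hand_over(tracker, carrier, encountered, now):
    """Greedy rule: pass the message when the encountered node has a
    higher periodic degree than the current carrier."""
    return tracker.degree(encountered, now) > tracker.degree(carrier, now)
```

The adaptive part of Graham's approach would then be varying `period`; the BUBBLE-specific part would be restricting the comparison within communities.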

We also had some ideas about detecting places as higher level concepts of where a node is, rather than raw location readings (e.g. GPS).

We also talked briefly about how we would use the popularity of places, in combination with community knowledge, as the basis for a routing algorithm. I will write this up separately.

Categories: Uncategorized

Kick off Meeting with Padraig and Davide 21 Sep 2010

September 21st, 2010

Had a first meeting with Padraig, where I presented a few slides (Presentation to Padraig Sep 2010) about my area of interest and how I got to it etc.

Padraig initially suggested that location is not important, but later decided that in fact, it is useful when the network structure is not fully known.

He suggested that I take a baseline scenario, using a reliable dataset (Social Sensing), and use it as a straw man to test further refinements on. E.g. take the degree approach (Graham & Davide), train it over 1 month, then have a test period where 50 messages are sent across the network. At the same time, have a firm idea about the next stage of refinements, i.e. using location to improve routing (i.e. use a location-based routing algorithm). This will then drive the next round of improvements.

He also suggested that the application of the idea is not so important, and whilst I will deal with that in the thesis, it will not figure greatly.

He also talked about community finding in the network, and how we could use that for routing.

I agreed to send on Graham's paper, and perhaps the BUBBLE Rap paper.

We will meet again next week.


About Datasets

September 14th, 2010

I have spent a couple of days looking at the quality of the datasets I have parsed (Cabspotting and GeoLife) and have had a couple of problems. When I imported the data, I made the decision to use the MySQL datetime format for timestamps. This resulted in very poor performance when querying the database (maybe). So, I decided to convert the timestamps to integers representing Unix time.

ALTER TABLE `entity_locations` ADD `unixtime` INT NULL;

Query OK, 25075904 rows affected (7 min 40.11 sec)

The conversion was a simple case of creating a new column, then updating the new column with converted timestamps:

UPDATE entity_locations SET unixtime = UNIX_TIMESTAMP(timestamp);

Query OK, 25075904 rows affected (11 min 13.38 sec)

Then adding an index (I should have added this when adding the extra column, though it may not have made much difference):

ALTER TABLE `entity_locations` ADD INDEX ( `unixtime` );

Query OK, 25075904 rows affected (15 min 21.79 sec)

All seemed well and fine; however, I had noticed during testing on a sample dataset that the MySQL datetime field equated to 23 seconds earlier than the new unixtime field (as parsed by PHP). The amended dataset maintained this difference, meaning it was not a compounding error. This will not be a major problem, unless comparing to another dataset using a small granularity.

One problem I did notice, however, in the Cabspotting dataset: the timestamp column had ON UPDATE CURRENT_TIMESTAMP set, meaning that when the new unixtime column was updated, the timestamp column was given a new timestamp value. I rectified this by removing the trigger and setting the timestamp based on the unixtime (not accounting for the 23 seconds).

Another issue I noticed with some of the data: for some of the GeoLife records, the timestamp was set to the time of insert; there were also some records which were zeroed, and some which were in the future. This may mean the need for a full re-import. Luckily, I have kept all of the data and parsing scripts.
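
A rough sketch of the kind of filtering this will need (illustrative Python of my own; the `unixtime` field matches the new column, but treating the bulk import as having a single `import_time` is a simplifying assumption):

```python
def filter_bad_records(records, import_time, now):
    """Drop records whose timestamp is zeroed, in the future, or
    suspiciously equal to the time of the bulk insert (i.e. probably
    not a real reading)."""
    return [r for r in records
            if r["unixtime"] != 0           # zeroed timestamps
            and r["unixtime"] <= now        # timestamps in the future
            and r["unixtime"] != import_time]  # set at insert time
```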

I still need to do some thorough analysis of the datasets in other ways, as mentioned before. I also want to convert the social sensing type data into the same format as these two datasets. There is also the CenceMe dataset to consider; it may not be suitable for parsing, as the location data could be very sparse. This, of course, might be useful for comparison with denser datasets.

Geolife dataset

Initial analysis of the datasets to find the concentration of readings shows that a handful of users collect data from around April/May 2008; then, from October 2008, another set of users contribute data in earnest, with fewer of the original set of users contributing. Readings tail off from April 2009, with very few users contributing after the beginning of August 2009.

Cabspotting

A good amount of data is recorded for most users between 17 May 2008 and 09 Jun 2008, with only a small number of exceptions.

Discussion with Davide 10 Sep 2010

September 10th, 2010

Had a brainstorm with Davide about what we should analyse. We decided to use the GeoLife dataset to test out some ideas:

Firstly, I will do a meta-analysis of the data to see which nodes we do and do not have data for in a given time period, counting the number of readings per day over the dataset period.

We want to identify the co-location between nodes, which we calculate as

[latex]C_{AB}(t) = 1 - \theta(|x_A(t)-x_B(t)|-\lambda)[/latex]

which means: if A and B are within λ distance of each other (where x_A is the location of A and x_B is the location of B) at time t, then C_AB(t) is 1, else C_AB(t) is 0.

[latex]C_{AB} = \frac{1}{T} \sum_{t=1}^{T} C_{AB}(t)[/latex]

This gives us the average over all time periods, showing the co-locatedness of A and B (the sum of the co-location of A and B over all time steps, divided by the number of time periods).

We use this to construct a graph where A and B are connected by a weighted edge labelled with C_AB. This network is the structure of co-locatedness, and is exactly the same as a proximity network; it does not care where you meet, only that you do meet.
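
A minimal sketch of this calculation (illustrative Python of my own; it assumes 1-D positions and already-aligned time steps, whereas the real data would need lat/lon distances and time binning):

```python
import itertools

def colocated(x_a, x_b, lam):
    """C_AB(t) = 1 - theta(|x_A(t) - x_B(t)| - lambda):
    1 when A and B are within lambda of each other, else 0.
    (Assumes the convention theta(0) = 1.)"""
    return 1 if abs(x_a - x_b) < lam else 0

def average_colocation(track_a, track_b, lam):
    """C_AB = (1/T) * sum over t of C_AB(t), for T aligned time steps."""
    T = len(track_a)
    return sum(colocated(a, b, lam) for a, b in zip(track_a, track_b)) / T

def colocation_graph(tracks, lam):
    """Weighted edge list {(A, B): C_AB} over all node pairs,
    dropping zero-weight edges."""
    edges = {}
    for a, b in itertools.combinations(sorted(tracks), 2):
        w = average_colocation(tracks[a], tracks[b], lam)
        if w > 0:
            edges[(a, b)] = w
    return edges
```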

We calculate this in 1 month(?) blocks to see how the graph evolves over time.

We are also interested to find out if nodes are influential over locations (e.g. if particular nodes visit a location, which then becomes popular).

We plot the popularity of x over time, where the popularity is the number of people that visit a location in a day, and where the set of locations X is derived from all known locations. We hope to find a pattern where the graph tails off, or increases, over time; flat graphs are uninteresting.

For this I will need to calculate locations from the data.
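
Once locations are available, counting daily popularity is straightforward; a sketch (illustrative Python of my own, assuming visit records of the hypothetical form (node, location, unixtime)):

```python
from collections import defaultdict
from datetime import datetime, timezone

def daily_popularity(visits):
    """visits: iterable of (node_id, location_id, unixtime) tuples.
    Returns {location_id: {date: number of distinct nodes seen that day}}."""
    seen = defaultdict(set)  # (location, date) -> nodes observed there that day
    for node, loc, ts in visits:
        day = datetime.fromtimestamp(ts, tz=timezone.utc).date()
        seen[(loc, day)].add(node)
    popularity = defaultdict(dict)
    for (loc, day), nodes in seen.items():
        popularity[loc][day] = len(nodes)
    return popularity
```

Plotting each location's series over days should then show whether it tails off, grows, or stays flat.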

Davide’s note about Mobile Agents


Weekly update 10 Sep 2010

September 10th, 2010

Davide and I have been conducting a quick literature search for interesting papers relating to location in delay tolerant networking, and interesting goals to do with knowledge and prediction of location. The first round led us to four main areas of interest:

  • Recommendation systems
  • Routing in Delay Tolerant Networks (obviously)
  • Peer-to-peer over proximity networks (e.g. bit torrent over DTN)
  • ????

I am in the process of reading a few papers of interest, and Davide is working away at finding other information out. We plan to meet this afternoon to discuss our progress.

I have also parsed and assimilated the GeoLife and Cabspotting datasets, and plan to do the same for the CenceMe dataset. When we get the Social Sensing project dataset, I will do the same with that. I have tried to use a consistent format for the data, so that when it comes to it, it will be possible to analyse it all in the same way. The simple format is based on the concept that networks are formed of entities, which also have locations that they visit over time.

I had already written a playback visualisation of entity movements, and have since updated it to reflect this format. This is really just a way of seeing what the data looks like over time.

Funding update:

I have finally managed to sort out the details of my funding, and have calculated that I have enough funds to cover a stipend for the next 11 months (I have already paid half fees for this year):

€11,976.42 remaining IRCSET Scholarship fund

+ €2621.25 credit on UCD account – which will be transferred to my IRCSET fund by 18 September 2010

= €14,597.67 / €1,333.50 = 10.94 instalments.

However, the only problem is that I will not be paid this month, so I plan to ask payroll to produce an advance cheque for this month (fingers crossed).


Meeting with Davide 3rd Sep 2010

September 10th, 2010

Met with Davide to catch up after the holidays. I reminded Davide what we had talked about before, and we decided to go away and do a literature search for related things.

I reiterated my idea about using prediction to get a message to the next expected meeting point, but, if the timing does not overlap, using an unknown proxy at that location to pass the message on.


Update August 2010

August 23rd, 2010

I have come back from a busy two weeks of selling photographs (oldtowncarnivalweek.co.uk/photos/) and visiting relatives, and I am thoroughly exhausted! And I am happy to say that I didn't think about PhD stuff once, which hopefully means my brain has been processing data in the background. This said, I have forgotten where I was before I left, so I am hoping that all that background processing will mean I have already solved the problems when I re-discover them.

My plan for the next week or so is to get back up to speed by re-reading my notes, and starting to test out my/our hypotheses, starting with the suggestions that Davide made in the last meeting.

Paddy has left for Tasmania, and unfortunately we were unable to meet to talk about funding before he left. This means that I am at a loss as to how to proceed. I have retrospectively applied for half fees, with the knowledge that IRCSET will allow me to carry on using unused funds for my stipend. The only worry I have is that there will not be enough to cover fees. I may try to apply for 6 month fees, rather than a full 12. Paddy did originally say that he could at least find some money to cover fees.

I have also applied for the CSI research bursary, but I am hoping that Paddy will be able to find extra funding (he previously suggested that he had some LERO grant money left, and an associated project).

I have received an email from IRCSET with regards to Paddy leaving, I am still waiting for advice from Paddy as to how I proceed.


Meeting with DC 28 July 2010

July 28th, 2010

Met with Davide to talk about where I currently am in my research.

I talked him through Ideas about Vector Clocks for Proximity and Location that I had recently transcribed into a document from my notebook, and we came up with some plans for the next few weeks whilst he is away, and agreed to meet when he is back at the end of August.

The following are some notes we made whilst discussing my idea:

Formal notation regarding node location overlaps.

[latex]O_{vy} < O_{wy} \rightarrow \text{message}(v, w)[/latex]

When the overlap of the set of locations that v visits, in relation to y, is less than the overlap of the set of locations that w visits, in relation to y, then pass the message from v to w.
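
A toy version of the rule (illustrative Python of my own; the Jaccard-style fraction used for "overlap" is my assumption, as we did not fix a formal definition):

```python
def overlap(locs_node, locs_dest):
    """O_vy: fraction of destination y's location set that node v also
    visits (one plausible reading of 'overlap'; an assumption, not the
    agreed definition)."""
    if not locs_dest:
        return 0.0
    return len(locs_node & locs_dest) / len(locs_dest)

def should_pass(locs_v, locs_w, locs_y):
    """The rule above: O_vy < O_wy  ->  pass the message from v to w."""
    return overlap(locs_v, locs_y) < overlap(locs_w, locs_y)
```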

We discussed security within DTNs and realised that neither of us has much knowledge of this, so Davide suggested I look up the work of Eiko Yoneki, who may have published in this area, as it is something we will have to consider.

We also discussed how to compare the graph generated from locations and the graph generated from proximity, and how we could combine them. Davide suggested looking at the work of Shlomo Havlin.

Finally, we talked about where to go from here. I will look at some datasets, including Cabspotting, where there is fine-grained location data, and try to test the hypotheses that I came up with. For example, look at the overlap of locations for individuals over certain timescales, and produce a probability distribution of overlap. Also, make a new graph which is formed by overlaps based on whether a node visits a location (with no ordering), and see whether this has any interesting properties. Overall, try to use the datasets to test out all of the hypotheses.

  • Look up the work of Eiko Yoneki
  • Look at the work of Shlomo Havlin
  • Find datasets and form into graphs
  • Test all hypothesis questions

Meeting with Paddy 8th July 2010

July 8th, 2010

Talked about the project so far. I said that I had started a workthrough of people's movement patterns and had come up with many ideas around it, and that what I would like to do is present what I have found so far, in terms of node statistics at any given point in time, along with some interesting aspects that came out.

I also listed some of the questions I had thought about during this exercise, as follows:

  • How do we collect data about mobility?
  • How do we define what a location is?
    • What types of location are there?
  • What is a node?
    • A user? A phone? A user carrying a phone?
    • A location? A series of locations?
  • How are edges defined?
    • users who meet (at a location, or are proximate)
    • Locations that are connected when a user travels between them
  • What can nodes learn about the network?
    • VC of proximity times (windows)
    • VC of location visits
    • Knowledge of location probabilities
    • Computed routes between nodes based on co-locations
    • Expected next delivery times for messages (+/- error)
  • How can nodes communicate?
    • Ask for expected delivery time to X
    • Share knowledge of routes to X
    • Share location and proximity probabilities over time
    • Use Agent like behaviour to decide when to pass messages
    • Feedback mechanism to re-enforce routes
  • How do we deal with privacy? (not main focus)
    • Share fine-grained location information only with trusted peers
    • Encrypted payloads
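
To make the "VC of location visits" item above a little more concrete, a rough sketch of my own (hypothetical structure: a per-node map from location to last-known visit time, merged element-wise on encounters, in the spirit of a vector-clock merge):

```python
def record_visit(clock, location, unixtime):
    """Update this node's 'clock' of location visits: keep the latest
    known visit time per location."""
    clock[location] = max(unixtime, clock.get(location, 0))

def merge_clocks(clock_a, clock_b):
    """When two nodes meet, exchange knowledge by taking the
    element-wise max, as in a vector-clock merge."""
    merged = dict(clock_a)
    for loc, t in clock_b.items():
        merged[loc] = max(t, merged.get(loc, 0))
    return merged
```

The same shape would work for proximity windows, with (peer, time) instead of (location, time).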

We agreed that the most interesting part is the forward prediction of location, and Paddy said that if I can do all of this, then it will be my PhD. Paddy also said that I should spend some time understanding the realms of computational complexity, just so I can recognise it when I see it.

Also spoke about funding. Paddy is looking at finances this week.

I asked about getting a new 2nd supervisor, and Paddy suggested I look for a new primary supervisor, someone whose door I could knock on to ask for help; as he is still a visiting supervisor, he will be here a bit, but not much. I said that I had spoken to Padraig Cunningham, and he had suggested I speak to Paddy about the path to finishing, and then come and see him. Paddy intimated that he would be a good supervisor, but that I would have to be very concise when discussing PhD stuff with him, as he won't like waffle!

New tasks:

  • Plan a presentation to Paddy and Invite Davide along
  • Look up Computational Complexity
  • Look up Graeme Stevenson's paper on LOCATE

Existing Tasks:

Project specific tasks

  • Plan short experiment to collect ground truth location data
  • Prepare to show Paddy the finished vector clock implementation (workthrough), picking out interesting parts and identifying next steps
  • Summarise findings from vector clock implementation (workthrough)

Other tasks

  • Implement vector clocks in simulator, based on location
  • Read Knox’s thesis.
  • Read Barabasi’s book.
  • Generate a rough outline of chapters for my thesis, and identify the main areas for the background section
  • Write down ideas about how to define locations (draft)
  • Dig out reviews on DTNs, especially about patterns and finding important nodes

Meeting with KM, SB and DC about Social Sensing data 7th July 2010

July 7th, 2010

Met with Kevin McCarthy, Steven Bourke and Davide Cellai about what the Social Sensing study data could be used for. We are all interested in movement patterns and prediction, and decided that we should work together.

The data itself is currently stored in MongoDB, which is a document storage database and is apparently very easy to query. The data is stored in approximately 200GB of separate files. Kevin assured us that we would be able to access this data.

Steven suggested a number of sources of similar (location) data:

  • GeoLife, by Microsoft Research.
  • SimpleGeo
  • SpotRank by Skyhook

He also described how he collected location data from Gowalla for ~2000 users in Dublin. His master's thesis was about DTNs with sensors, so his interests are in line with mine and DC's.

We agreed to meet next week to brainstorm some ideas worthy of collaboration.