
Filling the Gaps of Vehicular Mobility Traces

Fabrício A. Silva
Federal University of Minas Gerais
Belo Horizonte, Brazil
fabricio.asilva@dcc.ufmg.br

Clayson Celes
Federal University of Minas Gerais
Belo Horizonte, Brazil

Azzedine Boukerche
University of Ottawa
Ottawa, Canada

Linnyer B. Ruiz
State University of Maringá
Maringá, Brazil

Antonio A. F. Loureiro
Federal University of Minas Gerais
Belo Horizonte, Brazil

ABSTRACT
Simulation is the most widely adopted approach for evaluating Vehicular Ad hoc Network (VANET) and Delay-Tolerant Network (DTN) solutions. The reliability of the results depends fundamentally on the mobility models used to represent the real network topology with high fidelity. Simulation tools usually rely on mobility traces to build the corresponding network topology from the contacts established between mobile nodes. However, the quality of these traces, in terms of spatial and temporal granularity, is a key factor that directly affects the network topology and, consequently, the evaluation results. In this work, we show that widely adopted real vehicular mobility traces present gaps, and we propose a solution to fill those gaps, leading to more fine-grained traces. We propose and evaluate a cluster-based solution that uses clustering algorithms to fill the gaps, and we apply it to calibrate three existing, widely adopted taxi traces. The results reveal that the gaps indeed lead to network topologies that differ from reality, directly affecting the evaluation results. To contribute to the research community, the calibrated traces are publicly available to other researchers, who can adopt them to improve their evaluation results.

Categories and Subject Descriptors
C.2.4 [Computer-Communication Networks]: Distributed Systems

General Terms
Design, Algorithms, Performance

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Permissions@acm.org.

MSWiM’15, November 2–6, 2015, Cancun, Mexico.
© 2015 ACM. ISBN 978-1-4503-3762-5/15/11 ...$15.00.
DOI: http://dx.doi.org/10.1145/2811587.2811612.

Keywords
vehicular mobility traces; filling the gaps; calibration; vehicular ad-hoc networks; performance evaluation

1. INTRODUCTION
Simulation is the most widely adopted approach for evaluating Vehicular Ad hoc Network (VANET) solutions [10, 17]. Evaluating the performance of these solutions is an important challenge for researchers, given the particular characteristics of this type of ad hoc network, such as its highly dynamic and large-scale topology. Conducting real experiments with ordinary vehicles is very expensive and time-consuming, particularly when a large-scale evaluation is required. In addition, there is no publicly available large-scale testbed, and one is unlikely to become available in the near future because of the deployment and maintenance costs involved. Simulation, on the other hand, is a cost-effective approach that scales to large scenarios and is widely adopted by researchers. However, the reliability of simulation results depends on appropriate and accurate vehicular mobility models that represent the network topology with high fidelity.

The adopted vehicular mobility model plays an important role in the reliability of the simulation results [7, 8, 11, 18]. Existing simulation tools use mobility models to build scenarios in which vehicles move and communicate with each other. The mobility model determines the position of each vehicle at each instant, and these positions are used to build the network topology. In other words, unrealistic mobility models lead to unrealistic network topologies and, consequently, to unreliable evaluation results, as demonstrated by Baumann et al. [2]. Therefore, it is very important to adopt realistic vehicular mobility models when evaluating VANET solutions.

Vehicular mobility traces typically describe the position of vehicles over time and are used to bring realism to simulation tools. With the spread of GPS-enabled devices, real traces collected by vehicles during their daily routine have become publicly available in the literature [3, 23, 26]. These real traces are very useful for the evaluation process, since they capture the movements of real vehicles and are used in different scenarios, as discussed in the next section. However, their quality, in terms of spatial and temporal granularity, is a key factor that affects the network topology and, consequently, the evaluation results. The existence of spatial and temporal gaps (i.e., long periods or distances between two consecutive entries of a vehicle in the trace) leads to network topologies that differ from reality. In fact, the traces we analyzed present spatial and temporal gaps that affect the simulation results presented in the literature. Hence, finding and eliminating such gaps to build a high-fidelity mobility model is key to guaranteeing the reliability of the results. Nevertheless, this problem has not been tackled appropriately in the literature, since most solutions focus on finding a high-level path between two sparse points (i.e., a gap) instead of building a fine-grained trajectory. In this study, we find and fill existing gaps appropriately, leading to calibrated, fine-grained traces.

In this work, we contribute to the research community by finding and eliminating gaps in vehicular mobility traces. First, we demonstrate that publicly available traces contain gaps, so these traces must be calibrated to eliminate them. We then propose and evaluate a cluster-based solution to fill the gaps, following the methodology proposed in [25]. Our solution relies on the existing trajectory points, obtained from the trace itself, which are organized into clusters that represent the anchor points used in the calibration. Therefore, our approach can be applied to different real traces, since it requires neither a map nor any other additional information. We demonstrate this by applying our solution to calibrate three existing, widely adopted taxi traces. We consider taxi traces in our study because they are real, publicly available, and widely adopted in the literature. However, our solution is general enough to be applied to any vehicular mobility trace. The results reveal that the gaps indeed lead to different network topology graphs, directly affecting the results of the performance evaluation. To contribute to the research community, we have made the calibrated traces publicly available at [29].

The remainder of this work is organized as follows. In Section 2, we present studies in the literature that are related to ours. Next, in Section 3, we describe the real traces used in our work. In Section 4, we analyze the gaps in these traces that need to be filled and present a calibration solution to fill them. We present the calibration results in Section 5. Finally, we conclude our study and present some future work in Section 6.

2. RELATED WORK
The demand for real vehicular mobility traces has increased significantly in the last decade because of the research community's interest in VANETs and Delay-Tolerant Networks (DTNs). This demand has led some research groups to install GPS-enabled devices on taxis and collect their routes, which are then organized and made publicly available. Currently, publicly available taxi mobility traces exist for San Francisco, USA [23, 24], Rome, Italy [1, 3], and Shanghai [26] and Beijing [30], China.

The public availability of such traces has led the research community to investigate how to model vehicles and their connectivity. To this end, some studies began characterizing the mobility traces. In [1], the authors characterize the taxi trace from Rome and analyze an epidemic dissemination protocol using this trace as the mobility model. The studies presented in [4, 6, 13] characterize the network topology and connectivity metrics of the taxi trace from San Francisco. Furthermore, the taxi trace from Shanghai is used to study mobility patterns [15, 16, 19, 31] as well as network topology and connectivity metrics [14, 32, 33]. Similarly, the trace from Beijing is explored in mobility characterization studies [9, 30].

These characterizations and analyses led to important findings about mobility patterns, helping to define novel communication and dissemination protocols for VANETs and DTNs. However, most VANET and DTN performance evaluations rely on vehicle contacts. The network graph representing those contacts is built from the mobility traces, which may present gaps in space and time (see Section 4.1). Such gaps lead to missing contacts and, consequently, to an incomplete topology graph that does not correctly represent the real contacts among vehicles.

Only a few studies analyze how meaningful a network topology obtained from a trace is and, when necessary, propose calibration algorithms to improve it. In [12], the authors focus on extrapolating people's trajectories from sparsely collected points. To this end, they assume that users have a daily routine and exploit their historical trajectories to infer unknown ones. This assumption does not apply to taxis, which move according to user demand. Furthermore, the inferred trajectory is not represented by a high number of points, which is required to build the network topology.

In [20], the authors propose a probabilistic model to infer the path a mobile user follows between two position points. The path is represented by the roads on the map that the user probably traveled to move from the origin point to the destination. Since the path contains no points that describe a high-granularity trajectory of the mobile user, it is not feasible to build a contact graph with this solution. Therefore, it is not useful for evaluating the performance of VANETs and DTNs.

In [13], the authors interpolate adjacent points to find an intermediate point between them. To this end, the method averages the samples within one minute backward and forward to estimate the position of a mobile entity at a given time. This simple approach works when the mobile entity travels in a straight line. However, it fails when the entity changes direction at an intersection, which is a very common mobility pattern for vehicles.
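The following Python sketch makes this failure mode concrete. It estimates a missing position by averaging the samples that fall within a window before and after the target time. It is a simplified illustration of the window-averaging idea, not the exact procedure of [13]. The `window_average` helper, the planar coordinates in metres, and the toy trajectory are hypothetical.

```python
from statistics import mean

# Hypothetical helper illustrating window averaging (not the code of [13]).
def window_average(samples, t, window):
    """samples: list of (time_s, x_m, y_m). Return the mean (x, y) of the samples
    whose time lies within [t - window, t + window], excluding time t itself."""
    near = [(x, y) for (ts, x, y) in samples if abs(ts - t) <= window and ts != t]
    return (mean(x for x, _ in near), mean(y for _, y in near))

# Toy trajectory: a vehicle drives north on the road x = 0, turns at the
# intersection (0, 0), and continues east on the road y = 0.
# The sample at t = 120 s (the intersection) is missing.
samples = [(0, 0.0, -100.0), (60, 0.0, -50.0), (180, 50.0, 0.0), (240, 100.0, 0.0)]

straight = window_average(samples, 30, 60)  # between two points on the same road
corner = window_average(samples, 120, 60)   # true position is the corner (0, 0)
```

On the straight segment the estimate is (0.0, -75.0), which stays on the road x = 0. At the intersection the estimate is (25.0, -25.0), which lies off both roads.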

In [25], the authors propose a methodology to calibrate trajectories that has two components, which we adopt in this work: a reference system and a calibration method. The reference system is built from a set of anchor points that are independent of the trajectory. The calibration method uses the reference system to find points to insert along the trajectory, making it more complete. The authors evaluate and present results for different strategies of their methodology. However, the following drawbacks motivate the current work. First, the detailed algorithm to build a reference system is not presented. Second, their calibration method ignores the relationship between anchor points in the reference system, leading to an inaccurate calibration. In addition, the taxi trace mentioned and used in their work is not described, so their results cannot be reproduced. Finally, the calibrated trace is not publicly available.

Our current work complements [25] by proposing and describing algorithms to calibrate incomplete trajectory data, and by making calibrated traces available to the research community. Furthermore, our calibration method performs better than that of [25], since it considers the relationships between the points in the reference system. Therefore, researchers can easily reproduce our results, apply our solution to other traces, and download the already calibrated traces from three cities located in different parts of the world. Moreover, we expect these traces to significantly improve the performance evaluation of VANET and DTN solutions.

3. THE MOBILITY TRACES
The vehicular mobility traces available in the literature can be classified as synthetic or real. Synthetic traces are built by mobility generator tools from particular characteristics of the city, such as population and neighborhood type (i.e., residential, commercial, industrial), among other aspects collected by city managers. The best-known synthetic mobility traces are those from Cologne [28] and Zurich [22]. Since synthetic traces have high granularity in both space and time, there is no need to fill gaps in them.

Real mobility traces are generated by real vehicles equipped with GPS-enabled devices. They usually represent the mobility of taxis, since this kind of experiment is easier to deploy and maintain in taxis than in ordinary vehicles. From the real mobility traces available in the literature, we selected three for this work: Rome, San Francisco, and Shanghai. We chose them because of the high demand for these traces and because their geographical locations cover three different parts of the world, namely Europe, North America, and Asia.

Each trace comes from a different source and uses a different format. To facilitate their adoption and use, we formatted all entries as tuples ⟨id, timestamp, lat, long⟩. Here, id is the vehicle's unique identifier, timestamp is the date and time of the entry in the format yyyy-mm-dd HH:MM:ss, and lat and long are the latitude and longitude, respectively, in the WGS84 coordinate system. In the following, we describe the main details of each trace.
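The normalization step can be sketched in Python as follows. The sketch assumes that each raw record has already been mapped to a comma-separated line `id,timestamp,lat,long`. The raw formats of the individual traces are not described here, so the separator, the `Entry` type, and the helper names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from typing import NamedTuple

TS_FORMAT = "%Y-%m-%d %H:%M:%S"  # yyyy-mm-dd HH:MM:ss


class Entry(NamedTuple):
    """One normalized entry <id, timestamp, lat, long>, coordinates in WGS84 degrees."""
    id: str
    timestamp: datetime
    lat: float
    long: float


def parse_entry(line: str, sep: str = ",") -> Entry:
    """Parse one normalized line; reject coordinates outside the WGS84 ranges."""
    vid, ts, lat, lon = (field.strip() for field in line.split(sep))
    lat_f, lon_f = float(lat), float(lon)
    if not (-90.0 <= lat_f <= 90.0 and -180.0 <= lon_f <= 180.0):
        raise ValueError(f"coordinates out of range: {lat_f}, {lon_f}")
    return Entry(vid, datetime.strptime(ts, TS_FORMAT), lat_f, lon_f)


def to_line(e: Entry, sep: str = ",") -> str:
    """Serialize an entry back to the normalized text format."""
    return sep.join([e.id, e.timestamp.strftime(TS_FORMAT), f"{e.lat:.6f}", f"{e.long:.6f}"])


def group_by_vehicle(entries):
    """Map each vehicle id to its entries sorted by timestamp (hypothetical helper)."""
    groups = defaultdict(list)
    for e in entries:
        groups[e.id].append(e)
    return {vid: sorted(es, key=lambda x: x.timestamp) for vid, es in groups.items()}
```

The gap analysis that follows relies on grouping entries per vehicle in time order, because gaps are measured between consecutive entries of the same vehicle.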

3.1 Rome
The trace of Rome [1, 3] contains the positions of taxi cabs working during the entire month of February 2014. Each taxi driver has an Android device equipped with a GPS receiver that periodically retrieves its position and sends it to a central server. Positions with a precision error higher than 20 m were discarded. For the entire month, this trace contains a total of 21,817,851 position entries from 316 taxis. On average, each vehicle contributes 69,040 positions over the whole collection period. However, a few vehicles contribute up to 118,500 positions, while others contribute as few as 19.

3.2 San Francisco
The trace of San Francisco [23, 24] contains position entries of 536 taxis working during the month of May 2008. Each taxi has a GPS receiver installed and periodically sends location information (identifier, timestamp, latitude, longitude) to a central server. For the entire month, this trace contains a total of 11,219,955 position entries. Each taxi contributes around 20,930 entries on average. A very few taxis contribute significantly fewer entries, while others contribute up to 49,370.

3.3 Shanghai
The trace from Shanghai [15, 26] contains the positions of 4,316 taxis from February to April 2007¹. However, we could find publicly available data for only a single day in February, containing 6,075,587 position entries. As in the other cases, the taxis were equipped with GPS-enabled devices, which periodically sent their position information to a server. On average, each vehicle collected around 1,408 entries during the only publicly available day. A few vehicles contribute fewer entries, while others contribute up to 7,011.

4. CALIBRATION SOLUTION
In this section, we describe our solution to fill the gaps between consecutive points that lie far enough apart to leave the network topology graph incomplete. First, we show how far apart in space the entries of the traces are. Next, we present our solution to fill such gaps and, consequently, improve the quality of the traces.

4.1 Measuring the gaps
The completeness of the topology graph is a key factor in the performance evaluation of VANET and DTN solutions. Contacts among vehicles or mobile devices that occurred in reality, but are missing because of gaps in the traces, affect the evaluation, since data exchange depends on contacts. To measure how significant the gaps in the existing traces are, we evaluate the distance between every two consecutive entries, as discussed in the following.

Figure 1 depicts the Complementary Cumulative Distribution Function (CCDF) of the distances between every two consecutive points for Rome (Figure 1(a)), San Francisco (Figure 1(b)), and Shanghai (Figure 1(c)). The third quartile (blue vertical line) shows that 25% of consecutive point pairs are at least 66.7 m, 446.7 m, and 163.3 m apart for Rome, San Francisco, and Shanghai, respectively. Given these gaps and a transmission range of 100 m [5], many existing contacts will be missed when the network topology graph is built from the trace.
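The distance measurement and the third-quartile reading of the CCDF can be sketched in Python as follows. This is an illustrative sketch, not the authors' code. It assumes great-circle (haversine) distances on a spherical Earth with a radius of 6,371 km. The helper names and the toy coordinates are hypothetical.

```python
import bisect
import math
import statistics

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius (spherical approximation)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def consecutive_distances(points):
    """Distances between consecutive (lat, lon) points of one vehicle, in time order."""
    return [haversine_m(*p, *q) for p, q in zip(points, points[1:])]


def ccdf(values):
    """Return (x, P[X > x]) for each distinct sample value x."""
    xs = sorted(values)
    n = len(xs)
    return [(x, (n - bisect.bisect_right(xs, x)) / n) for x in sorted(set(xs))]


def third_quartile(values):
    """Upper-quartile cut point: roughly 25% of the samples lie above it."""
    return statistics.quantiles(values, n=4)[2]


# Toy trajectory moving north along one meridian (hypothetical coordinates).
pts = [(41.9000, 12.5000), (41.9005, 12.5000), (41.9030, 12.5000)]
gaps = consecutive_distances(pts)  # about 55.6 m and 278.0 m
share_beyond_range = sum(g > 100.0 for g in gaps) / len(gaps)  # vs. a 100 m range
```

Applied per vehicle over a whole trace, `share_beyond_range` estimates how often consecutive entries are farther apart than the transmission range.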

In addition to spatial gaps, temporal gaps should also be considered, because they also lead to missing contacts. For example, two vehicles that are stopped (or moving very slowly) close to each other will be in contact for some time. If the trace lacks their points for that entire period, these contacts will not appear in the topology graph, since it is impossible to know whether the vehicles were close enough to be in contact during that period.

To evaluate the existence of temporal gaps, we also measured the intervals between consecutive points in the three traces. 25% of these intervals are longer than 15 s, 62 s, and 63 s for Rome, San Francisco, and Shanghai, respectively. Given the highly dynamic nature of VANETs and DTNs, such long intervals between consecutive points cause gaps in the induced topology, and many existing contacts will be missed.
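Temporal gaps can be measured in the same way on the timestamps. Below is a minimal Python sketch with hypothetical helper names and toy timestamps. The 15 s threshold is the Rome third-quartile value reported above and is used only as an example.

```python
from datetime import datetime, timedelta


def intervals_s(timestamps):
    """Seconds between consecutive timestamps of one vehicle (sorted in time)."""
    return [(b - a).total_seconds() for a, b in zip(timestamps, timestamps[1:])]


def share_longer_than(intervals, threshold_s):
    """Fraction of consecutive intervals strictly longer than threshold_s."""
    return sum(1 for x in intervals if x > threshold_s) / len(intervals)


start = datetime(2014, 2, 1, 8, 0, 0)
ts = [start + timedelta(seconds=s) for s in (0, 10, 70, 140, 145)]
iv = intervals_s(ts)                 # [10.0, 60.0, 70.0, 5.0]
share = share_longer_than(iv, 15.0)  # 0.5
```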

We therefore conclude that existing vehicular mobility traces indeed present spatial and temporal gaps, which may lead to an incorrect representation of the network topology in performance evaluations. In addition, linear interpolation is not an attractive technique for filling the gaps, since it would produce points outside the roads. To solve this problem, the next subsection presents a cluster-based solution to fill the existing gaps.

(a) Rome, (b) San Francisco, (c) Shanghai
Figure 1: Complementary Cumulative Distribution Function (CCDF) of the distances between two adjacent points. These plots reveal that a significant number of consecutive entries are separated by distances large enough to affect the network topology.

¹ These data were obtained from the Wireless and Sensor Networks Lab (WnSN), Shanghai Jiao Tong University.

4.2 Filling the gaps
Our approach for filling the gaps in vehicular mobility traces has two stages. The first stage extracts a reference system from the vehicles' historical GPS trajectory dataset. The second stage applies a calibration method that uses a subset of anchor points from the previously built reference system. In the following, we describe both stages.

4.2.1 Cluster-based Reference System
The reference system consists of a set of points produced by a clustering process applied to historical trajectories. Each point, called a centroid, represents a cluster of nearby GPS points recorded by all vehicles in the trace. Since those GPS points come from real trajectories, it is reasonable to assume that each centroid is a potential location for a new point in a trajectory. In other words, a centroid is very likely to lie on a road that vehicles travel through. Here, we adopt the K-Means clustering method [21] to partition the data into K clusters based on the spatial distribution of the GPS points. We then take the centroid of each cluster to form the reference system.

Algorithm 1 - Reference System based on Clustering

Input: The historical vehicle trajectories (raw_data) and the number of clusters (K)
Output: Reference System (RefSys), a set of centroid points

1: procedure ClusteringGpsPoints
2:   Clusters ← applyClustering(raw_data, K)
3:   RefSys ← getCentroids(Clusters)
4: end procedure

Algorithm 1 shows the basic steps to obtain the reference system. First, the K-Means method partitions the data into K groups based on the spatial distribution of the points (Line 2). Then, we obtain the centroid of each group and add it to the reference system (Line 3).
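Algorithm 1 can be sketched in plain Python using Lloyd's iterations for K-Means. The sketch avoids external libraries so that it runs anywhere. It treats (lat, lon) as planar coordinates, which matches the two-dimensional setting described in the text, and it initializes the centroids from randomly sampled input points. The names `kmeans` and `reference_system` are hypothetical stand-ins for applyClustering and getCentroids.

```python
import math
import random


def _nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda j: math.dist(point, centroids[j]))


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm on 2-D points; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [tuple(p) for p in rng.sample(points, k)]  # init: k random input points
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [_nearest(p, centroids) for p in points]
        # Update step: move each centroid to the mean of its members.
        new = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new.append(tuple(sum(coord) / len(members) for coord in zip(*members)))
            else:
                new.append(centroids[j])  # an empty cluster keeps its previous centroid
        if new == centroids:  # converged: no centroid moved
            break
        centroids = new
    return centroids, labels


def reference_system(raw_points, k):
    """Sketch of Algorithm 1: cluster all GPS points and keep the centroids (RefSys)."""
    centroids, _ = kmeans(raw_points, k)  # applyClustering + getCentroids
    return centroids
```

In practice, a library implementation such as scikit-learn's `KMeans` would typically replace this loop. Its fitted `cluster_centers_` attribute holds the centroids.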

A particular problem when using K-Means is choosing an appropriate value of K. To overcome this problem, we apply the elbow method [27], which finds the smallest value of K beyond which the clustering error no longer decreases significantly. In other words, increasing K further does not reduce the error enough to be worthwhile. The running time of the K-Means clustering method is O(nkdi), where n is the number of samples, d is the number of dimensions (two in our case, namely latitude and longitude), k is the number of clusters, and i is the number of iterations until the clustering process converges.
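The elbow criterion is informal, so any code for it encodes one particular reading. The sketch below uses a common heuristic. It computes the within-cluster sum of squared errors (SSE) for each candidate K and picks the smallest K whose next increment lowers the SSE by less than a tolerance. The 10% tolerance is an arbitrary choice not taken from the text, and the SSE values in the example are made up for illustration.

```python
import math


def sse(points, centroids):
    """Within-cluster sum of squared errors: each point to its nearest centroid."""
    return sum(min(math.dist(p, c) ** 2 for c in centroids) for p in points)


def elbow_k(errors, tol=0.10):
    """errors: {k: SSE for k clusters}. Return the smallest k whose next increment
    (k -> k+1) lowers the SSE by less than `tol` (hypothetical heuristic)."""
    ks = sorted(errors)
    for k, k_next in zip(ks, ks[1:]):
        if errors[k] == 0 or (errors[k] - errors[k_next]) / errors[k] < tol:
            return k
    return ks[-1]


# Made-up SSE curve: large drops up to K = 3, then a plateau.
errors = {1: 1000.0, 2: 400.0, 3: 150.0, 4: 140.0, 5: 135.0}
best_k = elbow_k(errors)  # 3
```

In a real run, each entry of `errors` would come from clustering the GPS points with that K and evaluating `sse` on the resulting centroids.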

4.2.2 Calibration Method
In this stage, we perform the calibration with a geometric approach that improves on the base method described in [25]. More specifically, when we find a gap in a trajectory T, we obtain the reference system of the region. We then select the centroid points that lie between the endpoints of the gap and add them to trajectory T.

The calibration method receives the following input parameters. T is a sequence of n consecutive points with spatio-temporal information. RefSys is the reference system obtained from Algorithm 1. min_d and max_d are the lower and upper distance limits that define a spatial gap. time_d is the maximum interval between the two points for the pair to be treated as a gap; longer intervals are treated as breaks, as explained in Section 5. The result is a new trajectory T′ that contains the original points of T plus a set of calibrated points added to fill the existing gaps in T.

Algorithm 2 describes the calibration method. For each pair of consecutive points in T, we check whether there is a gap between them according to the input parameters (Lines 4–8). If so, we perform the calibration. First, we detect the set of centroid points from the reference system near the gap. For this, the bounding_box function finds the point halfway (midpoint) between the two endpoints of the gap and returns a circle centered at this midpoint (Line 9). Then, we collect all centroid points of the reference system whose coordinates lie inside the circle and store them in C (Line 10). Next, starting from p_p, we iteratively find the point a∗ ∈ C nearest to the last selected point a′ (Line 14) and check the angular condition (Line 15). The angular condition ensures that only centroids in the same direction as the trajectory are considered, which avoids selecting points in the opposite direction. If the condition holds, a∗ is added to L, between p_p and p_n, and becomes the new reference point a′. Then, a∗ is removed from C, and these steps are repeated until C is empty (Lines 13–23). Finally, we insert the calibrated points of L into T′.

The algorithm described in [25] does not consider the relationship between the points inserted into a gap. Our solution (Algorithm 2) accounts for this relationship by choosing each new centroid according to its distance from the last selected centroid (Line 14).

Algorithm 2 - Calibration Method

Input: Trajectory (T = [P1, P2, ..., Pn]), Reference System (RefSys), minimum spatial distance (min_d), maximum spatial distance (max_d), and temporal distance (time_d)
Output: A calibrated trajectory (T′) without gaps

1: procedure Calibrate
2:   T′ ← T[1]
3:   for i = 2 to length(T) do
4:     p_p ← T[i − 1]    ▷ p_p is the previous point
5:     p_n ← T[i]    ▷ p_n is the next point
6:     d ← distance(p_p, p_n)
7:     t ← interval(p_p, p_n)
8:     if d in [min_d, max_d] and t ≤ time_d then
9:       bb_coord ← bounding_box(p_p, p_n)
10:      C ← subset(RefSys, bb_coord)
11:      Initialize an empty list L
12:      a′ ← p_p
13:      while C is not empty do
14:        a∗ ← arg min_{a ∈ C} d(a, a′)
15:        if ∠(a′→a∗, p_p→p_n) < π/2 then
16:          Add a∗ to L
17:          a′ ← a∗
18:        end if
19:        Remove a∗ from C
20:        if C is empty then
21:          break
22:        end if
23:      end while
24:      Insert the centroids in L, followed by p_n, into T′
25:    else
26:      Insert p_n in T′
27:    end if
28:  end for
29:  return T′
30: end procedure
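A minimal Python sketch of Algorithm 2 follows, under several assumptions:

- Points are (t, x, y) tuples in a local planar frame in metres, for example after projecting latitude and longitude.
- The bounding region is a circle centered at the gap midpoint with radius d/2. The text does not give the radius.
- The angular test relies on the fact that the angle between two vectors is below π/2 exactly when their dot product is positive.
- Inserted centroids carry a placeholder timestamp of None.
- p_n is appended after the inserted centroids, so the output keeps every original point of T.

The function name and the example data are hypothetical.

```python
import math


def calibrate(traj, refsys, min_d, max_d, time_d):
    """Sketch of Algorithm 2 (hypothetical implementation).

    traj:   list of (t, x, y) tuples sorted by t (seconds, metres).
    refsys: list of (x, y) centroids from the reference system.
    Returns a new list with centroids inserted into qualifying gaps;
    inserted points have t = None (Equation 1 can estimate their timestamps).
    """
    out = [traj[0]]
    for prev, nxt in zip(traj, traj[1:]):
        (tp, xp, yp), (tn, xn, yn) = prev, nxt
        d = math.dist((xp, yp), (xn, yn))
        if min_d <= d <= max_d and tn - tp <= time_d:
            # Candidate set C: centroids inside a circle around the gap midpoint
            # (radius d / 2 is an assumption of this sketch).
            mid = ((xp + xn) / 2, (yp + yn) / 2)
            cands = [c for c in refsys if math.dist(c, mid) <= d / 2]
            gap_dir = (xn - xp, yn - yp)
            last = (xp, yp)  # a' starts at the previous point p_p
            while cands:
                # Nearest remaining centroid to the last accepted point (Line 14).
                best = min(cands, key=lambda c: math.dist(c, last))
                step = (best[0] - last[0], best[1] - last[1])
                # Angle below pi/2 <=> positive dot product (Line 15).
                if step[0] * gap_dir[0] + step[1] * gap_dir[1] > 0:
                    out.append((None, best[0], best[1]))
                    last = best
                cands.remove(best)
        out.append(nxt)  # the original end point p_n is always kept
    return out


traj = [(0, 0.0, 0.0), (60, 200.0, 0.0), (90, 230.0, 0.0)]
refsys = [(50.0, 5.0), (100.0, -5.0), (150.0, 5.0), (45.0, 40.0), (100.0, 300.0), (-20.0, 0.0)]
result = calibrate(traj, refsys, min_d=50, max_d=500, time_d=1200)
```

In this example, the 200 m gap receives the centroids at x = 50, 100, and 150. The centroid at (45, 40) lies inside the circle but is rejected: seen from the last accepted centroid (50, 5), it points backward relative to the gap. The final 30 m step is below min_d and is left untouched.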

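Besides inserting calibrated points to fill the spatial gap, we must also assign them timestamps so that they accurately represent the trajectory. Thus, before adding a∗ to L (Line 16), we estimate the time at which the vehicle passed the centroid a∗ using Equation 1 [25]. In this equation, d(·, ·) is the distance between two coordinates, and x.t denotes the timestamp of point x:

$$
a^{*}.t = p_p.t + \left(p_n.t - p_p.t\right)\cdot\frac{d(p_p, a^{*})\cdot\left|\overrightarrow{p_p a^{*}}\cdot\overrightarrow{p_p p_n}\right|}{d(p_p, p_n)\cdot\left|\overrightarrow{p_p a^{*}}\right|\cdot\left|\overrightarrow{p_p p_n}\right|} \tag{1}
$$

The factor multiplying the interval p_n.t − p_p.t is the length of the projection of the vector from p_p to a∗ onto the gap direction, divided by the gap length d(p_p, p_n). Hence, a∗ receives a timestamp proportional to how far along the gap it lies.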
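Equation 1 can be checked numerically with a short Python function. The sketch assumes planar coordinates in metres, where d(·,·) coincides with the vector norm. The function name and the example points are hypothetical.

```python
import math


def estimate_time(pp, pn, a):
    """Timestamp for centroid a = (x, y) inserted between pp and pn, each (t, x, y),
    following Equation 1 on planar coordinates (pp and pn must not coincide)."""
    tp, xp, yp = pp
    tn, xn, yn = pn
    u = (a[0] - xp, a[1] - yp)  # vector p_p -> a*
    v = (xn - xp, yn - yp)      # vector p_p -> p_n
    d_pa = math.hypot(*u)       # d(p_p, a*) = |p_p a*| in the plane
    d_pn = math.hypot(*v)       # d(p_p, p_n) = |p_p p_n| in the plane
    if d_pa == 0.0:
        return tp               # a* coincides with p_p
    dot = abs(u[0] * v[0] + u[1] * v[1])
    return tp + (tn - tp) * d_pa * dot / (d_pn * d_pa * d_pn)


# A 200 m gap covered in 60 s; centroids 50 m and 100 m along the gap direction.
t1 = estimate_time((0, 0.0, 0.0), (60, 200.0, 0.0), (50.0, 5.0))    # about 15 s
t2 = estimate_time((0, 0.0, 0.0), (60, 200.0, 0.0), (100.0, -5.0))  # about 30 s
```

The results match the fractions 50/200 and 100/200 of the 60 s interval. The small lateral offsets of the centroids (±5 m) do not change the timestamps, because only the projection onto the gap direction counts.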

The running time of Algorithm 2 depends on the length of T and on the number of centroid points in C for each calibrated gap. Let N_c be the average number of centroids per gap and N_T the length of the trajectory. The complexity is then O(N_T · N_c²), because each of the N_c iterations of the inner loop searches the remaining candidates in C for the nearest centroid. The elbow method keeps the number of centroids small, and calibration is an offline process that runs only once per trace. We therefore consider this complexity very reasonable.

Metric              Trace           Original   Calibrated
No. Clusters        Rome            7          3
                    San Francisco   2          1
                    Shanghai        297        141
Avg. Eccentricity   Rome            4.3        3.1
                    San Francisco   3.5        2.0
                    Shanghai        16.37      16.09
Avg. Path Length    Rome            2.4        1.9
                    San Francisco   2.3        1.4
                    Shanghai        4.20       2.92

Table 1: Graph properties of original and calibrated traces. The calibrated traces have different complex network metric values from the original ones.

5. CALIBRATION RESULTS

In this section, we assess the performance of our calibration solution by applying it to the mobility traces from Rome, San Francisco, and Shanghai. To this end, we apply Algorithm 1 to build the reference system for each scenario, and Algorithm 2 to calibrate the traces. We consider that there is a gap when two consecutive points are more than 50 m and less than 500 m apart. In addition, the interval between the two consecutive points must be less than 1200 s; otherwise, it is considered a break period of the driver, and not a gap. The input parameters for Algorithm 2 were therefore set to min_d = 50 m, max_d = 500 m, and time_d = 1200 s.
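The gap criterion above can be sketched as a small classifier for pairs of consecutive samples. This is a minimal sketch under our own assumptions: raw samples are (latitude, longitude, Unix time) tuples in degrees and seconds, the haversine great-circle distance on a spherical Earth stands in for distance(·,·), and the names haversine_m and classify are hypothetical. The thresholds are the values stated in the text, with strict inequalities as worded there.

```python
import math
from typing import Tuple

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; spherical-Earth assumption


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    h = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def classify(prev: Tuple[float, float, float], nxt: Tuple[float, float, float],
             min_d: float = 50.0, max_d: float = 500.0,
             time_d: float = 1200.0) -> str:
    """Label a pair of consecutive (lat, lon, unix_time) samples.

    'break': interval of at least time_d seconds (driver break, not a gap)
    'gap':   min_d < distance < max_d metres, to be filled by calibration
    'none':  anything else, left untouched
    """
    if nxt[2] - prev[2] >= time_d:
        return "break"
    d = haversine_m(prev[0], prev[1], nxt[0], nxt[1])
    return "gap" if min_d < d < max_d else "none"
```

Two samples 0.002° of latitude apart (about 222 m) and 60 s apart are labelled a gap; the same displacement after 1500 s is labelled a break.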

First, we compare our calibration method to the one described in [25]. Figure 2 illustrates an example of one gap, where red points represent the endpoints of the gap (i.e., the original points) and blue points are the new ones added by the calibration method. Our calibration method fills the gap more accurately, since the previously added anchor point (i.e., centroid) is used as the reference for selecting the next one. In contrast, the original solution does not consider the relationship among the anchors to be selected. This illustrates one of the contributions of our solution discussed earlier: an improved calibration method.

To assess the impact of filling the gaps on the network topology, we compare several network topology metrics of the original and calibrated traces. The objective is to show that the gaps in the original traces are responsible for a network topology that differs from reality. To this end, we randomly sample an entire weekday from each of the original traces, apply our calibration method to it, and build the topology graphs for the original and the calibrated traces. Note that our goal here is to analyze how the network topology built from the original trace differs from the calibrated one, not to explain their mobility patterns.

(a) Filling gaps using related work [25]. (b) Filling gaps using our calibration method.

Figure 2: Comparing our calibration method with [25].

[Figure 3 plots, panels (a) Rome, (b) San Francisco, (c) Shanghai: CCDF of vehicles' degree, P(Degree >= x) versus vehicle degree, for the original and calibrated traces.]

Figure 3: CCDF of vehicles' degree. Notice that a significant number of contacts are lost due to the gaps in the original traces.

Figure 3 illustrates the CCDF of the vehicles' degree, that is, the number of distinct vehicles with which a particular vehicle had contact during the entire evaluation period. A contact exists when two vehicles are less than 100 m apart. Because of the existing gaps, some contacts that occur in reality may not be represented in the traces. In fact, the number of contacts lost due to the gaps is remarkable. As stated earlier, the missing contacts lead to an unrealistic topology and, consequently, to inaccurate results.
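The contact and degree definitions above can be sketched as follows. This is a minimal illustration, not the authors' tooling: it assumes the traces have already been resampled into synchronized snapshots, a hypothetical mapping {timestamp: {vehicle_id: (x, y)}} in planar metres, and the names contact_graph and degree_ccdf are ours. An edge joins two vehicles that were less than 100 m apart in at least one snapshot, so a vehicle's degree counts distinct contacted vehicles over the whole period.

```python
import math
from collections import defaultdict
from itertools import combinations
from typing import Dict, Hashable, Iterable, Set, Tuple

Snapshots = Dict[float, Dict[Hashable, Tuple[float, float]]]


def contact_graph(snapshots: Snapshots,
                  radius: float = 100.0) -> Dict[Hashable, Set[Hashable]]:
    """Undirected contact graph: u-v is an edge if ever < radius metres apart."""
    adj: Dict[Hashable, Set[Hashable]] = defaultdict(set)
    for positions in snapshots.values():
        # Naive O(n^2) pairwise check per snapshot; a spatial grid scales better.
        for (u, pu), (v, pv) in combinations(positions.items(), 2):
            if math.dist(pu, pv) < radius:
                adj[u].add(v)
                adj[v].add(u)
    return dict(adj)


def degree_ccdf(adj: Dict[Hashable, Set[Hashable]],
                vehicles: Iterable[Hashable]) -> Dict[int, float]:
    """P(Degree >= x) for each observed degree x; isolated vehicles have degree 0."""
    degrees = [len(adj.get(v, ())) for v in vehicles]
    n = len(degrees)
    return {x: sum(d >= x for d in degrees) / n for x in sorted(set(degrees))}
```

Running both graphs (original and calibrated) through degree_ccdf gives the two curves compared in each panel; a missed contact in the original trace removes an edge and shifts the curve left.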

The betweenness of a vehicle counts the shortest paths between all pairs of other vehicles that pass through it. This is an important metric, since a vehicle with high betweenness is expected to have greater influence on the dissemination of messages in the network. Therefore, betweenness is relevant to the performance evaluation, since it represents the importance of vehicles to the network topology. For example, betweenness could be used to select the most appropriate vehicle to carry and forward packets. As expected, the gaps also lead to differences in centrality metrics such as betweenness, as depicted in Figure 4. We can observe that vehicles in the calibrated traces present, on average, lower betweenness values for Rome and San Francisco, and about the same values for Shanghai.
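The sketch below computes betweenness on an unweighted contact graph with Brandes' algorithm. It uses the standard definition, in which each pair of other vehicles contributes the fraction of its shortest paths that pass through the vehicle; the paper does not state which implementation or normalization it used, so treat this as an assumption. Libraries such as NetworkX offer an equivalent betweenness_centrality function; the dependency-free version here, named betweenness (hypothetical), takes a symmetric adjacency mapping such as one built from the 100 m contact rule.

```python
from collections import deque
from typing import Dict, Hashable, Iterable


def betweenness(adj: Dict[Hashable, Iterable[Hashable]]) -> Dict[Hashable, float]:
    """Brandes' algorithm for an unweighted, undirected graph (unnormalised).

    adj must be symmetric and list every vehicle as a key.
    """
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        # Single-source BFS counting shortest paths (sigma) and predecessors.
        stack, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        dist = dict.fromkeys(adj, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Back-propagate dependencies in order of non-increasing distance from s.
        delta = dict.fromkeys(adj, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Every unordered pair {s, t} was counted once from each endpoint.
    return {v: b / 2.0 for v, b in bc.items()}
```

On a four-vehicle cycle a-b-d-c-a, vehicle b lies on one of the two shortest paths between a and d, so its betweenness is 0.5.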

The same is observed for other structural properties of the topology graphs, such as the average nearest neighbor degree, as illustrated in Figure 5. For all scenarios, vehicles in the calibrated traces are connected to neighbors with higher degree. This is also because not all contacts that occur are represented in the original traces. Again, this metric affects the performance evaluation, since it influences communication among vehicles.
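The average nearest neighbor degree plotted against vehicle degree can be sketched as follows, assuming the usual definition: k_nn(v) is the mean degree of v's neighbors, and k_nn(k) averages k_nn(v) over all vehicles of degree k. The function names are hypothetical and the input is a symmetric adjacency mapping; vehicles without contacts are skipped because k_nn is undefined for them.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Set


def avg_neighbor_degree(adj: Dict[Hashable, Set[Hashable]]) -> Dict[Hashable, float]:
    """k_nn(v): mean degree of v's neighbours; isolated vehicles are skipped."""
    return {v: sum(len(adj[u]) for u in nbrs) / len(nbrs)
            for v, nbrs in adj.items() if nbrs}


def knn_by_degree(adj: Dict[Hashable, Set[Hashable]]) -> Dict[int, float]:
    """k_nn(k): average of k_nn(v) over all vehicles v with degree k."""
    groups: Dict[int, List[float]] = defaultdict(list)
    for v, value in avg_neighbor_degree(adj).items():
        groups[len(adj[v])].append(value)
    return {k: sum(vals) / len(vals) for k, vals in sorted(groups.items())}
```

In a star with one hub and three leaves, each leaf (degree 1) sees a neighbor of degree 3, and the hub (degree 3) sees neighbors of degree 1, so the curve is {1: 3.0, 3: 1.0}.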

Finally, we also measured other graph properties, namely the number of clusters formed in the graph, the average eccentricity of vehicles, and the average path length. These results, presented in Table 1, also indicate that the original traces with gaps yield topologies that differ from reality.
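The text does not define these properties further, so the sketch below makes explicit assumptions: a "cluster" is taken to be a connected component of the contact graph, and eccentricity and shortest-path length are measured in hops within each vehicle's own component, since both are infinite across components. All three values come from one breadth-first search per vehicle; the function names are hypothetical.

```python
from collections import deque
from typing import Dict, Hashable, Set, Tuple


def bfs_distances(adj: Dict[Hashable, Set[Hashable]],
                  source: Hashable) -> Dict[Hashable, int]:
    """Hop distances from source to every vehicle reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def graph_summary(adj: Dict[Hashable, Set[Hashable]]) -> Tuple[int, float, float]:
    """(connected components, mean eccentricity, mean shortest-path length)."""
    seen: Set[Hashable] = set()
    components = 0
    ecc_sum, path_sum, pairs = 0, 0, 0
    for v in adj:
        dist = bfs_distances(adj, v)
        if v not in seen:          # first vehicle of a component not yet visited
            components += 1
            seen.update(dist)
        ecc_sum += max(dist.values())
        path_sum += sum(dist.values())
        pairs += len(dist) - 1     # ordered pairs (v, w), w reachable and w != v
    n = len(adj)
    mean_ecc = ecc_sum / n if n else 0.0
    mean_path = path_sum / pairs if pairs else 0.0
    return components, mean_ecc, mean_path
```

For a graph made of the path a-b-c plus a separate edge d-e, the sketch reports 2 components, a mean eccentricity of 7/5 = 1.4, and a mean path length of 10/8 = 1.25.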


[Figure 4 plots, panels (a) Rome, (b) San Francisco, (c) Shanghai: CCDF of vehicles' betweenness, P(Betweenness >= x) versus betweenness, for the original and calibrated traces.]

Figure 4: The vehicles' betweenness is, in general, lower for the calibrated traces.

[Figure 5 plots, panels (a) Rome, (b) San Francisco, (c) Shanghai: average nearest neighbor degree (Knn) versus vehicle degree, for the original and calibrated traces.]

Figure 5: The average nearest neighbor degree is also higher for the calibrated traces.

6. CONCLUSION AND FUTURE WORK

In this work, we showed that existing real vehicular mobility traces present temporal and spatial gaps. These gaps lead to network topologies that differ from reality and, consequently, to unreliable performance evaluations. To tackle this problem, we proposed and evaluated a solution to find and fill the gaps by adopting a cluster-based reference system and a calibration method. The results revealed that our approach was able to fill the gaps properly, and that the network topologies built from the calibrated traces indeed differ significantly from the original ones. To contribute to the research community, we made the calibrated traces from three different cities publicly available.

As future work, we plan to fine-tune the calibration solution to avoid adding calibrated points off the road network, which are caused by GPS errors in the traces. Furthermore, it is important to evaluate other clustering algorithms, as well as other strategies to build the reference system.

7. ACKNOWLEDGEMENT

This work is supported by NSERC, Canada Research Chair Program and NSERC-DIVA Strategic Research Network, by CNPq under grant 573.738/2008-4 (INCT NAMITEC), and by CAPES under grant 99999.002061/2014-07.

8. REFERENCES

[1] R. Amici, M. Bonola, L. Bracciale, A. Rabuffi, P. Loreti, and G. Bianchi. Performance assessment of an epidemic protocol in VANET using real traces. In Int’l Conference on Selected Topics in Mobile and Wireless Networking, volume 40, pages 92–99, 2014.

[2] R. Baumann, S. Heimlicher, and M. May. Towards Realistic Mobility Models for Vehicular Ad-hoc Networks. In IEEE Mobile Networking for Vehicular Environments, pages 73–78, 2007.

[3] L. Bracciale, M. Bonola, P. Loreti, G. Bianchi, R. Amici, and A. Rabuffi. CRAWDAD data set roma/taxi (v. 2014-07-17). Downloaded from http://crawdad.org/roma/taxi/, July 2014.

[4] Y. Chen, M. Xu, Y. Gu, P. Li, and X. Cheng. Understanding topology evolving of VANETs from taxi traces. Advanced Science and Technology Letters, 42(Mobile and Wireless):13–17, 2013.

[5] L. Cheng, B. Henty, D. Stancil, F. Bai, and P. Mudalige. Mobile vehicle-to-vehicle narrow-band channel measurement and characterization of the 5.9 GHz dedicated short range communication (DSRC) frequency band. IEEE Journal on Selected Areas in Communications, 25(8):1501–1516, 2007.

[6] A. Cornejo, C. Newport, S. Gollakota, J. Rao, and T. J. Giuli. Prioritized gossip in vehicular networks. Ad Hoc Networks, 11(1):397–409, 2013.

[7] M. Fiore and J. Harri. The networking shape of vehicular mobility. In Proc. of the 9th ACM International Symposium on Mobile Ad Hoc Networking and Computing, pages 261–272. ACM, 2008.

[8] M. Fiore, J. Harri, F. Filali, and C. Bonnet. Understanding vehicular mobility in network simulation. In IEEE International Conference on Mobile Adhoc and Sensor Systems, pages 1–6, 2007.

[9] M. Gao, T. Zhu, X. Wan, and Q. Wang. Analysis of travel time patterns in urban using taxi GPS data. In IEEE International Conference on Green Computing and Communications, pages 512–517, 2013.

[10] A. Grzybek, M. Seredynski, G. Danoy, and P. Bouvry. Aspects and trends in realistic VANET simulations. In IEEE International Symposium on a World of Wireless, Mobile and Multimedia Networks (WoWMoM), pages 1–6, June 2012.

[11] J. Harri, F. Filali, and C. Bonnet. Mobility models for vehicular ad hoc networks: a survey and taxonomy. IEEE Communications Surveys & Tutorials, 11(4):19–41, 2009.

[12] A. Hess and J. Ott. Extrapolating sparse large-scale GPS traces for contact evaluation. In ACM Workshop on HotPlanet, pages 39–44, 2013.

[13] M. A. Hoque, X. Hong, and B. Dixon. Efficient multi-hop connectivity analysis in urban vehicular networks. Vehicular Communications, 1(2):78–90, 2014.

[14] X. Hou, Y. Li, D. Jin, D. Wu, and S. Chen. Modeling the impact of mobility on the connectivity of vehicular networks in large-scale urban environment. IEEE Transactions on Vehicular Technology, (99), 2015.

[15] H. Huang, D. Zhang, Y. Zhu, M. Li, and M.-Y. Wu. A metropolitan taxi mobility model from real GPS traces. Journal of Universal Computer Science, 18(9):1072–1092, 2012.

[16] H. Huang, Y. Zhu, X. Li, M. Li, and M.-Y. Wu. META: A mobility model of metropolitan taxis extracted from GPS traces. In IEEE Wireless Communications and Networking Conference (WCNC), pages 1–6, 2010.

[17] S. Joerer, F. Dressler, and C. Sommer. Comparing apples and oranges? In ACM International Workshop on Vehicular Inter-networking, Systems and Applications, pages 27–32, New York, USA, 2012.

[18] A. Kesting, M. Treiber, and D. Helbing. Connectivity statistics of store-and-forward intervehicle communication. IEEE Transactions on Intelligent Transportation Systems, 11(1):172–181, March 2010.

[19] C.-H. Lee, J. Kwak, and D. Y. Eun. Characterizing link connectivity for opportunistic mobile networking: Does mobility suffice? In IEEE INFOCOM, pages 2076–2084, 2013.

[20] M. Li, A. Ahmed, and A. J. Smola. Inferring movement trajectories from GPS snippets. In International Conference on Web Search and Data Mining (WSDM), 2015.

[21] S. Lloyd. Least squares quantization in PCM. IEEE Trans. Inf. Theor., 28(2):129–137, March 1982.

[22] V. Naumov, R. Baumann, and T. Gross. An evaluation of inter-vehicle ad hoc networks based on realistic vehicular traces. In The 7th ACM International Symposium on Mobile Ad Hoc Networking and Computing (MobiHoc), pages 108–119, New York, NY, USA, May 2006.

[23] M. Piorkowski, N. Sarafijanovic-Djukic, and M. Grossglauser. CRAWDAD data set epfl/mobility (v. 2009-02-24). Downloaded from http://crawdad.org/epfl/mobility/, Feb. 2009.

[24] M. Piorkowski, N. Sarafijanovic-Djukic, and M. Grossglauser. A Parsimonious Model of Mobile Partitioned Networks with Clustering. In The First International Conference on COMmunication Systems and NETworkS (COMSNETS), January 2009.

[25] H. Su, K. Zheng, J. Huang, H. Wang, and X. Zhou. Calibrating trajectory data for spatio-temporal similarity analysis. The International Journal on Very Large Data Bases (VLDB), 24(1):93–116, 2015.

[26] SUVnet. Shanghai data trace. Online (available at http://wirelesslab.sjtu.edu.cn/taxi trace data.html).

[27] R. L. Thorndike. Who belongs in the family? Psychometrika, 18(4):267–276, 1953.

[28] S. Uppoor, O. Trullols-Cruces, M. Fiore, and J. M. Barcelo-Ordinas. Generation and Analysis of a Large-Scale Urban Vehicular Mobility Dataset. IEEE Transactions on Mobile Computing, 13(5), 2014.

[29] Wisemap. Urban mobility. Available at www.wisemap.dcc.ufmg.br/urbanmobility, May 2015.

[30] C. Xia, D. Liang, H. Wang, M. Luo, and W. Lv. Characterization and modeling in large-scale urban DTNs. In IEEE 37th Conference on Local Computer Networks (LCN), pages 352–359, 2012.

[31] L. Zhang, M. Ahmadi, J. Pan, and L. Chang. Metropolitan-scale taxicab mobility modeling. In IEEE Global Communications Conference (GLOBECOM), pages 5404–5409, 2012.

[32] D. Zhao, H. Ma, L. Liu, and X.-Y. Li. Opportunistic coverage for urban vehicular sensing. Computer Communications, 60:71–85, 2015.

[33] H. Zhu, M. Li, L. Fu, and G. Xue. Impact of Traffic Influxes: Revealing Exponential Intercontact Time in Urban VANETs. IEEE Transactions on Parallel and Distributed Systems, 22(8):1258–1266, 2011.
