- Open Access
VSTP: vessel spatio-temporal contact pattern detection based on MapReduce
© The Author(s) 2017
- Received: 31 May 2017
- Accepted: 13 October 2017
- Published: 30 October 2017
Because 3G/4G networks do not cover the open ocean, costly satellite communication is the main means of providing network service at sea. An ocean mobile delay tolerant network (OMDTN) can provide low-cost data transmission by exploiting the contact opportunities of moving vessels. The spatio-temporal contact pattern is one of the key metrics for improving the efficiency of routing algorithms in OMDTN. Contact patterns have been studied for human handheld devices and vehicular ad hoc networks (VANETs). However, vessel trajectory data is stored in a distributed and unordered manner, so traditional contact pattern detection algorithms cannot be applied directly. In this paper, we design a parallel algorithm named VSTP, based on MapReduce, to detect spatio-temporal contact patterns from the trajectories of over 2000 vessels. Studying the vessels’ trajectories and contact records, we observe that the vessels’ contact pattern, including the inter-contact time distribution and the contact times distribution, is in sharp contrast to the patterns reported for human handheld devices and VANETs. Our results can provide guidelines for the design of data routing protocols on OMDTN and offer a new solution to the difficulty of ocean network coverage.
- Contact pattern
- Empirical data analysis
- Delay tolerant network
- Sensor network
Ocean network communication is one of the most important research topics in ocean information technology. It plays an important role in vessel communication, ocean observation, and military security. Currently, the main means of communication for ocean vessels include HF and VHF radio communication [1–3], onboard laser communication, cellular mobile networks, and maritime satellite communication. HF and VHF radio communication is vulnerable to atmospheric interference, and thus has blind coverage areas and poor confidentiality. It also requires a predetermined frequency, which makes it impossible to provide network service over a large-scale ocean area. At present, it is only suitable for internal communication within a fleet of vessels and for directional communication with onshore base stations. Laser communication has many advantages, such as high bandwidth and confidentiality, but it is not suitable for providing general oceanic network communication services because of the targeting problem. Because of the high cost of setting up ocean base stations, cellular networks cannot cover sea areas at medium or long distances from shore, and frequent network data transmission leads to high communication cost. Currently, maritime satellite communication is a good means of ocean network communication. However, the high price of terminals and communication fees prevents its large-scale application.
In short, existing ocean communication schemes are restricted by the limitations of their communication modes, high infrastructure deployment costs, and high communication expenses; thus, they cannot provide low-cost, large-scale oceanic network services. Hence, ocean network communication is a problem that needs to be solved urgently. The ocean mobile delay tolerant network creates communication opportunities through the movement of vessels and can provide low-cost data transmission throughout the network without any infrastructure. During data routing in a mobile delay tolerant network, the optimal routing path can be computed efficiently if the available information base (such as the contact pattern, hot spot areas among nodes, and the movement pattern model) is comprehensive. Therefore, exploring the rules of vessel movement and contact patterns from large-scale vessel trajectory data is a key problem in ocean network communication research.
However, vessel trajectory data has several unique characteristics. Firstly, vessel trajectory data is sparse. Because of base station load, the capacity constraints of satellite communication, equipment stability, vessel density at sea, and other factors, the interval between timestamps in vessel trajectory data varies, generally from 3 to 20 min. This makes the trajectory data sparse, and such gaps count as short-term missing data. In addition, because some fishermen lack awareness of safe operation and do not turn on the VMS (Vessel Monitoring System), parts of some vessels’ trajectories have long-term missing data.
Secondly, vessel trajectory data is stored in a distributed manner and is not time-sequential. The data consists of real-time positioning data from the Beidou satellite system and Automatic Identification System (AIS) data. Because of the large volume of trajectory data and the variety of acquisition methods, the data is stored in a distributed manner. Because of variations in satellite communication quality and the limited communication range of onshore AIS base stations, vessel trajectory data cannot be stored in the information system in real time, so it is not time-sequential. Non-sequential, distributed storage makes the trajectory data very difficult to sort, process, and mine.
Finally, the vessel density distribution is uneven. Because of fishing moratorium restrictions and the uneven distribution of ports and fishery resources, ocean vessels exhibit obvious hot spot areas and spatio-temporal correlation, which differs from traditional VANETs and human handheld devices [8, 9].
In this paper, we propose a vessel spatio-temporal contact pattern detection algorithm based on MapReduce, called VSTP, and analyze the vessel contact pattern, including the inter-contact time distribution [10–12] and the contact times distribution. To the best of our knowledge, this is the first systematic vessel spatio-temporal contact pattern detection strategy in the ocean communication field, and the first paper in this field to detect vessel spatio-temporal contact patterns from real data.
- We carry out, for the first time, a quantitative and qualitative analysis of vessel spatio-temporal contact patterns on the basis of vessel trajectory data.
- We propose a parallel vessel spatio-temporal contact pattern detection method based on the MapReduce model.
- We establish models of the inter-contact time distribution and the sailing alone time distribution.
The rest of this paper is organized as follows. Firstly, some related works are introduced in Section 2. Secondly, a parallel algorithm named VSTP based on MapReduce is designed in Section 3. After that, the experiment and modeling are presented in Section 4. Finally, the paper is summarized in Section 5.
The research goal of the ocean mobile delay tolerant network is to achieve efficient, low-cost data transmission. At present, there are few research results on the key technologies of ocean mobile delay tolerant networks for vessels. Existing results mainly focus on traditional mobile delay tolerant networks for vehicles and handheld devices. This section introduces existing work on node contact patterns and data routing algorithms for mobile delay tolerant networks, as well as parallel data processing technology.
2.1 Node movement and contact pattern
A mobile delay tolerant network creates communication opportunities through the movement of nodes and provides low-cost data transmission services throughout the network. Studying the movement and contact patterns of network nodes is the key to predicting communication opportunities, and it is also a key issue in mobile delay tolerant network research. To explore node contact patterns, researchers around the world have obtained node contact data by asking volunteers to wear special equipment, by AP scanning, and by other experimental methods. Subsequent work processed the real experimental data collected in these studies and analyzed the cumulative distribution function of the contact time intervals of all nodes in logarithmic coordinates, concluding that human contact intervals between 10 min and 24 h obey a power law distribution. Further research built on these results and concluded that the time interval between node contacts obeys a power law distribution over short time scales and an exponential distribution beyond them. Such methods require equipment to be deployed in advance to collect contact data, which prevents large-scale application. Zhu et al. extended the research object: they collected real trajectory data from 2109 taxis over 1 month in Shanghai and analyzed this large-scale data using a sliding time window algorithm. Their research shows that the inter-contact time between taxis follows an exponential tail distribution over a large range of time scales and that traffic is an important factor affecting the inter-contact time between taxis. However, vessel trajectory data is stored in a distributed and non-sequential manner, which makes it difficult to sort the full data set. Therefore, a new method is needed to solve the problem.
2.2 MDTN routing algorithm
According to the use of historical information, we divide the algorithms into two categories.
Algorithms that do not use historical information are designed to forward data to the node that most improves network performance (for example, by reducing transmission delay or improving the delivery ratio). RAPID treats MDTN routing as a resource allocation problem. Its authors predict the inter-contact time and transmission delay based on an exponential distribution and calculate the data transmission path that optimizes network performance under bandwidth and buffer constraints. Building on RAPID, Lee et al. designed an algorithm called Max-Contribution, which considers the joint resource allocation problem of link scheduling, data transmission, and data copies. Based on a local knowledge base, the authors designed a greedy algorithm to solve the transmission path calculation problem.
Taking advantage of comprehensive historical information further improves the efficiency of routing algorithms. Historical information-based algorithms usually use community structure and centrality as basic information. Community structure describes how closely nodes are related, and centrality describes how active a node is in the network. Based on these properties, the Bubble Rap algorithm uses global centrality and local centrality to design the routing algorithm. When nodes from different communities meet, packets are forwarded in the direction of increasing global centrality. Once the data reaches the destination community, it is forwarded in the direction of increasing local centrality until it reaches the destination node. The ZOOM algorithm is designed according to the contact pattern of vehicles. It uses information entropy and the modularity parameter to analyze the vehicles’ contact pattern and trains a higher-order Markov chain on historical contact information. Based on the Markov chain and other related parameters, the algorithm can effectively predict future contacts and improve the packet delivery ratio.
MapReduce is a widely used programming model, with an associated implementation, for processing and generating big data sets with a parallel, distributed algorithm on a cluster [21, 22]. A MapReduce computation is composed of a number of map tasks and reduce tasks, which are executed by two kinds of basic computing units called mappers and reducers, respectively. Much work has been done to improve the performance of MapReduce clusters from many aspects, such as placement, blocking operators, I/O optimization, task scheduling, and hybrid systems.
In this paper, we use the MapReduce programming model to process large amounts of non-sequential, distributed vessel data, which overcomes several shortcomings of existing methods.
To investigate the contact pattern between vessels, it is important to analyze the trajectories of a large number of vessels. In this section, we first introduce the trajectory data collected by the Beidou satellite system and the Automatic Identification System (AIS). Then, we briefly introduce the sliding time window algorithm and its disadvantages. Finally, we present the VSTP algorithm for detecting the contact pattern of vessels. The algorithm is based on MapReduce, which addresses the problems of big data, unordered storage, and distributed storage.
3.1 Trajectory data collection
The trajectory data used in this paper was collected by the vessel monitoring system of the Zhejiang Province Ocean and Fisheries Bureau and contains data from AIS and the Beidou satellite system.
To make it easier for the government to monitor illegal fishing activities, the Ocean and Fisheries Bureau of Zhejiang Province has promoted and deployed the BeiDou Navigation Satellite System, which can monitor fishing vessels in real time. Because the satellite signal is unstable, the data collected by BeiDou is also out of order. To cope with the data volume, the bureau stores the trajectory data in a Hadoop system.
3.2 Data preprocess
To improve the availability of the data, it needs to be cleaned and preprocessed. Because the devices are unstable, the data of nearly 5% of vessels is too scarce and incomplete to use and is deleted from the dataset.
Then, we use a velocity filter to remove noise points caused by signal problems. We use velocity_max to denote the maximum velocity of a vessel in the dataset. The velocity is acquired by the GPS device, which eliminates the error caused by the current. Starting from the first point, we calculate the Euclidean distance between neighboring points and the resulting average velocity. If the average velocity is greater than velocity_max ∗ λ, we delete the latter point of the pair from the dataset. λ is the tolerance parameter: the smaller it is, the more sensitive the filter is. In this paper, we set λ to 1.5. The filter first checks whether the first point is noise. If the average velocity between the first and second points is far greater than velocity_max, we then check the average velocity between the second and third points. If that is also far greater than velocity_max, we treat the second point as noise; otherwise, we treat the first point as noise.
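The velocity filter described above can be sketched as follows. This is a minimal illustration, not the authors’ implementation: the function name `velocity_filter`, the record layout `(t, x, y)`, and the use of planar coordinates in metres are assumptions made for the example (real latitude/longitude data would first need to be projected), and "far greater than velocity_max" is interpreted here as exceeding velocity_max ∗ λ.

```python
import math


def velocity_filter(points, velocity_max, lam=1.5):
    """Drop noise points from one vessel's time-ordered trajectory.

    points: list of (t_seconds, x_metres, y_metres) tuples (assumed layout).
    velocity_max: maximum plausible vessel speed in m/s.
    lam: tolerance parameter (the text sets it to 1.5).
    """
    limit = velocity_max * lam

    def speed(a, b):
        dt = b[0] - a[0]
        if dt <= 0:
            return math.inf  # duplicate or reversed timestamp: treat as noise
        return math.hypot(b[1] - a[1], b[2] - a[2]) / dt

    pts = list(points)
    # Special case for the head of the trajectory: decide whether the first
    # or the second point is the noisy one by looking at the third point.
    if len(pts) >= 3 and speed(pts[0], pts[1]) > limit:
        if speed(pts[1], pts[2]) > limit:
            del pts[1]  # second point is noise
        else:
            del pts[0]  # first point is noise

    kept = pts[:1]
    for p in pts[1:]:
        # Compare against the last point that was kept; delete the latter
        # point of the pair when the implied speed is implausible.
        if speed(kept[-1], p) <= limit:
            kept.append(p)
    return kept
```

Comparing each candidate against the last kept point, rather than its raw predecessor, prevents one outlier from also causing the following valid point to be discarded.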
Data sparsity also influences data availability. The time interval between records ranges from 30 to 300 s. Before using the data, we preprocess it with a high-order spline interpolation algorithm called OceanRoute, which makes it denser and more usable. OceanRoute exploits the turning behavior of vessels and solves eight equations to interpolate the data accurately. The experimental results reported for OceanRoute show that the algorithm is usable and efficient.
3.3 Sliding time window algorithm
Ideally, to detect every contact between vessels, the time interval of the data should be 1 s. Because of data storage pressure and the communication capability of the devices, trajectories are always collected at a coarse granularity. Researchers use the sliding time window algorithm to deal with this data sparseness problem. The algorithm assumes a time window sliding along the timeline. The objects within the time window are checked against each other to detect whether they are within communication range. The time window then slides forward until the end of the timeline. If the time window is too large, the algorithm may detect false contacts that do not exist. If it is too small, the algorithm may miss real contacts.
3.4 Algorithm design
In this subsection, we introduce a parallel algorithm based on MapReduce, which uses key-value processing to avoid gathering and sorting the data. The idea of VSTP is simple. First, each server uses the Map function to assign data records to their time windows. Then, the Reduce function computes the contacts within each time window. Finally, another round of MapReduce is executed to merge the contacts. The procedure is described below.
The pseudocode of the mapping procedure is shown in Algorithm 1. The function JudgeTimeWindow calculates the timeWindow to which a record belongs. The function Interpolation implements the OceanRoute algorithm to interpolate the data and obtain more accurate results. The mapping procedure then generates intermediate key-value pairs.
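Since Algorithm 1 is not reproduced in this text, the following is only a plausible sketch of the mapping step, simulated in plain Python rather than on a Hadoop cluster. `judge_time_window` mirrors the role of JudgeTimeWindow, while the fixed-length window, the `start` parameter, the record layout, and the helper `shuffle` (which imitates the framework's grouping by key) are assumptions; the OceanRoute interpolation step is omitted.

```python
from collections import defaultdict


def judge_time_window(timestamp, start, window):
    """Index of the fixed-length time window that a timestamp falls into."""
    return int((timestamp - start) // window)


def map_record(record, start, window):
    """Mapper: emit (timeWindow, record) for one trajectory record.

    record: (vessel_id, t_seconds, x, y), an assumed layout.
    Each record is handled independently, so the input needs no sorting.
    """
    _vessel_id, t, _x, _y = record
    yield judge_time_window(t, start, window), record


def shuffle(pairs):
    """Imitate the MapReduce shuffle: group intermediate values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)
```

Because the window index is computed from the record alone, unordered records spread over several servers still end up in the correct group after the shuffle.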
The reducing procedure processes all the intermediate key-value pairs. Intermediate key-value pairs with the same interKey are processed by the same reducer. The communication radius is denoted by r. The reducer calculates the distance between each pair of records. If the distance is less than r, a contact and its related information are recorded.
The pseudocode of the reducing procedure is shown in Algorithm 2. Each reducer traverses all record pairs with the same timeWindow id to check whether their distance is within the communication range. If so, the reducer generates a record containing the id pair, contact timestamp, contact location, and contact duration. In particular, the parameter idPair is a string formed by joining the two vessels’ ids with the character ‘,’. Because “id1,id2” and “id2,id1” denote the same pair of vessels, the ids in idPair are combined in ascending order to simplify the calculation, as shown in Algorithm 2. The contact duration, which is used by the merging procedure, is estimated from the vessels’ current locations, velocities, and directions. The estimation method is shown below.
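The reducing step might look roughly like the sketch below, which stands in for Algorithm 2 without claiming to match it. The record layout, planar coordinates, taking the later of the two timestamps as the contact timestamp, and taking the midpoint as the contact location are all assumptions; the contact duration field is left out here.

```python
import math
from itertools import combinations


def make_id_pair(id_a, id_b):
    """Join two vessel ids in ascending (string) order, so that
    (id1, id2) and (id2, id1) map to the same key."""
    return ",".join(sorted((id_a, id_b)))


def reduce_window(time_window, records, r):
    """Reducer for one timeWindow: emit (idPair, contact) for close pairs.

    records: list of (vessel_id, t_seconds, x_metres, y_metres).
    r: communication radius in metres.
    """
    for (ida, ta, xa, ya), (idb, tb, xb, yb) in combinations(records, 2):
        if ida == idb:
            continue  # two records of the same vessel are not a contact
        if math.hypot(xa - xb, ya - yb) < r:
            yield make_id_pair(ida, idb), {
                "timeWindow": time_window,
                "timestamp": max(ta, tb),                    # assumed convention
                "location": ((xa + xb) / 2, (ya + yb) / 2),  # assumed: midpoint
            }
```

Keying the output by the ordered id pair lets a later MapReduce round bring all contacts of the same two vessels to one reducer.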
As can be seen from inequality (10), it is a typical one-variable quadratic inequality. By solving inequality (10), we can obtain the communication time ranges of the two vessels. Therefore, we can predict the duration of communication between the two vessels.
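The equations leading to inequality (10) are not reproduced in this text, so the sketch below uses a standard relative-motion argument that yields a quadratic inequality of the same kind; it may differ in detail from the authors’ formulation. Assuming both vessels keep a constant velocity, with relative position d and relative velocity w, the vessels are in range while |d + w t|² ≤ r², that is, (w·w) t² + 2 (d·w) t + (d·d − r²) ≤ 0. The function names and the planar coordinate convention are hypothetical.

```python
import math


def contact_interval(p1, v1, p2, v2, r):
    """Solve |d + w*t|^2 <= r^2 for t, where t = 0 is the current time.

    p1, p2: (x, y) positions in metres; v1, v2: (vx, vy) in m/s.
    Returns (t_start, t_end), or None if the vessels never come in range.
    """
    dx, dy = p1[0] - p2[0], p1[1] - p2[1]
    wx, wy = v1[0] - v2[0], v1[1] - v2[1]
    a = wx * wx + wy * wy
    b = 2.0 * (dx * wx + dy * wy)
    c = dx * dx + dy * dy - r * r
    if a == 0:
        # Identical velocities: the distance never changes.
        return (-math.inf, math.inf) if c <= 0 else None
    disc = b * b - 4.0 * a * c
    if disc < 0:
        return None
    root = math.sqrt(disc)
    return ((-b - root) / (2.0 * a), (-b + root) / (2.0 * a))


def estimate_contact_duration(p1, v1, p2, v2, r):
    """Length of the part of the contact interval that lies in the future."""
    interval = contact_interval(p1, v1, p2, v2, r)
    if interval is None or interval[1] < 0:
        return 0.0
    return interval[1] - max(interval[0], 0.0)
```

For example, a vessel at the origin moving at 5 m/s toward a stationary vessel 300 m ahead, with r = 500 m, stays in range until it has passed the other vessel by 500 m, which takes 160 s.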
The sliding time window algorithm may divide one real contact into several small contacts, so a merging procedure based on MapReduce is designed to avoid this problem. First, a simple mapping procedure reshuffles the key-value pairs by idPair. Then, another round of reducing merges the contacts. If the next contact of the same pair of vessels starts within the estimated contact duration of the previous one, the reducer merges the two contacts and updates the contact parameters. The pseudocode is shown in Algorithm 3.
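A compact, single-process sketch of the merging round is given below; it is an illustration under assumptions rather than a transcription of Algorithm 3. Contacts are assumed to be `(idPair, timestamp, duration)` triples, the grouping dictionary plays the role of the second map and shuffle, and "updating the contact parameters" is interpreted as extending the end time of the merged contact.

```python
from collections import defaultdict


def merge_contacts(contacts):
    """Merge fragmented contacts of the same vessel pair.

    contacts: iterable of (id_pair, timestamp, estimated_duration).
    Returns {id_pair: [(start, end), ...]} with overlapping pieces merged.
    """
    # Map + shuffle: re-key every contact by idPair.
    by_pair = defaultdict(list)
    for id_pair, timestamp, duration in contacts:
        by_pair[id_pair].append((timestamp, duration))

    # Reduce: walk each pair's contacts in time order and merge a contact
    # that starts within the estimated duration of the previous one.
    merged = {}
    for id_pair, items in by_pair.items():
        spans = []
        for timestamp, duration in sorted(items):
            end = timestamp + duration
            if spans and timestamp <= spans[-1][1]:
                spans[-1][1] = max(spans[-1][1], end)
            else:
                spans.append([timestamp, end])
        merged[id_pair] = [tuple(span) for span in spans]
    return merged
```

Sorting happens only inside one idPair group, so no global ordering of the full data set is needed.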
In this section, we will conduct experiments to analyze and model the contact pattern.
4.1 Experiments design
4.2 Result analysis
We process the contact records and generate some representative contact pattern results.
Figure 7b, c illustrates the distributions of vessel contact times and sailing alone time, which indicate the vessels’ activeness. Nearly 40% of vessels have more than 2000 contacts during the 14 days, as shown in Fig. 7b. Considering the voyage cycle, most vessels have more than 200 contacts per day on average. Figure 7c shows that only 2% of vessels sail alone for more than 2000 s (about 30 min), which means that most vessels are almost always in contact with other vessels during their voyage.
Figure 7d illustrates the distribution of contact times for the same pair of vessels, which indicates the intimacy of vessel pairs. The number of pairs declines as the number of contacts increases. The records show that 54051 pairs of vessels made contact more than 10 times. Storing this information in the knowledge base of each vessel may contribute greatly to the design of OMDTN.
Figure 7e, f illustrates the periodic regularity of contact times. Figure 7e shows that the contact times of the whole data set vary on a 24-h cycle. Based on this observation, we calculate the average contact times over 24 h. Vessels clearly have more contacts during the daytime, as shown in Fig. 7f, which indicates that vessels have more opportunities to deliver packets in the daytime.
In this paper, we design a parallel algorithm based on MapReduce, called VSTP, to process 2 weeks of discrete trajectory data from 2093 vessels and detect the contacts between each pair of vessels. Surprisingly, we find that the contact pattern is in sharp contrast to the patterns reported for human handheld devices and VANETs. Based on the observed pattern, we model parts of the contact pattern to obtain more general results. Our results can provide guidelines for the design of data routing protocols on OMDTN and offer a new solution to the difficulty of ocean network coverage.
Nevertheless, many issues remain to be explored. Our ongoing work includes (1) further mining the contact records and using the contact locations to establish a pattern of contact location changes; (2) evaluating VSTP on larger data sets and mining records over longer periods in more depth; and (3) designing an OMDTN routing algorithm based on these patterns to overcome the difficulty of ocean network coverage.
This work was supported by the National Natural Science Foundation of China grants 61379127, 61379128, and 61572448, the National Key R&D Program 2016YFC1401900, the Fundamental Research Funds for the Central Universities 201713016, and the Qingdao National Laboratory for Marine Science and Technology Open Research Project QNLM2016ORP0405.
CL and ZS designed the parallel algorithm named VSTP based on MapReduce and established the two distribution models. All authors contributed to the interpretation of the results and writing of the manuscript. All authors read and approved the final manuscript.
The authors declare that they have no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License(http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
- D Senic, A Sarolic, in International Conference on Software, Telecommunications and Computer Networks. Simulation of a shipboard VHF antenna radiation pattern using a complete sailboat model (IEEE, Hvar, 2009), pp. 65–69.
- D Senic, A Šarolic, in International Symposium Elmar 2010. Simulation of slanted shipboard VHF antenna radiation pattern (ELMAR, Zadar, 2010), pp. 293–296.
- S Sathyamurthy, S Sundaresh, Performance simulation of HF-VHF mobile radio systems in a tactical vehicle. Def. Sci. J. 58(6), 762–767 (2008).
- WS Rabinovich, CI Moore, R Mahon, PG Goetz, HR Burris, MS Ferraro, JL Murphy, LM Thomas, GC Gilbreath, M Vilcheck, Free-space optical communications research and demonstrations at the U.S. Naval Research Laboratory. Appl. Opt. 54(31), 189 (2015).
- MJ Farooq, H Ghazzai, A Kadri, H Elsawy, MS Alouini, A hybrid energy sharing framework for green cellular networks. IEEE Trans. Commun. PP(99), 1–1 (2016).
- B Evans, M Werner, E Lutz, M Bousquet, GE Corazza, G Maral, R Rumeau, Integration of satellite and terrestrial systems in future multimedia communications. IEEE Wirel. Commun. 12(5), 72–80 (2005).
- Y Zong, H Huang, F Hong, Y Zhen, Z Guo, in IEEE/MTS Techno-Ocean 2016 Conference. Recognizing fishing activities via VMS trace analysis based on mathematical morphology (IEEE, Kobe, 2016).
- G Yan, DB Rawat, Vehicle-to-vehicle connectivity analysis for vehicular ad-hoc networks. Ad Hoc Networks 58, 25–35 (2017).
- M Zarei, AM Rahmani, H Samimi, Connectivity analysis for dynamic movement of vehicular ad hoc networks. Wirel. Netw. 23, 1–16 (2017).
- A Chaintreau, P Hui, J Crowcroft, C Diot, R Gass, J Scott, in INFOCOM 2006. IEEE International Conference on Computer Communications, Joint Conference of the IEEE Computer and Communications Societies, 23-29 April 2006. Impact of human mobility on the design of opportunistic forwarding algorithms (IEEE, Barcelona, 2006), pp. 1–13.
- T Henderson, D Kotz, I Abyzov, The changing usage of a mature campus-wide wireless network. Comput. Netw. 52(14), 2690–2712 (2008).
- P Hui, A Chaintreau, J Scott, R Gass, J Crowcroft, C Diot, in Proceeding of the 2005 ACM SIGCOMM Workshop on Delay-tolerant Networking. Pocket switched networks and the consequences of human mobility in conference environments (ACM, Philadelphia, 2005), pp. 244–251.
- N Eagle, A Pentland, Reality mining: sensing complex social systems. Pers. Ubiquit. Comput. 10(4), 255–268 (2006).
- T Henderson, D Kotz, I Abyzov, The changing usage of a mature campus-wide wireless network. Comput. Netw. 52(14), 2690–2712 (2008).
- T Karagiannis, JYL Boudec, M Vojnovic, in ACM International Conference on Mobile Computing and Networking. Power law and exponential decay of intercontact times between mobile devices (ACM, Montréal, 2007), pp. 183–194.
- H Zhu, M Li, L Fu, G Xue, Y Zhu, LM Ni, Impact of traffic influxes: Revealing exponential intercontact time in urban VANETs. IEEE Trans. Parallel Distrib. Syst. 22(8), 1258–1266 (2011).
- A Balasubramanian, B Levine, A Venkataramani, in Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications. DTN routing as a resource allocation problem (ACM, Kyoto, 2007), pp. 373–384.
- K Lee, Y Yi, J Jeong, H Won, in INFOCOM, 2010 Proceedings IEEE. Max-contribution: on optimal resource allocation in delay tolerant networks (IEEE, San Diego, 2010), pp. 1–9.
- P Hui, J Crowcroft, E Yoneki, Bubble rap: Social-based forwarding in delay-tolerant networks. IEEE Trans. Mob. Comput. 10(11), 1576–1589 (2010).
- H Zhu, M Dong, S Chang, Y Zhu, M Li, X Shen, Zoom: Scaling the mobility for fast opportunistic forwarding in vehicular networks. Proc. IEEE INFOCOM. 12(11), 2832–2840 (2013).
- B Liu, K Huang, J Li, MC Zhou, An incremental and distributed inference method for large-scale ontologies based on MapReduce paradigm. IEEE Trans. Cybern. 45(1), 53–64 (2015).
- B Wang, S Huang, J Qiu, Y Liu, G Wang, Parallel online sequential extreme learning machine based on MapReduce. Neurocomputing 149(PA), 224–232 (2015).
- J Dean, S Ghemawat, in Conference on Symposium on Operating Systems Design and Implementation. MapReduce: simplified data processing on large clusters (ACM, San Francisco, 2008), pp. 10–10.
- X Xu, M Tang, A new approach to the cloud-based heterogeneous MapReduce placement problem. IEEE Trans. Serv. Comput. 9(6), 862–871 (2016).
- C Jin, J Chen, H Liu, MapReduce-based entity matching with multiple blocking functions. Front. Comput. Sci. 11(5), 895–911 (2017).
- B Feng, X Yang, K Feng, Y Yin, XH Sun, in IEEE International Conference on CLUSTER Computing. Iosig+: On the role of I/O tracing and analysis for Hadoop systems (IEEE, Chicago, 2015), pp. 62–65.
- J Wolf, D Rajan, K Hildrum, R Khandekar, V Kumar, S Parekh, KL Wu, A Balmin, FLEX: A Slot Allocation Scheduling Optimizer for MapReduce Workloads (Springer, Bangalore, 2010).
- A Abouzeid, K Bajda-Pawlikowski, D Abadi, A Silberschatz, A Rasin, HadoopDB: an architectural hybrid of MapReduce and DBMS technologies for analytical workloads. Proc. VLDB Endowment 2(1), 922–933 (2009).
- C Liu, YJ Liu, ZW Guo, W Jing, Oceanroute: Vessel mobility data processing and analyzing model based on MapReduce. J. Ocean Univ. China. 1(1), 1–10 (2017).