Intelligent popularity-aware content caching and retrieving in highway vehicular networks

Content dissemination over vehicular networks is a critical challenge because of their highly intermittent connectivity. Yet content caching and retrieval that exploit the information-centric networking (ICN) paradigm have received little attention. To improve the efficiency of content retrieval, this paper proposes P-CCR, a novel popularity-aware content caching and retrieving strategy for VANETs that brings the ICN perspective to vehicle mobility and vehicular wireless communication in highway scenarios. P-CCR lets vehicles independently switch between vehicle-to-vehicle (V2V) and vehicle-to-infrastructure (V2I) retrieval modes according to content popularity. Most notably, P-CCR includes an actively caching mechanism that can be triggered to inform surrounding volunteer vehicles to cache contents that receive a burst of requests within a short time. Extensive simulations show that P-CCR significantly reduces the round-trip time (RTT) and the number of interest packets, which lightens network load and avoids wasting resources.


Introduction
Vehicular ad hoc networks (VANETs) are autonomous networks that allow moving vehicles to communicate with each other without relying on any infrastructure [1,2]. As vehicles become increasingly important in urban life, VANETs pave the way for offering services to users on wheels through flexible vehicle-to-vehicle (V2V) or vehicle-to-infrastructure (V2I) communications [3][4][5]. Recently, with large-scale deployment of Intelligent Transportation Systems (ITS), infotainment services have become a foreseeable trend in VANETs [6][7][8][9][10][11]. Moreover, multimedia technologies can alert drivers promptly with precise information when there is a dangerous situation or an accident ahead, helping them make clearer and more accurate decisions based on personal priorities and/or vehicle capabilities. This suggests that video streaming applications will become an attractive field in VANETs (Fig. 1).
Distributed content-dissemination solutions over the wired Internet have proven powerful and highly extensible [12][13][14][15][16][17]. Current research efforts aim to extend high-quality content services (e.g., video streaming) to mobile and wireless environments [18,19]. For example, peer-to-peer (P2P) multimedia services over VANETs have become a very interesting research subject, and a QoE-driven, user-centric video-on-demand (VoD) approach has been proposed for urban multi-homed P2P-based vehicular networks [20]. Although P2P multimedia delivery over VANETs is attracting increasing attention from the research community, it still faces great challenges: rapidly changing topologies due to vehicle mobility, intermittent and short-lived connectivity due to sparse network infrastructure, harsh propagation environments, and multi-hop communications. The root cause of these issues is the host-oriented connectivity model of the traditional Internet [21].
Innovative network architectures are expected to overcome these problems in content delivery over VANETs and to support mobility natively. Information-centric networking (ICN) has recently gained momentum as a candidate architecture for the future Internet; it focuses on what users want to access rather than where the content resides. Several ICN proposals have been put forward around the world [22][23][24]. All of them replace hosts with the content itself as the basis of a content-based communication model. Packet-level caching, performed transparently at each ICN node, has the potential to cope with intermittent connectivity. Owing to this idea, ICN is expected to offer a new perspective and real benefits for efficient content streaming retrieval in highly mobile and dynamic environments such as VANETs.
This work extends our earlier work [25] with a deeper investigation and several new results. This paper focuses on improving the quality of content delivery in highway VANET scenarios and proposes a Popularity-aware Content Caching and Retrieving (P-CCR) solution. The main contributions of this paper are summarized as follows:
(1) A novel method for calculating popularity is proposed, in which the popularity of a chunk depends on two variables: the reference frequency of the object to which the chunk belongs and the reference frequency of the chunk itself.
(2) Switching of retrieval modes is introduced to improve retrieval efficiency: when issuing a request, a vehicle can independently switch between V2V and V2I according to the popularity level of the requested content.
(3) An actively caching mechanism is proposed to cope with sudden request peaks: if the interest increasing rate (IIR) of a requested content exceeds a defined threshold, the caching mechanism is triggered autonomously.
The rest of this paper is organized as follows. Section 2 briefly reviews existing research. Section 3 presents our considerations regarding the strategy factors. Section 4 details the proposed P-CCR solution and its mechanisms. Section 5 presents the experimental analysis. Finally, Section 6 concludes the paper.

Related work
This section briefly reviews related work, which falls into three areas: modeling in ICN, ICN-based VANETs, and content transmission in VANETs.
Some researchers focus on theoretical models of information-centric forwarding and analyze their features for the large-scale Internet. Psaras et al. [26] develop a mathematical model based on continuous-time Markov chains to estimate the packet-level caching behavior of content-centric networking (CCN). Carofiglio et al. [27] build an analytical model of CCN content transfer mechanisms based on content hit probabilities and derive closed-form expressions for delivery delay and throughput. However, these models were conceived for wired network topologies and assume static networks, so they are not suitable for mobile VANET environments.
Researchers have also made fresh advances in mobility support for ICN. Tyson et al. [28] investigate how ICN can support mobility, analyzing the associated challenges and the major research areas. Other researchers have begun to merge ICN into MANETs. Oh et al. [29] present the specifics of CCN in emergency and tactical MANETs and implement a simple linear-topology testbed to demonstrate the performance of CCN-based MANETs. All of these studies compare the two networking schemes through simulation to check whether ICN-based MANETs outperform traditional MANET approaches, without using theoretical analytical tools. Varvello et al. [30] are the first to study the design space of content-centric MANETs through analytical models. They derive analytical models for a series of candidate solutions and compare geographic hash table (GHT) routing, proactive flooding, and reactive flooding. However, all of these candidate solutions stem from conventional MANET designs, and some important capabilities of ICN-based MANETs are not considered in the models.
ICN-based information dissemination over VANETs is an emerging research topic. Amadeo et al. [3] propose a pioneering networking architecture for information distribution and retrieval in VANETs that leverages the ICN paradigm, and argue that ICN has the potential to be an ideal networking solution for future VANETs. Arnould et al. [31] propose a new network architecture for hybrid VANETs based on CCN. Later, TalebiFard et al. [32] address information dissemination in vehicular environments and propose a content-centric networking model and solution based on selective random network coding. Beyond that, Wang et al. [33,34] apply the Named Data Networking (NDN) concept to V2V communications, put forth a strawman proposal for data name design, and develop a simple traffic information dissemination application based on that naming design. However, all these proposals focus on information dissemination without considering specific applications such as content streaming, which has unique characteristics whose effects can vary greatly across application contexts.
P2P content sharing in VANETs has also attracted research interest. Hsieh et al. [35] introduce an efficient dynamic overlay multicast method for live video streaming in urban VANETs. Qadri et al. [36] present a realistic scheme for video streaming over VANETs that uses an overlay network with multiple sources. Zhou et al. [37] propose a P2P media-service scheme that jointly addresses content dissemination, cache update, and fairness in vehicular networks. Yang et al. [38] develop a live content streaming system for VANETs, called CodePlay, based on symbol-level network coding. However, none of these works considers the information-centric perspective, which is regarded as a promising networking solution for VANETs.
As the review above shows, bringing information-centric thinking to content caching and retrieval over VANETs is a largely unexplored area. This paper proposes an optimized P-CCR strategy that integrates vehicle mobility and the features of the wireless environment into a pervasive methodology.

Strategy factor consideration
This section details the factors considered in our strategy. To make the description clearer, we first present an overview of ICN-based content caching and retrieval over VANETs. Next, we describe our treatment of the two basic components of our system: locations and content streaming. Finally, we outline the methodology of this paper.

ICN-based content cache and retrieval
We first outline the content retrieval process under ICN. On the server, each content item is pre-processed into a stream, and each stream is divided into many chunks, which are the basic data units in ICN. A naming process assigns a unique name to each chunk, so each vehicle can request a desired streaming chunk directly by its name rather than by an IP address. A chunk request is issued by broadcasting an interest packet that carries the chunk name. Because chunks of the same stream are related, this information-centric caching and retrieval process can serve related requests, not just isolated ones. Vehicles that have cached the chunk then respond to the requester, which may receive several answers and choose the "best" one. A playback buffer holds the received chunks, which are queued for playback. Every vehicle that relays requested chunks caches them locally to serve later requests for the same chunks. Fig. 2 gives an overview of ICN-based content retrieval.
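The retrieval process above can be illustrated with a minimal Python sketch on a linear highway. It is a simplification, not the authors' implementation: the naming scheme in `chunk_name`, the `Vehicle` and `retrieve` helpers, and the rule that the "best" answer is the nearest holder are all hypothetical assumptions; radio propagation, timing, and the playback buffer are not modelled.

```python
from __future__ import annotations

from dataclasses import dataclass, field


def chunk_name(content: str, index: int) -> str:
    """Hypothetical hierarchical chunk name, e.g. ("video/news", 3) -> "/video/news/chunk/3"."""
    return f"/{content}/chunk/{index}"


@dataclass
class Vehicle:
    vid: int
    position: float                                # metres along the highway
    store: set[str] = field(default_factory=set)   # names of locally cached chunks


def retrieve(name: str, requester: Vehicle, vehicles: list[Vehicle]) -> Vehicle | None:
    """Broadcast an interest for `name` and return the vehicle chosen to answer it.

    Assumption: the "best" answer is the holder closest to the requester.
    Every vehicle between that holder and the requester (the requester
    included) is treated as a relay and caches the chunk for later requests.
    """
    holders = [v for v in vehicles if v is not requester and name in v.store]
    if not holders:
        return None  # no vehicle answered; another retrieval path is needed
    best = min(holders, key=lambda v: abs(v.position - requester.position))
    lo, hi = sorted((best.position, requester.position))
    for v in vehicles:
        if lo <= v.position <= hi:
            v.store.add(name)  # on-path caching (a no-op for the responder itself)
    return best
```

For example, if vehicles at 250 m and 900 m hold a chunk and a vehicle at 0 m requests it, the holder at 250 m answers, and every vehicle between 0 m and 250 m, including the requester, ends up caching the chunk.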
Note that caching chunks locally introduces caching redundancy, but it brings notable benefits such as shorter RTT and fewer interest packets. In highway VANETs, alleviating the network load is more important than saving storage space. Because ICN and VANETs have several distinctive characteristics, our analysis gives them particular consideration.

Mobility in highway scenarios
In VANETs, vehicles autonomously form a temporary network to communicate with each other. Unlike general MANETs, VANETs have several special attributes; for example, vehicle mobility patterns depend heavily on the road structure.
In VANETs, vehicle locations vary over time. To keep the model simple, we analyze a linear topology. We assume a highway scenario, so vehicles are distributed along lines (lanes) and move along them in both directions. Vehicles on the highway may be separated into platoons; however, vehicles in different platoons can be bridged by vehicles traveling in the opposite direction (Fig. 3). In our model, the linear topology is divided into sub-regions that represent location coordinates.
Let $R = [0, C]$ represent a linear highway scenario. $R$ is further divided into $M$ cells of equal size $C/M$:
$$R_i = \left[\frac{(i-1)C}{M}, \frac{iC}{M}\right), \quad i = 1, \ldots, M,$$
with the last cell $R_M$ closed at $C$. Each location cell $R_i$ may contain many vehicles, whereas each vehicle occupies exactly one cell at any moment. We focus on mobility as the movement of a vehicle from one cell to another.
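A small sketch of the cell partition, assuming positions are measured in metres from the start of the segment: `cell_index` maps a position to its cell $R_i$, and `transition_matrix` estimates cell-to-cell transition probabilities by counting observed moves. The framework adopts transition probability theory to represent vehicle movement but does not say how the probabilities are obtained; the counting estimator and the self-loop default for unobserved cells are assumptions of this sketch.

```python
def cell_index(x: float, C: float, M: int) -> int:
    """Return the 1-based index i of the cell R_i = [(i-1)C/M, iC/M) containing x.

    The right end point x == C is assigned to the last cell R_M.
    """
    if not 0.0 <= x <= C:
        raise ValueError("position outside the highway segment [0, C]")
    return min(int(x * M / C) + 1, M)


def transition_matrix(trajectories: list[list[int]], M: int) -> list[list[float]]:
    """Estimate P[i][j] = Pr(next cell = j+1 | current cell = i+1) by counting moves.

    Each trajectory is a sequence of 1-based cell indices sampled at fixed
    intervals. Rows with no observations default to a self-loop (an assumption).
    """
    counts = [[0] * M for _ in range(M)]
    for path in trajectories:
        for a, b in zip(path, path[1:]):
            counts[a - 1][b - 1] += 1
    P = []
    for i, row in enumerate(counts):
        total = sum(row)
        if total == 0:
            P.append([1.0 if j == i else 0.0 for j in range(M)])
        else:
            P.append([c / total for c in row])
    return P
```

With the experimental setting of a 10-km highway split into 100 cells, each cell is 100 m long, so `cell_index(250.0, 10_000, 100)` returns 3.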

Content popularity
In our model, each content item (e.g., a video) consists of one or more chunks. Content is divided into multiple equal-sized chunks, which are the smallest transfer unit in the model. On each vehicle, multiple chunks can be grouped into a segment. Different content items can contain different numbers of chunks, each with its own chunk identifier. Content retrieval can therefore be performed using a sequence of chunk identifiers or segment identifiers.
Let $\delta$ denote the average size of a content item (in chunks). Following Carofiglio et al. [39], a set of content items $Q$ is evenly divided into $K$ classes according to popularity, and popularity in $Q$ is assumed to follow a Zipf-like distribution. Let $q_k$ denote the request probability of class-$k$ content items, $k = 1, \ldots, K$; then
$$q_k = \frac{c}{k^{\alpha}}, \quad \alpha > 1,$$
where $c = \left(\sum_{j=1}^{K} j^{-\alpha}\right)^{-1}$ is the normalization constant.
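The class probabilities $q_k$ can be computed and sampled as below. This is an illustrative sketch: the function names are hypothetical, $K = 20$ matches the number of popularity classes used in the experiments, and $\alpha = 1.2$ in the usage note is only an assumed value.

```python
import random
from collections import Counter


def zipf_class_probabilities(K: int, alpha: float) -> list[float]:
    """Return [q_1, ..., q_K] with q_k = c / k**alpha and c chosen so they sum to 1."""
    if K < 1 or alpha <= 1:
        raise ValueError("need K >= 1 and alpha > 1")
    weights = [1.0 / k ** alpha for k in range(1, K + 1)]
    c = 1.0 / sum(weights)  # normalization constant
    return [c * w for w in weights]


def sample_request_classes(q: list[float], n: int, seed: int = 0) -> Counter:
    """Draw n request classes (1-based) according to the probabilities q."""
    rng = random.Random(seed)
    return Counter(rng.choices(range(1, len(q) + 1), weights=q, k=n))
```

For example, `q = zipf_class_probabilities(20, 1.2)` gives $q_1 / q_2 = 2^{1.2}$, and `sample_request_classes(q, 10_000)` yields request counts that fall off with the class index.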

Overall framework
Based on the above, the relationships among contents, vehicles, and locations can be summarized as follows: many vehicles can share one content item at the same time, a vehicle moves among location cells, and one location cell can hold one or more vehicles at any time.
We propose a methodology that connects contents, vehicles, and locations, as shown in Fig. 4. It outlines the overall structure of this paper and our considerations for content caching and retrieval in ICN-based VANETs. The methodology covers three processes: vehicle location distribution [40], content retrieval, and caching and correlation study [39]. At the bottom layer, the highway is chosen as the application scenario, and transition probability theory is used to represent vehicle movement. At the top layer, we build the content retrieval model and derive the evaluation metrics; in addition, switching retrieval modes is used to optimize QoE. In the middle layer, we analyze four aspects: the actively caching mechanism, in-network caching features, a generalized mobility model, and the characteristics of the wireless environment. The vehicle plays the key role as the bridge linking locations and contents, which is the core idea of our proposal.

Our proposal: P-CCR
In this section, we present a complete analytical model for ICN-based content caching and retrieval in VANETs. Fig. 5 shows the workflow of P-CCR. The major characteristics of P-CCR are a novel popularity calculation, switching of retrieval modes, and an actively caching mechanism; we describe each in detail below.

Popularity value ω
Each vehicle in the network has a cache that holds $x$ chunks. Chunks are cached and replaced according to their popularity $\omega$, which significantly affects retrieval efficiency. In general, the more popular a content item is, the more vehicles hold copies of it. Moreover, popularity varies over time and space. Following the ICN paradigm, we treat both content caching and retrieval at the chunk level. However, chunks belonging to the same content object are correlated, so their popularity values should be related as well. In fact, some chunks are more popular than the content objects they belong to; in other words, some chunks are what people really want. Because the chunks are interconnected, the full content object should still be taken into account when calculating popularity. To obtain a reasonable popularity measure, we redefine popularity: the popularity of each chunk depends on two parts, the reference frequency of the object to which the chunk belongs and the reference frequency of the chunk itself.
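The text states only that chunks are cached and replaced according to $\omega$; one plausible reading is a least-popular-first eviction policy, sketched here with a hypothetical `PopularityCache` class. The admission rule (a new chunk displaces the least popular cached chunk only when it is strictly more popular) is an assumption, not part of P-CCR as described.

```python
from __future__ import annotations

from typing import Callable


class PopularityCache:
    """Chunk cache of capacity x that evicts the least popular cached chunk.

    `popularity` maps a chunk name to its popularity value omega (in P-CCR this
    value would come from the RPM). Admission rule (an assumption): a new chunk
    replaces the least popular cached chunk only if it is strictly more popular.
    """

    def __init__(self, capacity: int, popularity: Callable[[str], float]) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1 chunk")
        self.capacity = capacity
        self.popularity = popularity
        self.chunks: set[str] = set()

    def insert(self, name: str) -> str | None:
        """Try to cache `name`.

        Returns the chunk left out of the cache (the evicted victim, or `name`
        itself if it was not admitted), or None if nothing was dropped.
        """
        if name in self.chunks:
            return None
        if len(self.chunks) < self.capacity:
            self.chunks.add(name)
            return None
        victim = min(self.chunks, key=self.popularity)
        if self.popularity(victim) >= self.popularity(name):
            return name  # not more popular than anything cached: reject
        self.chunks.remove(victim)
        self.chunks.add(name)
        return victim
```

Because popularity changes over time and space, a real deployment would re-query $\omega$ rather than treat it as fixed; the callable parameter leaves room for that.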
Assume that a content object $C$ contains a sequence of streaming chunks $c_1, c_2, c_3, \ldots$. Let $\sigma$ be the reference frequency of $C$, $\sigma_i$ the reference frequency of chunk $c_i$, and $v$ the content weight value. The popularity of $c_i$ is then computed from $\sigma$, $\sigma_i$, and $v$ by Eq. (1). To ensure that every vehicle in the network can obtain the popularity of a requested chunk, we establish a novel popularity calculation mechanism (Algorithm 1) hosted in a dedicated entity called the resource popularity manager (RPM). RPMs can be distributed across the network and connected to roadside units (RSUs), and each vehicle can query an RPM for the popularity value whenever needed.
An RPM listens on a well-known port for content requests. When a request for $c_i$ arrives, the RPM first looks up the content object to which $c_i$ belongs, say $C$. Both $c_i$ and $C$ have records of their reference frequencies; the RPM increments both records by one, computes the popularity value of $c_i$ using Eq. (1), and returns this value in its response.
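The RPM update step of Algorithm 1 can be sketched as follows. Because Eq. (1) is not reproduced in this text, the popularity formula is injected as a function; the default `hypothetical_popularity` (a weighted blend of $\sigma$ and $\sigma_i$ using $v$) is a placeholder assumption and should not be read as the paper's formula. The class name, the `parent_of` mapping, and the omission of the network listener on the well-known port are likewise simplifications.

```python
from __future__ import annotations

from collections import Counter
from typing import Callable

PopularityFn = Callable[[int, int, float], float]


def hypothetical_popularity(sigma: int, sigma_i: int, v: float) -> float:
    """HYPOTHETICAL stand-in for Eq. (1), which is not reproduced here.

    Blends the object frequency sigma and the chunk frequency sigma_i with
    weight v; replace it with the paper's formula when available.
    """
    return v * sigma + (1.0 - v) * sigma_i


class ResourcePopularityManager:
    """Sketch of Algorithm 1: update both reference frequencies, return popularity."""

    def __init__(self, parent_of: Callable[[str], str], v: float,
                 popularity_fn: PopularityFn = hypothetical_popularity) -> None:
        self.parent_of = parent_of                  # chunk name -> content object name
        self.v = v                                  # content weight value
        self.popularity_fn = popularity_fn
        self.object_freq: Counter[str] = Counter()  # sigma per content object
        self.chunk_freq: Counter[str] = Counter()   # sigma_i per chunk

    def handle_request(self, chunk: str) -> float:
        obj = self.parent_of(chunk)   # c_i -> C
        self.object_freq[obj] += 1    # sigma += 1
        self.chunk_freq[chunk] += 1   # sigma_i += 1
        return self.popularity_fn(self.object_freq[obj], self.chunk_freq[chunk], self.v)
```

Keeping the formula pluggable lets the update logic (increment $\sigma$ and $\sigma_i$, then evaluate Eq. (1)) stay fixed while the exact popularity definition is swapped in.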

Switching the retrieval modes
Vehicular wireless technologies such as Wireless Access in Vehicular Environments (WAVE, IEEE 802.11p) enable data delivery via V2V and V2I communication. In V2I, vehicles must maintain connectivity with RSUs while driving to support continuous content retrieval. Both modes have advantages and disadvantages for content streaming retrieval. Under the cache replacement policy, the more popular a content item is, the more copies of it exist. For popular content, V2V is efficient and reliable, because the content can be obtained from surrounding vehicles. By contrast, because vehicles are moving, retrieving content from fixed RSUs may require repeated switching between RSUs, which wastes time and degrades retrieval quality.
However, if a requested content item has low popularity, only a handful of vehicles store it, so discovering it among vehicles is time-consuming. Moreover, this discovery process may disseminate many interest packets, wasting vehicular resources. RSUs, in contrast, have fixed positions and sufficiently large storage, and they can even fetch requested content directly from the server, so V2I retrieval via RSUs is more likely to succeed. Based on this consideration, we propose a popularity-aware retrieval strategy to optimize the information-centric content retrieval process (Algorithm 2). This strategy selects the content retrieval mode (V2V or V2I) according to the popularity value of the requested content.
A resource can be obtained in one of two retrieval modes: V2I or V2V. Suppose that $c_i$ is requested. The requesting vehicle $V$ first obtains the popularity value $\omega$ of $c_i$ from the RPM via the roadside infrastructure. If $\omega$ is lower than the threshold popularity value $\Omega$, $c_i$ is considered unpopular, and $V$ tries to obtain it in V2I mode: $V$ requests a connection to an RSU and retrieves the content from it. Otherwise, if $\omega$ exceeds $\Omega$, $V$ tries V2V mode: it broadcasts requests for $c_i$ to discover which nearby vehicles hold $c_i$, records them, and then connects to the best one to retrieve the content. If $c_i$ still has not been obtained, $V$ connects to a nearby RSU. Because RSUs can request the content directly over the Internet, this fallback improves the probability of success.
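The decision logic of Algorithm 2 reduces to a threshold test with a V2I fallback. In this sketch the RPM query, the V2V discovery and download, and the RSU download are passed in as functions (`query_popularity`, `v2v_fetch`, `v2i_fetch`, all hypothetical), and a popularity exactly equal to $\Omega$ is treated as popular, a case the text leaves unspecified.

```python
from typing import Callable, Optional, Tuple


def retrieve_chunk(chunk: str,
                   query_popularity: Callable[[str], float],
                   omega_threshold: float,
                   v2v_fetch: Callable[[str], Optional[bytes]],
                   v2i_fetch: Callable[[str], bytes]) -> Tuple[str, bytes]:
    """Sketch of Algorithm 2: choose V2V or V2I from the popularity of `chunk`.

    v2v_fetch broadcasts interests, records the neighbours that hold the chunk,
    and downloads it from the best one, returning None if none can serve it.
    v2i_fetch retrieves the chunk through an RSU, which can reach the server.
    """
    omega = query_popularity(chunk)          # ask the RPM
    if omega < omega_threshold:              # unpopular: few copies in vehicles
        return "V2I", v2i_fetch(chunk)
    data = v2v_fetch(chunk)                  # popular: try nearby vehicles first
    if data is not None:
        return "V2V", data
    return "V2I", v2i_fetch(chunk)           # fall back to an RSU
```

Passing the three operations as functions keeps the mode-selection rule testable on its own, independent of any particular radio stack.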

Actively caching mechanism
When the interest increasing rate (IIR) $\theta$ of a requested content exceeds the threshold $\Theta$, vehicles trigger the actively caching mechanism, which informs other vehicles to cache the content. In practice, some content may suddenly become a hot spot, and the actively caching mechanism is designed to cope with such sudden request peaks. Based on a forecasting judgment, the vehicle automatically sends the content to surrounding vehicles before the peak appears. When the peak arrives, requesting vehicles can therefore obtain the content nearby, which reduces RTT and improves passengers' QoE.
The RPM computes the interest increasing rate $\theta_{c_i}$ of each requested content $c_i$; because all request packets are sent to its well-known port, the RPM has the information needed to calculate the IIR. Based on this, we propose an actively caching mechanism (Algorithm 3): if $\theta_{c_i}$ exceeds $\Theta$, the mechanism is triggered, and the vehicle automatically sends the content to surrounding vehicles so that they cache it.
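The IIR formula is not reproduced in this text, so the sketch below assumes a hypothetical definition: the relative growth in the number of interests for a content between two consecutive counting windows at the RPM. `IIRMonitor` flags contents whose IIR exceeds $\Theta$ at the end of each window; the default threshold of 0.5 matches the value fixed in the experiments of Fig. 10, while the windowing scheme and the zero-baseline rule are assumptions.

```python
from __future__ import annotations

from collections import Counter


def interest_increasing_rate(prev_count: int, curr_count: int) -> float:
    """HYPOTHETICAL IIR: relative growth of interests between two windows.

    With no requests in the previous window there is no baseline, so the
    rate is reported as 0.0 (a design choice of this sketch).
    """
    if prev_count == 0:
        return 0.0
    return (curr_count - prev_count) / prev_count


class IIRMonitor:
    """Per-window interest counter run at the RPM (sketch of Algorithm 3's trigger)."""

    def __init__(self, threshold: float = 0.5) -> None:
        self.threshold = threshold           # Theta
        self.prev: Counter[str] = Counter()  # interests per content, previous window
        self.curr: Counter[str] = Counter()  # interests per content, current window

    def record(self, content: str) -> None:
        self.curr[content] += 1

    def end_window(self) -> list[str]:
        """Close the current window and return the contents whose IIR exceeds Theta.

        Each returned content would trigger the actively caching mechanism,
        i.e. it is pushed to surrounding vehicles (not modelled here).
        """
        triggered = sorted(
            c for c in set(self.prev) | set(self.curr)
            if interest_increasing_rate(self.prev[c], self.curr[c]) > self.threshold
        )
        self.prev, self.curr = self.curr, Counter()
        return triggered
```

The zero-baseline rule avoids flagging every content in the first window; a different choice (for example, a minimum request count) could be used to catch brand-new hot spots.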

Experimental analysis

In this section, we conduct a set of experiments to evaluate and analyze the performance of P-CCR from two aspects: switching retrieval modes and the actively caching mechanism. We deploy 10 RSUs and 50 to 150 vehicles moving in a highway scenario 10 km long. The highway has one lane in each direction and is divided into 100 sub-regions in our model. Vehicles are initially distributed randomly among the sub-regions. Content samples from 20 popularity classes are randomly distributed across all RSUs and vehicles, and each content sample consists of 128 chunks. Popularity follows a Zipf-like distribution, which determines the popularity classes. Each experiment is repeated 10 times, and the average is reported as the final result. Table 1 summarizes the key parameter values and ranges used in the simulation.
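For reproducibility, the parameters stated above can be collected in a configuration object. This is an illustrative sketch: only values given in the text are included (the remaining Table 1 entries are not reproduced here), and the field names and the default of 100 vehicles are choices made for this example.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SimulationConfig:
    """Simulation parameters stated in the text; Table 1 lists the full set."""
    highway_length_m: float = 10_000.0   # 10-km highway
    num_rsus: int = 10
    num_vehicles: int = 100              # varied over 50, 100, and 150
    lanes_per_direction: int = 1
    num_subregions: int = 100
    popularity_classes: int = 20
    chunks_per_content: int = 128
    repetitions: int = 10                # results are averaged over these runs

    @property
    def subregion_length_m(self) -> float:
        return self.highway_length_m / self.num_subregions


# One configuration per vehicle count used in the mode-switching experiments.
SWEEP = [replace(SimulationConfig(), num_vehicles=n) for n in (50, 100, 150)]
```

Each sub-region is therefore 100 m long, which is the cell size $C/M$ of the location model with $C = 10$ km and $M = 100$.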

Evaluation of switching retrieval modes
To evaluate the benefit of switching retrieval modes, we compare P-CCR with a random caching strategy (R-Caching) in terms of RTT and the number of interest packets. In this experiment, 50, 100, and 150 vehicles move on a 10-km highway.
First, we examine RTT as a function of content popularity ω. The results are given in Figs. 6, 7, and 8, which compare three cases with different numbers of vehicles. Across these figures, RTT under P-CCR decreases markedly as popularity increases, whereas under R-Caching it does not change with popularity. This is because, under P-CCR, contents of different popularity have different cache distributions: the more popular a content is, the smaller its miss probability in each region cell. For example, when the popularity is 14, the RTT under P-CCR is about 10 ms lower than under R-Caching, a reduction in waiting time that is significant for highway passengers.
The number of vehicles also affects content retrieval performance in VANETs: retrieval delay decreases, to a certain extent, as vehicle density increases. This is because vehicles retrieve content from nearby peers that can provide it. Each vehicle caches content and can therefore act as a potential content provider; for a given requested content, the higher the vehicle density, the more potential providers there are nearby, and thus the smaller the miss probability in each location region. Moreover, the three sets of curves show a consistent trend across the three figures, which supports the reliability of our model.
Second, we analyze the number of interest packets as a function of the number of vehicles. Fig. 9 shows the results, with popularity fixed at 10; the blue curve represents R-Caching and the red curve P-CCR. The number of interest packets naturally grows with the number of vehicles, but switching retrieval modes intelligently reduces it significantly compared with R-Caching. Under P-CCR, a requesting vehicle chooses the retrieval mode (V2V or V2I) based on popularity, which reflects the caching probability of the requested content. The more popular a content is, the more likely it is to be found in vehicles, so retrieving it from vehicles reduces the number of interest packets. Similarly, searching for low-popularity content at RSUs reduces interest packets. For example, with 150 vehicles, R-Caching broadcasts almost 3000 interest packets, whereas P-CCR requires only 2400, roughly 20% fewer. This greatly lightens network load and avoids wasting resources, which is important for highway VANETs.

Evaluation of actively caching mechanism
In this experiment, we assess the performance improvement of the proposed actively caching mechanism by comparing P-CCR with R-Caching, which has no such mechanism. The experiments cover different increasing rates of interest packets. We fix the number of vehicles at 100 and use RTT to measure the effectiveness of the actively caching mechanism. In Fig. 10, the IIR threshold is fixed at 0.5. Under R-Caching, RTT increases sharply as IIR grows, which wastes wireless resources. In contrast, P-CCR with the actively caching mechanism clearly limits RTT: when the popularity is 10 and the increasing rate is 0.5, RTT is 10 ms lower than with random caching. We also compare different popularity values under P-CCR with the actively caching mechanism: the higher the popularity, the lower the RTT. In Fig. 11, the popularity is fixed at 10 and we evaluate RTT under different IIR thresholds. The threshold affects the performance of the actively caching mechanism, and the greater the growth rate, the more pronounced the change. Therefore, setting a proper IIR threshold is crucial to achieving the best performance. Different thresholds can be set for different kinds of content: for example, a larger threshold for content with low real-time requirements (e.g., web services) and a lower threshold for content with high real-time requirements (e.g., video).
We also found that the higher the popularity, the more significant the effect. When the rate of requested chunks exceeds the preset IIR threshold, the RTT under P-CCR changes sharply and then remains steady. This is because, once the rate exceeds the threshold, the actively caching mechanism is triggered and the content is cached in more vehicles. P-CCR therefore limits the variation of the curves and guarantees a smaller RTT, which corresponds to acceptable QoE.

Conclusions
This paper proposed P-CCR, an innovative popularity-aware content caching and retrieving strategy for highway VANETs that incorporates vehicle mobility and the characteristics of the wireless environment in a pervasive scheme. P-CCR offers a new perspective on information-centric content retrieval in VANETs. In particular, P-CCR adopts an actively caching mechanism that uses the interest increasing rate to keep content transmission stable. Simulation experiments show that P-CCR can adapt the communication mode and achieves high reliability for content retrieval in VANETs. In future work, we will extend P-CCR to take security factors into account [41,42].