CGGMC model and coalition formation
We design a hierarchical multihop clustering model for a network of vehicles on a suburban or urban road with installed RSUs. Each vehicle is assumed to be equipped with GPS, sensors or cameras, and an ECU processor, so it can detect local information, such as position, velocity, movement direction, and surrounding events of interest. Its OBU is assumed to be an IEEE 802.11p standard radio transceiver that has an adjustable transmission range function. The communication bandwidth is separated into a cluster control channel and a data channel. Vehicles use the cluster control channel to organize the clustering process, and they transfer their collected data through the data channel. These two channels do not interfere with each other. To achieve our goal, the vehicles form a multihop cluster covering an area as large as possible, and cooperatively collect and transmit data via this cluster structure to a nearby RSU within a given transmission delay time constraint.
The vehicles are grouped into clusters to form a coalition. Each cluster has one vehicle as a cluster head (CH) and at least one child node as a cluster member (CM). Data flows from CM to CH in a singlehop. The clusters merge into the same coalition to expand the network area as a connected multihop cluster as shown in Fig. 1. The CH of a cluster becomes a member of another cluster that it merged into. Its status changes to a dual status of CH and CM simultaneously, or it is called CMH. The CMH performs routing between its cluster members and its CH. Routing is automatically formed like a tree structure, so data can flow by multihop steps from leaf CM nodes and be aggregated while passing through each CMH node until reaching the root sink CH node. The root sink CH node is almost an RSU, but the vehicle that is the current toplevel node will be the CH instead if no RSU exists. Thus, the coalition of any node is defined as the cooperation in which that node receives data from all its offspring to aggregate and send to its parent, as shown in Fig. 2.
We use a coalitional graph game with nontransferable utility (NTU) to maximize the coalition area \(A_{i}\) while satisfying the transmission delay time \(T_{i}\) from node i to the sink node. The number of clusters joining the coalition is limited by the transmission delay time constraint specified prior to the clustering. The coalitional graph game is a type of cooperation game theory, where the utility is affected by both a membership and a relationship structure between members of the coalition, as explained in the related work section, and can be analyzed by the relational link graph among players in the game. The NTU is a subtype of the coalitional graph game, where a player cannot directly transfer its utility to another player in the same coalition, but they can take some actions to support the coalition to obtain higher utility.
We model the coalition formation of connected multihop clusters as shown in Fig. 2. The set of all vehicles and the RSU, denoted by \(N= \{ 1,2,3,\ldots , N \}\), are the players in the game. Any vehicle node i has a 2D position \((x_{i},y_{i})\). Its coalition is denoted by family(i) which is a set of its parent node, itself, and all its offspring nodes. Node i directly receives data from its child nodes and transmits data to its parent node. The Euclidean distance between the transmitter node \(t_{x}\) and the receiver node \(r_{x}\) is denoted by \(d_{{t_{x}} \rightarrow {r_{x}}}\). The coalition area of node i covering the whole family of node i is calculated as
$$\begin{aligned} \begin{aligned} A_{i} = {\left\{ \begin{array}{lll} 0 & ; & {\text {node}} \ i \ {\text {is alone (singletoncoalition)}}\\ \pi R_{{\mathrm{cen}} \rightarrow {\mathrm{bor}}}^{2} & ; & {\text {otherwise}}, \end{array}\right. } \end{aligned} \end{aligned}$$
(1)
where \(A_{i}\) is the coalition area of node i in the circular shape. This area shows the knowledge of node i because it can get information from all of its covered offspring to serve its parent. The radius \(R_{{\mathrm{cen}} \rightarrow {\mathrm{bor}}}\) is the Euclidean distance from the centroid to the border of the coalition. The centroid is the average position of every node in the family(i) that is located at the position \((x_{\mathrm{cen}},y_{\mathrm{cen}})\). The border can be referred to the position of the farthest node in the family(i). The maximum distance given from the centroid to any node in the family(i) is the coalition radius.
The parent(i) denotes the parent of node i, and offspring(i) denotes a set of offspring of node i. The \(T_{i}\) is the transmission delay time of node i that node i consumes while receiving data from all of offspring(i) and transmitting aggregated data to parent(i). However, the transmission delay time is zero if the node i is only one member in the coalition because it does not yet receive or transmit any data. Therefore, the transmission delay time of node i is calculated as
$$\begin{aligned} \begin{aligned} T_{i} = {\left\{ \begin{array}{lll} 0 & ; & {\text {node}} \ i \ {\text {is alone}}\\ && {\text {(singletoncoalition)}}\\ \left[ \max \limits _{\forall j \in \textit{offspring}(i)} \left( T_{j \rightarrow i} \right) \right] + T_{i \rightarrow parent(i)} & ; & {\text {otherwise}}, \end{array}\right. } \end{aligned} \end{aligned}$$
(2)
where \(T_{i \rightarrow parent(i)}\) is the transmission time that node i successfully delivers the data packets to its parent node. Node j denotes any offspring of node i. The \(T_{j \rightarrow i}\) is the transmission time that node j successfully delivers the data packets to node i. The data channel is fairly shared, so all of the offspring nodes can deliver data simultaneously. The offspring node that has a large data size or a bottleneck path will consume more time and be slower than others. We use the maximum function to pick up the time that offspring node consumes for successful delivery.
The transmission time \(T_{i \rightarrow parent(i)}\) is calculated by the amount of aggregated data from node i that is successfully delivered to parent(i) with the maximum throughput rate as
$$\begin{aligned} \begin{aligned} T_{i \rightarrow parent(i)} = \frac{D_{i}}{\widehat{TP}_{i \rightarrow parent(i)}} , \end{aligned} \end{aligned}$$
(3)
where \(D_{i}\) denotes the amount of aggregated data from node i. It consists of node i’s self local data \(\Delta _{i}\) and all data from all offspring nodes as depicted in Fig. 2.
The maximum throughput of the path between node i and its parent node, denoted by \(\widehat{TP}_{i \rightarrow parent(i)}\), can be scaled from the data channel capacity and approximated based on Gupta–Kumar’s model [38, 39] as
$$\begin{aligned} \begin{aligned} \widehat{TP}_{i \rightarrow parent(i)} \propto \frac{C_{i \rightarrow parent(i)}}{\sqrt{n \log n}} , \end{aligned} \end{aligned}$$
(4)
where n represents the number of nodes in the same cluster that consists of node i, parent(i), and siblings(i).
The data channel capacity, denoted by \(C_{i \rightarrow parent(i)}\), can be approximated by the Shannon capacity formula [36] as
$$\begin{aligned} \begin{aligned} C_{i \rightarrow parent(i)} = B \log _{2} {\left( 1 + \frac{P_{r_{x}}}{N_{0}} \right) } , \end{aligned} \end{aligned}$$
(5)
where B is the channel bandwidth, \(P_{r_{x}}\) is the received power at parent(i) that has decayed along with the distance from the transmitter node, and \(N_{0}\) is the noise power. From the equations of Gupta–Kumar and Shannon, the maximum throughput fundamentally depends on the distance between nodes and the number of nodes in the cluster.
Any coalition, denoted by S, can be regarded as the subset of all players in the game \((S \subseteq N)\). Each player attempts to increase its coalition area by joining another coalition and aims to increase its utility as much as possible. However, a larger coalition area leads to throughput reduction, and thus more transmission delay time to the sink node due to the longer transmission distance and more interfering nodes. Thus, the utility of each player can be represented in terms of gain and cost, where the gain is the coalition coverage area, and the cost is the transmission delay time from its most distant offspring node to its parent node. Each player is unable to transfer its utility to another, but it can join or merge its coalition with another to achieve higher utility altogether.
The utility of node i, denoted by \(u_{i}\), can be defined as a function of coalition area \(A_{i}\) and transmission delay time \(T_{i}\) as
$$\begin{aligned} u_{i} = (\alpha A_{i}  \tau T_{i})+(\rho _{i}  \psi _{i}) , \end{aligned}$$
(6)
where the \(\alpha\) and \(\tau\) are the positive coefficients to weigh the importance of the coalition area and the transmission delay time. The parameters \(\rho _{i}\) and \(\psi _{i}\) are the reward and penalty, respectively. Node i will get a reward \(\rho _{i}\) if it connects to a coalition that has an RSU as a member. We assign a constant value for a reward, such as \(\rho _{i} = 100\) for a coalition having an RSU, but \(\rho _{i} = 0\) otherwise. Node i will get a penalty \(\psi _{i}\) if its velocity differs greatly from that of its parent node. In particular, the penalty \(\psi _{i}\) is defined as the magnitude of relative velocity between node i and parent(i) as
$$\begin{aligned} \begin{aligned} \psi _{i} = {\left\{ \begin{array}{lll} 0 & ; & {\text {node}} \ i \ {\text {has no parent}}\\ \left \vec {v}_{i}  \vec {v}_{parent(i)} \right & ; & {\text {otherwise}}. \end{array}\right. } \end{aligned} \end{aligned}$$
(7)
We define the coalitional structure, denoted by \(\Gamma = \{ S_{1},S_{2}, \ldots ,S_{m} \}\), as a set of m coalitions that occur at the same time. The coalition S that contains all nodes in N is called the grand coalition. However, the grand coalition may not be guaranteed to occur, and there may be many coalitions at the same time. If a coalition \(S_{k}\) and another coalition \(S_{k'}\) occur at the same time, each coalition will not contain the same node (\(S_{k} \cap S_{k'} = \emptyset\) for \(k \ne k'\)).
The rank precedence and relationship graph of the nodes in the coalition S are represented by the arrow sign \(( \rightarrow )\). The head of the arrow points at the higherrank node, and the tail of the arrow is at the lowerrank node. The rank level status of each node can be any one in the set \(L = \{ {\mathrm{UN}},{\mathrm{CM}},{\mathrm{CH}},{\mathrm{CMH}} \}\) which denotes uncluster status, cluster member status, cluster head status, and dual status (both CM and CH), respectively. Therefore, the coalition S can be written in the coalitional graph form, such as \((1^{\mathrm{UN}})\), \((1^{\mathrm{CM}} \rightarrow 2^{\mathrm{CH}})\), \((1^{\mathrm{CM}} \rightarrow 2^{\mathrm{CMH}} \rightarrow 3^{\mathrm{CH}})\), \((1^{\mathrm{CM}} \rightarrow 2^{\mathrm{CH}} \leftarrow 3^{\mathrm{CM}}),\ldots\), where \(1,2,3,\ldots\) are the ID numbers of nodes in the game. The coalitional structure \(\Gamma\) can be written as follows:
$$ \Gamma = {\left\{ \begin{array}{*{20}l} \left\{ \left( 1^{\mathrm{UN}}\right) , \ \left( 2^{\mathrm{UN}}\right) , \ \left( 3^{\mathrm{UN}}\right) \right\} \\ \left\{ \left( 1^{\mathrm{CH}} \leftarrow 2^{\mathrm{CM}}\right) , \ \left( 3^{\mathrm{UN}}\right) \right\} \\ \left\{ \left( 1^{\mathrm{CH}} \leftarrow 2^{\mathrm{CMH}} \leftarrow 3^{\mathrm{CM}}\right) \right\} \\ \left\{ \left( 2^{\mathrm{CM}} \rightarrow 1^{\mathrm{CH}} \leftarrow 3^{\mathrm{CM}}\right) \right\} \\ \qquad \qquad \quad \vdots &&. \end{array}\right. } $$
The merge and split process is a critical part of our game model. The rank level status can be changed during the clustering process due to merging and splitting of nodes. For a given node, its status can change with the following merging and splitting:

The UN changes to CM, or CH changes to CMH, when the node joins another coalition as a child node.

The UN changes to CH, or CM changes to CMH, when another coalition joins the node as its offspring.

The CM changes to UN, or CH changes to UN, when the node leaves its coalition to be a standalone node.

The CMH changes to CM when all its children leave the node.

The CMH changes to CH when its parent leaves the node.
For example, suppose at time step t, the coalitional structure is
$$\begin{aligned} \Gamma _{t} = \{ S_{1} , S_{2}, S_{3} \} = \{ (1^{\mathrm{UN}}), (2^{\mathrm{UN}}), (3^{\mathrm{UN}}) \}. \end{aligned}$$
At time step \(t+1\), if nodes 1 and 2 merge such that \(S_{1}^{\mathrm{new}} = (1^{\mathrm{CH}} \leftarrow 2^{\mathrm{CM}})\), the new coalitional structure will be
$$\begin{aligned} \Gamma _{t+1}^{\mathrm{new}} = \{ S_{1}^{\mathrm{new}} , S_{3} \} = \left\{ \left( 1^{\mathrm{CH}} \leftarrow 2^{\mathrm{CM}}\right) , \left( 3^{\mathrm{UN}}\right) \right\} . \end{aligned}$$
Merging process
Messages exchanged in coalition forming
The UN or CH node who is the highest ranked node in the coalition acts as the requester to join another coalition. It broadcasts the joinrequest message over the cluster control channel. The joinrequest message contains the current node’s position, velocity, utility, and coalitional graph. Any node of another coalition acts as the acceptor who can accept the request if both coalitions can improve their utility. The merging of these coalitions successfully forms a new coalition when the acceptor returns the joinaccept message to the requester, and then the requester returns the joinconfirm message to the acceptor. Then, the requester and acceptor change their rank status as explained above. Moreover, when the requester changes its status to CM or CMH, it cannot be the requester until it leaves the coalition.
Complexity of messages exchanged in CGGMC
We consider the worstcase scenario where all N vehicles are in transmission range of each other (all nodes can connect together with a singlehop range), and only a pair of nodes successfully merges in each merging step. The complexity to form and expand the coalition depends on the number of messages exchanged between nodes. To merge two coalitions, three messages are transmitted: joinrequest, joinaccept, and joinconfirm. If all N nodes broadcast the joinrequest message, the number of messages exchanged is no more than 3N. Initially, there are N singletoncoalitions. The worstcase coalition merging occurs when a singletoncoalition merges with another coalition so that the number of singletoncoalitions is reduced by one after each merging. Therefore, there are \(N1\) merging steps at most to finally obtain the grand coalition of size N, and the total number of messages exchanged is upperbounded by \(3N(N1)\) or \(O(N^{2})\).
Probabilistic Greedy merging
The utility of a grand coalition depends on the coalition merging sequence. The sequence of coalitional structures obtained from the merging in a threenode case is shown in Fig. 3. To obtain the grand coalition with the highest utility, all possible coalitional structures must be elaborated to identify the one with the highest utility. However, this approach is not feasible in practice due to the explosive growth in the number of possible coalitional structures as the number of vehicles gets large. Under the sequential merging process discussed in Sect. 3.2, coalitions in \(\Gamma _{1}\) can be changed to any coalition in \(\Gamma _{2}\) to \(\Gamma _{7}\) depending on which node initiates the joinrequest message first, and whether the new coalition improves the utility. The grand coalition can be any one in \(\Gamma _{8}\)–\(\Gamma _{16}\), one of which has the maximum utility.
If we let the merging process occur without any intervention, the grand coalition may end up with a poor utility. In particular, any node can broadcast the joinrequest message, and the next coalitional structure is formed only if the utility is improved. As such, the next coalitional structure in the sequence can be any structure having a higher utility. For instance, from Fig. 3, if Node 2 is the first node that broadcasts the joinrequest, and Node 3 is the first node that accepts and returns the joinaccept message, the next coalitional structure to form will be \(\Gamma _{4}\).
A better approach, that improves the chance of obtaining the grand coalition with high utility, is to give higher preferences to coalitions with higher utilities in each merging stage, which we refer to as probabilistic greedy merging. Each requester node broadcasts the joinrequest message and waits for join accept messages from all acceptor nodes in its transmission range to determine possible coalitional structures in the next stage and their utility. All requester nodes exchange their potential coalitions in the next stage, and each candidate coalitional structure is assigned the probability of being selected that depends on its utility relative to the others. Therefore, the candidate coalitional structure with the highest utility has the largest probability of being selected.
Stability analysis of coalition formation
In this section, we show that the grand coalition obtained from probabilistic greedy merging (described in Sect. 3.3) is stable, and we derive the probability of reaching each of the possible grand coalitions. A coalition \(S_{k}\) is called internally stable if no member can improve its utility by leaving or splitting out of the coalition, and it is called externally stable if it cannot improve its utility by joining or merging with another coalition \(S_{k'}\). So, the coalition is stable when its structure no longer changes.
Let \(\Omega = \{ \Gamma _{1}, \Gamma _{2}, \ldots , \Gamma _{\gamma } \}\) denotes a set of all possible \(\gamma\) coalitional structures for a given number of nodes with fixed topology. An example of \(\Omega\) for a case of three vehicles was shown earlier in Fig. 3. The probability that the merging of coalitions described in Sect. 3.2 will result in a given coalition structure \(\Gamma _{j}\) can be analyzed by using a discretetime Markov chain (DTMC) where \(\Gamma _{j}\) represents any jth state in the DTMC state space \(\Omega\).
Let \(\Gamma _{j} = \{ S_{1},S_{2}, \ldots ,S_{m} \}\) be an arbitrary coalitional structure with m coalitions, and let the coalitional structure \(\Gamma _{1} = \{ \{1\},\{2\}, \ldots ,\{N \} \}\) contain all singletoncoalitions. Let \(v(S_{k})\) be the sum of utilities from all the nodes in the coalition \(S_{k}\) of coalitional structure \(\Gamma _{j}\), i.e., \(v(S_{k}) = \sum _{\forall {i} \in S_{k} \in \Gamma _{j}} u_{i}\). It can be proved that there will be at least one absorbing state in the Markov chain if the utility summation of the singletoncoalitions is less than that of the nonsingleton coalition [40]. In other words, the players attempt to form a coalition that improves their utility, because the utility of players is just equal to zero when they act alone \(\left( \sum _{\forall {\{i\}} \in \Gamma _{1}} v(\{ i \}) = 0 \right)\). This property implies that the coalition merging always converges to one of the coalitional structures with a grand coalition, and remains there. Those coalitional structures are thus absorbing states in the DTMC.
Let P be the transition matrix whose entries are the probabilities \(P_{\Gamma _{j} \Rightarrow \Gamma _{j'}}\) that the coalitional structure changes from the current structure \(\Gamma _{j}\) to the new structure \(\Gamma _{j'}\) in a time step [41, 42], denoted by
$$\begin{aligned} \begin{aligned} P = \begin{bmatrix} P_{\Gamma _{1} \Rightarrow \Gamma _{1}} &{}\quad \dots &{}\quad P_{\Gamma _{1} \Rightarrow \Gamma _{j}} &{}\quad \dots &{}\quad P_{\Gamma _{1} \Rightarrow \Gamma _\gamma } \\ \vdots &{}\quad \ddots &{}\quad \vdots &{} &{} \vdots \\ P_{\Gamma _{j} \Rightarrow \Gamma _{1}} &{}\quad \dots &{}\quad P_{\Gamma _{j} \Rightarrow \Gamma _{j}} &{}\quad \dots &{}\quad P_{\Gamma _{j} \Rightarrow \Gamma _\gamma } \\ \vdots &{} &{}\quad \vdots &{}\quad \ddots &{}\quad \vdots \\ P_{\Gamma _\gamma \Rightarrow \Gamma _{1}} &{}\quad \dots &{}\quad P_{\Gamma _\gamma \Rightarrow \Gamma _{j}} &{}\quad \dots &{}\quad P_{\Gamma _\gamma \Rightarrow \Gamma _\gamma } \end{bmatrix} . \end{aligned} \end{aligned}$$
(8)
Let \(Z_{\Gamma _{j} \Rightarrow \Gamma _{j'}}\) be the set of coalitions whose members can change when the transition from the current coalitional structure \(\Gamma _{j}\) to the new coalitional structure \(\Gamma _{j'}\) occurs. For the probabilistic greedy merging, the transition probability \(P_{\Gamma _{j} \Rightarrow \Gamma _{j'}}\) in (8) can be calculated by
$$\begin{aligned} \begin{aligned} \begin{array}{lll} P_{\Gamma _{j} \Rightarrow \Gamma _{j'}} = \frac{\Big [ \sigma _{\Gamma _{j} \Rightarrow \Gamma _{j'}} \Big ] \Big [ u_{\mathrm{CH}} \left( S_{k}^{\mathrm{new}} \right) \Big ]}{\omega _{\Gamma _{j} \Rightarrow \Gamma _{j'}}} & ; & S_{k}^{\mathrm{new}} \in \Gamma _{j'},\\ && {\mathrm{CH}} \in S_{k} \in Z_{\Gamma _{j} \Rightarrow \Gamma _{j'}} , \end{array} \end{aligned} \end{aligned}$$
(9)
where \(u_{\mathrm{CH}}(S_{k}^{\mathrm{new}})\) is the utility of a node i that used to be the cluster head node of the coalition in \(Z_{\Gamma _{j} \Rightarrow \Gamma _{j'}}\), and it still has a CH rank when changes to the new coalition \(S_{k}^{\mathrm{new}}\). The utility is normalized by dividing by \(\omega _{\Gamma _{j} \Rightarrow \Gamma _{j'}}\) so that \(0 \le P_{\Gamma _{j} \Rightarrow \Gamma _{j'}} \le 1\). The normalization term \(\omega _{\Gamma _{j} \Rightarrow \Gamma _{j'}}\) is the sum of utilities obtained from CH node in every possible coalitional structure, which is calculated by
$$\begin{aligned} \begin{aligned} \begin{array}{lll} \omega _{\Gamma _{j} \Rightarrow \Gamma _{j'}} = \sum _{j'=1}^{\gamma } \Big [ \sigma _{\Gamma _{j} \Rightarrow \Gamma _{j'}} \Big ] \Big [ u_{\mathrm{CH}} \left( S_{k}^{\mathrm{new}} \right) \Big ] & ; & S_{k}^{\mathrm{new}} \in \Gamma _{j'},\\ && {\mathrm{CH}} \in S_{k} \in Z_{\Gamma _{j} \Rightarrow \Gamma _{j'}} . \end{array} \end{aligned} \end{aligned}$$
(10)
The variable \(\sigma _{\Gamma _{j} \Rightarrow \Gamma _{j'}}\) takes on a binary values (0 or 1) to enforce the condition that only the transition to a new coalitional structure with better utility is allowed. That is
$$\begin{aligned} \begin{aligned} \sigma _{\Gamma _{j} \Rightarrow \Gamma _{j'}} = {\left\{ \begin{array}{lll} 1 &{} ; &{} u_{\mathrm{CH}} \left( S_{k}^{\mathrm{new}} \right) \ge u_{\mathrm{CH}} \left( S_{k} \right) \\ 0 &{} ; &{} {\text {otherwise}}. \end{array}\right. } \end{aligned} \end{aligned}$$
(11)
Therefore, we can solve the probability of absorption into coalitional structures with a grand coalition by rearranging and partitioning the transition matrix P as
$$\begin{aligned} \begin{aligned} P = \left[ \begin{array}{cc} Q \;&{}\; R \\ \hline 0 \;&{}\; I \end{array} \right] , \end{aligned} \end{aligned}$$
(12)
where Q is a matrix of transition probabilities between the transient states, R is a matrix of transition probabilities from the transient states to the absorbing states, and I is an identity matrix [41, 42]. For an absorbing DTMC, \((IQ)^{1}\) is the fundamental matrix that shows the expected time to absorption, and \((IQ)^{1}R\) is the matrix whose elements are the probabilities that each absorbing state is reached. The state which has the highest probability of absorption becomes the most probable coalitional structure.
Splitting process
In the nonsingletoncoalition, any node whose utility has decreased is becoming unstable, and is ready to leave the current coalition to find a new one that can improve its utility. The condition for leaving is when its utility is less than zero, meaning that splitting itself off to form a new singletoncoalition is better than staying in the current coalition. Let the current coalitional structure be \(\Gamma _{j} = \{ S_{k} \}\). The members of the current coalition \(S_{k}\) can split to multiple new coalitions \(S_{k}^{\mathrm{new}}, S_{k'}^{\mathrm{new}}\) where \(S_{k}^{\mathrm{new}} \cap S_{k'}^{\mathrm{new}} = \emptyset\) if they obtain higher utility than the current coalition, i.e., \(u_{i}(S_{k}^{\mathrm{new}}) \ge 0 \ge u_{i}(S_{k})\) and \(u_{i}(S_{k'}^{\mathrm{new}}) \ge 0 \ge u_{i}(S_{k})\). The new coalitional structure after splitting is \(\Gamma _{j'}^{\mathrm{new}} = \{ S_{k}^{\mathrm{new}}, S_{k'}^{\mathrm{new}} \}\).