In this section, we introduce the proposed second-order ship path planning model in detail. First, we describe the problems that ship path planning must solve. Second, we present the modeling method for the sea area scene, including the rasterization method and the storage format of geographic information. Third, we introduce the static global path planning algorithm based on DynaSarsa(\(\lambda\)), including the eligibility trace and the optimization of the Sarsa algorithm by the Dyna framework. Finally, we introduce the dynamic local path planning algorithm based on Fuzzy Collision Cost A* (FCCA*), including the identification of collision risk, the construction of the ship domain, and the modification of the A* algorithm by the collision risk membership function.
Problem description
The proposed second-order ship path planning model needs to solve two problems, i.e., static global path planning and dynamic local path planning. Figure 1 shows a schematic diagram of the proposed path planning model.
On the macro level, the ship sails across a long-distance sea area with few obstacles. As shown by the purple trajectory in Fig. 1, the ship navigates along the globally planned path as long as there is no local collision risk from dynamic ships. This planning mode prioritizes planning speed and path length.
On the micro level, the ship trajectory prediction method based on StackedBiGRUs [35, 36] continuously monitors the collision risk between the ship and other ships. When the ship collision risk index exceeds the preset threshold, the model switches states and navigates in local path planning mode. As shown in Fig. 1, in the local path planning frame, the red ship should avoid colliding with the blue ship, which is sailing straight ahead. This planning mode prioritizes the safety of the planned path and must avoid collisions with static obstacles and dynamic ships at the same time.
The framework of the proposed second-order ship path planning model is shown in Fig. 2, and it mainly consists of two components, i.e., global path planning and local path planning.
Global path planning. First, ship path planning based on the Sarsa reinforcement learning algorithm can plan paths effectively, but its convergence is slow. Then, the eligibility trace and decay mechanism are incorporated, yielding a global path planning algorithm based on the Sarsa(\(\lambda\)) [37] learning model. Finally, the reinforcement learning framework Dyna is presented, and global path planning is further accelerated by combining the Dyna framework with the Sarsa(\(\lambda\)) learning model into the DynaSarsa(\(\lambda\)) learning model.
Local path planning. First, ship collision risk identification is introduced. When the ship sails along the globally planned path, its future route may bring it into collision with other dynamic ships. In that case, the system should identify the collision risk and carry out dynamic local path planning to ensure navigation safety. This work focuses on the two-ship encounter situation: the proposed model uses the ship trajectory prediction model to predict the future trajectories of the two ships over a period of time and calculates the collision risk index (CRI) for each moment in that period. If the CRI exceeds the threshold, path planning switches from the first order (global) to the second order (local). Then, the traditional A* algorithm is discussed, together with its shortcoming of ignoring the ship collision domain when applied to ship trajectory planning. Finally, we introduce the GOODWIN ship domain model. The heuristic estimation cost of the A* algorithm is modified via the membership function so that the collision risk of dynamic obstacles is incorporated into A*, and the resulting FCCA* path planning model effectively reduces the collision risk of ships in local path planning.
Marine scene modeling
There are various methods for modeling the geographic information of a marine scene, most of which convert the surrounding environment into a graph-theoretic problem. Environmental map conversion methods for two-dimensional marine scenes can be divided into the vector data method and the rasterization method, whose characteristics are summarized as follows.

Vector data have the advantages of standardized structure and low redundancy. Particularly, the data retrieval speed is fast, and the image resolution is high. However, the data structure is relatively complex, and it is difficult to process irregular graphics.

Raster data have a simpler data structure than vector data, which makes spatial analysis and surface simulation less difficult. In addition, irregular graphics are more convenient to integrate or splice. The disadvantage is that geographic information conversion becomes more difficult as the data scale increases.

For the geographic features in large-scale ocean scenes shown in Fig. 3a, the rasterization method can represent geographic entities more effectively than the vector data method. The accuracy is determined by the grid side length.
In sea areas with complex weather and geography, a ship may encounter many obstacles during navigation, including man-made marine structures, glaciers, and reefs. Such topographic data are generally stored in electronic charts. To plan paths on a simulated electronic chart, the chart must therefore be converted into a scene data model that the algorithm can recognize. The data source used in this paper is Shapefile data based on ArcGIS, and a static scene model is established with the vector data rasterization tool provided in ArcMap. This rasterization method is an interpolation method designed specifically to create a digital elevation model (DEM) that conforms to the real surface. Its main principle is to restore the real terrain from conventional input data and known surface features.
Water is the primary erosive force that determines the general shape of most terrains. Therefore, most terrains contain many local maxima such as peaks, but few local minima, resulting in a discontinuous terrain state. Terrain to raster can constrain the interpolation process with surface-related constraints, generating a continuous terrain structure and an accurate representation of mountains and rivers. This type of function-constrained method can generate more accurate topographic maps with less input data. The amount of input required is smaller than that needed to describe geographic information with digital contours, which further reduces the cost of obtaining an accurate DEM.
This rasterization method is conservative when removing sinks and does not impose functional constraints where they might conflict with the input elevation data. Such conflicts are usually saved in log files in the form of sinks. These records can be used to correct geographic information, which makes the method especially suitable for processing large, information-rich datasets such as those of the marine environment. The rasterized Baltic Sea area is shown in Fig. 3b. Finally, the result data are saved in Shapefile format.
In particular, in the dynamic local ship path planning task, the extracted Shapefile data are used to model the local navigation chart, where the collision avoidance rules, navigation experience, ship operation characteristics and the size of the navigation chart should be fully considered. Let \((x_s, y_s)\) be the starting position of the ship, and \((x_d, y_d)\) be the target position at the end of the planning. Then, the center coordinate of the navigation chart \({\text{point}}_{\textrm{ce}}\) is formally set as:
$$\begin{aligned} {\text{point}}{_{\textrm{ce}}} = \left( {\frac{{{x_s} + {x_d}}}{2},\frac{{{y_s} + {y_d}}}{2}} \right) . \end{aligned}$$
(1)
The longitudinal length \(l_{\textrm{lon}}\) and latitudinal length \(l_{\textrm{lat}}\) of the navigation chart are set as:
$$\begin{aligned} {l_{\textrm{lon}}}&= \left| {{y_s} - {y_d}} \right| , \end{aligned}$$
(2)
$$\begin{aligned} {l_{\textrm{lat}}}&= \left| {{x_s} - {x_d}} \right| . \end{aligned}$$
(3)
The granularity of rasterization determines the fineness of path planning, so it must be chosen to balance the execution time of the algorithm against the planning quality.
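The chart geometry of Eqs. (1) to (3) and the effect of the grid side length can be illustrated with a minimal Python sketch. The function name `navigation_chart`, the `cell_size` parameter, and the rule of rounding up to whole cells are illustrative assumptions, and the extents are taken as absolute coordinate differences so that they are non-negative.

```python
import math

def navigation_chart(start, goal, cell_size):
    """Return centre, extents and grid shape of a local navigation chart.

    start, goal: (x, y) coordinates of the start and target positions.
    cell_size:   side length of one raster cell (hypothetical parameter,
                 in the same unit as the coordinates).
    """
    (x_s, y_s), (x_d, y_d) = start, goal
    centre = ((x_s + x_d) / 2, (y_s + y_d) / 2)  # Eq. (1)
    l_lon = abs(y_s - y_d)                       # Eq. (2), taken as non-negative
    l_lat = abs(x_s - x_d)                       # Eq. (3), taken as non-negative
    # Assumption: the chart is covered by whole cells, at least one per axis.
    rows = max(1, math.ceil(l_lon / cell_size))
    cols = max(1, math.ceil(l_lat / cell_size))
    return centre, (l_lon, l_lat), (rows, cols)
```

Halving `cell_size` roughly quadruples the number of cells the planner has to search, which is the trade-off between execution time and planning quality described above.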
Static global path planning algorithm based on DynaSarsa(\(\lambda\))
In this section, we introduce the static global path planning algorithm based on DynaSarsa(\(\lambda\)) in the proposed model. The main feature of the Sarsa algorithm is its single-step update: the value function is updated immediately after each step in the environment, so the agent responds quickly to environmental information. The traditional Sarsa algorithm is therefore denoted Sarsa(0). However, with single-step updates, only the step that immediately precedes reaching the goal is credited with it, and all earlier actions receive no credit. This slows the convergence of the algorithm. To address this, the update can be extended over several consecutive steps that together form one round, with a complete update conducted at the end of each round. The memory of these consecutive steps is called the eligibility trace (ET).
ET is an important concept in reinforcement learning. Sutton and Barto [38] describe ETs as additional memory variables associated with each state that account for how frequently the state is visited. There are three forms of ET: the accumulating trace (AT), the replacing trace (RT), and the true online trace (TOT). The accumulating eligibility trace of a state-action pair is calculated as follows:
$$\begin{aligned} {e_{t + 1}}\left( {s,a} \right) = \left\{ {\begin{array}{*{20}{l}} {\gamma \lambda {e_t}\left( {s,a} \right) + 1\quad \mathrm{{if}}\, s = {s_t}\ \mathrm{{and}}\ a = {a_t}}\\ {\gamma \lambda {e_t}\left( {s,a} \right) \quad \quad \;\;\mathrm{{otherwise}}} \end{array}} \right. , \end{aligned}$$
(4)
where \(\gamma\) represents the discount factor and \(\lambda \in [0,1]\) is the decay coefficient of the trace, which defines how strongly the information about a past selection is attenuated. Related studies have found that eligibility traces can speed up convergence in many cases [39]. Modifying the Sarsa algorithm with the eligibility trace yields the Sarsa(\(\lambda\)) algorithm, which is shown in Algorithm 1.
In Algorithm 1, \(\delta\) represents the temporal difference error (TD error). At each moment, the current \(\delta\) is assigned to each state according to its eligibility trace. The eligibility trace allows the Sarsa(\(\lambda\)) algorithm to converge to the global optimum faster than the traditional Sarsa(0) algorithm. However, this acceleration relies on storing past visits and therefore consumes additional memory. When computing resources are sufficient, the Sarsa(\(\lambda\)) algorithm can quickly obtain a safer planned navigation path. The state-action trajectory diagram of the algorithm is shown in Fig. 4, where T is the total number of iterations.
The cumulative eligibility trace and decay coefficient \(\lambda\) improve the convergence speed of the Sarsa algorithm. However, the Sarsa(\(\lambda\)) algorithm is still a model-free reinforcement learning algorithm: the ship learns directly from the experience it gathers in the marine environment, and the learning efficiency of this approach is relatively limited. In model-based reinforcement learning algorithms, the ship instead uses experience generated in a simulated environment to select new strategies while continuously refining the model.
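To make the role of \(\delta\) and the trace concrete, the following minimal Python sketch applies one tabular Sarsa(\(\lambda\)) update with accumulating traces in the standard textbook form. It is not a reproduction of Algorithm 1: the function name, the dictionary-based tables, and the default values of `alpha`, `gamma`, and `lam` are illustrative assumptions.

```python
from collections import defaultdict

def sarsa_lambda_step(Q, E, s, a, r, s_next, a_next, done,
                      alpha=0.1, gamma=0.9, lam=0.8):
    """Apply one Sarsa(lambda) update in place and return the TD error.

    Q, E: dict-like tables mapping (state, action) to value / trace.
    alpha, gamma, lam: learning rate, discount factor and trace decay
    (default values are hypothetical).
    """
    target = r if done else r + gamma * Q[(s_next, a_next)]
    delta = target - Q[(s, a)]      # TD error of the current step
    E[(s, a)] += 1.0                # accumulating trace for the visited pair
    for key in list(E):
        Q[key] += alpha * delta * E[key]  # credit every traced pair
        E[key] *= gamma * lam             # then decay its trace
    return delta

# Example tables: unseen pairs start at zero.
Q = defaultdict(float)
E = defaultdict(float)
```

Because every traced pair receives a share of \(\delta\), a reward obtained at the goal also updates the pairs visited earlier in the round, whereas Sarsa(0) would update only the last one.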
During training with the Dyna learning framework, the ship first interacts directly with the simulation environment to obtain real experience, from which a preliminary model is generated. At the same time, it obtains simulated experience by interacting with the scene inside the model. The real and simulated experience are then combined to train the ship, helping it plan and judge the optimal path. The core idea of the Dyna learning framework is to spend computing resources in exchange for high sampling efficiency: more interaction experience is obtained per unit time, at the cost of additional computation. In the stage of obtaining simulated experience in the Dyna model, the update rule of the Q-learning algorithm is used. This rule can learn the global optimum and helps the ship avoid local optima. The steps of the DynaSarsa(\(\lambda\)) algorithm, obtained by combining the Dyna learning framework with Sarsa(\(\lambda\)), are shown in Algorithm 2.
By integrating the Dyna framework with Sarsa(\(\lambda\)), the ships can not only obtain experience from the simulation training of the Dyna framework, but also learn experience from the direct interaction with the marine environment. The fusion of two kinds of experiences can provide guidance for ship path planning, which can greatly improve the efficiency of ship static global path planning.
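The interplay of real and simulated experience can be sketched as follows. This is a simplified tabular Dyna planning step, not a reproduction of Algorithm 2: it assumes a deterministic learned model stored as a dictionary, and the names `record_real_step`, `dyna_planning`, and `n_planning` are hypothetical. As described above, the simulated transitions are backed up with the Q-learning rule.

```python
import random
from collections import defaultdict

def record_real_step(model, s, a, r, s_next, done):
    """Store a real transition in the learned (deterministic) model."""
    model[(s, a)] = (r, s_next, done)

def dyna_planning(Q, model, actions, n_planning=10,
                  alpha=0.1, gamma=0.9, rng=random):
    """Replay n_planning simulated transitions using Q-learning backups."""
    seen = list(model)
    if not seen:
        return  # nothing learned yet, so nothing to simulate
    for _ in range(n_planning):
        s, a = rng.choice(seen)            # previously visited pair
        r, s_next, done = model[(s, a)]    # simulated experience
        best_next = 0.0 if done else max(Q[(s_next, b)] for b in actions)
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

# Example: one real step followed by three simulated replays of it.
Q = defaultdict(float)
model = {}
record_real_step(model, 0, "E", 1.0, 1, True)
dyna_planning(Q, model, actions=["E", "W"], n_planning=3)
```

Each real step is thus reused several times without further interaction with the environment, which is where the gain in sampling efficiency comes from.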
Dynamic local path planning algorithm based on FCCA*
It is necessary to switch from global path planning to local path planning for collision avoidance operations when a ship faces a collision risk. Therefore, this section first introduces the method for identifying the collision risk of encountering ships.
In traditional autonomous ship collision avoidance systems, the collision risk index (CRI) is commonly used to measure the collision risk of ships. CRI ranges from 0 to 1. The distance to closest point of approach (DCPA) and the time to closest point of approach (TCPA) are important factors for evaluating the CRI between encountering ships in actual scenarios. Since CRI has a nonlinear negative correlation with DCPA and TCPA, we use DCPA and TCPA to quantify CRI.
To calculate the collision risk of two ships, assume that their states at a certain moment are \({V_0}\left( {Lo{n_0},La{t_0},So{g_0},Co{g_0}} \right)\) and \({V_1}\left( {Lo{n_1},La{t_1},So{g_1},Co{g_1}} \right)\), where Lon, Lat, Sog, and Cog represent the longitude, latitude, speed over ground, and course over ground of the ship, respectively. The relative speed \(S_r\) and relative course \(C_r\) of the two ships at this moment can then be calculated as follows, where the subscript t denotes the target ship \(V_1\):
$$\begin{aligned} {S_r}&= \sqrt{Sog_0^2 + Sog_t^2 + 2So{g_0}So{g_t}\cos \left( {Co{g_t} - Co{g_0}} \right) }, \end{aligned}$$
(5)
$$\begin{aligned} {C_r} &= \left\{ {\begin{array}{*{20}{c}} {Co{g_0} - \arccos \left( {\frac{{S_r^2 + Sog_0^2 - Sog_t^2}}{{2{S_r}So{g_0}}}} \right) \quad Co{g_0} < Co{g_t}}\\ {Co{g_0} + \arccos \left( {\frac{{S_r^2 + Sog_0^2 - Sog_t^2}}{{2{S_r}So{g_0}}}} \right) \quad Co{g_0} \ge Co{g_t}} \end{array}} \right. . \end{aligned}$$
(6)
Besides, DCPA and TCPA are defined as:
$$\begin{aligned} {\text{DCPA}}= dist \cdot \sin \left( {{C_r} - Co{g_0} - {\text{Bearing}} - \pi } \right) , \end{aligned}$$
(7)
$$\begin{aligned} {\text{TCPA}}= dist \cdot {{\cos \left( {{C_r} - Co{g_0} - {\text{Bearing}} - \pi } \right) } / {{S_r}}} , \end{aligned}$$
(8)
where dist is the distance between the two ships on the sea, Bearing is the angle of the ship \(V_1\) relative to \(V_0\) when the ship \(V_0\) is the coordinate origin. Besides, the unit of DCPA is nautical miles, and the unit of TCPA is minutes.
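For illustration, DCPA and TCPA can also be obtained from the relative position and relative velocity vectors of the two ships. The Python sketch below uses this standard vector formulation rather than the trigonometric form of Eqs. (5) to (8). It assumes local planar coordinates (east, north) in nautical miles, speeds in knots, and courses in degrees clockwise from true north. The function name `cpa` is hypothetical.

```python
import math

def cpa(own_pos, own_sog, own_cog_deg, tgt_pos, tgt_sog, tgt_cog_deg):
    """Return (DCPA in nautical miles, TCPA in minutes) for two ships."""
    def velocity(sog, cog_deg):
        c = math.radians(cog_deg)
        return sog * math.sin(c), sog * math.cos(c)  # (east, north) in knots

    v0x, v0y = velocity(own_sog, own_cog_deg)
    v1x, v1y = velocity(tgt_sog, tgt_cog_deg)
    # Position and velocity of the target relative to the own ship.
    px, py = tgt_pos[0] - own_pos[0], tgt_pos[1] - own_pos[1]
    vx, vy = v1x - v0x, v1y - v0y
    speed_sq = vx * vx + vy * vy
    if speed_sq < 1e-12:
        # No relative motion: the current distance never changes.
        return math.hypot(px, py), 0.0
    tcpa_h = -(px * vx + py * vy) / speed_sq  # hours, since nm / knots
    dcpa = math.hypot(px + vx * tcpa_h, py + vy * tcpa_h)
    return dcpa, tcpa_h * 60.0
```

A negative TCPA in this formulation means that the closest point of approach already lies in the past.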
The relationships between CRI and DCPA or TCPA are defined as:
$$\begin{aligned} CR{I_d}= & {} {a_d}\exp \left( {{b_d}{\text{DCPA}}} \right) , \end{aligned}$$
(9)
$$\begin{aligned} CR{I_t}= & {} {a_t}\exp \left( {{b_t}{\text{TCPA}}} \right) , \end{aligned}$$
(10)
where the parameters a and b are adjustment coefficients estimated from the opinions of ship experts and watchkeepers in the ship transportation system. In this work, the parameters are set as \((a_d, b_d, a_t, b_t) = (1.0529, 1.5694, 1.3971, 0.0879)\) according to the movement of the objects on the sea [40]. CRI is calculated as the weighted sum of \(CRI_d\) and \(CRI_t\):
$$\begin{aligned} CRI = \alpha CR{I_d} + \beta CR{I_t}, \end{aligned}$$
(11)
where the parameters \(\alpha\) and \(\beta\) are the weights of \({\text{CRI}}_d\) and \({\text{CRI}}_t,\) respectively. The sum of \(\alpha\) and \(\beta\) is 1, and their values can be set according to the specific characteristics of marine traffic applications.
Whenever the state of the two ships at the next moment is predicted, the collision risk index CRI is calculated. If the CRI index exceeds the collision threshold, the ship changes from the global path planning state to the local path planning state.
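The CRI computation and the switching rule can be sketched in Python as below. Several points are assumptions made for illustration only: the coefficients \(b_d\) and \(b_t\) are applied with a negative sign so that CRI decreases as DCPA and TCPA grow (consistent with the negative correlation stated above), the result is clipped to the range from 0 to 1, a negative TCPA contributes no time risk, and the equal default weights and the function names are hypothetical.

```python
import math

# Coefficients (a_d, b_d, a_t, b_t) as given in the text.
A_D, B_D, A_T, B_T = 1.0529, 1.5694, 1.3971, 0.0879

def collision_risk_index(dcpa_nm, tcpa_min, alpha=0.5, beta=0.5):
    """Weighted CRI from DCPA (nautical miles) and TCPA (minutes)."""
    # Assumption: negative exponents, so that risk decays with DCPA and TCPA.
    cri_d = A_D * math.exp(-B_D * abs(dcpa_nm))
    if tcpa_min < 0:
        cri_t = 0.0  # assumption: closest approach already passed
    else:
        cri_t = A_T * math.exp(-B_T * tcpa_min)
    cri = alpha * cri_d + beta * cri_t        # Eq. (11)
    return min(1.0, max(0.0, cri))            # assumption: clip to [0, 1]

def needs_local_planning(cri, threshold):
    """Switch to local planning when CRI exceeds the chosen threshold."""
    return cri > threshold
```

In use, the predicted states at each future moment would be converted to DCPA and TCPA, passed to `collision_risk_index`, and compared against the threshold chosen for the application.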
In the local path planning stage, the collision model of the ship itself becomes a factor that cannot be ignored. The basic structure of the ship is shown in Fig. 5. Experts and scholars have conducted related research and proposed ship domain models suitable for different scenarios. In this work, we will first introduce the GOODWIN ship domain model.
The Japanese ship expert FUJII first proposed the concept of the ship domain in the 1960s. FUJII used sensing equipment to collect and organize ship encounter behaviors in coastal waterways and crowded areas. The ship collision avoidance trajectory data were then filtered and analyzed, yielding an elliptical ship domain with the ship located at the intersection of the long and short axes. Specifically, the long axis is 8 times the length of the deck, and the short axis is 3.2 times the length of the deck. The schematic diagram of the FUJII ship domain model is shown in Fig. 6.
GOODWIN then improved the FUJII model into an asymmetrical shape, based on marine traffic surveys and a large number of collision avoidance experiments conducted on radar simulators with crew training machines, taking into account the International Regulations for Preventing Collisions at Sea. The resulting GOODWIN ship domain model consists of three sectors with different radii spliced together. The sectors are distributed according to the range of the ship’s lights, and their radii are 0.7 nautical miles, 0.85 nautical miles, and 0.45 nautical miles, respectively. The schematic diagram of the GOODWIN ship domain model is shown in Fig. 7.
The GOODWIN model is considered to be suitable for collision avoidance of ships at sea [40]. Particularly, the GOODWIN model is safer than the COLDWELL model and the FUJII model in practical use, so the GOODWIN model is selected as the collision domain model in this study. The GOODWIN model calculates the ship domain according to the angle relationship between ships. Formally, GOODWIN is defined as:
$$\begin{aligned} {\text{GOODWIN}} = \left\{ {\begin{array}{*{20}{c}} {0.85\quad {0^ \circ } \le \theta< {{112.5}^ \circ }\quad \;\;}\\ {0.45\quad {{112.5}^ \circ } \le \theta \le {{247.5}^ \circ }}\\ {0.7\quad \;{{247.5}^ \circ }< \theta < {{360}^ \circ }} \end{array}} \right. . \end{aligned}$$
(12)
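Equation (12) translates directly into a lookup of the domain radius by angle. In the minimal Python sketch below, \(\theta\) is assumed to be the bearing of the other ship relative to the own ship’s bow, measured clockwise in degrees, and the function names are hypothetical.

```python
def goodwin_radius(theta_deg):
    """Radius of the GOODWIN ship domain (nautical miles), as in Eq. (12)."""
    theta = theta_deg % 360.0  # wrap into [0, 360)
    if theta < 112.5:
        return 0.85
    if theta <= 247.5:
        return 0.45
    return 0.7

def inside_goodwin_domain(dist_nm, theta_deg):
    """True if a ship at the given distance and angle violates the domain."""
    return dist_nm < goodwin_radius(theta_deg)
```

For example, a ship 0.6 nautical miles away lies inside the domain when it is ahead on the starboard side (radius 0.85) but outside it when it is astern (radius 0.45).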
In this work, the A* algorithm will be used to obtain the local optimal planning path of the ship. The cost function f(k) of the A* algorithm in this scenario should be expressed as the sum of the navigation distance cost and the collision cost:
$$\begin{aligned} f\left( k \right) = g\left( k \right) + h\left( k \right) , \end{aligned}$$
(13)
where \(g\left( k \right)\) is the cost of the ship’s distance from the starting point, and its initial value is 0. The heuristic cost function \(h\left( k \right)\) can be based on a variety of distance measures, such as the Manhattan distance, Euclidean distance, and Chebyshev distance. Considering the underactuated characteristics of the ship (the degree of freedom of the ship’s navigation is less than the degree of freedom of the marine environment) [27], we use the sum of the Chebyshev distance and the fuzzy collision cost (FCC) at this point as the heuristic estimated cost \(h\left( k \right)\), which gives the ship a guiding direction. The specific expression of \(h\left( k \right)\) is:
$$\begin{aligned} h\left( k \right) = \max \left( {\left| {{x_k} - {x_t}} \right| ,\left| {{y_k} - {y_t}} \right| } \right) + FCC\left( dist, \theta _1, \theta _2 \right) , \end{aligned}$$
(14)
where \((x_t, y_t)\) is the position coordinate of the target waypoint and \((x_k, y_k)\) is the current position coordinate of the ship. Angles are measured clockwise from true north, from \(0^{\circ }\) to \(360^{\circ }\), and the ship direction angle is the angle between the ship’s bow and true north. FCC is the fuzzy collision cost based on the GOODWIN ship domain model, which is introduced next.
The basic operations in traditional Boolean logic are “and, or, not,” which suit scenarios with clear logic. However, there is no clear-cut threshold when judging the distance and angle between two ships in practice. In fuzzy logic, there are no strict boundaries between distances and angles, and the classification of different orientations is measured by the degree of membership. Specifically, the degree of membership quantifies a fuzzy research object through membership functions, and the process of transforming input values into membership degrees of each set is called fuzzification. The collision risk membership function FCC can be expressed as:
$$\begin{aligned} {\text{FCC}} = \frac{1}{2}{U_\theta } + \frac{1}{2}{U_{\textrm{dist}}}, \end{aligned}$$
(15)
where \(U_\theta\) is the membership function of the azimuth angle \(\theta\) between the current ship and the target ship, and \(U_{\textrm{dist}}\) is the membership function of the distance dist between the current ship and the target ship.
The collision risk index encountered by the ship will change with the relative angle of the two ships. \(U_\theta\) is a function of the included angle between the two ships. According to the ship collision avoidance rule [41], the membership degree of the azimuth angle \(\theta\) between the current ship and the target ship is defined as:
$$\begin{aligned} {U_\theta } = \frac{{17}}{{44}}\left[ {\cos \left( {\left| {{\theta _1} - {\theta _2}} \right| - {{19}^ \circ }} \right) + \sqrt{\frac{{440}}{{289}} + {{\cos }^2}\left( {\left| {{\theta _1} - {\theta _2}} \right| - {{19}^ \circ }} \right) } } \right] . \end{aligned}$$
(16)
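Equations (15) and (16) can be evaluated with a few lines of Python. The sketch reads the argument of the cosine in Eq. (16) as \(\left| \theta _1 - \theta _2 \right| - 19^{\circ }\), takes both angles in degrees, and treats the distance membership \(U_{\textrm{dist}}\) as a precomputed input because it depends on the sector of the ship domain. The function names are hypothetical.

```python
import math

def u_theta(theta1_deg, theta2_deg):
    """Angle membership degree of Eq. (16); both angles in degrees."""
    x = math.cos(math.radians(abs(theta1_deg - theta2_deg) - 19.0))
    return (17.0 / 44.0) * (x + math.sqrt(440.0 / 289.0 + x * x))

def fuzzy_collision_cost(u_theta_value, u_dist_value):
    """Equal-weight combination of the two membership degrees, Eq. (15)."""
    return 0.5 * u_theta_value + 0.5 * u_dist_value
```

Under this reading, \(U_\theta\) reaches its maximum of 1 when the angle difference is \(19^{\circ }\), since \(\frac{17}{44}\left( 1 + \frac{27}{17} \right) = 1\), and falls to \(\frac{10}{44} \approx 0.227\) when the cosine term equals \(-1\).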
Moreover, the distance dist between the ship and the target ship also changes the collision risk index. Combined with the GOODWIN ship domain model, the surroundings of the ship are divided into three areas, and the collision risk \(U_{\textrm{dist}}\) is calculated for each area separately, as shown in Algorithm 3.
Besides, we use the collision risk membership function FCC to modify the heuristic estimation cost function of the traditional A* algorithm, and the process of the FCCA* algorithm is shown in Algorithm 4.
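The idea of adding a collision cost to the A* heuristic can be sketched on a raster grid as follows. This is a simplified illustration, not a reproduction of Algorithm 4: the function name `fcc_a_star`, the 8-connected moves with unit step cost, and the `collision_cost` callback (standing in for the FCC value of a cell) are all assumptions.

```python
import heapq

def fcc_a_star(grid, start, goal, collision_cost=lambda cell: 0.0):
    """A* on a grid with heuristic h = Chebyshev distance + collision cost.

    grid[r][c] == 1 marks a static obstacle; cells are (row, col) tuples.
    collision_cost(cell) is a hypothetical stand-in for the FCC of a cell.
    """
    def h(cell):
        chebyshev = max(abs(cell[0] - goal[0]), abs(cell[1] - goal[1]))
        return chebyshev + collision_cost(cell)

    moves = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
             (0, 1), (1, -1), (1, 0), (1, 1)]
    g = {start: 0.0}            # cost from the start, initially 0
    parent = {start: None}
    open_heap = [(h(start), start)]
    closed = set()
    while open_heap:
        _, cell = heapq.heappop(open_heap)
        if cell in closed:
            continue
        if cell == goal:
            path = []
            while cell is not None:   # walk back to the start
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        closed.add(cell)
        for dr, dc in moves:
            nxt = (cell[0] + dr, cell[1] + dc)
            if not (0 <= nxt[0] < len(grid) and 0 <= nxt[1] < len(grid[0])):
                continue
            if grid[nxt[0]][nxt[1]] == 1 or nxt in closed:
                continue
            new_g = g[cell] + 1.0     # assumption: unit cost per move
            if new_g < g.get(nxt, float("inf")):
                g[nxt] = new_g
                parent[nxt] = cell
                heapq.heappush(open_heap, (new_g + h(nxt), nxt))
    return None  # goal unreachable
```

Because the collision term inflates the heuristic near risky cells, the search is steered away from them. The heuristic is then no longer guaranteed to be admissible, so the returned path trades some length for lower collision risk.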