A dynamic taint tracking optimized fuzz testing method based on multi-modal sensor data fusion

The safety of the Industrial Internet Control Systems has been a hotspot in the information security. To meet the needs of communication, a large variety of proprietary protocols have emerged in the field of industrial control. The protocol field is often trusted in the implementation of industrial control terminal code. If attackers modify the data of these fields using the protocol defect, the operation of the program can be controlled and the entire system will be affected. To cope with such security threats, academia and industry generally adopt fuzz test methods. However, the current industrial control protocol fuzz test methods generally have low code coverage, where unified description models are missing and test cases are not targeted. A method of fuzzification processing combined with dynamic multi-modal sensor communication data is proposed. To track the program execution, the dynamic pollution analysis is used to search for the input fields that affect the execution of the conditional branch and capture the dependencies between the conditional branches to guide the grammar generation of test cases, which can increase the chances of executing deep code. The experimental results show that the proposed method improves the validity and code coverage of test cases to a certain extent and greatly increases the probability of anomaly detection in the protocol implementation.

Microsoft's vulnerability MSII-087 to avoid the detection of security components of system and then stole various information on system and returned it to the attacker. In 2015, Ukrainian electric power system was attacked by malicious software Black Energy [6] which got the remote access and control power of system and thus caused the crash of power grid's SCADA host system and then an electricity black-out in a large area. In 2017, security manufacturer ESET announced a tool win32/Industroyer attacking electric IIS directly. The tool can cause the blackout from transformer substation by controlling the breaker. In 2018, a lot of warning messages of illegal intranet access appeared on a provincial electric IIS monitoring platform in China. After an analysis, people found the reason was the manufacturer ran and maintained the product server remotely by opening the function of file sharing and thus exposed the system in the public network for a long time, which brought serious hidden dangers. In the past 10 years, many security incidents and potential and possible dangers made the security situation of IIS increasingly serious. Figure 2 shows the statistic of IIS security incidents in recent years, and the data came from IS-CERT (The Internet Systems Cyber Emergency Response Team) [7][8][9][10][11][12][13].
During the realization of protocols by the industrial Internet terminal codes, protocol fields are generally credible, but the attacker can control the running of program by changing the data values of the fields using the defects of protocols and then affect the whole system. For example, the skip instruction's destination address parameters generally come from credible data sources in the program rather than externally incredible input such as the incoming data from an industrial Internet protocol. However, the attacker can overwrite the instruction's destination address through system vulnerability and then control the running process of IIS. Professor Jonathan M. Garibaldi proposed an indistinguishable conceptual framework as a key component of the evaluation of computerized decision support systems. Case studies show that human experts are not perfect, and there are techniques that allow fuzzy systems to simulate human-level performance, including variability. He demonstrated the necessity of "fuzzy intelligence" from two aspects: (a) fuzzy methodology (in the technical sense of Zade fuzzy sets and systems) is necessary as a knowledge-based system to represent and reason for uncertainty; (b) when evaluating intelligent systems, ambiguity (in a non-technical sense) is required and imperfect performance is accepted [14]. This paper is inspired by these ideas and has carried out related research. In this context, to solve the problem of low rate of code coverage caused by the repeated execution of test cases on the same path, starting from the level of system program in the realization of industrial Internet protocols and in the premise of acquirable source program codes or executable binary file, the paper proposes a method combined with the dynamic multi-modal sensor communication data in the program for fuzzy processing. The method tacks the program execution of protocol realization, finds the input fields affecting conditional branches through dynamic taint analysis, and captures the dependence relationship between conditional branches to guide the generation of test cases pertinently.

Security analysis of industrial control protocols
The industrial control system is to directly communicate with the underlying equipment or the data acquisition converter using the protocol agreed by the communicating parties. At first, most industrial control protocols were implemented only between a closed network and trusted software through a dedicated serial port. But in order to meet the increasingly complex requirements of industrial control systems, dedicated lines are gradually being replaced by TCP or wireless links. At the beginning of the design of industrial control protocols did not fully consider the necessary conditions to protect users' security such as encryption and authentication, and many protocols currently rely on TCP/IP. In this case, the data transmitted through the protocol cannot guarantee its security. An attacker only needs to master the protocol specifications and penetrate into the industrial control network to tamper with any data of the target device. The security flaws of common industrial control protocols are analyzed as follows.
1. The Modbus protocol has no authentication mechanism. It establishes communication on the basis of TCP/IP. Therefore, as long as the attacker obtains the device's network IP, he can successfully connect using port 502 directly. If the function code carried by the application data unit is supported by the Modbus device, a legal Modbus session can be established. In addition, there is no message check in the Modbus/TCP protocol. The checksum is generated at the transport layer instead of the application layer, so it is easier to fake the command. At the same time, for anyone, as long as they can connect to the target Modbus device, they can perform the functions of the Modbus device without permission to distinguish. Moreover, the data encapsulated in Modbus is transmitted in clear text, and an attacker can obtain the message data by using a network packet capture tool. Finally, the most dangerous aspect of Modbus is its programmability. Attackers can inject malicious code into RTU or PLC to achieve control. 2. IEC 60870-5-101/104 is a widely used protocol in power industrial control systems, used to implement communication data transmission between MTU and RTU in industrial control systems. First, the checksum mechanism cannot guarantee the integrity of the transmitted data. The IEC-101 protocol only has a 1-byte checksum. There is a possibility of a checksum overflow, and the exact value of this checksum cannot be determined by using a single-byte checksum, so the integrity of the data cannot be guaranteed. IEC-104 defaults to the default checksum. Its integrity is completely dependent on the underlying layer, and data values can be easily changed. Secondly, in the data transmission layer, the limited bandwidth limits the transmission length of the packet frame. The IEC-101 and IEC-104 protocols can only send 255 8-bit bytes when used at the same time, so it is not possible to add security bits during the data transmission. Although IEC 62351 was introduced to add security features to the IEC 60870-5 series of protocols, it is only focused on the application layer and does not involve the security mechanism of the data link layer. 3. DNP3 is a standard formulated by IEEE PES on the basis of IEC. Its security is mainly reflected in the absence of authorization and encryption mechanisms, and its function codes and data types have been clearly defined, which makes it easier for attackers to tamper with DNP3 session content. 4. The OPC protocol is a set of interface specifications conforming to industrial control requirements formulated by the OPC Foundation for protocol conversion and data sharing. The OPC protocol is implemented in a Windows environment, so an attacker can use Windows system vulnerabilities to attack it. Although most OPC hosts have the authentication mechanism enabled, the weak passwords, coupled with the opening of many system-independent services, make unnecessary running processes and development ports vulnerable to attack. OPC's reliance on Microsoft's authorization services is also outdated, and these security policies are too fragile and vulnerable. 5. Ethernet/IP is more modern than Modbus, but there are still security issues. For example, when using UDP to broadcast data in real time, it lacks any built-in network layer mechanism to ensure communication reliability and data integrity. Attackers can easily inject false data or use IGMP control messages to manipulate transmission paths.

An industrial Internet protocol fuzz test method based on dynamic analysis
In view of the serious damages caused by IIS' vulnerability, researchers have proposed many vulnerability discovery technologies, such as the dynamic analysis, the symbolic execution, and the fuzz test [15,16]. Comparing with other technologies, the fuzz test requires only a little knowledge of target, but can expand to large application program easily with good reusability and thus has become the most popular vulnerability discovery solution currently, especially in the IIS. With regard to network protocols, the most representative fuzzifier is SPIKE [17]. Its principle is describing the protocol as a block sequence model and making the data variation in blocks, and then partitioning the message's data structure and making statistic of field length after the variation automatically, thus improving the effectiveness of test cases significantly, but the fuzzifier shows an inadequate descriptive power for the constrained relationship in protocol messages. Later, Sulley [18] and Peach [19] extended the data model based on SPIKE and added more descriptions on the dependence relationship between data blocks. In order to provide a more flexible and accurate fuzzy framework, AFL [20] tracked the path coverage of each input through the lightweight instrumentation in source program and allocated ID randomly to the basic blocks on the path using the Hash mechanism to determine whether where is any new path generated, and then used the input generating a new path as the seed. Combining with the detailed information in the program, the method improves the code coverage rate, but the Hash method may have collision cases easily and thus cause the problem of false negatives even though the input has reached a new path. Gan et al. proposed CollAFL [21] which allocated the ID value to each basic block using the greedy algorithm and other methods to ensure the Hash value is difference in each side and thus avoided Hash collision and realized a more accurate judgment of path coverage. Besides, the method uses the number of neighbor branches of paths as the weight to sort seeds and prompts the fuzzifier to probe the paths that have not been reached. Sanjay et al. proposed VUzzer [22]. The method requires no prior knowledge of application program or input format, and mainly uses the control flow and data flow characteristic of static and dynamic analyses to assist in variation and selecting seed files, and prompts the program to execute on a deeper level by giving the specific weight to each basic block. With regard to the industrial Internet protocol, most research achievements were realized by improving and extending existing network protocol fuzz test methods or tools. For instance, Roland et al. proposed ProFuzz [23] specific for the Profinet protocol stack. ProFuzz is mainly based on the fuzz test tools developed by Scapy fuzzer [24], and also contains Sullley's fuzzer module, supporting five types including cyclic real-time execution, device discovery, configuration acquisition, application request, and accurate time control protocol. With regard to Wurldtech Company's BlackPeer test framework [25], its key to constructing anomalous data is, in the premise of given initial sample data, defining protocol messages with the extended Backus-Naur form and generating corresponding test data grammar and then realizing the variation of protocol data units using interactive semantIIS. On the basis, Yafeng Zhang et al. [9] improved the BNF grammar. They parsed the industrial protocol samples into the variation tree by introducing a tree-shape structure and traversed all nodes in the tree for the purpose of variation. McCorkle et al. [26] made a fuzz test to the upper computer software of IIS, constructing anomalous data by making the variation operation to response messages and keeping the communication state of protocol by forging check codes and count values. SecuriTeam added the DNP3 protocol into the test tool beSTORM [27] and found the vulnerability of denial of service attack specific for DNP3 during the fuzz test to Wireshark. The Mu suite launched by Mu Dynamic Company [28] is applicable to protocols of IEC 61850, Modbus/TCP and DNP3, which constructs abnormal message data with the structured grammar analysis method and can also extend the industrial Internet protocols with unknown specifications using the additional function Studio Fuzz. Lzfuzz [29] is used for the specific DCADA protocols with unknown grammar, of which the basic idea is deducing message tokens using the Lempel-Ziv compression algorithm and, combined with token information, making the variation processing to samples, but the method can hardly be used for the protocols with complicated input grammar.
From the research above, we summarize the problems of fuzz test methods applying to industrial Internet protocols currently as follows: (a) The low code coverage: Many bugs can be triggered only in the case of a large path coverage rate. Although some methods have used the dynamic multi-modal sensor communication data processed in the program, it is not reflected in the generation of test cases directly, causing that most cases are executed repeatedly on the same paths as input data and only cover a few paths. (b) No unified description models. Protocols, even of the same type, have different forms of messages, but current methods require building different data models, thus increasing the construction workload of model. (c) Insufficient pertinence of test cases. It is hard for test cases to pass the program verification, thus causing too many invalid tests. The design of variation strategy without taking the characteristic of industrial Internet protocols into full consideration may cause the redundancy of case in the case of a lot of sample data and then affect test efficiency.

Dynamic taint analysis
The dynamic taint analysis proposed by Newsome [30] is a method tracking information flow during the running of program, i.e., tracking the transmission of data slots to be analyzed in the system and gaining target program's detailed processing process for the data. In the method proposed, we use the technology to get the dynamic multimodal sensor communication data in the program to provide the basis for the further fuzz test.
The dynamic taint analysis method involves two parts, namely taint data identification and taint transmission path monitoring [31]. The core of taint data identification is defining taint data. If the data source is suspect, the data generated by it are taint data, which, as the input, shall be tracked and analyzed in the running process of binary code, and then the message data constructed in the fuzz test can be considered as suspect.
During the running of program, the transmission of taint data is completed through instructions and taint data may participate in the operation as the source operands in the arithmetical operation instruction or the parameters of data movement instruction, of which the operation output is generally related to taint tracking data and thus should be identified as the taint attribute. Figure 2 shows a simple example of dynamic analysis process. In the figure, a 1 and a 2 are taint data; the arrow means the operation execution; the leading end represents the source operand; and the tail end represents the operation output. During the running of program, variables V 1~V8 are affected by taint data a 1 and a 2 . If A, as the set of taint source of each variable, is empty, we can consider that the variable is not tainted. From the figure, we can see that the set of taint source of V 8 which is the last output is the union set of taint source sets of operation operands V 3 and V 6 .
Using the dynamic taint method, we mainly aim to analyze the data flow and the control flow, also known as the explicit flow and the implicit flow, during the execution of binary codes. The former refers to data dependence relationship, i.e., variable V 1 's taint information is sent to V 2 directly through the assignment or arithmetic operation, as shown in Fig. 2; the latter corresponds to the control and dependence relationship of conditional branches, i.e., variable V 1 's taint information is sent to V 2 indirectly through the associated condition expression. For the purpose of understanding, we make a simple analysis using the example of sample code shown in Fig. 3.
In the example above, the function of code implementation experiences the input of variable x, the generation of message msg and the submission through post. In the process, x is identified as the taint datum, and according to the analysis method of data flow, msg is assigned as constant "a" or "b" directly. The constant would not be tainted, so msg is identified as the untainted attribute. However, the value of msg also depends on whether conditional branches x==a and x==b are true or false, i.e., msg and x have a control and dependence relationship, which belongs to the implicit taint of flow. In practical applications, such codes have a security risk when used for the communication realization of network protocols.
When msg is submitted, the attacker may intercept msg and then deduce the value of input x, so msg is definitely a taint datum and thus should be tacked and monitored. There will be false negatives without considering the control and dependence relationship of msg. On the contrary, url's value is independent of branches L3 and L7, so there is no harm even if it is identified as the untainted datum.

Method and principle
According to the industrial Internet protocol realization program, the section gets related dynamic multi-modal sensor communication data through dynamic taint analysis to guide the generation of test cases. Figure 4 shows the specific test process of the method. The method proposed uses many protocol messages as the input to avoid the problem of insufficient test cases caused by single sample datum. We define the input protocol fields affecting conditional branches through the dynamic taint analysis, make the taint identification processing to each protocol input field, and track the tainted data flow in the execution of program. With regard to the dependence and control relationship of program, we capture it with the algorithm proposed in ref. [32]. It should be noted that for industrial Internet protocols, the fields generally have the checksum (like the CRC) which will cause the deluge of taint data if transmitted in control information flow. Because of all fields used as checksums, most conditional branches will be identified as taint data, in which case, it is hard to directly find the specific field affecting conditional branches. Besides, due to the dependence and control relationship, the fuzzy method proposed considers the transmission taint in control flow indirectly. Therefore, when using the dynamic taint analysis method, we only focus on the transmission taint in the data flow rather than tracking the transmission taint in the control flow. Algorithm 1 introduces the fuzzy method proposed in details. The main function dynamicFuzz can execute program P, Protocol G, and protocol message I as the input, to process much protocol input, and the output is the abnormal information triggered. Lines 1~2 of algorithm initialize the data structure to be stored. The test cases generated which will also be used as the new input are put in the queue for storage for convenience of recursion execution. Line 5 combined with two parameters, program, and protocol input, gets the sets of DIF(x i ) and CDC(x i ) of each conditional branch in each program using the dynamic taint analysis method, and then constructs the corresponding dynamic control flow diagram using DIF(x i ) as the node and CDC(x i ) as the side. Line 6 makes the fuzzy operation from the start point of executable path in the control flow diagram. Line 8 deduces Constraint(x i ), the corresponding constraint condition of node x i , as the parameter of test case generation algorithm.
Line 16 of algorithm means that after the generation of test case, we need to use the test case as the new input to replace the value of protocol field in set DIF(x i ) of original input and modify all fields related to Constraint(x i ) to meet the constraint condition. To ensure the test case will be executed on a deeper level in the program and improve the probability of finding new execution paths, the rest of protocol fields which are unrelated to DIF(x i ) and Constraint(x i ) should also be kept valid according to protocol grammar. For instance, in the protocol input, the values of two fields start address and end address shall change from original 1 and 2 into 3 and 6. Even if there is no constraint condition, the corresponding fields of address data shall be regenerated based on the original normal size (increasing from 1 to 3), but if the two address change into the invalid test cases of 6 and 3 after the fuzzification (start address < end address), the fields unrelated to DIF(x i ) and Constraint(x i ) still keep the normal values.
Lines 17~20 monitors whether there is any abnormal state such as the crash or memory leak in the execution of test case. Although the test case is used as the new input, in consideration of the expenses caused by program execution, the method proposed does not get the dynamic multi-modal sensor communication data of each conditional branch repeatedly. In the execution of test case, lines 21~24 store the code paths traversed, put the conditional branches unparsed into the input queue and then decide the priority ranking according to the quantity of new branches. Line 25 of algorithm stores the dynamic multi-modal sensor communication data sequence of each node for lines 9~13 to judge whether current node generates the test case. When the list has a dynamic multi-modal sensor communication data sequence and the test case generated which correspond to current node x i 's DIF(x i ) and Constraint(x i ), respectively, then the algorithm will skip the test case generation of the node directly. Line 29 of algorithm stores the dynamic control flow diagrams as the paths probed in the queue after completing the code paths of program.
To reflect the dynamic multi-modal sensor communication data extracted from the program into the construction of test case directly, the method proposed focuses on the generation technology assisted with the variation and guides the generation of test case grammar specific for node x i by combining with corresponding dynamic multimodal sensor communication data. Algorithm 2 describes the specific process. The test case generation function makeTestCases considers protocol field fields, i.e., the element in set DIF(x i ) in Definition 1, rule constraint c and protocol G as the input, and the output is the set of test cases tcList. The key to the function is generating test cases by acquiring the efficient grammar of each node and the reversed grammar. Line 2 has only one valid grammar for protocol fields, i.e., deducing the grammar of each protocol field in the constraint condition according to parameters as the valid grammar of the node. The feasibility is the valid grammar applied to node x i definitely is the subset of protocol grammar G, because according to Definition 3.2, i.e., in the constraint condition of Constraint(x i ), the valid grammar can apply to all the protocol fields in DIF(x i ). Line 3 means when the protocol field has much valid grammar, constructing much grammar through the combinations of the valid grammar and storing the grammar in the set of test cases. For instance, for node xi, there is DIF(x i ) = {F a ,F b }, and the valid grammar deduced in Constraint(x i ) is F a = (0|1), F b = 2, and then the grammar combined is (F a = 0, F b = 2) and (F a = 1, F b = 2). Next, lines 4~9 of algorithm reverses the valid grammar corresponding to each filed of valid grammar in the premise of meeting constraint condition Constraint(x i ) and then combine them with other fields' valid grammar for fuzzified grammar and add the grammar to the set of test cases.
When there is no applicable test grammar found in DIF(x i ), the case data shall be generated through variation with methods like random bit flip, extreme substitution, and boundary value substitution.

Methods and experimental
We use the Modbus protocol as an example. Figure 5 shows the simplified format of Modbus protocol. The fields include the protocol identity, the length of information, the field of function code, the start address, the end address, the data field determined according to function code, and the CRC (Cyclic Redundancy Check). The figures in brackets represent the sizes of fields. The values of CRC calculation and other checksum algorithms as the mechanisms completing protocol communication are generally imbedded into protocol specifications to identify potential broken data. Although the verification mechanism of Modbus/TCP is realized in the transmission frame of TCP and the protocol formats do not include the CRC, the CRC verification generally exists in Modbus RTU and other industrial Internet protocols. If the CRC received by the program fails to match the CRC obtained through calculation, the data of corresponding case may be ignored directly. It is a very important mechanism to ensure security and functionality, but if CRC's value is not updated when protocol fields change, the fuzz test shall be blocked, so the CRC is also considered as an important field. Figure 6 describes some program codes processing protocol realization. Line 3 conditional branch of program checks whether the protocol identity meets specifications and verifies the validity of input message data. The protocol messages failing to meet specifications shall be abandoned directly without any processing. Line 8 is the CRC. Similarly, the messages failing to pass the check shall also be abandoned directly. If the conditional branches above are passed, the next operation is reading the function code. The example program only offers two typical function codes of Modbus protocol: reading and writing, i.e., function code 0x05 supported represents the writing operation, and 0x01 represents the reading operation. The example code has two bugs, and these anomalies shall be triggered only in the case of meeting specific conditional branches. Line 22 describes the typical heap overflow bug which means the program fails to check whether the size of data from the start address to the end address is consistent with the size of actual message and thus causes the problem of heap overflow when the length of data exceeds the size of actual data. Lines 24~26 are the bugs embedded deliberately, which shall be triggered only if a specific value (KEY) is written into a specific address (TRIG_POINT). We can see clearly that for the test case input, the bug may be triggered only through the protocol identity and CRC. It is difficult for the fuzz test method based on random variation which does not know the grammar actually executed by the target protocol in the program and thus cannot reach the conditional branches may trigger the anomaly and also causes a lot of invalid test data. According to the algorithm above, we analyze the protocol input format in Fig. 5 and the protocol program example in Fig. 6. For the convenience of description, we call the protocol fields as I, L, F, S, E, D, and C for short. Figure 7 shows the control flow diagram and execution paths of the first input. It is a dynamic control flow diagram constructed through a valid input executing the writing function, of which the right half part shows the paths actually covered by the input during the running of example program and the dotted part shows the new execution paths discovered in each node. According to Definition 3, in the figure, the start node 3 1 represents the first execution of the conditional branch corresponding to the 3rd line of example code; the elements in braces represent DIF(3 1 ), the field affecting the 3rd line conditional branch; the side with arrow represents the dependence and control  Fig. 8 is the test case generated by choosing the combinations of function code 0x05 and n data fields in the case node 24 1 is true in the first input execution, and is considered as the second input. Suppose the start node 3 1 does not depend on any node, considers protocol identity field I as the dynamic interactive field, and has the only true value of 0x0000, and then get the corresponding two test grammar after fuzzification with the algorithm. Node 3 1 actually is a conditional branch to identify whether the field is true or false according to the protocol, so according to each test grammar, it is easy to deduce that the constraint condition making node 3 1 true is I ≠ 0x0000. Briefly speaking, the condition making the conditional branch of line 3 executable is I ≠ 0x0000, while the constraint making the conditional branch false is I = 0x0000. In this case, for the next node 8 1 , It should be noted that according to the algorithm description, the test grammar generated in node 13 1 has decided whether node 14 1 is true or false, so there is no corresponding test grammar generated in node 141. Similarly, skip nodes 21 1 , 21 2 , 22 2 , 24 2 , and 21 3 . According to Algorithm 1, when the test grammar is generated, the dynamic flow diagram is stored into the queue as the probed path. In this case, if there is the probed path, when searching paths in the control flow diagram, we can get the partition node generating different paths by comparing the paths and consider the node as a start node. All nodes before the node have the same dynamic multi-modal sensor communication data and thus can be extracted directly to avoid the repeated generation of test cases. Table 1 gives an example of the test case generation grammar of the first input obtained with Algorithm 2. We make an experiment using the most popular Modbus/TCP protocol as an example. Figure 9 shows the test environment created. Table 2 gives the equipment involved in the Fig. 10. The experiment is made with Ubuntu 14.04, so we choose the open source library libmodbus to realize the Modbus/ TCP protocol communication in Linux system. The fuzzy method proposed is realized with Symfuzz [33], a tool developed based on BAP [34] open-source binary system analysis framework, which can convert executable files into the intermediary language applicable to program analysis, and combined with PIN to execute dynamic binary system instrumentation to the target program to get the dynamic interactive fields and dependence and control relationship required by the method to guide the generation of subsequent test case. We choose AFL-fuzzer as the fuzzy tool in the comparison experiment.
The Modbus slave station (libmodbus server) waits for the request messages from other main stations. In the experiment, we set that the main station messages send 5, 10, and 15 request messages, as test input, to the slave station, respectively, and makes the fuzzy processing to Modbus slave station program and monitors whether the target program has any anomaly.
To evaluate the performance of the fuzz test method proposed, we make a comparison in terms of the quantity of test cases, the total number of execution paths, the code coverage rate, and the test time with the same number of samples.
The experiment makes the statistic on the number of test cases generated using two fuzz test methods with 5, 10, and 15 sample data. The number of test cases refers to the total number of samples generated after the crash of program or the complete execution of samples. Figure 10 shows that with the increase in the number of samples, the test data generated with the method proposed are significantly less than the test Table 1 Test case generation grammar rules Grammar after fuzzification Test grammar 3 1 I ø ø I = 0x0000 I ≠ 0x0000 I = 0x0000 I ≠ 0x0000 data generated with AFL-fuzzer, because the method proposed constructs test cases using the generation-based technology in which case the test cases generated are definitely less than those of AFL-fuzzer using the variation strategy. Besides, we can see the number of test cases generated by AFL-fuzzer grows steadily as the number of samples increases, because Afl-fuzzer prunes input samples in consideration of that users may offer low-quality initial samples and thus cause the possible data redundancy in some types of variations. Although depending more on the number of samples, the method proposed adopts a more pertinent generation strategy [35][36][37][38][39][40][41][42][43][44][45][46][47][48][49]. Figure 11 shows the coverage rate of code. The calculation principle is that the instrumentation can help capture the coverage rate of branch and detect the rough hit count of branch execution. The code coverage rate does not necessarily have any connection with the probability of finding anomalies, but undoubtedly, for a test case, an execution path failing to reach the program's conditional branch on a deep level certainly will not trigger any potential anomaly. So, we can see that the method proposed realizes a higher rate of code coverage compared with the AFL-fuzzer. Table 3 shows the statistic when the target program crashes. The process of guiding test case generation by combining with program dynamic multi-modal sensor communication data pays the price of test time for solving conditional branch constraint and generating test grammar, compared with the variation strategy. However, it significantly increases the number of execution paths and the code coverage rate, indicating in the running of program, the more execution paths are found by the conditional branches traversed, the bigger the probability that test cases reach the deep level and trigger anomalies is, and the stronger the pertinence is. Therefore, in the case the program has anomalies, although scarifying some test time, the method proposed is superior to the AFL-fuzzer in terms of realizing Modbus-TCP protocol test.

Conclusion
The work proposes a fuzzy processing method combined with dynamic multi-modal sensor communication data from the level of system program in industrial Internet  protocol realization. The paper first expounds the theory of dynamic taint analysis and then gives the definition of dynamic multi-modal sensor communication data required and proposes the fuzz test method combined with dynamic multi-modal sensor communication data. The method proposed tracks program execution, finds the input fields affecting conditional branches through the dynamic taint analysis, and captures the dependence relationship of conditional branches to guide test case grammar generation pertinently, thus increasing the opportunity of executing codes on the deep level. The results of comparison experiment prove that the method improves the validity of test cases and the coverage rate of codes to some extent and also increases the probability of finding the anomalies in protocol realization. However, follow-up research needs to deepen around the following areas: Fig. 10 The numbers of test cases with different numbers of samples