Data processing scheme based on blockchain

In the white paper on Bitcoin, Satoshi Nakamoto proposed a chain of blocks. Since then, blockchain technology has developed rapidly. Blockchain is no longer limited to the field of cryptocurrency; it has been extensively applied to the Internet of Things, supply chain finance, electronic evidence storage, data sharing and e-government. Both public chains and alliance chains have developed considerably. Blockchain has particularly good application potential in the data processing field. The Square Kilometre Array (SKA) is a joint venture of more than ten countries to build the world's largest synthetic aperture radio telescope. The SKA processes data at a very large scale across several data processing nodes, and the data will be processed in the cloud computing mode. Taking the SKA into consideration, this report proposes a blockchain-based data processing scheme for the anti-counterfeiting, anti-tampering and traceability of data, thereby assuring the authenticity and integrity of the data. The primary aspects are data distribution, data operation and data sharing, which correspond to data reception, algorithmic processing and the sharing of processing results in the SKA. With this process, the integrity, reliability and authenticity of the data are guaranteed. Additionally, smart contracts, homomorphic hashing, secure containers, aggregate signatures and one-way encrypted channels are employed to ensure the intelligence, security and high performance of the process.

Practical applications also demand high performance and privacy protection, which Ethereum cannot currently provide. In 2015, the Hyperledger project was launched, in which the IBM-backed Fabric framework became the most widely recognized. Fabric is aimed at the alliance blockchain and essentially meets the needs of practical applications in terms of performance, privacy protection and usability.
With the development of the public and alliance chains, blockchain applications have expanded rapidly. Blockchain has been extensively applied to the Internet of Things [5], supply chain finance, digital evidence storage, data processing and e-government [6]. In the field of data processing, blockchain guarantees the authenticity, security and reliability of data [7]. Various studies have introduced the use of blockchain for medical data sharing [8], personal data protection [9] and data distribution [10].
Astronomical data have certain characteristics, such as large volume, real-time requirements [11], complicated calculation processes [12], heterogeneous calculation nodes [13], diverse storage models, various data access patterns [14] and high expansibility. High-performance computing, distributed computing, parallel computing, uniform resource management, container technology and telescope observation control system technology are all needed [15]. Existing technologies, such as Apache Hadoop, OpenMP and MPI, all face various problems in processing astronomical data [16]. The SKA data process necessarily relies on cloud computing [17], and distributed data processing requires careful attention to data protection [18]. Therefore, the security of distributed data storage [19] and the integrity of the data [20] are particularly important. During data processing, there are also extremely high requirements for time synchronization [21] and for the optimization of data-merging algorithms [22]. Blockchain can play a positive role in ensuring the integrity, security and availability of the data.
In the remainder of this report, Sect. 2 introduces the data distribution scheme based on blockchain, which reflects the generation and collection of data in the SKA. Section 3 introduces the method of data operation, which reflects the combination of the collected data with the related algorithms. Section 3.3 introduces the process of sharing data, which addresses how the results obtained after the original data have been processed by the related algorithms are shared. Section 4 summarizes the conclusions of this report.

Preliminaries
In this section, we first define certain notations used in this report. If $S$ is a set, then $|S|$ denotes the number of elements in the set. If $b$ is a value, then $a \leftarrow b$ indicates that $a$ is assigned the value $b$. If $C$ is a node and $c$ is an element, then $C \Leftarrow c$ denotes sending $c$ to $C$. If $a$ and $b$ are two values, then $a \| b$ denotes the concatenation of $a$ and $b$.

Bilinear mapping
Let $G_1$ and $G_2$ be two multiplicative cyclic groups of prime order $p$, where $g_1$ is a generator of $G_1$ and $g_2$ is a generator of $G_2$, and let $\psi$ be a computable isomorphism from $G_2$ to $G_1$ with $\psi(g_2) = g_1$. Let $G_T$ be a multiplicative group of the same order, and let $e : G_1 \times G_2 \rightarrow G_T$ be a map with the following properties:

• Bilinearity: for all $u \in G_1$, $v \in G_2$ and $a, b \in \mathbb{Z}$, $e(u^a, v^b) = e(u, v)^{ab}$.
• Non-degeneracy: there exist $u \in G_1$, $v \in G_2$ such that $e(u, v) \neq O$, where $O$ denotes the identity of $G_T$.
• Computability: there is an efficient algorithm to compute $e(u, v)$ for any $u \in G_1$, $v \in G_2$.

Then, $e$ is considered a bilinear mapping.
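As a standard illustration (not taken from this report), bilinearity is exactly what enables pairing-based signature verification of the kind used later: a signer with secret key $x$ and public key $pk = g_2^x$ signs a message $m$ as $\sigma = H(m)^x$, and a verifier checks the signature using only public values:

```latex
% Standard BLS-style check, shown only to illustrate bilinearity;
% it is not the specific scheme of this report.
e(\sigma, g_2) = e\big(H(m)^x, g_2\big) = e\big(H(m), g_2\big)^{x}
              = e\big(H(m), g_2^{x}\big) = e\big(H(m), pk\big)
```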

Aggregate signature
An aggregate signature is a variant signature scheme used to aggregate any number of signatures into one signature. For example, suppose there are $n$ users $\{u_1, u_2, \ldots, u_n\}$ in the system, with $n$ public keys $\{pk_1, pk_2, \ldots, pk_n\}$, $n$ messages $\{m_1, m_2, \ldots, m_n\}$ and $n$ signatures $\{\sigma_1, \sigma_2, \ldots, \sigma_n\}$ on these messages. The generator of the aggregate signature (which can be arbitrary and does not need to be in $\{u_1, u_2, \ldots, u_n\}$) can aggregate $\{\sigma_1, \sigma_2, \ldots, \sigma_n\}$ into one short signature $\sigma$. Importantly, the aggregate signature is verifiable: given the set of public keys $\{pk_1, pk_2, \ldots, pk_n\}$ and the original message set $\{m_1, m_2, \ldots, m_n\}$, it can be verified that each user $u_i$ created a signature on message $m_i$. The execution of the aggregate signature is described in detail below. $AS = (\mathrm{Gen}, \mathrm{Sign}, \mathrm{Verify}, \mathrm{AggS}, \mathrm{AggV})$ is a quintuple of polynomial-time algorithms, where $DS = (\mathrm{Gen}, \mathrm{Sign}, \mathrm{Verify})$ is a common signature scheme, also known as the benchmark scheme for the aggregate signature.
Furthermore, the aggregate signature can support incremental aggregation: if $\sigma_1$ and $\sigma_2$ can be aggregated into $\sigma_{12}$, then $\sigma_{12}$ and $\sigma_3$ can be aggregated into $\sigma_{123}$.
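The report does not prescribe a concrete instantiation; as a minimal sketch, BLS aggregate signatures (here via the py_ecc library, which is our assumption) realize the $AS = (\mathrm{Gen}, \mathrm{Sign}, \mathrm{Verify}, \mathrm{AggS}, \mathrm{AggV})$ interface above:

```python
# Minimal aggregate-signature sketch using BLS from py_ecc (our choice
# of instantiation; the report does not name a concrete scheme).
from py_ecc.bls import G2Basic as bls

# Gen: each of n users derives a key pair.
secret_keys = [11, 22, 33]   # toy secrets; use random 32-byte keys in practice
public_keys = [bls.SkToPk(sk) for sk in secret_keys]

# Sign: user u_i signs its own message m_i (G2Basic requires distinct messages).
messages = [b"m1", b"m2", b"m3"]
signatures = [bls.Sign(sk, m) for sk, m in zip(secret_keys, messages)]

# AggS: anyone can compress the n signatures into one short signature.
sigma = bls.Aggregate(signatures)

# AggV: one check convinces the verifier that every u_i signed its m_i.
assert bls.AggregateVerify(public_keys, messages, sigma)

# Incremental aggregation: sigma_12 and sigma_3 aggregate to sigma_123.
sigma_12 = bls.Aggregate(signatures[:2])
sigma_123 = bls.Aggregate([sigma_12, signatures[2]])
assert bls.AggregateVerify(public_keys, messages, sigma_123)
```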

Homomorphic hash
Homomorphism is a mapping between two algebraic structures in abstract algebra that preserves their structure. Given two groups $G_1$ and $G_2$, a mapping $f$ from $G_1$ to $G_2$ is a homomorphism if $f(ab) = f(a)f(b)$ for all $a, b \in G_1$. The homomorphic hash has long been used in peer-to-peer networks [23], where error correction and network codes are used together to defend against attack events. In a peer-to-peer network, each peer obtains original data blocks directly from other peers; thus, hash functions such as SHA1 can directly verify the correctness of a received data block by comparing its hash value with the original hash value.
Using the homomorphic hash function introduced in earlier studies [24], i.e., $h_G(\cdot)$, a set of hash parameters $G = (p, q, g)$ can be obtained. The parameter description is shown in Table 1. Each element of $g$ can be represented as $x^{(p-1)/q} \bmod p$, where $x \in \mathbb{Z}_p$ and $x \neq 1$.
Here, $\mathrm{rand}(\cdot)$ is a pseudo-random function that serves as a pseudo-random number generator: it initializes the parameters of the homomorphic hash function during parameter generation, generates random numbers during tag generation, and determines the random data blocks used in the challenge process, thus creating challenges that can cover the entire data range.
For a block $b_i = (b_{i,1}, b_{i,2}, \ldots, b_{i,m})$, the hash value can be calculated as follows:

$$h_G(b_i) = \prod_{k=1}^{m} g_k^{\,b_{i,k}} \bmod p \qquad (3)$$

Given a coded block $e_j = \sum_{i=1}^{n} c_{j,i} b_i$ with coefficient vector $(c_{j,1}, c_{j,2}, \ldots, c_{j,n})$, the homomorphic hash function $h_G(\cdot)$ satisfies the following equation:

$$h_G(e_j) = \prod_{i=1}^{n} h_G(b_i)^{\,c_{j,i}} \bmod p \qquad (4)$$

This feature can be used to verify the integrity of a coded block. First, the publisher calculates the homomorphic hash values of each data block in advance, and the downloader downloads these homomorphic hash values. Once a coded block is received, its hash value can be calculated using Eq. (3), and Eq. (4) can then be used to verify its correctness [25].
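A toy sketch of Eqs. (3) and (4) with deliberately tiny parameters (Table 1 assumes a 1024-bit $p$; all function names here are ours, not the paper's):

```python
# Toy demonstration of the homomorphic hash h_G(b) = prod_k g_k^{b_k} mod p.
import random

def gen_params(p, q, m):
    """Pick m generators of the order-q subgroup of Z_p^*."""
    g = []
    while len(g) < m:
        x = random.randrange(2, p)
        gk = pow(x, (p - 1) // q, p)   # elements of the form x^((p-1)/q) mod p
        if gk != 1:
            g.append(gk)
    return g

def h_G(g, block, p):
    """Homomorphic hash of a block given as a vector of sub-elements mod q."""
    h = 1
    for gk, bk in zip(g, block):
        h = (h * pow(gk, bk, p)) % p
    return h

# p = 2q + 1 with q prime, so an order-q subgroup of Z_p^* exists.
p, q, m, n = 227, 113, 4, 3
g = gen_params(p, q, m)

blocks = [[random.randrange(q) for _ in range(m)] for _ in range(n)]
hashes = [h_G(g, b, p) for b in blocks]

# Coded block e = sum_i c_i * b_i (mod q), as in network coding.
c = [random.randrange(q) for _ in range(n)]
e = [sum(c[i] * blocks[i][k] for i in range(n)) % q for k in range(m)]

# Homomorphic property (Eq. 4): h_G(e) == prod_i h_G(b_i)^{c_i} (mod p).
lhs = h_G(g, e, p)
rhs = 1
for ci, hi in zip(c, hashes):
    rhs = (rhs * pow(hi, ci, p)) % p
assert lhs == rhs
```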

Blockchain-based data distribution scheme
Here, we simplify the process of receiving astronomical data in the SKA. The SOURCE represents the original astronomical data, and the Data Receiving Station (DRS) represents the real astronomical data receiving device. The DRS deployment is distributed: different DRSs are responsible for receiving data within their own respective areas.
Considering the limitations of the hardware, the DRS is only responsible for data reception, temporary storage and data forwarding; it does not participate in data calculation. All data calculation is completed by the Data Processing Node (DPN), which is connected to the blockchain. The concrete architecture is shown in Fig. 1.
The method of processing data from the SOURCE to the DRS is relatively simple: it involves converting the data format and setting the storage mode, which is not the focus of this study. Here, the execution process from the DRS to the DPN is introduced.
Furthermore, we use the idea of distributed storage from the InterPlanetary File System (IPFS), as shown in Fig. 2. Each block contains a list of transaction objects, a link to the previous block, and a hash value of the state tree/database.
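A minimal sketch of this block layout (field names are illustrative, not from the report):

```python
# Illustrative block layout matching the description above.
import hashlib, json
from dataclasses import dataclass

@dataclass
class Block:
    transactions: list   # list of transaction objects
    prev_hash: str       # link to the previous block
    state_root: str      # hash value of the state tree/database

    def hash(self) -> str:
        payload = json.dumps(
            {"tx": self.transactions, "prev": self.prev_hash, "state": self.state_root},
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode()).hexdigest()

genesis = Block(transactions=[], prev_hash="0" * 64, state_root="")
nxt = Block(transactions=["H(D_1)||time_1"], prev_hash=genesis.hash(), state_root="")
```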
Additionally, we introduce the method used to import data into the blockchain. Let $q$ be a large prime number. Let $G_1$ be an additive group and $G_2$ a multiplicative group, both of order $q$, and select $P \in G_1$ and $Q \in G_2$; a bilinear mapping $e : G_1 \times G_1 \rightarrow G_2$ can then be defined. The number of data receiving stations is $m$, the number of data processing nodes responsible for the $i$th data receiving station is $m_i$, and the current view is $v$. After initializing these parameters, the system parameters $Params$ can be obtained. The user $u_i$ selects a random value $x_i \in \mathbb{Z}_q^*$ as its secret value and calculates the corresponding public key. It can be assumed that the public keys of the Data Processing Nodes $DPN_i^j$ $(j = 1, 2, \ldots, m_i)$ of the $i$th Data Receiving Station ($DRS_i$) in the $r$th round are $pk_i^1, pk_i^2, \ldots, pk_i^{m_i}$. The data produced by the SOURCE is $D_i^r$. Each DRS reaches consensus on the resulting data using a static aggregate Practical Byzantine Fault Tolerance (PBFT) protocol [26, 27]. The specific process is shown in Algorithm 1. To verify the validity of the aggregate signature $\sigma$, Algorithm 1 can be implemented as follows: using the system parameters $Params$, the users' identity list $ID = \{ID_1, \ldots, ID_n\}$, the public key list $P = \{P_1, \ldots, P_n\}$, the message list $M = \{m_1, \ldots, m_n\}$ and the signature list $\sigma = \{\sigma_1, \ldots, \sigma_n\}$, compute $Q_i = H_1(ID_i \| P_i)$ and $T = H_2(P_v)$, and then check the verification equation. If the equation holds true, the validation passes; otherwise, the validation fails. The correctness of this basic framework is given below: Theorems 1 and 2 provide the correctness of the verification process of a single signature and of an aggregate signature, respectively.
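As an illustration of the consensus step, the $m_i$ DPNs of one DRS can all sign the same data $D_i^r$ and have the aggregate checked in a single verification; the sketch below instantiates this with py_ecc BLS, which is our assumption rather than the report's concrete scheme:

```python
# Sketch: the DPNs of one DRS all sign the same data D_i^r; the DRS
# aggregates the signatures and any node verifies them in one check.
from py_ecc.bls import G2ProofOfPossession as bls

dpn_sks = [101, 102, 103]                    # toy secrets for m_i = 3 DPNs
dpn_pks = [bls.SkToPk(sk) for sk in dpn_sks]

D_r_i = b"data produced by the SOURCE in round r"
sigma = bls.Aggregate([bls.Sign(sk, D_r_i) for sk in dpn_sks])

# FastAggregateVerify covers the same-message case used here.
assert bls.FastAggregateVerify(dpn_pks, D_r_i, sigma)
```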

Theorem 1
The verification process of a single signature is correct.
Proof: The verification process of the signature $\sigma_i = (V_i, R_i)$ that $DRS_i$ performs for $D_i^r$ can be given as follows:

Theorem 2

The verification process of an aggregate signature is correct.

Blockchain-based data operation scheme
The Science Data Processor (SDP) [28] is the data processing module of the SKA. The main data are taken from the Central Signal Processor (CSP) module [29], the metadata are taken from the Telescope Manager (TM) module, and the Signal and Data Transport (SaDT) module is responsible for data transmission. Multiple regional data processing centres will be built. The primary functions of the SDP are as follows:

• Extract data from the CSP and TM modules
• Turn source data into data products that can be used for scientific research
• Archive and store data products
• Provide access to data products
• Return control and feedback information to the TM module to adjust observations in a timely manner

In the SKA SDP, the two most important computational tasks are the FFT [30] and gridding [31]. These two algorithms account for a large part of the total computation, and their efficient implementation provides considerable assistance in the design of the SKA SDP.
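A toy numpy sketch of these two kernels, nearest-neighbour gridding followed by an FFT (real SDP pipelines use convolutional gridding and far larger grids; all values here are simulated):

```python
# Toy sketch of the two dominant SDP kernels: gridding visibilities onto a
# regular uv-grid, then an FFT to form a dirty image.
import numpy as np

N = 256                                   # grid size (pixels)
rng = np.random.default_rng(0)

# Simulated visibilities: (u, v) coordinates in grid units, complex samples.
u = rng.integers(0, N, 1000)
v = rng.integers(0, N, 1000)
vis = rng.normal(size=1000) + 1j * rng.normal(size=1000)

# Nearest-neighbour gridding: accumulate each sample into its grid cell.
grid = np.zeros((N, N), dtype=complex)
np.add.at(grid, (v, u), vis)

# FFT of the gridded visibilities yields the dirty image.
dirty_image = np.fft.fftshift(np.fft.ifft2(grid)).real
```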
As depicted in Fig. 3, in the blockchain-based data calculation scheme, the Data Supply Node (DSN) and the Algorithm Supply Node (ASN) are separated, and all of the data and algorithms enter the Secure Container [32] through a one-way encrypted channel under the control of the Smart Contract (SM) to perform calculations. The providers of the data and algorithms, along with the times at which they were provided, are recorded on the blockchain through the SM. It can be assumed that there are $w$ Data Supply Nodes and one Algorithm Supply Node. Before entering the Secure Container, all of the data $D_i$ $(i = 1, 2, \ldots, w)$ and the algorithm $A$ are signed with the private key $sk_i$ of $DSN_i$ and the private key $sk_a$ of the ASN, respectively. Furthermore, the data and algorithm are first encrypted with the public key $SC_{pk}$ of the Secure Container and then, after entering the Secure Container, decrypted and verified using the public key $pk_i$ of $DSN_i$, the public key $pk_a$ of the ASN and the private key of the Secure Container. This specific process is shown in Algorithms 2 and 3.

As described in Algorithm 2, each DSN signs its data with its own private key and then encrypts the data with the public key of the Secure Container. The processed data are sent to the Secure Container, and the sub-block $H(D_i) \| time_i$ is calculated. The ASN then signs the algorithm with its own private key and encrypts it with the public key of the Secure Container. The processed algorithm is sent to the Secure Container, and the sub-block $H(A) \| time_a$ is calculated. Finally, the final block $b$ is formed.

As described in Algorithm 3, the Secure Container verifies each $D'_i$ with its private key and the public key of the corresponding DSN, and the sub-block $H(D'_i) \| time_i$ is calculated. Then, $A'$ is verified with the Secure Container's private key and the public key of the ASN, and the sub-block $H(A') \| time_a$ is calculated. Finally, the final block is formed.
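A hedged sketch of the Algorithm 2 / Algorithm 3 flow using PyNaCl (the library choice is ours; the report does not name one): a DSN signs its data with $sk_i$, encrypts it to the Secure Container's public key $SC_{pk}$, and the container decrypts with its private key and verifies with $pk_i$:

```python
# Sign-then-encrypt into the Secure Container, sketched with PyNaCl.
import time, hashlib
from nacl.public import PrivateKey, SealedBox
from nacl.signing import SigningKey

dsn_signing = SigningKey.generate()   # DSN_i's signing key pair (sk_i / pk_i)
sc_key = PrivateKey.generate()        # Secure Container's key pair (SC_sk / SC_pk)

# --- DSN side (Algorithm 2) ---
data = b"data block D_i"
signed = dsn_signing.sign(data)                             # sign with sk_i
ciphertext = SealedBox(sc_key.public_key).encrypt(signed)   # encrypt with SC_pk
sub_block = hashlib.sha256(data).hexdigest() + "||" + str(int(time.time()))

# --- Secure Container side (Algorithm 3) ---
received = SealedBox(sc_key).decrypt(ciphertext)            # decrypt with SC_sk
assert dsn_signing.verify_key.verify(received) == data      # verify with pk_i
```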

Blockchain-based data sharing scheme
The Data Requirement Nodes, which are represented by the public keys $pk_1, pk_2, \ldots, pk_r$ and receive the calculation result, are determined in advance through the smart contract. Under intrusion monitoring, the calculated result $Re$ is shared with the nodes represented by these public keys. The shared results, targets and sharing time are recorded on the blockchain through the smart contracts. The concrete architecture is shown in Fig. 4. As shown in Fig. 4, the data are allocated by the data container to each data consumer. To ensure the security of the data, data allocation adopts a single one-way channel. The data allocation rules are determined by the smart contract of the system.
Before recording on the blockchain, it is necessary to verify the targets: each target verifies the calculated results and, if the verification passes, signs them. If more than 2/3 of the targets' signatures are obtained, the block that is formed will be recorded on the blockchain. The simple architecture is shown in Fig. 5.
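A minimal sketch of this 2/3 rule (all names are illustrative; `verify` stands for any signature-verification primitive, such as the BLS Verify above):

```python
# Commit a block only if strictly more than 2/3 of the targets signed it.
def can_commit(block_hash, public_keys, signatures, verify):
    valid = sum(
        1
        for pk, sig in zip(public_keys, signatures)
        if sig is not None and verify(pk, block_hash, sig)
    )
    return 3 * valid > 2 * len(public_keys)
```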
It is assumed that there are $r$ Data Requirement Nodes (DRNs). The calculated result $Re$ is encrypted with the public key $pk_i$ of $DRN_i$ $(i = 1, 2, \ldots, r)$ and signed with the private key $SC_{sk}$ of the SC to obtain $Re_i$. The concatenation of the hash value of $Re_i$ and the time forms the block $b_i$. The homomorphic hash $h$ is used by $pk_i$. Then, $b_i$ $(i = 1, 2, \ldots, r)$ forms the final block $b$. Finally, the homomorphic hash is verified. If the verification passes, the calculation result $Re_i$ is sent to $DRN_i$, where it is decrypted with the private key $sk_i$ of $DRN_i$ and verified with the public key $SC_{pk}$ of the secure container. This specific process is shown in Algorithm 4.
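A sketch of Algorithm 4's distribution step, again with PyNaCl as our stand-in library: the Secure Container signs the result, encrypts one copy per DRN public key, and records $H(Re_i) \| time_i$ as the sub-block $b_i$:

```python
# Result distribution from the Secure Container to r DRNs (illustrative).
import time, hashlib
from nacl.public import PrivateKey, SealedBox
from nacl.signing import SigningKey

sc_signing = SigningKey.generate()                    # SC's signing key (SC_sk)
drn_keys = [PrivateKey.generate() for _ in range(3)]  # r = 3 DRN key pairs

result = b"calculated result Re"
ciphertexts, sub_blocks = [], []
for key in drn_keys:
    re_i = SealedBox(key.public_key).encrypt(sc_signing.sign(result))
    ciphertexts.append(re_i)
    sub_blocks.append(hashlib.sha256(re_i).hexdigest() + "||" + str(int(time.time())))

# The b_i values are combined into the final block b.
b = hashlib.sha256("".join(sub_blocks).encode()).hexdigest()

# DRN_1 decrypts with its private key sk_1 and verifies with SC's verify key.
received = SealedBox(drn_keys[0]).decrypt(ciphertexts[0])
assert sc_signing.verify_key.verify(received) == result
```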

Conclusion
This study discusses data storage, data operation and data sharing methods for processing large amounts of data. Using the blockchain data structure combined with smart contracts, homomorphic hashes, secure containers, aggregate signatures and one-way encrypted channels, the authenticity, integrity and reliability of the data are ensured throughout the collection, calculation and result sharing of astronomical data. Combined with the SKA project, this scheme can be applied to astronomical data processing. This method provides innovative ideas for the application of blockchain in fields with large data volumes, rapid data generation, highly complex data processing and high-value data processing results.

Fig. 1
Fig. 1 Data distribution based on blockchain. This image depicts the concrete architecture of the process of receiving astronomical data in the SKA. The SOURCE represents the original astronomical data, and the data receiving station (DRS) represents the real astronomical data receiving device. All data calculations are completed by the data processing node (DPN), which is connected to the blockchain

Fig. 2
Fig. 2 Distributed storage in IPFS. This image depicts the distributed storage in an IPFS. Each block contains a list of transaction objects, a link to the previous block, and a hash value of the state tree/database

Fig. 3
Fig. 3 Data Operation Based on Blockchain. In the blockchain data calculation scheme, the Data Supply Node (DSN) and the Algorithm Supply Node (ASN) are separated, and all of the data and algorithms enter the Secure Container through a one-way encrypted channel under the control of the Smart Contract (SM) to perform calculations

Fig. 4
Fig. 4 Data Sharing Based on Blockchain. This image depicts the data sharing architecture based on blockchain

Fig. 5
Fig. 5 Validation of Smart Contract. This image depicts the architecture of a smart contract signature. If more than 2/3 of the targets' signatures are obtained, then the block formed will be recorded on the blockchain

Table 1 Parameter description

Parameter  Description
p          Discrete-logarithm security parameter (a 1024-bit prime)
q          Prime divisor of p − 1; the order of the subgroup used by the hash
g          Set of group elements of the form x^{(p−1)/q} mod p, with x ∈ Z_p and x ≠ 1