In this section, we analyze the security of our proposed scheme. Specifically, we first evaluate the security of the secure keyword-match queries. Then, we discuss how our scheme achieves forward security during update operations.
5.1 Security on encrypted keyword search
The keyword-match index design is built on the framework of the SSE scheme proposed in [5]. Once the data owner uploads the encrypted indexes to the data server, the server learns their sizes. During the query procedure, the access pattern and the query pattern are leaked. Specifically, the access pattern reveals the search results, and the query pattern reveals which query tokens are repeated. Following the security notion of SSE, we first define the leakage function for keyword-match index initialization as follows:
$$\begin{aligned} {\mathcal {L}}_1^{kwd} (\mathbf{K }) = (\{Z_i\}_m, \langle |\alpha |, |\beta | \rangle ) \end{aligned}$$
where \(\mathbf{K }\) is the set of keywords, m is the number of data nodes, \(Z_i\) is the size of node i’s keyword-match index, and \(\langle |\alpha |, |\beta | \rangle\) are the bit lengths of the key and the value in each index entry. After processing a keyword search request, we define the following leakage function:
$$\begin{aligned} {\mathcal {L}}_2^{kwd} (K) = (t_K, \{\langle \alpha , \beta \rangle , id\}_n) \end{aligned}$$
where K is the query keyword, \(t_K\) is the query token, and \(\{\langle \alpha , \beta \rangle , id\}_n\) are the n query results, including the accessed index pairs and the corresponding encrypted document IDs. In addition, we define the leakage \({\mathcal {L}}_3^{kwd}\) to capture repeated requests as follows:
$$\begin{aligned} {\mathcal {L}}_3^{kwd} (\mathbf{Q }) = (M_{q\times q}) \end{aligned}$$
where \(\mathbf{Q }\) is a sequence of q keyword search requests and \(M_{q\times q}\) is a symmetric bit matrix that records repeated requests. Each element of \(M_{q\times q}\) is initialized to 0. For \(i,j\in [1,q]\), the elements \(M_{i,j}\) and \(M_{j,i}\) are set to 1 if the tokens satisfy \(t_i=t_j\). Given the above leakage definitions, we provide the simulation-based security definition of the keyword-match scheme as follows:
Definition 1
Let \(\mathsf {\Pi _{kwd} = (KGen, Build_{kwd}, Query_{kwd})}\) be our secure keyword-match query scheme, and let \({\mathcal {L}}_1^{kwd}\), \({\mathcal {L}}_2^{kwd}\) and \({\mathcal {L}}_3^{kwd}\) be the leakage functions. Given a probabilistic polynomial time (PPT) adversary \({\mathcal {A}}\) and a PPT simulator \({\mathcal {S}}\), define the following probabilistic games \(\mathbf {Real}_{{\mathcal {A}}}(k)\) and \(\mathbf {Ideal}_{{\mathcal {A}}, {\mathcal {S}}}(k)\):
\(\mathbf {Real}_{{\mathcal {A}}}(k)\): The data owner calls \(\mathsf {KGen(1^k)}\) to get a private key K. \({\mathcal {A}}\) selects a dataset \({\mathbf {D}}\) and asks the owner to build \(\{I^{kwd}_1, \cdots , I^{kwd}_m\}\) via \(\mathsf {Build_{kwd}}\). Then, \({\mathcal {A}}\) adaptively conducts a polynomial number of q queries with the tokens and ciphertexts generated by the owner. Finally, \({\mathcal {A}}\) returns a bit as the output.
\(\mathbf {Ideal}_{{\mathcal {A}}, {\mathcal {S}}}(k)\): \({\mathcal {A}}\) selects \({\mathbf {D}}\), and \({\mathcal {S}}\) builds \(\{I'^{kwd}_1,\) \(\cdots\), \(I'^{kwd}_m\}\) for \({\mathcal {A}}\) based on \({\mathcal {L}}^{kwd}_1\). Then, \({\mathcal {A}}\) adaptively performs a polynomial number of q queries. From \({\mathcal {L}}^{kwd}_2\) and \({\mathcal {L}}^{kwd}_3\) in each query, \({\mathcal {S}}\) generates the simulated tokens and ciphertexts, which are processed over \(\{I'^{kwd}_1, \cdots , I'^{kwd}_m\}\). Finally, \({\mathcal {A}}\) returns a bit as the output.
\(\mathsf {\Pi _{kwd}}\) is adaptively secure with \(({\mathcal {L}}^{kwd}_1, {\mathcal {L}}^{kwd}_2, {\mathcal {L}}^{kwd}_3)\) if for all PPT adversaries \({\mathcal {A}}\), there exists a PPT simulator \({\mathcal {S}}\) such that: \(|Pr[\mathbf {Real}_{{\mathcal {A}}}(k)=1] - Pr[\mathbf {Ideal}_{{\mathcal {A}}, {\mathcal {S}}}(k) = 1]| \le negl(k)\), where negl(k) is a negligible function in k.
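To make the repeated-request leakage \({\mathcal {L}}_3^{kwd}\) used in Definition 1 concrete, the following minimal Python sketch builds the bit matrix \(M_{q\times q}\) from a list of query tokens. The function name `query_pattern_matrix` and the representation of tokens as byte strings are assumptions made for illustration; indices are 0-based here, whereas the text uses \([1,q]\).

```python
from typing import List, Sequence


def query_pattern_matrix(tokens: Sequence[bytes]) -> List[List[int]]:
    """Return the symmetric q x q bit matrix M with M[i][j] = 1 iff tokens[i] == tokens[j]."""
    q = len(tokens)
    # Every element starts at 0, as in the definition of L3.
    M = [[0] * q for _ in range(q)]
    for i in range(q):
        for j in range(i, q):
            if tokens[i] == tokens[j]:
                # Set both symmetric positions for a repeated token.
                M[i][j] = 1
                M[j][i] = 1
    return M
```

For example, for three queries in which the third repeats the first, `query_pattern_matrix([b"a", b"b", b"a"])` returns `[[1, 0, 1], [0, 1, 0], [1, 0, 1]]`. The diagonal is 1 because every token equals itself, which matches the simulator setting \(M_{1,1} = 1\) for the first query.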
Theorem 1
\(\mathsf {\Pi _{kwd}}\) is adaptively secure with leakage \(({\mathcal {L}}^{kwd}_1\), \({\mathcal {L}}^{kwd}_2\), \({\mathcal {L}}^{kwd}_3)\) in the random-oracle model if G1, G2, and h are secure PRFs.
Proof
Given \({\mathcal {L}}^{kwd}_1\), the simulator \({\mathcal {S}}\) simulates the encrypted keyword-match indexes \(\{I'^{kwd}_1, \cdots , I'^{kwd}_m\}\) for the m nodes, which have the same sizes \(Z_i\) as the real encrypted indexes. Each simulated entry consists of an \(|\alpha |\)-bit and a \(|\beta |\)-bit random string as a key-value pair, which is indistinguishable from a real encrypted index entry.
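This first step of the simulation can be sketched in Python as follows. The sketch is illustrative only: the function name `simulate_indexes`, the use of a dictionary per node, and the restriction to bit lengths that are multiples of 8 are assumptions, not part of the scheme.

```python
import secrets
from typing import Dict, List


def simulate_indexes(sizes: List[int], alpha_bits: int, beta_bits: int) -> List[Dict[bytes, bytes]]:
    """Build one simulated index per node from L1 = ({Z_i}, <|alpha|, |beta|>)."""
    # Assumption: lengths are whole bytes so random strings have exact size.
    if alpha_bits % 8 or beta_bits % 8:
        raise ValueError("bit lengths must be multiples of 8 in this sketch")
    indexes = []
    for z in sizes:
        # Assumption: the key space is large enough to hold z distinct keys.
        if z > 2 ** alpha_bits:
            raise ValueError("index size exceeds the number of possible keys")
        index: Dict[bytes, bytes] = {}
        # Draw uniformly random key-value pairs until node i holds Z_i entries.
        while len(index) < z:
            key = secrets.token_bytes(alpha_bits // 8)
            index[key] = secrets.token_bytes(beta_bits // 8)
        indexes.append(index)
    return indexes
```

The simulator uses nothing beyond the leaked sizes and lengths, so the simulated indexes depend on the dataset only through \({\mathcal {L}}^{kwd}_1\).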
From \({\mathcal {L}}^{kwd}_2\), \({\mathcal {S}}\) can simulate the first query token and results. On the simulated index, \({\mathcal {S}}\) randomly selects n entries, the same number as the query returns over the real index, and assigns the resulting id values to the simulated entries. The randomly masked key-value pairs can be simulated as \(\alpha _i' = G1'(t', n), \beta ' = G2'(\alpha _{i-1}', id)\), where \(i\in [1, n]\), \(t'\) is a random string serving as the simulated token, and id is identical to the one in the real keyword-match query. In particular, we use random oracles \(\{G1', G2'\}\) in place of the PRFs \(\{G1, G2\}\). From \({\mathcal {L}}^{kwd}_3\), \({\mathcal {S}}\) sets \(M_{1,1} = 1\) in the matrix \(M_{q\times q}\).
For each subsequent jth query (\(j\in [2, q]\)), if the query repeats an earlier one, \({\mathcal {S}}\) reuses the token simulated before and returns the same matching results. Meanwhile, it sets the corresponding elements \(M_{i, j}\) and \(M_{j, i}\) to 1, where i is the index of the earlier identical query. Otherwise, \({\mathcal {S}}\) generates a fresh simulated token and programs the random oracles to obtain the results, as in the first query.
Due to the pseudo-randomness of the secure PRFs, \({\mathcal {A}}\) cannot differentiate the outputs of the simulated experiment from the real one. \(\square\)
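The query phase of the simulator in the proof above can be sketched as follows. This is a simplified illustration under stated assumptions rather than the exact simulator: the class `QuerySimulator`, its fields, and the oracle input `(token, i)` are hypothetical, only the programming of \(G1'\) is modeled, and the leakage is passed in as one row of the repeated-request matrix plus the leaked document IDs.

```python
import secrets
from typing import Dict, List, Tuple


class QuerySimulator:
    """Sketch of the simulator's query phase; all names are hypothetical."""

    def __init__(self, sim_index: Dict[bytes, bytes], token_len: int = 16):
        self.unused: List[bytes] = list(sim_index)    # simulated keys not yet tied to a query
        self.tokens: List[bytes] = []                 # simulated token of each past query
        self.results: List[List[bytes]] = []          # IDs returned for each past query
        self.g1: Dict[Tuple[bytes, int], bytes] = {}  # programmed random oracle G1'
        self.token_len = token_len

    def query(self, pattern_row: List[int], ids: List[bytes]) -> Tuple[bytes, List[bytes]]:
        # pattern_row[i] == 1 iff this query repeats past query i (from L3).
        for i, bit in enumerate(pattern_row):
            if bit:
                # Repeated query: reuse the earlier token and results.
                token, res = self.tokens[i], self.results[i]
                break
        else:
            # Fresh query: draw a random token and program the oracle so that
            # it points at randomly chosen, still unused simulated entries.
            # Assumption: enough unused entries remain for len(ids) results.
            token = secrets.token_bytes(self.token_len)
            for i in range(1, len(ids) + 1):
                pos = secrets.randbelow(len(self.unused))
                self.g1[(token, i)] = self.unused.pop(pos)
            res = list(ids)
        self.tokens.append(token)
        self.results.append(res)
        return token, res
```

A repeated query returns exactly the token and results of its first occurrence, so the adversary's view of the query pattern is consistent with \(M_{q\times q}\).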
5.2 Forward security analysis
As described in Sect. 4, we combine the keyword state information stored in table S on the smart contract with a chained index table stored on the cloud server to achieve forward security. The search trapdoor of a keyword w is generated from the latest state in S associated with w, and this state is updated whenever a new keyword/document pair (w, id) is added to the database. Meanwhile, each newly added entry is encrypted with fresh random masks generated from the latest state information. Consequently, the cloud server does not know which previously searched or updated keywords the current document contains, and it does not learn the updated search trapdoor of keyword w until the next query for w. Based on the construction of the chain-based index, without the updated search trapdoor the cloud server cannot recover the matched document id embedded in a newly added key-value pair, and without the newly updated state information it cannot learn whether the newly added entry is generated from the same keyword as previously added entries.
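The intuition can be illustrated with a minimal Python sketch of a generic state-chained index. It is not the construction of Sect. 4: the class `Client` (whose dictionary stands in for table S), the helper `_prf` built from HMAC-SHA256, the fixed lengths, and the all-zero end-of-chain marker are all assumptions chosen for illustration. Document IDs are assumed to be at most 16 bytes with no trailing zero bytes.

```python
import hashlib
import hmac
import secrets
from typing import Dict, List, Optional

ST_LEN, ID_LEN = 16, 16  # assumed state and document-ID lengths in bytes


def _prf(key: bytes, label: bytes) -> bytes:
    """Hypothetical PRF instantiated with HMAC-SHA256 (32-byte output)."""
    return hmac.new(key, label, hashlib.sha256).digest()


class Client:
    def __init__(self) -> None:
        self.state: Dict[str, bytes] = {}  # stands in for the state table S

    def update(self, w: str, doc_id: bytes, server: Dict[bytes, bytes]) -> None:
        old = self.state.get(w, bytes(ST_LEN))  # all-zero state marks the chain end
        new = secrets.token_bytes(ST_LEN)       # fresh random state for this update
        payload = doc_id.ljust(ID_LEN, b"\0") + old
        mask = _prf(new, b"mask")               # fresh mask derived from the new state
        server[_prf(new, b"addr")] = bytes(a ^ b for a, b in zip(payload, mask))
        self.state[w] = new

    def trapdoor(self, w: str) -> Optional[bytes]:
        return self.state.get(w)                # latest state serves as the trapdoor


def search(server: Dict[bytes, bytes], st: Optional[bytes]) -> List[bytes]:
    """Server side: walk the chain from the given state back to the chain end."""
    ids: List[bytes] = []
    while st is not None and st != bytes(ST_LEN):
        entry = server.get(_prf(st, b"addr"))
        if entry is None:
            break
        plain = bytes(a ^ b for a, b in zip(entry, _prf(st, b"mask")))
        ids.append(plain[:ID_LEN].rstrip(b"\0"))
        st = plain[ID_LEN:]                     # previous state embedded in the entry
    return ids
```

In this sketch, a trapdoor issued before an update only reaches the entries added up to that point, because each entry links backward to the previous state and never forward. The server therefore cannot locate or unmask a newly added entry until the client releases the new state in the next query for w.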