Let the numbers of hidden neurons of the dynamic voltage and current mapping neural networks be *N*_{hv} and *N*_{hc}, respectively. Let the generic symbols *w*_{1, i} (*i* = 1, 2, …, *N*_{hv}) and *w*_{2, i} (*i* = 1, 2, …, *N*_{hc}) denote the internal weights of the voltage and current mapping neural networks, respectively; *w*_{1, i} and *w*_{2, i} are the *i*-th components of the vectors *w*_{1} and *w*_{2}. To train the general Neuro-SM efficiently, gradient information in the form of the sensitivities of the model with respect to *w*_{1, i} and *w*_{2, i} is needed [16].

(1) *DC sensitivity*: let the DC output currents at the gate and drain of the general Neuro-SM model be *I*_{f, DC}. The sensitivities of *I*_{f, DC} with respect to *w*_{1, i} and *w*_{2, i} are described in functional form as

$$ {\displaystyle \begin{array}{l}\frac{\partial {\boldsymbol{I}}_{f,\mathrm{DC}}}{\partial {w}_{1,i}}={\left(\frac{\partial {\boldsymbol{I}}_{f,\mathrm{DC}}^T}{\partial {\boldsymbol{I}}_{c,\mathrm{DC}}}\right)}^T\cdot {\left(\frac{\partial {\boldsymbol{I}}_{c,\mathrm{DC}}^T}{\partial {\boldsymbol{V}}_{c,\mathrm{DC}}}\right)}^T\cdot \frac{\partial {\boldsymbol{V}}_{c,\mathrm{DC}}}{\partial {w}_{1,i}}\\ {}={\left(\frac{\partial {\boldsymbol{h}}_{\mathrm{ANN}}^T\left({\boldsymbol{V}}_{f,\mathrm{DC}},\overset{N_c+1}{\overbrace{{\left.{\boldsymbol{I}}_{c,\mathrm{DC}}\right|}_{{\boldsymbol{V}}_{c,\mathrm{DC}}},{\boldsymbol{I}}_{c,\mathrm{DC}}{\left|{}_{{\boldsymbol{V}}_{c,\mathrm{DC}}},\dots, {\boldsymbol{I}}_{c,\mathrm{DC}}\right|}_{{\boldsymbol{V}}_{c,\mathrm{DC}}}}},{\boldsymbol{w}}_2\right)\kern0.6em }{\partial {\boldsymbol{I}}_{c,\mathrm{DC}}}\right)}^T\kern0.3em \cdot {\boldsymbol{G}}_c\\ {}\kern3.399999em \cdot \frac{\partial {\boldsymbol{f}}_{\mathrm{ANN}}\left(\overset{N_v+1}{\overbrace{{\boldsymbol{V}}_{f,\mathrm{DC}},{\boldsymbol{V}}_{f,\mathrm{DC}},\dots, {\boldsymbol{V}}_{f,\mathrm{DC}}}},{\boldsymbol{w}}_1\right)\kern0.1em }{\partial {w}_{1,i}}\kern11.60001em \end{array}} $$

(10)

$$ \frac{\partial {\boldsymbol{I}}_{f,\mathrm{DC}}}{\partial {w}_{2,i}}=\frac{\partial {\boldsymbol{h}}_{\mathrm{ANN}}\left({\boldsymbol{V}}_{f,\mathrm{DC}},\overset{N_c+1}{\overbrace{{\left.{\boldsymbol{I}}_{c,\mathrm{DC}}\right|}_{{\boldsymbol{V}}_{c,\mathrm{DC}}},{\boldsymbol{I}}_{c,\mathrm{DC}}{\left|{}_{{\boldsymbol{V}}_{c,\mathrm{DC}}},\dots, {\boldsymbol{I}}_{c,\mathrm{DC}}\right|}_{{\boldsymbol{V}}_{c,\mathrm{DC}}}}},{\boldsymbol{w}}_2\right)\kern0.1em }{\partial {w}_{2,i}} $$

(11)

where \( {\boldsymbol{G}}_c={\left(\partial {\boldsymbol{I}}_{c,\mathrm{DC}}^T/\partial {\boldsymbol{V}}_{c,\mathrm{DC}}\right)}^T \) is the DC conductance matrix of the existing coarse model, and the first-order derivatives *∂**f*_{ANN}/*∂w*_{1, i} and *∂**h*_{ANN}/*∂w*_{2, i} can be calculated by neural network backpropagation [17].
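The chain rule in Eqs. (10) and (11) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the static mappings `f_map` and `h_map` (a case with *N*_{v} = *N*_{c} = 0) and the two-port `coarse` model are hypothetical toy functions, not the networks or device model of the paper, and their derivatives are written by hand rather than obtained by backpropagation.

```python
import math

def f_map(v_f, w1):
    # Hypothetical voltage mapping V_c = f(V_f, w1), standing in for f_ANN.
    s = math.tanh(v_f[0] + v_f[1])
    return [v_f[0] + w1[0] * s, v_f[1] + w1[1] * s]

def df_dw1(v_f, i):
    # Derivative of f_map with respect to w1[i] (by hand for this toy mapping).
    s = math.tanh(v_f[0] + v_f[1])
    return [s if p == i else 0.0 for p in range(2)]

def coarse(v_c):
    # Hypothetical coarse model I_c = g(V_c) for gate and drain.
    return [0.01 * v_c[0], 0.1 * v_c[0] ** 2 * math.tanh(v_c[1])]

def g_c(v_c):
    # DC conductance matrix G_c = dI_c/dV_c of the toy coarse model.
    t = math.tanh(v_c[1])
    return [[0.01, 0.0],
            [0.2 * v_c[0] * t, 0.1 * v_c[0] ** 2 * (1.0 - t * t)]]

def h_map(v_f, i_c, w2):
    # Hypothetical current mapping I_f = h(V_f, I_c, w2), standing in for h_ANN.
    return [i_c[q] * (1.0 + w2[q] * v_f[q]) for q in range(2)]

def dh_dic(v_f, w2):
    # Jacobian of h_map with respect to I_c.
    return [[1.0 + w2[0] * v_f[0], 0.0],
            [0.0, 1.0 + w2[1] * v_f[1]]]

def matvec(a, x):
    return [sum(a[r][c] * x[c] for c in range(len(x))) for r in range(len(a))]

def model(v_f, w1, w2):
    # Full DC response: voltage mapping, coarse model, current mapping.
    return h_map(v_f, coarse(f_map(v_f, w1)), w2)

def sens_w1(v_f, w1, w2, i):
    # Eq. (10): (dh/dI_c) . G_c . (df/dw1_i), evaluated at the mapped voltage.
    v_c = f_map(v_f, w1)
    return matvec(dh_dic(v_f, w2), matvec(g_c(v_c), df_dw1(v_f, i)))

def sens_w2(v_f, w1, w2, i):
    # Eq. (11): direct derivative of the current mapping with respect to w2[i].
    i_c = coarse(f_map(v_f, w1))
    return [i_c[q] * v_f[q] if q == i else 0.0 for q in range(2)]
```

A central finite difference of `model` with respect to one weight gives a quick check of `sens_w1` and `sens_w2`; the same kind of check is commonly used to validate backpropagated gradients before training.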

(2) *S parameter sensitivity*: the *S* parameter sensitivity can be obtained by converting the corresponding *Y* parameter sensitivity. The small-signal *Y* parameter sensitivities of the general Neuro-SM model with respect to *w*_{1, i} and *w*_{2, i} are shown in Eqs. (12) and (13), respectively. These two equations can be obtained by differentiating (5) with respect to *w*_{1, i} and *w*_{2, i}, respectively.

$$ {\displaystyle \begin{array}{l}\frac{\partial {\boldsymbol{Y}}_f\left(\omega \right)}{\partial {w}_{1,i}}\\ {}=\sum \limits_{r=1,2}\sum \limits_{m=0}^{N_c}\left(\begin{array}{l}{\left(\sum \limits_{l=0}^{N_c}{\left.{e}^{- j\omega l\tau}\cdot \frac{\partial^2{\boldsymbol{h}}_{\mathrm{ANN}}^T\left({\boldsymbol{v}}_f(t),{\boldsymbol{i}}_c(t),{\boldsymbol{i}}_c\left(t-\tau \right),\dots, {\boldsymbol{i}}_c\left(t-{N}_c\tau \right),{\boldsymbol{w}}_2\right)}{\partial {\boldsymbol{i}}_c\left(t- l\tau \right)\partial {i}_{cr}\left(t- m\tau \right)}\right|}_{\begin{array}{l}{\boldsymbol{v}}_f={\boldsymbol{V}}_{f,\mathrm{Bias}}\\ {}{\left.{\boldsymbol{i}}_c(t)={\boldsymbol{i}}_c\left(t-\tau \right)=\dots ={\boldsymbol{i}}_c\left(t-{N}_c\tau \right)={\boldsymbol{I}}_c\right|}_{{\boldsymbol{V}}_{c,\mathrm{Bias}}}\end{array}}\right)}^T\\ {}{\left.\cdot {e}^{- j\omega m\tau}\cdot \sum \limits_{p=1,2}{Y}_{c, rp}\right|}_{{\boldsymbol{V}}_{c,\mathrm{Bias}}}\cdot {\left.\frac{\partial {f}_{\mathrm{ANN}p}\left({\boldsymbol{v}}_f(t),{\boldsymbol{v}}_f\left(t-\tau \right),\dots, {\boldsymbol{v}}_f\left(t-{N}_v\tau \right),{\boldsymbol{w}}_1\right)}{\partial {w}_{1,i}}\right|}_{{\boldsymbol{v}}_f={\boldsymbol{V}}_{f.\mathrm{Bias}}}\kern0.1em \end{array}\right)\\ {}\kern0.5em \cdot {\left.{\boldsymbol{Y}}_c\left(\omega \right)\right|}_{{\boldsymbol{V}}_{c,\mathrm{Bias}}}\cdot {\left(\sum \limits_{k=0}^{N_v}{e}^{- j\omega k\tau}\cdot {\left.\frac{\partial {\boldsymbol{f}}_{\mathrm{ANN}}^T\left({\boldsymbol{v}}_f(t),{\boldsymbol{v}}_f\left(t-\tau \right),\dots, {\boldsymbol{v}}_f\left(t-{N}_v\tau \right),{\boldsymbol{w}}_1\right)}{\partial {\boldsymbol{v}}_f\left(t- k\tau \right)}\right|}_{{\boldsymbol{v}}_f(t)={\boldsymbol{v}}_f\left(t-\tau \right)=\dots ={\boldsymbol{v}}_f\left(t-{N}_v\tau \right)={\boldsymbol{V}}_{f,\mathrm{Bias}}}\right)}^T\\ {}+{\left(\sum \limits_{l=0}^{N_c}{\left.{e}^{- j\omega l\tau}\cdot \frac{\partial 
{\boldsymbol{h}}_{\mathrm{ANN}}^T\left({\boldsymbol{v}}_f(t),{\boldsymbol{i}}_c(t),{\boldsymbol{i}}_c\left(t-\tau \right),\dots, {\boldsymbol{i}}_c\left(t-{N}_c\tau \right),{\boldsymbol{w}}_2\right)}{\partial {\boldsymbol{i}}_c\left(t- l\tau \right)}\right|}_{\begin{array}{l}{\boldsymbol{v}}_f={\boldsymbol{V}}_{f.\mathrm{Bias}}\\ {}{\left.{\boldsymbol{i}}_c(t)={\boldsymbol{i}}_c\left(t-\tau \right)=\dots ={\boldsymbol{i}}_c\left(t-{N}_c\tau \right)={\boldsymbol{I}}_c\right|}_{{\boldsymbol{V}}_{c,\mathrm{Bias}}}\end{array}}\right)}^T\cdot {\left.{\boldsymbol{Y}}_c\left(\omega \right)\right|}_{{\boldsymbol{V}}_{c,\mathrm{Bias}}}\kern0.1em \\ {}\kern0.3em \cdot {\left(\sum \limits_{k=0}^{N_v}{e}^{- j\omega k\tau}\cdot {\left.\frac{\partial^2{\boldsymbol{f}}_{\mathrm{ANN}}^T\left({\boldsymbol{v}}_f(t),{\boldsymbol{v}}_f\left(t-\tau \right),\dots, {\boldsymbol{v}}_f\left(t-{N}_v\tau \right),{\boldsymbol{w}}_1\right)}{\partial {\boldsymbol{v}}_f\left(t- k\tau \right)\partial {w}_{1,i}}\right|}_{{\boldsymbol{v}}_f(t)={\boldsymbol{v}}_f\left(t-\tau \right)=\dots ={\boldsymbol{v}}_f\left(t-{N}_v\tau \right)={\boldsymbol{V}}_{f,\mathrm{Bias}}}\right)}^T\\ {}+{\left(\sum \limits_{l=0}^{N_c}{\left.{e}^{- j\omega l\tau}\cdot \frac{\partial {\boldsymbol{h}}_{\mathrm{ANN}}^T\left({\boldsymbol{v}}_f(t),{\boldsymbol{i}}_c(t),{\boldsymbol{i}}_c\left(t-\tau \right),\dots, {\boldsymbol{i}}_c\left(t-{N}_c\tau \right),{\boldsymbol{w}}_2\right)}{\partial {\boldsymbol{i}}_c\left(t- l\tau \right)}\right|}_{\begin{array}{l}{\boldsymbol{v}}_f={\boldsymbol{V}}_{f.\mathrm{Bias}}\\ {}{\left.{\boldsymbol{i}}_c(t)={\boldsymbol{i}}_c\left(t-\tau \right)=\dots ={\boldsymbol{i}}_c\left(t-{N}_c\tau \right)={\boldsymbol{I}}_c\right|}_{{\boldsymbol{V}}_{c,\mathrm{Bias}}}\end{array}}\right)}^T\\ {}\cdot \left(\sum \limits_{r=1,2}{\left.\frac{\partial {\boldsymbol{Y}}_c}{\partial {v}_{cr}}\right|}_{{\boldsymbol{V}}_{c,\mathrm{Bias}}}\cdot {\left.\frac{\partial 
{f}_{\mathrm{ANN}r}\left({\boldsymbol{v}}_f(t),{\boldsymbol{v}}_f\left(t-\tau \right),\dots, {\boldsymbol{v}}_f\left(t-{N}_v\tau \right),{\boldsymbol{w}}_1\right)}{\partial {w}_{1,i}}\right|}_{{\boldsymbol{v}}_f(t)={\boldsymbol{v}}_f\left(t-\tau \right)=\dots ={\boldsymbol{v}}_f\left(t-{N}_v\tau \right)={\boldsymbol{V}}_{f,\mathrm{Bias}}}\right)\\ {}\kern0.6em \cdot {\left(\sum \limits_{k=0}^{N_v}{e}^{- j\omega k\tau}\cdot {\left.\frac{\partial {\boldsymbol{f}}_{\mathrm{ANN}}^T\left({\boldsymbol{v}}_f(t),{\boldsymbol{v}}_f\left(t-\tau \right),\dots, {\boldsymbol{v}}_f\left(t-{N}_v\tau \right),{\boldsymbol{w}}_1\right)}{\partial {\boldsymbol{v}}_f\left(t- k\tau \right)}\right|}_{{\boldsymbol{v}}_f(t)={\boldsymbol{v}}_f\left(t-\tau \right)=\dots ={\boldsymbol{v}}_f\left(t-{N}_v\tau \right)={\boldsymbol{V}}_{f,\mathrm{Bias}}}\right)}^T\kern3.199999em \\ {}\kern1em \end{array}} $$

(12)

$$ {\displaystyle \begin{array}{l}\frac{\partial {\boldsymbol{Y}}_f\left(\omega \right)}{\partial {w}_{2,i}}\\ {}={\left(\sum \limits_{l=0}^{N_c}{\left.{e}^{- j\omega l\tau}\cdot \frac{\partial^2{\boldsymbol{h}}_{\mathrm{ANN}}^T\left({\boldsymbol{v}}_f(t),{\boldsymbol{i}}_c(t),{\boldsymbol{i}}_c\left(t-\tau \right),\dots, {\boldsymbol{i}}_c\left(t-{N}_c\tau \right),{\boldsymbol{w}}_2\right)}{\partial {\boldsymbol{i}}_c\left(t- l\tau \right)\partial {w}_{2,i}}\right|}_{\begin{array}{l}{\boldsymbol{v}}_f={\boldsymbol{V}}_{f.\mathrm{Bias}}\\ {}{\left.{\boldsymbol{i}}_c(t)={\boldsymbol{i}}_c\left(t-\tau \right)=\dots ={\boldsymbol{i}}_c\left(t-{N}_c\tau \right)={\boldsymbol{I}}_c\right|}_{{\boldsymbol{V}}_{c,\mathrm{Bias}}}\end{array}}\right)}^T\cdot {\left.{\boldsymbol{Y}}_c\left(\omega \right)\right|}_{{\boldsymbol{V}}_{c,\mathrm{Bias}}}\\ {}\kern0.6em \cdot {\left(\sum \limits_{k=0}^{N_v}{e}^{- j\omega k\tau}\cdot {\left.\frac{\partial {\boldsymbol{f}}_{\mathrm{ANN}}^T\left({\boldsymbol{v}}_f(t),{\boldsymbol{v}}_f\left(t-\tau \right),\dots, {\boldsymbol{v}}_f\left(t-{N}_v\tau \right),{\boldsymbol{w}}_1\right)}{\partial {\boldsymbol{v}}_f\left(t- k\tau \right)}\right|}_{{\boldsymbol{v}}_f(t)={\boldsymbol{v}}_f\left(t-\tau \right)=\dots ={\boldsymbol{v}}_f\left(t-{N}_v\tau \right)={\boldsymbol{V}}_{f,\mathrm{Bias}}}\right)}^T\\ {}\kern0.5em +{\left({\left.\frac{\partial^2{\boldsymbol{h}}_{\mathrm{ANN}}^T\left({\boldsymbol{v}}_f(t),{\boldsymbol{i}}_c(t),{\boldsymbol{i}}_c\left(t-\tau \right),\dots, {\boldsymbol{i}}_c\left(t-{N}_c\tau \right),{\boldsymbol{w}}_2\right)}{\partial {\boldsymbol{v}}_f(t)\partial {w}_{2,i}}\right|}_{\begin{array}{l}{\boldsymbol{v}}_f={\boldsymbol{V}}_{f.\mathrm{Bias}}\\ {}{\left.{\boldsymbol{i}}_c(t)={\boldsymbol{i}}_c\left(t-\tau \right)=\dots ={\boldsymbol{i}}_c\left(t-{N}_c\tau \right)={\boldsymbol{I}}_c\right|}_{{\boldsymbol{V}}_{c,\mathrm{Bias}}}\end{array}}\right)}^T\end{array}} $$

(13)

where the second-order derivatives of the dynamic voltage and current mapping neural networks *f*_{ANN} and *h*_{ANN}, i.e., the derivatives of the Jacobian matrices \( \partial {\boldsymbol{f}}_{\mathrm{ANN}}^T/\partial {\boldsymbol{v}}_f\left(t- k\tau \right) \) and \( \partial {\boldsymbol{h}}_{\mathrm{ANN}}^T/\partial {\boldsymbol{i}}_c\left(t- l\tau \right) \) with respect to *w*_{1, i} and *w*_{2, i}, respectively, can be obtained by adjoint neural network back-propagation [17].
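The conversion from *Y* parameter sensitivity to *S* parameter sensitivity mentioned in item (2) can be sketched as follows. The text does not state which conversion is used, so this sketch assumes the standard two-port relation *S* = (*I* − *z*_{0}*Y*)(*I* + *z*_{0}*Y*)^{−1} with a common real reference impedance *z*_{0} (50 Ω here); differentiating it gives ∂*S* = −2*z*_{0}(*I* + *z*_{0}*Y*)^{−1} ∂*Y* (*I* + *z*_{0}*Y*)^{−1}. The function names are hypothetical.

```python
def mat_mul(a, b):
    # Product of two 2x2 (complex) matrices stored as nested lists.
    return [[sum(a[r][k] * b[k][c] for k in range(2)) for c in range(2)]
            for r in range(2)]

def mat_inv(a):
    # Inverse of a 2x2 matrix via the adjugate formula.
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det],
            [-a[1][0] / det, a[0][0] / det]]

def identity_plus(y, scale):
    # Returns I + scale * Y for a 2x2 matrix Y.
    return [[(1.0 if r == c else 0.0) + scale * y[r][c] for c in range(2)]
            for r in range(2)]

def y_to_s(y, z0=50.0):
    # Assumed conversion: S = (I - z0*Y) (I + z0*Y)^-1.
    return mat_mul(identity_plus(y, -z0), mat_inv(identity_plus(y, z0)))

def ds_from_dy(y, dy, z0=50.0):
    # Sensitivity of S given the sensitivity dY of Y with respect to one weight:
    # dS = -2*z0 * (I + z0*Y)^-1 * dY * (I + z0*Y)^-1.
    a_inv = mat_inv(identity_plus(y, z0))
    prod = mat_mul(mat_mul(a_inv, dy), a_inv)
    return [[-2.0 * z0 * prod[r][c] for c in range(2)] for r in range(2)]
```

In use, `dy` would hold the value of Eq. (12) or (13) at one frequency and one weight, and `ds_from_dy` would be called once per frequency point and per weight.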

(3) *HB sensitivity*: the sensitivities of the large-signal harmonic current of the general Neuro-SM model with respect to *w*_{1, i} and *w*_{2, i} at a generic harmonic frequency *ω*_{k}, *k* = 0, 1, 2, …, *N*_{H}, can be described in functional form as

$$ {\displaystyle \begin{array}{l}\frac{\partial {\boldsymbol{I}}_f\left({\omega}_k\right)}{\partial {w}_{1,i}}\\ {}=\frac{1}{N_T}\sum \limits_{n=0}^{N_T-1}\left(\begin{array}{l}\sum \limits_{m=0}^{N_c}\frac{{\left.\operatorname{}\partial {\boldsymbol{h}}_{\mathrm{ANN}}\Big({\boldsymbol{v}}_f\left({t}_n\right),{\boldsymbol{i}}_c\left({t}_n\right)\right|}_{{\boldsymbol{v}}_c\left({t}_n\right)},{\boldsymbol{i}}_c\left({t}_n-\tau \right){\left|{}_{{\boldsymbol{v}}_c\left({t}_n\right)},\dots, {\boldsymbol{i}}_c\left({t}_n-{N}_c\tau \right)\right|}_{{\boldsymbol{v}}_c\left({t}_n\right)},{\boldsymbol{w}}_2\Big)}{\partial {\boldsymbol{i}}_c\left({t}_n- m\tau \right)}\cdot {e}^{- j\omega m\tau}\\ {}{\left.\operatorname{}\cdot {\boldsymbol{G}}_c\left({t}_n\right)\right|}_{{\boldsymbol{v}}_c\left({t}_n\right)}\cdot \frac{\partial {\boldsymbol{f}}_{\mathrm{ANN}}\left({\boldsymbol{v}}_f\left({t}_n\right),{\boldsymbol{v}}_f\left({t}_n-\tau \right),\dots, {\boldsymbol{v}}_f\left({t}_n-{N}_v\tau \right),{\boldsymbol{w}}_1\right)}{\partial {w}_{1,i}}\cdot {W}_N\left(n,k\right)\end{array}\right)\kern0.1em \end{array}} $$

(14)

$$ {\displaystyle \begin{array}{l}\frac{\partial {\boldsymbol{I}}_f\left({\omega}_k\right)}{\partial {w}_{2,i}}\\ {}=\frac{1}{N_T}\sum \limits_{n=0}^{N_T-1}\frac{{\left.\partial {\boldsymbol{h}}_{\mathrm{ANN}}\Big({\boldsymbol{v}}_f\left({t}_n\right),{\boldsymbol{i}}_c\left({t}_n\right)\right|}_{{\boldsymbol{v}}_c\left({t}_n\right)},{\boldsymbol{i}}_c\left({t}_n-\tau \right){\left|{}_{{\boldsymbol{v}}_c\left({t}_n\right)},\dots, {\boldsymbol{i}}_c\left({t}_n-{N}_c\tau \right)\right|}_{{\boldsymbol{v}}_c\left({t}_n\right)},{\boldsymbol{w}}_2\Big)}{\partial {w}_{2,i}}\cdot {W}_N\left(n,k\right)\end{array}} $$

(15)

where *G*_{c}(*t*_{n}) at the mapped voltage of the coarse model *v*_{c}(*t*_{n}) is the nonlinear conductance matrix of the existing coarse model at time point *t*_{n}.
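The outer sum over time samples in Eq. (15) has the structure of a discrete Fourier transform of the time-domain sensitivities. The following Python sketch illustrates only that step. The Fourier coefficient *W*_{N}(*n*, *k*) is not defined in this section, so the sketch assumes *W*_{N}(*n*, *k*) = exp(−*j*2π*nk*/*N*_{T}); the function name and the sample layout (one list of port values per time point *t*_{n}) are likewise hypothetical.

```python
import cmath

def harmonic_sensitivity(dh_dw, k):
    """Average the time-domain sensitivities into the k-th harmonic.

    dh_dw[n][q] is the derivative of output q (e.g. gate or drain current)
    with respect to one weight at time sample t_n, for n = 0 .. N_T - 1.
    """
    n_t = len(dh_dw)
    n_ports = len(dh_dw[0])
    out = [0j] * n_ports
    for n, sample in enumerate(dh_dw):
        # Assumed Fourier coefficient W_N(n, k) = exp(-j*2*pi*n*k/N_T).
        w_nk = cmath.exp(-2j * cmath.pi * n * k / n_t)
        for q in range(n_ports):
            out[q] += sample[q] * w_nk
    # The 1/N_T factor matches the prefactor in Eq. (15).
    return [x / n_t for x in out]
```

For Eq. (14), the same averaging would be applied after the per-sample product of the mapping Jacobians and *G*_{c}(*t*_{n}) has been formed at each time point.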