Let the numbers of hidden neurons of the dynamic voltage and current mapping neural networks be \( N_{hv} \) and \( N_{hc} \), respectively. Let the generic symbols \( {w}_{1,i} \) (\( i=1,2,\dots,N_{hv} \)) and \( {w}_{2,i} \) (\( i=1,2,\dots,N_{hc} \)) denote internal weights of the voltage and current mapping neural networks, respectively; \( {w}_{1,i} \) and \( {w}_{2,i} \) are the i-th components of the vectors \( {\boldsymbol{w}}_1 \) and \( {\boldsymbol{w}}_2 \). To train the general Neuro-SM efficiently, gradient information, provided by the sensitivities of the model with respect to \( {w}_{1,i} \) and \( {w}_{2,i} \), is needed [16].
(1) DC sensitivity: let the DC output currents at the gate and drain of the general Neuro-SM model be \( {\boldsymbol{I}}_{f,\mathrm{DC}} \). The sensitivities of \( {\boldsymbol{I}}_{f,\mathrm{DC}} \) with respect to \( {w}_{1,i} \) and \( {w}_{2,i} \) are described in functional form as
$$ {\displaystyle \begin{array}{l}\frac{\partial {\boldsymbol{I}}_{f,\mathrm{DC}}}{\partial {w}_{1,i}}={\left(\frac{\partial {\boldsymbol{I}}_{f,\mathrm{DC}}^T}{\partial {\boldsymbol{I}}_{c,\mathrm{DC}}}\right)}^T\cdot {\left(\frac{\partial {\boldsymbol{I}}_{c,\mathrm{DC}}^T}{\partial {\boldsymbol{V}}_{c,\mathrm{DC}}}\right)}^T\cdot \frac{\partial {\boldsymbol{V}}_{c,\mathrm{DC}}}{\partial {w}_{1,i}}\\ {}={\left(\frac{\partial {\boldsymbol{h}}_{\mathrm{ANN}}^T\left({\boldsymbol{V}}_{f,\mathrm{DC}},\overset{N_c+1}{\overbrace{{\left.{\boldsymbol{I}}_{c,\mathrm{DC}}\right|}_{{\boldsymbol{V}}_{c,\mathrm{DC}}},{\boldsymbol{I}}_{c,\mathrm{DC}}{\left|{}_{{\boldsymbol{V}}_{c,\mathrm{DC}}},\dots, {\boldsymbol{I}}_{c,\mathrm{DC}}\right|}_{{\boldsymbol{V}}_{c,\mathrm{DC}}}}},{\boldsymbol{w}}_2\right)\kern0.6em }{\partial {\boldsymbol{I}}_{c,\mathrm{DC}}}\right)}^T\kern0.3em \cdot {\boldsymbol{G}}_c\\ {}\kern3.399999em \cdot \frac{\partial {\boldsymbol{f}}_{\mathrm{ANN}}\left(\overset{N_v+1}{\overbrace{{\boldsymbol{V}}_{f,\mathrm{DC}},{\boldsymbol{V}}_{f,\boldsymbol{DC}},\dots, {\boldsymbol{V}}_{f,\mathrm{DC}}}},{\boldsymbol{w}}_1\right)\kern0.1em }{\partial {w}_{1,i}}\kern11.60001em \end{array}} $$
(10)
$$ \frac{\partial {\boldsymbol{I}}_{f,\mathrm{DC}}}{\partial {w}_{2,i}}=\frac{\partial {\boldsymbol{h}}_{\mathrm{ANN}}\left({\boldsymbol{V}}_{f,\mathrm{DC}},\overset{N_c+1}{\overbrace{{\left.{\boldsymbol{I}}_{c,\mathrm{DC}}\right|}_{{\boldsymbol{V}}_{c,\mathrm{DC}}},{\boldsymbol{I}}_{c,\mathrm{DC}}{\left|{}_{{\boldsymbol{V}}_{c,\mathrm{DC}}},\dots, {\boldsymbol{I}}_{c,\mathrm{DC}}\right|}_{{\boldsymbol{V}}_{c,\mathrm{DC}}}}},{\boldsymbol{w}}_2\right)\kern0.1em }{\partial {w}_{2,i}} $$
(11)
where \( {\boldsymbol{G}}_c={\left(\partial {\boldsymbol{I}}_{c,\mathrm{DC}}^T/\partial {\boldsymbol{V}}_{c,\mathrm{DC}}\right)}^T \) is the DC conductance matrix of the existing coarse model, and the first-order derivatives \( \partial {\boldsymbol{f}}_{\mathrm{ANN}}/\partial {w}_{1,i} \) and \( \partial {\boldsymbol{h}}_{\mathrm{ANN}}/\partial {w}_{2,i} \) can be calculated by neural network backpropagation [17].
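The chain-rule structure of Eqs. (10) and (11) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy networks `f_ann` and `h_ann`, the toy coarse model `coarse`, the fixed hidden weights `A`, and the choice of a single network input copy (no delayed inputs) are not taken from the paper. The analytic derivatives stand in for what backpropagation would return.

```python
import math

# Fixed hidden-layer weights of the toy voltage-mapping network (assumed values).
A = [[0.3, -0.2], [0.1, 0.4]]

def hidden(vf):
    """Hidden-neuron outputs z_j = tanh(A_j . V_f)."""
    return [math.tanh(A[j][0] * vf[0] + A[j][1] * vf[1]) for j in range(2)]

def f_ann(vf, w1):
    """Toy voltage mapping V_c = V_f + W1 z; w1 holds 4 output weights, row-major."""
    z = hidden(vf)
    return [vf[p] + w1[2 * p] * z[0] + w1[2 * p + 1] * z[1] for p in range(2)]

def df_ann_dw1(vf, i):
    """d f_ANN / d w1[i]: only output p = i // 2 depends on w1[i]."""
    out = [0.0, 0.0]
    out[i // 2] = hidden(vf)[i % 2]
    return out

def coarse(vc):
    """Toy coarse model: gate and drain DC currents as functions of V_c."""
    return [1e-3 * vc[0], 0.05 * vc[0] ** 2 * math.tanh(vc[1])]

def coarse_g(vc):
    """DC conductance matrix G_c = dI_c/dV_c of the toy coarse model."""
    t = math.tanh(vc[1])
    return [[1e-3, 0.0],
            [0.1 * vc[0] * t, 0.05 * vc[0] ** 2 * (1.0 - t * t)]]

def h_ann(vf, ic, w2):
    """Toy current mapping I_f = h_ANN(V_f, I_c, w2)."""
    total = ic[0] + ic[1]
    return [ic[r] + w2[r] * math.tanh(vf[r]) * total for r in range(2)]

def dh_ann_dic(vf, w2):
    """Jacobian dh_ANN/dI_c of the toy current mapping."""
    return [[(1.0 if r == s else 0.0) + w2[r] * math.tanh(vf[r])
             for s in range(2)] for r in range(2)]

def dh_ann_dw2(vf, ic, i):
    """d h_ANN / d w2[i]: only output i depends on w2[i]."""
    out = [0.0, 0.0]
    out[i] = math.tanh(vf[i]) * (ic[0] + ic[1])
    return out

def matvec(m, x):
    return [sum(m[r][c] * x[c] for c in range(2)) for r in range(2)]

def model(vf, w1, w2):
    """Full toy model: I_f = h_ANN(V_f, I_c(f_ANN(V_f, w1)), w2)."""
    return h_ann(vf, coarse(f_ann(vf, w1)), w2)

def dif_dw1(vf, w1, w2, i):
    """Eq. (10) pattern: (dh/dI_c) . G_c . (df/dw1_i)."""
    vc = f_ann(vf, w1)
    return matvec(dh_ann_dic(vf, w2), matvec(coarse_g(vc), df_ann_dw1(vf, i)))

def dif_dw2(vf, w1, w2, i):
    """Eq. (11) pattern: dh/dw2_i evaluated at the mapped coarse-model current."""
    return dh_ann_dw2(vf, coarse(f_ann(vf, w1)), i)
```

A quick way to gain confidence in such a sketch is to compare `dif_dw1` and `dif_dw2` against central finite differences of `model` at an arbitrary bias point.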
(2) S-parameter sensitivity: the S-parameter sensitivity can be obtained by converting the corresponding Y-parameter sensitivity. The small-signal Y-parameter sensitivities of the general Neuro-SM model with respect to \( {w}_{1,i} \) and \( {w}_{2,i} \) are shown in Eqs. (12) and (13), respectively. These two equations can be obtained by differentiating Eq. (5) with respect to \( {w}_{1,i} \) and \( {w}_{2,i} \), respectively.
$$ {\displaystyle \begin{array}{l}\frac{\partial {\boldsymbol{Y}}_f\left(\omega \right)}{\partial {w}_{1,i}}\\ {}=\sum \limits_{r=1,2}\sum \limits_{m=0}^{N_c}\left(\begin{array}{l}{\left(\sum \limits_{l=0}^{N_c}{\left.{e}^{- j\omega l\tau}\cdot \frac{\partial^2{\boldsymbol{h}}_{\mathrm{ANN}}^T\left({\boldsymbol{v}}_f(t),{\boldsymbol{i}}_c(t),{\boldsymbol{i}}_c\left(t-\tau \right),\dots, {\boldsymbol{i}}_c\left(t-{N}_c\tau \right),{\boldsymbol{w}}_2\right)}{\partial {\boldsymbol{i}}_c\left(t- l\tau \right)\partial {i}_{cr}\left(t- m\tau \right)}\right|}_{\begin{array}{l}{\boldsymbol{v}}_f={\boldsymbol{V}}_{f,\mathrm{Bias}}\\ {}{\left.{\boldsymbol{i}}_c(t)={\boldsymbol{i}}_c\left(t-\tau \right)=\dots ={\boldsymbol{i}}_c\left(t-{N}_c\tau \right)={\boldsymbol{I}}_c\right|}_{{\boldsymbol{V}}_{c,\mathrm{Bias}}}\end{array}}\right)}^T\\ {}{\left.\cdot {e}^{- j\omega m\tau}\cdot \sum \limits_{p=1,2}{Y}_{c, rp}\right|}_{{\boldsymbol{V}}_{c,\mathrm{Bias}}}\cdot {\left.\frac{\partial {f}_{\mathrm{ANN}p}\left({\boldsymbol{v}}_f(t),{\boldsymbol{v}}_f\left(t-\tau \right),\dots, {\boldsymbol{v}}_f\left(t-{N}_v\tau \right),{\boldsymbol{w}}_1\right)}{\partial {w}_{1,i}}\right|}_{{\boldsymbol{v}}_f={\boldsymbol{V}}_{f.\mathrm{Bias}}}\kern0.1em \end{array}\right)\\ {}\kern0.5em \cdot {\left.{\boldsymbol{Y}}_c\left(\omega \right)\right|}_{{\boldsymbol{V}}_{c,\mathrm{Bias}}}\cdot {\left(\sum \limits_{k=0}^{N_v}{e}^{- j\omega k\tau}\cdot {\left.\frac{\partial {\boldsymbol{f}}_{\mathrm{ANN}}^T\left({\boldsymbol{v}}_f(t),{\boldsymbol{v}}_f\left(t-\tau \right),\dots, {\boldsymbol{v}}_f\left(t-{N}_v\tau \right),{\boldsymbol{w}}_1\right)}{\partial {\boldsymbol{v}}_f\left(t- k\tau \right)}\right|}_{{\boldsymbol{v}}_f(t)={\boldsymbol{v}}_f\left(t-\tau \right)=\dots ={\boldsymbol{v}}_f\left(t-{N}_v\tau \right)={\boldsymbol{V}}_{f,\mathrm{Bias}}}\right)}^T\\ {}+{\left(\sum \limits_{l=0}^{N_c}{\left.{e}^{- j\omega l\tau}\cdot \frac{\partial 
{\boldsymbol{h}}_{\mathrm{ANN}}^T\left({\boldsymbol{v}}_f(t),{\boldsymbol{i}}_c(t),{\boldsymbol{i}}_c\left(t-\tau \right),\dots, {\boldsymbol{i}}_c\left(t-{N}_c\tau \right),{\boldsymbol{w}}_2\right)}{\partial {\boldsymbol{i}}_c\left(t- l\tau \right)}\right|}_{\begin{array}{l}{\boldsymbol{v}}_f={\boldsymbol{V}}_{f.\mathrm{Bias}}\\ {}{\left.{\boldsymbol{i}}_c(t)={\boldsymbol{i}}_c\left(t-\tau \right)=\dots ={\boldsymbol{i}}_c\left(t-{N}_c\tau \right)={\boldsymbol{I}}_c\right|}_{{\boldsymbol{V}}_{c,\mathrm{Bias}}}\end{array}}\right)}^T\cdot {\left.{\boldsymbol{Y}}_c\left(\omega \right)\right|}_{{\boldsymbol{V}}_{c,\mathrm{Bias}}}\kern0.1em \\ {}\kern0.3em \cdot {\left(\sum \limits_{k=0}^{N_v}{e}^{- j\omega k\tau}\cdot {\left.\frac{\partial^2{\boldsymbol{f}}_{\mathrm{ANN}}^T\left({\boldsymbol{v}}_f(t),{\boldsymbol{v}}_f\left(t-\tau \right),\dots, {\boldsymbol{v}}_f\left(t-{N}_v\tau \right),{\boldsymbol{w}}_1\right)}{\partial {\boldsymbol{v}}_f\left(t- k\tau \right)\partial {w}_{1,i}}\right|}_{{\boldsymbol{v}}_f(t)={\boldsymbol{v}}_f\left(t-\tau \right)=\dots ={\boldsymbol{v}}_f\left(t-{N}_v\tau \right)={\boldsymbol{V}}_{f,\mathrm{Bias}}}\right)}^T\\ {}+{\left(\sum \limits_{l=0}^{N_c}{\left.{e}^{- j\omega l\tau}\cdot \frac{\partial {\boldsymbol{h}}_{\mathrm{ANN}}^T\left({\boldsymbol{v}}_f(t),{\boldsymbol{i}}_c(t),{\boldsymbol{i}}_c\left(t-\tau \right),\dots, {\boldsymbol{i}}_c\left(t-{N}_c\tau \right),{\boldsymbol{w}}_2\right)}{\partial {\boldsymbol{i}}_c\left(t- l\tau \right)}\right|}_{\begin{array}{l}{\boldsymbol{v}}_f={\boldsymbol{V}}_{f.\mathrm{Bias}}\\ {}{\left.{\boldsymbol{i}}_c(t)={\boldsymbol{i}}_c\left(t-\tau \right)=\dots ={\boldsymbol{i}}_c\left(t-{N}_c\tau \right)={\boldsymbol{I}}_c\right|}_{{\boldsymbol{V}}_{c,\mathrm{Bias}}}\end{array}}\right)}^T\\ {}\cdot \left(\sum \limits_{r=1,2}{\left.\frac{\partial {\boldsymbol{Y}}_c}{\partial {v}_{cr}}\right|}_{{\boldsymbol{V}}_{c,\mathrm{Bias}}}\cdot {\left.\frac{\partial 
{f}_{\mathrm{ANN}r}\left({\boldsymbol{v}}_f(t),{\boldsymbol{v}}_f\left(t-\tau \right),\dots, {\boldsymbol{v}}_f\left(t-{N}_v\tau \right),{\boldsymbol{w}}_1\right)}{\partial {w}_{1,i}}\right|}_{{\boldsymbol{v}}_f(t)={\boldsymbol{v}}_f\left(t-\tau \right)=\dots ={\boldsymbol{v}}_f\left(t-{N}_v\tau \right)={\boldsymbol{V}}_{f,\mathrm{Bias}}}\right)\\ {}\kern0.6em \cdot {\left(\sum \limits_{k=0}^{N_v}{e}^{- j\omega k\tau}\cdot {\left.\frac{\partial {\boldsymbol{f}}_{\mathrm{ANN}}^T\left({\boldsymbol{v}}_f(t),{\boldsymbol{v}}_f\left(t-\tau \right),\dots, {\boldsymbol{v}}_f\left(t-{N}_v\tau \right),{\boldsymbol{w}}_1\right)}{\partial {\boldsymbol{v}}_f\left(t- k\tau \right)}\right|}_{{\boldsymbol{v}}_f(t)={\boldsymbol{v}}_f\left(t-\tau \right)=\dots ={\boldsymbol{v}}_f\left(t-{N}_v\tau \right)={\boldsymbol{V}}_{f,\mathrm{Bias}}}\right)}^T\kern3.199999em \\ {}\kern1em \end{array}} $$
(12)
$$ {\displaystyle \begin{array}{l}\frac{\partial {\boldsymbol{Y}}_f\left(\omega \right)}{\partial {w}_{2,i}}\\ {}={\left(\sum \limits_{l=0}^{N_c}{\left.{e}^{- j\omega l\tau}\cdot \frac{\partial^2{\boldsymbol{h}}_{\mathrm{ANN}}^T\left({\boldsymbol{v}}_f(t),{\boldsymbol{i}}_c(t),{\boldsymbol{i}}_c\left(t-\tau \right),\dots, {\boldsymbol{i}}_c\left(t-{N}_c\tau \right),{\boldsymbol{w}}_2\right)}{\partial {\boldsymbol{i}}_c\left(t- l\tau \right)\partial {w}_{2,i}}\right|}_{\begin{array}{l}{\boldsymbol{v}}_f={\boldsymbol{V}}_{f.\mathrm{Bias}}\\ {}{\left.{\boldsymbol{i}}_c(t)={\boldsymbol{i}}_c\left(t-\tau \right)=\dots ={\boldsymbol{i}}_c\left(t-{N}_c\tau \right)={\boldsymbol{I}}_c\right|}_{{\boldsymbol{V}}_{c,\mathrm{Bias}}}\end{array}}\right)}^T\cdot {\left.{\boldsymbol{Y}}_c\left(\omega \right)\right|}_{{\boldsymbol{V}}_{c,\mathrm{Bias}}}\\ {}\kern0.6em \cdot {\left(\sum \limits_{k=0}^{N_v}{e}^{- j\omega k\tau}\cdot {\left.\frac{\partial {\boldsymbol{f}}_{\mathrm{ANN}}^T\left({\boldsymbol{v}}_f(t),{\boldsymbol{v}}_f\left(t-\tau \right),\dots, {\boldsymbol{v}}_f\left(t-{N}_v\tau \right),{\boldsymbol{w}}_1\right)}{\partial {\boldsymbol{v}}_f\left(t- k\tau \right)}\right|}_{{\boldsymbol{v}}_f(t)={\boldsymbol{v}}_f\left(t-\tau \right)=\dots ={\boldsymbol{v}}_f\left(t-{N}_v\tau \right)={\boldsymbol{V}}_{f,\mathrm{Bias}}}\right)}^T\\ {}\kern0.5em +{\left({\left.\frac{\partial^2{\boldsymbol{h}}_{\mathrm{ANN}}^T\left({\boldsymbol{v}}_f(t),{\boldsymbol{i}}_c(t),{\boldsymbol{i}}_c\left(t-\tau \right),\dots, {\boldsymbol{i}}_c\left(t-{N}_c\tau \right),{\boldsymbol{w}}_2\right)}{\partial {\boldsymbol{v}}_f(t)\partial {w}_{2,i}}\right|}_{\begin{array}{l}{\boldsymbol{v}}_f={\boldsymbol{V}}_{f.\mathrm{Bias}}\\ {}{\left.{\boldsymbol{i}}_c(t)={\boldsymbol{i}}_c\left(t-\tau \right)=\dots ={\boldsymbol{i}}_c\left(t-{N}_c\tau \right)={\boldsymbol{I}}_c\right|}_{{\boldsymbol{V}}_{c,\mathrm{Bias}}}\end{array}}\right)}^T\end{array}} $$
(13)
where the second-order derivatives of the dynamic voltage and current mapping neural networks \( {\boldsymbol{f}}_{\mathrm{ANN}} \) and \( {\boldsymbol{h}}_{\mathrm{ANN}} \), i.e., the derivatives of the Jacobian matrices \( \partial {\boldsymbol{f}}_{\mathrm{ANN}}^T/\partial {\boldsymbol{v}}_f\left(t- k\tau \right) \) and \( \partial {\boldsymbol{h}}_{\mathrm{ANN}}^T/\partial {\boldsymbol{i}}_c\left(t- l\tau \right) \) with respect to \( {w}_{1,i} \) and \( {w}_{2,i} \), respectively, can be obtained by adjoint neural network back-propagation [17].
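The conversion step mentioned in item (2) can be sketched as follows. The sketch assumes a two-port with the same real reference impedance \( Z_0 \) at both ports, for which \( \boldsymbol{S}=(\boldsymbol{I}-Z_0\boldsymbol{Y})(\boldsymbol{I}+Z_0\boldsymbol{Y})^{-1} \); differentiating this relation gives \( \partial \boldsymbol{S}/\partial w=-(\boldsymbol{I}+\boldsymbol{S})\, Z_0\,(\partial \boldsymbol{Y}/\partial w)\,(\boldsymbol{I}+Z_0\boldsymbol{Y})^{-1} \). The function names and the 50 Ω default are illustrative choices, not taken from the paper; the matrix `dY` would be supplied by Eq. (12) or (13).

```python
# 2x2 complex matrix helpers (plain nested lists keep the sketch dependency-free).
I2 = [[1.0, 0.0], [0.0, 1.0]]

def matmul(a, b):
    return [[sum(a[r][k] * b[k][c] for k in range(2)) for c in range(2)]
            for r in range(2)]

def add(a, b):
    return [[a[r][c] + b[r][c] for c in range(2)] for r in range(2)]

def scale(a, s):
    return [[s * a[r][c] for c in range(2)] for r in range(2)]

def inv2(m):
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]

def y_to_s(y, z0=50.0):
    """S = (I - z0*Y)(I + z0*Y)^-1, assuming equal real reference impedance z0."""
    yn = scale(y, z0)
    return matmul(add(I2, scale(yn, -1.0)), inv2(add(I2, yn)))

def ds_from_dy(y, dy, z0=50.0):
    """dS/dw = -(I + S) (z0 * dY/dw) (I + z0*Y)^-1 for one weight w."""
    s = y_to_s(y, z0)
    right = inv2(add(I2, scale(y, z0)))
    return scale(matmul(matmul(add(I2, s), scale(dy, z0)), right), -1.0)
```

In a training loop, `ds_from_dy` would be called once per weight and per frequency point, with `y` the fine-model Y matrix and `dy` its sensitivity at that frequency.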
(3) HB sensitivity: the sensitivities of the large-signal harmonic current of the general Neuro-SM model with respect to \( {w}_{1,i} \) and \( {w}_{2,i} \) at a generic harmonic frequency \( {\omega}_k \), \( k=0,1,2,\dots,{N}_H \), can be described in functional form as
$$ {\displaystyle \begin{array}{l}\frac{\partial {\boldsymbol{I}}_f\left({\omega}_k\right)}{\partial {w}_{1,i}}\\ {}=\frac{1}{N_T}\sum \limits_{n=0}^{N_T-1}\left(\begin{array}{l}\sum \limits_{m=0}^{N_c}\frac{{\left.\operatorname{}\partial {\boldsymbol{h}}_{\mathrm{ANN}}\Big({\boldsymbol{v}}_f\left({t}_n\right),{\boldsymbol{i}}_c\left({t}_n\right)\right|}_{{\boldsymbol{v}}_c\left({t}_n\right)},{\boldsymbol{i}}_c\left({t}_n-\tau \right){\left|{}_{{\boldsymbol{v}}_c\left({t}_n\right)},\dots, {\boldsymbol{i}}_c\left({t}_n-{N}_c\tau \right)\right|}_{{\boldsymbol{v}}_c\left({t}_n\right)},{\boldsymbol{w}}_2\Big)}{\partial {\boldsymbol{i}}_c\left({t}_n- m\tau \right)}\cdot {e}^{- j\omega m\tau}\\ {}{\left.\operatorname{}\cdot {\boldsymbol{G}}_c\left({t}_n\right)\right|}_{{\boldsymbol{v}}_c\left({t}_n\right)}\cdot \frac{\partial {\boldsymbol{f}}_{\mathrm{ANN}}\left({\boldsymbol{v}}_f\left({t}_n\right),{\boldsymbol{v}}_f\left({t}_n-\tau \right),\dots, {\boldsymbol{v}}_f\left({t}_n-{N}_v\tau \right),{\boldsymbol{w}}_1\right)}{\partial {w}_{1,i}}\cdot {W}_N\left(n,k\right)\end{array}\right)\kern0.1em \end{array}} $$
(14)
$$ {\displaystyle \begin{array}{l}\frac{\partial {\boldsymbol{I}}_f\left({\omega}_k\right)}{\partial {w}_{2,i}}\\ {}=\frac{1}{N_T}\sum \limits_{n=0}^{N_T-1}\frac{{\left.\partial {\boldsymbol{h}}_{\mathrm{ANN}}\Big({\boldsymbol{v}}_f\left({t}_n\right),{\boldsymbol{i}}_c\left({t}_n\right)\right|}_{{\boldsymbol{v}}_c\left({t}_n\right)},{\boldsymbol{i}}_c\left({t}_n-\tau \right){\left|{}_{{\boldsymbol{v}}_c\left({t}_n\right)},\dots, {\boldsymbol{i}}_c\left({t}_n-{N}_c\tau \right)\right|}_{{\boldsymbol{v}}_c\left({t}_n\right)},{\boldsymbol{w}}_2\Big)}{\partial {w}_{2,i}}\cdot {W}_N\left(n,k\right)\end{array}} $$
(15)
where \( {\boldsymbol{G}}_c\left({t}_n\right) \), evaluated at the mapped coarse-model voltage \( {\boldsymbol{v}}_c\left({t}_n\right) \), is the nonlinear conductance matrix of the existing coarse model at time point \( {t}_n \).
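The pattern of Eq. (15), in which time-domain sensitivities sampled at \( N_T \) points are transformed to a harmonic, can be sketched as below. The sketch rests on several assumptions: \( W_N(n,k) \) is taken to be the Fourier coefficient \( e^{-j2\pi nk/N_T} \) (its definition is not given in this section), the model is reduced to a single scalar port, and the excitation, coarse model and current mapping in `waveforms` are toy stand-ins rather than the paper's networks.

```python
import cmath
import math

def dft_weight(n, k, n_t):
    """Assumed Fourier coefficient W_N(n, k) = exp(-j*2*pi*n*k/N_T)."""
    return cmath.exp(-2j * math.pi * n * k / n_t)

def to_harmonic(samples, k):
    """(1/N_T) * sum_n x(t_n) * W_N(n, k): time samples to the k-th harmonic."""
    n_t = len(samples)
    return sum(x * dft_weight(n, k, n_t) for n, x in enumerate(samples)) / n_t

def waveforms(w2, n_t=32):
    """Toy samples over one period: (i_f(t_n), d i_f(t_n) / d w2)."""
    i_f, di_f = [], []
    for n in range(n_t):
        v_f = 1.0 + 0.5 * math.cos(2.0 * math.pi * n / n_t)  # toy excitation
        i_c = 0.05 * v_f ** 2                                # toy coarse-model current
        g = math.tanh(w2 * v_f)
        i_f.append(i_c * g)                        # toy current mapping h_ANN
        di_f.append(i_c * v_f * (1.0 - g * g))     # analytic dh_ANN/dw2
    return i_f, di_f

def harmonic_sensitivity(w2, k, n_t=32):
    """Eq. (15) pattern: Fourier-transform the time-domain sensitivity samples."""
    _, di_f = waveforms(w2, n_t)
    return to_harmonic(di_f, k)
```

Because the transform is linear, the harmonic sensitivity equals the derivative of the harmonic current itself, which can be checked with a finite difference on `to_harmonic(waveforms(w2)[0], k)`.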