Parameters | Values |
---|---|
Dim of embedded \(\varvec{B}_{g}\) | 128 |
Dim of \(\varvec{B}_{t}\) | 128 |
Number of attention heads | 6 |
Number of attention modules | 2 |
Dim of MLP | 256 |
Length of beam codeword | 64 |
Options for latent beam codebook size \(N_{CB}\) | 64, 128, 256 |
Batch size \(N\) | 128 |
Number of positive samples | 1 |
Number of negative samples | 127 |
Anchor \(t\) | 16 |
Upper bound of \(k\) | 8 |
Optimizer | Adam |
Learning rate | \(10^{-3}\) |
Number of epochs | 100 |
Dropout percentage | \(20\%\) |
Dataset size (\(100\%\)) | \(10\times 10^{4}\) |
Dataset split | \(70\%;30\%\) |
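
For reproducibility, the settings in the table can be collected into a single configuration object. The following is a minimal sketch, not the authors' code: the class name `TrainingConfig`, all field names, and the helper methods are hypothetical, and the consistency check (positives plus negatives equals the batch size) is an assumption read off the table values. The table does not say which subsets the \(70\%\) and \(30\%\) portions correspond to, so the split is kept as an unlabeled pair.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class TrainingConfig:
    """Hypothetical container for the hyperparameters listed in the table."""

    embed_dim_bg: int = 128          # Dim of embedded B_g
    dim_bt: int = 128                # Dim of B_t
    num_heads: int = 6               # attention heads
    num_attention_modules: int = 2
    mlp_dim: int = 256
    codeword_length: int = 64        # length of a beam codeword
    codebook_sizes: Tuple[int, ...] = (64, 128, 256)  # options for N_CB
    batch_size: int = 128            # N
    num_positives: int = 1
    num_negatives: int = 127
    anchor_t: int = 16
    max_k: int = 8                   # upper bound of k
    optimizer: str = "Adam"
    learning_rate: float = 1e-3
    num_epochs: int = 100
    dropout: float = 0.20
    dataset_size: int = 10 * 10**4   # the "100%" dataset
    split_percent: Tuple[int, int] = (70, 30)

    def check(self) -> None:
        """Raise ValueError if the values are mutually inconsistent.

        Assumption: each batch supplies the positives and negatives,
        so their counts should sum to the batch size.
        """
        if self.num_positives + self.num_negatives != self.batch_size:
            raise ValueError("positives + negatives must equal batch size")
        if sum(self.split_percent) != 100:
            raise ValueError("split percentages must sum to 100")

    def split_sizes(self) -> Tuple[int, ...]:
        """Number of samples in each split, using integer arithmetic."""
        return tuple(self.dataset_size * p // 100 for p in self.split_percent)


cfg = TrainingConfig()
cfg.check()
print(cfg.split_sizes())
```

Keeping the percentages as integers avoids floating-point rounding when computing the split sizes, and freezing the dataclass prevents accidental changes to the settings during a run.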