This section describes the methodologies for realizing heterogeneous sensing, addressing the previously identified challenges.
Conceptually, we propose the following workflow for heterogeneous spectrum sensing (see Figure 1). The initial phase of a sensing experiment consists of configuring the heterogeneous devices, sending them the instructions for starting the sensing and collecting the data. This involves creating a series of device-specific scripts. As one of the contributions of this paper, we propose a uniform way of providing the configuration to the devices and of storing the data they produce. We use a common data format (CDF) for experiment description and data storage that is device independent and machine readable. Spectrum sensing descriptions and settings are defined using this common data format, as depicted in Figure 1. In the first step, these uniform descriptions are converted into device (or testbed/infrastructure)-specific configurations and control scripts. In the second step, the results of the experiments are transformed into a common representation format. In the third and final step, the resulting uniformly described data is further processed by a set of tools to align the resolution, perform calibration and compute spectrum occupancy-related metrics.
It should be noted that the information needed in the calibration phase comes from separate experiments, which are denoted as calibration experiments in the remainder of the paper. Calibration experiments are not necessarily part of every experiment iteration, but they should be performed at least once before the real sensing measurements start.
Common data format
The proposed common data format has been developed to ease spectrum sensing experimentation across devices and testbeds and contains three main parts. The first part is the experiment abstract, the second part describes the spectrum sensing experiment itself, i.e. the so-called meta-data, and the third part contains the actual traces resulting from the experiment.
The experiment description provides a detailed account of the experiment, such as how it was performed and what kind of data was collected. At the top level, the description contains the following fields: experiment abstract, meta-information and experiment iteration(s). Below each field (except for the experiment abstract), a number of subfields are defined, as shown in Figure 2.
Experiment abstract
The experiment abstract is a high-level description of the experiment, providing a basic idea of its motivation as well as the expected output. It is possible to relate to other experiments by adding relevant information. For instance, when experiment B is a scaled extension of experiment A, the sentence ‘repetition of experiment A on a larger scale’ can be noted in the abstract of experiment B. In addition, we provide means to link to related documentation, such as publications that are based on a given experiment.
Meta-information
Meta-information is the information required for describing, understanding and evaluating the experiment. All experimental details except the data itself should be described in this field. The most important items are the description of involved devices, physical setup of the experiment, the selected signal type and frequency, as well as the description of the measurements.
The description of the involved devices is critical to reproducing the experiment. It should not be limited to a textual description but should also provide references to the relevant data sheets. Moreover, we recommend including information on the related software and, if necessary, the operating system. The bottom line is that the collected information must suffice to repeat the experiment from scratch, from finding the same devices to setting up an identical software environment.
The physical experiment setup mainly refers to the description of how devices are positioned and connected. Ideally, there should be a location map indicating the topology of the devices. Wireless experiments are sensitive to environmental factors, such as whether an experiment is conducted indoors or outdoors, or in a static or rather dynamic environment. Thus, we recommend documenting this information in the meta-information as well.
Furthermore, the operating frequency and the characteristics of the used signals are noted as additional parameters. This creates a convenient way of indexing the existing experiments, e.g. one can easily find all sensing experiments in the TV white space. Thus, it allows experimenters to reuse past experiments more efficiently.
Finally, the measurement description contains a common description of the recorded data of all the experiment iterations, allowing experimenters to understand and process the data more smoothly. It specifies the configuration used by each device (e.g. gain settings, sample frequency) and the collected data types (e.g. frequency, signal power, time stamp). In addition, each data type is associated with a measurement unit (e.g. Hz, dBm, μs). For more information related to defining measurement units, readers are referred to the IEEE 1900.6 [12] standard.
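To make this concrete, the measurement description could be captured in a machine-readable structure along the following lines. This is a hypothetical Python sketch: the field names and values are illustrative assumptions and do not reflect the exact CDF schema.

```python
# Hypothetical sketch of a measurement description in the spirit of the CDF;
# the field names are illustrative, not the actual schema.
measurement_description = {
    "devices": [
        {
            "name": "usrp_n210",                # device identifier (example)
            "configuration": {
                "gain_db": 20,                  # RF front-end gain
                "sample_rate_hz": 10e6,         # sampling frequency
                "center_frequency_hz": 500e6,   # tuned centre frequency
            },
        },
    ],
    "data_types": [
        {"name": "frequency",    "unit": "Hz"},
        {"name": "signal_power", "unit": "dBm"},
        {"name": "timestamp",    "unit": "us"},
    ],
}
```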
Experiment iteration(s)
An experiment iteration provides information related to the execution of a particular experiment round. There are two subfields in each experiment iteration: the trace description and the trace file reference. The trace description is similar to the description in the meta-information but may extend or refine the meta-information partially if necessary, as shown by the red line in Figure 2. For instance, if a set of measurements is used to compare the influence of different radio frequency (RF) front-end gain settings, the trace description is an ideal place to indicate which gain setting is used in each experiment iteration. This way, different settings among experiment iterations can be highlighted without describing the entire experiment setup over and over again.
The trace file reference is a ‘pointer’ towards the measurement data, which indicates where the measurement trace is physically stored.
A reference implementation of the CDF architecture is presented in Section 3.
Measurement resolution
Typically, one spectrum sensing trace cannot be directly compared to another, due to differences in frequency and/or time resolution. To overcome the heterogeneous frequency resolution, the easiest and most straightforward approach is to integrate the power spectral density (PSD) over a certain frequency interval and use the integrated power as the metric for comparison. This also implies that the selected interval for integration needs to be wider than the largest resolution bandwidth among all the sensing solutions.
There are different approaches to overcome the differences in time resolution. The easiest way is to apply averaging over the traces obtained within the same time window. Alternatively, instead of averaging, one can apply max-hold filtering, so that the combined trace retains every transient signal that appeared in the observation period. By using integration in the frequency domain and averaging or max-holding in the time domain, a common metric is derived from the various raw spectra; this is referred to as the common metric in the remainder of the paper. We provide a reference implementation of this processing scheme in the CDF toolbox (pw_integration function).
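A minimal sketch of this processing scheme is given below. It is not the actual pw_integration implementation from the CDF toolbox; the function names, units and array layout are assumptions made for illustration.

```python
import numpy as np

def integrate_band(freqs_hz, psd_mw_per_hz, f_start_hz, f_stop_hz):
    """Integrate a PSD (linear scale, mW/Hz) over the chosen band and return
    the in-band power in mW (assumes uniformly spaced frequency bins)."""
    mask = (freqs_hz >= f_start_hz) & (freqs_hz <= f_stop_hz)
    bin_width_hz = freqs_hz[1] - freqs_hz[0]
    return psd_mw_per_hz[mask].sum() * bin_width_hz

def common_metric(traces_mw_per_hz, freqs_hz, f_start_hz, f_stop_hz, mode="average"):
    """Reduce PSD sweeps recorded in the same time window (one sweep per row)
    to a single in-band power value via averaging or max-hold in time."""
    powers = np.array([integrate_band(freqs_hz, trace, f_start_hz, f_stop_hz)
                       for trace in traces_mw_per_hz])
    return powers.mean() if mode == "average" else powers.max()
```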
Calibration
Calibration of heterogeneous devices essentially means comparing the received power of each device to its corresponding input signal strength. The calibration process consists of four steps. First, a set of reference signals has to be selected. Second, the path loss between the signal source and the devices under calibration must be strictly controlled. Third, a suitable metric for performing the calibration has to be identified and fourth, the offset between the reference signal and the signals received by the devices has to be computed.
For the first step, it is generally advisable to use a set of diversified input signals (i.e. different bandwidths and signal strengths) so that the calibration experiment is general enough to cope with different types of input. Also, the generated signal needs to be continuous so that the recorded signal has a constant amplitude. This ensures that the sensing performance in terms of timing does not affect the performance in terms of power accuracy. The produced signal strength needs to be tuned within the dynamic range of all devices. If the signal is too strong, it may saturate the device under calibration; on the other hand, if it is too weak, the signal might be buried in noise. Both situations should be avoided. Ideally, a high-end signal generator should be used as the signal source to meet the above constraints.
For the second step, the most readily available method is to use a coaxial cable to control the path loss between the signal source and the sensing devices. An alternative is to use an anechoic chamber, where the path loss is not affected by multi-path effects.
In the third step, the received signal strength needs to be calculated from the power spectral density, which comes down to integrating over the frequency interval in which the signal is transmitted. If the integration interval differs from the signal bandwidth, the obtained metric will partially reflect the device’s noise floor instead of only the input signal and is therefore not suitable for power calibration.
Finally, in the fourth step, the power offset is computed according to Equation 1, where the transmit power is denoted as $P_{\text{tx}}$, the received power as $P_{\text{rx}}$, and the total attenuation caused by coaxial cables and splitters as $P_{\text{atten}}$:

$$ P_{\text{offset}} = P_{\text{tx}} - P_{\text{atten}} - P_{\text{rx}} \quad (1) $$
In Equation 1, $P_{\text{offset}}$ accounts for the combined heterogeneity of the RF front-end, the analog-to-digital converter (ADC) and the processing unit. However, it does not include the influence of the antenna, as the antenna is replaced by the coaxial cable connections. For devices using different types of antenna, the power offset needs to be readjusted with the antenna gain.
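As an illustration, the offset of Equation 1 could be computed as follows, assuming the received power has been obtained by integrating the PSD over the signal band (e.g. with the integrate_band helper sketched earlier) and that the transmit power and cable/splitter attenuation are known; the function name and units are assumptions.

```python
import numpy as np

def power_offset_db(p_tx_dbm, p_atten_db, rx_power_mw):
    """Compute the calibration offset of Equation 1 in dB, given the known
    transmit power (dBm), the cable/splitter attenuation (dB) and the
    received power integrated over the signal band (mW)."""
    p_rx_dbm = 10.0 * np.log10(rx_power_mw)
    return p_tx_dbm - p_atten_db - p_rx_dbm

# Example: 0 dBm reference tone, 30 dB attenuation, 1 uW measured in-band power
offset = power_offset_db(p_tx_dbm=0.0, p_atten_db=30.0, rx_power_mw=1e-3)
```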
If the relative position of the transmitter and receiver is known, the influence of the radiation pattern should also be taken into account. For omnidirectional antennas, the radiation pattern changes with the elevation angle between the transmitter and receiver, while for directional antennas, the radiation pattern varies with both horizontal and vertical angles [20]. When the relative position of transmitter and receiver is unknown, it is necessary to rotate the directional antenna several times to cover the full 360° [18].
Sometimes, $P_{\text{offset}}$ varies with the input signal strength and the settings of the sensing device (e.g. gain settings). For instance, it is mentioned in [15] that the RFX2400 daughterboard of the USRP does not have a linear input-output (IO) relationship. In this case, more measurements need to be performed to cover the different input signal strengths and sensing configurations.
Processing
Sensitivity and accuracy are two important metrics for comparing spectrum sensing devices. For heterogeneous sensitivity analysis, experimenters have converged towards a mainstream processing style, which is discussed in the first part of this section. As for power accuracy, a high-end device (i.e. a spectrum analyser) is generally used as the benchmark in various measurements. However, when it comes to large-scale heterogeneous measurements, this approach becomes very tedious. Thus, there is a need to process the data in a more elegant way, which is what we discuss in the second part of this section.
Heterogeneous sensitivity analysis
The sensitivity of a sensing device is reflected by its noise floor. Unlike power accuracy, sensitivity cannot be evaluated by the common metric derived in Section 3. This is because the noise floor is affected by the resolution bandwidth, thus the integrated power metric will always be higher than the original noise floor.
The most straightforward way is to observe the mean and variance of the spectrum trace when no signal is present. Alternatively, we can also use the receiver operating characteristic (ROC), which is obtained by expressing the probability of detection ($P_{\text{d}}$) as a function of the probability of false alarm ($P_{\text{f}}$). Some papers utilize the probability of missed detection ($P_{\text{m}}$), which is simply given by $1 - P_{\text{d}}$.
Despite the heterogeneity in power spectra, the ROC can be obtained via a common approach, illustrated by the sketch following this list:

- Record spectrum traces when no signal is present.
- Vary $P_{\text{f}}$ from 0% to 100% in small steps and determine a detection threshold for each $P_{\text{f}}$ based on the previously recorded trace.
- Apply a signal at the input of the sensing device and record a spectrum trace again.
- Compute $P_{\text{d}}$ or $P_{\text{m}}$ for all the detection thresholds determined in the second step from the trace recorded in the previous step.
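Under the assumption that the in-band power values of the noise-only and signal traces are already available as arrays, the procedure above could be sketched as follows; the function empirical_roc is illustrative and not part of the CDF toolbox.

```python
import numpy as np

def empirical_roc(noise_powers, signal_powers, num_points=100):
    """Compute an empirical ROC curve from in-band power values recorded
    without (noise_powers) and with (signal_powers) a test signal applied."""
    p_f = np.linspace(0.0, 1.0, num_points)
    # For each target false alarm rate, pick the threshold as the matching
    # quantile of the noise-only trace (P_f is the fraction of noise samples
    # exceeding the threshold).
    thresholds = np.quantile(noise_powers, 1.0 - p_f)
    # Probability of detection: fraction of signal samples above each threshold.
    p_d = np.array([(signal_powers > thr).mean() for thr in thresholds])
    return p_f, p_d
```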
The advantage of ROC analysis is that it is device independent: for a given false alarm rate, each device can have its own threshold. The only constraint is that the detection threshold should be calculated in a uniform way for all devices. This is why ROC analysis is commonly applied in heterogeneous sensitivity studies [14,15,19].
As for the method to obtain the detection threshold, there are many optimized variants [21-23]. As an example, the constant false alarm (CFA) approach [11] is described in Equation 2, where $\sigma_{n}^{2}$ denotes the variance of the noise samples, $N$ the number of spectrum samples, $P_{\text{f}}$ the target false alarm probability and $\lambda$ the calculated detection threshold:

$$ \lambda = \sigma_{n}^{2}\left(1 + \frac{Q^{-1}(P_{\text{f}})}{\sqrt{N/2}}\right) \quad (2) $$
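As an illustration, Equation 2 can be evaluated with the inverse Q-function from SciPy (norm.isf); the sketch below assumes the noise variance and the number of spectrum samples are known.

```python
from scipy.stats import norm

def cfa_threshold(noise_variance, num_samples, p_f):
    """Constant false alarm threshold of Equation 2:
    lambda = sigma_n^2 * (1 + Q^{-1}(P_f) / sqrt(N/2)),
    where Q^{-1} is the inverse of the standard normal tail probability."""
    q_inv = norm.isf(p_f)  # inverse Q-function
    return noise_variance * (1.0 + q_inv / (num_samples / 2.0) ** 0.5)

# Example: target false alarm of 5% with 1,000 spectrum samples
lam = cfa_threshold(noise_variance=1.0, num_samples=1000, p_f=0.05)
```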
Heterogeneous accuracy analysis
As stated previously, distributed heterogeneous measurements usually generate large amounts of data, which call for more efficient processing mechanisms. When processing a large dataset, the basic approach is to look at how the data is distributed, which can be achieved by computing a few statistics (e.g. mean, variance). However, to gain more insight into the data (e.g. to discover a common behaviour, or a group of data that displays similarity within the entire set), more advanced techniques, such as correlation and various linear regression algorithms, need to be involved. Essentially, we recommend using basic data mining techniques for analysing large-scale heterogeneous sensing experiments, of which the four most relevant are exemplified as follows:
- Dependency modelling - the establishment of relationships between variables. For example, the detection probability may depend on the target signal strength or on the distance between the transmitter and the sensing device.
- Outlier detection - the identification of unusual spectrum records, which could be caused by malfunctioning devices or other unknown interference.
- Regression - a statistical way to explore the relationship among variables by modelling the data with the least error.
- Clustering - the task of discovering groups and structures in the data that are in one way or another ‘similar’. In the case of spectrum sensing, this could be a group of sensing devices that are shadowed by a common obstacle.
Outlier detection is a rather basic step, which can be achieved with many statistical tools or simply by manual observation. The procedures of ‘clustering’ and ‘regression’ are addressed with a concrete experiment in Section 3. For dependency modelling, we find that the path loss model (the relationship between received signal strength and distance) is generally applicable in the case of distributed sensing measurements. More particularly, the well-known log-distance path loss model can be expressed by two parameters - the path loss exponent α and the path loss offset β:
$$ PL(d) = 20 \times \alpha \times \log_{10}(d) + \beta, \quad (3) $$
where d is the distance between the transmitter and the receiver. When using the logarithmic distance as the argument, the equation reduces to a simple linear expression. Hence, various approaches, such as least squares regression, can be used to estimate α and β.
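A minimal least-squares sketch for estimating α and β of Equation 3 is given below, assuming measured distances and path loss values in dB are available; it simply uses numpy.polyfit on the log-distance regressor.

```python
import numpy as np

def fit_path_loss(distances_m, path_loss_db):
    """Estimate alpha and beta of Equation 3 by least squares, treating
    20*log10(d) as the regressor so the model becomes linear."""
    x = 20.0 * np.log10(distances_m)
    alpha, beta = np.polyfit(x, path_loss_db, deg=1)
    return alpha, beta

# Example with three illustrative (distance, path loss) pairs
alpha, beta = fit_path_loss(np.array([10.0, 50.0, 100.0]),
                            np.array([60.0, 74.0, 80.0]))
```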
The path loss model is essentially a way to extract new parameters out of the raw data sets; it is a tool to correlate data from distributed locations. Although deriving the path loss model is not always easy or feasible, the basic idea of correlating data to extract new parameters is generally applicable and, in our experience, highly valuable.