DigitalCommons@WPI

We provide regression analysis-based statistical models to explain the usage of wireless spectrum across four mid-size US cities in four frequency bands. Specifically, the variations in spectrum occupancy across space, time, and frequency are investigated and compared between different sites within the city as well as with other cities. By applying the mixed-effects models, several conclusions are drawn that give the occupancy percentage and the ON time duration of the licensed signal transmission as a function of several predictor variables.


Introduction
Spectrum measurement studies conducted by wireless communications researchers have shown that the utilization of licensed wireless spectrum is relatively low [1]. This is a result of the fact that frequency bands are exclusively licensed to specific services and entities on a command-and-control basis by regulatory agencies, for example, the U.S. Federal Communications Commission (FCC), Industry Canada, and the U.K. Office of Communications (OfCom). Although such a scheme is effective in protecting the rights of the incumbent license holders, a completely new strategy of spectrum allocation is needed in order to accommodate the increasing need for efficiently utilizing the wireless spectrum. This new strategy, called dynamic spectrum access (DSA), can be enabled by highly agile wireless platforms called cognitive radios [2]. Recently, as a significant step in this direction, the FCC has adopted the initial rules for the use of unlicensed devices in TV bands [3]. Consequently, there is a need to accurately assess and characterize wireless spectrum in order to facilitate the transition to this new spectrum allocation strategy.
Some of the earlier works aiming to quantify spectrum usage within the context of DSA-oriented cognitive radio using actual real-time measurements were reported in [1,4]. A comprehensive summary of spectrum occupancy for New York City and several locations in Virginia was reported in [1]. Reference [4] presents similar results for locations in the state of Georgia. In particular, spectrum occupancy variations as a function of varying thresholds and across the different angles of arrival at the receiver were presented. In [5], a more thorough mathematical analysis based on continuous-time semi-Markov models is provided using spectrum measurement data of WLAN channels. More recently, closed-form probability distributions were presented for several fixed-bandwidth signalling channels in [6] using the datasets presented in [1], whereas in [7], a comparison of the spectrum occupancy characteristics in four mid-size US cities is provided.
Spectrum measurement-based studies similar to those described above have also been conducted outside of the United States. In [8], spectrum occupancy for several bands in the frequency range from 806 MHz to 2750 MHz in urban Auckland, New Zealand is provided. In [9], four spectrum sensing methods have been proposed, and their performance is compared for UMTS uplink and GSM 1800 uplink bands. In [10], a methodology has been developed to identify TV whitespace frequencies in the UK, using digital TV coverage maps in conjunction with a database containing their locations.
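Although the paradigm shift in wireless spectrum regulatory approaches is based on the assumption that the majority of wireless spectra are extensively underutilized by the incumbent license holders, an assumption that rests on several independently conducted measurement campaigns, there still exists a definite need to obtain a deeper understanding of this natural resource. By gaining insights into wireless spectrum occupancy characteristics, appropriate technical and legislative actions can be taken in order to support continued growth in the wireless sector. In this paper, we present a statistical analysis for the wireless spectrum occupancy across the spatial, temporal, and frequency dimensions.
The rest of this paper is organized as follows. In Section 2, the measurement setup consisting of the hardware and software tools used to collect the data is described. Then, a description of the statistical results extracted from the measured data is presented in Section 3. A brief discussion of the linear mixed-effects model followed by its application to the collected measurement data is provided in Sections 4 and 5. Finally, we conclude the paper by highlighting the key conclusions in Section 6.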

Spectrum Measurement Setup
In our measurement campaign, we used two antennas for scanning the low- and high-frequency ranges. For the low-frequency range, from 88 MHz to 1240 MHz, we used a Diamond D-220 mini-discone antenna with an operating frequency range of 100-1600 MHz. For the high-frequency range, from 1850 MHz to 2686 MHz, we used an Advanced Technical Materials (ATM) 07-18-440-NF horn antenna with an operating frequency range of 0.7-18 GHz and an aperture of 60°. This directional antenna helped us observe the variation in spectrum usage across different angles of arrival. During operation, one of these antennas is wired to an Agilent CSA series N1996A spectrum analyzer, which covers the frequency range from 100 kHz to 3 GHz and includes a low-noise amplifier (LNA). We use an in-house software tool called SQUIRREL (Spectrum Query Utility Interface for Real-time Radio Electromagnetics) to communicate remotely with the spectrum analyzer via commands issued through a simple graphical user interface (GUI) on a laptop. The GUI accepts details such as the center frequency, the span around the center frequency, and the resolution bandwidth. SQUIRREL communicates with the spectrum analyzer using Tcl (Tool Command Language) over TCP/IP. After the sweep action is performed by the spectrum analyzer, the data points are returned to the GUI in a comma-separated value format. In its current form, the GUI and the server are written in Java and can be deployed on a variety of operating systems and computers.
The details about the locations and the dates of our spectrum measurement campaign are given in Table 1. We chose five locations which were at least a mile apart from each other, so that we would be able to capture the spatial variation as we go higher in frequency in the radio frequency (RF) spectrum. We measured usage activity across approximately 70% of the wireless spectrum from 88 MHz to 2686 MHz. We omitted those bands in which the average usage has been previously reported to be extremely low and focused on the remaining bands of interest. Also, in our measurement procedure, we sweep a particular frequency band, for example, Personal Communications Service (PCS) from 1850 MHz to 1990 MHz, completely for a specific number of times and then proceed to the next band instead of scanning a wide frequency range. By performing the sweeps in this manner, our goal was to capture temporal variations over small periods of time. We chose a constant resolution bandwidth of 20 kHz, and the number of sweeps recorded per band per site is 25. Figure 1 provides a first-step summary of all the data points collected across all the frequencies in bins of 20 kHz. This plot, which is a complementary cumulative distribution function, shows the spectrum occupancy in each of the four cities as a function of energy.
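A complementary cumulative distribution function of the kind summarized in Figure 1 can be sketched in a few lines. The snippet below is only an illustration: the function name `empirical_ccdf` and the power readings are hypothetical and are not taken from the measurement campaign.

```python
from bisect import bisect_right

def empirical_ccdf(samples_dbm, levels_dbm):
    """Return, for each level, the fraction of samples strictly above it."""
    ordered = sorted(samples_dbm)
    n = len(ordered)
    if n == 0:
        raise ValueError("need at least one sample")
    # bisect_right counts the samples <= level; the remainder exceed it.
    return [(n - bisect_right(ordered, level)) / n for level in levels_dbm]

# Hypothetical power readings (dBm) for a handful of 20 kHz bins.
readings = [-110.0, -108.5, -95.0, -72.0, -109.0, -60.5, -107.5, -101.0]
levels = [-105.0, -90.0, -65.0]
ccdf = empirical_ccdf(readings, levels)
print(ccdf)  # [0.5, 0.25, 0.125]
```

Read this way, each point of the curve gives the fraction of measured bins whose energy exceeds the level on the horizontal axis.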

Spectrum Occupancy Characteristics
Figure 1 shows the trend in the occupancy irrespective of the cities, sites, time, and frequency. Although it serves the purpose of summarizing the measured results, a great deal of detail remains hidden in the data with respect to both the occupancy characteristics over time, frequency, and space, and their dependence on other influencing factors. One way of analyzing the occupancy results is presented in [7], where we have provided occupancy values in percentages across different channels, along different angles of arrival, and over several time sweeps as observed during the measurement duration. Another way of performing the analysis is to associate the measured data with certain predictor variables in a linear mixed-effects model, as we will explain below.
Due to the differences in the signal modulation involved as well as the differences in the bandwidths utilized by each channel, the energy spectral densities of signals transmitted for different wireless services can be expected to differ. Thus, the four different wireless services analyzed, namely, paging, TV, WCS, and PCS, correspond to four different predictor variables. Similarly, the four US cities are also predictor variables. Assuming that the spectrum usage is dependent on two other factors, namely, the time of the day and the day of the week, these are incorporated as well. Because our data correspond to only four mid-size US cities, we do not claim that our model is representative of all mid-size US cities. Although practical constraints make our model less general than we would like it to be, we nevertheless believe that it is indicative of the general trends in spectrum occupancy characteristics that can be expected in any typical US city. Moreover, we considered the population densities associated with the measurement sites as our random-effects term to reflect this fact. In the following sections, we provide more details regarding the occupancy values by grouping the appropriate collected data points as functions of several predictor variables. We now briefly explain the algorithm used to determine the presence or absence of the licensed user signal.
In order to show a comparison of the spectrum usage as a function of the variables mentioned above, an optimum threshold is computed using Otsu's gray-level thresholding algorithm [11] for each of the datasets. Otsu's optimum threshold provides a maximum separation between the two classes of data, namely, the signal and the noise. (There are alternative approaches for computing the threshold, some of which are explained in [4].) Our primary motivation for using Otsu's thresholding algorithm is the nature of the data collected. Our measured data are in fact samples of energy spectral density (ESD) across a band of concentration and not time samples. We cannot apply traditional signal detection-based techniques due to the total absence of phase information. Therefore, we detect the presence of the signal in the data purely from the point of view of separating the data into two distinct distributions. The optimal threshold calculated using Otsu's algorithm is known to maximize the variance between the two classes of data, namely, the signal and the noise classes. Therefore, we employ this algorithm in our analysis.
In order to apply Otsu's algorithm, a matrix $M(t_j, f_i)$ is formed from the collected data points, where the row $t_j$ contains the data points over all the frequency locations in the band of interest during one particular time instant, and the column $f_i$ contains the data points observed in that frequency bin over all time sweeps during the measurement process. The next step is to transform the contents of this matrix into gray-scale values, which gives the matrix $I(t_j, f_i)$. Applying Otsu's algorithm to the matrix $I(t_j, f_i)$ gives the required optimum threshold; all the values below this threshold are classified as noise and the rest as signal.
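The thresholding step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the gray-scale mapping is assumed here to be a min-max scaling to 256 integer levels, values equal to the threshold are assigned to the noise class, and the function names and sample matrix are hypothetical.

```python
def to_gray(matrix, levels=256):
    """Min-max scale a 2-D list of power values to integer gray levels (assumed mapping)."""
    flat = [v for row in matrix for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        raise ValueError("matrix is constant; no threshold exists")
    scale = (levels - 1) / (hi - lo)
    return [[int(round((v - lo) * scale)) for v in row] for row in matrix]

def otsu_threshold(gray, levels=256):
    """Return the gray level t that maximizes the between-class variance.

    Entries with value <= t form the noise class, the rest the signal class.
    """
    hist = [0] * levels
    for row in gray:
        for g in row:
            hist[g] += 1
    total = sum(hist)
    sum_all = sum(g * c for g, c in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = 0    # number of entries in the noise class
    sum0 = 0  # sum of gray levels in the noise class
    for t in range(levels - 1):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        # Between-class variance, up to the constant factor 1 / total**2.
        between = w0 * w1 * (mean0 - mean1) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t

# Hypothetical power values (dBm): rows are sweeps, columns are frequency bins.
M = [[-110, -108, -60, -109],
     [-107, -62, -61, -110]]
gray = to_gray(M)
threshold = otsu_threshold(gray)
mask = [[g > threshold for g in row] for row in gray]
print(threshold, mask)
```

For this toy matrix the three strong readings end up in the signal class and the remaining five in the noise class.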
Performing row-wise additions on the thresholded matrix (with signal entries counted as 1 and noise entries as 0) and dividing each element of the resulting column vector by the total number of frequency locations gives the fraction of occupied bins in each sweep; expressed as a percentage, this is the percentage occupancy during the time period when the measurements were taken. We consider this percentage occupancy as the response variable, which is a function of predictors such as the city, the site, the time of day during which the measurements were taken, weekday/weekend, and the specific wireless service corresponding to a particular frequency band, as mentioned previously. Before proceeding to fit the spectrum occupancy percentage as a function of these variables, we provide a brief overview of the linear mixed-effects model and explain its appropriateness in modeling the above-mentioned response variable.
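The row-wise computation of the percentage occupancy can be illustrated with a short sketch. The binary mask below is hypothetical (1 marks a bin classified as signal, 0 a bin classified as noise), and `occupancy_percent` is a name chosen for this example.

```python
def occupancy_percent(mask):
    """Percentage of frequency bins flagged as signal in each sweep (row)."""
    return [100.0 * sum(row) / len(row) for row in mask]

# Hypothetical thresholded data: 3 sweeps (rows) by 4 frequency bins (columns).
mask = [
    [0, 0, 1, 0],
    [0, 1, 1, 0],
    [1, 1, 1, 0],
]
per_sweep = occupancy_percent(mask)
print(per_sweep)  # [25.0, 50.0, 75.0]
```

Each entry of `per_sweep` is one observation of the response variable for the corresponding sweep.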

An Overview of the Linear Mixed-Effects Model
The normal linear model given by the equation

$$y_i = \beta_1 x_{1i} + \beta_2 x_{2i} + \cdots + \beta_p x_{pi} + \varepsilon_i \tag{2}$$

explains the relationship between one or more independent variables, called regressor variables, and a dependent variable, called the response variable. The parameters of the model are the regression coefficients, $\beta_1, \beta_2, \ldots, \beta_p$, and the error variance, $\sigma^2$. The above model has one random-effect term, the error term $\varepsilon_i$, given by

$$\varepsilon_i \sim N(0, \sigma^2),$$

which is assumed to be independent and identically distributed (i.i.d.). Another important assumption is that the sample is drawn randomly from the population of interest. Usually, we set $x_{1i} = 1$, so that $\beta_1$ is the constant term, or intercept. Rewriting the model in matrix form yields

$$y = X\beta + \varepsilon, \qquad \varepsilon \sim N_n(0, \sigma^2 I_n),$$

where we define the following variables: (i) $y = [y_1, y_2, \ldots, y_n]^T$ is the response vector; (ii) $X$ is the $n \times p$ model matrix; (iii) $\beta = [\beta_1, \beta_2, \ldots, \beta_p]^T$ is the vector of regression coefficients; (iv) $\varepsilon = [\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n]^T$ is the vector of errors; (v) $N_n$ represents the $n$-variable multivariate normal distribution, and $I_n$ is the $n \times n$ identity matrix.
Estimating the parameters of the above model is a well-known linear least squares problem. The estimate of the regression coefficient vector is given by the expression

$$\hat{\beta} = (X^T X)^{-1} X^T y.$$

Several variants of the basic linear regression model of (2) are widely used in various areas of science. One such variant is the mixed-effects model. These models include additional random-effect terms and are appropriate for representing clustered, and therefore dependent, data arising when data are collected over time on the same entities; that is, these repeated measures data are generated by observing a number of entities repeatedly under differing experimental conditions, where the entities are assumed to constitute a random sample from a population of interest. Longitudinal data constitute a common type of repeated measures data, where the observations are ordered by time or position in space. In general, longitudinal data can be defined as repeated measures data where the observations within entities could not have been randomly assigned to the levels of a "treatment" of interest (usually time or position in space); hence, serial correlation results.
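As an illustration of the least squares estimate, the sketch below solves the normal equations $(X^T X)\hat{\beta} = X^T y$ for a small example with an intercept column and one 0/1 indicator regressor. The data values and the helper names `solve` and `ols` are hypothetical; a numerical library would normally be used instead of the hand-written solver.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("X^T X is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def ols(X, y):
    """Least squares estimate obtained from the normal equations."""
    p = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)

# First column of ones gives the intercept; second column is a 0/1 indicator.
X = [[1, 0], [1, 0], [1, 1], [1, 1]]
y = [13.0, 13.5, 14.5, 15.0]  # hypothetical responses
beta = ols(X, y)
print(beta)
```

With a single indicator regressor, the intercept estimate is the mean response of the baseline group and the second coefficient is the difference between the two group means, which is the same reading applied to the indicator variables in the fitted models of this paper.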
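Writing the linear mixed-effects model in the form shown in (2) yields

$$y_{ij} = \beta_1 x_{1ij} + \cdots + \beta_p x_{pij} + b_{i1} z_{1ij} + \cdots + b_{iq} z_{qij} + \varepsilon_{ij},$$

where $y_{ij}$ is the $j$th observation in the $i$th group, and $x_{kij}$ and $z_{kij}$ are the corresponding fixed-effects and random-effects regressors. Alternately, but equivalently, the above model can be written in matrix form as

$$y_i = X_i \beta + Z_i b_i + \varepsilon_i, \qquad b_i \sim N_q(0, \sigma^2 D), \qquad \varepsilon_i \sim N_{n_i}(0, \sigma^2 \Lambda),$$

where we define the following variables: (i) $y_i$ is the $n_i \times 1$ response vector for observations in the $i$th group; (ii) $X_i$ is the $n_i \times p$ model matrix for the fixed effects for observations in the $i$th group; (iii) $\beta$ is the $p \times 1$ vector of fixed-effects coefficients; (iv) $Z_i$ is the $n_i \times q$ model matrix for the random effects for observations in the $i$th group; (v) $b_i$ is the $q \times 1$ vector of random-effects coefficients for the $i$th group; (vi) $\varepsilon_i$ is the $n_i \times 1$ vector of errors for the $i$th group; (vii) $\sigma^2 D$ is the $q \times q$ covariance matrix for the random effects; (viii) $\sigma^2 \Lambda$ is the $n_i \times n_i$ covariance matrix for the errors in the $i$th group.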
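From the above representation, define the stacked quantities

$$y = [y_1^T, \ldots, y_K^T]^T, \quad b = [b_1^T, \ldots, b_K^T]^T, \quad X = [X_1^T, \ldots, X_K^T]^T, \quad Z = \mathrm{diag}(Z_1, \ldots, Z_K),$$

where $K$ is the number of groups and, from here on, $D$ and $\Lambda$ denote the block-diagonal covariance factors of the stacked vectors $b$ and $\varepsilon$. When the variance components $\Lambda$ and $D$ are known, the standard estimators for $\beta$ and $b$ are the generalized least squares estimator

$$\hat{\beta}_{\mathrm{lin}} = (X^T V^{-1} X)^{-1} X^T V^{-1} y, \qquad V = \Lambda + Z D Z^T,$$

and the posterior mean

$$\hat{b}_{\mathrm{lin}} = D Z^T V^{-1} (y - X \hat{\beta}_{\mathrm{lin}}).$$

The estimates $\hat{\beta}_{\mathrm{lin}}$ and $\hat{b}_{\mathrm{lin}}$ jointly maximize the function [12]

$$g_{\mathrm{lin}}(\beta, b \mid y) = -\frac{1}{2\sigma^2} (y - X\beta - Zb)^T \Lambda^{-1} (y - X\beta - Zb) - \frac{1}{2\sigma^2} b^T D^{-1} b. \tag{9}$$

For fixed $\beta$, the above function is the logarithm of the posterior density of $b$ (up to a constant), and for fixed $b$, it is the log-likelihood for $\beta$ (up to a constant). Equation (9) has two terms, a sum of squares term and a quadratic term in $b$. By transforming the quadratic term in $b$ to an equivalent sum of squares term, the optimization can be treated purely as a least squares problem. Then it is straightforward to translate it into the nonlinear setting.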
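The conversion of the quadratic term in $b$ into a sum of squares can be written out explicitly. The identity below is a sketch under the assumption that $\Lambda$ and $D$ are positive definite, so that symmetric square roots $\Lambda^{-1/2}$ and $D^{-1/2}$ of their inverses exist; the augmented quantities $\tilde{y}$, $\tilde{X}$, and $\tilde{Z}$ are notation introduced here for illustration only.

```latex
\[
(y - X\beta - Zb)^{T} \Lambda^{-1} (y - X\beta - Zb) + b^{T} D^{-1} b
  = \bigl\lVert \tilde{y} - \tilde{X}\beta - \tilde{Z} b \bigr\rVert^{2},
\]
\[
\tilde{y} = \begin{bmatrix} \Lambda^{-1/2} y \\ 0 \end{bmatrix}, \qquad
\tilde{X} = \begin{bmatrix} \Lambda^{-1/2} X \\ 0 \end{bmatrix}, \qquad
\tilde{Z} = \begin{bmatrix} \Lambda^{-1/2} Z \\ -D^{-1/2} \end{bmatrix}.
\]
```

The upper block of the augmented residual reproduces the weighted sum of squares, and the lower block equals $D^{-1/2} b$, whose squared norm is $b^T D^{-1} b$. Maximizing the function over $\beta$ and $b$ is therefore equivalent to an ordinary least squares fit of $\tilde{y}$ on $[\tilde{X}, \tilde{Z}]$.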

Linear Mixed-Effects Model Applied to Real-Time Wireless Spectrum Analysis

Regression Fit for Percentage Spectrum Occupancy.
The model for the analysis of the spectrum occupancy percentage using a linear mixed model is as follows:

$$\mathrm{Occ.Perc}_{ij} = \beta_0 + \beta_1 \mathrm{TV}_{ij} + \beta_2 \mathrm{PCS}_{ij} + \beta_3 \mathrm{WCS}_{ij} + \beta_4 \mathrm{roch}_{ij} + \beta_5 \mathrm{buff}_{ij} + \beta_6 \mathrm{pitt}_{ij} + \beta_7 \mathrm{AN}_{ij} + \beta_8 \mathrm{weekend}_{ij} + b_{i0} + b_{i1} \mathrm{PD}_{ij} + \varepsilon_{ij}. \tag{10}$$

As seen from the above model, we have selected three indicator variables (i.e., either 1 or 0) for the types of the wireless service (TV, PCS, WCS), three indicator variables for the cities (Rochester, Buffalo, Pittsburgh), one indicator variable for afternoon/before noon, and one indicator variable for weekend/weekday. The intercept represents the spectrum occupancy in the paging band for Worcester, Massachusetts. As mentioned previously, the response variable in the regression analysis that we considered is the percentage spectrum occupancy, which is calculated after applying Otsu's thresholding algorithm. Also, notice that the population density (PD) of the sites is chosen as the random-effects term, which is specific to each of the 20 groups (4 cities × 5 sites). Since we collected 25 wireless spectrum sweeps at each of the 5 sites in each city, the population density is chosen as the random effect that differs among the sites. Moreover, the population density is rounded up to the next multiple of 100. Thus, discrete values are considered, which helps in the interpretation of the obtained results. Fitting the linear mixed model gives the results in Table 2. The parameters associated with the random effects are as follows: standard deviation of the intercept = 2.14, standard deviation of the population density effect = 0.12, and correlation coefficient between the two = 0.007. From the above random effects, the covariance matrix of the random effects [13] can be calculated as follows:

$$D = \begin{bmatrix} 2.14^2 & 2.14 \times 0.12 \times 0.007 \\ 2.14 \times 0.12 \times 0.007 & 0.12^2 \end{bmatrix}. \tag{11}$$

Interpretation of the Obtained Regression Fit.
From Table 2 (in Tables 2 and 3, DF is short for degrees of freedom), we see that the percentage spectrum occupancy for Worcester in the paging band is 13.28% with a P-value of < .0001. With all other regressors remaining constant, the percentage spectrum occupancy for the city of Rochester in the paging band increases to 14.65%; that is, it is 1.37 percentage points higher than that of the city of Worcester, with the associated P-value being $7 \times 10^{-4}$. Similarly, with the city under consideration, the type of the wireless service, and the time of the day remaining constant, the spectrum occupancy decreases by 1.12 percentage points on the weekends, with a P-value of < .0001. Notice that we have obtained all of the above coefficients at very low P-values, indicating the statistical significance of each of the regressors. Also, the structure of the $D$ matrix, which is almost diagonal, suggests that the normality assumption on the random effects is valid. The plot of the standardized residuals shown in Figure 2(a) also supports this assumption on the residuals, since approximately 95% of the residuals lie in the range $[-1.96, +1.96]$; that is, they follow a standard normal distribution very closely.
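The way the fixed-effect estimates combine can be illustrated with the three values quoted in the text. The sketch below uses only those values and ignores the random-effects contribution of the individual sites; the constant and function names are hypothetical, and the baseline (Worcester, paging band, weekday) follows the description of the intercept above.

```python
# Fixed-effect estimates quoted in the text, in percentage points.
INTERCEPT = 13.28  # Worcester, paging band, weekday
ROCHESTER = 1.37   # shift for Rochester relative to Worcester
WEEKEND = -1.12    # shift for weekend relative to weekday

def predicted_occupancy(rochester=False, weekend=False):
    """Fixed-effects part of the fit for the paging band, all else at baseline."""
    return INTERCEPT + ROCHESTER * rochester + WEEKEND * weekend

rochester_weekday = predicted_occupancy(rochester=True)
rochester_weekend = predicted_occupancy(rochester=True, weekend=True)
print(round(rochester_weekday, 2))  # 14.65
print(round(rochester_weekend, 2))  # 13.53
```

Because the model is additive in the indicator variables, each coefficient shifts the predicted occupancy by the same amount regardless of the values of the other regressors.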

Regression Fit for ON Time Duration of the Licensed Signal Transmissions.
The model for the analysis of the ON time duration of the licensed signal transmissions in the four bands considered is similar to that of the spectrum occupancy percentage. Thus, it follows that

$$\mathrm{ON.time}_{ij} = \beta_0 + \beta_1 \mathrm{TV}_{ij} + \beta_2 \mathrm{PCS}_{ij} + \beta_3 \mathrm{WCS}_{ij} + \beta_4 \mathrm{roch}_{ij} + \beta_5 \mathrm{buff}_{ij} + \beta_6 \mathrm{pitt}_{ij} + \beta_7 \mathrm{AN}_{ij} + \beta_8 \mathrm{weekend}_{ij} + b_{i0} + b_{i1} \mathrm{PD}_{ij} + \varepsilon_{ij}. \tag{12}$$

In this case, the response variable in the regression analysis is the ON time duration, which is calculated after applying Otsu's thresholding algorithm. We calculated the amount of time during which the licensed signal transmission was consistently above the calculated threshold. The regressor variables are the same. Fitting the linear mixed model gives the results presented in Table 3. The parameters associated with the random effects are as follows: standard deviation of the intercept = 1.58, standard deviation of the population density effect = 0.34, and correlation coefficient between the two = 0.005.
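One way to extract ON time durations from thresholded data is to measure the lengths of consecutive above-threshold runs in a frequency bin. The sketch below is an assumption about how such a computation could look, not the procedure used in the paper; the sweep interval of 0.5 s and the sample mask are hypothetical.

```python
from itertools import groupby

def on_durations(bin_mask, sweep_interval_s):
    """Durations of consecutive above-threshold runs in one frequency bin."""
    return [
        sum(1 for _ in run) * sweep_interval_s
        for is_on, run in groupby(bin_mask)
        if is_on
    ]

# One frequency bin across 10 sweeps: 1 = above threshold, 0 = below.
mask = [0, 1, 1, 1, 0, 0, 1, 1, 0, 1]
durations = on_durations(mask, sweep_interval_s=0.5)
print(durations)  # [1.5, 1.0, 0.5]
```

The resolution of the resulting durations is limited by the time between successive sweeps of the same band.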
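From the above random effects, the covariance matrix of the random effects can be calculated as follows:

$$D = \begin{bmatrix} 1.58^2 & 1.58 \times 0.34 \times 0.005 \\ 1.58 \times 0.34 \times 0.005 & 0.34^2 \end{bmatrix}. \tag{13}$$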
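The covariance matrices of the random effects follow directly from the reported standard deviations and correlation coefficients: the diagonal entries are the squared standard deviations, and the off-diagonal entry is the product of the two standard deviations and the correlation. A short sketch, with `random_effects_cov` as a hypothetical helper name:

```python
def random_effects_cov(sd_intercept, sd_slope, corr):
    """2x2 covariance matrix from two standard deviations and a correlation."""
    cov = corr * sd_intercept * sd_slope
    return [[sd_intercept ** 2, cov], [cov, sd_slope ** 2]]

# Values reported for the occupancy fit and for the ON time fit.
d_occupancy = random_effects_cov(2.14, 0.12, 0.007)
d_on_time = random_effects_cov(1.58, 0.34, 0.005)
print(d_occupancy)
print(d_on_time)
```

In both cases the off-diagonal entries are roughly three orders of magnitude smaller than the intercept variance, which is the sense in which the matrices are described as almost diagonal.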

Interpretation of the Obtained Regression Fit.
From Table 3, we see that the ON time duration for the city of Worcester in the PCS band is 3.35 s with a P-value of < .0001. It is 1.02 s higher than that of the paging band. With all other regressors remaining constant, the ON time duration of the licensed signal transmissions for the city of Pittsburgh in the PCS band increases to 6.09 s; that is, it is 3.76 s higher than that of the city of Worcester, with the associated P-value being $2 \times 10^{-4}$. Similarly, with the city under consideration, the type of the wireless service, and the time of the day remaining constant, the ON time duration decreases by 1.87 s on the weekends, with a P-value of < .0001. Again, we have obtained all of the above coefficients at very low P-values, indicating the statistical significance of each of the regressors. Again, the normality assumption on the random effects is supported by the structure of the $D$ matrix, which is almost diagonal. We also show the quantile-quantile plot of the standardized residuals in Figure 2(b). Even though there is a slight deviation from the normal scores towards the lower tail of the distribution, we believe that it is not significant enough to seriously violate the normal distribution assumption.
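The two residual checks used in this section, the quantile-quantile plot and the fraction of standardized residuals within ±1.96, can be sketched with the Python standard library. The residual values below are hypothetical, and the plotting positions $(i - 0.5)/n$ are one common convention chosen here as an assumption.

```python
from statistics import NormalDist

def qq_points(residuals):
    """Pair standard normal quantiles with the sorted residuals."""
    n = len(residuals)
    std_normal = NormalDist()
    # Plotting positions (i - 0.5) / n avoid the undefined quantiles at 0 and 1.
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, sorted(residuals)))

def fraction_within(residuals, bound=1.96):
    """Fraction of residuals whose magnitude does not exceed the bound."""
    return sum(abs(r) <= bound for r in residuals) / len(residuals)

# Hypothetical standardized residuals.
residuals = [-2.3, -1.1, -0.4, 0.0, 0.3, 0.9, 1.2, 2.1, -0.7, 0.5]
points = qq_points(residuals)
frac = fraction_within(residuals)
print(points[0], frac)
```

If the residuals are close to standard normal, the points lie near the line of unit slope through the origin, and the fraction within ±1.96 approaches 0.95 as the sample grows.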

Conclusion
In this paper, we analyzed the spectrum occupancy characteristics for four mid-size US cities in four frequency bands from a spectrum survey point of view. A linear mixed-effects model fit was obtained, and the selected regressor variables were shown to be statistically significant. The residual plots shown are a good indicator of this. Extending the considered models to include other regressor variables without making the interpretability of the models difficult is an important area in the field of regression analysis. In future work, we plan to study other techniques available in order to explain the spectrum occupancy characteristics for the other bands.

Figure 1: Cumulative distribution functions showing spectrum occupancy for the four cities surveyed.

Figure 2: Quantile-quantile plots for the proposed linear mixed-effects models. (a) Quantile-quantile plot for the fit shown in (10). (b) Quantile-quantile plot for the fit shown in (12).

Table 1: List of spectrum measurement locations.
independently conducted measurement campaigns, there still exists a definite need to obtain a deeper understanding of this natural resource. By gaining insights into wireless spectrum occupancy characteristics, appropriate technical and legislative actions can be taken in order to support continued growth in the wireless sector. In this paper, we present a statistical analysis for the wireless spectrum occupancy across the spatial, temporal, and frequency dimensions.