In this section, we discuss the spatio-temporal feature of the fall, followed by the sensing model design.

### 3.1 Optical flow based analysis

To obtain a discriminative spatio-temporal feature of the fall, it is necessary to analyze how the fall differs from other normal activities; this difference is the key to detecting the fall efficiently. Based on this analysis, we design the corresponding sensing model.

We employ the optical flow method to analyze the spatio-temporal features of a human performing different activities. Optical flow is computed by estimating the pixel motion between two consecutive frames [28]. Sample images of normal activities and the fall are shown in Figure 1. The motion images are taken from a video with a sample rate of 25 frames/s. For each category of activity, we select three frames and their corresponding optical flow vector images for visualization. We divide the monitored region into four sub-regions, as shown in Figure 1a, and aggregate the horizontal vector magnitudes within each sub-region separately; we denote this quantity the horizontal motion energy (*HME*), as shown in Figure 2. The *HME* reflects the horizontal component of the human motion crossing the sub-region perpendicularly, and serves as the cue for analyzing the spatio-temporal features of different activities. As shown in Figure 2a "walking" and 2d "jogging", the *HME* peaks of the sub-regions appear one by one at roughly fixed intervals, reflecting that "walking" and "jogging" pass the four sub-regions sequentially at about the same horizontal speed. As shown in Figure 2b,c, the *HME* peaks of "sitting down" and "standing up" appear or disappear gradually, as these are controlled human activities. By contrast, the "fall" causes the *HME* outputs of adjacent sub-regions to overlap within a relatively short period of time, which corresponds to the velocity features of the fall [29], as shown in Figure 2e. These observations are consistent with the dynamics of the fall.
In [29], Wu identified two velocity features of the fall: (1) the magnitudes of both the vertical and horizontal velocities of the trunk increase dramatically during the falling phase, reaching 2 to 3 times those of any other controlled movement; (2) the increases of the vertical and horizontal velocities usually occur simultaneously, about 300-400 ms before the end of the fall process. Both features are strongly dissimilar to those of controlled human activities.
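As a concrete illustration, the per-sub-region *HME* described above can be obtained from a dense optical flow field by summing the absolute horizontal flow components inside each sub-region. The sketch below is a minimal example; the four vertical sub-region boundaries and the plain NumPy array for the flow field are our illustrative assumptions, not the exact pipeline of the experiments.

```python
import numpy as np

def horizontal_motion_energy(flow_x, n_subregions=4):
    """Sum |horizontal flow| within each vertical sub-region of a frame.

    flow_x : 2D array (H x W) of horizontal optical-flow components
             for one consecutive-frame pair.
    Returns a length-n_subregions vector of HME values.
    """
    h, w = flow_x.shape
    bounds = np.linspace(0, w, n_subregions + 1, dtype=int)
    return np.array([
        np.abs(flow_x[:, bounds[i]:bounds[i + 1]]).sum()
        for i in range(n_subregions)
    ])

# Synthetic example: motion concentrated in the second sub-region.
flow = np.zeros((120, 160))
flow[:, 40:80] = 1.5
hme = horizontal_motion_energy(flow)
```

Tracking this vector over time for a sequence of frame pairs yields the time-varying *HME* curves of Figure 2.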

Based on the above analysis, the time-varying *HME* of each sub-region is a discriminative spatio-temporal feature that can distinguish the fall from other normal activities. To leverage this feature for fall detection efficiently, the system should (1) segment the monitored region into sub-regions and (2) collect the energy variation of each sub-region with its sensors. This observation inspires our sensing model design, which is elaborated in the succeeding sections.

### 3.2 Sensing model

To capture the most discriminative spatio-temporal feature of the fall, namely the *HME* of each sub-region, the sensing model has to be designed carefully. Our model springs from the reference structure tomography (RST) paradigm, which uses multidimensional modulations to encode mappings between radiating objects and measurements [12].

The schematic diagram of our sensing model is shown in Figure 3. The *object space* refers to the space where the thermal object moves. The *measurement space* refers to the space where the PIR sensors are placed. The *reference structure* specifies the mapping from the object space to the measurement space [12], and is used to modulate the FOV of each PIR. In the case of an opaque *reference structure*, the visibility function *v*_{j}(**r**) is binary valued depending on whether the point **r** in the object space is visible to the *j* th PIR sensor:

v_{j}(\mathbf{r}) = \begin{cases} 1 & \mathbf{r}\ \text{is visible to the } j\text{th PIR} \\ 0 & \text{otherwise} \end{cases}

The function of the PIR sensors is to transform the incident radiation into measurements. The measurement of the *j* th PIR sensor is given by

m_{j}(t) = h(t) * \int_{\Omega} v_{j}(\mathbf{r})\, s(\mathbf{r},t)\, \mathrm{d}\mathbf{r}

(1)

where "*" denotes convolution, *h*(*t*) is the impulse response of the PIR sensor, Ω ⊂ ℝ^{3} is the object space covered by the FOV of the *j* th PIR sensor, *v*_{j}(**r**) is the visibility function, and *s*(**r**,*t*) is the thermal density function in the object space.

Assume that there are *M* sensors in the measurement space and that their FOVs are multiplexed. Thus, every point **r** in the object space can be associated with a binary *signature vector* [*v*_{j}(**r**)] ∈ {0, 1}^{M}, which specifies its visibility to these *M* sensors. In the object space, contiguous points with the same signature form a cell that is referred to as a *sampling cell*. As a result, the 3D object space Ω can be divided into *L* discrete non-overlapping sampling cells, denoted as Ω_{i}

\Omega = \cup_{i} \Omega_{i}, \quad \Omega_{i} \cap \Omega_{j} = \varnothing

(2)

where *i, j* = 1, ..., *L* and *i* ≠ *j*. Then (1) can be rewritten in discrete form

\begin{aligned}
m_{j}(t) &= h(t) * \sum_{i=1}^{L} \int_{\Omega_{i}} v_{j}(\mathbf{r})\, s(\mathbf{r},t)\, \mathrm{d}\mathbf{r} \\
&= h(t) * \sum_{i=1}^{L} v_{ji} \int_{\Omega_{i}} s(\mathbf{r},t)\, \mathrm{d}\mathbf{r} \\
&= \sum_{i=1}^{L} v_{ji} \left[ h(t) * \int_{\Omega_{i}} s(\mathbf{r},t)\, \mathrm{d}\mathbf{r} \right] \\
&= \sum_{i=1}^{L} v_{ji}\, s_{i}(t)
\end{aligned}

(3)

where *v*_{ji} is the *j* th element of the signature vector of Ω_{i}, and s_{i}(t) = h(t) * \int_{\Omega_{i}} s(\mathbf{r},t)\, \mathrm{d}\mathbf{r} is the sensor measurement of sampling cell Ω_{i}.

Then (3) can be written in a matrix form as

\mathbf{m}=\mathbf{Vs}

(4)

where **m** = [*m*_{j}(*t*)] ∈ ℝ^{M×1} is the measurement vector, **V** = [*v*_{ji}] ∈ {0, 1}^{M×L} is the measurement matrix determined by the visibility modulation scheme, and **s** = [*s*_{i}(*t*)] ∈ ℝ^{L×1} is the vector of sensor measurements of the sampling cells.
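For instance, with *M* = 3 sensors and *L* = 4 sampling cells, the measurement process of (4) reduces to a binary matrix-vector product at each time instant. The visibility matrix below is an arbitrary example for illustration, not the modulation scheme of our implementation:

```python
import numpy as np

# Example visibility (measurement) matrix V in {0,1}^(M x L):
# row j lists which of the L sampling cells are visible to PIR j.
V = np.array([[1, 0, 1, 0],
              [0, 1, 1, 0],
              [1, 1, 0, 1]])

# s: measurements of the L = 4 sampling cells at one time instant
# (one sample of each s_i(t)).
s = np.array([0.2, 0.0, 1.3, 0.4])

# Each PIR output is a linear combination of its visible cells: m = V s.
m = V @ s
```

Because *M* < *L*, the mapping is many-to-one; the classifier operates directly on the low-dimensional **m** rather than inverting it.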

As analyzed in Section 3.1, capturing the discriminative spatio-temporal feature of the fall requires that the system sense the time-varying *HME* of each sub-region. In our sensing model, each sampling cell corresponds to a sub-region of the monitored region, and the PIR sensors capture the time-varying *HME* of these sampling cells. Therefore, our sensing model satisfies the design requirements. Our sensing model is an intrinsically non-isomorphic model: the number of PIR sensors *M* is less than the number of sampling cells *L*, and the measurement of each PIR sensor is a linear combination of the sampling cells [30]. Nevertheless, its sensing efficiency is high, i.e., it can robustly detect the fall by processing the low-dimensional sensor data directly, because the model captures the most discriminative spatio-temporal feature of the fall through efficient spatial segmentation. In Section 5, we elaborate the system implementation and list the specification of the *reference structure*.

### 3.3 Signal feature extraction

To represent the energy variation of the time-varying PIR sensor signals, it is critical to select an appropriate feature. Because the short-time energy (*STE*) has proven effective in depicting the energy variation of sine-like waveforms [31], we adopt it as the feature of the PIR signals. The *STE* of the *n* th frame of the *j* th PIR is defined as

p_{j}(n) = \sum_{k=0}^{Z_{n}-1} \left| m_{j}(k) - avSTE_{j}(n) \right|

(5)

\text{with} \quad avSTE_{j}(n) = \frac{1}{Z_{n}} \sum_{k=0}^{Z_{n}-1} m_{j}(k)

(6)

where *j* ∈ {1, ..., *M*} is the index of the PIR sensor, *Z*_{n} is the total number of sampling points in the *n* th frame, *avSTE*_{j}(*n*) is the average energy of the sampling points, and *m*_{j}(*k*) is the signal amplitude of the *k* th sampling point.
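Equations (5) and (6) amount to the sum of absolute deviations of the samples from their frame mean. A direct NumPy sketch (the frame length and test signal are illustrative assumptions):

```python
import numpy as np

def short_time_energy(m_j, frame_len):
    """STE per frame of one PIR signal, per Eqs. (5)-(6):
    sum of |sample - frame mean| over each frame of Z_n samples."""
    n_frames = len(m_j) // frame_len
    frames = np.reshape(m_j[:n_frames * frame_len], (n_frames, frame_len))
    av = frames.mean(axis=1, keepdims=True)    # avSTE_j(n), Eq. (6)
    return np.abs(frames - av).sum(axis=1)     # p_j(n),     Eq. (5)

# Illustrative signal: one quiet frame followed by one active frame.
signal = np.concatenate([np.zeros(8),
                         np.array([1., -1., 1., -1., 1., -1., 1., -1.])])
ste = short_time_energy(signal, frame_len=8)
```

Subtracting the frame mean removes the slowly varying DC offset of the PIR output, so *p*_{j}(*n*) tracks only the motion-induced energy.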

### 3.4 Hierarchical classifier

Based on the extracted signal feature, the *STE* of each frame, we can design the corresponding classifier. Classifier design is problem-specific. Fall detection can be regarded as a binary classification problem: fall versus other normal activities. However, because it is difficult for a single classifier to accomplish this task, a coarse-to-fine strategy is a better choice [32]. Thus, we design a binary hierarchical classifier for fall detection.

The hierarchical classifier in our study is based on hidden Markov models (HMMs). HMMs have been demonstrated to be a powerful tool for modeling time-varying sequence data, such as speech [33] and video streams [34]. The parameters of an HMM can be denoted compactly by *λ* = (**A**, **B**, **Π**), where **A** = {*a*_{ij}} is the hidden-state transition probability matrix, **B** = {*b*_{i}(**p**(*n*))} denotes the probability density distribution of the observation vector, and **Π** = {*π*_{i}} is the initial state probability vector [33]. The parameters are learned from training data using the Baum-Welch method, separately for each class.

The binary hierarchical classifier we designed is a two-layer HMM model, as shown in Figure 4. The normal activities include normal horizontal activities and normal vertical activities. The first-layer HMMs classify the unknown activity into normal horizontal activities versus the rest. The horizontal activities include walking and jogging, and the rest include fall, sitting down, and standing up. In other words, we train two HMMs to separate these two groups of activities, *G*_{1} = {walking, jogging} and *G*_{2} = {fall, sitting down, standing up}. By the Bayesian rule, given the observation sequence **P** = [*p*_{j}(*n*)] and assuming equal priors, the posterior *p*(λ_{i}|**P**) is proportional to the likelihood *p*(**P**|λ_{i}). That is, we label the input sequence **P** with the HMM of the highest likelihood,

i^{*} = \arg\max_{i \in \{1,2\}} p(\mathbf{P} \mid \lambda_{i})

(7)

where λ_{1} and λ_{2} correspond to *G*_{1} and *G*_{2}, respectively.
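The likelihood *p*(**P**|λ_{i}) in (7) is typically evaluated with the forward algorithm. The sketch below uses a discrete-observation HMM for brevity (our classifier uses continuous observation densities *b*_{i}(**p**(*n*))); the two-state toy parameters are illustrative only, not trained values:

```python
import numpy as np

def forward_loglik(obs, pi, A, B):
    """Log-likelihood log p(obs | lambda) via the scaled forward
    algorithm for a discrete-observation HMM, lambda = (A, B, pi)."""
    alpha = pi * B[:, obs[0]]          # initialization
    c = alpha.sum()                    # scaling avoids underflow
    log_lik = np.log(c)
    alpha = alpha / c
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]  # induction step
        c = alpha.sum()
        log_lik += np.log(c)
        alpha = alpha / c
    return log_lik

# Toy models: lambda_1 mostly emits symbol 0, lambda_2 mostly symbol 1.
models = {
    1: (np.array([0.5, 0.5]),
        np.array([[0.9, 0.1], [0.1, 0.9]]),
        np.array([[0.9, 0.1], [0.8, 0.2]])),
    2: (np.array([0.5, 0.5]),
        np.array([[0.9, 0.1], [0.1, 0.9]]),
        np.array([[0.1, 0.9], [0.2, 0.8]])),
}
obs = np.array([1, 1, 0, 1, 1])
i_star = max(models, key=lambda i: forward_loglik(obs, *models[i]))
```

The decision rule of (7) is the final `max` over the per-model log-likelihoods.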

By the same method, the second-layer HMMs distinguish the fall from the other normal vertical activities (sitting down and standing up).