Indoor location identification and navigation need to be as simple, seamless, and ubiquitous as their outdoor GPS-based counterparts. It would be of great convenience to mobile users to be able to continue navigating seamlessly as they move from a GPS-clear outdoor environment into an indoor environment or a GPS-obstructed outdoor environment such as a tunnel or forest. Existing infrastructure-based indoor localization systems lack such capability and potentially face several critical technical challenges, including high installation cost, centralization, lack of reliability, poor localization accuracy, poor adaptation to the dynamics of the surrounding environment, latency, system-level and computational complexity, repetitive labor-intensive parameter tuning, and user privacy concerns. To this end, this paper presents a novel mechanism with the potential to overcome most (if not all) of these challenges. The proposed mechanism is simple, distributed, adaptive, collaborative, and cost-effective. Based on the proposed algorithm, a blind mobile device can use, as GPS-like reference nodes, either in-range location-aware compatible mobile devices or preinstalled low-cost infrastructure-less location-aware beacon nodes. The proposed approach is model-based and calibration-free: it uses the received signal strength to periodically and collaboratively measure and update the radio-frequency characteristics of the operating environment and thereby estimate the distances to the reference nodes. The blind device then uses trilateration to identify its own location, much as a GPS receiver does. Simulation and empirical testing indicate that the proposed approach can potentially serve as the core of future localization systems for indoor and GPS-obstructed environments.
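The abstract names two building blocks without giving code: RSS-based ranging under a propagation model whose parameters are collaboratively updated, followed by trilateration. The following is a minimal Python sketch of that pipeline, assuming the common log-distance path-loss model; the parameter values (reference RSS `rssi_d0`, path-loss exponent `n`) and the anchor/RSS numbers are illustrative assumptions, not values from the paper.

```python
import numpy as np

def rss_to_distance(rssi, rssi_d0=-40.0, d0=1.0, n=2.5):
    """Distance estimate from RSS via the log-distance path-loss model.

    rssi_d0 is the RSS (dBm) at reference distance d0, and n is the
    path-loss exponent; in the proposed scheme both would be measured
    and refreshed collaboratively rather than fixed, as they are here.
    """
    return d0 * 10 ** ((rssi_d0 - rssi) / (10.0 * n))

def trilaterate(anchors, distances):
    """Least-squares position fix from >= 3 anchor positions and ranges.

    Linearizes the circle equations |x - a_i|^2 = d_i^2 by subtracting
    the last equation, then solves the resulting A x = b system.
    """
    anchors = np.asarray(anchors, dtype=float)
    d = np.asarray(distances, dtype=float)
    A = 2.0 * (anchors[:-1] - anchors[-1])
    b = (d[-1] ** 2 - d[:-1] ** 2
         + np.sum(anchors[:-1] ** 2, axis=1)
         - np.sum(anchors[-1] ** 2))
    pos, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pos

# Illustrative scenario: three reference nodes and the RSS readings a
# blind device might observe from them (all values hypothetical).
anchors = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
rss_readings = [-62.0, -55.0, -58.0]
ranges = [rss_to_distance(r) for r in rss_readings]
print(trilaterate(anchors, ranges))  # estimated (x, y) of the blind device
```

With more than three in-range reference nodes, the same least-squares step simply gains rows, which is one way the collaborative, multi-node character of the scheme could improve the fix.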
Movies provide us with a wealth of visual content as well as engaging stories. Existing methods have shown that understanding movie stories through visual content alone remains a hard problem. In this paper, to answer questions about movies, we put forward a Layered Memory Network (LMN) that represents frame-level and clip-level movie content with a Static Word Memory module and a Dynamic Subtitle Memory module, respectively. Specifically, we first extract words and sentences from the training movie subtitles. The hierarchically formed movie representations learned by LMN then encode not only the correspondence between words and the visual content inside frames, but also the temporal alignment between sentences and frames inside movie clips. We also extend our LMN model into three variant frameworks to demonstrate its extensibility. We conduct extensive experiments on the MovieQA dataset. With only visual content as input, LMN with frame-level representations obtains a large performance improvement. When subtitles are incorporated into LMN to form clip-level representations, we achieve state-of-the-art performance on the online 'Video+Subtitles' evaluation task. This strong performance demonstrates that the proposed LMN framework is effective and that the hierarchically formed movie representations hold good potential for movie question answering applications.
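The abstract does not give LMN's equations, but the frame-level step it describes, encoding the correspondence between subtitle words and visual content inside a frame, is the kind of read a memory network performs. The sketch below is a hedged Python/NumPy illustration of such a read, not the paper's actual module: a frame feature queries a word-embedding memory via dot-product attention, and the attended word content is fused into the frame representation. The function name `word_memory_read`, the projection `W_q`, and all dimensions are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def word_memory_read(frame_feat, word_mem, W_q):
    """One attention-style read of a static word memory by a frame.

    frame_feat : (d_v,)   visual feature of a single frame
    word_mem   : (n, d_w) embeddings of n subtitle words (the memory)
    W_q        : (d_w, d_v) projection aligning visual and word spaces
    Returns the projected frame feature concatenated with the attended
    word content, i.e. a word-grounded frame representation.
    """
    query = W_q @ frame_feat         # project the frame into word space
    scores = word_mem @ query        # dot-product relevance, shape (n,)
    attn = softmax(scores)           # attention weights over memory slots
    read = attn @ word_mem           # weighted sum of word embeddings
    return np.concatenate([query, read])

# Illustrative dimensions: a 4096-d frame feature attending over
# fifty 300-d word embeddings (all values hypothetical).
rng = np.random.default_rng(0)
frame = rng.normal(size=4096)
memory = rng.normal(size=(50, 300))
W_q = rng.normal(size=(300, 4096)) * 0.01
rep = word_memory_read(frame, memory, W_q)
print(rep.shape)  # (600,)
```

A clip-level module in the spirit of the Dynamic Subtitle Memory could then apply an analogous read over sentence embeddings, with the memory restricted or weighted by temporal alignment between sentences and frames.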