This paper identifies a new phenomenon: when users interact with physically simulated objects in a virtual environment that is much smaller than usual, there is a mismatch between the object physics that they expect and the object physics that is actually correct. We report the findings of our study investigating the relationship between perceived realism and a physically correct approximation of realism in a small-scale virtual environment. We conducted a within-subjects experiment in which 44 subjects performed a simple object-interaction task under two physics simulation conditions while scaled down by a factor of ten in a virtual reality application. Although both conditions gave the visual impression of a scaled-down user interacting in a normal-sized environment, the physics conditions affecting the objects differed, simulating either physically correct behavior at that scale or incorrect behavior, as if a normal-sized user were interacting in a world that had been scaled up instead. We found that a significant majority of the users considered the latter condition to be the realistic one. We argue that our findings have implications for many virtual reality and telepresence applications involving interaction with physically simulated or physical objects at small scales.
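The mismatch the abstract describes can be illustrated with plain free-fall kinematics (this is a generic physics illustration, not the paper's simulation): a drop that a user scaled down by a factor of ten perceives as 1 m is physically a 0.1 m drop, so under correct physics it completes in 1/sqrt(10) of the time the user expects, making the correct behavior look unnaturally fast.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2

def fall_time(height_m):
    # time for an object to fall height_m under gravity, ignoring drag:
    # h = (1/2) * g * t^2  =>  t = sqrt(2h / g)
    return math.sqrt(2 * height_m / G)

# A drop that looks like 1 m to the scaled-down user is really 0.1 m.
t_expected = fall_time(1.0)  # what a normal-sized user would expect
t_actual = fall_time(0.1)    # physically correct at 1/10 scale

ratio = t_expected / t_actual  # sqrt(10): correct physics looks ~3.2x faster
```

The sqrt(10) speed-up is one concrete instance of why physically correct small-scale behavior can read as unrealistic to users.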
Early detection and treatment of depression is essential in promoting remission, preventing relapse, and reducing the emotional burden of the disease. Current diagnoses are primarily subjective, inconsistent across professionals, and expensive for individuals who may be in urgent need of help. This paper proposes a novel approach to automated depression detection in speech using convolutional neural networks (CNNs) and multipart interactive training. The model was tested using 2568 voice samples obtained from 77 non-depressed and 30 depressed individuals. In the experiments conducted, the data were fed to residual CNNs in the form of spectrograms, images automatically generated from the audio samples. The experimental results obtained using different ResNet architectures yielded a promising baseline accuracy of up to 77%.
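As a rough sketch of the spectrogram step, the sort of image-like input a ResNet could consume can be produced with a windowed FFT over the audio signal (window and hop sizes here are illustrative, not the paper's settings):

```python
import numpy as np

def log_spectrogram(signal, win=256, hop=128):
    # log-magnitude spectrogram: slide a Hann window over the signal,
    # take the FFT of each frame, and stack the magnitudes into a 2-D array
    window = np.hanning(win)
    frames = [signal[i:i + win] * window
              for i in range(0, len(signal) - win + 1, hop)]
    spec = np.abs(np.fft.rfft(np.stack(frames), axis=1))
    return np.log1p(spec).T  # shape: (frequency_bins, time_frames)

# a 1-second synthetic 440 Hz tone at 16 kHz as a stand-in for a voice sample
sr = 16000
t = np.arange(sr) / sr
spec = log_spectrogram(np.sin(2 * np.pi * 440 * t))
```

The resulting 2-D array can be saved or treated as a single-channel image, which is what allows image-classification architectures such as ResNet to be applied to audio.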
A speaker naming task, which finds and identifies the active speaker in a given movie or drama scene, is crucial for high-level video analysis applications such as automatic subtitle labeling and video summarization. Modern approaches have usually exploited biometric features with a gradient-based method instead of rule-based algorithms. In certain situations, however, a naive gradient-based method does not work efficiently. For example, when new characters are added to the target identification list, the neural network needs to be retrained frequently to identify the new people, which causes delays in model preparation. In this paper, we present an attention-based method which reduces the model setup time by incorporating newly added data via online adaptation without a gradient update process. We comparatively analyzed the attention-based method and existing gradient-based methods using three evaluation metrics (accuracy, memory usage, and setup time) under various controlled speaker-naming settings. We also applied existing speaker naming models and the attention-based model to real video to show that our approach achieves accuracy comparable to existing state-of-the-art models, and even higher accuracy in some cases.
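A minimal sketch of how gradient-free, attention-style identification over enrolled speakers might work (a generic illustration, not the paper's model): each known speaker is stored as an embedding, a query is matched by cosine-similarity attention, and enrolling a new speaker is just appending a row, with no retraining.

```python
import numpy as np

def identify(query, keys, names):
    # cosine-similarity "attention" over enrolled speaker embeddings:
    # normalize, score the query against every key, return the best match
    keys_n = keys / np.linalg.norm(keys, axis=1, keepdims=True)
    q = query / np.linalg.norm(query)
    scores = keys_n @ q
    return names[int(np.argmax(scores))]

# hypothetical enrolled speakers; random vectors stand in for real
# voice embeddings produced by some fixed feature extractor
rng = np.random.default_rng(0)
emb = {"alice": rng.normal(size=64), "bob": rng.normal(size=64)}
names = list(emb)
keys = np.stack([emb[n] for n in names])

# a noisy query near alice's embedding is matched to alice
who = identify(emb["alice"] + 0.1 * rng.normal(size=64), keys, names)

# enrolling a new speaker requires no gradient update, only a new row
emb["carol"] = rng.normal(size=64)
names = list(emb)
keys = np.stack([emb[n] for n in names])
who_new = identify(emb["carol"], keys, names)
```

Because setup cost is just appending an embedding rather than retraining a network, this style of matching is what makes the fast model preparation described above plausible.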