A comparative study of various techniques which can be used for summarization of Videos i.e. Video to Video conversion is presented along with respective architecture, results, strengths and shortcomings. In all approaches, a lengthy video is converted into a shorter video which aims to capture all important events that are present in the original video. The definition of 'important event' may vary according to the context, such as a sports video and a documentary may have different events which are classified as important.

0
下载
关闭预览

相关内容

模式识别 Pattern Recognition

Since its renaissance, deep learning has been widely used in various medical imaging tasks and has achieved remarkable success in many medical imaging applications, thereby propelling us into the so-called artificial intelligence (AI) era. It is known that the success of AI is mostly attributed to the availability of big data with annotations for a single task and the advances in high performance computing. However, medical imaging presents unique challenges that confront deep learning approaches. In this survey paper, we first present traits of medical imaging, highlight both clinical needs and technical challenges in medical imaging, and describe how emerging trends in deep learning are addressing these issues. We cover the topics of network architecture, sparse and noisy labels, federating learning, interpretability, uncertainty quantification, etc. Then, we present several case studies that are commonly found in clinical practice, including digital pathology and chest, brain, cardiovascular, and abdominal imaging. Rather than presenting an exhaustive literature survey, we instead describe some prominent research highlights related to these case study applications. We conclude with a discussion and presentation of promising future directions.

0
0
下载
预览

Code summarization is the task of generating readable summaries that are semantically meaningful and can accurately describe the presumed task of a software. Program comprehension has become one of the most tedious tasks for knowledge transfer. As the codebase evolves over time, the description needs to be manually updated each time with the changes made. An automatic approach is proposed to infer such captions based on benchmarked and custom datasets with comparison between the original and generated results.

0
0
下载
预览

View-out is one of the architectural design parameters that influence the occupants' health and well-being. This article conducts a critical literature review on building standards, rating systems, and scientific studies that provide evaluation methods and guidelines for quality view-out. Based on the literature review, we identify primary variables for the quality of view-out, describe each variable's criteria, and propose recommendations for window design that ensure quality view-out.

0
0
下载
预览

In this paper, we present an accurate and scalable approach to the face clustering task. We aim at grouping a set of faces by their potential identities. We formulate this task as a link prediction problem: a link exists between two faces if they are of the same identity. The key idea is that we find the local context in the feature space around an instance (face) contains rich information about the linkage relationship between this instance and its neighbors. By constructing sub-graphs around each instance as input data, which depict the local context, we utilize the graph convolution network (GCN) to perform reasoning and infer the likelihood of linkage between pairs in the sub-graphs. Experiments show that our method is more robust to the complex distribution of faces than conventional methods, yielding favorably comparable results to state-of-the-art methods on standard face clustering benchmarks, and is scalable to large datasets. Furthermore, we show that the proposed method does not need the number of clusters as prior, is aware of noises and outliers, and can be extended to a multi-view version for more accurate clustering accuracy.

0
16
下载
预览

In recent years with the rise of Cloud Computing (CC), many companies providing services in the cloud, are empowered a new series of services to their catalog, such as data mining (DM) and data processing, taking advantage of the vast computing resources available to them. Different service definition proposals have been proposed to address the problem of describing services in CC in a comprehensive way. Bearing in mind that each provider has its own definition of the logic of its services, and specifically of DM services, it should be pointed out that the possibility of describing services in a flexible way between providers is fundamental in order to maintain the usability and portability of this type of CC services. The use of semantic technologies based on the proposal offered by Linked Data (LD) for the definition of services, allows the design and modelling of DM services, achieving a high degree of interoperability. In this article a schema for the definition of DM services on CC is presented, in addition are considered all key aspects of service in CC, such as prices, interfaces, Software Level Agreement, instances or workflow of experimentation, among others. The proposal presented is based on LD, so that it reuses other schemata obtaining a best definition of the service. For the validation of the schema, a series of DM services have been created where some of the best known algorithms such as \textit{Random Forest} or \textit{KMeans} are modeled as services.

0
3
下载
预览

We study the problem of video-to-video synthesis, whose goal is to learn a mapping function from an input source video (e.g., a sequence of semantic segmentation masks) to an output photorealistic video that precisely depicts the content of the source video. While its image counterpart, the image-to-image synthesis problem, is a popular topic, the video-to-video synthesis problem is less explored in the literature. Without understanding temporal dynamics, directly applying existing image synthesis approaches to an input video often results in temporally incoherent videos of low visual quality. In this paper, we propose a novel video-to-video synthesis approach under the generative adversarial learning framework. Through carefully-designed generator and discriminator architectures, coupled with a spatio-temporal adversarial objective, we achieve high-resolution, photorealistic, temporally coherent video results on a diverse set of input formats including segmentation masks, sketches, and poses. Experiments on multiple benchmarks show the advantage of our method compared to strong baselines. In particular, our model is capable of synthesizing 2K resolution videos of street scenes up to 30 seconds long, which significantly advances the state-of-the-art of video synthesis. Finally, we apply our approach to future video prediction, outperforming several state-of-the-art competing systems.

0
6
下载
预览

The large amount of videos popping up every day, make it is more and more critical that key information within videos can be extracted and understood in a very short time. Video summarization, the task of finding the smallest subset of frames, which still conveys the whole story of a given video, is thus of great significance to improve efficiency of video understanding. In this paper, we propose a novel Dilated Temporal Relational Generative Adversarial Network (DTR-GAN) to achieve frame-level video summarization. Given a video, it can select a set of key frames, which contains the most meaningful and compact information. Specifically, DTR-GAN learns a dilated temporal relational generator and a discriminator with three-player loss in an adversarial manner. A new dilated temporal relation (DTR) unit is introduced for enhancing temporal representation capturing. The generator aims to select key frames by using DTR units to effectively exploit global multi-scale temporal context and to complement the commonly used Bi-LSTM. To ensure that the summaries capture enough key video representation from a global perspective rather than a trivial randomly shorten sequence, we present a discriminator that learns to enforce both the information completeness and compactness of summaries via a three-player loss. The three-player loss includes the generated summary loss, the random summary loss, and the real summary (ground-truth) loss, which play important roles for better regularizing the learned model to obtain useful summaries. Comprehensive experiments on two public datasets SumMe and TVSum show the superiority of our DTR-GAN over the state-of-the-art approaches.

0
5
下载
预览

Coherence plays a critical role in producing a high-quality summary from a document. In recent years, neural extractive summarization is becoming increasingly attractive. However, most of them ignore the coherence of summaries when extracting sentences. As an effort towards extracting coherent summaries, we propose a neural coherence model to capture the cross-sentence semantic and syntactic coherence patterns. The proposed neural coherence model obviates the need for feature engineering and can be trained in an end-to-end fashion using unlabeled data. Empirical results show that the proposed neural coherence model can efficiently capture the cross-sentence coherence patterns. Using the combined output of the neural coherence model and ROUGE package as the reward, we design a reinforcement learning method to train a proposed neural extractive summarizer which is named Reinforced Neural Extractive Summarization (RNES) model. The RNES model learns to optimize coherence and informative importance of the summary simultaneously. Experimental results show that the proposed RNES outperforms existing baselines and achieves state-of-the-art performance in term of ROUGE on CNN/Daily Mail dataset. The qualitative evaluation indicates that summaries produced by RNES are more coherent and readable.

0
6
下载
预览

The best summary of a long video differs among different people due to its highly subjective nature. Even for the same person, the best summary may change with time or mood. In this paper, we introduce the task of generating customized video summaries through simple text. First, we train a deep architecture to effectively learn semantic embeddings of video frames by leveraging the abundance of image-caption data via a progressive and residual manner. Given a user-specific text description, our algorithm is able to select semantically relevant video segments and produce a temporally aligned video summary. In order to evaluate our textually customized video summaries, we conduct experimental comparison with baseline methods that utilize ground-truth information. Despite the challenging baselines, our method still manages to show comparable or even exceeding performance. We also show that our method is able to generate semantically diverse video summaries by only utilizing the learned visual embeddings.

0
4
下载
预览

While advances in computing resources have made processing enormous amounts of data possible, human ability to identify patterns in such data has not scaled accordingly. Thus, efficient computational methods for condensing and simplifying data are becoming vital for extracting actionable insights. In particular, while data summarization techniques have been studied extensively, only recently has summarizing interconnected data, or graphs, become popular. This survey is a structured, comprehensive overview of the state-of-the-art methods for summarizing graph data. We first broach the motivation behind and the challenges of graph summarization. We then categorize summarization approaches by the type of graphs taken as input and further organize each category by core methodology. Finally, we discuss applications of summarization on real-world graphs and conclude by describing some open problems in the field.

0
3
下载
预览
小贴士
相关论文
S. Kevin Zhou,Hayit Greenspan,Christos Davatzikos,James S. Duncan,Bram van Ginneken,Anant Madabhushi,Jerry L. Prince,Daniel Rueckert,Ronald M. Summers
0+阅读 · 3月5日
Neural Code Summarization
Piyush Shrivastava
0+阅读 · 3月4日
Won Hee Ko,Michael G. Kent,Stefano Schiavon,Brendon Levitt,Giovanni Betti
0+阅读 · 3月4日
Zhongdao Wang,Liang Zheng,Yali Li,Shengjin Wang
16+阅读 · 2019年3月27日
Semantics of Data Mining Services in Cloud Computing
Manuel Parra-Royon,Ghislain Atemezing,J. M. Benítez
3+阅读 · 2018年10月5日
Video-to-Video Synthesis
Ting-Chun Wang,Ming-Yu Liu,Jun-Yan Zhu,Guilin Liu,Andrew Tao,Jan Kautz,Bryan Catanzaro
6+阅读 · 2018年8月20日
Yujia Zhang,Michael Kampffmeyer,Xiaodan Liang,Dingwen Zhang,Min Tan,Eric P. Xing
5+阅读 · 2018年4月30日
Yuxiang Wu,Baotian Hu
6+阅读 · 2018年4月19日
Jinsoo Choi,Tae-Hyun Oh,In So Kweon
4+阅读 · 2018年3月1日
Yike Liu,Abhilash Dighe,Tara Safavi,Danai Koutra
3+阅读 · 2017年4月12日
相关资讯
【文本摘要】Text Summarization文本摘要与注意力机制
深度学习自然语言处理
3+阅读 · 2020年3月15日
已删除
将门创投
8+阅读 · 2019年8月28日
计算机 | 国际会议信息5条
Call4Papers
3+阅读 · 2019年7月3日
计算机 | CCF推荐期刊专刊信息5条
Call4Papers
3+阅读 · 2019年4月10日
无监督元学习表示学习
CreateAMind
20+阅读 · 2019年1月4日
A Technical Overview of AI & ML in 2018 & Trends for 2019
待字闺中
10+阅读 · 2018年12月24日
笔记 | Sentiment Analysis
黑龙江大学自然语言处理实验室
7+阅读 · 2018年5月6日
视频超分辨 Detail-revealing Deep Video Super-resolution 论文笔记
统计学习与视觉计算组
16+阅读 · 2018年3月16日
【论文】变分推断(Variational inference)的总结
机器学习研究会
22+阅读 · 2017年11月16日
Top