Architecture of a Scalable Dynamic Parallel Web Crawler with High-Speed Downloadable Capability for a Web Search Engine

Today the World Wide Web (WWW) has become a huge ocean of information, and it grows in size every day. Downloading even a fraction of this mammoth data is like sailing across a huge ocean, and it is a challenging task. To download a large portion of the data on the WWW, it has become essential to parallelize the crawling process. In this paper we present the architecture of a dynamic parallel Web crawler, christened "WEB-SAILOR," which offers a scalable approach based on the client-server model to speed up the download process on behalf of a Web search engine in a distributed, domain-set-specific environment. WEB-SAILOR eliminates overlap among the documents downloaded by multiple crawlers without incurring communication overhead among the parallel "client" crawling processes.
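The abstract gives no implementation details, so the following is only a minimal sketch of one way a server could hand out URLs by domain set so that no two client crawlers fetch the same document and clients never need to talk to each other. The names `DOMAIN_SETS`, `DEFAULT_CLIENT`, `owner_of`, and `Server` are hypothetical, as is the choice to partition by top-level domain; none of them come from the paper.

```python
from collections import defaultdict, deque
from urllib.parse import urlsplit

# Hypothetical, disjoint domain sets: each client crawler owns some top-level domains.
DOMAIN_SETS = {
    "client-0": {"com"},
    "client-1": {"org", "net"},
    "client-2": {"edu", "gov"},
}
DEFAULT_CLIENT = "client-0"  # assumption: unlisted domains fall back to one client


def owner_of(url: str) -> str:
    """Return the single client responsible for the URL's top-level domain."""
    host = urlsplit(url).hostname or ""
    tld = host.rsplit(".", 1)[-1].lower()
    for client, tlds in DOMAIN_SETS.items():
        if tld in tlds:
            return client
    return DEFAULT_CLIENT


class Server:
    """Central coordinator: clients report URLs here, never to each other."""

    def __init__(self) -> None:
        self.seen = set()                  # every URL ever accepted
        self.queues = defaultdict(deque)   # one pending queue per client

    def submit(self, url: str) -> bool:
        """Queue a URL for its owning client; return False if already known."""
        if url in self.seen:
            return False
        self.seen.add(url)
        self.queues[owner_of(url)].append(url)
        return True

    def next_url(self, client: str):
        """Hand the next pending URL to a client, or None if its queue is empty."""
        queue = self.queues[client]
        return queue.popleft() if queue else None
```

In this sketch, overlap is avoided because every URL maps to exactly one owner and the server records each accepted URL once; the cost is that the server becomes the single coordination point.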

