Complex networks are often too large for exhaustive exploration, only partially accessible, or only partially observed. Downstream learning tasks run on such incomplete networks can produce low-quality results, and reducing a network's incompleteness is often costly and nontrivial. Network discovery algorithms that are optimized for a specific downstream learning task under resource-collection constraints are therefore of great interest. In this paper we formulate task-specific network discovery in an incomplete-network setting as a sequential decision-making problem; our downstream task is vertex classification. We propose a framework, called Network Actor Critic (NAC), which learns concepts of policy and reward in an offline setting via a deep reinforcement learning algorithm. A quantitative study is presented on several synthetic and real benchmarks. We show that offline models of reward and network discovery policies lead to significantly improved performance compared to competitive online discovery algorithms.
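The sequential decision-making framing above can be illustrated with a toy sketch (our own illustration, not the paper's NAC algorithm): the state is the subgraph observed so far, an action probes a frontier vertex, and the reward measures what the probe revealed. Here the hidden graph, the trivial smallest-id policy, and the "newly revealed vertices" reward are all illustrative placeholders; in the paper the policy is learned offline with actor-critic and the reward is tied to vertex-classification quality.

```python
# Toy sketch of network discovery as a sequential decision process.
# Hypothetical hidden graph (adjacency lists); the agent cannot see it
# directly and only learns edges by probing vertices.
TRUE_GRAPH = {
    0: [1, 2], 1: [0, 3], 2: [0, 4, 5], 3: [1], 4: [2], 5: [2, 6], 6: [5],
}

def discover(seed, budget):
    """Probe up to `budget` frontier vertices starting from `seed`."""
    probed = {seed}                          # vertices whose neighbors are known
    seen = {seed} | set(TRUE_GRAPH[seed])    # state: observed vertex set
    total_reward = 0
    for _ in range(budget):
        frontier = sorted(seen - probed)     # candidate actions
        if not frontier:
            break
        node = frontier[0]                   # placeholder policy: smallest id
        probed.add(node)                     # action: probe the chosen vertex
        newly = set(TRUE_GRAPH[node]) - seen
        total_reward += len(newly)           # reward: newly revealed vertices
        seen |= newly
    return seen, total_reward

seen, reward = discover(seed=0, budget=3)
# With this graph and policy: probes 1, 2, 3; reveals vertices 3, 4, 5.
```

A learned policy would replace the smallest-id rule with a network that scores frontier vertices from features of the observed subgraph, trained offline against the task-specific reward.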