Descent directions such as movement towards Frank-Wolfe vertices, away steps, in-face away steps and pairwise directions have been an important design consideration in conditional gradient descent (CGD) variants. In this work, we attempt to demystify the impact of movement in these directions towards attaining constrained minimizers. The best local direction of descent is the directional derivative of the projection of the gradient, which we refer to as the $\textit{shadow}$ of the gradient. We show that the continuous-time dynamics of moving in the shadow are equivalent to those of PGD however non-trivial to discretize. By projecting gradients in PGD, one not only ensures feasibility but also is able to "wrap" around the convex region. We show that Frank-Wolfe (FW) vertices in fact recover the maximal wrap one can obtain by projecting gradients, thus providing a new perspective to these steps. We also claim that the shadow steps give the best direction of descent emanating from the convex hull of all possible away-vertices. Opening up the PGD movements in terms of shadow steps gives linear convergence, dependent on the number of faces. We combine these insights into a novel $S\small{HADOW}$-$CG$ method that uses FW steps (i.e., wrap around the polytope) and shadow steps (i.e., optimal local descent direction), while enjoying linear convergence. Our analysis develops properties of directional derivatives of projections (which may be of independent interest), while providing a unifying view of various descent directions in the CGD literature.
翻译:向 Frank- Wolfe 脊椎的移动、 向外步骤、 向外步骤、 向外步骤和双向方向的下层方向, 是有条件的梯度下移( CGD) 变量中的一个重要设计考虑因素。 在这项工作中, 我们试图解开这些方向向内运动的影响, 以达到限制最小化。 最佳的本地下降方向是梯度投影的方向衍生出, 我们称之为斜度的$textit{ shadow} 美元。 我们还表明, 阴影中运动的连续时间动态相当于 PGD 的动态, 相当于 PGD 的动态, 与 PGD 的动态相当, 通过预测PGD 的梯度方向, 不仅确保可行性, 还能在C 区域周围“ 缩小 ” 。 我们显示, 弗兰克- Wolfe 的脊椎事实上恢复了通过投影梯度得到的最大折叠, 从而为这些步骤提供了新的视角。 我们还声称, 暗色步骤提供了来自所有可能的离心的螺旋的底结构中的最佳下降方向, 。 我们的直径的直线值 。