** Finding an optimal correspondence between point sets is a common task in computer vision. Existing techniques assume relatively simple relationships among points and do not guarantee an optimal match. We introduce an algorithm capable of exactly solving point set matching by modeling the task as hypergraph matching. The algorithm extends the classical branch and bound paradigm to select and aggregate vertices under a proposed decomposition of the multilinear objective function. The methodology is motivated by Caenorhabditis elegans, a model organism used frequently in developmental biology and neurobiology. The embryonic C. elegans contains seam cells that can act as fiducial markers allowing the identification of other nuclei during embryo development. The proposed algorithm identifies seam cells more accurately than established point-set matching methods, while providing a framework to approach other similarly complex point set matching tasks. **

** This paper shows that graph spectral embedding using the random walk Laplacian produces vector representations which are completely corrected for node degree. Under a generalised random dot product graph, the embedding provides uniformly consistent estimates of degree-corrected latent positions, with asymptotically Gaussian error. In the special case of a degree-corrected stochastic block model, the embedding concentrates about K distinct points, representing communities. These can be recovered perfectly, asymptotically, through a subsequent clustering step, without spherical projection, as commonly required by algorithms based on the adjacency or normalised, symmetric Laplacian matrices. While the estimand does not depend on degree, the asymptotic variance of its estimate does -- higher degree nodes are embedded more accurately than lower degree nodes. Our central limit theorem therefore suggests fitting a weighted Gaussian mixture model as the subsequent clustering step, for which we provide an expectation-maximisation algorithm. **

** An elastic-degenerate (ED) string is a sequence of $n$ sets of strings of total length $N$, which was recently proposed to model a set of similar sequences. The ED string matching (EDSM) problem is to find all occurrences of a pattern of length $m$ in an ED text. An $O(nm^{1.5}\sqrt{\log m}+N)$-time algorithm for EDSM is known [Aoyama et al., CPM 2018]. The standard assumption in the prior work on this question is that $N$ is substantially larger than both $n$ and $m$, and thus we would like to have a linear dependency on the former. Under this assumption, the natural open problem is whether we can decrease the 1.5 exponent in the time complexity, similarly as in the related (but, to the best of our knowledge, not equivalent) word break problem [Backurs and Indyk, FOCS 2016]. Our starting point is a conditional lower bound for EDSM. We use the popular combinatorial Boolean Matrix Multiplication (BMM) conjecture stating that there is no truly subcubic combinatorial algorithm for BMM [Abboud and Williams, FOCS 2014]. By designing an appropriate reduction we show that a combinatorial algorithm solving the EDSM problem in $O(nm^{1.5-e}+N)$ time, for any $e>0$, refutes this conjecture. Our reduction should be understood as an indication that decreasing the exponent requires fast matrix multiplication. String periodicity and fast Fourier transform are two standard tools in string algorithms. Our main technical contribution is that we successfully combine these tools with fast matrix multiplication to design a non-combinatorial $\tilde{O}(nm^{\omega-1}+N)$-time algorithm for EDSM, where $\omega$ denotes the matrix multiplication exponent. To the best of our knowledge, we are the first to combine these tools. In particular, using the fact that $\omega<2.373$ [Le Gall, ISSAC 2014; Williams, STOC 2012], we obtain an $O(nm^{1.373}+N)$-time algorithm for EDSM. **

** We present a novel spectral machine learning (SML) method in screening for pancreatic mass using CT imaging. Our algorithm is trained with approximately 30,000 images from 250 patients (50 patients with normal pancreas and 200 patients with abnormal pancreas findings) based on public data sources. A test accuracy of 94.6 percents was achieved in the out-of-sample diagnosis classification based on a total of approximately 15,000 images from 113 patients, whereby 26 out of 32 patients with normal pancreas and all 81 patients with abnormal pancreas findings were correctly diagnosed. SML is able to automatically choose fundamental images (on average 5 or 9 images for each patient) in the diagnosis classification and achieve the above mentioned accuracy. The computational time is 75 seconds for diagnosing 113 patients in a laptop with standard CPU running environment. Factors that influenced high performance of a well-designed integration of spectral learning and machine learning included: 1) use of eigenvectors corresponding to several of the largest eigenvalues of sample covariance matrix (spike eigenvectors) to choose input attributes in classification training, taking into account only the fundamental information of the raw images with less noise; 2) removal of irrelevant pixels based on mean-level spectral test to lower the challenges of memory capacity and enhance computational efficiency while maintaining superior classification accuracy; 3) adoption of state-of-the-art machine learning classification, gradient boosting and random forest. Our methodology showcases practical utility and improved accuracy of image diagnosis in pancreatic mass screening in the era of AI. **

** We provide improved upper bounds for the simultaneous sketching complexity of edit distance. Consider two parties, Alice with input $x\in\Sigma^n$ and Bob with input $y\in\Sigma^n$, that share public randomness and are given a promise that the edit distance $\mathsf{ed}(x,y)$ between their two strings is at most some given value $k$. Alice must send a message $sx$ and Bob must send $sy$ to a third party Charlie, who does not know the inputs but shares the same public randomness and also knows $k$. Charlie must output $\mathsf{ed}(x,y)$ precisely as well as a sequence of $\mathsf{ed}(x,y)$ edits required to transform $x$ into $y$. The goal is to minimize the lengths $|sx|, |sy|$ of the messages sent. The protocol of Belazzougui and Zhang (FOCS 2016), building upon the random walk method of Chakraborty, Goldenberg, and Kouck\'y (STOC 2016), achieves a maximum message length of $\tilde O(k^8)$ bits, where $\tilde O(\cdot)$ hides $\mathrm{poly}(\log n)$ factors. In this work we build upon Belazzougui and Zhang's protocol and provide an improved analysis demonstrating that a slight modification of their construction achieves a bound of $\tilde O(k^3)$. **

** Despite the remarkable recent progress, person Re-identification (Re-ID) approaches are still suffering from the failure cases where the discriminative body parts are missing. To mitigate such cases, we propose a simple yet effective Horizontal Pyramid Matching (HPM) approach to fully exploit various partial information of a given person, so that correct person candidates can be still identified even if some key parts are missing. Within the HPM, we make the following contributions to produce a more robust feature representation for the Re-ID task: 1) we learn to classify using partial feature representations at different horizontal pyramid scales, which successfully enhance the discriminative capabilities of various person parts; 2) we exploit average and max pooling strategies to account for person-specific discriminative information in a global-local manner; 3) we introduce a novel horizontal erasing operation during training to further resist the problem of missing parts and boost the robustness of feature representations. Extensive experiments are conducted on three popular benchmarks including Market-1501, DukeMTMC-reID and CUHK03. We achieve mAP scores of 83.1%, 74.5% and 59.7% on these benchmarks, which are the new state-of-the-arts. **

** In this paper, we propose a graph correspondence transfer (GCT) approach for person re-identification. Unlike existing methods, the GCT model formulates person re-identification as an off-line graph matching and on-line correspondence transferring problem. In specific, during training, the GCT model aims to learn off-line a set of correspondence templates from positive training pairs with various pose-pair configurations via patch-wise graph matching. During testing, for each pair of test samples, we select a few training pairs with the most similar pose-pair configurations as references, and transfer the correspondences of these references to test pair for feature distance calculation. The matching score is derived by aggregating distances from different references. For each probe image, the gallery image with the highest matching score is the re-identifying result. Compared to existing algorithms, our GCT can handle spatial misalignment caused by large variations in view angles and human poses owing to the benefits of patch-wise graph matching. Extensive experiments on five benchmarks including VIPeR, Road, PRID450S, 3DPES and CUHK01 evidence the superior performance of GCT model over other state-of-the-art methods. **