FAST: Conference on File and Storage Technologies. Explanation: a conference on file and storage technologies. Publisher: USENIX. SIT: http://dblp.uni-trier.de/db/conf/fast/

Traditional time discretization methods use a single timestep for the entire system of interest and can perform poorly when the dynamics of the system exhibits a wide range of time scales. Multirate infinitesimal step (MIS) methods (Knoth and Wolke, 1998) offer an elegant and flexible approach to efficiently integrate such systems. The slow components are discretized by a Runge-Kutta method, and the fast components are resolved by solving modified fast differential equations. Sandu (2018) developed the Multirate Infinitesimal General-structure Additive Runge-Kutta (MRI-GARK) family of methods that includes traditional MIS schemes as a subset. The MRI-GARK framework allowed the construction of the first fourth order MIS schemes. This framework also enabled the introduction of implicit methods, which are decoupled in the sense that any implicitness lies entirely within the fast or slow integrations. It was shown by Sandu that the stability of decoupled implicit MRI-GARK methods has limitations when both the fast and slow components are stiff and interact strongly. This work extends the MRI-GARK framework by introducing coupled implicit methods to solve stiff multiscale systems. The coupled approach has the potential to considerably improve the overall stability of the scheme, at the price of requiring implicit stage calculations over the entire system. Two coupling strategies are considered. The first computes coupled Runge-Kutta stages before solving a single differential equation to refine the fast solution. The second alternates between computing coupled Runge-Kutta stages and solving fast differential equations. We derive order conditions and perform the stability analysis for both strategies. The new coupled methods offer improved stability compared to the decoupled MRI-GARK schemes. The theoretical properties of the new methods are validated with numerical experiments.

Convolutional neural networks (CNNs) and their variants have led to many state-of-the-art results in various fields. However, a clear theoretical understanding of them is still lacking. Recently, multi-layer convolutional sparse coding (ML-CSC) has been proposed and proved to be equivalent to such simply stacked networks (plain networks). We argue that three factors in each layer of ML-CSC, namely the initialization, the dictionary design, and the number of iterations, greatly affect its performance. Inspired by these considerations, we propose two novel multi-layer models, the residual convolutional sparse coding model (Res-CSC) and the mixed-scale dense convolutional sparse coding model (MSD-CSC), which are closely related to the residual neural network (ResNet) and the mixed-scale (dilated) dense neural network (MSDNet), respectively. Mathematically, we derive the shortcut connection in ResNet as a special case of a new forward propagation rule for ML-CSC. We find a theoretical interpretation of the dilated convolution and dense connection in MSDNet by analyzing MSD-CSC, which gives a clear mathematical understanding of them. We implement the iterative soft thresholding algorithm (ISTA) and its fast version to solve Res-CSC and MSD-CSC, which can employ the unfolding operation for further improvements. Finally, extensive numerical experiments and comparisons with competing methods on three typical datasets demonstrate their effectiveness.
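
As a rough illustration of the ISTA solver mentioned above, the following is a minimal sketch for a single-layer, non-convolutional sparse coding problem, minimizing 0.5 * ||D x - y||^2 + lam * ||x||_1. It is not the Res-CSC or MSD-CSC model. The function names, the dense list-of-lists dictionary `D`, and the user-supplied step size are assumptions made only for this example.

```python
def soft_threshold(v, t):
    """Shrink a scalar v toward zero by t (proximal operator of t * |v|)."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def ista(D, y, lam, step, n_iter=100):
    """Plain ISTA for 0.5 * ||D x - y||^2 + lam * ||x||_1.

    D is a list of rows, y a list of floats. The step size is assumed to be
    at most 1 / L, where L is the largest eigenvalue of D^T D; it is not
    estimated here.
    """
    n_rows, n_cols = len(D), len(D[0])
    x = [0.0] * n_cols
    for _ in range(n_iter):
        # Residual r = D x - y
        residual = [
            sum(D[i][j] * x[j] for j in range(n_cols)) - y[i]
            for i in range(n_rows)
        ]
        # Gradient of the smooth term: D^T r
        grad = [
            sum(D[i][j] * residual[i] for i in range(n_rows))
            for j in range(n_cols)
        ]
        # Gradient step followed by soft thresholding
        x = [
            soft_threshold(x[j] - step * grad[j], step * lam)
            for j in range(n_cols)
        ]
    return x
```

With an identity dictionary and a unit step, the iteration reduces to soft thresholding of `y`, which makes a convenient sanity check.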

Biomedical challenges have become the de facto standard for benchmarking biomedical image analysis algorithms. While the number of challenges is steadily increasing, surprisingly little effort has been invested in ensuring high quality design, execution and reporting for these international competitions. Specifically, results analysis and visualization in the event of uncertainties have been given almost no attention in the literature. Given these shortcomings, the contribution of this paper is two-fold: (1) We present a set of methods to comprehensively analyze and visualize the results of single-task and multi-task challenges and apply them to a number of simulated and real-life challenges to demonstrate their specific strengths and weaknesses; (2) We release the open-source framework challengeR as part of this work to enable fast and wide adoption of the methodology proposed in this paper. Our approach offers an intuitive way to gain important insights into the relative and absolute performance of algorithms, which cannot be revealed by commonly applied visualization techniques. This is demonstrated by the experiments performed within this work. Our framework could thus become an important tool for analyzing and visualizing challenge results in the field of biomedical image analysis and beyond.

The Kronecker product is an important matrix operation with a wide range of applications in supporting fast linear transforms, including signal processing, graph theory, quantum computing and deep learning. In this work, we introduce a generalization of the fast Johnson-Lindenstrauss projection for embedding vectors with Kronecker product structure, the Kronecker fast Johnson-Lindenstrauss transform (KFJLT). The KFJLT reduces the embedding cost by an exponential factor of the standard fast Johnson-Lindenstrauss transform (FJLT)'s cost when applied to vectors with Kronecker structure, by avoiding explicitly forming the full Kronecker products. We prove that this computational gain comes with only a small price in embedding power: given $N = \prod_{k=1}^d n_k$, consider a finite set of $p$ points in a tensor product of $d$ constituent Euclidean spaces $\bigotimes_{k=d}^{1}\mathbb{R}^{n_k} \subset \mathbb{R}^{N}$. With high probability, a random KFJLT matrix of dimension $N \times m$ embeds the set of points up to multiplicative distortion $(1\pm \varepsilon)$ provided that $m \gtrsim \varepsilon^{-2} \cdot \log^{2d - 1} (p) \cdot \log N$. We conclude by describing a direct application of the KFJLT to the efficient solution of large-scale Kronecker-structured least squares problems for fitting the CP tensor decomposition.
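
The saving from not forming the full Kronecker product rests on the mixed-product identity (A ⊗ B)(x ⊗ y) = (A x) ⊗ (B y). The sketch below checks that identity on small dense lists. It is not the KFJLT itself, whose random construction is not given in this abstract, and all function names are hypothetical.

```python
def matvec(A, x):
    """Dense matrix-vector product, with A given as a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def kron_vec(x, y):
    """Kronecker product of two vectors: entry (i, j) is x[i] * y[j]."""
    return [xi * yj for xi in x for yj in y]


def kron_mat(A, B):
    """Explicit Kronecker product of two matrices (for checking only)."""
    return [
        [a * b for a in row_a for b in row_b]
        for row_a in A
        for row_b in B
    ]


def apply_kron_structured(A, B, x, y):
    """Compute (A kron B)(x kron y) as (A x) kron (B y).

    The large matrix A kron B is never built; only the two small
    matrix-vector products are computed.
    """
    return kron_vec(matvec(A, x), matvec(B, y))
```

For factors of sizes n_1 and n_2, the structured route applies two small operators instead of one operator acting on a vector of length n_1 * n_2.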

In networks today, the data plane handles forwarding---sending a packet to the next device in the path---and the control plane handles routing---deciding the path of the packet in the network. This architecture has limitations. First, when link failures occur, the data plane has to wait for the control plane to install new routes, and packet losses can occur due to delayed routing convergence or central controller latencies. Second, policy-compliance is not guaranteed without sophisticated configuration synthesis or controller intervention. In this paper, we take advantage of the recent advances in fast programmable switches to perform policy-compliant route computations entirely in the data plane, thus providing fast reactions to failures. D2R, our new network architecture, can provide the illusion of a network fabric that is always available and policy-compliant, even under failures. We implement our data plane in P4 and demonstrate its viability in real world topologies.

Adversarial examples of deep neural networks are receiving ever-increasing attention because they help in understanding and reducing the networks' sensitivity to their input. This is natural given the increasing applications of deep neural networks in our everyday lives. Because white-box attacks are almost always successful, it is typically only the distortion of the perturbations that matters in their evaluation. In this work, we argue that speed is important as well, especially when considering that fast attacks are required by adversarial training. Given more time, iterative methods can always find better solutions. We investigate this speed-distortion trade-off in some depth and introduce a new attack called boundary projection (BP) that improves upon existing methods by a large margin. Our key idea is that the classification boundary is a manifold in the image space: we therefore quickly reach the boundary and then optimize distortion on this manifold.

A new implementation of the canonical polyadic decomposition (CPD) is presented. It features lower computational complexity and memory usage than the available state-of-the-art implementations. The CPD of tensors is a challenging problem which has been approached in several ways. Alternating least squares algorithms were used for a long time, but their convergence properties are limited. Nonlinear least squares (NLS) algorithms, more precisely damped Gauss-Newton (dGN) algorithms, are much better in this sense, but they require inverting large Hessians, and for this reason there are just a few implementations using this approach. In this paper, we make the case to always compress the tensor, and propose a fast dGN implementation to compute the CPD.

In this paper, we describe a strategy for training neural networks for object detection in range images obtained from one type of LiDAR sensor using labeled data from a different type of LiDAR sensor. Additionally, an efficient model for object detection in range images for use in self-driving cars is presented. Currently, the highest performing algorithms for object detection from LiDAR measurements are based on neural networks. Training these networks using supervised learning requires large annotated datasets. Therefore, most research using neural networks for object detection from LiDAR point clouds is conducted on a very small number of publicly available datasets. Consequently, only a small number of sensor types are used. We use an existing annotated dataset to train a neural network that can be used with a LiDAR sensor that has a lower resolution than the one used for recording the annotated dataset. This is done by simulating data from the lower resolution LiDAR sensor based on the higher resolution dataset. Furthermore, improvements to models that use LiDAR range images for object detection are presented. The results are validated using both simulated sensor data and data from an actual lower resolution sensor mounted to a research vehicle. It is shown that the model can detect objects from 360° range images in real time.

Data sketches are approximate succinct summaries of long streams. They are widely used for processing massive amounts of data and answering statistical queries about it in real-time. Existing libraries producing sketches are very fast, but do not allow parallelism for creating sketches using multiple threads or querying them while they are being built. We present a generic approach to parallelising data sketches efficiently, while bounding the error that such parallelism introduces. Utilising relaxed semantics and the notion of strong linearisability we prove our algorithm's correctness and analyse the error it induces in two specific sketches. Our implementation achieves high scalability while keeping the error small.

Neural Tangents is a library designed to enable research into infinite-width neural networks. It provides a high-level API for specifying complex and hierarchical neural network architectures. These networks can then be trained and evaluated either at finite-width as usual or in their infinite-width limit. Infinite-width networks can be trained analytically using exact Bayesian inference or using gradient descent via the Neural Tangent Kernel. Additionally, Neural Tangents provides tools to study gradient descent training dynamics of wide but finite networks in either function space or weight space. The entire library runs out-of-the-box on CPU, GPU, or TPU. All computations can be automatically distributed over multiple accelerators with near-linear scaling in the number of devices. Neural Tangents is available at www.github.com/google/neural-tangents. We also provide an accompanying interactive Colab notebook.

The majority of document image analysis systems use a document skew detection algorithm to simplify all further processing stages. A huge number of such algorithms based on Hough transform (HT) analysis have already been proposed. Despite this, we managed to find only one work where the use of the Fast Hough Transform (FHT) was suggested to solve this problem. Unfortunately, no study of that method was provided. In this work, we propose and study a skew detection algorithm for document images which relies on FHT analysis. To measure the quality of this algorithm, we use the dataset from the problem-oriented DISEC'13 contest and its evaluation methodology. The obtained values for the AED, TOP80, and CE criteria are 0.086, 0.056, and 68.80, respectively.

We propose a new fast randomized algorithm for interpolative decomposition of matrices which utilizes CountSketch. We then extend this approach to the tensor interpolative decomposition problem introduced by Biagioni et al. (J. Comput. Phys. 281, pp. 116-134, 2015). Theoretical performance guarantees are provided for both the matrix and tensor settings. Numerical experiments on both synthetic and real data demonstrate that our algorithms maintain the accuracy of competing methods, while running in less time, achieving at least an order of magnitude speed-up on large matrices and tensors.
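
For readers unfamiliar with CountSketch, the following is a minimal sketch of applying a CountSketch operator S to the rows of a matrix A: each input row is assigned one random bucket and one random sign, and is added with that sign to the bucket's output row. How the sketched matrix is then used in the interpolative decomposition is not described in the abstract and is not shown here. The function names and the seeded generator are assumptions for this example.

```python
import random


def draw_countsketch(n, m, seed=0):
    """Draw a bucket index in [0, m) and a sign in {-1, +1} for each of n rows."""
    rng = random.Random(seed)
    buckets = [rng.randrange(m) for _ in range(n)]
    signs = [rng.choice((-1, 1)) for _ in range(n)]
    return buckets, signs


def apply_countsketch(A, buckets, signs, m):
    """Return S A, where S is the m x n CountSketch matrix defined by
    buckets and signs (one nonzero entry of +1 or -1 per column of S).

    The cost is proportional to the number of entries of A, since each
    input row is touched exactly once.
    """
    n_cols = len(A[0])
    SA = [[0.0] * n_cols for _ in range(m)]
    for row, bucket, sign in zip(A, buckets, signs):
        for j, value in enumerate(row):
            SA[bucket][j] += sign * value
    return SA
```

Passing `buckets` and `signs` explicitly keeps the application step deterministic, so the same sketch can be reused or checked by hand.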
