In 1954, Alston S. Householder published Principles of Numerical Analysis, one of the first modern treatments on matrix decomposition that favored a (block) LU decomposition-the factorization of a matrix into the product of lower and upper triangular matrices. And now, matrix decomposition has become a core technology in machine learning, largely due to the development of the back propagation algorithm in fitting a neural network. The sole aim of this survey is to give a self-contained introduction to concepts and mathematical tools in numerical linear algebra and matrix analysis in order to seamlessly introduce matrix decomposition techniques and their applications in subsequent sections. However, we clearly realize our inability to cover all the useful and interesting results concerning matrix decomposition and given the paucity of scope to present this discussion, e.g., the separated analysis of the Euclidean space, Hermitian space, Hilbert space, and things in the complex domain. We refer the reader to literature in the field of linear algebra for a more detailed introduction to the related fields.

### 相关内容

The key elements of seismic probabilistic risk assessment studies are the fragility curves which express the probabilities of failure of structures conditional to a seismic intensity measure. A multitude of procedures is currently available to estimate these curves. For modeling-based approaches which may involve complex and expensive numerical models, the main challenge is to optimize the calls to the numerical codes to reduce the estimation costs. Adaptive techniques can be used for this purpose, but in doing so, taking into account the uncertainties of the estimates (via confidence intervals or ellipsoids related to the size of the samples used) is an arduous task because the samples are no longer independent and possibly not identically distributed. The main contribution of this work is to deal with this question in a mathematical and rigorous way. To this end, we propose and implement an active learning methodology based on adaptive importance sampling for parametric estimations of fragility curves. We prove some theoretical properties (consistency and asymptotic normality) for the estimator of interest. Moreover, we give a convergence criterion in order to use asymptotic confidence ellipsoids. Finally, the performances of the methodology are evaluated on analytical and industrial test cases of increasing complexity.

Embedded and Internet-of-Things (IoT) devices have seen an increase in adoption in many domains. The security of these devices is of great importance as they are often used to control critical infrastructure, medical devices, and vehicles. Existing solutions to isolate microcontroller (MCU) resources in order to increase their security face significant challenges such as specific hardware unavailability, Memory Protection Unit (MPU) limitations and a significant lack of Direct Memory Access (DMA) support. Nevertheless, DMA is fundamental for the power and performance requirements of embedded applications. In this paper, we present D-Box, a systematic approach to enable secure DMA operations for compartmentalization solutions of embedded applications using real-time operating systems (RTOS). D-Box defines a reference architecture and a workflow to protect DMA operations holistically. It provides practical methods to harden the kernel and define capability-based security policies for easy definition of DMA operations with strong security properties. We implemented a D-Box prototype for the Cortex-M3/M4 on top of the popular FreeRTOS-MPU (F-MPU). The D-Box procedures and a stricter security model enabled DMA operations, yet it exposed 41 times less ROP (return-orienting-programming) gadgets when compared with the standard F-MPU. D-Box adds only a 2% processor overhead while reducing the power consumption of peripheral operation benchmarks by 18.2%. The security properties and performance of D-Box were tested and confirmed on a real-world case study of a Programmable Logic Controller (PLC) application.

In this work, we study a variant of nonnegative matrix factorization where we wish to find a symmetric factorization of a given input matrix into a sparse, Boolean matrix. Formally speaking, given $\mathbf{M}\in\mathbb{Z}^{m\times m}$, we want to find $\mathbf{W}\in\{0,1\}^{m\times r}$ such that $\| \mathbf{M} - \mathbf{W}\mathbf{W}^\top \|_0$ is minimized among all $\mathbf{W}$ for which each row is $k$-sparse. This question turns out to be closely related to a number of questions like recovering a hypergraph from its line graph, as well as reconstruction attacks for private neural network training. As this problem is hard in the worst-case, we study a natural average-case variant that arises in the context of these reconstruction attacks: $\mathbf{M} = \mathbf{W}\mathbf{W}^{\top}$ for $\mathbf{W}$ a random Boolean matrix with $k$-sparse rows, and the goal is to recover $\mathbf{W}$ up to column permutation. Equivalently, this can be thought of as recovering a uniformly random $k$-uniform hypergraph from its line graph. Our main result is a polynomial-time algorithm for this problem based on bootstrapping higher-order information about $\mathbf{W}$ and then decomposing an appropriate tensor. The key ingredient in our analysis, which may be of independent interest, is to show that such a matrix $\mathbf{W}$ has full column rank with high probability as soon as $m = \widetilde{\Omega}(r)$, which we do using tools from Littlewood-Offord theory and estimates for binary Krawtchouk polynomials.

Relying on random matrix theory (RMT), this paper studies asymmetric order-$d$ spiked tensor models with Gaussian noise. Using the variational definition of the singular vectors and values of (Lim, 2005), we show that the analysis of the considered model boils down to the analysis of an equivalent spiked symmetric block-wise random matrix, that is constructed from contractions of the studied tensor with the singular vectors associated to its best rank-1 approximation. Our approach allows the exact characterization of the almost sure asymptotic singular value and alignments of the corresponding singular vectors with the true spike components, when $\frac{n_i}{\sum_{j=1}^d n_j}\to c_i\in [0, 1]$ with $n_i$'s the tensor dimensions. In contrast to other works that rely mostly on tools from statistical physics to study random tensors, our results rely solely on classical RMT tools such as Stein's lemma. Finally, classical RMT results concerning spiked random matrices are recovered as a particular case.

This paper uses the concept of algorithmic efficiency to present a unified theory of intelligence. Intelligence is defined informally, formally, and computationally. I introduce the concept of Dimensional complexity in algorithmic efficiency and deduce that an optimally efficient algorithm has zero Time complexity, zero Space complexity, and an infinite Dimensional complexity. This algorithm is then used to generate the number line.

Multiplying matrices is among the most fundamental and compute-intensive operations in machine learning. Consequently, there has been significant work on efficiently approximating matrix multiplies. We introduce a learning-based algorithm for this task that greatly outperforms existing methods. Experiments using hundreds of matrices from diverse domains show that it often runs $100\times$ faster than exact matrix products and $10\times$ faster than current approximate methods. In the common case that one matrix is known ahead of time, our method also has the interesting property that it requires zero multiply-adds. These results suggest that a mixture of hashing, averaging, and byte shuffling$-$the core operations of our method$-$could be a more promising building block for machine learning than the sparsified, factorized, and/or scalar quantized matrix products that have recently been the focus of substantial research and hardware investment.

We show that for the problem of testing if a matrix $A \in F^{n \times n}$ has rank at most $d$, or requires changing an $\epsilon$-fraction of entries to have rank at most $d$, there is a non-adaptive query algorithm making $\widetilde{O}(d^2/\epsilon)$ queries. Our algorithm works for any field $F$. This improves upon the previous $O(d^2/\epsilon^2)$ bound (SODA'03), and bypasses an $\Omega(d^2/\epsilon^2)$ lower bound of (KDD'14) which holds if the algorithm is required to read a submatrix. Our algorithm is the first such algorithm which does not read a submatrix, and instead reads a carefully selected non-adaptive pattern of entries in rows and columns of $A$. We complement our algorithm with a matching query complexity lower bound for non-adaptive testers over any field. We also give tight bounds of $\widetilde{\Theta}(d^2)$ queries in the sensing model for which query access comes in the form of $\langle X_i, A\rangle:=tr(X_i^\top A)$; perhaps surprisingly these bounds do not depend on $\epsilon$. We next develop a novel property testing framework for testing numerical properties of a real-valued matrix $A$ more generally, which includes the stable rank, Schatten-$p$ norms, and SVD entropy. Specifically, we propose a bounded entry model, where $A$ is required to have entries bounded by $1$ in absolute value. We give upper and lower bounds for a wide range of problems in this model, and discuss connections to the sensing model above.

Deep learning is the mainstream technique for many machine learning tasks, including image recognition, machine translation, speech recognition, and so on. It has outperformed conventional methods in various fields and achieved great successes. Unfortunately, the understanding on how it works remains unclear. It has the central importance to lay down the theoretic foundation for deep learning. In this work, we give a geometric view to understand deep learning: we show that the fundamental principle attributing to the success is the manifold structure in data, namely natural high dimensional data concentrates close to a low-dimensional manifold, deep learning learns the manifold and the probability distribution on it. We further introduce the concepts of rectified linear complexity for deep neural network measuring its learning capability, rectified linear complexity of an embedding manifold describing the difficulty to be learned. Then we show for any deep neural network with fixed architecture, there exists a manifold that cannot be learned by the network. Finally, we propose to apply optimal mass transportation theory to control the probability distribution in the latent space.

This research mainly emphasizes on traffic detection thus essentially involving object detection and classification. The particular work discussed here is motivated from unsatisfactory attempts of re-using well known pre-trained object detection networks for domain specific data. In this course, some trivial issues leading to prominent performance drop are identified and ways to resolve them are discussed. For example, some simple yet relevant tricks regarding data collection and sampling prove to be very beneficial. Also, introducing a blur net to deal with blurred real time data is another important factor promoting performance elevation. We further study the neural network design issues for beneficial object classification and involve shared, region-independent convolutional features. Adaptive learning rates to deal with saddle points are also investigated and an average covariance matrix based pre-conditioned approach is proposed. We also introduce the use of optical flow features to accommodate orientation information. Experimental results demonstrate that this results in a steady rise in the performance rate.

Since the invention of word2vec, the skip-gram model has significantly advanced the research of network embedding, such as the recent emergence of the DeepWalk, LINE, PTE, and node2vec approaches. In this work, we show that all of the aforementioned models with negative sampling can be unified into the matrix factorization framework with closed forms. Our analysis and proofs reveal that: (1) DeepWalk empirically produces a low-rank transformation of a network's normalized Laplacian matrix; (2) LINE, in theory, is a special case of DeepWalk when the size of vertices' context is set to one; (3) As an extension of LINE, PTE can be viewed as the joint factorization of multiple networks' Laplacians; (4) node2vec is factorizing a matrix related to the stationary distribution and transition probability tensor of a 2nd-order random walk. We further provide the theoretical connections between skip-gram based network embedding algorithms and the theory of graph Laplacian. Finally, we present the NetMF method as well as its approximation algorithm for computing network embedding. Our method offers significant improvements over DeepWalk and LINE for conventional network mining tasks. This work lays the theoretical foundation for skip-gram based network embedding methods, leading to a better understanding of latent network representation learning.

Clement Gauchy,Cyril Feau,Josselin Garnier
0+阅读 · 1月14日
Alejandro Mera,Yi Hui Chen,Ruimin Sun,Engin Kirda,Long Lu
0+阅读 · 1月13日
Sitan Chen,Zhao Song,Runzhou Tao,Ruizhe Zhang
0+阅读 · 1月13日
Mohamed El Amine Seddik,Maxime Guillaud,Romain Couillet
0+阅读 · 1月12日
Alexander Ngu
0+阅读 · 2021年12月24日
Davis Blalock,John Guttag
6+阅读 · 2021年6月21日
Maria-Florina Balcan,Yi Li,David P. Woodruff,Hongyang Zhang
3+阅读 · 2018年10月18日
Na Lei,Zhongxuan Luo,Shing-Tung Yau,David Xianfeng Gu
3+阅读 · 2018年5月31日
Jiezhong Qiu,Yuxiao Dong,Hao Ma,Jian Li,Kuansan Wang,Jie Tang
15+阅读 · 2017年12月12日

86+阅读 · 2021年6月4日

38+阅读 · 2020年12月14日

75+阅读 · 2019年10月11日

19+阅读 · 2019年10月9日

Call4Papers
10+阅读 · 2019年6月24日
CreateAMind
13+阅读 · 2019年5月22日
CreateAMind
8+阅读 · 2019年5月18日
Call4Papers
3+阅读 · 2019年4月10日
Call4Papers
5+阅读 · 2019年1月10日

10+阅读 · 2018年12月24日
CreateAMind
8+阅读 · 2018年12月10日
CreateAMind
24+阅读 · 2018年9月12日
CreateAMind
3+阅读 · 2018年4月15日
Top