Transformer is a translation architecture based entirely on attention, proposed in the Google paper "Attention Is All You Need".

** Constraining linear layers in neural networks to respect symmetry transformations from a group $G$ is a common design principle for invariant networks that has found many applications in machine learning. In this paper, we consider a fundamental question that has received little attention to date: Can these networks approximate any (continuous) invariant function? We tackle the rather general case where $G\leq S_n$ (an arbitrary subgroup of the symmetric group) that acts on $\mathbb{R}^n$ by permuting coordinates. This setting includes several recent popular invariant networks. We present two main results: First, $G$-invariant networks are universal if high-order tensors are allowed. Second, there are groups $G$ for which higher-order tensors are unavoidable for obtaining universality. $G$-invariant networks consisting of only first-order tensors are of special interest due to their practical value. We conclude the paper by proving a necessary condition for the universality of $G$-invariant networks that incorporate only first-order tensors. Lastly, we propose a conjecture stating that this condition is also sufficient. **

** In many previous works, a cascaded phase-only mask (or phase-only hologram) architecture is designed for optical image encryption and watermarking. However, such a system usually cannot process multiple pairs of host images and hidden images in parallel. In our proposed scheme, multiple host images can be input to the system simultaneously, and each corresponding output hidden image is displayed in a non-overlapping sub-region of the output imaging plane. Each input host image undergoes a different optical transform in an independent channel within the same system. The multiple cascaded phase masks (up to 25 layers or even more) in the system can be effectively optimized by a wavefront matching algorithm. **

** The study was conducted to develop an appropriate model for predicting the weekly reported malaria incidence in the Philippines using the Box-Jenkins method. The data were retrieved from the website of the Department of Health (DOH) of the Philippines. The series contains 70 data points, of which 60 were used for model building and the remaining 10 for forecast evaluation. The R statistical software was used for all computations in the study. A Box-Cox transformation and differencing were applied to make the series stationary. Based on the results of the analysis, ARIMA(2, 1, 0) is the appropriate model for the weekly malaria incidence in the Philippines. **
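The two preprocessing steps named in the abstract above can be sketched as follows. This is a minimal illustration in plain Python, not the study's R code: the function names `box_cox` and `difference`, the choice `lam=0`, and the sample counts are all assumptions made for the example.

```python
import math

def box_cox(series, lam):
    """Box-Cox transform of strictly positive values; lam = 0 is the natural log."""
    if any(x <= 0 for x in series):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return [math.log(x) for x in series]
    return [(x ** lam - 1.0) / lam for x in series]

def difference(series, order=1):
    """Apply first differencing `order` times; each pass shortens the series by one."""
    out = list(series)
    for _ in range(order):
        out = [b - a for a, b in zip(out, out[1:])]
    return out

# Hypothetical weekly counts, invented for illustration only (not DOH data).
weekly_cases = [12, 15, 14, 20, 26, 24, 30, 29]

# Variance-stabilize first, then difference once (the "d = 1" in ARIMA(2, 1, 0)).
stationary_candidate = difference(box_cox(weekly_cases, lam=0), order=1)
```

In an ARIMA(p, d, q) model, `order` here plays the role of d; the autoregressive part (p = 2) would then be fitted to the differenced series, which this sketch does not attempt.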

** In this work we propose a novel method for supervised, keyshot-based video summarization by applying a conceptually simple and computationally efficient soft self-attention mechanism. Current state-of-the-art methods leverage bidirectional recurrent networks such as BiLSTM combined with attention. These networks are complex to implement and computationally demanding compared to fully connected networks. To that end, we propose a simple, self-attention-based network for video summarization which performs the entire sequence-to-sequence transformation in a single feed-forward pass and a single backward pass during training. Our method sets new state-of-the-art results on two benchmarks commonly used in this domain, TvSum and SumMe. **
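To make the idea of processing a whole sequence in one pass concrete, here is a minimal sketch of soft self-attention over a list of frame feature vectors. It is generic scaled dot-product attention with identity query, key, and value projections, which is an assumption for illustration; the abstract does not specify the paper's projections, scaling, or output layers.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(features):
    """features: list of T frame vectors of length d. Returns T context vectors.

    Assumes identity query/key/value projections (no learned weights).
    """
    d = len(features[0])
    scale = 1.0 / math.sqrt(d)
    outputs = []
    for query in features:
        # Similarity of this frame to every frame in the sequence.
        scores = [scale * sum(q * k for q, k in zip(query, key)) for key in features]
        weights = softmax(scores)
        # Weighted average of all frame vectors.
        context = [sum(w * value[j] for w, value in zip(weights, features))
                   for j in range(d)]
        outputs.append(context)
    return outputs

# Hypothetical 2-dimensional features for three frames.
contexts = self_attention([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
```

Every output vector depends on all input frames at once, with no recurrence across time steps, which is what allows the computation to be done in a single forward pass.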

** Compositional data consist of vectors of proportions whose components sum to 1. Such vectors lie in the standard simplex, which is a manifold with boundary. One issue that has been rather controversial within the field of compositional data analysis is the choice of metric on the simplex. One popular possibility has been to use the metric implied by log-transforming the data, as proposed by Aitchison [1, 2]; another popular approach has been to use the standard Euclidean metric inherited from the ambient space. Tsagris et al. [21] proposed a one-parameter family of power transformations, the $\alpha$-transformations, which include both the metric implied by Aitchison's transformation and the Euclidean metric as particular cases. Our underlying philosophy is that, with many datasets, it may make sense to use the data to help us determine a suitable metric. A related possibility is to apply the $\alpha$-transformations to a parametric family of distributions, and then estimate $\alpha$ along with the other parameters. However, as we shall see, when one follows this last approach with the Dirichlet family, some care is needed in a certain limiting case which arises $(\alpha \to 0)$, as we found out when fitting this model to real and simulated data. Specifically, when the maximum likelihood estimator of $\alpha$ is close to 0, the other parameters tend to be large. The main purpose of the paper is to study this limiting case both theoretically and numerically and to provide insight into these numerical findings. **
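The way a single power parameter can interpolate between the two metrics can be illustrated numerically. The sketch below uses an unrotated form of the $\alpha$-transformation, $(D\,u_i - 1)/\alpha$ with $u_i = x_i^\alpha / \sum_j x_j^\alpha$, which is an assumption about the definition in [21] (the published form is understood to also apply a Helmert-type rotation, omitted here). The function name `alpha_transform` and the sample composition are hypothetical.

```python
import math

def alpha_transform(x, alpha):
    """Unrotated alpha-transformation of a composition x (positive parts summing to 1).

    alpha = 0 is handled as the limiting case, the centered log-ratio transform.
    """
    D = len(x)
    if alpha == 0:
        logs = [math.log(v) for v in x]
        mean_log = sum(logs) / D
        return [l - mean_log for l in logs]
    powered = [v ** alpha for v in x]
    total = sum(powered)
    return [(D * p / total - 1.0) / alpha for p in powered]

# Hypothetical three-part composition.
x = [0.2, 0.3, 0.5]

euclidean_like = alpha_transform(x, 1.0)   # affine in x: D * x_i - 1
near_zero = alpha_transform(x, 1e-6)       # approaches the log-ratio values
log_ratio = alpha_transform(x, 0)          # centered log-ratio limit
```

With $\alpha = 1$ the map is affine in the data, so distances agree with the Euclidean metric up to a constant factor, while for $\alpha$ near 0 the values approach the centered log-ratios associated with Aitchison's approach.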

** Intensity-based image registration approaches rely on similarity measures to guide the search for geometric correspondences with high affinity between images. The properties of the measure used are vital for the robustness and accuracy of the registration. In this study, a symmetric, intensity interpolation-free, affine registration framework based on a combination of intensity and spatial information is proposed. The excellent performance of the framework is demonstrated on a combination of synthetic tests, recovering known transformations in the presence of noise, and real applications in biomedical and medical image registration, for both 2D and 3D images. The method exhibits greater robustness and higher accuracy than similarity measures in common use when inserted into a standard gradient-based registration framework available as part of the open-source Insight Segmentation and Registration Toolkit (ITK). The method is also empirically shown to have a low computational cost, making it practical for real applications. Source code is available. **

** This paper presents a high-level conceptual framework to help orient the discussion and implementation of open-endedness in evolutionary systems. Drawing upon earlier work by Banzhaf et al., three different kinds of open-endedness are identified: exploratory, expansive, and transformational. These are characterised in terms of their relationship to the search space of phenotypic behaviours. A formalism is introduced to describe three key processes required for an evolutionary process: the generation of a phenotype from a genetic description, the evaluation of that phenotype, and the reproduction with variation of individuals according to their evaluation. The distinction is made between intrinsic and extrinsic implementations of these processes. A discussion then investigates how various interactions between these processes, and their modes of implementation, can lead to open-endedness. However, an important contribution of the paper is the demonstration that these considerations relate to exploratory open-endedness only. Conditions for the implementation of the more interesting kinds of open-endedness - expansive and transformational - are also discussed, emphasizing factors such as multiple domains of behaviour, transdomain bridges, and non-additive compositional systems. These factors relate not to the generic evolutionary properties of individuals and populations, but rather to the nature of the building blocks out of which individual organisms are constructed, and the laws and properties of the environment in which they exist. The paper ends with suggestions of how the framework can be used to categorise and compare the open-ended evolutionary potential of different systems, how it might guide the design of systems with greater capacity for open-ended evolution, and how it might be further improved. **