Payment channel networks, and the Lightning Network in particular, seem to offer a solution to the lack of scalability and privacy offered by Bitcoin and other blockchain-based cryptocurrencies. Previous research has focused on the scalability, availability, and crypto-economics of the Lightning Network, but relatively little attention has been paid to exploring the level of privacy it achieves in practice. This paper presents a thorough analysis of the privacy offered by the Lightning Network, by presenting several attacks that exploit publicly available information about the network in order to learn information that is designed to be kept secret, such as how many coins a node has available or who the sender and recipient are in a payment routed through the network.
Post-fault dynamics of short-term voltage stability (SVS) present spatial-temporal characteristics, but the existing data-driven methods for online SVS assessment fail to incorporate such characteristics into their models effectively. Confronted with this dilemma, this paper develops a novel spatial-temporal graph convolutional network (STGCN) to address this problem. The proposed STGCN utilizes graph convolution to integrate network topology information into the learning model to exploit spatial information. Then, it adopts one-dimensional convolution to exploit temporal information. In this way, it models the spatial-temporal characteristics of SVS with complete convolutional structures. After that, a node layer and a system layer are strategically designed in the STGCN for SVS assessment. The proposed STGCN incorporates the characteristics of SVS into the data-driven classification model. It can result in higher assessment accuracy, better robustness and adaptability than conventional methods. Besides, parameters in the system layer can provide valuable information about the influences of individual buses on SVS. Test results on the real-world Guangdong Power Grid in South China verify the effectiveness of the proposed network.
Credit networks rely on decentralized, pairwise trust relationships (channels) to exchange money or goods. Credit networks arise naturally in many financial systems, including the recent construct of payment channel networks in blockchain systems. An important performance metric for these networks is their transaction throughput. However, predicting the throughput of a credit network is nontrivial. Unlike traditional communication channels, credit channels can become imbalanced; they are unable to support more transactions in a given direction once the credit limit has been reached. This potential for imbalance creates a complex dependency between a network's throughput and its topology, path choices, and the credit balances (state) on every channel. Even worse, certain combinations of these factors can lead the credit network to deadlocked states where no transactions can make progress. In this paper, we study the relationship between the throughput of a credit network and its topology and credit state. We show that the presence of deadlocks completely characterizes a network's throughput sensitivity to different credit states. Although we show that identifying deadlocks in an arbitrary topology is NP-hard, we propose a peeling algorithm inspired by decoding algorithms for erasure codes that upper bounds the severity of the deadlock. We use the peeling algorithm as a tool to compare the performance of random topologies as well as to aid in the synthesis of topologies robust to deadlocks.
Mobile health applications (mHealth apps for short) are being increasingly adopted in the healthcare sector, enabling stakeholders such as governments, health units, medics, and patients, to utilize health services in a pervasive manner. Despite having several known benefits, mHealth apps entail significant security and privacy challenges that can lead to data breaches with serious social, legal, and financial consequences. This research presents an empirical investigation about security awareness of end-users of mHealth apps that are available on major mobile platforms, including Android and iOS. We collaborated with two mHealth providers in Saudi Arabia to survey 101 end-users, investigating their security awareness about (i) existing and desired security features, (ii) security related issues, and (iii) methods to improve security knowledge. Findings indicate that majority of the end-users are aware of the existing security features provided by the apps (e.g., restricted app permissions); however, they desire usable security (e.g., biometric authentication) and are concerned about privacy of their health information (e.g., data anonymization). End-users suggested that protocols such as session timeout or Two-factor authentication (2FA) positively impact security but compromise usability of the app. Security-awareness via social media, peer guidance, or training from app providers can increase end-users trust in mHealth apps. This research investigates human-centric knowledge based on empirical evidence and provides a set of guidelines to develop secure and usable mHealth apps.
Android apps include third-party native libraries to increase performance and to reuse functionality. Native code is directly executed from apps through the Java Native Interface or the Android Native Development Kit. Android developers add precompiled native libraries to their projects, enabling their use. Unfortunately, developers often struggle or simply neglect to update these libraries in a timely manner. This results in the continuous use of outdated native libraries with unpatched security vulnerabilities years after patches became available. To further understand such phenomena, we study the security updates in native libraries in the most popular 200 free apps on Google Play from Sept. 2013 to May 2020. A core difficulty we face in this study is the identification of libraries and their versions. Developers often rename or modify libraries, making their identification challenging. We create an approach called LibRARIAN (LibRAry veRsion IdentificAtioN) that accurately identifies native libraries and their versions as found in Android apps based on our novel similarity metric bin2sim. LibRARIAN leverages different features extracted from libraries based on their metadata and identifying strings in read-only sections. We discovered 53/200 popular apps (26.5%) with vulnerable versions with known CVEs between Sept. 2013 and May 2020, with 14 of those apps remaining vulnerable. We find that app developers took, on average, 528.71 days to apply security patches, while library developers release a security patch after 54.59 days - a 10 times slower rate of update.
The increasing use of cloud computing for latency-sensitive applications has sparked renewed interest in providing tight bounds on network tail latency. Achieving this in practice at reasonable network utilization has proved elusive, due to a combination of highly bursty application demand, faster link speeds, and heavy-tailed message sizes. While priority scheduling can be used to reduce tail latency for some traffic, this comes at a cost of much worse delay behavior for all other traffic on the network. Most operators choose to run their networks at very low average utilization, despite the added cost, and yet still suffer poor tail behavior. This paper takes a different approach. We build a system, swp, to help operators (and network designers) to understand and control tail latency without relying on priority scheduling. As network workload changes, swp is designed to give real-time advice on the network switch configurations needed to maintain tail latency objectives for each traffic class. The core of swp is an efficient model for simulating the combined effect of traffic characteristics, end-to-end congestion control, and switch scheduling on service-level objectives (SLOs), along with an optimizer that adjusts switch-level scheduling weights assigned to each class. Using simulation across a diverse set of workloads with different SLOs, we show that to meet the same SLOs as swp provides, FIFO would require 65% greater link capacity, and 79% more for scenarios with tight SLOs on bursty traffic classes.
As data are increasingly being stored in different silos and societies becoming more aware of data privacy issues, the traditional centralized training of artificial intelligence (AI) models is facing efficiency and privacy challenges. Recently, federated learning (FL) has emerged as an alternative solution and continue to thrive in this new reality. Existing FL protocol design has been shown to be vulnerable to adversaries within or outside of the system, compromising data privacy and system robustness. Besides training powerful global models, it is of paramount importance to design FL systems that have privacy guarantees and are resistant to different types of adversaries. In this paper, we conduct the first comprehensive survey on this topic. Through a concise introduction to the concept of FL, and a unique taxonomy covering: 1) threat models; 2) poisoning attacks and defenses against robustness; 3) inference attacks and defenses against privacy, we provide an accessible review of this important topic. We highlight the intuitions, key techniques as well as fundamental assumptions adopted by various attacks and defenses. Finally, we discuss promising future research directions towards robust and privacy-preserving federated learning.
Graph convolutional network (GCN) has been successfully applied to many graph-based applications; however, training a large-scale GCN remains challenging. Current SGD-based algorithms suffer from either a high computational cost that exponentially grows with number of GCN layers, or a large space requirement for keeping the entire graph and the embedding of each node in memory. In this paper, we propose Cluster-GCN, a novel GCN algorithm that is suitable for SGD-based training by exploiting the graph clustering structure. Cluster-GCN works as the following: at each step, it samples a block of nodes that associate with a dense subgraph identified by a graph clustering algorithm, and restricts the neighborhood search within this subgraph. This simple but effective strategy leads to significantly improved memory and computational efficiency while being able to achieve comparable test accuracy with previous algorithms. To test the scalability of our algorithm, we create a new Amazon2M data with 2 million nodes and 61 million edges which is more than 5 times larger than the previous largest publicly available dataset (Reddit). For training a 3-layer GCN on this data, Cluster-GCN is faster than the previous state-of-the-art VR-GCN (1523 seconds vs 1961 seconds) and using much less memory (2.2GB vs 11.2GB). Furthermore, for training 4 layer GCN on this data, our algorithm can finish in around 36 minutes while all the existing GCN training algorithms fail to train due to the out-of-memory issue. Furthermore, Cluster-GCN allows us to train much deeper GCN without much time and memory overhead, which leads to improved prediction accuracy---using a 5-layer Cluster-GCN, we achieve state-of-the-art test F1 score 99.36 on the PPI dataset, while the previous best result was 98.71 by .
We investigate how the final parameters found by stochastic gradient descent are influenced by over-parameterization. We generate families of models by increasing the number of channels in a base network, and then perform a large hyper-parameter search to study how the test error depends on learning rate, batch size, and network width. We find that the optimal SGD hyper-parameters are determined by a "normalized noise scale," which is a function of the batch size, learning rate, and initialization conditions. In the absence of batch normalization, the optimal normalized noise scale is directly proportional to width. Wider networks, with their higher optimal noise scale, also achieve higher test accuracy. These observations hold for MLPs, ConvNets, and ResNets, and for two different parameterization schemes ("Standard" and "NTK"). We observe a similar trend with batch normalization for ResNets. Surprisingly, since the largest stable learning rate is bounded, the largest batch size consistent with the optimal normalized noise scale decreases as the width increases.
The classification of sentences is very challenging, since sentences contain the limited contextual information. In this paper, we proposed an Attention-Gated Convolutional Neural Network (AGCNN) for sentence classification, which generates attention weights from the feature's context windows of different sizes by using specialized convolution encoders. It makes full use of limited contextual information to extract and enhance the influence of important features in predicting the sentence's category. Experimental results demonstrated that our model can achieve up to 3.1% higher accuracy than standard CNN models, and gain competitive results over the baselines on four out of the six tasks. Besides, we designed an activation function, namely, Natural Logarithm rescaled Rectified Linear Unit (NLReLU). Experiments showed that NLReLU can outperform ReLU and is comparable to other well-known activation functions on AGCNN.
Network embedding has become a hot research topic recently which can provide low-dimensional feature representations for many machine learning applications. Current work focuses on either (1) whether the embedding is designed as an unsupervised learning task by explicitly preserving the structural connectivity in the network, or (2) whether the embedding is a by-product during the supervised learning of a specific discriminative task in a deep neural network. In this paper, we focus on bridging the gap of the two lines of the research. We propose to adapt the Generative Adversarial model to perform network embedding, in which the generator is trying to generate vertex pairs, while the discriminator tries to distinguish the generated vertex pairs from real connections (edges) in the network. Wasserstein-1 distance is adopted to train the generator to gain better stability. We develop three variations of models, including GANE which applies cosine similarity, GANE-O1 which preserves the first-order proximity, and GANE-O2 which tries to preserves the second-order proximity of the network in the low-dimensional embedded vector space. We later prove that GANE-O2 has the same objective function as GANE-O1 when negative sampling is applied to simplify the training process in GANE-O2. Experiments with real-world network datasets demonstrate that our models constantly outperform state-of-the-art solutions with significant improvements on precision in link prediction, as well as on visualizations and accuracy in clustering tasks.