The outbreak of the COVID-19 pandemic has changed our lives in unprecedented ways. In the face of the projected catastrophic consequences, many countries have enacted social distancing measures in an attempt to limit the spread of the virus. Under these conditions, the Web has become an indispensable medium for information acquisition, communication, and entertainment. At the same time, unfortunately, the Web is being exploited for the dissemination of potentially harmful and disturbing content, such as the spread of conspiracy theories and hateful speech towards specific ethnic groups, in particular towards Chinese people since COVID-19 is believed to have originated from China. In this paper, we make a first attempt to study the emergence of Sinophobic behavior on the Web during the outbreak of the COVID-19 pandemic. We collect two large-scale datasets from Twitter and 4chan's Politically Incorrect board (/pol/) over a time period of approximately five months and analyze them to investigate whether there is a rise or important differences with regard to the dissemination of Sinophobic content. We find that COVID-19 indeed drives the rise of Sinophobia on the Web and that the dissemination of Sinophobic content is a cross-platform phenomenon: it exists on fringe Web communities like /pol/, and to a lesser extent on mainstream ones like Twitter. Also, using word embeddings over time, we characterize the evolution and emergence of new Sinophobic slurs on both Twitter and /pol/. Finally, we find interesting differences in the context in which words related to Chinese people are used on the Web before and after the COVID-19 outbreak: on Twitter we observe a shift towards blaming China for the situation, while on /pol/ we find a shift towards using more (and new) Sinophobic slurs.
We study how the COVID-19 pandemic, alongside the severe mobility restrictions that ensued, has impacted information access on Wikipedia, the world's largest online encyclopedia. A longitudinal analysis that combines pageview statistics for 12 Wikipedia language editions with mobility reports published by Apple and Google reveals massive shifts in the volume and nature of information seeking patterns during the pandemic. Interestingly, while we observe a transient increase in Wikipedia's pageview volume following mobility restrictions, the nature of information sought was impacted more permanently. These changes are most pronounced for language editions associated with countries where the most severe mobility restrictions were implemented. We also find that articles belonging to different topics behaved differently; e.g., attention towards entertainment-related topics is lingering and even increasing, while the interest in health- and biology-related topics was either small or transient. Our results highlight the utility of Wikipedia for studying how the pandemic is affecting people's needs, interests, and concerns.
In this paper, we present a large-scale characterization of the Manosphere, a conglomerate of Web-based misogynist movements roughly focused on "men's issues," which has seen significant growth over the past years. We do so by gathering and analyzing 28.8M posts from 6 forums and 51 subreddits. Overall, we paint a comprehensive picture of the evolution of the Manosphere on the Web, showing the links between its different communities over the years. We find that milder and older communities, such as Pick Up Artists and Men's Rights Activists, are giving way to more extremist ones like Incels and Men Going Their Own Way, with a substantial migration of active users. Moreover, our analysis suggests that these newer communities are more toxic and misogynistic than the former.
Machine Learning models have been deployed across almost every aspect of society, often in situations that affect the social welfare of many individuals. Although these models offer streamlined solutions to large problems, they may contain biases and treat groups or individuals unfairly. To our knowledge, this review is one of the first to focus specifically on gender bias in applications of machine learning. We first introduce several examples of machine learning gender bias in practice. We then detail the most widely used formalizations of fairness in order to address how to make machine learning models fairer. Specifically, we discuss the most influential bias mitigation algorithms as applied to domains in which models have a high propensity for gender discrimination. We group these algorithms into two overarching approaches -- removing bias from the data directly and removing bias from the model through training -- and we present representative examples of each. As society increasingly relies on artificial intelligence to help in decision-making, addressing gender biases present in these models is imperative. To provide readers with the tools to assess the fairness of machine learning models and mitigate the biases present in them, we discuss multiple open source packages for fairness in AI.
Due to its impact, COVID-19 has pushed the academic community to search for ways to cure, mitigate, or control it. However, when it comes to control, there are still few studies focused on under-reporting estimates. It is believed that under-reporting is a relevant factor in determining the actual mortality rate and, if not considered, can cause significant misinformation. Therefore, the objective of this work is to estimate the under-reporting of cases and deaths of COVID-19 in Brazilian states using data from Infogripe on notifications of Severe Acute Respiratory Infection (SARI). The methodology is based on the concepts of inertia and the use of event detection techniques to study the time series of hospitalized SARI cases. The estimate of real cases of the disease, called novelty, is calculated as the difference between SARI cases in 2020 (after COVID-19) and the total cases expected from recent years (2016 to 2019), derived from a seasonal exponential moving average. The results show that under-reporting rates vary significantly between states and that there are no general patterns for states in the same region in Brazil. The published version of this paper is made available at https://doi.org/10.1007/s00354-021-00125-3. Please cite as: B. Paixão, L. Baroni, M. Pedroso, R. Salles, L. Escobar, C. de Sousa, R. de Freitas Saldanha, J. Soares, R. Coutinho, et al., 2021, Estimation of COVID-19 Under-Reporting in the Brazilian States Through SARI, New Generation Computing.
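The novelty estimate described in the SARI under-reporting abstract above can be illustrated with a minimal sketch. It is not the authors' implementation. The function names, the smoothing factor `alpha`, the weekly granularity, and the clipping of negative differences to zero are all assumptions made for illustration, and the event detection step is not covered.

```python
from typing import Dict, List


def seasonal_ema(history: Dict[int, List[float]], alpha: float = 0.5) -> List[float]:
    """Expected cases per week: an exponential moving average taken across
    years for each week position (oldest year first).

    `history` maps a year to its list of weekly case counts; all lists are
    assumed to have the same length. `alpha` is an assumed smoothing factor.
    """
    years = sorted(history)
    expected = [float(x) for x in history[years[0]]]
    for year in years[1:]:
        expected = [
            alpha * float(x) + (1.0 - alpha) * e
            for x, e in zip(history[year], expected)
        ]
    return expected


def novelty(observed: List[float], expected: List[float]) -> List[float]:
    """Excess of observed over expected cases, week by week.

    Negative differences are clipped to zero (an assumption of this sketch).
    """
    return [max(0.0, float(o) - e) for o, e in zip(observed, expected)]


if __name__ == "__main__":
    # Toy numbers, not real Infogripe data.
    past = {
        2016: [10, 20, 30],
        2017: [12, 18, 30],
        2018: [11, 22, 28],
        2019: [10, 20, 32],
    }
    observed_2020 = [15, 19, 60]
    baseline = seasonal_ema(past, alpha=0.5)
    print(novelty(observed_2020, baseline))
```

Because the average is taken per week position across years, the seasonal shape of the baseline is preserved, and more recent years carry more weight as `alpha` grows.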
Transport makes an impact across the Sustainable Development Goals (SDGs), encompassing climate change, health, inequality and sustainability. It is also an area in which individuals are able to make decisions which have the potential to collectively contribute to significant and wide-ranging benefits. Governments and authorities need citizens to adopt sustainable transport behaviours, and behaviour change interventions are being used as tools to shift travel choices towards more sustainable modes. Blockchain technology has the potential to bring new levels of scale to transport behaviour change interventions, but a rigorous approach to token design is required. This paper uses a survey of research projects and use cases to analyse current applications of blockchain technology in transport behaviour change interventions, and identifies barriers and limitations to achieving targeted change at scale. The paper draws upon these findings to outline a research agenda that brings a focus on correlating specific Behaviour Change Techniques (BCTs) to token design, and defines processes for standardising token designs in behaviour change tools. The paper further outlines architecture and operational considerations for blockchain-based platforms in behaviour change interventions, such that design choices do not compromise opportunities or wider environmental goals.
As the COVID-19 pandemic is disrupting life worldwide, related online communities are popping up. In particular, two "new" communities, /r/China_flu and /r/Coronavirus, emerged on Reddit and have been dedicated to COVID-related discussions from the very beginning of this pandemic. With /r/Coronavirus promoted as the official community on Reddit, it remains an open question how users choose between these two highly-related communities. In this paper, we characterize user trajectories in these two communities from the beginning of COVID-19 to the end of September 2020. We show that new users of /r/China_flu and /r/Coronavirus were similar from January to March. After that, their differences steadily increased, as evidenced by both language distance and membership prediction, as the pandemic continued to unfold. Furthermore, users who started at /r/China_flu from January to March were more likely to leave, while those who started in later months tended to remain highly "loyal". To explain this difference, we develop a movement analysis framework that tracks membership changes in these two communities and identify a significant proportion of /r/China_flu members (around 50%) that moved to /r/Coronavirus in February. This movement turns out to be highly predictable based on other subreddits that users were previously active in. Our work demonstrates how two highly-related communities emerge and develop their own identity in a crisis, and highlights the important role of existing communities in understanding such an emergence.
The global spread of the novel coronavirus is affected by the spread of related misinformation -- the so-called COVID-19 Infodemic -- that makes populations more vulnerable to the disease through resistance to mitigation efforts. Here we analyze the prevalence and diffusion of links to low-credibility content about the pandemic across two major social media platforms, Twitter and Facebook. We characterize cross-platform similarities and differences in popular sources, diffusion patterns, influencers, coordination, and automation. Comparing the two platforms, we find divergence among the prevalence of popular low-credibility sources and suspicious videos. A minority of accounts and pages exert a strong influence on each platform. These misinformation "superspreaders" are often associated with the low-credibility sources and tend to be verified by the platforms. On both platforms, there is evidence of coordinated sharing of Infodemic content. The overt nature of this manipulation points to the need for societal-level solutions in addition to mitigation strategies within the platforms. However, we highlight limits imposed by inconsistent data-access policies on our capability to study harmful manipulations of information ecosystems.
The COVID-19 pandemic has impacted billions of people around the world. To capture some of these impacts in the United States, we are conducting a nationwide longitudinal survey collecting information about travel-related behaviors and attitudes before, during, and after the COVID-19 pandemic. The survey questions cover a wide range of topics including commuting, daily travel, air travel, working from home, online learning, shopping, and risk perception, along with attitudinal, socioeconomic, and demographic information. Version 1.0 of the survey contains 8,723 responses that are publicly available. The survey is deployed over multiple waves to the same respondents to monitor how behaviors and attitudes evolve over time. This article details the methodology adopted for the collection, cleaning, and processing of the data. In addition, the data are weighted to be representative of national and regional demographics. This survey dataset can aid researchers, policymakers, businesses, and government agencies in understanding both the extent of behavioral shifts and the likelihood that these changes will persist after COVID-19.
The task of session search focuses on using interaction data to improve relevance for the user's next query at the session level. In this paper, we formulate session search as a personalization task under the framework of learning to rank. Personalization approaches re-rank results to match a user model. Such user models are usually accumulated over time based on the user's browsing behaviour. We use a pre-computed and transparent set of user models based on concepts from the social science literature. Interaction data are used to map each session to these user models. Novel features are then estimated based on such models as well as sessions' interaction data. Extensive experiments on test collections from the TREC session track show statistically significant improvements over current session search algorithms.
Edge intelligence refers to a set of connected systems and devices that use artificial intelligence for data collection, caching, processing, and analysis in locations close to where data is captured. The aim of edge intelligence is to enhance the quality and speed of data processing and protect the privacy and security of the data. Although it emerged only recently, with work spanning the period from 2011 to now, this field of research has shown explosive growth over the past five years. In this paper, we present a thorough and comprehensive survey of the literature surrounding edge intelligence. We first identify four fundamental components of edge intelligence, namely edge caching, edge training, edge inference, and edge offloading, based on theoretical and practical results pertaining to proposed and deployed systems. We then aim for a systematic classification of the state of the solutions by examining research results and observations for each of the four components and present a taxonomy that includes practical problems, adopted techniques, and application goals. For each category, we elaborate, compare and analyse the literature from the perspectives of adopted techniques, objectives, performance, advantages and drawbacks, etc. This survey article provides a comprehensive introduction to edge intelligence and its application areas. In addition, we summarise the development of the emerging research field and the current state-of-the-art and discuss the important open issues and possible theoretical and technical solutions.