Title: “You are grounded!”: Latent Name Artifacts in Pre-trained Language Models
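Pre-trained language models (LMs) may perpetuate biases from their training corpus to downstream models. We focus on artifacts associated with the representation of given names (e.g., Donald), which, depending on the corpus, may be associated with specific entities, as indicated by next token prediction (e.g., Trump). While helpful in some contexts, grounding also happens in under-specified or inappropriate contexts. For example, endings generated for “Donald is a” differ substantially from those of other names, and often have more-than-average negative sentiment. We demonstrate the potential effect on downstream tasks through reading comprehension. Our experiments suggest that additional pre-training on different corpora may mitigate this effect.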
Recently, the pandemic of the novel Coronavirus Disease-2019 (COVID-19) has presented governments with severe challenges. In the United States, the country with the highest number of confirmed COVID-19 cases, a nationwide social distancing protocol has been implemented by the President. For the first time since the 1918 flu pandemic a hundred years ago, the US population is mandated to stay in their households and avoid public contact. As a result, the majority of public venues and services have ceased operations. Following the closure of the University of Washington on March 7th, more than a thousand colleges and universities in the United States have cancelled in-person classes and campus activities, impacting millions of students. This paper aims to discover the social implications of this unprecedented disruption in our interactive society for both the general public and higher education populations by mining people's opinions on social media. We discover several topics embedded in a large number of COVID-19 tweets that represent the most central issues related to the pandemic, which are of great concern to both college students and the general public. Moreover, we find significant differences between these two groups of Twitter users with respect to the sentiments they expressed towards the COVID-19 issues. To the best of our knowledge, this is the first social media-based study that focuses on the college student community's demographics and responses to prevalent social issues during a major crisis.