培训前的文字表述方式是否有助于多语种和多维语言熟练程度建模? (Are pre-trained text representations useful for multilingual and multi-dimensional language proficiency modeling?) - 专知论文

会员服务 ·

0

Performer · MoDELS · Seven · 学习器 · 词表 ·

2021 年 2 月 25 日

Are pre-trained text representations useful for multilingual and multi-dimensional language proficiency modeling?

翻译：培训前的文字表述方式是否有助于多语种和多维语言熟练程度建模?

Taraka Rama,Sowmya Vajjala

from arxiv, 10 pages

Development of language proficiency models for non-native learners has been an active area of interest in NLP research for the past few years. Although language proficiency is multidimensional in nature, existing research typically considers a single "overall proficiency" while building models. Further, existing approaches also considers only one language at a time. This paper describes our experiments and observations about the role of pre-trained and fine-tuned multilingual embeddings in performing multi-dimensional, multilingual language proficiency classification. We report experiments with three languages -- German, Italian, and Czech -- and model seven dimensions of proficiency ranging from vocabulary control to sociolinguistic appropriateness. Our results indicate that while fine-tuned embeddings are useful for multilingual proficiency modeling, none of the features achieve consistently best performance for all dimensions of language proficiency. All code, data and related supplementary material can be found at: https://github.com/nishkalavallabhi/MultidimCEFRScoring.

翻译：过去几年来,为非母语学生开发语言熟练程度模式一直是国家语言熟练程度研究的一个积极关注领域,尽管语言熟练程度具有多面性,但现有研究通常在建立模式时考虑单一的“总体熟练程度”,此外,现有方法也同时只考虑一种语言,本文描述了我们关于预先培训和经过微调的多种语言熟练程度的多语言嵌入在多维、多语言熟练程度分类中的作用的实验和观察,我们报告了三种语言 -- -- 德语、意大利语和捷克语 -- -- 的实验,以及从词汇控制到社会语言适应性的7个水平模型。我们的结果显示,尽管微调的嵌入对于多语言熟练程度模型有用,但没有一个特征在语言熟练程度的所有层面都能始终取得最佳业绩。所有代码、数据和相关补充材料见https://github.com/nishkalavalabhi/MultidimCEFRScoring。

0

相关内容

Performer

INRIA最新「机器学习理论」新书，229页pdf原理性阐述机器学习

INRIA最新「机器学习理论」新书，229页pdf原理性阐述机器学习

专知会员服务

69+阅读 · 2021年3月27日

【ACL2020-斯坦福大学】低维双曲线知识图谱嵌入，Low-Dimensional Hyperbolic Knowledge Graph Embeddings (ACL 2020)

【ACL2020-斯坦福大学】低维双曲线知识图谱嵌入，Low-Dimensional Hyperbolic Knowledge Graph Embeddings (ACL 2020)

专知会员服务

55+阅读 · 2020年7月3日

KG-BERT：基于BERT的知识图谱补全，KG-BERT: BERT for Knowledge Graph Completion

KG-BERT：基于BERT的知识图谱补全，KG-BERT: BERT for Knowledge Graph Completion

专知会员服务

195+阅读 · 2020年5月31日

【论文翻译】2020最新预训练语言模型综述：Pre-trained Models for Natural Language Processing: A Survey

【论文翻译】2020最新预训练语言模型综述：Pre-trained Models for Natural Language Processing: A Survey

专知会员服务

94+阅读 · 2020年4月13日

【预训练论文】预训练Transformer校准，Calibration of Pre-trained Transformers

【预训练论文】预训练Transformer校准，Calibration of Pre-trained Transformers

专知会员服务

26+阅读 · 2020年3月19日

【SIGMOD2020】稀疏数据半监督学习的分解图表示，Factorized Graph Representations for Semi-Supervised Learning from Sparse Data

【SIGMOD2020】稀疏数据半监督学习的分解图表示，Factorized Graph Representations for Semi-Supervised Learning from Sparse Data

专知会员服务

15+阅读 · 2020年3月7日

【微软亚洲研究院】CodeBERT:用于编程和自然语言的预训练模型，CodeBERT: A Pre-Trained Model for Programming and Natural Languages

【微软亚洲研究院】CodeBERT:用于编程和自然语言的预训练模型，CodeBERT: A Pre-Trained Model for Programming and Natural Languages

专知会员服务

32+阅读 · 2020年2月21日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

【NLP模型的跨语言/跨领域迁移】《Transferring NLP models across languages and domains》

【NLP模型的跨语言/跨领域迁移】《Transferring NLP models across languages and domains》

专知会员服务

43+阅读 · 2019年11月25日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

计算机类 | 低难度国际会议信息6条

计算机类 | 低难度国际会议信息6条

Call4Papers

6+阅读 · 2019年4月28日

Call for Participation: Shared Tasks in NLPCC 2019

Call for Participation: Shared Tasks in NLPCC 2019

中国计算机学会

5+阅读 · 2019年3月22日

NLP 2018 Highlights：2018自然语言处理技术亮点汇总

NLP 2018 Highlights：2018自然语言处理技术亮点汇总

AINLP

10+阅读 · 2019年2月9日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

Facebook PyText 在 Github 上开源了

Facebook PyText 在 Github 上开源了

AINLP

7+阅读 · 2018年12月14日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

Linguistically Regularized LSTMs for Sentiment Classification

Linguistically Regularized LSTMs for Sentiment Classification

黑龙江大学自然语言处理实验室

8+阅读 · 2018年5月4日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

【推荐】SLAM相关资源大列表

【推荐】SLAM相关资源大列表

机器学习研究会

10+阅读 · 2017年8月18日

LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding

Arxiv

0+阅读 · 2021年4月18日

MetaXL: Meta Representation Transformation for Low-resource Cross-lingual Learning

Arxiv

0+阅读 · 2021年4月16日

Empowering News Recommendation with Pre-trained Language Models

Arxiv

1+阅读 · 2021年4月15日

PROP: Pre-training with Representative Words Prediction for Ad-hoc Retrieval

Arxiv

11+阅读 · 2020年10月20日

UniLMv2: Pseudo-Masked Language Models for Unified Language Model Pre-Training

Arxiv

15+阅读 · 2020年2月28日

Pre-Training with Whole Word Masking for Chinese BERT

Arxiv

11+阅读 · 2019年6月19日

ERNIE: Enhanced Language Representation with Informative Entities

Arxiv

5+阅读 · 2019年5月17日

Pre-trained Language Model Representations for Language Generation

Arxiv

5+阅读 · 2019年4月1日

Multilingual Sentiment Analysis: An RNN-Based Framework for Limited Data

Arxiv

12+阅读 · 2018年6月8日

Fine-tuned Language Models for Text Classification

Arxiv

5+阅读 · 2018年1月18日

VIP会员

文章信息

相关主题

相关VIP内容

INRIA最新「机器学习理论」新书，229页pdf原理性阐述机器学习

INRIA最新「机器学习理论」新书，229页pdf原理性阐述机器学习

专知会员服务

69+阅读 · 2021年3月27日

【ACL2020-斯坦福大学】低维双曲线知识图谱嵌入，Low-Dimensional Hyperbolic Knowledge Graph Embeddings (ACL 2020)

【ACL2020-斯坦福大学】低维双曲线知识图谱嵌入，Low-Dimensional Hyperbolic Knowledge Graph Embeddings (ACL 2020)

专知会员服务

55+阅读 · 2020年7月3日

KG-BERT：基于BERT的知识图谱补全，KG-BERT: BERT for Knowledge Graph Completion

KG-BERT：基于BERT的知识图谱补全，KG-BERT: BERT for Knowledge Graph Completion

专知会员服务

195+阅读 · 2020年5月31日

【论文翻译】2020最新预训练语言模型综述：Pre-trained Models for Natural Language Processing: A Survey

【论文翻译】2020最新预训练语言模型综述：Pre-trained Models for Natural Language Processing: A Survey

专知会员服务

94+阅读 · 2020年4月13日

【预训练论文】预训练Transformer校准，Calibration of Pre-trained Transformers

【预训练论文】预训练Transformer校准，Calibration of Pre-trained Transformers

专知会员服务

26+阅读 · 2020年3月19日

【SIGMOD2020】稀疏数据半监督学习的分解图表示，Factorized Graph Representations for Semi-Supervised Learning from Sparse Data

【SIGMOD2020】稀疏数据半监督学习的分解图表示，Factorized Graph Representations for Semi-Supervised Learning from Sparse Data

专知会员服务

15+阅读 · 2020年3月7日

【微软亚洲研究院】CodeBERT:用于编程和自然语言的预训练模型，CodeBERT: A Pre-Trained Model for Programming and Natural Languages

【微软亚洲研究院】CodeBERT:用于编程和自然语言的预训练模型，CodeBERT: A Pre-Trained Model for Programming and Natural Languages

专知会员服务

32+阅读 · 2020年2月21日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

【NLP模型的跨语言/跨领域迁移】《Transferring NLP models across languages and domains》

【NLP模型的跨语言/跨领域迁移】《Transferring NLP models across languages and domains》

专知会员服务

43+阅读 · 2019年11月25日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

热门VIP内容

开通专知VIP会员享更多权益服务

新型数字杀伤链：理解综合战术网络对野战炮兵体系的能力与效益

《对抗环境中运用数字孪生技术优化预测性维护与后勤保障》2025最新93页

《任务式指挥十六个案例研究》232页

《幻觉还是事实：国防大型语言模型的可信度评估研究》2025最新109页

相关资讯

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

计算机类 | 低难度国际会议信息6条

计算机类 | 低难度国际会议信息6条

Call4Papers

6+阅读 · 2019年4月28日

Call for Participation: Shared Tasks in NLPCC 2019

Call for Participation: Shared Tasks in NLPCC 2019

中国计算机学会

5+阅读 · 2019年3月22日

NLP 2018 Highlights：2018自然语言处理技术亮点汇总

NLP 2018 Highlights：2018自然语言处理技术亮点汇总

AINLP

10+阅读 · 2019年2月9日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

Facebook PyText 在 Github 上开源了

Facebook PyText 在 Github 上开源了

AINLP

7+阅读 · 2018年12月14日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

Linguistically Regularized LSTMs for Sentiment Classification

Linguistically Regularized LSTMs for Sentiment Classification

黑龙江大学自然语言处理实验室

8+阅读 · 2018年5月4日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

【推荐】SLAM相关资源大列表

【推荐】SLAM相关资源大列表

机器学习研究会

10+阅读 · 2017年8月18日

相关论文

LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding

Arxiv

0+阅读 · 2021年4月18日

MetaXL: Meta Representation Transformation for Low-resource Cross-lingual Learning

Arxiv

0+阅读 · 2021年4月16日

Empowering News Recommendation with Pre-trained Language Models

Arxiv

1+阅读 · 2021年4月15日

PROP: Pre-training with Representative Words Prediction for Ad-hoc Retrieval

Arxiv

11+阅读 · 2020年10月20日

UniLMv2: Pseudo-Masked Language Models for Unified Language Model Pre-Training

Arxiv

15+阅读 · 2020年2月28日

Pre-Training with Whole Word Masking for Chinese BERT

Arxiv

11+阅读 · 2019年6月19日

ERNIE: Enhanced Language Representation with Informative Entities

Arxiv

5+阅读 · 2019年5月17日

Pre-trained Language Model Representations for Language Generation

Arxiv

5+阅读 · 2019年4月1日

Multilingual Sentiment Analysis: An RNN-Based Framework for Limited Data

Arxiv

12+阅读 · 2018年6月8日

Fine-tuned Language Models for Text Classification

Arxiv

5+阅读 · 2018年1月18日

微信扫码咨询专知VIP会员