Massive Enhanced Extracted Email Features Tailored for Cosine Distance


May 11, 2022


Farshad Barahimi

This paper explains and evaluates the process of converting the Enron email dataset (the version cited in the preprint) into thousands of features per email for a selected set of 2400 labelled emails. The final features are tailored for Cosine distance, so that the Cosine distance inversely reflects, in an explainable and normalized fashion, the number of top indicative words that two emails have in common. The labelling is based on the leaf folder name in the folder tree of the Enron email dataset (the version cited in the preprint), and the 2400 selected emails consist of 300 emails for each of the 8 labels. The evaluation is based on the accuracy of a k-nearest-neighbours (KNN) majority voting classification using Cosine distance. In addition to the KNN majority voting classification accuracy and the confusion matrix, some statistics for the process are reported. The KNN majority voting classification accuracy using Cosine distance is 76.75%, which shows at least some level of success given the 8 labels involved (uniform random guessing over 8 balanced labels would give 12.5%). The result of the conversion is 48557 features per selected email, of which exactly 40 features per email are non-zero. The resulting data set, named MeeefTCD (Massive Enhanced Extracted Email Features Tailored for Cosine Distance), is available at https://web.cs.dal.ca/~barahimi/data-sets/meeeftcd/ and in a GitHub repository mentioned in the paper.
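To make the relationship between Cosine distance and shared indicative words concrete, here is a minimal Python sketch. It is not the paper's implementation. It assumes, purely for illustration, that every email is encoded as a sparse vector whose non-zero features all have weight 1 (the abstract only says that exactly 40 features per email are non-zero, not how they are weighted). Under that assumption, two emails with n non-zero features each and s shared features have Cosine distance 1 - s/n. The function names, the toy words, and the labels are hypothetical.

```python
import math
from collections import Counter


def binary_features(words):
    """Encode an email as a sparse vector {word: 1.0}.

    Assumption for illustration only: equal weights for all top words.
    """
    return {word: 1.0 for word in words}


def cosine_distance(a, b):
    """Cosine distance between two sparse vectors stored as dicts."""
    dot = sum(weight * b[feature] for feature, weight in a.items() if feature in b)
    norm_a = math.sqrt(sum(weight * weight for weight in a.values()))
    norm_b = math.sqrt(sum(weight * weight for weight in b.values()))
    return 1.0 - dot / (norm_a * norm_b)


def knn_majority_vote(query, labelled, k):
    """Label a query by majority vote among its k nearest neighbours.

    `labelled` is a list of (sparse_vector, label) pairs.
    """
    neighbours = sorted(labelled, key=lambda item: cosine_distance(query, item[0]))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Two toy emails with 40 non-zero features each, 10 of them shared.
    email_a = binary_features(f"w{i}" for i in range(40))
    email_b = binary_features(f"w{i}" for i in range(30, 70))
    print(cosine_distance(email_a, email_b))  # close to 1 - 10/40

    # Hypothetical labelled emails with 4 top words each.
    labelled = [
        (binary_features(["meeting", "agenda", "room", "monday"]), "meetings"),
        (binary_features(["meeting", "agenda", "call", "friday"]), "meetings"),
        (binary_features(["gas", "price", "contract", "trade"]), "trading"),
    ]
    query = binary_features(["meeting", "agenda", "room", "friday"])
    print(knn_majority_vote(query, labelled, k=3))
```

With equal weights the distance depends only on the count of shared non-zero features, which is one way to read the abstract's "explainable normalized fashion"; the actual weighting used in MeeefTCD may differ.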

