我们能通过自然语言生成壳牌码吗? (Can We Generate Shellcodes via Natural Language? An Empirical Study) - 专知论文

会员服务 ·

0

shellcode · NMT · 模型评估 · Machine Translation · CASES ·

2022 年 2 月 8 日

Can We Generate Shellcodes via Natural Language? An Empirical Study

翻译：我们能通过自然语言生成壳牌码吗?

Pietro Liguori,Erfan Al-Hossami,Domenico Cotroneo,Roberto Natella,Bojan Cukic,Samira Shaikh

from arxiv, 33 pages, 5 figures, 9 tables. To be published in Automated Software Engineering journal

Writing software exploits is an important practice for offensive security analysts to investigate and prevent attacks. In particular, shellcodes are especially time-consuming and a technical challenge, as they are written in assembly language. In this work, we address the task of automatically generating shellcodes, starting purely from descriptions in natural language, by proposing an approach based on Neural Machine Translation (NMT). We then present an empirical study using a novel dataset (Shellcode_IA32), which consists of 3,200 assembly code snippets of real Linux/x86 shellcodes from public databases, annotated using natural language. Moreover, we propose novel metrics to evaluate the accuracy of NMT at generating shellcodes. The empirical analysis shows that NMT can generate assembly code snippets from the natural language with high accuracy and that in many cases can generate entire shellcodes with no errors.

翻译：书写软件开发是进攻性安全分析家调查和防止攻击的一个重要做法,特别是贝壳代码特别耗费时间和技术挑战,因为它们是用装配语言写的。在这项工作中,我们处理的是自动生成贝壳代码的任务,完全从自然语言描述开始,提出基于神经机器翻译(NMT)的方法。然后,我们用新颖的数据集(Shellcode_IA32)提出实证研究,该数据集包括公共数据库3 200个真正的Linux/x86贝壳代码的组装代码片,用自然语言附加注释。此外,我们提出了评估NMT在生成贝壳代码时准确性的新指标。实证分析表明,NMT可以非常精确地从自然语言中生成组装代码片,在许多情况下,可以产生没有误差的完整贝壳代码。

0

相关内容

shellcode

shellcode是一段不依赖于上下文的可执行机器码。

史上最全！358篇机器学习&自然语言处理综述论文！都这儿了

专知会员服务

129+阅读 · 2020年7月18日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

高斯序列与过程的极值理论

国家自然科学基金

2+阅读 · 2015年12月31日

内蒙古木本固氮植物共生固氮菌多样性分析

国家自然科学基金

0+阅读 · 2012年12月31日

关于图顶点划分的 Thomassen 猜想

国家自然科学基金

0+阅读 · 2011年12月31日

Lorenz-like系统族的等价性和混沌吸引子几何结构

国家自然科学基金

0+阅读 · 2011年12月31日

基于生成树库分析与生成一体化机器翻译模型研究

国家自然科学基金

0+阅读 · 2009年12月31日

A Comprehensive Empirical Study of Vision-Language Pre-trained Model for Supervised Cross-Modal Retrieval

Arxiv

0+阅读 · 2022年4月17日

A Survey of Natural Language Generation

Arxiv

15+阅读 · 2021年12月22日

Latent Relation Language Models

Arxiv

21+阅读 · 2019年8月21日

Multilingual Sentiment Analysis: An RNN-Based Framework for Limited Data

Arxiv

12+阅读 · 2018年6月8日

Distance-based Self-Attention Network for Natural Language Inference

Arxiv

10+阅读 · 2017年12月6日

VIP会员

文章信息

相关主题

Machine Translation

相关VIP内容

史上最全！358篇机器学习&自然语言处理综述论文！都这儿了

专知会员服务

129+阅读 · 2020年7月18日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

热门VIP内容

开通专知VIP会员享更多权益服务

前沿人工智能趋势报告（Frontier AI Trends Report）

【AAAI2026】善始则事半功倍：基于前缀优化的大语言模型推理强化学习

Andrej Karpathy：2025 年 LLM 年度回顾（2025 LLM Year in Review）

音退化问题：基于输入操控的鲁棒语音转换综述

相关资讯

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

相关论文

A Comprehensive Empirical Study of Vision-Language Pre-trained Model for Supervised Cross-Modal Retrieval

Arxiv

0+阅读 · 2022年4月17日

A Survey of Natural Language Generation

Arxiv

15+阅读 · 2021年12月22日

Latent Relation Language Models

Arxiv

21+阅读 · 2019年8月21日

Multilingual Sentiment Analysis: An RNN-Based Framework for Limited Data

Arxiv

12+阅读 · 2018年6月8日

Distance-based Self-Attention Network for Natural Language Inference

Arxiv

10+阅读 · 2017年12月6日

相关基金

高斯序列与过程的极值理论

国家自然科学基金

2+阅读 · 2015年12月31日

内蒙古木本固氮植物共生固氮菌多样性分析

国家自然科学基金

0+阅读 · 2012年12月31日

关于图顶点划分的 Thomassen 猜想

国家自然科学基金

0+阅读 · 2011年12月31日

Lorenz-like系统族的等价性和混沌吸引子几何结构

国家自然科学基金

0+阅读 · 2011年12月31日

基于生成树库分析与生成一体化机器翻译模型研究

国家自然科学基金

0+阅读 · 2009年12月31日

微信扫码咨询专知VIP会员