Software log analysis helps maintain the health of software solutions and ensure compliance and security. Existing software systems consist of heterogeneous components that emit logs in various formats. A typical solution is to unify the logs using manually built parsers, which is laborious. Instead, we explore the possibility of automating the parsing task by employing machine translation (MT). We create a tool that generates synthetic Apache log records, which we use to train recurrent-neural-network-based MT models. Evaluation of the models on real-world logs shows that they can learn the Apache log format and parse individual log records. The median relative edit distance between an actual real-world log record and the MT prediction is at most 28%. Thus, we show that log parsing using an MT approach is promising.
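The evaluation metric named above, the median relative edit distance, can be illustrated with a minimal sketch. The abstract does not define the normalization, so this sketch assumes a character-level Levenshtein distance divided by the length of the longer of the two strings. The function names (`levenshtein`, `relative_edit_distance`, `median_relative_edit_distance`) are hypothetical and chosen for illustration; they are not taken from the authors' tool.

```python
from statistics import median


def levenshtein(a: str, b: str) -> int:
    """Character-level Levenshtein distance (insert, delete, substitute)."""
    # Keep the shorter string in the inner loop to limit row size.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def relative_edit_distance(reference: str, prediction: str) -> float:
    """Edit distance normalized to [0, 1].

    Assumption: normalization by the longer string's length; the
    source does not state which normalization the authors use.
    """
    longest = max(len(reference), len(prediction))
    if longest == 0:
        return 0.0
    return levenshtein(reference, prediction) / longest


def median_relative_edit_distance(pairs) -> float:
    """Median of the relative edit distance over (reference, prediction) pairs."""
    return median(relative_edit_distance(ref, pred) for ref, pred in pairs)
```

Under this assumed definition, a value of 0.28 would mean that, for the median record, about 28% of the characters of the longer string have to be inserted, deleted, or substituted to turn the prediction into the reference parse.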