This paper compares the performance of various machine learning (``ML'') approaches for classifying judgments into legal areas. Using a novel dataset of 6,227 Singapore Supreme Court judgments, we investigate how state-of-the-art NLP methods compare against traditional statistical models when applied to a legal corpus that comprises few but lengthy documents. All approaches tested, including topic-model, word-embedding, and language-model-based classifiers, performed well with only a few hundred judgments. However, more work is needed to optimize state-of-the-art methods for the legal domain.