FoolNLTK: Possibly the Most Accurate Chinese Word Segmentation Tool Available

December 23, 2017 · OSChina (开源中国) · 正_午

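FoolNLTK is a Chinese text processing toolkit. It may not be the fastest open-source Chinese word segmenter, but it is quite possibly the most accurate.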

License: Apache

Language: Python

Operating system: Cross-platform

Author: 正_午

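Features

It may not be the fastest open-source Chinese word segmenter, but it is quite possibly the most accurate
Trained with a BiLSTM model
Includes word segmentation, part-of-speech tagging, and named entity recognition, all with relatively high accuracy
Supports user-defined dictionaries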

Installation

pip install foolnltk

Usage

Word segmentation

import fool

# Sample sentence: "A fool is in Beijing"
text = "一个傻子在北京"

print(fool.cut(text))

# ['一个', '傻子', '在', '北京']

Command-line segmentation

python -m fool [filename]

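User-defined dictionary

The dictionary format is shown below: each entry is a word followed by its weight. The higher a word's weight and the longer the word, the more likely it is to appear in the output. Use weight values greater than 1.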

难受香菇 10
什么鬼 10
分词工具 10
北京 10
北京天安门 10
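As a minimal sketch, a file in this format could be generated from Python. The helper `write_userdict` and the file name are hypothetical and not part of FoolNLTK; the one-entry-per-line, space-separated layout and the UTF-8 encoding are assumptions based on the example above.

```python
import os
import tempfile


def write_userdict(entries, path):
    """Write (word, weight) pairs as 'word weight' lines, one per line.

    Hypothetical helper, not part of FoolNLTK. It rejects weights that
    are not greater than 1, following the advice above.
    """
    lines = []
    for word, weight in entries:
        # A word containing whitespace would break the space-separated format.
        if not word or any(ch.isspace() for ch in word):
            raise ValueError(f"invalid word: {word!r}")
        if weight <= 1:
            raise ValueError(f"weight for {word!r} must be greater than 1")
        lines.append(f"{word} {weight}")
    # UTF-8 is an assumption; the article does not state the expected encoding.
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(lines) + "\n")
    return path


entries = [("难受香菇", 10), ("什么鬼", 10), ("分词工具", 10), ("北京", 10), ("北京天安门", 10)]
dict_path = write_userdict(
    entries, os.path.join(tempfile.gettempdir(), "userdict_example.txt")
)
with open(dict_path, encoding="utf-8") as f:
    print(f.read())
```

Validating the weight up front means a bad entry fails before anything is written to disk.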

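Loading the dictionary

import fool

# path points to your dictionary file; "userdict.txt" is only a placeholder
path = "userdict.txt"

fool.load_userdict(path)

text = "我在北京天安门看你难受香菇"

print(fool.cut(text))

# ['我', '在', '北京天安门', '看', '你', '难受香菇']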

Deleting the dictionary

fool.delete_userdict()
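Since loading and deleting a dictionary come as a pair, one possible way to keep them together is a small context manager. This is only a sketch: `user_dictionary` is a hypothetical helper, not something FoolNLTK provides, and it relies solely on the `load_userdict` and `delete_userdict` calls shown above.

```python
from contextlib import contextmanager


@contextmanager
def user_dictionary(fool_module, path):
    """Load a user dictionary for the duration of a with-block.

    Hypothetical helper: fool_module is expected to be the imported
    `fool` package; only load_userdict and delete_userdict are called.
    """
    fool_module.load_userdict(path)
    try:
        yield fool_module
    finally:
        # Always remove the dictionary, even if the with-block raised.
        fool_module.delete_userdict()


# Usage, assuming FoolNLTK is installed and userdict.txt exists:
#
#     import fool
#     with user_dictionary(fool, "userdict.txt") as f:
#         print(f.cut("我在北京天安门看你难受香菇"))
```

Passing the module in as an argument keeps the helper importable even where FoolNLTK itself is not installed.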

Part-of-speech tagging

import fool

text = "一个傻子在北京"

print(fool.pos_cut(text))

#[('一个', 'm'), ('傻子', 'n'), ('在', 'p'), ('北京', 'ns')]
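The tagged output is a list of (word, tag) pairs, so it is easy to post-process. A minimal sketch that keeps only words with selected tags follows; `words_with_tags` is a hypothetical helper, the sample data is copied from the output above so the snippet runs without FoolNLTK, and reading `n` as a noun and `ns` as a place name is an assumption based on common Chinese tagsets.

```python
def words_with_tags(tagged, wanted):
    """Return the words whose tag is in `wanted`, preserving order."""
    wanted = set(wanted)
    return [word for word, tag in tagged if tag in wanted]


# Sample data copied from the pos_cut output shown above.
tagged = [("一个", "m"), ("傻子", "n"), ("在", "p"), ("北京", "ns")]

print(words_with_tags(tagged, {"n", "ns"}))
# ['傻子', '北京']
```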

Named entity recognition

import fool

text = "一个傻子在北京"

words, ners = fool.analysis(text)

print(ners)

#[(5, 8, 'location', '北京')]
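Each recognized entity appears to be a 4-tuple; reading it as (start, end, type, text) is an assumption drawn from the sample output above. Under that assumption, a minimal sketch for grouping entities by type could look like this, where `group_entities` is a hypothetical helper and the sample data is copied from the article so the snippet runs without FoolNLTK.

```python
from collections import defaultdict


def group_entities(ners):
    """Group (start, end, type, text) tuples into {type: [text, ...]}."""
    grouped = defaultdict(list)
    for _start, _end, ent_type, ent_text in ners:
        grouped[ent_type].append(ent_text)
    return dict(grouped)


# Sample data copied from the analysis output shown above.
ners = [(5, 8, "location", "北京")]

print(group_entities(ners))
# {'location': ['北京']}
```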

Note

So far it has only been tested on Linux with Python 3.
