In the proposed demo, we will present a new software application, the Linguistic Field Data Management and Analysis System (LiFE; https://github.com/kmi-linguistics/life): an open-source, web-based linguistic data management and analysis application that allows for systematic storage, management, sharing and usage of linguistic data collected from the field. The application allows users to store lexical items, sentences, paragraphs and audio-visual content with rich glossing and annotation; generate interactive and print dictionaries; and train and use natural language processing tools and models for various purposes using this data. Since it is a web-based application, it also allows for seamless collaboration among multiple users, who can share data, models, etc. with each other. The system uses the Python-based Flask framework and MongoDB in the backend and HTML, CSS and JavaScript in the frontend. The interface allows the creation of multiple projects that can be shared with other users. At the backend, the application stores the data in RDF format so that it can be released as Linked Data over the web using semantic web technologies. As of now, it uses OntoLex-Lemon for storing the lexical data and Ligt for storing the interlinear glossed text, and then internally links this data to other linked lexicons and databases such as DBpedia and WordNet. Furthermore, it provides support for training NLP systems using the scikit-learn and HuggingFace Transformers libraries, as well as for using any model trained with these libraries. While the user interface itself provides limited options for tuning the system, an externally trained model can be easily incorporated into the application; similarly, the dataset itself can be easily exported in a standard machine-readable format such as JSON or CSV that can be consumed by other programs and pipelines.
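As a rough illustration of the storage and export ideas described above, the following Python sketch takes a few lexical entries and serializes them as JSON, as CSV, and as OntoLex-Lemon triples in N-Triples syntax. It is not taken from the LiFE code base: the record fields (`id`, `headword`, `lang`, `pos`, `gloss`), the base IRI `http://example.org/lexicon/`, the sample entries and all function names are assumptions made for this example, and only the OntoLex-Lemon terms `LexicalEntry`, `Form`, `canonicalForm` and `writtenRep` come from the published vocabulary.

```python
import csv
import io
import json

ONTOLEX = "http://www.w3.org/ns/lemon/ontolex#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/lexicon/"  # hypothetical base IRI, not LiFE's

FIELDS = ["id", "headword", "lang", "pos", "gloss"]  # illustrative schema

# Illustrative sample records (romanized Hindi); not data from LiFE.
ENTRIES = [
    {"id": "e1", "headword": "pani", "lang": "hi-Latn", "pos": "noun", "gloss": "water"},
    {"id": "e2", "headword": "ghar", "lang": "hi-Latn", "pos": "noun", "gloss": "house"},
]


def to_json(entries):
    """Serialize the entries as a JSON array, keeping non-ASCII text readable."""
    return json.dumps(entries, ensure_ascii=False, indent=2)


def to_csv(entries):
    """Serialize the entries as CSV with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(entries)
    return buffer.getvalue()


def escape_literal(text):
    """Escape backslashes, double quotes and newlines for an N-Triples literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def to_ntriples(entries):
    """Emit four OntoLex-Lemon triples per entry: the entry, its canonical
    form, and the written representation of that form. Part of speech and
    gloss are left out to keep the sketch small."""
    lines = []
    for e in entries:
        entry = f"<{BASE}{e['id']}>"
        form = f"<{BASE}{e['id']}_form>"
        written = escape_literal(e["headword"])
        lines.append(f"{entry} <{RDF_TYPE}> <{ONTOLEX}LexicalEntry> .")
        lines.append(f"{entry} <{ONTOLEX}canonicalForm> {form} .")
        lines.append(f"{form} <{RDF_TYPE}> <{ONTOLEX}Form> .")
        lines.append(f'{form} <{ONTOLEX}writtenRep> "{written}"@{e["lang"]} .')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(to_json(ENTRIES))
    print(to_csv(ENTRIES))
    print(to_ntriples(ENTRIES))
```

Keeping one plain record per entry and deriving every output format from it means the JSON, CSV and RDF views cannot drift apart; a fuller version would presumably use an RDF library and also model senses and glosses.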