Conversational English Audio Annotations

Resource Description

Context

Named Entity Recognition (NER) has mostly been studied in the context of written text. NER is also an important step in the de-identification (de-ID) of medical records, many of which are recorded conversations between a patient and a doctor. In such recordings, audio spans containing personal information should be redacted, just as sensitive character spans are redacted in de-ID for written text.

This dataset was used to test the performance of our Audio De-ID pipeline in our NAACL 2019 paper '[Audio De-identification: A New Entity Recognition Task][1]'. We evaluated the pipeline on a random subset of conversations from the Switchboard (LDC2001S13) and Fisher (LDC2004S13) datasets, both of which consist of English conversations.

Content

We manually annotated the files with audio annotations, each consisting of an NER tag, an audio time interval, a conversation ID, and a source dataset. The dataset includes a CC BY 4.0 license file, three data files, and a readme file with additional context and instructions.

[1]: https://arxiv.org/abs/1903.07037
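As a rough illustration of how annotations of this shape (an NER tag, a time interval, a conversation ID, a source dataset) could drive audio redaction, here is a minimal Python sketch. It is not the pipeline from the paper. The field names, the tag strings, and the choice to redact by writing silence are all assumptions made for the example; the dataset's readme is the authority on the actual file layout.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class AudioAnnotation:
    # Hypothetical field names; the dataset's real column names may differ.
    ner_tag: str           # entity type label
    start_sec: float       # start of the annotated audio interval, in seconds
    end_sec: float         # end of the annotated audio interval, in seconds
    conversation_id: str   # ID of the conversation the interval belongs to
    source: str            # source dataset, e.g. "Switchboard" or "Fisher"


def redaction_spans(
    annotations: List[AudioAnnotation],
) -> Dict[str, List[Tuple[float, float]]]:
    """Group intervals by conversation and merge overlapping ones."""
    by_conv: Dict[str, List[Tuple[float, float]]] = {}
    for ann in annotations:
        by_conv.setdefault(ann.conversation_id, []).append(
            (ann.start_sec, ann.end_sec)
        )

    merged: Dict[str, List[Tuple[float, float]]] = {}
    for conv_id, spans in by_conv.items():
        spans.sort()
        out: List[Tuple[float, float]] = []
        for start, end in spans:
            if out and start <= out[-1][1]:
                # Overlaps or touches the previous span: extend it.
                out[-1] = (out[-1][0], max(out[-1][1], end))
            else:
                out.append((start, end))
        merged[conv_id] = out
    return merged


def redact(
    samples: List[float], sample_rate: int, spans: List[Tuple[float, float]]
) -> List[float]:
    """Return a copy of samples with every span replaced by silence (zeros)."""
    redacted = list(samples)
    for start, end in spans:
        lo = max(0, int(start * sample_rate))
        hi = min(len(redacted), int(end * sample_rate))
        for i in range(lo, hi):
            redacted[i] = 0.0
    return redacted
```

Merging overlapping intervals first keeps the redaction step simple when two entities share part of an utterance. A real system might replace the span with a tone or noise rather than silence; zeros are used here only to keep the sketch dependency-free.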
