Conversational English Audio Annotations (Free)

jsaifc 22 2021-08-24 Speech Recognition

Resource Description

Context

Named Entity Recognition (NER) has mostly been studied in the context of written text. In particular, NER is an important step in the de-identification (de-ID) of medical records, many of which are recorded conversations between a patient and a doctor. In such recordings, audio spans containing personal information should be redacted, just as sensitive character spans are redacted in de-ID for written text. This dataset was used to test the performance of our Audio De-id pipeline in our NAACL 2019 paper '[Audio De-identification: A New Entity Recognition Task][1]'. We evaluated our pipeline on a random subset of conversations from the Switchboard (LDC2001S13) and Fisher (LDC2004S13) datasets, both of which consist of English conversations.

Content

We annotated the files manually. Each audio annotation consists of an NER tag, an audio time interval, a conversation ID, and a source dataset. The dataset includes a CC BY 4.0 license file, three data files, and a readme file with additional context and instructions.

[1]: https://arxiv.org/abs/1903.07037
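
As an illustration of how an annotation with these four fields might be used for redaction, here is a minimal Python sketch. The class name, field names, time unit (seconds), and the helper `redaction_spans` are all assumptions made for this example; the actual column names and file layout are defined in the dataset's readme, not here.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class AudioAnnotation:
    """One annotation record. Field names are hypothetical, not the dataset's."""
    ner_tag: str          # entity type label
    start_sec: float      # start of the audio interval (seconds assumed)
    end_sec: float        # end of the audio interval (seconds assumed)
    conversation_id: str  # identifier of the conversation
    source_dataset: str   # e.g. "Switchboard" or "Fisher"


def redaction_spans(
    annotations: Iterable[AudioAnnotation], conversation_id: str
) -> List[Tuple[float, float]]:
    """Return sorted, non-overlapping intervals to redact in one conversation."""
    spans = sorted(
        (a.start_sec, a.end_sec)
        for a in annotations
        if a.conversation_id == conversation_id
    )
    merged: List[Tuple[float, float]] = []
    for start, end in spans:
        if merged and start <= merged[-1][1]:
            # Overlapping or touching intervals collapse into one span.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```

Merging overlapping intervals first means each stretch of audio is silenced or masked once, regardless of how many entity tags cover it.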

