Context

Named Entity Recognition (NER) has mostly been studied in the context of written text. In particular, NER is an important step in the de-identification (de-ID) of medical records, many of which are recorded conversations between a patient and a doctor. In such recordings, audio spans containing personal information should be redacted, just as sensitive character spans are redacted in de-ID for written text.

This dataset was used to test the performance of our Audio De-id pipeline in our NAACL 2019 paper '[Audio De-identification: A New Entity Recognition Task][1]'. We evaluated the pipeline on a random subset of conversations from the Switchboard (LDC2001S13) and Fisher (LDC2004S13) datasets, both of which consist of English conversations.

Content

We annotated the files manually. Each audio annotation consists of an NER tag, the time interval of the audio span, a conversation ID, and the source dataset. The dataset includes a CC BY 4.0 license file, three data files, and a readme file with additional context and instructions.

[1]: https://arxiv.org/abs/1903.07037
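
As a rough illustration of how the four annotation fields described under Content (NER tag, audio interval, conversation ID, source dataset) could be turned into redaction spans, here is a minimal Python sketch. The class name, field names, time unit (seconds), and the `redaction_spans` helper are all assumptions made for this example; the actual file layout and column names are documented in the dataset's readme, and this is not the pipeline from the paper.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class AudioAnnotation:
    # Hypothetical field names; check the dataset's readme for the real ones.
    tag: str              # NER tag assigned by the annotator
    start_sec: float      # start of the audio interval (seconds assumed)
    end_sec: float        # end of the audio interval (seconds assumed)
    conversation_id: str  # ID of the annotated conversation
    source: str           # source dataset the conversation comes from


def redaction_spans(annotations):
    """Group annotations per conversation and merge overlapping intervals.

    Returns a dict mapping (source, conversation_id) to a sorted list of
    non-overlapping (start_sec, end_sec) spans that would need redaction.
    """
    by_conversation = defaultdict(list)
    for ann in annotations:
        key = (ann.source, ann.conversation_id)
        by_conversation[key].append((ann.start_sec, ann.end_sec))

    merged = {}
    for key, spans in by_conversation.items():
        spans.sort()
        out = [spans[0]]
        for start, end in spans[1:]:
            last_start, last_end = out[-1]
            if start <= last_end:
                # Overlapping or touching: extend the previous span.
                out[-1] = (last_start, max(last_end, end))
            else:
                out.append((start, end))
        merged[key] = out
    return merged
```

Merging overlapping intervals first means each stretch of audio is redacted once, even when two tagged entities overlap in time. Any tag values, IDs, or timestamps passed to this sketch would need to come from the real data files.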