Creators:
Alexander Vergara (vergara '@' ucsd.edu)
BioCircuits Institute
University of California San Diego
San Diego, California, USA
Donors of the Dataset:
Alexander Vergara (vergara '@' ucsd.edu)
Jordi Fonollosa (fonollosa '@' ucsd.edu)
Marco Trincavelli (marco.trincavelli '@' oru.se)
Nikolai F. Rulkov (nrulkov '@' ucsd.edu)
Ramon Huerta (rhuerta '@' ucsd.edu)
Data Set Information:
Number of instances:
18000 time-series measurements recorded from a chemical detection platform based on an array of 72 metal-oxide gas sensors.
Number of attributes (features):
Every measurement contains 72 time series recorded over 260 seconds, each sampled at 100 Hz (samples per second).
The dataset also contains time, temperature, and relative humidity information.
The resulting dataset therefore includes 75 time series, each composed of 26000 points.
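The figures above follow from simple arithmetic: 260 seconds at 100 samples per second gives 26000 points, and 72 sensor channels plus time, temperature, and relative humidity give 75 series. The short sketch below only restates that calculation; the constant and function names are illustrative and are not part of the dataset.

```python
# Values taken from the description above; the names are illustrative only.
SENSOR_CHANNELS = 72   # metal-oxide gas sensors in the array
AUX_CHANNELS = 3       # time, temperature, relative humidity
DURATION_S = 260       # length of one recording, in seconds
SAMPLE_RATE_HZ = 100   # samples per second


def expected_shape():
    """Return (points per series, number of series) implied by the description."""
    points = DURATION_S * SAMPLE_RATE_HZ
    series = SENSOR_CHANNELS + AUX_CHANNELS
    return points, series


print(expected_shape())  # (26000, 75)
```

A check of this kind may be useful as a sanity test after loading a measurement, although the on-disk layout of the files is not specified here.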
This archive contains 18000 time-series recordings collected from the array of 72 metal-oxide gas sensors that makes up our sensing platform, which is used to detect and identify potentially dangerous gaseous chemical substances under complex environmental conditions, as reported in the related manuscript below. Our primary purpose is to make the dataset freely accessible online to the chemo-sensing and machine-learning research communities, as well as other interested communities, so that they can develop alternative competitive solutions for gas discrimination tasks in open sampling settings, such as the one pursued here, and/or navigation. The dataset can be used exclusively for research purposes. Commercial purposes are fully excluded.
The dataset was gathered from December 2010 to April 2012 (16 months) in a 2.5 m × 1.2 m × 0.4 m wind tunnel research test-bed facility situated at the BioCircuits Institute, University of California San Diego. Our customized research facility, equipped with a computer-supervised, mass-flow-controller-based continuous-flow gas delivery system, operates in a propulsion open-cycle mode: it continuously draws external turbulent air into and through the tunnel and exhausts it back to the outside, thereby creating a relatively less turbulent airflow moving downstream towards the end of the test field. This mode is particularly suitable for the applications pursued here, which require injecting poisonous chemical agents or explosive mixtures, because it prevents saturation. The facility is operated by a fully computerized environment (controlled by Player/Stage robot server software programmed in C++ on a PC fitted with the appropriate serial cards) with minimum human intervention. It therefore provides the versatility to release the chemical substances of interest at the desired concentrations with high accuracy and in a highly reproducible manner throughout the entire experiment, while preserving the environmental conditions needed to generate chemical gas plumes exhibiting turbulent patterns. A graphical illustration of the wind tunnel test-bed facility, along with the geometry of the problem and the exact locations of the chemical analyte source and chemo-sensory platform, is presented in Figure 2 of the manuscript cited below. Actual pictures of the wind tunnel are also presented in the Supplementary Material, Figure S.1, of the accompanying manuscript.
The resulting dataset induces a ten-class gas discrimination problem, comprising recordings from ten distinct pure chemical gases, namely Acetone, Acetaldehyde, Ammonia, Butanol, Ethylene, Methane, Methanol, Carbon Monoxide, Benzene, and Toluene. The goal is to identify and discriminate these chemical hazards at relevant concentrations regardless of the location of the sensory platform within the wind tunnel research facility and of the environmental and parametric conditions induced in the setting (please see the manuscript for more details). See Table 1 in Vergara et al. 2013 (manuscript below) for specifics on the identity of the chemical analytes as well as their nominal concentration values at the outlet of the gas source in parts-per-million by volume (ppmv). Please refer to the manuscript below for more details on the wind tunnel test-bed facility, the collection procedure followed, and the operating and environmental parameters used during the creation of the dataset.
Attribute Information:
The response of the sensor platform is read out as the resistance across the active sensitive film of each of the 72 gas sensors in the array. Each measurement therefore produces a 72-channel recording in which every channel is a 260-second time series sampled at 100 samples per second (Hz), reflecting all the environmental changes in the evaluated scenario. For a more detailed analysis and discussion of the processing of the time series, please refer to Sections 2 and 3 of the manuscript below; for a graphical illustration of the time series, see Figure 4 of the same manuscript.
For manipulation purposes, the data is organized into eleven folders, each containing the measurements for one chemical class identity and nominal concentration, as indicated above and described in Table 2 of the manuscript. For example, the folder named "Toluene_200" holds the measurements for the gas Toluene dosed at 200 ppmv. Each folder contains 6 subfolders, each representing a line location within the test area of the wind tunnel (location 1, L1, to location 6, L6, with L1 being the closest to the gas source) from which the set of time series was recorded. Each of these subfolders contains 300 files, one per measurement recorded at that location in the tunnel. The name of each file contains the exact log information of the measurement, organized as follows. The first 12 digits of the file name (e.g., 201106060617) indicate the date and time at which the measurement was collected, in the order year, month, day, and time. The last 4 characters of the following 19 characters of the file name (e.g., board_setPoint_500V) indicate the fixed operating temperature, expressed as the voltage applied to the heating element embedded in each chemical sensor and applied to the entire sensing platform; this voltage can take nominal values from 4 to 6 V with a resolution of 0.5 V. Note that the value 500V in the example is a representation of the 5 V applied to the sensor's heater. For more details on the operating principles of the chemical sensors used in our platform, please refer to Section 2 of the manuscript. The last 3 digits of the following 16 characters of the file name (e.g., fan_setPoint_060) indicate the set-point value of the nominal rotational speed of the multiple-step motor-driven exhaust fan used to induce the distinct artificial airflow speeds in the wind tunnel.
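The naming scheme described so far can be decoded along the following lines. This is a minimal sketch under stated assumptions, not the authors' tooling. The function names are hypothetical, and the example file name is assembled from the fragments quoted above, with underscores assumed between fields. The scaling of the heater code (digits divided by 100, so that 500V maps to 5 V) is inferred from the single example given. Regular expressions are used instead of fixed character offsets so that the sketch does not depend on the exact separators.

```python
import re
from datetime import datetime


def parse_folder_name(folder):
    """Split a class folder name such as 'Toluene_200' into (gas, ppmv)."""
    gas, concentration = folder.rsplit("_", 1)
    return gas, int(concentration)


def parse_file_name(name):
    """Extract the timestamp, heater voltage, and fan set point from a file name."""
    stamp = re.match(r"\d{12}", name)                       # first 12 digits
    heater = re.search(r"board_setPoint_(\d{3})V", name)    # e.g. board_setPoint_500V
    fan = re.search(r"fan_setPoint_(\d{3})", name)          # e.g. fan_setPoint_060
    if not (stamp and heater and fan):
        raise ValueError(f"unrecognized file name: {name!r}")
    return {
        "timestamp": datetime.strptime(stamp.group(), "%Y%m%d%H%M"),
        "heater_volts": int(heater.group(1)) / 100,  # assumed scaling: '500V' -> 5.0 V
        "fan_set_point": fan.group(1),               # kept as the raw 3-digit code
    }


# Hypothetical file name built from the examples quoted in the text.
example = "201106060617_board_setPoint_500V_fan_setPoint_060"
print(parse_folder_name("Toluene_200"))
print(parse_file_name(example))
```

The fan set point is returned as the raw three-digit code rather than converted to a speed, since the code is what the file name actually stores.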
Only three values were adopted for the fan set point: "000", which indicates the slowest rotational speed (1500 rpm); "060", which indicates the mid-point rotational speed of the fan (3900 rpm); and "100", which refers to the fastest induced speed of the fan (5500 rpm). The last 14 characters of the following string of 27 characters (e.g., mfc_setPoint_Toluene_200ppm) describe the analyte identity and concentration value for each particular measurement. Thus, the example just mentioned represents the class corresponding to the chemical analyte "Toluene" dosed at the nominal concentration of 200 ppm. Finally, the last 2 or 3 characters in the name (e.g., "p7") describe the line po