# Context

The goal of this dataset is to use publicly available online information from the German railway service (DBAHN) to build a data set that lets users analyze the state of the different train lines at different points throughout the country. The data comes from DBAHN's website and is collected with web-scraping tools written in Python.

# Content

Data is captured every minute, and each result adds one line to the log.

## Request URL example

`https://reiseauskunft.bahn.de/bin/bhftafel.exe/dn?&country=DEld=15082&seqnr=4&protocol=https:&input=Berlin%238011160&ident=fi.0865482.1497188234&rt=1&productsFilter=1111100000&time=11:00&date=20.06.19&ld=15082&start=1&boardType=arr&rtMode=DB-HYBRID HTTP/1.1`

As a result, there are multiple entries for the same train/journey, so it is possible to observe how delays, alerts, etc. evolve over time.

`2019-06-20 11:09:41 DEBUG: RESULT_ROW TAA-|TA-|TIN-S 7|TIR-Potsdam Hbf (S) 10:21-Berlin Wannsee (S) 10:32-Berlin-Nikolassee 10:35-Berlin-Grunewald 10:42-Berlin Westkreuz 10:45Berlin Bellevue 10:55-Berlin Hbf (S-Bahn) 10:58|TSI-8011160|TIM-arr|TIL-/bin/traininfo.exe/dn/789990/646057/632576/52969/80?ld=15082&country=DEU&protocol=https:&seqnr=4&ident=fi.0865482.1497188234&rt=1&date=20.06.19&time=10:58&station_evaId=8089021&station_type=arr&rtMode=DB-HYBRID&|TIRE-Potsdam Hbf (S)|TIP-15Berlin Hbf (S-Bahn)|TIT-10:58|TID-20.06.19|TSC-Berlin%238011160`

## Columns explanation

- TAA: Alerts (separated by `@@`)
- TA: Delayed time, given as a clock time (it was previously given in minutes); it needs to be compared with the TIT column
- TIN: Train model
- TIR: Route
- TSI: Station ID
- TIM: Direction (departure/arrival)
- TIL: Request with parameters
- TIRE: Destination
- TIP: Platform number
- TIT: Departure time
- TID: Date
- TSC: Station name and ID
- TAc: Delay in minutes (TA - TIT)

# Inspiration

- Train models with the most failures
- Cities where the most lines converge
- Cities that are bottlenecks
- Breakdown forecasting
- ...
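
As a starting point for these analyses, the log format described under "Columns explanation" can be read with a few lines of Python. The sketch below is not the scraper's own code. It assumes that every entry contains the marker `RESULT_ROW`, that fields are separated by `|`, and that each field is a column code followed by `-` and its value. The function names `parse_result_row` and `delay_minutes` are hypothetical, the sample line is abridged from the log example, and its `TA` value of `11:03` is invented for illustration. It also assumes that `TA` and `TIT` are both `HH:MM` clock times.

```python
from datetime import datetime


def parse_result_row(line):
    """Split one RESULT_ROW log line into (capture time, dict of columns)."""
    head, marker, payload = line.partition("RESULT_ROW ")
    if not marker:
        raise ValueError("not a RESULT_ROW line")
    # The first 19 characters hold the capture time, e.g. "2019-06-20 11:09:41".
    captured_at = datetime.strptime(head[:19], "%Y-%m-%d %H:%M:%S")
    fields = {}
    for chunk in payload.strip().split("|"):
        # Split on the first "-" only: values such as the route contain more.
        key, dash, value = chunk.partition("-")
        if dash:
            fields[key] = value
    return captured_at, fields


def delay_minutes(ta, tit):
    """Return TA - TIT in minutes, or None when either value is missing."""
    if not ta or not tit:
        return None
    diff = datetime.strptime(ta, "%H:%M") - datetime.strptime(tit, "%H:%M")
    minutes = int(diff.total_seconds() // 60)
    # Assumption: a gap below -12 hours means the delay crossed midnight.
    if minutes < -720:
        minutes += 1440
    return minutes


# Abridged sample; the TA value "11:03" is hypothetical.
SAMPLE = (
    "2019-06-20 11:09:41 DEBUG: RESULT_ROW TAA-|TA-11:03|TIN-S 7|"
    "TIR-Potsdam Hbf (S) 10:21-Berlin Hbf (S-Bahn) 10:58|TSI-8011160|"
    "TIM-arr|TIRE-Potsdam Hbf (S)|TIT-10:58|TID-20.06.19|"
    "TSC-Berlin%238011160"
)

captured_at, fields = parse_result_row(SAMPLE)
# Alerts are "@@"-separated; an empty TAA column yields an empty list.
alerts = [a for a in fields["TAA"].split("@@") if a]
delay = delay_minutes(fields["TA"], fields["TIT"])
```

Because the same train appears in several captures, grouping the parsed rows by `TIL` or by `TIN` plus `TIT` and ordering them by capture time is one way to follow how a delay develops.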