Context
I want to use the haversine distance and an approximate zip code for each trip in the New York City Taxi Trip Duration competition.

Content
Features computed from the competition data at https://www.kaggle.com/c/nyc-taxi-trip-duration.

Acknowledgements
**geosphere** package in R
**zipcode** package in R

Inspiration
If you find this data useful but cannot download it, you can recompute it with the few lines of R code below.
library(data.table)
library(geosphere)
library(zipcode)
**Reading data**
train <- fread("train.csv")
test <- fread("test.csv")
**Computing the haversine distance (in metres)**
train[, distance := distHaversine(cbind(pickup_longitude, pickup_latitude), cbind(dropoff_longitude, dropoff_latitude))]
test[, distance := distHaversine(cbind(pickup_longitude, pickup_latitude), cbind(dropoff_longitude, dropoff_latitude))]
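For readers who want to see what distHaversine computes, here is a minimal base-R sketch of the haversine formula. It is an illustration, not the geosphere source. The function name haversine_m is hypothetical, and the radius r = 6378137 metres is assumed to match geosphere's default. The formula is a = sin²(Δφ/2) + cos φ1 · cos φ2 · sin²(Δλ/2), and d = 2r · asin(√a).

```r
# Hypothetical helper: haversine great-circle distance in metres.
# r = 6378137 is assumed to match geosphere::distHaversine's default radius.
haversine_m <- function(lon1, lat1, lon2, lat2, r = 6378137) {
  to_rad <- pi / 180
  phi1 <- lat1 * to_rad
  phi2 <- lat2 * to_rad
  dphi <- (lat2 - lat1) * to_rad
  dlambda <- (lon2 - lon1) * to_rad
  # a = sin^2(dphi/2) + cos(phi1) * cos(phi2) * sin^2(dlambda/2)
  a <- sin(dphi / 2)^2 + cos(phi1) * cos(phi2) * sin(dlambda / 2)^2
  # pmin guards against a drifting just above 1 through floating-point rounding
  2 * r * asin(sqrt(pmin(a, 1)))
}

# Two illustrative points in Manhattan (made-up coordinates, not from the dataset)
haversine_m(-73.9857, 40.7484, -73.9680, 40.7851)
```

Every operation is vectorised, so the helper could also be applied to whole coordinate columns of train or test. With the same radius, it should agree with distHaversine up to floating-point rounding.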
**Assigning an approximate zip code using the Euclidean distance between locations**
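To illustrate the matching idea named in this heading, here is a minimal base-R sketch on a made-up three-row table. Each point gets the zip code whose centroid is nearest in plain Euclidean distance on (longitude, latitude) degrees. The column names zip, latitude and longitude are assumed to mirror the zipcode package's data. The function nearest_zip and the toy values are hypothetical.

```r
# Hypothetical helper: nearest zip centroid by squared Euclidean distance
# on raw (longitude, latitude) degrees; brute force over all zip rows.
nearest_zip <- function(lon, lat, zips) {
  vapply(seq_along(lon), function(i) {
    d2 <- (zips$longitude - lon[i])^2 + (zips$latitude - lat[i])^2
    zips$zip[which.min(d2)]
  }, character(1))
}

# Toy lookup table: placeholder labels, not real zip codes or centroids
zips <- data.frame(
  zip = c("A", "B", "C"),
  latitude = c(40.75, 40.78, 40.70),
  longitude = c(-73.99, -73.96, -74.01),
  stringsAsFactors = FALSE
)

nearest_zip(c(-73.985, -73.962), c(40.748, 40.781), zips)
```

Euclidean distance on degrees is only an approximation. At New York's latitude, a degree of longitude is roughly cos(40.7°) ≈ 0.76 times as long as a degree of latitude. The brute-force loop is also O(trips × zip codes), so a spatial index would scale better on the full data.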
data("zipcode")
zipcode <- as.data.table(zipcode)
zipcode <- zipcode[state == "NY