
NYC Taxi Analysis

  • bokakwu
  • Nov 22, 2021
  • 3 min read


I worked on Maven’s NYC Taxi dataset. To be honest, this dataset was very challenging. To begin with, it had about 28 million rows. The data was so large that my laptop got angry, and Power BI Desktop stopped working several times. I decided to cut the dataset in half and analyze 2 years, 2019 and 2020, instead of 4. The dataset also required tons of cleaning. One of the requirements was that the analysis should only cover trips paid for by cash or credit card. Here are some of my findings.
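The filtering described above was done in Power BI, but the same idea can be sketched in plain Python. This is only an illustration: the field names `pickup_datetime` and `payment_type`, the sample rows, and the payment codes (1 for credit card, 2 for cash) are assumptions, not taken from my actual report.

```python
from datetime import datetime

# Assumed payment codes; check the dataset's data dictionary before relying on them.
CREDIT_CARD, CASH = 1, 2

# Hypothetical sample rows standing in for the real trip records.
trips = [
    {"pickup_datetime": "2018-06-01 08:15:00", "payment_type": 1},
    {"pickup_datetime": "2019-03-12 17:05:00", "payment_type": 1},
    {"pickup_datetime": "2020-01-20 01:30:00", "payment_type": 2},
    {"pickup_datetime": "2020-07-04 12:00:00", "payment_type": 3},
]


def keep_trip(row):
    """Return True for 2019-2020 trips paid by cash or credit card."""
    year = datetime.strptime(row["pickup_datetime"], "%Y-%m-%d %H:%M:%S").year
    return year in (2019, 2020) and row["payment_type"] in (CREDIT_CARD, CASH)


filtered = [row for row in trips if keep_trip(row)]
print(len(filtered), "of", len(trips), "trips kept")
```

Filtering early like this is also what keeps the row count manageable on a laptop.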


The total amount generated from trips in 2019 and 2020 was $119.95M, and the total distance covered was 8M km. The average fare per trip was $15.04 and the average distance per trip was 6.6 km.
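For readers who want to see how totals and per-trip averages like these relate, here is a minimal sketch. The rows and the field names `fare` and `distance_km` are made up for illustration; they are not the real data, so the numbers will not match the figures above.

```python
# Hypothetical trips; field names and values are assumptions for illustration.
trips = [
    {"fare": 12.50, "distance_km": 4.0},
    {"fare": 20.00, "distance_km": 9.5},
    {"fare": 9.50, "distance_km": 3.0},
]


def summarize(rows):
    """Compute totals and per-trip averages for fare and distance."""
    n = len(rows)
    total_fare = sum(r["fare"] for r in rows)
    total_distance = sum(r["distance_km"] for r in rows)
    return {
        "trips": n,
        "total_fare": total_fare,
        "total_distance_km": total_distance,
        "avg_fare": total_fare / n,
        "avg_distance_km": total_distance / n,
    }


summary = summarize(trips)
print(summary)
```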




The expected trip distance for week 47 is 98,278 km. For the rest of the year, the expected weekly trip distance is 202,832 km for week 48 and 262,582 km for week 49, which is the highest of the remaining weeks. Week 50 is expected to cover 243,948 km, week 51 232,416 km, and week 52 248,411 km.




The most popular pickup and drop-off location is Zone 74. Other popular pickup and drop-off locations are Zones 6 to 8, 41 to 43, 75, 129, and 166.
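A ranking like this comes down to counting how often each zone appears. The sketch below counts pickups and drop-offs together, which is one possible reading of "most popular"; the field names `pickup_zone` and `dropoff_zone` and the sample rows are assumptions.

```python
from collections import Counter

# Hypothetical trips; zone IDs are used only as example values.
trips = [
    {"pickup_zone": 74, "dropoff_zone": 75},
    {"pickup_zone": 74, "dropoff_zone": 41},
    {"pickup_zone": 41, "dropoff_zone": 74},
    {"pickup_zone": 166, "dropoff_zone": 74},
]

# Count every appearance of a zone, whether as pickup or as drop-off.
zone_counts = Counter()
for trip in trips:
    zone_counts[trip["pickup_zone"]] += 1
    zone_counts[trip["dropoff_zone"]] += 1

top_zone, top_count = zone_counts.most_common(1)[0]
print("Busiest zone:", top_zone, "with", top_count, "pickups and drop-offs")
```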





The month with the highest number of trips is January, followed by February and March. Trip volume drops significantly in April and May. It begins to increase again in June, but it is lowest in August.




For week 47 (the current week of the year), the average number of trips expected is 130,139. For the rest of the year, the expected weekly averages are 111,856 trips for week 48, 128,672 for week 49, 132,713 for week 50, 128,092 for week 51, and 98,278 for week 52.




Trip volume is usually highest on Fridays from 4 pm to 6 pm. The busiest time on weekdays is also 4 pm to 6 pm. However, the busiest time on Saturdays and Sundays is 1 am.
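To show how a "busiest hour per day" result can be derived, here is a small sketch that groups pickup timestamps by day of week and hour. The timestamps are invented and the function name `busiest_hour_by_day` is hypothetical; my report did this with a Power BI visual, not with this code.

```python
from collections import Counter
from datetime import datetime

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

# Hypothetical pickup timestamps (15 March 2019 was a Friday).
pickups = [
    "2019-03-15 16:10:00",
    "2019-03-15 16:45:00",
    "2019-03-15 09:00:00",
    "2019-03-16 01:20:00",
    "2019-03-16 01:50:00",
    "2019-03-16 14:00:00",
]


def busiest_hour_by_day(timestamps):
    """Return {day name: hour with the most pickups on that day}."""
    counts = Counter()
    for ts in timestamps:
        dt = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")
        counts[(DAY_NAMES[dt.weekday()], dt.hour)] += 1

    best = {}  # day -> (hour, trip count)
    for (day, hour), n in counts.items():
        if day not in best or n > best[day][1]:
            best[day] = (hour, n)
    return {day: hour for day, (hour, _) in best.items()}


busiest = busiest_hour_by_day(pickups)
print(busiest)
```

Hours are on a 24-hour clock here, so 16 corresponds to 4 pm and 1 to 1 am.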




For trips by zone, the Power BI custom shape map shows that East Harlem North has the highest number of trips and generates the highest amount from trip fares. This zone has about 610K trips and makes about $6.7M from these trips.



Putting it all together



Recommendations

  • The estimated average fare per trip is $15. This rate should be used to plan for the year 2022.

  • The average distance covered per trip is 6.6 km. This should be used in estimating the distance to be covered in the year 2022.

  • The busiest time during the weekdays is 4 pm to 6 pm. Having more taxis available at this time will pay off.

  • The busiest time during the weekends is 1 am. This should be included in the plan for the year 2022.

  • The busiest pickup and drop-off location is Zone 74. More taxis could be made available for this location, or plans to make pickup and drop-off seamless could be implemented for the next fiscal year.

  • Shape maps show that East Harlem North generates the highest revenue and the highest count of trips. Plans could be implemented to make pickups and drop-offs seamless in this location.


The slides I prepared for this analysis are here: https://bit.ly/3cx8Alt



 
 
 
