Flight Data Analysis

Advanced Analytics of Flight Patterns and Pricing for JFK-Bound Flights
Duration: Nov. 2023 - Jan. 2024
Skills: Data Preprocessing (Python, SQL), Statistical Analysis (Python), Data Visualization (Tableau)

Description

This project uses a Kaggle dataset of more than 6 million flight ticket records captured from Expedia. The dataset covers flights to and from key airports such as ATL, DFW, DEN, ORD, LAX, CLT, MIA, JFK, EWR, SFO, DTW, BOS, PHL, LGA, and IAD, recorded between April 2022 and November 2022. It includes flight and search dates, starting and destination airports, fare details, travel duration, and seat availability, among other fields.

In this analysis, the focus narrows to examining flights arriving at John F. Kennedy International Airport (JFK). JFK is selected for its prominence as a major international and domestic hub, offering a rich and diverse dataset to analyze. By concentrating on JFK, the aim is to gain deeper insights into the flight patterns and pricing dynamics specific to this significant airport.

Objective / Success Metrics

The aim of this project was to analyze and understand the complexities of flight pricing and demand at one of the United States' busiest transport hubs, JFK Airport. Specifically, the objectives were:

  • Spatial Fare Patterns: To visualize and analyze the geographical distribution of flight fares and counts, seeking regional trends and anomalies.
  • Seasonal Fare Trends: To reveal how flight prices and demand fluctuate over time, especially in relation to seasons and weekdays.
  • Demand Analysis: To quantify the market share of airlines operating flights to JFK, identifying dominant carriers and competitive dynamics.
  • Pricing Strategy Exploration: To investigate the pricing strategies across different airlines, assessing how factors like cabin class influence fare structures.

These objectives were established to provide actionable insights into airline pricing strategies, consumer demand patterns, and the overall economic landscape of airline travel to JFK Airport.

Approach

  1. Data Preprocessing (SQL, Python):

    Before the analysis, the data was preprocessed with Dask (Python) and SQL, which reduced it from over 6 million records to 637,000 entries. This stage was critical to ensure the accuracy and usability of the data for this project. It involved cleaning, transforming, and enriching the dataset. Exploratory data analysis (EDA) was used to understand the data's structure, identify patterns, and assess the distribution of key variables.

    • Subset Selection: Concentrated on flights destined for JFK to refine the dataset scope.
    • Column Removal: Dropped unnecessary columns such as 'legId', 'fareBasisCode', and 'elapsedDays'.
    • Null Value Treatment: Eliminated rows with missing data to ensure dataset integrity.
    • Date Transformation: Transformed 'searchDate' and 'flightDate' into datetime format and 'travelDuration' into minutes.
    • Additional Attributes: Enriched the dataset with new columns for weekdays.
    • Segment Standardization: Unified 'segmentsAirlineCode' and 'segmentsCabinCode', labeling as 'multiple' where necessary.
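
    The steps above can be sketched in pandas on a small in-memory sample. This is an illustrative sketch, not the project's actual Dask/SQL pipeline. The column names 'destinationAirport' and 'totalFare', the 'PT2H29M' duration format, the '||' segment separator, and the helper names are assumptions that the write-up does not state.

```python
import re

import pandas as pd

# Assumed layout: durations look like "PT2H29M" and multi-segment values are
# joined with "||". Neither detail is stated in the write-up.
_DURATION = re.compile(r"P(?:(\d+)D)?T(?:(\d+)H)?(?:(\d+)M)?")


def duration_to_minutes(value: str) -> int:
    """Convert a duration string such as 'PT2H29M' to whole minutes."""
    match = _DURATION.fullmatch(value)
    if match is None:
        raise ValueError(f"unexpected duration format: {value!r}")
    days, hours, minutes = (int(g) if g else 0 for g in match.groups())
    return (days * 24 + hours) * 60 + minutes


def unify_segments(value: str, sep: str = "||") -> str:
    """Return the single shared code, or 'multiple' if the segments differ."""
    codes = set(value.split(sep))
    return codes.pop() if len(codes) == 1 else "multiple"


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    # Subset selection: keep only JFK-bound flights.
    out = df[df["destinationAirport"] == "JFK"].copy()
    # Column removal: drop fields not needed for the analysis.
    out = out.drop(columns=["legId", "fareBasisCode", "elapsedDays"])
    # Null value treatment: drop rows with any missing value.
    out = out.dropna()
    # Date transformation: parse dates, convert durations to minutes.
    out["searchDate"] = pd.to_datetime(out["searchDate"])
    out["flightDate"] = pd.to_datetime(out["flightDate"])
    out["travelDuration"] = out["travelDuration"].map(duration_to_minutes)
    # Additional attributes: weekday names (hypothetical column names).
    out["searchWeekday"] = out["searchDate"].dt.day_name()
    out["flightWeekday"] = out["flightDate"].dt.day_name()
    # Segment standardization: one code per itinerary, else 'multiple'.
    for col in ("segmentsAirlineCode", "segmentsCabinCode"):
        out[col] = out[col].map(unify_segments)
    return out.reset_index(drop=True)


# Made-up rows for illustration only; they are not taken from the dataset.
sample = pd.DataFrame({
    "legId": ["a", "b", "c", "d"],
    "searchDate": ["2022-04-16", "2022-04-16", "2022-04-17", "2022-04-17"],
    "flightDate": ["2022-04-17", "2022-05-01", "2022-05-02", "2022-05-03"],
    "destinationAirport": ["JFK", "JFK", "LGA", "JFK"],
    "fareBasisCode": ["X1", "X2", "X3", "X4"],
    "travelDuration": ["PT2H29M", "PT7H5M", "PT1H10M", None],
    "elapsedDays": [0, 0, 0, 0],
    "totalFare": [248.6, 312.1, 98.6, 150.0],
    "segmentsAirlineCode": ["DL", "AA||B6", "B6", "AA"],
    "segmentsCabinCode": ["coach", "coach||coach", "coach", "coach"],
})

clean = preprocess(sample)
print(clean)
```

    On the full dataset, the same transformations would need an out-of-core tool such as Dask, as the project used; the pandas version here only shows the logic of each step.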

  2. Data Analysis and Visualization (Tableau):

    With the data refined and structured, Tableau was employed to visualize the data. This phase was about translating complex datasets into intuitive graphics, enabling the identification of clear trends, patterns, and outliers. The visualizations aimed to provide a compelling narrative of the flight market trends and behaviors at JFK Airport.

Results

Key Takeaways

  1. Summer months showed an unexpected trend: despite high demand, flight fares did not spike, which suggests that airlines priced competitively to attract customers. Under such a strategy, revenue would be sustained by higher passenger volume rather than by higher fares.

  2. Fares for flights with ample seats remained stable, while fares for flights nearing full capacity rose sharply as the travel date approached. This suggests a strategy for travelers: monitor seat availability well in advance. For flights that are not near full capacity, waiting may carry little cost, since prices may not increase significantly. For nearly full flights, early booking is advisable to avoid higher fares closer to the travel date.

  3. American Airlines (AA) emerges as the dominant carrier for JFK-bound flights, reflecting its robust network and strategic presence at the airport. This may indicate a strong brand preference or more comprehensive service offerings compared with Delta (DL) and JetBlue (B6), which also hold substantial shares.
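
The second and third takeaways come from Tableau views, but the underlying aggregations are simple enough to sketch in pandas. The example below is a minimal sketch under stated assumptions: the column names 'seatsRemaining' and 'totalFare', the function names, and the threshold of 3 seats for "nearly full" are hypothetical, and the sample rows are made up. The carrier share it computes is a share of fare listings, not of passengers carried.

```python
import pandas as pd


def fare_by_lead_time(df: pd.DataFrame, low_seat_threshold: int = 3) -> pd.DataFrame:
    """Mean fare by days before departure, split by seat availability."""
    out = df.copy()
    # Lead time: whole days between the search and the flight.
    out["daysToFlight"] = (out["flightDate"] - out["searchDate"]).dt.days
    # The threshold is an assumed cut-off, not one taken from the project.
    nearly_full = out["seatsRemaining"] <= low_seat_threshold
    out["availability"] = nearly_full.map(
        {True: "nearly full", False: "ample seats"}
    )
    return out.pivot_table(
        index="daysToFlight",
        columns="availability",
        values="totalFare",
        aggfunc="mean",
    )


def carrier_share(df: pd.DataFrame) -> pd.Series:
    """Fraction of fare listings per airline code."""
    return df["segmentsAirlineCode"].value_counts(normalize=True)


# Made-up rows for illustration only; they are not taken from the dataset.
sample = pd.DataFrame({
    "searchDate": pd.to_datetime(
        ["2022-06-01", "2022-06-10", "2022-06-01", "2022-06-10"]
    ),
    "flightDate": pd.to_datetime(["2022-06-11"] * 4),
    "seatsRemaining": [9, 9, 2, 2],
    "totalFare": [200.0, 210.0, 250.0, 400.0],
    "segmentsAirlineCode": ["AA", "AA", "DL", "B6"],
})

table = fare_by_lead_time(sample)
share = carrier_share(sample)
print(table)
print(share)
```

Comparing the two columns of the resulting table across lead times is one way to check whether fares on nearly full flights rise faster as departure approaches.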

The analysis examines how the airline market behaves on JFK-bound routes, covering regional demand, fare variation by weekday, carrier dominance, and class-based pricing strategies. It shows how airlines adjust to market shifts and passenger preferences. Although the data covers only April to November 2022, the patterns it reveals can inform expectations about future industry movements and support stakeholders' decision-making.