NBA Performance and Respective City's Economic Growth
Duration: October 2022 ~ December 2022
Tools: NumPy, pandas
Project: UCSD / Group Project
Responsibilities: Finding Datasets, Data Cleaning, EDA and Analysis
In this report, we examine the relationship between an NBA team's performance and its respective city's economic growth. To do so, we combined public datasets into a single workable data set. With that data, we conducted exploratory data analysis, using various graphs to visualize the data and judge whether a relationship exists. We then tested the results statistically with linear regression models. The results showed that it is hard to establish a relationship between NBA performance and the respective city's economic growth; we suspect this is because of the many outliers and the other variables that also affect a city's economic status.
Research Question
Is there a relationship between an NBA team’s performance (win rate and league standing: 1st, 2nd, …) and economic growth (measured by unemployment rate, population growth, and GDP growth) in its respective city?
Background & Prior Work
The NBA is the most prominent basketball organization in the world. Players such as Michael Jordan, Kobe Bryant, and LeBron James have significantly impacted our society, and franchises such as the Golden State Warriors and the Los Angeles Lakers have contributed to the United States economy. Lately, whether an NBA championship leads to rent inflation and other social effects in the champion's city has become a debated topic. Our background research on the NBA and its economic impact has allowed us to narrow our interest to NBA teams' performance and the economic growth of their respective cities. Our primary purpose is to see whether having an NBA team in a town is a significant benefit or not. Specifically, we divide economic growth into three parts: unemployment rate, population growth, and GDP growth.
Gilchrist examines how constructing an NBA arena impacts the host city's economy [^Gilchrist]. He states that arena construction is not a significant indicator of the host city's economic growth. Drawing on background knowledge and previous literature, he nevertheless believes that tourism will increase, which will in turn affect the host city's economy. His work is closely related to our interest, except that we look at the team's performance rather than the construction of a new arena. We can still borrow from how Gilchrist treated his data to reduce variance in our own research.
Additionally, CNN lists key economic growth indicators, including US payrolls, the unemployment rate, and GDP [^CNN]. From those indicators we chose the unemployment rate and GDP, together with population growth, to compare against NBA teams' performance. Coates and Humphreys describe how local professional teams bring hundreds of new jobs; however, they also state that there is no evidence of a positive economic impact from professional sports [^Coates]. Both publications agree that facilities for sports teams do not contribute to economic growth. We hope to see whether a team's performance contributes to economic growth in other ways.
[^Gilchrist]: Gilchrist, G. (2018). Analysis of the economic impact of constructing a NBA arena on the host City. CMC Senior Theses. https://scholarship.claremont.edu/cgi/viewcontent.cgi?article=2938&context=cmc_theses
[^CNN]: Key Economic Growth Indicators. (n.d.). CNN Business. Retrieved October 23, 2022, from https://edition.cnn.com/business/economic-growth-indicators
[^Coates]: Coates, D., & Humphreys, B. R. (2003). Professional sports facilities, franchises and urban economic development. Public Finance and Management, 3(3), 335-357. https://core.ac.uk/download/pdf/7068694.pdf
Hypothesis
We believe that as teams win more games and finish at the top of the standings, they will become more popular and draw higher live-game attendance, which will improve their respective city's economy. As their performance improves, there will be an increase in hiring, which we assume will lower the unemployment rate. Additionally, we presume the population will grow as more people move to the respective cities to support their teams. GDP could also increase as people spend more in consumer markets related to the teams they support.
Datasets
Dataset Name: NBA (Team_Data)
- Link to the dataset: https://www.basketball-reference.com/leagues/
- From the above link, we imported one dataset per season, for a total of 20 datasets.
- Number of observations: 457
- Description: Once the NBA datasets are combined, we have the win rate and ranking of each team for the seasons from 2001 to 2019.
Dataset Name: Population Growth Rate
- Link to the dataset: https://www.macrotrends.net/
- From the above link, we imported the yearly population growth rate dataset for each NBA team's respective city, for a total of 23 datasets.
- Number of observations: 457
- Description: Once the population datasets are combined, we have the population change rate relative to the previous year and the total population at the end of that year, for each NBA team's respective city, for the seasons from 2001 to 2019.
Dataset Name: Unemployment Rate
- Link to the dataset: https://www.bls.gov/lau/
- From the above link, we imported the yearly unemployment rate dataset for each NBA team's respective city, for a total of 23 datasets.
- Number of observations: 457
- Description: Once the unemployment datasets are combined, we have the unemployment rate for each NBA team's respective city for the seasons from 2001 to 2019.
Dataset Name: GDP Growth Rate
- Link to the dataset: https://www.bea.gov/itable/
- From the above link, we imported the yearly GDP growth rate dataset for each NBA team's respective city, for a total of 23 datasets.
- Number of observations: 457
- Description: Our dataset contains the GDP data of cities measured across different years. To match the NBA seasons, we include the years 2001~2020 and use the GDP rate as our variable.
Data Cleaning and Wrangling
Population Growth Data:
We began by importing the CSV files for each respective city and initiated the process of wrangling and cleaning the population growth data. To facilitate our data organization, we created two lists: one for storing the CSV file names related to population growth and another for the city names, which would be assigned later.
Next, we proceeded to read the data and store each individual data frame (CSV file) into a list called df_pop_list, which we had created earlier.
With the data stored in df_pop_list, our next step was to remove extraneous years from the dataset. We kept the years between 2000 and 2020. This window was chosen to align with the NBA schedule, in which a regular season starts late in one calendar year and typically ends in April of the following year (e.g., a season that starts in late 2000 ends in 2001).
Additionally, we prepared to combine all datasets into one comprehensive population data set using the concat function. We also identified specific datasets for exclusion – Charlotte 2002 and 2003, as well as Memphis 2000 – due to a lack of corresponding NBA data.
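The steps above could be sketched roughly as follows. This is a minimal illustration rather than the project's actual code: the in-memory CSV text, the column names `Year` and `Population Change Rate`, the `City` column, and the helper `load_city` are all assumptions, and the numbers are made-up placeholders. Only `df_pop_list`, the 2000 to 2020 window, the use of `concat`, and the Memphis 2000 exclusion come from the description.

```python
import io
import pandas as pd

# Hypothetical stand-ins for the per-city CSV files (placeholder values).
city_csvs = {
    "Atlanta": "Year,Population Change Rate\n1999,2.1\n2000,2.0\n2001,1.9\n",
    "Memphis": "Year,Population Change Rate\n2000,0.8\n2001,0.7\n2021,0.5\n",
}

def load_city(city, csv_text):
    """Read one city's CSV, tag it with the city name, and keep 2000-2020 only."""
    df = pd.read_csv(io.StringIO(csv_text))
    df["City"] = city
    return df[df["Year"].between(2000, 2020)]

df_pop_list = [load_city(city, text) for city, text in city_csvs.items()]

# Stack the per-city frames into one population data set.
df_pop = pd.concat(df_pop_list, ignore_index=True)

# Drop city-years with no matching NBA data (Memphis 2000, as in the text).
no_nba = (df_pop["City"] == "Memphis") & (df_pop["Year"] == 2000)
df_pop = df_pop[~no_nba].reset_index(drop=True)
```

Filtering each frame before concatenating keeps the combined frame small; filtering once after `concat` would give the same rows.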
Unemployment Data:
In our effort to wrangle and clean the unemployment data, we took several steps. First, we created a list of city names for importing CSV files and another list to store each unemployment data frame.
We then proceeded to import each CSV file for unemployment rates, utilizing the city_list as the file name in a for loop. Subsequently, we removed unwanted rows, specifically those related to Charlotte in 2003 and 2004, and Memphis in 2001.
After these initial steps, we structured the data frames for unemployment rates, defining the columns as "Unemployment Name," "Unemployment Year," and "Unemployment Rate." These columns would prove vital in our analysis of the relationship between unemployment rates and NBA team performance.
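A short sketch of the row removal and column naming described above, under stated assumptions: the raw column names (`city`, `year`, `rate`) and the rate values are hypothetical placeholders, and filtering by a set of (city, year) pairs is one possible way to drop the rows, not necessarily the one we used. The three final column names and the excluded city-years are taken from the text.

```python
import pandas as pd

# Hypothetical raw frame; raw column names and rate values are placeholders.
raw = pd.DataFrame({
    "city": ["Charlotte", "Charlotte", "Charlotte", "Memphis", "Memphis"],
    "year": [2002, 2003, 2004, 2001, 2002],
    "rate": [5.9, 6.3, 5.6, 4.6, 5.5],
})

# Column names used in the report.
df_unemp = raw.rename(columns={
    "city": "Unemployment Name",
    "year": "Unemployment Year",
    "rate": "Unemployment Rate",
})

# City-years removed in the report.
excluded = {("Charlotte", 2003), ("Charlotte", 2004), ("Memphis", 2001)}

# Build a boolean mask that is False for every excluded (city, year) pair.
keys = zip(df_unemp["Unemployment Name"], df_unemp["Unemployment Year"])
keep = [key not in excluded for key in keys]
df_unemp = df_unemp[keep].reset_index(drop=True)
```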
GDP Data:
In our quest to wrangle and clean GDP data, we began by importing the respective CSV files for each city. To streamline our data, we listed the CSV files and sorted them alphabetically.
Notably, we encountered data that was organized in a format opposite to what we required. To rectify this, we employed a helper function to reformat the data and standardized the titles.
Despite these efforts, we discovered that the CSV files contained seven extraneous lines that disrupted the data. We overcame this challenge by implementing skiplines and skiprows to eliminate these unnecessary lines.
Finally, we successfully combined the data sets, merging them into a single data frame. We took care to remove redundant columns, resulting in a clean and organized data set that would serve as a valuable resource for our analysis.
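The reshaping step might look something like the sketch below. It assumes that the "opposite" format means one column per year (wide format) and that the seven extraneous lines sit at the top of each file; neither detail is stated above. The column name `GeoName`, the helper name `to_long`, and all values are hypothetical. `skiprows` is a parameter of `pandas.read_csv`, and `melt` is the pandas method for turning year columns into rows.

```python
import io
import pandas as pd

# Hypothetical export: seven preamble lines, then one row per city with years as columns.
preamble = "".join(f"note line {i}\n" for i in range(1, 8))
csv_text = preamble + "GeoName,2001,2002\nAtlanta,1.5,2.0\nBoston,0.9,1.1\n"

# skiprows=7 skips the preamble so the eighth line is parsed as the header.
wide = pd.read_csv(io.StringIO(csv_text), skiprows=7)

def to_long(wide_df):
    """Reshape one-column-per-year data into one row per (city, year)."""
    long_df = wide_df.melt(id_vars="GeoName", var_name="Year", value_name="GDP")
    long_df = long_df.rename(columns={"GeoName": "City"})
    long_df["Year"] = long_df["Year"].astype(int)  # year labels arrive as strings
    return long_df.sort_values(["City", "Year"]).reset_index(drop=True)

df_gdp = to_long(wide)
```

The long format gives every table the same (city, year) key, which is what makes the later merge with the NBA data straightforward.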
NBA Performance Data:
Our journey into NBA performance data began with the importation of CSV files for each respective city. Utilizing the data we had, we employed a "winrate" function to calculate the win rate of each team in each season.
Each dataset provided detailed information about the team's name, final placement in the standings, and win-loss record for that season. Due to website restrictions, we couldn't download the data directly. Instead, we copied the data for each season into a notepad document and saved it as a CSV file.
With the datasets in our possession, we seamlessly merged them into a single data frame representing NBA records. Additional columns were introduced to capture season information and calculate win percentages (wins divided by total games).
We completed the process by merging the dataframes. We were left with the following dataframe to be used for our data visualization and analysis.
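As an illustration of the win-rate calculation and the final merge, here is a minimal sketch. The function name `winrate` and the definition (wins divided by total games) come from the text; the column names `W`, `L`, `City`, `Year`, the choice of an inner merge, and the sample values are assumptions made for the example.

```python
import pandas as pd

def winrate(wins, losses):
    """Win rate as wins divided by total games played."""
    return wins / (wins + losses)

# Hypothetical standings rows; column names and values are placeholders.
df_nba = pd.DataFrame({
    "City": ["Atlanta", "Boston"],
    "Year": [2001, 2001],
    "W": [25, 36],
    "L": [57, 46],
})
df_nba["Win Rate"] = winrate(df_nba["W"], df_nba["L"])

# Hypothetical economic rows keyed the same way.
df_econ = pd.DataFrame({
    "City": ["Atlanta", "Boston"],
    "Year": [2001, 2001],
    "Unemployment Rate": [3.5, 3.6],
})

# An inner merge keeps only the city-years present in both tables.
df = df_nba.merge(df_econ, on=["City", "Year"], how="inner")
```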

Overall Trends in Economic Data
To begin our analysis, we examined the trends in various economic variables over the years. We aimed to visualize these trends to gain insights and generate questions about potential relationships with other variables. We looked at Unemployment Rate, GDP, and Population Change Rate for multiple cities.



The Unemployment Rate trend for each city appeared remarkably similar, suggesting the influence of a common factor affecting the nationwide unemployment rate. GDP showed a consistent upward trajectory over time, but its connection with NBA performance remained uncertain. Population Change Rate exhibited steady growth over the years, with occasional decreases and slight increases. To investigate further, statistical testing would be necessary to determine the significance of these relationships.


Additionally, distribution plots were created for various variables, highlighting that, except for Population Change Rate, all other variables exhibited relatively normal distributions. The bimodal and right-skewed nature of Population Change Rate would require special consideration during statistical tests.
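One way to put a number on the shape of each distribution is the sample skewness, which pandas exposes as `DataFrame.skew()`. The sketch below uses made-up values purely to show the call; it is not our data, and we did not necessarily compute skewness this way. A value near zero indicates a roughly symmetric distribution, while a clearly positive value indicates a long right tail.

```python
import pandas as pd

# Made-up values for illustration only.
df = pd.DataFrame({
    "Win Rate": [0.30, 0.40, 0.50, 0.60, 0.70],          # symmetric around 0.50
    "Population Change Rate": [0.5, 0.6, 0.6, 0.7, 3.0],  # long right tail
})

# Sample skewness per column.
skewness = df.skew()
```

Skewness does not detect bimodality, so the distribution plots remain the primary check for that.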
Comparing Atlanta's Performance and Economic Data:
Next, we shifted our focus to analyzing the performance of a specific city, Atlanta, and its economic data. We visualized trends in Win Rate, GDP, Unemployment Rate, and Population Change Rate for Atlanta alone.




Our observations revealed some overlapping trends in GDP for Atlanta, but distinct patterns in Unemployment Rate and Population Change Rate. This exploration prompted us to create a new data frame based on average performance over the years, focusing on the top five cities, to investigate relationships between economic data and NBA performance further.
Looking at Linear Relationships Using the Whole Dataset:
Before delving into city-specific analyses, we explored the entire dataset to identify potential relationships between variables. We created scatterplots to examine the correlation between Win Rate and Unemployment Rate, GDP, and Population Change Rate.






While the scatterplots indicated a slight negative relationship between Win Rate and Unemployment Rate, there was no evident relationship between Win Rate and GDP or Population Change Rate. We decided to conduct linear regression analyses to scrutinize these relationships further.
Linear Regression Model Using the Whole Dataset:
Linear regression models were applied to assess the relationships between Win Rate and various economic variables using the entire dataset. The results provided insights into the strength and significance of these relationships.
- Unemployment Rate vs. Win Rate: The regression model showed a negative relationship, but the p-value indicated that this relationship was not statistically significant.
- GDP vs. Win Rate: The regression model revealed no significant relationship between GDP and Win Rate, with both the coefficient and p-value suggesting little to no association.
- Population Change Rate vs. Win Rate: Similar to the other variables, the regression model found no significant relationship between Population Change Rate and Win Rate.
In summary, the linear regression analyses using the whole dataset failed to establish strong and statistically significant relationships between NBA team performance (Win Rate) and economic variables (Unemployment Rate, GDP, and Population Change Rate).
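A simple regression of this kind can be sketched as follows. The report does not name the regression library, so `scipy.stats.linregress` is used here as a stand-in; the helper name `regress_on_win_rate` and the tiny data set are hypothetical. The helper returns the three quantities discussed in this report: the coefficient (slope), R-squared, and the p-value for the slope.

```python
import pandas as pd
from scipy import stats

def regress_on_win_rate(df, outcome, predictor="Win Rate"):
    """Fit outcome ~ predictor by ordinary least squares and return key statistics."""
    fit = stats.linregress(df[predictor], df[outcome])
    return {
        "coef": fit.slope,
        "r_squared": fit.rvalue ** 2,
        "p_value": fit.pvalue,  # two-sided test of slope == 0
    }

# Tiny made-up data set, for illustration only.
df = pd.DataFrame({
    "Win Rate": [1.0, 2.0, 3.0, 4.0, 5.0],
    "Unemployment Rate": [1.0, 3.0, 2.0, 5.0, 4.0],
})
result = regress_on_win_rate(df, "Unemployment Rate")
```

Calling the helper once per economic variable gives the three comparisons listed above; a relationship is treated as significant when `p_value` is below 0.05.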
Average Team Performance
Creating a New Data Frame Based on Average Performance (2001-2019):
Our initial objective was to calculate the average rank of each NBA team from the years 2001 to 2019 and store this information in a new data frame for reference. To accomplish this, we utilized Python code to compute the average rankings for each team and organized the data.
The resulting data frame, team_ranking_df, contains two columns: 'City' and 'Average Ranking.' It provides insights into how teams have performed on average over the specified time period. Sorting this data frame revealed the top-performing and bottom-performing teams, as well as those in between.
Analysis of Average Rankings:
Upon examining team_ranking_df, we identified the top five and bottom five performing teams based on their average rankings. San Antonio, Dallas, Houston, Boston, and Miami constituted the top five, while Atlanta, Charlotte, Sacramento, Washington, and Minnesota comprised the bottom five.
Please note that these rankings are based on the teams' average performances from 2001 to 2019.
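The average-ranking table can be built with a group-by, roughly as sketched below. The names `team_ranking_df`, 'City', and 'Average Ranking' come from the text; the season-level column name `Rank` and the sample rows are assumptions for illustration.

```python
import pandas as pd

# Hypothetical season-level rows; 'Rank' is the final league standing that season.
df = pd.DataFrame({
    "City": ["San Antonio", "San Antonio", "Atlanta", "Atlanta", "Boston", "Boston"],
    "Rank": [1, 3, 25, 20, 10, 6],
})

team_ranking_df = (
    df.groupby("City", as_index=False)["Rank"].mean()
      .rename(columns={"Rank": "Average Ranking"})
      .sort_values("Average Ranking")  # a lower average rank means a better team
      .reset_index(drop=True)
)

# After sorting, the best teams are at the head and the worst at the tail.
top = team_ranking_df.head(1)["City"].tolist()
bottom = team_ranking_df.tail(1)["City"].tolist()
```

With the full data, `head(5)` and `tail(5)` would give the top five and bottom five groups.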

Division of the Original Data Frame:
With the teams categorized into top-performing, middle-performing, and bottom-performing groups, we proceeded to divide our original data frame (df) into three subsets accordingly: top5_df, mid_df, and bot5_df. These subsets contained data for the respective groups of teams and would facilitate further analysis.
The division process involved iterating through the rankings and appending the corresponding team data to the appropriate subsets. Each subset was reset to ensure proper organization for subsequent analysis.
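The split can also be expressed without an explicit loop. The sketch below swaps the iterate-and-append approach described above for boolean filtering with `isin`, which should yield the same subsets; the helper name `split_by_rank` and the sample rows are hypothetical, and `n=1` is used only because the toy example has three teams.

```python
import pandas as pd

def split_by_rank(df, team_ranking_df, n=5):
    """Split df into top-n, middle, and bottom-n subsets by average ranking."""
    ordered = team_ranking_df.sort_values("Average Ranking")["City"].tolist()
    top, bottom = set(ordered[:n]), set(ordered[-n:])
    top_df = df[df["City"].isin(top)].reset_index(drop=True)
    bot_df = df[df["City"].isin(bottom)].reset_index(drop=True)
    mid_df = df[~df["City"].isin(top | bottom)].reset_index(drop=True)
    return top_df, mid_df, bot_df

# Hypothetical three-team example.
team_ranking_df = pd.DataFrame({
    "City": ["San Antonio", "Boston", "Atlanta"],
    "Average Ranking": [2.0, 8.0, 22.5],
})
df = pd.DataFrame({
    "City": ["Atlanta", "Boston", "San Antonio", "Boston"],
    "Year": [2001, 2001, 2001, 2002],
})
top5_df, mid_df, bot5_df = split_by_rank(df, team_ranking_df, n=1)
```

`reset_index(drop=True)` plays the role of the index reset mentioned above, so each subset is numbered from zero.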
Relationships + Regression / Top Data
Looking at Linear Relationships for Top Data
In this analysis, we will examine linear relationships between various factors and the win rates of the top five NBA teams. First, we'll observe the trends in the ranks of the top five teams over the years. Then, we'll explore how points are distributed to get a sense of any potential relationships. We'll also visually inspect the slope of the trend lines. Finally, we'll perform linear regression to determine if there is statistical significance in these relationships.

As expected, the trends for the top five teams show a lot of ups and downs over the years, which is understandable given the challenges of maintaining consistent performance.
Now, having examined the trends of the top five teams, we'll explore their relationships with economic factors through scatter plots. We'll start by looking at the relationship between win rates and the unemployment rate, followed by the relationship between win rates and GDP (Gross Domestic Product), and lastly, the relationship between win rates and population change rate. To gain more insight, we'll overlay linear regression lines on these scatter plots to visualize potential trends.



The scatter plots of unemployment rate, GDP, and population change rate against win rate do not show clear relationships because the data points are widely scattered. From the overlaid regression lines, we can discern some relationships:
- Unemployment rate tends to increase as win rates increase, which is somewhat unexpected.
- Similarly, GDP appears to decrease as win rates increase, contrary to our initial expectations.
- However, the population change rate increases as win rates improve, which aligns with our expectation that better team performance attracts more people to the city.
Linear Regression Model Using the Top Data
Now, we'll conduct linear regression analysis to determine the significance of these relationships. We'll examine each relationship individually.
- Unemployment Rate vs. Win Rate:
  - The analysis yields a low R-squared value of 0.019, suggesting that only a small fraction of the variance in unemployment rate can be explained by changes in win rate.
  - The p-value of 0.171 indicates that the relationship is not statistically significant.
  - The positive coefficient implies a slight positive association between win rate and unemployment rate, but it is not statistically meaningful.
- GDP vs. Win Rate:
  - The R-squared value is 0.011, indicating a weak relationship between GDP and win rate.
  - The p-value of 0.293 suggests that the relationship is not statistically significant.
  - The negative coefficient implies a weak negative association between GDP and win rate.
- Population Change Rate vs. Win Rate:
  - This analysis shows a higher, though still small, R-squared value of 0.074, indicating that a somewhat larger share of the variance in population change rate can be explained by changes in win rate.
  - The p-value of 0.00610 is less than 0.05, indicating a statistically significant relationship.
  - The positive coefficient implies that the population change rate tends to rise as the win rate increases.
In summary:
- There is no significant relationship between win rate and unemployment rate or GDP for the top five NBA teams' cities.
- There is a significant and positive relationship between win rate and population change rate, suggesting that improved team performance attracts more people to these cities.
Relationships + Regression / Mid Data
Looking at Linear Relationships Using the Middle Data
Here we focus on exploring potential linear relationships involving the middle seven NBA teams. First, we'll examine the trends in win rates for these teams over the years. Then, we will investigate whether there are any discernible relationships between the performance of historically mediocre teams and economic data. This analysis aims to shed light on which economic factors, if any, are associated with average team performance.

As anticipated, the win rates of these middle seven teams exhibit fluctuations over the years, in line with the unpredictable nature of sports. Nevertheless, we will proceed to investigate whether any linear relationships exist between economic data and these mediocre teams.
We start by visualizing the relationships using scatter plots. We examine the correlations between win rates and unemployment rates, win rates and GDP (Gross Domestic Product), and win rates and population change rates. To gain a clearer visualization, we overlay linear regression lines on these scatter plots.



The scatter plots for these relationships do not immediately reveal any distinct patterns. The data points are widely scattered, suggesting a lack of obvious linear relationships. We also see almost no apparent relationship between win rate and unemployment rate. The line is nearly flat, and the data points are dispersed, making it evident that there is no significant correlation between unemployment rate and win rate for mediocre teams.
Similarly, the relationship between GDP and win rate appears weak, as indicated by the slightly positive slope of the line and the scattered data points. The p-value further confirms that this correlation is not statistically significant, reinforcing the notion that GDP and win rate are not linearly related for mediocre teams.
Lastly, when examining the relationship between population change rate and win rate, we notice a slight positive slope in the line. However, the p-value once again suggests that this correlation is not statistically significant. In summary, there is no significant relationship between population change rate and win rate for the middle seven teams.
Linear Regression Model Using the Middle Data
Now, we will run linear regression models to validate our observations. We will analyze each relationship individually.
- Unemployment Rate vs. Win Rate:
  - The analysis reveals a very low R-squared value of 0.000, indicating that an extremely small portion of the variance in unemployment rate can be explained by changes in win rate.
  - The p-value of 0.963 signifies that the relationship is not statistically significant.
  - With a nearly flat line, the slope of the regression line is negligible, reinforcing that there is no meaningful relationship between unemployment rate and win rate for mediocre teams.
- GDP vs. Win Rate:
  - The R-squared value is 0.006, suggesting a weak relationship between GDP and win rate.
  - The p-value of 0.216 confirms that the relationship is not statistically significant.
  - Although there is a slightly positive slope in the line, it is not statistically meaningful, indicating no significant linear relationship between GDP and win rate for these teams.
- Population Change Rate vs. Win Rate:
  - The analysis shows an R-squared value of 0.003, indicating that only a tiny fraction of the variance in population change rate can be explained by changes in win rate.
  - The p-value of 0.382 reiterates that the relationship is not statistically significant.
  - Despite a slight positive slope, this relationship is not significant, highlighting the absence of a meaningful linear connection between population change rate and win rate for mediocre teams.
In summary:
- There is no significant relationship between win rate and unemployment rate, GDP, or population change rate for the middle seven NBA teams.
- The linear regression models confirm that these relationships are not statistically meaningful.
Relationships + Regression / Bot Data
Looking at Linear Relationships for Bottom Data
Lastly, we delve into examining potential linear relationships using data from the bottom five NBA teams. The objective is to explore whether any economic data is correlated with the performances of these bottom-ranking teams. This exploration will provide insights into which economic factors, if any, are associated with the performance of the least successful teams in the NBA.

As expected, the win rates of the bottom five teams exhibit considerable fluctuations over the years. This variability is expected due to the challenges these teams face in maintaining consistent performance.
To investigate the relationship between economic factors and the performance of these teams, we employ scatter plots with linear regression lines. These plots visualize the correlations between win rates and three economic indicators: unemployment rate, GDP (Gross Domestic Product), and population change rate, and the lines provide a clearer perspective on the potential relationships between economic data and the performance of the bottom five teams.



The scatter plot for the relationship between win rate and unemployment rate shows most data points concentrated at the lower end of the spectrum. This clustering suggests that regardless of how high the win rate is, unemployment rates remain relatively constant for these teams.
In the case of the relationship between win rate and GDP, a rough line can be discerned in the middle of the scatter plot. This indicates a positive correlation between win rate and GDP within the win rate range of 20-40. However, beyond a win rate of 40, the GDP appears to plateau, suggesting that regardless of the win rate, GDP remains relatively stable.
For the relationship between win rate and population change rate, the scatter plot shows a scattered distribution of data points with no clear trend.
From the regression lines, we can draw some preliminary conclusions:
- The unemployment rate tends to decrease as the win rate increases, indicating a negative correlation. However, this correlation is not statistically significant, as the p-value is greater than 0.05.
- Similarly, for the relationship between GDP and win rate, the regression analysis reveals an extremely low R-squared value and a high p-value, indicating a lack of statistical significance. This suggests no meaningful relationship between GDP and win rate for the bottom five teams, despite the positive correlation observed within a certain win rate range.
- In the case of the relationship between population change rate and win rate, the analysis also yields a low R-squared value and a high p-value, indicating a lack of statistical significance. Consequently, there is no significant linear relationship between population change rate and win rate for these teams.
Linear Regression Model Using the Bottom Data
To further validate our findings, we conduct linear regression models for each of the relationships explored.
- Unemployment Rate vs. Win Rate:
  - The regression analysis indicates a negative coefficient, suggesting that as win rate increases, unemployment rate decreases. However, the p-value is 0.106, slightly larger than the typical significance levels of 0.01 or 0.05. This lack of statistical significance means that there is no substantial relationship between unemployment rate and win rate for the bottom five teams.
- GDP vs. Win Rate:
  - The regression analysis reveals an almost negligible R-squared value and a very high p-value of 0.978. These results confirm that there is no significant relationship between GDP and win rate for these teams.
- Population Change Rate vs. Win Rate:
  - Similarly, the regression analysis for this relationship results in a low R-squared value and a high p-value of 0.813. This indicates a lack of statistical significance and, consequently, no significant linear relationship between population change rate and win rate for the bottom five teams.
In summary:
- There is no significant linear relationship between win rate and unemployment rate, GDP, or population change rate for the bottom five NBA teams.
- The linear regression models affirm the absence of statistically meaningful relationships.
Overall Results
Most of our results indicated that there is no relationship between the economic variables and NBA teams' performance. This might be because many other variables also affect the economy: the NBA market is only a small economic factor when we consider the economy of the country as a whole. We could not take those other variables into account in our statistical models because our data was limited. Still, we were able to observe a relationship between population change rate and win rate for the top five teams: as win rate increased, population change rate also increased, with a statistically significant p-value (0.00610).
Ethics + Privacy
The statistics such as GDP and population rate change were publicly available on the BEA government database, and we believe these statistics to be facts rather than matters of opinion. In our project, we made sure to analyze populations only in aggregate, at the level of cities. If a city disagrees with or refutes its analysis for any reason, it would have to accept that the data we have collected are facts rather than opinions, and therefore should be available for anyone to use. We also made sure to pare the project down to only its salient metrics (GDP, Population change rate, Winrate, etc.). This was done mainly so we would not have to mention the names of particular economic industries or call attention to any individuals. With this in mind, there are very few openings for privacy concerns.
However, our biggest ethical concern comes from the fact that our data is pulled from multiple sources, and we have no context for how their figures are calculated. For example, although GDP is a very direct and objective measurement of economic growth, it is extremely difficult to determine whether something is biasing it. On top of this, by cleaning and manipulating the data to fit our analysis, it is very possible that the nuances of each metric become harder to see clearly. Because our hypothesis ties cities to their respective NBA teams, the best we can do is acknowledge that these generalizations may lead to data that is somewhat skewed.
Conclusion + Discussion
In summary, we looked at whether an NBA team’s win rate is related to economic growth in its city. Specifically, we used the unemployment rate, GDP, and population change rate to measure economic growth and looked for correlations between each of them and win rate across the teams' respective cities (23 cities). To do this, we first cleaned and combined the datasets so that each team’s city was matched with its economic measures and the information lined up by year. We then used line plots to visualize the trends of GDP, unemployment rate, and population change rate over the years (2001~2020). We used distribution plots to visualize the spread of win rate, GDP, unemployment rate, and population change rate, so that we could be cautious when analyzing our data.
We first examined the linear relationships between win rate and unemployment rate, GDP, and population change rate using the whole dataset. We found a negative relationship between win rate and unemployment rate, but it was not significant based on the p-value. Judging by the p-values and coefficients, there was no significant relationship with win rate for GDP, unemployment rate, or population change rate in the whole data frame. From this we went on to use each NBA team’s average ranking from 2001~2019 to sort the teams into three groups (top 5, middle, and bottom 5) for further linear regression analysis. For each group we found no significant relationship between win rate and GDP or between win rate and unemployment rate. For the bottom 5 and the middle group we found no significant relationship between win rate and population change rate. For the top 5 we found a significant relationship between win rate and population change rate. Overall, most of our results indicated that there is no relationship between the economic variables and NBA teams' performance. Yet for the top 5 group, we observed that the population change rate increased as the win rate increased.
A limitation of this project is that we looked at only 23 teams and their respective cities, which limits the applicability of our one positive finding: that the population change rate increased with the win rate for the top 5 group. Therefore, our results might not generalize. The impact of this work on society is minimal, since most of our results indicated that there is no relationship between the economic variables and NBA teams' performance.