Overview¶
Fantasy Football is a game where participants create virtual teams made of real-life football players. These teams then compete against eachother based on how well the selected players perform in their actual games. The goal of this project is to assemble the best possible lineup to maximize fantasy points while staying within certain team and salary constraints.
This project is related to Daily Fantasy Sports (DFS) on DraftKings, specifically for Week 17 and the entire 2024 NFL season. This involves making strategic player seletions while adhering to the constraints.
Problem Outline¶
On DraftKings, participants must select nine players within a $50,000 salary cap, ensuring their lineup includes players from at least two different games and follows the required positional structure:
- 1 Quarterback (QB)
- 2 Running Backs (RBs)
- 3 Wide Receivers (WRs)
- 1 Tight End (TE)
- 1 FLEX (RB, WR, or TE)
- 1 Team Defense/Special Teams (DST)
Each player's performance is converted into fantasy points based on specific actions, such as yards gained, touchdowns score, or defensive plays made. The goal of this project is to use historical and actual game data to determine the best possible team selecion using linear optimization techniques.
Solving Two Optimization Problems:
Problem 1: "Cheat Code" Lineup: I will determine the best possible lineup after the games have already happened, using actual Week 17 statistics.
Problem 2: Pre-Game Predictive Lineup Since in real life we don’t know how players will perform in advance, I will attempt to predict the best lineup using historical data. I have decided to analyze two trends: Do players perform better at home versus away? Does age affect player performance late in the season?
First, Let's get the notebook ready to run the models and import the necessary files.¶
# Connecting to Google Drive
import os
from google.colab import drive
drive.mount('/content/drive')
Mounted at /content/drive
# Importing our Neccessary Files
import seaborn as sns
import pandas as pd
import pyomo.environ as pe
import matplotlib.pyplot as plt
Clean and standardize the Data for Model 1¶
The Data
To solve this problem, I am working with three datasets:
- DKSalaries.csv – This dataset provides DraftKings salary information, player names, positions, teams, and their average fantasy points per game.
- NFLWeek17Stats.csv – This dataset contains the actual player performance data for Week 17, which will help determine what the ideal lineup would have been if we had perfect foresight.
- NFL2024SeasonStats.xls – This dataset includes season-long player performance statistics, which will help in predicting which players might perform well in Week 17.
os.chdir("/content/drive/My Drive/Colab Notebooks/Optimization/Final")
# Load Files
SalariesFile = pd.read_csv('DKSalaries.csv')
Week17StatsFile = pd.read_csv('NFLWeek17Stats.csv')
salaries_df = SalariesFile.copy()
stats_df = Week17StatsFile.copy()
We need to clean the data and then merge the Salaries and Week 17 Stats for this first problem.
# Standardize column names (lowercase & replace spaces with underscores)
salaries_df.columns = salaries_df.columns.str.lower().str.replace(" ", "_")
stats_df.columns = stats_df.columns.str.lower().str.replace(" ", "_")
# Merge the datasets
merged_df = stats_df.merge(salaries_df[['name', 'salary', 'roster_position', 'game_info']],
left_on='player', right_on='name', how='left')
# Drop "name" column from salaries_df after merge
merged_df = merged_df.drop(columns=['name'])
Calculation of Total Fantasy Points for each player¶
Fantasy Point Calculation in DraftKings NFL Contest¶
Fantasy football players earn points based on their real-life game performance. In this project, we will use DraftKings' official scoring system to calculate the fantasy points for each player based on their Week 17 game stats.
Offense Scoring:¶
Passing Touchdown (TD): +4 points
Passing Yards: +1 point per 25 yards (or +0.04 points per yard)
300+ Yard Passing Game Bonus: +3 points
Interception Thrown: -1 point
Rushing Touchdown (TD): +6 points
Rushing Yards: +1 point per 10 yards (or +0.1 points per yard)
100+ Yard Rushing Game Bonus: +3 points
Receiving Touchdown (TD): +6 points
Receiving Yards: +1 point per 10 yards (or +0.1 points per yard)
100+ Yard Receiving Game Bonus: +3 points
Reception (Catch): +1 point
Punt/Kickoff/Field Goal (FG) Return for TD: +6 points
Fumble Lost: -1 point
2-Point Conversion (Pass, Run, or Catch): +2 points
Offensive Fumble Recovery TD: +6 points
Defense/Special Teams (DST) Scoring:¶
- Sack: +1 point
- Interception: +2 points
- Fumble Recovery: +2 points
- Punt/Kickoff/Field Goal Return for TD: +6 points
- Interception Return TD: +6 points
- Fumble Recovery TD: +6 points
- Blocked Punt or Field Goal Return TD: +6 points
- Safety: +2 points
- Blocked Kick: +2 points
- 2-Point Conversion/Extra Point Return: +2 points
Points Allowed (Defense):¶
- 0 Points Allowed: +10 points
- 1 – 6 Points Allowed: +7 points
- 7 – 13 Points Allowed: +4 points
- 14 – 20 Points Allowed: +1 point
- 21 – 27 Points Allowed: 0 points
- 28 – 34 Points Allowed: -1 point
- 35+ Points Allowed: -4 points
This scoring system will be applied to our datasets to evaluate player performance and optimize lineup selection.
# Calculate fantasy points for offensive and defensive teams
merged_df["total_fantasy_points"] = (
(merged_df["pass_td"] * 4) + # Passing TDs
(merged_df["pass_yds"] * 0.04) + # Passing yards (1 pt per 25 yards)
(merged_df["pass_yds"].apply(lambda x: 3 if x >= 300 else 0)) + # 300+ Passing Bonus
(merged_df["pass_int"] * -1) + # Interceptions
(merged_df["rush_td"] * 6) + # Rushing TDs
(merged_df["rush_yds"] * 0.1) + # Rushing yards (1 pt per 10 yards)
(merged_df["rush_yds"].apply(lambda x: 3 if x >= 100 else 0)) + # 100+ Rushing Bonus
(merged_df["rcv_td"] * 6) + # Receiving TDs
(merged_df["rcv_yds"] * 0.1) + # Receiving yards (1 pt per 10 yards)
(merged_df["rcv_yds"].apply(lambda x: 3 if x >= 100 else 0)) + # 100+ Receiving Bonus
(merged_df["rec"] * 1) + # Receptions (PPR)
(merged_df["punt/kickoff/fg_return_for_td"] * 6) + # Return TDs
(merged_df["fl"] * -1) + # Fumbles Lost
(merged_df["2pt_conversions"] * 2) + # 2-Point Conversions
(merged_df["fumble_recovery_td"] * 6) + # Offensive Fumble Recovery TDs
# Defensive Scoring
(merged_df["sack"] * 1) + # Sacks
(merged_df["interception"] * 2) + # Interceptions
(merged_df["fumble_recovery"] * 2) + # Fumble Recoveries
(merged_df["interception_return_td"] * 6) + # Interception Return TD
(merged_df["fumble_recovery_td"] * 6) + # Fumble Recovery TD
(merged_df["blocked_punt_or_fg_return_td"] * 6) + # Blocked Punt/FG Return TD
(merged_df["safety"] * 2) + # Safety
(merged_df["blocked_kicks"] * 2) + # Blocked Kicks
(merged_df["2pt_conversions"] * 2) + # 2 Pt Conversion (Defensive Return)
# Points Allowed Scoring
(merged_df["ttl_pts_allowed"].apply(lambda x: 10 if x == 0 else 0)) + # 0 pts allowed
(merged_df["ttl_pts_allowed"].apply(lambda x: 7 if 1 <= x <= 6 else 0)) + # 1-6 pts allowed
(merged_df["ttl_pts_allowed"].apply(lambda x: 4 if 7 <= x <= 13 else 0)) + # 7-13 pts allowed
(merged_df["ttl_pts_allowed"].apply(lambda x: 1 if 14 <= x <= 20 else 0)) + # 14-20 pts allowed
(merged_df["ttl_pts_allowed"].apply(lambda x: -1 if 28 <= x <= 34 else 0)) + # 28-34 pts allowed
(merged_df["ttl_pts_allowed"].apply(lambda x: -4 if x >= 35 else 0)) # 35+ pts allowed
)
Merging DST and Player Data into a Final DataFrame¶
In this section, we are cleaning and preparing our merged dataset to ensure it is structured correctly for further analysis. The key steps include:
1. Extracting the Game ID
- The
game_info
column contains both game identifiers and timestamps. - We extract only the first part (the game identifier) by splitting the string at the space and keeping the first element.
2. Handling Missing Values
- Any missing data in the dataset is replaced with
0
to avoid errors in calculations and analysis.
3. Saving the Processed Data
- The cleaned dataset is exported as a CSV file (
merged_df.csv
) so we can use it for further processing.
4. Confirming Column Names
- We display the column names of the dataset to verify the structure and ensure all necessary fields are included.
5. Updating Roster Positions for Defense Teams
- Since Defensive/Special Teams (DST) do not have specific player names, we update their roster position in the dataset.
- We identify rows where the
player
column contains"Defense"
, then assign"DST"
as theirroster_position
.
Clean up Game Info into 'XXX@YYY' format¶
# Extract the first part of 'game_info' before the space (removes date/time)
merged_df["game_id"] = merged_df["game_info"].str.split(" ").str[0]
# Fill missing values with 0
merged_df = merged_df.fillna(0)
merged_df.to_csv('merged_df.csv', index=False)
merged_df.columns
Index(['player', 'tm', 'pass_cmp', 'pass_att', 'pass_yds', 'pass_td', 'pass_int', 'pass_sack', 'pass_sack_yds', 'pass_long_complete', 'ruch_att', 'rush_yds', 'rush_td', 'rush_lng', 'rcv_tgt', 'rec', 'rcv_yds', 'rcv_td', 'rcv_lng', 'fmb', 'fl', '2pt_conversions', 'ttl_pts_allowed', 'safety', 'blocked_kicks', 'sack', 'interception', 'fumble_recovery', 'punt/kickoff/fg_return_for_td', 'interception_return_td', 'fumble_recovery_td', 'blocked_punt_or_fg_return_td', 'salary', 'roster_position', 'game_info', 'total_fantasy_points', 'game_id'], dtype='object')
Change roster_position for Defensive Teams to DST¶
# Make roster_position for defense teams (contains 'Desfense')= DST
merged_df.loc[merged_df["player"].str.contains("Defense"), "roster_position"] = "DST"
Problem 1: Selecting Lineup from the "Cheat Code" (week 17 stats merged into merged_df)¶
Creating the Optimization Model¶
In this section, we define the optimization model that will help us select the best possible fantasy football lineup while following the constraints set by DraftKings.
1. Defining the Decision Variables
- We create a mathematical model using Pyomo (
pe.ConcreteModel()
), which allows us to set up and solve optimization problems. - We define binary decision variables:
x
: Represents whether a player is selected (1
for selected,0
otherwise).y
: Represents whether a game is included in the lineup (1
if at least one player from that game is chosen,0
otherwise).
# players
players= merged_df['player'].unique()
# games
games= merged_df['game_id'].unique()
Create the Model and add the decision variables¶
model= pe.ConcreteModel()
# Add Decision Variables to Model
model.x= pe.Var(players, within= pe.Binary)
model.y= pe.Var(games, within= pe.Binary)
Adding the Constraints to the model¶
2. Adding Constraints to the Model
- Roster Size: The lineup must include exactly 9 players.
- Salary Cap: The total salary of selected players must not exceed $50,000.
- Game Variety: The lineup must include players from at least two different games.
- Position Requirements: The final lineup must include:
- 1 Quarterback (QB)
- 2 Running Backs (RBs)
- 3 Wide Receivers (WRs)
- 1 Tight End (TE)
- 1 FLEX (which can be an RB, WR, or TE)
- 1 Defense/Special Teams (DST)
These constraints ensure that our selected lineup is valid within the DraftKings rules while optimizing for the best fantasy points. Next, we will define the objective function to maximize the total fantasy points scored by the lineup.
# Salary Cap Constraint
model.salary_cap = pe.Constraint(
expr=sum(model.x[player] * merged_df.set_index("player")["salary"][player] for player in players) <= 50000
)
# Players from at least 2 games
model.min_games = pe.Constraint(
expr=sum(model.y[g] for g in games) >= 2
)
# Total player constraint
model.total_players = pe.Constraint(
expr=sum(model.x[player] for player in players) == 9
)
# 1 QB
model.qb = pe.Constraint(expr=sum(model.x[player] for player in merged_df[merged_df["roster_position"] == "QB"]["player"]) == 1)
# 2 RB
model.rb = pe.Constraint(expr=sum(model.x[player] for player in merged_df[merged_df["roster_position"] == "RB/FLEX"]["player"]) >= 2)
# 3 WR
model.wr = pe.Constraint(expr=sum(model.x[player] for player in merged_df[merged_df["roster_position"] == "WR/FLEX"]["player"]) >= 3)
# 1 TE
model.te = pe.Constraint(expr=sum(model.x[player] for player in merged_df[merged_df["roster_position"] == "TE/FLEX"]["player"]) >= 1)
# 1 FLEX (Must be RB, WR, or TE)
model.flex = pe.Constraint(expr=sum(model.x[player] for player in merged_df[merged_df["roster_position"].isin(["RB/FLEX", "WR/FLEX", "TE/FLEX"])]["player"]) == 7)
# 1 DST (Defense is listed as "[Team] Defense" in the "player" column)
model.dst = pe.Constraint(expr=sum(model.x[player] for player in merged_df[merged_df["player"].str.contains("Defense")]["player"]) == 1)
Objective Function and Model Solution¶
In this section, we define and solve the optimization model to find the best possible fantasy football lineup.
1. Defining the Objective Function
- Our goal is to maximize the total fantasy points of the selected lineup.
- We achieve this by summing the total fantasy points for each selected player, using the values stored in the dataset.
2. Solving the Optimization Model
- We use
SolverFactory('appsi_highs')
to find the optimal solution. - After solving the model, we print the solver status to confirm whether the solution is optimal.
3. Extracting and Displaying the Results
- We retrieve the selected players and selected games by checking which decision variables have a value of
1
(indicating they are chosen in the lineup). - We calculate the total salary of the selected lineup to ensure it remains under the $50,000 cap.
- The final lineup, including player names, positions, and salaries, is stored in a DataFrame for easy interpretation.
This step finalizes our lineup selection by applying linear optimization and finding the best-performing fantasy football team.
# Objective Function
model.obj = pe.Objective(
expr=sum(model.x[player] * merged_df.set_index("player")["total_fantasy_points"][player] for player in players),
sense=pe.maximize
)
opt=pe.SolverFactory('appsi_highs')
result= opt.solve(model)
print(result.solver.status, result.solver.termination_condition)
ok optimal
obj_val=model.obj.expr()
# Print the Decision Variables in a data frame
selected_players= [player for player in players if model.x[player].value == 1]
selected_games= [game for game in games if model.y[game].value == 1]
# Print the Decision Variables in a data frame
selected_players= [player for player in players if model.x[player].value == 1]
selected_games= [game for game in games if model.y[game].value == 1]
# Create DataFrame for selected players
salary_cap= sum(merged_df.set_index("player")["salary"][player] for player in selected_players)
print(f'The optimal lineup has {obj_val:,.2f} points and a total salary of ${salary_cap:,.2f}:')
selected_players_df = merged_df[merged_df["player"].isin(selected_players)][["player", "roster_position", "salary", "game_id", "total_fantasy_points"]]
selected_players_df
The optimal lineup has 377.16 points and a total salary of $49,200.00:
player | roster_position | salary | game_id | total_fantasy_points | |
---|---|---|---|---|---|
4 | Trey McBride | TE/FLEX | 6200.0 | ARI@LAR | 43.30 |
73 | Bucky Irving | RB/FLEX | 6700.0 | CAR@TB | 36.00 |
101 | DeVonta Smith | WR/FLEX | 5800.0 | DAL@PHI | 43.00 |
119 | Tee Higgins | WR/FLEX | 6900.0 | DEN@CIN | 54.10 |
135 | Ricky Pearsall | WR/FLEX | 4200.0 | DET@SF | 41.70 |
163 | Jonathan Taylor | RB/FLEX | 7900.0 | IND@NYG | 40.60 |
171 | Drew Lock | QB | 4900.0 | IND@NYG | 47.36 |
174 | Malik Nabers | WR/FLEX | 6600.0 | IND@NYG | 49.10 |
329 | PHI Defense | DST | 0.0 | 0 | 22.00 |
Model Results: Optimized Fantasy Football Lineup¶
After solving the optimization model, we have generated the best possible fantasy football lineup while staying within the constraints set by DraftKings (salary cap, roster size, and positional requirements).
Key Findings:¶
- Total Fantasy Points: 377.16
- Total Salary Used: $49,200
- Number of Players Selected: 9
- Games Represented: Players are chosen from multiple games to meet the game variety constraint.
Observations:¶
High-Scoring Players Selected
- The optimization model has prioritized high fantasy point earners while ensuring salary constraints are met.
- Tee Higgins was the highest scorer at 54.10 points.
- Malik Nabers and Drew Lock also contributed significantly with 49.10 and 47.36 points, respectively.
Strategic Salary Allocation
- The model effectively stayed within the salary cap, amd leaves a small buffer while maximizing performance.
- Lower-cost players like Drew Lock and Ricky Pearsall provided high fantasy returns, making them great budget picks.
Game Variety and Position Requirements Met
- Players come from at least two different games, ensuring compliance with DraftKings rules.
- The lineup includes all required positions, such as 1 QB, 2 RBs, 3 WRs, 1 TE, 1 FLEX, and 1 DST.
Defense Selection Strategy
- The PHI Defense was included, contributing 22 fantasy points, which aligns with the importance of selecting a strong DST.
Let's visualize these results:¶
plt.figure(figsize=(12, 8))
# Plot all players (background points)
sns.scatterplot(data=merged_df, x='salary', y='total_fantasy_points', color='gray', alpha=0.5, label='Non-Selected Players')
# Plot selected players (highlighted with vibrant colors)
sns.scatterplot(data=merged_df[merged_df['player'].isin(selected_players)],
x='salary',
y='total_fantasy_points',
hue='roster_position', # Different colors for each roster position
palette='deep',
s=100,
edgecolor='black')
# Titles and labels
plt.title('Salary vs. Total Fantasy Points')
plt.xlabel('Salary ($)')
plt.ylabel('Total Fantasy Points')
# Display legend
plt.legend(title='Roster Position')
# Show plot
plt.grid(True)
plt.show()
Salary vs. Total Fantasy Points: Analyzing Player Selection¶
This scatter plot visualizes the relationship between player salary and total fantasy points scored for both selected and non-selected players in the optimized lineup.
Key Insights:¶
Gray Dots Represent Non-Selected Players
- The majority of non-selected players (gray points) are clustered in the lower fantasy point range.
- Many lower-salary players scored fewer fantasy points, making them suboptimal choices.
Colored Dots Represent Players in the Optimized Lineup
- Players chosen for the final lineup are color-coded by their roster position.
- These players tend to have a high fantasy point output relative to their salary.
- Notably, high-scoring selections such as RB/FLEX, WR/FLEX, and QB are positioned towards the upper-right section, indicating strong fantasy returns for their respective salaries.
Defense/Special Teams (DST) Position
- The DST position (purple) is positioned on the left side. All DST salaries were set to 0 and did not contribute to overall salary for this model.
Optimal Player Selection Strategy
- The model successfully identified players with a good balance between cost and performance.
- Some high-salary players were excluded due to the $50,000 budget constraint, despite their high fantasy points.
Problem 2: Selecting an Optimal Lineup Using Historical Data¶
In this approach, we optimize the lineup without knowing future performance by adjusting player projections based on:
Fatigue (Age-Based Decline)
- Players over 30 lose 3% of their points per year.
- Adjusted Score =
AvgPointsPerGame * (1 - FatiguePenalty)
.
Home vs. Away Performance
- Home players receive a 5% bonus to their projected performance.
- Adjusted Score =
Fatigue Adjusted Score * (1 + HomeBonus)
.
Setting up this problem:¶
- Merge the Salaries file with the 2024 season stats to get player ages.
- Apply the adjustments to projected scores.
- Run the optimization model to select the best lineup based on historical data and strategic factors.
This method simulates a real-world fantasy sports strategy.
The 2024 NFL Season Stats contains 4 sheets¶
- Pass Stats
- Rush Stats
- Rcv Stats
- Def Stats
# Load Excel file
SeasonStatsFile = pd.ExcelFile('NFL2024SeasonStats.xlsx')
# Load Salaries file
Salaries= pd.read_csv('DKSalaries.csv')
# Display actual sheet names for Season Stats
print(SeasonStatsFile.sheet_names)
['PassStats', 'RushStats', 'RcvStats', 'DefStats']
# Load individual sheets from SeasonStatsFile
PassStatsFile = pd.read_excel(SeasonStatsFile, sheet_name='PassStats')
RushStatsFile = pd.read_excel(SeasonStatsFile, sheet_name='RushStats')
RcvStatsFile = pd.read_excel(SeasonStatsFile, sheet_name='RcvStats')
DefStatsFile = pd.read_excel(SeasonStatsFile, sheet_name='DefStats')
# Download each file as csv to see the best way to merge the data
PassStatsFile.to_csv('PassStatsFile.csv', index=False)
RushStatsFile.to_csv('RushStatsFile.csv', index=False)
RcvStatsFile.to_csv('RcvStatsFile.csv', index=False)
DefStatsFile.to_csv('DefStatsFile.csv', index=False)
Merging Player Age and Game Information with Salaries¶
This section finalizes the data preparation by merging player age and game details with the DraftKings salary dataset. This allows us to apply both the fatigue adjustment (age-based decline) and the home/away performance adjustment.
Steps:¶
Standardizing Column Names
- Converts column names to lowercase and removes spaces for consistency.
Extracting and Merging Player Age
- Retrieves player age from the passing, rushing, and receiving stats sheets.
- Combines and merges it with the DraftKings salary file.
- Handles missing values by filling in the median age instead of using zero.
Extracting Game Matchup Information
- Extracts the matchup from
game_info
, which contains both teams playing. - Splits the matchup into away team and home team.
- Extracts the matchup from
Storing and Verifying Data
- Saves the cleaned dataset as
SalariesWithAge.csv
. - Displays the first few rows to confirm successful integration of player age and game information.
- Saves the cleaned dataset as
This preprocessing ensures that both fatigue penalties (age-based decline) and home/away bonuses can be applied to player projections before running the optimization model.
PassStatsFile.columns = PassStatsFile.columns.str.lower().str.replace(" ", "_")
RushStatsFile.columns = RushStatsFile.columns.str.lower().str.replace(" ", "_")
RcvStatsFile.columns = RcvStatsFile.columns.str.lower().str.replace(" ", "_")
Salaries.columns = Salaries.columns.str.lower().str.replace(" ", "_")
DefStatsFile.columns = DefStatsFile.columns.str.lower().str.replace(" ", "_")
# Extract player age from relevant stat sheets after standardizing column names
PassStats_Age = PassStatsFile[["player", "age"]]
RushStats_Age = RushStatsFile[["player", "age"]]
RcvStats_Age = RcvStatsFile[["player", "age"]]
# Combine all age datasets while ensuring no duplicate players
AllAgeStats = pd.concat([PassStats_Age, RushStats_Age, RcvStats_Age]).drop_duplicates(subset=["player"])
# Merge with DraftKings salaries to fill missing ages
SalariesWithAge = Salaries.merge(AllAgeStats, left_on="name", right_on="player", how="left")
# Drop duplicate "player" column after merging
SalariesWithAge.drop(columns=["player"], inplace=True)
# Fill missing ages with the median age instead of 0 to avoid unnecessary penalties
SalariesWithAge.loc[:, "age"] = SalariesWithAge["age"].fillna(SalariesWithAge["age"].median())
# Display the first few rows to verify the merge
SalariesWithAge.head()
position | name_+_id | name | id | roster_position | salary | game_info | teamabbrev | avgpointspergame | age | |
---|---|---|---|---|---|---|---|---|---|---|
0 | RB | Saquon Barkley (37128352) | Saquon Barkley | 37128352 | RB/FLEX | 8600 | DAL@PHI 12/29/2024 01:00PM ET | PHI | 24.49 | 27.0 |
1 | QB | Josh Allen (37128254) | Josh Allen | 37128254 | QB | 8500 | NYJ@BUF 12/29/2024 01:00PM ET | BUF | 24.87 | 28.0 |
2 | RB | Jahmyr Gibbs (37128354) | Jahmyr Gibbs | 37128354 | RB/FLEX | 8500 | DET@SF 12/30/2024 08:15PM ET | DET | 20.04 | 22.0 |
3 | WR | Ja'Marr Chase (37128674) | Ja'Marr Chase | 37128674 | WR/FLEX | 8400 | DEN@CIN 12/28/2024 04:30PM ET | CIN | 24.68 | 24.0 |
4 | WR | Justin Jefferson (37128676) | Justin Jefferson | 37128676 | WR/FLEX | 8300 | GB@MIN 12/29/2024 04:25PM ET | MIN | 20.46 | 25.0 |
SalariesWithAge.to_csv('SalariesWithAge.csv', index=False)
Verify game_info¶
# Ensure "game_info" is not empty before extracting
SalariesWithAge = SalariesWithAge.dropna(subset=["game_info"])
# Split "game_info" to extract the matchup (first part before the date)
SalariesWithAge["matchup"] = SalariesWithAge["game_info"].str.split().str[0]
# Split the "matchup" column into away and home teams
SalariesWithAge["away_team"] = SalariesWithAge["matchup"].str.split("@").str[0]
SalariesWithAge["home_team"] = SalariesWithAge["matchup"].str.split("@").str[1]
# Verify the Data Frame
SalariesWithAge.head()
position | name_+_id | name | id | roster_position | salary | game_info | teamabbrev | avgpointspergame | age | matchup | away_team | home_team | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | RB | Saquon Barkley (37128352) | Saquon Barkley | 37128352 | RB/FLEX | 8600 | DAL@PHI 12/29/2024 01:00PM ET | PHI | 24.49 | 27.0 | DAL@PHI | DAL | PHI |
1 | QB | Josh Allen (37128254) | Josh Allen | 37128254 | QB | 8500 | NYJ@BUF 12/29/2024 01:00PM ET | BUF | 24.87 | 28.0 | NYJ@BUF | NYJ | BUF |
2 | RB | Jahmyr Gibbs (37128354) | Jahmyr Gibbs | 37128354 | RB/FLEX | 8500 | DET@SF 12/30/2024 08:15PM ET | DET | 20.04 | 22.0 | DET@SF | DET | SF |
3 | WR | Ja'Marr Chase (37128674) | Ja'Marr Chase | 37128674 | WR/FLEX | 8400 | DEN@CIN 12/28/2024 04:30PM ET | CIN | 24.68 | 24.0 | DEN@CIN | DEN | CIN |
4 | WR | Justin Jefferson (37128676) | Justin Jefferson | 37128676 | WR/FLEX | 8300 | GB@MIN 12/29/2024 04:25PM ET | MIN | 20.46 | 25.0 | GB@MIN | GB | MIN |
Calculating Adjusted Fantasy Scores¶
This function adjusts player fantasy scores based on two factors:
Fatigue Penalty (for players over 30)
- Players lose 3% of their points per year over age 30.
Home Advantage Bonus
- Players receive a 5% boost if they are playing at home.
The function is applied by row to compute a final_adjusted_score
for each player, which is then stored in the dataset for optimization.
# Define function to calculate adjusted score based on fatigue and home advantage
def calculate_adjusted_score(row):
# Fatigue Penalty (Only applies if age > 30)
fatigue_penalty = 0
if row["age"] > 30:
fatigue_penalty = (row["age"] - 30) * 0.03 # 3% per year over 30
# Apply Fatigue Adjustment
fatigue_adjusted_score = row["avgpointspergame"] * (1 - fatigue_penalty)
# Check if player is playing at home
is_home = row["teamabbrev"] == row["home_team"]
# Apply Home Bonus (5% boost if home)
if is_home:
final_adjusted_score = fatigue_adjusted_score * 1.05
else:
final_adjusted_score = fatigue_adjusted_score
return final_adjusted_score
# Apply the function row-wise
SalariesWithAge["final_adjusted_score"] = SalariesWithAge.apply(calculate_adjusted_score, axis=1)
# Display the updated dataset
SalariesWithAge.head()
position | name_+_id | name | id | roster_position | salary | game_info | teamabbrev | avgpointspergame | age | matchup | away_team | home_team | final_adjusted_score | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | RB | Saquon Barkley (37128352) | Saquon Barkley | 37128352 | RB/FLEX | 8600 | DAL@PHI 12/29/2024 01:00PM ET | PHI | 24.49 | 27.0 | DAL@PHI | DAL | PHI | 25.7145 |
1 | QB | Josh Allen (37128254) | Josh Allen | 37128254 | QB | 8500 | NYJ@BUF 12/29/2024 01:00PM ET | BUF | 24.87 | 28.0 | NYJ@BUF | NYJ | BUF | 26.1135 |
2 | RB | Jahmyr Gibbs (37128354) | Jahmyr Gibbs | 37128354 | RB/FLEX | 8500 | DET@SF 12/30/2024 08:15PM ET | DET | 20.04 | 22.0 | DET@SF | DET | SF | 20.0400 |
3 | WR | Ja'Marr Chase (37128674) | Ja'Marr Chase | 37128674 | WR/FLEX | 8400 | DEN@CIN 12/28/2024 04:30PM ET | CIN | 24.68 | 24.0 | DEN@CIN | DEN | CIN | 25.9140 |
4 | WR | Justin Jefferson (37128676) | Justin Jefferson | 37128676 | WR/FLEX | 8300 | GB@MIN 12/29/2024 04:25PM ET | MIN | 20.46 | 25.0 | GB@MIN | GB | MIN | 21.4830 |
Final Data Adjustment Before Optimization¶
In this step, all Defense/Special Teams (DST) salaries are set to 0 because they are not considered in the salary constraint for optimization.
After this adjustment, the final dataset is saved, and the optimization model is ready to run.
# Set all DST Salaries to 0 since we are not considering them in the salary constraint
SalariesWithAge.loc[SalariesWithAge["roster_position"] == "DST", "salary"] = 0
SalariesWithAge.to_csv('SalariesWithAge.csv', index=False)
Running and Solving the Model¶
Model Setup:¶
1. Decision Variables¶
x[player]
: Binary variable (1 if the player is selected, 0 otherwise).y[game]
: Binary variable (1 if at least one player from the game is selected, 0 otherwise).
2. Constraints¶
- Salary Cap: Total player salaries must not exceed $50,000.
- Minimum Games Rule: Players must come from at least 2 different games.
- Total Players: The lineup must consist of exactly 9 players.
- Position Requirements: Ensures compliance with DraftKings roster rules:
- 1 QB
- 2 RBs
- 3 WRs
- 1 TE
- 1 FLEX (RB, WR, or TE)
- 1 DST (Defense/Special Teams)
3. Objective Function¶
The model maximizes the total adjusted fantasy score, which accounts for:
- Fatigue adjustments (penalizing players over 30).
- Home advantage bonus (5% boost for players at home).
4. Solver Execution¶
- The model is solved using
appsi_highs
.
# Decision Variables
player= SalariesWithAge['name'].unique()
games= SalariesWithAge['game_info'].unique()
# Create Model
model2= pe.ConcreteModel()
# Add Decision Variables to Model
model2.x= pe.Var(player, within= pe.Binary)
model2.y= pe.Var(games, within= pe.Binary)
# Salary Cap Constraint
model2.salary_cap = pe.Constraint(
expr=sum(model2.x[player] * SalariesWithAge.set_index("name")["salary"][player] for player in player) <= 50000
)
# Players from at least 2 games
model2.min_games = pe.Constraint(
expr=sum(model2.y[g] for g in games) >= 2
)
# Total player constraint (must pick exactly 9 players)
model2.total_players = pe.Constraint(
expr=sum(model2.x[player] for player in player) == 9
)
# 1 QB
model2.qb = pe.Constraint(
expr=sum(model2.x[player] for player in SalariesWithAge[SalariesWithAge["roster_position"] == "QB"]["name"]) == 1
)
# 2 RB
model2.rb = pe.Constraint(
expr=sum(model2.x[player] for player in SalariesWithAge[SalariesWithAge["roster_position"] == "RB/FLEX"]["name"]) >= 2
)
# 3 WR
model2.wr = pe.Constraint(
expr=sum(model2.x[player] for player in SalariesWithAge[SalariesWithAge["roster_position"] == "WR/FLEX"]["name"]) >= 3
)
# 1 TE
model2.te = pe.Constraint(
expr=sum(model2.x[player] for player in SalariesWithAge[SalariesWithAge["roster_position"] == "TE/FLEX"]["name"]) >= 1
)
# 1 FLEX (Must be RB, WR, or TE)
model2.flex = pe.Constraint(
expr=sum(model2.x[player] for player in SalariesWithAge[SalariesWithAge["roster_position"].isin(["RB/FLEX", "WR/FLEX", "TE/FLEX"])]["name"]) == 7
)
# 1 DST (Defense is listed in "name" column and must come from Salaries file)
model2.dst = pe.Constraint(
expr=sum(model2.x[player] for player in Salaries[Salaries["roster_position"] == "DST"]["name"]) == 1
)
# Objective Function
model2.obj = pe.Objective(
expr=sum(model2.x[player] * SalariesWithAge.set_index("name")["final_adjusted_score"][player] for player in player),
sense=pe.maximize # We want to maximize total score
)
opt=pe.SolverFactory('appsi_highs')
result= opt.solve(model2)
print(result.solver.status, result.solver.termination_condition)
ok optimal
# objective val
obj_val=model2.obj.expr()
Display the Results¶
# Extract selected players and games
selected_players = [player for player in player if pe.value(model2.x[player]) == 1]
selected_games = [game for game in games if pe.value(model2.y[game]) == 1]
# Compute the total salary of the selected lineup
salary_cap = sum(SalariesWithAge.set_index("name")["salary"].get(player, 0) for player in selected_players)
# Print lineup summary
print(f'The optimal lineup has {pe.value(model2.obj):,.2f} points and a total salary of ${salary_cap:,.2f}:')
# Create DataFrame for selected players with age and matchup home/away info
selected_players_df = SalariesWithAge[SalariesWithAge["name"].isin(selected_players)][["name", "age", "roster_position", "salary", "teamabbrev", "matchup", "final_adjusted_score"]]
selected_players_df
The optimal lineup has 179.17 points and a total salary of $49,300.00:
name | age | roster_position | salary | teamabbrev | matchup | final_adjusted_score | |
---|---|---|---|---|---|---|---|
0 | Saquon Barkley | 27.0 | RB/FLEX | 8600 | PHI | DAL@PHI | 25.71450 |
3 | Ja'Marr Chase | 24.0 | WR/FLEX | 8400 | CIN | DEN@CIN | 25.91400 |
4 | Justin Jefferson | 25.0 | WR/FLEX | 8300 | MIN | GB@MIN | 21.48300 |
19 | Alvin Kamara | 29.0 | RB/FLEX | 7300 | NO | LV@NO | 20.56950 |
21 | Joe Burrow | 28.0 | QB | 7200 | CIN | DEN@CIN | 24.76950 |
286 | David Montgomery | 27.0 | RB/FLEX | 4000 | DET | DET@SF | 16.12000 |
372 | Vikings | 26.0 | DST | 0 | MIN | GB@MIN | 10.63650 |
477 | Chris Godwin | 28.0 | WR/FLEX | 3000 | TB | CAR@TB | 21.56700 |
624 | Taysom Hill | 34.0 | TE/FLEX | 2500 | NO | LV@NO | 12.40008 |
Optimized Lineup Selection Based on Historical Data¶
This section extracts and displays the optimal lineup generated using the adjusted fantasy scores, which account for fatigue penalties and home/away bonuses.
Key Results:¶
- Total Fantasy Points: 179.17
- Total Salary Used: $49,300
- Number of Players Selected: 9
Observations:¶
Balanced Salary Distribution
- The model effectively used $49,300 of the budget while maximizing fantasy points.
- Players like Taysom Hill were selected as valuable low-cost options.
Age and Fatigue Considerations
- Older players, such as Taysom Hill (34) and Alvin Kamara (29), still made the lineup despite fatigue penalties, indicating their strong adjusted performance.
- No players significantly over 34 were selected, showing that the fatigue adjustment impacted the selections.
Home vs. Away Impact
- Several home players (Vikings, Justin Jefferson, and Alvin Kamara) were selected, benefiting from the 5% home performance bonus.
Strategic Position Allocation
- The model ensured position constraints were met, with 1 QB, 2 RBs, 3 WRs, 1 TE, 1 FLEX, and 1 DST.
Visualize the Selection- Players, Salaries and Final Score¶
# Non-selected players DataFrame
non_selected_players_df = SalariesWithAge[~SalariesWithAge['name'].isin(selected_players)]
plt.figure(figsize=(12, 8))
# Plot non-selected players
plt.scatter(non_selected_players_df['salary'],
non_selected_players_df['final_adjusted_score'],
color='gray', alpha=0.5, label='Non-Selected Players')
# Plot selected players with color by roster position
sns.scatterplot(data=selected_players_df,
x='salary',
y='final_adjusted_score',
hue='roster_position',
palette='viridis',
s=100,
edgecolor='black')
# Add plot title and labels
plt.title('Player Salary vs. Final Adjusted Score')
plt.xlabel('Salary ($)')
plt.ylabel('Final Adjusted Score')
# Display legend
plt.legend(title='Roster Position')
# Show plot
plt.grid(True)
plt.show()
Player Salary vs. Final Adjusted Score¶
This scatter plot visualizes the relationship between player salary ($) and their final adjusted score, which accounts for fatigue penalties (age-based decline) and home/away performance bonuses.
Key Insights:¶
Gray Dots Represent Non-Selected Players
- Most non-selected players (gray) have lower adjusted scores or inefficient salary-to-performance ratios.
Colored Dots Represent Players in the Optimized Lineup
- The model selected players (color-coded by position) with a high adjusted score relative to salary, optimizing cost-effectiveness.
- WR/FLEX and QB selections tend to have higher salaries but offer strong adjusted scores.
Low-Cost, High-Value Picks
- Some selected players (e.g., DST, TE/FLEX) have low salaries but decent adjusted scores, making them valuable budget-friendly choices.
- These choices help maximize total lineup performance while staying under the $50,000 salary cap.
Linear Trend: Higher Salary → Higher Adjusted Score
- Generally, higher-salary players tend to have better adjusted scores, but there are exceptions where some high-cost players are inefficient.
Visualize the Selection- Players, Age and Final Score¶
# Non-selected players DataFrame
non_selected_players_df = SalariesWithAge[~SalariesWithAge['name'].isin(selected_players)]
plt.figure(figsize=(12, 8))
# Plot non-selected players
plt.scatter(non_selected_players_df['age'],
non_selected_players_df['final_adjusted_score'],
color='gray', alpha=0.5, label='Non-Selected Players')
# Plot selected players with color by roster position
sns.scatterplot(data=selected_players_df,
x='age',
y='final_adjusted_score',
hue='roster_position',
palette='viridis',
s=100,
edgecolor='black')
# Add plot title and labels
plt.title('Player Age vs. Final Adjusted Score')
plt.xlabel('Age')
plt.ylabel('Final Adjusted Score')
# Display legend
plt.legend(title='Roster Position')
# Show plot
plt.grid(True)
plt.show()
Player Age vs. Final Adjusted Score¶
This scatter plot visualizes the relationship between player age and their final adjusted score, which includes fatigue penalties for older players and home/away performance adjustments.
Key Insights:¶
Gray Dots Represent Non-Selected Players
- The majority of non-selected players have lower adjusted scores, and many are clustered at younger ages.
Colored Dots Represent Players in the Optimized Lineup
- The model prioritized younger to mid-career players (ages 24-30) who maintained high adjusted scores despite potential fatigue penalties.
- The highest-scoring selected players are WR/FLEX, RB/FLEX, and QB.
Fatigue Impact on Older Players
- The fatigue penalty (3% per year over 30) led to fewer older players being selected.
- Taysom Hill (34, TE/FLEX) was included, but his adjusted score is lower than younger counterparts.
Young Peak Performance Range (24-30 Years Old)
- The highest final adjusted scores are concentrated between ages 24-28, suggesting that these players have strong performances without significant fatigue penalties.
Conclusion¶
This visualization confirms that the fatigue adjustment influenced the model’s selections, favoring younger and mid-career players while still allowing high-value older players when justified.
Final Conclusion¶
This project applied optimization techniques to construct an optimal fantasy football lineup while considering real-world constraints like salary caps, positional requirements, and game variety. Two different approaches were tested:
- Cheat Code Model – Used actual Week 17 performance data to identify the highest-scoring lineup.
- Predictive Model Using Historical Data – Adjusted player projections based on fatigue penalties (age-based decline) and home-field advantage bonuses to simulate real-world decision-making.
The first model showed the best possible lineup if future performance were known, while the second model provided a more strategic approach using historical trends.