Appendix E — Chapter 2 Practice - Solutions

Click here to download this chapter as a Jupyter (.ipynb) file.

These are suggested solutions to the practice exercises for the Grouping and Aggregating Data with Pandas chapter. For some of the exercises multiple alternative solutions are presented.

The code below loads pandas and numpy and reads two datasets into DataFrames. The first DataFrame, store, describes retail sales for a business. Each row of the dataset represents one product on one order. An order that included multiple products will have multiple rows, each with the same order ID but a different product ID. Now that we have covered grouping in pandas we have the tools to group the rows by order ID so that we can calculate various statistics related to orders.

The boxscores DataFrame is the box score data from the 2023-24 NBA season. Each row represents a player-game, that is, one player’s statistics in one game. Grouping will also help us gain more insight from this dataset, as we can now look at statistics by player, by team, or by game for various time periods.

import pandas as pd
import numpy as np

# Import the store data
store = pd.read_csv("https://neuronjolt.com/data/superstore.csv")

# Convert the Order Date column in the store date to a datetime type
store['Order Date'] = pd.to_datetime(store['Order Date'])

# Import the NBA boxscore data
url = "https://neuronjolt.com/data/nba_bs_2023-24_season_cleaned.csv"
boxscores = pd.read_csv(url)

# Convert the date column in the box score data to a datetime type
boxscores['date'] = pd.to_datetime(boxscores['date'], format = '%Y%m%d')

E.1 Practice Exercise 2-1

How many orders are there in the store DataFrame?

store['Order ID'].nunique()
5009

E.2 Practice Exercise 2-2

What is the date range of the orders in the store DataFrame?

# Solution 1
store['Order Date'].describe()
count                             9994
mean     2016-04-30 00:07:12.259355648
min                2014-01-03 00:00:00
25%                2015-05-23 00:00:00
50%                2016-06-26 00:00:00
75%                2017-05-14 00:00:00
max                2017-12-30 00:00:00
Name: Order Date, dtype: object
# Solution 2
print(f"Earliest Date: {store['Order Date'].min()}")
print(f"Latest Date: {store['Order Date'].max()}")
Earliest Date: 2014-01-03 00:00:00
Latest Date: 2017-12-30 00:00:00

E.3 Practice Exercise 2-3

Add a column to the boxscores DataFrame that shows the rank of the player-game according to points scored, with the highest points having rank 1. Use the min ranking method so that all player-games with the same number of points have the same rank, and that rank is the minimum rank of the group.

Then, use nsmallest() to show date, player, points, and points rank for the player-games ranked in the top 5.

Explanation of Solution Code:

  • We create the new column in the boxscores DataFrame by directly assigning values to it
  • The output shows the smallest 5 ranks, but there is no rank 5. Why is that? Since we set the method = 'min' parameter within the rank() method ties all receive the lowest rank for the group. There is a tie between two player-games at rank 4, so they both get rank 4. The next-highest would be rank 6, because there would be 5 player-games higher than it
  • A new code cell is added below to show 15 smallest ranks, so you can see how the ties are handled
boxscores['pts_rank'] = boxscores['pts'].rank(method = 'min', 
                                              ascending = False)
boxscores.nsmallest(n = 5, columns = 'pts_rank')[['date', 'player', 
                                                  'pts', 'pts_rank']]
date player pts pts_rank
14280 2024-01-26 L. Doncic 73 1.0
13726 2024-01-22 J. Embiid 70 2.0
7396 2023-12-13 G. Antetokounmpo 64 3.0
13768 2024-01-22 K. Towns 62 4.0
14327 2024-01-26 D. Booker 62 4.0
# Added this code to illustrate how ties are handled with the 'min' method
boxscores.nsmallest(n = 15, columns = 'pts_rank')[['date', 'player', 
                                                   'pts', 'pts_rank']]
date player pts pts_rank
14280 2024-01-26 L. Doncic 73 1.0
13726 2024-01-22 J. Embiid 70 2.0
7396 2023-12-13 G. Antetokounmpo 64 3.0
13768 2024-01-22 K. Towns 62 4.0
14327 2024-01-26 D. Booker 62 4.0
23635 2024-03-29 J. Brunson 61 6.0
15660 2024-02-03 Steph Curry 60 7.0
2560 2023-11-09 G. Antetokounmpo 54 8.0
13307 2024-01-19 D. Booker 52 9.0
24107 2024-04-01 D. Booker 52 9.0
25143 2024-04-07 T. Maxey 52 9.0
575 2023-10-28 Z. LaVine 51 12.0
8542 2023-12-20 J. Embiid 51 12.0
15401 2024-02-01 T. Maxey 51 12.0
25423 2024-04-09 A. Edwards 51 12.0

E.4 Practice Exercise 2-4

Show date, player, points, rebounds, assists, and steals for player-games that rank in the top 10 (using the ‘min’ ranking method) for assists. Sort by assists in descending order.

Explanation of Solution Code:

  • Since we aren’t asked to display the rank we don’t need to create a new column for it. We can just use it to create a Boolean array within the loc[] method
  • Notice that there are 16 rows in the top 10. This is because there is a tie with many player-games at 17 assists
(boxscores
 .loc[boxscores['ast'].rank(ascending = False, method = 'min') <= 10,
      ['date', 'player', 'pts', 'reb', 'ast', 'stl']]
 .sort_values(by = 'ast', ascending = False)
)
date player pts reb ast stl
10128 2023-12-30 T. Haliburton 22 5 23 2
9726 2023-12-28 T. Haliburton 21 3 20 2
3787 2023-11-17 N. Jokic 26 16 18 1
6175 2023-12-02 L. Doncic 36 15 18 2
10995 2024-01-05 T. Haliburton 10 8 18 2
20109 2024-03-07 I. Quickley 21 9 18 1
2943 2023-11-12 T. Haliburton 25 1 17 2
3445 2023-11-15 Trae Young 15 1 17 2
7197 2023-12-12 L. Doncic 33 6 17 1
7345 2023-12-13 Trae Young 35 4 17 0
8149 2023-12-18 F. VanVleet 27 8 17 2
13330 2024-01-19 T. Haliburton 21 2 17 0
14658 2024-01-27 L. Doncic 28 10 17 2
17583 2024-02-14 D. Russell 11 9 17 1
18630 2024-02-27 Tyus Jones 14 6 17 2
26360 2024-04-14 L. James 28 11 17 5

E.5 Practice Exercise 2-5

The transform() method of a GroupBy object creates the result of an aggregating function for every row of the input DataFrame. Use assign() to add a temporary column to the boxscores DataFrame that has the player’s total points scored for the season. Then, show player, date, pts, and the total points column for 5 randomly-selected games by L. James (LeBron James).

Explanation of Solution Code:

E.5.0.1 Solution 1

  • We use query() to avoid having to use a lambda function within the loc[]
  • Notice that total points is the same on every row. This is what the transform() method does; it creates the aggregate value for every row that has the group key. So, in this example, evert player-game for the player L. James gets the total points he scored in the season.
# Solution 1

(boxscores
 .assign(tot_pts = boxscores.groupby('player')['pts'].transform('sum'))
 .query('player == "L. James"')
 .loc[:, ['player', 'date', 'pts', 'tot_pts']]
 .sample(5)
)
player date pts tot_pts
2 L. James 2023-10-24 21 1822
15699 L. James 2024-02-03 24 1822
7188 L. James 2023-12-12 33 1822
4184 L. James 2023-11-19 37 1822
5324 L. James 2023-11-27 18 1822

Explanation of Solution Code:

E.5.0.2 Solution 2

  • We use a lambda function within the loc[] because the DataFrame coming down from the previous step doesn’t have a name. The lambda function gives the incoming DataFrame the name df and then creates the Boolean array for the .loc[] row selector based on values in the player column of the incoming DataFrame df['player']).
# Solution 2

(boxscores
 .assign(tot_pts = boxscores.groupby('player')['pts'].transform('sum'))
 .loc[lambda df: df['player'] == 'L. James', 
      ['player', 'date', 'pts', 'tot_pts']]
 .sample(5)
)
player date pts tot_pts
2392 L. James 2023-11-08 18 1822
4474 L. James 2023-11-21 17 1822
24326 L. James 2024-04-03 25 1822
347 L. James 2023-10-26 21 1822
19685 L. James 2024-03-04 19 1822

E.6 Practice Exercise 2-6

Create a new column in the boxscores DataFrame named superstar. Put ‘yes’ in this column for any player ranked in the top 5 (by ‘min’ ranking method) for season total points or assists. Next, display player name and superstar status for all the superstars.

Explanation of Solution Code:

  • To rank the point and assist totals correctly we need just one set of totals per player, so we will need to break this up into steps:
    • First, calculate the total points and assists for each player
    • Next, add the ranks for points and assists, using the ‘min’ method
    • Keep only rows for which either rank is 5 or less
    • Save the list of players from this data
    • Use the list of players with isin() and np.where() to create the new column for boxscores
    • Display the names of the superstars and their superstar status from boxscores using query(). Group by player so that only one row is shown for each superstar player. Use the first() method to show the first value of superstar in the group for each of the players. This works because the superstar value for all rows for a specified player will all have the same value
player_totals = (boxscores
                 .groupby('player', as_index = False)
                 [['pts', 'ast']]
                 .sum()
                )

player_totals['pts_rank'] = player_totals['pts'].rank(method = 'min', ascending = False)
player_totals['ast_rank'] = player_totals['ast'].rank(method = 'min', ascending = False)

superstar_totals = player_totals.query('pts_rank <= 5 or ast_rank <= 5')

superstars = superstar_totals['player'].unique()

boxscores['superstar'] = np.where(boxscores['player'].isin(superstars), 'yes', 'no')

(boxscores
 .query('superstar == "yes"')
 .groupby('player')['superstar'].first()
)
player
D. Sabonis               yes
G. Antetokounmpo         yes
J. Brunson               yes
J. Harden                yes
L. Doncic                yes
N. Jokic                 yes
S. Gilgeous-Alexander    yes
T. Haliburton            yes
Name: superstar, dtype: object

E.7 Practice Exercise 2-7

Calculate the total sales per month, in chronological order of months.

Explanation of Solution Code:

  • Use assign() to create columns for the year and month.
  • Group by year and month. If we only group by month each month’s total will include that month’s sales from all the years in the data.
  • use .agg() to sum the Sales column, naming the result Monthly_Sales
(store
 .assign(Year = store['Order Date'].dt.year, 
         Month = store['Order Date'].dt.month)
 .groupby(['Year', 'Month'])
 .agg(Monthly_Sales = ('Sales', 'sum'))
)
Monthly_Sales
Year Month
2014 1 14236.8950
2 4519.8920
3 55691.0090
4 28295.3450
5 23648.2870
6 34595.1276
7 33946.3930
8 27909.4685
9 81777.3508
10 31453.3930
11 78628.7167
12 69545.6205
2015 1 18174.0756
2 11951.4110
3 38726.2520
4 34195.2085
5 30131.6865
6 24797.2920
7 28765.3250
8 36898.3322
9 64595.9180
10 31404.9235
11 75972.5635
12 74919.5212
2016 1 18542.4910
2 22978.8150
3 51715.8750
4 38750.0390
5 56987.7280
6 40344.5340
7 39261.9630
8 31115.3743
9 73410.0249
10 59687.7450
11 79411.9658
12 96999.0430
2017 1 43971.3740
2 20301.1334
3 58872.3528
4 36521.5361
5 44261.1102
6 52981.7257
7 45264.4160
8 63120.8880
9 87866.6520
10 77776.9232
11 118447.8250
12 83829.3188

E.8 Practice Exercise 2-8

Who are the top 10 customers by total sales?

Explanation of Solution Code:

  • Group by customer ID, sum the sales for each group, and then find the 10 largest with nlargest()
  • Indicated a list of columns, with just one column, Sales, in the list, so that the output will be a DataFrame
  • Since .nlargest() is applied to a DataFrame we need to indicate the column(s) for which we want to find the 10 largest values
  • keep = 'all' specifies that if there is a tie at the end we keep all the customer IDs with the same sales total
(store
 .groupby('Customer ID')[['Sales']]
 .sum()
 .nlargest(n = 10, columns = 'Sales', keep = 'all')
)
Sales
Customer ID
SM-20320 25043.050
TC-20980 19052.218
RB-19360 15117.339
TA-21385 14595.620
AB-10105 14473.571
KL-16645 14175.229
SC-20095 14142.334
HL-15040 12873.298
SE-20110 12209.438
CC-12370 12129.072

E.9 Practice Exercise 2-9

Using the box scores data, show the top 10 scoring performances (player-games) in December. Show only the date, player, and points scored. If there is a tie for tenth highest, keep all the ties.

Explanation of Solution Code:

  • Use .loc[] to limit to player-games in December and indicate the columns to keep
  • Use .nlargest() with keep = 'all' to find the 10 largest, keeping all ties at the bottom. Note that there were three players tied at 43 points for the tenth-highest spot.
(boxscores
 .loc[boxscores['date'].dt.month == 12, ['date', 'player', 'pts']]
 .nlargest(n = 10, columns = 'pts', keep = 'all')
)
date player pts
7396 2023-12-13 G. Antetokounmpo 64
8542 2023-12-20 J. Embiid 51
6396 2023-12-06 J. Embiid 50
7782 2023-12-15 J. Brunson 50
9344 2023-12-25 L. Doncic 50
6374 2023-12-06 D. Bane 49
7997 2023-12-16 Keegan Murray 47
7360 2023-12-13 J. Jackson Jr. 44
9756 2023-12-28 A. Edwards 44
7580 2023-12-14 S. Gilgeous-Alexander 43
8209 2023-12-18 C. Cunningham 43
9516 2023-12-26 D. Fox 43

E.10 Practice Exercise 2-10

Show the player name and their scoring average on Sundays for the top 10 Sunday scoring averages, sorted in descending order.

Explanation of Solution Code:

  • Use .loc[] to limit to Sunday games and specify two columns to keep
  • Group by player and take the mean of points. Note that a lost of columns is indicated in .groupby('player')[['pts']].mean() so that the output of that step is a DataFrame
  • Used .agg() to calculate the mean of points to gain ability to name the column with the result of the aggregation
  • Use .nlargest() to find the 10 highest scoring averages
(boxscores.loc[boxscores['date'].dt.day_name() == 'Sunday',
         ['player', 'pts']]
 .groupby('player')
 .agg(scoring_avg = ('pts', 'mean'))
 .nlargest(n = 10, columns = 'scoring_avg', keep = 'all')
)
scoring_avg
player
J. Embiid 34.666667
L. Doncic 33.777778
T. Maxey 31.375000
L. James 30.200000
G. Antetokounmpo 29.666667
K. Irving 29.333333
L. Markkanen 28.333333
Trae Young 27.666667
GG Jackson 27.500000
A. Edwards 26.727273

E.11 Practice Exercise 2-11

Show Luka Doncic’s scoring average by month, in chronological order.

Hint: To put the months in chronological order you should group by year and month.

(boxscores
 .loc[boxscores['player'] == 'L. Doncic', ['date', 'pts']]
 .assign(year = boxscores['date'].dt.year,
         month = boxscores['date'].dt.month)
 .groupby(['year', 'month'])
 .agg(scoring_avg = ('pts', 'mean'))
)
scoring_avg
year month
2023 10 39.000000
11 29.428571
12 37.538462
2024 1 37.100000
2 33.363636
3 32.500000
4 32.000000

E.12 Practice Exercise 2-12

Show the top 5 teams in descending order of total points scored in the season.

(boxscores
 .groupby('team')['pts'].sum()
 .nlargest(n = 5, keep = 'all')
)
team
IND    10110
BOS     9887
OKC     9847
MIL     9756
ATL     9703
Name: pts, dtype: int64

E.13 Practice Exercise 2-13

Create a DataFrame that shows field goal percentage (field goals made / field goals attempted), free throw percentage (free throws made / free throws attempted), and three point percentage (three point shots made / three point shots attempted) by player for the season for players who had at least 100 attempts of each type (field goals, free throws, three pointers) for the season.

Show the rows with the top 5 field goal percentages.

Explanation of Solution Code:

E.13.0.1 Solution 1

  • We first group by player and sum each player’s shot attempts and shots made for various types of shots
  • Then we apply query() to filter the data, limiting it to just those players who have at least 100 shot attempts of each type (field goals, free throws, and three pointers)
  • Then we use assign() to create columns with percentage made for each type of shot. The input of the assign() step is the output of the previous step. That output is a DataFrame that does not have a name. We need its name in order to refer to its columns to do the calculations to create the new columns, so we need to use lambda expressions within the assign. For example, within the assign() step, consider the following code:
fg_pct = lambda df: df['fg_m'] / df['fg_a']

The new column, fg_pct, gets the result of the lambda expression. The lambda expression takes as its argument the input to the assign() step, which is the DataFrame output by the query() step above, and gives this argument the name df. Then it specifies that the values to be assigned to the fg_pct column are what you get when you divide the fg_m column in df by the fg_a column in df. * After the new columns are created nlargest() is used to show the top 5 by field goal percentage.

# Solution 1, using method chaining
(boxscores.groupby('player')
 [['fg_m', 'fg_a', 'ft_m', 'ft_a', 'tres_m', 'tres_a']]
 .sum()
 .query('fg_a >= 100 & ft_a >= 100 & tres_a >= 100')
 .assign(fg_pct = lambda df: df['fg_m'] / df['fg_a'],
         ft_pct = lambda df: df['ft_m'] / df['ft_a'],
         tres_pct = lambda df: df['tres_m'] / df['tres_a']
        )
 .nlargest(n = 5, columns = 'fg_pct', keep = 'all')         
)
fg_m fg_a ft_m ft_a tres_m tres_a fg_pct ft_pct tres_pct
player
G. Antetokounmpo 837 1369 514 782 34 124 0.611395 0.657289 0.274194
M. Wagner 332 552 166 204 38 115 0.601449 0.813725 0.330435
J. Smith 234 395 72 104 61 144 0.592405 0.692308 0.423611
N. Jokic 822 1411 358 438 83 231 0.582566 0.817352 0.359307
O. Toppin 332 579 77 100 102 253 0.573402 0.770000 0.403162
# Alternate solution, breaking the task into parts

gp_by_player = boxscores.groupby('player')
player_sums = gp_by_player[['fg_m', 'fg_a', 'ft_m', 'ft_a', 'tres_m', 'tres_a']].sum()

# Use .copy() because we are going to assign values to new columns in the next step.
# eligible is a subset of player_sums, so if we try to make new columns for
# eligible we are assigning to a subset of player_sums, which causes an error message.
# To avoid this use .copy() to make eligible a separate DataFrame rather than a 
# name of a subset of player_sums.
eligible = player_sums.query('fg_a >= 100 & ft_a >= 100 & tres_a >= 100').copy()

eligible['fg_pct'] = eligible['fg_m'] / eligible['fg_a']
eligible['ft_pct'] = eligible['ft_m'] / eligible['ft_a']
eligible['tres_pct'] = eligible['tres_m'] / eligible['tres_a']

eligible.nlargest(n = 5, columns = 'fg_pct', keep = 'all')
fg_m fg_a ft_m ft_a tres_m tres_a fg_pct ft_pct tres_pct
player
G. Antetokounmpo 837 1369 514 782 34 124 0.611395 0.657289 0.274194
M. Wagner 332 552 166 204 38 115 0.601449 0.813725 0.330435
J. Smith 234 395 72 104 61 144 0.592405 0.692308 0.423611
N. Jokic 822 1411 358 438 83 231 0.582566 0.817352 0.359307
O. Toppin 332 579 77 100 102 253 0.573402 0.770000 0.403162

E.14 Practice Exercise 2-14

Best Products in Best Markets

Show the top 5 product IDs by total sales, sorted in decreasing order, for Cities with sales of at least $100,000.

Explanation of Solution Code:

E.14.0.1 Solution 1

  • We use assign() to create a City_Sales column
    • The City_Sales column shows the city’s total sales for every row of the original DataFrame because we created it with the transform() method.
    • This preserves the data in the other columns
  • We can then use query() to limit to the cities with sales of 100,000 or more
  • Then group by product ID and sum sales
  • Finally, use largest() to show the top 5
# Solution 1 - Uses .transform() since we need to do two separate groupings,
# by city, then by product
(store
 .assign(City_Sales = store.groupby('City')[['Sales']].transform('sum'))
 .query('City_Sales >= 100_000')
 .groupby('Product ID')[['Sales']].sum()
 .nlargest(n = 5, columns = 'Sales', keep = 'all')
)
Sales
Product ID
TEC-CO-10004722 33599.904
OFF-SU-10000151 13100.240
FUR-CH-10002024 12126.954
FUR-BO-10004834 8809.800
FUR-BO-10002213 8036.834

Explanation of Solution Code:

E.14.0.2 Solution 2

  • Breaks it up into steps, assigning intermediate objects to variables along the way
  • What we want from top_cities is the row index values, since those are the names of the cities with sales of at least 100,000. We use the index property to return the index and the to_numpy() method to change the index from an Index object to just an array, which is a numpy data structure somewhat similar to a python list.
  • Next, we use loc[] with isin() to limit the data to just the top cities
  • Finally, we group by product, sum sales, and use nlargest() to show the top 5
# Solution 2 - Selects by Cities

# Calculate sales sums for cities and limit to just those with sales >= 100,000
top_cities = store.groupby('City')['Sales'].sum()
top_cities = top_cities[top_cities >= 100_000]
top_cities = top_cities.index.to_numpy()

# Limit to top cities
store_subset = store.loc[store['City'].isin(top_cities)]

# Sum sales by product
ss_top_products = store_subset.groupby('Product ID')[['Sales']].sum()

# Show top 5
ss_top_products.nlargest(n = 5, columns = 'Sales', keep = 'all')
Sales
Product ID
TEC-CO-10004722 33599.904
OFF-SU-10000151 13100.240
FUR-CH-10002024 12126.954
FUR-BO-10004834 8809.800
FUR-BO-10002213 8036.834

E.15 Practice Exercise 2-15

Show NBA players and their ranks for minutes played for the season, with rank 1 being the highest number of minutes played. Show just player name and total minutes, and show all players with rank in the top 10. In the case of ties use the minimum rank method.

Explanation of Solution Code:

  • We could have done this with nlargest(), but we used rank() in order to obtain more control over how ties are handled
  • Since the result of the first step, grouping by player and summing minutes, is no longer boxscores we need to use a lambda function within the assign() to create the new column based on the min column of the DataFrame that is passed down from the first step
(boxscores
 .groupby('player')[['min']].sum()
 .assign(min_played_rank = lambda df: df['min'].rank(ascending = False, method = 'min'))
 .query('min_played_rank <= 5')
 .sort_values(by = 'min_played_rank')
)
min min_played_rank
player
D. DeRozan 2995 1.0
D. Sabonis 2931 2.0
C. White 2881 3.0
Mikal Bridges 2855 4.0
P. Banchero 2798 5.0

E.16 Practice Exercise 2-16

Show Steph Curry’s points per minute by month of the season.
Hints: * Calculate points per minute for a month by summing the points and minutes for that month and then dividing total points by total minutes * You could accomplish this task by breaking it into steps * Alternatively, you could use a lambda function within the .assign method to define the variable that represents points per minute

# Solution 1 - Breaks it into steps to avoid using a lambda function

steph_months = (
 boxscores
 .loc[boxscores['player'] == 'Steph Curry']
 .assign(year = boxscores['date'].dt.year,
         month = boxscores['date'].dt.month)
 .groupby(['year', 'month'])
 .agg(total_pts = ('pts', 'sum'),
      total_min = ('min', 'sum'))
)

steph_months.assign(ppm = steph_months['total_pts'] / steph_months['total_min'])
total_pts total_min ppm
year month
2023 10 134 127 1.055118
11 367 438 0.837900
12 318 439 0.724374
2024 1 310 369 0.840108
2 388 455 0.852747
3 291 394 0.738579
4 148 198 0.747475
# Solution 2 - Uses method chaining and lambda function

(boxscores
 .loc[boxscores['player'] == 'Steph Curry']
 .assign(year = boxscores['date'].dt.year,
         month = boxscores['date'].dt.month)
 .groupby(['year', 'month'])
 .agg(total_pts = ('pts', 'sum'),
      total_min = ('min', 'sum'))
 .assign(ppm = lambda df: df['total_pts'] / df['total_min'])
)
total_pts total_min ppm
year month
2023 10 134 127 1.055118
11 367 438 0.837900
12 318 439 0.724374
2024 1 310 369 0.840108
2 388 455 0.852747
3 291 394 0.738579
4 148 198 0.747475

E.17 Practice Exercise 2-17

Show the player name and count of triple doubles during the season, sorted in descending order of triple doubles. Show only the top 10. If there are ties for tenth place make sure all the player names who tied are included.

# Solution 1 - Uses value_counts()

(boxscores
 .loc[(boxscores['pts'] >= 10) & 
      (boxscores['reb'] >= 10) & 
      (boxscores['ast'] >= 10), 'player']
 .value_counts()
 .nlargest(n = 10, keep = 'all')
)
player
D. Sabonis          26
N. Jokic            25
L. Doncic           21
G. Antetokounmpo    10
J. Hart              6
L. James             5
S. Barnes            4
J. Giddey            3
J. Harden            3
P. Banchero          2
A. Sengun            2
A. Davis             2
B. Adebayo           2
J. Embiid            2
Name: count, dtype: int64
# Solution 2 - Uses groupby() and count()
(boxscores
 .loc[(boxscores['pts'] >= 10) & 
      (boxscores['reb'] >= 10) & 
      (boxscores['ast'] >= 10)]
 .groupby('player')['date'].count()
 .nlargest(n = 10, keep = 'all')
)
player
D. Sabonis          26
N. Jokic            25
L. Doncic           21
G. Antetokounmpo    10
J. Hart              6
L. James             5
S. Barnes            4
J. Giddey            3
J. Harden            3
A. Davis             2
A. Sengun            2
B. Adebayo           2
J. Embiid            2
P. Banchero          2
Name: date, dtype: int64