Appendix D — Chapter 2 Practice

Click here to download this chapter as a Jupyter (.ipynb) file.

The code below loads pandas and numpy and reads two datasets into DataFrames. The first DataFrame, store, describes retail sales for a business. Each row of the dataset represents one product on one order. An order that included multiple products will have multiple rows, each with the same order ID but a different product ID. Now that we have covered grouping in pandas we have the tools to group the rows by order ID so that we can calculate various statistics related to orders.

The boxscores DataFrame is the box score data from the 2023-24 NBA season. Each row represents a player-game, that is, one player’s statistics in one game. Grouping will also help us gain more insight from this dataset, as we can now look at statistics by player, by team, or by game for various time periods.

import pandas as pd
import numpy as np

# Import the store data
store = pd.read_csv("https://neuronjolt.com/data/superstore.csv")

# Convert the Order Date column in the store date to a datetime type
store['Order Date'] = pd.to_datetime(store['Order Date'])

# Import the NBA boxscore data
url = "https://neuronjolt.com/data/nba_bs_2023-24_season_cleaned.csv"
boxscores = pd.read_csv(url)

# Convert the date column in the box score data to a datetime type
boxscores['date'] = pd.to_datetime(boxscores['date'], format = '%Y%m%d')

D.1 Practice Exercise 2-1

How many orders are there in the store DataFrame?

D.2 Practice Exercise 2-2

What is the date range of the orders in the store DataFrame?

D.3 Practice Exercise 2-3

Add a column to the boxscores DataFrame that shows the rank of the player-game according to points scored, with the highest points having rank 1. Use the min ranking method so that all player-games with the same number of points have the same rank, and that rank is the minimum rank of the group.

Then, use nsmallest() to show date, player, points, and points rank for the player-games ranked in the top 5.

D.4 Practice Exercise 2-4

Show date, player, points, rebounds, assists, and steals for player-games that rank in the top 10 (using the ‘min’ ranking method) for assists. Sort by assists in descending order.

D.5 Practice Exercise 2-5

The transform() method of a GroupBy object creates the result of an aggregating function for every row of the input DataFrame. Use assign() to add a temporary column to the boxscores DataFrame that has the player’s total points scored for the season. Then, show player, date, pts, and the total points column for 5 randomly-selected games by L. James (LeBron James).

D.6 Practice Exercise 2-6

Create a new column in the boxscores DataFrame named superstar. Put ‘yes’ in this column for any player ranked in the top 5 (by ‘min’ ranking method) for season total points or assists. Next, display player name and superstar status for all the superstars.

D.7 Practice Exercise 2-7

Calculate the total sales per month, in chronological order of months.

D.8 Practice Exercise 2-8

Who are the top 10 customers by total sales?

D.9 Practice Exercise 2-9

Using the box scores data, show the top 10 scoring performances (player-games) in December. Show only the date, player, and points scored. If there is a tie for tenth highest, keep all the ties.

D.10 Practice Exercise 2-10

Show the player name and their scoring average on Sundays for the top 10 Sunday scoring averages, sorted in descending order.

D.11 Practice Exercise 2-11

Show Luka Doncic’s scoring average by month, in chronological order.

Hint: To put the months in chronological order you should group by year and month.

D.12 Practice Exercise 2-12

Show the top 5 teams in descending order of total points scored in the season.

D.13 Practice Exercise 2-13

Create a DataFrame that shows field goal percentage (field goals made / field goals attempted), free throw percentage (free throws made / free throws attempted), and three point percentage (three point shots made / three point shots attempted) by player for the season for players who had at least 100 attempts of each type (field goals, free throws, three pointers) for the season.

Show the rows with the top 5 field goal percentages.

D.14 Practice Exercise 2-14

Best Products in Best Markets

Show the top 5 product IDs by total sales, sorted in decreasing order, for Cities with sales of at least $100,000.

D.15 Practice Exercise 2-15

Show NBA players and their ranks for minutes played for the season, with rank 1 being the highest number of minutes played. Show just player name and total minutes, and show all players with rank in the top 10. In the case of ties use the minimum rank method.

D.16 Practice Exercise 2-16

Show Steph Curry’s points per minute by month of the season.
Hints: * Calculate points per minute for a month by summing the points and minutes for that month and then dividing total points by total minutes * You could accomplish this task by breaking it into steps * Alternatively, you could use a lambda function within the .assign method to define the variable that represents points per minute

D.17 Practice Exercise 2-17

Show the player name and count of triple doubles during the season, sorted in descending order of triple doubles. Show only the top 10. If there are ties for tenth place make sure all the player names who tied are included.