Appendix B — Chapter 1 Practice


These exercises give you practice with the pandas tools presented in the Pandas Fundamentals chapter. A good strategy is to first try each exercise on your own, using the Pandas Fundamentals chapter and online documentation for reference. If you are still stuck after spending a reasonable amount of time on an exercise, look at the suggested solution. Solutions for all the exercises are included in the next chapter. If you don’t understand a suggested solution, ask about it.

import pandas as pd
import numpy as np

The code below loads the superstore.csv dataset from an online server into a DataFrame named store. This data describes retail sales for a business. Each row of the dataset represents one product on one order. An order that included multiple products will have multiple rows, each with the same order ID but a different product ID.

store = pd.read_csv("https://neuronjolt.com/data/superstore.csv")
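To make the row structure concrete, here is a minimal sketch on made-up data. The toy_orders DataFrame and its column names (order_id, product_id) are hypothetical and are not taken from superstore.csv; the sketch only illustrates how an order with several products spans several rows.

```python
import pandas as pd

# Hypothetical toy data; these column names are assumptions for illustration,
# not the actual column names in superstore.csv.
toy_orders = pd.DataFrame({
    "order_id": ["A-100", "A-100", "A-101", "A-102", "A-102", "A-102"],
    "product_id": ["P-1", "P-2", "P-1", "P-3", "P-4", "P-5"],
})

# Each row is one product on one order (an "order line").
n_rows = len(toy_orders)

# Orders with multiple products occupy multiple rows with the same order ID.
lines_per_order = toy_orders.groupby("order_id").size()

print(n_rows)
print(lines_per_order)
```

In this toy data there are six order lines but only three distinct orders, which is the same distinction you will need to keep in mind when working with store.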

B.1 Practice Exercise 1-1

Use the .info() method to learn more about the store DataFrame.

B.2 Practice Exercise 1-2

Use the .sample() method to examine 3 randomly selected rows of data from the store DataFrame.

B.3 Practice Exercise 1-3

Display the names of the columns in the store DataFrame.

B.4 Practice Exercise 1-4

Display the count of unique orders (unique Order IDs) in the store DataFrame.

B.5 Practice Exercise 1-5

Display the counts of rows in the store DataFrame for each unique product ID. Your result should show the unique product IDs with an integer beside each one that represents the count of rows in the dataset that have that product ID.

B.6 Practice Exercise 1-6

Using just one pandas method, display descriptive statistics (including count, mean, standard deviation, minimum, 25th percentile, median, 75th percentile, and maximum) for the sales amount of the order lines.

The next exercises make use of the quiz_df DataFrame, which shows quiz scores for several students. The DataFrame is created and displayed in the code cell below.

quizzes = {
    'name': ["Hannah", "Sam", "Anjali", "Erin", "Latasha"], 
    'quiz_1': [89, 74, 79, 90, 95],
    'quiz_2': [73, 75, 78, 88, 86],
    'quiz_3': [92, 88, 85, 95, 100],
    'quiz_4': [100, 90, 88, 92, 98]
}

quiz_df = pd.DataFrame(quizzes).set_index('name')

quiz_df

B.7 Practice Exercise 1-7

Using one pandas method, calculate and display the median quiz score for each student in the quiz_df DataFrame.

B.8 Practice Exercise 1-8

Calculate and display the minimum quiz score for each quiz in the quiz_df DataFrame.

B.9 Practice Exercise 1-9

Show the quiz scores for just Anjali. Hint: In quiz_df the names are the row index, so they cannot be accessed as a column.
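The hint can be illustrated on unrelated data. The sketch below is one possible illustration, not the suggested solution; the temps DataFrame and its values are made up. It shows that a column moved into the row index no longer appears among the columns, and that row labels can be looked up with .loc.

```python
import pandas as pd

# Hypothetical toy data, unrelated to quiz_df.
temps = pd.DataFrame(
    {"city": ["Oslo", "Lima", "Cairo"], "jan": [-4, 23, 14], "jul": [17, 16, 28]}
).set_index("city")

# After set_index(), 'city' is the row index and is no longer a column.
city_is_column = "city" in temps.columns

# Rows are selected by index label with .loc.
lima_row = temps.loc["Lima"]

print(city_is_column)
print(lima_row)
```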

B.10 Practice Exercise 1-10

Show the scores on quizzes 1 through 3 for Sam and Erin.

Now, let’s load the NBA boxscore data from the 2023-24 season so that we can use it in some of the exercises below. Recall that in this dataset each row represents the statistics for one player in one game.

url = "https://neuronjolt.com/data/nba_bs_2023-24_season_cleaned.csv"
boxscores = pd.read_csv(url)

B.11 Practice Exercise 1-11

Show counts by player of games in which the player scored 50+ points during the season. Hint: Remember that each row represents a player-game, so you should limit the data to rows in which 50 or more points were scored and then count the rows for each player.
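The filter-then-count pattern described in the hint might look like the following sketch on made-up data. The toy_games DataFrame, its player names, and the 30-point threshold are all hypothetical; this is one way to apply the pattern, not necessarily the suggested solution.

```python
import pandas as pd

# Hypothetical toy data: each row is one player in one game.
toy_games = pd.DataFrame({
    "player": ["Ann", "Bo", "Ann", "Cy", "Bo", "Ann"],
    "pts": [31, 12, 40, 35, 30, 8],
})

# Step 1: keep only the rows (player-games) that meet the threshold.
high_games = toy_games.loc[toy_games["pts"] >= 30]

# Step 2: count the remaining rows for each player.
counts = high_games["player"].value_counts()

print(counts)
```

Because each remaining row is a single player-game, counting rows per player gives the number of qualifying games for that player.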

B.12 Practice Exercise 1-12

Let’s look for some amazing guard play. Display the player, ast, stl, and pts columns for player-games in which the player had 20 or more assists or 7 or more steals.

B.13 Practice Exercise 1-13

Show player name and count of games (rows) in which the player had 10 or more assists. Use the head() method to limit the result to 10 rows. In other words, create a top-10 list of players ranked by the number of games in which they had 10 or more assists.

B.14 Practice Exercise 1-14

Show player name, points, rebounds, and assists for all the player-games in which the player had at least ten points, ten rebounds, and ten assists (a “triple double”). Sort in decreasing order of points.

B.15 Practice Exercise 1-15

Display the entire stat line for any player who had 40 or more points and 10 or more rebounds in a game while playing for the Lakers (‘LAL’ in the team column).

B.16 Practice Exercise 1-16

How many players are in the boxscores data?

B.17 Practice Exercise 1-17

Convert the date column in the boxscores DataFrame to a datetime type.

B.18 Practice Exercise 1-18

Show the date, player, and points for the top 5 games (the 5 highest point totals by a player in a single game) that occurred in January 2024.

B.19 Practice Exercise 1-19

Show all the unique player names that include the string “James”. Make the test case-sensitive.

B.20 Practice Exercise 1-20

Using the store DataFrame, show descriptive statistics for quantity ordered for product ID ‘FUR-CH-10001146’.

The code below creates the scores_df DataFrame, which has some missing values. It shows course section and quiz scores for some students.

scores_df = pd.DataFrame(
    {
        'section': ['A', 'B', 'A', 'A', 'B', 'B', 'A', 'A', 'B', 'A'],
        'student': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
        'q1': [99, 87, 55, 100, 75, None, 88, 90, 92, np.nan],
        'q2': [95, 90, 78, 94, 85, 90, 90, 89, 99, 100],
        'q3': [None, np.nan, None, 75, 93, None, 85, 88, 90, 92]
    }
)

scores_df
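Note that scores_df marks missing values in two ways, with None and with np.nan. The sketch below, on a made-up one-column DataFrame named toy, suggests that in a numeric column pandas treats both the same way: the column is stored as floats and both markers become NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with both kinds of missing-value markers.
toy = pd.DataFrame({"x": [1, None, 3, np.nan]})

# The integers are upcast to floats so that NaN can be stored alongside them.
x_dtype = str(toy["x"].dtype)

# .count() counts only the non-missing values.
n_present = int(toy["x"].count())

print(x_dtype)
print(n_present)
print(toy)
```

This is why the integer scores entered for q1 and q3 are displayed with decimal points in scores_df.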

B.21 Practice Exercise 1-21

Show all rows in the scores_df DataFrame for which the quiz 3 score is missing.

B.22 Practice Exercise 1-22

Show all rows in the scores_df DataFrame for which the quiz 3 score is NOT missing.

B.23 Practice Exercise 1-23

Display the q1 column of the scores_df DataFrame with the missing values in that column replaced by zeros.

B.24 Practice Exercise 1-24

Display the quiz_df DataFrame, then add a new column to it that holds the quiz average for each student. Display the DataFrame again after adding the new column.

B.25 Practice Exercise 1-25

Delete the new column you created for the quiz average from the quiz_df DataFrame and then display the DataFrame to verify that the column is gone.