Create a variable centered_price
containing a version of the price
column with the mean price subtracted.
import pandas as pd
import numpy as np
a = df.price.mean()
centered_price = df.price.apply(lambda x: x - a)
# equivalent, vectorized:
centered_price = df.price - df.price.mean()
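A quick check of the two forms above, as a minimal sketch. The four-row DataFrame is made-up toy data standing in for the wine reviews, and the names mean_price, via_apply and via_vectorized are illustrative only.

```python
import pandas as pd

# Hypothetical toy data standing in for the wine reviews DataFrame.
df = pd.DataFrame({"price": [10.0, 20.0, 30.0, 40.0]})

mean_price = df.price.mean()

# Element-wise version: calls the lambda once per row.
via_apply = df.price.apply(lambda x: x - mean_price)

# Vectorized version: one subtraction over the whole column.
via_vectorized = df.price - mean_price

# Both give the same Series, and a centered column sums to zero.
same = via_apply.equals(via_vectorized)
centered_sum = via_vectorized.sum()
print(via_vectorized.tolist(), same, centered_sum)
```

The vectorized form is usually the one to prefer: it is shorter and avoids a Python-level function call per row.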
Create a variable bargain_wine
with the title of the wine with the highest points-to-price ratio in the dataset.
bargain_idx = (df.points / df.price).idxmax()
bargain_wine = df.loc[bargain_idx, 'title']
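To see what idxmax returns here, a small sketch on made-up data (the three wines and their values are hypothetical). idxmax gives the index label of the largest ratio, and by default it skips rows where the ratio is NaN, for example when the price is missing.

```python
import pandas as pd

# Hypothetical toy data; "Wine C" has no price, so its ratio is NaN.
df = pd.DataFrame({
    "title": ["Wine A", "Wine B", "Wine C"],
    "points": [90, 86, 88],
    "price": [45.0, 4.0, None],
})

ratio = df.points / df.price      # 2.0, 21.5, NaN
bargain_idx = ratio.idxmax()      # index label of the best ratio, NaN skipped
bargain_wine = df.loc[bargain_idx, "title"]
print(bargain_idx, bargain_wine)
```

If several wines tie for the best ratio, idxmax returns only the first one.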
Word counts: number of descriptions containing "tropical" and "fruity".
n_trop = df.description.map(lambda desc: "tropical" in desc).sum()
n_fruity = df.description.map(lambda desc: "fruity" in desc).sum()
descriptor_counts = pd.Series([n_trop, n_fruity], index=['tropical', 'fruity'])
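The `in` test above is a case-sensitive substring match, so "Tropical" at the start of a sentence is not counted. A minimal sketch on made-up descriptions, with Series.str.contains shown as one possible alternative when case should be ignored (the variable n_trop_any_case is illustrative, not part of the exercise):

```python
import pandas as pd

# Hypothetical toy descriptions.
df = pd.DataFrame({"description": [
    "Tropical fruit aromas",
    "tropical and fruity",
    "a fruity finish",
    "dry and crisp",
]})

# Same approach as above: one True/False per description, then sum.
n_trop = df.description.map(lambda desc: "tropical" in desc).sum()
n_fruity = df.description.map(lambda desc: "fruity" in desc).sum()
descriptor_counts = pd.Series([n_trop, n_fruity], index=["tropical", "fruity"])

# Alternative: literal (non-regex) match that ignores case.
n_trop_any_case = df.description.str.contains(
    "tropical", case=False, regex=False
).sum()
print(descriptor_counts.to_dict(), n_trop_any_case)
```

Both versions count descriptions that contain the word at least once, not the total number of occurrences.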
Create a series star_ratings
with the number of stars corresponding to each review in the dataset.
def stars(row):
    if row.country == 'Canada':
        return 3
    elif row.points >= 95:
        return 3
    elif row.points >= 85:
        return 2
    else:
        return 1

star_ratings = df.apply(stars, axis='columns')
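The same rules can probably also be written without a row-wise apply. The sketch below, on made-up data, compares the apply version with a vectorized one built on np.select; the vectorized variant is a swapped-in alternative, not part of the original exercise.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data.
df = pd.DataFrame({
    "country": ["Canada", "Italy", "France", "US"],
    "points": [80, 96, 88, 82],
})

def stars(row):
    if row.country == "Canada":
        return 3
    elif row.points >= 95:
        return 3
    elif row.points >= 85:
        return 2
    else:
        return 1

via_apply = df.apply(stars, axis="columns")

# np.select checks the conditions in order and takes the first match,
# which mirrors the if/elif chain; default covers the final else.
conditions = [df.country == "Canada", df.points >= 95, df.points >= 85]
choices = [3, 3, 2]
via_select = pd.Series(np.select(conditions, choices, default=1), index=df.index)
print(via_apply.tolist(), via_select.tolist())
```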
To add new columns based on multiple conditions:
df['mean'] = df.mean(axis=1)
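condition = [(df['mean'] > 80),
             (df['mean'] < 90) & (df['mean'] > 60)]
result = ['A', 'B']
# conditions are checked in order; the first match wins.
# a string default avoids mixing the implicit default 0 with string choices
df['Grade'] = np.select(condition, result, default='')

# other
df['Pass/Fail'] = np.where(df['mean'] > 60, 'pass', 'fail')
# equivalent:
df['Pass/Fail'] = ['pass' if x > 60 else 'fail' for x in df['mean']]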
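A small worked run of the grading code, as a sketch on made-up scores. The column names math and art and the default label 'ungraded' are assumptions for illustration. The first row has a mean of 85, which satisfies both conditions; np.select takes the first one that matches, so it gets 'A'.

```python
import numpy as np
import pandas as pd

# Hypothetical toy scores; row means are 85, 75 and 45.
df = pd.DataFrame({"math": [90, 70, 50], "art": [80, 80, 40]})
df["mean"] = df.mean(axis=1)

condition = [
    df["mean"] > 80,
    (df["mean"] < 90) & (df["mean"] > 60),
]
result = ["A", "B"]

# 'ungraded' is a placeholder label for rows matching neither condition.
df["Grade"] = np.select(condition, result, default="ungraded")

# Two-way split: np.where and the list comprehension give the same column.
df["Pass/Fail"] = np.where(df["mean"] > 60, "pass", "fail")
via_listcomp = ["pass" if x > 60 else "fail" for x in df["mean"]]
print(df)
```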