2

I'm coming from R to Python and can't figure out a simple case: creating a new column based on conditions checked against other columns.

# In R, create a 'z' column based on values in x and y columns
df <- data.frame(x=rnorm(100),y=rnorm(100))
df$z <- ifelse(df$x > 1.0 | df$y < -1.0, 'outlier', 'normal')
table(df$z)
# output below
normal outlier 
     66      34 

Attempt at the equivalent statement in Python:

import numpy as np
import pandas as pd
df = pd.DataFrame({'x': np.random.standard_normal(100), 'y': np.random.standard_normal(100)})
df['z'] = 'outlier' if df.x > 1.0 or df.y < -1.0 else 'normal'

However, the following exception is thrown: ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

What is the Pythonic way to achieve this? Many thanks :)

2 Answers

3

Try this:

df['z'] = np.where((df.x > 1.0) | (df.y < -1.0), 'outlier', 'normal')
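
For illustration, here is a minimal sketch of the same call on a small fixed DataFrame (the values are made up, not from the question) so the result can be checked by eye. It also shows value_counts(), which is roughly what table(df$z) does in R.

```python
import numpy as np
import pandas as pd

# Small fixed data (hypothetical values) so the result is easy to verify
df = pd.DataFrame({'x': [0.2, 1.5, -0.3, 0.0],
                   'y': [0.1, 0.4, -2.0, 0.9]})

# | is the elementwise "or". The parentheses are required because
# | binds more tightly than the comparison operators > and <.
is_outlier = (df.x > 1.0) | (df.y < -1.0)

# Pick 'outlier' where the mask is True, 'normal' elsewhere
df['z'] = np.where(is_outlier, 'outlier', 'normal')

# Rough counterpart of R's table(df$z)
counts = df['z'].value_counts()
print(counts)
```

Rows 2 and 3 are flagged here: the second has x > 1.0 and the third has y < -1.0.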
1

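Python's if/else and or work on a single truth value, so they can't be applied elementwise to whole columns the way R's ifelse can. Use numpy's where function (np.where) together with the elementwise | operator instead.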
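
To make the cause of the error concrete, here is a short sketch on a made-up Series (s is a hypothetical name, not from the question). It shows that a comparison already yields one boolean per row, that asking for a single truth value raises the ValueError quoted in the question, and that the elementwise operators avoid the problem.

```python
import pandas as pd

s = pd.Series([0.5, 1.5, 2.5])  # hypothetical example data

# A comparison on a Series returns a boolean Series, one value per row
mask = s > 1.0

# if, or, and and need one truth value, so they call bool() on the
# operand; pandas refuses that for a Series with several elements
try:
    bool(mask)
    raised = False
except ValueError:
    raised = True

# Elementwise operators work row by row: | (or), & (and), ~ (not)
either = (s > 2.0) | (s < 1.0)
print(raised, either.tolist())
```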
