Creating new pandas column based on Series conditional

Question

I'm coming from R to Python and can't figure out a simple case: creating a new column based on a condition checked against other columns.

# In R, create a 'z' column based on values in x and y columns
df <- data.frame(x=rnorm(100),y=rnorm(100))
df$z <- ifelse(df$x > 1.0 | df$y < -1.0, 'outlier', 'normal')
table(df$z)
# output below
normal outlier 
     66      34

Attempt at the equivalent statement in Python:

import numpy as np
import pandas as pd
df = pd.DataFrame({'x': np.random.standard_normal(100), 'y': np.random.standard_normal(100)})
df['z'] = 'outlier' if df.x > 1.0 or df.y < -1.0 else 'normal'

However, the following exception is thrown: ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

What is the pythonic way of achieving this? Many thanks :)

MaxU - stand with Ukraine · Accepted Answer · 2017-06-23 19:25:28Z

3

Try this, which uses np.where with the element-wise | operator and parenthesised comparisons:

df['z'] = np.where((df.x > 1.0) | (df.y < -1.0), 'outlier', 'normal')
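For reference, here is a minimal self-contained sketch of the same approach, with a count step that roughly mirrors R's table(df$z). The seed, the rng generator, and the is_outlier and counts names are assumptions added for illustration; they are not part of the original answer.

```python
import numpy as np
import pandas as pd

# Seed chosen arbitrarily so the sketch is reproducible (an assumption).
rng = np.random.default_rng(0)
df = pd.DataFrame({'x': rng.standard_normal(100),
                   'y': rng.standard_normal(100)})

# Element-wise OR of two boolean Series; each comparison is parenthesised.
is_outlier = (df['x'] > 1.0) | (df['y'] < -1.0)

# np.where picks 'outlier' where the mask is True and 'normal' elsewhere.
df['z'] = np.where(is_outlier, 'outlier', 'normal')

# Rough analogue of R's table(df$z): counts per label.
counts = df['z'].value_counts()
print(counts)
```

The exact counts depend on the random draw, so they will not match the R output in the question.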


Fredz0r · Accepted Answer · 2017-06-23 19:27:00Z

1

If you want to do element-wise operations on columns, you can't address your columns like this: Python's if and or expect a single boolean, not a whole Series. Use numpy.where instead.
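To make the difference concrete, here is a small sketch on a hand-made three-row frame (the values, and the raised and mask names, are made up for illustration). It shows that or raises the ValueError from the question, while | works element by element.

```python
import numpy as np
import pandas as pd

# Tiny illustrative frame; the values are arbitrary.
df = pd.DataFrame({'x': [0.5, 1.5, -0.2], 'y': [0.0, 0.3, -2.0]})

# Python's `or` calls bool() on the whole Series, which pandas refuses to do.
try:
    df.x > 1.0 or df.y < -1.0
    raised = False
except ValueError:
    raised = True

# `|` compares element by element. The parentheses are required because
# `|` binds more tightly than `>` and `<`.
mask = (df.x > 1.0) | (df.y < -1.0)
df['z'] = np.where(mask, 'outlier', 'normal')

print(raised)            # True
print(df['z'].tolist())  # ['normal', 'outlier', 'outlier']
```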

