if else function in pandas dataframe [duplicate]

Question

I'm trying to apply an if condition over a dataframe, but I'm missing something. I get this error: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

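import pandas as pd

raw_data = {'age1': [23, 45, 21], 'age2': [10, 20, 50]}
df = pd.DataFrame(raw_data, columns=['age1', 'age2'])

def my_fun(var1, var2, var3):
    # This line raises the ValueError: the comparison yields a boolean Series, not a single bool
    if (df[var1] - df[var2]) > 0:
        df[var3] = df[var1] - df[var2]
    else:
        df[var3] = 0
    print(df[var3])

my_fun('age1', 'age2', 'diff')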

The error means that the condition evaluates to a whole Series of True/False values, one per row, while `if` needs a single boolean. You may need to run my_fun per row. – Michal Polovka, Commented Apr 13, 2017 at 11:55
I don't know this per-row approach; could you give me a hint, please? – progster, Commented Apr 13, 2017 at 12:03
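
To see where the error in the question comes from, here is a minimal sketch using the question's data. Comparing two columns yields one boolean per row, and Python's `if` cannot reduce that to a single True or False on its own. The names `cond` and `message` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'age1': [23, 45, 21], 'age2': [10, 20, 50]})

# The comparison is evaluated element-wise and returns a boolean Series.
cond = (df['age1'] - df['age2']) > 0
print(cond.tolist())  # [True, True, False]

# Using the Series directly in an `if` raises the ValueError from the question.
try:
    if cond:
        pass
    message = None
except ValueError as exc:
    message = str(exc)
print(message)

# Reducing the Series explicitly is unambiguous.
print(cond.any())  # True: at least one row satisfies the condition
print(cond.all())  # False: not every row satisfies it
```

Neither `any()` nor `all()` is what the question needs, though: the goal there is a per-row decision, which is why the answers use vectorized selection or a row-wise apply.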

Community · Accepted Answer · 2017-05-23 10:30:52Z

55

You can use numpy.where:

import numpy as np

def my_fun(var1, var2, var3):
    # Keep the difference where it is positive, otherwise use 0
    df[var3] = np.where((df[var1] - df[var2]) > 0, df[var1] - df[var2], 0)
    return df

df1 = my_fun('age1', 'age2', 'diff')
print(df1)
   age1  age2  diff
0    23    10    13
1    45    20    25
2    21    50     0
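
The function above reads and modifies the global df. A variant that takes the DataFrame as an argument and returns a modified copy might look like the sketch below; the name `add_positive_diff` and its parameter names are hypothetical, not part of the original answer.

```python
import numpy as np
import pandas as pd

def add_positive_diff(frame, var1, var2, var3):
    """Return a copy of `frame` with `var3` set to var1 - var2 where positive, else 0."""
    diff = frame[var1] - frame[var2]
    out = frame.copy()
    out[var3] = np.where(diff > 0, diff, 0)
    return out

df = pd.DataFrame({'age1': [23, 45, 21], 'age2': [10, 20, 50]})
result = add_positive_diff(df, 'age1', 'age2', 'diff')
print(result)
print(list(df.columns))  # the original frame is left untouched
```

Working on a copy costs some memory, but it keeps the function free of side effects, which tends to make it easier to test and reuse.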

The error is better explained here.

A slower solution uses apply, where axis=1 is needed to process the data row by row:

def my_fun(x, var1, var2, var3):
    # x is a single row (a Series), so the comparison yields one boolean
    if (x[var1] - x[var2]) > 0:
        x[var3] = x[var1] - x[var2]
    else:
        x[var3] = 0
    return x

print(df.apply(lambda x: my_fun(x, 'age1', 'age2', 'diff'), axis=1))
   age1  age2  diff
0    23    10    13
1    45    20    25
2    21    50     0

It is also possible to use loc, but be aware that existing data can sometimes be overwritten:

def my_fun(x, var1, var2, var3):
    # x is the whole DataFrame here, so mask is a boolean Series
    mask = (x[var1] - x[var2]) > 0
    x.loc[mask, var3] = x[var1] - x[var2]
    x.loc[~mask, var3] = 0
    return x

print(my_fun(df, 'age1', 'age2', 'diff'))
   age1  age2  diff
0    23    10  13.0
1    45    20  25.0
2    21    50   0.0
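
The diff column in this output is float because the first loc assignment creates the column with NaN in the rows outside the mask, and NaN forces a float dtype. One way to get integers back, assuming every row ends up filled, is to cast after both assignments. This is a minimal sketch, not part of the original answer:

```python
import pandas as pd

df = pd.DataFrame({'age1': [23, 45, 21], 'age2': [10, 20, 50]})

mask = (df['age1'] - df['age2']) > 0
df.loc[mask, 'diff'] = df['age1'] - df['age2']  # rows outside the mask become NaN
df.loc[~mask, 'diff'] = 0                       # fill the remaining rows

# Every row now holds a whole number, so the cast to int is safe.
df['diff'] = df['diff'].astype(int)
print(df)
```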

edited May 23, 2017 at 10:30 by Community Bot
answered Apr 13, 2017 at 11:55 by jezrael

The point is that in real life the conditions are more complicated, and nested np.where calls can be hard to read. Is there any chance to do it with a more traditional if-elif-else statement?
– progster
Commented Apr 13, 2017 at 12:05

I added a solution with apply. You are right: with many complicated conditions and many elif branches, apply is better.
– jezrael
Commented Apr 13, 2017 at 12:32

Thanks. Since you asked for more details, I opened another topic to avoid confusion in this thread: stackoverflow.com/questions/43393672/…
– progster
Commented Apr 13, 2017 at 13:32
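
For the multi-branch case raised in these comments, here is a sketch of both styles. The thresholds and labels are invented for illustration. np.select is a NumPy function that the answer does not mention: it takes a list of conditions and a matching list of choices, and for each row picks the choice belonging to the first condition that is true.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'age1': [23, 45, 21], 'age2': [10, 20, 50]})

# Row-wise version: an ordinary if/elif/else, applied with axis=1.
def classify(row):
    diff = row['age1'] - row['age2']
    if diff > 20:
        return 'large'
    elif diff > 0:
        return 'small'
    else:
        return 'none'

df['label_apply'] = df.apply(classify, axis=1)

# Vectorized version: conditions are checked in order, like if/elif.
diff = df['age1'] - df['age2']
conditions = [diff > 20, diff > 0]
choices = ['large', 'small']
df['label_select'] = np.select(conditions, choices, default='none')

print(df)
```

The apply version reads like plain Python but calls the function once per row; the np.select version stays vectorized and keeps the branches as a flat list rather than nested calls.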


piRSquared · Accepted Answer · 2017-04-13 13:13:20Z

14

You can use pandas.Series.where

df.assign(age3=(df.age1 - df.age2).where(df.age1 > df.age2, 0))

   age1  age2  age3
0    23    10    13
1    45    20    25
2    21    50     0

You can wrap this in a function

def my_fun(v1, v2):
    return v1.sub(v2).where(v1 > v2, 0)

df.assign(age3=my_fun(df.age1, df.age2))

   age1  age2  age3
0    23    10    13
1    45    20    25
2    21    50     0
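
Note that Series.where keeps the original value where the condition is True and substitutes the second argument elsewhere, whereas np.where takes the condition first and then both alternatives. The sketch below compares the two, plus clip(lower=0), which happens to give the same result for this particular "negative becomes zero" rule. clip is an addition here, not part of the answer, and the variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'age1': [23, 45, 21], 'age2': [10, 20, 50]})
diff = df['age1'] - df['age2']

# Series.where: keep diff where the condition holds, otherwise use 0.
via_series_where = diff.where(df['age1'] > df['age2'], 0)

# np.where: condition first, then value-if-true, then value-if-false.
via_np_where = pd.Series(np.where(diff > 0, diff, 0), index=df.index)

# clip: bound the values from below at 0.
via_clip = diff.clip(lower=0)

print(via_series_where.tolist())  # [13, 25, 0]
```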

answered Apr 13, 2017 at 13:13 by piRSquared


cardamom · Accepted Answer · 2018-03-20 23:55:31Z

There is another way without np.where or pd.Series.where. I am not saying it is better, but after trying to adapt this solution to a challenging problem today, I found the syntax of where not so intuitive. In the end I am not sure whether it would have been possible with where, but the following method lets you look at the subset before you modify it, and for me it led more quickly to a solution. It works for the OP's case here as well.

You deliberately set a value on a slice of a dataframe, which Pandas so often warns you not to do.

This answer shows you the correct method to do that.

The following gives you a slice:

df.loc[df['age1'] - df['age2'] > 0]

..which looks like:

   age1  age2
0    23    10
1    45    20

Add an extra column to the original dataframe for the values you want to remain after modifying the slice:

df['diff'] = 0

Now modify the slice:

df.loc[df['age1'] - df['age2'] > 0, 'diff'] = df['age1'] - df['age2']

..and the result:

   age1  age2  diff
0    23    10    13
1    45    20    25
2    21    50     0
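
Put together, the steps above form the following runnable sketch, using the question's data. Storing the condition in a variable named `mask` is a small convenience added here so the condition is written only once.

```python
import pandas as pd

df = pd.DataFrame({'age1': [23, 45, 21], 'age2': [10, 20, 50]})

# Boolean mask selecting the rows to modify.
mask = df['age1'] - df['age2'] > 0

# Inspect the subset before changing anything.
print(df.loc[mask])

# Default value for every row, then overwrite only the masked rows.
df['diff'] = 0
df.loc[mask, 'diff'] = df['age1'] - df['age2']
print(df)
```

Because the column is created with an integer default before the masked assignment, it stays integer instead of picking up NaN and turning into float.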
