Pandas: update column values from another column if criteria [duplicate]

Question

I have a DataFrame:

I want to update each item in column A of the DataFrame with the value from column B whenever the value in column A equals 0.

This is the DataFrame I want to get:

I've already tried this code:

df['A'] = df['B'].apply(lambda x: x if df['A'] == 0 else df['A'])

It raises an error: ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
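Why the attempt fails: the lambda receives one value of column B at a time, but inside it df['A'] == 0 is the whole boolean Series, so the if statement has to call bool() on a Series, which pandas refuses. A minimal sketch, assuming the sample values from ysearka's answer (A = [0, 0, 1, 0, 1], B = [1, 0, 1, 1, 0]); the zip-based list comprehension is just one way to walk both columns together, not the method from any answer:

```python
import pandas as pd

# Sample data assumed from ysearka's answer.
df = pd.DataFrame({"A": [0, 0, 1, 0, 1], "B": [1, 0, 1, 1, 0]})

# df["A"] == 0 is a whole boolean Series, not a single True/False value.
mask = df["A"] == 0

error_message = ""
try:
    bool(mask)  # this is what `if df['A'] == 0` does implicitly
except ValueError as exc:
    error_message = str(exc)
print(error_message)  # the "truth value of a Series is ambiguous" message

# Visiting A and B together, one pair per row, makes each `if` test a scalar.
df["A"] = [b if a == 0 else a for a, b in zip(df["A"], df["B"])]
print(df["A"].tolist())  # [1, 0, 1, 1, 1]
```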

Not sure this is a duplicate. The linked duplicate is about adding a new column based on another column. This is about updating an existing column (and is easier to find via Google). @sailestim My apologies that this was marked as a duplicate. Please keep the questions coming. — informaton, Commented Aug 30, 2022 at 15:26
Answers below use both dot and bracket notation; some references suggest brackets are better: dataschool.io/pandas-dot-notation-vs-brackets stackoverflow.com/questions/41030013/… — Casey, Commented Sep 27, 2022 at 18:34
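A small sketch of where dot notation and brackets behave differently, using a hypothetical column deliberately named count (which clashes with the DataFrame.count method) and a hypothetical new column C. Updating an existing column such as A works with either form; the differences show up with name clashes and when creating columns:

```python
import warnings

import pandas as pd

# Hypothetical frame: "count" collides with the DataFrame.count method.
df = pd.DataFrame({"A": [0, 1], "count": [5, 6]})

# Brackets always select the column.
print(df["count"].tolist())  # [5, 6]

# Dot access resolves to the method, not the column.
print(callable(df.count))  # True

# Assigning a new attribute does NOT create a column; pandas only warns
# (the warning is silenced here to keep the output clean).
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    df.C = [7, 8]
print("C" in df.columns)  # False

# Bracket assignment creates the column as expected.
df["C"] = [7, 8]
print("C" in df.columns)  # True
```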

Rushabh Mehta · Accepted Answer · 2018-08-10 13:08:18Z

38

df['A'] = df.apply(lambda x: x['B'] if x['A']==0 else x['A'], axis=1)

Output
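A self-contained run of this one-liner, assuming the sample values from ysearka's answer (A = [0, 0, 1, 0, 1], B = [1, 0, 1, 1, 0]). With axis=1, apply calls the lambda once per row and passes that row as a Series, so x['A'] and x['B'] are single values and the if test is unambiguous; the trade-off is a Python-level loop over rows, which is consistent with it being slower than np.where in the timings further down:

```python
import pandas as pd

# Sample data assumed from ysearka's answer.
df = pd.DataFrame({"A": [0, 0, 1, 0, 1], "B": [1, 0, 1, 1, 0]})

# axis=1: the lambda sees one row at a time, so x["A"] is a scalar.
df["A"] = df.apply(lambda x: x["B"] if x["A"] == 0 else x["A"], axis=1)

print(df["A"].tolist())  # [1, 0, 1, 1, 1]
```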

answered Aug 10, 2018 at 13:08

Rushabh Mehta



Zero · Accepted Answer · 2018-08-10 13:07:32Z

16

Use np.where:

In [348]: df.A = np.where(df.A.eq(0), df.B, df.A)

In [349]: df
Out[349]:
    A  B
1:  1  1
2:  0  0
3:  1  1
4:  1  1
5:  1  0
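The session above assumes numpy is already imported as np. A self-contained sketch of the same idea, assuming the sample values from ysearka's answer, alongside two pandas-only alternatives (Series.mask and Series.where, swapped in here, not part of this answer) that give the same result without leaving pandas:

```python
import numpy as np
import pandas as pd

# Sample data assumed from ysearka's answer.
df = pd.DataFrame({"A": [0, 0, 1, 0, 1], "B": [1, 0, 1, 1, 0]})

# np.where(cond, x, y): take B where A == 0, otherwise keep A.
via_np_where = np.where(df["A"].eq(0), df["B"], df["A"])

# Series.mask replaces values where the condition is True with `other`.
via_mask = df["A"].mask(df["A"].eq(0), df["B"])

# Series.where is the mirror image: keep values where the condition is True.
via_where = df["A"].where(df["A"].ne(0), df["B"])

df["A"] = via_np_where
print(df["A"].tolist())  # [1, 0, 1, 1, 1]
```

Series.mask and Series.where align other on the index, so a Series from the same frame can be passed directly.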

answered Aug 10, 2018 at 13:07

Zero



Which solution is more time-efficient, yours or Rushabh's?
– sailestim
Commented Aug 10, 2018 at 13:12


ysearka · Accepted Answer · 2018-08-10 13:11:17Z

You can do this with a boolean mask:

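import pandas as pd

df = pd.DataFrame()
df['A'] = [0,0,1,0,1]
df['B'] = [1,0,1,1,0]
mask = (df.A == 0)  # True for the rows whose A value must be replaced
df.loc[mask,'A'] = df.loc[mask,'B']  # copy B into A only on those rows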

    A   B
0   1   1
1   0   0
2   1   1
3   1   1
4   1   0

EDIT: OK, this is actually an inefficient solution:

%timeit df.loc[mask,'A'] = df.loc[mask,'B']
%timeit df.apply(lambda x: x['B'] if x['A']==0 else x['A'], axis=1)
%timeit np.where(df.A.eq(0), df.B, df.A)

5.52 ms ± 556 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
1.27 ms ± 167 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
796 µs ± 89.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

So thanks to Zero for this efficient solution with np.where!
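When reading these numbers, compare the per-loop means, not the loop counts: %timeit picks the number of loops itself (more loops for faster statements), so "100 loops" only means each run of that statement took longer. A reproducible sketch with the standard-library timeit module, assuming a hypothetical 2,000-row random frame and hypothetical helper functions that return the new column instead of modifying df (so every timing run starts from the same data):

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical benchmark data: larger than the 5-row example to reduce noise.
rng = np.random.default_rng(0)
df = pd.DataFrame({"A": rng.integers(0, 2, 2_000), "B": rng.integers(0, 2, 2_000)})


def with_apply(frame):
    # Rushabh Mehta's row-wise apply.
    return frame.apply(lambda x: x["B"] if x["A"] == 0 else x["A"], axis=1)


def with_np_where(frame):
    # Zero's vectorized np.where.
    return np.where(frame["A"].eq(0), frame["B"], frame["A"])


def with_loc(frame):
    # ysearka's mask + .loc, applied to a copy so df is left untouched.
    out = frame["A"].copy()
    mask = frame["A"].eq(0)
    out.loc[mask] = frame.loc[mask, "B"]
    return out


per_loop = {}
for name, func in [("apply", with_apply), ("np.where", with_np_where), ("loc", with_loc)]:
    # autorange() chooses the loop count and returns (loops, total_seconds).
    loops, total = timeit.Timer(lambda: func(df)).autorange()
    per_loop[name] = total / loops
    print(f"{name:9s} {loops:6d} loops  {per_loop[name] * 1e6:10.1f} µs per loop")
```

Absolute times depend on the machine and the pandas/numpy versions, so only the relative per-loop numbers are meaningful.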

This is actually the most efficient as it uses only 100 loops. — Kedar U Shet, Commented Aug 10, 2022 at 19:01
