pandas row-wise apply taking too long: is there an alternative for the code below? [duplicate]

Question

I have a data frame and a big function like the one below, and I want to apply the norm_group function to the data frame's columns, but it takes too much time with apply. Is there any way to reduce the run time of this code? Currently it takes 24.4 s per loop.

import pandas as pd
import numpy as np

np.random.seed(1234)
n = 1500000

df = pd.DataFrame()
df['group'] = np.random.randint(1700, size=n)
df['ID'] = np.random.randint(5, size=n)
df['s_count'] = np.random.randint(5, size=n)
df['p_count'] = np.random.randint(5, size=n)
df['d_count'] = np.random.randint(5, size=n)
df['Total'] = np.random.randint(400, size=n)
# transform keeps the original row index, so the result aligns with df
df['Normalized_total'] = df.groupby('group')['Total'].transform(lambda x: (x - x.min()) / (x.max() - x.min()))
df['Normalized_total'] = df['Normalized_total'].round(2)

def norm_group(a, b, c, d, e):
    # if/elif order matters: the first matching branch wins
    if a >= 0.7 and b >= 1000 and c > 2:
        return "Both High"
    elif a >= 0.7 and b >= 1000 and c < 2:
        return "High and C Low"
    elif a >= 0.4 and b >= 500 and d > 2:
        return "Medium and D High"
    elif a >= 0.4 and b >= 500 and d < 2:
        return "Medium and D Low"
    elif a >= 0.4 and b >= 500 and e > 2:
        return "Medium and E High"
    elif a >= 0.4 and b >= 500 and e < 2:
        return "Medium and E Low"
    else:
        return "Low"

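# norm_group takes five arguments; mapping c, d, e to s_count, p_count, d_count is assumed
%timeit df['Category'] = df.apply(lambda x: norm_group(a=x['Normalized_total'], b=x['group'], c=x['s_count'], d=x['p_count'], e=x['d_count']), axis=1)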

24.4 s ± 551 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

I have multiple text columns in my original data frame and want to apply a similar kind of function to them, which takes much longer than this one.

Thanks

There are lots of ways to approach this sort of question; this answer should answer your question: stackoverflow.com/a/39111919
– Umar.H
Commented Nov 12, 2019 at 18:16

Quang Hoang · Accepted Answer · 2019-11-12 18:15:47Z


You can vectorize with np.select:

df['Category'] = np.select((df['Normalized_total'].ge(0.7) & df['group'].ge(1000),
                            df['Normalized_total'].ge(0.4) & df['group'].ge(500)),
                           ('High', 'Medium'), default='Low'
                          )

Performance:

255 ms ± 2.71 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)
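As a sanity check, here is a minimal sketch that compares the answer's np.select call with an equivalent scalar function on a small random frame; `label` is a hypothetical helper written for this check, not part of the question. It also shows a fallback that is not from the answer: a list comprehension over zipped columns, which avoids building a Series for every row and is typically much faster than apply(axis=1), though still slower than np.select. That fallback may help when the logic (for example on text columns) is hard to express as boolean masks.

```python
import numpy as np
import pandas as pd


def label(a, b):
    # Hypothetical scalar version of the two-branch logic in the answer
    if a >= 0.7 and b >= 1000:
        return 'High'
    elif a >= 0.4 and b >= 500:
        return 'Medium'
    return 'Low'


np.random.seed(1234)
n = 5_000
df = pd.DataFrame({
    'group': np.random.randint(1700, size=n),
    'Normalized_total': np.random.rand(n).round(2),
})

# Vectorized version from the answer
vectorized = np.select((df['Normalized_total'].ge(0.7) & df['group'].ge(1000),
                        df['Normalized_total'].ge(0.4) & df['group'].ge(500)),
                       ('High', 'Medium'), default='Low')

# Fallback: plain Python loop over zipped columns (no per-row Series objects)
looped = [label(a, b) for a, b in zip(df['Normalized_total'], df['group'])]

# Row-wise apply as in the question, kept only for comparison
applied = df.apply(lambda r: label(r['Normalized_total'], r['group']), axis=1)
```

All three produce the same labels; only the vectorized one avoids calling a Python function per row.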


Thanks for the answer. I have edited my question. If there are only 2 or 3 conditions your answer is correct, but with many if/else statements it is difficult to write them all in np.select. Is there any way to tackle multiple conditions?
– Kumar AK
Commented Nov 12, 2019 at 18:55

@KumarAK then check the answer I linked in the comments; it works with any number of conditions.
– Umar.H
Commented Nov 12, 2019 at 18:58
You just stack them into np.select with the default being the last one, e.g. np.select([cond1, cond2, cond3], [val1, val2, val3], default=default_val).
– Quang Hoang
Commented Nov 12, 2019 at 18:59
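Following that suggestion, here is a sketch of how every branch of the question's norm_group could map onto a single np.select call. Because np.select returns the choice for the first True condition in each row, listing the conditions in the same order as the if/elif chain reproduces its precedence. Assumptions: c, d and e correspond to s_count, p_count and d_count (the question does not show the full call), `norm_group_vectorized` is a hypothetical name, and the stray trailing space in "Both High " is dropped.

```python
import numpy as np
import pandas as pd


def norm_group(a, b, c, d, e):
    # Scalar version from the question, used here only as a reference
    if a >= 0.7 and b >= 1000 and c > 2:
        return "Both High"
    elif a >= 0.7 and b >= 1000 and c < 2:
        return "High and C Low"
    elif a >= 0.4 and b >= 500 and d > 2:
        return "Medium and D High"
    elif a >= 0.4 and b >= 500 and d < 2:
        return "Medium and D Low"
    elif a >= 0.4 and b >= 500 and e > 2:
        return "Medium and E High"
    elif a >= 0.4 and b >= 500 and e < 2:
        return "Medium and E Low"
    else:
        return "Low"


def norm_group_vectorized(df):
    # Assumed column mapping: c -> s_count, d -> p_count, e -> d_count
    a, b = df['Normalized_total'], df['group']
    c, d, e = df['s_count'], df['p_count'], df['d_count']
    high = (a >= 0.7) & (b >= 1000)
    medium = (a >= 0.4) & (b >= 500)
    # Same order as the if/elif chain: the first True condition wins
    conditions = [high & (c > 2), high & (c < 2),
                  medium & (d > 2), medium & (d < 2),
                  medium & (e > 2), medium & (e < 2)]
    choices = ["Both High", "High and C Low",
               "Medium and D High", "Medium and D Low",
               "Medium and E High", "Medium and E Low"]
    return np.select(conditions, choices, default="Low")


# Small random frame to check that both versions agree
np.random.seed(0)
n = 2_000
demo = pd.DataFrame({
    'group': np.random.randint(1700, size=n),
    's_count': np.random.randint(5, size=n),
    'p_count': np.random.randint(5, size=n),
    'd_count': np.random.randint(5, size=n),
    'Normalized_total': np.random.rand(n).round(2),
})
slow = demo.apply(lambda r: norm_group(r['Normalized_total'], r['group'],
                                       r['s_count'], r['p_count'],
                                       r['d_count']), axis=1)
fast = norm_group_vectorized(demo)
```

Adding another branch means appending one condition and one choice at the matching position in the two lists, so the pattern scales to many conditions without nesting.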
