I'm using the Titanic dataset and have created a series, Famsize. I'd like to create a second series that outputs 'single' if Famsize == 1, 'small' if 1 < Famsize < 5, and 'large' if Famsize >= 5.

   Famsize FamsizeDisc
     1         single
     2         small
     5         large

I've tried using np.where but as I have three outputs I haven't been able to find a solution.

Any suggestions?

  • Do share what you've attempted so far.
    – parth
    Commented Oct 5, 2017 at 11:00

2 Answers

This is called binning, so use pd.cut, e.g.:

df['new'] = pd.cut(df['Famsize'],bins=[0,1,4,np.inf],labels=['single','small','large'])

Output:

   Famsize FamsizeDisc     new
0        1      single  single
1        2       small   small
2        5       large   large
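
For anyone who wants to run this end to end, here is a minimal self-contained sketch. The sample values are made up for illustration (they are not taken from the Titanic data), and it assumes Famsize holds whole numbers of at least 1. pd.cut uses right-closed bins by default, so the edges [0, 1, 4, inf] give the intervals (0, 1], (1, 4] and (4, inf].

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the real Famsize column comes from the Titanic dataset.
df = pd.DataFrame({'Famsize': [1, 2, 4, 5, 11]})

# Bins are right-closed by default: (0, 1], (1, 4], (4, inf].
df['FamsizeDisc'] = pd.cut(
    df['Famsize'],
    bins=[0, 1, 4, np.inf],
    labels=['single', 'small', 'large'],
)

print(df)
```

Note that the result is a categorical column, and any value outside the bins (0 or negative here) would come back as NaN.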

You could either create a function that does the mapping:

def get_sizeDisc(x):
    if x == 1:
        return 'single'
    elif x < 5:
        return 'small'
    elif x >= 5:
        return 'large'

df['FamsizeDisc'] = df.Famsize.apply(get_sizeDisc)

Or you could use .loc

df.loc[df.Famsize==1, 'FamsizeDisc'] = 'single'
df.loc[(df.Famsize > 1) & (df.Famsize < 5), 'FamsizeDisc'] = 'small'
df.loc[df.Famsize>=5, 'FamsizeDisc'] = 'large'
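
Since the question mentions np.where, it may be worth noting that it can handle three outputs too, either by nesting two calls or by switching to np.select, which takes a list of conditions and a matching list of choices. A rough sketch, using made-up sample values and a hypothetical 'unknown' fallback for anything that matches no condition:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the real Famsize column.
df = pd.DataFrame({'Famsize': [1, 2, 4, 5, 11]})

# np.select: the first condition that is True for a row decides its label.
conditions = [
    df['Famsize'] == 1,
    (df['Famsize'] > 1) & (df['Famsize'] < 5),
    df['Famsize'] >= 5,
]
choices = ['single', 'small', 'large']
df['FamsizeDisc'] = np.select(conditions, choices, default='unknown')

# Nested np.where: assumes every value is >= 1, since anything
# that is not 1 and is below 5 is labelled 'small'.
nested = np.where(
    df['Famsize'] == 1,
    'single',
    np.where(df['Famsize'] < 5, 'small', 'large'),
)

print(df)
```

The nested form gets hard to read beyond three outputs, which is where np.select (or pd.cut for plain numeric ranges) tends to be the tidier choice.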
  • My bad, hadn't reloaded the page to see your answer. I'll remove it from my answer and upvote yours, as it's clearly the more concise solution :D
    – greg_data
    Commented Oct 5, 2017 at 11:51
