Pandas conditional creation of a dataframe column: based on multiple conditions

Question

I have a df:

  col1 col2 col3
0    1    2    3
1    2    3    1
2    3    3    3
3    4    3    2

I want to add a new column based on the following conditions:

 - if   col1 > col2 > col3   ----->  2
 - elif col1 > col2          ----->  1
 - elif col1 < col2 < col3   -----> -2
 - elif col1 < col2          -----> -1
 - else                      ----->  0

And it should become this:

  col1 col2 col3   new
0    1    2    3   -2
1    2    3    1   -1
2    3    3    3    0
3    4    3    2    2

I followed the method from this post by unutbu, which works fine when each condition has a single greater-than or less-than comparison. But in my case, where a condition chains more than one comparison, building the conditions raises an error:

conditions = [
       (df['col1'] > df['col2'] > df['col3']), 
       (df['col1'] > df['col2']),
       (df['col1'] < df['col2'] < df['col3']),
       (df['col1'] < df['col2'])]
choices = [2,1,-2,-1]
df['new'] = np.select(conditions, choices, default=0)


Traceback (most recent call last):

  File "<ipython-input-43-768a4c0ecf9f>", line 2, in <module>
    (df['col1'] > df['col2'] > df['col3']),

  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\generic.py", line 1478, in __nonzero__
    .format(self.__class__.__name__))

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

How should I do this?

Try using the '.eval' method instead to create a single boolean series, e.g., df.eval('col1 > col2 > col3'). This is equivalent to assigning True where, element by element, col1 is greater than col2 and col2 is greater than col3. Otherwise, Python expands the chained comparison to (df['col1'] > df['col2']) and (df['col2'] > df['col3']), and the `and` needs a single truth value for the first boolean series, which pandas refuses to give. - jtorca, Commented Jun 23, 2020 at 2:09
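The df.eval suggestion from the comment above could be combined with np.select roughly as follows. This is a minimal sketch, assuming the sample frame from the question; the df construction is added here only to make the example self-contained.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({"col1": [1, 2, 3, 4],
                   "col2": [2, 3, 3, 3],
                   "col3": [3, 1, 3, 2]})

# df.eval parses each string itself, so a chained comparison
# yields one element-wise boolean Series instead of raising
conditions = [
    df.eval("col1 > col2 > col3"),
    df.eval("col1 > col2"),
    df.eval("col1 < col2 < col3"),
    df.eval("col1 < col2"),
]
choices = [2, 1, -2, -1]

# np.select picks the choice of the first condition that is True per row
df["new"] = np.select(conditions, choices, default=0)
print(df)
```

The order of the conditions matters: the stricter chained tests come before the looser ones, so they win when both are True.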

Ehsan · Accepted Answer · 2020-06-23 03:10:55Z

3

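A chained comparison such as a > b > c does not work on Series. Split it into two comparisons joined with &, and wrap each comparison in parentheses because & binds more tightly than > and <. Change your code to

import numpy as np

# Conditions are checked in order; the first True one wins for each row
conditions = [
       (df['col1'] > df['col2']) & (df['col2'] > df['col3']),
       (df['col1'] > df['col2']),
       (df['col1'] < df['col2']) & (df['col2'] < df['col3']),
       (df['col1'] < df['col2'])]
choices = [2, 1, -2, -1]
# Rows matching no condition get the default of 0
df['new'] = np.select(conditions, choices, default=0)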
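As a self-contained check of this approach, the sketch below rebuilds the sample frame from the question and also shows why the parentheses matter. The variable name raised is only illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({"col1": [1, 2, 3, 4],
                   "col2": [2, 3, 3, 3],
                   "col3": [3, 1, 3, 2]})

conditions = [
    (df["col1"] > df["col2"]) & (df["col2"] > df["col3"]),
    (df["col1"] > df["col2"]),
    (df["col1"] < df["col2"]) & (df["col2"] < df["col3"]),
    (df["col1"] < df["col2"]),
]
choices = [2, 1, -2, -1]
df["new"] = np.select(conditions, choices, default=0)
print(df)

# Without parentheses, & is applied first, so the expression becomes
# col1 > (col2 & col2) > col3, a chained comparison that raises again.
raised = False
try:
    df["col1"] > df["col2"] & df["col2"] > df["col3"]
except ValueError:
    raised = True
print("unparenthesized version raised ValueError:", raised)
```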

edited Jun 23, 2020 at 3:10

Ehsan

answered Jun 23, 2020 at 2:09

BENY

sammywemmy · Accepted Answer · 2022-03-24 10:58:39Z

One option is with case_when from pyjanitor; under the hood it uses pd.Series.mask.

The basic idea is a pairing of condition and expected value; you can pass as many pairings as required, followed by a default value and a target column name:

# pip install pyjanitor
import pandas as pd
import janitor

df.case_when( 
      # condition, value
     'col1>col2>col3', 2,
     'col1>col2', 1,
     'col1<col2<col3', -2,
     'col1<col2', -1,
     0, # default
     column_name = 'new')

   col1  col2  col3  new
0     1     2     3   -2
1     2     3     1   -1
2     3     3     3    0
3     4     3     2    2

The code above passes the conditions as strings, which are evaluated by pd.eval on the parent dataframe. Note that, speed-wise, this can be slower for small datasets. A faster option (depending on the data size) is to skip pd.eval and pass boolean Series directly:

df.case_when( 
      # condition, value
     df.col1.gt(df.col2) & df.col2.gt(df.col3), 2,
     df.col1.gt(df.col2), 1,
     df.col1.lt(df.col2) & df.col2.lt(df.col3), -2,
     df.col1.lt(df.col2), -1,
     0, # default
     column_name = 'new')

   col1  col2  col3  new
0     1     2     3   -2
1     2     3     1   -1
2     3     3     3    0
3     4     3     2    2
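For readers without pyjanitor, the pd.Series.mask mechanism mentioned above can be approximated in plain pandas. This is only a rough sketch of the idea, not pyjanitor's actual implementation; the names pairs and new are hypothetical.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({"col1": [1, 2, 3, 4],
                   "col2": [2, 3, 3, 3],
                   "col3": [3, 1, 3, 2]})

# (condition, value) pairs in priority order, highest priority first
pairs = [
    (df.col1.gt(df.col2) & df.col2.gt(df.col3), 2),
    (df.col1.gt(df.col2), 1),
    (df.col1.lt(df.col2) & df.col2.lt(df.col3), -2),
    (df.col1.lt(df.col2), -1),
]

# Start from the default, then apply the pairs in reverse so that
# higher-priority conditions overwrite lower-priority ones
new = pd.Series(0, index=df.index)
for cond, value in reversed(pairs):
    new = new.mask(cond, value)

df["new"] = new
print(df)
```

Applying the pairs in reverse order is what reproduces the if/elif semantics: the first condition in the list is masked in last, so it takes precedence.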
