4

I have a df:

  col1 col2 col3
0    1    2    3
1    2    3    1
2    3    3    3
3    4    3    2

I want to add a new column based on the following conditions:

 - if   col1 > col2 > col3   ----->  2
 - elif col1 > col2          ----->  1
 - elif col1 < col2 < col3   -----> -2
 - elif col1 < col2          -----> -1
 - else                      ----->  0

And it should become this:

  col1 col2 col3   new
0    1    2    3   -2
1    2    3    1   -1
2    3    3    3    0
3    4    3    2    2

I followed the method from this post by unutbu, with 1 greater than or less than is fine. But in my case with more than 1 greater than or less than, conditions returns error:

conditions = [
       (df['col1'] > df['col2'] > df['col3']), 
       (df['col1'] > df['col2']),
       (df['col1'] < df['col2'] < df['col3']),
       (df['col1'] < df['col2'])]
choices = [2,1,-2,-1]
df['new'] = np.select(conditions, choices, default=0)


Traceback (most recent call last):

  File "<ipython-input-43-768a4c0ecf9f>", line 2, in <module>
    (df['col1'] > df['col2'] > df['col3']),

  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\generic.py", line 1478, in __nonzero__
    .format(self.__class__.__name__))

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

How should I do this?

1
  • 1
    try using '.eval' method instead to create a single boolean series, e.g., df.eval('col1 > col2 > col3'). This is equivalent to assigning True if, element by element, col1 is greater than col2 and col2 is greater than col3. Otherwise, pandas creates a boolean series for df['col1'] > df['col2'] and then it doesn't know how to evaluate the condition boolean series > df['col3']
    – jtorca
    Commented Jun 23, 2020 at 2:09

2 Answers 2

3

Change your code to

conditions = [
       (df['col1'] > df['col2']) &  (df['col2'] > df['col3']), 
       (df['col1'] > df['col2']),
       (df['col1'] < df['col2']) & (df['col2'] < df['col3']),
       (df['col1'] < df['col2'])]
choices = [2,1,-2,-1]
df['new'] = np.select(conditions, choices, default=0)
0

One option is with case_when from pyjanitor; under the hood it uses pd.Series.mask.

The basic idea is a pairing of condition and expected value; you can pass as many pairings as required, followed by a default value and a target column name:

# pip install pyjanitor
import pandas as pd
import janitor

df.case_when( 
      # condition, value
     'col1>col2>col3', 2,
     'col1>col2', 1,
     'col1<col2<col3', -2,
     'col1<col2', -1,
     0, # default
     column_name = 'new')

   col1  col2  col3  new
0     1     2     3   -2
1     2     3     1   -1
2     3     3     3    0
3     4     3     2    2

The code above uses strings for the conditions, which are evaluated by pd.eval on the parent dataframe - note that speed wise, this can be slower for small datasets. A faster option (depending on the data size) would be to avoid the pd.eval option:

df.case_when( 
      # condition, value
     df.col1.gt(df.col2) & df.col2.gt(df.col3), 2,
     df.col1.gt(df.col2), 1,
     df.col1.lt(df.col2) & df.col2.lt(df.col3), -2,
     df.col1.lt(df.col2), -1,
     0, # default
     column_name = 'new')

   col1  col2  col3  new
0     1     2     3   -2
1     2     3     1   -1
2     3     3     3    0
3     4     3     2    2

Not the answer you're looking for? Browse other questions tagged or ask your own question.