How to apply conditional logic to a Pandas DataFrame.

See the DataFrame shown below:

   data desired_output
0     1          False
1     2          False
2     3           True
3     4           True

My original data is shown in the 'data' column and the desired output is shown next to it. If the number in 'data' is below 2.5, the desired_output is False; otherwise it is True.

I could loop over the rows and reconstruct the DataFrame, but that would be 'un-pythonic'.

  • maybe I don't know pandas, but it seems that you have two numbers in data -- which one are you checking against (seemingly the one on the right)? What relevance is the number on the left?
    – mgilson
    Commented Feb 5, 2013 at 18:26
  • the number on the left is the index and the one on the right is the data
    – nitin
    Commented Feb 5, 2013 at 18:31
  • Does this answer your question? Pandas conditional creation of a series/dataframe column
    – AMC
    Commented Jan 25, 2020 at 19:14

5 Answers

In [1]: df
   data
0     1
1     2
2     3
3     4

You want to apply a function that conditionally returns a value based on the selected dataframe column.

In [2]: df['data'].apply(lambda x: 'false' if x < 2.5 else 'true')
0    false
1    false
2     true
3     true
Name: data

You can then assign that returned column to a new column in your dataframe:

In [3]: df['desired_output'] = df['data'].apply(lambda x: 'false' if x < 2.5 else 'true')

In [4]: df
   data desired_output
0     1          false
1     2          false
2     3           true
3     4           true
  • Although this answer is more verbose and not as simple as the answer @Jasc gave, it is more general and can be applied to other situations in which one wants output other than true and false. Commented Jun 20, 2018 at 16:49
  • apply + lambda is not recommended for easily vectorisable operations. Use np.where or loc methods instead to utilize Pandas / NumPy vectorisation.
    – jpp
    Commented Aug 10, 2018 at 13:12
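
As a rough sketch of the loc approach mentioned in the comment above (this is an illustration, not code from the original answer): set a default label for every row, then overwrite only the rows where the condition holds. The labels 'true' and 'false' are kept as strings, as in the answer.

```python
import pandas as pd

df = pd.DataFrame({"data": [1, 2, 3, 4]})

# Start with a default label for every row.
df["desired_output"] = "false"

# Overwrite only the rows whose value is at least 2.5 (vectorised, no Python-level loop).
df.loc[df["data"] >= 2.5, "desired_output"] = "true"

print(df)
```

The same pattern works for any replacement value, not just two string labels.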

Just compare the column with that value:

In [9]: df = pandas.DataFrame([1,2,3,4], columns=["data"])

In [10]: df
   data
0     1
1     2
2     3
3     4

In [11]: df["desired"] = df["data"] > 2.5
In [12]: df
   data desired
0     1   False
1     2   False
2     3    True
3     4    True

In [34]: import pandas as pd

In [35]: import numpy as np

In [36]: df = pd.DataFrame([1,2,3,4], columns=["data"])

In [37]: df
   data
0     1
1     2
2     3
3     4

In [38]: df["desired_output"] = np.where(df["data"] < 2.5, "False", "True")

In [39]: df
   data desired_output
0     1          False
1     2          False
2     3           True
3     4           True
  • This is good, but the < seems unnecessarily confusing. If the condition is true, the first value results; if false, the second. So it seems far clearer (and equivalent) to write the right-hand side as np.where(df["data"] >= 2.5, "True", "False").
    Commented Oct 16, 2018 at 14:48
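
To check the equivalence claimed in the comment above, here is a minimal sketch that builds both forms and compares them. The variable names original, flipped and same are only for illustration. Note that either form produces the strings "True" and "False", not booleans.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"data": [1, 2, 3, 4]})

# Form used in the answer: condition "< 2.5" selects the first value.
original = np.where(df["data"] < 2.5, "False", "True")

# Form suggested in the comment: condition ">= 2.5" with the values swapped.
flipped = np.where(df["data"] >= 2.5, "True", "False")

# Both arrays hold the same labels element by element.
same = bool((original == flipped).all())
print(same)

# The result is an array of strings, not of booleans.
print(flipped.dtype.kind)
```

If real booleans are wanted, the plain comparison df["data"] >= 2.5 is enough and np.where is not needed.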

In this specific example, where the DataFrame has only one column, you can write this elegantly as:

df['desired_output'] = df.le(2.5)

le tests whether elements are less than or equal to 2.5; lt (less than), gt (greater than) and ge (greater than or equal to) work the same way.

  • OP wants to return False if df['data'] < 2.5. So you should use gt here.
    – rachwa
    Commented Jun 19, 2022 at 17:17
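
Following the correction in the comment above, a minimal sketch with gt, applied to the 'data' column rather than the whole DataFrame so that it also works when other columns are present:

```python
import pandas as pd

df = pd.DataFrame({"data": [1, 2, 3, 4]})

# gt(2.5) is True where the value is greater than 2.5, so values below 2.5 map to False.
df["desired_output"] = df["data"].gt(2.5)

print(df)
```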

You can also use eval here:

In [3]: df.eval('desired_output = data >= 2.5', inplace=True)

In [4]: df
   data  desired_output
0     1           False
1     2           False
2     3            True
3     4            True

Since inplace=True is passed, you don't need to assign the result back to df.
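
For comparison, a small sketch of the same call without inplace=True. In that case eval leaves df untouched and returns a new DataFrame with the extra column; the name result is only for illustration.

```python
import pandas as pd

df = pd.DataFrame({"data": [1, 2, 3, 4]})

# Without inplace=True, eval returns a new DataFrame and does not modify df.
result = df.eval("desired_output = data >= 2.5")

print("desired_output" in df.columns)
print(result)
```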
