Pandas Dataframe - Conditional Column Creation

Question

I am attempting to create a new column based on conditional logic from another column. I've tried searching and haven't been able to find anything that addresses my issue.

I have imported a CSV into a pandas dataframe, which is structured as shown below. I edited a few of the descriptions for this post, but other than that everything is the same:

#code used to load dataframe:
import pandas as pd
df = pd.read_csv(r"C:\filepath\filename.csv")

#output from print(type(df)):
#<class 'pandas.core.frame.DataFrame'>

#output from print(df.columns.values):
#['Type' 'Trans Date' 'Post Date' 'Description' 'Amount'] 

#output from print(df.columns):
#Index(['Type', 'Trans Date', 'Post Date', 'Description', 'Amount'], dtype='object')

#output from printing the dataframe:
   Type  Trans Date   Post Date           Description  Amount
0  Sale  01/25/2018  01/25/2018                 DESC1  -13.95
1  Sale  01/25/2018  01/26/2018  AMAZON MKTPLACE PMTS   -6.99
2  Sale  01/24/2018  01/25/2018         SUMMIT BISTRO   -5.85
3  Sale  01/24/2018  01/25/2018                 DESC3   -9.13
4  Sale  01/24/2018  01/26/2018   DYNAMIC VENDING INC   -1.60

I then write the following code:

def criteria(row):
    if row.Description.find('SUMMIT BISTRO')>0:
        return 'Lunch'
    elif row.Description.find('AMAZON MKTPLACE PMTS')>0:
        return 'Amazon'
    elif row.Description.find('Aldi')>0:
        return 'Groceries'
    else:
        return 'NotWorking'

df['Category'] = df.apply(criteria, axis=0)

Errors:

Traceback (most recent call last):
  File "C:\Users\Test_BankReconcile2.py", line 44, in <module>
    df['Category'] = df.apply(criteria, axis=0)
  File "C:\Users\Anaconda3\lib\site-packages\pandas\core\frame.py", line 4262, in apply
    ignore_failures=ignore_failures)
  File "C:\Users\Anaconda3\lib\site-packages\pandas\core\frame.py", line 4358, in _apply_standard
    results[i] = func(v)
  File "C:\Users\OneDrive\Documents\finance\Test_BankReconcile2.py", line 35, in criteria
    if row.Description.find('SUMMIT BISTRO')>0:
  File "C:\Users\Anaconda3\lib\site-packages\pandas\core\generic.py", line 3081, in __getattr__
    return object.__getattribute__(self, name)
AttributeError: ("'Series' object has no attribute 'Description'", 'occurred at index Type')

I can successfully run this same sort of command on a very similar CSV file from a different bank (this example is from my credit card), so I don't know what is going on. Do I need to define the dataframe in some way that I'm not doing, or am I missing something very obvious? Thank you all in advance for helping me solve this.

df.apply(func, axis=0) (axis=0 is the default) applies the function func to each column of df (the columns of a DataFrame are Series). So your function criteria(row) isn't actually receiving a row, but a column. Changing to axis=1 should fix things. - Peter Leimbigler, Commented Jan 28, 2018 at 20:59
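To make the comment above concrete, here is a minimal sketch of what the function receives under each axis setting. The two-row frame is a stand-in built from the question's column names, not the poster's actual file.

```python
import pandas as pd

# Stand-in frame using two of the question's columns (assumed sample data).
df = pd.DataFrame({
    "Description": ["DESC1", "SUMMIT BISTRO"],
    "Amount": [-13.95, -5.85],
})

# axis=0 (the default): the function is called once per column,
# so s.name is the column label and s.Description does not exist.
per_column = df.apply(lambda s: s.name, axis=0)

# axis=1: the function is called once per row,
# so the column values are reachable as s.Description, s.Amount, ...
per_row = df.apply(lambda s: s.Description, axis=1)

print(per_column.tolist())  # ['Description', 'Amount']
print(per_row.tolist())     # ['DESC1', 'SUMMIT BISTRO']
```

This is why the traceback reports "occurred at index Type": with axis=0, the first thing handed to criteria was the Type column.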

juanpa.arrivillaga · Accepted Answer · 2018-01-28 20:59:32Z

Yes, your problem is that you need to pass axis=1 to .apply:

In [52]: df
Out[52]:
   Type  Trans Date   Post Date           Description  Amount
0  Sale  01/25/2018  01/25/2018                 DESC1  -13.95
1  Sale  01/25/2018  01/26/2018  AMAZON MKTPLACE PMTS   -6.99
2  Sale  01/24/2018  01/25/2018         SUMMIT BISTRO   -5.85
3  Sale  01/24/2018  01/25/2018                 DESC3   -9.13
4  Sale  01/24/2018  01/26/2018   DYNAMIC VENDING INC   -1.60

In [53]: def criteria(row):
    ...:     if row.Description.find('SUMMIT BISTRO')>0:
    ...:         return 'Lunch'
    ...:     elif row.Description.find('AMAZON MKTPLACE PMTS')>0:
    ...:         return 'Amazon'
    ...:     elif row.Description.find('Aldi')>0:
    ...:         return 'Groceries'
    ...:     else:
    ...:         return 'NotWorking'
    ...:

In [54]: df.apply(criteria, axis=1)
Out[54]:
0    NotWorking
1    NotWorking
2    NotWorking
3    NotWorking
4    NotWorking
dtype: object

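The second problem is a logic error. str.find returns the index of the first match, which is 0 when the substring starts at the beginning of the string, and -1 when there is no match. So instead of .find(x) > 0 you want .find(x) >= 0 or, better yet, some_string in some_other_string.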
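A quick sketch of the difference, using one of the descriptions from the question as a plain string:

```python
desc = "SUMMIT BISTRO"

starts_at = desc.find("SUMMIT BISTRO")  # match begins at index 0
missing = desc.find("Aldi")             # -1 signals "not found"

print(starts_at, missing)        # 0 -1
print(starts_at > 0)             # False: the original test rejects a match at index 0
print(starts_at >= 0)            # True
print("SUMMIT BISTRO" in desc)   # True
print("Aldi" in desc)            # False
```

Since every matching description in the sample starts with the search text, the > 0 test fails for all of them, which is why every row came back as 'NotWorking'.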

This is working, thank you. I'd like to keep the Description.find logic, since I can also add a condition such as "and row.Amount <= 1.23", which helps define categories based on both the amount and the description. - Brian, Commented Jan 28, 2018 at 21:11
@Brian you could still do that. Don't use .find unless you actually need the index. - juanpa.arrivillaga, Commented Jan 28, 2018 at 21:12
@juanpa.arrivillaga - feel free to change your solution or add a version with in; I have no problem with it (+1). - jezrael, Commented Jan 28, 2018 at 21:18
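For what it's worth, the point in the comments above can be sketched like this: with axis=1 the whole row is available, so an in test on Description can be combined with a condition on Amount. The -5.00 threshold and the 'Uncategorized' label are made up for illustration; the rows are taken from the question's sample.

```python
import pandas as pd

df = pd.DataFrame({
    "Description": ["AMAZON MKTPLACE PMTS", "SUMMIT BISTRO", "DYNAMIC VENDING INC"],
    "Amount": [-6.99, -5.85, -1.60],
})

def criteria(row):
    # Hypothetical rule: purchases are negative, so "<= -5.00"
    # means a charge of five dollars or more.
    if "SUMMIT BISTRO" in row.Description and row.Amount <= -5.00:
        return "Lunch"
    elif "AMAZON MKTPLACE PMTS" in row.Description:
        return "Amazon"
    else:
        return "Uncategorized"  # hypothetical fallback label

df["Category"] = df.apply(criteria, axis=1)
print(df["Category"].tolist())  # ['Amazon', 'Lunch', 'Uncategorized']
```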

jezrael · Accepted Answer · 2018-01-28 21:00:16Z


For a more general solution, drop the Description lookup from the function and call df['Description'].apply(criteria) instead. This uses Series.apply, which passes each description string to the function.

Also, to check for a substring in a string, use in.

def criteria(row):
    if 'SUMMIT BISTRO' in row:
        return 'Lunch'
    elif 'AMAZON MKTPLACE PMTS' in row:
        return 'Amazon'
    elif 'Aldi' in row:
        return 'Groceries'
    else:
        return 'NotWorking'

df['Category'] = df['Description'].apply(criteria)
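As a possible alternative that is not part of this answer, the same mapping could be written without a Python-level function by pairing Series.str.contains with numpy.select. This is only a sketch on a three-row stand-in frame; regex=False is passed so the search text is treated literally, and np.select picks the first condition that matches, mirroring the if/elif order.

```python
import numpy as np
import pandas as pd

# Stand-in data taken from the question's sample rows.
df = pd.DataFrame({"Description": ["DESC1", "AMAZON MKTPLACE PMTS", "SUMMIT BISTRO"]})

desc = df["Description"]
conditions = [
    desc.str.contains("SUMMIT BISTRO", regex=False),
    desc.str.contains("AMAZON MKTPLACE PMTS", regex=False),
    desc.str.contains("Aldi", regex=False),
]
choices = ["Lunch", "Amazon", "Groceries"]

# Rows matching none of the conditions get the default label.
df["Category"] = np.select(conditions, choices, default="NotWorking")
print(df["Category"].tolist())  # ['NotWorking', 'Amazon', 'Lunch']
```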
