
I am attempting to create a new column based on conditional logic from another column. I've tried searching and haven't been able to find anything that addresses my issue.

I have imported a CSV into a pandas DataFrame, structured as shown below. I edited a few of the descriptions for this post, but other than that everything is the same:

#code used to load dataframe:
df = pd.read_csv(r"C:\filepath\filename.csv")

#output from print(type(df)):
#class 'pandas.core.frame.DataFrame'

#output from print(df.columns.values):
#['Type' 'Trans Date' 'Post Date' 'Description' 'Amount'] 

#output from print(df.columns):
#Index(['Type', 'Trans Date', 'Post Date', 'Description', 'Amount'], dtype='object')

#output from print

   Type  Trans Date   Post Date           Description  Amount
0  Sale  01/25/2018  01/25/2018                 DESC1  -13.95
1  Sale  01/25/2018  01/26/2018  AMAZON MKTPLACE PMTS   -6.99
2  Sale  01/24/2018  01/25/2018         SUMMIT BISTRO   -5.85
3  Sale  01/24/2018  01/25/2018                 DESC3   -9.13
4  Sale  01/24/2018  01/26/2018   DYNAMIC VENDING INC   -1.60

I then write the following code:

def criteria(row):
    if row.Description.find('SUMMIT BISTRO')>0:
        return 'Lunch'
    elif row.Description.find('AMAZON MKTPLACE PMTS')>0:
        return 'Amazon'
    elif row.Description.find('Aldi')>0:
        return 'Groceries'
    else:
        return 'NotWorking'

df['Category'] = df.apply(criteria, axis=0)

Errors:

Traceback (most recent call last):
  File "C:\Users\Test_BankReconcile2.py", line 44, in <module>
    df['Category'] = df.apply(criteria, axis=0)
  File "C:\Users\Anaconda3\lib\site-packages\pandas\core\frame.py", line 4262, in apply
    ignore_failures=ignore_failures)
  File "C:\Users\Anaconda3\lib\site-packages\pandas\core\frame.py", line 4358, in _apply_standard
    results[i] = func(v)
  File "C:\Users\OneDrive\Documents\finance\Test_BankReconcile2.py", line 35, in criteria
    if row.Description.find('SUMMIT BISTRO')>0:
  File "C:\Users\Anaconda3\lib\site-packages\pandas\core\generic.py", line 3081, in __getattr__
    return object.__getattribute__(self, name)
AttributeError: ("'Series' object has no attribute 'Description'", 'occurred at index Type')

I can successfully run this same sort of command on a very similar CSV file from a different bank (this example is from my credit card), so I don't know what is going on. Do I need to define the dataframe in some way that I'm not doing, or is there something very obvious that I'm not seeing? Thank you all in advance for helping me solve this.

  • df.apply(criteria, axis=1)
    – Stephen Rauch
    Commented Jan 28, 2018 at 20:58
  • df.apply(func, axis=0) (axis=0 is default) applies the function func to each column of df (the columns of a DataFrame are series). So your function criteria(row) isn't actually receiving a row, but a column. Changing to axis=1 should fix things. Commented Jan 28, 2018 at 20:59

2 Answers


Yes, your problem is that you need to pass axis=1 to .apply:

In [52]: df
Out[52]:
   Type  Trans Date   Post Date           Description  Amount
0  Sale  01/25/2018  01/25/2018                 DESC1  -13.95
1  Sale  01/25/2018  01/26/2018  AMAZON MKTPLACE PMTS   -6.99
2  Sale  01/24/2018  01/25/2018         SUMMIT BISTRO   -5.85
3  Sale  01/24/2018  01/25/2018                 DESC3   -9.13
4  Sale  01/24/2018  01/26/2018   DYNAMIC VENDING INC   -1.60

In [53]: def criteria(row):
    ...:     if row.Description.find('SUMMIT BISTRO')>0:
    ...:         return 'Lunch'
    ...:     elif row.Description.find('AMAZON MKTPLACE PMTS')>0:
    ...:         return 'Amazon'
    ...:     elif row.Description.find('Aldi')>0:
    ...:         return 'Groceries'
    ...:     else:
    ...:         return 'NotWorking'
    ...:

In [54]: df.apply(criteria, axis=1)
Out[54]:
0    NotWorking
1    NotWorking
2    NotWorking
3    NotWorking
4    NotWorking
dtype: object
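
To see what the axis argument changes, here is a minimal sketch on two of the rows from the question. The helper name `get_description` is hypothetical and only used for illustration. With axis=1 each call receives one row, so the Description label exists; with axis=0 each call receives one column, whose labels are the row numbers, so the attribute lookup fails.

```python
import pandas as pd

df = pd.DataFrame({
    "Type": ["Sale", "Sale"],
    "Description": ["AMAZON MKTPLACE PMTS", "SUMMIT BISTRO"],
    "Amount": [-6.99, -5.85],
})

def get_description(row):
    # Hypothetical helper: only works if "Description" is a label
    # of the Series that apply() passes in.
    return row.Description

# axis=1: the function is called once per row.
by_row = df.apply(get_description, axis=1)

# axis=0 (the default): the function is called once per column,
# starting with the "Type" column, which has no "Description" label.
try:
    df.apply(get_description, axis=0)
    error = None
except AttributeError as exc:
    error = str(exc)
```

After running this, `by_row` holds the two descriptions and `error` holds the same "no attribute 'Description'" message as in the question's traceback.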

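The second problem is a logic error: str.find returns the index where the match starts, which is 0 when the description begins with the search string, and -1 when there is no match. That is why every row above comes back as 'NotWorking'. Instead of .find(x) > 0 you want .find(x) >= 0, or better yet, some_string in some_other_string.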
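
A minimal sketch of both fixes together, assuming the five sample rows from the question (only the two columns needed here are rebuilt by hand). It first shows what str.find returns, then applies a version of criteria that uses `in` with axis=1.

```python
import pandas as pd

text = "SUMMIT BISTRO"
starts_at = text.find("SUMMIT BISTRO")  # 0, because the match is at the start
missing = text.find("Aldi")             # -1, because there is no match

df = pd.DataFrame({
    "Description": [
        "DESC1",
        "AMAZON MKTPLACE PMTS",
        "SUMMIT BISTRO",
        "DESC3",
        "DYNAMIC VENDING INC",
    ],
    "Amount": [-13.95, -6.99, -5.85, -9.13, -1.60],
})

def criteria(row):
    # "in" tests for a substring without caring where it starts.
    if "SUMMIT BISTRO" in row.Description:
        return "Lunch"
    elif "AMAZON MKTPLACE PMTS" in row.Description:
        return "Amazon"
    elif "Aldi" in row.Description:
        return "Groceries"
    else:
        return "NotWorking"

# axis=1 passes each row to criteria, so row.Description is a string.
df["Category"] = df.apply(criteria, axis=1)
```

Because the function still receives the whole row, other columns such as row.Amount remain available for extra conditions.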

  • This is working, thank you. I'd like to keep the Description.find logic as I seem to also be able to use "and row.Amount <= 1.23" which is helpful in further defining categories based on spending and the description.
    – Brian
    Commented Jan 28, 2018 at 21:11
  • @Brian you could still do that. Don't use .find unless you actually need the index. Commented Jan 28, 2018 at 21:12
  • @juanpa.arrivillaga - feel free to change/add your solution with in, I have no problem with it (+1)
    – jezrael
    Commented Jan 28, 2018 at 21:18

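For a more general solution, drop the Description lookup inside the function and call Series.apply on the column instead: df['Description'].apply(criteria). The function then receives each description string directly.

Also, to check whether a substring occurs in a string, use in.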

def criteria(row):
    if 'SUMMIT BISTRO' in row:
        return 'Lunch'
    elif 'AMAZON MKTPLACE PMTS' in row:
        return 'Amazon'
    elif 'Aldi' in row:
        return 'Groceries'
    else:
        return 'NotWorking'

df['Category'] = df['Description'].apply(criteria)
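
As a quick check of this approach, here is a self-contained sketch on the five descriptions from the question. The parameter is renamed to `description` (a hypothetical rename, not part of the answer) to stress that Series.apply passes one string per call, not a row.

```python
import pandas as pd

df = pd.DataFrame({
    "Description": [
        "DESC1",
        "AMAZON MKTPLACE PMTS",
        "SUMMIT BISTRO",
        "DESC3",
        "DYNAMIC VENDING INC",
    ],
})

def criteria(description):
    # Each call receives a single string from the Description column.
    if "SUMMIT BISTRO" in description:
        return "Lunch"
    elif "AMAZON MKTPLACE PMTS" in description:
        return "Amazon"
    elif "Aldi" in description:
        return "Groceries"
    else:
        return "NotWorking"

# Series.apply calls criteria once per element; no axis argument is needed.
df["Category"] = df["Description"].apply(criteria)
```

The trade-off is that the function no longer sees the other columns, so conditions on Amount would need the row-wise df.apply(..., axis=1) form instead.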