0

I am trying to combine multiple CSV files into an Excel worksheet so I can analyse my data.

So far I have been able to create the worksheet with the names of the CSV files as the tabs, but in the sheets themselves, I can't get the data I want.

I am using read.csv from pandas, but from what I can see and have been able to get working, the syntax only gives me the option to read the data of one CSV file and it then copies that data to all of the tabs which is not what I want.

import pandas as pd 
import os 

os.chdir('C:/Users/user3482176/correct_document_location) 
csv_files = ['doc1.csv', 'doc2.csv', 'doc3.csv'] 
reading = pd.read_csv('C:/Users/user3482176/correct_document_location/doc1.csv') 
with pd.ExcelWriter('combined_csv_files.xlsx', engine='openpyxl') as writer:
    for csv_file in csv_files:
        sheet_name = csv_file.split('.')[0]
        df.to_excel(writer,sheet_name=sheet_name, index=False)
    if_sheet_exists:'overlay'

I have seen that you can use the glob package to get the csv files and then display the data using that but I get an array error. I believe this is because my data has columns of different sizes, which they are meant to be, so not sure how to get around this.

import pandas as pd 
import os 
import glob 

os.chdir('C:/Users/user3482176/correct_document_location) 
csv_files = ['doc1.csv', 'doc2.csv', 'doc3.csv'] 
xlsx_files = glob.glob(os.path.join(directory_path, '*.csv')) 
df_list = [pd.read_csv(file) for file in xlsx_files] 
combined_df = pd.concat(df_list,join='outer', ignore_index=True) 
df = pd.DataFrame(df_list) 

with pd.ExcelWriter('combined_csv_files.xlsx', engine='openpyxl') as writer:
    for csv_file in csv_files:
        sheet_name = csv_file.split('.')[0]
        df.to_excel(writer,sheet_name=sheet_name, index=False)
    if_sheet_exists:'overlay'

Traceback (most recent call last): File "C:\Users\user3482176\PycharmProjects\combination_worksheet\collating sheets.py", line 53, in <module> df = pd.DataFrame(df_list) File "C:\Users\user3482176\PycharmProjects\combination_worksheet\venv\lib\site-packages\pandas\core\frame.py", line 867, in __init__ mgr = ndarray_to_mgr( File "C:\Users\user3482176\PycharmProjects\combination_worksheet\venv\lib\site-packages\pandas\core\internals\construction.py", line 319, in ndarray_to_mgr values = _prep_ndarraylike(values, copy=copy_on_sanitize) File "C:\Users\user3482176\PycharmProjects\combination_worksheet\venv\lib\site-packages\pandas\core\internals\construction.py", line 575, in _prep_ndarraylike values = np.array([convert(v) for v in values]) ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (39,) + inhomogeneous part. 

What I am trying to understand is how I can get around this error to get my multiple CSV files into 1 Excel Worksheet.

2

1 Answer 1

0

I got it working like this:

import pandas as pd
import os

csv_files = ['doc1.csv', 'doc2.csv']

excel_writer = pd.ExcelWriter('output.xlsx', engine='xlsxwriter')

for csv_file in csv_files:

    df = pd.read_csv(csv_file)

    sheet_name = os.path.splitext(os.path.basename(csv_file))[0]
    df.to_excel(excel_writer, sheet_name=sheet_name, index=False)

excel_writer.close()

You could switch the xlsxwriter engine with your openpyxl

This creates an Excel file with a sheet for each .csv file.

3
  • Hey Adrian, thanks for taking a look! This worked for me, I just had to change the names of some of the CSV files so they would be <31 characters. Is it possible for you to explain why my code was not working, it looks like I took a way more complicated way than you. Commented Jul 8 at 9:14
  • @user3482176 sure I can try. First of all I don't see where your directory_path variable is coming from, but I think it is the same you have entered for changing the work directory. So until combined_df everything is fine. With pd.concat() you created one huge DataFrame by joining them together like explained here this is not necessary. Then you try to create a single Dataframe with a list of DataFrames that don't have the same size. This is creating your error since this is not a valid way of creating a DataFrame object.
    – AdrianG
    Commented Jul 8 at 9:35
  • Basically what you needed to adjust in your code was remove the lines where you assign combined_df and df. And then don't create the df_list, instead create the df from a csv in the for loop so you can access the name of the file for the sheet name easily.
    – AdrianG
    Commented Jul 8 at 9:37

Not the answer you're looking for? Browse other questions tagged or ask your own question.