自学内容网 自学内容网

11个100w条dataframe数据写一张工作表多张工作簿优化记录

问题:目前有11个dataframe,最多的dataframe有100w条数据,现在需要把这11个dataframe写入同一个表格的11个工作簿

解决历程:

1.使用 pd.ExcelWriter方法,默认为xlwriter引擎,修改为 openpyxl:

ps: 在本地写入100w条数据大概2分多,但是服务器上大概要20分钟。。。

with pd.ExcelWriter(self.file_name, engine='openpyxl') as file_writer:
    sheet_name1 = 'aaa'
    frame1.to_excel(writer, engine='openpyxl', encoding='utf-8', sheet_name=sheet_name1, index=False)

    sheet_name2 = 'bbb'
    frame2.to_excel(writer, engine='openpyxl', encoding='utf-8', sheet_name=sheet_name2, index=False)

     ...

上述方法只能简单的写入数据,效率较慢

2.在上述的基础上,使用多线程写入,效率提升一小半

with pd.ExcelWriter(self.file_name, engine="openpyxl") as file_writer:
    """采用多线程方式,提高写入速度"""
        thread1 = threading.Thread(target=data_to_excel, args=(frame1, file_writer, 'aaa', 1))
        thread2 = threading.Thread(target=data_to_excel, args=(frame2, file_writer, 'bbb')
        
       ...

        thread1.start()
        thread2.start()
       ...

        thread1.join()
        thread2.join()
       ...

3.虽然使用了多线程,但是保存时最后的save动作耗费的时间还是很长,所以继续优化,直接使用openpyxl库来进行写入:

from openpyxl.utils.dataframe import dataframe_to_rows
from openpyxl import Workbook


file_writer = Workbook()
thread1 = threading.Thread(target=data_to_excel, args=(cell_del_frame, file_writer, 'aaa', 1))
thread2 = threading.Thread(target=data_to_excel, args=(cell_data, file_writer, 'bbb'))

...


thread1.start()
thread2.start()
...

thread1.join()
thread2.join()
...

file_writer.save('data.xlsx')



"""""""""""""
data_to_excel
"""""""""""""

def data_to_excel(frame, writer, sheet_name, target=None):
    """ frame写入 excel工作簿"""
    print("开始写入:", sheet_name)
    print("监控报告数据长度:", len(frame))
    # frame.to_excel(writer, engine='openpyxl', encoding='utf-8', sheet_name=sheet_name, index=False)
    if target == 1:
        sheet = writer.active
        sheet.title = sheet_name
    else:
        sheet = writer.create_sheet(sheet_name)
    for row in dataframe_to_rows(frame, index=False):
        sheet.append(row)
    print("写入完成:", sheet_name)

整体写入效率提升1/3,18分钟--> 12分钟完成写入


原文地址:https://blog.csdn.net/qq_35204012/article/details/135464450

免责声明:本站文章内容转载自网络资源,如本站内容侵犯了原著者的合法权益,可联系本站删除。更多内容请关注自学内容网(zxcms.com)!