
Python on Windows 11: read every Word document in a specified folder and return the text of each document in turn 【zhilu.space】

from pathlib import Path
from docx import Document
import logging
from concurrent.futures import ThreadPoolExecutor

# Set up logging
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')

def read_docx(file_path):
    # Join the non-empty paragraphs of the document with newlines
    doc = Document(file_path)
    return '\n'.join(para.text for para in doc.paragraphs if para.text.strip())

def process_file(file_path):
    try:
        content = read_docx(file_path)
        if content:
            logging.info(f"Content from {file_path.name}:")
            logging.info(content)
    except Exception as e:
        logging.error(f"An error occurred while reading {file_path.name}: {e}")

def read_docx_files_from_folder(folder_path):
    folder_path = Path(folder_path)
    docx_files = list(folder_path.glob('*.docx'))

    # Use a thread pool to process the files in parallel
    with ThreadPoolExecutor() as executor:
        executor.map(process_file, docx_files)
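
The script above writes each document's text to the log rather than returning it, and with a thread pool the log entries can appear in any order. Below is a minimal sketch of a variant that returns the texts in a fixed order. The names `collect_texts`, `safe_read` and the `reader` parameter are hypothetical, introduced only for this illustration. The sketch also assumes that files whose names start with `~$` are the temporary lock files Word creates while a document is open, and skips them.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def read_docx(file_path):
    """Return the non-empty paragraphs of one .docx file, joined by newlines."""
    # Imported here so collect_texts can be tried with another reader
    # even when python-docx is not installed.
    from docx import Document

    doc = Document(str(file_path))
    return "\n".join(p.text for p in doc.paragraphs if p.text.strip())


def collect_texts(folder_path, reader=read_docx):
    """Return a list of (file name, text) pairs, sorted by file name.

    Hypothetical helper for this sketch; `reader` is any callable that
    takes a path and returns the text of that file.
    """
    files = sorted(
        p for p in Path(folder_path).glob("*.docx")
        if not p.name.startswith("~$")  # assumed to be Word lock files
    )

    def safe_read(path):
        try:
            return reader(path)
        except Exception as exc:  # e.g. a corrupt or unreadable document
            return f"<failed to read: {exc}>"

    with ThreadPoolExecutor() as executor:
        # executor.map yields results in input order, even though the
        # individual reads may finish in any order.
        texts = list(executor.map(safe_read, files))

    return [(path.name, text) for path, text in zip(files, texts)]
```

Calling `collect_texts(folder)` with the folder that holds the documents gives one `(name, text)` pair per file, which the caller can print, log, or pass on. Returning a placeholder string for a failed file is only one possible choice; re-raising the exception or skipping the file would work as well.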

Original article: https://blog.csdn.net/bvip911/article/details/142390176
