Using Python file streams to control file reading

🕗 Published 2024-10-16 15:22 python java android

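In Python, file stream operations go through the built-in open() function, which supports reading, writing, and positioning within a file. Common file modes include:

r: read-only mode (the default).
w: write mode (truncates any existing content).
a: append mode.
r+: read/write mode.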
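
The four modes listed above can be compared with a small sketch. The scratch file name modes_demo.txt and the temporary directory are assumptions made only for this illustration.

```python
import os
import tempfile

# Hypothetical scratch file used only for this demonstration
path = os.path.join(tempfile.mkdtemp(), "modes_demo.txt")

with open(path, "w", encoding="utf-8") as f:   # w: create the file or truncate it
    f.write("first\n")

with open(path, "a", encoding="utf-8") as f:   # a: write at the end of the file
    f.write("second\n")

with open(path, "r", encoding="utf-8") as f:   # r: read only
    content = f.read()

with open(path, "r+", encoding="utf-8") as f:  # r+: read and write, no truncation
    f.write("FIRST")                           # overwrites the first five characters
    f.seek(0)                                  # go back to the start before reading
    updated = f.read()

print(repr(content))  # 'first\nsecond\n'
print(repr(updated))  # 'FIRST\nsecond\n'
```

Note that r+ starts writing at the beginning of the file and overwrites in place, whereas w discards the old content first.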

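The rest of this article shows how to use a file stream for basic file operations and how to control the way the stream is read (for example line by line or in chunks).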

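1. Problem background

While writing a compiler, the source file has to be read one character at a time. If a "/" is followed by another "/", the rest of the line is treated as a comment. Each character is read with file.read(1). But if the character after a "/" turns out not to be "/", is there a way to move the file stream back one character so that this character is not lost?

The relevant code: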

def tokenType(self):
    # PAGE 108
    if (self.current == '{' or self.current == '}' or self.current == '(' or self.current == ')' or self.current == '[' or self.current == ']' or self.current == '.' or self.current == ',' or self.current == ';' or self.current == '-' or self.current == '*' or self.current == '/' or self.current == '&' or self.current == '|' or self.current == '<' or self.current == '>' or self.current == '=' or self.current == '~'):
        if (self.current == '/'):
            next = self.file.read(1)
            if (next == '/'):
                while (next != "\n"):
                    next = self.file.read(1)
                return "IGNORE"
            if (next == '*'):
                while (True):
                    next = self.file.read(1)
                    if (next == '*'):
                        next = self.file.read(1)
                        if (next == '/'):
                            break
                return "IGNORE"
            else:
                return "SYMBOL"
        return "SYMBOL"
    elif (self.current == " " or self.current == "\n"):
        return "IGNORE"
    elif (self.current == "'"):
        while(next != "'"):
            self.current = self.current + next
        return "STRING_CONST"
    elif (type(self.current) == int):
        next = self.file.read(1)
        while(next != " "):
            self.current = self.current + next
        return "INT_CONST"
    else:
        next = self.file.read(1)
        while(next != " " and next != ""):
            self.current = self.current + next
            next = self.file.read(1)
        if (self.current == 'class' or self.current == 'constructor' or self.current == 'function' or self.current == 'method' or self.current == 'field' or self.current == 'static' or self.current == 'var' or self.current == 'int' or self.current == 'char' or self.current == 'boolean' or self.current == 'void' or self.current == 'true' or self.current == 'false' or self.current == 'null' or self.current == 'this' or self.current == 'let' or self.current == 'do' or self.current == 'if' or self.current == 'else' or self.current == 'while' or self.current == 'return'):
            return "KEYWORD"
        else:
            return "IDENTIFIER"

My problem seems to be when I have something like 10/5 and my program checks to see if the next character is a "/". Then on the next pass through my character interpreting function, the 5 has already been removed when it was checking for a comment.
So, is there any way I can get a character from a file stream without it being "removed" from the stream or is there a way I can move it back a character when I hit a case like this?

2. Solutions

Method 1: use file.seek() to adjust the stream position

file.seek() moves the stream pointer to a given position in the file. After looking ahead one character, seek() can move the pointer back so that the same character is returned by the next read. One caveat: in Python 3, a file opened in text mode does not accept a nonzero offset relative to the current position, so self.file.seek(-1, 1) raises io.UnsupportedOperation. Either open the file in binary mode, or save the position with tell() before looking ahead and pass it back to seek() afterwards. The version below takes the second route and also stops its loops at end of file.

SYMBOLS = set("{}()[].,;-*/&|<>=~")
KEYWORDS = {
    "class", "constructor", "function", "method", "field", "static", "var",
    "int", "char", "boolean", "void", "true", "false", "null", "this",
    "let", "do", "if", "else", "while", "return",
}

def tokenType(self):
    # PAGE 108
    if self.current in SYMBOLS:
        if self.current == '/':
            pos = self.file.tell()  # remember the position before looking ahead
            next_char = self.file.read(1)
            if next_char == '/':
                # Line comment: skip to the end of the line (or end of file)
                while next_char not in ("\n", ""):
                    next_char = self.file.read(1)
                return "IGNORE"
            if next_char == '*':
                # Block comment: skip until "*/" (or end of file)
                prev_char = ""
                while True:
                    next_char = self.file.read(1)
                    if next_char == "" or (prev_char == '*' and next_char == '/'):
                        break
                    prev_char = next_char
                return "IGNORE"
            # Not a comment: rewind so the character after "/" is read again
            self.file.seek(pos)
        return "SYMBOL"
    elif self.current in (" ", "\n"):
        return "IGNORE"
    elif self.current == "'":
        # String constant: read up to the closing quote (or end of file)
        next_char = self.file.read(1)
        while next_char not in ("'", ""):
            self.current += next_char
            next_char = self.file.read(1)
        return "STRING_CONST"
    elif self.current.isdigit():
        # Integer constant: collect digits, then rewind to the first non-digit
        pos = self.file.tell()
        next_char = self.file.read(1)
        while next_char.isdigit():
            self.current += next_char
            pos = self.file.tell()
            next_char = self.file.read(1)
        self.file.seek(pos)
        return "INT_CONST"
    else:
        # Keyword or identifier: read until a space or end of file
        next_char = self.file.read(1)
        while next_char not in (" ", ""):
            self.current += next_char
            next_char = self.file.read(1)
        return "KEYWORD" if self.current in KEYWORDS else "IDENTIFIER"
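
The look-ahead idea can also be tried in isolation. The following is a minimal sketch, assuming a file opened in text mode; peek_char and the scratch file expr.txt are hypothetical names introduced for the demonstration, not part of the original tokenizer.

```python
import os
import tempfile

def peek_char(stream):
    """Return the next character without consuming it (hypothetical helper)."""
    pos = stream.tell()   # remember the current position
    ch = stream.read(1)   # look ahead one character
    stream.seek(pos)      # restore the remembered position
    return ch

# Hypothetical scratch file holding the expression from the question
path = os.path.join(tempfile.mkdtemp(), "expr.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("10/5")

tokens = []
with open(path, "r", encoding="utf-8") as f:
    while True:
        ch = f.read(1)
        if ch == "":
            break                      # end of file
        if ch == "/" and peek_char(f) == "/":
            f.readline()               # line comment: discard the rest of the line
            continue
        tokens.append(ch)

print(tokens)  # ['1', '0', '/', '5']
```

Because the position is restored after every peek, the 5 in 10/5 is still in the stream on the next read.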

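Method 2: use Python's io.StringIO class

io.StringIO wraps a string in a file-like object, so the source text can be handled as an in-memory stream. Its seek() has the same restriction on relative offsets as a text file (string_io.seek(-1, 1) raises OSError), but StringIO positions are plain character indexes, so string_io.seek(string_io.tell() - 1) steps back exactly one character.

import io

SYMBOLS = set("{}()[].,;-*/&|<>=~")

def tokenType(self):
    # Load the file into an in-memory stream once and reuse it on later calls
    # (creating it in __init__ would work just as well)
    if not hasattr(self, "string_io"):
        self.string_io = io.StringIO(self.file.read())
    string_io = self.string_io
    while True:
        char = string_io.read(1)
        if char == "":
            return None  # end of input
        if char in SYMBOLS:
            if char == '/':
                next_char = string_io.read(1)
                if next_char == '/':
                    # Line comment: skip to the end of the line (or end of input)
                    while next_char not in ("\n", ""):
                        next_char = string_io.read(1)
                    return "IGNORE"
                if next_char == '*':
                    # Block comment: skip until "*/" (or end of input)
                    prev_char = ""
                    while True:
                        next_char = string_io.read(1)
                        if next_char == "" or (prev_char == '*' and next_char == '/'):
                            break
                        prev_char = next_char
                    return "IGNORE"
                if next_char != "":
                    # Not a comment: step back one character so it is read again
                    string_io.seek(string_io.tell() - 1)
            return "SYMBOL"
        elif char in (" ", "\n"):
            return "IGNORE"
        # Other characters (digits, letters, quotes) are not classified in this excerpt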
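
A different technique that avoids seek() altogether is to keep a one-character lookahead buffer, which also works on streams that cannot seek. The sketch below is one possible shape for it; PeekableReader and strip_line_comments are hypothetical names, and the example only handles "//" comments.

```python
import io

class PeekableReader:
    """Wraps a text stream and adds one-character lookahead (hypothetical helper)."""

    def __init__(self, stream):
        self._stream = stream
        self._buffer = ""  # holds at most one character that was peeked at

    def read_char(self):
        # Hand out the buffered character first, if there is one
        if self._buffer:
            ch, self._buffer = self._buffer, ""
            return ch
        return self._stream.read(1)

    def peek_char(self):
        # Fill the buffer on demand; "" means end of input
        if not self._buffer:
            self._buffer = self._stream.read(1)
        return self._buffer

def strip_line_comments(text):
    """Remove // comments while keeping single / characters."""
    reader = PeekableReader(io.StringIO(text))
    out = []
    while True:
        ch = reader.read_char()
        if ch == "":
            break
        if ch == "/" and reader.peek_char() == "/":
            # Skip up to (not including) the newline or the end of input
            while reader.peek_char() not in ("\n", ""):
                reader.read_char()
            continue
        out.append(ch)
    return "".join(out)

print(repr(strip_line_comments("x = 10/5 // ratio\ny = 2\n")))
# 'x = 10/5 \ny = 2\n'
```

The division in 10/5 survives because the character after "/" is only peeked at, never consumed.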

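Summary

Line-by-line reading: suited to processing large files one line at a time.
Chunked reading: suited to memory-sensitive work, especially with very large files.
File pointer control: seek() and tell() provide random access and control over the stream position.
Safe file handling: the with statement and exception handling make sure files are opened and closed correctly.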
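
The four points above can be sketched in a few lines. The scratch file data.txt, the 8-byte chunk size, and the missing-file path are assumptions chosen for the demonstration.

```python
import os
import tempfile

# Hypothetical scratch file; written in binary mode so byte offsets are predictable
path = os.path.join(tempfile.mkdtemp(), "data.txt")
with open(path, "wb") as f:
    f.write(b"alpha\nbeta\ngamma\n")

# Line by line: the file object is an iterator, so one line is read at a time
with open(path, "r", encoding="utf-8") as f:
    lines = [line.rstrip("\n") for line in f]

# In chunks: read a fixed number of bytes until read() returns b""
chunks = []
with open(path, "rb") as f:
    while True:
        chunk = f.read(8)
        if not chunk:
            break
        chunks.append(chunk)

# Pointer control: in binary mode seek() accepts any byte offset
with open(path, "rb") as f:
    f.seek(6)            # skip past b"alpha\n"
    word = f.read(4)     # b"beta"
    position = f.tell()  # 10

# Safe handling: with closes the file, try/except reports a missing one
missing = False
try:
    with open(path + ".missing", "r", encoding="utf-8") as f:
        f.read()
except FileNotFoundError:
    missing = True

print(lines, len(chunks), word, position, missing)
```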

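These techniques help you control and process file streams efficiently. With large files in particular, they can substantially reduce memory use.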

Original article: https://blog.csdn.net/weixin_44617651/article/details/142941469
