How do you write a Python script to replace multiple lines of text in a file?

Author: Anonymous | Editor: admin | Updated: 2022-06-22

Problem description
Approach
Implementation
Features of Python
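1. Problem description
The project's source tree is large, a mix of C and C++ written in many coding styles, with '.c', '.cc', '.cpp', '.h', and '.hh' files. My task is to delete every comment that contains a particular set of lines, such as the one below. (Note: this is only an example I picked at random; the project's source does not actually contain it.)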

/*
 * Copyright 2002 Sun Microsystems, Inc. All rights reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 *
 * - Redistributions of source code must retain the above copyright
 * notice, this list of conditions and the following disclaimer.
 *
 * - Redistribution in binary form must reproduce the above copyright
 * notice, this list of conditions and the following disclaimer in
 * the documentation and/or other materials provided with the
 * distribution.
 *
 * Neither the name of Sun Microsystems, Inc. or the names of
 * contributors may be used to endorse or promote products derived
 * from this software without specific prior written permission.
 */

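The format varies a lot, though. Some files have a description of the source file before the line "Copyright 2002 Sun Microsystems, Inc. All rights reserved."; some have such a description after "from this software without specific prior written permission."; some use C++-style "//" comments instead of "/* */"; and some are missing the paragraph

 * - Redistribution in binary form must reproduce the above copyright
 * notice, this list of conditions and the following disclaimer in
 * the documentation and/or other materials provided with the
 * distribution.

There are other variations as well. In short, the comments I need to delete, each containing those particular lines, come in many different formats.

So I decided to handle this with a Python script. To match specific content I first thought of regular expressions, but I could not work out how to build a regex that matches all the variations described above (if you know how, please tell me). I had to find another way.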
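For readers who do want to try the regular-expression route, here is one possible sketch, not the author's method. It matches each /* ... */ block non-greedily and deletes only the blocks that contain a marker line. The names MARKER, BLOCK_COMMENT, and strip_license_blocks are hypothetical, and the sketch assumes that "/*" never appears inside a string literal.

```python
import re

# Assumed marker text; any line that identifies the unwanted comment would do.
MARKER = "Copyright 2002 Sun Microsystems, Inc. All rights reserved."

# Non-greedy match of one C-style block comment; DOTALL lets "." span newlines.
BLOCK_COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)


def strip_license_blocks(text, marker=MARKER):
    """Return text with every /* ... */ block that contains marker removed."""
    def replace(match):
        comment = match.group(0)
        # Delete the block only when it carries the marker line.
        return "" if marker in comment else comment
    return BLOCK_COMMENT.sub(replace, text)


if __name__ == "__main__":
    sample = (
        "/*\n * Copyright 2002 Sun Microsystems, Inc. All rights reserved.\n */\n"
        "/* keep me */\n"
        "int main(void) { return 0; }\n"
    )
    print(strip_license_blocks(sample))
```

Because the decision is made per matched block, any file description placed before or after the marker inside the same comment is removed with it. A run of "//" lines would need a separate pattern and is not covered here.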

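2. Approach
My idea: to delete the comments containing those particular lines from all of the project's source files, the script needs the following capabilities:

It must traverse all source files ('.c', '.cc', '.cpp', '.h', '.hh') and process only those file types.
It must find the comments that contain the particular lines and delete them.
It must cope with special cases such as symbolic links.
These points translate into the following processing steps:

Step 1: Take the name of a source directory or a source file as input.

Step 2: If it is a file name, check whether its type is '.c', '.cc', '.cpp', '.h', or '.hh'; if not, skip it.

Step 3: Check whether the file is a symbolic link; if so, skip it.

Step 4: Look for a matching comment in the file; delete it if present, otherwise leave the file alone.

Step 5: If it is a directory, process each file and subdirectory inside it, going back to Step 2.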
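Steps 1, 2, 3, and 5 can be sketched with os.walk as follows. This is an illustration under stated assumptions rather than the author's code: the names SOURCE_EXTS, is_candidate, and iter_source_files are hypothetical, and Step 4 (the actual deletion) is left to whatever function consumes the yielded paths.

```python
import os

# Extensions named in the text, each written with its leading dot.
SOURCE_EXTS = {".c", ".cc", ".cpp", ".h", ".hh"}


def is_candidate(path):
    """True for a regular, non-symlink file with one of the source extensions."""
    if os.path.islink(path) or not os.path.isfile(path):
        return False
    return os.path.splitext(path)[1] in SOURCE_EXTS


def iter_source_files(target):
    """Yield candidate files for a single file name or a whole directory tree."""
    if os.path.isdir(target):
        # os.walk does not follow symlinked directories by default.
        for dirpath, _dirnames, filenames in os.walk(target):
            for name in filenames:
                path = os.path.join(dirpath, name)
                if is_candidate(path):
                    yield path
    elif is_candidate(target):
        yield target
```

A caller would loop over iter_source_files(sys.argv[1]) and hand each path to the function that deletes the comment.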

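The plan is clear. The key question is how to find out whether a file contains matching content and then delete it. And for someone who has never used a scripting language such as Python, writing the code is a problem in itself.

How do we decide whether a comment is one that contains the particular lines? My approach is as follows (my regular-expression skills are weak, so this is the method I fell back on):

On reaching /* or //, record the current line number in the file as startLine.
Search line by line for the particular lines, such as "Copyright 2002 Sun Microsystems, Inc. All rights reserved.".
Continue until */ is reached or, for // comments, until the comment ends. If the particular lines were found, record the line number where the comment ends as endLine.
Finally, delete everything from startLine to endLine.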
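The line-based search described above could be sketched like this, covering both /* ... */ blocks and runs of // lines. It is a simplified illustration, not the author's implementation: find_comment_span and delete_span are hypothetical names, and the sketch assumes a comment always starts at the beginning of a line (after optional whitespace) and that lines is a list of strings.

```python
def find_comment_span(lines, marker):
    """Return (start_line, end_line), 1-based and inclusive, of the first
    comment containing marker, or None when no comment matches."""
    start = None        # line where the current comment began
    in_block = False    # inside an unterminated /* ... */
    matched = False     # marker seen inside the current comment

    for lineno, line in enumerate(lines, 1):
        stripped = line.strip()
        if in_block:
            matched = matched or marker in line
            if "*/" in line:
                if matched:
                    return start, lineno
                in_block, start = False, None
        elif stripped.startswith("/*"):
            start, matched = lineno, marker in line
            if "*/" in line[line.index("/*") + 2:]:
                # Block comment opened and closed on the same line.
                if matched:
                    return start, lineno
                start = None
            else:
                in_block = True
        elif stripped.startswith("//"):
            if start is None:
                start, matched = lineno, False
            matched = matched or marker in line
        else:
            # A non-comment line ends any run of // lines.
            if start is not None and matched:
                return start, lineno - 1
            start, matched = None, False

    # A run of // lines may extend to the last line of the file.
    if start is not None and matched and not in_block:
        return start, len(lines)
    return None


def delete_span(lines, span):
    """Return a copy of lines without the inclusive 1-based span."""
    start, end = span
    return lines[:start - 1] + lines[end:]
```

Returning None when nothing matches lets the caller leave the file untouched instead of deleting a default range.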
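3. Implementation
Enough talk. The code below implements the example above directly; if you are not familiar with Python, please consult the relevant references.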
#!/usr/bin/env python3
# Filename: comment.py

import os
import sys
import fileinput

#-------------------------------------------------------------
def usage():
    print('''
    help: comment.py <filename | dirname>

    [dirname]: Option, select a directory to operate
    [filename]: Option, select a file to operate

    Example: python comment.py /home/saylor/test
    ''')

#--------------------------------------------------------------
def commentFile(src, fileList):
    '''
    description: comment files
    param src: Operate file name
    '''
    # Does the file exist?
    if not os.path.exists(src):
        print("Error: file - %s doesn't exist." % src)
        return False
    if os.path.islink(src):
        print('Error: file - %s is just a link, will not handle it.' % src)
        return False
    # Only these types are handled here; add '.cc', '.cpp', '.hh' as needed.
    filetype = os.path.splitext(src)[1]
    if filetype not in ['.c', '.h']:
        return False
    try:
        if not os.access(src, os.W_OK):
            os.chmod(src, 0o664)
    except OSError:
        print("Error: you can not change %s's mode." % src)

    beginLine = 0
    endLine = 100000000
    isMatched = False

    #-----find the beginLine and endLine -------------------
    for eachline in fileinput.input(src):
        if eachline.find('/*') >= 0:
            beginLine = fileinput.lineno()
        if eachline.find('Copyright 2002 Sun Microsystems, Inc. All rights reserved.') >= 0:
            isMatched = True
        if eachline.find('*/') >= 0 and isMatched:
            endLine = fileinput.lineno()
            break
    # Release the fileinput state so a later call can open another file.
    fileinput.close()

    # Nothing to delete unless a complete /* ... */ comment held the marker.
    if not isMatched or endLine == 100000000:
        return False

    #-----delete the content between beginLine and endLine-----
    print(beginLine, endLine)
    outputfilename = src + '.tmp'
    lineNo = 1
    with open(src, 'r') as inputf, open(outputfilename, 'w') as outputf:
        for eachline in inputf:
            # Copy only the lines outside the matched comment.
            if lineNo < beginLine or lineNo > endLine:
                outputf.write(eachline)
            lineNo += 1
    # ... (the rest of the listing is missing from the source)
