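1. A file contains words separated by spaces, semicolons, commas, or periods. Extract all the words.
Solutions:
Match and extract the words with \w, but this misjudges some cases (it splits i'm at the apostrophe, for example)
Split the string with str.split, but this takes several passes, one per separator
Split the string with re.split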
In [4]: help(re.split)
Help on function split in module re:

split(pattern, string, maxsplit=0, flags=0)
    Split the source string by the occurrences of the pattern,
    returning a list containing the resulting substrings.
In [23]: text = "i'm xj, i love Python,,Linux; i don't like windows."

In [24]: fs = re.split(r"(,|\.|;|\s)+\s*", text)

In [25]: fs
Out[25]:
["i'm", ' ', 'xj', ' ', 'i', ' ', 'love', ' ', 'Python', ',', 'Linux', ' ',
 'i', ' ', "don't", ' ', 'like', ' ', 'windows', '.', '']

In [26]: fs[::2]     # extract the words
Out[26]: ["i'm", 'xj', 'i', 'love', 'Python', 'Linux', 'i', "don't", 'like', 'windows', '']

In [27]: fs[1::2]    # extract the separators
Out[27]: [' ', ' ', ' ', ' ', ',', ' ', ' ', ' ', ' ', '.']

Because the pattern contains a capturing group, re.split also returns the last separator matched at each split, so words and separators alternate in the result. The trailing '' is what remains after the final period. re.findall extracts either set directly:

In [53]: fs = re.findall(r"[^,\.;\s]+", text)

In [54]: fs
Out[54]: ["i'm", 'xj', 'i', 'love', 'Python', 'Linux', 'i', "don't", 'like', 'windows']

In [55]: fh = re.findall(r'[,\.;\s]', text)

In [56]: fh
Out[56]: [' ', ',', ' ', ' ', ' ', ',', ',', ';', ' ', ' ', ' ', ' ', '.']
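The three options listed for this problem can be compared side by side. The following is a minimal sketch, assuming Python 3 (the sessions above are Python 2); the variable names are illustrative and not from the original.

```python
import re

text = "i'm xj, i love Python,,Linux; i don't like windows."

# Option 1: \w+ treats the apostrophe as a separator, so contractions break.
by_word_chars = re.findall(r"\w+", text)

# Option 2: str.split handles one separator at a time, so replace the
# punctuation with spaces first, then split on whitespace.
normalised = text
for sep in ",;.":
    normalised = normalised.replace(sep, " ")
by_str_split = normalised.split()

# Option 3: re.split with a character class. There is no capturing group,
# so the separators are not returned; the filter drops the empty string
# left after the final period.
by_re_split = [word for word in re.split(r"[,.;\s]+", text) if word]

print(by_word_chars)
print(by_str_split)
print(by_re_split)
```

Options 2 and 3 give the same ten words here; option 1 yields fragments such as i, m, don, and t.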
2. A directory holds a number of files. Find all the C source files in it (.c and .h).
Solutions:
List the directory with os.listdir
Test each name with str.endswith
In [13]: s = "xj.c"

In [14]: s.endswith(".c")
Out[14]: True

In [15]: s.endswith(".h")
Out[15]: False

In [16]: import os

In [17]: os.listdir("/usr/include/")
Out[17]:
['libmng.h',
 'netipx',
 'ft2build.h',
 'FlexLexer.h',
 'selinux',
 'QtSql',
 'resolv.h',
 'gio-unix-2.0',
 'wctype.h',
 'python2.6',
 'scsi',
 ...
 'QtOpenGL',
 'mysql',
 'byteswap.h',
 'xj.c',
 'mntent.h',
 'semaphore.h',
 'stdio_ext.h',
 'libxml2']

In [21]: for filename in os.listdir("/usr/include"):
   ....:     if filename.endswith(".c"):
   ....:         print filename
   ....:
xj.c

In [22]: for filename in os.listdir("/usr/include"):
   ....:     if filename.endswith((".c", ".h")):    # a tuple means "any of these suffixes"
   ....:         print filename
   ....:
libmng.h
ft2build.h
FlexLexer.h
nss.h
png.h
utime.h
ieee754.h
features.h
xj.c
...
verto-module.h
semaphore.h
stdio_ext.h
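The listdir plus endswith approach can be wrapped in a small function. This is a sketch only, assuming Python 3; the helper name c_sources, the isfile check (which skips directories whose names happen to end in .c or .h), and the demo file names are additions for illustration, not part of the original.

```python
import os
import tempfile


def c_sources(directory):
    """Return the sorted names in `directory` that end in .c or .h.

    Hypothetical helper: not recursive, and it ignores sub-directories.
    """
    return sorted(
        name
        for name in os.listdir(directory)
        if name.endswith((".c", ".h"))  # tuple: either suffix matches
        and os.path.isfile(os.path.join(directory, name))
    )


# Demo on a throwaway directory with made-up file names.
with tempfile.TemporaryDirectory() as demo_dir:
    for name in ("main.c", "util.h", "notes.txt", "README"):
        open(os.path.join(demo_dir, name), "w").close()
    os.mkdir(os.path.join(demo_dir, "include.h"))  # a directory, skipped
    found = c_sources(demo_dir)

print(found)
```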
3. The fnmatch module
Supports the same wildcards as the shell.
In [24]: help(fnmatch.fnmatch)    # case sensitivity follows the operating system
Help on function fnmatch in module fnmatch:

fnmatch(name, pat)
    Test whether FILENAME matches PATTERN.

    Patterns are Unix shell style:

    *       matches everything
    ?       matches any single character
    [seq]   matches any character in seq
    [!seq]  matches any char not in seq

    An initial period in FILENAME is not special.
    Both FILENAME and PATTERN are first case-normalized
    if the operating system requires it.
    If you don't want this, use fnmatchcase(FILENAME, PATTERN).

In [47]: fnmatch.fnmatch("sba.txt", "*txt")
Out[47]: True

In [48]: fnmatch.fnmatch("sba.txt", "*t")
Out[48]: True

In [49]: fnmatch.fnmatch("sba.txt", "*b")
Out[49]: False

In [50]: fnmatch.fnmatch("sba.txt", "*b*")
Out[50]: True
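The help text mentions fnmatchcase and the [seq] wildcard; the module also provides fnmatch.filter for matching a whole list at once. A short sketch of these, assuming Python 3 and a made-up list of file names:

```python
import fnmatch

names = ["main.c", "util.h", "README.TXT", "notes.txt", "a.c.bak"]

# fnmatch.filter applies one pattern to a whole list of names.
c_files = fnmatch.filter(names, "*.c")

# fnmatchcase never normalises case, so it behaves the same on every OS.
upper_txt = [n for n in names if fnmatch.fnmatchcase(n, "*.TXT")]

# [seq] matches exactly one character from the set: here "c" or "h".
c_or_h = [n for n in names if fnmatch.fnmatchcase(n, "*.[ch]")]

print(c_files)
print(upper_txt)
print(c_or_h)
```

Note that "*.c" does not match a.c.bak, because the pattern must match the whole name, not just part of it.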
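Example: you have a program that processes files. The file name comes from user input, and you need to support the same wildcards as the shell.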
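[root@Node3 src]# cat test1.py
#!/usr/local/bin/python2.7
#coding: utf-8
import os
import sys
from fnmatch import fnmatch

# argv[1] is the directory to list, argv[2] is the wildcard pattern
ret = [name for name in os.listdir(sys.argv[1]) if fnmatch(name, sys.argv[2])]
print ret
[root@Node3 src]# python2.7 test1.py /usr/include/ *.c
['xj.c']

The unquoted *.c reaches the script only when no file in the current directory matches it; otherwise the shell expands the pattern first. Quoting it as '*.c' avoids this.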
4. re.sub(): text substitution
In [53]: help(re.sub)
Help on function sub in module re:

sub(pattern, repl, string, count=0, flags=0)
    Return the string obtained by replacing the leftmost
    non-overlapping occurrences of the pattern in string by the
    replacement repl.  repl can be either a string or a callable;
    if a string, backslash escapes in it are processed.  If it is
    a callable, it's passed the match object and must return
    a replacement string to be used.
Example: a text contains dates in %m/%d/%Y format, and you need to convert all of them to %Y-%m-%d.
In [55]: text = "Today is 11/08/2016, next class time 11/15/2016"

In [56]: new_text = re.sub(r'(\d+)/(\d+)/(\d+)', r'\3-\1-\2', text)    # groups: 1 = month, 2 = day, 3 = year

In [57]: new_text
Out[57]: 'Today is 2016-11-08, next class time 2016-11-15'
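The help text for re.sub notes that repl may also be a callable. One way to use that is to let datetime parse each match, so that strings which merely look like dates are left untouched. This is a sketch under assumptions: Python 3, a stricter pattern than the one used above, and illustrative names DATE_RE and to_iso that are not from the original.

```python
import re
from datetime import datetime

# One or two digits for month and day, four digits for the year.
DATE_RE = re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b")


def to_iso(match):
    """Rewrite one %m/%d/%Y match as %Y-%m-%d; leave invalid dates as-is."""
    try:
        parsed = datetime.strptime(match.group(0), "%m/%d/%Y")
    except ValueError:
        return match.group(0)
    return parsed.strftime("%Y-%m-%d")


text = "Today is 11/08/2016, next class time 11/15/2016"
new_text = DATE_RE.sub(to_iso, text)
unchanged = DATE_RE.sub(to_iso, "bad date 13/40/2016")

print(new_text)
print(unchanged)
```

The callable costs a few more lines than a backreference template, but it rejects impossible values such as month 13 instead of silently rearranging them.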
5. str.format: string formatting
In [71]: help(str.format)
Help on method_descriptor:

format(...)
    S.format(*args, **kwargs) -> string

    Return a formatted version of S, using substitutions from args and kwargs.
    The substitutions are identified by braces ('{' and '}').
Example: you need to build a small template engine. It needs no control logic, but it must fill the template with variables.
In [18]: text = "{name} has {n} messages"

In [19]: new_text = text.format(name = "xj", n = 17)

In [20]: text
Out[20]: '{name} has {n} messages'

In [22]: new_text
Out[22]: 'xj has 17 messages'

In [29]: text = "%s has %s messages"

In [30]: print text % ("xj", 17)
xj has 17 messages
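A template engine of this kind can be little more than a thin wrapper around str.format. The sketch below assumes Python 3; the render helper and the second template are hypothetical and only illustrate format specs and the error raised for a missing variable.

```python
def render(template, **variables):
    """Hypothetical helper: fill `template` with the given variables."""
    return template.format(**variables)


template = "{name} has {n} messages"
message = render(template, name="xj", n=17)

# A format spec after ":" controls presentation: right-align in 6 columns,
# zero-pad to 5 digits, and show a ratio as a percentage with one decimal.
report = render("{name:>6}|{n:05d}|{ratio:.1%}", name="xj", n=17, ratio=0.256)

# A variable missing from the call raises KeyError naming that variable.
try:
    render(template, name="xj")
    missing = None
except KeyError as exc:
    missing = exc.args[0]

print(message)
print(report)
print(missing)
```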
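6. StringIO: a pseudo-file object that makes a string behave like a file object
Example: a function worker is designed to process file objects. You have some strings and want worker to process them.
Solutions:
Write the string to a file, then process the file with worker    # involves disk I/O, inefficient
Use the StringIO module    # makes the string behave like a file object (a pseudo-file object)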
In [3]: import json

In [4]: from StringIO import StringIO

In [6]: StringIO.    # provides the usual file attributes and methods
StringIO.close       StringIO.read        StringIO.truncate
StringIO.flush       StringIO.readline    StringIO.write
StringIO.getvalue    StringIO.readlines   StringIO.writelines
StringIO.isatty      StringIO.seek
StringIO.next        StringIO.tell

In [6]: type(StringIO)
Out[6]: classobj

In [7]: data = {'a':1, 'b':[2, 3, 4]}

In [8]: io = StringIO()

In [9]: json.d
json.decoder  json.dump     json.dumps

In [9]: io.
io.buf         io.getvalue    io.read        io.tell
io.buflist     io.isatty      io.readline    io.truncate
io.close       io.len         io.readlines   io.write
io.closed      io.next        io.seek        io.writelines
io.flush       io.pos         io.softspace

In [9]: type(io)
Out[9]: instance

In [10]: json.dump(data, io)

In [11]: print io.getvalue()
{"a": 1, "b": [2, 3, 4]}
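The worker scenario itself can be sketched as follows. The session above uses Python 2's StringIO module; in Python 3 the class lives in the io module as io.StringIO, which is what this sketch assumes. The worker function here is a made-up stand-in that counts non-empty lines, since the original does not define one.

```python
import io
import json


def worker(fileobj):
    """Hypothetical worker: count the non-empty lines of a text file object."""
    return sum(1 for line in fileobj if line.strip())


# Reading: wrap an existing string so worker can iterate over it like a file.
text = "first line\n\nsecond line\n"
count = worker(io.StringIO(text))

# Writing: json.dump only needs an object with a write() method.
buffer = io.StringIO()
json.dump({"a": 1, "b": [2, 3, 4]}, buffer)
dumped = buffer.getvalue()

print(count)
print(dumped)
```

Nothing touches the disk in either direction, which is the point of the second solution listed above.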
Reposted from: https://blog.51cto.com/xiexiaojun/1870832