Python vs. awk: differences in text processing

2015-10-13 expert1
Category: Python/Ruby

The question comes from a BBS thread. The BBS has problems: several of my replies there were deleted.

#!/usr/bin/env python
# Count active/down hosts per host range, based on `qbhosts` output.
import subprocess

# Host ranges per name prefix. A pair of ints is one contiguous range; a pair
# of "a-b" strings is two sub-ranges reported together (the gap is skipped).
spec = {
    'node': ((53, 102), (103, 122), (223, 376)),
    'renderG': (('1-166', '181-202'), ('167-180', '203-204')),
}

cmd = "qbhosts | awk '/node/||/renderG/'"
# universal_newlines=True makes the output text (str) on Python 2 and 3.
output = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE,
                          universal_newlines=True).communicate()[0]

# Map fully qualified host name -> state (4th column, "active" or "down").
d = {}
for line in output.splitlines():
    items = line.split()
    if len(items) < 4:
        continue  # skip blank or short lines
    name = items[0]
    if not name.endswith("xxx.com"):
        name += ".xxx.com"
    d[name] = items[3]

for k, v in spec.items():
    for i in v:
        # m1, m2 are the inner bounds; they stay None for a contiguous range.
        m1 = m2 = None
        if isinstance(i[0], str):
            begin, m1 = [int(x) for x in i[0].split("-")]
            m2, end = [int(x) for x in i[1].split("-")]
            numbers = [m for m in range(begin, end + 1) if not m1 < m < m2]
        else:
            begin, end = i
            numbers = range(begin, end + 1)
        host = ["%s-%03d.xxx.com" % (k, m) for m in numbers]

        # States of the hosts in this range that qbhosts actually listed.
        states = [d[h] for h in host if h in d]
        act = states.count("active")
        down = states.count("down")

        if m1 is not None:
            print("\n%s[%03d-%03d,%03d-%03d].xxx.com have %d active node(s), %d down node(s)\n" % (k, begin, m1, m2, end, act, down))
        else:
            print("\n%s[%03d-%03d].xxx.com have %d active node(s), %d down node(s)\n" % (k, begin, end, act, down))
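The range-expansion step of the script above can be pulled out into a small function, which makes it easy to check without running qbhosts. This is only a sketch and not part of the original script: the name `expand_hosts` is hypothetical, and it assumes the same two spec shapes (a pair of ints, or a pair of "a-b" strings) and the same xxx.com placeholder domain.

```python
def expand_hosts(prefix, entry, domain="xxx.com"):
    """Expand one spec entry into a list of host names.

    entry is either (begin, end) as ints, or ("b-m1", "m2-e") as strings,
    meaning b..m1 plus m2..e (numbers strictly between m1 and m2 are skipped).
    """
    first, second = entry
    if isinstance(first, str):
        begin, m1 = (int(x) for x in first.split("-"))
        m2, end = (int(x) for x in second.split("-"))
        numbers = [m for m in range(begin, end + 1) if not m1 < m < m2]
    else:
        numbers = range(first, second + 1)
    return ["%s-%03d.%s" % (prefix, m, domain) for m in numbers]


print(expand_hosts("node", (53, 55)))
print(expand_hosts("renderG", ("1-2", "5-6")))
```

The first call lists node-053 to node-055; the second lists renderG-001, 002, 005 and 006, skipping the gap between the two sub-ranges.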
By comparison, the awk version is more concise, though it has a few problems:
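awk 'NR==FNR{
    # First input: one range per line, "name begin end [excl_begin excl_end]".
    # a[name" "number] = begin of the range that owns that host number
    # b[name" "begin]  = end of that range, plus an exclusion note if any
    for(i=$2;i<=$3;i++) a[$1" "i]=$2
    b[$1" "$2]=$3
    if(NF>3){
        b[$1" "$2]=$3" Exclude ["$4"-"$5"]"
        for(i=$4;i<=$5;i++) delete a[$1" "i]
    }
}NR>FNR{
    # Second input: host list; $1 is the host name, $4 its state.
    split($1,c,"[-.]")   # node-053.xxx.com -> c[1]="node", c[2]="053"
    n=c[1]" "(c[2]+0)    # c[2]+0 drops the zero padding
    if(n in a) d[c[1]" "a[n]][$4]++   # e.g. d["node 53"]["active"]++
}END{
    for(i in d) for(j in d[i]) print i"-"b[i],j,d[i][j]
}'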

Works with gawk 4.0+ (it relies on arrays of arrays such as d[i][j]).
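To make the awk logic easier to follow, here is a rough Python transliteration of its two passes: first build a lookup table from host number to range, then count states per range. It is a sketch under assumptions, not the author's code: the function name `count_states` is hypothetical, the spec line format "name begin end [excl_begin excl_end]" is inferred from the fields the awk script reads, and the sample host lines are made up (only column 1, the host name, and column 4, the state, matter).

```python
import re
from collections import defaultdict


def count_states(spec_lines, host_lines):
    """Two-pass count of host states per range, mirroring the awk script."""
    owner = {}  # (name, number) -> begin of the owning range   (awk: a)
    label = {}  # (name, begin)  -> text printed after "begin-" (awk: b)

    # Pass 1: spec lines of the form "name begin end [excl_begin excl_end]".
    for line in spec_lines:
        f = line.split()
        if len(f) < 3:
            continue
        name, begin, end = f[0], int(f[1]), int(f[2])
        for n in range(begin, end + 1):
            owner[(name, n)] = begin
        label[(name, begin)] = str(end)
        if len(f) >= 5:
            lo, hi = int(f[3]), int(f[4])
            label[(name, begin)] = "%d Exclude [%d-%d]" % (end, lo, hi)
            for n in range(lo, hi + 1):
                owner.pop((name, n), None)

    # Pass 2: host lines; column 1 is the host name, column 4 its state.
    counts = defaultdict(lambda: defaultdict(int))  # (awk: d)
    for line in host_lines:
        f = line.split()
        if len(f) < 4:
            continue
        parts = re.split(r"[-.]", f[0])  # "node-053.xxx.com" -> ["node", "053", ...]
        if len(parts) < 2 or not parts[1].isdigit():
            continue
        key = (parts[0], int(parts[1]))
        if key in owner:
            begin = owner[key]
            group = "%s %d-%s" % (parts[0], begin, label[(parts[0], begin)])
            counts[group][f[3]] += 1
    return {group: dict(states) for group, states in counts.items()}


# Made-up sample input; the middle columns are placeholders.
spec = ["node 53 102", "renderG 1 202 167 180"]
hosts = [
    "node-053.xxx.com - - active",
    "node-054 - - active",
    "node-060.xxx.com - - down",
    "renderG-170.xxx.com - - active",  # inside the excluded gap, ignored
    "renderG-181.xxx.com - - down",
]
for group, states in sorted(count_states(spec, hosts).items()):
    print(group, states)
```

The lookup table avoids building a host-name list per range: each host line is resolved to its range with one dictionary lookup, which is the main reason the awk version is shorter.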

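Aside: when processing a file to extract a fixed range of lines, there are three read methods to choose from: read, readline and readlines (on the write side there are only write and writelines). read(n) reads at most n bytes per call (n characters for text-mode files in Python 3), and with no argument it reads the whole file. readline and readlines are similar, but readline returns one line per call as a string, so it needs something like a while True loop to keep going, whereas readlines reads everything in one go and returns a list.
Both keep the trailing \n on each line. To go back and read an earlier part of the file again, you need seek.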

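For example:
with open("sample", "r") as f:
        # Skip the first 2 lines and the last 4 lines.
        for line in f.readlines()[2:-4]:
                s = line.split()
                print("INSERT INTO tab VALUES (%s )" % ",".join("'%s'" % x for x in s))
# No f.close() is needed: the with block closes the file.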
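The differences between read, readline, readlines and seek described in the aside can be checked on a small throwaway file. The sketch below assumes Python 3 and text mode; the file contents are made up for the demonstration.

```python
import os
import tempfile

# Create a small temporary file to read from.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("one\ntwo\nthree\n")
    path = tmp.name

with open(path) as f:
    chunk = f.read(2)            # at most 2 characters
    rest_of_line = f.readline()  # continues from the current position, keeps "\n"
    remaining = f.readlines()    # list of all lines that are left
    f.seek(0)                    # jump back to the start of the file
    again = f.readline()         # first line once more

os.remove(path)

print(repr(chunk), repr(rest_of_line), remaining, repr(again))
```

All three methods share one file position, so each call picks up where the previous one stopped, and only seek moves that position backwards.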
