Scraping the Baidu Music hot-song chart (top 500) with Python - joepayne - ChinaUnix blog

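#!/usr/bin/python
# coding=utf-8
# filename: get_html.py
# (Python 2 script; the coding line must be on line 1 or 2 to take effect.)
import urllib2
import re

# Unused earlier draft of the pattern:
#item = { 'songItem': { 'sid': '(?=\d)', 'sname': '(?!\d)', 'author': '?!\d' } }
# Outer pattern: isolates the body of each songItem embedded in the page.
item = "{ 'songItem': { (.*) } }"
# Inner pattern: captures the song id, song name and artist from one songItem.
item2 = "'sid': '(.*)', 'sname': '(.*)', 'author': '(.*)'"
myfile = open("song.txt", 'w')
response = urllib2.urlopen('')  # fill in the chart page URL
html = response.read()
html = re.findall(item, html)
i = 1  # rank counter written at the start of each line
for rec in html:
    r = re.findall(item2, rec)
    print >> myfile, i, r[0][0], r[0][1], r[0][2]
    i = i + 1
myfile.close()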

The generated result file is shown in the figure below.
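
To make the two-stage matching and the shape of each output line concrete, here is a minimal sketch in Python 3 (where urllib2 no longer exists), run against a made-up snippet instead of the live page. The sample markup, the song values, and the names SAMPLE, extract_songs and format_lines are all assumptions for illustration; the real page may be laid out differently.

```python
import re

# Equivalent to the script's two patterns, with the literal braces escaped.
ITEM = r"\{ 'songItem': \{ (.*) \} \}"
FIELDS = r"'sid': '(.*)', 'sname': '(.*)', 'author': '(.*)'"

# Hypothetical page fragment: one songItem per line (an assumption).
SAMPLE = """
<li data-songitem="{ 'songItem': { 'sid': '1001', 'sname': 'Song A', 'author': 'Artist X' } }">
<li data-songitem="{ 'songItem': { 'sid': '1002', 'sname': 'Song B', 'author': 'Artist Y' } }">
"""


def extract_songs(html):
    """Return a list of (sid, sname, author) tuples found in html."""
    songs = []
    for rec in re.findall(ITEM, html):
        fields = re.findall(FIELDS, rec)
        if fields:  # skip items that lack the three expected fields
            songs.append(fields[0])
    return songs


def format_lines(songs):
    """Build 'rank sid sname author' lines, like the script's print statement."""
    return [
        "%d %s %s %s" % (rank, sid, name, author)
        for rank, (sid, name, author) in enumerate(songs, 1)
    ]


if __name__ == "__main__":
    for line in format_lines(extract_songs(SAMPLE)):
        print(line)
```

Because `(.*)` is greedy and `.` does not cross newlines, this only works when each songItem sits on its own line. If several items shared one line, the outer pattern would swallow them as a single match, and the non-greedy `(.*?)` would be needed.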

Two other good articles:
1. A fairly thorough explanation of encode and decode for Python strings.
2. http://www.cnblogs.com/huxi/archive/2010/07/04/1771073.html is a very good explanation of Python regular expressions.
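
As a small illustration of the encode/decode topic in the first item, the sketch below round-trips a piece of Chinese text through two codecs. It is written for Python 3, where str is Unicode text and bytes is raw data, whereas the script above is Python 2. The helper name to_bytes_and_back and the sample text are assumptions made for this example.

```python
def to_bytes_and_back(text, codec):
    """Encode text to bytes with codec, then decode it back to text."""
    data = text.encode(codec)       # str -> bytes
    return data, data.decode(codec)  # bytes -> str


if __name__ == "__main__":
    title = "热歌榜"  # sample non-ASCII text
    for codec in ("utf-8", "gbk"):
        data, restored = to_bytes_and_back(title, codec)
        # The same three characters take a different number of bytes per codec.
        print(codec, len(title), len(data), restored == title)
```

The point to take away is that the byte length depends on the codec, so bytes read from a page have to be decoded with the codec the page was actually served in before the text is matched or written out.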