Fetching web page content with HttpClient

2009-10-20 hkebao
Category: Java

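1. To download the content of a remote URL you can use HttpClient. The relevant code is collected below.
It also solves the problem of garbled Chinese text.
Method 1: decode the response stream with an explicit charset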
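import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UnsupportedEncodingException;

// Read the whole stream as text, decoding the bytes as GBK so Chinese characters are not garbled
public String convertStreamToString(InputStream is) throws UnsupportedEncodingException {
        BufferedReader reader = new BufferedReader(new InputStreamReader(is, "gbk"));
        StringBuilder sb = new StringBuilder();
        String line = null;
        try {
            while ((line = reader.readLine()) != null) {
                sb.append(line).append('\n');
            }
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            try {
                reader.close(); // closing the reader also closes the underlying stream
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
        return sb.toString();
}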
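
The method above hardcodes "gbk", which only suits pages that are actually served in GBK. As a minimal sketch, not part of the original code, the charset name could instead be passed in (for example the value returned by get.getResponseCharSet()), with GBK kept as the fallback. The class name StreamDecoder and the two-argument signature are hypothetical, chosen only for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;

class StreamDecoder {
    // Hypothetical helper: decode the stream with the given charset name,
    // falling back to GBK when the name is null or not supported by the JVM.
    public static String convertStreamToString(InputStream is, String charsetName)
            throws IOException {
        Charset charset;
        try {
            charset = Charset.forName(charsetName);
        } catch (IllegalArgumentException e) {
            // Covers a null name, an illegal name and an unsupported charset
            charset = Charset.forName("GBK");
        }
        StringBuilder sb = new StringBuilder();
        BufferedReader reader = new BufferedReader(new InputStreamReader(is, charset));
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                // readLine() strips the line terminator, so add one back
                sb.append(line).append('\n');
            }
        } finally {
            reader.close(); // also closes the underlying stream
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = "\u4e2d\u6587".getBytes("UTF-8");
        System.out.print(convertStreamToString(new java.io.ByteArrayInputStream(bytes), "UTF-8"));
    }
}
```

Unlike the original method, this variant lets IOException propagate to the caller rather than printing the stack trace and returning a partial result.
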
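// Imports for the download method (Apache Commons HttpClient 3.x)
import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.HttpException;
import org.apache.commons.httpclient.methods.GetMethod;

// Download the content
private String urlContent(String urlString) throws HttpException, IOException {
        HttpClient client = new HttpClient();
        GetMethod get = new GetMethod(urlString); // request the URL that was passed in
        try {
            client.executeMethod(get);
            // Charset reported for the response, printed for debugging
            System.out.println(get.getResponseCharSet());
            InputStream iStream = get.getResponseBodyAsStream();
            return convertStreamToString(iStream);
        } finally {
            get.releaseConnection(); // release the connection even if an exception is thrown
        }
    }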

With this, the content of a web page can be downloaded using the GET method.
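
The code prints get.getResponseCharSet(), which in Commons HttpClient 3.x reports the charset declared in the response's Content-Type header and falls back to ISO-8859-1 when none is declared. That default is a common cause of garbled Chinese text. To illustrate what such a lookup amounts to, here is a simplified sketch. It is not the library's implementation, and the class name ContentTypeCharset and the method charsetOf are hypothetical.

```java
import java.util.Locale;

class ContentTypeCharset {
    // Hypothetical helper: extract the charset parameter from a Content-Type
    // header value such as "text/html; charset=gbk". Returns defaultCharset
    // when the header is null or carries no charset parameter.
    public static String charsetOf(String contentType, String defaultCharset) {
        if (contentType == null) {
            return defaultCharset;
        }
        String[] parts = contentType.split(";");
        // parts[0] is the media type; the parameters follow it
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            if (param.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String value = param.substring("charset=".length()).trim();
                // Strip optional surrounding double quotes
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                if (!value.isEmpty()) {
                    return value;
                }
            }
        }
        return defaultCharset;
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("text/html; charset=gbk", "ISO-8859-1"));
        System.out.println(charsetOf("text/html", "ISO-8859-1"));
    }
}
```

When a page declares no charset in the header, a fixed choice such as the "gbk" used in Method 1 is one way to avoid the ISO-8859-1 default, at the cost of being wrong for pages in other encodings.
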