Scraping tables with Python

Published: 2021-03-05 05:07:00

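㈠ How to scrape a static table with Python

The problems are:
1. The asker has not fully understood basic loops: it is unclear to them which statements belong inside the loop and which belong outside it.
2. When something went wrong, they did not add print statements to debug.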

# coding: utf-8

import csv
import time
import urllib.request

import bs4 as bs

def earse(strline, ch):
    # Remove every occurrence of ch from strline.
    return strline.replace(ch, '')

url = r""  # left blank in the original answer; set this to the page to scrape

resContent = urllib.request.urlopen(url).read()

soup = bs.BeautifulSoup(resContent, "html.parser")

# The third table on the page holds the data.
tab = soup.findAll('table')[2]
trs = tab.findAll('tr')

# newline='' keeps the csv module from writing blank lines on Windows.
with open('test.csv', 'a', newline='', encoding='utf-8') as csvfile:
    writer = csv.writer(csvfile)
    for trIter in trs:
        # Start a fresh list for every row, inside the outer loop.
        templist = []
        for item in trIter.findAll('td'):
            # get_text() also works for cells with nested tags,
            # where item.string would be None.
            text = item.get_text(strip=True)
            templist.append(text)
            print(text)  # debug output
        # Write once per row, after the inner loop; skip rows without <td> cells.
        if templist[0:4]:
            writer.writerow([time.asctime()] + templist[0:4])
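
To see the loop-placement point on something small, here is a sketch that needs no network access. The rows are made-up stand-ins for text already pulled out of <td> cells, and the two function names are hypothetical.

```python
import csv
import io

# Made-up rows standing in for parsed table cells.
rows = [["a", "1"], ["b", "2"], ["c", "3"]]

def write_inside_loop(rows):
    """Correct: one writerow call per row, inside the loop."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for row in rows:
        writer.writerow(row)
    return buf.getvalue()

def write_outside_loop(rows):
    """Buggy: writerow sits after the loop, so only the last row is written."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for row in rows:
        pass
    writer.writerow(row)
    return buf.getvalue()

good = write_inside_loop(rows)
bad = write_outside_loop(rows)
print(repr(good))
print(repr(bad))
```

Printing both results side by side is the kind of print debugging the answer recommends: the buggy version visibly contains a single line.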

㈡ Scraping a table with Python

Hello:

I suggest storing the data in an mdb file;
this is easy to do with Python.
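
As a rough sketch of storing scraped rows in a database: an mdb (Microsoft Access) file needs an ODBC driver, so this example swaps in the standard-library sqlite3 module instead. The table name, column names, and sample rows are all made up for illustration.

```python
import sqlite3

def save_rows(conn, rows):
    """Create the table if needed and insert (name, value) pairs."""
    conn.execute("CREATE TABLE IF NOT EXISTS scraped (name TEXT, value TEXT)")
    conn.executemany("INSERT INTO scraped (name, value) VALUES (?, ?)", rows)
    conn.commit()

def load_rows(conn):
    """Return all stored rows in insertion order."""
    cursor = conn.execute("SELECT name, value FROM scraped ORDER BY rowid")
    return cursor.fetchall()

# Hypothetical rows standing in for cells scraped from a table.
rows = [("alpha", "1"), ("beta", "2")]

conn = sqlite3.connect(":memory:")  # use a file path to keep the data on disk
save_rows(conn, rows)
stored = load_rows(conn)
print(stored)
```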

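㈢ How to scrape the second and later pages of a table on a web page with urllib in Python

I don't have ready-made code at hand, so I won't paste any; here is the approach:

Although the address shown in the URL looks the same for every page, the requests behind it are actually different; press F12 and analyze what the page sends in the background.

Then there are two ways to go:

Use F12 to find the real address of each page and scrape that.
Use Python to simulate clicking "next page".

Neither takes much code, and examples are easy to find with a search engine.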
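
As a rough sketch of the first approach: if F12 shows that "next page" simply requests the same address with a changing query parameter, you can build those addresses yourself. The parameter name `page`, the example.com address, and both function names are assumptions for illustration; the real parameter has to come from the requests you see in the browser's developer tools.

```python
import urllib.request
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def build_page_url(base_url, page, param="page"):
    """Return base_url with the paging parameter set to the given page."""
    parts = urlsplit(base_url)
    query = dict(parse_qsl(parts.query))
    query[param] = str(page)
    return urlunsplit(parts._replace(query=urlencode(query)))

def fetch_pages(base_url, first, last, param="page"):
    """Yield (page_number, raw_bytes) for each page from first to last."""
    for page in range(first, last + 1):
        url = build_page_url(base_url, page, param)
        with urllib.request.urlopen(url) as resp:
            yield page, resp.read()

# Only the URL building is exercised here, so nothing is downloaded.
urls = [build_page_url("http://example.com/list?type=a", n) for n in (2, 3)]
print(urls)
```

Each body returned by `fetch_pages` can then be parsed the same way as the first page. If the site sends the page number in a POST body instead, the same idea applies, but the request has to be built to match.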

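㈣ How to scrape table information from a web page with Python

It depends on whether the table you are scraping is static or dynamic. Here is code for a static table: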


import urllib.request

from bs4 import BeautifulSoup

def earse(strline, ch):
    # Remove every occurrence of ch from strline.
    return strline.replace(ch, '')

url = r"http://www.bjsta.com"

resContent = urllib.request.urlopen(url).read()

# The page is encoded as GB18030; decode it to text before parsing.
resContent = resContent.decode('gb18030')

soup = BeautifulSoup(resContent, 'html.parser')

print(soup('title')[0].string)

tab = soup.findAll('table')

# Use the last table on the page.
trs = tab[len(tab) - 1].findAll('tr')

for trIter in trs:
    tds = trIter.findAll('td')
    for tdIter in tds:
        span = tdIter('span')
        for i in range(len(span)):
            if span[i].string:
                # The character to erase appears as an empty string in this
                # listing; pass the character you want removed.
                print(earse(span[i].string, '').strip(), end=' ')
    # End the output line after each table row.
    print()

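㈤ How to extract data from an Excel spreadsheet with Python

Let's go through an example in which the Excel sheet holds content that mixes numbers and text.
Copy the data from the Excel sheet and paste it into Word.
Press Ctrl+F to open the "Find and Replace" dialog. Click the "Replace" tab and, under its advanced options, select "Use wildcards".
Then enter [0-9] in the "Find what" box and leave "Replace with" empty. Click the "Replace All" button.
After "Replace All", every digit in the data has been deleted, leaving only the non-numeric part.

So how do you go the other way and strip the non-numeric part of the Excel data? This is also done with "Use wildcards".
Copy the original data into a Word document, press Ctrl+F to open the "Find and Replace" dialog, click the "Replace" tab, and select "Use wildcards" under its advanced options, exactly as in the steps above.
Enter [!0-9] in the "Find what" box and leave "Replace with" empty. Click the "Replace All" button.

This deletes every non-digit character from the data, leaving only the numbers.

For data that contains both letters and digits, there is also a more convenient method that does not need the wildcard option (it leaves only the numeric part).
For example, take some prepared data that mixes letters and digits.

As before, copy the original data into a Word document.

Press Ctrl+F to open the "Find and Replace" dialog and click the "Replace" tab. The wildcard option is not needed here.
Enter ^$ in the "Find what" box (in Word, ^$ matches any letter) and leave "Replace with" empty. Click the "Replace All" button.

Only the numeric part is left.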
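
Since the question asks about Python, the same two replacements can be sketched with the standard re module. The sample cell values and the function names are made up; note that Word's [!0-9] is written [^0-9] in Python's regular-expression syntax.

```python
import re

def strip_digits(text):
    """Remove every ASCII digit, like replacing [0-9] with nothing in Word."""
    return re.sub(r"[0-9]", "", text)

def keep_digits(text):
    """Remove everything except ASCII digits, like replacing [!0-9] with nothing."""
    return re.sub(r"[^0-9]", "", text)

# Made-up cell values mixing text and numbers.
cells = ["apple12", "pear 305", "7kg"]

text_parts = [strip_digits(c) for c in cells]
number_parts = [keep_digits(c) for c in cells]
print(text_parts)
print(number_parts)
```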

㈥ How to get the contents of a table in a docx file with Python

from docx import Document  # provided by the python-docx package

doc = Document(r'd:zzz3.docx')
tb = doc.tables[0]  # first table in the document
for r in tb.rows:
    for c in r.cells:
        print(c.text)

㈦ How to read the contents of a specific cell in Excel with Python

1. First, open the program you use to write Python on your computer.
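
For the cell lookup itself, one option is the third-party openpyxl library. That choice is an assumption, since no library is named here. In this minimal sketch the file name demo.xlsx and its contents are made up, and the file is created first so the example is self-contained.

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook

# Create a small workbook so there is something to read.
path = os.path.join(tempfile.mkdtemp(), "demo.xlsx")
wb = Workbook()
ws = wb.active
ws["A1"] = "name"
ws["B2"] = 42
wb.save(path)

# Read one cell back, first by its A1-style reference, then by row and column.
sheet = load_workbook(path).active
by_ref = sheet["B2"].value
by_index = sheet.cell(row=2, column=2).value
print(by_ref, by_index)
```

Rows and columns passed to `cell()` are counted from 1, so row 2, column 2 is the same cell as B2.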

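㈧ How should the data inside a table be scraped with a Python crawler

Here is an example to look at; see the documentation for how to use the libraries.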
import csv
import urllib.request

from bs4 import BeautifulSoup

url = 'http://nflcombineresults.com/nflcombinedata.php?year=2000&pos=&college='
page = urllib.request.urlopen(url).read()
soup = BeautifulSoup(page, 'html.parser')
table = soup.find('table')

# Skip the header row and the final row of the table.
rows = table.findAll('tr')[1:-1]

# newline='' keeps the csv module from writing blank lines on Windows.
with open("2000scrape.csv", "w", newline='', encoding='utf-8') as csvfile:
    f = csv.writer(csvfile)
    f.writerow(["Name", "Position", "Height", "Weight", "40-yd", "Bench", "Vertical", "Broad", "Shuttle", "3-Cone"])
    for row in rows:
        col = row.findAll('td')
        name = col[1].getText()
        position = col[3].getText()
        height = col[4].getText()
        weight = col[5].getText()
        forty = col[7].getText()
        bench = col[8].getText()
        vertical = col[9].getText()
        broad = col[10].getText()
        shuttle = col[11].getText()
        threecone = col[12].getText()
        player = (name, position, height, weight, forty, bench, vertical, broad, shuttle, threecone)
        f.writerow(player)

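㈨ How to scrape a table with Python, and how to write the regular expression

I looked at your regular expression. The approach is basically right, but there are a few small problems. While you are still learning, I suggest searching in two steps.

First find all the tr elements, then find the td elements inside each tr.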

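import re
import urllib.request

exp1 = re.compile("(?isu)<tr[^>]*>(.*?)</tr>")
exp2 = re.compile("(?isu)<td[^>]*>(.*?)</td>")
# Decode the bytes so the str patterns can search them; UTF-8 is assumed here.
htmlSource = urllib.request.urlopen("http://cn-proxy.com/").read().decode("utf-8", errors="replace")
for row in exp1.findall(htmlSource):
    print('===============')
    for col in exp2.findall(row):
        print(col, end=' ')
    print()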

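Here (?isu) sets three flags: i ignores case, s lets . match line breaks as well, and u turns on Unicode matching so that Chinese text is handled properly.

Experiment with it. Find a regular-expression testing tool such as kodos, then read the regular-expression tutorial that ships with Python, and that should be enough.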
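
The two-step search can be tried without touching the network. In this sketch the HTML snippet, the function name `parse_table`, and the pattern names are made up for illustration; the patterns themselves follow the tr-then-td idea described above.

```python
import re

# i: ignore case, s: let "." match newlines inside a row or cell.
ROW_RE = re.compile(r"(?is)<tr[^>]*>(.*?)</tr>")
CELL_RE = re.compile(r"(?is)<td[^>]*>(.*?)</td>")

def parse_table(html):
    """Return a list of rows, each a list of stripped cell strings."""
    rows = []
    for row_html in ROW_RE.findall(html):
        cells = [cell.strip() for cell in CELL_RE.findall(row_html)]
        if cells:  # skip rows that contain no <td> cells
            rows.append(cells)
    return rows

# Made-up sample with mixed-case tags and a cell split across lines.
sample = """
<table>
  <TR><td>1.2.3.4</td><td class="port">8080</td></TR>
  <tr><td>
    5.6.7.8</td><td>3128</td></tr>
</table>
"""

table = parse_table(sample)
print(table)
```

Regular expressions like these are fine for simple, regular markup; for nested tables or messy HTML a real parser such as BeautifulSoup is usually the safer choice.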
