Python BeautifulSoup web scraping issue

Replies

User

( 5 months ago )

 

page = requests.get("http://www.freejobalert.com/upsc-recruitment/16960/#Engg-Services2019")
c = page.content
soup=BeautifulSoup(c,"html.parser")
data=soup.find_all("tr")
for r in data:
    td = r.find_all("td",{"style":"text-align: center;"})
    for d in td:
        link =d.find_all("a")
        for li in link:
            span = li.find_all("span",{"style":"color: #008000;"})
            for s in span:
                strong = s.find_all("strong")
                for st in strong:
                        dict['title'] = st.text
        for l in link:
            dict["link"] = l['href']
    print(dict)

It outputs:

{'title': 'Syllabus', 'link': 'http://www.upsc.gov.in/'}
{'title': 'Syllabus', 'link': 'http://www.upsc.gov.in/'}
{'title': 'Syllabus', 'link': 'http://www.upsc.gov.in/'}

I am expecting:

{'title': 'Apply Online', 'link': 'https://upsconline.nic.in/mainmenu2.php'}
{'title': 'Notification', 'link': 'http://www.freejobalert.com/wp-content/uploads/2018/09/Notification-UPSC-Engg-Services-Prelims-Exam-2019.pdf'}
{'title': 'Official Website ', 'link': 'http://www.upsc.gov.in/'}

I want every entry under "Important Links" ("Apply Online", "Notification", "Official Website") together with its link, for each table. Instead, the title is always "Syllabus" and the links repeat.

Please have a look at this.

User

( 5 months ago )

This may help. Check the code below.

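import requests
from bs4 import BeautifulSoup

page = requests.get('http://www.freejobalert.com/'
                    'upsc-recruitment/16960/#Engg-Services2019')
soup = BeautifulSoup(page.content, 'html.parser')

for row in soup.find_all('tr'):
    for link in row.find_all('a', href=True):
        # Assumes the green span sits inside the <a>, as in the question's
        # code, so each title is read from the same link as its href.
        title = link.find('span', attrs={'style': 'color: #008000;'})
        if title is None:
            continue  # skip links that are not "Important Links" entries
        # Build a fresh dict per link instead of reusing (and shadowing) dict.
        print({'title': title.get_text(strip=True), 'link': link['href']})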
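
The repeated rows in the question's output are probably stale values: a single dict is filled inside the loops and printed once per row, so a row with no matching link prints whatever an earlier row stored. This is only a sketch of that effect, assuming the dict was created once before the loop (the question does not show this). The `rows` list and both helper functions are hypothetical stand-ins, not the real page data.

```python
# Hypothetical stand-in for the parsed table: each row is a list of
# (title, href) pairs; an empty list is a row with no matching links.
rows = [
    [("Apply Online", "https://upsconline.nic.in/mainmenu2.php")],
    [],
    [("Official Website", "http://www.upsc.gov.in/")],
]


def collect_shared(rows):
    """Reuse one dict for every row, as the question's code does."""
    shared = {}
    seen = []
    for row in rows:
        for title, href in row:
            shared["title"] = title
            shared["link"] = href
        # Rows without links still report whatever was stored earlier.
        seen.append(dict(shared))
    return seen


def collect_fresh(rows):
    """Build a new dict for each link, so nothing carries over."""
    return [{"title": title, "link": href}
            for row in rows for title, href in row]


stale = collect_shared(rows)
fresh = collect_fresh(rows)
print(stale)
print(fresh)
```

With the shared dict, the empty second row repeats the first row's entry. Building one dict per link yields exactly one entry for each link found.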
