A website may have hundreds or even thousands of pages. Over time, URLs change, pages get removed, and internal links break. These broken links hurt user experience, reduce crawl efficiency, and can damage search engine rankings.
Manually checking every page is almost impossible. This is where Python automation comes in.
In this tutorial, you will learn how to build a content audit tool with Python, BeautifulSoup, and Requests that scans a website and automatically identifies broken internal links.
By the end, you will have a script that can save hours of manual SEO work.
Why Broken Internal Links Matter
Broken links affect both visitors and search engines.
Common issues include:
* Poor user experience
* Increased bounce rates
* Wasted crawl budget
* Weaker SEO performance
* Loss of link equity
For growing websites, regular content audits should be part of every SEO strategy.
Tools Required
You need to install the libraries:
pip install requests beautifulsoup4 pandas
You will use:
* Requests to fetch web pages
* BeautifulSoup to extract links
* Pandas to export audit reports
Understanding the Workflow
The Python script will:
* Visit a webpage
* Extract all internal links
* Check each URL's status code
* Identify broken links
* Save the results into a CSV report
The workflow is:
Website → Python Script → Link Scanner → Report
Step 1: Import Required Libraries
You import the necessary libraries:
import requests
from bs4 import BeautifulSoup
import pandas as pd
from urllib.parse import urljoin
These libraries handle web requests, HTML parsing, and report generation.
Step 2: Fetch Website Content
You start with a website URL:
url = "https://yourwebsite.com"
You send a request to the website. A timeout stops the script from hanging on a slow server:
response = requests.get(url, timeout=10)
You check the response status code:
print(response.status_code)
A successful request returns:
200
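Step 3: Extract Internal Links
You parse the website content:
soup = BeautifulSoup(response.text, "html.parser")
You create an empty list to store the links:
links = []
You loop through every link on the page, turn relative URLs into absolute ones with urljoin, and keep only links that point to your own domain:
from urllib.parse import urlparse
site = urlparse(url).netloc
for link in soup.find_all("a"):
    href = link.get("href")
    if href:
        full_url = urljoin(url, href)
        # Internal links share the same domain as the starting URL
        if urlparse(full_url).netloc == site:
            links.append(full_url)
This collects all internal links found on the page.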
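A link loop like this can still count /about and /about#team as two different URLs, and it may check the same page several times. The sketch below is one stricter option, assuming you only care about http(s) pages on exactly the same host. It removes #section fragments with urldefrag, skips mailto:, tel: and javascript: links, and removes duplicates. extract_internal_links is a hypothetical helper name, not part of the original script.

```python
from urllib.parse import urldefrag, urljoin, urlparse

from bs4 import BeautifulSoup


def extract_internal_links(html, base_url):
    """Return a sorted list of unique internal links found in html (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    site = urlparse(base_url).netloc
    found = set()
    for tag in soup.find_all("a", href=True):
        # Resolve relative paths, then drop any "#section" fragment
        full_url, _fragment = urldefrag(urljoin(base_url, tag["href"]))
        parsed = urlparse(full_url)
        # Skip mailto:, tel:, javascript: and other non-web schemes
        if parsed.scheme not in ("http", "https"):
            continue
        # Keep only links on the same host as the page
        if parsed.netloc == site:
            found.add(full_url)
    return sorted(found)


sample = """
<a href="/about">About</a>
<a href="/about#team">Team</a>
<a href="https://external.example/">External</a>
<a href="mailto:hi@yourwebsite.com">Email</a>
<a href="blog/post-1">Post</a>
"""
print(extract_internal_links(sample, "https://yourwebsite.com/"))
# ['https://yourwebsite.com/about', 'https://yourwebsite.com/blog/post-1']
```

Note that www.yourwebsite.com and yourwebsite.com count as different hosts here. If your site serves both, you may want to normalise the host before comparing.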
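Step 4: Check Link Status
You create a list to hold the broken links:
broken_links = []
You loop through each link and request it:
for link in links:
    try:
        result = requests.get(link, timeout=5)
        # 4xx and 5xx status codes mean the link is broken
        if result.status_code >= 400:
            broken_links.append([link, result.status_code])
    except requests.RequestException:
        # Timeouts, DNS failures, refused connections and similar errors
        broken_links.append([link, "Error"])
Any status code of 400 or above usually indicates a problem. 4xx codes are client errors, such as a missing page, and 5xx codes are server errors.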
Examples:
Status Code    Meaning
404            Page Not Found
500            Internal Server Error
403            Forbidden (access denied)
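Downloading every page in full just to read its status code is wasteful. A common alternative, sketched here as an assumption rather than part of the original script, is to send a HEAD request first and fall back to GET when a server answers 405 Method Not Allowed. The names check_link and find_broken_links are hypothetical. Passing in a requests.Session lets the checks reuse connections.

```python
import requests


def check_link(session, link, timeout=5):
    """Return the HTTP status code for link, or "Error" if the request fails."""
    try:
        # HEAD fetches headers only, which is lighter than downloading the page
        response = session.head(link, allow_redirects=True, timeout=timeout)
        # Some servers reject HEAD with 405 Method Not Allowed; retry with GET
        if response.status_code == 405:
            response = session.get(link, allow_redirects=True, timeout=timeout)
        return response.status_code
    except requests.RequestException:
        return "Error"


def find_broken_links(links, session=None):
    """Return [link, status] pairs for links that fail or return 400 or above."""
    session = session or requests.Session()
    broken_links = []
    for link in links:
        status = check_link(session, link)
        # Check for "Error" first so a string is never compared with 400
        if status == "Error" or status >= 400:
            broken_links.append([link, status])
    return broken_links
```

Some servers answer HEAD with other codes, such as 403 or 501, even when GET works. You may want to widen the fallback condition for sites that behave this way.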
Step 5: Export Results to CSV
You create a DataFrame:
df = pd.DataFrame(
    broken_links,
    columns=["URL", "Status"]
)
You export the results to a CSV file:
df.to_csv(
    "broken_links_report.csv",
    index=False
)
You now have a report for SEO review.
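Putting Steps 2 to 5 together, you could wrap a single-page audit in one function. This is a minimal sketch that assumes the hypothetical name audit_page and a one-second pause between requests. The optional session parameter exists only so the function can reuse connections, or be exercised without a network.

```python
import time
from urllib.parse import urljoin, urlparse

import pandas as pd
import requests
from bs4 import BeautifulSoup


def audit_page(url, session=None, delay=1.0, timeout=5):
    """Scan one page and return a DataFrame of its broken internal links."""
    session = session or requests.Session()

    # Step 2: fetch the starting page
    page = session.get(url, timeout=timeout)
    if page.status_code != 200:
        raise RuntimeError(f"Could not fetch {url}: HTTP {page.status_code}")

    # Step 3: collect unique internal links
    soup = BeautifulSoup(page.text, "html.parser")
    site = urlparse(url).netloc
    links = set()
    for tag in soup.find_all("a", href=True):
        full_url = urljoin(url, tag["href"])
        if urlparse(full_url).netloc == site:
            links.add(full_url)

    # Step 4: check each link, pausing between requests
    broken_links = []
    for link in sorted(links):
        try:
            result = session.get(link, timeout=timeout)
            if result.status_code >= 400:
                broken_links.append([link, result.status_code])
        except requests.RequestException:
            broken_links.append([link, "Error"])
        time.sleep(delay)  # be polite to the server

    # Step 5: return the results as a table ready for to_csv()
    return pd.DataFrame(broken_links, columns=["URL", "Status"])
```

To run it against your own site, call audit_page("https://yourwebsite.com") and write the returned DataFrame with to_csv("broken_links_report.csv", index=False).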
Improving the Script
Professional SEO audits usually include many more checks than broken links.
You can expand the tool to:
Crawl Multiple Pages
Instead of checking one page, you can crawl the entire website.
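A breadth-first crawl is one straightforward way to do this. You start from the homepage, queue every new internal link you find, and stop after a page limit. The sketch below assumes a hypothetical crawl_site helper, a max_pages cap of 50 and a one-second delay. It records each page's status code so that 404s deeper in the site show up too. It does not check robots.txt or content types, which a production crawler should.

```python
import time
from collections import deque
from urllib.parse import urldefrag, urljoin, urlparse

import requests
from bs4 import BeautifulSoup


def crawl_site(start_url, session=None, max_pages=50, delay=1.0, timeout=5):
    """Breadth-first crawl of internal pages. Returns {page_url: status}."""
    session = session or requests.Session()
    site = urlparse(start_url).netloc
    queue = deque([start_url])  # pages waiting to be visited
    seen = {start_url}          # pages already queued, to avoid repeats
    statuses = {}

    while queue and len(statuses) < max_pages:
        page_url = queue.popleft()
        try:
            response = session.get(page_url, timeout=timeout)
        except requests.RequestException:
            statuses[page_url] = "Error"
            continue
        finally:
            time.sleep(delay)  # pause after every request, successful or not
        statuses[page_url] = response.status_code

        # Only successful pages are parsed for further links
        if response.status_code != 200:
            continue
        soup = BeautifulSoup(response.text, "html.parser")
        for tag in soup.find_all("a", href=True):
            link, _fragment = urldefrag(urljoin(page_url, tag["href"]))
            if urlparse(link).netloc == site and link not in seen:
                seen.add(link)
                queue.append(link)

    return statuses
```

To list the broken pages afterwards, you could filter the result: broken = {u: s for u, s in crawl_site("https://yourwebsite.com/").items() if s == "Error" or s >= 400}.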
Identify Redirect Chains
You can detect chains such as:
301 → 302 → final URL
Excessive redirects can slow page loading.
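Requests follows redirects automatically and keeps every intermediate response in response.history, which makes chains easy to inspect. Here is a minimal sketch with two hypothetical helpers: redirect_chain collects the hops, and describe_chain formats them in the same arrow notation used above.

```python
import requests


def redirect_chain(url, session=None, timeout=5):
    """Return every (status_code, url) hop from the first request to the final page."""
    session = session or requests.Session()
    response = session.get(url, allow_redirects=True, timeout=timeout)
    # response.history lists the intermediate redirect responses, oldest first
    hops = [(step.status_code, step.url) for step in response.history]
    hops.append((response.status_code, response.url))
    return hops


def describe_chain(hops):
    """Format hops as e.g. '301 → 302 → 200 https://yourwebsite.com/new'."""
    codes = " → ".join(str(code) for code, _url in hops)
    return f"{codes} {hops[-1][1]}"


example = [
    (301, "http://yourwebsite.com/old"),
    (302, "https://yourwebsite.com/old"),
    (200, "https://yourwebsite.com/new"),
]
print(describe_chain(example))  # 301 → 302 → 200 https://yourwebsite.com/new
```

A chain with more than one redirect (len(hops) > 2) is usually worth fixing by pointing the original link straight at the final URL.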
Find Missing Meta Tags
You can audit:
* Title tags
* Meta descriptions
* Canonical tags
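BeautifulSoup can look for each of these tags directly. The sketch below is built around a hypothetical audit_meta_tags function. It only reports tags that are absent or empty, and it does not judge title length or description quality.

```python
from bs4 import BeautifulSoup


def audit_meta_tags(html):
    """Return a list of missing SEO tags for one page's HTML (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    problems = []

    title = soup.find("title")
    if title is None or not title.get_text(strip=True):
        problems.append("Missing title tag")

    description = soup.find("meta", attrs={"name": "description"})
    if description is None or not description.get("content", "").strip():
        problems.append("Missing meta description")

    # rel is a multi-valued attribute; BeautifulSoup matches any of its values
    canonical = soup.find("link", rel="canonical")
    if canonical is None or not canonical.get("href"):
        problems.append("Missing canonical tag")

    return problems


print(audit_meta_tags("<html><head><title>Home</title></head></html>"))
# ['Missing meta description', 'Missing canonical tag']
```

You could call it on response.text for every page your crawler visits and add the results as extra columns in the CSV report.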
Check Image Errors
You can identify:
* Missing images
* Broken image URLs
* Oversized images
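One way to cover all three checks is to read every img tag, send a HEAD request for its src, and compare the Content-Length header against a size budget. The 300 KB limit, the hypothetical audit_images name, and treating an img with no src as a "missing image" are all assumptions for illustration.

```python
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

MAX_IMAGE_BYTES = 300 * 1024  # assumed 300 KB budget; adjust for your site


def audit_images(html, base_url, session=None, max_bytes=MAX_IMAGE_BYTES, timeout=5):
    """Return [image_url, problem] rows for missing, broken or oversized images."""
    session = session or requests.Session()
    soup = BeautifulSoup(html, "html.parser")
    problems = []
    for img in soup.find_all("img"):
        src = img.get("src")
        if not src:
            problems.append(["(no src)", "Missing image source"])
            continue
        image_url = urljoin(base_url, src)
        try:
            response = session.head(image_url, allow_redirects=True, timeout=timeout)
        except requests.RequestException:
            problems.append([image_url, "Error"])
            continue
        if response.status_code >= 400:
            problems.append([image_url, f"Broken ({response.status_code})"])
            continue
        # Content-Length is optional; a missing header is treated as size 0
        size = int(response.headers.get("Content-Length", 0))
        if size > max_bytes:
            problems.append([image_url, f"Oversized ({size // 1024} KB)"])
    return problems
```

Servers are not required to send Content-Length, so images without that header pass the size check silently.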
SEO Benefits of Automated Audits
Automated audits help you:
* Improve user experience
* Strengthen internal linking
* Increase crawl efficiency
* Discover hidden technical issues
* Improve search visibility
Many agencies charge thousands of rupees for audits that can be partially automated using Python.
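Common Mistakes Beginners Make
Crawling Too Aggressively
You should not send hundreds of requests in quick succession, as this can overload servers.
Always add a delay between requests when scanning websites, for example at the end of each loop iteration:
import time
time.sleep(1)  # wait one second before the next request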
Ignoring Robots.txt
Some pages should not be crawled.
Always respect the crawling rules in a site's robots.txt file.
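Python's standard library includes urllib.robotparser for this. The minimal sketch below assumes a made-up user agent name, ContentAuditBot, and two hypothetical helpers. load_robots downloads a live robots.txt file. robots_from_text parses rules you already have, which is handy for testing. Either way, you check each URL with can_fetch before requesting it.

```python
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser

USER_AGENT = "ContentAuditBot"  # hypothetical bot name; use your own


def load_robots(site_url):
    """Download and parse the site's /robots.txt file."""
    parser = RobotFileParser(urljoin(site_url, "/robots.txt"))
    parser.read()  # fetches the file over the network
    return parser


def robots_from_text(text):
    """Parse robots.txt rules that you already have as a string."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


rules = robots_from_text("User-agent: *\nDisallow: /admin/\n")
print(rules.can_fetch(USER_AGENT, "https://yourwebsite.com/blog/"))        # True
print(rules.can_fetch(USER_AGENT, "https://yourwebsite.com/admin/users"))  # False
```

RobotFileParser also provides crawl_delay(useragent), which returns the site's Crawl-delay value or None. You can use it if you want your pause between requests to follow the site's own rule.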
Auditing Only the Homepage
Many broken links exist deeper within websites.
You should audit as many pages as possible, not just the homepage.
Real-World Use Cases
This automation can be used by:
Digital Marketers
You can identify SEO issues quickly.
Website Owners
You can maintain site health.
Freelancers
You can offer audit services.
Agencies
You can automate client work.
Future Enhancements
After mastering audits, you can try building:
* An SEO dashboard using Streamlit
* An automated site crawler
* A keyword tracking system
* An XML sitemap validator
* A competitor audit tool
These projects are excellent additions to a digital marketing or Python automation portfolio.
Automating content audits with Python is one practical way to improve website maintenance and SEO efficiency. With a few libraries, you can identify broken links, generate reports, and save countless hours of manual checking.
As websites continue to grow, automation becomes an essential skill for digital marketers, SEO professionals, and developers alike.
Learning tools like Requests, BeautifulSoup, and Pandas gives you the foundation to build more advanced SEO automation systems in the future.
Free Resource
You can get the enhanced version of this script with:
* Full website crawling
* CSV export
* Redirect detection
* Missing metadata checks
You can submit an inquiry through KodVidya Academy and request the SEO Content Audit Automation Toolkit.
Interactive Question
What website have you worked on, and how many pages did it contain?
You can share your answer in the comments and discuss your biggest SEO audit challenges with other readers.
Learn Python Automation & Digital Marketing at KodVidya Academy
You can master Python scripting, SEO automation, web scraping, and digital marketing tools through hands-on projects designed for real-world careers.