Building a Python Web Scraper using only Natural Language
Web scraping used to require deep knowledge of HTML parsing, HTTP requests, and CSS selectors. You had to understand the DOM, handle pagination, and deal with rate limiting.
With Vibe Coding, you can build a working web scraper simply by describing the data you want to extract.
In this tutorial, we'll build a scraper that extracts job listings from a website and exports them to a CSV file—using only natural language prompts in Cursor.
What We're Building
Scraper Name: JobHunter
Target: A job board website (we'll use a practice site)
Data to Extract:
* Job title
* Company name
* Location
* Salary range
* Posted date
Output: CSV file with all the data
Tech Stack:
* Python 3.10+
* BeautifulSoup4 (for HTML parsing)
* Requests (for HTTP)
* Pandas (for CSV export)
Step 1: Set Up the Project
Create a New Directory
```bash
mkdir job-scraper
cd job-scraper
```
Create a Virtual Environment
```bash
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```
Open in Cursor
```bash
cursor .
```
Step 2: The Initial Prompt
Open Cursor's Composer (`Cmd+I`, or `Ctrl+I` on Windows/Linux) and paste this prompt:
> "Create a Python web scraper that:
> 1. Scrapes job listings from 'https://realpython.github.io/fake-jobs/' (a practice scraping site)
> 2. Extracts: job title, company name, location, and posted date
> 3. Stores the data in a list of dictionaries
> 4. Exports the data to a CSV file called 'jobs.csv'
> 5. Uses BeautifulSoup4 and requests
> 6. Includes error handling for network issues
> 7. Adds a 1-second delay between requests to be polite
> 8. Prints progress as it scrapes
>
> Create a file called `scraper.py` with all the necessary code."
Press Enter and wait.
Step 3: Review the Generated Code
Cursor will generate `scraper.py` with something like this:
```python
import requests
from bs4 import BeautifulSoup
import pandas as pd
import time


def scrape_jobs():
    url = "https://realpython.github.io/fake-jobs/"
    print(f"Fetching {url}...")

    try:
        response = requests.get(url, timeout=10)
        response.raise_for_status()
    except requests.RequestException as e:
        print(f"Error fetching the page: {e}")
        return []

    soup = BeautifulSoup(response.content, 'html.parser')
    job_cards = soup.find_all('div', class_='card-content')

    jobs = []
    print(f"Found {len(job_cards)} job listings. Scraping...")

    for i, card in enumerate(job_cards, 1):
        try:
            title = card.find('h2', class_='title').text.strip()
            company = card.find('h3', class_='company').text.strip()
            location = card.find('p', class_='location').text.strip()
            posted = card.find('time').text.strip()

            job = {
                'title': title,
                'company': company,
                'location': location,
                'posted_date': posted
            }
            jobs.append(job)
            print(f"  [{i}/{len(job_cards)}] Scraped: {title} at {company}")
            time.sleep(1)  # Be polite
        except AttributeError as e:
            print(f"  Error parsing job card {i}: {e}")
            continue

    return jobs


def save_to_csv(jobs, filename='jobs.csv'):
    if not jobs:
        print("No jobs to save.")
        return

    df = pd.DataFrame(jobs)
    df.to_csv(filename, index=False)
    print(f"\nSaved {len(jobs)} jobs to {filename}")


if __name__ == "__main__":
    jobs = scrape_jobs()
    save_to_csv(jobs)
```
Step 4: Install Dependencies
Cursor might also create a `requirements.txt`:
```
beautifulsoup4==4.12.2
requests==2.31.0
pandas==2.1.0
```
Install them:
```bash
pip install -r requirements.txt
```
Step 5: Run the Scraper
```bash
python scraper.py
```
Output:
```
Fetching https://realpython.github.io/fake-jobs/...
Found 100 job listings. Scraping...
  [1/100] Scraped: Senior Python Developer at Payne, Roberts and Davis
  [2/100] Scraped: Energy engineer at Vasquez-Davidson
  ...

Saved 100 jobs to jobs.csv
```
Step 6: Iterate with Vibes
The basic scraper works, but let's make it better.
Iteration 1: Add Filtering
Prompt:
> "Update the scraper to only save jobs that contain 'Python' in the title."
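After Cursor applies the change, the filtering logic might look something like this (a minimal sketch, assuming the `jobs` list of dictionaries from Step 3):

```python
# Keep only jobs whose title mentions Python, case-insensitively.
python_jobs = [job for job in jobs if 'python' in job['title'].lower()]
save_to_csv(python_jobs)
```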
Iteration 2: Add Pagination
Prompt:
> "The website has multiple pages. Update the scraper to handle pagination. The next page URL is in a link with class 'next'."
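One plausible shape for the result, sketched under the assumption that a `parse_job_cards()` helper (hypothetical here) extracts the listings from a single page:

```python
import time
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

def scrape_all_pages(start_url):
    """Follow 'next' links until there are none left."""
    jobs = []
    url = start_url
    while url:
        response = requests.get(url, timeout=10)
        response.raise_for_status()
        soup = BeautifulSoup(response.content, 'html.parser')
        jobs.extend(parse_job_cards(soup))  # hypothetical per-page parsing helper
        next_link = soup.find('a', class_='next')
        # Resolve relative hrefs against the current page; stop when no link is found.
        url = urljoin(url, next_link['href']) if next_link else None
        time.sleep(1)  # stay polite between page requests
    return jobs
```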
Iteration 3: Add Salary Extraction
Prompt:
> "Some job cards include a salary range in a separate tag. Extract it if it exists, otherwise set it to 'Not specified'."
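Inside the card-parsing loop, the guarded lookup might look like this (the `p.salary` selector is an assumption; check the site's actual markup):

```python
# Salary is optional, so guard the lookup instead of assuming the tag exists.
salary_tag = card.find('p', class_='salary')  # assumed selector, not confirmed
job['salary'] = salary_tag.text.strip() if salary_tag else 'Not specified'
```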
Iteration 4: Add Logging
Prompt:
> "Replace print statements with proper logging using Python's logging module. Save logs to 'scraper.log'."
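The core of that change is a one-time configuration; a minimal sketch:

```python
import logging

# Send log output to both scraper.log and the console, with timestamps.
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s %(levelname)s %(message)s',
    handlers=[logging.FileHandler('scraper.log'), logging.StreamHandler()],
)
logger = logging.getLogger(__name__)

# Each print then becomes a call at the appropriate level, e.g.:
# print(f"Fetching {url}...")   ->  logger.info("Fetching %s...", url)
# print(f"  Error parsing...")  ->  logger.error("Error parsing job card %d: %s", i, e)
```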
Each iteration takes 30-60 seconds.
Step 7: Handle Edge Cases
Real-world scraping has challenges. Let's address them.
Challenge 1: Dynamic Content (JavaScript-rendered)
Prompt:
> "If the website uses JavaScript to load content, update the scraper to use Selenium instead of requests."
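A minimal sketch of what the Selenium version's fetch step could look like (assumes the `selenium` package and Chrome are installed; Selenium 4+ downloads a matching driver automatically):

```python
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from bs4 import BeautifulSoup

options = Options()
options.add_argument('--headless=new')  # render pages without opening a window

driver = webdriver.Chrome(options=options)
try:
    driver.get('https://realpython.github.io/fake-jobs/')
    # page_source is the DOM after JavaScript has run, so the
    # existing BeautifulSoup parsing code works unchanged.
    soup = BeautifulSoup(driver.page_source, 'html.parser')
finally:
    driver.quit()
```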
Challenge 2: Rate Limiting
Prompt:
> "Add exponential backoff if we get a 429 (Too Many Requests) response."
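The pattern Cursor should produce looks roughly like this (a sketch; the retry count and delays are illustrative):

```python
import time
import requests

def fetch_with_backoff(url, max_retries=5):
    """Retry on HTTP 429, doubling the wait each attempt (1s, 2s, 4s, ...)."""
    delay = 1
    for attempt in range(max_retries):
        response = requests.get(url, timeout=10)
        if response.status_code != 429:
            response.raise_for_status()
            return response
        # Honor the server's Retry-After header if present, else back off exponentially.
        wait = int(response.headers.get('Retry-After', delay))
        print(f"Rate limited (attempt {attempt + 1}/{max_retries}), waiting {wait}s...")
        time.sleep(wait)
        delay *= 2
    raise RuntimeError(f"Still rate limited after {max_retries} retries: {url}")
```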
Challenge 3: User-Agent Spoofing
Prompt:
> "Some websites block scrapers. Add a realistic User-Agent header to the requests."
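The change itself is small; something like the following (the User-Agent string below is just an example of a browser-like value):

```python
import requests

url = 'https://realpython.github.io/fake-jobs/'
# requests identifies itself as 'python-requests/x.y' by default,
# which some servers reject; a browser-like header avoids that.
headers = {
    'User-Agent': (
        'Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
        'AppleWebKit/537.36 (KHTML, like Gecko) '
        'Chrome/120.0.0.0 Safari/537.36'
    )
}
response = requests.get(url, headers=headers, timeout=10)
```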
Advanced: Building a Multi-Site Scraper
Once you've mastered single-site scraping, you can build a scraper that handles multiple job boards.
Prompt:
> "Create a scraper that can scrape jobs from multiple websites. It should:
> - Accept a list of URLs
> - Detect the site structure automatically (or use site-specific parsers)
> - Combine all results into a single CSV
> - Run scrapers in parallel using threading"
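Sketched at a high level, with `scrape_jobs_for()` standing in as a hypothetical dispatcher that picks the right site-specific parser:

```python
from concurrent.futures import ThreadPoolExecutor
import pandas as pd

urls = [
    'https://realpython.github.io/fake-jobs/',
    # ...one entry per job board, each needing its own selectors
]

def scrape_site(url):
    jobs = scrape_jobs_for(url)  # hypothetical: dispatch to a site-specific parser
    for job in jobs:
        job['source'] = url  # record which board each row came from
    return jobs

# Threads work well here because scraping is I/O-bound, not CPU-bound.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = pool.map(scrape_site, urls)

all_jobs = [job for site_jobs in results for job in site_jobs]
pd.DataFrame(all_jobs).to_csv('all_jobs.csv', index=False)
```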
Legal and Ethical Considerations
Important: Always check a website's `robots.txt` file and Terms of Service before scraping.
Prompt to Cursor:
> "Add a function that checks if scraping is allowed by reading the robots.txt file."
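The standard library already covers this; a minimal sketch using `urllib.robotparser`:

```python
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser

def scraping_allowed(url, user_agent='JobHunter'):
    """Return True if the site's robots.txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.set_url(urljoin(url, '/robots.txt'))
    parser.read()
    return parser.can_fetch(user_agent, url)

if not scraping_allowed('https://realpython.github.io/fake-jobs/'):
    raise SystemExit("robots.txt disallows scraping this page.")
```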
Conclusion
Building web scrapers used to require expertise in HTML parsing and HTTP protocols. With Vibe Coding, you just describe what data you want, and the AI handles the implementation.
At BYS Marketing, we use AI-powered scraping to gather competitive intelligence, monitor pricing, and track industry trends for our clients.
---
Need custom data extraction?
Contact BYS Marketing. We build intelligent scrapers that respect website policies and deliver clean data.