A Gentle Introduction to Web Scraping with Python
Build a simple web scraper that fetches data from websites using Python, requests, pandas and BeautifulSoup.
“The universe is made of stories, not of atoms.”
— Muriel Rukeyser
And if that’s true, then websites are little universes — bursting with data, meaning and stories just waiting to be told.
Web scraping is the act of finding clarity in digital noise. It’s one of the simplest ways to step into the world of data science, automation and storytelling — and the best part? You can start right now, even if you’re a complete beginner.
Step 1: Setting the Stage – What You'll Need
Before we dive in, here’s what you’ll need:
✅ Google Colab (or a local setup like VS Code, Anaconda or Jupyter)
✅ A web page to scrape – we’ll use Yahoo Finance’s Most Active Stocks
✅ Three Python libraries:
requests – the web-fetching courier
beautifulsoup4 – the HTML detective
pandas – the spreadsheet artist
Meet Your Dream Team (a.k.a. The Libraries)
Think of this like planning a wedding:
Requests is your planner – it gets you to the venue (aka, the webpage).
BeautifulSoup is your assistant – it helps you find exactly what you need once you’re inside.
Pandas is your designer – it turns raw material into a beautiful, usable format.
Installing the Tools
If you're using Google Colab, you're all set — these libraries are pre-installed. Just import them with the following code:
import requests
from bs4 import BeautifulSoup
import pandas as pd
A quick note:
When we write import pandas as pd, we’re giving it a nickname — kind of like calling your friend “pd” instead of “Pandas” every time you talk. It just makes our code cleaner, shorter, and yes, easier for lazy fingers.
If you’re working locally, install the libraries using:
pip install requests
pip install beautifulsoup4
pip install pandas
If you’re in Google Colab and ever need to install a package that isn’t already there, use !pip install instead:
!pip install requests
!pip install beautifulsoup4
!pip install pandas
Depending on your setup, you might need to use pip3 instead of pip. If in doubt, check your Python version with python --version.
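If pip still misbehaves, one variation that usually works is invoking pip through the exact Python interpreter you’re running (a generic fallback, not specific to this tutorial):
python -m pip install requests beautifulsoup4 pandas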
Step 2: Fetching the Web Page
Now we’re reaching out to the site we want to scrape.
url = "https://finance.yahoo.com/most-active"
headers = {'User-Agent': 'Mozilla/5.0'}
page = requests.get(url, headers=headers)
What does this all mean?
url = "...": This is the webpage we want to scrape. We store it in a variable for easy reference.headers = {'User-Agent': 'Mozilla/5.0'}: Some websites won’t give data to bots. This header tricks the site into thinking we’re just a normal browser. Think of it as dressing nicely to get into a members-only jazz club.page = requests.get(url, headers=headers): This is us sending a polite request: “Hey, can I have a copy of this page?” The site says yes, and we store the response in page.
Step 3: Reading the HTML
Now that we have the page, we need to make sense of it with BeautifulSoup.
soup = BeautifulSoup(page.text, 'html.parser')
table = soup.find('table')
What’s happening here?
BeautifulSoup(page.text, 'html.parser'): This turns the messy HTML into a beautiful, searchable structure. If HTML is a manuscript, BeautifulSoup is the editor.
soup.find('table'): Tells BeautifulSoup, “Find me the first table on this page.” Yahoo displays stock data in a table, so that’s where we’ll look.
Want to see what the soup looks like? Try:
print(soup.prettify()[:1000])
It’ll print the first 1000 characters of your parsed HTML.
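One more defensive habit: if Yahoo ever redesigns the page, soup.find('table') will quietly return None and the next step will crash with a confusing error. A small guard you can add (my addition, not part of the original flow):
# find() returns None when no <table> exists, so fail loudly and clearly.
if table is None:
    raise ValueError("No table found on the page - its layout may have changed.")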
Step 4: Extracting the Data (The Fun Part)
We’ve got our table — now let’s extract the stock info:
cleaned_data = []
for row in table.find_all('tr')[1:]:
    cols = [col.text.strip() for col in row.find_all('td')]
    if len(cols) >= 7:
        cleaned_data.append(cols[:7])
What each line does:
cleaned_data = []: An empty list where we’ll store our nice clean rows.
for row in table.find_all('tr')[1:]:: Loops through each row — skipping the header row.
cols = [col.text.strip() for col in row.find_all('td')]: Pulls out and cleans up each cell’s text.
if len(cols) >= 7:: Ensures we only grab rows with 7 or more columns.
cleaned_data.append(cols[:7]): Adds the first 7 columns of clean data to our list.
By the end, we’ve got a tidy list of lists — a.k.a. our dataset.
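In Step 5 we’ll type the column names by hand, but you can also lift them straight from the table’s header row. A little sketch, assuming Yahoo marks its headers with standard <th> tags:
# Header cells live in <th> tags; clean them the same way we cleaned <td> cells.
header_names = [th.text.strip() for th in table.find_all('th')]
print(header_names[:7])  # should roughly match the columns we define in Step 5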
Step 5: Wrangling It with Pandas
Now we turn that raw list into a DataFrame (like a spreadsheet):
columns = ['Symbol', 'Name', 'Price', 'Change', '% Change', 'Market Cap', 'Volume']
df = pd.DataFrame(cleaned_data, columns=columns)
df.head()
columns = [...]: Defines the column headers.
pd.DataFrame(...): Tells pandas to create a table from our cleaned data.
df.head(): Shows the first 5 rows.
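One caveat before you compute anything: every cell we scraped is a string, including the prices. If you want to sort or do math, convert the column first. A sketch, assuming prices arrive looking like '123.45' or '1,234.56':
# Strip thousands separators, then convert; errors='coerce' turns anything
# unparseable into NaN instead of raising an exception.
df['Price'] = pd.to_numeric(df['Price'].str.replace(',', ''), errors='coerce')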
💾 Bonus: Save Your Scraped Data
Want to keep your scraped data? Use the following:
df.to_csv('most_active_stocks.csv', index=False)
This creates a .csv file you can open in Excel, Google Sheets, or any other spreadsheet tool.
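And whenever you want it back in pandas later:
df = pd.read_csv('most_active_stocks.csv')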
Full Code Recap
If you want to copy and paste the whole thing into your Google Colab or IDE, here’s the full code we used from start to finish:
import requests
from bs4 import BeautifulSoup
import pandas as pd
# Step 1: Get the web page
url = "https://finance.yahoo.com/most-active"
headers = {'User-Agent': 'Mozilla/5.0'}
page = requests.get(url, headers=headers)
# Step 2: Parse the HTML
soup = BeautifulSoup(page.text, 'html.parser')
table = soup.find('table')
# Step 3: Extract table rows
cleaned_data = []
for row in table.find_all('tr')[1:]:  # skip the header
    cols = [col.text.strip() for col in row.find_all('td')]
    if len(cols) >= 7:  # Make sure we have enough columns
        cleaned_data.append(cols[:7])  # Only take the first 7
# Step 4: Load into a pandas DataFrame
columns = ['Symbol', 'Name', 'Price', 'Change', '% Change', 'Market Cap', 'Volume']
df = pd.DataFrame(cleaned_data, columns=columns)
# Step 5: Display the first 5 rows
df.head()
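One small note: a bare df.head() on the last line only displays output inside a notebook like Colab or Jupyter. If you run this as a plain .py script, print it explicitly:
print(df.head())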
In Summary …
Web scraping isn’t just about collecting data — it’s about uncovering stories. Each stock symbol, each price change, each row in your DataFrame… they’re not just numbers. They’re fragments of a larger narrative unfolding in the markets, in the economy, in human behavior. Maybe you'll build a scraper to track prices over time, or compare companies, or generate stunning data visualizations. Or maybe you'll feed this data into a machine learning model and make predictions.
As Muriel Rukeyser reminds us: “The universe is made of stories, not of atoms.” And every line of code you write — especially in projects like these — is a step toward telling those stories with data.
💌 Stay in the Loop
If you found this helpful (or just kind of delightful), subscribe to The Literary Coder — where we make code poetic, data meaningful, and every tutorial a little bit literary. Thanks for scraping with me 🧡
➡️ Coming soon on The Literary Coder
How to scrape dynamic sites using Selenium, and even combine scraping with AI tools like LangChain for some real-time transformation magic.

