Python 30‑by‑30 Course
This is it! In our final module, we bring everything together to build practical, real-world tools. You'll learn how to make Python your personal assistant, teach it to browse and extract information from the web, and cap it all off by refactoring a real-world project.
In Module 5, we learned the practices of a professional software developer:
- Organizing code with `class`, bundling data (attributes) and actions (methods) into objects.
- Using the `unittest` module to create a safety net for our code, allowing us to make changes with confidence.
- Debugging with `pdb` and recording our program's activities with the `logging` module.
In this final module, you'll put those habits to work, starting with `cron` or `schedule` to run your scripts automatically and the `pathlib` and `datetime` modules for file system tasks.
One of the most satisfying uses of Python is automating the boring stuff. Think about tasks you do over and over: renaming files, organizing downloads, creating backups. A simple Python script can do these things for you in seconds. Key to this are the `pathlib` and `datetime` modules. `pathlib` provides a clean, modern way to work with files and folders, while `datetime` lets you easily handle dates and times, like checking how old a file is.
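For instance, here's a minimal sketch of how `pathlib` and `datetime` can work together to find files that haven't been touched in a month; the Downloads folder and the 30-day cutoff are just placeholders for this illustration.
# old_files.py -- minimal sketch; folder and cutoff are example values
from pathlib import Path
from datetime import datetime, timedelta

downloads = Path.home() / "Downloads"          # example folder to inspect
cutoff = datetime.now() - timedelta(days=30)   # "old" means older than 30 days here

for item in downloads.iterdir():
    if item.is_file():
        # st_mtime is the file's last-modified time as a Unix timestamp
        modified = datetime.fromtimestamp(item.stat().st_mtime)
        if modified < cutoff:
            print(f"{item.name} was last modified on {modified:%Y-%m-%d}")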
The real magic happens when you schedule these scripts to run automatically. On Mac and Linux, a built-in tool called `cron` is the standard way to run a command at a specific time (e.g., every day at midnight). For a pure Python solution that's cross-platform, you can install a simple third-party library like `schedule`. This allows you to write, in plain English, rules like `schedule.every().monday.at("08:00").do(my_backup_job)`. With scheduling, you build a true "set it and forget it" solution.
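Here's a minimal sketch of what a `schedule`-based runner might look like; `my_backup_job` is a stand-in for your own function, and you'd install the library first with `pip install schedule`.
# run_schedule.py -- minimal sketch; my_backup_job is a placeholder for your own function
import time
import schedule  # third-party: pip install schedule

def my_backup_job():
    print("Running the backup job...")

# Run the job every Monday at 08:00
schedule.every().monday.at("08:00").do(my_backup_job)

# Keep the script alive and check once a minute for pending jobs
while True:
    schedule.run_pending()
    time.sleep(60)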
Write a script that creates a backup of a specific file. It should read the contents of a source file (e.g., `notes.txt`) and write them to a new file with a timestamp in the name, like `notes_2025-08-29.txt`.
# simple_backup.py
import datetime

source_file = 'notes.txt'

# Create a dummy source file
with open(source_file, 'w') as f:
    f.write("This is an important note.")

# Get today's date to use in the filename
today_str = datetime.date.today().isoformat()  # Format: YYYY-MM-DD
backup_filename = f"{source_file.split('.')[0]}_{today_str}.txt"

try:
    with open(source_file, 'r') as f_in:
        content = f_in.read()
    with open(backup_filename, 'w') as f_out:
        f_out.write(content)
    print(f"Backup successful! Created {backup_filename}")
except FileNotFoundError:
    print(f"Error: Source file '{source_file}' not found.")
What if the data you need isn't in a nice file, but on a website? Web scraping is the process of writing a program to automatically extract that information. It's a two-step process. First, you need to download the page's raw HTML code. The `requests` library is the gold standard for this; it makes fetching a web page as simple as `requests.get(url)`.
Second, you need to parse that messy HTML to find the specific pieces of information you want. The `BeautifulSoup` library is a fantastic tool for this. It turns the HTML into a structured object that you can easily navigate. You can ask it to find all the `<h2>` tags, or find the element with a specific CSS class, making it simple to pull out things like article titles, prices, or table data.
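As a quick illustration of that two-step flow (the URL is a placeholder, and the `<h2>`/CSS-class lookups are assumptions about the page's structure):
# two_step_scrape.py -- minimal sketch; the URL and the 'price' class are placeholders
import requests
from bs4 import BeautifulSoup

# Step 1: download the raw HTML
response = requests.get("https://example.com")
response.raise_for_status()

# Step 2: parse it and pull out the pieces you want
soup = BeautifulSoup(response.text, 'html.parser')

# All <h2> tags on the page
for heading in soup.find_all('h2'):
    print(heading.text.strip())

# The first element with a specific CSS class (assumed to exist on the page)
price = soup.find(class_='price')
if price:
    print(price.text.strip())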
A quick note on ethics: always be a polite scraper! Check a website's `robots.txt` file to see what they allow, don't hammer their server with too many requests too quickly, and never scrape personal data without permission.
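The standard library can even help with that robots.txt check; here's a minimal sketch using `urllib.robotparser` (the quotes.toscrape.com URLs are just examples):
# robots_check.py -- minimal sketch using the standard library; URLs are examples
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.set_url("http://quotes.toscrape.com/robots.txt")
rp.read()

# can_fetch() tells you whether the given user agent may crawl the path
if rp.can_fetch("*", "http://quotes.toscrape.com/page/1/"):
    print("OK to scrape this page.")
else:
    print("robots.txt disallows this page -- skip it.")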
Find a simple, static news website or blog that you like. Write a script that uses `requests` and `BeautifulSoup` to download the homepage and print out a list of all the article headlines you can find (they are often in `<h2>` or `<h3>` tags).
# scrape_headlines.py
import requests
from bs4 import BeautifulSoup

# This URL is a special page designed for scraping practice
URL = "http://quotes.toscrape.com/"

try:
    response = requests.get(URL)
    response.raise_for_status()  # Raise an exception for bad status codes (4xx or 5xx)
    soup = BeautifulSoup(response.text, 'html.parser')

    # On this site, the quotes are in a span with class="text"
    quotes = soup.find_all('span', class_='text')

    print("--- Found Quotes ---")
    for quote in quotes:
        print(f"- {quote.text}")
except requests.exceptions.RequestException as e:
    print(f"Error fetching the URL: {e}")
Have you ever visited a website where the content appears a second or two *after* the page loads? This is because of JavaScript. The `requests` library can't handle this; it only downloads the initial HTML. It can't run the JavaScript, so it can't see the final content. For these modern, dynamic websites, you need a more powerful tool: Selenium.
Selenium doesn't just download a page; it automates an actual web browser (like Chrome or Firefox). Your Python script can tell the browser to go to a URL, click buttons, fill out forms, and scroll down. Because it's a real browser, all the JavaScript runs just like it would for a human user. This allows your script to see the final, fully-rendered page.
The key to using Selenium effectively is learning to make your script wait. You can tell it to wait up to 10 seconds for a specific button to become clickable or for a piece of data to appear on the page. This makes your scripts robust and able to handle pages that take a few seconds to load their content.
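In Selenium, that kind of waiting is done with `WebDriverWait` and expected conditions rather than fixed `time.sleep()` calls. Here's a minimal sketch; the URL and element ID are placeholders.
# wait_example.py -- minimal sketch; the URL and element ID are placeholders
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Firefox()
try:
    driver.get("https://example.com")
    # Wait up to 10 seconds for the element to become clickable
    button = WebDriverWait(driver, 10).until(
        EC.element_to_be_clickable((By.ID, "submit-button"))
    )
    button.click()
finally:
    driver.quit()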
Go to a simple e-commerce website with a search bar. Write a Selenium script that navigates to the homepage, finds the search input field, types "python book" into it, and clicks the search button.
# selenium_search.py
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
import time

# NOTE: You need to have geckodriver (for Firefox) or chromedriver (for Chrome) installed.
driver = webdriver.Firefox()

try:
    # books.toscrape.com (the usual practice site) has no search bar,
    # so we demonstrate the search on Wikipedia instead.
    driver.get("https://en.wikipedia.org/wiki/Main_Page")

    # Find the search input element by its ID
    search_box = driver.find_element(By.ID, "searchInput")

    # Type text into the search box and press Enter
    search_box.send_keys("Python programming")
    search_box.send_keys(Keys.RETURN)

    print("Search submitted! Waiting for results...")

    # Give the page a moment to load
    time.sleep(5)

    print(f"Current page title is: {driver.title}")
finally:
    # Always close the browser window
    driver.quit()
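Selenium can also run the browser without opening a window, which is handy once a script runs on a schedule (and is exactly what a `--headless` flag would toggle in the capstone). A minimal sketch for Firefox; Chrome has an equivalent ChromeOptions.
# headless_firefox.py -- minimal sketch of running Firefox without a visible window
from selenium import webdriver

options = webdriver.FirefoxOptions()
options.add_argument("--headless")  # don't open a browser window

driver = webdriver.Firefox(options=options)
try:
    driver.get("https://en.wikipedia.org/wiki/Main_Page")
    print(f"Page title: {driver.title}")
finally:
    driver.quit()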
It's time to graduate from simple scripts to polished, professional-feeling Command-Line Interface (CLI) tools. A good CLI tool is more than just a script; it's a program that's easy for others to use, with clear instructions and flexible options. We'll use the `argparse` module we learned about earlier to define a robust interface for a new tool.
The goal is to combine skills from across the course. Your tool might take a file path as an argument (Day 16), parse it as a CSV (Day 17), and then use that data to scrape a website for more information (Day 27). `argparse` lets you define required inputs, optional flags (like `--verbose`), and even sub-commands (like `git pull` or `git commit`), which allow your tool to perform different actions.
Focus on the user experience. A good tool provides a helpful `--help` message, gives clear feedback as it runs, and formats its output neatly. Building a solid CLI is a fantastic way to package your automation or scraping scripts into a reusable and shareable tool.
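Sub-commands are set up with argparse's `add_subparsers()`. Here's a minimal sketch of a hypothetical tool with `fetch` and `clean` commands; the command names and their arguments are just illustrations.
# subcommands_example.py -- minimal sketch; the 'fetch' and 'clean' commands are hypothetical
import argparse

parser = argparse.ArgumentParser(description="A tool with git-style sub-commands.")
parser.add_argument("--verbose", action="store_true", help="Print extra detail.")
subparsers = parser.add_subparsers(dest="command", required=True)

# 'fetch' sub-command: takes a URL
fetch_parser = subparsers.add_parser("fetch", help="Download data from a URL.")
fetch_parser.add_argument("url", help="The URL to fetch.")

# 'clean' sub-command: takes a file path
clean_parser = subparsers.add_parser("clean", help="Tidy up a downloaded file.")
clean_parser.add_argument("path", help="Path of the file to clean.")

args = parser.parse_args()

if args.command == "fetch":
    print(f"Fetching {args.url}...")
elif args.command == "clean":
    print(f"Cleaning {args.path}...")
You would run it as, for example, `python subcommands_example.py fetch https://example.com`, and each sub-command gets its own `--help` page automatically.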
Build a simple CLI tool that scrapes the main headline from a news website. It should take one required argument: the name of the site (e.g., 'bbc', 'reuters'). Use a dictionary in your code to map the name to the actual URL and the correct CSS selector for the headline. Your tool should then print the headline.
# headline_scraper_cli.py
import argparse
import requests
from bs4 import BeautifulSoup

# A dictionary to hold the configuration for each site
SITE_CONFIG = {
    'toscrape': {
        'url': 'http://quotes.toscrape.com/',
        'selector': 'span.text'
    }
    # In a real tool, you would add more sites here
    # 'bbc': { 'url': 'https://www.bbc.com/news', 'selector': 'h3' }
}

parser = argparse.ArgumentParser(description="Scrape the main headline/quote from a website.")
parser.add_argument("site", choices=SITE_CONFIG.keys(), help="The short name of the site to scrape.")
args = parser.parse_args()

config = SITE_CONFIG[args.site]
url = config['url']
selector = config['selector']

try:
    response = requests.get(url)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, 'html.parser')

    # Find the FIRST matching element
    headline = soup.select_one(selector)

    if headline:
        print(headline.text.strip())
    else:
        print(f"Could not find the headline using selector: {selector}")
except requests.RequestException as e:
    print(f"Error fetching URL: {e}")
Congratulations, you've made it to the final day! For your capstone project, you won't be writing a script from scratch. Instead, you'll be doing something far more common in the life of a developer: taking an existing, working script and making it better. You'll be working with the real script I wrote for Whatchan to scrape and display football listings.
Your task is to refactor this script. It already works, but it's rigid. It has hard-coded values and doesn't give the user much control or feedback. Your job is to improve it by:
- Adding `argparse` to control its behavior (e.g., run headlessly, change the output directory).
- Adding `logging` to provide clear feedback about what the script is doing and to report errors gracefully.
This project will test your ability to read someone else's code, understand its logic, and carefully modify it without breaking it. This is the ultimate test of your new skills and will leave you with a powerful, practical tool that you've made your own. Good luck!
Find the `whatchan_amended.py` script from the course materials. First, just read through it and try to understand how it works. Run it and see the output. Then, start your refactoring. Add a `--headless` flag, an `--output-dir` argument, and a `--no-images` flag. Sprinkle `logging.info()` and `logging.error()` messages throughout the code to track its progress. You've got this!
# In your refactored whatchan.py, the top would look something like this:
import argparse
import logging
from pathlib import Path
# ... other imports

# 1. Set up Logging
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(message)s'
)

# 2. Set up Argparse
parser = argparse.ArgumentParser(description="Scrape football listings from Whatchan.")
parser.add_argument('--headless', action='store_true', help='Run browser in headless mode.')
parser.add_argument('--output-dir', type=Path, default=Path('output'), help='Directory to save output files.')
parser.add_argument('--no-images', action='store_true', help='Do not download channel images.')
args = parser.parse_args()

# 3. Use the arguments in your main function
def main():
    logging.info("Starting the scraper...")
    # Use args.headless to configure Selenium
    # Use args.output_dir to define save paths
    # Use "if not args.no_images:" to control image downloads
    # ...
    logging.info("Scraping complete.")

if __name__ == '__main__':
    main()