Python 30‑by‑30 Course
Welcome to your final module! This is where we put theory into practice by dissecting a real-world script. We'll examine `whatchan_amended.py`, the script that powers the daily football listings. By understanding how it works, you'll see how all the concepts you've learned—from variables to web scraping—come together to create something genuinely useful.
In Module 6, we learned how to apply Python to practical, real-world tasks:

- The `pathlib` and `datetime` modules.
- The `requests` library to download static web pages and `BeautifulSoup` to parse the HTML and extract data.
- The `Selenium` library to handle JavaScript-loaded content.

The `whatchan_amended.py` script is the engine behind the daily football listings page on whatchan.co.uk. Specifically, it's a fully automated content generation tool that creates the whatchan.co.uk/today/ page. Its job is to visit a TV listings website, find all of today's live football matches, grab some related news from the BBC, and then build a clean, user-friendly webpage with all that information. It uses Selenium to handle the modern, dynamic listings site and BeautifulSoup for the simpler BBC news page. Finally, it assembles an HTML file, a JSON file for other programs, and a plain text summary, ready to be published.
Below, you can view the entire script in your browser. If you'd prefer to view it in your own code editor or on a different screen, you can download the file directly.
whatchan_amended.py (Right-click and select "Save Link As..." to download)
```python
# whatchan_amended.py
# Scrapes football listings, generates HTML, JSON and optional image.
import os
import re
import json
import time
import shutil
import tempfile
from typing import List, Optional, Tuple
from datetime import datetime, date, timedelta, timezone

# Third-party imports - ensure these are installed
import requests
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service as ChromeService
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from webdriver_manager.chrome import ChromeDriverManager

# Optional imports
try:
    from PIL import Image, ImageDraw, ImageFont
    HAS_PILLOW = True
except ImportError:
    HAS_PILLOW = False

try:
    from zoneinfo import ZoneInfo
except ImportError:
    from backports.zoneinfo import ZoneInfo

# --- Constants and Configuration ---
SITE_NAME = "Whatchan"
BASE_URL = "https://whatchan.co.uk"
OUTPUT_DIR = "whatchan_today_assets"
GENERATE_IMAGE = HAS_PILLOW  # Set to False to disable image generation
FTA_CHANNELS = {"BBC", "ITV", "Channel 4", "Channel 5", "S4C"}
TV_URL = "https://www.live-footballontv.com/"
GOSSIP_URL = "https://www.bbc.co.uk/sport/football/gossip"
GOSSIP_KEYWORDS = {"transfer", "deal", "contract", "move", "signing", "bid", "talks"}
TIMEZONE = ZoneInfo("Europe/Sofia")

# ... (The rest of the script code is here but omitted for brevity) ...

if __name__ == "__main__":
    main()
```
Let’s break down the script into its core components. Below, you'll find explanations for each major part of the code.
The script starts by importing all the tools it needs from Python's standard library and the third-party libraries you've learned about. After the imports, it defines a set of global constants (e.g., `BASE_URL`, `FTA_CHANNELS`) that hold important configuration values, making the script easy to update.
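The `TIMEZONE` constant matters because the script has to decide what "today" means, independent of the server's local clock. Here is a minimal sketch of how such a constant is typically used (illustrative only, not lifted from the script itself):

```python
from datetime import datetime
from zoneinfo import ZoneInfo  # standard library since Python 3.9

TIMEZONE = ZoneInfo("Europe/Sofia")

# "Today" according to the configured timezone, not the machine's locale
now_local = datetime.now(TIMEZONE)
today = now_local.date()

# A human-readable heading for the generated page, e.g. "Saturday 01 March 2025"
heading = now_local.strftime("%A %d %B %Y")
print(heading)
```

Passing a timezone to `datetime.now()` is what keeps the "today" page correct even if the script runs on a server in a different region.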
This section defines a custom blueprint for our data: the `Fixture` class. Each `Fixture` object holds all the important information for one match. It also has helpful methods, like `is_fta()`, which tells you if the match is free-to-air. This Object-Oriented approach bundles the data and its related actions together. Alongside the class are several small helper functions that perform single, specific tasks like formatting dates.
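Since the class body is omitted from the listing above, here is a hypothetical sketch of what a `Fixture` class with an `is_fta()` method might look like. The field names and the channel-matching logic are assumptions for illustration, not the script's actual code:

```python
from dataclasses import dataclass, field

# From the script's constants: channels considered free-to-air
FTA_CHANNELS = {"BBC", "ITV", "Channel 4", "Channel 5", "S4C"}

@dataclass
class Fixture:
    """One match's details, bundled with its related behaviour.

    Hypothetical sketch -- the real script's fields may differ.
    """
    home: str
    away: str
    kickoff: str                 # e.g. "15:00"
    competition: str
    channels: list = field(default_factory=list)

    def is_fta(self) -> bool:
        """True if any broadcasting channel is free-to-air.

        Matches either the full channel name ("Channel 4") or its
        first word ("BBC One" -> "BBC").
        """
        return any(
            ch in FTA_CHANNELS or ch.split(" ")[0] in FTA_CHANNELS
            for ch in self.channels
        )

match = Fixture("Arsenal", "Chelsea", "17:30", "Premier League",
                ["Sky Sports Main Event"])
print(match.is_fta())  # → False: Sky Sports is not free-to-air
```

A `dataclass` keeps the boilerplate down: Python generates `__init__` and `__repr__` for us, and the method lives right next to the data it operates on.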
This is the main data-gathering step. The function uses Selenium to control a real Chrome browser because the listings website is dynamic and uses JavaScript to load its content. The script navigates to the URL, waits for the fixture list to appear, and then carefully loops through the HTML elements to collect all the match details for today.
This function performs a second, simpler scrape. The BBC news site is static, so it uses the faster `requests` library to download the HTML. Then, it uses `BeautifulSoup` to parse the HTML and find the paragraphs containing transfer rumours. This shows how you can choose the right tool for the job.
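The keyword-filtering step can be sketched without the network request. In the sketch below the paragraphs are plain strings standing in for the text BeautifulSoup would extract (e.g. via `p.get_text()`), and `filter_gossip` is a hypothetical helper name:

```python
# From the script's constants: words that suggest a transfer rumour
GOSSIP_KEYWORDS = {"transfer", "deal", "contract", "move", "signing", "bid", "talks"}

def filter_gossip(paragraphs):
    """Keep only paragraphs mentioning at least one transfer keyword.

    Uses a simple case-insensitive substring check, so it also matches
    inflected forms like "deals" (and, imperfectly, "movement").
    """
    return [
        text.strip()
        for text in paragraphs
        if any(keyword in text.lower() for keyword in GOSSIP_KEYWORDS)
    ]

sample = [
    "Arsenal have opened contract talks with their captain.",
    "Match report: a goalless draw at Anfield.",
]
print(filter_gossip(sample))  # keeps only the first paragraph
```

This is why a keyword set is defined as a constant at the top of the script: tweaking what counts as "gossip" is a one-line change.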
This function is the report generator. It takes the list of `Fixture` objects and assembles the final webpage by building up a large string of HTML. It uses f-strings to neatly insert the data into a template. This function also creates the interactive filter buttons and generates the JSON-LD structured data, a hidden block that helps search engines like Google understand the page content.
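A stripped-down sketch of this build step is shown below, using f-strings for the template and `json.dumps` for the JSON-LD block. `build_page` is a hypothetical helper name, and the real function produces far richer markup (filter buttons, styling, full schema.org fields):

```python
import json
from html import escape

def build_page(fixtures):
    """Assemble a minimal HTML page from (home, away, kickoff, channel)
    tuples. Simplified sketch of the report-generation idea only.
    """
    rows = "\n".join(
        f"<li>{escape(home)} v {escape(away)} "
        f"({escape(kickoff)}, {escape(channel)})</li>"
        for home, away, kickoff, channel in fixtures
    )
    # JSON-LD structured data: a hidden block that helps search
    # engines understand what the page lists.
    json_ld = json.dumps({
        "@context": "https://schema.org",
        "@type": "ItemList",
        "numberOfItems": len(fixtures),
    })
    return (
        "<!DOCTYPE html>\n<html>\n<body>\n"
        f"<ul>\n{rows}\n</ul>\n"
        f'<script type="application/ld+json">{json_ld}</script>\n'
        "</body>\n</html>"
    )

page = build_page([("Arsenal", "Chelsea", "17:30", "Sky Sports")])
```

Note the use of `html.escape()` on every scraped value: anything pulled from an external site should be escaped before being embedded in your own HTML.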
The `if __name__ == "__main__":` block is the script's entry point. The `main()` function acts as the conductor of the orchestra. It calls all the other functions in the correct order: scrape, fetch, build, and then write the output files to a directory.
If this free 30-by-30 course has helped you on your Python journey, please consider a small donation. Your support helps cover server costs and allows me to create more free, high-quality learning materials for everyone. Thank you!
Buy Me a Coffee ☕

You've completed the course! Let's test your knowledge with a final quiz. This will cover concepts from all the modules. Choose the best answer for each question and submit at the end to see your score and get detailed feedback.
This course is an example of the practical, hands-on training materials I love to create. With over a decade of experience as a Learning & Development Specialist, I can help you or your business develop engaging, effective eLearning solutions for any topic.
Whether you need to onboard new hires, upskill your current team, or create a public-facing educational resource, I can design and build a custom course tailored to your specific needs.
Let's talk about your project.