Python script: bulk download from Wikipedia (solving HTTP error 403)
# save Wikipedia HTML pages (properly named) from a text list of URLs
# NOTE: page titles can contain '/' (e.g. https://en.wikipedia.org/wiki/V/H/S),
# which must be escaped when used as filenames
from bs4 import BeautifulSoup
import urllib.request

failed = []
with open('wk.txt') as f:
    for line in f:
        if "file:///wiki/" in line:
            line = line.replace("file:///wiki/", "https://en.wikipedia.org/wiki/").strip()
        if "http" in line.lower():
            # sending a browser-like User-Agent avoids Wikipedia's HTTP 403 for script requests
            headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'}
            req = urllib.request.Request(line, headers=headers)
            try:
                # body reconstructed (the original was truncated here): fetch the page
                # and save it under its <title>, escaping '/' so the name is a valid filename
                html = urllib.request.urlopen(req).read()
                title = BeautifulSoup(html, 'html.parser').title.string
                with open(title.replace('/', '%2F') + '.html', 'wb') as out:
                    out.write(html)
            except Exception:
                failed.append(line.strip())
print(failed)
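The escaping note matters because a title like `V/H/S` would otherwise be interpreted as a directory path when saving. A minimal sketch of a sanitizing helper (the name `safe_filename` is my own; it uses the standard-library `urllib.parse.quote` to percent-encode the slash):

```python
from urllib.parse import quote

def safe_filename(title: str) -> str:
    # percent-encode every reserved character (safe='' encodes '/' too),
    # so the page title becomes a valid single-component filename
    return quote(title, safe='') + '.html'

print(safe_filename("V/H/S"))  # → V%2FH%2FS.html
```

Any reversible substitution works here; percent-encoding has the advantage that the original title can be recovered later with `urllib.parse.unquote`.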