Computer Science Atlas

Python 3: Download a Webpage or File from URL

February 2, 2021|Updated February 3, 2021
 
Table of Contents

Download as Text

Download the data as text if you want to store the webpage or file in a string and process it with string methods such as split() and find().

Read to String

from urllib.request import urlopen

with urlopen( 'https://example.com/' ) as webpage:
    content = webpage.read().decode()

print( content )
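As an illustration of processing the downloaded string with find(), the sketch below pulls out the page title. The helper name extract_title is hypothetical, and the naive search assumes a lowercase <title> tag with no attributes. For real-world HTML, a parser such as the standard library's html.parser is more robust.

```python
from typing import Optional


def extract_title(html: str) -> Optional[str]:
    """Return the text inside the first <title>...</title> pair, or None."""
    # Naive search: assumes a lowercase <title> tag with no attributes.
    start = html.find('<title>')
    if start == -1:
        return None
    start += len('<title>')
    end = html.find('</title>', start)
    if end == -1:
        return None
    return html[start:end].strip()


# A literal string stands in for the downloaded `content` here.
sample = '<html><head><title>Example Domain</title></head><body></body></html>'
print(extract_title(sample))  # Example Domain
```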

.read() returns the response body as raw bytes, and .decode() converts those bytes to a string, using UTF-8 by default. Plain ASCII text needs no argument, because ASCII is a subset of UTF-8. If the text uses a different encoding, such as ISO-8859-1 (Latin-1), pass the encoding name to decode() explicitly:

content = webpage.read().decode( 'latin-1' )
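Rather than hard-coding an encoding, you can try the charset the server declares in its Content-Type header, which urlopen() exposes through the response's headers object. The following is a minimal sketch, assuming UTF-8 is an acceptable fallback when no charset is declared; decode_body and fetch_text are hypothetical helper names.

```python
from typing import Optional
from urllib.request import urlopen


def decode_body(raw: bytes, charset: Optional[str]) -> str:
    """Decode raw bytes with the given charset, falling back to UTF-8."""
    # Assumption: UTF-8 is a sensible default when the server sends no charset.
    return raw.decode(charset or 'utf-8')


def fetch_text(url: str) -> str:
    """Download url and decode it using the charset from the Content-Type header."""
    with urlopen(url) as webpage:
        # get_content_charset() returns e.g. 'iso-8859-1', or None if no charset is declared.
        charset = webpage.headers.get_content_charset()
        return decode_body(webpage.read(), charset)
```

For example, fetch_text( 'https://example.com/' ) would download the page and decode it with whatever charset the server reports.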

Save to File (Works Only for Decoded Text Data)

from urllib.request import urlopen

# Download from URL and decode as UTF-8 text.
with urlopen( 'https://example.com/' ) as webpage:
    content = webpage.read().decode()

# Save to file, writing UTF-8 explicitly instead of relying on the platform default encoding.
with open( 'output.html', 'w', encoding='utf-8' ) as output:
    output.write( content )

Download as Binary Data to bytes

If you don't need string methods such as find() on the downloaded data, or if the data isn't text at all (e.g., image, video, or Excel files), you can skip decoding and treat it as binary data (type bytes).

Read to Variable

from urllib.request import urlopen

with urlopen( 'https://example.com/file.png' ) as file:
    content = file.read()
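As a small sketch of working with the result directly as bytes, the code below checks whether the data begins with the 8-byte PNG file signature, and shows that decoding such data as UTF-8 fails. The helper name looks_like_png is hypothetical, and a literal byte string stands in for a real download.

```python
# The 8-byte signature that every PNG file starts with.
PNG_SIGNATURE = b'\x89PNG\r\n\x1a\n'


def looks_like_png(content: bytes) -> bool:
    """Return True if content begins with the PNG file signature."""
    return content.startswith(PNG_SIGNATURE)


# Literal bytes stand in for the downloaded `content` here.
sample = PNG_SIGNATURE + b'\x00\x00\x00\rIHDR'
print(looks_like_png(sample))  # True

# Decoding binary data as UTF-8 usually fails: 0x89 is not a valid UTF-8 start byte.
try:
    sample.decode()
except UnicodeDecodeError as err:
    print('Not UTF-8 text:', err.reason)
```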

Save to File (Works for Text or Binary Data)

from urllib.request import urlopen

# Download from URL.
with urlopen( 'https://example.com/' ) as webpage:
    content = webpage.read()

# Save to file.
with open( 'output.html', 'wb' ) as download:
    download.write( content )
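The example above holds the whole file in memory before writing it. For large downloads, a sketch that copies the response to disk in chunks with shutil.copyfileobj() may be preferable; save_stream and download_file are hypothetical helper names, and save_stream accepts any readable binary file-like object, including the response returned by urlopen().

```python
import shutil
from typing import BinaryIO
from urllib.request import urlopen


def save_stream(source: BinaryIO, path: str) -> None:
    """Copy a readable binary stream to path in chunks instead of all at once."""
    with open(path, 'wb') as download:
        # copyfileobj reads and writes fixed-size chunks, so memory use stays bounded.
        shutil.copyfileobj(source, download)


def download_file(url: str, path: str) -> None:
    """Stream the response body of url straight into a file."""
    with urlopen(url) as response:
        save_stream(response, path)
```

For example, download_file( 'https://example.com/file.png', 'file.png' ) would write the image to disk without first reading it into a bytes variable.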