Computer Science Atlas

Python 3 Examples: Download a Webpage or File from URL

February 2, 2021 | Last Updated February 3, 2021
 

Download as Text

Download the data as text if you want to store the webpage or file in a string and process it with string methods such as split() and find().
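
As a small sketch of that kind of processing, the snippet below applies find() and split() to a string that stands in for downloaded content. The sample HTML and the extract_title helper are illustrative assumptions, not part of any library.

```python
# Hypothetical sample standing in for a downloaded page.
content = '<html><head><title>Example Domain</title></head><body>Hello</body></html>'


def extract_title(html):
    """Return the text between <title> and </title>, or None if absent."""
    start = html.find('<title>')
    if start == -1:
        return None
    start += len('<title>')
    end = html.find('</title>', start)
    if end == -1:
        return None
    return html[start:end]


title = extract_title(content)
words = title.split()

print(title)  # Example Domain
print(words)  # ['Example', 'Domain']
```

For anything beyond simple lookups like this, a real HTML parser is usually a safer choice than string searching.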

Read to String

from urllib.request import urlopen

with urlopen( 'https://example.com/' ) as webpage:
    content = webpage.read().decode()

print( content )

.read() downloads the data as raw bytes, and .decode() then converts those bytes to a string, using UTF-8 by default. If the text uses an encoding that is not compatible with UTF-8, such as ISO-8859-1 (Latin-1), you have to pass the encoding name explicitly as an argument to decode():

content = webpage.read().decode( 'latin-1' )
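
Rather than hard-coding an encoding, you can often read it from the response itself. The sketch below assumes the server declares a charset in its Content-Type header and falls back to UTF-8 when it does not. The download_text name and its default_encoding parameter are hypothetical.

```python
from urllib.request import urlopen


def download_text(url, default_encoding='utf-8'):
    """Download url and decode it with the charset the response declares."""
    with urlopen(url) as response:
        # get_content_charset() reads the charset parameter of the
        # Content-Type header and returns None when it is missing.
        encoding = response.headers.get_content_charset() or default_encoding
        return response.read().decode(encoding)
```

A call such as download_text( 'https://example.com/' ) would then return the page as a string. A declared charset can still be wrong, so decode() may raise UnicodeDecodeError for some pages.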

Save to File (Works Only for Decoded Text Data)

from urllib.request import urlopen

# Download from URL and decode as UTF-8 text.
with urlopen( 'https://example.com/' ) as webpage:
    content = webpage.read().decode()

# Save to file as UTF-8 text (the default encoding of open() depends on the platform).
with open( 'output.html', 'w', encoding='utf-8' ) as output:
    output.write( content )

Download as Binary Data to bytes

If you don't need to process the downloaded data as text, or if the data isn't text at all (e.g., image, video, or Excel files), you can keep it as binary data (type bytes).

Read to Variable

from urllib.request import urlopen

with urlopen( 'https://example.com/file.png' ) as file:
    content = file.read()
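
Once the data is in a bytes object, you can inspect it without decoding it. As a minimal sketch, the snippet below checks whether some bytes begin with the eight-byte PNG signature. The looks_like_png helper and the sample values, which stand in for the result of file.read(), are illustrative assumptions.

```python
# The first eight bytes of every PNG file.
PNG_SIGNATURE = b'\x89PNG\r\n\x1a\n'


def looks_like_png(data):
    """Return True if data (a bytes object) starts with the PNG signature."""
    return data.startswith(PNG_SIGNATURE)


# Hypothetical stand-ins for downloaded content.
png_bytes = PNG_SIGNATURE + b'\x00\x00\x00\rIHDR'
html_bytes = b'<!doctype html><html></html>'

print(type(png_bytes))             # <class 'bytes'>
print(len(png_bytes))              # 16
print(looks_like_png(png_bytes))   # True
print(looks_like_png(html_bytes))  # False
```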

Save to File (Works for Text or Binary Data)

from urllib.request import urlopen

# Download from URL.
with urlopen( 'https://example.com/' ) as webpage:
    content = webpage.read()

# Save to file.
with open( 'output.html', 'wb' ) as download:
    download.write( content )
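
The example above reads the whole response into memory before writing it, which may be a problem for large files such as videos. One possible alternative, sketched below, copies the response to disk in chunks with shutil.copyfileobj(). The download_to_file name and the chunk_size default are assumptions chosen for illustration.

```python
import shutil
from urllib.request import urlopen


def download_to_file(url, path, chunk_size=64 * 1024):
    """Stream url to path in chunks instead of reading it all into memory."""
    with urlopen(url) as response, open(path, 'wb') as output:
        # copyfileobj() repeatedly reads up to chunk_size bytes from the
        # response and writes them to the output file.
        shutil.copyfileobj(response, output, chunk_size)
```

A call such as download_to_file( 'https://example.com/file.png', 'file.png' ) would then save the file without holding all of it in a variable.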