Python 3: Download a Webpage or File from URL
February 2, 2021|Updated February 3, 2021
Download as Text
Downloading as text is the right choice when you want to store the webpage or file in a string and process it with string methods such as split() and find().
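As a small illustration of that kind of processing, the sketch below pulls the title out of a page with find() and breaks it into words with split(). The sample string is an assumption standing in for downloaded text; real pages are messier, and a proper HTML parser is usually the safer tool.

```python
# Sample text standing in for a downloaded page (an assumption for illustration).
content = '<html><head><title>Example Domain</title></head><body>Hello</body></html>'

# find() returns the index of the first match, or -1 if there is none.
start = content.find('<title>')
end = content.find('</title>')
if start != -1 and end != -1:
    title = content[start + len('<title>'):end]
else:
    title = None

# split() breaks the string into a list of pieces around whitespace.
words = title.split() if title else []

print(title)
print(words)
```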
Read to String
from urllib.request import urlopen

with urlopen( 'https://example.com/' ) as webpage:
    content = webpage.read().decode()

print( content )
.read() first downloads the data as raw bytes, then .decode() converts it to a string using UTF-8, which is the default encoding for decode(). If the text is encoded in a different format, such as ISO-8859-1 (Latin-1), you have to specify the format explicitly as an argument to decode(). Plain ASCII text needs no special handling, because ASCII is a subset of UTF-8.
content = webpage.read().decode( 'iso-8859-1' )
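Rather than hard-coding the encoding, you can often take it from the response itself. The following is a minimal sketch, assuming the server declares a charset in its Content-Type header. The function name fetch_text and the fallback_encoding parameter are made up for this example. It does not look at charset declarations inside the HTML.

```python
from urllib.request import urlopen


def fetch_text(url, fallback_encoding='utf-8'):
    """Download url and decode it with the charset the server declares."""
    with urlopen(url) as response:
        # get_content_charset() reads the charset parameter of the
        # Content-Type header and returns None when it is absent.
        charset = response.headers.get_content_charset()
        return response.read().decode(charset or fallback_encoding)
```

Calling fetch_text( 'https://example.com/' ) would return the page as a string. UTF-8 is used only when the header names no charset.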
Save to File (Works Only for Decoded Text Data)
from urllib.request import urlopen

# Download from URL and decode as UTF-8 text.
with urlopen( 'https://example.com/' ) as webpage:
    content = webpage.read().decode()

# Save to file. Name the encoding so the result doesn't depend on the platform default.
with open( 'output.html', 'w', encoding='utf-8' ) as output:
    output.write( content )
Download as Binary Data to bytes
If you don't need to use string operations like find() on the downloaded data, or if the data isn't text data at all (e.g., image, video, or Excel files), then you can simply treat it as binary data (type bytes).
Read to Variable
from urllib.request import urlopen

with urlopen( 'https://example.com/file.png' ) as file:
    content = file.read()
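A bytes object still supports many familiar operations, such as len(), slicing, and startswith(). As a rough sketch, the hypothetical helper below checks whether downloaded data begins with the eight-byte signature that opens every PNG file. It is a quick sanity check and does not validate the whole image.

```python
PNG_SIGNATURE = b'\x89PNG\r\n\x1a\n'  # first eight bytes of every PNG file


def looks_like_png(data):
    """Return True if data (a bytes object) starts with the PNG signature."""
    return data.startswith(PNG_SIGNATURE)
```

A check like looks_like_png( content ) can catch the case where a server answers with an HTML error page instead of the expected image.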
Save to File (Works for Text or Binary Data)
from urllib.request import urlopen

# Download from URL.
with urlopen( 'https://example.com/' ) as webpage:
    content = webpage.read()

# Save to file.
with open( 'output.html', 'wb' ) as download:
    download.write( content )
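Calling read() without arguments holds the entire download in memory, which can be a problem for large files. One possible alternative, sketched here with a hypothetical helper named download_to_file, is to copy the response to disk block by block with shutil.copyfileobj() from the standard library.

```python
import shutil
from urllib.request import urlopen


def download_to_file(url, path):
    """Stream url to path in blocks instead of reading it all into memory."""
    with urlopen(url) as response, open(path, 'wb') as output:
        # copyfileobj() repeatedly reads a block from the response
        # and writes it to the file until the response is exhausted.
        shutil.copyfileobj(response, output)
```

For example, download_to_file( 'https://example.com/', 'output.html' ) would save the page without building one large bytes object first.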