Computer Science Atlas

Python 3 Examples: List the Contents of a Directory, Including Recursively

March 27, 2021
 

This article shows how to list the files and directories inside a directory using Python 3. Throughout this article, we'll refer to the following example directory structure:

Files
mydir/
    alpha/
        a1.html
        a2.html
    beta/
        b1.html
        b2.html
    index.html
    script.py

We'll assume the code examples will be saved in script.py above, and will be run from inside the mydir directory so that the relative path '.' always refers to mydir.

Using pathlib (Python 3.4 and up)

Non-Recursive

iterdir

To list the contents of a directory using Python 3.4 or higher, we can use iterdir() from the standard library's pathlib module to iterate through the contents. In our example directory, we can write in script.py:

from pathlib import Path

for p in Path( '.' ).iterdir():
    print( p )

When we run from inside mydir, we should see output like:

Terminal
$ python3 script.py
alpha
beta 
index.html
script.py

Because iterdir is non-recursive, it only lists the immediate contents of mydir and not the contents of subdirectories (like a1.html).

Note that each item returned by iterdir is also a pathlib.Path, so we can call any pathlib.Path method on the object. For example, to resolve each item as an absolute path, we can write in script.py:

from pathlib import Path

for p in Path( '.' ).iterdir():
    print( p.resolve() )

This will list the resolved absolute path of each item instead of just the filenames.
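
Because every item is a Path, we can also ask each one what kind of entry it is. The following is a minimal sketch rather than part of the original example; the helper name describe_entries is made up for illustration:

```python
from pathlib import Path


def describe_entries(directory):
    """Return sorted (name, kind) pairs for the immediate contents of directory."""
    entries = []
    for p in Path(directory).iterdir():
        # is_dir() and is_file() follow symbolic links by default.
        if p.is_dir():
            kind = 'directory'
        elif p.is_file():
            kind = 'file'
        else:
            kind = 'other'
        # p.name is the final path component, without the parent directory.
        entries.append((p.name, kind))
    return sorted(entries)


if __name__ == '__main__':
    for name, kind in describe_entries('.'):
        print(name, kind)
```

Run from inside mydir, this should label alpha and beta as directories and index.html and script.py as files.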

Because iterdir returns a generator object (meant to be used in loops), if we want to store the results in a list variable, we can write:

from pathlib import Path

files = list( Path( '.' ).iterdir() )
print( files )
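
iterdir yields its items in arbitrary order, so the list may come out in a different order on another machine. If a stable order matters, one option is to sort the results. A minimal sketch (the function name sorted_names is hypothetical):

```python
from pathlib import Path


def sorted_names(directory):
    """Return the names of the immediate contents of directory, in sorted order."""
    # Path objects can be compared with each other, so sorted() accepts them directly.
    return [p.name for p in sorted(Path(directory).iterdir())]


if __name__ == '__main__':
    print(sorted_names('.'))
```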

glob

We can also use pathlib.Path.glob with the pattern '*' to list every item in a directory (the equivalent of iterdir):

from pathlib import Path

for p in Path( '.' ).glob( '*' ):
    print( p )

Filename Pattern Matching with glob

If we want to filter our results using Unix shell-style wildcard patterns, glob can handle that too. For example, if we only want to list .html files, we would write in script.py:

from pathlib import Path

for p in Path( '.' ).glob( '*.html' ):
    print( p )

As with iterdir, glob returns a generator object, so we'll have to use list() if we want to convert it to a list:

from pathlib import Path

files = list( Path( '.' ).glob( '*.html' ) )
print( files )
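
pathlib's glob patterns do not support brace alternatives such as *.{html,py}, so a single pattern cannot easily match several extensions at once. One possible workaround, sketched here on the assumption that comparing file suffixes is good enough, is to filter the results of iterdir. The function name files_with_suffixes is made up for this example:

```python
from pathlib import Path


def files_with_suffixes(directory, suffixes):
    """Return sorted names of the files in directory whose suffix is in suffixes."""
    wanted = set(suffixes)
    return sorted(
        p.name
        for p in Path(directory).iterdir()
        # p.suffix includes the leading dot, for example '.html'.
        if p.is_file() and p.suffix in wanted
    )


if __name__ == '__main__':
    print(files_with_suffixes('.', ['.html', '.py']))
```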

Recursive

To recursively list the entire directory tree rooted at a particular directory (including the contents of subdirectories), we can use rglob. In script.py, we can write:

from pathlib import Path

for p in Path( '.' ).rglob( '*' ):
    print( p )

This time, when we run script.py from inside mydir, we should see output like:

Terminal
$ python3 script.py
alpha
beta
index.html
script.py
alpha/a1.html
alpha/a2.html
beta/b1.html
beta/b2.html

rglob is the equivalent of calling glob with **/ added to the beginning of the pattern, so the following code is equivalent to the rglob code we just saw:

from pathlib import Path

for p in Path( '.' ).glob( '**/*' ):
    print( p )
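
Both forms return directories as well as files. If only files are wanted, one approach is to filter with is_file(). This is a minimal sketch; the function name list_files_recursively is hypothetical, and returning paths relative to the root as strings is just one possible design:

```python
from pathlib import Path


def list_files_recursively(directory):
    """Return sorted paths, relative to directory, of every file in the tree."""
    root = Path(directory)
    return sorted(
        # as_posix() renders the path with forward slashes on every platform.
        p.relative_to(root).as_posix()
        for p in root.rglob('*')
        if p.is_file()
    )


if __name__ == '__main__':
    for path in list_files_recursively('.'):
        print(path)
```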

Filename Pattern Matching with rglob

Just as with glob, rglob also allows glob-style pattern matching, but automatically does so recursively. In our example, to list all *.html files in the directory tree rooted at mydir, we can write in script.py:

from pathlib import Path

for p in Path( '.' ).rglob( '*.html' ):
    print( p )

This should display only the .html files, including those inside subdirectories:

Terminal
$ python3 script.py
index.html   
alpha/a1.html
alpha/a2.html
beta/b1.html
beta/b2.html

Since rglob is the same as calling glob with **/, we could also just use glob to achieve the same result:

from pathlib import Path

for p in Path( '.' ).glob( '**/*.html' ):
    print( p )

Not Using pathlib

Non-Recursive

os.listdir

On any version of Python 3, we can use the os module from the standard library to list directory contents. In script.py, we can write:

import os

for filename in os.listdir( '.' ):
    print( filename )

Unlike with pathlib, os.listdir simply returns filenames as strings, so we can't call methods like .resolve() on the result items. To get full paths, we have to build them manually:

import os

root = '.'
for filename in os.listdir( root ):
    relative_path = os.path.join( root, filename )
    absolute_path = os.path.abspath( relative_path )
    print( absolute_path )

Another difference from pathlib is that os.listdir returns a list of strings, so we don't need to call list() on the result to convert it to a list:

import os

files = os.listdir( '.' )   # files is a list
print( files )
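
Since os.listdir returns plain names, telling files from directories takes an extra step with os.path. The following is a minimal sketch of one way to do it; the function name split_entries is made up for illustration:

```python
import os


def split_entries(directory):
    """Return (dirs, files): sorted names of subdirectories and of regular files."""
    dirs, files = [], []
    for name in os.listdir(directory):
        # os.listdir returns bare names, so join each one with the
        # directory before asking what kind of entry it is.
        full_path = os.path.join(directory, name)
        if os.path.isdir(full_path):
            dirs.append(name)
        elif os.path.isfile(full_path):
            files.append(name)
    return sorted(dirs), sorted(files)


if __name__ == '__main__':
    dirs, files = split_entries('.')
    print('directories:', dirs)
    print('files:', files)
```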

glob

Also available on all versions of Python 3 is the standard library's glob module, which provides Unix shell-style filename pattern matching.

To list the items in a directory (similar to os.listdir, except that names beginning with a dot are not matched by '*'), we can write in script.py:

import glob

for filename in glob.glob( './*' ):
    print( filename )

This will produce output like:

Terminal
$ python3 script.py
./alpha
./beta
./index.html
./script.py

Note that the root directory ('.' in our example) is simply included in the path pattern passed into glob.glob().
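
One difference worth knowing is that the '*' pattern in the glob module skips names that begin with a dot, while os.listdir returns them. The sketch below compares the two for a given directory; the function name visible_and_all is hypothetical, and glob.escape is used so that special characters in the directory name are not treated as wildcards:

```python
import glob
import os


def visible_and_all(directory):
    """Return (names matched by '*', names from os.listdir), both sorted."""
    pattern = os.path.join(glob.escape(directory), '*')
    # glob.glob returns paths that include the directory, so keep only the names.
    matched = sorted(os.path.basename(path) for path in glob.glob(pattern))
    everything = sorted(os.listdir(directory))
    return matched, everything


if __name__ == '__main__':
    matched, everything = visible_and_all('.')
    print('glob:   ', matched)
    print('listdir:', everything)
```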

Filename Pattern Matching with glob

To list only .html files, we can write in script.py:

import glob

for filename in glob.glob( './*.html' ):
    print( filename )

Recursive

Since glob.glob only gained a recursive option (the recursive=True argument) in Python 3.5, and Python versions 3.4 and up already have pathlib.Path.rglob, we'll skip recursive examples of glob.glob here.

os.walk

On any version of Python 3, we can use os.walk to list all the contents of a directory recursively.

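os.walk() returns a generator object that can be used with a for loop. Each iteration yields a 3-tuple that represents one directory in the directory tree: the path of the current directory, a list of the names of its subdirectories, and a list of the names of its files.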

In our example, we can write in script.py:

import os

for current_dir, subdirs, files in os.walk( '.' ):
    # Current Iteration Directory
    print( current_dir )

    # Directories
    for dirname in subdirs:
        print( '\t' + dirname )

    # Files
    for filename in files:
        print( '\t' + filename )

This produces the following output:

Terminal
$ python3 script.py
.
        alpha     
        beta      
        index.html
        script.py 
./alpha
        a1.html
        a2.html
./beta
        b1.html
        b2.html

To get full paths instead of just filenames, we can write:

import os

for current_dir, subdirs, files in os.walk( '.' ):
    for dirname in subdirs:
        relative_path = os.path.join( current_dir, dirname )
        absolute_path = os.path.abspath( relative_path )
        print( absolute_path )
    for filename in files:
        relative_path = os.path.join( current_dir, filename )
        absolute_path = os.path.abspath( relative_path )
        print( absolute_path )
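
In its default top-down mode, os.walk also lets us skip parts of the tree: removing a name from the subdirectory list in place stops the walk from descending into that directory. The following is a minimal sketch of that idea; the function name walk_skipping and its skip_names parameter are made up for illustration:

```python
import os


def walk_skipping(root, skip_names):
    """Return sorted file paths under root, without entering directories in skip_names."""
    found = []
    for current_dir, subdirs, files in os.walk(root):
        # Slice assignment edits the same list object that os.walk uses to
        # decide where to go next. This only works in top-down mode (the default).
        subdirs[:] = [d for d in subdirs if d not in skip_names]
        for filename in files:
            found.append(os.path.join(current_dir, filename))
    return sorted(found)


if __name__ == '__main__':
    for path in walk_skipping('.', {'beta'}):
        print(path)
```

Rebinding the name instead (subdirs = [...]) would have no effect, because os.walk would still hold the original list.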

Filename Pattern Matching with walk

To filter results based on filenames, we have to manually write pattern matching code. To accomplish that, we can use regular expressions or string methods on the filenames. For example, to only list .html files in our example directory, we can write in script.py:

import os

for current_dir, _, files in os.walk( '.' ):
    # Skip subdirs since we're only interested in files.
    for filename in files:
        if filename.endswith( '.html' ):
            relative_path = os.path.join( current_dir, filename )
            absolute_path = os.path.abspath( relative_path )
            print( absolute_path )
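
If we would rather keep glob-style patterns than write string checks by hand, the standard library's fnmatch module can match filenames against shell-style wildcards. This is a minimal sketch of combining it with os.walk, not the only way to do it; the function name find_matching is hypothetical:

```python
import fnmatch
import os


def find_matching(root, pattern):
    """Return sorted paths, relative to root, of files whose name matches pattern."""
    matches = []
    for current_dir, _, files in os.walk(root):
        # fnmatch.filter keeps only the names that match the wildcard pattern.
        for filename in fnmatch.filter(files, pattern):
            full_path = os.path.join(current_dir, filename)
            matches.append(os.path.relpath(full_path, root))
    return sorted(matches)


if __name__ == '__main__':
    for path in find_matching('.', '*.html'):
        print(path)
```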