Structuring Your Python Project: A Best Practice Guide
A well-structured project is the cornerstone of maintainable, scalable, and collaborative software. This document aims to provide a modern best-practice guide on how to structure a typical Python project.
1. Why is Project Structure Important?
Good project structure is about more than just aesthetics. When a potential contributor or user lands on your repository, a clear structure is the first step for them to understand your project. More importantly, in the long run, a logically organized structure enables:
- Reduced Cognitive Load: Allows team members to quickly locate the code they need to read or modify.
- Simplified Dependencies and Imports: Avoids complex relative import and path issues.
- Easier Automation: Makes automated processes like testing, building, and deployment easier to configure.
2. The "Gold Standard" Project Structure Example
The following is a widely recognized project structure suitable for most small to medium-sized Python applications or libraries.
my_project/
├── .gitignore          # List of files for Git to ignore
├── docs/               # Documentation directory
│   ├── conf.py
│   └── index.rst
├── src/                # Source code directory (src-layout)
│   └── my_package/     # Your Python package
│       ├── __init__.py
│       ├── module1.py
│       └── module2.py
├── tests/              # Test directory
│   ├── test_module1.py
│   └── test_module2.py
├── LICENSE             # Project license
├── Makefile            # (Optional) Task runner
├── pyproject.toml      # The core configuration file for modern Python projects
└── README.md           # Project description
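As a rough illustration, the layout above can be scaffolded with a few lines of Python. This is only a sketch: the `scaffold` helper is a hypothetical name, and it creates empty placeholder files rather than meaningful content.

```python
import tempfile
from pathlib import Path


def scaffold(root: Path, package: str = "my_package") -> list[str]:
    """Create empty placeholder files for the layout and return their relative paths."""
    relative_paths = [
        ".gitignore",
        "LICENSE",
        "Makefile",
        "README.md",
        "pyproject.toml",
        "docs/conf.py",
        "docs/index.rst",
        f"src/{package}/__init__.py",
        f"src/{package}/module1.py",
        f"src/{package}/module2.py",
        "tests/test_module1.py",
        "tests/test_module2.py",
    ]
    for rel in relative_paths:
        path = root / rel
        # Create intermediate directories (including the project root) as needed.
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch()
    return relative_paths


# Try it in a throwaway directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    project_root = Path(tmp) / "my_project"
    for rel in scaffold(project_root):
        print(rel)
```

Pointing `scaffold` at a real directory instead of a temporary one gives you an empty skeleton to fill in.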
3. Detailed Breakdown of Each Part
README.md
This is the front page of your project. It should clearly explain:
- What the project does.
- How to install and configure it.
- A quick-start usage example.
- How to contribute to the project.
LICENSE
A legal document that defines how others can use, modify, and distribute your code. If you're unsure which license to use, visit choosealicense.com. The absence of a license prevents many people from using your code with confidence.
.gitignore
Tells Git which files or directories should not be included in version control. A typical Python .gitignore file would include:
- Virtual environment directories (.venv/, env/)
- Python cache files (__pycache__/, *.pyc)
- IDE and OS-generated files (.idea/, .vscode/, .DS_Store)
- Build artifacts (build/, dist/, *.egg-info)
pyproject.toml
This is the core of a modern Python project. Building on PEP 518 (the [build-system] table) and PEP 621 (the [project] metadata table), this file unifies the project's build information and metadata, taking over the role of setup.py, setup.cfg, and (for declaring a package's dependencies) requirements.txt.
It should contain:
- Project Metadata: The [project] table, including name, version, description, authors, license, etc.
- Project Dependencies: The dependencies list inside the [project] table, defining the libraries required for the project to run.
- Development Dependencies: Usually defined in a group named dev or test under [project.optional-dependencies].
- Build System Information: The [build-system] table, specifying the tools required to build the project (e.g., poetry-core or setuptools).
src/ Directory Layout (Src Layout)
This is a key practice in modern Python project structuring: placing your main source code package inside a src directory.
Why use a src layout?
- Avoids Accidental Imports: If your package sits in the repository root, Python can import it straight from the current working directory during development, even if it is not properly installed. Tests can then pass locally while hiding packaging mistakes (such as modules or data files missing from the built distribution) that surface as an ImportError when someone else installs your package via pip. The src layout forces you to install your project (typically in editable mode, pip install -e .) for local development, so your test environment behaves much more like a user's installation environment.
- Clear Separation of Concerns: It clearly separates your source code from other parts of the project like docs, tests, and configuration files.
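The import behavior behind the first point can be checked with a small experiment. The sketch below builds two throwaway projects (the package names `demo_flat_pkg` and `demo_src_pkg` are hypothetical) and asks whether each package can be found when only the project root is on `sys.path`, which is roughly the situation when you run Python from the repository root without installing anything.

```python
import importlib
import importlib.util
import sys
import tempfile
from pathlib import Path


def importable_from(root: Path, name: str) -> bool:
    """Return True if `name` can be found when `root` is added to sys.path."""
    sys.path.insert(0, str(root))
    importlib.invalidate_caches()
    try:
        return importlib.util.find_spec(name) is not None
    finally:
        sys.path.remove(str(root))


with tempfile.TemporaryDirectory() as tmp:
    # Flat layout: the package sits directly in the project root.
    flat = Path(tmp) / "flat_project"
    (flat / "demo_flat_pkg").mkdir(parents=True)
    (flat / "demo_flat_pkg" / "__init__.py").touch()

    # Src layout: the package is one level down, inside src/.
    src = Path(tmp) / "src_project"
    (src / "src" / "demo_src_pkg").mkdir(parents=True)
    (src / "src" / "demo_src_pkg" / "__init__.py").touch()

    flat_visible = importable_from(flat, "demo_flat_pkg")
    src_visible = importable_from(src, "demo_src_pkg")

print("flat layout importable from project root:", flat_visible)
print("src layout importable from project root:", src_visible)
```

The flat package is found without any installation step, while the src package is not, which is why the src layout pushes you toward testing the installed package.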
tests/ Directory
This is where all your test code should live.
- Separated from Source: Keeping tests in a top-level tests directory, rather than inside your package, prevents them from being accidentally included in your final distribution package.
- Running Tests: You can use a tool like pytest to automatically discover and run all tests within the tests directory.
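A test file in this directory might look like the sketch below. The `slugify` function is hypothetical and is defined inline only so the example runs on its own; in a real project it would live in src/my_package/module1.py and be imported with `from my_package.module1 import slugify`.

```python
# tests/test_module1.py (illustrative sketch)


def slugify(title: str) -> str:
    """Stand-in for a function that would live in src/my_package/module1.py."""
    return "-".join(title.lower().split())


# pytest collects functions whose names start with "test_" and reports any failing assert.
def test_slugify_joins_words_with_hyphens():
    assert slugify("Hello World") == "hello-world"


def test_slugify_collapses_whitespace():
    assert slugify("  many   spaces ") == "many-spaces"
```

Running `pytest` from the project root would discover both functions through its default `test_*.py` file pattern.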
docs/ Directory
This is for your project's detailed documentation. It's common to use Sphinx to generate HTML documentation, which can automatically extract API references from your code's docstrings.
Makefile (Optional)
Although make was originally designed for C projects, it's an incredibly convenient general-purpose task runner. You can use it to define a series of shortcuts for common project commands.
A simple Makefile might look like this:
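.PHONY: install test docs clean
# Install the project in editable mode with development dependencies
install:
	pip install -e ".[dev]"
# Run tests
test:
	pytest
# Build documentation
docs:
	sphinx-build docs/ docs/_build
# Clean up build artifacts and Python caches (recipe lines must start with a tab)
clean:
	rm -rf build/ dist/ .eggs/
	find . -type d -name __pycache__ -prune -exec rm -rf {} +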
This way, you only need to run simple commands like make install, make test, etc., without having to remember the full command lines.
4. Advanced Structure: Managing Multiple Packages in a Single Repository
As projects grow very large, you might encounter a more complex scenario: maintaining and distributing multiple installable packages from a single repository (a "Monorepo") in which they share the same src directory, for example acme.core and acme.client.
For this situation, Python provides the Namespace Packages mechanism.
Core Concept: Namespace Packages
Difference from Regular Packages:
- A Regular Package must contain an __init__.py file in its directory.
- A Namespace Package, in contrast, is a directory that must not contain an __init__.py file at its top level.
How it Works: When the import system searches sys.path for a name and finds only matching directories without an __init__.py (and no regular package or module of that name), it treats the name as a namespace package (PEP 420). This allows multiple physically separate directories to contribute to the same logical package name.
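The mechanism can be observed directly. The following sketch creates two separate directories that each contribute one sub-package to a shared namespace and then imports both. The name `acme_demo` and the `dist_*` directories are hypothetical stand-ins for two independently installed distributions.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    roots = []
    for sub in ("core", "client"):
        # Each root simulates a separate install location.
        root = Path(tmp) / f"dist_{sub}"
        pkg = root / "acme_demo" / sub  # note: acme_demo/ itself gets no __init__.py
        pkg.mkdir(parents=True)
        (pkg / "__init__.py").write_text(f"NAME = {sub!r}\n")
        roots.append(str(root))

    sys.path[:0] = roots
    importlib.invalidate_caches()
    try:
        core = importlib.import_module("acme_demo.core")
        client = importlib.import_module("acme_demo.client")
        namespace = importlib.import_module("acme_demo")
        # A namespace package has no single file backing it...
        namespace_file = getattr(namespace, "__file__", None)
        # ...and its __path__ spans every directory that contributes to it.
        portions = len(list(namespace.__path__))
    finally:
        for r in roots:
            sys.path.remove(r)

print(core.NAME, client.NAME, namespace_file, portions)
```

Both sub-packages import under the same top-level name even though they live in different directories, which is what lets separately distributed packages share a prefix.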
Structure Example
Let's assume we have an acme namespace containing two independent sub-packages, core and client.
my_monorepo/
├── src/
│   └── acme/                 # Top-level of the namespace package (NO __init__.py)
│       ├── core/             # acme.core sub-package (a regular package)
│       │   ├── __init__.py
│       │   └── logic.py
│       └── client/           # acme.client sub-package (a regular package)
│           ├── __init__.py
│           └── app.py
│
├── pyproject.toml            # Config file for building acme.core
└── pyproject.client.toml     # Config for acme.client (illustrative only; standard build tools read only a file named pyproject.toml)
The Build Challenge and Solution
The Challenge: The standard specifies that one pyproject.toml file defines one project. So how do we build two different distribution packages from the same source tree?
The Solution: The prevailing practice is to create separate build configurations for each distributable sub-package. While pyproject.toml itself doesn't support defining multiple projects in one file, we can leverage the flexibility of build backends to achieve this.
Method: Use a Separate pyproject.toml for Each Package
This is the cleanest and most standard approach. You can create a dedicated directory for each sub-package to hold its build configuration.
my_monorepo/
├── .git/
├── src/
│   └── acme/
│       ├── core/
│       │   └── ...
│       └── client/
│           └── ...
│
├── packages/                 # Create a directory for each distributable package
│   ├── acme-core/
│   │   └── pyproject.toml
│   └── acme-client/
│       └── pyproject.toml
│
└── README.md
Then, in each sub-package's pyproject.toml, you need to tell the build backend (e.g., hatchling, setuptools) where to find the source code.
packages/acme-core/pyproject.toml Example (with hatchling):
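[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"
[project]
name = "acme.core"
version = "0.1.0"
# ... other metadata ...
[tool.hatch.build.targets.wheel.force-include]
# Map the shared source directory, which lives outside this project root, to its
# import path inside the wheel so that it installs as acme/core. (The "packages"
# option keeps only the final path component, which would ship a top-level core/.)
# Treat this as a sketch and verify it against the Hatch build docs for your version.
"../../src/acme/core" = "acme/core"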
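This way, when you navigate into the packages/acme-core/ directory and run the build command (e.g., python -m build), it will only package the code under src/acme/core into the acme.core distribution. Because the sources live outside the package directory, a source distribution built from here may not contain them, so building the wheel directly (python -m build --wheel) is the safer choice. The configuration for acme.client would be similar.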
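With several package directories, running the build by hand in each one gets repetitive. The sketch below shows one way to automate it, assuming the packages/ layout above and that the `build` package is installed. The helper names (`find_package_dirs`, `build_command`, `build_all`) and the choice to write wheels to a shared dist/ directory are illustrative, not an established convention.

```python
import subprocess
import sys
from pathlib import Path


def find_package_dirs(repo_root: Path) -> list[Path]:
    """Return every directory under packages/ that contains a pyproject.toml."""
    return sorted(p.parent for p in (repo_root / "packages").glob("*/pyproject.toml"))


def build_command(package_dir: Path, out_dir: Path) -> list[str]:
    """Compose a `python -m build --wheel` invocation for one package directory."""
    return [
        sys.executable, "-m", "build",
        "--wheel",
        "--outdir", str(out_dir),
        str(package_dir),
    ]


def build_all(repo_root: Path) -> None:
    """Build a wheel for each package (requires the `build` package to be installed)."""
    out_dir = repo_root / "dist"
    for package_dir in find_package_dirs(repo_root):
        subprocess.run(build_command(package_dir, out_dir), check=True)
```

Calling `build_all(Path("."))` from the repository root would then build each package in turn and stop at the first failure.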
Clarification on a "Unified Export" File
Python does not have a direct equivalent to JavaScript's index.js for a "unified export" file.
- The __init__.py file serves as the entry point and facade for a single package, not for the entire src directory.
- In the multi-package scenario above, src/acme/core/__init__.py defines the public API for the acme.core package, while src/acme/client/__init__.py defines the public API for the acme.client package.
- Because acme is a namespace package with no __init__.py of its own, there is no single file that exports members from both acme.core and acme.client. They are two separately built distributions that are ultimately installed into the same acme namespace in a user's environment.
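To illustrate the facade role of a single package's __init__.py, the sketch below writes a tiny throwaway package to a temporary directory and imports it. The package name `facade_demo` and its functions are hypothetical.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp) / "facade_demo"
    pkg.mkdir()
    # An internal module with one public function and one private helper.
    (pkg / "logic.py").write_text(
        "def run():\n    return 'ran'\n\n"
        "def _helper():\n    return None\n"
    )
    # __init__.py re-exports only what should be public for this one package.
    (pkg / "__init__.py").write_text(
        "from .logic import run\n\n__all__ = ['run']\n"
    )

    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        facade_demo = importlib.import_module("facade_demo")
        result = facade_demo.run()
        public_names = facade_demo.__all__
    finally:
        sys.path.remove(tmp)

print(result, public_names)
```

Users call `facade_demo.run()` without needing to know that it lives in logic.py. Each regular package, such as acme.core or acme.client, would define its own facade in the same way.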