This is Part 2 of a three-part series. In Part 1: The Strategic Value of Thinking in Notebooks, we discussed why and when to use Jupyter. Here, we dive into the technical implementation. Part 3: Real-World Code Examples covers practical use cases.
The Modern Jupyter Stack
For a software engineer, the “standard” way of installing Jupyter (global pip install) is often the wrong way. It leads to dependency hell and “it works on my machine” syndrome.
Here is a professional setup guide.
1. Installation & Environment Management
The “UV” Way (Recommended)
If you haven’t tried uv yet, it’s a lightning-fast Python package manager. It makes managing Jupyter environments trivial.
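A minimal sketch of what this looks like, assuming uv itself is already installed:

```bash
# Create an isolated environment in the project directory
uv venv

# Activate it (on Windows: .venv\Scripts\activate)
source .venv/bin/activate

# Install JupyterLab and the kernel machinery into the isolated environment
uv pip install jupyterlab ipykernel
```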
The Traditional Virtualenv Way
If you prefer standard tools:
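A minimal sketch using the standard library's venv module:

```bash
# Create and activate a virtual environment (on Windows: .venv\Scripts\activate)
python -m venv .venv
source .venv/bin/activate

# Install JupyterLab into the environment, not globally
pip install jupyterlab ipykernel
```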
2. Choosing Your Interface
JupyterLab (The Browser Experience)
JupyterLab is the next-generation web-based user interface. It supports tabs, file browsers, and terminal access.
- Run it: `jupyter lab`
- Best for: Deep data exploration and when you want a dedicated workspace.
VS Code (The Engineer’s Choice)
Most software engineers should use the VS Code Jupyter Extension.
- Why: You get your familiar keybindings, themes, and Copilot integration directly inside the notebook.
- Setup: Install the “Jupyter” extension from the Marketplace. Open any `.ipynb` file, and VS Code will prompt you to select a kernel (point it to your `.venv`).
3. Managing Kernels
A Kernel is the engine that runs your code. You can have different kernels for different projects (e.g., one for Python 3.10, one for R, one for a specific project with heavy dependencies).
To make your virtual environment available as a kernel:
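With the environment activated, registration looks roughly like this (the kernel name is a placeholder; pick one per project):

```bash
# Register the active virtual environment as a named Jupyter kernel
python -m ipykernel install --user --name=my-project --display-name "Python (my-project)"
```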
4. Version Control: The “Notebook Problem”
Standard .ipynb files are JSON blobs containing code, metadata, and outputs (like large images or dataframes). This makes Git diffs unreadable.
Solution: Jupytext
Jupytext allows you to pair your notebooks with plain .py files.
- You edit the `.ipynb` in the UI.
- Jupytext automatically saves a `.py` version.
- You commit the `.py` file to Git.
- Result: Clean, readable code reviews (a minimal pairing command is sketched below).
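One way to set up the pairing from the command line (the notebook path is illustrative):

```bash
# Install Jupytext and pair a notebook with a percent-format Python script
pip install jupytext
jupytext --set-formats ipynb,py:percent notebooks/01-exploration.ipynb
```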
Solution: nbstripout
Use nbstripout as a git filter to automatically remove output cells before committing.
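A typical setup, run once inside the repository:

```bash
# Install nbstripout and register it as a git filter for this repository
pip install nbstripout
nbstripout --install
```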
5. Storage & Remote Execution
- Local: Keep your notebooks in a dedicated `/notebooks` folder in your repo.
- Cloud (Google Colab / Kaggle): Great for quick tests or when you need a free GPU.
- Self-Hosted (JupyterHub): If your team needs a shared environment with access to internal databases.
6. Project Structure & Hierarchy
As your research grows, a single folder full of `untitled1.ipynb` files becomes a nightmare. A professional Jupyter project should follow a predictable hierarchy.
The “Research-First” Structure
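One reasonable layout, reflecting the conventions discussed below (the project and file names are illustrative):

```text
my-project/
├── data/
│   ├── raw/                       # original inputs, treated as read-only
│   └── processed/                 # derived, transformed datasets
├── notebooks/
│   ├── 01-exploration.ipynb
│   ├── 02-feature-engineering.ipynb
│   └── 03-results.ipynb
├── src/
│   └── utils.py                   # stable, reusable functions promoted from notebooks
└── pyproject.toml
```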
Best Practices
- Number your notebooks: Prefixing filenames with `01-`, `02-` ensures they appear in the order of the workflow.
- The “Notebook-to-Script” Pipeline: Once a function in a notebook becomes stable and reused across multiple notebooks, move it to `src/utils.py`. This keeps notebooks clean and makes the code testable (see the sketch after this list).
- Data Isolation: Always keep `data/raw` read-only. Any transformations should be saved into `data/processed`.
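A sketch of the notebook-to-script pipeline in practice: once a helper stabilizes it lives in `src/utils.py`, and notebooks simply import it. The function name below is hypothetical, and it assumes the project root is importable (for example via `pip install -e .`):

```python
# In a notebook cell: use the stable helper that was promoted out of the notebook
from src.utils import load_raw_data  # hypothetical helper

df = load_raw_data("data/raw/events.csv")
df.head()
```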
Conclusion
Setting up Jupyter correctly is the difference between a messy experiment and a professional research tool. By using modern package managers like uv, integrating with VS Code, and handling version control with Jupytext, you turn Jupyter into a first-class citizen of your development workflow.
Remember: Jupyter isn’t where you write your app; it’s where you understand the problems your app is trying to solve.
Further Reading & Resources
- Official Docs: JupyterLab Documentation
- Package Management: uv: An extremely fast Python package manager
- VS Code Integration: Working with Jupyter Notebooks in VS Code
- Version Control: Jupytext: Jupyter Notebooks as Markdown or Python Scripts
- Clean Diffs: nbstripout: Strip output from Jupyter and IPython notebooks
- Practical Examples: Part 3: Real-World Code Examples
Omid Farhang