This is Part 2 of a three-part series. Part 1, “The Strategic Value of Thinking in Notebooks,” covered why and when to use Jupyter. Here, we dive into the technical implementation, and Part 3, “Real-World Code Examples,” covers practical use cases.
The Modern Jupyter Stack
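For a software engineer, the “standard” way of installing Jupyter (a global pip install) is often the wrong way. Every project then shares one set of packages, so version conflicts pile up (“dependency hell”) and notebooks that run on your laptop fail on a colleague's (“it works on my machine” syndrome).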
Here is how to set it up like a pro.
1. Installation & Environment Management
The uv Way (Recommended)
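If you haven’t tried uv yet, it’s a lightning-fast Python package and project manager from Astral. It creates the virtual environment, installs packages, and records exact versions in a lockfile, which makes managing Jupyter environments trivial.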
The Traditional Virtualenv Way
If you prefer standard tools:
2. Choosing Your Interface
JupyterLab (The Browser Experience)
JupyterLab is the next-generation web-based user interface. It supports tabs, file browsers, and terminal access.
- Run it: jupyter lab
- Best for: Deep data exploration and when you want a dedicated workspace.
VS Code (The Engineer’s Choice)
Most software engineers should use the VS Code Jupyter Extension.
- Why: You get your familiar keybindings, themes, and Copilot integration directly inside the notebook.
- Setup: Install the “Jupyter” extension from the Marketplace. Open any .ipynb file, and VS Code will prompt you to select a kernel (point it to your .venv).
3. Managing Kernels
A kernel is the process that actually executes your code. You can have different kernels for different projects (e.g., one for Python 3.10, one for R, one for a specific project with heavy dependencies).
To make your virtual environment available as a kernel:
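A minimal sketch, run with the project's environment activated (with uv, prefix each command with uv run). The kernel name "my-project" and its display name are placeholders; pick something unique per project so the kernel picker stays readable.

```shell
# Register the active environment as a kernel for the current user.
# "my-project" is a placeholder name.
python -m ipykernel install --user --name my-project --display-name "Python (my-project)"

# List registered kernels and where their specs live
jupyter kernelspec list

# Later, when the project is retired:
# jupyter kernelspec uninstall my-project
```

The generated kernel spec points at the absolute path of the environment's Python interpreter, so if you move the project folder, re-run the install command.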
4. Version Control: The “Notebook Problem”
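Standard .ipynb files are JSON documents that store code together with metadata and outputs (such as base64-encoded images or rendered dataframes). A one-line code change can therefore produce a diff full of execution counts and encoded image data, which makes Git diffs unreadable and merge conflicts painful.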
Solution: Jupytext
Jupytext allows you to pair your notebooks with plain .py files.
- You edit the .ipynb in the UI.
- Jupytext saves a paired .py version alongside it each time you save (once the notebook has been paired).
- You commit the .py file to Git.
- Result: Clean, readable code reviews.
Solution: nbstripout
Use nbstripout as a Git filter to strip cell outputs (and execution counts) automatically whenever a notebook is staged, so only code and markdown reach the repository.
5. Storage & Remote Execution
- Local: Keep your notebooks in a dedicated /notebooks folder in your repo.
- Cloud (Google Colab / Kaggle): Great for quick tests or when you need a free (usage-limited) GPU.
- Self-Hosted (JupyterHub): Best when your team needs a shared environment with access to internal databases.
6. Project Structure & Hierarchy
As your research grows, a single folder full of untitled1.ipynb files becomes a nightmare. A professional Jupyter project should follow a predictable hierarchy.
The “Research-First” Structure
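One minimal way to scaffold this hierarchy, using only the folders this article refers to (a notebooks folder, src/utils.py, data/raw and data/processed). The project name is a placeholder, and the empty src/__init__.py is an assumption that lets src be imported as a regular package.

```shell
# "my-research" is a placeholder project name
mkdir -p my-research/data/raw my-research/data/processed
mkdir -p my-research/notebooks my-research/src
touch my-research/src/__init__.py my-research/src/utils.py

# Resulting tree:
# my-research/
# ├── data/
# │   ├── raw/          # original inputs, never modified
# │   └── processed/    # outputs of cleaning/transformation steps
# ├── notebooks/        # numbered notebooks: 01-..., 02-...
# └── src/
#     ├── __init__.py
#     └── utils.py      # stable helpers promoted out of notebooks
```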
Best Practices
- Number your notebooks: Prefixing filenames with 01-, 02- makes them sort in the order of the workflow.
- The “Notebook-to-Script” Pipeline: Once a function in a notebook becomes stable and is reused across multiple notebooks, move it to src/utils.py. This keeps notebooks clean and makes the code testable.
- Data Isolation: Always keep data/raw read-only. Save any transformations into data/processed.
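On macOS and Linux, the data-isolation rule can be enforced by the file system rather than by discipline alone. A sketch, assuming it is run from the project root: it removes write permission from data/raw so a stray to_csv() call in a notebook fails instead of overwriting an original file.

```shell
# Create the folders if they don't exist yet (no-op when they do)
mkdir -p data/raw data/processed

# Recursively remove write permission for everyone on the raw data:
# existing files become read-only and no new files can be added
chmod -R a-w data/raw

# data/processed is left writable for cleaned/derived outputs
```

When new raw files arrive, restore write access temporarily with chmod -R u+w data/raw, copy them in, then re-run the chmod above. Note that the root user bypasses these permission bits.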
Conclusion
Setting up Jupyter correctly is the difference between a messy experiment and a professional research tool. By using modern package managers like uv, integrating with VS Code, and handling version control with Jupytext, you turn Jupyter into a first-class citizen of your development workflow.
Remember: Jupyter isn’t where you write your app; it’s where you understand the problems your app is trying to solve.
Further Reading & Resources
- Official Docs: JupyterLab Documentation
- Package Management: uv: An extremely fast Python package manager
- VS Code Integration: Working with Jupyter Notebooks in VS Code
- Version Control: Jupytext: Jupyter Notebooks as Markdown or Python Scripts
- Clean Diffs: nbstripout: Strip output from Jupyter and IPython notebooks
- Practical Examples: Part 3: Real-World Code Examples
Omid Farhang