We’ve all used ChatGPT to write a function or debug a regex. But that’s just the tip of the iceberg. The real power of Large Language Models (LLMs) isn’t in the “chat”—it’s in the integration.

As I explored in my 2025 series on Jupyter and AI, the real value of these tools comes when they are part of a structured thinking process. By combining the interactive execution of Jupyter Notebooks with the reasoning power of Local LLMs, we can build something much more powerful: a Personal Knowledge Engine.

In this post, I share my workflow for moving beyond “prompt engineering” and into the world of executable reasoning.

The Problem with “Chat”

The chat interface is great for quick questions, but it has several limitations for serious engineering work:

  1. Context Limits: You can only paste so much code before the model starts to “forget” the beginning.
  2. Hallucinations: The model can confidently produce code that doesn’t work, and you only find out after manually copy-pasting it into your IDE and running it yourself.
  3. Privacy: You might not want to send your company’s proprietary architecture or sensitive data to a cloud provider.
  4. Latency: Waiting for a cloud API can break your “flow state.”

The Solution: Jupyter + Local LLMs

By running models locally (using tools like Ollama, LM Studio, or vLLM) and interacting with them through a Jupyter Notebook, we solve these problems.
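To make that concrete, here is a minimal sketch of a notebook cell talking to a model served by Ollama. The endpoint and port are Ollama’s defaults, the helper name ask_local_llm is my own, and you would swap in whichever model you have already pulled locally.

```python
# Minimal sketch: querying a local model served by Ollama from a notebook cell.
# Assumes Ollama is running on its default port (11434) and that the model
# (here "llama3") has already been pulled with `ollama pull llama3`.
import requests

def ask_local_llm(prompt: str, model: str = "llama3") -> str:
    """Send a single prompt to the local Ollama server and return its reply."""
    response = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=120,
    )
    response.raise_for_status()
    return response.json()["response"]

print(ask_local_llm("Explain the difference between a thread and a process."))
```

Once that helper lives in the first cell of the notebook, every other cell can treat the local model like any other Python function.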

1. Context Injection (RAG for One)

Instead of copy-pasting, we can use Python to programmatically inject context. I use a simple script that crawls my local project directory, extracts the relevant interfaces and documentation, and feeds them into the LLM’s context window.

This is essentially a “Retrieval-Augmented Generation” (RAG) system, but instead of a massive vector database, it’s just a few Python functions that understand my project’s structure. This allows the LLM to “know” about my internal design system or my specific API patterns without me having to explain them every time.
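Here is roughly what that looks like in practice. The collect_context helper and the file patterns are illustrative rather than a fixed recipe; the point is that a few lines of pathlib are enough to stand in for a vector database at this scale.

```python
# Minimal sketch of "RAG for one": gather context from the project itself and
# prepend it to the question. The file patterns are illustrative; adapt them to
# whatever "relevant interfaces and documentation" means in your project.
from pathlib import Path

def collect_context(project_root: str, patterns=("README.md", "*.pyi")) -> str:
    """Concatenate a handful of small, high-signal files into one context string."""
    chunks = []
    for pattern in patterns:
        for path in Path(project_root).rglob(pattern):
            text = path.read_text(encoding="utf-8", errors="ignore")
            chunks.append(f"### {path}\n{text}")
    return "\n\n".join(chunks)

def ask_with_context(question: str, project_root: str = ".") -> str:
    """Ask the local model a question with the project context prepended."""
    prompt = (
        "You are assisting with the following codebase.\n\n"
        f"{collect_context(project_root)}\n\n"
        f"Question: {question}"
    )
    return ask_local_llm(prompt)  # helper defined in the earlier cell
```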

2. Executable Reasoning

This is the “killer feature.” In a Jupyter Notebook, I can ask the LLM to generate a piece of code, and then immediately run it in the next cell.

If the code fails, I don’t have to explain the error to the LLM. I can just pipe the stack trace back into the model and ask it to fix it. This creates a “tight loop” of reasoning and verification. The LLM isn’t just guessing; it’s being held accountable by the Python kernel.
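A minimal sketch of that loop follows, reusing the ask_local_llm helper from earlier. Calling exec() on model output is only defensible here because everything runs locally and you are reviewing each attempt as it happens.

```python
# Sketch of the "tight loop": execute generated code, and if it raises,
# hand the traceback straight back to the model and ask for a corrected version.
import traceback

def run_and_repair(code: str, max_attempts: int = 3) -> str:
    """Try to execute `code`; on failure, feed the traceback back and retry."""
    for attempt in range(max_attempts):
        try:
            exec(code, {})          # executed in an empty namespace
            return code             # success: return the working version
        except Exception:
            error = traceback.format_exc()
            code = ask_local_llm(   # helper from the first sketch
                f"This code failed:\n\n{code}\n\n"
                f"Traceback:\n{error}\n\n"
                "Return only the corrected code, with no explanation."
            )
    raise RuntimeError(
        f"Model could not produce working code after {max_attempts} attempts."
    )
```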

3. Privacy and Speed

Running models like Llama 3, Mistral, or Phi-3 on your own hardware (especially if you have a modern GPU) is surprisingly fast. But more importantly, it’s private. I can feed my entire resume.yaml or my internal project roadmaps into the model without worrying about where that data is going.

This allows me to use the LLM for high-level architectural decision-making, such as “How should I refactor this legacy module to support gRPC?”, using the actual, sensitive context of the project.
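As a small illustration, this stays entirely on my machine (assuming PyYAML is installed; the question is just a placeholder):

```python
# Sketch: feeding a private file into the local model. Nothing here leaves the
# machine. Assumes PyYAML is installed and resume.yaml sits in the working dir.
import yaml

with open("resume.yaml", encoding="utf-8") as f:
    resume = yaml.safe_load(f)

answer = ask_local_llm(
    "Here is my resume as structured data:\n"
    f"{yaml.safe_dump(resume)}\n\n"
    "Which three projects best demonstrate distributed-systems experience?"
)
print(answer)
```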

My Workflow: A Typical Session

  1. Initialize: I run a cell that loads my local LLM and indexes the current project’s README.md and core interfaces.
  2. The Query: I ask a complex question: “Based on our current AuthService, how should we implement a new multi-factor authentication flow?”
  3. The Draft: The LLM generates a draft implementation in a Markdown cell.
  4. The Verification: I copy the code into a Python cell and run it against a mock version of the service (see the sketch after this list).
  5. The Refinement: If the logic is sound, I use the LLM to generate the corresponding unit tests.
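To show what the verification step can look like, here is a sketch that exercises a drafted function against a mock. The auth_service mock stands in for the real AuthService, and verify_totp, issue_session_token, and draft_mfa_flow are hypothetical names; only the pattern matters.

```python
# Sketch of step 4: exercising model-generated logic against a mock service
# before it touches real code. The method names below are stand-ins for
# whatever interface the drafted code is supposed to call.
from unittest.mock import MagicMock

auth_service = MagicMock()
auth_service.verify_totp.return_value = True   # pretend the TOTP check passes

# `draft_mfa_flow` is the function the LLM drafted in the Markdown cell,
# pasted into this cell for execution.
def draft_mfa_flow(service, user_id: str, code: str) -> bool:
    if not service.verify_totp(user_id, code):
        return False
    service.issue_session_token(user_id)
    return True

assert draft_mfa_flow(auth_service, "alice", "123456") is True
auth_service.issue_session_token.assert_called_once_with("alice")
```

If the asserts pass, I move on to step 5 and have the model turn this throwaway cell into proper unit tests.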

Conclusion: Stop Chatting, Start Building

The era of “prompt engineering” as a standalone skill is ending. The future belongs to engineers who can integrate these models into their existing workflows.

By using Jupyter as your “workbench” and local LLMs as your “advisor,” you can build a knowledge engine that is faster, more private, and more accurate than any chat interface. It’s about moving from “AI as a toy” to “AI as a professional tool.”

Further Reading & References