LOCAL MODELS • NO CLOUD REQUIRED • SIMPLE SETUP • TERMINAL POWER • SELF-HOST YOUR OWN AI • MID-2000s WEB SPIRIT •

Simple guide to a local Ollama server

This is the straightforward version: install Ollama, download a model, run it, and connect to it from your browser or another app.

Why Ollama?

Ollama is one of the easiest ways to run large language models on your own machine. It gives you a local server and a simple command line interface, which is exactly what most people need to get started.

  • Your prompts stay on your machine
  • You can use it offline once the model is downloaded
  • It is easy to connect to apps, scripts, and web UIs
  • It makes local AI feel practical instead of theoretical

If your goal is "I want my own AI running in my house, on my hardware," Ollama is a very solid starting point.

Install It

Go to the Ollama website and install it for your system. On Linux, that usually means using their install command. On Windows or Mac, use the normal installer.

Typical Linux install:

curl -fsSL https://ollama.com/install.sh | sh

After that, check that it works:

ollama --version

If the command is not found, your install probably did not finish cleanly, or the binary is not on your PATH yet.
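If the version check fails on Linux, these diagnostics can narrow it down. They assume the standard install script was used, which typically registers a systemd service; on other setups the exact service name may differ.

```shell
# Where did the binary land? Empty output means it is not on PATH.
which ollama

# Is the background service running? (assumes systemd, as set up by the Linux installer)
systemctl status ollama --no-pager

# If no service is managing it, you can always start the server by hand:
ollama serve
```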

Get a Model

Once Ollama is installed, pull down a model. Start with something reasonable instead of grabbing the biggest thing you can find.

Example:

ollama run qwen2.5:3b

The first time you run that, Ollama downloads the model and then opens an interactive chat in your terminal.

You can also list what you have later:

ollama list

Smaller models start faster and are easier on your hardware. That is usually the right move while you are testing.
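A few related commands cover the rest of day-to-day model management. The model tag here is just an example; substitute whatever you pulled.

```shell
# Download a model without immediately opening a chat session
ollama pull qwen2.5:3b

# See what is installed and how much disk each model uses
ollama list

# Remove a model you no longer need
ollama rm qwen2.5:3b
```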

Run the Server

Ollama works as a local server. On many systems, it starts automatically after install. If not, you can start it yourself.

ollama serve

By default, it listens on:

http://127.0.0.1:11434

That means apps on the same machine can talk to it through that address.

To test it from the terminal, you can still run something simple like:

ollama run qwen2.5:3b

From another tool or script, you point it at the local Ollama API instead.
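As a quick sketch of what that looks like, here is a one-shot request to the local HTTP API with curl. It assumes the server is running on the default port and that you have already pulled the model named in the payload.

```shell
# One-shot completion against the local Ollama API.
# "stream": false returns a single JSON object instead of a token stream.
curl -s http://127.0.0.1:11434/api/generate -d '{
  "model": "qwen2.5:3b",
  "prompt": "Say hello in one short sentence.",
  "stream": false
}'
```

Any app or script that can make an HTTP request can talk to the server the same way.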

If you want other devices on your network to reach it, that is a separate step. Local-only is the safest default.
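If you do eventually want LAN access, the knob is the OLLAMA_HOST environment variable, which controls the address the server binds to. A sketch, assuming a systemd-managed install on Linux:

```shell
# Bind to all interfaces instead of localhost.
# This makes the server reachable from your whole network - do it deliberately.
OLLAMA_HOST=0.0.0.0:11434 ollama serve

# If systemd manages Ollama, set the variable on the service instead:
# run `systemctl edit ollama` and add:
#   [Service]
#   Environment="OLLAMA_HOST=0.0.0.0:11434"
# then restart it:
#   systemctl restart ollama
```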

Real-World Tips

  1. Start simple. Get one model working before building a whole stack around it.
  2. Watch RAM and VRAM. Bigger models are not automatically better for your setup.
  3. Keep a terminal open while testing so you can actually see what is happening.
  4. Use Ollama as the engine, then add a web UI later if you want a nicer frontend.
  5. Do not expose your server to the internet unless you really know what you are doing.

A lot of frustration comes from trying to do too much at once. The better path is: install, test, confirm, then expand.
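That install-test-confirm loop can be sketched as a short command sequence. This is a sanity-check outline, assuming a Linux box with curl and the default port, not a script to paste blindly:

```shell
# Install
curl -fsSL https://ollama.com/install.sh | sh

# Test: the binary exists and one small model downloads
ollama --version
ollama pull qwen2.5:3b

# Confirm: the server answers on its default address
curl -s http://127.0.0.1:11434/api/tags

# Expand: only now add web UIs, scripts, or network access
```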

The real win is not just "running AI." The real win is understanding your own setup well enough that you can change it when you need to.