How to install DeepSeek-R1 with Open Web UI on a laptop with Ubuntu


Today I decided to test DeepSeek-R1 on my old laptop with Ubuntu. In this article I will describe the steps that I used to run the model.

Ollama

First of all, you need to install Ollama, which will allow you to run queries against your language model.

Here is the official website: https://ollama.com/

The installation is quite simple: just run the following command, which you can find in the official documentation.
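At the time of writing, the documented one-liner for Linux is:

    curl -fsSL https://ollama.com/install.sh | sh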

After the installation you will probably be asked to reboot your computer to apply the changes.

Because Ollama is installed as a system service, you can verify its status with a basic systemctl command like this:
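    systemctl status ollama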

You will see output like this, which says that the service is running.
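The key part is the "Active" line of the systemd output, which should look roughly like this:

    Active: active (running)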

Note that the service is enabled by default, so if you want to change this you need to disable it and then start it manually whenever you need it.

To remove it from autostart:
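    sudo systemctl disable ollama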

To return it to autostart:
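    sudo systemctl enable ollama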

To start/stop/restart the service:
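    sudo systemctl start ollama
    sudo systemctl stop ollama
    sudo systemctl restart ollama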


DeepSeek language model

Now you need to download the model. The DeepSeek model IDs can be found on the Ollama library page: https://ollama.com/library/deepseek-r1

As you can see on this page, there are several variants of the model with different numbers of parameters and, as a result, different sizes.

In this example we will use 7b, but you can choose another one.

So let’s download and run the model; for this we need to execute the following command:
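    ollama run deepseek-r1

Without a tag, Ollama pulls the default (latest) tag of the model.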

If you want to download a particular model, just use the full name of that model, for example:
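    ollama run deepseek-r1:7b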

It makes sense to use a model that can be fully loaded into your GPU’s video memory. For example, if you have 6GB of VRAM, using an 8B model (4.7GB) is a good choice. If you have 12GB of VRAM, the 14B model (9GB) would be better suited for your needs. For a high-end GPU like the RTX 3090 with 24GB of VRAM, the 32B model (20GB) would be an excellent choice.

After the model is downloaded you will see the prompt where you can chat with the model.
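The interactive prompt looks roughly like this:

    >>> Send a message (/? for help)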

You can use “/?” to see the available commands.

To exit from the prompt back to the console, just use the following command:
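    /bye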

To re-enter the prompt, run the same command again:
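    ollama run deepseek-r1:7b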


Ollama API

As I already said, by default Ollama is started as a service, which means that even if you exit from the prompt it keeps running in the background, and we can send basic HTTP requests to the service.

The host is localhost and the default port is 11434.

API documentation can be found in GitHub: https://github.com/ollama/ollama/blob/main/docs/api.md

Here is an example of a request via curl:
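The request below uses the /api/generate endpoint from the documentation linked above; the prompt text is just an example:

    curl http://localhost:11434/api/generate -d '{
      "model": "deepseek-r1:7b",
      "prompt": "Why is the sky blue?",
      "stream": false
    }'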


Open Web UI

Open Web UI is a GUI similar to the ChatGPT web UI.

For the installation I will use Docker, so you need to install it if it is not already installed.
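If you need Docker, one quick option on Ubuntu is Docker's convenience script (any other supported installation method works just as well):

    curl -fsSL https://get.docker.com -o get-docker.sh
    sudo sh get-docker.sh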

After the Docker installation, pull the main image:
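    docker pull ghcr.io/open-webui/open-webui:main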

If you have an Nvidia video card that supports CUDA, pull the “cuda” image instead:
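    docker pull ghcr.io/open-webui/open-webui:cuda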

After the pull completes, just run the following command to start the Docker container:
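This is based on the command from the Open Web UI documentation, with the host port set to 8080 to match the URL used below (substitute the :cuda tag and add --gpus all if you pulled the CUDA image):

    docker run -d -p 8080:8080 --add-host=host.docker.internal:host-gateway -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:main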

Open your browser and navigate to http://localhost:8080. On the first run you will be asked to create the admin account by entering an email and password.

After you create an account you will be able to log in and start the chat.

If the model does not appear, go to the settings and check the connection with Ollama.

If you have a slow laptop like mine, you may face a “Network Problem” error. If so, try waiting for some time; the response will appear once it has been generated. You can measure the average response time by using the Ollama prompt or the API first.


Other. Videocard information

Video card compatibility with Ollama can be checked here: https://github.com/ollama/ollama/blob/main/docs/gpu.md

The load on an Nvidia video card can be checked with the nvidia-smi utility: just run it in a terminal and you can monitor memory usage and GPU utilisation; this is how I monitored my GTX 1660 Ti.

You can also get info about your card in other formats; for example, run the following to get your video card model:
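One way to do this (the exact command in the original may differ) is nvidia-smi's query mode:

    nvidia-smi --query-gpu=name --format=csv,noheader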

You can also use Chrome to quickly get info by using “chrome://gpu/” as a URL.


Other. Ollama CLI commands

  • ollama serve = Starts Ollama on your local system.
  • ollama create = Creates a new model from an existing one for customization or training.
  • ollama show = Displays details about a specific model, such as its parameters, template and license.
  • ollama run = Runs the specified model, making it ready for interaction.
  • ollama pull = Downloads the specified model to your system.
  • ollama list = Lists all the downloaded models.
  • ollama ps = Shows the currently running models.
  • ollama stop = Stops the specified running model.
  • ollama rm = Removes the specified model from your system.

Other. Ollama system information

You can get some information about the current Ollama status by using the following command:
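    ollama ps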

Here we have the following columns:

  • NAME = loaded model name
  • ID = loaded model ID
  • SIZE = size of the model
  • PROCESSOR = indicates where the model is loaded:
    • 100% GPU: The model is fully loaded into the GPU.
    • 100% CPU: The model is fully loaded into system memory.
    • 48%/52% CPU/GPU: The model is partially loaded onto both the GPU and system memory.
  • UNTIL = The time remaining until the model is unloaded from memory.

Other. Useful links
