<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>CUDA on avni.sh</title>
    <link>http://www.avni.sh/tags/cuda/</link>
    <description>Recent content in CUDA on avni.sh</description>
    <image>
      <title>avni.sh</title>
      <url>http://www.avni.sh/cover.webp</url>
      <link>http://www.avni.sh/cover.webp</link>
    </image>
    <generator>Hugo -- 0.146.0</generator>
    <language>en</language>
    <lastBuildDate>Fri, 12 Jan 2024 00:00:00 +0000</lastBuildDate>
    <atom:link href="http://www.avni.sh/tags/cuda/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>Self Hosting LLMs using Ollama</title>
      <link>http://www.avni.sh/posts/computer-science/technologies/self-hosting/self-hosting-ollama/</link>
      <pubDate>Fri, 12 Jan 2024 00:00:00 +0000</pubDate>
      <guid>http://www.avni.sh/posts/computer-science/technologies/self-hosting/self-hosting-ollama/</guid>
      <description>Hosting Large Language Models (LLMs) on your infrastructure and integrating them with your development environment</description>
      <content:encoded><![CDATA[<p>Ollama provides an interface to self-host and interact with
open-source LLMs (Large Language Models) using its binary
or container image. Managing LLMs using Ollama
is like managing <a href="/posts/computer-science/technologies/cloud-native/container-lifecycle" target="_blank">container lifecycle</a> using container engines like <code>docker</code> or <code>podman</code>.</p>
<ul>
<li>
<p>Ollama commands <code>pull</code> and <code>run</code> are used to download and execute
LLMs respectively, just like the ones used to manage containers
with <code>podman</code> or <code>docker</code>.</p>
</li>
<li>
<p>Tags like <code>13b-python</code> and <code>7b-code</code> are used to manage different
variations of an LLM.</p>
</li>
<li>
<p>A <code>Modelfile</code> (like a <code>Dockerfile</code>) is created to build a custom
model using an existing LLM as its base. Additional
instructions like <code>TEMPLATE</code> and <code>PARAMETER</code> can be used to define
a prompt template or set inference parameters respectively
(see the sketch after this list).</p>
</li>
</ul>
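<p>Here is a minimal sketch of that workflow; the model name <code>my-coder</code> and
the parameter values are illustrative, and the instruction names follow
Ollama&rsquo;s Modelfile format.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"># A minimal sketch of a custom model build; the model name
# "my-coder" and the parameter values are illustrative.
cat &gt; Modelfile &lt;&lt;'EOF'
FROM codellama:7b
PARAMETER temperature 0.2
SYSTEM "You are a concise assistant that answers with working code."
EOF
ollama create my-coder -f Modelfile
ollama run my-coder
</code></pre></div>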
<h1 id="deploying-ollama-container-with-nvidia-gpu">Deploying Ollama container with NVIDIA GPU</h1>
<p>Deployed as-is, the Ollama container will run its LLM workloads
on the CPU alone, but with the parallel computation capabilities
of a Graphics Processing Unit (GPU) we can significantly improve
the inference performance of all models.</p>
<p>In this article I&rsquo;m using an <strong>NVIDIA GeForce RTX 3070 Ti</strong> GPU. If you
want to use a GPU from AMD, Intel, or another manufacturer, steps
like driver and container toolkit installation and GPU configuration
for the container engine will differ.</p>
<h2 id="gpu-passthrough-to-vm">GPU Passthrough to VM</h2>
<p>I am deploying the Ollama container on a Fedora 38 virtual machine,
so the first step is GPU passthrough from my hypervisor
(Proxmox) to the VM. You can skip this step if you are deploying
Ollama on a bare-metal machine.</p>
<p>In Proxmox&rsquo;s Web UI, go to the VM&rsquo;s <code>Hardware</code> section
and <code>Add</code> your GPU as a <code>PCI Device</code>.</p>
<p align="center"><img src="proxmox-hardware.png" alt="Proxmox VM's Hardware Section"></p>
<p align="center"><small>Proxmox VM's Hardware Section</small></p>
<p>Make sure to mark the <code>All Functions</code> checkbox.</p>
<p align="center"><img src="proxmox-gpu-passthrough.png" alt="GPU Passthrough to a Proxmox VM"></p>
<p align="center"><small>GPU Passthrough to a Proxmox VM</small></p>
<p>Once the VM is rebooted, we can verify the GPU passthrough using
the following command.</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">lspci <span class="p">|</span> grep NVIDIA
</span></span></code></pre></td></tr></table>
</div>
</div><p>If the GPU name appears in the command&rsquo;s output (as shown below),
the passthrough was successful and we can move on to the next step.</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span><span class="lnt">2
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="cl">06:10.0 VGA compatible controller: NVIDIA Corporation GA104 [GeForce RTX 3070 Ti] (rev a1)
</span></span><span class="line"><span class="cl">06:10.1 Audio device: NVIDIA Corporation GA104 High Definition Audio Controller (rev a1)
</span></span></code></pre></td></tr></table>
</div>
</div><h2 id="cuda-toolkit-installation">CUDA Toolkit Installation</h2>
<p>To utilize the parallel computation capabilities of the CUDA cores
in NVIDIA GPUs, we have to install the CUDA Toolkit.
Follow <a href="https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html" target="_blank">NVIDIA&rsquo;s documentation</a> on CUDA Toolkit
installation for Linux, because the steps vary depending on the host&rsquo;s configuration.</p>
<p>Here are the steps for Fedora 38:</p>
<ol>
<li>Downloading the CUDA Toolkit repository RPM.</li>
</ol>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">wget https://developer.download.nvidia.com/compute/cuda/12.3.2/local_installers/cuda-repo-fedora37-12-3-local-12.3.2_545.23.08-1.x86_64.rpm
</span></span></code></pre></td></tr></table>
</div>
</div><ol start="2">
<li>Installing the CUDA Toolkit repository RPM.</li>
</ol>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">sudo rpm -i cuda-repo-fedora37-12-3-local-12.3.2_545.23.08-1.x86_64.rpm
</span></span></code></pre></td></tr></table>
</div>
</div><ol start="3">
<li>Cleaning <code>dnf</code> Repository Metadata.</li>
</ol>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">sudo dnf clean all
</span></span></code></pre></td></tr></table>
</div>
</div><ol start="4">
<li>Installing the <code>cuda-toolkit</code> package.</li>
</ol>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">sudo dnf -y install cuda-toolkit-12-3
</span></span></code></pre></td></tr></table>
</div>
</div><ol start="5">
<li>Installing either the <code>legacy</code> (proprietary) or the <code>open</code>
(open-source) kernel module for the NVIDIA driver.</li>
</ol>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">sudo dnf -y module install nvidia-driver:latest-dkms
</span></span></code></pre></td></tr></table>
</div>
</div><p>or</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">sudo dnf -y module install nvidia-driver:open-dkms
</span></span></code></pre></td></tr></table>
</div>
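</div><p>After rebooting the VM, the driver installation can be sanity-checked
with <code>nvidia-smi</code>, which should list the GPU together with the driver
and CUDA versions.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"># Should print a table listing the RTX 3070 Ti along with
# the driver and CUDA versions
nvidia-smi
</code></pre></div>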
</div><h2 id="nvidia-container-toolkit-installation">NVIDIA Container Toolkit Installation</h2>
<p>With <code>nvidia-container-toolkit</code>, we can use our NVIDIA GPU
in containerized applications. Here are the steps for installing
the NVIDIA Container Toolkit on Fedora 38:</p>
<ol>
<li>Adding the <code>nvidia-container-toolkit</code> repository.</li>
</ol>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span><span class="lnt">2
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">curl -s -L https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo <span class="se">\
</span></span></span><span class="line"><span class="cl"><span class="se"></span>    <span class="p">|</span> sudo tee /etc/yum.repos.d/nvidia-container-toolkit.repo
</span></span></code></pre></td></tr></table>
</div>
</div><ol start="2">
<li>Installing the <code>nvidia-container-toolkit</code> package.</li>
</ol>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">sudo dnf install -y nvidia-container-toolkit
</span></span></code></pre></td></tr></table>
</div>
</div><ol start="3">
<li>Once the container toolkit is installed, we have to add its
runtime to our container engine.</li>
</ol>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">sudo nvidia-ctk runtime configure --runtime<span class="o">=</span>docker
</span></span></code></pre></td></tr></table>
</div>
</div><ol start="4">
<li>Finally, we can start using our NVIDIA GPU with Docker
containers after restarting the <code>docker</code> daemon.</li>
</ol>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">sudo systemctl restart docker
</span></span></code></pre></td></tr></table>
</div>
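</div><p>Before moving on, it is worth confirming that Docker can now hand
the GPU to a container. NVIDIA&rsquo;s documentation suggests a throwaway
sample workload like the following.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"># Runs nvidia-smi inside a disposable container; the output
# should match what we see on the host
sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
</code></pre></div>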
</div><h2 id="deploying-ollama-as-a-docker-container">Deploying Ollama as a Docker Container</h2>
<ol>
<li>Create a directory on the host to store LLMs, so models
don&rsquo;t have to be re-downloaded after the container is
reprovisioned or updated.</li>
</ol>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">mkdir -p ~/container-data/ollama
</span></span></code></pre></td></tr></table>
</div>
</div><ol start="2">
<li>The following <code>compose.yaml</code> file will deploy
the <code>ollama</code> container with our NVIDIA GPU.</li>
</ol>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt"> 1
</span><span class="lnt"> 2
</span><span class="lnt"> 3
</span><span class="lnt"> 4
</span><span class="lnt"> 5
</span><span class="lnt"> 6
</span><span class="lnt"> 7
</span><span class="lnt"> 8
</span><span class="lnt"> 9
</span><span class="lnt">10
</span><span class="lnt">11
</span><span class="lnt">12
</span><span class="lnt">13
</span><span class="lnt">14
</span><span class="lnt">15
</span><span class="lnt">16
</span><span class="lnt">17
</span><span class="lnt">18
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="nt">version</span><span class="p">:</span><span class="w"> </span><span class="s1">&#39;3.6&#39;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="nt">services</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">ollama</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">container_name</span><span class="p">:</span><span class="w"> </span><span class="l">ollama</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">image</span><span class="p">:</span><span class="w"> </span><span class="l">ollama/ollama:latest</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">volumes</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span>- <span class="l">~/container-data/ollama:/root/.ollama</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">ports</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span>- <span class="s2">&#34;11434:11434&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">restart</span><span class="p">:</span><span class="w"> </span><span class="l">unless-stopped</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">deploy</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">resources</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="nt">reservations</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">          </span><span class="nt">devices</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">            </span>- <span class="nt">driver</span><span class="p">:</span><span class="w"> </span><span class="l">nvidia</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">              </span><span class="nt">count</span><span class="p">:</span><span class="w"> </span><span class="l">all</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">              </span><span class="nt">capabilities</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="l">gpu]</span><span class="w">
</span></span></span></code></pre></td></tr></table>
</div>
</div><p>If you want to provision the container without a GPU,
remove the <code>deploy</code> section.</p>
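<p>Similarly, if you prefer a one-off <code>docker run</code> over Compose,
the same deployment maps to a single command; this is a sketch
mirroring the file above.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"># Illustrative docker run equivalent of the compose service above
docker run -d --gpus all \
  --name ollama \
  -v ~/container-data/ollama:/root/.ollama \
  -p 11434:11434 \
  --restart unless-stopped \
  ollama/ollama:latest
</code></pre></div>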
<ol start="3">
<li>Deploy the <code>ollama</code> container using the following command.</li>
</ol>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">docker compose down <span class="o">&amp;&amp;</span> docker compose up -d
</span></span></code></pre></td></tr></table>
</div>
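</div><p>Once the container is up, its startup logs should indicate that a
GPU was detected; the exact wording varies between Ollama releases,
so a broad grep works well.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"># Grep the container logs for GPU/CUDA detection messages
docker logs ollama 2&gt;&amp;1 | grep -i -e gpu -e cuda
</code></pre></div>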
<p>If you want to deploy Ollama with a ChatGPT-style Web UI,
follow the deployment steps for <a href="/posts/computer-science/technologies/self-hosting/self-hosting-ollama/#ollama-web-ui">Ollama Web UI</a>.</p>
<h1 id="managing-llms-using-ollama">Managing LLMs using Ollama</h1>
<p>Once the container is provisioned, we can start downloading and
executing models.</p>
<p>To attach a terminal to the Ollama container, use the following
command.</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">docker <span class="nb">exec</span> -it ollama /bin/bash
</span></span></code></pre></td></tr></table>
</div>
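</div><p>Alternatively, individual Ollama commands can be run as one-offs
without an interactive shell.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"># One-off command: list local models without attaching a shell
docker exec ollama ollama list
</code></pre></div>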
</div><h2 id="downloading-llms-using-the-pull-command">Downloading LLMs using the <code>pull</code> command</h2>
<p>To download a model, use the <code>ollama pull</code> command with the name
of the LLM and its tag (refer to the <a href="https://ollama.ai/library" target="_blank">Ollama Library</a>).</p>
<p>For example, to download the Code Llama model with 7 billion
parameters, we pull the <code>codellama:7b</code> model.</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">ollama pull codellama:7b
</span></span></code></pre></td></tr></table>
</div>
</div><p>Model sizes range from roughly 4 GB to 19 GB (or even more),
so choosing the right model tag is crucial to keep download time
and resource utilization down.</p>
<p>If we want to delete a downloaded model, we use the <code>ollama rm</code>
command followed by the model&rsquo;s name.</p>
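<p>For example, to check what is stored locally and then remove the
Code Llama model:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"># List local models with their tags and sizes
ollama list
# Remove a model we no longer need
ollama rm codellama:7b
</code></pre></div>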
<h2 id="executing-llms-using-the-run-command">Executing LLMs using the <code>run</code> command</h2>
<p>Before we can prompt the model, we have to run it
using the <code>ollama run</code> command followed by the
model&rsquo;s name.</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">ollama run codellama:7b
</span></span></code></pre></td></tr></table>
</div>
</div><p>This command will drop us directly into the model&rsquo;s
prompting window.</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span><span class="lnt">2
</span><span class="lnt">3
</span><span class="lnt">4
</span><span class="lnt">5
</span><span class="lnt">6
</span><span class="lnt">7
</span><span class="lnt">8
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="cl">&gt;&gt;&gt; Who are you?
</span></span><span class="line"><span class="cl">I am LLaMA, an AI assistant developed by Meta AI that 
</span></span><span class="line"><span class="cl">can understand and respond to human input in a conversational 
</span></span><span class="line"><span class="cl">manner. I am trained on a massive dataset of text from the 
</span></span><span class="line"><span class="cl">internet and can generate human-like responses to a wide range 
</span></span><span class="line"><span class="cl">of topics and questions. I can be used to create chatbots, 
</span></span><span class="line"><span class="cl">virtual assistants, and other applications that require natural 
</span></span><span class="line"><span class="cl">language understanding and generation capabilities.
</span></span></code></pre></td></tr></table>
</div>
</div><h1 id="prompting-llms-from-command-line">Prompting LLMs from Command Line</h1>
<p>Ollama exposes multiple REST API endpoints to manage and interact
with the models:</p>
<ul>
<li><code>/api/tags</code>: To list all the local models.</li>
<li><code>/api/generate</code>: To generate a response from an LLM with the
prompt passed as input.</li>
<li><code>/api/chat</code>: To generate the next chat response from an LLM.
The prior chat history could be passed as input.</li>
</ul>
<p>We can perform these API requests using <code>curl</code> and format the
response using <code>jq</code>.</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span><span class="lnt">2
</span><span class="lnt">3
</span><span class="lnt">4
</span><span class="lnt">5
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">curl -d <span class="s1">&#39;{
</span></span></span><span class="line"><span class="cl"><span class="s1">  &#34;model&#34;: &#34;codellama:7b&#34;,
</span></span></span><span class="line"><span class="cl"><span class="s1">  &#34;prompt&#34;: &#34;Write a quicksort program in Go&#34;,
</span></span></span><span class="line"><span class="cl"><span class="s1">  &#34;stream&#34;: false
</span></span></span><span class="line"><span class="cl"><span class="s1">}&#39;</span> http://localhost:11434/api/generate <span class="p">|</span> jq -r <span class="s2">&#34;.response&#34;</span>
</span></span></code></pre></td></tr></table>
</div>
</div><p>By setting <code>stream</code> to <code>false</code> we receive the complete
response as a single JSON object rather than a stream of multiple
objects.</p>
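<p>The other endpoints work the same way; for example, listing the local
models and requesting a chat completion (the field names below follow
Ollama&rsquo;s API documentation).</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"># List local models via the API
curl -s http://localhost:11434/api/tags | jq -r '.models[].name'

# Chat endpoint; prior history goes into the "messages" array
curl -s -d '{
  "model": "codellama:7b",
  "messages": [
    { "role": "user", "content": "Explain tail recursion in one paragraph" }
  ],
  "stream": false
}' http://localhost:11434/api/chat | jq -r '.message.content'
</code></pre></div>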
<h1 id="ollama-web-ui">Ollama Web UI</h1>
<p>With self-hosted applications, it always helps to have a web interface
for management and access from any device.
The Ollama Web UI provides an interface similar to ChatGPT to interact
with LLMs present in Ollama.</p>
<h2 id="deploying-ollama-web-ui">Deploying Ollama Web UI</h2>
<p>Similar to the <code>ollama</code> container deployment, we will create a data
directory for <code>ollama-webui</code>.</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">mkdir -p ~/container-data/ollama-webui
</span></span></code></pre></td></tr></table>
</div>
</div><p>Then modify our existing <code>compose.yaml</code> to add the <code>ollama-webui</code> service.</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt"> 1
</span><span class="lnt"> 2
</span><span class="lnt"> 3
</span><span class="lnt"> 4
</span><span class="lnt"> 5
</span><span class="lnt"> 6
</span><span class="lnt"> 7
</span><span class="lnt"> 8
</span><span class="lnt"> 9
</span><span class="lnt">10
</span><span class="lnt">11
</span><span class="lnt">12
</span><span class="lnt">13
</span><span class="lnt">14
</span><span class="lnt">15
</span><span class="lnt">16
</span><span class="lnt">17
</span><span class="lnt">18
</span><span class="lnt">19
</span><span class="lnt">20
</span><span class="lnt">21
</span><span class="lnt">22
</span><span class="lnt">23
</span><span class="lnt">24
</span><span class="lnt">25
</span><span class="lnt">26
</span><span class="lnt">27
</span><span class="lnt">28
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="nt">version</span><span class="p">:</span><span class="w"> </span><span class="s1">&#39;3.6&#39;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="nt">services</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">ollama</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">container_name</span><span class="p">:</span><span class="w"> </span><span class="l">ollama</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">image</span><span class="p">:</span><span class="w"> </span><span class="l">ollama/ollama:latest</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">volumes</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span>- <span class="l">~/container-data/ollama:/root/.ollama</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">ports</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span>- <span class="s2">&#34;11434:11434&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">restart</span><span class="p">:</span><span class="w"> </span><span class="l">unless-stopped</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">deploy</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">resources</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="nt">reservations</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">          </span><span class="nt">devices</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">            </span>- <span class="nt">driver</span><span class="p">:</span><span class="w"> </span><span class="l">nvidia</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">              </span><span class="nt">count</span><span class="p">:</span><span class="w"> </span><span class="l">all</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">              </span><span class="nt">capabilities</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="l">gpu]</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">ollama-webui</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">container_name</span><span class="p">:</span><span class="w"> </span><span class="l">ollama-webui</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">image</span><span class="p">:</span><span class="w"> </span><span class="l">ghcr.io/ollama-webui/ollama-webui:main</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">ports</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span>- <span class="s2">&#34;3030:8080&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">extra_hosts</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span>- <span class="l">host.docker.internal:host-gateway</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">volumes</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span>- <span class="l">~/container-data/ollama-webui:/app/backend/data</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">restart</span><span class="p">:</span><span class="w"> </span><span class="l">always</span><span class="w">
</span></span></span></code></pre></td></tr></table>
</div>
</div><p>Deploy both containers using <code>docker compose</code>.</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">docker compose down <span class="o">&amp;&amp;</span> docker compose up -d
</span></span></code></pre></td></tr></table>
</div>
</div><p>If the <code>ollama</code> container is deployed on a different host, then
we have to rebuild the <code>ollama-webui</code> container image by following
the instructions <a href="https://github.com/ollama-webui/ollama-webui#using-ollama-on-a-different-server" target="_blank">here</a>.</p>
<h2 id="managing-llms-from-ollama-web-ui">Managing LLMs from Ollama Web UI</h2>
<p>Once the deployment is complete, we can visit the web UI
at <a href="http://localhost:3030" target="_blank"><code>localhost:3030</code></a>.</p>
<p align="center"><img src="ollama-webui.png" alt="Ollama Web UI"></p>
<p align="center"><small>Ollama Web UI</small></p>
<p>Alongside prompting, we can also use the Web UI to manage models.</p>
<p align="center"><img src="ollama-webui-models.png" alt="Managing models using Ollama Web UI"></p>
<p align="center"><small>Managing models using Ollama Web UI</small></p>
<h1 id="integrating-ollama-with-neovim">Integrating Ollama with Neovim</h1>
<p>If you are using Neovim (<a href="/posts/developer-tools/my-development-environment" target="_blank">like me</a>)
then you can integrate models into your development environment
using <a href="https://github.com/nomnivore/ollama.nvim" target="_blank"><code>ollama.nvim</code></a>.</p>
<p><code>ollama.nvim</code> supports the following features:</p>
<ul>
<li>Code generation from a text prompt</li>
<li>Generating an explanation for a code snippet</li>
<li>Code modification suggestions</li>
</ul>
<p align="center"><img src="ollama-nvim.gif" alt="Code explanation from Ollama using ollama.nvim"></p>
<p align="center"><small>Code explanation from Ollama using ollama.nvim</small></p>
<p>I am using LazyVim, so I&rsquo;ve created <code>~/.config/nvim/lua/plugins/ollama.lua</code>
with the following content.</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt"> 1
</span><span class="lnt"> 2
</span><span class="lnt"> 3
</span><span class="lnt"> 4
</span><span class="lnt"> 5
</span><span class="lnt"> 6
</span><span class="lnt"> 7
</span><span class="lnt"> 8
</span><span class="lnt"> 9
</span><span class="lnt">10
</span><span class="lnt">11
</span><span class="lnt">12
</span><span class="lnt">13
</span><span class="lnt">14
</span><span class="lnt">15
</span><span class="lnt">16
</span><span class="lnt">17
</span><span class="lnt">18
</span><span class="lnt">19
</span><span class="lnt">20
</span><span class="lnt">21
</span><span class="lnt">22
</span><span class="lnt">23
</span><span class="lnt">24
</span><span class="lnt">25
</span><span class="lnt">26
</span><span class="lnt">27
</span><span class="lnt">28
</span><span class="lnt">29
</span><span class="lnt">30
</span><span class="lnt">31
</span><span class="lnt">32
</span><span class="lnt">33
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-lua" data-lang="lua"><span class="line"><span class="cl"><span class="kr">return</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">	<span class="s2">&#34;nomnivore/ollama.nvim&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">	<span class="n">dependencies</span> <span class="o">=</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">		<span class="s2">&#34;nvim-lua/plenary.nvim&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">	<span class="p">},</span>
</span></span><span class="line"><span class="cl">	<span class="c1">-- All the user commands added by the plugin</span>
</span></span><span class="line"><span class="cl">	<span class="n">cmd</span> <span class="o">=</span> <span class="p">{</span> <span class="s2">&#34;Ollama&#34;</span><span class="p">,</span> <span class="s2">&#34;OllamaModel&#34;</span><span class="p">,</span> <span class="s2">&#34;OllamaServe&#34;</span><span class="p">,</span> <span class="s2">&#34;OllamaServeStop&#34;</span> <span class="p">},</span>
</span></span><span class="line"><span class="cl">	<span class="n">keys</span> <span class="o">=</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">		<span class="c1">-- Sample keybind for prompt menu.</span>
</span></span><span class="line"><span class="cl">		<span class="c1">-- Note that the &lt;c-u&gt; is important for selections</span>
</span></span><span class="line"><span class="cl">		<span class="c1">-- to work properly.</span>
</span></span><span class="line"><span class="cl">		<span class="p">{</span>
</span></span><span class="line"><span class="cl">			<span class="s2">&#34;&lt;leader&gt;oo&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">			<span class="s2">&#34;:&lt;c-u&gt;lua require(&#39;ollama&#39;).prompt()&lt;cr&gt;&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">			<span class="n">desc</span> <span class="o">=</span> <span class="s2">&#34;ollama prompt&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">			<span class="n">mode</span> <span class="o">=</span> <span class="p">{</span> <span class="s2">&#34;n&#34;</span><span class="p">,</span> <span class="s2">&#34;v&#34;</span> <span class="p">},</span>
</span></span><span class="line"><span class="cl">		<span class="p">},</span>
</span></span><span class="line"><span class="cl">		<span class="c1">-- Sample keybind for direct prompting.</span>
</span></span><span class="line"><span class="cl">		<span class="c1">-- Note that the &lt;c-u&gt; is important for selections</span>
</span></span><span class="line"><span class="cl">		<span class="c1">-- to work properly.</span>
</span></span><span class="line"><span class="cl">		<span class="p">{</span>
</span></span><span class="line"><span class="cl">			<span class="s2">&#34;&lt;leader&gt;oG&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">			<span class="s2">&#34;:&lt;c-u&gt;lua require(&#39;ollama&#39;).prompt(&#39;Generate_Code&#39;)&lt;cr&gt;&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">			<span class="n">desc</span> <span class="o">=</span> <span class="s2">&#34;ollama Generate Code&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">			<span class="n">mode</span> <span class="o">=</span> <span class="p">{</span> <span class="s2">&#34;n&#34;</span><span class="p">,</span> <span class="s2">&#34;v&#34;</span> <span class="p">},</span>
</span></span><span class="line"><span class="cl">		<span class="p">},</span>
</span></span><span class="line"><span class="cl">	<span class="p">},</span>
</span></span><span class="line"><span class="cl">	<span class="c1">---@type Ollama.Config</span>
</span></span><span class="line"><span class="cl">	<span class="n">opts</span> <span class="o">=</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">		<span class="n">model</span> <span class="o">=</span> <span class="s2">&#34;codellama:7b&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">		<span class="n">url</span> <span class="o">=</span> <span class="s2">&#34;http://127.0.0.1:11434&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">	<span class="p">},</span>
</span></span><span class="line"><span class="cl"><span class="p">}</span>
</span></span></code></pre></td></tr></table>
</div>
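</div><p>Since the plugin is lazy-loaded on the commands and keymaps listed
above, it only starts when invoked; for example, visually select a snippet
and press <code>&lt;leader&gt;oo</code> to pick a prompt for it, or run
<code>:Ollama</code> directly.</p>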
</div><h1 id="integrating-ollama-with-vscode">Integrating Ollama with VSCode</h1>
<p>The <a href="https://marketplace.visualstudio.com/items?itemName=Continue.continue" target="_blank">Continue</a> VSCode
extension supports the integration of LLMs
as coding assistants. To use it with Ollama we have to
change the <strong>Proxy Server Url</strong> in its settings to the one
used by our Ollama container.</p>
<p align="center"><img src="continue-settings.png" alt="Continue Extension Settings"></p>
<p align="center"><small>Continue Extension Settings</small></p>
<p>Watch Ollama in action inside VSCode:</p>
<p align="center"><img src="ollama-vscode.gif" alt="Optimizing code using Ollama in VSCode"></p>
<p align="center"><small>Optimizing code using Ollama in VSCode</small></p>
<hr>
<p>Thank you for taking the time to read this blog post! Have questions, feedback or want to discuss this topic? Feel free to reach out at <a href="mailto:blog@avni.sh">blog@avni.sh</a>.</p>
<p>If you found this content valuable and would like to stay updated with my latest posts, consider subscribing to my <a href="https://www.avni.sh/index.xml" target="_blank">RSS Feed</a>.</p>
<h1 id="resources">Resources</h1>
<p><a href="https://ollama.ai/" target="_blank">Ollama</a><br>
<a href="https://hub.docker.com/r/ollama/ollama" target="_blank">Ollama Docker Image</a><br>
<a href="https://www.nvidia.com/en-in/geforce/graphics-cards/30-series/rtx-3070-3070ti/" target="_blank">NVIDIA GeForce RTX 3070 Ti</a><br>
<a href="https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html" target="_blank">NVIDIA CUDA Installation Guide for Linux</a><br>
<a href="https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/index.html" target="_blank">NVIDIA Container Toolkit</a><br>
<a href="https://ollama.ai/library" target="_blank">Ollama Library</a><br>
<a href="https://github.com/ollama-webui/ollama-webui" target="_blank">Ollama Web UI</a><br>
<a href="https://github.com/nomnivore/ollama.nvim" target="_blank">ollama.nvim</a><br>
<a href="https://marketplace.visualstudio.com/items?itemName=Continue.continue" target="_blank">Continue</a></p>
]]></content:encoded>
    </item>
  </channel>
</rss>
