Run an LLM Locally on an Intel Mac with an eGPU
Originally published on https://www.ankitbabber.com
I have a Mac with Intel silicon. I also have an eGPU with an AMD 6900XT (...alright!). But I couldn't harness that power and run an LLM locally with Ollama! If you have a Mac with Intel silicon, then you know that the CPU and integrated GPU are insufficient for running an LLM locally, and I didn't want to buy new hardware just to play around with LLMs. Then I found llama.cpp, an amazing repo that helps democratize the use of LLMs on local machines. However, the install instructions in the llama.cpp repo did not cover the issues I came across, which is why I was motivated to write up a post to help others out. If you have older hardware that isn't supported by the current tools for running an LLM locally (specifically a Mac with Intel silicon and an AMD eGPU), then this post is for you! As a side note, even if your hardware isn't a 1:1 match for mine, but you realize that the Vulkan backend for llama.cpp is the right fit for you, this post may still be useful for setting up the Vulkan backend.
0. Dependencies and Setup
If you have a Mac with Intel silicon, then you need to use the Vulkan backend when setting up llama.cpp, because llama.cpp only supports the Metal API on Apple silicon Macs and AMD only provides HIP support on Linux.

- Make sure you haven't installed MoltenVK or other Vulkan SDK components piecemeal via a package manager like `brew`, because that can interfere with the Vulkan SDK install.
- Download and install the Vulkan SDK.
- Install `cmake` and `libomp` with your favorite package manager.
- Make sure you verify the `sha256` hash to ensure the file you downloaded is correct.
# verify the sha256 hash on your local machine, I'll do it on my Mac with the below command
openssl dgst -sha256 ./path/to/download/vulkansdk-macos-1.4.304.0.zip
# verify the output visually, on a repl, etc
# you will also need to install cmake and libomp
brew install cmake libomp
brew doctor --verbose
# if there is no output, then proceed
# if there are files listed here, copy them and save them in a separate txt file
# get the `llama.cpp` repo from github
git clone https://github.com/ggerganov/llama.cpp.git
# or with the github cli get the `llama.cpp` repo from github
# gh repo clone ggerganov/llama.cpp
# ensure you are in the llama.cpp directory for the next steps
cd llama.cpp
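The visual hash check above can be scripted so the comparison isn't done by eye. Below is a minimal sketch (the helper name and usage are my own, not from the Vulkan docs) that compares a file's `sha256` against the value published on the download page:

```shell
# helper: compare a file's sha256 against an expected (published) value
check_sha256() {
  local file="$1" expected="$2"
  local actual
  # `-r` prints "hash *filename"; keep only the hash
  actual=$(openssl dgst -sha256 -r "$file" | cut -d' ' -f1)
  if [ "$actual" = "$expected" ]; then
    echo "checksum OK"
  else
    echo "checksum MISMATCH: got $actual" >&2
    return 1
  fi
}

# usage (illustrative; take the real hash from the Vulkan SDK download page):
# check_sha256 vulkansdk-macos-1.4.304.0.zip <published-sha256>
```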
1. Build llama.cpp
Using cmake

- When I tried to build `llama.cpp` with `cmake` for the first time, following the instructions on the llama.cpp repo for Vulkan, I got the following error after running `cmake -B build -DGGML_VULKAN=ON`:
-- Could NOT find OpenMP_C (missing: OpenMP_C_FLAGS OpenMP_C_LIB_NAMES)
-- Could NOT find OpenMP_CXX (missing: OpenMP_CXX_FLAGS OpenMP_CXX_LIB_NAMES)
-- Could NOT find OpenMP (missing: OpenMP_C_FOUND OpenMP_CXX_FOUND)
CMake Warning at ggml/src/ggml-cpu/CMakeLists.txt:53 (message):
OpenMP not found
- On your Mac, Xcode Command Line Tools or `xcode-select` should provide access to OpenMP via Clang. My version of `cmake` could not see the OpenMP API included with Clang, so to make my life easier I just installed OpenMP via `brew install libomp`.
- After this, I needed to link `libomp` when running `cmake -B build -DGGML_VULKAN=ON` from the instructions on the llama.cpp repo for Vulkan. Since I installed `libomp` with `brew`, the path to `libomp` reflects that. Also, I explicitly ensured the Metal API was off.
cmake -B build -DGGML_METAL=OFF -DGGML_VULKAN=ON \
  -DOpenMP_C_FLAGS="-Xpreprocessor -fopenmp $(brew --prefix)/opt/libomp/lib/libomp.dylib -I$(brew --prefix)/opt/libomp/include" \
  -DOpenMP_CXX_FLAGS="-Xpreprocessor -fopenmp $(brew --prefix)/opt/libomp/lib/libomp.dylib -I$(brew --prefix)/opt/libomp/include" \
  -DOpenMP_C_LIB_NAMES="libomp" \
  -DOpenMP_CXX_LIB_NAMES="libomp" \
  -DOpenMP_libomp_LIBRARY="$(brew --prefix)/opt/libomp/lib/libomp.dylib"
I didn't run into any other errors after running the previous command, but make sure to read through the output to ensure everything is fine on your system.
- To complete the build process, run `cmake --build build --config Release`. You will see many objects being built so that `llama.cpp` can run, along with the following warnings as of version `b4686` of `llama.cpp`:
ggml_vulkan: Generating and compiling shaders to SPIR-V
[ 6%] Building CXX object ggml/src/ggml-vulkan/CMakeFiles/ggml-vulkan.dir/ggml-vulkan.cpp.o
/Users/ankit/Playground/llama.cpp/ggml/src/ggml-vulkan/ggml-vulkan.cpp:1382:2: warning: extra ';' outside of a function is incompatible with C++98 [-Wc++98-compat-extra-semi]
1382 | };
| ^
/Users/ankit/Playground/llama.cpp/ggml/src/ggml-vulkan/ggml-vulkan.cpp:7048:16: warning: 'return' will never be executed [-Wunreachable-code-return]
7048 | return false;
| ^~~~~
/Users/ankit/Playground/llama.cpp/ggml/src/ggml-vulkan/ggml-vulkan.cpp:8208:15: warning: 'break' will never be executed [-Wunreachable-code-break]
8208 | } break;
| ^~~~~
/Users/ankit/Playground/llama.cpp/ggml/src/ggml-vulkan/ggml-vulkan.cpp:8167:15: warning: 'break' will never be executed [-Wunreachable-code-break]
8167 | } break;
| ^~~~~
/Users/ankit/Playground/llama.cpp/ggml/src/ggml-vulkan/ggml-vulkan.cpp:8088:15: warning: 'break' will never be executed [-Wunreachable-code-break]
8088 | } break;
| ^~~~~
/Users/ankit/Playground/llama.cpp/ggml/src/ggml-vulkan/ggml-vulkan.cpp:8035:13: warning: 'break' will never be executed [-Wunreachable-code-break]
8035 | break;
| ^~~~~
6 warnings generated.
[ 6%] Building CXX object ggml/src/ggml-vulkan/CMakeFiles/ggml-vulkan.dir/ggml-vulkan-shaders.cpp.o
[ 7%] Linking CXX shared library ../../../bin/libggml-vulkan.dylib
[ 7%] Built target ggml-vulkan
[ 8%] Building C object ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/ggml-cpu.c.o
cc: warning: /usr/local/opt/libomp/lib/libomp.dylib: 'linker' input unused [-Wunused-command-line-argument]
In file included from /Users/ankit/Playground/llama.cpp/ggml/src/ggml-cpu/ggml-cpu.c:40:
/usr/local/opt/libomp/include/omp.h:54:9: warning: ISO C restricts enumerator values to range of 'int' (2147483648 is too large) [-Wpedantic]
54 | omp_sched_monotonic = 0x80000000
| ^ ~~~~~~~~~~
/usr/local/opt/libomp/include/omp.h:411:7: warning: ISO C restricts enumerator values to range of 'int' (18446744073709551615 is too large) [-Wpedantic]
411 | KMP_ALLOCATOR_MAX_HANDLE = UINTPTR_MAX
| ^ ~~~~~~~~~~~
/usr/local/opt/libomp/include/omp.h:427:7: warning: ISO C restricts enumerator values to range of 'int' (18446744073709551615 is too large) [-Wpedantic]
427 | KMP_MEMSPACE_MAX_HANDLE = UINTPTR_MAX
| ^ ~~~~~~~~~~~
/usr/local/opt/libomp/include/omp.h:471:39: warning: ISO C restricts enumerator values to range of 'int' (18446744073709551615 is too large) [-Wpedantic]
471 | typedef enum omp_event_handle_t { KMP_EVENT_MAX_HANDLE = UINTPTR_MAX } omp_event_handle_t;
| ^ ~~~~~~~~~~~
4 warnings generated.
[ 8%] Building CXX object ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/ggml-cpu.cpp.o
c++: warning: /usr/local/opt/libomp/lib/libomp.dylib: 'linker' input unused [-Wunused-command-line-argument]
[ 9%] Building CXX object ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/ggml-cpu-aarch64.cpp.o
c++: warning: /usr/local/opt/libomp/lib/libomp.dylib: 'linker' input unused [-Wunused-command-line-argument]
[ 9%] Building CXX object ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/ggml-cpu-hbm.cpp.o
c++: warning: /usr/local/opt/libomp/lib/libomp.dylib: 'linker' input unused [-Wunused-command-line-argument]
[ 9%] Building C object ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/ggml-cpu-quants.c.o
cc: warning: /usr/local/opt/libomp/lib/libomp.dylib: 'linker' input unused [-Wunused-command-line-argument]
[ 10%] Building CXX object ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/ggml-cpu-traits.cpp.o
c++: warning: /usr/local/opt/libomp/lib/libomp.dylib: 'linker' input unused [-Wunused-command-line-argument]
[ 10%] Building CXX object ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/amx/amx.cpp.o
c++: warning: /usr/local/opt/libomp/lib/libomp.dylib: 'linker' input unused [-Wunused-command-line-argument]
[ 11%] Building CXX object ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/amx/mmq.cpp.o
c++: warning: /usr/local/opt/libomp/lib/libomp.dylib: 'linker' input unused [-Wunused-command-line-argument]
[ 11%] Building CXX object ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/llamafile/sgemm.cpp.o
c++: warning: /usr/local/opt/libomp/lib/libomp.dylib: 'linker' input unused [-Wunused-command-line-argument]
[ 12%] Linking CXX shared library ../../bin/libggml-cpu.dylib
[ 12%] Built target ggml-cpu
- I have not run into any issues from these warnings yet, but I will update this post if I do.
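As a side note, the build step can take a while on older hardware. You can parallelize it across your cores; a sketch using a core-count query that works on both macOS and Linux (the `-j` flag is standard `cmake`):

```shell
# query the number of online cores (supported on macOS and Linux)
jobs=$(getconf _NPROCESSORS_ONLN)
echo "building with $jobs parallel jobs"

# then run the build with that many parallel jobs:
# cmake --build build --config Release -j "$jobs"
```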
2. Getting Models for llama.cpp
- You will need to download models to run with `llama.cpp`.
- Hugging Face is an excellent source for models, but make sure you get quantized models. Quantization reduces the numeric precision of a model's weights (for example, from 16-bit floats to 4-bit integers), which shrinks its memory footprint enough to run on hardware that does not have enough RAM for the full-precision model. I will create a future post regarding quantizing models, but for now we will use a pre-quantized model for the purposes of testing our build.
- Run `cd ../` to go one level up, outside of the `llama.cpp` directory, and `mkdir llm-models`. We will store all of our models outside of the `llama.cpp` repo.
- Go to the newly created directory: `cd llm-models`.
- I downloaded an 8B (8-billion-parameter) Meta Llama 3.1 model from ggml's Hugging Face, which was quantized using the `Q4_0` quantization method.
- Ensure you download the model into the `llm-models/` directory. Once the download is complete, we are ready to run the model locally.
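Before running the model, a quick sanity check can save a debugging session: every GGUF file begins with the 4-byte magic `GGUF`, so you can verify the download isn't truncated or an HTML error page. A small sketch (the helper name is my own):

```shell
# check that a file starts with the GGUF magic bytes ("GGUF" in ASCII)
is_gguf() {
  [ "$(head -c 4 "$1")" = "GGUF" ]
}

# usage (illustrative):
# is_gguf meta-llama-3.1-8b-instruct-q4_0.gguf && echo "looks like a GGUF model"
```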
3. Running llama.cpp
cd llama.cpp
# start interactive mode
./build/bin/llama-cli -m ../llm-models/meta-llama-3.1-8b-instruct-q4_0.gguf
- If everything went well, `llama.cpp` will see the AMD GPU:
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon RX 6900 XT (MoltenVK) | uma: 0 | fp16: 1 | warp size: 64 | shared memory: 65536 | matrix cores: none
- Test out interactive mode and have fun running an LLM locally!
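Beyond interactive mode, you can run a one-shot prompt to smoke-test the build. The flags below are standard `llama-cli` options (`-p` for the prompt, `-n` for the maximum number of tokens to generate, `-ngl` for the number of layers to offload to the GPU); the prompt itself is just an example:

```shell
# one-shot generation instead of interactive mode;
# -ngl 99 asks llama.cpp to offload as many layers as possible to the eGPU
./build/bin/llama-cli \
  -m ../llm-models/meta-llama-3.1-8b-instruct-q4_0.gguf \
  -p "Explain what a quantized model is in one sentence." \
  -n 128 \
  -ngl 99
```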
4. Last Minute Cleanup
- On my Mac, the Vulkan SDK created a bunch of dylib, static lib, pc, and header files in `/usr/local/lib`.
- When you run `brew doctor --verbose`, `brew` will give you a bunch of warnings that it found unbrewed files. You can choose to ignore this warning. However, if it bothers you like it bothered me, you'll want to do something about it.
- WARNING: The next steps involve altering `brew` locally, and this is a temporary fix. If you are not comfortable working with bash functions or altering ruby code, do not proceed.
- Go to `/usr/local/Homebrew/Library/Homebrew`. There you will find `diagnostic.rb`.
- Open this file and add the corresponding files to the `allow_list` array in the following functions: `def check_for_stray_dylibs`, `def check_for_stray_static_libs`, `def check_for_stray_pcs`, and `def check_for_stray_headers`. You can run `brew doctor --verbose` to get the list of files again.
- Make sure not to include the files that were output by `brew doctor --verbose` before the Vulkan SDK install and saved to a separate file. If there were unbrewed files before installing the Vulkan SDK, then they must be addressed separately and are outside the scope of this post.
- Once you have changed `diagnostic.rb`, save the file.
- When you `cd /usr/local/Homebrew` and run `git status`, you will see that `diagnostic.rb` has changed. We cannot commit these changes.
- If you run `brew doctor --verbose`, the files added to `/usr/local/lib` by the Vulkan SDK are no longer reported.
- These changes are not permanent. To ensure I don't have to edit `diagnostic.rb` each time I upgrade `brew`, I wrote two bash functions and saved the list of dylibs, static libs, pcs, and headers to a JSON file.
- Add the following functions to your `.bashrc`, `.zshrc`, or `/custom` for oh-my-zsh:
function allow-stash()
{
  CURR_DIR=$(pwd)
  cd /usr/local/Homebrew
  git stash
  cd "$CURR_DIR"
}

function allow-stash-apply()
{
  CURR_DIR=$(pwd)
  cd /usr/local/Homebrew
  git stash apply
  cd "$CURR_DIR"
}
- `allow-stash` saves the changes made to `diagnostic.rb` with `git stash`. We're simply stashing the appends made to the `allow_list` in the previously listed functions.
- Then I run `brew update && brew upgrade`.
- Then I run `allow-stash-apply` to re-apply my changes to `diagnostic.rb`.
- You can create a third function to combine all these steps into a single command.
function update-brew()
{
allow-stash && brew update && brew upgrade && allow-stash-apply
}
`bash` will run these commands in sequence from left to right, and `&&` stops the chain as soon as one command fails, so the order matters.
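A quick self-contained illustration of that short-circuit behavior:

```shell
# `&&` short-circuits: the right-hand command runs only if the left one succeeded
false && echo "never printed" || :   # `|| :` keeps the line's exit status at 0
true && echo "printed"
```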