Local AI solutions – simply explained and put to good use!
Dear readers,
Today, we would like to share with you the opportunities offered by Large Language Models (LLMs). We explain the key concepts and how they work, as well as the advantages and disadvantages of cloud-based versus local applications.
What is the difference between...?
AI applications such as OpenAI's ChatGPT from the US, Mistral's Le Chat from France or DeepSeek from China are already familiar to many. They make it possible to generate natural-sounding text, compile information clearly, or even generate images, speech or video. What is special is that complicated instructions in a programming language are no longer needed: results are generated from plain human language. However, almost all of these applications run on the providers' cloud servers and can only be accessed via the internet, which brings corresponding disadvantages in terms of data protection.
One of the most exciting applications in a business context is therefore locally operated AI systems combined with a company's own personal data. In this way, models can be tailored not only to in-house use but also, in a more targeted way, to specific requirements. This is where local language models and applications come into play: they can be hosted on your own hardware and thus used more securely and flexibly than online applications.
What does... actually mean?
Large Language Models (LLMs) are AI models that can understand human language and, among other things, generate natural-sounding text. The number of model parameters (e.g. 7B or 70B, i.e. 7 or 70 billion) indicates how capable a model is, but larger models (e.g. Llama 3 70B) also require more powerful hardware.
Retrieval-Augmented Generation (RAG) is on everyone's lips these days. This technique lets the AI model retrieve up-to-date information from external sources in a targeted way and generate precise answers to specific questions. In this way, your own data can easily be integrated into the system.
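To make the idea concrete, here is a minimal sketch of the RAG principle in Python: the best-matching snippet from a small in-house knowledge base is retrieved and prepended to the question before the model sees it. The example documents, the simple keyword-overlap retrieval and the prompt wording are illustrative assumptions; production systems typically use vector embeddings and a dedicated search index instead.

```python
# Minimal RAG sketch: retrieve the most relevant snippet from a small
# in-house knowledge base and prepend it to the user's question.
# Real systems use vector embeddings instead of simple word overlap.

DOCUMENTS = [  # illustrative in-house data
    "Our support hotline is reachable Mon-Fri from 9 to 17 o'clock.",
    "Travel expenses must be submitted within 30 days via the intranet portal.",
    "The cafeteria offers a vegetarian dish every day.",
]

def retrieve(question: str, docs: list[str]) -> str:
    """Return the document sharing the most words with the question."""
    q_words = set(question.lower().split())
    return max(docs, key=lambda d: len(q_words & set(d.lower().split())))

def build_prompt(question: str) -> str:
    """Augment the question with retrieved context before sending it to the LLM."""
    context = retrieve(question, DOCUMENTS)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

print(build_prompt("When can I reach the support hotline?"))
```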
Local hosting means running software directly on your own hardware, without using cloud services. Via an API, external programs and applications can also communicate easily and flexibly with locally hosted software.
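As an illustration of such an API, the following sketch sends a prompt to a locally hosted model over HTTP. It assumes an Ollama server (a popular open-source model runner) listening on its default address localhost:11434 with the llama3 model already pulled; the URL and model name depend on your own setup.

```python
import json
import urllib.request

# Sketch: query a locally hosted model over HTTP. Assumes an Ollama
# server running on localhost:11434 with the "llama3" model pulled;
# adjust URL and model name to your own setup.
payload = json.dumps({
    "model": "llama3",
    "prompt": "Summarise the advantages of local AI hosting in one sentence.",
    "stream": False,          # return the complete answer in one response
}).encode("utf-8")

request = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=payload,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(request) as response:
    answer = json.loads(response.read())
print(answer["response"])     # the generated text
```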
Local solutions have the great advantage of keeping your data securely on your own server or computer and avoiding ongoing cloud costs. However, more powerful models require more powerful hardware; otherwise, response times increase.
What hardware do I need?
Entry-level models such as Llama 3 (8B), Mistral 7B or Phi-3 are well suited to everyday applications and can run even on ordinary office laptops with 8-16 GB of RAM, especially when RAG is not used. These models usually run efficiently on CPUs or integrated GPUs (e.g. Intel Arc or Apple M-series chips).
More powerful models such as Llama 3 (70B) or Mixtral 8×7B, however, require significantly more computing power. A workstation or server with at least 32 GB of RAM and a dedicated GPU with at least 24 GB of VRAM (e.g. NVIDIA RTX 4090 or A6000) is recommended. Such systems allow not only larger contexts and faster inference, but also features such as RAG. For production operation or parallel queries, using several GPUs can also be worthwhile.
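A rough back-of-the-envelope calculation shows why: a model's memory footprint is roughly its parameter count times the bytes stored per weight, plus overhead for activations and context. The sketch below encodes this rule of thumb; the 20% overhead factor is an assumption, and actual requirements vary with quantisation format and context length.

```python
# Back-of-the-envelope memory estimate for running an LLM locally.
# Rule of thumb: parameters x bytes per weight, plus ~20% overhead
# (an assumed factor) for activations, KV cache and runtime buffers.

def estimated_memory_gb(params_billion: float, bits_per_weight: int) -> float:
    weights_gb = params_billion * bits_per_weight / 8  # 1B params at 8 bit = 1 GB
    return weights_gb * 1.2                            # assumed 20% overhead

models = [
    ("Llama 3 8B", 8),
    ("Mixtral 8x7B", 47),   # total parameters across all experts
    ("Llama 3 70B", 70),
]
for name, params in models:
    for bits in (16, 4):    # fp16 vs. common 4-bit quantisation
        print(f"{name} @ {bits}-bit: ~{estimated_memory_gb(params, bits):.0f} GB")
```

At 4-bit quantisation, an 8B model fits comfortably into laptop memory, while a 70B model still needs around 40 GB, which in practice is often split between GPU VRAM and system RAM on a workstation like the one described above.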
Conclusion
Local AI solutions offer security, flexibility and long-term cost efficiency. They are particularly suitable for companies that take data protection seriously but do not want to forego the benefits of AI applications.
We hope this overview will help you understand local AI applications and identify opportunities for meaningful use!
If we have piqued your interest, simply book an appointment for a free consultation at: info@edih-saarland.de. We are happy to help you implement your project!

Author
Daniel Silva
East Side Fab

