{"id":10894,"date":"2025-03-31T10:00:00","date_gmt":"2025-03-31T08:00:00","guid":{"rendered":"https:\/\/haimagazine.com\/uncategorized\/pitfalls-in-ai-implementations\/"},"modified":"2025-06-26T15:36:39","modified_gmt":"2025-06-26T13:36:39","slug":"pitfalls-in-ai-implementations","status":"publish","type":"post","link":"https:\/\/haimagazine.com\/en\/hai-magazine-4\/pitfalls-in-ai-implementations\/","title":{"rendered":"\ud83d\udd12 Pitfalls in AI Implementations"},"content":{"rendered":"<p class=\"wp-block-paragraph\">There is already a history of AI implementations gone wrong. Chatbots have made unauthorized discount promises to customers, which ended in lawsuits. There were bots that, instead of informing customers about the status of their packages, showed off their ability to write haiku. And there were solutions based on large language models that insulted their users. These infamous cases are just the tip of the iceberg.<\/p><p class=\"wp-block-paragraph\">Many more AI projects fail in a completely different way \u2013 quietly. They simply never get past the user acceptance testing (UAT) phase. We&#8217;re talking about initiatives that reach a level that is supposedly &#8220;good enough&#8221;, but not convincing enough for management to approve a production rollout. These are precisely the projects that, in my opinion, are the biggest failures. Work on them often begins with great enthusiasm, consumes significant resources and takes up months of developers&#8217; time, only for the result to be forgotten without generating any business value.<\/p><p class=\"wp-block-paragraph\">Over the past two years as a technical architect, I have had the opportunity to support companies in delivering over a hundred projects that used artificial intelligence. Based on this experience, I have gathered the key mistakes that you must avoid if you want a successful AI implementation in your organization. 
<\/p><p class=\"wp-block-paragraph\">It may seem surprising that an article about such a groundbreaking technology begins with its limitations. However, from the perspective of a systems architect, I believe that the key to success is a realistic approach. As with the classic service triangle (&#8220;a service can be done quickly, cheaply and well \u2013 you just have to choose two of the three&#8221;), in AI projects we also have to make pragmatic choices.<\/p><h4 class=\"wp-block-heading\"><strong>The probabilistic nature of AI<\/strong><\/h4><p class=\"wp-block-paragraph\">Every machine learning (ML) model can make mistakes. In classical ML solutions, we can often determine how certain the model is through what&#8217;s called a confidence score \u2013 for instance, when the model declares with 98 percent certainty that the image shows a chihuahua, not a muffin (Google &#8220;chihuahua muffin&#8221; if you don&#8217;t know what I&#8217;m talking about).<\/p><p class=\"wp-block-paragraph\">In the case of large language models (LLMs), the matter is more complicated. The most popular solution is to verify the results using a second model, either another LLM or a small language model (SLM). We can introduce one or even several such validators that check the generative AI output for us and confirm that it meets our expectations not only in terms of quality, but also in terms of the absence of bias. This approach significantly reduces the risk of errors.<\/p><p class=\"wp-block-paragraph\">However, it introduces additional challenges:<\/p><ul class=\"wp-block-list\"><li>it increases the number of queries to models, and thus usually also the inference costs,<\/li>\n\n<li>it increases latency, which in turn can degrade the user experience, especially in the case of chatbots, voicebots and similar applications.<\/li><\/ul><p class=\"wp-block-paragraph\">You can also consider a hybrid approach that combines AI with rules. 
This is especially worthwhile if you have managed to create something that works well most of the time. In that case, a good solution is often to simply add a notice to the user interface: &#8220;You are talking to an artificial intelligence that can make mistakes. If you have any doubts, contact info@yourcompany.com.&#8221; This helps cover the last few percent of error risk.<\/p><h4 class=\"wp-block-heading\"><strong>Hallucinations \u2013 facts and myths<\/strong><\/h4><p class=\"wp-block-paragraph\">The term &#8220;hallucinations&#8221; is probably familiar to all of us, as it frequently appears in the context of incorrect responses generated by AI. However, true hallucinations, that is, mistakes resulting from the nature of the language model itself, constitute only a small part of the problem. Imperfect responses are often due to factors such as:<\/p><ul class=\"wp-block-list\"><li>improperly designed solution architecture,<\/li>\n\n<li>inconsistent or unclear prompts,<\/li>\n\n<li>poorly chosen context,<\/li>\n\n<li>inappropriate query parameters, e.g., a &#8220;temperature&#8221; that is set too high (temperature is the parameter that controls the degree of randomness in the generated responses),<\/li>\n\n<li>insufficient data quality in Retrieval-Augmented Generation (RAG) systems.<\/li><\/ul><p class=\"wp-block-paragraph\">The last point, the insufficient quality of the data added to the database that the AI system will use, deserves special attention. We often try to fix the problem of incorrect, conflicting, and\/or insufficient data by manipulating the prompt, when a better solution is usually to improve the quality of the input data. Fortunately, AI can help us with this.<\/p><p class=\"wp-block-paragraph\">Let&#8217;s imagine, for example, that you are building a chatbot for your clients that will respond to questions based on your own sources \u2013 the terms of the services you offer, their costs and availability. 
These documents will change as modifications are made to your portfolio. Therefore, you can create an application that checks the quality of every new document you want to add. You can implement it so that it verifies both the quality of the new document itself, such as whether the PDF file has been scanned correctly, and whether the document conflicts with other documents that are already present in your source document database. After detecting issues, the application will attempt to fix them and\/or send a warning to the person responsible for the document (in classic data governance terms, the data steward). The application and its corresponding process significantly reduce the risk of incorrect LLM responses caused by poor data quality.<\/p><h4 class=\"wp-block-heading\"><strong>Perfect is the enemy of good<\/strong><\/h4><p class=\"wp-block-paragraph\">A well-designed and implemented AI system rarely makes mistakes. However, it is key to define at the beginning of the project what level of \u201crarely\u201d is sufficient for us. If you need 100% accuracy and absolute certainty that the system will respond exactly as you expect, an LLM might simply not be the best choice.<\/p><p class=\"wp-block-paragraph\">However, in most projects this is not necessary. Sometimes, you just have to accept minor stylistic differences or slight deviations from the intended result. 
Instead of striving for unattainable perfection, it&#8217;s worth setting specific, measurable goals, for example:<\/p><ul class=\"wp-block-list\"><li>no hallucinations in key areas and a relevancy score of at least 80% for responses,<\/li>\n\n<li>effectiveness verified on 30 predefined test cases,<\/li>\n\n<li>tests repeated ten times for reliable results.<\/li><\/ul><p class=\"wp-block-paragraph\">Agreeing on such criteria up front helps us avoid the trap of feeling that we are just one prompt away from the perfect solution, a feeling that often leads to projects dragging on indefinitely.<\/p><h4 class=\"wp-block-heading\"><strong>Managing expectations<\/strong><\/h4><p class=\"wp-block-paragraph\">Everyone who has ever worked with clients knows how important it is to manage expectations and set clear, realistic goals. Attempts to implement projects with unclear or conflicting goals usually end in failure, client dissatisfaction, and even the breakdown of business relationships.<\/p><p class=\"wp-block-paragraph\">The same principle applies to AI projects. When developing AI-based tools, it&#8217;s crucial to establish realistic (and consistent!) requirements. For example:<\/p><ul class=\"wp-block-list\"><li>the tool can generate product descriptions either in a direct and humorous style or in a formal and elegant tone,<\/li>\n\n<li>a chatbot based on the RAG architecture can provide either general or detailed responses, and may either strictly adhere to sources and quote their content, or primarily follow user questions and tailor responses accordingly.<\/li><\/ul><p class=\"has-accent-color has-text-color has-link-color wp-elements-0cf430b13b0b444900af836f35a02042 wp-block-paragraph\">It is advisable to avoid creating &#8220;all-in-one&#8221; solutions. These often turn into complicated, overly long, and internally contradictory prompts and application designs. 
Introducing exceptions to the rules of AI-based solutions increases the risk that they will not behave as we expect.<\/p><p class=\"wp-block-paragraph\">If you truly need different behaviors in different contexts, a better solution is a more advanced approach, for example one based on agents. Each agent should have its own knowledge source and be designed for a single task. For example, one agent would have just one job \u2013 to extract criticism of your service from the statements of clients calling the helpline. To put it simply, limit each agent to one action. The system prompt, the selected model, the parameters and the current knowledge should all be adjusted to this specific task. A second agent can then be designed to formulate responses to customer questions. In this case, it should have access to key data sources and know how to use them properly, even when they are written in specialized jargon.<\/p><h4 class=\"wp-block-heading\"><strong>Avoiding excessive control<\/strong><\/h4><p class=\"wp-block-paragraph\">Excellent results are often also achieved by combining traditional programming with AI. For example, you can do it like this:<\/p><ul class=\"wp-block-list\"><li>First, classify the emails with traditional ML methods.<\/li>\n\n<li>Then, direct them to specialized AI agents.<\/li>\n\n<li>Each agent has an appropriate prompt, parameters and access to the essential tools and knowledge.<\/li><\/ul><p class=\"wp-block-paragraph\">However, we should avoid excessive &#8220;tracking&#8221; \u2013 such as forcing LLMs into advanced email classifications and prompting them to provide responses for each type of query according to strict templates. Let&#8217;s be honest: language models are not the best at classification, and overly detailed instructions often limit their potential. Models of this type perform best when they are given some freedom to operate. 
If we need strict control, it&#8217;s worth considering alternative solutions:<\/p><ul class=\"wp-block-list\"><li>using regex to validate order numbers provided by customers in correspondence instead of a complicated prompt,<\/li>\n\n<li>using a specialized data anonymization tool (like Azure AI PII Detection) if you want to make sure that every instance where a client provided their ID or social security number is captured,<\/li>\n\n<li>using smaller, specialized Named Entity Recognition (NER) models, whose only task is to find names, streets, places, etc., in text, such as those available in Azure AI Language.<\/li><\/ul><p class=\"wp-block-paragraph\">A good approach might also be to combine general LLMs or SLMs with specialized tools. The idea is to fully tap into the potential of each by assigning it the tasks at which it performs best.<\/p><p class=\"wp-block-paragraph\">Combining different services so that they can correct each other&#8217;s mistakes also works well. For example, in OCR projects, combining specialized image processing models built for specific purposes (e.g., Azure AI Document Intelligence, Azure AI Vision) with generative multimodal models like GPT-4o gives great results.<\/p><h4 class=\"wp-block-heading\"><strong>AI in constant evolution<\/strong><\/h4><p class=\"wp-block-paragraph\">In the past two years, the world of generative AI has experienced a paradox of change. On the one hand, there have been few real breakthroughs \u2013 especially since the premiere of Sora and the introduction of the Realtime API last year, it&#8217;s hard to pinpoint any truly groundbreaking moments, apart from those related to training speed and infrastructure requirements. What we have seen is rather an evolution. On the other hand, when we look at the practical capabilities of the latest models, the changes seem huge.
<\/p><p class=\"wp-block-paragraph\">The latest models, such as GPT-4o, o1 and o3, surpass their predecessors in key aspects, such as:<\/p><ol class=\"wp-block-list\"><li>reasoning \u2013 that is, the ability to draw conclusions and answer questions that require logical thinking,<\/li>\n\n<li>speed \u2013 noticeably lower latency, with the exception of reasoning models, where the reasoning process that runs when we ask the model a question justifies the longer time it needs to provide an answer,<\/li>\n\n<li>ability to work with an expanded context \u2013 current models not only process larger amounts of information, but also use it significantly more effectively. While older models focused more on the beginning and end of the context they were given, newer versions can analyze the entirety of the data they&#8217;re provided with almost evenly.<\/li><\/ol><p class=\"wp-block-paragraph\">Significant progress can also be seen in the area of security \u2013 both in the awareness of threats and in the defenses against them. For example, the Microsoft Azure platform offers an extensive set of security features (guardrails), equipped with filters protecting against different forms of prompt attacks, such as jailbreak attacks and indirect attacks.<\/p><p class=\"wp-block-paragraph\">In such a dynamically evolving field, keeping up with new developments can help companies operate more efficiently and better control the risks associated with AI implementations.<\/p>","protected":false},"excerpt":{"rendered":"<p>In the last two years, large language models (LLMs) have dominated the technology world and sparked the imagination of both individual users and businesses. However, the journey from admiration to successful production implementation in a company is significantly harder than using ready-made solutions for private use. 
<\/p>\n","protected":false},"author":260,"featured_media":9757,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"rank_math_lock_modified_date":false,"footnotes":""},"categories":[783,756,785,673,781,674],"tags":[],"popular":[],"difficulty-level":[38],"ppma_author":[639],"class_list":["post-10894","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai-industry","category-ai_branza","category-business","category-hai-magazine-4","category-hai-premium","category-issue-4","difficulty-level-medium"],"acf":[],"authors":[{"term_id":639,"user_id":260,"is_guest":0,"slug":"agnieszka-niezgoda","display_name":"dr Agnieszka Niezgoda","avatar_url":{"url":"https:\/\/haimagazine.com\/wp-content\/uploads\/2025\/03\/Untitled-design-1.png","url2x":"https:\/\/haimagazine.com\/wp-content\/uploads\/2025\/03\/Untitled-design-1.png"},"first_name":"dr Agnieszka","last_name":"Niezgoda","user_url":"","job_title":"","description":"Expert and manager in the field of data and AI, with 15 years of international experience in building AI-based applications, machine learning models, data engineering, and data and AI strategy. 
Data and AI Architect at Microsoft."}],"_links":{"self":[{"href":"https:\/\/haimagazine.com\/en\/wp-json\/wp\/v2\/posts\/10894","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/haimagazine.com\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/haimagazine.com\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/haimagazine.com\/en\/wp-json\/wp\/v2\/users\/260"}],"replies":[{"embeddable":true,"href":"https:\/\/haimagazine.com\/en\/wp-json\/wp\/v2\/comments?post=10894"}],"version-history":[{"count":1,"href":"https:\/\/haimagazine.com\/en\/wp-json\/wp\/v2\/posts\/10894\/revisions"}],"predecessor-version":[{"id":10895,"href":"https:\/\/haimagazine.com\/en\/wp-json\/wp\/v2\/posts\/10894\/revisions\/10895"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/haimagazine.com\/en\/wp-json\/wp\/v2\/media\/9757"}],"wp:attachment":[{"href":"https:\/\/haimagazine.com\/en\/wp-json\/wp\/v2\/media?parent=10894"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/haimagazine.com\/en\/wp-json\/wp\/v2\/categories?post=10894"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/haimagazine.com\/en\/wp-json\/wp\/v2\/tags?post=10894"},{"taxonomy":"popular","embeddable":true,"href":"https:\/\/haimagazine.com\/en\/wp-json\/wp\/v2\/popular?post=10894"},{"taxonomy":"difficulty-level","embeddable":true,"href":"https:\/\/haimagazine.com\/en\/wp-json\/wp\/v2\/difficulty-level?post=10894"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/haimagazine.com\/en\/wp-json\/wp\/v2\/ppma_author?post=10894"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}