{"id":11005,"date":"2025-03-31T10:00:00","date_gmt":"2025-03-31T08:00:00","guid":{"rendered":"https:\/\/haimagazine.com\/uncategorized\/deepseek-a-lurking-tiger\/"},"modified":"2025-06-26T15:33:12","modified_gmt":"2025-06-26T13:33:12","slug":"deepseek-a-lurking-tiger","status":"publish","type":"post","link":"https:\/\/haimagazine.com\/en\/hai-magazine-4\/deepseek-a-lurking-tiger\/","title":{"rendered":"DeepSeek \u2013 a lurking tiger"},"content":{"rendered":"<p>DeepSeek is a Chinese company that recently released a series of open-source language models, including DeepSeek-V3 and two versions of a reasoning-oriented model: DeepSeek-R1-Zero (trained exclusively through reinforcement learning) and DeepSeek-R1 (which uses a multi-stage training approach). Even though new models are published somewhere in the world every day, this particular set shook the tech and business world, so much so that it stirred up the stock market. Why?  <\/p><figure class=\"wp-block-image aligncenter\"><img decoding=\"async\" src=\"https:\/\/lh7-rt.googleusercontent.com\/docsz\/AD_4nXcIgxGT9NFHKKJ2q0nvs2v_ozo4PrPlDgsxUERCVvPEhXPdwKZpDwlVJCD3b4t-hbUBTQbeRUaT797U431EJZmpNFhcD537oIGy22A1pjjcXUOQnKKmvfXgi_xuodvvmuP2H3rFQeXPDX9RWhl86Vk?key=VDK5RfnjWPU5lV0Wqr-9bh2E\" alt=\"\"\/><figcaption class=\"wp-element-caption\">Figure 1. Original diagram from the article about DeepSeek-V3<\/figcaption><\/figure><h4 class=\"wp-block-heading\"><strong>A transformer with a twist?<\/strong><\/h4><p>DeepSeek&#8217;s architecture is based on the classic transformer architecture (discussed in detail in &#8220;hAI Magazine&#8221; issue 1\/2024), with the key difference that standard feed-forward layers have been replaced with MoE (Mixture of Experts) layers. In MoE layers, instead of a single feed-forward network, we have a set of experts (also feed-forward networks) and a routing mechanism that picks which experts will process a given token. 
This approach allows the model to achieve results comparable to leading proprietary models, all while maintaining computational efficiency.<\/p><p class=\"has-background\" style=\"background-color:#97372a\"><mark style=\"background-color:rgba(0, 0, 0, 0)\" class=\"has-inline-color has-base-color\"><strong>Fun fact<\/strong><br><\/mark><br><mark style=\"background-color:rgba(0, 0, 0, 0)\" class=\"has-inline-color has-base-color\">MoE architecture was previously used in various language models. One of the first big models available to the public that used it was Mixtral, released in December 2023. <\/mark><\/p><p class=\"has-text-align-center\"> <img loading=\"lazy\" decoding=\"async\" width=\"800\" height=\"491\" class=\"wp-image-9949\" style=\"width: 800px;\" src=\"https:\/\/haimagazine.com\/wp-content\/uploads\/2025\/03\/121_1.png\" alt=\"\" srcset=\"https:\/\/haimagazine.com\/wp-content\/uploads\/2025\/03\/121_1.png 1266w, https:\/\/haimagazine.com\/wp-content\/uploads\/2025\/03\/121_1-300x184.png 300w, https:\/\/haimagazine.com\/wp-content\/uploads\/2025\/03\/121_1-1024x628.png 1024w, https:\/\/haimagazine.com\/wp-content\/uploads\/2025\/03\/121_1-768x471.png 768w, https:\/\/haimagazine.com\/wp-content\/uploads\/2025\/03\/121_1-600x368.png 600w\" sizes=\"auto, (max-width: 800px) 100vw, 800px\" \/><br>Figure 2. Simplified diagram from the article about DeepSeek-V3 <\/p><h4 class=\"wp-block-heading\"><strong>How does this compare to the classic approach?<\/strong><\/h4><p class=\"has-text-align-left\">Classic transformers (e.g. Llama-2, GPT-3, BERT) consist of successive layers, each with a specific number of parameters.<br><br>When the model processes a token to produce an output, all parameters are used \u2013 <strong>every parameter is active in each layer<\/strong>. This approach works, but compared to the MoE method, it&#8217;s expensive and inefficient. 
<br><br>In simple terms, it can be compared to having a group of experts: a car mechanic, a chef, a birdwatcher. Each of them has to make every single decision, even if it&#8217;s not related to their specialty. <br><br>In<strong> DeepSeek<\/strong>, for processing each token, <strong>only selected expert networks are used<\/strong>: first, we match experts to the token, and then we ask the question. This ensures that we don&#8217;t end up asking chefs or car mechanics about the nesting habits of storks. <\/p><h4 class=\"wp-block-heading\"><strong>What sets DeepSeek apart from Mixtral?<\/strong><\/h4><p>In its implementation, DeepSeek&#8217;s MoE uses an expert-selection system whose key innovation is an auxiliary-loss-free load-balancing strategy. All of the model&#8217;s layers except the first three are MoE layers (58 in total). In each of them, the model picks nine experts to predict the next token: one shared expert and eight specialized ones. Every expert is a feed-forward network with a structure of 7168 \u2192 2048 \u2192 7168 neurons.<\/p><p>The expert selection involves determining the similarity between the token and the available experts, and then picking those that best match the input.<\/p><p>Similarity is determined by calculating the affinity score, which in turn involves comparing the input token (more specifically its vector representation) with the expert&#8217;s centroid. Mathematically, this is defined as the sigmoid function of the dot product of the token representation and the expert centroid.<\/p><p>Eight out of 256 specialized experts are selected for each prediction \u2013 those with the highest adjusted affinity score. On top of that, there&#8217;s always an active shared expert involved. 
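As a rough illustration of the selection mechanics described above, here is a minimal NumPy sketch of sigmoid-affinity routing with a bias term (the auxiliary-loss-free balancing idea). Dimensions, names, and the exact gate normalization are illustrative assumptions, not DeepSeek&#8217;s actual code:

```python
import numpy as np

def route_token(token_vec, centroids, bias, top_k=8):
    """Pick the top_k specialized experts for one token.

    token_vec : (d,) hidden representation of the token
    centroids : (n_experts, d) one learned centroid per expert
    bias      : (n_experts,) load-balancing bias term
    """
    # Affinity: sigmoid of the dot product between token and expert centroid
    affinity = 1.0 / (1.0 + np.exp(-(centroids @ token_vec)))
    # The bias only influences *which* experts are chosen,
    # not the gating weights used to mix their outputs
    chosen = np.argsort(affinity + bias)[-top_k:]
    # Gating weights come from the raw affinities of the chosen experts
    gates = affinity[chosen] / affinity[chosen].sum()
    return chosen, gates

rng = np.random.default_rng(0)
experts, gates = route_token(rng.normal(size=64),
                             rng.normal(size=(256, 64)),
                             np.zeros(256))
print(len(experts))  # 8 specialized experts; the shared expert is always added
```

In the real model the bias is adjusted during training to keep the expert load balanced, which is why it enters only the selection step and not the gating weights.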
<\/p><p>In every prediction, a shared expert helps ensure the stability and consistency of results, while specialized experts can focus on specific aspects of language processing.<\/p><p class=\"has-background\" style=\"background-color:#97372a\"><mark style=\"background-color:rgba(0, 0, 0, 0)\" class=\"has-inline-color has-base-color\"><strong>Fun fact<\/strong><br><\/mark><br><mark style=\"background-color:rgba(0, 0, 0, 0)\" class=\"has-inline-color has-base-color\">During training, DeepSeek-V3 applies the <strong>expert bias<\/strong> mechanism \u2013 if an expert is used too often, its priority is decreased, and if too rarely, it&#8217;s increased. So it&#8217;s like in a restaurant where the manager watches how many tables are taken and assigns waiters to them on the fly. <\/mark><\/p><h4 class=\"wp-block-heading\"><strong>Multi-Token Prediction<\/strong><\/h4><p>It&#8217;s also worth mentioning that the model uses the Multi-Token Prediction (MTP) technique, which predicts not only the next token, but also the one following it, improving prediction quality and decoding speed. We could compare this to how a chess player plans several moves ahead. DeepSeek-V3 predicts two such steps at once. In order to predict multiple tokens ahead, the model is expanded with what are called MTP modules (neural networks). For example, if we want to predict three tokens ahead, we should add two extra MTP modules to the model architecture.    
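A toy sketch of the idea, assuming simple linear output heads (an assumption for brevity \u2013 the real MTP modules are small transformer blocks): the main head predicts the next token, while an extra head predicts the one after it.

```python
import numpy as np

rng = np.random.default_rng(0)
d, vocab = 32, 100

# Hidden state produced by the main model for the current position
hidden = rng.normal(size=d)

# One output head per predicted position: the first predicts token t+1,
# the extra MTP head predicts token t+2 (illustrative dense heads only)
heads = [rng.normal(size=(vocab, d)) for _ in range(2)]

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Two predictions from one forward pass: the next token and the one after it
predictions = [int(np.argmax(softmax(W @ hidden))) for W in heads]
print(predictions)
```

Predicting both positions from the same hidden state is what lets decoding speculate one step further ahead at little extra cost.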
<\/p><p class=\"has-text-align-center\"> <img loading=\"lazy\" decoding=\"async\" width=\"800\" height=\"361\" class=\"wp-image-9951\" style=\"width: 800px;\" src=\"https:\/\/haimagazine.com\/wp-content\/uploads\/2025\/03\/122_1.png\" alt=\"\" srcset=\"https:\/\/haimagazine.com\/wp-content\/uploads\/2025\/03\/122_1.png 1166w, https:\/\/haimagazine.com\/wp-content\/uploads\/2025\/03\/122_1-300x135.png 300w, https:\/\/haimagazine.com\/wp-content\/uploads\/2025\/03\/122_1-1024x462.png 1024w, https:\/\/haimagazine.com\/wp-content\/uploads\/2025\/03\/122_1-768x346.png 768w, https:\/\/haimagazine.com\/wp-content\/uploads\/2025\/03\/122_1-600x271.png 600w\" sizes=\"auto, (max-width: 800px) 100vw, 800px\" \/><br>Figure 3. Original diagram from the article about DeepSeek-V3, with MTP <\/p><p><strong><mark style=\"background-color:rgba(0, 0, 0, 0);color:#bc4637\" class=\"has-inline-color\">Example of expert rotation:<\/mark><\/strong><\/p><p><strong><mark style=\"background-color:rgba(0, 0, 0, 0);color:#bc4637\" class=\"has-inline-color\">Experts are chosen separately for each input token. For example, for the sentence &#8220;Cats like math&#8221;: <\/mark><\/strong><\/p><figure class=\"wp-block-table\"><table class=\"has-background has-fixed-layout\" style=\"background-color:#bc4637\"><tbody><tr><td><mark style=\"background-color:rgba(0, 0, 0, 0)\" class=\"has-inline-color has-base-color\">Token<strong> &#8220;Cats&#8221;<\/strong><\/mark><\/td><td><mark style=\"background-color:rgba(0, 0, 0, 0)\" class=\"has-inline-color has-base-color\">Token <strong>&#8220;like&#8221;<\/strong><\/mark><\/td><td><mark style=\"background-color:rgba(0, 0, 0, 0)\" class=\"has-inline-color has-base-color\">Token <strong>&#8220;math&#8221;<\/strong><\/mark><\/td><\/tr><tr><td><mark style=\"background-color:rgba(0, 0, 0, 0)\" class=\"has-inline-color has-base-color\">Receives its own set of 8 rotating experts and one shared for each layer. The set may specialize in animals, for example. 
<\/mark><\/td><td><mark style=\"background-color:rgba(0, 0, 0, 0)\" class=\"has-inline-color has-base-color\">Receives a different set of 8 rotating experts and one shared, which may specialize in general words and verbs.<\/mark><\/td><td><mark style=\"background-color:rgba(0, 0, 0, 0)\" class=\"has-inline-color has-base-color\">Receives another unique set of 8 rotating experts and one shared, which may specialize in scientific and mathematical terms.<\/mark><\/td><\/tr><\/tbody><\/table><\/figure><h4 class=\"wp-block-heading\"><strong>Physical limitations<\/strong><\/h4><p>Even though the DeepSeek-V3 model has 671 billion parameters, thanks to the expert-selection system described above, which activates only selected experts, the model actually uses only 37 billion parameters at any one time. But that doesn\u2019t mean it&#8217;s enough to load only that many into memory. For the model to work properly, all the parameters must be loaded into GPU memory, because each input token will activate a different set. So in reality, when using such a model, you&#8217;ll be using all its parameters \u2013 just not all at once.   <\/p><p>Another limitation is that each token can only use experts from up to 4 nodes (GPU servers). This means that even if experts are divided into more than 4 parts (e.g. a model loaded onto 8 GPUs), a single token can only use experts from 4 nodes. This limitation was introduced to optimize communication between nodes.  <\/p><p>This might mean that some of the often-used experts might not &#8220;make the cut&#8221; for a single common node. Therefore, DeepSeek came up with another solution. <\/p><h4 class=\"wp-block-heading\"><strong>How does high-load expert duplication work?<\/strong><\/h4><p>The system keeps an eye on how much the experts are used during operation. Every 10 minutes, the most frequently used experts are identified, and their placement across nodes is updated. The goal of these duplicated experts is to balance the load during inference.  
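A minimal sketch of the monitoring idea, with hypothetical names and a toy routing log: count how often each expert is routed to, then pick the hottest ones as candidates for an extra replica on another node.

```python
from collections import Counter

# Running tally of how often each expert id is routed to
usage = Counter()

def record_routing(chosen_experts):
    """Log one routing decision (a list of expert ids)."""
    usage.update(chosen_experts)

def experts_to_duplicate(n_hot=4):
    """Return the ids of the most frequently used experts \u2013
    candidates for duplication on another GPU node."""
    return [eid for eid, _ in usage.most_common(n_hot)]

# Toy routing log: expert 7 is used 4 times, expert 1 three times
for routed in [[1, 7, 7, 3], [7, 2, 1, 1], [7, 5, 6, 9]]:
    record_routing(routed)

print(experts_to_duplicate(2))  # [7, 1] -- the two hottest experts
```

In the real system this tally would be refreshed on the 10-minute window described above, and the replicas placed to even out per-node load.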
<\/p><h4 class=\"wp-block-heading\"><strong>How to train a model to think?<\/strong><\/h4><p>Before performing a task, the DeepSeek models from the R (which stands for Reasoning) series &#8220;think&#8221; using the Chain-of-Thought approach, a well-known method for improving model performance. The standard approach to creating a model that can reason like this involves gathering a large set of training data with examples of tasks and the reasoning that precedes them \u2013 that&#8217;s the classic supervised training.<\/p><figure class=\"wp-block-table is-style-stripes\"><table class=\"has-fixed-layout\"><tbody><tr><td><strong>Prompt:<\/strong><br><code><br>A conversation between a user and an assistant. The user asks a question, and the assistant solves it. First, the assistant thinks about the reasoning process, and then presents the answer. The reasoning process and the answer are included respectively in the tags &lt;think> &lt;\/think> and &lt;answer> &lt;\/answer>, e.g., &lt;think> reasoning process &lt;\/think> &lt;answer> answer &lt;\/answer>.<\/code><\/td><\/tr><\/tbody><\/table><\/figure><p>Putting together such a training set would be super expensive. It would require hiring experts from various fields \u2013 they&#8217;d have to document their thinking process when solving problems, for example, when writing an algorithm or debugging code. Instead of this solution, the authors decided to use reinforcement learning (described in &#8220;hAI Magazine&#8221; issue 2\/2024). 
In this approach, the model generates many different outputs and is rewarded for both the thought process and the correct answers.<\/p><h4 class=\"wp-block-heading\"><strong>The beauty of pure RL: DeepSeek-R1-Zero<\/strong><\/h4><p>The article proposed two types of rule-based rewards:<\/p><ul class=\"wp-block-list\"><li><strong>Accuracy reward:<\/strong> given when the answer is correct \u2013 for instance, in math tasks where the model was supposed to deliver the result in a specific format (say: in square brackets) that could be automatically checked.<\/li>\n\n<li><strong>Format reward: <\/strong>additionally, the model received a reward when it added its thought process between the &lt;think> and &lt;\/think> tags.<\/li><\/ul><p>This rule-based method streamlines the training process. For contrast, in the original approach from the OpenAI article, PPO (Proximal Policy Optimization) and the &#8220;judge\/critic&#8221; reward model are used. The reward model is usually a large language model \u2013 the same size as the trained model \u2013 that is used to evaluate the response generated by the model during training.  <\/p><h4 class=\"wp-block-heading\"><strong>How to get rid of a critic<\/strong><\/h4><p>In DeepSeek, a set of rules and GRPO (Group Relative Policy Optimization) are used to estimate so-called &#8220;advantage points&#8221; \u2013 figuring out how much better or worse a particular action is compared to the baseline. How is this advantage calculated? By means of relative rewards within a small group of samples. A group is simply a set of several parallel responses generated by the model in response to the same input.<\/p><p>For each answer in the group, a reward is given (e.g. for correctness). 
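The rule-based rewards and the group-relative advantage can be sketched as follows (the exact answer format and reward values here are illustrative assumptions):

```python
import re
import statistics

def format_reward(output):
    """1.0 if the reasoning is wrapped in <think>...</think> tags."""
    return 1.0 if re.search(r"<think>.*?</think>", output, re.S) else 0.0

def accuracy_reward(output, expected):
    """1.0 if the final answer in square brackets matches the reference."""
    m = re.search(r"\[(.*?)\]\s*$", output.strip())
    return 1.0 if m and m.group(1).strip() == expected else 0.0

def group_advantages(rewards):
    """Group-relative advantage: subtract the group mean reward,
    divide by the group standard deviation."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # avoid division by zero
    return [(r - mean) / std for r in rewards]

# A toy "group": three parallel answers to the same question
group = [
    "<think>2+2=4</think> [4]",   # correct format and answer
    "<think>2+2=5</think> [5]",   # correct format, wrong answer
    "[4]",                        # correct answer, no reasoning tags
]
rewards = [format_reward(o) + accuracy_reward(o, "4") for o in group]
print(rewards)  # [2.0, 1.0, 1.0]
print(group_advantages(rewards))
```

Only the first answer ends up with a positive advantage, so the policy update increases its probability relative to the others \u2013 no judge model involved.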
Next, GRPO compares each response relative to the others in the group \u2013 better ones get a positive advantage and worse ones a negative one.<\/p><p>Importantly, advantage points are normalized by:<\/p><ul class=\"wp-block-list\"><li>subtracting the average value of rewards in the group,<\/li>\n\n<li>dividing by the standard deviation of the group.<\/li><\/ul><p>GRPO also uses the so-called &#8220;KL divergence&#8221; between the policy model and the reference model, which helps maintain stability during training. <\/p><p>Finally, we update the policy network by increasing the probability of those responses that turned out to be relatively better. This is how we can train the model while skipping the costly judge&#8217;s opinions.<\/p><p class=\"has-text-align-center\"> <img loading=\"lazy\" decoding=\"async\" width=\"800\" height=\"342\" class=\"wp-image-9953\" style=\"width: 800px;\" src=\"https:\/\/haimagazine.com\/wp-content\/uploads\/2025\/03\/125_1.png\" alt=\"\" srcset=\"https:\/\/haimagazine.com\/wp-content\/uploads\/2025\/03\/125_1.png 1166w, https:\/\/haimagazine.com\/wp-content\/uploads\/2025\/03\/125_1-300x128.png 300w, https:\/\/haimagazine.com\/wp-content\/uploads\/2025\/03\/125_1-1024x437.png 1024w, https:\/\/haimagazine.com\/wp-content\/uploads\/2025\/03\/125_1-768x328.png 768w, https:\/\/haimagazine.com\/wp-content\/uploads\/2025\/03\/125_1-600x256.png 600w\" sizes=\"auto, (max-width: 800px) 100vw, 800px\" \/><br>Figure 4. 
Simplified comparison of GRPO vs PPO (from DeepSeekMath) <\/p><p class=\"has-text-align-center\"> <img loading=\"lazy\" decoding=\"async\" width=\"800\" height=\"360\" class=\"wp-image-9955\" style=\"width: 800px;\" src=\"https:\/\/haimagazine.com\/wp-content\/uploads\/2025\/03\/125_2.png\" alt=\"\" srcset=\"https:\/\/haimagazine.com\/wp-content\/uploads\/2025\/03\/125_2.png 1419w, https:\/\/haimagazine.com\/wp-content\/uploads\/2025\/03\/125_2-300x135.png 300w, https:\/\/haimagazine.com\/wp-content\/uploads\/2025\/03\/125_2-1024x461.png 1024w, https:\/\/haimagazine.com\/wp-content\/uploads\/2025\/03\/125_2-768x346.png 768w, https:\/\/haimagazine.com\/wp-content\/uploads\/2025\/03\/125_2-600x270.png 600w\" sizes=\"auto, (max-width: 800px) 100vw, 800px\" \/><br>Figure 5. Original comparison of PPO\/GRPO (from DeepSeekMath) <\/p><p>This approach speeds up training, but when it comes to the accuracy reward, it requires building a set with reference answers.<\/p><h4 class=\"wp-block-heading\"><strong>When the model has time to think<\/strong><\/h4><p>The authors emphasized that when the model had &#8220;more time to think&#8221;:<\/p><ul class=\"wp-block-list\"><li><strong>The response lengths increased <\/strong>\u2013 the model initially generated simple thought processes, but over time they became longer and more elaborate.<\/li>\n\n<li><strong>It experienced &#8220;epiphanies&#8221; \u2013<\/strong> these triggered further self-reflection: the model automatically spent more time thinking about the problem, went back to previous steps, and verified their correctness or explored alternative approaches.<\/li><\/ul><p>What&#8217;s particularly noteworthy is that these behaviors appeared spontaneously \u2013 the authors didn&#8217;t program the model to specifically make &#8220;pauses&#8221; or experience &#8220;epiphanies.&#8221; These behaviors emerged naturally in the reinforcement learning process. 
Quoting the authors, such emergent behavior simply &#8220;underscores the power and beauty of reinforcement learning,&#8221; and &#8220;serves as a reminder that RL has the potential to unlock new levels of artificial intelligence.&#8221;<\/p><figure class=\"wp-block-image aligncenter\"><img decoding=\"async\" src=\"https:\/\/lh7-rt.googleusercontent.com\/docsz\/AD_4nXfH3uJaSG-M__6Wtq8GetU_O4QDBk8IQCAFfFL-bJTh9JLPyXwo9EE16k1nUQEkInGxjNwEpK7C-aujPJE4cv9L7h_9ZX6VwMsZfwDvKFlH0cm3ZKOsjIWXTh1oiDLx64Z-WIxxQODIi4Hz_YvUBw?key=VDK5RfnjWPU5lV0Wqr-9bh2E\" alt=\"\"\/><figcaption class=\"wp-element-caption\">Figure 6. Prolonged thinking time during DeepSeek-R1 training <\/figcaption><\/figure><p>Despite its good reasoning results, DeepSeek-R1-Zero had trouble with the readability of its responses and tended to mix languages. Meanwhile, chains of reasoning should be clear, consistent, and easy for the user to interpret. <\/p><p>Since the model could be trained entirely with RL, would it be worth trying to improve reasoning performance or speed up convergence by adding a small amount of high-quality initial data?<\/p><p>The answer to these problems was another approach, implemented in the DeepSeek-R1 model. It used RL (as before), but this time with a cold start, meaning a user-friendly initial dataset. <\/p><h4 class=\"wp-block-heading\"><strong>Multi-stage training is the key to success \u2013 DeepSeek-R1<\/strong><\/h4><h6 class=\"wp-block-heading\"><strong><mark style=\"background-color:rgba(0, 0, 0, 0);color:#bc4637\" class=\"has-inline-color\">Stage 1. Cold start + RL = reasoning process improvement <\/mark><\/strong><\/h6><p>A cold start is just fine-tuning the model on previously prepared data (with long reasoning chains). The authors prepared thousands of such samples. 
They used the previous DeepSeek-R1-Zero and, with the help of few-shot prompting, generated examples, which were further improved by human annotators.<\/p><p>This iterative approach to training reasoning models turned out to be more effective than pure reinforcement learning. But&#8230; the issue of mixing up languages still persisted. <\/p><p>The solution to this problem turned out to be&#8230; a new type of reward \u2013 the<strong> language consistency reward<\/strong>,<strong> <\/strong>calculated as the ratio of target-language words to all words in the reasoning chain. The authors indicated that this approach slightly worsens results but improves user satisfaction.<\/p><p>But that&#8217;s not all. Since almost automatic data preparation helped every time, why wouldn&#8217;t it help in the next stage too?  <\/p><h6 class=\"wp-block-heading\"><strong><mark style=\"background-color:rgba(0, 0, 0, 0);color:#bc4637\" class=\"has-inline-color\">Stage 2. <em>Rejection sampling<\/em> \u2013 the best of the best<\/mark><\/strong><\/h6><p>The final concept? Another round of fine-tuning, but with more high-quality data. The authors took the model after the first phase of RL training, generated a large number of responses to the same questions, and then applied the rejection sampling method. In other words \u2013 they chose only the best answers. For example, they rejected answers where languages were mixed, the code was unreadable, or paragraphs turned out to be exceptionally long.<\/p><p>In tasks with unambiguous answers (like math problems), they simply checked if the result was correct. For more complex tasks, they used DeepSeek-V3 as a &#8220;judge&#8221; to evaluate the quality of answers. <\/p><p>Altogether, around 600,000 reasoning-related examples were collected. Another roughly 200,000 examples of regular, non-reasoning tasks (e.g. &#8220;Hi!&#8221;) were added. 
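A minimal sketch of such a rejection-sampling filter, with hypothetical checks standing in for the real correctness and language tests:

```python
def looks_clean(sample, lang_checker, max_paragraph_len=2000):
    """Keep only readable candidates: reject mixed languages
    and overly long paragraphs."""
    if not lang_checker(sample):
        return False
    if any(len(p) > max_paragraph_len for p in sample.split("\n\n")):
        return False
    return True

def rejection_sample(question, candidates, is_correct, lang_checker):
    """From many generated answers to one question, keep the good ones."""
    return [c for c in candidates
            if looks_clean(c, lang_checker) and is_correct(question, c)]

# Toy usage with stand-in checkers (assumptions, not the real pipeline)
only_ascii = lambda s: s.isascii()               # crude language check
correct = lambda q, a: a.strip().endswith("4")   # crude answer check
kept = rejection_sample("2+2?",
                        ["answer: 4", "réponse: 4", "answer: 5"],
                        correct, only_ascii)
print(kept)  # ['answer: 4']
```

In the real pipeline, the correctness check is either an automatic comparison with a reference answer or a DeepSeek-V3 judgment, as described above.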
This way, the model didn\u2019t lose its basic skills, which could have happened if it only focused on reasoning.    <\/p><h6 class=\"wp-block-heading\"><strong><mark style=\"background-color:rgba(0, 0, 0, 0);color:#bc4637\" class=\"has-inline-color\">Stage 3. The last touch \u2013 <em>alignment <\/em>(RL for all scenarios) <\/mark><\/strong><\/h6><p>After removing lower-quality data, the authors felt that the model still needed an additional round of training \u2013 this time focused not only on reasoning, but also on overall usefulness and safety. Therefore, they introduced the second stage of reinforcement learning with several important modifications. <\/p><ol class=\"wp-block-list\"><li><strong>Reward model: <\/strong>even though at first they avoided using an LLM as a &#8220;judge&#8221;, they found it useful at the final stage. They argued that they hadn&#8217;t used it before because reward models often &#8220;cheat&#8221; during long training, require additional resources, and just complicate the whole process. Other rewards (for linguistic consistency, format, and correctness) remained unchanged.  <\/li>\n\n<li><strong>Two levels of assessing answers: <\/strong>helpfulness is evaluated based on the final answer, while harmlessness is checked across the entire message (reasoning process + answer).<\/li><\/ol><p>Additionally, the authors used DeepSeek-V3, kept a balance between usefulness and safety, and made sure to include a variety of prompts containing tasks that require reasoning and tasks that don&#8217;t.<\/p><h4 class=\"wp-block-heading\"><strong>Results<\/strong><\/h4><p class=\"has-background\" style=\"background-color:#bc4637\"><mark style=\"background-color:rgba(0, 0, 0, 0)\" class=\"has-inline-color has-base-color\">Why did DeepSeek make a global sensation? 
Because it achieved results comparable to the best models at much lower training cost (showing that a well-thought-out architecture and training strategy can be more important than pure computing power), while also being open-source.  <br><\/mark><br><mark style=\"background-color:rgba(0, 0, 0, 0)\" class=\"has-inline-color has-base-color\">Let&#8217;s compare the specific results of R1 vs o1:<br><\/mark><br> <img loading=\"lazy\" decoding=\"async\" width=\"800\" height=\"496\" class=\"wp-image-9959\" style=\"width: 800px;\" src=\"https:\/\/haimagazine.com\/wp-content\/uploads\/2025\/03\/127_1.png\" alt=\"\" srcset=\"https:\/\/haimagazine.com\/wp-content\/uploads\/2025\/03\/127_1.png 817w, https:\/\/haimagazine.com\/wp-content\/uploads\/2025\/03\/127_1-300x186.png 300w, https:\/\/haimagazine.com\/wp-content\/uploads\/2025\/03\/127_1-768x477.png 768w, https:\/\/haimagazine.com\/wp-content\/uploads\/2025\/03\/127_1-600x372.png 600w\" sizes=\"auto, (max-width: 800px) 100vw, 800px\" \/><\/p><p><\/p>","protected":false},"excerpt":{"rendered":"<p>The newest DeepSeek model offers advanced reasoning capabilities, comparable to top models like GPT-4, at significantly lower costs and less computational resource consumption.<\/p>\n","protected":false},"author":44,"featured_media":9947,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"rank_math_lock_modified_date":false,"footnotes":""},"categories":[791,673,781,674],"tags":[],"popular":[],"difficulty-level":[37],"ppma_author":[372],"class_list":["post-11005","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai-how-to","category-hai-magazine-4","category-hai-premium","category-issue-4","difficulty-level-hard"],"acf":[],"authors":[{"term_id":372,"user_id":44,"is_guest":0,"slug":"dr-in-agnieszka-mikolajczyk-barela","display_name":"dr inz. 
Agnieszka Miko\u0142ajczyk-Bare\u0142a","avatar_url":{"url":"https:\/\/haimagazine.com\/wp-content\/uploads\/2024\/08\/Agnieszka-Mikolajczyk-Barela.jpeg","url2x":"https:\/\/haimagazine.com\/wp-content\/uploads\/2024\/08\/Agnieszka-Mikolajczyk-Barela.jpeg"},"first_name":"Agnieszka","last_name":"Miko\u0142ajczyk-Bare\u0142a","user_url":"","job_title":"","description":"Autorka zbior\u00f3w danych, prac naukowych i publikacji, Senior AI Engineer w start-upie Chaptr. Prac\u0119 doktorsk\u0105 na temat wykrywania i zmniejszania wp\u0142ywu b\u0142\u0119d\u00f3w w danych i modelach AI obroni\u0142a na Politechnice Gda\u0144skiej. W wolnym czasie organizatorka, aktywnie udziela si\u0119 w \u015brodowisku naukowym \u2013 prowadzi m.in. projekty AI4Good."}],"_links":{"self":[{"href":"https:\/\/haimagazine.com\/en\/wp-json\/wp\/v2\/posts\/11005","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/haimagazine.com\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/haimagazine.com\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/haimagazine.com\/en\/wp-json\/wp\/v2\/users\/44"}],"replies":[{"embeddable":true,"href":"https:\/\/haimagazine.com\/en\/wp-json\/wp\/v2\/comments?post=11005"}],"version-history":[{"count":2,"href":"https:\/\/haimagazine.com\/en\/wp-json\/wp\/v2\/posts\/11005\/revisions"}],"predecessor-version":[{"id":11008,"href":"https:\/\/haimagazine.com\/en\/wp-json\/wp\/v2\/posts\/11005\/revisions\/11008"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/haimagazine.com\/en\/wp-json\/wp\/v2\/media\/9947"}],"wp:attachment":[{"href":"https:\/\/haimagazine.com\/en\/wp-json\/wp\/v2\/media?parent=11005"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/haimagazine.com\/en\/wp-json\/wp\/v2\/categories?post=11005"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/haimagazine.com\/en\/wp-json\/wp\/v2\/tags?post=11005"},{"taxonomy":"popular","embeddable":true,"href":"https:\/\/ha
imagazine.com\/en\/wp-json\/wp\/v2\/popular?post=11005"},{"taxonomy":"difficulty-level","embeddable":true,"href":"https:\/\/haimagazine.com\/en\/wp-json\/wp\/v2\/difficulty-level?post=11005"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/haimagazine.com\/en\/wp-json\/wp\/v2\/ppma_author?post=11005"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}