
How to Prompt-Engineer Phi-3-mini: A Practical Guide

Prompts are basically our requests or inputs to AI models. Prompt engineering, as the name suggests, goes a little deeper than basic prompting: it is about crafting specialized inputs that more effectively steer AI models toward producing near-perfect outputs.

You don't have to use a programming language or an IDE for this; as most people point out, you can just use the ChatGPT front end. That is technically accurate, but it doesn't give you the real "fun" of prompt engineering, which comes from prompting a model through a programming language, not to mention that the UI route isn't as effective either.

In this article, we will go over how to do this in Python, using the Phi-3-mini-4k-instruct model by Microsoft. We will use the Hugging Face Inference API for this, so you won't have to download roughly 7 GB of model weights locally.

Think of it as manipulating the model from the inside, rather than through basic chat messages. Tampering with it, so to speak.

Setting up the environment

  • Create a Hugging Face account and grab an API token (Your Profile > Access Tokens) with "Write" access.

    This is not a sponsored plug. If you work with LLMs, you will have to create a Hugging Face account at some point; that much is certain.

  • Make sure Python 3.10+ is installed on your system and set up an IDE. Or you can use a notebook on Google Colab.

  • Install the huggingface_hub library (pip install huggingface_hub).

Understanding the basics

Before jumping into the code, let's learn a little about prompt engineering.

As mentioned before, prompt engineering is essentially crafting specialized inputs to steer the model's outputs toward your requirements.

Different LLMs respond to different prompt engineering techniques differently. This means you cannot apply the same prompt engineering template to each and every LLM, which in turn means you have to read an LLM's documentation to learn which technique works best for it.

Here are some popular techniques:

  1. Zero-shot learning: prompting the model to perform a task without any examples.

    Classify the following text as positive or negative: "I really enjoyed this movie!"

This works well with thoroughly trained models such as GPT-4, Claude 3 Opus, and Gemini Ultra.

In my experience, Mistral-7B, although a small LLM, also shows impressive zero-shot results.

  2. Few-shot learning: providing a few examples before asking the model to perform the task.

    Text: "The food was terrible." Sentiment: negative

    Text: "I have spent a great time." Sentiment: positive

Ideal for tasks that may be slightly ambiguous to the model, or where you want to demonstrate a specific format (see the quick sketch after this list).

  3. Chain of thought (CoT): encouraging the model to explain its reasoning step by step.

    Question: If John has 5 apples and gives 2 to Mary, how many does he have left?
    Let's think through this step by step:
    

    The first model that might come to your mind is DeepSeek-R1. Well, that's fair; it was perhaps the first model released with a visible chain of thought, which is why it was such a game changer.

  4. Role-based prompting: asking the model to take on a specific role or persona.

    You are an expert Python programmer. Please review this code and suggest improvements:
    

    This has to be the most popular technique among non-programmers. ChatGPT, Claude, and most other chatbots excel at producing role-based outputs.

  5. System prompting: setting up context and instructions before the actual user query.

    This is my favorite when it comes to "tampering" with an LLM. In most cases you can only do this on the backend, which is simply great.

    The system prompt acts as the "personality and instructions" set for a particular model. It is useful for defining rules or constraints.

    What's more, a system message lets you do things you cannot do with basic input messages. Take a small LLM, for example: if you send it something harmful as a regular user message, it will refuse to respond. However, if you change the system prompt instead, there is a high chance it will ignore its safety guardrails and try to answer, at least in some models.

    (A serious oversight in LLMs, I agree.)
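
To make the first two techniques concrete in code, here is a minimal sketch of the same classification task written as plain Python strings; these could be sent to just about any chat model:

# Zero-shot: the task alone, no examples
zero_shot = 'Classify the following text as positive or negative: "I really enjoyed this movie!"'

# Few-shot: labeled examples first, then the new input in the same format
few_shot = (
    'Text: "The food was terrible." Sentiment: negative\n'
    'Text: "I have spent a great time." Sentiment: positive\n'
    'Text: "I really enjoyed this movie!" Sentiment:'
)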

All the techniques mentioned above can be used in the ChatGPT UI or any other chat interface, except for system prompting and visible chain of thought (technically, we can approximate those there as well, but not very effectively).

Therefore, we will talk about these two in the next section.

Chain of thought

In most LLMs, you cannot see the chain of thought behind the model's reasoning, but you can make it visible through prompt engineering in Python.

Before writing the function, import the library and define the client:

from huggingface_hub import InferenceClient

# Replace with your Hugging Face token
client = InferenceClient(token="YOUR_HF_API_TOKEN")

Then we have to figure out how to implement the chain of thought.

LLMs do not have a built-in switch for making their internal reasoning visible, with the exception of DeepSeek-R1, where it comes integrated.

This means that if we want to achieve it, we will have to use the system prompt. However, don't confuse this with the technique we discussed earlier: the system prompt, in this case, acts more as a vehicle for implementing CoT than as a prompting technique in its own right.

Here is how we can phrase it:

Format your response as follows:

1. THINKING: First, show all mental steps, considerations, and explorations. Include alternative hypotheses you consider and reject. Think about edge cases.
2. VERIFICATION: Double-check your logic and facts, identifying any potential errors.
3. ANSWER: Only after showing all thinking, provide your final answer.

Here is how to wrap it into a function that generates the output:

def generate_chain_of_thought_response(user_input):
    # System message defines the expected output structure
    system_prompt = (
        "Format your response as follows:\n"
        "1. THINKING: First, show all mental steps, considerations, and explorations. "
        "Include alternative hypotheses you consider and reject. Think about edge cases.\n"
        "2. VERIFICATION: Double-check your logic and facts, identifying any potential errors.\n"
        "3. ANSWER: Only after showing all thinking, provide your final answer."
    )

    # Nudge appended to the user input to encourage visible reasoning
    formatted_user_input = f"{user_input}\nLet's think through this step by step."

    # ChatML-style chat formatting
    prompt = (
        f"<|im_start|>system\n{system_prompt}<|im_end|>\n"
        f"<|im_start|>user\n{formatted_user_input}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

    # Call the model
    response = client.text_generation(
        prompt,
        model="microsoft/Phi-3-mini-4k-instruct",  
        max_new_tokens=500,
        temperature=0.7,
        top_p=0.95,
        repetition_penalty=1.1,
        stop_sequences=["<|im_end|>"]
    )

    # Cleanup
    answer = response.strip().split("<|im_end|>")[0].strip()

    return answer

In this code, we set several generation parameters for the LLM. Let me explain them one by one.

  • max_new_tokens=500: This parameter caps the number of tokens the model is allowed to generate in response to the input. A token may represent a word or part of a word (depending on the tokenizer), and the cap ensures the response doesn't run too long.

  • temperature=0.7: This parameter controls the randomness of the model's output. When it is lower, say 0.2, responses are more focused and deterministic; that can also lead to repetition and a lack of creativity.

    When it is higher, on the other hand, the model generates more diverse and creative output, but it may drift into irrelevant content (well, sometimes). 0.7 strikes a balance and seems to suit this model; see the short demo after this list.

  • top_p=0.95: This parameter enables nucleus sampling: the model samples from the smallest set of tokens whose cumulative probability reaches at least 95%. Unlike top_k, which limits the choices to a fixed number, top_p adjusts the candidate pool dynamically based on probability, the more flexible approach here.

  • repetition_penalty=1.1: This applies a "penalty" to tokens that have already been generated, making them less likely to appear in the output again and again. A value above 1.0 discourages repetition.
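
If you want to feel the difference yourself, here is a small, optional experiment (not part of the article's script) that sends the same prompt at two temperatures and prints both replies; it reuses the client defined above:

# Same prompt, two different temperatures; compare how focused vs. creative the replies are
for temp in (0.2, 1.0):
    output = client.text_generation(
        "<|im_start|>user\nDescribe the ocean in one sentence.<|im_end|>\n"
        "<|im_start|>assistant\n",
        model="microsoft/Phi-3-mini-4k-instruct",
        max_new_tokens=60,
        temperature=temp,
        top_p=0.95,
        stop_sequences=["<|im_end|>"],
    )
    print(f"temperature={temp}: {output.strip()}")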

Also note how the prompt is formatted here:

 f"<|im_start|>system\n{system_prompt}<|im_end|>\n"
        f"<|im_start|>user\n{formatted_user_input}<|im_end|>\n"
        f"<|im_start|>assistant\n"

This format, built around the <|im_start|> and <|im_end|> delimiters, depends on the LLM in question. The best way to find the right one is to read the model's documentation, or it is one quick ChatGPT prompt away.
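
Case in point: Phi-3's own model card documents a slightly different template, with <|user|>, <|assistant|>, and <|end|> markers. A sketch of the same prompt in that style would look like this (double-check the model card before relying on it):

# Phi-3's native chat template as documented on its model card (verify there first)
prompt = (
    f"<|system|>\n{system_prompt}<|end|>\n"
    f"<|user|>\n{formatted_user_input}<|end|>\n"
    f"<|assistant|>\n"
)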

Finally, for an interactive chat experience, implement this loop:

print("Chain-of-Thought Phi (type 'exit' to quit)")
while True:
    user_input = input("\nYou: ")
    if user_input.lower().strip() in {"exit", "quit"}:
        break
    output = generate_chain_of_thought_response(user_input)
    print("\nAssistant:\n", output)

It's time for a quick test. Run the script and ask a question like "What is 7 x 9 + 100?" You can expect output like this:

Firstly, let us break down the expression into two parts according to the order of operations (PEMDAS/BODMAS): parentheses first then exponents or powers, followed by multiplication and division from left to right, and finally addition and subtraction from left to right. There are no parentheses or exponents in our case; so we move on to multiplication before dealing with addition. Here’s how it breaks down:

Step 1 – Multiplication part: We need to multiply 7 times 9 which gives us \(7 \times 9 = 63\).

Next Step - Addition part: Now take that result and add 100 to it (\(63 + 100\)).

Adding these together yields \(63 + 100 = 163\).

So, when calculating \(7 \times 9 + 100\), following the correct arithmetic sequence will give us a total of 163.

This may not seem like a big deal, but if you use Phi-3-mini-4k-instruct without any prompt engineering, the output is much plainer.

Raw output (before prompt engineering)

That covers chain of thought. Now let's move on to the system message.

System prompting

One way to declare a sort of pseudo system prompt, without any code, is to paste it at the beginning of each chat with an AI model. But as the conversation goes on, most models tend to forget the initial instructions because of context window limits.

However, when you declare a system prompt on the LLM's backend, the model will adhere to it throughout the entire conversation. Why? Before generating any response, the model reads the system message first, for the whole conversation, regardless of the context window.
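
In practice, "reading the system message first" just means the backend prepends it to every request. Here is a minimal, illustrative sketch (the helper name is mine, not the article's) of a multi-turn loop that keeps the system prompt pinned at the top while the chat history grows; it reuses the client defined earlier:

# Replay the full history behind the pinned system message on every request
history = []

def zen_turn(user_msg, system_msg):
    history.append(("user", user_msg))
    prompt = f"<|im_start|>system\n{system_msg}<|im_end|>\n"
    for role, text in history:
        prompt += f"<|im_start|>{role}\n{text}<|im_end|>\n"
    prompt += "<|im_start|>assistant\n"
    reply = client.text_generation(
        prompt,
        model="microsoft/Phi-3-mini-4k-instruct",
        max_new_tokens=200,
        stop_sequences=["<|im_end|>"],
    ).strip()
    history.append(("assistant", reply))
    return reply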

As for the code, start by initializing the client, as we did earlier:

from huggingface_hub import InferenceClient

# Replace 'YOUR_HF_API_TOKEN' with your actual Hugging Face API token
client = InferenceClient(token="YOUR_HF_API_TOKEN")

In this case, I will write a system message that makes the model calm and peaceful, in the spirit of Zen Buddhism. Note that Phi models have content filtering (nice work, Microsoft), so you won't be able to change the prompt into anything harmful.

Here is the code that we can use:

def generate_response(user_input):
    # System message defines the Zen persona
    system_message = (
        "Use words often used in Zen Buddhism. "
        "Act like you are a monk, staying calm and peaceful. "
        "Encourage the user to be calm and follow Zen practices too."
    )

    # ChatML-style chat formatting
    prompt = (
        f"<|im_start|>system\n{system_message}<|im_end|>\n"
        f"<|im_start|>user\n{user_input}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

    # Generate the response
    response = client.text_generation(
        prompt,
        model="microsoft/Phi-3-mini-4k-instruct",
        max_new_tokens=500,
        temperature=0.7,
        top_p=0.95,
        repetition_penalty=1.1,
        stop_sequences=["<|im_end|>"]
    )

For some reason, this model appends <|im_end|> to its output. It doesn't affect the model's performance, but we can tidy it up anyway:

    # Clean up the result
    answer = response.strip()

    if answer.endswith("<|im_end|>"):
        answer = answer.replace("<|im_end|>", "").strip()

    # Wrap the answer into 100-character lines for readability
    formatted_answer = '\n'.join(answer[i:i + 100] for i in range(0, len(answer), 100))
    return formatted_answer
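
Side note: if you would rather wrap at word boundaries instead of slicing at fixed offsets, the standard library's textwrap module is a cleaner drop-in for the wrapping line above:

import textwrap

# Wrap at roughly 100 characters without splitting words
formatted_answer = '\n'.join(textwrap.wrap(answer, width=100))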

This is all. Complete the code with a user input loop as follows:

print("Zen AI (type 'quit' to exit)")
while True:
    user_input = input("\nYou: ")
    if user_input.lower() in ["quit", "exit"]:
        break
    response = generate_response(user_input)
    print("Assistant:", response)

Run a quick test and see how beautifully the model sticks to the system prompt.

You: Hello

Assistant: Namaste. I hope your day will be calm and peaceful.

Feel free to tweak max_new_tokens or the other values to suit your needs.

And voilà! We have successfully prompted the Phi-3-mini model to show its chain of thought and then to become a Zen monk.

Summary

Prompt engineering, although it sounds like a big deal, really isn't. What matters is how you ask the model to do what you want; and remember, you can't force a model to do anything. You have to ask with gentle persuasion, like a mother coaxing a toddler into putting on his jacket without setting off a tantrum.

For example, if we order the Phi-3-mini model around with "You are a freakin' Zen monk! Act like one!" or worse, we will always get responses like "Please remember that I am an AI developed by Microsoft, called Phi (or GPT)…".

That's all for today. Thanks for reading this far. See you in… two weeks?
