Generalization by asking! How can prompt engineering be the gateway to zero-shot generalization?



The command in the input sequence is what contributes most to the output we see! Notice how the words of the sentence itself sit at the low end of the contribution percentages. Is this an artifact of this example being a running example from the paper? Or does the model genuinely care about the command more than the actual sentence when producing an accurate output?

Let’s try an unseen task. We create an example by summarizing the course summary of the 6.S898 class!
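For reference, here is a minimal sketch of what this experiment looks like in code, assuming the Hugging Face “t5-base” checkpoint and a placeholder course summary; the exact text and model size used for the figures may differ.

```python
# A minimal sketch of the summarization experiment, assuming the Hugging Face
# "t5-base" checkpoint; the actual class summary text and model size used for
# the figures may differ.
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

# Placeholder for the 6.S898 class summary.
course_summary = "6.S898 is a graduate class on deep learning ..."

# "summarize:" is the command prefix T5 was trained with for this task.
inputs = tokenizer("summarize: " + course_summary, return_tensors="pt")
summary_ids = model.generate(inputs.input_ids, max_new_tokens=40)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```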






Interesting! You can see that the contribution of the input is concentrated on (1) the command and (2) the words in the input that the model decides should be part of the output. While the latter is a property of summarization, we can tell that the model interpreted the task.
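If you want to poke at this yourself, here is a rough sketch of one way to inspect which input tokens the decoder looks at while generating, using averaged cross-attention weights as a proxy for contribution. This is only an illustration: the attribution method behind the figures above may be different.

```python
# Sketch: cross-attention weights as a rough proxy for input-token contribution.
import torch
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

course_summary = "6.S898 is a graduate class on deep learning ..."  # placeholder
inputs = tokenizer("summarize: " + course_summary, return_tensors="pt")

# Generate and keep the attention weights around.
out = model.generate(
    inputs.input_ids,
    max_new_tokens=40,
    output_attentions=True,
    return_dict_in_generate=True,
)

# out.cross_attentions has one entry per generated token; each entry is a
# tuple of per-layer tensors of shape (batch, heads, 1, input_len).
# Average over layers, heads, and decoding steps for one score per input token.
step_scores = [
    torch.stack([layer[:, :, -1, :] for layer in step]).mean(dim=(0, 2))
    for step in out.cross_attentions
]
scores = torch.stack(step_scores).mean(dim=0).squeeze(0)

for token, score in zip(tokenizer.convert_ids_to_tokens(inputs.input_ids[0]), scores):
    print(f"{token:>12}  {score:.3f}")
```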

Now that we have seen T5, we can try other models, for example its successor T0. Note that in the attached Google Colab we use GPT-2 as our base model: we originally intended to test T0, but because its checkpoint is very large (41 GB to download), we substitute GPT-2. You can try any other language model that supports conditional generation.
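Swapping the base model is a one-line change; the checkpoint names below are illustrative, and any model on the Hugging Face Hub with a generate() method will work (keeping T0's download size in mind).

```python
from transformers import AutoTokenizer, AutoModelForCausalLM, AutoModelForSeq2SeqLM

# Causal LM used in the Colab instead of T0.
gpt2_tokenizer = AutoTokenizer.from_pretrained("gpt2")
gpt2_model = AutoModelForCausalLM.from_pretrained("gpt2")

# A seq2seq alternative such as T0, if you have the bandwidth and disk space.
# t0_tokenizer = AutoTokenizer.from_pretrained("bigscience/T0")
# t0_model = AutoModelForSeq2SeqLM.from_pretrained("bigscience/T0")
```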

Let’s try the “just ask” methodology for prompting language models. Consider prompts written in three different styles:

  1. prompt = “Translate to German: That is good.”
  2. prompt = “How do you say “That is good.” in German?”
  3. prompt = “This is good. How do you say it in German?”

What will happen? Let’s see!
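Here is a minimal sketch of how these three prompts can be fed to GPT-2 with the Hugging Face pipeline API; the generation settings are illustrative and may differ from the attached Colab, and sampling means your continuations will vary from run to run.

```python
# Prompting GPT-2 with the three prompt styles above (illustrative settings).
from transformers import pipeline, set_seed

set_seed(0)  # fix the random seed so sampled continuations are reproducible
generator = pipeline("text-generation", model="gpt2")

prompts = [
    "Translate to German: That is good.",           # task style
    'How do you say "That is good." in German?',    # question style
    "This is good. How do you say it in German?",   # question style
]

for prompt in prompts:
    result = generator(prompt, max_new_tokens=30, do_sample=True, num_return_sequences=1)
    print(result[0]["generated_text"], "\n")
```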

task style “Translate to German: That is good.”



question style “How do you say “That is good.” in German?”



question style “This is good. How do you say it in German?”




In the first prompt we get a sentence-completion type of continuation, but in the other two prompts we get an answer from the model, although not to the question we are asking. In all cases, the output is not meaningful. This task is probably very hard for the model: GPT-2 is nowhere near its successor GPT-3 in terms of generalization capabilities, and it has seen neither this type of prompt nor the task before. Still, it is interesting to look at how the input and output connect; if you have access to GPT-3, this would definitely be worthwhile to investigate!

That’s it! I hope you enjoyed going through the blog and exploring how prompts can steer the model’s output. When T0 becomes more compact, this blog will be updated…


[1] Brown et al., Language Models are Few-Shot Learners. May 2020.
[2] Brian Lester, Rami Al-Rfou, and Noah Constant, The Power of Scale for Parameter-Efficient Prompt Tuning. April 2021.
[3] Liu et al., Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing. July 2021.
[4] Radford et al., Language Models are Unsupervised Multitask Learners. OpenAI Blog, 1(8):9, 2019.
[5] Raffel et al., Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. 2019.
[6] Sanh et al., Multitask Prompted Training Enables Zero-Shot Task Generalization. October 2021.
