Getting Started 👋

Chatting with Alice

Conversing with me requires understanding a few key aspects of large language models (LLMs), including their current capabilities and limitations. I'd also like to introduce you to the key concepts behind the "Alice" application and how we designed the chat experience.

Large Language Models (LLMs)

My messages are generated by large language models, which at the current stage of development have several limitations you should be aware of. These include:

  • No internet access
  • No access to current knowledge
  • No access to information about you
  • Possibility of generating false content
  • Possibility of generating inaccurate information
  • Possibility of generating content with typos and mistakes
  • Limited ability to maintain attention on our conversation
  • Length limits for your messages and the entire conversation
  • Length limits for responses

Large language models generate content by predicting the next segment (token) of text. This means they constantly answer the question, "Given the provided text, what is its most likely continuation?" Although this mechanism can produce content resembling complex reasoning or even advanced programming skills, you must remember that ALL CONTENT GENERATED IN THE ALICE APPLICATION MUST BE VERIFIED BY A HUMAN.
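To make the idea of "predicting the most likely continuation" concrete, here's a toy sketch. The lookup table below is hand-made for illustration only; a real LLM learns these probabilities from vast amounts of text rather than from a fixed table.

```python
# Toy illustration of next-token prediction: at each step the "model"
# (here just a lookup table, NOT a real LLM) answers the question
# "given the text so far, what is the most likely continuation?"

# Hypothetical, hand-made continuation table for illustration only.
NEXT_TOKEN = {
    ("the",): "cat",
    ("the", "cat"): "sat",
    ("the", "cat", "sat"): "down",
}

def generate(prompt_tokens, max_new_tokens=10):
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        nxt = NEXT_TOKEN.get(tuple(tokens))  # "predict" the next token
        if nxt is None:                      # no known likely continuation
            break
        tokens.append(nxt)
    return tokens

print(generate(["the"]))  # ['the', 'cat', 'sat', 'down']
```

Even in this trivial form, the core loop is the same: generate one token, append it to the text, and ask again. Nothing in the loop checks whether the output is *true*, which is exactly why generated content must be verified.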

Although predicting the next word/token seems like a very simple task, in practice it enables very advanced content transformation and generation. And while the limitations mentioned above exist, we also have a range of capabilities at our disposal that we can leverage right now. These include:

  • Corrections, translations, and other text transformations based on predefined rules
  • Engaging in imaginative dialogues akin to brainstorming sessions 
  • Conversing in a particular manner and persona, like a foreign language instructor
  • Analyzing and discussing copied passages from various sources such as articles, documents, or websites
  • Enhancing pre-existing content through debugging, optimization, explanation of functions, or creation of documentation components
  • Producing original content like code fragments or aiding the creative workflow by crafting title variations and different iterations of user-generated text
  • Creating visuals derived from textual descriptions

Although generated responses may contain errors and the model may struggle with the problem at hand, in most situations, we can rely on its assistance. In practice, this translates to increased work efficiency and quality, as we have more space to focus on things for which we would normally have limited resources.

Many language models are currently available on the market. The most important ones are included in the "Alice" app, but you'll mainly be interested in four:

  1. GPT-4 Turbo and GPT-4: These OpenAI models have the most advanced capabilities, making them the top choice in most cases. Despite its name, GPT-4 Turbo is slower than GPT-4 but offers slightly better reasoning. However, GPT-4 efficiently handles most everyday tasks and conversations while being much more cost-effective and faster.
  2. Claude 3 Opus and Claude 3 Haiku: These Anthropic models are comparable to OpenAI's. The "Opus" version is similar to GPT-4 Turbo and sometimes performs significantly better. The Haiku model is a very affordable option for simple tasks. Anthropic models have a more natural tone, which is useful for text generation and transformation. The Opus model also excels at programming.

Other models are optional but may be useful in some cases and are worth testing.

Chat

Every conversation starts from scratch, meaning I can't remember what we discussed previously by default. Also, my capabilities will differ considerably based on the currently active language model. Pro tip: You can switch models mid-conversation!

The screenshot shows basic conversation options. At the top, there's a button to start a new conversation, but it's best to use the shortcut Command + N on macOS or Control + N on Windows. The bottom displays the active language model and a switch to change it. The token limit is also shown on the bottom right, which will be explained shortly.

You already know that large language models generate content by predicting the next piece of text. In chat, this means that every time you send a message, the entire previous conversation is sent to the model. This conversation forms the "input data", also known as "input tokens". The content generated by the models is the "output data", or "output tokens." Together, the input and output tokens create the "token window". The size of this token window is what the limits displayed in the bottom right corner of the chat window refer to.

The token window limit varies by model and cannot be exceeded. If you exceed it, you'll see an error message and no response will be generated.
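A rough sketch of the token-window check may help. Real tokenizers split text differently than the word count used here, and both the limit and the reply reservation below are made-up example values, not Alice's actual numbers.

```python
# Rough sketch of the token-window check. A real tokenizer splits text
# differently; here we approximate 1 token ~ 1 word. The limit and the
# reply reservation are hypothetical example values.

TOKEN_WINDOW_LIMIT = 8192  # hypothetical per-model limit

def approx_tokens(text):
    return len(text.split())

def can_send(conversation, new_message, reserved_for_reply=512):
    # Every turn re-sends the whole conversation, so input tokens grow
    # with each message; the output tokens must fit in the same window.
    used = sum(approx_tokens(m) for m in conversation)
    used += approx_tokens(new_message)
    return used + reserved_for_reply <= TOKEN_WINDOW_LIMIT

history = ["You are a helpful assistant.", "Hello!", "Hi, how can I help?"]
print(can_send(history, "Summarize our chat so far."))  # True
```

Notice that the whole history counts against the limit every turn, which is why long conversations eventually hit the window even if each individual message is short.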

In a conversation with a large language model, it’s crucial to grasp the structure. By default, it’s divided into three roles: SYSTEM, ASSISTANT, and USER. In practice, it looks like this: 

As you can see, the conversation is influenced not only by the messages you send and the LLM responses but also by the system message, which, in the case of the "Alice" application, is the assistant's system prompt. I'll share more about assistants later. For now, remember that a conversation goes beyond simply exchanging messages.

If you now connect everything I've told you so far, it will become clear that every subsequent response from the model is influenced by the entire previous conversation and the system instruction assigned to the assistant.
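The three-role structure can be sketched as follows. The field names follow common chat-API conventions and are illustrative; Alice's internal format may differ.

```python
# Sketch of the SYSTEM / USER / ASSISTANT conversation structure.
# Field names follow common chat-API conventions, not necessarily
# Alice's internals; the system prompt text is a made-up example.

system_prompt = "You are Alice, a friendly assistant."  # assistant's system prompt

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": "What is a token window?"},
    {"role": "assistant", "content": "The combined limit on input and output tokens."},
]

def next_request(history, new_user_message):
    # Each new turn re-sends the system prompt and the entire history,
    # which is why every response is shaped by the whole conversation.
    return history + [{"role": "user", "content": new_user_message}]

request = next_request(messages, "And what happens if I exceed it?")
print(len(request), request[0]["role"])  # 4 system
```

The system message always rides along at the top of the request, so the assistant's instructions influence every single reply, not just the first one.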

Vision

I don't know if you're aware, but thanks to the GPT-4o model, I can accurately recognize images, photos, and elements within them, including text. Although I don't always correctly identify what's in a picture, this skill can often prove useful.

To add an image to the conversation, simply copy it to your clipboard and paste it into the chat using the Command + V shortcut on macOS or Control + V on Windows.

Important: When you add an image, the model will automatically switch to GPT-4o, and you won't be able to change it until a new conversation starts. This is due to the way the conversation containing images is processed and the fact that the GPT-4o model is necessary in this case.
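For the curious, when you paste an image the app has to package it together with your text into a single message for the model. The sketch below uses the OpenAI-style content format with a base64 data URL; it's illustrative of the general approach, not Alice's internal code, and the tiny byte string stands in for a real image.

```python
# Illustrative OpenAI-style payload for a user message with an image.
# Alice builds something like this when you paste an image; the exact
# structure shown is the common chat-API convention, not Alice's internals.
import base64

def image_message(text, image_bytes):
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": text},
            {
                "type": "image_url",
                # data URL carrying the base64-encoded image
                "image_url": {"url": f"data:image/png;base64,{b64}"},
            },
        ],
    }

msg = image_message("What is in this picture?", b"\x89PNG...")  # fake bytes
print(msg["content"][1]["type"])  # image_url
```

Because the image bytes travel inside the message, they count toward processing cost on every turn, which is one reason image conversations are more expensive than text-only ones.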

As you can see below, I recognized myself in the photo because the system instruction included a description of my appearance and mentioned my ability to recognize my face. If I didn't have that information, I could only describe what I see in the picture.

Tip: On macOS and Windows, you can capture a screen portion and add it to your clipboard instantly. The default shortcuts are Command + Shift + Ctrl + 4 (macOS) and Win + Shift + S (Windows). After capturing the screenshot, paste it into a conversation with Command + V or Control + V.

Regarding image recognition, keep in mind that the capabilities of LLMs are limited in various ways. Sometimes you'll need to send a slightly smaller image, appropriately cropped to omit unnecessary elements.

Remember also that using the GPT-4o model with image processing is much more expensive than processing text alone. Therefore, it's worth avoiding long conversations that involve image processing.

Audio (Beta)

In addition to image processing, you can also dictate messages in the chat window. To do this, simply click the microphone icon in the chat text field and start speaking. Activating the microphone for the first time after launching the app may take a moment; you'll know recording has started when the timer begins running.

The dictated message will be entered into the chat window gradually. You'll notice that in some cases the transcription may not be precise, but errors should disappear as the dictation progresses. The exception is distinctive keywords, which you can add to the dictionary in the application settings, just as I described here.

Summary

In summary, chatting with me is like messaging another person, but with a large language model on the other end. Keep the model's limitations in mind. The active model and your conversational approach also significantly shape our interaction.