As the name suggests, this is Meta’s second version of the tool (LLaMA stands for Large Language Model Meta AI). According to Meta, LLaMA 2 was trained on 40% more data than its predecessor and has double the context length.
But how does it compare to some of the other text-generating A.I. tools out there, like ChatGPT, Bing Chat or Google Bard?
I played around with LLaMA 2 to see how it performs on some of the common tasks that generative A.I. tools are useful for. What I found was a powerful open-source model that offers lots of potential to be adapted and customized for different experiences. But as an out-of-the-box, consumer-facing A.I. assistant for jobs like writing or researching, LLaMA 2 is a usable but not superior tool compared to some of the existing bots.
A different kind of A.I. bot
One thing to understand about LLaMA 2 is that its primary purpose isn’t to be a chatbot. LLaMA 2 is a general LLM available for developers to download and customize, part of Meta CEO Mark Zuckerberg’s plan to improve and advance the model.
That means that if you want to use LLaMA 2 as a chatbot, you’ll need to use special demo versions available on platforms like Hugging Face. The version that I used, HuggingChat, was created by the developer community by deploying LLaMA 2 to Hugging Face. There are other places to try LLaMA 2-based chatbots, but HuggingChat was built specifically as an open-source alternative to ChatGPT.
Philipp Schmid, a technical director of Hugging Face, told Fortune that while the chatbot is comparable to other A.I. bots, it’s not a perfect comparison. LLaMA 2’s strength is that it can be shaped inexpensively for specific needs. The model hasn’t been fine-tuned to a specific purpose the way a product like Bing Chat has.
LLaMA 2 is also not connected to the internet. That means it has a “knowledge cutoff” at December 2022. That’s more recent than the September 2021 cutoff of ChatGPT. The creators of the HuggingChat chatbot added an option to search the web, but it’s still in the early stages and doesn’t give LLaMA 2 the same capacity as other web-searching chatbots. If you need the most up-to-date information from the internet, you’re better served with a tool like Bing Chat or Google Bard.
In a paper announcing the release of LLaMA 2, Meta researchers wrote that, based on their human evaluations, LLaMA 2 models generally perform better than existing open-source models and are close behind closed-source models like ChatGPT. The paper acknowledges that LLaMA 2 can’t yet fully compare to GPT-4, OpenAI’s most advanced LLM.
Putting LLaMA 2 to the test
I asked the bot to write an email to my co-workers telling them I was going out of town. It spit out a decent memo suited for the crisp formality of the corporate space.
It can write emails, but can it navigate touchy subjects, like turning down a job offer? I prompted the bot to draft an email response saying that I couldn’t accept the job offer. It wrote a short, impersonal three paragraphs that might pass as human, but the email certainly wouldn’t smooth over any frustration that would come from receiving a rejection.
So I asked again, this time requesting that it be more specific, personal and apologetic. It responded with a wordy, possibly too formal email, but this time the result was usable. It seems the LLaMA 2 demo can fake some contrition when requested.
LLaMA 2 can handle these kinds of tasks, especially if you prompt it with specifics. It could write decent summaries, and it could easily draft a memo should someone need help. It can politely decline a meeting (just feed it the specific names, times and reasons) or write specific, formal emails.
Compared to ChatGPT, I found LLaMA 2’s writing to be decent but overly formal. I’d use ChatGPT because it often has a stronger flair for making its language sound human. LLaMA 2 was a bit unpolished and generic for these tasks.
For more creative or “literary” writing tasks, LLaMA 2 was mixed. It struggles to follow word count instructions. If I asked for a 150-word short story, it would give me 190 words. It could write a haiku or 16-line poem about any suggested topic, but whether it was any good is hard to say. Do you think “Circuits hum with life, Processors pace the digital strife, Binary symphony” is a strong haiku?
I asked it to write about “the plight of journalism in 2020,” and it wrote a fairly terrible 16-line poem. While chatbots aren’t known for their literary elegance (and I’m likely not qualified to judge a poem), this poem felt half-baked. It didn’t rhyme, and even though it generated fun lines like “ink-stained wretches, once the fourth estate’s pride” and had a coherent theme, I wouldn’t call it well-written by any stretch.
When it comes to research, LLaMA 2 isn’t up to par
I also quizzed the bot on some hard facts, asking it to tell me about the property crisis in China. It served up a slew of bullet points summarizing the market, societal problems and infrastructure in China. When pressed for more information, it could even elaborate on housing prices and the effects of the COVID-19 pandemic.
Then, I asked it to give me a 50-word summary with citations. It gave me 71 words with the names of publications in parentheses at the end. I turned on the “Search web” function, which allows it to pull from the web, and asked again. It gave me 50 words this time, but each link led to a non-existent page.
When I asked what was going on with the crisis as of July 2023, it again fed me a slew of confusing apologies for misinformation and more broken links.
Between the knowledge cutoff of December 2022 and its faulty search function, it’s likely best not to use this for important research. It’s still a demo, but it’s in need of some fine-tuning. The same rule applies to all generative A.I. tools: always verify what they create. But it is especially important to do that for this tool. It hallucinated citations and it has a knowledge cutoff. When I asked it to summarize and condense information or to alter the text, its responses became increasingly prone to hallucinating fake information.
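The two checks I ran by hand here, counting words and clicking through cited links, are easy to script. What follows is a minimal Python sketch under stated assumptions, not a feature of HuggingChat or LLaMA 2: the function names are hypothetical helpers invented for illustration, "words" are simply whitespace-separated tokens, and the link check uses only the standard library and makes a live network request.

```python
import http.client
import re
import urllib.request

# Rough pattern for http(s) links embedded in prose (an assumption, not a full URL grammar).
URL_PATTERN = re.compile(r"https?://[^\s)\]>\"']+")


def word_count(text: str) -> int:
    """Count whitespace-separated tokens, a rough proxy for word count."""
    return len(text.split())


def within_limit(text: str, limit: int) -> bool:
    """Return True if the text uses no more than `limit` words."""
    return word_count(text) <= limit


def extract_urls(text: str) -> list[str]:
    """Pull links out of a reply, dropping trailing sentence punctuation."""
    return [url.rstrip(".,;:") for url in URL_PATTERN.findall(text)]


def link_resolves(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL answers with a non-error HTTP status.

    Makes a network request. Any failure (malformed URL, DNS error,
    timeout, 4xx/5xx response) is reported as False.
    """
    try:
        request = urllib.request.Request(
            url, method="HEAD", headers={"User-Agent": "link-check-sketch"}
        )
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status < 400
    except (OSError, ValueError, http.client.HTTPException):
        return False
```

Calling `within_limit(reply, 50)` on the chatbot’s reply would have flagged the 71-word summary, and running `link_resolves` over `extract_urls(reply)` would flag dead citations. Some sites reject HEAD requests, so a False result is a prompt to check by hand. A link that resolves still needs reading, since a live page doesn’t prove the claim attached to it.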
Should you ride the LLaMA?
The LLaMA 2 demo on Hugging Face isn’t the same as the other chatbots like ChatGPT, Google Bard, and Bing Chat. It shows promise for an early version of a chatbot, but it’s still pretty unpolished. It’s not great for researching, and it had some “deceitful” moments (if you’ll excuse the anthropomorphism).
If I were looking to use the demo for anything more than testing and writing memos, I would have to sift through wordy, occasionally unfinished work.
That said, there are countless reasons to use an A.I. chatbot, and tools like the LLaMA 2-based HuggingChat are constantly being tweaked and updated. So I encourage you to take this bot for a spin yourself and see if it’s better suited for what you need. Just be aware of its limitations.