On Tuesday, Meta AI unveiled a demo of Galactica, a large language model designed to “store, combine and reason about scientific knowledge.” Although it was intended to accelerate the writing of scientific literature, adversarial users testing it found it could also generate realistic-sounding nonsense. After several days of ethical criticism, Meta took the demo offline, according to MIT Technology Review.
Large language models (LLMs), such as OpenAI’s GPT-3, learn to write text by studying millions of examples and picking up the statistical relationships between words. As a result, they can author convincing-sounding documents, but those works can also be riddled with falsehoods and potentially harmful stereotypes. Some critics call LLMs “stochastic parrots” for their ability to convincingly spit out text without understanding its meaning.
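To make “statistical relationships between words” concrete, here is a toy sketch in Python. It is not how GPT-3 or Galactica works internally; those models use large neural networks. This version only counts which word tends to follow which, and the function names and the tiny corpus are hypothetical, chosen for illustration.

```python
from collections import Counter, defaultdict


def train_bigrams(text):
    """Count how often each word follows each other word in the text."""
    words = text.lower().split()
    counts = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts


def most_likely_next(counts, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    followers = counts.get(word)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Hypothetical toy corpus; a real model trains on vastly more text.
corpus = "the cat sat on the mat . the cat ate the fish ."
model = train_bigrams(corpus)
print(most_likely_next(model, "the"))
```

The sketch picks a next word purely from frequency, with no notion of whether the resulting sentence is true, which is the heart of the “stochastic parrot” criticism.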
Enter Galactica, an LLM aimed at writing scientific literature. Its authors trained Galactica on “a large and curated corpus of humanity’s scientific knowledge,” including over 48 million papers, textbooks and lecture notes, scientific websites, and encyclopedias. According to Galactica’s paper, Meta AI researchers believed this purportedly high-quality data would lead to high-quality output.