{"id":655,"date":"2021-03-01T15:03:44","date_gmt":"2021-03-01T20:03:44","guid":{"rendered":"https:\/\/students.bowdoin.edu\/bowdoin-science-journal\/?p=655"},"modified":"2022-04-11T17:30:19","modified_gmt":"2022-04-11T21:30:19","slug":"the-scariest-deepfake-of-all-ai-text-generator-gpt-3","status":"publish","type":"post","link":"https:\/\/students.bowdoin.edu\/bowdoin-science-journal\/science\/the-scariest-deepfake-of-all-ai-text-generator-gpt-3\/","title":{"rendered":"&#8216;The Scariest Deepfake of All&#8217;: AI-Generated Text &amp; GPT-3"},"content":{"rendered":"<p><span style=\"font-weight: 400\"><a href=\"http:\/\/unsplash.com\/photos\/ncLdDcvrcfw\"><img loading=\"lazy\" decoding=\"async\" class=\"alignright size-medium wp-image-660\" src=\"https:\/\/students.bowdoin.edu\/bowdoin-science-journal\/wp-content\/uploads\/sites\/35\/2021\/03\/jason-leung-300x253.jpg\" alt=\"\" width=\"300\" height=\"253\" srcset=\"https:\/\/students.bowdoin.edu\/bowdoin-science-journal\/wp-content\/uploads\/sites\/35\/2021\/03\/jason-leung-300x253.jpg 300w, https:\/\/students.bowdoin.edu\/bowdoin-science-journal\/wp-content\/uploads\/sites\/35\/2021\/03\/jason-leung-1024x865.jpg 1024w, https:\/\/students.bowdoin.edu\/bowdoin-science-journal\/wp-content\/uploads\/sites\/35\/2021\/03\/jason-leung-768x649.jpg 768w, https:\/\/students.bowdoin.edu\/bowdoin-science-journal\/wp-content\/uploads\/sites\/35\/2021\/03\/jason-leung-1536x1298.jpg 1536w, https:\/\/students.bowdoin.edu\/bowdoin-science-journal\/wp-content\/uploads\/sites\/35\/2021\/03\/jason-leung-2048x1730.jpg 2048w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a>Recent advances in machine-learning systems have led to both exciting and unnerving technologies\u2014personal assistance bots, email spam filtering, and search engine algorithms are just a few omnipresent examples of technology made possible through these systems. 
Deepfakes (\u201cdeep-learning fakes\u201d), or algorithm-generated synthetic media, constitute one example of a still-emerging and tremendously consequential development in machine learning. WIRED recently called AI-generated text \u201cthe scariest deepfake of all,\u201d turning heads toward one of the most powerful text generators out there: artificial intelligence research lab OpenAI\u2019s Generative Pre-trained Transformer 3 (GPT-3) language model.<\/span><\/p>\n<p><a href=\"https:\/\/www.wired.com\/story\/ai-text-generator-gpt-3-learning-language-fitfully\/\"><span style=\"font-weight: 400\">GPT-3<\/span><\/a><span style=\"font-weight: 400\"> is an autoregressive language model that uses its deep-learning experience to produce human-like text. Put simply, GPT-3 studies the statistical patterns in a dataset of about a trillion words collected from the web and digitized books. GPT-3 then uses its digest of that massive corpus to respond to text prompts by generating new text with similar statistical patterns, endowing it with the ability to compose news articles, satire, and even poetry.\u00a0 <\/span><\/p>\n<p><span style=\"font-weight: 400\">GPT-3\u2019s creators designed the AI to learn language patterns and immediately saw it scoring exceptionally well on reading-comprehension tests. But when OpenAI researchers configured the system to generate strikingly human-like text, they began to imagine how these generative capabilities could be used for harmful purposes. Previously, OpenAI had often released full code with its publications on new models. This time, GPT-3\u2019s creators decided to <\/span><a href=\"https:\/\/www.wired.com\/story\/ai-text-generator-too-dangerous-to-make-public\/\"><span style=\"font-weight: 400\">hide its underlying code from the public<\/span><\/a><span style=\"font-weight: 400\">, not wanting to disseminate the full model or the millions of web pages used to train the system. 
In OpenAI\u2019s research <\/span><a href=\"https:\/\/arxiv.org\/abs\/2005.14165\"><span style=\"font-weight: 400\">paper<\/span><\/a><span style=\"font-weight: 400\"> on GPT-3, the authors note that \u201cany socially harmful activity that relies on generating text could be augmented by powerful language models,\u201d and \u201cthe misuse potential of language models increases as the quality of text synthesis improves.\u201d\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400\">Just as humans are prone to internalizing the belief systems \u201cfed\u201d to us, machine-learning systems mimic what\u2019s in their training data. In GPT-3\u2019s case, biases present in the vast training corpus of Internet text led the AI to generate stereotyped and prejudiced content. Preliminary testing at OpenAI has shown that GPT-3-generated content reflects gendered stereotypes and reproduces racial and religious biases. Because of already <\/span><a href=\"https:\/\/www.pewresearch.org\/politics\/2020\/09\/14\/americans-views-of-government-low-trust-but-some-positive-performance-ratings\/\"><span style=\"font-weight: 400\">fragmented trust<\/span><\/a><span style=\"font-weight: 400\"> and pervasive <\/span><a href=\"https:\/\/www.pewresearch.org\/fact-tank\/2020\/11\/13\/america-is-exceptional-in-the-nature-of-its-political-divide\/\"><span style=\"font-weight: 400\">polarization<\/span><\/a><span style=\"font-weight: 400\"> online, Internet users find it increasingly difficult to trust online content. An influx of GPT-3-generated text would require us to become even more critical consumers of what we read. 
GPT-3\u2019s ability to mirror societal biases and prejudices in its generated text means that its widespread use online might only give more voice to our darkest emotional, civic, and social tendencies.<\/span><\/p>\n<p><span style=\"font-weight: 400\">Because GPT-3\u2019s underlying code remains in the hands of OpenAI and its <\/span><a href=\"https:\/\/openai.com\/blog\/openai-api\/\"><span style=\"font-weight: 400\">API<\/span><\/a><span style=\"font-weight: 400\"> (the interface where users can partially work with and test out GPT-3) is not freely accessible to the public, many concerns over its implications steer our focus towards a <\/span><i><span style=\"font-weight: 400\">possible<\/span><\/i><span style=\"font-weight: 400\"> future where its synthetic text becomes ubiquitous online. Given GPT-3\u2019s frighteningly convincing \u201cconception\u201d of natural language, its creative capabilities, and its bias-susceptible processes, many worry that a GPT-3-populated Internet could do serious harm to our information ecosystem. 
However, GPT-3 exhibits powerful affordances as well as limitations, and experts are asking us not to project too many fears about <\/span><a href=\"https:\/\/medium.com\/mapping-out-2050\/distinguishing-between-narrow-ai-general-ai-and-super-ai-a4bc44172e22\"><span style=\"font-weight: 400\">human-level AI<\/span><\/a><span style=\"font-weight: 400\"> onto GPT-3 just yet.<\/span><\/p>\n<p><b>GPT-3: Online Journalist<\/b><\/p>\n<figure id=\"attachment_664\" aria-describedby=\"caption-attachment-664\" style=\"width: 457px\" class=\"wp-caption aligncenter\"><a href=\"http:\/\/arxiv.org\/abs\/2005.14165\"><img loading=\"lazy\" decoding=\"async\" class=\" wp-image-664\" src=\"https:\/\/students.bowdoin.edu\/bowdoin-science-journal\/wp-content\/uploads\/sites\/35\/2021\/03\/GPTarticle-300x183.png\" alt=\"\" width=\"457\" height=\"279\" srcset=\"https:\/\/students.bowdoin.edu\/bowdoin-science-journal\/wp-content\/uploads\/sites\/35\/2021\/03\/GPTarticle-300x183.png 300w, https:\/\/students.bowdoin.edu\/bowdoin-science-journal\/wp-content\/uploads\/sites\/35\/2021\/03\/GPTarticle-1024x625.png 1024w, https:\/\/students.bowdoin.edu\/bowdoin-science-journal\/wp-content\/uploads\/sites\/35\/2021\/03\/GPTarticle-768x469.png 768w, https:\/\/students.bowdoin.edu\/bowdoin-science-journal\/wp-content\/uploads\/sites\/35\/2021\/03\/GPTarticle.png 1221w\" sizes=\"auto, (max-width: 457px) 100vw, 457px\" \/><\/a><figcaption id=\"caption-attachment-664\" class=\"wp-caption-text\"><span style=\"color: #993366\">GPT-3-generated news article that research participants had the greatest difficulty distinguishing from a human-written article<\/span><\/figcaption><\/figure>\n<p><span style=\"font-weight: 400\">Fundamentally, concerns about GPT-3-generated text online come from an awareness of just how different a threat synthetic text poses versus other forms of synthetic media. 
In a recent article, WIRED contributor Renee DiResta <\/span><a href=\"http:\/\/www.wired.com\/story\/ai-generated-text-is-the-scariest-deepfake-of-all\/\"><span style=\"font-weight: 400\">writes<\/span><\/a><span style=\"font-weight: 400\"> that, as Photoshop and other image-editing and CGI tools developed, we learned a healthy skepticism toward photos, without fully disbelieving them, because \u201cwe understand that each picture is rooted in reality.\u201d She points out that generated media, such as deepfaked video or GPT-3 output, is different because there is no unaltered original, and we will have to adjust to a new level of unreality. In addition, synthetic text \u201cwill be easy to generate in high volume, and with fewer tells to enable detection.\u201d Right now, it is possible to detect repetitive or recycled comments that use the same snippets of text to flood a comment section or persuade audiences. However, if such comments were generated independently by an AI, DiResta notes, these manipulation campaigns would be much harder to detect:<\/span><\/p>\n<p><b>\u201cUndetectable textfakes\u2014masked as regular chatter on Twitter, Facebook, Reddit, and the like\u2014have the potential to be far more subtle, far more prevalent, and far more sinister \u2026 The ability to manufacture a majority opinion, or create a fake commenter arms race\u2014with minimal potential for detection\u2014would enable sophisticated, extensive influence campaigns.\u201d <span style=\"color: #993366\">&#8211; Renee DiResta, WIRED<\/span><\/b><\/p>\n<p><span style=\"font-weight: 400\">In their <\/span><a href=\"https:\/\/arxiv.org\/abs\/2005.14165\"><span style=\"font-weight: 400\">paper<\/span><\/a><span style=\"font-weight: 400\"> \u201cLanguage Models are Few-Shot Learners,\u201d GPT-3\u2019s developers discuss the potential for misuse and threat actors\u2014those seeking to use GPT-3 for malicious or harmful purposes. 
The paper states that threat actors can be organized by skill and resource levels, \u201cranging from low or moderately skilled and resourced actors who may be able to build a malicious product to \u2026 highly skilled and well resourced (e.g. state-sponsored) groups with long-term agendas.\u201d Interestingly, OpenAI researchers write that threat actor agendas are \u201cinfluenced by economic factors like scalability and ease of deployment\u201d and that ease of use is another significant incentive for malicious use of AI. It seems that the very principles that guide the development of many emerging AI models like GPT-3\u2014scalability, accessibility, and stable infrastructure\u2014could also position these models as attractive tools for threat actors seeking to undermine personal and collective agency online.<\/span><\/p>\n<p><span style=\"font-weight: 400\">Staying with the projected scenario of GPT-3-generated text becoming widespread online, it is useful to consider the already algorithmic nature of our online interactions. In her <\/span><a href=\"http:\/\/www.wired.com\/story\/ai-generated-text-is-the-scariest-deepfake-of-all\/\"><span style=\"font-weight: 400\">article<\/span><\/a><span style=\"font-weight: 400\">, DiResta writes that, on the Internet, \u201calgorithmically generated content receives algorithmically generated responses, which feeds into algorithmically mediated curation systems that surface information based on engagement.\u201d Introducing an AI \u201cvoice\u201d into this environment could make our online interactions even less human. One possible algorithmic accomplice for GPT-3 is Google\u2019s Autocomplete algorithm, which internalizes queries and often <\/span><a href=\"https:\/\/www.tandfonline.com\/doi\/abs\/10.1080\/17405904.2012.744320\"><span style=\"font-weight: 400\">reflects \u201c-ism\u201d statements and biases<\/span><\/a><span style=\"font-weight: 400\"> when generating suggestions based on common searches. 
An influx of AI-generated text could populate Google\u2019s algorithms with even more problematic content and further narrow our control over how we acquire neutral, unbiased knowledge.<\/span><\/p>\n<p><b><i>An Emotional Problem<\/i><\/b><\/p>\n<p><span style=\"font-weight: 400\">Talk of GPT-3 passing <\/span><a href=\"https:\/\/plato.stanford.edu\/entries\/turing-test\/\"><span style=\"font-weight: 400\">the Turing Test<\/span><\/a><span style=\"font-weight: 400\"> reflects many concerns about creating increasingly powerful AI. GPT-3 seems to hint at the possibility of a future where AI is able to replicate those attributes we might hope are exclusively human\u2014traits like creativity, ingenuity, and, of course, understanding language. As Microsoft AI Blog contributor Jennifer Langston writes in a recent <\/span><a href=\"https:\/\/blogs.microsoft.com\/ai\/openai-azure-supercomputer\/\"><span style=\"font-weight: 400\">post<\/span><\/a><span style=\"font-weight: 400\">, \u201cdesigning AI models that one day understand the world more like people starts with language, a critical component to understanding human intent.\u201d\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400\">Of course, as a machine-learning model, GPT-3 relies on a neural network (inspired by neural pathways in the human brain) that can process language. Importantly, GPT-3 represents a massive acceleration in scale and computing power (rather than novel ML techniques), which gives it the ability to exhibit something eerily close to human intelligence. 
A recent <\/span><a href=\"https:\/\/www.vox.com\/future-perfect\/21355768\/gpt-3-ai-openai-turing-test-language\"><span style=\"font-weight: 400\">Vox article<\/span><\/a><span style=\"font-weight: 400\"> on the subject asks the question, \u201cis human-level intelligence something that will require a fundamentally new approach, or is it something that emerges of its own accord as we pump more and more computing power into simple machine learning models?\u201d For some, the idea that the only thing distinguishing human intelligence from our algorithms is our relative \u201ccomputing power\u201d is more than a little uncomfortable.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400\">As mentioned earlier, GPT-3 has been able to exhibit creative and artistic qualities, generating a <\/span><a href=\"https:\/\/www.gwern.net\/GPT-3\"><span style=\"font-weight: 400\">trove of literary content<\/span><\/a><span style=\"font-weight: 400\"> including poetry and satire. The attributes we\u2019ve long understood to be distinctly human are now proving to be replicable by AI, raising new anxieties about humanity, identity, and the future.<\/span><\/p>\n<figure id=\"attachment_669\" aria-describedby=\"caption-attachment-669\" style=\"width: 300px\" class=\"wp-caption alignright\"><img loading=\"lazy\" decoding=\"async\" class=\"size-medium wp-image-669\" src=\"https:\/\/students.bowdoin.edu\/bowdoin-science-journal\/wp-content\/uploads\/sites\/35\/2021\/03\/Screenshot-217-1-300x265.png\" alt=\"\" width=\"300\" height=\"265\" srcset=\"https:\/\/students.bowdoin.edu\/bowdoin-science-journal\/wp-content\/uploads\/sites\/35\/2021\/03\/Screenshot-217-1-300x265.png 300w, https:\/\/students.bowdoin.edu\/bowdoin-science-journal\/wp-content\/uploads\/sites\/35\/2021\/03\/Screenshot-217-1-1024x906.png 1024w, https:\/\/students.bowdoin.edu\/bowdoin-science-journal\/wp-content\/uploads\/sites\/35\/2021\/03\/Screenshot-217-1-768x679.png 768w, 
https:\/\/students.bowdoin.edu\/bowdoin-science-journal\/wp-content\/uploads\/sites\/35\/2021\/03\/Screenshot-217-1-1536x1359.png 1536w, https:\/\/students.bowdoin.edu\/bowdoin-science-journal\/wp-content\/uploads\/sites\/35\/2021\/03\/Screenshot-217-1-2048x1811.png 2048w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><figcaption id=\"caption-attachment-669\" class=\"wp-caption-text\"><span style=\"color: #993366\">GPT-3&#8217;s recreation of Allen Ginsberg&#8217;s &#8220;Howl&#8221;<\/span><\/figcaption><\/figure>\n<h3><b>GPT-3\u2019s Limitations<\/b><\/h3>\n<p><span style=\"font-weight: 400\">While GPT-3 can generate impressively human-like text, most researchers maintain that this text is often \u201cunmoored from reality,\u201d and, even with GPT-3, we are still far from reaching artificial general intelligence. In a recent MIT Technology Review <\/span><a href=\"https:\/\/www.technologyreview.com\/2020\/07\/20\/1005454\/openai-machine-learning-language-generator-gpt-3-nlp\/\"><span style=\"font-weight: 400\">article<\/span><\/a><span style=\"font-weight: 400\">, author Will Douglas Heaven points out that GPT-3 often returns contradictions or nonsense because its process is not guided by any true understanding of reality. Ultimately, researchers believe that GPT-3\u2019s human-like output and versatility are the result of excellent engineering, not genuine intelligence. GPT-3 uses many of its parameters to memorize Internet text that doesn&#8217;t generalize easily, and essentially parrots back \u201csome well-known facts, some half-truths, and some straight lies, strung together in what first looks like a smooth narrative,\u201d according to Heaven. 
As it stands today, GPT-3 is just an early glimpse of AI\u2019s world-altering potential, and remains a narrowly intelligent tool made by humans, one that reflects our conceptions of the world.<\/span><\/p>\n<p><span style=\"font-weight: 400\">A final point of optimism is that the field of <\/span><a href=\"https:\/\/plato.stanford.edu\/entries\/ethics-ai\/\"><span style=\"font-weight: 400\">ethical AI<\/span><\/a><span style=\"font-weight: 400\"> is ever-expanding, and developers at OpenAI are looking into the possibility of automatic discriminators that may have greater success than human evaluators at detecting AI model-generated text. In their research paper, the developers write that \u201cautomatic detection of these models may be a promising area of future research.\u201d Improving our ability to detect AI-generated text might be one way to regain agency in a possible future with bias-reproducing AI \u201cjournalists\u201d or undetectable deepfaked text spreading misinformation online.<\/span><\/p>\n<p><span style=\"font-weight: 400\">Ultimately, GPT-3 suggests that language is more predictable than many people assume, and challenges common assumptions about what makes humans unique. Moreover, exactly what\u2019s going on inside GPT-3 isn\u2019t entirely clear, challenging us to continue to think about the <\/span><a href=\"https:\/\/www.forbes.com\/sites\/cognitiveworld\/2019\/07\/23\/understanding-explainable-ai\/?sh=63d6e8367c9e\"><span style=\"font-weight: 400\">AI \u201cblack box\u201d problem<\/span><\/a><span style=\"font-weight: 400\"> and methods to figure out just how GPT-3 reproduces natural language after digesting millions of snippets of Internet text. However, perhaps GPT-3 gives us an opportunity to decide for ourselves whether even the most powerful of future text generators could undermine the distinctly human conception of the world and of poetry, language, and conversation. 
A tweet Douglas quotes in his article from user @mark_riedl provides one possible way to frame both our worries and hopes about tech like GPT-3: \u201cRemember&#8230;the Turing Test is not for AI to pass, but for humans to fail.\u201d<\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Recent advances in machine-learning systems have led to both exciting and unnerving technologies\u2014personal assistance bots, email spam filtering, and search engine algorithms are just a few omnipresent examples of technology made possible through these systems. Deepfakes (deep learning fakes), or, algorithm-generated synthetic media, constitute one example of a still-emerging and tremendously consequential development in machine-learning. [&hellip;]<\/p>\n","protected":false},"author":91,"featured_media":660,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_genesis_hide_title":false,"_genesis_hide_breadcrumbs":false,"_genesis_hide_singular_image":false,"_genesis_hide_footer_widgets":false,"_genesis_custom_body_class":"","_genesis_custom_post_class":"","_genesis_layout":"","footnotes":""},"categories":[65,1],"tags":[85,87,96,97,98,99],"class_list":{"0":"post-655","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-csci-tech","8":"category-science","9":"tag-ai","10":"tag-ai-ethics","11":"tag-artificial-intelligence","12":"tag-gpt-3","13":"tag-online-journalists","14":"tag-textfakes","15":"entry"},"featured_image_src":"https:\/\/students.bowdoin.edu\/bowdoin-science-journal\/wp-content\/uploads\/sites\/35\/2021\/03\/jason-leung-600x400.jpg","featured_image_src_square":"https:\/\/students.bowdoin.edu\/bowdoin-science-journal\/wp-content\/uploads\/sites\/35\/2021\/03\/jason-leung-600x600.jpg","author_info":{"display_name":"Micaela Simeone 
'22","author_link":"https:\/\/students.bowdoin.edu\/bowdoin-science-journal\/author\/msimeone\/"},"_links":{"self":[{"href":"https:\/\/students.bowdoin.edu\/bowdoin-science-journal\/wp-json\/wp\/v2\/posts\/655","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/students.bowdoin.edu\/bowdoin-science-journal\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/students.bowdoin.edu\/bowdoin-science-journal\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/students.bowdoin.edu\/bowdoin-science-journal\/wp-json\/wp\/v2\/users\/91"}],"replies":[{"embeddable":true,"href":"https:\/\/students.bowdoin.edu\/bowdoin-science-journal\/wp-json\/wp\/v2\/comments?post=655"}],"version-history":[{"count":0,"href":"https:\/\/students.bowdoin.edu\/bowdoin-science-journal\/wp-json\/wp\/v2\/posts\/655\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/students.bowdoin.edu\/bowdoin-science-journal\/wp-json\/wp\/v2\/media\/660"}],"wp:attachment":[{"href":"https:\/\/students.bowdoin.edu\/bowdoin-science-journal\/wp-json\/wp\/v2\/media?parent=655"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/students.bowdoin.edu\/bowdoin-science-journal\/wp-json\/wp\/v2\/categories?post=655"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/students.bowdoin.edu\/bowdoin-science-journal\/wp-json\/wp\/v2\/tags?post=655"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}