Distroid Issue 33
Findings include AI use cases in education and in novel interfaces, an event for DAO delegates, thoughts on Water & Music’s Wavelengths Conference, and the for-profit vs. nonprofit question for startups
Introduction
Welcome to this week’s edition of Distroid, a newsletter of curated findings, actionable knowledge, and noteworthy developments from the forefront of tech, governance, and research (i.e., the frontier).
In this newsletter:
Digest
Generative AI Systems in Education – Uses and Misuses
Ethics of AI-based invention: a personal inquiry
LLMs break the internet. Signing everything fixes it.
When Technology Goes Bad
Delegation Week
OpenChatKit
Should Your Start-up Be For-Profit or Nonprofit?
How AI Could Save (Not Destroy) Education | Sal Khan | TED
Exploring the Fundamentals and Nuance of Community in On-Chain Music at Water & Music’s Wavelengths Conference
Digest
Generative AI Systems in Education – Uses and Misuses
Research
Hannah Quay-de la Vallee
Center for Democracy & Technology
2023-03-15
Generative AI systems, such as ChatGPT, DALL·E, and BlenderBot, have been commanding news headlines and sparking conversations about the role of AI in education, the workforce, and society in general. While these systems have the potential to be helpful tools, providing people with a new type of technological assistance, they also introduce a number of risks and challenges, and require careful introduction with clear guardrails and norms governing their use.
What is Generative AI?
Generative AI systems use machine learning to produce new content (e.g., text or images) based on large amounts of training data. That data is typically examples of the type of content the system will produce (such as enormous amounts of text for systems like ChatGPT that will produce text responses, or hundreds of millions of images for DALL·E, which produces images in response to prompts). Using this data, these systems are trained in one of two ways:
- Unsupervised, meaning that the data that the system consumes in order to learn is not labeled or categorized by human experts, so the system does not know what data is good or high quality and what data is bad or poor quality; or
- Semi-supervised, meaning that most of the data the system consumes is unlabeled, but it may get some amount of labeled data.
The system uses all this training data to establish an understanding of what human-produced content looks like and aims to produce new content that mimics patterns it learned from the training data. The content can take a number of forms. For example, it might be language, in the case of systems like ChatGPT or BlenderBot, or art or imagery, in the case of DALL·E or ThisPersonDoesNotExist. Importantly, these systems are largely aiming to produce content that feels “real” to human users, though what constitutes real depends on the type of content the system is producing. For text-producing systems, it typically means that the text produced mirrors that produced by humans, and a human could not tell the difference between human-generated content and content generated by the system. For image-producing systems, it might mean that the produced image looks like art that might have been made by a human, or it might mean that the image produced feels photorealistic.
While some generative AI systems produce content without any specific input or prompts from users (such as ThisPersonDoesNotExist, which presents a random photorealistic “human” face to website visitors), other systems provide content in response to specific queries or prompts from users. In order to respond to user prompts effectively, the system must be able to parse and “understand” what the user is asking for and how it will inform the generated output.
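To make the prompt-and-response loop concrete, here is a minimal sketch (mine, not the article’s) of querying a text-producing generative model. It assumes the open-source Hugging Face transformers library and uses the small GPT-2 model purely because it runs locally; the systems discussed above are far larger.

```python
# Minimal sketch: prompting a text-generating model.
# Assumes: pip install transformers torch  (not part of the original article)
from transformers import pipeline

# Load a small, freely available text-generation model (GPT-2 here,
# chosen only because it is small; production systems are far larger).
generator = pipeline("text-generation", model="gpt2")

# The model parses the prompt and produces a continuation that mimics
# patterns in its training data.
prompt = "Photosynthesis is the process by which"
result = generator(prompt, max_new_tokens=40, num_return_sequences=1)
print(result[0]["generated_text"])
```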
What are Uses for Text-Producing Generative AI in Education?
This technology has the potential for numerous applications, ranging from simple to complex, fanciful to tactical, delightful to deeply concerning. Significant attention has been paid to concerns like plagiarism, sometimes resulting in blanket bans on the use of generative AI technologies, covering not just students but teachers and administrators as well. However, there are a number of potential constructive uses of generative AI in the education space, for both adults (such as teachers and school counselors) and students.
Adults
For adults who work in schools, like teachers, principals, and school counselors, these uses may include first drafts of lesson plans and rubrics, administrative tasks such as drafting emails, using the system as a more responsive search engine, or first-pass grading of essays and other assignments (the system can report, for instance, how well an essay follows a specific form). Using AI in these ways may save teachers time, allowing them to focus their energies on other aspects of educating students. Some teachers are also incorporating generative AI into their classrooms to help students learn how these systems might be used in their adult lives, while understanding their limitations and drawbacks.
Students
For students, there has been significant discussion of concerns like plagiarism, but some educators have noted that there are constructive uses of generative AI as well. These may include editing and improving a writing assignment draft (such as asking the system to identify areas where the text is unclear or too informal), rephrasing complex topics in different ways for students who are struggling to understand a textbook explanation, or as a more responsive search tool that enables more complex queries than a traditional search engine. Additionally, the tool itself can provide a meta-lesson of sorts, as a way for students to explore concepts of media literacy and the source and value of different kinds of information and content, something teachers are beginning to incorporate into their lessons.
What are the Risks and Challenges of Generative AI in Education?
While some of these uses have potential, there are certainly risks and drawbacks to generative AI systems being used in education as well.
Plagiarism
One of the most prevalent educational concerns around generative AI systems is their use for plagiarism, which in this context means students using the system to do work that they then present as something they created without AI assistance. This may mean producing entire essays, or using the system along the way for tasks like outline generation or editing. While educators will disagree about the point at which such use becomes problematic, part of the challenge lies in the fact that plagiarized work cannot be detected with traditional tools: most current forms of plagiarism detection rely on the assumption that the offending text is drawn from content that already exists, which is not the case with AI-generated content.
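To see why traditional detection fails here, consider a toy version of corpus-matching plagiarism detection. This sketch is mine, not the article’s: it flags a submission only when its word n-grams already appear verbatim in a corpus of known sources, which is exactly the assumption that novel AI-generated text breaks.

```python
# Toy corpus-matching plagiarism check (illustrative sketch only).
# Flags a submission when many of its word 5-grams appear verbatim
# in a corpus of known sources -- the assumption AI-generated text breaks.

def ngrams(text: str, n: int = 5):
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def overlap_score(submission: str, corpus: list[str], n: int = 5) -> float:
    sub = ngrams(submission, n)
    if not sub:
        return 0.0
    known = set().union(*(ngrams(doc, n) for doc in corpus))
    return len(sub & known) / len(sub)

# A copied passage scores near 1.0; freshly generated text that exists
# nowhere in the corpus scores near 0.0, even if an AI wrote all of it.
```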
Equity
As with most AI systems, equity is a concern. Generative AI systems are trained on data that will reflect the biases of the world that data stems from. This, in turn, can lead to those biases and prejudices becoming embedded in the AI itself. While designers can take steps to limit this bias on both the input end (by curating the training data and trying to eliminate biases at that stage) and on the output end (trying to detect outputs that reflect a bias and stopping or modifying those outputs before they go to the user), neither of these approaches will be completely effective. Because of the enormous volume of data needed to train generative AI systems, it is typically infeasible to have humans vet all the training data. Additionally, if the system is designed to continue learning from user queries and responses over time, those inputs will generally be outside the control of the system developers. This is particularly concerning in an education context where students may be using these tools to learn more about the world around them, meaning the tool may impart or reinforce biases in students’ thinking.
Privacy
Another concern with generative AI systems is privacy. On one hand, there is the question of how training data is sourced. People may be surprised and upset to learn that information about them, or content they created, is being used to train an AI system. If the generative AI system uses existing data corpuses with a clear, established use case of training AI systems, this may be less of a concern. However, any system that gathers data from the public internet for novel purposes may be using data in ways that the data subjects did not anticipate and may not be comfortable with, even if they technically consented under a broad clause enabling unspecified future uses. On the other hand, the data that is input or created during interaction with the system, whether the outputs themselves or the search terms and queries provided to the system, may be sensitive as well. In some ways, this mirrors existing concerns around search engine privacy. A student asking for resources around gender and sexuality may be placed at risk if teachers or school administrators gain access to those queries and the student is outed to their family and community. Generative AI may exacerbate these concerns if such queries feed more heavily into the development and evolution of the system than they do for traditional search engines.
Efficacy
Another critical concern with generative AI systems is one of efficacy. Because of the unsupervised nature of their development, generative systems may “hallucinate,” meaning they generate untrue responses. Of course, whether this is a problem depends on how the user is interacting with the system. If they ask the system to write a short fictional story, untruth is not an issue; in fact, it is expected. However, if the user is asking a factual question for research, a hallucination is a failure case. Because of the multi-use nature of many generative AI systems, this issue can be hard for developers to address: they may not wish to prevent the system from hallucinating entirely, and even if they did, it may simply not be possible. This means it would not necessarily be possible to build something like a research assistant that teachers could offer to their students with the assumption that it would only ever provide factual information. The system may not be able to understand ground truth in a meaningful way, because it is trying to “learn” for itself which data is more reliable than other data.
Detection
Partially in response to concerns about plagiarism, companies and individuals have begun building systems designed to detect content created by generative AI, and some developers have started watermarking the output of their systems. However, these detectors are currently largely ineffective, and they are unlikely to ever be foolproof because they have to evolve along with the generative systems themselves, leading to what is often called an “arms race.” As with the hallucination problem, the fallibility of detectors runs the risk that people will assume they are more effective than they actually are. Any solution to the risks presented by generative AI will therefore likely need to be more robust than simply relying on detectors.
Appropriate Use
Generative AI has the potential to be incredibly useful, but societal norms for when and how it should be used are still very much in flux. Because these systems imitate human output, there is a high potential for people to feel unsettled if they realize the systems have been used in ways they find inappropriate. As these norms develop, it is critical to engage students and the broader school community in robust discussion about when and how to use the systems and the value and limitations they offer, and to set clear guidelines around their use in an academic context.
Authorship
A final concern with AI-generated content is one of authorship. This is closely related to the issue of plagiarism, but it raises broader considerations. It will not always be clear how much of a generative AI system’s output belongs to the user who prompted it, versus the developers of the system, versus the authors and creators of the system’s training data. The fact that content is often created in response to iterative prompts, or serves as a starting point for a piece of work that a user then significantly alters or adapts, makes this a complicated question. This may create a need for clear guidelines around appropriate uses of generative AI in contexts like writing contests, school papers, and college essays.
Conclusion
Generative AI systems have the potential to be a remarkably adaptable and useful tool in education, both in and out of the classroom. As with almost all new technology, however, they raise risks and challenges. Reaping the benefits of these tools will require a careful and deliberate rollout, a long-term willingness to adjust and tune the tools themselves, and norms that govern how educators and students use them over time as new risks emerge and new mitigations are developed.
Ethics of AI-based invention: a personal inquiry
News
Andy Matuschak
Andy Matuschak
2023-05-01
Part of “Letters from the Lab”, a series of informal essays on my research written for patrons. Originally published April 2023; revised and made public May 2023. You can also listen to me read this essay (28 minutes).
Hofstadter’s Law wryly captures my experience of difficult work: “It always takes longer than you expect, even when you take into account Hofstadter’s Law.” He suggested that law in 1979, alongside some pessimistic observations about chess-playing AI: “…people used to estimate that it would be ten years until a computer (or program) was world champion. But after ten years had passed, it seemed that the day…was still more than ten years away.”
Ironically, my experience observing the last ten years of AI research has been exactly the opposite. The pace has been extraordinary. Each time I’m startled by a new result, I update my expectations of the field’s velocity. Yet somehow, I never seem to update far enough—even when I take that very fact into account. My own ignorance is partly to blame; AI has been a side interest for me. But my subjective experience is of an inverse Hofstadter’s Law.
No surprise, then: GPT-4’s performance truly shocked me. This is a system that can outperform a well-educated teenager at many (most?) short-lived cognition-centric tasks. It’s hard to think about anything else. Inevitably, I now find myself with an ever-growing pile of design ideas for novel AI-powered interfaces. But I’ve also found myself with a gnawing concern: what are my moral responsibilities, as an inventor, when creating new applications of AI models with such rapidly accelerating capabilities?
If today’s pace continues, the coming decade’s models are likely to enable extraordinary good: scientific breakthroughs, creative superpowers, aggregate economic leaps. Yet such models also seem very likely to induce prodigious harm—plausibly more than any invention produced in my lifetime. I’m worried about mass job displacement and the resulting social upheaval. I’m worried about misuse: cyberattacks, targeted misinformation and harassment campaigns, concentration and fortification of power, atrocities from “battlefield AI.” I’m worried about a rise in bewildering accidents and subtle injustices, as we hand ever more agency to inscrutable autonomous systems. I’m not certain of any of this, but I don’t need much clairvoyance to be plenty concerned, even without the (also worrying) specter of misaligned superintelligence.
In sum, these systems’ capabilities seem to be growing much more quickly than our ability to understand or cope with them. I wouldn’t feel comfortable working on AI capabilities directly today. But I’m not an AI researcher; I’m not training super-powerful models myself. So until recently, the harms I’ve mentioned have been abstract concerns. Now, though, my mind is dreaming up new kinds of software built atop these models. That makes me a moral actor here.
If I worry that our current pace is reckless, then I shouldn’t accelerate that pace by my own actions. More broadly, if I think these models will induce so much harm—perhaps alongside still greater good!—then do I really want to bring them into my creative practice? Does that make me party to something essentially noxious, sullying? Under what circumstances? Concretely: I have some ideas for novel reading interfaces that use large language models as an implementation detail. What moral considerations should guide my conduct, in development and in publication? What sorts of projects should I avoid altogether? “All of them”?
One trouble here is that I can’t endorse any fixed moral system. I’m not a utilitarian, or a Christian, or a neo-Aristotelian. That would make things simpler. Unfortunately, I’m more aligned with John Dewey’s pragmatic ethics: there is no complete moral framework, but there are lots of useful moral ideas and perceptions. We have to figure things out as we go, in context, collaboratively, iteratively, taking into account many (possibly conflicting) value judgments.
In that spirit, this essay will mine a range of moral traditions for insight about my quandary. There’s plenty I dislike in each philosophy, so I’ll make this a moral buffet, focusing on the elements I find helpful and blithely ignoring the rest. And I’ve skipped many traditions which were less instructive for me. I’m not an expert in moral philosophy; I’ll be aiming for usefulness rather than technical accuracy in my discussion.
Before we begin, let me emphasize that this is a personal moral inquiry. This essay explores how I ought to act; it does not assert how you ought to act. That said, I do have one “ought” for you: if you’re a technologist, this is a serious moral problem which you should consider quite carefully. Most of the time, in most situations, I don’t think we need to engage in elaborate moral deliberation. Our instincts are generally fine, and most ethical codes agree in everyday circumstances. But AI is a much thornier terrain. The potential impacts (good and ill) are enormous; reasoning about them is difficult; there’s irreducible uncertainty; moral traditions conflict or offer little guidance. Making matters worse, motivated reasoning is far too easy and already far too pervasive—the social and economic incentives to accelerate are enormous. I think “default” behaviors here are likely to produce significant harm. My reflections here are confused and imperfect, but I hope they will help inspire your own deliberation.
LLMs break the internet. Signing everything fixes it.
News
Gordon Brander
Subconscious
2023-04-25
LLMs break the internet. The going rate for GPT-4 is $0.06 per 1000 tokens, or about $0.00008 per word (at a typical rate of roughly 1.3 tokens per English word). New open-source models like Dolly and StableLM will drop costs even further, and without the content restrictions.
Thought has never been so cheap. Creative expression has never been so accessible. Also spam, phishing, harassment mobs, and mass influence ops have never been so cheap, so accessible.
You thought the internet was a mess before? Get ready for bots that beat the Turing test, synthesize your voice, and generate fake social consensus at scale. We’re seeing the beginnings of this already. Expect a tidal wave of spam, identity theft, phishing, and ransomware over the next 36 months.
The Dead Internet Theory wasn’t wrong, just early.
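The excerpt stops before the essay’s proposed fix, but the title names it: cryptographically signing content so that provenance survives cheap generation. As a hedged illustration (my sketch; the essay does not prescribe a particular library or scheme), here is what signing and verifying a post with an Ed25519 keypair looks like using Python’s cryptography package.

```python
# Sketch: sign a post with Ed25519 so readers can verify who published it.
# Assumes: pip install cryptography  (library choice is mine, not the essay's)
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
from cryptography.exceptions import InvalidSignature

# The author holds a private key; the public key is shared openly.
private_key = Ed25519PrivateKey.generate()
public_key = private_key.public_key()

post = b"This paragraph was written and signed by me, not a bot."
signature = private_key.sign(post)

# Anyone with the public key can check the post was not forged or altered.
try:
    public_key.verify(signature, post)
    print("Signature valid: content is authentically from the key holder.")
except InvalidSignature:
    print("Signature invalid: content was tampered with or forged.")
```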
When Technology Goes Bad
News
Matt Clancy
What's New Under the Sun
2023-05-16
Innovation has, historically, been pretty good for humanity. Economists view long-run progress in material living standards as primarily resulting from improving technology, which, in turn, emerges from the processes of innovation. Material living standards aren’t everything, but I think you can make a pretty good case that they tend to enable human flourishing better than feasible alternatives (this post from Jason Crawford reflects my views pretty well). In general, the return on R&D has been very good, and most of the attention on this website is viewed through a lens of how to get more of it. But technology is just a tool, and tools can be used for good or evil purposes. So far, technology has skewed towards “good” rather than “evil,” but there are some reasons to worry things may differ in the future.
Delegation Week
Events
2023-05-22
Delegation Week is a 5-day, ecosystem-wide event dedicated to DAO governance engagement. It's designed for token holders to find aligned delegates and for delegates to campaign on their platforms.
OpenChatKit
Tools
Together, LAION, Ontocord
OpenChatKit provides a powerful, open-source base for creating both specialized and general-purpose chatbots for various applications. The kit includes instruction-tuned language models, a moderation model, and an extensible retrieval system for incorporating up-to-date information from custom repositories into responses. OpenChatKit models were trained on the OIG-43M training dataset, a collaboration between Together, LAION, and Ontocord.ai.
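As an illustration of what getting started might look like, here is a minimal sketch of loading an OpenChatKit chat model through Hugging Face transformers. The model id and the <human>/<bot> prompt format reflect my understanding of the project’s public releases and should be verified against the official repository.

```python
# Minimal sketch: chatting with an OpenChatKit model via transformers.
# Model id and prompt format are assumptions based on the project's
# public releases -- verify against the OpenChatKit repository.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "togethercomputer/GPT-NeoXT-Chat-Base-20B"  # assumed model id
tokenizer = AutoTokenizer.from_pretrained(model_id)
# device_map="auto" requires the accelerate package
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# OpenChatKit models are instruction-tuned on human/bot style dialogue.
prompt = "<human>: What does OpenChatKit include?\n<bot>:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```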
Should Your Start-up Be For-Profit or Nonprofit?
News
Cait Brumme, Brian Trelstad
Harvard Business Review
2023-05-01
Years ago, the line between nonprofit and for-profit enterprises was clear, but that has changed. Nonprofits now offer products that compete with those of the best for-profits, and for-profits can deliver as much social value as charities. Despite the blurred distinction, all mission-driven start-ups will eventually face a stark choice about which legal structure to adopt, and they need to make it carefully, because it’s hard to undo, say the authors, the CEO of a nonprofit accelerator and a partner in an impact investing fund.
To guide their decision, social entrepreneurs should examine several questions: Is the market ready for a for-profit solution? Where is the available capital? And which structure would help the organization attract the talent and resources that it requires?
How AI Could Save (Not Destroy) Education | Sal Khan | TED
Videos & Podcasts
Sal Khan
TED
2023-05-01
Sal Khan, the founder and CEO of Khan Academy, thinks artificial intelligence could spark the greatest positive transformation education has ever seen. He shares the opportunities he sees for students and educators to collaborate with AI tools -- including the potential of a personal AI tutor for every student and an AI teaching assistant for every teacher -- and demos some exciting new features for Khan Academy's educational chatbot, Khanmigo.
Exploring the Fundamentals and Nuance of Community in On-Chain Music at Water & Music’s Wavelengths Conference
News
MacEagon Voyce
Decential
2023-05-15
In 2020, during the early days of the pandemic, I interviewed the experimental guitarist Elliott Sharp. Most of our conversation was spent commiserating over the loss of live music, discussing our general need to share spaces, and the ineffable pull that we feel toward certain people and ideas. Indelibly, he named that pull “pheromonal handshaking.”
“I’ve always believed that creative forces involve a certain amount of chemical mixing,” he said. “How many bands were made because you got along with someone? Someone you just resonated with? It’s a very important part of why people get together to make music, to make art, to build families, to build communities…resonance is really what it’s all about.”
As life moved online and communities exchanged parks and concerts for Discord threads and livestreams, it’s an idea that’s never quite left my mind.
Two weekends ago it turned up again during a roundtable at Water & Music’s inaugural Wavelengths summit in New York. About 250 people gathered at 99 Scott Studio – a nearly 5,000-square-foot event space on the industrial border of Brooklyn and Queens – to tackle the myriad issues and innovations at the intersection of music and tech.
After Water & Music founder Cherie Hu delivered her state of the union address on the main stage, about 40 of us headed to the intimate Currents room, where Austin Robey (Metalabel), Mark Redito (Songcamp), Nicole d’Avis (previously Seed Club) and Kevin Duquette (Topshelf Records) explored community-building and decentralization through the lens of history, tracing precedents through the hearts of grassroots movements, co-ops and artist-focused independent labels.
Thank you for reading Distroid!
I hope you enjoyed this week’s issue.
Please send a message if you have any questions, comments, or other feedback on this week’s newsletter or on Distroid in general.