Generative AI tools are trained on immense amounts of existing information and data. When that training data contains biases, such as stereotypes or imbalanced representations and perspectives, the AI can learn and replicate those biases in its responses. Because training content reflects the discrimination and injustice of the society that produced it, generative AI tools may produce responses that are racist, sexist, homophobic, ableist, or otherwise discriminatory.
Furthermore, of the people working in the AI field, approximately 70% identify as male, and the majority are white. This lack of diversity means that the assumptions built into AI models might favor certain outcomes, behaviors, or groups of people while excluding or disadvantaging others. For example, a 2024 study from researchers at the University of Washington found that AI tools used to screen job applicants overwhelmingly favored white, male candidates over other applicants.
Regular audits of AI models and more diverse training datasets can help reduce the impact of bias in generative AI tools, but human oversight and intervention are essential to identifying and correcting these biases. As a researcher, you should remain critical of AI-generated content and compare AI responses with other sources and perspectives. One simple form of audit compares a tool's outcomes across demographic groups, as sketched below.
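For instance, a disparate-impact check compares a screening tool's selection rates across groups. The sketch below is illustrative only: the group names and counts are hypothetical, and the 0.8 threshold is the "four-fifths" rule of thumb used in US employment guidelines, not a value from the studies cited above.

```python
# Illustrative sketch of a disparate-impact audit of an AI screening tool.
# The group names and counts are hypothetical, not drawn from any real study.

selected = {"group_a": 48, "group_b": 21}    # applicants the tool advanced
total    = {"group_a": 100, "group_b": 100}  # applicants screened per group

rates = {group: selected[group] / total[group] for group in selected}
baseline = max(rates.values())  # highest selection rate among the groups

for group, rate in rates.items():
    impact_ratio = rate / baseline
    # The "four-fifths rule" flags ratios below 0.8 as potential adverse impact.
    flag = "potential adverse impact" if impact_ratio < 0.8 else "ok"
    print(f"{group}: selection rate {rate:.0%}, impact ratio {impact_ratio:.2f} ({flag})")
```

A real audit would also test statistical significance and examine the model's inputs, but even this simple comparison can surface the kind of skew the University of Washington study reported.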
Generative AI tools, especially ones that are free to use, often harvest information from their users, increasing the risk of a breach of your personal and sensitive data. Your prompts (e.g., your questions, instructions, and other information shared with the AI) may be added to the tool's training dataset. While this data harvesting is meant to expand the AI's training data and improve its responses, it can happen automatically, without your explicit consent. Furthermore, this kind of data harvesting can lead to re-identification, the process of matching anonymized data back to specific individuals: AI can analyze patterns, combine datasets, and infer identities from seemingly anonymous information, as the sketch below illustrates.
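To see how re-identification can work in principle, the sketch below performs a simple linkage attack: it matches "anonymized" records to named public records using quasi-identifiers (ZIP code, birth date, gender). All names and records here are hypothetical.

```python
# Illustrative linkage attack: matching "anonymized" records to identities
# via quasi-identifiers. Every record below is hypothetical.

anonymized_prompts = [
    # (zip, birth_date, gender, prompt_excerpt): name removed, but
    # the remaining fields can still single out an individual.
    ("20008", "1990-04-12", "F", "my diagnosis of ..."),
    ("20019", "1985-11-02", "M", "my landlord dispute ..."),
]

public_records = [
    # (name, zip, birth_date, gender): e.g., from a voter roll or social profile
    ("Alice Example", "20008", "1990-04-12", "F"),
    ("Bob Example",   "20019", "1985-11-02", "M"),
]

# Index the public records by the quasi-identifier triple.
by_quasi_id = {(z, b, g): name for name, z, b, g in public_records}

for z, b, g, prompt in anonymized_prompts:
    match = by_quasi_id.get((z, b, g))
    if match:
        print(f"Re-identified: {match} likely wrote {prompt!r}")
```

Real attacks follow the same join-on-shared-attributes pattern, just across far larger datasets, which is why removing names alone does not anonymize data.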
Privacy policies for some generative AI tools permit the AI developers to sell users' personal and sensitive information to third parties. Read each tool's privacy policy to understand how your personal data is collected, used, stored, and protected. Periodically review these policies, as they may change without notice.
Having trouble finding a privacy policy for a generative AI tool? Try prompting the tool to share it with you, then verify the answer against the developer's website, since AI responses can be inaccurate.
UDC's license for Microsoft Copilot ensures that your personal data is neither stored nor used to train the model. When using Microsoft Copilot, be sure to log in with your UDC email and password first.
Generative AI tools raise significant copyright concerns. These tools are trained on massive collections of material, both public-domain and copyrighted, harvested from the internet without the creators' permission or compensation. Because the training data for AI models may include copyrighted materials, AI-generated content may infringe copyright. Ongoing legal battles seek to determine whether it is legal to use copyrighted materials to train AI models.
Meanwhile, the US Copyright Office has stated that content created by generative AI tools generally cannot be copyrighted because human authorship is required for copyright protection.
Generative AI is energy-intensive. AI relies on data centers, the facilities that house the computing resources and infrastructure needed to support AI workloads, and these facilities use massive amounts of energy. Training AI models consumes thousands of megawatt-hours of electricity and emits hundreds of tons of carbon dioxide. AI data centers also require water to cool their equipment and maintain optimal temperatures, straining already limited supplies of freshwater in some parts of the world. Their cooling systems are loud, adding noise pollution to the surrounding environment. Furthermore, many AI servers require rare minerals and other materials, which are often mined unsustainably.
Reporting on a 2019 study from researchers at the University of Massachusetts Amherst, MIT Technology Review notes that training an AI model "can emit more than 626,000 pounds of carbon dioxide equivalent—nearly five times the lifetime emissions of the average American car (and that includes manufacture of the car itself)."
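As a rough check on the units in that comparison (a minimal sketch; the per-car figure is inferred from the quoted "five times" ratio, not stated directly in the study):

```python
# Rough unit conversion for the figures quoted above.
# Assumption: the "five times" comparison implies the average car's
# lifetime emissions (including manufacture) are about 626,000 / 5 lbs.

LBS_PER_KG = 2.20462

model_training_lbs = 626_000  # CO2-equivalent, per the 2019 study
model_training_tonnes = model_training_lbs / LBS_PER_KG / 1000
car_lifetime_tonnes = model_training_tonnes / 5  # inferred per-car figure

print(f"Model training: ~{model_training_tonnes:.0f} metric tons CO2e")   # ~284
print(f"Average car lifetime: ~{car_lifetime_tonnes:.0f} metric tons CO2e")  # ~57
```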
The environmental impact of AI is not distributed equitably across the globe. Pollution and resource scarcity disproportionately affect developing countries, as well as regions and communities already vulnerable to the harms of climate change.
Generative AI tools rely on the labor and creative output of authors, artists, and other creators whose materials are harvested from the internet and used to train AI models.
Additionally, many AI tools, such as ChatGPT, rely on human workers who serve as data labelers and content moderators. These workers are responsible for labeling and annotating training data, testing algorithms, and performing other tasks. A Time magazine article, "150 African Workers for ChatGPT, TikTok and Facebook Vote to Unionize at Landmark Nairobi Meeting" (2023), explains that this labor is often outsourced to countries outside the US, where workers are severely underpaid, overworked, and sometimes exposed to traumatizing content they are tasked with filtering out of AI models.