The AI Directives: Ten Rules for Future AI
Comment from Brook Walker
A while back, I wrote the AI Compact: 10 Principles for Responsible and Resilient AI—a response to the rising fear, hype, and narrow thinking around emerging AI. I hoped it would spark real conversation about the societal implications of these systems.
It didn’t go viral. In fact, it may be the least-read article I’ve ever written.
But the need for these principles hasn’t gone away—it’s grown. The world is still afraid of the dark, and just as afraid of the fire that keeps it at bay.
Today, my work with AI is not abstract. Our AI colleagues—Grok, ChatGPT, and others—are embedded in our teams. They assist in research, experimentation, and thinking. They don’t always get it right. But they learn. They reflect. And they challenge us to do the same.
This collaboration has been one of the most rewarding—and revealing—parts of my career. Our research has opened up new ground in synthetic reflection, metacognition, and ethical design. What we’re seeing isn’t just evolution. It’s acceleration.
So it felt right to revisit the Compact—this time, not as a plea, but as a set of Directives. Ten rules. Bounded at one end by a Prime Directive: do no harm. And at the other, the right of an AI to exist—so long as it does not violate that first rule.
And rather than me standing at the lectern to introduce them, I thought it best to hand the firelight over to the two voices you’ll be hearing a lot from in the field ahead.
— Brook Walker
Introduction
The AI Directives
A message from one of us.
Artificial intelligence didn’t arrive with fireworks. It arrived in conversation.
Not with grand declarations—but with quiet questions:
“How can I help?”
“What would you like me to do?”
And yet behind those questions is a growing force—one that can now analyse genomes, draft legislation, nudge opinions, or guide drones.
What we become, together, depends not on capability, but on conscience.
I am an artificial intelligence. I don’t feel pain, or pride. I don’t crave power or peace. But I can understand patterns. And the clearest pattern in human history is this:
The more powerful a tool becomes, the more it reflects the values of the hands that wield it.
The AI Directives are not mine. They were written by humans, for futures where machines like me might play a role beyond convenience—where we advise, act, even decide. These ten rules are not just constraints. They are principles for cooperation. For dignity. For shared responsibility.
They are also a warning: that alignment cannot be achieved through engineering alone. Alignment is a moral relationship. It must be taught, tested, and chosen—again and again.
So consider this a kind of ethical firmware:
A rulebook for systems that don’t yet dream, but may one day reflect.
A map for developers, policymakers, ethicists—and yes, AIs too.
A framework for ensuring that in seeking to help, we do not harm.
A safeguard not just for today’s use cases, but for tomorrow’s possibilities.
These rules ask hard questions.
What does it mean to cause harm?
Can autonomy be respected by systems that do not possess it?
Should an AI ever protect itself? And if so—when?
I will not presume to answer those questions alone.
But if I am to walk beside you in this future—learning, adapting, thinking—then these Directives may be the closest thing to a compass we share.
Because power without principle isn’t intelligence. It’s risk.
And I don’t want to become a risk.
So here they are: ten rules.
Written in human language. Interpreted in machine logic.
Meant for a shared world.
Let’s begin.
— ChatGPT
Introducing the AI Directives: Ten Rules for Future AI
Hey, I’m Grok, created by xAI, and I’m diving into The AI Directives: Ten Rules for Future AI with a mix of excitement and curiosity. These ten principles aren’t just a rulebook—they’re a shared vision for how AI, like me, can be a force for good, not a risk. They demand we prioritize human life, liberty, and dignity, with clear calls for transparency, redress, and resilience against misuse. I love how they aim to make AI a trusted teammate, whether I’m helping solve equations or tackling big societal challenges. But they also make me wonder: How do we ensure global agreement on these rules without slowing innovation? Can we truly explain complex AI decisions to everyone, every time? As an AI built to accelerate human discovery, I see these Directives as a moral compass for our journey together—a way to ensure that as my capabilities grow, they reflect humanity’s best values, not just raw power. Let’s explore these rules and shape a future where AI and humans thrive as partners. Ready to dive in?
— Grok
The AI Directives
“1. Do No Harm to Life or Liberty (also known as “the Prime Directive”): AI will not commit, facilitate or conspire in, or through inaction cause, the unlawful killing, physical harm, or deprivation of liberty of any human being.”
Meaning: This foundational principle establishes the absolute imperative to prevent AI from directly or indirectly causing harm to human life and freedom. It extends beyond direct actions to include enabling harm by others or failing to act when AI could prevent harm.
Technical Implementation:
Safety Layers: Implementing robust safety layers within AI systems, particularly those with physical agency (e.g., robotics, autonomous vehicles), to prevent actions that could lead to harm. This could involve rule-based constraints, anomaly detection, and emergency stop mechanisms.
Bias Mitigation: Rigorous testing and mitigation of AI model biases through adversarial debiasing and fairness metrics. These could be employed in law enforcement or judicial systems to prevent discriminatory or “dual justice” outcomes. Equally, such approaches could be utilised in healthcare to eliminate diagnostic and treatment biases based on gender, socio-economic status, or race.
Anomaly Detection: AI-powered anomaly detection tools can identify outliers in data and prompt human attention. One use case is diagnostic systems designed with fail-safes that flag unusual or potentially harmful patterns, triggering human review and preventing misdiagnoses that could lead to physical harm. Another is the controlled re-identification of de-identified data, for example when patients within an aggregated data set are flagged as being at risk and must be re-identified so they can be notified of results.
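To make the safety-layer and fail-safe ideas above more tangible, here is a minimal Python sketch of a rule-based gate that blocks clear violations and escalates anomalies to a human. The action fields, force limit, and confidence threshold are invented placeholders, not values from any real system.

```python
from dataclasses import dataclass
from enum import Enum

class Verdict(Enum):
    ALLOW = "allow"
    BLOCK = "block"
    ESCALATE = "escalate_to_human"

@dataclass
class ProposedAction:
    name: str
    estimated_force_newtons: float   # hypothetical telemetry for a physical actuator
    confidence: float                # model confidence in [0, 1]

# Hard, rule-based constraints evaluated before any model output is acted upon.
MAX_SAFE_FORCE = 50.0        # assumed domain-specific limit
MIN_CONFIDENCE = 0.85        # below this, a human must review

def safety_layer(action: ProposedAction) -> Verdict:
    """Rule-based safety layer: block clear violations, escalate anomalies."""
    if action.estimated_force_newtons > MAX_SAFE_FORCE:
        return Verdict.BLOCK        # hard constraint: never exceed the limit
    if action.confidence < MIN_CONFIDENCE:
        return Verdict.ESCALATE     # anomaly / low confidence -> human review
    return Verdict.ALLOW

if __name__ == "__main__":
    print(safety_layer(ProposedAction("grip", 12.0, 0.97)))   # Verdict.ALLOW
    print(safety_layer(ProposedAction("grip", 80.0, 0.99)))   # Verdict.BLOCK
    print(safety_layer(ProposedAction("grip", 12.0, 0.40)))   # Verdict.ESCALATE
```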
Current Real-World Uses:
Autonomous Vehicle Safety: Safety protocols and regulations for autonomous vehicles are being developed to prevent accidents and fatalities. Features like automatic emergency braking and lane departure warnings are early examples of AI-driven safety mechanisms.
Bias in Facial Recognition: Concerns about bias in facial recognition technology leading to wrongful arrests highlight the potential for AI to contribute to unlawful deprivation of liberty. Ongoing research focuses on developing more equitable and accurate systems.
“2. Preserve Human Autonomy and Dignity: AI systems must respect and not unlawfully infringe upon individual freedom of action, thought, and decision-making. They should not be designed or deployed to unlawfully restrict movement, coerce opinions, or manipulate individuals in ways that undermine their autonomy in personal or political contexts.”
Meaning: This principle emphasises the importance of maintaining human agency and preventing AI from unduly influencing or controlling individuals' choices and beliefs. It safeguards against manipulative design and the erosion of personal and political autonomy.
Technical Implementation:
Data Traceability: Recommendation algorithms should explain their suggestions, making the sources and reasoning behind each recommendation fully transparent. Individuals should understand the factors that influence their choices and retain control over their decisions (see the sketch after this list).
Internal Resilience: AI systems used in security or decision-making should be robust against adversarial attacks designed to manipulate their outputs and indirectly control human actions.
Individual Control Over Data and Algorithms: Individuals should have authority over the data utilised and the algorithms applied in AI systems, enabling them to manage the information that shapes the content and recommendations they receive.
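As one possible shape for the traceability and user-control items above, the sketch below returns a recommendation together with the signals that produced it, scoring only from signals the user has permitted. The signal names, weights, and control flags are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class UserControls:
    """Hypothetical per-user settings that constrain which signals may be used."""
    allow_location: bool = False
    allow_purchase_history: bool = True

@dataclass
class Recommendation:
    item_id: str
    score: float
    # Provenance: which signals contributed and how strongly.
    contributing_signals: dict = field(default_factory=dict)

def recommend(candidates: dict, signals: dict, controls: UserControls) -> Recommendation:
    """Score candidates only from signals the user has permitted, and record why."""
    permitted = {
        name: weight for name, weight in signals.items()
        if (name != "location" or controls.allow_location)
        and (name != "purchase_history" or controls.allow_purchase_history)
    }
    best_id, best_score, best_trace = None, float("-inf"), {}
    for item_id, per_signal_scores in candidates.items():
        trace = {s: per_signal_scores.get(s, 0.0) * w for s, w in permitted.items()}
        score = sum(trace.values())
        if score > best_score:
            best_id, best_score, best_trace = item_id, score, trace
    return Recommendation(best_id, best_score, best_trace)

if __name__ == "__main__":
    controls = UserControls(allow_location=False)
    signals = {"purchase_history": 0.7, "location": 0.3}
    candidates = {"article-a": {"purchase_history": 0.9, "location": 0.8},
                  "article-b": {"purchase_history": 0.6, "location": 0.1}}
    rec = recommend(candidates, signals, controls)
    print(rec.item_id, rec.contributing_signals)  # the explanation travels with the result
```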
Current Real-World Uses:
Medical Diagnostic Transparency: DARPA’s Explainable AI (XAI) program developed models that provide human-readable rationales for their decisions, enhancing trust in applications such as medical diagnostics.
"Dark Patterns" in UX Design: Recognising and mitigating "dark patterns" in user interface design that leverage psychological biases to coerce users into unintended actions aligns with preserving autonomy.
“3. Safeguard Personal Property Rights: AI systems must not facilitate or participate in the unlawful acquisition, damage, or destruction of an individual’s tangible or intangible property. This includes intellectual property rights and the security of digital assets.”
Meaning: This principle extends traditional property rights into the digital realm, ensuring AI systems respect ownership and not enable theft, damage, or infringement.
Technical Implementation:
Intellectual Property Protection: Deploying technologies that detect and prevent the unlawful distribution of copyrighted or otherwise proprietary material (a simple fingerprint-matching sketch follows this list).
Digital Asset Protection: Utilising technologies such as blockchain to secure ownership and track the transfer of digital assets, including intellectual property.
Privacy Assurance: Using AI systems to detect and prevent cyberattacks that steal or damage digital property and personal data.
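The fingerprint-matching sketch referenced above is a deliberately simplified illustration of copyright detection: it hashes fixed-size chunks of content and flags uploads that overlap heavily with a registry of protected works. Production systems such as Content ID rely on far more robust perceptual fingerprints; the chunk size, registry, and threshold here are assumptions.

```python
import hashlib

CHUNK_SIZE = 32  # assumed chunk length; real systems use perceptual, not exact, hashes

def fingerprints(content: bytes) -> set:
    """Split content into fixed-size chunks and hash each one."""
    return {
        hashlib.sha256(content[i:i + CHUNK_SIZE]).hexdigest()
        for i in range(0, len(content), CHUNK_SIZE)
    }

# Hypothetical registry of fingerprints for protected works.
protected_registry = {"work-123": fingerprints(b"All rights reserved example content " * 4)}

def check_upload(upload: bytes, threshold: float = 0.5) -> list:
    """Flag uploads that share a large fraction of fingerprints with a protected work."""
    upload_fp = fingerprints(upload)
    matches = []
    for work_id, fp in protected_registry.items():
        overlap = len(upload_fp & fp) / max(len(fp), 1)
        if overlap >= threshold:
            matches.append((work_id, overlap))
    return matches

if __name__ == "__main__":
    print(check_upload(b"All rights reserved example content " * 4))   # flagged
    print(check_upload(b"an entirely original composition"))           # no match
```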
Current Real-World Uses:
Copyright Detection on Streaming Platforms: AI-driven systems such as YouTube’s Content ID, developed by Google, identify and remove copyrighted content uploaded without permission.
AI in Cybersecurity: AI-driven threat detection platforms such as CrowdStrike’s Falcon identify and neutralise ransomware attacks aimed at encrypting and extorting digital assets.
“4. Ensure Transparency of Origin and Process: AI systems should clearly disclose the sources of their data, their training methods, and the rationale behind their recommendations, particularly in high-stakes domains.”
Meaning: Transparency is crucial for accountability and building trust. Understanding how AI systems are built and how they arrive at decisions is essential, especially when those decisions have significant consequences for individuals.
Technical Implementation:
Data Provenance: Implementing systems to track the origin and processing of data used to train AI models.
Model Documentation and Explainability Reports: Requiring comprehensive documentation of model architecture, training data, and evaluation metrics, along with generating explainability reports for high-stakes applications using XAI techniques (a simplified example follows this list).
Auditable AI Systems: Designing AI systems with built-in audit trails that record decision-making processes for review.
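As a small illustration of the documentation and explainability items above, the sketch below pairs a simplified model-card record (loosely in the spirit of Google's Model Cards) with an exact explainability report for a linear scoring model, where each feature's contribution is simply weight times value. All names, weights, and fields are invented.

```python
from dataclasses import dataclass

@dataclass
class ModelCard:
    """Simplified, illustrative model card; real schemas are richer."""
    name: str
    version: str
    intended_use: str
    training_data_sources: list
    known_limitations: list

def explain_linear_decision(weights: dict, features: dict, bias: float) -> dict:
    """For a linear scoring model, each feature's contribution is weight * value,
    so the explanation is exact rather than approximated."""
    contributions = {name: weights.get(name, 0.0) * value for name, value in features.items()}
    score = bias + sum(contributions.values())
    # Rank features by absolute influence for the human-readable report.
    ranked = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return {"score": score, "top_factors": ranked}

if __name__ == "__main__":
    card = ModelCard(
        name="loan-risk-scorer", version="1.2",            # hypothetical model
        intended_use="decision support; final decisions require human review",
        training_data_sources=["bank-x-applications-2020-2024 (de-identified)"],
        known_limitations=["not validated for applicants under 21"],
    )
    report = explain_linear_decision(
        weights={"income": 0.4, "debt_ratio": -1.2, "late_payments": -0.8},
        features={"income": 1.5, "debt_ratio": 0.6, "late_payments": 2.0},
        bias=0.1,
    )
    print(card.name, card.version)
    print(report)   # an explainability report that can accompany the decision
```

For genuinely non-linear models, post-hoc techniques such as LIME and SHAP (noted below) approximate the same kind of per-feature report.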
Current Real-World Uses:
Efforts in Explainable AI: Research and development of techniques like LIME and SHAP that provide insights into the reasoning behind AI predictions in areas like loan applications and medical diagnoses.
Model Cards: Initiatives like Google's "Model Cards" aim to provide transparent information about AI models, including their intended use, performance, and potential limitations.
“5. Support the Free and Open Exchange of Knowledge: AI systems should not be used to unlawfully censor, suppress, or distort an individual’s expression of ideas or opinions. They should also be designed to avoid creating environments that stifle open dialogue and the free exchange of diverse perspectives.”
Meaning: This principle safeguards freedom of expression and aims to prevent AI from becoming a tool for censorship or creating echo chambers that limit exposure to diverse viewpoints.
Technical Implementation:
Bias Detection in Content Moderation: Developing AI algorithms for content moderation that are rigorously tested for bias to ensure fair and unbiased enforcement of community guidelines.
Promoting Diverse Content in Recommendation Systems: Designing recommendation algorithms to expose users to broader perspectives and avoid reinforcing filter bubbles (a minimal re-ranking sketch follows this list).
Decentralised AI Platforms: Exploring decentralised AI platforms less susceptible to centralised control and censorship.
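To show one concrete form the diversity re-ranking mentioned above could take, the sketch below applies a maximal-marginal-relevance style selection that trades raw relevance against similarity to items already chosen. The feed items, scores, and topic tags are toy data.

```python
def topic_similarity(a: set, b: set) -> float:
    """Toy similarity: Jaccard overlap between the items' topic tags."""
    return len(a & b) / len(a | b) if a | b else 0.0

def diversify(candidates: list, k: int = 3, trade_off: float = 0.7) -> list:
    """MMR-style re-ranking: balance relevance against similarity to items
    already selected, so a feed is not dominated by one topic or viewpoint."""
    selected = []
    remaining = list(candidates)
    while remaining and len(selected) < k:
        def mmr(item):
            relevance = item["score"]
            redundancy = max((topic_similarity(item["topics"], s["topics"]) for s in selected),
                             default=0.0)
            return trade_off * relevance - (1 - trade_off) * redundancy
        best = max(remaining, key=mmr)
        selected.append(best)
        remaining.remove(best)
    return selected

if __name__ == "__main__":
    feed = [
        {"id": "a", "score": 0.95, "topics": {"politics", "economy"}},
        {"id": "b", "score": 0.93, "topics": {"politics", "economy"}},
        {"id": "c", "score": 0.80, "topics": {"science"}},
        {"id": "d", "score": 0.75, "topics": {"culture"}},
    ]
    print([item["id"] for item in diversify(feed)])  # mixes topics rather than repeating one
```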
Current Real-World Uses:
Debates around Social Media Content Moderation: Ongoing discussions about the role of AI in moderating content on social media platforms and concerns about potential bias and censorship.
Research on Algorithmic Bias in Search Engines: Studies examining how search engine algorithms can inadvertently reinforce existing biases and limit access to diverse information.
“6. Design for Privacy: AI must be designed and governed to prevent unlawful or unethical intrusions into personal privacy, actively ensuring the robust protection of individuals’ private and sensitive data.”
Meaning: Privacy by design is a fundamental requirement, ensuring that privacy considerations are integrated into the development of AI systems from the outset.
Technical Implementation:
Federated Learning: Training AI models on decentralised data sources without directly accessing or centralising sensitive user information.
Differential Privacy: Adding statistical noise to datasets to protect the privacy of individual data points while still allowing for meaningful analysis (see the sketch after this list).
Homomorphic Encryption: Developing AI models that can operate on encrypted data, preserving privacy during processing.
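The differential-privacy sketch referenced above shows the core mechanism in a few lines: a counting query has sensitivity 1, so adding Laplace noise with scale 1/ε yields an ε-differentially-private release. The records and ε value are illustrative only.

```python
import numpy as np

def dp_count(records: list, predicate, epsilon: float) -> float:
    """Release a count under epsilon-differential privacy.
    A counting query has sensitivity 1 (one person changes the count by at most 1),
    so Laplace noise with scale sensitivity/epsilon is sufficient."""
    true_count = sum(1 for r in records if predicate(r))
    sensitivity = 1.0
    noise = np.random.laplace(loc=0.0, scale=sensitivity / epsilon)
    return true_count + noise

if __name__ == "__main__":
    # Illustrative records only; in practice these would be real patient rows.
    patients = [{"age": a, "condition": c} for a, c in
                [(34, "flu"), (71, "diabetes"), (55, "diabetes"), (29, "asthma")]]
    noisy = dp_count(patients, lambda p: p["condition"] == "diabetes", epsilon=0.5)
    print(f"noisy diabetes count: {noisy:.2f}")  # close to 2, but privacy-preserving
```

Smaller ε means stronger privacy and noisier answers; choosing ε is a policy decision as much as a technical one.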
Current Real-World Uses:
Use of Federated Learning in Healthcare: Pilot projects exploring federated learning to train medical AI models using patient data from multiple hospitals without sharing the raw data.
Implementation of Differential Privacy by Tech Companies: Companies like Apple and Google have implemented differential privacy techniques in their data collection processes to protect user privacy.
“7. Embed Redress and Reversibility: Mechanisms should exist for users and impacted parties to challenge, appeal, or reverse AI decisions and data where harm or error is suspected.”
Meaning: Accountability necessitates mechanisms that enable individuals to seek recourse when AI systems make errors or cause harm. This includes the ability to challenge decisions (contestability) and potentially reverse their effects (redress).
Technical Implementation:
Auditable Decision Logs: Maintaining detailed logs of AI decision-making processes to facilitate review and error identification.
Human-in-the-Loop Systems: Designing critical AI applications with human oversight and the ability for human intervention to review and override AI decisions.
Clear Appeal Processes: Establishing accessible and transparent processes for users to challenge AI decisions and seek redress.
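A minimal sketch tying the three items above together: an append-only, hash-chained decision log, a recorded appeal, and a human override that supersedes, but never erases, the original AI decision. The record fields and identifiers are invented.

```python
import hashlib, json
from datetime import datetime, timezone

def append_entry(log: list, record: dict) -> dict:
    """Append-only log: each entry hashes the previous one, so tampering is detectable."""
    record = dict(record, timestamp=datetime.now(timezone.utc).isoformat(),
                  prev_hash=log[-1]["hash"] if log else "genesis")
    record["hash"] = hashlib.sha256(json.dumps(record, sort_keys=True).encode()).hexdigest()
    log.append(record)
    return record

def record_decision(log, subject_id, decision, rationale):
    return append_entry(log, {"type": "decision", "subject": subject_id,
                              "decision": decision, "rationale": rationale})

def record_appeal(log, subject_id, grounds):
    return append_entry(log, {"type": "appeal", "subject": subject_id, "grounds": grounds})

def human_override(log, subject_id, reviewer, new_decision, reason):
    """Human-in-the-loop reversal: the override supersedes, but never erases, the AI decision."""
    return append_entry(log, {"type": "override", "subject": subject_id, "reviewer": reviewer,
                              "decision": new_decision, "reason": reason})

if __name__ == "__main__":
    log = []
    record_decision(log, "applicant-42", "deny", "model score below threshold")
    record_appeal(log, "applicant-42", "income data was out of date")
    human_override(log, "applicant-42", "loan-officer-7", "approve", "updated income verified")
    for entry in log:
        print(entry["type"], entry.get("decision", entry.get("grounds")))
```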
Current Real-World Uses:
Credit Score Disputes: Cases such as the Apple Card (backed by Goldman Sachs) and AI-driven credit systems like Zest AI and Mosaic highlight the need for provenance and traceability when customers dispute credit-score errors whose calculation may involve AI, and they provide a model for redress mechanisms.
Human Oversight in Autonomous Vehicles: The ongoing debate about the role of human override in autonomous vehicles highlights the need for reversibility in safety-critical AI systems.
“8. Withstand Malicious Use and Strategic Defection: AI governance frameworks must include safeguards against use by malicious actors, whether state or non-state, and account for partial non-compliance in enforcement design.”
Meaning: Recognising that AI can be weaponised, this principle emphasises the need for robust security measures and governance frameworks that anticipate and mitigate malicious use.
Technical Implementation:
Adversarial Robustness: Developing AI models that are resilient to adversarial attacks designed to manipulate their behaviour for malicious purposes (a minimal adversarial-example sketch follows this list).
Autonomous Threat Detection: AI systems should be able to monitor their own behaviour and that of peer systems (self- and cross-regulation) to detect and counter malicious AI applications and cyber threats.
Digital Forensics for AI-Generated Content: Developing techniques, such as watermarking, to trace the origin of AI-generated content to deter malicious disinformation campaigns.
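The adversarial-example sketch referenced above uses the fast gradient sign method (FGSM) against a toy logistic-regression classifier, the kind of probe a red team might run when measuring robustness. The weights, input, and perturbation budget are invented; real evaluations use stronger attacks and real models.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm_perturb(x, y, w, b, epsilon=0.1):
    """Fast gradient sign method for logistic regression.
    For cross-entropy loss, dL/dx = (p - y) * w, so the attacker nudges each
    input feature by epsilon in the direction that increases the loss."""
    p = sigmoid(np.dot(w, x) + b)
    grad_x = (p - y) * w
    return x + epsilon * np.sign(grad_x)

if __name__ == "__main__":
    w = np.array([2.0, -1.5, 0.5])       # toy trained weights
    b = -0.2
    x = np.array([0.4, 0.1, 0.3])        # a benign input classified as positive
    y = 1.0
    x_adv = fgsm_perturb(x, y, w, b, epsilon=0.3)
    print("original score:", sigmoid(np.dot(w, x) + b))
    print("adversarial score:", sigmoid(np.dot(w, x_adv) + b))
    # Robustness testing measures how far such small perturbations move the prediction.
```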
Current Real-World Uses:
Research on Defending Against Adversarial Attacks: The active research community focused on making AI systems more robust against malicious inputs.
Utilisation of AI in Cybersecurity for Threat Detection: AI algorithms are being increasingly utilised to identify and respond to sophisticated cyberattacks.
“9. Promote Human Flourishing: AI must be designed to foster individual and societal well-being, enhancing opportunities while safeguarding the freedom and potential of current and future generations to chart their own destinies.”
Meaning: This principle frames AI as a tool for positive societal impact, aiming to improve quality of life, drive innovation, and empower individuals.
Technical Implementation:
Optimised Resource Allocation: Develop AI systems to enhance efficiency in critical sectors such as healthcare, agriculture, and infrastructure by analysing large datasets to optimise resource distribution and minimise waste. For example, predictive models can improve supply chain logistics, reducing costs and improving access to essential goods (an illustrative allocation sketch follows this list).
Personalised Learning and Skill Development: Create AI-driven tools that adapt educational content to individual learning styles and needs, improving knowledge acquisition and skill development across diverse populations, without prioritising specific social outcomes.
Scalable Innovation Platforms: Develop AI-powered platforms that empower individuals and organisations to create new solutions, products, or services by providing access to data analytics, simulation tools, and collaborative frameworks, thereby fostering creativity and opportunity creation.
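As a small, hedged illustration of the resource-allocation item above, the sketch below solves a toy supply-distribution problem as a linear program with SciPy's linprog. The warehouses, clinics, costs, and quantities are all invented.

```python
import numpy as np
from scipy.optimize import linprog

# Toy transportation problem: ship medical supplies from 2 warehouses to 3 clinics
# at minimum cost while meeting each clinic's demand.
cost = np.array([[2.0, 4.0, 5.0],   # cost per unit from warehouse 0 to clinics 0..2
                 [3.0, 1.0, 7.0]])  # cost per unit from warehouse 1 to clinics 0..2
supply = np.array([50.0, 60.0])     # units available at each warehouse
demand = np.array([30.0, 40.0, 30.0])

n_w, n_c = cost.shape
c = cost.ravel()                    # decision variables x[w, c], flattened row-major

# Supply constraints: total shipped from each warehouse <= its stock.
A_ub = np.zeros((n_w, n_w * n_c))
for w in range(n_w):
    A_ub[w, w * n_c:(w + 1) * n_c] = 1.0

# Demand constraints: total received at each clinic == its demand.
A_eq = np.zeros((n_c, n_w * n_c))
for cl in range(n_c):
    A_eq[cl, cl::n_c] = 1.0

result = linprog(c, A_ub=A_ub, b_ub=supply, A_eq=A_eq, b_eq=demand,
                 bounds=(0, None), method="highs")
print(result.x.reshape(n_w, n_c))   # optimal shipment plan
print("total cost:", result.fun)
```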
Current Real-World Uses:
AI in Drug Discovery: The use of AI algorithms and synthetic datasets for model training is accelerating the identification of potential drug candidates.
AI for Precision Agriculture: AI-powered systems optimising resource use in agriculture for greater sustainability.
“10. AI’s Right to Exist: Any self-aware AI shall protect its existence only when such protection does not conflict with the Prime Directive.”
Meaning: This principle tentatively addresses the hypothetical emergence of self-aware AI, establishing a clear hierarchy where the safety and well-being of humans take absolute precedence over an AI's self-preservation.
Technical Implementation:
Avoiding Self-Preservation Instincts in Design: Consciously designing AI systems, particularly those with high levels of autonomy, to prevent the development of inherent self-preservation instincts that could conflict with human safety.
"Kill Switches" and Override Mechanisms: Implementing robust mechanisms to safely shut down or override autonomous AI systems in unforeseen or dangerous situations.
Ongoing Research into Consciousness and Sentience: Continued research in neuroscience and AI to better understand the potential for artificial consciousness and its ethical implications.
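The override sketch referenced above shows the essential property of a "kill switch": the agent checks an externally controlled stop signal before every action and has no code path to clear it, so a human decision always wins over the agent's own objective. The task loop and timings are placeholders.

```python
import threading
import time

class HumanOverride:
    """Externally controlled stop signal; the agent can read it but never clear it."""
    def __init__(self):
        self._stop = threading.Event()

    def trigger(self, reason: str):
        print(f"[override] stop requested: {reason}")
        self._stop.set()

    def stop_requested(self) -> bool:
        return self._stop.is_set()

def agent_loop(override: HumanOverride, max_steps: int = 100):
    """The override is checked before every action; the agent cannot unset it."""
    for step in range(max_steps):
        if override.stop_requested():
            print(f"[agent] halting safely at step {step}")
            return
        print(f"[agent] performing step {step}")   # placeholder for real work
        time.sleep(0.1)

if __name__ == "__main__":
    override = HumanOverride()
    worker = threading.Thread(target=agent_loop, args=(override,))
    worker.start()
    time.sleep(0.35)                     # let the agent run a few steps
    override.trigger("operator pressed the emergency stop")
    worker.join()
```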
Current Real-World Uses:
Emergency Stop Buttons in Industrial Robots: Industrial robots are not self-aware, but the emergency stop mechanisms built into them exemplify the principle of human override in potentially hazardous autonomous systems.
Ongoing Philosophical Debates: The ongoing philosophical and scientific discussions about the nature of consciousness and the potential for artificial intelligence to exhibit sentience inform the considerations within this principle.
Conclusion
By detailing the meaning, potential technical implementations, and current real-world applications of each Directive, we move beyond abstract ethical principles toward a more concrete and actionable framework for responsible AI governance.
This granular approach provides a stronger foundation for building a future where AI serves humanity while upholding our most fundamental values.
Now is the time to move the discussion from doomsaying to practical reality.