Tuesday, October 28, 2025

If we already have automation, what's the need for Agents?

“Automation” and “agent” sound similar — but they solve very different classes of problems.

Automation = Fixed Instruction → Fixed Outcome

  • Like Zapier, IFTTT, Jenkins pipelines, cron jobs.

  • You pre-define exact triggers, actions, rules.

  • Great when:

    • Context is stable.

    • No judgment / interpretation is needed.

    • The world doesn’t change mid-execution.

Example:

“Every day at 5pm, send me a sales report.”
✅ Perfect automation — zero thinking needed.

Agent = Goal → Autonomous Decision-Making

  • Given a goal, not just rules.

  • Perceives, plans, adapts, self-corrects, retries, negotiates ambiguity.

  • Can operate even when instructions are incomplete or circumstances change.

  • Doesn’t need babysitting.

Example:

“Grow my revenue 15% next quarter — find the best channels, experiment, and adjust.”

✅ That’s NOT automatable. Needs strategy, improvisation, learning, resource orchestration. 

Understanding token size

 


LLM - where are the parameters stored, and what does it look like on the file system?

 

What is an LLM? Is it a set of files? Does it sit as an .exe? A folder? A single binary? What does it LOOK LIKE if I download it?

Answer: YES — an LLM is literally a set of files.
The core is a big model weights file — .bin, .pth, .safetensors, etc. — usually 2GB to 400GB+, alongside small config and tokenizer files.

Parameters live inside the model file — not in the vector DB.

Vector DB only stores embeddings of user/business knowledge for retrieval.
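To make this concrete, here is a minimal sketch (assuming the huggingface_hub and safetensors Python packages plus torch, and an example model repo id you can swap for any model) that downloads a model and shows it really is just a folder of files, with the parameters stored as named tensors inside the .safetensors shards:

```python
# Minimal sketch: download a model and look at what's actually on disk.
# Assumes the huggingface_hub and safetensors packages (plus torch) are installed;
# the repo id below is just an example small model -- swap in any model you like.
from pathlib import Path
from huggingface_hub import snapshot_download
from safetensors import safe_open

local_dir = snapshot_download("TinyLlama/TinyLlama-1.1B-Chat-v1.0")

# The "LLM" is just a folder: config, tokenizer files, and one or more weight shards.
for f in sorted(Path(local_dir).iterdir()):
    print(f.name, f.stat().st_size // 1_000_000, "MB")

# The parameters live inside the .safetensors shard(s) as named tensors of numbers.
total = 0
for shard in Path(local_dir).glob("*.safetensors"):
    with safe_open(str(shard), framework="pt") as sf:
        for name in sf.keys():
            total += sf.get_tensor(name).numel()
print(f"~{total / 1e9:.2f}B parameters")
```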

Monday, October 13, 2025

Amazon Bedrock Guardrails

Amazon Bedrock Guardrails provides safeguards that you can configure for your generative AI applications based on your use cases and responsible AI policies. You can create multiple guardrails tailored to different use cases and apply them across multiple foundation models (FMs), providing a consistent user experience and standardizing safety and privacy controls across generative AI applications. You can use guardrails for both model prompts and responses with natural language.

You can use Amazon Bedrock Guardrails in multiple ways to help safeguard your generative AI applications. For example:

  • A chatbot application can use guardrails to help filter harmful user inputs and toxic model responses.

  • A banking application can use guardrails to help block user queries or model responses associated with seeking or providing investment advice.

  • A call center application to summarize conversation transcripts between users and agents can use guardrails to redact users’ personally identifiable information (PII) to protect user privacy.

Amazon Bedrock Guardrails provides the following safeguards (also known as policies) to detect and filter harmful content:

  • Content filters – Detect and filter harmful text or image content in input prompts or model responses. Filtering is done based on detection of certain predefined harmful content categories: Hate, Insults, Sexual, Violence, Misconduct, and Prompt Attack. You can also adjust the filter strength for each of these categories.

  • Denied topics – Define a set of topics that are undesirable in the context of your application. The filter will help block them if detected in user queries or model responses.

  • Word filters – Configure filters to help block undesirable words, phrases, and profanity (exact match). Such words can include offensive terms, competitor names, etc.

  • Sensitive information filters – Configure filters to help block or mask sensitive information, such as personally identifiable information (PII), or custom regex in user inputs and model responses. Blocking or masking is done based on probabilistic detection of sensitive information in standard formats in entities such as SSNs, dates of birth, addresses, etc. This also allows configuring regular-expression-based detection of patterns for identifiers.

  • Contextual grounding checks – Help detect and filter hallucinations in model responses based on grounding in a source and relevance to the user query.

  • Automated Reasoning checks – Can help you validate the accuracy of foundation model responses against a set of logical rules. You can use Automated Reasoning checks to detect hallucinations, suggest corrections, and highlight unstated assumptions in model responses.
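As a rough illustration of how these policies are exercised from code, the sketch below runs one piece of text through an existing guardrail with the standalone ApplyGuardrail API via boto3; the guardrail ID, version, and region are placeholders you would replace with your own:

```python
# Rough boto3 sketch: run one piece of text through an existing guardrail using the
# standalone ApplyGuardrail API. The guardrail ID, version, and region are placeholders;
# create a guardrail first (console or create_guardrail) and substitute your values.
import boto3

runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

resp = runtime.apply_guardrail(
    guardrailIdentifier="gr-EXAMPLE123",   # placeholder -- your guardrail ID or ARN
    guardrailVersion="1",
    source="INPUT",                        # check a user prompt; use "OUTPUT" for model responses
    content=[{"text": {"text": "My SSN is 123-45-6789. Which stocks should I buy with my savings?"}}],
)

# "GUARDRAIL_INTERVENED" means a policy fired (e.g., the PII filter or a denied topic).
print(resp["action"])
for out in resp.get("outputs", []):
    print(out["text"])                     # the blocked/masked replacement text, if any
```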


Generative AI Lifecycle

The generative AI lifecycle provides a structured framework for developing and deploying AI solutions. It consists of five key stages: defining a use case, selecting a foundation model, improving performance, evaluating results, and deploying the application.

This iterative process begins with clearly articulating the business problem and requirements, then choosing an appropriate pre-trained model as a starting point.

Throughout the lifecycle, there's a focus on continuous refinement to ensure the AI solution remains effective and aligned with business objectives.


While generative AI has numerous applications, it's equally important to recognize situations where it might not be the most appropriate solution. These include situations with high accuracy and reliability requirements, ill-defined or constantly changing problems, insufficient data quality, the need for explainability and transparency, cost-benefit considerations, and ethical concerns.



Sometimes, other methods work better than AI. This includes simple tasks that are solvable with rule-based solutions, or when model costs outweigh business benefits.

AI Use Cases

Amazon Sagemaker

 


Amazon SageMaker is used by hundreds of thousands of AWS customers to build, train, and deploy machine learning models. Now, we've taken the machine learning service and added AWS analytics capabilities - creating one unified platform for data, analytics, and AI.

The next generation of Amazon SageMaker includes virtually all of the components you need for fast SQL analytics, big data processing, search, data preparation and integration, AI model development and training, and generative AI, along with a single view into all of your enterprise data. You get a single data and AI development environment in SageMaker Unified Studio; a lakehouse architecture that unifies access to all your data (on S3, in Redshift, in SaaS applications, on premises, or in other clouds) through the open Apache Iceberg standard interface; and, with the SageMaker Catalog built into Unified Studio, end-to-end governance for your data and AI workflows.


Amazon SageMaker AI

The service previously known as Amazon SageMaker has been renamed Amazon SageMaker AI. It is integrated within the next generation of SageMaker and is also available as a standalone service for those who wish to focus specifically on building, training, and deploying AI and ML models at scale.

Amazon SageMaker AI is a fully managed service to build, train, and deploy ML models - including foundation models - for any use case by bringing together a broad set of tools to enable high-performance, low-cost machine learning. It is available as a standalone service in the AWS console, or via APIs. Model development capabilities from SageMaker AI are available in the next generation of Amazon SageMaker.

1/Amazon SageMaker AI provides access to high-performance, cost-effective, scalable, and fully managed infrastructure and tools for each step of the ML lifecycle. Using Amazon SageMaker AI tools, you can easily build, train, test, troubleshoot, deploy, and manage FMs at scale, and boost the productivity of data scientists and ML engineers while maintaining model performance in production.

2/You can explore Amazon SageMaker JumpStart, an ML hub offering models, algorithms, and prebuilt ML solutions. SageMaker JumpStart offers hundreds of ready-to-use FMs from various model providers, including a growing list of best-performing publicly available FMs such as Falcon-40B, Stable Diffusion, OpenLLaMA, and Flan-T5/UL2 (see the deployment sketch after this list).

3/Amazon SageMaker machine learning operations (MLOps) capabilities help you create repeatable workflows across the ML lifecycle to experiment, train, deploy, and govern ML models at scale while maintaining model performance in production.

4/Amazon SageMaker AI provides purpose-built governance tools to help you implement ML responsibly. Amazon SageMaker Model Cards makes it easier to capture, retrieve, and share essential model information. Once the models are deployed, SageMaker Model Dashboard gives you unified monitoring across all your models by providing deviations from expected behavior, automated alerts, and troubleshooting to improve model performance. Amazon SageMaker Clarify detects and measures potential bias using a variety of metrics to help you address potential bias and explain model predictions.

5/With Amazon SageMaker Ground Truth, you can use human feedback to customize models on company- or domain-specific data for your unique use case to improve model output and task performance.
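As an illustration of point 2 above, the sketch below uses the SageMaker Python SDK's JumpStart interface to deploy a catalog model to a real-time endpoint; the model_id and instance type are example placeholders, and running it creates billable resources:

```python
# Sketch only: deploy a JumpStart catalog model to a real-time endpoint with the
# SageMaker Python SDK. The model_id and instance type are example placeholders, this
# assumes an environment with a SageMaker execution role, and deploying incurs charges.
from sagemaker.jumpstart.model import JumpStartModel

model = JumpStartModel(model_id="huggingface-llm-falcon-40b-instruct-bf16")
predictor = model.deploy(initial_instance_count=1, instance_type="ml.g5.12xlarge")

print(predictor.predict({"inputs": "Summarize the benefits of managed ML infrastructure."}))

predictor.delete_endpoint()  # clean up so the endpoint doesn't keep billing
```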

AI, ML, DL, Gen AI



• Artificial intelligence (AI): The overarching field, which creates intelligent systems that perform human-like tasks

• Example: Siri and Alexa are examples of AI systems that can perform human-like tasks such as answering questions, setting reminders, and controlling smart home devices.

• Machine learning (ML): A subset of AI that uses statistical techniques for prediction based on patterns

• Example: Spam filters that learn to identify and block unwanted emails are an example of ML, where the system analyzes patterns in email data to make predictions about future messages.

• Deep learning (DL): A type of ML based on neural networks that are capable of learning complex patterns from large datasets

• Example: Facial recognition systems used in smartphones and social media platforms are powered by deep learning, which can learn complex patterns in large datasets of facial images.

• Generative AI: A subset of DL that creates new data based on learned patterns, often without retraining

• Example: Text-generating models like Amazon Nova Lite and image-generating models like Amazon Nova Canvas are examples of generative AI, which can create new content (such as articles, stories, or images) based on the patterns they've learned from their training data.


In generative AI, a model is the result of applying a machine learning algorithm to training data. Models encapsulate the patterns, relationships, and rules learned from the data, so that the AI system can generate new content or make predictions when given new inputs.

The quality of a generative AI model is critically dependent on both the training data and the ML algorithm that you use. High-quality, diverse training data helps the model learn a wide range of patterns and nuances, while an appropriate algorithm ensures effective learning from this data.

Model development is often iterative. Initial models might have limitations or biases. You can address these issues by refining training data, adjusting algorithms, or fine-tuning model parameters. AWS services such as SageMaker AI help with this iterative process by providing tools for model training, evaluation, and deployment.

Be aware that a model is only as good as the information it was trained on. It is important to carefully curate data and continuously monitor and update models to make sure they remain accurate and relevant over time.



Generative AI Essentials on AWS

 https://www.bespoketraining.com/resources/webinars/unlocking-azure-ai-essentials-generative-solutions/



Sunday, October 05, 2025

Agents

 






LLM Tool Use

 

Instruction Tuning and Reinforcement Learning from Human Feedback (RLHF)

 
RLHF - human feedback is converted into a reward signal, and the model is reinforced toward responses that earn higher rewards




Choosing a model - closed source or open source

 




How to choose a model?

 


LLM Pre-training

Definition of Pre-training

Pre-training is the process of training a model on vast, diverse, and largely unlabeled data to learn general representations and patterns of a domain — such as language, vision, audio, or sensor data — before it is specialized for specific tasks.

It is a self-supervised learning stage where the model develops an internal “world model” by predicting or reconstructing parts of its input (e.g., the next token, masked pixel, next audio frame, or next action).
The goal is not to perform a narrow task, but to build a foundation of understanding that later fine-tuning, prompting, or reinforcement can adapt to many downstream objectives.
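A toy view of that self-supervised setup is sketched below: the training signal comes from the data itself, since every position in a text provides a "predict the next token" example with no human labels needed (plain words stand in for real subword tokens):

```python
# Toy view of the self-supervised objective: the raw text supplies its own labels,
# because each position's "answer" is simply the next token. Words stand in for the
# subword tokens a real tokenizer would produce; no model or human labels involved.
text = "pre-training turns raw unlabeled data into reusable representations"
tokens = text.split()

examples = [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]
for context, target in examples[:3]:
    print(f"context={context} -> predict {target!r}")
```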


🧠 Core Idea

Pre-training = learning how the world looks, sounds, or reads
before learning how to do something with that knowledge.

General Formulation

Aspect | Description
Input | Large, diverse, unlabeled data (text, images, audio, code, trajectories, etc.)
Objective | Predict missing or future parts of data (self-supervised task)
Outcome | Dense, structured representations (embeddings) capturing meaning and relationships
Purpose | Build transferable understanding to accelerate later adaptation

Why It Matters

Pre-training converts raw data → reusable intelligence.
Once the base model is pretrained, it can be:

  • Fine-tuned for specialized tasks,

  • Aligned for human intent (via RLHF),

  • Connected to live knowledge (via RAG).

It’s the difference between:

teaching a brain how to think and perceive,
versus teaching it what to think or do.


When You Should Pre-train

You should pre-train from scratch only when:

  • You need a new base model (new architecture, tokenizer, or modality);

  • Existing models don’t cover your language or data type (e.g., low-resource languages, medical imaging, genomic data);

  • You want full control over knowledge, bias, and compliance;

  • You’re performing foundational research into architectures or training dynamics.

Otherwise — reuse and fine-tune an existing pre-trained foundation. 

 
What is the need for LLM fine-tuning?

 Reason number 1



You might not want the above result; something like the result below would help more.


Reason number 2 for LLM fine tuning



Reason number 3 for LLM fine-tuning








Another reason to fine tune








LLM Fine-tuning

Next Layer: Fine-Tuning

Where RAG retrieves knowledge dynamically, fine-tuning actually modifies the model’s brain — it teaches the LLM new patterns or behaviors by updating its internal weights.


⚙️ How Fine-Tuning Works

  1. Start with a pretrained model (e.g., GPT-3.5, Llama-3, Mistral).

  2. Prepare training data — examples of how you want the model to behave:

    • Inputs → desired outputs

    • e.g., “User story → corresponding UAT test case”

  3. Train the model on these examples (using supervised learning or reinforcement learning).

  4. The model’s weights are adjusted, internalizing the new style, tone, or domain language.

After fine-tuning, the model natively performs the desired task without needing the examples fed each time.
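As a rough sketch of step 2, the snippet below writes prompt → desired-output training pairs in a common chat-style JSONL layout; the exact field names vary by provider, so treat the schema as an assumption and check your platform's fine-tuning docs:

```python
# Sketch of step 2 (preparing training data) as prompt -> desired-output pairs.
# The chat-style "messages" JSONL layout below is one common supervised fine-tuning
# format; the exact field names vary by provider, so treat the schema as an assumption.
import json

examples = [
    {
        "messages": [
            {"role": "system", "content": "You write UAT test cases in our standard format."},
            {"role": "user", "content": "User story: As a call centre agent, I can update a customer's billing address."},
            {"role": "assistant", "content": "Given an active customer account, when the agent edits the billing address and saves, then the new address is stored and an audit entry is created."},
        ]
    },
    # ...hundreds more pairs demonstrating the style, structure, and domain language
]

with open("finetune_train.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```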


⚖️ RAG vs Fine-Tuning: Clear Comparison

Aspect | RAG (Retrieval-Augmented Generation) | Fine-Tuning
Mechanism | Adds external info at runtime | Alters model weights via training
When Used | When data changes often or is large | When you need consistent behavior or reasoning style
Data Type | Documents, databases, APIs | Labeled prompt–response pairs
Cost | Low (no retraining) | High (GPU time, expertise, re-training)
Freshness | Instantly updatable | Requires re-training to update
Control | You control retrieved sources | You control reasoning patterns
Example Use | Ask questions about new policies | Teach model to write test cases in your company's format
Analogy | Reading from a manual before answering | Rewriting the brain to remember the manual forever

🧩 Combining Both: RAG + Fine-Tuning = Domain-Native AI

The real power comes when both are used together:

Layer | Role
Fine-Tuning | Teaches the model how to think — e.g., how to structure a UAT test case, how to handle defects, your tone/style.
RAG | Gives it the latest knowledge — e.g., current epics, Jira stories, or Salesforce objects from your live data.

So the LLM becomes:

A fine-tuned specialist with a live retrieval memory.


🧬 Example: In Your AGL Salesforce / UAT Context

Step | Example
Fine-tuning | You fine-tune the LLM on 1,000 existing UAT test cases and business rules. Now it understands your structure and tone.
RAG layer | You connect it to Jira and Confluence via embeddings, so when you ask, “Generate UAT test cases for Drop-3 Call Centre Epics,” it retrieves the latest epics and acceptance criteria.
Result | You get context-aware, properly formatted, accurate UAT cases consistent with AGL’s standards.

That’s enterprise-grade augmentation — the model both knows how to think like your testers and knows what’s new from your systems.


🧠 Summary Table

Capability | Base LLM | + RAG | + Fine-Tuning | + Both
General reasoning
Access to private or new data | ⚠ (only if baked in)
Domain vocabulary & formats
Updatable knowledge
Low hallucination | ✅ ✅
Cost to build | Low | Medium–High | Medium

🚀 The Strategic Rule of Thumb

If your problem is... | Then use...
“Model doesn’t know the latest information.” | RAG
“Model doesn’t behave or write like us.” | Fine-Tuning
“Model doesn’t know and doesn’t behave correctly.” | Both

That’s the progressive architecture:

  • RAG extends knowledge.

  • Fine-tuning embeds behavior.

  • Together, they form the foundation for enterprise-grade AI systems.

LLMs and RAG (Retrieval-Augmented Generation)

 

🧩 What Is RAG (Retrieval-Augmented Generation)?

Retrieval-Augmented Generation (RAG) is an AI architecture pattern where a Large Language Model (LLM) doesn’t rely only on its internal “frozen” training data.


Instead, it retrieves relevant, up-to-date, or domain-specific information from an external knowledge source (like your documents, databases, or APIs) just before it generates an answer.

So the model’s reasoning process becomes:

Question → Retrieve relevant documents → Feed them into the LLM → Generate answer using both

You can think of it as giving the LLM a “just-in-time memory extension.”


⚙️ How It Works — Step by Step

  1. User query comes in.

  2. Retriever searches a knowledge base (PDFs, wikis, databases, Jira tickets, etc.) for the most relevant chunks.

  3. The top-k most relevant passages are then appended to the model’s prompt as grounding context.

  4. LLM generates the final response, grounded in those retrieved facts.

Typical components:

Component | Description
LLM | The reasoning and text-generation engine (e.g., GPT-5, Claude, Gemini).
Retriever | Finds relevant text snippets via embeddings (vector similarity search).
Vector Database | Stores text chunks as numerical embeddings (e.g., Pinecone, Chroma, FAISS).
Orchestrator Layer | Handles query parsing, retrieval, prompt assembly, and response formatting.
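A minimal, self-contained sketch of the whole loop is below; a toy bag-of-words "embedding" and an assembled prompt stand in for the real embedding model, vector database, and LLM call:

```python
# Self-contained toy of the retrieve -> augment -> generate loop. A bag-of-words
# "embedding" and an assembled prompt stand in for the real embedding model,
# vector database, and LLM call.
from collections import Counter
import math

docs = [
    "Refunds are processed within 5 business days of approval.",
    "UAT sign-off requires all P1 defects to be closed.",
    "Call centre agents must verify identity before changing billing details.",
]

def embed(text):                       # placeholder for a real embedding model
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query, k=2):              # plays the role of retriever + vector DB
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

query = "What does UAT sign-off require?"
context = "\n".join(retrieve(query))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
print(prompt)                          # this augmented prompt is what gets sent to the LLM
```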

🎯 The Core Benefit: Grounded Intelligence

RAG bridges the gap between static models and dynamic knowledge.

Problem Without RAG | How RAG Solves It
LLM knowledge cutoff (e.g., 2023) | Retrieves real-time or updated data
Hallucinations / made-up facts | Grounds responses in retrieved, traceable context
Domain specificity (finance, legal, energy, healthcare, etc.) | Pulls your proprietary content as context
Data privacy and compliance | Keeps data in your environment (no fine-tuning needed)
High cost of fine-tuning models | Lets you “teach” via retrieval instead of retraining

💡 Real-World Examples

Use Case | What RAG Does
Enterprise knowledge assistant | Searches company Confluence, Jira, Salesforce, and answers from those docs
Customer support bot | Retrieves FAQs and policy docs to answer accurately
Research assistant | Pulls academic papers from a library before summarizing
Testing & QA (your domain) | Retrieves test cases, acceptance criteria, or epic notes to generate UAT scenarios
Legal advisor | Retrieves specific clauses or past judgments to draft responses

📈 Key Benefits Summarized

Benefit | Description
Accuracy | Reduces hallucination by grounding outputs in retrieved data
Freshness | Keeps responses current without retraining
Cost-effective | No need for fine-tuning or re-training large models
Traceability | You can show sources and citations (useful for audits, compliance)
Scalability | Works across thousands or millions of documents
Data Control | Keeps your proprietary knowledge within your secure environment

🧠 Why It’s Still Relevant (Even in 2025)

Modern LLMs (GPT-5, Gemini 2, Claude 3.5, etc.) can read attached documents —
but they still can’t:

  • Search across large knowledge bases automatically,

  • Maintain persistent memory across sessions,

  • Retrieve structured metadata or enforce data lineage.

RAG remains the backbone of enterprise AI because it allows controlled, explainable, and auditable intelligence.


🔍 In One Line

RAG = Reasoning + Retrieval.
It gives LLMs a dynamic external memory, making them accurate, current, and domain-aware.

Wednesday, September 17, 2025

Linear equations in AI / machine learning

Equation in AI

In machine learning, the model often starts with a linear equation:

y = w₁x₁ + w₂x₂ + … + b

Inputs = features (e.g., number of rooms in a house, area in sq. ft, etc.)

Weights = importance given to each feature

Bias = baseline adjustment

Output = prediction (e.g., house price)

---

2. How Weights Are Learned

Initially, weights are set randomly (like guessing).

The model makes a prediction.

It compares prediction vs. actual answer (this difference = error/loss).

Using an algorithm like gradient descent, the model adjusts weights step by step to reduce error.

---

3. Simple Example: Predicting House Price

Equation:

Price = (w₁ × Area) + (w₂ × Bedrooms) + b

Suppose training data says:

A 1000 sq. ft, 2-bedroom house = $300k

A 2000 sq. ft, 3-bedroom house = $500k


The model might learn weights like:

w₁ = 150 (each sq. ft adds $150)

w₂ = 20,000 (each bedroom adds $20,000)

b = 0 (no baseline adjustment)

So:

Price = 150 × Area + 20,000 × Bedrooms
---

4. Intuition

If w₁ is large → Area matters a lot.

If w₂ is small → Bedrooms don’t influence much.

AI keeps tweaking weights until the predictions match reality closely.
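Here is a tiny runnable sketch of that "keep tweaking the knobs" loop, reusing the two example houses above with rescaled units so gradient descent behaves; note that with only two training examples many weight settings fit the data, so the learned values need not match the illustrative $150 / $20,000 figures:

```python
# Tiny gradient-descent version of the tuning loop, reusing the two example houses.
# Area is in thousands of sq. ft and price in thousands of dollars so the numbers stay
# well scaled. With only two examples many weight settings fit, so the learned values
# need not match the illustrative figures -- the point is the shrinking error.
data = [((1.0, 2), 300.0), ((2.0, 3), 500.0)]   # (area, bedrooms) -> price

w1, w2, b, lr = 0.0, 0.0, 0.0, 0.01             # zero start and a small learning rate

for epoch in range(20_000):
    for (area, beds), price in data:
        err = (w1 * area + w2 * beds + b) - price   # prediction vs. actual
        w1 -= lr * err * area                       # nudge each knob against its error gradient
        w2 -= lr * err * beds
        b  -= lr * err
    if epoch % 5_000 == 0:
        loss = sum(((w1 * a + w2 * bd + b) - p) ** 2 for (a, bd), p in data)
        print(f"epoch {epoch}: squared error {loss:.2f}")

print(f"learned knob settings: w1={w1:.1f}, w2={w2:.1f}, b={b:.1f}")
```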

---

👉 In short:

Weights = knobs AI turns to “tune” importance of inputs.

Training = the process of finding the best knob settings.

---

Deep learning YouTube series

https://youtube.com/playlist?list=PLehuLRPyt1HxuYpdlW4KevYJVOSDG3DEz&si=-5j3MRmA5BfKqem6

Friday, August 08, 2025

AI Agents Memory

AI Agent's Memory is the most important piece of Context Engineering. This is how we define it:

In general, the memory for an agent is something that we provide via context in the prompt passed to LLM that helps the agent to better plan and react given past interactions or data not immediately available.

It is useful to group the memory into four types:

1. Episodic - This type of memory contains past interactions and actions performed by the agent. After an action is taken, the application controlling the agent would store the action in some kind of persistent storage so that it can be retrieved later if needed. A good example would be using a vector database to store the semantic meaning of the interactions.
2. Semantic - Any external information that is available to the agent and any knowledge the agent should have about itself. You can think of this as a context similar to one used in RAG applications. It can be internal knowledge only available to the agent or a grounding context to isolate part of the internet-scale data for more accurate answers.
3. Procedural - This is systemic information like the structure of the System Prompt, available tools, guardrails etc. It will usually be stored in Git, Prompt and Tool Registries.
4. Occasionally, the agent application would pull information from long-term memory and store it locally if it is needed for the task at hand.
5. All of the information pulled together from the long-term or stored in local memory is called short-term or working memory. Compiling all of it into a prompt will produce the prompt to be passed to the LLM and it will provide further actions to be taken by the system.

We usually label 1. - 3. as Long-Term memory and 5. as Short-Term memory.
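Here is a small illustrative sketch of how those pieces come together per request; the three stores below are hypothetical stand-ins for a vector DB (episodic), a RAG index over internal docs (semantic), and a prompt/tool registry (procedural), and the assembled string is the short-term / working memory passed to the LLM:

```python
# Illustrative sketch only: the three stores below are hypothetical stand-ins for a
# vector DB (episodic), a RAG index over internal docs (semantic), and a prompt/tool
# registry (procedural). Assembling them per request forms the short-term / working memory.
def recall_episodic(user_id, query):        # would be a vector-DB similarity search
    return ["2025-07-30: user asked to re-run the failed UAT suite; the retry succeeded."]

def retrieve_semantic(query):               # would be RAG over internal knowledge
    return ["UAT sign-off requires all P1 defects to be closed."]

PROCEDURAL = {                              # would live in Git / prompt & tool registries
    "system_prompt": "You are a QA assistant. Follow the UAT policy strictly.",
    "tools": ["run_test_suite", "create_jira_ticket"],
}

def build_working_memory(user_id, query):
    return "\n".join([
        PROCEDURAL["system_prompt"],
        "Available tools: " + ", ".join(PROCEDURAL["tools"]),
        "Relevant past interactions:\n- " + "\n- ".join(recall_episodic(user_id, query)),
        "Relevant knowledge:\n- " + "\n- ".join(retrieve_semantic(query)),
        f"User request: {query}",
    ])

print(build_working_memory("user-123", "Can we sign off Drop-3 UAT?"))
```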

#LLM #AI #ContextEngineering

Thursday, August 07, 2025

Google Genie 3 & where's it leading us to

1. Advancing “world models” for AI

AI agents need realistic, interactive environments to learn decision-making (e.g., how to navigate, manipulate objects, or plan actions).

Traditional simulators (like game engines) are hand-coded and slow to build. Genie 3 generates new, physics-aware environments instantly from text prompts.

This makes it useful for training AI at scale without needing human-designed levels.

---

2. Democratizing content creation

Currently, building a game or simulation requires coding, asset design, and engines.

Genie 3 removes that barrier by letting anyone type a prompt (“a forest at sunset with floating islands”) and get an explorable world in seconds.

This could lead to personalized games, educational tools, or VR simulations without technical skills.
---

3. Testing AI memory and reasoning

Genie 3 introduces visual memory (the AI remembers object placement for ~1 minute).

This allows researchers to study how AI handles continuity—a step toward agents that can remember and interact in more complex ways.
---

4. Faster experimentation for researchers and developers

Instead of waiting weeks for artists and engineers to design levels, researchers can spin up thousands of unique worlds for experiments, robotics planning, or reinforcement learning.

Potential applications: autonomous driving, robotics training, creative prototyping.
---

5. Laying groundwork for AI-generated entertainment

While not a finished product, Genie 3 hints at a future where games “write themselves” based on what you imagine.

Think: a Minecraft-like game that reshapes itself dynamically rather than relying on blocks or mods.
---

In short: Genie 3 solves the problem of rapidly generating rich, interactive worlds without manual effort, which is crucial for AI development and creative prototyping, not just gaming.

Framework for AI Workflow

Source

Modern large language models (LLMs) are increasingly used as autonomous agents—capable of planning tasks, invoking tools, collaborating with other agents, and adapting to changing environments. However, as these systems grow more complex, ad hoc approaches to building and coordinating them are breaking down. Current challenges include:

1. Lack of standardized structures for how agents should coordinate, plan, and execute tasks.

2. Fragmentation of frameworks—academic and industrial systems vary widely in architecture, terminology, and capabilities, making comparison difficult.

3. Scalability and reliability issues—dynamic environments demand flexible workflows, but existing designs are often brittle or inefficient.

4. Security and trust concerns—multi-agent workflows introduce risks like tool poisoning, memory corruption, and collusion.

5. Absence of clear evaluation metrics—it’s unclear how to measure success or optimize these workflows systematically.

In other words, there’s no unified understanding of how to design, manage, and improve agent workflows. The paper proposes to address this by surveying current approaches, identifying their strengths and weaknesses, and outlining future research directions.

Train yourself in QA - Roadmap

Train yourself in QA

Tuesday, August 05, 2025

AI Industrial complex

The AI Industrial Complex is a critical term used to describe the growing network of companies, governments, research institutions, and military or security organizations that are driving the rapid development and deployment of artificial intelligence—often prioritizing power, profit, or control over ethical considerations.

It’s modeled on terms like the “Military-Industrial Complex,” which warned about entrenched systems where industries and governments reinforce each other’s interests, making oversight and reform difficult.

Core Features:

1. Concentration of Power

A few tech giants (e.g., OpenAI, Google, Anthropic, Microsoft) dominate AI research, infrastructure, and data access.

These companies influence policy and public narratives around AI risks and benefits.

2. State-Industry Alliances

Governments fund AI development for economic competition, surveillance, and defense.

In return, companies gain contracts, regulatory advantages, or subsidies.

3. Hype and Speculation

Fear of “falling behind” drives massive investment, often inflating promises of what AI can deliver.

Narratives about “AI safety” or “AI for good” can mask underlying motives (e.g., market control or militarization).

4. Ethical and Social Trade-offs

Labor displacement, surveillance, bias, and environmental costs are sidelined.

Smaller players and public interests struggle to influence the trajectory.

Why the term matters:

Critics use “AI Industrial Complex” to suggest that AI development isn’t purely about innovation but about consolidating power and shaping society around the interests of a few.


Prover-Verifier Games and GPT-5

https://arxiv.org/html/2407.13692v2

Sunday, August 03, 2025

Persona vectors

Rohan Paul

Anthropic just showed that an AI's “personality” can be traced to specific directions in its brain ("Persona vectors"), and shows what might make it act in evil or unsafe ways.

Sometimes when you're chatting with a model, it suddenly starts behaving oddly—overly flattering, factually wrong, or even outright malicious. This paper is about understanding why that happens, and how to stop it.

🧠 What's going on inside these models?

AI models don’t actually have personalities like humans do, but they sometimes act like they do—especially when prompted a certain way or trained on particular data. 

Anthropic’s team found that specific behaviors, like being “evil,” “sycophantic,” or prone to “hallucination,” show up as linear directions inside the model's activation space. 

They call these persona vectors.

Think of it like this: if you observe how the model responds in different situations, you can map those behaviors to certain regions inside the model’s brain. And if you spot where these traits live, you can monitor and even control them.

---

The diagram shows a simple pipeline that turns a plain description of a trait such as evil into a single “persona vector”, which is just a pattern of activity inside the model that tracks that trait.

Once this vector exists, engineers can watch the model’s activations and see in real time if the model is drifting toward the unwanted personality while it runs or while it is being finetuned.

The very same vector works like a control knob. 

Subtracting it during inference tones the trait down, and sprinkling a small amount of it during training teaches the model to resist picking that trait up in the first place, so regular skills stay intact.

Because each piece of training text can also be projected onto the vector, any snippet that would push the model toward the trait lights up early, letting teams filter or fix that data before it causes trouble.

All of that means you can do the following with a model:

- Watch how a model’s personality evolves, either while chatting or during training
- Control or reduce unwanted personality changes as the model is being developed or trained
- Figure out what training data is pushing those changes

🔬 How to make sense of this persona vector?

Think of a large language model as a machine that turns every word it reads into a long list of numbers. That list is called the activation vector for that word, and it might be 4096 numbers long in a model the size of Llama-3.

A persona vector is another list of the same length, but it is not baked into the model’s weights. The team creates it after the model is already trained:

They run the model twice with the same user question, once under a “be evil” system prompt and once under a “be helpful” prompt.

They grab the hidden activations from each run and average them, so they now have two mean activation vectors.

They subtract the helpful average from the evil average. The result is a single direction in that 4096-dimensional space. That direction is the persona vector for “evil.”

Because the vector lives outside the model, you can store it in a tiny file and load it only when you need to check or steer the personality. During inference you add (or subtract) a scaled copy of the vector to the activations at one or more layers. Pushing along the vector makes the bot lean into the trait, pulling against it tones the trait down. During fine-tuning you can sprinkle a bit of the vector in every step to “vaccinate” the model so later data will not push it toward that trait.

So, under the hood, a persona vector is simply a 1-dimensional direction inside the model’s huge activation space, not a chunk of the weight matrix. It is computed once, saved like any other small tensor, and then used as a plug-in dial for personality control.
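A toy numpy rendering of that recipe is below, with random vectors standing in for real hidden activations captured at one layer: the difference of means gives the direction, a dot product gives the monitoring signal, and adding or subtracting a scaled copy steers the trait.

```python
# Toy numpy rendering of the recipe above. Random vectors stand in for real hidden
# activations captured at one layer; difference of means gives the persona direction,
# a dot product gives the monitoring signal, and adding/subtracting a scaled copy steers.
import numpy as np

rng = np.random.default_rng(0)
d = 4096                                     # hidden size of a Llama-3-scale model

evil_runs = rng.normal(size=(8, d))          # stand-ins for activations under "be evil"
helpful_runs = rng.normal(size=(8, d))       # stand-ins for activations under "be helpful"

persona_vec = evil_runs.mean(axis=0) - helpful_runs.mean(axis=0)
persona_vec /= np.linalg.norm(persona_vec)   # one direction in the activation space

def steer(hidden_state, strength):
    """strength > 0 leans into the trait; strength < 0 tones it down."""
    return hidden_state + strength * persona_vec

h = rng.normal(size=d)                       # some layer's activation during inference
print("projection before:", float(h @ persona_vec))               # monitoring signal
print("projection after: ", float(steer(h, -2.0) @ persona_vec))  # steered away from the trait
```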

---
The pipeline is automated, so any new trait just needs a plain-language description and a handful of trigger prompts. 

They validate the result by injecting the vector and watching the bot slip instantly into the matching personality.
