https://bbycroft.net/llm
Scraps from various sources and my own writings on Generative AI, AGI, Digital, Disruption, Agile, Scrum, Kanban, Scaled Agile, XP, TDD, FDD, DevOps, Design Thinking, etc.
Tuesday, December 09, 2025
What AI does before it writes code for you.
Before your AI writes a single line of Python, it takes 15 hidden mental steps.
Researchers just mapped the entire "thought process"—and it's wild.
Here's the complete breakdown 🧠👇
🗂️ PHASE 1: REQUIREMENTS GATHERING
The AI isn't just reading your prompt. It's:
TSK - Identifying the core task
CTX - Understanding code context (variables, functions, types)
CST - Spotting constraints (performance, recursion, input limits)
🧩 PHASE 2: SOLUTION PLANNING
Now it strategizes:
KRL - Recalls libraries/patterns from training data
CFL - Constructs control flow (loops, branches, logic)
CMP - Compares alternative approaches
AMB - Flags ambiguous/missing info
This is where smart prompts = better code.
⚙️ PHASE 3: IMPLEMENTATION
Two substeps:
SCG - Scaffold Code Generation (rough draft/pseudocode)
CCG - Complete Code Generation (final output)
Fun fact: 30% of AI responses skip this phase entirely in the reasoning trace.
🔍 PHASE 4: REFLECTION
The AI reviews its work:
UTC - Creates unit tests
ALT - Explores post-hoc alternatives
EGC - Identifies edge cases
FLW - Spots logical flaws
STY - Checks code style
SFA - Self-asserts "this is correct"
Here's the kicker:
Not all 15 steps happen every time.
The study found 5 common "reasoning patterns" (combos of steps).
The MOST successful pattern (FP1)?
TSK→CTX→CST→KRL→CFL→CMP→AMB→SCG→CCG→ALT→EGC→SFA
It's a complete human-like workflow.
But simpler tasks use simpler patterns.
Example: Self-contained functions skip:
❌ Ambiguity recognition (AMB)
❌ Alternative exploration (ALT)
❌ Edge case checks (EGC)
The AI adapts its reasoning depth based on task complexity.
Which step matters MOST for correct code?
📊 Analysis of 1,150 traces shows:
🥇 UTC (Unit Test Creation) - Strongest positive correlation
🥈 CCG (Complete Code) - Necessary for success
🥉 SCG (Scaffold) - Helps catch logic errors early
Which steps HURT performance?
🔻 CST (Constraint ID) - Negative correlation
🔻 AMB (Ambiguity Recognition) - Negative correlation
🔻 CMP (Solution Comparison) - Negative correlation
Why? They signal unclear prompts → bad assumptions → wrong code.
Real-world example:
When tasked with validating IP addresses, Qwen3-14B:
Identified task (TSK)
Recalled regex patterns (KRL)
Planned validation logic (CFL)
Generated test cases (UTC)
Wrote final code (CCG)
Self-asserted correctness (SFA)
Result? ✅ Passed all tests.
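The trace above maps cleanly onto plain Python. A minimal sketch of that workflow — the regex and tests are an illustrative reconstruction, not Qwen3-14B's actual output:

```python
import re

# KRL: a recalled pattern — each IPv4 octet is 0-255, four octets joined by dots
_OCTET = r"(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)"
_IPV4_RE = re.compile(rf"^{_OCTET}(\.{_OCTET}){{3}}$")

def is_valid_ipv4(address: str) -> bool:
    """CFL: validation logic — full-string match against the octet pattern."""
    return bool(_IPV4_RE.match(address))

# UTC: unit tests covering normal and edge cases
assert is_valid_ipv4("192.168.1.1")
assert is_valid_ipv4("0.0.0.0")
assert not is_valid_ipv4("256.1.1.1")   # octet out of range
assert not is_valid_ipv4("1.2.3")       # too few octets
assert not is_valid_ipv4("1.2.3.04")    # leading zeros rejected by this pattern
```

Note how the unit tests (UTC) come before trusting the final code (CCG) — exactly the step the correlation analysis above rewards most.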
Understanding these 15 steps lets you:
✅ Write prompts that trigger the RIGHT reasoning
✅ Spot when AI is stuck in bad patterns
✅ Improve code quality by 10-15%
Carlos E Perez on X
Tuesday, October 28, 2025
If we already have automation, what's the need for Agents?
“Automation” and “agent” sound similar — but they solve very different classes of problems.
Automation = Fixed Instruction → Fixed Outcome
- Like Zapier, IFTTT, Jenkins pipelines, cron jobs.
- You pre-define exact triggers, actions, and rules.
- Great when:
  - Context is stable.
  - No judgment or interpretation is needed.
  - The world doesn’t change mid-execution.
Example:
“Every day at 5pm, send me a sales report.”
✅ Perfect automation — zero thinking needed.
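The 5pm report reduces to exactly two pieces: a fixed trigger and a fixed action. A minimal stdlib sketch (the function names and the report body are illustrative):

```python
from datetime import datetime, time, timedelta

def send_sales_report(now: datetime) -> str:
    # Fixed action: a real pipeline would render and email the report here.
    return f"Sales report sent at {now:%Y-%m-%d %H:%M}"

def seconds_until(target: time, now: datetime) -> float:
    """Fixed trigger: seconds until the next occurrence of `target` (e.g. 17:00)."""
    run_at = datetime.combine(now.date(), target)
    if run_at <= now:
        run_at += timedelta(days=1)  # already past 5pm today -> fire tomorrow
    return (run_at - now).total_seconds()
```

There is no decision anywhere in this code — which is the point. The trigger, the action, and the schedule are all frozen at design time.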
Agent = Goal → Autonomous Decision-Making
- Given a goal, not just rules.
- Perceives, plans, adapts, self-corrects, retries, and negotiates ambiguity.
- Can operate even when instructions are incomplete or circumstances change.
- Doesn’t need babysitting.
Example:
“Grow my revenue 15% next quarter — find the best channels, experiment, and adjust.”
✅ That’s NOT automatable with fixed rules. It needs strategy, improvisation, learning, and resource orchestration.
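The contrast shows up in code, too. A toy perceive→plan→act→adapt loop for the revenue goal — the channels and their simulated lifts are made-up numbers, not a real marketing model:

```python
from dataclasses import dataclass, field

@dataclass
class RevenueAgent:
    """Toy goal-driven loop: the agent is given a target, not a procedure."""
    goal: float                      # target revenue lift, e.g. 0.15 for 15%
    channels: dict = field(default_factory=lambda: {"email": 0.02, "ads": 0.05, "seo": 0.03})
    lift: float = 0.0
    log: list = field(default_factory=list)

    def step(self) -> None:
        # Plan: pick the channel the agent currently believes is best
        best = max(self.channels, key=self.channels.get)
        # Act + perceive: observe the (simulated) lift from that channel
        observed = self.channels[best]
        self.lift += observed
        self.log.append(best)
        # Adapt: diminishing returns — downgrade the belief about the used channel
        self.channels[best] = observed * 0.5

    def run(self, max_steps: int = 20) -> float:
        while self.lift < self.goal and len(self.log) < max_steps:
            self.step()
        return self.lift
```

Unlike the cron job, nothing here is fixed in advance: which channel runs next depends on what the agent has learned so far, and the loop stops when the goal is met, not when a script ends.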
LLM - where are the parameters stored?
What is an LLM, physically? Is it a set of files? An .exe? A folder? A single binary? What does it look like if you download it?
Answer: yes — an LLM is literally a set of files.
- A big model file — like .bin, .pth, or .safetensors — usually 2GB to 400GB+.
- The parameters live inside those model files — not in a vector DB.
✅ A vector DB only stores embeddings of user/business knowledge for retrieval.
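Concretely, a downloaded open-weights model is just a directory. The layout below is a typical hypothetical example (filenames vary by model), and the helper simply lists the weight shards on disk:

```python
from pathlib import Path

# Hypothetical layout of a downloaded open-weights model directory
MODEL_FILES = {
    "config.json": "architecture, hidden size, number of layers",
    "tokenizer.json": "vocabulary and merge rules",
    "model-00001-of-00002.safetensors": "a shard of the weight tensors",
    "model-00002-of-00002.safetensors": "the remaining weight shard",
}

def weight_files(model_dir: Path) -> list[str]:
    """The parameters are just these tensor files on disk — nothing more."""
    return sorted(p.name for p in model_dir.glob("*.safetensors"))
```

Everything the model "knows" from training sits inside those .safetensors shards; the vector DB holds only your own documents' embeddings, added later.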
Thursday, October 23, 2025
Monday, October 13, 2025
Amazon Bedrock Guardrails
Amazon Bedrock Guardrails provides safeguards that you can configure for your generative AI applications based on your use cases and responsible AI policies. You can create multiple guardrails tailored to different use cases and apply them across multiple foundation models (FMs), providing a consistent user experience and standardizing safety and privacy controls across generative AI applications. You can use guardrails for both model prompts and responses with natural language.
You can use Amazon Bedrock Guardrails in multiple ways to help safeguard your generative AI applications. For example:
A chatbot application can use guardrails to help filter harmful user inputs and toxic model responses.
A banking application can use guardrails to help block user queries or model responses associated with seeking or providing investment advice.
A call center application to summarize conversation transcripts between users and agents can use guardrails to redact users’ personally identifiable information (PII) to protect user privacy.
Amazon Bedrock Guardrails provides the following safeguards (also known as policies) to detect and filter harmful content:
Content filters – Detect and filter harmful text or image content in input prompts or model responses. Filtering is done based on detection of certain predefined harmful content categories: Hate, Insults, Sexual, Violence, Misconduct and Prompt Attack. You also can adjust the filter strength for each of these categories.
Denied topics – Define a set of topics that are undesirable in the context of your application. The filter will help block them if detected in user queries or model responses.
Word filters – Configure filters to help block undesirable words, phrases, and profanity (exact match). Such words can include offensive terms, competitor names, etc.
Sensitive information filters – Configure filters to help block or mask sensitive information, such as personally identifiable information (PII), or custom regex patterns in user inputs and model responses. Blocking or masking is done based on probabilistic detection of sensitive information in standard formats in entities such as SSN, date of birth, address, etc. This also allows configuring regular-expression-based detection of patterns for identifiers.
Contextual grounding checks – Help detect and filter hallucinations in model responses based on grounding in a source and relevance to the user query.
Automated Reasoning checks – Can help you validate the accuracy of foundation model responses against a set of logical rules. You can use Automated Reasoning checks to detect hallucinations, suggest corrections, and highlight unstated assumptions in model responses.
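At runtime, a configured guardrail can be applied independently of a model call via the ApplyGuardrail API on the bedrock-runtime client. A minimal sketch — the guardrail ID and version are placeholders you would take from your own account, and AWS credentials are assumed:

```python
def guardrail_content(text: str) -> list[dict]:
    """Shape of the `content` field the ApplyGuardrail API expects."""
    return [{"text": {"text": text}}]

def check_input(text: str, guardrail_id: str, version: str = "DRAFT") -> dict:
    """Screen a user prompt through an existing guardrail before calling the model.

    Assumes boto3, AWS credentials, and a guardrail already created in your account.
    """
    import boto3  # deferred so the payload helper above works without AWS installed
    client = boto3.client("bedrock-runtime")
    return client.apply_guardrail(
        guardrailIdentifier=guardrail_id,
        guardrailVersion=version,
        source="INPUT",          # use "OUTPUT" to screen model responses instead
        content=guardrail_content(text),
    )
```

The response's `action` field indicates whether the guardrail intervened, so the application can block or rewrite the request before it ever reaches the foundation model.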
Generative AI Lifecycle
The generative AI lifecycle provides a structured framework for developing and deploying AI solutions. It consists of five key stages: defining a use case, selecting a foundation model, improving performance, evaluating results, and deploying the application.
This iterative process begins with clearly articulating the business problem and requirements, then choosing an appropriate pre-trained model as a starting point.
Throughout the lifecycle, there's a focus on continuous refinement to ensure the AI solution remains effective and aligned with business objectives.
AI Use Cases
Amazon SageMaker
Amazon SageMaker is used by hundreds of thousands of AWS customers to build, train, and deploy machine learning models. Now, we've taken the machine learning service and added AWS analytics capabilities - creating one unified platform for data, analytics, and AI.
The next generation of Amazon SageMaker includes virtually all of the components you need for fast SQL analytics, big data processing, search, data preparation and integration, AI model development and training, and generative AI - along with a single view into all of your enterprise data. You get a single data and AI development environment with SageMaker Unified Studio; a lakehouse architecture that unifies access to all your data - on S3, in Redshift, in SaaS applications, on-premises, or in other clouds - through the open Apache Iceberg standard interface; and, with the SageMaker Catalog built into Unified Studio, end-to-end governance for your data and AI workflows.
Amazon SageMaker AI
The service previously known as Amazon SageMaker has been renamed Amazon SageMaker AI. It is integrated within the next generation of SageMaker and is also available as a standalone service for those who wish to focus specifically on building, training, and deploying AI and ML models at scale.
Amazon SageMaker AI is a fully managed service to build, train, and deploy ML models - including foundation models - for any use case by bringing together a broad set of tools to enable high-performance, low-cost machine learning. It is available as a standalone service in the AWS console, or via APIs. Model development capabilities from SageMaker AI are available in the next generation of Amazon SageMaker.
1/Amazon SageMaker AI provides access to high-performance, cost-effective, scalable, and fully managed infrastructure and tools for each step of the ML lifecycle. Using Amazon SageMaker AI tools, you can easily build, train, test, troubleshoot, deploy, and manage FMs at scale and boost the productivity of data scientists and ML engineers while maintaining model performance in production.
2/You can explore Amazon SageMaker JumpStart, which is an ML hub offering models, algorithms, and prebuilt ML solutions. SageMaker JumpStart offers hundreds of ready-to-use FMs from various model providers, including a growing list of best-performing publicly available FMs such as Falcon-40B, Stable Diffusion, OpenLLaMA, and Flan-T5/UL2.
3/Amazon SageMaker machine learning operations (MLOps) capabilities help you create repeatable workflows across the ML lifecycle to experiment, train, deploy, and govern ML models at scale while maintaining model performance in production.
4/Amazon SageMaker AI provides purpose-built governance tools to help you implement ML responsibly. Amazon SageMaker Model Cards makes it easier to capture, retrieve, and share essential model information. Once the models are deployed, SageMaker Model Dashboard gives you unified monitoring across all your models by providing deviations from expected behavior, automated alerts, and troubleshooting to improve model performance. Amazon SageMaker Clarify detects and measures potential bias using a variety of metrics to help you address potential bias and explain model predictions.
5/With Amazon SageMaker Ground Truth, you can use human feedback to customize models on company- or domain-specific data for your unique use case to improve model output and task performance.
AI, ML, DL, Gen AI
• Artificial intelligence (AI): The overarching field of AI, which creates intelligent systems that perform human-like tasks
• Example: Siri and Alexa are examples of AI systems that can perform human-like tasks such as answering questions, setting reminders, and controlling smart home devices.
• Machine learning (ML): A subset of AI that uses statistical techniques for prediction based on patterns
• Example: Spam filters that learn to identify and block unwanted emails are an example of ML, where the system analyzes patterns in email data to make predictions about future messages.
• Deep learning (DL): A type of ML based on neural networks that are capable of learning complex patterns from large datasets
• Example: Facial recognition systems used in smartphones and social media platforms are powered by deep learning, which can learn complex patterns in large datasets of facial images.
• Generative AI: A subset of DL that creates new data based on learned patterns, often without retraining
• Example: Text-generating models like Amazon Nova Lite and image-generating models like Amazon Nova Canvas are examples of generative AI, which can create new content (such as articles, stories, or images) based on the patterns they've learned from their training data.
Monday, October 06, 2025
Sunday, October 05, 2025
LLM Pre-training
Definition of Pre-training
Pre-training is the process of training a model on vast, diverse, and largely unlabeled data to learn general representations and patterns of a domain — such as language, vision, audio, or sensor data — before it is specialized for specific tasks.
It is a self-supervised learning stage where the model develops an internal “world model” by predicting or reconstructing parts of its input (e.g., the next token, masked pixel, next audio frame, or next action).
The goal is not to perform a narrow task, but to build a foundation of understanding that later fine-tuning, prompting, or reinforcement can adapt to many downstream objectives.
🧠 Core Idea
Pre-training = learning how the world looks, sounds, or reads —
before learning how to do something with that knowledge.
General Formulation
| Aspect | Description |
|---|---|
| Input | Large, diverse, unlabeled data (text, images, audio, code, trajectories, etc.) |
| Objective | Predict missing or future parts of data (self-supervised task) |
| Outcome | Dense, structured representations (embeddings) capturing meaning and relationships |
| Purpose | Build transferable understanding to accelerate later adaptation |
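The "predict missing or future parts of data" objective in the table can be made concrete for language: from a raw token sequence alone, the training pairs fall out with no human labels at all. A minimal sketch:

```python
def next_token_pairs(tokens: list[int]) -> list[tuple[list[int], int]]:
    """Self-supervised pre-training data: each prefix predicts the next token.

    No annotation is needed — the targets come from the text itself.
    """
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

# A 4-token document yields 3 (prefix -> next token) training examples
pairs = next_token_pairs([17, 5, 2, 9])
assert pairs == [([17], 5), ([17, 5], 2), ([17, 5, 2], 9)]
```

This is why pre-training scales to web-sized corpora: every document is its own labeled dataset.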
Why It Matters
Pre-training converts raw data → reusable intelligence.
Once the base model is pretrained, it can be:
- Fine-tuned for specialized tasks,
- Aligned with human intent (via RLHF),
- Connected to live knowledge (via RAG).
It’s the difference between teaching a brain how to think and perceive, versus teaching it what to think or do.
When You Should Pre-train
You should pre-train from scratch only when:
- You need a new base model (new architecture, tokenizer, or modality);
- Existing models don’t cover your language or data type (e.g., low-resource languages, medical imaging, genomic data);
- You want full control over knowledge, bias, and compliance;
- You’re performing foundational research into architectures or training dynamics.
Otherwise — reuse and fine-tune an existing pre-trained foundation model.
What is the need for LLM fine-tuning?
A base model's generic completion is often not what you want: it may ignore your format, tone, or domain conventions. Fine-tuning shapes the output toward the style you actually need.
LLM Fine-tuning
Next Layer: Fine-Tuning
Where RAG retrieves knowledge dynamically, fine-tuning actually modifies the model’s brain — it teaches the LLM new patterns or behaviors by updating its internal weights.
⚙️ How Fine-Tuning Works
1. Start with a pretrained model (e.g., GPT-3.5, Llama-3, Mistral).
2. Prepare training data — examples of how you want the model to behave:
   - Inputs → desired outputs
   - e.g., “User story → corresponding UAT test case”
3. Train the model on these examples (using supervised learning or reinforcement learning).
4. The model’s weights are adjusted, internalizing the new style, tone, or domain language.
After fine-tuning, the model natively performs the desired task without needing the examples fed each time.
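Step 2 is usually just structured prompt–completion pairs serialized as JSONL. A sketch with invented example data (the stories and test cases below are illustrative, not from any real backlog):

```python
import json

# Hypothetical supervised fine-tuning pairs: input -> desired output
EXAMPLES = [
    {"prompt": "User story: As a caller, I can reset my PIN.",
     "completion": "UAT-001: Given a verified caller, when they request a PIN reset, then a new PIN is issued."},
    {"prompt": "User story: As an agent, I can view billing history.",
     "completion": "UAT-002: Given an authenticated agent, when they open an account, then billing history is shown."},
]

def to_jsonl(examples: list[dict]) -> str:
    """Serialize training pairs to JSONL — one JSON object per line,
    the format most fine-tuning APIs and trainers accept."""
    return "\n".join(json.dumps(e) for e in examples)
```

A few hundred to a few thousand such pairs is typically where supervised fine-tuning starts paying off; the exact file schema (field names, chat-message wrappers) varies by provider.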
⚖️ RAG vs Fine-Tuning: Clear Comparison
| Aspect | RAG (Retrieval-Augmented Generation) | Fine-Tuning |
|---|---|---|
| Mechanism | Adds external info at runtime | Alters model weights via training |
| When Used | When data changes often or is large | When you need consistent behavior or reasoning style |
| Data Type | Documents, databases, APIs | Labeled prompt–response pairs |
| Cost | Low (no retraining) | High (GPU time, expertise, re-training) |
| Freshness | Instantly updatable | Requires re-training to update |
| Control | You control retrieved sources | You control reasoning patterns |
| Example Use | Ask questions about new policies | Teach model to write test cases in your company’s format |
| Analogy | Reading from a manual before answering | Rewriting the brain to remember the manual forever |
🧩 Combining Both: RAG + Fine-Tuning = Domain-Native AI
The real power comes when both are used together:
| Layer | Role |
|---|---|
| Fine-Tuning | Teaches the model how to think — e.g., how to structure a UAT test case, how to handle defects, your tone/style. |
| RAG | Gives it the latest knowledge — e.g., current epics, Jira stories, or Salesforce objects from your live data. |
So the LLM becomes:
A fine-tuned specialist with a live retrieval memory.
🧬 Example: In Your AGL Salesforce / UAT Context
| Step | Example |
|---|---|
| Fine-tuning | You fine-tune the LLM on 1,000 existing UAT test cases and business rules. Now it understands your structure and tone. |
| RAG layer | You connect it to Jira and Confluence via embeddings, so when you ask, “Generate UAT test cases for Drop-3 Call Centre Epics,” it retrieves the latest epics and acceptance criteria. |
| Result | You get context-aware, properly formatted, accurate UAT cases consistent with AGL’s standards. |
That’s enterprise-grade augmentation — the model both knows how to think like your testers and knows what’s new from your systems.
🧠 Summary Table
| Capability | Base LLM | + RAG | + Fine-Tuning | + Both |
|---|---|---|---|---|
| General reasoning | ✅ | ✅ | ✅ | ✅ |
| Access to private or new data | ❌ | ✅ | ⚠ (only if baked in) | ✅ |
| Domain vocabulary & formats | ⚠ | ⚠ | ✅ | ✅ |
| Updatable knowledge | ❌ | ✅ | ❌ | ✅ |
| Low hallucination | ⚠ | ✅ | ✅ | ✅✅ |
| Cost to build | – | Low | Medium–High | Medium |
🚀 The Strategic Rule of Thumb
| If your problem is... | Then use... |
|---|---|
| “Model doesn’t know the latest information.” | ✅ RAG |
| “Model doesn’t behave or write like us.” | ✅ Fine-Tuning |
| “Model doesn’t know and doesn’t behave correctly.” | ✅ Both |
That’s the progressive architecture:
- RAG extends knowledge.
- Fine-tuning embeds behavior.
- Together, they form the foundation for enterprise-grade AI systems.
LLMs and RAG (Retrieval-Augmented Generation)
🧩 What Is RAG (Retrieval-Augmented Generation)?
Retrieval-Augmented Generation (RAG) is an AI architecture pattern where a Large Language Model (LLM) doesn’t rely only on its internal “frozen” training data.
Instead, it retrieves relevant, up-to-date, or domain-specific information from an external knowledge source (like your documents, databases, or APIs) just before it generates an answer.
So the model’s reasoning process becomes:
Question → Retrieve relevant documents → Feed them into the LLM → Generate answer using both
You can think of it as giving the LLM a “just-in-time memory extension.”
⚙️ How It Works — Step by Step
1. User query comes in.
2. The retriever embeds the query and searches a knowledge base (PDFs, wikis, databases, Jira tickets, etc.) for the most relevant chunks.
3. The top-k relevant passages are appended to the model’s prompt.
4. The LLM generates the final response, grounded in those retrieved facts.
Typical components:
| Component | Description |
|---|---|
| LLM | The reasoning and text-generation engine (e.g., GPT-5, Claude, Gemini). |
| Retriever | Finds relevant text snippets via embeddings (vector similarity search). |
| Vector Database | Stores text chunks as numerical embeddings (e.g., Pinecone, Chroma, FAISS). |
| Orchestrator Layer | Handles query parsing, retrieval, prompt assembly, and response formatting. |
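The retriever and orchestrator can be sketched end to end with a toy bag-of-words similarity standing in for a real vector database (production systems use dense model embeddings and an ANN index, not word counts):

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy embedding: a bag-of-words vector. Real systems use dense embeddings."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Retriever: rank stored chunks by similarity to the query, keep the top-k."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

def build_prompt(query: str, chunks: list[str]) -> str:
    """Orchestrator: splice the retrieved passages into the LLM prompt."""
    context = "\n".join(f"- {c}" for c in retrieve(query, chunks))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"
```

Swapping `embed` for a real embedding model and `retrieve` for a vector-DB query gives the production shape; the grounding logic — context first, question last — stays the same.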
🎯 The Core Benefit: Grounded Intelligence
RAG bridges the gap between static models and dynamic knowledge.
| Problem Without RAG | How RAG Solves It |
|---|---|
| LLM knowledge cutoff (e.g., 2023) | Retrieves real-time or updated data |
| Hallucinations / made-up facts | Grounds responses in retrieved, traceable context |
| Domain specificity (finance, legal, energy, healthcare, etc.) | Pulls your proprietary content as context |
| Data privacy and compliance | Keeps data in your environment (no fine-tuning needed) |
| High cost of fine-tuning models | Lets you “teach” via retrieval instead of retraining |
💡 Real-World Examples
| Use Case | What RAG Does |
|---|---|
| Enterprise knowledge assistant | Searches company Confluence, Jira, Salesforce, and answers from those docs |
| Customer support bot | Retrieves FAQs and policy docs to answer accurately |
| Research assistant | Pulls academic papers from a library before summarizing |
| Testing & QA (your domain) | Retrieves test cases, acceptance criteria, or epic notes to generate UAT scenarios |
| Legal advisor | Retrieves specific clauses or past judgments to draft responses |
📈 Key Benefits Summarized
| Benefit | Description |
|---|---|
| Accuracy | Reduces hallucination by grounding outputs in retrieved data |
| Freshness | Keeps responses current without retraining |
| Cost-effective | No need for fine-tuning or re-training large models |
| Traceability | You can show sources and citations (useful for audits, compliance) |
| Scalability | Works across thousands or millions of documents |
| Data Control | Keeps your proprietary knowledge within your secure environment |
🧠 Why It’s Still Relevant (Even in 2025)
Modern LLMs (GPT-5, Gemini 2, Claude 3.5, etc.) can read attached documents —
but they still can’t:
- Search across large knowledge bases automatically,
- Maintain persistent memory across sessions,
- Retrieve structured metadata or enforce data lineage.
RAG remains the backbone of enterprise AI because it allows controlled, explainable, and auditable intelligence.
🔍 In One Line
RAG = Reasoning + Retrieval.
It gives LLMs a dynamic external memory, making them accurate, current, and domain-aware.