Tuesday, October 28, 2025

LLM - where are the parameters stored, and the file system

What is an LLM? Is it a set of files? Does it sit as a .exe? A folder? A single binary? What does it LOOK LIKE if I download it?

Answer: YES — an LLM is literally a set of files.
A big model file (in a format such as .bin, .pth, or .safetensors), usually 2 GB to 400 GB+ in size, holds the weights.

Parameters live inside the model file, not in a vector DB.

A vector DB stores only embeddings of user/business knowledge for retrieval.
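
To make this concrete, here is a minimal sketch that opens a downloaded weights file and lists the parameter tensors stored inside it. It assumes the safetensors and PyTorch packages are installed, and "model.safetensors" is a placeholder for whatever file you downloaded.

# Minimal sketch: inspect the parameters inside a .safetensors weights file.
from safetensors import safe_open

total_params = 0
with safe_open("model.safetensors", framework="pt") as f:  # "pt" = PyTorch tensors
    for name in f.keys():
        tensor = f.get_tensor(name)       # one block of learned weights
        total_params += tensor.numel()
        print(name, tuple(tensor.shape))  # e.g. a weight matrix per layer

print(f"Total parameters: {total_params:,}")

Every tensor printed is a block of learned weights; together, those numbers are the model's parameters.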

Monday, October 13, 2025

Amazon Bedrock Guardrails

Amazon Bedrock Guardrails provides safeguards that you can configure for your generative AI applications based on your use cases and responsible AI policies. You can create multiple guardrails tailored to different use cases and apply them across multiple foundation models (FMs), providing a consistent user experience and standardizing safety and privacy controls across generative AI applications. You can use guardrails for both model prompts and responses with natural language.

You can use Amazon Bedrock Guardrails in multiple ways to help safeguard your generative AI applications. For example:

  • A chatbot application can use guardrails to help filter harmful user inputs and toxic model responses.

  • A banking application can use guardrails to help block user queries or model responses associated with seeking or providing investment advice.

  • A call center application that summarizes conversation transcripts between users and agents can use guardrails to redact users’ personally identifiable information (PII) to protect user privacy.

Amazon Bedrock Guardrails provides the following safeguards (also known as policies) to detect and filter harmful content:

  • Content filters – Detect and filter harmful text or image content in input prompts or model responses. Filtering is based on detection of predefined harmful content categories: Hate, Insults, Sexual, Violence, Misconduct, and Prompt Attack. You can also adjust the filter strength for each of these categories.

  • Denied topics – Define a set of topics that are undesirable in the context of your application. The filter will help block them if detected in user queries or model responses.

  • Word filters – Configure filters to help block undesirable words, phrases, and profanity (exact match). Such words can include offensive terms, competitor names, etc.

  • Sensitive information filters – Configure filters to help block or mask sensitive information, such as personally identifiable information (PII) or custom regex patterns, in user inputs and model responses. Blocking or masking is based on probabilistic detection of sensitive information in standard formats, such as Social Security numbers (SSNs), dates of birth, and addresses. You can also configure regular-expression-based detection of custom identifier patterns.

  • Contextual grounding checks – Help detect and filter hallucinations in model responses based on grounding in a source and relevance to the user query.

  • Automated Reasoning checks – Can help you validate the accuracy of foundation model responses against a set of logical rules. You can use Automated Reasoning checks to detect hallucinations, suggest corrections, and highlight unstated assumptions in model responses.
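
As a rough illustration, here is a minimal sketch that calls the ApplyGuardrail API through boto3 to screen a user prompt before it reaches a model. The guardrail ID and version are placeholders for a guardrail you have already created, and the region is an assumption.

# Minimal sketch: screen a user prompt with an existing Bedrock guardrail.
import boto3

runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

response = runtime.apply_guardrail(
    guardrailIdentifier="gr-example123",  # placeholder guardrail ID
    guardrailVersion="1",
    source="INPUT",  # screen the user prompt; use "OUTPUT" for model responses
    content=[{"text": {"text": "Which stocks should I buy right now?"}}],
)

if response["action"] == "GUARDRAIL_INTERVENED":
    print("Guardrail intervened:", response["outputs"])  # blocked/filtered text
else:
    print("Prompt passed the guardrail checks.")

The same call with source="OUTPUT" screens a model response before it is returned to the user.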


Generative AI Lifecycle

The generative AI lifecycle provides a structured framework for developing and deploying AI solutions. It consists of five key stages: defining a use case, selecting a foundation model, improving performance, evaluating results, and deploying the application.

This iterative process begins with clearly articulating the business problem and requirements, then choosing an appropriate pre-trained model as a starting point.

Throughout the lifecycle, there's a focus on continuous refinement to ensure the AI solution remains effective and aligned with business objectives.


While generative AI has numerous applications, it's equally important to recognize situations where it might not be the most appropriate solution. These include use cases with high accuracy and reliability requirements, ill-defined or constantly changing problems, insufficient data quality, a need for explainability and transparency, unfavorable cost-benefit trade-offs, and ethical concerns.



Sometimes, other methods work better than AI. This includes simple tasks that are solvable with rule-based solutions, or when model costs outweigh business benefits.

AI Use Cases

Amazon SageMaker


Amazon SageMaker is used by hundreds of thousands of AWS customers to build, train, and deploy machine learning models. Now, we've taken the machine learning service and added AWS analytics capabilities - creating one unified platform for data, analytics, and AI.

The next generation of Amazon SageMaker includes virtually all of the components you need for fast SQL analytics, big data processing, search, data preparation and integration, AI model development and training, and generative AI, along with a single view into all of your enterprise data. You get a single data and AI development environment in SageMaker Unified Studio; a lakehouse architecture that unifies access to all your data (on S3, in Redshift, in SaaS applications, on-premises, or in other clouds) through the open Apache Iceberg standard interface; and, with the SageMaker Catalog built into Unified Studio, end-to-end governance for your data and AI workflows.


Amazon SageMaker AI

The service previously known as Amazon SageMaker has been renamed Amazon SageMaker AI. It is integrated within the next generation of SageMaker and is also available as a standalone service for those who wish to focus specifically on building, training, and deploying AI and ML models at scale.

Amazon SageMaker AI is a fully managed service to build, train, and deploy ML models - including foundation models - for any use case by bringing together a broad set of tools to enable high-performance, low-cost machine learning. It is available as a standalone service in the AWS console, or via APIs. Model development capabilities from SageMaker AI are available in the next generation of Amazon SageMaker.

1/Amazon SageMaker AI provides access to high-performance, cost-effective, scalable, and fully managed infrastructure and tools for each step of the ML lifecycle. Using Amazon SageMaker AI tools, you can easily build, train, test, troubleshoot, deploy, and manage FMs at scale and boost the productivity of data scientists and ML engineers while maintaining model performance in production.

2/You can explore Amazon SageMaker JumpStart, an ML hub offering models, algorithms, and prebuilt ML solutions. SageMaker JumpStart offers hundreds of ready-to-use FMs from various model providers, including a growing list of best-performing publicly available FMs such as Falcon-40B, Stable Diffusion, OpenLLaMA, and Flan-T5/UL2 (a deployment sketch follows this list).

3/Amazon SageMaker machine learning operations (MLOps) capabilities help you create repeatable workflows across the ML lifecycle to experiment, train, deploy, and govern ML models at scale while maintaining model performance in production.

4/Amazon SageMaker AI provides purpose-built governance tools to help you implement ML responsibly. Amazon SageMaker Model Cards makes it easier to capture, retrieve, and share essential model information. Once models are deployed, SageMaker Model Dashboard gives you unified monitoring across all your models, surfacing deviations from expected behavior, automated alerts, and troubleshooting guidance to improve model performance. Amazon SageMaker Clarify detects and measures potential bias using a variety of metrics, helping you address potential bias and explain model predictions (a bias-check sketch follows this list).

5/With Amazon SageMaker Ground Truth, you can use human feedback to customize models on company- or domain-specific data for your unique use case to improve model output and task performance.
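
For example, here is a minimal sketch that deploys a JumpStart foundation model (item 2) with the SageMaker Python SDK. The model_id is an illustrative JumpStart identifier, and it assumes an AWS execution role is configured; note that deploy() creates a real-time endpoint that incurs cost.

# Minimal sketch: deploy a JumpStart FM and run one inference request.
from sagemaker.jumpstart.model import JumpStartModel

model = JumpStartModel(model_id="huggingface-llm-falcon-40b-instruct-bf16")
predictor = model.deploy()  # provisions a real-time inference endpoint

print(predictor.predict({"inputs": "Explain what a foundation model is."}))

predictor.delete_endpoint()  # clean up to stop charges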
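
And as one concrete example of the governance tools (item 4), here is a minimal sketch of a SageMaker Clarify pre-training bias report on a tabular CSV dataset. The S3 paths, column names, and facet are illustrative placeholders.

# Minimal sketch: run a SageMaker Clarify pre-training bias check.
import sagemaker
from sagemaker import clarify

session = sagemaker.Session()
processor = clarify.SageMakerClarifyProcessor(
    role=sagemaker.get_execution_role(),
    instance_count=1,
    instance_type="ml.m5.xlarge",
    sagemaker_session=session,
)

data_config = clarify.DataConfig(
    s3_data_input_path="s3://my-bucket/train.csv",    # placeholder input
    s3_output_path="s3://my-bucket/clarify-output",   # placeholder output
    label="approved",                                 # placeholder label column
    headers=["age", "gender", "income", "approved"],  # placeholder columns
    dataset_type="text/csv",
)
bias_config = clarify.BiasConfig(
    label_values_or_threshold=[1],  # the favorable label value
    facet_name="gender",            # attribute to check for bias
)

processor.run_pre_training_bias(data_config=data_config, data_bias_config=bias_config)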

Visualizing Next Word Prediction - How Do LLMs Work?

 https://bbycroft.net/llm
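
That visualization steps through exactly this computation. As a minimal sketch of the same idea in code, here is next-token prediction using the small, publicly available GPT-2 model from Hugging Face Transformers (assumes the torch and transformers packages are installed):

# Minimal sketch: show the model's probability distribution over the next token.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

inputs = tokenizer("The capital of France is", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits[0, -1]  # scores for the next token

probs = torch.softmax(logits, dim=-1)
top = torch.topk(probs, k=5)
for p, idx in zip(top.values, top.indices):
    print(f"{tokenizer.decode(int(idx))!r}  {float(p):.3f}")

The model assigns a probability to every token in its vocabulary; text generation is just repeatedly picking or sampling from this distribution and appending the result.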