accelerators, AI Accelerators-Power consumption
active injection, Indirect prompt injection
adapter-based methods, PEFT techniques
adapters
agents, Agents-Efficiency
agent failure modes and evaluation, Agent Failure Modes and Evaluation-Efficiency
overview, Agent Overview-Agent Overview
planning agents, Planning-Tool selection
tools, Tools-Write actions
AI accelerators (see accelerators)
AI application building (see application building)
AI application planning (see application planning)
AI engineering (AIE)
AI engineering architecture (see engineering architecture)
AI engineering stack (see engineering stack)
AI judge, AI as a Judge
AI pipeline orchestration (see pipeline orchestration)
AI systems evaluation (see systems evaluation)
AI-as-a-judge, AI as a Judge-What Models Can Act as Judges?
limitations, Limitations of AI as a Judge-Biases of AI as a judge
models, What Models Can Act as Judges?-What Models Can Act as Judges?
reasons, Why AI as a Judge?
reference-based, What Models Can Act as Judges?
AI-powered data synthesis (see data synthesis, AI-powered)
AMP (automatic mixed precision), Training quantization
ANN (approximate nearest neighbor), Embedding-based retrieval
Annoy (approximate nearest neighbors oh yeah), Embedding-based retrieval
anomaly detection, Similarity Measurements Against Reference Data
Anthropic
APIs (see open source models, model APIs versus)
application building, Introduction to Building AI Applications with Foundation Models-Summary
application planning, Planning AI Applications-Maintenance
engineering stack, The AI Engineering Stack-AI Engineering Versus Full-Stack Engineering
foundation model use cases, Foundation Model Use Cases-Workflow Automation
rise of AI engineering, The Rise of AI Engineering-From Foundation Models to AI Engineering
application development, Three Layers of the AI Stack, Application development-AI interface
application planning, Planning AI Applications-Maintenance
approximate nearest neighbor (ANN), Embedding-based retrieval
approximate string matching, Lexical similarity
ARC-C, Public leaderboards
attention mechanisms, Attention mechanism-Attention mechanism
attention modules, Transformer block
MLP modules, Transformer block
optimization, Attention mechanism optimization-Writing kernels for attention computation
redesign, Redesigning the attention mechanism
attention modules, Transformer block
augmentation of data (see data augmentation)
automated attacks, Automated attacks
automatic mixed precision (AMP), Training quantization
autoregressive decoding bottleneck, Overcoming the autoregressive decoding bottleneck-Parallel decoding
autoregressive language model, Language models
backpropagation, Backpropagation and Trainable Parameters-Backpropagation and Trainable Parameters
batch inference APIs, Online and batch inference APIs-Online and batch inference APIs
batch size, Batch size
batching
benchmarks
biases, Biases of AI as a judge, Biases
bits-per-byte (BPB), Bits-per-Character and Bits-per-Byte
bits-per-character (BPC), Bits-per-Character and Bits-per-Byte
bottlenecks
BPB (bits-per-byte), Bits-per-Character and Bits-per-Byte
BPC (bits-per-character), Bits-per-Character and Bits-per-Byte
build time, Comparing retrieval algorithms
canonical responses, Similarity Measurements Against Reference Data
capability extension, Capability extension
chain-of-thought (CoT), Give the Model Time to Think-Give the Model Time to Think, Data Curation
chaining, AI Pipeline Orchestration
change failure rate (CFR), Monitoring and Observability
CharacterEval, Roleplaying
ChatGPT
Chinchilla scaling law, Scaling law: Building compute-optimal models
chunking, RAG Architecture, Chunking strategy-Chunking strategy
Claude, RAG and, RAG
CLIP, From Large Language Models to Foundation Models, Domain-Specific Models, Introduction to Embedding
clustering, Similarity Measurements Against Reference Data
Common Crawl dataset, Training Data-Multilingual Models
comparative evaluation, Ranking Models with Comparative Evaluation-The Future of Comparative Evaluation
comparison data, Reward model
compilers, Kernels and compilers
components definition, AI Pipeline Orchestration
computational bottlenecks, Computational bottlenecks-Computational bottlenecks
computational capabilities, of AI accelerators, Computational capabilities
compute-bound bottlenecks, Computational bottlenecks
compute-optimal models, Scaling law: Building compute-optimal models-Scaling law: Building compute-optimal models
compute-optimal training, Scaling law: Building compute-optimal models
concatenation, Concatenation
constrained sampling, Constrained sampling
context construction, Prompt engineering and context construction, Provide Sufficient Context, Step 1. Enhance Context
context efficiency, Context Length and Context Efficiency-Context Length and Context Efficiency
context length, Context Length and Context Efficiency-Context Length and Context Efficiency
context parallelism, Parallelism
context precision, Comparing retrieval algorithms
context recall, Comparing retrieval algorithms
contextual retrieval, Contextual retrieval-Contextual retrieval
continuous batching, Batching
control flow, Complex plans
conversational bots, Conversational Bots
conversational feedback
conversation length, Conversation length
conversation organization, Conversation organization
extracting, Extracting Conversational Feedback-Dialogue diversity
language diversity, Dialogue diversity
natural language feedback, Natural language feedback-Sentiment
regeneration, Regeneration
copyright regurgitation, Information Extraction
copyright, model training and, Data lineage and copyright
CoT (chain-of-thought), Give the Model Time to Think-Give the Model Time to Think
CPU memory (DRAM), Memory size and bandwidth
criteria ambiguity, Criteria ambiguity-Criteria ambiguity
cross entropy, Cross Entropy
cross-layer attention, Redesigning the attention mechanism
data annotation, Data Acquisition and Annotation-Data Acquisition and Annotation
data augmentation, Data Augmentation and Synthesis-Model Distillation
data cleaning/filtering, Clean and Filter Data
data contamination, Data contamination with public benchmarks-Handling data contamination
data coverage, Data Coverage-Data Coverage
data curation, Data Curation-Data Acquisition and Annotation
data deduplication, Similarity Measurements Against Reference Data, Deduplicate Data-Deduplicate Data
data flywheels, Data Acquisition and Annotation
data formatting, Format Data-Format Data
data inspection, Inspect Data-Inspect Data
data lineage, Data lineage and copyright
data organization, Data Organization
data privacy, Data privacy
data processing, Data Processing-Format Data
data synthesis, Data Augmentation and Synthesis-Model Distillation
AI-powered, AI-Powered Data Synthesis-Obscure data lineage
model distillation, Model Distillation
traditional techniques, Traditional Data Synthesis Techniques-Simulation
data verification, Data verification-Data verification
dataset engineering, Dataset engineering, Dataset Engineering-Summary
data augmentation/synthesis, Data Augmentation and Synthesis-Model Distillation
data curation, Data Curation-Data Acquisition and Annotation
data processing, Data Processing-Format Data
data-centric view of AI, Dataset Engineering
DDR SDRAM (double data rate synchronous dynamic random-access memory), Memory size and bandwidth
debugging, Break Complex Tasks into Simpler Subtasks
decoding
defensive prompt engineering
jailbreaking and prompt injection, Jailbreaking and Prompt Injection-Indirect prompt injection
prompt attack defense, Defenses Against Prompt Attacks-System-level defense
degenerate feedback loops, Degenerate feedback loop
demonstration data, Supervised Finetuning
dense retrievers, Retrieval Algorithms
dimensionality reduction, Deduplicate Data
direct manual prompt hacking, Direct manual prompt hacking-Direct manual prompt hacking
Direct Preference Optimization (DPO), Preference Finetuning
distillation, Reasons to Finetune
domain-specific capability, Domain-Specific Capability-Domain-Specific Capability
domain-specific task finetuning, Reasons Not to Finetune
domain-specific training data models, Domain-Specific Models-Domain-Specific Models
dot products, Attention mechanism
double data rate synchronous dynamic random-access memory (DDR SDRAM), Memory size and bandwidth
DPO (Direct Preference Optimization), Preference Finetuning
DRAM (CPU memory), Memory size and bandwidth
drift detection, Drift detection
dynamic batching, Batching
dynamic features, The role of AI and humans in the application
edit distance, Lexical similarity
Elo, Ranking Models with Comparative Evaluation, Scalability bottlenecks, Quantized LoRA
embedding, Introduction to Embedding-Introduction to Embedding
embedding algorithm, Semantic similarity, Introduction to Embedding
embedding models, From Large Language Models to Foundation Models, Introduction to Embedding
engineering architecture, AI Engineering Architecture-AI Pipeline Orchestration
AI pipeline orchestration, AI Pipeline Orchestration-AI Pipeline Orchestration
monitoring and observability, Monitoring and Observability-Drift detection
monitoring versus observability, Monitoring and Observability
step 1: enhancing context, Step 1. Enhance Context
step 2: putting in guardrails, Step 2. Put in Guardrails-Guardrail implementation
step 3: adding model router and gateway, Step 3. Add Model Router and Gateway-Gateway
step 4: reducing latency with caches, Step 4. Reduce Latency with Caches-Semantic caching
step 5: adding agent patterns, Step 5. Add Agent Patterns
engineering stack, Three Layers of the AI Stack-Three Layers of the AI Stack
application development, Three Layers of the AI Stack
infrastructure, Three Layers of the AI Stack
ML engineering versus, Model development-Inference optimization
model development, Three Layers of the AI Stack
entropy, Entropy
epochs, Number of epochs
error correction, Reflection and error correction-Reflection and error correction
evaluation, Evaluation
evaluation harnesses, Navigate Public Benchmarks
evaluation methodology, Evaluation Methodology-Summary
AI as a judge, AI as a Judge-What Models Can Act as Judges?
AI systems evaluation (see systems evaluation)
challenges, Challenges of Comparative Evaluation-From comparative performance to absolute performance
challenges of foundation model evaluation, Challenges of Evaluating Foundation Models-Challenges of Evaluating Foundation Models
exact evaluation, Exact Evaluation-Introduction to Embedding
language model for computing text perplexity, Perplexity Interpretation and Use Cases
language modeling metrics, Understanding Language Modeling Metrics-Perplexity Interpretation and Use Cases
rank models with comparative evaluation, Ranking Models with Comparative Evaluation-The Future of Comparative Evaluation
evaluation pipeline design, Design Your Evaluation Pipeline-Iterate
step 1: evaluating all components in a system, Step 1. Evaluate All Components in a System-Step 1. Evaluate All Components in a System
step 2: creating an evaluation guideline, Step 2. Create an Evaluation Guideline-Tie evaluation metrics to business metrics
step 3: defining evaluation methods and data, Step 3. Define Evaluation Methods and Data-Iterate
evaluation-driven development, Evaluation Criteria-Evaluation Criteria
eviction policies, Exact caching
exact caching, Exact caching
exact evaluation, Exact Evaluation-Introduction to Embedding
exact matches, Exact match
expectation setting, Setting Expectations
explicit feedback, Extracting Conversational Feedback-Dialogue diversity
factual consistency, Factual consistency-Factual consistency, Create scoring rubrics with examples
faithfulness, Generation Capability
feature-based transfers, Finetuning, Finetuning Overview
feature-free transfers, Finetuning
federated learning, Model Merging and Multi-Task Finetuning
feedback design
how to collect feedback, How to collect feedback-How to collect feedback
when to collect feedback, When to collect feedback
feedforward computation, Parallelism
feedforward layer, Transformer block, LoRA configurations
few-shot learning, In-Context Learning: Zero-Shot and Few-Shot-In-Context Learning: Zero-Shot and Few-Shot
finetuning, Finetuning-Summary
defined, Modeling and training
domain-specific tasks, Reasons Not to Finetune
finetuning and RAG, Finetuning and RAG-Finetuning and RAG
hyperparameters, Finetuning hyperparameters-Prompt loss weight
memory bottlenecks, Memory Bottlenecks-Training quantization
overview, Finetuning Overview-Finetuning Overview
structured outputs, Finetuning
tactics, Finetuning Tactics-Prompt loss weight
techniques, Finetuning Techniques-Prompt loss weight
when to finetune, When to Finetune-Finetuning and RAG
FLOP (floating point operation), Model Size
foundation models, From Foundation Models to AI Engineering, Understanding Foundation Models-Summary
evaluation challenges, Challenges of Evaluating Foundation Models-Challenges of Evaluating Foundation Models
inverse scaling, Model Size
modeling, Modeling-Scaling bottlenecks
parameter versus hyperparameter, Scaling extrapolation
post-training, Post-Training-Finetuning using the reward model
sampling, Sampling-Hallucination
training data, Training Data-Domain-Specific Models
use cases, Foundation Model Use Cases-Workflow Automation
full finetuning, Parameter-Efficient Finetuning-Quantized LoRA
function calling, Function calling-Function calling
fuzzy matching, Lexical similarity
H3 architecture, Other model architectures
hallucinations
hard attributes, Model Selection Workflow
hashing, Deduplicate Data
HellaSwag, Public leaderboards
hierarchical navigable small world (HNSW), Embedding-based retrieval
high-bandwidth memory (HBM), Memory size and bandwidth
hyperparameters, Scaling extrapolation, Finetuning hyperparameters-Prompt loss weight
IDF (inverse document frequency), Term-based retrieval
IFEval, Instruction-following criteria
implicit feedback, Extracting Conversational Feedback
in-context learning, In-Context Learning: Zero-Shot and Few-Shot-In-Context Learning: Zero-Shot and Few-Shot
inconsistency, Inconsistency-Inconsistency, Inconsistency
indexing
indirect prompt injection, Indirect prompt injection-Indirect prompt injection
inference APIs, Online and batch inference APIs-Online and batch inference APIs
inference optimization, Inference optimization, Inference Optimization-Summary
AI accelerators
case study from PyTorch, Kernels and compilers
inference overview
inference performance metrics, Inference Performance Metrics-Utilization, MFU, and MBU
inference service optimization, Inference Service Optimization-Parallelism
KV cache size calculation, Attention mechanism optimization
memory-bound versus bandwidth-bound inference, Computational bottlenecks
at model/hardware/service levels, Inference Optimization
model optimization, Model Optimization-Kernels and compilers
understanding, Understanding Inference Optimization-Power consumption
inference performance metrics, Inference Performance Metrics-Utilization, MFU, and MBU
inference quantization, Inference quantization-Inference quantization
inference service
inference service optimization, Inference Service Optimization-Parallelism
inference with reference, Inference with reference
INFOBench, Instruction-following criteria
information aggregation, Information Aggregation
information extraction, Information Extraction-Information Extraction
information retrieval optimization, Retrieval Optimization-Contextual retrieval
instruction data synthesis, Instruction data synthesis-Instruction data synthesis
instruction-following capability, Instruction-Following Capability-Roleplaying
instruction-following criteria, Instruction-following criteria-Instruction-following criteria
intent classifiers, Router
inter-token latency (ITL), Latency, TTFT, and TPOT
interface, AI, AI interface
internal knowledge, Memory
inverse document frequency (IDF), Term-based retrieval
inverted file index (IVF), Embedding-based retrieval
iteration, Iterate
jailbreaking, Jailbreaking and Prompt Injection-Indirect prompt injection
Jamba architecture, Other model architectures
judges (see AI-as-a-judge)
LangChain, Evaluate Prompt Engineering Tools, Prompt-level defense, Memory
language modeling metrics, Understanding Language Modeling Metrics-Perplexity Interpretation and Use Cases
language models, Language models-Language models, Perplexity Interpretation and Use Cases
large language models, From Large Language Models to Foundation Models-From Large Language Models to Foundation Models
large multimodal model (LMM), From Large Language Models to Foundation Models
latency
layer stacking, Layer stacking-Layer stacking
leaderboards, Scalability bottlenecks-Lack of standardization and quality control, Benchmark selection and aggregation-Custom leaderboards with public benchmarks
learning rate, Learning rate
leniency bias, Biases
lexical similarity, Lexical similarity-Lexical similarity
linear combination summing, Linear combination-Linear combination
Llama
LLM-as-a-judge, AI as a Judge
LMM (large multimodal model), From Large Language Models to Foundation Models
local factual consistency, Factual consistency
locality-sensitive hashing (LSH), Embedding-based retrieval
logit vectors, Sampling Fundamentals
logprobs, Temperature, Select evaluation methods
long-term memory, Memory
loop tiling, Kernels and compilers
LoRA (low-rank adaptation), LoRA-Quantized LoRA
low-rank factorization, LoRA
LSH (locality-sensitive hashing), Embedding-based retrieval
Mamba architecture, Other model architectures
manual generation, Traditional Data Synthesis Techniques-Simulation
masked language models, Language models
Massive Multitask Language Understanding (MMLU), Maintenance, Public leaderboards
MBU (model bandwidth utilization), Utilization, MFU, and MBU-Utilization, MFU, and MBU
MCQs (multiple-choice questions), Domain-Specific Capability
mean time to detection (MTTD), Monitoring and Observability
mean time to response (MTTR), Monitoring and Observability
memory bottlenecks, Memory Bottlenecks-Training quantization
bandwidth-bound, Computational bottlenecks
memory math, Memory Math-Memory needed for training
quantization, Quantization-Training quantization
size and bandwidth, Memory size and bandwidth-Memory size and bandwidth
memory math, Memory Math-Memory needed for training
MFU (model FLOPs utilization), Utilization, MFU, and MBU-Utilization, MFU, and MBU
milestone planning, Milestone Planning
mixture-of-experts (MoE) models, Model Size, Layer stacking
ML engineering, AI engineering versus, AI Engineering Versus ML Engineering-AI interface
MLP modules, Transformer block
MMLU (Massive Multitask Language Understanding), Maintenance, Public leaderboards
model APIs, open source models versus (see open source models, model APIs versus)
model architecture, Model Architecture-Other model architectures
model bandwidth utilization (MBU), Utilization, MFU, and MBU-Utilization, MFU, and MBU
model compression, Model compression
model development, Three Layers of the AI Stack, Model development-Inference optimization
model distillation, Model Distillation
model FLOPs utilization (MFU), Utilization, MFU, and MBU-Utilization, MFU, and MBU
model inference, Maintenance
model merging, Model Merging and Multi-Task Finetuning-Concatenation
model optimization, Model Optimization-Kernels and compilers
attention mechanism optimization, Attention mechanism optimization-Writing kernels for attention computation
autoregressive decoding bottleneck, Overcoming the autoregressive decoding bottleneck-Parallel decoding
kernels and compilers, Kernels and compilers-Kernels and compilers
model compression, Model compression
model ranking, Ranking Models with Comparative Evaluation-The Future of Comparative Evaluation
model router, Step 3. Add Model Router and Gateway-Gateway
model selection, Model Selection-Handling data contamination
model build versus buy, Model Build Versus Buy-On-device deployment
model selection workflow, Model Selection Workflow-Model Selection Workflow
navigating public benchmarks, Navigate Public Benchmarks-Custom leaderboards with public benchmarks
model size, Model Size-Scaling bottlenecks
model-centric AI, Dataset Engineering
model-level defense, Model-level defense
modeling, Modeling-Scaling bottlenecks
MoE (mixture-of-experts) models, Layer stacking
monitoring, Break Complex Tasks into Simpler Subtasks, Monitoring and Observability-Drift detection
MTTD (mean time to detection), Monitoring and Observability
MTTR (mean time to response), Monitoring and Observability
multi-query attention, Redesigning the attention mechanism
multi-task finetuning, Model Merging and Multi-Task Finetuning
multilingual training data models, Multilingual Models-Multilingual Models
multimodal models, From Large Language Models to Foundation Models
multiple-choice questions (MCQs), Domain-Specific Capability
n-gram similarity, Lexical similarity
natural language feedback, Natural language feedback-Sentiment
natural language generation (NLG), Generation Capability-Safety
natural language processing (NLP), Generation Capability-Safety
needle in a haystack (NIAH) test, Context Length and Context Efficiency
obscure data lineage, Obscure data lineage
observability, Monitoring and Observability-Drift detection
on-device deployment, On-device deployment
online inference APIs, Online and batch inference APIs-Online and batch inference APIs
Open CLIP, Domain-Specific Models
open source licenses, Open source, open weight, and model licenses-Open source, open weight, and model licenses
open source models, model APIs versus, Open source models versus model APIs-On-device deployment
open weight models, Open source, open weight, and model licenses
OpenAI
operator fusion, Kernels and compilers
optimization
pairwise comparison, Deduplicate Data
parallel decoding, Parallel decoding
parallelism, Parallelism-Parallelism
parallelization, Break Complex Tasks into Simpler Subtasks, Kernels and compilers
parameter-efficient finetuning, Parameter-Efficient Finetuning-Quantized LoRA
adapter-based/soft-prompt techniques, PEFT techniques-PEFT techniques
LoRA, LoRA-Quantized LoRA
Pareto optimization, Cost and Latency
partial finetuning, Parameter-Efficient Finetuning
passive phishing, Indirect prompt injection
PEFT (see parameter-efficient finetuning)
perplexity, Perplexity-Perplexity Interpretation and Use Cases
perturbation, Rule-based data synthesis
pipeline orchestration, AI Pipeline Orchestration-AI Pipeline Orchestration
monitoring and observability, Monitoring and Observability-Drift detection
planning
plan generation, Plan generation-Complex plans
reflection and error correction, Reflection and error correction-Reflection and error correction
pointwise evaluation, Reward model, Ranking Models with Comparative Evaluation
position bias, Biases
post-processing, Prompting
post-training, Modeling and training, Post-Training-Finetuning using the reward model
potential model collapse, Potential model collapse
power consumption, Power consumption-Power consumption
PPO (proximal policy optimization), Finetuning using the reward model
pre-training, Modeling and training
precision bits, Numerical Representations
preference bias, Biases
preference finetuning, Preference Finetuning-Finetuning using the reward model, Finetuning Overview
preference models, What Models Can Act as Judges?
prefilling, Transformer architecture
prefilling, decoupling from decoding, Decoupling prefill and decode
proactive features, The role of AI and humans in the application
probabilistic nature of AI, The Probabilistic Nature of AI-Hallucination
procedural generation, Traditional Data Synthesis Techniques-Simulation
product quantization, Embedding-based retrieval
prompt attacks, Defensive Prompt Engineering, Jailbreaking and Prompt Injection-Indirect prompt injection
prompt caching, Prompt caching-Prompt caching
prompt catalogs, Organize and Version Prompts
prompt engineering, Prompt Engineering-Summary
basics, Introduction to Prompting-Context Length and Context Efficiency
best practices, Prompt Engineering Best Practices-Organize and Version Prompts
defensive engineering, Defensive Prompt Engineering-System-level defense
restricting model knowledge to its context, Provide Sufficient Context
terminology ambiguity: prompt versus context, In-Context Learning: Zero-Shot and Few-Shot
prompt loss weight, Prompt loss weight
prompt optimization, Evaluate Prompt Engineering Tools
prompt versioning, Organize and Version Prompts-Organize and Version Prompts
prompt-level defense, Prompt-level defense
proprietary prompts, Proprietary Prompts and Reverse Prompt Engineering-Proprietary Prompts and Reverse Prompt Engineering
proximal policy optimization (PPO), Finetuning using the reward model
public leaderboards, Public leaderboards
QAT (quantization-aware training), Training quantization
QLoRA (quantized LoRA), Quantized LoRA-Quantized LoRA
QPS (queries per second), Comparing retrieval algorithms
quality control, Quality control
quantization, Quantization-Training quantization
quantization-aware training (QAT), Training quantization
quantized LoRA (QLoRA), Quantized LoRA-Quantized LoRA
queries per second (QPS), Comparing retrieval algorithms
query rewriting, Query rewriting
query vector (Q), Attention mechanism
RAG (retrieval-augmented generation), RAG-RAG with tabular data
finetuning and, Finetuning and RAG-Finetuning and RAG
RAG architecture, RAG Architecture
RAG beyond texts, RAG Beyond Texts-RAG with tabular data
retrieval algorithms, Retrieval Algorithms-Combining retrieval algorithms
retrieval optimization, Retrieval Optimization-Contextual retrieval
random feedback, Biases
range bits, Numerical Representations
rating algorithms, Ranking Models with Comparative Evaluation
reactive features, The role of AI and humans in the application
recall, Comparing retrieval algorithms
recurrent neural networks (RNNs), Transformer architecture
reference-based judges, What Models Can Act as Judges?
reference-based metrics, Similarity Measurements Against Reference Data
reference-free metrics, Similarity Measurements Against Reference Data
reflection, Reflection and error correction-Reflection and error correction
regeneration, Regeneration
reinforcement learning from human feedback (RLHF), Preference Finetuning-Finetuning using the reward model
relevance, Generation Capability
reliability, latency versus, Guardrail implementation
replica parallelism, Parallelism
reranking, Reranking
restricted weight, Open source, open weight, and model licenses
retrieval algorithms, Retrieval Algorithms-Combining retrieval algorithms
retrieval optimization, Retrieval Optimization-Contextual retrieval
retrieval-augmented generation (see RAG)
retrievers
reverse prompt engineering, Proprietary Prompts and Reverse Prompt Engineering-Proprietary Prompts and Reverse Prompt Engineering
reward models, Reward model-Reward model, What Models Can Act as Judges?
RLHF (reinforcement learning from human feedback), Preference Finetuning-Finetuning using the reward model
RNNs (recurrent neural networks), Transformer architecture
RoleLLM, Roleplaying
roleplaying, Roleplaying-Roleplaying
rule-based data synthesis, Rule-based data synthesis-Rule-based data synthesis
S4 architecture, Other model architectures
sampling, Sampling-Hallucination
probabilistic nature of AI, The Probabilistic Nature of AI-Hallucination
sampling fundamentals, Sampling Fundamentals-Sampling Fundamentals
sampling strategies, Sampling Strategies-Stopping condition
strategies, Sampling Strategies-Stopping condition
structured outputs, Structured Outputs-Finetuning
test time compute, Test Time Compute-Test Time Compute
scaling bottlenecks, Scaling bottlenecks-Scaling bottlenecks, Scalability bottlenecks
scaling extrapolation, Scaling extrapolation
scaling law, Scaling law: Building compute-optimal models-Scaling law: Building compute-optimal models
scoring rubrics, Create scoring rubrics with examples
self-evaluation, What Models Can Act as Judges?
self-supervision language models, Self-supervision-Self-supervision
self-verification, Factual consistency
semantic caching, Semantic caching
semantic similarity, Semantic similarity-Semantic similarity
sequence parallelism, Parallelism
sequential finetuning, Model Merging and Multi-Task Finetuning
SFT (supervised finetuning), Post-Training, Supervised Finetuning-Supervised Finetuning, Finetuning Overview
short-term memory, Memory
simulation, Simulation
simultaneous finetuning, Model Merging and Multi-Task Finetuning
SLERP (spherical linear interpolation), Spherical linear interpolation (SLERP)
slicing, Annotate evaluation data
soft attributes, Model Selection Workflow
soft prompt-based PEFT methods, PEFT techniques-PEFT techniques
sparse models, Model Size, Model compression
sparse retrievers, Retrieval Algorithms
speculative decoding, Speculative decoding-Speculative decoding
spherical linear interpolation (SLERP), Spherical linear interpolation (SLERP)
SQL queries, Agent Overview
static batching, Batching
static features, The role of AI and humans in the application
stopping condition, Stopping condition
structured data, Perplexity Interpretation and Use Cases, Memory
structured outputs, Structured Outputs-Finetuning
summing, Summing-Pruning redundant task-specific parameters
superficial imitation, Superficial imitation
supervised finetuning (SFT), Post-Training, Supervised Finetuning-Supervised Finetuning, Finetuning Overview
supervision, Self-supervision
synthesis of data (see data synthesis)
system components evaluation, Step 1. Evaluate All Components in a System-Step 1. Evaluate All Components in a System
system prompts, System Prompt and User Prompt-System Prompt and User Prompt
system-level defense, System-level defense
systems evaluation, Evaluate AI Systems-Summary
evaluation criteria, Evaluation Criteria-Cost and Latency
evaluation pipeline design, Design Your Evaluation Pipeline-Iterate
evaluation-driven development, Evaluation Criteria-Evaluation Criteria
model selection, Model Selection-Handling data contamination
OpenAI model quality, Custom leaderboards with public benchmarks
task-based evaluation, Step 1. Evaluate All Components in a System
temperature, Temperature-Temperature
term frequency (TF), Term-based retrieval
text-to-SQL, Structured Outputs, Functional Correctness, RAG with tabular data
throughput, Throughput and goodput-Throughput and goodput
time between tokens (TBT), Latency, TTFT, and TPOT
time per output token (TPOT), Setting Expectations, Latency, TTFT, and TPOT-Latency, TTFT, and TPOT
time to first token (TTFT), Setting Expectations, Latency, TTFT, and TPOT-Latency, TTFT, and TPOT
tokenization, Multilingual Models, Model Size, Bits-per-Character and Bits-per-Byte, Term-based retrieval, Chunking strategy
tokenizer, Chunking strategy
tokens, Language models, Model Size
tool use, Tool selection
top-k, Top-k
top-p, Top-p
TPOT (time per output token), Setting Expectations, Latency, TTFT, and TPOT-Latency, TTFT, and TPOT
traces, Logs and traces
trainable parameters, Backpropagation and Trainable Parameters-Backpropagation and Trainable Parameters
training, Modeling and training-Modeling and training
training data, Training Data-Domain-Specific Models
training quantization, Training quantization-Training quantization
transfer learning, Finetuning Overview
transformer architecture, Transformer architecture-Transformer block
attention mechanism, Attention mechanism-Attention mechanism
transformer blocks, Transformer block-Transformer block
TruthfulQA, Public leaderboards
TTFT (time to first token), Setting Expectations, Latency, TTFT, and TPOT-Latency, TTFT, and TPOT
turn-based evaluation, Step 1. Evaluate All Components in a System
unstructured data, Data Organization, Memory
use case evaluation, Use Case Evaluation-AI product defensibility
usefulness threshold, Setting Expectations
user feedback, User Feedback-Degenerate feedback loop
extracting conversational feedback, Extracting Conversational Feedback-Dialogue diversity
feedback design, Feedback Design-How to collect feedback
feedback limitations, Feedback Limitations-Degenerate feedback loop
value vector (V), Attention mechanism
vector database, Embedding-based retrieval-Embedding-based retrieval
vectorization, Kernels and compilers
vocabulary, Perplexity Interpretation and Use Cases