| 13B | 13 Billion (parameters) |
| 405B | 405 Billion (parameters) |
| 7B | 7 Billion (parameters) |
| 70B | 70 Billion (parameters) |
| A100 | Ampere 100 Nvidia GPU |
| A2C | Advantage Actor Critic |
| A3C | Asynchronous Advantage Actor Critic (predates A2C) |
| AAAI | Association for the Advancement of Artificial Intelligence |
| ACL | Association for Computational Linguistics |
| ACM | Association for Computing Machinery |
| AdaM | Adaptive Moment estimation (momentum) |
| AdaMW | Adaptive Moment estimation with Weight decay |
| ADMM | Alternating Direction Method of Multipliers |
| AGI | Artificial General Intelligence |
| AGIEval | Artificial General Intelligence Evaluation (exams dataset) |
| AI | Artificial Intelligence |
| AI2 | Allen Institute for Artificial Intelligence |
| aka | also known as |
| AMI | Amazon Machine Image |
| AMD | Advanced Micro Devices |
| ANI | Artificial Narrow Intelligence |
| ANN | Artificial Neural Network |
| ANSI | American National Standards Institute |
| AP | Advanced Placement (exams) |
| APE | Automated Prompt Engineering |
| API | Application Programming Interface |
| AR | Augmented Reality |
| ARC | Abstraction and Reasoning Corpus |
| ARC | AI2 Reasoning Challenge |
| ARC-C | AI2 Reasoning Challenge - Challenge set |
| ARC-E | AI2 Reasoning Challenge - Easy set |
| ARES | Automated RAG Evaluation System |
| ASCII | American Standard Code for Information Interchange |
| ASI | Artificial Super Intelligence |
| ASR | Automatic Speech Recognition (speech-to-text) |
| AST | Automatic Speech Translation |
| AT | Added Toxicity |
| AUC | Area Under the Curve (curve could be ROC, DET, PR, etc) |
| AVX2 | Advanced Vector eXtensions version 2 (256-bit: eight 32-bit single-precision numbers) |
| AVX512 | Advanced Vector eXtensions 512 (512-bit: sixteen 32-bit single-precision numbers) |
| AWS | Amazon Web Services |
| B | Billion |
| B100 | Blackwell 100 Nvidia GPU |
| BAIR | Berkeley AI Research |
| BART | Bidirectional and Auto-Regressive Transformers |
| BB | BIG Benchmark |
| BBH | Beyond the Imitation Game (BIG) Bench Hard suite |
| BBQ | Bias Benchmark for Question answering |
| BCE | Binary Cross Entropy |
| BERT | Bidirectional Encoder Representations from Transformers |
| BEST-RQ | BErt-based Speech pre-Training with Random-projection Quantizer |
| BF16 | 16-bit Brain Floating-point format (1 sign bit, 8 exponent bits, 7 mantissa bits) = (-1) ** sign_bit * 2 ** (exponent - 127) * (1 + mantissa / 128), where exponent is the unsigned integer formed by the 8 exponent bits and mantissa is the unsigned integer formed by the 7 mantissa bits [example: bin(torch.tensor(-1.5, dtype = torch.bfloat16).view(torch.uint16)) = 0b1011111111000000] [see the decoding sketch after this table] |
| BFCL | Berkeley Function Calling Leaderboard |
| BFGS | Broyden Fletcher Goldfarb Shanno optimization |
| BFS | Breadth-First Schedule (or Search) |
| Bi-LSTM | Bidirectional LSTM |
| BIG | Beyond the Imitation Game (the Turing Test is known as the Imitation Game) |
| BLAS | Basic Linear Algebra Subprograms |
| BLEU | BiLingual Evaluation Understudy |
| BLOOM | Bigscience Large Open-science Open-access Multilingual language model |
| BM25 | Best Match 25 (an extension of TF*IDF with length normalization and term frequency saturation) [see the sketch after this table] |
| BN | Batch Normalization (center and scale) |
| BOLD | Bias in Open-ended Language generation Dataset |
| BoolQ | Boolean (yes/no) Questions (dataset) |
| BPE | Byte Pair Encoding |
| BPTT | Back Propagation Through Time |
| BSD | Berkeley Software Distribution license |
| C4 | Colossal Clean Crawled Corpus |
| CAM | Class Activation Map |
| CBOW | Continuous Bag Of Words |
| CBRNE | Chemical, Biological, Radiological, Nuclear, and high-yield Explosives (threats) |
| CD | Contrastive Divergence |
| CelebA | Celebrity faces with Attributes |
| CERN | Conseil Européen pour la Recherche Nucléaire |
| cGAN | conditional GAN |
| ChartQA | Chart Question Answering |
| CI | Confidence Interval, where confidence is the probability that the interval construction method will generate an interval that contains the true value of the parameter of interest [if there is no overlap between a pair of confidence intervals, we assume there is a statistically significant difference between the parameters being compared] [see the sketch after this table] |
| CIFAR | Canadian Institute For Advanced Research |
| CLEVR | Compositional Language and Elementary Visual Reasoning |
| CLIP | Contrastive Language-Image Pretraining |
| CLM | Causal Language Modeling |
| CLS | CLaSsification token |
| CNN | Convolutional Neural Network |
| CNTK | Cognitive ToolKit |
| CO2 | Carbon diOxide (emissions) |
| COCO | Common Objects in Context |
| CoLA | Corpus of Linguistic Acceptability |
| CoNLL | Conference on Natural Language Learning |
| ConvNet | Convolutional Network |
| CoQA | Conversational Question Answering (dataset) |
| CoT | Chain of Thought |
| CP | Context Parallelism (input sequence chunks are processed in parallel) |
| CPU | Central Processing Unit |
| CR | Customer Reviews dataset |
| CRF | Conditional Random Field |
| CSAM | Child Sexual Abuse Material |
| CSI | Control Sequence Introducer: an ANSI sequence for controlling foreground and background colors for text, e.g. f'\x1b[0;30;48;2;{red};{green};{blue}m' contains 0 to reset attributes, 30 for a black foreground color, and 48;2;{red};{green};{blue} for a 24-bit RGB background color [see the sketch after this table] |
| CSS | Cascading Style Sheet |
| CSV | Comma Separated Values |
| CUDA | Compute Unified Device Architecture |
| cuDNN | CUDA DNN library |
| CV | Cross Validation; also Computer Vision |
| CVF | Computer Vision Foundation |
| CVPR | Computer Vision and Pattern Recognition |
| DAG | Directed Acyclic Graph |
| DCGAN | Deep Convolutional GAN |
| DCQCN | Data Center Quantized Congestion Notification |
| DDDQN | Dueling Double Deep Quality estimation Network [to be fair, I've not seen others abbreviate this] |
| DDPG | Deep Deterministic Policy Gradient |
| DDQN | Double Deep Quality estimation Network (as in two networks) |
| DeBERTa | Decoding-enhanced BERT with disentangled attention |
| DET | Detection Error Trade-off |
| distilBERT | distilled (smaller) version of larger BERT model |
| df | degrees of freedom |
| DFS | Depth-First Schedule (or Search) |
| DL | Deep Learning |
| DM Mathematics | DeepMind Mathematics dataset |
| DNN | Deep Neural Network |
| DocQA | Document Question Answering (dataset) |
| DocVQA | Document Visual Question Answering (dataset) |
| DP | Data Parallelism (observations are processed in parallel) |
| DPO | Direct Preference Optimization |
| DQN | Deep Quality estimation Network |
| DRAM | Dynamic Random Access Memory |
| DRL | Deep Reinforcement Learning |
| DROP | Discrete Reasoning Over the content of Paragraphs |
| DSO | Dynamic Shared Object |
| DSPy | Demonstrate Search Predict for python (pipeline optimization) |
| DSVM | Data Science Virtual Machine |
| DTD | Describable Textures Dataset |
| DUC | Document Understanding Conference |
| EC2 | Elastic Compute Cloud |
| EACL | European Chapter of the ACL |
| ECCV | European Conference on Computer Vision |
| ECMP | Equal Cost Multi-Path (routing) |
| ELECTRA | Efficiently Learning an Encoder that Classifies Token Replacements Accurately |
| ELBO | Evidence Lower BOund |
| ELMo | Embeddings from Language Models |
| Elo | Arpad Elo's last name (pronounced "ee lou"): he devised a rating system in which a player's rating moves up or down based on the game result and the opponent's rating |
| ELRA | European Language Resources Association |
| ELU | Exponential Linear Unit |
| EM | Exact Match |
| EM | Expectation Maximization |
| EMA | Exponential Moving Average |
| EMNLP | Empirical Methods in Natural Language Processing |
| ETA | Estimated Time of Arrival (of completion) |
| EuroSAT | European Satellite |
| EWMA | Exponentially Weighted Moving Average |
| EXAMS | multi-subject high-school EXAMinationS (dataset) |
| exp | exponential function [base is 'e' (Euler's number ~ 2.71828)] |
| F score | Function returning the harmonic mean of precision and recall (always less than or equal to arithmetic mean) |
| F-beta | TP / (TP + (FP + beta**2 * FN) / (1 + beta**2)) [see the sketch after this table] |
| F1 | TP / (TP + (FP + FN) / 2) [F-beta with beta = 1] |
| f8_e4m3 | 8-bit floating-point format, with 4-bit exponent and 3-bit mantissa |
| FAISS | Facebook Artificial Intelligence Similarity Search |
| FER | Facial Expression Recognition |
| FFN | Feed Forward Network |
| FFT | Fast Fourier Transform |
| FGVC | Fine-Grained Visual Classification |
| FID | Frechet Inception Distance |
| FLaN | Finetuned Language Network |
| FLEURS | Few-shot Learning Evaluation of Universal Representations of Speech |
| FLOPs | FLoating-point Operations; FLOPS (capital S) denotes FLoating-point Operations Per Second |
| FMA | Fused Multiply-Add |
| FN | False Negative [Actual = Positive; Prediction = Negative] |
| FNR | FN Rate |
| FP | False Positive [Actual = Negative; Prediction = Positive] |
| FP8 | 8-bit Floating Point representation (see f8_e4m3) |
| FPR | FP Rate |
| FRR | False Refusal Rate (false positive rate for safety) |
| FSDP | Fully Sharded Data Parallelism |
| FT | Fine Tuning |
| GAE | Generalized Advantage Estimation |
| GAIA | General AI Assistants (benchmark) |
| GAN | Generative Adversarial Network |
| GAT | Graph ATtention network |
| GB | GigaBytes |
| GCN | Graph Convolutional Network |
| GCP | Google Cloud Platform |
| GELU | Gaussian Error Linear Unit |
| GEMM | GEneral Matrix Multiplication |
| gensim | generate similar |
| GGML | GPT-Generated Model Language |
| GGUF | GPT-Generated Unified Format |
| GLM | General Language Model |
| GLM | Generalized Linear Model |
| GloVe | Global Vectors for word representation |
| GLUE | General Language Understanding Evaluation |
| GMAT | Graduate Management Admission Test |
| GNN | Graph Neural Network |
| Gov | Government |
| GPQA | Graduate-level Google-Proof Question Answering (dataset) |
| GPT | Generative Pre-trained Transformer |
| GPTQ | GPT Quantization |
| GPU | Graphics Processing Unit |
| GQA | Generalized Query Attention |
| GQA | Grouped Query Attention |
| GRE | Graduate Record Examination |
| GRU | Gated Recurrent Unit cell (a set of 3 or 6 matrices) |
| GSM8K | Grade School Math 8000 problems dataset |
| GTSRB | German Traffic Sign Recognition Benchmark dataset |
| GTX | Giga Texel shader eXtreme |
| HBM | High-Bandwidth Memory |
| HDF5 | Hierarchical Data Format version 5 |
| HellaSwag | Harder Endings, Longer contexts, and Lowshot Activities for Situations With Adversarial Generations |
| HELM | Holistic Evaluation of Language Models |
| HH | Helpful and Harmless dialogue dataset |
| HMM | Hidden Markov Model |
| HNSW | Hierarchical Navigable Small Worlds |
| HTML | Hyper Text Markup Language |
| HTTP | Hyper Text Transfer Protocol |
| HTTPS | Hyper Text Transfer Protocol Secure |
| HSV | Hue, Saturation, and Value |
| HumanEval | Human (code) Evaluation (dataset) |
| I | Identity matrix |
| I | Informational message |
| ICASSP | International Conference on Acoustics, Speech, and Signal Processing |
| ICCV | International Conference on Computer Vision |
| ICD | Insecure Code Detector |
| ICLR | International Conference on Learning Representations |
| ICML | International Conference on Machine Learning |
| IDF | Inverse Document Frequency |
| IDSIA | Istituto Dalle Molle di Studi sull'Intelligenza Artificiale |
| IEEE | Institute of Electrical and Electronics Engineers |
| IFEval | Instruction Following Evaluation (benchmark) |
| IFT | Instruction Fine Tuning |
| IID | Independent and Identically Distributed |
| IJCAI | International Joint Conference on Artificial Intelligence |
| IJCNLP | International Joint Conference on Natural Language Processing |
| ILSVRC | ImageNet Large Scale Visual Recognition Challenge |
| IMDB | Internet Movie DataBase |
| IML | Instruction Meta Learning |
| IO | Input Output |
| IOU | Intersection Over Union |
| IRA | Irish Republican Army (referenced by a paper, regarding safety) |
| IS | Inception Score |
| ISBN | International Standard Book Number |
| ISSN | International Standard Serial Number |
| ITN | Inverse Text Normalization |
| JSON | JavaScript Object Notation |
| k | A variable often used to represent a count, as in k-fold CV or k-means |
| K80 | Kepler 80 Nvidia GPU |
| KITTI | Karlsruhe Institute of Technology and Toyota Technological Institute |
| KL | Kullback - Leibler divergence (relative entropy) |
| KTO | Kahneman-Tversky Optimization |
| l1, l2 | Lebesgue space norm, defined as the "p"-th root of the sum of absolute values raised to the "p"-th power [see the sketch after this table] |
| L-BFGS | Limited-memory Broyden Fletcher Goldfarb Shanno optimization |
| LAMB | Layerwise Adaptive Moments optimizer for Batch training |
| LaMDA | Language Model for Dialog Applications |
| LCFT | Long Context Fine Tuning |
| LG | LLaMA Guard |
| LHC | Large Hadron Collider |
| libROSA | library for the Recognition and Organization of Speech and Audio |
| LID | Language IDentification |
| LLaMA | Large Language model Meta AI |
| LLaVA | Large Language and Vision Assistant |
| LLM | Large Language Model |
| LM | Language Model |
| LMSys | Large Model Systems (organization) |
| log | logarithm [base is 'e' (Euler's number), unless specified otherwise] |
| LoRA | Low Rank Adaptation |
| LR | Learning Rate |
| LREC | Language Resources and Evaluation Conference |
| LSAT | Law School Admission Test |
| LSTM | Long Short-Term Memory cell (a set of 4 or 8 matrices) |
| LT | Lost Toxicity |
| M4T | Massively Multilingual and Multimodal Machine Translation |
| M60 | Maxwell 60 Nvidia GPU |
| MAE | Mean Absolute Error |
| MAP | Maximum A Posteriori |
| MAP@k | Mean Average Precision for 'k' recommendations |
| MAP-Elites | Multi-dimensional Archive of Phenotypic Elites |
| MAST | ML Application Scheduler on Twine (Twine is Meta's cluster management system) |
| MATH | Mathematics Aptitude Test of Heuristics (dataset) |
| MB | Mega Bytes |
| MBPP | Mostly Basic Python Problems (dataset) |
| MC | Monte Carlo |
| MC | Multiple Choice |
| MCMC | Markov Chain Monte Carlo |
| MCQ | Multiple Choice Question |
| MCTS | Monte Carlo Tree Search |
| MDP | Markov Decision Process |
| METEOR | Metric for Evaluation of Translation with Explicit ORdering |
| MFU | Model FLOPs Utilization |
| MFCC | Mel(ody) Frequency Cepstral Coefficients |
| MGSM | Multilingual Grade School Math |
| MHR | Modularity - Hierarchy - Reuse |
| MIPRO | Multi-prompt Instruction PRoposal Optimizer |
| MIT | Massachusetts Institute of Technology license |
| ML | Machine Learning |
| MLE | Maximum Likelihood Estimate |
| MLM | Masked Language Modeling |
| MLP | Multi-Layer Perceptron (stack of "dense" layers) |
| MLS | Multilingual LibriSpeech |
| MMDialog | Multi-Modal Dialog |
| MMLU | Massive Multi-task Language Understanding |
| MMLU-Pro | Massive Multi-task Language Understanding - Professional |
| MMMU | Massive Multi-discipline Multimodal Understanding (benchmark) |
| MNIST | Modified NIST |
| MNLI | Multi-genre Natural Language Inference dataset |
| MoE | Mixture of Experts |
| MPNet | Masked and Permuted pre-training Network |
| MPQA | Multi-Perspective Question Answering dataset |
| MPT | MosaicML Pretrained Transformer (Databricks) |
| MR | Movie Reviews dataset |
| MRI | Magnetic Resonance Imaging |
| MRPC | Microsoft Research Paraphrase Corpus |
| MSE | Mean Squared Error |
| MT | Machine Translation |
| MT-Bench | Multi-Turn Benchmark |
| MuSR | Multi-step Soft Reasoning |
| MXNet | Mixing eager and graph mode for Networks |
| n | A variable often used for a count of something; e.g. n-dimensional or n-gram |
| NAACL | North American chapter of the ACL |
| NaN | Not a Number |
| NAS | Neural Architecture Search |
| NCCL | Nvidia Collective Communications Library |
| NCCLX | Nvidia Collective Communications Library eXtension (Meta) |
| NDCG@k | Normalized Discounted Cumulative Gain for 'k' recommendations |
| NER | Named Entity Recognition |
| NeurIPS | Neural Information Processing Systems |
| NExT-QA | Next generation of VQA models to Explain Temporal actions |
| NF4 | Normal Float 4 (4-bits) |
| NIC | Network Interface Card |
| NIH | National Institutes of Health |
| NIH | Needle In a Haystack |
| NIST | National Institute of Standards and Technology |
| NLG | Natural Language Generation |
| NLI | Natural Language Inference (entailment: if A then B; contradiction: if A then not B) |
| NLL | Negative Log Likelihood |
| NLLB | No Language Left Behind (translation) |
| NLP | Natural Language Processing |
| NLTK | Natural Language ToolKit |
| NMS | Non Max Suppression |
| NMT | Neural Machine Translation |
| NN | Nearest Neighbor |
| NN | Neural Network |
| NPC | Non-Playable Character (a character controlled by a computer) |
| NSFW | Not Safe For Work |
| NUMA | Non-Uniform Memory Access |
| NumPy | Numeric library for Python |
| Nvidia | "invidia" is Latin for "envy", which sounds like a pronunciation of NV (Next Vision) |
| OBQA | Open Book Question Answering |
| OCR | Optical Character Recognition |
| OGB | Open Graph Benchmark |
| OGBN | OGB Node property prediction task |
| OOV | Out Of Vocabulary |
| OPRO | Optimization by PROmpting |
| OPT | Open Pre-trained Transformer |
| P40 | Pascal 40 Nvidia GPU |
| PAIR | Prompt Automatic Iterative Refinement |
| PaLM | Pathways Language Model |
| PAWS | Paraphrase Adversaries from Word Scrambling |
| PB | Peta Bytes |
| PCA | Principal Component Analysis |
| PCam | Patch Camelyon |
| PCI | Peripheral Component Interconnect |
| PDF | Portable Document Format |
| PDF | Probability Density Function |
| PEFT | Parameter Efficient Fine Tuning |
| PER | Prioritized Experience Replay |
| PG | Policy Gradient |
| PHP | Personal Home Page |
| PHP | PHP: Hypertext Preprocessor |
| PhotoDNA | Photo DeoxyriboNucleic Acid (image identification) |
| PID | Process Identifier |
| PII | Personally Identifiable Information |
| PIL | Python Imaging Library |
| PIQA | Physical Interaction Question Answering |
| Pixel | Picture element |
| PLM | Permuted Language Modeling |
| PM | Prosody Model |
| PMLR | Proceedings of Machine Learning Research |
| PNG | Portable Network Graphics image format |
| POMDP | Partially Observable Markov Decision Process |
| POS | Part Of Speech |
| PP | Pipeline Parallelism |
| PPO | Proximal Policy Optimization |
| PR | Precision vs Recall curve |
| ProLog | Programming in Logic language |
| PTB | Penn TreeBank |
| PubMed | indexed Published Medical literature |
| PUE | Power Usage Effectiveness (GPUs require cooling) |
| pvalue | probability, assuming the null hypothesis is true, of observing a result at least as extreme as the one obtained |
| QA | Question Answering |
| QKV | Query Key Value |
| QLoRA | Quantized Low Rank Adaptation |
| QNLI | Question-answering Natural Language Inference dataset |
| QQP | Quora Question Pairs (dataset) |
| QuAC | Question Answering in Context dataset |
| QuALITY | Question Answering with Long Input Texts, Yes! |
| QT | Quality Tuning |
| R-CNN | Region-based CNN |
| RaCE | Reading Comprehension dataset from Examinations |
| RAG | Retrieval Augmented Generation |
| RAGAS | RAG ASsessment (framework) |
| RAM | Random Access Memory |
| Rand | Random |
| RDMA | Remote Direct Memory Access |
| ReAct | Reasoning and Acting (agent loop) |
| REINFORCE | REward Increment = Nonnegative Factor times Offset Reinforcement times Characteristic Eligibility |
| ReLU | Rectified Linear Unit |
| RESISC | Remote Sensing Image Scene Classification |
| ResNet | Residual Network |
| REST | REpresentational State Transfer |
| RGB | Red, Green, and Blue |
| RL | Reinforcement Learning |
| RLAIF | Reinforcement Learning from AI Feedback |
| RLHF | Reinforcement Learning from Human Feedback |
| RM | Reward Model |
| RMSnorm | Root Mean Square normalization |
| RMSprop | Root Mean Square gradient propagation |
| RNN | Recurrent Neural Network |
| RoBERTa | Robustly optimized BERT approach |
| ROC | Receiver Operating Characteristic curve |
| RoCE | RDMA over Converged Ethernet |
| ROI | Region Of Interest |
| RoPE | Rotary Position Embeddings |
| ROUGE | Recall-Oriented Understudy for Gisting Evaluation |
| RS | Rejection Sampling |
| RT | RunTime; also RealTime |
| RTE | Recognizing Textual Entailment dataset |
| RTX | Ray-tracing Texel eXtreme |
| RWKV | Receptance Weighted Key Value (architecture) |
| SAC | Soft Actor Critic |
| SARSA | State Action Reward State Action |
| SAT | Scholastic Aptitude Test |
| SBERT | Sentence BERT |
| SciPy | Scientific library for Python |
| SDPA | Scaled Dot Product Attention |
| SELU | Scaled Exponential Linear Unit |
| SentEval | Sentence Evaluation |
| seq2seq | sequence-to-sequence |
| SFT | Supervised Fine Tuning |
| SG | Skip-Gram |
| SGD | Stochastic Gradient Descent |
| SGM | Standard Generalized Markup text format |
| SICK-R | Sentences Involving Compositional Knowledge - Relatedness |
| SIGCOMM | Special Interest Group on data COMMunications |
| SIGIR | Special Interest Group on Information Retrieval |
| SiLU | Sigmoid Linear Unit (activation function); aka Swish |
| SIQA | Social Interaction Question Answering |
| SLM | Strange Loop Machine (MDP loop) |
| SLT | Spoken Language Technology |
| SLT | Statistical Learning Theory |
| SME | Subject Matter Expert |
| SMI | System Management Interface |
| SMoE | Sparse Mixture of Experts |
| SNAP | Stanford Network Analysis Platform |
| SNLI | Stanford Natural Language Inference dataset |
| SO | Shared Object |
| spaCy | syntactic parser using C-extensions for python (Cython) |
| SQL | Structured Query Language |
| SRAM | Static Random Access Memory |
| SRN | Simple Recurrent Network [refers to SimpleRNN() layer] |
| SSCD | Self-Supervised Copy Detection |
| SSD | Single Shot multibox Detector |
| SSD | Solid State Drive |
| SST | Stanford Sentiment Treebank |
| STEM | Science, Technology, Engineering, and Mathematics |
| STL | Self-Taught Learning |
| STSb | Semantic Text Similarity benchmark |
| SUN | Scene Understanding dataset |
| SUTLM | Speech Unit and Text Language Model |
| SVHN | Street View House Numbers dataset |
| SVM | Support Vector Machine |
| SWA | Sliding Window Attention |
| SWAG | Situations With Adversarial Generations |
| swin | shifted window (transformer) |
| SwiGLU | Swish Gated Linear Unit (activation function) |
| SXM# | Server PCI eXpress Module, with version number |
| t | A variable often used for a test statistic, as in t statistic, t distribution, t test |
| T5 | Text-To-Text Transfer Transformer |
| tanh | hyperbolic tangent |
| TB | Tera Bytes |
| tCO2eq | tonnes of carbon dioxide equivalent |
| TD | Temporal Difference |
| TDP | Thermal Design Power |
| TD3 | Twin Delayed Deep Deterministic policy gradient |
| TDNN-OPGRU | Time-Delay Neural Network with Output-gate Projected GRU |
| Texel | Texture element |
| TextVQA | Text Visual Question Answering |
| TF | Term Frequency |
| TF-IDF | Term Frequency - Inverse Document Frequency |
| TL;DR | Too Long; Didn't Read: a prefix for a summary |
| TN | Text Normalization |
| TN | True Negative [Actual = Negative; Prediction = Negative] |
| TP | Tensor Parallelism (feature chunks processed in parallel) |
| TP | True Positive [Actual = Positive; Prediction = Positive] |
| TPR | True Positive Rate |
| TPU | Tensor Processing Unit |
| TReC | Text Retrieval Conference |
| TRL | Transformer Reinforcement Learning |
| TRPO | Trust Region Policy Optimization |
| TSNE | T-distributed Stochastic Neighbor Embedding |
| TSV | Tab Separated Values |
| TTS | Text To Speech |
| TV | Television |
| TVQA | Television Question Answering (dataset) |
| UCB | Upper Confidence Bound |
| UCF | University of Central Florida |
| ULMFiT | Universal Language Model Fine Tuning |
| UMAP | Uniform Manifold Approximation and Projection |
| URL | Uniform Resource Locator |
| US | United States |
| USA | United States of America |
| USE | Universal Sentence Encoder |
| USENIX | Unix Users Group (organization) |
| UTF-8 | Unicode Transformation Format - 8-bit, where a character can be represented by a 1-byte, 2-byte, 3-byte, or 4-byte sequence; the first byte of a character determines how many bytes are used to represent the character [0-127 are 1-byte ASCII values] [see the sketch after this table] |
| V100 | Volta 100 Nvidia GPU |
| VAD | Voice Activity Detection |
| VAE | Variational AutoEncoder |
| VGG-16 | Oxford University Visual Geometry Group 16-layer network |
| VI | Variational Inference |
| ViP-LLaVA | Visual Prompt - LLaVA |
| ViT | Vision Transformer |
| vLLM | virtual LLM (inference engine) |
| VM | Virtual Machine |
| VOC | Visual Object Classes (challenge) |
| vocoder | voice encoder |
| VPG | Vanilla Policy Gradient |
| VQA | Visual Question Answering |
| VR | Violation Rate (false negative rate for safety) |
| VR | Virtual Reality |
| VRAM | Video RAM |
| VTAB | Visual Task Adaptation Benchmark |
| W&B | Weights and Biases |
| WACV | Winter conference on Applications of Computer Vision |
| Wav | Waveform audio format |
| WER | Word Error Rate |
| WinoGrande | adversarial Winograd Schema challenge (identify the antecedent of an ambiguous term) |
| WNLI | Winograd Natural Language Inference dataset |
| WuPS | Wu and Palmer Similarity |
| XAI | X (formerly Twitter) AI |
| XGBoost | eXtreme Gradient Boosting |
| XLA | accelerated Linear Algebra |
| XLM-R | cross-lingual Language Model - RoBERTa, where 'X' represents a cross |
| XS Test | eXaggerated Safety behaviors Test |
| YFCC | Yahoo Flickr Creative Commons |
| YOLO | You Only Look Once |
| ZeroSCROLLS | Zero-shot CompaRison Over Long Language Sequences |
| ZIP | Zone Improvement Plan |
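
A few entries above reference short sketches; they are collected here. For the BF16 entry, a minimal decoding sketch, assuming PyTorch is installed; it applies the formula from the entry to the raw 16 bits:

```python
import torch

def decode_bf16(x: float) -> float:
    """Reconstruct a bfloat16 value from its 16 raw bits: 1 sign, 8 exponent, 7 mantissa.
    Subnormals, infinities, and NaNs are ignored for simplicity."""
    bits = int(torch.tensor(x, dtype=torch.bfloat16).view(torch.uint16))
    sign = (bits >> 15) & 0x1       # bit 15
    exponent = (bits >> 7) & 0xFF   # bits 14..7, biased by 127
    mantissa = bits & 0x7F          # bits 6..0
    return ((-1) ** sign) * 2.0 ** (exponent - 127) * (1 + mantissa / 128)

print(decode_bf16(-1.5))  # -1.5
```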
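For the BM25 entry, a minimal sketch of a single term's score contribution; k1 = 1.5, b = 0.75, and the +1 inside the IDF are assumed Lucene-style defaults, not something the glossary specifies:

```python
import math

def bm25_term_score(tf, doc_len, avg_doc_len, n_docs, n_docs_with_term, k1=1.5, b=0.75):
    """Okapi BM25 contribution of one query term to one document's score."""
    idf = math.log((n_docs - n_docs_with_term + 0.5) / (n_docs_with_term + 0.5) + 1)
    # k1 saturates term frequency; b scales the document-length normalization
    saturated_tf = tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avg_doc_len))
    return idf * saturated_tf

# Doubling the term frequency does not double the score (saturation):
print(bm25_term_score(tf=2, doc_len=100, avg_doc_len=120, n_docs=1000, n_docs_with_term=50))
print(bm25_term_score(tf=4, doc_len=100, avg_doc_len=120, n_docs=1000, n_docs_with_term=50))
```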
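For the CI entry, a minimal sketch of a 95% confidence interval for a mean, assuming the normal approximation is acceptable; for a small sample a t quantile (about 2.571 for 5 degrees of freedom) would give a wider, more exact interval:

```python
import math

def mean_ci_95(values):
    """Normal-approximation 95% confidence interval for the mean of `values`."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)  # sample variance
    half_width = 1.96 * math.sqrt(var / n)                # 1.96 ~ 97.5th percentile of N(0, 1)
    return mean - half_width, mean + half_width

print(mean_ci_95([0.71, 0.74, 0.69, 0.73, 0.72, 0.70]))
```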
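For the CSI entry, a minimal sketch that prints text on a 24-bit RGB background using the escape sequence shown in the entry; it assumes a terminal that supports truecolor:

```python
red, green, blue = 0, 128, 255
start = f'\x1b[0;30;48;2;{red};{green};{blue}m'  # reset; black foreground; RGB background
reset = '\x1b[0m'                                # clear all attributes afterwards
print(f'{start}hello{reset}')
```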
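For the F-beta and F1 entries, a minimal sketch computing both directly from confusion-matrix counts; the example counts are made up for illustration:

```python
def f_beta(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """F-beta = TP / (TP + (FP + beta**2 * FN) / (1 + beta**2)); beta = 1 gives F1."""
    return tp / (tp + (fp + beta**2 * fn) / (1 + beta**2))

print(f_beta(tp=8, fp=2, fn=4))            # F1 = 8 / 11 ~ 0.727
print(f_beta(tp=8, fp=2, fn=4, beta=2.0))  # F2 weights recall more heavily
```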
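For the l1, l2 entry, a minimal sketch of the Lebesgue p-norm as defined there:

```python
def lp_norm(values, p=2):
    """p-th root of the sum of absolute values raised to the p-th power."""
    return sum(abs(v) ** p for v in values) ** (1 / p)

print(lp_norm([3, -4], p=1))  # 7.0 (l1 norm)
print(lp_norm([3, -4], p=2))  # 5.0 (l2 norm)
```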
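For the UTF-8 entry, a minimal sketch showing the 1- to 4-byte sequences described there; the sample characters are arbitrary:

```python
for ch in ['A', 'é', '€', '🙂']:
    encoded = ch.encode('utf-8')
    print(ch, len(encoded), encoded.hex())
# A  -> 1 byte  (41)
# é  -> 2 bytes (c3 a9)
# €  -> 3 bytes (e2 82 ac)
# 🙂 -> 4 bytes (f0 9f 99 82)
```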