# sutskever-30-implementations **Repository Path**: dragon515/sutskever-30-implementations ## Basic Information - **Project Name**: sutskever-30-implementations - **Description**: No description available - **Primary Language**: Unknown - **License**: Not specified - **Default Branch**: main - **Homepage**: None - **GVP Project**: No ## Statistics - **Stars**: 0 - **Forks**: 0 - **Created**: 2026-02-03 - **Last Updated**: 2026-02-03 ## Categories & Tags **Categories**: Uncategorized **Tags**: None ## README # Sutskever 30 - Complete Implementation Suite **Comprehensive toy implementations of the 30 foundational papers recommended by Ilya Sutskever** [![Implementations](https://img.shields.io/badge/Implementations-30%2F30-brightgreen)](https://github.com/pageman/sutskever-30-implementations) [![Coverage](https://img.shields.io/badge/Coverage-100%25-blue)](https://github.com/pageman/sutskever-30-implementations) [![Python](https://img.shields.io/badge/Python-NumPy%20Only-yellow)](https://numpy.org/) ## Overview This repository contains detailed, educational implementations of the papers from Ilya Sutskever's famous reading list - the collection he told John Carmack would teach you "90% of what matters" in deep learning. **Progress: 30/30 papers (100%) - COMPLETE! πŸŽ‰** Each implementation: - βœ… Uses only NumPy (no deep learning frameworks) for educational clarity - βœ… Includes synthetic/bootstrapped data for immediate execution - βœ… Provides extensive visualizations and explanations - βœ… Demonstrates core concepts from each paper - βœ… Runs in Jupyter notebooks for interactive learning ## Quick Start ```bash # Navigate to the directory cd sutskever-30-implementations # Install dependencies pip install numpy matplotlib scipy # Run any notebook jupyter notebook 02_char_rnn_karpathy.ipynb ``` ## The Sutskever 30 Papers ### Foundational Concepts (Papers 1-5) | # | Paper | Notebook | Key Concepts | |---|-------|----------|--------------| | 1 | The First Law of Complexodynamics | βœ… `01_complexity_dynamics.ipynb` | Entropy, Complexity Growth, Cellular Automata | | 2 | The Unreasonable Effectiveness of RNNs | βœ… `02_char_rnn_karpathy.ipynb` | Character-level models, RNN basics, Text generation | | 3 | Understanding LSTM Networks | βœ… `03_lstm_understanding.ipynb` | Gates, Long-term memory, Gradient flow | | 4 | RNN Regularization | βœ… `04_rnn_regularization.ipynb` | Dropout for sequences, Variational dropout | | 5 | Keeping Neural Networks Simple | βœ… `05_neural_network_pruning.ipynb` | MDL principle, Weight pruning, 90%+ sparsity | ### Architectures & Mechanisms (Papers 6-15) | # | Paper | Notebook | Key Concepts | |---|-------|----------|--------------| | 6 | Pointer Networks | βœ… `06_pointer_networks.ipynb` | Attention as pointer, Combinatorial problems | | 7 | ImageNet/AlexNet | βœ… `07_alexnet_cnn.ipynb` | CNNs, Convolution, Data augmentation | | 8 | Order Matters: Seq2Seq for Sets | βœ… `08_seq2seq_for_sets.ipynb` | Set encoding, Permutation invariance, Attention pooling | | 9 | GPipe | βœ… `09_gpipe.ipynb` | Pipeline parallelism, Micro-batching, Re-materialization | | 10 | Deep Residual Learning (ResNet) | βœ… `10_resnet_deep_residual.ipynb` | Skip connections, Gradient highways | | 11 | Dilated Convolutions | βœ… `11_dilated_convolutions.ipynb` | Receptive fields, Multi-scale | | 12 | Neural Message Passing (GNNs) | βœ… `12_graph_neural_networks.ipynb` | Graph networks, Message passing | | 13 | **Attention Is All You Need** | βœ… `13_attention_is_all_you_need.ipynb` | Transformers, Self-attention, Multi-head | | 14 | Neural Machine Translation | βœ… `14_bahdanau_attention.ipynb` | Seq2seq, Bahdanau attention | | 15 | Identity Mappings in ResNet | βœ… `15_identity_mappings_resnet.ipynb` | Pre-activation, Gradient flow | ### Advanced Topics (Papers 16-22) | # | Paper | Notebook | Key Concepts | |---|-------|----------|--------------| | 16 | Relational Reasoning | βœ… `16_relational_reasoning.ipynb` | Relation networks, Pairwise functions | | 17 | **Variational Lossy Autoencoder** | βœ… `17_variational_autoencoder.ipynb` | VAE, ELBO, Reparameterization trick | | 18 | **Relational RNNs** | βœ… `18_relational_rnn.ipynb` | Relational memory, Multi-head self-attention, Manual backprop (~1100 lines) | | 19 | The Coffee Automaton | βœ… `19_coffee_automaton.ipynb` | Irreversibility, Entropy, Arrow of time, Landauer's principle | | 20 | **Neural Turing Machines** | βœ… `20_neural_turing_machine.ipynb` | External memory, Differentiable addressing | | 21 | Deep Speech 2 (CTC) | βœ… `21_ctc_speech.ipynb` | CTC loss, Speech recognition | | 22 | **Scaling Laws** | βœ… `22_scaling_laws.ipynb` | Power laws, Compute-optimal training | ### Theory & Meta-Learning (Papers 23-30) | # | Paper | Notebook | Key Concepts | |---|-------|----------|--------------| | 23 | MDL Principle | βœ… `23_mdl_principle.ipynb` | Information theory, Model selection, Compression | | 24 | **Machine Super Intelligence** | βœ… `24_machine_super_intelligence.ipynb` | Universal AI, AIXI, Solomonoff induction, Intelligence measures, Self-improvement | | 25 | Kolmogorov Complexity | βœ… `25_kolmogorov_complexity.ipynb` | Compression, Algorithmic randomness, Universal prior | | 26 | **CS231n: CNNs for Visual Recognition** | βœ… `26_cs231n_cnn_fundamentals.ipynb` | Image classification pipeline, kNN/Linear/NN/CNN, Backprop, Optimization, Babysitting neural nets | | 27 | Multi-token Prediction | βœ… `27_multi_token_prediction.ipynb` | Multiple future tokens, Sample efficiency, 2-3x faster | | 28 | Dense Passage Retrieval | βœ… `28_dense_passage_retrieval.ipynb` | Dual encoders, MIPS, In-batch negatives | | 29 | Retrieval-Augmented Generation | βœ… `29_rag.ipynb` | RAG-Sequence, RAG-Token, Knowledge retrieval | | 30 | Lost in the Middle | βœ… `30_lost_in_middle.ipynb` | Position bias, Long context, U-shaped curve | ## Featured Implementations ### 🌟 Must-Read Notebooks These implementations cover the most influential papers and demonstrate core deep learning concepts: #### Foundations 1. **`02_char_rnn_karpathy.ipynb`** - Character-level RNN - Build RNN from scratch - Understand backpropagation through time - Generate text 2. **`03_lstm_understanding.ipynb`** - LSTM Networks - Implement forget/input/output gates - Visualize gate activations - Compare with vanilla RNN 3. **`04_rnn_regularization.ipynb`** - RNN Regularization - Variational dropout for RNNs - Proper dropout placement - Training improvements 4. **`05_neural_network_pruning.ipynb`** - Network Pruning & MDL - Magnitude-based pruning - Iterative pruning with fine-tuning - 90%+ sparsity with minimal loss - Minimum Description Length principle #### Computer Vision 5. **`07_alexnet_cnn.ipynb`** - CNNs & AlexNet - Convolutional layers from scratch - Max pooling and ReLU - Data augmentation techniques 6. **`10_resnet_deep_residual.ipynb`** - ResNet - Skip connections solve degradation - Gradient flow visualization - Identity mapping intuition 7. **`15_identity_mappings_resnet.ipynb`** - Pre-activation ResNet - Pre-activation vs post-activation - Better gradient flow - Training 1000+ layer networks 8. **`11_dilated_convolutions.ipynb`** - Dilated Convolutions - Multi-scale receptive fields - No pooling required - Semantic segmentation #### Attention & Transformers 9. **`14_bahdanau_attention.ipynb`** - Neural Machine Translation - Original attention mechanism - Seq2seq with alignment - Attention visualization 10. **`13_attention_is_all_you_need.ipynb`** - Transformers - Scaled dot-product attention - Multi-head attention - Positional encoding - Foundation of modern LLMs 11. **`06_pointer_networks.ipynb`** - Pointer Networks - Attention as selection - Combinatorial optimization - Variable output size 12. **`08_seq2seq_for_sets.ipynb`** - Seq2Seq for Sets - Permutation-invariant set encoder - Read-Process-Write architecture - Attention over unordered elements - Sorting and set operations - Comparison: order-sensitive vs order-invariant 13. **`09_gpipe.ipynb`** - GPipe Pipeline Parallelism - Model partitioning across devices - Micro-batching for pipeline utilization - F-then-B schedule (forward all, backward all) - Re-materialization (gradient checkpointing) - Bubble time analysis - Training models larger than single-device memory #### Advanced Topics 14. **`12_graph_neural_networks.ipynb`** - Graph Neural Networks - Message passing framework - Graph convolutions - Molecular property prediction 15. **`16_relational_reasoning.ipynb`** - Relation Networks - Pairwise relational reasoning - Visual QA - Permutation invariance 16. **`18_relational_rnn.ipynb`** - Relational RNN - LSTM with relational memory - Multi-head self-attention across memory slots - Architecture demonstration (forward pass) - Sequential reasoning tasks - **Section 11: Manual backpropagation implementation (~1100 lines)** - Complete gradient computation for all components - Gradient checking with numerical verification 17. **`20_neural_turing_machine.ipynb`** - Memory-Augmented Networks - Content & location addressing - Differentiable read/write - External memory 18. **`21_ctc_speech.ipynb`** - CTC Loss & Speech Recognition - Connectionist Temporal Classification - Alignment-free training - Forward algorithm #### Generative Models 19. **`17_variational_autoencoder.ipynb`** - VAE - Generative modeling - ELBO loss - Latent space visualization #### Modern Applications 20. **`27_multi_token_prediction.ipynb`** - Multi-Token Prediction - Predict multiple future tokens - 2-3x sample efficiency - Speculative decoding - Faster training & inference 21. **`28_dense_passage_retrieval.ipynb`** - Dense Retrieval - Dual encoder architecture - In-batch negatives - Semantic search 22. **`29_rag.ipynb`** - Retrieval-Augmented Generation - RAG-Sequence vs RAG-Token - Combining retrieval + generation - Knowledge-grounded outputs 23. **`30_lost_in_middle.ipynb`** - Long Context Analysis - Position bias in LLMs - U-shaped performance curve - Document ordering strategies #### Scaling & Theory 24. **`22_scaling_laws.ipynb`** - Scaling Laws - Power law relationships - Compute-optimal training - Performance prediction 25. **`23_mdl_principle.ipynb`** - Minimum Description Length - Information-theoretic model selection - Compression = Understanding - MDL vs AIC/BIC comparison - Neural network architecture selection - MDL-based pruning (connects to Paper 5) - Kolmogorov complexity preview 26. **`25_kolmogorov_complexity.ipynb`** - Kolmogorov Complexity - K(x) = shortest program generating x - Randomness = Incompressibility - Algorithmic probability (Solomonoff) - Universal prior for induction - Connection to Shannon entropy - Occam's Razor formalized - Theoretical foundation for ML 27. **`24_machine_super_intelligence.ipynb`** - Universal Artificial Intelligence - **Formal theory of intelligence (Legg & Hutter)** - Psychometric g-factor and universal intelligence Ξ₯(Ο€) - Solomonoff induction for sequence prediction - AIXI: Theoretically optimal RL agent - Monte Carlo AIXI (MC-AIXI) approximation - Kolmogorov complexity estimation - Intelligence measurement across environments - Recursive self-improvement dynamics - Intelligence explosion scenarios - **6 sections: from psychometrics to superintelligence** - Connects Papers #23 (MDL), #25 (Kolmogorov), #8 (DQN) 28. **`01_complexity_dynamics.ipynb`** - Complexity & Entropy - Cellular automata (Rule 30) - Entropy growth - Irreversibility (basic introduction) 28. **`19_coffee_automaton.ipynb`** - The Coffee Automaton (Deep Dive) - **Comprehensive exploration of irreversibility** - Coffee mixing and diffusion processes - Entropy growth and coarse-graining - Phase space and Liouville's theorem - PoincarΓ© recurrence theorem (will unmix after e^N time!) - Maxwell's demon and Landauer's principle - Computational irreversibility (one-way functions, hashing) - Information bottleneck in machine learning - Biological irreversibility (life and the 2nd law) - Arrow of time: fundamental vs emergent - **10 comprehensive sections exploring irreversibility across all scales** 29. **`26_cs231n_cnn_fundamentals.ipynb`** - CS231n: Vision from First Principles - **Complete vision pipeline in pure NumPy** - k-Nearest Neighbors baseline - Linear classifiers (SVM and Softmax) - Optimization (SGD, Momentum, Adam, learning rate schedules) - 2-layer neural networks with backpropagation - Convolutional layers (conv, pool, ReLU) - Complete CNN architecture (Mini-AlexNet) - Visualization techniques (filters, saliency maps) - Transfer learning principles - Babysitting tips (sanity checks, hyperparameter tuning, monitoring) - **10 sections covering entire CS231n curriculum** - Ties together Papers #7 (AlexNet), #10 (ResNet), #11 (Dilated Conv) ## Repository Structure ``` sutskever-30-implementations/ β”œβ”€β”€ README.md # This file β”œβ”€β”€ PROGRESS.md # Implementation progress tracking β”œβ”€β”€ IMPLEMENTATION_TRACKS.md # Detailed tracks for all 30 papers β”‚ β”œβ”€β”€ 01_complexity_dynamics.ipynb # Entropy & complexity β”œβ”€β”€ 02_char_rnn_karpathy.ipynb # Vanilla RNN β”œβ”€β”€ 03_lstm_understanding.ipynb # LSTM gates β”œβ”€β”€ 04_rnn_regularization.ipynb # Dropout for RNNs β”œβ”€β”€ 05_neural_network_pruning.ipynb # Pruning & MDL β”œβ”€β”€ 06_pointer_networks.ipynb # Attention pointers β”œβ”€β”€ 07_alexnet_cnn.ipynb # CNNs & AlexNet β”œβ”€β”€ 08_seq2seq_for_sets.ipynb # Permutation-invariant sets β”œβ”€β”€ 09_gpipe.ipynb # Pipeline parallelism β”œβ”€β”€ 10_resnet_deep_residual.ipynb # Residual connections β”œβ”€β”€ 11_dilated_convolutions.ipynb # Multi-scale convolutions β”œβ”€β”€ 12_graph_neural_networks.ipynb # Message passing GNNs β”œβ”€β”€ 13_attention_is_all_you_need.ipynb # Transformer architecture β”œβ”€β”€ 14_bahdanau_attention.ipynb # Original attention β”œβ”€β”€ 15_identity_mappings_resnet.ipynb # Pre-activation ResNet β”œβ”€β”€ 16_relational_reasoning.ipynb # Relation networks β”œβ”€β”€ 17_variational_autoencoder.ipynb # VAE β”œβ”€β”€ 18_relational_rnn.ipynb # Relational RNN β”œβ”€β”€ 19_coffee_automaton.ipynb # Irreversibility deep dive β”œβ”€β”€ 20_neural_turing_machine.ipynb # External memory β”œβ”€β”€ 21_ctc_speech.ipynb # CTC loss β”œβ”€β”€ 22_scaling_laws.ipynb # Empirical scaling β”œβ”€β”€ 23_mdl_principle.ipynb # MDL & compression β”œβ”€β”€ 24_machine_super_intelligence.ipynb # Universal AI & AIXI β”œβ”€β”€ 25_kolmogorov_complexity.ipynb # K(x) & randomness β”œβ”€β”€ 26_cs231n_cnn_fundamentals.ipynb # Vision from first principles β”œβ”€β”€ 27_multi_token_prediction.ipynb # Multi-token prediction β”œβ”€β”€ 28_dense_passage_retrieval.ipynb # Dense retrieval β”œβ”€β”€ 29_rag.ipynb # RAG architecture └── 30_lost_in_middle.ipynb # Long context analysis ``` **All 30 papers implemented! (100% complete!) πŸŽ‰** ## Learning Path ### Beginner Track (Start here!) 1. **Character RNN** (`02_char_rnn_karpathy.ipynb`) - Learn basic RNNs 2. **LSTM** (`03_lstm_understanding.ipynb`) - Understand gating mechanisms 3. **CNNs** (`07_alexnet_cnn.ipynb`) - Computer vision fundamentals 4. **ResNet** (`10_resnet_deep_residual.ipynb`) - Skip connections 5. **VAE** (`17_variational_autoencoder.ipynb`) - Generative models ### Intermediate Track 6. **RNN Regularization** (`04_rnn_regularization.ipynb`) - Better training 7. **Bahdanau Attention** (`14_bahdanau_attention.ipynb`) - Attention basics 8. **Pointer Networks** (`06_pointer_networks.ipynb`) - Attention as selection 9. **Seq2Seq for Sets** (`08_seq2seq_for_sets.ipynb`) - Permutation invariance 10. **CS231n** (`26_cs231n_cnn_fundamentals.ipynb`) - Complete vision pipeline (kNN β†’ CNNs) 11. **GPipe** (`09_gpipe.ipynb`) - Pipeline parallelism for large models 12. **Transformers** (`13_attention_is_all_you_need.ipynb`) - Modern architecture 13. **Dilated Convolutions** (`11_dilated_convolutions.ipynb`) - Receptive fields 14. **Scaling Laws** (`22_scaling_laws.ipynb`) - Understanding scale ### Advanced Track 15. **Pre-activation ResNet** (`15_identity_mappings_resnet.ipynb`) - Architecture details 16. **Graph Neural Networks** (`12_graph_neural_networks.ipynb`) - Graph learning 17. **Relation Networks** (`16_relational_reasoning.ipynb`) - Relational reasoning 18. **Neural Turing Machines** (`20_neural_turing_machine.ipynb`) - External memory 19. **CTC Loss** (`21_ctc_speech.ipynb`) - Speech recognition 20. **Dense Retrieval** (`28_dense_passage_retrieval.ipynb`) - Semantic search 21. **RAG** (`29_rag.ipynb`) - Retrieval-augmented generation 22. **Lost in the Middle** (`30_lost_in_middle.ipynb`) - Long context analysis ### Theory & Fundamentals 23. **MDL Principle** (`23_mdl_principle.ipynb`) - Model selection via compression 24. **Kolmogorov Complexity** (`25_kolmogorov_complexity.ipynb`) - Randomness & information 25. **Complexity Dynamics** (`01_complexity_dynamics.ipynb`) - Entropy & emergence 26. **Coffee Automaton** (`19_coffee_automaton.ipynb`) - Deep dive into irreversibility ## Key Insights from the Sutskever 30 ### Architecture Evolution - **RNN β†’ LSTM**: Gating solves vanishing gradients - **Plain Networks β†’ ResNet**: Skip connections enable depth - **RNN β†’ Transformer**: Attention enables parallelization - **Fixed vocab β†’ Pointers**: Output can reference input ### Fundamental Mechanisms - **Attention**: Differentiable selection mechanism - **Residual Connections**: Gradient highways - **Gating**: Learned information flow control - **External Memory**: Separate storage from computation ### Training Insights - **Scaling Laws**: Performance predictably improves with scale - **Regularization**: Dropout, weight decay, data augmentation - **Optimization**: Gradient clipping, learning rate schedules - **Compute-Optimal**: Balance model size and training data ### Theoretical Foundations - **Information Theory**: Compression, entropy, MDL - **Complexity**: Kolmogorov complexity, power laws - **Generative Modeling**: VAE, ELBO, latent spaces - **Memory**: Differentiable data structures ## Implementation Philosophy ### Why NumPy-only? These implementations deliberately avoid PyTorch/TensorFlow to: - **Deepen understanding**: See what frameworks abstract away - **Educational clarity**: No magic, every operation explicit - **Core concepts**: Focus on algorithms, not framework APIs - **Transferable knowledge**: Principles apply to any framework ### Synthetic Data Approach Each notebook generates its own data to: - **Immediate execution**: No dataset downloads required - **Controlled experiments**: Understand behavior on simple cases - **Concept focus**: Data doesn't obscure the algorithm - **Rapid iteration**: Modify and re-run instantly ## Extensions & Next Steps ### Build on These Implementations After understanding the core concepts, try: 1. **Scale up**: Implement in PyTorch/JAX for real datasets 2. **Combine techniques**: E.g., ResNet + Attention 3. **Modern variants**: - RNN β†’ GRU β†’ Transformer - VAE β†’ Ξ²-VAE β†’ VQ-VAE - ResNet β†’ ResNeXt β†’ EfficientNet 4. **Applications**: Apply to real problems ### Research Directions The Sutskever 30 points toward: - Scaling (bigger models, more data) - Efficiency (sparse models, quantization) - Capabilities (reasoning, multi-modal) - Understanding (interpretability, theory) ## Resources ### Original Papers See `IMPLEMENTATION_TRACKS.md` for full citations and links ### Additional Reading - [Ilya Sutskever's Reading List (GitHub)](https://github.com/dzyim/ilya-sutskever-recommended-reading) - [Aman's AI Journal - Sutskever 30 Primers](https://aman.ai/primers/ai/top-30-papers/) - [The Annotated Transformer](http://nlp.seas.harvard.edu/annotated-transformer/) - [Andrej Karpathy's Blog](http://karpathy.github.io/) ### Courses - Stanford CS231n: Convolutional Neural Networks - Stanford CS224n: NLP with Deep Learning - MIT 6.S191: Introduction to Deep Learning ## Contributing These implementations are educational and can be improved! Consider: - Adding more visualizations - Implementing missing papers - Improving explanations - Finding bugs - Adding comparisons with framework implementations ## Citation If you use these implementations in your work or teaching: ```bibtex @misc{sutskever30implementations, title={Sutskever 30: Complete Implementation Suite}, author={Paul "The Pageman" Pajo, pageman@gmail.com}, year={2025}, note={Educational implementations of Ilya Sutskever's recommended reading list, inspired by https://papercode.vercel.app/} } ``` ## License Educational use. See individual papers for original research citations. ## Acknowledgments - **Ilya Sutskever**: For curating this essential reading list - **Paper authors**: For their foundational contributions - **Community**: For making these ideas accessible --- ## Latest Additions (December 2025) ### Recently Implemented (21 new papers!) - βœ… **Paper 4**: RNN Regularization (variational dropout) - βœ… **Paper 5**: Neural Network Pruning (MDL, 90%+ sparsity) - βœ… **Paper 7**: AlexNet (CNNs from scratch) - βœ… **Paper 8**: Seq2Seq for Sets (permutation invariance, attention pooling) - βœ… **Paper 9**: GPipe (pipeline parallelism, micro-batching, re-materialization) - βœ… **Paper 19**: The Coffee Automaton (deep dive into irreversibility, entropy, Landauer's principle) - βœ… **Paper 26**: CS231n (complete vision pipeline: kNN β†’ CNN, all in NumPy) - βœ… **Paper 11**: Dilated Convolutions (multi-scale) - βœ… **Paper 12**: Graph Neural Networks (message passing) - βœ… **Paper 14**: Bahdanau Attention (original attention) - βœ… **Paper 15**: Identity Mappings ResNet (pre-activation) - βœ… **Paper 16**: Relational Reasoning (relation networks) - βœ… **Paper 18**: Relational RNNs (relational memory + Section 11: manual backprop ~1100 lines) - βœ… **Paper 21**: Deep Speech 2 (CTC loss) - βœ… **Paper 23**: MDL Principle (compression, model selection, connects to Papers 5 & 25) - βœ… **Paper 24**: Machine Super Intelligence (Universal AI, AIXI, Solomonoff induction, intelligence measures, recursive self-improvement) - βœ… **Paper 25**: Kolmogorov Complexity (randomness, algorithmic probability, theoretical foundation) - βœ… **Paper 27**: Multi-Token Prediction (2-3x sample efficiency) - βœ… **Paper 28**: Dense Passage Retrieval (dual encoders) - βœ… **Paper 29**: RAG (retrieval-augmented generation) - βœ… **Paper 30**: Lost in the Middle (long context) ## Quick Reference: Implementation Complexity ### Can Implement in an Afternoon - βœ… Character RNN - βœ… LSTM - βœ… ResNet - βœ… Simple VAE - βœ… Dilated Convolutions ### Weekend Projects - βœ… Transformer - βœ… Pointer Networks - βœ… Graph Neural Networks - βœ… Relation Networks - βœ… Neural Turing Machine - βœ… CTC Loss - βœ… Dense Retrieval ### Week-Long Deep Dives - βœ… Full RAG system - ⚠️ Large-scale experiments - ⚠️ Hyperparameter optimization --- **"If you really learn all of these, you'll know 90% of what matters today."** - Ilya Sutskever Happy learning! πŸš€