# assignment1-basics

**Repository Path**: jiandandema/assignment1-basics

## Basic Information

- **Project Name**: assignment1-basics
- **Description**: Student version of Assignment 1 for Stanford CS336 - Language Modeling From Scratch
- **Primary Language**: Unknown
- **License**: MIT
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-12-21
- **Last Updated**: 2025-12-21

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# CS336 Spring 2025 Assignment 1: Basics

For a full description of the assignment, see the assignment handout at
[cs336_spring2025_assignment1_basics.pdf](./cs336_spring2025_assignment1_basics.pdf).

If you see any issues with the assignment handout or code, please feel free to
raise a GitHub issue or open a pull request with a fix.

## Setup

### Environment

We manage our environments with `uv` to ensure reproducibility, portability, and ease of use.
Install `uv` [here](https://github.com/astral-sh/uv) (recommended), or run `pip install uv`/`brew install uv`.
We recommend reading a bit about managing projects in `uv` [here](https://docs.astral.sh/uv/guides/projects/#managing-dependencies) (you will not regret it!).

You can now run any code in the repo using

```sh
uv run
```

and the environment will be automatically solved and activated when necessary.

### Run unit tests

```sh
uv run pytest
```

Initially, all tests should fail with `NotImplementedError`s.
To connect your implementation to the tests, complete the
functions in [./tests/adapters.py](./tests/adapters.py).

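
The adapter pattern described above can be illustrated with a minimal, self-contained sketch. All names and signatures here (`my_softmax`, `run_softmax_unimplemented`, `run_softmax`) are hypothetical and use plain Python lists to stay dependency-free. The real adapters in `tests/adapters.py` may differ in both name and argument types, so treat that file as the source of truth.

```python
import math


def my_softmax(values: list[float]) -> list[float]:
    """Hypothetical student implementation; stands in for code in your own module."""
    shift = max(values)  # subtract the max for numerical stability
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def run_softmax_unimplemented(values: list[float]) -> list[float]:
    """Assumed shape of an adapter before it is filled in: tests calling it fail."""
    raise NotImplementedError


def run_softmax(values: list[float]) -> list[float]:
    """Assumed shape of a completed adapter: it only delegates to your implementation."""
    return my_softmax(values)
```

Keeping each adapter as a thin wrapper leaves you free to organize your own modules however you like, since the tests only depend on the adapter's inputs and outputs.
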
### Download data

Download the TinyStories data and a subsample of OpenWebText:

```sh
mkdir -p data
cd data

wget https://huggingface.co/datasets/roneneldan/TinyStories/resolve/main/TinyStoriesV2-GPT4-train.txt
wget https://huggingface.co/datasets/roneneldan/TinyStories/resolve/main/TinyStoriesV2-GPT4-valid.txt

wget https://huggingface.co/datasets/stanford-cs336/owt-sample/resolve/main/owt_train.txt.gz
gunzip owt_train.txt.gz
wget https://huggingface.co/datasets/stanford-cs336/owt-sample/resolve/main/owt_valid.txt.gz
gunzip owt_valid.txt.gz

cd ..
```
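
After the download, it can help to confirm that all four files landed where the commands above put them. The following is a minimal sketch rather than part of the assignment: the helper name `missing_data_files` is hypothetical, and it assumes the files sit in `data/` under the names produced by the `wget` and `gunzip` steps.

```python
from pathlib import Path

# File names as they appear after the wget/gunzip commands above.
EXPECTED_FILES = (
    "TinyStoriesV2-GPT4-train.txt",
    "TinyStoriesV2-GPT4-valid.txt",
    "owt_train.txt",
    "owt_valid.txt",
)


def missing_data_files(data_dir: str = "data") -> list[str]:
    """Return the expected files that are absent or empty in data_dir (hypothetical helper)."""
    root = Path(data_dir)
    return [
        name
        for name in EXPECTED_FILES
        if not (root / name).is_file() or (root / name).stat().st_size == 0
    ]


if __name__ == "__main__":
    missing = missing_data_files()
    if missing:
        print("Missing or empty:", ", ".join(missing))
    else:
        print("All four data files are present.")
```

The check only looks at presence and non-zero size; it does not verify file contents or checksums.
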