4 changes: 2 additions & 2 deletions phases/00-setup-and-tooling/09-data-management/docs/en.md
@@ -57,7 +57,7 @@ This downloads the IMDB movie review dataset. After the first download, it loads
Some datasets are too large to fit on disk. Streaming loads them row by row without downloading the full thing.

```python
-dataset = load_dataset("wikimedia/wikipedia", "20220301.en", split="train", streaming=True)
+dataset = load_dataset("wikimedia/wikipedia", "20231101.en", split="train", streaming=True)

for i, example in enumerate(dataset):
    print(example["title"])
@@ -140,7 +140,7 @@ Model weights and large datasets should not go into git. Three options:

**Option A: .gitignore (simplest)**

-```
+```text
*.bin
*.safetensors
*.pt