What happens if paging file is too small?

I have ~50000 images and annotation files for training a YOLOv5 object detection model. I've trained a model no problem using just CPU on another computer, but it takes too long, so I need GPU training. My problem is, when I try to train with a GPU I keep getting this error:

OSError: [WinError 1455] The paging file is too small for this operation to complete

This is the command I'm executing:

train.py --img 640 --batch 4 --epochs 100 --data myyaml.yaml --weights yolov5l.pt

CUDA and PyTorch have been installed successfully and are available. The following install command completed with no errors:

pip3 install torch==1.10.0+cu113 torchvision==0.11.1+cu113 torchaudio===0.10.0+cu113 -f https://download.pytorch.org/whl/cu113/torch_stable.html
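
A quick, generic PyTorch check (nothing YOLOv5-specific) confirms the GPU is visible:

# Generic sanity check that PyTorch can see the CUDA device.
import torch

print(torch.__version__)                  # e.g. 1.10.0+cu113
print(torch.cuda.is_available())          # should print True
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # name of the GPU PyTorch will use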

I've found other people online with similar issues who fixed it by changing num_workers = 8 to num_workers = 1. When I tried this, training started and seemed to get past the point where the "paging file is too small" error appears, but then it crashed a couple of hours later. I've also increased the virtual memory (paging file size) as shown in this video (https://www.youtube.com/watch?v=Oh6dga-Oy10), but that didn't work either. I think it's a memory issue, because sometimes when it crashes I get a low-memory warning from my computer.

Any help would be much appreciated.

The Windows Pagefile is used for virtual memory operations by the Windows kernel.

Windows pagefile sizes are set during installation, and normally do not have to be changed. However, if you add memory to your system after initialization, you may need to increase the “initial pagefile size” on the primary boot drive. This is especially true if you’re trying to get a kernel memory dump to diagnose a problem.

If your pagefile is too small, you may get a memory.dmp file, but Debugging Tools for Windows won’t be able to read it. Even if you don’t intend to examine the dump yourself, you still might want to give it to someone else! On systems with solid-state disks (SSDs or NVMe drives), your system administrator might have limited the size to reduce wear on the disk — but this can prevent analysis later.

It’s rather involved to change the sizes, so here are step-by-step procedures, one for Windows 10, another for older versions of Windows.
 

Windows 10

On Windows 10, you can get directly to System Properties from the taskbar, but you then have to traverse several dialogs to reach the virtual memory settings.

Windows 2000 through 7

  • Log in as a system administrator.
  • Open the system control panel, and double-click “System”:

(screenshot)

  • After a short pause, you'll see the general System Properties page. Select the "Advanced" tab.

(screenshot)

  • In Windows 2000, select “Performance Options”. In Windows XP through Windows 7, select the “Settings” button under “Performance”. The image below is for Windows XP:

(screenshot)

  • You’ll see the Performance Options page. In Windows 2000, select “Change…” In Windows XP through Windows 7, select “Advanced” and then, under “Virtual Memory,” select “Change.” The image below is for Windows XP:

(screenshot)

  • You’ll see the “Virtual Memory” page. Select a drive, if more than one, and change the initial size of the paging file. (If you are doing this because of a message from “How to Enable Kernel Memory Dumps”, be sure to select the drive mentioned in the alert box, and also set the initial size according to the alert box.) The image below is from Windows XP.

(screenshot)

  • Press OK, and exit from the Performance Options (2K) or Virtual Memory (XP through Windows 7) page.
  • Restart the system.
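
After the reboot, you can confirm that the larger commit limit (RAM + pagefile) actually took effect. Here is a minimal sketch using only the Python standard library (Windows only; the fields come from the Win32 MEMORYSTATUSEX structure):

# Minimal sketch (Windows): read the current commit limit via GlobalMemoryStatusEx.
# ullTotalPageFile is the system commit limit, i.e. roughly RAM + current pagefile size.
import ctypes

class MEMORYSTATUSEX(ctypes.Structure):
    _fields_ = [
        ("dwLength", ctypes.c_ulong),
        ("dwMemoryLoad", ctypes.c_ulong),
        ("ullTotalPhys", ctypes.c_ulonglong),
        ("ullAvailPhys", ctypes.c_ulonglong),
        ("ullTotalPageFile", ctypes.c_ulonglong),
        ("ullAvailPageFile", ctypes.c_ulonglong),
        ("ullTotalVirtual", ctypes.c_ulonglong),
        ("ullAvailVirtual", ctypes.c_ulonglong),
        ("ullAvailExtendedVirtual", ctypes.c_ulonglong),
    ]

status = MEMORYSTATUSEX()
status.dwLength = ctypes.sizeof(MEMORYSTATUSEX)
ctypes.windll.kernel32.GlobalMemoryStatusEx(ctypes.byref(status))

GB = 1024 ** 3
print(f"Physical RAM : {status.ullTotalPhys / GB:.1f} GB")
print(f"Commit limit : {status.ullTotalPageFile / GB:.1f} GB (RAM + pagefile)")
print(f"Commit free  : {status.ullAvailPageFile / GB:.1f} GB")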

This problem is about the DataLoader: you must reduce the value of num_workers.
In the folder python\Lib\site-packages\torch\utils\data, open dataloader.py and, at line 189, set self.num_workers = 2.
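
Rather than editing the installed package, a less invasive option is to set num_workers where the DataLoader is constructed. A minimal sketch, with a placeholder dataset (YOLOv5 builds its loaders inside its own dataloader utilities, so the names below are illustrative only):

# Minimal sketch: cap the number of worker processes at DataLoader construction time
# instead of editing dataloader.py inside site-packages.
import torch
from torch.utils.data import DataLoader, TensorDataset

def main():
    dataset = TensorDataset(torch.zeros(64, 3, 64, 64))  # stand-in for the real dataset

    loader = DataLoader(
        dataset,
        batch_size=4,
        shuffle=True,
        num_workers=2,   # fewer workers -> fewer Python processes importing torch
        pin_memory=True,
    )

    for (batch,) in loader:
        pass  # training step would go here

if __name__ == "__main__":  # guard required on Windows when num_workers > 0
    main()

Depending on the YOLOv5 version, train.py also accepts a --workers argument that feeds the same value through, which avoids touching library code at all.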

The issue is with how multi-process Python works on Windows with the pytorch/cuda DLLs. The number of workers you set in the DataLoader directly relates to how many Python processes are created.

Each time a Python process imports pytorch it loads several DLLs. These DLLs have very large sections of data in them that aren't really used, but space is reserved for them in memory anyway. We're talking in the range of hundreds of megabytes to a couple of gigabytes per DLL.

When Windows is asked to reserve memory and reports that the request succeeded, it guarantees that the memory will be available to you, even if you never end up using it.

Linux allows overcommitting. By default on Linux, when you ask it to reserve memory, it says "Yeah sure, here you go" and tells you that it reserved the memory. But it hasn't actually done this. It will reserve it when you try to use it, and hopes that there is something available at that time.

So, if you allocate memory on Windows, you can be sure you can use that memory. If you allocate memory on Linux, it is possible that when you actually try to use the memory that it will not be there, and your program will crash.
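
To make the Windows side of that concrete, here is a small illustration (not part of PyTorch; the constants are the standard Win32 values): VirtualAlloc with MEM_RESERVE only claims address space and costs almost nothing, while MEM_COMMIT is charged against RAM + pagefile immediately, whether or not the pages are ever touched.

# Illustration only (Windows): reserving address space vs. committing memory.
import ctypes

MEM_COMMIT     = 0x00001000
MEM_RESERVE    = 0x00002000
PAGE_READWRITE = 0x04
PAGE_NOACCESS  = 0x01

kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
kernel32.VirtualAlloc.restype = ctypes.c_void_p
kernel32.VirtualAlloc.argtypes = [ctypes.c_void_p, ctypes.c_size_t,
                                  ctypes.c_ulong, ctypes.c_ulong]

ONE_GB = 1 << 30

# Reserving 8 GB of address space succeeds even on a small machine,
# because nothing is charged against the commit limit yet.
reserved = kernel32.VirtualAlloc(None, 8 * ONE_GB, MEM_RESERVE, PAGE_NOACCESS)
print("reserved :", hex(reserved or 0))

# Committing 8 GB must fit within RAM + pagefile, or the call fails --
# the same commit-limit pressure that is behind WinError 1455.
committed = kernel32.VirtualAlloc(None, 8 * ONE_GB, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE)
print("committed:", hex(committed or 0))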

On Linux, when it spawns num_workers processes and each one reserves several gigabytes of data, Linux is happy to say it reserved this, even though it didn't. Since this "reserved memory" is never actually used, everything is fine. You can create tons of worker processes: even if pytorch allocates 50GB of memory, as long as it never actually uses it, there is no problem. (Note: I haven't actually run pytorch on Linux. I am just describing how Linux would not have this crash even if it attempted to allocate the same amount of memory. I do not know for a fact that pytorch/CUDA overallocate on Linux.)

On Windows, when you spawn num_workers processes and each one reserves several gigabytes of data, Windows insists that it can actually satisfy this request should the memory be used. So, if Python tries to allocate 50GB of memory, then your total RAM + page file size must have space for 50GB.

So, on Windows NumPythonProcesses*MemoryPerProcess < RAM + PageFileSize must be true or you will hit this error.

Your suggestion of lowering num_workers decreases NumPythonProcesses. The suggestions to modify the page file size increase PageFileSize. My FixNvPe.py script decreases MemoryPerProcess.

The trick is to find a balance of all of these variables that keeps that equation true.
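
As a back-of-envelope way to balance them, you can plug rough numbers into that inequality. The figures below are placeholders; the per-worker commit is an assumption you would read off Task Manager's "Commit size" column for one of the spawned python.exe processes:

# Back-of-envelope check of NumPythonProcesses * MemoryPerProcess < RAM + PageFileSize.
# All numbers are placeholders -- substitute your own machine's values.
GB = 1024 ** 3

ram               = 32 * GB   # installed RAM
pagefile          = 16 * GB   # current pagefile size
commit_per_worker = 9 * GB    # observed commit per python.exe worker (assumption)
main_process      = 9 * GB    # the main training process commits memory too

commit_limit = ram + pagefile
budget       = commit_limit - main_process
max_workers  = int(budget // commit_per_worker)

print(f"Commit limit    : {commit_limit / GB:.0f} GB")
print(f"Workers that fit: {max_workers}")   # (48 - 9) // 9 = 4 with these numbers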

What happens if pagefile is too small?

The risk is that if you have too little RAM and you set the paging file too small or disable it, your programs could crash, usually with an "Out of memory" error. You also would not be able to diagnose crashes with a kernel dump, should you ever want to, as mentioned above, but most of us never do that anyway.

Does paging file size affect performance?

Having a larger page file adds extra work for your hard drive, causing everything else to run slower. Page file size should only be increased when encountering out-of-memory errors, and only as a temporary fix. A better solution is to add more memory to the computer.

What is a good paging file size?

On most Windows 10 systems with 8 GB of RAM or more, the OS manages the size of the paging file nicely. The paging file is typically 1.25 GB on 8 GB systems, 2.5 GB on 16 GB systems and 5 GB on 32 GB systems.

What happens if paging file is too big?

As time goes by, this paging file has the potential to store a large amount of data. If it becomes too big, it might end up corrupting itself, causing all sorts of issues on your device. This is more likely on older systems, such as Windows 7, but in some cases can happen on Windows 10 as well.