The general idea behind caching: when a file is frequently read, keeping its data in memory and serving reads from there is much faster than reading it repeatedly from the disk (read cache). When a file is written, temporarily holding the data in memory and coalescing repeated overwrites before transferring them to the disk improves performance (lazy write, or write-behind). The fact that a hard disk is fast at sequential access brings two further ideas: reading ahead several contiguous blocks upon each read (read-ahead), and sorting blocks by disk address before writing them out (the elevator algorithm, illustrated in the sketch below).
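To make the elevator idea concrete, here is a minimal C sketch that sorts pending writes by disk block number before flushing them, so the disk head can sweep in one direction. The `struct dirty_block` layout and `write_block_to_disk` are hypothetical names for illustration, not anything from Windows:

```c
#include <stdlib.h>

/* Hypothetical representation of a dirty cache block awaiting write-back. */
struct dirty_block {
    unsigned long disk_lbn;   /* logical block number on disk */
    const void   *data;       /* cached data to write back */
};

/* Hypothetical low-level write routine, standing in for the disk driver. */
extern void write_block_to_disk(unsigned long lbn, const void *data);

static int by_lbn(const void *a, const void *b)
{
    unsigned long x = ((const struct dirty_block *)a)->disk_lbn;
    unsigned long y = ((const struct dirty_block *)b)->disk_lbn;
    return (x > y) - (x < y);
}

/* Sort the pending writes so they are issued in ascending disk order. */
void flush_dirty_blocks(struct dirty_block *blocks, size_t count)
{
    qsort(blocks, count, sizeof blocks[0], by_lbn);
    for (size_t i = 0; i < count; i++)
        write_block_to_disk(blocks[i].disk_lbn, blocks[i].data);
}
```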
Windows NT implements its disk cache as a library that the file system uses when reading or writing file data or metadata. Metadata here means the FAT (file allocation table), the MFT (master file table), and directories (as confirmed in Windows Internals, 4th Edition).
When a file is read, the file system calls the cache to get the data. If the data is already mapped by the cache, the cache returns it directly by accessing the mapped virtual memory in the cache's virtual address space. If it has not been mapped yet, the cache allocates a cache control block to track it and calls the memory manager to map a view of the file at a reserved virtual address; the cache manager then reads through that virtual memory. On a later read of the same file, if the cache does not need to allocate a new control block, the data is already mapped in the cache (a sketch of this lookup-or-map flow follows below).
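The lookup-or-map flow described above might look roughly like the following sketch. All the names here (`cache_block`, `find_cache_block`, `map_view_of_file`) are invented for illustration; the real cache manager tracks views with its own control-block structures and relies on the memory manager's section machinery:

```c
#include <stddef.h>

/* Simplified control block: one mapped view of one part of a file. */
struct cache_block {
    void         *mapped_va;   /* view in the cache's virtual address space */
    unsigned long file_offset; /* which part of the file this view covers */
};

/* Hypothetical helpers standing in for the real data structures. */
extern struct cache_block *find_cache_block(const char *file, unsigned long offset);
extern struct cache_block *alloc_cache_block(void);
extern void *map_view_of_file(const char *file, unsigned long offset);

const void *cache_read(const char *file, unsigned long offset)
{
    struct cache_block *cb = find_cache_block(file, offset);
    if (cb == NULL) {
        /* Not mapped yet: allocate a control block and ask the
         * memory manager to map a view of the file. */
        cb = alloc_cache_block();
        cb->file_offset = offset;
        cb->mapped_va = map_view_of_file(file, offset);
    }
    /* Either way, the data is now reachable through virtual memory;
     * touching it may still page-fault if it is not physically resident. */
    return cb->mapped_va;
}
```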
When the cache manager reads a mapped memory block, the memory manager asks the file system to retrieve it on demand if it is not in physical memory. Note that because the cache's working set may be trimmed by the memory manager, even if the data is in the cache, it may not be physically resident. Since the memory manager manages the cache's working set, the number of cache blocks present in physical memory is dynamic.
Note that the blocks in physical memory include both the blocks in the working set and the blocks on the memory manager's transition (standby) list: virtual memory pages that are ready to be repurposed but have not been yet. In the case of memory-mapped files, "swapping out" is a no-op for an unmodified page and a write-to-file operation for a modified page. If the cache requests a page that is still on the transition list, it is retrieved quickly, without disk I/O. (The sketch below models these page states.)
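As a rough model of the states involved (not Windows code), a cache page is either in the working set, on the standby list, or only on disk, and trimming and faulting move it between those states; `write_page_to_file` and `read_page_from_file` are hypothetical stand-ins:

```c
/* An illustrative model of the page states discussed above. */
enum page_state {
    PAGE_IN_WORKING_SET,  /* mapped and physically resident */
    PAGE_ON_STANDBY,      /* trimmed but still in RAM */
    PAGE_ON_DISK          /* must be read back from the file */
};

struct cache_page {
    enum page_state state;
    int dirty;            /* modified since it was mapped? */
};

/* Hypothetical file I/O routines, standing in for the file system. */
extern void write_page_to_file(struct cache_page *p);
extern void read_page_from_file(struct cache_page *p);

/* The memory manager trims a page from the cache working set.
 * For a mapped file, an unmodified page is simply kept on standby;
 * a modified page must be written back to the file first. */
void trim_page(struct cache_page *p)
{
    if (p->dirty) {
        write_page_to_file(p);
        p->dirty = 0;
    }
    p->state = PAGE_ON_STANDBY;
}

/* The cache touches the page again: a soft fault if it is still on
 * the standby list, a hard fault (disk read) if it was repurposed. */
void fault_in_page(struct cache_page *p)
{
    if (p->state == PAGE_ON_DISK)
        read_page_from_file(p);   /* hard fault: disk I/O */
    /* else: soft fault, the standby page is simply reattached */
    p->state = PAGE_IN_WORKING_SET;
}
```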
The memory manager automatically expands the cache's working set when it is under heavy access, just as it would for an ordinary process. So under many read requests, the cache may grow larger and larger. This often happens when: 1. you use a waveform-editing program to load a large WAV file; 2. you use an anti-virus program (for example, KILL) to do a whole-system scan. Perhaps both programs opened their files in the default mode, in which random access is initially assumed. I'm not entirely sure about this, but it is said that if you explicitly open a file with a non-random hint (for example, sequential reading), the cache won't keep the already-read blocks mapped for long; instead, it unmaps them soon after they are used. But don't worry: they may still be on the transition list. File blocks mapped for write requests may be unmapped as they are flushed, so write requests may not affect the working set much. (The snippet below shows how such a hint is passed when opening a file.)
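For reference, this is how a program passes such a hint through the Win32 API when opening a file: FILE_FLAG_SEQUENTIAL_SCAN is the sequential hint (FILE_FLAG_RANDOM_ACCESS is the explicit random-access counterpart). The wrapper function itself is just a minimal example:

```c
#include <windows.h>

/* Open a file for reading with an explicit sequential-access hint,
 * so the cache manager can read ahead aggressively and unmap blocks
 * soon after they are consumed. */
HANDLE open_for_sequential_read(const char *path)
{
    return CreateFileA(path,
                       GENERIC_READ,
                       FILE_SHARE_READ,
                       NULL,                 /* default security */
                       OPEN_EXISTING,
                       FILE_ATTRIBUTE_NORMAL | FILE_FLAG_SEQUENTIAL_SCAN,
                       NULL);                /* no template file */
}
```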
Mark Russinovich says that the cache manager has used an LRU replacement algorithm since Windows NT 5.0 (that is, Windows 2000). That seems to tell us that Windows NT 4.0 did not use this obvious, simple optimization: in NT 4.0 the cache virtual address space and the control blocks were used in a round-robin fashion. A replacement algorithm comes into play when mapping a new cache block requires an occupied cache block to be unmapped. Of course, one layer below the cache manager, the memory manager still uses the working-set mechanism, so the cache not using LRU is not too big a problem. However, what I was originally thinking about was a more sophisticated history-recording replacement algorithm, which was discussed in detail in my unfinished project ReadPad. The idea of its cache is in this picture.
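For comparison, plain LRU can be sketched in a few lines: stamp each block on every access and evict the block with the oldest stamp. This only illustrates the policy named above, not the cache manager's actual data structures:

```c
#define NBLOCKS 8

/* One slot in a tiny fixed-size cache. */
struct lru_block {
    unsigned long tag;        /* which file block is cached here */
    unsigned long last_used;  /* logical timestamp of last access */
    int           valid;
};

static struct lru_block blocks[NBLOCKS];
static unsigned long clock_tick;

/* Return the slot holding `tag`, evicting the least recently used
 * block if `tag` is not currently cached. */
int lru_lookup(unsigned long tag)
{
    int victim = 0;

    for (int i = 0; i < NBLOCKS; i++) {
        if (blocks[i].valid && blocks[i].tag == tag) {
            blocks[i].last_used = ++clock_tick;   /* refresh on hit */
            return i;
        }
        if (blocks[i].last_used < blocks[victim].last_used)
            victim = i;   /* empty slots have last_used == 0, so they win */
    }
    /* Miss: reuse the least recently used (or empty) slot. */
    blocks[victim].tag = tag;
    blocks[victim].valid = 1;
    blocks[victim].last_used = ++clock_tick;
    return victim;
}
```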