[Numpy-discussion] Possible roadmap addendum: building better text file readers
Sun Feb 26 13:16:10 CST 2012
On Sun, Feb 26, 2012 at 1:00 PM, Nathaniel Smith <email@example.com> wrote:
> On Sun, Feb 26, 2012 at 5:23 PM, Warren Weckesser
> <firstname.lastname@example.org> wrote:
> > I haven't pushed it to the extreme, but the "big" example (in the
> > directory) is a 1 gig text file with 2 million rows and 50 fields in each
> > row. This is read in less than 30 seconds (but that's with a solid state
> > drive).
> Obviously this was just a quick test, but FYI, a solid state drive
> shouldn't really make any difference here -- this is a pure sequential
> read, and for those, SSDs are if anything actually slower than
> traditional spinning-platter drives.
> For this kind of benchmarking, you'd really rather be measuring the
> CPU time, or reading byte streams that are already in memory. If you
> can process more MB/s than the drive can provide, then your code is
> effectively perfectly fast. Looking at this number has a few advantages:
> - You get more repeatable measurements (no disk buffers and stuff
> messing with you)
> - If your code can go faster than your drive, then the drive won't
> make your benchmark look bad
> - There are probably users out there that have faster drives than you
> (e.g., I just measured ~340 megabytes/s off our lab's main RAID
> array), so it's nice to be able to measure optimizations even after
> they stop mattering on your equipment.
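
As a rough illustration of the suggestion above (measure CPU time on bytes that are already in memory), here is a minimal sketch. The names make_data, parse and benchmark are hypothetical, and parse is only a stand-in for whichever reader is actually under test; it is not the reader discussed in this thread.

```python
import io
import time


def make_data(rows=5000, fields=50):
    # Build a synthetic comma-separated payload entirely in memory.
    line = ",".join(["1.5"] * fields) + "\n"
    return (line * rows).encode("ascii")


def parse(stream):
    # Stand-in parser (an assumption for illustration): split each line
    # on commas and convert every field to float.
    return [[float(x) for x in line.decode("ascii").split(",")]
            for line in stream]


def benchmark(data, repeats=3):
    # Return the best CPU time (seconds) over several runs. The input is
    # wrapped in BytesIO, so no disk or page cache is involved.
    best = float("inf")
    for _ in range(repeats):
        stream = io.BytesIO(data)
        start = time.process_time()
        parse(stream)
        best = min(best, time.process_time() - start)
    return best


data = make_data()
cpu_seconds = benchmark(data)
megabytes = len(data) / 1e6
if cpu_seconds > 0:
    print("%.1f MB parsed, %.1f MB/s of CPU time"
          % (megabytes, megabytes / cpu_seconds))
```

Comparing the reported MB/s against what the drive can deliver shows whether the parser or the disk is the limiting factor.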
For anyone benchmarking software like this, be sure to clear the disk cache
before each run. On Linux:
$ sudo sh -c "echo 3 > /proc/sys/vm/drop_caches"
On Mac OS X:
I'm not sure what the equivalent is in Windows.