The Enigma Guide to Avoiding an Actual Pandas Pandemonium


When you first start out using Pandas, it's often best to just get your feet wet and deal with problems as they come up. Then the years pass and the amazing things you've built with it start to accumulate, but you have a vague inkling that you keep making the same kinds of mistakes and that your code runs really slowly for what seem like pretty simple operations. This is when it's time to dig into the inner workings of Pandas and take your code to the next level. As with any library, the best way to optimize your code is to understand what's going on underneath the syntax.

It can be hard to know where to start, though. There are tools out there that can help boost productivity—but what exactly are these tools, and where can you find them?

Figure: “Choosing the right way to iterate through rows.” Three code blocks compare (1) `iterrows`, a generator that yields indices and rows; (2) “a somewhat optimized way that borrows from functional programming”; and (3) vectorization, described as about 100 to 300 times faster for this operation.
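To make that comparison concrete, here is a minimal sketch of the three approaches, assuming a hypothetical DataFrame with two integer columns `a` and `b` whose row-wise sum we want. The figure's exact code isn't recoverable, so `DataFrame.apply(..., axis=1)` stands in for the "functional programming" option, and the helper names `sum_iterrows`, `sum_apply`, and `sum_vectorized` are hypothetical. Actual speedups depend on your data and hardware, so time it yourself rather than relying on any single number.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical example data: two integer columns we want to add row by row.
df = pd.DataFrame({"a": np.arange(10_000), "b": np.arange(10_000) * 2})


def sum_iterrows(frame: pd.DataFrame) -> pd.Series:
    # iterrows yields (index, row-as-Series) pairs; building a Series
    # for every row in Python is what makes this the slowest option.
    result = []
    for _, row in frame.iterrows():
        result.append(row["a"] + row["b"])
    return pd.Series(result, index=frame.index)


def sum_apply(frame: pd.DataFrame) -> pd.Series:
    # Stand-in for the "functional" approach: apply a function per row.
    # Tidier than an explicit loop, but still calls Python once per row.
    return frame.apply(lambda row: row["a"] + row["b"], axis=1)


def sum_vectorized(frame: pd.DataFrame) -> pd.Series:
    # Vectorized: add whole columns at once in compiled NumPy code.
    return frame["a"] + frame["b"]


if __name__ == "__main__":
    # Time each approach on the same data; results vary by machine.
    for fn in (sum_iterrows, sum_apply, sum_vectorized):
        seconds = timeit.timeit(lambda: fn(df), number=3)
        print(f"{fn.__name__}: {seconds:.4f} s for 3 runs")
```

All three return the same values. The vectorized version tends to win because `frame["a"] + frame["b"]` hands entire columns to NumPy's compiled loops, while the other two run Python code once per row.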

In the spirit of learning (and sharing!), I recently pulled together some of the Pandas tips and tricks I’ve come across over the years. I learned some of these methods at conferences; others I picked up from books or colleagues. After running a few tutorials at Enigma, including a session with our Software Craftsmanship Guild (an internal club that promotes the learning and practice of software engineering skills), I realized this information was worth sharing more broadly.

So, here is The Enigma Guide to Avoiding an Actual Pandas Pandemonium, which digs into coding best practices, common silent failures, how to speed up your runtime, and ways to lower your memory footprint. It's a collection of suggestions for optimizing your Pandas code, conveniently packaged together in one place.

Have thoughts on the tutorial, or tips you want to share? Let us know!