These 7 Programming Habits Are Making You a Less Productive Data Scientist

Author:Murphy | View: 24479 | Time: 2025-03-23 19:55:05

I'm pretty sure we've all fallen into at least one of these bad habits at some point in our collective journeys toward data science.

Whether we do them when we're just starting with our learning, or we do them later on because we're reasonably good at what we do and get lazy, we've all committed at least one of these programming faults.

Whatever the reason, it's never too late to clean up your programming act and become a more productive, efficient data scientist. Luckily, you've already done the hard part and taught yourself to code. Now, we just need to refine your techniques and implement some good habits to help you get more enjoyment out of one of the best parts of data science projects. At the very least, what you learn today will keep you from getting yelled at by your software department.

Straight from the experience of a software development student, these habits are surely keeping you from being your most productive self – let's change that right now.

1. Not commenting your code

Commenting your code is a critical part of code documentation that ensures three things:

Other people can understand your code.
Other people can maintain your code.
You can understand your code when you revisit it after a period away.

At the bare minimum, there are three types of comments that you must have in your code – these are non-negotiable:

The first is a comment that describes any newly committed or shared code in your personal or team repository. This comment could be part of your Git commit message, but it should also appear in the code itself, usually above the chunk of code that you just committed. I like to separate this code section further by drawing a dashed line around it that encloses the code and the comment – just for clearer viewing, nothing more. This comment should describe the code's functionality.
The second is a comment that describes each function in your code. This comment should sit directly on top of each function and explain its inputs and outputs, as well as the function logic.
The final comment you should have should be located on top of any one-liners in your code. One-liners describe logic in your code that is usually spread out over several lines but was instead written in one single line. Sometimes these one-liners can be difficult to understand, so a comment that describes what it does and how each part works is a great way to keep yourself organized.
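Here is a minimal sketch of what the three comment types could look like in Python. The function name, the data, and the dashed-line style are made up for illustration, and I've used a docstring just inside the function rather than a comment on top of it, since that is the usual Python convention.

```python
# ------------------------------------------------------------------
# Newly added: helper for summarizing a numeric column.
# (Hypothetical example; the dashed lines only mark the new section.)
# ------------------------------------------------------------------

def column_summary(values):
    """Return the minimum, maximum, and mean of a list of numbers.

    Input:  values - a non-empty list of ints or floats.
    Output: a dict with the keys "min", "max", and "mean".
    Logic:  min() and max() find the extremes; the mean is sum / count.
    """
    return {
        "min": min(values),
        "max": max(values),
        "mean": sum(values) / len(values),
    }


# One-liner: square the even numbers from 0 to 9.
# range(10) generates 0-9, `if n % 2 == 0` keeps the even ones,
# and `n ** 2` squares each number that is kept.
even_squares = [n ** 2 for n in range(10) if n % 2 == 0]
```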

From working with other software developers, I've seen that what sets great code apart from good code is the level and detail of the comments within. In other words, comment your code succinctly, accurately, and liberally. If something seems obvious to you, make a comment just in case it isn't obvious to someone else (or to yourself in the future, for that matter).

2. Not using GitHub for version control

There's really no reason anymore for someone working in tech not to be using GitHub. Built on the Git version control system, GitHub is not only a version control tool but also a productivity tool that helps you work on code and collaborate with others.

GitHub is generally the gold standard and the platform most tech companies use to version control their code. Even if you're the only data scientist at your company, GitHub is an important tool if you're sharing your code with a software department that will turn it into production code.

At one point or another, we've all announced that we're going to really learn how to use GitHub once and for all. GitHub has so many useful features (including tracking changes in your code as well as working on older versions of your code), but to be honest, data scientists can usually get away with using it for simple commits to the main branch, with a few branches for running alternative scenarios. It's as simple as that.

Well, this is your sign to begin using it – for real this time.

Comprehensive Guide to GitHub for Data Scientists

3. Not testing your code

We've all been there, where we avoid running and testing our code for as long as possible because we were afraid of what might happen. 1% of the time we might get lucky and have everything run properly, but I can guarantee you that the other 99% of the time everything goes haywire.

In my software development education, testing was drilled into us regularly. Not testing your code was seen as a sin, and testing was something we got quick at. Not only did it help us find bugs and errors immediately, but it also meant that we didn't have to sift through hundreds of lines of code to find the issue.

Testing your code is as simple as writing unit tests, a type of test that checks that functions, objects, or classes (the units) are working properly. A simple way to carry out unit testing is to print the output of a function for a given input and compare it with the result you expect.
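As a rough sketch, here is what that could look like once the printed checks are turned into assertions. The function `fahrenheit_to_celsius` and the test names are hypothetical examples, not from any particular project. The tests are plain functions, written in the style that a test runner such as pytest can collect, but they also run on their own.

```python
def fahrenheit_to_celsius(temp_f):
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (temp_f - 32) * 5 / 9


def test_freezing_point():
    # 32 F is the freezing point of water, which is 0 C.
    assert fahrenheit_to_celsius(32) == 0


def test_boiling_point():
    # 212 F is the boiling point of water, which is 100 C.
    assert fahrenheit_to_celsius(212) == 100


def test_body_temperature():
    # Floating-point results are compared with a small tolerance.
    assert abs(fahrenheit_to_celsius(98.6) - 37.0) < 1e-9


if __name__ == "__main__":
    # Run every test by hand; an AssertionError means a test failed.
    for test in (test_freezing_point, test_boiling_point, test_body_temperature):
        test()
    print("all tests passed")
```

Each test checks one input against one known answer, so when a test fails you know which case broke without sifting through the rest of your code.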

How to Write Unit Tests for Python Functions

4. Not breaking down complex problems into simple variables and functions

Data science problems are often complex and involve many moving parts. These problems can be intimidating, and if you don't break them down into simple parts, you can end up staring at your computer screen until it's time to go home without having written a single line of code.

The trick is to begin a problem with the end in mind and then break it down into simple variables and functions, because really, that's all code is: variables and functions.

To start, you need to ask yourself what this problem is trying to solve and what the outcome of this solution will be. This will help you begin to figure out what pieces of code you'll need to reach the final goal. Once you've determined what the solution should look like, you can begin laying out the individual variables and functions that you'll need to make it happen.

See, you've already broken down your complex problem into manageable tasks! Doesn't that feel better?
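To make that concrete, here is a small sketch that starts from a made-up end goal ("what is the average spend per customer?") and works backward into functions. The order data and every function name are assumptions for illustration only.

```python
def clean_orders(orders):
    """Drop orders whose amount is missing."""
    return [order for order in orders if order["amount"] is not None]


def total_per_customer(orders):
    """Add up the order amounts for each customer."""
    totals = {}
    for order in orders:
        customer = order["customer"]
        totals[customer] = totals.get(customer, 0) + order["amount"]
    return totals


def average_spend(totals):
    """Average the per-customer totals."""
    return sum(totals.values()) / len(totals)


def average_spend_per_customer(orders):
    """The end goal, expressed as the three smaller steps above."""
    return average_spend(total_per_customer(clean_orders(orders)))


# Hypothetical input data for the example.
orders = [
    {"customer": "a", "amount": 10.0},
    {"customer": "a", "amount": 30.0},
    {"customer": "b", "amount": 20.0},
    {"customer": "b", "amount": None},
]
result = average_spend_per_customer(orders)
```

Each function is small enough to write, test, and tick off as its own task.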

After adding these tasks to your Kanban board (my favorite way to stay organized while working on a complex problem), you can then start creating the smaller parts of the project that will lead to a whole solution.

5. Not refactoring your code

Code refactoring refers to restructuring your code without changing its original function. While refactoring is typically seen in software development scenarios, it can also be used by data scientists to clean up their code.

Refactoring is easier than it sounds: take a look at some old code you've written and ask how it could be written more efficiently. Then, applying good coding practices, clean up your code until it looks better than it did before.

Refactoring is a task best done after you've written code that works. For example, when you're first working on a data science problem, you want to make sure that your code works, regardless of whether it's pretty. Then, once you've ensured that it gives the proper outputs, you can go back in and clarify variable names, properly indent your code, use Python syntax standards to create functions that eliminate redundancy, and generally rewrite anything that looks like a pile of spaghetti.
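Here is a small before-and-after sketch of that kind of cleanup. Both versions are invented for illustration. The refactored one renames things, removes the duplicated loop, and is checked against the original to confirm that the behavior hasn't changed.

```python
# Before: it works, but the names are vague and the logic is repeated.
def f(a, b):
    x = []
    for i in a:
        x.append((i - min(a)) / (max(a) - min(a)))
    y = []
    for i in b:
        y.append((i - min(b)) / (max(b) - min(b)))
    return x, y


# After: the repeated logic lives in one clearly named function.
def min_max_scale(values):
    """Rescale a list of numbers so the smallest is 0 and the largest is 1."""
    lowest, highest = min(values), max(values)
    return [(value - lowest) / (highest - lowest) for value in values]


def scale_both(first, second):
    """Scale two lists independently, as the original function did."""
    return min_max_scale(first), min_max_scale(second)


# Refactoring must not change behavior, so compare old and new outputs.
assert f([1, 2, 3], [10, 20, 40]) == scale_both([1, 2, 3], [10, 20, 40])
```

Like the original, the refactored version still fails on a list whose values are all equal. Handling that case would be a behavior change, which is a separate job from refactoring.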

I recommend doing any refactoring only after you've gotten out of the flow state and gotten your code to work. Stopping and refactoring your code every time you write a line will knock you out of the zone and will make it take ten times longer to finish your code. Much like how it's suggested that you don't worry about spelling or grammar until you've gotten your ideas down on paper, wait until you've written all of your code to make it look pretty.

6. Not keeping your code organized

Bad organization skills are something I often came across during my time in college studying software development. Although many of my classmates were brilliant developers, organization was not their strong suit, which left a lot to be desired when looking at their code.

Learning proper code organization skills early on can help you create code that is easy to navigate and work with, and that will reduce the time it takes you to push out projects.

According to Karl Broman, code and data organization are as simple as:

Keeping everything for a project in a single directory. This directory should include all of your data, code, and results for a project, which makes it easier to work on in the future or hand off to someone else.
Separating raw data from derived data. Keep two subdirectories, one that contains raw data and the other that contains derived data. A subdirectory that contains data summaries can also be useful to keep your data organized.
Keeping your data separate from your code. Keep your code in one subdirectory and your data in another subdirectory (or three as described above).
Staying away from absolute paths and instead using relative paths. When it comes to collaborating with other people who may not have copies of your project directory in the exact same location, it's important to use relative paths to allow them to open and access all of your files.
Choosing good file (and variable, and function) names. Raw data file names should be kept the same as when you got them, but code file names should be as descriptive as possible. The same goes for variable and function names, naturally.
Never using "final" in a file name. As Broman says, "nothing is ever final". Multiple versions of a file should be appended with a version number, but the "final" version should never be labeled as such.
Writing documentation and README files. Documentation is necessary to explain what everything is and does. Good documentation involves describing the files and the processes they contain. Keeping README up to date is important, as is including your contact information if anyone has further questions.
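The layout described above can be sketched with Python's `pathlib`. The directory names (`data/raw`, `data/derived`, `code`, `results`) and the two helper functions are my own assumptions, not something Broman prescribes. Every path is built from a single project root, so nothing depends on where the project lives on a particular machine.

```python
from pathlib import Path

# Subdirectories kept inside the single project directory (assumed names).
SUBDIRECTORIES = {
    "raw": Path("data") / "raw",          # data exactly as received
    "derived": Path("data") / "derived",  # data produced by your code
    "code": Path("code"),
    "results": Path("results"),
}


def project_paths(project_root):
    """Build every project path relative to one root directory."""
    root = Path(project_root)
    paths = {name: root / subdir for name, subdir in SUBDIRECTORIES.items()}
    paths["readme"] = root / "README.md"
    return paths


def create_layout(project_root):
    """Create the directories and an empty README if they don't exist yet."""
    paths = project_paths(project_root)
    for name in SUBDIRECTORIES:
        paths[name].mkdir(parents=True, exist_ok=True)
    paths["readme"].touch(exist_ok=True)
    return paths
```

Calling `create_layout("my_project")` (a placeholder name) sets up the skeleton, and code elsewhere in the project can then read from `project_paths(...)["raw"]` instead of hard-coding an absolute path.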

One of the tasks I've taken on in previous projects is organizing my team's code. Not only is it a great way to learn how the code works, but it's also an important skill to become proficient in. Make an organization system that works for you (mine is quite similar to the one described above) and put it into practice for every project, big or small.

7. Not taking breaks

I'll let you in on a little secret: you write bad code when you don't take breaks.

Hustle culture is still alive and well in tech, where it's not abnormal to work 90-hour weeks, only getting up once in a while to refill your soda and not seeing the sun for days on end.

While it may seem like your boss owns you, they don't own your health (physical, mental, or emotional).

So I want you to pledge right now that you'll get at least one hour of activity a day, you'll drink something other than soda at least twice a day (water is a good place to start, coffee doesn't count), you'll get outside at least once a day, and you'll try to eat food that isn't takeout.

Trust me, I can't remember the number of times that I've solved coding or logical problems while out walking the dog. Fresh air seems to do wonders for the problem-solving part of your brain.

Sitting in front of your computer banging out code for eight or more hours every day without any type of break is going to leave you burned out and lacking in productivity even more than you already are. In other words, life is short, your job isn't everything, and taking care of yourself will make you a better data scientist in the long run.


Tags: Artificial Intelligence Data Science Habits Machine Learning Programming