Streamline Your Workflow when Starting a New Research Paper

Author:Murphy  |  View: 27176  |  Time: 2025-03-22 19:18:06
Photo by Maksym Kaharlytskyi on Unsplash

I am a researcher with over seven years of experience in public health and epidemiological research. Every time I start a new research paper, I create a folder for the project, several subfolders inside it for each part of my work, and Word documents for the manuscript with specific headings. Having published over 170 peer-reviewed papers, I must have gone through this process over 200 times. I think it is about time to automate it, and I will share with you how to do so!

At the end of the post, you will find the full code. You only need to copy and paste it into your favorite Python environment and hit run. The code defines the function create_project_structure, which creates a folder structure and Word documents so you can go straight to work on your research paper.


What does this function do?

This function creates a folder in a path you specify (base_path), with the name you choose (project_name). It also creates two Word documents: one for the supplementary materials and one for the manuscript, with headings. The create_project_structure function needs two inputs: base_path and project_name.
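The two inputs are simply joined into one path. One thing to watch for: a path typed as "~/Documents" is not expanded automatically by os.path.join, so it would be treated as a folder literally named "~". A minimal sketch of cleaning the inputs before passing them on is shown here; resolve_project_path is a hypothetical helper, not part of the code in this post.

```python
import os

def resolve_project_path(base_path, project_name):
    """Hypothetical helper: tidy the two inputs and build the project path."""
    # Remove stray spaces, expand "~" to the home folder, and make the path absolute
    base = os.path.abspath(os.path.expanduser(base_path.strip()))
    # The project folder lives directly inside the base path
    return os.path.join(base, project_name.strip())

# Example values are made up for illustration
print(resolve_project_path("~/Documents", " My Research Project "))
```

You could then call create_project_structure with the cleaned base path if you often type paths that start with "~".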


What is the folder structure?

The function will generate a folder with the following structure.

Project Name (project_name)
--- 00.References
--- 01.Datasets
--- 02.Scripts
--- 03.Figures
--- 04.Tables
--- 05.Supplementary Materials
--- 06.Manuscript
--- 07.Submissions

Finally, inside each of these folders there will be a folder named _old. This is a personal preference. I prefer to move old files into the _old folder, so that the main folder looks tidy while I keep old versions as a backup.
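Moving a file into _old can also be scripted. The sketch below assumes a hypothetical helper named move_to_old, which is not part of the code in this post; it moves a file into the _old folder sitting next to it and refuses to overwrite an existing backup. The demonstration runs in a temporary directory with a made-up file name.

```python
import shutil
import tempfile
from pathlib import Path

def move_to_old(file_path):
    """Hypothetical helper: move a file into the '_old' folder next to it."""
    file_path = Path(file_path)
    old_dir = file_path.parent / "_old"
    old_dir.mkdir(exist_ok=True)  # create '_old' if it is missing
    target = old_dir / file_path.name
    if target.exists():
        # Never silently overwrite an earlier backup
        raise FileExistsError(f"{target} already exists; rename the file first")
    shutil.move(str(file_path), str(target))
    return target

# Demonstration in a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    figures = Path(tmp) / "03.Figures"
    figures.mkdir()
    draft = figures / "figure1_draft.png"
    draft.write_bytes(b"")  # empty placeholder file
    archived = move_to_old(draft)
    print(archived.relative_to(tmp))
```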


What is the purpose of each folder?

00.References: to keep all references for this project. You may also store your EndNote library here, or the files of any other reference manager you use.

01.Datasets: to keep all the relevant datasets for your analysis. You might keep the original dataset, the dataset after applying your selection criteria, and the dataset with the predictions.

02.Scripts: to keep your analysis code, whether R scripts or Python Jupyter Notebooks.

03.Figures: to save your figures, which were most likely created with your scripts.

04.Tables: to save your tables, which were probably also created automatically with your scripts.

05.Supplementary Materials: to save supplementary materials you need to publish alongside your paper.

06.Manuscript: to keep the working manuscript. Many epidemiology and public health journals only accept Word documents (or PDF), so we still use Word and share the document to collect feedback from co-authors. Some people work online instead, for example in Google Docs.

07.Submissions: inside this folder I will create a new folder for each journal to which I submit the paper for consideration. I like to keep all submitted documents for each journal. This helps to keep track of all journals we submitted to.
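Creating the per-journal folder can be scripted as well. This is only a sketch: add_submission_folder is a hypothetical helper that is not part of the code in this post, and the date prefix on the folder name is an assumption added here so that submissions sort in chronological order.

```python
import tempfile
from datetime import datetime
from pathlib import Path

def add_submission_folder(project_path, journal_name):
    """Hypothetical helper: create a dated folder for one journal in 07.Submissions."""
    # Assumption: prefix the folder with today's date to keep submissions in order
    date_prefix = datetime.now().strftime("%Y-%m-%d")
    folder = Path(project_path) / "07.Submissions" / f"{date_prefix}_{journal_name}"
    folder.mkdir(parents=True, exist_ok=True)
    return folder

# Demonstration in a throwaway directory with a made-up journal name
with tempfile.TemporaryDirectory() as tmp:
    created = add_submission_folder(tmp, "Journal A")
    print(created.relative_to(tmp))
```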


Word documents created

The function create_project_structure will also create two Word documents. One will be inside the folder 06.Manuscript and the other inside the folder 05.Supplementary Materials. We usually use Word documents to write these sections. Other research fields may use LaTeX instead (e.g., Overleaf).

Both Word documents will be named with a suffix showing today's date. In addition, the Word document in the 06.Manuscript folder will be created with level 1 and level 2 headings. These headings represent the standard sections of many biomedical, public health, and epidemiology journals. Feel free to edit the headings according to your needs!
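The date suffix uses the year-month-day format, so files sort chronologically by name. A minimal sketch of the naming rule follows; dated_filename is a hypothetical helper written for illustration, and the fixed date in the example is there only so the output is predictable.

```python
from datetime import date

def dated_filename(stem, day=None, extension=".docx"):
    """Hypothetical helper: build a name such as Manuscript_YYYY-MM-DD.docx."""
    day = day or date.today()  # default to today's date
    return f"{stem}_{day.strftime('%Y-%m-%d')}{extension}"

print(dated_filename("Manuscript", date(2025, 3, 22)))
# prints: Manuscript_2025-03-22.docx
print(dated_filename("Supplementary_Materials"))  # uses today's date
```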


Subheadings in the manuscript Word document

The Word document in the 06.Manuscript folder will have headings that are standard across several journals. They may vary according to your paper, area of research, and the target journal. Headings with a single number (e.g., 3.) are level 1, and headings with two numbers (e.g., 3.1.) are level 2.

0. Title
1. Abstract
2. Introduction
3. Methods
    3.1. Study design
    3.2. Data sources
    3.3. Study population
    3.4. Variables
    3.5. Statistical Analysis
    3.6. Ethics
4. Results
    4.1. Description of study population
    4.2. Main findings
    4.3. Complementary findings
5. Discussion
    5.1. Main findings
    5.2. Implications
    5.3. Strengths and limitations
    5.4. Conclusions
6. Disclosures
    6.1. Acknowledgements
    6.2. Contributions
    6.3. Funding
    6.4. Conflict of interest
    6.5. Data sharing
    6.6. Code sharing
7. Tables
8. Figures
9. References

These headings are largely consistent with some reporting guidelines such as STROBE.
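The level rule (one number is level 1, two numbers is level 2) can be sketched by counting the numeric parts of each heading's prefix. The helper heading_level below is hypothetical and written only for illustration; it uses plain Python, so it runs without any Word library.

```python
def heading_level(heading):
    """Hypothetical helper: count the numeric parts of the prefix to get the level."""
    prefix = heading.split(" ", 1)[0]  # "3.1." from "3.1. Study design"
    parts = [part for part in prefix.split(".") if part]  # drop empty pieces
    return len(parts)

for heading in ["0. Title", "3. Methods", "3.1. Study design", "6.6. Code sharing"]:
    print(heading_level(heading), heading)
```

Counting parts rather than looking at a fixed character position also keeps working if you add a tenth or later section, such as "10. Appendix".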


How to run the code?

If you are working in a Jupyter Notebook, for example in Visual Studio Code, you only need to copy the full code into one cell and run it. First make sure you have the library needed to manipulate Word documents (pip install python-docx). After you run the cell, you will be prompted to input the folder path (base_path) and then the project name (project_name). A field will appear at the top of the window for you to enter this information. Type the path and press Enter, then type the project name and press Enter.
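If you are unsure whether the library is installed, you can check before running the full code. The sketch below uses the standard importlib module; is_installed is a hypothetical helper name. Note that python-docx is imported under the module name docx.

```python
import importlib.util

def is_installed(module_name):
    """Hypothetical helper: return True if a top-level module can be found."""
    return importlib.util.find_spec(module_name) is not None

# python-docx is imported as "docx"
if is_installed("docx"):
    print("python-docx is available")
else:
    print("python-docx is missing; run: pip install python-docx")
```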

Screenshot by the author.

That's all you need! You will have created a folder with the structure shown above as well as two Word documents. In a few seconds, you have the environment ready to start working on your next research paper.

Once you have entered both pieces of information (base_path and project_name), the cell will stop running and you will see the following output.

Screenshot by the author, blurred for privacy purposes.

As you can see, several folders have been created inside the specified path, where I created the project My Research Project3.
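You can also confirm the result from Python rather than from the file explorer. The following is a minimal sketch, assuming a hypothetical helper named list_folders; it walks a project folder and returns every subfolder as a relative path. The demonstration builds two sample folders in a temporary directory instead of a real project.

```python
import os
import tempfile

def list_folders(project_path):
    """Hypothetical helper: return every folder under project_path, sorted."""
    found = []
    for current, dirnames, _files in os.walk(project_path):
        for name in dirnames:
            full_path = os.path.join(current, name)
            found.append(os.path.relpath(full_path, project_path))
    return sorted(found)

# Demonstration with two sample folders in a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "01.Datasets", "_old"))
    os.makedirs(os.path.join(tmp, "02.Scripts", "_old"))
    for folder in list_folders(tmp):
        print(folder)
```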


Below you will find the code you need to copy and paste.

import os
from datetime import datetime
from docx import Document

def create_project_structure(base_path, project_name):
    # Define the main folder structure
    subfolders = [
        "00.References",
        "01.Datasets",
        "02.Scripts",
        "03.Figures",
        "04.Tables",
        "05.Supplementary Materials",
        "06.Manuscript",
        "07.Submissions"
    ]

    try:
        # Get current date
        date_suffix = datetime.now().strftime("%Y-%m-%d")

        # Create the main project folder
        project_path = os.path.join(base_path, project_name)
        os.makedirs(project_path, exist_ok=True)
        print(f"Created main folder: {project_path}")

        # Create subfolders and the "_old" folder within each
        for folder in subfolders:
            subfolder_path = os.path.join(project_path, folder)
            os.makedirs(subfolder_path, exist_ok=True)
            print(f"Created subfolder: {subfolder_path}")

            old_folder_path = os.path.join(subfolder_path, "_old")
            os.makedirs(old_folder_path, exist_ok=True)
            print(f"Created '_old' folder: {old_folder_path}")

        # Create the Manuscript Word document
        manuscript_path = os.path.join(project_path, "06.Manuscript")
        manuscript_filename = f"Manuscript_{date_suffix}.docx"
        manuscript_file = os.path.join(manuscript_path, manuscript_filename)
        create_manuscript(manuscript_file)
        print(f"Created manuscript file: {manuscript_file}")

        # Create the Supplementary Materials Word document
        supplementary_path = os.path.join(project_path, "05.Supplementary Materials")
        supplementary_filename = f"Supplementary_Materials_{date_suffix}.docx"
        supplementary_file = os.path.join(supplementary_path, supplementary_filename)
        create_blank_document(supplementary_file)
        print(f"Created supplementary materials file: {supplementary_file}")

        print("Project structure created successfully!")
    except Exception as e:
        print(f"An error occurred: {e}")

def create_manuscript(file_path):
    """Creates a manuscript Word document with the specified structure."""
    document = Document()

    # Level 1 and Level 2 headings
    headings = [
        "0. Title", "1. Abstract", "2. Introduction", "3. Methods",
        "3.1. Study design", "3.2. Data sources", "3.3. Study population", "3.4. Variables", 
        "3.5. Statistical Analysis", "3.6. Ethics", "4. Results",
        "4.1. Description of study population", "4.2. Main findings", "4.3. Complementary findings",
        "5. Discussion", "5.1. Main findings", "5.2. Implications", "5.3. Strengths and limitations", 
        "5.4. Conclusions", "6. Disclosures", "6.1. Acknowledgements", "6.2. Contributions", 
        "6.3. Funding", "6.4. Conflict of interest", "6.5. Data sharing", "6.6. Code sharing", 
        "7. Tables", "8. Figures", "9. References"
    ]

    for heading in headings:
        # The numeric prefix ends at the first space:
        # "3." has one dot (level 1), "3.1." has two dots (level 2)
        prefix = heading.split(" ", 1)[0]
        level = 2 if prefix.count(".") == 2 else 1
        document.add_heading(heading, level=level)

    document.save(file_path)

def create_blank_document(file_path):
    """Creates a blank Word document."""
    document = Document()
    document.save(file_path)

# Example usage
if __name__ == "__main__":
    user_path = input("Enter the base path for the project: ")
    project_name = input("Enter the name of the project folder: ")
    create_project_structure(user_path, project_name)

If you found this code useful, share it with your friends and colleagues. Also, give this story some love with a thumbs up, and feel free to connect with me on LinkedIn. You are welcome to build on this code to adapt it to other types of journals or research fields, or to turn it into free software!

Tags: Biomedical Data Science Productivity Programming Research Methods
