Top 12 Skills Data Scientists Need to Succeed in 2025

Author: Murphy
Source: Image made by author, with help from Claude.

The AI landscape is moving faster than a rocket ship in 2025, and hanging on is getting harder and harder!

Will you keep your current position, get hired, get promoted or get sacked? That depends on YOU and how fast you can adapt to change.

This isn't to say that if you don't adapt – you will perish.

Many things are changing, but other things are not. Understanding which changes require your attention is the key to success.

Yes, the new AI revolution is proliferating in huge sections of the economy, with a ton of new tools to boost productivity and automate many tasks. And if you were overwhelmed last year, better buckle in for another wild ride.

So, how should you handle this always-accelerating train of AI hype and tools?

By focusing on what matters.

Although AI tools are super shiny and powerful, many of the skills that will really help you succeed in your career haven't changed much over the past few decades – or even the past few millennia!

I'll guide you through the top 12 essential skills you need to thrive as a data scientist, ML engineer, or applied scientist in 2025. From timeless capabilities to emerging technologies, I'll give you practical examples and resources to help you get started and develop each skill. In other words:

This post will be your one-stop shop for leveling up in 2025.


Before We Start

What does a data scientist do in 2025?

Well, it depends.

First of all, I'm going to loosely use the term data scientist for roles that handle data, machine learning, deep learning, and generative AI models. However, other names for such roles include machine learning engineer, applied scientist, algorithm developer, and many more.

So, what should you expect in these roles? Well, I've reviewed the stated responsibilities of over 500 job descriptions, and here are my findings:

Top 5 Responsibilities Of Data Scientists:

  1. Data & Modelling: Process and engineer large-scale datasets. Utilize statistics and data analysis. Build, train, and evaluate ML models (including LLMs, computer vision, tabular data, time series, audio, and recommendation systems). Develop GenAI applications and workflows.
  2. Research & Development: Build internal tools. Conduct literature reviews. Stay current with the latest developments. Own projects from POC to deployment.
  3. Infrastructure Design & Development: Design cloud-based model serving infrastructure. Build data, training, and inference pipelines.
  4. Performance & Monitoring: Define and track success metrics. Build dashboards and monitoring tools. Ensure model reliability at scale.
  5. Collaboration & Communication: Present to stakeholders and clients. Share knowledge with team members. Work across different teams.

So, how much of the AI hype train made it into these job descriptions? Not as much as you would think. Take advantage of this and choose specific areas where you want to grow tall, but never forget your roots.


So What Skills Are Important In 2025?

Source: Image created by author with Dall E 3.

1. Communication Skills

Nobody knows what you are doing and what you are thinking.

Is there anything that you want? Do you have any issues? Are you stuck? Do you have any questions? Don't keep it in. Let it out.

Why should you get what you want if you didn't ask for it?

Communicating your problems, issues, thoughts, results, and conclusions is vital to success in Data Science and to being productive in general.

I know it's sometimes embarrassing to ask silly, "dumb," and naive questions. However, from my experience, those who frequently ask "dumb" questions get ahead faster. This is because they learn faster, generate value faster, and gain valuable connections and trust.

I personally love asking "dumb" and "trivial" questions, like "what is an error?". It almost always makes people stop in their tracks and question their most basic assumptions, usually leading to much better understanding of the project and problem, while improving communication between team members.

Take control of your career, interrupt your friend, and ask your questions. They will respect you more for it. The more you practice communicating your needs, thoughts, and ideas, the better you will become at it – not just in your career but in life in general.

What should you practice?

  1. Explaining technical stuff: Try to translate what you are doing or what you just learned to your non-technical friend or family member. Use the [ELI5 technique](https://blog.groovehq.com/the-eli5-technique) (Explain Like I'm 5) and lead with the business impact before diving into details.
  2. Storytelling: Create a narrative for the story you want to tell. Create compelling visualizations or presentations to support your story. You can gradually build up complex concepts using the Progressive Disclosure Principle or the Pyramid Principle, which does it in reverse.
  3. Communicating with stakeholders: Stakeholders don't have spare time. They want information fast and to the point. Practice the Bottom Line Up Front (BLUF) and Red/Amber/Green (RAG) techniques to deliver your message.
  4. Communicating in writing: Use BLUF and RAG, then summarize what's been done, what's next, and what's blocking progress. When documenting your work, focus on the problem, methodology, reproducibility, and accessibility.
  5. Avoiding pitfalls: Think before you talk and don't jabber on without any structure; avoid overwhelming people with technical details or jargon; lastly, don't procrastinate and wait too long to communicate; it's always better to communicate something than nothing at all.

Where Can You Practice?

There are many opportunities to look for!

Try to explain your work to colleagues, friends, or family members. Send frequent status updates through email or your messaging app. Volunteer to present at team meetings or conferences. Join or start a journal club to practice discussing technical papers.

You could also seek feedback on your presentations and day-to-day communications. Creating a blog or contributing to technical documentation is also a great way to improve your written communication.

The opportunities are endless, you just need to look for them!


2. Programming Skills (Python)

That's right, Python is number 2.

After communication, I don't need to tell you that core programming skills are super important for data scientists.

As a data scientist, you need a scripting language like Python to do what you need to do! From data scraping, extraction, manipulation, and analysis to developing custom models and their training, evaluation, inference pipelines, deploying services, and building interactive applications.

You need more than just the machine-learning-related libraries. (Those are number 7 on this list.)

Be familiar with the full breadth of what you can do in Python.

Try to familiarize yourself with as many built-in modules as you can in the Python Standard Library. It provides a ton of super useful tools that will expand the horizons of what you thought Python could do.

For example, [dataclasses](https://docs.python.org/3/library/dataclasses.html) are very useful tools for managing data passed through objects. This example shows the object detection outputs of one model (object detector) being passed to the second model (pose estimator).

<script src="https://gist.github.com/BjBodner/2f1a9107f337f17dcabf563173a6c243.js"></script>

Running this should produce the following output:

----------------------------------------
detection 1: person, conf: 91.94%
bbox (92, 46, 62, 171)
keypoints:
[[114.488144  96.92579 ]
 [144.00072   67.28505 ]
 [138.01134   59.30764 ]]
----------------------------------------
detection 2: cat, conf: 25.65%
bbox (175, 65, 31, 93)
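
If the embedded gist doesn't load for you, here is a minimal sketch of the same idea – the Detection fields and values below are illustrative, not the exact gist code:

from dataclasses import dataclass
from typing import Optional, Tuple

import numpy as np


@dataclass
class Detection:
    """Output of the object detector, later enriched by the pose estimator."""
    label: str
    confidence: float
    bbox: Tuple[int, int, int, int]  # (x, y, width, height)
    keypoints: Optional[np.ndarray] = None  # filled in by the pose estimator


detections = [
    Detection("person", 0.9194, (92, 46, 62, 171)),
    Detection("cat", 0.2565, (175, 65, 31, 93)),
]

# The pose estimator only adds keypoints to "person" detections.
detections[0].keypoints = np.array(
    [[114.488144, 96.92579], [144.00072, 67.28505], [138.01134, 59.30764]]
)

for i, det in enumerate(detections, start=1):
    print("-" * 40)
    print(f"detection {i}: {det.label}, conf: {det.confidence:.2%}")
    print(f"bbox {det.bbox}")
    if det.keypoints is not None:
        print("keypoints:")
        print(det.keypoints)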

3. Deep Understanding Of Data

Do you know what's in your data?

3.1 Data Validation

What data issues do you have? Which edge cases, corruptions, or noise?

If you don't know the answers to these questions, you might build a data pipeline or preprocessing component that produces bugs that are hard to understand and catch, or worse yet – silent errors! (Where everything runs fine, but you get behaviors that you don't want).

How can you avoid these issues?

By explicitly testing ALL the assumptions you have about your data.

For example, is your data free of duplicates and missing values? Are all the values numeric, strings, or tensors? Are all the labels positive integers? Are all the files present? Are the paths structured as you believe they are?

<script src="https://gist.github.com/BjBodner/98219d8445facca982749726528b5b1f.js"></script>

Think of it like tests for your data.

Too many times have I skipped this crucial step and ended up paying the price of silent bugs that are super hard to catch without explicit testing. That's why I never forget Bob Colwell's famous saying: "If you didn't test it, it doesn't work".
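
For reference, a minimal sketch of what such data tests can look like – assuming a pandas DataFrame with hypothetical image_path and label columns:

from pathlib import Path

import pandas as pd


def validate_dataset(df: pd.DataFrame) -> None:
    """Explicitly test the assumptions we make about the data."""
    # No duplicates or missing values.
    assert not df.duplicated().any(), "found duplicate rows"
    assert not df.isna().any().any(), "found missing values"

    # Labels are non-negative integers.
    assert pd.api.types.is_integer_dtype(df["label"]), "labels are not integers"
    assert (df["label"] >= 0).all(), "found negative labels"

    # Every file the dataset references actually exists on disk.
    missing = [p for p in df["image_path"] if not Path(p).exists()]
    assert not missing, f"{len(missing)} image files are missing"

# validate_dataset(train_df)  # run this before building your pipeline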


3.2 Understanding Your Data

Source: image from Deep Learning vs Data Science: Who Will Win?

Exploratory data analysis (EDA) is your friend!

It helps you understand the distributions involved, how difficult the learning task will be when training on this data, and how to partition your data to provide insights, analyze performance, improve training, and encourage the behaviors you want.

This will allow you to find the things you didn't know that you didn't know!

Hypothesis testing is good for finding things you know you don't know, i.e., questions you know to ask but don't know the answers to.

But what about questions you didn't think about asking?

This is where visualization comes in. This is your chance to open your mind and let it race through possibilities. It is your chance to find patterns and learn to ask new questions you've never thought about asking.

Get very comfortable with tools such as [matplotlib](https://matplotlib.org/), [seaborn](https://seaborn.pydata.org/), [plotly](https://plotly.com/), and related libraries. Any LLM can likely implement some starter code for visualizing your data, but you should still know how to go in and tailor it to exactly what you need.

Remember, creating clear, informative visualizations that communicate insights effectively is a skill, not an art, and practice makes perfect.
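
As a starting point, a quick EDA sketch with seaborn could look something like this (the column names are made up):

import matplotlib.pyplot as plt
import seaborn as sns


def quick_eda(df):
    """Plot class balance and a per-class feature distribution to spot surprises."""
    fig, axes = plt.subplots(1, 2, figsize=(10, 4))

    # Class balance: do a few labels dominate the dataset?
    sns.countplot(data=df, x="label", ax=axes[0])
    axes[0].set_title("Class balance")

    # Feature distribution per class: does the feature separate the classes at all?
    sns.histplot(data=df, x="feature_0", hue="label", kde=True, ax=axes[1])
    axes[1].set_title("feature_0 by class")

    plt.tight_layout()
    plt.show()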


3.3 Understanding The Effects

Last but not least, think about what behavior you expect your model to exhibit when it is trained on this data.

Will your model really be able to learn what you want it to learn?

Can you use feature engineering to improve the learning signal? This is one of those places where you should try to leverage your mathematical intuition, knowledge of statistical methods, and probability theory.

This is also relevant to evaluation data. Try to think about what it means to get certain metrics on a certain partition of the data. Does it really support or disprove a hypothesis or business objective? If not, what kind of data will be better aligned with the goal?
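
To make this concrete, here are a few classic feature-engineering moves that often improve the learning signal (the column names are hypothetical):

import numpy as np
import pandas as pd


def engineer_features(df: pd.DataFrame) -> pd.DataFrame:
    """Simple transformations that often make patterns easier to learn."""
    df = df.copy()
    # Compress a heavy-tailed feature so a few huge values don't dominate.
    df["log_income"] = np.log1p(df["income"])
    # Ratios often carry more signal than the raw counts on their own.
    df["clicks_per_view"] = df["clicks"] / df["views"].clip(lower=1)
    # Periodic features (hour of day) are better encoded on a circle.
    df["hour_sin"] = np.sin(2 * np.pi * df["hour"] / 24)
    df["hour_cos"] = np.cos(2 * np.pi * df["hour"] / 24)
    return df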


4. Software Engineering Best Practices

Didn't I cover this already?

No. We talked about "how" to do things with code, now let's talk about "what" you should be doing.

Each programming language has its own upsides, downsides, perks, and quirks. That said, there are many best practices in software engineering that you should master, as these concepts will give you a competitive edge over your peers and make you a true professional in your craft.

From my experience (as someone who learned these concepts way too late in my career), these skills will likely provide much more value than any single framework or tool:

  1. Version control (Git and Github): careful commits, branches, pull requests, and documentation.
  2. Writing clean code and avoiding smelly code.
  3. Different types of testing.
  4. Object-Oriented Programming (OOP) concepts: Basic and SOLID principles, as well as common design patterns.
  5. Containerization.

Take this example of smelly code:

<script src="https://gist.github.com/BjBodner/687685b29aed5f13c40422811da73ec7.js"></script>

What's wrong with it?

Many things actually.

The method and variable names are not meaningful (l, x, y), there are no type hints and no input validation, the logic is repeated (the >= comparisons), and it may produce undesirable behaviors with empty lists and invalid grades.

Can we do better? Of course, we can! Check out this code instead:

<script src="https://gist.github.com/BjBodner/c0153ea18c0e407c22678b8e4e2daab7.js"></script>

This clean code is much, much better!

For example, the variables and function have meaningful names, the type hints clearly define the input and output types (making it easier to read and test), constants are clearly defined, there is no repeated logic, and input validation ensures reliable behavior.
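
If the gists don't render for you, here is a rough before/after sketch in the same spirit (a grading example; the actual gist code may differ):

from typing import List

# Smelly: cryptic names, no type hints, no validation, repeated comparison logic.
def f(l):
    y = []
    for x in l:
        if x >= 90:
            y.append("A")
        elif x >= 80:
            y.append("B")
        else:
            y.append("C")
    return y


# Cleaner: meaningful names, type hints, constants, and input validation.
GRADE_THRESHOLDS = [(90, "A"), (80, "B"), (0, "C")]


def assign_letter_grades(scores: List[float]) -> List[str]:
    if not scores:
        raise ValueError("scores must not be empty")
    if any(not 0 <= score <= 100 for score in scores):
        raise ValueError("all scores must be between 0 and 100")
    return [
        next(letter for threshold, letter in GRADE_THRESHOLDS if score >= threshold)
        for score in scores
    ]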

For further reading, I highly recommend checking out refactoring.guru, which has a ton of really good and precise information about these topics.


5. Interacting With Databases

As a data scientist, don't you work with data?

And where does your data live?

Picture this: You've just joined a fast-growing startup as a data scientist. On your first day, you discover your data is scattered across half a dozen different systems. Customer information lives in PostgreSQL, user behavior data streams into MongoDB, session data sits in Redis, and your marketing team just dumped a treasure trove of campaign data into Snowflake.

Welcome to the modern data ecosystem.

Don't get overwhelmed, though; it's not as bad as you think. There are all kinds of databases out there, but the most basic families are relational and NoSQL databases.

  • Relational Databases (such as PostgreSQL and MySQL): These keep data in structured tables, which are good when your data can fit naturally into tables with rows and columns, has well-defined relationships between different types of data, has a relatively stable data schema, and requires you to run complex queries.
    <script src="https://gist.github.com/BjBodner/46632a694313a902a8949266ce59744b.js"></script>
  • NoSQL Databases: This is really a family of "all the other storage approaches," with several different ways to store data (see the short query sketch right after this list):
    a. Document-based, such as MongoDB: good for semi-structured data.
    b. Key-value based, such as Redis: good for simple and fast lookups.
    c. Wide-column based, such as Cassandra: good for handling massive amounts of structured data across several servers.
    d. Graph-based, such as Neo4j: good for situations where the relationships between entries are important.
    e. Vector-based, such as Pinecone: good for similarity queries and storing high-dimensional data, like feature embeddings.
    <script src="https://gist.github.com/BjBodner/bacd2feb4f7b18a5ccd35126150b01cf.js"></script>

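As promised, here is a tiny taste of the two query styles – a relational query with Python's built-in sqlite3 module, and the equivalent MongoDB filter shown as a comment (table and field names are made up):

import sqlite3

# Relational: rows and columns with a fixed schema, queried with SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.execute("INSERT INTO users (name, age) VALUES ('Ada', 36), ('Bob', 17)")
adults = conn.execute("SELECT name FROM users WHERE age >= 18").fetchall()
print(adults)  # [('Ada',)]

# Document store (e.g., MongoDB with pymongo): the same filter as a document query.
# db.users.find({"age": {"$gte": 18}}, {"name": 1})
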
So, Which Query Languages Should You Learn?

It's okay to be a bit lazy on this one.

I'd recommend starting with MySQL and MongoDB – one from each family is a good place to start. If you want, you can take a look at Redis, DynamoDB, Pinecone, Neo4j, and others. However, don't get caught up in this.

It would also be good to get some basic understanding of what data pipelines, data lakes, and data warehouses are.

As a data scientist, you won't likely need to be a grandmaster of these frameworks. Just start with the very basics, build as you go, and always keep your eye on the real goal: turning data into insights.


6. Cloud Computing

Example of a virtual private cloud (VPC). Source: image made by author with Claude.

How can you train your model if you can't get your data and don't even have a computer?

As a data scientist, you will likely be working with cloud platforms (such as AWS, GCP, Azure) for data storage and processing, as well as model training, evaluation, and deployment.

Even though you will likely always have a DevOps team supporting you, some basic understanding and hands-on experience with cloud services will give you a clear edge over your peers!

It will be like learning how to swim, but in a pool of abstract concepts.

This knowledge will help you communicate your needs better, give you more independence with your computing resources, help you get more and stronger GPUs, and accelerate your development, training, and inference!

For example, in one of my previous projects, I managed to 2X the number of strong GPUs my team had by optimizing the resources my team was using within its budget.

How To Get Started?

As with all technical skills, I think the best way to learn them is to start right away with hands-on tutorials. Let's play around a bit with Amazon's Simple Storage Service (S3) to save and download some data.

6.1 Let's Start With Account Setup And Installations:

If you don't have an account, create one at aws.amazon.com, then create an IAM user with a policy that gives you access to S3.

Didn't understand any of that? No worries.

Hopefully, your DevOps team will be able to help you with setting this up, and if not, I'd recommend checking out this great tutorial on IAM, this one on S3, and this hands-on tutorial, too.

Ready?

Let's go!

First, let's install the rest of the pip packages:

pip install awscli boto3 torch torchvision

Now, configure your environment.

aws configure
# Enter your:
# - AWS Access Key ID
# - AWS Secret Access Key
# - Default region (e.g., us-east-1)
# - Default output format (json)

After you set this up, your environment will be linked to your IAM user with an access policy to do things with S3, either through the AWS command line interface (AWSCLI) or through their Python library (boto3).

Got that? Good.
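
As a quick sanity check that your credentials are picked up, a minimal boto3 snippet like this should run without errors (the bucket and file names are placeholders):

import boto3

# List your buckets to confirm the credentials from `aws configure` are working.
s3 = boto3.client("s3")
response = s3.list_buckets()
print([bucket["Name"] for bucket in response["Buckets"]])

# Uploading and downloading a single file looks like this (placeholder names):
# s3.upload_file("local_file.txt", "my-demo-bucket", "remote/key/file.txt")
# s3.download_file("my-demo-bucket", "remote/key/file.txt", "downloaded.txt")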

6.2 Setting Up Our Helper Class

Let's now make a little class to manage uploading, downloading, and verifying data to and from S3. Here specifically, we'll handle images and a labels.pt tensor from a local directory.

<script src="https://gist.github.com/BjBodner/3d6e34f581c0f08caef4c9c61499ece4.js"></script>

6.3 Interacting With S3

Then, we can run these operations using the following script, in which we will upload the first 10 images and labels of the FashionMNIST test dataset (MIT license).

<script src="https://gist.github.com/BjBodner/1968f036b1f8fe9330010926f2c1a88f.js"></script>

Running this should produce the following output, in which we confirm that the files we downloaded (after we uploaded them) are the same files we started with.

data saved locally to data/FashionMNIST/processed/test
Created bucket: fashionmnist-s3-demo-20241230210543
uploading data
done
downloading data
done
downloading data
done
Dataset verification: passed

Lastly, we can also use the command line interface to confirm the data is still present in our S3 bucket. In this case, let's use the AWSCLI:

aws s3 ls --recursive {BUCKET_NAME}

$ aws s3 ls --recursive fashionmnist-s3-demo-20241230210543
2024-12-30 21:05:45        394 cifar10/test/images/0.png
2024-12-30 21:05:45        582 cifar10/test/images/1.png
2024-12-30 21:05:45        391 cifar10/test/images/2.png
2024-12-30 21:05:45        408 cifar10/test/images/3.png
2024-12-30 21:05:45        635 cifar10/test/images/4.png
2024-12-30 21:05:45        444 cifar10/test/images/5.png
2024-12-30 21:05:45        542 cifar10/test/images/6.png
2024-12-30 21:05:46        700 cifar10/test/images/7.png
2024-12-30 21:05:46        224 cifar10/test/images/8.png
2024-12-30 21:05:46        381 cifar10/test/images/9.png
2024-12-30 21:05:46        808 cifar10/test/labels.pt

Important!

Always clean up the cloud resources you use to avoid excessive costs! Note that in this demo, the buckets will have different names (with timestamps) each time you run the script, so make sure to delete these buckets in S3 when you are done!

aws s3 rm s3://{BUCKET_NAME} --recursive
aws s3 rb s3://{BUCKET_NAME}

7. Mastering Machine Learning Frameworks

You know this, so I won't dive too deep into this one.

As a data scientist, you need to be willing to get your hands dirty with ML frameworks, so it is important to get good at using them! These include frameworks such as PyTorch, TensorFlow, and Scikit-Learn.

For example, here is a use case where you want to train a custom head for specific tokens of a pre-trained vision transformer to localize objects in a patch of an image. It also uses different learning rates to leverage the pre-trained weights without "erasing" them.

You probably won't find this kind of functionality in high-level trainers, such as when using the [Hugging Face](https://huggingface.co/docs/transformers/en/main_classes/trainer) or [Fastai](https://fastai1.fast.ai/training.html) trainers.

<script src="https://gist.github.com/BjBodner/fa066c7ef5a862d23505ca624da9a211.js"></script>
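
The gist has the full custom head, but the core trick – separate learning rates for the pre-trained backbone and the new head – boils down to something like this in PyTorch (the tiny model here is a stand-in, not the gist's architecture):

import torch
import torch.nn as nn

# Stand-ins: a "pre-trained" backbone and a freshly initialized localization head.
backbone = nn.Sequential(nn.Linear(768, 768), nn.GELU())
head = nn.Linear(768, 4)  # e.g., bounding-box regression for one patch

# Small learning rate for pre-trained weights so we don't "erase" them,
# larger learning rate for the randomly initialized head.
optimizer = torch.optim.AdamW(
    [
        {"params": backbone.parameters(), "lr": 1e-5},
        {"params": head.parameters(), "lr": 1e-3},
    ],
    weight_decay=0.01,
)

# One dummy training step to show the optimizer in action.
x, target = torch.randn(8, 768), torch.randn(8, 4)
loss = nn.functional.mse_loss(head(backbone(x)), target)
loss.backward()
optimizer.step()
optimizer.zero_grad()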

8. MLOps

Ever forget where you put your best model?

MLOps is a catch-all phrase that includes all the tech needed to push an ML project through its entire lifecycle, from initial experiment tracking to production deployment. This includes containerization, monitoring, and maintaining model performance over time.

Do you need to use all this stuff even if you are working alone? On something that's experimental? On a one-week project?

Yes, yes and yes.

From my experience, moving from "we're just trying stuff out" to "we need to work in an organized way" happens way too slowly and sometimes never!

My rule of thumb is:

For ANY project, always use an experiment and configuration manager to track your configurations, metrics, and model checkpoints.

This is why you should master the basic features of experiment managers, such as [wandb](https://wandb.ai/site/), [mlflow](https://mlflow.org/), [clearml](https://clear.ml/), [neptune](https://docs.neptune.ai/usage/), and things like that. Don't worry; they all have the same basic features, so just focus on one.

More elaborate MLOps components can wait until the project is more mature, but learning them will give you an edge. These include data, training, evaluation, CI/CD and inference pipelines, model and data versioning, ML monitoring, and more.

Here is an example of using [wandb](https://wandb.ai/site/) to train a model and track its metrics and configuration, and then save it to a model registry using their artifacts API.

<script src="https://gist.github.com/BjBodner/12516c8a4a85aa6108d41f732daf9081.js"></script>

When opening up the [wandb](https://wandb.ai/site/) web app, you should see the experiments that ran and their metrics, configuration, and associated saved artifacts.

Screenshot of artifacts being created in wandb. Source: screenshot by author.
Screenshot of metrics being tracked in wandb. Source: screenshot by author.

9. Understanding Metrics

One of the hardest things in data science is deciding what it means to succeed at your task or goal. You can probably think of a ton of metrics, such as [accuracy](https://scikit-learn.org/1.5/modules/generated/sklearn.metrics.accuracy_score.html), [F1](https://scikit-learn.org/1.5/modules/generated/sklearn.metrics.f1_score.html), [precision](https://scikit-learn.org/1.5/modules/generated/sklearn.metrics.precision_score.html), [recall](https://scikit-learn.org/1.5/modules/generated/sklearn.metrics.recall_score.html), [AUROC](https://scikit-learn.org/1.5/modules/generated/sklearn.metrics.roc_auc_score.html), [mAP](https://towardsdatascience.com/map-mean-average-precision-might-confuse-you-5956f1bfa9e2), [MSE](https://scikit-learn.org/1.5/modules/generated/sklearn.metrics.mean_squared_error.html), etc.

However, are these what you really care about? If your model is really good at one of them, does it mean your project is a success?

Typically not.

To develop systems that provide value to your clients and stakeholders, you need to understand their needs, define what success means, and find a way to quantify it.

Many metrics can be useful for improving the model, but choosing the right metric that indicates success will ensure you really provide value.

<script src="https://gist.github.com/BjBodner/8290353dc78ca416fce31073b2c5490b.js"></script>

How To Get Good At This?

Try refreshing your knowledge of different loss functions and performance metrics. You can review the supported metrics in Machine Learning frameworks, such as [sklearn.metrics](https://scikit-learn.org/1.5/modules/model_evaluation.html), [torchmetrics](https://lightning.ai/docs/torchmetrics/stable/), [torch loss functions](https://neptune.ai/blog/pytorch-loss-functions), and more.

Many of these metrics have deep foundations in statistical measures and probability theory. However, I believe that trying to understand these metrics directly can be more beneficial than studying these fields.

Wanna gain more intuition? Be proactive and interact with these metrics: run them with different inputs and plot what you get. Make sure you are able to explain to someone else why the plots look the way they do.
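
For example, here is a small sketch that sweeps the decision threshold of a classifier over random synthetic scores and plots how precision and recall trade off:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import precision_score, recall_score

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=1000)
# Noisy scores that are only partially correlated with the true labels.
y_score = np.clip(0.5 * y_true + rng.normal(0.25, 0.2, size=1000), 0, 1)

thresholds = np.linspace(0.05, 0.9, 18)
precisions = [precision_score(y_true, y_score >= t) for t in thresholds]
recalls = [recall_score(y_true, y_score >= t) for t in thresholds]

plt.plot(thresholds, precisions, label="precision")
plt.plot(thresholds, recalls, label="recall")
plt.xlabel("decision threshold")
plt.legend()
plt.show()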

This is how you develop an intuitive feeling for these topics.


10. Problem-Solving & Critical Thinking

Source: image made by author with Dall E 3.

Have you ever been in this pose? Isn't it the best? Being immersed in thought, considering different approaches, solutions and outcomes.

Solving problems should be your bread and butter. However, unless you are Einstein, you will need a systematic way to tackle new problems. Here's the one I use:

  1. Clearly define your problem: your current situation, desired outcome, constraints, and success metrics.
  2. Before diving in, take a step back: is an algorithmic solution really necessary? Is there sufficient quality data? Does solving this problem provide value to your stakeholders, customers, or team?
  3. Make Data-Driven Decisions: Be naive, start small, and iterate. Try a super simple hypothesis and test it. Document the hypotheses you test, run small experiments, record outcomes, adjust hypotheses, add complexity, and repeat.

That said, definitely consider different problem-solving frameworks that might work better for you, such as PDCA, IDEAL, 5 Whys, and First Principles Thinking.

When looking for simple solutions and ways to test your hypothesis, statistical measures are your friend. These include measures such as Student's T-test, Pearson correlation, and Wilcoxon signed-rank test.
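
A minimal sketch of running these tests with scipy, using random numbers as a stand-in for per-fold scores from two model variants:

import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
baseline = rng.normal(loc=0.80, scale=0.05, size=30)   # e.g., per-fold accuracy of model A
candidate = rng.normal(loc=0.83, scale=0.05, size=30)  # e.g., per-fold accuracy of model B

# Student's t-test: is the difference in means statistically significant?
t_stat, p_value = stats.ttest_ind(candidate, baseline)
print(f"t-test: t={t_stat:.2f}, p={p_value:.4f}")

# Wilcoxon signed-rank test: non-parametric check on paired differences.
w_stat, w_p = stats.wilcoxon(candidate - baseline)
print(f"Wilcoxon: W={w_stat:.2f}, p={w_p:.4f}")

# Pearson correlation: how strongly are two quantities linearly related?
r, r_p = stats.pearsonr(baseline, candidate)
print(f"Pearson: r={r:.2f}, p={r_p:.4f}")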


11. AI-Based Tools And Workflows

Wait, didn't you say it's not all about the AI hype train?

Okay you got me.

I did say that, but I also think these tools can provide real value in automating tasks you already understand reasonably well – because you should never trust the outputs of AI models without verifying them yourself.

This will ensure you provide real value and don't look like an idiot who just copied the output of an AI model.

11.1 AI-Based Tools

There are many different AI tools out there, so I would like to focus more on how you should be using these tools rather than which tools to use.

  1. Code-related AI Tools: generate, edit, review, and test your code – whether through the API, a UI (e.g., OpenAI Canvas), or in your IDE. If you are not using these, wake up! These tools are a must in 2025 and will significantly boost your productivity and the quality of your code.
  2. Media generators (image, audio, 3D, and video): As a data scientist, you must communicate your ideas and results quite a bit. (Remember the number one skill?). These tools can help you do so in presentations.
  3. AI coworkers: brainstorm ideas, feedback, and advice regarding work and life decisions. Always take it with a grain of salt.
  4. Knowledge gateways: whether it is a reading assistant, a summarizer, a search engine, or just an LLM response. These tools can make knowledge more accessible, but be on the watch for hallucinations!
  5. Communication assistants: whether for translating to or from your native tongue, drafting an email, a letter, or a slide, these tools can likely save time and help improve your communication skills.
  6. "In-app" AI tools: AI tools such as Copilot are used for spreadsheets, slideshows, text editors, etc. These AI-centric user interfaces can save you time compared to the old graphical user interfaces.

11.2 AI-Based Workflows

Even though not every data scientist today is working to develop API-based AI applications, I think that it is vital to at least be familiar with the range of possible applications and workflows you can build with AI APIs.

I know what you might be saying:

Sounds complicated and either way, I don't need this.

But I disagree. These applications can now work with any type of data and have the potential to automate huge chunks of your day-to-day work and let you focus on the things that really matter – doing good data science.

Best of all, it's actually pretty simple to start building workflows! It is definitely simpler than many other machine learning systems.


Let's take a look at a super simple example of building a workflow with an LLM-based agent that decides which (if any) tools to use.

If you are new to these things, think of "tools" as rule-based actions that the agent can decide to take based on its inputs. These can include running a method or a script, executing a terminal command, calling another API, using some resource, or even creating and/or running a different agent.
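
For instance, with LangChain a tool can be as simple as a decorated Python function – roughly like this (the weather lookup is a stub, matching the toy example below):

from langchain_core.tools import tool


@tool
def get_weather(location: str) -> str:
    """Return the current weather for a given city."""
    # A real tool would call a weather API; this stub just looks up canned answers.
    fake_weather = {"nyc": "It's 90 degrees and sunny.", "sf": "It's 60 degrees and foggy."}
    return fake_weather.get(location.lower(), "unknown")

# The agent only sees the function's name, signature, and docstring,
# and decides on its own when (and with which arguments) to call it.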

11.2.1 Installation And Environment Setup

pip install -qU langchain-anthropic langgraph langchain-core

Get your Anthropic API key here and set it as an environment variable:

export ANTHROPIC_API_KEY='your-api-key-here'

11.2.2 Imports And Helper Functions

Let's define the tools we want the model to use and some helper functions for calling the model and routing based on its decisions.

<script src="https://gist.github.com/BjBodner/93d87c452e41dd6ca50fcab2ea69dd18.js"></script>

Next, we can construct and run a workflow that uses the model to call the relevant tools based on its input. In this case, we will use a fixed input, though you can also parse it from your terminal or load it from a file.

11.2.3 Running The Graph

<script src="https://gist.github.com/BjBodner/2e4550d2dcd52900592bcb0a85abb005.js"></script>

This should produce the following output:

================================ Human Message =================================
what's the weather in the coolest cities?
================================== Ai Message ==================================
[{'text': "I'll help you find out the weather in the coolest cities. I'll break this down into two steps:\n\n1. First, I'll get the list of coolest cities\n2. Then, I'll check the weather for each of those cities", 'type': 'text'}, {'id': 'toolu_012ZCXz6Db7bFa4M4FVmibQ3', 'input': {}, 'name': 'get_coolest_cities', 'type': 'tool_use'}]
Tool Calls:
get_coolest_cities (toolu_012ZCXz6Db7bFa4M4FVmibQ3)
Call ID: toolu_012ZCXz6Db7bFa4M4FVmibQ3
Args:
================================= Tool Message =================================
Name: get_coolest_cities
nyc, sf
================================== Ai Message ==================================
[{'text': "Now, I'll check the weather for New York City (NYC) and San Francisco (SF):", 'type': 'text'}, {'id': 'toolu_01WARnjshmMJ1qc6vjrAEqL3', 'input': {'location': 'nyc'}, 'name': 'get_weather', 'type': 'tool_use'}, {'id': 'toolu_01RgkFQ6ijpm39hZvqvYJLoQ', 'input': {'location': 'sf'}, 'name': 'get_weather', 'type': 'tool_use'}]
Tool Calls:
get_weather (toolu_01WARnjshmMJ1qc6vjrAEqL3)
Call ID: toolu_01WARnjshmMJ1qc6vjrAEqL3
Args:
location: nyc
get_weather (toolu_01RgkFQ6ijpm39hZvqvYJLoQ)
Call ID: toolu_01RgkFQ6ijpm39hZvqvYJLoQ
Args:
location: sf
================================= Tool Message =================================
Name: get_weather
It's 60 degrees and foggy.
================================== Ai Message ==================================
Here's the weather in the coolest cities:
New York City (NYC): It's 90 degrees and sunny
San Francisco (SF): It's 60 degrees and foggy

Quite a contrast between the two cities! NYC is experiencing a hot, sunny day, while SF is cool and foggy. Would you like to know anything else about these cities or their weather?

12. Adaptability & Continuous Learning

Source: image created by author with Dall E 3.

This is probably the most important one.

The key to staying relevant in 2025 isn't just learning everything new, but learning the right things at the right pace.

Remember, you already have most of these skills in some form or another. Embrace new AI tools strategically – it's about sharpening your spear, not chasing every shiny object. Here are a few tips:

  1. Create a Learning Framework: Set aside 2–3 dedicated hours per week for learning, ideally at the same time each week to build a habit. Maintain a "skills inventory" document tracking your current expertise levels and identifying gaps; these can be new skills or existing ones.
  2. 80/20 Rule for AI tools: spend 80% of the time mastering skills you already have and 20% experimenting with new tech. Always try to apply what you learn to real problems and projects you are working on.
  3. Use the "learn-apply-teach" method: Learn something new, apply it to a real project within 1 week, then explain it to a colleague. Document your learning in a personal wiki that no one needs to see.
  4. Measure progress and stay relevant: Set quarterly learning goals with specific, measurable outcomes. Track your "wins" where you successfully applied your new skills. Most importantly, review and update your goals.

Conclusion

Being a data scientist ain't easy.

You need a ton of soft and hard skills, which can take years to develop. But don't worry, no one is perfect, no one is good at ALL of these skills, and no one will ever be.

In this post, we've reviewed the top 12 skills that will be most important to succeed in the 2025 job market:

  1. Communication skills
  2. Programming skills (Python).
  3. Understanding and handling data.
  4. Software engineering best practices.
  5. Interacting with databases.
  6. Cloud computing.
  7. Machine learning frameworks.
  8. MLOps.
  9. Understanding metrics.
  10. Problem-solving skills.
  11. AI tools.
  12. Continuous learning.

So what should you do with all this information?

Take action and start boosting your skills!

That said, take things one step at a time. Don't try to take it all in; you'll get overwhelmed, procrastinate, and inevitably stay in place.

If you want to improve, start small.

Pick one skill from this list today, and dedicate the next month to mastering it. Pick a different one each month and you'll cover them all!

Treat this post as a roadmap to learning what matters in the job market of 2025. I hope it helps you land your first position soon, or makes you uniquely valuable and helps you excel in your current one.

Good luck learning. You rock!


Sources and Further Reading:

[1] Qu, Changle, Sunhao Dai, Xiaochi Wei, Hengyi Cai, Shuaiqiang Wang, Dawei Yin, Jun Xu, and Ji-Rong Wen. "Tool Learning with Large Language Models: A Survey." (2024). arXiv preprint arXiv:2405.17935.

[2] Dosovitskiy, Alexey. "An image is worth 16×16 words: Transformers for image recognition at scale." (2020). arXiv preprint arXiv:2010.11929.

[3] Chase, H. (2022). LangChain [Computer software]. https://github.com/langchain-ai/langchain.

[4] Refactoring Guru. https://refactoring.guru/.

[5] LangGraph. https://github.com/langchain-ai/langgraph.

[6] Murphy, Forrest. "[Use This Simple Technique To Explain Complicated Concepts To Anyone](https://blog.groovehq.com/the-eli5-technique)".

[7] Thompson, Pat. "Using the progressive disclosure principle in academic writing." (2022).

[8] Angelo, Lindsay. "Strategic Storytelling: Helpful Tips to Boost your Business Communication and Influence."

[9] "Bottum Line Up Front (BLUF)". Wikipedia: The Free Encyclopedia.

[10] Henricksen, Tom. "What is the project status? Red Amber Green what does that mean?" (2023) Medium.com.

[11] Python Software Foundation. "The Python Standard Library". (2024). https://docs.python.org/3/library/index.html.

[12] Geeks For Geeks. www.geeksforgeeks.org.

[13] datacamp.com. https://www.datacamp.com/tutorial/aws-s3-efs-tutorial.

[14] AWS documentation. https://docs.aws.amazon.com/.

[15] Weights And Biases documentation. https://docs.wandb.ai/guides/.

[16] "Plan Do Check Act". Wikipedia: The Free Encyclopedia.

[17] "Five Whys". Wikipedia: The Free Encyclopedia.

[18] Tubis, Nick. "First Principles Thinking: The Blueprint For Solving Business Problems." (2023). Forbes.

[19] Tindle, Austin. "Learn, Apply, Teach, Repeat: Guidelines for Technical Learning". (2019). medium.com.

[20] Xiao, Han, Kashif Rasul, and Roland Vollgraf. "Fashion-mnist: a novel image dataset for benchmarking machine learning algorithms." (2017). arXiv preprint arXiv:1708.07747.

Tags: AI Career Advice Data Science Deep Dives Machine Learning
