Essential Links for a Data Scientist
The Data Science field is evolving rapidly, with developments nearly every day that have massive implications for tech and society at large. Whether you’re a current or aspiring Data Scientist, an ML engineer, or a data enthusiast, these sites will help you keep up with the blistering pace of the field.
(Quick Disclaimer – I have no affiliation with any of the websites listed below and receive no compensation of any form for linking to them. These are just sites I find useful and believe are worth sharing.)
Latest News
Hacker News –
The preeminent tech news source for anyone in the know. It’s not pretty, but you can find some great nuggets here, and the discussion often adds solid context. The focus is on tech and entrepreneurship, and HN is run by the famed startup accelerator Y Combinator.
Twitter –
Love it or hate it, some of the best AI/ML discussion happens on Twitter, and there are tons of good accounts to follow. Be wary of AI influencers who make emphatic claims and write long threads full of bold text and exclamation marks. A random smattering of accounts I get value from:
@gdb – Cofounder of OpenAI
@karpathy – Former AI director at Tesla
@AndrewYNg – Stanford Professor and formerly at Google Brain
@emollick – Wharton Prof and one of my favorite follows
Reddit r/datascience subreddit –
The community is very broad and the commentary can be polarizing, but this is still a good place to keep up with DS news and industry trends.
Blogs and Communities
These are often the best places to get deeper breakdowns than a news headline can offer, and their tutorials and guides can be great as well. There are many DS blogs out there, but here are some of my favorites.
Towards Data Science –
Lots of great breakdowns and articles here. The quality can be a mixed bag, but most of it is fantastic and there is a huge amount of useful information. Beware that after 3 articles it will require you to subscribe to read more.
KDNuggets –
Widely considered the most popular DS blog, and it always has new and relevant content. It covers the whole range of Data Science, from SQL and databases to cutting-edge ML/AI and Large Language Models.
Cross Validated –
Part of the Stack Exchange network, this site is dedicated to “Statistics, machine learning, data analysis, data mining, and data visualization” and features Q&A for nearly every question you could think of.
AWS Machine Learning Blog –
Provides insights, tutorials, and updates on using AWS for machine learning and AI projects. Even if you don’t work in AWS, there’s still great content here.
One Useful Thing –
I follow this blog for ChatGPT content, and it’s a fantastic resource for learning about all the new things happening with these tools. I often look here for inspiration on new ways to use ChatGPT. One recent quality post was “How to use AI to do practical stuff.”
Specialized Libraries and Tools
Hugging Face –
A hub of open source models, including many of the current best Large Language Models (LLMs), with a real focus on community and open source. It is one of the hottest names of the current AI wave, and many of the most impactful AI advancements are happening here.
ArXiv (Artificial Intelligence and Machine Learning sections) –
This is a repository of new research papers posted before they are published in peer-reviewed journals, and it is full of the latest research developments in AI/ML. There are also sections for Computer Vision and NLP. It can be overwhelming, but the great papers you can find make it worth it.
ArXiv Sanity Preserver –
This site makes ArXiv easier to use and bubbles up the most impactful papers. It is a project by well-known AI researcher Andrej Karpathy.
Google Dataset Search –
A free tool for searching for datasets. You can find data on almost any topic imaginable here. It sources data from a vast array of sites, including public and private research institutions, government datasets, Kaggle, and a variety of other sources.
Learning and Education
Kaggle –
A massive and very well-known data science community. You can download and share datasets, find code examples, and participate in DS competitions to see who can create the best model for a given dataset. They also have a Jupyter notebook environment that lets you write Python and create models for free. The competitions can be a great way to learn and can even land you a job if someone takes notice of your performance.
DataCamp –
One of the most popular data-related skill-building websites. There are some great courses here that you can take for free or for only a few dollars a month. I highly recommend it as a way to keep your abilities sharp and upskill. However, if you’re looking for credentials for a resume, there are likely higher-prestige online programs to take (that will also cost more).
Looking for more knowledge? Check out our article on “Free Data Science Books” for a curated list of open source/free access books.