I remember that back in 2015, when I was about to graduate from university, India’s tech giants visited our campus. Their main agenda was to deliver pre-placement talks and conduct seminars on corporate culture and business processes. They also ran sessions on the by-laws of working in a corporate office and workshops centred on cutting-edge technologies, including software engineering, game development, automation testing, robotic process automation and analytics. As a young tech graduate, or rather a soon-to-be tech graduate, I was particularly fascinated by analytics and data science. Terms like big data, cloud storage, machine learning and artificial intelligence blew my mind, and I became obsessed with these technologies.

Isn’t it wonderful how writing a few lines of code can turn a machine into an intelligent appliance that can mimic human actions?

Isn’t it fascinating that layers of artificial neurons, interlinked with one another through weighted connections, can actually process data much as our brain does?

Isn’t it brilliant to have an assistant right at our fingertips to help us plan our days better and keep reminding us of important assignments?

These were the exciting questions I was bombarded with in that seminar hall, where industry experts were presenting their views on these topics. If you are fascinated by these technologies and want to work in these niche areas, this article is for you. If you are already an expert in these topics, please go through the article and suggest where it can be improved.

Getting into the world of Analytics

The seminars and workshops I mentioned at the beginning of the article ended on a pretty rough note: the companies announced that openings in these fields were restricted to master’s students, and I, being an undergraduate, could sense the bird flying out of the cage. But I had decided to learn data science, and there began my journey into data analytics.

So how did I start to Learn Data Science all on my own?

Is there a way to Learn Data Science without going for a full-fledged university program?

What are the skillsets and knowledge needed to Learn Data Science in 2020?

I will answer all these questions in this article. But before going further and giving you a step-by-step approach, I would like to answer some questions that I am often asked by young graduates entering the corporate world for the first time.

Breaking the myths: Data Science is data science and is not rocket science

Do I have to be an expert in various programming languages to enter this field?

The short and simple answer is no. You do not have to know a lot of languages to learn data science. On the brighter side, not knowing any programming language before getting into data science can even be a plus, because it is sometimes difficult to switch to a new language after working in one particular language for a very long time. So if you have never written code, no worries: prior coding experience is an advantage, not a necessity.

Do I have to be an expert in Mathematics to understand Data Science?

Yes, but only if you wish to optimize the existing algorithms further. If you have no such intentions, a basic knowledge of statistics, probability and linear algebra will do the job, with a pinch of calculus to understand how the algorithms work and why one algorithm suits a problem better than another.

I know how to code in Python. Does that mean I am a data scientist?

The answer here is no again. Python is a full-fledged programming language with support for web development, servers, general-purpose programming and machine learning. By learning Python, you perhaps know the basics and can write a small program, but data science goes beyond coding: it involves hypothesis testing, statistical inference and data visualization, which take practice and effort to learn and master.

Which programming language should I learn for data science: R or Python?

This is perhaps the most interesting of these questions, because there is no single correct answer. It depends a lot on the project, the expected output and the business questions that have to be answered. In my opinion, start with one and hop onto the other if need be, but do not try to learn both at the same time.

Answering the Big Question: The Path to Learn Data Science in 2020

  1. Understand Data Science: The first, and perhaps the most crucial, step to learn data science is to understand the term itself. This might seem odd at first: if the whole quest is to answer the question "what is data science?", how can understanding it be the first step? What I mean by understanding data science is understanding the process that a data science project follows. Data science is all about the following:
    • Asking a question
    • Getting relevant data to answer that question
    • Cleansing the data to deal with outliers and missing values
    • Exploring the data with basic statistical analysis, which gives you values like the mean, median and mode
    • Visualizing the data to find trends or correlations between attributes
    • Building a model that predicts the answer to the question
    • Evaluating the performance of the model
    • Deploying the model in production systems to answer the question in real time.
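
To make the cleansing and exploring steps above a little more concrete, here is a minimal sketch in Python with pandas. The salary figures are made up for illustration, and the 1.5 × IQR rule used to flag outliers is just one common convention that I am assuming here, not the only way to do it.

```python
import pandas as pd

# Hypothetical toy data: salaries with one missing value and one obvious outlier.
df = pd.DataFrame({"salary": [42, 45, 45, 47, None, 50, 400]})

# Cleanse: drop rows with missing values first.
clean = df.dropna()

# Cleanse: drop outliers using the 1.5 * IQR rule (an assumed convention).
q1, q3 = clean["salary"].quantile([0.25, 0.75])
iqr = q3 - q1
in_range = clean["salary"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
clean = clean[in_range]

# Explore: basic summary statistics on the cleaned data.
mean_salary = clean["salary"].mean()
median_salary = clean["salary"].median()
mode_salary = clean["salary"].mode().iloc[0]  # mode() can return several values
print(mean_salary, median_salary, mode_salary)
```

On real data you would look at the flagged rows before dropping them, since an "outlier" is sometimes the most interesting record in the dataset.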

By now, I am sure you have realized that data science is not just Python or any other programming language. Data science draws on mathematics, statistics, software engineering and coding, and is multidisciplinary in nature. It therefore becomes crucial to figure out where you want to start, and the general answer is: wherever suits your current skillset. If you are a graduate in business management, asking questions may be where you start; if you are a software engineer, you might want to start with deploying models into production systems, or even with cleansing data; and if you come from a statistics or mathematics background, you might want to start with visualizing and exploring data. One cannot get a grip on data science by learning everything in one go. It’s a step-by-step process.

  2. Start learning programming languages for data science (R/Python): Once you know where your focus should be, start getting your hands dirty with the technology behind it. This is the next milestone in the quest to learn data science. Since our end goal is to become a data scientist with end-to-end knowledge of a data science project and its implementation, it is imperative to invest time and energy in learning the programming language that carries out these tasks on a computer.

Both R and Python offer a wide range of software packages for the various stages of a data science project, from exploratory data analysis to modeling and deployment. You need not start with high-performance programming the moment you step into these subjects. Start with the basics and get used to the language’s syntax, performance characteristics and best practices.

After a point, when you are fairly comfortable with the syntax and flow of the programming language, start learning a package that complements the goals you set in the last milestone. For example, if you are into developing models, you can start with scikit-learn. Similarly, there are Matplotlib and seaborn for visualizing data, and NumPy and pandas for numerical analysis of data, to name a few.

  3. Learn Exploratory data analytics: Once you are familiar with the area you started with, you might want to broaden your horizons, and the next milestone in your quest to learn data science is exploratory data analysis. If you started with exploratory data analysis, please feel free to skip this topic and move to the next one.

The main packages that help a data scientist with exploratory data analysis are pandas, Matplotlib and seaborn. Since all these packages might be overwhelming at first, start with the basics, for example plotting a simple histogram of the data or fitting a trend line. My personal recommendation is to start with pandas and learn how to handle files and do basic data munging, then learn Matplotlib to understand the various simple graphs that can be plotted. Seaborn comes later: it is a higher-level plotting package built on top of Matplotlib, with additional control over attributes such as colors and figure styling.
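
As a rough idea of what "starting with the basics" can look like, the sketch below summarizes a column with pandas and plots a simple histogram with Matplotlib. The data is synthetic (generated with NumPy), and the column name `age` and the output file name are placeholders I made up for the example.

```python
import matplotlib
matplotlib.use("Agg")  # render to a file, so no display window is needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical data: 200 synthetic "ages" drawn from a normal distribution.
rng = np.random.default_rng(seed=0)
df = pd.DataFrame({"age": rng.normal(loc=35, scale=8, size=200).round()})

# pandas: quick numeric summary (count, mean, std, min, quartiles, max).
summary = df["age"].describe()
print(summary)

# Matplotlib: histogram of the distribution, saved as an image.
fig, ax = plt.subplots()
ax.hist(df["age"], bins=15)
ax.set_xlabel("age")
ax.set_ylabel("count")
ax.set_title("Distribution of age (synthetic data)")
fig.savefig("age_histogram.png")
```

With a real file you would typically replace the synthetic frame with something like `pd.read_csv(...)` and then repeat the same two steps column by column.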

  4. Start with Machine Learning: The most entertaining part of a data science project is undoubtedly modelling, and it is the next milestone in your quest to learn data science. It is at this juncture that we see lifeless circuits actually do some intelligent processing of data and give desirable or undesirable outputs. In data science, machine learning is used to predict future values, categorize populations or extract information from data available in a row-and-column format.

Python has a package called scikit-learn for machine learning. Below are some of the points that make scikit-learn one of the most popular software packages in Python:

  • scikit-learn offers a very stable and easy-to-learn interface for machine learning in Python
  • scikit-learn has a number of modules, each organized by type of machine learning algorithm, which makes it fairly intuitive for data scientists to use
  • The documentation for scikit-learn is available online and is very detailed, which makes the package easy to learn and use
  • scikit-learn provides a number of tuning parameters that play a key role in optimizing the models and their outputs.

This does not mean that scikit-learn is the only package available for machine learning. There are a handful of other popular packages, such as Keras, PyTorch, H2O and XGBoost. But scikit-learn is widely used, and for someone just starting out with machine learning and data analytics it is a good place to begin because of the tremendous support it gets from the online community.
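
The "easy to learn interface" is easier to appreciate with a small example. In the sketch below, two quite different algorithms are trained and used with exactly the same `fit` and `predict` calls. I am using the iris dataset that ships with scikit-learn purely for convenience; the choice of models and their settings is arbitrary.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

# Small built-in dataset: 150 flowers, 4 numeric features, 3 classes.
X, y = load_iris(return_X_y=True)

# Two different algorithms, one shared interface: fit() then predict().
models = (LogisticRegression(max_iter=1000), KNeighborsClassifier(n_neighbors=5))
for model in models:
    model.fit(X, y)
    predictions = model.predict(X[:5])  # predicted class labels for 5 rows
    print(type(model).__name__, predictions)
```

Swapping in another scikit-learn estimator usually means changing only the line that constructs the model.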

  5. Expand your knowledge on Machine Learning: Once you are comfortable with the basics of machine learning in the programming language of your choice (Python being my choice, of course), it is important to learn about the other areas that need as much attention as developing the model itself. This forms the next milestone in your quest to learn data science. Some of these areas are below:

  • Is there a way I can make the model more efficient and optimized?

This is called model optimization, and it is often done by tuning the parameters (hyperparameters) of the algorithm you are working with.
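
One way to tune parameters in scikit-learn is a grid search, sketched below under the assumption that you are using a k-nearest-neighbours classifier on the built-in iris dataset. The candidate values for `n_neighbors` are arbitrary picks for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Candidate values to try for one parameter (an arbitrary illustrative grid).
param_grid = {"n_neighbors": [1, 3, 5, 7, 9]}

# Try every candidate with 5-fold cross-validation and keep the best one.
search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5)
search.fit(X, y)

print(search.best_params_, round(search.best_score_, 3))
```

Grid search gets expensive quickly as you add parameters, so it is worth starting with a small grid like this one.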

  • How do I show the results to the stakeholders?

Often, when results are numbers or categories, they are not intuitive and are difficult to understand. The challenge here is to interpret the results and display them in a format that is easily understood by the audience, which can be senior management in a corporate setting or the public in the case of a research project.

  • How is the performance of my model?

This is called model evaluation, and there are a number of ways to do it. One commonly used method is the train-test split. In this method, the dataset is divided into two parts, usually in the ratio 80:20: 80% of the data is taken as the training set, from which the model learns, and 20% is held back as the test set, which is used to validate the model by running the model on it and counting how many entries it predicts correctly. There are other methods too, such as cross-validation, that give a more robust estimate of model performance.
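
Here is a minimal sketch of the 80:20 train-test method with scikit-learn. The dataset (iris), the model (a decision tree) and the `random_state` value are my own choices for the example; only the 80:20 ratio comes from the description above.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Hold back 20% of the rows as test data; the model never sees them in training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Learn from the 80% training split only.
model = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)

# Accuracy = fraction of test entries for which the prediction is correct.
accuracy = accuracy_score(y_test, model.predict(X_test))
print(len(X_train), len(X_test), accuracy)
```

Accuracy is only one metric; for imbalanced data you would usually look at others as well.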

  • Which attributes do I select, and which do I remove from the training data?

This is called feature engineering, where we find the features or attributes that contribute the most to the output or outputs the data science project is interested in. It is also common practice for data scientists to create additional features by combining or aggregating existing data values. All of this can be broadly classified as feature engineering, though many would argue that selecting and rejecting attributes is a separate activity (feature selection). Whatever the label, both should be done.
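
Both ideas can be sketched in a few lines. The table below is entirely made up (including the column names), the new feature is a simple sum of two columns, and `SelectKBest` with an ANOVA F-test is just one of several selection techniques scikit-learn offers, so please treat this as an illustration rather than a recipe.

```python
import pandas as pd
from sklearn.feature_selection import SelectKBest, f_classif

# Hypothetical toy data: pay details and whether the role is a senior one.
df = pd.DataFrame({
    "basic_pay": [30, 32, 35, 60, 65, 70],
    "bonus": [2, 3, 2, 10, 12, 11],
    "employee_id": [101, 205, 150, 120, 310, 260],  # an ID, not a real signal
    "senior": [0, 0, 0, 1, 1, 1],
})

# Feature creation: combine existing values into a new attribute.
df["total_pay"] = df["basic_pay"] + df["bonus"]

# Feature selection: keep the 2 attributes most related to the target.
features = df.drop(columns="senior")
selector = SelectKBest(score_func=f_classif, k=2).fit(features, df["senior"])
selected = list(features.columns[selector.get_support()])
print(selected)
```

On this toy table the identifier column scores far below the pay columns, which is the kind of attribute you would expect to drop.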

  6. Practice the concepts: A wise human once said, “Practice makes perfect”. The most crucial part of your quest to learn data science is practice. Below are some ways to practice the concepts you have learned:

  • Kaggle Competitions: Kaggle, a subsidiary of Google LLC, is a platform for data enthusiasts to learn data science. Kaggle regularly hosts competitions that you can use to practice the concepts and find the areas where you need to improve.

  • Open Source projects: There are a number of open-source projects on the internet that run on collaboration between data enthusiasts across the globe. Most of them maintain their own repositories, in Git or another version control system, on a hosting platform where you can contribute your own code to solve a particular problem.

  • Attending conferences and keeping yourself updated: Many conferences around the world focus on trends in data science and on what to expect in the future, since data science and machine learning are still emerging, actively researched fields. Conferences such as PyCon US, SciPy and the PyData events can be both enlightening and enriching for young data scientists.

The Bottom Line: Fall in Love with Data and if possible get married to it!

I have come across many young graduates with high energy and enthusiasm for data science. However, by my rough estimate, about 90% of them give up once data science starts to feel overwhelming. So if you are one of those who tried learning data science in the past but never got far, you are not alone. The fault is not with you but with the mindset and the guidance. I am a self-taught data nerd, and if I can do it, anyone can learn data science the right way.

The bottom line for data science is to fall in love with data and become obsessed with it. When I started with data science, my goal was to predict the industry-standard salary for a particular role, because I suspected that I was not being paid in line with my skillset and knowledge. Needless to say, the model I created initially failed big time because I lacked a basic understanding of data, but I was willing to go to any lengths to make the model better and better. This is when I started searching the internet for ways to make models better and more efficient. I was completely obsessed with the miniature project I had started, and no boundaries or difficulties stopped me midway.

What I have observed in my short stay in the IT world is that youngsters want to get into data science because they find it cool and expect the average salary to be much higher than that of other professionals in the sector, which is not necessarily true. As a result, when they face challenging problems and concepts in data science, they simply lose hope and give up. Another factor that leads youngsters to give up on data science is wrong guidance. Usually, when youngsters reach out for help getting started, they are bombarded with mathematical formulae, hefty algorithms and fuzzy calculations, which frighten them to the core. In my opinion, they should instead be given a problem statement and asked to solve it. That is a much better way of learning data science than going through algorithms one after another without ever applying them to a real-world problem. This is also why, in my experience, many students who are regulars on platforms like Kaggle are often more capable than others who have earned a degree in data science.

The bottom line: fall in love with data and, if possible, marry it, because that is exactly what it takes to become a good data scientist.

A Quick Wrap Up: Questions Answered

Now that you know the milestones in your quest to learn data science and the bottom line for becoming a great data scientist, it is time to answer the questions I am usually asked by young graduates and by experienced professionals planning a transition from their current role to a data science role.

How did I start to Learn Data Science all on my own?

Very simple. I married data. About three and a half years ago, I became obsessed with my model for predicting the industry-standard salary for a particular role. Unfortunately, I did not know about version control tools like Git back then, and I lost the model in a hard disk crash. Yet that model was my start in this vast field, and I remember spending sleepless nights improving it by learning different techniques one at a time. Some were fruitful and improved the model while others were not, but the learning was immense. As I said, the right amount of motivation takes you a long way, and that was indeed my case when I decided to build a model to help fellow employees and young graduates judge the remuneration in a job offer.

Is there a way to Learn Data Science without going for a full-fledged university program?

Yes, there definitely are ways of doing it. The data science community is a vast open-source community that is continuously improving and researching. The more you study other people’s models, read the documentation of various packages and apply them to real-world problems, the better you become at data science. A degree is good to have, but it is not a necessity for getting into a data science role.

What are the skillsets and knowledge needed to Learn Data Science in 2020?

Unfortunately, I do not have a concrete answer to this question, because I have seen professionals who had never coded in their lives get into data science, and I have seen management professionals and consultants do the same. In short, you do not need a particular skillset or body of knowledge before you start learning data science; you grow and nurture them as you learn.

In a nutshell, the way to learn data science in 2020 is to bring a lot of enthusiasm, find proper guidance and start with the foundations, without getting bogged down by complex algorithms or mathematics right at the beginning. To recap, below are the milestones you need to cover, one at a time, to learn and expand your horizons in data science:

  • Understand Data Science
  • Start learning Programming languages for data science (R/Python)
  • Learn Exploratory data analytics
  • Start with Machine Learning
  • Expand your knowledge on Machine Learning
  • Practice the concepts

And not to forget the bottom-line: Fall in Love with Data and if possible get married to it.

I hope this was an interesting read and that you now have a good idea of how to start learning data science in 2020.