Data Science Project Flow Overview
1. Decode the business problem
It is very important to understand the business problem before moving on to the next step.
Ask plenty of relevant questions to understand the problem, and then set clear objectives that define how you are going to solve it.
2. Data acquisition/gathering
Gather data from multiple sources such as web servers, logs, databases, APIs, and online repositories. Finding the right data takes a lot of effort and time.
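As a rough illustration of pulling data from more than one source, the Python sketch below merges records from a CSV export, a JSON payload of the kind an API might return, and a database table. Everything here is hypothetical: the inline strings and the in-memory SQLite table stand in for real files, endpoints, and servers, and `customer_id` is an assumed join key.

```python
import csv
import io
import json
import sqlite3

# Hypothetical source 1: a CSV export (e.g. from a log or an online repository).
CSV_EXPORT = """customer_id,region
1,North
2,South
"""

# Hypothetical source 2: a JSON payload such as an API might return.
API_RESPONSE = '[{"customer_id": 1, "orders": 5}, {"customer_id": 2, "orders": 3}]'


def load_csv(text):
    """Read CSV rows into a dict keyed by customer_id."""
    reader = csv.DictReader(io.StringIO(text))
    return {int(row["customer_id"]): {"region": row["region"]} for row in reader}


def load_api(payload):
    """Parse a JSON list of records into a dict keyed by customer_id."""
    return {rec["customer_id"]: {"orders": rec["orders"]} for rec in json.loads(payload)}


def load_database():
    """Query a hypothetical 'customers' table from an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (customer_id INTEGER, name TEXT)")
    conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Asha"), (2, "Ben")])
    rows = conn.execute("SELECT customer_id, name FROM customers").fetchall()
    conn.close()
    return {cid: {"name": name} for cid, name in rows}


def gather():
    """Merge the three sources on customer_id into one record per customer."""
    merged = {}
    for source in (load_database(), load_csv(CSV_EXPORT), load_api(API_RESPONSE)):
        for cid, fields in source.items():
            merged.setdefault(cid, {"customer_id": cid}).update(fields)
    return [merged[cid] for cid in sorted(merged)]


if __name__ == "__main__":
    for record in gather():
        print(record)
```

In a real project the CSV text would come from a file or repository download and the JSON from an HTTP call; only the idea of merging on a shared key carries over.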
3. Data preparation
Once the data is gathered, the next step is data preparation, which involves two activities:
3.1. Data Cleaning - This is usually the most time-consuming step. Here you find and fix problems such as inconsistent data types, misspelled attributes, and missing and duplicate values.
3.2. Transformation - Here you modify the data according to rules that have already been defined. ETL tools such as Talend and Informatica PowerCenter help handle complex transformations, which also helps the team understand the data structure better.
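The cleaning and transformation activities above can be sketched in plain Python. The sample records, the `COLUMN_FIXES` spelling map, the median-based fill for missing ages, and the `REGION_BY_CITY` rule table are all hypothetical; they only show where each problem named in 3.1 gets handled and how predefined rules drive a transformation in 3.2. The sketch is not meant to mirror how ETL tools such as Talend or Informatica PowerCenter work.

```python
import statistics

# Hypothetical raw records showing the problems listed in 3.1.
RAW = [
    {"id": "1", "age": "34", "city": "pune"},  # numbers stored as strings, inconsistent case
    {"id": 2, "Age": 51, "ctiy": "Delhi"},     # misspelled attribute names
    {"id": 3, "age": None, "city": "Mumbai"},  # missing value
    {"id": 1, "age": 34, "city": " Pune"},     # duplicate of the first row once normalized
]

# Hypothetical mapping from misspelled attribute names to the expected ones.
COLUMN_FIXES = {"Age": "age", "ctiy": "city"}

# Hypothetical predefined transformation rule: city -> sales region.
REGION_BY_CITY = {"Pune": "West", "Mumbai": "West", "Delhi": "North"}


def clean(records):
    cleaned, seen = [], set()
    for rec in records:
        # Rename misspelled attributes.
        row = {COLUMN_FIXES.get(key, key): value for key, value in rec.items()}
        # Enforce consistent data types and formatting.
        row["id"] = int(row["id"])
        row["age"] = int(row["age"]) if row.get("age") not in (None, "") else None
        row["city"] = str(row["city"]).strip().title()
        # Drop duplicates; they are only detectable after normalization.
        key = (row["id"], row["age"], row["city"])
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(row)
    # Fill missing ages with the low median of the known ages.
    fill = statistics.median_low([r["age"] for r in cleaned if r["age"] is not None])
    for r in cleaned:
        if r["age"] is None:
            r["age"] = fill
    return cleaned


def transform(records):
    """Apply predefined rules: bucket age and map city to region."""
    return [
        {
            **r,
            "age_band": "under_40" if r["age"] < 40 else "40_plus",
            "region": REGION_BY_CITY.get(r["city"], "Unknown"),
        }
        for r in records
    ]


if __name__ == "__main__":
    for row in transform(clean(RAW)):
        print(row)
```

Deduplication runs after the types and spacing are normalized: the first and last raw rows look different as stored, but they describe the same record.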
4. Exploratory data analysis - Understanding what you can actually do with the data is a critical part of a data science project, and it happens in this step. Here you define, and refine, the selection of feature variables that will be used in model development.
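One simple way to start the exploration is to summarize each candidate feature and keep those whose correlation with the target clears a threshold. The feature names, the sample values, and the 0.5 cutoff below are hypothetical. Pearson correlation only captures linear relationships, so treat this as a first filter, not the final feature selection.

```python
import math
import statistics

# Hypothetical feature columns and target (e.g. customer spend); values are illustrative only.
FEATURES = {
    "visits": [1, 2, 3, 4, 5],
    "time_on_site": [10, 12, 9, 11, 10],
    "discount_used": [0, 1, 0, 1, 1],
}
TARGET = [10, 20, 30, 40, 50]


def describe(values):
    """Basic summary statistics for one column."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "min": min(values),
        "max": max(values),
    }


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 for a constant column."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def select_features(features, target, threshold=0.5):
    """Keep features whose |correlation| with the target is at least `threshold`, strongest first."""
    scores = {name: pearson(col, target) for name, col in features.items()}
    kept = [name for name, r in scores.items() if abs(r) >= threshold]
    return sorted(kept, key=lambda name: abs(scores[name]), reverse=True)


if __name__ == "__main__":
    for name, col in FEATURES.items():
        print(name, describe(col), round(pearson(col, TARGET), 3))
    print("selected:", select_features(FEATURES, TARGET))
```

On this toy data `time_on_site` is only weakly (and negatively) correlated with the target, so it is dropped, while `visits` and `discount_used` are kept.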
5. Data modelling or model building - This is the step where you experiment with various machine learning algorithms and choose the best-performing model. You create models using different techniques such as KNN, Decision Tree, and Random Forest, fit each one on the training dataset, evaluate it on the test dataset, and then select the best performer. Programming languages such as Python and R are commonly used for building the models.
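The fit, evaluate, and select loop described above can be sketched without any libraries. Here a KNN classifier competes against a decision stump (a one-level decision tree, standing in for a full Decision Tree; Random Forest is left out to keep the sketch short). The datasets, `k=3`, and the class and function names are hypothetical. Selecting on the test set follows the description above; in practice a separate validation set or cross-validation is used for selection so that the final test score stays unbiased.

```python
import math
from collections import Counter

# Hypothetical 2-feature training and test sets with binary labels (0 or 1).
TRAIN = [
    ((1.0, 1.0), 0), ((1.5, 2.0), 0), ((2.0, 1.0), 0),
    ((5.0, 5.0), 1), ((6.0, 5.5), 1), ((5.5, 6.5), 1),
]
TEST = [
    ((1.2, 1.5), 0), ((5.8, 6.0), 1), ((2.2, 1.8), 0),
    ((4.8, 5.2), 1), ((3.0, 5.5), 1),
]


class KNN:
    """k-nearest neighbours: majority label among the k closest training points."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, data):
        self.data = list(data)
        return self

    def predict(self, x):
        nearest = sorted(self.data, key=lambda row: math.dist(row[0], x))[: self.k]
        return Counter(label for _, label in nearest).most_common(1)[0][0]


class DecisionStump:
    """A one-level decision tree: a single threshold on a single feature."""

    def fit(self, data):
        best = None
        for f in range(len(data[0][0])):
            values = sorted({x[f] for x, _ in data})
            # Candidate thresholds sit halfway between consecutive distinct values.
            for lo, hi in zip(values, values[1:]):
                t = (lo + hi) / 2
                left = [y for x, y in data if x[f] <= t]
                right = [y for x, y in data if x[f] > t]
                left_label = Counter(left).most_common(1)[0][0]
                right_label = Counter(right).most_common(1)[0][0]
                errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
                if best is None or errors < best[0]:
                    best = (errors, f, t, left_label, right_label)
        _, self.feature, self.threshold, self.left_label, self.right_label = best
        return self

    def predict(self, x):
        return self.left_label if x[self.feature] <= self.threshold else self.right_label


def accuracy(model, data):
    return sum(model.predict(x) == y for x, y in data) / len(data)


def select_best(models, train, test):
    """Fit every model on `train`, score it on `test`, return the best name and all scores."""
    scores = {name: accuracy(model.fit(train), test) for name, model in models.items()}
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    best, scores = select_best({"knn": KNN(k=3), "decision_stump": DecisionStump()}, TRAIN, TEST)
    print(scores, "->", best)
```

On this toy data the stump splits on the first feature at 3.5 and misclassifies the test point (3.0, 5.5), so KNN scores 1.0 against the stump's 0.8 and is selected. In scikit-learn, `KNeighborsClassifier`, `DecisionTreeClassifier`, and `RandomForestClassifier` follow the same fit-then-predict pattern (with features and labels passed separately), so the selection loop carries over.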
6. Visualization and communication - In this step, you create clear, relevant reports and charts to convey the business findings. Tools such as Tableau, Power BI, Zeppelin, and Jupyter notebooks (Python) can be used to build these reports.
7. Deployment & maintenance - A best practice is to test the model in a pre-production environment before deploying it to production. Once it is successfully deployed to production, you can perform real-time analytics, and monitoring its performance becomes an important ongoing task.
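Monitoring can start as simply as tracking accuracy over a sliding window of recent predictions once their true outcomes become known, and flagging the model when accuracy drops below an agreed level. The `PerformanceMonitor` class, the window size of 5, and the 0.8 threshold are hypothetical; production setups typically feed such metrics into a dashboard or alerting system instead of printing them.

```python
from collections import deque


class PerformanceMonitor:
    """Track accuracy over the most recent `window` predictions whose true outcome is known."""

    def __init__(self, window=5, threshold=0.8):
        self.outcomes = deque(maxlen=window)  # oldest results drop off automatically
        self.threshold = threshold

    def record(self, prediction, actual):
        self.outcomes.append(prediction == actual)

    def accuracy(self):
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def needs_attention(self):
        # Only alert once the window is full, so a few early errors do not trigger it.
        full = len(self.outcomes) == self.outcomes.maxlen
        return full and self.accuracy() < self.threshold


if __name__ == "__main__":
    monitor = PerformanceMonitor(window=5, threshold=0.8)
    # Hypothetical stream of (prediction, actual) pairs: five hits, then three misses.
    stream = [(1, 1)] * 5 + [(1, 0)] * 3
    for prediction, actual in stream:
        monitor.record(prediction, actual)
        print(round(monitor.accuracy(), 2), monitor.needs_attention())
```

After the three misses, the last five outcomes are two hits and three misses, so accuracy falls to 0.4 and `needs_attention()` returns True.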
Happy Learning!!!