Machine learning with Flowchart

Umakant_Shinde
Published in Analytics Vidhya · 3 min read · Jan 6, 2021

A step-by-step process for solving machine learning problems

Photo by Ramón Salinero on Unsplash

We already know what machine learning is, but here is a short definition of machine learning:

machine

Flowchart for solving machine learning problems

Collect Data:-

To solve a machine learning problem, we first need raw data, because without raw data there is nothing to learn from. We get the raw data through further discussion of the problem between the client and the data science team. At this stage the focus is on data integration, which is a very difficult task because we collect data from multiple sources: structured data, unstructured data, web scraping, and so on. The collected data is stored in a data warehouse, and we pull the data we need from that warehouse. I know this work is usually done by data engineers, but as data scientists we should still know how data is collected from multiple sources, because integrating those sources is a real problem that data engineers face.
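To make data integration concrete, here is a minimal sketch assuming two hypothetical sources: a CSV export and a JSON feed (for example from an API or a scraper). Python's built-in `sqlite3` in-memory database stands in for a real data warehouse; the table names, columns and values are illustrative only.

```python
import csv
import io
import json
import sqlite3

# Hypothetical source 1: structured CSV export (e.g. from a CRM system)
CSV_SOURCE = """customer_id,age,city
1,34,Pune
2,,Mumbai
3,45,Nagpur
"""

# Hypothetical source 2: JSON records (e.g. pulled from an API or scraped)
JSON_SOURCE = '[{"customer_id": 1, "spend": 120.5}, {"customer_id": 3, "spend": 80.0}]'


def integrate(csv_text: str, json_text: str) -> list:
    """Load both sources into an in-memory SQLite 'warehouse' and join them on customer_id."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, age INTEGER, city TEXT)")
    con.execute("CREATE TABLE spending (customer_id INTEGER, spend REAL)")

    for row in csv.DictReader(io.StringIO(csv_text)):
        # Empty CSV fields become NULL so missing values stay visible for the cleaning step
        age = int(row["age"]) if row["age"] else None
        con.execute("INSERT INTO customers VALUES (?, ?, ?)",
                    (int(row["customer_id"]), age, row["city"]))

    for rec in json.loads(json_text):
        con.execute("INSERT INTO spending VALUES (?, ?)", (rec["customer_id"], rec["spend"]))

    # LEFT JOIN keeps every customer, even those with no spending record
    rows = con.execute(
        "SELECT c.customer_id, c.age, c.city, s.spend "
        "FROM customers c LEFT JOIN spending s ON c.customer_id = s.customer_id "
        "ORDER BY c.customer_id"
    ).fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    for row in integrate(CSV_SOURCE, JSON_SOURCE):
        print(row)
```

Note that the joined result still contains `None` values (customer 2 has no age and no spending record); handling those is exactly the data cleaning step described next.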

Data Analysis:-

After collecting the data, the second step is data analysis. Here I am also covering an extra point: data cleaning.

Data cleaning means handling null (missing) values, either by removing them or by filling them in with an imputer. A dataset can contain many null values, and we usually don't know in advance how many. The simplest form of cleaning is removing the rows that contain null values, but another technique is imputation, which fills the null values in instead of throwing the rows away.

Using an imputer means that instead of removing a null value, we replace it with a value computed from the rest of that column, such as the column's mean or median. Mean and median imputation only work for numeric columns, not for string columns. For a string column we can either remove the rows with null values or fill them with the most frequent value in the column. After all null values have been handled this way, we have a clean dataset.
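The two cleaning options above (drop vs. impute) can be sketched with the standard library alone. The toy dataset and the `drop_nulls`/`impute` helpers are hypothetical; in practice, pandas `DataFrame.dropna` and scikit-learn's `SimpleImputer` (strategies `"mean"`, `"median"`, `"most_frequent"`, `"constant"`) cover the same ground.

```python
from statistics import mean, median, mode

# Hypothetical toy dataset: None marks a null value
rows = [
    {"age": 34, "city": "Pune"},
    {"age": None, "city": "Mumbai"},
    {"age": 45, "city": None},
    {"age": 29, "city": "Pune"},
]


def drop_nulls(data):
    """Option 1: remove every row that contains at least one null value."""
    return [r for r in data if all(v is not None for v in r.values())]


def impute(data, column, strategy="mean"):
    """Option 2: replace nulls in one column with a statistic of the observed values."""
    observed = [r[column] for r in data if r[column] is not None]
    if strategy == "mean":
        fill = mean(observed)        # numeric columns only
    elif strategy == "median":
        fill = median(observed)      # numeric columns only
    elif strategy == "most_frequent":
        fill = mode(observed)        # works for strings as well as numbers
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    # Build new dicts so the original rows are left unchanged
    return [{**r, column: fill if r[column] is None else r[column]} for r in data]


if __name__ == "__main__":
    print(drop_nulls(rows))          # keeps only the 2 complete rows
    filled = impute(impute(rows, "age", "median"), "city", "most_frequent")
    print(filled)
```

Dropping is simpler but loses half of this tiny dataset; imputation keeps every row at the cost of inserting estimated values.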

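After cleaning, we analyze the data to find out which machine learning method suits the dataset best. We look at the relationships between the features (and between the features and the target) to see whether the data is suitable for linear regression, logistic regression, clustering, or some other method. For example, a numeric target that rises or falls roughly linearly with a feature points toward linear regression, a categorical target such as yes/no points toward a classifier like logistic regression, and data with no target at all points toward clustering.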
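One simple way to check the relationship between a feature and a numeric target is the Pearson correlation coefficient: values near +1 or -1 indicate a strong linear relationship, which suggests linear regression may be a reasonable starting point. This is a minimal sketch; the `hours`/`score` data is hypothetical, and pandas `DataFrame.corr()` computes the same coefficient for every pair of columns.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length (at least 2 values)")
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / (sx * sy)


if __name__ == "__main__":
    # Hypothetical feature (hours studied) and target (exam score)
    hours = [1, 2, 3, 4, 5]
    score = [52, 55, 61, 64, 70]
    print(f"correlation = {pearson(hours, score):.2f}")
```

Correlation only captures linear relationships, so a value near 0 does not rule out a strong non-linear pattern; plotting the data is still worthwhile.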

Split Data:-

The third step is to split the data into a training set and a testing set. Using about 80% of the data for training and 20% for testing is a common rule of thumb in machine learning.
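An 80/20 split can be sketched as "shuffle, then cut". The `split_data` helper below is hypothetical; scikit-learn's `train_test_split(X, y, test_size=0.2, random_state=...)` does the same job for arrays and DataFrames.

```python
import random


def split_data(data, test_size=0.2, seed=42):
    """Shuffle a copy of the data and split it into (train, test)."""
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)   # fixed seed makes the split reproducible
    n_test = round(len(shuffled) * test_size)
    return shuffled[n_test:], shuffled[:n_test]


if __name__ == "__main__":
    train, test = split_data(range(100))
    print(len(train), len(test))            # 80 20
```

Shuffling before cutting matters: if the data is sorted (for example by date or by class), taking the last 20% without shuffling gives a test set that does not look like the training set.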

Train Data:-

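In this step, we train the model on the training data so that the machine learns the patterns in it by itself. We also do one more step: validation. Part of the training data is held out as a validation set, because training can lead to either overfitting or underfitting. Overfitting means the model fits the training data too closely, including its noise, so it scores well on the training data but poorly on new data. Underfitting means the model is too simple to capture the pattern, so it scores poorly on both. Comparing the training score with the validation score tells us which problem we have. An everyday example of overfitting: you go to a new area, the first person you meet treats you rudely, and you conclude that all the people there are the same.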
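The difference between overfitting and underfitting shows up when training error is compared with validation error. This sketch assumes hypothetical noisy linear data (`y = 2x + 1 + noise`) and three deliberately simple models: a constant predictor (too simple, underfits), a least-squares line, and a 1-nearest-neighbour model that memorises the training set (overfits).

```python
import random


def make_data(n, seed):
    """Hypothetical noisy linear data: y = 2x + 1 + Gaussian noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(n)]
    return [(x, 2 * x + 1 + rng.gauss(0, 2)) for x in xs]


def mse(model, data):
    """Mean squared error of a model (a function x -> y) on (x, y) pairs."""
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)


def fit_constant(train):
    """Too simple (underfits): always predicts the mean of y."""
    m = sum(y for _, y in train) / len(train)
    return lambda x: m


def fit_line(train):
    """Ordinary least squares line y = a*x + b."""
    n = len(train)
    mx = sum(x for x, _ in train) / n
    my = sum(y for _, y in train) / n
    a = sum((x - mx) * (y - my) for x, y in train) / sum((x - mx) ** 2 for x, _ in train)
    b = my - a * mx
    return lambda x: a * x + b


def fit_1nn(train):
    """Memorises the training set (overfits): returns the y of the closest training x."""
    return lambda x: min(train, key=lambda p: abs(p[0] - x))[1]


if __name__ == "__main__":
    train, valid = make_data(80, seed=0), make_data(20, seed=1)
    for name, fit in [("constant", fit_constant), ("line", fit_line), ("1-NN", fit_1nn)]:
        model = fit(train)
        print(f"{name:8s} train MSE={mse(model, train):6.2f}  validation MSE={mse(model, valid):6.2f}")
```

The 1-NN model always has zero training error, so only the validation error reveals that it has memorised the noise; the constant model is poor on both sets. On data like this, the line would typically give the best validation error.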

Test and evaluate:-

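In the testing phase, we check whether the model works well and whether it is heading in the right direction. One way to do this is cross-validation, which trains and tests the model on several different splits of the data. There are several cross-validation techniques (I will cover them in depth in the next blog). For classification models, we also use a confusion matrix to check model performance.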
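Both ideas from the testing phase can be sketched in a few lines. `k_fold_indices` shows plain (unshuffled) k-fold cross-validation, and `confusion_matrix` counts the four outcomes of a binary classifier. Both helpers are hypothetical; scikit-learn provides `KFold`/`cross_val_score` in `sklearn.model_selection` and `confusion_matrix` in `sklearn.metrics` (which returns a 2x2 array laid out as `[[TN, FP], [FN, TP]]` for labels 0 and 1).

```python
def k_fold_indices(n, k=5):
    """Yield (train_idx, test_idx) for k-fold cross-validation over n samples (no shuffling)."""
    # Spread the remainder over the first folds so fold sizes differ by at most 1
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n))
        yield train_idx, test_idx
        start += size


def confusion_matrix(y_true, y_pred, positive=1):
    """Return counts (TP, FP, FN, TN) for a binary classifier."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1      # predicted positive, actually positive
        elif p == positive:
            fp += 1      # predicted positive, actually negative
        elif t == positive:
            fn += 1      # predicted negative, actually positive
        else:
            tn += 1      # predicted negative, actually negative
    return tp, fp, fn, tn


if __name__ == "__main__":
    for train_idx, test_idx in k_fold_indices(10, k=3):
        print("test fold:", test_idx)
    tp, fp, fn, tn = confusion_matrix([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 1, 0])
    print("accuracy:", (tp + tn) / (tp + fp + fn + tn))
```

In k-fold cross-validation every sample lands in exactly one test fold, so averaging the k scores gives a more stable estimate than a single train/test split.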

Model deployment:-

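This last step is not machine learning engineering as such; it is a step for the data scientist. Model deployment means saving the trained model, for example as a pickle file, and then loading that file in a web application or other software (s/w) so the model can make predictions there.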
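Saving and reloading a model with Python's built-in `pickle` module looks like this. `LinearModel` is a hypothetical stand-in for a trained model; a fitted scikit-learn estimator can be pickled the same way (or saved with `joblib`).

```python
import os
import pickle
import tempfile
from dataclasses import dataclass


@dataclass
class LinearModel:
    """Hypothetical trained model: y = a*x + b."""
    a: float
    b: float

    def predict(self, x: float) -> float:
        return self.a * x + self.b


def save_model(model, path):
    """Serialise the trained model to a pickle file."""
    with open(path, "wb") as f:
        pickle.dump(model, f)


def load_model(path):
    """Load a model from a pickle file. Only unpickle files you trust:
    loading a pickle can execute arbitrary code."""
    with open(path, "rb") as f:
        return pickle.load(f)


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "model.pkl")
    save_model(LinearModel(a=2.0, b=1.0), path)   # done once, after training
    model = load_model(path)                      # done inside the web app or service
    print(model.predict(3.0))                     # prints 7.0
```

One design point to keep in mind: a pickle stores a reference to the model's class, not its code, so the program that loads the file must be able to import the same class (and compatible library versions) as the program that saved it.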


Computer Science Engineer working in machine learning and data science. I'm trying to cover topics in the data science domain, from basic to advanced level.