This is my final project for Module 3 at Flatiron School. After learning about machine learning in this module, I applied that knowledge to predict customer churn for a telecom company.
If anyone is interested in my project, please feel free to have a look at this link.
Every day, people switch from one mobile network to another. Service providers generally work harder to retain their current customers than to acquire new ones. In this project, I build a classifier to predict which of the telecom company's customers are likely to churn.
I used the SyriaTel customer churn dataset, which is available on Kaggle. Here is a preview of the data:
- state – the state where the customer lives
- account length – how long the customer has used the service
- area code – the customer's area code
- phone number – the customer's phone number
- international plan – whether the customer has the international plan
- voice mail plan – whether the customer has the voice mail plan
- number vmail messages – number of voice mail messages
- total day minutes – total length of daytime calls
- total day calls – total number of daytime calls
- total day charge – total charge for daytime calls
- total eve minutes – total length of evening calls
- total eve calls – total number of evening calls
- total eve charge – total charge for evening calls
- total night minutes – total length of night-time calls
- total night calls – total number of night-time calls
- total night charge – total charge for night-time calls
- total intl minutes – total length of international calls
- total intl calls – total number of international calls
- total intl charge – total charge for international calls
- customer service calls – number of calls the customer made to customer service
- churn – whether the customer stopped using the service
Exploratory Data Analysis
After pre-processing and cleaning the data, let's look at the number of staying and leaving customers.
Of the 3,333 customers observed:
- Customers who are leaving make up a fairly small share, approximately 14.49%.
- Customers who are staying make up the remaining 85.51%.
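A minimal sketch of how such class shares can be computed. The labels below are made up for illustration and are not the SyriaTel data; the helper name `class_shares` is hypothetical.

```python
from collections import Counter


def class_shares(labels):
    """Return the percentage share of each class label."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: 100 * n / total for label, n in counts.items()}


# Toy labels for illustration only: True means the customer churned.
toy_churn = [True, False, False, False, True,
             False, False, False, False, False]

shares = class_shares(toy_churn)
for label, pct in shares.items():
    print(f"churn={label}: {pct:.2f}%")
```

The shares always add up to 100%, which is a quick sanity check on reported percentages. If the data is in a pandas DataFrame named `df` (an assumption), `df["churn"].value_counts(normalize=True)` gives the same proportions.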
There are only two plans for customers, so let's see how these two plans relate to the number of churned customers.
As the bar graph shows, most churned customers used neither the international plan nor the voice mail plan. This suggests that the current international plan may not be attractive enough to use, and that improving it could help prevent customers from leaving. Conversely, the graph suggests that many customers are leaving because of the voice mail plan.
From the above graph, the area code does not seem to affect churn. The data only shows that most SyriaTel customers are in the 415 area code.
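Comparisons like the ones above (churn by plan and by area code) can be sketched with a pandas `groupby`. The rows below are toy data, not the SyriaTel dataset. The column names follow the preview, while the "yes"/"no" and True/False encodings and the helper name `churn_rate_by` are assumptions.

```python
import pandas as pd

# Toy rows for illustration only; encodings are assumed, not confirmed.
toy = pd.DataFrame({
    "international plan": ["no", "no", "yes", "yes", "no", "no"],
    "voice mail plan": ["no", "yes", "no", "no", "yes", "no"],
    "area code": [415, 415, 408, 510, 415, 408],
    "churn": [False, False, True, False, False, True],
})


def churn_rate_by(df, column):
    """Share of churned customers within each value of `column`."""
    churned = df["churn"].astype(int)  # True -> 1, False -> 0
    return churned.groupby(df[column]).mean()


for col in ["international plan", "voice mail plan", "area code"]:
    print(churn_rate_by(toy, col), end="\n\n")
```

Looking at the churn rate within each group, rather than raw counts, keeps a large group (such as customers without a plan) from dominating the comparison just because of its size.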
Therefore, the features used in modeling are:
- account length
- area code
- international plan
- voice mail plan
- number vmail messages
- total day calls
- total day charge
- total eve calls
- total eve charge
- total night calls
- total night charge
- total intl calls
- total intl charge
- customer service calls
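A minimal sketch of selecting the columns listed above and turning the two plan columns into numbers. It assumes the plans are stored as "yes"/"no" and churn as True/False. The two rows are made up, and `build_features` is a hypothetical helper rather than the project's actual code.

```python
import pandas as pd

FEATURES = [
    "account length", "area code", "international plan", "voice mail plan",
    "number vmail messages", "total day calls", "total day charge",
    "total eve calls", "total eve charge", "total night calls",
    "total night charge", "total intl calls", "total intl charge",
    "customer service calls",
]


def build_features(df):
    """Keep the modeling columns and encode yes/no plans as 1/0."""
    X = df[FEATURES].copy()
    for col in ["international plan", "voice mail plan"]:
        X[col] = X[col].map({"yes": 1, "no": 0})
    y = df["churn"].astype(int)  # 1 = churned, 0 = stayed
    return X, y


# Two made-up rows, for illustration only.
toy = pd.DataFrame({
    "state": ["KS", "OH"],
    "account length": [100, 80],
    "area code": [415, 408],
    "international plan": ["no", "yes"],
    "voice mail plan": ["yes", "no"],
    "number vmail messages": [20, 0],
    "total day calls": [100, 90],
    "total day charge": [40.0, 30.0],
    "total eve calls": [95, 105],
    "total eve charge": [17.0, 16.0],
    "total night calls": [90, 100],
    "total night charge": [10.0, 11.0],
    "total intl calls": [3, 4],
    "total intl charge": [2.7, 3.0],
    "customer service calls": [1, 4],
    "churn": [False, True],
})

X, y = build_features(toy)
print(X.shape)
```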
In this project, I tried the following models:
- Logistic Regression
- Decision Tree
- Random Forest
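A minimal sketch of comparing these three model types with scikit-learn. It uses synthetic stand-in data from `make_classification`, not the SyriaTel dataset, and the hyperparameters (`max_depth`, `n_estimators`, the 75/25 split) are assumptions rather than the settings used in the project.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in: 14 features, roughly 15% positive (churn) class.
X, y = make_classification(
    n_samples=1000, n_features=14, n_informative=6,
    weights=[0.85, 0.15], random_state=42,
)

# Stratify so train and test keep the same class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42,
)

models = {
    # Logistic regression benefits from scaled inputs; trees do not need it.
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "Decision Tree": DecisionTreeClassifier(max_depth=5, random_state=42),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: {scores[name]:.4f}")
```

The scores printed here describe the synthetic data only and say nothing about which model performs best on the real dataset.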
The best result came from the Decision Tree model. Its accuracy is 92.81%, and the confusion matrix is as below:
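As a reminder of how an accuracy figure relates to a confusion matrix, here is a minimal sketch on made-up labels. These are not the project's predictions, and the numbers are for illustration only.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Made-up labels: 1 = churned, 0 = stayed.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 0]
y_pred = [0, 0, 0, 0, 0, 1, 1, 0, 1, 0]

# For binary labels 0/1, scikit-learn lays the matrix out as
# [[TN, FP],
#  [FN, TP]]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)
recall = tp / (tp + fn)  # share of actual churners that were caught

print(f"TN={tn} FP={fp} FN={fn} TP={tp}")
print(f"accuracy={accuracy:.2f} recall={recall:.2f}")
```

Because about 85% of the customers stay, a model that always predicted "stay" would already reach roughly 85% accuracy, so the confusion matrix is a useful companion to the accuracy number.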
From the feature importances, it is quite clear that the main reason customers leave is cost. The top three features are total daytime charge, total evening charge, and total international charge. If SyriaTel wants to keep its current customers, it should reduce calling charges during the daytime and evening, or release a plan that covers those times.
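A minimal sketch of reading feature importances from a fitted decision tree. It uses synthetic data and placeholder feature names (`feature_0`, `feature_1`, and so on), so the ranking it prints has nothing to do with the SyriaTel results.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data with placeholder feature names.
feature_names = [f"feature_{i}" for i in range(6)]
X, y = make_classification(
    n_samples=500, n_features=6, n_informative=3, random_state=0,
)

tree = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X, y)

# feature_importances_ holds one non-negative score per feature.
importances = pd.Series(
    tree.feature_importances_, index=feature_names
).sort_values(ascending=False)

print(importances.head(3))
```

An importance score shows how much a feature contributed to the tree's splits, not the direction of its effect, so it is worth checking separately how each charge actually differs between churned and staying customers.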
From the analysis, the recommendations for the company are as follows:
- There are only two plans for customers, and they are not enough. The company should improve the existing plans or add more.
- The best model is the Decision Tree with 92.81% accuracy.
- The most important factors are the total charges for daytime, evening, and international calls. This supports the view that churned customers are not satisfied with the cost.
In the future, more detail about charges should be included, because cost is the factor that concerns customers most. I should also try other ML algorithms to make sure the analysis is as good as it can be.