“I can’t make bricks without clay.” Any guesses who said this? It wasn’t a famous tech CEO or a data analyst. The speaker lived long before tech companies even existed: it was Sherlock Holmes, the famous detective created by Sir Arthur Conan Doyle. What Doyle meant was that Holmes couldn’t draw any conclusions (bricks) without data (clay). In that sense, data is the building block of every decision we make. We might not realize it, but people analyze data all the time. For instance, while buying a new smartphone, we pay attention to features like price, the company’s reputation, storage capacity, RAM, and display. We compare these features across brands and then decide which phone is best for us. This is an everyday example of using data to make the right decision.
In this project, I decide on the best possible location to open a restaurant in Bangalore with the help of a popular machine learning algorithm, K-means clustering. I analyze the localities of the city to identify the most profitable area, since the success of a restaurant depends on the people, the ambience and, most importantly, the density of other restaurants in that locality.
I will go through the following steps one by one:
- Understanding the Business problem.
- Data Acquisition.
- Data Cleaning.
- Data Analysis/Modeling.
- Data Visualization.
Find the code associated with this article on GitHub
Understanding the Business problem.
Bangalore is the capital of the Indian state of Karnataka and has a population of over 10 million. The city's social and economic diversity is clearly reflected in its food: you can find people from both north and south India who have moved to Bangalore from various states. Because of the high population, restaurant owners end up sharing localities. Investors would therefore prefer to set up businesses in neighborhoods where the competition is less intense.
Data acquisition and cleaning.
1. Scraping data from a Wikipedia page into a DataFrame
I scraped a Wikipedia page with the Beautiful Soup library to get the list of localities in Bangalore. I used Beautiful Soup because it makes it easy to parse HTML and return values as text. I created an empty list called “neighborhoodList” and appended each locality to it in a for loop.
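Here is a minimal sketch of what this scraping step could look like. The article does not say which page or which HTML elements were used, so the `mw-category` container (the class MediaWiki puts on category listings), the helper name `extract_neighborhoods` and the sample HTML are all assumptions for illustration; only the `neighborhoodList` name comes from the text.

```python
from bs4 import BeautifulSoup


def extract_neighborhoods(html):
    """Return the text of every list item inside MediaWiki category blocks."""
    soup = BeautifulSoup(html, "html.parser")
    neighborhoodList = []  # empty list that the loop below fills
    for block in soup.find_all("div", class_="mw-category"):
        for item in block.find_all("li"):
            neighborhoodList.append(item.get_text(strip=True))
    return neighborhoodList


# Stand-in for the downloaded page (assumed structure). In practice the HTML
# would come from something like requests.get(url).text.
sample_html = """
<div class="mw-category">
  <ul>
    <li><a href="/wiki/Jayanagar">Jayanagar</a></li>
    <li><a href="/wiki/Whitefield">Whitefield</a></li>
  </ul>
</div>
"""

print(extract_neighborhoods(sample_html))
```

The resulting list can be turned into a DataFrame with `pd.DataFrame({"Neighborhood": neighborhoodList})`.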
2. Get the geographical coordinates
To visualize the localities on a map, I need the latitude and longitude of each one. The geocoder library can look up the coordinates of a locality, so I wrote a function that gets the latitude and longitude of each member of “neighborhoodList”. I saved the coordinates in a temporary DataFrame so that I could merge them into my main DataFrame.
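The lookup function could be sketched roughly as follows. The article only names the geocoder library, so the choice of the ArcGIS provider (`geocoder.arcgis`), the retry count, the column names and the helper names are assumptions. The `geocode` argument is a hypothetical hook that lets the example run offline with a stand-in lookup.

```python
import pandas as pd


def get_latlng(neighborhood, geocode=None, max_tries=3):
    """Return [lat, lng] for a locality, or [None, None] if every lookup fails."""
    if geocode is None:
        # Imported lazily so the helper can be tried with a stand-in geocoder.
        import geocoder
        geocode = lambda query: geocoder.arcgis(query).latlng  # assumed provider
    query = f"{neighborhood}, Bangalore, India"
    for _ in range(max_tries):
        coords = geocode(query)
        if coords:  # empty or None means no result came back
            return list(coords)
    return [None, None]


def build_coords_frame(neighborhoodList, geocode=None):
    """Temporary DataFrame of coordinates, ready to merge into the main one."""
    rows = [[name, *get_latlng(name, geocode)] for name in neighborhoodList]
    return pd.DataFrame(rows, columns=["Neighborhood", "Latitude", "Longitude"])


# Offline stand-in so the example runs without network access.
# The coordinates are illustrative, not taken from the article.
fake_lookup = {"Jayanagar, Bangalore, India": [12.93, 77.58]}
coords_df = build_coords_frame(["Jayanagar", "Nowhere"], geocode=fake_lookup.get)
print(coords_df)
```

Localities that cannot be resolved end up with missing coordinates, which makes them easy to spot and drop before plotting.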
3. Creating map of Bangalore
After getting the coordinates of Bangalore, I used the folium library to draw a map of the city. I then superimposed the localities on top of it: using a for loop, I added a marker for each locality, labeled with the neighborhood's name.
4. Using the Foursquare API to explore the Localities
Foursquare is a company that has built a massive dataset of location data, which is among the most comprehensive available and quite accurate. The Foursquare API lets us search for venues around a given location and returns venue data. After supplying the API credentials (client_id, client_secret) and a version, we can query the API for the top 100 venues within a radius of 2 km of each locality.
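The request could be sketched like this. I am assuming the legacy v2 `venues/explore` endpoint, since `client_id`, `client_secret` and a version date are its parameters. The helper names, the placeholder credentials and the hand-written sample payload are hypothetical and only mimic the response shape of that endpoint.

```python
from urllib.parse import urlencode

EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(client_id, client_secret, version, lat, lng,
                      radius=2000, limit=100):
    """Build the request URL for venues within `radius` metres of a point."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,          # API version date, e.g. "20180605"
        "ll": f"{lat},{lng}",
        "radius": radius,      # 2 km, as in the text
        "limit": limit,        # top 100 venues
    }
    return f"{EXPLORE_ENDPOINT}?{urlencode(params)}"


def parse_venues(payload, neighborhood):
    """Flatten a response into (neighborhood, name, lat, lng, category) rows."""
    rows = []
    for group in payload.get("response", {}).get("groups", []):
        for item in group.get("items", []):
            venue = item["venue"]
            categories = venue.get("categories") or [{}]
            rows.append((
                neighborhood,
                venue["name"],
                venue["location"]["lat"],
                venue["location"]["lng"],
                categories[0].get("name"),
            ))
    return rows


# Hand-written payload in the assumed response shape, so no credentials are needed.
sample_payload = {"response": {"groups": [{"items": [
    {"venue": {"name": "Example Cafe",
               "location": {"lat": 12.93, "lng": 77.58},
               "categories": [{"name": "Cafe"}]}},
]}]}}

url = build_explore_url("ID", "SECRET", "20180605", 12.93, 77.58)
venues = parse_venues(sample_payload, "Jayanagar")
print(venues)
```

With real credentials, the payload would come from something like `requests.get(url).json()`, called once per locality.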
5. Analyzing Each Locality
Now that we have the data from the Foursquare API, we can start analyzing the localities. One-hot encoding converts a categorical feature into a numeric array with one binary column per category. This lets us describe each locality by its venue categories.
Next, let’s group the rows by neighborhood, taking the sum of the frequency of occurrence of each category.
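A small sketch of these two steps with pandas. The toy venue table and its column names (`Neighborhood`, `VenueCategory`) are assumptions for illustration, not the article's data.

```python
import pandas as pd

# Toy venue table standing in for the Foursquare results.
venues_df = pd.DataFrame({
    "Neighborhood": ["Jayanagar", "Jayanagar", "Jayanagar", "Whitefield"],
    "VenueCategory": ["Indian Restaurant", "Cafe", "Indian Restaurant", "Pub"],
})

# One binary column per venue category.
onehot = pd.get_dummies(venues_df["VenueCategory"], dtype=int)
onehot.insert(0, "Neighborhood", venues_df["Neighborhood"])

# One row per locality: how often each category occurs there.
grouped = onehot.groupby("Neighborhood").sum().reset_index()
print(grouped)
```

Each row of `grouped` is now a numeric profile of one locality, which is the shape a clustering algorithm expects.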
6. Getting the total number of venues classified as restaurants for each locality.
After collecting data from the Foursquare API, I got a list of 7008 venues. However, not all venues are restaurants. I then collected a list of all restaurants from these 2710 venues and turned it into a DataFrame. I grouped the restaurants by locality and obtained a total of 139 localities. The locality with the highest number of restaurants is Jayanagar, with 55 restaurants.
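One way to do this filtering and counting is sketched below. The article does not say how a venue was recognized as a restaurant, so matching the word "Restaurant" in the category name is an assumption, as are the toy data and the column names.

```python
import pandas as pd

# Toy venue table standing in for the Foursquare results.
venues_df = pd.DataFrame({
    "Neighborhood": ["Jayanagar", "Jayanagar", "Jayanagar", "Whitefield", "Whitefield"],
    "VenueCategory": ["Indian Restaurant", "Chinese Restaurant", "Cafe",
                      "Pub", "Italian Restaurant"],
})

# Assumption: a venue is a restaurant if its category name contains "Restaurant".
is_restaurant = venues_df["VenueCategory"].str.contains("Restaurant", case=False, na=False)
restaurants_df = venues_df[is_restaurant]

# Number of restaurants per locality, largest first.
restaurant_counts = (
    restaurants_df.groupby("Neighborhood")
    .size()
    .reset_index(name="NumberOfRestaurants")
    .sort_values("NumberOfRestaurants", ascending=False)
    .reset_index(drop=True)
)
print(restaurant_counts)
```

Note that a locality with no restaurants at all drops out of `restaurant_counts` with this approach, so it would need to be merged back in with a count of zero if it should stay in the analysis.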
7. Clustering of restaurants by Localities.
I decided to segment the localities and group them into clusters of similar localities. To do that, I used K-means clustering, an unsupervised machine learning algorithm.
A WSS (within-cluster sum of squares) plot is used to identify the optimal number of clusters. The elbow in the graph marks that value: we should pick the K after which the WSS stops dropping sharply and the curve flattens out. Here, K=6 is a good choice.
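To make the elbow idea concrete, here is a sketch that computes the WSS for a range of K values with scikit-learn, where it is exposed as `inertia_`. The data is synthetic (three well-separated blobs), so the elbow lands at K=3 here, not at the K=6 found for the Bangalore data. The function name `wss_curve` is my own.

```python
import numpy as np
from sklearn.cluster import KMeans


def wss_curve(features, k_values, random_state=0):
    """Within-cluster sum of squares (KMeans inertia) for each candidate K."""
    return [
        KMeans(n_clusters=k, n_init=10, random_state=random_state)
        .fit(features)
        .inertia_
        for k in k_values
    ]


# Synthetic stand-in data: three well-separated blobs of 2-D points.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
features = np.vstack([c + 0.3 * rng.standard_normal((30, 2)) for c in centers])

k_values = list(range(1, 8))
wss = wss_curve(features, k_values)
for k, value in zip(k_values, wss):
    print(k, round(value, 2))
```

Plotting `wss` against `k_values` gives the elbow curve: the WSS falls steeply up to the true number of groups and only marginally afterwards.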
Clustering the Bangalore Localities Using K-Means with K=6
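The fitting step itself is short. The article does not state which features were fed to K-means; because the resulting clusters are described geographically, this sketch assumes latitude and longitude, and it uses randomly generated points around the city center as stand-in data.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Random stand-in localities scattered around central Bangalore (not real data).
rng = np.random.default_rng(42)
localities = pd.DataFrame({
    "Neighborhood": [f"Locality {i}" for i in range(60)],
    "Latitude": 12.97 + 0.1 * rng.standard_normal(60),
    "Longitude": 77.59 + 0.1 * rng.standard_normal(60),
})

# Assumption: clustering on coordinates only.
kmeans = KMeans(n_clusters=6, n_init=10, random_state=0)
localities["Cluster"] = kmeans.fit_predict(localities[["Latitude", "Longitude"]])

print(localities["Cluster"].value_counts().sort_index())
```

The `Cluster` column holds a label from 0 to 5 for every locality, which can then be used to color the markers on the map.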
Classifying the localities to represent restaurant density as follows:
- 0–19 Restaurants: “Low”;
- 20–39 Restaurants: “Medium”;
- 40 or more Restaurants: “High”.
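These bands map naturally onto `pandas.cut`. In this sketch I assume that a locality with exactly 40 restaurants counts as “High”, since the bands as listed leave that value open. The function and column names are illustrative.

```python
import numpy as np
import pandas as pd


def classify_density(counts):
    """Map restaurant counts to Low (0-19), Medium (20-39) or High (40+)."""
    bins = [0, 20, 40, np.inf]          # left-closed: [0, 20), [20, 40), [40, inf)
    labels = ["Low", "Medium", "High"]
    return pd.cut(counts, bins=bins, labels=labels, right=False)


counts = pd.Series([0, 19, 20, 39, 40, 55], name="NumberOfRestaurants")
density = classify_density(counts)
print(pd.DataFrame({"NumberOfRestaurants": counts, "Density": density}))
```

Using `right=False` keeps the lower edge of each band inclusive, so 20 falls in “Medium” and 0 is still classified as “Low”.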
Visualization of restaurant density categories
Over 50 percent of the localities fall under the “Low” restaurant density category, and fewer than 10 localities fall under the “High” category.
Visualization of Clusters
In the map of Bangalore below, the localities are color coded by cluster.
Results and conclusion
The restaurant clusters obtained from the machine learning model are:
- Cluster 0: West/South West Bangalore with low densities;
- Cluster 1: South Bangalore with low densities;
- Cluster 2: East Bangalore with low to medium densities;
- Cluster 3: North Bangalore with low densities;
- Cluster 4: Central Bangalore (North Zone) with medium densities;
- Cluster 5: Central Bangalore (South Zone) with medium to high densities.
The results are intuitive: restaurant densities are mostly low at the peripheries of the city (clusters 0, 1 and 3), higher toward the center (cluster 5), and moderate in cluster 2.
Some of the neighborhoods near the Marathahalli/Whitefield area have medium densities, which can be explained by development driven by the numerous corporate offices there.
From an investor's point of view, there is high potential for opening new restaurants at the peripheries of the city, where there is little or no competition from existing restaurants. Opening a restaurant in central Bangalore, by contrast, would be a bad idea because of the high density of restaurants and the intense competition. An investor planning to open a restaurant in the east or north of Bangalore, where competition is medium, will need to stand out, for example through aggressive promotion or by hiring a great chef.
This analysis supports decisions based only on location and competition. But we always want to improve, right? The analysis could be extended by asking additional questions such as:
- How much do people usually spend when eating at a restaurant?
- How much money is a customer willing to pay for a good meal?
- Is a customer willing to travel some distance to your restaurant?
- What are customers' food preferences?
Public data on these questions is not readily available right now, but the analysis can be improved with more data and different machine learning techniques. The same approach can also be applied to other scenarios, such as opening a new gym, an ice cream parlor, or a jewelry shop.