Note:
Questions 1-5 are answered in this notebook, alongside the code and its development.
The report in the accompanying PDF file covers the following topics, as requested:
- Discuss the results (your understanding) of the clustering algorithm on the crime dataset used in this work (max. 500 words).
- Change the number of clusters to a different value, run the clustering algorithm again and redraw the graph. Discuss your results briefly.
- Consider a different town (for example, Dudley) and perform the clustering again. Choose the number of clusters from the dendrogram accordingly. Discuss your results briefly.
import pandas as pd
import matplotlib.pyplot as plt
import folium
import os, re
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import normalize
from IPython.display import IFrame
from sklearn.cluster import AgglomerativeClustering
import scipy.cluster.hierarchy as shc
%matplotlib inline
import warnings
warnings.filterwarnings('ignore')
path_to_data = './crime'
cd = os.path.dirname(os.path.abspath(path_to_data))
i = 0
columns = range(1,100)
dfList = []
for root, dirs, files in os.walk(cd):
    for fname in files:
        # Match only file names that end in ".csv" (the dot is escaped)
        if re.match(r"^.*\.csv$", fname):
            frame = pd.read_csv(os.path.join(root, fname))
            frame['key'] = "file{}".format(i)
            dfList.append(frame)
            i += 1
dataset = pd.concat(dfList)
dataset.head()
Crime ID | Month | Reported by | Falls within | Longitude | Latitude | Location | LSOA code | LSOA name | Crime type | Last outcome category | Context | key | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | cb47df2e6dbe1b37b6362a9312116911e0542f31dbdd56... | 2022-11 | West Midlands Police | West Midlands Police | -1.848972 | 52.588428 | On or near Woodside | E01009417 | Birmingham 001A | Other theft | Unable to prosecute suspect | NaN | file0 |
1 | 47d811512341fd0ecec13cb254b40f1b220928a31d349a... | 2022-11 | West Midlands Police | West Midlands Police | -1.849790 | 52.590937 | On or near Walsall Road | E01009417 | Birmingham 001A | Violence and sexual offences | Investigation complete; no suspect identified | NaN | file0 |
2 | da2ff35b910f6ec63d2f4f6c006fa46991675511a2f25d... | 2022-11 | West Midlands Police | West Midlands Police | -1.844834 | 52.590077 | On or near Heathfield Road | E01009417 | Birmingham 001A | Violence and sexual offences | Unable to prosecute suspect | NaN | file0 |
3 | 08fe094d4f9b3df0a748b8605c92e58c7cc6ebf171bdc3... | 2022-11 | West Midlands Police | West Midlands Police | -1.849790 | 52.590937 | On or near Walsall Road | E01009417 | Birmingham 001A | Violence and sexual offences | Unable to prosecute suspect | NaN | file0 |
4 | a8aa5abff380c85b94e97fd746ed67f4b02d21716a00da... | 2022-11 | West Midlands Police | West Midlands Police | -1.847123 | 52.593864 | On or near Bramble Way | E01009417 | Birmingham 001A | Violence and sexual offences | Unable to prosecute suspect | NaN | file0 |
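One detail of the loading step is worth checking: `os.path.dirname(os.path.abspath('./crime'))` resolves to the parent of `./crime`, so the walk also visits any CSV files stored beside that folder. The sketch below, which uses only the standard library and hypothetical file names in a temporary directory, compares the two starting points. The helper `find_csv_files` is an illustrative name, not part of the notebook.

```python
import os
import tempfile
from pathlib import Path


def find_csv_files(data_dir):
    """Return every .csv file under data_dir (searched recursively), sorted by path."""
    return sorted(Path(data_dir).rglob("*.csv"))


# Build a throwaway tree: one CSV inside ./crime and one unrelated CSV beside it.
with tempfile.TemporaryDirectory() as tmp:
    crime_dir = Path(tmp) / "crime" / "2022-11"
    crime_dir.mkdir(parents=True)
    (crime_dir / "2022-11-street.csv").write_text("Month\n2022-11\n")
    (Path(tmp) / "notes.csv").write_text("unrelated\n")

    # Starting inside the data folder finds only the crime file.
    n_inside = len(find_csv_files(Path(tmp) / "crime"))

    # Starting from dirname(abspath(...)), as the notebook does, also finds notes.csv.
    parent = os.path.dirname(os.path.abspath(Path(tmp) / "crime"))
    n_parent = len(find_csv_files(parent))

print(n_inside, n_parent)
```

If the row count looks larger than expected, comparing these two counts on the real folder is a quick way to see whether extra files were picked up.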
print(dataset.shape)
(1253892, 13)
name_number = 'RianYan-2424072'
dataset.to_csv(name_number, index=False)
data = pd.read_csv(name_number)
data['Crime type'].value_counts()
Crime type
Violence and sexual offences    527940
Vehicle crime                   125466
Public order                    107829
Criminal damage and arson        95256
Other theft                      85566
Burglary                         68814
Anti-social behaviour            64362
Shoplifting                      57903
Robbery                          27603
Drugs                            26025
Possession of weapons            23727
Other crime                      20934
Theft from the person            14625
Bicycle theft                     7842
Name: count, dtype: int64
Q1: Using a similar approach, display the number of crimes in each month. You can use the "Month" column to do that.
data['Month'].value_counts()
Month
2022-07    103746
2022-05    102315
2022-08    101025
2022-06    100002
2022-10     98874
2022-03     97791
2023-03     97356
2022-04     95529
2022-09     95091
2022-11     92892
2023-01     92811
2023-02     88431
2022-12     88029
Name: count, dtype: int64
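`value_counts()` orders the months by frequency, which makes seasonal patterns harder to read. A minimal sketch on toy data (the `toy` frame is made up for illustration) shows how `sort_index()` puts the same counts in chronological order, since `YYYY-MM` strings sort correctly as text.

```python
import pandas as pd

# Toy stand-in for the "Month" column of the crime data.
toy = pd.DataFrame(
    {"Month": ["2022-07", "2022-05", "2022-07", "2023-01", "2022-05", "2022-07"]}
)

# Same counts as value_counts(), reordered by month instead of by frequency.
by_month = toy["Month"].value_counts().sort_index()
print(by_month)
```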
data['town'] = data['LSOA name'].str.split(' ').str[0]
data.head()
Crime ID | Month | Reported by | Falls within | Longitude | Latitude | Location | LSOA code | LSOA name | Crime type | Last outcome category | Context | key | town | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | cb47df2e6dbe1b37b6362a9312116911e0542f31dbdd56... | 2022-11 | West Midlands Police | West Midlands Police | -1.848972 | 52.588428 | On or near Woodside | E01009417 | Birmingham 001A | Other theft | Unable to prosecute suspect | NaN | file0 | Birmingham |
1 | 47d811512341fd0ecec13cb254b40f1b220928a31d349a... | 2022-11 | West Midlands Police | West Midlands Police | -1.849790 | 52.590937 | On or near Walsall Road | E01009417 | Birmingham 001A | Violence and sexual offences | Investigation complete; no suspect identified | NaN | file0 | Birmingham |
2 | da2ff35b910f6ec63d2f4f6c006fa46991675511a2f25d... | 2022-11 | West Midlands Police | West Midlands Police | -1.844834 | 52.590077 | On or near Heathfield Road | E01009417 | Birmingham 001A | Violence and sexual offences | Unable to prosecute suspect | NaN | file0 | Birmingham |
3 | 08fe094d4f9b3df0a748b8605c92e58c7cc6ebf171bdc3... | 2022-11 | West Midlands Police | West Midlands Police | -1.849790 | 52.590937 | On or near Walsall Road | E01009417 | Birmingham 001A | Violence and sexual offences | Unable to prosecute suspect | NaN | file0 | Birmingham |
4 | a8aa5abff380c85b94e97fd746ed67f4b02d21716a00da... | 2022-11 | West Midlands Police | West Midlands Police | -1.847123 | 52.593864 | On or near Bramble Way | E01009417 | Birmingham 001A | Violence and sexual offences | Unable to prosecute suspect | NaN | file0 | Birmingham |
towns = ['Wolverhampton']
filtered_data = data[data.town.str.contains('|'.join(towns), na=False)]
filtered_data.head()
Crime ID | Month | Reported by | Falls within | Longitude | Latitude | Location | LSOA code | LSOA name | Crime type | Last outcome category | Context | key | town | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
28063 | 3740d3b5932e40edd0b2d94ba63bfc787d237ef876374b... | 2022-11 | West Midlands Police | West Midlands Police | -2.129332 | 52.618605 | On or near Barrington Close | E01010434 | Wolverhampton 001A | Burglary | Investigation complete; no suspect identified | NaN | file0 | Wolverhampton |
28064 | 861910c2c962166de706c516ec55038a0ab9c46f6a14d6... | 2022-11 | West Midlands Police | West Midlands Police | -2.121103 | 52.617840 | On or near Sherborne Road | E01010434 | Wolverhampton 001A | Burglary | Unable to prosecute suspect | NaN | file0 | Wolverhampton |
28065 | bb82e4a2a6ca33c8b527b2270c8716793789d63769f010... | 2022-11 | West Midlands Police | West Midlands Police | -2.129332 | 52.618605 | On or near Barrington Close | E01010434 | Wolverhampton 001A | Burglary | Investigation complete; no suspect identified | NaN | file0 | Wolverhampton |
28066 | 7be5d72cf3e24376607e6b0478fdcca8a4cf07653a308c... | 2022-11 | West Midlands Police | West Midlands Police | -2.126509 | 52.617745 | On or near Marklin Avenue | E01010434 | Wolverhampton 001A | Burglary | Investigation complete; no suspect identified | NaN | file0 | Wolverhampton |
28067 | e4c839a07f0475a1dae66ce5710e0a7591ef9b39491bf1... | 2022-11 | West Midlands Police | West Midlands Police | -2.129332 | 52.618605 | On or near Barrington Close | E01010434 | Wolverhampton 001A | Criminal damage and arson | Investigation complete; no suspect identified | NaN | file0 | Wolverhampton |
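The filter above uses `str.contains`, which treats the joined town names as a regular expression and matches substrings. Because the `town` column already holds a single word, an exact match with `isin` is a possible alternative. The sketch below compares both on a toy frame whose values are made up for illustration.

```python
import pandas as pd

# Toy stand-in for the "town" column, including a missing value.
toy = pd.DataFrame(
    {"town": ["Wolverhampton", "Birmingham", "Dudley", None, "Wolverhampton"]}
)
towns = ["Wolverhampton"]

# Approach used in the notebook: regex / substring match, missing values excluded.
substring_match = toy[toy["town"].str.contains("|".join(towns), na=False)]

# Alternative: exact membership test; missing values never match.
exact_match = toy[toy["town"].isin(towns)]

print(exact_match.index.tolist())
```

On this toy input both approaches select the same rows; they would differ only if one town name were contained in another.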
Q2: Display crime types in Wolverhampton.
filtered_data['Crime type'].value_counts()
Crime type
Violence and sexual offences    54645
Public order                    10959
Vehicle crime                    9564
Criminal damage and arson        9327
Other theft                      7389
Burglary                         6282
Shoplifting                      6138
Anti-social behaviour            5436
Other crime                      2319
Drugs                            2181
Robbery                          2139
Possession of weapons            2106
Theft from the person             972
Bicycle theft                     885
Name: count, dtype: int64
filtered_data['LSOA code'].value_counts().nlargest(10)
LSOA code
E01010521    11172
E01010564     3849
E01010414     2601
E01010530     2370
E01010450     1998
E01010410     1950
E01010476     1761
E01010472     1710
E01010473     1674
E01010508     1596
Name: count, dtype: int64
Q3: Provide a prime landmark for at least 2 LSOA codes. If there is no recognisable prime landmark, provide the name(s) of the nearby streets/roads surrounding that area.
E01010521 - University of Wolverhampton
E01010414 - The streets surrounding this area are Wellington Rd., Black Country Route, and Oxford St.
filtered_important_data = filtered_data[['LSOA code','Crime type']]
filtered_important_data.head()
LSOA code | Crime type | |
---|---|---|
28063 | E01010434 | Burglary |
28064 | E01010434 | Burglary |
28065 | E01010434 | Burglary |
28066 | E01010434 | Burglary |
28067 | E01010434 | Criminal damage and arson |
filtered_important_data = filtered_data[['LSOA code','Crime type']]
filtered_important_data = pd.get_dummies(filtered_important_data, columns=['Crime type'])
clustering_data = filtered_important_data.groupby(['LSOA code']).agg({'Crime type_Anti-social behaviour':'sum',
'Crime type_Bicycle theft':'sum',
'Crime type_Burglary':'sum',
'Crime type_Criminal damage and arson':'sum',
'Crime type_Drugs':'sum',
'Crime type_Other crime':'sum',
'Crime type_Other theft':'sum',
'Crime type_Possession of weapons':'sum',
'Crime type_Public order':'sum',
'Crime type_Robbery':'sum',
'Crime type_Shoplifting':'sum',
'Crime type_Theft from the person':'sum',
'Crime type_Vehicle crime':'sum',
'Crime type_Violence and sexual offences':'sum'}).reset_index()
clustering_data[:5]
LSOA code | Crime type_Anti-social behaviour | Crime type_Bicycle theft | Crime type_Burglary | Crime type_Criminal damage and arson | Crime type_Drugs | Crime type_Other crime | Crime type_Other theft | Crime type_Possession of weapons | Crime type_Public order | Crime type_Robbery | Crime type_Shoplifting | Crime type_Theft from the person | Crime type_Vehicle crime | Crime type_Violence and sexual offences | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | E01010410 | 108 | 18 | 84 | 129 | 9 | 30 | 459 | 42 | 183 | 27 | 105 | 15 | 177 | 564 |
1 | E01010411 | 18 | 3 | 18 | 69 | 3 | 9 | 21 | 0 | 45 | 6 | 3 | 0 | 42 | 264 |
2 | E01010412 | 36 | 0 | 30 | 96 | 12 | 36 | 27 | 15 | 96 | 21 | 18 | 3 | 69 | 453 |
3 | E01010413 | 21 | 9 | 21 | 93 | 9 | 21 | 36 | 9 | 78 | 12 | 33 | 0 | 66 | 342 |
4 | E01010414 | 120 | 36 | 147 | 255 | 24 | 39 | 120 | 48 | 213 | 48 | 285 | 72 | 243 | 951 |
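The one-hot encoding followed by a per-column `sum` builds a table of crime counts per LSOA. As a hedged alternative, `pd.crosstab` can produce the same counts in one call, although the column names then lack the `Crime type_` prefix. The toy frame below is invented for illustration.

```python
import pandas as pd

# Toy stand-in for filtered_data[['LSOA code', 'Crime type']].
toy = pd.DataFrame(
    {
        "LSOA code": ["A", "A", "B", "B", "B"],
        "Crime type": ["Burglary", "Drugs", "Burglary", "Burglary", "Robbery"],
    }
)

# One row per LSOA, one column per crime type, each cell a count of records.
counts = pd.crosstab(toy["LSOA code"], toy["Crime type"])
print(counts)
```

A crime type that never occurs in an LSOA appears as 0, matching what the `get_dummies` plus `groupby` route gives.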
clustering_data_original = clustering_data.copy()
clustering_data_original.head()
LSOA code | Crime type_Anti-social behaviour | Crime type_Bicycle theft | Crime type_Burglary | Crime type_Criminal damage and arson | Crime type_Drugs | Crime type_Other crime | Crime type_Other theft | Crime type_Possession of weapons | Crime type_Public order | Crime type_Robbery | Crime type_Shoplifting | Crime type_Theft from the person | Crime type_Vehicle crime | Crime type_Violence and sexual offences | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | E01010410 | 108 | 18 | 84 | 129 | 9 | 30 | 459 | 42 | 183 | 27 | 105 | 15 | 177 | 564 |
1 | E01010411 | 18 | 3 | 18 | 69 | 3 | 9 | 21 | 0 | 45 | 6 | 3 | 0 | 42 | 264 |
2 | E01010412 | 36 | 0 | 30 | 96 | 12 | 36 | 27 | 15 | 96 | 21 | 18 | 3 | 69 | 453 |
3 | E01010413 | 21 | 9 | 21 | 93 | 9 | 21 | 36 | 9 | 78 | 12 | 33 | 0 | 66 | 342 |
4 | E01010414 | 120 | 36 | 147 | 255 | 24 | 39 | 120 | 48 | 213 | 48 | 285 | 72 | 243 | 951 |
clustering_data.drop(['LSOA code'], axis = 1, inplace = True, errors = 'ignore')
clustering_data.head()
Crime type_Anti-social behaviour | Crime type_Bicycle theft | Crime type_Burglary | Crime type_Criminal damage and arson | Crime type_Drugs | Crime type_Other crime | Crime type_Other theft | Crime type_Possession of weapons | Crime type_Public order | Crime type_Robbery | Crime type_Shoplifting | Crime type_Theft from the person | Crime type_Vehicle crime | Crime type_Violence and sexual offences | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 108 | 18 | 84 | 129 | 9 | 30 | 459 | 42 | 183 | 27 | 105 | 15 | 177 | 564 |
1 | 18 | 3 | 18 | 69 | 3 | 9 | 21 | 0 | 45 | 6 | 3 | 0 | 42 | 264 |
2 | 36 | 0 | 30 | 96 | 12 | 36 | 27 | 15 | 96 | 21 | 18 | 3 | 69 | 453 |
3 | 21 | 9 | 21 | 93 | 9 | 21 | 36 | 9 | 78 | 12 | 33 | 0 | 66 | 342 |
4 | 120 | 36 | 147 | 255 | 24 | 39 | 120 | 48 | 213 | 48 | 285 | 72 | 243 | 951 |
data_scaled = normalize(clustering_data)
data_scaled = pd.DataFrame(data_scaled, columns=clustering_data.columns)
data_scaled.head()
Crime type_Anti-social behaviour | Crime type_Bicycle theft | Crime type_Burglary | Crime type_Criminal damage and arson | Crime type_Drugs | Crime type_Other crime | Crime type_Other theft | Crime type_Possession of weapons | Crime type_Public order | Crime type_Robbery | Crime type_Shoplifting | Crime type_Theft from the person | Crime type_Vehicle crime | Crime type_Violence and sexual offences | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0.134580 | 0.022430 | 0.104673 | 0.160748 | 0.011215 | 0.037383 | 0.571964 | 0.052337 | 0.228038 | 0.033645 | 0.130841 | 0.018692 | 0.220561 | 0.702805 |
1 | 0.063848 | 0.010641 | 0.063848 | 0.244750 | 0.010641 | 0.031924 | 0.074489 | 0.000000 | 0.159620 | 0.021283 | 0.010641 | 0.000000 | 0.148978 | 0.936435 |
2 | 0.074458 | 0.000000 | 0.062048 | 0.198555 | 0.024819 | 0.074458 | 0.055844 | 0.031024 | 0.198555 | 0.043434 | 0.037229 | 0.006205 | 0.142711 | 0.936931 |
3 | 0.056095 | 0.024041 | 0.056095 | 0.248422 | 0.024041 | 0.056095 | 0.096163 | 0.024041 | 0.208354 | 0.032054 | 0.088150 | 0.000000 | 0.176299 | 0.913551 |
4 | 0.108702 | 0.032611 | 0.133161 | 0.230993 | 0.021740 | 0.035328 | 0.108702 | 0.043481 | 0.192947 | 0.043481 | 0.258168 | 0.065221 | 0.220123 | 0.861467 |
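With its default settings, `normalize` rescales each row to unit Euclidean (L2) length, so the clustering compares the mix of crime types in an LSOA rather than the raw volume. The NumPy-only sketch below reproduces that row-wise scaling on made-up counts: two rows with the same proportions but a tenfold difference in volume become identical.

```python
import numpy as np

# Two hypothetical LSOAs with the same crime mix but very different volumes.
counts = np.array(
    [
        [10.0, 20.0, 30.0],
        [100.0, 200.0, 300.0],
    ]
)

# Row-wise L2 normalisation: divide each row by its Euclidean length.
row_lengths = np.linalg.norm(counts, axis=1, keepdims=True)
unit_rows = counts / row_lengths

print(unit_rows)
print(np.linalg.norm(unit_rows, axis=1))
```

This is worth keeping in mind when reading the cluster means later: they are reported in raw counts, but the clusters themselves were formed on these unit-length rows.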
plt.figure(figsize=(10, 7))
plt.title("Dendrograms")
dend = shc.dendrogram(shc.linkage(data_scaled, method='ward'))
plt.axhline(y=1.5, color='r', linestyle='--')
[Figure: dendrogram (Ward linkage) of the normalised Wolverhampton data, with a dashed red cut line at y = 1.5]
Q4. Discuss what happens when you decide to cut the dendrogram at a different level.
If we cut the dendrogram at a higher level, 2.0, it yields only 2 clusters.
# Ward linkage always uses Euclidean distance; the `affinity` argument was removed in recent scikit-learn releases
cluster = AgglomerativeClustering(n_clusters=3, linkage='ward')
cluster_ids = cluster.fit_predict(data_scaled)
plt.figure(figsize=(10, 7))
plt.title("Dendrograms")
dend = shc.dendrogram(shc.linkage(data_scaled, method='ward'))
plt.axhline(y=2.0, color='r', linestyle='--')
plt.axhline(y=1.0, color='r', linestyle='--')
[Figure: the same dendrogram with dashed red cut lines at y = 2.0 and y = 1.0]
If we instead cut the dendrogram at level 1.0, it yields 5 clusters.
In sum, cutting a dendrogram at higher levels yields fewer, larger clusters with coarser groupings, combining more diverse data points. Cutting at lower levels results in more, smaller clusters with finer groupings, capturing more specific relationships among similar data points. This affects the granularity and specificity of the clustering outcome.
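The effect of the cut height can also be checked numerically rather than by reading the plot. The sketch below is an illustration on made-up 2-D points, not the crime data: it builds a Ward linkage and uses SciPy's `fcluster` with `criterion="distance"` to count the clusters produced at several heights.

```python
import numpy as np
import scipy.cluster.hierarchy as shc

# Three tight, well-separated groups of three points each (toy data).
points = np.array(
    [
        [0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
        [5.0, 5.0], [5.1, 5.0], [5.0, 5.1],
        [10.0, 0.0], [10.1, 0.0], [10.0, 0.1],
    ]
)

Z = shc.linkage(points, method="ward")

# Cut the same tree at increasing heights and count the resulting clusters.
n_clusters_at = {}
for height in (0.05, 1.0, 13.0, 20.0):
    labels = shc.fcluster(Z, t=height, criterion="distance")
    n_clusters_at[height] = len(set(labels))

print(n_clusters_at)
```

The lowest cut leaves every point on its own, the intermediate cuts recover the three groups and then two, and the highest cut merges everything into one cluster.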
clustering_data['cluster'] = cluster_ids
clustering_data.head()
Crime type_Anti-social behaviour | Crime type_Bicycle theft | Crime type_Burglary | Crime type_Criminal damage and arson | Crime type_Drugs | Crime type_Other crime | Crime type_Other theft | Crime type_Possession of weapons | Crime type_Public order | Crime type_Robbery | Crime type_Shoplifting | Crime type_Theft from the person | Crime type_Vehicle crime | Crime type_Violence and sexual offences | cluster | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 108 | 18 | 84 | 129 | 9 | 30 | 459 | 42 | 183 | 27 | 105 | 15 | 177 | 564 | 0 |
1 | 18 | 3 | 18 | 69 | 3 | 9 | 21 | 0 | 45 | 6 | 3 | 0 | 42 | 264 | 1 |
2 | 36 | 0 | 30 | 96 | 12 | 36 | 27 | 15 | 96 | 21 | 18 | 3 | 69 | 453 | 1 |
3 | 21 | 9 | 21 | 93 | 9 | 21 | 36 | 9 | 78 | 12 | 33 | 0 | 66 | 342 | 1 |
4 | 120 | 36 | 147 | 255 | 24 | 39 | 120 | 48 | 213 | 48 | 285 | 72 | 243 | 951 | 0 |
hiarchical_cluster = pd.DataFrame(round(clustering_data.groupby('cluster').mean(),1))
hiarchical_cluster
cluster | Crime type_Anti-social behaviour | Crime type_Bicycle theft | Crime type_Burglary | Crime type_Criminal damage and arson | Crime type_Drugs | Crime type_Other crime | Crime type_Other theft | Crime type_Possession of weapons | Crime type_Public order | Crime type_Robbery | Crime type_Shoplifting | Crime type_Theft from the person | Crime type_Vehicle crime | Crime type_Violence and sexual offences |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 70.4 | 20.0 | 73.9 | 95.6 | 39.0 | 18.7 | 123.8 | 30.1 | 142.8 | 37.4 | 204.4 | 27.3 | 101.6 | 510.4 |
1 | 30.4 | 3.5 | 34.1 | 56.3 | 10.4 | 14.2 | 35.9 | 11.5 | 60.9 | 10.4 | 12.7 | 2.9 | 51.8 | 339.8 |
2 | 9.9 | 0.9 | 25.4 | 19.7 | 3.0 | 9.0 | 15.6 | 2.7 | 21.3 | 3.0 | 5.1 | 1.3 | 50.7 | 109.4 |
Q5. Discuss the cluster results based on your dataset.
Through the calculation of mean values for various crime types across the three identified clusters, we can categorize the clusters as follows:
Cluster 0: High-Risk Crime Area
- Description: Cluster 0 exhibits the highest mean value for every crime type, including anti-social behavior, burglary, criminal damage and arson, drugs, possession of weapons, public order offenses, robbery, shoplifting, theft from the person, vehicle crime, and violence and sexual offenses.
- Implication: The elevated mean values across the crime categories indicate that this cluster represents high-risk areas with significant crime activity.
Cluster 1: Moderate Crime Area
- Description: Cluster 1 shows mean values for all crime types that lie between those of the other two clusters.
- Implication: With crime levels that are neither the highest nor the lowest, this cluster can be characterized as moderate in terms of crime risk.
Cluster 2: Low Crime Area
- Description: Cluster 2 has the lowest mean value for each crime type.
- Implication: The consistently low mean values across all crime categories suggest that this cluster represents low-risk areas with comparatively few recorded crimes.
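The mean profiles alone do not show how many LSOAs fall into each cluster, which matters when judging how representative a "high-risk" label is. A minimal sketch on invented numbers adds a size column next to the rounded means; the column name `n_lsoas` is a hypothetical choice.

```python
import pandas as pd

# Toy per-LSOA counts with cluster labels already attached.
toy = pd.DataFrame(
    {
        "Burglary": [10, 12, 2, 4, 3],
        "Shoplifting": [40, 44, 1, 0, 2],
        "cluster": [0, 0, 1, 1, 1],
    }
)

# Mean crime counts per cluster, as in the notebook, plus the cluster size.
summary = toy.groupby("cluster").mean().round(1)
summary["n_lsoas"] = toy.groupby("cluster").size()

print(summary)
```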
clustering_data_original['cluster'] = cluster_ids
clusters = clustering_data_original[['LSOA code', 'cluster']]
clusters.head()
LSOA code | cluster | |
---|---|---|
0 | E01010410 | 0 |
1 | E01010411 | 1 |
2 | E01010412 | 1 |
3 | E01010413 | 1 |
4 | E01010414 | 0 |
clustered_full = pd.merge(filtered_data, clusters, on='LSOA code')
clustered_full.head()
Crime ID | Month | Reported by | Falls within | Longitude | Latitude | Location | LSOA code | LSOA name | Crime type | Last outcome category | Context | key | town | cluster | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 3740d3b5932e40edd0b2d94ba63bfc787d237ef876374b... | 2022-11 | West Midlands Police | West Midlands Police | -2.129332 | 52.618605 | On or near Barrington Close | E01010434 | Wolverhampton 001A | Burglary | Investigation complete; no suspect identified | NaN | file0 | Wolverhampton | 0 |
1 | 861910c2c962166de706c516ec55038a0ab9c46f6a14d6... | 2022-11 | West Midlands Police | West Midlands Police | -2.121103 | 52.617840 | On or near Sherborne Road | E01010434 | Wolverhampton 001A | Burglary | Unable to prosecute suspect | NaN | file0 | Wolverhampton | 0 |
2 | bb82e4a2a6ca33c8b527b2270c8716793789d63769f010... | 2022-11 | West Midlands Police | West Midlands Police | -2.129332 | 52.618605 | On or near Barrington Close | E01010434 | Wolverhampton 001A | Burglary | Investigation complete; no suspect identified | NaN | file0 | Wolverhampton | 0 |
3 | 7be5d72cf3e24376607e6b0478fdcca8a4cf07653a308c... | 2022-11 | West Midlands Police | West Midlands Police | -2.126509 | 52.617745 | On or near Marklin Avenue | E01010434 | Wolverhampton 001A | Burglary | Investigation complete; no suspect identified | NaN | file0 | Wolverhampton | 0 |
4 | e4c839a07f0475a1dae66ce5710e0a7591ef9b39491bf1... | 2022-11 | West Midlands Police | West Midlands Police | -2.129332 | 52.618605 | On or near Barrington Close | E01010434 | Wolverhampton 001A | Criminal damage and arson | Investigation complete; no suspect identified | NaN | file0 | Wolverhampton | 0 |
def get_color(cluster_id):
    # Map each cluster label to a marker colour
    if cluster_id == 1:
        return 'darkred'
    if cluster_id == 0:
        return 'green'
    if cluster_id == 2:
        return 'yellow'
#create a map
this_map = folium.Map(location =[clustered_full["Latitude"].mean(), clustered_full["Longitude"].mean()], zoom_start=5)
def plot_dot(point):
    '''Add a CircleMarker for one crime record to this_map.

    input: a Series with numeric Latitude and Longitude fields and a color field
    '''
    folium.CircleMarker(location=[point.Latitude, point.Longitude],
                        radius=2,
                        color=point.color,
                        weight=1).add_to(this_map)
clustered_full["color"] = clustered_full["cluster"].apply(lambda x: get_color(x))
#use df.apply(,axis=1) to iterate through every row in your dataframe
clustered_full.apply(plot_dot, axis = 1)
#Set the zoom to the maximum possible
this_map.fit_bounds(this_map.get_bounds())
#Save the map to an HTML file
this_map.save(os.path.join('Crime_map.html'))
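The `get_color` function returns `None` for any label outside 0, 1 and 2, which would matter if the map were redrawn for a run with more clusters. One possible variant is a dictionary lookup with a fallback colour. The names `CLUSTER_COLORS` and `get_color_safe` and the `gray` default are assumptions for illustration, and the colours simply mirror the mapping used above.

```python
# Same label-to-colour mapping as get_color, expressed as a lookup table.
CLUSTER_COLORS = {0: "green", 1: "darkred", 2: "yellow"}


def get_color_safe(cluster_id, default="gray"):
    """Return the colour for a cluster label, or a fallback for unknown labels."""
    return CLUSTER_COLORS.get(cluster_id, default)


print(get_color_safe(1), get_color_safe(3))
```

Adding a colour for a new cluster label then only requires one more dictionary entry.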
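Change the number of clusters to a different value: the previous run used n_clusters = 3, so here it is changed to 2.
# Ward linkage always uses Euclidean distance; the `affinity` argument was removed in recent scikit-learn releases
cluster1 = AgglomerativeClustering(n_clusters=2, linkage='ward')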
cluster_ids1 = cluster1.fit_predict(data_scaled)
clustering_data['cluster1'] = cluster_ids1
# Drop the 3-cluster labels first so they are not averaged as if they were a crime count
hiarchical_cluster1 = pd.DataFrame(round(clustering_data.drop(columns=['cluster']).groupby('cluster1').mean(),1))
hiarchical_cluster1
cluster1 | Crime type_Anti-social behaviour | Crime type_Bicycle theft | Crime type_Burglary | Crime type_Criminal damage and arson | Crime type_Drugs | Crime type_Other crime | Crime type_Other theft | Crime type_Possession of weapons | Crime type_Public order | Crime type_Robbery | Crime type_Shoplifting | Crime type_Theft from the person | Crime type_Vehicle crime | Crime type_Violence and sexual offences |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 27.4 | 3.1 | 32.8 | 50.9 | 9.3 | 13.4 | 32.9 | 10.2 | 55.1 | 9.3 | 11.6 | 2.6 | 51.6 | 305.7 |
1 | 70.4 | 20.0 | 73.9 | 95.6 | 39.0 | 18.7 | 123.8 | 30.1 | 142.8 | 37.4 | 204.4 | 27.3 | 101.6 | 510.4 |
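Because agglomerative clustering cuts one fixed tree, a solution with fewer clusters is formed by merging clusters of the finer solution rather than by reshuffling points. The sketch below illustrates this on made-up 2-D points, not the crime data, by cross-tabulating the labels from a 3-cluster and a 2-cluster run.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import AgglomerativeClustering

# Three tight groups of toy points; the first two groups are closest to each other.
points = np.array(
    [
        [0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
        [5.0, 5.0], [5.1, 5.0], [5.0, 5.1],
        [12.0, 0.0], [12.1, 0.0], [12.0, 0.1],
    ]
)

labels3 = AgglomerativeClustering(n_clusters=3, linkage="ward").fit_predict(points)
labels2 = AgglomerativeClustering(n_clusters=2, linkage="ward").fit_predict(points)

# Rows: labels from the 3-cluster run; columns: labels from the 2-cluster run.
overlap = pd.crosstab(labels3, labels2, rownames=["k=3"], colnames=["k=2"])
print(overlap)
```

Each row of the cross-tabulation has a single non-zero cell, meaning every 3-cluster group sits wholly inside one 2-cluster group. The same check on the real labels (`cluster` against `cluster1`) would show which of the three Wolverhampton clusters were merged.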
Selecting another City - Birmingham
dataset.head()
Crime ID | Month | Reported by | Falls within | Longitude | Latitude | Location | LSOA code | LSOA name | Crime type | Last outcome category | Context | key | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | cb47df2e6dbe1b37b6362a9312116911e0542f31dbdd56... | 2022-11 | West Midlands Police | West Midlands Police | -1.848972 | 52.588428 | On or near Woodside | E01009417 | Birmingham 001A | Other theft | Unable to prosecute suspect | NaN | file0 |
1 | 47d811512341fd0ecec13cb254b40f1b220928a31d349a... | 2022-11 | West Midlands Police | West Midlands Police | -1.849790 | 52.590937 | On or near Walsall Road | E01009417 | Birmingham 001A | Violence and sexual offences | Investigation complete; no suspect identified | NaN | file0 |
2 | da2ff35b910f6ec63d2f4f6c006fa46991675511a2f25d... | 2022-11 | West Midlands Police | West Midlands Police | -1.844834 | 52.590077 | On or near Heathfield Road | E01009417 | Birmingham 001A | Violence and sexual offences | Unable to prosecute suspect | NaN | file0 |
3 | 08fe094d4f9b3df0a748b8605c92e58c7cc6ebf171bdc3... | 2022-11 | West Midlands Police | West Midlands Police | -1.849790 | 52.590937 | On or near Walsall Road | E01009417 | Birmingham 001A | Violence and sexual offences | Unable to prosecute suspect | NaN | file0 |
4 | a8aa5abff380c85b94e97fd746ed67f4b02d21716a00da... | 2022-11 | West Midlands Police | West Midlands Police | -1.847123 | 52.593864 | On or near Bramble Way | E01009417 | Birmingham 001A | Violence and sexual offences | Unable to prosecute suspect | NaN | file0 |
towns1 = ['Birmingham']
filtered_data1 = data[data.town.str.contains('|'.join(towns1), na=False)]
filtered_data1.head()
Crime ID | Month | Reported by | Falls within | Longitude | Latitude | Location | LSOA code | LSOA name | Crime type | Last outcome category | Context | key | town | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | cb47df2e6dbe1b37b6362a9312116911e0542f31dbdd56... | 2022-11 | West Midlands Police | West Midlands Police | -1.848972 | 52.588428 | On or near Woodside | E01009417 | Birmingham 001A | Other theft | Unable to prosecute suspect | NaN | file0 | Birmingham |
1 | 47d811512341fd0ecec13cb254b40f1b220928a31d349a... | 2022-11 | West Midlands Police | West Midlands Police | -1.849790 | 52.590937 | On or near Walsall Road | E01009417 | Birmingham 001A | Violence and sexual offences | Investigation complete; no suspect identified | NaN | file0 | Birmingham |
2 | da2ff35b910f6ec63d2f4f6c006fa46991675511a2f25d... | 2022-11 | West Midlands Police | West Midlands Police | -1.844834 | 52.590077 | On or near Heathfield Road | E01009417 | Birmingham 001A | Violence and sexual offences | Unable to prosecute suspect | NaN | file0 | Birmingham |
3 | 08fe094d4f9b3df0a748b8605c92e58c7cc6ebf171bdc3... | 2022-11 | West Midlands Police | West Midlands Police | -1.849790 | 52.590937 | On or near Walsall Road | E01009417 | Birmingham 001A | Violence and sexual offences | Unable to prosecute suspect | NaN | file0 | Birmingham |
4 | a8aa5abff380c85b94e97fd746ed67f4b02d21716a00da... | 2022-11 | West Midlands Police | West Midlands Police | -1.847123 | 52.593864 | On or near Bramble Way | E01009417 | Birmingham 001A | Violence and sexual offences | Unable to prosecute suspect | NaN | file0 | Birmingham |
filtered_data1['LSOA code'].value_counts().nlargest(10)
LSOA code
E01033620    18717
E01033615     9519
E01033561     6381
E01033617     6303
E01033557     4461
E01009239     3591
E01009146     3396
E01009200     3315
E01009284     3099
E01009378     3051
Name: count, dtype: int64
filtered_important_data1 = filtered_data1[['LSOA code','Crime type']]
filtered_important_data1.head()
LSOA code | Crime type | |
---|---|---|
0 | E01009417 | Other theft |
1 | E01009417 | Violence and sexual offences |
2 | E01009417 | Violence and sexual offences |
3 | E01009417 | Violence and sexual offences |
4 | E01009417 | Violence and sexual offences |
filtered_important_data1 = filtered_data1[['LSOA code','Crime type']]
filtered_important_data1 = pd.get_dummies(filtered_important_data1, columns=['Crime type'])
clustering_data2 = filtered_important_data1.groupby(['LSOA code']).agg({'Crime type_Anti-social behaviour':'sum',
'Crime type_Bicycle theft':'sum',
'Crime type_Burglary':'sum',
'Crime type_Criminal damage and arson':'sum',
'Crime type_Drugs':'sum',
'Crime type_Other crime':'sum',
'Crime type_Other theft':'sum',
'Crime type_Possession of weapons':'sum',
'Crime type_Public order':'sum',
'Crime type_Robbery':'sum',
'Crime type_Shoplifting':'sum',
'Crime type_Theft from the person':'sum',
'Crime type_Vehicle crime':'sum',
'Crime type_Violence and sexual offences':'sum'}).reset_index()
clustering_data2[0:5]
LSOA code | Crime type_Anti-social behaviour | Crime type_Bicycle theft | Crime type_Burglary | Crime type_Criminal damage and arson | Crime type_Drugs | Crime type_Other crime | Crime type_Other theft | Crime type_Possession of weapons | Crime type_Public order | Crime type_Robbery | Crime type_Shoplifting | Crime type_Theft from the person | Crime type_Vehicle crime | Crime type_Violence and sexual offences | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | E01008881 | 54 | 9 | 45 | 102 | 30 | 15 | 69 | 36 | 126 | 117 | 78 | 12 | 54 | 441 |
1 | E01008882 | 48 | 0 | 39 | 33 | 21 | 24 | 45 | 18 | 84 | 15 | 0 | 0 | 69 | 321 |
2 | E01008883 | 45 | 3 | 48 | 81 | 54 | 9 | 36 | 21 | 93 | 15 | 30 | 6 | 42 | 387 |
3 | E01008884 | 54 | 6 | 72 | 90 | 30 | 33 | 84 | 36 | 93 | 18 | 27 | 12 | 117 | 615 |
4 | E01008885 | 12 | 0 | 45 | 27 | 6 | 6 | 66 | 15 | 45 | 0 | 18 | 0 | 45 | 153 |
clustering_data_original1 = clustering_data2.copy()
clustering_data_original1.head()
LSOA code | Crime type_Anti-social behaviour | Crime type_Bicycle theft | Crime type_Burglary | Crime type_Criminal damage and arson | Crime type_Drugs | Crime type_Other crime | Crime type_Other theft | Crime type_Possession of weapons | Crime type_Public order | Crime type_Robbery | Crime type_Shoplifting | Crime type_Theft from the person | Crime type_Vehicle crime | Crime type_Violence and sexual offences | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | E01008881 | 54 | 9 | 45 | 102 | 30 | 15 | 69 | 36 | 126 | 117 | 78 | 12 | 54 | 441 |
1 | E01008882 | 48 | 0 | 39 | 33 | 21 | 24 | 45 | 18 | 84 | 15 | 0 | 0 | 69 | 321 |
2 | E01008883 | 45 | 3 | 48 | 81 | 54 | 9 | 36 | 21 | 93 | 15 | 30 | 6 | 42 | 387 |
3 | E01008884 | 54 | 6 | 72 | 90 | 30 | 33 | 84 | 36 | 93 | 18 | 27 | 12 | 117 | 615 |
4 | E01008885 | 12 | 0 | 45 | 27 | 6 | 6 | 66 | 15 | 45 | 0 | 18 | 0 | 45 | 153 |
clustering_data2.drop(['LSOA code'], axis = 1, inplace = True, errors = 'ignore')
clustering_data2.head()
Crime type_Anti-social behaviour | Crime type_Bicycle theft | Crime type_Burglary | Crime type_Criminal damage and arson | Crime type_Drugs | Crime type_Other crime | Crime type_Other theft | Crime type_Possession of weapons | Crime type_Public order | Crime type_Robbery | Crime type_Shoplifting | Crime type_Theft from the person | Crime type_Vehicle crime | Crime type_Violence and sexual offences | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 54 | 9 | 45 | 102 | 30 | 15 | 69 | 36 | 126 | 117 | 78 | 12 | 54 | 441 |
1 | 48 | 0 | 39 | 33 | 21 | 24 | 45 | 18 | 84 | 15 | 0 | 0 | 69 | 321 |
2 | 45 | 3 | 48 | 81 | 54 | 9 | 36 | 21 | 93 | 15 | 30 | 6 | 42 | 387 |
3 | 54 | 6 | 72 | 90 | 30 | 33 | 84 | 36 | 93 | 18 | 27 | 12 | 117 | 615 |
4 | 12 | 0 | 45 | 27 | 6 | 6 | 66 | 15 | 45 | 0 | 18 | 0 | 45 | 153 |
data_scaled1 = normalize(clustering_data2)
data_scaled1 = pd.DataFrame(data_scaled1, columns=clustering_data2.columns)
data_scaled1.head()
Crime type_Anti-social behaviour | Crime type_Bicycle theft | Crime type_Burglary | Crime type_Criminal damage and arson | Crime type_Drugs | Crime type_Other crime | Crime type_Other theft | Crime type_Possession of weapons | Crime type_Public order | Crime type_Robbery | Crime type_Shoplifting | Crime type_Theft from the person | Crime type_Vehicle crime | Crime type_Violence and sexual offences | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0.106769 | 0.017795 | 0.088974 | 0.201675 | 0.059316 | 0.029658 | 0.136427 | 0.071179 | 0.249128 | 0.231333 | 0.154222 | 0.023726 | 0.106769 | 0.871947 |
1 | 0.136662 | 0.000000 | 0.111038 | 0.093955 | 0.059790 | 0.068331 | 0.128121 | 0.051248 | 0.239159 | 0.042707 | 0.000000 | 0.000000 | 0.196452 | 0.913929 |
2 | 0.106968 | 0.007131 | 0.114100 | 0.192543 | 0.128362 | 0.021394 | 0.085575 | 0.049919 | 0.221068 | 0.035656 | 0.071312 | 0.014262 | 0.099837 | 0.919928 |
3 | 0.082509 | 0.009168 | 0.110012 | 0.137515 | 0.045838 | 0.050422 | 0.128347 | 0.055006 | 0.142099 | 0.027503 | 0.041254 | 0.018335 | 0.178769 | 0.939685 |
4 | 0.063839 | 0.000000 | 0.239396 | 0.143637 | 0.031919 | 0.031919 | 0.351114 | 0.079799 | 0.239396 | 0.000000 | 0.095758 | 0.000000 | 0.239396 | 0.813945 |
plt.figure(figsize=(10, 7))
plt.title("Dendrograms")
dend_new = shc.dendrogram(shc.linkage(data_scaled1, method='ward'))
plt.axhline(y=3.5, color='r', linestyle='--')
[Figure: dendrogram (Ward linkage) of the normalised Birmingham data, with a dashed red cut line at y = 3.5]
The number of clusters (n_clusters) is set to 4.
# Ward linkage always uses Euclidean distance; the `affinity` argument was removed in recent scikit-learn releases
cluster2 = AgglomerativeClustering(n_clusters=4, linkage='ward')
cluster_ids2 = cluster2.fit_predict(data_scaled1)
clustering_data2['cluster2'] = cluster_ids2
clustering_data2.head()
Crime type_Anti-social behaviour | Crime type_Bicycle theft | Crime type_Burglary | Crime type_Criminal damage and arson | Crime type_Drugs | Crime type_Other crime | Crime type_Other theft | Crime type_Possession of weapons | Crime type_Public order | Crime type_Robbery | Crime type_Shoplifting | Crime type_Theft from the person | Crime type_Vehicle crime | Crime type_Violence and sexual offences | cluster2 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 54 | 9 | 45 | 102 | 30 | 15 | 69 | 36 | 126 | 117 | 78 | 12 | 54 | 441 | 1 |
1 | 48 | 0 | 39 | 33 | 21 | 24 | 45 | 18 | 84 | 15 | 0 | 0 | 69 | 321 | 1 |
2 | 45 | 3 | 48 | 81 | 54 | 9 | 36 | 21 | 93 | 15 | 30 | 6 | 42 | 387 | 1 |
3 | 54 | 6 | 72 | 90 | 30 | 33 | 84 | 36 | 93 | 18 | 27 | 12 | 117 | 615 | 1 |
4 | 12 | 0 | 45 | 27 | 6 | 6 | 66 | 15 | 45 | 0 | 18 | 0 | 45 | 153 | 2 |
hiarchical_cluster2 = pd.DataFrame(round(clustering_data2.groupby('cluster2').mean(),1))
hiarchical_cluster2
cluster2 | Crime type_Anti-social behaviour | Crime type_Bicycle theft | Crime type_Burglary | Crime type_Criminal damage and arson | Crime type_Drugs | Crime type_Other crime | Crime type_Other theft | Crime type_Possession of weapons | Crime type_Public order | Crime type_Robbery | Crime type_Shoplifting | Crime type_Theft from the person | Crime type_Vehicle crime | Crime type_Violence and sexual offences |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 79.6 | 14.0 | 66.9 | 80.7 | 34.6 | 13.8 | 231.1 | 33.1 | 144.8 | 52.7 | 278.8 | 50.2 | 123.0 | 482.6 |
1 | 41.8 | 3.5 | 44.7 | 64.2 | 19.9 | 12.9 | 47.0 | 18.3 | 72.6 | 21.9 | 12.3 | 11.6 | 70.6 | 386.9 |
2 | 29.7 | 7.0 | 44.1 | 35.3 | 7.6 | 6.2 | 36.1 | 7.8 | 40.4 | 11.6 | 16.5 | 3.5 | 97.5 | 165.6 |
3 | 0.0 | 0.2 | 0.2 | 1.0 | 5.4 | 0.9 | 0.6 | 2.1 | 0.8 | 2.2 | 0.0 | 0.0 | 0.3 | 8.0 |
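The Birmingham run repeats the Wolverhampton steps with renamed variables. A hedged way to avoid the duplication is to wrap the steps in one function, sketched below on invented data. The function name `cluster_town`, the exact-match town filter and the use of `pd.crosstab` are assumptions for illustration rather than the notebook's own code.

```python
import pandas as pd
from sklearn.cluster import AgglomerativeClustering
from sklearn.preprocessing import normalize


def cluster_town(data, town, n_clusters):
    """Count crimes per LSOA and crime type for one town, then cluster the LSOAs."""
    town_rows = data[data["town"] == town]
    counts = pd.crosstab(town_rows["LSOA code"], town_rows["Crime type"])
    scaled = normalize(counts)  # row-wise L2 scaling, as in the notebook
    model = AgglomerativeClustering(n_clusters=n_clusters, linkage="ward")
    result = counts.copy()
    result["cluster"] = model.fit_predict(scaled)
    return result


# Toy records: LSOAs A and B are burglary-heavy, C and D are shoplifting-heavy.
rows = (
    [("Dudley", "A", "Burglary")] * 9 + [("Dudley", "A", "Shoplifting")] * 1
    + [("Dudley", "B", "Burglary")] * 8 + [("Dudley", "B", "Shoplifting")] * 2
    + [("Dudley", "C", "Burglary")] * 1 + [("Dudley", "C", "Shoplifting")] * 9
    + [("Dudley", "D", "Burglary")] * 2 + [("Dudley", "D", "Shoplifting")] * 8
    + [("Walsall", "E", "Burglary")] * 5
)
toy = pd.DataFrame(rows, columns=["town", "LSOA code", "Crime type"])

result = cluster_town(toy, "Dudley", n_clusters=2)
print(result)
```

The number of clusters is still passed in by hand, so it should be chosen by inspecting the dendrogram for each town.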