Data Wrangling
The process of turning raw data into a more appropriate format. In the end, the result should have acceptable data quality.
Contents included in this notebook:
- Data Exploration
- Data Cleaning
- Merging Datasets
This notebook will also introduce you to pandas, which facilitates these manipulation tasks.
Datasets:
from os.path import join
import numpy as np
import pandas as pd
%pdb 1
Automatic pdb calling has been turned ON
- Samsung after-sales service in Iran
Data Exploration
The purpose of this part is to gain a better understanding of the data. What is the data about? What is its structure? What values does it contain? …
Let’s read the Samsung after-sales service dataset into a DataFrame.
service_df = pd.read_csv(join('Datasets', 'service', 'samdata.csv'))
service_df
No | Service_type | Cost_Type | Product_Group | Serial_No | City | Product_Date | Receipt_Date | Appoint_Date | Complete_Date | TAT01 | TAT02 | Job_Satus | Defect_Des | Symptom_Desc | Repair_Action_Desc | Labor_Charge_Desc | Engineer | Labor_Charge_Amount | Parts_Amount | Discount_Amount | Total_Invoice_Amount | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 0 | Normal Warranty | AUDIO - HOME THEAT | 2F/LAHFZ7E-IP-X | Tehran | NaN | 2/5/2012 | 2/5/2012 | 1/21/2013 | 0 | 351 | 0 | WORN-OUT | OTHER PROBLEMS | SOFTWARE UPGRADE | Free Of Charge | Tech15 | 0 | 0.00 | 0.000 | 0.000 |
1 | 2 | 1 | Normal Warranty | REFRIGERATOR-SBS | AIP244AACA/4E7 | Tehran | NaN | 7/7/2012 | 9/10/2012 | 1/31/2013 | 65 | 208 | 1 | SHORT | OTHER PROBLEMS | OTHER REPAIR | Free Of Charge | Tech03 | 0 | 3437481.60 | 687486.320 | 2748885.280 |
2 | 3 | 1 | Normal Warranty | REFRIGERATOR-SBS | JK-C4BBCAEKL72 | Tehran | NaN | 7/7/2012 | 8/15/2012 | 1/31/2013 | 38 | 208 | 1 | OPEN | OTHER PROBLEMS | OTHER REPAIR | Free Of Charge | Tech03 | 0 | 4228756.56 | 845751.312 | 3383005.248 |
3 | 4 | 0 | Out of Warranty | MONITOR - TFT LCD | IP2JO4/-A-47 | Tehran | NaN | 8/7/2012 | 8/7/2012 | 1/13/2013 | 0 | 158 | 1 | WORN-OUT | OTHER PROBLEMS | ELECTRICAL PARTS REPLACEMENT | Free Of Charge | Tech15 | 0 | 0.00 | 0.000 | 0.000 |
4 | 5 | 0 | Out of Warranty | AUDIO - MP3 Player | AU/BCQE2A/P | Tehran | NaN | 8/7/2012 | 8/9/2012 | 1/7/2013 | 2 | 153 | 1 | WORN-OUT | OTHER PROBLEMS | SOFTWARE UPGRADE | Medium | Tech07 | 120000 | 765488.60 | 177087.820 | 708381.680 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
3131 | 3132 | 0 | Normal Warranty | LCD-40 | IP2JRIP/QE2KLKL | REZVANSHAHR | NaN | 1/31/2013 | 1/31/2013 | 1/31/2013 | 0 | 0 | 1 | WORN-OUT | OTHER PROBLEMS | ELECTRICAL PARTS REPLACEMENT | Major | Tech13 | 420000 | 0.00 | 84000.000 | 336000.000 |
3132 | 3133 | 0 | Normal Warranty | LED46 | IP2JRC/&/2/IP | ESFAHAN | NaN | 1/31/2013 | 1/31/2013 | 1/31/2013 | 0 | 0 | 1 | WORN-OUT | OTHER PROBLEMS | ELECTRICAL PARTS REPLACEMENT | Major | Tech13 | 420000 | 0.00 | 84000.000 | 336000.000 |
3133 | 3134 | 0 | Extended Warranty | HARD DISK DRIVE | S2/BJ2/B2KL4Z2 | Tehran | 40552.0 | 1/31/2013 | 1/31/2013 | 1/31/2013 | 0 | 0 | 1 | WORN-OUT | OTHER PROBLEMS | REPAIR REFUSAL | Minimum | Tech07 | 50000 | 0.00 | 10000.000 | 40000.000 |
3134 | 3135 | 0 | Out of Warranty | HARD DISK DRIVE | S2/BJ2/SC2/QE2 | Tehran | 38825.0 | 1/31/2013 | 1/31/2013 | 1/31/2013 | 0 | 0 | 1 | WORN-OUT | OTHER PROBLEMS | REPAIR REFUSAL | Free Of Charge | Tech07 | 0 | 0.00 | 0.000 | 0.000 |
3135 | 3136 | 0 | Out of Warranty | DVD PLAYER/ RECORD | A&QEKLCDQAEA7AZ | Tehran | 38448.0 | 1/31/2013 | 1/31/2013 | 1/31/2013 | 0 | 0 | 1 | WORN-OUT | OTHER PROBLEMS | MECHANICAL PARTS REPLACEMENT | Minor | Tech18 | 150000 | 1488482.00 | 328686.400 | 1318785.600 |
3136 rows × 22 columns
shape
attribute indicates how large the resulting DataFrame is.
service_df.shape
(3136, 22)
print("pandas only displays " + str(pd.options.display.max_rows) + " rows" )
print("and " + str(pd.options.display.max_columns) + " columns at max by default" )
pandas only displays 60 rows
and None columns at max by default
The dataset is not that big, and besides, it only has 2 more columns than the pandas maximum of displayed columns. So let’s tell pandas to show all columns for convenience.
pd.set_option('display.max_columns', None)
Use loc[]
and iloc[]
to select rows in a specific range: loc[] selects by index label, iloc[] by integer position.
service_df.iloc[26:33:3]
No | Service_type | Cost_Type | Product_Group | Serial_No | City | Product_Date | Receipt_Date | Appoint_Date | Complete_Date | TAT01 | TAT02 | Job_Satus | Defect_Des | Symptom_Desc | Repair_Action_Desc | Labor_Charge_Desc | Engineer | Labor_Charge_Amount | Parts_Amount | Discount_Amount | Total_Invoice_Amount | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
26 | 27 | 0 | Normal Warranty | DIGITAL CAMERA | AAKLMCN/CIPE/GT | Tehran | NaN | 11/22/2012 | 11/22/2012 | 1/26/2013 | 0 | 65 | 1 | DEFORMED | OUTCASE CRACK | ELECTRICAL PARTS REPLACEMENT | Major | Tech11 | 250000 | 0.0 | 50000.0 | 200000.0 |
29 | 30 | 0 | Normal Warranty | TFT18 | H2MQ2/2/472 | Tehran | NaN | 12/1/2012 | 12/1/2012 | 1/6/2013 | 0 | 36 | 1 | WORN-OUT | OTHER PROBLEMS | ELECTRICAL PARTS REPLACEMENT | Free Of Charge | Tech16 | 0 | 0.0 | 0.0 | 0.0 |
32 | 33 | 0 | Normal Warranty | TFT18 | H2MQ2/AE27 | Tehran | NaN | 12/2/2012 | 12/2/2012 | 1/6/2013 | 0 | 35 | 1 | WORN-OUT | NO POWER | ELECTRICAL PARTS REPLACEMENT | Free Of Charge | Tech16 | 0 | 0.0 | 0.0 | 0.0 |
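The difference between the two matters once the index is not the default 0…n-1 range. A minimal sketch on toy data (not the service dataset):

```python
import pandas as pd

# a tiny frame with non-default integer index labels -- illustration only
df = pd.DataFrame({"x": [10, 20, 30, 40]}, index=[101, 102, 103, 104])

by_position = df.iloc[0:2]   # integer positions; the end is exclusive
by_label = df.loc[101:102]   # index labels; the end is INCLUSIVE

print(by_position.equals(by_label))  # True -- both pick the first two rows
```

Note the inclusive end of loc[] slicing, which often surprises people coming from plain Python slicing.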
sample()
returns random rows to look at.
service_df.sample(13)
No | Service_type | Cost_Type | Product_Group | Serial_No | City | Product_Date | Receipt_Date | Appoint_Date | Complete_Date | TAT01 | TAT02 | Job_Satus | Defect_Des | Symptom_Desc | Repair_Action_Desc | Labor_Charge_Desc | Engineer | Labor_Charge_Amount | Parts_Amount | Discount_Amount | Total_Invoice_Amount | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
130 | 131 | 0 | Normal Warranty | LED40 | IP2JQC/KLIP2IPIP | Tehran | NaN | 12/18/2012 | 12/18/2012 | 1/30/2013 | 0 | 43 | 1 | WORN-OUT | NO PICTURE/NO RASTER | SET EXCHANGE | Free Of Charge | Tech23 | 0 | 0.00 | 0.000 | 0.000000e+00 |
1807 | 1808 | 0 | Out of Warranty | MONITOR - TFT LCD | H2MPA/2KL74/ | Tehran | 38083.0 | 1/13/2013 | 1/14/2013 | 1/23/2013 | 1 | 10 | 0 | WORN-OUT | OTHER PROBLEMS | ELECTRICAL PARTS REPLACEMENT | Free Of Charge | Tech16 | 0 | 0.00 | 0.000 | 0.000000e+00 |
2971 | 2972 | 1 | Normal Warranty | LED40 | IP2JQA/&427A | Tehran | NaN | 1/27/2013 | 1/27/2013 | 1/28/2013 | 0 | 1 | 1 | WORN-OUT | OTHER PROBLEMS | ELECTRICAL PARTS REPLACEMENT | Major | Tech15 | 420000 | 14086824.28 | 2803384.856 | 1.161354e+07 |
3043 | 3044 | 0 | Out of Warranty | DIGITAL CAMERA | AAKLMCN/CAE/LAJ | Tehran | 40808.0 | 1/28/2013 | 1/28/2013 | 1/30/2013 | 0 | 2 | 1 | DEFORMED | OTHER PROBLEMS | ELECTRICAL PARTS REPLACEMENT | Major | Tech11 | 250000 | 584648.70 | 166828.740 | 6.677189e+05 |
2331 | 2332 | 0 | Normal Warranty | LCD-40 | IP2JRA/72A/KL | Tehran | NaN | 1/20/2013 | 1/20/2013 | 1/21/2013 | 0 | 1 | 1 | WORN-OUT | OTHER PROBLEMS | SET EXCHANGE | Free Of Charge | Tech14 | 0 | 0.00 | 0.000 | 0.000000e+00 |
101 | 102 | 0 | Out of Warranty | MONITOR - TFT LCD | H2MM2/A/4IPKL | Tehran | 38081.0 | 12/16/2012 | 12/17/2012 | 1/5/2013 | 1 | 20 | 0 | WORN-OUT | OTHER PROBLEMS | ELECTRICAL PARTS REPLACEMENT | Free Of Charge | Tech16 | 0 | 0.00 | 0.000 | 0.000000e+00 |
3096 | 3097 | 0 | Normal Warranty | LED40 | IP2JQA/&KL/4/ | Tehran | NaN | 1/28/2013 | 1/28/2013 | 1/31/2013 | 0 | 3 | 1 | WORN-OUT | NO PICTURE/NO RASTER | SET EXCHANGE | Free Of Charge | Tech23 | 0 | 0.00 | 0.000 | 0.000000e+00 |
2181 | 2182 | 0 | Normal Warranty | DVD PLAYER | ZT2QARDBC/2&IP2Z | Tehran | NaN | 1/19/2013 | 1/19/2013 | 1/20/2013 | 0 | 1 | 1 | WORN-OUT | READ ERROR | ELECTRICAL PARTS REPLACEMENT | Major | Tech18 | 170000 | 1840063.68 | 402012.736 | 1.608051e+06 |
2141 | 2142 | 0 | Normal Warranty | LED40 | IP2JQA/&QE4/ | Tehran | NaN | 1/19/2013 | 1/19/2013 | 1/21/2013 | 0 | 2 | 1 | WORN-OUT | OTHER PROBLEMS | ELECTRICAL PARTS REPLACEMENT | Minor | Tech14 | 215000 | 0.00 | 43000.000 | 1.720000e+05 |
935 | 936 | 0 | Normal Warranty | LCD-40 | IP2JRA/7/7&IP | Tehran | NaN | 1/5/2013 | 1/5/2013 | 1/6/2013 | 0 | 1 | 1 | WORN-OUT | OTHER PROBLEMS | SET EXCHANGE | Free Of Charge | Tech08 | 0 | 0.00 | 0.000 | 0.000000e+00 |
1698 | 1699 | 0 | Normal Warranty | LCD-40 | IP2JQA/&2772 | Tehran | NaN | 1/10/2013 | 1/19/2013 | 1/20/2013 | 8 | 10 | 1 | WORN-OUT | OTHER PROBLEMS | SET EXCHANGE | Free Of Charge | Tech13 | 0 | 0.00 | 0.000 | 0.000000e+00 |
343 | 344 | 1 | Normal Warranty | LED40 | IP2JR2/-2/IP4 | Tehran | NaN | 12/27/2012 | 12/31/2012 | 1/1/2013 | 4 | 5 | 1 | WORN-OUT | OTHER PROBLEMS | ELECTRICAL PARTS REPLACEMENT | Major | Tech10 | 420000 | 24887810.32 | 5063582.064 | 2.025433e+07 |
1875 | 1876 | 1 | Normal Warranty | LCD- 46 | IP2JQ7/42/42 | Tehran | NaN | 1/14/2013 | 1/17/2013 | 1/20/2013 | 3 | 6 | 1 | WORN-OUT | OTHER PROBLEMS | ELECTRICAL PARTS REPLACEMENT | Free Of Charge | Tech15 | 0 | 0.00 | 0.000 | 0.000000e+00 |
The dataset itself has a column of index values. Let’s make sure it’s really unique for each row; then we can consider the No
column as the index.
service_df['No'].is_unique
True
service_df.set_index("No", inplace=True)
head(n)
returns first n rows.
tail(n)
returns last n rows.
The default value of n is 5.
service_df.head()
Service_type | Cost_Type | Product_Group | Serial_No | City | Product_Date | Receipt_Date | Appoint_Date | Complete_Date | TAT01 | TAT02 | Job_Satus | Defect_Des | Symptom_Desc | Repair_Action_Desc | Labor_Charge_Desc | Engineer | Labor_Charge_Amount | Parts_Amount | Discount_Amount | Total_Invoice_Amount | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
No | |||||||||||||||||||||
1 | 0 | Normal Warranty | AUDIO - HOME THEAT | 2F/LAHFZ7E-IP-X | Tehran | NaN | 2/5/2012 | 2/5/2012 | 1/21/2013 | 0 | 351 | 0 | WORN-OUT | OTHER PROBLEMS | SOFTWARE UPGRADE | Free Of Charge | Tech15 | 0 | 0.00 | 0.000 | 0.000 |
2 | 1 | Normal Warranty | REFRIGERATOR-SBS | AIP244AACA/4E7 | Tehran | NaN | 7/7/2012 | 9/10/2012 | 1/31/2013 | 65 | 208 | 1 | SHORT | OTHER PROBLEMS | OTHER REPAIR | Free Of Charge | Tech03 | 0 | 3437481.60 | 687486.320 | 2748885.280 |
3 | 1 | Normal Warranty | REFRIGERATOR-SBS | JK-C4BBCAEKL72 | Tehran | NaN | 7/7/2012 | 8/15/2012 | 1/31/2013 | 38 | 208 | 1 | OPEN | OTHER PROBLEMS | OTHER REPAIR | Free Of Charge | Tech03 | 0 | 4228756.56 | 845751.312 | 3383005.248 |
4 | 0 | Out of Warranty | MONITOR - TFT LCD | IP2JO4/-A-47 | Tehran | NaN | 8/7/2012 | 8/7/2012 | 1/13/2013 | 0 | 158 | 1 | WORN-OUT | OTHER PROBLEMS | ELECTRICAL PARTS REPLACEMENT | Free Of Charge | Tech15 | 0 | 0.00 | 0.000 | 0.000 |
5 | 0 | Out of Warranty | AUDIO - MP3 Player | AU/BCQE2A/P | Tehran | NaN | 8/7/2012 | 8/9/2012 | 1/7/2013 | 2 | 153 | 1 | WORN-OUT | OTHER PROBLEMS | SOFTWARE UPGRADE | Medium | Tech07 | 120000 | 765488.60 | 177087.820 | 708381.680 |
columns
attribute returns all column names.
service_df.columns
Index(['Service_type', 'Cost_Type', 'Product_Group', 'Serial_No', 'City',
'Product_Date', 'Receipt_Date', 'Appoint_Date', 'Complete_Date',
'TAT01', 'TAT02', 'Job_Satus', 'Defect_Des', 'Symptom_Desc',
'Repair_Action_Desc', 'Labor_Charge_Desc', 'Engineer',
'Labor_Charge_Amount', 'Parts_Amount', 'Discount_Amount',
'Total_Invoice_Amount'],
dtype='object')
dtypes
shows each column’s data type.
service_df.dtypes
Service_type int64
Cost_Type object
Product_Group object
Serial_No object
City object
Product_Date float64
Receipt_Date object
Appoint_Date object
Complete_Date object
TAT01 int64
TAT02 int64
Job_Satus int64
Defect_Des object
Symptom_Desc object
Repair_Action_Desc object
Labor_Charge_Desc object
Engineer object
Labor_Charge_Amount int64
Parts_Amount float64
Discount_Amount float64
Total_Invoice_Amount float64
dtype: object
info()
function summarizes the dataset.
service_df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 3136 entries, 1 to 3136
Data columns (total 21 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Service_type 3136 non-null int64
1 Cost_Type 3136 non-null object
2 Product_Group 3126 non-null object
3 Serial_No 3127 non-null object
4 City 3136 non-null object
5 Product_Date 724 non-null float64
6 Receipt_Date 3136 non-null object
7 Appoint_Date 3136 non-null object
8 Complete_Date 3136 non-null object
9 TAT01 3136 non-null int64
10 TAT02 3136 non-null int64
11 Job_Satus 3136 non-null int64
12 Defect_Des 3136 non-null object
13 Symptom_Desc 3136 non-null object
14 Repair_Action_Desc 3136 non-null object
15 Labor_Charge_Desc 3136 non-null object
16 Engineer 3136 non-null object
17 Labor_Charge_Amount 3136 non-null int64
18 Parts_Amount 3136 non-null float64
19 Discount_Amount 3136 non-null float64
20 Total_Invoice_Amount 3136 non-null float64
dtypes: float64(4), int64(5), object(12)
memory usage: 539.0+ KB
For numerical columns, the describe()
function shows statistical summaries of each column.
service_df.describe()
Service_type | Product_Date | TAT01 | TAT02 | Job_Satus | Labor_Charge_Amount | Parts_Amount | Discount_Amount | Total_Invoice_Amount | |
---|---|---|---|---|---|---|---|---|---|
count | 3136.000000 | 724.000000 | 3136.000000 | 3136.000000 | 3136.000000 | 3136.000000 | 3.136000e+03 | 3.136000e+03 | 3.136000e+03 |
mean | 0.105230 | 39227.758287 | 1.289222 | 6.647959 | 0.977360 | 129877.232143 | 2.303481e+06 | 4.844214e+05 | 1.924006e+06 |
std | 0.306898 | 1152.746282 | 5.060422 | 14.012700 | 0.148778 | 176843.065376 | 6.682868e+06 | 1.349372e+06 | 5.400547e+06 |
min | 0.000000 | 33870.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 |
25% | 0.000000 | 38451.000000 | 0.000000 | 1.000000 | 1.000000 | 0.000000 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 |
50% | 0.000000 | 38823.000000 | 0.000000 | 3.000000 | 1.000000 | 0.000000 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 |
75% | 0.000000 | 40188.000000 | 1.000000 | 6.000000 | 1.000000 | 250000.000000 | 7.510464e+04 | 8.738318e+04 | 3.827746e+05 |
max | 1.000000 | 40813.000000 | 81.000000 | 351.000000 | 1.000000 | 540000.000000 | 4.480716e+07 | 8.088432e+06 | 3.635773e+07 |
Columns can be accessed as attributes of the DataFrame. Each column is, in fact, a Series. We can count the occurrences of each distinct value with the value_counts()
function.
service_df.Product_Group.value_counts()
LED40 1008
LCD-40 529
DIGITAL CAMERA 282
LED46 250
PRINTER - MONO LAS 151
DVD PLAYER 119
AUDIO - MP3 Player 109
HARD DISK DRIVE 102
AUDIO - HOME THEAT 102
REFRIGERATOR-SBS 68
DVD PLAYER/ RECORD 65
PLASMA-42 62
CAMCORDER 57
PLASMA-43 49
PRINTER - COLOR LA 35
MONITOR - TFT LCD 24
VACUUM CLEANER 24
TFT18 18
WASHING MACHINE 16
PLASMA-51 15
LED55 15
OVEN - MICROWAVE 11
LCD- 46 8
PLASMA-50 4
AIR CONDITIONER 3
Name: Product_Group, dtype: int64
We can also group rows by columns and get each group’s size.
service_df.groupby(by =["Product_Group", "Cost_Type"]).size()
Product_Group Cost_Type
AIR CONDITIONER Normal Warranty 2
Out of Warranty 1
AUDIO - HOME THEAT Normal Warranty 3
Out of Warranty 99
AUDIO - MP3 Player Normal Warranty 28
Out of Warranty 81
CAMCORDER Normal Warranty 5
Out of Warranty 52
DIGITAL CAMERA Normal Warranty 149
Out of Warranty 133
DVD PLAYER Normal Warranty 53
Out of Warranty 66
DVD PLAYER/ RECORD Extended Warranty 11
Normal Warranty 18
Out of Warranty 36
HARD DISK DRIVE Extended Warranty 63
Normal Warranty 14
Out of Warranty 25
LCD- 46 Normal Warranty 5
Out of Warranty 3
LCD-40 Normal Warranty 522
Out of Warranty 7
LED40 Normal Warranty 1003
Out of Warranty 5
LED46 Normal Warranty 247
Out of Warranty 3
LED55 Normal Warranty 14
Out of Warranty 1
MONITOR - TFT LCD Normal Warranty 5
Out of Warranty 19
OVEN - MICROWAVE Normal Warranty 3
Out of Warranty 8
PLASMA-42 Normal Warranty 55
Out of Warranty 7
PLASMA-43 Normal Warranty 45
Out of Warranty 4
PLASMA-50 Normal Warranty 3
Out of Warranty 1
PLASMA-51 Normal Warranty 12
Out of Warranty 3
PRINTER - COLOR LA Normal Warranty 14
Out of Warranty 21
PRINTER - MONO LAS Normal Warranty 57
Out of Warranty 94
REFRIGERATOR-SBS Normal Warranty 32
Out of Warranty 36
TFT18 Extended Warranty 2
Normal Warranty 15
Out of Warranty 1
VACUUM CLEANER Normal Warranty 1
Out of Warranty 23
WASHING MACHINE Out of Warranty 16
dtype: int64
In fact, the result of the command above is a Series with a MultiIndex. Using unstack()
, we reshape the Series into a DataFrame.
service_df.groupby(by =["Product_Group", "Cost_Type"]).size().unstack()
Cost_Type | Extended Warranty | Normal Warranty | Out of Warranty |
---|---|---|---|
Product_Group | |||
AIR CONDITIONER | NaN | 2.0 | 1.0 |
AUDIO - HOME THEAT | NaN | 3.0 | 99.0 |
AUDIO - MP3 Player | NaN | 28.0 | 81.0 |
CAMCORDER | NaN | 5.0 | 52.0 |
DIGITAL CAMERA | NaN | 149.0 | 133.0 |
DVD PLAYER | NaN | 53.0 | 66.0 |
DVD PLAYER/ RECORD | 11.0 | 18.0 | 36.0 |
HARD DISK DRIVE | 63.0 | 14.0 | 25.0 |
LCD- 46 | NaN | 5.0 | 3.0 |
LCD-40 | NaN | 522.0 | 7.0 |
LED40 | NaN | 1003.0 | 5.0 |
LED46 | NaN | 247.0 | 3.0 |
LED55 | NaN | 14.0 | 1.0 |
MONITOR - TFT LCD | NaN | 5.0 | 19.0 |
OVEN - MICROWAVE | NaN | 3.0 | 8.0 |
PLASMA-42 | NaN | 55.0 | 7.0 |
PLASMA-43 | NaN | 45.0 | 4.0 |
PLASMA-50 | NaN | 3.0 | 1.0 |
PLASMA-51 | NaN | 12.0 | 3.0 |
PRINTER - COLOR LA | NaN | 14.0 | 21.0 |
PRINTER - MONO LAS | NaN | 57.0 | 94.0 |
REFRIGERATOR-SBS | NaN | 32.0 | 36.0 |
TFT18 | 2.0 | 15.0 | 1.0 |
VACUUM CLEANER | NaN | 1.0 | 23.0 |
WASHING MACHINE | NaN | NaN | 16.0 |
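If the NaN cells above are a nuisance, unstack() accepts a fill_value argument so that missing combinations come out as 0 instead. A toy sketch (not the service dataset):

```python
import pandas as pd

df = pd.DataFrame({
    "Product_Group": ["TV", "TV", "HDD"],
    "Cost_Type": ["Normal Warranty", "Out of Warranty", "Normal Warranty"],
})

# missing (group, cost type) combinations become 0 instead of NaN
counts = df.groupby(["Product_Group", "Cost_Type"]).size().unstack(fill_value=0)
print(counts.loc["HDD", "Out of Warranty"])  # 0
```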
Let’s count the NaN values in each column.
service_df.isna().sum()
Service_type 0
Cost_Type 0
Product_Group 10
Serial_No 9
City 0
Product_Date 2412
Receipt_Date 0
Appoint_Date 0
Complete_Date 0
TAT01 0
TAT02 0
Job_Satus 0
Defect_Des 0
Symptom_Desc 0
Repair_Action_Desc 0
Labor_Charge_Desc 0
Engineer 0
Labor_Charge_Amount 0
Parts_Amount 0
Discount_Amount 0
Total_Invoice_Amount 0
dtype: int64
Let’s see the rows with NaN values.
service_df[service_df[['Product_Group','Serial_No']].isna().any(axis=1)]
Service_type | Cost_Type | Product_Group | Serial_No | City | Product_Date | Receipt_Date | Appoint_Date | Complete_Date | TAT01 | TAT02 | Job_Satus | Defect_Des | Symptom_Desc | Repair_Action_Desc | Labor_Charge_Desc | Engineer | Labor_Charge_Amount | Parts_Amount | Discount_Amount | Total_Invoice_Amount | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
No | |||||||||||||||||||||
412 | 0 | Out of Warranty | NaN | 2P-/BAXS-E/KLKLK | Tehran | 38818.0 | 12/29/2012 | 12/31/2012 | 1/8/2013 | 2 | 10 | 1 | WORN-OUT | OTHER PROBLEMS | OTHER REPAIR | Free Of Charge | Tech16 | 0 | 0.00 | 0.000 | 0.000 |
542 | 1 | Normal Warranty | PRINTER - MONO LAS | NaN | Tehran | NaN | 12/31/2012 | 12/31/2012 | 1/1/2013 | 0 | 1 | 1 | WORN-OUT | OTHER PROBLEMS | OTHER REPAIR | Maximum | Tech20 | 350000 | 0.00 | 70000.000 | 280000.000 |
626 | 0 | Out of Warranty | NaN | ZADCKLVKSAE2KL2K | MASHHAD | 38814.0 | 1/1/2013 | 1/8/2013 | 1/13/2013 | 7 | 12 | 1 | DEFORMED | Test OK | No Action | Free Of Charge | Tech08 | 0 | 0.00 | 0.000 | 0.000 |
704 | 0 | Out of Warranty | NaN | A/&2GV2B&E/KL2K | Tehran | 40548.0 | 1/1/2013 | 1/2/2013 | 1/5/2013 | 1 | 4 | 1 | Software Bug | ZOOM PROBLEM | FIRMWARE UPGRADE | Major | Tech08 | 250000 | 0.00 | 50000.000 | 200000.000 |
876 | 0 | Out of Warranty | VACUUM CLEANER | NaN | Tehran | NaN | 1/2/2013 | 1/6/2013 | 1/6/2013 | 4 | 4 | 1 | WORN-OUT | OTHER PROBLEMS | ELECTRICAL PARTS REPLACEMENT | Free Of Charge | Tech17 | 0 | 0.00 | 0.000 | 0.000 |
934 | 0 | Normal Warranty | NaN | ZTAPKLCABA/2IP24L | Tehran | NaN | 1/5/2013 | 1/5/2013 | 1/5/2013 | 0 | 0 | 1 | WORN-OUT | READ ERROR | MECHANICAL PARTS REPLACEMENT | Major | Tech12 | 0 | 1158705.72 | 231741.144 | 826864.576 |
1059 | 0 | Normal Warranty | NaN | ZTAPKLCIBAE&&/D | Tehran | NaN | 1/5/2013 | 1/6/2013 | 1/7/2013 | 1 | 2 | 1 | WORN-OUT | READ ERROR | MECHANICAL PARTS REPLACEMENT | Medium | Tech12 | 0 | 1158705.72 | 231741.144 | 826864.576 |
1207 | 0 | Normal Warranty | NaN | ZTAPKLCIBB/&IP47A | Tehran | NaN | 1/6/2013 | 1/6/2013 | 1/10/2013 | 0 | 4 | 1 | WORN-OUT | OTHER PROBLEMS | ELECTRICAL PARTS REPLACEMENT | Major | Tech14 | 170000 | 1158705.72 | 265741.144 | 1062864.576 |
1213 | 0 | Out of Warranty | NaN | 2P7IPBAKPQEKL27/N | Tehran | 38080.0 | 1/6/2013 | 1/6/2013 | 1/7/2013 | 0 | 1 | 1 | WORN-OUT | Test OK | No Action | Free Of Charge | Tech20 | 0 | 0.00 | 0.000 | 0.000 |
1235 | 0 | Out of Warranty | VACUUM CLEANER | NaN | Tehran | NaN | 1/6/2013 | 1/8/2013 | 1/9/2013 | 2 | 3 | 1 | WORN-OUT | OTHER PROBLEMS | ELECTRICAL PARTS REPLACEMENT | Minor | Tech17 | 145000 | 15526.44 | 32105.288 | 128421.152 |
1550 | 0 | Normal Warranty | NaN | ZTAPKLCIB2/A/7KLZ | Tehran | NaN | 1/9/2013 | 1/9/2013 | 1/9/2013 | 0 | 0 | 1 | WORN-OUT | READ ERROR | MECHANICAL PARTS REPLACEMENT | Medium | Tech12 | 0 | 1158705.72 | 231741.144 | 826864.576 |
1571 | 0 | Normal Warranty | NaN | ZTAPKLCIBA/A4QEP | ARDEBIL | NaN | 1/9/2013 | 1/9/2013 | 1/27/2013 | 0 | 18 | 1 | WORN-OUT | OTHER PROBLEMS | ELECTRICAL PARTS REPLACEMENT | Free Of Charge | Tech18 | 0 | 0.00 | 0.000 | 0.000 |
1617 | 0 | Out of Warranty | NaN | A/&KLKLVBB&E/2AN | Tehran | 40548.0 | 1/9/2013 | 1/10/2013 | 1/13/2013 | 1 | 4 | 1 | DEFORMED | NO POWER | ELECTRICAL PARTS REPLACEMENT | Major | Tech08 | 250000 | 0.00 | 50000.000 | 200000.000 |
1814 | 0 | Out of Warranty | DVD PLAYER | NaN | Tehran | NaN | 1/14/2013 | 1/14/2013 | 1/26/2013 | 0 | 12 | 1 | WORN-OUT | OTHER PROBLEMS | ELECTRICAL PARTS REPLACEMENT | Minor | Tech14 | 120000 | 868118.56 | 187823.812 | 781285.648 |
2032 | 0 | Out of Warranty | DIGITAL CAMERA | NaN | Tehran | NaN | 1/15/2013 | 1/15/2013 | 1/21/2013 | 0 | 6 | 1 | DEFORMED | OTHER PROBLEMS | MECHANICAL PARTS REPLACEMENT | Free Of Charge | Tech11 | 0 | 0.00 | 0.000 | 0.000 |
2036 | 0 | Out of Warranty | VACUUM CLEANER | NaN | Tehran | NaN | 1/16/2013 | 1/20/2013 | 1/20/2013 | 4 | 4 | 1 | WORN-OUT | OTHER PROBLEMS | ELECTRICAL PARTS REPLACEMENT | Free Of Charge | Tech17 | 0 | 18858.40 | 3871.880 | 15887.520 |
2269 | 0 | Out of Warranty | AUDIO - HOME THEAT | NaN | Tehran | NaN | 1/20/2013 | 1/20/2013 | 1/20/2013 | 0 | 0 | 1 | WORN-OUT | OTHER PROBLEMS | ELECTRICAL PARTS REPLACEMENT | Free Of Charge | Tech18 | 0 | 1450087.28 | 280018.456 | 1160077.824 |
2772 | 0 | Out of Warranty | DVD PLAYER | NaN | Tehran | NaN | 1/24/2013 | 1/24/2013 | 1/26/2013 | 0 | 2 | 1 | WORN-OUT | OTHER PROBLEMS | MECHANICAL PARTS REPLACEMENT | Minimum | Tech18 | 70000 | 868118.56 | 187823.812 | 751285.648 |
2892 | 0 | Out of Warranty | DIGITAL CAMERA | NaN | ABHAR | NaN | 1/27/2013 | 1/27/2013 | 1/28/2013 | 0 | 1 | 1 | DEFORMED | OTHER PROBLEMS | SCRAP | Free Of Charge | Tech11 | 0 | 0.00 | 0.000 | 0.000 |
Data Cleaning
In this part, based on what we want to do with the data, we restructure, reformat, and validate it.
To start, let’s remove columns that are not useful for our hypothetical purpose.
- Does
Serial_No
really help the model decide how long a repair takes?
- Are the
Product_Date
values meaningful?
If the answers to these questions are no, we can drop those columns.
service_df = service_df.drop(columns=['Product_Date', 'Serial_No'])
service_df
Service_type | Cost_Type | Product_Group | City | Receipt_Date | Appoint_Date | Complete_Date | TAT01 | TAT02 | Job_Satus | Defect_Des | Symptom_Desc | Repair_Action_Desc | Labor_Charge_Desc | Engineer | Labor_Charge_Amount | Parts_Amount | Discount_Amount | Total_Invoice_Amount | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
No | |||||||||||||||||||
1 | 0 | Normal Warranty | AUDIO - HOME THEAT | Tehran | 2/5/2012 | 2/5/2012 | 1/21/2013 | 0 | 351 | 0 | WORN-OUT | OTHER PROBLEMS | SOFTWARE UPGRADE | Free Of Charge | Tech15 | 0 | 0.00 | 0.000 | 0.000 |
2 | 1 | Normal Warranty | REFRIGERATOR-SBS | Tehran | 7/7/2012 | 9/10/2012 | 1/31/2013 | 65 | 208 | 1 | SHORT | OTHER PROBLEMS | OTHER REPAIR | Free Of Charge | Tech03 | 0 | 3437481.60 | 687486.320 | 2748885.280 |
3 | 1 | Normal Warranty | REFRIGERATOR-SBS | Tehran | 7/7/2012 | 8/15/2012 | 1/31/2013 | 38 | 208 | 1 | OPEN | OTHER PROBLEMS | OTHER REPAIR | Free Of Charge | Tech03 | 0 | 4228756.56 | 845751.312 | 3383005.248 |
4 | 0 | Out of Warranty | MONITOR - TFT LCD | Tehran | 8/7/2012 | 8/7/2012 | 1/13/2013 | 0 | 158 | 1 | WORN-OUT | OTHER PROBLEMS | ELECTRICAL PARTS REPLACEMENT | Free Of Charge | Tech15 | 0 | 0.00 | 0.000 | 0.000 |
5 | 0 | Out of Warranty | AUDIO - MP3 Player | Tehran | 8/7/2012 | 8/9/2012 | 1/7/2013 | 2 | 153 | 1 | WORN-OUT | OTHER PROBLEMS | SOFTWARE UPGRADE | Medium | Tech07 | 120000 | 765488.60 | 177087.820 | 708381.680 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
3132 | 0 | Normal Warranty | LCD-40 | REZVANSHAHR | 1/31/2013 | 1/31/2013 | 1/31/2013 | 0 | 0 | 1 | WORN-OUT | OTHER PROBLEMS | ELECTRICAL PARTS REPLACEMENT | Major | Tech13 | 420000 | 0.00 | 84000.000 | 336000.000 |
3133 | 0 | Normal Warranty | LED46 | ESFAHAN | 1/31/2013 | 1/31/2013 | 1/31/2013 | 0 | 0 | 1 | WORN-OUT | OTHER PROBLEMS | ELECTRICAL PARTS REPLACEMENT | Major | Tech13 | 420000 | 0.00 | 84000.000 | 336000.000 |
3134 | 0 | Extended Warranty | HARD DISK DRIVE | Tehran | 1/31/2013 | 1/31/2013 | 1/31/2013 | 0 | 0 | 1 | WORN-OUT | OTHER PROBLEMS | REPAIR REFUSAL | Minimum | Tech07 | 50000 | 0.00 | 10000.000 | 40000.000 |
3135 | 0 | Out of Warranty | HARD DISK DRIVE | Tehran | 1/31/2013 | 1/31/2013 | 1/31/2013 | 0 | 0 | 1 | WORN-OUT | OTHER PROBLEMS | REPAIR REFUSAL | Free Of Charge | Tech07 | 0 | 0.00 | 0.000 | 0.000 |
3136 | 0 | Out of Warranty | DVD PLAYER/ RECORD | Tehran | 1/31/2013 | 1/31/2013 | 1/31/2013 | 0 | 0 | 1 | WORN-OUT | OTHER PROBLEMS | MECHANICAL PARTS REPLACEMENT | Minor | Tech18 | 150000 | 1488482.00 | 328686.400 | 1318785.600 |
3136 rows × 19 columns
The dataset at hand has some NaN values. What do they mean? Does the value exist but was not provided, or is there simply nothing to fill that field?
possible solutions
- drop data
- drop the whole row
- drop the whole column
- Imputation
- replace it by mean
- replace it by frequency
- adding an unknown category
- …
It’s important to have a logical reason for the choice. For example, it’s not reasonable to fill NaN values with the mean when NaN occupies half of the column.
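The imputation options above can be sketched with fillna(). A toy illustration on made-up Series (not the service dataset):

```python
import numpy as np
import pandas as pd

# toy columns with missing values -- illustration only
numeric = pd.Series([1.0, np.nan, 3.0])
categorical = pd.Series(["TV", None, "HDD"])

by_mean = numeric.fillna(numeric.mean())             # replace by mean
by_mode = categorical.fillna(categorical.mode()[0])  # replace by most frequent value
as_unknown = categorical.fillna("UNKNOWN")           # add an "unknown" category

print(by_mean[1], as_unknown[1])  # 2.0 UNKNOWN
```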
Where are the NaN values?
service_df.isna().sum()
Service_type 0
Cost_Type 0
Product_Group 10
City 0
Receipt_Date 0
Appoint_Date 0
Complete_Date 0
TAT01 0
TAT02 0
Job_Satus 0
Defect_Des 0
Symptom_Desc 0
Repair_Action_Desc 0
Labor_Charge_Desc 0
Engineer 0
Labor_Charge_Amount 0
Parts_Amount 0
Discount_Amount 0
Total_Invoice_Amount 0
dtype: int64
Let’s see rows with NaN values.
service_df[pd.isna(service_df.Product_Group)]
Service_type | Cost_Type | Product_Group | City | Receipt_Date | Appoint_Date | Complete_Date | TAT01 | TAT02 | Job_Satus | Defect_Des | Symptom_Desc | Repair_Action_Desc | Labor_Charge_Desc | Engineer | Labor_Charge_Amount | Parts_Amount | Discount_Amount | Total_Invoice_Amount | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
No | |||||||||||||||||||
412 | 0 | Out of Warranty | NaN | Tehran | 12/29/2012 | 12/31/2012 | 1/8/2013 | 2 | 10 | 1 | WORN-OUT | OTHER PROBLEMS | OTHER REPAIR | Free Of Charge | Tech16 | 0 | 0.00 | 0.000 | 0.000 |
626 | 0 | Out of Warranty | NaN | MASHHAD | 1/1/2013 | 1/8/2013 | 1/13/2013 | 7 | 12 | 1 | DEFORMED | Test OK | No Action | Free Of Charge | Tech08 | 0 | 0.00 | 0.000 | 0.000 |
704 | 0 | Out of Warranty | NaN | Tehran | 1/1/2013 | 1/2/2013 | 1/5/2013 | 1 | 4 | 1 | Software Bug | ZOOM PROBLEM | FIRMWARE UPGRADE | Major | Tech08 | 250000 | 0.00 | 50000.000 | 200000.000 |
934 | 0 | Normal Warranty | NaN | Tehran | 1/5/2013 | 1/5/2013 | 1/5/2013 | 0 | 0 | 1 | WORN-OUT | READ ERROR | MECHANICAL PARTS REPLACEMENT | Major | Tech12 | 0 | 1158705.72 | 231741.144 | 826864.576 |
1059 | 0 | Normal Warranty | NaN | Tehran | 1/5/2013 | 1/6/2013 | 1/7/2013 | 1 | 2 | 1 | WORN-OUT | READ ERROR | MECHANICAL PARTS REPLACEMENT | Medium | Tech12 | 0 | 1158705.72 | 231741.144 | 826864.576 |
1207 | 0 | Normal Warranty | NaN | Tehran | 1/6/2013 | 1/6/2013 | 1/10/2013 | 0 | 4 | 1 | WORN-OUT | OTHER PROBLEMS | ELECTRICAL PARTS REPLACEMENT | Major | Tech14 | 170000 | 1158705.72 | 265741.144 | 1062864.576 |
1213 | 0 | Out of Warranty | NaN | Tehran | 1/6/2013 | 1/6/2013 | 1/7/2013 | 0 | 1 | 1 | WORN-OUT | Test OK | No Action | Free Of Charge | Tech20 | 0 | 0.00 | 0.000 | 0.000 |
1550 | 0 | Normal Warranty | NaN | Tehran | 1/9/2013 | 1/9/2013 | 1/9/2013 | 0 | 0 | 1 | WORN-OUT | READ ERROR | MECHANICAL PARTS REPLACEMENT | Medium | Tech12 | 0 | 1158705.72 | 231741.144 | 826864.576 |
1571 | 0 | Normal Warranty | NaN | ARDEBIL | 1/9/2013 | 1/9/2013 | 1/27/2013 | 0 | 18 | 1 | WORN-OUT | OTHER PROBLEMS | ELECTRICAL PARTS REPLACEMENT | Free Of Charge | Tech18 | 0 | 0.00 | 0.000 | 0.000 |
1617 | 0 | Out of Warranty | NaN | Tehran | 1/9/2013 | 1/10/2013 | 1/13/2013 | 1 | 4 | 1 | DEFORMED | NO POWER | ELECTRICAL PARTS REPLACEMENT | Major | Tech08 | 250000 | 0.00 | 50000.000 | 200000.000 |
Based on the task you are about to perform once the data is prepared, decide how much row dropping is acceptable. If 10 rows is not a lot, we can drop them.
service_df.dropna(inplace=True)
What cities are in the dataset?
print(sorted(service_df.City.unique()))
['ABADAN', 'ABHAR', 'AHVAZ', 'AMOL', 'ARAK', 'ARDEBI', 'ARDEBIL', 'BABOL', 'BABOLSAR', 'BANDAR ABBAS', 'BANDAR ANZALI', 'BANDARABBAS', 'BARDSIR', 'BEHBAHAN', 'BIJAR', 'BIRJAND', 'BOJNORD', 'BOKAN', 'BOMEHEN', 'BOROJERD', 'CHABAHAR', 'CHALOOS', 'CHALOUS', 'CHENARAN', 'DEZFOL', 'DEZFOUL', 'DOROD', 'ESFAHAN', 'ESFAHN', 'ESFERAYN', 'ESLAM ABAD', 'ESLAM SHAHR', 'ESLAMSHAHR', 'FARDIS', 'FARHANG', 'GARMSAR', 'GHAEM SHAHR', 'GHAEMSHAHR', 'GHARCHAK', 'GOLPAYGAN', 'GONBAD', 'GONBAD KAVOS', 'GORGAN', 'HAMEDAN', 'HASHTGERD', 'HSMEDAN', 'ILAM', 'IZEH', 'JAHROM', 'JEYROFT', 'KARAJ', 'KASHAN', 'KASHMAR', 'KERMAN', 'KERMANSHAH', 'KHOMAM', 'KHOMEIN', 'KHORAM ABAD', 'KHOY', 'KhoramAbad', 'LAHIJAN', 'LANGEROD', 'LANGEROOD', 'MAHMOUD ABAD', 'MAHSHAHR', 'MALAYER', 'MANJIL', 'MASHHAD', 'MEHRSHAHR', 'MIYANEH', 'NAHAVAND', 'NIYSHABOUR', 'NOR', 'NOSHAHR', 'NOUR', 'NOUSHAHR', 'OROMIYEH', 'PARS ABAD', 'QAZVIN', 'QOM', 'RAFSANJAN', 'RASHT', 'REZVANSHAHR', 'ROBAT KARIM', 'ROBATKARIM', 'RODSAR', 'SABZEVAR', 'SALMAS', 'SARAF ZADEH', 'SARBANDAR', 'SARI', 'SAVAD KOOH', 'SAVEH', 'SAVOJBOLAGH', 'SEMNAN', 'SHAHR GHODS', 'SHAHRE GHODS', 'SHAHRE KORD', 'SHAHRGHODS', 'SHAHROD', 'SHAHROOD', 'SHAHRYAR', 'SHAZAND', 'SHIRAZ', 'SIRJAN', 'TABAS', 'TABRIZ', 'TALESH', 'TEHRAN', 'TONEKABON', 'Tehran', 'VARAMIN', 'YAZD', 'ZAHEDAN', 'ZANJAN']
Look at the values above. What do you see? We have multiple spellings of the same city names:
- ‘ARDEBI’, ‘ARDEBIL’
- ‘BANDAR ABBAS’, ‘BANDARABBAS’
- ‘CHALOOS’, ‘CHALOUS’
- ‘DEZFOL’, ‘DEZFOUL’
- ‘ESFAHAN’, ‘ESFAHN’
- ‘ESLAMSHAHR’, ‘ESLAM SHAHR’
- ‘GHAEM SHAHR’, ‘GHAEMSHAHR’
- ‘HAMEDAN’, ‘HSMEDAN’
- ‘KHORAM ABAD’, ‘KhoramAbad’
- ‘LANGEROD’, ‘LANGEROOD’
- ‘NOR’, ‘NOUR’
- ‘NOSHAHR’, ‘NOUSHAHR’
- ‘ROBAT KARIM’, ‘ROBATKARIM’
- ‘SHAHR GHODS’, ‘SHAHRE GHODS’, ‘SHAHRGHODS’
- ‘SHAHROD’, ‘SHAHROOD’
- ‘TEHRAN’, ‘Tehran’
So we should transform them into a common format. This is data standardization: we pick one spelling as canonical and map the variants onto it.
city_dictionary = {'ARDEBIL': 'ARDEBI', 'BANDARABBAS': 'BANDAR ABBAS', 'CHALOUS': 'CHALOOS', 'DEZFOUL': 'DEZFOL', 'ESFAHN': 'ESFAHAN', 'ESLAM SHAHR': 'ESLAMSHAHR', 'GHAEMSHAHR': 'GHAEM SHAHR', 'HAMEDAN': 'HSMEDAN', 'KhoramAbad': 'KHORAM ABAD', 'LANGEROOD': 'LANGEROD', 'NOUR': 'NOR', 'NOUSHAHR': 'NOSHAHR', 'ROBATKARIM': 'ROBAT KARIM','SHAHRE GHODS': 'SHAHR GHODS', 'SHAHRGHODS': 'SHAHR GHODS', 'SHAHROOD': 'SHAHROD', 'Tehran': 'TEHRAN'}
service_df = service_df.replace({'City': city_dictionary})
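A manual dictionary only catches the variants we spotted by eye; a cheap case-and-whitespace normalization pass is a useful complement. A sketch on toy values (not applied to service_df above):

```python
import pandas as pd

cities = pd.Series(["Tehran", "TEHRAN", " tehran "])

# strip stray whitespace and unify case before (or instead of) manual mapping
normalized = cities.str.strip().str.upper()
print(normalized.unique())  # ['TEHRAN']
```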
Now let’s check whether the DataFrame has duplicate rows.
service_df.duplicated().any()
True
Unfortunately, it does. Since we’re treating the DataFrame as an analytical base table (ABT), it’s better to drop them.
service_df.drop_duplicates(inplace=True)
service_df.shape
(1971, 19)
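On a toy frame (made up for illustration), duplicated() flags each repeat of an earlier row and drop_duplicates() keeps only the first occurrence:

```python
import pandas as pd

# Toy frame: row 1 is an exact copy of row 0.
toy = pd.DataFrame({'City': ['TEHRAN', 'TEHRAN', 'SHIRAZ'],
                    'Cost': [100, 100, 250]})
print(toy.duplicated().tolist())  # [False, True, False]: repeats of earlier rows
deduped = toy.drop_duplicates()   # keeps the first occurrence
print(deduped.shape)              # (2, 2)
```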
Let’s look again at the dtypes; we can convert some columns to more suitable, memory-efficient types.
service_df.dtypes
Service_type int64
Cost_Type object
Product_Group object
City object
Receipt_Date object
Appoint_Date object
Complete_Date object
TAT01 int64
TAT02 int64
Job_Satus int64
Defect_Des object
Symptom_Desc object
Repair_Action_Desc object
Labor_Charge_Desc object
Engineer object
Labor_Charge_Amount int64
Parts_Amount float64
Discount_Amount float64
Total_Invoice_Amount float64
dtype: object
The Receipt_Date, Appoint_Date and Complete_Date columns are stored as object; let’s convert them to the datetime type.
Remember that we have already explored the data, so we know the dates use the %m/%d/%Y format.
service_df[["Receipt_Date", "Appoint_Date", "Complete_Date"]] = service_df[["Receipt_Date", "Appoint_Date", "Complete_Date"]].apply(lambda col: pd.to_datetime(col, format='%m/%d/%Y'))
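If we were less sure of the format, pd.to_datetime also accepts errors='coerce', which turns unparseable entries into NaT instead of raising. A small sketch with made-up values:

```python
import pandas as pd

# Made-up dates: one entry does not match the expected %m/%d/%Y format.
raw = pd.Series(['2/5/2012', '7/7/2012', 'not-a-date'])
parsed = pd.to_datetime(raw, format='%m/%d/%Y', errors='coerce')
print(parsed.isna().sum())  # 1: the malformed entry became NaT
```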
Now let’s look at the unique values of each column with object dtype.
for column in service_df.select_dtypes(include=[object]).columns:
print(column, sorted(service_df[column].unique()), sep="\n\t")
print("-------------------------")
Cost_Type
['Extended Warranty', 'Normal Warranty', 'Out of Warranty']
-------------------------
Product_Group
['AIR CONDITIONER ', 'AUDIO - HOME THEAT', 'AUDIO - MP3 Player', 'CAMCORDER', 'DIGITAL CAMERA', 'DVD PLAYER', 'DVD PLAYER/ RECORD', 'HARD DISK DRIVE', 'LCD- 46', 'LCD-40', 'LED40', 'LED46', 'LED55', 'MONITOR - TFT LCD ', 'OVEN - MICROWAVE', 'PLASMA-42', 'PLASMA-43', 'PLASMA-50', 'PLASMA-51', 'PRINTER - COLOR LA', 'PRINTER - MONO LAS', 'REFRIGERATOR-SBS', 'TFT18', 'VACUUM CLEANER', 'WASHING MACHINE']
-------------------------
City
['ABADAN', 'ABHAR', 'AHVAZ', 'AMOL', 'ARAK', 'ARDEBIL', 'BABOL', 'BABOLSAR', 'BANDAR ABBAS', 'BANDAR ANZALI', 'BARDSIR', 'BEHBAHAN', 'BIJAR', 'BIRJAND', 'BOJNORD', 'BOKAN', 'BOMEHEN', 'BOROJERD', 'CHABAHAR', 'CHALOOS', 'CHENARAN', 'DEZFOL', 'DOROD', 'ESFAHAN', 'ESFERAYN', 'ESLAM ABAD', 'ESLAMSHAHR', 'FARDIS', 'FARHANG', 'GARMSAR', 'GHAEM SHAHR', 'GHARCHAK', 'GOLPAYGAN', 'GONBAD', 'GONBAD KAVOS', 'GORGAN', 'HAMEDAN', 'HASHTGERD', 'ILAM', 'IZEH', 'JAHROM', 'JEYROFT', 'KARAJ', 'KASHAN', 'KASHMAR', 'KERMAN', 'KERMANSHAH', 'KHOMAM', 'KHOMEIN', 'KHORAM ABAD', 'KHOY', 'LAHIJAN', 'LANGEROD', 'MAHMOUD ABAD', 'MAHSHAHR', 'MALAYER', 'MANJIL', 'MASHHAD', 'MEHRSHAHR', 'MIYANEH', 'NAHAVAND', 'NIYSHABOUR', 'NOR', 'NOSHAHR', 'OROMIYEH', 'PARS ABAD', 'QAZVIN', 'QOM', 'RAFSANJAN', 'RASHT', 'REZVANSHAHR', 'ROBAT KARIM', 'RODSAR', 'SABZEVAR', 'SALMAS', 'SARAF ZADEH', 'SARBANDAR', 'SARI', 'SAVAD KOOH', 'SAVEH', 'SAVOJBOLAGH', 'SEMNAN', 'SHAHR GHODS', 'SHAHRE KORD', 'SHAHROD', 'SHAHRYAR', 'SHAZAND', 'SHIRAZ', 'SIRJAN', 'TABAS', 'TABRIZ', 'TALESH', 'TEHRAN', 'TONEKABON', 'VARAMIN', 'YAZD', 'ZAHEDAN', 'ZANJAN']
-------------------------
Defect_Des
['COLD SOLDER', 'DEFORMED', 'DIRTY', 'DRY', 'LEAKING', 'OPEN', 'SHORT', 'Software Bug', 'WORN-OUT', 'WRONG PARTS']
-------------------------
Symptom_Desc
['BEATING ON PICTURE', 'BUTTON/DIAL NOT WORKING', 'Broken External Flash', 'Cosmetic replacement(customer request)', 'DISPLAY DIM', 'Defective Date Mode', 'Defective Eye Cup', 'Defective LCD Touch Screen', 'Defective Usb Cable', 'EXCESSIVE COLOR', 'Exterior Problem/damage', 'FAULTY COUNTER/TIMER', 'FAULTY OSD', 'FIRMWARE UPAGRADE REQUESTED', 'HORIZONTAL LINE PRINTED', 'MEMORY ERASED/MISSED', 'NO AUDIO', 'NO LOADING', 'NO PICTURE/NO RASTER', 'NO POWER', 'No Power', 'OTHER FAULTY INDICATION', 'OTHER PROBLEMS', 'OUTCASE CRACK', 'PICTURE SLIDING', 'PICTURE TOO BRIGHT', 'PICTURE TOO DARK', 'PIXEL MISSING', 'POOR FOCUS', 'POWER OFF NOT WORKING', 'POWER ON, BUT NO OPERATION', 'Power On, But No Operation', 'READ ERROR', 'RESET NOT WORKING', 'SHORT BATTERY CHARGING', 'SHORT BATTERY LIFE', 'SNOWY PICTURE', 'SOUND CLIPPING', 'Test OK', 'UNIT AUTO-OFF', 'VERTICAL LINE PRINTED']
-------------------------
Repair_Action_Desc
['ADJUSTMENT', 'CHANGE LCD', 'CHANGE OUTCASE', 'CLEANING', 'Cosmetic replacement(customer request)', "DECK ASS'Y REPAIR/REPLACEMENT", 'DECK CLEANING', 'ELECTRICAL PARTS REPLACEMENT', 'FIRMWARE UPGRADE', 'HEAD CLEANING', 'MECHANICAL PARTS REPLACEMENT', 'MECHANICAL RE-ASSEMBLING', 'No Action', 'OTHER CLEANING', 'OTHER ELECTRICAL ADJUSTMENT', 'OTHER MECHANICAL ADJUSTMENT', 'OTHER REPAIR', 'PARTS REMOVAL', 'PCB REPLACEMENT', 'PICKUP CLEANING', 'REPAIR REFUSAL', 'SCRAP', 'SET EXCHANGE', 'SOFTWARE UPGRADE', 'SOLDERING', 'TELECOM CARRIER PROBLEM']
-------------------------
Labor_Charge_Desc
['Free Of Charge', 'Major', 'Maximum', 'Medium', 'Minimum', 'Minor']
-------------------------
Engineer
['Tech01', 'Tech02', 'Tech03', 'Tech04', 'Tech05', 'Tech06', 'Tech07', 'Tech08', 'Tech10', 'Tech11', 'Tech12', 'Tech13', 'Tech14', 'Tech15', 'Tech16', 'Tech17', 'Tech18', 'Tech20', 'Tech21', 'Tech22', 'Tech23', 'Tech24']
-------------------------
It seems all columns with object dtype can be treated as categorical, so let’s convert them to the category dtype.
for column in service_df.select_dtypes(include=[object]).columns:
service_df[column] = service_df[column].astype('category')
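The category dtype stores a small integer code per row plus one copy of each label, which typically cuts memory use for low-cardinality string columns. A toy comparison on made-up data:

```python
import pandas as pd

# Made-up column: 1000 repeats of one label.
s = pd.Series(['Normal Warranty'] * 1000)
as_cat = s.astype('category')
obj_bytes = s.memory_usage(deep=True)
cat_bytes = as_cat.memory_usage(deep=True)
print(obj_bytes, cat_bytes)  # the category version uses far fewer bytes
```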
Now let’s look at the unique values of each column with integer dtype.
for column in service_df.select_dtypes(include=['int64']).columns:
print(column, sorted(service_df[column].unique()), sep="\n\t")
print("-------------------------")
Service_type
[0, 1]
-------------------------
TAT01
[0, 1, 2, 3, 4, 5, 6, 7, 8, 10, 11, 12, 13, 14, 15, 16, 17, 18, 20, 21, 22, 23, 24, 26, 27, 28, 30, 31, 32, 33, 34, 36, 37, 38, 40, 42, 44, 50, 54, 55, 58, 64, 65, 67, 68, 81]
-------------------------
TAT02
[0, 1, 2, 3, 4, 5, 6, 7, 8, 10, 11, 12, 13, 14, 15, 16, 17, 18, 20, 21, 22, 23, 24, 25, 26, 27, 28, 30, 31, 32, 33, 34, 35, 36, 37, 38, 40, 41, 42, 43, 44, 45, 46, 47, 48, 50, 51, 52, 54, 56, 58, 60, 65, 68, 73, 75, 77, 78, 80, 81, 82, 84, 87, 88, 105, 122, 135, 153, 158, 208, 351]
-------------------------
Job_Satus
[0, 1]
-------------------------
Labor_Charge_Amount
[0, 50000, 70000, 85000, 120000, 145000, 150000, 170000, 180000, 200000, 215000, 240000, 250000, 300000, 350000, 360000, 400000, 420000, 540000]
-------------------------
Service_type and Job_Satus can also be treated as categorical, since each takes only the values 0 and 1.
service_df['Service_type'] = service_df['Service_type'].astype('category')
service_df['Job_Satus'] = service_df['Job_Satus'].astype('category')
Scaling and Normalization
The Labor_Charge_Amount, Parts_Amount, Discount_Amount and Total_Invoice_Amount columns contain very large values. We can scale them down by expressing the amounts in thousands.
service_df['Labor_Charge_Amount'] = service_df['Labor_Charge_Amount'].div(1000)
service_df['Parts_Amount'] = service_df['Parts_Amount'].div(1000)
service_df['Discount_Amount'] = service_df['Discount_Amount'].div(1000)
service_df['Total_Invoice_Amount'] = service_df['Total_Invoice_Amount'].div(1000)
service_df.head()
Service_type | Cost_Type | Product_Group | City | Receipt_Date | Appoint_Date | Complete_Date | TAT01 | TAT02 | Job_Satus | Defect_Des | Symptom_Desc | Repair_Action_Desc | Labor_Charge_Desc | Engineer | Labor_Charge_Amount | Parts_Amount | Discount_Amount | Total_Invoice_Amount | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
No | |||||||||||||||||||
1 | 0 | Normal Warranty | AUDIO - HOME THEAT | TEHRAN | 2012-02-05 | 2012-02-05 | 2013-01-21 | 0 | 351 | 0 | WORN-OUT | OTHER PROBLEMS | SOFTWARE UPGRADE | Free Of Charge | Tech15 | 0.0 | 0.00000 | 0.000000 | 0.000000 |
2 | 1 | Normal Warranty | REFRIGERATOR-SBS | TEHRAN | 2012-07-07 | 2012-09-10 | 2013-01-31 | 65 | 208 | 1 | SHORT | OTHER PROBLEMS | OTHER REPAIR | Free Of Charge | Tech03 | 0.0 | 3437.48160 | 687.486320 | 2748.885280 |
3 | 1 | Normal Warranty | REFRIGERATOR-SBS | TEHRAN | 2012-07-07 | 2012-08-15 | 2013-01-31 | 38 | 208 | 1 | OPEN | OTHER PROBLEMS | OTHER REPAIR | Free Of Charge | Tech03 | 0.0 | 4228.75656 | 845.751312 | 3383.005248 |
4 | 0 | Out of Warranty | MONITOR - TFT LCD | TEHRAN | 2012-08-07 | 2012-08-07 | 2013-01-13 | 0 | 158 | 1 | WORN-OUT | OTHER PROBLEMS | ELECTRICAL PARTS REPLACEMENT | Free Of Charge | Tech15 | 0.0 | 0.00000 | 0.000000 | 0.000000 |
5 | 0 | Out of Warranty | AUDIO - MP3 Player | TEHRAN | 2012-08-07 | 2012-08-09 | 2013-01-07 | 2 | 153 | 1 | WORN-OUT | OTHER PROBLEMS | SOFTWARE UPGRADE | Medium | Tech07 | 120.0 | 765.48860 | 177.087820 | 708.381680 |
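Dividing by 1000 only changes the unit. If a model needed values on a common scale, min-max normalization would map a column into [0, 1]; a sketch on made-up amounts (not applied in this notebook):

```python
import pandas as pd

# Made-up amounts; min-max formula: (x - min) / (max - min).
amounts = pd.Series([0.0, 120.0, 420.0, 540.0])
scaled = (amounts - amounts.min()) / (amounts.max() - amounts.min())
print(scaled.tolist())  # smallest value maps to 0.0, largest to 1.0
```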
Dummy variables
How about fitting a regression model? Regression requires numerical inputs, but the DataFrame still has categorical columns. To address this, each categorical variable can be encoded as a set of indicator (dummy) variables: one binary column per category.
dummy_variable_cost_type = pd.get_dummies(service_df["Cost_Type"])
dummy_variable_cost_type
Extended Warranty | Normal Warranty | Out of Warranty | |
---|---|---|---|
No | |||
1 | 0 | 1 | 0 |
2 | 0 | 1 | 0 |
3 | 0 | 1 | 0 |
4 | 0 | 0 | 1 |
5 | 0 | 0 | 1 |
... | ... | ... | ... |
3132 | 0 | 1 | 0 |
3133 | 0 | 1 | 0 |
3134 | 1 | 0 | 0 |
3135 | 0 | 0 | 1 |
3136 | 0 | 0 | 1 |
1971 rows × 3 columns
As you can see, exactly one cell in each row has the value one; this is a one-hot encoding.
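Because the three dummy columns always sum to one, they are perfectly collinear, which can be a problem in a regression design matrix. get_dummies supports drop_first=True to drop one level as the baseline; a side note sketched on toy data (not used in this notebook):

```python
import pandas as pd

# Toy series with the same three Cost_Type levels.
s = pd.Series(['Normal Warranty', 'Out of Warranty', 'Extended Warranty'])
full = pd.get_dummies(s)                      # 3 columns, one per level
reduced = pd.get_dummies(s, drop_first=True)  # 2 columns; dropped level is the baseline
print(full.shape[1], reduced.shape[1])        # 3 2
```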
Merging Datasets
So we have the service_df DataFrame and the dummy_variable_cost_type DataFrame, and we want to merge them into one. In this case we append the columns of dummy_variable_cost_type to service_df.
The pandas user guide on merging, joining and concatenating gives a comprehensive description of the available options.
service_df = pd.concat([service_df, dummy_variable_cost_type], axis=1)
Dropping the original Cost_Type column since it’s now redundant.
service_df.drop('Cost_Type', axis = 1, inplace=True)
service_df
Service_type | Product_Group | City | Receipt_Date | Appoint_Date | Complete_Date | TAT01 | TAT02 | Job_Satus | Defect_Des | Symptom_Desc | Repair_Action_Desc | Labor_Charge_Desc | Engineer | Labor_Charge_Amount | Parts_Amount | Discount_Amount | Total_Invoice_Amount | Extended Warranty | Normal Warranty | Out of Warranty | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
No | |||||||||||||||||||||
1 | 0 | AUDIO - HOME THEAT | TEHRAN | 2012-02-05 | 2012-02-05 | 2013-01-21 | 0 | 351 | 0 | WORN-OUT | OTHER PROBLEMS | SOFTWARE UPGRADE | Free Of Charge | Tech15 | 0.0 | 0.00000 | 0.000000 | 0.000000 | 0 | 1 | 0 |
2 | 1 | REFRIGERATOR-SBS | TEHRAN | 2012-07-07 | 2012-09-10 | 2013-01-31 | 65 | 208 | 1 | SHORT | OTHER PROBLEMS | OTHER REPAIR | Free Of Charge | Tech03 | 0.0 | 3437.48160 | 687.486320 | 2748.885280 | 0 | 1 | 0 |
3 | 1 | REFRIGERATOR-SBS | TEHRAN | 2012-07-07 | 2012-08-15 | 2013-01-31 | 38 | 208 | 1 | OPEN | OTHER PROBLEMS | OTHER REPAIR | Free Of Charge | Tech03 | 0.0 | 4228.75656 | 845.751312 | 3383.005248 | 0 | 1 | 0 |
4 | 0 | MONITOR - TFT LCD | TEHRAN | 2012-08-07 | 2012-08-07 | 2013-01-13 | 0 | 158 | 1 | WORN-OUT | OTHER PROBLEMS | ELECTRICAL PARTS REPLACEMENT | Free Of Charge | Tech15 | 0.0 | 0.00000 | 0.000000 | 0.000000 | 0 | 0 | 1 |
5 | 0 | AUDIO - MP3 Player | TEHRAN | 2012-08-07 | 2012-08-09 | 2013-01-07 | 2 | 153 | 1 | WORN-OUT | OTHER PROBLEMS | SOFTWARE UPGRADE | Medium | Tech07 | 120.0 | 765.48860 | 177.087820 | 708.381680 | 0 | 0 | 1 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
3132 | 0 | LCD-40 | REZVANSHAHR | 2013-01-31 | 2013-01-31 | 2013-01-31 | 0 | 0 | 1 | WORN-OUT | OTHER PROBLEMS | ELECTRICAL PARTS REPLACEMENT | Major | Tech13 | 420.0 | 0.00000 | 84.000000 | 336.000000 | 0 | 1 | 0 |
3133 | 0 | LED46 | ESFAHAN | 2013-01-31 | 2013-01-31 | 2013-01-31 | 0 | 0 | 1 | WORN-OUT | OTHER PROBLEMS | ELECTRICAL PARTS REPLACEMENT | Major | Tech13 | 420.0 | 0.00000 | 84.000000 | 336.000000 | 0 | 1 | 0 |
3134 | 0 | HARD DISK DRIVE | TEHRAN | 2013-01-31 | 2013-01-31 | 2013-01-31 | 0 | 0 | 1 | WORN-OUT | OTHER PROBLEMS | REPAIR REFUSAL | Minimum | Tech07 | 50.0 | 0.00000 | 10.000000 | 40.000000 | 1 | 0 | 0 |
3135 | 0 | HARD DISK DRIVE | TEHRAN | 2013-01-31 | 2013-01-31 | 2013-01-31 | 0 | 0 | 1 | WORN-OUT | OTHER PROBLEMS | REPAIR REFUSAL | Free Of Charge | Tech07 | 0.0 | 0.00000 | 0.000000 | 0.000000 | 0 | 0 | 1 |
3136 | 0 | DVD PLAYER/ RECORD | TEHRAN | 2013-01-31 | 2013-01-31 | 2013-01-31 | 0 | 0 | 1 | WORN-OUT | OTHER PROBLEMS | MECHANICAL PARTS REPLACEMENT | Minor | Tech18 | 150.0 | 1488.48200 | 328.686400 | 1318.785600 | 0 | 0 | 1 |
1971 rows × 21 columns
Now let’s store the prepared DataFrame. Say you are only interested in “HARD DISK DRIVE” products; to_csv() and to_pickle() will persist the result.
Harddrive_df = service_df.loc[service_df['Product_Group'] == 'HARD DISK DRIVE']
Harddrive_df.to_csv('Harddrive.csv')
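Unlike CSV, a pickle round-trip preserves pandas dtypes such as category and datetime. A sketch with a toy frame and a temporary file (the file name is illustrative):

```python
import os
import tempfile

import pandas as pd

# Toy frame with a category column; 'toy.pkl' is a made-up file name.
df = pd.DataFrame({'grp': pd.Series(['a', 'b', 'a'], dtype='category')})
path = os.path.join(tempfile.mkdtemp(), 'toy.pkl')
df.to_pickle(path)
restored = pd.read_pickle(path)
print(restored['grp'].dtype)  # the category dtype survived the round trip
```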
Iran’s Earthquakes through history
Opening Excel Files
As another example, let’s look at this new dataset.
earthquake_dfs = pd.read_excel(join('Datasets', 'earthquake', 'data.xls'), sheet_name=None)
earthquake_dfs.keys()
dict_keys(['Historical', '1900-1963', '1964-2000'])
The data is divided across three sheets. Let’s look at two of them.
earthquake_df1 = earthquake_dfs['1900-1963']
earthquake_df1.head()
No. | Date | Unnamed: 2 | Unnamed: 3 | Time | Location | Unnamed: 6 | Unnamed: 7 | Magnitude | Unnamed: 9 | Ref. | |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | NaN | Y | M | D | NaN | Lat. | Long. | Dept | mb | Ms | NaN |
1 | 1.0 | 1900 | 6 | NaN | NaN | 38.5 | 43.3 | NaN | NaN | 5 | MEA |
2 | 2.0 | 1900 | 7 | 12 | NaN | 40.28 | 43.1 | NaN | NaN | 5.9 | ULM |
3 | 3.0 | 1901 | 4 | 3 | 57.0 | 40 | 44.3 | 15 | NaN | 3.9 | MEA |
4 | 4.0 | 1902 | NaN | NaN | NaN | 39 | 43.3 | NaN | NaN | 5 | MEA |
earthquake_df2 = earthquake_dfs['1964-2000']
earthquake_df2.head()
No. | Date | Unnamed: 2 | Unnamed: 3 | Time | Location | Unnamed: 6 | Unnamed: 7 | Magnitude | Unnamed: 9 | Unnamed: 10 | Ref. | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | NaN | Y | M | D | NaN | Lat. | Long. | Dept | mb | Ms | Mw | NaN |
1 | 1.0 | 1964 | 1 | 12 | 124550.0 | 31.52 | 49.23 | 56 | 5.1 | NaN | NaN | ISC |
2 | 2.0 | 1964 | 1 | 19 | 91353.0 | 26.79 | 54 | 38 | 5.6 | NaN | NaN | ISC |
3 | 3.0 | 1964 | 1 | 30 | 122310.0 | 32.7 | 47.8 | 33 | NaN | NaN | NaN | PDE |
4 | 4.0 | 1964 | 2 | 5 | 102423.0 | 40.37 | 46.1 | 9 | 4.5 | NaN | NaN | ISC |
The second row is part of the header too. We could have passed header=[0, 1] to read_excel to handle this, but let’s fix it manually for now.
earthquake_df1.columns = ['No', 'Year', 'Month', 'Day', 'Time', 'Lat', 'Long', 'Dept', 'mb', 'Ms', 'ref']
earthquake_df1 = earthquake_df1.iloc[1:].reset_index(drop=True)
earthquake_df2.columns = ['No', 'Year', 'Month', 'Day', 'Time', 'Lat', 'Long', 'Dept', 'mb', 'Ms', 'Mw', 'ref']
earthquake_df2 = earthquake_df2.iloc[1:].reset_index(drop=True)
earthquake_df1.head()
No | Year | Month | Day | Time | Lat | Long | Dept | mb | Ms | ref | |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1.0 | 1900 | 6 | NaN | NaN | 38.5 | 43.3 | NaN | NaN | 5 | MEA |
1 | 2.0 | 1900 | 7 | 12 | NaN | 40.28 | 43.1 | NaN | NaN | 5.9 | ULM |
2 | 3.0 | 1901 | 4 | 3 | 57.0 | 40 | 44.3 | 15 | NaN | 3.9 | MEA |
3 | 4.0 | 1902 | NaN | NaN | NaN | 39 | 43.3 | NaN | NaN | 5 | MEA |
4 | 5.0 | 1902 | 2 | 13 | 93906.0 | 40.72 | 48.71 | 33 | 6 | NaN | B77 |
Concatenation
We concatenate these two DataFrames into one. Notice how the resulting columns are the union of the two DataFrames’ columns; rows lacking a column (here, Mw for the 1900–1963 data) get NaN.
earthquake_df_all = pd.concat([earthquake_df1, earthquake_df2])
earthquake_df_all.head()
No | Year | Month | Day | Time | Lat | Long | Dept | mb | Ms | ref | Mw | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1.0 | 1900 | 6 | NaN | NaN | 38.5 | 43.3 | NaN | NaN | 5 | MEA | NaN |
1 | 2.0 | 1900 | 7 | 12 | NaN | 40.28 | 43.1 | NaN | NaN | 5.9 | ULM | NaN |
2 | 3.0 | 1901 | 4 | 3 | 57.0 | 40 | 44.3 | 15 | NaN | 3.9 | MEA | NaN |
3 | 4.0 | 1902 | NaN | NaN | NaN | 39 | 43.3 | NaN | NaN | 5 | MEA | NaN |
4 | 5.0 | 1902 | 2 | 13 | 93906.0 | 40.72 | 48.71 | 33 | 6 | NaN | B77 | NaN |
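The union behavior can be seen on a toy pair of frames (made up for illustration): a column present in only one frame is filled with NaN in the other frame's rows:

```python
import pandas as pd

# Toy frames: only the second has an Mw column.
left = pd.DataFrame({'No': [1], 'Ms': [5.0]})
right = pd.DataFrame({'No': [2], 'Ms': [5.9], 'Mw': [6.1]})
both = pd.concat([left, right], ignore_index=True)
print(sorted(both.columns))        # ['Ms', 'Mw', 'No']: the union of the columns
print(both['Mw'].isna().tolist())  # [True, False]: the left frame's row gets NaN
```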
Imputation
Now let’s replace the NaNs in the mb column with the mean of that column (mean imputation).
earthquake_df_all['mb'] = earthquake_df_all['mb'].fillna(earthquake_df_all['mb'].mean())
earthquake_df_all.head()
No | Year | Month | Day | Time | Lat | Long | Dept | mb | Ms | ref | Mw | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1.0 | 1900 | 6 | NaN | NaN | 38.5 | 43.3 | NaN | 4.459897 | 5 | MEA | NaN |
1 | 2.0 | 1900 | 7 | 12 | NaN | 40.28 | 43.1 | NaN | 4.459897 | 5.9 | ULM | NaN |
2 | 3.0 | 1901 | 4 | 3 | 57.0 | 40 | 44.3 | 15 | 4.459897 | 3.9 | MEA | NaN |
3 | 4.0 | 1902 | NaN | NaN | NaN | 39 | 43.3 | NaN | 4.459897 | 5 | MEA | NaN |
4 | 5.0 | 1902 | 2 | 13 | 93906.0 | 40.72 | 48.71 | 33 | 6.000000 | NaN | B77 | NaN |
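Mean imputation on a toy series (made-up magnitudes): the filled series keeps the original mean, but its variance shrinks, which is worth remembering before any statistical analysis:

```python
import numpy as np
import pandas as pd

# Made-up magnitudes with two gaps; the column mean here is 5.0.
mb = pd.Series([4.0, np.nan, 6.0, np.nan])
filled = mb.fillna(mb.mean())
print(filled.tolist())  # [4.0, 5.0, 6.0, 5.0]
```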
Further work can be done on this dataframe.
Json files (Bonus)
In this part, we will use the requests library to gather search results from a web API that returns JSON.
import requests
import json
query = {'q': 'playstation 5', 'sort': 'popularity', 'size' : 50}
response = requests.get("https://api.torob.com/v4/base-product/search/", params=query)
response_json = response.json()
response_json.keys()
dict_keys(['results', 'count', 'max_price', 'min_price', 'next', 'previous', 'spellcheck', 'available_filters', 'filter_by_category_title', 'categories', 'parent_categories'])
response_df = pd.json_normalize(response_json['results'])
response_df.head()
image_url | discount_info | random_key | name1 | name2 | more_info_url | web_client_absolute_url | price | price_text | price_text_mode | shop_text | |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | https://storage.torob.com/backend-api/base/ima... | [] | 2e71cfe4-7b77-4e80-837b-de93b5fb95d7 | کنسول بازی سونی PlayStation 5 Digital پلی استی... | Playstation 5 Digital Editoin | https://api.torob.com/v4/base-product/details-... | /p/2e71cfe4-7b77-4e80-837b-de93b5fb95d7/%DA%A9... | 20120000 | از ۲۰٫۱۲۰٫۰۰۰ تومان | active | در ۱۰۱ فروشگاه |
1 | https://storage.torob.com/backend-api/base/ima... | [] | 9825c33c-07e6-4ac7-8bd6-12015ebb3e3c | کنسول بازی سونی PlayStation 5 Standard پلی اس... | Playstation 5 Standard Editoin | https://api.torob.com/v4/base-product/details-... | /p/9825c33c-07e6-4ac7-8bd6-12015ebb3e3c/%DA%A9... | 21700000 | از ۲۱٫۷۰۰٫۰۰۰ تومان | active | در ۱۳۶ فروشگاه |
2 | https://pcjanebi.com/wp-content/uploads/2021/0... | [] | 9687bb29-671b-42e4-986b-5f56ca3ade59 | باندل کنسول بازی سونی PlayStation 5 Standard پ... | Standard PlayStation 5 bundle+DualSense+Pulse+... | https://api.torob.com/v4/base-product/details-... | /p/9687bb29-671b-42e4-986b-5f56ca3ade59/%D8%A8... | 29990000 | از ۲۹٫۹۹۰٫۰۰۰ تومان | active | در ۵ فروشگاه |
3 | https://storage.torob.com/backend-api/base/ima... | [] | 3d19000e-b1a3-48de-8ef2-70c23c89b534 | پلی استیشن 5 دیجیتال به همراه بازی باندل 2 دسته | https://api.torob.com/v4/base-product/details-... | /p/3d19000e-b1a3-48de-8ef2-70c23c89b534/%D9%BE... | 23000000 | ۲۳٫۰۰۰٫۰۰۰ تومان | active | در مرکزخریددات کام | |
4 | https://storage.torob.com/backend-api/base/ima... | [] | 88d90bf1-b3c9-4a6f-8abc-0605f0e8a008 | کنسول بازی سونی پلی استیشن 5 ظرفیت 825 گیگابایت | https://api.torob.com/v4/base-product/details-... | /p/88d90bf1-b3c9-4a6f-8abc-0605f0e8a008/%DA%A9... | 21299000 | ۲۱٫۲۹۹٫۰۰۰ تومان | active | در ژیوان |
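json_normalize flattens nested JSON into columns, joining nested keys with dots. A self-contained sketch on made-up records (independent of the live API above):

```python
import pandas as pd

# Made-up records mimicking an API response with a nested 'price' object.
records = [
    {'name': 'console A', 'price': {'amount': 100, 'currency': 'IRR'}},
    {'name': 'console B', 'price': {'amount': 200, 'currency': 'IRR'}},
]
flat = pd.json_normalize(records)
print(sorted(flat.columns))  # nested keys become 'price.amount', 'price.currency'
```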