{ "cells": [ { "cell_type": "markdown", "source": [ "# Data Wrangling\n", "\n", "The process of turning raw data into more appropriate format. in the end, the result should have acceptable [Data Quality](https://en.wikipedia.org/wiki/Data_quality).\n", "\n", "### Contents included in this notebook:\n", " * Data Exploration\n", " * Data Cleaning \n", " * Merging Datasets\n", " \n", "\n", "Also this notebook will introduce you to [Pandas](https://pandas.pydata.org/) which faciliate the manipulation task.\n", "\n", "### Datasets:\n", " * [Samsung after-sales service in Iran](https://www.kaggle.com/sinatavakolibanizi/samsung-aftersales-service-in-iran)\n", " * [Iran's Earthquakes through history](https://www.kaggle.com/mehrdat/irans-earthquakes)\n", " " ], "metadata": {} }, { "cell_type": "code", "execution_count": 187, "source": [ "from os.path import join\n", "import numpy as np\n", "import pandas as pd\n", "%pdb 1" ], "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "Automatic pdb calling has been turned ON\n" ] } ], "metadata": {} }, { "cell_type": "markdown", "source": [ "## Samsung after-sales service in Iran" ], "metadata": {} }, { "cell_type": "markdown", "source": [ "### Data Exploration\n", "\n", "The purpose of this part is to gain a better understanding of the data. what's data is about? what's the structure? what's the values? ... \n", "\n", "Let's read Iran's Earthquakes dataset into a [DataFrame](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html)." ], "metadata": {} }, { "cell_type": "code", "execution_count": 188, "source": [ "service_df = pd.read_csv(join('Datasets', 'service', 'samdata.csv'))\n", "service_df" ], "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
NoService_typeCost_TypeProduct_GroupSerial_NoCityProduct_DateReceipt_DateAppoint_DateComplete_DateTAT01TAT02Job_SatusDefect_DesSymptom_DescRepair_Action_DescLabor_Charge_DescEngineerLabor_Charge_AmountParts_AmountDiscount_AmountTotal_Invoice_Amount
010Normal WarrantyAUDIO - HOME THEAT2F/LAHFZ7E-IP-XTehranNaN2/5/20122/5/20121/21/201303510WORN-OUTOTHER PROBLEMSSOFTWARE UPGRADEFree Of ChargeTech1500.000.0000.000
121Normal WarrantyREFRIGERATOR-SBSAIP244AACA/4E7TehranNaN7/7/20129/10/20121/31/2013652081SHORTOTHER PROBLEMSOTHER REPAIRFree Of ChargeTech0303437481.60687486.3202748885.280
231Normal WarrantyREFRIGERATOR-SBSJK-C4BBCAEKL72TehranNaN7/7/20128/15/20121/31/2013382081OPENOTHER PROBLEMSOTHER REPAIRFree Of ChargeTech0304228756.56845751.3123383005.248
340Out of WarrantyMONITOR - TFT LCDIP2JO4/-A-47TehranNaN8/7/20128/7/20121/13/201301581WORN-OUTOTHER PROBLEMSELECTRICAL PARTS REPLACEMENTFree Of ChargeTech1500.000.0000.000
450Out of WarrantyAUDIO - MP3 PlayerAU/BCQE2A/PTehranNaN8/7/20128/9/20121/7/201321531WORN-OUTOTHER PROBLEMSSOFTWARE UPGRADEMediumTech07120000765488.60177087.820708381.680
.....................................................................
313131320Normal WarrantyLCD-40IP2JRIP/QE2KLKLREZVANSHAHRNaN1/31/20131/31/20131/31/2013001WORN-OUTOTHER PROBLEMSELECTRICAL PARTS REPLACEMENTMajorTech134200000.0084000.000336000.000
313231330Normal WarrantyLED46IP2JRC/&/2/IPESFAHANNaN1/31/20131/31/20131/31/2013001WORN-OUTOTHER PROBLEMSELECTRICAL PARTS REPLACEMENTMajorTech134200000.0084000.000336000.000
313331340Extended WarrantyHARD DISK DRIVES2/BJ2/B2KL4Z2Tehran40552.01/31/20131/31/20131/31/2013001WORN-OUTOTHER PROBLEMSREPAIR REFUSALMinimumTech07500000.0010000.00040000.000
313431350Out of WarrantyHARD DISK DRIVES2/BJ2/SC2/QE2Tehran38825.01/31/20131/31/20131/31/2013001WORN-OUTOTHER PROBLEMSREPAIR REFUSALFree Of ChargeTech0700.000.0000.000
313531360Out of WarrantyDVD PLAYER/ RECORDA&QEKLCDQAEA7AZTehran38448.01/31/20131/31/20131/31/2013001WORN-OUTOTHER PROBLEMSMECHANICAL PARTS REPLACEMENTMinorTech181500001488482.00328686.4001318785.600
\n", "

3136 rows × 22 columns

\n", "
" ], "text/plain": [ " No Service_type Cost_Type Product_Group \\\n", "0 1 0 Normal Warranty AUDIO - HOME THEAT \n", "1 2 1 Normal Warranty REFRIGERATOR-SBS \n", "2 3 1 Normal Warranty REFRIGERATOR-SBS \n", "3 4 0 Out of Warranty MONITOR - TFT LCD \n", "4 5 0 Out of Warranty AUDIO - MP3 Player \n", "... ... ... ... ... \n", "3131 3132 0 Normal Warranty LCD-40 \n", "3132 3133 0 Normal Warranty LED46 \n", "3133 3134 0 Extended Warranty HARD DISK DRIVE \n", "3134 3135 0 Out of Warranty HARD DISK DRIVE \n", "3135 3136 0 Out of Warranty DVD PLAYER/ RECORD \n", "\n", " Serial_No City Product_Date Receipt_Date Appoint_Date \\\n", "0 2F/LAHFZ7E-IP-X Tehran NaN 2/5/2012 2/5/2012 \n", "1 AIP244AACA/4E7 Tehran NaN 7/7/2012 9/10/2012 \n", "2 JK-C4BBCAEKL72 Tehran NaN 7/7/2012 8/15/2012 \n", "3 IP2JO4/-A-47 Tehran NaN 8/7/2012 8/7/2012 \n", "4 AU/BCQE2A/P Tehran NaN 8/7/2012 8/9/2012 \n", "... ... ... ... ... ... \n", "3131 IP2JRIP/QE2KLKL REZVANSHAHR NaN 1/31/2013 1/31/2013 \n", "3132 IP2JRC/&/2/IP ESFAHAN NaN 1/31/2013 1/31/2013 \n", "3133 S2/BJ2/B2KL4Z2 Tehran 40552.0 1/31/2013 1/31/2013 \n", "3134 S2/BJ2/SC2/QE2 Tehran 38825.0 1/31/2013 1/31/2013 \n", "3135 A&QEKLCDQAEA7AZ Tehran 38448.0 1/31/2013 1/31/2013 \n", "\n", " Complete_Date TAT01 TAT02 Job_Satus Defect_Des Symptom_Desc \\\n", "0 1/21/2013 0 351 0 WORN-OUT OTHER PROBLEMS \n", "1 1/31/2013 65 208 1 SHORT OTHER PROBLEMS \n", "2 1/31/2013 38 208 1 OPEN OTHER PROBLEMS \n", "3 1/13/2013 0 158 1 WORN-OUT OTHER PROBLEMS \n", "4 1/7/2013 2 153 1 WORN-OUT OTHER PROBLEMS \n", "... ... ... ... ... ... ... \n", "3131 1/31/2013 0 0 1 WORN-OUT OTHER PROBLEMS \n", "3132 1/31/2013 0 0 1 WORN-OUT OTHER PROBLEMS \n", "3133 1/31/2013 0 0 1 WORN-OUT OTHER PROBLEMS \n", "3134 1/31/2013 0 0 1 WORN-OUT OTHER PROBLEMS \n", "3135 1/31/2013 0 0 1 WORN-OUT OTHER PROBLEMS \n", "\n", " Repair_Action_Desc Labor_Charge_Desc Engineer \\\n", "0 SOFTWARE UPGRADE Free Of Charge Tech15 \n", "1 OTHER REPAIR Free Of Charge Tech03 \n", "2 OTHER REPAIR Free Of Charge Tech03 \n", "3 ELECTRICAL PARTS REPLACEMENT Free Of Charge Tech15 \n", "4 SOFTWARE UPGRADE Medium Tech07 \n", "... ... ... ... \n", "3131 ELECTRICAL PARTS REPLACEMENT Major Tech13 \n", "3132 ELECTRICAL PARTS REPLACEMENT Major Tech13 \n", "3133 REPAIR REFUSAL Minimum Tech07 \n", "3134 REPAIR REFUSAL Free Of Charge Tech07 \n", "3135 MECHANICAL PARTS REPLACEMENT Minor Tech18 \n", "\n", " Labor_Charge_Amount Parts_Amount Discount_Amount Total_Invoice_Amount \n", "0 0 0.00 0.000 0.000 \n", "1 0 3437481.60 687486.320 2748885.280 \n", "2 0 4228756.56 845751.312 3383005.248 \n", "3 0 0.00 0.000 0.000 \n", "4 120000 765488.60 177087.820 708381.680 \n", "... ... ... ... ... \n", "3131 420000 0.00 84000.000 336000.000 \n", "3132 420000 0.00 84000.000 336000.000 \n", "3133 50000 0.00 10000.000 40000.000 \n", "3134 0 0.00 0.000 0.000 \n", "3135 150000 1488482.00 328686.400 1318785.600 \n", "\n", "[3136 rows x 22 columns]" ] }, "metadata": {}, "execution_count": 188 } ], "metadata": {} }, { "cell_type": "markdown", "source": [ "```shape``` attribute indicate how large the resulting DataFrame is." ], "metadata": {} }, { "cell_type": "code", "execution_count": 189, "source": [ "service_df.shape" ], "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "(3136, 22)" ] }, "metadata": {}, "execution_count": 189 } ], "metadata": {} }, { "cell_type": "code", "execution_count": 190, "source": [ "print(\"pandas only displays \" + str(pd.options.display.max_rows) + \" rows\" ) \n", "print(\"and \" + str(pd.options.display.max_columns) + \" columns at max by default\" ) " ], "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "pandas only displays 60 rows\n", "and None columns at max by default\n" ] } ], "metadata": {} }, { "cell_type": "markdown", "source": [ "The size of the dataset is not that big and beside that, it only have 2 more columns than pandas maximum displayed columns. so let's tell pandas to show all columns for convenience." ], "metadata": {} }, { "cell_type": "code", "execution_count": 191, "source": [ " pd.set_option('display.max_columns', None)" ], "outputs": [], "metadata": {} }, { "cell_type": "markdown", "source": [ "use ```loc()``` and ```iloc()``` for selecting rows in specific range." ], "metadata": {} }, { "cell_type": "code", "execution_count": 192, "source": [ "service_df.iloc[26:33:3]" ], "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
NoService_typeCost_TypeProduct_GroupSerial_NoCityProduct_DateReceipt_DateAppoint_DateComplete_DateTAT01TAT02Job_SatusDefect_DesSymptom_DescRepair_Action_DescLabor_Charge_DescEngineerLabor_Charge_AmountParts_AmountDiscount_AmountTotal_Invoice_Amount
26270Normal WarrantyDIGITAL CAMERAAAKLMCN/CIPE/GTTehranNaN11/22/201211/22/20121/26/20130651DEFORMEDOUTCASE CRACKELECTRICAL PARTS REPLACEMENTMajorTech112500000.050000.0200000.0
29300Normal WarrantyTFT18H2MQ2/2/472TehranNaN12/1/201212/1/20121/6/20130361WORN-OUTOTHER PROBLEMSELECTRICAL PARTS REPLACEMENTFree Of ChargeTech1600.00.00.0
32330Normal WarrantyTFT18H2MQ2/AE27TehranNaN12/2/201212/2/20121/6/20130351WORN-OUTNO POWERELECTRICAL PARTS REPLACEMENTFree Of ChargeTech1600.00.00.0
\n", "
" ], "text/plain": [ " No Service_type Cost_Type Product_Group Serial_No \\\n", "26 27 0 Normal Warranty DIGITAL CAMERA AAKLMCN/CIPE/GT \n", "29 30 0 Normal Warranty TFT18 H2MQ2/2/472 \n", "32 33 0 Normal Warranty TFT18 H2MQ2/AE27 \n", "\n", " City Product_Date Receipt_Date Appoint_Date Complete_Date TAT01 \\\n", "26 Tehran NaN 11/22/2012 11/22/2012 1/26/2013 0 \n", "29 Tehran NaN 12/1/2012 12/1/2012 1/6/2013 0 \n", "32 Tehran NaN 12/2/2012 12/2/2012 1/6/2013 0 \n", "\n", " TAT02 Job_Satus Defect_Des Symptom_Desc Repair_Action_Desc \\\n", "26 65 1 DEFORMED OUTCASE CRACK ELECTRICAL PARTS REPLACEMENT \n", "29 36 1 WORN-OUT OTHER PROBLEMS ELECTRICAL PARTS REPLACEMENT \n", "32 35 1 WORN-OUT NO POWER ELECTRICAL PARTS REPLACEMENT \n", "\n", " Labor_Charge_Desc Engineer Labor_Charge_Amount Parts_Amount \\\n", "26 Major Tech11 250000 0.0 \n", "29 Free Of Charge Tech16 0 0.0 \n", "32 Free Of Charge Tech16 0 0.0 \n", "\n", " Discount_Amount Total_Invoice_Amount \n", "26 50000.0 200000.0 \n", "29 0.0 0.0 \n", "32 0.0 0.0 " ] }, "metadata": {}, "execution_count": 192 } ], "metadata": {} }, { "cell_type": "markdown", "source": [ "```sample()``` returns random rows to look at." ], "metadata": {} }, { "cell_type": "code", "execution_count": 193, "source": [ "service_df.sample(13)" ], "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
NoService_typeCost_TypeProduct_GroupSerial_NoCityProduct_DateReceipt_DateAppoint_DateComplete_DateTAT01TAT02Job_SatusDefect_DesSymptom_DescRepair_Action_DescLabor_Charge_DescEngineerLabor_Charge_AmountParts_AmountDiscount_AmountTotal_Invoice_Amount
1301310Normal WarrantyLED40IP2JQC/KLIP2IPIPTehranNaN12/18/201212/18/20121/30/20130431WORN-OUTNO PICTURE/NO RASTERSET EXCHANGEFree Of ChargeTech2300.000.0000.000000e+00
180718080Out of WarrantyMONITOR - TFT LCDH2MPA/2KL74/Tehran38083.01/13/20131/14/20131/23/20131100WORN-OUTOTHER PROBLEMSELECTRICAL PARTS REPLACEMENTFree Of ChargeTech1600.000.0000.000000e+00
297129721Normal WarrantyLED40IP2JQA/&427ATehranNaN1/27/20131/27/20131/28/2013011WORN-OUTOTHER PROBLEMSELECTRICAL PARTS REPLACEMENTMajorTech1542000014086824.282803384.8561.161354e+07
304330440Out of WarrantyDIGITAL CAMERAAAKLMCN/CAE/LAJTehran40808.01/28/20131/28/20131/30/2013021DEFORMEDOTHER PROBLEMSELECTRICAL PARTS REPLACEMENTMajorTech11250000584648.70166828.7406.677189e+05
233123320Normal WarrantyLCD-40IP2JRA/72A/KLTehranNaN1/20/20131/20/20131/21/2013011WORN-OUTOTHER PROBLEMSSET EXCHANGEFree Of ChargeTech1400.000.0000.000000e+00
1011020Out of WarrantyMONITOR - TFT LCDH2MM2/A/4IPKLTehran38081.012/16/201212/17/20121/5/20131200WORN-OUTOTHER PROBLEMSELECTRICAL PARTS REPLACEMENTFree Of ChargeTech1600.000.0000.000000e+00
309630970Normal WarrantyLED40IP2JQA/&KL/4/TehranNaN1/28/20131/28/20131/31/2013031WORN-OUTNO PICTURE/NO RASTERSET EXCHANGEFree Of ChargeTech2300.000.0000.000000e+00
218121820Normal WarrantyDVD PLAYERZT2QARDBC/2&IP2ZTehranNaN1/19/20131/19/20131/20/2013011WORN-OUTREAD ERRORELECTRICAL PARTS REPLACEMENTMajorTech181700001840063.68402012.7361.608051e+06
214121420Normal WarrantyLED40IP2JQA/&QE4/TehranNaN1/19/20131/19/20131/21/2013021WORN-OUTOTHER PROBLEMSELECTRICAL PARTS REPLACEMENTMinorTech142150000.0043000.0001.720000e+05
9359360Normal WarrantyLCD-40IP2JRA/7/7&IPTehranNaN1/5/20131/5/20131/6/2013011WORN-OUTOTHER PROBLEMSSET EXCHANGEFree Of ChargeTech0800.000.0000.000000e+00
169816990Normal WarrantyLCD-40IP2JQA/&2772TehranNaN1/10/20131/19/20131/20/20138101WORN-OUTOTHER PROBLEMSSET EXCHANGEFree Of ChargeTech1300.000.0000.000000e+00
3433441Normal WarrantyLED40IP2JR2/-2/IP4TehranNaN12/27/201212/31/20121/1/2013451WORN-OUTOTHER PROBLEMSELECTRICAL PARTS REPLACEMENTMajorTech1042000024887810.325063582.0642.025433e+07
187518761Normal WarrantyLCD- 46IP2JQ7/42/42TehranNaN1/14/20131/17/20131/20/2013361WORN-OUTOTHER PROBLEMSELECTRICAL PARTS REPLACEMENTFree Of ChargeTech1500.000.0000.000000e+00
\n", "
" ], "text/plain": [ " No Service_type Cost_Type Product_Group \\\n", "130 131 0 Normal Warranty LED40 \n", "1807 1808 0 Out of Warranty MONITOR - TFT LCD \n", "2971 2972 1 Normal Warranty LED40 \n", "3043 3044 0 Out of Warranty DIGITAL CAMERA \n", "2331 2332 0 Normal Warranty LCD-40 \n", "101 102 0 Out of Warranty MONITOR - TFT LCD \n", "3096 3097 0 Normal Warranty LED40 \n", "2181 2182 0 Normal Warranty DVD PLAYER \n", "2141 2142 0 Normal Warranty LED40 \n", "935 936 0 Normal Warranty LCD-40 \n", "1698 1699 0 Normal Warranty LCD-40 \n", "343 344 1 Normal Warranty LED40 \n", "1875 1876 1 Normal Warranty LCD- 46 \n", "\n", " Serial_No City Product_Date Receipt_Date Appoint_Date \\\n", "130 IP2JQC/KLIP2IPIP Tehran NaN 12/18/2012 12/18/2012 \n", "1807 H2MPA/2KL74/ Tehran 38083.0 1/13/2013 1/14/2013 \n", "2971 IP2JQA/&427A Tehran NaN 1/27/2013 1/27/2013 \n", "3043 AAKLMCN/CAE/LAJ Tehran 40808.0 1/28/2013 1/28/2013 \n", "2331 IP2JRA/72A/KL Tehran NaN 1/20/2013 1/20/2013 \n", "101 H2MM2/A/4IPKL Tehran 38081.0 12/16/2012 12/17/2012 \n", "3096 IP2JQA/&KL/4/ Tehran NaN 1/28/2013 1/28/2013 \n", "2181 ZT2QARDBC/2&IP2Z Tehran NaN 1/19/2013 1/19/2013 \n", "2141 IP2JQA/&QE4/ Tehran NaN 1/19/2013 1/19/2013 \n", "935 IP2JRA/7/7&IP Tehran NaN 1/5/2013 1/5/2013 \n", "1698 IP2JQA/&2772 Tehran NaN 1/10/2013 1/19/2013 \n", "343 IP2JR2/-2/IP4 Tehran NaN 12/27/2012 12/31/2012 \n", "1875 IP2JQ7/42/42 Tehran NaN 1/14/2013 1/17/2013 \n", "\n", " Complete_Date TAT01 TAT02 Job_Satus Defect_Des Symptom_Desc \\\n", "130 1/30/2013 0 43 1 WORN-OUT NO PICTURE/NO RASTER \n", "1807 1/23/2013 1 10 0 WORN-OUT OTHER PROBLEMS \n", "2971 1/28/2013 0 1 1 WORN-OUT OTHER PROBLEMS \n", "3043 1/30/2013 0 2 1 DEFORMED OTHER PROBLEMS \n", "2331 1/21/2013 0 1 1 WORN-OUT OTHER PROBLEMS \n", "101 1/5/2013 1 20 0 WORN-OUT OTHER PROBLEMS \n", "3096 1/31/2013 0 3 1 WORN-OUT NO PICTURE/NO RASTER \n", "2181 1/20/2013 0 1 1 WORN-OUT READ ERROR \n", "2141 1/21/2013 0 2 1 WORN-OUT OTHER PROBLEMS \n", "935 1/6/2013 0 1 1 WORN-OUT OTHER PROBLEMS \n", "1698 1/20/2013 8 10 1 WORN-OUT OTHER PROBLEMS \n", "343 1/1/2013 4 5 1 WORN-OUT OTHER PROBLEMS \n", "1875 1/20/2013 3 6 1 WORN-OUT OTHER PROBLEMS \n", "\n", " Repair_Action_Desc Labor_Charge_Desc Engineer \\\n", "130 SET EXCHANGE Free Of Charge Tech23 \n", "1807 ELECTRICAL PARTS REPLACEMENT Free Of Charge Tech16 \n", "2971 ELECTRICAL PARTS REPLACEMENT Major Tech15 \n", "3043 ELECTRICAL PARTS REPLACEMENT Major Tech11 \n", "2331 SET EXCHANGE Free Of Charge Tech14 \n", "101 ELECTRICAL PARTS REPLACEMENT Free Of Charge Tech16 \n", "3096 SET EXCHANGE Free Of Charge Tech23 \n", "2181 ELECTRICAL PARTS REPLACEMENT Major Tech18 \n", "2141 ELECTRICAL PARTS REPLACEMENT Minor Tech14 \n", "935 SET EXCHANGE Free Of Charge Tech08 \n", "1698 SET EXCHANGE Free Of Charge Tech13 \n", "343 ELECTRICAL PARTS REPLACEMENT Major Tech10 \n", "1875 ELECTRICAL PARTS REPLACEMENT Free Of Charge Tech15 \n", "\n", " Labor_Charge_Amount Parts_Amount Discount_Amount Total_Invoice_Amount \n", "130 0 0.00 0.000 0.000000e+00 \n", "1807 0 0.00 0.000 0.000000e+00 \n", "2971 420000 14086824.28 2803384.856 1.161354e+07 \n", "3043 250000 584648.70 166828.740 6.677189e+05 \n", "2331 0 0.00 0.000 0.000000e+00 \n", "101 0 0.00 0.000 0.000000e+00 \n", "3096 0 0.00 0.000 0.000000e+00 \n", "2181 170000 1840063.68 402012.736 1.608051e+06 \n", "2141 215000 0.00 43000.000 1.720000e+05 \n", "935 0 0.00 0.000 0.000000e+00 \n", "1698 0 0.00 0.000 0.000000e+00 \n", "343 420000 24887810.32 5063582.064 2.025433e+07 \n", "1875 0 0.00 0.000 0.000000e+00 " ] }, "metadata": {}, "execution_count": 193 } ], "metadata": {} }, { "cell_type": "markdown", "source": [ "Dataset itself have a column for index values, let's be sure that it's really unique for each row, then we can condiser *```No``` column* as index." ], "metadata": {} }, { "cell_type": "code", "execution_count": 194, "source": [ "service_df['No'].is_unique" ], "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "True" ] }, "metadata": {}, "execution_count": 194 } ], "metadata": {} }, { "cell_type": "code", "execution_count": 195, "source": [ "service_df.set_index(\"No\", inplace=True)" ], "outputs": [], "metadata": {} }, { "cell_type": "markdown", "source": [ "```head(n)``` returns first *n* rows.\n", "```tail(n)``` returns last *n* rows.\n", "Default value is considered to be 5." ], "metadata": {} }, { "cell_type": "code", "execution_count": 196, "source": [ "service_df.head()" ], "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Service_typeCost_TypeProduct_GroupSerial_NoCityProduct_DateReceipt_DateAppoint_DateComplete_DateTAT01TAT02Job_SatusDefect_DesSymptom_DescRepair_Action_DescLabor_Charge_DescEngineerLabor_Charge_AmountParts_AmountDiscount_AmountTotal_Invoice_Amount
No
10Normal WarrantyAUDIO - HOME THEAT2F/LAHFZ7E-IP-XTehranNaN2/5/20122/5/20121/21/201303510WORN-OUTOTHER PROBLEMSSOFTWARE UPGRADEFree Of ChargeTech1500.000.0000.000
21Normal WarrantyREFRIGERATOR-SBSAIP244AACA/4E7TehranNaN7/7/20129/10/20121/31/2013652081SHORTOTHER PROBLEMSOTHER REPAIRFree Of ChargeTech0303437481.60687486.3202748885.280
31Normal WarrantyREFRIGERATOR-SBSJK-C4BBCAEKL72TehranNaN7/7/20128/15/20121/31/2013382081OPENOTHER PROBLEMSOTHER REPAIRFree Of ChargeTech0304228756.56845751.3123383005.248
40Out of WarrantyMONITOR - TFT LCDIP2JO4/-A-47TehranNaN8/7/20128/7/20121/13/201301581WORN-OUTOTHER PROBLEMSELECTRICAL PARTS REPLACEMENTFree Of ChargeTech1500.000.0000.000
50Out of WarrantyAUDIO - MP3 PlayerAU/BCQE2A/PTehranNaN8/7/20128/9/20121/7/201321531WORN-OUTOTHER PROBLEMSSOFTWARE UPGRADEMediumTech07120000765488.60177087.820708381.680
\n", "
" ], "text/plain": [ " Service_type Cost_Type Product_Group Serial_No \\\n", "No \n", "1 0 Normal Warranty AUDIO - HOME THEAT 2F/LAHFZ7E-IP-X \n", "2 1 Normal Warranty REFRIGERATOR-SBS AIP244AACA/4E7 \n", "3 1 Normal Warranty REFRIGERATOR-SBS JK-C4BBCAEKL72 \n", "4 0 Out of Warranty MONITOR - TFT LCD IP2JO4/-A-47 \n", "5 0 Out of Warranty AUDIO - MP3 Player AU/BCQE2A/P \n", "\n", " City Product_Date Receipt_Date Appoint_Date Complete_Date TAT01 \\\n", "No \n", "1 Tehran NaN 2/5/2012 2/5/2012 1/21/2013 0 \n", "2 Tehran NaN 7/7/2012 9/10/2012 1/31/2013 65 \n", "3 Tehran NaN 7/7/2012 8/15/2012 1/31/2013 38 \n", "4 Tehran NaN 8/7/2012 8/7/2012 1/13/2013 0 \n", "5 Tehran NaN 8/7/2012 8/9/2012 1/7/2013 2 \n", "\n", " TAT02 Job_Satus Defect_Des Symptom_Desc Repair_Action_Desc \\\n", "No \n", "1 351 0 WORN-OUT OTHER PROBLEMS SOFTWARE UPGRADE \n", "2 208 1 SHORT OTHER PROBLEMS OTHER REPAIR \n", "3 208 1 OPEN OTHER PROBLEMS OTHER REPAIR \n", "4 158 1 WORN-OUT OTHER PROBLEMS ELECTRICAL PARTS REPLACEMENT \n", "5 153 1 WORN-OUT OTHER PROBLEMS SOFTWARE UPGRADE \n", "\n", " Labor_Charge_Desc Engineer Labor_Charge_Amount Parts_Amount \\\n", "No \n", "1 Free Of Charge Tech15 0 0.00 \n", "2 Free Of Charge Tech03 0 3437481.60 \n", "3 Free Of Charge Tech03 0 4228756.56 \n", "4 Free Of Charge Tech15 0 0.00 \n", "5 Medium Tech07 120000 765488.60 \n", "\n", " Discount_Amount Total_Invoice_Amount \n", "No \n", "1 0.000 0.000 \n", "2 687486.320 2748885.280 \n", "3 845751.312 3383005.248 \n", "4 0.000 0.000 \n", "5 177087.820 708381.680 " ] }, "metadata": {}, "execution_count": 196 } ], "metadata": {} }, { "cell_type": "markdown", "source": [ "```Columns``` attribute returns all column names." ], "metadata": {} }, { "cell_type": "code", "execution_count": 197, "source": [ "service_df.columns" ], "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "Index(['Service_type', 'Cost_Type', 'Product_Group', 'Serial_No', 'City',\n", " 'Product_Date', 'Receipt_Date', 'Appoint_Date', 'Complete_Date',\n", " 'TAT01', 'TAT02', 'Job_Satus', 'Defect_Des', 'Symptom_Desc',\n", " 'Repair_Action_Desc', 'Labor_Charge_Desc', 'Engineer',\n", " 'Labor_Charge_Amount', 'Parts_Amount', 'Discount_Amount',\n", " 'Total_Invoice_Amount'],\n", " dtype='object')" ] }, "metadata": {}, "execution_count": 197 } ], "metadata": {} }, { "cell_type": "markdown", "source": [ "```dtypes``` shows each column data type." ], "metadata": {} }, { "cell_type": "code", "execution_count": 198, "source": [ "service_df.dtypes" ], "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "Service_type int64\n", "Cost_Type object\n", "Product_Group object\n", "Serial_No object\n", "City object\n", "Product_Date float64\n", "Receipt_Date object\n", "Appoint_Date object\n", "Complete_Date object\n", "TAT01 int64\n", "TAT02 int64\n", "Job_Satus int64\n", "Defect_Des object\n", "Symptom_Desc object\n", "Repair_Action_Desc object\n", "Labor_Charge_Desc object\n", "Engineer object\n", "Labor_Charge_Amount int64\n", "Parts_Amount float64\n", "Discount_Amount float64\n", "Total_Invoice_Amount float64\n", "dtype: object" ] }, "metadata": {}, "execution_count": 198 } ], "metadata": {} }, { "cell_type": "markdown", "source": [ "```info()``` function summarise the dadaset. " ], "metadata": {} }, { "cell_type": "code", "execution_count": 199, "source": [ "service_df.info()" ], "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "\n", "Int64Index: 3136 entries, 1 to 3136\n", "Data columns (total 21 columns):\n", " # Column Non-Null Count Dtype \n", "--- ------ -------------- ----- \n", " 0 Service_type 3136 non-null int64 \n", " 1 Cost_Type 3136 non-null object \n", " 2 Product_Group 3126 non-null object \n", " 3 Serial_No 3127 non-null object \n", " 4 City 3136 non-null object \n", " 5 Product_Date 724 non-null float64\n", " 6 Receipt_Date 3136 non-null object \n", " 7 Appoint_Date 3136 non-null object \n", " 8 Complete_Date 3136 non-null object \n", " 9 TAT01 3136 non-null int64 \n", " 10 TAT02 3136 non-null int64 \n", " 11 Job_Satus 3136 non-null int64 \n", " 12 Defect_Des 3136 non-null object \n", " 13 Symptom_Desc 3136 non-null object \n", " 14 Repair_Action_Desc 3136 non-null object \n", " 15 Labor_Charge_Desc 3136 non-null object \n", " 16 Engineer 3136 non-null object \n", " 17 Labor_Charge_Amount 3136 non-null int64 \n", " 18 Parts_Amount 3136 non-null float64\n", " 19 Discount_Amount 3136 non-null float64\n", " 20 Total_Invoice_Amount 3136 non-null float64\n", "dtypes: float64(4), int64(5), object(12)\n", "memory usage: 539.0+ KB\n" ] } ], "metadata": {} }, { "cell_type": "markdown", "source": [ "For numerical values ```describe()``` function shows statical features of each column." ], "metadata": {} }, { "cell_type": "code", "execution_count": 200, "source": [ "service_df.describe()" ], "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Service_typeProduct_DateTAT01TAT02Job_SatusLabor_Charge_AmountParts_AmountDiscount_AmountTotal_Invoice_Amount
count3136.000000724.0000003136.0000003136.0000003136.0000003136.0000003.136000e+033.136000e+033.136000e+03
mean0.10523039227.7582871.2892226.6479590.977360129877.2321432.303481e+064.844214e+051.924006e+06
std0.3068981152.7462825.06042214.0127000.148778176843.0653766.682868e+061.349372e+065.400547e+06
min0.00000033870.0000000.0000000.0000000.0000000.0000000.000000e+000.000000e+000.000000e+00
25%0.00000038451.0000000.0000001.0000001.0000000.0000000.000000e+000.000000e+000.000000e+00
50%0.00000038823.0000000.0000003.0000001.0000000.0000000.000000e+000.000000e+000.000000e+00
75%0.00000040188.0000001.0000006.0000001.000000250000.0000007.510464e+048.738318e+043.827746e+05
max1.00000040813.00000081.000000351.0000001.000000540000.0000004.480716e+078.088432e+063.635773e+07
\n", "
" ], "text/plain": [ " Service_type Product_Date TAT01 TAT02 Job_Satus \\\n", "count 3136.000000 724.000000 3136.000000 3136.000000 3136.000000 \n", "mean 0.105230 39227.758287 1.289222 6.647959 0.977360 \n", "std 0.306898 1152.746282 5.060422 14.012700 0.148778 \n", "min 0.000000 33870.000000 0.000000 0.000000 0.000000 \n", "25% 0.000000 38451.000000 0.000000 1.000000 1.000000 \n", "50% 0.000000 38823.000000 0.000000 3.000000 1.000000 \n", "75% 0.000000 40188.000000 1.000000 6.000000 1.000000 \n", "max 1.000000 40813.000000 81.000000 351.000000 1.000000 \n", "\n", " Labor_Charge_Amount Parts_Amount Discount_Amount \\\n", "count 3136.000000 3.136000e+03 3.136000e+03 \n", "mean 129877.232143 2.303481e+06 4.844214e+05 \n", "std 176843.065376 6.682868e+06 1.349372e+06 \n", "min 0.000000 0.000000e+00 0.000000e+00 \n", "25% 0.000000 0.000000e+00 0.000000e+00 \n", "50% 0.000000 0.000000e+00 0.000000e+00 \n", "75% 250000.000000 7.510464e+04 8.738318e+04 \n", "max 540000.000000 4.480716e+07 8.088432e+06 \n", "\n", " Total_Invoice_Amount \n", "count 3.136000e+03 \n", "mean 1.924006e+06 \n", "std 5.400547e+06 \n", "min 0.000000e+00 \n", "25% 0.000000e+00 \n", "50% 0.000000e+00 \n", "75% 3.827746e+05 \n", "max 3.635773e+07 " ] }, "metadata": {}, "execution_count": 200 } ], "metadata": {} }, { "cell_type": "markdown", "source": [ "Columns can be accessed as a attribute to the dataframe. Each column in fact is a [Series](https://pandas.pydata.org/docs/reference/series.html). we can count each repetative value by using ```value_count()``` function." ], "metadata": {} }, { "cell_type": "code", "execution_count": 201, "source": [ "service_df.Product_Group.value_counts()" ], "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "LED40 1008\n", "LCD-40 529\n", "DIGITAL CAMERA 282\n", "LED46 250\n", "PRINTER - MONO LAS 151\n", "DVD PLAYER 119\n", "AUDIO - MP3 Player 109\n", "HARD DISK DRIVE 102\n", "AUDIO - HOME THEAT 102\n", "REFRIGERATOR-SBS 68\n", "DVD PLAYER/ RECORD 65\n", "PLASMA-42 62\n", "CAMCORDER 57\n", "PLASMA-43 49\n", "PRINTER - COLOR LA 35\n", "MONITOR - TFT LCD 24\n", "VACUUM CLEANER 24\n", "TFT18 18\n", "WASHING MACHINE 16\n", "PLASMA-51 15\n", "LED55 15\n", "OVEN - MICROWAVE 11\n", "LCD- 46 8\n", "PLASMA-50 4\n", "AIR CONDITIONER 3\n", "Name: Product_Group, dtype: int64" ] }, "metadata": {}, "execution_count": 201 } ], "metadata": {} }, { "cell_type": "markdown", "source": [ "Also we can group rows by columns. and get each group size." ], "metadata": {} }, { "cell_type": "code", "execution_count": 202, "source": [ "service_df.groupby(by =[\"Product_Group\", \"Cost_Type\"]).size()" ], "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "Product_Group Cost_Type \n", "AIR CONDITIONER Normal Warranty 2\n", " Out of Warranty 1\n", "AUDIO - HOME THEAT Normal Warranty 3\n", " Out of Warranty 99\n", "AUDIO - MP3 Player Normal Warranty 28\n", " Out of Warranty 81\n", "CAMCORDER Normal Warranty 5\n", " Out of Warranty 52\n", "DIGITAL CAMERA Normal Warranty 149\n", " Out of Warranty 133\n", "DVD PLAYER Normal Warranty 53\n", " Out of Warranty 66\n", "DVD PLAYER/ RECORD Extended Warranty 11\n", " Normal Warranty 18\n", " Out of Warranty 36\n", "HARD DISK DRIVE Extended Warranty 63\n", " Normal Warranty 14\n", " Out of Warranty 25\n", "LCD- 46 Normal Warranty 5\n", " Out of Warranty 3\n", "LCD-40 Normal Warranty 522\n", " Out of Warranty 7\n", "LED40 Normal Warranty 1003\n", " Out of Warranty 5\n", "LED46 Normal Warranty 247\n", " Out of Warranty 3\n", "LED55 Normal Warranty 14\n", " Out of Warranty 1\n", "MONITOR - TFT LCD Normal Warranty 5\n", " Out of Warranty 19\n", "OVEN - MICROWAVE Normal Warranty 3\n", " Out of Warranty 8\n", "PLASMA-42 Normal Warranty 55\n", " Out of Warranty 7\n", "PLASMA-43 Normal Warranty 45\n", " Out of Warranty 4\n", "PLASMA-50 Normal Warranty 3\n", " Out of Warranty 1\n", "PLASMA-51 Normal Warranty 12\n", " Out of Warranty 3\n", "PRINTER - COLOR LA Normal Warranty 14\n", " Out of Warranty 21\n", "PRINTER - MONO LAS Normal Warranty 57\n", " Out of Warranty 94\n", "REFRIGERATOR-SBS Normal Warranty 32\n", " Out of Warranty 36\n", "TFT18 Extended Warranty 2\n", " Normal Warranty 15\n", " Out of Warranty 1\n", "VACUUM CLEANER Normal Warranty 1\n", " Out of Warranty 23\n", "WASHING MACHINE Out of Warranty 16\n", "dtype: int64" ] }, "metadata": {}, "execution_count": 202 } ], "metadata": {} }, { "cell_type": "markdown", "source": [ "In fact the result of above command is a Series which have [multiple indexes](https://pandas.pydata.org/pandas-docs/stable/user_guide/advanced.html). by using ```unstack()```, we reshape the series into a Dataframe" ], "metadata": {} }, { "cell_type": "code", "execution_count": 203, "source": [ "service_df.groupby(by =[\"Product_Group\", \"Cost_Type\"]).size().unstack()" ], "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Cost_TypeExtended WarrantyNormal WarrantyOut of Warranty
Product_Group
AIR CONDITIONERNaN2.01.0
AUDIO - HOME THEATNaN3.099.0
AUDIO - MP3 PlayerNaN28.081.0
CAMCORDERNaN5.052.0
DIGITAL CAMERANaN149.0133.0
DVD PLAYERNaN53.066.0
DVD PLAYER/ RECORD11.018.036.0
HARD DISK DRIVE63.014.025.0
LCD- 46NaN5.03.0
LCD-40NaN522.07.0
LED40NaN1003.05.0
LED46NaN247.03.0
LED55NaN14.01.0
MONITOR - TFT LCDNaN5.019.0
OVEN - MICROWAVENaN3.08.0
PLASMA-42NaN55.07.0
PLASMA-43NaN45.04.0
PLASMA-50NaN3.01.0
PLASMA-51NaN12.03.0
PRINTER - COLOR LANaN14.021.0
PRINTER - MONO LASNaN57.094.0
REFRIGERATOR-SBSNaN32.036.0
TFT182.015.01.0
VACUUM CLEANERNaN1.023.0
WASHING MACHINENaNNaN16.0
\n", "
" ], "text/plain": [ "Cost_Type Extended Warranty Normal Warranty Out of Warranty\n", "Product_Group \n", "AIR CONDITIONER NaN 2.0 1.0\n", "AUDIO - HOME THEAT NaN 3.0 99.0\n", "AUDIO - MP3 Player NaN 28.0 81.0\n", "CAMCORDER NaN 5.0 52.0\n", "DIGITAL CAMERA NaN 149.0 133.0\n", "DVD PLAYER NaN 53.0 66.0\n", "DVD PLAYER/ RECORD 11.0 18.0 36.0\n", "HARD DISK DRIVE 63.0 14.0 25.0\n", "LCD- 46 NaN 5.0 3.0\n", "LCD-40 NaN 522.0 7.0\n", "LED40 NaN 1003.0 5.0\n", "LED46 NaN 247.0 3.0\n", "LED55 NaN 14.0 1.0\n", "MONITOR - TFT LCD NaN 5.0 19.0\n", "OVEN - MICROWAVE NaN 3.0 8.0\n", "PLASMA-42 NaN 55.0 7.0\n", "PLASMA-43 NaN 45.0 4.0\n", "PLASMA-50 NaN 3.0 1.0\n", "PLASMA-51 NaN 12.0 3.0\n", "PRINTER - COLOR LA NaN 14.0 21.0\n", "PRINTER - MONO LAS NaN 57.0 94.0\n", "REFRIGERATOR-SBS NaN 32.0 36.0\n", "TFT18 2.0 15.0 1.0\n", "VACUUM CLEANER NaN 1.0 23.0\n", "WASHING MACHINE NaN NaN 16.0" ] }, "metadata": {}, "execution_count": 203 } ], "metadata": {} }, { "cell_type": "markdown", "source": [ "let's count NaN values in each columns. " ], "metadata": {} }, { "cell_type": "code", "execution_count": 204, "source": [ "service_df.isna().sum()" ], "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "Service_type 0\n", "Cost_Type 0\n", "Product_Group 10\n", "Serial_No 9\n", "City 0\n", "Product_Date 2412\n", "Receipt_Date 0\n", "Appoint_Date 0\n", "Complete_Date 0\n", "TAT01 0\n", "TAT02 0\n", "Job_Satus 0\n", "Defect_Des 0\n", "Symptom_Desc 0\n", "Repair_Action_Desc 0\n", "Labor_Charge_Desc 0\n", "Engineer 0\n", "Labor_Charge_Amount 0\n", "Parts_Amount 0\n", "Discount_Amount 0\n", "Total_Invoice_Amount 0\n", "dtype: int64" ] }, "metadata": {}, "execution_count": 204 } ], "metadata": { "scrolled": false } }, { "cell_type": "markdown", "source": [ "let's see rows with NaN values" ], "metadata": {} }, { "cell_type": "code", "execution_count": 205, "source": [ "service_df[service_df[['Product_Group','Serial_No']].isna().any(1)]" ], "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Service_typeCost_TypeProduct_GroupSerial_NoCityProduct_DateReceipt_DateAppoint_DateComplete_DateTAT01TAT02Job_SatusDefect_DesSymptom_DescRepair_Action_DescLabor_Charge_DescEngineerLabor_Charge_AmountParts_AmountDiscount_AmountTotal_Invoice_Amount
No
4120Out of WarrantyNaN2P-/BAXS-E/KLKLKTehran38818.012/29/201212/31/20121/8/20132101WORN-OUTOTHER PROBLEMSOTHER REPAIRFree Of ChargeTech1600.000.0000.000
5421Normal WarrantyPRINTER - MONO LASNaNTehranNaN12/31/201212/31/20121/1/2013011WORN-OUTOTHER PROBLEMSOTHER REPAIRMaximumTech203500000.0070000.000280000.000
6260Out of WarrantyNaNZADCKLVKSAE2KL2KMASHHAD38814.01/1/20131/8/20131/13/20137121DEFORMEDTest OKNo ActionFree Of ChargeTech0800.000.0000.000
7040Out of WarrantyNaNA/&2GV2B&E/KL2KTehran40548.01/1/20131/2/20131/5/2013141Software BugZOOM PROBLEMFIRMWARE UPGRADEMajorTech082500000.0050000.000200000.000
8760Out of WarrantyVACUUM CLEANERNaNTehranNaN1/2/20131/6/20131/6/2013441WORN-OUTOTHER PROBLEMSELECTRICAL PARTS REPLACEMENTFree Of ChargeTech1700.000.0000.000
9340Normal WarrantyNaNZTAPKLCABA/2IP24LTehranNaN1/5/20131/5/20131/5/2013001WORN-OUTREAD ERRORMECHANICAL PARTS REPLACEMENTMajorTech1201158705.72231741.144826864.576
10590Normal WarrantyNaNZTAPKLCIBAE&&/DTehranNaN1/5/20131/6/20131/7/2013121WORN-OUTREAD ERRORMECHANICAL PARTS REPLACEMENTMediumTech1201158705.72231741.144826864.576
12070Normal WarrantyNaNZTAPKLCIBB/&IP47ATehranNaN1/6/20131/6/20131/10/2013041WORN-OUTOTHER PROBLEMSELECTRICAL PARTS REPLACEMENTMajorTech141700001158705.72265741.1441062864.576
12130Out of WarrantyNaN2P7IPBAKPQEKL27/NTehran38080.01/6/20131/6/20131/7/2013011WORN-OUTTest OKNo ActionFree Of ChargeTech2000.000.0000.000
12350Out of WarrantyVACUUM CLEANERNaNTehranNaN1/6/20131/8/20131/9/2013231WORN-OUTOTHER PROBLEMSELECTRICAL PARTS REPLACEMENTMinorTech1714500015526.4432105.288128421.152
15500Normal WarrantyNaNZTAPKLCIB2/A/7KLZTehranNaN1/9/20131/9/20131/9/2013001WORN-OUTREAD ERRORMECHANICAL PARTS REPLACEMENTMediumTech1201158705.72231741.144826864.576
15710Normal WarrantyNaNZTAPKLCIBA/A4QEPARDEBILNaN1/9/20131/9/20131/27/20130181WORN-OUTOTHER PROBLEMSELECTRICAL PARTS REPLACEMENTFree Of ChargeTech1800.000.0000.000
16170Out of WarrantyNaNA/&KLKLVBB&E/2ANTehran40548.01/9/20131/10/20131/13/2013141DEFORMEDNO POWERELECTRICAL PARTS REPLACEMENTMajorTech082500000.0050000.000200000.000
18140Out of WarrantyDVD PLAYERNaNTehranNaN1/14/20131/14/20131/26/20130121WORN-OUTOTHER PROBLEMSELECTRICAL PARTS REPLACEMENTMinorTech14120000868118.56187823.812781285.648
20320Out of WarrantyDIGITAL CAMERANaNTehranNaN1/15/20131/15/20131/21/2013061DEFORMEDOTHER PROBLEMSMECHANICAL PARTS REPLACEMENTFree Of ChargeTech1100.000.0000.000
20360Out of WarrantyVACUUM CLEANERNaNTehranNaN1/16/20131/20/20131/20/2013441WORN-OUTOTHER PROBLEMSELECTRICAL PARTS REPLACEMENTFree Of ChargeTech17018858.403871.88015887.520
22690Out of WarrantyAUDIO - HOME THEATNaNTehranNaN1/20/20131/20/20131/20/2013001WORN-OUTOTHER PROBLEMSELECTRICAL PARTS REPLACEMENTFree Of ChargeTech1801450087.28280018.4561160077.824
27720Out of WarrantyDVD PLAYERNaNTehranNaN1/24/20131/24/20131/26/2013021WORN-OUTOTHER PROBLEMSMECHANICAL PARTS REPLACEMENTMinimumTech1870000868118.56187823.812751285.648
28920Out of WarrantyDIGITAL CAMERANaNABHARNaN1/27/20131/27/20131/28/2013011DEFORMEDOTHER PROBLEMSSCRAPFree Of ChargeTech1100.000.0000.000
\n", "
" ], "text/plain": [ " Service_type Cost_Type Product_Group Serial_No \\\n", "No \n", "412 0 Out of Warranty NaN 2P-/BAXS-E/KLKLK \n", "542 1 Normal Warranty PRINTER - MONO LAS NaN \n", "626 0 Out of Warranty NaN ZADCKLVKSAE2KL2K \n", "704 0 Out of Warranty NaN A/&2GV2B&E/KL2K \n", "876 0 Out of Warranty VACUUM CLEANER NaN \n", "934 0 Normal Warranty NaN ZTAPKLCABA/2IP24L \n", "1059 0 Normal Warranty NaN ZTAPKLCIBAE&&/D \n", "1207 0 Normal Warranty NaN ZTAPKLCIBB/&IP47A \n", "1213 0 Out of Warranty NaN 2P7IPBAKPQEKL27/N \n", "1235 0 Out of Warranty VACUUM CLEANER NaN \n", "1550 0 Normal Warranty NaN ZTAPKLCIB2/A/7KLZ \n", "1571 0 Normal Warranty NaN ZTAPKLCIBA/A4QEP \n", "1617 0 Out of Warranty NaN A/&KLKLVBB&E/2AN \n", "1814 0 Out of Warranty DVD PLAYER NaN \n", "2032 0 Out of Warranty DIGITAL CAMERA NaN \n", "2036 0 Out of Warranty VACUUM CLEANER NaN \n", "2269 0 Out of Warranty AUDIO - HOME THEAT NaN \n", "2772 0 Out of Warranty DVD PLAYER NaN \n", "2892 0 Out of Warranty DIGITAL CAMERA NaN \n", "\n", " City Product_Date Receipt_Date Appoint_Date Complete_Date TAT01 \\\n", "No \n", "412 Tehran 38818.0 12/29/2012 12/31/2012 1/8/2013 2 \n", "542 Tehran NaN 12/31/2012 12/31/2012 1/1/2013 0 \n", "626 MASHHAD 38814.0 1/1/2013 1/8/2013 1/13/2013 7 \n", "704 Tehran 40548.0 1/1/2013 1/2/2013 1/5/2013 1 \n", "876 Tehran NaN 1/2/2013 1/6/2013 1/6/2013 4 \n", "934 Tehran NaN 1/5/2013 1/5/2013 1/5/2013 0 \n", "1059 Tehran NaN 1/5/2013 1/6/2013 1/7/2013 1 \n", "1207 Tehran NaN 1/6/2013 1/6/2013 1/10/2013 0 \n", "1213 Tehran 38080.0 1/6/2013 1/6/2013 1/7/2013 0 \n", "1235 Tehran NaN 1/6/2013 1/8/2013 1/9/2013 2 \n", "1550 Tehran NaN 1/9/2013 1/9/2013 1/9/2013 0 \n", "1571 ARDEBIL NaN 1/9/2013 1/9/2013 1/27/2013 0 \n", "1617 Tehran 40548.0 1/9/2013 1/10/2013 1/13/2013 1 \n", "1814 Tehran NaN 1/14/2013 1/14/2013 1/26/2013 0 \n", "2032 Tehran NaN 1/15/2013 1/15/2013 1/21/2013 0 \n", "2036 Tehran NaN 1/16/2013 1/20/2013 1/20/2013 4 \n", "2269 Tehran NaN 1/20/2013 1/20/2013 1/20/2013 0 \n", "2772 Tehran NaN 1/24/2013 1/24/2013 1/26/2013 0 \n", "2892 ABHAR NaN 1/27/2013 1/27/2013 1/28/2013 0 \n", "\n", " TAT02 Job_Satus Defect_Des Symptom_Desc \\\n", "No \n", "412 10 1 WORN-OUT OTHER PROBLEMS \n", "542 1 1 WORN-OUT OTHER PROBLEMS \n", "626 12 1 DEFORMED Test OK \n", "704 4 1 Software Bug ZOOM PROBLEM \n", "876 4 1 WORN-OUT OTHER PROBLEMS \n", "934 0 1 WORN-OUT READ ERROR \n", "1059 2 1 WORN-OUT READ ERROR \n", "1207 4 1 WORN-OUT OTHER PROBLEMS \n", "1213 1 1 WORN-OUT Test OK \n", "1235 3 1 WORN-OUT OTHER PROBLEMS \n", "1550 0 1 WORN-OUT READ ERROR \n", "1571 18 1 WORN-OUT OTHER PROBLEMS \n", "1617 4 1 DEFORMED NO POWER \n", "1814 12 1 WORN-OUT OTHER PROBLEMS \n", "2032 6 1 DEFORMED OTHER PROBLEMS \n", "2036 4 1 WORN-OUT OTHER PROBLEMS \n", "2269 0 1 WORN-OUT OTHER PROBLEMS \n", "2772 2 1 WORN-OUT OTHER PROBLEMS \n", "2892 1 1 DEFORMED OTHER PROBLEMS \n", "\n", " Repair_Action_Desc Labor_Charge_Desc Engineer \\\n", "No \n", "412 OTHER REPAIR Free Of Charge Tech16 \n", "542 OTHER REPAIR Maximum Tech20 \n", "626 No Action Free Of Charge Tech08 \n", "704 FIRMWARE UPGRADE Major Tech08 \n", "876 ELECTRICAL PARTS REPLACEMENT Free Of Charge Tech17 \n", "934 MECHANICAL PARTS REPLACEMENT Major Tech12 \n", "1059 MECHANICAL PARTS REPLACEMENT Medium Tech12 \n", "1207 ELECTRICAL PARTS REPLACEMENT Major Tech14 \n", "1213 No Action Free Of Charge Tech20 \n", "1235 ELECTRICAL PARTS REPLACEMENT Minor Tech17 \n", "1550 MECHANICAL PARTS REPLACEMENT Medium Tech12 \n", "1571 ELECTRICAL PARTS REPLACEMENT Free Of Charge Tech18 \n", "1617 ELECTRICAL PARTS REPLACEMENT Major Tech08 \n", "1814 ELECTRICAL PARTS REPLACEMENT Minor Tech14 \n", "2032 MECHANICAL PARTS REPLACEMENT Free Of Charge Tech11 \n", "2036 ELECTRICAL PARTS REPLACEMENT Free Of Charge Tech17 \n", "2269 ELECTRICAL PARTS REPLACEMENT Free Of Charge Tech18 \n", "2772 MECHANICAL PARTS REPLACEMENT Minimum Tech18 \n", "2892 SCRAP Free Of Charge Tech11 \n", "\n", " Labor_Charge_Amount Parts_Amount Discount_Amount Total_Invoice_Amount \n", "No \n", "412 0 0.00 0.000 0.000 \n", "542 350000 0.00 70000.000 280000.000 \n", "626 0 0.00 0.000 0.000 \n", "704 250000 0.00 50000.000 200000.000 \n", "876 0 0.00 0.000 0.000 \n", "934 0 1158705.72 231741.144 826864.576 \n", "1059 0 1158705.72 231741.144 826864.576 \n", "1207 170000 1158705.72 265741.144 1062864.576 \n", "1213 0 0.00 0.000 0.000 \n", "1235 145000 15526.44 32105.288 128421.152 \n", "1550 0 1158705.72 231741.144 826864.576 \n", "1571 0 0.00 0.000 0.000 \n", "1617 250000 0.00 50000.000 200000.000 \n", "1814 120000 868118.56 187823.812 781285.648 \n", "2032 0 0.00 0.000 0.000 \n", "2036 0 18858.40 3871.880 15887.520 \n", "2269 0 1450087.28 280018.456 1160077.824 \n", "2772 70000 868118.56 187823.812 751285.648 \n", "2892 0 0.00 0.000 0.000 " ] }, "metadata": {}, "execution_count": 205 } ], "metadata": {} }, { "cell_type": "markdown", "source": [ "### Data Cleaning\n", "\n", "In this part, based on what we want to do with the data, we restructure, reformat and validate the data.\n", "\n", "\n", "for start, let's remove columns that's not useful for our hypothetical purpose.\n", "\n", " * Does *```Serial_No```* really help the model to decide how much time it takes to be repaired?\n", " * or is *```Product_Date```* values meaningful?\n", "\n", " if the answers to these questions are No, we can drop those columns. \n", " " ], "metadata": {} }, { "cell_type": "code", "execution_count": 206, "source": [ "service_df = service_df.drop(columns=['Product_Date', 'Serial_No'])\n", "service_df" ], "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Service_typeCost_TypeProduct_GroupCityReceipt_DateAppoint_DateComplete_DateTAT01TAT02Job_SatusDefect_DesSymptom_DescRepair_Action_DescLabor_Charge_DescEngineerLabor_Charge_AmountParts_AmountDiscount_AmountTotal_Invoice_Amount
No
10Normal WarrantyAUDIO - HOME THEATTehran2/5/20122/5/20121/21/201303510WORN-OUTOTHER PROBLEMSSOFTWARE UPGRADEFree Of ChargeTech1500.000.0000.000
21Normal WarrantyREFRIGERATOR-SBSTehran7/7/20129/10/20121/31/2013652081SHORTOTHER PROBLEMSOTHER REPAIRFree Of ChargeTech0303437481.60687486.3202748885.280
31Normal WarrantyREFRIGERATOR-SBSTehran7/7/20128/15/20121/31/2013382081OPENOTHER PROBLEMSOTHER REPAIRFree Of ChargeTech0304228756.56845751.3123383005.248
40Out of WarrantyMONITOR - TFT LCDTehran8/7/20128/7/20121/13/201301581WORN-OUTOTHER PROBLEMSELECTRICAL PARTS REPLACEMENTFree Of ChargeTech1500.000.0000.000
50Out of WarrantyAUDIO - MP3 PlayerTehran8/7/20128/9/20121/7/201321531WORN-OUTOTHER PROBLEMSSOFTWARE UPGRADEMediumTech07120000765488.60177087.820708381.680
............................................................
31320Normal WarrantyLCD-40REZVANSHAHR1/31/20131/31/20131/31/2013001WORN-OUTOTHER PROBLEMSELECTRICAL PARTS REPLACEMENTMajorTech134200000.0084000.000336000.000
31330Normal WarrantyLED46ESFAHAN1/31/20131/31/20131/31/2013001WORN-OUTOTHER PROBLEMSELECTRICAL PARTS REPLACEMENTMajorTech134200000.0084000.000336000.000
31340Extended WarrantyHARD DISK DRIVETehran1/31/20131/31/20131/31/2013001WORN-OUTOTHER PROBLEMSREPAIR REFUSALMinimumTech07500000.0010000.00040000.000
31350Out of WarrantyHARD DISK DRIVETehran1/31/20131/31/20131/31/2013001WORN-OUTOTHER PROBLEMSREPAIR REFUSALFree Of ChargeTech0700.000.0000.000
31360Out of WarrantyDVD PLAYER/ RECORDTehran1/31/20131/31/20131/31/2013001WORN-OUTOTHER PROBLEMSMECHANICAL PARTS REPLACEMENTMinorTech181500001488482.00328686.4001318785.600
\n", "

3136 rows × 19 columns

\n", "
" ], "text/plain": [ " Service_type Cost_Type Product_Group City \\\n", "No \n", "1 0 Normal Warranty AUDIO - HOME THEAT Tehran \n", "2 1 Normal Warranty REFRIGERATOR-SBS Tehran \n", "3 1 Normal Warranty REFRIGERATOR-SBS Tehran \n", "4 0 Out of Warranty MONITOR - TFT LCD Tehran \n", "5 0 Out of Warranty AUDIO - MP3 Player Tehran \n", "... ... ... ... ... \n", "3132 0 Normal Warranty LCD-40 REZVANSHAHR \n", "3133 0 Normal Warranty LED46 ESFAHAN \n", "3134 0 Extended Warranty HARD DISK DRIVE Tehran \n", "3135 0 Out of Warranty HARD DISK DRIVE Tehran \n", "3136 0 Out of Warranty DVD PLAYER/ RECORD Tehran \n", "\n", " Receipt_Date Appoint_Date Complete_Date TAT01 TAT02 Job_Satus \\\n", "No \n", "1 2/5/2012 2/5/2012 1/21/2013 0 351 0 \n", "2 7/7/2012 9/10/2012 1/31/2013 65 208 1 \n", "3 7/7/2012 8/15/2012 1/31/2013 38 208 1 \n", "4 8/7/2012 8/7/2012 1/13/2013 0 158 1 \n", "5 8/7/2012 8/9/2012 1/7/2013 2 153 1 \n", "... ... ... ... ... ... ... \n", "3132 1/31/2013 1/31/2013 1/31/2013 0 0 1 \n", "3133 1/31/2013 1/31/2013 1/31/2013 0 0 1 \n", "3134 1/31/2013 1/31/2013 1/31/2013 0 0 1 \n", "3135 1/31/2013 1/31/2013 1/31/2013 0 0 1 \n", "3136 1/31/2013 1/31/2013 1/31/2013 0 0 1 \n", "\n", " Defect_Des Symptom_Desc Repair_Action_Desc \\\n", "No \n", "1 WORN-OUT OTHER PROBLEMS SOFTWARE UPGRADE \n", "2 SHORT OTHER PROBLEMS OTHER REPAIR \n", "3 OPEN OTHER PROBLEMS OTHER REPAIR \n", "4 WORN-OUT OTHER PROBLEMS ELECTRICAL PARTS REPLACEMENT \n", "5 WORN-OUT OTHER PROBLEMS SOFTWARE UPGRADE \n", "... ... ... ... \n", "3132 WORN-OUT OTHER PROBLEMS ELECTRICAL PARTS REPLACEMENT \n", "3133 WORN-OUT OTHER PROBLEMS ELECTRICAL PARTS REPLACEMENT \n", "3134 WORN-OUT OTHER PROBLEMS REPAIR REFUSAL \n", "3135 WORN-OUT OTHER PROBLEMS REPAIR REFUSAL \n", "3136 WORN-OUT OTHER PROBLEMS MECHANICAL PARTS REPLACEMENT \n", "\n", " Labor_Charge_Desc Engineer Labor_Charge_Amount Parts_Amount \\\n", "No \n", "1 Free Of Charge Tech15 0 0.00 \n", "2 Free Of Charge Tech03 0 3437481.60 \n", "3 Free Of Charge Tech03 0 4228756.56 \n", "4 Free Of Charge Tech15 0 0.00 \n", "5 Medium Tech07 120000 765488.60 \n", "... ... ... ... ... \n", "3132 Major Tech13 420000 0.00 \n", "3133 Major Tech13 420000 0.00 \n", "3134 Minimum Tech07 50000 0.00 \n", "3135 Free Of Charge Tech07 0 0.00 \n", "3136 Minor Tech18 150000 1488482.00 \n", "\n", " Discount_Amount Total_Invoice_Amount \n", "No \n", "1 0.000 0.000 \n", "2 687486.320 2748885.280 \n", "3 845751.312 3383005.248 \n", "4 0.000 0.000 \n", "5 177087.820 708381.680 \n", "... ... ... \n", "3132 84000.000 336000.000 \n", "3133 84000.000 336000.000 \n", "3134 10000.000 40000.000 \n", "3135 0.000 0.000 \n", "3136 328686.400 1318785.600 \n", "\n", "[3136 rows x 19 columns]" ] }, "metadata": {}, "execution_count": 206 } ], "metadata": {} }, { "cell_type": "markdown", "source": [ "Dataset at hand have some NaN values. what does it mean? is it that the value exist but not provided or there isn't anything to fill that field?\n", "\n", "possible solutions\n", "\n", "* drop data\n", " - drop the whole row\n", " - drop the whole column\n", "* [Imputation](https://en.wikipedia.org/wiki/Imputation_(statistics))\n", " - replace it by mean\n", " - replace it by frequency\n", " - adding unkown category\n", " - ...\n", "\n", "it's important to have a logical reason. it's not reasonable to fill NaN values with mean when NaN occupies half of that column." ], "metadata": {} }, { "cell_type": "markdown", "source": [ "Where's the NaN values ?" ], "metadata": {} }, { "cell_type": "code", "execution_count": 207, "source": [ "service_df.isna().sum()" ], "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "Service_type 0\n", "Cost_Type 0\n", "Product_Group 10\n", "City 0\n", "Receipt_Date 0\n", "Appoint_Date 0\n", "Complete_Date 0\n", "TAT01 0\n", "TAT02 0\n", "Job_Satus 0\n", "Defect_Des 0\n", "Symptom_Desc 0\n", "Repair_Action_Desc 0\n", "Labor_Charge_Desc 0\n", "Engineer 0\n", "Labor_Charge_Amount 0\n", "Parts_Amount 0\n", "Discount_Amount 0\n", "Total_Invoice_Amount 0\n", "dtype: int64" ] }, "metadata": {}, "execution_count": 207 } ], "metadata": {} }, { "cell_type": "markdown", "source": [ "Let's see rows with NaN values." ], "metadata": {} }, { "cell_type": "code", "execution_count": 208, "source": [ "service_df[pd.isna(service_df.Product_Group)]" ], "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Service_typeCost_TypeProduct_GroupCityReceipt_DateAppoint_DateComplete_DateTAT01TAT02Job_SatusDefect_DesSymptom_DescRepair_Action_DescLabor_Charge_DescEngineerLabor_Charge_AmountParts_AmountDiscount_AmountTotal_Invoice_Amount
No
4120Out of WarrantyNaNTehran12/29/201212/31/20121/8/20132101WORN-OUTOTHER PROBLEMSOTHER REPAIRFree Of ChargeTech1600.000.0000.000
6260Out of WarrantyNaNMASHHAD1/1/20131/8/20131/13/20137121DEFORMEDTest OKNo ActionFree Of ChargeTech0800.000.0000.000
7040Out of WarrantyNaNTehran1/1/20131/2/20131/5/2013141Software BugZOOM PROBLEMFIRMWARE UPGRADEMajorTech082500000.0050000.000200000.000
9340Normal WarrantyNaNTehran1/5/20131/5/20131/5/2013001WORN-OUTREAD ERRORMECHANICAL PARTS REPLACEMENTMajorTech1201158705.72231741.144826864.576
10590Normal WarrantyNaNTehran1/5/20131/6/20131/7/2013121WORN-OUTREAD ERRORMECHANICAL PARTS REPLACEMENTMediumTech1201158705.72231741.144826864.576
12070Normal WarrantyNaNTehran1/6/20131/6/20131/10/2013041WORN-OUTOTHER PROBLEMSELECTRICAL PARTS REPLACEMENTMajorTech141700001158705.72265741.1441062864.576
12130Out of WarrantyNaNTehran1/6/20131/6/20131/7/2013011WORN-OUTTest OKNo ActionFree Of ChargeTech2000.000.0000.000
15500Normal WarrantyNaNTehran1/9/20131/9/20131/9/2013001WORN-OUTREAD ERRORMECHANICAL PARTS REPLACEMENTMediumTech1201158705.72231741.144826864.576
15710Normal WarrantyNaNARDEBIL1/9/20131/9/20131/27/20130181WORN-OUTOTHER PROBLEMSELECTRICAL PARTS REPLACEMENTFree Of ChargeTech1800.000.0000.000
16170Out of WarrantyNaNTehran1/9/20131/10/20131/13/2013141DEFORMEDNO POWERELECTRICAL PARTS REPLACEMENTMajorTech082500000.0050000.000200000.000
\n", "
" ], "text/plain": [ " Service_type Cost_Type Product_Group City Receipt_Date \\\n", "No \n", "412 0 Out of Warranty NaN Tehran 12/29/2012 \n", "626 0 Out of Warranty NaN MASHHAD 1/1/2013 \n", "704 0 Out of Warranty NaN Tehran 1/1/2013 \n", "934 0 Normal Warranty NaN Tehran 1/5/2013 \n", "1059 0 Normal Warranty NaN Tehran 1/5/2013 \n", "1207 0 Normal Warranty NaN Tehran 1/6/2013 \n", "1213 0 Out of Warranty NaN Tehran 1/6/2013 \n", "1550 0 Normal Warranty NaN Tehran 1/9/2013 \n", "1571 0 Normal Warranty NaN ARDEBIL 1/9/2013 \n", "1617 0 Out of Warranty NaN Tehran 1/9/2013 \n", "\n", " Appoint_Date Complete_Date TAT01 TAT02 Job_Satus Defect_Des \\\n", "No \n", "412 12/31/2012 1/8/2013 2 10 1 WORN-OUT \n", "626 1/8/2013 1/13/2013 7 12 1 DEFORMED \n", "704 1/2/2013 1/5/2013 1 4 1 Software Bug \n", "934 1/5/2013 1/5/2013 0 0 1 WORN-OUT \n", "1059 1/6/2013 1/7/2013 1 2 1 WORN-OUT \n", "1207 1/6/2013 1/10/2013 0 4 1 WORN-OUT \n", "1213 1/6/2013 1/7/2013 0 1 1 WORN-OUT \n", "1550 1/9/2013 1/9/2013 0 0 1 WORN-OUT \n", "1571 1/9/2013 1/27/2013 0 18 1 WORN-OUT \n", "1617 1/10/2013 1/13/2013 1 4 1 DEFORMED \n", "\n", " Symptom_Desc Repair_Action_Desc Labor_Charge_Desc Engineer \\\n", "No \n", "412 OTHER PROBLEMS OTHER REPAIR Free Of Charge Tech16 \n", "626 Test OK No Action Free Of Charge Tech08 \n", "704 ZOOM PROBLEM FIRMWARE UPGRADE Major Tech08 \n", "934 READ ERROR MECHANICAL PARTS REPLACEMENT Major Tech12 \n", "1059 READ ERROR MECHANICAL PARTS REPLACEMENT Medium Tech12 \n", "1207 OTHER PROBLEMS ELECTRICAL PARTS REPLACEMENT Major Tech14 \n", "1213 Test OK No Action Free Of Charge Tech20 \n", "1550 READ ERROR MECHANICAL PARTS REPLACEMENT Medium Tech12 \n", "1571 OTHER PROBLEMS ELECTRICAL PARTS REPLACEMENT Free Of Charge Tech18 \n", "1617 NO POWER ELECTRICAL PARTS REPLACEMENT Major Tech08 \n", "\n", " Labor_Charge_Amount Parts_Amount Discount_Amount Total_Invoice_Amount \n", "No \n", "412 0 0.00 0.000 0.000 \n", "626 0 0.00 0.000 0.000 \n", "704 250000 0.00 50000.000 200000.000 \n", "934 0 1158705.72 231741.144 826864.576 \n", "1059 0 1158705.72 231741.144 826864.576 \n", "1207 170000 1158705.72 265741.144 1062864.576 \n", "1213 0 0.00 0.000 0.000 \n", "1550 0 1158705.72 231741.144 826864.576 \n", "1571 0 0.00 0.000 0.000 \n", "1617 250000 0.00 50000.000 200000.000 " ] }, "metadata": {}, "execution_count": 208 } ], "metadata": {} }, { "cell_type": "markdown", "source": [ "based on the task you about to do with data after the data is prepared, decide how much droping rows is logical. if 10 rows is not a lot, we can drop them." ], "metadata": {} }, { "cell_type": "code", "execution_count": 209, "source": [ "service_df.dropna(inplace=True)" ], "outputs": [], "metadata": {} }, { "cell_type": "markdown", "source": [ "what cities are in the dataset?" ], "metadata": {} }, { "cell_type": "code", "execution_count": 210, "source": [ "print(sorted(service_df.City.unique()))" ], "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "['ABADAN', 'ABHAR', 'AHVAZ', 'AMOL', 'ARAK', 'ARDEBI', 'ARDEBIL', 'BABOL', 'BABOLSAR', 'BANDAR ABBAS', 'BANDAR ANZALI', 'BANDARABBAS', 'BARDSIR', 'BEHBAHAN', 'BIJAR', 'BIRJAND', 'BOJNORD', 'BOKAN', 'BOMEHEN', 'BOROJERD', 'CHABAHAR', 'CHALOOS', 'CHALOUS', 'CHENARAN', 'DEZFOL', 'DEZFOUL', 'DOROD', 'ESFAHAN', 'ESFAHN', 'ESFERAYN', 'ESLAM ABAD', 'ESLAM SHAHR', 'ESLAMSHAHR', 'FARDIS', 'FARHANG', 'GARMSAR', 'GHAEM SHAHR', 'GHAEMSHAHR', 'GHARCHAK', 'GOLPAYGAN', 'GONBAD', 'GONBAD KAVOS', 'GORGAN', 'HAMEDAN', 'HASHTGERD', 'HSMEDAN', 'ILAM', 'IZEH', 'JAHROM', 'JEYROFT', 'KARAJ', 'KASHAN', 'KASHMAR', 'KERMAN', 'KERMANSHAH', 'KHOMAM', 'KHOMEIN', 'KHORAM ABAD', 'KHOY', 'KhoramAbad', 'LAHIJAN', 'LANGEROD', 'LANGEROOD', 'MAHMOUD ABAD', 'MAHSHAHR', 'MALAYER', 'MANJIL', 'MASHHAD', 'MEHRSHAHR', 'MIYANEH', 'NAHAVAND', 'NIYSHABOUR', 'NOR', 'NOSHAHR', 'NOUR', 'NOUSHAHR', 'OROMIYEH', 'PARS ABAD', 'QAZVIN', 'QOM', 'RAFSANJAN', 'RASHT', 'REZVANSHAHR', 'ROBAT KARIM', 'ROBATKARIM', 'RODSAR', 'SABZEVAR', 'SALMAS', 'SARAF ZADEH', 'SARBANDAR', 'SARI', 'SAVAD KOOH', 'SAVEH', 'SAVOJBOLAGH', 'SEMNAN', 'SHAHR GHODS', 'SHAHRE GHODS', 'SHAHRE KORD', 'SHAHRGHODS', 'SHAHROD', 'SHAHROOD', 'SHAHRYAR', 'SHAZAND', 'SHIRAZ', 'SIRJAN', 'TABAS', 'TABRIZ', 'TALESH', 'TEHRAN', 'TONEKABON', 'Tehran', 'VARAMIN', 'YAZD', 'ZAHEDAN', 'ZANJAN']\n" ] } ], "metadata": {} }, { "cell_type": "markdown", "source": [ "Look at the values above, what you see? we have multiple ways of writing same city name. \n", "\n", "* 'ARDEBI', 'ARDEBIL'\n", "* 'BANDAR ABBAS', 'BANDARABBAS'\n", "* 'CHALOOS', 'CHALOUS'\n", "* 'DEZFOL', 'DEZFOUL'\n", "* 'ESFAHAN', 'ESFAHN'\n", "* 'ESLAMSHAHR', 'ESLAM SHAHR'\n", "* 'GHAEM SHAHR', 'GHAEMSHAHR'\n", "* 'HAMEDAN', 'HSMEDAN'\n", "* 'KHORAM ABAD', 'KhoramAbad'\n", "* 'LANGEROD', 'LANGEROOD'\n", "* 'NOR', 'NOUR'\n", "* 'NOSHAHR', 'NOUSHAHR'\n", "* 'ROBAT KARIM', 'ROBATKARIM'\n", "* 'SHAHR GHODS', 'SHAHRE GHODS', 'SHAHRGHODS'\n", "* 'SHAHROD', 'SHAHROOD'\n", "* 'TEHRAN', 'Tehran'\n", "\n", "so we should transform them into a common format. it's a Data standardization." ], "metadata": {} }, { "cell_type": "code", "execution_count": 211, "source": [ "city_dictionary = {'ARDEBIL': 'ARDEBI', 'BANDARABBAS': 'BANDAR ABBAS', 'CHALOUS': 'CHALOOS', 'DEZFOUL': 'DEZFOL', 'ESFAHN': 'ESFAHAN', 'ESLAM SHAHR': 'ESLAMSHAHR', 'GHAEMSHAHR': 'GHAEM SHAHR', 'HAMEDAN': 'HSMEDAN', 'KhoramAbad': 'KHORAM ABAD', 'LANGEROOD': 'LANGEROD', 'NOUR': 'NOR', 'NOUSHAHR': 'NOSHAHR', 'ROBATKARIM': 'ROBAT KARIM','SHAHRE GHODS': 'SHAHR GHODS', 'SHAHRGHODS': 'SHAHR GHODS', 'SHAHROOD': 'SHAHROD', 'Tehran': 'TEHRAN'}\n", "service_df = service_df.replace({'City': city_dictionary})" ], "outputs": [], "metadata": {} }, { "cell_type": "markdown", "source": [ "Now let's check if dataframe have duplicate rows or not." ], "metadata": {} }, { "cell_type": "code", "execution_count": 212, "source": [ "service_df.duplicated().any()" ], "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "True" ] }, "metadata": {}, "execution_count": 212 } ], "metadata": {} }, { "cell_type": "markdown", "source": [ "unfortunately, it does. since we're looking at the dataframe as [Analytical base table(ABT)](https://en.wikipedia.org/wiki/Analytical_base_table), it's better to drop them." ], "metadata": {} }, { "cell_type": "code", "execution_count": 213, "source": [ "service_df.drop_duplicates(inplace=True)\n", "service_df.shape" ], "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "(1971, 19)" ] }, "metadata": {}, "execution_count": 213 } ], "metadata": {} }, { "cell_type": "markdown", "source": [ "let's look again at dtypes. we can convert them into more sufficent types." ], "metadata": {} }, { "cell_type": "code", "execution_count": 214, "source": [ "service_df.dtypes" ], "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "Service_type int64\n", "Cost_Type object\n", "Product_Group object\n", "City object\n", "Receipt_Date object\n", "Appoint_Date object\n", "Complete_Date object\n", "TAT01 int64\n", "TAT02 int64\n", "Job_Satus int64\n", "Defect_Des object\n", "Symptom_Desc object\n", "Repair_Action_Desc object\n", "Labor_Charge_Desc object\n", "Engineer object\n", "Labor_Charge_Amount int64\n", "Parts_Amount float64\n", "Discount_Amount float64\n", "Total_Invoice_Amount float64\n", "dtype: object" ] }, "metadata": {}, "execution_count": 214 } ], "metadata": {} }, { "cell_type": "markdown", "source": [ "*```Receipt_Date```*, *```Appoint_Date```* and *```Complete_Date```* elements are object, let's convert them to datetime type.\n", "Remember that we have explored the data. so we know data is using this (%m/%d/%Y) time format." ], "metadata": {} }, { "cell_type": "code", "execution_count": 215, "source": [ "service_df[[\"Receipt_Date\", \"Appoint_Date\", \"Complete_Date\"]] = service_df[[\"Receipt_Date\", \"Appoint_Date\", \"Complete_Date\"]].apply(lambda _: pd.to_datetime(_, format='%m/%d/%Y'))" ], "outputs": [], "metadata": {} }, { "cell_type": "markdown", "source": [ "Now let's see all values for each column with object dtype" ], "metadata": {} }, { "cell_type": "code", "execution_count": 216, "source": [ "for column in service_df.select_dtypes(include=[object]).columns:\n", " print(column, sorted(service_df[column].unique()), sep=\"\\n\\t\")\n", " print(\"-------------------------\")" ], "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "Cost_Type\n", "\t['Extended Warranty', 'Normal Warranty', 'Out of Warranty']\n", "-------------------------\n", "Product_Group\n", "\t['AIR CONDITIONER ', 'AUDIO - HOME THEAT', 'AUDIO - MP3 Player', 'CAMCORDER', 'DIGITAL CAMERA', 'DVD PLAYER', 'DVD PLAYER/ RECORD', 'HARD DISK DRIVE', 'LCD- 46', 'LCD-40', 'LED40', 'LED46', 'LED55', 'MONITOR - TFT LCD ', 'OVEN - MICROWAVE', 'PLASMA-42', 'PLASMA-43', 'PLASMA-50', 'PLASMA-51', 'PRINTER - COLOR LA', 'PRINTER - MONO LAS', 'REFRIGERATOR-SBS', 'TFT18', 'VACUUM CLEANER', 'WASHING MACHINE']\n", "-------------------------\n", "City\n", "\t['ABADAN', 'ABHAR', 'AHVAZ', 'AMOL', 'ARAK', 'ARDEBI', 'BABOL', 'BABOLSAR', 'BANDAR ABBAS', 'BANDAR ANZALI', 'BARDSIR', 'BEHBAHAN', 'BIJAR', 'BIRJAND', 'BOJNORD', 'BOKAN', 'BOMEHEN', 'BOROJERD', 'CHABAHAR', 'CHALOOS', 'CHENARAN', 'DEZFOL', 'DOROD', 'ESFAHAN', 'ESFERAYN', 'ESLAM ABAD', 'ESLAMSHAHR', 'FARDIS', 'FARHANG', 'GARMSAR', 'GHAEM SHAHR', 'GHARCHAK', 'GOLPAYGAN', 'GONBAD', 'GONBAD KAVOS', 'GORGAN', 'HASHTGERD', 'HSMEDAN', 'ILAM', 'IZEH', 'JAHROM', 'JEYROFT', 'KARAJ', 'KASHAN', 'KASHMAR', 'KERMAN', 'KERMANSHAH', 'KHOMAM', 'KHOMEIN', 'KHORAM ABAD', 'KHOY', 'LAHIJAN', 'LANGEROD', 'MAHMOUD ABAD', 'MAHSHAHR', 'MALAYER', 'MANJIL', 'MASHHAD', 'MEHRSHAHR', 'MIYANEH', 'NAHAVAND', 'NIYSHABOUR', 'NOR', 'NOSHAHR', 'OROMIYEH', 'PARS ABAD', 'QAZVIN', 'QOM', 'RAFSANJAN', 'RASHT', 'REZVANSHAHR', 'ROBAT KARIM', 'RODSAR', 'SABZEVAR', 'SALMAS', 'SARAF ZADEH', 'SARBANDAR', 'SARI', 'SAVAD KOOH', 'SAVEH', 'SAVOJBOLAGH', 'SEMNAN', 'SHAHR GHODS', 'SHAHRE KORD', 'SHAHROD', 'SHAHRYAR', 'SHAZAND', 'SHIRAZ', 'SIRJAN', 'TABAS', 'TABRIZ', 'TALESH', 'TEHRAN', 'TONEKABON', 'VARAMIN', 'YAZD', 'ZAHEDAN', 'ZANJAN']\n", "-------------------------\n", "Defect_Des\n", "\t['COLD SOLDER', 'DEFORMED', 'DIRTY', 'DRY', 'LEAKING', 'OPEN', 'SHORT', 'Software Bug', 'WORN-OUT', 'WRONG PARTS']\n", "-------------------------\n", "Symptom_Desc\n", "\t['BEATING ON PICTURE', 'BUTTON/DIAL NOT WORKING', 'Broken External Flash', 'Cosmetic replacement(customer request)', 'DISPLAY DIM', 'Defective Date Mode', 'Defective Eye Cup', 'Defective LCD Touch Screen', 'Defective Usb Cable', 'EXCESSIVE COLOR', 'Exterior Problem/damage', 'FAULTY COUNTER/TIMER', 'FAULTY OSD', 'FIRMWARE UPAGRADE REQUESTED', 'HORIZONTAL LINE PRINTED', 'MEMORY ERASED/MISSED', 'NO AUDIO', 'NO LOADING', 'NO PICTURE/NO RASTER', 'NO POWER', 'No Power', 'OTHER FAULTY INDICATION', 'OTHER PROBLEMS', 'OUTCASE CRACK', 'PICTURE SLIDING', 'PICTURE TOO BRIGHT', 'PICTURE TOO DARK', 'PIXEL MISSING', 'POOR FOCUS', 'POWER OFF NOT WORKING', 'POWER ON, BUT NO OPERATION', 'Power On, But No Operation', 'READ ERROR', 'RESET NOT WORKING', 'SHORT BATTERY CHARGING', 'SHORT BATTERY LIFE', 'SNOWY PICTURE', 'SOUND CLIPPING', 'Test OK', 'UNIT AUTO-OFF', 'VERTICAL LINE PRINTED']\n", "-------------------------\n", "Repair_Action_Desc\n", "\t['ADJUSTMENT', 'CHANGE LCD', 'CHANGE OUTCASE', 'CLEANING', 'Cosmetic replacement(customer request)', \"DECK ASS'Y REPAIR/REPLACEMENT\", 'DECK CLEANING', 'ELECTRICAL PARTS REPLACEMENT', 'FIRMWARE UPGRADE', 'HEAD CLEANING', 'MECHANICAL PARTS REPLACEMENT', 'MECHANICAL RE-ASSEMBLING', 'No Action', 'OTHER CLEANING', 'OTHER ELECTRICAL ADJUSTMENT', 'OTHER MECHANICAL ADJUSTMENT', 'OTHER REPAIR', 'PARTS REMOVAL', 'PCB REPLACEMENT', 'PICKUP CLEANING', 'REPAIR REFUSAL', 'SCRAP', 'SET EXCHANGE', 'SOFTWARE UPGRADE', 'SOLDERING', 'TELECOM CARRIER PROBLEM']\n", "-------------------------\n", "Labor_Charge_Desc\n", "\t['Free Of Charge', 'Major', 'Maximum', 'Medium', 'Minimum', 'Minor']\n", "-------------------------\n", "Engineer\n", "\t['Tech01', 'Tech02', 'Tech03', 'Tech04', 'Tech05', 'Tech06', 'Tech07', 'Tech08', 'Tech10', 'Tech11', 'Tech12', 'Tech13', 'Tech14', 'Tech15', 'Tech16', 'Tech17', 'Tech18', 'Tech20', 'Tech21', 'Tech22', 'Tech23', 'Tech24']\n", "-------------------------\n" ] } ], "metadata": {} }, { "cell_type": "markdown", "source": [ "it seems all columns with object dtype can be seen as category type." ], "metadata": {} }, { "cell_type": "code", "execution_count": 217, "source": [ "for column in service_df.select_dtypes(include=[object]).columns:\n", " service_df[column] = service_df[column].astype('category')" ], "outputs": [], "metadata": {} }, { "cell_type": "markdown", "source": [ "Now let's see all values for each column with integer dtype" ], "metadata": {} }, { "cell_type": "code", "execution_count": 218, "source": [ "for column in service_df.select_dtypes(include=['int64']).columns:\n", " print(column, sorted(service_df[column].unique()), sep=\"\\n\\t\")\n", " print(\"-------------------------\")" ], "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "Service_type\n", "\t[0, 1]\n", "-------------------------\n", "TAT01\n", "\t[0, 1, 2, 3, 4, 5, 6, 7, 8, 10, 11, 12, 13, 14, 15, 16, 17, 18, 20, 21, 22, 23, 24, 26, 27, 28, 30, 31, 32, 33, 34, 36, 37, 38, 40, 42, 44, 50, 54, 55, 58, 64, 65, 67, 68, 81]\n", "-------------------------\n", "TAT02\n", "\t[0, 1, 2, 3, 4, 5, 6, 7, 8, 10, 11, 12, 13, 14, 15, 16, 17, 18, 20, 21, 22, 23, 24, 25, 26, 27, 28, 30, 31, 32, 33, 34, 35, 36, 37, 38, 40, 41, 42, 43, 44, 45, 46, 47, 48, 50, 51, 52, 54, 56, 58, 60, 65, 68, 73, 75, 77, 78, 80, 81, 82, 84, 87, 88, 105, 122, 135, 153, 158, 208, 351]\n", "-------------------------\n", "Job_Satus\n", "\t[0, 1]\n", "-------------------------\n", "Labor_Charge_Amount\n", "\t[0, 50000, 70000, 85000, 120000, 145000, 150000, 170000, 180000, 200000, 215000, 240000, 250000, 300000, 350000, 360000, 400000, 420000, 540000]\n", "-------------------------\n" ] } ], "metadata": {} }, { "cell_type": "markdown", "source": [ "also *```Service_type```* and *```Job_Satus```* can be seen as categorical data type." ], "metadata": {} }, { "cell_type": "code", "execution_count": 219, "source": [ "service_df['Service_type'] = service_df['Service_type'].astype('category')\n", "service_df['Job_Satus'] = service_df['Job_Satus'].astype('category')" ], "outputs": [], "metadata": {} }, { "cell_type": "markdown", "source": [ "Scaling and Normalization\n", "\n", "*```Labor_Charge_Amount```*, *```Parts_Amount```*, *```Discount_Amount```* and *```Total_Invoice_Amount```* values shows how big their are. we can scale them down \n" ], "metadata": {} }, { "cell_type": "code", "execution_count": 220, "source": [ "service_df['Labor_Charge_Amount'] = service_df['Labor_Charge_Amount'].div(1000)\n", "service_df['Parts_Amount'] = service_df['Parts_Amount'].div(1000)\n", "service_df['Discount_Amount'] = service_df['Discount_Amount'].div(1000)\n", "service_df['Total_Invoice_Amount'] = service_df['Total_Invoice_Amount'].div(1000)" ], "outputs": [], "metadata": {} }, { "cell_type": "code", "execution_count": 221, "source": [ "service_df.head()" ], "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Service_typeCost_TypeProduct_GroupCityReceipt_DateAppoint_DateComplete_DateTAT01TAT02Job_SatusDefect_DesSymptom_DescRepair_Action_DescLabor_Charge_DescEngineerLabor_Charge_AmountParts_AmountDiscount_AmountTotal_Invoice_Amount
No
10Normal WarrantyAUDIO - HOME THEATTEHRAN2012-02-052012-02-052013-01-2103510WORN-OUTOTHER PROBLEMSSOFTWARE UPGRADEFree Of ChargeTech150.00.000000.0000000.000000
21Normal WarrantyREFRIGERATOR-SBSTEHRAN2012-07-072012-09-102013-01-31652081SHORTOTHER PROBLEMSOTHER REPAIRFree Of ChargeTech030.03437.48160687.4863202748.885280
31Normal WarrantyREFRIGERATOR-SBSTEHRAN2012-07-072012-08-152013-01-31382081OPENOTHER PROBLEMSOTHER REPAIRFree Of ChargeTech030.04228.75656845.7513123383.005248
40Out of WarrantyMONITOR - TFT LCDTEHRAN2012-08-072012-08-072013-01-1301581WORN-OUTOTHER PROBLEMSELECTRICAL PARTS REPLACEMENTFree Of ChargeTech150.00.000000.0000000.000000
50Out of WarrantyAUDIO - MP3 PlayerTEHRAN2012-08-072012-08-092013-01-0721531WORN-OUTOTHER PROBLEMSSOFTWARE UPGRADEMediumTech07120.0765.48860177.087820708.381680
\n", "
" ], "text/plain": [ " Service_type Cost_Type Product_Group City Receipt_Date \\\n", "No \n", "1 0 Normal Warranty AUDIO - HOME THEAT TEHRAN 2012-02-05 \n", "2 1 Normal Warranty REFRIGERATOR-SBS TEHRAN 2012-07-07 \n", "3 1 Normal Warranty REFRIGERATOR-SBS TEHRAN 2012-07-07 \n", "4 0 Out of Warranty MONITOR - TFT LCD TEHRAN 2012-08-07 \n", "5 0 Out of Warranty AUDIO - MP3 Player TEHRAN 2012-08-07 \n", "\n", " Appoint_Date Complete_Date TAT01 TAT02 Job_Satus Defect_Des \\\n", "No \n", "1 2012-02-05 2013-01-21 0 351 0 WORN-OUT \n", "2 2012-09-10 2013-01-31 65 208 1 SHORT \n", "3 2012-08-15 2013-01-31 38 208 1 OPEN \n", "4 2012-08-07 2013-01-13 0 158 1 WORN-OUT \n", "5 2012-08-09 2013-01-07 2 153 1 WORN-OUT \n", "\n", " Symptom_Desc Repair_Action_Desc Labor_Charge_Desc Engineer \\\n", "No \n", "1 OTHER PROBLEMS SOFTWARE UPGRADE Free Of Charge Tech15 \n", "2 OTHER PROBLEMS OTHER REPAIR Free Of Charge Tech03 \n", "3 OTHER PROBLEMS OTHER REPAIR Free Of Charge Tech03 \n", "4 OTHER PROBLEMS ELECTRICAL PARTS REPLACEMENT Free Of Charge Tech15 \n", "5 OTHER PROBLEMS SOFTWARE UPGRADE Medium Tech07 \n", "\n", " Labor_Charge_Amount Parts_Amount Discount_Amount Total_Invoice_Amount \n", "No \n", "1 0.0 0.00000 0.000000 0.000000 \n", "2 0.0 3437.48160 687.486320 2748.885280 \n", "3 0.0 4228.75656 845.751312 3383.005248 \n", "4 0.0 0.00000 0.000000 0.000000 \n", "5 120.0 765.48860 177.087820 708.381680 " ] }, "metadata": {}, "execution_count": 221 } ], "metadata": {} }, { "cell_type": "markdown", "source": [ "Dummy variables\n", "\n", "how about [Regression](https://en.wikipedia.org/wiki/Regression_analysis) model? but all variables should be numerical whereas dataframe have some columns with categorical type.\n", "\n", "to address, An indicator variable rendered as a numerical variable can be used to label categorical variables.\n" ], "metadata": {} }, { "cell_type": "code", "execution_count": 222, "source": [ "dummy_variable_cost_type = pd.get_dummies(service_df[\"Cost_Type\"])\n", "dummy_variable_cost_type" ], "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Extended WarrantyNormal WarrantyOut of Warranty
No
1010
2010
3010
4001
5001
............
3132010
3133010
3134100
3135001
3136001
\n", "

1971 rows × 3 columns

\n", "
" ], "text/plain": [ " Extended Warranty Normal Warranty Out of Warranty\n", "No \n", "1 0 1 0\n", "2 0 1 0\n", "3 0 1 0\n", "4 0 0 1\n", "5 0 0 1\n", "... ... ... ...\n", "3132 0 1 0\n", "3133 0 1 0\n", "3134 1 0 0\n", "3135 0 0 1\n", "3136 0 0 1\n", "\n", "[1971 rows x 3 columns]" ] }, "metadata": {}, "execution_count": 222 } ], "metadata": {} }, { "cell_type": "markdown", "source": [ "as you can see, there's only one cell in each row with value one. " ], "metadata": {} }, { "cell_type": "markdown", "source": [ "### Merging Datasets\n", "\n", "so we have the *service_df* Dataframe and *dummy_variable_cost_type* Dataframe. we want to merge them into one Dataframe. in this case we append columns from *dummy_variable_cost_type* into *service_df*.\n", "\n", "this [link](https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html) is a comprehensive description to merging\n" ], "metadata": {} }, { "cell_type": "code", "execution_count": 223, "source": [ "service_df = pd.concat([service_df, dummy_variable_cost_type], axis=1)" ], "outputs": [], "metadata": {} }, { "cell_type": "markdown", "source": [ "Droping original \"Cost_Type\" column since it's redundant." ], "metadata": {} }, { "cell_type": "code", "execution_count": 224, "source": [ "service_df.drop('Cost_Type', axis = 1, inplace=True)\n", "service_df" ], "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Service_typeProduct_GroupCityReceipt_DateAppoint_DateComplete_DateTAT01TAT02Job_SatusDefect_DesSymptom_DescRepair_Action_DescLabor_Charge_DescEngineerLabor_Charge_AmountParts_AmountDiscount_AmountTotal_Invoice_AmountExtended WarrantyNormal WarrantyOut of Warranty
No
10AUDIO - HOME THEATTEHRAN2012-02-052012-02-052013-01-2103510WORN-OUTOTHER PROBLEMSSOFTWARE UPGRADEFree Of ChargeTech150.00.000000.0000000.000000010
21REFRIGERATOR-SBSTEHRAN2012-07-072012-09-102013-01-31652081SHORTOTHER PROBLEMSOTHER REPAIRFree Of ChargeTech030.03437.48160687.4863202748.885280010
31REFRIGERATOR-SBSTEHRAN2012-07-072012-08-152013-01-31382081OPENOTHER PROBLEMSOTHER REPAIRFree Of ChargeTech030.04228.75656845.7513123383.005248010
40MONITOR - TFT LCDTEHRAN2012-08-072012-08-072013-01-1301581WORN-OUTOTHER PROBLEMSELECTRICAL PARTS REPLACEMENTFree Of ChargeTech150.00.000000.0000000.000000001
50AUDIO - MP3 PlayerTEHRAN2012-08-072012-08-092013-01-0721531WORN-OUTOTHER PROBLEMSSOFTWARE UPGRADEMediumTech07120.0765.48860177.087820708.381680001
..................................................................
31320LCD-40REZVANSHAHR2013-01-312013-01-312013-01-31001WORN-OUTOTHER PROBLEMSELECTRICAL PARTS REPLACEMENTMajorTech13420.00.0000084.000000336.000000010
31330LED46ESFAHAN2013-01-312013-01-312013-01-31001WORN-OUTOTHER PROBLEMSELECTRICAL PARTS REPLACEMENTMajorTech13420.00.0000084.000000336.000000010
31340HARD DISK DRIVETEHRAN2013-01-312013-01-312013-01-31001WORN-OUTOTHER PROBLEMSREPAIR REFUSALMinimumTech0750.00.0000010.00000040.000000100
31350HARD DISK DRIVETEHRAN2013-01-312013-01-312013-01-31001WORN-OUTOTHER PROBLEMSREPAIR REFUSALFree Of ChargeTech070.00.000000.0000000.000000001
31360DVD PLAYER/ RECORDTEHRAN2013-01-312013-01-312013-01-31001WORN-OUTOTHER PROBLEMSMECHANICAL PARTS REPLACEMENTMinorTech18150.01488.48200328.6864001318.785600001
\n", "

1971 rows × 21 columns

\n", "
" ], "text/plain": [ " Service_type Product_Group City Receipt_Date Appoint_Date \\\n", "No \n", "1 0 AUDIO - HOME THEAT TEHRAN 2012-02-05 2012-02-05 \n", "2 1 REFRIGERATOR-SBS TEHRAN 2012-07-07 2012-09-10 \n", "3 1 REFRIGERATOR-SBS TEHRAN 2012-07-07 2012-08-15 \n", "4 0 MONITOR - TFT LCD TEHRAN 2012-08-07 2012-08-07 \n", "5 0 AUDIO - MP3 Player TEHRAN 2012-08-07 2012-08-09 \n", "... ... ... ... ... ... \n", "3132 0 LCD-40 REZVANSHAHR 2013-01-31 2013-01-31 \n", "3133 0 LED46 ESFAHAN 2013-01-31 2013-01-31 \n", "3134 0 HARD DISK DRIVE TEHRAN 2013-01-31 2013-01-31 \n", "3135 0 HARD DISK DRIVE TEHRAN 2013-01-31 2013-01-31 \n", "3136 0 DVD PLAYER/ RECORD TEHRAN 2013-01-31 2013-01-31 \n", "\n", " Complete_Date TAT01 TAT02 Job_Satus Defect_Des Symptom_Desc \\\n", "No \n", "1 2013-01-21 0 351 0 WORN-OUT OTHER PROBLEMS \n", "2 2013-01-31 65 208 1 SHORT OTHER PROBLEMS \n", "3 2013-01-31 38 208 1 OPEN OTHER PROBLEMS \n", "4 2013-01-13 0 158 1 WORN-OUT OTHER PROBLEMS \n", "5 2013-01-07 2 153 1 WORN-OUT OTHER PROBLEMS \n", "... ... ... ... ... ... ... \n", "3132 2013-01-31 0 0 1 WORN-OUT OTHER PROBLEMS \n", "3133 2013-01-31 0 0 1 WORN-OUT OTHER PROBLEMS \n", "3134 2013-01-31 0 0 1 WORN-OUT OTHER PROBLEMS \n", "3135 2013-01-31 0 0 1 WORN-OUT OTHER PROBLEMS \n", "3136 2013-01-31 0 0 1 WORN-OUT OTHER PROBLEMS \n", "\n", " Repair_Action_Desc Labor_Charge_Desc Engineer \\\n", "No \n", "1 SOFTWARE UPGRADE Free Of Charge Tech15 \n", "2 OTHER REPAIR Free Of Charge Tech03 \n", "3 OTHER REPAIR Free Of Charge Tech03 \n", "4 ELECTRICAL PARTS REPLACEMENT Free Of Charge Tech15 \n", "5 SOFTWARE UPGRADE Medium Tech07 \n", "... ... ... ... \n", "3132 ELECTRICAL PARTS REPLACEMENT Major Tech13 \n", "3133 ELECTRICAL PARTS REPLACEMENT Major Tech13 \n", "3134 REPAIR REFUSAL Minimum Tech07 \n", "3135 REPAIR REFUSAL Free Of Charge Tech07 \n", "3136 MECHANICAL PARTS REPLACEMENT Minor Tech18 \n", "\n", " Labor_Charge_Amount Parts_Amount Discount_Amount \\\n", "No \n", "1 0.0 0.00000 0.000000 \n", "2 0.0 3437.48160 687.486320 \n", "3 0.0 4228.75656 845.751312 \n", "4 0.0 0.00000 0.000000 \n", "5 120.0 765.48860 177.087820 \n", "... ... ... ... \n", "3132 420.0 0.00000 84.000000 \n", "3133 420.0 0.00000 84.000000 \n", "3134 50.0 0.00000 10.000000 \n", "3135 0.0 0.00000 0.000000 \n", "3136 150.0 1488.48200 328.686400 \n", "\n", " Total_Invoice_Amount Extended Warranty Normal Warranty \\\n", "No \n", "1 0.000000 0 1 \n", "2 2748.885280 0 1 \n", "3 3383.005248 0 1 \n", "4 0.000000 0 0 \n", "5 708.381680 0 0 \n", "... ... ... ... \n", "3132 336.000000 0 1 \n", "3133 336.000000 0 1 \n", "3134 40.000000 1 0 \n", "3135 0.000000 0 0 \n", "3136 1318.785600 0 0 \n", "\n", " Out of Warranty \n", "No \n", "1 0 \n", "2 0 \n", "3 0 \n", "4 1 \n", "5 1 \n", "... ... \n", "3132 0 \n", "3133 0 \n", "3134 0 \n", "3135 1 \n", "3136 1 \n", "\n", "[1971 rows x 21 columns]" ] }, "metadata": {}, "execution_count": 224 } ], "metadata": {} }, { "cell_type": "markdown", "source": [ "now store the prepared dataframe. and let's say you are just interested in \"HARD DISK DRIVE\"s. ```to_csv()``` and ``` to_pickle``` will do this." ], "metadata": {} }, { "cell_type": "code", "execution_count": 225, "source": [ "Harddrive_df = service_df.loc[service_df['Product_Group'] == 'HARD DISK DRIVE']\n", "Harddrive_df.to_csv('Harddrive.csv') " ], "outputs": [], "metadata": {} }, { "cell_type": "markdown", "source": [], "metadata": {} }, { "cell_type": "markdown", "source": [ "## Iran's Earthquakes through history" ], "metadata": {} }, { "cell_type": "markdown", "source": [ "### Opening Excel Files\n", "as another example. let's look at this new database" ], "metadata": {} }, { "cell_type": "code", "execution_count": 226, "source": [ "earthquake_dfs = pd.read_excel(join('Datasets', 'earthquake', 'data.xls'), sheet_name=None)\n", "earthquake_dfs.keys()" ], "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "dict_keys(['Historical', '1900-1963', '1964-2000'])" ] }, "metadata": {}, "execution_count": 226 } ], "metadata": {} }, { "cell_type": "markdown", "source": [ "data is divided between three sheet. let's look at two of them." ], "metadata": {} }, { "cell_type": "code", "execution_count": 227, "source": [ "earthquake_df1 = earthquake_dfs['1900-1963']\n", "earthquake_df1.head()" ], "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
No.DateUnnamed: 2Unnamed: 3TimeLocationUnnamed: 6Unnamed: 7MagnitudeUnnamed: 9Ref.
0NaNYMDNaNLat.Long.DeptmbMsNaN
11.019006NaNNaN38.543.3NaNNaN5MEA
22.01900712NaN40.2843.1NaNNaN5.9ULM
33.019014357.04044.315NaN3.9MEA
44.01902NaNNaNNaN3943.3NaNNaN5MEA
\n", "
" ], "text/plain": [ " No. Date Unnamed: 2 Unnamed: 3 Time Location Unnamed: 6 Unnamed: 7 \\\n", "0 NaN Y M D NaN Lat. Long. Dept \n", "1 1.0 1900 6 NaN NaN 38.5 43.3 NaN \n", "2 2.0 1900 7 12 NaN 40.28 43.1 NaN \n", "3 3.0 1901 4 3 57.0 40 44.3 15 \n", "4 4.0 1902 NaN NaN NaN 39 43.3 NaN \n", "\n", " Magnitude Unnamed: 9 Ref. \n", "0 mb Ms NaN \n", "1 NaN 5 MEA \n", "2 NaN 5.9 ULM \n", "3 NaN 3.9 MEA \n", "4 NaN 5 MEA " ] }, "metadata": {}, "execution_count": 227 } ], "metadata": {} }, { "cell_type": "code", "execution_count": 228, "source": [ "earthquake_df2 = earthquake_dfs['1964-2000']\n", "earthquake_df2.head()" ], "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
No.DateUnnamed: 2Unnamed: 3TimeLocationUnnamed: 6Unnamed: 7MagnitudeUnnamed: 9Unnamed: 10Ref.
0NaNYMDNaNLat.Long.DeptmbMsMwNaN
11.01964112124550.031.5249.23565.1NaNNaNISC
22.0196411991353.026.7954385.6NaNNaNISC
33.01964130122310.032.747.833NaNNaNNaNPDE
44.0196425102423.040.3746.194.5NaNNaNISC
\n", "
" ], "text/plain": [ " No. Date Unnamed: 2 Unnamed: 3 Time Location Unnamed: 6 Unnamed: 7 \\\n", "0 NaN Y M D NaN Lat. Long. Dept \n", "1 1.0 1964 1 12 124550.0 31.52 49.23 56 \n", "2 2.0 1964 1 19 91353.0 26.79 54 38 \n", "3 3.0 1964 1 30 122310.0 32.7 47.8 33 \n", "4 4.0 1964 2 5 102423.0 40.37 46.1 9 \n", "\n", " Magnitude Unnamed: 9 Unnamed: 10 Ref. \n", "0 mb Ms Mw NaN \n", "1 5.1 NaN NaN ISC \n", "2 5.6 NaN NaN ISC \n", "3 NaN NaN NaN PDE \n", "4 4.5 NaN NaN ISC " ] }, "metadata": {}, "execution_count": 228 } ], "metadata": {} }, { "cell_type": "markdown", "source": [ "second row is header too. we could have use ```header=[0, 1]``` as argument for ```csv_read``` to fix this. but now let's do it manually." ], "metadata": {} }, { "cell_type": "code", "execution_count": 229, "source": [ "earthquake_df1.columns = ['No', 'Year', 'Month', 'Day', 'Time', 'Lat', 'Long', 'Dept', 'mb', 'Ms', 'ref']\n", "earthquake_df1 = earthquake_df1.iloc[1:].reset_index(drop=True)\n", "earthquake_df2.columns = ['No', 'Year', 'Month', 'Day', 'Time', 'Lat', 'Long', 'Dept', 'mb', 'Ms', 'Mw', 'ref']\n", "earthquake_df2 = earthquake_df2.iloc[1:].reset_index(drop=True)\n", "earthquake_df1.head()" ], "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
NoYearMonthDayTimeLatLongDeptmbMsref
01.019006NaNNaN38.543.3NaNNaN5MEA
12.01900712NaN40.2843.1NaNNaN5.9ULM
23.019014357.04044.315NaN3.9MEA
34.01902NaNNaNNaN3943.3NaNNaN5MEA
45.0190221393906.040.7248.71336NaNB77
\n", "
" ], "text/plain": [ " No Year Month Day Time Lat Long Dept mb Ms ref\n", "0 1.0 1900 6 NaN NaN 38.5 43.3 NaN NaN 5 MEA\n", "1 2.0 1900 7 12 NaN 40.28 43.1 NaN NaN 5.9 ULM\n", "2 3.0 1901 4 3 57.0 40 44.3 15 NaN 3.9 MEA\n", "3 4.0 1902 NaN NaN NaN 39 43.3 NaN NaN 5 MEA\n", "4 5.0 1902 2 13 93906.0 40.72 48.71 33 6 NaN B77" ] }, "metadata": {}, "execution_count": 229 } ], "metadata": {} }, { "cell_type": "markdown", "source": [ "### Concatenation\n", "we concat these two dataframe into one. notice that how new columns are the union of two dataframes' columns." ], "metadata": {} }, { "cell_type": "code", "execution_count": 230, "source": [ "earthquake_df_all = pd.concat([earthquake_df1, earthquake_df2])\n", "earthquake_df_all.head()" ], "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
NoYearMonthDayTimeLatLongDeptmbMsrefMw
01.019006NaNNaN38.543.3NaNNaN5MEANaN
12.01900712NaN40.2843.1NaNNaN5.9ULMNaN
23.019014357.04044.315NaN3.9MEANaN
34.01902NaNNaNNaN3943.3NaNNaN5MEANaN
45.0190221393906.040.7248.71336NaNB77NaN
\n", "
" ], "text/plain": [ " No Year Month Day Time Lat Long Dept mb Ms ref Mw\n", "0 1.0 1900 6 NaN NaN 38.5 43.3 NaN NaN 5 MEA NaN\n", "1 2.0 1900 7 12 NaN 40.28 43.1 NaN NaN 5.9 ULM NaN\n", "2 3.0 1901 4 3 57.0 40 44.3 15 NaN 3.9 MEA NaN\n", "3 4.0 1902 NaN NaN NaN 39 43.3 NaN NaN 5 MEA NaN\n", "4 5.0 1902 2 13 93906.0 40.72 48.71 33 6 NaN B77 NaN" ] }, "metadata": {}, "execution_count": 230 } ], "metadata": {} }, { "cell_type": "markdown", "source": [ "### Imputation\n", "Also let's replace *`NaN`*s in *`mb` column* with mean of that column." ], "metadata": {} }, { "cell_type": "code", "execution_count": 231, "source": [ "earthquake_df_all['mb'].fillna((earthquake_df_all['mb'].mean()), inplace=True)\n", "earthquake_df_all.head()" ], "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
NoYearMonthDayTimeLatLongDeptmbMsrefMw
01.019006NaNNaN38.543.3NaN4.4598975MEANaN
12.01900712NaN40.2843.1NaN4.4598975.9ULMNaN
23.019014357.04044.3154.4598973.9MEANaN
34.01902NaNNaNNaN3943.3NaN4.4598975MEANaN
45.0190221393906.040.7248.71336.000000NaNB77NaN
\n", "
" ], "text/plain": [ " No Year Month Day Time Lat Long Dept mb Ms ref Mw\n", "0 1.0 1900 6 NaN NaN 38.5 43.3 NaN 4.459897 5 MEA NaN\n", "1 2.0 1900 7 12 NaN 40.28 43.1 NaN 4.459897 5.9 ULM NaN\n", "2 3.0 1901 4 3 57.0 40 44.3 15 4.459897 3.9 MEA NaN\n", "3 4.0 1902 NaN NaN NaN 39 43.3 NaN 4.459897 5 MEA NaN\n", "4 5.0 1902 2 13 93906.0 40.72 48.71 33 6.000000 NaN B77 NaN" ] }, "metadata": {}, "execution_count": 231 } ], "metadata": {} }, { "cell_type": "markdown", "source": [ "Further work can be done on this dataframe." ], "metadata": {} }, { "cell_type": "markdown", "source": [ "## Json files (Bonus)" ], "metadata": {} }, { "cell_type": "markdown", "source": [ "In this part, we will use request to gather search results. " ], "metadata": {} }, { "cell_type": "code", "execution_count": 232, "source": [ "import requests\n", "import json" ], "outputs": [], "metadata": {} }, { "cell_type": "code", "execution_count": 236, "source": [ "query = {'q': 'playstation 5', 'sort': 'popularity', 'size' : 50}\n", "response = requests.get(\"https://api.torob.com/v4/base-product/search/\", params=query)" ], "outputs": [], "metadata": {} }, { "cell_type": "code", "execution_count": 237, "source": [ "response_json = response.json()\n", "response_json.keys()" ], "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "dict_keys(['results', 'count', 'max_price', 'min_price', 'next', 'previous', 'spellcheck', 'available_filters', 'filter_by_category_title', 'categories', 'parent_categories'])" ] }, "metadata": {}, "execution_count": 237 } ], "metadata": {} }, { "cell_type": "code", "execution_count": 238, "source": [ "response_df = pd.json_normalize(response_json['results'])\n", "response_df.head()" ], "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
image_urldiscount_inforandom_keyname1name2more_info_urlweb_client_absolute_urlpriceprice_textprice_text_modeshop_text
0https://storage.torob.com/backend-api/base/ima...[]2e71cfe4-7b77-4e80-837b-de93b5fb95d7کنسول بازی سونی PlayStation 5 Digital پلی استی...Playstation 5 Digital Editoinhttps://api.torob.com/v4/base-product/details-.../p/2e71cfe4-7b77-4e80-837b-de93b5fb95d7/%DA%A9...20120000از ۲۰٫۱۲۰٫۰۰۰ تومانactiveدر ۱۰۱ فروشگاه
1https://storage.torob.com/backend-api/base/ima...[]9825c33c-07e6-4ac7-8bd6-12015ebb3e3cکنسول بازی سونی PlayStation 5 Standard پلی اس...Playstation 5 Standard Editoinhttps://api.torob.com/v4/base-product/details-.../p/9825c33c-07e6-4ac7-8bd6-12015ebb3e3c/%DA%A9...21700000از ۲۱٫۷۰۰٫۰۰۰ تومانactiveدر ۱۳۶ فروشگاه
2https://pcjanebi.com/wp-content/uploads/2021/0...[]9687bb29-671b-42e4-986b-5f56ca3ade59باندل کنسول بازی سونی PlayStation 5 Standard پ...Standard PlayStation 5 bundle+DualSense+Pulse+...https://api.torob.com/v4/base-product/details-.../p/9687bb29-671b-42e4-986b-5f56ca3ade59/%D8%A8...29990000از ۲۹٫۹۹۰٫۰۰۰ تومانactiveدر ۵ فروشگاه
3https://storage.torob.com/backend-api/base/ima...[]3d19000e-b1a3-48de-8ef2-70c23c89b534پلی استیشن 5 دیجیتال به همراه بازی باندل 2 دستهhttps://api.torob.com/v4/base-product/details-.../p/3d19000e-b1a3-48de-8ef2-70c23c89b534/%D9%BE...23000000۲۳٫۰۰۰٫۰۰۰ تومانactiveدر مرکزخریددات کام
4https://storage.torob.com/backend-api/base/ima...[]88d90bf1-b3c9-4a6f-8abc-0605f0e8a008کنسول بازی سونی پلی استیشن 5 ظرفیت 825 گیگابایتhttps://api.torob.com/v4/base-product/details-.../p/88d90bf1-b3c9-4a6f-8abc-0605f0e8a008/%DA%A9...21299000۲۱٫۲۹۹٫۰۰۰ تومانactiveدر ژیوان
\n", "
" ], "text/plain": [ " image_url discount_info \\\n", "0 https://storage.torob.com/backend-api/base/ima... [] \n", "1 https://storage.torob.com/backend-api/base/ima... [] \n", "2 https://pcjanebi.com/wp-content/uploads/2021/0... [] \n", "3 https://storage.torob.com/backend-api/base/ima... [] \n", "4 https://storage.torob.com/backend-api/base/ima... [] \n", "\n", " random_key \\\n", "0 2e71cfe4-7b77-4e80-837b-de93b5fb95d7 \n", "1 9825c33c-07e6-4ac7-8bd6-12015ebb3e3c \n", "2 9687bb29-671b-42e4-986b-5f56ca3ade59 \n", "3 3d19000e-b1a3-48de-8ef2-70c23c89b534 \n", "4 88d90bf1-b3c9-4a6f-8abc-0605f0e8a008 \n", "\n", " name1 \\\n", "0 کنسول بازی سونی PlayStation 5 Digital پلی استی... \n", "1 کنسول بازی سونی PlayStation 5 Standard پلی اس... \n", "2 باندل کنسول بازی سونی PlayStation 5 Standard پ... \n", "3 پلی استیشن 5 دیجیتال به همراه بازی باندل 2 دسته \n", "4 کنسول بازی سونی پلی استیشن 5 ظرفیت 825 گیگابایت \n", "\n", " name2 \\\n", "0 Playstation 5 Digital Editoin \n", "1 Playstation 5 Standard Editoin \n", "2 Standard PlayStation 5 bundle+DualSense+Pulse+... \n", "3 \n", "4 \n", "\n", " more_info_url \\\n", "0 https://api.torob.com/v4/base-product/details-... \n", "1 https://api.torob.com/v4/base-product/details-... \n", "2 https://api.torob.com/v4/base-product/details-... \n", "3 https://api.torob.com/v4/base-product/details-... \n", "4 https://api.torob.com/v4/base-product/details-... \n", "\n", " web_client_absolute_url price \\\n", "0 /p/2e71cfe4-7b77-4e80-837b-de93b5fb95d7/%DA%A9... 20120000 \n", "1 /p/9825c33c-07e6-4ac7-8bd6-12015ebb3e3c/%DA%A9... 21700000 \n", "2 /p/9687bb29-671b-42e4-986b-5f56ca3ade59/%D8%A8... 29990000 \n", "3 /p/3d19000e-b1a3-48de-8ef2-70c23c89b534/%D9%BE... 23000000 \n", "4 /p/88d90bf1-b3c9-4a6f-8abc-0605f0e8a008/%DA%A9... 21299000 \n", "\n", " price_text price_text_mode shop_text \n", "0 از ۲۰٫۱۲۰٫۰۰۰ تومان active در ۱۰۱ فروشگاه \n", "1 از ۲۱٫۷۰۰٫۰۰۰ تومان active در ۱۳۶ فروشگاه \n", "2 از ۲۹٫۹۹۰٫۰۰۰ تومان active در ۵ فروشگاه \n", "3 ۲۳٫۰۰۰٫۰۰۰ تومان active در مرکزخریددات کام \n", "4 ۲۱٫۲۹۹٫۰۰۰ تومان active در ژیوان " ] }, "metadata": {}, "execution_count": 238 } ], "metadata": {} } ], "metadata": { "kernelspec": { "name": "python3", "display_name": "Python 3.8.10 64-bit" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.10" }, "interpreter": { "hash": "916dbcbb3f70747c44a77c7bcd40155683ae19c65e1c03b4aa3499c5328201f1" } }, "nbformat": 4, "nbformat_minor": 4 }