Juice Shop

Assume you’ve opened a “juice shop”, and you want to improve your business.



If you are smart enough, you’ll start recording sales data to understand your business.

What kind of data you’ll record?

Collecting Data

  • Temporal data

  • Text data

    • text

    • categorical

  • Numerical data

    • Continuous

    • Discrete

    • Categorical

Sorting Data

Sorting by Sales

Or based on Flyers

Outliers:

  • Data points far away from others

  • Outliers can largely affect the analysis.

  • Outliers might be mistakes or very rare

Filtering

We can filter data based on any criteria on any of the fields.



  • Day=’Sat’ or ‘Sun



  • Temperature < 30

Drive values from existing data

  • You can do any kind of calculation on any field

    • Change temperature from C° to F°

    • Adding a Month field

  • Or generate a new field by combining already existing fields

    • Revenue: Sales*Price



Aggregating data

  • We can use aggregating functions (e. g., sum) to summarize data and get feel as a whole.



  • Count, Distinct Count, Sum, Min, Max



Highlighting Data

Interpreting numbers in large tables is difficult.

  • We can use heatmaps to visualize the scale of values





  • We can use “data bars” to visualize the scale of values





  • We can highlight individual values that fall within sum criteria:

    • e. g., top 30% (good days) and less 30% (bad days) Revenues



Grouping Data

It is common to group data by categorical fields and compute subtotal values



On more than one field



Is Price=50 always better?

Visualization

  • Line plot
  • Column chart
  • Joint column chart
  • Scater plot

Statistical analysis

Statistics is the core of data science.

Using Statistics:

  • You can see the distribution of the data

  • How much variance there is between values

  • How changes in one feature affect values of other features

The first point to start is descriptive statistics