Data Understanding:
The dataset contains 1,000 rows and 17 columns. The columns are:
- Invoice ID: A unique identifier for each transaction.
- Branch: The branch of the supermarket where the transaction occurred.
- City: The city where the branch is located.
- Customer type: Whether the customer was a member or a normal customer.
- Gender: The gender of the customer.
- Product line: The category of the product.
- Unit price: The price of a single unit of the product.
- Quantity: The number of units of the product purchased.
- Tax: The amount of tax applied to the transaction.
- Total: The total amount of the transaction.
- Date: The date of the transaction.
- Time: The time of the transaction.
- Payment: The payment method used for the transaction.
- COGS: The cost of goods sold.
- Gross margin percentage: The gross margin expressed as a percentage of the transaction total.
- Gross income: The gross income from the transaction.
- Rating: The customer satisfaction rating for the transaction.
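A quick way to confirm the row and column counts and the uniqueness of Invoice ID is sketched below. This is a minimal illustration, not the code used for this report: the three rows are made up, only a few of the columns are included, and in practice the in-memory sample would be replaced by a pd.read_csv call on the actual file.

```python
import io
import pandas as pd

# Illustrative stand-in for the real file: three made-up rows with a few of
# the columns listed above. With the real data, read the CSV file instead.
SAMPLE_CSV = """Invoice ID,Branch,City,Unit price,Quantity,Total
INV-001,A,City X,10.00,2,21.00
INV-002,B,City Y,5.50,4,23.10
INV-003,A,City X,7.25,1,7.61
"""

df = pd.read_csv(io.StringIO(SAMPLE_CSV))

n_rows, n_cols = df.shape                    # (rows, columns)
print(f"{n_rows} rows, {n_cols} columns")
print(df.dtypes)                             # inferred type of each column

# Invoice ID is described as a unique identifier, so no value should repeat.
id_is_unique = df["Invoice ID"].is_unique
print("Invoice ID unique:", id_is_unique)
```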
Data Preparation:
Before conducting the analysis, the following data preparation steps were performed:
- The dataset was checked for missing values, and none were found.
- The dataset was checked for duplicate rows, and any that were found were removed.
- Date and Time columns were combined into a single column and converted to a datetime datatype.
- A new column was created for the day of the week of the transaction.
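The preparation steps above could be sketched in pandas roughly as follows. The sample rows, the date and time string formats, and the new column names "Datetime" and "Weekday" are assumptions made for illustration; the report does not state them.

```python
import pandas as pd

# Made-up rows for illustration; the date/time string formats are assumed.
df = pd.DataFrame({
    "Invoice ID": ["INV-001", "INV-002", "INV-002", "INV-003"],
    "Date": ["2019-01-05", "2019-01-06", "2019-01-06", "2019-01-07"],
    "Time": ["13:08", "10:29", "10:29", "18:30"],
    "Total": [21.00, 23.10, 23.10, 7.61],
})

# 1. Count missing values across all columns.
n_missing = int(df.isna().sum().sum())

# 2. Count fully duplicated rows, then drop them.
n_duplicates = int(df.duplicated().sum())
df = df.drop_duplicates().reset_index(drop=True)

# 3. Combine Date and Time into one datetime column (hypothetical name).
df["Datetime"] = pd.to_datetime(
    df["Date"] + " " + df["Time"], format="%Y-%m-%d %H:%M"
)

# 4. Derive the day of the week from the combined column (hypothetical name).
df["Weekday"] = df["Datetime"].dt.day_name()

print(n_missing, n_duplicates)
print(df[["Invoice ID", "Datetime", "Weekday"]])
```

Passing an explicit format to pd.to_datetime avoids ambiguity between day-first and month-first dates; the format string would need to match the actual file.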
Analysis:
- Sales Trend by Weekday:
The total sales amount for each day of the week was calculated and plotted in a bar chart. The analysis showed that the highest sales were on Saturday, followed by Sunday and Friday. Monday had the lowest sales.
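A minimal sketch of the weekday aggregation is shown below. It assumes a combined datetime column named "Datetime" (a hypothetical name) and uses made-up rows, so the numbers do not reproduce the results reported here.

```python
import pandas as pd

# Made-up transactions; "Datetime" is an assumed column name.
df = pd.DataFrame({
    "Datetime": pd.to_datetime([
        "2019-01-05 13:08",   # Saturday
        "2019-01-06 10:29",   # Sunday
        "2019-01-07 18:30",   # Monday
        "2019-01-12 09:15",   # Saturday
    ]),
    "Total": [21.00, 23.10, 7.61, 30.00],
})

WEEKDAY_ORDER = ["Monday", "Tuesday", "Wednesday", "Thursday",
                 "Friday", "Saturday", "Sunday"]

# Sum Total per weekday, then put the days in calendar order
# (days with no transactions in the sample are filled with 0).
sales_by_weekday = (
    df.assign(Weekday=df["Datetime"].dt.day_name())
      .groupby("Weekday")["Total"]
      .sum()
      .reindex(WEEKDAY_ORDER, fill_value=0.0)
)

busiest_day = sales_by_weekday.idxmax()
print(sales_by_weekday)
print("Highest sales on:", busiest_day)
```

With matplotlib installed, sales_by_weekday.plot(kind="bar") would draw the corresponding bar chart. Reindexing to calendar order keeps the bars from being sorted alphabetically.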
- Product Line Analysis:
The product lines were analyzed to identify the most popular product categories. The top three product lines by sales were Food and beverages, Fashion accessories, and Electronic accessories.
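Ranking product lines by sales amounts to a group-and-sort, sketched below. The totals are made up and the fourth product line is a hypothetical placeholder, so the output only illustrates the mechanics.

```python
import pandas as pd

# Made-up totals; "Other (hypothetical)" is a placeholder category.
df = pd.DataFrame({
    "Product line": [
        "Food and beverages", "Fashion accessories", "Food and beverages",
        "Electronic accessories", "Other (hypothetical)",
    ],
    "Total": [40.0, 25.0, 15.0, 20.0, 5.0],
})

# Sum sales per product line and keep the three largest.
sales_by_line = df.groupby("Product line")["Total"].sum()
top_three = sales_by_line.nlargest(3)

print(top_three)
```

Ranking by the Quantity column instead of Total would measure popularity by units sold rather than by revenue, and the two rankings need not agree.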
- Gender Analysis:
Sales were broken down by gender to see whether either gender favored particular product categories. Total sales were roughly equal between genders, and no product category showed a notable difference.
- Customer Type Analysis:
Sales were broken down by customer type to see whether members and normal customers favored different product categories. Members had higher sales than normal customers in every product category.
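Both the gender and the customer-type comparisons come down to a cross-tabulation of sales by product line and group. One possible sketch is given below; the helper name sales_by_group, the category labels ("Member", "Normal", "Female", "Male", "Line A", "Line B"), and all figures are assumptions for illustration only.

```python
import pandas as pd

# Made-up rows; labels and amounts are illustrative assumptions.
df = pd.DataFrame({
    "Customer type": ["Member", "Normal", "Member", "Normal", "Member", "Normal"],
    "Gender": ["Female", "Male", "Male", "Female", "Female", "Male"],
    "Product line": ["Line A", "Line A", "Line B", "Line B", "Line A", "Line B"],
    "Total": [30.0, 20.0, 25.0, 10.0, 5.0, 12.0],
})

def sales_by_group(frame: pd.DataFrame, group_col: str) -> pd.DataFrame:
    """Total sales per product line (rows), split by group_col (columns)."""
    return frame.pivot_table(
        index="Product line", columns=group_col,
        values="Total", aggfunc="sum", fill_value=0,
    )

by_type = sales_by_group(df, "Customer type")
by_gender = sales_by_group(df, "Gender")

# True only if members outsell normal customers in every product line.
members_lead_everywhere = bool((by_type["Member"] > by_type["Normal"]).all())

print(by_type)
print(by_gender)
print("Members lead in every line:", members_lead_everywhere)
```

Checking the comparison row by row, rather than on the overall totals, is what supports a statement about every product category.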
- Payment Method Analysis:
Sales were broken down by payment method to see which methods customers preferred. E-wallets were the most-used payment method, followed by cash and credit cards.
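Payment preference can be measured by the number of transactions or by the sales amount, and the two can rank methods differently. The sketch below computes both on made-up rows; the payment labels and the summary column names are assumptions.

```python
import pandas as pd

# Made-up rows; payment labels are assumed, not taken from the dataset.
df = pd.DataFrame({
    "Payment": ["Ewallet", "Cash", "Ewallet", "Credit card", "Cash", "Ewallet"],
    "Total": [10.0, 50.0, 12.0, 30.0, 20.0, 8.0],
})

# Transaction count and sales amount per payment method.
payment_summary = df.groupby("Payment")["Total"].agg(
    transactions="count", sales="sum"
)
payment_summary["share_of_transactions"] = (
    payment_summary["transactions"] / payment_summary["transactions"].sum()
)

most_used = payment_summary["transactions"].idxmax()   # by number of transactions
highest_sales = payment_summary["sales"].idxmax()      # by sales amount

print(payment_summary)
print("Most used:", most_used, "| Highest sales:", highest_sales)
```

In this made-up sample the most-used method and the method with the highest sales differ, which is why it helps to state which measure a preference claim rests on.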
Conclusion:
The analysis of the Supermarket Sales dataset provided insights into customer purchasing behavior, popular product categories, and sales trends. Sales peaked on weekends; the top product categories by sales were Food and beverages, Fashion accessories, and Electronic accessories; and e-wallets were the most-used payment method. Members also had higher sales than normal customers in every product category.