E-news Express.

Background: 

An online news portal aims to expand its business by acquiring new subscribers. Every visitor to the website takes certain actions based on their interests. The company plans to analyze these interests and wants to determine whether a new feature will be effective. Companies often compare users' responses to two variants of a product to decide which variant is more effective. This experimental technique, known as A/B testing, is used to determine whether a new feature attracts users, as measured by a chosen metric.

Suppose you are hired as a Data Scientist at E-news Express. The company's design team has created a new landing page, and you have been tasked with deciding whether it is more effective than the old one at gathering new subscribers. Suppose you randomly selected 100 users and divided them equally into two groups. The old landing page is served to the first group (control group) and the new landing page is served to the second group (treatment group). Various data about the users in both groups are collected in 'abtest.csv'. Perform a statistical analysis of the collected data to answer the following questions.

Objective:

  1. Do users spend more time on the new landing page than on the old landing page?

  2. Is the conversion rate (the proportion of users who visit the landing page and get converted) for the new page greater than the conversion rate for the old page?

  3. Does the converted status depend on the preferred language? [Hint: Create a contingency table using the pandas.crosstab() function]

  4. Is the mean time spent on the new page the same for users of each preferred language?
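The four questions above map naturally onto standard hypothesis tests. The sketch below is one plausible pairing, not necessarily the one used in the original analysis: a one-sided Welch t-test for question 1, a one-sided two-proportion z-test for question 2, a chi-square test of independence for question 3, and a one-way ANOVA for question 4. Because 'abtest.csv' is not reproduced here, the code runs on synthetic data. The value encodings ("new"/"old", "yes"/"no", the three language names) and the helper name run_tests are assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Synthetic stand-in for abtest.csv: 50 control rows and 50 treatment rows.
# All values below are made up; only the column names come from the data dictionary.
rng = np.random.default_rng(0)
n = 50
df = pd.DataFrame({
    "landing_page": ["old"] * n + ["new"] * n,
    "time_spent_on_the_page": np.concatenate(
        [rng.normal(4.5, 2.0, n), rng.normal(6.0, 2.0, n)]
    ).clip(min=0.1),
    "converted": rng.choice(["yes", "no"], size=2 * n),  # assumed encoding
    "language_preferred": rng.choice(["English", "French", "Spanish"], size=2 * n),
})


def run_tests(frame: pd.DataFrame) -> dict:
    """Return one p-value per objective (hypothetical helper)."""
    new = frame[frame["landing_page"] == "new"]
    old = frame[frame["landing_page"] == "old"]

    # Q1: one-sided Welch t-test, H1: mean time on new page > mean time on old page.
    _, p_time = stats.ttest_ind(
        new["time_spent_on_the_page"],
        old["time_spent_on_the_page"],
        equal_var=False,
        alternative="greater",
    )

    # Q2: one-sided two-proportion z-test with a pooled standard error,
    # H1: conversion rate on new page > conversion rate on old page.
    x_new = (new["converted"] == "yes").sum()
    x_old = (old["converted"] == "yes").sum()
    n_new, n_old = len(new), len(old)
    pooled = (x_new + x_old) / (n_new + n_old)
    se = np.sqrt(pooled * (1 - pooled) * (1 / n_new + 1 / n_old))
    z = (x_new / n_new - x_old / n_old) / se
    p_conv = stats.norm.sf(z)

    # Q3: chi-square test of independence on the converted x language table.
    table = pd.crosstab(frame["converted"], frame["language_preferred"])
    _, p_lang, _, _ = stats.chi2_contingency(table)

    # Q4: one-way ANOVA on time spent on the new page, grouped by language.
    groups = [
        g.to_numpy()
        for _, g in new.groupby("language_preferred")["time_spent_on_the_page"]
    ]
    _, p_anova = stats.f_oneway(*groups)

    return {
        "time_new_gt_old": float(p_time),
        "conversion_new_gt_old": float(p_conv),
        "converted_vs_language": float(p_lang),
        "time_by_language_new_page": float(p_anova),
    }


results = run_tests(df)
for name, p_value in results.items():
    print(f"{name}: p = {p_value:.4f}")
```

Each p-value would then be compared with a significance level chosen in advance (0.05 is a common choice). The t-test and ANOVA rest on approximate normality, and ANOVA also on similar variances across groups, so those assumptions are worth checking on the real data before relying on the results.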

Data Dictionary:

  1. user_id - This represents the user ID of the person visiting the website.

  2. group - This represents whether the user belongs to the first group (control) or the second group (treatment).

  3. landing_page - This represents whether the landing page is new or old.

  4. time_spent_on_the_page - This represents the time (in minutes) spent by the user on the landing page.

  5. converted - This represents whether the user gets converted to a subscriber of the news portal or not.

  6. language_preferred - This represents the language chosen by the user to view the landing page.
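Since the experiment serves the old page to the control group and the new page to the treatment group, a quick consistency check on group and landing_page can catch mislabeled rows before any testing. The following is a minimal sketch on a few hypothetical rows; the exact value encodings ("control", "treatment", "old", "new") and the helper name find_mismatches are assumptions.

```python
import pandas as pd

# Hypothetical rows using three of the columns from the data dictionary.
df = pd.DataFrame({
    "user_id": [101, 102, 103, 104],
    "group": ["control", "treatment", "control", "treatment"],
    "landing_page": ["old", "new", "old", "new"],
})


def find_mismatches(frame: pd.DataFrame) -> pd.DataFrame:
    """Return rows whose landing page does not match their group."""
    expected_page = frame["group"].map({"control": "old", "treatment": "new"})
    return frame[expected_page != frame["landing_page"]]


mismatches = find_mismatches(df)
ids_unique = df["user_id"].is_unique  # each user should appear only once

# A group x landing_page table should have counts only on the matching cells.
print(pd.crosstab(df["group"], df["landing_page"]))
print("mismatched rows:", len(mismatches), "| unique ids:", ids_unique)
```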

Data Transformation:

The data transformations applied to this dataset include:

  1. Type Conversion: Columns group, landing_page, converted, and language_preferred were initially stored as objects (strings) and were converted to categorical types to optimize memory usage and simplify analysis.

  2. Data Preparation: The dataset was read into a DataFrame, and column data types were corrected where necessary. Each column's non-null count and types were verified to ensure data consistency before proceeding with further analysis.
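The two steps above can be sketched in pandas as follows. This is an illustration rather than the original notebook code: the small DataFrame stands in for the result of reading 'abtest.csv', and its values are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for pd.read_csv("abtest.csv").
df = pd.DataFrame({
    "user_id": [1, 2, 3, 4],
    "group": ["control", "treatment", "control", "treatment"],
    "landing_page": ["old", "new", "old", "new"],
    "time_spent_on_the_page": [3.5, 7.1, 4.2, 5.9],
    "converted": ["no", "yes", "no", "yes"],
    "language_preferred": ["English", "French", "Spanish", "English"],
})

# Type conversion: string columns with few distinct values become categoricals.
cat_cols = ["group", "landing_page", "converted", "language_preferred"]
df = df.astype({col: "category" for col in cat_cols})

# Data preparation check: dtypes and non-null counts per column.
df.info()
non_null_counts = df.notna().sum()
```

The memory saving from categorical types grows with the number of rows, so it is negligible on a frame this small.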

Conclusion:

It is recommended to adopt the new landing page as the primary interface for subscribers, as it showed a higher conversion rate than the old design. The data also show a positive correlation between time spent on the page and conversion, which suggests that future designs should aim to increase engagement through interactive or valuable content that encourages longer visits. Additionally, since language preference does not significantly affect conversion, resources may be better spent on improving the overall page design than on language-specific modifications.

The analysis indicates that the new landing page performs better in terms of user engagement and conversions. Therefore, implementing the new design is likely to boost subscriber numbers, particularly if the focus remains on maintaining engaging content that encourages users to spend more time on the site.