Please note: I worked on this project for two months as a Research Assistant for Empirical Marketing Professor Anita Rao at the University of Chicago Booth School of Business. Unfortunately, time constraints from my class schedule, part-time internship, and extracurricular commitments kept me from finishing the project.
Here is a link to the GitHub repo containing the code I used to parse user web-browsing data.
Firms sometimes make selective or deceptive claims. Such claims can have negative consequences for consumers, especially if consumers are not fully informed and the claims are hard to verify. This paper aims to measure the decline in demand that a firm making such claims faces when caught.
Rao, Wang: Demand for ‘Healthy’ Products: False Claims and FTC Regulation
In 2011, the Federal Trade Commission (FTC) went after 10 companies purporting to sell “acai-berry weight-loss supplements” that could help consumers lose weight rapidly. While the FDA does not regulate supplement marketing, the FTC based its sting operation around “fake news websites” these companies created that mimicked real news media outlets. These fake news websites contained affiliate links redirecting to the weight loss supplement websites. The redirects deceived consumers into thinking that the weight loss supplements were supported and tested by mainstream news outlets.
The research aimed to understand how effective the sting operation was at protecting consumers against fraudulent marketing companies. To do so, we used Nielsen data that contained monthly user browsing histories across hundreds of devices accessing these fake news websites.
User browsing history files (~2GB each) are large and difficult to work with. I created several Python scripts to help me with the process, and gained proficiency using Python’s Pandas library.
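As a rough illustration of the general approach (not the actual scripts, which live in the repo), a file of that size can be streamed through Pandas in chunks so it never has to fit in memory at once. This is a minimal sketch under assumptions: the column names `device_id`, `domain`, and `timestamp`, the function `filter_browsing_log`, and the domains in `WATCHED_DOMAINS` are all hypothetical placeholders, and a small in-memory string stands in for the real file.

```python
import io

import pandas as pd

# Hypothetical watch list; placeholder names on the reserved .example TLD.
WATCHED_DOMAINS = {"fake-news.example", "health-report.example"}


def filter_browsing_log(source, domains, chunksize=100_000):
    """Stream a large CSV in chunks, keeping rows whose domain is in `domains`."""
    matches = []
    # With chunksize set, read_csv yields DataFrames one chunk at a time
    # instead of loading the whole file into memory.
    for chunk in pd.read_csv(source, chunksize=chunksize):
        # Normalize the (assumed) `domain` column before comparing.
        normalized = chunk["domain"].astype(str).str.strip().str.lower()
        matches.append(chunk[normalized.isin(domains)])
    if not matches:
        return pd.DataFrame()
    return pd.concat(matches, ignore_index=True)


# Tiny in-memory stand-in for a ~2GB browsing history file.
sample = io.StringIO(
    "device_id,domain,timestamp\n"
    "1,Fake-News.example,2011-01-03\n"
    "2,unrelated.example,2011-01-04\n"
    "3,health-report.example,2011-01-05\n"
)

hits = filter_browsing_log(sample, WATCHED_DOMAINS, chunksize=2)
# Count visits per watched domain, ignoring case differences.
visits_per_domain = hits["domain"].str.lower().value_counts()
```

For a real file, `source` would be a file path, and the chunk size could be tuned to the memory available on the machine.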
I also used Bloomberg Law to research each of the 10 lawsuits in depth to understand which fake news domains I needed to look out for. During this time, I was curious about the prospect of law school and had taken the LSAT, so I found reading court briefs to be rather fun.
I didn’t commit the raw .CSV data to the GitHub repo because of its sheer size (2GB), but if you’re interested, you can see the parsed data for January 2011 here.
Here’s the link to that GitHub repo again if you’re curious to see some of the code. My biggest takeaways from this project were learning how to work with large datasets, quickly reading court case briefs for important information (think Mike Ross from Suits USA), and creating an infrastructure to parse those datasets for meaningful analysis.