In this case analysis, I examine data scraping and whether it is good or bad based on specific
cases. In his article, Tan notes that LinkedIn tries its hardest to prohibit scraping of its
site, but not all companies can properly monitor and prevent it. This is because some scraping
tools either limit the number of accounts they scrape or spread out the time between the
accounts they scrape.
Although it is not illegal to scrape information that is public, LinkedIn still tries to protect
its users by banning anyone it catches scraping. In the case discussed in the article, LinkedIn
ultimately won because it had sent a cease and desist letter that the scraper did not follow.
In this case analysis I will argue that Kantian deontology shows us that the scraping company
should not have used the scraped data, because doing so could put the users whose data was
collected at risk (Tan 2024).
In his paper, Zimmer describes what happened when a group of researchers used scraped Facebook
data for a research project. The researchers had permission from both Facebook and the
university to use the information, but after the data they used was posted publicly, it became
clear exactly which university the information had come from, and this put the students whose
information was used at risk. Because of this, Zimmer identifies several issues with the T3
project, including the nature of consent and the failure of anonymization.
As for the nature of consent, although the researchers had permission from Facebook and the
university, they made no further effort to obtain consent from the account holders themselves,
assuming that the profile information was free for the public to see. As for anonymization, the
information was supposed to be anonymous, but many people easily guessed the university from
details included in the research, such as specific majors that only that school offers.
According to Zimmer, some of the personal information was accessible only because members of
the research team were themselves from Harvard. Because they were part of the community they
were scraping, they had more access to the profiles than the public would, so they collected
information that was not supposed to be included in the report. This type of incident can
happen anywhere when the people doing the scraping have more access to the profiles than they
need, which allows more personal information to be extracted. This is why I believe that using
scraped information is dangerous: it opens the door to incidents like this. Even if the
researchers kept and used the information, they still should have asked the students whether
it was okay to use it (Zimmer 2010).
From the standpoint of Kantian deontology, I believe the same conclusion holds: not using the
scraped data would have been better. Kantian deontologists hold that everyone has a moral duty
to respect others and to act with good will and good intentions. On that view, using
information pulled from a collection of profiles, public or not, would not be permitted without
the express permission of each person whose profile it is. Legally, however, that is not the
case, because scraping is allowed for any publicly available information. Using anyone’s
information without their permission, especially when the information is not as anonymous as
it appears, is dangerous for everyone, and given deontology’s emphasis on respect and good
will, I believe Kantian deontologists would agree with me.
In her chapter, O’Neil writes about the zero-tolerance policy, which was popular mostly in
political battles in bigger states like New York. The policy was meant to promote no tolerance
for any criminal or illegal act. O’Neil contrasts it with areas that have a smaller police
presence, where the police focus mainly on Part 1 crimes, the more violent offenses such as
murder, rape, and assault, and pay little attention to smaller crimes. Even though using
PredPol to track the areas where Part 1 crimes are most prevalent has seen a reduction in those
crimes, it does not count the crimes that are left out of the searches, including crimes
committed by the rich, such as fraud. I commend the fact that using the system to track certain
crimes helps in some ways; however, it leaves the rates of the crimes it does not look for
unaffected, and those rates may even be rising (O’Neil 2016).
When it comes to scraped data, I believe that websites like LinkedIn that do not want any
scraping on their sites should be able to adopt a no-tolerance, or zero-tolerance, policy.
As Tan mentions, some scraped material does end up being used by scammers to contact people by
email or message whom they would not otherwise be able to reach (Tan 2024). Even where scraped
data is used for good purposes, as with the T3 research in Zimmer’s article, it would be better
for the researchers to get permission from the users to use the data, at which point it would
not necessarily be scraping (Zimmer 2010).
In Kantian deontology, the right course of action is the good, respectful one. So when weighing
the good and the bad that can come from scraping and using data, it is better to choose the
good. In this case, that means having a zero-tolerance policy on data scraping and, as Zimmer
argues, ensuring that there is proper consent and that no personal information is shared or
used when working with scraped data (O’Neil 2016; Zimmer 2010). In the LinkedIn case mentioned
earlier, the company would prefer that scraping of its site be prohibited altogether, so why
not allow it to do that (Tan 2024)?
If deontology holds that you should be respectful to everyone because it is your duty as a
human being, then LinkedIn should be able to ban data scraping, as it would like to. Even if a
zero-tolerance policy were available, not all sites would apply it, so there would still be
plenty of places from which researchers, and even potential scammers, could take information
(O’Neil 2016). Overall, I believe that allowing the option of a no-tolerance policy on scraping
would benefit sites like LinkedIn, and that allowing that option would be seen as a good and
respectful thing in the deontologist’s eyes.
Realistically, if you scrape data with permission from the specific users the information
belongs to, then there are few issues with using scraped data, as long as no sensitive or
personal information is scraped or shared. However, that is not how scraping works today:
according to Tan’s article on LinkedIn’s attempts to sue over scraping, anything that is public
is free to be scraped (Tan 2024). So, based on the reasons drawn from Tan, Zimmer, O’Neil, and
the Kantian deontologist’s view on respecting others, I believe everyone should agree that,
under the rules that govern scraping today, data should not be scraped or used the way it has
been.
Sources
O’Neil, Cathy. “Civilian Casualties: Justice in the Age of Big Data.” Weapons of Math
Destruction, Crown, New York, 2016.
Tan, Jason. “The Fine Line of LinkedIn Data Scraping: Legality, Consequences, and Best
Practices.” Engage AI, Engage AI, 22 Feb. 2024,
engage-ai.co/linkedin-data-scraping-legality-consequences-best-practices/.
Zimmer, Michael. “‘But the Data Is Already Public’: On the Ethics of Research in Facebook.”
Ethics and Information Technology, vol. 12, no. 4, 4 June 2010, pp. 313–325,
https://doi.org/10.1007/s10676-010-9227-5.