Is it illegal to collect data with a web crawler?
It may be. Generally speaking, downloading data with a crawler is not in itself illegal, because the crawler retrieves the same data that users see when they open the page on the website. However, under certain conditions, forcibly collecting data from a website carries legal risk; for example, it may infringe privacy.

Web crawlers are programs that follow set rules and paths, simulate manual operation, and extract and store data from platforms presented through terminals such as websites and applications. With the development of big data and related technologies, the influence of web crawlers has grown: they are used not only for scraping data but even for ticket grabbing, account theft, and attacks on computer systems, which has gradually brought them into the public eye. This has also prompted discussion about where crawler technology crosses the line into infringement. There are many kinds of crawlers.

For example, by system structure and implementation technique, crawlers can be divided into general web crawlers (which crawl all content on the network, regardless of priority), focused web crawlers (which crawl only pages related to preset topics), incremental web crawlers (which crawl only new or changed pages), and deep web crawlers (which visit deep pages). The crawlers we usually encounter are used to collect data. Such a crawler really does two things:

1. Get the source code of the web page;

2. Parse and extract the required data from the web page source code.

Many anti-crawler techniques target the first task: they prevent the crawler from getting the source code. Once you have the source code, there are many ways to parse it and extract data. It is fair to say that once you have the source code, most of the crawler's work is done.
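The two steps above can be sketched with the Python standard library alone. This is a minimal illustration, not a complete crawler: the names fetch_source, LinkExtractor and extract_links, the User-Agent string, and the choice of links as "the required data" are all assumptions made for the example.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


def fetch_source(url, timeout=10.0):
    """Step 1: download the page and return its source code as text."""
    # The User-Agent value is a placeholder chosen for this example.
    request = Request(url, headers={"User-Agent": "example-crawler/0.1"})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


class LinkExtractor(HTMLParser):
    """Step 2: collect the href of every <a> tag found in the source."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(source):
    """Parse page source and return the list of link targets, in order."""
    parser = LinkExtractor()
    parser.feed(source)
    parser.close()
    return parser.links
```

A caller would chain the two steps, for example extract_links(fetch_source(url)). The parsing step needs no network access, which is why it can run on any source text once step 1 has succeeded.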

How to improve the efficiency of a web crawler

1. Increase the crawler's request frequency and get past the verification that some websites impose. Websites generally verify visitors with a verification code (CAPTCHA) or by requiring the user to log in.

2. Run the crawler with multiple threads on a computer with enough memory. Also route requests through proxy IPs, and make sure those proxies stay reliably online. This is a good way to improve efficiency.
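The multithreading and proxy advice above might look like the following sketch, again using only the Python standard library. The names make_fetcher and crawl_all, the default of 8 worker threads, and the 10-second timeout are assumptions for illustration; any proxy address passed in is supplied by the caller.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import ProxyHandler, build_opener


def make_fetcher(proxy_url=None, timeout=10.0):
    """Return a fetch(url) function, optionally routed through a proxy."""
    handlers = []
    if proxy_url:
        # Send both HTTP and HTTPS requests through the same proxy.
        handlers.append(ProxyHandler({"http": proxy_url, "https": proxy_url}))
    opener = build_opener(*handlers)

    def fetch(url):
        with opener.open(url, timeout=timeout) as response:
            return response.read()

    return fetch


def crawl_all(urls, fetch, max_workers=8):
    """Fetch every URL on a thread pool; map each URL to its body or its error."""
    urls = list(urls)

    def safe_fetch(url):
        try:
            return fetch(url)
        except Exception as exc:  # one failed page should not stop the rest
            return exc

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so zip pairs each URL with its result.
        return dict(zip(urls, pool.map(safe_fetch, urls)))
```

Threads suit this workload because the crawler spends most of its time waiting on the network rather than computing. Passing fetch in as a parameter keeps the thread pool independent of how a page is downloaded, so a proxied fetcher and a direct one can be swapped freely.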

Legal basis:

Civil Code of the People's Republic of China

Article 110

Natural persons enjoy the rights to life, body, health, name, portrait, reputation, honor, privacy, and marital autonomy. Legal persons and unincorporated organizations enjoy the rights to name, reputation, and honor.