You can ask for help or information either by email at firstname.lastname@example.org or on the IRC channel #redpluck on the Freenode network. We'll be adding other ways to contact us in the future.
Check our YouTube channel for scraping tutorials.
- Can I extract data from multiple pages?
- Can I extract data from AJAX-based websites?
- Data behind the login page
- How do I extract data?
- Is data extraction legal?
What is ReDPluck?
ReDPluck is a service that lets you extract complex data from any website. After extraction, your data is available as a file for downloading or through an API with various helpful methods, including formatting: displaying data in JSON, XML or CSV format.
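To make the three output formats concrete, here is a minimal sketch that serializes the same small list of items as JSON, CSV and XML using only the Python standard library. The item fields (`title`, `url`) and the helper names are made up for illustration and say nothing about how ReDPluck itself names or structures its output.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# Hypothetical extracted items; the field names are invented for this example.
items = [
    {"title": "First link", "url": "http://example.com/a"},
    {"title": "Second link", "url": "http://example.com/b"},
]


def to_json(rows):
    """Serialize the rows as a JSON array of objects."""
    return json.dumps(rows, indent=2)


def to_csv(rows):
    """Serialize the rows as CSV, with a header taken from the first row's keys."""
    if not rows:
        return ""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def to_xml(rows):
    """Serialize the rows as <items><item>...</item></items>."""
    root = ET.Element("items")
    for row in rows:
        item = ET.SubElement(root, "item")
        for key, value in row.items():
            ET.SubElement(item, key).text = str(value)
    return ET.tostring(root, encoding="unicode")


print(to_json(items))
print(to_csv(items))
print(to_xml(items))
```

CSV suits spreadsheets, while JSON and XML keep nested structure better, so the right choice depends on what will consume the data.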
What is an API?
An API, or Application Programming Interface, allows devices and software applications to communicate with each other and share data in one unified format. Extracting data from a website lets you access that data in a format that fits your needs. Furthermore, by setting an extraction schedule you can keep your data up to date. The API has a hard limit of 10,000 items per request.
An API can be private or public. A private API allows access only to those who are logged in to the service or who can provide a special private key. A public API opens access to your data to anyone.
Besides the API, you can get your data as a file. This is useful for large data dumps: since the API can return at most 10,000 items per request, you might find it easier to download the whole data set in one go.
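If you do want a large data set through the API rather than as a file, the 10,000-item limit means asking for it in several requests. The sketch below shows that general pattern only. It assumes the API can be asked for a slice by offset and limit, which this FAQ does not state, so `fetch_page` is a hypothetical stand-in that you would replace with a real request.

```python
MAX_ITEMS_PER_REQUEST = 10_000  # the per-request limit mentioned above


def fetch_all(fetch_page, limit=MAX_ITEMS_PER_REQUEST):
    """Collect all items by calling fetch_page(offset, limit) repeatedly.

    fetch_page is a hypothetical callable returning a list of at most
    `limit` items starting at `offset`. A short (or empty) page ends the loop.
    """
    if not 1 <= limit <= MAX_ITEMS_PER_REQUEST:
        raise ValueError("limit must be between 1 and 10,000")
    items = []
    offset = 0
    while True:
        page = fetch_page(offset, limit)
        items.extend(page)
        if len(page) < limit:
            return items
        offset += len(page)


# In-memory stand-in for the API so the sketch runs without a network.
dataset = list(range(25_000))


def fake_fetch(offset, limit):
    return dataset[offset:offset + limit]


all_items = fetch_all(fake_fetch)
print(len(all_items))
```

With 25,000 items this takes three requests, which is why a single file download is often simpler for full dumps.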
Can I extract data from multiple pages?
Absolutely! While selecting fields for extraction, all you need to do is select either the whole pagination element or just the link leading to the next page. We will try to extract all data unless you set a limit on how many pages you need.
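The idea of following the next-page link until the pages run out, or until a page limit is reached, can be sketched as below. This is an illustration of the general technique, not ReDPluck's actual implementation. `load_page` is a hypothetical callable that returns the items on a page together with the URL of the next page, and the small `site` dictionary stands in for a real website.

```python
def crawl(start_url, load_page, max_pages=None):
    """Follow next-page links from start_url and collect items from every page.

    load_page(url) is a hypothetical callable returning (items, next_url),
    where next_url is None on the last page. max_pages=None means no limit.
    """
    results = []
    seen = set()  # guards against pagination loops
    url = start_url
    while url and url not in seen:
        if max_pages is not None and len(seen) >= max_pages:
            break
        seen.add(url)
        items, next_url = load_page(url)
        results.extend(items)
        url = next_url
    return results


# In-memory stand-in for a paginated site: url -> (items, next page url).
site = {
    "/list?page=1": (["a", "b"], "/list?page=2"),
    "/list?page=2": (["c", "d"], "/list?page=3"),
    "/list?page=3": (["e"], None),
}

print(crawl("/list?page=1", site.__getitem__))
print(crawl("/list?page=1", site.__getitem__, max_pages=2))
```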
Can I extract data from AJAX-based websites?
Some you can, some you can't. It all depends on how the particular website is structured.
Data behind the login page
Not yet. We will be working on it in the future.
How do I extract data?
The general idea behind extraction is as follows. You provide the URL of the page you want to scrape. We then load that page, and you can start picking the items you're interested in. For example, you can click on the first link in a large list of links. We will then show you a popup where you can configure this field, including its name and a custom regular expression. In the same popup you can choose to select other similar elements, which lets us scrape the page not as a single item but as a list of items, each with its own properties.
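As an illustration of what a custom regular expression on a field can do, the sketch below applies one expression to the raw text of the same field across a list of items. The sample texts, the expression and the rule for choosing between a capture group and the whole match are assumptions made for this example. The regular expression syntax ReDPluck accepts may differ from Python's.

```python
import re

# Hypothetical raw text of one field, picked from each item in a list.
raw_prices = ["Price: $19.99", "Price: $5.00 (sale)", "Out of stock"]

# Hypothetical custom expression: capture the amount that follows a dollar sign.
PRICE_RE = re.compile(r"\$(\d+(?:\.\d{2})?)")


def apply_field_regex(text, pattern):
    """Return the first capture group (or the whole match), or None if no match."""
    match = pattern.search(text)
    if match is None:
        return None
    return match.group(1) if pattern.groups else match.group(0)


prices = [apply_field_regex(text, PRICE_RE) for text in raw_prices]
print(prices)  # ['19.99', '5.00', None]
```

Returning `None` for items that don't match keeps the list aligned with the original items, so a missing value is visible instead of silently dropped.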
Is data extraction legal?
Yes, it is legal if the website you're scraping is public. Search engines such as Google and Yahoo operate in a similar way. Websites often have a small text file called robots.txt (http://example.com/robots.txt). This file lists the sections of the website that shouldn't be accessed by search engines and scrapers. It is a good idea to check this file before you start extracting data. In other words, it is up to you to determine whether it's legal to use this service on certain pages.
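One way to check a robots.txt file before extracting data is Python's standard `urllib.robotparser` module. The sketch below parses a made-up robots.txt from a string so it runs without network access. The rules, the user agent name `my-scraper` and the page URLs are all invented for illustration.

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt content; a real file lives at http://example.com/robots.txt
robots_txt = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# "my-scraper" is a hypothetical user agent name.
blocked = parser.can_fetch("my-scraper", "http://example.com/private/report.html")
allowed = parser.can_fetch("my-scraper", "http://example.com/products.html")
print(blocked, allowed)  # False True
```

To check a live site instead, call `parser.set_url("http://example.com/robots.txt")` followed by `parser.read()` in place of `parser.parse(...)`. Note that robots.txt only states the site owner's wishes about crawling; it does not settle the legal question on its own.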