botextract


The challenge and the solution

The challenge

The web is already bigger and growing faster than any other source of business intelligence and information management data. However, extracting clean and meaningful data from the web remains a challenge.

The problem with web data extraction is that the web consists of unstructured (unordered and unsorted) data. Every web site has a different structure and presentation. The data on web sites is presented for humans, not machines. Every web site can change its content, structure and presentation at any time.
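To make the last point concrete, here is a minimal sketch of how a small change in presentation can break an extractor. It assumes a hypothetical page that marks prices with a "price" class. The class names, the markup and the extract_prices helper are illustrative assumptions and are not taken from any real site or product.

```python
from html.parser import HTMLParser


class PriceExtractor(HTMLParser):
    """Collects the text of every element whose class attribute is 'price'."""

    def __init__(self):
        super().__init__()
        self._capturing = False
        self.prices = []

    def handle_starttag(self, tag, attrs):
        # Start capturing when the (hypothetical) 'price' class appears.
        if dict(attrs).get("class") == "price":
            self._capturing = True

    def handle_endtag(self, tag):
        # Stop capturing at the next closing tag.
        self._capturing = False

    def handle_data(self, data):
        if self._capturing and data.strip():
            self.prices.append(data.strip())


def extract_prices(html):
    """Return all price strings found in the given HTML snippet."""
    parser = PriceExtractor()
    parser.feed(html)
    return parser.prices


# Hypothetical page before and after a redesign: same data, new markup.
OLD_LAYOUT = '<div><span class="price">19.99</span></div>'
NEW_LAYOUT = '<div><b class="amount">19.99</b></div>'

print(extract_prices(OLD_LAYOUT))  # ['19.99']
print(extract_prices(NEW_LAYOUT))  # []
```

The price is still visible to a human visitor after the redesign, but the extractor silently returns nothing. That is why automated extraction needs ongoing monitoring and maintenance.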

There are many standalone software packages and web-based solutions that offer do-it-yourself point-and-click interfaces to create small pieces of software (usually called "agents") to collect data from the web. Their vendors often promise that users do not need any technical expertise to create these agents. The users just highlight the data that they want to extract, and the agent will magically learn what to collect and how to collect it (some vendors even promise "artificial intelligence"!). Once created, these agents can then run at scheduled intervals. This do-it-yourself approach sounds perfect in theory, but sadly, it is too good to be true. It fails in all but the most basic scenarios. The reason is simply that real-life web data is almost never presented as cleanly and reliably as in prefabricated demonstrations. Real-life web data keeps changing, making it a moving target. On top of that, there are plenty of technical challenges, such as JavaScript and CAPTCHAs.

Of course, you can always extract web data manually. However, apart from the obvious high labour costs, this manual cut-and-paste approach is slow and error-prone. Due to its tedious and repetitive nature, it is also a drain on your human resources.

You could build up an in-house technical team for automated web data extraction. However, it is difficult and expensive to hire and retain qualified and experienced Information Technology staff with expertise in this technical niche. It also requires considerable time and money to build up such in-house capacity.

The solution

Botextract does the web data extraction work for you. We have the expertise to create reliable solutions for you.

  • You don't need expertise in web data extraction.
  • You don't have to invest in expensive infrastructure.
  • You don't have to employ administrative or Information Technology staff for web data extraction.
  • You can free administrative staff from tedious, error-prone, demotivating and expensive manual data collection activities.
  • We run your web data extraction in reliable, secure, and scalable cloud services on your schedule.
  • We take care of all technical difficulties, such as JavaScript-based web sites, IP address blocking, or CAPTCHAs. We protect your identity at all times.
  • We maintain and update your web data extraction when required, for example when the data sources change. We make you an offer for the maintenance and the ongoing costs.
  • You can download the extracted web data from us, or we can deliver it to you via electronic data transfer or email. We can deliver your data in the format of your choice, for example as Microsoft Excel files, text files (CSV), Adobe PDF documents, or XML. We can also store your data directly in cloud storage or database services, such as Microsoft Azure SQL Database.

The next step

Use the contact form or send us an email and tell us what you would like to extract from the web, where you want to collect the data from, and how you want the data delivered to you.

We will make you an offer that includes the maintenance and ongoing costs. We will even provide a free proof of concept where appropriate. No kidding, we really mean it: free of charge and without any obligation on your part.