To understand how packet sniffing applies to web analytics, I interviewed Ivo Rehberger, founder, Development Manager and main software architect of Nextwell. He has a doctoral degree in technical cybernetics and has published many articles in technical journals and e-zines. At Nextwell he is responsible for research and development.

Nextwell sells a very interesting product called Clipen, which is extremely useful for analytics purposes.

JUAN: What is a packet sniffer?

IVO: Generally, a packet sniffer is an application (or sometimes an appliance) that captures packets from a network and stores them on a hard drive. The term “sniffer” probably originates from the fact that the application usually runs in promiscuous mode, in which every packet present “on the wire” is captured and stored. Packet sniffers are known to be misused to capture user passwords or other private information, especially when these are sent as plain text, which is still common these days. Nevertheless, a packet sniffer is a useful tool for inspecting network communication, analyzing network or network application errors, and monitoring network application performance, and, last but not least, it is a very efficient source of input data for web analytics.

JUAN: How does this technology apply to analytics?

IVO: A packet sniffer is placed between the client’s computer and the server machine, and thus it has the unique ability to get complete information about the interaction between users and the website’s server application. Packet sniffing as discussed here is just the principle for acquiring the information needed for web analytics; this does not mean every sniffing application is automatically suitable for this purpose. Speaking concretely about our clickstream processing engine, called Clipen, its main mission is to provide high-quality clickstream data that are completely processed and sessionized in near real-time. We are fully aware of how much the results of an analysis depend on the quality of the input data, and this awareness (together with scalability) is reflected throughout the entire design of the technology. From an architectural point of view, the engine acts as an infrastructure component of web analytics and data warehousing solutions. It is the most hidden yet most important component for achieving optimal analytical results based on comprehensive, high-quality input data available in near real-time. To support this, the Clipen technology addresses both the procedural and the performance aspects of clickstream data processing.

JUAN: What advantages and/or disadvantages does it have over page tagging or log analyzers?

IVO: Web logs are by nature an inaccurate and incomplete data source, and integrating logs from several machines of the same website may be a terrible task with uncertain results. Page tags, on the other hand, must be embedded into each page you want to track, and this task may be really painful. Tags also tend to under-measure traffic. Also, not every online company wants its fine-tuned e-commerce application to be affected by tagging. On Nextwell’s website I published a comparison of the data collection methods. The main difference that characterizes packet sniffing as a data collection method is that we have complete information about the result of each HTTP request made by website users. Was the page downloaded completely or incompletely? How much of the page was really downloaded? How long did the download take? As the sniffer “sits” on the network line between the two communicating parties (on the server side in this case), complete information about the client/server communication is available.
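To make the three questions above concrete, here is a minimal Python sketch of how they could be answered from what a server-side sniffer observes. It is an illustration only, not Clipen's implementation; the `ObservedResponse` fields and the `download_summary` function are hypothetical names, and the sketch assumes the response carries a `Content-Length` header.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ObservedResponse:
    """What a sniffer might record for one HTTP response (hypothetical fields)."""
    declared_length: Optional[int]  # value of the Content-Length header, if present
    body_bytes_seen: int            # body bytes actually observed on the wire
    first_packet_ts: float          # capture time of the first response packet (seconds)
    last_packet_ts: float           # capture time of the last response packet (seconds)


def download_summary(resp: ObservedResponse) -> dict:
    """Was the page downloaded completely, how much of it, and how long did it take?"""
    if resp.declared_length is None:
        # No Content-Length (e.g. chunked encoding): the fraction is unknown in this sketch.
        fraction = None
    elif resp.declared_length == 0:
        # An empty body is trivially complete.
        fraction = 1.0
    else:
        fraction = min(resp.body_bytes_seen / resp.declared_length, 1.0)
    return {
        "complete": None if fraction is None else fraction >= 1.0,
        "fraction": fraction,
        "duration_s": resp.last_packet_ts - resp.first_packet_ts,
    }
```

For example, a response that declares 10,000 bytes but for which only 2,500 were seen before the connection closed would be reported as incomplete with a fraction of 0.25. A web log or a page tag has no comparable view of the bytes that actually left the server.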

The biggest advantages of the Clipen technology are comprehensive, high-quality data with near real-time availability, high performance and scalability, easy deployment, processing customization, and content tracking (website content labeling). The technology is completely non-invasive and does not require any website changes to enable clickstream data processing. Packet sniffing is entirely transparent to the tracked server and does not affect its operation.

Some may consider it a disadvantage that the technology is on the server side, so it may be affected by proxy servers or browser caches that do not send the HTTP request to the origin server but serve it from cache. However, there are proxy-busting techniques that, if applied within the HTTP server, prevent proxy servers from unwanted caching. In addition, today’s websites are full of dynamic content that shouldn’t be cached anyway, which is why online applications inherently work against caching of HTML pages. The influence of caching is therefore negligible.
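One common cache-busting technique is to have the web server attach response headers that tell browsers and proxies not to reuse a stored copy. The Python sketch below shows the standard HTTP headers typically used for this; the function names are hypothetical, and the interview does not say which technique Nextwell recommends.

```python
def no_cache_headers() -> dict:
    """Response headers asking browsers and proxies not to serve the page from cache."""
    return {
        # HTTP/1.1 caches: do not store the response, and revalidate before any reuse.
        "Cache-Control": "no-store, no-cache, must-revalidate",
        # Older HTTP/1.0 proxies that do not understand Cache-Control.
        "Pragma": "no-cache",
        # An invalid date such as "0" is treated by caches as already expired.
        "Expires": "0",
    }


def apply_no_cache(headers: dict) -> dict:
    """Return a copy of the response headers with the cache-busting headers added."""
    merged = dict(headers)
    merged.update(no_cache_headers())
    return merged
```

With headers like these on HTML pages, each page view results in a request that reaches the origin server, where a server-side sniffer can see it.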

JUAN: How does it work?

IVO: This answer is quite technical; I borrowed it from Clipen’s user guide. The machine running the Clipen technology has to be connected to the tracked website’s network flow via a network switch with a port mirroring feature or via a device such as a TAP. Clickstream data processing starts with the creation of network communication snoop files using the Clipen sniffer. The TCP connections stored in each snoop file are demultiplexed and divided into parallel tasks that are distributed to Clipen processing nodes for concurrent processing. While processing a task, a Clipen node performs the following steps:

1. TCP stream reassembly

2. SSL/TLS decoding of TCP streams (in the case of HTTPS)

3. HTTP request/response message completion

4. HTTP header analysis and extraction

5. Page/non-page recognition

6. Session identification

7. User identification

8. Content identification

9. Storing data in the Clipen database
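A few of the steps above (1, 3, 4 and 5) can be illustrated with a short Python sketch. This is a simplified illustration under stated assumptions, not Clipen's code: all function names are hypothetical, sequence numbers are taken as relative offsets starting at 0, lost segments are not recovered, and the page/non-page rule (HTML counts as a page) is an assumed heuristic. SSL/TLS decoding and the identification steps are left out.

```python
from typing import Dict, List, Tuple


def reassemble(segments: List[Tuple[int, bytes]]) -> bytes:
    """Step 1: order TCP segments by sequence number and drop retransmitted bytes.

    Each segment is (relative_sequence_number, payload).
    """
    stream = bytearray()
    for seq, payload in sorted(segments, key=lambda s: s[0]):
        overlap = len(stream) - seq  # bytes of this segment already in the stream
        if overlap < 0:
            raise ValueError("gap in the stream: a segment is missing")
        stream.extend(payload[overlap:])
    return bytes(stream)


def parse_message(raw: bytes) -> Tuple[str, Dict[str, str], bytes]:
    """Steps 3-4: split an HTTP/1.x message into start line, headers and body."""
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    headers: Dict[str, str] = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return lines[0], headers, body


def is_page(headers: Dict[str, str]) -> bool:
    """Step 5: treat HTML responses as pages; images, CSS, scripts etc. as non-pages."""
    media_type = headers.get("content-type", "").split(";")[0].strip().lower()
    return media_type == "text/html"
```

Segments may arrive out of order or be retransmitted, which is why reassembly has to come before any HTTP parsing: only the ordered byte stream contains well-formed request and response messages.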

When the processed data are stored in the Clipen database, they go through post-processing that is executed internally within the Clipen database, which is managed by a relational database management system. The post-processing is composed of the following parts:

1. Sessionization

2. Referring-referred resolving

3. Page view dwell time specification

4. Robot detection
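Three of these post-processing parts (sessionization, dwell time and robot detection) can be sketched in a few lines of Python. The sketch rests on assumptions that do not come from the interview: a 30-minute inactivity timeout, dwell time measured as the gap to the next page view, and a crude User-Agent substring check. All names are hypothetical, and Clipen performs this work inside its database rather than in application code like this.

```python
from typing import Dict, List, Optional

SESSION_TIMEOUT_S = 30 * 60                   # assumed inactivity limit, not a Clipen setting
ROBOT_MARKERS = ("bot", "crawler", "spider")  # assumed markers, deliberately crude


def sessionize(hits: List[Dict]) -> List[List[Dict]]:
    """Group one visitor's page views into sessions split by inactivity gaps.

    Each hit is a dict with 'ts' (seconds) and 'url'; hits may arrive unsorted.
    """
    sessions: List[List[Dict]] = []
    for hit in sorted(hits, key=lambda h: h["ts"]):
        if sessions and hit["ts"] - sessions[-1][-1]["ts"] <= SESSION_TIMEOUT_S:
            sessions[-1].append(hit)
        else:
            sessions.append([hit])
    return sessions


def dwell_times(session: List[Dict]) -> List[Optional[float]]:
    """Dwell time of a page view: the time until the next page view in the session.

    The last page view has no successor, so its dwell time is unknown (None).
    """
    if not session:
        return []
    gaps: List[Optional[float]] = [
        nxt["ts"] - cur["ts"] for cur, nxt in zip(session, session[1:])
    ]
    return gaps + [None]


def looks_like_robot(user_agent: str) -> bool:
    """Flag a visitor as a robot if its User-Agent contains a known marker."""
    ua = user_agent.lower()
    return any(marker in ua for marker in ROBOT_MARKERS)
```

A production robot detector would also look at behavior such as request rate or `robots.txt` fetches, since many robots send a browser-like User-Agent.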

After post-processing, the output data are available for extraction from the Clipen database; the data extraction is fully driven by the Clipen user’s external ETL application. Clipen is a universal data production system, so any external application may extract the processed data and use them for any purpose, typically for web analytics and data warehousing. We are also currently preparing a product called Clipen Benchmark, which is intended for performance monitoring of web applications and for so-called “real user monitoring”.

JUAN: How do you imagine the future of analytics?

IVO: I’m not the right man to predict the future, but let’s try. What may change the future of analytics is probably RFID technology. RFID chips and systems are capable of bringing many new applications that will move analytics a large step ahead. On the other hand, RFID will raise (or maybe already raises) many privacy issues. The capability to analyze very detailed information about a person may also lead to stricter legal control, so the technological evolution in this area may eventually bring some regulation. However, it’s obvious that one-to-one marketing will take on new meaning thanks to RFID technologies and the analytical conveniences they will bring.

JUAN: What’s your passion?

IVO: I like sport shooting, especially practical shooting, both handgun and shotgun, under IPSC rules. Fortunately, the Czech Republic has judicious gun control laws, so shooters are not persecuted here as they are, for example, in the UK. But there is not much time for hobbies now. What is important for me is to meet friends regularly, if possible, and relax with them in a nice pub with good beer. OK, maybe no surprise, as I live in the Czech Republic, where we have the best beer in the world :-) (Note from Juan: I lived in Germany and I must say that the best beer I have ever tried is Alt Beer from Dusseldorf, but I do agree to try yours, just for research purposes).

JUAN: What’s the best book you’ve read?

IVO: Probably Ken Kesey’s “One Flew Over the Cuckoo’s Nest” but there are also many others.

