Clarifying web crawling's confused role in risk management
In recent years, “web crawling” has emerged as a buzzword in compliance and merchant underwriting. Through our training events, conferences, and conversations with industry professionals, we’ve noticed a recurring theme: while many reference “web crawling” as part of their Know Your Business (KYB) processes, the term is often misunderstood. In this post, we’ll unpack what web crawling really means, explore its role in KYB, and show how it fits into broader compliance strategies.
What is web crawling and how does it apply to KYB?
At its core, web crawling is the automated process of collecting publicly available data from websites to identify specific information or patterns. Search engines, for instance, use crawlers to index web content, while data aggregators rely on them to gather information for analysis. In compliance, web crawling is an essential tool for monitoring online content, detecting risks, and verifying claims.
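To make the collection step concrete, here is a minimal sketch in Python using only the standard library. It assumes the page has already been downloaded; `LinkCollector`, `parse_page`, and the sample HTML are hypothetical names and data chosen for illustration, not part of any particular product.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects absolute link targets and visible text from one HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, href))

    def handle_data(self, data):
        # Keep non-empty text fragments; whitespace between tags is skipped.
        if data.strip():
            self.text_parts.append(data.strip())


def parse_page(base_url, html):
    """Return (links, text) extracted from a single page."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links, " ".join(collector.text_parts)


# Hypothetical page content; a real crawler would download this over HTTP.
SAMPLE_HTML = """
<html><body>
  <h1>Example Shop</h1>
  <a href="/products">Products</a>
  <a href="https://example.com/terms">Terms</a>
</body></html>
"""

links, text = parse_page("https://example.com/", SAMPLE_HTML)
print(links)
print(text)
```

A crawler repeats this step for each discovered link, which is how it moves from one page to a whole site. The extracted text is only raw material; it still has to be analysed before it says anything about risk.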
However, a common misconception is that web crawling is a one-click solution. Many believe a bot can scan the entire internet and instantly solve compliance challenges. In reality, web crawling alone cannot provide all the answers. It is effective for large-scale data collection, but algorithms are needed to structure and analyse the gathered data and turn raw information into actionable insights. Beyond that, humans are required to go where machines cannot: intuition, experience and perception are needed especially when interpreting complex patterns, the intent behind marketing practices, or connections between business entities.
Web crawling in merchant underwriting
In KYB and merchant underwriting, web crawling supports several critical tasks:
- Assessing compliance: Crawlers can identify potential violations of card scheme rules (e.g., Mastercard BRAM or Visa VIRP).
- Detecting risks: Subtle indicators, like hidden connections between websites or unverified business models, often require a combination of automated and manual analysis.
- Verifying merchant claims: For example, discrepancies between a merchant’s declared activities and their online presence can signal deceptive practices.
That said, our approach to web crawling is different from fully automated bots. Typically, our solutions combine automation with expert review to ensure risks aren’t missed. An additional layer of intelligent algorithms analyses the crawled data, automatically identifying potential issues and flagging them for further human review.
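As a rough illustration of that flag-then-review flow, the sketch below scans crawled text for watchlist terms and queues every hit for a human. It is a simplified assumption of how such a layer could look, not a description of our production algorithms; `WATCHLIST`, `Flag`, and `flag_text` are hypothetical names, and the terms are examples only.

```python
import re
from dataclasses import dataclass

# Hypothetical watchlist: category -> terms. Real rule sets are far larger.
WATCHLIST = {
    "pharmaceuticals": ["prescription drugs", "opioids"],
    "counterfeit": ["replica", "1:1 copy"],
}


@dataclass
class Flag:
    category: str
    term: str
    snippet: str
    # Every hit starts as unverified; only a reviewer can confirm or dismiss it.
    status: str = "needs_human_review"


def flag_text(text, watchlist=WATCHLIST, context_chars=30):
    """Return one Flag per watchlist hit, with surrounding text for the reviewer."""
    flags = []
    for category, terms in watchlist.items():
        for term in terms:
            for match in re.finditer(re.escape(term), text, flags=re.IGNORECASE):
                start = max(0, match.start() - context_chars)
                end = min(len(text), match.end() + context_chars)
                flags.append(Flag(category, term, text[start:end]))
    return flags


page_text = "We ship prescription drugs worldwide. Replica watches also available."
flags = flag_text(page_text)
for flag in flags:
    print(flag.category, "|", flag.term, "|", flag.status)
```

The snippet stored with each flag matters as much as the match itself, because it gives the underwriter enough surrounding text to judge whether the hit is a genuine issue or a false positive.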
This is where underwriters show their strength. Verifying flagged issues is a key part of the underwriter’s role: they weed out false positives that simple algorithms, or even sophisticated AI analysis, cannot reliably recognise. While we take on much of the heavy lifting, we encourage clients to actively engage with the results, reviewing flagged issues or verifying annotations, to build a robust compliance framework.
Understanding website content monitoring
Website content monitoring is a critical part of ongoing compliance for businesses operating in the payment ecosystem. As a specialised form of web crawling, it focuses on specific compliance needs. Unlike general web crawling, which collects and indexes a wide range of website data, content monitoring keeps watch over compliance-critical elements. This involves tracking changes to merchant websites, such as product listings, user-generated content, and terms and conditions, to ensure compliance with card scheme rules and industry regulations. This vigilance helps identify risks like non-compliance, counterfeit goods, or misleading claims, which can arise as businesses evolve or expand.
Monitoring strategies typically combine automated tools, such as web crawlers tailored to scan for compliance-relevant updates, with human expertise to interpret more complex risks. Key areas of focus include reviewing advertising practices, merchant information, and audience demographics. Certain merchant category codes (MCCs), such as dating services or marketplaces, require extra attention due to their higher compliance risks.
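One simple way to track changes between two visits is to compare snapshots of the same page. The sketch below is a minimal example under that assumption: a hash tells you cheaply whether anything changed, and a line-level comparison shows what was added or removed. The function names and the sample snapshots are hypothetical.

```python
import difflib
import hashlib


def fingerprint(text):
    """Stable hash of a page snapshot; a different value means the content changed."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def changed_lines(old_text, new_text):
    """Return (added, removed) lines between two snapshots of the same page."""
    old_lines, new_lines = old_text.splitlines(), new_text.splitlines()
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    added, removed = [], []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            removed.extend(old_lines[i1:i2])
        if tag in ("replace", "insert"):
            added.extend(new_lines[j1:j2])
    return added, removed


# Hypothetical snapshots of a merchant page taken on two different days.
old_snapshot = "Terms and conditions\nRefunds within 14 days\nWe sell handmade candles"
new_snapshot = (
    "Terms and conditions\nNo refunds\nWe sell handmade candles\n"
    "New: dietary supplements"
)

if fingerprint(old_snapshot) != fingerprint(new_snapshot):
    added, removed = changed_lines(old_snapshot, new_snapshot)
    print("added:", added)
    print("removed:", removed)
```

Only the changed lines then need to go through risk checks and, where necessary, to a reviewer, which keeps recurring scans manageable.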
The need for nuanced analysis and continuous monitoring
Effective monitoring of merchant profiles requires a nuanced approach, as risks can evolve and emerge over time. Web crawling is not a one-time activity; rather, it needs to be an ongoing process to ensure merchants remain compliant as new risks surface. Persistent monitoring tools—ours is called ‘Monitor’—provide continuous oversight by regularly scanning merchant websites. This helps detect any changes or emerging risks that may not have been apparent during the initial review.
For a deeper dive into the importance of website content monitoring and how it can strengthen your compliance strategy, read our blog post: The Basics of Website Content Monitoring.
Limitations of basic web crawling for compliance
While web crawling is powerful, it has limitations in the KYB and merchant underwriting context:
- Lack of context: Crawlers can’t interpret nuanced issues. For example, a crawler might flag a licensed pharmacy based on keywords like 'prescription drugs' or 'opioids'. A keyword match alone, however, says nothing about whether the pharmacy is properly licensed to sell such medications in its jurisdiction. Additional context, such as verifying the pharmacy’s licence and compliance with local regulations, is needed to properly assess whether the entity poses a compliance risk.
- Inability to interpret intent: Crawling itself is primarily a data-gathering tool, collecting raw information from websites. However, keywords alone can’t provide full clarity. For instance, identifying whether content reflects fraudulent practices requires analysis beyond the crawler’s capabilities. Human expertise is essential to review and interpret the data in context, ensuring that intent and compliance risks are accurately assessed.
- Technical barriers: Many high-risk industries implement measures that block crawlers, such as DDoS protection or restricted member-only areas. This is particularly common in the adult entertainment, crypto, and gambling industries, where compliance risks are often hidden behind logins. On top of that, many sites limit request frequency or challenge bots with a CAPTCHA to prevent scraping, so crawlers must be optimised to avoid being blocked or throttled, sometimes with advanced methods like rotating IPs or CAPTCHA-solving services.
- Website structure and data quality: Some website architectures can break crawlers, necessitating frequent updates to scraping scripts. As a result, data retrieved from websites may be inconsistent or inaccurate, requiring validation and cleaning for reliable use.
- Scalability and reliability: As scraping operations grow, ensuring scalability and resource management becomes critical to maintain efficiency. Proper storage and organisation are essential for managing large volumes of scraped data and ensuring efficient access. In addition, automated scraping needs proper monitoring to ensure consistent performance and manage failures without manual intervention.
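The points above about throttling and handling failures without manual intervention can be sketched as a retry loop with exponential backoff. This is an illustrative pattern rather than a description of any specific crawler: `TemporaryBlock` and `fetch_with_backoff` are hypothetical names, and the `fetch` callable stands in for whatever HTTP client is actually used.

```python
import time


class TemporaryBlock(Exception):
    """Hypothetical error raised when a site throttles or challenges the crawler."""


def fetch_with_backoff(fetch, url, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fetch(url), waiting base_delay * 2**attempt seconds after each failure."""
    for attempt in range(max_attempts):
        try:
            return fetch(url)
        except TemporaryBlock:
            if attempt == max_attempts - 1:
                raise  # Give up and surface the failure to monitoring.
            sleep(base_delay * 2 ** attempt)


# Demonstration with a fake fetcher that is throttled twice before succeeding.
attempts = {"count": 0}


def flaky_fetch(url):
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise TemporaryBlock(url)
    return f"<html>content of {url}</html>"


waits = []  # Record the delays instead of actually sleeping.
page = fetch_with_backoff(flaky_fetch, "https://example.com/", sleep=waits.append)
print(waits)
print(page)
```

Passing the `sleep` function in as a parameter keeps the example quick to run and test. In a real deployment the default `time.sleep` would apply, and a final failure would typically be logged so that someone can investigate.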
Going beyond the crawl in KYB and compliance
Web crawling is a cornerstone of effective KYB and merchant underwriting, but it’s not a standalone solution. Combining crawling with human expertise, automation, and continuous monitoring ensures a comprehensive compliance strategy. At Web Shield, our solutions like InvestiGate and Monitor integrate these elements to streamline KYB processes and reduce compliance risks.
It’s essential to understand that while we provide a powerful tool to help uncover risks, web crawling is just part of the synergy between underwriting technology and compliance departments. Expertise on both sides is needed to critically assess these findings. This collaborative effort ensures a more robust compliance framework.
To see how Web Shield can enhance your compliance efforts, request a trial and experience the difference for yourself.