Cognitive Research: Fake Blogs Generating Real Money

Summary

In the past several months Cisco Cognitive Threat Analytics (CTA) researchers have observed a number of blog sites using either fake content or content stolen from other sites to drive traffic to ad-loaded web sites. We have observed traffic volumes of up to 10,000 requests per hour, targeting hundreds of sites. The estimated lifetime of this campaign is at least 9 months. With a single click worth anywhere from $0.01 to $1, these scams can yield substantial returns for their owners.

Fake blogs are not new, but these actors are operating with a slightly different MO. Effort has been made to evade web-reputation-based blocks and to hide from the eyes of investigators. First, we observe a large number of similar sites with word-based and topic-based generated domain names. At first sight, these sites look like benign travel-related blogs full of content. Second, most of the intermediate infrastructure will redirect a random request to Google, making the investigation more difficult.

The general traffic pattern was observed as follows:

Large numbers of requests arrive from infected clients to the fake blog sites. To look less suspicious, the requests look like search queries – for example: cruiserly.net/search/q/greyhounds.
A series of redirects follows via intermediate sites that are already associated with click fraud – for example: findreek.com.
These redirects bring the clients to another set of fake sites with travel-related names (e.g. tourxperia.com); this time the sites have no content.
Finally, clients are sent to browse arbitrary web sites to generate clicks and/or revenue.

Details of the analysis follow:

Fake blog sites

The investigation started when Cognitive Threat Analytics (CTA) identified a volume of similar looking HTTP anomalies of the form

hxxp://statetraveling.com/search/q/disordering
hxxp://statetraveling.com/go/u/0/r/1581
hxxp://statetraveling.com/search/q/predators
hxxp://traveltourned.com/search/f/1/q/offside
hxxp://traveltournment.com/search/q/cabinetry

Since the beginning of our investigation, we have discovered dozens of domains following the same pattern. The domain names are continually updated and revolve around travel:

constravel.net
countravel.net
cruiseraf.net
cruiserly.net
eurvoyage.net
excursionership.com
fairvoyage.net
journeymoons.com
journeymouth.com
mimicross.net
morecruise.net
otraveling.net
pejourney.net
rockcross.net
routewayscales.com
shipjourneys.com
statetraveling.com
traveltourned.com
traveltournment.com
traveltript.com
tripworldwin.com

The front page of these sites greets us with the banner “Travel worldwide with us” and there are a few blog posts. However, these articles are the same on all of the sites, and their content is copied word for word from Atlas Obscura, a popular travel-blog site.

Visiting any of the exact URLs with the “/search” component yields a page that states 0 results have been found. If these requests were human clicks, this would mean that all site visitor searches come up empty, all of the time.

Visiting a URL with the “/go” component results in an HTTP 302 redirect to google.com. This is unusual behavior for a web service. Reading between the lines, this is similar to saying “nothing to see here, move on”.

Access patterns

To investigate further, we needed to see what happened when the infected customers visited these URLs. This was done by forensic analysis of the actual network traffic, which was available in the form of web proxy logs.

By stitching together these logs using referrer headers and time proximity, we discovered a characteristic sequence that the infected clients perform:

URL	Res	Referrer
cruiserly.net/search/q/greyhounds	200
cruiserly.net/go/u/0/r/3846	302	cruiserly.net/search/q/greyhounds
199.182.165.105/c.php?i=ehysdx%2bqpifxpw9i4mxiof53on10kg1y4knnm 2blvqdp0cfcr%2be0tjplc%2boojt2xsgb69bo3 g0m6%2bq7rjm8wa1z8chuqqgz43ph%2b0m8o0e% 3d&j=842b98&tid=c8ff98d1430a-3b 683c-3b6535-643b64-3c3838-3b3735-656537 -373736-383637-663866-67693a-3c3a	302	cruiserly.net/search/q/greyhounds
findreek.com/cen?ag=8adb3fcf03e1009cd8db00c1256dbd36-81-0&g=9d9517&t=aa2a773	200	cruiserly.net/search/q/greyhounds
tourxperia.com/lr?si=8adb3fcf03e1009cd8db00c1256dbd36-81-0	200	findreek.com/cen?ag=8adb3fcf03e1009cd8db00c1256dbd36-81-0&g=9d9517&t=aa2a773
tourxperia.com/search	200	tourxperia.com/lr?si=8adb3fcf03e1009cd8db00c1256dbd36-81-0
findreek.com/cex?si=8adb3fcf03e1009cd8db00c1256dbd36-81-0	302	tourxperia.com/search
(arbitrary site)	302	tourxperia.com/search
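The stitching step described above can be sketched as follows. This is only an illustration under stated assumptions: the LogEntry fields, the stitch_chains helper, and the 30-second max_gap are all hypothetical, and the input is assumed to be the proxy log entries of a single client. It is not the tooling used in the investigation.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class LogEntry:
    ts: float                       # request time, seconds since epoch
    url: str                        # host + path + query, no scheme
    status: int                     # HTTP response code
    referrer: Optional[str] = None  # referrer header, same form as url


def stitch_chains(entries: List[LogEntry],
                  max_gap: float = 30.0) -> List[List[LogEntry]]:
    """Group one client's entries into chains linked by referrer and time."""
    chains: List[List[LogEntry]] = []
    # Maps a URL to the chain in which it was most recently seen.
    last_seen: Dict[str, List[LogEntry]] = {}
    for e in sorted(entries, key=lambda x: x.ts):
        chain = last_seen.get(e.referrer) if e.referrer else None
        # Join the referrer's chain only if it was active recently enough.
        if chain is not None and e.ts - chain[-1].ts <= max_gap:
            chain.append(e)
        else:
            chain = [e]
            chains.append(chain)
        last_seen[e.url] = chain
    return chains


log = [
    LogEntry(0.0, "cruiserly.net/search/q/greyhounds", 200),
    LogEntry(1.0, "cruiserly.net/go/u/0/r/3846", 302,
             "cruiserly.net/search/q/greyhounds"),
    LogEntry(2.0, "findreek.com/cen", 200,
             "cruiserly.net/search/q/greyhounds"),
    LogEntry(3.0, "tourxperia.com/lr", 200, "findreek.com/cen"),
    LogEntry(4.0, "tourxperia.com/search", 200, "tourxperia.com/lr"),
]
for chain in stitch_chains(log):
    print(" -> ".join(e.url for e in chain))
```

Requests with no referrer, or with a referrer last seen longer ago than max_gap, start a new chain; the right threshold depends on the log source and would need tuning.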

Again, visiting most of these URLs from a browser leads to an HTTP 302 redirect to Google. As we established before, this is indicative of intent to hide the infrastructure from uninvited guests.

To verify whether these clicks are likely to be human or machine, we analyzed their volume:

The different colored lines represent different customers' aggregate activity per hour towards these sites. In the marked areas there is an observable correlation in the ramp-up and/or ramp-down of the activity. This suggests some coordination capability across the infected user base. The volume is large, peaking on the order of 10,000 hits per hour, which is not likely to be caused by human behavior. Putting these facts together, it would seem a botnet is the primary cause of this behavior.
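A volume analysis of this kind can be approximated by bucketing requests per customer per hour and comparing the resulting series. The sketch below is an assumption-laden illustration: the (customer, timestamp) input format and the helper names are hypothetical, and Pearson correlation is used here as one simple way to quantify the coordinated ramp-up and ramp-down; the article does not state which measure the researchers used.

```python
from collections import Counter
from datetime import datetime, timezone
from math import sqrt
from typing import Dict, Iterable, List, Tuple


def hourly_counts(events: Iterable[Tuple[str, float]]) -> Dict[str, Counter]:
    """Map customer -> Counter of UTC hour bucket ('YYYY-MM-DD HH') -> hits."""
    out: Dict[str, Counter] = {}
    for customer, ts in events:
        hour = datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d %H")
        out.setdefault(customer, Counter())[hour] += 1
    return out


def aligned_series(a: Counter, b: Counter) -> Tuple[List[int], List[int]]:
    """Return both hourly series over the union of their hours (0 if absent)."""
    hours = sorted(set(a) | set(b))
    return [a[h] for h in hours], [b[h] for h in hours]


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation; returns 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


# Made-up toy data: customer B has twice the hourly volume of customer A.
events = ([("A", 0)] * 3 + [("A", 3600)] + [("A", 7200)] * 5
          + [("B", 0)] * 6 + [("B", 3600)] * 2 + [("B", 7200)] * 10)
counts = hourly_counts(events)
xs, ys = aligned_series(counts["A"], counts["B"])
print(pearson(xs, ys))
```

A high correlation between unrelated customers would support the coordination hypothesis, although shared daily rhythms such as office hours can also correlate and would have to be ruled out.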

Final destinations

The final redirect in the chain can send the client to a large set of web sites. For example, tourxperia.com/search is found to be a referrer of at least 100 other sites. We took a detailed look at two representatives:

espirial.com
news788.com

These sites don’t look as fake as the original travel-related blogs, but what they have in common is a fair amount of advertisements. The access pattern in this case is not a single URL (e.g. a single advertisement), but a full load of the web page. This means the botnet is used to draw traffic to sites, which opens the door to various monetization schemes.

IOCs and beyond

From the above analysis we can observe two HTTP destinations that are shared between all the traffic:

findreek.com
199.182.165.105

We have observed no legitimate use of these destinations across hundreds of additional customers. Searching AMP Threat Grid for them shows a hefty set of malware samples with a threat score of at least 90 out of 100. These two pieces of information combined make them good Indicators of Compromise (IOCs) and candidates to be blocked.
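For readers who want to check their own proxy logs against these two indicators, a minimal sketch might look like the following. The helper names are hypothetical and the matching rule (exact host or any subdomain) is an assumption, not a description of how the Cisco products apply the IOCs.

```python
import re
from urllib.parse import urlsplit

# The two shared destinations named in this article.
IOCS = {"findreek.com", "199.182.165.105"}

_SCHEME_RE = re.compile(r"^[a-z][a-z0-9+.-]*://", re.IGNORECASE)


def host_of(url: str) -> str:
    """Extract the lower-cased host from a URL that may lack a scheme."""
    if not _SCHEME_RE.match(url):
        url = "http://" + url
    return (urlsplit(url).hostname or "").lower()


def matches_ioc(url: str, iocs=IOCS) -> bool:
    """True if the URL's host equals an IOC or is a subdomain of one."""
    host = host_of(url)
    return any(host == ioc or host.endswith("." + ioc) for ioc in iocs)


for url in ["findreek.com/cen?ag=abc", "tourxperia.com/search"]:
    print(url, matches_ioc(url))
```

Matching on the host rather than on a substring of the whole URL avoids flagging requests that merely mention an indicator in a query string.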

To search for more IOCs and understand the dynamics of the campaign, we have trained a customized classifier on the known IOC traffic and let it search the CTA telemetry data across a broader time scale to highlight further points of interest.

The results are as follows:

The fake blog sites are rotated out and replaced with new ones. However, the content of the sites is not updated. One example is voiage.net [sic], which was registered in early February 2015, but whose mock articles are still dated fall 2014.
The IP address 199.182.165.105 has also been replaced with the domain f.click-process.com. One of its DNS address records still points to this IP address.
The findreek.com domain has been in place for quite some time. We can observe it dating back to September 2014. At that time, it replaced an even older domain, clickered.com, which had the same behavior. According to passive DNS records, clickered.com has been in use since early 2013.

Conclusion

In this article, we have presented a network-centric deep dive into a long-running campaign. One of its unique aspects is the use of blog sites with fake content as front-ends for C&C. The analysis also shows how network forensics combined with advanced anomaly detection (CTA) and sandboxing (AMP Threat Grid) can work together to identify and dissect such a campaign end to end, across its whole kill chain.

Since the discovery of this campaign, the indicators of compromise (IOCs) have been propagated into the respective products. In case the IOCs change, we also now have the custom-trained classifier as part of CTA, where it provides an additional layer of robust detection capability.

About Cisco Cognitive Threat Analytics:

Cisco Cognitive Threat Analytics (CTA) is a cloud-based breach detection and analytics technology focused on discovering novel and emerging threats by identifying C&C activity of malware. CTA processes web access logs from the Cisco Cloud Web Security (CWS), Cisco Web Security Appliance (WSA), or 3rd party web proxies such as Blue Coat ProxySG. CTA reduces time to discovery (TTD) of threats operating inside the network. It addresses gaps in perimeter-based defenses by identifying the symptoms of a malware infection or data breach using behavioral analysis and anomaly detection. The technology relies on advanced statistical modeling and machine learning to independently identify new threats, while constantly learning from what it sees and adapting over time. Through additional careful correlation, CTA presents 100% confirmed breaches to keep security teams focused on the particular devices that require a remediation. Focusing on C&C activity detection, CTA addresses a security visibility gap by discovering threats that may have entirely bypassed web as an infection vector (infections delivered through email, infected USB stick, BYOD).

Alan Chung says:

September 24, 2015 at 10:26 am

Great blog! Well written with high information content and attractive title that ties back to the advantage of Cisco CTA.
Richard Staynings says:

September 25, 2015 at 2:20 pm

Well written article exposing a growing click fraud problem supported by solid research. Ironically I witnessed an expose of click fraud earlier this week using Quantum Metric – a new tool that recently came on the market. This is a much bigger problem than anyone imagined until now and is costing companies hundreds or thousands of dollars per hour!
Boris says:

September 25, 2015 at 11:11 pm

>With a single click worth anywhere from $0.01 and $1, these scams can yield substantial returns for their owners.

According to the traffic flow you presented, such clicks cost $0.1-$0.5 per 1000 nowadays. They don’t look like Traffo Escobars.
- Michal Svoboda says:
  
  September 30, 2015 at 4:15 am
  
  Hi Boris, thanks for the comment. I guess the estimate was off since I used sources which list costs of more prominent clicks. Can you point at a good reference with more accurate click prices?
