From the category archives:

Coding



Are Links a Good Proxy for Traffic?

by Dixon Jones on December 2, 2009

One of the most exciting developments on the web in the last two years has been the increased ability to combine two or more technologies to create something new. We see this most in the Social space, of course, but overlaying data sets could lead to a dramatically increased understanding of the world we live in.

Today, after a delay of some months, I received my Hitwise (now called Experian Hitwise) newsletter. It is one of the very few that I actually remember signing up to. The other 100 a day never seem to get to my in-box now. Hitwise always use the newsletter to focus on me as a UK user, choosing an industry and giving some insight into its market share (in terms of traffic). Today they chose to focus on the UK’s online property websites. Read more >>

{ 4 comments }

BBC External Link Tracking Script

by Patrick Altoft on February 11, 2009

Since my post about the BBC blocking the link juice on external links they seem to have come up with a very clever piece of JavaScript to get around the issue. It’s not on all the external links but it’s on some of them.

Links are still clean and there is nothing in the source code to indicate that they pass through a redirect. However, when users click on an external link they are still sent through a tracking script.

This might be a very useful script for those of you who want to track links but still pass SEO benefits; the original is here. For a demo visit this page and click the “Movie Review Query Engine” link.

var LinkTrack = function ()
{
this.docLinks = document.links;
this.location= location.pathname;
}
LinkTrack.prototype.updateHrefs = function ()
{
var currlink, hostname, protocol, linktext;
if (!(!document.getElementsByTagName && document.all))
{
for (var i=0; i<this.docLinks.length; i++) {
currlink = this.docLinks[i];
hostname = currlink.hostname ? currlink.hostname.toLowerCase() : "";
protocol = currlink.protocol.toLowerCase();
linktext = currlink.innerText;
if (protocol == 'http:' && (hostname != 'bbc.co.uk' && hostname.indexOf('.bbc.co.uk') == -1))
{
currlink.href = this.getNewUrl(currlink.href);
if (document.all && currlink.innerText.toLowerCase() == currlink.href.toLowerCase()) currlink.innerText = linktext;
}
}
}
}
LinkTrack.prototype.getNewUrl = function (destination)
{
var newUrl = '/go';
newUrl += this.location;
newUrl += (newUrl.substr(newUrl.length-1) == '/')? 'ext/_auto/-/' : '/ext/_auto/-/';
newUrl += destination;
return newUrl;
}
var myC = new LinkTrack();
myC.updateHrefs();
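
To see what the rewrite does without loading a BBC page, here is a minimal sketch of the same two decisions as plain functions. The function names are my own and are not part of the BBC script; the '/go' prefix, the '/ext/_auto/-/' segment and the bbc.co.uk checks are copied from the code above.

```javascript
// Hypothetical helper names; the rules mirror the BBC script above.

// True when a link uses plain http and does not point at bbc.co.uk.
function isExternalHttpLink(protocol, hostname) {
  var host = (hostname || "").toLowerCase();
  var isBbc = host === "bbc.co.uk" || host.indexOf(".bbc.co.uk") !== -1;
  return protocol.toLowerCase() === "http:" && !isBbc;
}

// Builds the tracking URL: /go + current page path + /ext/_auto/-/ + destination.
function buildTrackingUrl(pagePath, destination) {
  var separator = pagePath.slice(-1) === "/" ? "ext/_auto/-/" : "/ext/_auto/-/";
  return "/go" + pagePath + separator + destination;
}

console.log(buildTrackingUrl("/films/", "http://example.com/"));
```

Because the href is only swapped at runtime, the page source still contains the clean destination URL, which is the whole point of the trick.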

{ 4 comments }

Five Ways to Detect Fraud Using Geolocation

by Patrick Altoft on December 15, 2008

This post was written by Quova, they sent it as a press release but I thought it was interesting enough to publish on the blog.

The 2008 Edition of the CyberSource Online Fraud Report highlights that, across the 318 online sellers surveyed, an average of 1.4% of orders are lost to online fraud, often resulting from buyers who used credit card numbers later identified as stolen. The report estimates that in 2007 $3.6 billion in online revenues were lost in this way.

Though geolocation is just one of the risk monitoring tools used (the average online e-merchant uses at least four tools), it provides an important line of defense. The foundation for geolocation is the Internet protocol (IP) address, a numeric string assigned to every device attached to the Internet. When an individual surfs the Web, their computer sends out this IP address to every Web site visited. Geolocation can provide much more than a geographic location. Many providers supply up to 30 data fields for each IP address, including country, region, state, city, ZIP code and time zone, that can help to further determine if users really are where they say they are.

Equipped with this information, e-merchants can use geolocation to flag suspect transactions and address them individually.

Five key ways to detect fraud using geolocation include:

Check for anonymous proxy servers and other location-masking systems

  • While not all proxy servers are bad, the use of an anonymous proxy that hides or masks a unique IP address can be a fraud indicator. A select few geolocation vendors (including Quova) provide lists of anonymous proxies that are abusing the system and notify the e-merchant when an order comes from one of those proxy servers.

Check the distance between actual and expected user locations

  • It’s a general rule of thumb that shoppers will be logging on to the Internet within close proximity to their billing or shipping addresses. Many Quova customers report that orders coming from 500 miles or more away from the expected location have a higher probability of being fraudulent. With geolocation, e-merchants can elect to decline, or flag for review, orders falling X miles or more away from the shipping or billing address.

Use domain information to assess risk

  • With access to domain information gathered from the shopper’s ISP, it can be easier to determine whether an order should be declined, accepted or flagged. An e-merchant can track user sessions and know that the customer frequently connects from work and from home.

Build user profiles

  • Once a profile is built, e-merchants can look for differences between the behavior they observe online and what they have on file. Geolocation provides a simple way for merchants to expand their user profiles behind the scenes by assuming that most valid orders will follow the same pattern. If several different domain extensions or ISPs are used in one day, chances are those orders may be fraudulent.

Use time-zone information to track the transaction velocity

  • If a user connects to a Web site several times within a relatively short period and the log-ins are more than 1,000 miles away from each other, this is a major red flag for an online merchant. For each shopper, e-merchants can use geolocation data to enable business rules that
    1) request the current local time at the shopper’s location;
    2) alert them to potential “time-zone hopping” within a short period of time, where the same account is accessed from multiple geographic locations; and
    3) alert them to orders placed at times of the day that aren’t consistent with previous orders stored in the user’s profile.
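
The distance and velocity checks above can be sketched in a few lines of JavaScript. This is only an illustration, not Quova's method: the function names, the one-hour window and the haversine great-circle formula are my own assumptions, while the 500 and 1,000 mile thresholds come from the text. It assumes your geolocation provider returns a latitude and longitude for each IP address.

```javascript
// Hypothetical helpers; thresholds of 500 and 1,000 miles are taken from the post.
var EARTH_RADIUS_MILES = 3958.8;

function toRadians(degrees) {
  return degrees * Math.PI / 180;
}

// Great-circle distance between two {lat, lon} points (haversine formula).
function distanceMiles(a, b) {
  var dLat = toRadians(b.lat - a.lat);
  var dLon = toRadians(b.lon - a.lon);
  var h = Math.pow(Math.sin(dLat / 2), 2) +
          Math.cos(toRadians(a.lat)) * Math.cos(toRadians(b.lat)) *
          Math.pow(Math.sin(dLon / 2), 2);
  return 2 * EARTH_RADIUS_MILES * Math.asin(Math.sqrt(h));
}

// Flag an order whose IP location is far from the billing or shipping address.
function isFarFromExpected(ipLocation, expectedLocation, thresholdMiles) {
  var limit = thresholdMiles === undefined ? 500 : thresholdMiles;
  return distanceMiles(ipLocation, expectedLocation) >= limit;
}

// Flag two log-ins ({lat, lon, time in ms}) that are too far apart for the
// time between them. The one-hour default window is an assumption.
function isTimeZoneHopping(previousLogin, currentLogin, maxMiles, windowHours) {
  var miles = maxMiles === undefined ? 1000 : maxMiles;
  var hours = windowHours === undefined ? 1 : windowHours;
  var hoursApart = Math.abs(currentLogin.time - previousLogin.time) / 3600000;
  return hoursApart <= hours && distanceMiles(previousLogin, currentLogin) > miles;
}
```

In practice a flagged order would go to manual review rather than being declined outright, since IP-based locations are approximate.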

It’s not unusual for a Web site to keep track of user behavior, such as the pages users click on and the products they purchase. This is called behavioral targeting, and because the customer’s computer is never accessed, geolocation does not infringe on personal privacy. In a nutshell, geolocation is just one of many checks you can run in the fraud cycle, and it protects both the consumer and the merchant from criminal activity.

{ 1 comment }

VAT Cut Implications for Ecommerce Sites in the UK

by Patrick Altoft on November 24, 2008

In today’s pre-budget report the Chancellor introduced a reduction in the VAT rate from 17.5% to 15% to help kick start the economy and increase consumer spending.

VAT has been set at 17.5% since before ecommerce was invented and this latest change is going to cause a huge headache for ecommerce websites.

Some systems are likely to have VAT hard coded into the ecommerce systems but even advanced programs will still need to be adjusted to reflect the new rates. Read more >>

{ 28 comments }

iPhone 3G affiliates to get £20 commission but only if they don’t promote it

by Patrick Altoft on July 10, 2008

Following on from my iPhone confusion post yesterday Kieron posted this morning about how affiliates are being banned from promoting the iPhone but that they still get £20 commission if they happen to make a sale.

This is the most ridiculous thing I have ever heard. Here is the email from TradeDoubler that I got this morning:

Hi Patrick,

As a follow up to our communication earlier this week from the Carphone Warehouse, we have a very positive update regarding commission payments against the new iPhone.

Following discussions, it has been agreed that the Carphone Warehouse will pay affiliates £20 for each approved iPhone sale.

The iPhone goes live on the site on the 11th July, however they will also pay affiliates for all the Apple iPhone orders from the 7th July that go through as sales.

This is based on one important factor. Affiliates will not be able to advertise this product under any means, no links can go through to the Apple iPhone page, and if you been found advertising the Apple iPhone we will not be able to pay any commission on the sales generated.

Please don’t hesitate to contact us if you have any questions.

Kind Regards

Glen Blake
Account Director
glen.blake@tradedoubler.com

Words fail me. How can they tell people who have probably been writing about the iPhone for 2 years to suddenly stop promoting it? What about sites that have hundreds or thousands of visitors a day arriving from search engines looking for the iPhone? You can’t just turn this traffic off.

{ 4 comments }

Cut and paste one line of code to make any website editable

by Patrick Altoft on July 7, 2008

Have you ever wanted to edit the web pages of another website? This simple line of code makes it possible.

Of course you can’t edit the real web page on the server, but you can edit the page as you see it on your screen.

This is one of the ways scammers create fake screenshots, fake Adsense & affiliate earnings and even fake Paypal transactions.

All you need to do is visit the site you want to edit, paste the code below into your web browser address bar (tested in Firefox & IE7) and hit the Enter button.
Then simply select a portion of text on the page and start editing.

javascript:document.body.contentEditable='true'; document.designMode='on'; void 0

{ 125 comments }

How to Scrape Pages With ColdFusion

by Patrick Altoft on January 26, 2008

This is a guest post by Guy from nullamatix.com

With the exponential growth of the Internet, data harvesting has become increasingly popular in the last few years. Several web sites sell large databases of information relevant to lawyers, doctors, businesses, schools, just about anything imaginable.

After seeing all this content, I asked myself, “How is all this information compiled?” Surely some poor sap isn’t being paid to manually insert each record. With a little research, I was able to come up with a pretty simple solution using Coldfusion.

To keep things simple, we’re going to harvest data from articles-hub.com. First, open your favorite text editor and drop in the following code:

<cfhttp url="http://www.articles-hub.com/Article/700.html" method="GET">
<cfset sDoc = trim(cfhttp.fileContent)>

This tells Coldfusion to literally get the contents of the specified page, then store that content into a variable named sDoc.

The following bit of code is where the magic happens. If you’re unfamiliar with regular expressions, now is a great time to learn. Insert the following bit of code after the variable declaration mentioned above:

<cfset regExp = '<span class="article_display_title" >
        ([\s\S]*?)</span>[\s\S]*?<div align=[\s\S]*?
</div>
    ([\s\S]*?)
          </div>
            </div>'>

Without going into too much detail, this variable tells Coldfusion what to look for, and where. View the source code of the page defined above and go to line 1016. You’ll notice the span tag defined in regExp is on that line. When our application is executed, Coldfusion will begin searching sDoc for that tag. Once located, the data sitting in place of the first expression ([\s\S]*?) will be defined as $1, which is the article’s title. Coldfusion continues searching, and looks over everything between:

</span>[\s\S]*?<div align=[\s\S]*?</div>

until the next expression containing the actual article content is reached. Finally, our variable stops when the two consecutive </div> tags are reached.

This information should simplify the regular expression creation process. For any set of information you want to store for later, use ([\s\S]*?). To skip over anything, use [\s\S]*?.
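
If you want to try the capture-and-skip idea before touching Coldfusion, the same pattern syntax works in JavaScript. This is a minimal sketch on a made-up snippet of HTML (the class names and text are hypothetical, not taken from articles-hub.com): the parenthesised ([\s\S]*?) parts are stored, the bare [\s\S]*? part is skipped.

```javascript
// Made-up HTML for illustration only.
var html = '<span class="title">First post</span>' +
           '<div class="meta">skip me</div>' +
           '<div class="body">Hello world</div>';

// ([\s\S]*?) captures lazily; the bare [\s\S]*? in the middle skips the meta div.
var pattern = /<span class="title">([\s\S]*?)<\/span>[\s\S]*?<div class="body">([\s\S]*?)<\/div>/;

var match = html.match(pattern);
var title = match[1]; // first capture group
var body = match[2];  // second capture group

console.log(title + " / " + body);
```

The question mark after the star makes each piece match as little as possible, so the capture stops at the first closing tag rather than the last one on the page.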

With our data sets defined, we can output the results into a nice, organized product. Drop in this code:

<cfset q_srch = queryNew("title, article")>
<cfset start = 1>
<cfloop condition="#start#">
  <cfset stResult = REfindNoCase(regExp,sDoc,start,"Yes")>
  <cfif stResult.pos[1]>
     <cfset queryAddRow(q_srch)>
     <cfset querySetCell(q_srch,"article",mid(sDoc,stResult.pos[3],stResult.len[3]))>
     <cfset querySetCell(q_srch,"title",mid(sDoc,stResult.pos[2],stResult.len[2]))>
  </cfif>
  <cfset start = stResult.pos[1] + stResult.len[1]>
</cfloop>

The code above tells Coldfusion to create a virtual query with two columns: title and article. Next, a starting point to loop through the results is defined. The loop is then started and begins searching sDoc with the regular expression criteria defined above. Each matching result is parsed, stored in a virtual row under the respective column, and assigned a unique ID. We’re now ready to test our primitive data mining application.
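
For comparison, here is a rough JavaScript sketch of the same "find every match and collect rows" loop. It is not a translation of the Coldfusion code: the function name and the h2/p markup are assumptions made for the example, and an array of objects stands in for the virtual query.

```javascript
// Hypothetical function and markup; each match becomes one {title, article} row.
function extractArticles(html) {
  // g = keep searching from the end of the previous match, i = ignore case
  var pattern = /<h2>([\s\S]*?)<\/h2>[\s\S]*?<p>([\s\S]*?)<\/p>/gi;
  var rows = [];
  var match;
  while ((match = pattern.exec(html)) !== null) {
    rows.push({ title: match[1].trim(), article: match[2].trim() });
  }
  return rows;
}

var sample = '<h2>One</h2><p>First article</p><H2> Two </H2><P>Second article</P>';
console.log(extractArticles(sample).length);
```

The g flag plays the role of the start variable in the Coldfusion loop: each search resumes where the previous match ended, and the loop stops when nothing more is found.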

Here’s how our application should look as of now:

<cfhttp url="http://www.articles-hub.com/Article/700.html" method="GET">
<cfset sDoc = trim(cfhttp.fileContent)>
<cfset regExp = '<span class="article_display_title" > 

        ([\s\S]*?)</span>[\s\S]*?<div align=[\s\S]*?
</div>
    ([\s\S]*?)
          </div>
            </div>'>
<cfset q_srch = queryNew("title, article")>
<cfset start = 1>
<cfloop condition="#start#">
  <cfset stResult = REfindNoCase(regExp,sDoc,start,"Yes")>
  <cfif stResult.pos[1]>
     <cfset queryAddRow(q_srch)>
     <cfset querySetCell(q_srch,"article",mid(sDoc,stResult.pos[3],stResult.len[3]))>
     <cfset querySetCell(q_srch,"title",mid(sDoc,stResult.pos[2],stResult.len[2]))>
  </cfif>
  <cfset start = stResult.pos[1] + stResult.len[1]>
</cfloop>
<cfdump var="#q_srch#">

Go ahead and save the file as miner.cfm, or whatever you’d like, and browse to that file in your web browser. For example, http://192.168.230.239:80/miner.cfm. The article’s title and content are displayed in an organized table.

Here’s a screen shot of data harvested from a site containing US College information:

[Screenshot: US School Data]

Ok, that’s nice, but this information is totally useless unless we can dump it into a database, so here’s what we need to do.

After the </cfloop> tag, drop in a modified version of this code:

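<cfquery name="insert_data" datasource="localdev">
INSERT INTO article_dump(title,content)
VALUES(<cfqueryparam value="#q_srch.title#" cfsqltype="cf_sql_varchar">,
       <cfqueryparam value="#q_srch.article#" cfsqltype="cf_sql_longvarchar">)
</cfquery>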

The value of datasource is specific to each system; localdev just happens to be the name of my datasource. After defining the appropriate datasource, you can either create a table called article_dump with 3 columns (id, title, content) or use an existing table. Just make sure to change the code where necessary. If you refresh miner.cfm in your browser, the data is not only displayed but inserted into our database, too.

Let’s take this a step further, and automate the entire process. Go back to the top of miner.cfm and add the following code as the first line:

<cfloop from="500" to="5000" index="LoopCount">

Now replace 700.html on the second line with:

#LoopCount#.html

Scroll to the bottom and add the closing cfloop tag as the last line:

</cfloop>

We just told Coldfusion to visit 500.html, 501.html, 502.html, 503.html, etc., until 5000.html is reached, and to insert each set of results into the database before moving on to the next. With this short piece of code, I’ve created databases with over 20,000 records in less than an hour, and now you can, too.
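
One thing worth knowing is that the from and to values of the loop are both included, so 500 to 5000 means 4,501 page requests. A tiny JavaScript sketch of the same URL sequence makes that easy to check; the function name is my own, and the URL pattern is the one used in this post.

```javascript
// Hypothetical helper: builds the list of article URLs the loop will visit.
function articleUrls(from, to) {
  var urls = [];
  for (var id = from; id <= to; id++) { // inclusive at both ends, like the cfloop
    urls.push("http://www.articles-hub.com/Article/" + id + ".html");
  }
  return urls;
}

console.log(articleUrls(500, 5000).length);
```

Some of those IDs may not exist on the site, which is fine: a page that does not match the regular expression simply adds no row.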

Here’s the entire final product:

<cfloop from="500" to="5000" index="LoopCount">
<cfhttp url="http://www.articles-hub.com/Article/#loopcount#.html" method="GET">
<cfset sDoc = trim(cfhttp.fileContent)>
<cfset regExp = '<span class="article_display_title" > 

        ([\s\S]*?)</span>[\s\S]*?<div align=[\s\S]*?
</div>
    ([\s\S]*?)
          </div>
            </div>'>
<cfset q_srch = queryNew("title, article")>
<cfset start = 1>
<cfloop condition="#start#">
  <cfset stResult = REfindNoCase(regExp,sDoc,start,"Yes")>
  <cfif stResult.pos[1]>
     <cfset queryAddRow(q_srch)>
     <cfset querySetCell(q_srch,"article",mid(sDoc,stResult.pos[3],stResult.len[3]))>
     <cfset querySetCell(q_srch,"title",mid(sDoc,stResult.pos[2],stResult.len[2]))>
  </cfif>
  <cfset start = stResult.pos[1] + stResult.len[1]>
</cfloop>
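<cfquery name="insert_data" datasource="localdev">
INSERT INTO article_dump(title,content)
VALUES(<cfqueryparam value="#q_srch.title#" cfsqltype="cf_sql_varchar">,
       <cfqueryparam value="#q_srch.article#" cfsqltype="cf_sql_longvarchar">)
</cfquery>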
</cfloop>

{ 16 comments }

Geotargeting with PHP : A complete guide

by Patrick Altoft on September 5, 2007

Geotargeting specific adverts, affiliate offers or content pieces is one of the most efficient improvements you can make to your website.

Spend an hour installing and configuring your geotargeting script and your revenue can increase dramatically.

Geotargeting is simply the art of showing different content to your visitors depending on which country they are from. For example, if I have an affiliate offer that is only available to customers from the UK, I know that it will be useless to US visitors. Using my script I will direct US visitors towards a similar product on Amazon or eBay.

Advertising networks such as DoubleClick and Adsense allow advertisers to target specific countries resulting in larger CPM payments and greater ROI for the advertisers so it clearly makes sense for you to do the same.

Other useful applications would include allowing advertisers on your site to target traffic from a certain location or to stop Yahoo Publisher Network ads showing for international users.

The first step towards installing your geotargeting script is to visit Maxmind and purchase a downloadable GeoIP database for $50. This database allows you to match up your visitor’s IP address to their country.

Once you have the database you will need to upload it to your site. I suggest keeping it outside your root directory or renaming it in case other people try to use it. The script below expects the database file GeoIP.dat alongside MaxMind’s geoip.inc PHP include file.

Next you will need to add the script below to the top of each page on your website. If you have a main database connection file included already you could just add it to this.


if (isset($_COOKIE["geoip"])) {
// Reuse the country code cached by an earlier page view
$country = $_COOKIE["geoip"];
}
else
{
// geoip.inc is the PHP include file, GeoIP.dat is the database
include("/home/your_folder/geoip.inc");
$gi = geoip_open("GeoIP.dat",GEOIP_STANDARD);
$country = geoip_country_code_by_addr($gi, $_SERVER["REMOTE_ADDR"]);
geoip_close($gi);

setcookie("geoip", $country, time()+3600, "/", ".yoursite.com", 0); //1 hour cookie

}

Using this information

Now we have stored the country of the visitor in a $country variable that can be used on each page of your site. To make use of this simply add the following code to your pages:


if($country=="GB"){
//UK offer
}
else
{
// worldwide offer
}
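
If it helps to see the flow on its own, here is a small sketch of the same logic in JavaScript rather than PHP: check the cached cookie first, fall back to a lookup, then pick an offer. The function names and the stub lookup are hypothetical; in the real script the lookup is done by the GeoIP database.

```javascript
// Hypothetical helpers illustrating the cookie-then-lookup flow.
// lookupCountry stands in for the GeoIP database call.
function resolveCountry(cookies, ipAddress, lookupCountry) {
  if (cookies.geoip) {
    return { country: cookies.geoip, fromCache: true };
  }
  return { country: lookupCountry(ipAddress), fromCache: false };
}

// "GB" is the country code the post uses for UK visitors.
function chooseOffer(country) {
  return country === "GB" ? "UK offer" : "worldwide offer";
}

// Stub lookup for the example: pretends every address is in the US.
var stubLookup = function (ip) { return "US"; };
var visitor = resolveCountry({}, "203.0.113.7", stubLookup);
console.log(chooseOffer(visitor.country));
```

Caching the result in a cookie means the database is only opened once per visitor per hour instead of on every page view.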

Combine this script with the outbound affiliate link redirection script and you have a perfect money making machine.

If you have any questions please post below.

{ 15 comments }

XSS Exploit on Half a Million 123 Reg Parked Domains

by Patrick Altoft on September 5, 2007

UK registrar 123-reg.co.uk has had a fair few customer relations issues in the past. Today I was digging into an issue for a client site and found some interesting things related to the way 123 Reg handles parked pages.

The problem was that the client’s site didn’t open when you missed the www out of the domain. For example, visiting this link was OK but this one takes you to a parked page (this site isn’t my client, just an example).

A quick check on how many sites 123 Reg has parked and indexed in Google reveals about half a million so there are plenty of trusted domains to have fun with.

123 Reg has left a nice XSS hole in their parked pages allowing any users to create an unlimited number of links on spam sites like this one and even better this one.

Basically every single one of the half million domains parked with 123 Reg can be injected with links to whatever sites a spammer wants.

[Screenshot: 123 Reg XSS]

{ 4 comments }

Why you need an API

by Patrick Altoft on September 4, 2007

The benefits of having an affiliate program are well documented. Savvy ecommerce site owners can analyse the marketing strategies of successful affiliates and copy them for greater revenue. Merchants can also hide behind fake affiliate accounts while using blackhat and email spam to promote their products.

Having an API is just as good as having an affiliate program if you don’t operate a merchant site. BlogStorm uses the Yahoo API to track links and thousands of other sites create applications far cooler than those offered by the parent company.

If Yahoo, Google or any other company decided that an application running off their API was becoming too popular, they could code a copycat version within weeks and gain more traction than the original quite easily.

Alexa recently added a bunch of new features to the Alexa charts after Alexaholic started to gain more and more users. Eventually Alexa blocked Alexaholic resulting in a PR disaster.

Today we see that Digg has a picture section, courtesy of a clever programmer and the Digg API. Digg has promised a picture section by October so there is plenty of time for digpicz.com to gain traction in the meantime. No doubt Digg programmers will be keen to see their users’ reaction to digpicz.com, and they have over a month to analyse feedback and improve their picture section accordingly.

The Digg management team has more awareness of reputation management than most web companies so they probably won’t try to shut down the digpicz.com but it must be reassuring for them to know they could if they had to.

{ 0 comments }