"Phishing" is the term coined by hackers for attempts to lure personal information out of people by persuading them to visit web sites that look like genuine bank, credit card, or payment sites but are actually sophisticated fakes of those sites.
This is a rough description of how the phishing net works. The process is fairly complicated, so the description cannot be complete in every detail.
Many of the items listed below handle "obfuscations" (attempts to disguise the real text) of text and URLs. These include swapping letters around, using letters that look very much like other letters, using ";" instead of ":", using "," instead of "." and many similar tricks. I have tried to highlight which rules handle obfuscations, but I have not given the details of exactly what each rule will accept. Many variations on the expected text will be detected.
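
To make the idea of an obfuscation concrete, here is a minimal sketch of undoing two of the tricks mentioned above: ";" used in place of ":" after the scheme name, and "," used in place of "." in a host name. The function names and the regular expression are my own assumptions for illustration; the real rules accept many more variations than this.

```python
import re

# Hypothetical pattern: a scheme name followed by ":" or ";" and any run of
# forward or backward slashes. The real rules accept many more variations.
_SCHEME = re.compile(r"^(?:https?|ftp)[:;][/\\]*", re.IGNORECASE)


def strip_scheme(url: str) -> str:
    """Remove a leading http://, https:// or ftp://, or a simple obfuscation of it."""
    return _SCHEME.sub("", url.strip())


def undo_comma_dots(host: str) -> str:
    """Treat "," as an obfuscated "." in a host name."""
    return host.replace(",", ".")


if __name__ == "__main__":
    print(strip_scheme("http;//www.example.com"))
    print(undo_comma_dots("www,example,com"))
```

Both calls in the example print `www.example.com`.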
| Keep track of all <BASE> tags as they provide a root URL for every relative link on the page. | ||
| Attach the <BASE> URL onto the front of all relative URLs contained in every link on the page. | ||
| Look for links contained in imagemaps. The imagemap may be inside a link to a safe site, and contain an image of the text of the name of the safe site. But it can have a rectangle defined in it, whose link destination is a fraud site. Reduce these by removing imagemaps so the real destination of the link is used instead of the apparent destination. | ||
| | ||
| Real destination or apparent destination | Operation | |
|---|---|---|
| apparent | Convert to lower case. | |
| apparent | Allow for links that look like Microsoft's ADO.Net, ASP.Net and other .Net functionality. | |
| apparent | Remove %a0 encoded characters (hard space). | |
| apparent | Decode all %-encoded characters. | |
| apparent | Remove all white space. | |
| apparent | Remove all leading numbers in square brackets. | |
| apparent | Change any \ to / as many browsers do this quietly to help Windows authors. | |
| apparent | Remove all HTML tags. | |
| apparent | Remove the username part of email addresses. | |
| apparent | Remove all &-encoded symbols such as &lt; and &gt;. | |
| apparent | Remove leading <. | |
| apparent | Remove trailing >. | |
| apparent | Convert all & characters to their international equivalent. | |
| real | Convert to lower case. | |
| real | Remove %a0 encoded characters (hard space). | |
| real | Decode all %-encoded characters. | |
| real | Force "safe" result if it does not contain either a . or a /. | |
| real | Remove all white space. | |
| real | Change all \ to / as many browsers do this quietly. | |
| real | Force "safe" result if it is an email address. | |
| real | Remove trailing dots and commas and other punctuation. | |
| real | Remove leading [numbers]. | |
| real | Remove all HTML tags. | |
| real | Remove "blocked::" labels as inserted by some other products. | |
| real | Remove "outbind:// | |
| real | Insert the BASE URL if the link is relative and the BASE URL is defined. | |
| real | Remove any leading http:// or ftp:// or obfuscations of those, including replacing the : with a ;. | |
| real | Force "safe" result if it is a mailto: link. | |
| real | Remove everything after the first / or ?. | |
| real | Remove any trailing br, p or ul tags. | |
| real | Force "safe" result if it is a file: link. | |
| real | Force "safe" result if it is a link to somewhere else in the same page (internal link). | |
| real | Remove any trailing /. | |
| real | Force "dangerous" result if URL contains any non-printable-ASCII characters. | |
| real | Identify JavaScript links. | |
| apparent | Continue searching if any of these are true:
| |
| apparent | Remove leading strings that look like http:, ftp:, mailto: and obfuscations of these. | |
| apparent | Remove everything after the first /. | |
| apparent | Remove all trailing . characters (and obfuscations). | |
| apparent | Add www. on the front unless it already starts with www, ftp, mailto or obfuscations of these. | |
| real | Force a "dangerous" result if Phishing By Numbers is enabled and the link is a numeric address (IPv4 or IPv6). | |
| | ||
| both | Compare the apparent destination with the real destination, with an optional www on the front. | |
| | ||
| If they do not match, and the real address is not in the Phishing Safe Sites file, trigger a "dangerous" result. | ||
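
The table above can be sketched roughly in code. This is only a simplified illustration covering a subset of the steps, not the actual implementation: it assumes the link text has already been identified as something that looks like a URL, it treats the Phishing Safe Sites file as a plain set of host names, and every function name here is hypothetical.

```python
import re
from urllib.parse import unquote


def _common(text: str) -> str:
    """Steps shared by both columns of the table (a subset only)."""
    url = text.lower()
    url = url.replace("%a0", "")       # drop encoded hard spaces
    url = unquote(url)                 # decode remaining %-encoded characters
    url = re.sub(r"\s+", "", url)      # remove all white space
    url = url.replace("\\", "/")       # browsers quietly treat \ as /
    url = re.sub(r"<[^>]*>", "", url)  # remove HTML tags
    # Strip a leading scheme, allowing ";" as an obfuscated ":".
    return re.sub(r"^(?:https?|ftp)[:;]/*", "", url)


def normalise_real(href: str) -> str:
    """Reduce the real destination of a link to a bare host name."""
    url = re.split(r"[/?]", _common(href), maxsplit=1)[0]
    return url.rstrip(".,")            # trailing dots and commas


def normalise_apparent(text: str) -> str:
    """Reduce the displayed link text to a host name with www. on the front."""
    url = _common(text).split("/", 1)[0].rstrip(".")
    if not url.startswith(("www", "ftp", "mailto")):
        url = "www." + url
    return url


def is_dangerous(text: str, href: str, safe_sites=frozenset()) -> bool:
    """Strict check: the hosts must match, allowing an optional www. prefix."""
    real = normalise_real(href)
    apparent = normalise_apparent(text)
    if apparent in (real, "www." + real):
        return False
    return real not in safe_sites      # assumed format: a set of host names
```

For example, `is_dangerous("www.mybank.com", "http://evil.example.net/mybank")` returns `True`, while the same link text pointing at `http://www.mybank.com/login` returns `False`.
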
The less strict phishing net follows the same process as above, except that it also has a list of all the "generic" domains in every country around the world, such as ".com", ".co.uk", ".mil.es" and so on.
It chops the generic domain off the end, and uses the last remaining element as the name of the company or organisation owning the domain. If the displayed URL contains the same organisation name as the real URL then the result is considered to be a safe link.
So, for example, "http://www.mycompany.co.uk" with a real URL of "http://tracker.mycompany.co.uk" would be considered safe, but "http://www.othercompany.co.uk" would be considered dangerous, and would be highlighted.
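
The organisation-name comparison can be sketched as follows. This is a minimal illustration under stated assumptions: `GENERIC_DOMAINS` is a tiny stand-in for the real list, the function names are hypothetical, the inputs are host names with any leading http:// already removed, and the sketch compares the extracted names for exact equality.

```python
# Tiny illustrative subset; the real list covers every country.
GENERIC_DOMAINS = {"com", "co.uk", "mil.es"}


def organisation(host: str) -> str:
    """Chop the generic domain off the end and return the last remaining element."""
    labels = host.lower().strip(".").split(".")
    # Try the longest candidate suffix first, so "co.uk" is found before "uk".
    for i in range(1, len(labels)):
        if ".".join(labels[i:]) in GENERIC_DOMAINS:
            return labels[i - 1]
    # No known generic domain: fall back to comparing the whole host name.
    return host.lower()


def same_organisation(apparent_host: str, real_host: str) -> bool:
    """Less strict check: safe if both hosts belong to the same organisation."""
    return organisation(apparent_host) == organisation(real_host)


if __name__ == "__main__":
    print(same_organisation("www.mycompany.co.uk", "tracker.mycompany.co.uk"))
    print(same_organisation("www.mycompany.co.uk", "www.othercompany.co.uk"))
```

The first call prints `True` (considered safe) and the second prints `False` (considered dangerous), matching the example above.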
The result is slightly less strict checking, but far fewer false alarms caused by companies that like to monitor exactly who clicks on what by using multiple servers.
If you are running an ISP, I strongly recommend that you run the "less strict" phishing net.