The user agent string, sent in the HTTP request header with each web request, identifies the browser type and often carries basic system information as well.
Example user agent string sent from a web browser during an HTTP request:
Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US) AppleWebKit/532.5 (KHTML, like Gecko) Chrome/18.104.22.168 Safari/532.5
The example above provides information such as the browser and its version, the user's locale (language), the OS, the system architecture and the layout engine used. When authoring documents for the Web, information from the user agent string can be valuable in determining how best to mark up documents.
Getting the information is easy.
Collecting user agent strings
Two methods for accessing the user agent string include:
- From the HTTP request header’s User-Agent field; and
Reading from the User-Agent field
A benefit of using the HTTP header to gather data is simplicity of design.
HTTP request header showing the User-Agent field:
GET / HTTP/1.1
Host: livehttpheaders.mozdev.org
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US) AppleWebKit/532.5 (KHTML, like Gecko) Chrome/22.214.171.124 Safari/532.5
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Because the user agent string travels in the HTTP header, it reaches the HTTP server directly with every page request, so the server can write it to a log file for later analysis. The user agent string alone provides enough information to implement valuable browser support strategies on websites, such as graded browser support.
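As a rough illustration of the server-side step described above, the following Python sketch pulls the User-Agent value out of a raw request and appends it to a log. The function names `extract_user_agent` and `log_user_agent`, the in-memory list standing in for a log file, and the `example.org` host are assumptions made for this example; a real web server would normally expose the header through its own API or log format.

```python
from typing import List, Optional


def extract_user_agent(raw_request: str) -> Optional[str]:
    """Return the User-Agent value from a raw HTTP request, or None if absent."""
    # Headers end at the first blank line; the first line is the request line.
    head = raw_request.replace("\r\n", "\n").split("\n\n", 1)[0]
    for line in head.split("\n")[1:]:
        name, sep, value = line.partition(":")
        # Header field names are case-insensitive.
        if sep and name.strip().lower() == "user-agent":
            return value.strip()
    return None


def log_user_agent(raw_request: str, log_lines: List[str]) -> None:
    """Append the User-Agent (or "-" when missing) to an in-memory log."""
    log_lines.append(extract_user_agent(raw_request) or "-")


# Hypothetical request used only to exercise the functions above.
request = (
    "GET / HTTP/1.1\r\n"
    "Host: example.org\r\n"
    "User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US)\r\n"
    "\r\n"
)
log: List[str] = []
log_user_agent(request, log)
```

Writing "-" for a missing header is just one common logging convention; it keeps one log line per request even when a client sends no User-Agent at all.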
Regardless of the collection approach used, methods for extracting data from the string remain similar.
Data extraction methods
Once user agent strings have been collected, data extraction may take place. Two methods for reading and extracting information from a user agent string are brute force and pattern recognition:
- Under the brute force approach the user agent string is compared programmatically to a database of known strings. Though it offers a relatively simple implementation, the brute force approach can be difficult to maintain and becomes increasingly inefficient as comparison data sets grow larger.
- Thanks to RFC 2616 and preceding RFCs, and to de facto standards for formatting user agent strings, another method known as pattern recognition is possible. Using pattern recognition, the user agent string is broken into its component pieces and heuristics are applied to gather information. Though more complex to implement than the brute force approach, pattern recognition does not suffer from the same efficiency and maintainability problems in the long run.
Because of its drawbacks when extracting data from user agent strings, the brute force approach will not be discussed further in this article.
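To make the pattern recognition idea more concrete, here is a minimal Python sketch that splits a user agent string into product/version tokens and parenthesised comment fields, then applies two simple heuristics. The regular expression, the function names (`parse_user_agent`, `guess_os`, `guess_locale`) and the heuristics themselves are assumptions chosen for illustration, not the method of any particular library, and real-world strings contain many variations this sketch ignores. The sample string is an abbreviated form of the example shown earlier.

```python
import re
from typing import Dict, List, Optional

# One token is either a parenthesised comment or a product with an optional version.
TOKEN = re.compile(r"\(([^)]*)\)|([^\s/()]+)(?:/([^\s()]+))?")


def parse_user_agent(ua: str) -> Dict[str, list]:
    """Split a user agent string into product tokens and comment fields."""
    products = []  # (name, version) pairs in order of appearance
    comments = []  # individual fields taken from each parenthesised comment
    for comment, name, version in TOKEN.findall(ua):
        if name:
            products.append((name, version or None))
        else:
            fields = re.split(r"[;,]", comment)
            comments.extend(f.strip() for f in fields if f.strip())
    return {"products": products, "comments": comments}


def guess_os(comments: List[str]) -> Optional[str]:
    """Heuristic: the first comment field starting with 'Windows NT' names the OS."""
    for field in comments:
        if field.startswith("Windows NT"):
            return field
    return None


def guess_locale(comments: List[str]) -> Optional[str]:
    """Heuristic: a field shaped like 'en' or 'en-US' is treated as the locale."""
    for field in comments:
        if re.fullmatch(r"[a-z]{2}(-[A-Z]{2})?", field):
            return field
    return None


sample = (
    "Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US) "
    "AppleWebKit/532.5 (KHTML, like Gecko) Safari/532.5"
)
parsed = parse_user_agent(sample)
print(parsed["products"])
print(guess_os(parsed["comments"]), guess_locale(parsed["comments"]))
```

Separating tokenizing from the heuristics means that supporting a new browser or platform usually requires only another small heuristic function rather than a new entry for every known string.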
Check out Identify User Agent by string format recognition for an example of user agent pattern recognition. Though a little outdated, the article goes into more depth, demonstrates some useful programming techniques and has lax copyright restrictions.
User agent spoofing
Impersonating browsers and mobile devices is simple with Firefox. Just download the User Agent Switcher plug-in and put it to the test at useragentstring.com. See Web Development and Debugging Tools for a list of tools useful for front-end development.
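Spoofing is not limited to browser plug-ins: any HTTP client can announce whatever user agent it likes. As a small sketch using Python's standard `urllib.request` module, the following builds a request with a custom User-Agent header without sending it. The helper name `build_spoofed_request`, the `example.org` URL and the `ExampleBrowser/1.0` string are hypothetical values chosen for illustration.

```python
import urllib.request


def build_spoofed_request(url: str, user_agent: str) -> urllib.request.Request:
    """Create a request that announces `user_agent` instead of the client default."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


req = build_spoofed_request("http://example.org/", "ExampleBrowser/1.0 (hypothetical)")
# urllib stores header names via str.capitalize(), hence the "User-agent" spelling.
print(req.get_header("User-agent"))
```

Because the header is so easy to change, user agent data collected from logs is best treated as a hint rather than proof of what software made the request.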