Designing URIs
ASP.NET is essentially an advanced request-processing framework. Naturally, the URI is the most important part of any request (or should be). URIs should be well designed, and should represent the requested content accurately and succinctly. Unfortunately, they are frequently misused, which causes browsers, users, and search engines no end of trouble.
Tip: Although you can't change the main portion of the URI without reloading the page, you can modify the fragment to your heart's content with JavaScript. Adobe has released a BrowserManager class that makes "deep linking" easier, and vastly improves the user's experience on all-Flash sites.
Some misuse URIs by making them too generic;
some sites have everything on the home page.
Flash, AJAX, and frames are the biggest culprits here,
as they are capable of making big changes to the current content of the page without affecting the address bar.
Users of this type of site are frustrated because if they bookmark a buried page in the site,
it only records the address of the home page.
The back button also betrays them - it doesn't undo their actions anymore,
but plops them completely off the site.
Search engines dislike these sites because either (1) they can't access buried content due to its form
(JavaScript or Flash) or (2) they can access it, but all keywords are diluted by the massive amount
of content available on one page.
Some developers take the misuse to the opposite extreme.
They feel that the address bar is the perfect place to store all variables, interface state data,
and user preferences. They, too, cause problems for both users and search engines.
Users bookmarking or e-mailing such links often find that they no longer work after their session has expired,
or after a change was made on the site.
Their length and lack of simplicity also make them hard to understand,
as many users depend on the address bar to understand where they are located on the site.
URLs longer than 80 characters are also a pain to e-mail. Many e-mail clients will break the URL in half, making it unusable.
Search engines find this type of URI confusing, because they see (and rank) each unique URI as a separate page, and dilute ranking accordingly.
So, you ask, what makes a good URI?
- It should be as short as possible. Don't sacrifice consistency or obviousness, but be brief.
- Organize and name things logically. ASP.NET isn't always helpful in keeping a clean structure, so I highly recommend that you use a URL rewriting module. URIs should be 'hackable': a user should be able to trim the end of the path and land on a sensible parent page.
- URIs should be deterministic:
  - No two URIs should ever display the same page.
  - The same URI should always display the same content.
- The query string should only contain data that AFFECTS THE QUERY. If it doesn't describe the content, it doesn't belong. Keep the query string for queries, please.
- The URI path should not rely on cryptic or numerical identifiers. If it does, it should also provide a human-readable title. It's really nice to be able to look at a URL and guess what it contains - especially when you have a long list of them. As a bonus, search engines absolutely love URIs that match keywords.
Tip: Don't try to spam URLs with keywords. Density algorithms are applied here, also. As with page titles, pick 1 keyword and stick with it.
Bad examples:
- /Default.aspx?tabid=3
- /Products/ShowProduct.aspx?prodid=4982
- /showblog.aspx?articleid=98
Better examples:
- /Default.aspx?tabid=3&title=ContactUs
- /Products/ShowProduct.aspx?id=4982&product=Nokia_Wall_Adapter_12V
- /showblog.aspx?articleid=98&title=Why_you_should_never_concatenate_SQL_commands
Even better:
- /contact/
- /products/4982_Nokia_Wall_Adapter_12v
- /blog/98_Why_you_should_never_concatenate_SQL_commands
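The "even better" paths above can be generated from an ID and a title. The following is a minimal sketch in Python (used here only for brevity; the same idea carries over to ASP.NET). The function names slugify and readable_path, and the /section/id_title layout, are assumptions made for illustration.

```python
import re


def slugify(title: str) -> str:
    """Turn a human-readable title into an underscore-separated URI segment."""
    # Collapse every run of characters that is not a letter or digit into "_".
    slug = re.sub(r"[^A-Za-z0-9]+", "_", title)
    # Drop any separator left at the start or end.
    return slug.strip("_")


def readable_path(section: str, item_id: int, title: str) -> str:
    """Build a path of the form /section/id_title (hypothetical layout)."""
    return f"/{section}/{item_id}_{slugify(title)}"


print(readable_path("blog", 98, "Why you should never concatenate SQL commands"))
print(readable_path("products", 4982, "Nokia Wall Adapter 12v"))
```

Keeping the numeric ID in the path means the title portion can change later without breaking the lookup, as long as the application only uses the ID to find the record.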
WWW
The famous "www" prefix is actually pointless. You can still have ftp, mail, and smtp subdomains without forcing your website to use www. The www convention arose because servers were typically named after their role, and HTTP was just starting out. Since web browsers primarily speak HTTP, you should really point your second-level domain (example.com) directly to your web server. Be aware that some search engines will index www.example.com and example.com separately, since they are different locations. To prevent SSL certificate problems and cross-domain flakiness in Flash, you should standardize on one or the other. You can force this by checking for www in Global.asax, and calling Response.Redirect() with the "fixed" version of the requested URI. Note that Response.Redirect() sends a temporary (302) redirect; for search engines, a permanent (301) redirect is the better choice.
URIs in the HTTP protocol
Let's look at how a URI is sent to the server using HTTP. Here is a basic GET request. The first line consists of the HTTP method, followed by a root-relative path (including the query string), then the protocol version. The subsequent lines contain the header collection, in the form of simple name-colon-value pairs. The two parts of the URI here are the path and query (/blog?page=2), and the Host header (youngfoundations.org). We know that the scheme is probably "http" since we are communicating using the HTTP protocol. IIS tells us which port the request arrived on, so between the pieces we can reconstruct the original URI somewhat accurately.
Note: there are LOTS of schemes out there that use the HTTP protocol, like firefoxurl://, etc.
Note: The HOST header is important, since some servers host dozens of domains,
and this allows IIS to forward the request to the appropriate application in shared hosting situations.
Multiple domains (hostnames) can be pointed to a single application.
The path and the query are divided by the first question mark.
GET Request
GET /blog?page=2 HTTP/1.1[CRLF]
Host: youngfoundations.org[CRLF]
Connection: close[CRLF]
Accept-Encoding: gzip[CRLF]
Accept: text/xml,application/xml,application/xhtml+xml,text/html; q=0.9,text/plain; q=0.8,image/png,*/*;q=0.5[CRLF]
Accept-Language: en-us,en;q=0.5[CRLF]
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7[CRLF]
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.8.1.5; .NET CLR 2.0.50727) Gecko/20070713 Firefox/2.0.0.5 Web-Sniffer/1.0.24[CRLF]
Referer: http://web-sniffer.net/[CRLF]
[CRLF]
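To make the reconstruction concrete, here is a rough Python sketch that rebuilds the absolute URI from a raw request like the one above. The function names, and the idea of passing the scheme and port in as arguments (standing in for what the web server would report), are assumptions made for illustration.

```python
def split_target(target: str):
    """Divide a request target into (path, query) at the first question mark."""
    path, _, query = target.partition("?")
    return path, query


def reconstruct_uri(raw_request: str, scheme: str = "http", port: int = 80) -> str:
    """Rebuild an absolute URI from a raw HTTP request (simplified sketch)."""
    lines = raw_request.split("\r\n")
    # Request line: METHOD, request target, protocol version.
    method, target, version = lines[0].split(" ")
    headers = {}
    for line in lines[1:]:
        if not line:  # an empty line ends the header collection
            break
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    host = headers["host"]
    # Only spell out the port when it is not the default for the scheme
    # and the Host header does not already carry one.
    default_port = {"http": 80, "https": 443}.get(scheme)
    if port != default_port and ":" not in host:
        host = f"{host}:{port}"
    return f"{scheme}://{host}{target}"


RAW_REQUEST = (
    "GET /blog?page=2 HTTP/1.1\r\n"
    "Host: youngfoundations.org\r\n"
    "Connection: close\r\n"
    "\r\n"
)

print(reconstruct_uri(RAW_REQUEST))
print(split_target("/blog?page=2"))
```

The fragment never appears in this reconstruction, because the client does not send it.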
The client can send content with any request, although it is typically sent with the POST method.
The header collection is separated from the request body by the character sequence [CRLF][CRLF] (two consecutive line breaks, that is, one empty line).
The content in the request body is described by the Content-Type and Content-Length HTTP headers.
POST Request
POST /blog HTTP/1.1[CRLF]
Host: youngfoundations.org[CRLF]
Connection: close[CRLF]
Accept-Encoding: gzip[CRLF]
Accept: text/xml,application/xml,application/xhtml+xml,text/html; q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5[CRLF]
Accept-Language: en-us,en;q=0.5[CRLF]
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7[CRLF]
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.8.1.5; .NET CLR 2.0.50727) Gecko/20070713 Firefox/2.0.0.5 Web-Sniffer/1.0.24[CRLF]
Referer: http://web-sniffer.net/[CRLF]
Content-type: text/html; charset=utf-8 [CRLF]
Content-length: 19[CRLF]
[CRLF]
Sample content body
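The split between headers and body can be sketched in a few lines of Python. This is an illustration only: the function name split_message is made up, and a real server also has to handle chunked bodies, repeated headers, and malformed input.

```python
def split_message(raw: bytes):
    """Split a raw HTTP request into (header dict, body bytes)."""
    # The first empty line (CRLF CRLF) separates the headers from the body.
    head, _, rest = raw.partition(b"\r\n\r\n")
    # Skip the request line; the remaining lines are name-colon-value pairs.
    header_lines = head.decode("iso-8859-1").split("\r\n")[1:]
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    # Content-Length tells us how many bytes of body to read.
    declared = int(headers.get("content-length", "0"))
    return headers, rest[:declared]


RAW_POST = (
    b"POST /blog HTTP/1.1\r\n"
    b"Host: youngfoundations.org\r\n"
    b"Content-type: text/html; charset=utf-8\r\n"
    b"Content-length: 19\r\n"
    b"\r\n"
    b"Sample content body"
)

headers, body = split_message(RAW_POST)
print(headers["content-type"])
print(len(body), body)
```

Header names are lowercased in the dictionary because HTTP header names are case-insensitive.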
Response
The HTTP response generated by your ASP.NET application looks slightly different than the request that prompted it. The general format remains, but the first line is now [HTTP Version] [Status-code] [Status code description]. HTTP status codes are very important, but are beyond the scope of this article.
HTTP/1.1 301 Moved Permanently [CRLF]
Connection: close [CRLF]
Date:Fri, 03 Aug 2007 00:36:57 GMT [CRLF]
Server:Microsoft-IIS/6.0 [CRLF]
X-Powered-By:ASP.NET [CRLF]
Location:http://www.microsoft.com [CRLF]
Content-Length:31 [CRLF]
Content-Type:text/html [CRLF]
Set-Cookie:ASPSESSIONIDSSSBDQAT=PIJAGJDBFFLFAALAJDCGBAMI; path=/ [CRLF]
Cache-control:private [CRLF]
[CRLF]
[Content-body]
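As a small illustration of the status line format described above, the following Python sketch splits it into its three parts. The helper names are hypothetical.

```python
def parse_status_line(line: str):
    """Split a status line into (version, status code, description)."""
    # Split on the first two spaces only; the description may contain spaces.
    parts = line.strip().split(" ", 2)
    version, code = parts[0], int(parts[1])
    reason = parts[2] if len(parts) > 2 else ""
    return version, code, reason


def is_redirect(code: int) -> bool:
    """Status codes in the 3xx range indicate a redirection."""
    return 300 <= code < 400


version, code, reason = parse_status_line("HTTP/1.1 301 Moved Permanently ")
print(version, code, reason, is_redirect(code))
```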
Important note: If you have multiple domains pointing to one website,
make sure they are all 301 redirected to precisely one host name.
Otherwise you will sabotage your search engine placement by (1) diluting your page rank, and
(2) being penalized for duplicate content.
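The redirect rule in the note above can be sketched as follows. This is Python for brevity, not ASP.NET code: CANONICAL_HOST and canonical_redirect are hypothetical names, and the sketch assumes the site runs on the default port for its scheme.

```python
CANONICAL_HOST = "example.com"  # hypothetical; choose either the www or the bare form


def canonical_redirect(host: str, target: str, scheme: str = "http"):
    """Return (status, headers) for a 301 if the host is not canonical, else None."""
    # Host names are case-insensitive; ignore any port in the Host header.
    hostname = host.split(":")[0].lower()
    if hostname == CANONICAL_HOST:
        return None  # already on the canonical host, serve the page normally
    # Keep the path and query intact so deep links survive the redirect.
    location = f"{scheme}://{CANONICAL_HOST}{target}"
    return "301 Moved Permanently", [("Location", location)]


print(canonical_redirect("www.example.com", "/blog?page=2"))
print(canonical_redirect("example.com", "/blog?page=2"))
```

Preserving the requested path and query in the Location header matters: redirecting every alternate host name to the home page would break bookmarks and inbound links.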
URIs versus URLs
The term URL (Uniform Resource Locator) has been considered obsolete by the W3C for a long time. In its place stands the URI (the Uniform Resource Identifier). Strictly speaking, a URL must provide all of the information required to locate and retrieve a resource, while a URI is only required to identify it in relation to the current context. Thus, a URL is a URI that "in addition to identifying a resource, [provides] a means of locating the resource by describing its primary access mechanism (e.g., its network 'location')". Most people aren't aware of the difference, and use the terms interchangeably.
For example, the following URI is also a URL:
- http://www.mysite.com:54321/folder/virtualfolder/default.aspx?param1=thisisatest&param2=test2
However, these are not:
- ../css/shared.css [URI relative to the location of the parent document]
- /images/banner.jpg [URI relative to the current network location (usually termed 'absolute')]
- Logo.gif [URI relative to the location of the parent document.]
- #requirements [URI fragment relative to current document.]
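To see how a client turns the relative forms above into full URLs, here is a short Python sketch using the standard library's urljoin. The parent document address in BASE is a made-up example.

```python
from urllib.parse import urljoin

# Hypothetical address of the parent document that contains the links.
BASE = "http://www.mysite.com/folder/page.html"


def resolve(reference: str) -> str:
    """Resolve a relative reference against the parent document's URL."""
    return urljoin(BASE, reference)


for reference in ["../css/shared.css", "/images/banner.jpg", "Logo.gif", "#requirements"]:
    print(f"{reference:20} -> {resolve(reference)}")
```

Each reference only becomes a locator once it is combined with the context it was found in, which is the distinction the section draws.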
Fragments
Fragments describe a section, place, or entity in the current document. In HTML, they usually refer to a certain anchor tag (by name or ID). The window is usually scrolled to the location of the anchor tag. Fragments are never sent to the server computer, and only function as a display instruction to the client. If a fragment isn't understood, it is ignored. Fragments are pretty much free-form. If the current document is http://mysite.com/home.html and a link to http://mysite.com/home.html#part3 is clicked, the browser (or user-agent) is not supposed to ask the server for http://mysite.com/home.html again, but older clients may. Relative fragments like #part3 are handled more reliably.
Now let us dissect the following URL:
http://www.mysite.com:54321/folder/virtualfolder/default.aspx?param1=thisisatest&param2=test2
http
The scheme (protocol). The protocol determines how the client should talk to the server (basically the language, or grammar).
www.mysite.com
The computer the resource is located on (DNS, WINS, or IP Address).
:54321
The port number to communicate with on the computer. Instead of trying to sort out incoming packets and route them to the right application on the server computer, ports are used. Certain default ports are assumed for some protocols. HTTP requests are sent to port 80 by default. HTTPS requests are sent to port 443, and FTP requests are sent to port 21. If an application is not listening on that port (or the request packets are blocked by a firewall), no response will be given. Additional sorting is sometimes performed, as in the case of WCF (.NET 3.0) port sharing, or when multiple sites are hosted on a single server.
When an HTTP request is sent to a server, it is accompanied by the original hostname from the address bar. An unlimited number of DNS (Domain Name System) addresses can point to a single computer, which is convenient for web hosting providers. IIS (Internet Information Services) can be configured to look at this host header, and forward the request to whichever site is configured to receive requests for that particular hostname (DNS address). For information about DNS, read http://en.wikipedia.org/wiki/Domain_name_system.
Super-simplified view of DNS
DNS addresses are hierarchical, and levels (domains) are separated by a period. Domains progress from most specific to least specific. For example, in resolving www.mysite.com, the following steps would be taken:
- Ask computer 'COM' where computer 'MYSITE' is at (what its IP address is).
- Ask computer 'MYSITE' where computer 'WWW' is at.
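The URL dissection and the right-to-left DNS walk described above can both be tried out with Python's standard library. This is only a sketch: resolution_order is a hypothetical helper that mirrors the super-simplified steps, not how a real resolver is implemented.

```python
from urllib.parse import parse_qs, urldefrag, urlsplit

URL = ("http://www.mysite.com:54321/folder/virtualfolder/default.aspx"
       "?param1=thisisatest&param2=test2")

parts = urlsplit(URL)
print(parts.scheme)           # the scheme (protocol)
print(parts.hostname)         # the computer the resource is located on
print(parts.port)             # the port number
print(parts.path)             # the path on the server
print(parse_qs(parts.query))  # the query, split into named values

# The fragment is handled by the client and is not part of what gets requested.
without_fragment, fragment = urldefrag("http://mysite.com/home.html#part3")
print(without_fragment, fragment)


def resolution_order(hostname: str):
    """Return the DNS labels from least specific to most specific."""
    return list(reversed(hostname.split(".")))


print(resolution_order(parts.hostname))
```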