Getting familiar with Sitemaps

30 / Jul / 2015 by Rishabh Dixit

About Sitemap

A Sitemap is an XML file that lists the URLs of a website along with basic information about each one: when it was last updated, how often it changes, and how important it is relative to the site's other URLs (its priority).

Purpose of Sitemap

Search engines use the information listed in the Sitemap.xml file to decide efficiently which URLs of a website are available for crawling. Sitemaps are therefore a URL inclusion protocol that lets search engines crawl a site more effectively and sensibly.

Because a Sitemap is an XML file, it is made up of XML tags. The Sitemap file must be UTF-8 encoded and can be either a plain XML file or a gzip-compressed .gz file. The following example shows a Sitemap with the supported XML tags:


<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
   <url>
      <loc>http://www.example.com/</loc>
      <lastmod>2015-07-29</lastmod>
      <changefreq>monthly</changefreq>
      <priority>0.6</priority>
   </url>
</urlset>

  • <urlset> : This required tag is the topmost element of the Sitemap file, placed right after the <?xml version> declaration. All other elements must be contained inside it.
  • <url> : This required tag is the parent tag for each URL entry. The remaining tags are children of this tag.
  • <loc> : This required tag gives the URL of the page. The URL must begin with the protocol (such as http or https) and end with a trailing slash if the web server requires it. Its value must be less than 2,048 characters.
  • <lastmod> : This optional tag gives the date the file was last modified. One accepted format is YYYY-MM-DD.
  • <changefreq> : This optional tag indicates how frequently the page is likely to change. Its valid values are: never, yearly, monthly, weekly, daily, hourly, always.
    A document that changes every time it is accessed should use the “always” value, while archived URLs that will never change again should use the “never” value.
  • <priority> : This optional tag describes the priority of this URL relative to the other URLs of the website. Valid values range from 0.0 to 1.0. Its default value for a page is 0.5.
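The tag rules above can be sketched in a few lines of Python using only the standard library. This is a minimal illustration, not a complete generator: the function name `build_sitemap` and the dictionary keys used for each entry are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
VALID_CHANGEFREQ = {"always", "hourly", "daily", "weekly",
                    "monthly", "yearly", "never"}


def build_sitemap(entries):
    """Return a UTF-8 encoded <urlset> document as bytes.

    `entries` is a list of dicts with a required "loc" key and optional
    "lastmod", "changefreq" and "priority" keys (hypothetical input shape).
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for entry in entries:
        loc = entry["loc"]
        if len(loc) >= 2048:
            raise ValueError("loc must be less than 2048 characters")
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc

        if "lastmod" in entry:
            # Expected as a YYYY-MM-DD string; not validated here.
            ET.SubElement(url, "lastmod").text = entry["lastmod"]

        if "changefreq" in entry:
            if entry["changefreq"] not in VALID_CHANGEFREQ:
                raise ValueError("invalid changefreq: %r" % entry["changefreq"])
            ET.SubElement(url, "changefreq").text = entry["changefreq"]

        if "priority" in entry:
            priority = float(entry["priority"])
            if not 0.0 <= priority <= 1.0:
                raise ValueError("priority must be between 0.0 and 1.0")
            ET.SubElement(url, "priority").text = "%.1f" % priority

    # ElementTree escapes special characters in element text automatically.
    return ET.tostring(urlset, encoding="UTF-8", xml_declaration=True)


if __name__ == "__main__":
    sample = [{"loc": "http://www.example.com/", "lastmod": "2015-07-29",
               "changefreq": "monthly", "priority": 0.6}]
    print(build_sitemap(sample).decode("utf-8"))
```

The optional tags are simply left out when an entry does not provide them, which matches their optional status in the protocol.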

A website can have multiple Sitemap files. Each Sitemap file must contain no more than 50,000 URLs and must be no larger than 10MB (current versions of the sitemaps.org protocol raise this size limit to 50MB). A Sitemap file can be compressed, but the size limit applies to the uncompressed file. If more than 50,000 URLs need to be listed, split them across multiple Sitemap files.
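As a rough sketch of how these limits could be enforced, the Python helpers below split a URL list into groups of at most 50,000 and refuse to compress a Sitemap that exceeds the size limit. The names `chunk_urls` and `compress_sitemap` are hypothetical, and the 10MB default simply mirrors the figure quoted above; pass a different `max_bytes` if the limit you target differs.

```python
import gzip

MAX_URLS_PER_SITEMAP = 50000
MAX_SITEMAP_BYTES = 10 * 1024 * 1024  # 10MB, checked before compression


def chunk_urls(urls, size=MAX_URLS_PER_SITEMAP):
    """Yield consecutive slices of `urls`, each with at most `size` items."""
    for start in range(0, len(urls), size):
        yield urls[start:start + size]


def compress_sitemap(xml_bytes, max_bytes=MAX_SITEMAP_BYTES):
    """Gzip a Sitemap, rejecting it if the uncompressed size is too large."""
    if len(xml_bytes) > max_bytes:
        raise ValueError("uncompressed Sitemap exceeds the size limit")
    return gzip.compress(xml_bytes)


if __name__ == "__main__":
    urls = ["http://www.example.com/page%d" % i for i in range(120001)]
    for number, chunk in enumerate(chunk_urls(urls), start=1):
        print("sitemap%d.xml.gz would hold %d URLs" % (number, len(chunk)))
```

Checking the size before compressing matters because the limit applies to the uncompressed file, not to the .gz file that is served.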

Sitemap Index File

The main purpose of a Sitemap index file is to group multiple Sitemap files: if a website uses more than one Sitemap file, each of them should be listed in a Sitemap index file.

Purpose of using Sitemap Index File

Listing the Sitemaps in an index file makes it easier for the crawler to keep track of the multiple Sitemap files created for a website. Like a Sitemap file, a Sitemap index file must not list more than 50,000 Sitemaps and must not be larger than 10MB. A website can have multiple Sitemap index files as well, and they can also be compressed.

The following example shows a Sitemap index file:


<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
   <sitemap>
      <loc>http://www.example.com/sitemap1.xml.gz</loc>
      <lastmod>2015-07-28</lastmod>
   </sitemap>
   <sitemap>
      <loc>http://www.example.com/sitemap2.xml.gz</loc>
      <lastmod>2015-07-29</lastmod>
   </sitemap>
</sitemapindex>

  • <sitemapindex> : This is a required tag and includes information about all the Sitemaps available for a website.
  • <sitemap> : This is a required tag and includes information about an individual Sitemap.
  • <loc> : This is a required tag and indicates the location of the Sitemap.
  • <lastmod> : This is an optional tag and identifies when a particular Sitemap file was last modified.
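A Sitemap index file like the one above could be generated along these lines. This is only a sketch: `build_sitemap_index` is a hypothetical helper, and it assumes each Sitemap is described by a (location, last-modified date) pair where the date may be None.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_SITEMAPS_PER_INDEX = 50000


def build_sitemap_index(sitemaps):
    """Return a UTF-8 encoded <sitemapindex> document as bytes.

    `sitemaps` is a list of (loc, lastmod) tuples; lastmod may be None
    because the tag is optional.
    """
    if len(sitemaps) > MAX_SITEMAPS_PER_INDEX:
        raise ValueError("a Sitemap index may list at most 50,000 Sitemaps")
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for loc, lastmod in sitemaps:
        sitemap = ET.SubElement(index, "sitemap")
        ET.SubElement(sitemap, "loc").text = loc
        if lastmod is not None:
            ET.SubElement(sitemap, "lastmod").text = lastmod
    return ET.tostring(index, encoding="UTF-8", xml_declaration=True)


if __name__ == "__main__":
    listing = [
        ("http://www.example.com/sitemap1.xml.gz", "2015-07-28"),
        ("http://www.example.com/sitemap2.xml.gz", "2015-07-29"),
    ]
    print(build_sitemap_index(listing).decode("utf-8"))
```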

An important note about the Sitemap.xml file: it must be UTF-8 encoded, and any URLs or other data values in it must use entity escape codes for the characters listed below:

Character : Escape Code
  • Ampersand (&) : &amp;
  • Single Quote (') : &apos;
  • Double Quote (") : &quot;
  • Greater Than (>) : &gt;
  • Less Than (<) : &lt;
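When a Sitemap is assembled by plain string concatenation rather than with an XML library, these escape codes have to be applied by hand. Python's standard `xml.sax.saxutils.escape` function handles &, < and > by default and accepts a dictionary for extra characters, so a small wrapper (the name `escape_sitemap_value` is an assumption for this example) can cover all five:

```python
from xml.sax.saxutils import escape

# escape() always replaces &, < and >; the two quote characters are
# supplied as additional entities.
EXTRA_ENTITIES = {"'": "&apos;", '"': "&quot;"}


def escape_sitemap_value(value):
    """Entity-escape a URL or other data value for use inside a Sitemap."""
    return escape(value, EXTRA_ENTITIES)


if __name__ == "__main__":
    print(escape_sitemap_value("http://www.example.com/view?widget=3&count>2"))
```

The ampersand is replaced first, so the entities introduced for the other characters are not escaped a second time.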

I hope you enjoyed it and will explore more about Sitemaps.
