Technical SEO

Technical SEO demands extra attention, because a single mistake can make pages disappear from search results and cost you organic traffic.

Internal linking: The right way to do internal linking is to use clustered (topic-based) linking rather than random cross-linking. If you are using pagination, mark it up with rel="prev" and rel="next", as shown below.
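
For example, on page 2 of a paginated archive, the head of the page could carry link tags like these (the URLs are placeholders, not from this post):

<link rel="prev" href="https://example.com/blog/page/1/">
<link rel="next" href="https://example.com/blog/page/3/">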

Moreover, read my post on internal linking questions.

Crawling and Indexation: There are certain key points you need to take care of: crawl budget, user agent, robots.txt, the robots meta tag, and the canonical tag. I describe them below:

Crawl Budget: How many pages are crawled per day depends on your crawl budget. Less budget means less crawling, more budget means more crawling, and the figure varies from site to site. You can check your crawl activity in Google Webmaster Tools, which is now called Google Search Console.

User-agent: A user agent is the name a crawler identifies itself with. Robots.txt is a text file created by webmasters to instruct web robots on how to crawl pages on their website; it is part of the Robots Exclusion Protocol (REP).

User-agent Example:

Basic format:
User-agent: [user-agent name]
Disallow: [URL string not to be crawled]
Block all content on site:
User-agent: *
Disallow: /
Block specific category on site:
User-agent: *
Disallow: /category/
Block specific URL:
User-agent: *
Disallow: /Your-page.html
Allow specific URL (by default, everything not disallowed is allowed):
User-agent: *
Allow: /Your-page.html

Robots.txt file format.

Above, I have described how the user agent works and how to allow or disallow files and folders.
Below is the robots.txt file of a WordPress site. You can see multiple pages and folders blocked from robots.

Just for example:

Sitemap: https://wordpress.com/sitemap.xml

User-agent: *

Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Disallow: /wp-signup.php
Disallow: /press-this.php
Disallow: /remote-login.php
Disallow: /activate/
Disallow: /cgi-bin/
Disallow: /mshots/v1/
Disallow: /next/
Disallow: /public.api/


Note: * applies the rules to all bots. Instead of *, you can target a specific bot, for example Googlebot, as in the sketch below.
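
A minimal robots.txt sketch (the /private/ folder is only a placeholder path) that restricts Googlebot alone and leaves every other crawler unrestricted:

User-agent: Googlebot
Disallow: /private/

User-agent: *
Disallow: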

Robots Meta Tag Syntax

Here is the code and its description:

<meta name="robots" content="Directives">

To explain this code: what are robots? Googlebot and Bingbot are examples of robots. Directives for the robots meta tag include noindex, nofollow, and nosnippet; the other directives are explained in the list near the end of this page. Example:

<meta name="robots" content="noindex, nofollow">

Note: You need to type the directives exactly as shown.

If you add the above code, the page will not be indexed by Google. Moreover, the nofollow part stops the search engine from following the links on that page, including links pointing to other websites.

But keep in mind that you cannot use nofollow to block unwanted sites from linking to your domain; for that you need the Disavow links tool.

Whichever URL you put this code on, that URL will not be indexed by the search engine and its links will not be followed. In other words, only the page or post that carries this tag is affected.
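
As a small sketch (the title and body are placeholders), the tag goes in the <head> of the specific page you want kept out of the index:

<!DOCTYPE html>
<html>
<head>
  <title>Example page</title>
  <!-- keep this page out of the index and do not follow its links -->
  <meta name="robots" content="noindex, nofollow">
</head>
<body>
  ...
</body>
</html>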

Robots.txt: valid and invalid URL schemes

In the image below, you can see how robots.txt works on a subdomain. When you are working on a subdomain, make sure it has its own valid robots.txt.

robots.txt on a subdomain

In the image above, look at the domain names and note how the crawler matches them. In the first case, http://example.com/robots.txt is not valid for a subdomain or for the secure (https) version; the scheme and host must match exactly. Similarly, in the second case, the www version is not valid for the domain without www (and vice versa): anything before the domain name that does not exactly match the host serving the robots.txt is not covered by it.
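
Written out as plain examples (example.com is only a placeholder host):

http://example.com/robots.txt
  valid for:     http://example.com/
  not valid for: https://example.com/ or http://shop.example.com/

http://www.example.com/robots.txt
  valid for:     http://www.example.com/
  not valid for: http://example.com/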

Crawlers will not check for robots.txt files in subdirectories.

For example, a file at http://example.com/folder/robots.txt will not be used.

A robots.txt served with an IP address as the host name is only valid for crawling that IP address as the host name. It is not automatically valid for all websites hosted on that IP address.

Robots.txt with IP address

Note that the crawler only looks for robots.txt at the root of the domain, not in another directory or folder; for example, www.example.com/robots.txt. The second case in the image looks similar, but any extra text attached to the domain name makes it a different host. Note: IDNs are equivalent to their Punycode versions (see RFC 3492). The third case shows that a robots.txt served over FTP (file transfer protocol) is not the same as one served over HTTP; keep the two protocols separate. The fourth case shows that a robots.txt served under the numeric IP address is not valid for the text domain name, even if both point to the same server.
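
The same cases written out as plain examples (the host names and the IP address are placeholders):

http://www.example.com/folder/robots.txt   not checked; only the root /robots.txt is used
ftp://example.com/robots.txt               not valid for http://example.com/
http://192.0.2.1/robots.txt                valid only for http://192.0.2.1/, not for http://example.com/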

Standard port numbers

Standard port numbers (80 for HTTP, 443 for HTTPS, 21 for FTP) are equivalent to their default host names, so a URL with the standard port is treated the same as one without it.

Robots.txt files on non-standard port numbers are only valid for content made available through those port numbers.

Robots.txt on a non-standard port
I already mentioned the three standard ports: 80, 443, and 21. The first case illustrates the first of these: the robots.txt is valid even when the port is not written after the domain, because 80 is the default for HTTP. The second case looks similar in formatting but is technically different: the port is non-standard, and robots.txt files on non-standard port numbers are only valid for content served through those port numbers.
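
For example (example.com and port 8181 are placeholders):

http://example.com:80/robots.txt    valid for http://example.com/ (80 is the default HTTP port)
http://example.com:8181/robots.txt  valid only for http://example.com:8181/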

Robots directives: noindex and nofollow

The noindex directive tells the crawler not to index the page in the search engine and also prevents a cached link from being shown in the search results. Similarly, the nofollow directive means the links on the page are not followed; if you set the meta name to Googlebot, the directives apply to Googlebot only, as shown below.
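
A small sketch of a bot-specific meta tag; with the meta name set to googlebot, only Googlebot obeys the directives and other crawlers ignore the tag:

<meta name="googlebot" content="noindex, nofollow">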

Description of directives

1. none: Equivalent to using both noindex and nofollow.

2. noarchive: Tells search engines not to show a cached copy of the page in the search results.

3. nosnippet: Tells search engines not to show a text snippet or video preview for the page.

4. notranslate: Tells Google not to offer a translation of the page in the search results.

5. noimageindex: Tells search engines not to index the images on the page.

6. unavailable_after: Use this if you want the page to stop appearing in search results after a certain point; you must add a date/time with it.

To do this, see my post HTTP Header; a small example also follows below.
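
A short sketch of unavailable_after, once as a meta tag and once as an X-Robots-Tag HTTP response header (the date is only a placeholder):

<meta name="robots" content="unavailable_after: 2030-12-31">

X-Robots-Tag: unavailable_after: 2030-12-31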

SEO Points For Awareness

  • Google uses mobile-first indexing, so make sure your web pages are mobile-friendly and free of SEO issues. A fast-loading mobile page also improves user experience.

  • Your XML sitemap should include all URLs you want to index.

  • After doing the on-page SEO, make sure all your 301 redirects work properly.

  • Avoid duplicate content issues and build quality links through link building; both are important ranking factors.

  • Your site structure should use clustered linking. Avoid linking to non-relevant pages.

  • Use next-generation image formats such as JPEG 2000, JPEG XR, and WebP, as recommended by Lighthouse, to reduce loading time and improve site speed on mobile (see the sketch after this list).
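
A minimal sketch of serving WebP with a fallback for older browsers, assuming hypothetical hero.webp and hero.jpg files (not from this post):

<picture>
  <source srcset="hero.webp" type="image/webp">
  <img src="hero.jpg" alt="Hero image">
</picture>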

Technical knowledge for simple understanding

Stay Happy!

Like and Share.

