Task One:

This section gives a quick explanation of Google as a search engine and website indexer: it uses spiders/crawlers to gather keywords and website URLs, building an index of websites to suggest when using the search engine.

ANS 1: Nothing needs to be done here. Press complete and away we go!

Task Two:

This section explains the use of crawlers and the "how" of website indexing.

Hint: Your best way of solving these questions is Ctrl+F, which brings up a word finder; use it to search the webpage for keywords or clues from the question being asked.

Question 1: Name the key term of what a “Crawler” is used to do

The first answer can be found by reading this paragraph, or by (Ctrl+F) searching for the word "crawler" and seeing which sentence contains the answer:

ANS 1: index

Question 2: What is the name of the technique that “Search Engines” use to retrieve this information about websites?

This answer is a little harder to find and requires you to crawl the paragraphs yourself, looking for the quoted keyword "Search Engine" that the question is hinting at.

ANS 2: crawling

Question 3: What is an example of the type of contents that could be gathered from a website?

Searching for the word "content" will help with this answer. The question has a couple of possible answers for the type of content that can be gathered from a website: it could be URLs to other websites posted on the crawled site, information on specific subjects, or keywords.

ANS 3: keywords
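The gathering described in this task can be sketched in a few lines of Python: a minimal crawler that parses a page's HTML and pulls out the links (URLs to crawl next) and the words (keyword candidates for the index). The sample HTML below is made up for illustration; a real crawler would fetch pages over the network.

```python
# A minimal sketch of what a crawler does: parse a page's HTML,
# collect the links (URLs to follow next) and the visible words
# (candidates for the keyword index).
from html.parser import HTMLParser

class MiniCrawler(HTMLParser):
    def __init__(self):
        super().__init__()
        self.links = []   # URLs the crawler would visit next
        self.words = []   # text content to feed the keyword index

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        self.words.extend(data.split())

# Hypothetical page content, just for the demo
sample_html = """
<html><body>
  <h1>Flood defences</h1>
  <p>Read more on <a href="https://bbc.co.uk/news">BBC News</a>.</p>
</body></html>
"""

crawler = MiniCrawler()
crawler.feed(sample_html)
print(crawler.links)   # the URLs found on the page
print(crawler.words)   # the words a search engine could index
```

A real search engine repeats this on every link it finds, which is exactly the "crawling" technique the task describes.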

Task Three

This section goes over SEO and how websites are ranked by how well their keywords and hashtags match searches by users, social media, and search engines.

These questions will require using the site https://seositecheckup.com/ to scan tryhackme.com for the answers.

Link for test: https://seositecheckup.com/seo-audit/tryhackme.com

Question 1: Using the SEO Site Checkup tool on “tryhackme.com”, does TryHackMe pass the “Meta Title Test”? (Yea / Nay)

Check out the website's Meta Title Test description. The result has a passing green check mark.

ANS: Yea

Question 2: Does “tryhackme.com” pass the “Keywords Usage Test?” (Yea / Nay)

The website has a red X mark, meaning it did not pass the Keywords Usage Test; the report says there are not any keywords used.

ANS: Nay

Question 3: Use https://neilpatel.com/seo-analyzer/ to analyse https://blog.cmnatic.co.uk: What “Page Score” does the Domain receive out of 100?

Look at the On-Page SEO Score

ANS: 81/100

Question 4: With the same tool and domain in Question #3 (previous): How many pages use “flash”?

I don't see anything mentioning Flash (Adobe Flash), so I am going with 0.

ANS: 0

Question 5: From a “rating score” perspective alone, what website would list first? tryhackme.com or blog.cmnatic.co.uk

The site tryhackme.com scored 62, while blog.cmnatic.co.uk scored 81.

ANS: blog.cmnatic.co.uk

Task Four

This section goes over robots.txt: which file directories on a sitemap crawlers are allowed or disallowed to index, and how access can be limited to specific crawlers (e.g. Google's or Bing's).

Question 1: Where would “robots.txt” be located on the domain “ablog.com”

When a site is first accessed by crawlers, they look for a text file in the root directory called "robots.txt", which holds the sitemap reference and the rules for which parts of the index crawlers are allowed to yoink.

ANS: ablog.com/robots.txt

Question 2: If a website was to have a sitemap, where would that be located?

The sitemap is an XML-format file that holds the website's index of pages for crawlers to use.

ANS: sitemap.xml

Question 3: How would we only allow “Bingbot” to index the website?

This requires the User-Agent directive mentioned in the lesson, naming the specific crawler that is allowed instead of * (all crawlers).

ANS: User-Agent: Bingbot

Question 4: How would we prevent a “Crawler” from indexing the directory “/dont-index-me/”?

Use the lesson's example of disallowing a directory from being indexed to solve this question.

ANS: Disallow: /dont-index-me/

Question 5: What is the extension of a Unix/Linux system configuration file that we might want to hide from “Crawlers”?

The hint for this says "system files are usually 3/4 characters!", so the configuration file extension is slightly shorter than the usual abbreviation of "config".

ANS: .conf
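Putting the Task Four answers together, a robots.txt for the hypothetical ablog.com could look like the sketch below (the catch-all block and the Sitemap line are illustrative additions, not part of the room's answers):

```
# Only Bingbot may crawl, but keep it out of these paths
User-agent: Bingbot
Disallow: /dont-index-me/
Disallow: /*.conf$

# All other crawlers are turned away entirely
User-agent: *
Disallow: /

# Point crawlers at the sitemap
Sitemap: https://ablog.com/sitemap.xml
```

Each `User-agent` line starts a group of rules, so the Disallow lines only apply to the crawler named directly above them.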

Task Five

This section is pretty self-explanatory after reading the first few paragraphs of the lesson.

Question 1: What is the typical file structure of a “Sitemap”?

As mentioned in a previous task, if you look at the file format for "/sitemap.xml", it uses the XML format.

ANS: xml
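For reference, a minimal sitemap follows the structure below (the domain and dates are hypothetical, just to show the shape of the file):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://ablog.com/</loc>
    <lastmod>2021-01-01</lastmod>
  </url>
  <url>
    <loc>https://ablog.com/products/</loc>
  </url>
</urlset>
```

Each `<url>` entry is one page the site wants crawlers to index, which is why sitemaps make a crawler's job so much easier.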

Question 2: What real life example can “Sitemaps” be compared to?

The sitemap is the map that helps the little crawlers not get lost on a website.

ANS: map

Question 3: Name the keyword for the path taken for content on a website

The first explanation of routes is in the lesson's sentence: "The blue rectangles represent the route to nested-content, similar to a directory i.e. "Products" for a store."

ANS: route

Task Six

Now to the meat of the whole thing: "Google Dorking"/Google-Fu, using the index categorizations for web searches that Google has meticulously gathered. All that crawling can now be used in reverse-Uno fashion for your speedy research.

Question 1: What would be the format used to query the site bbc.co.uk about flood defences?

You want to use the [site:] operator, the website [bbc.co.uk] and the topic [flood defences].

ANS: site: bbc.co.uk flood defences

Question 2: What term would you use to search by file type?

Looking at the chart of operators, filetype: is the one.

ANS: filetype:

Question 3: What term can we use to look for login pages?

The hint for this says "term: query", so think about an intitle: search: the web address may say website.com/login, or the page title itself may say "login".

ANS: intitle: login
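Putting the three operators from this task together, here are some example dork queries (the search terms are just illustrations, not answers from the room):

```
site:bbc.co.uk flood defences    # only results from bbc.co.uk
filetype:pdf annual report       # only PDF documents
intitle:login                    # pages with "login" in the title
```

The operators can also be combined, e.g. `site:bbc.co.uk filetype:pdf` to find PDFs hosted on that one domain.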

And, we completed this room as well 🙌