Search Engine Crawlers and Dynamic Web Pages
Jerry Yu
There is a good deal of misunderstanding in the search engine optimization (SEO) world about how search engines index dynamic web pages.
It has been claimed that search engine spiders don't crawl or index dynamic web pages well. This statement is only half true. A more accurate statement is: "Search engines don't crawl or index dynamic web pages well if the page URL contains a "?" character." Search engines index dynamic web pages very well when the page URL contains no "?" character.
URLs that contain "?" are called dynamic URLs.
What web pages are dynamic
If you know HTML, you know that the web pages you create normally have a .htm or .html file extension. These files are static because their HTML code doesn't change on the fly when requested and the web server doesn't process them. They can be viewed without a web server.
A web page is said to be dynamic if it is created with a server-side scripting technology such as PHP, ASP, JSP, Perl, or CGI. These languages resemble general-purpose programming languages such as C++ and Java. The major difference is that the scripts are typically not compiled beforehand; the web server processes them on the fly when a visitor requests the page. Dynamic pages can't be viewed without a web server.
When a dynamic page is requested, the web server first looks at the page's source code and, if any server-side scripting code exists, processes it and generates static HTML as the result. When processing of the full page is complete, the web server sends only pure HTML to the visitor's browser.
Using scripting languages to create web pages gives you the power to do nearly anything you want. If the dynamic page has no "?" character in its URL, search engine spiders treat it the same as a normal static HTML page.
Query string parameters
When a "?" character is used, the page's full URL changes whenever the values after the "?" change. The portion after the "?" is called the page's query string parameters, or simply query parameters. Every time the parameters change, the resulting page will be different.
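As a small illustration of these terms, the following Python sketch splits a URL into its query parameters using the standard library. The URL is the hypothetical examplesite.com address used in this article, and the helper name is made up for the example.

```python
from urllib.parse import urlsplit, parse_qsl


def query_parameters(url):
    """Return the query parameters of a URL as a list of (name, value) pairs."""
    # urlsplit separates the part before "?" (the path) from the part after it (the query).
    query = urlsplit(url).query
    # parse_qsl breaks the query string at each "&" into name/value pairs.
    return parse_qsl(query)


url = "http://www.examplesite.com/product.asp?id=12345&category=23&page=3"
for name, value in query_parameters(url):
    print(name, "=", value)
```

A URL with no "?" simply yields an empty list, which is the case spiders treat like a static page.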
A page URL can contain more than one query parameter, with "&" separating them. When this happens, search engine spiders have a difficult time indexing the resulting page. If the URL has only one parameter, major search engine spiders can crawl that page well. For example, Google can index and store a page's URL as http://www.examplesite.com/product.asp?id=12345. But if the same page's URL is
http://www.examplesite.com/product.asp?id=12345&category=23&page=3
most search engines will not be able to index it well, even though Googlebot and Yahoo! Slurp may be able to.
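The rule of thumb in this section can be sketched as a small check. This only illustrates the article's heuristic, not how any search engine actually decides what to crawl; the function name and the three labels are invented for the example.

```python
from urllib.parse import urlsplit, parse_qsl


def classify_url(url):
    """Label a URL by how many query parameters it carries."""
    # keep_blank_values=True also counts parameters such as "id=" that have no value.
    params = parse_qsl(urlsplit(url).query, keep_blank_values=True)
    if not params:
        return "no query parameters: treated like a static page"
    if len(params) == 1:
        return "one query parameter: major spiders can usually crawl it"
    return "%d query parameters: many spiders may struggle" % len(params)


print(classify_url("http://www.examplesite.com/product.asp?id=12345"))
print(classify_url("http://www.examplesite.com/product.asp?id=12345&category=23&page=3"))
```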
Note: Googlebot is Google's web-crawling robot. Yahoo! Slurp is Yahoo's web-crawling robot. Search engine robots collect documents from the web to build a searchable index.
Yahoo's help pages say:
"Yahoo! does index dynamic pages, but for page discovery, our crawler mostly follows static links. We recommend you avoid using dynamically generated links except in directories that are not intended to be crawled/indexed (e.g., those should have a /robots.txt exclusion)."
Google's Webmaster Guidelines say:
"If you decide to use dynamic pages (i.e., the URL contains a "?" character), be aware that not every search engine spider crawls dynamic pages as well as static pages. It helps to keep the parameters short and the number of them small."
Let's analyze what Google has stated above.
1. "the URL contains a "?" character": this means dynamic pages are defined as those whose URLs contain a "?" character.
2. "keep the parameters short": this means the number of characters in each individual parameter should be small. Google gives no quantitative limit, but we can check some web forums for examples. My search-engine-friendly article (http://www.webactionguide/action-guide/build-site/se-friendly.php) referenced a black hat SEO discussion thread on Cre8ASiteForums. Its URL is http://www.cre8asiteforums.com/viewtopic.php?t=8386
Google crawled this page. The value of its query parameter is 4 characters long. Many other examples on the internet have more characters and were crawled successfully. The maximum number of characters that Google accepts is unknown.
3. "keep the number of them small": this means we should keep the number of parameters in each URL as small as possible. The Cre8ASiteForums example above has one parameter.
At the least, we can now say that Googlebot is able to crawl dynamic pages that have one query parameter with a 4-character value.
How to get your pages crawled if query parameters are unavoidable
Query parameters are often used in database calls to retrieve stored information by primary key from one or more tables. A database management system (DBMS) makes some tedious work easy to manage. When your site must use query parameters, consider building a site map page and hard-coding each page's URL. For example, the previous URL can be hard-coded as
http://www.examplesite.com/product12345-23-3.asp
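Generating such hard-coded URLs for a site map can be automated. The sketch below assumes the naming pattern of the example above (parameter values joined with hyphens and placed before the file extension). The function name is hypothetical, and the web server would still have to answer at the generated address.

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit, parse_qsl


def static_style_url(url):
    """Fold query parameter values into the file name, e.g. product12345-23-3.asp."""
    parts = urlsplit(url)
    values = [value for _, value in parse_qsl(parts.query)]
    if not values:
        return url  # no query parameters, nothing to fold in
    # Split "/product.asp" into "/product" and ".asp".
    stem, extension = posixpath.splitext(parts.path)
    new_path = stem + "-".join(values) + extension
    # Rebuild the URL with an empty query string and no fragment.
    return urlunsplit((parts.scheme, parts.netloc, new_path, "", ""))


print(static_style_url(
    "http://www.examplesite.com/product.asp?id=12345&category=23&page=3"
))
```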
Hand-coding every dynamic page is time-consuming. If you use the Apache web server, its mod_rewrite module (http://httpd.apache.org/docs/mod/mod_rewrite.html) can help: it rewrites requested URLs on the fly, so a URL with no "?" character embedded can be mapped to the underlying dynamic page.
Another mod_rewrite resource site is www.modrewrite.com.
An interesting article on weberblog.com gave a practical example of how Google successfully indexed a dynamic page after mod_rewrite was applied. The page originally had 17 characters in the query parameter.
Before rewrite: http://www.weberblog.com/article.php?sroty=20040419170030157
After rewrite: http://www.weberblog.com/article.php/20040419170030157
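To make the idea concrete, here is a Python sketch of the kind of mapping a rewrite rule performs: it takes the spider-friendly path and recovers the query-string form the script expects. It imitates the behavior with a regular expression and is not mod_rewrite configuration. The pattern and function name are assumptions chosen to fit the example above.

```python
import re

# Matches a path such as /article.php/20040419170030157:
# a script name ending in .php, a slash, then a numeric identifier.
CLEAN_PATH = re.compile(r"^(?P<script>/[\w-]+\.php)/(?P<value>\d+)$")


def to_dynamic_path(clean_path, parameter):
    """Map a rewritten path back to its query-string form, or return None."""
    match = CLEAN_PATH.match(clean_path)
    if match is None:
        return None  # not a rewritten URL, so leave it alone
    return "%s?%s=%s" % (match.group("script"), parameter, match.group("value"))


# "sroty" is the parameter name exactly as it appears in the example URL above.
print(to_dynamic_path("/article.php/20040419170030157", "sroty"))
```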
So, if your site is experiencing the same problem, hurry up and implement mod_rewrite now.
About The Author
Jerry Yu is an experienced internet marketer and web developer. Visit his site http://www.WebActionGuide.com for FREE "how-to" step-by-step action guide, tips, knowledge base articles, and more.
How To See What Pages Of Your Site Google Has In Its Index
Tinu AbayomiPaul
There is a lag between the time your site is indexed or updated and the time new results show up in the database. Depending on your site, where it was linked from, who linked to it, and who knows what other factors, the amount of time varies.
With the method I teach in my book, it seems to take two to four days on average for Googlebot to stop by initially, and then another two days to one week for the first listing to appear in search results.
You can read more about the book here: http://www.freetrafficdirectory.com/book
But even if it takes more than four to seven days for the Googlebot spider to show up at your site, or to return (if it ever does), there are several ways you can track the results. First, you can use Google itself.
Go to www.google.com and type in site: followed by your domain name. So for yahoo.com, you'd type in site:yahoo.com.
The results will show you which pages of your site are showing up in Google.
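If you check several domains regularly, the search address can be built in code. This sketch assumes Google's usual search URL format (https://www.google.com/search with a q parameter). The function name is made up for the example.

```python
from urllib.parse import urlencode


def site_search_url(domain):
    """Build a Google search URL for the query site:<domain>."""
    # urlencode escapes the colon and any other special characters in the query.
    return "https://www.google.com/search?" + urlencode({"q": "site:" + domain})


print(site_search_url("yahoo.com"))
```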
If you know you won't have time to check on a daily basis, you can use a site called Google Alert, which you can find at
http://www.googlealert.com.
The great thing about this site is that it will track up to five terms per email address and send the results to you by email on a daily basis. Using this, you can track your ranking for your most important terms, or see how often your competitor's site comes up versus yours.
To use this to see when pages of your site come up, create an account, then in the search terms section, type in, as one word, whatever is between www and your site's suffix (.com, .net, .org, .biz, .uk, etc.), and you will start getting emailed results.
The only problem is that the resulting page is sometimes a day behind Google's actual indexing. But for a free automated resource, you really couldn't beat it.
Until now.
Google's new Web Alerts just came out on the 29th of March. You can access it here:
http://www.google.com/webalerts
You can use Google's new Web Alerts service in much the same way. It's currently in beta, so make sure you save the information sent to you. Since it's so new, you'll probably want to sign up for both services and compare the results.
My favorite use for this is finding out when people mention my name or reprint my article at their sites, so that I can link back or email to thank them. A big advantage of Google.com's in-house version of the web alerts system is that it has a news version that you can subscribe to, which will help you stay on top of your niche in whatever industry you're in.
Currently I use the Google Alert site for several ongoing searches, and Google's beta Web Alerts for my most mission-critical, time-sensitive news.
There's yet another way to use Google to track how your site is doing in Google. It will show you the cached version of your page, which Google stores. Sometimes the date posted next to the listing of the cached page can help give you a good estimate of when Google will be back at your site.
For example, at the moment, I seem to see the spider most predictably every day between midnight and 6 am EST, ever since my home page began to score a PR of 5, and then periodically at other points in my site during the day. I figured this out by looking at Google's cache of my home page over a period of one week.
This search will also tell you which pages Google considers similar to yours. It will show sites that it considers linked to you, and sites that carry your full URL, hyperlinked or not. It's not 100% accurate, but it will give you a much better idea than you'd get from guessing, and it's free.
Go back to Google's home page (www.google.com) and type in info:yoursitenameandsuffix. So if your site was ExactSeek.com, you'd type info:www.exactseek.com. You can also use site:yoursitenameandsuffix to find out which pages have been indexed by Google's search engine spider.
Curiously, Google used to show different results for info:www.exactseek.com and info:exactseek.com instead of including results for exactseek.com in the www evaluation. I haven't seen this much anymore, but if you see one permutation showing up in results for the other, you may want to do both.
You're going to want to bookmark this page and visit it on a weekly basis. The best day to look would be the one-week anniversary of the day Google last cached a page at your site. The date will often be shown next to the word "cached" on one of your page results. If the cached page date is the same, that means Google hasn't been back to your site.
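The weekly schedule is simple date arithmetic. As a sketch, the following Python function takes the date shown next to the cached listing and returns the next day worth checking. The function name and the sample dates are illustrative only.

```python
from datetime import date, timedelta


def next_check_date(cached_on, today):
    """Return the first one-week anniversary of cached_on that is not before today."""
    check = cached_on + timedelta(weeks=1)
    # Keep stepping forward a week at a time until the date is no longer in the past.
    while check < today:
        check += timedelta(weeks=1)
    return check


# Sample values only: a page cached on 5 April 2004, looked up on 14 April 2004.
print(next_check_date(date(2004, 4, 5), date(2004, 4, 14)))
```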
Marry this information with your study of your web stats to get more ideas on getting the most out of your weekly or daily exercises involving search engines and links from other sites, not just Google.
Copyright 2004 Tinu AbayomiPaul
About The Author
Tinu's adventures with Google began when friends challenged her to "put her traffic where her site is". She was challenged to raise her brand-new site to top 100,000 status in Alexa and get well ranked in Google in 90 days, spending less than $100. When she won in 34 days, she decided to use the site she built to share her free traffic secrets. For more free traffic secrets, subscribe to her newsletter at ftdsecrets-subscribe@topica.com or visit her site for more free articles like this: http://www.freetrafficdirectory.com
Local Search and Internet Yellow Pages - A Whole New Vocabulary for Small Business Sales
Dr. Lynella Grant
Buyers want both online and local information about where to buy. Most small businesses are local in nature, serving people who live nearby. Their customers have found them through traditional methods like the Yellow Pages or newspaper ads. So far, the Internet hasn't figured prominently in their marketing efforts.
That's about to change, as Local Search methods become more widespread. Even among buyers expecting to spend their money close to home, more and more go to the Internet to locate desired products and services. They rely on search engines to find suitable vendors in the fastest, easiest way.
Local Search combines the search query word or phrase with specific geographic terms, like city or zip code. That way, search results only include enterprises in that local area. Instead of information about a small enterprise being lost among millions of pages of search results, it shows up in a small pool of local providers. That's good for the business, as well as for the person looking for what it provides.
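In its simplest form, a local search query is just the ordinary phrase with one or more geographic terms appended. A minimal sketch, with a hypothetical function name and made-up sample terms:

```python
def local_search_query(phrase, *geographic_terms):
    """Combine a search phrase with one or more geographic terms."""
    # Trim whitespace, drop empty terms, then join everything with single spaces.
    terms = [term.strip() for term in geographic_terms if term.strip()]
    return " ".join([phrase.strip()] + terms)


# Sample values only: a service, a city, and a zip code.
print(local_search_query("plumber", "Denver", "80202"))
```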
Small operations can easily be located by a whole new group of buyers
Consumers don't simply go to the Yellow Pages when ready to buy, as they once did. Studies show that an astonishing 36% of online searches are conducted to find local businesses. About a quarter of all Internet users already conduct local searches. They'd do even more of it if the small business data they want were more complete.
Local enterprises need to prepare for the impact of changing customer habits. An easy first step is to include your business in Internet Yellow Pages (IYP), along with the printed Yellow Pages directory. That puts your enterprise on the radar screen. Learn how your business can make the most of Local Search by visiting http://www.yellowpagesage.com. You'll find reliable advice from experts in Yellow Pages and Local Search so you can get more mileage from your promotional dollars.
Start by getting comfortable with search concepts, and improve your odds of being found when people search online for what you offer. You don't even need your own Web site to benefit from Internet Yellow Pages and Local Search.
Learn the Relevant Terms
Search Engine - method for locating the information available on the Internet; a program that searches Web pages for requested keywords, then returns a list of documents where the query terms were found
Google and Yahoo, the major general search engines, have both shifted gears to make Local Search a priority when delivering relevant results.
Spider (also called "crawler" or "bot") - a program that visits Web pages and reads their information so it can be made available to searchers; to "crawl" a site is to collect and index information from it
Specialized Search Engines - narrow focus of information crawled and indexed, like medical, business, or shopping sites
Keywords - words or phrases used by search engines to locate relevant Web pages; words chosen to improve a site's search engine placement and ranking
Search Query - search request, which the search engine compares to the spidered entries, then returns results to the searcher
Search Results - compiled list of Web pages that a search engine delivers in response to a query; the number of items returned is usually overwhelming (in the millions), so searchers only bother to view results on the first pages
Relevant Results - the test of a good search is whether the results obtained relate to what the person wanted to find, without a lot of irrelevant links
Local Search - combining a geographic term in a search query to locate suitable providers in a specific area
Pay per Click (PPC) - method of building traffic whereby site owners bid on search terms (keywords) that link to their site
Geographic Terms - specific information about the local area that can be included in a local search: zip code, town, county, geographic region, state
Top Ranking - sites shown on the first pages of search results
Search Engine Optimization (SEO) - fine-tuning keywords and page content so the Web site rates high in search engine results
Tags and Titles on Web Pages - provide site keywords and information to search engine spiders for indexing a site
Internet Yellow Pages (IYP) - directory of business phone numbers and locations in a geographic area, organized by category; searchable database accessed on the Internet
Make your business easy for searchers to find
The public is embracing the convenience of searching on the Internet to find information about local businesses. However, their searches are compromised because so many local enterprises don't yet show up in the databases. Those that do have an edge in their local market. Climb aboard! Make sure searchers can find you. For little or no money, you can expose your enterprise to the whole world.
Whether or not your business has a Web site, you need to provide the information people are looking for in the places where they look for it. Local Search and Internet Yellow Pages open new avenues to buyers ready to spend. Best of all, they support and complement your traditional methods of finding new business. So you cover all your bases.
© 2004, Lynella Grant
About The Author
Dr. Lynella Grant is an expert in visual communication: how printed materials send signals that strengthen or undo the words. Author, The Business Card Book & Yellow Page Smarts. http://www.yellowpagesage.com Off the Page Press (719) 395-9450
grant@yellowpagesage.com