View Full Version : This might explain some of the high number of guests.


Oldschool
02-13-2010, 03:11 AM
While looking at "who's online" I noticed four guests had similar IPs, so I opted to "show" "user agents" and resolve the IP addresses (possibly mod-only functions) and found the following interesting.

__________________________________________________ __________

09:36 PM Guest
Viewing Member List

220.181.7.89
Baiduspider+(+http://ww w.baidu.com/search/spider.htm)

__________________________________________________ __________

09:43 PM Guest
Viewing Thread
Crimson-Helmed Rider aka CHR

220.181.7.80
Baiduspider+(+http://ww w.baidu.com/search/spider.htm)

__________________________________________________ ___________

09:49 PM Guest
Viewing Thread
A Modest Proposal

220.181.7.94
Baiduspider+(+http://ww w.baidu.com/search/spider.htm)

__________________________________________________ ___________

09:35 PM Guest
Viewing Index
Sryth Forum

220.181.7.82
Baiduspider+(+http://ww w.baidu.com/search/spider.htm)

__________________________________________________ ____________




My very limited knowledge told me this was a spider (a bot by my understanding) which can be used for various things. That was about the extent of my knowledge.

After a little googling it seems many search engines use spidering (http://en.wikipedia.org/wiki/Spidering) as a way to index the World Wide Web.

Baidu.com is a Chinese search engine for websites. Also it appears that this may be a "legitimate" or harmless bot doing just the above. But I only did a cursory bit of googling.

Any feedback appreciated.

EDIT: Spaced the "ww w" in the links above to disable them as I'm not "sure" of them.
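
For anyone who wants to do the same check outside the forum software, here is a minimal sketch in Python. It assumes you already have (IP, user agent) pairs copied from "who's online"; the token list and the function names are made up for illustration and are not part of vBulletin. It groups the addresses by /24 network, which is one way to spot "similar" IPs, and looks for a known crawler name in the user agent string. Keep in mind a user agent is self-reported, so a match is only a hint.

```python
import ipaddress
from collections import defaultdict

# Hypothetical token list (an assumption, extend as needed).
# Matching is case-insensitive.
KNOWN_CRAWLER_TOKENS = ("baiduspider", "googlebot", "slurp", "yandex", "msnbot")


def crawler_name(user_agent):
    """Return the first known crawler token found in the user agent, or None."""
    ua = user_agent.lower()
    for token in KNOWN_CRAWLER_TOKENS:
        if token in ua:
            return token
    return None


def group_by_subnet(ips, prefix=24):
    """Group IPv4 addresses by the /prefix network that contains them."""
    groups = defaultdict(list)
    for ip in ips:
        # strict=False lets us pass a host address instead of a network address.
        network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
        groups[str(network)].append(ip)
    return dict(groups)


# The four guests from the post, with the user agent as it was posted
# (the link is still spaced out on purpose).
BAIDU_UA = "Baiduspider+(+http://ww w.baidu.com/search/spider.htm)"
guests = [
    ("220.181.7.89", BAIDU_UA),
    ("220.181.7.80", BAIDU_UA),
    ("220.181.7.94", BAIDU_UA),
    ("220.181.7.82", BAIDU_UA),
]

subnets = group_by_subnet([ip for ip, _ in guests])
names = {crawler_name(ua) for _, ua in guests}
print(subnets)
print(names)
```

All four addresses land in the same /24, and all four user agents carry the Baiduspider token, which matches what the "who's online" page suggested.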

Oldschool
02-13-2010, 03:31 AM
I'm sure other "crawlers" have visited at various times as well, and that may also explain the few occasions where the forum seems slow or disrupted. I've noticed other crawlers (see the wiki article) such as Googlebot and Yahoo! Slurp visiting before, although at the time I didn't know what they were.


http://en.wikipedia.org/wiki/Spidering#Politeness_policy


EDIT: This could also explain slow-downs on other sites, even registered sites like Sryth, since the main page and most of the associated Sryth links are open to everyone.
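
The politeness policy in that wiki section is usually expressed through a site's robots.txt file. As a rough sketch only, the Python below parses a made-up robots.txt with the standard library's `urllib.robotparser` and asks what a well-behaved crawler would be allowed to do. The file contents and the paths (`/memberlist.php`, `/private/`) are assumptions for illustration, not this forum's real settings, and `Crawl-delay` is a non-standard directive that not every crawler honours.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt, not taken from any real site.
ROBOTS_TXT = """\
User-agent: Baiduspider
Crawl-delay: 10
Disallow: /memberlist.php

User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Seconds a polite Baiduspider is asked to wait between requests.
delay = parser.crawl_delay("Baiduspider")

# Whether the crawler may fetch specific paths under these rules.
can_members = parser.can_fetch("Baiduspider", "/memberlist.php")
can_index = parser.can_fetch("Baiduspider", "/index.php")

print(delay, can_members, can_index)
```

Only crawlers that choose to read robots.txt follow these rules, so this helps with legitimate spiders and does nothing against malicious bots.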

Oldschool
02-13-2010, 04:28 PM
Currently seven (now eight) guests and all appear to be crawlers of one type or another.

racey
02-23-2010, 11:56 PM
Wow!

Most users ever online was 48, 1 Hour Ago at 06:47 PM.

Oldschool
02-24-2010, 03:05 AM
Wow indeed. Wonder how many were bots?

Oldschool
03-31-2010, 02:14 AM
http://www.osnn.net/attachments/green-room/3923d1134575201t-osnn-irc-channel-irc-freenode-net-holythreadresbatman.jpg



But seriously, I thought to revive this thread since some folks have questions/concerns regarding this type of activity.

These posts come from "The Round Table" (a mods only section of the forum).

Kudos to SMV for a very well informed post.


Just a heads up and most if not all of you have probably seen this thread or would've without this double-post.

While it seems this is legitimate surfing, there are instances of bots being used for other purposes (e.g. spamming, "harvesting" email addresses for spam), so I thought to post it here.

[Edit: Removed link to this specific thread]

Thanks

This info comes from the forum of another game that I play.

A Never ending list of spiders and bots that have visited our forums ...

Alexa Spiders
Artabus Spiders
Ask.com Spiders
BoardReader Spiders
Clicksor.com Spiders
Cuil Spider Spiders
dotnetdotcom.org Spiders
ExaLead Beta Spiders
Google Spiders
Google Wireless Transcoder Spiders
Majestics MJ12bot Spider
MLBot Spiders
MSNBot Spiders
Netcraft Web Server Survey Spiders
Nutch Spiders
Panscient Spiders
PycURL Spiders
ScoutJet Spiders
TinEye Spiders
Whois Source Spiders
Yahoo! Slurp Spiders
Yandex Spiders
80legs Spiders


Spider (Bot) List

Spiders, or "bots," as they're sometimes called, are software applications that automatically run tasks over the internet. There are multiple uses for spiders, but the two I'll focus on are the ones we're most concerned with. My own personal terminology (not an official definition) uses the term "Spider" to describe automated search engine software that combs our site, and the term "bot" to describe automated software meant to spam our community or individual members. Thus, some spiders, like the Yahoo or Google spider, are our friends, while other bots, or spam bots, are our enemies.

Spiders are our friends. We allow the spiders to comb our site and collect information that they take back home to analyze. The Google spider, for example, collects all types of information: how long the website has been up, how long the domain name is reserved for, keywords, pages, spelling, language, working/broken links, meta tags, website performance, the format the website was designed in (HTML, etc.), incoming links, outgoing links, redirects, error headers, timeouts, proxy support, etc. The Google bot then takes this information home to the Google indexer, where everything is sorted and analyzed. This information is then stored in their database and given a page rank. When someone runs a search on Google, our information is displayed in their search results based on the data the spider gathered from our site, in an order that is based on Google's pagerank matrix.

It's important to us that we're listed in the first three pages of a Google search with our keywords. Statistically speaking, page one is a difficult position to reach but gives our site the most publicity ... Pagerank three puts us on page three of the search results. While this is a good rank to have, it's not number one. lol. Anything after pagerank three is generally useless, as it's been proven that the number of people that use the pages after page three is greatly reduced. I think it's because people get to page three and then, if they don't find what they want, run another search with modified keywords.

Basically said, if we want to be seen on the web, we need our spider friends to do what they're doing. It's also important to know that the work on the server side of this pagerank process has been completed. Our pages have been made spider friendly. File edits have been made to include spider text that they like to feed on, and the like. The majority of the work needed to get the pagerank higher lies with the message board users themselves. The more content you post, the more keywords we have to be indexed, the more pages we have for them to feed on, etc. Links to www.themacguild.com placed in signature lines or posts of users on other boards help out dramatically, as do links posted on other websites. Incoming links are probably the most important part of the pagerank matrix, but I encourage people not to post the links on other boards in the form of spam: not only would we not want people to do that to us, we don't want our name associated with such an atrocious and immature act.

Bots are bad. There are two types of bots that we will see on the forums. One is a flat-out spam bot that is programmed to register on any and all forums that it can and post advertisements on the boards. Their concern is not the one or two people that may follow links to their products, although they don't mind that I'm sure; their primary concern is to put links on our message board. These links are VERY important to the matrix Google uses to give websites a higher pagerank. And remember, the higher the pagerank, the higher the probability that millions and millions of people will find your site. So, in short, the spider will come in here and read the links that the spam bot left, then give a higher pagerank to the spammer, which results in big money for them.

Another not-so-friendly bot is the bot that combs websites for contact information. These bots search through every page to find email addresses, phone numbers, residential addresses, IM information, etc., so they can spam you directly. Please don't post your email on our pages, and notice that your email address has been properly masked in your contact information so that only users can contact you.
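
To show why a plain email address in a post is such an easy target, here is a small illustrative sketch in Python. The regular expression is a deliberately simple assumption (real address syntax is more permissive), and the `[at]`/`[dot]` masking is just one common convention, not necessarily what this board's software does. Masking like this only stops naive scrapers.

```python
import re

# Simplified pattern for illustration; it does not cover every valid address.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_plain_emails(text):
    """Return every plain-text address the simple pattern can see."""
    return EMAIL_RE.findall(text)


def mask_emails(text):
    """Spell out '@' and '.' inside each address so the pattern no longer matches."""
    def _mask(match):
        return match.group(0).replace("@", " [at] ").replace(".", " [dot] ")
    return EMAIL_RE.sub(_mask, text)


post = "Contact me at player@example.com for trades."
masked = mask_emails(post)

print(find_plain_emails(post))    # the address is trivially found
print(masked)
print(find_plain_emails(masked))  # nothing left for the simple pattern
```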

There are way too many spiders and bots out there for us to identify. As time goes on, you'll see our list grow and grow. As we submit the site to more and more search engines and directories, more spiders will be invited, and more bots will find us. With the good comes the bad. Our doors are open for the spider, the defenses are up for the spam bot; all you've got to do is keep your email address and other personal contact information from the bots looking for that information.

Hope this helps anybody that needs help in understanding what's going on with the spider list. If you have any questions, please let me know.

Oldschool
05-15-2010, 12:12 AM
Periodically I'll check for malicious bots.

Currently 22 users online - 15 guests.

Feels like I'm in an oldschool ;) game of Berzerk (http://en.wikipedia.org/wiki/Berzerk) with 14 of the 15 guests being bots - well, spiders actually, which are all legit, indexing things.

Oldschool
09-20-2010, 04:34 PM
For anyone interested in such things....

22 guests currently on - all appear legit.

19 search engine bots. All but two are from ask.com; of the other two, one is Yandex, a popular Russian search engine, and the other is Yahoo.

Oldschool
02-04-2011, 01:31 PM
Currently 18 guests - 17 crawlers, and all but one are from ask.com.

Oldschool
03-01-2011, 07:37 PM
Where's Sam with Sting when you need him......

From Who's Online,

Username|Location|IP Address
Guest |Viewing User Profile Oldschool | spider44.yandex.ru

I'm being spidered...........!!!!!!! :eek: :cool:

Oldschool
07-24-2011, 04:03 PM
Culled from another thread where this would have been off-topic.....

Not that that's ever stopped me before. :rolleyes:

And for clarification staff can "resolve ip addresses" and "view user agents" which is a bit above my head but suffice it to say it'll sometimes (depending on things over my head I'm guessing) let staff view ISP particulars and other "stuff" like operating system, browser info, etc.... It's used to monitor users and while we get many bots/spiders most seem legitimate. However there have been suspected malicious bots surfing the forum. I say suspected because I end up having to Google any suspicious ones and am not 100 percent sure about the info.

DEJINX..... I'd say it's a given we've had malicious bots surf our forum and I give credit to the confirmation/approval check to join more than anything as to why we've not had any problems.

Example of a known "good" bot.

Normal view of IP address - 66.249.72.101

Once IP addy resolved (clicked) - crawl-66-249-72-101.googlebot.com

And viewing user agent (a menu option) -
66.249.72.101
Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

or "resolved" -
crawl-66-249-72-101.googlebot.com
Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
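
Since a user agent string can be faked, the "resolve" step is the more trustworthy half of that check. A common way to confirm a Googlebot visit is a reverse DNS lookup on the IP, a check that the hostname ends in googlebot.com or google.com, and then a forward lookup on that hostname to see that it points back to the same IP. Below is a minimal sketch of that idea in Python. The function names are my own, and the lookups are passed in as parameters so the demo can use stand-in resolvers built from the values in this post instead of touching the network.

```python
import socket

# Hostname suffixes accepted for Googlebot in this sketch.
TRUSTED_SUFFIXES = (".googlebot.com", ".google.com")


def reverse_lookup(ip):
    """Real reverse DNS: IP address -> primary hostname."""
    return socket.gethostbyaddr(ip)[0]


def forward_lookup(hostname):
    """Real forward DNS: hostname -> list of IPv4 addresses."""
    return socket.gethostbyname_ex(hostname)[2]


def is_verified_googlebot(ip, reverse=reverse_lookup, forward=forward_lookup):
    """Reverse-resolve the IP, check the domain, then forward-confirm it."""
    try:
        hostname = reverse(ip)
        if not hostname.endswith(TRUSTED_SUFFIXES):
            return False
        return ip in forward(hostname)
    except OSError:
        # socket.herror and socket.gaierror are both OSError subclasses.
        return False


# Stand-in resolvers using the example from this post (no network needed).
def demo_reverse(ip):
    return "crawl-66-249-72-101.googlebot.com"


def demo_forward(hostname):
    return ["66.249.72.101"]


print(is_verified_googlebot("66.249.72.101", demo_reverse, demo_forward))
print(is_verified_googlebot("10.0.0.1", demo_reverse, demo_forward))
```

Calling `is_verified_googlebot(ip)` with no extra arguments would use real DNS lookups; the second demo call fails because the forward lookup does not point back to the IP being checked.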

And although it's no secret a reminder to folks "staff" can see hidden/invisible users.

Tetracapillactomist
07-25-2011, 02:51 PM
Far too much snooping, neh, and some people I have no reason to trust being able to see too far down my throat while I sometimes feel like throttling theirs...
Oh well, let the snoops snoop, and the creeps creep. :)
Sometimes I wish them to make the first move so I expand some... energy. >;]
And at other times... who needs the hassle... >:\