http://www.hackerschool.org/HS_Boards/zboard.php?id=Free_Lectures&no=1383
First, a definition.
Web bot: a program that roams the web and collects information.
Googlebot is generally the best-known example.
We can't build anything as grand(?) as Googlebot, so let's start by understanding the principle.
First, you give the bot an initial address.
(For example, www.google.com)
The bot downloads the source of that page.
Then it collects the links it finds there.
With Google as the example, one of them would be http://images.google.com.
The bot then downloads that page as well.
It parses that one too.
And it keeps looping forever... (wait, what?)
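The loop described above can be sketched roughly as follows. This is only an illustration in Python 3, not the script from this post: FAKE_WEB is made-up data standing in for real downloads, and the names get_links, crawl and max_pages are hypothetical. The max_pages limit is an added assumption so that the loop actually stops.

```python
from collections import deque

# Made-up stand-in for the real web: page URL -> links found on that page.
FAKE_WEB = {
    "http://www.google.com": ["http://images.google.com", "http://example.com/"],
    "http://images.google.com": ["http://www.google.com"],
    "http://example.com/": ["http://images.google.com"],
}

def get_links(url):
    # A real bot would download the page here and parse its links.
    return FAKE_WEB.get(url, [])

def crawl(seed, max_pages=100):
    """Visit pages breadth-first from seed and return them in visit order."""
    visited = []
    seen = {seed}
    queue = deque([seed])
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        for link in get_links(url):
            if link not in seen:  # never queue the same page twice
                seen.add(link)
                queue.append(link)
    return visited

order = crawl("http://www.google.com")
print(order)
```

Keeping a set of already-seen URLs is what prevents the bot from bouncing between two pages that link to each other.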
So let's build a simple one.
We'll write it in Python. (In C you would have to start from raw sockets, and it gets painful.)
Run vim webbot.py to open the file for editing.
Here is the full source to start from.
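#!/usr/bin/python
# Python 2 script (urllib2 and raw_input do not exist in Python 3).
import urllib2, re, string

enter_point ='http://' + raw_input('enter url: ') # enter point
db_name = 'base.txt' # input data base name

def uniq(seq):
    # Return the items of seq without duplicates, keeping first-seen order.
    seen = {}
    result = []
    for item in seq:
        if item not in seen:
            seen[item] = True
            result.append(item)
    return result

def geturls(url):
    # Download one page and return every absolute http:// link found in it.
    request = urllib2.Request(url)
    request.add_header('User-Agent', 'iBot ;)')
    try:
        content = urllib2.urlopen(request).read()
    except (urllib2.URLError, ValueError):
        return []  # skip pages that cannot be fetched
    items = re.findall('href="http://.*?"', content)
    urls = []
    for item in items:
        item = item.replace('href=','')
        item = item.replace('"','')
        urls.append(item)
    return urls

db = open(db_name,'w')
allurls = uniq(geturls(enter_point))
i = 0
# Index-based loop, because allurls keeps growing while we walk it.
while i < len(allurls):
    url = allurls[i]
    urls = geturls(url)
    allurls = uniq(allurls + urls)
    db.write(string.join(urls,'\n'))
    print url+' ['+str(len(allurls))+']'
    db.write('\n\n')
    i += 1
db.close()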
The first line is #!/usr/bin/python.
This is the usual shebang line for a script: it tells the shell which interpreter should run the file.
Next comes import urllib2, re, string.
import is similar to #include in C. For now, just think of it as loading the three libraries urllib2, re, and string.
Then we need to read in the seed URL to start from.
To do that, we add enter_point ='http://' + raw_input('enter url: ').
The variable enter_point then holds http:// followed by whatever was typed.
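One thing to watch with the line above: if the user types a full address such as http://www.google.com, the result becomes http://http://www.google.com. A possible way around that, sketched in Python 3, is to add the prefix only when it is missing. The helper name normalize_seed is hypothetical and not part of the original script.

```python
def normalize_seed(text):
    """Return text as a URL, adding http:// only when no scheme is present."""
    text = text.strip()
    if text.startswith(("http://", "https://")):
        return text
    return "http://" + text

print(normalize_seed("www.google.com"))
print(normalize_seed("http://www.google.com"))
```

In the script itself, the typed value (from raw_input in Python 2, or input in Python 3) would be passed through such a helper before crawling starts.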
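Next we have to parse the links, and we'll follow the most basic kind of link, the <a> tag.
The important part of an <a> tag is href="url".
The regular expression in geturls matches that whole href="..." attribute, which is why the following code appears in the middle of the function:
item = item.replace('href=','')
item = item.replace('"','')
These two lines strip the href= prefix and the quotes, leaving just the URL.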
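As a side note, the same extraction can probably be done in one step by putting a capture group around the URL, so that re.findall returns only the part inside the quotes. This is a small Python 3 sketch on a made-up HTML string; HREF_PATTERN, extract_links and the sample text are illustrative names, not from the original script. Like the original, it only picks up absolute http:// links.

```python
import re

# Capture only the URL inside href="...", so no replace() calls are needed.
HREF_PATTERN = re.compile(r'href="(http://.*?)"')

def extract_links(html):
    """Return every absolute http:// URL found in an href attribute."""
    return HREF_PATTERN.findall(html)

sample = '<a href="http://images.google.com">Images</a> <a href="/about">About</a>'
links = extract_links(sample)
print(links)
```

The relative link /about is skipped here, exactly as it would be by the pattern in the script.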
The line request.add_header('User-Agent', 'iBot ;)') sets the name the client reports to the server. When you browse with Firefox, this header carries Firefox's own identifying string instead.
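For reference, here is roughly how the same header could be set in Python 3, where urllib2 was folded into urllib.request. The sketch only builds the request object and does not send anything over the network.

```python
import urllib.request

request = urllib.request.Request(
    "http://www.google.com",
    headers={"User-Agent": "iBot ;)"},
)
# Request stores header names capitalized, so the key becomes "User-agent".
print(request.get_header("User-agent"))
```

Passing this object to urllib.request.urlopen would then send the request with that User-Agent string.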
Anyway, one way or another we now have the URL strings.
They get saved to base.txt.
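The saving step might look something like this in Python 3. It is a sketch under assumptions: save_urls is a hypothetical helper, and the file is written into a temporary directory here just so the example does not touch a real base.txt.

```python
import os
import tempfile

def save_urls(path, urls):
    """Write one URL per line to path."""
    with open(path, "w") as db:
        for url in urls:
            db.write(url + "\n")

path = os.path.join(tempfile.mkdtemp(), "base.txt")
save_urls(path, ["http://www.google.com", "http://images.google.com"])
with open(path) as db:
    saved = db.read().splitlines()
print(saved)
```

Using with closes the file even if an error happens partway through the crawl, which the plain open/close pair does not guarantee.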
This is my first time writing something like this, so it's a bit of a mess ....
I'll revise it later. (Or so I say, but who knows when..)
Hit : 16183 Date : 2010/02/08 04:52