Preprocessing in WUM (Program)

The first program is “Loading the web server logs using a user-specified date range”. Then I have to preprocess these logs to form the sequence database. Preprocessing includes:

1. Data cleaning (remove requests for .jpg files; remove every request whose status code is not 200 (successful)).

2. User identification. I want to use the cs-username field in the IIS log, but there’s a problem. *** I use Forms Authentication with no anonymous login, but the cs-username field is still “-” in the logs. I have to solve this problem when I write the BookStore web site. *** Otherwise, I have to use IP address and user agent to identify users, but there’s also a problem, because I use one PC (Windows XP, IIS 5.1, VS2005) to test my web site, so my IP address is always “localhost”. Most of the other web usage mining theses use logs from many sites, so they don’t use cs-username. In my case, I built my own web site and I use a login username, so I think the cs-username field should be filled with the login username; but I’ve heard that if I use IIS authentication, the cs-username field is filled with DOMAIN\USER, like the Windows logon user.

3. Session identification. After user identification (using cs-username, or the IP and user agent heuristic), I have to sessionize with a timeout (30 minutes by default). Another way is to use the Referer field.
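The three steps above can be sketched in Python as follows. This is only a rough sketch under my own assumptions, not the final program: it assumes the log is in W3C extended format with a `#Fields:` line naming the columns, and the function names (`parse_w3c_log`, `clean`, `user_key`, `sessionize`) and the sample log lines are made up for illustration. It uses cs-username when it is filled in and falls back to IP plus user agent when it is “-”; the Referer method is not covered.

```python
from datetime import datetime, date, timedelta

SESSION_TIMEOUT = timedelta(minutes=30)  # default timeout from step 3

def parse_w3c_log(lines):
    """Yield one dict per request, using the #Fields directive for column names."""
    fields = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            if line.startswith("#Fields:"):
                fields = line[len("#Fields:"):].split()
            continue
        values = line.split()
        if len(values) != len(fields):
            continue  # skip malformed rows
        yield dict(zip(fields, values))

def clean(records, start, end):
    """Step 1: keep requests inside [start, end] that are status 200 and not .jpg."""
    for r in records:
        ts = datetime.strptime(r["date"] + " " + r["time"], "%Y-%m-%d %H:%M:%S")
        if not (start <= ts.date() <= end):
            continue
        if r["sc-status"] != "200":
            continue
        if r["cs-uri-stem"].lower().endswith(".jpg"):
            continue
        yield {**r, "ts": ts}

def user_key(r):
    """Step 2: prefer cs-username; fall back to IP + user agent when it is '-'."""
    name = r.get("cs-username", "-")
    if name != "-":
        return name
    return r["c-ip"] + "|" + r.get("cs(User-Agent)", "-")

def sessionize(records, timeout=SESSION_TIMEOUT):
    """Step 3: split each user's requests whenever the gap exceeds the timeout."""
    by_user = {}
    for r in records:
        by_user.setdefault(user_key(r), []).append(r)
    sessions = []
    for user, reqs in by_user.items():
        reqs.sort(key=lambda r: r["ts"])
        current = [reqs[0]]
        for prev, cur in zip(reqs, reqs[1:]):
            if cur["ts"] - prev["ts"] > timeout:
                sessions.append((user, current))
                current = []
            current.append(cur)
        sessions.append((user, current))
    return sessions

# Made-up sample log lines, for illustration only.
SAMPLE_LOG = """\
#Fields: date time c-ip cs-username cs-method cs-uri-stem sc-status cs(User-Agent)
2008-01-10 09:00:00 127.0.0.1 alice GET /Default.aspx 200 Mozilla/4.0
2008-01-10 09:00:01 127.0.0.1 alice GET /images/logo.jpg 200 Mozilla/4.0
2008-01-10 09:05:00 127.0.0.1 alice GET /Books.aspx 200 Mozilla/4.0
2008-01-10 09:06:00 127.0.0.1 alice GET /Missing.aspx 404 Mozilla/4.0
2008-01-10 10:00:00 127.0.0.1 alice GET /Cart.aspx 200 Mozilla/4.0
2008-01-10 09:10:00 127.0.0.1 - GET /Default.aspx 200 Mozilla/4.0
2008-02-01 09:00:00 127.0.0.1 alice GET /Default.aspx 200 Mozilla/4.0
"""

records = parse_w3c_log(SAMPLE_LOG.splitlines())
sessions = sessionize(clean(records, date(2008, 1, 10), date(2008, 1, 31)))
for user, reqs in sessions:
    print(user, [r["cs-uri-stem"] for r in reqs])
```

Grouping by user before applying the timeout matters here: on a single test PC every request comes from the same IP, so without a usable cs-username all requests that share a user agent would be treated as one user.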

After sessionization, I’ll get the sequence database (sid, sequence). I have to show this output to my teacher; this is the first part.
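To make the (sid, sequence) output concrete, here is a small sketch in Python. It assumes each session is already available as a list of page names in visit order; the names `build_sequence_db` and `format_db` and the example pages are hypothetical, chosen only for illustration.

```python
def build_sequence_db(sessions):
    """Number the sessions from 1 and pair each sid with its page sequence."""
    return [(sid, tuple(pages)) for sid, pages in enumerate(sessions, start=1)]

def format_db(db):
    """Render one 'sid<TAB>page -> page -> ...' line per session."""
    return "\n".join(f"{sid}\t{' -> '.join(seq)}" for sid, seq in db)

# Example sessions (made up): each inner list is one session's pages in order.
example_sessions = [
    ["/Default.aspx", "/Books.aspx"],
    ["/Cart.aspx"],
]

sequence_db = build_sequence_db(example_sessions)
print(format_db(sequence_db))
```

Keeping the sid as a plain running number is just the simplest choice; the user name or session start time could be stored alongside it if the later mining step needs them.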
