06.28.05

A quick note on the web content reader through proxy server program

Posted in Programming by chenty

In my last post I wrote code that reads a web page’s HTML through a proxy server. However, when I integrated it into a Windows app, it had a small glitch. When I press the “Query” button, it takes 3-10 seconds to get the result. During this time, if I click the “Query” button again, nothing happens, and it looks like the program is frozen. There are two ways to overcome this problem:

1. When OnClick is triggered, disable the “Query” button so users can see that it is unusable. When the query is done, enable the button again. I like this way. It works pretty well for me.

2. Use multithreading. Put the code I wrote last time in a method. Let’s call it runQuery.
private void runQuery(object o) {…}

In the Query_Click method, I will use:
ThreadPool.QueueUserWorkItem(new WaitCallback(runQuery), null);

By doing this, Query_Click returns as soon as the button is clicked and hands the job over to runQuery. This makes the “Query” button available again immediately. However, we should always handle multithreading with caution. This approach does create a “race condition”: if the button is clicked again while a query is still running, two threads work on the same data at once. For example, it can mess up the byteCount returned by the stream handlers. Therefore, the multithreading approach is not appropriate for this application. However, it can be helpful sometimes. I remember that the multithreading tutorial on MSDN uses a calculator application with multithreading support to handle the “unresponsive button” problem.

Get HTML code from URL by using HttpWebRequest and WebProxy

Posted in Programming by chenty

I have created a couple of programs that retrieve web content in HTML format by using the HttpWebRequest class in the .NET Framework. However, in some cases, I want to retrieve the web content through a proxy server to protect my privacy. This can be done EXTREMELY easily with the .NET Framework (I have started loving Microsoft lately).

All I have to do is create a WebProxy object and assign it to the HttpWebRequest.Proxy property.
i.e.
WebProxy myProxy = new WebProxy(strProxyServerAddress, intProxyServerPort); // there are 10 different ways to create a WebProxy object
myRequest.Proxy = myProxy;

That’s it. Quite simple, eh?

Here is sample code to retrieve a webpage by using a URL, proxy server, and proxy port:

// Requires: using System; using System.IO; using System.Net;
try
{
    // create an HttpWebRequest
    HttpWebRequest myRequest = (HttpWebRequest)WebRequest.Create(strURL);
    myRequest.Method = "GET";
    myRequest.Timeout = 10000; // 10-second timeout

    // route the request through the proxy
    WebProxy myProxy = new WebProxy(strProxyServerAddress, intProxyServerPort);
    myRequest.Proxy = myProxy;

    // send the request; the using blocks close the response and the reader when done
    using (HttpWebResponse myResponse = (HttpWebResponse)myRequest.GetResponse())
    using (StreamReader myStreamReader = new StreamReader(myResponse.GetResponseStream()))
    {
        return myStreamReader.ReadToEnd().Trim();
    }
}
catch (Exception error)
{
    // on any failure (timeout, bad proxy, etc.) return the error text instead of the page
    return error.Message;
}

In the code above, strURL, strProxyServerAddress, and intProxyServerPort are variables for the URL, proxy server address, and proxy port respectively.

I set a 10-second timeout because a request through a proxy can take longer than visiting the website directly, and sometimes it does not work at all. Keeping the timeout short makes sure the proxy you are using is of good quality.

06.20.05

Export from SQL Server to new Excel files using DTS

Posted in Programming by chenty

I ran into a problem today while modifying one of the DTS packages I wrote. The package reads data from SQL Server and saves the result into a .csv file. The file name can be assigned through package global variables and a Dynamic Properties Task. It works fine. However, after I changed the connection type from “File Destination” to “Microsoft Excel 97-2000”, the package did not work anymore. After some debugging, I found that the package cannot create new .xls files. After researching on the Internet, I found the solution.

The problem is that a regular text file is created automatically if it does not exist, but this does not hold when you assign a non-existent file name as the Data Source of a Microsoft Excel 97-2000 connection. The solution is to create the Excel file first, before you run the package that does the transfer. The code to create a new Excel file in VBScript is:

Dim appExcel
Dim newBook
Dim oSheet

Set appExcel = CreateObject("Excel.Application")
Set newBook = appExcel.Workbooks.Add
Set oSheet = newBook.Worksheets(1)

' Specify the column names in the Excel worksheet

oSheet.Range("A1").Value = "AccountNo"
oSheet.Range("B1").Value = "CustomerName"
oSheet.Range("C1").Value = "CustomerAddress"

' Specify the name of the new Excel file to be created

DTSGlobalVariables("v_FileName").Value = "c:\abc.xls" ' you can assign this dynamically
With newBook
    .SaveAs DTSGlobalVariables("v_FileName").Value
    .Save
End With

appExcel.Quit

Then you can pass the file name to the package, e.g. oPackage.GlobalVariables.Item(1).Value = DTSGlobalVariables("v_FileName").Value

This works well, and now my DTS package can export directly to Excel files. If you are new to DTS and are looking for more details, here is a link:
http://sqljunkies.com/How%20To/A8CB0AFE-D143-4B49-B865-4FBBFEDFCCD7.scuk

Moved my blog to a new server

Posted in Miscellaneous by chenty

I am happy I moved my blog to a new host. Blogger is good, but I would rather pay someone to get my own space. I am using WordPress for this new blog system. Now I can easily back up blogs and edit template files (even though I am not quite familiar with PHP). I will also try to write a little bit more because I am paying for it. =)

Last week was busy, and this week will be even busier. I need to move this week, and I need to get my G driving license as well (I failed the G test last week). I was a little bit sick during the weekend too. What a week! Now I am going back to work. =P

06.15.05

Busy life

Posted in Programming by chenty

I haven’t written anything for a couple of days. It’s not because I have nothing to write, but because I am very busy with that mining program. There are several problems I need to tackle right now.

1. I need to study more about concurrency and locks in SQL Server. Currently, multiple copies running on different computers are not likely to load sample data at the same time. However, if I run the program in Windows Service mode, the retrieving will be much more frequent. Because each retrieving transaction lasts around 1 minute, it is very likely that 2 or more processes will try to retrieve the same piece of information, which is not supposed to happen (it means both copies will mine the same data later, which is a waste of resources).

2. I have never written an application in Windows Service mode before, so I am trying to figure out how things work. It seems simple: just override the OnStart, OnStop, OnPause etc. methods of ServiceBase. However, I think it is hard to make a Windows service application with good availability and reliability.

Anyway, I will log what I learn, and hopefully someone else will find the information in my blog useful. =)

06.08.05

Multithreading in .NET (part 2)

Posted in Programming by chenty

In the last post, I created sample code for running queries in different threads. I used foreach to queue a thread for every query. However, this approach has a big problem: ThreadPool can maintain at most 25 threads per processor, so queuing thousands of work items at the same time causes problems. I would have expected the extra requests to simply wait in the queue, but at least with HttpWebRequest that does not work. Therefore, I have to implement a mechanism that keeps track of how many threads have been created.

I use a variable numOfThread to keep track. When a new thread is created, the program calls:
Interlocked.Increment(ref numOfThread);

When a thread exits, it calls:
Interlocked.Decrement(ref numOfThread);
// use Interlocked.Increment/Decrement instead of ++/-- to prevent a race condition
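The point about Interlocked can be checked with a small console sketch. It is only an illustration: the class name InterlockedDemo, the method CountWithInterlocked, and the thread and iteration counts are made up here and are not part of the original program. Several threads bump one shared counter, and because every update goes through Interlocked.Increment, no update is lost.

```csharp
using System;
using System.Threading;

// Hypothetical demo class, not from the original program.
public static class InterlockedDemo
{
    private static int safeCount; // shared counter updated by all threads

    // Starts `threads` threads that each increment the counter `perThread` times.
    public static int CountWithInterlocked(int threads, int perThread)
    {
        safeCount = 0;
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++)
        {
            workers[i] = new Thread(() =>
            {
                for (int j = 0; j < perThread; j++)
                {
                    // Atomic read-modify-write; a plain safeCount++ could lose updates.
                    Interlocked.Increment(ref safeCount);
                }
            });
            workers[i].Start();
        }

        // Wait for every worker before reading the final value.
        foreach (Thread t in workers)
        {
            t.Join();
        }
        return safeCount;
    }

    public static void Main()
    {
        Console.WriteLine(CountWithInterlocked(4, 100000)); // prints 400000
    }
}
```

Replacing the Interlocked call with safeCount++ in this sketch would typically give a total below 400000 on a multi-core machine, which is the lost-update problem the Interlocked methods avoid.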

In my program, I maintain 10 threads:
foreach (Record r in recordList)
{
    while (true)
    {
        if (numOfThread <= 10) // there is still room, so queue one more
        {
            ThreadPool.QueueUserWorkItem(new WaitCallback(runProcess), r);
            Interlocked.Increment(ref numOfThread);
            break;
        }
    }
}

This code solves the problem of “There were not enough free threads in the ThreadPool object to complete the operation.” Do a search on Google, and you will find several posts regarding this problem. However, because I put the check in a while(true) loop, the CPU is kept busy looping, which takes 50% CPU time (a horrible result). A quick fix for this problem is to put the thread to sleep for a small amount of time after each failed check:
if (numOfThread <= 10)
{
    // same as above: queue the work item, increment numOfThread, and break
}
else
{
    Thread.Sleep(1);
}

Here, I put the thread to sleep for 1 millisecond, which is small considering that each thread has a life cycle of about 6 seconds, but it reduces CPU usage dramatically (the check runs at most 1000 times per second). After I implemented this code, CPU time dropped to <1%.

The solution I presented above is not very good programming practice. A more reliable approach is to create your own threads and do the scheduling yourself. Nevertheless, this solution is a really good fix, and it can save tons of time. I have successfully run the above code on a batch of 100,000 queries, so I assume it is good for most cases. =P
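For comparison, here is a rough sketch of the same throttling idea done with a semaphore instead of a polling loop. This is a swapped-in technique, not the code from the post: SemaphoreSlim and CountdownEvent only arrived in .NET 4, and the names ThrottleDemo, RunAll, and RunProcess, the 20 ms simulated delay, and the peak counter are all assumptions made for the demo. The loop blocks in slots.Wait() until a slot is free, so no CPU time is spent spinning or sleeping.

```csharp
using System;
using System.Threading;

// Hypothetical demo class, not from the original program.
public static class ThrottleDemo
{
    private const int MaxConcurrent = 10;
    private static readonly SemaphoreSlim slots = new SemaphoreSlim(MaxConcurrent, MaxConcurrent);
    private static readonly object gate = new object();
    private static int running; // work items executing right now
    private static int peak;    // highest value of running seen so far

    // Stand-in for runProcess; a real one would call the web service.
    private static void RunProcess(object state)
    {
        CountdownEvent done = (CountdownEvent)state;
        try
        {
            lock (gate)
            {
                running++;
                if (running > peak) peak = running;
            }
            Thread.Sleep(20); // simulated web service delay
        }
        finally
        {
            lock (gate) { running--; }
            slots.Release(); // give the slot back so the loop can queue another item
            done.Signal();   // count this work item as finished
        }
    }

    // Queues recordCount work items, never more than MaxConcurrent at once,
    // and returns the peak concurrency that was observed.
    public static int RunAll(int recordCount)
    {
        lock (gate) { running = 0; peak = 0; }
        CountdownEvent done = new CountdownEvent(recordCount);
        for (int i = 0; i < recordCount; i++)
        {
            slots.Wait(); // blocks (no busy loop) until one of the 10 slots is free
            ThreadPool.QueueUserWorkItem(new WaitCallback(RunProcess), done);
        }
        done.Wait(); // wait until every work item has signalled
        lock (gate) { return peak; }
    }

    public static void Main()
    {
        Console.WriteLine("peak concurrency: " + RunAll(50));
    }
}
```

Because a work item only starts after the loop has taken a slot, and only returns the slot when it is finished, the reported peak can never exceed MaxConcurrent.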

06.06.05

Multithreading in .NET

Posted in Programming by chenty

The prototype of the application I am working on has a major performance problem. I did manage to run it on multiple computers to speed it up. However, the maintainability is very poor, and running on multiple computers is hard to manage.

The reason the application performs poorly is that there is a 3-10 second delay when consuming the web service (and sometimes the delay can go over 10 seconds). The average delay is 6 seconds, which works out to 10 queries/min. This number can be increased by running on multiple computers. By running the application on 5 different computers, the performance increases to approximately 50 queries/min. Therefore, I conclude that the delay is due to transmission delay on the Internet.

To resolve the problem, multithreading can be used, since most of the time is wasted waiting for the web service’s responses. I searched on the Internet and found the following links, which are pretty good to start with:

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/csref/html/vcoricsharptutorials.asp
(This link is from MSDN, and it is a very good tutorial for beginners)

http://www.c-sharpcorner.com/Multithreading.asp
(This link has a list of articles and source code that can help you understand more)

To resolve the problem my application has, I use ThreadPool. In multithreaded development, programmers spend a lot of time on scheduling and on defining events and delegates to maintain a pool of threads. With ThreadPool, the CLR maintains a pool of threads for you. There are some cases where you should think about creating your own threads, for example if you want to assign priorities to threads for better performance, which cannot be done with ThreadPool. Nevertheless, in my case, I just want the program to run the web service queries on multiple threads, and this can be done extremely easily with ThreadPool.

The first version I made is fairly simple:

For each query, I create something called a Process. A Process handles all the querying and data parsing for a single query. The original design runs Processes one after another using foreach.
e.g.
foreach (Record r in recordList)
{
    runProcess(r); // the application has to wait for one runProcess to finish before it runs another
}

Now I change this piece of code to:
foreach (Record r in recordList)
{
    ThreadPool.QueueUserWorkItem(new WaitCallback(runProcess), r);
}

eventFinish.WaitOne(Timeout.Infinite, true);

As you can see, the program queues all the queries to run on multiple threads and then waits for the finish signal.

In runProcess, I add:
if (totalRun == recordCount)
{
    eventFinish.Set(); // tell the main thread that all records have been queried
}
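The finish signal can be sketched as a small self-contained console program. This is a guess at how the pieces fit together rather than the original code: the post does not show how totalRun is updated, so the sketch assumes it is incremented with Interlocked.Increment, and the class CompletionDemo, the method RunAll, and the 10 ms simulated delay are made up for the demo.

```csharp
using System;
using System.Threading;

// Hypothetical demo class, not from the original program.
public static class CompletionDemo
{
    private static int totalRun; // number of work items finished so far
    private static readonly ManualResetEvent eventFinish = new ManualResetEvent(false);

    // Stand-in for runProcess; a real one would query the web service.
    private static void RunProcess(object state)
    {
        int recordCount = (int)state;
        Thread.Sleep(10); // simulated web service delay

        // Increment returns the new value atomically, so exactly one
        // work item sees totalRun reach recordCount and sets the event.
        if (Interlocked.Increment(ref totalRun) == recordCount)
        {
            eventFinish.Set(); // tell the main thread that all records are done
        }
    }

    // Queues recordCount work items and blocks until all of them have run.
    public static int RunAll(int recordCount)
    {
        if (recordCount <= 0)
        {
            return 0; // nothing would ever set the event, so do not wait
        }

        totalRun = 0;
        eventFinish.Reset();
        for (int i = 0; i < recordCount; i++)
        {
            ThreadPool.QueueUserWorkItem(new WaitCallback(RunProcess), recordCount);
        }
        eventFinish.WaitOne(); // wait for the finish signal
        return totalRun;
    }

    public static void Main()
    {
        Console.WriteLine(RunAll(20)); // prints 20
    }
}
```

Doing the increment and the comparison in one atomic step matters here: if two threads incremented totalRun with ++ and then compared separately, the event could be set twice or, after a lost update, never.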

(To be continued)

06.02.05

The more I know, the more I realize how little I know.

Posted in Programming by chenty

Today I ran the first integration test on the performance of the project I am currently working on. It is a data mining program, but it is surprisingly slow. The good news is that I expected this to happen, so I designed the application to run on multiple computers concurrently. The test was successful, and I managed to run the application on 3 computers. However, I doubt how reliably this “concurrent” thing works. Thus, I have scanned through topics on Transactions and Locks. Kinda lost there. As I learn more and more about database systems, I realize how little I know about them. Nevertheless, the bottom line is, I will conquer it.

I have also built indexes on some tables, and the data file size increased insanely. It does improve query performance a lot, but not much when I use “LIKE”. I figure an index might not work well with “LIKE”. Well, I have to admit that I know little about databases. =)

06.01.05

Stored Procedure (SQL Server 2000) and C#

Posted in Programming by chenty

I have programmed many e-commerce websites implemented in ASP.NET (C#) and Microsoft SQL Server 2000. However, the SQL transactions in most of these websites are simple, and I have used Dynamic SQL for years. That makes me a real amateur because I have never used stored procedures. =)

However, I am sure I am not the only one. Here are some arguments (a very interesting read, comments included):
http://weblogs.asp.net/fbouma/archive/2003/11/18/38178.aspx

Nevertheless, I recently started working on an application that is totally based on database transactions, and one of the NFRs is performance. Therefore, I decided to use stored procedures with my C# programs. After researching on the Internet for days, here are some books and websites I found very useful.

I haven’t really worked with stored procedures, but I learned some when I was taking a database course, so I assume you are already familiar with the basics. If not, any SQL book will help.

If you work with SQL Server 2000, I think “MCAD/MCSE/MCDBA Self-Paced Training Kit: Microsoft SQL Server 2000 Database Design and Implementation, Second Edition (Exam 70-229)” Chapter 8 is where you should start. However, this book is a little bit confusing (I feel it is really unorganized), so if you get lost while reading it, do not worry. Just try to understand the basics.

For C# programmers, I found this link really helpful:
http://www.411asp.net/home/tutorial/howto/database/storedpr
The first article there is from MSDN, and that article provides very good sample code. When you read through it, make sure you understand how to use RETURN_VALUE. Another thing you don’t want to miss is how to use @@ERROR, @@ROWCOUNT etc. Sooner or later you will need to use these.

I had a problem while using SELECT TOP @COUNT. I was trying to let the stored procedure return @COUNT records, but this is not allowed in SQL Server 2000. The solution is not to use SELECT TOP, but to use “SET ROWCOUNT”. Sample code looks like this:

SET ROWCOUNT @COUNT
SELECT * FROM …. -- your SQL statement
SET ROWCOUNT 0

I hope this information is useful to .NET programmers who are just starting to work with stored procedures. ^^