In one of my recent posts, I brought up the subject of using XML files instead of databases. When we released the application described in that post to the public and began using it, I ran into a strange problem that took me the better part of a morning to solve.
Shortly after launch, we posted our first new news release, which contained an enormous amount of text and HTML code. The bulk of the release was two extremely long lists of award recipients, each broken into four-column tables. Once the release was posted, the script that read the XML files timed out every time I tried to load the page. Because the release went up almost immediately after the page went live, I had no idea what was actually causing the timeouts.
I had the following possibilities in mind:
- Our network connection to the server and/or within our office was having issues, which kept the script from loading in a reasonable time
- Now that the page was live, the amount of traffic we were receiving was overloading the script and causing race conditions while it tried to access the XML files (if you're not familiar with race conditions, a quick web search will turn up some good explanations)
- The newest news release was causing some sort of unknown problem
I began by calling one of our telecommuters and asking her to try to load the page. If she encountered the same problem from outside the office, that would rule out the possibility that it was confined to our office's network connection.

Sure enough, the script timed out for her, too.
Next, I tried to find out whether the script was being overloaded by too much traffic. I started by reducing the amount of information the script pulled at one time. Rather than loading six months of news releases, I lowered it to three months, then two months, and finally just the current month. At every setting, the script still timed out.

I then opened the XML files and removed the latest news release to see if that made a difference. As soon as I did, the page loaded immediately.

At least I now knew the problem was related to the latest post. The next question was: what about this news release was causing the problem? Naturally, I first assumed it was the sheer amount of code in the release. After a little digging, however, I realized that couldn't be the cause, because we had older news releases with a similar amount of code, and they loaded without any trouble.
Then I noticed something odd about the new release: all of its code sat on a single line within the XML file. I ran a quick search and replace on that code to add a line break after each occurrence of "<br />" and "</p>", put the code back into the XML file, and saved it to the server. The page loaded perfectly.

To keep this from happening again, I added a function to my script that automatically inserts those line breaks whenever a news release is created or edited.

Now, the real question is: what actually caused the problem? What is it about the way VBScript reads files that made it choke on one long line of information? And why did it time out rather than throw a specific error? Unfortunately, I don't have answers to these questions yet. If you have any ideas, please share them with me.
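My actual function is written in VBScript and isn't reproduced here. As a rough illustration of the same idea, here is a minimal Python sketch. The name `add_line_breaks` and the "skip if a newline already follows" check are assumptions added for this example, not part of the original script. It only handles the two tags mentioned above, and it leaves an existing line break alone so that re-saving an edited release doesn't keep adding blank lines.

```python
import re

# Tags to break after; these mirror the two tags mentioned in the post.
BREAK_AFTER = ("<br />", "</p>")


def add_line_breaks(body: str, tags=BREAK_AFTER) -> str:
    """Hypothetical helper: insert a newline after each listed tag.

    The negative lookahead (?!\\r?\\n) skips tags that are already followed
    by a line break, so calling this on an edited release is idempotent.
    """
    pattern = "(" + "|".join(re.escape(t) for t in tags) + r")(?!\r?\n)"
    return re.sub(pattern, r"\1\n", body)


if __name__ == "__main__":
    release = "<p>Winners:<br />Alice<br />Bob</p><p>Congratulations!</p>"
    print(add_line_breaks(release))
```

Adding a newline after these tags should not visibly change how the release renders, since browsers treat a newline in ordinary HTML text as whitespace; content inside `<pre>` blocks would be the exception.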