If it weren’t the weekend…

Published 4.22.2005 by ~mattg

… someone would have died by now. The past few days I’ve been absolutely swamped with tasks, and they aren’t the typical “hey, can you add this little feature here” type of requests.

First of all, two of the .NET applications I have been left with are in dire need of some major “tweaking,” to put it mildly. On top of that, one of them is approaching a pretty big deadline, so it’s become even more pressing. The other, which is supposedly in its “beta” period, has some huge flaws in it. It’s not that the features don’t work; in most cases they work fine. It’s just that some of my big pet peeves are being exercised with little care. For example, the primary “layout” tool for the one application is an HTML table. In most good design circles, this is sacrilege, but here, it seems to be common practice.

Enough griping. I did manage to begin my venture into control development. While it is only web controls, which I have to assume are easier from a rendering standpoint than their Windows counterparts, it’s still pretty nice to be able to get into such development. It broadens one’s horizons.

For those of you anxiously awaiting the regular expressions for glossarizing pages, read on.

The glossarizing problem, on the surface, is pretty easy: find some terms on a page and wrap them in a custom {g} {/g} tag set (to be processed later by a different application). However, some of the extra stipulations made it a little more difficult than I first thought.

First of all, each key term had multiple words or phrases to match, so I couldn’t use a single MatchCollection to encapsulate them all. I ended up creating my own collection object for storing these matches. There were also words that “did not match.” For example, the term “key” matches “key” and “Key”, but not “Key Terms” or “public key”. To handle those, I located the “do not match” terms and removed from the custom match collection any match whose index fell within the index and length of a “do not match” term that I found.
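The “do not match” filtering can be sketched roughly like this. It is a minimal Python illustration, not the original .NET code; the function name `find_term_matches`, the case-insensitive matching, and the sample sentence are all assumptions made for the example.

```python
import re

def find_term_matches(text, phrases, exclusions):
    """Return sorted (start, end) spans for phrases, skipping any span that
    falls inside a "do not match" phrase. Hypothetical helper for illustration."""
    # Spans covered by the "do not match" phrases, e.g. "public key".
    blocked = [
        m.span()
        for phrase in exclusions
        for m in re.finditer(r"\b" + re.escape(phrase) + r"\b", text, re.IGNORECASE)
    ]
    spans = []
    for phrase in phrases:
        for m in re.finditer(r"\b" + re.escape(phrase) + r"\b", text, re.IGNORECASE):
            start, end = m.span()
            # Keep the match only if it is not contained in a blocked span.
            if not any(b_start <= start and end <= b_end for b_start, b_end in blocked):
                spans.append((start, end))
    return sorted(spans)

text = "The key is stored safely. See Key Terms for the public key."
spans = find_term_matches(text, ["key"], ["Key Terms", "public key"])
matched = [text[s:e] for s, e in spans]
```

Only the first “key” survives here; the occurrences inside “Key Terms” and “public key” are dropped because their spans sit inside a blocked span.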

Secondly, previously tagged terms should not be tagged again. In other words, if you tag “risk management”, you shouldn’t tag “risk”. This required the lookaround feature of regular expressions: specifically, ensuring that the custom glossary tags were not present immediately before or after the search phrase. For example, to find “risk management” I used the following regular expression:

(?<!\{g\})\brisk\smanagement\b(?!\{/g\})
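To see the lookarounds in action, here is a small Python sketch (Python’s `re` module accepts the same pattern; this is not the original .NET code). The helper names `glossary_pattern` and `tag_phrases`, the case-insensitive flag, and the longest-phrase-first ordering are assumptions for the example.

```python
import re

def glossary_pattern(phrase):
    """Build the lookaround pattern shown above for an arbitrary phrase."""
    words = [re.escape(word) for word in phrase.split()]
    return re.compile(
        r"(?<!\{g\})\b" + r"\s".join(words) + r"\b(?!\{/g\})",
        re.IGNORECASE,
    )

def tag_phrases(text, phrases):
    """Wrap each phrase in {g}...{/g}, longest phrase first (an assumption),
    so that shorter terms see the tags left by the longer ones."""
    for phrase in sorted(phrases, key=len, reverse=True):
        text = glossary_pattern(phrase).sub(
            lambda m: "{g}" + m.group(0) + "{/g}", text
        )
    return text

tagged = tag_phrases("Good risk management reduces risk.", ["risk", "risk management"])
retagged = tag_phrases(tagged, ["risk", "risk management"])
```

The “risk” inside the already tagged “risk management” is skipped because `{g}` sits directly before it, while the standalone “risk” at the end still gets wrapped. Running the tagging a second time changes nothing.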

Last, but certainly not least, the phrases should not be found within HTML tags or their attributes. At first I thought about using lookaround again, but it got a bit hairy, so instead, I simply removed all the HTML tags and replaced them with $!$. I then performed all the searching and wrapping, and finally re-inserted the HTML tags at their proper locations.
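The strip-and-restore trick looks roughly like this in Python. It is a sketch under a few assumptions, not the original .NET code: the tag regex `<[^>]+>` is deliberately naive (it does not handle comments or a `>` inside an attribute value), the page text is assumed never to contain `$!$` itself, and the helper names are made up.

```python
import re

TAG_RE = re.compile(r"<[^>]+>")   # naive tag matcher, an assumption for this sketch
PLACEHOLDER = "$!$"

def strip_tags(html):
    """Replace every tag with the placeholder and remember the tags in order."""
    tags = TAG_RE.findall(html)
    return TAG_RE.sub(lambda m: PLACEHOLDER, html), tags

def restore_tags(text, tags):
    """Put the remembered tags back, one per placeholder, in original order."""
    parts = text.split(PLACEHOLDER)
    rebuilt = [parts[0]]
    for tag, part in zip(tags, parts[1:]):
        rebuilt.append(tag)
        rebuilt.append(part)
    return "".join(rebuilt)

html = '<a href="risk.html" title="risk">Managing risk</a>'
stripped, tags = strip_tags(html)
# Search and wrap on the tag-free text, so attribute values are never touched.
wrapped = re.sub(r"\brisk\b", lambda m: "{g}" + m.group(0) + "{/g}", stripped)
restored = restore_tags(wrapped, tags)
```

The “risk” in the `href` and `title` attributes is left alone because it is hidden behind the placeholder while the wrapping runs.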

A quick note: if you use the indexes from the Matches in the MatchCollection to edit the string, be SURE you process the matches from highest to lowest index. If you do not, the moment you insert or remove characters at the lower indices, the higher ones no longer point at the right place. If I’d figured that out sooner, I wouldn’t have pulled my hair out for so long.
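A short Python sketch of the ordering rule (again, an illustration rather than the original .NET code; `wrap_matches` is a made-up name):

```python
import re

def wrap_matches(text, pattern):
    """Wrap every match in {g}...{/g}, editing from the highest index down."""
    matches = list(re.finditer(pattern, text))
    # Work backwards: an edit near the end of the string never shifts the
    # indexes of the matches that come before it.
    for m in sorted(matches, key=lambda m: m.start(), reverse=True):
        text = text[:m.start()] + "{g}" + m.group(0) + "{/g}" + text[m.end():]
    return text

result = wrap_matches("risk here, risk there", r"\brisk\b")
```

Going lowest to highest would insert seven characters at the first match, leaving the saved index of the second match pointing seven characters too early.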

Filed under .NET Development, Web Development
