On a recent project I had to send a collection of objects from JavaScript to an embedded Flash application. Each object was an associative array containing property options for different elements in the Flash app. It turned out to be fairly cumbersome to actually do! First, let’s review the ways you might create an associative array in JavaScript:

1) An object literal

var arr = {name: "edge-color", group: "edges"};

2) An object

var arr = new Object();
arr.name = "edge-color";
arr.group = "edges";

3) An array literal

var arr = ["name": "edge-color", "group" : "edges"]; // SyntaxError: key: value pairs are not allowed inside [ ]

4) An array object

var arr = new Array();
arr.name = "edge-color";
arr.group = "edges";

Options 1, 2, and 4 all run and give you something that behaves like an associative array. Option 3 does not: key: value pairs are not allowed inside square brackets, so it is a syntax error. Next, we need to send this information to the Flash object using the ActionScript ExternalInterface class. This is where our trouble starts. If you use option 4, this will NOT WORK! Properties attached to an Array by name are not part of its indexed elements, so they are lost when the array is serialized (JSON.stringify, for instance, turns such an array into "[]"). Only a plain Object will be translated, and the most reliable way to do that is by converting to JSON. If you are using a modern browser, the function JSON.stringify(object) is available without including any extra JavaScript library.
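To make the difference concrete, here is a minimal sketch of the JavaScript side. The helper sendOptionsToFlash, the callback name setOptions, the fakeFlash stand-in, and the "node-color" entry are all hypothetical names made up for illustration; in a real page the element would be the embedded Flash object and the callback would be whatever the Flash app registered through ExternalInterface.addCallback.

```javascript
// Option 1: a plain object. Its named properties survive JSON.stringify.
var asObject = { name: "edge-color", group: "edges" };

// Option 4: an Array with properties attached by name.
var asArray = new Array();
asArray.name = "edge-color";
asArray.group = "edges";

console.log(JSON.stringify(asObject)); // {"name":"edge-color","group":"edges"}
console.log(JSON.stringify(asArray));  // [] (the named properties are dropped)

// Hypothetical helper: "setOptions" stands in for whatever callback the
// Flash app registered with ExternalInterface.addCallback.
function sendOptionsToFlash(flashElement, options) {
  var json = JSON.stringify(options);
  flashElement.setOptions(json);
  return json;
}

// Stand-in for the embedded Flash element so the sketch runs outside a browser.
var fakeFlash = {
  received: null,
  setOptions: function (json) { this.received = json; }
};

// A real Array of plain objects serializes fine; the second entry is illustrative.
sendOptionsToFlash(fakeFlash, [asObject, { name: "node-color", group: "nodes" }]);
console.log(fakeFlash.received);
```

Sending one JSON string keeps the call to a single simple argument, so nothing depends on how individual JavaScript types are translated on the way into Flash.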

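On the ActionScript side, decoding JSON is equally straightforward (JSON.decode is the call provided by the as3corelib library, and s is the string received from JavaScript):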

var object:Object = JSON.decode(s);

I’ll leave it up to the reader to figure everything else out…

I started the actual process of writing my thesis two weeks ago. Here is the running log of what I have accomplished and the page counts. I need to have a rough draft out by the 25th.

June 10th – 10 pages. Introduction, Problem Description.

June 16th – 20 pages. Background Research, Some Implementation details.

June 17th – 28 pages. Screenshots and some results filling in.

June 19th – 39 pages. Added discussion, visualization sections. More results section, FoxNews integration.

June 20th – 52 pages. Rough draft complete.

A recent project I’m working on needs to store a Multimap, where the key is a String and the value is a Set. I’m trying to keep this as abstract as possible, but basically, these sets of values can grow over time and are compared to each other on a large scale (on the order of millions of records).

While Google’s Multimap collection is a great API to use, it turns out that their implementations are slow for this particular use case. In general, they are very concerned with maintaining data integrity when they expose their underlying objects, so if you change the exposed data it changes the internal multimap data. This adds a lot of overhead to iteration, which I have to do a lot of. In fact, profiling the application indicates that more time is spent iterating than on the expensive comparison operation we do.

These are the three implementations tested: HashMultimap, LinkedHashMultimap, and an internal StringMultiMap that isn’t concerned about exposing the underlying objects (and doesn’t use their WrappedIterator code). The StringMultiMap is extremely fast compared to the others. As expected, LinkedHashMultimap is slower than HashMultimap. The chart loses some value since I can’t really describe the entire task, but basically, iterating and putting values is a major component of the task, and the task is held fixed while only the data structure changes.
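The general shape of such a structure can be sketched as follows. This is not the internal StringMultiMap (which is Java and which I can’t show); it is a hypothetical JavaScript illustration, written in the same language as the earlier examples, of a string-keyed map of sets that hands out its backing sets directly instead of wrapping them. The class body, the overlap function, and the sample keys and values are all assumptions made for the example.

```javascript
// Hypothetical sketch of a string-keyed multimap: Map<string, Set<value>>.
class StringMultiMap {
  constructor() {
    this.map = new Map();
  }

  // Adds value under key; returns true if the value was not already present.
  put(key, value) {
    let values = this.map.get(key);
    if (values === undefined) {
      values = new Set();
      this.map.set(key, values);
    }
    const sizeBefore = values.size;
    values.add(value);
    return values.size > sizeBefore;
  }

  // Returns the backing Set itself (no copy, no wrapper), or undefined.
  get(key) {
    return this.map.get(key);
  }

  // Iterates [key, Set] pairs straight off the backing Map.
  entries() {
    return this.map.entries();
  }
}

// Stand-in for the comparison step: counts the values two sets share.
function overlap(a, b) {
  let shared = 0;
  for (const value of a) {
    if (b.has(value)) shared++;
  }
  return shared;
}

const multimap = new StringMultiMap();
multimap.put("doc-1", "apple");
multimap.put("doc-1", "pear");
multimap.put("doc-2", "pear");
multimap.put("doc-2", "plum");

console.log(overlap(multimap.get("doc-1"), multimap.get("doc-2"))); // 1
```

The trade-off is the one described above: callers get the real sets, so iteration goes through no extra layer, but nothing stops a caller from mutating a set behind the multimap’s back.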

I gave this presentation to the KEWI Group at the University of Nebraska at Omaha. It’s a tutorial on OpenNLP, and discusses using the library for document or text classification.

I’ve used OpenNLP in both work and academic settings and it works phenomenally well.

Check it out!

This happened using Nutch 1.0. When I went to import it into Eclipse, it gave me an error because an incomplete JRE is set somewhere. This is a recurring theme, but I’m not entirely sure what is causing it, nor do I care enough to investigate. So, logging it and moving on: I just had to set the Java compiler to the default. Done.

Next – compiling. The base 1.0 didn’t want to compile for me. It was complaining about something on line 62, this sequence in particular:

      
<touch datetime="01/25/1971 2:00 pm">
  <fileset dir="${conf.dir}" includes="**/*.template"/>
</touch>
    

I removed that, it built. Success.

I need to find a way to add the full HTML content to the index. Unfortunately, since Nutch rolls its own parsers and doesn’t use Tika, this is going to be difficult: I’ll have to modify every parser to add the full content to the document. For now, the solution I have *may* work. I will try running it and see what I get for content. If it’s parsable/usable by everything, I’ll leave it be.
