Tuesday, November 12, 2013

AJAX Client Triggered Reloads (and a general API framework for Server)

This is the question I get asked most often, and I also see it appear numerous times on the Community and other blogs. Stefan Walther has done an amazing amount of work to solve this problem. The solution requires IIS, Stefan's EDX wrapper written in C#, and Stefan's Javascript extension for QlikView. Although I have not tested it, the architecture tells me that it probably violates the same-origin policy and will require a browser configuration change on each client.

I ventured out to find a simpler alternative and in the process I discovered how we can use the QlikView Automation API, and thus Desktop style macros, with a Server deployment.

Perhaps the simplest way to trigger a reload from the AJAX client is to create a button that does a Server-side export, and to run a script on the Server that waits for the export file to appear. When the file appears, the script uses qv.exe to do the reload, deletes the export file, and goes back to waiting for the export file to appear again. Different file names can be chosen to indicate full or partial reload, and so on. The only additional software required with this approach is the script itself, which could be a dozen lines of batch file or VBScript.
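
To make this concrete, here is a rough sketch of such a watcher script, written as Windows Script Host JScript (a batch file or VBScript would work just as well). The trigger file name, the document path and the qv.exe location are assumptions; adjust them for your deployment.

// reloadwatcher.js -- run with: cscript //nologo reloadwatcher.js
// Rough sketch only; all paths and file names below are assumptions.
var fso   = new ActiveXObject("Scripting.FileSystemObject");
var shell = new ActiveXObject("WScript.Shell");
var trigger = "C:\\ProgramData\\QlikTech\\Documents\\reload.trigger.csv";  // file exported by the button
var qvw     = "C:\\ProgramData\\QlikTech\\Documents\\stocks.qvw";          // document to reload
var qvexe   = "\"C:\\Program Files\\QlikView\\qv.exe\"";

while (true) {
    if (fso.FileExists(trigger)) {
        fso.DeleteFile(trigger);                                // consume the "message"
        shell.Run(qvexe + " /r \"" + qvw + "\"", 1, true);      // /r = reload and save; use /rp for partial
    }
    WScript.Sleep(5000);                                        // poll every 5 seconds
}

Different trigger file names could map to different qv.exe options, for example /rp for a partial reload.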

A slightly more sophisticated way of sending messages from AJAX clients to your scripts, because that is essentially what we are trying to do, is to use DynamicUpdate. DynamicUpdate can insert, update and delete data rows loaded in memory. The changes are instantly and simultaneously visible to all clients. DynamicUpdate commands can be conveniently programmed as button actions. Changes to in-memory data can be monitored and acted upon as follows:


  1. A button with a DynamicUpdate action Inserts, Updates or Deletes rows in a table, not related to business data, to signal what it needs done
  2. The Insert, Update or Delete can be thought of as a message that the client is broadcasting to all other clients
  3. A Desktop client that has the document open in Server is able to see the message as Inserts, Updates or Deletes on the table or tables that it is monitoring
  4. The Desktop client (same or another instance) has also opened the QVW file that is loaded by the Server, in order to perform Reload and Save. This is required because the Reload and Save parts of the API are disabled when the Desktop client opens a document from Server
  5. The Automation API exposed by the Desktop client is now available to macros and/or external scripts to access the in-memory data loaded by the Server, which makes it possible to read the "messages" sent by other clients and to perform reloads
  6. The macros and scripts are written the same way you would write them for a Desktop application, thus restoring the missing macro functionality in a Server deployment. Not only can you process messages sent by clients, you can also update data back to a database server etc. based on changes made by clients, which is the other frequently asked question
I will provide working code samples in future posts.
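
In the meantime, here is a rough sketch of steps 3 through 6: a Windows Script Host JScript that drives the Desktop Automation API from outside QlikView. The server address, document paths, the Messages table, its MsgType field and the 'RELOAD' value are made-up names for illustration, and the sketch assumes the button's DynamicUpdate action runs something like INSERT INTO Messages (MsgType) VALUES ('RELOAD').

// messagewatcher.js -- run with: cscript //nologo messagewatcher.js
// Rough sketch only; table, field and value names and all paths are assumptions.
var qv        = new ActiveXObject("QlikTech.QlikView");                    // Desktop Automation API
var serverDoc = qv.OpenDoc("qvp://myserver/stocks.qvw", "", "");           // sees the in-memory data loaded by Server
var localDoc  = qv.OpenDoc("C:\\ProgramData\\QlikTech\\Documents\\stocks.qvw", "", "");  // Reload and Save work on this one

while (true) {
    // Read pending message values from the field being monitored
    var values = serverDoc.Fields("MsgType").GetPossibleValues(100);
    for (var i = 0; i < values.Count; i++) {
        if (values.Item(i).Text == "RELOAD") {
            localDoc.Reload();                                             // reload the QVW that Server has loaded
            localDoc.Save();
            // Remove the processed message so it is not handled twice
            serverDoc.DynamicUpdateCommand("DELETE FROM Messages WHERE MsgType = 'RELOAD'");
        }
    }
    WScript.Sleep(5000);                                                   // poll every 5 seconds
}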

Monday, November 11, 2013

Adding Data to Memory with Partial Reloads

The Add Load statement is wonderful for adding data to memory and making it instantly available to all AJAX client users. It is an alternative to DynamicUpdate, which I will talk about in the near future, for relatively large additions or changes. Previously I provided an example of getting real-time quotes by parsing the JSON returned by the Yahoo API service. In that example we can replace the Load statements with Add Load, and schedule Partial Reloads every 5 minutes to get a near real-time pulse on the market.

Tmp:
Add Load
json2csv(@1, 'query.results.quote') as CSV
From [http://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20yahoo.finance.quotes%20where%20symbol%20in%20(%22YHOO%22%2C%22AAPL%22%2C%22GOOG%22%2C%22MSFT%22)%0A%09%09&env=http%3A%2F%2Fdatatables.org%2Falltables.env&format=json] (txt, utf8, no labels, delimiter is '');

Quotes:
Add LOAD * FROM_FIELD (Tmp, CSV)
(txt, utf8, embedded labels, delimiter is ',', msq);

Drop Table Tmp;

Note that with a Publisher license it is straightforward to schedule Partial Reloads by using the QlikView Management Console (QMC). However, without Publisher you can only schedule full reloads. To work around this, you can use the Windows Task Scheduler to run a batch file that calls QV.EXE with the command line option /rp for a partial reload.

qv.exe /rp c:\ProgramData\QlikTech\Documents\stocks.qvw

The beauty of all of this is that QlikView is grabbing, storing, processing and presenting data. Database and application servers are not required.

Thursday, November 7, 2013

Consuming Web Services that Return JSON

There are hundreds of web services that return data in either XML or JSON format. XML is understood by QlikView and is straightforward to load. However, QlikView cannot directly parse JSON-formatted data, which is becoming more and more popular due to its simplicity and ease of consumption by browser apps written in Javascript.

There are many different ways to connect a new data source, such as a web service that returns JSON, to QlikView. You can write an external program in C#, PowerShell, Python etc. to talk to the web service and dump the data into a format such as CSV or XML that QlikView can read. You can use the Connector API to create a connector. You can also write an external program that uses the Automation API to pump the data in via DynamicUpdate. Here is a fourth approach that is simple, requires no external components and does not create any files. It also highlights the usefulness of loading from a field.

Here is the Javascript code to parse JSON. Add it to the Module in your QVW document.

function getRow(obj) {
    var str = '';
    for (var p in obj) {
        if (obj.hasOwnProperty(p)) {
            str += '"' + obj[p] + '",';
        }
    }
    return str.slice(0, -1);
}

function getHeading(obj) {
    var str = '';
    for (var p in obj) {
        if (obj.hasOwnProperty(p)) {
            str += '"' + p + '",';
        }
    }
    return str.slice(0, -1);
}

function json2csv(jsonText, arrayName) {
    var object = eval('(' + jsonText + ')');
    eval("var array = object." + arrayName);
    var csv = '';
    csv += getHeading(array[0]) + '\n';
    for (var i = 0; i < array.length; i++) {
        csv += getRow(array[i]) + '\n';
    }
    return csv.slice(0, -1);
}

Here is an example that loads stock quotes for your favorite companies from the Yahoo Query Language (YQL) API. YQL can be told to return XML as well, but here I am asking it to return JSON. Note that the entire From clause of the Tmp table load is one long line!

Tmp:
Load
json2csv(@1, 'query.results.quote') as CSV
From [http://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20yahoo.finance.quotes%20where%20symbol%20in%20(%22YHOO%22%2C%22AAPL%22%2C%22GOOG%22%2C%22MSFT%22)%0A%09%09&env=http%3A%2F%2Fdatatables.org%2Falltables.env&format=json] (txt, utf8, no labels, delimiter is '');

Quotes:
LOAD * FROM_FIELD (Tmp, CSV)
(txt, utf8, embedded labels, delimiter is ',', msq);

Drop Table Tmp;


Wednesday, November 6, 2013

Cleansing and Matching using Regular Expressions

Here are three module functions in Javascript that I find handy to cleanse data during load.

function parse(pattern, options, text, returnIndex) {
    var re = new RegExp(pattern, options);
    var result = re.exec(text);
    if (result && result.length > returnIndex) {
        return result[returnIndex];
    }
    return null;
}

function matchx(pattern, options, text) {
    var re = new RegExp(pattern, options);
    var result = re.exec(text);
    if (result) { return true; }
    return false;
}

function replacex(pattern, options, text, newText) {
    var re = new RegExp(pattern, options);
    return text.replace(re, newText);
}

Use parse to grab interesting text from a field, usually to create a new field. If a field contains a job description, you may want to create a job title field from the description by looking for words that end with 'er' such as 'developer' and 'manager', words ending with 'tect' such as 'architect', and words ending with 'yst' such as 'analyst'. This results in the following load script code to be used within a load statement:

parse('\s(\w+er|\w+tect|\w+yst)(\s|$)', 'i', [Job Desc], 1) as Title

The "i" option makes matching case in-sensitive. The returnIndex of 1 returns index 1 of the array created by Javascript RegExp.exec function. This will allow you to extract and separate multiple items from one field into multiple fields.

Use matchx to match fields against complex regular expression patterns. Matchx returns true or false. Here is an example: if a field contains the word 'garbage' in any case, anywhere, that record will not be loaded.

Reject records where myField is 'This is garbage', 'Garbage no good', 'No GarBage allowed', etc.:

LOAD
     myField
FROM mySource
Where not matchx('garbage', 'i', myField);

Use replacex to replace the substring(s) matched by a regular expression pattern in a field. The following will replace 'nyc, NY', 'manhattan,NY', and 'New York City, NY' in the Location field with 'New York, NY'.

replacex('^(nyc|new york city|manhattan)\s*,\s*', 'i', Location, 'New York, ') as CleanLocation

My favorite site to learn writing regular expressions is http://www.regular-expressions.info/

Screen Scraping Web Sites with QlikView

There are websites that essentially provide a web interface to a database. Data is often presented as an HTML table, sometimes spread over multiple pages. When you feel limited by the site's slicing and dicing options as well as its search capability, it is tempting to pull the data into QlikView. However, most websites are designed for end users and do not provide bulk export. QlikView does provide an HTML parser that can read HTML tables directly from a web site.

First you have to make sure that the interesting data is in HTML tabular format by looking at the source of the page. HTML tables use the <table><tr><td></td></tr></table> structure. Next you should observe the URL in the browser address bar as you click through the website, noticing the differences when you select different options.

Here is an example of a load script that will load 200 technology jobs (four pages of 50) from dice.com:

Jobs:
LOAD [Job Title],
     Company,
     Location,
     [Date Posted]
FROM
[http://www.dice.com/job/results/10013?caller=advanced&n=50&src=19&x=all&p=z]
(html, codepage is 1252, embedded labels, table is @1);

for i = 50 to 150 step 50
LOAD [Job Title],
     Company,
     Location,
     [Date Posted]
FROM
[http://www.dice.com/job/results/10013?caller=advanced&n=50&o=$(i)&src=19&x=all&p=z]
(html, codepage is 1252, embedded labels, table is @1);
Next

Notice the $(i) embedded in the second load's FROM URL, which tells the dice.com web server which record number to start listing from. This is required because the server will list a maximum of 50 records at a time.

Note that most web sites do not allow you to redistribute their content, and some will limit the number of hits per day from an IP address. It is a good idea to dump the data into a QVD or CSV file after running the load, and to use the local files for subsequent loads:
store Jobs into jobs.csv (txt);