Matthew Lohbihler Mon 10 Jun 2013
This method:
... could potentially return a great deal of data. Would it be better if it were passed a callback so that data can be streamed out?
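For illustration, a callback-based variant might look something like the following. This is a sketch only; the HHisItemCallback interface and both signatures are assumptions for the sake of the example, not the toolkit's actual API.

    // Current form (as described above; signature assumed for illustration):
    // protected abstract HHisItem[] onHistRead(HDict rec, HDateTimeRange range);

    // Hypothetical streaming variant: the implementation pushes rows to a
    // callback as it reads them, instead of returning one big array.
    public interface HHisItemCallback
    {
      void item(HHisItem item);  // invoked once per history sample
    }

    // protected abstract void onHistRead(HDict rec, HDateTimeRange range,
    //                                    HHisItemCallback callback);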
Brian Frank Wed 12 Jun 2013
That is a good question. The problem is that typically that data then has to get transformed into an HGrid, and then serialized into Zinc, so the entire pipeline would have to be redesigned to handle streaming. That would be pretty complicated, and especially painful in Java since it doesn't have first-class functions. I actually implemented things like that in SkySpark, and decided it really isn't worth the complication.
Matthew Lohbihler Thu 13 Jun 2013
Oh, SkySpark doesn't actually use your Java implementation? Maybe I'll just make the streaming changes in my own Bitbucket branch then.
Matthew Lohbihler Thu 13 Jun 2013
Oh, re first-class functions: that's true, but interfaces and anonymous classes make it pretty straightforward regardless. The syntax isn't exactly tidy, to say the least, but using Eclipse et al. removes most of the pain.
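For example, assuming the hypothetical HHisItemCallback interface sketched earlier and some writer object named out, the callback could be supplied as an anonymous class:

    onHistRead(rec, range, new HHisItemCallback()
    {
      public void item(HHisItem item)
      {
        out.writeRow(item);  // hypothetical writer: push each row straight out as it arrives
      }
    });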
Brian Frank Thu 13 Jun 2013
SkySpark is written in Fantom, so we don't use any of the code in the Java Haystack toolkit directly.
Mike Jarmy Fri 14 Jun 2013
Matthew, are you referring to using chunked encoding to write the data?
Matthew Lohbihler Fri 14 Jun 2013
That's related, yes. The response may end up being chunked, but that's a matter for the web server to decide (i.e., whether it has queued too much data in its output buffer).
Regarding the onHistRead method, streaming is a matter of significant performance improvement, especially when there is a lot of data. Currently the Haystack code makes a call into the database to get an HHisItem array. The use of arrays is itself problematic: we will rarely know in advance how many rows are going to be returned, so we have to load all of the HHisItem objects into a list and then convert the list to an array, meaning we've created two potentially large arrays (counting the one backing the list) when we only really needed one (if we had just returned the list). Then, when creating the HGrid that ultimately gets returned, the HHisItems are converted to HRows, meaning we've again doubled the memory consumption; and to boot, the HGrid itself has its own row list, so we've now created three potentially large arrays. Finally, note that the code that writes the HGrid to the output stream does not declare it as an entity, so a large amount of data could end up chunking the response. Please correct me if any of this is wrong, but this is what I recall seeing.
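To make the allocation chain concrete, the pattern being described is roughly this (illustrative code only, not the exact toolkit source; cursor and readHisItem are hypothetical stand-ins for the database read):

    // 1. Load rows into a list: the list's backing array is the first large allocation.
    ArrayList items = new ArrayList();
    while (cursor.next())
      items.add(readHisItem(cursor));

    // 2. Convert to the HHisItem[] the API requires: a second large array.
    HHisItem[] array = (HHisItem[])items.toArray(new HHisItem[items.size()]);

    // 3. Build the grid: each HHisItem becomes an HRow, and the grid keeps
    //    its own row list -- a third copy of the data, give or take.
    HGrid grid = HGridBuilder.hisItemsToGrid(meta, array);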
But with streaming, no arrays are needed, and only a minimal amount of object instantiation is required. Plus, data makes it into the response stream as soon as the first row is available, minimizing "time-to-screen". I've checked in a version of this with streaming completed only for the onHistRead method so far, but I'm considering implementing it for tag requests too. The code is in my bitbucket fork:
https://bitbucket.org/mlohbihler/haystack-java
(I also changed the project structure to accommodate Eclipse, among a few other enhancements.)
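For illustration, the streaming shape described above might look something like this (a sketch with assumed helper names, not the fork's actual code):

    // Hypothetical streaming path: each row is serialized into the HTTP
    // response as soon as it is read, so no intermediate HHisItem[]/HRow
    // arrays are ever built.
    final Writer out = response.getWriter();
    writeZincGridHeader(out, meta);               // hypothetical helper: version, meta, columns
    onHistRead(rec, range, new HHisItemCallback()
    {
      public void item(HHisItem item)
      {
        writeZincRow(out, item);                  // hypothetical helper: row goes straight to the wire
      }
    });
    out.flush();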
Mike Jarmy Fri 14 Jun 2013
If the web server in question is an AX station, then AFAIK it's not going to queue up data in a buffer for us like this automatically, so we'd probably have to roll our own. I think this would be doable, but I haven't looked into it yet.
I agree that there are opportunities for performance improvement in what we've got now, though I'd be interested in seeing some before/after performance numbers. Mostly I think the streaming approach will just make the garbage collector work less hard -- there probably will not be a big performance boost. In cases where you are fetching a very large amount of data, it's true that we could definitely run into bigger problems. But really one shouldn't do that anyway -- just fetch it in batches.
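For instance, a client could walk a large span one day at a time rather than issuing one huge read (an illustrative sketch; the exact hisRead call form and the plusDays helper are assumptions, and process is a hypothetical consumer):

    // Hypothetical batched read: step through the span in day-sized windows
    // so no single response has to hold the whole result set.
    HDate date = start;
    while (!date.equals(end))
    {
      HDate next = date.plusDays(1);
      HGrid batch = client.hisRead(id, date + "," + next);  // assumed "date,date" range string
      process(batch);
      date = next;
    }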
Part of the reason haystack-java uses explicit arrays rather than java.util.Lists is that we have to support J2ME, because of the very large installed base of JACEs that many of us want to use haystack-java in. Since J2ME doesn't support generics, it's better to use arrays than Lists: with arrays, the API is self-documenting.
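In code terms, the trade-off looks like this (illustrative; both read methods are hypothetical):

    // Pre-generics (J2ME / Java 1.3): a List hides its element type, so the
    // caller has to know what it holds and cast.
    java.util.List items = readItemsAsList();
    HHisItem first = (HHisItem)items.get(0);

    // An array keeps the element type in the signature itself -- no cast needed.
    HHisItem[] array = readItemsAsArray();
    HHisItem same = array[0];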
Your use of non-J2ME features like generics, annotations, etc. means that your forked code will not run on the majority of JACEs installed in the field, since it can't be compiled to Java 1.3. As such, your fork will never be merged back in. Also, I can't help but notice that you've gratuitously reformatted every single file in the entire code base rather than changing a couple of settings in your IDE.
Matthew Lohbihler Fri 14 Jun 2013
Re code format: I asked Brian for a formatting rule file, but he didn't have anything like it. Rather than continue with arbitrary formatting, I decided to convert to a standard.
Matthew Lohbihler Fri 14 Jun 2013
Re merge: I didn't expect my changes would be merged in - at least there seemed to be significant reluctance. So those who are implementing on a more advanced Java platform and want the performance features and other enhancements can decide for themselves which version they want to use.
Matthew Lohbihler Fri 14 Jun 2013
My performance testing shows that my fork gets data into the output stream pretty much instantly, while there is significant delay in the original Haystack ("Haystack classic"?). Overall, history is delivered about 30% faster: my specific test averaged 1634ms, versus 2363ms for the original. Interestingly, the response times in my fork are also very consistent (stddev of 8.4ms), while the original Haystack had a stddev of 226ms. I suspect this is due to the extra garbage collection that the original has to do.