Slow chunk response in Play 2.2

In my Play Framework-based web application, users can download all the rows of various database tables in CSV or JSON format. The tables are relatively large (100k+ rows), and I am trying to stream the result back using chunking in Play 2.2.

However, the problem is that although println statements show the rows being written to the Chunks.Out object, they do not show up on the client side! If I limit the number of rows sent back, it works, but there is still a big delay at the beginning, which grows if I try to send back all the rows and eventually causes a timeout, or the server runs out of memory.

I use the Ebean ORM; the tables are indexed, and querying from psql doesn't take much time. Does anyone have any idea what the problem might be?

I appreciate your help a lot!

Here is the code for one of the controllers:

@SecureSocial.UserAwareAction
public static Result showEpex() {

    User user = getUser();
    if(user == null || user.getRole() == null)
        return ok(views.html.profile.render(user, Application.NOT_CONFIRMED_MSG));

    DynamicForm form = DynamicForm.form().bindFromRequest();
    final UserRequest req = UserRequest.getRequest(form);

    if(req.getFormat().equalsIgnoreCase("html")) {
        Page<EpexEntry> page = EpexEntry.page(req.getStart(), req.getFinish(), req.getPage());
        return ok(views.html.epex.render(page, req));
    }

    // otherwise chunk result and send back
    final ResultStreamer<EpexEntry> streamer = new ResultStreamer<EpexEntry>();
    Chunks<String> chunks = new StringChunks() {
            @Override
            public void onReady(play.mvc.Results.Chunks.Out<String> out) {

                Page<EpexEntry> page = EpexEntry.page(req.getStart(), req.getFinish(), 0);
                ResultStreamer<EpexEntry> streamer = new ResultStreamer<EpexEntry>();
                streamer.stream(out, page, req);
            }
    };
    return ok(chunks).as("text/plain");
}

And the streamer:

public class ResultStreamer<T extends Entry> {

    private static ALogger logger = Logger.of(ResultStreamer.class);

    public void stream(Out<String> out, Page<T> page, UserRequest req) {

        if (req.getFormat().equalsIgnoreCase("json")) {
            JsonContext context = Ebean.createJsonContext();
            out.write("[\n");
            for (T e : page.getList())
                out.write(context.toJsonString(e) + ", ");
            while (page.hasNext()) {
                page = page.next();
                for (T e : page.getList())
                    out.write(context.toJsonString(e) + ", ");
            }
            out.write("]\n");
            out.close();
        } else if (req.getFormat().equalsIgnoreCase("csv")) {
            for (T e : page.getList())
                out.write(e.toCsv(CSV_SEPARATOR) + "\n");
            while (page.hasNext()) {
                page = page.next();
                for (T e : page.getList())
                    out.write(e.toCsv(CSV_SEPARATOR) + "\n");
            }
            out.close();
        } else {
            out.write("Invalid format! Only CSV, JSON and HTML can be generated!");
            out.close();
        }
    }

    public static final String CSV_SEPARATOR = ";";
}

And the model:

@Entity
@Table(name="epex")
public class EpexEntry extends Model implements Entry {

    @Id
    @Column(columnDefinition = "pg-uuid")
    private UUID id;
    private DateTime start;
    private DateTime finish;
    private String contract;
    private String market;
    private Double low;
    private Double high;
    private Double last;
    @Column(name="weight_avg")
    private Double weightAverage;
    private Double index;
    private Double buyVol;
    private Double sellVol;

    private static final String START_COL = "start";
    private static final String FINISH_COL = "finish";
    private static final String CONTRACT_COL = "contract";
    private static final String MARKET_COL = "market";
    private static final String ORDER_BY = MARKET_COL + "," + CONTRACT_COL + "," + START_COL;

    public static final int PAGE_SIZE = 100;

    public static final String HOURLY_CONTRACT = "hourly";
    public static final String MIN15_CONTRACT = "15min";

    public static final String FRANCE_MARKET = "france";
    public static final String GER_AUS_MARKET = "germany/austria";
    public static final String SWISS_MARKET = "switzerland";

    public static Finder<UUID, EpexEntry> find =
            new Finder<UUID, EpexEntry>(UUID.class, EpexEntry.class);

    public EpexEntry() {
    }

    public EpexEntry(UUID id, DateTime start, DateTime finish, String contract,
            String market, Double low, Double high, Double last,
            Double weightAverage, Double index, Double buyVol, Double sellVol) {
        this.id = id;
        this.start = start;
        this.finish = finish;
        this.contract = contract;
        this.market = market;
        this.low = low;
        this.high = high;
        this.last = last;
        this.weightAverage = weightAverage;
        this.index = index;
        this.buyVol = buyVol;
        this.sellVol = sellVol;
    }

    public static Page<EpexEntry> page(DateTime from, DateTime to, int page) {

        if(from == null && to == null)
            return find.order(ORDER_BY).findPagingList(PAGE_SIZE).getPage(page);
        ExpressionList<EpexEntry> exp = find.where();
        if(from != null)
            exp = exp.ge(START_COL, from);
        if(to != null)
            exp = exp.le(FINISH_COL, to.plusHours(24));
        return exp.order(ORDER_BY).findPagingList(PAGE_SIZE).getPage(page);
    }

    @Override
    public String toCsv(String s) {
        return id + s + start + s + finish + s + contract + 
                s + market + s + low + s + high + s + 
                last + s + weightAverage + s + 
                index + s + buyVol + s + sellVol;   
    }
}

Stander asked 9/10, 2013 at 22:35

1. Most browsers wait for 1–5 kB of data before showing any results. You can check whether Play Framework actually sends the data with the command curl http://localhost:9000.

2. You create the streamer twice. Remove the first final ResultStreamer<EpexEntry> streamer = new ResultStreamer<EpexEntry>(); — only the one created inside onReady is actually used.

3. You use the Page class to retrieve a large data set — this is incorrect. It actually makes one big initial request and then one request per iteration. This is SLOW. Use a simple findIterate() instead.

Add this to EpexEntry (feel free to change it as you need):

public static QueryIterator<EpexEntry> all() {
    return find.order(ORDER_BY).findIterate();
}
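
Note that all() ignores the date filters the user supplies via req. If those are needed, a filtered variant could mirror the existing page(from, to, page) method — a sketch, assuming the same Ebean version and the column constants already defined on the model:

public static QueryIterator<EpexEntry> iterate(DateTime from, DateTime to) {
    // Same filtering logic as page(), but returning a lazy iterator
    // instead of a PagingList.
    ExpressionList<EpexEntry> exp = find.where();
    if (from != null)
        exp = exp.ge(START_COL, from);
    if (to != null)
        exp = exp.le(FINISH_COL, to.plusHours(24));
    return exp.order(ORDER_BY).findIterate();
}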

Your new stream method implementation:

public void stream(Out<String> out, QueryIterator<T> iterator, UserRequest req) {

    if (req.getFormat().equalsIgnoreCase("json")) {
        JsonContext context = Ebean.createJsonContext();
        out.write("[\n");
        boolean first = true;
        while (iterator.hasNext()) {
            if (!first)
                out.write(",\n"); // separator between elements keeps the JSON valid
            out.write(context.toJsonString(iterator.next()));
            first = false;
        }
        iterator.close(); // it's important to close the iterator
        out.write("\n]\n");
        out.close();
    } else // csv implementation here
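
For completeness, the elided CSV branch could look like this — a sketch adapted from the question's original loop (it would replace the } else placeholder above, and reuses the question's toCsv() and CSV_SEPARATOR):

    } else if (req.getFormat().equalsIgnoreCase("csv")) {
        while (iterator.hasNext()) {
            out.write(iterator.next().toCsv(CSV_SEPARATOR) + "\n");
        }
        iterator.close(); // close the iterator here too
        out.close();
    } else {
        out.write("Invalid format! Only CSV, JSON and HTML can be generated!");
        out.close();
    }
}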

And your onReady method:

            QueryIterator<EpexEntry> iterator = EpexEntry.all();
            ResultStreamer<EpexEntry> streamer = new ResultStreamer<EpexEntry>();
            streamer.stream(new BuffOut(out, 10000), iterator, req); // notice buffering here
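
Putting the pieces together, the chunked branch of the original showEpex() would then look roughly like this. One caveat the answer glosses over: stream() is declared to take an Out<String>, so for the call below to compile, stream() would need an overload that accepts a BuffOut (or both types would need a small common interface):

// otherwise chunk result and send back
Chunks<String> chunks = new StringChunks() {
    @Override
    public void onReady(play.mvc.Results.Chunks.Out<String> out) {
        QueryIterator<EpexEntry> iterator = EpexEntry.all();
        ResultStreamer<EpexEntry> streamer = new ResultStreamer<EpexEntry>();
        streamer.stream(new BuffOut(out, 10000), iterator, req); // buffered writes
    }
};
return ok(chunks).as("text/plain");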

4. Another problem is that you call Out<String>.write() too often. Each call to write() means the server has to send a new chunk of data to the client immediately, and every call has significant overhead.

The overhead appears because the server has to wrap the response in the chunked transfer-encoding format — 6–7 bytes of framing for each message. Since you send small messages, the overhead is significant. The server also wraps each reply in a TCP packet whose size will be far from optimal, and it has to do some internal work to send each chunk, which requires resources as well. As a result, download bandwidth will be far from optimal.
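
To make the framing concrete: HTTP/1.1 chunked transfer encoding prefixes every chunk with its size in hex plus CRLF and terminates it with another CRLF, so a single short line sent as its own chunk looks like this on the wire:

6\r\n          <- chunk size in hex, then CRLF
TEST0\n\r\n    <- six payload bytes, then the closing CRLF

That is 5 bytes of framing for this one chunk (more once the length needs several hex digits). In the test below, roughly 89 kB of payload sent as 10,000 one-line chunks picks up about 5 bytes of framing each — around 50 kB of overhead, which lines up with the 136 kb vs 87.5 kb download sizes.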

Here is a simple test: send 10,000 lines of text, TEST0 through TEST9999, in chunks. This takes 3 seconds on my computer on average, but with buffering it takes 65 ms. The download sizes are 136 kb and 87.5 kb, respectively.

Example with buffering:

Controller

public class Application extends Controller {
    public static Result showEpex() {
        Chunks<String> chunks = new StringChunks() {
            @Override
            public void onReady(play.mvc.Results.Chunks.Out<String> out) {
                new ResultStreamer().stream(out);
            }
        };
        return ok(chunks).as("text/plain");
    }
}

The new BuffOut class. It's dumb, I know:

public class BuffOut {
    private StringBuilder sb;
    private Out<String> dst;

    public BuffOut(Out<String> dst, int bufSize) {
        this.dst = dst;
        this.sb = new StringBuilder(bufSize);
    }

    public void write(String data) {
        // If this write would overflow the buffer, flush what we have
        // as a single chunk first, then keep accumulating.
        if ((sb.length() + data.length()) > sb.capacity()) {
            dst.write(sb.toString());
            sb.setLength(0);
        }
        sb.append(data);
    }

    public void close() {
        // Flush the remainder, then close the underlying chunked output.
        if (sb.length() > 0)
            dst.write(sb.toString());
        dst.close();
    }
}
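
If writes are sparse, data can sit in the buffer for a long time before anything reaches the client. A flush() method would let the streamer push a partial buffer out early — a small sketch, not part of the original answer:

public void flush() {
    // Send whatever has accumulated so far as one chunk.
    if (sb.length() > 0) {
        dst.write(sb.toString());
        sb.setLength(0);
    }
}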

This implementation has a 3-second download time and a 136 kb size:

public class ResultStreamer {
    public void stream(Out<String> out) {
        for (int i = 0; i < 10000; i++) {
            out.write("TEST" + i + "\n");
        }
        out.close();
    }
}

This implementation has a 65 ms download time and an 87.5 kb size:

public class ResultStreamer {
    public void stream(Out<String> out) {
        BuffOut out2 = new BuffOut(out, 1000);
        for (int i = 0; i < 10000; i++) {
            out2.write("TEST" + i + "\n");
        }
        out2.close();
    }
}
Individually answered 19/10, 2013 at 18:58

Comments (6):
Thanks for your answer, Viktor. The buffering improves the speed, but the delay between when I write to out and when it shows up in the browser is still huge. Simple println statements show that all rows get written to out, and only once there are no more and out is closed do they start loading in the browser! And if the number of rows is too big, there is a timeout error like this: — Stander
[ERROR] [10/22/2013 13:57:16.285] [application-akka.actor.default-dispatcher-5] [ActorSystem(application)] Failed to run termination callback, due to [Futures timed out after [5000 milliseconds]] java.util.concurrent.TimeoutException: Futures timed out after [5000 milliseconds] at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:96) at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:100) at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107) at akka.dispatch.MonitorableThreadFactory$AkkaForkJoinWorkerThread$$anon$… — Stander
Could you please insert several System.out.println(System.currentTimeMillis()) calls into your code and show the output here? Place them after static Result showEpex(), after the // otherwise chunk result and send back line, just before the last line of public void stream(Out<String> out, Page<T> page, UserRequest req), and just before return ok(chunks).as("text/plain");. For some reason your chunk execution didn't finish, or took so long that execution was terminated by the Play framework. Also, did you try to run my code? Could you confirm whether you see the same trouble with it? — Individually
Here is the output: 1382542832461: showEpex(), 1382542837811: showEpex(), 1382542837875: start chunking, 1382542837988: returning chunks, 1382542882427: exiting stream(), 1382542937281: exiting stream(). The output appears in the browser after the second exiting stream(), not the first one! If instead of paging through results I just use your code but with a bigger loop, like 10000000, then the same thing happens as when I stream the big (16 GB) database (timeout)! It seems that instead of generating and writing to out as it goes, it first generates everything and only then chunks the already-gathered data! — Stander
Updated the answer. Should help with your problem. — Individually
I'm having the exact same issue. The browser does not pop up the download until all the chunks have been created and out.close() is called... Is this issue fixed in Play 2.3? Does using a Scala controller fix this bug? — Hystero
