Fast reading of console input

I need to read data quickly from the console's standard input stream. The input consists of 100,000 rows of 20 characters each (2 million characters), which the user pastes from the clipboard. My procedure takes about 3 minutes, which is far too slow; the target is 10 seconds. It looks like this:

var inputData = new string[100000]; // 100.000 rows with 20 chars
for (int i = 0; i < 100000; i++) // Cycle duration is about 3 minutes...
{
    inputData[i] = Console.ReadLine();
}
// some processing...

What I tried:

  1. Directly: Console.Read, Console.ReadKey - the same result

  2. Console.In: Read(), ReadLine(), ReadAsync(), ReadLineAsync(), ReadBlock() with various block sizes, ReadBlockAsync(), ReadToEnd(), ReadToEndAsync() - the same result

  3. new StreamReader(Console.OpenStandardInput(buffer)) with various buffer and block sizes - the same result

  4. Hiding the console window before reading and showing it again when reading is finished - about 10% faster

  5. Reading the input data from a file - it works perfectly and is fast, but I need to read from __ConsoleStream.

I noticed that while the input is being read, the conhost.exe process uses the CPU heavily.

How can I speed up the reading of input?

Update:

  1. Increasing/decreasing Console.BufferHeight and Console.BufferWidth has no effect

  2. ReadFile (MSDN) is also slow. But I noticed an interesting fact:

    ReadFile(handle, buffer, bufferSize, out bytesCount, null);
    // bufferSize may be very big, but the buffer receives no more than one row (with \r\n) per call.
    // So it seems that data is passed into the input stream row by row, synchronously.
    
Sleuth answered 26/10, 2015 at 9:28 Comment(12)
inputData = Console.ReadLine(); won't compile, and how exactly does the Clipboard fit in? - Micrometry
Reading 20 MB of text should take much less than a second. - Micrometry
Why not read the data directly from the clipboard? #3840580 - Handiness
I wonder if playing with BufferHeight changes anything. - Jalap
@HenkHolterman, sorry, it must be inputData[i] = Console.ReadLine(); - Sleuth
@GSerg, BufferHeight and BufferWidth have no effect. - Sleuth
@AlexH, because the task is to get the data from the input stream (keyboard). - Sleuth
Is it just as slow in a release build? - Jalap
@Jalap It is slow in both debug and release builds. - Sleuth
Are you sure the bottleneck is in Console.Read? In general, I find that if changing one line of code multiple times yields zero change to the run time, it is because I am looking in the wrong place. - Monomania
@Aron, yes, I'm sure, because I compared DateTime.Now.Ticks before and after calling Read(). - Sleuth
As I said in my post below, Read() and ReadLine() echo the pasted text, and the act of writing 100,000 characters dooms the process to failure (Console.WriteLine calls are time killers). Using Console.ReadKey(true) prevents this echo effect and speeds things up dramatically without any other modifications. - Dishpan
A
2

In your scenario, a lot of time is wasted echoing the pasted characters to the screen. You can disable this echo on Windows (I don't know how to do that on other platforms).

Unfortunately, the necessary API is not exposed by .NET (at least in 4.6.1), so you need the following native methods and constants:

using System;
using System.Runtime.InteropServices;

internal static class NativeMethods
{
    [DllImport("kernel32.dll", SetLastError = true)]
    internal static extern bool SetConsoleMode(IntPtr hConsoleHandle, int mode);

    [DllImport("kernel32.dll", SetLastError = true)]
    internal static extern bool GetConsoleMode(IntPtr hConsoleHandle, out int mode);

    [DllImport("kernel32.dll", SetLastError = true)]
    internal static extern IntPtr GetStdHandle(int nStdHandle);

    internal const int STD_INPUT_HANDLE = -10;
    internal const int ENABLE_ECHO_INPUT = 0x0004;
}

and use them in the following way before receiving data from the clipboard:

var handle = NativeMethods.GetStdHandle(NativeMethods.STD_INPUT_HANDLE);
int mode; 
NativeMethods.GetConsoleMode(handle, out mode);
mode &= ~NativeMethods.ENABLE_ECHO_INPUT; // disable flag
NativeMethods.SetConsoleMode(handle, mode);

Don't forget to restore the original console mode flags when you have finished receiving the clipboard data. I hope this reduces your performance problem. More information about console modes can be found in the GetConsoleMode documentation.
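The save-and-restore step might be sketched as follows. This is only an illustration under assumptions: the `EchoGuard` class and its `WithoutEcho` and `ReadWithoutEcho` members are hypothetical names, not part of .NET or of the answer above, and the code is Windows-only because it calls kernel32.dll.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical helper (not a .NET API): turns echo off around a read
// and puts the original console mode back afterwards.
internal static class EchoGuard
{
    [DllImport("kernel32.dll", SetLastError = true)]
    private static extern bool SetConsoleMode(IntPtr hConsoleHandle, int mode);

    [DllImport("kernel32.dll", SetLastError = true)]
    private static extern bool GetConsoleMode(IntPtr hConsoleHandle, out int mode);

    [DllImport("kernel32.dll", SetLastError = true)]
    private static extern IntPtr GetStdHandle(int nStdHandle);

    private const int STD_INPUT_HANDLE = -10;
    private const int ENABLE_ECHO_INPUT = 0x0004;

    // Returns the given mode with the echo bit cleared; other bits are untouched.
    internal static int WithoutEcho(int mode)
    {
        return mode & ~ENABLE_ECHO_INPUT;
    }

    // Runs 'read' with echo disabled and restores the saved mode even if it throws.
    internal static T ReadWithoutEcho<T>(Func<T> read)
    {
        IntPtr handle = GetStdHandle(STD_INPUT_HANDLE);
        int original;
        if (!GetConsoleMode(handle, out original))
        {
            // Input is probably redirected (not a console); nothing to change.
            return read();
        }

        SetConsoleMode(handle, WithoutEcho(original));
        try
        {
            return read();
        }
        finally
        {
            SetConsoleMode(handle, original);
        }
    }
}
```

A call could then look like `string first = EchoGuard.ReadWithoutEcho(() => Console.ReadLine());`. The try/finally is there so that the console does not stay silent if the read throws.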

Further attempts to optimize can include:

  • Rewrite the console reading code without locks (the .NET implementation takes locks) and make sure that no other thread works with the console at that moment. This is quite an expensive task.
  • Try to find a way to increase the stdin buffer size, though I'm not sure whether that is possible at all.
  • Don't forget to test in a release build without the debugger attached.
Atalie answered 8/10, 2018 at 16:36 Comment(0)
D
2

Your main slowdown here is that Console.Read() and Console.ReadLine() both "echo" your text on the screen, and the process of writing the text slows you WAY down. What you want to use, then, is Console.ReadKey(true), which does not echo the pasted text. Here's an example that reads 100,000 characters in about 1 second. It may need some modification for your purposes, but I hope it's enough to give you the picture. Cheers!

// Requires: using System; using System.Collections.Generic;
public void begin()
{
    List<string> lines = new List<string>();
    string line = "";
    Console.WriteLine("paste text to begin");
    int charCount = 0;
    DateTime beg = DateTime.Now;
    do
    {
        // ReadKey(true) intercepts the key, so it is not echoed to the console.
        ConsoleKeyInfo key = Console.ReadKey(true);
        if (key.Key == ConsoleKey.Enter)
        {
            // Enter ends the current line.
            lines.Add(line);
            line = "";
        }
        else
        {
            line += key.KeyChar;
            charCount++;
        }
    } while (charCount < 100000);
    Console.WriteLine("100,000 characters (" + lines.Count.ToString("N0") + " lines) in " + DateTime.Now.Subtract(beg).TotalMilliseconds.ToString("N0") + " milliseconds");
}

I'm pasting a 5 MB file with long lines of text on a machine with all cores active doing other things (99% CPU load) and getting 100,000 characters in 1,600 lines in 1.87 seconds.
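For the fixed row count in the question, the same loop could be arranged to stop after a given number of lines rather than a number of characters. The sketch below is an illustration only: `KeyReader` and `ReadLines` are hypothetical names, the character source is passed in as a delegate so the logic can be exercised without a console, and it assumes that each pasted line ending arrives as '\r' (Enter), with any stray '\n' ignored.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper: collects characters into lines until lineCount lines are read.
internal static class KeyReader
{
    internal static List<string> ReadLines(Func<char> nextChar, int lineCount)
    {
        var lines = new List<string>(lineCount);
        var current = new StringBuilder();

        while (lines.Count < lineCount)
        {
            char c = nextChar();
            if (c == '\r')
            {
                // Enter: finish the current line and start a new one.
                lines.Add(current.ToString());
                current.Clear();
            }
            else if (c != '\n')
            {
                // Assumption: a line feed that follows Enter carries no data.
                current.Append(c);
            }
        }

        return lines;
    }
}
```

With a console attached it might be called as `KeyReader.ReadLines(() => Console.ReadKey(true).KeyChar, 100000);`. A StringBuilder is used instead of string concatenation so that long lines do not cause repeated copying.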

Dishpan answered 27/8, 2016 at 3:40 Comment(1)
I'm sure you realize that file upload and/or file I/O goes a LOT faster than this... pasting into the console is a relatively poor choice performance-wise. - Dishpan
C
0

Use the native WinAPI functions:

  1. Get the input handle with GetStdHandle (MSDN).
  2. Read 22 bytes per row (20 characters plus the line ending \r\n) with ReadFile (MSDN) instead of ReadLine.

Examples of WinAPI use from C#: http://www.pinvoke.net/

Catbird answered 26/10, 2015 at 12:40 Comment(1)
One last idea: read all the input with a single ReadFile call into one memory buffer, and don't use a string array; use one buffer of memory instead (performance may suffer because creating the string objects takes a long time). - Catbird
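The single-buffer idea from the comment above might be sketched as follows, using the managed stream from Console.OpenStandardInput() rather than a raw ReadFile call. `BufferedInput`, `ReadAll` and `FindLines` are hypothetical names chosen for this illustration. Note that the question reports ReadFile returning at most one row per call on a console, so this may well not remove the console-side bottleneck; it only avoids allocating 100,000 strings.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper: keeps the whole input in one byte array and
// records where each line starts instead of creating a string per line.
internal static class BufferedInput
{
    // Reads the stream to its end (on a console: until end-of-input, e.g. Ctrl+Z).
    internal static byte[] ReadAll(Stream input)
    {
        using (var memory = new MemoryStream())
        {
            input.CopyTo(memory);
            return memory.ToArray();
        }
    }

    // Returns (offset, length) pairs for each line, excluding "\r\n" or "\n".
    internal static List<KeyValuePair<int, int>> FindLines(byte[] data)
    {
        var lines = new List<KeyValuePair<int, int>>();
        int start = 0;

        for (int i = 0; i < data.Length; i++)
        {
            if (data[i] != (byte)'\n') continue;

            // Drop a preceding '\r' so Windows line endings are handled too.
            int end = (i > start && data[i - 1] == (byte)'\r') ? i - 1 : i;
            lines.Add(new KeyValuePair<int, int>(start, end - start));
            start = i + 1;
        }

        // A final line without a trailing newline.
        if (start < data.Length)
        {
            lines.Add(new KeyValuePair<int, int>(start, data.Length - start));
        }

        return lines;
    }
}
```

A possible use is `byte[] data = BufferedInput.ReadAll(Console.OpenStandardInput());` followed by `BufferedInput.FindLines(data)`; a row can then be decoded on demand with `System.Text.Encoding.ASCII.GetString(data, offset, length)`, assuming the input is ASCII.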
C
0

I don't see that you need to preserve order. If you don't, use Parallel in combination with the Partitioner class, since you're executing small tasks:

See "When to use Partitioner class?" for an example.

This means you have to change the data type to ConcurrentBag or ConcurrentDictionary.

Clause answered 27/12, 2016 at 12:8 Comment(0)
M
-2

Why not use

Parallel.For

to multi-thread the read from the console? If that doesn't work, try to pull the data straight from the clipboard using

https://msdn.microsoft.com/en-us/library/kz40084e(v=vs.110).aspx

Machree answered 27/6, 2017 at 14:39 Comment(1)
Why would you downvote me for simply giving an idea? - Machree