Frequency of symbols in programming languages
B

6

21

I'm looking for some kind of reference which shows the frequency of symbols of popular programming languages. I'm trying to design an optimal keyboard layout for programming.

If there is no such reference, I wouldn't mind creating a simple utility that figures this out. However, I would need suggestions as to which files to analyze for each language.
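A minimal sketch of what such a utility could look like, assuming the goal is the percentage of all characters taken up by each ASCII symbol (plus space). The function name `symbol_frequencies` and the sample string are made up for illustration.

```python
from collections import Counter
import string


def symbol_frequencies(text):
    """Map each ASCII punctuation character (and space) to its share of all characters, in percent."""
    total = len(text)
    if total == 0:
        return {}
    symbols = set(string.punctuation) | {" "}
    counts = Counter(ch for ch in text if ch in symbols)
    # most_common() orders the result from most to least frequent
    return {ch: 100.0 * n / total for ch, n in counts.most_common()}


sample = "a[i] = b[i] + 1;"  # hypothetical input; real use would read source files
freqs = symbol_frequencies(sample)
for ch, pct in freqs.items():
    print(repr(ch), round(pct, 3))
```

Dividing by the total character count (letters and digits included) keeps the numbers comparable across files of different sizes.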

One problem I can foresee: if I sample some Objective-C code and it happens to be a simple program with no objects, the [ and ] keys will be far less frequent than in an average Objective-C file. So one guideline is that the sample code should be representative of an average file and use the most commonly used features of the language.

Originally I was thinking that I should get the same code written in different languages, but I'm not sure if that's a good idea since some languages have different uses than others.

Bulgaria answered 6/8, 2010 at 6:40 Comment(2)
Are you designing a keyboard layout for a laptop or a full-size keyboard? (My hunch is that you want to avoid requiring users to press multiple shift-like keys to get programming symbols, but that's not really avoidable on small keyboards…)Michaelis
Sounds interesting - have always noted that the 'Mavis Beacon' etc typing tutors aren't really geared toward coders. But had never thought of changing the keyboard myself ;)Wittenberg
P
7

For large code samples to use for statistical analysis, you might try browsing popular open-source projects or searching on Koders by language.

I made some simple changes to a QWERTY layout a few years ago, and I've been using it ever since as my general-purpose layout:

  • Swap digits for their corresponding shift-symbols.
  • Swap _ and -: names with underscores are common, and now - and + both require Shift.
  • Swap [] and {}: blocks are more common than subscripts.

Plus two optional changes, to taste:

  • Swap ` and ~: destructors are common.
  • Swap ' and ": strings are more common than characters.

The last is the only one that typically would interfere with typing ordinary English text. The layout works beautifully for C++, Perl, and whatever else I've used in the past two or three years. The noticeable speed increase comes from the drastic reduction in the need to hit the Shift key. I find that using Shift for the numbers isn't a big deal since the number pad is usually faster anyway.
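One rough way to check the Shift savings on your own code is to count how many characters need Shift before and after the swaps. The sketch below assumes a standard US QWERTY layout as the baseline, applies only the three core swaps listed above, and ignores capital letters (which need Shift either way). All names in it are illustrative.

```python
# unshifted -> shifted character for the symbol keys on a standard US QWERTY layout
US_SHIFT_PAIRS = {
    "1": "!", "2": "@", "3": "#", "4": "$", "5": "%",
    "6": "^", "7": "&", "8": "*", "9": "(", "0": ")",
    "-": "_", "[": "{", "]": "}", "`": "~", "'": '"',
    "=": "+", "\\": "|", ";": ":", ",": "<", ".": ">", "/": "?",
}

# Keys whose two characters trade places: digits, - and _, [] and {}.
# Add "`" and "'" here to model the two optional swaps.
SWAPPED_KEYS = frozenset("1234567890-[]")


def shifted_chars(swapped=frozenset()):
    """Return the set of characters that require Shift under the given swaps."""
    return {base if base in swapped else upper
            for base, upper in US_SHIFT_PAIRS.items()}


def shift_presses(text, swapped=frozenset()):
    """Count characters in text that require Shift (capital letters not counted)."""
    needs_shift = shifted_chars(swapped)
    return sum(1 for ch in text if ch in needs_shift)


code = "if (buf_len > 0) { total += a[i] * 2; }"  # hypothetical snippet
print(shift_presses(code), shift_presses(code, SWAPPED_KEYS))
```

A snippet heavy on digits and subscripts will show little gain, so it is worth running this over a larger body of your own code before committing to a layout.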

Puissance answered 12/8, 2010 at 3:22 Comment(1)
I use AutoHotkey for this and have done about the same thing as you, except that I've added many other changes, such as remapping Caps Lock to Alt and then using Alt with j, k, l, and i to simulate the arrow keys, so I don't have to move my fingers while programming.Cannabis
C
7

@Derek Jones cited The New C Standard: An economic and cultural commentary, which has the information; here are the frequencies it lists, for quick reference:

space 15.083
! 0.102
" 0.376
# 0.175
$ 0.005
% 0.105
& 0.237
' 0.101
( 1.372
) 1.373
* 1.769
+ 0.182
, 1.565
- 1.176
. 1.512
/ 0.718
: 0.192
; 1.276
< 0.118
= 1.039
> 0.587
? 0.022
@ 0.009
[ 0.163
\ 0.97
] 0.163
^ 0.003
_ 2.550
{ 0.303
| 0.098
} 0.210
~ 0.002

Here is the same sorted by frequency:

space 15.083
_ 2.550
* 1.769
, 1.565
. 1.512
) 1.373
( 1.372
; 1.276
- 1.176
= 1.039
\ 0.97
/ 0.718
> 0.587
" 0.376
{ 0.303
& 0.237
} 0.210
: 0.192
+ 0.182
# 0.175
] 0.163
[ 0.163
< 0.118
% 0.105
! 0.102
' 0.101
| 0.098
? 0.022
@ 0.009
$ 0.005
^ 0.003
~ 0.002
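If you want to re-sort or filter figures like these yourself, a small sketch along the following lines should do. It assumes each row is a symbol, a single space, and a number, as in the lists above; only a handful of rows are copied in to keep it short.

```python
def parse_table(lines):
    """Turn 'symbol value' lines into (symbol, float) pairs."""
    rows = []
    for line in lines:
        # Split on the last space so the symbol itself is kept intact
        symbol, _, value = line.rpartition(" ")
        rows.append((symbol, float(value)))
    return rows


# A few rows taken from the table above
rows = parse_table(["space 15.083", "( 1.372", ") 1.373",
                    "* 1.769", "_ 2.550", "~ 0.002"])

# Highest frequency first; Python's sort is stable, so ties keep their input order
ranked = sorted(rows, key=lambda row: row[1], reverse=True)
for symbol, value in ranked:
    print(symbol, value)
```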
Criticize answered 6/7, 2020 at 23:59 Comment(0)
I
2

The book The New C Standard: An economic and cultural commentary contains a lot of measurements of C source usage. The usage figures and tables are available as a stand-alone PDF.

Irrefrangible answered 23/2, 2011 at 2:12 Comment(1)
Character frequencies are on page 30 of the figures documentCriticize
S
1

There is a version of the Dvorak keyboard layout available, optimized for programmers.

http://www.kaufmann.no/roland/dvorak/

If you happen to use Ubuntu, it is already on your system.

Schmitz answered 6/8, 2010 at 6:57 Comment(2)
Yup, that's exactly the keyboard I am customizing. I don't like how the equals sign is hard to reach.Bulgaria
Really? Just stretch your index finger. Works for me ;)Ewold
A
1

There's a vast collection of open-source software that you could measure to gain some good data on character frequency. SourceForge and GitHub would be the places to look.
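Once you have a few projects checked out locally, tallying characters per language could look roughly like this. It is only a sketch: the extension-to-language mapping is a made-up starting point, and files are read as UTF-8 with undecodable bytes replaced.

```python
import os
from collections import Counter, defaultdict

# Hypothetical mapping; extend it for the languages you care about
EXT_TO_LANG = {".c": "C", ".h": "C", ".py": "Python", ".java": "Java"}


def count_chars_by_language(root):
    """Walk root and return {language: Counter of every character seen}."""
    totals = defaultdict(Counter)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            ext = os.path.splitext(name)[1].lower()
            lang = EXT_TO_LANG.get(ext)
            if lang is None:
                continue  # skip files we cannot attribute to a language
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as f:
                totals[lang].update(f.read())
    return totals
```

Calling `count_chars_by_language` on a checkout directory and then `totals["C"].most_common(20)` would list the twenty most frequent characters in the C files found there.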

Developers don't just write code, though; they also write design documents, emails, and answers to Stack Overflow questions. Maybe installing a key logger on a few consenting developers' computers would be the best way.

Alex answered 12/8, 2010 at 3:1 Comment(0)
H
1

What you're looking for is a good corpus of programming languages. While nothing immediately sprang up in a cursory Google search, the following links may prove useful if you do create your own tool.

A novel framework to detect source code plagiarism

Calgary Corpus

Generating an NLP Corpus from Java Source Code

A Computer Science Text Corpus/Search Engine X-Tec and Its Applications

Mining search topics from a code search engine usage log

Hema answered 7/10, 2010 at 4:45 Comment(0)
