Reconnaissance of Coding Data
These command-line pipelines were used to perform basic integrity checks on the data, especially the first column, which is the most heterogeneous in the type of data it contains.
Scale of stuff to look at
pbpaste | cat | wc
65536 9807 101150
- pbpaste takes the contents of the clipboard and sends it to stdout
- | is the pipe character, which "pipes" the stdout of the preceding command to the stdin of the following command
- cat concatenates its input to stdout. I use cat here defensively: cat seems to do some smart things with encodings, "conditioning" the text for use by other utilities. In this simple pipeline it could have been omitted with identical results, but I have encountered situations in which subsequent processing of the clipboard contents from pbpaste behaved better if I inserted cat. It may be superfluous voodoo.
- wc performs a word count, reporting the number of lines, the number of words, and the number of characters (bytes, by default)

The clipboard apparently has
- 65536 lines, more than I want to look at. Most are probably empty. That number is 2^16 and probably represents the maximum number of possible rows.
- 9807 words
- 101150 characters
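The guess that most of the 65536 lines are empty can be checked directly. The following is a minimal sketch, not part of the original session: the `sample` function is a made-up stand-in for the clipboard (on the Mac, `pbpaste` would take its place), and `grep -c .` counts the lines that contain at least one character.

```shell
# Hypothetical sample standing in for the clipboard contents:
# two data lines followed by three empty lines.
sample() { printf 'A12 foo\n17\n\n\n\n'; }

# Total number of lines (tr strips the padding some wc versions add).
total=$(sample | wc -l | tr -d ' ')

# Number of non-empty lines: '.' matches any single character,
# so an empty line cannot match.
nonempty=$(sample | grep -c .)

echo "total=$total nonempty=$nonempty empty=$((total - nonempty))"
```

If the empty count comes out close to the total, the large line count is mostly padding rather than data.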
Scale of unique stuff
pbpaste | cat | sort | uniq -c | wc
967 2571 14253
- sort sorts the lines
- uniq -c finds unique lines; the -c flag says to count how many instances of each line occurred
- I actually wanted to see the unique lines, but by starting with wc I got an idea of how much stuff I was going to need to look at, here nearly 1000 lines.
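To see what the sort and uniq -c stages produce, here is a small sketch on made-up data. The `sample` function is a hypothetical stand-in for the clipboard, and the trailing `sort -rn` is an addition that is not in the pipeline above; it simply ranks the counted lines so the most frequent come first.

```shell
# Hypothetical sample standing in for the clipboard contents.
sample() { printf 'b\na\nb\nc\nb\na\n'; }

# uniq only collapses adjacent duplicates, which is why sort comes first.
# The final sort -rn orders the counted lines by count, largest first.
ranked=$(sample | sort | uniq -c | sort -rn)
echo "$ranked"

# Number of distinct lines, the figure wc reports in its first column.
distinct=$(sample | sort | uniq | wc -l | tr -d ' ')
echo "distinct=$distinct"
```

Ranking by count is one way to spot values that occur only once, which are often the typos or oddities worth a closer look.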
The unique stuff
pbpaste | cat | sort | uniq -c | less
- same as above except replace wc with less, which lets me page backwards and forwards through the output.
The unique stuff of likely interest that isn't a problem number
pbpaste | cat | grep '^[A-Z]' | sort | uniq -c | less
- similar to above, but only show lines which start with a capital letter
- grep (named after the ed command g/re/p: globally search for a regular expression and print) looks at lines and passes the ones which match to stdout, discarding non-matches
- ^ anchors the match to the start of the line
- [A-Z] matches any single character in the given range
- single quotes protect the search pattern from interpretation by the shell
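A short sketch of the filter on made-up data follows. The `sample` function is a hypothetical stand-in for the clipboard. Setting `LC_ALL=C` is an extra precaution that the original command does not use: in some locales a range such as `[A-Z]` can match lowercase letters as well, while the C locale restricts it to the ASCII capitals.

```shell
# Hypothetical sample standing in for the clipboard contents.
sample() { printf 'Alpha\n12\nbeta\nGamma 3\n\n'; }

# Keep only lines whose first character is an ASCII capital letter.
# LC_ALL=C applies to this grep invocation only.
capitals=$(sample | LC_ALL=C grep '^[A-Z]')
echo "$capitals"
```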
Check the other stuff
pbpaste | cat | grep -v '^[A-Z]' | sort | uniq -c | less
-- DickFurnas - 2010-01-05
- same as above, except the -v flag tells grep to reverse its behavior and send lines which do not match to stdout
- Why? To see if I missed anything of interest.
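Because grep and grep -v split the input into two complementary sets, their line counts should add up to the total. The sketch below checks that on made-up data; the `sample` function is a hypothetical stand-in for the clipboard, and `LC_ALL=C` is an added precaution so that `[A-Z]` means only the ASCII capitals.

```shell
# Hypothetical sample standing in for the clipboard contents.
sample() { printf 'Alpha\n12\nbeta\nGamma 3\n\n'; }

total=$(sample | wc -l | tr -d ' ')

# -c prints the number of selected lines instead of the lines themselves.
match=$(sample | LC_ALL=C grep -c '^[A-Z]')

# -v selects the lines that do NOT match, so -v -c counts those.
rest=$(sample | LC_ALL=C grep -v -c '^[A-Z]')

echo "total=$total match=$match rest=$rest"
if [ "$total" -eq $((match + rest)) ]; then
  echo "every line is accounted for"
fi
```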