I love getting pull requests on GitHub. It's such a lovely gift when someone wants to contribute their code to my code. However, it seems there are three kinds of pull requests that I get.
- Awesome, appreciated and wanted.
- Not so good, thanks for trying, but perhaps another time.
- THE WALL OF PINK
I'd like to talk about The Wall of Pink. This is a pull request that is possibly useful, possibly awesome, but I'll never know because 672 lines (GitHub tells me) changed because they used CRs and I used LFs or I used CRLF and they used LF, or I used...well, you get the idea.
There is definitely a problem here. But what's the problem? Well, it's kind of like endianness, except we're still talking about it in 2013.
"A big-endian machine stores the most significant byte first—at the lowest byte address—while a little-endian machine stores the least significant byte first." - Endianness
Did you know that for a long time Apple computers were big endian and Intel computers were little endian? The Java VM is big endian. I wrote a shareware code generator 16 years ago that generated a byte array on an Intel PC that was later entered into a PalmPilot running a Motorola 68328. That was the last time I thought about endianness in my career. Folks working on lower-level stuff do think about this sometimes, admittedly, but the majority of folks don't sweat endianness day to day.
TCP/IP itself is, in fact, big endian. There was a time when we had to really think about the measurable performance hit involved in using TCP/IP on a little-endian processor. But we don't think about that anymore. It's there but the abstraction is not very leaky.
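If you haven't looked at byte order in a while, here's a minimal sketch in Python of what the quote above describes. The value 0x0A0B0C0D is just an example I picked; the format codes (">", "<", and "!" for network order) come from the standard struct module.

```python
import struct

value = 0x0A0B0C0D  # arbitrary example value

big = struct.pack(">I", value)      # most significant byte first
little = struct.pack("<I", value)   # least significant byte first
network = struct.pack("!I", value)  # network byte order

print(big.hex())        # 0a0b0c0d
print(little.hex())     # 0d0c0b0a
print(network == big)   # True: network order is big endian
```

Same number, two byte layouts, and the library hides the difference unless you go looking for it.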
It's years later, but CR/LF issues plague us weekly. That Wall of Pink I mentioned? It looks like this. I had to scroll 672 lines before I saw the +green of the lines that were actually added. Who knows what really changed here, though? I can't tell, since this diff tool thinks every line changed.
Sigh.
Whose fault is this?
Perhaps we blame Émile Baudot in 1870 and Donald Murray in 1899 for adding control characters to instruct a typewriter carriage to return to the home position plus a line feed to advance the paper on the roller. Or we blame Teletype machines. Or the folks at DEC, or perhaps Gary Kildall and CP/M for using DEC as a convention. Then the bastards at IBM who moved to ASCII from EBCDIC and needed a carriage return when punch cards fell out of favor.
The text files we have today on Windows still have a CR LF (0D 0A) after every line. But Apple just uses a line feed (LF) character. There's no carriage to return, but there are lines to advance so it's a logical savings.
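To make those bytes concrete, here's a small sketch in Python. The two-line sample text is made up for illustration. It also hints at why a line-by-line comparison can decide that every line changed: a naive split on LF leaves a stray CR at the end of each CRLF line.

```python
crlf = "one\r\ntwo\r\n".encode("ascii")  # Windows-style line endings
lf = "one\ntwo\n".encode("ascii")        # LF-only line endings

print(crlf[3:5].hex())        # 0d0a
print(lf[3:4].hex())          # 0a
print(len(crlf) - len(lf))    # 2, one extra byte per line

# Splitting only on LF leaves "one\r" and "two\r", so nothing matches.
print(crlf.split(b"\n") == lf.split(b"\n"))   # False

# splitlines() treats CRLF and LF alike, so the content compares equal.
print(crlf.splitlines() == lf.splitlines())   # True
```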
Macs and PCs are sharing text more than ever. We live in a world where Git is FTP for code. We're up a level, above TCP/IP where endianness is hidden, but still in text where CR LFs aren't.
We store our text files in different formats on disk, but later when the files are committed to Git, how are they stored? It depends on your settings and the defaults are never what's recommended.
You can set up a .gitattributes per repo to do things like this:
*.txt -crlf
Or you can do what GitHub for Windows suggests with text=auto.
# Auto detect text files and perform LF normalization
* text=auto
What's text=auto do?
This ensures that all files that git considers to be text will have normalized (LF) line endings in the repository. The core.eol configuration variable controls which line endings git will use for normalized files in your working directory; the default is to use the native line ending for your platform, or CRLF if core.autocrlf is set.
It uses the native line ending for your platform. But if you spend a few minutes googling around you'll find arguments several ways with no 100% clear answer, although most folks seem to believe GitHub has the right one.
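As a rough mental model of the normalization described above, here's a sketch in Python. This is not how Git implements it. The function names are hypothetical, and the "contains a NUL byte means binary" check is a simplifying assumption standing in for Git's real text detection.

```python
import os

def looks_like_text(data: bytes) -> bool:
    # Simplifying assumption: anything with a NUL byte is treated as binary.
    return b"\x00" not in data

def to_repository(data: bytes) -> bytes:
    """Normalize CRLF to LF on the way into the repository."""
    if not looks_like_text(data):
        return data
    return data.replace(b"\r\n", b"\n")

def to_working_directory(data: bytes,
                         eol: bytes = os.linesep.encode("ascii")) -> bytes:
    """Convert line endings to the chosen ending (native by default)."""
    if not looks_like_text(data):
        return data
    return to_repository(data).replace(b"\n", eol)

stored = to_repository(b"a\r\nb\r\n")
print(stored)                                   # b'a\nb\n'
print(to_working_directory(stored, b"\r\n"))    # b'a\r\nb\r\n'
```

The point of the model is that the repository holds one form (LF) while each working directory gets whatever its platform expects.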
If this is the right answer, why isn't it a default? Is it time to make this the default?
Did you know this is such a problem that GitHub for Windows has dedicated "normalize your repo's CRLF" code? It will fix them all and make a one-time commit to fix the line endings.
I think a more complete solution would also include improvements to the online diff tool. If the GitHub repo and server know something is wrong, that's a great chance for the server to suggest a fix, proactively.
Solutions
Here are some possible solutions as I see them.
- Make Windows switch all text files and decades of convention to use just LF
- Have better platform-specific defaults without a .gitattributes file
- Have the GitHub web application be more proactive in suggesting solutions and preventing badness
- Have the GitHub for Windows desktop application proactively notice issues (before I go to settings) and offer to help
- Make the diff tool CR/LF aware and "do the right thing" like desktop diff tools that can ignore line ending issues
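To show what "CR/LF aware" could mean for a diff, here's a small sketch using Python's standard difflib. The changed_lines helper and the sample text are hypothetical, made up for illustration. This is not how GitHub's diff works, just the general idea of ignoring line endings before comparing.

```python
import difflib

def changed_lines(old: str, new: str, ignore_eol: bool) -> list:
    """Return the added/removed lines a unified diff would report."""
    if ignore_eol:
        # splitlines() drops CRLF and LF alike before comparing.
        old_lines, new_lines = old.splitlines(), new.splitlines()
    else:
        # Splitting only on LF leaves a trailing CR on CRLF lines.
        old_lines, new_lines = old.split("\n"), new.split("\n")
    diff = list(difflib.unified_diff(old_lines, new_lines, lineterm=""))
    body = diff[2:]  # skip the two file header lines
    return [line for line in body if line.startswith(("+", "-"))]

mine = "alpha\r\nbeta\r\ngamma\r\n"      # my file, CRLF
theirs = "alpha\nbeta\ndelta\ngamma\n"   # their file, LF, one line added

noisy = changed_lines(mine, theirs, ignore_eol=False)
quiet = changed_lines(mine, theirs, ignore_eol=True)

print(len(noisy))  # 7: every line looks removed and re-added
print(quiet)       # ['+delta']
```

One real change, and the naive comparison reports seven. Scale that up and you get 672 lines of pink.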
Until something is done, I'll always tense up when I see an incoming pull request and hope it's not a Wall of Pink.
Thoughts?
© 2013 Scott Hanselman. All rights reserved.